Hard Choices in Scientific Inquiry
Oliver Schulte
Department of Philosophy
Carnegie Mellon University
December 12, 1997
Contents
1 Induction: The Problem and How To Solve It
1.1 The Problem of Induction
1.2 Hypothetical Imperatives for Inductive Inference
1.3 Results
1.3.1 Means-Ends Vindications of Traditional Proposals
1.3.2 Novel Solutions to Traditional Problems
1.3.3 New Questions and Answers
1.3.4 Analysis of Inductive Problems from Scientific Practice
1.3.5 Rational Choice in Games
1.4 Overview
2 A Model of Scientific Inquiry
2.1 Outline
2.2 A Model of Scientific Inquiry
2.3 Examples of Inductive Problems and Scientific Methods
2.3.1 Universal Generalizations
2.3.2 Almost Universal Generalizations
2.3.3 Goodman's Riddle of Induction
2.3.4 Identifying Limiting Relative Frequencies
2.3.5 Cognitive Science and the Physical Symbol System Hypothesis
2.3.6 Theories of Particle Physics
2.4 Inquiry, Belief and Action
2.5 Revising Background Knowledge
3 Truth, Content and Minimal Change
3.1 Outline
3.2 Dominance in Error and in Content
3.3 Dominance Principles and Minimal Theory Change
3.4 Is "Minimal Change" Belief Revision Minimal?
3.5 Empirical Inquiry and Belief Revision
3.6 Proofs
4 Discovery Problems and Reliable Solutions
4.1 Outline
4.2 Convergence to the Truth
4.3 Reliable Solutions for Discovery Problems
4.4 Testing and Topology
4.5 Against Method
4.6 Contra Convergence
4.7 There's No Reliable Method -- So What?
4.8 Proofs
5 Reliable Inference
5.1 Outline
5.2 Reliable Methods
5.3 Popper, Levi and Deductivism
5.3.1 A New and Improved Falsificationism
5.3.2 A Reliable Enterprise of Knowledge
5.4 Gettier meets Meno
5.5 Proofs
6 Fast and Steadfast Inquiry
6.1 Outline
6.2 Data-Minimal Methods
6.3 Retractions
6.4 Minimaxing Retractions
6.5 A Characterization of Discovery With Bounded Mind Changes
6.6 The Hierarchy of Cognitive Goals
6.7 Data-Minimality vs. Minimaxing Retractions
6.8 Proofs
7 Theory Discovery
7.1 Outline
7.2 Reliable Theory Inference
7.3 Uniform Theory Discovery
7.4 Piecemeal Theory Discovery
7.5 Countable Hypothesis Languages and Finite Axiomatizability
7.6 Proofs
8 Reliable Theory Discovery in Particle Physics
8.1 Outline
8.2 Elementary Particles and Reactions
8.3 Evidence in Particle Physics
8.4 What Particles Are There?
8.5 Identifying Subnuclear Reactions
8.6 Conservation Laws in Particle Physics
8.7 Inferring Conservation Theories Without Virtual Particles
8.8 Inferring Conservation Theories With Virtual Particles
8.9 Parsimony, Conservatism and the Number of Quantum Properties
8.10 Proofs
9 Admissibility in Games
9.1 Outline
9.2 Preliminaries
9.2.1 Extensive and Strategic Form Games
9.2.2 Restricted Game Trees
9.3 Admissibility in Games
9.4 Iterated Admissibility
9.5 Strict Dominance and Backward Induction
9.6 Weak Dominance and Forward Induction
9.7 Proofs
10 Conclusion
List of Figures
1.1 Categorical vs. Hypothetical Imperatives
1.2 Performance Standard for Inductive Methods = Evaluation Criterion + Cognitive Value
1.3 The Hierarchy of Cognitive Goals
2.1 Possible Worlds and Propositions
2.2 Data Stream ε
2.3 The Observations that may arise in a given World
2.4 Global Underdetermination of the question "Is there a black swan?"
2.5 Empirical Hypotheses and Entailment
2.6 An Inductive Method
2.7 A Universal Generalization
2.8 Almost Universal Generalizations with Finitely Many Exceptions
2.9 The New Riddle of Induction
2.10 The limit of the relative frequencies is 1/2.
2.11 How many particles are there?
3.1 Content vs. Error
3.2 Pareto-Minimal Theory Changes that avoid Additions and Retractions
3.3 Pareto-Minimal Theory Changes and the AGM Axioms
3.4 Three Notions of Minimal Theory Change
4.1 Successful Discovery: On data stream ε, method δ identifies the correct hypothesis from a set of alternatives.
4.2 Testing Empirical Hypotheses: Decision in the Limit of Inquiry
4.3 The Projection Set of a Discovery Method δ
4.4 Method δ verifies hypothesis H in the limit.
4.5 Data stream ε is a limit point of hypothesis H.
5.1 Decomposing Hypotheses Into Refutable Subsets.
5.2 The Bumping Pointer Method
5.3 Given the observations from data stream ε, a connectionist model N is the true theory of machine intelligence, but the production systems approach is never conclusively refuted along ε.
6.1 A data-minimal method must project its conjectures: δ′ dominates δ with respect to convergence time.
6.2 Method δ always projects its current conjecture and hence is data-minimal.
6.3 In Search of a Neutrino.
6.4 The Riddle of Induction.
6.5 Another Riddle of Induction.
6.6 "Feather" Structures characterize Discovery with Bounded Mind Changes. The figure illustrates 0-feathers and 1-feathers.
6.7 2-feathers and 3-feathers
6.8 Minimaxing Retractions requires waiting until time n.
6.9 The Hierarchy of Cognitive Goals.
7.1 Two Notions of Theory Discovery: (a) Uniform Theory Discovery (b) Piecemeal Theory Discovery
7.2 Method δ projects each of H1 and H2 along some data stream, but not both on the same data stream.
7.3 Method δ′ changes its overall theory three times, its conjecture about Hω twice, about H1 once.
8.1 A Particle World and the Particles in it.
8.2 The Evidence that may arise in a Particle World
8.3 Does reaction r occur?
8.4 A Set of Reactions, encoded as Vectors with associated Linear Equations.
8.5 Track of the Decay of a Pion into a Muon
9.1 Admissibility in a Game of Perfect Information. The label inside a node indicates which player is choosing at that node.
9.2 A Game Without Perfect Recall
9.3 Weak Admissibility
9.4 Order-Free Elimination of Weakly Dominated Strategies
9.5 A Game of Perfect Information
9.6 Subgame Perfection vs. Weak Admissibility
9.7 Backward vs. Forward Induction Principles
List of Tables
8.1 Some Elementary Particles
8.2 Quantum Number Assignments
Chapter 1
Induction: The Problem and How To Solve It
1.1 The Problem of Induction
Much of epistemology and the philosophy of science is a set of responses to the problem of induction: the observation that no matter how much evidence we have accumulated, the very next datum might refute our generalizations. This observation led Hume to the conclusion that, although we may be in the habit of preferring some inferences over others, one generalization is as good as another [Hume 1984]. Others maintain that although we may not be certain that any generalization is correct, still some inferences are better than others: there are some conclusions that an inquirer ought to draw, whether or not her psychology inclines her towards them. For centuries philosophers have sought to articulate principles that lead to the right inferences. Statisticians and researchers interested in algorithms for machine learning have joined in this enterprise.
What grounds can we give for the claim that some inferences are the right ones? Kant referred to rules for what one ought to do as imperatives, and drew a fundamental distinction between two kinds: categorical imperatives and hypothetical imperatives [Kant 1785]. Categorical imperatives are rules that constrain a person's choices no matter what her interests or abilities are. An example of a categorical imperative for epistemic agents, to which many philosophers subscribe, is that they ought to have consistent beliefs. Other examples are "postulates of rationality" such as the constraints on "rational" decision making that Savage stipulates [Savage 1954]. Hypothetical imperatives are rules of "sagacity" (Klugheit, as Kant said), which guide an agent as to how he should go about attaining a given goal. Thus hypothetical imperatives are of the form: "Action A will bring about consequence C. Therefore, if you want C (under the hypothesis that you want C), you should do A." Hypothetical imperatives form the core of instrumental, or means-ends, rationality: on this conception, rational agents are those who choose the best means for accomplishing their aims.
Students of inductive inference have offered both categorical and hypothetical imperatives for principles of induction. The main mode of justification for categorical imperatives is to defend them because they are "intuitively plausible", or because they agree with "exemplary scientific practice". If a difficult case turns up where the categorical imperative appears to give the wrong guidance, as judged by intuition, the champion of "pure rationality" adjusts his principles to accommodate the difficult case, hoping that one day he will reach "reflective equilibrium", a point at which his principles of "pure rationality" pass the tribunal of intuition.1
For a proponent of "pure inductive rationality", the basic unit of analysis is the methodological maxim. She applies the maxim to various inductive problems to see if it gives the "intuitively plausible" answer, tallying up "successes" and "failures" (e.g., [Earman 1992]). For means-ends rationality, the basic unit of analysis is the inductive or scientific problem. No general "principle of rationality" can beat the trivial advice "choose the best method for the problem at hand". The proponent of instrumental rationality casts a critical eye on allegedly general principles of inductive rationality: does the principle in question help the agent to achieve the aims of inquiry? Or does it prevent the agent from doing so? As William James said, "a rule that would prevent me from acknowledging certain kinds of truth if those kinds of truth were really there, would be an irrational rule" [James 1982, Sec. X, p. 206].2
Figure 1.1 summarizes the main contrasts between these two perspectives
on the problem of induction.
In philosophy, the tradition of confirmation theory seeks categorical imperatives for inductive inference (e.g., [Carnap 1962], [Glymour 1980]). The means-ends approach is the normative perspective underlying Classical Statistics and formal learning theory. Formal learning theory examines what inductive methods are reliable means for arriving at a true theory in the limit of inquiry. Hilary Putnam, one of the founders of formal learning theory, asked whether Carnap's confirmation functions were adequate to the goal of settling on true generalizations, and found them wanting [Putnam 1963]. Since then learning theory has flourished in philosophy and in computer science (see for example [Angluin and Smith 1983], [Glymour 1991], [Kelly and Glymour 1992], [Kelly et al. 1994]). Kevin Kelly's recent book, "The Logic of Reliable Inquiry", builds on learning theory to develop a comprehensive philosophy of science and empirical inquiry [Kelly 1995]. This thesis extends Kelly's work.

1 A locus classicus that outlines this project is [Goodman 1983, Section III.3]. The search for reflective equilibrium is common to all attempts at philosophical "explication". For example, John Rawls says that in constructing a procedural test of justice, we must check the formulations of the procedure by seeing whether the conclusions reached match "our considered judgments on due reflection" [Rawls 1996, III.1.4].
2 James's fellow pragmatist Peirce made the point even more colorfully: "The following motto deserves to be inscribed upon every wall of the city of philosophy: Do not block the path of inquiry." [Peirce 1958, 1:35]

Figure 1.1: Categorical vs. Hypothetical Imperatives
1.2 Hypothetical Imperatives for Inductive Inference
I apply learning-theoretic techniques to determine what methods are best for accomplishing given aims of inquiry in a given problem. I examine a variety of cognitive values that various writers have proposed as desiderata in inquiry. These are: content, truth, avoiding error, avoiding retractions (theory changes), convergence to the truth, fast convergence to the truth, and producing theories that are finitely axiomatizable. The next step is to define what it is for an investigative method to perform well with respect to a given cognitive value. For this I draw on two basic principles from decision theory: admissibility and the minimax principle. Combining an evaluation criterion with a cognitive value yields a performance standard for inductive methods; see Figure 1.2.
The theory of optimal inductive inference that I develop in this thesis answers three kinds of questions.
1. What standards of success are feasible in a given problem?
2. What methods are optimal for feasible standards of success in a given inductive problem?
Figure 1.2: Performance Standard for Inductive Methods = Evaluation Criterion + Cognitive Value
3. Are there systematic dependencies and conflicts between different epistemic values?
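As a toy illustration of how an evaluation criterion and a cognitive value combine into a performance standard, the sketch below scores two invented methods by worst-case (minimax) convergence time. Everything in it, including the two-stream universe and the method names `natural` and `cautious`, is my own illustrative assumption, not part of the thesis's formal apparatus.

```python
# Toy performance standard: evaluation criterion (minimax) + cognitive value
# (convergence time). The universe, methods and truth function are invented.

# Each "data stream" is a finite prefix standing in for an infinite sequence
# whose tail repeats its last entry forever.
streams = [
    ("green", "green", "green"),
    ("green", "blue", "blue"),
]

def truth(stream):
    # The correct hypothesis in a world: "all green" iff no blue ever appears.
    return "all green" if "blue" not in stream else "not all green"

def natural(prefix):
    # Generalize immediately: conjecture "all green" until a blue is seen.
    return "all green" if "blue" not in prefix else "not all green"

def cautious(prefix):
    # Refuse to generalize until three observations are in.
    if len(prefix) < 3:
        return None  # no conjecture yet
    return natural(prefix)

def convergence_time(method, stream):
    # First stage from which the method's conjecture is correct ever after.
    answer = truth(stream)
    conjectures = [method(stream[:n]) for n in range(len(stream) + 1)]
    for n in range(len(conjectures)):
        if all(c == answer for c in conjectures[n:]):
            return n
    return float("inf")

def minimax_time(method):
    # Evaluation criterion: worst case over all streams under consideration.
    return max(convergence_time(method, s) for s in streams)

print(minimax_time(natural))   # 2
print(minimax_time(cautious))  # 3
```

On this tiny problem the generalizing method beats the skeptical one under the minimax-convergence-time standard, which is the flavor of comparison the thesis carries out in full generality.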
1.3 Results
The fruits of the theory are rewarding. I group the results in five different categories.
1.3.1 Means-Ends Vindications of Traditional Proposals
I show that some traditional proposals for how scientific inference ought to proceed, usually defended on grounds of "intuitive plausibility", are in fact the optimal means for attaining certain cognitive goals. These proposals include the principle of always entailing the evidence, a skeptical attitude that will not generalize beyond the evidence, axioms for "minimal change" belief revision, and Popper's conjectures-and-refutations scheme. My results clarify the status of such norms: those who subscribe to the epistemic values implicit in the norm should follow it, whereas others may (and sometimes should) prefer different methods.
Theories of belief revision are linked with theories of conditionals and non-monotonic inference; currently, this cluster of topics is one of the most active areas of research in philosophical logic. My means-ends analysis leads to a principled new notion of minimal belief change that differs from the received axioms (the so-called "AGM" postulates from [Alchourrón et al. 1985]).
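For orientation, the four basic AGM revision postulates can be stated as follows, writing $K * \varphi$ for the revision of belief set $K$ by sentence $\varphi$, and $K + \varphi$ for the expansion $\mathrm{Cn}(K \cup \{\varphi\})$:

```latex
\begin{itemize}
  \item (Closure)   $K * \varphi = \mathrm{Cn}(K * \varphi)$
  \item (Success)   $\varphi \in K * \varphi$
  \item (Inclusion) $K * \varphi \subseteq K + \varphi$
  \item (Vacuity)   if $\neg\varphi \notin K$, then $K + \varphi \subseteq K * \varphi$
\end{itemize}
```

Chapter 3 examines how far these postulates deserve the label "minimal change" and how the means-ends notion departs from them.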
1.3.2 Novel Solutions to Traditional Problems
Some combinations of epistemic values are particularly interesting. Suppose that we follow learning theory in ranking reliable convergence to the truth first. Then add the desiderata of minimizing convergence time and avoiding retractions. We may think of the result as a standard of efficiency for asymptotically reliable methods. This efficiency criterion has particular intuitive appeal:
- In Goodman's Riddle of Induction, the only projection rule that is efficient in the sense described is the natural one. (On a sample of all green emeralds, project that all emeralds are green.)
- This criterion requires a variant of Occam's razor. To be precise, it requires that a scientist should not infer the existence of an entity that is observable in principle until she in fact observes it.
- In some circumstances, efficiency in the sense defined requires a particle theorist to posit the existence of unobservable particles. Thus there is a purely instrumental reason for introducing hidden entities.
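The first of these points, the efficiency of the natural projection rule, can be illustrated by counting retractions in a toy version of the Riddle. The finite emerald streams and the `grue`-style rules below are my invented sketch, not the thesis's formalization.

```python
# Toy Goodman's Riddle: emeralds are observed one at a time, and a world is
# determined by the stage (possibly never) at which observations switch from
# green to blue. A grue(k) rule predicts a switch at stage k; the natural
# rule projects "all green" until a blue emerald actually appears.

def natural(prefix):
    if "blue" in prefix:
        return f"green until {prefix.index('blue')}"
    return "all green"

def grue(k):
    def rule(prefix):
        if "blue" in prefix:
            return f"green until {prefix.index('blue')}"
        # Once more than k all-green observations are in, the predicted
        # switch at k is refuted and the rule must fall back to "all green".
        return "all green" if len(prefix) > k else f"green until {k}"
    return rule

def retractions(rule, stream):
    # Number of times the rule abandons a previously issued conjecture.
    conjectures = [rule(stream[:n]) for n in range(len(stream) + 1)]
    return sum(1 for a, b in zip(conjectures, conjectures[1:]) if a != b)

all_green = ("green",) * 6
switch_at_4 = ("green",) * 4 + ("blue", "blue")

print(retractions(natural, all_green))     # 0
print(retractions(natural, switch_at_4))   # 1
print(retractions(grue(2), all_green))     # 1
print(retractions(grue(2), switch_at_4))   # 2
```

The natural rule never retracts more than once in any world of this toy problem, whereas each grue-style rule risks an extra retraction, which is the intuition behind its inefficiency.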
It is no accident that the same combination of epistemic values underwrites the intuitive inferences in these problems: I show that Goodman's Riddle, Occam's Razor and the theory of reactions among elementary particles share a common topological structure. Thus learning theory discovers epistemically relevant common structure among problems that look very different to unguided intuition.
These results provide reliabilists such as Hans Reichenbach (cf. Section 2.3.4) with an answer to those critics, such as [Salmon 1967], who point out that convergence in the long run does not rule out any inference in the short run. Although this observation hardly reflects badly on the aim of arriving at the truth, it shows that long-run reliability by itself does not provide the guidance for what to believe in the short run that many writers concerned with the problem of induction have sought (e.g., [Earman 1992, Ch. 9]).
I propose a new answer for the reliabilist to Salmon's challenge: if we augment the reliabilist's "pragmatic vindication" (Reichenbach's term for a means-ends justification) of reliable inference rules by taking into account other epistemic values, such as avoiding retractions and minimizing convergence time, we obtain interesting, strong constraints on what to infer in the short run. Indeed, that move gives the long-run perspective the resources to solve famous short-run puzzles such as Goodman's Riddle of Induction, puzzles that have eluded theories of induction framed from a short-run perspective.
1.3.3 New Questions and Answers
I define a set of notions of efficiency for reliable methods by combining the decision-theoretic evaluation criteria with various epistemic goals. Considering the variety of cognitive values that I examine, one may fear that the resulting theory will be a complicated catalog of means-ends analyses with no systematic relationships among different epistemic aims. On the contrary, the cognitive goals fall into a tidy feasibility hierarchy. For example, for any inductive problem in which it is possible to minimax convergence time, it is possible to attain any of the other standards of efficiency. In that sense, minimaxing convergence time is the least feasible, or most stringent, of the efficiency criteria in the hierarchy. On the other hand, applying the admissibility criterion to evaluate reliable methods with respect to convergence time yields the most feasible, or least stringent, standard of efficiency. Figure 1.3 shows the hierarchy of cognitive goals; Chapter 6 explains in detail the relationships that the figure illustrates.
The hierarchy of cognitive goals exhibits systematic dependencies among cognitive values. I also examine systematic conflicts among them. Speaking roughly and generally, the "skeptical" aims of avoiding error and avoiding retractions form a group that conflicts with the "realist" goals of converging to the truth, providing content and minimizing convergence time. The former values pull inquiry away from "inductive risks" whereas the latter require "bold generalizations". However, the relationships among these cognitive values are more subtle than this simple dichotomy. Often minimaxing retractions does conflict with minimizing convergence time, and I provide an exact characterization of the extent of the conflict. But sometimes an agent can have it all, epistemically speaking: she can reliably converge to the right answer, without unnecessary delays and without unnecessarily many retractions. In these cases the combination of reliability with minimizing convergence time and retractions has particular intuitive appeal. Cases of this kind are Goodman's Riddle, Occam's Razor and theories of particle reactions.

Figure 1.3: The Hierarchy of Cognitive Goals

In sum, systematic means-ends analysis describes precisely subtle dependencies and conflicts among epistemic aims that are new to epistemology and the theory of inductive inference.
1.3.4 Analysis of Inductive Problems from Scientific Practice
One of the goals of the philosophy of science is to illuminate methodological problems that arise in scientific practice. The two main problems of particle physics are (1) to find the set of elementary particles, and (2) to describe the possible reactions among elementary particles [Omnes 1971]. I analyze these problems from a learning-theoretic perspective. Without further background assumptions, there is no method for theorizing about the set of elementary particles that is guaranteed to arrive at the right answer, even in the limit of inquiry. In practice it is common to assume that there are only finitely many elementary particles. With that assumption, learning theory gives a positive result: there is a method for generalizing from laboratory data to the existence of elementary particles that reliably converges to positing the true set of elementary particles.
As for problem (2), I show that without the assumption that particle reactions can be described with conservation principles (again, a common assumption in practice), or another assumption of similar strength, there is no reliable method for theorizing about which reactions are possible. But given this assumption, there is an efficient algorithm for reliably identifying the set of possible reactions, that is, an algorithm that minimizes convergence time and retractions. Moreover, there are situations, which I characterize precisely, in which such a reliable efficient method has to introduce hidden particles to avoid taking back its conjectures.
These results give a precise sense in which important elements of particle physics (conservation laws and hidden particles) serve goals of inquiry.
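The linear-algebraic flavor of the conservation-law setting can be sketched in miniature. In this toy version (the three particle types and two reactions are invented for illustration), a reaction is encoded as an integer vector of net particle counts, with inputs negative and outputs positive, and a candidate quantum-number assignment q is conserved in the observed data just in case q · r = 0 for every observed reaction vector r, i.e. q lies in the null space of the reaction matrix.

```python
# Toy conservation-law inference. A reaction among particle types (a, b, c)
# is a vector of net counts: inputs count negative, outputs positive. A
# quantum-number assignment q (one integer per particle type) is conserved
# in reaction r iff the dot product q . r is zero. All reactions and names
# here are invented for illustration.
from itertools import product

observed_reactions = [
    (-1, 1, 1),   # a -> b + c
    (-2, 2, 2),   # a + a -> 2b + 2c
]

def conserved(q, reactions):
    return all(sum(qi * ri for qi, ri in zip(q, r)) == 0 for r in reactions)

# Brute-force search for small nonzero integer assignments conserved by
# every observed reaction: the null space of the reaction matrix,
# restricted to a small integer box. Here the constraint is q_a = q_b + q_c.
laws = [q for q in product(range(-2, 3), repeat=3)
        if conserved(q, observed_reactions) and any(q)]
print(laws)
```

A theory of this kind then classifies an unobserved reaction as possible only if it conserves every inferred quantity, which is how conservation principles generalize beyond the reactions actually seen.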
1.3.5 Rational Choice in Games
I rely on the well-known decision-theoretic principle of admissibility to evaluate inductive strategies. Several of my results show that for a given epistemic value, a method is admissible with respect to that value just in case it is admissible at each stage of inquiry. This connection between admissibility in the scientific "game against nature" overall and admissibility at stages of inquiry is a fundamental reason why the admissibility principle yields short-run constraints on inductive methods. I show that this connection reflects a fundamental fact about sequential games in general, not just "games against nature" (which have a rather special form): in just about any sequential game, a strategy is admissible just in case it is admissible at each stage of the game. This basic theorem of game theory suggests using the admissibility criterion to derive a powerful procedure for making predictions about strategic interactions in general, with applications in the foundations of microeconomics and political philosophy. I prove that this procedure has several attractive properties as a solution concept for games.
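The admissibility principle in strategic form can be illustrated with a minimal sketch; the 2x2 payoff matrix below is an invented example, not a game from the thesis.

```python
# Admissibility in a two-player strategic-form game: a strategy is
# inadmissible if it is weakly dominated, i.e. some alternative does at
# least as well against every opponent strategy and strictly better
# against at least one. The payoff matrix is an invented example.

# payoffs[(row, col)] = (row player's payoff, column player's payoff)
payoffs = {
    ("T", "L"): (1, 1), ("T", "R"): (1, 0),
    ("B", "L"): (1, 1), ("B", "R"): (0, 0),
}
rows, cols = ["T", "B"], ["L", "R"]

def weakly_dominated(s, own_strategies, opponents, payoff_of):
    return any(
        all(payoff_of(t, o) >= payoff_of(s, o) for o in opponents)
        and any(payoff_of(t, o) > payoff_of(s, o) for o in opponents)
        for t in own_strategies if t != s
    )

def row_pay(r, c):
    return payoffs[(r, c)][0]

admissible_rows = [r for r in rows
                   if not weakly_dominated(r, rows, cols, row_pay)]
print(admissible_rows)   # ['T']: B is weakly dominated by T
```

Here B ties T against L but does strictly worse against R, so admissibility rules B out even though B is not strictly dominated: the kind of discrimination the solution concept in Chapter 9 builds on.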
1.4 Overview
Chapter 2 presents the model of empirical inquiry that I will use throughout this dissertation to formulate and analyze inductive problems. I describe several prominent examples of inductive problems to illustrate the model, and discuss some of the epistemological assumptions that are (or are not) implicit in the model.
Chapter 3 begins the systematic investigation of optimality criteria for inductive methods, starting with content, avoiding error and avoiding mind changes. This chapter includes a critique of common proposals for "minimal theory change" and suggests an alternative.
Chapter 4 characterizes the conditions under which inductive problems have reliable solutions, that is, methods that are guaranteed to arrive at a correct answer to the problem in the long run. I describe the differences between reliable convergence in the long run as an ideal of success in inquiry and other proposals, and defend the importance of long-run convergence against various objections.
Chapter 5 describes the structure of reliable methods. I prove a normal form theorem that shows that (virtually) all reliable methods can be constructed in a certain way. This insight has applications to problems in Popper's and Levi's epistemologies. I use the normal form theorem to raise and answer an infinitary version of Gettier's paradox for Plato's account of knowledge as stable true belief.
Chapter 6 is the core chapter of this thesis. I define a set of six efficiency criteria for reliable inference, and characterize what methods are efficient in the respective senses. These results yield a solution to Goodman's Riddle of Induction, and a vindication of a variant of Occam's Razor. I show that the six efficiency criteria fall into an exact hierarchy of feasibility.
Chapter 7 extends the theory of efficiency for reliable inquiry to the task of finding correct theories for a range of logically independent phenomena under investigation. (The previous chapters examine the problem of identifying the true hypothesis from a range of mutually exclusive alternatives.)
Chapter 8 applies the machinery from Chapter 7 to analyze inductive problems that arise in particle physics. I show that conservation principles and hidden particles (or similar theoretical ingredients) are necessary for reliable, efficient inquiry. A by-product of the analysis is a striking result about the structure of conservation principles in particle physics: roughly, under conservation of energy, there cannot be more conservation principles than stable particles.
Chapter 9 proves a fundamental theorem in game theory: roughly, that a strategy is admissible for a sequential game if and only if it is admissible at each stage of the game. Several of the characterizations of admissible methods from the previous chapters depend on this fact. I apply this insight to derive a solution concept for games in general based on the admissibility principle. I establish various attractive properties of this solution concept, and develop some of the implications for the theory of rational choice in games and the foundations of microeconomics.
Each chapter begins with an outline that summarizes the main questions that I tackle in the chapter, and describes the answers. The body of the chapter gives the details, with examples, diagrams and definitions. I state and explain formal results, and outline informal but precise arguments for why they are true. The last section of each chapter contains the formal proofs.
Chapter 2
A Model of Scientific Inquiry
2.1 Outline
Isaac Levi writes:
Scientific inquiry, like other forms of human deliberation, is goal-directed activity. Consequently, an adequate conception of the goal or goals of scientific inquiry ought to shed light on the difference between valid and invalid inference; for valid inferences are good strategies designed to attain these objectives [Levi 1967, preface].
This passage suggests a three-step program for the study of scientific method.
1. Develop an adequate conception of scientific inquiry.
2. Develop an adequate conception of the goals of scientific inquiry.
3. Investigate what strategies for inquiry attain these goals.
These steps have counterparts in the way that learning theorists treat problems of induction.
1. Specify a model of the learning situation: what kind of data are available to the learner, and what kind of outputs does the learner produce?
2. Specify a criterion of success: for example, to reliably converge to the truth in the Putnam-Gold paradigm, or to produce an approximately correct hypothesis, fairly quickly and with high probability.1
3. Investigate when and how it is possible to achieve the specified kind of success.

1 This is known as the PAC paradigm; see Section 4.6.
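The three learning-theoretic steps can be made concrete in a minimal sketch. The toy problem below (identify, in the limit, the eventual constant value of a data stream assumed to stabilize) is my own illustration of the Putnam-Gold success criterion, not an example from the thesis.

```python
# Step 1 (model): data are numbers arriving one at a time; the background
# assumption is that every stream is eventually constant. The learner
# outputs a conjecture about the eventual constant after each datum.
# Step 2 (success): identification in the limit: from some point on, the
# conjecture is correct and never changes again.
# Step 3 (analysis): the "follow the last datum" method succeeds on every
# stream satisfying the background assumption.

def last_datum_method(prefix):
    return prefix[-1] if prefix else None

def identifies_in_the_limit(method, prefix, true_limit):
    # On a finite prefix that has already reached its constant tail, check
    # that the method has stabilized on the true answer.
    conjectures = [method(prefix[:n]) for n in range(1, len(prefix) + 1)]
    return any(all(c == true_limit for c in conjectures[n:])
               for n in range(len(conjectures)))

stream = [3, 1, 4, 4, 4, 4]   # a world whose observations settle on 4
print(identifies_in_the_limit(last_datum_method, stream, 4))   # True
```

The method may change its mind arbitrarily often early on; what the success criterion demands is only eventual stabilization on the truth, which is the sense of reliability studied throughout the thesis.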
Philosophers have spent a good deal of effort in weighing the virtues of one
conception of scientific inquiry and its aims against another. They have been
less thorough with the third step: to determine the best scientific methods for a
given conception of science and its aims. Learning theorists, on the other hand,
present such means-end analyses within a number of learning models. They
are not so concerned with finding "the adequate conception" of science, or induction.
What matters is whether a model captures some important aspects
of empirical inquiry. Different models describe different types of inquiry. For
example, we may study various scenarios for how the scientist gathers evidence:
by passive observation only, through experiments [Kelly 1995, Ch.14], or with
random sampling. Similarly, learning theorists investigate a number of standards
of success for inquiry, corresponding to different cognitive objectives and
the resources that the inquirer has for achieving them.
The approach of this thesis is in the spirit of learning theory. My project
is to carry out a means-ends analysis for a variety of proposals about the goals
of inquiry, all of which are interesting, but none of which I take to be the only
adequate one. I describe a model of the scientist's situation (step 1), but I do
not claim that it is the only adequate description. Rather, this model has three
virtues that recommend it as a basis for my project: (1) it is conceptually and
formally relatively simple, (2) it allows me to formulate a number of problems of
induction clearly, and (3) it fits certain parts of scientific practice (drawn from
particle physics) in a natural way. The model is Kelly's extension [Kelly 1995,
Ch.3] of the Putnam-Gold framework. This chapter describes Kelly's model,
and illustrates how useful and flexible it is for formulating inductive problems
precisely. I conclude with a discussion of how my conception of scientific inquiry
relates to other proposals in the philosophical literature.
2.2 A Model of Scientific Inquiry
Empirical inquiry begins with uncertainty about the world. A possible world
w completely specifies the facts of relevance to the inquirer. For example, if the
researcher is interested in the colour of swans, a possible world determines the
colour of each swan in that world. The set of possible worlds W comprises
all possible descriptions of the way things (those things that the researcher is
interested in) might be.
At the beginning of inquiry, the inquirer is uncertain about which of the
possible worlds is the actual one. I identify a proposition P with a set of
possible worlds (namely those at which the proposition is true). Figure 2.1
illustrates these notions.
A scientist collects evidence to find out whether one or more propositions
of interest, called hypotheses, are true in the actual world. As inquiry proceeds, one
Figure 2.1: Possible Worlds and Propositions
piece of evidence after another is collected; if inquiry continues indefinitely, an
infinite sequence ε₁, ε₂, ..., εₙ, ... of evidence items is obtained. I refer to such
sequences as data streams. Figure 2.2 shows a generic data stream.
The scientist may be uncertain about what will be observed in a given world.
In a world in which all swans are black, she may suppose that only black swans
will be reported; in a world in which one swan is black and all others white, she
may expect to observe a black swan exactly once (see Figure 2.3).
I represent the inquirer's beliefs about what may be observed in a given world
by a relation Gen(ε, w) (read: "in world w, data stream ε may be generated in
inquiry").² The set of evidence items E contains all pieces of evidence that
appear at some time in some world; formally, E = ⋃_{w∈W} {range(ε) : Gen(ε, w)
holds}, where range(ε) is the set of evidence items that occur along data stream
ε. A data stream is an infinite sequence of discrete observations drawn from
E; I denote the set of all data streams by E^ω. The empirical content of a
proposition P is the set of all data streams that the scientist regards as consistent
with P; formally, the empirical content of P is the set of data streams
⋃_{w∈P} {ε : Gen(ε, w)}.
If two different possible worlds can generate the same data stream (that is, if
Gen(·, ·) is not a function), even the total infinite amount of evidence may not
settle the hypotheses under investigation. For example, if the scientist considers
it possible that a black swan may exist without ever being found, he concedes
that only white swans may be observed even if there in fact is a black swan. If
this is the case, philosophers say that the hypothesis "there is a black swan" is
globally underdetermined (see Figure 2.4).
Global underdetermination gives rise to a number of interesting issues in
methodology (cf. [Kelly 1995, Ch.13, Ch.15]). But for studying how the ends
of inquiry control the legitimacy of inferences, it is better to avoid the complications
that result when the possibility of global underdetermination is raised.
If we make the simplifying assumption that global underdetermination does not
arise (such that, for example, a black swan exists if and only if some black
swan is observed), we may consider the empirical content of the hypotheses under
investigation instead of the hypotheses directly. Accordingly, I assume that
the hypotheses under investigation are given as empirical hypotheses, that
is, as sets of data streams.³ In general, an empirical proposition is a set of
data streams, and so is an empirical theory (which we may think of as the
conjunction, i.e. intersection, of a set of empirical propositions). I usually denote
an empirical hypothesis by H, and a collection of empirical hypotheses by ℋ.
²This relation specifies what [Kelly 1995, Ch.3] calls "the data protocol".
³Even if we grant the possibility of global underdetermination, investigating the empirical
content of hypotheses instead of the hypotheses directly is justified if the scientist is satisfied
with an empirically adequate theory. For example, the hypothesis that all swans are white is
false in the world in which there is a black swan, but nonetheless empirically adequate if no
black swan is ever observed. [Van Fraassen 1980] makes a case that the goal of science should
be construed as finding empirically adequate theories, and nothing more.
Figure 2.2: Data Stream ε
Figure 2.3: The Observations that may arise in a given World
Figure 2.4: Global Underdetermination of the question "Is there a black swan?"
An empirical hypothesis H is true (or: correct) on an infinite data sequence ε if
ε ∈ H. The complement (or negation) of an empirical hypothesis H is the set
of data streams on which H is false, that is, E^ω − H. I denote the complement of
H by H̄. An empirical proposition P entails another empirical proposition P′
just in case P′ is true whenever P is; formally, P ⊨ P′ ⟺ P ⊆ P′. Similarly,
two empirical propositions P₁ and P₂ entail P′ if P′ is true whenever P₁ and
P₂ are; hence I define P₁, P₂ ⊨ P′ ⟺ P₁ ∩ P₂ ⊆ P′. An empirical proposition
P is consistent with P′ if on some data stream, P and P′ are both true, that
is, if P ∩ P′ ≠ ∅. I say that a finite data sequence e is consistent with a data
stream ε just in case ε extends e (written ε ⊃ e). The concatenation of two
finite data sequences e and e′ is denoted by e∗e′; similarly I write e∗ε and e∗x
for the concatenation of e with a data stream ε, and with a datum x ∈ E. A
finite data sequence e corresponds to the empirical proposition that some data
stream consistent with e is the one obtained in inquiry. I write this proposition
as [e]; so [e] = {ε : ε ⊃ e}. The length of a finite data sequence e is the number
of items in e, and is denoted by lh(e). The finite initial sequence of a data
stream ε of length n, namely ε₁, ε₂, ..., εₙ, is written as ε|n. Figures 2.2 and
2.5 illustrate these notions.
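The finite-sequence notation is easy to render concretely. In the following sketch (my own illustration, not part of the thesis), a finite data sequence is a Python tuple and a data stream is a function from times 1, 2, ... to evidence items:

```python
def lh(e):
    """lh(e): the length of a finite data sequence e."""
    return len(e)

def cat(e, e2):
    """The concatenation e * e' of two finite data sequences."""
    return tuple(e) + tuple(e2)

def initial_segment(stream, n):
    """epsilon|n: the first n items of a data stream, given as a function
    from times 1, 2, ... to evidence items."""
    return tuple(stream(k) for k in range(1, n + 1))

def extends(stream, e):
    """Test whether the data stream extends e, i.e. whether it lies in the
    proposition [e] = {epsilon : epsilon extends e}."""
    return initial_segment(stream, len(e)) == tuple(e)
```

For instance, the everywhere-0 stream `lambda k: 0` lies in [(0, 0)] but not in [(0, 1)].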
A scientific, or inductive, method δ takes as input a finite data sequence,
and produces an empirical theory; see Figure 2.6.⁴
This definition does not allow agents to distinguish between "serious possibilities",
"epistemically necessary" propositions and plain "belief", or "background
assumptions" and "conjectures". In Section 2.4 I discuss different epistemic attitudes
that a scientist may have towards a proposition. Eventually I shall
adopt a model of inquiry in which the scientist's background knowledge, or her
"standard of serious possibility", remains essentially the same through time (see
Section 2.5). But before we engage the general and abstract philosophical issues
involved in the conception of belief, it is best to complete the exposition of my
basic model of scientific inquiry by going through a number of examples. To
accommodate the scientist's standard of "serious possibility", I split the scientist's
theory into two parts, K and T, where T must entail K. We may think of K
as the scientist's background knowledge and of T as the scientist's conjecture.
In Section 2.4, I discuss how one may interpret this kind of theory in terms of
the scientist's epistemic attitudes.
For the formal definition of such a scientific method, let E* denote the set of
finite sequences of evidence items drawn from E. A scientific method δ assigns
to each finite data sequence e ∈ E* a pair of empirical theories (K, T) that
⁴Learning theory is flexible about what exactly it is that empirical methods produce.
Taking the outputs of an empirical method to be an empirical theory is a simple yet widely
applicable model that serves my purposes best. However, the outputs of a method could also
be theories containing theoretical terms (in Chapter 8, scientific methods introduce hidden
particles), a scientist's "practices" [Kitcher 1993, Ch.3], or a grammar for a language, as in
applications of learning theory to modelling language learning (e.g., [Osherson et al. 1986]).
See also Section 2.4 below.
Figure 2.5: Empirical Hypotheses and Entailment
Figure 2.6: An Inductive Method
satisfy T ⊨ K.
2.3 Examples of Inductive Problems and Scientific Methods
The main purpose of the following examples is to show how the concepts just
presented apply to familiar problems of empirical inquiry. These examples will
serve as illustrations of methodological points throughout this thesis. Another
purpose is to introduce the concept of a reliable method by way of examples.
2.3.1 Universal Generalizations
Suppose we are interested in a universal generalization such as "all swans are
white". A hypothetical ornithologist may investigate this hypothesis by examining
one swan after another. Let us assume that the ornithologist divides the
colour spectrum into discrete colours (such as white, black, gray, blue, green)
numbered 0, 1, 2, .... Then the evidence items are reports of the form "this swan
has colour n", which we may simply encode by the number n; let 0 encode "this
swan is white" and 1 "this swan is black". A data stream is a sequence of natural
numbers, representing the observed colours. If no global underdetermination
obtains, the hypothesis that all swans are white is true just in case all observed
swans are white; that is, just in case the data stream produced in inquiry is an
infinite sequence of 0s. The empirical content of the hypothesis H, "all swans
are white", is the singleton containing the everywhere-0 data stream; formally,
H = {0^ω}. The complement H̄ of H, "not all swans are white", is the set of all
data streams with some observation other than 0; formally, H̄ = {ε ∈ ℕ^ω : for
some time k, εₖ ≠ 0}. The scientist may initially be convinced that only white
and black swans will be observed. That is, his background knowledge K is the
set {0,1}^ω of all data streams featuring 0s and 1s. Figure 2.7 illustrates this
situation.
His initial conjecture, before seeing any data, might be that all swans are
white. In my notation, the scientist's first theory is δ(∅) = (K, H). Assuming
Figure 2.7: A Universal Generalization
that only white and black swans are observed, a complete rule for inductive
inference in this problem is the following.
1. Expand background knowledge by adding the data to the original assumption
that only white and black swans will be observed.
2. If all observed swans are white, conjecture "all swans are white";
3. otherwise, let the current conjecture be just the current background knowledge
(which by clause 1 entails that not all swans are white).
This procedure is defined as follows for all finite data sequences e consistent
with K.
1. If all observed swans are white, that is, if range(e) = {0}, then δ(e) = (K ∩ [e], H).
2. Else if some black swan appears along e, that is, if 1 ∈ range(e), then δ(e) = (K ∩ [e], K ∩ [e]).
This method has the property that it eventually settles on the correct truth-value
for H on all data streams consistent with background knowledge K. If H
is true, that is, if all swans are white, δ makes the right conjecture from the start. If
H is false, eventually a black swan is observed, and δ becomes certain that H is
false. When a method is guaranteed to eventually settle on the right answer no
matter what the right answer is, I say that the method is reliable. Although
a reliable method is guaranteed to converge to the truth, we may never know
when the method has done so. In the current example, if all swans are white, δ
converges to the right answer, but at any time δ might change its mind if a black
swan appears. Another way of putting the matter is that a reliable method will
eventually give the right answer, but not with certainty.
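As a concrete illustration (my own sketch, not part of the thesis), the reliable method just described can be written as a short program, with evidence encoded as above (0 = white swan, 1 = black swan) and theories returned as symbolic labels:

```python
def delta(e):
    """The swan method: map a finite data sequence e to a pair
    (background knowledge, conjecture), both given as labels.
    Background knowledge is K = {0,1}^omega updated with the data."""
    K = "K & [e]"
    if all(x == 0 for x in e):      # only white swans observed so far
        return (K, "all swans are white")
    return (K, "not all swans are white")

print(delta(()))         # initial conjecture: all swans are white
print(delta((0, 0, 0)))  # still projects the generalization
print(delta((0, 1, 0)))  # a black swan settles the question for good
```

Once a 1 appears anywhere in the data, no extension of the sequence can restore the conjecture H, mirroring the fact that δ becomes certain that H is false.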
2.3.2 Almost Universal Generalizations
Consider again the birds from the previous example, but this time with hypotheses
that allow finitely many exceptions to a rule about the colour of swans. The
hypothesis "almost all swans are black" is true if only finitely many non-black
swans are observed, and similarly "almost all swans are white" is true if only
finitely many non-white swans are observed (see Figure 2.8).
In terms of our previous encoding, "almost all swans are white" is correct
on a data stream ε if ε stabilizes to 0, and "almost all swans are black" is true on ε if ε
stabilizes to 1. An interesting rule for inductive inference in this problem is to
conjecture that all future swans have the same colour as the last one.
1. Begin with background knowledge K = {0,1}^ω as in the previous example,
and conjecture that all swans are white.
Figure 2.8: Almost Universal Generalizations with Finitely Many Exceptions
2. Expand background knowledge by adding the data to the original assumption
that only white and black swans will be observed.
3. If the last of n observed swans is white, conjecture "almost all swans are white";
4. otherwise, conjecture "almost all swans are black".
In our formal notation, we may render this inference rule as follows. Let
Hw be the empirical content of the hypothesis that all but finitely many swans are white.
That is, ε ∈ Hw ⟺ there are only finitely many times n such that εₙ ≠ 0.
Similarly, the empirical content of the hypothesis that almost all swans are black
is denoted by Hb. So ε ∈ Hb ⟺ there are only finitely many times n such
that εₙ ≠ 1. Our inference rule is:
1. δ(∅) = (K, Hw).
2. Let lh(e) = n > 0. If the last observed swan is white, that is, if eₙ = 0,
then δ(e) = (K ∩ [e], Hw ∩ [e]);
3. if the last observed swan is black, that is, if eₙ = 1, then δ(e) = (K ∩ [e], Hb ∩ [e]).
If one of the almost universal generalizations is true, then δ eventually settles
on the right one. For example, if almost all swans are white, then after some
finite time, only white swans appear, and δ converges to Hw. However, both
almost universal generalizations might be false, namely if there are infinitely many
white swans and infinitely many black swans. In that case δ goes back and
forth between the two possibilities, without ever ruling out both of them. Thus
δ does not reliably settle on a true theory given K. On the other hand, if we
assume that one of the almost universal generalizations is true, then δ is reliable with
respect to this assumption. Formally, if K′ = Hw ∪ Hb, then δ is reliable given
K′. This illustrates how the reliability of a method depends on the scientist's
background knowledge.
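A sketch of this rule in code (again my own illustration): the conjecture tracks the colour of the last observed swan, so on a stream that alternates 0, 1, 0, 1, ... the method flips between Hw and Hb forever.

```python
def delta(e):
    """Conjecture the almost-universal generalization matching the colour
    of the last observed swan (0 = white, 1 = black); before any data,
    conjecture that almost all swans are white."""
    if len(e) == 0 or e[-1] == 0:
        return ("K & [e]", "Hw: almost all swans are white")
    return ("K & [e]", "Hb: almost all swans are black")

# On an alternating data stream the method never stabilizes:
for n in range(1, 5):
    e = tuple(k % 2 for k in range(n))   # 0, then 0,1, then 0,1,0, ...
    print(delta(e)[1])
```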
2.3.3 Goodman's Riddle of Induction
In his \New Riddle of Induction", Nelson Goodman introduces an unusual color
predicate for emeralds [Goodman 1983].
Suppose that all emeralds examined before a certain time t are green
. . . Our evidence statements assert that emerald a is green, that
emerald b is green, and so on . . .
Now let me introduce another predicate less familiar than "green".
It is the predicate "grue" and it applies to all things examined before
t just in case they are green but to other things just in case they are
blue. Then at time t we have, for each evidence statement asserting
that a given emerald is green, a parallel evidence statement asserting
that that emerald is grue.
It is natural to consider not just one "grue" predicate, but a family of them,
one for each critical time t. I will model the Riddle of Induction as the problem of
finding the colour predicate that correctly classifies all emeralds. To describe the
range of alternative hypotheses (universal generalizations of colour predicates)
it is convenient to assume, as in [Salmon 1963], that the emeralds are examined
in a fixed order, so that we may denote the emerald examined at time 1 by 1
(rather than a), the one examined at time 2 by 2 (rather than b), etc. Then we
can define

x is grue(n) ⟺ x ≤ n and x is green, or x > n and x is blue

and

x is bleen(n) ⟺ x ≤ n and x is blue, or x > n and x is green.
As Goodman noted, green can be defined from grue(n) and bleen(n), for
any n:

x is green ⟺ x ≤ n and x is grue(n), or x > n and x is bleen(n).

Similarly for blue.
The hypotheses of interest in the riddle of induction are the universal generalizations
of these predicates, which I denote by H_green, H_blue, H_grue(n), H_bleen(n).
Each sequence of emeralds satisfies at most one of these universals (regardless
of which colour predicates are used to report the colour of the individual
emeralds). In the green-blue reference frame, we can diagram the empirical
content of some of the universal hypotheses as in Figure 2.9.
Now for inference rules, or as Goodman calls them, projection rules. The
scientist's initial background assumption K is that either all emeralds are green,
blue, grue(n), or bleen(n), for some n. Formally, if we let ℋ be the collection
of the universal hypotheses that are candidates for projection, K = ⋃ℋ. The
natural projection rule is the following, for all finite data sequences consistent
with the background knowledge.
Figure 2.9: The New Riddle of Induction
1. If n green emeralds (and no others) have been observed, project that all emeralds are green.
2. If n green emeralds have been observed and the (n+1)-th emerald is blue,
project that all emeralds are grue(n).
3. If n blue emeralds (and no others) have been observed, project that all emeralds are
blue (where n > 0).
4. If n blue emeralds have been observed and the (n+1)-th emerald is green,
project that all emeralds are bleen(n).
In our notation, we may describe the natural projection rule as follows, for
all finite data sequences e consistent with K.
1. δ(∅) = (K, H_green).
2. If e is consistent with H_green, δ(e) = (K ∩ [e], H_green).
3. If e is consistent with H_blue, δ(e) = (K ∩ [e], H_blue).
4. Otherwise δ(e) = (K ∩ [e], K ∩ [e]).
It is easy to see that the natural projection rule is reliable given K. So
are many other projection rules, for example: project grue(100) if fewer than 100
green emeralds have been observed; project green if 100 or more green emeralds
have been observed; and otherwise project the colour predicate that is consistent
with the evidence (there is only one). On the other hand, a "grue" aficionado,
who projects nothing but predicates of the form grue(n), would fail to identify
the correct colour predicate if all emeralds are green. In Chapter 6, I show that
among reliable projection rules, the natural one is the best for converging to the
true colour predicate quickly while avoiding unnecessarily many retractions.
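The natural projection rule is easy to program. In this sketch (mine, not the thesis's), the evidence is a tuple of reported colours for emeralds 1, 2, ..., assumed consistent with K, so a sequence that is not uniformly green or uniformly blue shows exactly one colour switch:

```python
def natural_projection(e):
    """Return the universal colour generalization favoured by the natural
    projection rule, given the colours of emeralds 1..len(e)."""
    if all(c == "green" for c in e):
        return "all emeralds are green"
    if all(c == "blue" for c in e):
        return "all emeralds are blue"
    # e is consistent with K, so the colour switches exactly once;
    # n counts the emeralds observed before the switch
    n = next(i for i in range(1, len(e)) if e[i] != e[i - 1])
    if e[0] == "green":
        return f"all emeralds are grue({n})"
    return f"all emeralds are bleen({n})"
```

For example, two green emeralds followed by a blue one yield the projection grue(2); note that on empty evidence the first clause fires vacuously, matching δ(∅).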
2.3.4 Identifying Limiting Relative Frequencies
Hans Reichenbach sought to reduce all inductive inference to estimates of probabilities
[Reichenbach 1949]. Reichenbach subscribed to the frequentist interpretation
of probability. According to frequentists, the statement that "the
probability of this coin coming up heads is p" means that in an infinite random
sequence of coin tosses, the rate of heads comes closer and closer to p as more
and more tosses are observed [Von Mises 1981]. To make this idea precise, let
e be a finite (non-empty) sequence of coin tosses. The relative frequency of
heads in e is the number of heads occurring in e divided by the total number of
tosses in e. Given an infinite sequence ε of coin tosses, the limiting frequency
of heads in ε is (for example) 1/2 just in case for every ratio r different from
1/2, eventually the relative frequency of heads in the finite initial sequences
of ε is always closer to 1/2 than to r; see Figure 2.10.
[Figure 2.10 plots the relative frequency of heads in the sample (vertical axis, from 0 to 1) against the number of tosses (horizontal axis); the curve oscillates and converges to 1/2.]
Figure 2.10: The limit of the relative frequencies is 1/2.
The hypothesis "the limiting frequency of heads is 1/2" is true on an infinite
sequence of coin tosses ε just in case the limiting frequency of heads in ε is 1/2.
Reichenbach proposed to go about identifying the limiting frequency of an event
from short-run frequency data in this way: posit that the probability of the event
in question is m/n if the event occurs m times in n trials. He called this inference
rule the straight rule. Reichenbach noted that if the observed frequencies of
the event of interest converge to a limit p, the straight rule will come arbitrarily
close to p. That is, for any degree of approximation r, eventually the posits of
the straight rule are always within |p − r| of p. Reichenbach viewed this fact as
a "pragmatic vindication" of the straight rule.
To formulate a version of Reichenbach's problem and his solution in our
model, let us consider the problem of finding the probability of a coin coming
up heads. There are two evidence statements: "the coin shows heads", encoded
by 0, and "the coin shows tails", encoded by 1. The possible data streams
are the infinite sequences of 0s and 1s, that is, the set {0,1}^ω. Reichenbach's
assumption that the limiting relative frequency of heads exists on the actual
data stream is represented by the background knowledge K = {ε ∈ {0,1}^ω : for
some p between 0 and 1, the limiting frequency of heads in ε is p}. The straight
rule is the following method, defined for all finite sequences of 0s and 1s:
1. δ(∅) = (K, 1/2), where 1/2 is just an arbitrary guess for the probability
of heads before any trials are made.
2. δ(e) = (K ∩ [e], m/n), where n = lh(e) > 0, and m is the number of 0s
(heads) occurring in e.
The straight rule does not reliably stabilize to the true limiting
frequency of heads, but it approaches the limiting frequency to an arbitrary
degree. Following Kelly's terminology [Kelly 1995, Ch.9], I say that the straight
rule gradually identifies the true limiting relative frequency of heads (assuming
the limiting relative frequency of heads exists, that is, if K is true). If we assume
as background knowledge that the limiting relative frequency exists and is either
1/4 or 3/4, then there is a reliable method for identifying which of these two
alternatives is true: given a sequence of coin tosses e, if the relative frequency
of heads in e is closer to 1/4 than to 3/4, conjecture "the limiting relative frequency
is 1/4"; otherwise conjecture that it is 3/4. By the definition of limiting
relative frequency, eventually the observed frequency will always differ from the
true one by less than 1/4; after that point, this procedure stabilizes to the true
limiting relative frequency. In general, it is possible to reliably identify which of
a number of possible limiting relative frequencies is the true one if the number
of alternatives is finite.
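Both the straight rule and the two-alternative method lend themselves to a short sketch (my own illustration, with heads encoded by 0 as above):

```python
from fractions import Fraction

def straight_rule(e):
    """Posit the observed relative frequency of heads; 1/2 before any data."""
    if len(e) == 0:
        return Fraction(1, 2)
    return Fraction(e.count(0), len(e))

def quarter_rule(e):
    """Reliable method under the background assumption that the limiting
    relative frequency is either 1/4 or 3/4: conjecture whichever the
    sample frequency is closer to (ties go to 3/4)."""
    f = straight_rule(e)
    if abs(f - Fraction(1, 4)) < abs(f - Fraction(3, 4)):
        return Fraction(1, 4)
    return Fraction(3, 4)
```

straight_rule only approaches the true limit gradually, while quarter_rule never changes its conjecture again once the sample frequency stays within 1/4 of the true limit.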
2.3.5 Cognitive Science and the Physical Symbol System Hypothesis
Newell and Simon raised the conjecture that any intelligent behavior can be produced
by a "physical symbol system", essentially, a computer [Newell and Simon 1976].
This claim was part of their "physical symbol system hypothesis". (The other
part is that intelligence requires the capacities of a physical symbol system.)
They emphasize that this conjecture is not a philosophical claim about the concept
of intelligence, but rather an empirical hypothesis. A simple-minded way
in which we might go about testing the physical symbol system hypothesis (cf.
[Kelly 1995, Ch.7]) is this: select some general task that requires intelligence,
and see if there is a computer program that solves a sequence of instances of the
task. For definiteness, we may ask the mechanical candidates for intelligence
to judge the grammaticality of sentences in English. Choose some means for
producing an unbounded sequence of sentences with English words; for example,
we may take the first sentence s of the editorial in the day's New York
Times, together with the result of adding "not" at the beginning of s. The
computer has to classify these sentences as grammatically correct or not. The
computer's answer counts as correct if it agrees with the judgments of competent
speakers of English (if competent speakers of English disagree about whether
a given sentence is grammatical or not, we shall discard the sentence). For
any finite number of sentences, it is trivial to find a physical symbol system (a
computer) that correctly classifies these sentences: we can employ a look-up
table, or "hardcode" the right answers, as programmers say. The interesting
question is whether a computer can give all of the right answers.
Formally, we have an infinite sequence s₁, s₂, ... of sentences with English
words, and evidence items of the form ⟨sₖ, b⟩, where b is 0 if sₖ is ungrammatical,
1 if sₖ is grammatical. Let K be the set of infinite sequences of such evidence
items. The physical symbol system hypothesis, denoted by H_PSS, is correct
(as far as our evidence goes) on a data stream ε in K just in case there is
a computer program M such that for all k, M outputs "grammatical" on sₖ
if εₖ = ⟨sₖ, 1⟩, and outputs "ungrammatical" on sₖ if εₖ = ⟨sₖ, 0⟩. A firm
believer in cognitive science would conjecture that H_PSS is correct no matter
what evidence is observed. In that case, δ(e) = (K, H_PSS) for all finite data
sequences e. This procedure is unreliable because it gives the wrong answer if
H_PSS is false.
A more open-minded researcher might proceed as follows. Start with a
promising system m₁, and conjecture that the physical symbol system hypothesis
is correct as long as m₁ performs correctly. If m₁ fails, the researcher
conjectures that the physical symbol system hypothesis is false, and tries other
programs m₂, m₃, .... He continues to conjecture that H_PSS is false until one of
these programs mₙ gives the correct answer on all the sentences to be classified.
Then he conjectures that H_PSS is true as long as mₙ performs correctly, and
so on.
To describe this procedure in our formalism, let m₁, m₂, ..., mₙ, ... be a sequence
which enumerates the programs that our researcher considers candidates
for artificial intelligence. I introduce a "pointer" to keep track of which programs
have failed at a given stage (cf. [Kelly 1995, Ch.9] and Section 5.2 below). A finite
sequence of sentence classifications ⟨s₁, b₁⟩, ⟨s₂, b₂⟩, ..., ⟨sₙ, bₙ⟩ is consistent
with a computer program M if for all sᵢ, M halts on sᵢ and outputs bᵢ. The
following research method δ models our cognitive scientist.
1. pointer(∅) = m₁; δ(∅) = (K, H_PSS).
2. Let pointer(e) = mₙ. If mₙ is consistent with e∗x, then pointer(e∗x) =
mₙ, and δ(e∗x) = (K, H_PSS). Otherwise pointer(e∗x) = mₙ₊₁, and
δ(e∗x) = (K, H̄_PSS).
We may assume that if there is a program that masters the sentence classification
task, it is included in the enumeration of candidates for artificial intelligence,
m₁, m₂, ... (for a logically omniscient inquirer, this assumption is of
no consequence because she can enumerate all computer programs, intelligent
or not). Then if H_PSS is true, δ will eventually try a program that masters
the sentence classification task, and settle on the conjecture that H_PSS is true.
If there is no program that classifies all sentences correctly, δ will typically go
through the following pattern: the first programs that δ tries fail immediately
on the test sentences, and δ conjectures that H_PSS is false. After a while, δ
finds a program M that classifies the test sentences correctly for some time, and
conjectures that H_PSS is true. Eventually M will make a mistake, δ conjectures
that H_PSS is false, tries some more programs, and so forth. Thus δ goes back
and forth between "the physical symbol system hypothesis is true" and "the
physical symbol system hypothesis is false". It is also possible that all of δ's
programs fail on the available test sentences as soon as δ tries them. In that
case, δ would, correctly, stabilize to the conjecture that H_PSS is false. To sum
up: if H_PSS is correct, δ eventually settles on the correct conjecture, 1. If H_PSS
is false, δ may stabilize to 0, but will not stabilize to 1. In Kelly's terminology, δ
reliably verifies H_PSS in the limit given K [Kelly 1995, Ch.4] (see Section 4.4
below).
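A sketch of the pointer method (my own rendering; candidate programs are modelled as Python callables from sentences to 0/1 rather than arbitrary Turing machines, so the halting issue is set aside):

```python
def consistent(m, e):
    """Is program m consistent with the classifications in e?"""
    return all(m(s) == b for (s, b) in e)

def make_method(candidates):
    """Build the cognitive scientist's method from an enumeration
    m1, m2, ... of candidate programs (a finite list in this sketch)."""
    def delta(e):
        pointer, verdict = 0, True       # initially conjecture H_PSS
        # replay the data, moving the pointer past each refuted program
        for t in range(1, len(e) + 1):
            if consistent(candidates[pointer], e[:t]):
                verdict = True           # current candidate still alive
            else:
                pointer = min(pointer + 1, len(candidates) - 1)
                verdict = False
        return ("K", "H_PSS" if verdict else "not H_PSS")
    return delta
```

With candidates = [lambda s: 0, lambda s: 1] and a data stream labelling every sentence grammatical, the method rejects the first program on the first datum, then settles on H_PSS while the second keeps performing.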
2.3.6 Theories of Particle Physics
Particle physics aims to discover what elementary particles exist, how they
decay and how they react with each other. With regard to finding out what
particles exist, let us take as our data annual reports from particle physicists
as to whether they have discovered a new elementary particle or not. In Figure
2.11, 0 encodes "no new particles this year", and 1 encodes "a new particle has
been discovered".
A particle physicist may be interested in determining the exact number of
elementary particles; let Hₙ denote the hypothesis that there are exactly n
Figure 2.11: How many particles are there?
particles. Assuming that global underdetermination does not arise, Hₙ is true
just in case exactly n particles are observed. In our representation, Hₙ is correct
on those data streams that feature exactly n 1s; that is, Hₙ = {ε : exactly n 1s
appear along ε}. One rule of inference for this problem is to conjecture at each
stage that the particles discovered so far are all that there are. If background
knowledge K is consistent with any number of particles being discovered in any
order, that is, if K = {0,1}^ω, the corresponding method is this:
1. δ(∅) = (K, H₀).
2. δ(e) = (K ∩ [e], Hₙ), where n is the number of 1s occurring in e, that is,
the number of discovered particles.
If there are only finitely many particles, δ eventually identifies their correct
number. But if there are infinitely many, δ fails to identify this fact. Thus δ is
not reliable given K, but δ is reliable given K′ = ⋃ₙ Hₙ.
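The particle-counting method is nearly a one-liner in this representation (my own sketch; annual reports encoded as 0/1 as in Figure 2.11):

```python
def delta(e):
    """Conjecture H_n, where n is the number of particles discovered
    (1-reports) in the finite data sequence e."""
    n = sum(e)
    return ("K & [e]", f"there are exactly {n} particles")

print(delta(()))                 # H_0 before any reports
print(delta((0, 1, 0, 1, 1)))    # three discoveries so far
```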
Chapter 8 examines the more complicated problem of inferring how elementary particles react.
There are many more examples of learning-theoretic models for empirical
problems. Glymour uses Kelly's framework to analyze the scope of methods
for neuropsychology [Glymour 1994]. Kelly examines Kant's question about
the divisibility of matter [Kelly 1995, Ch.3]. Learning theory has been applied
extensively for investigating language acquisition [Osherson et al. 1986].
In the next section, I compare my model of scientific inquiry with other
conceptions from the philosophical literature.
2.4 Inquiry, Belief and Action
Four questions form the core of the debate about conceptions of scientific inquiry:
(1) what is the scientist's attitude towards his evidence? (2) what does
inquiry produce? (3) what is the scientist's attitude towards the results of his
inquiry? (4) how do the scientist's actions depend on (1)-(3)? I will briefly
review the most common answers to these questions, and then discuss what
epistemological interpretations we can give to the formal model of scientific
inquiry from Section 2.2.
What is the scientist's attitude towards the evidence? Many writers suppose
that the inquirer is certain that his evidence is correct; for example, in
"Gambling With Truth" Levi writes:
To accept H as evidence is not merely to accept H as true but to
regard as pointless further evidence collection in order to check H.
[Levi 1967, p.149]
Karl Popper, by contrast, viewed scientists as accepting evidence statements only provisionally; he pointed out that if a scientist fails to replicate a phenomenon in question, he may reject the reports of the phenomenon as spurious [Popper 1968]. My model can interpret both views. For example, if the available evidence e reports ten white swans, a method δ might accept [e] as evidence in Levi's sense, such that δ(e) = (K ∩ [e], T), for some theory T. Or δ may only provisionally accept [e], as Popper suggested, such that δ(e) = (K, T ∩ [e]). Or δ may provisionally accept part of e but not all of it. Note that for Levi, as for me, statements that are accepted as evidence need not come from some distinguished 'observation language'. What matters is not the form or the source of the evidence statement, but the inquirer's attitude towards it: that he will assume the evidence statement in further inquiry (cf. [Levi 1967, p.28]). So Levi allows that a scientist might accept a statement such as H = "all swans are white" as evidence when, say, 100 swans are all observed to be white. In my model, this means that the scientist includes H in her background knowledge, such that δ(e) = (H, H) on a sample e of 100 white swans.
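In a toy version of the model, the difference between the two attitudes is just a question of which component of the output absorbs [e]. All encodings and names below are hypothetical (worlds are colour sequences of a fixed length, 'w' for white, 'b' for black):

```python
from itertools import product

WORLDS = set(product('wb', repeat=3))                # toy set of possible worlds

def ext(e):
    """[e]: the worlds that extend the finite data sequence e."""
    return {w for w in WORLDS if w[:len(e)] == tuple(e)}

K = WORLDS                                           # trivial background knowledge
T = {w for w in WORLDS if all(c == 'w' for c in w)}  # "all swans are white"

e = ('w', 'w')                                       # two white swans observed
levi   = (K & ext(e), T)   # evidence absorbed into background knowledge
popper = (K, T & ext(e))   # evidence held only provisionally, in the conjecture

print(levi[0] == ext(e))   # True: Levi's inquirer can no longer doubt e
print(popper[0] == K)      # True: Popper's inquirer keeps K open to revision
```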
What are the products of inquiry? There are sundry proposals for what the outputs of scientific methods are: "conjectures", "hypotheses" (Popper), "acceptance as evidence", "acceptance as true" (early Levi), "posits" (Reichenbach), "inferences", "estimates" (Hacking), "degrees of confirmation" (Carnap), "degrees of belief" (Bayesians). Many writers see two epistemic attitudes in scientific inquiry that correspond to the components of the output of scientific methods in my model. In "Gambling With Truth", Levi calls these "acceptance as evidence" and "acceptance as truth" [Levi 1967]; Kelly refers to "background knowledge" and "conjectures" [Kelly 1995]; Kitcher distinguishes between "endorsing" and "entertaining" a hypothesis [Kitcher 1993, p.65]; in a discussion of Reichenbach's straight rule, Hacking speaks of "inferences" and "estimates":

Adam has made 20 tosses, say HHHTH..., giving him 17 H, 3 T. Now from this data Adam, you and I would cheerfully infer (state categorically in light of the data, though we may be wrong) that the chance of heads exceeds 1/10. ... I estimate that I shall use 63 board feet for my new fence, and order accordingly. I never need to infer, to state categorically on the data, that I shall use exactly 63 feet. [Hacking 1968], my emphasis.
If we regard the distinction between "acceptance as evidence" and "acceptance as truth" as irrelevant or, like the later Levi, as indefensible [Levi 1980], we can adopt methods that produce theories with only one component. As for degrees of belief, we could extend Kelly's model such that scientific methods produce a probability distribution over an algebra of propositions (which would include all hypotheses of interest). If we further stipulate that the inquirer update his background knowledge K simply by adding the new evidence e, such that the new background knowledge is K ∩ [e], we obtain the model that Bayesians use [Earman 1992]. However, that school requires agents to update their degrees of belief by Bayesian conditioning, whereas for learning theorists conditioning is just one of many possible methods for changing degrees of belief.
What is the scientist's attitude towards the results of his inquiry?

A natural interpretation of my model is that the scientist is certain of his background assumptions and believes his conjecture, which corresponds to "acceptance as evidence" and "acceptance as true", or perhaps "inference" and "estimate" in Hacking's terms. If methods produce probability distributions, we may interpret these as the scientist's "degrees of belief". Some writers say that the study of empirical methods does not require consideration of belief or other epistemic attitudes at all.
I used to take pride in the fact that I am not a belief philosopher: I am primarily interested in ideas, in theories, and I find it comparatively unimportant whether or not anybody 'believes' in them. [Popper 1972, p.25]
In a similar vein, Reichenbach referred to estimates of long-run frequencies as "posits", a neutral term that leaves open what the inquirer believes about her posits (cf. [Salmon 1991, p.116]).
How do the scientist's theories relate to his actions?

On a traditional view, scientific method guides the inquirer in what to believe, and practical action relies on these beliefs. For example, Popper says that "we should prefer as basis for action the best-tested theory" [Popper 1972, p.22] (emphasis Popper's). Carnap held that the "aim of inductive logic" was to determine "rational credence functions" that could be used in decision making as "rational degrees of belief" [Carnap 1962]. However, the early Levi argued that a scientist's belief in H does not entail that the scientist would or should act as if H were true.
...the evidence may entitle the scientist to accept one of the H_j's as true, yet may not warrant the decision-maker's choosing the act that produces maximum utility when H_j is true. Recently certain
medical groups temporarily suspended dispensing the birth control
pill, Enovid, pending further examinations of evidence regarding its
safety. Several physicians endorsed this policy, even though they
acknowledged that they believed the pill to be safe. [Levi 1967,
p.10]
He concluded that we should recognize "the quest for truth as a legitimate human activity whose aims and products are not directly relevant to practical concerns" [Levi 1967, p.14].
Salmon disagreed with Carnap about the role of probability estimates in
practical action (in this case, betting behavior). Although the straight rule
licenses the inference that the probability of heads is 1 if all coin tosses come up
heads, this does not mean that an agent should bet on these odds. "[Users of the straight rule] would offer such practical advice as to avoid making large bets at unfavorable odds on the basis of probabilities whose values are not known with great confidence" [Salmon 1991, p.115]. Nowadays, many Bayesian confirmation theorists concur that rational degrees of belief need not be related to practical decisions (e.g., [Hellman 1997]).
In "The Enterprise of Knowledge", Levi changed his position: a scientist who takes her theories for granted in practical action, but is ready to question them in scientific inquiry, is in his view a victim of "cognitive schizophrenia" [Levi 1980, pp.16-18]. If a scientist accepts a belief
at time t, Levi says that he should regard it as infallibly true in decision making
at time t. Moreover, Levi holds that accepting a belief is itself a decision, and
therefore in deciding what beliefs to accept at time t + 1, the scientist should
consider his beliefs from time t to be infallibly true. Nonetheless, Levi says,
the scientist's cognitive values may make it rational for him to change his be-
liefs. Thus he may adopt a new theory T at time t + 1, even though at time
t he is convinced that T is false. Hence for Levi, infallibility does not imply incorrigibility: the scientist may revise beliefs that he regards as infallibly true.
Answers to these four questions define a conception of what scientific inference is, and what it aims for. For each conception of scientific inquiry, we can apply means-ends analysis to determine what inferences are good inferences. For example, learning theory can make recommendations for methods that produce probability distributions (e.g., [Putnam 1963], [Juhl 1993]). The reason why my basic model treats only methods whose theories assign definite truth-values to the hypotheses under investigation is, first, that this is a simpler model. Second, for the applications that I am interested in, situations like those in Section 2.3, it is also a more natural model. For example, it is awkward to ask to what degree a particle physicist should believe that quarks do not have a finer substructure; physicists don't seem to be aware of, and certainly do not advertise, explicit degrees of belief about that kind of question. But it is natural to ask why this should be her current hypothesis. Scientific practice also motivates the distinction between the scientist's background assumptions and his conjectures. For example, in particle physics, quantum mechanics is presupposed, but the structure of quarks is still a question that warrants further "collection of evidence". But the significance of background assumptions in my model goes beyond an attempt to accommodate features of scientific practice.
I agree with Levi that we should use decision theory to study decisions to believe. An essential part of formulating a decision problem is to specify the set of possibilities that the decision maker regards as relevant. This is the role of the scientist's background assumptions. Hence I interpret the scientist's background knowledge as his standard of serious possibility (so far as the goals of inquiry are concerned). The decision problem is to choose among possible methods, which are evaluated relative to the scientist's background knowledge. I assume that the scientist believes in his background assumptions, in the sense that his choice of a method is based on them, but I do not assume, or deny, that the scientist believes in his theories. Nor do I assume, or deny, that the scientist bases practical decisions on his theories. Since the relationship between belief and practical action is an open philosophical question, I regard it as a virtue of the learning-theoretic approach to methodology that it does not presuppose an answer to this question.5
It is worth remarking that the formal model does not come with psycholog-
ical claims about what the scientist is or is not aware of. In some applications
it is implausible to suppose that the learner is aware of what the serious possi-
bilities and the alternative theories are. For example, when a child is learning
a language, she is not aware of whatever constraints on natural languages there
might be, and does not have in mind a set of \alternative" languages from which
she is drawing successive conjectures. A scientist working in the context of a
research program may not be aware of the space of alternative theories spanned
by the program, and may simply be disposed to respond to evidence in a cer-
tain way, without consciously following an explicit research strategy. In cases
like this the perspective of our model is that of an outsider, someone who is
analyzing the assumptions and inferential dispositions that are implicit in the
scientist's theory and practice.
2.5 Revising Background Knowledge
In this thesis I will examine strategies for revising conjectures on the basis of
background assumptions, but not strategies for revising background knowledge.
This is common practice in learning theory, but a simplified view of scientific inquiry. Historians of science have described how "scientific revolutions" interrupt the course of "normal science" [Kuhn 1970]. One of the hallmarks of a scientific revolution is a change in what Lakatos calls "core assumptions" [Lakatos 1970]. This leads in most cases to what I would call changes in background knowledge. Other changes that attend scientific revolutions include:

• Changes in the basic ontology of scientific theories, that is, the set of possible worlds.

• From changes in the basic ontology often follow changes in the meaning of hypotheses, that is, the set of possible worlds in which a hypothesis is true changes.
5 As I mentioned above, Levi rejects my distinction between the scientist's background assumptions and his conjectures as untenable [Levi 1980, pp.16-18]. He holds that an agent who accepts a theory must think that it could not possibly be false. To model this position, I would take the scientist's theories to have only one component, namely the theory that he currently takes to be necessarily true. In Section 5.3.1, I indicate how to adapt my results about inductive inference for Levi's notion of acceptance.
• Improvements in observational capabilities, which alter the scientist's beliefs about what may be observed in a given possible world.

• New scientific questions added to the set of hypotheses under investigation.
[Kelly 1995, Ch.15] and [Kelly and Glymour 1992] give a learning-theoretic analysis of methods that actively bring about these kinds of changes. But since my goal is not a comprehensive theory of all aspects of scientific inquiry, but to understand how cognitive values underwrite inductive inferences, I shall make the simplifying assumption that the set of possible worlds, the hypotheses of interest, and the scientist's background knowledge do not change in the course of inquiry. In the language of [Kuhn 1970], my topic is induction in normal science.6 Hence I restrict scientific methods to revise their initial background knowledge by adding nothing but the evidence. Formally, this means that if δ(∅) = (K, T), then for all e consistent with K, δ(e) = (K ∩ [e], T′), where T and T′ are arbitrary theories. The conjectures of methods are not defined for data sequences that are inconsistent with their original background knowledge.7
Under these restrictions, specifying the initial background knowledge K determines the future background knowledge. So to describe an inductive method, it suffices to specify its initial background knowledge K and its conjectures; I will speak of methods with background knowledge K that produce empirical theories only, not background knowledge paired with a theory. For example, to define the method from Section 2.3.1 in this way, let K = {0,1}^ω, and δ(e) = H ("all swans are white") if all swans observed in e are white, and δ(e) = [e] otherwise (which entails that H is false).
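A minimal sketch of this method, under a hypothetical encoding where 1 stands for a white swan and 0 for a non-white one (the string return values merely label the two conjectures):

```python
def swan_method(e):
    """Method with fixed background knowledge K = {0,1}^omega that
    produces a theory only: H if all observed swans are white,
    otherwise just the evidence [e] (which entails that H is false)."""
    if all(x == 1 for x in e):
        return 'H'     # "all swans are white"
    return '[e]'       # the evidence itself

print(swan_method([1, 1, 1]))  # prints H
print(swan_method([1, 0, 1]))  # prints [e]
```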
6 [Kitcher 1993, Ch.7, Sec.4] too views scientists as attacking problems of induction within a given "background practice", which he connects to Kuhn's paradigms: "Thus, Kuhn's conception of 'normal science' allows for determinate resolution of issues because of the constraining role of background practice ('the paradigm')." [Kitcher 1993, p.248, fn.42]. [Donovan et al. 1988] and [Shapere 1984] also model scientists as relying on background knowledge.

7 Thus my model of updating background knowledge agrees with the Bayesian's evolution of his "sample space"; see Section 2.4.
Chapter 3
Truth, Content and Minimal Change
3.1 Outline
This chapter begins the investigation of which methods are optimal for given cognitive values. I start with content and truth (or empirical adequacy).

To evaluate the performance of a method with respect to a given desideratum, I apply the principle of admissibility. A method is admissible if it is not dominated. In general, an act a dominates another option a′ if a in no case leads to worse outcomes than a′, and sometimes leads to better outcomes. It is easy to see that the only methods that are admissible with respect to avoiding errors (call these error-minimal) are the methods whose theories never go beyond the evidence. Following [Levi 1967, p.6], we may label such methods "skeptical".1 A skeptical method δ never produces a false conjecture. But if some conjecture T of a non-skeptical method δ′ goes beyond the evidence, T might be false. So δ′ might make an error when δ doesn't, and hence δ dominates δ′ with respect to avoiding errors. Of course, skeptical methods do not produce theories with any interesting content. Let us say that a theory T has more content than a theory T′ if T is logically stronger than T′ (in semantic terms, T ⊊ T′). Then the only methods that are admissible with respect to content (call these content-optimal) are those that always produce the contradiction, because the contradiction has maximum content. If we restrict ourselves to consistent theories, the only content-optimal methods are those that provide complete theories; a theory T is complete if T makes a unique prediction for each future observation, such that T = {ε} for some data stream

1 [Levi 1967] distinguishes between "global" skepticism, which denies claims to knowledge of anything, and "local" skepticism, which holds that beliefs are justified only if they follow logically from the evidence. My skeptic is a local skeptic.
ε. In either case, avoiding error conflicts strongly with the goal of providing content, as several writers have noted. [Levi 1967] examines how an inquirer might balance these values according to her taste. Without resorting to subjective weights, we can go one step further by applying another principle from decision theory: Pareto-optimality. The Pareto principle says that when an agent has to make a trade-off between conflicting desiderata, it should not be possible to improve her choice on one dimension without making her worse off on another. I call methods that are Pareto-optimal with respect to content and avoiding errors content-error acceptable. I show that the content-error acceptable methods are exactly those that always entail the evidence. These results are simple, but they illustrate one of the main themes of this thesis: that familiar methodological norms have a means-ends justification with respect to certain cognitive values. Another development of this theme leads to a critique of some well-known principles for "minimal change" belief revision [Gärdenfors 1988].
Suppose we consider not only properties of a scientist's theory T at stage n of inquiry, but also how her theory changes at stage n + 1 in light of new evidence. What principles should guide this change? This question has received much attention in philosophical logic and computer science. Many writers find it plausible that the change should be a "minimal change". The idea is that in a change from one theory to another, the new theory should be "as close as possible" to the old theory. However, it is notoriously difficult to define a satisfactory notion of distance between theories (as the work on verisimilitude has shown us; cf. [Miller 1974]). Another approach is to apply the concept of dominance. Let us distinguish two kinds of change to a theory T: adding a proposition P to T, and retracting a proposition P from T. There are two plausible ways of defining a notion of minimal change with dominance considerations:

1. Apply the Pareto principle. Then a change T₀ from T is not minimal if there is another change T₁ such that T₁ retracts no more from T than T₀ does, but adds less, or such that T₁ adds no more to T than T₀ does, but retracts less.

2. Rank avoiding retractions first, avoiding additions second.
I show that each of these two notions half-agrees with the standard principles for belief revision (known as the AGM postulates): The first agrees with the AGM postulates when the evidence contradicts the current theory, but not necessarily otherwise. The second agrees with the AGM postulates when the evidence is consistent with the current theory, but not necessarily otherwise. The recommendations for minimal belief change that stem from the Pareto principle form an intuitively plausible new set of axioms for belief revision. It is well-known in philosophical logic that axioms for belief change correspond to principles of the logic of conditionals ("if p, then q") [Gärdenfors 1988]. An interesting topic for future research is to examine which principles of conditional logic the Pareto-optimality definition of minimal change validates.
Finally, I discuss ways in which belief revision theory may constrain inductive inferences, and argue that the AGM principles are not plausible as rules for empirical inquiry. They do not help agents to gradually replace false beliefs with true ones; if anything, they are obstacles on the path to truth that an inquirer must steer around.
3.2 Dominance in Error and in Content
To define what it means for a method to perform better than another with respect to errors, I apply dominance twice: First, a method δ weakly dominates another method δ′ with respect to error on a given data stream ε if δ′ makes an error along ε whenever δ does, and at least once when δ doesn't. Second, a method δ weakly dominates another method δ′ with respect to background knowledge K if δ weakly dominates δ′ on some data stream consistent with K, and δ′ makes an error along any data stream in K whenever δ does.
Definition 3.1 Dominance in Error

• δ dominates δ′ in error on a data stream ε (written δ ≻^E_ε δ′) ⟺

1. for all n, if δ(ε|n) is false on ε, then δ′(ε|n) is false on ε, and

2. for some k, δ′(ε|k) is false on ε, and δ(ε|k) is true.

• δ dominates δ′ in error given K (written δ ≻^E_K δ′) ⟺

1. for all data streams ε in K, for all n, if δ(ε|n) is false on ε, then δ′(ε|n) is false on ε, and

2. for some data stream ε in K, δ dominates δ′ in error on ε (i.e., δ ≻^E_ε δ′).

• δ is error-minimal given K ⟺ δ is not dominated in error given K.
It is clear that a skeptical method δ whose theories at each stage are entailed by background knowledge and the evidence never makes an error on any data stream consistent with the background knowledge. And if another method δ′ produces a theory that is not entailed by the evidence and background knowledge, then that theory, by the definition of entailment, is false on some data stream consistent with the background knowledge. So δ dominates δ′ in error. It follows that the only error-minimal methods are the skeptical ones.
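The domination argument can be checked directly on a toy space of length-3 data streams, a hypothetical finite stand-in for {0,1}^ω (all names are illustrative; theories are sets of streams, and a theory is true on a stream just in case the stream belongs to it):

```python
from itertools import product

STREAMS = list(product([0, 1], repeat=3))

def ext(e):
    """[e]: the streams extending the finite data sequence e."""
    return {s for s in STREAMS if s[:len(e)] == tuple(e)}

def skeptic(e):       # theory entailed by the evidence alone
    return ext(e)

def bold(e):          # goes beyond the evidence: predicts only 1s ahead
    return {s for s in ext(e) if all(x == 1 for x in s[len(e):])}

def errors(method, stream):
    """Stages at which the method's conjecture is false on the stream."""
    return {n for n in range(len(stream) + 1)
            if stream not in method(stream[:n])}

# The skeptic never errs; the bold method errs on some stream.  So the
# skeptic dominates the bold method in error, as in Definition 3.1.
print(all(not errors(skeptic, s) for s in STREAMS))  # True
print(any(errors(bold, s) for s in STREAMS))         # True
```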
Fact 3.1 A method δ is error-minimal given background knowledge K ⟺ for all finite data sequences e consistent with K, δ(e) is entailed by [e] and K.

Because the content of a theory, unlike its truth, does not depend on the entire data stream, I define dominance with respect to content in terms of finite data sequences.
Definition 3.2 Dominance in Content

• δ dominates δ′ in content given K (written δ ≻^C_K δ′) ⟺

1. for all finite data sequences e consistent with K, δ(e) has at least as much content as δ′(e), that is, δ(e) |= δ′(e) ∩ K, and

2. for some finite data sequence e consistent with K, δ(e) has more content than δ′(e), that is, δ(e) ⊊ δ′(e) ∩ K.

• δ is content-optimal given K ⟺ δ is not dominated in content given K.

Since the contradiction ∅ has more content than any other theory, it follows that the only content-optimal method is the one that always produces the contradiction. However, we may wish to require that our methods produce only theories consistent with given background knowledge K (call such methods consistent given K) and then consider which theories among the consistent ones provide maximum content. To formalize this idea, rephrase the second part of Definition 3.2 such that a method is content-optimal given K among consistent methods just in case it is not weakly dominated in content given K by any consistent method. Restricting the test for weak dominance in a given criterion c (in this case, content-optimality) to methods that satisfy another criterion c′ (in this case, consistency) has the effect of ranking c′ before c. I shall often make use of this device. If we rank consistency before content, a method is content-optimal among consistent methods if and only if it produces complete theories.

Fact 3.2 A consistent method δ is content-optimal given K among consistent methods ⟺ for all finite data sequences e consistent with K, δ(e) = {ε}, for some data stream ε consistent with K ∩ [e].
Facts 3.1 and 3.2 show that the goals of avoiding error and providing content are two masters that no method can serve at the same time. A scientist might weight these factors in some subjective way to arrive at a trade-off (along the lines of [Levi 1967]). But independent of subjective weights, it seems uncontroversial that a bad trade-off would be one that could be improved in one dimension without impairing the other. This is the familiar Pareto principle. I define Pareto-dominance for content and error as follows.
Definition 3.3 Pareto-dominance in Content and Error

• δ Pareto-dominates δ′ given K ⟺

1. for all finite data sequences e consistent with K, δ(e) has at least as much content as δ′(e), and δ dominates δ′ in error given K (i.e., δ ≻^E_K δ′), or

2. for all data streams ε in K, for all n, if δ(ε|n) is false on ε, then δ′(ε|n) is false on ε, and δ dominates δ′ in content given K (i.e., δ ≻^C_K δ′).

• δ is content-error acceptable given K ⟺ δ is not Pareto-dominated given K.
What are the content-error acceptable methods? Suppose that a method δ fails to entail some evidence e. Then we can strengthen δ(e) to entail the evidence, without incurring an additional possibility of error. Hence δ is not content-error acceptable. Conversely, suppose we have a method δ whose theories always entail the evidence. If we strengthen one or more conjectures of δ, we introduce a possibility of error where δ has none. If we weaken one or more conjectures of δ, we lose content. Hence δ is content-error acceptable. Thus a method δ is content-error acceptable just in case its theories always entail the evidence; Figure 3.1 illustrates this fact.

Fact 3.3 A method δ is content-error acceptable given K ⟺ for all finite data sequences e consistent with K, δ(e) |= K ∩ [e].
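The first half of this argument can be illustrated on a toy space of length-2 data streams (a hypothetical encoding, with theories as sets of streams): a method that fails to entail the evidence is dominated by the method that intersects each of its conjectures with [e]:

```python
from itertools import product

STREAMS = list(product([0, 1], repeat=2))

def ext(e):
    """[e]: the streams extending the finite data sequence e."""
    return {s for s in STREAMS if s[:len(e)] == tuple(e)}

def weak(e):               # never entails the evidence: conjectures everything
    return set(STREAMS)

def strengthened(e):       # the same method, cut down so it entails [e]
    return weak(e) & ext(e)

# strengthened has more content wherever weak overshoots [e], yet it is
# still true on every stream extending e, so it adds no possibility of
# error.  Hence weak is not content-error acceptable, as Fact 3.3 says.
for s in STREAMS:
    for n in range(len(s) + 1):
        assert strengthened(s[:n]) <= weak(s[:n])  # at least as much content
        assert s in strengthened(s[:n])            # never false on s
print("weak is Pareto-dominated by its strengthening")
```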
3.3 Dominance Principles and Minimal Theory Change
Content and truth are properties of theories at a given time t. Just as a body in motion has not only a position at time t, but also a velocity, we may ask how the content and truth of a method's conjectures are changing from time to time. What principles should govern theory change? A proposal that has been influential is that revising one's theory T in light of a proposition P, such as a piece of evidence, should lead to a "minimal change" from T. How might we define a "minimal change"? One approach is to specify a notion of distance between theories (a metric on the space of theories) and to use this to measure how much a change T′ differs from T. However, the difficulties with defining distance between theories are familiar [Miller 1974]. But even without a metric on theories, we can apply the concept of dominance to arrive at a notion of "minimal change". There are two ways of changing a theory T: adding a proposition to T, and removing a proposition from T. To be precise, let us say that a theory T′ adds a proposition P to T just in case T does not entail P and T′ |= P. Similarly, a theory T′ retracts a proposition P from T just in case T |= P and T′ does not entail P. Using the concept of dominance to define "adding more" and "retracting less" leads to the following definition.
Definition 3.4 Dominance in Retractions and Additions

• T₀ retracts more from T than T₁ (written T₁ ≺^R_T T₀) ⟺

1. for all propositions P, if T₁ retracts P from T, then so does T₀, and

2. for some proposition P, T₀ retracts P from T but T₁ does not.

• T₀ adds more to T than T₁ (written T₁ ≺^A_T T₀) ⟺

1. for all propositions P, if T₁ adds P to T, so does T₀, and

2. for some proposition P, T₀ adds P to T but T₁ does not.

Figure 3.1: Content vs. Error
The problem of revising theories with minimal changes is this: Given a theory T, we wish to include a new piece of information P such that the result of revising T with P, which I denote by T + P, is the minimal change from T that includes P. I say that a revision T + P is a minimal change from T if there is no other change T′ from T such that (1) T′ entails P, and (2) T′ Pareto-dominates T + P with respect to additions and retractions. (The definition of Pareto-dominance with respect to additions and retractions follows the pattern of Definition 3.3. I spell it out in Section 3.6.) The following necessary and sufficient conditions characterize minimal theory changes.

Proposition 3.4 Suppose that a revision T + P entails P. Then T + P is a minimal change ⟺

1. T ∩ P entails T + P, and

2. if T |= P, then T + P = T.
Figure 3.2 illustrates Proposition 3.4.

Clause 2 obtains because obviously T itself is the minimal change from T when T entails the new information P. Since adding P to T retracts nothing from T, it follows that a theory T′ that has more content than the addition of P to T (that is, T′ ⊊ T ∩ P) adds more to T than T ∩ P does, but does not retract any less from T. So a minimal revision T + P cannot be stronger than T ∩ P. However, at first glance it may seem that a minimal revision should not have less content than T ∩ P, because a weaker theory retracts beliefs from T. But as the proof of Proposition 3.4 in Section 3.6 shows, weaker theories than T ∩ P add fewer beliefs to T than T ∩ P does (provided that T does not entail P), which compensates for such retractions. An example may clarify this point. Recall the physical symbol system hypothesis from Section 2.3.5. A scientist investigating this hypothesis may believe that a certain AI system, say SOAR, is the only candidate for artificial intelligence. This scientist presumably believes that "if SOAR passes all future tests for intelligent responses, SOAR is intelligent", and also that "if SOAR fails a test, there is no machine with artificial intelligence". Initially, the scientist believes neither that the physical symbol system hypothesis is true, nor that it is false. Now suppose the scientist learns that SOAR, after a promising beginning of passing one hundred tests, failed the last one. If he added this evidence to his beliefs as they stand, he would conclude that machine intelligence is impossible, although beforehand he had no definite view about the matter. But if he were to retract his belief that "if SOAR fails a test, there is no artificially intelligent machine", he might speculate that some machine other than SOAR will possess true intelligence. So retracting this belief avoids adding any conclusions about the possibility or impossibility of machine intelligence.

Figure 3.2: Pareto-Minimal Theory Changes that avoid Additions and Retractions
3.4 Is "Minimal Change" Belief Revision Minimal?
In the previous section, I showed how we can apply the Pareto principle to the values of avoiding additions and retractions to arrive at a notion of minimal theory change. How does this notion of minimal change compare with proposals in the literature on "belief revision theory"? In a seminal paper, [Alchourrón et al. 1985] propose that a revision should count as a minimal change if it satisfies certain axioms, known as the AGM postulates. In my setting (with consistent evidence statements), the AGM postulates amount to the following (cf. [Kelly et al. 1995]).

(AGM 1) T + P |= P.

(AGM 2) If P ≠ ∅, then T + P ≠ ∅.

(AGM 3) If P ∩ T ≠ ∅, then T + P = T ∩ P.

(AGM 4) If P ∩ (T + Q) ≠ ∅, then T + (P ∩ Q) = (T + Q) ∩ P.
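These postulates can be exercised on a toy revision operator over sets of worlds. The operator below (keep T ∩ P when consistent, otherwise fall back to P) is a hypothetical illustration, not one the text endorses; with theories as sets of worlds, "T + P |= P" becomes "T + P ⊆ P" and the contradiction is ∅:

```python
from itertools import combinations

U = set(range(4))  # toy universe of possible worlds

def revise(T, P):
    """T + P = T ∩ P if consistent, else P (an illustrative operator)."""
    return (T & P) if (T & P) else set(P)

def subsets(S):
    return [set(c) for r in range(len(S) + 1) for c in combinations(S, r)]

# Check AGM 1-4 for all theories T and all nonempty propositions P, Q.
for T in subsets(U):
    for P in subsets(U):
        if not P:
            continue
        assert revise(T, P) <= P                      # AGM 1: T + P |= P
        assert revise(T, P)                           # AGM 2: consistency
        if T & P:
            assert revise(T, P) == T & P              # AGM 3
        for Q in subsets(U):
            if Q and (P & revise(T, Q)):
                assert revise(T, P & Q) == revise(T, Q) & P  # AGM 4
print("toy operator satisfies AGM 1-4")
```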
Figure 3.3 contrasts the AGM postulates with minimal theory changes defined via the Pareto principle.

AGM 1 expresses the idea that the result of incorporating P into a theory must entail P. It is not the notion of minimal change that seems to motivate AGM 2, but rather an independent norm that an agent should avoid contradictions. (Gärdenfors calls an inconsistent theory "epistemic hell".) AGM 3 says that if P is consistent with T, the minimal change to T that incorporates P is simply the result of adding P to T. AGM 4 implies that revising T on a conjunction P ∩ Q should yield the same result as revising T first on Q and then on P, provided that the revision of T on Q is consistent with P.

What I call a minimal revision on P entails P and hence satisfies AGM 1. If T is inconsistent with P, the contradiction is a minimal revision, contrary to AGM 2. Other than ruling out the contradiction, the AGM postulates provide no guidance. Likewise, by clause 1 of Proposition 3.4, all revisions are minimal revisions (since they are no stronger than T ∩ P = ∅). In the case in which T entails P, by clause 2 of Proposition 3.4 the minimal revision of T on P is just T, as in belief revision theory. (Gärdenfors lists clause 2 as a separate postulate, (K+4) [Gärdenfors 1988, p.49].) But when T does not entail P, AGM
53
Figure 3.3: Pareto-Minimal Theory Changes and the AGM Axioms
54
3 rules out any theory that is weaker than the addition of P to T , although these
are minimal revisions by my de�nition. This is the main di�erence between the
notion of minimal change based on Pareto-optimality with respect to changes,
and the AGM approach. Is there another way of de�ning \minimal" change
from basic dominance principles that is closer to the AGM axioms? We can
�nd a clue in G�ardenfors' writing; he justi�es the postulate in question, AGM
3, as follows.
The next postulate for expansions can be justified by the "economic" side of rationality. The key idea is that, when we change our beliefs, we want to retain as much as possible of our old beliefs; information is in general not gratuitous, and unnecessary losses of information are therefore to be avoided. This heuristic criterion is called the criterion of information economy... If P is indetermined in T (or if P is accepted in T), then ... P does not contradict any of the beliefs in T. It is therefore possible to retain all the old beliefs in the expansion of T by P; so the criterion of information economy justifies the following: (K+3) if T ∩ P ≠ ∅, then T + P ⊨ T. [Gärdenfors 1988, p.49] (Emphasis Gärdenfors'; here and elsewhere I adapt his notation to mine.)
Avoiding "loss of information" is a dubious defense of (K+3). The postulate prohibits an agent from changing her mind in a way that we might well regard as a gain in information. For example, if an ornithologist investigating the colour of swans (as in Section 2.3.1) starts out with the belief that there is a black swan, but after finding one hundred white swans accepts that all swans are white, she violates (K+3); but if anything, she seems to have gained, not lost, information! I will take up this issue in Section 3.5. The quotation makes clear that the purpose of (K+3), and of AGM 3, is to "retain old beliefs", that is, to avoid retractions. Gärdenfors states that his postulates are motivated by the "conservativity principle", which says that "when changing beliefs in response to new evidence, you should continue to believe as many of the old beliefs as possible" [Gärdenfors 1988, p.67]. This suggests examining changes T′ that are not dominated in retractions, such that no other change retracts less than T′ (in the sense of Definition 3.4). The next proposition shows that retraction-minimal changes have at least as much content as the addition of P to T.
Proposition 3.5 A revision T + P is retraction-minimal ⇔ T + P ⊨ T.

Now if we select among the changes that minimize retractions those that minimize additions, we find that the only such change is the addition of the evidence to the current theory, as AGM 3 requires.

Proposition 3.6 If a revision T + P is addition-minimal among retraction-minimal revisions, then T + P = T ∩ P whenever T and P are consistent with each other.
In light of these results and Gärdenfors' text, can we interpret the AGM postulates as specifying the means for an agent who wishes to, first, avoid retractions, and second, avoid additions? The answer is no: if the current theory T is inconsistent with P, then by Proposition 3.5, the retraction-minimal revision T + P entails T ∩ P = ∅, and hence T + P = ∅. Even if we categorically rule out the contradictory theory as a result of a revision, we still have that, in order to minimize retractions, an agent must produce a complete theory whenever the new information contradicts his previous beliefs. Figure 3.4 summarizes the relationships between the two notions of belief change derived from dominance principles and the AGM postulates.
Gärdenfors rejects this consequence of avoiding retractions as "unpalatable".

When T is a nonmaximal belief set, which is the normal case, [and T is inconsistent with P,] T + P is a maximal belief set that obviously is too large [viewed as a set of sentences] to represent any intuitive process of rational belief revision; just because you revise your beliefs with respect to P, you should not be led to have a definite opinion on every matter. [Gärdenfors 1988, p.59]
It seems fair to ask why conservatism is the fundamental principle of "rational" belief revision when the theory is consistent with the evidence, but not otherwise. If we should aim to "retain as many of our old beliefs as possible", and being a "besserwisser" [Gärdenfors 1988, p.58], a know-it-all, is the only way to do so when contradictory information arrives (by Proposition 3.5 and Gärdenfors' observations), then should we not become besserwissers? Gärdenfors' text suggests that we should not have to adopt "a definite opinion on every matter" when, in the normal case, we did not have one before. He seems to think that adding many new beliefs is too high a price for retaining old beliefs. But if avoiding additions can compensate for retractions when the evidence refutes the agent's beliefs, the same should be true when the agent's beliefs are consistent with the evidence. The Pareto-optimality principle for additions and retractions strikes just this kind of balance.
One question is how to define a plausible conception of "minimal belief change" in a principled way. The dominance principles that I proposed accomplish that; it is not clear that the AGM postulates do. Postulates for minimal theory change may give good guidance for updating databases or revising legal codes [Gärdenfors 1988]. But for the theory of inductive inference, the issue is whether agents who are engaged in empirical inquiry should revise their theories in a minimal way. Should scientific inquiry obey the AGM postulates? This is the question of the next section.
Figure 3.4: Three Notions of Minimal Theory Change
3.5 Empirical Inquiry and Belief Revision
What are the implications of the AGM principles for empirical inquiry? Some writers deny that there are any. Thus [Nayak 1994] states that "AGM is inadequate for iterated belief change". And recently, some writers have suggested that we should view belief revision principles not as norms for actual belief change at all, but instead as specifying acceptance conditions for conditionals (e.g., [Levi 1988]). But to see if the AGM principles are plausible recommendations for scientific inference, I shall explore some ways of interpreting them in this setting. A natural proposal is to define a scientific method as the result of revising an initial theory on the evidence.² Let a belief revision operator + and an initial theory T be given. I say that a method δ is represented repetitively by + and T if for all data sequences e and single observations x:

1. δ(∅) = T.
2. δ(e · x) = δ(∅) + [e · x].

Another proposal is that it is always the current theory that is revised as the data come in; I call this sequential revision. Accordingly, a method δ is represented sequentially by + and T if for all data sequences e and single observations x:

1. δ(∅) = T.
2. δ(e · x) = δ(e) + [e · x].
A method is consistent with the AGM postulates if there is some belief revision operator + satisfying the AGM postulates such that + and δ(∅) represent δ. I refer to such methods as AGM methods. [Kelly et al. 1995, Prop. 2] show that as far as defining the class of AGM methods is concerned, it does not matter whether we represent methods repetitively or sequentially.
Proposition 3.7 (Kelly, Schulte, Hendricks) A method δ is represented sequentially by some AGM belief revision operator + ⇔ δ is represented repetitively by some AGM belief revision operator +′.
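A small sketch conveys the flavour of this result for one concrete operator: with full-meet revision (intersect when consistent, otherwise adopt the evidence), the repetitive and sequential representations coincide outright on a toy space of worlds (all names here are illustrative, and the proposition itself is more general, since it allows the two representations to use different operators):

```python
from itertools import product

WORLDS = frozenset(range(4))

def revise(T, P):
    """Full-meet revision over sets of worlds: expand when consistent, else adopt P."""
    return T & P if T & P else P

def evidence(seq):
    """[e]: the proposition expressed by a data sequence, i.e. the intersection
    of its observations."""
    E = WORLDS
    for x in seq:
        E = E & x
    return E

def repetitive(T, seq):
    """Always revise the initial theory on the total evidence."""
    return T if not seq else revise(T, evidence(seq))

def sequential(T, seq):
    """Revise the current theory on the total evidence at each stage."""
    theory = T
    for i in range(1, len(seq) + 1):
        theory = revise(theory, evidence(seq[:i]))
    return theory

OBS = [frozenset({1, 2, 3}), frozenset({2, 3}), frozenset({0, 3}), frozenset({1})]
T0 = frozenset({0})
agree = all(repetitive(T0, seq) == sequential(T0, seq)
            for n in range(4) for seq in product(OBS, repeat=n))
print(agree)
```

The script prints True: for this operator the two representations define the same method on every short observation sequence drawn from the pool.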
[Kelly et al. 1995, Prop. 2] characterize what the AGM methods are like.

Proposition 3.8 (Kelly, Schulte, Hendricks) A method δ is AGM ⇔ for all finite data sequences e, e · x:

1. if e · x is consistent with δ(e), then δ(e · x) = δ(e) ∩ [e · x];
2. δ(e) is consistent;
3. δ(e) ⊨ [e].

² A different proposal is that some inductive process decides what proposition to incorporate into a theory, but that the AGM principles guide the incorporation; see [Kelly et al. 1995]. The following definitions are from [Kelly et al. 1995].
By Fact 3.3, all content-error acceptable methods satisfy clause 3. Clause 2 keeps AGM methods out of "epistemic hell". (Proposition 6.1 in Chapter 6 provides a means-end justification for producing only consistent theories.) Clause 1 combines two requirements for the case in which the data are consistent with the current theory: the method should not add more than the data to its theory, and the method should not weaken its theory. [Kelly et al. 1995] refer to the first requirement as "timidity" and to the second as "stubbornness". Proposition 3.4 shows that timidity follows not only from the AGM principles, but also from the weaker notion of minimal change based on Pareto-optimality between additions and retractions. Gärdenfors motivates the principle as follows.

[The other postulates] do not ... exclude the possibility that T + P contains a lot of beliefs, not included in T, that have no connection whatsoever with P. Because we want to avoid endorsing beliefs that are not justified, we should require that T + P not contain any beliefs that are not required by the other postulates [i.e., AGM 1, 2 and the requirement that T + P ⊨ T ∩ P when T is consistent with P]. [Gärdenfors 1988, p.51]
But if "unjustified beliefs" are the chicken to kill, AGM 3 is a butcher's knife. Consider a hypothetical ornithologist working on the colour of swans, as in Example 2.3.1, who is initially agnostic about whether or not all swans are white. If in fact all swans are white, timidity prevents the ornithologist from ever accepting this generalization, no matter how many white swans he sees: for each new white swan, a minimal change in his beliefs adds no more than that this swan, too, is white. Writers from [Putnam 1963] to [Maher 1996] agree that an inductive method should eventually produce such a universal generalization. Hence timidity is unacceptable as a norm for empirical inquiry.
Stubbornness fares little better. Proposition 3.5 shows that stubbornness avoids retractions. But if the scientist starts out with false beliefs, good methods should lead him to retract these in favor of true beliefs. However, if our ornithologist initially believes that there is a black swan, then stubbornness will prevent him from retracting this belief no matter how many white swans he sees: clearly a bad prescription for inductive inference.
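Both complaints can be replayed in a toy model of the swan example (an illustrative sketch with a six-swan world and the simple full-meet operator; all names are mine):

```python
from itertools import product

N = 6  # a toy population of N swans; 'w' = white, 'b' = black
ALL = frozenset(map(''.join, product('wb', repeat=N)))   # every way the swans could be
ALL_WHITE = frozenset({'w' * N})                         # "all swans are white"
HAS_BLACK = ALL - ALL_WHITE                              # "there is a black swan"

def revise(T, P):
    """Full-meet revision: expand by the evidence while consistent, else adopt it."""
    return T & P if T & P else P

def seen_white(k):
    """Evidence after inspecting the first k swans and finding all of them white."""
    return frozenset(w for w in ALL if w.startswith('w' * k))

# Timidity: starting agnostic, the revised theory never entails "all swans are
# white" before literally every swan has been inspected.
timid, T = [], ALL
for k in range(1, N):
    T = revise(T, seen_white(k))
    timid.append(T <= ALL_WHITE)

# Stubbornness: a false "there is a black swan" belief survives every white-swan
# report that leaves it consistent.
stubborn, T = [], HAS_BLACK
for k in range(1, N):
    T = revise(T, seen_white(k))
    stubborn.append(T <= HAS_BLACK)

print(timid, stubborn)
```

The first list is all False (the generalization is never accepted) and the second all True (the black-swan belief is never retracted).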
So the AGM principles do not help an agent to arrive at true beliefs; but do they interfere with that goal? This depends on the scientist's initial beliefs. In the example of the swans, a scientist who begins with the bold hypothesis that all swans are white, and only adds the data at each stage, is guaranteed to eventually have the correct opinion about this hypothesis (cf. Section 9.5). This idea works in general: [Kelly et al. 1995] show that if there is any method at all that reliably settles a set of hypotheses of interest, then there is an AGM method that does so.³ The trick is to choose sufficiently strong initial beliefs, for example a complete theory. Then one of two things may happen: either the strong theory is true, in which case simply adding the data to the theory entails the correct answer about the hypotheses of interest. Or the theory is false, and is eventually refuted by the evidence; at that point, the AGM principles allow the method to choose any other conjecture. Making these choices in the right way leads to a reliable method. Thus from the point of view of means-ends analysis, the problem with the AGM norms is not that they make reliable inquiry impossible, but that they force agents who want to be reliable to produce theories with more content than necessary for settling the question under investigation, exposing the inquirer to a higher risk of error and retractions. (Section 5.3.1 discusses this issue in more detail, and shows that the AGM theory shares this problem with Popper's falsificationism.)
One important goal of inquiry is to replace uncertainty at the beginning with true theories. The AGM principles contribute nothing towards this goal. Part of the problem is that the epistemic value that motivates belief revision theory is avoiding retractions, not finding the truth. Another part is that the minimal-change perspective looks ahead only one step at a time. In the next chapter I consider a perspective that stands in sharp contrast to this myopia: what is empirical inquiry like if the goal is to settle on a true theory, whether it takes one, ten, or any number of steps?
3.6 Proofs
Definition 3.5 Let T, T₀, T₁ be three theories.

• A change T₀ from T is Pareto-dominated with respect to retractions and additions by T₁ ⇔

1. T₀ adds more propositions than T₁ to T, and for all empirical propositions H: if T ⊨ H and T₀ ⊨ H, then T₁ ⊨ H; or
2. T₀ retracts more propositions than T₁ from T, and for all empirical propositions H: if T ⊭ H and T₁ ⊨ H, then T₀ ⊨ H.

• A revision T + P is a minimal change ⇔ T + P is a change from T that is not Pareto-dominated with respect to retractions and additions by another change T′ that entails P.
Proposition 3.4 Suppose that a revision T + P entails P. Then T + P is a minimal change ⇔

1. T ∩ P entails T + P, and
2. if T ⊨ P, then T + P = T.
Proof. (⇒) Part 1: I show the contrapositive. Suppose that T ∩ P ⊭ T + P. Then there is a data stream ε that makes T and P true but T + P false. Thus T + P entails the complement ¬{ε} of {ε}, but T ⊭ ¬{ε}. Now consider (T + P) ∪ {ε}. This theory does not entail ¬{ε}, and if (T + P) ∪ {ε} adds anything to T, so does T + P, since T + P ⊨ (T + P) ∪ {ε}. Hence (a) (T + P) ∪ {ε} adds fewer propositions to T than T + P. Moreover, (b) (T + P) ∪ {ε} retracts from T only what T + P retracts from T: clearly, if T + P retracts H from T, then (T + P) ∪ {ε} retracts H from T, since (T + P) ∪ {ε} has less content than T + P. Conversely, suppose that (T + P) ∪ {ε} ⊭ H, and that T ⊨ H. Since ε ∈ T, it follows that T + P ⊭ H. Thus if (T + P) ∪ {ε} retracts a proposition from T, so does T + P. Finally, we have (c) (T + P) ∪ {ε} ⊨ P, since both T + P and {ε} entail P. It follows from (a), (b) and (c) that T + P is not a minimal change.

Part 2: Since T ⊨ P by hypothesis, T is the unique minimal change from T that entails P.

(⇐) Suppose that T + P satisfies conditions 1 and 2. Then the claim is immediate if T ⊨ P; suppose that T ⊭ P, and let ε be a data stream on which T is true and P false. I show that no change T′ that entails P Pareto-dominates T + P with respect to retractions and additions. First, suppose that T + P retracts a proposition H from T, and T′ does not, so that T′ ⊨ H. Since T′ ⊨ P ⊨ ¬{ε}, we have that (d) T′ ⊨ H ∩ ¬{ε}. But T ⊭ ¬{ε}, and hence (e) T ⊭ H ∩ ¬{ε}. Since T + P ⊭ H, we have that (f) T + P ⊭ H ∩ ¬{ε}. From (d), (e) and (f) it follows that T′ adds a belief to T that T + P does not add to T. So T′ does not Pareto-dominate T + P in additions or retractions from T.

Second, suppose that T + P adds a proposition H to T, but T′ does not entail H. Hence T′ is consistent with ¬H; this and the fact that T ∩ P ⊨ T + P imply that T′ ∩ ¬H ⊭ T ∩ P, because otherwise T′ ∩ ¬H, and hence ¬H, would be consistent with T + P. But if T′ ∩ ¬H ⊭ T ∩ P, then there is a data stream ε′ in T′ ∩ ¬H on which T is false, because T′, and thus T′ ∩ ¬H, entails P. So (g) T ⊨ ¬{ε′}, but T′ does not. Finally, (h) T + P ⊨ ¬{ε′}, because T + P ⊨ H and H is false on ε′. From (g) and (h) it follows that T′ retracts a belief from T that T + P does not retract from T. So T′ does not Pareto-dominate T + P in additions or retractions from T. □
Proposition 3.5 A revision T + P is retraction-minimal ⇔ T + P ⊨ T.

Proof. (⇐) If T + P entails T, then T + P retracts nothing from T and hence is clearly retraction-minimal.

(⇒) I show the contrapositive. Suppose that T + P ⊭ T; let ε be a data stream that makes T + P true but T false. Then T entails ¬{ε}, but T + P does not. So (T + P) ∩ ¬{ε} retracts fewer propositions from T than T + P. □
Proposition 3.6 If a revision T + P is addition-minimal among retraction-minimal revisions, then T + P = T ∩ P whenever T is consistent with P.

Proof. From Proposition 3.5 we have that T + P ⊨ T ∩ P. From Proposition 3.4 it follows that T ∩ P ⊨ T + P. Thus T + P = T ∩ P. □
Chapter 4
Discovery Problems and Reliable Solutions
4.1 Outline
In the previous chapter I showed that principles of minimal change revision give no guidance towards finding a true hypothesis, and indeed may interfere with that goal. In sharp contrast, philosophers such as Peirce, Putnam and Reichenbach have proposed that inquiry should take as its primary goal finding the truth: if not immediately, then in the long run. This chapter and the next analyze the design of scientific methods for attaining this goal. I consider the problem of identifying a correct hypothesis among a range of mutually exclusive alternatives. A method δ is admissible for this purpose if there is no other method δ′ that identifies a correct hypothesis on more data streams than δ. The first result of this chapter is that the methods admissible in this sense are exactly those that [Kelly 1995] has termed reliable: they succeed on all data streams consistent with given background knowledge.
When exactly is there a reliable method for identifying a correct hypothesis from a given set of alternatives? The answer is that the set of alternatives must be enumerable, and that it must be possible to decide each hypothesis in the limit. The second condition means that there must be a method that eventually always entails the hypothesis if the hypothesis is true, and eventually always entails the negation of the hypothesis if the hypothesis is false. Some hypotheses can be reliably assessed in the limit of inquiry even though no finite amount of evidence conclusively falsifies the hypothesis in question. Thus, contrary to Popper's ideas, we can admit unfalsifiable hypotheses as candidates for asymptotically reliable empirical tests without jeopardizing the ability of science to avoid error and find interesting truth in the long run [Popper 1968]. Indeed, if all alternatives under investigation are unfalsifiable but decidable in the limit, Popper's recommended conjectures-and-refutations scheme is not the most reliable means for finding the truth.

Since the feasibility of reliably identifying a true hypothesis depends crucially on the testability of the alternatives in question, I examine what the structure of hypotheses must be like if we want to be able to test them in various senses. This defines a precise topological scale of inductive complexity which places hypotheses that are harder to test above the easier ones.
The results in this chapter bring out some of the consequences for methodology if we adopt the norm that scientific methods should be reliable in the long run. This norm has been criticized both as too weak, because success in the long run is allegedly not an interesting aim for science, and as too strong, because we may want to regard an inductive problem as tractable even when every method for investigating it might fail to converge to the truth, on the grounds that the possibilities of failure are "negligible". I address these criticisms at the end of this chapter.
4.2 Convergence to the Truth

Significant questions about the world prompt empirical research, and translate into hypotheses. The researcher seeks a theory that settles the hypotheses of interest. An important special case obtains when the hypotheses of interest are mutually exclusive alternatives; then I say that these hypotheses form a partition. Popper referred to the task of selecting one hypothesis from a partition of alternatives as the problem of discovery [Popper 1968]. Of course, choosing some alternative is a trivial task; the interesting problem is to choose the right one. We would like science to quickly determine with certainty which alternative is correct. But as the examples from Section 2.3 show, this is usually too much to ask: no finite amount of evidence will tell us with certainty whether all swans are white or whether there are infinitely many elementary particles. As William James pointed out, "the intellect, even with truth directly in its grasp, may have no infallible signal for knowing whether it be truth or no" [James 1982, Sec. VI, p.197].
Philosophers such as Peirce, Putnam, Reichenbach, Glymour and Kelly, and learning theorists such as Gold, have proposed a more feasible goal of inquiry: that the scientist should eventually settle on the right alternative, without necessarily providing a sign that she has done so (see Figure 4.1).

Figure 4.1: Successful Discovery: On data stream ε, method δ identifies the correct hypothesis from a set of alternatives.

This conception of success fits with Plato's idea that stable true belief is better than fickle opinion that happens to be true, and may indeed be a necessary aspect of knowledge.

For [true opinions], so long as they stay with us, are a fine possession, and effect all that is good; but they do not care to stay for long, and run away out of the human soul, and thus are of no great value until one makes them fast with causal reasoning... But when once they are fastened, in the first place they turn into knowledge, and in the second, are abiding. [Plato 1967, p.363]
We may define success in the limit of inquiry like this. A method δ converges to an alternative H on a data stream ε if after some stage n, for all later stages n′ ≥ n:

1. δ(ε|n′) ⊨ H, and
2. δ(ε|n′) is consistent.

Given a collection of alternative hypotheses H and a data stream ε, I denote the hypothesis from H that is true on ε by H(ε). (Here and elsewhere I assume that for each data stream ε in K, there is a hypothesis in H that is correct on ε; that is, K ⊨ ⋃H.) I say that a method δ succeeds on ε if δ converges to H(ε) on ε. What methods are admissible with respect to the aim of converging to a true alternative? Obviously, it suffices if a method succeeds on all data streams consistent with background knowledge K. Conversely, a method δ is admissible as far as convergence to the truth is concerned only if δ succeeds everywhere in K. For suppose that δ fails on some data stream ε consistent with K. Then define δ′ like this: conjecture H(ε) while the data are consistent with ε. If the data deviate from ε, δ′ follows δ. Then δ′ succeeds on ε and everywhere that δ does, and hence dominates δ with respect to converging to the truth. So convergence-admissible methods succeed everywhere in K.
Fact 4.1 Let H be a collection of alternative hypotheses, and let K be background knowledge. Then a method δ is admissible given K with respect to converging to a correct alternative in H ⇔ δ converges to the correct alternative on each data stream ε in K.
Following [Kelly 1995], I say that a method δ solves the discovery problem for H given background knowledge K, or is reliable for H given K, if δ succeeds on all data streams in K. Thus Fact 4.1 says that a method is admissible with respect to converging to the truth if and only if the method is reliable. Section 2.3 gives examples of reliable methods.
4.3 Reliable Solutions for Discovery Problems

As with any proposed aim of inquiry, I examine under what conditions the aim is feasible, and how to attain it when it is feasible. In the case of converging to a true hypothesis in a discovery problem, by Fact 4.1 this question turns into the question of when a discovery problem permits a reliable solution, and what reliable methods are like.
Popper had a proposal for how to attack discovery problems: his conjectures-and-refutations method. Briefly, the idea is the following. Assume that the collection of alternatives is countable, and thus can be enumerated in some sequence H₁, H₂, ... (perhaps with "bolder" hypotheses coming before more timid ones, if an audacity ranking is available). Begin with the first hypothesis H₁, and conjecture H₁ until the evidence falsifies H₁. Then move on to H₂, then to H₃ once H₂ is falsified, and so on. The conjectures-and-refutations scheme will reliably identify the true alternative if each alternative hypothesis Hᵢ has the property that whenever Hᵢ is false, the evidence eventually falsifies Hᵢ. I say that hypotheses with this property are refutable with certainty. To see that the conjectures-and-refutations scheme is reliable if all alternatives are refutable with certainty, let Hₙ be the true hypothesis. Then H₁, H₂, ..., Hₙ₋₁ are false, and hence each of these alternatives is eventually conclusively falsified. After that point, the conjectures-and-refutations procedure always produces the true hypothesis Hₙ.
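The scheme is easy to sketch in code. In the toy problem below (all names are illustrative), the alternatives are "no black swan ever appears" and "the first black swan appears at position i", each of which is refutable with certainty:

```python
def h_none(e):
    """'No black swan is ever observed': the boldest, still refutable, conjecture."""
    return 'b' not in e

def h_at(i):
    """'The first black swan appears at position i': refutable with certainty,
    since any stream on which it is false refutes it within finitely many stages."""
    def h(e):
        if 'b' in e[:i]:
            return False
        return len(e) <= i or e[i] == 'b'
    return h

def conjectures_and_refutations(hypotheses, stream, steps):
    """Popper's scheme: conjecture the first hypothesis in the enumeration
    that the evidence gathered so far has not refuted."""
    i, out = 0, []
    for n in range(1, steps + 1):
        e = stream[:n]
        while not hypotheses[i](e):   # move past every hypothesis the data refute
            i += 1
        out.append(i)
    return out

HYPS = [h_none] + [h_at(i) for i in range(10)]
stream = 'wwwb' + 'w' * 8   # the lone black swan appears at position 3
print(conjectures_and_refutations(HYPS, stream, 12))
```

The conjectures start at index 0 (the boldest hypothesis) and, once the black swan at position 3 refutes it along with the first three rivals, stabilize on the true alternative for good.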
One might think that the alternative hypotheses under investigation must be refutable with certainty if inquiry is to reliably identify a true one. For otherwise we run the danger of forever maintaining a false conjecture that is never conclusively refuted by the evidence. But the almost universal generalizations from Section 2.3.2 show the fallacy in this argument: the generalizations "almost all swans are white" and "almost all swans are black" allow any finite number of exceptions, and hence are consistent with any evidence. Yet we saw in Section 2.3.2 that it is possible to reliably identify which of the almost universal generalizations is true (assuming that one of them is). Similarly, statements about the limiting frequency of an event are not falsifiable by any finite amount of evidence. But as we saw in Section 2.3.4, an inductive method can reliably identify the true limiting frequency of an event from a finite set of possible alternatives.
Popper derived his conjectures-and-refutations scheme from his conception of how science tests theories, namely by successive attempts at falsification. Indeed, testing alternative hypotheses against empirical data is crucial to reliable discovery. But reliable discovery methods need not proceed by waiting for the evidence to conclusively falsify the current hypothesis. It suffices to have a test method that will settle the truth value of each alternative in the limit, without ever yielding certainty about whether a conjecture is true or false. Following [Kelly 1995, Ch.4], I say that such a test procedure decides a hypothesis H in the limit. To be precise, a method δ decides a hypothesis H in the limit given background knowledge K if for all data streams ε in K:

1. δ converges to H if H is true on ε;
2. δ converges to ¬H, the negation of H, if H is false on ε.
Figure 4.2 illustrates decision in the limit.

Figure 4.2: Testing Empirical Hypotheses: Decision in the Limit of Inquiry. (a) If H is true on data stream ε, the conjectures of method δ stabilize to H; (b) if H is false on ε, the conjectures stabilize to the negation of H.

A hypothesis H is decidable in the limit given background knowledge K if there is a method δ that decides H in the limit given K. A hypothesis H that is refutable with certainty, such as "all swans are white", is decidable in the limit: conjecture H until the evidence refutes H. If H is never refuted, then H is true and hence this method settles on the correct truth-value for H (immediately). The hypothesis "almost all swans are white" is not refutable with certainty, but it is decidable in the limit if we assume that either almost all swans are white or almost all swans are black: conjecture "almost all swans are white" if the last observed swan is white, and "almost all swans are black" otherwise (cf. Section 2.3.2).
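This last test procedure is a one-liner. The sketch below (illustrative names) replays it on a stream that contains only finitely many black swans:

```python
def limit_decider(e):
    """Decide 'almost all swans are white' in the limit, under the background
    assumption that either almost all swans are white or almost all are black:
    conjecture according to the colour of the last observed swan."""
    return 'almost all white' if e[-1] == 'w' else 'almost all black'

# A stream on which all but three swans are white; the last black swan
# occurs at position 6, after which the conjecture never changes again.
stream = 'wbwwbwbw' + 'w' * 20
conjectures = [limit_decider(stream[:n]) for n in range(1, len(stream) + 1)]
print(conjectures[-1], conjectures[7:].count('almost all black'))
```

Before the last black swan the conjecture flip-flops; afterwards it stabilizes to the true alternative, which is all that decision in the limit demands.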
If there are only countably many alternative hypotheses, and if each of them is decidable in the limit, then there is a reliable method for identifying the true one among them: use a limiting decision procedure λ₁, λ₂, ... for each of the hypotheses H₁, H₂, ..., respectively. Out of all the hypotheses Hᵢ whose test λᵢ is positive on given evidence e (that is, λᵢ(e) ⊨ Hᵢ and λᵢ(e) is consistent), conjecture the alternative that comes first in the enumeration. If Hₙ is the true alternative, then after some finite time the limiting decision procedures λ₁, λ₂, ..., λₙ₋₁ for H₁, H₂, ..., Hₙ₋₁ always return negative results, whereas the test procedure λₙ for Hₙ is positive. After that point, the discovery method described settles on Hₙ. Conversely, if there is a reliable method δ for identifying the true alternative from a collection of hypotheses H, then each hypothesis H in H must be decidable in the limit. In fact, δ itself decides each hypothesis H in H in the limit, because δ eventually settles on the true one, which entails the negation of all the other (false) alternatives, since the hypotheses in H are mutually exclusive. Finally, if there are more than countably many alternatives in H, there is no reliable method for discovering the true one. This is so because there are only countably many finite data sequences, so if there are uncountably many alternatives, there must be some that a discovery method δ never produces. If one of these is correct (call it H), δ never conjectures H, much less stabilizes to H. Thus the space of alternative hypotheses must be small (at most countable) in comparison to the uncountable space of empirical possibilities, the data streams. In sum, we have the following necessary and sufficient conditions for reliable discovery.
Proposition 4.2 (Kevin Kelly) Let a partition H of background knowledge K be given such that K = ⋃H.¹ Then the discovery problem for H is solvable given K ⇔

1. H is countable, and
2. each H ∈ H is decidable in the limit given K.

¹ The assumption that K = ⋃H incurs no loss of generality. For I assume throughout that H covers all possibilities in K, so that K ⊆ ⋃H. And in the presence of background knowledge K, we may consider H′ = {H ∩ K : H ∈ H} as the specification of alternative hypotheses; clearly K = ⋃H′.
This characterization is useful in several ways. For given background knowledge K, it tells us whether there is a reliable discovery method for a partition H of alternative hypotheses. If there is, we may recommend that a scientist should use a reliable method (as in the examples from Section 2.3), and criticize those who do not (as I criticized the AGM postulates in Chapter 3). If there is no reliable discovery method, we may lower our sights to a weaker notion of inductive success (such as gradual identification; see Section 2.3.4), or we may criticize a research program for a mismatch between its ambitions and its means (as Kelly critiqued cognitive science [Kelly 1995, Ch.7]; see also Section 4.4 below). If the standard of serious possibility K is open to discussion, one of the factors that may be relevant in deciding to adopt a standard K is whether K gives rise to reliable discovery methods for problems of interest. (The results in Chapter 8 suggest that this is one motivation for using conservation principles in particle physics.)
4.4 Testing and Topology
Proposition 4.2 shows that the question "is there a reliable discovery method
for a countable partition of alternatives H?" reduces to the question "is each
alternative H decidable in the limit?". This section characterizes the structure of
hypotheses that are decidable in the limit. If a hypothesis H and its complement
H̄ are both countable unions of refutable hypotheses, then H is decidable in the
limit. For in that case we can form an enumeration C = C_1^H, C_1^H̄, C_2^H, C_2^H̄, ... of
refutable subsets of H and H̄, respectively, such that C covers H and H̄. Then
we "internally" conjecture the first hypothesis in the enumeration consistent
with the evidence. Since C covers both H and its complement, one of the
refutable hypotheses in the enumeration must be true, and our method will
eventually stabilize to the first true "internal" hypothesis C_i in C that entails
the correct answer H, or H̄, as the case may be.
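The enumerate-and-conjecture strategy just described can be sketched in a few lines (my own illustration, not code from the text). For the universal generalization "all swans are white", H itself is refutable, and its complement is covered by the refutable evidence propositions [e] for finite sequences e containing a non-white swan; conjecturing the first consistent member of the enumeration decides H in the limit. The 'w'/'b' encoding of observations is an assumption of the sketch.

```python
# Illustration (not from the text): limit decision by enumerating refutable
# "internal" hypotheses that cover H and its complement.
# Observation encoding (assumed): 'w' = white swan, 'b' = non-white swan.
# H = "all swans are white".  Its complement is the countable union of the
# refutable evidence propositions [e] for each finite e containing a 'b'.

def consistent_with_H(evidence):
    """H is consistent with finite evidence e iff no 'b' has occurred."""
    return 'b' not in evidence

def conjecture(evidence):
    """Conjecture the first member of the enumeration C = H, [e1], [e2], ...
    that is consistent with the evidence.  H heads the enumeration, so we
    conjecture H until it is refuted; afterwards we conjecture the evidence
    proposition [e] given by the data, which entails the complement of H."""
    return 'H' if consistent_with_H(evidence) else 'not-H'

# On a stream making H true the method conjectures H from the start;
# on a stream with a non-white swan it stabilizes to not-H once one appears.
assert conjecture(['w', 'w', 'w']) == 'H'
assert conjecture(['w', 'b', 'w']) == 'not-H'
```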
The converse holds as well. Let δ be an inductive method. I say that δ
projects a hypothesis H from H along ε at e if ε extends e and δ stabilizes to
H at e. I refer to the set of such data streams as the projection set of the
method δ at e, and denote it by proj(δ, e); formally, proj(δ, e) = {ε ⊇ e : if e ⊆ e′ ⊂ ε and δ(e) ⊨ H ∈ H, then
δ(e′) ⊨ H and δ(e′) ≠ ∅}. Figure 4.3 illustrates this concept.
The proposition that a method δ stabilizes at a given time is refutable with
certainty; for if it is false, then δ changes its mind and conclusively falsifies the
proposition.
Lemma 4.3 Let δ be a discovery method, and let background knowledge K be given. Then for all finite data sequences e, proj(δ, e) ∩ K is refutable with certainty given K.
Now suppose that δ decides a hypothesis H in the limit. Consider the set
Stabil(δ, H) of all finite data sequences at which δ stabilizes to H; formally,
Figure 4.3: The Projection Set of a Discovery Method δ
Stabil(δ, H) = {e : proj(δ, e) ⊨ H}. By Lemma 4.3, Stabil(δ, H) is a countable
collection of refutable hypotheses, each of which entails H. Since δ is reliable,
δ stabilizes to H whenever H is true, and thus each data stream that makes H
true is contained in some member of Stabil(δ, H). Therefore H is a countable
union of refutable hypotheses. The same argument holds for H̄. Taking into
account background knowledge, these observations yield the following result.
Proposition 4.4 (Kevin Kelly) A hypothesis H is reliably decidable in the limit given background knowledge K ⟺ H ∩ K and H̄ ∩ K are each countable unions of hypotheses that are refutable with certainty given K.
A hypothesis like "there are only finitely many elementary particles" (cf.
Section 2.3.6) is a countable union (disjunction) of refutable hypotheses: the
hypothesis is equivalent to "there are no elementary particles or there is at
most one or there are at most two ...". But this hypothesis (call it H) is not
reliably decidable in the limit. For let any method δ be given. An inductive
cousin of a Cartesian demon may lead δ astray as follows. If δ(e) entails H, the
demon presents one new particle after another, until δ conjectures that there
are infinitely many elementary particles. Then the demon stops the flow of new
particles until the theory of δ entails H again, etc. If δ stabilizes to H, then
the demon presents infinitely many particles, and H is false. If δ stabilizes to
H̄, then the demon presents only finitely many particles, and H̄ is false. So if
δ stabilizes to an answer, it is the wrong answer. On the other hand, if δ does
not converge, then δ fails to decide H in the limit. Thus in all cases, δ fails.
Since the demonic strategy works against any method, H is not decidable in
the limit. It follows from Proposition 4.4 that the negation of H, "there are
infinitely many elementary particles", is not a countable disjunction of refutable
hypotheses. However, we saw in Section 2.3.6 that a method δ can reliably
assess H in a weaker sense: if H is true, then δ eventually always entails H;
and if the hypothesis is false, δ will not stabilize to entailing H, although δ may
not stabilize to entailing H̄ either. Following [Kelly 1995, Ch.3], I say that δ
reliably verifies H in the limit (Figure 4.4).
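The demonic argument is itself effectively an algorithm, and can be sketched as follows (my own illustration under an assumed encoding, not code from the text): observations record whether a new particle appears, and the demon consults the method's current conjecture to decide what to present next. The sample method shown is hypothetical; the argument shows any method suffers the same fate.

```python
# Illustration (not from the text): the inductive demon against a method for
# H = "there are only finitely many elementary particles".
# Observation encoding (assumed): 1 = a new particle appears, 0 = none.

def demon_stream(method, steps):
    """Whenever the method conjectures H, present a new particle; whenever it
    conjectures not-H, stop the flow of particles."""
    evidence, conjectures = [], []
    for _ in range(steps):
        c = method(evidence)
        conjectures.append(c)
        evidence.append(1 if c == 'H' else 0)
    return conjectures

def sample_method(evidence):
    """A hypothetical method: conjecture H iff no new particle has appeared
    among the last three observations."""
    return 'H' if 1 not in evidence[-3:] else 'not-H'

# Against the demon the method never stabilizes: it keeps vacillating
# between H and not-H, and so fails to decide H in the limit.
conjectures = demon_stream(sample_method, 50)
assert 'H' in conjectures[10:] and 'not-H' in conjectures[10:]
```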
The difference between decision and verification in the limit arises when a
hypothesis H is false. In that case decision requires that a reliable method
must eventually always entail that H is false, whereas verification allows that a
reliable method may go back and forth between H and its negation H̄. Other
hypotheses that are verifiable but not decidable in the limit are the two almost
universal generalizations from Section 2.3.2, assuming that both of them may be
false on the same data stream, and Newell and Simon's physical symbol system
hypothesis (cf. Section 2.3.5). There are two ways of viewing the fact that the
physical symbol system hypothesis is verifiable but not decidable in the limit.
A supporter of artificial intelligence might point out that there is a reliable
procedure for finding an intelligent machine if there is one, and hence Newell
and Simon's research program is indeed capable of leading us to the truth about
Figure 4.4: Method δ verifies hypothesis H in the limit. (Panel (a): H is true on ε; the conjectures of δ stabilize to H. Panel (b): H is false on ε; the conjectures need not stabilize.)
machine intelligence. A critic might reply that if the physical symbol system
hypothesis is false, there is no reliable method for determining this fact. He may
add that if there is no intelligent machine, then one attempt after another at
building an intelligent system will fail, but no particular history of failures will
appear to be a better reason to abandon the project than another. Reliability,
which means avoiding conjecturing the physical symbol system hypothesis forever when it is
in fact false, puts pressure on the AI proponent to eventually give up her quest; but she may respond to each failure by asking for more patience (and research
grants). Such a critic would conclude that Newell and Simon's faith in the ability
of "empirical research" to settle the physical symbol system hypothesis as they
formulated it is unjustified, and that artificial intelligence researchers need a
stronger theory of intelligence to arrive at plausible background assumptions
that reduce the inductive complexity of the physical symbol system hypothesis.
An argument just like the one for Proposition 4.4 establishes that a hypothesis H is reliably verifiable in the limit just in case H is a countable union of
refutable hypotheses.

Proposition 4.5 (Kevin Kelly) A hypothesis H is verifiable in the limit given background knowledge K ⟺ H ∩ K is a countable union of hypotheses that are refutable with certainty given K.
As an immediate corollary, a hypothesis H is decidable in the limit just in
case both H and its negation H̄ are verifiable in the limit.

Corollary 4.6 (Kevin Kelly) A hypothesis H is reliably decidable in the limit given background knowledge K ⟺ H and H̄ are each reliably verifiable in the limit.
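By contrast with its undecidability, H = "there are only finitely many elementary particles" can be verified in the limit, since it is the countable union of the refutable hypotheses H_n = "there are at most n particles". A sketch of such a verifier (my own illustration, with an assumed 0/1 encoding of observations, not code from the text):

```python
# Illustration (not from the text): verifying H = "only finitely many
# particles" in the limit via the cover H = H_0 ∪ H_1 ∪ ..., where each
# H_n = "at most n particles" is refutable with certainty.
# Observation encoding (assumed): 1 = a new particle appears, 0 = none.

def verifier(evidence):
    """Conjecture H_n for the least n consistent with the evidence, i.e. the
    number of particles seen so far.  Each H_n entails H."""
    n = sum(evidence)
    return ('H', n)

# If only finitely many particles appear, the count eventually stops changing
# and the verifier stabilizes to some H_n, hence to H.  If infinitely many
# appear, the internal conjecture changes infinitely often: no stabilization.
finite_stream = [1, 1, 0, 0, 0, 0]
conjs = [verifier(finite_stream[:k]) for k in range(len(finite_stream) + 1)]
assert conjs[-1] == conjs[-2] == ('H', 2)
```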
Thus the answer to the questions "is H decidable in the limit?" and "is
H verifiable in the limit?" depends on whether H and H̄ can be decomposed
into refutable hypotheses. The structure of refutable hypotheses is as follows.
Recall that a hypothesis H is refutable with certainty just in case the evidence
eventually falsifies H whenever H is false. This condition will fail if and only
if there is a data stream ε that makes H false but no finite amount of evidence
from ε conclusively falsifies H. Following [Kelly 1995, Ch.4], I call such a data
stream a limit point of H. Figure 4.5 illustrates this concept.

To be precise, a data stream ε is a limit point of H just in case all finite initial
segments ε|n of ε are consistent with H. A set of data streams H is closed just
in case H contains all of its limit points, and closed given background knowledge
K just in case H contains all of its limit points that are consistent with K. So
"all swans are white" is closed, but its negation "there is a non-white swan" is
not. A closed hypothesis H is refutable with certainty: if H is false on a data
stream ε, then ε is not a limit point of H since H is closed, which implies that
H is eventually falsified along ε. Conversely, if H is refutable with certainty,
then H is closed: otherwise there is a limit point ε of H missing from H, which
means that H is false on ε but never falsified along ε. Thus we have:
Figure 4.5: Data stream ε is a limit point of hypothesis H.
Proposition 4.7 (Kevin Kelly) A hypothesis H is refutable with certainty given background knowledge K ⟺ H is closed given K.
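The swan example can be checked mechanically on finite truncations (my own illustration of the definitions, not code from the text): the all-white data stream is a limit point of "there is a non-white swan", since every finite initial segment of it is consistent with that hypothesis even though the hypothesis is false on the stream; hence that hypothesis is not closed, and so not refutable with certainty.

```python
# Illustration (not from the text): limit points and closedness for the swan
# hypotheses.  Observation encoding (assumed): 'w' = white, 'b' = non-white.

def consistent_with_nonwhite(evidence):
    """'There is a non-white swan' is consistent with every finite evidence
    sequence: either a 'b' has already occurred, or one may still occur."""
    return True

def consistent_with_allwhite(evidence):
    """'All swans are white' is consistent with e iff no 'b' has occurred."""
    return 'b' not in evidence

all_white = ['w'] * 10   # a finite stand-in for the all-white data stream
# Every initial segment is consistent with "there is a non-white swan",
# so the all-white stream is a limit point of that (non-closed) hypothesis ...
assert all(consistent_with_nonwhite(all_white[:k]) for k in range(11))
# ... whereas the closed hypothesis "all swans are white" is conclusively
# falsified as soon as a non-white swan is observed.
assert not consistent_with_allwhite(['w', 'b'])
```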
This notion of closed set defines a topology in the space of data streams
(cf. [Kelly 1995, Ch.4], [Schulte and Juhl 1996]) whose open sets are the complements of closed sets. Kelly shows that an open set in this topology is a union
of evidence propositions [e]. The open sets are exactly the hypotheses that
are eventually entailed by the evidence whenever they are true; we may call
such a hypothesis verifiable with certainty. Obviously hypotheses that are
verifiable with certainty are verifiable in the limit. Since a refutable (closed) hypothesis is also verifiable in the limit (by Proposition 4.5) and its complement
is open and thus verifiable with certainty, it follows from Proposition 4.4 that
all refutable hypotheses are decidable in the limit (as we observed in Section
4.3).
Topologists refer to countable unions of closed sets (the hypotheses verifiable
in the limit) as Fσ sets. There is a correspondence between quantifier alternations in natural-language definitions of hypotheses and their topological structure: closed sets correspond to universally quantified hypotheses ("all swans
are white"); open sets correspond to existential hypotheses ("there is a non-white
swan"); Fσ sets correspond to ∃∀ claims ("there is a time n such that for
all later times n′, no new particles are discovered"). The correspondence holds
exactly only under certain conditions (cf. [Kelly 1995, Ch.12]), but it is a useful
heuristic for determining the sense in which a given hypothesis is testable.
4.5 Against Method
An essential part of long-run methodology is the notion of a long-term strategy
for responding to evidence. Isaac Levi rejects this picture of inquiry on the
grounds that an agent should be free to review and revise his approach during
the course of inquiry.
In the first place, although I am inclined to the view that there
are some fixed methodological norms..., these norms are extremely
weak and heavily dependent for their operation on contextual factors
which change with developments in scientific inquiry. Consequently,
I too am "against method" if by this one means a very substantive
method immune to criticism and revision during the course of inquiry
itself. [Levi 1980, p.68]
I grant that "contextual factors" may change during the course of inquiry, for
example in a scientific revolution (cf. Section 2.5). But although the course of
inquiry may change how we conceive of a given scientific question, we would still
like to know how to investigate an empirical question as we currently conceive
of it. I grant too that a scientist ought to be able to review his approach to a
research problem and to reject it if he finds it wanting. But it does not follow
that an agent should not make any plans at all, or refrain from comparing
the potential results of one long-term plan with another. This is a question
not just about scientific methodology, but about the role of plans in sequential
decision making in general. Suppose Maite plans to drive from Pittsburgh to
San Francisco via Chicago. As she approaches the exit to Chicago, she checks
the map and realizes that she might take a shortcut to Iowa and avoid Chicago,
with a likely time savings of several hours. Of course she should be able to
change her plan at that point. But rather than taking this as an argument
against making a plan at all, I would say that Maite should have chosen a better
plan; she should have planned to avoid Chicago in the first place. Although
an agent may reject a plan as she is following it, it is plausible to require of a
good plan that it will withstand "criticism and revision" during the course of
its execution. I therefore advocate an equilibrium between the long-run and the
myopic perspective: optimal long-term plans should be such that an agent will
not want to change them during the course of inquiry. For scientific methods,
this motivates the following principle of sequential optimality.
If a method δ is optimal by some criterion C given background
knowledge K, then δ should remain C-optimal after observing evidence e; that is, δ should be C-optimal given background knowledge
K ∩ [e], provided that e is consistent with K.
Game theorists have adopted the principle of sequential optimality as a
constraint on strategies for sequential decisions, under different names and
with slight variations, such as: "subgame-perfection" [Selten 1965], "the backwards induction principle" [Kohlberg and Mertens 1986], "sequential rationality" [Kreps and Ramey 1987], "sequential admissibility" [Bicchieri and Schulte 1997]
(see also Chapter 9). All methodological recommendations presented in this thesis endorse methods that are sequentially optimal. In Chapter 9, I show that
the admissibility principle guarantees sequential optimality, no matter what values are taken as the relevant ones: methods that are admissible with respect
to background knowledge K are admissible after learning evidence e, that is,
with respect to background knowledge K ∩ [e]. But in some decision problems,
the conflict between short-run optimality and long-run benefits is irreconcilable.
For example, suppose that Maite makes a plan to quit smoking tonight. After
dinner, she reviews this resolution and realizes that if she waits another day,
the pleasure from one more cigarette tonight far outweighs the slight decrease in
the risk of lung cancer that quitting today will bring her compared to quitting
tomorrow. Criticizing her original plan in this way, she revises it and plans to
quit tomorrow. But tomorrow the plan to quit is again open to the same criticism, which will lead her to revise it, etc. Medical associations at least would be
"for method" in this case, if this means a commitment to a long-term plan that
the agent honors even when the plan does not look good to her from a myopic
perspective. To sum up: Levi views the possibility that an agent may criticize a
plan during its execution as a reason for not making one; I take it to motivate a
constraint on the choice of long-term strategies for inference, namely that they
should withstand such criticism as inquiry proceeds. I call methods that meet
this requirement sequentially optimal. Serious conflicts between the long-run
and the short-run perspective arise only when there are no sequentially optimal
methods. When such conflicts arise, it is not clear that we should favor the myopic perspective over the long-run perspective; but in any case, the criteria that
I propose for evaluating inductive methods select sequentially optimal methods.
In particular, I show in Chapter 9 that admissible methods are sequentially
optimal.²
Another cluster of objections to long-run reliability as a goal for scientific
inference runs from Keynes's quip "in the long run we are all dead" to PAC
models (see next section) that yield bounds on how long a method may take to
settle on an approximately right answer. These criticisms share the idea that it
is not sufficient for a method to find the truth eventually, but that it should do
so "soon enough", by some deadline known a priori, or at least "sooner rather
than later". The next section takes up these objections.
4.6 Contra Convergence
Is a reliable method an interesting solution to a research problem? We know
that a reliable method will, after some delay, hold the correct opinion for eternity. But since we and our descendants reap the fruits of inquiry for only a
finite time, how do we profit from the limit of inquiry? Furthermore, often,
perhaps in most cases, the inductive problem under investigation changes in
the course of inquiry (see Section 2.5), leading us to change our methods. These
considerations suggest that the value of a reliable method lies not so much in
its actual convergence to the truth, but in its disposition to hold on to the correct opinion. Another way of putting the point is that what a reliable method
achieves, within finite time, is true stable belief, namely belief that does not
change with further evidence. Just as an algorithmic solution to a formal question is guaranteed to find the right answer, but may take any finite amount of
time to do so, a reliable method may take any finite amount of time to produce
² Levi raises another objection to taking into account long-run reliability that stems from his theory of empirical knowledge and how it functions in decision making (cf. Section 2.4). If the agent takes his current conjecture as background knowledge at time t, and evaluates changing his conjecture with respect to that background knowledge, he must decide not to change it because, given his background knowledge, a change would certainly lead to an error. Levi's solution is that the agent may "contract" background knowledge without reflecting on the fact that the result of this contraction might lead to a theory at time t + 1 that is certainly false as far as the agent at time t is concerned. Thus Levi does not want an agent to reflect on whether his theory at the next stage of inquiry is true or not, much less on whether eventually all his theories are true. Levi's myopic stance is implausible, and tied to premises about knowledge and action that I rejected in Section 2.4.
stable true belief. As Plato suggested, we may think of stable true belief as a
sign of "understanding", or having "gotten the gist" of a problem.³
One may insist that true stable belief is not good enough, and that we want
certain belief. Indeed, some say that we want not only the right answer with
certainty; we want to know that we shall not have to wait "too long" to find
the right answer. Thus Kitcher remarks that

To be sure, there are [Bayesian] convergence theorems about the
long run, but as writers from Keynes on have pointedly remarked,
we want to achieve correct beliefs in the span of human lifetimes.
[Kitcher 1993, p.293]

The second of these demands is stronger than the first: if it is possible to
give a deadline (e.g., "human lifetime") by which a method is guaranteed to
find a right answer, then we can be sure that the method's conjecture is correct
when the deadline is reached.⁴ We all want lots of things, especially instant true
theories. But as skeptical arguments from Sextus to Popper to Kelly pointedly
show, it is a fact of life that, unless the scientist is granted strong background
assumptions, there is no philosopher's stone that is guaranteed to turn the lead
of evidence into the gold of true generalizations. Still, as Kitcher points out,
"prior background practice" in a number of research problems rules out enough
possibilities for "eliminative induction" to single out one of the alternative hypotheses as the only one that "accommodates" the data [Kitcher 1993, Ch.7].
If this is not the case, he says, the "reasoning" involved should be "frowned
upon".⁵ Perhaps the suggestion is that we should adopt a quasi-conventional
standard according to which a research problem is feasible and worthy of investigation, relative to given background knowledge, only if the background
³ Plato neglected the possibility that an agent may have stable true belief for the wrong reasons. I describe such a case in Section 5.4, and show how reliable methods can avoid this problem.
⁴ An analogy to computability theory suggests another way of defining "not too long". An "efficient" solution to a computational problem is an algorithm whose run-time is a polynomial function of the size of the input. By analogy, one might suggest that the amount of evidence that a reliable method requires to find a correct answer should be a polynomial function of the size of the space of alternatives H. This is one of the core ideas of the PAC learning paradigm. (To meet the PAC criterion, the required evidence must be a polynomial function of H as well as a "confidence" and a "tolerance" parameter; see below in this section.) This idea clearly applies only in settings in which the hypothesis space H is finite, and there is a natural way of defining its "size".
⁵ "However, when the constraints are lax or when confidence in the completeness of prior practice is (quite reasonably) low, there is room for doubt about hypotheses that accommodate accepted results. The problem in this case is not accommodation itself, but the state of prior practice. I conjecture that opponents of accommodation have been moved by examples in which the constraints from prior practice are lax, and there is no serious attempt to explore a space of rival hypotheses. That type of reasoning should be frowned upon, but the troubles should be traced to failure to activate the eliminative propensity, not to some supposed defect in the strategy of accommodation." [Kitcher 1993, p.246]; emphasis Kitcher's.
assumptions imply that some "reasonably small" amount of evidence will eliminate all but one alternative, somewhat in the manner of computer scientists
who consider a computational problem "tractable" only if it can be solved by a
Turing machine whose run-time is a polynomial function of the size of the input.
It is not my purpose to enter into a discussion of whether we should adopt such
a standard of feasibility for research problems. I note however that along with
the problem of induction, Kitcher's standard throws out many research projects
of interest in science, for example, finding a complete theory of particle physics
and investigating gravitational theories, as well as most of the inductive problems that philosophers have found worthy of reflection, such as Reichenbach's
problem of inferring long-run frequencies, and Goodman's Riddle of Induction.
An interesting alternative would be to define a discovery problem as tractable
if and only if there is a reliable method for solving it. By Proposition 4.2, this
means that we should restrict attention to hypotheses that are decidable in the
limit. By that standard, Goodman's Riddle is tractable (cf. Section 2.3.3), and
so is the problem of identifying conservation principles for particle reactions
(Chapter 8). But Simon and Newell's physical symbol system hypothesis would
not count as tractable, as we saw in the previous section.
Another attempt to achieve more than long-run reliability is to appeal to
short-run chances. Even if the available evidence does not yield certainty as
to which theory is correct, we might want to follow methods that have a high
chance of producing a correct theory. For example, Kitcher holds that
[Our] normative project is to consider whether making particular
kinds of changes on the basis of particular kinds of processes is likely
to promote cognitive progress. [Kitcher 1993, p.221]
But short-run chances at the truth may add up to less than long-run reliability: there is no guarantee that repeated guesses will ever stop vacillating.
Scientists who take myopic stabs at the truth might end up chasing their tails,
rather than engaging in the systematic, cumulative and progressive enterprise
that reliability theory (and realists like Kitcher) envision.⁶ On the other hand,
if a method with a high likelihood of producing correct theories in the short run
actually does lead to true hypotheses in the long run, a reliabilist can embrace it
as having a desirable short-run feature in addition to long-run reliability. Thus
the appeal to short-run chances does not inherently seem to be in conflict with
⁶ It seems that Kitcher fails to appreciate this point. In various places his vision of science involves "cumulative progress"; more and more parts of our scientific theory "persist" through time, even if others are discarded. Thus his "optimistic induction" [Kitcher 1993, p.137] says that since through time and changing theories, science has always held on to some parts of its current theories, we may infer that science will continue to accumulate beliefs. Another part of Kitcher's picture that assumes stable belief through time is his account of natural kinds, which he takes to be those which are eventually always included in the successive ontologies that science produces [Kitcher 1993, p.172]. Long-run reliability can sustain this picture of cumulative progress; short-run chances at truth cannot.
reliability analysis, if we understand it as an addition rather than an alternative
to an investigation of where a given type of inquiry is headed in the long run.
The most serious shortcoming of the short-run likelihood approach is that it is
hard to get off the ground. How can we establish what the chance is that a
given method will produce a true theory? The naturalist approach is to launch
an "empirical investigation", typically of the "psychological propensities", or
"exemplary reasoning", of prominent figures of science, apparently in the expectation that these propensities, or that kind of reasoning, are likely to produce
true theories, at least in our world. For example, Kitcher proposes to analyze
Galileo's way of doing science as follows.
First we must identify the processes that underlie Galileo's beliefs
and decisions. Second we have to assess the ability of such processes
to generate and sustain cognitively valuable states. Both endeavors
are error-prone. But there is a fact of the matter: either Galileo's
reasoning exemplified a strategy likely to promote cognitive goals or
it did not. [Kitcher 1993, p.186]
How are we to "assess the ability" of a psychological propensity to produce
the truth? Kitcher and other naturalists suggest that we should consider the performance of such a propensity across a number of "epistemic contexts" [Kitcher 1993,
p.189], which for the purposes of the present discussion we may equate with pairs
of discovery problems and finite data sequences. A number of problems arise:
How do we choose a "representative sample" of epistemic contexts on which we
should test the scientific disposition in question (cf. [Kitcher 1993, p.237])? And
given such a sample, how do we "identify the processes that underlie Galileo's
beliefs and decisions" precisely enough to tell what Galileo would have said
in a given epistemic context? Who can say whether Galileo would have, for
example, concurred with Newton's rejection of the wave theory of light? And
if we could say what Galileo's cognitive strategies were, how can we justify
inferences from "these strategies worked for problems P₁, P₂, …, Pₙ" to "these
strategies will work for problem Pₙ₊₁"?⁷ Faraday relied heavily on visualization
in his research into electromagnetism. Would this cognitive strategy have helped
or hindered him in space-time physics? Newton made progress on mechanics,
but set back the wave theory of light, and got nowhere at all in his alchemy; is
there any reason to think that his "cognitive strategies" were different in these
respective researches? From the reliabilist point of view, different problems
demand different solutions that take advantage of the special structure of the
particular problem involved. No general "methodological maxim", or "cognitive
strategy", or "style of reasoning" can beat the trite advice "choose an optimal
strategy for the problem at hand". For if the result of such "propensities" contradicts this trivial advice, then by definition the result is less than optimal.
Thus I critiqued the maxim of "minimal change" belief revision in Chapter 3
⁷ The introduction to [Donovan et al. 1988] discusses this issue further.
by describing how it can lead to bad results on a simple problem. Learning
theorists criticize the Bayesian maxim "thou shalt conditionalize" because this
principle can lead scientists who are not logically omniscient into severe losses
of reliability [Kelly and Schulte 1995b], [Osherson and Weinstein 1988]. And
if psychological analysis can provide us with enough information about what
Galileo's or Newton's cognitive strategies exactly are, we could and should hold
them up to reliabilist standards in the same way.⁸
Fairly recently, computational learning theory has found an ingenious way
to argue for the likelihood of a method's conjecture being true given a finite
(polynomial-size) amount of evidence [Valiant 1984]. In essence, the argument
is this. We suppose that the data were generated by random sampling from an
(unknown) probability distribution. A method PAC-learns if, for any sampling
distribution, after some bounded number of samples, the method produces,
with a probability (determined by the sampling distribution) of at least δ, a
hypothesis whose margin of error (determined by the sampling distribution) is
no greater than ε; here 0 ≤ ε < 1 and 0 < δ ≤ 1 may be specified arbitrarily.
As I mentioned, we can include such short-run performance in a long-run reliabilist framework by asking whether repeated applications of the method would
stabilize to a true hypothesis. However, there are serious technical problems in
generalizing the PAC idea to the kinds of empirical hypotheses that I am concerned with. The PAC framework applies to concepts for classifying instances,
not arbitrary empirical propositions. Even more importantly, the performance
guarantees lean heavily on the assumption that the data are sampled at random.
But in the kinds of scientific problems I am considering, there is no reason to
think that successive observations are independent of each other.⁹
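For concreteness, footnote 4's polynomial-evidence idea can be sketched with the standard finite-class sample bound from the PAC literature (my addition, not from the text; note that the textbook convention writes the confidence as 1 − δ, whereas the passage above writes it directly as δ): a learner that outputs any hypothesis consistent with m random samples has error at most ε with probability at least 1 − δ, provided m ≥ (1/ε)(ln |H| + ln(1/δ)).

```python
import math

# Illustration (standard textbook bound, not from the text): sample size
# sufficient for a consistent learner over a finite hypothesis class H in
# the realizable PAC setting, with confidence 1 - delta and tolerance epsilon.

def pac_sample_bound(h_size, epsilon, delta):
    """Smallest integer m with m >= (1/epsilon) * (ln|H| + ln(1/delta))."""
    return math.ceil((math.log(h_size) + math.log(1.0 / delta)) / epsilon)

# The required evidence grows only logarithmically in |H| and linearly in
# 1/epsilon and ln(1/delta); this is "polynomial" in the sense of footnote 4.
assert pac_sample_bound(1000, 0.1, 0.05) == 100
assert pac_sample_bound(10**6, 0.1, 0.05) < 2 * pac_sample_bound(10**3, 0.1, 0.05)
```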
4.7 There's No Reliable Method: So What?
What conclusions should we draw about a research project (say, a discovery
problem) when it has no reliable solution? Such a negative result seems to
support a skeptical attitude about the ability of empirical science to solve the
questions at issue. But the skeptic's case against the possibility of knowledge
(or reliably inferred stable true belief) rests on material assumptions that she
cannot know (or reliably infer). For example, an improvement in instruments
for detecting particles may reduce the inductive complexity of particle theories.
⁸ For example, Newton's fourth "rule of reasoning" [Newton 1995, Book III] amounts to what [Kelly et al. 1995] call "stubbornness" (cf. Section 3.5). Stubbornness is unreliable if the hypotheses under investigation are not refutable with certainty, as in the example of the two almost universal generalizations (cf. Section 4.3).
⁹ However, the PAC framework may be appropriate for theories of particle reactions. We shall see in Chapter 8 that theories of particle reactions may be thought of as concepts classifying reactions into "possible" and "impossible". And it seems that one may assume that successive observations of reactions in, say, a bubble chamber are independent of each other.
Direct observations of brain activity may allow us to test theories of human
cognition in a way that stimulus-response data do not. In 1825, Auguste Comte
offered speculation about the chemical composition of the stars' atmosphere as
the kind of metaphysical speculation grounded in no empirical evidence that
was unworthy of a good positivist [Comte 1968]. But then spectral analysis
came along. Thus reliabilist underdetermination arguments do not establish,
once and for all, that an empirical question is beyond the reach of science.
Rather, they show a gap between the ambitions of inquiry and its current
means. What attitude should we take towards such a gap? One kind of response
would be to set a lower standard of success (for example, gradual verification
rather than decision in the limit) or to introduce more assumptions to make
the problem simpler. This is how a computer scientist treats an intractable
computational problem: she might seek an algorithm whose outputs are always
within a constant factor from the optimal solution, or she might isolate special,
tractable cases of the original problem. Another response is to acknowledge
that for all we are currently willing to assume about a given inductive problem,
our methods might fail to find the correct answer, but that nonetheless we need
not "worry" about this possibility because it is "not significant". Probability
theory supplies the tools for developing one version of this response. [Kelly 1995,
Ch.13] shows how, and critiques the result from the point of view of learning
theory (see also [Juhl and Kelly 1994]). Since these matters are complex, not
directly relevant to the project of this thesis, and I have nothing to add to the
detailed discussions in the literature, I will not take them up here.
In any case, it is clear that we prefer to find the truth sooner rather than later, and that we may be unhappy if we don't know when our methods will deliver it. The discontented may insist on strong background assumptions, or appeal to probabilistic arguments, to feel justified in believing that they will soon have a correct theory. Reliability theory has no general arguments for or against making background assumptions; but methodology should be tolerant enough to give guidance to those who don't want to neglect possibilities. Even if we don't set science a deadline by which it must deliver the certain truth, but content ourselves with eventually true and stable, but uncertain, beliefs, we still want methods that do not unnecessarily delay arriving at correct theories. Such methods are the first topic of Chapter 6. Before we take up that topic, I take a closer look at reliable methods in the next chapter.
4.8 Proofs
[Kelly 1995] gives the proofs of Propositions and Corollaries 4.2, 4.4, 4.5, 4.6
and 4.7: My Proposition 4.2 follows from his Proposition 9.12, my Propositions
4.4, 4.5, 4.6 from his 4.10, and my Proposition 4.7 from his Proposition 4.6.
Here's the proof of Lemma 4.3.
Lemma 4.3 Let δ be a discovery method, and let background knowledge K be given. Then for all finite data sequences e, proj(δ, e) ∩ K is refutable with certainty given K.

Proof. Let ε be a data stream in K that is not a member of proj(δ, e) ∩ K. Then ε is not a member of proj(δ, e) = {ε ⊇ e : if e ⊆ e′ ⊆ ε and δ(e) ⊨ H ∈ H, then δ(e′) ⊨ H and δ(e′) ≠ ∅}. If ε does not extend e, then proj(δ, e) is refuted when ε deviates from e. Otherwise there is an initial segment e′ of ε extending e such that δ(e′) is either inconsistent or fails to entail δ(e). In either case, e′ conclusively falsifies proj(δ, e). □
Chapter 5
Reliable Inference
5.1 Outline
The previous chapter determined when it is feasible to reliably converge to the right answer for a given discovery problem. This chapter characterizes what the methods that achieve this goal are like. I prove a normal form theorem which shows that (virtually) all reliable methods take a form known as the bumping pointer method. This normal form theorem suggests a more flexible and plausible formulation of the falsificationist position. Suppose that the hypotheses under investigation are unfalsifiable, and that there is a reliable method δ for identifying the true alternative. (δ would not be a conjectures-and-refutations method.) The theorem shows how to decompose the original hypotheses into falsifiable conjectures, so that we may use a conjectures-and-refutations method for those hypotheses to reliably identify a theory that gives the right answer to the questions under investigation. The result is a conjectures-and-refutations method δ′ that is just as reliable as the method δ designed for the original hypotheses, yields more content than δ, but also makes more errors. Thus we may construe Popperian recommendations as specifying the means of choice for those chiefly concerned with reliably finding the truth, and more concerned with providing content than with avoiding errors.
Another application of the normal form theorem concerns the ancient philosophical question about the relationship between true belief and knowledge. According to a traditional proposal, knowledge is justified true belief. Edmund Gettier refuted this idea with a famous counterexample [Gettier 1963]. The essence of Gettier's counterexample is this: suppose that (1) an agent is justified in believing a claim C, (2) C entails H, and this leads the agent to believe H, and (3) H is true but C is false. Then the agent's belief in the true proposition H is justified, but intuition says that the agent does not know H, because she believes H for a wrong reason, namely because she accepts C. Now presumably the agent would abandon her belief in H if she received evidence refuting C. Thus if we took knowledge to be stable true belief, belief that does not change with more information about the world, the agent's belief in H would not count as knowledge, as intuition confirms. Indeed, in the Meno dialogue [Plato 1967], Plato stipulated that knowledge must be stable true belief. Although Plato's condition solves the Gettier paradox, I show with a counterexample that it cannot be the whole story about knowledge: I describe an infinitary version of the Gettier problem in which an agent has stable true belief, but for infinitely many wrong reasons. On the other hand, the normal form theorem shows that for any discovery problem that admits a reliable solution, there is a reliable method δ that avoids the infinitary Gettier problem. That is, if the method δ stabilizes to a true alternative H, its internal beliefs supporting H are eventually true as well. There are no Gettier-type counterexamples to regarding the stable true beliefs of such agents as knowledge.
5.2 Reliable Methods
What are reliable methods like? To answer this question, I first establish that we can view any reliable discovery method as an instance of what is known as the bumping pointer architecture [Osherson et al. 1986], [Kelly 1995, Ch. 9] (provided the method satisfies a small proviso specified below). Bumping pointer methods are like Popperian conjectures-and-refutations methods in that they enumerate a collection of refutable hypotheses and move on to the next hypothesis if and only if the evidence falsifies the current hypothesis. The difference is that a bumping pointer method needn't output the refutable hypothesis itself, but may conjecture a weaker theory. For example, take the two almost universal generalizations from Section 2.3.2, Hw = "almost all swans are white" and Hb = "almost all swans are black". Hw is equivalent to an infinite disjunction, "all swans are white, or at most one swan is black, or at most two swans are black, or ...". Let C^w_1, C^w_2, ... denote these disjuncts, such that C^w_k is true just in case at most k swans are black. Note that each C^w_k is refutable with certainty: if there are more than k black swans, C^w_k is falsified when k+1 black swans are observed. Similarly, we may decompose Hb into refutable disjuncts C^b_1, C^b_2, ... such that C^b_k is true just in case at most k swans are white; see Figure 5.1.
Now combine the refutable hypotheses C^w_k, C^b_k into one big enumeration C_1, C_2, ... such that each C^w_k and C^b_k hypothesis occurs exactly once in the C_i enumeration. A bumping pointer method based on the C_i enumeration employs an "internal pointer" to mark the first hypothesis C_k that is consistent with the given evidence e, and conjectures Hw or Hb depending on whether C_k entails Hw or Hb. For example, suppose that C_1 = C^w_0, "all swans are white", and that C_2 = C^b_1, "at most one swan is white". Then the bumping pointer initially points to C_1, which entails Hw, so the method initially conjectures Hw, "almost all swans are white". If the first observed swan is black, the pointer bumps to C_2, so the method conjectures Hb, "almost all swans are black". The method from Section 2.3.4 that reliably identifies the true limiting relative frequency of an event among the two alternatives 1/4 or 3/4 can be implemented as a bumping pointer method: let the "internal" hypotheses be of the form "at stage n and all later times, the observed relative frequency will always be closer to 1/4 (respectively, 3/4) than to 3/4 (respectively, 1/4)". Each of these internal hypotheses is refutable with certainty. The bumping pointer method that operates on these internal hypotheses works in the spirit of Reichenbach's straight rule: it conjectures that the limiting frequency of an event is 1/4 (respectively, 3/4) as long as the observed frequency is closer to 1/4 (respectively, 3/4).

Figure 5.1: Decomposing Hypotheses Into Refutable Subsets.

Figure 5.2: The Bumping Pointer Method
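The straight-rule behaviour just described can be sketched in code (an illustrative sketch of my own, not the thesis's formal method; `frequency_conjecture` and the tie-breaking in favour of 1/4 are assumptions):

```python
from fractions import Fraction

def frequency_conjecture(outcomes):
    """Conjecture 1/4 or 3/4 as the limiting relative frequency,
    depending on which value the observed relative frequency is closer
    to. In the spirit of the bumping pointer method, the conjecture
    changes only when the observed frequency crosses 1/2.
    Ties (and the empty sequence) are broken in favour of 1/4."""
    if not outcomes:
        return Fraction(1, 4)
    freq = Fraction(sum(outcomes), len(outcomes))
    if abs(freq - Fraction(1, 4)) <= abs(freq - Fraction(3, 4)):
        return Fraction(1, 4)
    return Fraction(3, 4)

print(frequency_conjecture([1, 1, 1, 0]))  # observed 3/4 -> conjectures 3/4
print(frequency_conjecture([1, 0, 0, 0]))  # observed 1/4 -> conjectures 1/4
```

Exact rational arithmetic (`Fraction`) avoids the floating-point rounding that could otherwise flip a conjecture near the 1/2 boundary.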
To define the notion of a pointer precisely, let a countable collection C of refutable empirical hypotheses be given, and let τ be an enumeration of the hypotheses in C. Then pointer(C, τ, e) is the first hypothesis C in C, in the order of τ, such that C is consistent with e. (Here and elsewhere I assume that there is such a hypothesis C in C.) Now consider a collection of alternative hypotheses H and background knowledge K that define a discovery problem. Let C be a countable collection of refutable empirical hypotheses, each of which entails some hypothesis in H (i.e., for each C in C there is an H in H such that C ⊨ H). I say that C is a refinement of H. Given evidence e, a bumping pointer method based on an enumeration τ of a refinement C conjectures the hypothesis H from H that is entailed by pointer(C, τ, e); see Figure 5.2.
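The pointer and the bumping pointer method for the swan example can be sketched as follows (an illustrative sketch, not the thesis's formal apparatus: each refutable hypothesis is represented as a consistency test on finite data sequences paired with the alternative it entails, and the names `HW`, `HB`, `c_w`, `c_b` are mine):

```python
# Bumping pointer sketch for the almost-universal swan generalizations.

HW = "almost all swans are white"
HB = "almost all swans are black"

def c_w(k):
    """C^w_k: 'at most k swans are black'; entails HW."""
    return (lambda e: e.count("black") <= k), HW

def c_b(k):
    """C^b_k: 'at most k swans are white'; entails HB."""
    return (lambda e: e.count("white") <= k), HB

def enumeration(n):
    """Interleave the two families: C^w_0, C^b_0, C^w_1, C^b_1, ..."""
    cs = []
    for k in range(n):
        cs.append(c_w(k))
        cs.append(c_b(k))
    return cs

def pointer(cs, e):
    """The first hypothesis in the enumeration consistent with e."""
    for consistent, entailed in cs:
        if consistent(e):
            return consistent, entailed
    raise ValueError("no hypothesis in C is consistent with e")

def conjecture(e):
    """Conjecture the alternative entailed by the marked hypothesis.
    len(e) + 1 hypotheses per family always suffice for consistency."""
    _, entailed = pointer(enumeration(len(e) + 1), e)
    return entailed

print(conjecture([]))         # almost all swans are white
print(conjecture(["black"]))  # pointer bumps: almost all swans are black
```

Note that the method outputs only the weaker alternative (Hw or Hb) entailed by the marked refutable hypothesis, exactly the feature that distinguishes bumping pointer methods from pure conjectures-and-refutations methods.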
The reason why reliable discovery methods can be implemented as bumping pointer methods is this. Let δ be a reliable method for a given discovery problem. By Lemma 4.3, the projection sets of δ are each refutable with certainty. So there is a bumping pointer method δ′ that operates on a collection C comprising the projection sets of δ. The pointer of δ′ bumps just in case δ changes its mind. Furthermore, the projection sets of δ are subsets of a hypothesis H from H, since a data stream ε is in H whenever δ stabilizes to H along ε. Hence the refutable sets in C (the projection sets of δ) entail the same hypotheses in H that δ stabilizes to; the proof of Proposition 5.1 shows that if the bumping pointer method δ′ enumerates the projection sets of δ in the right way, δ′ and δ conjecture the same alternative from H whenever δ projects this alternative. The bumping pointer method δ′ will thus differ from δ only if δ does not project any alternative from H on a given data sequence e. Since in that case δ is sure to change its mind about its hypothesis δ(e) no matter what, I say that δ does not take seriously the conjecture δ(e). Formally, δ always takes its conjectures seriously given K if for all e consistent with K, proj(δ, e) ∩ K ≠ ∅. Finally, since δ is reliable given K, δ stabilizes along each data stream ε consistent with K, so that ε is contained in some projection set of δ; I say that the collection of projection sets covers K. All together, this shows that a reliable discovery method δ can be implemented as a bumping pointer method if δ always takes its conjectures seriously.
Proposition 5.1 Suppose that δ is a reliable discovery method for H given K, and that δ always takes its conjectures seriously given K. Then δ can be implemented as a bumping pointer method. That is, there is a refinement C of H, an enumeration τ of C, and a bumping pointer method δ′ defined by C and τ such that for all finite data sequences e consistent with K, δ(e) ⊨ H ⟺ δ′(e) ⊨ H.
5.3 Popper, Levi and Deductivism

The bumping pointer method suggests a plausible formulation of a falsificationist approach to scientific discovery. This proposal also addresses an unsolved problem for Levi's epistemology: how can an agent rationally decide to change her beliefs when she is certain that her beliefs are true?

5.3.1 A New and Improved Falsificationism

We saw in Section 4.3 that Popper's conjectures-and-refutations method is unreliable when the alternatives under investigation are decidable in the limit but not refutable with certainty. Proposition 5.1 offers a falsificationist an interesting reply: he can argue that the original discovery problem defined by the alternatives H should be reposed with the alternatives C in place of H, where C is a collection of refutable sets that refines H and covers K. I call this translation of the problem popperizing a discovery problem. For example, to popperize the problem of identifying an almost universal generalization about swans, we may take the alternative hypotheses to be the collections H^w_k and H^b_k, that is, hypotheses of the form "at most k swans are white" or "at most k swans are black". A conjectures-and-refutations method reliably identifies a true alternative from C since each such alternative is refutable with certainty (cf. Section 4.3). Another way of putting the matter is that Proposition 5.1 permits us to think of a reliable method δ as a bumping pointer method, and we can turn a bumping pointer method into a conjectures-and-refutations method that produces the "internal" refutable hypotheses marked by the bumping pointer. The result of popperizing the original method δ in this way is a method δ_P that is just as reliable, but produces refutable strengthenings of the original hypotheses. A good reason for following δ_P is that δ_P is more informative; δ_P dominates δ with respect to content, in the sense of Definition 3.2. A good reason for following the original method δ is that δ makes fewer errors; δ dominates δ_P with respect to error, in the sense of Definition 3.1. (δ also changes its mind less often; see Definition 6.2 below.) Thus we may construe a falsificationist's preference for refutable hypotheses and the conjectures-and-refutations method as the right advice for someone who wants to reliably find the truth, and cares more about content than about avoiding errors.

Popperian conjectures-and-refutations methods are stubborn and timid in the same sense as AGM methods (see Section 3.5): they hang on to their conjectures as long as these are consistent with the data. As constraints on inductive methods, the AGM principles are essentially the conjectures-and-refutations scheme minus falsifiability. The way to make AGM methods into reliable discovery methods is to add falsifiability (i.e., let them produce refutable theories). Thus we find reliable AGM methods at the same high-content side of the content-error continuum as Popper's conjectures-and-refutations approach.
5.3.2 A Reliable Enterprise of Knowledge

The bumping pointer architecture can accommodate Levi's epistemology (cf. Section 2.4), and indeed suggests a solution to a problem that Levi left open. The problem is this: Levi requires that an agent adopt her current theory as background knowledge. But if the agent is certain that her current beliefs are true, how can she rationally decide to abandon any of them? Retracting any of her beliefs incurs a loss of content without eliminating any serious possibilities of error (that is, retracting beliefs is not content-error acceptable in the sense of Definition 3.3). Levi suggests that "if X detected inconsistency in his initial corpus, he could have excellent reasons to contract. An inconsistent corpus fails as a standard of possibility" [Levi 1983, p. 165]. Levi does not make clear what other rational reasons an agent may have to contract.¹

¹Levi claims that an agent would be justified in giving a theory "a hearing" if the theory has "superior explanatory virtues", even when he knows that the theory, despite its explanatory power, must be false [Levi 1983, p. 166]. But for this to work, providing good explanations must be an important epistemic value, so important that an inquirer should sometimes consider explanations that she knows to be false because they would be so nice if they were true. But Levi's official view is that content and avoiding error are by far the most important epistemic values. For example, in the same paper, Levi insists that an agent should not change to a more "testworthy" theory, à la Popper, because, Levi says, testworthiness is not related to content and error; see the quotation below.

But if X makes the falsificationist move from the previous section, adopting beliefs that are refutable with certainty, the only reason that X needs to retract beliefs is inconsistency with the current evidence. At the same time, the agent's retractions through time will take him to true beliefs, solving another mystery that puzzles Levi:

But if our proximate aim is merely to accept hypotheses for purposes of testing them and somehow this concern is seen to promote the long-run aim of obtaining the true complete story of the world (in a manner which remains a mystery to me), then truth or avoidance of error may be an important desideratum in the long run. It has no importance, however, in the proximate aims of inquiry. If testworthiness is what we are after, we need not concern ourselves with the truth values of our hypotheses or with avoiding error. ... Thus, Popper's view succeeds in placing truth on a pedestal remote from the immediate concerns of inquiry. [Levi 1983, p. 168]

There is no mystery at all about how the bumping pointer method uses tests to promote the long-run aim of finding the truth about the questions under investigation. Rather than placing truth on a "pedestal" remote from empirical tests, the normal form theorem proves that testing strategies must be part of reliable processes for finding the truth. Whatever "testworthiness" may be, testing alternative hypotheses against empirical evidence is an indispensable means to the long-run aim of finding the right answer to the questions that prompted inquiry.

5.4 Gettier meets Meno

What are reliable bumping pointer methods like? A bumping pointer method is reliable given background knowledge K for a collection of alternatives H if its refinement C of H covers all serious possibilities in K, that is, if K ⊨ ∪C. For if this is the case, then some hypothesis C_k in C is true whenever K is, and hence always consistent with the evidence; eventually the data falsify all refutable hypotheses C_1, C_2, ..., C_{k-1} preceding C_k, and the bumping pointer method stabilizes to C_k. However, there are reliable bumping pointer methods whose refinement C does not cover all serious possibilities. For an example, recall the physical symbol system hypothesis from Section 2.3.5. Suppose that
computers are capable of intelligence, and that only one paradigm of cognitive science, say connectionism, is capable of producing true machine intelligence. However, a model from another paradigm, say production systems, can account for any finite component of intelligent behavior. Figure 5.3 depicts this situation. Suppose that the data stream ε is the actual one, so a connectionist neural-network model N is true on ε. The models P_0, P_1, ... are production systems that account for parts of the evidence from ε. Let us focus on the empirical content of N and P_0, P_1, ...; call this set of data streams K. Consider a bumping pointer method δ_PS whose enumeration of refutable sets includes P_0, P_1, ... but not N, in such a way that for any finite data sequence ε|n from ε, some production systems model P_i is the first hypothesis that is consistent with ε|n. These refutable sets do not cover ε because the connectionist model N is missing. Nonetheless, δ_PS is a reliable method for investigating the physical symbol system hypothesis over the range of possibilities K:² since all along ε, the bumping pointer points to a production-system model P_i, and each P_i entails that the physical symbol system hypothesis H_PSS is true, δ_PS correctly stabilizes to H_PSS along ε. Another reliable bumping pointer method δ_N may include N among its closed sets in addition to P_0, P_1, ...; along ε, its pointer would eventually stabilize to N. Throughout K, and in particular on ε, both methods give exactly the same answers about the physical symbol system hypothesis. But there is something unsatisfactory about the success of δ_PS: this method has the right opinion about the possibility of machine intelligence, but for the wrong reason; it never abandons its (internal) false belief that some production-system model of intelligence is adequate. What is at stake here philosophically is that we don't want just the right behavior, but the right behavior for the right reasons. This is the sort of problem that Edmund Gettier raised for accounts of knowledge as justified true belief [Gettier 1963].³ A typical Gettier case is the following:

Two other people are in my office and I am justified on the basis of much evidence in believing the first owns a Ford car; though he (now) does not, the second person (a stranger to me) owns one. I believe truly and justifiably that someone (or other) in my office owns a Ford car, but I do not know someone does. [Nozick 1981]

²We may extend δ_PS to a reliable method over the other possibilities outside of K with the reliable method described in Section 2.3.5.

³As Plato noted, "And presumably as long as he has right opinion about matters of which the other has knowledge, he will be no worse a guide than the man who understands it, even though he only believes truly without understanding" [Plato 1967, 96d–98b]. The problem is one of global underdetermination: even if we knew everything about the overt behavior of an agent, or an entity, at all times, this information would not determine the internal reasons for the agent's behavior. Situations like this give rise to philosophical difficulties. One example is Searle's famous attack on Turing's test for machine intelligence [Searle 1980]. Searle argues that even if an entity acts intelligently, it is not actually intelligent unless it produces the intelligent behaviour in the right way; a computer program, Searle says, is a wrong way. Another example is the question of whether a good deed suffices to earn moral credit, or whether in addition the deed must be done with the right kind of intentions. Kant for one held that a dutiful action loses all moral value if the actor was emotionally inclined to perform the act [Kant 1785].

Figure 5.3: Given the observations from data stream ε, a connectionist model N is the true theory of machine intelligence, but the production systems approach is never conclusively refuted along ε.
In terms of methods with a bumping pointer, the Gettier problem arises when the pointer bumps from a false "internal" hypothesis C_1 (the first person owns a Ford car) that entails a hypothesis H (someone in my office owns a Ford car) to a true "internal" hypothesis C_2 (the second person owns a Ford car) that also entails H. When the pointer marks C_1, the method believes H, but for the wrong reason.

Similarly, we are inclined to say that a follower of the production system method δ_PS does not know at any point that the physical symbol system hypothesis is true when the connectionist approach is the right one, whereas the connectionist does know it, at least after some time. This shows that the Platonic condition that knowledge must be stable true belief may be necessary but is not sufficient. My example is an infinitary version of the Gettier problem, in which a method settles on the right hypothesis because of infinitely many wrong reasons. The proof of Proposition 5.1 shows that we can avoid Gettier cases by choosing the right kind of refinement C of H. More precisely, there is a refinement C such that if the pointer of a bumping pointer method moves from an internal hypothesis C_1 to another C_2, then C_1 and C_2 entail different alternatives from H. Although some reliable methods may have the Gettier problem, for any solvable discovery problem there is a reliable method that does not.
5.5 Proofs

It is useful for the proof of Proposition 5.1 to have some notation for the last place on a given finite data sequence e at which a discovery method δ changed its mind. Let LastMC(δ, e) be the shortest sequence of data e′ contained in e such that δ does not change its mind between e′ and e, i.e., for all e′ ⊆ e″ ⊆ e, δ(e″) ⊨ H ⟺ δ(e) ⊨ H. If δ changes its mind at e, then LastMC(δ, e) = e. Recall that e∗x denotes the finite data sequence in which datum x follows the observations e. So LastMC(δ, e∗x) = LastMC(δ, e) if and only if δ does not change its mind at e∗x.
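Under the simplifying assumption that a method is a function from finite data sequences (tuples) to conjectures, and that "entailing the same alternative" is approximated by equality of conjectures, LastMC might be computed like this (an illustrative sketch; the names are mine):

```python
def last_mind_change(method, e):
    """Return the shortest prefix e' of e such that the method's
    conjecture agrees with its conjecture on e at every sequence
    between e' and e (inclusive). If the method changes its mind at
    e itself, this returns e. `method` maps tuples of data to a
    conjecture; conjecture equality stands in for entailing the
    same alternative from H."""
    e = tuple(e)
    for cut in range(len(e) + 1):
        if all(method(e[:j]) == method(e) for j in range(cut, len(e) + 1)):
            return e[:cut]
    return e

# A toy method: conjecture the most recent datum (None on no data).
toy = lambda seq: seq[-1] if seq else None

print(last_mind_change(toy, (1, 1, 1)))  # (1,): last change was at length 1
print(last_mind_change(toy, (1, 2)))     # (1, 2): mind changed at (1, 2) itself
```

The second call illustrates the boundary clause: when the method changes its mind at e itself, LastMC(δ, e) = e.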
Proposition 5.1 Suppose that δ is a reliable discovery method for H given K, and that δ always takes its conjectures seriously given K. Then δ can be implemented as a bumping pointer method. That is, there is a refinement C of H, an enumeration τ of C, and a bumping pointer method δ′ defined by C and τ such that for all finite data sequences e consistent with K, δ(e) ⊨ H ⟺ δ′(e) ⊨ H.
Proof. For abbreviation, I write T ≈ T′ if T ⊨ H ⟺ T′ ⊨ H, for all H ∈ H. Let δ be a reliable discovery method for H. Choose a 1-1 encoding ⟨·⟩ of finite sequences of natural numbers into natural numbers that is monotone in the sense that ⟨e∗x⟩ > ⟨e⟩.⁴ Let C_i = proj(δ, ⟨i⟩⁻¹), let H′ = {C_i : i ∈ ω}, and let τ be the corresponding enumeration of the closed sets C_i. Let δ′ be the bumping pointer method with pointer(H′, τ, ·) that conjectures δ(⟨i⟩⁻¹) if pointer(H′, τ, e) = C_i. I show that for all finite data sequences e, δ(e) ≈ δ′(e). As an auxiliary hypothesis, I establish that (*) for all e, pointer(H′, τ, e) = proj(δ, LastMC(δ, e)). The proof is by induction on the length of finite data sequences.

Base Case, e = ∅. Since ⟨·⟩ is monotone, ⟨∅⟩ = 0. So pointer(H′, τ, ∅) = C_0. Hence δ′(∅) ≈ δ(⟨0⟩⁻¹) = δ(∅). Also, LastMC(δ, ∅) = ∅. So proj(δ, LastMC(δ, ∅)) = proj(δ, ∅) = C_0.

Inductive Step: Assume that δ(e) ≈ δ′(e), and consider a finite data sequence e∗x.

Case 1: δ(e) ≈ δ(e∗x). Since δ does not change its mind at e∗x, and e∗x is consistent with K, e∗x is consistent with proj(δ, LastMC(δ, e)). So by the inductive hypothesis, e∗x is consistent with pointer(H′, τ, e). By the definition of a pointer, pointer(H′, τ, e∗x) = pointer(H′, τ, e). So δ′(e∗x) ≈ δ′(e) ≈ δ(e) ≈ δ(e∗x), by the inductive and case hypotheses. Similarly, pointer(H′, τ, e) = proj(δ, LastMC(δ, e)) = proj(δ, LastMC(δ, e∗x)) = pointer(H′, τ, e∗x).

Case 2: δ(e) ≉ δ(e∗x). Then LastMC(δ, e∗x) = e∗x. Let C_i = pointer(H′, τ, e∗x), that is, let i be the least number such that C_i is consistent with e∗x. Since δ always takes its conjectures seriously, proj(δ, e∗x) ≠ ∅. So C_⟨e∗x⟩ = proj(δ, e∗x) is consistent with e∗x; hence C_i, the first set in the enumeration consistent with e∗x, cannot follow C_⟨e∗x⟩ in the enumeration; that is, i ≤ ⟨e∗x⟩. Because ⟨·⟩ is monotone, this means that ⟨i⟩⁻¹ ⊆ e∗x, since C_i is consistent with e∗x and hence ⟨i⟩⁻¹ is consistent with e∗x. Moreover, i ≥ ⟨e∗x⟩, because otherwise, again due to the fact that ⟨·⟩ is monotone, ⟨i⟩⁻¹ ⊊ e∗x, so that ⟨i⟩⁻¹ ⊆ e. But δ changes its mind at e∗x, and hence between ⟨i⟩⁻¹ and e∗x, so that e∗x is inconsistent with proj(δ, ⟨i⟩⁻¹) = C_i. Since this contradicts the assumption that e∗x is consistent with C_i, we have that i = ⟨e∗x⟩, so C_i = C_⟨e∗x⟩, and hence pointer(H′, τ, e∗x) = C_⟨e∗x⟩ = proj(δ, e∗x). Thus δ′(e∗x) ≈ δ(e∗x). □

⁴A monotone 1-1 encoding ⟨·⟩ may be defined like this. Let P_i denote the i-th prime, and let e be a finite sequence of natural numbers. Let ⟨∅⟩ = 0, and let ⟨x_1, x_2, ..., x_k⟩ = Π_{i=1}^{k} (P_i)^{x_i + 1}.
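The footnote's prime-power encoding can be checked directly. The sketch below (with illustrative names, and with exponents shifted by one so that distinct sequences over the natural numbers, including 0, receive distinct codes) verifies both injectivity on a few sequences and the monotonicity property ⟨e∗x⟩ > ⟨e⟩:

```python
def nth_prime(n):
    """Return the n-th prime, 1-indexed: nth_prime(1) == 2."""
    count, candidate = 0, 1
    while count < n:
        candidate += 1
        if all(candidate % d for d in range(2, int(candidate ** 0.5) + 1)):
            count += 1
    return candidate

def code(seq):
    """Monotone 1-1 encoding: <> = 0, <x1..xk> = prod_i P_i^(x_i + 1).
    The +1 in the exponent keeps sequences containing 0 distinguishable
    (e.g. (0,) and (0, 0) would otherwise both encode to 1)."""
    result = 1 if seq else 0
    for i, x in enumerate(seq, start=1):
        result *= nth_prime(i) ** (x + 1)
    return result

# Extending a sequence strictly increases its code (monotonicity).
assert code((3,)) < code((3, 0)) < code((3, 0, 5))
print(code(()), code((0,)), code((1,)), code((0, 0)))  # 0 2 4 6
```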
Chapter 6

Fast and Steadfast Inquiry

6.1 Outline

Combining evaluation criteria for inductive methods, such as admissibility and minimax, with cognitive values, such as content, error and avoiding retractions, can provide strong guidance for what inferences an agent should draw in the short run, as we saw in Chapter 3. But strong guidance is not necessarily good guidance; in particular, the constraints I established in Chapter 3 do not help agents find correct theories. Reliability analysis, on the other hand, makes clear how to organize inquiry in a manner that is headed for correct beliefs in the long run; but long-run reliability is consistent with any beliefs in the short run. This chapter shows how we can apply auxiliary cognitive values to choose among reliable methods. The result is a powerful combination of long-run reliability and short-run constraints. We may think of the resulting norms as standards of efficiency for reliable inquiry. I consider three epistemic values that pertain to discovery problems: minimizing convergence time, retractions and errors. I apply the criteria of admissibility and minimax to each of these three to arrive at six standards of efficiency for reliable inquiry. Then I investigate under what circumstances it is feasible for an inductive method to attain a given standard of performance, and what methods do so when possible. The result is a hierarchy of cognitive goals (illustrated in Figure 6.9): it turns out that the norms of efficiency fall into a strict order of feasibility.

Certain combinations of cognitive goals lead to particularly interesting results. For example, evaluating reliable methods by their convergence time with the admissibility principle, and by their retractions with the minimax principle, underwrites a version of Occam's Razor, selects the natural projection rule in Goodman's Riddle of Induction as the most efficient, and mandates the introduction of hidden particles in theories of particle reactions, as we shall see in Chapter 8.
6.2 Data-Minimal Methods

Time is a resource of inquiry. An inquirer who wants a correct theory as soon as possible prefers his methods to stabilize to a true belief sooner rather than later. Let us call the time that a method δ requires to settle on a hypothesis from a collection of alternatives H, on a given data stream ε, the modulus of δ on ε; I denote the modulus by mod(δ, ε). Formally, mod(δ, ε) = the least time n such that at n and all later stages n′ > n, δ(ε|n′) is consistent and entails the hypothesis H from H that is correct on ε. If a method δ fails to converge to a true hypothesis on a data stream ε, then I take its modulus on ε to be infinite, so mod(δ, ε) = ω. In isolation from other epistemic concerns, minimizing convergence time is a trivial objective: an inquirer who never changes his initial conjecture converges immediately. The interesting question is what reliable methods converge as fast as possible. We can use the admissibility principle to evaluate the speed of a reliable method as follows.

Definition 6.1 Let K be background knowledge, and let H be a collection of alternative empirical hypotheses.

1. A method δ dominates another method δ′ given K, H with respect to convergence time ⟺
(a) for all data streams ε consistent with K, mod(δ, ε) ≤ mod(δ′, ε), and
(b) for some data stream ε consistent with K, mod(δ, ε) < mod(δ′, ε).

2. A method δ is data-minimal given K, H ⟺ δ is not dominated given K, H with respect to convergence time by another method δ′ that is reliable for K, H.

The term "data-minimal" is standard usage in learning theory (due to Gold); it expresses the idea that methods that converge as soon as possible make efficient use of the data. Data-minimal methods are exactly the ones that satisfy a simple, intuitive criterion: they always take their conjectures seriously, in the sense of Section 5.2. In other words, with a data-minimal method δ, it is possible at each stage of inquiry that δ locks on to its current, correct, conjecture. Thus data-minimality underwrites the requirement of always producing consistent theories (cf. Section 3.2).

Proposition 6.1 Let K be background knowledge, and let H be a collection of alternative empirical hypotheses. A reliable method δ for H is data-minimal given K ⟺ δ always takes its conjectures seriously given K.

Figure 6.1 shows why a data-minimal method must take its conjectures seriously. If a method fails to take its conjecture seriously on some finite data sequence e consistent with K, then we can speed it up on some data stream ε extending e without slowing it down on other data streams, because the method is not converging on those other data streams anyway. Figure 6.2 illustrates the converse.

Figure 6.1: A data-minimal method must project its conjectures: δ′ dominates δ with respect to convergence time.
If a method δ always takes its conjectures seriously given K, any other method δ′ that seeks to find the true hypothesis before δ on some data stream ε must disagree with the conjecture of δ at some point along ε, say at stage n. But since δ always takes its conjectures seriously, it does so after receiving evidence ε|n, and locks onto its conjecture δ(ε|n) on some data stream τ extending ε|n for which δ(ε|n) is correct. Since we supposed that δ′ disagrees with δ on ε|n, the theory of δ′ on ε|n must be false on the data stream τ. So the method δ′ does converge to a correct theory on data stream ε by time n, but only later on data stream τ, if at all. So δ is faster than δ′ on τ. This shows that methods that always take their conjectures seriously are data-minimal. The proof of Proposition 6.1 formalizes these arguments.

Thus data-minimality imposes a plausible but weak constraint on inductive inquiry: whenever there is a reliable method for a given discovery problem, we can choose a data-minimal reliable method that always takes its conjectures seriously (any bumping pointer method will do).
6.3 Retractions
Thomas Kuhn argued that one reason for sticking with a scientific paradigm in trouble is the cost of retraining and retooling the scientific community [Kuhn 1970]. The philosophical literature around "minimal change" belief revision shows that avoiding retractions is a plausible desideratum for theory change. Similarly, learning theorists have investigated methods that avoid "mind changes" [Putnam 1965]. For discovery methods, this motivates a different criterion for evaluating the performance of a method on a given data stream: we want methods whose conjectures vacillate as little as possible.
Let δ be a discovery method for a collection of alternatives H. I say that δ retracts its conjecture on a data stream ε at time n+1, or changes its mind at ε|n+1, if δ(ε|n) is consistent and entails a hypothesis H from H, but δ(ε|n+1) either is inconsistent or does not entail H. I denote the number of times that a method δ changes its mind on a data stream ε by MC(δ, ε); formally, MC(δ, ε) = |{n : δ changes its mind at ε|n+1}|. If δ does not stabilize to a hypothesis on a data stream ε, then MC(δ, ε) is infinite. I define admissibility with respect to avoiding retractions as follows.
Definition 6.2 Let K be background knowledge, and let H be a collection of alternative empirical hypotheses.
1. A method δ dominates another method δ′ given K, H with respect to mind changes ⟺
(a) for all data streams ε consistent with K, MC(δ, ε) ≤ MC(δ′, ε), and
(b) for some data stream ε consistent with K, MC(δ, ε) < MC(δ′, ε).
2. A method δ is mind-change-minimal given K, H ⟺ δ is not dominated given K, H with respect to mind changes by another method δ′.

Figure 6.2: Method δ always projects its current conjecture and hence is data-minimal.
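Under the simplifying assumption that each conjecture is a single consistent hypothesis (so a retraction is just a change of conjecture), the count MC can be computed on a finite prefix of a data stream. A toy Python sketch, with hypothetical names:

```python
def mind_changes(method, stream):
    """Count retractions on a finite prefix: a mind change occurs at stage n+1
    when the conjecture at stage n entailed some hypothesis that the conjecture
    at stage n+1 no longer entails.  Here a conjecture is a single hypothesis
    (or None for 'no conjecture yet'), so this reduces to counting stages at
    which a definite conjecture is replaced by a different one."""
    changes = 0
    for n in range(len(stream)):
        before, after = method(stream[:n]), method(stream[:n + 1])
        if before is not None and after != before:
            changes += 1
    return changes

# Toy method: conjecture 'all swans are white' until a non-white swan appears.
def generalizer(e):
    return 'all white' if all(e) else 'not all white'

print(mind_changes(generalizer, (1, 1, 1, 1)))  # 0: never retracts
print(mind_changes(generalizer, (1, 1, 0, 1)))  # 1: retracts when the 0 appears
```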
I shall use the terms "mind-change-minimal" and "retraction-minimal" to mean the same thing. Retraction-minimality as just defined is a trivial criterion: the only methods that satisfy it are those that never change their minds. For the "skeptical" method from Section 3.2 that never goes beyond the evidence never changes its mind; hence no retraction-minimal method does either. These skeptical methods are reliable only if the evidence is eventually guaranteed to eliminate all alternatives but the correct one. 1 In other words, they are reliable only if there is no genuine problem of induction, for the problem of induction arises precisely when no finite amount of evidence entails the true theory.
Can we use the desideratum of avoiding mind changes to choose among reliable methods, as with the aim of finding a true theory sooner rather than later? Long-run reliability forces the skeptic to eventually take a chance and conjecture a hypothesis that goes beyond the evidence (assuming that the discovery problem is such that the alternatives under consideration are genuine generalizations). But because long-run reliability is consistent with any behavior in the short run, the skeptic may delay this moment, and the risk of having to retract her theories, for as long as she pleases. For any amount of time that the skeptic may choose to delay taking an inductive risk, she could have delayed more and been just as reliable in the long run. It follows that a reliable method is mind-change-minimal just in case it never changes its mind; that is, just in case it never goes beyond the available evidence.
Proposition 6.2 Let H be a collection of alternative hypotheses, and let K be given background knowledge. Then there is a reliable mind-change-minimal method for H given K ⟺ on every data stream ε, there is a time n such that ε|n and K entail the true hypothesis H in H.
In Chapter 9 I show that a method is admissible, with respect to any epistemic value, just in case the method is sequentially optimal, that is, optimal at each stage of inquiry (cf. Section 4.5). Proposition 6.2 implies that if a discovery problem is genuinely inductive, there is no reliable method that is admissible with respect to mind changes; hence there is no method that is sequentially optimal with respect to mind changes. This result points to an interesting methodological phenomenon: myopically avoiding retractions at each stage of inquiry leads to inferences that are not reliable in the long run. Consider again the physical symbol system hypothesis, according to which there is some computer program whose intelligence matches that of humans. If cognitive scientists experience a series of failures in building an intelligent system, they
1 As in Kitcher's "eliminative induction"; see Section 4.6.
must eventually abandon the physical symbol system hypothesis, or else risk the possibility that they might go on for eternity searching for the path to machine intelligence when there is none (cf. Section 4.4). But just when should they become pessimistic about the prospects of cognitive science? Suppose that the researchers try programs m1, m2, ..., mn, in vain, and now consider whether to give up the physical symbol system hypothesis, or else to try one more program mn+1 before they conjecture that machine intelligence is impossible. As far as long-run reliability is concerned, it makes no difference whether they give up the belief in machine intelligence after seeing the failures on the first n machines, or after trying another one. But with respect to retractions, maintaining faith in cognitive science until the system mn+1 has been tried dominates giving up the belief in cognitive science beforehand. For if mn+1 is successful, the researchers need not have retracted their belief in the physical symbol system hypothesis.

But if mn+1 fails too, AI researchers may reason in the same way again: trying one more system mn+2 before recanting their faith in artificial intelligence might save them a mind change, without risking any additional retractions. If the researchers continue to avoid changing their belief in cognitive science in this way, they will never recognize that machine intelligence is impossible when it is so.
The cognitive scientists' dilemma is an instance of the general problem of when exactly an inquirer or a group of inquirers should abandon their current paradigm. The scientists must eventually jump ship if they want to avoid following the wrong paradigm for ever. But as Thomas Kuhn observed, there is no particular point at which the revolution must occur [Kuhn 1957]. Indeed, short-run considerations such as avoiding the embarrassment of dismissing their prior work, that is, avoiding retractions, pull scientists in the direction of conservatism. [Kelly et al. 1997] discusses how the problem of deciding among scientific paradigms looks from a reliabilist perspective. This problem is common to all forms of inquiry that aim at convergence to the right answer; for example, philosophers aiming at "reflective equilibrium" face it too. 2
We saw that applying admissibility to the aim of avoiding retractions yields a
standard of performance that is too high for the interesting inductive problems,
in which we cannot simply rely on the evidence and background knowledge to
eliminate all but the true alternative. Learning theorists have examined another
decision criterion by which we may evaluate the performance of a method with
2 For example, Rawls says that "since we are using our reason to describe itself and reason is not transparent to itself, we can misdescribe our reason as we can anything else. The struggle for reflective equilibrium continues indefinitely, in this case as in all others" [Rawls 1996, III.1.4]. At the same time, he acknowledges that "of course, the repeated failure to formulate the procedure [for determining what political choices are just] so that it yields acceptable conclusions may lead us to abandon political constructivism. It must eventually add up or be rejected." [Rawls 1996, III.1.4, fn. 8]. So part of the ongoing struggle for reflective equilibrium is to "eventually" reject philosophical paradigms (such as constructivism in political philosophy) that "fail repeatedly". [Glymour and Kelly 1992] interpret Platonic inquiry into the structure of concepts in terms of convergence to a correct hypothesis.
respect to retractions: the classic minimax criterion. Minimaxing retractions
turns out to be a very fruitful principle for deriving plausible constraints on the
short-run inferences of reliable methods.
6.4 Minimaxing Retractions
The minimax principle directs an agent to consider the worst-case results of
her options and to choose the act whose worst-case outcome is the best. So
to minimax retractions with respect to given background knowledge K, we consider the maximum number of times that a method might change its mind assuming that K is true, which is given by max{MC(δ, ε) : ε ∈ K}. 3 If max{MC(δ, ε) : ε ∈ K} < max{MC(δ′, ε) : ε ∈ K}, minimaxing retractions directs us to prefer the method δ to the method δ′. As is the case with mind-change-minimal methods, the principle of minimaxing retractions by itself is trivial, because the skeptic who always conjectures exactly the evidence never retracts anything. But using the minimax criterion to select among the reliable methods the one that minimaxes retractions yields interesting results, as we shall see shortly. The following definition makes precise how we may use the minimax criterion in this way.
Definition 6.3 Suppose that δ is a reliable discovery method for alternative hypotheses H given background knowledge K. Then δ minimaxes retractions ⟺ there is no other reliable method δ′ for H given K such that max{MC(δ, ε) : ε ∈ K} > max{MC(δ′, ε) : ε ∈ K}.
If a discovery problem is such that there is no bound on the number of times that a reliable method may have to change its mind to arrive at the truth (as in the case of the almost universal generalizations from Section 2.3.2), max{MC(δ, ε) : ε ∈ K} is infinite for all reliable methods δ, and the minimax criterion has no interesting consequences. But if we can guarantee that a reliable method δ can succeed in identifying the correct hypothesis without ever using more than n mind changes, the principle selects the method with the best such bound on vacillations. I say that a method δ identifies a true hypothesis from a collection of alternatives H given background knowledge K with at most n mind changes if δ is a reliable method for H given K, and max{MC(δ, ε) : ε ∈ K} ≤ n. The goal of minimaxing retractions leads us to seek methods that succeed with as few mind changes as possible; learning theorists refer to this paradigm as discovery with bounded mind changes [Kelly 1995, Ch. 9].
To get a feel for what minimaxing mind changes is like, let us consider a simple example. Suppose a scientist wants to investigate whether a certain particle, say a neutrino ν, exists or not. Imagine that the physicist undertakes a series of experiments e1, e2, ..., en, ... (bigger accelerators, more sophisticated detection devices, pure neutrino beams); see Figure 6.3.
3 If {MC(δ, ε) : ε ∈ K} has no maximum, let max{MC(δ, ε) : ε ∈ K} = ω.
Figure 6.3: In Search of a Neutrino.
Suppose that the physicist believes that if the particle ν exists, one of the experiments will eventually turn it up, so the existence of the particle is not globally underdetermined. The situation is essentially the same as with the question of whether all swans are white (cf. Figure 2.7). What should the physicist's inferences be? If the particle turns up, there is no problem: conclude that the particle exists. Difficulties arise when experiment after experiment fails to detect ν. The physicist may withhold judgment for a while, but eventually she must conjecture either that the particle exists or that it does not, or else she never gives an answer to the question under investigation. Should her first hypothesis be that ν will turn up eventually, or that it does not exist? If she conjectures that the particle does exist, without having found it, reliability requires that she eventually change her mind if it does not appear, for else she would never abandon her mistaken belief in its existence. But after the point at which she finally concludes that the particle will not be found, a more powerful experimental design could turn it up, forcing her to change her mind for a second time. By contrast, a physicist whose initial hypothesis is that ν does not exist may maintain this view as long as the particle is not detected, and change his mind when it is. That physicist need never change his hypothesis if ν does not exist, and retracts his view exactly once if ν is found. Since the second physicist requires at most one mind change, but the first may undergo two, the second minimaxes mind changes but the first does not. In other words, the principle of minimaxing mind changes requires that the physicist should first conjecture that the particle in question does not exist. So in this case, minimaxing mind changes leads to a version of Occam's razor: do not posit the existence of entities that are observable in principle but that you have not yet observed.
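The two physicists' strategies just contrasted can be simulated directly. In this toy sketch the horizon T and the credulous physicist's patience bound are my own stand-ins for "eventually"; the worst-case retraction counts come out as the argument predicts.

```python
T = 8          # experiment horizon (toy stand-in for 'eventually')
PATIENCE = 3   # how long the credulous physicist waits before giving up (assumption)

# K: either the particle shows up at some experiment k (1 from k onward) or never.
def stream(first_detection):
    return tuple(0 if (first_detection is None or n < first_detection) else 1
                 for n in range(T))

K = [stream(k) for k in list(range(T)) + [None]]

def cautious(e):
    """Conjecture non-existence until the particle is detected."""
    return 'exists' if any(e) else 'no'

def credulous(e):
    """Conjecture existence; give up after PATIENCE failures; recant if detected."""
    if any(e):
        return 'exists'
    return 'exists' if len(e) < PATIENCE else 'no'

def mind_changes(method, s):
    return sum(1 for n in range(len(s)) if method(s[:n]) != method(s[:n + 1]))

worst_cautious = max(mind_changes(cautious, s) for s in K)
worst_credulous = max(mind_changes(credulous, s) for s in K)
print(worst_cautious, worst_credulous)  # 1 2
```

The cautious physicist retracts at most once (when ν shows up); the credulous one may retract twice (giving up, then recanting when a later experiment succeeds), whatever patience bound is chosen.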
The traditional version of Occam's razor ("do not needlessly multiply entities") applies to cases of global underdetermination, but minimaxing retractions does not. For example, if the physicist thinks that detecting the particle ν in an experiment (or perhaps several experiments) indeed proves its existence, but that ν might be a "hidden" particle that is beyond the reach of physicists' apparatus, the hypothesis that ν exists is globally underdetermined, and there is no reliable method for investigating it, let alone a reliable method that minimaxes retractions. Occam's razor nonetheless directs us to assume that ν does not exist. So although Occam's razor and minimaxing retractions sometimes give the same result, they are different principles. Indeed, we shall see in Chapter 8 that minimaxing retractions can even require a theorist to posit hidden particles.
Another example in which the principle of minimaxing retractions leads to an intuitively plausible recommendation is Goodman's much-discussed "Riddle of Induction". Recall from Section 2.3.3 that we may think of Goodman's
of Induction". Recall from Section 2.3.3 that we may think of Goodman's
Riddle of Induction as a discovery problem, with the possible data streams and
hypotheses of interest as indicated in Figure 6.4.
By the same argument as in the search for the neutrino, reliability and minimaxing retractions direct us to project first that all emeralds are green, rather than any of the "grue" predicates. But a reliable projection rule that
Figure 6.4: The Riddle of Induction.
minimaxes retractions may still wait as long as it pleases to make any projections about the future. Moreover, we might begin by projecting "green" and, if a blue emerald is found, produce the contradiction for as long as we like, provided we eventually project the appropriate "grue" predicate. Such a method is reliable and retracts its projection at most once. In both of these cases, the methods that deviate from the natural projection rule (project "green" until a blue emerald is observed, then project the appropriate "grue" predicate) fail to take some of their conjectures seriously, because they do not project them along any possible sequence of reports on emerald colors. By Proposition 6.1, these methods are not data-minimal. Hence the only reliable data-minimal projection rule that minimaxes mind changes is the natural one. (I give a rigorous proof of this fact in Section 6.8.)
Fact 6.3 In the Riddle of Induction, the only projection rule that is reliable, data-minimal and minimaxes retractions is the natural one.
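The natural projection rule itself is easy to write down as a method. The sketch below is my own illustration: emerald reports are encoded as 'g'/'b' over a toy horizon, and the check confirms that the rule retracts at most once.

```python
T = 8  # horizon (toy stand-in for an infinite sequence of emerald reports)

def natural(e):
    """The natural projection rule: project 'all emeralds are green' until a
    blue emerald is seen at time k, then project grue(k) (green before k,
    blue from k on)."""
    if 'b' not in e:
        return 'green'
    return f'grue({e.index("b")})'

def mind_changes(method, s):
    return sum(1 for n in range(len(s)) if method(s[:n]) != method(s[:n + 1]))

# The possible data streams: all-green forever, or green up to time k and
# blue from k on (the grue(k) worlds).
streams = [tuple('g' * T)] + [tuple('g' * k + 'b' * (T - k)) for k in range(T)]

print(max(mind_changes(natural, s) for s in streams))  # 1: at most one retraction
```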
Since minimaxing retractions yields a form of Occam's razor as well as the natural projection rule in the Riddle of Induction (in conjunction with reliability and data-minimality), one might think that this solution to the Riddle is "nothing but" an implicit appeal to Occam's razor. To see that this is not so, consider the following modification of the Riddle, in which an alternating hypothesis (the first emerald is green, the second blue, the third green, the fourth blue, etc.) replaces "all emeralds are green". Instead of "grue" predicates, we have hypotheses asserting that the colour alternations come to an end at a certain point; see Figure 6.5.
As the reader may verify easily, the only reliable data-minimal method for this discovery problem that minimaxes retractions projects the alternating hypothesis until it is falsified. This example also shows that the means-ends solution to the Riddle does not rest on a syntactic criterion of the sort that was the target of Goodman's criticism, such as higher "degrees of confirmation" bestowed on universal generalizations of "basic predicates". Nor does it appeal to notions of intuitive "simplicity" or "uniformity"; the alternating hypothesis seems to have neither of these attributes. Instead, my solution depends on pragmatic features of the problem: which data streams the inquirer regards as serious possibilities, and which alternative hypotheses he entertains. In Goodman's Riddle, the topology of the space of possible data streams is such that minimaxing retractions rules out projecting any of the "grue" predicates. This connection between the natural projection rule and minimaxing retractions is not an accident: the structure that Goodman described is exactly the kind of structure in which the principle of minimaxing retractions applies. The question about the existence of the neutrino shares this topological structure, and we shall encounter it again in the problem of describing reactions among elementary particles (in Chapter 8). The next section characterizes this topological
Figure 6.5: Another Riddle of Induction.
structure and shows precisely how it is linked to minimaxing retractions. 4
6.5 A Characterization of Discovery with Bounded Mind Changes
A reliable discovery method δ identifies a correct hypothesis from a collection of alternatives H with at most n mind changes given background knowledge K if δ does not change its mind more than n times on any data stream ε consistent with K. That is, δ succeeds with at most n mind changes if δ is a reliable discovery method for H given K, and max{MC(δ, ε) : ε ∈ K} ≤ n. The next proposition characterizes what background knowledge K must be like if there is a discovery method δ that reliably identifies a correct hypothesis from a collection of alternatives H and never changes its mind more than n times on any data stream ε consistent with K. I define the characteristic condition inductively, starting with discovery without any mind changes. Consider an initial conjecture H. Suppose that H is not certain. Then any reliable discovery method starting with H has to change its mind if H is false. If after this mind change still n more mind changes are required, a total of n+1 mind changes may result. So if a reliable discovery method δ whose initial conjecture is H never requires more than n mind changes, there must be some point at which δ can change its mind and incur no more than n−1 mind changes whenever H is false. The structures that meet this requirement look like "feathers". I write Fn(K, H) for "K is an n-feather for H". The intended interpretation of Fn(K, H) is "every reliable discovery method starting with H requires at least n+1 mind changes given K".
Definition 6.4 Let H be a collection of empirical hypotheses, and let K be background knowledge. Let H(ε) stand for the (unique) member of H correct on ε. Write Fn(K, H) for "K is an n-feather for H", and define this notion as follows.
• F0(K, H) ⟺ ∃ε ∈ K − H.
• Fn+1(K, H) ⟺ ∃ε ∈ K − H: ∀k: Fn(K ∩ [ε|k], H(ε)).
To illustrate this definition, the background knowledge K in the Riddle of Induction is a 0-feather for Hgreen ("all emeralds are green"), that is, F0(K, Hgreen) holds, because Hgreen is not true on every data stream consistent with K (see Figure 6.4).
4 The topological perspective points to a deep interpretation of the significance of Goodman's Riddle: the Riddle shows that translation does not necessarily preserve metric notions, such as "distance" in the degree of confirmation of alternative hypotheses; translation changes the confirmation ordering. On the other hand, translations do preserve topological relations, and these determine which hypotheses minimax retractions. It seems that Goodman provided us with an example of a translation that is a homeomorphism but not an isometry.
But K is not a 1-feather for Hgreen, that is, ¬F1(K, Hgreen) holds: for every data stream ε in K on which Hgreen is false (on which a blue emerald is observed at, say, time k) there is an initial segment ε|k such that K ∩ [ε|k] is not a 0-feather for Hgrue(k) ("all emeralds are grue(k)"), that is, Hgrue(k) is entailed by K, ε|k. By contrast, K is a 1-feather for any Hgrue(n), that is, F1(K, Hgrue(n)) holds: a given hypothesis Hgrue(n) is false on the sequence ε of all green emeralds. And no initial segment ε|k entails Hgreen = H(ε). So K ∩ [ε|k] is a 0-feather for Hgreen.
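This case analysis can be transcribed into executable form. The encoding below is my own device, not the author's: knowledge states in the Riddle are represented symbolically so that the universal quantifier over times k in Definition 6.4 is settled by the case analysis just given, rather than by a finite cutoff.

```python
# Knowledge states in the Riddle, encoded symbolically (a toy assumption):
#   ('open', j)    -- j green emeralds seen; the live streams are all-green
#                     plus grue(i) for every i >= j.
#   ('settled', i) -- a blue emerald appeared at time i; only grue(i) is live.
# Hypotheses are 'green' or ('grue', i).

def F0(state, hyp):
    """F0(K, H): some live data stream falsifies H."""
    kind, i = state
    if kind == 'open':
        # 'green' is falsified by the live grue streams, and any grue
        # hypothesis by the all-green stream; both kinds are always live here.
        return True
    return hyp != ('grue', i)      # settled: grue(i) is the only live stream

def F1(state, hyp):
    """F1(K, H): some live stream falsifying H keeps F0 true at every time k."""
    kind, i = state
    if kind == 'settled':
        # Only grue(i) is live; once the data entail grue(i), F0 fails.
        return False
    if hyp == 'green':
        # Candidate witnesses are the grue(i') streams, and each one fails:
        # at any k > i' the restricted state is ('settled', i'), where
        # F0(., ('grue', i')) is False.
        return False
    # hyp is some ('grue', n): the all-green stream is a witness, since after
    # any k observations the state is ('open', k), where F0(., 'green') holds.
    return True

K = ('open', 0)
print(F0(K, 'green'), F1(K, 'green'), F1(K, ('grue', 3)))  # True False True
```

The printed values reproduce the three claims of the illustration: K is a 0-feather for Hgreen, not a 1-feather for Hgreen, and a 1-feather for each Hgrue(n).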
Figure 6.6 shows the general structure of 0-feathers and 1-feathers, and Figure 6.7 illustrates 2-feathers and 3-feathers.
The next lemma shows that feather structures characterize how many mind changes a discovery method that starts with a certain initial conjecture requires.
Lemma 6.4 Let H be a collection of alternative hypotheses, and let background knowledge K be given. Then there is a reliable discovery method δ for H given K such that
1. δ succeeds with at most n mind changes, and
2. δ(∅) ⊨ H
⟺ (K, H) is not an n-feather (that is, ¬Fn(K, H)).
The last complication is that a reliable method may delay conjecturing any of the alternative hypotheses. In fact, minimaxing retractions can require arbitrarily long delays. Consider a single hypothesis H under test, and suppose that
the scientist's background knowledge K implies that by time n, the evidence
is guaranteed to entail either H or its negation, but not by any earlier time.
(Figure 6.8 illustrates this scenario.)
A method that accepts only the evidence until time n thus succeeds without
any mind changes; but any conjecture as to whether H is true or false made
before time n runs the risk of refutation.
Thus the full characterization of discovery with bounded mind changes is this: a reliable method must use at least n+1 mind changes if and only if there is one data stream ε consistent with background knowledge K such that every initial segment ε|k is an n-feather for every hypothesis H under consideration.
Proposition 6.5 Let H be a collection of alternative hypotheses, and let background knowledge K be given. Then there is a reliable discovery method δ for H given K that succeeds with at most n mind changes ⟺ for every data stream ε consistent with K, there is a time k such that (K ∩ [ε|k], H(ε)) is not an n-feather (i.e., ¬Fn(K ∩ [ε|k], H(ε))).
Figure 6.6: "Feather" Structures characterize Discovery with Bounded Mind Changes. The figure illustrates 0-feathers and 1-feathers.
Figure 6.7: 2-feathers and 3-feathers.
Figure 6.8: Minimaxing Retractions requires waiting until time n.
From Proposition 6.5, we can derive a universal method δMC for reliably identifying a correct hypothesis from H given K when (K, H) is not an n-feather. Say that the dimension of (K, H) is n if (K, H) is an n-feather but not an n+1-feather. δMC begins by conjecturing nothing but the evidence until (K ∩ [e], H) is of dimension n. Let a finite data sequence e ∗ x be given; if there is an H′ such that (K ∩ [e ∗ x], H′) is of lower dimension than (K ∩ [e ∗ x], δMC(e)), then δMC(e ∗ x) = H′; otherwise δMC(e ∗ x) = δMC(e).
6.6 The Hierarchy of Cognitive Goals
So far I have applied admissibility and minimax to mind changes, and admissibility to convergence time, to arrive at standards of efficiency for reliable methods. Methods that minimax convergence time are those that settle on a true hypothesis by a deadline n known a priori. For without such a deadline, there is no bound on the time that a reliable method may require.
Proposition 6.6 Let H be a collection of alternative hypotheses with background knowledge K. Then there is a reliable discovery method δ for H given K that minimaxes convergence time ⟺ there is some time n such that for all data streams ε in K, ε|n entails one of the hypotheses H in H.
It follows immediately from Proposition 6.6 and Proposition 6.2 that minimaxing convergence time places stronger demands on inductive inquiry than mind-change-minimality does. Both standards of efficiency require that eventually the data must entail the correct hypothesis; but minimaxing speed requires in addition that the evidence yield certainty by a set time.
I complete my survey of reliable efficient inquiry with another epistemic value: the number of errors that a reliable method makes before it settles on the right answer. Formally, error(δ, ε) = |{n : δ(ε|n) is false on ε}|. We can apply the admissibility and minimax principles to errors in the by now familiar way to define two more standards of efficiency for reliable methods. It turns out that these do not yield anything new: the only methods that are error-admissible and minimax errors are the "wait-and-see" methods that never go beyond the evidence. The following theorem brings together a number of the results from this chapter.
Theorem 6.7 Let H be a collection of alternative hypotheses with background knowledge K. The following conditions are equivalent:
1. Each hypothesis H in H is decidable with certainty given K.
2. There is a reliable discovery method δ for H given K that succeeds with no retractions.
3. There is a mind-change-minimal reliable discovery method δ for H given K.
4. There is an error-admissible reliable discovery method δ for H given K.
5. There is a reliable discovery method δ for H given K that succeeds with a bounded number of errors.
Theorem 6.7 and Proposition 6.6 establish the lower part of the hierarchy
shown in Figure 6.9.
The fact that empirical inquiry can attain several standards of performance
only when there is no problem of induction is a precise sense in which the
problem of induction makes inquiry difficult.
Proposition 6.5 demonstrates that there is no bound on the number of mind changes that discovery problems may require: we can always add one more feather dimension to make one more mind change necessary. This constitutes an infinite subhierarchy (discovery with 0 mind changes, 1 mind change, ...) within the order of cognitive goals. Finally, I noted in Section 6.2 that any solvable discovery problem has a data-minimal solution. Hence data-minimality takes the top place in the hierarchy of cognitive goals because it does not discriminate among problems with reliable solutions.
The hierarchy explains the power of data-minimality and minimaxing retractions in constraining inductive inferences: these two goals are the only ones that apply when there is a genuine problem of induction. Of these two, minimaxing retractions is the more powerful constraint, as our analysis of various inductive problems showed.
6.7 Data-Minimality vs. Minimaxing Retractions
Data-minimal methods may be forced to take back their conjectures much more often than methods that are not evaluated by their convergence time. Figure 6.8 showed an example. A data-minimal method δ cannot wait until the evidence rules out all but one alternative. It must immediately take a guess. But the next datum may refute that guess; by data-minimality, δ has to produce another conjecture, which may be refuted immediately, and so on, until δ has changed its mind n times in the worst case. On the other hand, with questions such as Goodman's Riddle of Induction, the existence of a particle, and (we shall see) describing the physically possible reactions among elementary particles, an inquirer can epistemically have it all: reliably find the truth and minimize convergence time and retractions. In these cases reliable inductive inferences that are data-minimal and minimax retractions seem to have special intuitive appeal.

This observation is relevant to the status of the minimax criterion as a decision-theoretic principle for evaluating the performance of inductive methods.
Figure 6.9: The Hierarchy of Cognitive Goals.
On general decision-theoretic grounds, [Levi 1980] proposes to evaluate options first by the admissibility criterion, and to make further discriminations among admissible options by the minimax criterion, much as I am proposing to evaluate methods first by admissibility with respect to convergence time and then by minimax with regard to retractions. On the other hand, minimaxers are often criticized for being unreasonably averse to risks. 5 If an agent thinks that his chance of getting both birds in the bush is high enough, he should give up the one he has in the hand. Whether one agrees with this criticism of minimaxing or not, it does not pertain to data-minimal discovery with bounded mind changes. For at each stage of inquiry, data-minimal methods may converge to a correct hypothesis with no further mind changes. In general, the best-case performance of a data-minimal method that minimaxes retractions will be just as good as that of another method that requires more mind changes. Data-minimal methods that minimax retractions do go for the birds in the bush, but they make sure that they at least get the one in their hand.
The next proposition determines the exact extent to which data-minimal methods may have to undergo extra mind changes to solve an inductive problem. I begin with a variant of Definition 6.4. If a reliable data-minimal discovery method can succeed with n mind changes, then whenever the previous conjecture H of a data-minimal method is refuted, the method must be able to immediately change its mind to a conjecture H′ after which no more than n−1 mind changes are required. Another way of putting the matter is that the universal method δMC for discovery with bounded mind changes is not data-minimal unless for some H′, (K ∩ [e ∗ x], H′) is of lower dimension than (K ∩ [e ∗ x], δMC(e)) whenever e ∗ x falsifies the previous conjecture δMC(e). This is reflected in clause 2 of the next definition. The intended interpretation of DM-Fn(K, H), read "K is a data-minimal n-feather for H", is "a reliable data-minimal method whose initial conjecture is H requires at least n+1 mind changes".
Definition 6.5 Let H be a collection of empirical hypotheses, and let K be background knowledge.
• DM-F0(K, H) ⟺ ∃ε ∈ K − H.
• DM-Fn+1(K, H) ⟺ if H is consistent with K, then ∃ε ∈ K − H such that
1. ∀k: DM-Fn(K ∩ [ε|k], H(ε)), or
2. ∃k: [ε|k], K ⊨ ¬H, and ∀H′ in H consistent with K ∩ [ε|k]: DM-Fn(K ∩ [ε|k], H′).
5 This issue arises in a prominent place in political philosophy. In his famous reduction of questions of justice to questions of rational choice behind "the veil of ignorance", Rawls argues that agents will apply the minimax criterion to choose social arrangements [Rawls 1971]. Harsanyi for one has contended that rational agents should be prepared to take more risks than the minimax criterion allows them to do [Harsanyi 1975].
Since data-minimal methods must immediately project one of the alternative
hypotheses, a data-minimal method cannot wait for evidence before making a
conjecture; otherwise the characterization is analogous to Proposition 6.5.
Proposition 6.8 Let H be a partition of K. Then there is a reliable method δ for H given K such that
1. δ starts with H, and
2. δ requires at most n mind changes, and
3. δ is data-minimal
⟺ (K, H) is not a data-minimal n-feather, i.e., ¬DM-Fn(K, H).
6.8 Proofs
Proposition 6.1 Let K be background knowledge, and let H be a collection of alternative empirical hypotheses. A reliable method δ for H is data-minimal given K ⟺ δ always takes its conjectures seriously given K.

Proof. (⟹) I show the contrapositive. Suppose that there is some finite evidence sequence e (consistent with K) such that δ does not take its conjecture δ(e) seriously. Let e₁ be a shortest data sequence that extends e such that δ does project a hypothesis H from H that is entailed by δ(e₁) along some data stream ε ∈ K. Since δ does not project δ(e), e₁ must properly extend e; hence we may take e₁ = e′∗x, where x is the last datum that appears in e₁. Now define δ′ by δ′(e′) = δ(e₁), and δ′(e″) = δ(e″) at all data sequences e″ different from e′. I show that δ′ weakly dominates δ. By construction, δ′ projects the hypothesis H along ε at e′. Thus mod(δ′, ε) ≤ lh(e′). By contrast, the choice of e′ implies that δ does not stabilize to H along ε at e′, so mod(δ, ε) > lh(e′). So δ′ converges on ε faster than δ does. Furthermore, on no data stream consistent with background knowledge K does δ′ converge after δ. For the only place at which δ and δ′ differ is e′, and by assumption δ is not converging at e′ on any data stream consistent with K, since e′ is shorter than e₁ and so δ does not take its conjecture at e′ seriously given K. This establishes that δ′ weakly dominates δ.

(⟸) Suppose that δ always takes its conjectures seriously given K. Consider some other reliable method δ′ that converges faster than δ on some data stream τ ∈ K (i.e., mod(δ′, τ) < mod(δ, τ)). Let H be the hypothesis correct on τ, and let k be the first time after δ′ converges on τ (that is, k ≥ mod(δ′, τ)) such that δ(τ|k) ⊨ H′ ≠ H. Now by hypothesis, δ projects its conjecture H′ along some data stream ε ∈ K at τ|k; that is, k ≥ mod(δ, ε). Since δ′ projects H along τ at k, δ′ does not entail H′ at ε|k = τ|k. Thus mod(δ′, ε) > k ≥ mod(δ, ε). So δ′ does not dominate δ in convergence time. Since any method δ′ that dominates δ in convergence time must be strictly faster than δ on some data stream τ ∈ K, this argument shows that δ is data-minimal. ∎
Proposition 6.2 Let H be a collection of alternative hypotheses, and let K be given background knowledge. Then there is a reliable mind-change-minimal method for H given K ⟺ on every data stream ε, there is a time n such that ε|n and K entail the true hypothesis H in H.

Proof. (⟸) Given the right-hand side, the "skeptical" method δ whose theory is always exactly the evidence reliably identifies a true hypothesis from H and never retracts its beliefs.

(⟹) Let δ be a reliable mind-change-minimal discovery method for H. Because beginning with the trivial conjecture δ(∅) = K does not increase the number of mind changes of a method on any data stream, we may without loss of generality assume that δ(∅) = K. Now suppose that on some data stream ε consistent with K, MC(δ, ε) > 0. Then there is a first time m₀ > 0 at which δ makes a non-vacuous conjecture, that is, δ(ε|m₀) is consistent and entails some hypothesis H from H, and a first time m₁ > m₀ at which δ makes a mind change, that is, δ(ε|m₁) ⊭ H or δ(ε|m₁) = ∅. Then the following method δ′ weakly dominates δ with respect to mind changes given K: if ε|m₀ ⊆ e ⊆ ε|m₁, let δ′(e) = K; otherwise δ′(e) = δ(e). Then δ′ conjectures K along ε until ε|m₁, so MC(δ′, ε) = MC(δ, ε) − 1. And clearly δ′ never uses more mind changes than δ does. So δ is weakly dominated with respect to mind changes given K. This shows that if there is a reliable mind-change-minimal method for H given K, then there is a reliable method δ that never changes its mind along any data stream in K. Such a method δ never entails more than the evidence. For suppose that e, K do not entail δ(e). Then there is a data stream τ consistent with e and K on which δ(e) is false. So to succeed on τ, δ must change its mind at least once (after e), contrary to supposition. Since δ never entails more than the evidence and is reliable, eventually the evidence and background knowledge must conclusively establish which alternative in H is the true one, no matter what it is. ∎
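The (⟸) direction can be illustrated on a small finite problem. The sketch below is my own illustration, not part of the formal framework: the toy hypothesis space, the encoding of conjectures as sets of consistent hypotheses, and all names are assumptions. It shows the skeptical method identifying the truth with zero retractions on a problem where the data eventually entail the correct hypothesis.

```python
from itertools import product

# Toy problem: binary data streams of length 3. Hypothesis k (k = 1, 2, 3)
# says the first 1 occurs at position k; hypothesis 0 says no 1 occurs.
# Every length-3 stream entails exactly one hypothesis, so the right-hand
# side of Proposition 6.2 holds.
streams = list(product((0, 1), repeat=3))

def first_one(s):
    return next((i + 1 for i, x in enumerate(s) if x == 1), 0)

hypotheses = {k: {s for s in streams if first_one(s) == k} for k in range(4)}

def skeptical(e):
    """The skeptical method: its 'theory' is exactly the evidence, encoded
    here as the set of hypotheses still consistent with e."""
    return {k for k, ext in hypotheses.items()
            if any(s[:len(e)] == e for s in ext)}

def retractions(stream):
    """Count mind changes: the method entails hypothesis k only once k is
    the sole consistent hypothesis; a retraction drops an entailed k."""
    count, entailed = 0, None
    for t in range(1, len(stream) + 1):
        conj = skeptical(stream[:t])
        now = next(iter(conj)) if len(conj) == 1 else None
        if entailed is not None and now != entailed:
            count += 1
        entailed = now if now is not None else entailed
    return count

assert all(retractions(s) == 0 for s in streams)  # zero retractions everywhere
```

Because the set of consistent hypotheses only shrinks as evidence accumulates, once the skeptical method entails a hypothesis it never gives it up, which is the point of the (⟸) argument.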
Fact 6.3 In the Riddle of Induction, the only projection rule that is reliable, data-minimal and minimaxes retractions is the natural one.

Proof. The natural projection rule δ_N conjectures that all emeralds are green until it encounters a blue one; suppose that the k-th emerald is blue. Then δ_N concludes that all emeralds are grue(k). If all emeralds are green, then δ_N converges to the right generalization immediately. Otherwise, δ_N changes its mind, for the first time, after the first blue emerald turns up. Hence, assuming that all emeralds are green or grue(k), δ_N finds the correct generalization with at most one mind change. Finally, δ_N always takes its conjectures seriously, so by Proposition 6.1, δ_N is data-minimal.

Now consider any projection rule δ that is reliable, data-minimal and minimaxes mind changes; I show that δ = δ_N. Since δ is reliable, δ must eventually infer that all emeralds are green if only green emeralds are observed. Let m be the minimal number of green emeralds from which δ generalizes that all emeralds are green. I argue that m = 1, that is, δ must immediately infer that all emeralds are green when one green emerald is observed. For suppose otherwise (m > 1). Since δ is data-minimal, δ must project some hypothesis other than "all emeralds are green" before the m-th green emerald is observed. That is, δ changes its mind when the m-th emerald appears. But after δ has inferred that all emeralds are green from the sample of m green emeralds, a blue emerald may be found, say at time k, which establishes that all emeralds are grue(k). Since δ is reliable and data-minimal, δ must then change its mind for the second time to conclude that all emeralds are grue(k). So if m > 1, then δ does not minimax retractions; thus δ infers that all emeralds are green after seeing the first green emerald. Since δ is data-minimal, δ projects this hypothesis as long as all observed emeralds are green. And again by data-minimality, δ concludes immediately that all emeralds are grue(k) if the k-th emerald is blue. Thus δ = δ_N; that is, the only reliable and data-minimal projection rule that minimaxes retractions in the Riddle of Induction is the natural projection rule. ∎
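The behavior of the natural projection rule can be checked mechanically. The sketch below is only an illustration of mine; the function names and the encoding of grue(k) as "green before time k, blue thereafter" are assumptions, not part of the original text.

```python
def natural_rule(evidence):
    """Natural projection rule: conjecture 'all emeralds are green' until a
    blue emerald appears, say as the k-th observation; then conjecture grue(k)."""
    for k, color in enumerate(evidence, start=1):
        if color == "blue":
            return f"grue({k})"
    return "green"

def mind_changes(rule, stream):
    """Count how often the rule's conjecture changes along a data stream."""
    conjectures = [rule(stream[:t]) for t in range(1, len(stream) + 1)]
    return sum(1 for a, b in zip(conjectures, conjectures[1:]) if a != b)

assert mind_changes(natural_rule, ["green"] * 10) == 0                 # all green
assert mind_changes(natural_rule, ["green"] * 3 + ["blue"] * 4) == 1   # one retraction
```

A rule that waits for m > 1 green emeralds before projecting "green" can, by contrast, be driven to two mind changes on a suitably timed grue stream, as in the proof above.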
Lemma 6.4 Let H be a collection of alternative hypotheses, and let background knowledge K be given. Then there is a reliable discovery method δ for H given K such that

1. δ succeeds with at most n mind changes, and

2. δ(∅) ⊨ H

⟺ (K, H) is not an n-feather (i.e., ¬F_n(K, H)).

Proof. The proof is by induction on n.

Base case, n = 0:

(⟸) Suppose that (K, H) is not a 0-feather. Then H is a priori certain, that is, K ⊨ H. So the method δ that always conjectures H reliably identifies the truth from H with 0 retractions.

(⟹) Suppose that (K, H) is a 0-feather. Then there exists a data stream ε ∈ K − H. Let δ be any reliable method that starts with H (i.e., δ(∅) = H). Since δ is reliable, δ changes its mind on ε at least once. Hence every reliable method that starts with H may change its mind at least once.

Inductive step: Assume the hypothesis for n and consider n+1.

(⟸) Suppose that (K, H) is not an (n+1)-feather. Then for every data stream ε ∈ K − H, there is a time k such that ¬F_n(K ∩ [ε|k], H(ε)). By inductive hypothesis, for each such point ε|k, we may choose a method δ_{ε|k} and a hypothesis H_{ε|k} such that δ_{ε|k}(∅) = H_{ε|k} and δ_{ε|k} succeeds with at most n mind changes given K ∩ [ε|k].

Now define a discovery method δ that reliably identifies a correct hypothesis from H given K with no more than n+1 mind changes, starting with H:

1. δ(∅) = K ∩ H;

2. if there is a time k such that

(a) 0 < k ≤ lh(e), and

(b) (K ∩ [e|k], H′) is not an n-feather for some H′ in H,

then let k be the least such time and conjecture δ_{e|k}(e);

3. otherwise, conjecture K ∩ [e] ∩ H.

To see that δ succeeds with at most n+1 mind changes, consider any data stream ε ∈ K.

Case 1: Clause 3 always obtains along ε. Then δ converges to H with 0 retractions. Since (K, H) is not an (n+1)-feather, ε ∈ H, and δ is correct.

Case 2: Clause 2 obtains at some point k along ε. Assume that k is the first such point. Then on ε, time k is the earliest at which δ might change its mind. After time k, δ follows δ_{ε|k} and hence succeeds with at most n mind changes. Hence overall, δ changes its mind at most n+1 times along ε. Since this is true for any data stream ε consistent with background knowledge K, δ requires at most n+1 mind changes.

(⟹) Suppose that (K, H) is an (n+1)-feather. Then there is a data stream ε ∈ K − H such that for all times k, (K ∩ [ε|k], H(ε)) is an n-feather (i.e., F_n(K ∩ [ε|k], H(ε)) holds). Let δ be any reliable discovery method that starts with H (i.e., δ(∅) ⊨ H). Then at some time along ε, δ changes its mind to H(ε); let k be the first such time. By inductive hypothesis, any method that begins with H(ε) requires at least n+1 mind changes on some data stream τ ∈ K ∩ [ε|k]. In particular, the following method δ′ does:

1. δ′(∅) = H(ε);

2. if e ⊂ ε|k, δ′(e) = H(ε);

3. if ε|k ⊆ e, δ′(e) = δ(e).

By construction, on K ∩ [ε|k], δ′ changes its mind only after ε|k, and hence changes its mind on K ∩ [ε|k] exactly when δ does. Hence δ changes its mind at least n+1 times on some data stream τ ∈ K ∩ [ε|k]. Since δ also changes its mind before τ|k = ε|k, δ requires at least n+2 mind changes. Hence any reliable method starting with H may change its mind at least n+2 times. ∎
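For finite toy problems the feather hierarchy can be computed directly. The sketch below is my own reconstruction from the characterization in Lemma 6.4, not the author's Definition 6.4 itself; the finite encoding, the function names, and the truncation of the Riddle of Induction to length-3 streams are all assumptions.

```python
def is_feather(n, K, H, hyp_of):
    """F_n(K, H) on a finite problem. K: a set of data streams (tuples);
    H: the set of streams on which the initial conjecture is correct;
    hyp_of(eps): the set of streams sharing eps's true hypothesis, H(eps)."""
    outside = [eps for eps in K if eps not in H]
    if n == 0:
        return bool(outside)          # F_0: some stream in K falsifies H
    # F_n: some falsifying stream keeps (K, H(eps)) an (n-1)-feather forever
    return any(all(is_feather(n - 1,
                              {t for t in K if t[:k] == eps[:k]},
                              hyp_of(eps), hyp_of)
                   for k in range(len(eps) + 1))
               for eps in outside)

# Riddle of Induction truncated to three observations: 'g' = green, 'b' = blue;
# grue(k) streams are green before the k-th observation and blue from then on.
green = ("g", "g", "g")
K = {green} | {tuple("g" if i < k - 1 else "b" for i in range(3)) for k in (1, 2, 3)}
hyp_of = lambda eps: {eps}            # each finite stream settles its own hypothesis

assert is_feather(0, K, {green}, hyp_of)        # starting with 'green' risks 1 change
assert not is_feather(1, K, {green}, hyp_of)    # but never more than 1
```

In this finite fragment the two asserts match Fact 6.3: a method starting with "all green" can be forced into one retraction, but one mind change always suffices. The genuine Riddle has infinitely many grue hypotheses and infinite streams, so this is only a finite caricature.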
Proposition 6.5 Let H be a collection of alternative hypotheses, and let background knowledge K be given. Then there is a reliable discovery method δ for H given K that succeeds with at most n mind changes ⟺ for every data stream ε consistent with K, there is a time k such that (K ∩ [ε|k], H(ε)) is not an n-feather (i.e., ¬F_n(K ∩ [ε|k], H(ε))).

Proof. (⟸) Suppose that for every data stream ε consistent with K, there is a time k such that (K ∩ [ε|k], H(ε)) is not an n-feather (i.e., ¬F_n(K ∩ [ε|k], H(ε))). By Lemma 6.4, for each such point ε|k, we may choose a method δ_{ε|k} and a hypothesis H_{ε|k} such that δ_{ε|k}(∅) = H_{ε|k} and δ_{ε|k} succeeds with at most n mind changes given K ∩ [ε|k]. Now define a discovery method δ that reliably identifies a correct hypothesis from H given K with no more than n mind changes:

1. If there is a time k such that

(a) 0 < k ≤ lh(e), and

(b) (K ∩ [e|k], H′) is not an n-feather for some H′ in H,

then let k be the least such time and conjecture δ_{e|k}(e);

2. otherwise, conjecture K ∩ [e].

For any data stream ε ∈ K, there eventually comes a first time k when (K ∩ [ε|k], H′) is not an n-feather for some H′ in H. After time k, δ follows δ_{ε|k} and hence succeeds with at most n mind changes. Before time k, δ conjectures only the evidence and hence does not change its mind.

(⟹) Conversely, suppose that there is a data stream ε ∈ K such that for all times k, (K ∩ [ε|k], H(ε)) is an n-feather (i.e., F_n(K ∩ [ε|k], H(ε))). Let δ be any reliable discovery method. Then there is a first time k along ε at which δ conjectures H(ε) (i.e., δ(ε|k) ⊨ H(ε)). By the same argument as in Lemma 6.4, δ requires at least n+1 mind changes on some data stream τ ∈ K ∩ [ε|k]. ∎
Proposition 6.6 Let H be a collection of alternative hypotheses with background knowledge K. Then there is a reliable discovery method δ for H given K that minimaxes convergence time ⟺ there is some time n such that for all data streams ε in K, ε|n entails one of the hypotheses H in H.

Proof. (⟸) Suppose that there is a deadline n by which background knowledge and the data entail which hypothesis is correct. Without loss of generality, assume that n is the earliest such time. Then a reliable method δ can simply conjecture the evidence until time n. In the worst case (and in the best case), δ converges to the correct hypothesis by time n. Moreover, no other method δ′ can achieve a better guarantee on the time that it requires to find the truth. This is immediate if n = 0. Otherwise, let ε be any data stream such that K, ε|n−1 do not entail which hypothesis from H is correct. Since I chose n to be the earliest time by which background knowledge and the evidence always entail the correct hypothesis, there is such a data stream. Then there is a hypothesis H in H that is true on some extension τ of ε|n−1, consistent with K, such that δ′(ε|n−1) does not entail H. Hence δ′ converges to the correct hypothesis on τ only after time n (i.e., mod(δ′, τ) ≥ n). Therefore every method requires, on some data stream consistent with background knowledge K, at least n pieces of evidence before settling on a correct hypothesis.

(⟹) I show the contrapositive. Suppose that for every bound n, there is a data stream ε such that K, ε|n do not entail which hypothesis in H is correct. By the same argument as for the converse implication, this means that for every method δ and every bound n, there is some data stream ε consistent with K on which δ converges to the correct hypothesis only after time n. Hence there is no maximal element in {mod(δ, ε) : ε ∈ K}. ∎
Theorem 6.7 Let H be a collection of alternative hypotheses with background knowledge K. The following conditions are equivalent:

1. Each hypothesis H in H is decidable with certainty given K.

2. There is a reliable discovery method δ for H given K that succeeds with no retractions.

3. There is a mind-change-minimal reliable discovery method δ for H given K.

4. There is an error-admissible reliable discovery method δ for H given K.

5. There is a reliable discovery method δ for H given K that succeeds with a bounded number of errors.

Proof. Proposition 6.5 implies that claims 1 and 2 are equivalent. By Proposition 6.2, claim 3 is equivalent to claim 1. I prove the equivalence of claims 4 and 5 with claim 1.

(1 ⟹ 4, 5) If each hypothesis H in H is decidable with certainty given K, the method δ that conjectures nothing but the evidence reliably identifies the correct hypothesis from H and never makes any errors. Hence δ is error-admissible and minimaxes error.

(1 ⟸ 4) Let δ be an error-admissible method. Suppose that δ makes an error on some data stream ε consistent with K, say at time k. Then the following method δ′ dominates δ in error: δ′ agrees with δ everywhere but at ε|k, and δ′(ε|k) = K ∩ [ε|k]. Then δ′ makes strictly fewer errors than δ on ε, and never makes more. So the only error-admissible methods are those that never make an error on any data stream consistent with K. Hence no error-admissible method makes a conjecture that goes beyond the evidence. If such a method is reliable, then all hypotheses are decidable with certainty.

(1 ⟸ 5) I show the contrapositive. Suppose that not every hypothesis under investigation is decidable with certainty given K. Then there is some hypothesis H in H true on some data stream ε consistent with K such that H is never entailed along ε; that is, ε is a limit point of ¬H ∩ K (see Section 4.4). Now consider some possible bound n on the number of errors that a reliable discovery method δ may make on any data stream. Since δ is reliable, δ eventually converges to H on ε, say at time k. Thus δ conjectures H on ε|k, ε|(k+1), …, ε|(k+n). Since H is never entailed along ε, there is a data stream τ extending ε|(k+n), consistent with background knowledge K, on which H is false. Thus on τ, δ makes at least n+1 errors. Since this argument applies to any bound n, there is no bound on the number of errors that δ might make on a data stream. Thus there is no reliable discovery method that minimaxes errors unless all hypotheses in H are decidable with certainty given background knowledge K. ∎
Proposition 6.8 Let H be a partition of K. Then there is a reliable method δ for H given K such that

1. δ starts with H, and

2. δ requires at most n mind changes, and

3. δ is data-minimal

⟺ (K, H) is not a data-minimal n-feather, i.e. ¬DM-F_n(K, H).
Proof. The proof is by induction on n. The base case follows as in the proof of Proposition 6.5.

Inductive step: Assume the hypothesis for n and consider n+1.

(⟹) Suppose that (K, H) is a data-minimal (n+1)-feather, i.e. DM-F_{n+1}(K, H). Let δ be any reliable data-minimal discovery method for H that starts with H. It follows from Proposition 6.1 that H is consistent with K. So by the definition of DM-F_{n+1}(K, H), there is an ε ∈ K − H such that

1. ∀k: DM-F_n(K ∩ [ε|k], H(ε)), or

2. ∃k: K, [ε|k] ⊨ ¬H, and for all H′ ≠ H in H: DM-F_n(K ∩ [ε|k], H′).

Case 1: ∀k: DM-F_n(K ∩ [ε|k], H(ε)). The argument proceeds as in the proof of Proposition 6.5: a reliable method δ must eventually change its mind, say at ε|k, to H(ε). But then by the assumption of this case, DM-F_n(K ∩ [ε|k], H(ε)), so δ requires at least n+1 more mind changes by inductive hypothesis.

Case 2: ∃k: K, [ε|k] ⊨ ¬H, and for all H′ ≠ H in H: DM-F_n(K ∩ [ε|k], H′). Let k be the first time that witnesses the condition of this case. Since ε|k falsifies H given K, and δ is data-minimal, it follows from Proposition 6.1 that δ changes its mind at ε|k, say to H′ ≠ H. But then we have that DM-F_n(K ∩ [ε|k], H′), so by inductive hypothesis, δ requires at least n+1 more mind changes. Hence in either case, δ requires at least n+2 mind changes.

(⟸) Suppose that (K, H) is not a data-minimal (n+1)-feather, i.e. ¬DM-F_{n+1}(K, H). At each point ε|k for which there is some H′ in H such that (K ∩ [ε|k], H′) is not a data-minimal n-feather (i.e. ¬DM-F_n(K ∩ [ε|k], H′)), apply the inductive hypothesis to (K ∩ [ε|k], H′) and choose a method δ′_{ε|k} and a hypothesis H_{ε|k} with the properties that

1. δ′_{ε|k}(∅) = H_{ε|k};

2. δ′_{ε|k} identifies a correct hypothesis from H given K ∩ [ε|k] with at most n mind changes;

3. δ′_{ε|k} is data-minimal given K ∩ [ε|k].

If H is consistent with K ∩ [ε|k], I modify δ′_{ε|k} as follows: choose τ_{ε|k} ∈ K ∩ [ε|k] ∩ H. Set δ_{ε|k}(e) = K ∩ [e] ∩ H if e ⊂ τ_{ε|k}, and δ_{ε|k}(e) = δ′_{ε|k}(e) otherwise. Note that δ_{ε|k} is data-minimal and reliable given K ∩ [ε|k] since δ′_{ε|k} is.

Since (K, H) is not a data-minimal (n+1)-feather, H is consistent with K. Choose a data stream τ ∈ K that makes H true. Now define a data-minimal discovery method δ that reliably identifies a correct hypothesis from H given K with no more than n+1 mind changes:

1. If e ⊂ τ, δ(e) = K ∩ [e] ∩ H;

2. else if there is a time k such that

(a) 0 < k ≤ lh(e) and

(b) (K ∩ [e|k], H′) is not a data-minimal n-feather for some H′ in H,

then let k be the least such time and conjecture δ_{e|k}(e);

3. else conjecture H.

By definition δ starts with H, i.e. δ(∅) = K ∩ H. I show that δ identifies the correct hypothesis from H using no more than n+1 mind changes. Let ε ∈ K.

Case 1: Clause 1 always obtains along ε. Then ε = τ, and so δ stabilizes to the correct hypothesis H along ε (immediately).

Case 2: Clause 1 fails at some point k along ε. I consider two further cases.

Case 2a: Clause 2 is satisfied at some point along ε. Let m be the first such time. Two more subcases:

Case 2a1: m ≥ k. Then δ follows δ_{ε|m}, which identifies the correct hypothesis from H given K ∩ [ε|m] along ε. If ε = τ_{ε|m}, then δ again does not change its mind along ε at all. Otherwise δ_{ε|m} changes its mind at some time m′ ≥ m from H to follow δ′_{ε|m}, and thereafter requires at most n mind changes. Hence δ identifies the correct hypothesis along ε using at most n+1 mind changes.

Case 2a2: m < k. Then δ_{ε|m} projects H along ε until τ|k = ε|k. Thereafter δ_{ε|m} follows δ′_{ε|m}.

Case 2b: Clause 2 always fails along ε. Then by the definition of a data-minimal (n+1)-feather, H is true on ε. By construction δ stabilizes to H along ε (immediately).

To see that δ is data-minimal, note that δ takes its conjecture seriously, by inductive hypothesis, on any evidence sequence e on which Clause 2 obtains. So the only case to consider is when evidence e deviates from τ but Clause 2 does not obtain anywhere along e. This implies

1. δ(e) ⊨ H, and

2. that e is consistent with H.

The first observation holds by Clause 3. The second follows from the second clause of the definition of a data-minimal (n+1)-feather and the fact that (K ∩ [e], H′) is a data-minimal n-feather for every H′ ≠ H (because otherwise Clause 2 obtains, contrary to supposition). If Clause 2 never obtains along some data stream ε ∈ K ∩ [e], then δ projects H at e. Otherwise Clause 2 obtains eventually on all data streams extending e. In particular, Clause 2 must obtain (for the first time) on some data sequence e′ ⊇ e such that K ∩ [e′] is consistent with H. But then δ_{e′} and hence δ projects H at e′. Since δ maintains H between e and e′ (by Clause 3), δ projects H at e. So δ always takes its conjectures seriously, and thus Proposition 6.1 implies that δ is data-minimal. ∎
Chapter 7
Theory Discovery
7.1 Outline
So far I have examined the problem of finding a correct theory from a range of mutually exclusive alternatives. This model fits the situation of a scientist investigating competing theories about a particular phenomenon of interest. However, general scientific theories treat a broad class of phenomena, and typically the hypothesis advanced to account for one phenomenon does not rule out accounts of another phenomenon. For example, the goal of particle physics is to find the ultimate constituents of matter, the elementary particles, and to determine what reactions elementary particles may undergo. Knowing what elementary particles there are does not imply knowing how they react with each other, and the observation of a given reaction does not tell us whether another is possible.¹ A comprehensive theory of elementary particles answers all these questions. I refer to the task of reliably finding a theory that gives the right answers to the questions under investigation as theory discovery. This chapter develops a learning-theoretic analysis of theory discovery.

I examine two standards of success for theory discovery defined by [Kelly 1995, Ch. 12]: uniform and piecemeal theory discovery. Uniform theory discovery aims to eventually arrive at a complete theory of the domain under investigation. Formally, this paradigm reduces to the kind of discovery problems I have treated so far: take the range of alternative hypotheses to be the set comprising the complete theories for the domain of inquiry. Finding a true complete theory of the phenomena under investigation is often a demanding task, even in the limit of inquiry. Less ambitiously, we may be satisfied if theorists eventually find the right answer about each question of interest, although there may be no particular time at which they have all the right answers. [Kelly 1995] refers to

¹Unless we make assumptions about the relationships among various reactions; see Chapter 8.
this standard of success as piecemeal theory discovery. I generalize the norms of efficient inquiry from Chapter 6 by considering how theory learners perform not just with respect to their overall theories, but also with respect to each phenomenon under investigation. It turns out that minimizing the time required to converge to a complete theory suffices to minimize the time required to settle the individual hypotheses (but not necessarily vice versa). On the other hand, there is no simple relation between minimaxing the number of global theory changes and minimaxing the number of mind changes about each hypothesis.

Piecemeal reliability offers an attractive alternative to "verisimilitude" as a conception of success in science. One of Popper's main motivations for introducing the verisimilitude concept was to allow for the possibility that science may always produce false theories (in the sense of endorsing some false claim) and yet be "close to the truth". Piecemeal reliability too is a notion of success that science can attain while producing nothing but false theories (cf. Section 7.4). But piecemeal reliability is a topological notion based on convergence to the truth that does not require a dubious metric for measuring "distance" from the truth.

A common and attractive way of formulating scientific theories is to express them as a finite set of postulates, "laws of nature" or universal equations. I show that whenever it is possible to piecemeal find the truth about each hypothesis under investigation, it is possible to do so by producing, at each stage of inquiry, a finite set of axioms about the phenomena of interest. On the other hand, theorists who produce only finitely axiomatizable theories may converge to the right answers more slowly than other theorists. So the desiderata of producing finitely axiomatizable theories and minimizing convergence time (i.e., data-minimality) may conflict with each other.

The model of theory discovery developed in this chapter extends the reach of learning-theoretic analysis considerably. In Chapter 8, I apply the machinery from this chapter to analyze the two main research problems of particle physics: (1) finding out what elementary particles exist, and (2) determining how elementary particles react with each other. Another important application of the theory discovery model is a critique of common postulates for "belief revision" from a reliabilist point of view (cf. Section 3.5).
7.2 Reliable Theory Inference

In the context of theory discovery, I shall refer to scientific methods as theory learners, theory discovery methods, or simply theorists. [Kelly 1995, Ch. 12] defines two senses of reliability for theory learners. Uniform theory identification requires that the learner's conjectures must eventually always be correct and complete. Piecemeal theory identification requires that, for each hypothesis of interest, the learner's conjectures eventually always entail the truth about the hypothesis (but at no given time need the learner entail the truth about all hypotheses at once). Figure 7.1 illustrates these two notions of success in inferring theories.

Figure 7.1: Two Notions of Theory Discovery: (a) Uniform Theory Discovery (b) Piecemeal Theory Discovery

The formal definitions are as follows. Let H be a collection of empirical hypotheses, and let K be background knowledge. In what follows, I will assume that if a hypothesis H is in H, then so is its negation ¬H. An H-theory T is K-complete ⟺ for every hypothesis H in H, either K, T ⊨ H or K, T ⊨ ¬H.

Definition 7.1 A theory learner δ uniformly identifies the H-truth given background knowledge K ⟺ for every data stream ε consistent with K, there is a time n such that for all later times n′ ≥ n, δ(ε|n′) is K-complete and correct on ε.

Definition 7.2 A theory learner δ piecemeal identifies the H-truth given background knowledge K ⟺ for every data stream ε consistent with K and every hypothesis H in H, there is a time n such that for all later times n′ ≥ n: δ(ε|n′), K ⊨ H ⟺ H is correct on ε.
7.3 Uniform Theory Discovery

I characterize the conditions under which it is possible to reliably identify a complete theory. We may treat uniform theory discovery as a discovery problem by taking the alternatives under investigation to be (equivalence classes of) complete theories. Then the results from Chapter 4 yield necessary and sufficient conditions for uniform theory discovery.

Two H-theories T, T′ are K-equivalent if and only if for each H in H: T, K ⊨ H ⟺ T′, K ⊨ H. The next lemma says that the K-equivalence classes of K-complete theories form a partition: on any data stream ε in K, all K-complete theories that are true on ε are equivalent to each other.

Lemma 7.1 Let ε be a data stream in K, and let T, T′ be two K-complete H-theories true on ε. Then T ∩ K = T′ ∩ K.

Proposition 7.2 (Kevin Kelly) The H-truth is uniformly identifiable given K ⟺ there are only countably many K-equivalence classes of K-complete H-theories [T₁], [T₂], …, [Tₙ], … and each Tᵢ is decidable in the limit given K.
Since the K-complete H-theories form a partition of K, it follows by the arguments from Chapter 6 that a method δ for uniform theory discovery is data-minimal just in case δ always projects its current theory. In that case I say that δ is globally data-minimal. Similarly, the results from Chapter 6 characterize when and how uniform discovery methods can globally succeed with a bounded number of mind changes. New issues arise when we consider the performance of a theorist with respect to the individual hypotheses of interest. A theory learner δ that reliably identifies the complete H-truth settles the truth of each hypothesis H in H. Provided that H is closed under complementation, as I assume throughout, this means that δ is a reliable discovery method for H₀ = {H, ¬H}. Thus we can define the modulus and the mind changes of δ with respect to H as in Chapter 6; I denote these measures of δ's performance by mod_H(δ, ε) and MC_H(δ, ε).

I say that an H-theory learner δ is piecemeal data-minimal if δ is data-minimal with respect to each hypothesis H in H. An H-theory learner δ piecemeal minimaxes mind changes if δ minimaxes mind changes with respect to each hypothesis H in H.

Since data-minimality requires a definite opinion about every question of interest at each stage of inquiry, a theory learner δ that is data-minimal either with respect to convergence to a complete theory or piecemeal data-minimal
always produces complete theories consistent with the evidence. Moreover, if δ converges to a complete theory as quickly as possible, δ projects its current (and complete) theory T along some data stream ε (by Proposition 6.1). Since T is complete, this implies that for each hypothesis H in H, δ projects its current conjecture about H along ε. So if a method δ is globally data-minimal, δ is piecemeal data-minimal.

Figure 7.2: Method δ projects each of Hr and H2 along some data stream, but not both along the same data stream.
Fact 7.3 Let H be a collection of hypotheses for investigation with background knowledge K. Suppose that a method δ for finding a correct and complete H-theory is globally data-minimal. Then δ is piecemeal data-minimal.

The converse may fail. A theory learner that is piecemeal data-minimal must project each hypothesis entailed by its current theory along some data stream, but need not project all of them along the same data stream. For example, suppose that H2 asserts that at most two particles will be found in the lab, and that Hr claims that some particle will decay into another (see Figure 7.2).

Suppose that before any evidence is gathered, δ conjectures both H2 and Hr. If the first experiment shows one or more particles, δ conjectures that H2 is false but that Hr is true. Then if the second experiment shows one particle decaying into another, δ becomes certain that Hr is true. If the first experiment does not turn up any particles, δ conjectures that H2 is true but that Hr is false. Then if the second experiment discovers no more than two particles, δ conjectures H2 and continues to do so unless more than two particles are observed. So δ projects each of H2 and Hr along some data stream, but there is no data stream along which δ projects its initial theory H2 ∩ Hr. Thus δ is piecemeal but not globally data-minimal.
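The conjecture pattern just described can be transcribed into a short sketch. This is my own illustration, not the author's formalism: the evidence is simplified to particle counts per experiment, and all names are assumptions. It records only which hypotheses the method's theory entails at each stage, showing that the joint initial theory is never maintained past stage 0 on any stream.

```python
def delta(evidence):
    """Transcription of the example method's conjectures about H2 ('at most
    two particles will be found') and Hr ('some particle will decay')."""
    if not evidence:
        return {"H2", "Hr"}              # initial theory conjectures both
    if evidence[0] >= 1:                 # first experiment finds particles
        return {"Hr"}                    # H2 dropped, Hr projected from now on
    return {"H2"}                        # no particles: H2 projected, Hr dropped

for stream in [(1, 1), (0, 0)]:          # particle counts in two experiments
    history = [delta(stream[:t]) for t in range(len(stream) + 1)]
    assert history[0] == {"H2", "Hr"}
    assert all(len(c) == 1 for c in history[1:])   # joint theory never maintained
```

On the first stream Hr is projected from stage 1 onward, on the second H2 is; each conjecture is projected along some stream, yet no stream keeps the initial theory H2 ∩ Hr, which is why δ is piecemeal but not globally data-minimal.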
Methods for discovering complete theories may minimize the number of theory
changes even though they cannot minimax retractions with respect to individual
hypotheses. Intuitively, the reason is that the right answer to one of the
questions under investigation may be a bold generalization that implies another
bold generalization about a second question. The theorist may then have to take back
the generalization about the second question later, changing his mind about that
question more often than was necessary. In fact, Goodman's Riddle of Induction
has this structure (see Figure 6.4). Each hypothesis Hgrue(k) is decidable with
certainty, and hence with 0 mind changes (Theorem 6.7), by waiting until the
critical time k. But if all observed emeralds are green, eventually any reliable
method δ has to project that all future emeralds are green, too. If this occurs
at time k, δ conjectures that Hgrue(k+1) is false. Then if the (k+1)-th emerald
is blue, proving that Hgrue(k+1) is the correct generalization about the color of
emeralds, δ has to change its mind about Hgrue(k+1). Hence δ succeeds with a
bounded number of overall theory changes (namely 1), but does not minimax
retractions with respect to the individual Hgrue(k) hypotheses.
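The bookkeeping in this example can be checked with a small simulation. The sketch below is a toy encoding, not part of the text's formal apparatus: an emerald stream is given as a finite prefix of colors, and the natural projection rule conjectures "all emeralds are green" until a blue emerald appears, after which it is certain of the critical time.

```python
def projection_rule(evidence):
    """Natural projection: conjecture the time of the first blue emerald
    seen so far, or None for 'all emeralds are green'."""
    for t, color in enumerate(evidence):
        if color == "blue":
            return t                     # conjecture H_grue(t)
    return None                          # conjecture "all green"

def conjecture_stream(prefix):
    """The rule's successive conjectures on ever longer initial segments."""
    return [projection_rule(prefix[:n]) for n in range(len(prefix) + 1)]

def count_changes(seq):
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

prefix = ["green"] * 5 + ["blue"] * 3    # H_grue(5) is correct on this stream
conjs = conjecture_stream(prefix)

assert count_changes(conjs) == 1                    # one overall theory change
assert count_changes([c == 5 for c in conjs]) == 1  # but one retraction on H_grue(5)
```

On this stream the rule changes its overall theory exactly once, yet it retracts its verdict on the single hypothesis Hgrue(5), a retraction that a method waiting until time 5 never needs to make.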
Conversely, a theory learner may piecemeal minimax retractions and at the
same time change its overall theory more often than is necessary. The reason
is that a theorist may schedule mind changes about the individual hypotheses
badly. For example, take H1 to be the hypothesis that there is a white raven,
and let H=1 be the hypothesis that there is exactly one white raven (see Figure
7.3).
It is possible to assess H1 with at most one mind change; H=1 requires at
least two, but not more. The following method δ finds the complete truth about
H1 and H=1 with at most two mind changes: conjecture that there is no white
raven until one is found (that is, hypothesize that both H1 and H=1 are false);
then conjecture that there is exactly one until a second white raven turns up.
Now consider another method δ′ that also begins by conjecturing that there is
no white raven until one is discovered. When the first white raven appears, δ′
concludes that H1 is true, as δ does; but δ′ continues to think that H=1 is false
(δ′ may believe that if there is one white raven, there must be another). Now
if no second white raven appears, δ′ must eventually change its theory, for the
second time, and guess H=1, claiming that there is exactly one white raven. But
just at that point, the second white raven may appear, leading δ′ to change its
theory for the third time. Since there is another reliable method that changes
its theory at most twice, namely δ, this means that δ′ does not minimax theory
changes.

Figure 7.3: Method δ′ changes its overall theory three times, its conjecture
about H=1 twice, and its conjecture about H1 once.

On the other hand, δ′ changes its mind about H1 and H=1 respectively
no more often than is necessary: at most once about H1, and at most twice
about H=1.²
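The retraction counts attributed to the two methods can be verified mechanically. The sketch below is a hypothetical encoding: each stage reports whether a new white raven was observed, and the parameter PATIENCE stands in for the unspecified point at which the second method gives up expecting a second raven.

```python
PATIENCE = 3   # hypothetical: how long δ′ waits for a second white raven

def delta(n_white, _):
    """Method δ: 'no white raven', then 'exactly one', then 'at least two'."""
    if n_white == 0:
        return (False, False)            # verdicts on (a white raven, exactly one)
    return (True, n_white == 1)

def delta_prime(n_white, stages_with_one):
    """Method δ′: keeps denying 'exactly one' after the first white raven,
    conceding it only after PATIENCE stages without a second."""
    if n_white == 0:
        return (False, False)
    if n_white >= 2:
        return (True, False)
    return (True, stages_with_one > PATIENCE)

def run(method, stream):
    conjs, n_white, stages_with_one = [], 0, 0
    for obs in stream:                   # obs = 1 iff a new white raven is seen
        n_white += obs
        if n_white == 1:
            stages_with_one += 1
        conjs.append(method(n_white, stages_with_one))
    return conjs

def changes(seq):
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

# First white raven at stage 3, a second one only at stage 10:
stream = [0, 0, 1, 0, 0, 0, 0, 0, 0, 1]

assert changes(run(delta, stream)) == 2          # δ: two theory changes
assert changes(run(delta_prime, stream)) == 3    # δ′: three theory changes

conjs = run(delta_prime, stream)
assert changes([c[0] for c in conjs]) == 1       # about 'there is a white raven': once
assert changes([c[1] for c in conjs]) == 2       # about 'exactly one': twice
```

The per-hypothesis counts in the last two assertions match the figure's caption: the second method is piecemeal optimal even though it changes its overall theory three times.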
Although globally minimaxing retractions is in general independent of piecemeal
minimaxing retractions, it seems natural to require that a method should
avoid changing its overall theory, as well as its conjectures about particular
hypotheses, as much as possible. Chapter 8 uses this idea to define optimal
inference in particle physics.
7.4 Piecemeal Theory Discovery
Recall that a method δ piecemeal identifies the H-truth just in case for each
hypothesis H in H, δ eventually converges to the correct opinion about H on
every possible data stream (Definition 7.2). Clearly, uniform success in theory
discovery entails piecemeal success. Also, there is no guarantee that the
successive theories produced by a piecemeal discovery method get ever more
"verisimilar". A piecemeal method is permitted to add as much new falsehood
as it pleases at each stage, so long as more and more hypotheses have their truth
values correctly settled.
A few examples may clarify the difference between the two concepts of success.
Let H0 be the set of all evidence propositions [e] and their complements.
It is trivial to find the complete H0-truth in a piecemeal fashion, simply by
repeating the data as it is received. But no method can do so uniformly: for
each data stream ε, {ε} is the complete H0-truth, and there are uncountably
many such singleton theories, whereas the range of each discovery method
is countable, so most such theories cannot even be conjectured.
Even when there are only countably many distinct H-complete theories,
piecemeal success may be possible when uniform success is not. Let the hypothesis
Hn say that all swans after and including the n-th one are white. Let H1
be the set of all such hypotheses and their negations. H1 requires piecemeal
solutions to perform nontrivial inductive inferences, since each hypothesis Hn is
a claim about the unbounded future. This time, there are only countably many
distinct H1-complete theories.³ To succeed piecemeal, let method δ conjecture
Hn+1 − Hn ("the last non-white swan is the n-th one") whenever the last
non-white swan in the data so far occurs at position n.

²This example raises another issue for evaluating the mind changes of a theory
discovery method. Some of δ′'s overall theory changes seem more significant than
others (in particular, the first one does). It would be good to have a criterion
that weights mind changes by their significance, for example by their respective
losses of content. This seems to require a measure of content (as in [Levi 1967]).

³Such a theory either says exactly when the last non-white swan appears or says
that there are infinitely many non-white swans.

Nonetheless, uniform success is still impossible. For suppose for reductio
that some method δ succeeds uniformly. Then a wily demon can present one
white swan after another until δ's conjecture entails H0 ("all swans are white"),
which it must eventually do on that data stream. Then a non-white swan is
presented (say, at stage m), followed by only white ones until δ's conjecture
entails Hm+1, which it must eventually do on that data stream; and so on. The
data stream ε so presented features infinitely many non-white swans, so δ produces
infinitely many conjectures inconsistent with the complete H1-theory of ε.
The same argument shows that no piecemeal solution could possibly succeed
by eventually producing only true hypotheses, since the construction forces an
arbitrary piecemeal solution to produce infinitely many false conjectures. Hence,
piecemeal success is sometimes possible only if infinitely many false conjectures
are produced.
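A minimal sketch of the piecemeal swan learner just described, under the toy assumption that only a finite prefix of the stream is inspected and that the truth of each Hk is evaluated on that prefix's tail:

```python
def learner(prefix):
    """Conjecture the position of the first all-white tail: n+1 if the last
    non-white swan seen so far sits at position n, else 0 ('all swans white')."""
    nonwhite = [i for i, c in enumerate(prefix) if c != "white"]
    return nonwhite[-1] + 1 if nonwhite else 0

def opinion(prefix, k):
    """The learner's verdict on H_k: all swans from the k-th on are white."""
    return learner(prefix) <= k

# Non-white swans at positions 2 and 5 only:
stream = ["white"] * 21
stream[2] = stream[5] = "black"

for k in range(10):
    truth_of_Hk = all(c == "white" for c in stream[k:])  # truth on this tail
    assert opinion(stream, k) == truth_of_Hk
```

On any stream with finitely many non-white swans the learner's verdict on each Hk stabilizes on the truth, while the demon's stream, with infinitely many non-white swans, forces endless revision, as the argument above shows.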
When H is closed under complementation, a method δ that piecemeal identifies
the H-truth decides each hypothesis H in H in the limit. Hence if the H-truth is
piecemeal identifiable, it must be the case that each H ∈ H is decidable in the
limit (cf. Section 4.4). But this condition is not sufficient in general. A simple
counterexample is obtained by taking the hypotheses of interest to be the singleton
theories consisting of exactly one data stream, together with their complements;
that is, H ∈ H2 just in case H = {ε} or H is the complement of {ε}, for some
data stream ε. Each hypothesis in H2 is verifiable or refutable with certainty,
but it is impossible to piecemeal discover the H2-truth without countable background
knowledge. For the only consistent H2-theory that entails a singleton
hypothesis consists of the hypothesis itself. But if there are uncountably many
data streams consistent with the learner's background assumptions, then most
of these cannot be conjectured, since the learner can only make countably many
conjectures. If {ε} is one of the theories never conjectured by the learner, the
learner will fail on data stream ε.
An important special case arises when there are only countably many
hypotheses under investigation. For example, the hypotheses of interest may be
defined in a countable language. Proposition 7.4 below (established by [Kelly 1995,
Prop. 12.20]) asserts that if H is countable and each hypothesis in H is decidable
in the limit, then the H-truth can be identified piecemeal. So if piecemeal
discovery of the H-truth requires that H be countable, the additional condition that
each hypothesis in H be decidable in the limit would be necessary and sufficient
for identifying the H-truth piecemeal. I leave open the question of whether it
is possible to piecemeal identify the H-truth for uncountable collections H of
empirical hypotheses.
Proposition 7.4 (Kevin Kelly) Suppose that H is countable. Then the H-truth is piecemeal identifiable given K ⟺ each H ∈ H is decidable in the limit given K.
As with uniform theory discovery, I say that a method δ that piecemeal identifies
the H-truth is (piecemeal) data-minimal if and only if δ is data-minimal
with respect to each hypothesis H in H. As we saw in Section 7.3, a data-minimal
theory learner δ always conjectures a complete theory. The following
Proposition 7.5 says that for any piecemeal solvable problem, there is a data-minimal
solution δ. In fact, the proof of Proposition 7.5 shows more: we can
construct a piecemeal data-minimal reliable method that projects its
entire current theory along some data stream at each stage.
Proposition 7.5 Suppose that the H-truth is piecemeal identifiable given K. Then there is a data-minimal H-theory learner δ that piecemeal identifies the H-truth given K.
As with uniform theory learners, I say that a piecemeal theory learner δ
piecemeal minimaxes retractions if δ minimaxes retractions with respect to each
hypothesis under investigation. As we saw in Section 7.3, the analog of Proposition
7.5 fails for piecemeal minimaxing retractions: the investigation of a
class of phenomena may force a theorist to change his mind about individual
hypotheses more often than is necessary.
7.5 Countable Hypothesis Languages and Finite Axiomatizability
A natural and attractive way of presenting scientific theories is to express them
as a set of postulates, for example "laws of nature" or universal equations.
So we may desire methods for theory inference that express their theories with
finitely many postulates.

I say that an empirical theory T is finitely axiomatizable in a language
H given background knowledge K if T is the intersection of a finite number of
empirical propositions from H with K. According to this definition, a theorist's
conjecture may be finitely axiomatizable even when her background theory is
not. For example, if the theorist postulates Kepler's laws, arithmetic may well be
a part of her background assumptions (with no finite axiomatization), whereas
there are only three of Kepler's laws.
The next proposition shows that if the collection H of hypotheses under
investigation is countable, and it is possible to piecemeal identify the H-truth,
then it is possible to do so with finitely axiomatizable theories only.
Proposition 7.6 (Kevin Kelly) Suppose that H is countable, and that the H-truth is piecemeal identifiable given K. Then there is an H-theory learner δ that identifies the H-truth and always produces theories that are finitely axiomatizable in H.
By Proposition 7.5, we can have a data-minimal method for piecemeal identifying
the H-truth, if that is possible at all; and by Proposition 7.6, we can
have a method that produces finitely axiomatizable theories only. But we cannot
always have a method that does both. Suppose, for example, that we looked
upon mathematics as a "physics of abstract objects" [Coffa 1991], in which
mathematicians adopt proof systems that "save the mathematical phenomena",
namely the set of mathematical propositions accepted by the mathematical
community at a given time. Let's consider arithmetic, the theory of the natural
numbers. To keep the example simple, I shall take the "phenomena" (the evidence
at time t) to consist of a finite list of axioms sound and complete for the set of
arithmetical propositions accepted by the mathematical community at time t.
The hypotheses of interest are all sentences of arithmetic; hs is correct just in
case sentence s is eventually always part of the accepted arithmetic of the time.
More formally, the hypothesis hs is correct on a sequence A1, A2, ... of axiom
systems just in case there is a time n such that for all later times n′ ≥ n, s is a
theorem of An′. As stated, each hypothesis hs is only verifiable, but not decidable,
in the limit. But we may add the background assumption K that once a
sentence s of arithmetic is accepted by the community (that is, once s is a theorem
of an axiom system An), the community never retracts s (that is, s
remains a theorem of An+1, An+2, ...). Then each hypothesis hs is equivalent to
a simple existential claim ("there is a time at which s is accepted"), so by Proposition
7.4, there is a reliable method for piecemeal identifying which sentences of
arithmetic the mathematical community will accept; and by Proposition 7.5,
there is a data-minimal method for doing so. Also, there are only countably
many sentences of arithmetic, so by Proposition 7.6, there is a piecemeal theory
learner for this problem who conjectures finitely axiomatizable theories only.
These propositions still apply if we add as further background knowledge K′
the optimistic belief that in the limit of mathematical inquiry, mathematicians
find all (and only) the truths of arithmetic. So hs is correct on any sequence ε of
arithmetics consistent with K′ just in case s is a true sentence of arithmetic.
Now a data-minimal method for piecemeal identifying the truth about each
hypothesis hs given background knowledge K′ has to produce a complete theory
consistent with K′, in the language of arithmetic. But it follows from Gödel's
first incompleteness theorem that no such complete theory is finitely (or even
recursively) axiomatizable.
7.6 Proofs
Lemma 7.1 Let ε be a data stream in K, and let T, T′ be two K-complete H-theories true on ε. Then T ∩ K = T′ ∩ K.
Proof. If both T and T′ are inconsistent with K, the claim is immediate.
Suppose for reductio that there is some data stream τ on which T ∩ K is correct
but T′ ∩ K is not. Let HT = {H ∈ H : T ⊨ H}, and define HT′ similarly. Since
an H-theory is the intersection of hypotheses in H, we have that T = ∩HT
and T′ = ∩HT′. As τ ∉ T′, there must be a hypothesis H ∈ HT′ such that H
is not correct on τ. Since T′ ⊆ H by definition of HT′, we have that T′, K ⊨ H,
and so T, K ⊨ H because T and T′ are K-equivalent and K-complete. So
T ∩ K ⊆ H, contrary to the assumption that τ ∈ (T ∩ K) − H. ∎
Proposition 7.2 (Kevin Kelly) The H-truth is uniformly identifiable given K ⟺ there are only countably many K-equivalence classes of K-complete H-theories [T1], [T2], ..., [Tn], ..., and each Ti is decidable in the limit given K.
Proof. Without loss of generality, consider only theory learners that produce
K-complete theories. We may think of such a theory learner as conjecturing
the equivalence class of its K-complete theory. The proposition then follows
immediately from Proposition 4.2 and Lemma 7.1. ∎
Proposition 7.5 Suppose that the H-truth is piecemeal identifiable given K. Then there is a data-minimal H-theory learner δ that piecemeal identifies the H-truth given K.
Proof: Choose an H-theory learner δ whose conjectures are always K-consistent
and that piecemeal identifies the H-truth given K. I associate a
data stream Stream(e) ∈ K with each finite data sequence e consistent with
K, in the following way. Let e·x denote the finite data sequence in which e is
followed by x ∈ E.

1. Stream(∅) = some ε ∈ K such that ε ∈ δ(∅).

2. Stream(e·x) =

(a) Stream(e), if e·x is an initial segment of Stream(e);

(b) else some ε ∈ K such that ε ∈ δ(e·x).

Now define

δ′(e) = {Stream(e)}.

I verify that δ′ piecemeal identifies the H-truth given K. Let H ∈ H and ε ∈ K
be given. Let m = mod(δ, H, ε). Suppose that H is correct on ε. Then for all
m′ ≥ m, δ(ε|m′), K ⊨ H. So for all m′ ≥ m, Stream(ε|m′) ∈ H, and hence
δ′(ε|m′), K ⊨ H. Similarly when H is not correct on ε. To see that δ′ is data-minimal,
note that δ′ always projects its current theory δ′(e) along Stream(e).
By the argument for Fact 7.3, δ′ is piecemeal data-minimal. ∎
Proposition 7.6 Suppose that H is countable, and that the H-truth is piecemeal identifiable given K. Then there is an H-theory learner δ that identifies the H-truth and always produces theories that are finitely axiomatizable in H.
Proof: Let H and K be as described, and let H1, H2, ..., Hn, ... be an enumeration
of H. Let δ be a consistent H-theory learner that piecemeal identifies
the H-truth given K. Let Pos(e) = {Hj ∈ H : δ(e), K ⊨ Hj and j ≤ lh(e)}.
Define

δ′(e) = ∩ Pos(e).

Clearly δ′'s theories are finite intersections of propositions in H and hence finitely
axiomatizable. To see that δ′ piecemeal identifies the H-truth, let ε ∈ K and
Hi ∈ H be given. Let m(i) = mod_Hi(δ, ε), and let m′ = max{m(i), i}. Then
mod_Hi(δ′, ε) ≤ m′. ∎
This argument, together with the observation that if δ piecemeal identifies the
H-truth given K, then δ decides each H ∈ H in the limit given K, establishes
Proposition 7.4.
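The Pos(e) construction in this proof is easy to run on a toy problem. In the sketch below (all names are illustrative, not from the text), worlds are 3-bit tuples, the enumerated hypotheses each fix one bit, the underlying learner conjectures the set of worlds consistent with the data, and the derived learner intersects just those entailed hypotheses whose index is at most the length of the evidence:

```python
from itertools import product
from functools import reduce

K = set(product([0, 1], repeat=3))          # background knowledge: all 3-bit worlds

# Enumeration H_1, H_2, ... of the hypotheses: "bit j has value v".
HYPS = [{w for w in K if w[j] == v} for j in range(3) for v in (1, 0)]

def delta(e):
    """Underlying learner: all worlds consistent with the data so far
    (stage i reveals bit i mod 3 of the actual world)."""
    return {w for w in K if all(w[i % 3] == b for i, b in enumerate(e))}

def delta_prime(e):
    """Pos(e) = {H_j : delta(e) entails H_j and j <= lh(e)}; conjecture its
    intersection (the empty intersection is read as K)."""
    pos = [H for j, H in enumerate(HYPS, start=1)
           if j <= len(e) and delta(e) <= H]
    return reduce(set.__and__, pos, set(K))

world = (1, 0, 1)
stream = [world[i % 3] for i in range(8)]

final = delta_prime(stream)                 # a finite intersection of hypotheses
for H in HYPS:                              # ... that settles every H_j correctly
    assert (final <= H) == (world in H)
```

Each conjecture of delta_prime is by construction a finite intersection of enumerated hypotheses with K, and the delay in admitting H_j (waiting until lh(e) ≥ j) is exactly why the modulus of convergence grows to max{m(i), i} in the proof.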
Chapter 8
Reliable Theory Discovery in Particle Physics
8.1 Outline
One of the aims of the philosophy of science is to analyze the methodology of
problems that arise in scientific practice. In this chapter I study the two basic
inference problems of particle physics: to find out what particles exist, and to
determine what reactions are possible among them. For each of these questions,
I ask first under what assumptions it is possible to settle reliably on the right
answer. Second, I examine how to search for the right answer efficiently, in the
sense of avoiding retractions and minimizing convergence time from Chapter 6.
Even with generous assumptions about the power of experimental apparatus
for detecting elementary particles, there is no reliable method for determining
whether there are finitely or infinitely many elementary particles, even in the
limit of inquiry. But if we add the common assumption that there are only
finitely many elementary particles, it becomes a straightforward matter to identify
them. Indeed, there is a unique data-minimal inference rule that minimaxes
retractions: posit that the only particles that exist are those observed so far.
Thus we find, as in Section 6.4, that the same efficiency considerations that
yield the natural projection rule in Goodman's Riddle of Induction underwrite
inferences in the spirit of Occam's razor in particle research.
Without further background assumptions, it is a difficult empirical problem
to arrive at a complete true theory of how elementary particles react with each
other. There is no reliable method for accomplishing this, even if we assume
that there are only finitely many elementary particles and that we have discovered
them all. This situation changes if we add a prominent feature of particle
physics: the idea that a complete theory of elementary particle reactions should
be a conservation theory. A conservation theory introduces certain quantities,
assigns each particle a value of those quantities, and posits that a reaction is
possible just in case it conserves all of them. The belief that some conservation
theory can describe the physically possible particle reactions is a material
assumption about nature; I call it the conservation principle. Under the conservation
principle, finding a complete true theory of reactions among observable
particles becomes a relatively easy problem: for a given finite set of particles,
we can identify the possible reactions among them with a bounded number of
mind changes (cf. Section 6.4). And again, avoiding retractions requires that
we take the possible reactions to be those observed so far.
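To preview the idea in code (a sketch only; the chapter develops conservation theories formally later), a conservation theory is a table of quantity values per particle, and a reaction is deemed possible just in case every quantity has the same total on both sides. Charge and baryon number are used here as illustrative quantities with standard textbook values; they are not assignments made in the text.

```python
from collections import Counter

# Illustrative quantum properties: value of each quantity for each particle.
QUANTITIES = {
    "charge": {"p": 1, "n": 0, "pi+": 1, "pi0": 0, "e-": -1, "nu_e": 0},
    "baryon": {"p": 1, "n": 1, "pi+": 0, "pi0": 0, "e-": 0, "nu_e": 0},
}

def conserves(reagents, products, values):
    """A quantity is conserved iff its total is equal on both sides."""
    total = lambda side: sum(values[q] * k for q, k in Counter(side).items())
    return total(reagents) == total(products)

def possible(reagents, products):
    """The conservation theory permits a reaction iff it conserves everything."""
    return all(conserves(reagents, products, v) for v in QUANTITIES.values())

assert possible(["p", "p"], ["p", "n", "pi+"])   # observed reaction: permitted
assert not possible(["p"], ["pi+", "pi0"])       # ruled out: violates baryon number
```

The second assertion illustrates how a conservation theory rules out an unobserved reaction: the proton cannot decay into pions without violating baryon number.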
The task of finding a complete correct theory of particle reactions becomes
more complex if we allow that reactions may involve virtual, or hidden, particles.
Under the conservation principle, together with the assumption that there are
only finitely many particles (of the observable and the unobservable kind), we
still have a reliable solution to this problem. But we cannot guarantee a bound
on how often a theorist may have to change her mind to arrive at a correct
theory of particle reactions.
In the presence of virtual particles, the conservation principle interacts with
efficiency considerations in surprising ways: it turns out that conservation theories
must sometimes introduce hidden particles to rule out unobserved reactions.
I describe an algorithm for doing so. This algorithm is a reliable procedure for
identifying a complete true theory of the reactions that are possible among a
finite set of particles; it is efficient with respect to (1) minimizing convergence
time (data-minimality), (2) avoiding global theory changes, and (3) avoiding
changes in its hypotheses about individual reactions. This analysis gives a means-ends
interpretation of the role of conservation theories and hidden particles in
particle physics: they are important moves in the game of inquiry that serve
reliability and efficiency.
Both parsimony and conservatism (avoiding retractions) have been advanced
as important principles of scientific inference. In the problem of inferring conservation
theories, there is a tension between the two. My results show that
when observed reactions refute the current conservation theory, the theorist
typically has a choice between assigning new and fewer quantum properties,
on the one hand, and extending the current ones by introducing virtual particles
(such as the muon and electron neutrino), on the other. Historically, physicists have
not changed quantum properties once introduced. Whether this is a choice for
conservatism and against parsimony is an interesting question. In view of the
mass of data about particles and particle reactions, investigating this question
would require a computer program that, for a given database of particles and
reactions, finds conservation theories that introduce as few quantum properties
and virtual particles as the data permit. Designing such a program and running
it on the currently available evidence would be an interesting project. Is
it possible to simplify particle physics?
These considerations raise the question of how many quantum properties
particle physics might need. It turns out that there is a surprising connection
between the number of stable particles (which is small compared to the number
of unstable particles and reactions) and the complexity of conservation theories:
I show that under (a version of) conservation of energy, there cannot be more
(linearly independent) quantum properties than there are stable particles.
8.2 Elementary Particles and Reactions
A particle is an object that obeys the rules of quantum mechanics for a point
with well-defined mass and charge [Omnes 1971, Ch. 1.1]. Physicists refer to
particles that are neither atoms nor nuclei as 'elementary' particles.¹ The goal
of particle physics is to determine what elementary particles there are, and to
find out how they react with each other. Table 8.1 shows some well-known
elementary particles.²

Most elementary particles are unstable and decay quickly. For example,
the decay of the pion into a muon and a neutrino takes on average 2.5 × 10⁻⁸
seconds. The standard notation for this decay is π⁺ → μ⁺ + ν_μ. Two colliding
particles can react with each other to form new particles. For example, two
protons may react to produce a proton, a neutron, and a pion. The notation for
this reaction is p + p → p + n + π⁺. It appears that three-way collisions are too
unlikely to be of interest to physicists.
I introduce a number of mathematical objects to represent particles and
reactions. Let P be the set of logically possible types of particles. (In what
follows, I follow physicists in not always distinguishing between types and tokens
of particles, and will for example speak of the 'logically possible particles' rather
than the 'logically possible types of particles'.) Since particles are discrete
entities, I take P to be the set of natural numbers ℕ. As is standard practice,
I will denote particles by Roman or Greek letters, sometimes with indices (e.g.,
p1). The reagents in a reaction are represented by a function a : P → ℕ; the
function a indicates how many instances of a given particle type are among
the reagents. For example, in the reaction p + p → p + n + π⁺, we have that
a(p) = 2, and a(q) = 0 for all logically possible particles q other than the proton.
Similarly, the products in a reaction are represented by a function p : P → ℕ.
In the reaction mentioned, we have that p(p) = 1, p(n) = 1, p(π⁺) = 1, and
p(q) = 0 for all other logically possible particles. A reaction r is a pair (a, p).
For a given reaction r = (a, p), I denote the reagents in r by agents(r), and
define this set to be agents(r) = {p ∈ P : a(p) > 0}. A reaction r = (a, p) is a
decay if only one particle occurs among the reagents, that is, if agents(r) = {p}
and a(p) = 1. A reaction r = (a, p) is a collision if exactly two particles occur
among the reagents, that is, if Σ_{p ∈ agents(r)} a(p) = 2. The products in r are
denoted by products(r), and defined as products(r) = {p ∈ P : p(p) > 0}.
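These definitions translate directly into code. The sketch below represents the functions a and p as multisets (Python Counters) and names particle types by string rather than by natural number, which changes nothing essential:

```python
from collections import Counter

# A reaction r = (a, p): multisets of reagent and product particle types.
def make_reaction(reagents, products):
    return (Counter(reagents), Counter(products))

def agents(r):    return {q for q in r[0]}          # reagent types, a(q) > 0
def prods(r):     return {q for q in r[1]}          # product types, p(q) > 0
def particles(r): return agents(r) | prods(r)       # all types involved in r

def is_decay(r):
    return sum(r[0].values()) == 1                  # a single reagent

def is_collision(r):
    return sum(r[0].values()) == 2                  # exactly two reagents

# pi+ -> mu+ + nu_mu : a decay.
pion_decay = make_reaction(["pi+"], ["mu+", "nu_mu"])
# p + p -> p + n + pi+ : a collision with a(p) = 2 and p(p) = 1.
pp = make_reaction(["p", "p"], ["p", "n", "pi+"])

assert is_decay(pion_decay) and not is_collision(pion_decay)
assert is_collision(pp) and pp[0]["p"] == 2 and pp[1]["p"] == 1
assert particles(pp) == {"p", "n", "pi+"}
```

The Counter values play the role of a(q) and p(q), so the decay and collision conditions are just the stated sums over agents(r).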
¹Except for the proton, which is considered an elementary particle although it is the nucleus of the hydrogen atom.
²The table is from [Cooper 1992, p. 455].
Symbol  Name             Mass in MeV  Charge
Ξ⁻      xi minus         1319         −e
Ξ̄⁺      antixi plus      1319         +e
Ξ⁰      xi zero          ≈ 1311       0
Ξ̄⁰      antixi zero      ≈ 1311       0
Σ⁻      sigma minus      1196         −e
Σ̄⁺      antisigma plus   1196         +e
Σ⁰      sigma zero       1192         0
Σ̄⁰      antisigma zero   1192         0
Σ⁺      sigma plus       1190         +e
Σ̄⁻      antisigma minus  1190         −e
Λ⁰      lambda           1115         0
Λ̄⁰      antilambda       1115         0
n       neutron          940          0
n̄       antineutron      940          0
p       proton           938          +e
p̄       antiproton       938          −e
K⁰      K zero           498          0
K̄⁰      anti-K zero      498          0
K⁺      K plus           494          +e
K⁻      K minus          494          −e
π⁺      pion plus        140          +e
π⁻      pion minus       140          −e
π⁰      pion zero        135          0
γ       photon           0            0
μ⁻      muon minus       106          −e
μ⁺      muon plus        106          +e
e⁻      electron         0.511        −e
e⁺      positron         0.511        +e
ν_e     e neutrino       0            0
ν̄_e     e antineutrino   0            0
ν_μ     μ neutrino       0            0
ν̄_μ     μ antineutrino   0            0

Table 8.1: Some Elementary Particles
Figure 8.1: A Particle World and the Particles in it.
The particles involved in r are the reagents and the products in r; formally
particles(r) = agents(r)[products(r). LetR denote the set of logically possible
reactions. For a given set P of particles, R(P ) denotes the set of possible
reactions involving particles in P ; formally, R(P ) = fr 2 R : particles(r) � Pg.
A possible particle world w is a set of reactions, that is a subset of R. The
reactions in w are called the physically possible reactions in w. The particles
in the particle world w are the particles involved in the physically possible
reactions in w; that is, particles(w) =Sfparticles(r) : r 2 wg|see Figure 8.1.3
I denote the set of possible particle worlds by W . A proposition about
particles is a set of particle worlds. A particle theory is a proposition about
particles. Thus my particle theories only give information about what reactions
are possible and what particles exist, but not about the order in which reactions
occur. Nor do they make claims about reaction times, the momenta of the
reagents and the products, or other quantities associated with a reaction. I
leave the task of expanding the model of particle theories for future work; we
shall see that the problem of determining what reactions are possible is rich
enough by itself.
8.3 Evidence in Particle Physics
Particle physicists gather information about the actual world of particles through
experiments and observations. As the experimental practice of particle physics
proceeds, the laboratories report more and more reactions. Let us imagine
that the experimental community issues a sequence l1, l2, ..., ln of successive
reports of the reactions that the laboratories of particle physicists have observed.
What can we assume about the relationship between the phenomena that the
labs report and the way particles actually are? (In Kant's terminology, this
is a question about the relationship between the world of experience and the
particle world-in-itself.)

³Thus I implicitly assume that each particle in a particle world takes part in at least one reaction.

An optimist would believe that whatever the lab reports actually occurred; in that case, let us say that the experimental practice
is sound. He might also have faith that experimentalists will eventually turn
up all the particles that there are, if not immediately, then at least eventually
as inquiry continues indefinitely. In that case I say that experimental practice
is complete in the limit. It's not easy to tell whether physicists believe that
their experimental practice will eventually turn up all particles that there are,
but in any case they don't seem too worried about the possibility that a particle
might forever escape detection by their instruments.⁴

⁴An empiricist might interpret this as indicating that particle physicists are content with empirically adequate theories; cf. Section 2.2 and [Van Fraassen 1980].

It is naive to think that experimental practice is sound and always gets it right.
The record shows some prominent examples in which physicists reported experimental
results that they later rejected as spurious. This fact raises some interesting
questions about the epistemology of experimentation. One approach would be to
ask under what circumstances we are "justified" in accepting experimental reports
as true, or when laboratory observations "confirm" hypotheses about particles and
their behavior. ([Franklin 1990] pursues this project in detail.) A reliabilist would
instead take a long-run perspective and ask in what circumstances experimental
practice can eventually settle on a correct account of the empirical phenomena.
For example, Karl Popper proposed to view experimental reports as universal
hypotheses of the form "all future experiments will replicate this phenomenon"
[Popper 1968]; see also Section 2.4. On this view, accepting a lab report until
it fails to be replicated will converge to the correct opinion about experimental
effects (although we may not be certain that a phenomenon is genuine). It may
be too optimistic to suppose that we can always trust that failures to replicate
a phenomenon prove the phenomenon to be spurious; all that we may want to
demand of experimental inquiry is that eventually, after some finite number of
replications and failures of replication, it should yield the correct opinion about
whether or not a laboratory observation reflects the actual particle world accurately.
In that case, I say that experimental practice is sound in the limit. An
interesting reliabilist project is to examine what experimental designs are sound
and complete in the limit relative to given background assumptions about the
domain under study (perhaps using the tools from [Kelly 1995, Ch. 14]).

In this thesis, however, my main interest is in what inferences best serve the ends of
inquiry. I will therefore focus on the task of inferring particle theories from given
evidence, and leave aside questions of how to design experiments for gathering
this evidence; we shall see that even with a naive, idealized view of the evidence
available to theorists, theorizing about particles is a complex and subtle inductive
task.

The simplest possible interpretation of the evidence produced by the
lab is to take it at face value. So I assume that if a lab produces a report of
reactions l, then each reaction r listed in l actually occurred. Without loss of
generality, we may then take sequences of reaction reports l1, l2, ..., ln, ... to be
monotone, in the sense that the reactions from the n-th report are included in
the (n+1)-th report. This means that we may view our evidence simply as a
sequence of reactions. Thus a possible data stream ε is simply an infinite
sequence of reactions (i.e., a member of R^ω). I assume that experimental practice
is sound, and I also assume that experimental practice is complete in the
limit. This means that a data stream ε may be generated in a particle
world w just in case w = range(ε). Figure 8.2 shows a particle world and
one of the data streams that may be generated in it.

A particle theorist conjectures a particle theory in response to evidence
about particle reactions. Formally, a particle theorist δ is a function δ : R* → 2^W.
8.4 What Particles Are There?
Without further background knowledge, it is difficult to find out what elementary
particles exist in our world, even in the limit of inquiry. Let P = {p1, p2, ...}
be a set of particles, and let HP denote the hypothesis that the particles in P
are exactly the existing ones (i.e., HP = {w : particles(w) = P}). I refer
to a proposition of this form as an ontological proposition. Let H be the
collection of all ontological propositions. Note that the alternatives in H are
mutually exclusive, so the problem of identifying the set of particles in our world
is a discovery problem in the sense of Chapter 4. Since there are uncountably
many alternatives in H (namely 2^ω), it follows from Proposition 4.2 that there
is no reliable solution for this discovery problem. Indeed, the demonic argument
from Section 4.4 shows that it is impossible to reliably determine whether
there are infinitely many elementary particles, much less identify exactly what
elementary particles there are. The demonic argument applies when we allow
the possibility that there may be an infinite set P of particles, or any finite
subset of P. Particle physicists seem to assume (or hope) that there are in fact
only finitely many elementary particles in our world. We may formulate this
assumption as the proposition

FIN = {w : particles(w) is finite}.
Given this assumption, it is a straightforward matter to reliably arrive at
the correct ontology of the particle world: Simply conjecture that the particles
observed so far are exactly the ones that exist. I call this inference rule the
Occam method. Thus we have the following result.
Fact 8.1 Let H be the set of alternative theories about what particles exist.

1. Without further background knowledge, it is impossible to reliably determine what elementary particles exist. That is, the discovery problem for H has no reliable solution given the vacuous background knowledge W.
Figure 8.2: The Evidence that may arise in a Particle World
2. If we assume that there are only finitely many particles, it is possible to reliably determine the ontology of the particle world.
The Occam procedure is a natural one for identifying the ontology of the particle world; indeed, the criteria from Chapter 6 single it out as the most efficient. To be precise, the Occam theorist is the only one to minimize convergence time and at the same time avoid unnecessarily many mind changes about each ontological proposition. In the terms of Section 7.3, the Occam rule is the only method that is piecemeal data-minimal and minimaxes retractions with respect to each ontological proposition in H consistent with the assumption FIN that there are only finitely many particles. The reason for this is essentially the same as for the version of Occam's Razor in Section 6.4.
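The Occam method is simple enough to state as code. The following Python sketch is my own illustration, not part of the formal framework: evidence is represented (as an assumption for this sketch) as a list of reactions, each a pair (reagents, products), and the method conjectures that exactly the particles seen so far exist.

```python
# Illustrative sketch of the Occam method (Section 8.4): conjecture that the
# particles observed so far are exactly the ones that exist. The reaction
# representation is an assumption made for this sketch.

def observed_particles(evidence):
    """All particles mentioned in the reactions reported so far."""
    seen = set()
    for reagents, products in evidence:
        seen.update(reagents)
        seen.update(products)
    return seen

def occam_conjecture(evidence):
    """The hypothesis H_P for P = the set of particles observed so far."""
    return observed_particles(evidence)

# After observing p + p -> p + p + pi0, Occam conjectures that p and pi0
# are the only existing particles.
evidence = [(("p", "p"), ("p", "p", "pi0"))]
print(sorted(occam_conjecture(evidence)))  # ['p', 'pi0']
```

Under the assumption FIN, this rule converges: once every particle has appeared in some reported reaction, the conjecture never changes again.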
Proposition 8.2 Let H be the set of alternative theories about what particles exist. Assume that there are only finitely many particles (i.e., assume FIN). Then the Occam theorist is the only theorist that reliably identifies the true ontology of the particle world, is data-minimal and minimaxes retractions with respect to each ontological proposition in H.
Of course it may be the case that physicists accept other background theories that entail the existence of unobserved particles.5 To take into account background knowledge W, modify the Occam rule to conjecture that our world contains exactly those particles whose existence is entailed by the available evidence e and the theory W. The corresponding version of Proposition 8.2 holds: the modified Occam rule is the only efficient rule (in the sense of the Proposition) for identifying the ontology of the particle world.
8.5 Identifying Subnuclear Reactions
Particle physics aims to find the reactions that are possible among a given set of particles. It is trivial to piecemeal identify the physically possible reactions, that is, to determine for each reaction r whether it is possible or not.6 Figure 8.3 illustrates the empirical content of this hypothesis.

This hypothesis has the same topological structure as the Riddle of Induction (see Figure 6.4) and the hypothesis that a given particle exists (see Figure 6.3). As with those discovery problems, the only rule that is data-minimal and minimaxes retractions is the Occam-like or "closest-fit" rule: conjecture that the reactions observed so far are exactly the possible ones.
To reliably settle on a complete theory of particle reactions, however, is a difficult inductive problem. This is true even if we have discovered all particles

5 Famous historical examples of particles whose existence was predicted before their observation are antiparticles, Yukawa's pion, and the Ω⁻ particle.
6 Assuming, as I do throughout this chapter, that all and only the possible reactions are eventually observed.
Figure 8.3: Does reaction r occur?
that there are because a finite number of particles may undergo an infinite number of transitions. Two particles suffice to illustrate the difficulty. Consider the family of reactions p + p → p + p + π⁰ + π⁰ + ..., a collision of two protons producing some number of pions, which physicists observe in the laboratory. Suppose that we allow that such a reaction can in principle produce any finite number of pions (say with the proviso that the kinetic energy of the colliding protons must be sufficiently high7). Then it is impossible to reliably find a complete theory of the reactions among protons and pions. For let a theorist δ make an attempt to do so. Then a demon may arrange for the reactions p + p → p + p + π⁰, p + p → p + p + π⁰ + π⁰, ... to occur until δ theorizes that any number of pions may be produced in a collision of two protons. Suppose that at that time, the production of up to n pions has been observed. The demon adds no more new reactions to the list, until the theorist conjectures that the collision of two protons can produce at most n pions. Then the demon resumes the reports of proton collisions that generate n+1, n+2, ... pions, until the theorist hypothesizes again that any number of pions may result from the collision of two protons, and so on. By the argument from Section 4.4, the theorist fails on the data stream that results from this interplay.
This negative argument for local underdetermination raises the question of
how particle physicists might solve the task of identifying the set of physically
possible reactions. In practice, the answer is to appeal to a tradition according
to which physically possible reactions satisfy a set of conservation principles.
Physicists seem to assume that the language of conservation theories contains
a complete account of the particle reactions that we find in nature. I refer
to this assumption as the conservation hypothesis. We shall see that the
conservation hypothesis is a powerful antidote to local underdetermination.
8.6 Conservation Laws in Particle Physics
Roughly, two classes of conservation laws are thought to govern subatomic reactions: the classical conservation laws, such as conservation of energy, momentum, angular momentum, and charge, and the conservation of so-called quantum properties, namely baryon, electron, muon, and lepton number. An integer value for each quantum property is assigned to every elementary particle. Table 8.2 shows the values of these quantities that scientists have assigned to various particles.8
Physicists report that so far "all events have been found to be consistent with the conservation of" these quantities [Cooper 1992, p. 456]. (Strangeness is a quantum property that is not conserved in all physically possible reactions. Rather, the conservation of strangeness is connected with the time that it takes for a transition to occur; see for example [Feynman 1965, p. 68]. Since in this

7 I neglect absolute bounds on the speed of protons such as the speed of light.
8 The table is from [Omnes 1971].
    Particle  Charge  B.N.  L.N.  E.N.  M.N.  Strangeness  Hypercharge
 1  Ξ⁻         -1      1     0     0     0       -2           -1
 2  Ξ̄⁺          1     -1     0     0     0        2            1
 3  Ξ⁰          0      1     0     0     0       -2           -1
 4  Ξ̄⁰          0     -1     0     0     0        2            1
 5  Σ⁻         -1      1     0     0     0       -1            0
 6  Σ̄⁺          1     -1     0     0     0        1            0
 7  Σ⁰          0      1     0     0     0       -1            0
 8  Σ̄⁰          0     -1     0     0     0        1            0
 9  Σ⁺          1      1     0     0     0       -1            0
10  Σ̄⁻         -1     -1     0     0     0        1            0
11  Λ⁰          0      1     0     0     0       -1            0
12  Λ̄⁰          0     -1     0     0     0        1            0
13  n           0      1     0     0     0        0            1
14  n̄           0     -1     0     0     0        0           -1
15  p           1      1     0     0     0        0            1
16  p̄          -1     -1     0     0     0        0           -1
17  K⁰          0      0     0     0     0        1            1
18  K̄⁰          0      0     0     0     0       -1           -1
19  K⁺          1      0     0     0     0        1            1
20  K⁻         -1      0     0     0     0       -1           -1
21  π⁺          1      0     0     0     0        0            0
22  π⁻         -1      0     0     0     0        0            0
23  π⁰          0      0     0     0     0        0            0
24  γ           0      0     0     0     0        0            0
25  μ⁻         -1      0     1     0     1        0            0
26  μ⁺          1      0    -1     0    -1        0            0
27  e⁻         -1      0     1     1     0        0            0
28  e⁺          1      0    -1    -1     0        0            0
29  ν_e         0      0     1     1     0        0            0
30  ν̄_e         0      0    -1    -1     0        0            0
31  ν_μ         0      0     1     0     1        0            0
32  ν̄_μ         0      0    -1     0    -1        0            0
Table 8.2: Quantum Number Assignments
study I don't consider transition times, I will leave strangeness aside.) Rules that posit the conservation of a quantum property in a particle reaction are called selection rules. For an example of a selection rule, consider the fact that the proton p is stable. None of the classical conservation laws rules out, say, the decays p → e⁺ + γ (a proton decays into a positron and emits light) or p → π⁺ + π⁰ (a proton decays into a positively charged and a neutral pion). As a standard text has it, "along lines made venerable by tradition, we explain this by saying that the proton has a certain inherent property which is conserved" [Omnes 1971, p. 36].
Physicists have come to call the property in question "baryon number", or "heavy particle number". As Table 8.2 shows, the decays mentioned do not
conserve baryon number (B.N.). That is, the baryon number of the decaying
particle, 1, is not the same as the sum of the baryon numbers of the products,
0. A selection rule describes a quantum property and asserts that all physically
possible reactions conserve it; that is, that the sum of the values of the quantum
property among the reagent(s) is the same as the sum of the values of the
quantum property among the product(s).
For decays, conservation of energy entails that the mass of the decaying particle must not be smaller than the sum of the masses of the particles into which it decays. For example, in the decay Ξ⁰ → Σ⁰ + π⁰, the mass of the Ξ⁰ is 1311 MeV (see Table 8.1), whereas the sum of the masses of the products is 1192 MeV + 135 MeV = 1327 MeV. So this decay is not possible. In collisions, however, it is possible that the kinetic energy of the colliding particles is high enough to permit the reaction to take place even when the combined mass of the products is higher than that of the reagents. For example, when two protons collide to produce a pion (in arrow notation, p + p → p + p + π), the energy count balances because at least one of the protons loses momentum in the collision. Notice that the only way in which the reaction p + p → p + p + π can conserve quantum properties is if the pion π carries 0 of all quantum properties. This in turn implies that all reactions of the form p + p → p + p + π + π + ... conserve all quantum properties: if the collision of two protons can produce one pion, it can produce any number, as far as selection rules are concerned.
Mass and charge are assigned to each particle as soon as it is discovered, and thus conservation of energy and charge immediately rule out reactions among identified particles. The inference problem is to assign values for quantum properties other than mass and charge. To simplify my analysis, I will neglect conservation of mass and charge, and consider only how to account for a given set of observed reactions by assigning quantum properties to the particles involved. As [Valdes and Erdmann 1994] showed, we can apply useful concepts from linear algebra to this question if we represent reactions as vectors, in the following manner.
Let p_1, p_2, ..., p_n, ... be a (possibly infinite) set of elementary particles. For a given reaction r, let r(p_i) = a(p_i) - p(p_i) be the number of times particle p_i occurs among the reagents minus the number of times it occurs among the products of the reaction; r(p_i) is called the net occurrence of particle p_i in reaction r. For example, consider the particles μ, e, γ, ν, p, p̄. Let d be the decay μ → e + ν + ν. Then d(μ) = 1, d(e) = -1, d(γ) = 0, d(ν) = -2, d(p) = 0, d(p̄) = 0. For the reaction r: p + p → p + p + p + p̄, we have r(μ) = 0, r(e) = 0, r(γ) = 0, r(ν) = 0, r(p) = -1, r(p̄) = -1. Hence with each reaction r there is associated an infinite-dimensional vector r = (r(p_1), r(p_2), ..., r(p_n), ...), all but finitely many of whose entries are 0. In what follows, I identify reactions with the vectors encoding them, so that the set of possible reactions R is now the set of vectors with integral components. Note that the same vector might result from two different reactions. For example, p + p → p + p + π and p → p + π have the same net occurrences of protons and pions. However, two reactions with the same encoding conserve exactly the same quantum properties, so from the point of view of selection rules, we do not need to distinguish among them.
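The net-occurrence encoding is easy to mechanize. The sketch below is my own illustration; the particle ordering for the vector components is an assumption. It reproduces the two encodings from the example above.

```python
# Net-occurrence encoding of reactions (illustrative sketch).
# The ordering of particles for the vector components is an assumption.
PARTICLES = ["mu", "e", "gamma", "nu", "p", "pbar"]

def encode(reagents, products, particles=PARTICLES):
    """r(p_i): times p_i occurs among the reagents minus times among the products."""
    return tuple(reagents.count(q) - products.count(q) for q in particles)

# d: mu -> e + nu + nu
d = encode(["mu"], ["e", "nu", "nu"])
print(d)  # (1, -1, 0, -2, 0, 0)

# r: p + p -> p + p + p + pbar
r = encode(["p", "p"], ["p", "p", "p", "pbar"])
print(r)  # (0, 0, 0, 0, -1, -1)
```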
In what follows, I mainly consider the problem of inferring selection rules for a finite set of n observed particles. In that case we may take reactions to be n-dimensional vectors with integral components; that is, the space of logically possible reactions is Z^n. I view Z^n as a subset of Q^n, the vector space of n-dimensional vectors with rational components, with the rationals Q as scalars.
As is evident from Table 8.2, we may think of a quantum property q as an infinite-dimensional vector whose entries q(p_i) are integers that represent the value of the property for particle p_i. If a reaction r conserves quantum property q, then summing the value of the quantum property for each particle times its net occurrence must yield 0. That is, q · r = 0, where · denotes the standard dot or inner product of two vectors. A conservation theory Q is a matrix of quantum properties, whose rows are the quantum properties postulated by Q (so Table 8.2 features a 31 × 7 conservation theory). If Q is finite, with columns c_1, c_2, ..., c_n, I say that the particles of Q are the particles p_1, p_2, ..., p_n (denoted by particles(Q)) and that Q is for p_1, p_2, ..., p_n. If Q is infinite, particles(Q) is the set of positive natural numbers.
A reaction r is physically possible according to a conservation theory Q ⟺ q · r = 0 for each row q of Q; that is, r is physically possible if and only if r conserves all the quantum properties listed in Q. Using the definition of matrix multiplication, we have that r is physically possible according to Q ⟺ [Q]r = 0, where 0 is the zero vector whose dimension is the number of quantum properties postulated by Q, that is, the number of rows in Q. For a matrix Q, the set of vectors mapped by Q to 0 is called the kernel of Q and is defined by ker(Q) = {r : [Q]r = 0}. So the physically possible reactions according to Q are the logically possible reactions in the kernel of Q, that is, ker(Q) ∩ R. Thus we may state the conservation hypothesis Conserve, the proposition that some conservation theory describes precisely all physically possible reactions, as follows.

Conserve = {w : w = ker(Q) for some conservation theory Q s.t. particles(Q) = particles(w)}.
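Checking whether a reaction is physically possible according to a conservation theory is just a matrix-vector product. A minimal sketch (my own; the particle ordering is an assumption, and the single baryon-number row takes its values from Table 8.2):

```python
def conserves(q, r):
    """Reaction r conserves quantum property q iff the dot product q . r is 0."""
    return sum(qi * ri for qi, ri in zip(q, r)) == 0

def physically_possible(Q, r):
    """r is physically possible according to theory Q iff [Q]r = 0,
    i.e. iff r lies in ker(Q)."""
    return all(conserves(q, r) for q in Q)

# Particles ordered (p, e+, gamma, pi+, pi0); one quantum property, baryon
# number, with values as in Table 8.2.
baryon = (1, 0, 0, 0, 0)
Q = [baryon]

# p -> e+ + gamma has net occurrences (1, -1, -1, 0, 0): ruled out, since
# baryon number drops from 1 to 0.
print(physically_possible(Q, (1, -1, -1, 0, 0)))  # False
# p + p -> p + p + pi0 has net occurrences (0, 0, 0, 0, -1): allowed.
print(physically_possible(Q, (0, 0, 0, 0, -1)))   # True
```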
Philosophers and physicists such as Poincaré and Feynman have considered the testability of conservation principles [Poincare 1952], [Feynman 1965]. ([Kelly et al. 1997] outlines a learning-theoretic perspective on this debate.)
With regard to Conserve, the conservation hypothesis for quantum properties, it is not possible to reliably decide the truth of the proposition even in the limit. The simplest counterexample is again the situation in which a collision of two protons may produce any number of pions (p + p → p + p + π + π + ...). Suppose we observe the transition p + p → p + p + π, and that the proton p and the pion π are the only elementary particles. Then the conservation hypothesis is true just in case we observe all reactions of the form p + p → p + p + π + π + ..., for any finite number of pions. As we saw in Section 8.5, it is impossible to decide the hypothesis that a collision of protons may produce any number of pions without further background knowledge. Hence it is impossible to reliably settle the truth of the conservation hypothesis, even in the limit of inquiry.
The fact that there is no reliable test of the conservation principles does not mean that the conservation principle is a "convention" about the use or meaning of terms, without empirical content. On the contrary, Conserve is inconsistent with many logically possible particle worlds. For example, if there are n particles p_1, p_2, ..., p_n in the world and we observe n linearly independent reactions among them (viewing the reactions as vectors in the encoding described above), then the conservation hypothesis degenerates into triviality: all reactions are physically possible. This is because a reaction r defines a homogeneous linear equation of the form r(p_1)q(p_1) + r(p_2)q(p_2) + ... + r(p_n)q(p_n) = 0, where the net occurrences of particle p_i in r are the given coefficients and the values of quantum property q for each particle p_i are the unknowns; see Figure 8.4.
If a quantum property q is conserved in a reaction r, then q is a solution to the linear equation that r defines. A basic theorem of linear algebra states that the only solution to n linearly independent homogeneous equations in n unknowns is the trivial solution that gives the value 0 to each unknown. But if all particles have 0 of each quantum property, then all logically possible transitions among them conserve all quantum properties, and the conservation hypothesis implies that all logically possible transitions are possible (which is false as far as we can tell; for example, the decay Σ⁺ → π⁺ + π⁰ + π⁰ has never been observed).
8.7 Inferring Conservation Theories Without Virtual Particles
Assuming the conservation hypothesis makes the assignment of quantum numbers much easier. More precisely, the conservation principle constrains the possible alternative theories of particle reactions. The results of this and the next section show exactly how much this constraint reduces the complexity of the
Figure 8.4: A Set of Reactions, encoded as Vectors with associated Linear Equations.
inductive problem, complexity being measured on the scale of feasibility that I have developed in previous chapters.
Some subtleties arise if we allow conservation theories to posit the existence of "virtual" undetectable particles to balance the count of quantum properties in some transitions. I first consider the simpler case in which conservation theories are restricted to observable particles.
Let a finite set of n observable particles, and some finite set E = {e_1, e_2, ..., e_k} of observed reactions be given. As we saw in Section 8.5, minimaxing retractions and data-minimality piecemeal with respect to each reaction requires choosing the conservation theory with the "closest" fit to the reactions reported in E: choose a conservation theory Q that is consistent with all the observed reactions in E, but minimize the number of unobserved reactions consistent with Q. If we restrict our theory to the n particles discovered so far, there is a unique conservation theory Q that meets these requirements. First, I observe that all reactions that are linear combinations of the observed reactions in E must conserve all quantum properties that the reactions in E conserve. For let r_1 and r_2 be two (vectors encoding) reactions in E. If a conservation theory Q is consistent with r_1 and r_2, we have that [Q]r_1 = [Q]r_2 = 0. By the linearity of matrix multiplication, [Q](a_1 r_1 + a_2 r_2) = [Q](a_1 r_1) + [Q](a_2 r_2) = a_1([Q]r_1) + a_2([Q]r_2) = 0. So all finite linear combinations of observed reactions are consistent with any conservation theory Q that is consistent with the observed reactions. To state this fact formally, let span(E) be the set of linear combinations of reactions in E (i.e., span(E) = {r : r = a_1 e_1 + ... + a_k e_k, for arbitrary rationals a_i}).
Fact 8.3 Let E be a finite set of observed reactions, and let Q be a conservation theory consistent with E (i.e., E ⊆ ker(Q)). Then all linear combinations of reactions in E are consistent with Q (i.e., span(E) ⊆ ker(Q)).
Conversely, for a given list of reactions E featuring n observed particles, there is a conservation theory that allows no reactions outside of the span of E: we may take as the rows of the theory a basis for the space of solutions to the equations defined by E, that is, for the orthogonal complement of span(E), which is itself a linear subspace of Q^n. Since any rational solution can be scaled by a common denominator, it is clear that we can choose quantum properties with integral values for each particle.

Fact 8.4 Let E be a finite set of observed reactions. Then there is a conservation theory Q for the n observed particles in E such that Q is consistent with all and only linear combinations of the reactions in E (i.e., ker(Q) = span(E)). Moreover, the quantum properties in Q can be chosen to have integral values only.
Fact 8.4 yields an algorithm δ for inferring selection rules: for a given list of reactions e, choose as quantum properties an integral basis for the space of solutions to the equations defined by e. This procedure is the only one (up to choice of basis) that minimaxes retractions and is data-minimal with respect to deciding whether a given reaction among the n observed particles is possible or not. Assuming the conservation hypothesis, the procedure is reliable and in fact identifies a complete conservation theory of reactions among the n observed particles with at most n mind changes. For
our algorithm δ changes its mind after a list e∗r only if r is not a linear combination of the reactions in e. But this means that r is linearly independent of the reactions in e. The rank of a set of vectors V is the maximum number of linearly independent vectors in V, written rank(V); similarly, rank(Q) is the maximum number of linearly independent rows in a matrix Q. So if δ changes its mind on e∗r, then rank(range(e) ∪ {r}) = rank(range(e)) + 1. That is, with each mind change the rank of the observed reactions increases. Recall that
we may view the observed reactions as linear equations in n unknowns, with
each quantum property that is consistent with the observed reactions being a
solution to the equations. Then by a standard theorem of linear algebra, we
have that rank(E) + dim(S) = n, where S is the space of solutions to the
equations (reactions) in a set of reactions E, and dim(S) is the dimension of the
solution space, the size of the largest set of linearly independent solutions. If
a conservation theory Q is consistent with the equations (reactions) in E, then
each row in Q is a solution to the equations in E. Thus it follows that
rank(E) + rank(Q) ≤ n

whenever Q is consistent with E, with equality when the rows of Q form a basis for the solution space, as do the conjectures of algorithm δ. So every time that our procedure δ changes its mind on data e∗r, the rank of its new conservation theory δ(e∗r) is one lower than the rank of its previous conservation theory δ(e). Since the highest possible rank of a conservation theory for n particles is n and the lowest possible one is 0, δ changes its mind at most n times. To see that, given the conservation
hypothesis Conserve, δ is a reliable procedure for discovering theories of particle reactions, note that on any data stream ε consistent with Conserve, there is a finite time t after which span(ε|t) = range(ε). This is true again because every time the span of the observed reactions increases, so does its dimensionality, which cannot be greater than dim(Q^n) = n.
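The mind-change bound can be checked mechanically. The sketch below is my own illustration: it computes the rank of the observed reaction vectors by Gaussian elimination over the rationals (Python's fractions module gives exact arithmetic) and counts how often a growing list of reactions raises the rank, which is exactly when the procedure changes its mind.

```python
from fractions import Fraction

def rank(vectors):
    """Rank of a list of rational vectors, via Gaussian elimination over Q."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    n = len(rows[0]) if rows else 0
    rk = 0
    for col in range(n):
        pivot = next((i for i in range(rk, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(len(rows)):
            if i != rk and rows[i][col] != 0:
                f = rows[i][col] / rows[rk][col]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[rk])]
        rk += 1
    return rk

def mind_changes(stream):
    """The theorist revises its theory exactly when a new reaction is linearly
    independent of those seen so far, i.e. when the rank goes up."""
    changes, seen = 0, []
    for r in stream:
        if rank(seen + [r]) > rank(seen):
            changes += 1
        seen.append(r)
    return changes

# Two particles: however the reactions arrive, the rank (and hence the
# number of mind changes) cannot exceed 2.
stream = [(1, -1), (2, -2), (0, -2), (3, -5)]
print(mind_changes(stream))  # 2
```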
These results illustrate the power of the conservation hypothesis. Without this assumption, it is impossible to reliably identify a complete theory of particle reactions among just two observable particles, even with an unlimited number of mind changes (see Section 8.6). With the conservation hypothesis in place, theorists can solve this problem with at most two mind changes. The inference task becomes more complex if we allow the possibility of undetectable particles. The next section shows that the goal of avoiding unnecessarily many retractions (selecting the "closest fit" to the observed reactions) can require theorists to introduce such virtual particles.
8.8 Inferring Conservation Theories With Virtual Particles
Introducing undetectable particles allows the theorist to reinterpret a laboratory report by asserting that the report included only detectable particles, while the reaction that actually took place involved undetectable particles. For example, experimentalists observed the decay of a neutron into a proton and an electron, n → p⁺ + e⁻. This process fails to conserve energy. Physicists hypothesized that there was another particle balancing the energy count, an anti-neutrino, such that the neutron actually decays into three particles: n → p⁺ + e⁻ + ν̄_e. To emphasize the distinction between what was directly observed and what reaction actually occurred, I say that the transition n → p⁺ + e⁻ was reported when the reaction n → p⁺ + e⁻ + ν̄_e took place.
Conservation of momentum implies the presence of some hidden particles as well. For example, Figure 8.5 illustrates the tracks that the decay of the pion into a muon leaves in the bubble chamber [Cooper 1992, p. 445]. Here conservation of momentum suggests the presence of a chargeless particle, say a neutrino. But the evidence does not determine the quantum numbers of this particle; hence a theorist may say that it is a new kind of neutrino, with muon number 1. Indeed, physicists take the decay of the pion to be π⁺ → μ⁺ + ν_μ. This suggests a different model for introducing undetectable particles, where the evidence includes information about whether or not a particle without tracks was present, but the theorist decides whether or not it is a new kind of particle and what its properties are. Indeed, the theorist may make conjectures about exactly how many virtual particles were produced. For example, it is thought that the decay of the muon produces a neutrino and an anti-neutrino: μ⁻ → e⁻ + ν̄_e + ν_μ.
Figure 8.5: Track of the Decay of a Pion into a Muon
I call this model of evidence for particle reactions the constrained virtual particle model. The constrained model seems closer to practice than allowing the theorist to posit undetectable particles without constraints from the evidence. However, the unconstrained model is simpler to analyze, and most results carry over to the constrained scenario. I leave the analysis of inferring selection rules in the constrained scenario open for future work.
Given a set of detectable particles D, the visible part of a reaction r is denoted by r|D and defined by reagents(r|D) = reagents(r) ∩ D and products(r|D) = products(r) ∩ D. In terms of our encoding of reactions by vectors, the visible part of a reaction is its orthogonal projection onto the detectable particles. For a set of reactions R, R|D denotes the visible parts of the reactions in R (i.e., R|D = {r|D : r ∈ R}). (I assume that if physicists can track a particle in a reaction r, then they can find it in another reaction r'.) Let T be a particle theory according to which particles P exist, and let D ⊆ P be the detectable particles in P. Then the empirical content of T is T|D.9
Now we are ready to characterize the difference between conservation theories with and without virtual particles. The difference is that whereas the conservation hypothesis commits theories without virtual particles to allowing all linear combinations of observed reactions, it commits theories with virtual particles only to allowing the linear combinations of observed reactions with integral coefficients. Before I prove this result in general, it is helpful to illustrate the phenomenon with an example.
Suppose that the transitions K → π and K + K → K + K + π + π have been observed. Then if we don't introduce further particles, all transitions involving the K and π particles are possible: let K = p_1 and π = p_2, and encode the first reaction as (1,-1) and the second as (0,-2). Since (1,-1) and (0,-2) are linearly independent, they form a basis for Q² and hence their span is Q². So by Fact 8.3, any conservation theory for K and π that is consistent with the observations is consistent with all transitions among K and π. A direct way to see this is to note that if K + K → K + K + π + π conserves all quantum properties, then π must carry 0 of every property. So K must carry 0 of every property as well, if K → π is possible.
Now let's hypothesize that during the transition K + K → K + K + π + π a neutrino ν was present, and the reaction that actually took place was K + K → K + K + π + π + ν. Then if we introduce a quantum property q and assign q(K) = 1, q(π) = 1, q(ν) = -2, we obtain a non-trivial conservation theory that is consistent with K → π and K + K → K + K + π + π. To see that q is non-trivial, notice that the transition K + K → π is not possible, because the reactions K + K → π + ν + ... + ν do not conserve q, for any number of neutrinos ν. The encoding of K + K → π with only K and π as the particles involved is (2,-1) = 2(1,-1) - (1/2)(0,-2). So (the encoding of) K + K → π is a linear combination of (the encodings of) K → π and K + K → K + K + π + π, but not a linear combination with integral coefficients. On the other hand, the reaction K + K → K + K + π + π + π + π + ν + ν conserves q, and hence the transition K + K → K + K + π + π + π + π may be observed, according to the conservation theory q. The encoding of the transition K + K → K + K + π + π + π + π is (0,-4) = 2(0,-2), twice the encoding of K + K → K + K + π + π. In general, if a conservation theory Q is consistent with a given set of observed transitions, then Q allows linear combinations with integral coefficients of the observed transitions. This is because multiple neutrinos may appear among the products of reactions, but not "fractional" neutrinos: it is the essence of the particle concept that particles are whole, discrete entities.

9 Strictly speaking, at this point we should enrich the notion of a possible particle world to include information about which particles are and are not observable.
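The distinction between the span and the integral span can be checked by solving for the coefficients. The following sketch is my own illustration (it assumes the observed reactions are linearly independent): it expresses a target reaction as a rational combination of observed reactions, and the denominators then reveal whether the combination is integral.

```python
from fractions import Fraction

def solve_coefficients(basis, target):
    """Express target as a rational linear combination of the basis vectors
    (assumed linearly independent); returns the coefficients, or None if
    target lies outside the span."""
    m, n = len(target), len(basis)
    # Augmented matrix: columns are the basis vectors, last column the target.
    A = [[Fraction(basis[j][i]) for j in range(n)] + [Fraction(target[i])]
         for i in range(m)]
    row, pivots = 0, []
    for col in range(n):
        p = next((i for i in range(row, m) if A[i][col] != 0), None)
        if p is None:
            continue
        A[row], A[p] = A[p], A[row]
        A[row] = [x / A[row][col] for x in A[row]]
        for i in range(m):
            if i != row and A[i][col] != 0:
                A[i] = [a - A[i][col] * b for a, b in zip(A[i], A[row])]
        pivots.append(col)
        row += 1
    # A zero row with a non-zero right-hand side means no solution.
    if any(all(A[i][j] == 0 for j in range(n)) and A[i][n] != 0
           for i in range(m)):
        return None
    coeffs = [Fraction(0)] * n
    for r, col in enumerate(pivots):
        coeffs[col] = A[r][n]
    return coeffs

# K -> pi encodes as (1, -1); K + K -> K + K + pi + pi as (0, -2).
basis = [(1, -1), (0, -2)]
# K + K -> pi encodes as (2, -1): in the span, but not the integral span.
coeffs = solve_coefficients(basis, (2, -1))
print(coeffs)                                   # [Fraction(2, 1), Fraction(-1, 2)]
print(all(c.denominator == 1 for c in coeffs))  # False: not an integral combination
```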
The next proposition states formally that under the conservation hypothesis, any linear combination of possible transitions with integral coefficients is also possible. Let V = {v_1, v_2, ..., v_n} be a finite set of vectors, and let the integral span of V be the set of linear combinations of vectors in V with integral coefficients. I write int-span(V) for the integral span of V; so int-span(V) = {v : v = k_1 v_1 + ... + k_n v_n, where each coefficient k_i is an integer}.
Proposition 8.5 Let E be a finite set of observed transitions, and let Q be any conservation theory consistent with E (i.e., E ⊆ ker(Q)|D, where D are the (detectable) particles involved in the transitions of E). Then Q is consistent with all transitions in the integral span of the transitions in E (i.e., int-span(E) ⊆ ker(Q)|D).
So the conservation hypothesis commits a particle theorist to the integral span of observed transitions. Is there a way of ruling out transitions beyond this, as minimaxing retractions requires? It turns out that there is: it is always possible to find a conservation theory whose empirical content is exactly the integral span of the observed transitions. It is easy enough to say how to find such a theory. The task is trivial for the empty set of observed transitions, with empty integral span (for example, we can introduce a quantum property q_p for each particle p, such that only p carries a non-zero value of q_p). Suppose, inductively, that we have a set of transitions E and a conservation theory Q (possibly involving virtual particles) whose empirical content is exactly the integral span of E. If a new transition t is observed that contradicts Q, we introduce a new undetectable particle ν and postulate that ν was among the products of the reaction that gave rise to the observation of t. Then for each quantum property q in Q that t fails to conserve, we assign to ν just enough of q to balance the q count. It is clear that this will produce a new conservation theory consistent with t; it is not obvious, but nonetheless true, that the new theory allows only the transitions in the integral span of E ∪ {t}. (Section 8.10 gives the proof.)
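The inductive step just described (balancing each violated quantum property with a new undetectable particle) can be sketched directly. In the sketch below, which is my own illustration, the new particle is posited once among the products of the offending transition, so its net occurrence there is -1, and giving it the value q · t for each property q balances the count.

```python
def balance_with_virtual_particle(Q, t):
    """Extend each quantum property in Q with a value for one new undetectable
    particle, posited once among the products of transition t (so its net
    occurrence in t is -1). Setting q(new) = q . t makes t conserve q, since
    q . t + q(new) * (-1) = 0. Properties t already conserves get value 0."""
    t_ext = tuple(t) + (-1,)
    new_rows = [tuple(q) + (sum(qi * ti for qi, ti in zip(q, t)),) for q in Q]
    return new_rows, t_ext

# Theory over (K, pi) with a single property q(K) = q(pi) = 1, which the
# observed transition K + K -> pi, encoded (2, -1), fails to conserve (2 vs. 1).
new_Q, t_ext = balance_with_virtual_particle([(1, 1)], (2, -1))
print(new_Q)  # [(1, 1, 1)]: the virtual particle carries q-value 1
print(sum(a * b for a, b in zip(new_Q[0], t_ext)))  # 0: balanced
```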
Proposition 8.6 Let E be a sequence of observed transitions. Then there is a conservation theory Q whose empirical content is exactly the integral span of the reactions in E (i.e., ker(Q)|D ∩ R = int-span(E), where D is the set of (detectable) particles involved in the transitions of E).
I am indebted to Jesse Hughes for suggesting Propositions 8.5 and 8.6 and
outlines of their proofs.
So the difference between what conservation theories with and without virtual
particles can express amounts to the difference between the span and the
integral span of a set of observed transitions. It is worth getting clearer about
this difference. What transitions exactly are in the span but not the integral
span of a set of transitions E? As the examples above suggest, the answer is
that they are the transitions that are fractions, but not integral multiples, of
transitions in the integral span of E.
Say that a vector v is a proper fraction of another vector v' if v = v'/m, for
some integer m such that |m| > 1. Given a finite set V = {v1, v2, ..., vn} of vectors,
I denote the set of proper fractions of V by fractions(V); so fractions(V) =
{v : v is a proper fraction of some vector v' in V}. Then we have:
Fact 8.7 Let E be a finite set of reactions. Then a reaction r is in the span of
E if and only if some multiple r' of r is in the integral span of E. That is,
span(E) ∩ R = fractions(int-span(E)).
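The notion of a proper fraction is easy to operationalize; the following sketch (an illustrative helper of my own, using exact rational arithmetic) tests whether one transition vector is a proper fraction of another.

```python
from fractions import Fraction

def is_proper_fraction(v, w):
    """True iff v = w/m for some integer m with |m| > 1."""
    pairs = [(Fraction(a), Fraction(b)) for a, b in zip(v, w)]
    nonzero = [(a, b) for a, b in pairs if a != 0]
    if not nonzero:
        return False                      # the zero vector is not a proper fraction
    m = nonzero[0][1] / nonzero[0][0]     # candidate multiplier with w = m * v
    if m.denominator != 1 or abs(m) <= 1:
        return False
    return all(b == m * a for a, b in pairs)
```

So (0, -2) is a proper fraction of (0, -6) (take m = 3), but not of (0, -3), since m = 3/2 is not an integer.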
The next question is how difficult it is to identify a complete empirically
adequate conservation theory for transitions among n detectable particles. Although
it might appear as though allowing the particle theorist to introduce
virtual particles should make her task easier, Propositions 8.5 and 8.6 make
clear that she has in fact more empirical possibilities to contend with, since she
can no longer assume that all particles involved in a reaction will be reported.
Weakening her background knowledge in this way has significant methodological
consequences: it is no longer possible to reliably identify a complete empirically
adequate conservation theory in the limit of inquiry, even assuming the conservation
principle. To see this, let δ be the Occam theorist who always chooses
the closest possible fit to the observed reactions (defined in the proof of Proposition
8.6 from Section 8.10). Given the conservation principle, this procedure
settles on an empirically adequate theory if it stops changing its mind. Will
it stop changing its mind? There are two cases in which δ changes its theory
on evidence e*t (the evidence e extended by the new transition t): (1) the
transition t is outside the span of the transitions in e, and (2) t is in the span
of e, but not in the integral span of e. If we have n observable particles, the
first case cannot occur more than n times, for the same reason as in Section 8.7.
(More precisely, if we observe n linearly independent transitions among n
detectable particles, then any conservation theory consistent with the observations
permits all transitions among the detectable particles; see Section 8.9.)
However, if we allow that there may be infinitely many virtual particles,
δ may never stop changing its conservation theories, even if experimentalists
have found all the observable particles. Indeed, using the fact that there is no
greatest prime number, we can give a demonic argument to prove that no theorist
can reliably identify an empirically adequate conservation theory of particle
reactions. The difficulty arises with only two detectable particles. Suppose we
have observed a transition of the form (0,-n) (and nothing else). Let b be any
prime number greater than n, and consider the transition (0,-b) = (b/n)·(0,-n).
So (0,-b) is in the span of (0,-n) but not in the integral span, since b is a prime
number and therefore there is no integer m such that m·(0,-n) = (0,-b). Thus
we can construct the by now familiar demonic argument against any theorist δ
who aspires to find an empirically adequate theory of transitions among our
two particles: present (0,-n), and nothing else, until δ conjectures that the
integral span of (0,-n) contains all possible transitions. Pick the first prime b1
greater than n and present (0,-b1). Present nothing else until δ theorizes that
the integral span of {(0,-n), (0,-b1)} contains all possible transitions. Pick
the next prime b2 and present (0,-b2), which is in the span but not the
integral span of {(0,-n), (0,-b1)}, and so on.
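The arithmetic behind the demon's first move can be checked directly. The helper below is illustrative (not from the text): it decides whether a transition is an integer multiple of a single observed transition, which is membership in the integral span of one generator.

```python
from fractions import Fraction

def integer_multiple(t, e):
    """Return the integer z with t = z * e, or None if no such integer exists."""
    pairs = [(Fraction(a), Fraction(b)) for a, b in zip(e, t)]
    nonzero = [(a, b) for a, b in pairs if a != 0]
    if not nonzero:
        return None
    z = nonzero[0][1] / nonzero[0][0]
    if z.denominator != 1 or any(b != z * a for a, b in pairs):
        return None
    return int(z)

n = 4
for b in (5, 7, 11, 13):                  # primes greater than n
    # (0,-b) is in the span of (0,-n): (0,-b) = (b/n) * (0,-n) ...
    assert Fraction(b, n) * Fraction(-n) == -b
    # ... but not in its integral span, since b/n is not an integer
    assert integer_multiple((0, -b), (0, -n)) is None
```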
However, if we assume that there are only finitely many particles (the proposition
FIN from Section 8.4), the demon can lead the Occam theorist δ to introduce
new "virtual" particles only finitely often. For let τ be the stream of
observed transitions, and let Q be a conservation theory for observable and
hidden particles that is empirically adequate for τ. Call a particle neutral if
it carries 0 of every quantum property in Q. Without loss of generality, we
may suppose that no hidden particle is neutral (for if a hidden particle has 0
of each quantum property, it is superfluous, at least from the point of view of
selection rules). Since no transition in τ features more than two reagents, there
is a maximum number of times that a non-neutral particle appears in a reaction
allowed by Q. (Otherwise the quantum properties of the hidden particles
would outweigh those of the reagents in some reaction.) This in turn implies
that we can choose a finite set of reactions B allowed by Q that determine the
empirical content of Q, that is, range(τ) = int-span(B)|D ∩ R, where D are
the observable particles in τ (i.e., D = particles(range(τ))). So after the finitely
many transitions in B|D are observed along τ, the Occam theorist δ converges
to an empirically adequate conservation theory.

In sum, the conservation principle Conserve, together with the assumption
that there are only finitely many particles (that is, Conserve ∩ FIN), implies that
the Occam theorist δ will converge to an empirically adequate theory of particle
reactions, but we don't know how many virtual particles δ may introduce along
the way.
8.9 Parsimony, Conservatism and the Number
of Quantum Properties
The methods that I've considered so far are conservative in the sense that they
never eliminate quantum properties or virtual particles, but accommodate new
evidence by postulating the presence of new virtual particles. However, it may
well be possible to find more parsimonious theories as new evidence comes in.
For example, a new transition t that is linearly independent of the previous
ones decreases the dimension of the space of conserved quantum properties by
one. For let Q be the previous conservation theory, and let r be a reaction
with visible part t that we posit to account for t. Then there is no set of other
reactions R consistent with Q such that span(R) includes r; otherwise t would
be in the span of the visible parts of R, contrary to the assumption that no
linear combination of transitions allowed by Q yields t.
So when a previously prohibited transition occurs, there may be an opportunity
to eliminate a quantum property, a virtual particle, or both. If there have
been such opportunities for parsimony, physicists have passed them by and chosen
to avoid retractions by extending their theories conservatively. It would
be an interesting project to investigate whether there is a conservation theory
with the same empirical content as the current one but with fewer quantum
properties or fewer virtual particles. Could physicists have avoided introducing
two kinds of neutrinos?
Answering this kind of question would require a computer program to sift
through the masses of data that have been accumulated on particles and particle
reactions. The program should start by considering transitions with no proper
fractions, because Fact 8.7 tells us that we can account for these transitions
without virtual particles. Thus transitions whose encodings feature a 1 or -1
are of particular interest. Fortunately, such transitions abound: in a decay,
conservation of energy entails that the decaying particle does not appear among
the products, so the decaying particle occurs a net total of 1 time in the decay. Thus
the more linearly independent decays we find, the more constraints we obtain
on quantum properties, without having to consider virtual particles.
Recall that there cannot be more linearly independent quantum properties
than |P| - rank(Decays), where |P| is the number of observable particles and
rank(Decays) is the number of linearly independent decays. Without loss of
generality, we may assume that all quantum properties are linearly independent,
because conservation theories whose quantum properties have the same span
permit exactly the same reactions. So |P| - rank(Decays) = |Q|, where |Q| is
the number of quantum properties assigned by a conservation theory consistent
with the observed decays. Introducing virtual particles to increase the number
of total particles allows more quantum properties than |P| - rank(Decays), but
the resulting conservation theory is empirically trivial.
How many linearly independent decays can we expect to find? The next
theorem says: a lot. To be precise, under a mild physical assumption, the number
of linearly independent decays is at least as great as the number of distinct
particles that decay. Physicists refer to particles that decay as unstable. So
we have that rank(Decays) ≥ Unstable, where Unstable is the number of unstable
particles. Using the previous equality |P| - rank(Decays) = |Q|, we have
that |P| - Unstable ≥ |Q|, which means that Stable ≥ |Q|, where Stable is the
number of stable particles. Thus under the physical assumption in question,
a conservation theory cannot introduce more (linearly independent) quantum
properties than there are stable particles: a striking connection between quantum
properties and the number of stable particles in the world. In particular,
this shows that the conservation hypothesis implies, on pain of triviality, that
there are stable particles in the world. For the task of finding minimal conservation
theories, this result means that if we begin by examining decays, we have
to consider no more quantum properties than the number of stable particles,
which is very small compared to the total number of particles.
The physical assumption in question is that we cannot set up a "decay
pump": a situation in which we start with a particle p, and regain p through
a series of decays of p and its products. Under conservation of energy, decay
pumps should not be possible, because each decay leads to a more favorable state
(products with less mass).[10] I refer to the condition that decay pumps do not
occur as the irreversibility condition. To define the irreversibility condition, it
is useful to introduce the notion of a decay tree.
Definition 8.1 Let D be a set of decays, involving a set of particles P. The
decay tree for particle p ∈ P given D is defined as follows.

1. Each node of the tree is labelled with a single particle.

2. The root of the tree is labelled with p.

3. If q is stable in D (i.e., for all decays d ∈ D, q ∉ agents(d)), then all
nodes labelled with q are terminal nodes.

4. If a node v labelled with q is not stable in D, the children of v are labelled
with the products of decays of q in D (i.e., the labels of the children of v
form the set {products(d) : d ∈ D, {q} = agents(d)}).
The irreversibility condition asserts that in a decay tree for a particle p, no
node other than the root is labelled with p.

Definition 8.2 A set of decays D satisfies the irreversibility condition if and only if
for each particle p involved in D, the only node in the decay tree for p given D
labelled with p is the root.
[10] I leave a rigorous proof of this fact for future work.
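Definition 8.2 can be checked mechanically. In this sketch (my own assumed representation: a dict mapping each unstable particle to its list of decays, each decay a list of products), a decay pump shows up as a particle that reappears below the root of its own decay tree.

```python
def violates_irreversibility(decays):
    """True iff some particle appears below the root of its own decay tree,
    i.e. the set of decays contains a 'decay pump'."""
    def reachable(p):
        # all particles appearing strictly below the root of p's decay tree
        seen, frontier = set(), [p]
        while frontier:
            q = frontier.pop()
            for products in decays.get(q, []):
                for prod in products:
                    if prod not in seen:
                        seen.add(prod)
                        frontier.append(prod)
        return seen
    return any(p in reachable(p) for p in decays)
```

A beta-decay-like set such as {'n': [['p', 'e', 'v']]} satisfies irreversibility, while {'a': [['b']], 'b': [['a']]} encodes a decay pump.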
Now we can state the following theorem. Let a set of decays D be given
in which a finite number of unstable particles p1, p2, ..., pn decay. (D may also
involve stable particles.) Suppose that the decays in D satisfy the irreversibility
condition. Then if we choose one decay di for each particle pi, the decays di are
linearly independent of each other.
There is no short informal argument for why this is true in general (Section
8.10 has the formal proof, which involves the pigeonhole principle), but
considering the case of two decays will illustrate the phenomenon. Let p1 and
p2 be two distinct unstable particles with decays d1 and d2, and let us suppose that
there is one other, stable particle p3. The encodings of d1 and d2 are of the
form d1 = (1, -m1, -n1) and d2 = (-m2, 1, -n2), where m1, m2, n1, n2 are
natural numbers. If m2 = 0, then clearly d1 and d2 are linearly independent.
If m2 > 0, then p1 appears among the products of d2; that is, d2 is of the form
p2 → p1 + .... Thus the irreversibility condition implies that p2 does not appear
among the products of d1; that is, m1 = 0. But then d1 = (1, 0, -n1) is clearly
not collinear with d2 = (-m2, 1, -n2).
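The non-collinearity claim in this two-decay case can be verified with exact arithmetic; the helper and the particular particle values below are my illustration (two vectors are collinear exactly when all their 2 x 2 minors vanish).

```python
from itertools import combinations

def collinear(u, v):
    """True iff u and v are scalar multiples of one another (all 2x2 minors vanish)."""
    return all(u[i] * v[j] == u[j] * v[i]
               for i, j in combinations(range(len(u)), 2))

# the two cases from the text, with p3 the stable particle:
d1 = (1, 0, -2)        # p1 -> 2*p3, with m1 = 0 as irreversibility forces
d2 = (-1, 1, -3)       # p2 -> p1 + 3*p3, i.e. m2 = 1 > 0
assert not collinear(d1, d2)
```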
Theorem 8.8 Let D = {d1, d2, ..., dn} be a finite set of decays of distinct
particles p1, p2, ..., pn (i.e., if i ≠ j, then pi ≠ pj). If D satisfies the irreversibility
condition, then the decays in D are linearly independent, that is,
rank({d1, d2, ..., dn}) = n.
The good news in Theorem 8.8 is that the number of quantum properties
is bounded by the number of stable particles, which shows that conservation
theories are a succinct way of organizing the mass of data about particles and
their reactions. But it is surprising that nature should be so cooperative as to
fit a formalism that we happen to find congenial,[11] namely that of selection
rules.[12]

One would like an explanation for why the number of linearly independent
decays that occur in nature, and linearly independent reactions in general, grows
immensely with each unstable particle, but then stops increasing to leave room
for the relatively small number of stable particles. Indeed, we may have some
doubts as to whether the conservation hypothesis is in fact true: perhaps among
the mass of reactions, there is one that violates one of the current selection rules
[11] Or, as Feynman put it, selection rules are easy to "guess" [Feynman 1965, p. 67]: "The
reason why we make these tables [of quantum properties] is that we are trying to guess at the
laws of nuclear interaction, and this is one of the quick ways of guessing at nature."

[12] One would have more confidence in a given quantum property or a virtual particle if we
could relate it to other physical phenomena. Indeed, physicists try to do just that. Thus
Feynman points out that physicists have made (unsuccessful) attempts to interpret baryons
as the source of a field, in analogy with charge. Similarly, after introducing a virtual particle,
they try to relate it to other phenomena, and of course to detect it. "You might say that the
only reason for the anti-neutrino is to make the conservation of energy right. But it makes
a lot of other things right, like the conservation of momentum and other conservation laws,
and very recently it has been directly demonstrated that such neutrinos do indeed exist."
[Feynman 1965, p. 76]
but hasn't been noticed by physicists, or more likely, just hasn't been observed
yet. A way of rigorously testing the conservation hypothesis would be to choose
quantum properties that account for the known decays of unstable particles
(a fairly small number, by Theorem 8.8), then systematically generate reactions
that would violate them (that is, reactions that are linearly independent of the
decays already accounted for) and check whether these have been observed or
could be produced in the laboratory.
8.10 Proofs
Fact 8.1 Let H be the set of alternative theories about what particles exist.

1. Without further background knowledge, it is impossible to reliably determine
what elementary particles exist. That is, the discovery problem for
H has no reliable solution given the vacuous background knowledge W.

2. If we assume that there are only finitely many particles, it is possible to
reliably determine the ontology of the particle world.
Proof. Part 1: Let δ be a particle theorist who aspires to finding the set of
actually existing elementary particles. An inductive kind of Cartesian demon
may prevent δ from achieving its goal as follows. Let H be the set of particles
that δ predicts exist before obtaining any evidence (i.e., H = particles(δ(∅))).
There are two cases: either H is finite or infinite. If the theorist initially predicts
that there are only finitely many elementary particles, then the demon presents a
reaction with a new particle not in H. The demon presents no further reactions
until δ changes its mind from its initial guess δ(∅) to a different theory. Secondly,
if the theorist initially predicts that there are infinitely many particles, then the
demon arranges a set E of reactions that feature only finitely many particles,
and keeps presenting E until δ changes its mind. If δ changes its mind from δ(∅)
to another theory T, the demon chooses the next set of observed reactions in
the same way, depending on whether particles(T) is finite or infinite. Let τ be
the sequence of reactions that results from this interplay between the theorist
δ and the demon. If δ aspires to eventually determining what particles exist,
δ's theories must eventually entail a proposition H about the actually existing
particles along τ. If infinitely many elementary particles exist according to H,
the demon allows only finitely many particles to occur along τ, and so δ stabilizes
to a false theory. If H asserts that only some finite number n of elementary
particles exist, the number of particles that are involved in the reactions in τ,
i.e. |particles(range(τ))|, is in fact n + 1, so δ is again mistaken about
what elementary particles exist.

Part 2: Simply conjecture that the existing particles are exactly the ones observed so
far. If there are only finitely many particles, our assumptions about the evidence
entail that eventually they will all be observed. Then this procedure stabilizes to
the correct ontology. □
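The Part 2 procedure is a one-liner; the toy run below (with hypothetical particle names) shows the conjecture stabilizing once every particle has appeared in some reaction.

```python
def occam_ontology(evidence):
    """Conjecture that exactly the particles observed so far exist.
    evidence: a list of reactions, each given as the set of particles it involves."""
    return set().union(*evidence) if evidence else set()

# a stream in which all three particles have appeared by the second reaction:
stream = [{'e-', 'p'}, {'n', 'p', 'e-'}, {'n', 'p'}, {'e-'}]
conjectures = [occam_ontology(stream[:k]) for k in range(1, len(stream) + 1)]
assert conjectures[1:] == [{'e-', 'p', 'n'}] * 3   # stabilized after step 2
```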
Proposition 8.2 Let H be the set of alternative theories about what particles
exist. Assume that there are only finitely many particles (i.e., assume FIN).
Then the Occam theorist is the only theorist that reliably identifies the true
ontology of the particle world, is data-minimal and minimaxes retractions with
respect to each ontological proposition in H.
Proof. If a theorist satisfies the Occam rule, that is, if δ(e) = particles(range(e)),
she settles the truth about each ontological proposition H with at most two mind
changes: her conjectures entail that H is false until the existence of all and only
the particles in H is entailed by the evidence report. If particles(range(e)) = H,
the theorist might change her mind once so that H is entailed by the theorist's
conjecture. If then the existence of another particle ruled out by H is entailed
by the evidence and the theorist's background assumptions, H is conclusively
refuted, and the Occam theorist rejects H forever after another mind change.

Next suppose that δ is reliable, data-minimal and minimaxes mind changes
with respect to what particles exist. Since δ is data-minimal, δ must take its conjectures
seriously by Proposition 6.1. Hence particles(δ(e)) ⊇ particles(range(e)).
Suppose that on some evidence report e, δ(e) predicts that some particle p exists
whose existence is not entailed by the evidence (and the applicable background
assumptions). Then the demon repeats the reactions in e, and provides no new
evidence to the theorist. Eventually the theorist must change her mind and conjecture
that p does not exist. At that point the demon allows p to be discovered,
and presents a set of reactions E such that particles(E) = particles(δ(e)). The
demon repeats E until δ predicts that the actually existing particles are exactly
those featured in E. At that point, δ has twice predicted that the actually
existing particles are exactly those in particles(E) = particles(δ(e)), and has thus
changed its mind twice about this ontological proposition. Now the demon can simply
let one more particle be discovered (since the theorist's background knowledge allows
any finite number of particles), so that δ has to change its mind a third time
about particles(E), to reject that hypothesis. Therefore δ does not minimax
mind changes with respect to what elementary particles exist. □
Fact 8.3 Let E be a finite set of observed reactions, and let Q be a conservation
theory consistent with E (i.e., E ⊆ ker(Q)). Then all linear combinations of
reactions in E are consistent with Q (i.e., span(E) ⊆ ker(Q)).
Proof. Let E = {e1, e2, ..., en}, and let r ∈ span(E). Then r = a1e1 + ... + anen
for some rationals ai. Since E ⊆ ker(Q), we have that [Q]ei = 0 for each ei, and
so, by the distributivity of matrix multiplication, [Q]r = [Q](a1e1 + ... + anen) =
a1[Q]e1 + ... + an[Q]en = 0. Hence r ∈ ker(Q). □
Fact 8.4 Let E be a finite set of observed reactions. Then there is a conservation
theory Q for the n observed particles in E such that Q is consistent with
all and only linear combinations of the reactions in E (i.e., ker(Q) = span(E)).
Moreover, the quantum properties in Q can be chosen to have integral values
only.
Proof. Restrict the logically possible reactions to the particles occurring in
E, so that we are considering the vector space Q^n. The orthogonal complement
of a set of vectors V is denoted by V⊥ and defined as V⊥ = {v' : v · v' =
0 for all v ∈ V}. That is, V⊥ is the set of all vectors that are orthogonal
to every vector in V. It is a standard fact that if V is a linear space, then
(V⊥)⊥ = V. Since span(E) is a subspace of Q^n, it follows from a familiar
theorem of linear algebra that Q^n is the direct sum of span(E) and [span(E)]⊥;
that is, every vector r ∈ Q^n can be uniquely written as rE ⊕ rE⊥, where ⊕
denotes vector addition and rE ∈ span(E), rE⊥ ∈ [span(E)]⊥. Now choose a
basis {b1, b2, ..., bm} for span(E), where m ≤ n. By another standard fact,
the orthogonal complement of a set is a subspace, so we may choose a basis
{qm+1, ..., qn} for [span(E)]⊥ such that {b1, b2, ..., bm} ∪ {qm+1, ..., qn} is a
basis for Q^n. I observe that (*) V⊥ = [span(V)]⊥. It is immediate that
[span(V)]⊥ ⊆ V⊥, because every vector that is orthogonal to all vectors in the
span of V is orthogonal to those in V. Conversely, let v⊥ be a vector in V⊥,
and let v be a vector in span(V). Then v = a1v1 + ... + akvk for rationals ai
and vectors vi ∈ V. So v⊥ · v = v⊥ · (a1v1 + ... + akvk) = a1(v⊥ · v1) + ... +
ak(v⊥ · vk) = 0, since v⊥ is orthogonal to each vector vi.

Let Q be the matrix whose rows are qm+1, ..., qn. The kernel of Q is the set of
all vectors that are orthogonal to each row in Q; so ker(Q) = {qm+1, ..., qn}⊥.
By (*), {qm+1, ..., qn}⊥ = [span({qm+1, ..., qn})]⊥. Since {qm+1, ..., qn} is
a basis for [span(E)]⊥, span({qm+1, ..., qn}) = [span(E)]⊥. Hence ker(Q) =
([span(E)]⊥)⊥ = span(E), as required. □
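The construction in this proof amounts to computing a basis of an orthogonal complement, which exact rational Gaussian elimination delivers. The helper below is my sketch, not the author's code; applying it twice recovers span(E) as the kernel of the resulting theory Q.

```python
from fractions import Fraction

def nullspace_basis(rows, n):
    """Basis of {v in Q^n : r . v = 0 for every r in rows}, computed by
    exact Gaussian elimination over the rationals."""
    M = [[Fraction(x) for x in row] for row in rows]
    pivots, r = [], 0
    for c in range(n):
        piv = next((i for i in range(r, len(M)) if M[i][c] != 0), None)
        if piv is None:
            continue
        M[r], M[piv] = M[piv], M[r]
        M[r] = [x / M[r][c] for x in M[r]]           # normalize the pivot row
        for i in range(len(M)):
            if i != r and M[i][c] != 0:              # eliminate the column elsewhere
                M[i] = [a - M[i][c] * b for a, b in zip(M[i], M[r])]
        pivots.append(c)
        r += 1
    basis = []
    for f in (c for c in range(n) if c not in pivots):
        v = [Fraction(0)] * n                        # one basis vector per free column
        v[f] = Fraction(1)
        for i, c in enumerate(pivots):
            v[c] = -M[i][f]
        basis.append(v)
    return basis

# E = {p1 -> p2} among three particles, encoded as (1, -1, 0):
E = [(1, -1, 0)]
Q = nullspace_basis(E, 3)          # quantum properties: a basis of [span(E)]-perp
kernel = nullspace_basis(Q, 3)     # ker(Q) recovers span(E)
assert len(kernel) == 1            # one-dimensional, as span(E) is
```

Clearing denominators in each row of Q then yields quantum properties with integral values, as the Fact requires.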
Proposition 8.5 Let E be a finite set of observed transitions, and let Q be any
conservation theory consistent with E (i.e., E ⊆ ker(Q)|D, where D are the
(detectable) particles involved in the transitions of E). Then Q is consistent with
all transitions in the integral span of the transitions in E (i.e., int-span(E) ⊆
ker(Q)|D).
Proof. Let E = {e1, e2, ..., en}, and let P be the set of particles in Q. For
each ei, choose a reaction ri involving only the particles in P such that the
visible part of ri is ei (i.e. ri|D = ei) and ri is physically possible according to
Q (that is, ri ∈ ker(Q)). Let t be a transition in the integral span of E, say
t = z1e1 + ... + znen for integers zi. Then [Q](z1r1 + ... + znrn) = z1[Q]r1 +
... + zn[Q]rn = 0, so r = z1r1 + ... + znrn is physically possible according to
Q. But r|D = z1r1|D + ... + znrn|D = z1e1 + ... + znen = t. So the empirical
content of Q includes the observation of t, as required. □
Proposition 8.6 Let E be a finite set of observed transitions. Then there is
a conservation theory Q whose empirical content is exactly the integral span of
the reactions in E (i.e., ker(Q)|D ∩ R = int-span(E), where D is the set of
(detectable) particles involved in the transitions of E).
Proof. Let E = {e1, e2, ..., en}. I show how to find Q inductively. Let Q0 be
the |D| × |D| identity matrix. Clearly ker(Q0) = {0}. Inductively, suppose that
ker(Qk)|D ∩ R = int-span({e1, e2, ..., ek-1}). Reinterpret the transition ek =
agents(ek) → products(ek) by adding a hidden particle h among the products, to
arrive at the "actual" reaction ek^h = agents(ek) → products(ek) + h. Formally,
particles(ek^h) = particles(ek) ∪ {h}, ek^h|particles(ek) = ek, and ek^h(h) = -1.
Note that ek^h|D = ek, since all particles reported in ek are detectable. Extend
the matrix of quantum properties Qk with |P| columns to a matrix Qk+1 with
|P| + 1 columns by modifying each quantum property (row) q in Qk as follows:
q'|P = q|P, and q'(h) = q · ek. That is, consider each quantum property q,
and assign to h the value q'(h) that is required to restore the balance of q in
ek, if there is an imbalance. Geometrically, the introduction of the new particle
adds a dimension to the vector space. The component of each quantum property
in the new dimension is chosen just so that it balances the count of ek for that
quantum property.

I now argue that the empirical content of Qk+1 is exactly the integral span
of {e1, e2, ..., ek}, that is, (ker(Qk+1) ∩ R)|D = int-span({e1, e2, ..., ek}).

(⊇) By hypothesis, choose for each i ≤ k-1 a reaction ri among only the
particles in P that is physically possible according to Qk and whose visible part
is ei. Since ri(h) = 0, for each extended quantum property q' we have that
q' · ri = q · ri + q'(h)·ri(h) = q · ri = 0, by the assumption that ri is physically
possible according to Qk. In other words, ri conserves all of the extended quantum
properties in Qk+1, because ri conserves the unextended quantum properties
in Qk and does not feature the hidden particle h. Therefore ri is physically possible
according to Qk+1. Moreover, q · ek^h = q · ek for each unextended quantum
property q, because q(h) = 0. Hence for each extended quantum property q',
we have that q' · ek^h = q · ek + ek^h(h)·q'(h) = q · ek - q · ek = 0. So ek^h is
physically possible according to Qk+1, and the visible component of ek^h is ek.
It follows by Proposition 8.5 that the physically possible transitions according
to Qk+1 include all transitions in the integral span of {e1, e2, ..., ek}.

(⊆) Note that Qk+1 contains as many linearly independent quantum properties
as Qk, that is, rank(Qk+1) = rank(Qk). The dimension of the kernel
of Qk is called the nullity of Qk, written null(Qk). By a standard theorem of
linear algebra, the nullity of Qk is the number of particles in P minus the rank
of Qk, and the nullity of Qk+1 is |P| + 1 - rank(Qk+1) = null(Qk) + 1. Let
m = null(Qk), and let {b1, b2, ..., bm} be a basis for ker(Qk) comprising
logically possible reactions involving only the particles in P. Then ek^h is
independent of {b1, b2, ..., bm}, since ek^h(h) = -1 and the reactions encoded
by b1, b2, ..., bm do not involve h. So {b1, b2, ..., bm, ek^h} is a
basis for ker(Qk+1).

Now let r ∈ ker(Qk+1) encode a physically possible reaction according to
Qk+1 among the particles in P ∪ {h}. Then r = a1b1 + ... + ambm + a·ek^h for
rationals a, a1, ..., am. Since none of the reactions encoded by the bi involve h,
and ek^h(h) = -1, a must be an integer if r is a possible reaction. Let r_old =
a1b1 + ... + ambm and r_new = a·ek^h. Then r_old is a possible reaction according
to Qk, that is, r_old ∈ ker(Qk) ∩ R(P), and so by the inductive hypothesis, the
visible component of r_old is in the integral span of {e1, e2, ..., ek-1}; let
r_old|D = z1e1 + ... + zk-1ek-1 for integers zi. Thus the visible component of r
is in the integral span of {e1, e2, ..., ek}, since r|D = r_old|D + r_new|D =
z1e1 + ... + zk-1ek-1 + a·ek, where a is an integer. This shows that the visible
component of every reaction that is physically possible according to Qk+1 lies in
the integral span of {e1, e2, ..., ek}, which completes the inductive step.
Continuing this process until k = n, we obtain the desired conservation
theory Q. □
Fact 8.7 Let E be a finite set of reactions. Then a reaction r is in the span of
E if and only if some multiple r' of r is in the integral span of E. That is,
span(E) ∩ R = fractions(int-span(E)).
Proof. (⇐) Immediate.

(⇒) Let E = {e1, e2, ..., en}, and let r = a1e1 + ... + anen encode a reaction
in the span of E involving the particles reported in E. Since r encodes a
reaction, the ai must be rationals; let ai = pi/qi for each i, and let m = q1q2···qn.
Then mr = Σi m(pi/qi)ei = Σi (m/qi)pi ei, and each coefficient (m/qi)pi is an
integer, so mr is in the integral span of E. Hence r is a fraction of mr, which is
in the integral span of E. □
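The denominator-clearing step in this proof is ordinary arithmetic; a quick check under illustrative coefficients:

```python
from fractions import Fraction
from math import prod

# if r = sum of a_i * e_i with rational coefficients a_i = p_i/q_i, then
# multiplying r by the product of the q_i clears every denominator:
a = [Fraction(1, 2), Fraction(2, 3), Fraction(-3, 5)]
m = prod(f.denominator for f in a)            # m = 2 * 3 * 5 = 30
assert all((m * f).denominator == 1 for f in a)
```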
Theorem 8.8 Let D = {d1, d2, ..., dn} be a finite set of decays of distinct
particles p1, p2, ..., pn (i.e., if i ≠ j, then pi ≠ pj). If D satisfies the irreversibility
condition, then the decays in D are linearly independent, that is,
rank({d1, d2, ..., dn}) = n.
Proof. Suppose for reductio that D is not linearly independent, and let
a1d1 + a2d2 + ... + andn = 0 be a non-trivial linear combination. I show that
(*) for every particle pi there is a particle pk such that pi is contained in the
decay tree for pk given D. For consider the decay di of particle pi. If di is of
the form pi → pi + ..., then pi appears in the decay tree for pi. Otherwise di is
of the form pi → ..., where pi does not appear on the right-hand side among the
decay products. Assume without loss of generality that ai > 0. Since
a1d1 + ... + andn = 0, this implies that there is some k ≠ i such that either dk is
of the form pi → ... and ak < 0, or dk is of the form pk → pi + ... and ak > 0.
The first case is impossible, because we assume that pi only decays in decay di.
So the second case must obtain, and hence pi is in the decay tree of pk given
D. This establishes (*). Now by (*) there exists a sequence of particles q0, q1,
q2, ... such that q0 = p1 and, for each i, qi+1 is a particle in P with qi in the
decay tree of qi+1 given D. Since the descendant relation in a tree is transitive,
we have that (**) for all i and all j > i, qi is in the decay tree of qj given D.
Furthermore, by the pigeonhole principle, at least one of the n particles in P
must appear twice in the sequence q0, ..., qn of length n + 1. By (**), this
implies that some particle p is in the decay tree of p given D. So if D is not
linearly independent, D violates the irreversibility condition. □
Chapter 9
Admissibility in Games
9.1 Outline
We have seen several times that admissibility (with respect to some epistemic
value) leads to short-run constraints on scientific methods. The theorems that
characterize admissible methods (3.1, 3.2, 4.1, 6.1, 6.2, 6.7) have a common structure:
admissible methods are those that satisfy certain constraints at each stage
of inquiry. In this chapter, I show that this characteristic of admissible methods
is not limited to inductive problems, but stems from a more general fact of game
theory: in any game of perfect information (and inductive problems such as discovery
and theory-discovery problems are games of perfect information between the
scientist and "nature"), a strategy for the game is admissible if and only if the
strategy is admissible at each stage of the game, or at each "subgame" as game
theorists would say.
Indeed, this phenomenon is much more general: it applies even in games of
imperfect information, in which players may not always know everything about
the past history of the game when they are choosing their move. Again, admissible
strategies in games of imperfect information are exactly those that are
admissible at each stage of the game.[1]

[1] With a proviso: this claim may fail in games of imperfect recall, in which players forget
what they once knew or did; see Definition 9.4. In micro-economics, games with imperfect
recall are of marginal interest. Throughout this thesis I have modelled inquiry as a game
of perfect recall. On the other hand, retracting background knowledge leads to a loss of
recall, because the inquirer no longer knows what she once knew. Hence the results from
this chapter do not apply to a methodological setting that allows retractions of background
knowledge (such as the one in [Levi 1980]).

Most games of interest to economists,
political scientists and political philosophers for modeling social and economic
interactions are games of imperfect information. I apply my insights into the
structure of admissible strategies to derive predictions about how particular
games will be played, based on the assumption that players will not follow
dominated strategies. It is natural to iterate this idea: after ruling out
dominated strategies once in their deliberations before playing the game, I imagine
that players eliminate strategies that are dominated then, and so on. I show that
the resulting deliberation procedure has several attractive formal properties that
game theorists have long sought in a "solution concept" for making predictions
in a game [Kohlberg and Mertens 1986]:
• The iterated elimination procedure gives the same answers when applied to the set of strategies in a game as it does when applied to the game tree that explicitly models the dynamics of the game. This means that the procedure's results do not depend on how exactly we represent the dynamics of a particular strategic situation.
• The elimination procedure generalizes the standard backward induction solution for games of perfect information; it sometimes makes stronger predictions than backward induction, but never weaker ones.
• In games of imperfect information, iterated dominance underwrites typical forward induction arguments.
Although the deliberation procedure based on iterated admissibility is natural and its results are intuitively attractive, it remains an open problem to justify the assumption that players would reason about a game in this way, or to give a decision-theoretic foundation for my solution concept (perhaps based on common knowledge of "rationality", for a suitable definition of rationality, rather than on deliberational dynamics). [Bicchieri and Schulte 1997] discusses some of the issues involved. The versions of the results from this chapter for finite games were previously published in [Bicchieri and Schulte 1997].
9.2 Preliminaries
9.2.1 Extensive and Strategic Form Games
I introduce some basic notions for describing deterministic games in extensive form, which originate with [Von Neumann and Morgenstern 1947] and [Kuhn 1953]. The reader who is unfamiliar with game trees may find it useful to review the game trees displayed in Figures 9.1–9.7 while going through the following definitions.
An extensive form game for players N = {1, 2, ..., n} is described by a game tree T with nodes V and root r, payoff functions u_i for each player i, and information sets I_i for each player i. The information sets partition the nodes of the tree, and if information set I_i belongs to player i, then player i moves at all nodes in I_i. A maximal path in a game tree T is called a history of T. The payoff function u_i assigns a payoff to player i for each history in T. For each node x in T, I(x) is the information set containing x. A (pure) strategy s_i for player i in a game tree T assigns a unique action, called a move, to each information set I_i of player i in T. I denote the set of i's pure strategies in T by S_i(T). A strategy profile in T is a vector (s_1, s_2, ..., s_n) consisting of one strategy for each player i. I denote the set of pure strategy profiles in T by S(T); that is, S(T) = ×_{i∈N} S_i(T). I use 's' for a generic strategy profile. It is useful to write s_{-i} for a vector of length n-1 consisting of strategy choices by player i's opponents. I write S_{-i}(T) for the set of strategy profiles of i's opponents; that is, S_{-i}(T) = ×_{j∈N\{i}} S_j(T).
In the games that I shall consider, the root is the only member of its information set (i.e., I(r) = {r}), so a strategy profile s in T determines a unique history ⟨r, x_1, x_2, ..., x_n, ...⟩. I refer to this history as the play sequence resulting from s, and denote it by play(s). When a pure strategy profile s in T is played, each player receives as payoff the payoff from the play sequence resulting from s. With some abuse of notation, I use u_i to denote both a function from strategy profiles to payoffs for player i, as well as a function from histories to a payoff for player i, and define u_i(s) = u_i(play(s)). For a finite game tree T, the height of a node x in T is denoted by h(x), and defined recursively by h(x) = 0 if x is a terminal node in T, and h(x) = 1 + max{h(y) : y is a successor of x in T} otherwise.
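For readers who prefer code, the height recursion just given can be sketched as follows; the dictionary encoding of a tree is mine, purely for illustration.

```python
# A minimal sketch of the height function h: h(x) = 0 at a terminal node,
# and h(x) = 1 + max{h(y) : y a successor of x in T} otherwise.
def height(tree, x):
    """tree maps each node to the list of its successors; leaves have none."""
    successors = tree.get(x, [])
    if not successors:                 # terminal node
        return 0
    return 1 + max(height(tree, y) for y in successors)

# A small tree: root r with a leaf child and a chain of two further nodes.
tree = {'r': ['leaf1', 'v'], 'v': ['w'], 'w': ['leaf2']}
print(height(tree, 'r'))  # → 3
```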
A game G in strategic form is a triple ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩, where N is the set of players and, for each player i ∈ N, S_i is the set of (pure) strategies available to i, and u_i is player i's payoff function. Given a strategy profile s = (s_1, ..., s_n), u_i(s) denotes the payoff to player i when players follow the strategies (s_1, ..., s_n). I denote the strategic form of an extensive form game T by the collection S(T) of strategies in T, with payoffs defined as in T.
9.2.2 Restricted Game Trees
I shall describe procedures for deliberation that eliminate possible plays of a
game before it is played. To do so, I introduce some notation for describing the
result of eliminating possibilities in a game. For games in extensive form, I refer
to the result of eliminating possibilities as a restricted game tree.
Definition 9.1 Restricted Game Trees

• Let T be a finite game tree for players N = {1, 2, ..., n}.

• T|V is the restriction of T to V, where V is a subset of the nodes in T. All information sets in T|V are subsets of information sets in T.

• T_x is the game tree starting at node x (i.e., T_x is the restriction of T to x and its successors). If I(x) = {x}, then T_x is called a subgame.

• If s_i is a strategy for T and T' is a restriction of T, s_i|T' is the strategy that assigns to each information set in T' the same choice as in T. Formally, s_i|T'(I'_i) = s_i(I_i), where I_i is the (unique) information set in T that contains all the nodes in I'_i. Note that s_i|T' is not necessarily a strategy in T', for the move assigned by s_i at an information set I_i in T may not be possible in T'.

• If s is a strategy profile in T and T' is a restriction of T, s|T' is the strategy vector consisting of s[i]|T' for each player i.

• Let S ⊆ S(T) be a collection of strategy profiles in a game tree T with players N. Then a node x is consistent with S if and only if there is a strategy profile s in S such that x is part of the play sequence resulting from s, i.e., x ∈ range(play(s)). The restriction of T to the nodes consistent with S is denoted by T|S. I observe that T|S(T) = T.

• A node x is consistent with a strategy s_i by player i in T just in case there is a strategy profile s_{-i} in T such that x appears in the play sequence play(s_i, s_{-i}).
9.3 Admissibility in Games
Consider a set of strategy profiles S = S_1 × S_2 × ... × S_n, and two strategies s_i, s'_i ∈ S_i of player i. I say that a strategy s_i for player i is consistent with S just in case there is a strategy profile s in S such that s[i] = s_i.

Player i's strategy s_i is weakly dominated by her strategy s'_i given S just in case:

1. for all (n-1)-tuples s_{-i} chosen by i's opponents that are consistent with S, u_i(s_i, s_{-i}) ≤ u_i(s'_i, s_{-i}), and

2. for at least one (n-1)-tuple s_{-i} consistent with S, u_i(s_i, s_{-i}) < u_i(s'_i, s_{-i}).

A strategy s_i is weakly dominated given S just in case there is a strategy s'_i consistent with S such that s'_i weakly dominates s_i given S. A strategy s_i is admissible given S just in case s_i is not weakly dominated given S.
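As an illustrative sketch (the encoding and function names are mine, not part of the thesis formalism), the two clauses of this definition translate directly into code. The example payoffs are those of player 2 in the strategic form of the Figure 9.1 game discussed below: her strategy a guarantees her the payoff 2 whenever her information set is reached, while b yields her 0 if player 1 follows with R2.

```python
# A minimal sketch of the weak dominance test: s'_i weakly dominates s_i
# given S iff s'_i never does worse against any opponent profile consistent
# with S, and does strictly better against at least one.
def weakly_dominates(u_i, s_prime, s, opponents):
    """u_i(own_strategy, opponent_profile) -> payoff for player i."""
    no_worse = all(u_i(s_prime, t) >= u_i(s, t) for t in opponents)
    better_once = any(u_i(s_prime, t) > u_i(s, t) for t in opponents)
    return no_worse and better_once

# Player 2's payoffs in the strategic form of the Figure 9.1 game: player 1's
# strategies fix a choice at both of his nodes; R1 ends the game with payoff
# (2,0), so player 2 then receives 0 whatever she planned.
u2 = {('a', ('L1', 'L2')): 2, ('a', ('L1', 'R2')): 2,
      ('b', ('L1', 'L2')): 2, ('b', ('L1', 'R2')): 0,
      ('a', ('R1', 'L2')): 0, ('a', ('R1', 'R2')): 0,
      ('b', ('R1', 'L2')): 0, ('b', ('R1', 'R2')): 0}
S1 = [('L1', 'L2'), ('L1', 'R2'), ('R1', 'L2'), ('R1', 'R2')]

print(weakly_dominates(lambda s, t: u2[(s, t)], 'a', 'b', S1))  # → True
```

So b is weakly dominated given the full strategy set, while a is not.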
My goal in this section is to show that a strategy is admissible in the strategic form of an extensive form game just in case the strategy is admissible at each stage of the game, that is, at each information set. The next definition formulates the notion of a strategy being admissible at an information set. Informally, a strategy s_i is weakly dominated by another strategy s'_i at an information set I_i in a game tree T if s'_i never yields less to i at I_i than s_i does, and sometimes yields more. For example, in the game of Figure 9.1, a weakly dominates b for player 2, because a yields player 2 the payoff 2 for sure, while b may yield only 0 if player 1 plays R2.² And in the game of Figure 9.6, b and c weakly dominate a at 2's information set.

²Here and elsewhere, the payoff at a terminal node is given as a pair (x, y), where x is the payoff for player 1 and y is the payoff for player 2.
[Game tree omitted in this transcription. As described in the text: player 1 chooses L1 or R1 at the root; R1 ends the game with payoff (2,0); after L1, player 2 chooses a, which ends the game with payoff (1,2), or b, after which player 1 chooses between L2, with payoff (3,2), and R2, with payoff (1,0).]

Figure 9.1: Admissibility in a Game of Perfect Information. The label inside a node indicates which player is choosing at that node.
Definition 9.2 Admissibility at an Information Set

• Let T be a finite game tree for players N = {1, 2, ..., n}.

• I define the payoff to player i from strategy s_i and strategy profile s_{-i} at x, written u_i(s_i, s_{-i}, x), to be u_i(s_i, s_{-i}, x) = u_i(s_i|T_x, s_{-i}|T_x).

• A strategy s_i is weakly dominated by another strategy s'_i at an information set I_i belonging to i in T just in case

1. for all strategy profiles s_{-i} in T, and for all y in I_i, u_i(s_i, s_{-i}, y) ≤ u_i(s'_i, s_{-i}, y), and

2. for some strategy profile s_{-i} and some node y in I_i, u_i(s_i, s_{-i}, y) < u_i(s'_i, s_{-i}, y).

• A strategy s_i is admissible at an information set I_i in T just in case s_i is not weakly dominated at I_i.
With two qualifications, I shall prove that a strategy is admissible in an extensive form game just in case it is admissible at each information set. The first qualification concerns the subtle point that a strategy may make prescriptions at information sets that cannot be reached when the strategy is played. For example, in the game of Figure 9.1, the strategy (R1R2) for player 1 yields the same payoff as (R1L2). Hence both are admissible strategies for the overall game, although (R1L2) is admissible at 1's second information set and (R1R2) is not. Evaluating strategies only with respect to information sets that are consistent with them leads to what [Bicchieri and Schulte 1997] call proper weak dominance, and proper admissibility. So in the game of Figure 9.1, (R1R2) is properly admissible at 1's second information set.

I say that a node x in a game tree T is consistent with a strategy s_i if there is some play sequence play(s) that reaches x such that s_i = s[i]. An information set I in a game tree T is reachable with a strategy s_i if some node in I is consistent with s_i.
Definition 9.3 Sequential Proper Admissibility

• Let T be a finite game tree.

• A strategy s_i is properly weakly dominated at an information set I_i belonging to i in T just in case I_i is reachable with s_i and s_i is weakly dominated at I_i.

• A strategy s_i is properly admissible at an information set I_i just in case s_i is not properly weakly dominated at I_i.

• A strategy s_i is sequentially properly admissible in T if and only if s_i is properly admissible at each information set I_i in T that belongs to player i.
However, it is still not always the case that a strategy that is admissible in the strategic form of a game is sequentially properly admissible in an extensive form of the game. For example, in the game of Figure 9.2, the strategy L is properly weakly dominated for player 2 at her information set: at node y, R yields a higher payoff than L, and starting at node x, both L and R yield the same. On the other hand, node y cannot be reached when 2 plays L, so L is an admissible strategy for the overall game, yielding 2's maximal payoff of 1. The game in Figure 9.2 has the strange feature that if 2 plays R at x to arrive at y, she has 'forgotten' this fact and cannot distinguish between x and y. Indeed, this is a game without perfect recall. Informally, a game is one with perfect recall if no player ever forgets what they knew or did. The formal definition of perfect recall is as follows.
Definition 9.4 (Harold Kuhn) Let T be a finite game tree. Then T is an extensive game with perfect recall if and only if for each information set I_i belonging to player i, and each strategy s_i in T, all nodes in I_i are consistent with s_i if any node in I_i is.
I note that if T is a game with perfect recall, then all restrictions of T satisfy perfect recall. My main result is that in extensive form games with perfect recall, the notion of proper weak dominance coincides exactly with admissibility among strategies for the overall game (admissibility in the strategic form).

Theorem 9.1 Let T be a game tree with perfect recall. Then a strategy s_i for player i is admissible in the strategic form S(T) if and only if s_i is sequentially properly admissible in T.
9.4 Iterated Admissibility
A player who reasons with the help of admissibility alone would not go very far in eliminating plays of the game, unless he assumes that the other players are also applying the same principle. In the game of Figure 9.3, for example, player 1 could not eliminate a priori any play of the game unless he assumed that player 2 never plays a dominated strategy.
In general, even assuming that other players choose admissible strategies
might not be enough to rule out possibilities about how a given game might be
played. Players must reason about other players' reasoning, and such mutual
reasoning must be common knowledge. Unless otherwise specified, I shall assume that players have common knowledge of the structure of the game and of their following the admissibility principle, and examine how common reasoning about admissibility unfolds.
My procedure for capturing common reasoning about sequential weak ad-
missibility in T is the following. First, eliminate at each information set in T all
[Game tree omitted in this transcription. As described in the text: nodes x and y belong to the same information set of player 2; at x, L ends the game with payoff (0,1), while R leads toward y; at y, L yields (0,0) and R yields (0,1).]

Figure 9.2: A Game Without Perfect Recall
[Game tree omitted in this transcription. As described in the text: player 1 chooses a, b, or c at the root; c ends the game with payoff (0,0); the nodes reached by a and b form a single information set for player 2, who chooses L, yielding payoff (1,1), or R, yielding payoff (0,0), at either node.]

Figure 9.3: Weak Admissibility
moves that are inconsistent with the admissibility principle, that is, dominated choices. The result is a restricted game tree T'.

Repeat the pruning procedure with T' to obtain another restricted game tree, and continue until no moves in the resulting game tree are weakly dominated. Note that the recursive pruning procedure does not start at the final information sets. This procedure allows players to consider the game tree as a whole and to start eliminating branches anywhere in the tree by applying admissibility.

I define the result of common reasoning about sequential proper admissibility as follows. For a given game tree T, let Seq-PA_i(T) = {s_i ∈ S_i(T) : s_i is sequentially properly admissible in T}, and let Seq-PA(T) = ×_{i∈N} Seq-PA_i(T).
Definition 9.5 Common Reasoning About Sequential Proper Admissibility

• Let T be a game tree, with players N = {1, 2, ..., n}.

• The strategies in T consistent with common reasoning about sequential proper admissibility are denoted by CRPSeq(T), and are defined as follows:

1. PSeq^0(T) = S(T).

2. PSeq^{j+1}(T) = Seq-PA(T|PSeq^j(T)).

3. s ∈ CRPSeq(T) ⟺ ∀j: s|[T|PSeq^j(T)] ∈ PSeq^{j+1}(T).
Next, consider a game G in strategic form. I define an order-free iterative procedure for eliminating weakly dominated strategies. If S is a set of strategy profiles, let Admiss_i(S) be the set of all strategies s_i for player i that are consistent with S and admissible given S, and let Admiss(S) = ×_{i∈N} Admiss_i(S).
Definition 9.6 Common Reasoning About Admissibility in the Strategic Form

• Let the strategic form of a finite game G be given by ⟨N, (S_i)_{i∈N}, (u_i)_{i∈N}⟩, and let S = S_1 × S_2 × ... × S_n be the set of strategy profiles in G.

• The strategies in S consistent with common reasoning about admissibility are denoted by CRAd(S), and are defined as follows:

1. Ad^0(S) = S.

2. Ad^{j+1}(S) = Admiss(Ad^j(S)).

3. CRAd(S) = ∩_{j=0}^∞ Ad^j(S).
        L     M     R
  a    1,3   3,2   1,2
  b    2,2   2,0   0,0
  c    2,1   1,2   0,0

Figure 9.4: Order-Free Elimination of Weakly Dominated Strategies
The procedure goes through at most Σ_{i∈N} (|S_i| - 1) iterations; that is, for all j ≥ Σ_{i∈N} (|S_i| - 1), Ad^j(S) = Ad^{j+1}(S).
To illustrate common reasoning about admissibility, consider the game in Figure 9.4. In the first iteration, player 1 will eliminate c, which is weakly dominated by b, and player 2 will eliminate R, which is dominated by L and M. Since admissibility is common knowledge, both players know that the reduced matrix only contains the strategies a, b and L, M. Common reasoning about admissibility means that both players will apply admissibility to the new matrix (and know that they both do it), and since now L dominates M, both will know that M is being eliminated. Finally, common reasoning about admissibility will leave b, L as the unique outcome of the game.
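These elimination rounds can be replayed mechanically. The following sketch (the encoding is mine; the payoff matrix is my reading of Figure 9.4, chosen to match the elimination claims in the text) implements the order-free procedure of Definition 9.6 for two players:

```python
# A sketch of order-free iterated elimination of weakly dominated strategies
# (Definition 9.6) for a two-player game in strategic form.

# Payoff entries (player 1's payoff, player 2's payoff) of the Figure 9.4 game.
U = {
    ('a', 'L'): (1, 3), ('a', 'M'): (3, 2), ('a', 'R'): (1, 2),
    ('b', 'L'): (2, 2), ('b', 'M'): (2, 0), ('b', 'R'): (0, 0),
    ('c', 'L'): (2, 1), ('c', 'M'): (1, 2), ('c', 'R'): (0, 0),
}

def admissible(own, rivals, pay):
    """Strategies in `own` not weakly dominated, given the rivals' set."""
    keep = []
    for s in own:
        dominated = any(
            all(pay(t, r) >= pay(s, r) for r in rivals) and
            any(pay(t, r) > pay(s, r) for r in rivals)
            for t in own if t != s)
        if not dominated:
            keep.append(s)
    return keep

def crad(rows, cols, U):
    """Both players simultaneously discard weakly dominated strategies in
    each round, until a fixed point is reached."""
    while True:
        new_rows = admissible(rows, cols, lambda s, r: U[(s, r)][0])
        new_cols = admissible(cols, rows, lambda s, r: U[(r, s)][1])
        if (new_rows, new_cols) == (rows, cols):
            return rows, cols
        rows, cols = new_rows, new_cols

print(crad(['a', 'b', 'c'], ['L', 'M', 'R'], U))  # → (['b'], ['L'])
```

Round one removes c and R, round two removes M, and a final round removes a, leaving (b, L) as in the text.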
From my main result it follows that in games with perfect recall, iterated sequential proper admissibility and order-free elimination of inadmissible strategies in the strategic form yield exactly the same result.

Theorem 9.2 Let T be a game tree with perfect recall. A strategy profile s is consistent with common reasoning about sequential proper admissibility in T if and only if s is consistent with common reasoning about admissibility in the strategic form of T. That is, CRPSeq(T) = CRAd(S(T)).
In infinite games, there may not be any admissible strategy (a trivial example is the one-player, one-move game in which the player may choose a payoff as high as he pleases). For finite games, general existence is easy to establish.

Proposition 9.3 For all finite games G with pure strategy profiles S, CRAd(S) ≠ ∅.
9.5 Strict Dominance and Backward Induction
In this section I compare (iterated) sequential proper admissibility with two
other standard recommendations for reasoning about extensive form games:
backward and forward induction.
I establish that in finite games of perfect information, common reasoning about weak admissibility (eliminating strictly dominated strategies) gives exactly the same results as Zermelo's backward induction algorithm, which in finite games of perfect information corresponds to Selten's notion of subgame perfection [Osborne and Rubinstein 1994, Ch.6]. I then show by examples that the tight connection between common reasoning about weak admissibility and subgame perfection breaks down in games of imperfect information.
A strategy is sequentially weakly admissible in a game tree T if it is weakly admissible at each information set in T. A strategy s_i for player i is not weakly admissible at a given information set I_i if the strategy is strictly dominated at I_i. This means that there is some other strategy s'_i that yields i a better outcome than s_i at every node x in I_i. For example, in the game of Figure 9.3, playing left (L) at 2's information set strictly dominates playing right (R). The formal definition of sequential weak admissibility is the following.
Definition 9.7 Strict Dominance and Weak Admissibility in Extensive Form Games

• Let T be a finite game tree for players N = {1, 2, ..., n}.

• A strategy s_i is strictly dominated by another strategy s'_i at an information set I_i belonging to i in T just in case for all strategy profiles s_{-i} in T, and for all y in I_i, u_i(s_i, s_{-i}, y) < u_i(s'_i, s_{-i}, y).

• A strategy s_i is weakly admissible at an information set I_i in T just in case s_i is not strictly dominated at I_i.

• A strategy s_i is sequentially weakly admissible in T if and only if s_i is weakly admissible at each information set I_i in T that belongs to player i.
As with (proper) admissibility, by iteratively applying weak admissibility to eliminate possible plays we obtain predictions for the course of the game. To illustrate the procedure, look at the game of Figure 9.3. R is eliminated at 2's information set in the first iteration, and then c is eliminated for player 1 because, after R is eliminated, either a or b yields player 1 a payoff of 1 for sure, while c yields 0. This pruning procedure is formally defined as follows. For a given game tree T, let Weak-Ad_i(T) = {s_i ∈ S_i(T) : s_i is sequentially weakly admissible in T}, and let Weak-Ad(T) = ×_{i∈N} Weak-Ad_i(T).
Definition 9.8 Common Reasoning about Sequential Weak Admissibility

• Let T be a finite game tree for players N = {1, 2, ..., n}.

• The strategies in T consistent with common reasoning about sequential weak admissibility are denoted by CRWA(T), and are defined as follows:

1. WA^0(T) = S(T).

2. WA^{j+1}(T) = Weak-Ad(T|WA^j(T)).

3. s ∈ CRWA(T) ⟺ ∀j: s|[T|WA^j(T)] ∈ WA^{j+1}(T).
If T is a finite game tree, the set S_i(T) of strategies for player i is finite, and our procedure will go through only finitely many iterations. To be precise, let max = Σ_{i∈N} (|S_i| - 1); then the procedure will terminate after max iterations, i.e., for all j ≥ max, WA^j(T) = WA^{j+1}(T).
To describe Zermelo's backward induction algorithm, I introduce the concept of Nash equilibrium and one of its refinements, subgame perfection, for generic finite games in extensive form. A strategy s_i in a game tree T is a best reply to a strategy profile s_{-i} of i's opponents if there is no strategy s'_i for player i such that u_i(s'_i, s_{-i}) > u_i(s_i, s_{-i}). A strategy profile s is a Nash equilibrium if each strategy s[i] in s is a best reply against s[-i]. A strategy profile s is a subgame perfect equilibrium if for each subgame T_x of T, (s|T_x) is a Nash equilibrium of T_x. I say that a strategy s_i in T is consistent with subgame perfection if there is a subgame perfect strategy profile s of which s_i is a component strategy, that is, s_i = s[i]. I denote the set of player i's strategies in T that are consistent with subgame perfection by SPE_i(T), and define the set of strategy profiles consistent with subgame perfection by SPE(T) = ×_{i∈N} SPE_i(T). Note that not all strategy profiles that are consistent with subgame perfection are subgame perfect equilibria. In Figure 9.5, all strategy profiles are consistent with subgame perfection, but (L, ba') and (R, ab') are not equilibria, since in equilibrium 1 must be playing a best reply to 2's strategy.
Finally, T is a game of perfect information if each information set I of T
is a singleton. The game in Figure 9.5 is a game of perfect information.
A standard approach to finite games of perfect information is to apply Zermelo's backward induction algorithm, which yields the set of strategy profiles that are consistent with subgame perfection (i.e., SPE(T)) [Osborne and Rubinstein 1994, Ch.6.2]. Common reasoning about weak admissibility, as defined by the procedure WA, does not follow Zermelo's backward induction algorithm. For
[Game tree omitted in this transcription. As described in the text: player 1 chooses L or R at the root; after L, player 2 chooses a, with payoff (1,0), or b, with payoff (0,0); after R, player 2 chooses a', with payoff (1,0), or b', with payoff (0,0).]

Figure 9.5: A Game of Perfect Information
example, suppose that in a game tree a move m at the root is strictly dominated by another move m' at the root for the first player. Common reasoning about weak admissibility rules out m immediately, but the backward induction algorithm eliminates moves at the root only at its last iteration. Nonetheless, in games of perfect information, the final outcome of the two procedures is the same: In these games, the strategies that are consistent with common reasoning about sequential weak admissibility are exactly those consistent with subgame perfection.
Proposition 9.4 Let T be a finite game tree of perfect information. Then a strategy s_i is consistent with common reasoning about sequential weak admissibility in T if and only if s_i is consistent with subgame perfection. That is, CRWA(T) = SPE(T).
In games of imperfect information, the equivalence between strategies consistent with subgame perfection and those consistent with common reasoning about sequential weak admissibility fails in both directions. Figure 9.3 shows that a strategy profile s may be a subgame perfect equilibrium although s is not consistent with common reasoning about sequential weak admissibility: The strategy profile (c, R) is a subgame perfect equilibrium, but R and (hence) c are not consistent with common reasoning about sequential weak admissibility. And in Figure 9.6, a is not strictly dominated for player 2, but a is neither a best reply to L nor to R.
As we may expect, common reasoning about admissibility makes stronger predictions about the course of the game than common reasoning about weak admissibility. What is more surprising is that this can happen even in games of perfect information. In light of Proposition 9.4, this means that the (iterated) admissibility principle may rule out plays that are consistent with the standard backward induction solution to games of perfect information. For an example, consider again the game of Figure 9.1. Common reasoning about admissibility rules out b as a choice for player 2 because b is weakly dominated. Then, given that only a remains at 2's decision node, R1 (strictly) dominates L1 for player 1. So the only play consistent with common reasoning about sequential proper admissibility is for player 1 to play R1 and end the game. Note, however, that common reasoning about sequential weak admissibility, that is, the standard backward induction procedure, is consistent with both R1 and the play sequence L1, b, L2.
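The contrast can be checked mechanically. The following sketch (the tree encoding and function names are mine) computes the payoff vectors consistent with subgame perfection in the Figure 9.1 game by backward induction, keeping ties: it returns both the (2,0) outcome of R1 and the (3,2) outcome of the play L1, b, L2, whereas iterated admissibility leaves only (2,0).

```python
from itertools import product

# Decision nodes are (player, {move: subtree}); leaves are payoff pairs.
FIG_9_1 = (1, {'R1': (2, 0),
               'L1': (2, {'a': (1, 2),
                          'b': (1, {'L2': (3, 2), 'R2': (1, 0)})})})

def spe_payoffs(node):
    """All payoff vectors consistent with subgame perfection (ties kept)."""
    if not isinstance(node[1], dict):    # leaf: a payoff pair
        return {node}
    player, moves = node
    labels = list(moves)
    child_sets = [spe_payoffs(moves[m]) for m in labels]
    out = set()
    # For each way of fixing an equilibrium continuation below every move,
    # the mover may choose any move whose continuation payoff is maximal.
    for selection in product(*child_sets):
        top = max(p[player - 1] for p in selection)
        out |= {p for p in selection if p[player - 1] == top}
    return out

print(sorted(spe_payoffs(FIG_9_1)))  # → [(2, 0), (3, 2)]
```

The tie arises at player 2's node, where a and b both promise her 2; backward induction cannot break it, while admissibility discards b.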
I next show that iteratively applying admissibility never leads to weaker results than iteratively applying weak admissibility. The key is to observe that if a strategy s_i is strictly dominated in a game tree T, s_i will be strictly dominated in a restriction of T. The next lemma asserts the contrapositive of this observation: If a strategy s_i is admissible in a restriction of T, s_i is not strictly dominated in T.

Lemma 9.5 If a game tree T is a restriction of T' and s_i is sequentially admissible in T, then there is an extension s'_i of s_i to T' such that s'_i is sequentially weakly admissible in T'.

[Game tree omitted in this transcription. As described in the text: player 1 chooses L or R; player 2 then chooses a, b, or c at an information set containing both resulting nodes. Player 1 receives 0 at every terminal node; a yields player 2 the payoff 0 against both L and R, while each of b and c yields her 1 against one of player 1's moves and 0 against the other.]

Figure 9.6: Subgame Perfection vs. Weak Admissibility
This means that our procedure PSeq yields, at each stage j, a result that is at least as strong as that of common reasoning about weak admissibility, the procedure WA. Hence we have the following proposition.
Proposition 9.6 Let T be a game tree. If a play sequence is consistent with common reasoning about sequential admissibility in T, then that play sequence is consistent with common reasoning about sequential weak admissibility. That is, {play(s) : s ∈ CRSeq(T)} ⊆ {play(s) : s ∈ CRWA(T)}.
9.6 Weak Dominance and Forward Induction
It is commonly held that iterated weak dominance (i.e., iterated sequential
admissibility) captures some of the features of backward and forward induction.
[Fudenberg and Tirole 1993, p.461] thus state that
Iterated weak dominance incorporates backward induction in
games of perfect information: The suboptimal choices at the last
information sets are weakly dominated; once these are removed, all
subgame-imperfect choices at the next-to-last information sets are
removed at the next round of iteration; and so on. Iterated weak
dominance also captures part of the forward induction notions im-
plicit in stability, as a stable component contains a stable component
of the game obtained by deleting a weakly dominated strategy.
Indeed, I have previously shown that, in finite games of perfect information, common reasoning about weak admissibility yields exactly the backward induction solution. In this section I show how, in finite games of imperfect information, common reasoning about admissibility yields typical forward induction solutions. Thus backward and forward induction seem to follow from one principle, namely that players' choices should be consistent with common knowledge of (and common reasoning about) admissibility. This result may seem questionable, as it is also commonly held that backward and forward induction principles are mutually inconsistent [Kohlberg and Mertens 1986], [Myerson 1991]. That is, if we take backward and forward induction principles to be restrictions imposed on equilibria, then they may lead to contradictory conclusions about how to play.
A backward induction principle states that each player's strategy must be a best reply to the other players' strategies, not only when the play begins at the initial node of the tree, but also when the play begins at any other information set.³ A forward induction principle says that players' beliefs should be consistent with sensible interpretations of the opponents' play. Thus a forward induction principle restricts the range of possible interpretations of players' deviations from equilibrium play. Deviations should be construed as 'signals' (as opposed to mistakes), since players should privilege interpretations of the opponents' play that are consistent with common knowledge of rationality. The typical example of a contradiction between backward and forward induction principles would be a game of imperfect information, where one may apply forward induction in one part of the tree, and then use the conclusion for a backward induction argument in a different part of the tree [Kohlberg 1990].

³This principle corresponds to subgame perfection and to sequential optimality (see Section 4.5).
The game of Figure 9.7 is taken from [Kohlberg 1990, p.10]. Since player I, by choosing y, could have received 2, by forward induction if he plays n he intends to follow with T; but for the same reason II, by choosing D, shows that she intends to play R, and hence, by backward induction, I must play B. What seems to be at stake here is a conflict between different but equally powerful intuitions. By playing D, player II is committing herself to follow up with R, and thus player I would be safe to play y. On the other hand, once player I's node has been reached, what happened before might be thought of as strategically irrelevant, as I now has a chance, by choosing n, of signaling his commitment to follow with T. Which commitment is firmer? Which signal is more credible?
We must remember that players make their choices about which strategy to adopt after a process of deliberation that takes place before the game is actually played. I have supposed that during deliberation, players will employ some shared principle that allows them to rule out some plays of the game as inconsistent with it. A plausible candidate is admissibility. Let us now see how the ex ante deliberation of the players might unfold in this game by applying the procedure Seq(T) to the strategies UL, UR, DL, DR and yT, yB, nT, nB. Note that if we recursively apply to this game the concept of sequential admissibility presented in the previous section, we must conclude that the only strategies consistent with common reasoning about sequential admissibility are UR and yT. Indeed, common reasoning about sequential weak admissibility alone yields this result. For during the first round of iteration, the strategy nB of player I is eliminated because this strategy is strictly dominated by any strategy that chooses y at I's first choice node. Similarly, the strategy DL of player II is immediately eliminated because this strategy is strictly dominated by any strategy that chooses U at the root. So after the first round of elimination, II's second information set is restricted to the node reached with nT, and her choices at this information set are restricted to R only. This means in turn that y now strictly dominates nT at I's first information set, and U strictly dominates DR at the root. Finally, the strategies yB and UL are not strategies in the restricted tree obtained after the first round of elimination, and therefore they are eliminated. After the second round of elimination, only UR and yT survive. Thus common reasoning about sequential admissibility predicts that players who deliberate according to a shared admissibility principle will expect U to be chosen at the
[Game tree omitted in this transcription. As described in the text: player II chooses U or D at the root; after D, player I chooses y, with payoff (2,0), or n, and after n chooses between T and B; player II then chooses L or R at an information set that does not distinguish T from B.]

Figure 9.7: Backward vs. Forward Induction Principles
beginning of the game.
A brief comment about the intuitive plausibility of the iterated admissibility procedure is now in order. Note that this procedure does not allow the players to discount whatever happens before a given information set as strategically irrelevant. For example, if player II were to choose D, player I should not keep playing as if he were in a new game starting at his decision node. I suggest, rather, that I should expect II to follow with R if given a chance, in which case he should play y; and player II, who can replicate I's reasoning, will in fact never play D. On the other hand, playing D to signal that one wants to continue (if given a chance) with R would make little sense, since II must know that nB is never going to be chosen, and R makes sense only if it follows nB. In other words, D is not a rational move for player II. Similar reasoning excludes nB as a rational strategy for player I.
The problem with Kohlberg's and similar examples is that no constraints
are set on players' forward induction "signals". I define the notion of a
credible signal in an extensive form game, and show that the credible signals
are the signals consistent with common reasoning about sequential admissibility
(much as Selten's subgame-perfect equilibria characterize "credible threats").
Thus the examples in the literature which purport to show the conflict between
backward and forward induction principles involve forward induction signals
that are not credible.
The following definition formulates the notion of a forward induction signal
in general, and a credible forward induction signal in particular. The idea is
this: Let us consider a move $m$ at a given information set $I_i$, and ask what
future moves of player $i$ at lower information sets $I'_i$ are consistent with
sequential admissibility and the fact that $m$ was chosen at $I_i$. If there
are future moves that are consistent with sequential admissibility and the fact
that $m$ was chosen at $I_i$, then I take the move $m$ at $I_i$ to be a signal
that player $i$ intends to follow with one of those moves at $I'_i$. But I
argue that in order for this signal to be credible to $i$'s opponents, at least
one of the future admissible moves must be consistent with common reasoning
about sequential admissibility in $T$.

I say that an information set $I'_i$ in a game tree $T$ is reachable from
another information set $I_i$ with a strategy $s_i$ if there are nodes $x \in
I_i, y \in I'_i$ such that some play sequence that is consistent with $s_i|T_x$
contains $y$.
Definition 9.9 Let $T$ be a game tree with information set $I_i$. Let $T|I_i$
denote the restriction of $T$ to nodes in $I_i$ and successors of nodes in
$I_i$.

- A strategy $s_i$ is consistent with forward induction at $I_i$ if $s_i$
  is sequentially admissible at $I_i$.

- A move $m$ at an information set $I_i$ is a forward induction signal for
  $S^*_i$ at a lower information set $I'_i$ (written $\langle I_i : m, I'_i :
  S^*_i \rangle$), where $s_i \in S^*_i \iff$

  1. $s_i(I_i) = m$;
  2. $I'_i$ is reachable from $I_i$ with $s_i$;
  3. $s_i$ is consistent with forward induction at $I_i$.

- A forward induction signal $\langle I_i : m, I'_i : S^*_i \rangle$ is
  credible if some strategy $s_i$ in $S^*_i$ is consistent with common
  reasoning about sequential admissibility in $T$, i.e. $s_i \in CRSeq(T)_i$.
Let me illustrate these concepts in the game of Figure 9.7. According to my
definition, the only strategy that chooses $n$ at I's first information set and
is consistent with forward induction is $nT$. So $\langle I^1_I : n, I^2_I :
\{nT\} \rangle$ is a forward induction signal, where $I^1_I$ denotes I's first
information set and $I^2_I$ denotes I's second information set. However,
$\langle I^1_I : n, I^2_I : \{nT\} \rangle$ is not a credible signal. For $nT$
is inconsistent with common reasoning about sequential admissibility, since
such reasoning rules out $L$ at II's second information set. Similarly for
player II, $\langle I^1_{II} : D, I^2_{II} : \{DR\} \rangle$ is a forward
induction signal. But it is not a credible signal, since $DR$ is inconsistent
with common reasoning about sequential admissibility. Hence neither forward
induction signal is credible, as "sending" either signal is inconsistent with
common reasoning about sequential admissibility as defined by $CRSeq$.
In terms of reasoning about admissibility, the difference between Kohlberg's
analysis and mine is this. Kohlberg applies admissibility once to argue that D
is a forward induction signal for R and n is a forward induction signal for T.
But if I assume that admissibility is common knowledge among the players, then
neither D nor n is a credible signal. Indeed, common knowledge is not even
needed to reach this conclusion: it is sufficient to apply admissibility twice
to get the same result.
9.7 Proofs
Theorem 9.1 Let $T$ be a game tree with perfect recall. Then a strategy $s_i$ for player $i$ is admissible in $S(T)$ if and only if $s_i$ is sequentially properly admissible in $T$.
Proof. Suppose that a strategy $s_i$ in $S(T)$ for player $i$ is weakly
dominated in $S(T)$. Then there is a strategy $s'_i$ consistent with $S(T)$
such that

1. for all strategy profiles $s_{-i}$ consistent with $S(T)$, $u_i(s_i,
s_{-i}) \leq u_i(s'_i, s_{-i})$, and

2. for some strategy profile $s^*_{-i}$ consistent with $S(T)$, $u_i(s_i,
s^*_{-i}) < u_i(s'_i, s^*_{-i})$.

Let $x$ be the first node that appears along both the plays of $s_i$ against
$s^*_{-i}$ and of $s'_i$ against $s^*_{-i}$ at which $s_i$ deviates from
$s'_i$, so that $x \in \mathrm{range}(play(s_i, s^*_{-i})) \cap
\mathrm{range}(play(s'_i, s^*_{-i}))$ and $s_i(I_i(x)) \neq s'_i(I_i(x))$.
Then $x$ is consistent with $s_i$ and $s'_i$ in $T$. Let $y$ be any node at
$I_i(x)$ consistent with $s_i$ and $s'_i$, and let $s_{-i}$ be any strategy
profile of $i$'s opponents. Then $u_i(s_i, s_{-i}, y) \leq u_i(s'_i, s_{-i},
y)$; for otherwise, by perfect recall, let $s^*_{-i}$ be a strategy profile of
$i$'s opponents such that both $play(s_i, s^*_{-i})$ and $play(s'_i, s^*_{-i})$
reach $y$, and such that $s^*_{-i}|T_y = s_{-i}|T_y$. Then $u_i(s_i, s^*_{-i})
> u_i(s'_i, s^*_{-i})$, contrary to the hypothesis that $s'_i$ weakly dominates
$s_i$ in $S(T)$. Since I also have that $u_i(s_i, s^*_{-i}, x) < u_i(s'_i,
s^*_{-i}, x)$, it follows that $s'_i$ weakly dominates $s_i$ at $I_i(x)$, so
that $s_i$ is not sequentially properly admissible.

Suppose that a strategy $s_i$ is properly weakly dominated at an information
set $I_i$ in $T$ by strategy $s'_i$. Then there must be a node $x$ in $I_i$
consistent with $s_i$ and a strategy profile $s'_{-i}$ in $T$ such that $s'_i$
yields a higher payoff at $x$ against $s'_{-i}$ than $s_i$ does, i.e.
$u_i(s_i, s'_{-i}, x) < u_i(s'_i, s'_{-i}, x)$. Assume without loss of
generality that $x$ is reached by the play sequence of $s_i$ against $s'_{-i}$,
i.e. $x \in \mathrm{range}(play(s_i, s'_{-i}))$. Now I define a strategy
$s^*_i$ that weakly dominates $s_i$ in $T$ as follows.

1. At an information set $I'_i$ that does not contain $x$ or any successor of
$x$, $s^*_i(I'_i) = s_i(I'_i)$.

2. At an information set $I'_i$ that contains $x$ or a successor of $x$,
$s^*_i(I'_i) = s'_i(I'_i)$.

I show that $s^*_i$ weakly dominates $s_i$ in $S(T)$. Since $play(s_i,
s'_{-i})$ reaches $x$, $play(s^*_i, s'_{-i})$ also reaches $x$, and so
$u_i(s^*_i, s'_{-i}) = u_i(s^*_i, s'_{-i}, x) = u_i(s'_i, s'_{-i}, x) >
u_i(s_i, s'_{-i}, x) = u_i(s_i, s'_{-i})$. Thus $s^*_i$ weakly dominates $s_i$
in $S(T)$ if for no $s_{-i}$ in $T$, $u_i(s_i, s_{-i}) > u_i(s^*_i, s_{-i})$,
which I establish now. Let a strategy profile $s_{-i}$ in $T$ be given.

Case 1: the play sequence of $(s^*_i, s_{-i})$ does not reach $I_i(x)$. Then
$play(s^*_i, s_{-i}) = play(s_i, s_{-i})$, and the claim follows immediately.

Case 2: the play sequence of $(s^*_i, s_{-i})$ goes through some node $y$ in
$I_i(x)$. Since $x$ is consistent with $s_i$ and $T$ is a game with perfect
recall, $y$ is consistent with $s_i$, and so $play(s_i, s_{-i})$ reaches $y$.
As before, I have that (a) $u_i(s_i, s_{-i}, y) = u_i(s_i, s_{-i})$. Also,
$s^*_i$ coincides with $s'_i$ after node $y$, and so (b) $u_i(s^*_i, s_{-i}) =
u_i(s'_i, s_{-i}, y)$. Since $s'_i$ weakly dominates $s_i$ at $I_i(x)$, and
$y$ is in $I_i(x)$, it follows that (c) $u_i(s'_i, s_{-i}, y) \geq u_i(s_i,
s_{-i}, y)$. Combining (a), (b) and (c), it follows that $u_i(s^*_i, s_{-i})
\geq u_i(s_i, s_{-i})$. This establishes that $s_i$ is weakly dominated given
$S(T)$. $\Box$
Theorem 9.2 Let $T$ be a game tree with perfect recall. A strategy profile $s$ is consistent with common reasoning about sequential proper admissibility if and only if $s$ is consistent with common reasoning about admissibility in the strategic form of $T$. That is, $CRPSeq(T) = CRAd(S(T))$.
Proof. I prove by induction on $j$ that for all $j \geq 0$, $PSeq^j(T) =
Ad^j(S(T))$.

Base Case, $j = 0$. Then by definition, $PSeq^0(T) = S(T) = Ad^0(S(T))$.

Inductive Step: Assume that $PSeq^j(T) = Ad^j(S(T))$ and consider $j + 1$. By
inductive hypothesis, $T|PSeq^j(T) = T|Ad^j(S(T))$. Now a strategy $s_i$ is in
$PSeq^{j+1}_i(T)$ $\iff$ $s_i$ is in $PSeq^j_i(T)$ and $s_i$ is sequentially
properly admissible in $T|PSeq^j(T)$. By inductive hypothesis, the first
condition implies that $s_i$ is in $Ad^j(S(T))$. By Theorem 9.1 and the facts
that $T|PSeq^j(T) = T|Ad^j(S(T))$ and that all restrictions of $T$ are games
with perfect recall, the second condition implies that $s_i$ is admissible in
$S(T|Ad^j(S(T))) = Ad^j(S(T))$. So $s_i$ is in $Ad^{j+1}(S(T))$. Conversely,
a strategy $s_i$ is in $Ad^{j+1}(S(T))$ $\iff$ $s_i$ is in $Ad^j(S(T))$ and
$s_i$ is admissible in $Ad^j(S(T))$. By inductive hypothesis, the first
condition implies that $s_i$ is in $PSeq^j(T)$, and the second condition may be
restated to say that $s_i$ is admissible in $S(T|Ad^j(S(T)))$. By Theorem 9.1,
the second condition then implies that $s_i$ is sequentially properly
admissible in $T|Ad^j(S(T)) = T|PSeq^j(T)$. Hence $s_i$ is in
$PSeq^{j+1}_i(T)$. This shows that $PSeq^{j+1}(T) = Ad^{j+1}(S(T))$, and
completes the proof by induction. $\Box$
Proposition 9.3 For all finite games $G$ with pure strategy profiles $S$, $CRAd(S) \neq \emptyset$.
Proof. The admissible elements in $S^j_i$ survive at each iteration $j$, for
each player $i$, and there always is an admissible element in each $S^j_i$
since each $S^j_i$ is finite. Hence $S^j \neq \emptyset$ for any $j$, and so
$S^{\sum_{i \in N} |S_i| - 1} = CRAd(S) \neq \emptyset$. $\Box$
For the proof of Proposition 9.4, I rely on the well-known one-deviation
property of subgame perfect equilibrium: if it is possible for one player to
profitably deviate from his subgame perfect equilibrium strategy $s_i$, he can
do so with a strategy $s'_i$ that deviates from $s_i$ only once.

Lemma 9.0 Let $T$ be a finite game tree of perfect information. Then $s$ is a
subgame perfect equilibrium in $T$ if and only if for each node $x$ and each
player $i$, $u_i(s[i], s[-i], x) \geq u_i(s'_i, s[-i], x)$ whenever $s[i]$ and
$s'_i$ differ only at $x$.
Proof. See [Osborne and Rubinstein 1994, Lemma 98.2].
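Lemma 9.0 is what licenses the usual backward induction computation: checking one-step deviations at each node suffices. As an illustration, here is a minimal Python sketch of backward induction on a finite perfect-information tree. The tree, payoffs, and names are an invented example of my own, not the game of Figure 9.7; ties are broken by taking the first maximizing action, so the sketch returns one subgame perfect play rather than all of them.

```python
def backward_induction(node):
    """Solve a finite perfect-information game tree by backward induction.

    A terminal node is a tuple of payoffs, one per player; a decision node is
    a pair (player, children) whose second component is a dict mapping action
    labels to subtrees.  Returns (payoff vector, equilibrium play)."""
    if not isinstance(node[-1], dict):      # terminal node: payoff vector
        return node, []
    player, children = node
    best_payoff, best_play = None, None
    for action, subtree in children.items():
        payoff, play = backward_induction(subtree)
        # Keep the action maximizing the mover's own payoff; ties go to the
        # first action examined.
        if best_payoff is None or payoff[player] > best_payoff[player]:
            best_payoff, best_play = payoff, [action] + play
    return best_payoff, best_play

# A hypothetical two-player tree: player 0 moves first, and "in" hands the
# move to player 1.
game = (0, {"out": (1, 2),
            "in": (1, {"fight": (0, 0), "share": (2, 1)})})

payoffs, play = backward_induction(game)
```

In this toy tree, player 1 prefers "share" to "fight", so player 0 prefers "in" to "out"; the recursion returns the play ["in", "share"] with payoffs (2, 1).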
For the next proposition, I note that if $T$ is finite, then our iterative
procedure goes through only finitely many iterations. In particular, this
means that if a strategy $s_i$ is strictly dominated given $CRWA(T)$, then
$s_i$ is not in $CRWA(T)$.

Proposition 9.4 Let $T$ be a finite game tree of perfect information. Then a
strategy $s_i$ is consistent with common reasoning about sequential weak
admissibility in $T$ if and only if $s_i$ is consistent with subgame
perfection. That is, $CRWA(T) = SPE(T)$.
Proof. I prove by induction on the height $h(x)$ of each node $x$ that
$CRWA(T_x) = SPE(T_x)$. The proposition follows when I take $x$ to be the root
of $T$.

Base Case, $h(x) = 1$. Then all successors of $x$ are terminal nodes. Let
player $i$ be the player to move at $x$. Let $max(x)$ be the maximum payoff
player $i$ can achieve at $x$ (i.e. $max(x) = \max\{u_i(y) : y$ is a successor
of $x\}$). Then $s_i|T_x$ is consistent with subgame perfection at $x$ if and
only if $s_i(x)$ yields $i$ the maximum payoff $max(x)$, which is exactly when
$s_i|T_x$ is not strictly dominated at $x$.

Inductive Case: Assume the hypothesis for all nodes $y$ with $h(y) < h(x)$, and
consider $x$.

($\Rightarrow$): Let $s$ be a strategy profile consistent with common reasoning
about sequential weak admissibility (i.e. $s \in CRWA(T_x)$). Suppose that it
is player $i$'s turn at $x$. For each player $j$, $s[j]|T_y$ is consistent
with subgame perfection in each proper subgame $T_y$ of $T_x$, by the inductive
hypothesis and the fact that $s[j]$ is consistent with common reasoning about
sequential weak admissibility in $T_x$. So the implication ($\Rightarrow$) is
established if I show that $s[i]$ is consistent with subgame perfection in
$T_x$. Let $y$ be the successor of $x$ that is reached when $i$ plays $s[i]$
at $x$. Let $max(y)$ be the maximum that $i$ can achieve given common
reasoning about sequential weak admissibility when he follows $s[i]$ (i.e.
$max(y) = \max\{u_i(s[i], s_{-i}, x) : s_{-i}$ is consistent with
$CRWA(T_x)\}$). For each $y'$ that is a successor of $x$, let $min(y')$ be the
minimum that $i$ can achieve given common reasoning about sequential weak
admissibility when he follows $s[i]$ in $T_{y'}$. Then I have (*) that $max(y)
\geq min(y')$ for each successor $y'$ of $x$. For otherwise player $i$ can
ensure himself a higher payoff than $s[i]$ can possibly yield, by moving to
some successor $y'$ of $x$ and continuing with $s[i]$. That is, the strategy
$s^*_i$ which moves to $y'$ at $x$ and follows $s[i]$ below $y'$ strictly
dominates $s[i]$ in $T_x|CRWA(T_x)$. But since $T$ and hence $T_x$ is finite,
this contradicts the assumption that $s[i]$ is consistent with $CRWA(T_x)$.
Now by inductive hypothesis, $CRWA(T_{y'}) = SPE(T_{y'})$ for each successor
$y'$ of $x$. So there is a subgame perfect equilibrium $s^{max}$ in $T_y$
which yields $i$ the payoff $max(y)$ in $T_y$ and in which player $i$ follows
$s[i]$ (i.e. $s[i] = s^{max}[i]$). Again by inductive hypothesis, for each
successor node $y'$ of $x$ there is a subgame perfect equilibrium $s^{y'}_{min}$
in $T_{y'}$ which gives player $i$ the payoff $min(y')$ and in which player $i$
follows $s[i]$ in $T_{y'}$. Now I define a subgame perfect equilibrium $s^*$
in $T_x$ in which player $i$ follows $s[i]$:

1. $s^*[i](\{x\}) = s[i](\{x\})$;

2. in $T_y$, $s^*$ follows $s^{max}$;

3. in $T_{y'}$, $s^*$ follows $s^{y'}_{min}$, where $y'$ is a successor of $x$
other than $y$.

By observation (*), there is no profitable 1-deviation from $s^*$ for player
$i$ at $x$, and hence by Lemma 9.0, $s^*$ is a subgame perfect equilibrium in
$T_x$.

($\Leftarrow$): Let $s$ be consistent with subgame perfection in $T_x$. Let
$i$ be the player moving at $x$. Consider any strategy $s[j]$ in $s$, where $j
\neq i$. Since $j$ is not moving at $x$, $s[j]$ is consistent with common
reasoning about sequential weak admissibility in $T_x$ if and only if
$s[j]|T_y$ is consistent with common reasoning about sequential weak
admissibility in each subgame $T_y$ of $T_x$. Since $s$ is consistent with
subgame perfection in $T_x$, there is a subgame perfect equilibrium $s^*$ in
$T_x$ in which $j$ follows $s[j]$. Since $s^*$ is subgame perfect, $s^*|T_y$
is subgame perfect in $T_y$. Hence $s[j]|T_y = s^*[j]|T_y$ is consistent with
subgame perfection in $T_y$. By inductive hypothesis, this entails that
$s[j]|T_y$ is consistent with common reasoning about sequential weak
admissibility in $T_y$. Since this is true for any subgame $T_y$ of $T_x$,
$s[j]$ is consistent with common reasoning about sequential weak admissibility
in $T_x$. Next, consider $s[i]$, the strategy followed by the player who is
moving at $x$. I just established that for each iteration $WA^j(T)$ of common
reasoning about sequential weak admissibility, $s^*[-i]$ is consistent with
$WA^j(T)$. Since $s^*$ is a subgame perfect equilibrium in $T_x$, $s^*[i]$ is
a best reply against $s^*[-i]$ in $T_x$ and each subgame of $T_x$. So in each
subgame $T_y$ of $T_x$ (including $T_x$) and at each iteration $WA^j(T)$,
$s^*[i]$ is a best reply against some strategy profile of his opponents
consistent with $WA^j(T)$, namely $s^*[-i]|T_y$, and hence $s^*[i]$ is
sequentially weakly admissible given $WA^j(T)$. Since $CRWA(T) = WA^k(T)$ for
some $k$, because $T$ is finite, $s^*[i]$ is consistent with common reasoning
about sequential weak admissibility. This shows that all strategies in the
strategy profile $s$ are consistent with common reasoning about sequential weak
admissibility in $T_x$, and completes the proof by induction. $\Box$
Lemma 9.5 If a game tree $T$ is a restriction of $T'$ and $s_i$ is sequentially
admissible in $T$, then there is an extension $s'_i$ of $s_i$ to $T'$ such that
$s'_i$ is sequentially weakly admissible in $T'$.

Proof. I construct $s'_i$ as follows. At each information set $I_i$ in $T'$
such that $I_i$ contains a node in $T$, $s'_i = s_i$. At all other information
sets $I_i$, $s'_i$ follows a strategy that is weakly admissible at $I_i$. I
claim that $s'_i$ is sequentially weakly admissible in $T'$; let $I_i$ be any
information set in $T'$ belonging to $i$.

Case 1: $I_i$ contains a node $x$ in $T$. Since $T$ is a restriction of $T'$,
$I_i$ contains all nodes in $I_T(x)$, where $I_T(x)$ is the information set in
$T$ containing $x$. So if $s_i$ is strictly dominated in $T'$ at $I_i$, then
$s_i$ is strictly dominated in $T$ at $I_T(x)$, contrary to the supposition
that $s_i$ is admissible at $I_T(x)$.

Case 2: $I_i$ contains no node in $T$. By construction, $s'_i$ is weakly
admissible at $I_i$. $\Box$
Proposition 9.6 Let $T$ be a game tree. If a play sequence is consistent with
common reasoning about sequential admissibility in $T$, then that play sequence
is consistent with common reasoning about sequential weak admissibility. That
is, $\{play(s) : s \in CRSeq(T)\} \subseteq \{play(s) : s \in CRWA(T)\}$.

Proof. I prove by induction on $j \geq 0$ that for each $j$, $T|Seq^j(T)$ is a
restriction of $T|WA^j(T)$.

Base Case, $j = 0$. Then $Seq^0(T) = WA^0(T)$, so the claim is immediate.

Inductive Step: Assume that $T|Seq^j(T)$ is a restriction of $T|WA^j(T)$, and
consider $j + 1$. Choose any strategy profile $s$ in $Seq^{j+1}(T)$. By Lemma
9.5, extend each $s[i]$ in $s$ to a strategy $s'[i]$ that agrees with $s[i]$ on
information sets that have members both in $T|Seq^j(T)$ and $T|WA^j(T)$, and is
sequentially weakly admissible in $T|WA^j(T)$. Call the resulting strategy
profile $s'$; $s'$ is in $WA^{j+1}(T)$. Clearly $s$ and $s'$ result in the
same play sequence, i.e. $play(s') = play(s)$, because the same actions are
taken at each information set. So all nodes that are consistent with
$Seq^{j+1}(T)$ are consistent with $WA^{j+1}(T)$, which means that
$T|Seq^{j+1}(T)$ is a restriction of $T|WA^{j+1}(T)$. This completes the proof
by induction. $\Box$
Chapter 10
Conclusion
The problem of induction is at the heart of methodology, the philosophy of
science, and such epistemological disciplines as statistics and machine
learning. My dissertation pursued the question of what inferences we should
draw from empirical evidence not through historical case studies or
formalizing common intuitions, but by resolutely seeking hypothetical
imperatives for inductive inference, means-ends principles of the form: if
those are your aims in inquiry, then these are the methods you should employ.

Thus my question was what methods are best for attaining given aims of
inquiry. I obtained principled answers based on a set of means-ends criteria
for evaluating the performance of inductive methods. These criteria result
from combining widely accepted epistemic values with standard principles from
decision theory.
Tracing out the consequences of adopting certain cognitive values leads across
some familiar territory, such as the observation, harking back to Descartes,
that those who want to avoid error must believe only what is certain; a
means-ends interpretation of Popper's falsificationism; and axioms for
"minimal change" belief revision. The means-ends analysis sheds new light on
old principles by showing them to be the optimal means of choice for certain
cognitive values (but not necessarily for others). It also led me to
alternative, and I argue better, versions of the standard proposals, notably a
new proposal for defining "minimal" theory change.
A venerable tenet of methodology has been that scientific methods should lead
us to the right answer about the questions under investigation, if not
quickly, then at least in the long run, and that they should do so reliably;
that is, they should be designed so as to find the truth over a range of
possible ways the world could be. Many methodologists have held that this
cannot be the whole story about scientific inference, because long-run
reliability is consistent with any crazy behavior in the short run, and they
want a theory of scientific inference to guide us about what to say here and
now. The most fruitful application of my means-ends analysis addresses this
concern. I combine the goal of finding the truth in the long run with other
auxiliary epistemic values, such as avoiding error and retractions, and
minimizing convergence time, to obtain powerful short-run constraints on
asymptotically reliable methods. On this approach, we may think of the
auxiliary cognitive goals as defining standards of efficiency for reliable
inquiry.
These efficiency criteria reveal a wealth of epistemologically significant
structure that has so far been almost completely overlooked by methodologists.
It turns out that the efficiency criteria fall into a tidy hierarchy of
feasibility. Thus these performance standards constitute a scale of inductive
complexity: We can measure the difficulty of an inductive problem by the most
stringent standard of efficiency that solutions to the problem can attain.
The hierarchy of cognitive goals shows that two notions of efficiency are of
particular interest with respect to the problem of induction: minimizing the
time required to converge to the right answer, and avoiding vacillations along
the way. The problems of inquiry in which these criteria apply share a
common, topological structure, which I characterized precisely. I traced this
structure in several inductive problems that look very different on the
surface: Goodman's Riddle of Induction, Occam's Razor, identifying the set of
elementary particles, and determining what particle interactions are possible.
In all these cases, the combination of reliability, convergence time, and
avoiding retractions underwrites intuitively plausible recommendations for
what inferences to draw from given data.
I examined in some detail what efficient inquiry in particle physics amounts
to. The interaction of means-ends methodology with the conceptual structures
that scientists employ illuminates both. We saw that prominent assumptions,
such as the idea that a set of conservation laws should be the complete theory
of particle reactions, significantly reduce the inductive complexity of the
research problem: they allow reliable and efficient solutions to the problem.
Taking a close look at the structure of conservation theories, I showed that
efficiency can require particle theorists to introduce virtual particles, a
new and purely instrumental perspective on the role of hidden particles in
particle theories. A by-product of these insights into the structure of
conservation theories is a surprising connection between the number of stable
particles and the number of conserved quantities: Essentially, under
conservation of energy there cannot be more conservation principles than there
are stable particles.
There is more to do in the methodological analysis of particle physics. But
even as it stands, the analysis can serve as a blueprint for other domains:
Begin by stating the problem: what questions are under investigation, and what
evidence is available to answer them? Then determine under what assumptions
the problem has a reliable solution, under what assumptions it has efficient
solutions, and what exactly the efficient solutions are like. The results in
this thesis assemble a variety of tools for carrying out this kind of
analysis. Two complex problems in which they might find fruitful application
are inferring theories of gravitation and curve fitting.
There are a number of important ways to add to the reliabilist toolkit. We
want to cover other aspects of scientific method by examining efficiency in
different scenarios for empirical inquiry, for example experimentation.
Another open question is what efficient inquiry with bounded rationality is
like: for example, what means-ends recommendations can we give to computable
agents? The theory of reliable and efficient empirical inquiry promises to be
a rich conceptual structure with rewarding applications.
Bibliography
[Alchourrón et al. 1985] Alchourrón, C.E., Gärdenfors, P. and Makinson, D. (1985). "On the logic of theory change: partial meet contraction and revision functions." Journal of Symbolic Logic 50: 510-530.

[Angluin and Smith 1983] Angluin, D. and Smith, C. (1983). "A survey of inductive inference: Theory and methods." Computing Surveys 15: 237-289.

[Bicchieri and Schulte 1997] Bicchieri, C. and Schulte, O. (1997). "Common Reasoning about Admissibility." Erkenntnis 45: 299-325.

[Blum and Blum 1975] Blum, M. and Blum, L. (1975). "Toward a Mathematical Theory of Inductive Inference." Information and Control 28: 125-155.

[Bub 1994] Bub, J. (1994). "Testing Models of Cognition Through the Analysis of Brain-Damaged Performance." British Journal for the Philosophy of Science 45: 837-855.

[Carnap 1962] Carnap, R. (1962). "The Aim of Inductive Logic," in Logic, Methodology and Philosophy of Science, ed. E. Nagel, P. Suppes and A. Tarski. Stanford: Stanford University Press.

[Coffa 1991] Coffa, J. (1991). The Semantic Tradition from Kant to Carnap. Cambridge: Cambridge University Press.

[Comte 1968] Comte, A. (1968). A System of Positive Philosophy. New York: B. Franklin.

[Cooper 1992] Cooper, L.N. (1992). Physics: Structure and Meaning. Hanover, NH: University Press of New England.

[DeFinetti 1990] DeFinetti, B. (1990). Theory of Probability, 2 vols. New York: Wiley.

[Donovan et al. 1988] Donovan, A., Laudan, L. and Laudan, R. (eds.) (1992). Scrutinizing Science. Baltimore: The Johns Hopkins University Press.

[Earman 1992] Earman, J. (1992). Bayes or Bust? Cambridge, Mass.: MIT Press.

[Feynman 1965] Feynman, R. (1965; 19th ed. 1990). The Character of Physical Law. Cambridge, Mass.: MIT Press.

[Ford 1963] Ford, K.W. (1963). The World of Elementary Particles. New York: Blaisdell.

[Franklin 1990] Franklin, A. (1990). Experiment, Right or Wrong. Cambridge: Cambridge University Press.

[Fudenberg and Tirole 1993] Fudenberg, D. and Tirole, J. (1993). Game Theory. Cambridge, Mass.: MIT Press.

[Gärdenfors 1988] Gärdenfors, P. (1988). Knowledge in Flux: Modeling the Dynamics of Epistemic States. Cambridge: MIT Press.

[Gettier 1963] Gettier, E. (1963). "Is Justified True Belief Knowledge?" Analysis 23.6: 121-123.

[Glymour 1980] Glymour, C. (1980). Theory and Evidence. Princeton: Princeton University Press.

[Glymour 1991] Glymour, C. (1991). "The Hierarchies of Knowledge and the Mathematics of Discovery." Minds and Machines 1: 75-95.
[Glymour 1994] Glymour, C. (1994). "On the Methods of Cognitive Neuropsychology." British Journal for the Philosophy of Science 45: 815-835.

[Glymour and Kelly 1990] Glymour, C. and Kelly, K. (1990). "Why You'll Never Know if Roger Penrose is a Computer." Behavioral and Brain Sciences 13.

[Glymour and Kelly 1992] Glymour, C. and Kelly, K. (1992). "Thoroughly Modern Meno," in Inference, Explanation and Other Frustrations, ed. John Earman. University of California Press.

[Gold 1967] Gold, E. (1967). "Language Identification in the Limit." Information and Control 10: 447-474.

[Goodman 1983] Goodman, N. (1983). Fact, Fiction and Forecast. Cambridge, MA: Harvard University Press.

[Hacking 1968] Hacking, I. (1968). "One problem about induction," in The Problem of Inductive Logic, ed. Imre Lakatos, pp. 44-59. Amsterdam: North-Holland Publishing Co.

[Harsanyi 1975] Harsanyi, J.C. (1975). "Can the Maximin Principle Serve as a Basis for Morality? A Critique of John Rawls' Theory." American Political Science Review 69.

[Hellman 1997] Hellman, G. (1997). "Bayes and Beyond." Philosophy of Science, forthcoming.

[Hempel 1965] Hempel, C. (1965). Aspects of Scientific Explanation. New York: Macmillan.

[Hesse 1970] Hesse, M. (1974). The Structure of Scientific Inference. Berkeley: University of California Press.
[Hume 1984] Hume, D. (1984). An Inquiry Concerning Human Understanding, ed. C. Hendel. New York: Collier.

[James 1982] James, W. (1982). "The Will To Believe," in Pragmatism, ed. H.S. Thayer. Indianapolis: Hackett.

[Juhl 1995a] Juhl, C. (1995). "Is Gold-Putnam Diagonalization Complete?" Journal of Philosophical Logic 24: 117-138.

[Juhl 1995b] Juhl, C. (1995). "Objectively Reliable Subjective Probabilities." Synthese, forthcoming.

[Juhl 1994] Juhl, C. (1994). "The Speed-Optimality of Reichenbach's Straight Rule of Induction." British Journal for the Philosophy of Science 45: 857-863.

[Juhl 1993] Juhl, C. (1993). "Bayesianism and Reliable Scientific Inquiry." Philosophy of Science 60, 2: 302-319.

[Juhl and Kelly 1994] Juhl, C. and Kelly, K. (1994). "Realism, Convergence, and Additivity," in Proceedings of the 1994 Biennial Meeting of the Philosophy of Science Association, ed. D. Hull, M. Forbes and R. Burian. East Lansing, Mich.: Philosophy of Science Association.

[Kant 1785] Kant, I. (1785). Groundwork of the Metaphysic of Morals, trans. H.J. Paton. London: Hutchinson's University Library.

[Kelly 1995] Kelly, K. (1995). The Logic of Reliable Inquiry. Oxford: Oxford University Press.

[Kelly and Glymour 1992] Kelly, K. and Glymour, C. (1992). "Inductive Inference from Theory Laden Data." Journal of Philosophical Logic 21: 391-444.
[Kelly et al. 1994] Kelly, K., Juhl, C. and Glymour, C. (1994). "Reliability, Realism, and Relativism," in Reading Putnam, ed. P. Clark. London: Blackwell.

[Kelly and Schulte 1995a] Kelly, K. and Schulte, O. (1995). "Church's Thesis and Hume's Problem." Proceedings of the IX International Joint Congress for Logic, Methodology and the Philosophy of Science. Florence, 1995.

[Kelly and Schulte 1995b] Kelly, K. and Schulte, O. (1995). "The Computable Testability of Theories Making Uncomputable Predictions." Erkenntnis 43: 29-66.

[Kelly et al. 1995] Kelly, K., Schulte, O. and Hendricks, V. (1995). "Reliable Belief Revision." Proceedings of the IX International Joint Congress for Logic, Methodology and the Philosophy of Science. Florence, 1995.

[Kelly et al. 1997] Kelly, K., Schulte, O. and Juhl, C. (forthcoming). "Learning Theory and the Philosophy of Science." Philosophy of Science.

[Kitcher 1993] Kitcher, P. (1993). The Advancement of Science. Oxford: Oxford University Press.

[Kohlberg and Mertens 1986] Kohlberg, E. and Mertens, J.F. (1986). "On the Strategic Stability of Equilibria." Econometrica 54: 1003-1037.

[Kohlberg 1990] Kohlberg, E. (1990). "Refinement of Nash Equilibrium: The Main Ideas," in Game Theory and Applications, ed. T. Ichiishi, A. Neyman and Y. Tauman. San Diego: Academic Press.

[Kreps and Ramey 1987] Kreps, D.M. and Ramey, G. (1987). "Structural Consistency, Consistency, and Sequential Rationality." Econometrica 55: 1331-1348.
[Kuhn 1970] Kuhn, T. (1970). The Structure of Scientific Revolutions. Chicago: University of Chicago Press.

[Kuhn 1957] Kuhn, T. (1957). The Copernican Revolution. Cambridge, MA: Harvard University Press.

[Kuhn 1953] Kuhn, H. (1953). "Extensive games and the problem of information," in Contributions to the Theory of Games, eds. Kuhn, H. and Tucker, A. Annals of Mathematics Studies 28. Princeton: Princeton University Press.

[Lakatos 1970] Lakatos, I. (1970). "Falsification and the Methodology of Scientific Research Programmes," in Criticism and the Growth of Knowledge, eds. Lakatos, I. and Musgrave, A. Cambridge: Cambridge University Press.

[Levi 1967] Levi, I. (1967). Gambling With Truth. New York: Alfred Knopf.

[Levi 1980] Levi, I. (1980). The Enterprise of Knowledge. Cambridge: MIT Press.

[Levi 1983] Levi, I. (1983). "Truth, Fallibility and the Growth of Knowledge," in Language, Logic and Method, eds. Cohen, R. and Wartofsky, M. Dordrecht: D. Reidel Publishing Company.

[Levi 1988] Levi, I. (1988). "Iteration of conditionals and the Ramsey test." Synthese 76: 49-81.

[Lewis forthcoming] Lewis, D. (forthcoming). "Elusive Knowledge." Australasian Journal of Philosophy.

[Maher 1996] Maher, P. (1996). "Subjective and Objective Confirmation." Philosophy of Science 63, 2: 149-174.

[Martin and Osherson 1995] Martin, E. and Osherson, D. (1995). "Scientific discovery based on belief revision." Proceedings of the X International Joint Congress for Logic, Methodology and the Philosophy of Science. Florence, 1995.

[Miller 1974] Miller, D. (1974). "On the Comparison of False Theories by Their Bases." British Journal for the Philosophy of Science 25: 166-177.
[Myerson 1991] Myerson, R.B. (1991). Game Theory. Cambridge, Mass.: Harvard University Press.

[Newell and Simon 1976] Newell, A. and Simon, H. (1976). "Computer Science as Empirical Inquiry: Symbols and Search." Communications of the ACM 19: 113-126.

[Newton 1995] Newton, I. (1995). The Principia. Translated by Andrew Motte. Amherst, NY: Prometheus Books.

[Nayak 1994] Nayak, A. (1994). "Iterated Belief Change Based on Epistemic Entrenchment." Erkenntnis 41: 353-390.

[Nozick 1981] Nozick, R. (1981). Philosophical Explanations. Cambridge: Harvard University Press.

[Omnes 1971] Omnes, R. (1971). Introduction to Particle Physics. London and New York: Wiley Interscience.

[Osborne and Rubinstein 1994] Osborne, M. and Rubinstein, A. (1994). A Course in Game Theory. Cambridge, Mass.: MIT Press.

[Osherson and Weinstein 1988] Osherson, D. and Weinstein, S. (1988). "Mechanical Learners Pay a Price for Bayesianism." Journal of Symbolic Logic 53: 1245-1252.

[Osherson et al. 1991] Osherson, D., Stob, M. and Weinstein, S. (1991). "A Universal Inductive Inference Machine." Journal of Symbolic Logic 56: 661-672.
206
[Osherson et al. 1986] Osherson, D., Stob, M. and Weinstein, S. (1986). Systems That Learn. Cambridge, Mass.: MIT Press.
[Peirce 1958] Peirce, C. S. (1958). Collected Papers of Charles Sanders Peirce, eds. C. Hartshorne, P. Weiss and A. Burks. Cambridge, Mass.: Belknap Press.
[Plato 1967] Plato (1967). Plato, Vol. II. Translated by W. Lamb. Cambridge, Mass.: Harvard University Press.
[Poincare 1952] Poincaré, H. (1952). Science and Hypothesis. New York: Dover.
[Popper 1968] Popper, K. (1968). The Logic of Scientific Discovery. New York: Harper.
[Popper 1972] Popper, K. (1972). Objective Knowledge. Oxford: Clarendon Press.
[Putnam 1963] Putnam, H. (1963). "'Degree of Confirmation' and Inductive Logic," in The Philosophy of Rudolf Carnap, ed. A. Schilpp. La Salle, Ill.: Open Court.
[Putnam 1965] Putnam, H. (1965). "Trial and Error Predicates and a Solution to a Problem of Mostowski," Journal of Symbolic Logic 30: 49-57.
[Putnam 1975] Putnam, H. (1975). "Probability and Confirmation," in Mathematics, Matter and Method. Cambridge: Cambridge University Press.
[Rawls 1971] Rawls, J. (1971). A Theory of Justice. Cambridge, Mass.: Harvard University Press.
[Rawls 1996] Rawls, J. (1996). Political Liberalism.
New York, NY: Columbia University
Press.
[Reichenbach 1949] Reichenbach, H. (1949). The Theory of Probability. London: Cambridge University Press.
[Royden 1988] Royden, H. L. (1988). Real Analysis. 3rd edition. New York: Macmillan Publishing Company.
[Salmon 1963] Salmon, W. (1963). "On Vindicating Induction," Philosophy of Science 24: 252-261.
[Salmon 1967] Salmon, W. (1967). The Foundations of Scientific Inference. Pittsburgh: University of Pittsburgh Press.
[Salmon 1991] Salmon, W. (1991). "Hans Reichenbach's Vindication of Induction," Erkenntnis 35: 99-122.
[Savage 1954] Savage, L. (1954). The Foundations of Statistics. New York: Dover.
[Schulte and Juhl 1996] Schulte, O. and Juhl, C. (1996). "Topology as Epistemology". The Monist 79: 141-148.
[Searle 1980] Searle, J. (1980). "Minds, Brains and Programs". The Behavioral and Brain Sciences 3: 417-424.
[Seidenfeld 1988] Seidenfeld, T. (1988). "Decision Theory without 'Independence' or without 'Ordering'," Economics and Philosophy 4: 267-290.
[Selten 1965] Selten, R. (1965). "Spieltheoretische Behandlung eines Oligopolmodells mit Nachfrageträgheit". Zeitschrift für die gesamte Staatswissenschaft 12: 301-324.
[Sextus Empiricus 1985] Sextus Empiricus (1985). Selections from the Major Writings on Skepticism, Man and God, ed. P. Hallie, trans. S. Etheridge. Indianapolis: Hackett.
[Shapere 1984] Shapere, D. (1984). Reason and the Growth of Knowledge. Dordrecht: Reidel.
[Valdes and Erdmann 1994] Valdes and Erdmann (1994). "Systematic Induction and Parsimony of Phenomenological Conservation Laws", Computer Physics Communications 83: 171-180.
[Valiant 1984] Valiant, L. G. (1984). "A Theory of the Learnable", Communications of the ACM 27(11): 1134-1142.
[Van Fraassen 1980] Van Fraassen, B. (1980). The Scientific Image. Oxford: Clarendon Press.
[Von Mises 1981] Von Mises, R. (1981). Probability, Statistics, and Truth. New York: Dover.
[Von Neumann and Morgenstern 1947] Von Neumann, J. and Morgenstern, O. (1947). Theory of Games and Economic Behavior. 2nd edition. Princeton: Princeton University Press.