ARITHMETICAL KNOWLEDGE AND ARITHMETICAL DEFINABILITY:
FOUR STUDIES
A Dissertation
Submitted to the Graduate School
of the University of Notre Dame
in Partial Fulfillment of the Requirements
for the Degree of
Doctor of Philosophy
by
Sean Walsh,
Michael Detlefsen, Co-Director
Peter Cholak, Co-Director
Graduate Programs in Philosophy and Mathematics
Notre Dame, Indiana
December 2010
© Copyright by
Sean Walsh
2010
All Rights Reserved
ARITHMETICAL KNOWLEDGE AND ARITHMETICAL DEFINABILITY:
FOUR STUDIES
Abstract
by
Sean Walsh
The subject of this dissertation is arithmetical knowledge and arithmetical
definability. The first two chapters contain respectively a critique of a logicist
account of a preferred means by which we may legitimately infer to arithmetical
truths and a tentative defense of an empiricist account. According to the logicist
account, one may infer from quasi-logical truths to patently arithmetical truths
because the arithmetical truths are representable in the logical truths. It is argued
in the first chapter that this account is subject to various problems: for instance,
the most straightforward versions seem vulnerable to various counterexamples.
The basic idea of the alternative empiricist account considered in chapter two is
that complicated arithmetical truths like mathematical induction may be inferred
by way of confirmation from less complicated quantifier-free arithmetical truths.
The notion of confirmation here is understood probabilistically, and responses
are given in this chapter to several seeming problems with this importation of
probability into arithmetic.
The final two chapters are concerned with arithmetical definability in two
different settings. In the third chapter, the interpretability strength of the arith-
metical and hyperarithmetical subsystems of second-order Peano arithmetic is
compared to the interpretability strength of analogous systems centered around
two principles called Hume’s Principle and Basic Law V, which respectively axiom-
atize a standard notion of cardinality and an alternative conception of set. One of
the major results of this chapter is that the hyperarithmetic subsystem of Hume’s
Principle does not interpret the hyperarithmetic subsystem of second-order Peano
arithmetic. The fourth chapter is concerned with arithmetical definability in the
setting of descriptive set theory, where the relevant benchmark is between notions
which may be defined without quantification over elements of certain topological
spaces (Borel notions) and notions whose definitions do require such quantifica-
tion (analytic, coanalytic, projective notions). In this fourth chapter the Denjoy
integral is studied from the vantage point of descriptive set theory, and it is shown
that the graph of the indefinite integral is not Borel but rather is properly
coanalytic. This contrasts with the Lebesgue integral, which is Borel under this
measure of complexity.
To my father, for always encouraging me to see where the circles cross.
CONTENTS
FIGURES
ACKNOWLEDGMENTS
PREFACE
CHAPTER 1: LOGICISM, INTERPRETABILITY, AND KNOWLEDGE OF ARITHMETIC
  1.1 Introduction: The Logicist Template
  1.2 Background: The Interpretability of Theories and Structures
  1.3 Theory-Based Versions: the Plethora and Consistency Problems
  1.4 Structure-Based Version: the Isomorphism and Signature Problems
  1.5 Conclusions and Directions for Further Research
  1.6 Notes
CHAPTER 2: EMPIRICISM, PROBABILITY, AND KNOWLEDGE OF ARITHMETIC
  2.1 Introduction: Inceptive and Amplificatory Empiricism
  2.2 Challenges to Access to Probability Assignments
    2.2.1 Countable Additivity: Aligning the True and Probable
    2.2.2 The Non-Computability of Probability Assignments
  2.3 Challenges to Arithmetical Instance Confirmation
    2.3.1 Baker and the Exigencies of Arithmetical Sampling
    2.3.2 Stable and Unstable Reasoning in Geometry and Arithmetic
  2.4 Challenges from Alternative Inferences
  2.5 Conclusions and Directions for Future Research
  2.6 Notes
CHAPTER 3: COMPARING PEANO ARITHMETIC, BASIC LAW V, AND HUME’S PRINCIPLE
  3.1 Introduction, Definitions, and Overview of Main Results
    3.1.1 Introduction
    3.1.2 Definition of Signatures and Theories of PA2, BL2 and HP2
    3.1.3 Definition of Subsystems of PA2, BL2 and HP2
    3.1.4 Summary of Results about the Provability Relation
    3.1.5 Summary of Results about the Interpretability Relation
  3.2 Standard Models of HP2 and Associated Results
    3.2.1 Models of HP2 from Infinite Cardinals
    3.2.2 The Mutual Interpretability of PA2 and HP2
  3.3 Standard Models of Subsystems of BL2 and Associated Results
    3.3.1 Generalities on Models of Subsystems of BL2
    3.3.2 Hyperarithmetic Theory and Related Results
    3.3.3 Standard Models of the Hyperarithmetic Subsystems of BL2
  3.4 Barwise-Schlipf Models of Subsystems of BL2 and HP2
    3.4.1 Generalized Barwise-Schlipf/Ferreira-Wehmeier Theorem
    3.4.2 Application to Algebraically Closed Fields
    3.4.3 Application to O-Minimal Expansions of Real-Closed Fields
    3.4.4 Application to Separably Closed Fields
  3.5 Further Questions
CHAPTER 4: DENJOY INTEGRATION: DESCRIPTIVE SET THEORY AND MODEL THEORY
  4.1 Introduction
  4.2 Background
    4.2.1 Absolutely Continuous Functions and Generalizations
    4.2.2 Basic Properties of the Denjoy Integral
    4.2.3 Lebesgue’s Lemma and the Subspaces
  4.3 Descriptive Set Theory
    4.3.1 Three Derivatives and Functions of Arbitrarily High Rank
    4.3.2 Totalization: Calibrating Rank and Entry into Subspaces
    4.3.3 Definability: The Derivatives are Borel
  4.4 Model Theory
    4.4.1 Indexes of Subgroups and Non-Definability of the Integral
    4.4.2 Elementary Equivalence and Decidability
  4.5 Further Questions
BIBLIOGRAPHY
FIGURES
1.1 Summary of Problems for Versions of the Logicist Template
2.1 Alternative Confirming Inferences: Two Pairs of Contrasting Inferences
3.1 Provability Relation in Subsystems of BL2, PA2, and HP2
3.2 Interpretability Relation in Subsystems of BL2, PA2, and HP2
4.1 Containment Diagram for Subsets of M[a, b] and C[a, b]
ACKNOWLEDGMENTS
I would first and foremost like to thank my wife Kari for her love and patience:
I wouldn’t have been able to do any of this without her. I would also like to
thank my parents, Kevin and Linda, and my in-laws, Ron and Annette, for their
persistent support and encouragement.
My advisors Michael Detlefsen and Peter Cholak have obviously shaped this
dissertation and my overall thinking in ways that I cannot begin to describe,
and I would like to thank them for their patience and help on my dissertation,
as well as for their continued support for my doing interdisciplinary work. I
would also like to thank my teachers at Notre Dame, and in particular Timothy
Bays, Patricia Blanchette, Curtis Franks, Julia Knight, and Sergei Starchenko,
for introducing me to the philosophy of mathematics and mathematical logic.
Likewise, my education at Notre Dame has been enriched by the many visitors
to, as well as former members of, the logic group at Notre Dame, including
Peter Gerdes, Karen Lange, David Lippel, Colin McLarty, Serge Randriambololona,
Reed Solomon, Vítězslav Švejdar, and William Tait. Finally, I would like to thank
two of my teachers from Gonzaga University, Wayne Pomerleau and John Burke,
who first introduced me to so much of what I have come to love about philosophy
and mathematics.
During graduate school I have been the beneficiary of generous financial sup-
port from many institutions and groups, including the Philosophy Department at
Notre Dame, the Mathematics Department at Notre Dame, the Ahtna Heritage
Foundation, the Deutscher Akademischer Austausch Dienst, the Georg-August-Universität
Göttingen, the National Science Foundation (under NSF Grants 02-45167,
EMSW21-RTG-03-53748, EMSW21-RTG-0739007, and DMS-0800198), the
Alexander von Humboldt Stiftung TransCoop Program, and the Ideals of Proof
Project, which in turn was funded and supported by Agence Nationale de la
Recherche, Université Paris Diderot – Paris 7, Université Nancy 2, Collège de
France, and Notre Dame. There are of course many people behind these various
institutions and groups, and I would like to especially thank Karine Chemla,
Brice Halimi, Gerhard Heinzmann, Felix Mühlhölzer, Marco Panza, David Rabouin,
Ivahn Smadja, Jean-Jacques Szczeciniarz, and Christian Tapp.
I have been working on the material collected here for several years, and have
benefited both from many opportunities to present components of this material at
various conferences and seminars and from opportunities to discuss this material
with my teachers and friends. I would like to record some of these debts here.
A very early version of some of the underlying thoughts from Chapter 1 was
presented at the Sixth Annual Midwest Philosophy of Mathematics Workshop
held at Notre Dame on October 8, 2005 under the title “Justifications of Hume’s
Principle and Mathematical Induction.” In that talk, I was focused on under-
standing the history of various attempts to justify Hume’s Principle and math-
ematical induction, but in giving the talk I was forced to try to articulate the
epistemic relationship between these two principles, and it was that first attempt
that prompted the reflections currently found in Chapter 1. A very succinct ver-
sion of the material from Chapter 1 was presented in the first half of my talk
at Dr. Detlefsen’s Ideals of Proof Fellows’ Workshop held at the École normale
supérieure on September 8, 2009 under the title “The Role of Interpretability
Results in the Justification of Axioms,” and a more definitive version of this material
was presented at FregeFest 2010 held at the Department of Logic and Philosophy
of Science at the University of California, Irvine on February 26, 2010 under the
title “Logicism, Interpretability, and Knowledge of Arithmetic.” Needless to say,
the material in Chapter 1 has been bettered by my having the opportunity to
present and discuss this material at these meetings, and in particular I would like
to thank Roy Cook, William Demopoulos, and Kai Wehmeier for a very helpful
discussion of this material subsequent to my talk in Irvine. I am also indebted
to several of my friends who have provided generous comments on this chapter,
including Andrew Arana, Sharon Berry, Sébastien Gandon, Christopher Porter,
and Iulian Toader. Finally, my understanding of Frege, logicism, and the partic-
ular topics which I treat in this chapter has been sharpened and deepened over
the years by my attending seminars by and discussing these matters with Patri-
cia Blanchette, whom I would particularly like to thank in this regard.
The Sixth Annual Midwest Philosophy of Mathematics Workshop was something
of a watershed event for me, both for the reasons mentioned above, and
because I heard a talk by Neil Tennant entitled “Natural Logicism,” which fo-
cused on the manner in which addition and multiplication were recoverable from
Hume’s Principle. It was in listening to Tennant’s talk that it first dawned on
me that there was something akin to a reverse mathematics project in the setting
of Hume’s Principle and Basic Law V. So subsequent to this talk, in the winter
months of 2005-2006, I started mapping out the provability and interpretability
relations among the predicative systems of Hume’s Principle and Basic Law V,
and here I would like to extend a special thanks to Christopher Porter, with whom
I initially read and discussed some of the papers and books on this topic. Finally,
the material in Chapter 3 has been improved by my having the opportunity
to present it to several gracious audiences. In particular, I would like to thank
Logan Axon, Joshua Cole, Stephen Flood, and Christopher Porter for listening
to me speak on this material in Dr. Cholak’s seminar, and I would like to thank
Antonio Montalbán and the other organizers and participants in the University of
Chicago Logic Seminar, where I presented much of this material on March 2, 2009
under the title “Comparing Peano Arithmetic, Hume’s Principle, and Basic Law
V.” I would also like to thank Øystein Linnebo, Richard Pettigrew, and Albert
Visser, with whom I had some very helpful discussions of this material subsequent
to my arrival in Paris in summer 2009.
In regard to Chapter 4, I would like to thank Sławomir Solecki for several helpful
discussions of this material. It was in conversations with him that it became
clear that the most natural way to treat the Denjoy integral was not in terms of
the measurable functions which are Denjoy integrable or the continuous functions
which are their indefinite integrals, but rather to focus on both notions simulta-
neously, as this was what was most naturally approximated by Borel notions. I
would also like to thank Dr. Solecki, and the other participants in the Urbana
Logic Seminar, for listening to me speak on this material on December 9, 2008 un-
der the title “Henstock-Kurzweil Integration: Descriptive Set Theory and Model
Theory.” Similarly, I would like to thank Steffen Lempp and the other partici-
pants in the Southern Wisconsin Logic Seminar, where I spoke on this material
on February 24, 2009, and likewise I would like to thank Alain Louveau and the
other participants in the Descriptive Set Theory working group at the Institut de
Mathématiques de Jussieu, where I spoke on this material on December 8, 2009.
The material in Chapter 2 grew out of my attempt to understand what
was thought about the epistemology of the Peano axioms prior to Dedekind and
Frege’s seminal work published in the 1880s. Given some prior familiarity with
Kant, I was confident that Kant had never discussed mathematical induction, but
it was initially a great mystery for me to try to understand what exactly tran-
spired in the philosophy of arithmetic in the intervening century. I would thus
like to thank Paul Franks, Karl Ameriks, and Anja Jauernig, who first introduced
me to the philosophy of the post-Kantian period. It was when I was following up
the footnotes to Beiser’s The Fate of Reason that I first stumbled upon Kästner,
through whom I later encountered Fries (cf. Chapter 2, endnote 51). I would like to
thank David Rabouin and Sébastien Maronne for allowing me to present some of
this material on Kästner, Fries, and related figures at their Séminaire de travail
“Mathématiques à l’âge classique” on December 2, 2009. Even though Kästner
and Fries don’t figure prominently in this final version of the dissertation, it was
trying to understand their idea that mathematical induction was epistemically
akin to enumerative induction which prompted me to write Chapter 2. An earlier
attempt at articulating the connection between mathematical induction and enu-
merative induction was presented at the First Paris-Nancy PhilMath Workshop
on October 21, 2009 under the title “The Justification of Mathematical Induc-
tion: The View from the 18th Century,” and I would like to thank Walter Dean,
Michael Potter, and Stewart Shapiro for a very helpful discussion subsequent to
that talk. Finally, I would like to thank my friends Andrew Arana and Iulian
Toader for several helpful discussions and comments on the material in this chap-
ter.
PREFACE
This dissertation is a study of arithmetical knowledge, arithmetical definabil-
ity, and the connections between them. Chapter 1 critically examines several
versions of a logicist account of arithmetical knowledge. The basic idea which
unifies these different accounts is that one may legitimately infer to knowledge
of arithmetical truths, like the Peano axioms, from knowledge of quasi-logical
principles like Hume’s Principle and the knowledge that the arithmetical truths
are representable in the logical truths. The focus of this first chapter is thus on
identifying some version of representation which would sustain this inference, and
my conclusion is that extant proposals are not successful in this regard. While
the notion of representation coming from the mathematical notion of interpreta-
tion seems like an important technical notion (and one which is further studied
in Chapter 3), this and related technical notions seem inadequate to the task of
providing a means by which to pass from knowledge of quasi-logical truths like
Hume’s Principle to knowledge of arithmetical truths like the Peano axioms.
In Chapter 2, an alternative thesis is examined according to which knowledge
of arithmetical truths like the Peano axioms may be legitimately inferred from very
primitive arithmetical knowledge like 7+5=12 and the knowledge that these prim-
itive truths confirm these axioms. Here the notion of confirmation is a patently
probabilistic one, and thus the bulk of this chapter is focused on various challenges
which emerge when one tries to apply ordinary probabilistic notions to the setting
of arithmetic. For instance, a probabilistic rule which initially seems quite natural
in the setting of arithmetic, namely a probabilistic version of the ω-rule, has the
consequence that arithmetical truth aligns with high probability. Likewise, unless
one assigns probability zero to some very basic arithmetical truths, all probability
assignments will be highly non-computable when represented in various standard
ways. Problems such as these obviously cast some initial doubt on the claim that
arithmetical knowledge can be based on knowledge of probabilities associated to
primitive arithmetical truths. The goal of this chapter is to offer responses to these
and other problems, and thus to at least secure the tenability of this empiricist
picture of arithmetical knowledge.
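To fix ideas, one standard way of writing a probabilistic version of the ω-rule is sketched below. The formulation, the symbol P for a probability assignment on arithmetical sentences, and the numeral notation are assumptions made here for illustration; the version actually studied in Chapter 2 may differ in detail.

```latex
% A sketch of a probabilistic omega-rule (often called Gaifman's condition).
% Assumptions: P is a probability assignment on sentences of arithmetic,
% and \overline{k} is the numeral for the natural number k.
\[
  P\bigl(\forall x\, \varphi(x)\bigr)
  \;=\;
  \lim_{n \to \infty} P\Bigl(\, \bigwedge_{k \le n} \varphi(\overline{k}) \Bigr)
\]
```

On this formulation, the probability of a universal generalization is determined by the probabilities of the finite conjunctions of its numerical instances.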
In Chapter 3, the mathematical portion of the dissertation begins, wherein
the overriding theme is arithmetical definability, where this roughly means defin-
ability without recourse to quantification over higher-order objects. In particular,
Chapter 3 is concerned with arithmetical definability in the setting of second-order
Peano arithmetic, Basic Law V and Hume’s Principle. Here the primary focus is
on the interpretability strength of these systems when conjoined with $\Delta^1_1$-comprehension, which is
standardly regarded as being on the outskirts of arithmetical definability. One of
the main results here is that there is a consistent extension of Basic Law V plus
$\Delta^1_1$-comprehension which interprets $\Delta^1_1$-$\mathsf{CA}_0$ (cf. Corollary 61). In contrast to
other known methods of constructing models of Basic Law V, this was done by
showing that the minimal ω-model of $\Delta^1_1$-$\mathsf{CA}_0$ is mutually interpretable with a
model of Basic Law V plus $\Delta^1_1$-comprehension (cf. Theorem 60). Likewise, in this
chapter it is shown that Hume’s Principle plus $\Delta^1_1$-comprehension is interpretable
in $\mathsf{ACA}_0$ but does not interpret $\mathsf{ACA}_0$ (cf. Corollary 99). This was done by building
a model of Hume’s Principle on a certain real closed field and by noting that the
proof could be formalized in $\mathsf{ACA}_0$. Similar methods allow one to answer a question
of Linnebo (cf. Proposition 83 and Remark 81).
In Chapter 4, the focus of the dissertation shifts to arithmetical definability
in the setting of descriptive set theory, where the relevant demarcation line is
between Borel notions and analytic/co-analytic notions, the latter of which require
quantification over Polish spaces. Here the main question is whether a certain
integral which extends the Lebesgue integral, called the Denjoy integral, is Borel.
In this chapter it is shown that the relation “f is Denjoy integrable and F is
equal to its indefinite integral” is a co-analytic but not Borel relation on the
product space M[a, b] × C[a, b], where M[a, b] is the Polish space of real-valued
measurable functions on [a, b] and where C[a, b] is the Polish space of real-valued
continuous functions on [a, b] (cf. Corollary 195 and Figure 4.1). Using the same
methods, it is also shown that the class of indefinite Denjoy integrals is a co-analytic
but not Borel subset of the space C[a, b], thus answering a question posed by
Dougherty and Kechris (cf. Corollary 197). In this chapter, some basic model
theory of the associated spaces of integrable functions is also studied. Here the main
result is that, when viewed as an R[X]-module with the indeterminate X being
interpreted as the indefinite integral, the space of continuous functions on the
interval [a, b] is elementarily equivalent to the Lebesgue-integrable and Denjoy-
integrable functions on this interval.
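The relation at issue in the first of these results can be displayed as a subset of the product space. The following is only a sketch: the name D for the relation and the boldface pointclass notation are assumptions introduced here for illustration, and the integral sign is to be read as the Denjoy integral.

```latex
% Sketch of the relation "f is Denjoy integrable and F is its indefinite integral".
% Assumption: \mathcal{D} is a hypothetical name; the integral is the Denjoy integral.
\[
  \mathcal{D} \;=\; \Bigl\{ (f, F) \in M[a,b] \times C[a,b] \;:\;
    f \text{ is Denjoy integrable and } F(x) = \int_a^x f
    \text{ for all } x \in [a,b] \Bigr\}
\]
% The result described above: co-analytic (boldface Pi^1_1) but not Borel.
\[
  \mathcal{D} \in \boldsymbol{\Pi}^1_1
  \qquad \text{and} \qquad
  \mathcal{D} \text{ is not Borel.}
\]
```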
Outside of the obvious topical similarities between Chapters 1-2 and Chap-
ters 3-4, I want to mention one final connection between the chapters of this
dissertation, a connection that I hope to explore further in subsequent work. Part
of the idea of the type of logicism examined in Chapter 1 is that knowledge of the
Peano axioms is based in part on a knowledge that these axioms are representable
in terms of quasi-logical axioms like Hume’s Principle. If the relevant notion of
representation is taken to imply the technical notion of interpretation, then it is
natural to ask whether there are any interpretability results of this sort in settings
with limited amounts of comprehension. As mentioned above, in Corollary 99 of
Chapter 3, it is shown that the $\Delta^1_1$-comprehension version of Hume’s Principle
does not interpret the $\Delta^1_1$-comprehension version of the Peano axioms. Thus, this
suggests that the type of logicism considered in Chapter 1 has to additionally
defend the epistemic status of impredicative comprehension, i.e., comprehension
in which one quantifies over a higher-order object. In future work I hope to ex-
plore in more detail the philosophical implications of the technical results from
Chapter 3 for the tenability of the types of logicism which I consider in Chapter 1.
CHAPTER 1
LOGICISM, INTERPRETABILITY, AND KNOWLEDGE OF ARITHMETIC
1.1 Introduction: The Logicist Template
My topic in this chapter is the contention made by contemporary logicists
that knowledge of arithmetical principles may be based on knowledge of logical
principles. Here the notion of a “logical” principle is a rather loose one, and is
merely intended to convey the idea that the principle in question is epistemically
akin to modus ponens: it is apriori, it is analytic, etc. The primary example of a
principle which has been claimed to be logical in this sense is Hume’s Principle,
which roughly states that two properties have the same cardinality if and only if
they can be one-one correlated with each other, as the forks and knives on a well-
set dining room table can be correlated one-one with each other.1 Indeed, much of
the recent discussion of logicism has centered around Crispin Wright’s arguments
that Hume’s Principle is a logical principle in this sense.2 However, Wright and
other logicists are ultimately interested in Hume’s Principle because they think
that knowledge of it can account for our knowledge of arithmetic. Indeed, Wright
even says that “[. . . ] nothing can be essentially involved in the epistemology of
number theory that is not involved in an understanding, and knowledge of the
truth of Hume’s Principle” ([159] p. 366, [60] p. 255). My concern in this chapter
is with the question, which has been relatively neglected in the contemporary
literature, of how the logicist is able to contend that knowledge of principles like
Hume’s Principle can rationally sustain knowledge of arithmetical principles.
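For concreteness, Hume’s Principle is usually written as a second-order biconditional. The following is a sketch in one common notation; the symbols # (for the cardinality operator) and ≈ (for one-one correlation) are assumed here for illustration and are not fixed by the informal statement above.

```latex
% Hume's Principle: the number of Fs equals the number of Gs
% if and only if the Fs and the Gs can be correlated one-one.
\[
  \forall F\, \forall G\; \bigl( \#F = \#G \;\leftrightarrow\; F \approx G \bigr)
\]
% F \approx G abbreviates: some relation R correlates the Fs and the Gs one-one.
\[
  F \approx G \;:\equiv\; \exists R\, \Bigl[\,
    \forall x\, \bigl( Fx \rightarrow \exists! y\, ( Gy \wedge Rxy ) \bigr)
    \;\wedge\;
    \forall y\, \bigl( Gy \rightarrow \exists! x\, ( Fx \wedge Rxy ) \bigr)
  \Bigr]
\]
```

In the dining room example, F and G would be the properties of being a fork on the table and being a knife on the table, and R would pair each fork with the knife set beside it.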
There are two respects in which this topic should be of interest to those with-
out prior interests in logicism. First, claims about the apriority or analyticity of
Hume’s Principle are independent of claims that knowledge of arithmetical prin-
ciples can be based on knowledge of Hume’s Principle. It is in these latter claims
that a template for how to acquire arithmetical knowledge is found, and it is
this template, which I call the Logicist Template, which shall be the focus of this
chapter. Indeed, since one can rationally endorse the Logicist Template without
endorsing the apriority or analyticity of Hume’s Principle, this template is poten-
tially of interest to those who are skeptical of or who even deny such claims of
apriority or analyticity. For instance, the Logicist Template would be of interest
to someone who thought that Hume’s Principle was aposteriori or synthetic, since
it would likewise show them how to proceed rationally from such knowledge to
knowledge of arithmetic.3
The second reason that this topic should be of interest to those without prior
interests in logicism is that the arithmetical principles in question, namely the
Peano axioms,4 are essential to contemporary mathematics. However, despite this,
contemporary philosophers of mathematics have had relatively little to say about
the epistemic status of the Peano axioms. For instance, among the Peano axioms
is the Mathematical Induction Principle, which says that if zero has a property
and if n + 1 has this property whenever n does, then all natural numbers have
this property. In his recent book Charles Parsons says of this principle: “Writers
on the foundations of arithmetic have found it difficult to state in a convincing
way why the principle of mathematical induction is evident” ([120] p. 264). Part
of what is important about the Logicist Template is thus that it is one of the few
contemporary accounts which explicitly addresses the question of the evidence for
mathematical induction.5
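The Mathematical Induction Principle just described can be displayed as a single second-order sentence. This is a sketch in standard notation, under the assumption that the variable n ranges over the natural numbers and F over properties of them.

```latex
% Mathematical Induction Principle (second-order form).
% Assumption: n ranges over natural numbers, F over properties of natural numbers.
\[
  \forall F\, \Bigl[ \bigl( F(0) \;\wedge\; \forall n\, ( F(n) \rightarrow F(n+1) ) \bigr)
  \;\rightarrow\; \forall n\, F(n) \Bigr]
\]
```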
So in what follows, by the Logicist Template I shall mean the following schematic
claim: knowledge of arithmetical principles may be based on knowledge of logical
principles and the knowledge that these arithmetical principles can be represented
within the logical principles. This claim is schematic in two different respects.
First, it presupposes some specification of the arithmetical and logical principles
in question. Out of deference to the contemporary literature on logicism, in what
follows I shall assume that the arithmetical principles in question are the Peano
axioms and the logical principle in question is Hume’s Principle. However, none of
the points that I shall make in this chapter depend crucially on this specification.
The second sense in which the Logicist Template is schematic is that it pre-
supposes some antecedently specified notion of what it is for one set of principles
to be represented within another set of principles. In contemporary mathematical
logic, there are a number of notions of representation, which differ from one an-
other both in terms of what and how they represent. Some of these notions are
theory-based, wherein the key idea is that one theory is representable within an-
other if provability within the represented theory is matched by provability within
the representing theory. Others of these notions are structure-based, where the
key idea is that the represented structure be isomorphic to a structure definable
in the representing structure. In § 1.2, I review in more detail these differences
between the theory-based and structure-based notions of representation which are
found in contemporary mathematical logic.
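The contrast between the two kinds of notion can be sketched schematically as follows. The symbols used here (T and S for the represented and representing theories, τ for a translation of formulas, and M and N for the represented and representing structures) are hypothetical names introduced only for illustration, and the one-directional reading of "matched" in the first display is an assumption; the official definitions are those reviewed in § 1.2.

```latex
% Theory-based representation (sketch): provability in the represented
% theory T is matched by provability in the representing theory S,
% relative to an assumed translation \tau of formulas.
\[
  T \vdash \varphi \;\Longrightarrow\; S \vdash \varphi^{\tau}
  \qquad \text{for every sentence } \varphi \text{ in the language of } T
\]
% Structure-based representation (sketch): the represented structure M
% is isomorphic to some structure M* definable in the representing structure N.
\[
  \mathcal{M} \;\cong\; \mathcal{M}^{*}
  \qquad \text{for some structure } \mathcal{M}^{*} \text{ definable in } \mathcal{N}
\]
```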
While the versions of the Logicist Template which I consider in §§ 1.3-1.4 are
centered around these notions from contemporary mathematical logic, it is impor-
tant to emphasize that there is an obvious sense in which adopting such a perspec-
tive is both partially ahistorical and potentially limiting. For instance, while the
rudiments of the theory-based notion of representation seem to be present in tra-
ditional logicists such as Frege and Russell, it is not obvious that the same can be
said of the structure-based notions, simply due to the relatively recent provenance
of the model-theoretic ideas in terms of which the structure-based notions are
defined. Further, it is obvious but bears explicit mentioning that there is no rea-
son to think that the theory-based and structure-based versions of representation
considered here exhaust everything which might legitimately claim right to the
admittedly loose title of “a notion of representation.” However, in order to evalu-
ate the Logicist Template, some precise notion of representation must be offered,
and in this chapter I focus on evaluating versions centered around the theory-based
and structure-based notions of representation from contemporary mathematical
logic.
In particular, in § 1.3, I consider theory-based versions of the Logicist Tem-
plate, in which it is contended that knowledge of arithmetical theory may be based
on knowledge of logical theory because the arithmetical theory may be represented
within the logical theory (in the manner of representation germane to theories).
My thesis here is that the theory-based versions of the Logicist Template cannot
exert an appropriate amount of control over the variety and scope of the proposi-
tions which are represented. In particular, two problems, which I call the plethora
problem and the consistency problem, show respectively that too much would be
counted as knowledge by this view or that inconsistent propositions would each
be counted as knowledge by this view.
In § 1.4, I turn to a version of the Logicist Template centered around a
structure-based notion of representation. In particular, I articulate a specific
structure-based version of the Logicist Template and argue that it has the resources
to overcome the plethora problem and the consistency problem which beset the
theory-based versions. However, I argue that it does so at a certain cost, and
in particular that it faces two problems pertaining to our knowledge of structures,
which I call the isomorphism problem and the signature problem. The isomorphism
problem is that this structure-based version requires knowledge of properties of
structures which are not invariant under isomorphism, a requirement which is
contrary to one way of making precise the thought that structures can only be
specified up to isomorphism. The signature problem is that knowledge of the sig-
nature of the natural numbers requires knowledge of the Peano axioms, so that
it seems that the structure-based version of the Logicist Template requires the
very knowledge which it seeks to deliver. While I will present reasons for thinking
that the isomorphism problem can be overcome, it is my view that the signature
problem poses a deep and presently unanswered challenge to the structure-based
version of the Logicist Template.
Hence, my overall conclusion in this chapter is that both the theory-based and
structure-based versions of the Logicist Template face deep problems, and hence
that hitherto no satisfactory version of the Logicist Template has been presented
which can secure the inference from knowledge of logical principles such as Hume’s
Principle to knowledge of arithmetical principles such as the Peano axioms. This,
of course, is not to say that this inference cannot be secured, but merely to point
out particular challenges to the extant proposals. Put positively, these challenges
give us a better picture of what a notion of representation must look like if it is
to sustain a viable version of the Logicist Template.
Before turning to an overview of the theory-based and structure-based versions
of representation which are found in contemporary mathematical logic, it is worth
underscoring the admittedly limited scope of the types of logicism that I consider
in this chapter. For, there are many important epistemic projects associated with
traditional logicists such as Frege and Russell; for instance, there is Frege’s idea
that arithmetical knowledge is more widely applicable than other types of math-
ematical knowledge.6 However, for the sake of being able to say something both
specific and brief, in this chapter I limit myself to an evaluation of the epistemic
strand of logicism which takes up Frege’s idea that mathematical induction is
“based on general logical laws” and Crispin Wright’s idea that Hume’s Principle
gives us a way to “apprehend the truth” of the Peano axioms.7
1.2 Background: The Interpretability of Theories and Structures
The goal of this section is to provide background on some of the notions of
representation which are to be found in contemporary mathematical logic. In the
tradition of mathematical logic, such representations are called “interpretations,”
and the reader who is already familiar with the notion of interpretability may
wish to proceed directly to § 1.3 and refer back to this section as needed. By way
of orientation, it is important to recall at the outset that part of the power of
mathematical logic resides in the fact that it moves back and forth between two
perspectives, one that is concerned with theories and proofs, and another that
is concerned with structures and definability. Hence, there are notions of inter-
pretability for theories and notions of interpretability for structures, and whereas
the former are centered around proof, the latter are centered around definability.
Structures and theories are both relative to formal languages or signatures, and
these are simply specifications of a class of constant symbols, relation symbols,
and function symbols. Given a signature, a structure then is simply a set along
with distinguished constants, relations, and functions on this set corresponding to
the symbols from the signature. Likewise, given a signature, a theory is simply
a collection of sentences in this signature. Natural examples of structures in this
sense are the real and complex fields, which are given in a signature containing
function symbols for addition and multiplication. Examples of theories in this
sense are the complete theories of the real and complex fields, i.e. the set of
all sentences in this signature which are true on these structures. The
Zermelo-Fraenkel axioms for set theory are another natural example of a theory, and in
this case the signature in question simply consists of the single binary relation
symbol corresponding to the membership relation.8 So, in this section, my goal is
to say what it means for one structure to be interpretable in another and what it
means for one theory to be interpretable in another.
The motivating idea behind the definition of the interpretability of one struc-
ture within another is that it is designed to generalize several classical construc-
tions from 19th Century mathematics. For instance, the field of complex numbers
is interpretable in the field of real numbers since the complex numbers can be
taken to be pairs of real numbers. Likewise, the real projective plane is inter-
pretable in the field of real numbers since the points of the real projective plane
can be taken to be equivalence classes (a,b,c)/E of non-zero triples (a, b, c) of real
numbers under the equivalence relation E of “being on the same line through the
origin”:
(a, b, c) E (x, y, z) ⇐⇒ ∃ λ ≠ 0 [a = λx & b = λy & c = λz] (1.1)
The notion of the interpretability of one structure in another generalizes these two
examples. In particular, if M is a structure, then a set X ⊆ M^n is definable in
M if there is a first-order formula ϕ(x), perhaps containing parameters from M ,
such that:
x ∈ X ⇐⇒ M ⊨ ϕ(x) (1.2)
Building on this, one says that a structure M is interpretable in a structure M∗
if it is isomorphic to a structure whose domain, constants, relations, and func-
tions are definable in M∗, perhaps using an equivalence relation for equality.
Here two structures in the same signature are said to be isomorphic if there is
a structure-preserving one-one map from the one onto the other.9 This is exactly
what happens with both the complex numbers and the real projective plane: the
complex numbers can be taken to be pairs of real numbers and the points of the
real projective plane can be taken to be equivalence classes of non-zero triples of
real numbers.
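The two constructions just mentioned can be made concrete in a short sketch. The Python below is illustrative only and is not part of the formal definition: the function names pair_add, pair_mul, and same_line are hypothetical, exact integers stand in for arbitrary reals, and the cross-product test is assumed as a quantifier-free equivalent of the relation E in (1.1) for non-zero triples.

```python
# Complex numbers as pairs of reals: the operations are defined
# using only addition, subtraction, and multiplication of reals.

def pair_add(p, q):
    """Complex addition on pairs (a, b), read as a + bi."""
    (a, b), (c, d) = p, q
    return (a + c, b + d)

def pair_mul(p, q):
    """Complex multiplication on pairs (a, b), read as a + bi."""
    (a, b), (c, d) = p, q
    return (a * c - b * d, a * d + b * c)

# Points of the real projective plane as non-zero triples modulo E.
# Assumption: for non-zero triples, "some non-zero scalar carries one
# triple to the other" holds exactly when the cross product vanishes.

def same_line(p, q):
    """Relation E of (1.1) on non-zero triples, via the cross product."""
    (a, b, c), (x, y, z) = p, q
    cross = (b * z - c * y, c * x - a * z, a * y - b * x)
    return cross == (0, 0, 0)

# The pair (0, 1) plays the role of i: its square is the pair (-1, 0).
assert pair_mul((0, 1), (0, 1)) == (-1, 0)
# (1, 2, 3) and (2, 4, 6) lie on the same line through the origin.
assert same_line((1, 2, 3), (2, 4, 6))
```

In both cases the new domain and its operations are given by formulas in the signature of the real field, which is what the definition of interpretability requires; in the projective case equality is replaced by the definable equivalence relation E.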
To get a better sense for the types of distinctions that interpretability does
and does not recognize, it is helpful to define the notion of mutual interpretability.
Two structures are said to be mutually interpretable if each interprets the other.
One might initially think that mutually interpretable structures would have to
be very similar to each other, since each in some sense “contains” the other.
However, there are algebraic structures which are mutually interpretable with
geometric structures. For instance, the real numbers are mutually interpretable
with the Euclidean plane. One direction of this result is easy to see: the Euclidean
plane is interpretable in the real numbers since one can take points to be given
by their x- and y-coordinates and since one can take lines to be given by their
slope and y-intercept. The other direction is non-trivial and is called
the “introduction of coordinates”: this result says that if one starts with the
points and lines of the Euclidean plane, then one can define notions of addition
and multiplication and thereby recover the real numbers.10 Hence, very natural
and traditional distinctions like the distinction between algebraic and geometric
structures cannot be recognized from the perspective of the interpretability of
structures.
Whereas the key role in the interpretability of structures is played by the
notion of definability, the key role in the interpretability of theories is played by
the notion of provability. In particular, one says that a theory T is interpretable
in a theory T∗ if the primitives of the interpreted theory T can be translated into
formulas of the interpreting theory T∗ so that the translation ϕ∗ of every theorem
ϕ of T is a theorem of T∗. That is, the key idea is that the translations of theorems
are theorems:
T ⊢ ϕ =⇒ T∗ ⊢ ϕ∗ (1.3)
For instance, the Zermelo-Fraenkel axioms for set theory interpret the Peano ax-
ioms for arithmetic because one can associate the arithmetical primitive “being
a natural number” with the set-theoretic formula “being a finite ordinal,” and
likewise one can associate “x < y” with “x ∈ y,” and “x = 0” with “x = ∅.” One
can then verify that the translations of arithmetical theorems are set-theoretic
theorems. For instance, it is a theorem of Peano arithmetic that no natural num-
ber is less than zero, and it is likewise a theorem of Zermelo-Fraenkel set theory
that no finite ordinal is contained in the empty set.11
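The translation just described can be tried out on a small fragment. The following Python is a minimal sketch under stated assumptions: finite ordinals are modeled as nested frozensets, the names zero, successor, numeral, and less_than are hypothetical, and only finitely many cases are checked, so this illustrates the translation rather than proving anything about it.

```python
def zero():
    """Translation of 'x = 0': the empty set."""
    return frozenset()

def successor(x):
    """The next finite ordinal, x together with x itself as a member."""
    return x | frozenset([x])

def numeral(n):
    """The finite ordinal obtained by applying successor n times to zero."""
    x = zero()
    for _ in range(n):
        x = successor(x)
    return x

def less_than(x, y):
    """Translation of 'x < y' as 'x is a member of y'."""
    return x in y

# Membership between finite ordinals mirrors the usual order.
assert less_than(numeral(2), numeral(5))
assert not less_than(numeral(5), numeral(2))
# No finite ordinal (among those checked) is a member of the empty set.
assert all(not less_than(numeral(k), zero()) for k in range(6))
```

The last assertion is the finite shadow of the set-theoretic theorem mentioned above: the translation of "no natural number is less than zero" says that nothing belongs to the empty set.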
It is important to note that this notion of a “translation” is not necessarily
imbued with any sense of “preservation of meaning,” but rather merely denotes a
mechanical method of transforming theorems of the interpreted theory into theo-
rems of the interpreting theory. One example which illustrates this phenomenon
nicely is when the interpreted and interpreting theories are the same. For instance,
take both the interpreted and interpreting theories to be the theory of a dense
linear order without endpoints. For the sake of concreteness, one can think of
these theories as the complete theory of the rational numbers (Q, <) as a linear
order.12 Then, if one translates “less than” by “greater than,” it is easy to see
that the translation of every theorem of the interpreted theory is a theorem of the
interpreting theory. For instance, it is a theorem of the interpreted theory that
the ordering is dense, i.e., between any two rational numbers is another rational
number:
∀ x, y [x < y → (∃ z x < z < y)] (1.4)
When translated, this theorem becomes the following theorem of the interpreting
theory:
∀ x, y [x > y → (∃ z x > z > y)] (1.5)
Hence, while translating “less than” by “greater than” preserves theoremhood and
hence provides us with an interpretation, it is clear that “less than” means some-
thing substantially different from “greater than.” This example illustrates that
translations used in interpretations are quite different in character from transla-
tions between natural languages.
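The purely mechanical character of such a translation can be seen in a short sketch. The Python below is illustrative only: formulas are encoded as nested tuples, the function name translate and the tags "lt", "forall", "exists", "implies", "and", and "not" are hypothetical choices, and "x > y" is written as the atom ("lt", "y", "x") so that both theories stay in the single signature with the symbol <.

```python
def translate(formula):
    """Swap the arguments of every order atom; leave all else unchanged."""
    op = formula[0]
    if op == "lt":
        _, s, t = formula
        return ("lt", t, s)
    if op in ("forall", "exists"):
        _, var, body = formula
        return (op, var, translate(body))
    if op == "not":
        return ("not", translate(formula[1]))
    # Remaining cases are binary connectives such as "and" and "implies".
    _, left, right = formula
    return (op, translate(left), translate(right))

# Density, as in (1.4).
DENSITY = ("forall", "x", ("forall", "y",
           ("implies", ("lt", "x", "y"),
            ("exists", "z", ("and", ("lt", "x", "z"), ("lt", "z", "y"))))))

# Its translation, as in (1.5), with each "greater than" written via "lt".
REVERSED = ("forall", "x", ("forall", "y",
            ("implies", ("lt", "y", "x"),
             ("exists", "z", ("and", ("lt", "z", "x"), ("lt", "y", "z"))))))

assert translate(DENSITY) == REVERSED
# Translating twice returns the original sentence.
assert translate(translate(DENSITY)) == DENSITY
```

Nothing in the procedure consults what the order symbol means; it only rewrites syntax, which is the sense in which a translation need not preserve meaning.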
It is instructive to contrast the interpretability of theories to the faithful inter-
pretability of theories. A theory T is said to be faithfully interpretable in a theory
T∗ if T is interpretable in T∗ so that translations of theorems are theorems and
so that translations of non-theorems are non-theorems:
T ⊢ ϕ ⇐⇒ T∗ ⊢ ϕ∗ (1.6)
It turns out that there are many examples of interpretations which are not faith-
ful interpretations. That is, while interpretability means that “provability facts”
about the interpreted theory are represented in the interpreting theory, it doesn’t
mean that the “non-provability facts” are so represented. For instance, the inter-
pretation of Peano arithmetic in Zermelo-Fraenkel set theory given above is not
a faithful interpretation because Peano arithmetic doesn’t prove its own consis-
tency, whereas Zermelo-Fraenkel set theory does prove the consistency of Peano
arithmetic. Such an example illustrates the fact that an interpreting theory may
be able to deduce more about the interpreted theory than the interpreted theory
itself can.13
A similar point about the potential incongruity of the interpreted and inter-
preting theories can be seen by switching to the perspective of structures. By the
completeness theorems, it is not difficult to see that one theory is interpretable
in another theory if and only if any structure that models the interpreting theory
uniformly interprets a structure that models the interpreted theory, where the
sense of “uniform” is that the same formulas are used each time to interpret the
models of the interpreted theory. Hence, to say that one theory is interpretable
in another is to say something about what happens in all the instantiations of
the interpreting theory. For instance, to say that axioms for arithmetic are inter-
pretable in axioms for set theory is to say something about what happens in every
model of set theory, and is not to say anything about what happens in all models
of arithmetic. So an interpreting theory may not be able to see all the models of
the theory which it interprets.
Finally, just as we defined the mutual interpretability of structures, so we can
define the mutual interpretability of theories and the mutual faithful interpretabil-
ity of theories. In particular, two theories are said to be mutually interpretable if
each interprets the other, and two theories are said to be mutually faithfully in-
terpretable if each faithfully interprets the other. Just as faithful interpretability
implies interpretability, so we have that mutual faithful interpretability implies
mutual interpretability. Hence, mutual faithful interpretability is the most re-
strictive notion of the interpretability of theories that we are considering here:
it implies but is not implied by the notions of mutual interpretability, faithful
interpretability, and interpretability.14
So in this section I have exposited two important families of notions of rep-
resentation. On the one hand, there are the notions of the interpretability and
mutual interpretability of structures. On the other hand, there are the notions of
the interpretability, faithful interpretability, mutual interpretability, and mutual
faithful interpretability of theories. From a certain perspective, this plurality of
notions of interpretability is of course exactly what one would expect. Hodges
makes this point with characteristic eloquence when he says: “Interpretations are
about different ways of looking at one and the same thing. So it should cause no
surprise that there are several different ways of looking at interpretations” ([70]
p. 201).
1.3 Theory-Based Versions: the Plethora and Consistency Problems
In the previous section, I distinguished between a family of theory-based no-
tions of representation and a family of structure-based notions of representation.
My concern here is to evaluate theory-based versions of the Logicist Template,
i.e., claims to the effect that knowledge of arithmetical theory may be based on
knowledge of logical theory and the knowledge that this arithmetical theory is
representable qua theory in the logical theory. In the next section, a structure-
based version of the Logicist Template will be considered. My thesis in this section
about the theory-based versions of the Logicist Template concerns the inability
of these versions to control the kinds of propositions which are representable qua
theories. In particular, each theory-based version of the Logicist Template seems
vulnerable to one of two problems, which I call the plethora problem and the
consistency problem. These problems are respectively that too much would get
counted as knowledge by these versions or that these versions would count both
a proposition and its negation as knowledge.
It is important to be clear that in this section I am considering a family of
theory-based versions of the Logicist Template, corresponding to the family of
theory-based notions of representation discussed in the previous section. In par-
ticular, there I distinguished between four different theory-based notions: the in-
terpretability, faithful interpretability, mutual interpretability, and mutual faithful
interpretability of theories. With respect to each such notion of interpretation X,
I want to here examine the following version of the Logicist Template:
Theory-Based Version of the Logicist Template (Relative to X-Interpretability):
knowledge of arithmetical theory may be based on knowledge of logical theory
and the knowledge that this arithmetical theory is X-interpretable within
the logical theory.
My thesis about the failure of control applies across the board to all four theory-
based versions of the Logicist Template: in each case, either the plethora problem
or the consistency problem allows us to point to specific examples of propositions
which would on this account be counted as knowledge but which are neither
obviously nor necessarily known.
Before turning to these examples, let me briefly note one place where a promi-
nent logicist seems to endorse something quite similar to a theory-based version of
the Logicist Template. The following is a passage which Wright repeats verbatim
in two different essays:
The neo-Fregean thesis about arithmetic is that a knowledge of its fundamental laws (essentially, the Dedekind-Peano axioms)– and hence of the existence of a range of objects which satisfy them– may be based on Hume’s Principle as an explanation of the concept of cardinal number in general, and finite cardinal number in particular. More specifically, the thesis involves four ingredient claims: [¶] (i) that the vocabulary of higher-order logic plus the cardinality operator, octothorpe [#] or ‘Nx: . . . x. . . ’, provides a sufficient definitional basis for a statement of the basic laws of arithmetic; [¶] (ii) that when they are so stated, Hume’s Principle provides for a derivation of those laws within higher-order logic [. . . ] ([160] p. 389, [161] p. 17, [60] pp. 256, 321).
It seems to me that the key idea expressed in these two roman numerals is that
(i) there is a way of translating arithmetical primitives into formulas about car-
dinalities, and that (ii) all the axioms of Peano arithmetic become theorems of
Hume’s Principle when so translated. This, of course, implies that the translations
of theorems of Peano arithmetic are theorems of Hume’s Principle, which by
definition is what it means for the Peano axioms to be interpretable in Hume’s
Principle. Hence, it seems that what Wright is here suggesting is that knowledge
of the Peano axioms may be based on knowledge of Hume’s Principle because
Hume’s Principle interprets the Peano axioms.
However, there are some non-trivial difficulties involved in explicating the key
notion of a “sufficient definitional basis” which figures in component (i) of Wright’s
neo-Fregean thesis. I have understood it to mean simply a method of associating
arithmetical primitives to formulas about cardinalities. Hence, I have understood
components (i) and (ii) of Wright’s neo-Fregean thesis to mean that the inter-
pretability of the Peano axioms within Hume’s Principle is sufficient for knowl-
edge of the Peano axioms to be based on knowledge of Hume’s Principle. My
primary textual evidence for this can be found in Wright’s remarks in his essay
“Is Hume’s Principle Analytic?” ([161], [60] p. 307 ff). For instance, consider the
parenthetical remark which Wright makes at the opening of his essay:
The interest– if indeed any– of the question whether the principle [Hume’s Principle] is analytic is wholly consequential on what has come to be known as Frege’s Theorem: the proof [. . . ] that second-order logic plus Hume’s Principle as sole additional axiom suffices for a derivation of second-order arithmetic– or, more cautiously, for the derivation of a theory which allows of interpretation as second-order arithmetic. (Actually I think the caution is unnecessary– more of that later) ([161] p. 6, [60] p. 307).
I take it that, in this parenthetical remark, Wright is indicating that the
interpretability of the Peano axioms within Hume’s Principle is sufficient for his
philosophical purposes, which, as he later states, involve an account of our knowledge
of the Peano axioms, or as he calls them, “the fundamental laws of arithmetic.”
This seems to be confirmed when Wright later explicitly elaborates on his
understanding of this key notion of a “sufficient definitional basis” which figures in
component (i) of his neo-Fregean thesis. If I understand this key passage correctly,
Wright contends in the last sentence of the quotation below that interpretability
of the Peano axioms in Hume’s Principle suffices for our knowledge of “pure arith-
metic,” such as we find in number theory:
No question of course but that Frege shows how to define expressions which comport themselves like those for successor, zero, and the predicate ‘natural number,’ thus enabling the formulation of a theory which allows of interpretation as Peano arithmetic. But– as we remarked right at the start– it is one thing to define expressions which, at least in pure arithmetical contexts, behave as though they express those various notions, another to define those notions themselves. [. . . ] How is the stronger point to be made good? [¶] Well, I imagine it will be granted that to define the distinctively arithmetical concepts is so to define a range of expressions that the use thereby laid down for those expressions is indistinguishable from that of expressions which do indeed express those concepts. The interpretability of Peano arithmetic within Fregean arithmetic ensures that has already been accomplished as far as all pure arithmetical uses are concerned ([161] pp. 17-18, [60] p. 322).
In the text immediately following this quotation, Wright goes on to discuss reasons
why Hume’s Principle can account for our knowledge of “applied arithmetic,” such
as my knowledge that I can infer from “there are exactly two F ’s” to “there
are distinct x, y which are F and everything that is an F is x or y.” But, in
any case, it is this last key sentence of the above quotation, namely that “the
interpretability of Peano arithmetic within Fregean arithmetic ensures that has
already been accomplished as far as all pure arithmetical uses are concerned,”
which is my primary evidence for understanding Wright as a proponent of a theory-
based version of the Logicist Template.
This of course is not to say that there are no passages in Wright which indicate
a sympathy for other versions of the Logicist Template. For instance, a few pages
earlier in this same essay, Wright says: “To be sure, it is a necessary condition of
the success of the neo-Fregean project that the relevant principle does more than
generate a theory within which arithmetic can be interpreted– there has to be
a tighter conceptual relationship than that” ([161] p. 15, [60] p. 317). However,
Wright does not say here what more is required or why more is required, and a
few pages later he goes on to make his remark that interpretability suffices “as far
as all pure arithmetical uses are concerned.” That is, if I understand correctly,
Wright ultimately endorses the contention that knowledge of “pure” statements of
arithmetic, such as the Mathematical Induction Principle, may be based entirely
on knowledge of Hume’s Principle and the knowledge that the Peano axioms are
interpretable in Hume’s Principle.
I want now to turn towards the evaluation of the claim that knowledge of
Hume’s Principle can be extended to knowledge of the Peano axioms by virtue of
the interpretability of the latter within the former. Of course, it is demonstrable
that Hume’s Principle interprets the Peano axioms. This result is sometimes
called Frege’s Theorem. The elements of the proof of this theorem can be found
in Frege’s writings, and the rediscovery of this theorem by Wright constitutes an
important contribution to our understanding of both traditional and contemporary
logicism.15 So my concern here is only with how to understand the philosophical
consequences of Frege’s theorem. In particular, I want now to consider the viability
of the following theory-based version of the Logicist Template: knowledge of an
arithmetical theory such as the Peano axioms may be based on knowledge of a
logical theory such as Hume’s Principle and the knowledge that this arithmetical
theory is interpretable in this logical theory.
One problem with this version of the Logicist Template, which I will dub
the plethora problem, has been voiced in different ways by Richard Heck and
Thomas Hofweber, and even earlier by Walter Hoering, although Hoering and
Hofweber are concerned with intertheoretic reduction and not with logicism in
particular.16 The plethora problem stems from the fact that many theories are
interpretable in the Peano axioms. For instance, it is well-known from the work
of Tarski that the complete first-order theories of the real and complex numbers
are interpretable in the Peano axioms.17 Since the interpretability of theories is
a transitive relation, Frege’s theorem implies that the complete theories of the
real and complex numbers are interpretable in Hume’s Principle. However, it
would seem strange to suggest that these theories can come to be known by way
of an interpretability result. For instance, one of the axioms of the complex
numbers is the Fundamental Theorem of Algebra, which asserts that every
non-constant polynomial in one variable has a root. The proofs of this theorem which
mathematicians accept and teach to their students are all non-trivial, and typically
require appeal to limits or to topological notions, each of which must be studied in
its own right before one can begin to understand these proofs of the Fundamental
Theorem of Algebra.18 It would seem counterintuitive to suggest that all of this
could be circumvented by appeal to a comparatively elementary interpretability
result. Hence, the plethora problem is that too much knowledge is generated by
the claim that knowledge of one theory can be based on knowledge of a theory
which interprets it.
One response to the plethora problem is simply to accept it and to strengthen
the notion of interpretation so as to avoid these sorts of counterexamples.19 In par-
ticular, thus far I have been considering a version of the Logicist Template which
claims that knowledge of arithmetical principles may be based on knowledge of
principles which interpret these arithmetical principles. Let us now consider a
more circumspect version of the Logicist Template which claims that knowledge
of arithmetical principles may be based on knowledge of principles which faithfully
interpret these arithmetical principles. Recall from the previous section that faith-
ful interpretability not only requires that translations of theorems are theorems,
but also that translations of non-theorems are non-theorems. One might initially
suspect that the plethora problem could be avoided by requiring that the inter-
preting theory know only as much about the interpreted theory as the interpreted
theory does itself.
However, it turns out that this is not the case. In particular, Tarski’s result was
that the complete theories of the real and complex numbers are interpretable in
the Peano axioms, and hence in Hume’s Principle. The theorems of the complete
theory of the complex numbers are by definition precisely the true statements
about the complex numbers, and the non-theorems are precisely the false state-
ments about the complex numbers. Hence, since the negations of false statements
are true statements, it follows that the negations of non-theorems are theorems in
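this setting. Since translations preserve negations and the interpreting theory is
consistent, it follows that the translations of non-theorems are non-theorems:
if the translation of a non-theorem were a theorem, the interpreting theory would
prove both it and its negation. Hence, the complete theories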
of the real and complex numbers are faithfully interpretable in the Peano axioms,
and hence in Hume’s Principle. So, the plethora problem applies with equal force
to faithful interpretability as to interpretability itself.
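The step from interpretability to faithful interpretability for a complete interpreted theory can be laid out as a chain of implications. What follows is a sketch in the notation of (1.3) and (1.6), under the assumptions that T is complete, that T∗ is consistent, and that the translation commutes with negation; the symbol \nvdash assumes the amssymb package.

```latex
\begin{align*}
T \nvdash \varphi
  &\implies T \vdash \neg\varphi
  && \text{(completeness of } T\text{)} \\
  &\implies T^{*} \vdash (\neg\varphi)^{*}
  && \text{(interpretability, as in (1.3))} \\
  &\implies T^{*} \vdash \neg(\varphi^{*})
  && \text{(translations commute with negation)} \\
  &\implies T^{*} \nvdash \varphi^{*}
  && \text{(consistency of } T^{*}\text{)}
\end{align*}
```

Together with (1.3), this gives the biconditional (1.6). Here T is the complete theory of the real or complex numbers and T∗ is the Peano axioms or Hume’s Principle.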
This raises the question of whether a theory-based version of the Logicist Tem-
plate centered around mutual interpretability fares any better with respect to the
plethora problem than do versions based on interpretability and faithful inter-
pretability. Recall from the previous section that two theories are mutually inter-
pretable if each can interpret the other. Hence, I want to now consider a version
of the Logicist Template which says that knowledge of arithmetical principles may
be based on knowledge of principles which are mutually interpretable with these
arithmetical principles. It turns out that one can in fact avoid the plethora prob-
lem in this way. For instance, it is known from other parts of Tarski’s work on the
decidability of theories that the complete theories of the real and complex num-
bers are not mutually interpretable with the Peano axioms.20 Moreover, not only
can the plethora problem be avoided in this setting, but the analogous version of
Frege’s Theorem holds: in particular, it follows from work of Boolos that Hume’s
Principle and the Peano axioms are mutually interpretable.21,22
However, a different problem, which I call the consistency problem, besets
theory-based versions of the Logicist Template centered around mutual inter-
pretability. Such versions claim that knowledge of arithmetical principles may
be based on knowledge of principles which are mutually interpretable with these
arithmetical principles. The consistency problem is that the same sort of evidence
can provide us with “knowledge” of the negation of some of these arithmetical
principles. In particular, let us call the anti-Peano axioms the Peano axioms but
with the Mathematical Induction Principle replaced by its negation. Then, it
is essentially implicit in the work of Dedekind that the anti-Peano axioms are
mutually interpretable with the Peano axioms.23 Since mutual interpretability
is an equivalence relation, it follows from Frege’s Theorem that the anti-Peano
axioms are mutually interpretable with Hume’s Principle. Hence, if knowledge of
the Peano axioms may be based on Hume’s Principle because Hume’s Principle is
mutually interpretable with the Peano axioms, then presumably it is likewise true
that “knowledge” of the anti-Peano axioms may be based on Hume’s Principle
because Hume’s Principle is mutually interpretable with the anti-Peano axioms.
However, presumably it is absurd to suggest that both the Mathematical Induction
Principle and its negation can be known.
It turns out that the consistency problem is just as much a problem for mutual
faithful interpretability as it is for mutual interpretability. For instance, take the
theory T which is the theory of a dense linear order plus the axiom that one and
only one of the following possibilities occurs: either there is a least element and
no greatest element, or there is a greatest element and no least element. Then
by interpreting “greater than” by “less than,” one can easily see that T + ϕ and
T + ¬ϕ are mutually faithfully interpretable, where ϕ is the sentence saying that
there is a greatest element. Hence, it seems that it is just wrong to claim that
knowledge of one theory may be based on knowledge of a theory which is mutually
interpretable with that theory or which is mutually faithfully interpretable with
that theory. For, while this claim may support an inference from knowledge of
Hume’s Principle to knowledge of the Peano axioms, it does so at the cost of
failing to respect the dictum that one cannot know both a proposition and its
negation.24
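The order-reversal argument for the theory T described above can be sketched as follows. This is only an outline under stated assumptions: the translation ∗ swaps the two arguments of the order symbol and leaves everything else unchanged, the particular formalization of ϕ is my own choice, and the sketch covers mutual interpretability only, not faithfulness.

```latex
\begin{align*}
\varphi &:\equiv \exists x\,\forall y\,(y < x \lor y = x)
  && \text{(there is a greatest element)} \\
\varphi^{*} &\equiv \exists x\,\forall y\,(x < y \lor y = x)
  && \text{(there is a least element)} \\
T &\vdash \varphi^{*} \leftrightarrow \neg\varphi
  && \text{(exactly one endpoint exists)} \\
T + \neg\varphi &\vdash \sigma^{*}
  \ \text{ for each axiom } \sigma \text{ of } T + \varphi \\
T + \varphi &\vdash \sigma^{*}
  \ \text{ for each axiom } \sigma \text{ of } T + \neg\varphi
\end{align*}
```

The last two lines use the fact that the axioms of T are themselves symmetric under reversal of the order, so that each translated axiom of T is again a theorem of T.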
It is helpful to contrast the consistency problem to the plethora problem. The
basic idea behind the plethora problem was that representations were too easy to
come by and hence that too much gets counted as knowledge if one claims that
knowledge of one theory may be based on knowledge of a theory which represents
that theory. By contrast, the consistency problem notes that such a claim leads
us to violate the basic principle that one cannot know both a proposition and
its negation. However, despite their differences, each of these two problems is
illustrative of a general problem of control: if one wants to claim that knowledge
can be passed along the kinds of theory-based representations considered here,
then one is faced with the problem that one cannot properly control the scope
and variety of the claims that get passed along. The plethora problem was that
too many things get counted as knowledge. The consistency problem was that
both a proposition and its negation would get counted as knowledge.
While it has been known for a long time that there are theories T and sentences
ϕ such that T + ϕ and T + ¬ϕ are mutually interpretable (or indeed mutually
faithfully interpretable), the relevance of this for logicism does not seem to have
been previously noted. However, its relevance for other programs in the philosophy
of mathematics has been previously discussed. For instance, Edward Nelson had
the idea of characterizing a very constructive theory of arithmetic as the collection
of all those sentences of arithmetic which were mutually interpretable with a very
weak set of arithmetical principles. But Nelson noted and posed the problem of
determining whether the conjunction of two sentences has this property whenever
the two sentences themselves individually have this property. Later, it was found
that both a sentence and its negation could have this property, thus undercutting
the idea that the collection of all sentences with this property could have constituted
a theory of arithmetic in the first place.25
Another place in the philosophy of mathematics where the consistency prob-
lem has been noted is in discussions of set theory. In particular, some set theorists
have appealed to a theorem of Guaspari and Lindström, which says that finite
extensions of the Zermelo-Fraenkel axioms for set theory are mutually interpretable if
and only if they prove exactly the same $\Pi^0_1$-sentences.26 Here a $\Pi^0_1$-sentence is
simply a sentence which begins with a universal quantifier over natural numbers and
all of whose other quantifiers are bounded to natural numbers mentioned earlier
in the sentence. So, for example, Goldbach’s conjecture and the consistency
statements from Gödel’s second incompleteness theorem are examples of such sentences.
There is a long tradition, stemming from Hilbert’s Program,27 of privileging such
sentences, and recently Peter Koellner has suggested a new variation on this idea.
Koellner’s idea is that the $\Pi^0_1$-sentences are exactly the observational sentences,
so that the Guaspari-Lindström theorem implies that while two mutually
interpretable set theories may disagree vastly about the nature of sets, they must of
necessity have the same observational consequences ([93] p. 98). However, there
does not seem to be any analogue of Koellner’s idea that is available to the logi-
cist. For, the logicist is interested in interpretations between theories in different
signatures, whereas the Guaspari-Lindström theorem only applies to extensions of
a fixed theory in a fixed signature. Further, since most of the sentences the logicist
is interested in, such as the claim that every natural number has a successor, are
not $\Pi^0_1$-sentences, they would not be covered by the Guaspari-Lindström theorem
in the first place.
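The syntactic shape of a $\Pi^0_1$-sentence described earlier in this paragraph can be made concrete with Goldbach’s conjecture. The following is a minimal Python sketch and is not drawn from the cited literature: the function names are hypothetical, and the point is only that, once the leading universal quantifier is instantiated at a particular even number n, every remaining quantifier is bounded by n, so that the instance can be settled by a finite search.

```python
def is_prime(k: int) -> bool:
    """Trial division; adequate for the small numbers used here."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


def goldbach_instance(n: int) -> bool:
    """Check Goldbach's conjecture at a single even n >= 4.

    Both prime summands are bounded by n, so this is a finite search.
    """
    return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))


def first_counterexample_below(bound: int):
    """Return the least even n in [4, bound) failing the instance, or None."""
    for n in range(4, bound, 2):
        if not goldbach_instance(n):
            return n
    return None
```

A sentence of this form could be refuted by a single failed instance, but no finite run of successful instances establishes it, which is one way of seeing why such sentences have been compared to observational claims.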
So in this section I have described how the problem of control emerges for
versions of the Logicist Template centered around four distinct notions of the
interpretability of theories: interpretability, faithful interpretability, mutual inter-
pretability, and mutual faithful interpretability. It is tempting to infer from this
that there is a problem of control for all versions of the Logicist Template cen-
tered around theory-based notions of representation. However, given the inherent
open-endedness of this notion of “theory-based,” it seems hard to substantiate
this claim. At best it seems that one can say something about what a notion of
representation must look like if it is going to sustain the Logicist Template. In
particular, the plethora problem shows us that such a notion of representation
can’t be such that many different theories are interpretable in any one given the-
ory. Likewise, the consistency problem shows that one can’t have cases where the
notion of representation in question fails to distinguish between a proposition and
its negation. To the best of my knowledge, all the known theory-based notions
of representation fail to meet at least one of these two conditions. However, this is no
reason to suspect that everything which has a right to the title of “a theory-based
version of the Logicist Template” will likewise fail.
In particular, one way in which the advocate of the Logicist Template might
respond to the criticism which I have been offering in this section is to present a
new description of the relationship between Hume’s Principle and the Peano ax-
ioms. Frege’s Theorem tells us that the Peano axioms are interpretable in Hume’s
Principle, but this does not preclude these two theories from being linked by some
stronger notion of interpretability, a notion which perhaps avoids the plethora
problem and the consistency problem. To the extent that this could be accom-
plished, it would be possible for the advocate of the Logicist Template to be in
complete accord with everything which I have said in this section. For, the most
general moral of the plethora problem and the consistency problem is that the
inference from knowledge of logical principles to knowledge of arithmetical prin-
ciples has to be based on something more than a knowledge of interpretability
(or mutual interpretability, or faithful interpretability, or faithful mutual inter-
pretability). For all that has been said in this section, it may be the case that this
something more is simply knowledge of a stronger theory-based interpretability
relation that links logical theory to arithmetical theory.
1.4 Structure-Based Version: the Isomorphism and Signature Problems
The goal of this section is to articulate and examine a single structure-based
version of the Logicist Template. This structure-based version is centered around
a notion of dual interpretability, which incorporates both theories and structures.
Subsequent to defining this notion, I describe how incorporating structures allows
this version to avoid the plethora and consistency problems. However, precisely
because it incorporates structures, the structure-based version of the Logicist Tem-
plate must now also account for our knowledge of structures, and this comes with
its own problems. In particular, I describe the isomorphism problem, which notes
that while the structure-based version requires knowledge of the properties of
structures which are not invariant under isomorphism, one might have the intu-
ition that all our knowledge of structures is so invariant. Likewise, I describe the
signature problem, which suggests that our knowledge of the signature of the nat-
ural numbers requires prior knowledge of arithmetical truths, and, in particular,
knowledge of arithmetical truths that the structure-based version was originally
designed to deliver. While I present reasons in this section for thinking that the
isomorphism problem can be overcome, I will argue that the signature
problem poses a deep challenge to the structure-based version of the Logicist
Template.
The motivating idea behind dual interpretability is not to relate individual
theories to individual theories or individual structures to individual structures,
but rather to relate one pairing of a theory with a structure to another pairing
of a theory with a structure. Such a pairing of a theory with a structure shall
be indicated by saying that the theory is about the structure. This locution is
introduced purely for the purpose of avoiding the cumbersomeness of speaking
of ordered pairs of theories and structures. Further, when I speak in this way
of a theory being about a structure, all that shall be assumed is that the theory
and the structure both have the same signature, and it shall not for instance be
assumed that the theory is true of the structure.
Having this bit of notation in place, the notion of dual interpretability can
now be defined. In particular, let us say that theory T about structure M is dual
interpretable in theory T∗ about structure M∗ if (i) theory T is interpretable in
theory T∗, (ii) structure M is interpretable in structure M∗, and (iii) the
definitions used in both interpretations are the same. That is, the way in which
models of T∗ uniformly interpret models of T is exactly the same way in which M∗
interprets M. In effect, the notion of dual interpretability is a kind of
pre-established harmony of the interpretability of theories and structures. For, it
consists simply in the interpretability of theories on the one side and the inter-
pretability of structures on the other side, together with the added stipulation
that these two interpretations match up with one another.
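One way to set out this definition schematically is the following. The notation is an assumption introduced here for illustration and is not the author’s: $I$ stands for a fixed tuple of defining formulas in the signature of $T^*$, and $I(N)$ for the structure in the signature of $T$ that these formulas define inside a structure $N$.

```latex
% T about M is dual interpretable in T^* about M^* iff there is a single I with:
\begin{align*}
\text{(i)}\quad   & \text{for every } N \models T^{*}:\ I(N) \models T, \\
\text{(ii)}\quad  & M \cong I(M^{*}), \\
\text{(iii)}\quad & \text{the same } I \text{ is used in (i) and (ii).}
\end{align*}
```

Note that, as in the text, nothing in this formulation requires that $T^*$ be true of $M^*$ or that $T$ be true of $M$.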
This notion of dual interpretability allows for the definition of the following
structure-based version of the Logicist Template: knowledge that the arithmetical
principles are true of the natural numbers may be based on knowledge that the
logical principles are true of their subject-matter and the knowledge that these
arithmetical principles about the natural numbers are dual interpretable in the
logical principles about their subject-matter. Again, deferring to the contempo-
rary literature, we can take the arithmetical principles to be the Peano axioms and
the logical principles to be Hume’s Principle. Since Hume’s Principle says that
two properties are assigned the same cardinality if and only if they can be one-one
correlated with each other, I shall use the phrase “the cardinalities” to designate
the structure associated to Hume’s Principle. The structure-based version of the
Logicist Template then reads as follows:
Structure-Based Version of the Logicist Template: knowledge that the Peano axioms are true of the natural numbers may be based on knowledge that Hume’s Principle is true of cardinalities and the knowledge that the Peano axioms about the natural numbers are dual interpretable in Hume’s Principle about cardinalities.
It is this version of the Logicist Template which shall occupy us throughout the
remainder of this chapter. But, as in the case of theory-based versions of the
Logicist Template, I do not mean to suggest that focus should be put on this
version because it is the only thing that might reasonably lay claim to the title of
a “structure-based version of the Logicist Template.” On my view, this particular
structure-based version merits our attention because it is able to avoid the prob-
lems which beleaguered the theory-based versions, namely the plethora problem
and the consistency problem.
This structure-based version is able to overcome the plethora problem because
it requires that there be an interpretation on the level of structures in addition to
an interpretation on the level of theories. The plethora problem was that an appeal
to interpretability would entitle us to more knowledge than we have legitimately
earned, and this structure-based version directly blocks this problem by crediting
us with knowledge only when there is knowledge of interpretability both at the
level of theories and the level of structures. To return to an example from the
previous section, it seems that knowledge of the Fundamental Theorem of Algebra
can be legitimately won by means of knowledge of a dual interpretability result,
viz. knowledge that the theory of the complex numbers about the structure of
the complex numbers is dual interpretable in the Peano axioms about the natural
numbers. For, this knowledge additionally involves being able to identify complex
numbers with certain sets of natural numbers. This, of course, is exactly the
route that mathematicians do take to establishing this theorem: they identify real
numbers with certain classes of natural numbers (Cauchy sequences, Dedekind
cuts), and then they identify complex numbers with pairs of real numbers, and
then they proceed to the Fundamental Theorem of Algebra by way of a study of
the analytic properties of the real and complex numbers which are expressible by
means of this identification. That is, if one attends to the actual proofs endorsed
by mathematicians, one sees quite immediately that the Fundamental Theorem of
Algebra is not a theorem of algebra but a theorem of analysis, since the theorem
is proved by recourse to analytic notions, which are made available by means of
interpretations on the level of structures. So, it is because this structure-based
version of the Logicist Template insists upon a knowledge of the interpretability of
both theories and structures that it can avoid the counterexamples which originally
attuned us to the plethora problem.28
The consistency problem revolved around examples of theories T and sen-
tences ϕ such that T + ϕ and T + ¬ϕ were mutually interpretable with one
another, so that the theory-based version of the Logicist Template was committed
to endorsing both a sentence and its negation. However, there is good reason to
think that this problem does not reemerge in the structure-based setting. In order
to see this, it is first necessary to state the following elementary result about dual
interpretability:
Elementary Result about Dual Interpretability: if theory T∗ is true of structure M∗ and if theory T about structure M is dual interpretable in theory T∗
about structure M∗, then theory T is true of structure M.
That is, in the setting of dual interpretability, one can show that if the interpreting
theory is true of the interpreting structure, then the interpreted theory is true of
the interpreted structure. Hence, in the sense of dual interpretability, one can
actually demonstrate that truth is preserved downward under interpretability.29
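The reasoning behind this result can be sketched in two steps. This is a reconstruction under the assumption that dual interpretability is witnessed by a single tuple of definitions $I$ (hypothetical notation, as is $I(N)$ for the structure that these definitions pick out inside a structure $N$), and that the theories in question are first-order; the author’s own proof may differ in detail.

```latex
\begin{align*}
M^{*} \models T^{*}
  &\;\Longrightarrow\; I(M^{*}) \models T
  && \text{since $I$ sends every model of $T^{*}$ to a model of $T$} \\
  &\;\Longrightarrow\; M \models T
  && \text{since $M \cong I(M^{*})$ and isomorphism preserves first-order truth.}
\end{align*}
```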
It is easy to see how this elementary result allows us to overcome the consis-
tency problem. This problem was that there were natural examples of theories T
and sentences ϕ such that the two theories T + ϕ and T + ¬ϕ were mutually
interpretable with one another and such that these two theories intuitively concerned
the same subject-matter, e.g. they were rival claims about natural numbers. However,
the elementary result from the previous paragraph shows that these sorts of
examples cannot occur in this setting. For, suppose that T + ϕ and T + ¬ϕ are
both about the same structure M. Then it cannot be the case that (i) T is true
of M and that (ii) T + ¬ϕ about M is dual interpretable in T + ϕ about M, and
that (iii) T + ϕ about M is dual interpretable in T + ¬ϕ about M. For, by (i),
either T + ϕ is true of M or T + ¬ϕ is true of M. But, by (ii), if T + ϕ was true of M,
then the elementary result from the previous paragraph tells us that T + ¬ϕ would
be true of M, which is a contradiction. Likewise, by (iii), if T + ¬ϕ was true of M,
then this elementary result tells us that T + ϕ would be true of M, which is a
contradiction. Hence, what the elementary result from the previous paragraph tells
us is that by pinning everything down to a specific structure, the structure-based
version of the Logicist Template can avoid the consistency problem.30
It is also important to mention that this same elementary result provides us
with an explanation of why dual interpretability can be a source of knowledge.
For, it tells us that once one has a dual interpretability result, one may proceed in
a straightforwardly deductive manner from knowledge that the interpreting prin-
ciples are true of the interpreting structure to the knowledge that the interpreted
principles are true of the interpreted structure. In the previous section, the focus
was on counterexamples to theory-based versions of the Logicist Template and
the question was not even broached of what positive reasons one could adduce for
thinking that a theory-based interpretability result could be viewed as a source
of knowledge. Here, with respect to the structure-based version of the Logicist
Template, one can straightforwardly say why there is not anything mysterious
about how interpretability can be a source of knowledge: the mechanism that lets
us pass from knowledge of the interpreting theory to knowledge of the interpreted
theory in this setting is simply deduction.
I want now to discuss two problems with the structure-based version of the
Logicist Template, which I call the isomorphism problem and the signature prob-
lem. Both of these problems revolve around our knowledge of structures, and in
essence both problems arise when one starts to ask about the extent to which
knowledge of structures differs from knowledge of theories. It seems relatively
straightforward to speak of knowledge of theories, or knowledge that a theory is
true of a structure. For instance, it seems non-problematic to say that the recent
literature on the epistemic status of Hume’s Principle has focused on providing an
account of our knowledge of this theory, or perhaps an account of our knowledge
that this theory is true of the structure of cardinalities. However, when one begins
to speak of knowledge of structures that goes above and beyond knowledge of the
truths which they model, it seems that the picture becomes more opaque. The
idea behind both the isomorphism problem and the signature problem is that they
expose tensions between the transparency of our knowledge of structures and the
structure-based version of the Logicist Template. In the case of the isomorphism
problem, I will indicate my reasons for thinking that this tension can ultimately be
tempered. However, I can presently see no way in which to alleviate this tension
in the case of the signature problem, and so on my view this problem poses a deep
and hitherto unanswered challenge to this structure-based version of the Logicist
Template.
The isomorphism problem is that the structure-based version of the Logicist
Template is inconsistent with a certain way of setting out an intuition about the
isomorphism of structures. Recall from § 1.2 that two structures are said to be
isomorphic if there is a structure-preserving one-one map from the one onto the
other (cf. endnote 9). One intuition which one might have about structures and
isomorphisms is that structures can only be specified up to isomorphism. The
content of this intuition might be explicated in terms of what I call the following
invariance thesis: any property that we know a structure to have is also had
by all isomorphic copies of this structure, regardless of whether or not we know
that this property is had by all these isomorphic copies.31 For instance, one
kind of knowledge of structures which we ostensibly have is that various first-
order sentences are true of structures, and since isomorphism preserves first-order
truth, this knowledge extends to isomorphic copies in the manner mandated by
the thesis. In essence, the invariance thesis predicts that our knowledge that
sentences are true of structures is paradigmatic of our knowledge of structures in
general.
To see the way in which this invariance thesis conflicts with the structure-based
version of the Logicist Template, it suffices to recall what is involved in the claim
that one structure is interpretable in another. In § 1.2, I presented the standard
definition of this notion, viz. that one structure is interpretable in another if it is
isomorphic to a structure whose domain, constants, relations, and functions are
definable in the second. For the sake of brevity, let us say that one structure is
definable in a second structure if its domain, constants, relations, and functions are
definable in the second. Hence, in this terminology, the standard definition reads
as follows: one structure is interpretable in another if it is isomorphic to a structure
definable in the second. It is easy to see that the interpretability relation is itself
invariant under isomorphism, in that if one structure is interpretable in a second,
then any isomorphic copy of the first is interpretable in any isomorphic copy of the
second. Hence, knowledge of the interpretability relation per se does not conflict
with the invariance thesis described above, since if I know that a structure is
interpretable in another, then anything isomorphic to the first is interpretable in
anything isomorphic to the second.
However, knowledge that one structure is interpretable in another involves
knowledge that the first structure is isomorphic to a structure which is definable
in the second, and definability is demonstrably not invariant under isomorphism.32
In particular, if one structure is definable in another, then it is simply false that
any isomorphic copy of the first is definable in any isomorphic copy of the sec-
ond.33 There are many ways to see this, but perhaps the most perspicuous way is
to adopt a set-theoretic perspective, and to note that while definability requires
that the defined structure be enumerated into the cumulative hierarchy immedi-
ately after the defining structure, it is nonetheless easy to construct isomorphic
copies that get enumerated at arbitrarily high stages. Hence, since definability is
not invariant under isomorphism, it follows that knowledge of definability is incon-
sistent with the invariance thesis. For, suppose that one structure is definable in
the second, and consider the property of “being definable in the second structure.”
If one knows that the first structure has this property, then the invariance thesis
requires that all isomorphic copies of the first structure have this property, which
is simply false. Likewise, if one knows that the second structure has the property
of “defining the first structure,” then the invariance thesis requires that all iso-
morphic copies of the second structure have this property, which is likewise false.
So this is why the invariance thesis is inconsistent with knowledge of definability
and hence with the structure-based version of the Logicist Template.
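The failure of invariance can be illustrated with a toy finite example. The sketch below is only an illustration under simplifying assumptions: structures are represented as hypothetical Python dictionaries with a single binary relation, and the code tests just one necessary condition for (one-dimensional) definability, namely that the domain of the defined structure be drawn from the domain of the defining structure. Relabelling the elements yields an isomorphic copy that fails even this necessary condition.

```python
def relabel(structure, mapping):
    """Return the isomorphic copy obtained by renaming elements via a bijection."""
    return {
        "domain": {mapping[x] for x in structure["domain"]},
        "less_than": {(mapping[x], mapping[y]) for (x, y) in structure["less_than"]},
    }


def is_isomorphism(mapping, source, target):
    """Check that mapping is a bijection between domains preserving the relation."""
    image = {mapping[x] for x in source["domain"]}
    bijective = image == target["domain"] and len(image) == len(source["domain"])
    preserved = all(
        ((x, y) in source["less_than"])
        == ((mapping[x], mapping[y]) in target["less_than"])
        for x in source["domain"]
        for y in source["domain"]
    )
    return bijective and preserved


def domain_contained_in(defined, defining):
    """Necessary (not sufficient) condition for one-dimensional definability."""
    return defined["domain"] <= defining["domain"]


# Defining structure: the numbers 0..5 under the usual order.
big = {
    "domain": set(range(6)),
    "less_than": {(x, y) for x in range(6) for y in range(6) if x < y},
}
# A structure carved out of it: the even numbers with the inherited order.
evens = {
    "domain": {0, 2, 4},
    "less_than": {(x, y) for (x, y) in big["less_than"] if x % 2 == 0 and y % 2 == 0},
}
# An isomorphic copy of the evens built from fresh labels.
renaming = {0: "a", 2: "b", 4: "c"}
copy_of_evens = relabel(evens, renaming)
```

Here `evens` passes the containment check against `big`, while its isomorphic copy `copy_of_evens` does not, even though the two satisfy exactly the same first-order sentences.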
From a certain historical perspective, this is perhaps what one would expect.
For, thinking of Dedekind’s letter to Weber and Benacerraf’s and Cassirer’s criti-
cisms of Frege, it is not hard to convince oneself that structuralism was historically
born of a rejection of logicism.34 Hence, to the extent that the invariance
thesis could be counted as a structuralist thesis, one might have expected it to be
inconsistent with the structure-based version of the Logicist Template. However,
despite the fact that the invariance thesis is one way of rendering precise the in-
tuition that structures can only be specified up to isomorphism, it seems clear
that not all contemporary authors who identify themselves as structuralists are
ultimately committed to this thesis. For instance, while Resnik seems to endorse a
highly qualified version of the invariance thesis, other contemporary structuralists
such as Shapiro and Parsons do not seem committed to any form of this thesis.35
Indeed, if there is a thesis about structures that unites contemporary structuralists
such as Resnik, Shapiro, and Parsons, it is not any precisification of the intuition
that one can only specify structures up to isomorphism, but rather the thesis that
judgments about the identity and non-identity of mathematical objects are only
legitimate relative to some antecedently specified background structure.36 Unlike
the invariance thesis, this thesis about relativity is, as far as I can see, entirely
consistent with the structure-based version of the Logicist Template.
However, a robust defense of the structure-based version of the Logicist Tem-
plate requires that some positive reason be given for rejecting the invariance thesis,
and I think that such a reason is in fact available. In particular, while the picture
of our knowledge of structures which the invariance thesis recommends may not
be far from the truth with respect to intrinsic properties of structures, this pic-
ture is entirely inaccurate when it comes to the relational properties of structures.
While I do not claim to be able to provide an analysis of the notions of intrinsic
and relational, I can point to several examples which collectively cover most of
the properties of structures which have hitherto been studied. The paradigmatic
examples of intrinsic non-relational properties of structures include (i) whether a
given sentence is true or false of the structure, (ii) whether the structure has a
non-trivial automorphism, and (iii) whether the structure has a non-trivial
substructure. Paradigmatic examples of relational non-intrinsic properties of structures
include the following: (iv) one structure whose domain is a subset of natural
numbers being Turing computable in another structure whose domain is a subset
of natural numbers, (v) one structure being contained in the set-theoretic
constructible universe relative to another structure, and (vi) one structure being a Borel
subset of another structure equipped with an antecedently specified topology.37
While these three relational properties of structures constitute components of
the subject-matter of various sub-disciplines of mathematical logic (namely, computable
model theory, inner model theory, and descriptive set theory), they are
nonetheless not invariant under isomorphism.38 For instance, isomorphic copies of
computable structures need not be computable, and likewise for constructible and
Borel structures.39 Hence, if one thinks that these sub-disciplines of mathematical
logic provide us with knowledge of relational properties of structures, then one has
to reject the invariance thesis, since its mandate for invariance under isomorphism
is not satisfied by the types of knowledge generated by these disciplines.
I want now to turn to the signature problem. Like the isomorphism problem,
this problem exposes a tension between the structure-based version of the Logicist
Template and the perspicuity of the concept of the knowledge of structure. Unlike
the isomorphism problem, I presently see no way in which to dispel this tension.
The signature problem begins with the mundane observation that knowledge of
the dual interpretability result mentioned in the structure-based version of the
Logicist Template requires knowledge that the natural numbers are a structure
in the signature of the Peano axioms. This is simply due to the fact that for a
theory to be about a structure, it is necessary that the theory and the structure
share the same signature. Indeed, it was this shared signature which permitted
us to defuse the consistency problem by recourse to the Elementary Result about
Dual Interpretability: for, unless the theory and the structure share a common
signature, it does not make sense to say that the theory is true of the structure.
So, by tracing out the definition of dual interpretability, one sees that knowledge
of the dual interpretability result from the structure-based version of the Logicist
Template requires knowledge that the natural numbers are a structure in the
signature of the Peano axioms.
However, it seems that knowledge that the natural numbers are a structure in
this signature as opposed to another signature requires knowledge of the Peano
axioms. For instance, the signature of the Peano axioms is traditionally taken
to contain function symbols for addition and multiplication, and is to be distin-
guished from the signature of first-order Presburger arithmetic, whose only symbol
is the addition symbol.40 It seems indisputable to me that we possess knowledge
that the natural numbers are a structure in the signature of the Peano axioms
and not a structure in the signature of first-order Presburger arithmetic. For, one
can point here to the fact that while the Presburger signature has resources to
express the primality of individual natural numbers, such as five and seven,41 it
does not have the resources to express the concept of primality in general. In
particular, any infinite set of natural numbers definable in the Presburger sig-
nature contains non-prime numbers,42 and since I know that there are infinitely
many prime numbers, I know that the concept of primality is not definable in the
Presburger signature. However, and this is the key point, it seems that when I
examine my reasons for thinking that the signature of the natural numbers is not
the Presburger signature, I advert to my knowledge of number-theoretic truths
such as the infinitude of primes, which are traditionally proven by recourse to the
Peano axioms.43 Hence, this is my reason for thinking that knowledge that the
natural numbers are a structure in this signature as opposed to another signature
requires knowledge of the Peano axioms.44
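The claim that every infinite set of natural numbers definable in the Presburger signature contains non-primes can be illustrated concretely. The sketch below assumes the standard fact, not proved in the text, that a set of natural numbers definable using only addition is eventually periodic: beyond some threshold, membership depends only on the residue modulo a fixed period p ≥ 1. Given such a set and any member a ≥ 2 lying beyond the threshold, the number a + a·p = a(1 + p) is again a member and is visibly composite. The function names are hypothetical.

```python
def composite_member(a: int, period: int) -> int:
    """Given a member a >= 2 lying in the periodic part of a set with the given
    period >= 1, return a composite number that is also a member.

    a + a * period is reached from a by a steps of size `period`, and it
    factors as a * (1 + period) with both factors at least 2.
    """
    if a < 2 or period < 1:
        raise ValueError("need a >= 2 and period >= 1")
    return a * (1 + period)


def is_prime(k: int) -> bool:
    """Trial division; adequate for small numbers."""
    if k < 2:
        return False
    d = 2
    while d * d <= k:
        if k % d == 0:
            return False
        d += 1
    return True


# Example: the set of n congruent to 5 modulo 6 is definable with addition alone.
# It contains the primes 5, 11, 17, ... but also the composite computed below.
witness = composite_member(5, 6)
```

Since the set of primes is infinite but contains no composite numbers, it cannot be eventually periodic, and this is the step on which the argument in the text relies.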
So I have argued that the knowledge that the natural numbers are a structure
in the signature of the Peano axioms as opposed to another signature requires
knowledge of the Peano axioms. However, it seems to me that it is also not un-
reasonable to suppose that if one has knowledge that the natural numbers are a
structure in the Peano signature, then one also has knowledge that the natural
numbers are not a structure in various other signatures, such as the Presburger
signature. Indeed, a capacity to rule out various relevant alternatives seems to
be a hallmark of both the knowledge evinced in mathematical practice and the
knowledge of foundational mathematical principles to which we aspire. For in-
stance, it is common in mathematical practice to regard an argument for the
claim that “All A’s are B’s or C’s” as deficient unless one can rule out alternative
claims like “All A’s are B’s” and “All A’s are C’s.” Hence, what this example
suggests is that mathematical knowledge requires knowledge that various relevant
alternatives do not obtain, and it is for this reason that I think it reasonable to
suppose that if one has knowledge that the natural numbers are a structure in the
Peano signature, then one also has knowledge that the natural numbers are not a
structure in various other signatures, such as the Presburger signature.
With all of this in place, I am now in a position to state the thesis around which
the signature problem is centered. This thesis is called the priority thesis,
and it says the following: knowledge of the dual interpretability result from the
structure-based version of the Logicist Template requires knowledge of the Peano
axioms. It is not difficult to see that the priority thesis follows directly from
three claims for which I have presently been arguing. For, I first noted that
knowledge of the dual interpretability result plainly requires knowledge that the
natural numbers are a structure in the signature of the Peano axioms. Then I
argued that knowing that they are a structure in this signature requires knowing
that they are a structure in this signature as opposed to some other signature,
such as the Presburger signature. Finally, I argued that knowing that the natural
numbers are a structure in the Peano signature, as opposed to other signatures
such as the Presburger signature, requires knowledge of the Peano axioms. From
these three claims, one can straightforwardly deduce the priority thesis via two
applications of hypothetical syllogism.
Having argued for the priority thesis, I can now state the signature problem.
The signature problem is that the priority thesis is inconsistent with one natu-
ral conception of the epistemic role of the structure-based version of the Logicist
Template. On this conception, the structure-based version of the Logicist Tem-
plate is supposed to provide a sufficient condition for knowledge of arithmetical
principles such that this sufficient condition could obtain without prior knowledge
of the arithmetical principles. This, I take it, is part of what is traditionally taken
to be exciting and important about logicism: logicism claims to isolate a type of
logical knowledge which could in principle be used to first arrive at knowledge of
arithmetical principles. However, it is easy to see that the priority thesis implies
that the structure-based version of the Logicist Template cannot fulfill this role.
For, this structure-based version says that knowledge of Hume’s Principle and the
knowledge of a dual interpretability result is such a sufficient condition on knowl-
edge of the Peano axioms. But, the priority thesis plainly says that this sufficient
condition cannot obtain without prior knowledge of the arithmetical principles in
question, namely, the Peano axioms.
One might object that this conception of the epistemic role of the structure-
based version of the Logicist Template asks too much. In particular, one might
suggest that all that should be required is that the structure-based version of
the Logicist Template provide a sufficient condition on knowledge of arithmetical
principles, regardless of whether these sufficient conditions could obtain without
prior knowledge of the arithmetical principles. However, I take it that some added
condition such as this is necessary if one wants to separate the kind of sufficient
condition provided by logicism from various other sufficient conditions which are
not at all of interest. For instance, one sufficient condition on knowledge of arith-
metical principles is knowledge of arithmetical and geometrical principles. I take
it that it is obvious that the sufficient condition for knowledge of arithmetical
principles which logicism takes itself to provide is patently different in kind from
this sort of sufficient condition, and it seems that the natural way to distinguish
these two kinds of sufficient conditions is in terms of a requirement that the suf-
ficient conditions on knowledge of arithmetical principles be such that they could
obtain without prior knowledge of the arithmetical principles.
Hence, the signature problem is simply that the structure-based version of the
Logicist Template cannot provide such a sufficient condition, due to the priority
thesis. Further, several of the most straightforward ways in which to
respond to the signature problem do not seem like practicable options. First, one
might suggest that the epistemic role which logicism ought to play is different
in kind from providing sufficient conditions such as these. Second, one might
suggest that knowledge of the signature of the natural numbers does not require
knowing that the signature is not that of various other rival signatures. Third,
one could argue that this latter knowledge does not in turn require knowledge of
the very arithmetical principles which logicism sought to deliver in the first place,
namely the Peano axioms. I do not view any of these straightforward responses as
viable options, and hence I regard the signature problem as a deep and hitherto
unanswered challenge to the structure-based version of the Logicist Template.
Prior to closing, it is helpful to explicitly point out why the signature problem
is not a problem for the theory-based versions of the Logicist Template. With
respect to each version, we can ask after that on which this version claims to
base our knowledge of arithmetical principles. Part of what is striking and in-
triguing about the theory-based versions is that these versions purport to base
our knowledge of arithmetical principles on knowledge that is not at all ostensibly
arithmetical in character, namely Hume’s Principle and Frege’s Theorem, which
are, respectively, a principle about the equality of cardinalities and a technical
theorem about the interpretability of two theories. However, the structure-based
version does base our knowledge of arithmetical principles on something whose
arithmetical character is readily apparent, namely, the knowledge that the natu-
ral numbers are a structure in the signature of the Peano axioms. The signature
problem then arises when we inquire as to what we base this latter knowledge
on. I have suggested that we base our knowledge that the natural numbers are a
structure in the signature of the Peano axioms on knowledge of the Peano axioms
themselves.
Finally, it is worth stating for the record one natural response to the signature
problem which should be of no consolation to the structure-based version of the
Logicist Template. One intuition which might emerge in the course of reflecting on
the signature problem is that it is simply misleading to speak about the signature
of the natural numbers, and that whatever the natural numbers are, they aren’t
something that comes equipped with a signature. Since structures by definition
come equipped with signatures, it is obvious that this thought cannot help the
structure-based version of the Logicist Template. However, it is by no means
obvious that there is not a hitherto unarticulated version of the Logicist Template
which could somehow provide a signature-free account of our knowledge of the
natural numbers.45
1.5 Conclusions and Directions for Further Research
I want to close by contrasting the nature of the challenges which I have pre-
sented for the theory-based and structure-based versions of the Logicist Template.
While taking recourse to different notions of representation, both these versions
suggest that knowledge of arithmetical principles may be based on knowledge
of logical principles and the knowledge that the arithmetical principles are repre-
sented within the logical principles. The most general version of the Logicist Tem-
plate can then be presented in terms of the following valid argument:
Base Premise: The logical principles are known.
Representability Premise: It is known that the arithmetical principles are representable in the logical principles.
Preservation Premise: For all principles P and P∗, if principles P∗ are known, and it is known that P is representable in P∗, then principles P are known.
Conclusion: The arithmetical principles are known.
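The validity of this argument can be checked mechanically. The following is a minimal Lean 4 sketch, in which the names `Principles`, `Known`, and `KnownRepresentable` are hypothetical labels introduced only for illustration: the conclusion follows by instantiating the Preservation Premise at the arithmetical and logical principles and then applying the other two premises.

```lean
-- A sketch only: `Known P` abbreviates "principles P are known", and
-- `KnownRepresentable P Pstar` abbreviates "it is known that P is
-- representable in Pstar". Both predicates are left uninterpreted.
theorem logicist_template
    {Principles : Type}
    (Known : Principles → Prop)
    (KnownRepresentable : Principles → Principles → Prop)
    (logical arithmetical : Principles)
    (base : Known logical)
    (representability : KnownRepresentable arithmetical logical)
    (preservation : ∀ P Pstar : Principles,
        Known Pstar → KnownRepresentable P Pstar → Known P) :
    Known arithmetical :=
  preservation arithmetical logical base representability
```

Since the predicates are uninterpreted, the sketch bears only on validity and says nothing about whether any premise is true on a given notion of representation.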
Expressed in these terms, it seems fair to say that most of the recent literature
on logicism has focused on the Base Premise, i.e., the epistemic status of Hume’s
Principle. It should be evident, but nonetheless bears emphasizing, that nothing
which I have said in this chapter touches on these recent arguments for and against
the logicality (or apriority, or analyticity) of Hume’s Principle.
Rather, expressed in terms of this premise-conclusion argument, my focus in
this chapter has been on the Representability Premise and the Preservation Premise.
The challenges to the theory-based version of the Logicist Template discussed in
§ 1.3 were challenges to the Preservation Premise, since the plethora problem
and the consistency problem suggested that in general it is not true that knowl-
edge can be passed along theory-based interpretations in this way. However, the
Representability Premise is patently true on the theory-based version, since as
mentioned in § 1.3, Frege’s Theorem simply says that the Peano axioms are inter-
pretable in Hume’s Principle. But, by the same token, since this is all that Frege’s
Theorem tells us, it is by no means obvious that the Representability Premise is
true in the structure-based case. In particular, I discussed two challenges to this
in § 1.4, namely, the isomorphism problem and the signature problem. While I
described a way in which to dissolve the isomorphism problem, I have suggested
that the signature problem admits of no such dissolution, and that it plainly
shows that to acquire the knowledge required by the Representability Premise, we
advert to knowledge of the Peano axioms, which was the very knowledge which
the above argument sought to secure. However, it is worth noting that while the
structure-based Representability Premise faces this deep problem, the structure-
based Preservation Premise seems demonstrably true. Indeed, as mentioned in
§ 1.4, truth is demonstrably preserved downward under the structure-based no-
tion of representation. These conclusions are summarized in Figure 1.1, where I
indicate that a given problem is a problem for such-and-such a premise of such-and-
such a version of the Logicist Template by writing that problem in the appropriate
entry of the table.
Hence, expressed in terms of the above argument, the primary conclusion of
this chapter is that each version of the Logicist Template considered here contains
at least one problematic premise. For one who accepts this conclusion but who is
still sympathetic to this brand of logicism, I think that the moral of this chapter is
that more work needs to be done on possible notions of representation which could
support a viable version of the Logicist Template. In particular, it is not clear
that there isn’t some middle ground to be found between the theory-based and
structure-based versions of the Logicist Template. So the challenge here would
be to find some notion of representation which went far enough beyond theories
to overcome the plethora and consistency problems, but which did not do so by
pinning everything down to a particular structure in a particular signature. For
all that I have said in this chapter, it is not at all clear to me that there is not
some notion of representation like this out there capable of supporting a viable
version of the Logicist Template.

                            Theory-Based Version    Structure-Based Version
Representability Premise    (none)                  Isomorphism Problem
                                                    Signature Problem
Preservation Premise        Plethora Problem        (none)
                            Consistency Problem

Figure 1.1. Summary of Problems for Versions of the Logicist Template
1.6 Notes
1 The example of the forks and knives is of course Russell’s perennially apt illustration of Hume’s Principle. Formally, Hume’s Principle is a sentence in an expansion of second-order logic by a function symbol # from properties to objects (cf. Burgess [15] Chapter 1). That is, the idea is that if F is a property, then #F is an object. The notion of a “one-one correspondence” can be formally captured with the idea of a bijection. A map f : F → G is a bijection if it is injective and surjective. The map f : F → G is injective if f(x) = f(x′) implies x = x′, while the map f : F → G is surjective if for every y ∈ G there is x ∈ F such that f(x) = y. Hence, formally, Hume’s Principle is the following sentence:
∀ F, G [#F = #G ↔ ∃ bijection f : F → G] (1.7)
So, as the right-hand side of Hume’s Principle makes clear, the ambient logic of Hume’s Principle is second-order logic. There are of course several alternative semantics for second-order logic, as are described in Shapiro [135] Chapter 4 or Enderton [32] Chapter 4. In particular, these semantics differ from one another in terms of whether the second-order quantifiers range over the entire powerset of the domain. In this chapter, nothing which I shall say will hinge on these differences. For the purposes of the technical results mentioned in this chapter, the only essential feature of second-order logic which I shall use is the full comprehension schema, which says that to each formula there corresponds a property such that the property is predicated of all and only those objects of which the formula holds. The full comprehension schema is needed here because one of the theorems which I discuss here, namely Frege’s Theorem, does not hold if a more restricted version of the comprehension schema is used (cf. Theorem 99 of Chapter 3). Hence, when discussing Hume’s Principle in this chapter, it shall be assumed that the full comprehension schema is included along with it. This of course is not to say that there are no deep and important philosophical issues surrounding the semantics for second-order logic and the status of the full comprehension schema. For the former, see Shapiro [135] Chapter 5 and for the latter, see Dummett [29] Chapter 18 or Feferman [37] pp. 254-258, 289-291. The point of being ambivalent about the semantics of second-order logic is simply that the philosophical theses which I discuss in this chapter do not seem sensitive to the differences between these alternative semantics. The point of assuming the full comprehension schema here is that I am interested in understanding the philosophical consequences of results like Frege’s Theorem, and there are simply fewer of these if the comprehension schema is restricted.
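The definitions of injective, surjective, and bijective maps given in this note can be illustrated on finite collections. The following Python sketch is only an illustration under stated assumptions: properties are modeled as finite sets, the function names are hypothetical, and the cardinality operator # is modeled by `len`. It searches by brute force for a bijection and compares the outcome with equality of cardinalities, as in the two sides of (1.7).

```python
from itertools import permutations

def is_injective(f):
    # f is a dict from F to G; injective if no two arguments share a value.
    values = list(f.values())
    return len(set(values)) == len(values)

def is_surjective(f, codomain):
    # Surjective if every element of the codomain is a value of f.
    return set(f.values()) == set(codomain)

def exists_bijection(F, G):
    # Try every assignment of distinct elements of G to the elements of F.
    F, G = list(F), list(G)
    for image in permutations(G, len(F)):
        f = dict(zip(F, image))
        if is_injective(f) and is_surjective(f, G):
            return True
    return False

forks = {"fork1", "fork2", "fork3"}
knives = {"knife1", "knife2", "knife3"}
spoons = {"spoon1", "spoon2"}

# Finite analogue of Hume's Principle: #F = #G iff a bijection exists.
same_number_fk = (len(forks) == len(knives)) == exists_bijection(forks, knives)
same_number_fs = (len(forks) == len(spoons)) == exists_bijection(forks, spoons)
```

The brute-force search is exponential and is meant only to make the right-hand side of (1.7) concrete for small finite cases; it has no bearing on the second-order setting discussed in this note.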
2 For a summary of this discussion, see for example § 2 of Wright’s essay “Is Hume’s Principle Analytic?” ([161] pp. 7-15, [60] pp. 308-320) or § 8 of MacBride’s survey ([102] pp. 142-150).
3 Of course, one might consider a qualified version of the Logicist Template which only specified how to proceed from knowledge of Hume’s Principle to knowledge of arithmetic in the case where the knowledge of Hume’s Principle was appropriately apriori or otherwise logical in character. It will be clear upon reading, but is worth mentioning, that the objections which I suggest to versions of the Logicist Template in §§ 1.3-1.4 apply a fortiori to versions which are qualified in this manner.
4 For our purposes, we can take the Peano axioms to be given by the following axioms, called the axioms of Robinson’s Q:
(Q1) s(x) ≠ 0
(Q2) s(x) = s(y) → x = y
(Q3) x ≠ 0 → ∃ w x = s(w)
(Q4) x + 0 = x
(Q5) x + s(y) = s(x + y)
(Q6) x · 0 = 0
(Q7) x · s(y) = x · y + x
(Q8) x ≤ y ↔ ∃ z x + z = y
and by the Mathematical Induction Principle:
∀ F [[F(0) & ∀ n (F(n) → F(s(n)))] → ∀ n F(n)] (1.8)
The Mathematical Induction Principle is obviously a second-order principle, since it begins with a universal quantifier over properties. As discussed in endnote 1, I will be assuming the full comprehension schema and I will not be making any assumptions about the semantics for second-order logic. Hence, what I am describing in this paper as “the Peano axioms” is second-order Peano arithmetic, as described and studied in e.g. Simpson [138]. This is to be distinguished from first-order Peano arithmetic, in which the Mathematical Induction Principle is replaced by an infinite schema of formulas, and which is studied in e.g. Hajek and Pudlak [59]. The reason for focusing on second-order Peano arithmetic as opposed to first-order Peano arithmetic here is purely for the sake of simplicity: all of the points which I shall be making here about second-order Peano arithmetic could also be made with respect to first-order Peano arithmetic. This however is not to say that there are not important philosophical considerations surrounding the choice between first- and second-order Peano arithmetic. For a discussion of some of these issues, see Shapiro [135] § 5.3.1.
5 There are a few other contemporary accounts of our knowledge of mathematical induction (and the other Peano axioms), although much less has been written about these views than has been written on logicism. For a brief description of these views, see the second paragraph of Chapter 2, and especially endnotes 48-49. Of course, the entirety of Chapter 2 is devoted to an examination and defense of an older pre-Fregean view that our knowledge of mathematical induction is akin to our knowledge of enumerative induction (cf. especially endnote 51).
6 For instance, Demopoulos and Clark stress this aspect, saying:
Frege’s earliest contribution to the articulation of logicism consisted in showing that the validity of reasoning by induction can be accounted for on the basis of our general knowledge of principles of reasoning discoverable in every domain of inquiry. This directly engages the Kantian tradition in the philosophy of the exact sciences, according to which principles of general reasoning peculiar to our understanding must be supplemented by a faculty of intuition if we are to account for arithmetical knowledge. We are inclined today to view the answer to Kant as requiring the demonstration that mathematical reasoning – in this case, reasoning about the natural numbers – is recoverable as part of logical reasoning ([25] p. 138).
7 For instance, in the introduction to the Grundlagen, Frege says: “One will be able to see from this essay that even inferences which are apparently particular to mathematics, like the inference from n to n+1, are based on general logical laws, so that they do not require particular laws of aggregative thought” ([44] p. iv). There are many other statements to this effect in Frege, both in other places in the Grundlagen ([44] § 80 p. 93, § 108 p. 118) and in the essay “Formal Theories of Arithmetic” ([45] p. 104). In his 1983 book, Wright says:
Anyone who accepts the Peano axioms as truths ‘not of our making’ must recognise the question of what account should be given of our ability to apprehend their truth. If Frege’s attempt to ground that apprehension in pure logic were to succeed, we should have an answer – or at least a reduction of the problem to the more general one of the accounting for our apprehension of logical truths ([158] p. 131, cf. p. xiv).
8 For more precise definitions of structure and theory, see for instance Marker [107] Chapter 1 or Enderton [32] Chapters 1-2. It should also be noted that there is much which is peculiar to the examples given here. For instance, the Zermelo-Fraenkel axioms are recursive sets of axioms, whereas in general a theory need not be recursive. Likewise, both the real and complex numbers are typically taken as structures in the field signature, that is, as structures with distinguished addition and multiplication functions. However, structures can in general have any signature whatsoever, and for instance need not contain any function symbols.
9 More formally, suppose that M and N are two structures in the same signature. Then M and N are said to be isomorphic if there is a one-one map f from M onto N such that M models ϕ(a1, . . . , an) if and only if N models ϕ(f(a1), . . . , f(an)) for every formula ϕ(x1, . . . , xn) in their shared signature and every tuple of elements a1, . . . , an from M, i.e.:
M |= ϕ(a1, . . . , an)⇐⇒ N |= ϕ(f(a1), . . . , f(an)) (1.9)
10 For a proof of the introduction of coordinates in a very general setting, see Artin [3] Chapter 2, and for a more restrictive setting, see Hartshorne [62] Theorem 21.1 p. 137. It should be emphasized that these authors do not state these results in terms of interpretability. In particular, they do not explicitly state that their way of “recovering” the field in the geometry or vice-versa is definable. However, reading through the proofs, one can easily check that everything that they are doing is definable. This is a fact which is sometimes cited in discussions about interpretability in mathematical logic texts. See, for example, Hodges [70] pp. 222-223 Example 1.
11 Here, as with the interpretability of structures, in order to make this definition precise, we would need to present in more detail our notion of proof and translation. This definition is time-consuming, and typically takes up one to two full pages in a mathematical logic textbook. See, for example, the presentations in Lindstrom [99] pp. 96-97 and Hajek and Pudlak [59] pp. 148-149. Hence, for the sake of brevity, we have chosen to exposit this notion here with the example of the Peano axioms and the Zermelo-Fraenkel axioms.
12 It is a classical result of model theory that the theory of a dense linear order without endpoints is complete, so that if we identify theories which are provably equivalent, then the theory of a dense linear order without endpoints is the same as the complete theory of the rational numbers as a linear order. See for example, Marker [107] Theorem 2.4.1 p. 48.
13 For more examples of faithful interpretability, see Lindstrom [99] Chapter 6, § 2, pp. 106 ff.
14 For stronger notions of interpretability, see Visser [148] § 3.3 and Hodges [70] § 5.3.(c) pp. 222-225.
15 For a proof of Frege’s Theorem, see Wright [158] pp. 154-169 or Boolos [12], or Theorem 27 of Chapter 3. For a discussion of where Frege’s Theorem appears in the texts of Frege, see Boolos and Heck [14].
16 Heck articulates this objection in a few paragraphs of papers which have other goals, for instance, a study of the epistemology of counting and an exposition of the elements of Frege’s Theorem ([67], [66]). On Heck’s version of the objection, the concern is with using a theory-based interpretability result (such as Frege’s Theorem) to secure an inference from the analyticity of Hume’s Principle to the analyticity of the Peano axioms. For, as Heck notes, axiomatizations of geometry are likewise interpretable in Hume’s Principle, and thus one can ask the following rhetorical questions about the analyticity of geometry:
Does it then follow that, on Frege’s view, Euclidean geometry must be analytic? That would be unfortunate, for Frege explicitly agreed with Kant that the laws of Euclidean geometry are synthetic apriori ([66] p. 59).
Frege held that analysis, as well as arithmetic, was analytic. He did not, however, regard all of mathematics as analytic, since he agreed with Kant that Euclidean geometry is synthetic a priori. But the truths of Euclidean geometry can be proven in analysis (given suitable definitions). Were Frege’s views inconsistent then? ([67] p. 188).
One can also find versions of this objection in Hoering and Hofweber, each of whom is concerned to emphasize the distance between intertheoretic reduction and theory-based interpretability. For instance, Hoering notes that on theory-based notions of reduction, arithmetic would “reduce all recursively enumerable theories” ([71] p. 33, cf. p. 36). Likewise, Hofweber argues:
[. . . ] to see that relative interpretation alone does not provide a theory reduction (in the present sense of the word), consider the following: take a first order formulation of some physical theory. By the result mentioned above it is relatively interpretable in some arithmetic theory. But, of course, the former theory can not be reduced, in the intuitive sense of the word, to the latter. The former is about physical objects, the latter about numbers. The former is not just a special case of the latter. So, reduction requires more than just an association of the relevant formal languages in the right way ([72] p. 132).
17 The easiest way to see this is to note three things. First, by the work of Tarski, the complete theories of the real and complex numbers are recursively axiomatizable (cf. Marker [107] Corollary 3.2.3 p. 85 and Corollary 3.3.16 p. 97). Second, by formalizing Henkin’s proof of the completeness theorem, one can show that if the Peano axioms prove the consistency of a recursively axiomatizable theory, then they interpret that theory (cf. Lindstrom [99] Theorem 4 p. 99 and Hajek and Pudlak [59] Theorem 2.39 p. 169). Third, it is easy to show by a model construction within Peano arithmetic that the Peano axioms prove the consistency of the recursively axiomatizable fragments of the complete theories of the real and complex numbers (cf. Simpson [138] Theorem II.9.4 p. 97 and Theorem II.9.7 p. 98).
18 For the standard proof using the intermediate value theorem, see Lang [95] pp. 272-273. For a complex-analytic proof, see Greene and Krantz [58] Theorem 3.4.5 p. 87. For the proof using algebraic topology, see Hatcher [63] Theorem 1.8 p. 31.
19 This is a particular instance of a more general response to the types of counterexamples to the theory-based versions of the Logicist Template which I shall consider in this section. All of these counterexamples point out that in general, it is false that knowledge of the interpreting theory and knowledge of the interpretation yield knowledge of the interpreted theory. Thus, it is always open to the advocate of the theory-based version of the Logicist Template to respond to such a counterexample by saying that they are not appealing to such a general claim, but rather to some more circumspect claim centered around a more nuanced notion of interpretation germane to theories. Of course, it is ultimately incumbent on such an advocate to also show that the Peano axioms (or some other arithmetical theory) are interpretable in Hume’s Principle (or some other logical theory) in this more nuanced sense.
20 This follows from the fact that no complete consistent extension of the Peano axioms is computable (cf. Tarski et al. [145] Theorem 9 p. 60). However, if the Peano axioms were interpretable in the complete theory of the complex numbers, then a model of the Peano axioms would be interpretable in the complex numbers. But the complete theory of any model in a finite signature which is interpretable in the complex numbers is computable since the complete theory of the complex numbers is computable, and likewise for the real numbers (cf. Marker [107] Corollary 3.2.3 p. 85 and Corollary 3.3.16 p. 97).
21 Boolos proves the consistency of Hume’s Principle relative to the Peano axioms by showing that Hume’s Principle is interpretable in the Peano axioms ([10], [13] p. 190).
22 However, it should be emphasized that the plethora problem will still be a problem if one allows examples of more contrived theories. For instance, given any initial theory, one can always form a composite theory by gluing the initial theory to the Peano axioms. For instance, this composite theory would say: there are only two types of things, numbers and non-numbers, and while the numbers obey the Peano axioms, the non-numbers obey the axioms of the initial theory. Moreover, if the initial theory is interpretable in the Peano axioms, then the composite theory will be mutually interpretable with the Peano axioms and hence with Hume’s Principle (by Frege’s Theorem and the aforementioned result of Boolos). However, it seems to me that it is uncharitable to use such contrived examples against the logicist. For, presumably the idea behind the Logicist Template is that the theories or principles in question are intended to be theories or principles which we have some stake in and whose truth we are interested in assaying, and it does not seem that these composite theories are theories in this sense.
23 For, Dedekind showed that by beginning with a model which satisfies all the Peano axioms besides the Mathematical Induction Principle, one can obtain a model of all the Peano axioms by restricting the domain to all those numbers which satisfy the Mathematical Induction Principle. In the terminology of Was sind und was sollen die Zahlen?, Dedekind begins with an infinite system and obtains a simply infinite system by the judicious use of chains („72. Satz. In jedem unendlichen System S ist ein einfach unendliches System N als Teil enthalten“ [Theorem 72. Every infinite system S contains a simply infinite system N as a part], [23] vol. 3 pp. 359-360). For the other direction of the result, a model of the anti-Peano axioms needs to be uniformly defined within each model of the Peano axioms. This can be done as follows: working within the Peano axioms, build a model whose domain is an isomorphic copy of the natural numbers plus some other element, call it an infinite number. Then define zero and successor on the natural numbers as normal and define the successor of the infinite element to be itself (and adjust the definition of addition and multiplication by this infinite element accordingly). Then this model satisfies the axioms of Robinson’s Q (as described in endnote 4). Further, since one can easily prove by the Mathematical Induction Principle that no number is its own successor, this model fails to model the Mathematical Induction Principle.
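The second construction in this note can be made concrete with a small computation. The following Python sketch is only an illustration under stated assumptions: the label `INF`, the function names, and the particular clauses for addition and multiplication involving the infinite element are one hypothetical way of carrying out the adjustment mentioned above, and the quantifiers in (Q1)-(Q8) are checked only over a finite sample of the domain rather than proved for the whole model.

```python
INF = "infinity"  # hypothetical label for the one extra, non-standard element

def succ(x):
    # Successor is as usual on the numbers; the infinite element is its own successor.
    return INF if x == INF else x + 1

def add(x, y):
    # Assumed clause: any sum involving the infinite element is the infinite element.
    return INF if INF in (x, y) else x + y

def mul(x, y):
    if y == 0:
        return 0  # forced by (Q6), even when x is the infinite element
    # Assumed clause: every other product involving the infinite element is infinite.
    return INF if INF in (x, y) else x * y

def leq(x, y):
    # Everything is below the infinite element; it is below nothing but itself.
    return y == INF or (x != INF and x <= y)

def check_q(sample):
    """Check (Q1)-(Q8) with all quantifiers restricted to the finite sample."""
    pairs = [(x, y) for x in sample for y in sample]
    return {
        "Q1": all(succ(x) != 0 for x in sample),
        "Q2": all(x == y for x, y in pairs if succ(x) == succ(y)),
        "Q3": all(any(succ(w) == x for w in sample) for x in sample if x != 0),
        "Q4": all(add(x, 0) == x for x in sample),
        "Q5": all(add(x, succ(y)) == succ(add(x, y)) for x, y in pairs),
        "Q6": all(mul(x, 0) == 0 for x in sample),
        "Q7": all(mul(x, succ(y)) == add(mul(x, y), x) for x, y in pairs),
        "Q8": all(leq(x, y) == any(add(x, z) == y for z in sample) for x, y in pairs),
    }

def not_own_successor(x):
    # The property used to witness the failure of induction.
    return succ(x) != x

sample = list(range(8)) + [INF]
q_holds = all(check_q(sample).values())
base_case = not_own_successor(0)
inductive_step = all(not_own_successor(succ(x)) for x in sample if not_own_successor(x))
conclusion = all(not_own_successor(x) for x in sample)
```

On this sample the eight axioms of Robinson’s Q check out, and the property of not being one’s own successor holds of zero and is preserved by successor, yet fails of the infinite element, which is the instance of the Mathematical Induction Principle that the model violates.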
24 It is important to note that the case which I have made against the theory-based version of the Logicist Template centered around mutual interpretability is stronger than the case which I have made against the version of the Logicist Template centered around mutual faithful interpretability. Even though both arguments are focused around the consistency problem, the counterexample in the latter case (i.e. the example of the dense linear order with and without endpoints) is more contrived than the counterexample in the former case (i.e. the example of the Peano axioms and the anti-Peano axioms). It seems to me that the case against the version centered around mutual faithful interpretability would be made stronger to the extent that the examples illustrative of the consistency problem were less contrived and closer to the examples which the logicist might actually be concerned with, such as the Peano axioms and Hume’s Principle. This, of course, is related to the point about contrived theories made in endnote 22. However, the problem here is that very few non-trivial natural examples of mutually faithfully interpretable theories are presently known. For instance, it seems to be presently unknown whether the Peano axioms and the anti-Peano axioms are mutually faithfully interpretable, even though one can say something about the unfaithfulness of particular interpretations. Likewise, it seems to be presently unknown whether Hume’s Principle is mutually faithfully interpretable with the Peano axioms (although again one can say something about the unfaithfulness of particular interpretations). I suspect that Hume’s Principle is not mutually faithfully interpretable with the Peano axioms. For, it is relatively easy to come up with sentences which are independent of Hume’s Principle (e.g. the sentence that every object is the cardinality of some property). However, presently there are very few known examples of sentences which are independent of the Peano axioms, and the faithful interpretability of Hume’s Principle in the Peano axioms would allow one to automatically transfer any independence result about Hume’s Principle into an independence result about the Peano axioms. For, faithful interpretability would require that non-provability facts about Hume’s Principle be mirrored by non-provability facts about the Peano axioms.
25 The particular weak set of arithmetical principles which Nelson was concerned with were the axioms of Robinson’s Q (cf. endnote 4). Nelson’s “compatibility problem” was then the problem of determining whether Q + ϕ + ψ is mutually interpretable with Q whenever Q + ϕ and Q + ψ are mutually interpretable with Q ([117] p. 63). Supposing that Robinson’s Q is consistent, it cannot be mutually interpretable with an inconsistent theory like Q + ϕ + ¬ϕ. Hence, under this supposition, a positive resolution to the compatibility problem would have had the implication that at least one of Q + ϕ and Q + ¬ϕ is not mutually interpretable with Q. However, Kalsbeek later found examples of sentences ϕ such that Q + ϕ and Q + ¬ϕ are mutually interpretable with Q (cf. Iwan [78] p. 151, Buss [17] p. 194).
26 For a proof of the Guaspari-Lindstrom theorem and bibliographic references, see Lindstrom’s book [99] Theorem 6 pp. 103, 115. For an example of this theorem being cited by a set-theorist, see Steel [38] p. 427.
27 For instance, consider the opening paragraph of Tait’s famous paper on finitism:
The crux to understanding Hilbert’s conception of finitist mathematics is this question: In what sense can we prove general propositions, such as ∀ x, y x + y = y + x about the natural numbers, without assuming the infinitude of numbers or some other infinite totality? For, if there is to be nontrivial finitist mathematics, one must be able to prove such propositions. Indeed, Hilbert was concerned with consistency proofs for formal systems, which are proofs of just this sort ([143] p. 524).
It is of course not lost on set-theorists such as Steel that there is a connection between Π⁰₁-sentences and Hilbert’s program. Indeed, where Steel invokes the Guaspari-Lindstrom theorem, he describes it as part of an “instrumentalist dodge” ([38] p. 423).
28 So this response to the plethora problem has focused on the example which I discussed in regard to the plethora problem in § 1.3, namely, the example of the Fundamental Theorem of Algebra. Hence, outside of disputing my discussion of how the structure-based version handles this example, one way in which to insist that the plethora problem is nonetheless a problem for the structure-based version would be to suggest that the structure-based version flounders in its treatment of other examples. I cannot presently see any relevant difference between the example of the Fundamental Theorem of Algebra and other examples, such as the examples of Euclidean geometry and codifications of physical theory discussed by Heck, Hofweber, and Hoering and mentioned in endnote 16. However, I would be remiss to suggest that I have done anything here besides discuss how the structure-based version can respond to a few examples which seem to me to be representative of the general problem. In particular, I do not purport to have given a proof that the plethora problem cannot reemerge for the structure-based version.
29 Here is the proof of the result that if theory T∗ is true of structure M∗ and if theory T about structure M is dual interpretable in theory T∗ about structure M∗, then theory T is true of structure M. By definition, such dual interpretability simply means that T is interpretable in theory T∗ and structure M is interpretable in structure M∗, and the definitions used in both interpretations are the same. Since T is interpretable in T∗, every model of T∗ uniformly interprets a model of T, where “uniformly” means that the same definitions are used each time. Hence, the model M∗ of T∗ uniformly interprets a model N of T. But by the hypothesis of dual interpretability, the way that M∗ defines N is exactly the same way that M∗ defines an isomorphic copy of M. Hence, N is identical to an isomorphic copy of M, and since N is a model of T, we have that M is a model of T.
30 However, it should be clear that if we do not insist on pinning everything down to a particular structure, then the consistency problem will reemerge. For instance, using the material from § 1.3, it is easy to find examples where theory T + ϕ about structure M is dually interpretable in theory T + ¬ϕ about structure M∗, and vice-versa. The reason that these examples are not troublesome in this structure-based setting is that here our knowledge of ϕ is elliptical for our knowledge that ϕ is true on some fixed structure, so that there is no contradiction between my knowing that ϕ is true on M and false on M∗.
31 For an example of an author who seems to endorse a qualified version of this thesis, see endnote 35.
32 Here I am making what I regard to be a natural assumption about our knowledge of existential statements. In particular, I am assuming that in this setting our knowledge that ∃ x Fx is induced by there being some a such that it is known that Fa. Specifically, the knowledge in question here is knowledge that “M is interpretable in M∗.” By definition, this is knowledge that “there is a structure N such that M is isomorphic to N and N is definable in M∗.” I am assuming that in this setting we would acquire this knowledge by there being some N such that it is known that “M is isomorphic to N and that N is definable in M∗.”
33 To be sure, there is a weaker type of invariance which is present in the case of definability, in that it is demonstrably true that if one structure is definable in another, then any isomorphic copy of the first is definable in some isomorphic copy of the second (and likewise any isomorphic copy of the second defines some isomorphic copy of the first).
34 This is Dedekind’s letter to Weber from January 24, 1888, where Dedekind expresses some doubt about defining natural numbers in the manner suggested by Frege at the end of § 68 of the Grundlagen, saying:
Something similar holds of the definition of cardinal number as a class; much will be said of this class (for instance, that it is a system of infinitely many elements, namely a system of all those systems with which it is bijective (ähnlich)) which one would most certainly assert most reluctantly of the number itself; indeed, does anyone actually consider, or would it not be better forgotten, that the number four is a system of infinitely many elements? ([23] vol. 3 p. 490).
In Chapter 2 of his 1910 On the Concept of Substance and the Concept of Function, Cassirer presents a Dedekind-inspired critique of Frege, saying, for instance: “[. . . ] for what is here logically deduced does not coincide with the actual sense which we attach to judgements of number in everyday cognition” ([18] p. 62). Cassirer’s critique of Frege later indirectly influenced Benacerraf’s dissertation on logicism (cf. [7] p. 162 fn), and one can see some of this influence in Benacerraf’s essay on Frege, where for instance Benacerraf tells us that what the logicist needs is “[. . . ] an argument that the sentences of arithmetic, in their preanalytic senses, mean the same (or approximately the same) as their homonyms in the logicist system” ([8], [24] p. 46).
35 For instance, Resnik says that “no mathematical theory can do more than determine its objects up to isomorphism” ([128] p. 529). Resnik elaborates on this thought more extensively in his book, putting it in terms of our capacity to describe structures:
But no mathematical description of a pattern– not even one by means of a categorical set of axioms– will differentiate its occurrences within other patterns from each other or from its occurrences in isolation; unless the description also states that the pattern occurs within a certain containing pattern. For mathematics only describes structures up to isomorphism, except when it describes them as embedded in other structures ([129] p. 220).
Even though Resnik does not cast this claim about our capacity to describe structures in epistemic terms, it seems obvious that he is talking about our capacity to accurately describe structures, and so it seems not unreasonable to interpret the first part of this quotation as an endorsement of the invariance thesis. It is, however, important to note the last qualification in the quotation from Resnik. For, in this last qualification it seems that Resnik permits one and only one exception to the invariance thesis, namely, the case of the definability of one structure within another, which is precisely the case at issue in the isomorphism problem. However, Resnik does not go on to explain why the case of definability should be an exception to what I have called the invariance thesis. The explanation which I will go on to offer is that while the invariance thesis may be true with regard to intrinsic properties of structures, it is not true with respect to relational properties of structures.
In contrast to Resnik, it does not seem that structuralists such as Shapiro and Parsons are committed to anything like the invariance thesis. It is of course difficult to provide succinct textual evidence for a negative existential, but something in Shapiro’s writings which is suggestive of this is his remark that the identity relation between structures is a matter of convenience: “Like the identification of places from different structures [. . . ], the identity relation [between structures] we need is more a matter of decision or invention, based on convenience, than a matter of discovery” ([136] p. 92). In the case of Parsons, while there are some places where he indicates a desire to focus on properties which are invariant under isomorphism ([120] p. 75), there is by no means any sort of endorsement of the invariance thesis. Something indicative of this is the different reading which Parsons gives to the admittedly ambiguous phrase of “specify up to isomorphism.” In the invariance thesis, and in the above quotation from Resnik, the idea is that our knowledge of structures is only of those properties which are invariant under isomorphism. However, Parsons instead focuses attention on our capacity to describe a class of structures such that any two structures in this class are isomorphic: “[. . . ] the natural numbers are at least determinate up to isomorphism: If two structures answer equally well to our conception of the sequence of natural numbers, they are isomorphic. I will call this latter thesis the Uniqueness Thesis” ([120] p. 272).
It seems that what Parsons calls the uniqueness thesis is independent of what I have called the invariance thesis. For instance, to anticipate some of what I shall say about the distinction between intrinsic and relational properties of structures, it seems that the uniqueness thesis could succeed and the invariance thesis could fail if one had a theory such that (i) one knew that all of its models were isomorphic and such that (ii) one knew that only some of its models were computable (or constructible, or Borel). Likewise, with regard to some particular theory (say the Peano axioms or Zermelo-Fraenkel axioms for set theory), one might be convinced of the invariance thesis and yet be equally convinced by the Löwenheim-Skolem theorems that not all of the models of this theory are isomorphic to one another. Or, to take a more pedestrian example, one might simply note that there are both finite and infinite models of the theory of groups, and yet still think that any property of a particular group of which one knows conforms to the invariance thesis.
36 For instance, Parsons says: “By this [the structuralist view] I mean the view that reference to mathematical objects is always in the context of some background structure, and that the objects involved have no more by way of a ‘nature’ than is given by the basic relations of the structure” ([120] p. 40). Shapiro puts this point in terms of a dependence thesis: “Each mathematical object is a place in a particular structure. There is thus a certain priority in the status of mathematical objects. The structure is prior to the mathematical objects it contains, just as any organization is prior to the offices that constitute it” ([136] p. 78). Shapiro thinks that this dependence thesis implies that judgments of identity between mathematical objects in different structures are illegitimate, saying: “But it makes no sense to pursue the identity between a place in the natural-number structure and some other object, expecting there to be a fact of the matter” ([136] p. 79). In his original article on structuralism, Resnik concedes the following: “I have discussed several equivalence relations between patterns– congruence, mutual occurrence, equivalence– but have failed to propose any identity conditions for patterns. I will not; and that brings me to what I find the most difficult point of my theory– the restriction of identity to within patterns” ([128] p. 536). In his later book, Resnik is adamant on this point, saying: “[. . . ] restricting identity to positions in the same pattern goes hand in hand with their failure to have any identifying features independently of a pattern” ([129] p. 211).
37 To put it very roughly, one set of natural numbers X is Turing computable from another set of natural numbers Y if there is a fixed program which, given any input n and allowed access to arbitrarily large initial segments of Y, can determine if n is in X. (For more details, see the definition of X ≤T Y in Soare [139] § III.1). Likewise, to give a very rough sketch, one set X is constructible relative to another set Y if X is in Y or X is a definable subset of Y, or X is a definable subset of a definable subset of Y etc., where this process is iterated along all the ordinals. (For more details, see the definition of L(Y) in Jech [80] p. 193). Finally, a subset X of a topological space Y is said to be Borel if X is one of the open subsets of Y, or X is the complement of one of these subsets, or X is a countable union of one of these two kinds of sets, etc., where this process is iterated along all the countable ordinals. Of course, while one can define Borel-ness relative to any topological space Y, most of the theorems of descriptive set theory hold only if Y is a Polish space, i.e., it is separable in that it has a countable dense set, like the rationals in the reals, and it is completely metrizable in that there is a metric giving the topology such that every Cauchy sequence converges, as in the real numbers. (For more details, see Kechris [84] §§I.3, II.11).
38 For an introduction to computable model theory, inner model theory, and those parts of descriptive set theory concerned with Polish and Borel structures, see for example Harizanov ([61]), Mitchell ([111]), Gao ([51] Chapter 2), and Montalbán-Nies ([112]).
39 Of course, in some particular cases, it may happen that isomorphisms preserve these properties. For instance, in the case of constructibility, a simple case of Gödel’s Condensation Lemma says that if X is transitive and (X,∈) is isomorphic to (Lα,∈) for a limit ordinal α, then X = Lα, and so X too is constructible (cf. [26] Theorem 5.2 p. 80).
40 The signature of first-order Presburger arithmetic is the signature of the structure (ω,+). It is clear that the ordering is definable in this structure, since x ≤ z if and only if ∃ y x + y = z. Likewise, the constants zero and one are definable in this structure since zero is less than all the other natural numbers, and one is the least non-zero natural number. This signature is named after Presburger, who in 1930 gave a complete axiomatization of this structure. This structure, and Presburger’s axiomatization, admit quantifier elimination if one adds both unary relation symbols Pn(x) for divisibility by n and symbols for ≤ and 0 and 1. That is, in this enriched signature, every definable set is definable by a quantifier-free formula, and this fact is registered in Presburger’s axiomatization. For a proof of this, along with the completeness of Presburger’s axiomatization, see Marker [107] pp. 81 ff.
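The definitions of the ordering, zero, and one just given can be checked mechanically on an initial segment of the natural numbers. What follows is only a minimal sketch in Python and is not part of Presburger's development: the names BOUND, DOMAIN, leq, is_zero, and is_one are hypothetical, and the restriction of the quantifiers to the finite range 0..BOUND is an assumption made solely so that the search terminates.

```python
# Bounded sketch: quantifiers range over 0..BOUND only (an assumption made
# so that the search terminates; the real structure is all of omega).
BOUND = 50
DOMAIN = range(BOUND + 1)


def leq(x, z):
    """x <= z, defined using addition alone: there is a y with x + y = z."""
    return any(x + y == z for y in DOMAIN)


def is_zero(x):
    """x is zero: x is below every element of the domain."""
    return all(leq(x, z) for z in DOMAIN)


def is_one(x):
    """x is one: x is non-zero and below every non-zero element."""
    return (not is_zero(x)) and all(
        leq(x, z) for z in DOMAIN if not is_zero(z)
    )


# The defined relation agrees with the usual ordering on the bounded domain.
assert all(leq(x, z) == (x <= z) for x in DOMAIN for z in DOMAIN)
```

The only arithmetical operation used anywhere in the sketch is addition, which is the point of the claim that these notions are definable in (ω,+).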
41 Let me explain in more detail what I mean when I say that the Presburger signature has the resources to express the primality of individual natural numbers. For each natural number n, consider the formula mentioned in the above endnote for divisibility by n, namely
Pn(x) ≡ ∃ y [ y + · · · + y = x ]   (with n summands y)   (1.10)
Using these formulas, for each natural number n > 1, we can find sentences ϕn in the Presburger signature such that (ω,+) |= ϕn if and only if n is prime. In particular, we can choose the sentence
ϕn ≡ ⋀ 1<m<n ¬Pm(1 + · · · + 1)   (with n summands 1)   (1.11)
(Here, of course, we use the fact, mentioned in the previous endnote, that the number one is definable in the Presburger signature). It seems to me quite intuitive to say that ϕn expresses the primality of n: for, quite plainly, its syntax says that for any natural number m with 1 < m < n, it is not the case that m divides n.
But, it is important to note that for every subset X of natural numbers, there is a sequence of sentences ψn such that (ω,+) |= ψn if and only if n is in X. For instance, just let ψn say that 0 = 0 if n is in X and let ψn say that 0 = 1 if n is not in X. However, this does not guarantee that the Presburger signature has the resources to express the X-ness of individual natural numbers. For, there is a syntactic uniformity in the case of the ϕn’s which is lacking in the case of the ψn’s.
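The contrast between the ϕn's and the ψn's can be illustrated by a short program. This is only a sketch under stated assumptions: the function names divides, phi, and psi are hypothetical, the existential quantifier in Pn is searched over the range 0..x, and ϕn is read as the prose glosses it, with m ranging over 1 < m < n. The evaluation of ϕn is built from n by one fixed recipe which uses only addition, whereas ψn cannot be written down at all unless membership in X is supplied from outside.

```python
def divides(n, x):
    """P_n(x): there is a y such that y + ... + y (n summands) equals x."""
    return any(sum([y] * n) == x for y in range(x + 1))


def phi(n):
    """phi_n: no m with 1 < m < n divides the numeral 1 + ... + 1 (n ones)."""
    numeral = sum([1] * n)
    return all(not divides(m, numeral) for m in range(2, n))


def psi(n, X):
    """psi_n for a given set X: the sentence depends on an oracle for X."""
    return "0 = 0" if n in X else "0 = 1"


# The uniform recipe picks out exactly the primes among 2..29.
primes_below_30 = [n for n in range(2, 30) if phi(n)]
```

Note that psi takes X as a second argument: nothing in the syntax of the resulting sentence reflects what it is for a number to be in X.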
42 Let me briefly explain why this is. Suppose that X is an infinite subset of natural numbers which is definable in the structure (ω,+). By the quantifier-elimination result mentioned two endnotes ago, this set is defined by a quantifier-free formula in the structure (ω, 0, 1,+,≤, Pn). Then the result follows by an enumeration of cases. For instance, if n > 1 then the formula Pn(x) holds of non-primes since 2n is non-prime, and likewise the formula ¬Pn(x) holds of many non-primes, since it holds of p² for any prime p > n.
43 It is not obvious that one can make any stronger claim here. For instance, it is by no means obvious what it even means to say that such-and-such a result must be proven by means of the Peano axioms. Hence, even though sometimes I shall put the point by saying that knowledge of the Peano axioms is required for knowledge of arithmetical signature, this should be understood as elliptical for the more circumspect claim that our knowledge of arithmetical signature is as a matter of fact based on our knowledge of the Peano axioms. So one way in which to disagree with this claim would be to illustrate some alternative manner in which we could secure knowledge of the arithmetical principles which grounds our knowledge of signature. Further, it should be mentioned that one does not use all of the Peano axioms to prove the arithmetical results in question here. This, of course, is obvious, simply because both first- and second-order Peano arithmetic (cf. endnote 3) are not finitely axiomatizable (cf. Hájek-Pudlák [59] Corollary III.2.24 p. 164, Simpson [138] Corollary VII.7.8 p. 306), and hence one does not use all of the Peano axioms to prove any particular result or any finite set of particular results. However, in the text I will pass over this, as I don’t think that this affects the overall point which I am making about the signature problem. If one so desires, wherever in the text I say things like “knowledge of arithmetical signature requires knowledge of the Peano axioms,” one can replace this with “knowledge of arithmetical signature requires knowledge of some non-trivial finite segment of the Peano axioms.”
44 I have focused the discussion around the Presburger signature, but there are other signatures which one could have likewise used. For instance, it is likewise easy to find many natural arithmetical concepts which are not expressible in the signature which just contains successor s(x) = x + 1, since any definable subset of (ω, s) is finite or its complement is finite (cf. [107] Exercise 3.4.3 p. 104). Similarly, one might consider the second-order structure (ω, P(ω), s) where one is only allowed to quantify over subsets of natural numbers but not over subsets of pairs of natural numbers or subsets of triples of natural numbers etc. Büchi showed that the subsets of natural numbers which are definable in this structure correspond to subsets which are recognizable by certain finite automata (cf. [90] Theorems 3.10.3-3.10.4 pp. 201-202).
45 There are various extant projects in the philosophy of mathematics which seek to render theories and structures free from particular signatures, and I want briefly to indicate some of my reservations about the extent to which these projects could be successfully imported into the setting of the Logicist Template. In developing his brand of structuralism, Resnik expressed some reservations about having to describe structures as structures within a particular signature, saying:
Most mathematicians and logicians would regard number theory developed in a language in which the successor symbol is primitive as essentially the same as a development taking the less than symbol as primitive. Since I am viewing number theory as the science of a certain pattern or patterns, this would suggest that (N,S) and (N,<) [the natural numbers with successor, and the natural numbers with less than] should count as the same or essentially the same pattern. [. . . ] Moreover, they are not isolated examples of non-isomorphic structures which mathematicians view as essentially the same: we have Boolean Algebras in the form of rings, but also in the form of lattices, alternative definitions of groups and topologies, and so on ([128] p. 535, cf. [129] pp. 207-208).
Resnik later suggests a way to handle this problem by defining a certain equivalence relation on structures. In particular, Resnik says that two structures are equivalent if they have the same domain and the constants, relations, and functions of the one are definable in the other, and vice-versa ([128] p. 536, [129] pp. 208-209). (Note that the notion of pattern occurrence which Resnik uses is explicitly cast in terms of definability ([128] p. 533, [129] p. 205)). One concern which I have is that this is too fine-grained an equivalence relation for the purposes of the structure-based version of the Logicist Template. For instance, no two of the following three structures are pairwise equivalent: (ω, s), (ω,+), and (ω,+,×). For, the even numbers are definable in (ω,+), while any definable subset of (ω, s) is finite or cofinite (cf. endnote 44). Likewise, multiplication is not definable in (ω,+) since if we could define multiplication, then we could define the set of primes (cf. endnote 42). Hence, if one modified the structure-based version of the Logicist Template so that it was phrased in terms of equivalence classes of structures, then one could still ask why the natural numbers are a structure in the equivalence class of (ω,+,×) and not in the equivalence class of (ω,+).
Another extant project which seems relevant to the signature problem is Feferman and Lavine’s notion of a schematic theory ([36] § 1.4 pp. 6 ff, [96] § 5.7 pp. 117 ff). The idea here is that one begins with an initial theory T which contains schemata, in the way that first-order Peano arithmetic contains the mathematical induction schema (cf. endnote 3). The schematic theory T∗ associated to T is then a mapping from L-signatures extending the signature of T to an L-theory extending T which contains all the new instances of T’s original schemata. For instance, in the case of first-order Peano arithmetic, the passage from T to T∗ would reflect a disposition to accept an instance of the mathematical induction schema regardless of this instance’s signature. Feferman and Lavine’s method is admittedly an elegant way to avoid particular signatures in the case of theories. However, it is not clear what the possible analogue of this in the case of structures would be. Of course the general idea would be to begin with a particular structure M and then define the schematic structure M∗ to be a mapping from L-signatures extending the signature of M to an L-structure which expands M. In the case of Feferman and Lavine’s notion of a schematic theory, it is obvious how to define the extension of T to the new signature: one just adopts all the new instances of T’s original schemata. However, it does not seem like there is an analogue of this move in the case of structures. For instance, when I add a new unary relation symbol to M’s signature, I have to choose a particular subset of M to answer to this new symbol, and it is not obvious that anything about M is going to be able to guide me in this choice in the way that T guides the expansion to the new signature.
CHAPTER 2
EMPIRICISM, PROBABILITY, AND KNOWLEDGE OF ARITHMETIC:
A PRELIMINARY DISCUSSION
2.1 Introduction: Inceptive and Amplificatory Empiricism
The topic of this chapter is the tenability of a certain type of empiricism about
our knowledge of the Peano axioms. The Peano axioms constitute the standard
contemporary axiomatization of arithmetic, and they consist of two parts: a set
of eight axioms called Robinson’s Q, which ensure the correctness of the addition
and multiplication tables, and the principle of mathematical induction, which says
that if zero has a given property and n + 1 has it whenever n has it, then all
natural numbers have this property.46 The type of empiricism about the Peano
axioms which I want to consider here does not claim that perception can pro-
vide us with knowledge of these axioms in the same way that perception can
provide us with knowledge of the properties of middle-sized objects. Rather, the
idea behind the type of empiricism which I want to consider is that arithmetical
knowledge is akin to the knowledge by which we infer from the past to the fu-
ture, or from the observed to the unobserved. It is not uncommon today to hold
that such inductive inferences can be rationally sustained by appeal to informed
judgments of probability. The goal of this chapter is to articulate and evaluate
an empiricism which contends that the Peano axioms can be fully justified by
recourse to judgments of probability.
This empiricism merits our attention primarily because there are few contem-
porary accounts of our knowledge of the Peano axioms, and those accounts which
we do have seem to face deep problems. Logicism, for example, suggests that
knowledge of the Peano axioms may be based on knowledge of ostensibly logical
principles– such as Hume’s Principle– and the knowledge that the Peano axioms
are representable within these logical principles. The success of logicism thus
hinges upon identifying a concept of representation which can sustain this infer-
ence, and as I have argued elsewhere, it seems that we presently possess no such
concept.47 Alternatively, some structuralists have suggested that knowledge of
the Peano axioms may be based on our knowledge of the class of finite structures.
However, this account then owes us an explanation of why the analogues of the
Peano axioms hold on the class of finite structures: why, for example, there is no
finite structure which is larger than all the other finite structures.48 Finally, it
has been recently suggested that the natural number structure is itself perceiv-
able in a way which would justify the Peano axioms, or would at least justify the
satisfiability of these axioms.49 However, it seems difficult to see how such an infi-
nite structure is perceivable in anything like the same sense in which middle-sized
objects are perceivable, and thus this account owes us some explanation of what
the perception-like relation is which we bear to the natural number structure, and
why this perception-like relation should be a source of justification, despite the
manifest differences between it and our ordinary modes of perception.50
The second reason that a probability-based empiricism about the Peano ax-
ioms merits our attention is that it has been suggested in different ways by both
historical and contemporary sources. For instance, prior to Frege, a not uncom-
mon view seems to have been that mathematical induction was an empirical truth
akin to enumerative induction. This is why Kastner thought that mathematical
induction was not fit to be an axiom,51 and this is part of the background to Reid’s
begrudging concession that “necessary truths may sometimes have probable ev-
idence.”52 However, some contemporary authors writing on the epistemology of
arithmetic and arithmetical cognition have also suggested views related to this.
For instance, Rips and Asmuth– two cognitive scientists who work on mathe-
matical cognition– have recently considered the suggestion that “the theoretical
distinction between math[ematical] induction and empirical induction” is not as
clear as has been claimed, and that “even if the theoretical difference were se-
cure, it wouldn’t follow that the psychological counterparts of these operations
are distinct.”53 Finally, in the course of their work on the epistemic propriety of
randomized algorithms, Gaifman and Easwaran have both suggested the possibil-
ity of extending the notion of probability that they employ to broader issues in
the epistemology of arithmetic.54
So, as with Gaifman and Easwaran, the empiricism about arithmetical knowl-
edge that I want to consider is centered around the notion of a probability as-
signment and the associated confirmation relation. A probability assignment is
a mapping P from sentences in a fixed formal language to real numbers which
satisfies the following three axioms (cf. [74] pp. 20 ff, [30] pp. 35 ff):
(P1) P (ϕ) ≥ 0
(P2) P (ϕ) = 1 if |= ϕ
(P3) P (ϕ ∨ ψ) = P (ϕ) + P (ψ) if |= ¬(ϕ & ψ)
In what follows, all the probability assignments under consideration shall be as-
sumed to have a domain which includes all the sentences in the language of the
Peano axioms. Further, it shall be assumed that the consequence relation |= in
axioms P2-P3 is the logical consequence relation from first-order logic, so that |= ϕ
holds if and only if ϕ is true on all models. The notion of confirmation is then
defined as an increase of the probability of a hypothesis conditional on evidence
relative to the background knowledge. That is, hypothesis h is said to be confirmed
by evidence e relative to background knowledge k if P (h|e & k) > P (h|k), assum-
ing that the conditional probabilities P (h|e & k), P (h|k) are defined, where these
conditional probabilities are given by the equation P (h′|e′) = P (h′ & e′)/P (e′). Further,
the hypothesis h is said to be confirmed tout court by evidence e if P (h|e) > P (h),
where again it is assumed that this conditional probability is defined. Hence it
is easy to see by standard manipulations of P1-P3 that to establish that a hy-
pothesis is confirmed by evidence relative to background knowledge, it suffices to
show that (i) the hypothesis and the background knowledge jointly logically im-
ply the evidence and that (ii) the conjunction of the evidence and the background
knowledge is assigned non-zero probability which is strictly less than the proba-
bility assigned to the background knowledge. Likewise, to show that a hypothesis
is confirmed tout court by evidence, it suffices to show that the hypothesis logi-
cally implies the evidence and that the evidence is assigned a non-zero probability
strictly less than one.55
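The sufficient condition for confirmation just stated can be checked in a small finite model. The following is only a sketch under stated assumptions, and not a probability assignment on the language of the Peano axioms: the three atomic sentences p, q, r, the uniform weighting of the eight truth assignments, and the names WORLDS, WEIGHTS, P, conj, entails, cond, and confirms are all hypothetical. In this toy setting a sentence is logically implied by another just in case it is true at every truth assignment at which the other is true. Note also that the sketch gives the conjunction of the hypothesis and the background knowledge a non-zero probability, which the strict inequality relies upon.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy language: three atoms, eight equally weighted "worlds".
ATOMS = ("p", "q", "r")
WORLDS = [dict(zip(ATOMS, values))
          for values in product([True, False], repeat=len(ATOMS))]
WEIGHTS = [Fraction(1, len(WORLDS))] * len(WORLDS)


def P(sentence):
    """Probability of a sentence: total weight of the worlds where it holds."""
    return sum((wt for world, wt in zip(WORLDS, WEIGHTS) if sentence(world)),
               Fraction(0))


def conj(a, b):
    """The conjunction a & b."""
    return lambda world: a(world) and b(world)


def entails(a, b):
    """a logically implies b: b holds at every world where a holds."""
    return all(b(world) for world in WORLDS if a(world))


def cond(a, b):
    """P(a | b) = P(a & b) / P(b), defined only when P(b) > 0."""
    if P(b) == 0:
        raise ValueError("conditional probability undefined")
    return P(conj(a, b)) / P(b)


def confirms(h, e, k):
    """h is confirmed by e relative to k: P(h | e & k) > P(h | k)."""
    return cond(h, conj(e, k)) > cond(h, k)


h = lambda world: world["p"] and world["q"]   # hypothesis
e = lambda world: world["p"]                  # evidence
k = lambda world: world["r"]                  # background knowledge

assert entails(conj(h, k), e)             # condition (i)
assert 0 < P(conj(e, k)) < P(k)           # condition (ii)
assert confirms(h, e, k)                  # hence confirmation
```

Here P (h|e & k) is 1/2 while P (h|k) is 1/4, so the evidence raises the probability of the hypothesis against the background knowledge, as the two conditions predict.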
Just as there are two parts to the Peano axioms– namely Robinson’s Q and
mathematical induction– so there are two complementary forms of empiricism which I
want to consider here, which I call inceptive empiricism and amplificatory empiri-
cism. Amplificatory empiricism contends that one is justified in inferring from the
antecedent of an instance of mathematical induction to its consequent, relative
to the background knowledge consisting of the conjunction of the eight axioms of
Robinson’s Q, because the consequent is confirmed by the antecedent relative to
this background knowledge. Since in conjunction with the eight axioms of Robin-
son’s Q, the consequent of such an instance (the claim that all numbers have a
given property) logically implies its antecedent (the claim that zero has this prop-
erty and that n+ 1 does whenever n does), it then follows that the consequent is
confirmed by the antecedent relative to the background knowledge consisting of
the conjunction of the eight axioms of Robinson’s Q if the conjunction of these
eight axioms and the antecedent is assigned a non-zero probability strictly less
than the probability assigned to the conjunction of these eight axioms. Hence,
were one to accept amplificatory empiricism, then there would be a straightfor-
ward connection between justification and probability, according to which one
would be justified in inferring from the antecedent of an instance of mathematical
induction to its consequent, against the background knowledge of the eight axioms
of Robinson’s Q, because of the probabilities assigned to these sentences.56, 57
Whereas amplificatory empiricism is a claim about how one may rationally
proceed from Robinson’s Q to mathematical induction, inceptive empiricism is a
claim about how one may rationally arrive at Robinson’s Q in the first place. In
particular, inceptive empiricism is the contention that one is justified in inferring
from several instances of the axioms of Robinson’s Q to these axioms themselves
because the axioms are confirmed tout court by the conjunction of these several
instances. For instance, Robinson’s Q includes the axiom ∀ x, y [x(y+1) = xy+x],
and inceptive empiricism claims that confirmation justifies one in inferring to this
axiom from several of its instances, such as 6(7 + 1) = 6 · 7 + 6. Let us call
this type of confirmation, wherein a universal claim is confirmed by several of
its instances, instance confirmation. Further, in the case where the claims in
question are arithmetical in character (resp. physical in character) let us call this
type of confirmation arithmetical instance confirmation (resp. physical instance
confirmation). So inceptive empiricism contends that the axioms of Robinson’s Q
can be justified by means of arithmetical instance confirmation.58
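The instances which serve as evidence in arithmetical instance confirmation are quantifier-free and so can be checked by direct calculation. The following minimal sketch, in which the names instance_holds and sample_instances and the bound of 10 are hypothetical choices, enumerates some instances of the axiom ∀ x, y [x(y+1) = xy+x] and verifies each one. It is meant only to display the kind of evidence at issue: verifying finitely many instances does not prove the universal claim, and the question of whether such instances confirm the axiom is precisely what inceptive empiricism addresses.

```python
from itertools import product


def instance_holds(x, y):
    """One quantifier-free instance of the axiom: x(y+1) = xy + x."""
    return x * (y + 1) == x * y + x


def sample_instances(bound):
    """All pairs (x, y) with 0 <= x, y < bound (a finite sample only)."""
    return list(product(range(bound), repeat=2))


# Check each sampled instance by direct calculation.
checked = {pair: instance_holds(*pair) for pair in sample_instances(10)}
```

The instance 6(7 + 1) = 6 · 7 + 6 mentioned in the text corresponds to the pair (6, 7) in this sample.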
It is important to emphasize that inceptive empiricism and amplificatory em-
piricism are independent of one another.59 For instance, inceptive empiricism
relies on arithmetical instance confirmation in a way in which amplificatory em-
piricism does not, and hence were instance confirmation to be found to be somehow
inimical to justification, this would tell only against inceptive empiricism. Like-
wise, it does not seem irrational to endorse inceptive empiricism in addition to
some logicist account of the justification of mathematical induction,60 and hence
commitment to inceptive empiricism does not seem to demand commitment to
amplificatory empiricism. However, despite this independence, these two forms of
empiricism are complementary, in that they combine to give us a probabilistic ac-
count of the justification of the Peano axioms. In particular, inceptive empiricism
gives us a probabilistic route by which to proceed from a warrant for individual
quantifier-free truths about the natural numbers to a warrant for the axioms of
Robinson’s Q, and likewise amplificatory empiricism gives us a probabilistic route
by which to proceed from a warrant for the axioms of Robinson’s Q to a warrant
for instances of mathematical induction.
The goal of this chapter is to defend these two forms of empiricism against three
types of challenges, and in doing so to defend the tenability of the probabilistic
account of the justification of the Peano axioms which is jointly provided by these
two forms of empiricism. The first type of challenge is common to both forms
of empiricism, and stems from the fact that both of these forms of empiricism
presuppose that confirmation is a source of justification. The problem with this is
that there are reasons peculiar to the setting of arithmetic which suggest that we
do not have access to probability assignments and their associated confirmation
relations in this setting. One such reason has to do with different versions of
countable additivity, each of which provides a rule for calculating the probability
of non-propositional logical connectives, and another such reason has to do with
the non-computability of the probability assignments themselves. In the case of
countable additivity, my response in § 2.2.1 is to note that the particular version
of countable additivity which gives rise to this challenge is not a consequence of
the conception of rationality commonly associated with Dutch Book Arguments.
In the case of computability, I respond in § 2.2.2 by arguing that the tension
between the non-computability of probability assignments and the tractability of
rational belief dissipates if one views rational beliefs as being reflected by a family
of probability assignments, as opposed to a single probability assignment.
A second series of challenges is specific to the arithmetical instance
confirmation upon which inceptive empiricism relies. In § 2.3.1, I discuss the first of
these challenges, which is due to Baker ([5]), who suggests that arithmetical in-
stance confirmation is alternatively unreliable or insufficiently diverse because it
relies upon small samplings. On the score of unreliability, my response is that
physical instance confirmation displays a similar reliance and yet fails to display
a similar unreliability, while on the score of insufficient diversity, my response is
simply that on three extant analyses of evidential diversity, arithmetical instance
confirmation is not insufficiently diverse. In § 2.3.2, I discuss a second challenge to
arithmetical instance confirmation, which suggests that it is objectionable because
it is unstable, where evidence for a universal hypothesis about a domain of objects
is said to be unstable if there are particular objects from the domain such that this
evidence can be bettered by the additional evidence that these objects satisfy the
hypothesis. My response to this challenge is to suggest that while stability may
be a virtue with regard to geometric reasoning, it is not a virtue in arithmetical
reasoning.
A final type of challenge is specific to amplificatory empiricism, and consists in
the challenge of explaining why the inference from the antecedent of an instance
of mathematical induction to its consequent is better than certain alternative
inferences. Recall that the consequent of an instance of mathematical induction
says that all numbers have a given property, and that the antecedent says that
zero has a property and that n + 1 has this property whenever n does. For
the sake of disambiguation, let us call this antecedent and this consequent the
genuine antecedent and the genuine consequent. Just as the genuine consequent
may be confirmed by the genuine antecedent, so it may be confirmed by the
following claim, which I call the pseudo-antecedent: zero has the property and 2(n+1)
has the property whenever 2n does. Further, just as the pseudo-antecedent may
confirm the genuine consequent, so it may confirm the following claim, which I
call the pseudo-consequent: all even natural numbers have the property. In § 2.4, I
employ the degree of confirmation to explain why the genuine antecedent provides
better evidence for the genuine consequent than does the pseudo-antecedent, and
likewise, to explain why the pseudo-antecedent provides better evidence for the
pseudo-consequent than for the genuine consequent. In this section I also make
similar suggestions regarding alternative inferences centered around non-standard
integers.
Hence, this chapter constitutes a prolegomenon to a thoroughgoing empiricism
about the epistemology of arithmetic, in that it articulates and responds to what I
regard as the most pressing objections to inceptive and amplificatory empiricism,
objections which, if unanswerable, would render such empiricism unworthy of fur-
ther investigation. In § 2.5, I discuss the two primary tasks to which future work
on such empiricism must attend, namely, an identification of sources of arithmeti-
cal probability and a delineation of the type of arithmetical reasoning figuring in
the Peano axioms from the type of arithmetical reasoning figuring in the addition
and subtraction of probabilities. But the task of this present chapter is simply
to secure the tenability of this alternative conception of arithmetical knowledge.
For, despite the historical provenance of this probabilistic perspective, it seems
safe to say that it is entirely alien to the predominant ideas in the epistemology
of arithmetic, such as the logicist idea that knowledge of the Peano axioms is
epistemically akin to modus ponens, or the idea, mentioned above, that one has
a type of perceptual access to the natural number structure itself. The guiding
idea of this alternative probabilistic perspective is that mathematical induction
and the other Peano axioms are epistemically akin to enumerative induction, and
the aim of the present chapter is thus merely to give voice and answer to some of
the more pressing objections to the verisimilitude of this probabilistic conception
of the nature of arithmetical knowledge.
2.2 Challenges to Access to Probability Assignments
Both inceptive and amplificatory empiricism presuppose that confirmation is a
source of justification, and the challenges to be considered in this section suggest
in different ways that we do not have access to this source of justification, due to
the fact that probability in the setting of arithmetic is quite different in character
from probability in the setting of the natural sciences. In particular, both of the
challenges considered here adduce reasons for thinking that grasping probability
assignments in the setting of arithmetic is no less difficult than grasping arith-
metical truth itself. The first of these challenges arises from countable additivity,
each version of which provides a way to calculate the probabilities associated to
certain non-propositional logical connectives. My response to this challenge is to
argue that those versions of countable additivity which generate this challenge
do not follow from the conception of probability evinced in Dutch Book
Arguments (§ 2.2.1). The second of these challenges arises from the fact that probability
assignments are in general non-computable. My response here is to suggest that
non-computability is an issue only if we take the type of probability to which we
have access to be reflected by a single probability assignment, as opposed to a
class of probability assignments (§ 2.2.2).
2.2.1 Countable Additivity: Aligning the True and Probable
There are several different versions of countable additivity, but their common
impetus lies in the thought that the probability axioms P1-P3 only articulate rules
of probability for the propositional connectives. For instance, it is straightforward
to derive from P1-P3 the following rules which relate probabilities to disjunctions,
conjunctions, and negations:
(P4) P (ϕ ∨ ψ) + P (ϕ & ψ) = P (ϕ) + P (ψ)
(P5) P (¬ϕ) = 1− P (ϕ)
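As a minimal sketch, and not as part of the original argument, the rules (P4) and (P5) can be checked on a toy model in which sentences are predicates on four "worlds" and probabilities are sums of world weights. The worlds, the weights, and all function names below are hypothetical choices made only for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy model: four "worlds" fix the truth values of two atoms a, b.
WORLDS = list(product([False, True], repeat=2))
# Assumed weights on the worlds; any non-negative weights summing to 1 would do.
WEIGHTS = dict(zip(WORLDS, [Fraction(1, 10), Fraction(2, 10),
                            Fraction(3, 10), Fraction(4, 10)]))

def prob(sentence):
    """Probability of a sentence, modelled as a predicate on worlds."""
    return sum(w for world, w in WEIGHTS.items() if sentence(world))

a = lambda world: world[0]
b = lambda world: world[1]
a_or_b = lambda world: a(world) or b(world)
a_and_b = lambda world: a(world) and b(world)
not_a = lambda world: not a(world)

# (P4): P(a or b) + P(a and b) = P(a) + P(b)
p4_holds = prob(a_or_b) + prob(a_and_b) == prob(a) + prob(b)
# (P5): P(not a) = 1 - P(a)
p5_holds = prob(not_a) == 1 - prob(a)
```

Exact rational arithmetic is used so that the two identities are checked as equalities rather than up to rounding error.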
The basic motivation behind countable additivity is to exhibit analogous rules for
non-propositional connectives. In particular, suppose that the formal language or
signature under consideration is the signature L of the Peano axioms, so that it
contains a function symbol s for successor and a constant symbol 0 for zero. It
follows from this that the signature L contains terms $s^n(0)$ corresponding to the
n-th successor of zero, e.g. $s^2(0)$ is s(s(0)), the second successor of zero, and $s^3(0)$
is s(s(s(0))), the third successor of zero. One can then articulate the following
version of countable additivity, which for the sake of disambiguation can be referred
to as ω-additivity, where ϕ(x) is an L-formula with free variable x:
(Pω) $P(\forall x\, \varphi(x)) = \lim_{N} P\big(\bigwedge_{n=1}^{N} \varphi(s^n(0))\big)$
Hence, the idea of ω-additivity is that the probability of a universal arithmetical
hypothesis may be approximated arbitrarily closely by the probabilities assigned
to the sentences expressing that further and further arithmetical terms satisfy this
hypothesis.
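To illustrate the limit in (Pω), here is a minimal sketch under an assumed toy prior that does not appear in the text: with probability `P_ALL` every instance of the hypothesis holds, and otherwise the instances hold independently with chance `Q`. The names `P_ALL`, `Q`, and `prob_first_instances` are hypothetical.

```python
from fractions import Fraction

# Assumed toy prior (not from the text): with probability P_ALL every instance
# phi(s^n(0)) holds; otherwise instances hold independently with chance Q.
P_ALL = Fraction(1, 4)
Q = Fraction(1, 2)

def prob_first_instances(N):
    """P of the conjunction phi(s^1(0)) & ... & phi(s^N(0)) under the toy prior."""
    return P_ALL + (1 - P_ALL) * Q ** N

# The finite conjunctions get less probable as N grows ...
approximations = [prob_first_instances(N) for N in (1, 2, 10, 50)]
# ... and under (P-omega) the universal hypothesis receives their limit, P_ALL.
gap_at_50 = prob_first_instances(50) - P_ALL
```

On this toy prior the probabilities of the finite conjunctions decrease towards `P_ALL`, which is the value that ω-additivity would require for the universal hypothesis.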
To obtain a different version of countable additivity, one can consider an ex-
tension to a setting where one can form new sentences by taking conjunctions and
disjunctions over countable sets of sentences. These operations of conjunction
and disjunction are respectively written as $\bigwedge_n \varphi_n$ and $\bigvee_n \varphi_n$, and the resulting
sentences are called Lω1ω-sentences. Relative to a natural semantics and
deductive system for these sentences, there is a completeness theorem for Lω1ω-
sentences,61 and hence the notion of a probability assignment on these sentences
can be defined. In particular, an Lω1ω-probability assignment is an assignment of
real numbers to Lω1ω-sentences which satisfies P1-P3 (relative to the consequence
relation on Lω1ω-sentences for which the completeness theorem holds). One can
then consider the following version of countable additivity, which for the sake of
disambiguation can be referred to as ω1-additivity, where ϕ1, ϕ2, . . . is a countable
sequence of Lω1ω-sentences:
(Pω1) $P\big(\bigwedge_{n} \varphi_n\big) = \lim_{N} P\big(\bigwedge_{n=1}^{N} \varphi_n\big)$
Outside of the difference between the universal quantifier and the infinite conjunc-
tion, the primary difference between ω-additivity and ω1-additivity is an analogue
of the use-mention distinction: in ω-additivity, the natural number n is employed
to make a statement about the n-th successor of zero, whereas in the case of ω1-
additivity, it is only employed as an index for the sentence ϕn, which may or may
not be a statement about numbers at all.
While this difference between ω-additivity and ω1-additivity may seem innocu-
ous, it is not difficult to see that ω-additivity and only ω-additivity requires that
“having a high probability” align with arithmetical truth. For, suppose that the
conjunction of the eight axioms of Robinson’s Q is assigned a high probability,
say greater than 1 − ε, where ε is some small non-zero error threshold. Under
these circumstances, it follows from the fact that Robinson’s Q proves the cor-
rectness of the addition and multiplication tables that if a probability assignment
satisfies ω-additivity, then an arithmetical sentence is true of the standard model
if and only if it is assigned probability greater than 1 − ε. Here the standard
model is the structure (ω, 0, s,+,×), where ω = {0, 1, 2, . . .} is the set of natural
numbers. Hence, if a probability assignment satisfies ω-additivity, then registering
a high probability by reference to this assignment is coextensive with truth for
arithmetical sentences.62 However, the same is not the case with respect to ω1-
additivity. In particular, it is not difficult to see that for any sentence of first-order
predicate logic which is not a consequence of the axioms of Robinson’s Q, there
is an ω1-additive probability assignment which assigns this sentence probability
zero and which still gives the conjunction of the axioms of Robinson’s Q a high
probability.63 This simple fact shows that, unlike the satisfaction of ω-additivity,
the satisfaction of ω1-additivity does not force the alignment of the arithmetically
true and the arithmetically probable.
Since the idea common to inceptive and amplificatory empiricism is that arith-
metical claims can be justified by recourse to judgements about confirmation and
probability, it is important for the tenability of these forms of empiricism that
they not be committed to ω-additivity. For, by the result mentioned in the pre-
vious paragraph, such commitment would force the arithmetically true to align
with the arithmetically probable, and such alignment would cast doubt on our ac-
cess to judgements about confirmation and probability in the case of arithmetic.
To see this, consider an analogous scenario centered not around probability but
around perception. Should someone posit perception as a source of justification
about arithmetic, but then inform us that this sort of perception happened to
be infallible, it seems that the proper response would be to question whether this
type of perception is something which we actually possess, given that it is so man-
ifestly different from our normal modes of perception. Likewise, it seems that the
alignment of the true and the probable in the case of arithmetic should lead us
to question whether we have access to arithmetical probability. Since such access
is vital to the ultimate tenability of inceptive empiricism and amplificatory em-
piricism, it is necessary to say why these forms of empiricism are not committed
to ω-additivity.
My response to this challenge is to argue that the reasons which commit in-
ceptive empiricism and amplificatory empiricism to the probability axioms P1-P3
do not extend to ω-additivity, even though they do extend to ω1-additivity. For,
it is common today to justify commitment to P1-P3 by taking recourse to Dutch
Book arguments, and just as it is demonstrable that ω1-additivity is justifiable
by recourse to such arguments, so it is likewise demonstrable that ω-additivity is
not so justifiable. Let me first describe the relevant theorems and non-theorems
before turning to the relation of the theorems to the justification of probability
axioms. The theorems and non-theorems in question here concern complete con-
sistent theories T in the signature L of the Peano axioms, and in what follows it
will be convenient to regard such complete extensions as zero-one valued functions
on the set of L-sentences, so that T (ϕ) = 1 if T |= ϕ and T (ϕ) = 0 otherwise.
Having this convention in place, the standard version of the Dutch Book Theorem
reads as follows:
Dutch Book Theorem, Standard Version: Suppose that P is a function from L-sentences to real numbers. Then P is a probability assignment if for every finite sequence of real numbers $s_1, \ldots, s_N$ and every finite sequence of L-sentences $\varphi_1, \ldots, \varphi_N$, there is a complete consistent L-theory T such that
$\sum_{n=1}^{N} s_n (T(\varphi_n) - P(\varphi_n)) \geq 0$.
The situation described in the antecedent of the theorem may be vivified as follows.
Suppose that a bookie offers stakes sn of units of currency on sentence ϕn and
that a bettor provides the bookie with snP (ϕn) units. Suppose further that there
is an agreement in place that if ϕn turns out false, then the bettor wins nothing
(for a net total of −snP (ϕn) units), and that if ϕn turns out true, then the bettor
wins sn (for a net total of sn − snP (ϕn) units). Finally, say that the bettor is
invulnerable to a Dutch book if for any finite sequence of bets there is always some
situation (representable in terms of a complete, consistent theory) in which the
net total due to the bettor across all bets is not strictly negative. Hence, cast in
these terms, the Dutch Book theorem says that invulnerability to a Dutch book
is a sufficient condition for an assignment to be a probability assignment.64
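The betting scenario can be made concrete with a minimal sketch, offered only as an illustration under assumed numbers. It treats two sentences, a and its negation, so that there are exactly two relevant situations, and it compares an incoherent pricing with a coherent one. The function names `net_total` and `survives_bets`, and the specific stakes and prices, are hypothetical.

```python
from fractions import Fraction

def net_total(stakes, prices, truth_values):
    """Bettor's net total: the sum of s_n * (T(phi_n) - P(phi_n))."""
    return sum(s * (t - p) for s, p, t in zip(stakes, prices, truth_values))

def survives_bets(stakes, prices, situations):
    """True if some situation leaves the bettor's net total non-negative."""
    return any(net_total(stakes, prices, t) >= 0 for t in situations)

# Two sentences: a and not-a. Exactly one of them is true in each situation.
SITUATIONS = [(1, 0), (0, 1)]
stakes = [Fraction(1), Fraction(1)]

incoherent = [Fraction(3, 5), Fraction(3, 5)]  # P(a) + P(not a) = 6/5
coherent = [Fraction(3, 5), Fraction(2, 5)]    # P(a) + P(not a) = 1

# With the incoherent prices the bettor loses 1/5 in every situation.
losses = [net_total(stakes, incoherent, t) for t in SITUATIONS]
```

Note that `survives_bets` checks only one given sequence of bets, whereas invulnerability in the sense of the theorem quantifies over all finite sequences of stakes and sentences.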
The technical point that I view as relevant here is that while the analogous
theorem holds for ω1-additivity, it does not hold for ω-additivity. In particular, it
is well-known that by appropriately augmenting the proof of the standard version
of the Dutch Book Theorem,65 one can establish the following:
Dutch Book Theorem, ω1-additive Version: Suppose that P is a function from Lω1ω-sentences to real numbers. Then P is an ω1-additive Lω1ω-probability assignment if for every infinite sequence of real numbers $s_n$ and every infinite sequence of Lω1ω-sentences $\varphi_n$ such that the sequence $s_n P(\varphi_n)$ is absolutely convergent, there is a complete consistent Lω1ω-theory T such that
$\sum_{n} s_n (T(\varphi_n) - P(\varphi_n)) \geq 0$.
In developing the analogous version for ω1-additivity, it turns out that it is
important to include the stipulation about absolute convergence.66 Here, absolute
convergence means that $\sum_{n=1}^{\infty} |s_n P(\varphi_n)| < \infty$, i.e. that the sequence of partial
sums $\sum_{n=1}^{N} |s_n P(\varphi_n)|$ approaches a finite limit in the real numbers. In terms of
the betting scenario described above, this corresponds to the requirement that
the units of currency potentially exchanged between the bookie and the bettor be
finite.
However, when we turn from ω1-additivity to ω-additivity, what we find is that
the analogous version of the Dutch Book Theorem is false:
Counterexample to Dutch Book Theorem, ω-additive Version: There is a
function P from L-sentences to real numbers such that (i) P is not an
ω-additive probability assignment, and such that (ii) P has the following
property: for every infinite sequence of real numbers $s_n$ and every infinite
sequence of L-sentences $\varphi_n$ such that the sequence $s_n P(\varphi_n)$ is absolutely
convergent, there is a complete consistent L-theory T such that
$\sum_{n} s_n (T(\varphi_n) - P(\varphi_n)) \geq 0$.
It is quite easy to produce such a counterexample. In particular, choose a com-
plete consistent L-theory T0 such that T0 implies Robinson’s Q and such that T0
proves ¬ψ, where ψ is true on the standard model and where ψ ≡ ∀ x ψ0(x)
begins with a universal quantifier followed by a quantifier-free formula ψ0(x) or
by a formula ψ0(x) whose quantifiers are bound to variables appearing earlier in
the sentence. For instance, the claim that x is always strictly less than 2x for
non-zero values of x can be expressed in this way, as well as the consistency state-
ment for the Peano axioms. Given this sentence ψ and this theory T0, then define
a function P from L-sentences to real numbers by setting P (ϕ) = T0(ϕ). Since
Robinson’s Q ensures the correctness of the addition and multiplication tables,67
it follows that
$$P(\forall x\, \psi_0(x)) = T_0(\psi) = 0 \neq 1 = \lim_{N} T_0\big(\bigwedge_{n=1}^{N} \psi_0(s^n(0))\big) = \lim_{N} P\big(\bigwedge_{n=1}^{N} \psi_0(s^n(0))\big)$$
Hence, this is how one obtains the failure of ω-additivity, which corresponds to
clause (i) in the counterexample. It is much easier to see how one
obtains clause (ii) in the counterexample, since one can always choose the
complete consistent theory T to be identical to the complete consistent theory T0,
which will ensure that the sum in question is equal to zero. So this is one way to
see that there is no ω-additive version of the Dutch Book Theorem.
The philosophical significance of the Dutch Book Theorems resides in the fact that
invulnerability to a Dutch book is indicative of a certain type of rationality when
the assignment in question is reflective of degrees of belief, so that the theorems
show that conformity to the probability axioms P1-P3 is a necessary condition of
a certain type of rationality. The type of rationality implicated here is of course
minimally thought to require a disposition to arrange degrees of belief in such a
way that were one to bet units of currency on these degrees of belief, then there
would be at least one situation in which a loss would not be suffered. There are
thus at least two presuppositions to the contention that this type of rationality
constitutes an epistemic virtue. The first presupposition is that some virtues are
revealed purely in terms of counterfactual behavior, since it is obviously not en-
visioned here that one actually engages in such betting scenarios. But while such
“dormant virtues” may be a rarity in the practical sphere, they are commonplace
in the theoretical sphere. For instance, there is a virtue related to consistency
which consists in a disposition to retract previously endorsed axioms were they
to exhibit a demonstrable inconsistency, and it seems reasonable to say that this
virtue is present in our reasoning even if it turns out that the axioms in question
(say the set-theoretic axioms) are in fact consistent. The second presupposition
is that there is a suitable abundance of potential situations across which gains or
losses may be incurred, since were the number or variety of these situations to be
highly curtailed, then the demands of invulnerability would become quite severe.
However, since we are identifying potential situations with complete consistent
theories in a given formal language or signature, the fact that there are contin-
uum many of these would seem sufficient to allay concerns about the severity of
the demands of invulnerability.
Hence, my response to the challenge of ω-additivity is to suggest that in-
ceptive and amplificatory empiricism be conceived as justifying their appeal to
confirmation and probability by means of Dutch Book Theorems, so that the fact
that there is no ω-additive Dutch Book Theorem may be taken as evidence that
these forms of empiricism are not committed to ω-additivity. While this response
clearly meets the challenge of ω-additivity, it has at least two drawbacks. The
first is that if these forms of empiricism are tied to the philosophical interpreta-
tion of the Dutch Book Theorems described above, then all the concerns voiced in
the literature about this interpretation automatically become concerns for these
forms of empiricism.68 The second drawback is that if inceptive and amplificatory
empiricism are going to operate only with those rules of probability which are
licensed by Dutch Book Theorems, then these forms of empiricism cannot justify
the contention that various kinds of confirmation actually occur by recourse to
probabilistic rules. For instance, inceptive empiricism turns on the supposition
that several instances of a universal arithmetical hypothesis are assigned a non-
zero probability strictly less than one, and this supposition by no means follows
from the probability axioms P1-P3 alone. Hence, if these forms of empiricism
are only allowed to operate with these probabilistic rules, then for their ultimate
success they must provide other reasons for giving such assignments. This is one
of the further challenges to these forms of empiricism which I discuss in § 2.5.
Before turning to the challenge from computability, it is helpful to briefly
compare this response to ω-additivity to Isaacson’s well-known response to the ω-rule
([77] § III). The ω-rule is a proof-theoretic rule which licenses the inference to the
claim that ∀ x ϕ(x) from the totality of all claims of the form $\varphi(s^n(0))$, where n
ranges over natural numbers. For the very same reasons that the arithmetically
true and the highly probable become aligned under ω-additive probability assign-
ments which assign high probability to Robinson’s Q, so the arithmetically true
is aligned with what is derivable from Robinson’s Q in deductive systems that
are augmented by the ω-rule. Isaacson was concerned with this because he had
previously argued that the Peano axioms in conjunction with the standard rules
of inference were effectively complete and completely determined our concept of
number ([76]). Thus Isaacson was concerned to show that the ω-rule was not part
of our concept of number, since otherwise the collapse of truth and proof engen-
dered by the ω-rule would make this concept vastly outstrip the concept given to
us by the Peano axioms and standard rules of inference.
One of Isaacson’s basic strategies is to point out that standard defenses of
the ω-rule appeal to truth about the natural numbers, and such an appeal to
truth about the natural numbers is not part of our concept of number, but goes
above and beyond this concept, and is essentially a second-order or higher-order
concept ([77] p. 108). There are obviously many differences between Isaacson’s
strategy for handling the ω-rule and my strategy for handling ω-additivity, but
the one difference which bears special mention is that my discussion of
ω-additivity did not at any place appeal to points specific to our concept of number.
Rather, my discussion focused entirely on what did and did not follow from a
standard justification of the probability axioms. The analogue of my strategy in
Isaacson’s setting would be to argue that the ω-rule did not follow from some standard
justification of the other accepted rules of inference, such as modus ponens.
2.2.2 The Non-Computability of Probability Assignments
I want now to turn to a challenge to access from considerations of computabil-
ity. The basic idea with this version of the challenge is that probability assignments
are in general non-computable, and that such non-computability should render
suspect the presupposition that these probabilities are something which we can
readily discern. Prior to setting out this version of the challenge more carefully,
something must first be said about what it means to say that a probability as-
signment is computable or non-computable. For, the predicates of computable
and non-computable apply only to subsets of natural numbers, and by proxy, to
countable objects which are represented as subsets of natural numbers.69 How-
ever, the real numbers, with which probability assignments are concerned, are
uncountable. Despite this, there are at least two natural ways of representing
probability assignments as countable objects. First, one can restrict one’s atten-
tion to those assignments which map sentences into a countable subfield of the
real numbers, such as the real algebraic numbers, the smallest subfield of the
real numbers which is elementarily equivalent to the real numbers.70 Second, one
can restrict one’s attention to those assignments which map sentences not into
real numbers per se, but rather into certain representations of these numbers as
quickly-converging Cauchy sequences of rationals. In this way, these assignments
can be represented as countable sequences of such Cauchy sequences, so that the
predicates of computable and non-computable are readily applicable.71 Each of
these means of representation is admittedly not without its disadvantages: the
first means of representation excludes many real numbers (such as e and π), while
the second means prescribes what might be regarded as an overly uniform manner
of representation.
But under either of these two modes of representation, one can show that if
a probability assignment assigns non-zero probability to the conjunction of the
eight axioms of Robinson’s Q, then the probability assignment is not computable.
This argument proceeds by showing that such an assignment computes a complete
consistent extension of Robinson’s Q, which is known to be non-computable by
work of Tarski (cf. Tarski et al. [145] Theorem 9 p. 60). In particular, suppose
that the sentences in the formal language or signature L of Robinson’s Q are
enumerated as ϕ1, . . . , ϕn, . . . in such a way that ϕ1 is the conjunction of the
eight axioms of Robinson’s Q. Supposing that P is a probability assignment
such that P (ϕ1) > 0, it must be shown that P computes a complete consistent
extension TP of Robinson’s Q. Let TP (ϕ1) = 1, and suppose that for all i < n it
has already been decided whether to set TP (ϕi) = 0 or TP (ϕi) = 1 in such a way
that
$$0 < P\Big(\bigwedge_{T_P(\varphi_i)=1} \varphi_i \;\&\; \bigwedge_{T_P(\varphi_i)=0} \neg\varphi_i\Big) \tag{2.1}$$
Then it follows from P1-P3 that
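$$0 < P\Big(\Big[\bigwedge_{T_P(\varphi_i)=1} \varphi_i\Big] \;\&\; \varphi_n \;\&\; \Big[\bigwedge_{T_P(\varphi_i)=0} \neg\varphi_i\Big]\Big) + P\Big(\Big[\bigwedge_{T_P(\varphi_i)=1} \varphi_i\Big] \;\&\; \Big[\bigwedge_{T_P(\varphi_i)=0} \neg\varphi_i\Big] \;\&\; \neg\varphi_n\Big) \tag{2.2}$$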
so that at least one of the two quantities featured in this sum is strictly
positive. If one computes from P that the first quantity is strictly positive, then
set TP (ϕn) = 1 and TP (¬ϕn) = 0, and if one computes from P that the second
quantity is strictly positive, then do the reverse.72 This construction results in
a complete theory TP which extends Robinson’s Q and which is computable from
the probability assignment P . Further, this theory is consistent, since if not, then
there is some finite fragment of the theory which proves a contradiction. Since the
axioms P1-P3 imply that contradictions are assigned probability zero, and since
they likewise imply that equivalent sentences are assigned the same probability,
and since anything which proves a contradiction is equivalent to a contradiction,
it follows from P1-P3 that the conjunction of some finite fragment of TP would be
assigned probability zero. This contradicts our construction, in which all the finite
fragments of TP were assigned non-zero probability (cf. equation (2.1)). Hence,
this is one way to see that TP is a complete consistent extension of Robinson’s Q
which is computable from P . From this it follows by the theorem of Tarski that
the probability assignment P is non-computable, at least assuming it assigns the
conjunction of the eight axioms of Robinson’s Q a non-zero probability.
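The stage-by-stage construction of the theory from the probability assignment can be sketched on a finite propositional toy model. This is an illustration only: the toy assignment below is computable, so it exhibits the shape of the construction and not the non-computability result, which depends on the theory extending Robinson's Q. The worlds, weights, and the function `build_theory` are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical toy setting: worlds are truth assignments to two atoms (a, b).
WORLDS = list(product([False, True], repeat=2))
# Assumed weights; worlds in which b fails get probability zero.
WEIGHTS = dict(zip(WORLDS, [Fraction(0), Fraction(1, 2),
                            Fraction(0), Fraction(1, 2)]))

def prob(sentence):
    """Probability of a sentence, modelled as a predicate on worlds."""
    return sum(w for world, w in WEIGHTS.items() if sentence(world))

def build_theory(sentences):
    """Decide each sentence in turn, keeping the decisions jointly probable."""
    decided = []  # pairs (sentence, value) fixed at earlier stages

    def agrees(world):
        return all(bool(phi(world)) == bool(value) for phi, value in decided)

    for phi in sentences:
        # First quantity of (2.2): earlier decisions together with phi.
        first_quantity = prob(lambda world: agrees(world) and phi(world))
        # If it is zero, the second quantity must be positive, so negate phi.
        decided.append((phi, 1 if first_quantity > 0 else 0))
    return [value for _, value in decided]

a = lambda world: world[0]
b = lambda world: world[1]
a_and_b = lambda world: a(world) and b(world)
not_a = lambda world: not a(world)

# b plays the role of the first sentence, which has non-zero probability.
theory = build_theory([b, a, a_and_b, not_a])
```

In this toy run the sentences b, a, and their conjunction are accepted, while the negation of a is rejected, so the decisions remain jointly of non-zero probability at every stage, in line with (2.1).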
This is important because both inceptive and amplificatory empiricism re-
quire that the conjunction of the eight axioms of Robinson’s Q be assigned a
non-zero probability. For instance, inceptive empiricism claims that the hypoth-
esis h consisting of the conjunction of the eight axioms of Robinson’s Q may be
confirmed by evidence e saying that various instances of these universal axioms
hold. However, if this hypothesis h is assigned probability zero, then one has
that P (h|e) − P (h) = 0 − 0 = 0, so that no confirmation occurs. Likewise, am-
plificatory empiricism claims that the hypothesis h consisting of a consequent of
an instance of mathematical induction may be confirmed, relative to background
knowledge k consisting of Robinson’s Q, by evidence e consisting of the antecedent.
However, if this background knowledge is assigned probability zero, then the quan-
tity P (h|e & k)−P (h|k) is not defined, so that confirmation cannot occur. Thus,
the probability assignments to which inceptive and amplificatory empiricism
appeal will inevitably assign a non-zero probability to the conjunction of the eight
axioms of Robinson’s Q.
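The two cases just described can be checked numerically with a minimal sketch. It uses the difference measure of confirmation from the text and returns `None` where a conditional probability is undefined. The function names and the sample probabilities are assumptions made for illustration.

```python
from fractions import Fraction

def conditional(p_joint, p_condition):
    """P(x | y) = P(x & y) / P(y), or None when P(y) = 0 (undefined)."""
    if p_condition == 0:
        return None
    return p_joint / p_condition

def confirmation(p_h, p_e, p_h_and_e):
    """Difference measure P(h|e) - P(h); None when P(h|e) is undefined."""
    posterior = conditional(p_h_and_e, p_e)
    return None if posterior is None else posterior - p_h

def confirmation_given(p_h_e_k, p_e_k, p_h_k, p_k):
    """P(h | e & k) - P(h | k); None when either conditional is undefined."""
    with_evidence = conditional(p_h_e_k, p_e_k)
    without_evidence = conditional(p_h_k, p_k)
    if with_evidence is None or without_evidence is None:
        return None
    return with_evidence - without_evidence

# Hypothesis with probability zero: no confirmation, whatever the evidence.
zero_prior = confirmation(Fraction(0), Fraction(1, 2), Fraction(0))
# Hypothesis with non-zero probability that entails the evidence: confirmed.
positive_prior = confirmation(Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))
# Background knowledge with probability zero: the quantity is undefined.
zero_background = confirmation_given(Fraction(0), Fraction(0),
                                     Fraction(0), Fraction(0))
```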
Thus, these probability assignments will be non-computable, and this raises
the concern that the sources of justification to which these forms of empiricism
take recourse are simply not available to us. One might object to this concern
by suggesting that such an insistence upon computability is tantamount to an
appeal to ignorance, since all known natural examples of sets of natural numbers
are either computable or are non-computable because they compute the halting
set, i.e. they contain information about the behavior of all partial computable
functions.73 Likewise, one might question the role which computability could play
in undergirding these forms of empiricism in the first place, since it seems difficult
to see how one could recognize a computable function as such without some prior
grasp of the arithmetical axioms which these forms of empiricism seek to justify.74
In light of these concerns, it seems that the proper way to defend this insistence
on computability would be to suggest that the computability of those sources
of justification to which the appellations of computability and non-computability
apply is at best a prima facie requirement, a requirement whose satisfaction the
agent or subject in question need not necessarily be in a position to verify.
My suggestion is not to dispute this requirement of computability, but to
question whether the conformity of inceptive and amplificatory empiricism to this
requirement actually entails the computability of specific probability assignments.
For, inceptive and amplificatory empiricism appeal to probability assignments
and their associated confirmation relations as a source of justification, but it is
not clear that such an appeal needs to be conceived of as an appeal to a specific
probability assignment. To reiterate a suggestion which has been proffered in
other contexts, the relevant concepts of probability and confirmation might be
better represented by a large and perhaps even uncountable class of probability
assignments, as opposed to a single probability assignment.75 If this were the
case, then, since the predicates of computable and non-computable do not apply
to such classes, the non-computability of the individual members of this class
would not violate the aforementioned requirement of computability, which only
insists on computability of those sources of justification to which the predicates
of computable and non-computable apply.
Before turning to a concern which one might have with this response to the
challenge of the non-computability of probability assignments, it is helpful to draw
an analogy between the situation of probability assignments and the situation of
complete consistent extensions of theories. The result of Tarski mentioned above
says that there is no complete computable consistent extension of Robinson’s Q.
However, it seems that there is no record of anyone ever suggesting that Tarski’s
result represents an epistemic barrier to Robinson’s Q and the type of knowledge
about the natural numbers that this axiomatization provides. Presumably, this
is so because no one ever thought that appeal to these axioms involved appeal to
a specific complete consistent extension. Rather, presumably the idea is that one
is appealing to these axioms themselves, along with whatever can be legitimately
deduced from them by means of standard rules of inference. Likewise, the idea
behind my response to the challenge of non-computability is to suggest that when
inceptive and amplificatory empiricism appeal to probability assignments, they
appeal not to particular probability assignments, but rather they appeal to a
potentially large class of such assignments whose members satisfy the axioms P1-
P3, as well as to whatever else can be legitimately inferred from these and other
characteristic properties of the class by means of accepted rules of deductive and
inductive inference.
One might object to this response by suggesting that while the predicates of
computable and non-computable do not apply to uncountable classes of proba-
bility assignments, there is nevertheless a related measure of complexity which
applies to such classes, and that while uncountable classes might be excepted from the
aforementioned requirement of computability, they need not be excepted from an
analogous requirement on the non-complexity of classes. In mathematical logic,
one standard measure of the complexity of classes of subsets of natural numbers is
given by the so-called arithmetical hierarchy, which measures how many alterna-
tions of quantifiers over natural numbers are required to define the class.76 For in-
stance, the class of complete consistent extensions of Robinson’s Q can be defined
with only one universal quantifier over natural numbers, since consistency merely
requires that every proof not be a proof of contradiction, and since complete-
ness merely requires that every sentence in the language be included or excluded
from the theory. Hence, in analogue to the aforementioned requirement of com-
putability, one might require that those sources of justification which are naturally
identified with such classes have minimal complexity under this quantifier-based
measure of complexity.
This requirement of non-complexity of classes is of course only precise to the
extent that some minimum complexity level is antecedently specified, but it seems
that under any reasonable specification of such a minimum, the class of proba-
bility assignments will surely satisfy this requirement. For, under the second
representation of probability assignments as countable objects described at the
outset of this subsection, the class of representations of probability assignments
will be definable by one universal quantifier over natural numbers, just like the
class of complete consistent extensions of Robinson’s Q. Further, under the first
representation of probability assignments as countable objects described above,
the class of such representations will be definable by a universal quantifier fol-
lowed by an existential quantifier. Hence, it seems that the class of probability
assignments will surely satisfy any reasonable demands of non-complexity when
such complexity is measured in terms of the number of alternations of quantifiers
required to define the class.
There is a certain analogy between this reply to the concern about class com-
plexity and a point that Kevin Kelly has made in a number of places ([86–89]).
Part of the background to this point is that Kelly is in general critical of the
idea that scientific inference can be captured via probability assignments (cf. [88]
p. 96). His alternative suggestion is that one should view scientific hypotheses
as a conjunction of (i) a description of a class of possible observation sequences
and (ii) a predication that the actual sequence of observations will be among this
class. One of the virtues of this approach is that it allows for a means by which
to characterize the simplicity of hypotheses (and other virtues of hypotheses, such
as verifiability and refutability), namely, in terms of the quantifier-based measure
described above (cf. [89] § 3, [87] § 3). Likewise, another of the virtues of this
approach is that it accounts for how scientific hypotheses can be simple under this
measure and yet how the actual sequence of observations may be very complex
under the analogous measure of computability and non-computability (cf. [89]
§ 8, [87] § 4). While the entire idea of inceptive and amplificatory empiricism is
based on a conception of probability assignments whose value Kelly would dispute
in the arena of scientific inference, the point made in the last few paragraphs can
be viewed as a kind of meta-level analogue of Kelly’s idea. For, the idea has been
that these forms of empiricism involve a class of assignments which is simple under
the measure of complexity of classes, but whose individual members are highly
complex under the measure of computability and non-computability.
2.3 Challenges to Arithmetical Instance Confirmation
In the previous section, challenges common to both inceptive and amplificatory
empiricism were considered. These challenges suggested that for reasons related
to countable additivity and computability, the probability assignments and
associated confirmation relations to which these forms of empiricism appeal are
simply not available to us. In this section, challenges specific to arithmetical
instance confirmation are considered. Recall that arithmetical instance confirma-
tion is that type of confirmation in which a universal arithmetical hypothesis is
confirmed by evidence to the effect that several numbers satisfy the property in
question, and further recall that physical instance confirmation is defined analo-
gously in terms of physical hypotheses and physical objects. Of the two forms of
empiricism considered here, it is only inceptive empiricism that relies on arith-
metical instance confirmation, and so the challenges to be presently considered
are peculiar to inceptive empiricism.
The first of these challenges is due to Baker ([5]), who suggests that arith-
metical instance confirmation displays either unacceptable levels of unreliability
or insufficient levels of diversity of evidence. I argue in § 2.3.1 that arithmetical
instance confirmation is no more unreliable than physical instance confirmation,
and that arithmetical instance confirmation is not insufficiently diverse on sev-
eral extant analyses of evidential diversity. The second of these challenges is that
arithmetical instance confirmation is objectionable because it is unstable. As a
first approximation, an inference from evidence to a universal hypothesis is unsta-
ble if it may be materially bettered by the further evidence that various particular
objects satisfy this universal hypothesis. In response to the challenge from stabil-
ity, I argue in § 2.3.2 that while stability may be a virtue with regard to certain
forms of mathematical reasoning, such as geometrical reasoning, it is not a virtue
of arithmetical reasoning.
2.3.1 Baker and the Exigencies of Arithmetical Sampling
Baker’s thesis is that arithmetical instance confirmation is biased in a way in
which physical instance confirmation is not, and that this is due to the samplings
in arithmetical instance confirmation being small.77 However, there are at least
two natural senses in which such samplings may be said to be small, which I
call setwise-small and pointwise-small, and there are two relevant dimensions of
bias to be found in Baker’s essay, namely unreliability and insufficient diversity.78
Hence, one can distinguish between several different versions of Baker’s thesis,
depending on whether the relevant notion of smallness is setwise- or pointwise-
smallness, and depending on whether the relevant notion of bias is unreliability or
insufficient diversity. Hence, subsequent to defining these two notions of smallness,
I turn to versions of Baker’s thesis centered around setwise-smallness, and then
to versions of Baker’s thesis centered around pointwise-smallness.
These two notions of setwise-smallness and pointwise-smallness apply to finite
sets of natural numbers, and they both are defined in terms of a third notion of
smallness which applies to individual natural numbers. For instance, 100 is a small
natural number, but $100^{1000}$ is not a small natural number, and if one natural
number is small and another number is less than it, then that second natural
number is small as well.79 A finite set of natural numbers can then be said to be
pointwise-small if each of its elements is a small natural number, while a finite
set of natural numbers can be said to be setwise-small if its cardinality is a small
natural number. Hence, with the exception of the set of all small natural numbers,
any pointwise-small set is itself setwise-small.80 However, the converse is not in
general true. For instance, given the 3-element sets $X = \{200^{1000}, 300^{1000}, 400^{1000}\}$
and $Y = \{2, 3, 4\}$, it follows that $Y$ is both setwise-small and pointwise-small,
whereas X is setwise-small but not pointwise-small.
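The two notions just defined can be illustrated with a minimal sketch. The numerical cutoff `SMALL_BOUND` is purely an assumption made for illustration (the text leaves the smallness of individual numbers informal), and the function names are likewise hypothetical.

```python
# Hypothetical cutoff: the text leaves "small" informal. Any fixed bound is
# downward closed, which is the one property the text requires of smallness.
SMALL_BOUND = 1000

def is_small(n: int) -> bool:
    """A natural number counts as small if it does not exceed the cutoff."""
    return 0 <= n <= SMALL_BOUND

def is_pointwise_small(s: frozenset) -> bool:
    """Every element of the finite set is a small natural number."""
    return all(is_small(n) for n in s)

def is_setwise_small(s: frozenset) -> bool:
    """The cardinality of the finite set is a small natural number."""
    return is_small(len(s))

X = frozenset({200**1000, 300**1000, 400**1000})
Y = frozenset({2, 3, 4})
# The exceptional case noted in the text: the set of all small numbers.
ALL_SMALL = frozenset(range(SMALL_BOUND + 1))
```

On this sketch Y is small in both senses, X is setwise-small only, and `ALL_SMALL` is pointwise-small without being setwise-small, since its cardinality is one more than the cutoff.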
One version of Baker’s thesis would thus contend that arithmetical instance
confirmation is biased in a way in which physical instance confirmation is not,
and that this is due to the samplings in arithmetical instance confirmation being
setwise-small. It is presumably indisputable that arithmetical instance confirma-
tion is in fact based on setwise-small samplings of natural numbers. For, even
with the aid of computers, one can only look at so many natural numbers, and in
comparison with the set of all natural numbers, the cardinality of such samplings
will inevitably appear diminutive. However, presumably physical instance con-
firmation relies on setwise-small samplings in exactly the same manner: indeed,
the same sorts of constraints that prevent us from doing innumerable calcula-
tions also prevent us from taking innumerable measurements. Hence, regardless
of whether bias is understood in terms of unreliability or insufficient diversity,
a difference between the levels of bias in arithmetical instance confirmation and
physical instance confirmation cannot be attributable to a difference in the man-
ner in which they rely upon setwise-small samplings, simply because they so rely
on setwise-small samplings in exactly the same way. Hence, versions of Baker’s
thesis centered around setwise-smallness seem plainly untenable.
Thus, it seems that Baker’s thesis might be more profitably understood in
terms of pointwise-smallness. Here it is helpful to explicitly note by way of score-
keeping that versions of Baker’s thesis centered around setwise-smallness have im-
plications for versions of Baker’s thesis centered around pointwise-smallness, for
the simple reason that Baker’s thesis is a thesis about the implications of small
samplings, and as noted above, virtually all pointwise-small samples are
setwise-small samples.81 However, the converse does not hold (i.e. most setwise-small
samples are not pointwise-small samples), and hence versions of Baker’s thesis
centered around pointwise-smallness do not automatically have implications for
versions of Baker’s thesis centered around setwise-smallness. Hence, one cannot
infer directly from the untenability of the latter to the untenability of the for-
mer, and so it is thus necessary to consider separately versions of Baker’s thesis
centered around pointwise-smallness.
So let us first consider the version of Baker’s thesis centered around pointwise-
smallness and unreliability. This version of the thesis holds that arithmetical
instance confirmation is unreliable in a way that physical instance confirmation is
not, and that this is due to the samplings in arithmetical instance confirmation be-
ing pointwise-small. Here, the unreliability of instance confirmation is understood
in a standard manner, so that a relative increase in unreliability is concomitant
with a relative increase in the number of false universal hypotheses which are con-
firmed by several true instances. Now, it seems hard to dispute that samplings
in arithmetical instance confirmation are drawn exclusively from pointwise-small
samples: for, given constraints of time and space, even the best computers can
only calculate with numbers of so large a size, and mathematicians likewise face
similar sampling constraints and limitations.
Hence, it seems that the only contentious point in this version of Baker’s thesis
is the claim that such a reliance upon pointwise-small samples renders arithmetical
instance confirmation unreliable in a way in which physical instance confirmation
is not. However, there is an obvious analogue of this reliance upon pointwise-small
sampling in the case of physical instance confirmation, an analogue suggested by
Baker himself.82 In particular, say that a sampling of physical data is timewise-
small if each data point in the sampling was measured (or otherwise observed)
at a point in time that is relatively close to the present. Just as it seems in-
disputable that samplings of natural numbers are pointwise-small, so it seems
indisputable that samplings of physical data are timewise-small. However, it is
generally conceded that physical instance confirmation is sufficiently reliable.83
Hence, if the inference from the dependence of physical instance confirmation on
timewise-small samplings to the unreliability of physical instance confirmation is
rejected, but at the same time the inference from the dependence of arithmetical
instance confirmation on pointwise-small samplings to the unreliability of arith-
metical instance confirmation is accepted, then one should be able to point out
some relevant difference between the two cases.
Baker suggests that the relevant difference between the two consists in the fact
that “there are no [. . . ] systematic differences between the past and the future
[. . . ].”84 It may indeed be the case that many of the properties that interest
scientists are in fact temporally invariant in this sense, so that what is true of
timewise-small samplings will likewise be true in general. However, it is also the
case that many of the properties that interest mathematicians are such that what
is true of pointwise-small samplings is likewise true in general. For instance, this is
the case with respect to the properties which feature in the instance confirmation
of the axioms of Robinson’s Q. So if there is to be a disanalogy between the
setting of arithmetic and the setting of the physical sciences here, it has to be
with regard to something deeper than the fact that many of the properties which
interest scientists (resp. the mathematician) are projectable from timewise-small
samplings (resp. pointwise-small samplings).
Hence, one might try to suggest that the relevant difference between arith-
metical and physical instance confirmation consists in the fact that all physical
properties are projectable from timewise-small samplings, whereas not all arith-
metical properties are projectable from pointwise-small samplings. But if the key
term of physical properties is understood as a naturalistic term that simply picks
out spatio-temporal properties describing portions of the external world, then it is
simply false that all physical properties are so projectable. Indeed, were this the
case, then knowledge of the future would be much easier to come by than it actu-
ally is. Likewise, if the key term of physical properties is understood historically,
as picking out those properties that have interested certain intellectual commu-
nities, then again it is false that all these properties are temporally-projectable:
for, were this the case, then science would be endowed with a kind of infallibility
which is definitively vitiated by the historical record.
It might then be suggested that the important difference between arithmetical
instance confirmation and physical instance confirmation is the success of the
extant practice: most of the physical properties picked out by the community of
scientists have in fact turned out to be temporally projectable, whereas there is no
similar track record of success of mathematicians projecting from the pointwise-
small. So part of the idea here would be that natural scientists somehow learned
to discern the properties of physical objects that do not depend on their temporal
location, whereas mathematicians have yet to learn to discern the properties of
numbers that do not depend on their location in the ordering of greater-than and
less-than on the natural numbers. I am willing to grant all this for the sake of
argument: however, what I want to emphasize is that this does not establish an
entailment from the reliance upon pointwise-small samplings to the unreliability
of arithmetical instance confirmation, any more than pointing to the failures of
pre-scientific communities to project from the temporally-small would establish
that there is an entailment from the reliance upon temporally-small samplings to
the unreliability of physical instance confirmation.
Hence, it seems that no relevant difference has been identified which would jus-
tify us in simultaneously accepting the inference from the reliance upon pointwise-
small samplings to the unreliability of arithmetical instance confirmation while
rejecting the inference from the reliance upon temporally-small samplings to the
unreliability of physical instance confirmation. However, it is important to em-
phasize that one could legitimately reject the need to identify such a relevant
difference. For instance, one might legitimately reject such a need by producing a
valid premise-conclusion argument (with plausible premises) for the inference from
the reliance upon pointwise-small samplings to the unreliability of arithmetical in-
stance confirmation. However, neither Baker nor anyone else that I know of has
produced such an argument. Absent such an argument for this inference, it does
not seem to be unreasonable philosophical methodology to withhold assent from
this inference until some relevant difference between it and a clearly dubious albeit
similar inference has been identified.
However, perhaps the situation is different with regard to the version of Baker’s
thesis centered around pointwise-small samplings and insufficient diversity. This
version claims that arithmetical instance confirmation is insufficiently diverse in
a way in which physical instance confirmation is not, and that this is due to the
samplings in arithmetical instance confirmation being pointwise-small. In what
follows, I will grant for the sake of argument that, all other things being equal,
sufficiently diverse evidence better confirms a hypothesis than insufficiently diverse
evidence, and I will focus instead on ascertaining whether, according to various
analyses of diversity, it is in fact the case that arithmetical instance confirmation
is insufficiently diverse due to its reliance upon pointwise-small samplings. My
conclusion is that there is no extant analysis of evidential diversity on which
arithmetical instance confirmation is insufficiently diverse due to its reliance upon
pointwise-small samplings.
On one analysis, evidential diversity is tied to probabilistic independence, in
that evidence $e_1$ and $e_2$ for hypothesis $h$ is said to be diverse to the extent that
the two quantities $P(e_1 \,\&\, e_2)$ and $P(e_1) \cdot P(e_2)$ are close to one another (either in
terms of their quotient being close to one, or their difference being close to zero).85
Hence, on this analysis, the version of Baker’s thesis presently under considera-
tion would predict that the reliance of arithmetical instance confirmation upon
pointwise-small samplings would result in evidence that displayed an insufficient
level of probabilistic independence. However, there are some examples that do not
accord with this prediction. For instance, consider a case which Baker himself ex-
amines, namely the Goldbach conjecture. This is a universal statement, and so can
be written as $h \equiv \forall x\, H(x)$. Hence, in arithmetical instance confirmation which
relies upon pointwise-small samples, the evidence for this hypothesis $h$ would be
of the form $e_1, \ldots, e_N$, where $N$ is a small natural number, where $e_n \equiv H(s^n(0))$,
and where e.g. $s^2(0) = s(s(0))$ denotes the successor of the successor of zero.
Further, suppose (as is natural in this setting) that the conjunction of the eight
axioms of Robinson’s Q is assigned a high probability, and suppose further that
(as expected) the Goldbach conjecture is true. Then since Robinson’s Q proves
the correctness of the addition and multiplication tables,86 and since $e_n$ can be
written with only bounded quantifiers, it follows that the $e_n$ as well as any finite
conjunction of the $e_n$ will likewise be assigned a high probability. But then the
quantities $P(e_n \,\&\, e_m)$ and $P(e_n) \cdot P(e_m)$ will be close to one (and hence their
quotient will be close to one and their difference will be close to zero), so that
on this analysis the evidence will be diverse. Hence, if evidential diversity is ana-
lyzed in terms of an approximation to probabilistic independence, it is simply false
that arithmetical instance confirmation is insufficiently diverse due to its reliance
upon pointwise-small samplings: for, this example is an example where the sam-
plings are pointwise-small and yet the evidence is close to being probabilistically
independent.87
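The closeness claim in this example can be made explicit with a short calculation. It is only a sketch, resting on two assumptions that the text does not state in this form: that the conjunction of the axioms of Robinson’s Q (written $Q$ below) has probability at least $1 - \varepsilon$ for some small $\varepsilon$ with $0 \leq \varepsilon < 1$ (the symbol $\varepsilon$ is introduced here), and that $P$ is monotone under provable implication, so that if $Q$ proves $\varphi$ then $P(\varphi) \geq P(Q)$.

```latex
% Assumptions: P(Q) >= 1 - \varepsilon, and Q proves each e_n,
% hence also each conjunction e_n & e_m.
\begin{align*}
  P(e_n \,\&\, e_m) &\geq P(Q) \geq 1 - \varepsilon, \\
  P(e_n) \cdot P(e_m) &\geq P(Q)^2 \geq (1 - \varepsilon)^2 .
\end{align*}
% Both quantities therefore lie in the interval [(1 - \varepsilon)^2, 1], so
\begin{align*}
  \bigl| P(e_n \,\&\, e_m) - P(e_n) \cdot P(e_m) \bigr|
    &\leq 1 - (1 - \varepsilon)^2 = 2\varepsilon - \varepsilon^2 , \\
  1 - \varepsilon \;\leq\; \frac{P(e_n \,\&\, e_m)}{P(e_n) \cdot P(e_m)}
    &\leq \frac{1}{(1 - \varepsilon)^2} .
\end{align*}
```

For instance, with $\varepsilon = 0.01$ the difference is at most $0.0199$, and the quotient lies between $0.99$ and roughly $1.0203$.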
Another analysis connects the diversity of evidence to low likelihood with re-
spect to a pool of competing plausible hypotheses, in the following sense. Suppose
that there is evidence which confirms each of a number of competing, plausible
hypotheses. Further suppose that the hypotheses are competitors in that they
are mutually exclusive (no two can both be true) and jointly exhaustive (one
must be true), and suppose that they are plausible in that they all have some
antecedently-specified non-trivial level of prior probability. Then on this analysis,
the evidence e is said to be diverse to the extent that its likelihood P (e|h) is low
with respect to a large number of these hypotheses h.88 The guiding intuition here
seems to be that diverse evidence is evidence whose complexity shields it from be-
ing rendered likely by the majority of these competing plausible hypotheses: it
is diverse not because of some feature intrinsic to the evidence itself, but rather
because it is unexpected from the vantage point of most of these hypotheses.89
However, on this analysis, it does not seem that reliance upon pointwise-small
samplings in arithmetical instance confirmation entails insufficient levels of diver-
sity. For, in arithmetical instance confirmation, the pool of plausible competing
hypotheses would presumably be restricted to a universal arithmetical hypothesis
$h \equiv \forall x\, H(x)$ and its negation $\neg h$, and the evidence in this case would be of
the form $e_X = \bigwedge_{n \in X} H(s^n(0))$, where $X$ ranges over setwise-small sets of natural
numbers. Further, on this analysis, evidence $e_X$ will be diverse to the extent that
the likelihood $P(e_X \mid \neg h)$ is low (since the likelihood $P(e_X \mid h)$ is always equal to
one). Hence, to contend that reliance upon pointwise-small samplings results in
insufficient diversity would be to contend here that $P(e_X \mid \neg h)$ is not sufficiently
low when $X$ is pointwise-small.
However, it seems difficult to see how the pointwise-smallness of X is supposed
to impact the quantity $P(e_X \mid \neg h)$ one way or another. For instance, suppose that
one evaluates the quantity $P(e' \mid h')$ in terms of the level of assent which one would
give to $e'$ were one to assent entirely to $h'$,90 and suppose that the sets under
consideration are three-element sets of natural numbers, and consider the pointwise-small
set $Y = \{2, 3, 4\}$ and the non-pointwise-small set $X = \{200^{1000}, 300^{1000}, 400^{1000}\}$.
Were I to assent to $\neg h$ without having an explicit counterexample in mind, it does
not seem obvious that I would be more or less reticent to assent to $e_X$ than $e_Y$.91
This then suggests that the two quantities $P(e_X \mid \neg h)$ and $P(e_Y \mid \neg h)$ need not be far
apart from one another. Hence, it does not seem evident that reliance upon
pointwise-small samplings entails insufficient levels of diversity of evidence, at least on this
analysis of diversity of evidence.
A third and final analysis of the diversity of evidence suggests the diversity of
evidence requires that “as many hypotheses as possible are tested [by the evidence]
in as many different ways as possible.”92 The background to this analysis is the
idea that what is confirmed are not individual hypotheses taken one by one, but
rather a theory consisting of several auxiliary and several primary hypotheses,
which differ from one another in that the auxiliary hypotheses are themselves
deployed in the confirmation of the primary hypotheses. Given this dependence,
the thought is that it is prudent to develop theories where any given primary
hypothesis is confirmed by several different auxiliary hypotheses. Such theories
will thus only be confirmed by evidence that tests all of these different auxiliary
hypotheses, and that hence tests the primary hypotheses in multiple different
ways.
This analysis of the diversity of evidence was thus obviously designed with
physical theories in mind, but there is nothing in principle that precludes its
application to the setting of arithmetic. For instance, one might view Robinson’s Q
and the Goldbach conjecture as the primary hypotheses and various instances of
mathematical induction as the auxiliary hypotheses. However, on this analysis
of the diversity of evidence, it seems that the diversity of a body of evidence for
this arithmetical theory is ultimately orthogonal to whether or not the evidence
is pointwise-small. For, it does not seem that there is anything connecting the
pointwise-smallness of the instances which are evaluated and whether or not these
instances collectively confirm as many hypotheses in the theory in as many ways
as possible. To be sure, the setwise-smallness of a sampling might be a roadblock
to diversity in this sense, since setwise-small samplings simply have fewer items of
evidence to work with, but there is no obvious roadblock to diversity in the case
of pointwise-small samples. Hence, on this analysis too, it seems that there is no
obvious entailment between pointwise-smallness of the sampling in arithmetical
instance confirmation and insufficient levels of the diversity of evidence.
2.3.2 Stable and Unstable Reasoning in Geometry and Arithmetic
Whereas Baker’s concern was that arithmetical instance confirmation lacked
the virtues of reliability or diversity of evidence, the concern to be considered
now is that arithmetical instance confirmation lacks the virtue of stability. Since
the concept of stability is less conventional than that of reliability, it is necessary
to first set out some preliminaries. Stability pertains to mathematical reasoning
about domains of mathematical objects, the canonical examples being arithmeti-
cal reasoning about the natural numbers and geometrical reasoning about the
Euclidean plane. However, stability does not presuppose that we are in posses-
sion of a means by which to uniquely describe this domain: stable reasoning is
about a domain in the sense that it is defined relative to a domain, and not in
the sense that it characterizes it. In this, it is similar to the verisimilitude of per-
ception: this is a relation between perception and the extra-perceptual, but this
relation can obtain without the extra-perceptual being perceptually specifiable.
The notion of stability applies to reasoning about these domains, and such rea-
soning may be represented in terms of inferences, which are triples of (i) premises
from which one infers, (ii) a conclusion to which one infers, and (iii) a specifica-
tion of the mode in which the premises are said to bear on the conclusion, e.g.
whether they confirm the conclusion, whether they constitute a proof of the con-
clusion, etc. In the literature on probability whose vocabulary has been employed
in this chapter, the premises are called the evidence and the conclusion is called the
hypothesis, and in what follows, I will continue to use this terminology. Another
feature of the terminology which I employ that merits explicit mention is that
one and the same pair (e, h) of evidence and hypothesis might be associated with
two different inferences. For, the inferences may differ in that one asserts that
the evidence confirms the hypothesis, while the other asserts that the evidence
constitutes a proof of the hypothesis. Hence, in this terminology, an inference is
completely specified only when one specifies its evidence, its hypothesis, and its
mode (e.g. confirmation, proof).
Finally, the concept of stability presupposes that with respect to each mode
of inference (confirmation, proof) and each antecedently fixed hypothesis, there is
an associated strict partial order on inferences from evidence to this hypothesis,
namely, the strict partial order of inference from evidence $e_1$ to hypothesis $h$ being
of strictly superior quality to inference from evidence $e_2$ to hypothesis $h$. If the
quality of evidence $e$ relative to hypothesis $h$ is denoted by $q_h(e)$, then this strict
partial order can be written as $q_h(e_1) >_h q_h(e_2)$, and it will be assumed that it
satisfies the following laws:
Transitivity: $q_h(e_1) >_h q_h(e_2)$ and $q_h(e_2) >_h q_h(e_3)$ implies $q_h(e_1) >_h q_h(e_3)$.
Non-reflexivity: $q_h(e_1) \not>_h q_h(e_1)$.
For instance, with respect to confirmation, this strict partial order might be that
of degree of confirmation, so that one would have $q_h(e_1) >_h q_h(e_2)$ if and only
if $P(h \mid e_1 \,\&\, k) - P(h \mid k) > P(h \mid e_2 \,\&\, k) - P(h \mid k)$, where $k$ is the background
knowledge. That is, this order reflects the magnitude to which the probability of
the hypothesis given the evidence and background knowledge exceeds the prior
probability of the hypothesis given the background knowledge.93 Likewise, with
respect to proof, this strict partial order of quality might reflect the degree to
which the proof (i) appeals only to generally accepted axioms, (ii) employs only
standard rules of inference, and (iii) does not appeal to the hypothesis itself.
Hence, if one has doubts about the extent to which each of these modes of inference
(e.g. confirmation, proof) comes equipped with a strict partial order of quality
which facilitates comparisons of evidence for a given hypothesis, then one will
have legitimate concerns about the cogency of the notion of stability.
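To make the confirmation ordering concrete, here is a minimal sketch over a toy finite probability space. The space (three objects, each satisfying H independently with probability 1/2), the trivial background knowledge, and all function names are assumptions made for illustration; they are not the probability assignments which the text has in view.

```python
from fractions import Fraction
from itertools import product

# Toy space (an assumption): a world records, for each of three objects,
# whether it satisfies H; all eight worlds are equally likely.
WORLDS = {w: Fraction(1, 8) for w in product([True, False], repeat=3)}

def prob(prop):
    """Probability of the set of worlds satisfying prop."""
    return sum(p for w, p in WORLDS.items() if prop(w))

def cond(prop, given):
    """Conditional probability P(prop | given)."""
    denom = prob(given)
    if denom == 0:
        raise ValueError("conditioning on a probability-zero proposition")
    return prob(lambda w: prop(w) and given(w)) / denom

def degree(h, e, k):
    """Degree of confirmation P(h | e & k) - P(h | k)."""
    return cond(h, lambda w: e(w) and k(w)) - cond(h, k)

def strictly_better(h, e1, e2, k):
    """The order q_h(e1) >_h q_h(e2), read as greater degree of confirmation."""
    return degree(h, e1, k) > degree(h, e2, k)

h = lambda w: all(w)              # universal hypothesis: every object satisfies H
k = lambda w: True                # trivial background knowledge
e_one = lambda w: w[0]            # evidence: the first object satisfies H
e_two = lambda w: w[0] and w[1]   # evidence: the first two objects satisfy H
```

Because the order in this sketch is induced by comparing rational numbers, transitivity and non-reflexivity hold automatically; here `e_two` confirms `h` to degree 3/8 and `e_one` to degree 1/8.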
These preliminaries in place, it is now possible to define stability. Suppose
that D is a mathematical domain (e.g. the natural numbers, the Euclidean plane).
Then relative to this domain, the instability of an inference can be defined as
follows:
The inference from evidence $e$ to universal hypothesis $h \equiv \forall x\, (Dx \rightarrow Hx)$
is unstable if there are a finite number of objects $d_1, \ldots, d_n$ in the domain $D$
such that the conjunctive evidence $e \,\&\, \bigwedge_{i=1}^{n} H(d_i)$ for hypothesis $h$ is strictly
superior to the evidence $e$ for hypothesis $h$.
In this definition, it is presupposed that the mode of the inference from the
conjunctive evidence $e \,\&\, \bigwedge_{i=1}^{n} H(d_i)$ to the hypothesis $h$ is the same as the mode of
the inference from the prior evidence e to the hypothesis h, e.g. if the latter asserts
a relation of confirmation (resp. proof) then the former also asserts a relation of
confirmation (resp. proof). Finally, an inference from evidence to a universal
hypothesis is said to be stable if it is not unstable. A body of mathematical rea-
soning about an antecedently fixed domain is said to be unstable if some inference
from evidence to universal hypotheses present in this reasoning is unstable, and
it is said to be stable otherwise. Hence, in stable reasoning, an inference from
evidence to a universal hypothesis cannot be strictly bettered by the inclusion
of additional evidence to the effect that some finite number of particular objects
from the domain satisfy this hypothesis.
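The definition of instability can be restated schematically. What follows is only a sketch: it assumes the notation qh(·) and >h for the strict partial order of quality introduced earlier, and the second display specializes to the case in which quality is measured by the degree of confirmation P (h|e & k) − P (h|k).

```latex
% Schematic restatement of instability (requires amsmath, amssymb).
% q_h and >_h denote the quality order from the text; k is the background knowledge.
% General case: the inference from e to h is unstable if and only if
\[
  \exists n \geq 1\ \exists d_1,\dots,d_n \in D:\quad
  q_h\Bigl(e \,\&\, \textstyle\bigwedge_{i=1}^{n} H(d_i)\Bigr) >_h q_h(e).
\]
% Confirmation case: with quality given by P(h | e & k) - P(h | k), this becomes
\[
  \exists n \geq 1\ \exists d_1,\dots,d_n \in D:\quad
  P\Bigl(h \,\Big|\, e \,\&\, \textstyle\bigwedge_{i=1}^{n} H(d_i) \,\&\, k\Bigr)
  > P(h \mid e \,\&\, k).
\]
```

In the confirmation case the term P (h|k) is common to both sides of the comparison and so drops out.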
Some canonical forms of reasoning in mathematics are stable. In particular,
certain canonical forms of geometrical reasoning are stable, and this is due to the
fact that certain canonical geometrical structures display a high level of indiscernibility.
Suppose that M is a structure and that D ⊆ M^n is a set (not necessarily
definable) and A ⊆ M is a set of parameters (not necessarily definable). Then D
is said to be A-indiscernible if any two elements a, b of D are such that they satisfy
the same first-order A-definable properties.94 It is not difficult to see that stability
is implied by indiscernibility, or more specifically, by known indiscernibility. For,
suppose that D is A-indiscernible, and suppose that this is part of the background
knowledge. Suppose further that evidence e is taken to constitute a proof of the
universal hypothesis h ≡ ∀ x (Dx → Hx), where H is an A-definable property.
Then the inference from e to h is stable. For, suppose that d1, . . . , dn are tuples
from D. It must then be asked whether conjunctive evidence e & ⋀_{i=1}^{n} H(di) for
hypothesis h is strictly superior to the prior evidence e for hypothesis h. However,
there is a clear sense in which this conjunctive evidence is not strictly superior to
the prior evidence. For, what the conjunctive evidence adds to the prior evidence
is additional evidence of the form H(di), and it is known from indiscernibility that
these are each equivalent to the hypothesis h ≡ ∀ x (Dx→ Hx). Since adding ev-
idence which is obviously equivalent to the hypothesis does not improve or better
the inference from the evidence to the hypothesis, this inference is stable. This is
why stability is implied by known indiscernibility.
One example of a canonical geometric structure which displays high levels
of indiscernibility is the standard presentation of the Euclidean plane as a
two-sorted structure P ⊔ L with a set of points P and a set of lines L, where the
only non-logical relation is the incidence relation of a point lying on a line. This
is a presentation of the Euclidean plane to which many of Euclid’s axioms are
readily applicable. It turns out that in this structure, the set of points P is
an ∅-indiscernible set: that is, any two points satisfy the same parameter-free
first-order formulas. This is due to the fact that the only non-logical relation in
the structure is the incidence relation P ∈ ℓ of a point P lying on a line ℓ. This
relation is preserved under any permutation π : P → P , in that P ∈ ℓ if and only
if π(P ) ∈ π(ℓ) ≡ {π(Q) : Q ∈ ℓ}. Here a permutation π : P → P is simply a map
such that no two distinct points are sent to the same point under this map and
such that every point is a point to which some point is sent. Further, such
a permutation π : P → P is said to be line-preserving if π(ℓ) ≡ {π(Q) : Q ∈ ℓ}
is itself a line for any line ℓ. The reason that P is ∅-indiscernible is that for
any two points P, Q there is a line-preserving permutation π : P → P such
that π(P ) = Q. For, since such a permutation preserves the incidence relation
and sends lines to lines, it induces an automorphism of the structure P ⊔ L, and
any two automorphic elements of any structure are ∅-indiscernible. Hence, the
relevant feature of geometrical reasoning which is generative of indiscernibility in
this presentation of the Euclidean plane is the fact that its primitive non-logical
relation (i.e. incidence) is invariant under permutations.95, 96
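The step from automorphisms to indiscernibility rests on a standard model-theoretic fact, which can be written out as follows. The coordinate description of the translation is an assumption added here purely for illustration (it identifies the points of the plane with pairs of real numbers); it is not part of the incidence presentation discussed above.

```latex
% Automorphisms preserve first-order formulas (standard fact; requires amsmath, amssymb).
% If \sigma is an automorphism of a structure M, then for every parameter-free
% first-order formula \varphi(x) and every element a of M:
\[
  M \models \varphi(a) \iff M \models \varphi(\sigma(a)).
\]
% Illustration under the assumed coordinate model: for points P, Q in R^2, the translation
\[
  \pi(X) = X + (Q - P)
\]
% is a line-preserving permutation of the points with \pi(P) = Q.
```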
However, while this canonical type of geometrical reasoning is stable,
it is easy to see that arithmetical instance confirmation need not be stable. For
the sake of simplicity, let us suppose that the arithmetical instance confirmation
in question is such that the universal hypothesis h ≡ ∀ x (Dx → Hx)
is confirmed by evidence of the form en ≡ ⋀_{i=1}^{n} H(s^i(0)) for some particular
value of n, say m, against the background knowledge k. If there is a natural
number M > m such that 0 < P (eM & k) < P (em & k) < 1, then the evi-
dence em for h is unstable. For, under these circumstances, it follows from P1-
P3 that P (h|eM & k) − P (h|k) > P (h|em & k) − P (h|k), that is, the degree
to which eM confirms the universal hypothesis h is greater than the degree to
which em confirms this universal hypothesis.97 Further, it seems that there are
several plausible scenarios in which there would be an M > m such that the prior
probability of eM is less than the prior probability of em. For instance, this would
happen if probabilities are associated with degrees of confidence, and this would
likewise happen if some of the evidence displayed independence. Hence, to the
extent that these scenarios are plausible, it follows that arithmetical instance con-
firmation need not be stable.
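The inequality between the two degrees of confirmation can be derived in a few lines. The following is a sketch under two assumptions that are not stated explicitly in the paragraph above: that h together with k logically implies each en (as when the background knowledge includes that every s^i(0) belongs to D), and that P (h & k) > 0.

```latex
% Sketch of the derivation (requires amsmath).
% Assumptions: h & k entails e_n for every n, and P(h & k) > 0.
\begin{align*}
P(h \mid e_n \,\&\, k)
  &= \frac{P(h \,\&\, e_n \,\&\, k)}{P(e_n \,\&\, k)}
   = \frac{P(h \,\&\, k)}{P(e_n \,\&\, k)}
  && \text{since } h \,\&\, k \models e_n, \\
P(h \mid e_M \,\&\, k)
  &= \frac{P(h \,\&\, k)}{P(e_M \,\&\, k)}
   > \frac{P(h \,\&\, k)}{P(e_m \,\&\, k)}
   = P(h \mid e_m \,\&\, k)
  && \text{since } 0 < P(e_M \,\&\, k) < P(e_m \,\&\, k).
\end{align*}
```

Subtracting P (h|k) from both sides yields the stated comparison. If P (h & k) = 0, the two sides are equal, which is why the positivity assumption is needed.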
The instability of arithmetical instance confirmation is a problem because,
other things being equal, it seems that stable mathematical reasoning has certain
advantages over unstable mathematical reasoning. For, if one is operating in a
context where all the reasoning in which one engages is stable, then one is given
a kind of license to prescind from the examination of particulars in establish-
ing universal hypotheses. For, stability guarantees that evidence for a universal
hypothesis cannot be bettered by examination of particular cases. Since the ex-
amination of particular cases often requires a non-trivial expenditure of resources,
foreknowledge of stability frees one from such expenditures, and allows one to
focus resources elsewhere. By contrast, consider what happens in the case of un-
stable reasoning about an infinite domain. On the one hand, instability says that
there is a finite set of objects such that the evidence of their satisfying a universal
hypothesis materially improves the prior evidence for the universal hypothesis. On
the other hand, the infinitude of the domain does not ensure that one will succeed
in finding such a finite set (or ensure that one will succeed in finding such a set
and recognizing it as such). In stable reasoning about infinite domains, such a
dilemma simply cannot occur, and freedom from such dilemmas is further indica-
tion of the chief good delivered by stable reasoning, namely, the freedom from the
examination of particular cases in establishing universal hypotheses. Hence, be-
cause it aids in delivering this chief good, stable reasoning enjoys a ceteris paribus
advantage over unstable reasoning.
However, there are at least three potential objections to this argument that
stable reasoning is ceteris paribus better than unstable reasoning. The first ob-
jection is that this argument only establishes that stable reasoning has certain
pragmatic non-epistemic advantages, since considerations of the expenditure of
resources are at best pragmatic considerations, not ultimately bearing upon any
genuine epistemic notion, like that of justification. I am willing to grant for the
sake of argument that considerations of the expenditure of resources are ulti-
mately pragmatic in nature.98 However, I would suggest that such pragmatic
considerations are ultimately constitutive of mathematical reasoning as such. In
mathematics one reasons about infinite domains with the aid of theories whose de-
ductive relations are often non-computable, and for one to meet with any success
in establishing universal hypotheses about these domains, one must ultimately de-
velop strategies for limiting expenditures of resources. The argument given in the
above paragraph shows that stable reasoning can avoid one particularly resource-
consuming activity, namely, the consideration of particular cases in establishing
universal hypotheses.
The second objection arises from the observation that while stable reasoning
may avoid the consideration of particular cases in establishing universal hypothe-
ses, obviously neither it nor any other extant available form of reasoning can avoid
the consideration of particular cases in establishing existential hypotheses. Thus
the objection would be that the aforementioned suggested advantage of stable over
unstable reasoning is illusory since one cannot know ahead of time whether to go
about seeking to establish a universal hypothesis or its negation. It is of course
true that one cannot know ahead of time whether to go about seeking to establish
a universal hypothesis or its negation. However, in the normal course of events,
one will alternate between (i) expending resources in attempting to establish a
universal hypothesis and (ii) expending resources in attempting to establish its
negation. My suggestion is merely that stable reasoning is ceteris paribus prefer-
able to unstable reasoning simply because it enjoys an advantage with respect
to (i), albeit not with respect to (ii). That is, it is preferable because it exacts a
savings in the expenditure of resources in regard to one important component of
mathematical activity.
Finally, one might voice not an objection per se but rather a lingering concern
that the argument for the ceteris paribus preferability of stable reasoning is ill-
motivated, in that it does not have an obvious historical precedent. However, it
seems that there are extant arguments in the philosophy of mathematics for con-
clusions to the effect that unstable reasoning is objectionable. For instance, in his
commentary on Euclid, Proclus tells us that “[. . . ] a universal premise is better for
demonstration than a particular [. . . ]” and “[. . . ] demonstrations from universals
are more truly demonstrative [. . . ]” ([125] p. 14, italics added). Likewise, in his
book on mathematical knowledge, Kitcher asks: “How, for example, do I have the
right to conclude, on inspecting a scalene triangle, that the sum of the lengths of
two sides of a triangle is greater than the length of the third side but not that
all triangles are scalene?” ([91] p. 51, italics added). These concerns of Proclus
and Kitcher seem to articulate what I regard as a not uncommon view in the phi-
losophy of mathematics, namely that evidence of several particulars satisfying a
universal mathematical hypothesis cannot materially better our evidence for that
universal hypothesis itself. By definition, such improvements cannot take place
in stable reasoning, and hence the argument given above for the ceteris paribus
preferability of stable reasoning in mathematics is simply an attempt to articulate
an admittedly new argument for this not uncommon view.
Thus far, three things have been argued for in this section: (i) that one
important type of mathematical reasoning, namely a canonical sort of geometrical
reasoning, is stable, (ii) that arithmetical instance confirmation is not necessarily
stable, and (iii) that stable mathematical reasoning is ceteris paribus
preferable to unstable mathematical reasoning. Hence, this suggests the following
challenge: arithmetical instance confirmation is objectionable not because it fails
to provide a measure of justification, but rather because such reasoning fails to
display an important virtue of mathematical reasoning, namely, stability.
My response to this challenge is to suggest that while stability may be a
virtue of geometric reasoning, it is not a virtue of arithmetical reasoning. For,
the chief good that stability imparts to our reasoning is that it frees us from the
burden of having to examine particular cases in establishing a universal hypothesis.
However, if all extant reasoning about a given domain of objects was such that it
involved the examination of particular cases in establishing a universal hypothesis,
then this would suggest that stability is not of value in regard to our reasoning
about this domain. For, if the value X is an instrumental value which derives
its worth from the extent to which it promotes value Y , and if value Y is simply
known not to be attainable in certain contexts, then in these contexts X likewise
loses its value. In these terms, my response to the challenge from stability is
to contend that in arithmetical contexts, freedom from the burden of examining
particular cases in establishing universal hypotheses is simply not extant, and
since this is the chief end to which stability is directed as a means, the stability
of reasoning is simply not of value in arithmetical contexts.
Hence, this response requires an argument to the effect that extant arithmeti-
cal reasoning always involves the examination of particular cases in establishing a
universal hypothesis. The warrant for this is that the primary extant manner in
which mathematical claims are established in arithmetical contexts is by mathe-
matical induction: that is, one argues that zero has the property, and that n+ 1
has this property whenever n does, and from this one concludes that all natural
numbers have this property. To the best of my knowledge, outside of the ax-
ioms of Robinson’s Q with which inceptive empiricism is concerned, all universal
arithmetical claims are established by mathematical induction. Indeed, this is
the reason why amplificatory empiricism is important, in that it purports to justify
this mode of inference. However, the point being made now is independent of
both amplificatory and inceptive empiricism: this is the point that in establishing
universal arithmetical claims, such as feature in the consequent of an instance of
mathematical induction, there is always as a matter of fact an examination of par-
ticular cases, such as in the antecedent of an instance of mathematical induction.
It is helpful here to explicitly contrast the geometrical and arithmetical setting
with regard to the necessity of examining particular cases. To establish a universal
hypothesis in the arithmetical setting by mathematical induction, it is necessary
to first garner evidence that zero (or some other “base case”) satisfies the universal
hypothesis. Hence, in the arithmetical setting, examining a particular case is a
necessary precondition to implementing this canonical means of establishing a
universal hypothesis. However, suppose that one is in a geometrical setting with
a high level of recognized indiscernibility, and suppose that one is attempting
to establish a universal hypothesis to which indiscernibility applies, in that if
one object in the domain has the property, then all do. In developing proofs of
this hypothesis, one can always eliminate claims to the effect that a particular
object satisfies the universal hypothesis, since such a claim contributes nothing to
the proof, since it is equivalent to the universal hypothesis itself. Hence, in this
canonical type of geometric setting, the examination of particular cases is always
in principle eliminable.
Thus, my response to the challenge from stability is simply to concede that sta-
bility is a virtue in some types of geometrical reasoning, but to deny that the same
is true in arithmetical reasoning. For, it seems that the good that stability secures,
namely the legitimate disengagement from the consideration of particular cases, is
simply not to be found in any known means of establishing universal arithmetical
hypotheses. That said, one way in which to disagree with this response would be
to adduce some manner of securing universal arithmetical hypotheses which did
not rely on the examination of particular cases. It is by no means obvious that
we presently possess a complete enumeration of types of argumentation germane
to arithmetical universal hypotheses. But, absent such a countervailing example
of arithmetical argumentation, it seems that arithmetical instance confirmation is
of a piece with the other known methods of establishing universal arithmetical
hypotheses. For, unlike in the geometric setting, in the
arithmetical setting all of these methods involve the examination of particular
cases.
2.4 Challenges from Alternative Inferences
In the previous section, challenges centered around arithmetical instance con-
firmation were considered, and these challenges were thus specific to inceptive em-
piricism, since of the two types of empiricism considered here– namely incep-
tive and amplificatory empiricism– it is only inceptive empiricism which relies
upon arithmetical instance confirmation. By contrast, this section is devoted to
a difficulty which threatens the sustainability of amplificatory empiricism. Recall
that amplificatory empiricism contends that one is justified in inferring from the
antecedent of an instance of mathematical induction to its consequent, relative
to the background knowledge consisting of the conjunction of the eight axioms
of Robinson’s Q, because the consequent is confirmed by the antecedent relative
to this background knowledge. The challenge considered in this section contends
that confirmation is not a good guide to justification in this arithmetical setting
since there is a series of inferences which are relatively similar to the inference
from the antecedent of an instance of mathematical induction to its consequent,
but in which the evidence in question intuitively constitutes poor evidence for the
hypothesis in question. The response which I suggest in this section is that con-
firmation can be made to accord with these intuitive judgments about the quality
of evidence if one takes into account not only confirmation per se but also the
degree of confirmation.
To describe the relevantly similar inferences that I have in mind, it is help-
ful to first recall some terminology introduced in § 2.1 and to take note of some
elementary logical implications. The consequent of an instance of mathematical
induction simply says that all numbers have a given fixed property, and
the antecedent simply says that zero has this property and that n + 1 has this
property whenever n does. Let us respectively call this antecedent and this conse-
quent the genuine antecedent and the genuine consequent. Let us then define the
pseudo-antecedent to be the following claim: zero has the property and 2(n + 1)
has the property whenever 2n does. Likewise, in parallel with this, let us fi-
nally define the pseudo-consequent to be the following claim: all even natural
numbers have the property, where an even number is simply a number which
is equal to 2n for some natural number n. Further, let us slightly expand the
background knowledge from Robinson’s Q to a slightly larger finite theory– which
we can term supplemented Robinson’s Q– which consists of Robinson’s Q plus
the axiom ∀ n 2(n + 1) = 2n + 2 = ((2n) + 1) + 1. Then against the background
knowledge of supplemented Robinson’s Q, one has the following elementary logical
implications: (i) the genuine consequent logically implies the genuine antecedent,
the pseudo-antecedent, and the pseudo-consequent, (ii) the pseudo-consequent
logically implies the pseudo-antecedent, and (iii) the genuine antecedent logically
implies the pseudo-antecedent.99
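For reference, the four claims can be written schematically for a fixed property H. The labels e_g, h_g, e_p, h_p are shorthand, and the formalization is a sketch rather than a quotation of the official definitions from § 2.1.

```latex
% Schematic forms of the four claims for a fixed property H (requires amsmath).
\begin{align*}
e_g &\equiv H(0) \wedge \forall n\,\bigl(H(n) \rightarrow H(n+1)\bigr)
  && \text{(genuine antecedent)} \\
h_g &\equiv \forall n\, H(n)
  && \text{(genuine consequent)} \\
e_p &\equiv H(0) \wedge \forall n\,\bigl(H(2n) \rightarrow H(2(n+1))\bigr)
  && \text{(pseudo-antecedent)} \\
h_p &\equiv \forall n\, H(2n)
  && \text{(pseudo-consequent)}
\end{align*}
```

On this rendering, implication (iii) is where the supplementary axiom is used: given H(2n), two applications of the inductive clause of the genuine antecedent yield H(((2n) + 1) + 1), which by the axiom is H(2(n + 1)).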
I want now to consider two pairs of inferences, and to contrast what amplifica-
tory empiricism says about these cases with what our intuitive judgments about
the quality of evidence say about these cases. First, consider the contrast between
the following two inferences:
(a) the inference from the genuine antecedent to the genuine consequent,
(b) the inference from the pseudo-antecedent to the genuine consequent.
From an intuitive perspective, this latter inference (b) is inferior to the former in-
ference (a). For, intuitively, the evidence which features in the pseudo-antecedent
only concerns half of the natural numbers, namely, the even natural numbers, and
in general what is true of one infinite coinfinite subset of the natural numbers need
not be true of all the natural numbers. For instance, only one of the even numbers
is prime, whereas there are infinitely many prime numbers. However, it is clear
that the very same considerations which show that the genuine antecedent con-
firms the genuine consequent will also show that the pseudo-antecedent confirms
the genuine consequent: namely, against the background knowledge of supplemented
Robinson’s Q, it follows that the consequent logically implies the antecedent
in both (a) and (b), as we noted in the previous paragraph (at roman numeral (i)).
Hence, the very same considerations which underlie amplificatory empiricism
would commit us to saying that one is justified in inferring from the
pseudo-antecedent to the genuine consequent. This, of course, is intuitively prob-
lematic, since this would justify us in concluding the obviously false statement that
all natural numbers are even numbers on the basis of the obviously true statement
that zero is an even number and that if 2n is an even number then 2(n+ 1) is an
even number.100
Likewise, consider the contrast between the following two inferences, the first
of which has already been encountered:
(b) the inference from the pseudo-antecedent to the genuine consequent,
(c) the inference from the pseudo-antecedent to the pseudo-consequent.
The contrast between this pair of inferences is relevant because, supposing that one
had acquired the pseudo-antecedent as evidence, there arises the natural question
as to whether one is better justified in inferring towards the genuine consequent
or towards the pseudo-consequent. Or, supposing that one person infers from the
pseudo-antecedent to the genuine consequent, while another person infers from the
pseudo-antecedent to the pseudo-consequent, there arises the natural question of
which of these two people has the better evidence for their conclusion. Intuitively,
it seems that the inference towards the pseudo-consequent is better justified than
the inference towards the genuine consequent, and largely for the same reasons as
described in the previous paragraph. For, if the evidence at hand only concerns
even numbers, then it seems that one should refrain from endorsing a universal
hypothesis about all natural numbers and rather endorse a more circumspect
universal hypothesis about all even numbers. However, as noted above in roman
numerals (i)-(ii), in both the inference (b) and the inference (c), one has that
the consequent logically implies the antecedent, and hence it seems that the same
considerations which support amplificatory empiricism would lend support to the
contention that one is justified in making both of these inferences, despite the
fact that inference (c) seems superior to inference (b). This, of course, seems
intuitively quite problematic: for, given the evidence that zero is an even number
and that 2(n + 1) is an even number whenever 2n is an even number, it seems
far more reasonable to conclude the obvious truth that all even numbers are even
numbers than it seems to conclude the obvious falsehood that all natural numbers
are even numbers.
In order to bring the considerations of confirmation which underlie amplifica-
tory empiricism in accord with these intuitive judgements of the superiority and
inferiority of evidence, my suggestion is to advert to the notion of the degree of
confirmation. The degree of confirmation can be taken to be given by the quan-
tity P (h|e & k) − P (h|k), which measures the extent to which the probability
of the hypothesis h conditional on the evidence e and background knowledge k
exceeds the probability of the hypothesis h conditional merely on the background
knowledge k. In our examples, the background knowledge k is the conjunction
of the nine axioms of supplemented Robinson’s Q, the hypothesis in question is
either the genuine consequent hg or the pseudo-consequent hp, and the evidence in
question is either the genuine antecedent eg or the pseudo-antecedent ep. Fur-
ther, in the antepenultimate paragraph, it was noted that against background
knowledge k, one has that (i) hg logically implies eg, ep and hp, (ii) hp logically
implies ep, and (iii) eg logically implies ep. Finally, in this terminology, the first
contrast considered was between the inference (a) from eg to hg and the infer-
ence (b) from ep to hg, while the second contrast considered was between the
inference (b) from ep to hg and the inference (c) from ep to hp. This information is
summarized in Figure 2.1, where the logical implications are written with arrows
labeled by the logical consequence relation |= and the inferences undertaken by
agents are written with arrows labeled by (a)-(c). Note that the directions of the
arrows here are exactly as one would expect, e.g. while hg logically implies eg, we
are considering the suggestion that one is justified in inferring from eg to hg.
[Figure 2.1. Alternative Confirming Inferences: Two Pairs of Contrasting Inferences. Arrows labeled |=: hg to eg, hg to ep, hg to hp, hp to ep, eg to ep. Arrows labeled by inferences: (a) eg to hg, (b) ep to hg, (c) ep to hp.]
It is helpful to present these examples in such an abstract manner because
the information depicted in Figure 2.1 suffices to explain why the degrees of confirmation
in our two pairs of contrasting inferences accord with the intuitive
judgements of the superiority and inferiority of evidence described above. For,
suppose that a given quadruple of sentences hg, eg, hp, ep stands in the same log-
ical relationships as depicted in Figure 2.1, in the sense that the arrows labeled
with the logical consequence relation |= are the same. It then follows from stan-
dard manipulations of P1-P3 that the degree of confirmation in inference (a) is
greater than or equal to the degree of confirmation in (b), and that likewise the
degree of confirmation in inference (c) is greater than or equal to the degree of
confirmation in inference (b).101 This, of course, accords entirely with the intu-
itive judgments described above, which held that inference (a) was superior to
inference (b), and likewise that inference (c) was superior to inference (b). Hence,
by taking into account not just confirmation but the degree of confirmation, it is
possible to distinguish between superior and inferior inferences in a manner which
agrees with our intuitive judgements.102
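The comparison of degrees of confirmation uses only the implications recorded in Figure 2.1. What follows is a sketch of the manipulations, assuming that P (k) > 0 and P (eg & k) > 0 so that all of the conditional probabilities involved are defined; the abbreviation c(e, h) for the degree of confirmation is introduced here only for convenience.

```latex
% Sketch (requires amsmath). Abbreviation: c(e,h) = P(h | e & k) - P(h | k).
\begin{align*}
% (a) versus (b): h_g & k entails e_g and e_p, and e_g & k entails e_p.
c(e_g,h_g) - c(e_p,h_g)
  &= \frac{P(h_g \,\&\, k)}{P(e_g \,\&\, k)} - \frac{P(h_g \,\&\, k)}{P(e_p \,\&\, k)}
   \;\geq\; 0, \\
% (c) versus (b): h_p & k entails e_p, and h_g & k entails h_p.
c(e_p,h_p) - c(e_p,h_g)
  &= \bigl(P(h_p \,\&\, k) - P(h_g \,\&\, k)\bigr)
     \left(\frac{1}{P(e_p \,\&\, k)} - \frac{1}{P(k)}\right)
   \;\geq\; 0.
\end{align*}
```

The first inequality holds because eg logically implies ep against k, so that P (eg & k) ≤ P (ep & k). The second holds because hg logically implies hp against k, so that P (hg & k) ≤ P (hp & k), and because P (ep & k) ≤ P (k). Both inequalities are weak, which matches the "greater than or equal to" in the text.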
The challenge considered in this section is that in order for confirmation to be
a good guide to justification, it needs to accord with our intuitive judgments of
the superiority and inferiority of evidence. My response to this challenge is simply
to note that the comparisons of degree of confirmation agree entirely with these
intuitive judgments. However, it is not difficult to see that nothing in this response
hinged heavily upon considerations peculiar to the even numbers, in terms of which
the pseudo-antecedent and the pseudo-consequent were defined. Rather, all that
was important was the relations of logical implication depicted in Figure 2.1.
Hence, this response is quite general, and applies to other alternative inferences
which stand in similar relations of logical implication to the genuine antecedent
and the genuine consequent.
To illustrate this generality, consider an example which differs from the exam-
ple of even numbers in that it concerns a class of numbers which are not distributed
uniformly throughout the natural numbers, but rather are clustered towards the
beginning of the natural numbers. In particular, suppose that one grants that
there is a class of natural numbers called the standard natural numbers, such that
probabilities can be assigned to all sentences in the signature of the Peano axioms
augmented by a unary predicate symbol for this class, and such that this class
has the following properties, all of which are incorporated into the background
knowledge: (1) all standard natural numbers are natural numbers, but there are
some natural numbers which are not standard natural numbers, (2) if n is a stan-
dard natural number and m is less than n, then m is a standard natural number,
(3) zero is a standard natural number and n + 1 is a standard natural number
whenever n is a standard natural number. Obviously, properties (1) and (3) imply
the falsity of various instances of the mathematical induction schema which are
expressible in the signature augmented by a unary predicate symbol for the stan-
dard natural numbers, but elementary compactness considerations indicate that
these three properties (1)-(3) do not require violations of mathematical induction
applied to properties which are expressible in the signature unaugmented by the
unary predicate symbol for this new class. That is, the supposition of such a class
is entirely consonant with mathematical induction being true for all arithmetical
predicates expressible purely in terms of addition and multiplication.103
Hence, just as the notions of the pseudo-antecedent ep and the pseudo-consequent hp
were defined above, so one can define the notions of the standard antecedent and
the standard consequent as follows: the standard antecedent es says that zero has
the given property and that n+1 has the property whenever the standard natural
number n has the property, while the standard consequent hs says that all stan-
dard natural numbers have the property in question. It is easy to see that relative
to the background knowledge consisting of Robinson’s Q and the claims (1)-(3)
from the previous paragraph, one has that (i) hg logically implies eg, es and hs,
(ii) hs logically implies es, and (iii) eg logically implies es. Hence, when the sub-
script p is replaced by the subscript s, one sees immediately that the quadruple
of sentences hg, eg, hs, es instantiates the logical implication relations from Fig-
ure 2.1. Hence, by what was said above, it follows that the degree of confirmation
in the inference from the genuine antecedent to the genuine consequent is greater
than or equal to the degree of confirmation from the standard antecedent to the
genuine consequent, and likewise that the degree of confirmation in the inference
from the standard antecedent to the standard consequent is greater than or equal
to the degree of confirmation from the standard antecedent to the genuine con-
sequent. This, it seems, accords with our intuitive judgments about the quality
of evidence: a universal hypothesis about natural numbers is better confirmed
by evidence about all the natural numbers than by evidence about the standard
natural numbers, and likewise evidence pertaining exclusively to standard natural
numbers better confirms a universal hypothesis about standard natural numbers
than a universal hypothesis about natural numbers.
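The standard antecedent and standard consequent can likewise be written schematically. In this sketch, S is a hypothetical name for the unary predicate symbol for the standard natural numbers and H is the fixed property; neither symbol is fixed by the text.

```latex
% Schematic forms (requires amsmath). S: assumed predicate for "standard natural number".
\begin{align*}
e_s &\equiv H(0) \wedge \forall n\,\bigl((S(n) \wedge H(n)) \rightarrow H(n+1)\bigr)
  && \text{(standard antecedent)} \\
h_s &\equiv \forall n\,\bigl(S(n) \rightarrow H(n)\bigr)
  && \text{(standard consequent)}
\end{align*}
```

On this rendering, the implication from the standard consequent to the standard antecedent relies on property (3): zero is standard, and if n is standard then so is n + 1, so that H holds of zero and of n + 1.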
This last point seems particularly important to mention, because one might
be sympathetic to the intuition that, as far as we know, there are natural num-
bers which are not standard natural numbers, whereas all our evidence concerns
standard natural numbers. For, on the one hand, the assumptions on the class
of standard natural numbers articulated in (1)-(3) of the penultimate paragraph
require that the natural numbers 0, 1, 2, 3, 4, 100, 2000, 30000, 400000 are all
standard natural numbers, and such considerations suggest the plausibility of the
thought that all our evidence pertains to standard natural numbers. But, on the
other hand, if one takes probabilities to be indicative of degrees of confidence, one
might be sympathetic to assigning a non-zero probability to the claim that
some natural numbers are non-standard natural numbers.
However, what I want to emphasize is that if the evidence in question is all rel-
ativized to the standard natural numbers, then this does not significantly impact
the viability of the thought underlying amplificatory empiricism. For, whereas
amplificatory empiricism is a claim about the epistemic implications of the confirmation
of the genuine consequent by the genuine antecedent, there is an obvious
analogue which would concern the epistemic implications of the confirmation of
the standard consequent by the standard antecedent. Further, as noted in the last
paragraph, given evidence consisting of the standard antecedent, both our intu-
itive judgements and considerations of the degree of confirmation would suggest
that it is better to infer from such evidence to the standard consequent than to
the genuine consequent. Hence, if one thinks that it is plausible that there are
natural numbers which are not standard natural numbers and if one thinks that
it is plausible that everything in our evidentiary store concerns standard natural
numbers, then one can employ an obvious analogue of amplificatory empiricism
as a means by which to confirm universal statements about the natural numbers
in our evidentiary store.
Much of the philosophical literature on the Peano axioms is preoccupied with
considerations about non-standard integers, and indeed rightly so, as much of
this literature is concerned with the extent to which first-order axiomatizations
such as the Peano axioms are capable of uniquely describing or characterizing the
subject-matter of the natural numbers.104 However, part of what I have tried
to underscore in the preceding paragraphs is that the distinction between stan-
dard and non-standard natural numbers is orthogonal to the tenability of the
probability-based conception of arithmetical knowledge which I consider here. On
some level, this is not a surprising result– for instance, I take it that it is not
obvious that every important distinction in the metaphysics of middle-sized ob-
jects will automatically have implications for the epistemology of such objects.
However, given the predominance of considerations of non-standard integers in
the philosophical literature on the Peano axioms, it seems prudent to have ex-
plicitly made a case for this orthogonality here in this section. My suggestion is
that alternative inferences centered around non-standard natural numbers are no
more difficult to deal with than alternative inferences centered around even nat-
ural numbers, since elementary considerations about the degree of confirmation
accord with our intuitive judgments about the superiority and inferiority of the
evidence in these alternative inferences.
2.5 Conclusions and Directions for Future Research
This essay has been a preliminary defense of an empiricism according to which
our knowledge of arithmetic is of a piece with the knowledge by which we infer from
the past to the future or from the observed to the unobserved. It is a preliminary
defense in that I have not adduced positive arguments for the types of empiricism
which I consider here– namely inceptive and amplificatory empiricism– but rather
have defended the tenability of these forms of empiricism against various challenges
and objections. One overarching feature of this defense has been to argue that
if one of these challenges tells decisively against empiricism, then by parity of
reasoning it tells decisively against confirmation as a source of justification in the
setting of the physical sciences (e.g. the discussion of reliability in § 2.3.1) or
against deduction as a source of justification in the setting of mathematics (e.g.
the discussion of complete computable extensions in § 2.2.2). Similarly, I have tried
to explain why issues apparently peculiar to the setting of arithmetic, such as the
discussion of alternative inferences centered around non-standard numbers in § 2.4,
can in fact be treated by recourse to purely probabilistic considerations pertaining
to degree of confirmation. Another distinctive feature of the defense presented here
has been the precise delineation of the aretaic notion of stability (e.g. § 2.3.2) and
the examination of whether this is preferable (or even attainable) in the type of
reasoning which occurs under the aegis of inceptive and amplificatory empiricism.
The primary task for future work on these forms of empiricism lies in develop-
ing and articulating an account of the sources of arithmetical probabilities. For,
these forms of empiricism reduced knowledge of the Peano axioms to knowledge
of confirmation, and such confirmation is contingent upon being able to ascertain
when the evidence in conjunction with the background knowledge has non-zero
probability strictly less than the probability of the background knowledge (or in
the case of confirmation tout court, it is contingent upon being able to ascertain
when the evidence is assigned a non-zero probability strictly less than one). I
view this as a difficult challenge because this essay has in effect ruled out various
potential sources of justification. For instance, the discussion in §§ 2.2.1-2.2.2
rules out appealing to ω-additivity or to the computability of the ambient proba-
bility assignment. Further, it is not obvious that ideas which are used to ascertain
probabilities in the setting of the physical sciences– such as de Finetti’s notion of
exchangeability (cf. [21] § 8, [119])– can be likewise used in the setting of arith-
metic. Hence, it seems that what is required to complete this task is some new
idea about the source of arithmetical probabilities, or at least the means by which
these probabilities can be ascertained.
A secondary task for future work on inceptive and amplificatory empiricism
lies in carefully delineating the type of arithmetical reasoning which features in
the Peano axioms from the type of arithmetical reasoning which features in the
addition and subtraction of probabilities. For, were one unable to do this, then
it would seem hopeless to try to ground our knowledge of the Peano axioms on our
knowledge of the probability axioms P1-P3, since the latter may very well im-
plicate or presuppose the former. Of course, Tarski’s work on the decidability
of the theory of real closed fields is clearly relevant here. This work tells us
that the natural numbers, with which inceptive and amplificatory empiricism are
concerned, cannot be defined in the real numbers, with which probability assign-
ments are concerned. Further, this work provides us with a complete decidable
axiomatization of the real numbers, which suggests the idea that our knowledge of
real numbers figuring in probability assignments can be taken to be based purely
on this axiomatization, and not on an underlying conception of natural number
(cf. Marker [107] § 3.3 pp. 93 ff). However, this is at best a start to a response,
since it does not address the chief difficulty, namely, of spelling out in some more
precise manner what it means for one proposition or axiomatization to presuppose
or implicate another. This is connected to the widely recognized resistance of the
concept of circularity to conceptual analysis. For instance, it is highly
challenging to provide any conceptual analysis of circularity on
which some but not all valid arguments are regarded as circular (cf. [79] p. 26,
[132]). Hence, subsequent to some progress on the conceptual analysis of circular-
ity, the task for inceptive and amplificatory empiricism would be to see if Tarski’s
work can support the claim that knowledge of the arithmetic of the real numbers
does not presuppose knowledge of the arithmetic of the natural numbers.
These are the primary tasks which inceptive and amplificatory empiricism
must fulfill if they are to be ultimately endorsed. However, the task of this essay
has been more humble in character: this essay has merely sought to
defend inceptive and amplificatory empiricism against some particularly pressing
objections. In particular, I have argued that countable additivity and the non-
computability of probability assignments are not genuine barriers to our access
to probability in the setting of arithmetic (§§ 2.2.1-2.2.2). Likewise, I have ar-
gued that the arithmetical instance confirmation on which inceptive empiricism
relies is not vitiated by considerations of unreliability, insufficient diversity, or
instability (§§ 2.3.1-2.3.2). Finally, I have argued that elementary considerations
pertaining to the degree of confirmation can explain various intuitions about the
quality of alternative confirming inferences centered around non-standard num-
bers (§ 2.4). These objections are by no means the only objections which one
could mount against inceptive and amplificatory empiricism, but in my view they
are the objections which are the most pressing, precisely because they concern
various apparent difficulties which emerge when one begins to take seriously the
idea that just as enumerative induction can be justified by recourse to informed
judgments of probability, so too can mathematical induction and the other Peano
axioms.
2.6 Notes
46 The theory Robinson’s Q consists of the following eight axioms:
(Q1) s(x) ≠ 0
(Q2) s(x) = s(y) → x = y
(Q3) x ≠ 0 → ∃ w x = s(w)
(Q4) x + 0 = x
(Q5) x + s(y) = s(x + y)
(Q6) x · 0 = 0
(Q7) x · s(y) = x · y + x
(Q8) x ≤ y ↔ ∃ z x + z = y.
The mathematical induction schema is the following schema, where ϕ(x) ranges over formulas in the language (and which may contain additional free variables):
(Iϕ) [ϕ(0) & ∀ n ϕ(n) → ϕ(n + 1)] → [∀ n ϕ(n)]
The system of first-order Peano arithmetic consists of the axioms of Robinson’s Q and the mathematical induction schema Iϕ, and this is the system studied in e.g. Hájek and Pudlák [59]. This is to be distinguished from the system of second-order Peano arithmetic, as studied in e.g. Simpson [138], wherein the mathematical induction schema Iϕ is replaced by both the mathematical induction axiom
(MI) ∀ F [F(0) & ∀ n F(n) → F(n + 1)] → [∀ n F(n)]
and the comprehension schema, where ϕ(x) is a formula (which may contain additional second-order quantifiers and which may contain additional free variables):
(Cϕ) ∃ F ∀ n (ϕ(n) ↔ Fn)
It is easy to see that second-order Peano arithmetic is equivalent to the system consisting of Robinson’s Q, the comprehension schema Cϕ, and the mathematical induction schema Iϕ, where ϕ is allowed to contain second-order quantifiers. Hence, since the mathematical induction principle can be represented by the mathematical induction schema Iϕ in both first- and second-order Peano arithmetic, in this chapter I shall focus on the mathematical induction schema, and examine types of justification which can be imparted on instances of this schema by considerations of probability. However, one drawback of this approach is that nothing will be said here about the status of the comprehension schema Cϕ, and it is not obvious that its existential claims can be justified by recourse to judgements of probability in the same way that the axioms of Robinson’s Q and the mathematical induction schema can be so justified. Hence, one who was sympathetic to the conclusions of this chapter but who was nonetheless committed to second-order Peano arithmetic as opposed to first-order Peano arithmetic would have to provide a supplemental justification of the comprehension schema Cϕ. This of course is a non-trivial task, since it is known that the comprehension schema is not proof-theoretically innocuous: for instance, second-order Peano arithmetic proves the consistency of first-order Peano arithmetic. Finally, it bears mentioning that the issue of the epistemic warrant for the comprehension schema Cϕ is in principle separable from issues surrounding the purported logicality of second-order logic. For, there are semantics for second-order logic in which all the ordinary theorems of first-order predicate logic hold, i.e. the so-called Henkin semantics (cf. Shapiro [135] Chapter 4 or Enderton [32] Chapter 4). But even for one who accepts the Henkin semantics, and hence for whom second-order quantifiers are no less logical in character than first-order quantifiers, there still remains the question of the epistemic warrant of the comprehension schema.
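The following sketch is an illustration rather than part of the argument. It checks finitely many instances of the eight axioms of Robinson’s Q listed in this endnote, over the numbers below an arbitrary cutoff, reading s(x) as x + 1 on Python’s built-in integers. The names `s`, `BOUND`, and `q_instances_hold` are hypothetical, and the existential quantifiers in (Q3) and (Q8) are replaced by bounded searches, so a passing check is at most instance-level evidence of the kind discussed in this chapter and not a proof of the axioms.

```python
BOUND = 30  # assumed cutoff for the illustration; any small bound works


def s(x):
    """Successor, read here (by assumption) as x + 1 on Python integers."""
    return x + 1


def q_instances_hold(bound=BOUND):
    """Return True if every checked instance of (Q1)-(Q8) holds below `bound`."""
    nums = range(bound)
    for x in nums:
        if s(x) == 0:                                     # (Q1)
            return False
        if x != 0 and not any(x == s(w) for w in nums):   # (Q3), bounded witness
            return False
        if x + 0 != x:                                    # (Q4)
            return False
        if x * 0 != 0:                                    # (Q6)
            return False
        for y in nums:
            if s(x) == s(y) and x != y:                   # (Q2)
                return False
            if x + s(y) != s(x + y):                      # (Q5)
                return False
            if x * s(y) != x * y + x:                     # (Q7)
                return False
            # (Q8): x <= y iff some z has x + z = y; a witness z is at most y
            if (x <= y) != any(x + z == y for z in range(y + 1)):
                return False
    return True
```

Under these assumptions `q_instances_hold()` returns `True`; raising the cutoff adds more instances without changing the character of the evidence.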
47 I examine and discuss this issue at length in Chapter 1.
48 For instance, in the chapter “Epistemology and Reference” of his book on structuralism, Shapiro suggests that “pattern recognition [...] can lead to knowledge of small infinite structures, such as the natural-number structure and perhaps the continuum” ([136] p. 112). Summarizing his account, Shapiro says: “To briefly reiterate, then, we first contemplate the finite structures as objects in their own right. Then we form a system that consists of the collection of these finite structures with an appropriate order. Finally, we discuss the structure of this system” ([136] p. 118). Hence, it seems that Shapiro is suggesting that our knowledge of natural number is based on our knowledge of the class or system of all finite structures. However, this class will obey various axioms which are similar to the Peano axioms. For instance, the class of finite structures satisfies the following inductive principle, which for the sake of disambiguation can be called structure induction: all finite structures have a given property if the zero element structure has the property and if whenever a given structure has the property, then so do all structures which contain exactly one more element than this given structure. Presumably the idea behind Shapiro’s account is that our knowledge of mathematical induction is based on our knowledge of structure induction. But then it can be asked how one knows structure induction. One might respond to this objection by suggesting that all that is required for the epistemology of arithmetic is an account of the psychological mechanism which facilitates or otherwise underlies cognition of natural numbers. But while an account of this mechanism would no doubt be invaluable, MacBride’s remark in his discussion of Shapiro also seems apposite here: “It is also necessary to undertake the distinctively normative project of coming to an understanding of our justification for holding the mathematical beliefs we do, the justification which in favourable cases distinguishes mathematical knowledge from mere true belief” ([103] p. 159).
49 The background to this is Gödel’s brief remark that “[...] the law of complete induction [...] I perceive to be true on the basis of my understanding (that is, perception) of the concept of integer” ([55] p. 320). Recently, an elaboration and defense of Gödel’s remark has been offered by Leitgeb [98] § 3. Leitgeb’s main idea is that the natural number structure can be mentally represented as a “fixed point” of a certain operation on graphs: “[...] our agent has some sort of mental representation available which represents the natural number structure as being a fixed point under this mental remove-the-initial-node operation” ([98] p. 278). Leitgeb argues that one can thus “see” that the Peano axioms are true of this structure: “Intuitions might give us direct or indirect evidence for the satisfiability of concepts and thus support existence axioms [...]: e.g., while it would be hard to ‘see’ that [second-order Peano arithmetic] is satisfiable just by consulting the conceptual structure of that concept, this is no longer so once we gain intuitive access to the natural number structure as sketched above” ([98] p. 279).
50 At one point, Leitgeb himself asks: “So the really interesting question at this point is [¶] How is this Anschauung der Begriffe achieved?” ([98] p. 281). However, what I have tried to urge in these brief remarks is that an additional question also needs to be asked, namely: “Why should this Anschauung der Begriffe be regarded as a source of justification, given that it is so manifestly different from our normal modes of perception?” It is no doubt obvious, but is worth explicitly noting, that a similar question could be asked about the types of arithmetical probability which I consider here. Indeed, part of what I seek to do in this chapter is to explain why, even though arithmetical probability is different in certain ways from ordinary notions of probability, it nevertheless can be viewed as a source of justification.
51 Kästner wrote several essays on the philosophy of mathematics for Eberhard’s anti-Kant journal Philosophisches Magazin (cf. [9] p. 219), and the essay in which Kästner expresses his skepticism about the connection between mathematical induction and axiomhood is entitled “On Geometrical Axioms” ([82]). Here is a selection from the essay, with the key remark on the axiomatic status of mathematical induction appearing at the outset of section 20:
15) If induction means to observe something in individual cases and to form from this a general proposition, then one has to be able to run through all the cases, and show that what one claims occurs in each case. I did this in my Geometry (p. 4) in that I showed that two triangles are similar when the sides are similar, and on p. 21 I showed that the angle on the perimeter of the triangle is half as large as that in the midpoint.
16) This induction in fact says nothing new, but rather only combines in a general proposition what a collection of particular propositions said, and so is also only reliable in so far as each of the particular propositions is true.
Mercury, Venus, the Earth, Mars, Jupiter, and Saturn, and their moons each get their light from the sun. The induction: all planets have their light from the sun, is only secure when one knows that the individual proposition is true of each planet which one is acquainted with or which one can be acquainted with. If the planet which Herschel discovered has its own light, as Hell believes, then the induction is not permitted.
17) Another type of induction is as follows: various cases are derived from one another, and one shows that what occurs in one case must also occur in the case which immediately succeeds it; and so it is enough to show of the first case, the second case, the third case, in short, of some of the first cases, what one claims of all of them, perhaps continuing without end.
18) I learned this method first from Hausen. It is used in Proposition 23 of his Elementary Arithmetic in series of figurative numbers; Hausen’s method is, so far as I know, based on Jacob Bernoulli’s method (Iacobi Bernoullii Ars Conjectandi Bas. 1713. Pas II cap. 3 pag. 87). This method is very useful to show of laws which one has perceived through experience in several cases that they are general. So I have used this method frequently in my Introduction to Analysis, and it is also used in applied mathematics, for example by Hebel Introduction to Political Science.
19) Since one here proceeds from the n-th case to the (n + 1)-st case, a good friend of mine in Leipzig, M. Orchliz, called it in jest: the (n + 1) method.
But one can give it a more noble name. In a genealogy, when someone is gentrified, so all his descendants are, in accordance with law, gentrified. So it is therefore the ancestral method, and draws on such men as Nepern, l’Hospitals, Tschirnhausen, and many others who know something of n and n + 1.
20) But none of these types of induction is the way through which one comes to mathematical axioms. This way is that of abstraction. Two sticks laid across one another are for the understanding a picture in which it recognizes that a pair of straight lines can only intersect each other once. This perception is based on the capacity of the understanding to abstract, to think something with the two sticks, which it would likewise think by timbers, as by strings strung straight across each other, or by drawing lines himself.
This capacity of the understanding, called common reason by Leibniz on p. 416, makes it that one implicitly recognizes axioms, as Leibniz says, whether or not they are abstractly expressed when they are so recognized. The axiom, as Leibniz says, is the embodiment of the exemplar ([82] pp. 426-428).
Of course, here Kästner refers to Leibniz’s well-known remark in his New Essays that axioms are “known implicitly, so to speak, though not at first in an abstract and isolated way. The instances derive their truth from the embodied axiom, and the axiom is not grounded in the instances” ([97] p. 449). This remark is in effect Leibniz’s way of avoiding the following dilemma posed by Locke in the Essay: “[...] which is known first and clearest by most people, the particular instance, or the general rule; and which is it that gives life and birth to the other” ([101] IV.xii.3).
Kästner’s discussion was not unknown in the 19th Century. For instance, Fries suggests that the difference between enumerative induction and mathematical induction is that only the latter involves a “secure overview” of all the relevant cases:
Induction is a kind of inference which is entirely appropriate to mathematics, so long as it proceeds from a mathematical division which grants us a secure overview of all cases falling under the rule. But this overview will not always be directly verifiable, but often in many ways indirectly verifiable. For example, the latter is the case with Bernoulli’s induction, which one uses in analysis to such a great effect. I mean here the inference, which by an overview of a series of perhaps infinitely many cases, proves a law for one case and then shows that when it is valid for any of these cases, so it must be valid for the case immediately following, whereby the proof then infers immediately to all cases ([46] pp. 46-47).
Likewise, Trendelenburg records his disagreement with Kästner, stating without further explanation that “The apriori sciences don’t recognize any genuine induction; when they employ induction, they intertwine with it a deduction, a synthetic procedure” ([146] vol. 2 p. 283).
52 Reid’s remarks from Essays on the Intellectual Powers of Man occur in a discussion of Wallis:
The field of demonstration, as has been observed, is necessary truth; the field of probable reasoning is contingent truth, not what necessarily must be at all times, but what is, or was, or shall be.
No contingent truth is capable of strict demonstration; but necessary truths may sometimes have probable evidence.
Dr. Wallis discovered many important mathematical truths, by that kind of induction which draws a general conclusion from particular premises. This is not strict demonstration, but, in some cases, gives as full conviction as demonstration itself; and a man may be certain, that a truth is demonstrable before it ever has been demonstrated. In other cases, a mathematical proposition may have such probable evidence from induction or analogy, as encourages the Mathematician to investigate its demonstration. But still the reasoning proper to mathematical and other necessary truths, is demonstration; and that which is proper to contingent truths, is probable reasoning ([126] VII.ii.1).
The results of Wallis to which Reid refers are found in Wallis’ The Arithmetic of Infinitesimals, where Wallis finds polynomial expressions for the sums $S_m(n) = 1^m + 2^m + \cdots + (n-1)^m$ by the method which Wallis calls “induction.” For instance, Wallis showed that $\frac{S_2(n)}{n(n-1)^2} = \frac{1}{3} + \frac{1}{6(n-1)}$ by writing out both sides of the equation for small values of n and saying that “the investigation may be done by the method of induction” ([152] p. 26 Proposition 19). It is interesting to note that today these results are most readily proved by mathematical induction (cf. Graham et al. [57] § 6.5), although other methods of proof are also now known (cf. Ireland and Rosen [75] § 15.1). Wallis’ methods were controversial in his own day, and were criticized by both Hobbes and Fermat (cf. [68] pp. 45-46, [39] pp. 27-28). For a representative sample of Wallis’ responses to Hobbes and Fermat, see Wallis’ Due Correction for Mr. Hobbes ([150] p. 42) and Chapter 78 of Wallis’ Treatise of Algebra ([151] pp. 298 ff).
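Wallis’ procedure of writing out both sides for small values of n can be mimicked mechanically. The sketch below, whose function names are hypothetical, uses exact rational arithmetic to compare S_2(n) / (n (n − 1)^2) with 1/3 + 1/(6 (n − 1)) for n from 2 up to an arbitrary limit, where S_m(n) = 1^m + 2^m + ... + (n − 1)^m. Agreement on these cases is instance evidence of the kind at issue in this chapter and does not by itself establish the identity for all n.

```python
from fractions import Fraction


def S(m, n):
    """S_m(n) = 1^m + 2^m + ... + (n - 1)^m."""
    return sum(k ** m for k in range(1, n))


def wallis_sides(n):
    """Both sides of S_2(n) / (n (n-1)^2) = 1/3 + 1/(6 (n-1)), for n >= 2."""
    lhs = Fraction(S(2, n), n * (n - 1) ** 2)
    rhs = Fraction(1, 3) + Fraction(1, 6 * (n - 1))
    return lhs, rhs


def check_up_to(limit):
    """True if the two sides agree exactly for every n with 2 <= n <= limit."""
    return all(lhs == rhs for lhs, rhs in map(wallis_sides, range(2, limit + 1)))
```

For example, `wallis_sides(3)` gives 5/12 on both sides, since S_2(3) = 1 + 4 = 5 and 3 · 2^2 = 12.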
53 The quotation is from Rips and Asmuth [130] p. 205. Rips and Asmuth conclude that empirical induction “plays a mediating role in reminding us of a property of the natural number system but provides no independent justification” ([130] p. 254). My only criticism of this conclusion would be that Rips and Asmuth do not operate with a sufficiently broad conception of empirical induction, and so do not consider the possible relevance of considerations of probability. It should also be mentioned that, in other work, Rips and Asmuth develop a positive proposal concerning the psychology of the number concept, saying: “[...] our top-down approach suggests that these principles [the Peano axioms] (or logically equivalent ones) are acquired as such – that is, as generalizations – rather than being induced from facts about physical objects” ([131] p. 638). If this is understood purely as a psychological hypothesis, then I have nothing to say for or against it. However, if the suggestion is that an innate “intuition” of number can provide us with arithmetical knowledge, then it seems that many of the same criticisms discussed in endnote 50 would apply equally well here.
54 Randomized algorithms are algorithms which appeal at some point in their implementation to a probabilistic process like the tossing of a coin, so that the probability of the correctness of the algorithm can be calculated explicitly relative to assumptions about e.g. the fairness of the coin (cf. [114]). The recent debate about the propriety of these algorithms stems from Fallis ([33, 34]), and in the course of their contribution to this debate, both Gaifman and Easwaran have suggested that one could consider the more general project of assigning probabilities to arithmetical sentences based on less objective conceptions of probability. For instance, Easwaran says that “Most mathematicians are quite convinced that [Goldbach’s conjecture] is true, because no counterexamples have been found among the first several million integers” and adds: “While there may be good reason to consider these sorts of arguments in a more general study of probabilistic proofs, I will not focus on them here” ([31] p. 347). Likewise, while Gaifman focuses on the probabilities which emerge from randomized algorithms, he suggests that in principle there is nothing preventing one from considering less objective probabilities in the case of mathematics: “The methodology of eliciting probabilities by considering bets applies in the case of mathematics as it applies in general. Not that I find the methodology unproblematic, but its problems have little to do with the distinction between the types of statements [...] Here, again, there should be no distinction between the empirical and the purely deductive. In principle, there can be experts who specialize in certain types of combinatorial problems, just as there are experts that provide probabilities for finding oil” ([48] pp. 107-108).
55 See Howson and Urbach [74] pp. 119 ff, and Earman [30] pp. 63 ff. However, for the sake of completeness, I include here a proof of both of these elementary facts. Suppose that $h \& k \models e$ and $0 < P(e \& k) < P(k)$. Then the hypotheses $0 < P(e \& k)$ and $0 < P(k)$ imply that the conditional probabilities $P(h \mid e \& k) = \frac{P(h \& e \& k)}{P(e \& k)}$ and $P(h \mid k) = \frac{P(h \& k)}{P(k)}$ are defined. Likewise, note that the hypothesis $h \& k \models e$ implies $P(h \& e \& k) = P(h \& k)$, so that
\begin{align*}
P(h \mid e \& k) > P(h \mid k) &\iff \frac{P(h \& e \& k)}{P(e \& k)} > \frac{P(h \& k)}{P(k)} \\
&\iff \frac{P(h \& k)}{P(e \& k)} > \frac{P(h \& k)}{P(k)} \\
&\iff \frac{1}{P(e \& k)} > \frac{1}{P(k)} \\
&\iff P(k) > P(e \& k)
\end{align*}
The case of confirmation tout court follows directly from these considerations by taking the background knowledge k to consist of a tautology.
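The equivalence in this endnote can be checked on a toy example. The sketch below is only an illustration under stated assumptions: the eight possible worlds are truth-value triples (h, e, k), the weights are hypothetical, the world in which h and k hold but e fails receives weight 0 so that h & k entails e, and P(h & k) is taken to be positive, which the cancellation of P(h & k) in the chain of equivalences requires.

```python
from fractions import Fraction


def prob(dist, pred):
    """Probability of the set of worlds (h, e, k) satisfying `pred`."""
    return sum(p for world, p in dist.items() if pred(*world))


def posterior_and_prior(dist):
    """Return (P(h | e & k), P(h | k)) for a distribution over worlds."""
    p_k = prob(dist, lambda h, e, k: k)
    p_ek = prob(dist, lambda h, e, k: e and k)
    p_hk = prob(dist, lambda h, e, k: h and k)
    p_hek = prob(dist, lambda h, e, k: h and e and k)
    return p_hek / p_ek, p_hk / p_k


# Hypothetical weights over worlds (h, e, k); (True, False, True) gets 0
# so that h & k entails e, and 0 < P(e & k) < P(k) holds.
weights = {
    (True, True, True): 2, (True, True, False): 1,
    (True, False, True): 0, (True, False, False): 1,
    (False, True, True): 1, (False, True, False): 1,
    (False, False, True): 3, (False, False, False): 1,
}
total = sum(weights.values())
dist = {world: Fraction(w, total) for world, w in weights.items()}
posterior, prior = posterior_and_prior(dist)
```

Here P(k) = 6/10 and P(e & k) = 3/10, so the hypothesis 0 < P(e & k) < P(k) is met, and the computed values are P(h | e & k) = 2/3 and P(h | k) = 1/3, in line with the equivalence.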
56 For the sake of succinctness, in the main body of the text I stated amplificatory empiricism as follows:
Amplificatory empiricism contends that one is justified in inferring from the antecedent of an instance of mathematical induction to its consequent, relative to the background knowledge consisting of the conjunction of the eight axioms of Robinson’s Q, because the consequent is confirmed by the antecedent relative to this background knowledge.
Precisely due to its succinctness, there are potentially several ambiguities in such a formulation, which I seek to quickly dispel in this endnote. In particular, formulated as such, amplificatory empiricism instantiates the following schema:
One is justified in inferring from p to q relative to background knowledge k because p, q, k stand in relation R.
I take it as obvious that for such a schema to be interesting, it needs to be understood as an abbreviation for the following schema:
One is justified in inferring from p & k to q if one is justified in believing p & k and if one is justified in believing that p, q, k stand in relation R. Further, there is a preponderance of examples of p, q such that p, q, k stand in relation R. Moreover, for every normal agent operating in normal circumstances there are several examples of p, q such that the agent is justified in believing that p, q, k stand in relation R. Finally, normal agents operating in normal circumstances are justified in believing k.
Since such a formulation is so overly verbose, I will avoid it in the main body of the text.
57 I have stated amplificatory empiricism as a thesis about justification as opposed to knowledge. This is because knowledge is typically assumed to be true, whereas part of the idea here is to articulate epistemic relationships of an agent to propositions which the agent regards as being not true but merely probable. For instance, my idea is that the agent who successfully employs amplificatory empiricism to justify a belief in the consequent of an instance of mathematical induction could simultaneously deem the probability of this consequent to be somewhere between 75% and 85%.
58 Remarks similar to those made in endnotes 56-57 about amplificatory empiricism can of course be made here in regard to inceptive empiricism.
59 One sense in which two propositions could be independent of one another is for the proposition that expresses that they each materially imply one another to be false. However, this is clearly not the appropriate sense of independence for this setting, since in this sense two true propositions could not be independent of one another. There are at least two other associated notions of independence which are more germane to the type of propositions which are entertained and discussed in the philosophical literature. On the one hand, one could say that two propositions are independent of one another if there is some objection or problem which pertained to the one but which did not obviously pertain to the other. On the other hand, one could say that two propositions are independent of one another if one but not the other could be rationally endorsed by one and the same person at one and the same time.
60 For instance, in Chapter 3 Theorem 22, it is shown that the axioms of Robinson’s Q, together with the full comprehension schema, interpret the Peano axioms. Hence, by appealing to some version of the Logicist Template described in Chapter 1, one could infer to the Peano axioms from the axioms of Robinson’s Q, which themselves might be justified by inceptive empiricism.
61 Standard references for $L_{\omega_1\omega}$ include Keisler [85] and Nadel [116]. In particular, for the $L_{\omega_1\omega}$-completeness theorem, see Keisler [85] Theorem 3 p. 16 and Nadel [116] Theorem 3.2.1 p. 280. However, in spite of the $L_{\omega_1\omega}$-completeness theorem, the most straightforward version of compactness for $L_{\omega_1\omega}$-sentences is false, since there is an $L_{\omega_1\omega}$-theory which does not have a model but such that every finite subtheory has a model. There is of course a more attenuated version of compactness for $L_{\omega_1\omega}$-sentences known as Barwise compactness (cf. Keisler [85] Theorem 11 p. 45, Nadel [116] Theorem 5.6.1 p. 295).
62 For the sake of completeness, I include here a proof of this fact, which for the sake of convenience I restate as follows:
Proposition 1. Suppose that $L$ is the signature of Peano arithmetic and that $\varphi_Q$ is the conjunction of the eight axioms of Robinson’s Q (cf. endnote 46). Suppose that $0 < \varepsilon < \frac{1}{2}$. Suppose that $P$ is an $\omega$-additive probability assignment such that $P(\varphi_Q) > 1 - \varepsilon$. Then $(\omega, 0, s, +, \times) \models \varphi$ if and only if $P(\varphi) > 1 - \varepsilon$.
First note that it suffices to prove the left-to-right direction. For suppose that we knew the left-to-right direction, i.e. we knew that $(\omega, 0, s, +, \times) \models \varphi$ implies $P(\varphi) > 1 - \varepsilon$. To prove the right-to-left direction, suppose for the sake of contradiction that we are given a sentence $\varphi$ such that $P(\varphi) > 1 - \varepsilon$ and $(\omega, 0, s, +, \times) \models \neg\varphi$. Then by the left-to-right direction, it follows that $P(\neg\varphi) > 1 - \varepsilon$ and from P1-P3 it follows that $1 - P(\varphi) > 1 - \varepsilon$ and hence $P(\varphi) < \varepsilon < \frac{1}{2} < 1 - \varepsilon < P(\varphi)$, which is a contradiction.
Hence it suffices to prove the left-to-right direction. Note that it suffices to show that $(\omega, 0, s, +, \times) \models \varphi$ implies $P(\varphi \,\&\, \varphi_Q) = P(\varphi_Q)$. For, it would then follow from P1-P3 and our hypothesis that $P(\varphi) \geq P(\varphi \,\&\, \varphi_Q) = P(\varphi_Q) > 1 - \varepsilon$. Hence, we now show by induction on the complexity of $\varphi$ that $(\omega, 0, s, +, \times) \models \varphi$ implies $P(\varphi \,\&\, \varphi_Q) = P(\varphi_Q)$. (i) First we appeal to a well-known fact about Robinson’s Q, namely that it proves all true $\Sigma^0_1$-sentences: if $\varphi$ is $\Sigma^0_1$ then $(\omega, 0, s, +, \times) \models \varphi$ implies that $\varphi$ is provable from Robinson’s Q. This fact is sometimes called the $\Sigma^0_1$-completeness of Robinson’s Q (cf. Hajek and Pudlak [59] Theorem I.1.8 pp. 30-31). Now, by the $\Sigma^0_1$-completeness of Robinson’s Q and by P1-P3, it follows that if $\varphi$ is $\Sigma^0_1$ and $(\omega, 0, s, +, \times) \models \varphi$ then $P(\varphi \,\&\, \varphi_Q) = P(\varphi_Q)$. (ii) Second, if $\varphi(x)$ is $\Delta^0_0$ and $(\omega, 0, s, +, \times) \models \forall x\, \varphi(x)$, then by $\omega$-additivity and (i) it follows that
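\[
\begin{aligned}
P([\forall x\, \varphi(x)] \,\&\, \varphi_Q) &= P(\forall x\, [\varphi(x) \,\&\, \varphi_Q]) \\
&= \lim_n P\Big(\bigwedge_{i=0}^{n} [\varphi(s^i(0)) \,\&\, \varphi_Q]\Big) \\
&= \lim_n P\Big(\Big[\bigwedge_{i=0}^{n} \varphi(s^i(0))\Big] \,\&\, \varphi_Q\Big) \\
&= P(\varphi_Q)
\end{aligned}
\tag{2.3}
\]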
(iii) Suppose that the statement is true for all $\Sigma^0_n$ and $\Pi^0_n$-formulas for $n \geq 1$. Suppose that $\varphi(x)$ is $\Sigma^0_n$ or $\Pi^0_n$. Suppose that $(\omega, 0, s, +, \times) \models \exists x\, \varphi(x)$. Then $(\omega, 0, s, +, \times) \models \varphi(s^m(0))$ for some $m \in \omega$. Then by the induction hypothesis and $\omega$-additivity it follows that
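\[
\begin{aligned}
P([\exists x\, \varphi(x)] \,\&\, \varphi_Q) &= P(\exists x\, [\varphi(x) \,\&\, \varphi_Q]) \\
&= \lim_n P\Big(\bigvee_{i=0}^{n} [\varphi(s^i(0)) \,\&\, \varphi_Q]\Big) \\
&\geq P(\varphi(s^m(0)) \,\&\, \varphi_Q) \\
&= P(\varphi_Q)
\end{aligned}
\tag{2.4}
\]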
But it then follows from P1-P3 that $P([\exists x\, \varphi(x)] \,\&\, \varphi_Q) \leq P(\varphi_Q)$, so that in fact we have $P([\exists x\, \varphi(x)] \,\&\, \varphi_Q) = P(\varphi_Q)$. Alternatively, suppose that $(\omega, 0, s, +, \times) \models \forall x\, \varphi(x)$. Then by the induction hypothesis and $\omega$-additivity it follows that
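\[
\begin{aligned}
P([\forall x\, \varphi(x)] \,\&\, \varphi_Q) &= P(\forall x\, [\varphi(x) \,\&\, \varphi_Q]) \\
&= \lim_n P\Big(\bigwedge_{i=0}^{n} [\varphi(s^i(0)) \,\&\, \varphi_Q]\Big) \\
&= \lim_n P\Big(\Big[\bigwedge_{i=0}^{n} \varphi(s^i(0))\Big] \,\&\, \varphi_Q\Big) \\
&= P(\varphi_Q)
\end{aligned}
\tag{2.5}
\]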
Hence, the result is now proven.
63 Again for the sake of completeness, I include here a proof of this fact, which for the sake of convenience I restate as follows:
Proposition 2. Suppose that $L$ is the signature of Peano arithmetic and that $\varphi_Q$ is the conjunction of the eight axioms of Robinson’s Q (cf. endnote 46). Suppose that $\varphi$ is an $L$-sentence such that there is a model $M$ such that $M \models \varphi_Q \,\&\, \neg\varphi$. Then there is an $\omega_1$-additive probability assignment $P$ on $L_{\omega_1\omega}$-sentences such that $P(\varphi_Q) = 1$ and $P(\varphi) = 0$.
The proof of this fact is relatively simple. For, given an $L_{\omega_1\omega}$-sentence $\psi$, let $P(\psi) = 1$ if $M \models \psi$ and let $P(\psi) = 0$ if $M \models \neg\psi$. Then $P$ is an $\omega_1$-additive probability assignment: for, the semantics of $L_{\omega_1\omega}$-sentences are defined so that if $M \models \bigwedge_{i=1}^{n} \varphi_i$ for all $n > 0$, then $M \models \bigwedge_n \varphi_n$. Further, by construction, it follows that $P(\varphi_Q) = 1$ and $P(\varphi) = 0$.

Note that by choosing any $\varphi$ such that $(\omega, 0, s, +, \times) \models \varphi$ and such that $\varphi$ is not provable from Robinson’s Q, one can obtain the following result, which explicitly contrasts with Proposition 1 proved in the preceding endnote:
Proposition 3. Suppose that $L$ is the signature of Peano arithmetic and that $\varphi_Q$ is the conjunction of the eight axioms of Robinson’s Q (cf. endnote 46). Then there is an $\omega_1$-additive probability assignment $P$ on $L_{\omega_1\omega}$-sentences such that for any $0 < \varepsilon < \tfrac{1}{2}$ it is not the case that $(\omega, 0, s, +, \times) \models \psi$ if and only if $P(\psi) > 1 - \varepsilon$ for all $L_{\omega_1\omega}$-sentences $\psi$.
64 Sometimes this ordinary language expression of the Dutch Book Theorem is given as the official statement of the theorem. For instance, Howson and Urbach describe the theorem as follows: “if the [ϕi] do not satisfy the probability axioms, then there is a betting strategy and a set of stakes [si] such that whoever follows this betting strategy will lose a finite sum whatever the truth-values of the hypotheses turn out to be” ([74] p. 79). Likewise, Earman says: “[. . . ] Dutch book, a finite series of bets such that no matter what happens, your net is negative (a violation of what is called coherence for degrees of belief). The Dutch-book theorem shows that if any one of the axioms [(P1)-(P3)] is violated, then Dutch book can be made” ([74] p. 39). These sorts of statements have the advantage of making manifestly transparent the primary philosophical application of the theorem, namely, as facilitating an inference from invulnerability to a Dutch book to the satisfaction of the probability axioms. However, they have the disadvantage of rendering opaque the precise manner in which the modal notions implicit in the concept of invulnerability are characterized in the theorem (“whatever the truth-values of the hypotheses turn out to be,” “no matter what happens”). To the best of my knowledge, the Dutch Book Theorem can only be proven if the implicit modal notion of necessity is characterized in terms of the non-existence of a complete consistent theory (or an ersatz thereof, such as a model in the language of the theory).
65 For instance, this argument may be found in Williamson [156] pp. 411-412. However, one persistent feature of the literature on countable additivity is that while the formal calculations are done with what I have called $\omega_1$-additivity, it is typically presumed that this type of countable additivity provides one with a way in which to assign probabilities to universal statements, a role which in my terminology is exclusively the province of $\omega$-additivity. Hence, for this reason and for the sake of completeness, I include here the proof of the $\omega_1$-additive version of the Dutch Book Theorem. This theorem follows readily from the following fact:
Proposition 4. Suppose that $P$ is a probability assignment on $L_{\omega_1\omega}$-sentences. Then $P$ is an $\omega_1$-additive probability assignment if and only if for every sequence of $L_{\omega_1\omega}$-sentences $\varphi_1, \ldots, \varphi_n, \ldots$ such that $\models \neg(\varphi_i \,\&\, \varphi_j)$ for $i \neq j$ and $\models \bigvee_n \varphi_n$, it is the case that $\sum_n P(\varphi_n) = 1$.
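Here is the proof of this fact. First suppose that $P$ is an $\omega_1$-additive probability assignment. Then $\models \bigvee_n \varphi_n$ implies that $P(\bigvee_n \varphi_n) = 1$. Then by $\omega_1$-additivity and P1-P3, it follows that

\[
1 = P\Big(\bigvee_n \varphi_n\Big) = \lim_n P\Big(\bigvee_{i=1}^{n} \varphi_i\Big) = \lim_n \sum_{i=1}^{n} P(\varphi_i) = \sum_n P(\varphi_n).
\]

Now we assume that $P$ has this feature and we attempt to show that it also satisfies $\omega_1$-additivity. Given an arbitrary sequence of $L_{\omega_1\omega}$-sentences $\varphi_1, \ldots, \varphi_n, \ldots$, let $\psi_0 = \neg\bigvee_n \varphi_n$ and let $\psi_n = \varphi_n \,\&\, \bigwedge_{i=1}^{n-1} \neg\varphi_i$. Then $\models \bigvee_n \psi_n$ and $\models \neg(\psi_i \,\&\, \psi_j)$ for $i \neq j$. Then $P(\neg\bigvee_n \varphi_n) + P(\bigvee_n \varphi_n) = 1 = \sum_n P(\psi_n)$, and of course $\sum_n P(\psi_n) = P(\neg\bigvee_n \varphi_n) + \sum_n P(\varphi_n \,\&\, \bigwedge_{i=1}^{n-1} \neg\varphi_i)$, so that one finally has

\[
P\Big(\bigvee_n \varphi_n\Big) = \sum_n P\Big(\varphi_n \,\&\, \bigwedge_{i=1}^{n-1} \neg\varphi_i\Big) = \lim_n P\Big(\bigvee_{i=1}^{n} \varphi_i\Big).
\]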
Now, for ease of reference, recall that the version of the Dutch Book Theorem currently under consideration reads as follows:
Theorem 5. Dutch Book Theorem, $\omega_1$-additive Version: Suppose that $P$ is a function from $L_{\omega_1\omega}$-sentences to real numbers. Then $P$ is an $\omega_1$-additive $L_{\omega_1\omega}$-probability assignment if for every infinite sequence of real numbers $s_n$ and every infinite sequence of $L_{\omega_1\omega}$-sentences $\varphi_n$ such that the series $\sum_n s_n P(\varphi_n)$ is absolutely convergent, there is a complete consistent $L_{\omega_1\omega}$-theory $T$ such that $\sum_n s_n (T(\varphi_n) - P(\varphi_n)) \geq 0$.
It is now easy to see how this theorem follows from the fact which we just proved, and here we follow Williamson [156] pp. 411-412. For, by the standard version of the Dutch Book Theorem, it follows that $P$ is a probability assignment. So suppose that $P$ is not $\omega_1$-additive. Then by the above fact, there is a sequence of $L_{\omega_1\omega}$-sentences $\varphi_1, \ldots, \varphi_n, \ldots$ such that $\models \neg(\varphi_i \,\&\, \varphi_j)$ for $i \neq j$ and $\models \bigvee_n \varphi_n$ and $\sum_n P(\varphi_n) < 1$. Choose $s_n = -1$, so that $\sum_n |s_n P(\varphi_n)| = \sum_n P(\varphi_n) < 1$. Let $T$ be a complete consistent $L_{\omega_1\omega}$-theory. Then $\sum_n s_n (T(\varphi_n) - P(\varphi_n)) = -1 + \sum_n P(\varphi_n) < 0$, which contradicts our assumption on $P$.
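The arithmetic of this last step can be checked on a finite analogue. The sketch below assumes three mutually exclusive and jointly exhaustive sentences whose hypothetical prices sum to 3/4; the proof itself concerns infinite sequences, and the names `net_payoff`, `PRICES`, and `STAKES` are illustrative only. Each complete consistent theory makes exactly one of the sentences true, and with every stake equal to −1 the net comes out negative in each case.

```python
from fractions import Fraction


def net_payoff(stakes, truth_values, prices):
    """Sum of s_n * (T(phi_n) - P(phi_n)) over finitely many bets."""
    return sum(
        (s * (t - p) for s, t, p in zip(stakes, truth_values, prices)),
        Fraction(0),
    )


# Hypothetical prices for three exclusive, exhaustive sentences; they sum
# to 3/4 < 1, mimicking the failure of additivity assumed in the proof.
PRICES = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 4)]
STAKES = [Fraction(-1)] * 3  # the choice s_n = -1 from the proof

# One payoff per "complete consistent theory", i.e. per choice of which
# single sentence is true.
PAYOFFS = []
for true_index in range(3):
    truth = [Fraction(1) if i == true_index else Fraction(0) for i in range(3)]
    PAYOFFS.append(net_payoff(STAKES, truth, PRICES))
```

Every entry of `PAYOFFS` equals $-1 + \sum_n P(\varphi_n)$, which here is $-\tfrac{1}{4}$.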
66 It is important because one is also often interested in proving the converse to the Dutch Book Theorems, and in doing so, it will be necessary to include such a convergence criterion. Technically, the convergence criterion is not necessary for the Dutch Book Theorem itself: one can remove it and the theorem will still hold.
67 More formally, here we are appealing to the $\Sigma^0_1$-completeness of Robinson’s Q, which was defined in endnote 62.
68 Given the prominence of Dutch Book arguments in the literature on probability, there are naturally many such concerns and objections. For instance, one objection is that the inference from rationality to invulnerability presupposes that the value attributed to currency is additive, so that the entire inference from invulnerability to the satisfaction of probability axioms (such as the additive axiom P3) displays a subtle kind of circularity: one obtains additivity with respect to probability only because one surreptitiously presupposes additivity in the setting of value (cf. Armendt [2]). Another objection is based on the observation that the type of invulnerability in question in the theorem is an invulnerability to net loss with respect to a finite number of bets. This objection then alternatively suggests that rationality only requires invulnerability to loss with respect to individual bets taken one-by-one (cf. Maher [104] § 4.6 pp. 94 ff).
69 By computable, I am here adverting to the notion of Turing computation and relative Turing computation. To put it very roughly, one set of natural numbers X is Turing computable from another set of natural numbers Y if there is a fixed program which, given any input n and allowed access to arbitrarily large initial segments of Y, can determine if n is in X. (For more details, see the definition of X ≤T Y in Soare [139] § III.1). Hence, if neither X nor Y is a subset of the natural numbers, then the formal notions of Turing computability and relative Turing computability are simply not applicable. However, if X and Y are countable, then by definition there are injective maps f : X → ω and g : Y → ω and hence the notions of Turing computability and relative Turing computability apply to f(X) and g(Y). If the maps f : X → ω and g : Y → ω are somehow suitably natural or canonical, then it would be commonplace in working mathematics to extend the notions of Turing computation and relative Turing computation to X and Y themselves. For instance, if X and Y are sets of rational numbers, and rational numbers are defined as certain sorts of equivalence classes of natural numbers, then the functions f and g might simply pick out representatives from these equivalence classes. It is this sort of example that I have in mind when I say that the predicates of Turing computation and relative Turing computation can apply by proxy to countable objects. But it is by no means trivial to say something more precise about the conditions under which these predicates can be extended to countable objects, precisely because it is difficult to say something more precise about the sense in which the maps f : X → ω and g : Y → ω may be said to be natural or canonical.
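The rough description of relative computability given in this endnote can be made concrete with a very small sketch. Everything here is an assumption made for illustration: the set Y is represented by a membership oracle passed in as a function, the set X is the hypothetical set of all n such that 2n or 2n + 1 lies in Y, and the names `decide_x` and `multiples_of_three` are not from the text. The point is only that one fixed program answers questions about X by making finitely many queries about Y.

```python
from typing import Callable

# An oracle answers membership questions about a set Y of natural numbers.
Oracle = Callable[[int], bool]


def decide_x(n: int, oracle_y: Oracle) -> bool:
    """Decide whether n is in X = {n : 2n in Y or 2n + 1 in Y}.

    The program is fixed in advance and consults only finitely many
    values of the oracle for any given input.
    """
    return oracle_y(2 * n) or oracle_y(2 * n + 1)


def multiples_of_three(m: int) -> bool:
    """Stand-in oracle; any set of naturals could be supplied instead."""
    return m % 3 == 0


X_PREFIX = [n for n in range(6) if decide_x(n, multiples_of_three)]
```

The stand-in oracle is itself computable, so it illustrates only the shape of the definition; the definition of X ≤T Y places no such restriction on Y.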
70 There are several ways to characterize the real algebraic numbers. On the one hand, they can be defined as the field of all those real numbers which are roots of polynomial equations with rational-valued coefficients. For instance, using this definition, one can produce concrete examples and non-examples of real algebraic numbers: for instance, $\sqrt{2}$ is real algebraic, while $e$ and $\pi$ are not real algebraic. On the other hand, it follows from work of Tarski that the real algebraic numbers are the smallest field which is elementarily equivalent to the real numbers, in that every first-order sentence which is true of the real algebraic numbers is likewise true of the real numbers, and vice-versa (cf. Marker [107] § 3.3). This definition has the advantage that it tells one that the real algebraic numbers satisfy various laws which are partially characteristic of the real numbers: for instance, it tells one that the intermediate value theorem is true for definable continuous functions. In discussing computability and non-computability of probability assignments with values in the real algebraic numbers $(K, 0, 1, +, \times, \leq)$, it shall be supposed that the real algebraic numbers are identified with an isomorphic copy $(M, 0, 1, \oplus, \otimes, \preceq)$ where $M$ is a computable subset of the natural numbers and where $\oplus$ and $\otimes$ and $\preceq$ are computable functions. That such a computable copy exists can be seen either by directly effectivizing the normal proof of the existence of real closures (cf. Simpson [138] Theorem II.9.7 p. 98) or by using the Effective Completeness Theorem (cf. Harizanov [61] Theorem 4.1 p. 18). The application of the Effective Completeness Theorem presupposes Tarski’s proof that the complete theory of the real numbers (and hence the real algebraic numbers) is decidable (cf. Marker [107] Corollary 3.3.16 p. 96). Tarski’s proof has been previously employed in the probabilistic setting: for instance, Fitelson uses it to prove that there is a computable procedure for determining whether there are probability assignments on a finite number of propositional letters which satisfy various probabilistic constraints (cf. [42] p. 114).
71 A sequence of rationals $q_n$ is called a Cauchy sequence if the members of the sequence eventually get arbitrarily close to one another, in the following sense: for every $K > 0$, there is $N > 0$ such that $|q_n - q_m| < 2^{-K}$ for all $n, m \geq N$. A sequence of rationals $q_n$ is called a quickly-converging Cauchy sequence if the rate at which they get close to each other is fixed in advance, in the following sense: for every $K > 0$ it is the case that $|q_K - q_{K+n}| < 2^{-K}$ for all $n \geq 0$ (cf. Simpson [138] Definition II.4.4 p. 74). Two Cauchy sequences $q_n$ and $q'_n$ are said to be equivalent if the absolute value of their difference eventually approaches zero, in the following sense: for every $K > 0$ there is $N > 0$ such that $|q_n - q'_n| < 2^{-K}$ for all $n \geq N$. Likewise, two quickly-converging Cauchy sequences $q_n$ and $q'_n$ are said to be equivalent if the absolute value of their difference approaches zero at a fixed rate, in the following sense: $|q_n - q'_n| \leq 2^{-n+1}$ for all $n \geq 0$ (cf. Simpson [138] Definition II.4.4 p. 74). Just as one can prove that the set of all equivalence classes of Cauchy sequences is a field that is isomorphic to the real numbers, so one can prove that the set of all equivalence classes of quickly-converging Cauchy sequences is a field which is isomorphic to the real numbers (cf. Simpson [138] Theorem II.4.5 p. 76).
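The difference between the two definitions can be seen on finite prefixes. The following sketch is an illustration under assumptions: `sqrt2_approx` is a hypothetical bisection approximation to $\sqrt{2}$, `slow` is a hypothetical sequence that converges to 0 too slowly, and `quickly_converging_on_prefix` checks the defining inequality only for indices below a chosen bound, which of course cannot establish the property for the whole sequence.

```python
from fractions import Fraction


def sqrt2_approx(k: int) -> Fraction:
    """Lower endpoint after k + 1 bisection steps for sqrt(2) on [1, 2].

    The enclosing interval then has width 2^-(k+1), so this endpoint is
    within 2^-(k+1) of sqrt(2).
    """
    lo, hi = Fraction(1), Fraction(2)
    for _ in range(k + 1):
        mid = (lo + hi) / 2
        if mid * mid <= 2:
            lo = mid
        else:
            hi = mid
    return lo


def slow(k: int) -> Fraction:
    """A Cauchy sequence (limit 0) that is not quickly converging."""
    return Fraction(1, k + 1)


def quickly_converging_on_prefix(q, length: int) -> bool:
    """Check |q_K - q_{K+n}| < 2^-K for all K > 0, n >= 0 with K + n < length."""
    return all(
        abs(q(K) - q(K + n)) < Fraction(1, 2 ** K)
        for K in range(1, length)
        for n in range(0, length - K)
    )
```

For example, `quickly_converging_on_prefix(sqrt2_approx, 12)` holds, while `quickly_converging_on_prefix(slow, 12)` fails already at K = 2, since $|\tfrac{1}{3} - \tfrac{1}{12}| = 2^{-2}$.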
72 The precise details of this step of the computation actually vary depending on which of the two aforementioned means of representation of real numbers we choose to employ. If we use the representation as real algebraic numbers described in endnote 70, then this is trivial, since the relation $0 \prec P(\varphi)$ is directly computable from $P$ and the computable structure $(M, 0, 1, \oplus, \otimes, \preceq)$ mentioned in that endnote. If we use the representation of quickly-converging Cauchy sequences described in endnote 71, then we appeal to the fact that the relation $P(\varphi) > 0$ is computably enumerable in $P$ or $\Sigma^{0,P}_1$, in the following sense: there is a $P$-computable predicate $R$ such that $P(\varphi) > 0$ if and only if $\exists n\, R(\varphi, n)$ (cf. Simpson [138] p. 76, Soare [139] § I.4 pp. 18 ff). Hence, if we antecedently know the disjunction $P(\varphi) > 0 \vee P(\psi) > 0$, then we can begin to step through the natural numbers, $P$-computably testing whether $R(\varphi, n)$ or $R(\psi, n)$ along the way, knowing that there will eventually be some $n$ such that we compute $R(\varphi, n)$ or we compute $R(\psi, n)$.
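The search described in this endnote can be sketched as follows. The sketch rests on an assumption about the predicate $R$, which the text does not specify: here $R(x, n)$ is taken to be the test $q_n > 2^{-n}$ on the $n$-th term of a quickly-converging Cauchy sequence for $x$, which suffices because the limit lies within $2^{-n}$ of $q_n$ for $n > 0$. The names `witnesses_positive`, `find_positive`, `ZERO`, and `ONE_TENTH` are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Tuple

# A real number presented as a quickly-converging Cauchy sequence n -> q_n.
Real = Callable[[int], Fraction]


def witnesses_positive(x: Real, n: int) -> bool:
    """Assumed form of R(x, n): q_n > 2^-n, which forces the limit to be > 0."""
    return x(n) > Fraction(1, 2 ** n)


def find_positive(x: Real, y: Real) -> Tuple[str, int]:
    """Step through n = 1, 2, ... until R(x, n) or R(y, n) is found.

    The loop terminates only if at least one of the two limits is
    positive, which is why the disjunction must be known in advance.
    """
    n = 1
    while True:
        if witnesses_positive(x, n):
            return ("x", n)
        if witnesses_positive(y, n):
            return ("y", n)
        n += 1


def ZERO(n: int) -> Fraction:
    """Constant sequence with limit 0."""
    return Fraction(0)


def ONE_TENTH(n: int) -> Fraction:
    """Constant sequence with limit 1/10."""
    return Fraction(1, 10)


RESULT = find_positive(ZERO, ONE_TENTH)
```

With these inputs the search stops at n = 4, the first index at which $\tfrac{1}{10} > 2^{-n}$. If both limits were 0 the loop would run forever, which reflects the fact that positivity is computably enumerable rather than computable in this representation.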
73 For one expression of the idea that outside of the halting set there is a lack of natural examples of non-computable computably enumerable sets, see Simpson [137] p. 287.
74 To the best of my knowledge, there is no extant literature on the relationship between the epistemology of arithmetical principles (like the Peano axioms) and the epistemology of computation. There is an obvious sense in which both are sources of arithmetical justification, and there is a certain sense in which each source implicates the other. For instance, computations are typically shown to be correct by virtue of induction and others of the Peano axioms. Hence, the question “How do you know that algorithm $e$ computes antecedently specified function $f$?” always seems to be answered by recourse to induction and the Peano axioms, in that one formally proves by recourse to these axioms that $\forall n\, \varphi_e(n) = f(n)$. Likewise, the axioms of Robinson’s Q define addition and multiplication in terms of their recursive defining equations, and if these axioms were written out in an entirely relational language (i.e. with no function symbols), then they would include axioms saying that addition and multiplication are total functions. Hence, the Peano axioms as described in the functional language of endnote 46 do not provide an answer to the question: “How do you know that addition and multiplication are total functions?” Outside of the inceptive empiricism defined in § 2.1 and discussed further in § 2.3, it seems that one possible answer to this question would be that one has some primitive “algorithmic knowledge” of these facts. Of course, any satisfactory version of this answer would have to say something more definitive about the nature of algorithmic knowledge and how it ultimately differs from knowledge of arithmetical axioms.
75 For instance, just to point to one recent source, Joyce has suggested that the judgements of probability that feature in justification might be better represented by a family of probability assignments than a single probability assignment (cf. [81] p. 171). This suggestion comes as a response to the objection that it is unrealistic to suppose that degrees of belief or assent can be represented in terms of a single probability assignment.
76 The arithmetic hierarchy and its extension, the projective hierarchy, pervade contemporary mathematical logic and occur in both computability-theoretic and set-theoretic settings. For instance, see Rogers [133] Chapters 14-16, Odifreddi [118] Chapters IV.1-2, Jech [80] Chapters 11, 25, 32-33, and Moschovakis [113], especially Chapter 3.
77 Baker explicitly casts the discussion in these comparative terms. For instance, he takes himself to be answering the following question: “(C) Is the use of induction in mathematics more or less rationally justified than its use in the empirical case?” ([5] p. 64). Further, he explicitly assumes for the sake of argument that “we do have good rational grounds for trusting inductive inference in the empirical case” ([5] p. 64). Hence, Baker’s idea is that there is something peculiar to arithmetical instance confirmation that renders it less trustworthy than physical instance confirmation, and hence in what follows I shall focus on examining this facet of Baker’s essay. However, it should be mentioned that there is much in Baker’s essay which is independent of this point. For instance, Baker’s essay includes an examination of what various mathematicians have said about the status of arithmetical instance confirmation. Baker calls this the “descriptive question” (cf. [5] § 3 pp. 61-63), which he distinguishes from the “normative question” of “Is enumerative induction in mathematics rationally justified” (cf. [5] § 5 pp. 65-68). Baker’s answer to the normative question is “no,” and his primary reason for this is what I have called Baker’s thesis, namely, that arithmetical instance confirmation is biased in a way in which physical instance confirmation is not because the samplings in arithmetical instance confirmation are small. This is the thesis that I reconstruct from the argumentation adduced on pp. 67-68 of Baker’s essay, although it should be mentioned that Baker never explicitly states the thesis in this particular manner. However, for textual evidence for this reconstruction, see the following endnote (endnote 78).
78 The relevant passage in Baker’s essay is the following:
The problem, in the case of GC [the Goldbach Conjecture] and in all other cases of induction in mathematics, is that the sample we are looking at is biased. [. . . ] [¶] Definition: a positive integer, n, is minute just in case n is within the range of numbers we can (given our actual physical and mental capabilities) write down using ordinary decimal notation, including (non-iterated exponentiation). [¶] Verified instances of GC to date are not just small, they are minute. [. . . ] [¶] Hence the sample of positive instances of GC is biased, and unavoidably so. Imagine, for example, that mathematicians had only looked at even numbers divisible by 4 when checking GC, or only (even) square numbers. Presumably such evidence would carry less weight since the range of instances is comparatively unvaried ([5] pp. 67-68).
It seems that there are two dimensions of bias which can be seen in the above paragraph. On the one hand, the example of verifying the Goldbach Conjecture on the even numbers divisible by four suggests that the relevant dimension of bias is that of unreliability, since if this procedure were in general followed, then one would end up confirming a large number of falsehoods on the basis of an examination of a large number of truths: for instance, if one’s samples were drawn exclusively from the even numbers, then one could confirm the obviously false statement that all numbers are even numbers. On the other hand, Baker does speak of “comparatively unvaried” samplings at the close of the above quotation. However, this is the only place where he mentions this notion, and he does not explicitly say that what he intends by “bias” is such lack of variation. Indeed, one of the difficulties of interpreting this key section of Baker’s essay is that he does not explicitly say what he intends by “bias.” It seems that the best that can be said on the basis of the texts at hand is that unreliability and insufficient diversity are two dimensions of bias which can be found in Baker’s text.
79 One might also have the intuition that “being a small natural number” is a vague term, and that if n is a small natural number, then n + 1 is a small natural number. In conjunction with the natural assumption that zero is a small natural number, such an additional supposition would result in the predicate of “being a small natural number” constituting a violation of mathematical induction. Nothing which I will say depends on this additional supposition, and in fact one minor point that I discuss further in the subsequent endnote depends on the negation of this additional supposition. Amongst those who have discussed this matter, the consensus view appears to be that such a violation of mathematical induction is merely apparent and that any satisfactory theory of vagueness must ultimately explain why this violation is merely apparent. See Williamson [157] p. 42 fn 15, and the references therein. However, as Williamson there notes, at least one author has instead endorsed the contrary view that mathematical induction must be suitably modified and restricted.
80 For instance, if the small natural numbers are exclusively the numbers 0, 1, . . . , N, then the set of all small natural numbers is the set {0, 1, . . . , N}, which has cardinality N + 1, which is not a small natural number. However, it is worth pointing out that one might potentially object to the supposition that there is a “greatest” small natural number N. In particular, suppose that (as discussed in the previous endnote) one suspects that “being a small natural number” is vague and violates mathematical induction. Then one might be sympathetic to the thought that typically the way that one proves the existence of greatest elements of finite sets is by noting that their complements have least elements, and that the least number principle serves many of the same roles as mathematical induction, and indeed across a suitable base theory will be provably equivalent to mathematical induction.
81 As discussed two paragraphs previously, all the pointwise-small sets are setwise-small, with the exception of the set of all small natural numbers itself. Let us call pointwise-small sets which are distinct from the set of all small natural numbers proper pointwise-small sets. Then it follows that all proper pointwise-small sets are setwise-small. Consider then the following two claims:
(i) If the samplings in arithmetical instance confirmation are proper pointwise-small sets, then this sampling is biased.
(ii) If the samplings in arithmetical instance confirmation are proper setwise-small sets, then this sampling is biased.
From the fact that all proper pointwise-small sets are setwise-small, it follows trivially that (ii) implies (i), but one cannot infer in the same manner that (i) implies (ii), since there are many setwise-small sets which are not pointwise-small. This is an admittedly more precise albeit excessively more belabored version of the point which I was seeking to make in the main body of the text, which I expressed there in the manner in which I did merely for the sake of not having to introduce the notion of proper pointwise-small sets into the main body of the text.
82 See the following endnote for the relevant quotation.
83 For instance, see Baker’s remark about “good rational grounds” quoted in endnote 77.
84 This quotation comes from the following key paragraph:
A defender of induction in mathematics might respond that matters are no worse than in the empirical case. There are many distinctive features that are common to all observed emeralds, ravens, electrons, and so on; for example, they have all been observed before the present, and they are all within the past light cone of the Earth. So why not argue, on analogous grounds, that empirical induction is biased? The disanalogy, as already mentioned, is that the position of a number in the ordering of integers often does make a difference to its mathematical properties. There are no corresponding systematic differences between past and future or between inside and outside the Earth’s light cone. Indeed, insofar as there are any general theoretical principles they tend to concern the spatial and temporal invariance – other things being equal – of fundamental physical properties. Of course there is still room for a purely sceptical worry concerning induction in the empirical case, but it seems to lack the specific motivation for worry which afflicts induction in mathematics ([5] p. 68).
85 This analysis was articulated by Howson and Urbach, who say: “This idea of the similarity between items of evidence is expressed naturally in probabilistic terms by saying that e1 and e2 are similar provided P (e2|e1) is higher than P (e2); and one might add that the more the first probability exceeds the second, the greater the similarity” ([74] p. 160). In evaluating this version, it is thus helpful to distinguish between three quantities which may potentially gauge the degree of similarity, namely:
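\[
\begin{aligned}
s_1(e_1, e_2) &\equiv \frac{P(e_1 \,\&\, e_2)}{P(e_1) \cdot P(e_2)} = \frac{P(e_2 \mid e_1)}{P(e_2)} \\
s_2(e_1, e_2) &\equiv \frac{P(e_1 \,\&\, e_2)}{P(e_1)} - P(e_2) = P(e_2 \mid e_1) - P(e_2) \\
s_3(e_1, e_2) &\equiv P(e_1 \,\&\, e_2) - P(e_1) P(e_2)
\end{aligned}
\]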
On the basis of the above quotation from Howson and Urbach, who talk of one probability “exceeding” a second, it might seem that $s_2$ is the intended measure of similarity. However, as is natural, Howson and Urbach also seek to articulate a notion of similarity that is symmetric in that “if e2 is (dis)similar to e1, then e1 is (dis)similar to e2” ([74] p. 160). However, it is easy to see that $s_2$ is not symmetric in this sense, while $s_1$ and $s_3$ are symmetric in this sense. In his discussion of Howson and Urbach on this matter, Wayne employs $s_1$ and says “For Howson and Urbach, degree of diversity simply is degree of probabilistic independence” ([153] p. 113). However, I do not see any obvious reason to prefer $s_1$ to $s_3$, and hence in the main body of the text, I express Howson and Urbach’s analysis disjunctively, as gauging degree of similarity either in terms of the quotient from $s_1$ or the difference from $s_3$.
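The symmetry claims in this endnote are easy to check numerically. The sketch below assumes a hypothetical three-point probability space in which items of evidence are modelled as sets of outcomes; the names `prob`, `WEIGHTS`, `E1`, and `E2` are introduced for illustration and are not taken from Howson and Urbach or from Wayne.

```python
from fractions import Fraction


def prob(event, weights):
    """Probability of an event as the exact sum of its outcome weights."""
    return sum((weights[w] for w in event), Fraction(0))


def s1(e1, e2, weights):
    """Quotient measure: P(e1 & e2) / (P(e1) * P(e2))."""
    return prob(e1 & e2, weights) / (prob(e1, weights) * prob(e2, weights))


def s2(e1, e2, weights):
    """Difference of conditional and prior: P(e2 | e1) - P(e2)."""
    return prob(e1 & e2, weights) / prob(e1, weights) - prob(e2, weights)


def s3(e1, e2, weights):
    """Difference measure: P(e1 & e2) - P(e1) * P(e2)."""
    return prob(e1 & e2, weights) - prob(e1, weights) * prob(e2, weights)


# Hypothetical toy space (an assumption made for illustration).
WEIGHTS = {1: Fraction(1, 2), 2: Fraction(1, 4), 3: Fraction(1, 4)}
E1 = frozenset({1, 2})  # P(E1) = 3/4
E2 = frozenset({2, 3})  # P(E2) = 1/2
```

On this example `s2(E1, E2, WEIGHTS)` is $-\tfrac{1}{6}$ whereas `s2(E2, E1, WEIGHTS)` is $-\tfrac{1}{4}$, while `s1` and `s3` return the same value whichever argument comes first.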
86 More formally, one needs to invoke the $\Sigma^0_1$-completeness of Robinson’s Q here, which was defined and employed in endnote 62.
87 There is admittedly something quite unsatisfactory about this example, namely that since the evidence is by construction given a high prior probability, this evidence will only be able to confirm the universal hypothesis to a low degree, at least if degree of confirmation of a hypothesis $h$ by evidence $e$ is measured by the quantity $P(h \mid e) - P(h)$. Hence, one might naturally try to seek out an example where (i) the samplings are pointwise-small, where (ii) the evidence is close to being probabilistically independent, and where (iii) the evidence confirms the hypothesis to a high degree. For, Baker naturally could suggest that his thesis was only intended to cover arithmetical instance confirmation in which the evidence confirms the hypothesis to a high degree, in which case the example described in the main body of the text would not vitiate the thesis.
88 This analysis is due to Horwich, who says: “[. . . ] I want to suggest that evidence is significantly diverse to the extent that its likelihood is low, relative to many of the most plausible competing hypotheses” ([73] p. 118). Horwich is concerned to prove that it follows from his analysis that diverse evidence confirms to a higher degree than less diverse evidence, and in the course of his proof of this fact, he requires that the hypotheses are “mutually exclusive and it is known that one of them is true” ([73] p. 118). Further, it is evident from this proof that one can neglect hypotheses which do not have a substantial prior probability, and hence Horwich notes that he needs only insist upon low likelihood with respect to those hypotheses which do have a substantial prior probability (cf. Horwich’s discussion of the example of curve-fitting on [73] p. 120).
89 I am speculating somewhat on whether Horwich would associate his notion of diversity with some notion of complexity, and so this should merely be regarded as my attempt to motivate the connection between diversity of evidence and low-likelihood. He is explicit that his notion of diversity of evidence is dependent on some antecedent specification of the pool of competing hypotheses, saying: “[. . . ] I deny that a data set can be evaluated with respect to significant diversity unless this is done in relation to a particular class of alternative hypotheses and prior assessment of the plausibility of those hypotheses” ([73] pp. 121-122).
90 For instance, Howson and Urbach employ such counterfactual language, saying: “[. . . ] we can gloss your conditional degree in a given b to be what you believe the fair betting-rate on a would be relative to the same information stock augmented by the additional information consisting of the statement that b is true” ([74] p. 82).
91 For instance, imagine that one has recently discovered a proof of $\neg h$, but that this proof does not generate or provide an explicit counterexample. Then it seems that one would have no reason to suspect that the counterexamples were small rather than large, and hence no reason to treat $e_X$ and $e_Y$ differently.
92 This analysis is due to Glymour, and the full quotation is the following:
The only means available for guarding against such errors is to have a variety of evidence, so that as many hypotheses as possible are tested in as many different ways as possible. What makes one way of testing relevantly different from another is that the hypotheses used in one computation are different from the hypotheses used in the other computation ([54] p. 140, cf. [52] pp. 419-420, [53] p. 234).
93 There is a large literature on alternative measures of degree of confirmation, and in his discussion of these, Fitelson notes that absent a compelling case that one of these measures corresponds better to our pre-theoretic notions about the quality of evidence, any argument that did not appeal to features particular to some but not others of these measures would obviously be preferable to one which did so appeal (cf. Fitelson [41] p. S364). Hence, in this chapter, I shall adopt the procedure of employing the quantity $P(h \mid e \,\&\, k) - P(h \mid k)$ as the degree of confirmation in the main body of the text, but then discuss in the endnotes the extent to which the arguments given in the main body of the text also hold for alternative measures of the degree of confirmation. For these purposes, it will be helpful to here enumerate some of these alternative measures of confirmation and note some of their elementary properties. In particular, following Fitelson [41] p. S363, consider the following four measures of the degree of confirmation of a hypothesis $h$ by evidence $e$ relative to background knowledge $k$:
d(h|e; k) = P (h|e & k)− P (h|k)
r(h|e; k) = ln[P (h|e & k)P (h|k) ]
`(h|e; k) = ln[ P (e|h & k)P (e|¬h & k)
]
s(h|e; k) = P (k) · P (e & k) · d(h|e; k)
In some of what follows, various elementary properties of these measures ofthe degree of confirmation will be appealed to, which for the sake of complete-ness are stated and proven here. The motivation for the subscripts in what
140
follows is that our primary application is to Figure 2.1, where the g standsfor “genuine” and the p stands for “pseudo.” Hence, when reading the proofof parts C2-C3 of the following proposition, it is helpful to keep in mind themnemonic “genuine implies pseudo.”
Proposition 6. Suppose that $c(h|e;k)$ is one of $d(h|e;k)$, $r(h|e;k)$, $\ell(h|e;k)$ or $s(h|e;k)$. Then
(C1) If $h \& k \models e_g, e_p$ and $c(h|e_g;k), c(h|e_p;k) > 0$, then
(a) $c(h|e_g;k) > c(h|e_p;k)$ if and only if $P(e_p \& k) > P(e_g \& k)$,
(b) $c(h|e_g;k) \geq c(h|e_p;k)$ if and only if $P(e_p \& k) \geq P(e_g \& k)$, and
(c) $c(h|e_g;k) = c(h|e_p;k)$ if and only if $P(e_p \& k) = P(e_g \& k)$.
(C2) If $h \& k \models e_g$ and $e_g \& k \models e_p$ and $c(h|e_g;k), c(h|e_p;k) > 0$, then
(a) $c(h|e_g;k) \geq c(h|e_p;k)$.
(b) If $c(h|e_g;k) = c(h|e_p;k)$ then $P(e_p \& k) = P(e_g \& k)$.
(C3) If $h_g \& k \models h_p$ and $h_p \& k \models e$ and $c(h_g|e;k), c(h_p|e;k) > 0$, then
(a) $c(h_p|e;k) \geq c(h_g|e;k)$.
(b) If $c = r$ then $c(h_p|e;k) = c(h_g|e;k) = \ln\left[\frac{P(k)}{P(e \& k)}\right]$.
(c) If $c \in \{d, s, \ell\}$ and $c(h_p|e;k) = c(h_g|e;k)$ then $P(h_p \& k) = P(h_g \& k)$.
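First we establish (C1), proving only (a), since the proofs for (b)-(c) are identical. Further, in the proof, we use $e_i$ to range over $e_g, e_p$ and we use $h_i$ to range over $h_g, h_p$. Suppose that $h \& k \models e_g, e_p$. In the case of $d$, we have
\[ d(h|e_i;k) = P(h|e_i \& k) - P(h|k) = P(h \& k)\left(\frac{1}{P(e_i \& k)} - \frac{1}{P(k)}\right), \]
so that
\begin{align*}
d(h|e_g) > d(h|e_p) &\Leftrightarrow \frac{1}{P(e_g \& k)} - \frac{1}{P(k)} > \frac{1}{P(e_p \& k)} - \frac{1}{P(k)} \\
&\Leftrightarrow P(e_p \& k) > P(e_g \& k)
\end{align*}
In the case of $r$, it follows that $\frac{P(h|e_i \& k)}{P(h|k)} = \frac{P(k)}{P(e_i \& k)}$, so that
\[ r(h|e_g) > r(h|e_p) \Leftrightarrow \frac{P(k)}{P(e_g \& k)} > \frac{P(k)}{P(e_p \& k)} \Leftrightarrow P(e_p \& k) > P(e_g \& k) \]
In the case of $\ell$, it follows that $\frac{P(e_i|h \& k)}{P(e_i|\neg h \& k)} = \frac{P(\neg h \& k)}{P(e_i \& \neg h \& k)}$, so that
\begin{align*}
\ell(h|e_g) > \ell(h|e_p) &\Leftrightarrow \frac{P(\neg h \& k)}{P(e_g \& \neg h \& k)} > \frac{P(\neg h \& k)}{P(e_p \& \neg h \& k)} \\
&\Leftrightarrow P(e_p \& \neg h \& k) > P(e_g \& \neg h \& k)
\end{align*}
But note that
\begin{align*}
& P(e_p \& \neg h \& k) > P(e_g \& \neg h \& k) \\
&\Leftrightarrow P(e_p \& \neg h \& k) + P(h \& k) > P(e_g \& \neg h \& k) + P(h \& k) \\
&\Leftrightarrow P(e_p \& \neg h \& k) + P(e_p \& h \& k) > P(e_g \& \neg h \& k) + P(e_g \& h \& k) \\
&\Leftrightarrow P(e_p \& k) > P(e_g \& k)
\end{align*}
In the case of $s$, we have
\[ s(h|e_i;k) = P(k) \cdot P(e_i \& k) \cdot [P(h|e_i \& k) - P(h|k)] = P(h \& k) \cdot [P(k) - P(e_i \& k)] \]
Hence we have
\[ s(h|e_g) > s(h|e_p) \Leftrightarrow P(k) - P(e_g \& k) > P(k) - P(e_p \& k) \Leftrightarrow P(e_p \& k) > P(e_g \& k) \]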
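As an informal sanity check on (C1)(a), and not as part of the proof, the equivalence can be tested on a small finite probability space. The following Python sketch is only an illustration: the five weighted "worlds" and the helper names `P` and `measures` are hypothetical choices, made so that $h \& k \models e_g, e_p$ holds and every conditional probability is defined.

```python
from fractions import Fraction
from math import log

# Hypothetical worlds, keyed by truth values (h, k, eg, ep), with integer
# weights. Every world where h and k hold also satisfies eg and ep.
WEIGHTS = {
    (True,  True,  True,  True):  2,
    (False, True,  True,  True):  1,
    (False, True,  False, True):  3,
    (False, True,  False, False): 2,
    (False, False, False, False): 2,
}
TOTAL = sum(WEIGHTS.values())
H, K, EG, EP = 0, 1, 2, 3


def P(event):
    """Probability of an event, given as a predicate on worlds."""
    return Fraction(sum(w for world, w in WEIGHTS.items() if event(world)), TOTAL)


def measures(e):
    """Return the four measures of confirmation of h by evidence e, and P(e & k)."""
    p_k = P(lambda w: w[K])
    p_ek = P(lambda w: w[e] and w[K])
    p_hk = P(lambda w: w[H] and w[K])
    p_hek = P(lambda w: w[H] and w[e] and w[K])
    p_nhk = P(lambda w: not w[H] and w[K])
    p_enhk = P(lambda w: w[e] and not w[H] and w[K])
    d = p_hek / p_ek - p_hk / p_k
    r = log((p_hek / p_ek) / (p_hk / p_k))
    l = log((p_hek / p_hk) / (p_enhk / p_nhk))
    s = p_k * p_ek * d
    return {"d": d, "r": r, "l": l, "s": s}, p_ek


genuine, p_eg_k = measures(EG)
pseudo, p_ep_k = measures(EP)
# (C1)(a): each measure ranks eg above ep exactly when P(ep & k) > P(eg & k).
agree = all((genuine[c] > pseudo[c]) == (p_ep_k > p_eg_k) for c in "drls")
```

On this particular assignment all four measures rank the genuine evidence above the pseudo evidence, in agreement with the inequality $P(e_p \& k) > P(e_g \& k)$; a single example of course establishes nothing beyond consistency with the proposition.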
For (C2)(a), note that it follows directly from (C1)(b). For, suppose that $h \& k \models e_g$ and $e_g \& k \models e_p$, so that also $h \& k \models e_p$. Then $e_g \& k \models e_p \& k$ implies that $P(e_p \& k) \geq P(e_g \& k)$. Then by the right-to-left direction of (C1)(b), it follows that $c(h|e_g;k) \geq c(h|e_p;k)$. Likewise, (C2)(b) follows directly from (C1)(c).
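For (C3), the proofs of (b)-(c) follow more or less directly from (a), and so we will present the proof for (a) and make a few brief remarks indicating how this proof gives the proofs for (b)-(c). So suppose that $h_g \& k \models h_p$ and $h_p \& k \models e$. In the case of $d$, we have
\[ d(h_i|e;k) = P(h_i|e \& k) - P(h_i|k) = P(h_i \& k)\left(\frac{1}{P(e \& k)} - \frac{1}{P(k)}\right), \]
so that
\begin{align*}
d(h_p|e) \geq d(h_g|e) &\Leftrightarrow P(h_p \& k)\left(\frac{1}{P(e \& k)} - \frac{1}{P(k)}\right) \geq P(h_g \& k)\left(\frac{1}{P(e \& k)} - \frac{1}{P(k)}\right) \\
&\Leftrightarrow P(h_p \& k) \geq P(h_g \& k)
\end{align*}
Since $h_g \& k \models h_p$, it follows that $P(h_p \& k) \geq P(h_g \& k)$, and so we are done. In the case of $r$, it follows that $\frac{P(h_i|e \& k)}{P(h_i|k)} = \frac{P(k)}{P(e \& k)}$, so that $r(h_p|e;k) = \ln\left[\frac{P(k)}{P(e \& k)}\right] \geq \ln\left[\frac{P(k)}{P(e \& k)}\right] = r(h_g|e;k)$. In the case of $\ell$, it follows that $\frac{P(e|h_i \& k)}{P(e|\neg h_i \& k)} = \frac{P(\neg h_i \& k)}{P(e \& \neg h_i \& k)}$, so that
\begin{align*}
& \ell(h_p|e;k) \geq \ell(h_g|e;k) \tag{2.6} \\
&\Leftrightarrow \frac{P(\neg h_p \& k)}{P(e \& \neg h_p \& k)} \geq \frac{P(\neg h_g \& k)}{P(e \& \neg h_g \& k)} \\
&\Leftrightarrow \frac{P(\neg h_p \& k)}{P(e \& \neg h_p \& k)} \geq \frac{P(\neg h_g \& h_p \& k) + P(\neg h_g \& \neg h_p \& k)}{P(e \& \neg h_g \& h_p \& k) + P(e \& \neg h_g \& \neg h_p \& k)} \\
&\Leftrightarrow \frac{P(\neg h_p \& k)}{P(e \& \neg h_p \& k)} \geq \frac{P(\neg h_g \& h_p \& k) + P(\neg h_p \& k)}{P(e \& \neg h_g \& h_p \& k) + P(e \& \neg h_p \& k)} \\
&\Leftrightarrow \frac{P(\neg h_p \& k)}{P(e \& \neg h_p \& k)} \geq \frac{P(\neg h_g \& h_p \& k) + P(\neg h_p \& k)}{P(\neg h_g \& h_p \& k) + P(e \& \neg h_p \& k)} \\
&\Leftrightarrow P(\neg h_p \& k) \cdot P(\neg h_g \& h_p \& k) + P(\neg h_p \& k) \cdot P(e \& \neg h_p \& k) \\
&\qquad \geq P(e \& \neg h_p \& k) \cdot P(\neg h_g \& h_p \& k) + P(e \& \neg h_p \& k) \cdot P(\neg h_p \& k) \\
&\Leftrightarrow P(\neg h_p \& k) \cdot P(\neg h_g \& h_p \& k) \geq P(e \& \neg h_p \& k) \cdot P(\neg h_g \& h_p \& k) \tag{2.7}
\end{align*}
But we have $P(\neg h_p \& k) \geq P(e \& \neg h_p \& k)$ since $e \& \neg h_p \& k \models \neg h_p \& k$, and so we are done.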
For (C3)(c) in the case where $c = \ell$, note that the equivalence between equation (2.6) and equation (2.7) remains if the inequality is replaced by an equality throughout. Assuming that $P(\neg h_g \& h_p \& k) > 0$, one can then infer from $\ell(h_p|e;k) = \ell(h_g|e;k)$ that $P(\neg h_p \& k) = P(e \& \neg h_p \& k)$, so that
\[ \ell(h_p|e;k) = \ln\left[\frac{P(e|h_p \& k)}{P(e|\neg h_p \& k)}\right] = \ln\left[\frac{P(\neg h_p \& k)}{P(e \& \neg h_p \& k)}\right] = \ln(1) = 0, \]
which contradicts the hypothesis of C3 that $\ell(h_p|e;k) > 0$. Hence, one must rather have that $P(\neg h_g \& h_p \& k) = 0$, so that $P(h_p \& k) = P(\neg h_g \& h_p \& k) + P(h_g \& h_p \& k) = P(h_g \& h_p \& k) = P(h_g \& k)$.
In the case of $s$, we have $s(h_i|e;k) = P(k) \cdot P(e \& k) \cdot [P(h_i|e \& k) - P(h_i|k)] = P(h_i \& k) \cdot [P(k) - P(e \& k)]$, so that
\begin{align*}
s(h_p|e) \geq s(h_g|e) &\Leftrightarrow P(h_p \& k) \cdot [P(k) - P(e \& k)] \geq P(h_g \& k) \cdot [P(k) - P(e \& k)] \\
&\Leftrightarrow P(h_p \& k) \geq P(h_g \& k)
\end{align*}
Since $h_g \& k \models h_p$, it follows that $P(h_p \& k) \geq P(h_g \& k)$, and so we are done.
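The contrast between (C3)(b) and (C3)(a) can likewise be illustrated on a toy example. The Python sketch below rests on stated assumptions: the four weighted worlds are hypothetical, $k$ is stipulated to hold in every world (so that $P(k) = 1$), and the names `ratios`, `exp_r` and `exp_l` are mine. Since the logarithm is monotone, the sketch compares the arguments of the logarithms in $r$ and $\ell$ rather than the logarithms themselves, which keeps the arithmetic exact.

```python
from fractions import Fraction

# Hypothetical worlds keyed by truth values (hg, hp, e); k holds everywhere.
# The weights respect hg |= hp and hp |= e.
WEIGHTS = {
    (True,  True,  True):  1,
    (False, True,  True):  2,
    (False, False, True):  1,
    (False, False, False): 4,
}
TOTAL = sum(WEIGHTS.values())
HG, HP, E = 0, 1, 2


def P(event):
    """Probability of an event, given as a predicate on worlds."""
    return Fraction(sum(w for world, w in WEIGHTS.items() if event(world)), TOTAL)


def ratios(h):
    """Return d, s, and the arguments of the logarithms in r and l."""
    p_e = P(lambda w: w[E])
    p_h = P(lambda w: w[h])
    p_he = P(lambda w: w[h] and w[E])
    p_e_given_not_h = P(lambda w: w[E] and not w[h]) / P(lambda w: not w[h])
    d = p_he / p_e - p_h
    return {
        "d": d,
        "exp_r": (p_he / p_e) / p_h,              # r is the log of this
        "exp_l": (p_he / p_h) / p_e_given_not_h,  # l is the log of this
        "s": p_e * d,                             # P(k) = 1 by stipulation
    }


pseudo, genuine = ratios(HP), ratios(HG)
```

Here `exp_r` takes the same value, namely $P(k)/P(e \& k)$, for both hypotheses, as (C3)(b) predicts, while the other three quantities are strictly larger for $h_p$ than for $h_g$.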
94 This terminology is not entirely consonant with standard model-theoretic terminology. Instead of saying that the set D is A-indiscernible, in model theory it would rather be said that all the elements of D have the same complete type over A (cf. Marker [107] § 4.1 pp. 115 ff). Further, in model theory, saying that a set D is A-indiscernible would express a more general property, namely that any two finite sequences of elements of the same length from D have the same complete type over A (cf. Marker [107] Definition 5.2.1 p. 178). I have co-opted the appellation of indiscernibility because it allows me to avoid introducing the terminology of complete types and because it resonates with well-understood philosophical notions like the indiscernibility of identicals.
95 However, one might legitimately ask how such invariance is implicated by reasoning which is specifically geometrical in character. One response to this question is related to a point that Ken Manders has made in a series of recent essays. Manders’ idea is that diagrammatic reasoning is facilitated by the fact that inferences that are licensed by the diagram concern co-exact features, that is, features such as incidence which are invariant under perturbations of the diagram ([105] pp. 69 ff, [106] § 4.2.2 pp. 91 ff). In particular, invariance under perturbations blocks the potential objection that what one has inferred from the diagram is merely an artifact of the particular manner in which one has drawn the diagram, since invariance precisely means that this feature will persist in the face of a wide variety of alterations to the diagram. For instance, Manders quotes Felix Klein as expressing the concern that “there is real danger that a pupil of Euclid may, because of a falsely drawn figure, come to a false conclusion” (quoted on [105] p. 88). I take it that Manders is responding to such a concern when he speaks of the ‘threat of disarray’ in the following quotation:
Co-exact attributions either arise by suitable entries in the discursive text (the setting-out of a claim, the application of a prior result or a postulate, such as that licensing entry of a circle in the proof of I.1); or are licensed directly by the diagram; for example, an intersection point of the two circles in Euclid I.1. This poses no immediate threat of disarray, because co-exact attributes (again, by definition) are ‘locally invariant’ under variation of the diagram: they are shared by a range of perturbed diagrams ([106] p. 94).
So in regard to the question of how invariance relates to geometry, the suggestion here would be that geometric reasoning, insofar as it is both rigorous and diagrammatic, must be such that its primitive non-logical relations (e.g. incidence) display a high level of invariance.
96 It seems prudent to mention that there are alternative presentations of the Euclidean plane, some of which display high levels of indiscernibility and some of which display no indiscernibility. For an example of the latter, consider the real numbers $\mathbb{R}$ with the usual addition and multiplication functions and the usual ordering, and consider the Euclidean plane to be the definable set $\mathbb{R} \times \mathbb{R}$ in this structure. This is the presentation of the Euclidean plane with which one operates in standard analytic geometry, e.g. in that type of analytic geometry in which one develops and deploys such formulas as the quadratic formula. It is easy to see that any subset $D \subseteq \mathbb{R}^n$ with more than one element is not indiscernible, for the simple reason that with addition and multiplication, one can define each of the rational numbers and hence use the ordering to distinguish between different elements of the set. For instance, if $D \subseteq \mathbb{R}$ and $a, b \in D$ with $a < b$, then there is a rational $q \in \mathbb{Q}$ such that $a < q < b$, and hence b satisfies the formula $\psi(x) \equiv x > q$ while a does not. So this is why subsets of $\mathbb{R}^n$ are not indiscernible relative to the basic structure given by addition, multiplication, and the usual ordering on the real numbers $\mathbb{R}$.
For an alternative presentation of the Euclidean plane which does display high levels of indiscernibility, consider the complex numbers $\mathbb{C}$ with addition and multiplication. This is a presentation of the Euclidean plane in which points may be added, multiplied, and otherwise treated as numerical quantities (albeit without an ordering). If $A \subseteq \mathbb{C}$ is a set of parameters and $B \subseteq \mathbb{C}^n$ is an A-definable set, then the set D of A-independent points of B is an A-indiscernible set, where $c = (c_1, \ldots, c_n) \in B$ is A-independent if $c_i$ is not in an $A \cup \{c_1, \ldots, c_{i-1}, c_{i+1}, \ldots, c_n\}$-definable finite set. In the case where $A = \emptyset$ and $n = 1$ and $B = \mathbb{C}$, this just says that any two transcendental numbers are indiscernible with respect to all parameter-free first-order formulas, where a transcendental number is simply one which is not the root of any algebraic equation with rational coefficients. For instance, $\sqrt{2}$ and $i = \sqrt{-1}$ are not transcendental, while $\pi$ and $e$ are transcendental.
In this presentation of the Euclidean plane, indiscernibility follows from the fact that the complex numbers are uncountably categorical: any other structure that satisfies the same first-order sentences and which has the same cardinality as it is isomorphic to it. Indeed, any uncountably categorical structure M which is itself uncountable is similar to $\mathbb{C}$ in this respect: there is some set $C \subseteq M^k$ definable from parameters A such that if $B \subseteq C^n$ is an A-definable set, then the set D of A-independent points of B is an A-indiscernible set (cf. Marker [107] Lemma 6.1.16 p. 209). This fact emerges in contemporary proofs of Morley’s Theorem, which says that if a first-order theory has only one model of some uncountable cardinality (such as the size continuum), then it only has one model of any uncountable cardinality (cf. Marker [107] Theorem 6.1.18 pp. 212-213). Hence, what generates indiscernibility in this type of reasoning is the presence of first-order descriptions of structures of size continuum, descriptions that are unique in that any two structures of size continuum which meet this description are isomorphic.
Here one can again ask why we should think that the type of reasoning involved in uncountable categoricity is geometrical in character. Boris Zilber has suggested that what is geometrical in uncountable categoricity is that the independence relation described above gives rise to a notion of dimension, which, as in linear algebra or algebraic geometry, may be defined in terms of maximal independence. For a formal definition of dimension in this sense, see e.g. Zilber [163] Appendix B.1.1 and in particular Definition B.1.12 on p. 223, or Marker [107] § 8.1 and in particular the definition subsequent to Lemma 8.1.3 on p. 290. In speaking of Morley’s resolution of Los’s conjecture, Zilber says: “As a matter of fact the main logical problem after answering the question of J. Los was what properties of M make it κ-categorical for uncountable κ? [¶] The answer is now reasonably clear: The key factor is that we can measure definable sets by a rank-function (dimension) and the whole construction is highly homogenous” ([163] Appendix B.2.1 p. 236). Hence, in regard to the question of how reasoning about uncountably categorical structures is geometrical in character, the suggestion is that in any non-trivial uncountably categorical structure, there is a notion of dimension that behaves like the notion of dimension from canonical geometrical settings like linear algebra and algebraic geometry.
97 Assuming that h is confirmed by $e_n$, this is a direct application of Proposition 6 (C1)(a) from endnote 93, so that the analogous result will hold if the other measures of degree of confirmation discussed in that endnote are employed.
98 This of course is not to say that stability is a pragmatic notion. Obviously, stability is defined explicitly in epistemic terms. Hence, what I am suggesting is that the argument from the previous paragraph of the main body of the text shows that stable reasoning (an epistemic notion) enjoys a certain ceteris paribus pragmatic advantage.
99 Of course, the extra axiom of supplemented Robinson’s Q is only needed to establish (iii), which will be important in what follows. Clearly, to the extent that one considers analogues of amplificatory empiricism presented in terms of supplemented Robinson’s Q as opposed to Robinson’s Q itself, one will likewise need to consider analogues of inceptive empiricism that incorporate the extra axiom of supplemented Robinson’s Q.
100 As stated, this example is not entirely apposite. For the evidence $e_F$ in this example is the pseudo-antecedent associated to the property F(n) which says that n is an even number (i.e. a number equal to 2m for some m), and the hypothesis $h_F$ is the pseudo-consequent associated to F, and the background knowledge k is the conjunction of axioms of supplemented Robinson’s Q. Since in this case one has $h_F \& k \models e_F$ and $k \models e_F$, it follows that $P(h_F|e_F \& k) = P(h_F|k)$, so that the hypothesis $h_F$ is in fact not confirmed by the evidence $e_F$ relative to the background knowledge k. However, this example can be easily augmented in a way that preserves the underlying thought that the inference from the evidence to the hypothesis is unjustified. In particular, choose any sentence ψ that is not provable or disprovable from supplemented Robinson’s Q, and which one has no evidence for or against. Then consider the property G(n) which says that F(n) and ψ. Then there are probability assignments P on which the hypothesis $h_G$ is confirmed by the evidence $e_G$ relative to background knowledge k. For instance, choose structures $M_i$ for $1 \leq i \leq N$ such that n of these structures model Robinson’s Q and ψ and the other N − n of these structures model Robinson’s Q and ¬ψ for some $1 \leq n < N$. Then define $P(\varphi) = N^{-1} \cdot |\{i \in [1, N] : M_i \models \varphi\}|$, so that P is the “counting measure” on $M_1, \ldots, M_N$. It is easy to see that P satisfies P1-P3 and hence is a probability assignment. Further, it is easy to see that $P(e_G \& k) = \frac{n}{N}$ and $P(k) = 1$. Hence, since $h_G \& k \models e_G$ and $0 < P(e_G \& k) < P(k)$, one has that the pseudo-consequent $h_G$ is confirmed by the pseudo-antecedent $e_G$ relative to background knowledge k. Hence, amplificatory empiricism (or rather a contention similar to it but centered around supplemented Robinson’s Q) would claim that one is justified in inferring from $e_G$ to $h_G$, which intuitively seems problematic.
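The counting measure just described is easy to simulate. In the following Python sketch, each structure is represented only by the truth values it assigns to four sentences, which is a drastic simplification; the choice N = 5 and n = 2, the helper name `counting_measure`, and in particular the stipulation that $h_G$ holds in exactly the structures satisfying ψ are assumptions of the sketch rather than claims drawn from the text.

```python
from fractions import Fraction


def counting_measure(structures):
    """Return P with P(s1, ..., sm) = (# structures satisfying all si) / N."""
    n_total = len(structures)

    def P(*sentences):
        hits = sum(all(m[s] for s in sentences) for m in structures)
        return Fraction(hits, n_total)

    return P


# N = 5 hypothetical structures, n = 2 of which satisfy psi. All satisfy k.
# Stipulation: eG and hG hold in exactly the psi-structures.
structures = (
    [{"k": True, "psi": True, "eG": True, "hG": True}] * 2
    + [{"k": True, "psi": False, "eG": False, "hG": False}] * 3
)
P = counting_measure(structures)

# Degree of confirmation d(hG | eG; k) = P(hG | eG & k) - P(hG | k).
d = P("hG", "eG", "k") / P("eG", "k") - P("hG", "k") / P("k")
```

With these stipulations one gets $P(e_G \& k) = 2/5$, $P(k) = 1$, and a strictly positive value of d, so that $h_G$ counts as confirmed by $e_G$ in the sense at issue.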
101 Assuming that the hypotheses in inferences (a)-(c) are confirmed by the evidence in these inferences relative to the background knowledge, these two facts follow immediately from Proposition 6 (C2)-(C3) from endnote 93, so that the analogous results will hold if the other measures of degree of confirmation discussed in that endnote are employed.
102 It is also possible to explain why the inference (a) is equally justifiable as the inference (b) when the degree of confirmation in inference (a) is equal to the degree of confirmation in inference (b). For, suppose that the degree of confirmation in inference (a) is equal to the degree of confirmation in inference (b). Assuming that the hypotheses in inferences (a)-(b) are confirmed by the evidence in these inferences relative to the background knowledge, it then follows from Proposition 6 (C2)(b) that $P(e_p \& k) = P(e_g \& k)$, and then it follows from $k \& e_g \models e_p$ and standard manipulations of P1-P3 that $P(k) \leq P(e_p \leftrightarrow e_g)$. This suggests that if one were confident in k, then one should be confident that $e_p$ and $e_g$ have the same truth value. Given this parity between $e_p$ and $e_g$, this then suggests that were one to be justified in k, then one would be equally justified in inferring from $e_p$ to $h_g$ as from $e_g$ to $h_g$. Hence, this is why, when the degree of confirmation in inference (a) is equal to the degree of confirmation in inference (b), the inference (a) is equally justifiable as the inference (b).
However, it is not obvious that the inference (b) is equally justifiable as the inference (c) when the degree of confirmation in inference (b) is equal to the degree of confirmation in inference (c). For, the plausibility of this depends heavily on the measure of the degree of confirmation that one employs. Suppose that the degree of confirmation in inference (b) is equal to the degree of confirmation in inference (c), and suppose that the hypotheses in inferences (b)-(c) are confirmed by the evidence in these inferences relative to the background knowledge. If the degree of confirmation is measured by the function $d(h|e;k) = P(h|e \& k) - P(h|k)$ or $s(h|e;k) = P(k) \cdot P(e \& k) \cdot d(h|e;k)$ or $\ell(h|e;k) = \ln\left[\frac{P(e|h \& k)}{P(e|\neg h \& k)}\right]$ from endnote 93, then it follows from Proposition 6 (C3)(c) that $P(h_p \& k) = P(h_g \& k)$, so that from $h_g \& k \models h_p$ and standard manipulations of P1-P3 we can conclude that $P(k) \leq P(h_p \leftrightarrow h_g)$. Thus, as above, if one were confident in k, then one would likewise be confident that $h_p$ and $h_g$ have the same truth value, and given this parity, it seems that one would be equally justified in inferring to $h_g$ from $e_p$ as to $h_p$ from $e_p$.
However, if the degree of confirmation is measured by the function $r(h|e;k) = \ln\left[\frac{P(h|e \& k)}{P(h|k)}\right]$ from endnote 93, then as noted in Proposition 6 (C3)(b), one has $r(h_g|e_p;k) = r(h_p|e_p;k) = \ln\left[\frac{P(k)}{P(e_p \& k)}\right]$. Hence, using this measure of the degree of confirmation, we can say nothing about the difference between inference (b) and inference (c), since in each case the degree of confirmation is purely a function of $e_p$ and k. Hence, this is why it is not obvious that the inference (b) is equally justifiable as the inference (c) when the degree of confirmation in inference (b) is equal to the degree of confirmation in inference (c): for, while an argument can be made for this if degree of confirmation is measured by the function $d$, $s$ or $\ell$ from endnote 93, it is not obvious that an argument can be made for this if the degree of confirmation is measured by the function $r$ from endnote 93.
103 For instance, consider a model N of the Peano axioms. By compactness, there is a model M which satisfies all the same first-order sentences as N (and hence which is likewise a model of the Peano axioms), but in which there is an element c such that $M \models c > s^n(0)$ for all $n \in \omega$, where e.g. $s^2(0) = s(s(0))$. Then define the standard natural numbers to be the set $S = \{d \in M : \exists\, n \in \omega \;\; M \models d = s^n(0)\}$. Then (M, S) will have the three properties (1)-(3) mentioned in the main body of the text, while M itself will satisfy the Peano axioms and hence every instance of mathematical induction.
104 For instance, see Parsons’ uniqueness thesis discussed in endnote 35 of Chapter 1 or the papers of Dean and Halbach-Horsten mentioned in endnote 39 of Chapter 1.
CHAPTER 3
COMPARING PEANO ARITHMETIC, BASIC LAW V, AND HUME’S
PRINCIPLE
3.1 Introduction, Definitions, and Overview of Main Results
3.1.1 Introduction
Second-order Peano arithmetic and its subsystems have been studied for many
decades by mathematical logicians (cf. [138]), and the resulting theory contin-
ues to be the subject of current research and a source of open problems. More
recently, philosophers of mathematics have begun to study systems closely re-
lated to second-order Peano arithmetic (cf. [15]). One of these systems, namely,
Hume’s Principle, constitutes an axiomatization of cardinality which is similar to
the notion of cardinality defined in Zermelo-Fraenkel set theory. The contemporary
philosophical interest in this principle stems from Crispin Wright’s suggestion that
it can serve as the centerpiece of a revitalized version of Frege’s logicism (cf. [158],
[161], [102], and Chapter 1). Frege himself focused his logicism around a principle
called Basic Law V, which in effect codified an alternative conception of set. While
Russell’s paradox shows that Basic Law V is inconsistent with the unrestricted
comprehension schema (cf. Proposition 10), this principle has garnered renewed
attention due to Ferreira and Wehmeier’s recent proof that it is consistent with
the hyperarithmetic comprehension schema ([40], cf. [154, 155] and Remark 59).
The goal of this chapter is to apply methods from the subsystems of second-
order Peano arithmetic to the subsystems of Basic Law V and Hume’s Principle.
In particular, we use methods from hyperarithmetic theory to build models of sub-
systems of Basic Law V (§ 3.3), and we use recursively saturated models and ideas
from the model theory of fields to build models of subsystems of Hume’s Principle
and Basic Law V (§ 3.4). Our primary application of these new constructions is
to compare the interpretability strength of the subsystems of second-order Peano
arithmetic to the subsystems of Basic Law V and Hume’s Principle. For, one of
the few known ways to show that one theory is of strictly greater interpretabil-
ity strength than another theory is to show that the first proves the consistency
of the second (cf. Proposition 13). Hence, by formalizing our constructions, we
can compare the interpretability strength of subsystems of Hume’s Principle and
Basic Law V to subsystems of Peano arithmetic. Our main results about inter-
pretability are summarized in § 3.1.5 and on Figure 3.2. Prior to summarizing
these results, we first present formal definitions of the theories and subsystems of
Hume’s Principle and Basic Law V (§§ 3.1.2-3.1.4) and then describe what is and
is not known about the provability relation among these subsystems (§ 3.1.4 and
Figure 3.1).
3.1.2 Definition of Signatures and Theories of PA2, BL2 and HP2
The signature of PA2 is a many-sorted signature, with sorts for numbers as well
as a sort for sets of numbers. The theory PA2 is a natural set of axioms for the
following many-sorted structure in this signature:
(ω, 0, s,+,×,≤, P (ω)) (3.1)
This structure satisfies the eight axioms of Robinson’s Q
(Q1) s(x) ≠ 0
(Q2) s(x) = s(y)→ x = y
(Q3) x ≠ 0→ ∃ w x = s(w)
(Q4) x+ 0 = x
(Q5) x+ s(y) = s(x+ y)
(Q6) x · 0 = 0
(Q7) x · s(y) = x · y + x
(Q8) x ≤ y ↔ ∃ z x+ z = y.
and the mathematical induction axiom
∀ F [[F (0) & ∀ n (F (n)→ F (s(n)))]→ [∀ n F (n)]] (3.2)
and each instance of the comprehension schema (where F not free in ϕ)
∃ F ∀ n [F (n)↔ ϕ(n)] (3.3)
Here, the formula ϕ is allowed to contain free object variables (in addition to n)
and free set variables (with the exception of F ). Hence, what an instance of this
comprehension schema says is that if ϕ(n) is a formula with parameters, then
there is a set F corresponding to it. With all this in place, we are now in a position to
define:
Definition 7. The theory PA2 or CA2 or second-order Peano arithmetic consists
of Q1-Q8, the mathematical induction axiom (3.2), and each instance of the com-
prehension schema (3.3) (cf. [138] p. 4).
The name CA2 is also given to PA2 because it reminds us of comprehension.
The signature of HP2 and BL2 is likewise a many-sorted signature, with sorts
for objects as well as sorts for n-ary relations on objects, and with an additional
function symbol from the unary relation sort to the object sort. The unary relations
are written as A,B,C, F,G,H,X, Y, Z and will be called sets, and the n-ary
relation symbols for n > 1 are written as f, g, h, P,Q,R, S and will be called relations.
Occasionally when we want to say something about both sets and relations,
we will talk about all n-ary relations for n ≥ 1. The additional function symbol
is denoted by # in the case of HP2 and by ∂ in the case of BL2. So the signatures
of HP2 and BL2 are exactly the same: it is merely for the sake of convenience and
clarity that we use # in the context of HP2 and ∂ in the context of BL2. Hence,
structures in this signature have the form
(M,S1, S2, . . . ,#) (3.4)
where M is a set, Sn ⊆ P (Mn) and # : S1 → M . Note that the function # only
goes from S1 to M , so that the relations from Sn for n > 1 are not in the domain
of this function.
It is worth pausing for a moment to dwell on a technical point. Formally, the
signature of PA2 also contains a binary relation symbol E which holds between
an object and a set and which, in the standard model from (3.1), is interpreted
by the ∈ relation from the ambient set-theory. In structures where this holds, let
us say that the symbol E is interpreted absolutely. It is easy to see that every
structure in the signature of PA2 is isomorphic to a structure that interprets this
symbol absolutely, and it is for this reason that this symbol is typically suppressed
when describing structures. Likewise, formally the signature of HP2 and BL2 con-
tains (n + 1)-ary relation symbols En, which hold between n-tuples of objects
and n-ary relations. Further, there is an obvious generalization of the notion of
absoluteness for structures in this signature, such that the structure from (3.4)
interprets En absolutely, and such that every structure in this signature is iso-
morphic to a structure which interprets En absolutely. Hence, as in the case of
second-order Peano arithmetic, in what follows, these symbols will be suppressed
when describing structures, and it will be assumed that every structure in this
signature has the form of (3.4).
Hume’s Principle and Basic Law V can now be defined. Hume’s Principle is
the following axiom in the signature of structure (3.4):
#X = #Y ⇐⇒ ∃ bijection f : X → Y (3.5)
Here, the notion of bijectivity is defined in terms of functionality, injectivity, and
surjectivity in the obvious manner. The axiom Basic Law V is the following
sentence in this signature:
∂X = ∂Y ⇐⇒ X = Y (3.6)
Here, two sets are said to be equal if they are coextensive; formally, the equality of
coextensive sets can be taken to be an axiom of all the theories considered in this
chapter. The important thing to note here is that (M,S1, S2, . . . , ∂) is a model of
Basic Law V if and only if the function ∂ : S1 →M is an injection. That is, Basic
Law V mandates that a very simple relation holds between S1 and M . There is no
analogue of this in the case of Hume’s Principle, since the right-hand side of (3.5)
contains a higher-order quantifier.
Nevertheless, there are many natural models of Hume’s Principle, and examin-
ing these models is the easiest way to define the theories HP2 and BL2. In particular,
if α is an ordinal which is not a cardinal, and if # is interpreted as cardinality,
then the following structure is a model of Hume’s Principle:
(α, P (α), P (α2), . . . ,#) (3.7)
Restricting attention to ordinals α that are not cardinals serves the purpose of
ensuring that #(α) < α, so that dom(#α) is P (α) and so that rng(#α) is a subset
of α. For all n-ary relation variables R and all n ≥ 1, this structure also satisfies
each instance of the following comprehension schema (where R does not occur free
in ϕ(z))
∃ R ∀ n [n ∈ R↔ ϕ(n)] (3.8)
This comprehension schema is simply the generalization of the comprehension
schema from PA2, namely (3.3), to the n-ary relations for all n ≥ 1. Here, as with
(3.3), the formula ϕ is allowed to include free object variables (in addition to n)
and free relation variables of any arity m ≥ 1 (with the exception of R). Hence,
we can now define the following theories:
Definition 8. The theory HP2 is the theory that is given by Hume’s Principle
(3.5) and the comprehension schema (3.8).
Definition 9. The theory BL2 is the theory which is given by Basic Law V (3.6)
and the comprehension schema (3.8).
The primary focus of this chapter is on subsystems of HP2 and BL2 that are
generated by restrictions on the complexity of the formulas appearing in the com-
prehension schema (3.8). This is due to the fact that we seek to compare the
interpretability strength of these subsystems to those of second-order Peano arith-
metic. However, unlike in the case of PA2 and HP2, attention must be restricted
to these subsystems in the case of BL2. For, it is not difficult to see that Russell’s
paradox shows that BL2 is inconsistent:
Proposition 10. BL2 is inconsistent.
Proof. By applying the comprehension schema (3.8) to the formula
ϕ(x) ≡ ∃ Y [∂Y = x & x ∉ Y ] (3.9)
it follows that BL2 proves that there is a set X that satisfies
∀ x [(x ∈ X)⇐⇒ (∃ Y [∂Y = x & x ∉ Y ])] (3.10)
There are then two cases: either ∂(X) ∈ X or ∂(X) ∉ X. Case one: suppose
that ∂(X) ∈ X. Then by the left-to-right direction of equation (3.10), it follows
that there is Y such that ∂(Y ) = ∂(X) and ∂(X) ∉ Y . But ∂(Y ) = ∂(X)
and Basic Law V imply that Y = X, so that ∂(X) ∉ X, which contradicts our
case assumption. Case two: suppose that ∂(X) ∉ X. Then by the right-to-left
direction of equation (3.10), it follows that for any Y we have that ∂(Y ) = ∂(X)
implies ∂(X) ∈ Y . But then ∂(X) = ∂(X) implies ∂(X) ∈ X, which contradicts
our case assumption.
Hence BL2 is inconsistent and does not have any models, unlike the theories PA2
and HP2, which respectively have the canonical models (3.1) and (3.7).
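The semantic content of this argument can be illustrated on finite structures. The Python sketch below is a toy check under stated assumptions, not a formalization of the proof: it takes S1 to be the full power set of a three-element set M (so that every instance of comprehension over sets holds), runs through every function ∂ : P(M) → M, forms the Russell set X of (3.10), and returns a set Y ≠ X with ∂(Y) = ∂(X), so that ∂ is never an injection. The helper names `subsets` and `russell_witness` are hypothetical.

```python
from itertools import combinations, product


def subsets(universe):
    """All subsets of a finite universe, as frozensets, smallest first."""
    items = sorted(universe)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def russell_witness(universe, delta):
    """Given delta: P(universe) -> universe (as a dict), return (X, Y) with
    delta[X] == delta[Y] and delta[X] not in Y, where X is the Russell set."""
    sets = subsets(universe)
    # X = {x : there is Y with delta(Y) = x and x not in Y}, as in (3.10).
    X = frozenset(x for x in universe
                  if any(delta[Y] == x and x not in Y for Y in sets))
    for Y in sets:
        if delta[Y] == delta[X] and delta[X] not in Y:
            return X, Y
    return None  # never reached if the argument of Proposition 10 is right


universe = {0, 1, 2}
sets = subsets(universe)
all_ok = True
# Enumerate all 3**8 candidate interpretations of the extension operator.
for values in product(sorted(universe), repeat=len(sets)):
    delta = dict(zip(sets, values))
    witness = russell_witness(universe, delta)
    if witness is None or witness[0] == witness[1]:
        all_ok = False
```

That no injection from P(M) to M exists for finite M is of course immediate by counting; the point of the sketch is only that the Russell set supplies an explicit pair of distinct sets with the same extension, mirroring the two cases of the proof.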
3.1.3 Definition of Subsystems of PA2, BL2 and HP2
So if one wants to study Basic Law V, one needs to pass to subsystems of
Basic Law V that do not allow instances of the comprehension schema (3.8) applied
to formulas like the one in (3.9). To this end, let us introduce the following natural
hierarchy of formulas in the signature of BL2 and HP2. A formula ϕ, perhaps with
free object variables z and free relation variables R of different arities m ≥ 1, is
called arithmetical or $\Pi^1_0$ or $\Sigma^1_0$ if it does not contain any bound m-ary relation
variables for any m ≥ 1. Further, if m ≥ 1 and R is an m-ary relation variable
and ϕ(R) is a $\Sigma^1_n$-formula, then ∃ R ϕ(R) is a $\Sigma^1_n$-formula and ∀ R ϕ(R) is a
$\Pi^1_{n+1}$-formula. Likewise, if m ≥ 1 and R is an m-ary relation variable and ϕ(R)
is a $\Pi^1_n$-formula, then ∃ R ϕ(R) is a $\Sigma^1_{n+1}$-formula and ∀ R ϕ(R) is a $\Pi^1_n$-formula.
That is, in this hierarchy of formulas, one is allowed to accumulate arbitrarily
many existential relation quantifiers of different arities m ≥ 1 in front of a $\Sigma^1_n$-formula
and still remain $\Sigma^1_n$, and likewise one is allowed to accumulate arbitrarily
many universal relation quantifiers of different arities m ≥ 1 in front of a $\Pi^1_n$-formula
and still remain $\Pi^1_n$. It is only the change from a universal relation
quantifier of some arity m ≥ 1 to an existential relation quantifier of some arity
m ≥ 1 (or vice-versa) which increases the complexity of the formula in this
hierarchy. For instance, if X is a set variable and R and S are binary relation
variables, then the following formulas are respectively $\Sigma^1_1$, $\Pi^1_1$, $\Sigma^1_2$, $\Pi^1_2$:
∃ X ∀ x R(x,#X) (3.11)
∀ R ∀ X ∃ y [∀ x R(x, y)→ y = ∂X] (3.12)
∃ X ∀ R [∃ x R(x, x)→ R(#X,#X)] (3.13)
∀ R ∃ X ∃ S ∀ y [(∀ x x ∈ X ↔ ¬Sxy)→ R(∂X, y)] (3.14)
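The bookkeeping in this hierarchy amounts to counting blocks of like relation quantifiers. As a rough illustration, the following Python sketch classifies a formula from its leading relation-quantifier prefix alone; it assumes that the formula consists of such a prefix followed by an arithmetical matrix, and the function name `classify` and the encoding of quantifiers as the letters 'E' and 'A' are hypothetical conveniences.

```python
def classify(prefix):
    """Classify a formula by its leading relation quantifiers.

    prefix is a string over {'E', 'A'} listing the relation quantifiers in
    order (object quantifiers in the matrix are ignored). Returns (kind, n),
    where kind is 'Sigma' or 'Pi' and n counts the quantifier blocks; the
    empty prefix is the arithmetical case, which is both Sigma and Pi at 0.
    """
    if not prefix:
        return ("Sigma/Pi", 0)
    # Each switch between 'E' and 'A' starts a new block.
    blocks = 1 + sum(1 for a, b in zip(prefix, prefix[1:]) if a != b)
    kind = "Sigma" if prefix[0] == "E" else "Pi"
    return (kind, blocks)


# Relation-quantifier prefixes of formulas (3.11)-(3.14), in order.
examples = [classify("E"), classify("AA"), classify("EA"), classify("AEE")]
```

Applied to the prefixes of (3.11)-(3.14), this yields Sigma at level 1, Pi at level 1, Sigma at level 2 and Pi at level 2, matching the classification stated above. The sketch returns the least level read off from the prefix and does not handle interleaved object and relation quantifiers.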
Finally, it is worth explicitly noting that not all formulas are included in our
hierarchy of formulas. For instance, we have said nothing about the complexity of
formulas which include alternations of object quantifiers and set quantifiers, such
as the following formula:
∀ X ∃ y ∀ Z [R(#X,#Z)→ R(y,#Z)] (3.15)
However, this is not a serious omission, since so long as one includes enough of the
comprehension schema (3.8) to guarantee the existence of the singleton set {n}
for each element n, the above formula is equivalent to the following Π^1_3-formula:
∀ X ∃ Y ∀ Z [∃ y ∈ Y & ∀ z ∈ Y z = y] & [R(#X,#Z)→ R(y,#Z)] (3.16)
That is, we can correct for this omission by treating object quantifiers as set quan-
tifiers over singleton sets when they occur in alternation of object quantifiers and
set quantifiers.
Using this hierarchy of formulas, one can define the subsystems of BL2 and
HP2 by restricting the complexity of formulas which appear in the comprehension
schema (3.8). For the following definition, let us recall that CA2 is another name
for PA2 (cf. Definition 7). The idea behind the following definition is then that AC
reminds us of the axiom of choice and is the result of inverting the letters in CA,
which reminds us of comprehension. So with the exception of the choice schema,
each of the schemas which figure in the below definition asserts the existence of a
certain class of definable sets and relations:
Definition 11. Suppose that XY2 is one of CA2, BL2, or HP2. Then we can define
the following four subsystems of XY2:
(i) The subsystem AXY0 is XY2 but with the comprehension schema (3.8) restricted
to arithmetical formulas.
(ii) The subsystem ∆^1_1-XY0 is XY2 but with the comprehension schema (3.8)
replaced by the following schema, which is called the ∆^1_1-comprehension schema or
hyperarithmetic comprehension schema, wherein ϕ is a Σ^1_1-formula and ψ is a
Π^1_1-formula:
[∀ n (ϕ(n) ↔ ψ(n))] → [∃ R ∀ n (n ∈ R ↔ ϕ(n))] (3.17)
(iii) The subsystem Σ^1_1-YX0 is AXY0 and the following schema, which is called the
Σ^1_1-choice schema, wherein ϕ is a Σ^1_1-formula:
[∀ n ∃ P ϕ(n, P)] → [∃ R ∀ n ∀ P ((∀ m (m ∈ P ↔ nm ∈ R)) → ϕ(n, P))] (3.18)
(iv) The subsystem Π^1_n-XY0 is XY2 but with the comprehension schema (3.8)
restricted to Π^1_n-formulas.
Further, in all these schemata, ϕ and ψ are allowed to contain free object vari-
ables (in addition to n) and free relation variables of any arity m ≥ 1 (with the
exception of R).
The intuition behind the choice schema (3.18) can be made clearer as follows.
Suppose that a structure (M,S1, S2, . . . ,#) is a model of Σ11 − PH0 and that the
antecedent of a given instance of the Σ11-choice schema (3.18) holds. Then Σ11 − PH0
asserts the existence of a relation R, which for the sake of simplicity we can
assume to be a binary relation. For each object n in M , the following set is
then guaranteed to exist in S1 by the arithmetic comprehension schema (which is
included in Σ11 − PH0):
Rn = {m : Rnm} (3.19)
So it follows that (M,S1, S2, . . . ,#) |= ϕ(n,Rn) for every n in M . Hence, in the
situation where for every n there is a choice of P such that ϕ(n, P ), the Σ11-choice
schema asserts that there is a uniform way to make these choices, in that there is
an R such that its columns Rn satisfy ϕ(n,Rn) for each n.
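The passage from pointwise witnesses to a single relation with the right columns can be illustrated on a finite toy domain. The following Python sketch is not a model of any of the theories above (over a finite domain the choice is trivial), and the names choice_relation, column and the sample condition phi are assumptions introduced only for illustration.

```python
from itertools import combinations


def powerset(domain):
    """All subsets of a finite domain, as frozensets."""
    items = sorted(domain)
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


def choice_relation(domain, phi):
    """Return a binary relation R (a set of pairs) with phi(n, R_n) for every n.

    Raises ValueError if the antecedent of the schema fails, i.e. if some n
    has no witness P with phi(n, P).
    """
    relation = set()
    for n in sorted(domain):
        witness = next((P for P in powerset(domain) if phi(n, P)), None)
        if witness is None:
            raise ValueError(f"no witness P for n = {n}")
        # Store the chosen witness as the n-th column of R.
        relation |= {(n, m) for m in witness}
    return relation


def column(relation, n):
    """The set R_n = {m : Rnm} from equation (3.19)."""
    return frozenset(m for (k, m) in relation if k == n)


# Hypothetical sample condition: P has exactly two elements, one of which is n.
domain = {0, 1, 2, 3}
phi = lambda n, P: n in P and len(P) == 2
R = choice_relation(domain, phi)
print(all(phi(n, column(R, n)) for n in domain))
```

The point of the sketch is only the shape of the conclusion of (3.18): one relation R whose columns R_n serve as the witnesses for all n at once.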
Note, however, that the map (R, n) ↦ #(Rn) is not a function symbol in the
signature of HP2 or BL2. For instance, given a binary relation R, the comprehension
schema (3.8) restricted to arithmetical formulas does not in general guarantee the
existence of the binary relation
{(n,m) : #(Rn) = m} = {(n,m) : ∃ X (∀ x x ∈ X ↔ Rnx) & #X = m}
= {(n,m) : ∀ X (∀ x x ∈ X ↔ Rnx) → #X = m}
(3.20)
For, as these definitions make evident, one will in general need the hyperarithmetic
comprehension schema (3.17) in order to show that this relation exists (cf. Propo-
sitions 55-56). This example underscores an important fact: intuitively simple
relations expressible via the maps # or ∂ may be quite complex when explicitly
written out in terms of the primitives of the signature. Since our interest in this
chapter is in restrictions of the comprehension schema, this fact will be particularly
important to keep in mind throughout. (In § 3.5, we raise the
question of what happens when one does include function symbols (R, n) ↦ #(Rn)
in the signature, so that relations like the one defined in equation (3.20) would
count as arithmetical.)
[Diagram. Nodes: Π^1_1-CA0, Σ^1_1-AC0, ∆^1_1-CA0, ACA0 (subsystems of PA2); Π^1_1-HP0,
Σ^1_1-PH0, ∆^1_1-HP0, AHP0 (subsystems of HP2); Σ^1_1-LB0, ∆^1_1-BL0, ABL0 (subsystems
of BL2). The arrows between the nodes are described in § 3.1.4.]
Figure 3.1. Provability Relation in Subsystems of BL2, PA2, and HP2
3.1.4 Summary of Results about the Provability Relation
Our primary concern in this chapter is with the interpretability relation be-
tween subsystems of PA2, HP2, and BL2, and we summarize our results in the next
section (§ 3.1.5). However, since provability implies interpretability, and since the
provability relation is intrinsically interesting, in this section we record what is
known about this relation among the subsystems of PA2, HP2, and BL2. This is
summarized in Figure 3.1, where the double arrows indicate that the provability
implication is irreversible, where the negated arrows indicate that the provability
implication fails, and where the arrows with question marks beside them indicate
that it is unknown whether the provability implication holds.
Each of the positive provability relations in Figure 3.1 follows immediately
from the definitions, except for the fact that Π^1_1-CA0 proves Σ^1_1-AC0 and the
fact that Σ^1_1-choice implies ∆^1_1-comprehension. For the former, see Simpson [138]
Theorem V.8.3 pp. 205-206. For the latter, the proof from Simpson [138]
Theorem VII.6.6 (i) p. 295 carries over to the setting of HP2 and BL2, as we verify
now:
Proposition 12. Σ^1_1-AC0 → ∆^1_1-CA0, and Σ^1_1-PH0 → ∆^1_1-HP0, and Σ^1_1-LB0 →
∆^1_1-BL0.
Proof. Let ℳ = (M, S, . . .) be a model of Σ^1_1-AC0 (resp. Σ^1_1-PH0, Σ^1_1-LB0).
By standard conventions, the structure ℳ is non-empty. However, nothing in these
conventions requires that the first-order domain M be non-empty, as opposed to, say, S.
But in the case of Σ^1_1-AC0 we have that 0 ∈ M, and in the case of Σ^1_1-PH0 we have
that #∅ ∈ M, and likewise in the case of Σ^1_1-LB0 we have that ∂∅ ∈ M. Hence, for the
remainder of the proof, fix a parameter a ∈ M. Suppose that ℳ |= ∀ z (ϕ(z) ↔ ψ(z)), where
ϕ is Σ^1_1 and ψ is Π^1_1. Then ℳ |= ∀ z (ϕ(z) ∨ ¬ψ(z)). Then by the arithmetical
comprehension schema, ℳ |= ∀ z ∃ Z [(ϕ(z) ∧ a ∈ Z) ∨ (¬ψ(z) ∧ a ∉ Z)]. By the
Σ^1_1-choice schema, there is R such that
ℳ |= ∀ z ∀ Z [(∀ x (x ∈ Z ↔ Rzx)) → ((ϕ(z) ∧ a ∈ Z) ∨ (¬ψ(z) ∧ a ∉ Z))] (3.21)
By the arithmetical comprehension schema, there is W such that z ∈ W if and only
if Rza. Then we claim that z ∈ W if and only if ϕ(z). For, suppose that z ∈ W, so
that Rza. Then Z = {x : Rzx} exists by the arithmetical comprehension schema,
and we have a ∈ Z. Then by (3.21), it follows that ϕ(z). Conversely, suppose
that z ∉ W, so that ¬Rza. Then Z = {x : Rzx} exists by the arithmetical
comprehension schema, and we have a ∉ Z. Then by (3.21), it follows that ¬ψ(z)
and hence ¬ϕ(z). Hence, in fact we have established that z ∈ W if and only if
ϕ(z). So ℳ models ∆^1_1-CA0 (resp. ∆^1_1-HP0, ∆^1_1-BL0).
The known non-provability relations in Figure 3.1 are not difficult to verify.
In the case of the subsystems of HP2, we can read these results off of the results
for the subsystems of PA2, as the proof of Proposition 53 indicates. In the case
of the subsystems of BL2, the only known result we have is that ABL0 does not
prove ∆11 − BL0, and this is shown in Proposition 51. In § 3.5, we list the remaining
open questions about the provability relation, namely, the question of whether
∆11 − BL0 implies Σ11 − LB0 and whether Π11 − HP0 implies Σ11 − PH0.
3.1.5 Summary of Results about the Interpretability Relation
Most of the formal work done on the subsystems of PA2, HP2, BL2 has concerned
the interpretability strength of these theories. A theory T0 is interpretable
in a theory T1 (T0 ≤I T1) if every model M1 of T1 uniformly defines without
parameters some model M0 of T0, where “uniform” has the sense that e.g. a
binary relation symbol R in the signature of T0 is defined by one and the same
formula ϕ(x, y) in each model M1 of T1. (For a more syntactic definition, see
Lindström [99] p. 96 or Hájek and Pudlák [59] pp. 148-149). Since this relation is
reflexive and transitive, one can define the associated notions
T0 ≡I T1 ⇐⇒ T0 ≤I T1 & T1 ≤I T0 (3.22)
T0 <I T1 ⇐⇒ T0 ≤I T1 & T1 ≰I T0 (3.23)
The relation ≤I is then a partial order on the set of equivalence classes of theories
under the equivalence relation ≡I. Since this partial order is in fact a linear order
in many natural cases, it can be intuitively conceived as a measure of the strength
of the theory. This order is also connected to the formal notion of consistency
strength by the following proposition:
Proposition 13. Suppose T1 is a finitely axiomatizable theory such that ACA0 ⊆
T1 ⊆ PA2, and suppose that T0 is a computable theory in a computable signature.
Then
T1 ⊢ Con(T0) =⇒ T1 ≰I T0 (3.24)
[T0 ≤I T1 & T1 ⊢ Con(T0)] =⇒ T0 <I T1 (3.25)
Proof. (Sketch) For (3.24), note that if T1 ⊢ Con(T0), then T1 proves that there is
a model M0 of T0 (cf. Simpson [138] Theorem IV.3.3 p. 140). But if T1 ≤I T0 and
T1 is finitely axiomatizable, then this interpretation is due to a finite number of the
axioms of T0. Further, since T0 is computable, this can be accurately represented in
T1, so that inside T1 the model M0 of T0 defines a model M1 of T1, which likewise
exists since the theory inside which we are working (namely T1 itself) includes
arithmetical comprehension. But then T1 would prove Con(T1), which contradicts
Gödel’s Second Incompleteness Theorem. (For a formal proof, see Lindström [99]
Chapter 7 Corollary 1 p. 97). Note that (3.25) follows immediately from (3.24)
and definition (3.23).
In what follows, we will apply this proposition to T1 = ACA0 itself or T1 =
Π11 − CA0, both of which are known to be finitely axiomatizable (cf. Simpson [138]
Lemma VIII.1.5 pp. 311-312 and Lemma VI.1.1 pp. 217-218).
The major previous results on the interpretability strength of the subsystems
of PA2, HP2, BL2 can be described as follows. In the 19th Century, Frege in essence
showed that PA2 ≤I HP2 (cf. Frege [44], [11], Boolos and Heck [14]), and recently
Heck ([67] p. 192) and Linnebo ([100] p. 161) noted that Frege’s proofs in fact show
that Π^1_1-CA0 ≤I Π^1_1-HP0 (cf. § 3.2.2, Corollary 27). Further, Boolos ([10]) showed
that the converse holds (cf. Corollary 29), so that one has Π^1_1-CA0 ≡I Π^1_1-HP0
(cf. Corollary 30). Heck ([65]) then showed that ABL0 interprets Robinson’s Q, and
Ganea and Visser ([50], [149]) independently showed that the converse holds, so
that ABL0 ≡I Q. Likewise, Burgess ([15]) showed that AHP0 interprets Robinson’s Q.
Finally, Ferreira and Wehmeier ([40]) showed that ∆^1_1-BL0 is consistent and a
slight modification of their proof shows that Σ^1_1-LB0 is consistent, and inspection
of this proof shows that Σ^1_1-LB0 <I Π^1_1-CA0. These previous results and our
new results are summarized in Figure 3.2, where the double arrows indicate that
the interpretability relation is irreversible, and where the single arrows indicate that
the interpretability relation may or may not be irreversible. That is, in the diagram
T1 ⇒ T0 means T0 <I T1 and T1 → T0 means T0 ≤I T1.
[Diagram. Nodes: Π^1_1-CA0, Π^1_1-HP0, Σ^1_1-LB0 + Inf, Σ^1_1-AC0, Σ^1_1-LB0, ∆^1_1-CA0,
∆^1_1-BL0, ACA0, ABL0, Q, Σ^1_1-PH0. Arrow labels: Frege/Boolos, Walsh,
Heck/Ganea/Visser, Burgess; some arrows carry question marks.]
Figure 3.2. Interpretability Relation in Subsystems of BL2, PA2, and HP2
Our new results establish upper and lower bounds on consistent subsystems of
BL2 and HP2 by (i) finding new constructions of models of these theories, (ii) noting
that the constructions can be formalized in theories such as ACA0 and Π11 − CA0,
and (iii) applying Proposition 13. Our first main new result, Theorem 60, is a
construction of a model M of Σ11 − LB0 using ideas from higher recursion theory
(cf. Sacks [134] Part A). This structure M models a finite extension of Σ11 − LB0
called Σ11 − LB0+Inf which interprets Σ11 − AC0. Moreover, since this construction is
formalizable in Π11 − CA0, we have that Proposition 13 implies that Σ11 − LB0+Inf <I
Π11 − CA0.
Our second set of results concerns new constructions of models of ∆11 − BL0 and
Σ11 − PH0 and ∆11 − HP0 + ¬Σ11 − PH0. These results are all based on a generalization
of a theorem of Barwise-Schlipf and Ferreira-Wehmeier which allows us to
build models of these theories on top of various recursively saturated structures
(cf. Theorem 70). In particular, we show that if k is a countable recursively
saturated o-minimal expansion of a real-closed field, then there is a function
# : D(k) → k, where D(k^n) denotes the definable subsets of k^n, such that the
structure
(k, D(k), D(k^2), . . . , #) (3.26)
is a model of Σ11 − PH0. Moreover, we show that this construction can be formalized
in ACA0, so that by Proposition 13, we have Σ11 − PH0 <I ACA0 (cf. Corollary 99).
Further, we show that if k is a countable saturated algebraically closed field, then
there is a function # : D(k) → k, where D(k^n) denotes the definable
subsets of k^n, such that the structure
(k, D(k), D(k^2), . . . , #) (3.27)
is a model of ∆11 − HP0+¬Σ11 − PH0. Further, we can use this construction to answer
an open question of Linnebo (cf. Remark 81 and Proposition 83). However, we do
not presently know whether this construction can be formalized in ACA0, although
we have reduced it to the question of whether Ax’s Theorem can be formalized in
ACA0 (cf. Remark 78 and Question 111). Finally, we show that if k is a countable
recursively saturated separably closed field of finite imperfection degree, then there
is a function ∂ : D(k) → k, where D(k^n) denotes the definable subsets of k^n, such
that the structure
(k, D(k), D(k^2), . . . , ∂) (3.28)
is a model of ∆11 − BL0 (cf. Theorem 108). However, we do not presently know
whether this construction can be formalized in ACA0, although we have reduced
this question to the question of whether the proof of the elimination of imagi-
naries for separably closed fields can be formalized in ACA0 (cf. Remark 109 and
Question 112).
3.2 Standard Models of HP2 and Associated Results
Prior to turning to the primary results of this chapter in §§ 3.3-3.4, the re-
lationship between PA2 and HP2 is briefly explored in this section. On the one
hand, in § 3.2.2, a brief self-contained proof of Frege and Boolos’s result that
PA2 and HP2 are mutually interpretable is presented (cf. Corollary 30). Then, in
§ 3.2.1, some of the ways in which the standard models of HP2 are similar to and
different from the standard models of PA2 are examined. The standard model of
PA2 is the structure from equation (3.1), namely, (ω, 0, s,+,×,≤, P (ω)), while the
standard models of HP2 are the structures from equation (3.7), namely, structures
of the form (α, P(α), P(α^2), . . . , #α), where α is an ordinal which is not a cardinal
and where #α : P(α) → α denotes cardinality. In § 3.2.1, it is shown that
these standard models of HP2 depend only on the cardinality of α for α ≥ ω + ω
(Proposition 16 (i)), and further that they can have many automorphisms, unlike
the standard model of PA2 (cf. Proposition 17 (iv)). Finally, it is shown that
there is an analogue of the relative categoricity of PA2 in the setting of HP2 (cf.
Proposition 20 and Remark 21).
3.2.1 Models of HP2 from Infinite Cardinals
Proposition 14. Suppose α, β are ordinals that are not cardinals, and consider
the structures (α, P(α), P(α^2), . . . , #α) and (β, P(β), P(β^2), . . . , #β), where #α :
P(α) → α and #β : P(β) → β denote cardinality.
(i) The structures (α, P(α), P(α^2), . . . , #α) and (β, P(β), P(β^2), . . . , #β) model
HP2.
(ii) If α = ω + k + 1 where k ≥ 0, then |α − rng(#α)| = k.
(iii) If α ≥ ω + ω, then |α − rng(#α)| = |α|.
(iv) The structures (α, P(α), P(α^2), . . . , #α) and (β, P(β), P(β^2), . . . , #β) are
isomorphic if and only if α = β or α, β ≥ ω + ω and |α| = |β|.
Proof. For (i), note that restricting attention to ordinals α which are not cardinals
serves the purpose of ensuring that #(α) < α, so that dom(#α) is P(α) and so
that rng(#α) is a subset of α. Further, note that (α, P(α), P(α^2), . . . , #α) satisfies
Hume’s Principle by the definition of cardinality. Further, note that by the Power
Set Axiom and the Separation Axiom, the structure (α, P(α), P(α^2), . . . , #α) satisfies
the full comprehension schema. Hence, in fact (α, P(α), P(α^2), . . . , #α) is a model of
HP2.
For (ii), note that α − rng(#α) = {ω + 1, . . . , ω + k}, which has cardinality k.
For (iii), note that since α ≥ ω + ω, we have that α − ω is infinite, and
hence |α| = |α − ω|. Case One: α is a limit ordinal. Then the mapping from
α − ω to α − rng(#α) given by β ↦ β + 1 is an injection. Case Two: α is a
successor ordinal. Then α = γ + n where n > 0 and γ is a limit ordinal. Then
|α| = |α − ω| = |γ − ω|. Then the mapping from γ − ω to α − rng(#α) given by
β ↦ β + 1 is an injection. Hence in both cases we have |α − rng(#α)| = |α|.
For (iv), suppose that the two structures are isomorphic. Then this isomorphism
induces a bijection from α onto β, and hence α and β have the same
cardinality. Further, suppose for the sake of contradiction that α ≠ β and it is
not the case that α, β ≥ ω + ω. If α < β < ω + ω, then by part (ii) we have that
|α − rng(#α)| < |β − rng(#β)| < ω, and so the two structures are not elementarily
equivalent and hence not isomorphic, which is a contradiction. If α < ω + ω ≤ β,
then by parts (ii) and (iii) we have that |α − rng(#α)| < ω ≤ |β − rng(#β)|, and
so the two structures are not elementarily equivalent and hence not isomorphic,
which is a contradiction. Hence, in fact, we must have that α = β or α, β ≥ ω + ω
and |α| = |β|.
Conversely, suppose that α, β ≥ ω + ω have the same cardinality, so that
rng(#α) = rng(#β) by definition, and hence that |α − rng(#α)| = |α| = |β| =
|β − rng(#β)| by part (iii). Hence choose a bijection f : α → β such that f is the
identity on rng(#α). Extend f to a bijection f̄ : P(α) → P(β) by setting f̄(X) = {f(x) :
x ∈ X}. Since f is the identity on rng(#α) and since f is a bijection, we have that
f(#α(X)) = f(|X|) = |X| = |{f(x) : x ∈ X}| = |f̄(X)| = #β(f̄(X)) (3.29)
Hence, f is an isomorphism.
Definition 15. If κ is a cardinal, then define the ordinal
Hκ = ω + κ + 1 if κ < ω,
Hκ = ω + ω if κ = ω,
Hκ = κ + 1 if κ > ω, (3.30)
and define the structure
ℋκ = (Hκ, P(Hκ), P(Hκ^2), . . . , #κ) (3.31)
where #κ : P(Hκ) → Hκ denotes cardinality.
Proposition 16.
(i) For every ordinal α that is not a cardinal, there is exactly one cardinal κ such
that the structure ℋκ is isomorphic to the structure (α, P(α), P(α^2), . . . , #α),
where #α : P(α) → α denotes cardinality.
(ii) If κ is a cardinal then |Hκ − rng(#κ)| = κ.
(iii) If κ, λ are cardinals, then ℋκ and ℋλ are isomorphic if and only if κ = λ.
Proof. For (ii), there are three cases. First, suppose that κ = k < ω. Then Hκ −
rng(#κ) = {ω + 1, . . . , ω + k}. Second, suppose that κ = ω. Then Hκ − rng(#κ) =
{ω + n : 0 < n < ω}. Third, suppose that κ > ω. Then by Proposition 14 (iii),
|Hκ − rng(#κ)| = |κ + 1 − rng(#κ)| = |κ + 1| = κ.
For (iii), note that the right-to-left direction is trivial. For the left-to-right
direction, suppose for the sake of contradiction that ℋκ and ℋλ are isomorphic
and that κ ≠ λ. Then without loss of generality, κ < λ. First suppose that
κ < λ < ω. Then part (ii) implies that ℋκ and ℋλ are not elementarily equivalent,
since ℋκ models that there are exactly κ elements not in the range of #, whereas
ℋλ models that there are exactly λ elements not in the range of #. Second suppose
that κ < ω ≤ λ. Then likewise the structures ℋκ and ℋλ are not elementarily
equivalent, since ℋκ models that there are exactly κ many elements not in the
range of #, whereas ℋλ models that there are at least κ + 1 many elements not
in the range of #. Third, suppose that κ = ω < λ. In fact, this cannot happen,
since the isomorphism from ℋκ to ℋλ would induce a bijection between the
first-order parts of these structures, which respectively have cardinality ω and
λ > ω. Fourth, suppose that ω < κ < λ. Again this cannot happen, since the
isomorphism from ℋκ to ℋλ would induce a bijection between the first-order
parts of these structures, which respectively have cardinality κ and λ > κ.
For (i), note that uniqueness follows from part (iii). For existence, there are
two cases. If α < ω + ω, then α = ω + k + 1 where k ≥ 0. Then of course the
structure (α, P(α), P(α^2), . . . , #α) is identical with the structure ℋk. If α ≥ ω + ω,
then by Proposition 14 (iv), we have that (α, P(α), P(α^2), . . . , #α) is isomorphic
to ℋ|α|.
Proposition 17. Suppose that κ is a cardinal.
(i) If β, γ ∈ (Hκ − rng(#κ)) then there is f ∈ Aut(ℋκ) such that f(β) = γ.
(ii) If X ⊆ Hκ is ∅-definable in ℋκ then X ⊆ rng(#κ) or (Hκ − rng(#κ)) ⊆ X.
(iii) If β ∈ rng(#κ) and f ∈ Aut(ℋκ) then f(β) = β.
(iv) Aut(ℋκ) and Aut(κ) are isomorphic, where we view κ as a structure in the
empty signature.
Proof. (i) Define f : Hκ → Hκ by setting f(γ) = β, f(β) = γ, and letting f be the
identity otherwise, so that f is a bijection of Hκ. Extend f to a mapping f̄ :
ℋκ → ℋκ by setting f̄(X) = {f(x) : x ∈ X}. Then f̄ is clearly a bijection since f
is a bijection. To show that it is an automorphism of the structure ℋκ, it suffices
to show that f(#κX) = #κf̄(X). But, since f is the identity on rng(#κ), we
have that f̄(#κX) = f(#κX) = #κX, and since f is a bijection, we have that
f ↾ X : X → f̄(X) is a bijection, and so #κX = #κf̄(X). Hence, in fact f̄ is an
automorphism of ℋκ which sends β to γ.
(ii) Suppose that X ⊆ Hκ is ∅-definable in ℋκ, but it is not the case that
X ⊆ rng(#κ) or (Hκ − rng(#κ)) ⊆ X. Then there is β ∈ X ∩ (Hκ − rng(#κ))
and γ ∈ (Hκ − rng(#κ)) ∩ (Hκ − X). By part (i), there is f ∈ Aut(ℋκ) such
that f(β) = γ. But since X is ∅-definable, we have that β ∈ X if and only if
γ = f(β) ∈ X, which is a contradiction.
(iii) Suppose that β ∈ rng(#κ) and f ∈ Aut(ℋκ) and f(β) ≠ β. Since
rng(#κ) is ∅-definable and β ∈ rng(#κ), we have that f(β) ∈ rng(#κ). Case One:
f(β) < β. Note that the relation < on rng(#κ) is ∅-definable, since on rng(#κ)
we have
λ ≤ λ′ ⇐⇒ ℋκ |= ∃ X ∃ Y [#κ(X) = λ & #κ(Y) = λ′ & ∃ injective f : X → Y]
(3.32)
Then our case assumption f(β) < β implies f(f(β)) < f(β) < β and so we obtain
an infinite decreasing sequence of ordinals, which is a contradiction. Case Two:
β < f(β). Since f ∈ Aut(ℋκ) we have that f⁻¹ ∈ Aut(ℋκ), and since β < f(β)
we have f⁻¹(β) < β, since again the relation < on rng(#κ) is ∅-definable. Hence,
by iterating f⁻¹(f⁻¹(β)) < f⁻¹(β) < β as before, we again obtain an infinite
decreasing sequence of ordinals, which is a contradiction.
(iv) If X is a set viewed as a structure in the empty signature, then Aut(X) is
just the set of permutations of X, and hence if X and Y have the same cardinality,
then Aut(X) and Aut(Y) are isomorphic as groups. Hence by Proposition 16 (ii),
we have that Aut(κ) and Aut(Hκ − rng(#κ)) are isomorphic as groups. So it suffices
to find a group isomorphism F : Aut(Hκ − rng(#κ)) → Aut(ℋκ).
To this end, given a bijection f : Hκ → Hκ, extend f to a mapping f̄ : ℋκ →
ℋκ by setting f̄(X) = {f(x) : x ∈ X}, so that f̄ : ℋκ → ℋκ is a bijection. Then
we claim that
f̄ ∈ Aut(ℋκ) ⇐⇒ f ↾ rng(#κ) = id on rng(#κ) (3.33)
The left-to-right direction follows directly from part (iii). For the right-to-left
direction, it suffices to show that f(#κX) = #κf̄(X). Since f is the identity on
rng(#κ), we have that f̄(#κX) = f(#κX) = #κX, and since f is a bijection,
we have that f ↾ X : X → f̄(X) is a bijection, and so #κX = #κf̄(X). Hence,
equation (3.33) does hold, and so we can define F : Aut(Hκ − rng(#κ)) → Aut(ℋκ)
by setting F(g) = f̄, where f is g on Hκ − rng(#κ) and where f is the identity
on rng(#κ). Since F is injective, since F is surjective by (3.33), and since
F(g1 ◦ g2) = F(g1) ◦ F(g2), we have that F witnesses the group
isomorphism between Aut(ℋκ) and Aut(Hκ − rng(#κ)).
Remark 18. The proof of the proposition above shows one how to construct many
natural examples of sentences that are independent of HP2. For instance, in
equation (3.32), it was shown how to define the ordering in ℋκ. Using this, one can
form a sentence ϕ such that ℋκ |= ϕ if and only if κ is an infinite successor
cardinal, so that ℋ_{ω_2} |= HP2 + ϕ and ℋ_{ω_ω} |= HP2 + ¬ϕ. This contrasts starkly with
the case of PA2, where there are comparatively few known examples of natural
independent sentences.
Remark 19. The structures ℋκ for κ < ω from Definition 15 are on one level very
different: for, they are not elementarily equivalent since ℋκ models that there are
exactly κ-many elements that are not in the range of the #-function. However,
on another level, these structures are very similar to each other: for, when κ < ω,
it is easy to see that ℋκ is isomorphic to the structure (ω, P(ω), P(ω^2), . . . , #*κ),
where #*κ(X) = 0 if X is infinite and where #*κ(X) = κ + 1 + |X| if X is finite.
Further, when one restricts to the ranges of the #*κ-functions, the induced structures
(rng(#*κ), P(ω) ∩ P(rng(#*κ)), P(ω^2) ∩ P(rng(#*κ)^2), . . . , #*κ) are all isomorphic
to the structure (ω, P(ω), P(ω^2), . . . , #*) where #*(X) = 0 if X is infinite and
where #*(X) = 1 + |X| if X is finite. As the next proposition indicates, this is
a very general phenomenon among models of HP2: namely, so long as different
#-functions on one and the same underlying set can in some sense see each other,
they yield isomorphic structures when one restricts attention to their ranges.
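The count of elements outside the range of the #-function in Remark 19 can be checked on finite data. The Python sketch below is only a finite approximation under stated assumptions: infinite sets (which are sent to 0) cannot be represented, so 0 is placed in the range by hand, and the names sharp_star, missed_below and swap are hypothetical. The last function mirrors the transposition used in the proof of Proposition 17 (i).

```python
def sharp_star(kappa, finite_set):
    """The value kappa + 1 + |X| on a finite subset X of omega.

    Infinite sets, which are sent to 0, are not representable here.
    """
    return kappa + 1 + len(set(finite_set))


def missed_below(kappa, bound):
    """Natural numbers below `bound` that are not values of the function."""
    # 0 is the value on infinite sets; a finite set of size s gives kappa + 1 + s.
    in_range = {0} | {kappa + 1 + size for size in range(bound)}
    return [n for n in range(bound) if n not in in_range]


def swap(a, b):
    """Transposition of a and b, together with its elementwise extension to sets."""
    def point(x):
        return b if x == a else a if x == b else x

    def on_sets(X):
        return {point(x) for x in X}

    return point, on_sets


# For kappa = 3 the elements outside the range are 1, 2, 3: exactly kappa many.
print(missed_below(3, 20))

# Swapping two elements outside the range commutes with the function.
point, on_sets = swap(1, 3)
X = {1, 5, 7}
print(point(sharp_star(3, X)) == sharp_star(3, on_sets(X)))
```

The second check works for the reason given in the proof of Proposition 17 (i): the transposition fixes every element of the range and preserves the size of every set.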
Proposition 20. Suppose that (M, S1, S2, . . . , #1, #2) is a structure where Sn ⊆
P(M^n) and where #i : S1 → M. Suppose further that the structures
(M, S1, S2, . . . , #i) are models of HP2 for i ∈ {1, 2}, and further that the structure
(M, S1, S2, . . . , #1, #2) satisfies every instance of the comprehension schema (3.8),
in the signature that includes both of the function symbols #1, #2. Finally, for
i ∈ {1, 2}, define the following induced structure:
𝒩i = (rng(#i), S1 ∩ P(rng(#i)), S2 ∩ P(rng(#i)^2), . . . , #i) (3.34)
Then 𝒩1 and 𝒩2 are isomorphic models of HP2.
Proof. First we define a bijection Γ : rng(#1) → rng(#2). If #1X ∈ rng(#1) where
X ∈ S1, then we define Γ(#1X) = #2X. Note that Γ : rng(#1) → rng(#2) is
well-defined: if #1X = #1Y then we need to show that #2X = #2Y. This follows,
since
#1X = #1Y =⇒ [∃ bijection f : X → Y] =⇒ #2X = #2Y (3.35)
Next, note that Γ : rng(#1) → rng(#2) is injective:
Γ(#1X) = Γ(#1Y) =⇒ #2X = #2Y =⇒ [∃ bijection f : X → Y] =⇒ #1X = #1Y
(3.36)
Finally, note that Γ : rng(#1) → rng(#2) is surjective: if #2X ∈ rng(#2) then by
definition Γ(#1X) = #2X. Hence, in fact Γ : rng(#1) → rng(#2) is a bijection.
Further, note that the graph of Γ is in S2 since one has the equality
graph(Γ) = {(x, y) ∈ M^2 : ∃ Z #1(Z) = x & #2(Z) = y} (3.37)
and since it was assumed that the structure (M, S1, S2, . . . , #1, #2) satisfies every
instance of the comprehension schema (3.8) in the signature that includes both
of the function symbols #1, #2. Now, extend Γ to Γ̄ : 𝒩1 → 𝒩2 by setting Γ̄(X) =
{Γ(x) : x ∈ X}, which exists in S1 since the graph of Γ is in S2. Then Γ̄ : 𝒩1 → 𝒩2
is an isomorphism, because
Γ̄(#1X) = Γ(#1X) = #2X = #2{Γ(x) : x ∈ X} = #2Γ̄(X), (3.38)
where the first and second equalities follow respectively from the definitions of Γ̄
and Γ, and where the third equality follows from the fact that Γ : X → {Γ(x) :
x ∈ X} is a bijection whose graph is in S2, and where the last equality follows
from the definition of Γ̄.
Remark 21. The previous proposition can be thought of as an analogue of the rel-
ative categoricity results for models of PA2. In the 19th Century, Dedekind showed
that any two models (M, +, ×, P(M), P(M^2), . . .) and (N, ⊕, ⊗, P(N), P(N^2), . . .)
of PA2 are isomorphic ([22] § 132, cf. Shapiro [135] Theorem 4.8 p. 82). However,
it is not difficult to see that Dedekind’s result can be relativized, in the
following way: if (M, +, ×, ⊕, ⊗, S1, S2, . . .) is a structure where Sn ⊆ P(M^n)
such that (M,+,×, S1, S2, . . .) and (M,⊕,⊗, S1, S2, . . .) are models of PA2 and
such that (M,+,×,⊕,⊗, S1, S2, . . .) satisfies every instance of the comprehen-
sion schema (3.8) in the signature of +,×,⊕,⊗, then (M,+,×, S1, S2, . . .) and
(M,⊕,⊗, S1, S2, . . .) are isomorphic (cf. Parsons [120] § 49 pp. 279 ff). The pre-
vious proposition is simply the analogue of this phenomenon in the setting of
HP2.
3.2.2 The Mutual Interpretability of PA2 and HP2
The goal of this section is to present a brief and self-contained proof of the
result that PA2 is mutually interpretable with HP2 (Corollary 30). One half of this
result, namely, the interpretability of HP2 in PA2 is due to Boolos (Corollary 29).
The other half of the result, namely, the interpretability of PA2 in HP2, is now
called Frege’s Theorem (Corollary 27). The proof of Frege’s Theorem
can be broken down into two steps: first, the proof that PA2 is interpretable
in the theory consisting of (Q1)-(Q2) and the comprehension schema (3.3) (cf.
Theorem 22), and second the argument that this theory is interpretable in HP2 (cf.
Theorem 26). Elements of the first step can be found in Dedekind (cf. [22] § 72),
and elements of this second step can be traced back to Frege (cf. Boolos and Heck
[14]). However, the modern presentation stems from Wright [158] pp. 154-169 (cf.
also Boolos [12]). The warrant for including a proof of this result here is two-fold:
(i) the proof presented here is much briefer than other published presentations, and
(ii) the proof presented here is slightly different from other published presentations
in that it is centered around the notion of Dedekind-finiteness, defined in terms of
the lack of injective non-surjective functions, as opposed to Frege’s ancestral notion
(cf. the relation X ⊀ X in Proposition 24 and Theorem 26). The observations
recorded in this section about the Π^1_n-comprehension schema are due to Heck ([67]
p. 192) and Linnebo ([100] p. 161). The trick of defining the graph of addition
and multiplication in terms of its initial segments in the proof of Theorem 22 is
adapted from Burgess and Hazen [16] pp. 6-10, although their concern there was
not with Frege’s Theorem.
Theorem 22. PA2 is interpretable in the theory consisting of (Q1)-(Q2) and
the comprehension schema (3.3). More generally, Π^1_n-CA0 is interpretable in the
theory consisting of (Q1)-(Q2) and the comprehension schema (3.3) restricted to
Π^1_n-formulas for n > 0.
Proof. Suppose that we are working with a structure ℳ = (M, S1, S2, . . . , 0, s) that
satisfies (Q1)-(Q2) and the comprehension schema (3.3) restricted to Π^1_n-formulas
for n > 0. In what follows, we will refer respectively to the element 0 and the
function s as “zero” and “successor.” It must be shown how to uniformly define
a model of Π^1_n-CA0 within this structure. We say that X in S1 is inductive if it
contains zero and is closed under successor. Let N be the intersection of all the
inductive sets X in S1, which exists in S1 by Π^1_1-comprehension. Note that zero is
in N by construction, and note that N is closed under successor: for, if a is in N
then a is contained in every inductive set X, and by definition of inductive sets,
it follows that the successor of a is contained in every inductive set X, which is
to say that the successor of a is in N .
Hence, we can define the structure 𝒩 = (N, S1 ∩ P(N), S2 ∩ P(N^2), . . . , 0, s)
uniformly within ℳ. This structure then satisfies (Q1)-(Q2) since ℳ satisfies
(Q1)-(Q2). Further, 𝒩 satisfies the Mathematical Induction Axiom (3.2), since
if F ∈ S1 ∩ P(N) contains zero and is closed under successor, then F ∈ S1
contains zero and is closed under successor, and so by definition of N, it follows
that N ⊆ F ⊆ N. For (Q3), let X be the subset of N for which the conclusion
holds, i.e., X = {a ∈ N : a ≠ 0 → ∃ w ∈ N a = sw}. Clearly zero is in X,
and suppose that a ∈ X ⊆ N: then of course sa = sw for some w ∈ N, namely
w = a, and hence sa ∈ X. Hence, by the Mathematical Induction Axiom (3.2),
it follows that X = N. Finally, before turning to the remainder of the axioms
of Robinson’s Q, note that since ℳ satisfies Π^1_n-comprehension, we have that 𝒩
satisfies Π^1_n-comprehension as well, since the second-order parts of 𝒩 are just the
second-order parts of ℳ restricted to subsets of N.
To verify axioms Q4-Q5 of Robinson’s Q, we must first define addition. Let
x + y = z if and only if there is a graph of a partial function G ⊆ N3 such that
(x, y, z) ∈ G ⊆ N3 and
(x, 0, x) ∈ G & [(x, sy, z) ∈ G→ ∃ w sw = z & (x, y, w) ∈ G] (3.39)
That is, we define the graph of addition as the union of its initial segments. Note
that this graph of addition exists by the Π11-Comprehension Schema. Further,
note that addition is well-defined on its domain. Suppose that G0 and G1 are
partial functions which satisfy equation (3.39) and fix an arbitrary x and let
Y = {y ∈ N : ∀ z0, z1 (x, y, z0) ∈ G0 & (x, y, z1) ∈ G1 → z0 = z1}. Clearly, 0 ∈ Y
and if y ∈ Y and (x, sy, z0) ∈ G0 and (x, sy, z1) ∈ G1 then there is w0, w1 such
that sw0 = z0 and sw1 = z1 and (x, y, w0) ∈ G0 and (x, y, w1) ∈ G1. Then since
y ∈ Y we have w0 = w1 and hence z0 = sw0 = sw1 = z1. Hence, in fact, addition
is a well-defined function on its domain. To show that it is a total function, fix
an arbitrary x and let Y = {y ∈ N : ∃ z x+ y = z}. Clearly, 0 ∈ Y , since we can
choose G = {(x, 0, x)}. Suppose that y ∈ Y , say, with (x, y, z) ∈ G. To see that
sy ∈ Y , set G′ = G ∪ {(x, sy, sz)}. Then clearly G′ also satisfies equation (3.39).
Hence, in fact, addition is a total function. Finally, the verification of Q4 and Q5
follows directly from our construction in equation (3.39). To verify Q6-Q7, just
define multiplication analogously.
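The definition of addition by initial segments can be illustrated on the standard natural numbers. The following is a minimal sketch, assuming Python integers as stand-ins for elements of N; the names extend, segment, satisfies_3_39 and add are hypothetical and do not appear in the source. It builds a finite graph G by the step G′ = G ∪ {(x, sy, sz)} used in the totality argument and checks condition (3.39) directly.

```python
def extend(G, x, y):
    """One step of the totality argument: from (x, y, z) in G form G' = G ∪ {(x, sy, sz)}."""
    z = next(z for (a, b, z) in G if (a, b) == (x, y))
    return G | {(x, y + 1, z + 1)}


def segment(x, y):
    """Initial segment of the graph of addition for fixed x, starting from {(x, 0, x)}."""
    G = {(x, 0, x)}
    for k in range(y):
        G = extend(G, x, k)
    return G


def satisfies_3_39(G, x):
    """Check condition (3.39) for a finite set G of triples and a fixed x."""
    if (x, 0, x) not in G:
        return False
    for (a, b, z) in G:
        if a == x and b > 0:
            # b = s(y) with y = b - 1; we need w with s(w) = z and (x, y, w) in G
            if z == 0 or (x, b - 1, z - 1) not in G:
                return False
    return True


def add(x, y):
    """x + y = z iff (x, y, z) lies in some G satisfying (3.39)."""
    G = segment(x, y)
    assert satisfies_3_39(G, x)
    return next(z for (a, b, z) in G if (a, b) == (x, y))
```

On this toy reading, a set such as {(3, 0, 3), (3, 2, 6)} fails the check because the triple (3, 1, 5) required by (3.39) is absent.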
Remark 23. Hence, it remains to show that the theory consisting of (Q1)-(Q2)
and the comprehension schema (3.3) is interpretable in HP2. In preparation for
this result (Theorem 26), we first record some elementary considerations in the
following proposition.
Proposition 24. Suppose that (M,S1, S2, . . . ,#) models AHP0. For X, Y in S1,
define X ≺ Y if and only if there is an injective non-surjective function f : X → Y
such that graph(f) is in S2. Then for a, b ∈ M and X,U,A,B in S1, it follows
that
(i) If a ∉ X and X ∪ {a} ≺ X ∪ {a} then X ≺ X.
(ii) If a ∉ X and U ≺ X ∪ {a} then U ≺ X or #U = #X.
(iii) If a ∈ A, b ∈ B and #A = #B then #(A − {a}) = #(B − {b}).
(iv) If X ≠ ∅ then ∅ ≺ X.
(v) X ⊀ ∅.
Proof. For (i), suppose that f : X ∪ {a} → X ∪ {a} is an injection that is not a
surjection. If f(X) ⊆ X then f ↾ X : X → X is an injection that is not a surjection:
if f(a) = a then the point missed by f lies in X, and if f(a) ∈ X then f ↾ X
misses f(a). Hence X ≺ X. If f(X) ⊈ X then say f(y) = a where y ∈ X and
f(a) = z ∈ X, and hence define g : X → X by g(y) = z and g = f otherwise.
Then g is injective and misses the same point that f does. Further, the graph of g
exists by the arithmetical comprehension schema.
For (ii), suppose that f : U → X∪{a} is an injection which is not a surjection.
If f(U) ⊆ X then #U = #X when f : U → X is a bijection and U ≺ X otherwise.
If f(U) ⊈ X then say f(y) = a and f misses b ∈ X, in which case we define an
injective function g : U → X by g(y) = b and g = f otherwise. The graph
of g exists by the arithmetical comprehension schema. If g is a bijection, then
#U = #X and U ≺ X otherwise.
For (iii), suppose that f : A → B is a bijection. If f(a) = b then f ↾ (A − {a})
is the desired bijection. If f(a) = d for d ≠ b and f(c) = b for c ≠ a, then define a
bijection g : (A − {a}) → (B − {b}) by g(c) = d and g = f otherwise. The graph
of this function g then exists by the arithmetical comprehension schema.
For (iv), note that the “empty” binary relation witnesses that there is an
injective non-surjective function from ∅ to X.
For (v), note that if X ≺ ∅, then there would be an injective non-surjective
function f : X → ∅, which would imply that there was an element in ∅ \ rng(f),
which would imply that there was some element in ∅.
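The rerouting step in the proof of (iii) can be made concrete on finite sets. The following is a minimal sketch, assuming a bijection is represented as a Python dict; the helper names restrict_bijection and is_bijection are hypothetical. It follows the two cases of the proof: either f(a) = b and f is simply restricted, or f(a) = d ≠ b and the point c with f(c) = b is sent to d instead.

```python
def is_bijection(g, domain, codomain):
    """True if the dict g is a bijection from domain onto codomain."""
    values = list(g.values())
    return (set(g.keys()) == set(domain)
            and set(values) == set(codomain)
            and len(set(values)) == len(values))


def restrict_bijection(f, a, b):
    """Given a bijection f: A -> B with a in A and b in B,
    return a bijection g: A - {a} -> B - {b} as in Proposition 24 (iii)."""
    d = f[a]
    # Start from f restricted to A - {a}.
    g = {x: y for x, y in f.items() if x != a}
    if d != b:
        # Case f(a) = d != b: find c != a with f(c) = b and send c to d.
        c = next(x for x, y in f.items() if y == b)
        g[c] = d
    return g
```

For example, with f = {1: 'p', 2: 'q', 3: 'r'}, removing a = 1 and b = 'r' sends 3 to 'p' and leaves 2 mapped to 'q'.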
Remark 25. It is well-known that the chief difficulty in the proof of the following
theorem is establishing the totality of the successor function (cf. remarks to this
effect in Wright [158] p. 161). Prior to looking at the proof, it is helpful to
think about what happens on the standard models (α, P (α), P (α2), . . . ,#) from
§ 3.2.1, where α is an ordinal which is not a cardinal and where # : P (α) → α
is cardinality. It is easy to see that ω is uniformly definable in each of these
structures. Further, it is easy to see that for each n ∈ ω, it follows that
{#W : W ≺ {0, . . . , n}} = {0, . . . , n} (3.40)
where as in the previous proposition, X ≺ Y if and only if there is an injective
non-surjective function f : X → Y . From this we see that
{0, . . . , n} ⊀ {0, . . . , n} & #{0, . . . , n} = #{#W : W ≺ {0, . . . , n}} (3.41)
as well as
s(#{0, . . . , n}) = s(n+ 1) = n+ 2 = #({0, . . . , n} ∪ {n+ 1})
= #({#W : W ≺ {0, . . . , n}} ∪ {#({0, . . . , n})}) (3.42)
The entire idea of the below proof is to show that we can replicate these consid-
erations in arbitrary models of HP2. So in such an arbitrary model, we will define
an analogue N of ω, and for analogues X of {0, . . . , n}, we will find that
s(#X) = #({#W : W ≺ X} ∪ {#X}) (3.43)
This, in any case, is the heuristic explanation of the proof of the totality of the
successor function in the following theorem.
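Equations (3.40) and (3.42) can be checked by brute force on small finite sets. The following is a minimal sketch under a simplifying assumption: instead of an ordinal α that is not a cardinal, it uses the subsets of a small finite universe {0, . . . , m − 1}, with # taken to be cardinality. The function names are hypothetical. The relation ≺ is tested literally, by searching for an injection that misses a point.

```python
from itertools import combinations, permutations


def strictly_smaller(W, X):
    """W ≺ X: some injection W -> X misses a point of X (brute-force search)."""
    W, X = sorted(W), sorted(X)
    # Each tuple from permutations(X, len(W)) is the image of an injection W -> X.
    return any(set(image) != set(X) for image in permutations(X, len(W)))


def finite_subsets(m):
    """All subsets of the toy universe {0, ..., m-1}."""
    for r in range(m + 1):
        for c in combinations(range(m), r):
            yield frozenset(c)


def numbers_below(X, m):
    """The set {#W : W ≺ X}, with W ranging over subsets of {0, ..., m-1}."""
    return {len(W) for W in finite_subsets(m) if strictly_smaller(W, X)}
```

With m = 5 and X = {0, . . . , n} for n < 4, numbers_below(X, m) comes out as {0, . . . , n}, matching (3.40), and adjoining #X yields a set of n + 2 elements, matching (3.42).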
Theorem 26. The theory consisting of (Q1)-(Q2) and the comprehension schema (3.3)
is interpretable in HP2. More generally, the theory consisting of (Q1)-(Q2) and the
comprehension schema (3.3) restricted to Π1n-formulas is interpretable in Π1n − HP0
for n > 0.
Proof. Suppose that we are working with structure M = (M,S1, S2, . . . ,#) that
satisfies Π1n − HP0. It must be shown how to uniformly define a model of (Q1)-(Q2)
and the comprehension schema (3.3) restricted to Π1n-formulas. Define 0 = #∅
and define s(x, y) if and only if there are X, Y in S1 such that #X = x, #Y = y,
and there is b ∈ Y such that #X = #(Y − {b}). That is, s(x, y) says that x, y
are respectively cardinalities of sets X, Y and the cardinality of X is equal to the
cardinality of Y minus one point. Note that the relation s exists in S2 by the Π11-
comprehension schema. In what follows, we will respectively refer to the element
0 and the relation s as “zero” and “successor,” keeping in mind that formally s
is a binary relation. Then say that X in S1 is inductive if it contains zero and is
closed under successors, that is, if x ∈ X and s(x, y) then y ∈ X. Then define
N to be the intersection of all the inductive sets, so that N is in S1 by the Π11-
comprehension schema. Now we show that (i) s is a well-defined function on its
domain and that (ii) s is a total function on N that (iii) maps elements of N to
elements of N and that (iv) satisfies axioms Q1-Q2 on N .
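For reference, the definitions just given can be collected in one display. This is only a restatement of the preceding paragraph in LaTeX; the predicate name \mathrm{Ind} for "inductive" is a label introduced here for convenience and is not the author's notation.

```latex
\begin{align*}
0 &:= \#\emptyset, \\
s(x,y) &:\iff \exists X, Y \in S_1 \,\bigl[\#X = x \;\&\; \#Y = y \;\&\;
          \exists b \in Y \; \#X = \#(Y \setminus \{b\})\bigr], \\
\mathrm{Ind}(X) &:\iff 0 \in X \;\&\; \forall x, y \,\bigl[(x \in X \;\&\; s(x,y)) \to y \in X\bigr], \\
N &:= \bigcap \{ X \in S_1 : \mathrm{Ind}(X) \}.
\end{align*}
```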
For (i), to see that s is well-defined, suppose that s(x, y) and s(x, z). Then
x = #X, y = #Y , z = #Z and there exists b ∈ Y, c ∈ Z such that #X =
#(Y − {b}) = #(Z − {c}). Then there is a bijection f : (Y − {b}) → (Z − {c})
whose graph is in S2. Define f′ : Y → Z by setting f′ ↾ (Y − {b}) = f and f′(b) = c.
Then the graph of f′ is in S2 by the arithmetical comprehension schema. Further,
since f′ : Y → Z is a bijection, it follows that y = #Y = #Z = z. Hence, s is a
well-defined function on its domain.
For (ii), recall from Proposition 24 that for X, Y in S1, we say X ≺ Y if and
only if there is an injective non-surjective function f : X → Y such that graph(f)
is in S2. Then by iterated applications of Π11-comprehension, the following exist
in S2 and S1 respectively
R = {(#W,#X) : W ≺ X} (3.44)
Z = {#X : X ⊀ X & ∃ Y (∀ w w ∈ Y ↔ (w,#X) ∈ R) & #X = #Y } (3.45)
Note that
Z = {#X : X ⊀ X & #X = #({#W : W ≺ X})} (3.46)
(It may be heuristically helpful to compare this with equation (3.41)). Suppose
that #X is in Z. Then X ⊀ X and #X = #({#W : W ≺ X}). Then
s(#X,#({#W : W ≺ X} ∪ {#X})) (3.47)
(Likewise, it may be helpful to compare this with equation (3.43)). Hence, we
have the inclusion Z ⊆ {x : ∃ y s(x, y)}, and so it suffices to show that Z is
inductive.
Clearly, 0 ∈ Z. Suppose that #X is in Z, so that X ⊀ X and #X =
#({#W : W ≺ X}). Then s(#X,#({#W : W ≺ X} ∪ {#X})). Since successor
is well-defined on its domain by part (i), it suffices to show that #({#W : W ≺
X} ∪ {#X}) is in Z. We have {#W : W ≺ X} ⊀ {#W : W ≺ X}, since X ⊀ X
and X is bijective with {#W : W ≺ X}. Since
#X ∉ {#W : W ≺ X}, it follows from Proposition 24 (i) that {#W : W ≺ X} ∪
{#X} ⊀ {#W : W ≺ X} ∪ {#X}. Hence, #({#W : W ≺ X} ∪ {#X}) satisfies
the first conjunct of Z in equation (3.46). To see that #({#W : W ≺ X}∪{#X})
satisfies the second conjunct of Z in equation (3.46), it suffices to show that
{#W : W ≺ X} ∪ {#X} = {#U : U ≺ {#W : W ≺ X} ∪ {#X}} (3.48)
For the left-to-right direction, suppose first that W ≺ X. Since X is bijective
with {#W : W ≺ X}, we have that W ≺ {#W : W ≺ X} ∪ {#X}. Continuing
with the left-to-right direction, suppose that #U = #X. Since X is bijective with
{#W : W ≺ X}, we have that #U = #({#W : W ≺ X}) and hence U ≺ {#W :
W ≺ X} ∪ {#X}. For the right-to-left direction, suppose that U ≺ {#W : W ≺
X} ∪ {#X}. Since #X ∉ {#W : W ≺ X}, we have by Proposition 24 (ii) that
#U = #({#W : W ≺ X}) = #X or U ≺ {#W : W ≺ X}. Hence, in fact
equation (3.48) holds. It follows that #({#W : W ≺ X}∪{#X}) is in Z. Hence,
Z is an inductive set, and as mentioned at the close of the above paragraph, it
thus follows that successor is a total function on N .
(iii) Now we show that successor maps elements of N to elements of N . Sup-
pose that a is in N . Then by definition, a is contained in every inductive set, and
by parts (i)-(ii), it follows that there is unique b such that s(a, b), from which it
follows that b is contained in every inductive set, so that b is contained in N as
well. Hence, successor maps elements of N to elements of N .
(iv) Finally, we note that the successor function s satisfies axioms (Q1)-(Q2).
To see that it satisfies (Q1), note that if s#X = 0 = #∅, then ∅ would be
bijective with a non-empty set, which is a contradiction. To see that it satisfies
(Q2), suppose that s#X = s#Y . Then s#X = #A where #X = #(A − {a})
for some a ∈ A and s#Y = #B where #Y = #(B − {b}) for some b ∈ B. Then
Proposition 24 (iii) implies that #X = #(A− {a}) = #(B − {b}) = #Y .
Putting this all together, we can uniformly define the structure N = (N,S1 ∩
P (N), S2 ∩ P (N2), . . . , 0, s) which satisfies (Q1)-(Q2). Finally, note that since M
satisfies Π1n-comprehension, we have that N satisfies Π1n-comprehension as well,
since the second-order parts of N are just the second-order parts of M restricted
to subsets of N .
Corollary 27. PA2 is interpretable in HP2. More generally, Π1n − CA0 is inter-
pretable in Π1n − HP0 for n > 0.
Proof. This follows immediately from Theorem 26 and Theorem 22.
Remark 28. The following theorem was first noted by Boolos ([10]). We include
it here for the sake of having a relatively self-contained presentation of the main
results in this area, and because we will use Boolos’ construction to transfer facts
about the provability relation from subsystems of PA2 to subsystems of HP2 (cf.
the proofs of Proposition 53 and Proposition 55).
Theorem 29. HP2 is interpretable in PA2. More generally, Π1n − HP0 is interpretable
in Π1n − CA0 for n > 0, and Σ11 − PH0 is interpretable in Σ11 − AC0 and AHP0 is inter-
pretable in ACA0.
Proof. We begin with the proof of the interpretability of AHP0 in ACA0. We will
note how this proof yields all the other results as well. Let us work in a model
M = (M,S1, S2, . . . ,⊕,⊗) of ACA0, where Sn ⊆ P (Mn). We must show how to
uniformly define a model of AHP0. Consider the model N = (M,S1, S2, . . . ,#)
where #(X) = n + 1 if |X| = n, and where #(X) = 0 if X is infinite. Then
N is clearly definable in M since the graph of # is arithmetically definable.
Further, since this graph is arithmetically definable, it follows that N satisfies
the arithmetical comprehension schema. Further, by Simpson [138] Lemma II.3.6
p. 70, ACA0 proves that any two infinite sets are bijective, so that N is a model
of AHP0. Hence, in fact we have that AHP0 is interpretable in ACA0. Further,
it is obvious from this construction that N will satisfy whatever comprehension
schemas M satisfies.
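The numbering used in this proof is easy to illustrate on a restricted class of sets. The following is a minimal sketch, assuming sets of natural numbers are represented only as finite sets or complements of finite sets; the NatSet type and the function names are hypothetical. The equinumerosity test for infinite sets relies on the cited fact that any two infinite sets of natural numbers are bijective.

```python
from typing import FrozenSet, NamedTuple


class NatSet(NamedTuple):
    """A finite set of naturals, or (if cofinite is True) the complement of one in ω."""
    elems: FrozenSet[int]
    cofinite: bool = False


def boolos_number(X):
    """#(X) = n + 1 if |X| = n, and #(X) = 0 if X is infinite."""
    return 0 if X.cofinite else len(X.elems) + 1


def equinumerous(X, Y):
    """Bijectivity for this toy class: both infinite, or both finite of equal size."""
    if X.cofinite or Y.cofinite:
        return X.cofinite and Y.cofinite
    return len(X.elems) == len(Y.elems)
```

On any sample of such sets, boolos_number(X) == boolos_number(Y) holds exactly when equinumerous(X, Y) does, which is the instance of Hume's Principle being verified in the proof.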
Corollary 30. PA2 is mutually interpretable with HP2. More generally, Π1n − CA0
is mutually interpretable with Π1n − HP0 for n > 0.
Proof. This follows immediately from Corollary 27 and Theorem 29.
Remark 31. The notion of faithful interpretability is a modification of inter-
pretability in the following respect: whereas the interpretability of one theory in
another only requires that translations of theorems of the interpreted theory are
theorems of the interpreting theory, faithful interpretability additionally requires
that translations of non-theorems of the interpreted theory are non-theorems of
the interpreting theory (cf. Lindström [99] § 6.2 pp. 106 ff). It is not difficult to
see, using the ideas from the proof of Corollary 27 and Theorem 29, that PA2 is
faithfully interpretable in HP2. However, the converse is not obvious, that is, it is
not obvious whether or not HP2 is faithfully interpretable in PA2 (cf. Question 113).
In § 3.2.1, and in particular at Remark 18, we noted that there are numerous nat-
ural sentences that are independent of HP2, whereas there are comparatively few
natural sentences which are known to be independent of PA2. This leads one to
suspect that HP2 is not faithfully interpretable in PA2 or that any such faithful in-
terpretation is comparatively unnatural, since such a faithful interpretation would
allow us to turn all the independent sentences of HP2 into independent sentences
of PA2.
3.3 Standard Models of Subsystems of BL2 and Associated Results
The primary goal of this section is to study models of subsystems of BL2 that
are standard in the sense that they have the form (ω, S1, S2, . . . , ∂), where the sets
Sn ⊆ P (ωn) all come from some antecedently fixed computational class (e.g. the
recursive sets, the arithmetical sets, the hyperarithmetical sets, etc.). The main
result of this section is Theorem 60 which gives a construction of a standard model
of the hyperarithmetic subsystem of BL0 in terms of the hyperarithmetic subsets
of natural numbers. Further, this construction isolates a certain sentence Inf (cf.
Definition 58) such that Σ11 − AC0 ≤I Σ11 − LB0 + Inf <I Π11 − CA0 (cf. Corollary 61
and Figure 3.2).
In the preliminary section § 3.3.1, we record some elementary facts about arbi-
trary models of subsystems of BL2, focusing in particular on the fact that arbitrary
models of the hyperarithmetic subsystems of BL2 require the existence of injective
non-surjective functions (cf. Proposition 38). Such functions are important both
because they are used to define the sentence Inf (cf. Definition 58) and because
such functions are not required to exist by the hyperarithmetic subsystems of HP2
(cf. Remark 37). Further, in the preliminary section § 3.3.2, we review some
elementary facts about hyperarithmetic theory, which we will employ in § 3.3.3.
We also use these facts to fill in some parts of the provability relation (cf. Propo-
sitions 47-53 and Figure 3.1). Finally, in § 3.3.3, we turn to the main results of
this section, namely the aforementioned Theorem 60 and Corollary 61.
3.3.1 Generalities on Models of Subsystems of BL2
Proposition 32. Suppose that Y ⊆ M is definable with parameters by an
arithmetical formula in the structure (M,S1, S2, . . . , ∂) (resp. in the structure
(M,S1, S2, . . . ,#)). Then Y is definable with parameters by an arithmetical for-
mula that does not contain any instances of ∂ (resp. does not contain any instances
of #).
Proof. If Y ⊆ M is definable in (M,S1, S2, . . . , ∂) by an arithmetical formula ϕ,
and if ∂(P ) appears in ϕ, then P is not free in ϕ but rather is a parameter from
S1 and hence a = ∂(P ) is a parameter from M . So, replacing parameters from
S1 with parameters from M , it follows that the set Y is also definable by an
arithmetical formula that does not contain any instances of ∂.
Proposition 33. Suppose that M is a structure and ∂ : D(M) → M is an injection,
where D(Mn) is the collection of definable subsets of Mn. Then (M,D(M), D(M2), . . . , ∂)
is a model of ABL0.
Proof. It is a model of Basic Law V since ∂ is an injection (cf. discussion subse-
quent to (3.6)). Further, it satisfies the arithmetical comprehension schema, since
if X ⊆ M is defined by an arithmetical formula, then by Proposition 32 it is de-
fined by an arithmetical formula which does not include any instances of ∂. Hence,
since D(M) is closed under arithmetical comprehension, it follows that X is in
D(M), so that the structure (M,D(M), D(M2), . . . , ∂) satisfies the arithmetical
comprehension schema.
Proposition 34. Suppose that (M,S1, S2, . . . , ∂) is a model of ∆11 − BL0. (a)
Then there is an injective function s : M → M such that s(x) = ∂({x}) and such
that graph(s) is in S2. (b) Further, there is a function s : Mn → M such that
s(x1, . . . , xn) = ∂({x1, . . . , xn}) and such that graph(s) is in Sn+1.
Proof. The proof of (b) is identical to the proof of (a), so we present only the proof
of (a). It suffices to show three things: first, that the graph of this function is ∆11,
second that this function is well-defined and total, and third that the function is
injective. Note that the following Σ11 and Π11-definitions of s(x) = y agree:
[∃ X (∀ z z ∈ X ↔ z = x) & ∂X = y] ⇐⇒ [∀ Y (∀ z z ∈ Y ↔ z = x) → ∂Y = y] (3.49)
Suppose that the left-hand-side of this equation holds and that Y = {x}. Then
Y = X and hence ∂(Y ) = ∂(X) = y. Conversely, suppose that the right-hand-
side of this equation holds. By arithmetical comprehension, form the set X =
{x}. Then by the right-hand-side it is the case that ∂(X) = y. Hence, by
∆11-comprehension, there is an s such that s(x, y) if and only if both the left-
hand-side and the right-hand-side of the above equation holds with respect to x
and y. To see that the function is well-defined, suppose that the left-hand-side
holds both of x and y and of x and z. By arithmetical comprehension, form the
set Y = {x}. Then the right-hand-side implies that y = ∂(Y ) = z. Hence, the
function is well-defined. Further, it is everywhere defined because given x one
can use arithmetical comprehension to form X = {x}, and hence x and ∂(X)
will satisfy the right-hand-side. Finally, to see that the function s is injective,
suppose that s(x) = s(y). Then ∂({x}) = ∂({y}). By Basic Law V, it follows
that {x} = {y} and hence that x = y.
Remark 35. The following proposition generalizes the construction in the Russell
Paradox (cf. Proposition (10)). Note that in the following proposition, the term
rng∂ is employed to designate the range of the function ∂. However, this set need
not exist in the second-order parts of any of the models under consideration, even
though it is defined by a Σ11-formula in these models.
Proposition 36. Suppose that (M,S1, S2, . . . , ∂) is a model of ∆11 − BL0. For every
A in S1 such that A ⊆ rng∂, there is B in S1 such that B ⊆ A and ∂B ∈ rng∂−A.
Proof. First we claim that for all x it is the case that
[∃ X x ∈ A & ∂X = x & x ∉ X] ⇐⇒ [∀ Y x ∈ A & (∂Y = x → x ∉ Y )] (3.50)
Suppose that the left-hand-side holds, i.e., suppose that x ∈ A & ∂X = x & x ∉
X, and further suppose that Y is such that ∂Y = x. Then ∂X = x = ∂Y and
Basic Law V implies that X = Y , so that x ∉ Y . Conversely, suppose that the
right-hand-side holds, i.e., suppose it is the case that ∀ Y x ∈ A & (∂Y = x → x ∉ Y ). Since
x ∈ A ⊆ rng∂, there is X such that ∂X = x, and hence x ∉ X. The claim
is proved, and, hence, by the ∆11-Comprehension Schema, there exists B such
that x ∈ B if and only if both the left-hand-side and right-hand-side of (3.50)
hold with respect to x. Note that it follows automatically from the left-hand-side
that B ⊆ A. So it remains to show that ∂B ∈ rng∂ − A. Suppose not. Then
∂B ∈ rng∂ ∩ A. Then either ∂B ∈ B or ∂B ∉ B. If ∂B ∈ B then by the
right-hand-side we have ∂B ∉ B, which is a contradiction. If ∂B ∉ B, then by the
failure of the left-hand-side we have that ∀ X ∂B ∉ A ∨ ∂X ≠ ∂B ∨ ∂B ∈ X. Applying this
to X = B we have that ∂B ∉ A ∨ ∂B ≠ ∂B ∨ ∂B ∈ B. Since by hypothesis we
have that ∂B ∈ rng∂∩A, we must conclude that ∂B ∈ B, which again contradicts
our supposition. Hence, in fact, ∂B ∈ rng∂ − A.
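The diagonal set B in this proof can be computed explicitly in a finite toy setting. The following is a minimal sketch, assuming the inverse of the injection ∂ is given as a Python dict ext from codes to sets; the names ext and diagonal_subset are hypothetical, and no comprehension schema is modeled. It shows the combinatorial core of the argument: B differs from the extension of every x in A at the point x, so any code of B must lie outside A.

```python
def diagonal_subset(ext, A):
    """B = {x in A : x not in X, where X is the set with ∂X = x}.

    ext maps each code x in rng ∂ to the unique set X with ∂X = x.
    """
    return frozenset(x for x in A if x not in ext[x])


# Toy data: an injective assignment of codes to five subsets of {0, ..., 5}.
ext = {
    0: frozenset(),
    1: frozenset({0, 1}),
    2: frozenset({1}),
    3: frozenset({2, 3}),
    4: frozenset({0, 2}),
}
A = frozenset({0, 1, 2, 3})  # a subset of rng ∂ = {0, 1, 2, 3, 4}
B = diagonal_subset(ext, A)
```

In this example B = {0, 2}, which is the extension of the code 4, and 4 lies in rng∂ − A as the proposition predicts.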
Remark 37. The following corollary is important because it shows that satisfying
∆11 − BL0 requires the existence of injective non-surjective functions. As we note
in Proposition 39 and later in Corollary 80, this is not the case with ABL0 and
∆11 − HP0.
Corollary 38. Suppose that (M,S1, S2, . . . , ∂) is a model of ∆11 − BL0. Then there
is an injective non-surjective function s : M → M such that graph(s) is in S2 and
such that s(x) = ∂({x}).
Proof. By Proposition 34 there is an injective function s : M → M such that
rng(s) ⊆ rng∂ and such that graph(s) is in S2 and such that s(x) = ∂({x}). By
Proposition 36, there is B in S1 such that B ⊆ rng(s) and ∂B ∈ rng∂ − rng(s).
Hence, s : M →M is not surjective.
Proposition 39. There is a structure (M,S1, S2, . . .) such that
(i) For any injection ∂ : S1 → M it is the case that (M,S1, S2, . . . , ∂) mod-
els ABL0.
(ii) There is no injection ∂ : S1 → M such that (M,S1, S2, . . . , ∂) models
∆11 − BL0.
Proof. Let M be an algebraically closed field (cf. Marker [107] Example 4.3.10
p. 140) and let Sn = D(Mn), i.e. the definable subsets of Mn. Suppose that
s : M → M was an injective non-surjective function whose graph was in S2 = D(M2).
Then this implies that there is a definable injective non-surjective function s :
M →M , which contradicts Ax’s Theorem (cf. Theorem 72). For (i), note that by
Proposition 33, the structure (M,S1, S2, . . . , ∂) is a model of ABL0 for any injection
∂ : D(M) → M . For (ii), note that if there was such an injection ∂ : S1 → M , then
by Corollary 38, there would be an injective non-surjective s : M →M such that
graph(s) is in S2, which is a contradiction.
3.3.2 Hyperarithmetic Theory and Related Results
Definition 40. Suppose that X, Y ∈ 2ω. Then X ≤T Y if X is Turing computable
from Y or if X is ∆0,Y1 . Further, X ≤a Y if X is arithmetical in Y or if there
is n > 0 such that X is ∆0,Yn . Finally, X ≤h Y if X is hyperarithmetic in Y or
if X is ∆1,Y1 . (For computational definitions of these reducibilities and proofs that
they correspond with the relevant definability notion, see respectively Soare [139]
p. 64, Odifreddi [118] p. 375, Sacks [134] p. 44.)
Definition 41. Suppose that Y ∈ 2ω. Then define
REC(Y ) = {X ∈ 2ω : X ≤T Y } (3.51)
ARITH(Y ) = {X ∈ 2ω : X ≤a Y } (3.52)
HYP(Y ) = {X ∈ 2ω : X ≤h Y } (3.53)
Further, let REC = REC(∅) and ARITH = ARITH(∅) and HYP = HYP(∅) (cf.
Simpson [138] Remark I.7.5. p. 25, Example I.11.2 p. 39).
Remark 42. Recall that structures in the language of HP2 and BL2 have the form
(M,S1, S2, . . . ,#), where Sn ⊆ P (Mn) and # : S1 → M (cf. equation (3.4)).
If # : HYP(Y ) → ω, then (ω,HYP(Y ),#) will be used as an abbreviation for
the structure (ω, S1, S2, . . . ,#), where Sn ⊆ P (ωn) is the set of n-ary relations
whose graph is in HYP(Y ). Similarly, in what follows, we will sometimes use the
abbreviations (ω,REC(Y ),#) and (ω,ARITH(Y ),#).
Proposition 43. The relation X ≤h Y is Π11.
Proof. See Sacks [134] p. 45.
Theorem 44. (Kleene’s Theorem on Restricted Quantification) Suppose that
ϕ(X, Y ) is a Π11 predicate. Then ∃ X ≤h Y ϕ(X, Y ) is a Π11-predicate. Moreover,
this is provable in Π11 − CA0.
Proof. See Kleene [92] and Moschovakis [113] Theorem 4D.3 p. 220. That this
theorem is provable in Π11 − CA0 was noted by Simpson [138] VIII.3.20 p. 330.
Theorem 45. (Spector-Gandy Theorem) Suppose that ϕ(Y ) is a Π11-predicate.
Then there is an arithmetic predicate ψ(X, Y ) such that ϕ(Y ) ↔ ∃ X ≤h
Y ψ(X, Y ).
Proof. See Spector and Gandy ([140], [49]), Sacks [134] Theorem III.3.5 p. 61 and
Exercise III.3.13 p. 62.
Remark 46. The following proposition is non-trivial only because the second-
order quantifiers must be evaluated with respect to the second-order part S1 ⊆
P (ω) of the structure (ω, S1) and not with respect to P (ω) itself. For instance,
one cannot infer that (ω,HYP(Y )) |= ¬Π11 − CA0 simply from the fact that OY is
Π11 but not Σ11, since to say this is merely to say that OY is Π11-definable but not
Σ11-definable on the structure (ω, P (ω)).
Proposition 47. Suppose that Y ∈ 2ω. Then (ω,ARITH(Y )) |= ACA0+¬∆11 − CA0
and (ω,HYP(Y )) |= Σ11 − AC0 + ¬Π11 − CA0.
Proof. For the fact that (ω,ARITH(Y )) |= ACA0, see Simpson [138] Theorem VIII.1.13
p. 313. Suppose that (ω,ARITH(Y )) |= ∆11 − CA0. But note that
(n,m) ∈ Y^(ω) ⇐⇒ ∃ X ∈ ARITH(Y ) [X = ⊕_{i=1}^{n} Y^(i) & (n,m) ∈ X]
⇐⇒ ∀ X ∈ ARITH(Y ) [X = ⊕_{i=1}^{n} Y^(i) → (n,m) ∈ X] (3.54)
and hence Y^(ω) ∈ ARITH(Y ), which would contradict Tarski’s Theorem on Truth.
Hence, in fact (ω,ARITH(Y )) |= ¬∆11 − CA0. For the fact that (ω,HYP(Y )) |=
Σ11 − AC0, see Simpson [138] Theorem VIII.4.5 p. 334 and Theorem VIII.4.8 p. 335.
This proof uses Kleene’s Theorem on Restricted Quantification (Theorem 44), and below in
Theorem 60 we will emulate this proof in the setting of BL2. Suppose for the
sake of contradiction that (ω,HYP(Y )) |= Π11 − CA0. Since OY is Π1,Y1 , by the
Spector-Gandy Theorem (45), there is an arithmetic predicate ψ(n,X, Y ) such
that n ∈ OY ⇐⇒ ∃ X ≤h Y ψ(n,X, Y ). Then OY is Σ11-definable on (ω,HYP(Y ))
and hence exists in HYP(Y ) by Π11 − CA0, which contradicts that OY is not in
HYP(Y ).
Corollary 48. Suppose that there is a Π11-formula θ(X, Y, Z) such that for all
Z ∈ 2ω the set GZ = {(X, Y ) ∈ 2ω × 2ω : θ(X, Y, Z)} is the graph of a function
gZ : HYP(Z) → HYP(Z). Then the graph GZ of gZ is Σ11-definable in the
structure (ω,HYP(Z)) uniformly in Z.
Proof. Note that since gZ : HYP(Z)→ HYP(Z), we have that for all X, Y, Z ∈ 2ω
θ(X, Y, Z) =⇒ X ⊕ Y ≤h Z (3.55)
By the Spector-Gandy Theorem (45), there is an arithmetical predicate ψ(X, Y, Z,W )
such that for all X, Y, Z ∈ 2ω
θ(X, Y, Z)⇐⇒ ∃ W ≤h X ⊕ Y ⊕ Z ψ(X, Y, Z,W ) (3.56)
Putting the two previous equations together, we have that for all X, Y, Z ∈ 2ω
θ(X, Y, Z)⇐⇒ ∃ W ≤h Z ψ(X, Y, Z,W ) (3.57)
Then for all X, Y, Z ∈ 2ω
gZ(X) = Y ⇐⇒ (ω,HYP(Z)) |= ∃ W ψ(X, Y, Z,W ) (3.58)
Hence, in fact the graph GZ of gZ is Σ11-definable in the structure (ω,HYP(Z))
uniformly in Z.
Theorem 49. (Kondo’s Uniformization Theorem) Suppose that ϕ(X, Y ) is a Π11
predicate. Then there is a Π11-predicate ϕ′(X, Y ) such that
∀ X, Y [ϕ′(X, Y )→ ϕ(X, Y )] (3.59)
∀ X [∃ Y ϕ(X, Y )]→ [∃!Y ϕ′(X, Y )] (3.60)
Moreover, this is provable in Π11 − CA0.
Proof. See Moschovakis [113] pp. 235-236. Simpson notes that Kondo’s theorem
is provable in Π11 − CA0 (cf. [138] Theorem VI.2.6 p. 225).
Remark 50. The following two propositions use some of the preceding material
to fill in some information about the provability relation (cf. Figure 3.1).
Proposition 51. There are models of ABL0 + ¬∆11 − BL0.
Proof. Choose any injection ∂ : ARITH→ ω. Then by Proposition 33 the struc-
ture (ω,ARITH, ∂) is a model of ABL0. Further, since the graphs of addition and
multiplication are in ARITH, if (ω,ARITH, ∂) |= ∆11 − BL0, then one would have
that ∅(ω) ∈ ARITH (cf. equation (3.54)), which would contradict Tarski’s theorem
on truth.
Remark 52. The construction in the following proposition is the same construc-
tion as Boolos used to prove the interpretability of HP2 in PA2 (cf. the proof of
Theorem 29).
Proposition 53. There are models of AHP0 +¬∆11 − HP0 and Σ11 − PH0 +¬Π11 − HP0
and ∆11 − HP0 + ¬Σ11 − PH0.
Proof. Define a function # : ARITH → ω by #X = 0 if X is infinite and
#X = |X| + 1 if X is finite. By Simpson [138] Lemma II.3.6 p. 70, ACA0 proves
that any two infinite sets are bijective, and hence (ω,ARITH,#) is a model of
Hume’s Principle. Further, it satisfies the arithmetical comprehension schema,
since if X ⊆ ω is defined by an arithmetical formula, then by Proposition 32 it
is defined by an arithmetical formula that does not include any instances of #.
Hence, since ARITH is closed under arithmetical comprehension, it follows that X
is in ARITH, so that the structure (ω,ARITH,#) satisfies the arithmetical com-
prehension schema. Since ∅(ω) /∈ ARITH but ∅(ω) is ∆11-definable over ARITH us-
ing the graphs of addition and multiplication as parameters (cf. equation (3.54)),
we have that (ω,ARITH,#) is a model of AHP0 + ¬∆11 − HP0. Similarly, using the
fact that the graph of # is arithmetical, we can argue that (ω,HYP,#) is a model
of Σ11 − PH0 + ¬Π11 − HP0. Likewise, Steel constructs a sequence of reals Gn such
that (ω, ⋃_{n=1}^{∞} HYP(G1 ⊕ · · · ⊕ Gn)) is a model of ∆11 − CA0 + ¬Σ11 − AC0 ([141] Theorem 4
pp. 68 ff), and we can argue as before that (ω, ⋃_{n=1}^{∞} HYP(G1 ⊕ · · · ⊕ Gn), #) is a model
of ∆11 − HP0 + ¬Σ11 − PH0.
Remark 54. The following two propositions use elementary considerations about
arithmetical sets (cf. Definition 41) to record some observations about natural
functions whose existence cannot be proven in ABL0 or AHP0. For the motivation
for these propositions, see § 2.2, and in particular around equation (3.20). The only
reason for including these propositions here is that it seemed prudent to delay their
proof until the arithmetical sets had been introduced, which we did earlier in this
section (cf. Definition 41). Note that the construction in the following proposition
is analogous to the construction used by Boolos to prove the interpretability of
HP2 in PA2 (cf. the proof of Theorem 29).
Proposition 55. There is a structure M and a function # : D(M)→M , where
D(Mn) is the definable subsets of Mn, such that (M,D(M), D(M2), . . . ,#) is a
model of AHP0, and further there is binary relation R in D(M2) such that the set
{(n,m) : #(Rn) = m} does not exist in D(M2), where Rn = {x : Rnx}.
Proof. Let M be the standard model of first-order arithmetic (ω,+,×) so that
D(M) are the arithmetical sets ARITH. Choose a real Z ∉ ARITH, such as
∅(ω), and enumerate Z as z0, z1, z2, . . .. Define the function # : ARITH → ω by
#(X) = zn if X is finite and |X| = n and define #(X) = z∞ for some z∞ ∉ Z if X
is infinite. This structure satisfies arithmetical comprehension, since if X ⊆M is
defined by an arithmetical formula, then by Proposition 32 it is defined by an arith-
metical formula which does not include any instances of #. Hence, since D(M)
is closed under arithmetical comprehension, it follows that X is in D(M), so that
the structure (M,D(M), D(M2), . . . ,#) satisfies the arithmetical comprehension
schema. Further, by Simpson [138] Lemma II.3.6 p. 70, ACA0 proves that any two
infinite sets are bijective, and hence the structure (M,D(M), D(M2), . . . ,#) is a
model of Hume’s Principle. Hence, (M,D(M), D(M2), . . . ,#) is a model of AHP0.
Consider now the set R = {(n,m) : m < n}, which is clearly arithmetical and so
exists in D(M2). Then Rn = {x : Rnx} = {0, . . . , n− 1} and #(Rn) = zn. Then
the set
{(n,m) : #(Rn) = m} = {(n,m) : zn = m} (3.61)
is equal to the graph of n ↦ zn, which is not arithmetical: for, if it were arithmeti-
cal, then its range Z would be arithmetical, which contradicts the hypothesis
on Z.
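The shape of this construction can be sketched with a computable stand-in for Z. The following is only an illustration under a loud assumption: the set Z in the proof is non-arithmetical (such as ∅(ω)) and so cannot be enumerated by a program, whereas the stand-in below takes Z to be the positive squares, which is arithmetical, so the non-definability conclusion does not apply to it. All names are hypothetical. What the sketch does show is that {(n,m) : #(Rn) = m} is exactly the graph of n ↦ zn, whose range is Z.

```python
def z(n):
    """Stand-in enumeration z_0, z_1, ... of Z (ASSUMPTION: Z = positive squares)."""
    return (n + 1) ** 2


Z_INFINITY = 0  # a value outside the stand-in Z, assigned to infinite sets


def number(X):
    """#(X) = z_n if X is finite with |X| = n; X = None stands for an infinite set."""
    return Z_INFINITY if X is None else z(len(X))


def row(n):
    """R_n = {x : R(n, x)} for R = {(n, m) : m < n}, i.e. {0, ..., n-1}."""
    return frozenset(range(n))


def graph(bound):
    """Finite part of {(n, m) : #(R_n) = m} for n below bound."""
    return {(n, number(row(n))) for n in range(bound)}
```

Reading off the second coordinates of graph(k) recovers z0, . . . , zk−1, which is why an arithmetical definition of this graph would yield an arithmetical definition of Z.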
Proposition 56. There is a structure M and an injection ∂ : D(M)→M , where
D(Mn) is the definable subsets of Mn, such that (M,D(M), D(M2), . . . , ∂) is a
model of ABL0, and further there is binary relation R in D(M2) such that the set
{(n,m) : ∂(Rn) = m} does not exist in D(M2), where Rn = {x : Rnx}.
Proof. Let M be the standard model of first-order arithmetic (ω,+,×) so that
D(M) are the arithmetical sets ARITH. Choose a real Z ∉ ARITH, such as
∅(ω), and enumerate Z as z0, z1, z2, . . .. Choose an injection ∂ : ARITH → ω
such that ∂({n}) = zn, which we can do since Z is coinfinite (since it is not
arithmetical). Then by Proposition 33, the structure (M,D(M), D(M2), . . . , ∂)
is a model of ABL0. Consider now the diagonal R = {(n,m) : n = m} which is
clearly arithmetical and so exists in D(M2). Then Rn = {x : Rnx} = {n} and
∂(Rn) = ∂({n}) = zn. Then the set
{(n,m) : ∂(Rn) = m} = {(n,m) : zn = m} (3.62)
is equal to the graph of n ↦ zn, which is not arithmetical: for, if it were arithmeti-
cal, then its range Z would be arithmetical, which contradicts the hypothesis
on Z.
3.3.3 Standard Models of the Hyperarithmetic Subsystems of BL2
Remark 57. Recall from Proposition 34 that $\Delta^1_1\text{-}\mathsf{BL}_0$ proves the existence of the graph of an injective function $s : M \to M$ such that $s(x) = \partial(\{x\})$. This function is mentioned in the following axiom.
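Definition 58. The following sentence Inf is a sentence in the signature of BL2:
\[
\begin{aligned}
\mathrm{Inf} \equiv{} & \exists\, s : M \to M \; [\forall x \; s(x) = \partial(\{x\})] \;\&\; \\
& \exists N \; [\partial(\emptyset) \in N \;\&\; \forall x \; x \in N \to sx \in N] \\
& \&\; \forall N' \; [\partial(\emptyset) \in N' \;\&\; \forall x \; x \in N' \to sx \in N'] \to N \subseteq N' \\
& \&\; \exists\, \oplus : N^2 \to N \;\; \exists\, \otimes : N^2 \to N \;\; \exists \preceq\, \subseteq N^2 \\
& \quad [(N, \partial(\emptyset), s, \oplus, \otimes, \preceq) \models (Q1)\text{-}(Q8)]
\end{aligned}
\tag{3.63}
\]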
Intuitively, Inf says that there is a smallest set $N$ which contains the zero element $\partial(\emptyset)$ and which is closed under the successor function $s(x) = \partial(\{x\})$ and which has addition and multiplication functions $\oplus$ and $\otimes$ and an ordering relation $\preceq$ which satisfy the eight axioms of Robinson's Q.
Remark 59. The following theorem and its corollary are the main results of § 3.3. Recall that the Russell paradox showed that BL0 and $\Pi^1_1\text{-}\mathsf{BL}_0$ are inconsistent (cf. Proposition (10)). Recently Ferreira and Wehmeier ([40]) showed that $\Delta^1_1\text{-}\mathsf{BL}_0$ is consistent, using Barwise and Schlipf's recursively-saturated model construction. In § 3.4.1, we present a generalization of this construction (cf. Theorem 70), which we apply to $\Delta^1_1\text{-}\mathsf{BL}_0$ and $\Delta^1_1\text{-}\mathsf{HP}_0$ (cf. Proposition 83, Corollary 99, Theorem 108, and Remark 109). However, the recursively-saturated model construction does not provide one with natural models, simply because most natural structures are not recursively saturated (unless of course they are saturated tout court). Hence, this raises the question of whether there are natural models of $\Delta^1_1\text{-}\mathsf{BL}_0$. The following theorem constructs a model of $\Delta^1_1\text{-}\mathsf{BL}_0$ which is mutually interpretable with the minimal ω-model of $\Delta^1_1\text{-}\mathsf{CA}_0$, namely, the model whose second-order part consists of the hyperarithmetic sets.
Theorem 60. For any real $Y \in 2^\omega$, there is a map $\partial_Y : \mathrm{HYP}(Y) \to \omega$ with $\Pi^{1,Y}_1$-graph such that (i) the structure $M_Y = (\omega, \mathrm{HYP}(Y), \partial_Y)$ is a model of (a) $\Sigma^1_1\text{-}\mathsf{LB}_0$ and (b) the sentence Inf, and such that (ii) the structures $M_Y = (\omega, \mathrm{HYP}(Y), \partial_Y)$ and $(\omega, 0, S, +, \times, \leq, \mathrm{HYP}(Y))$ are mutually interpretable uniformly in $Y$, in the following sense:
(a) The map ∂Y : HYP(Y ) → ω is definable in (ω,HYP(Y ), 0, s,+,×,≤) uni-
formly in Y .
(b) An isomorphic copy HY of the structure (ω,HYP(Y ), 0, s,+,×,≤) is defin-
able in the structure MY = (ω,HYP(Y ), ∂Y ) uniformly in Y .
Moreover, all these facts are provable in Π11 − CA0.
Proof. Define $P(Y \oplus X, n)$ iff $X \in \mathrm{HYP}(Y)$ and $n = \langle a, e \rangle$ is a hyperarithmetical-in-$Y$ index of $X$:
\[ P(Y \oplus X, \langle a, e \rangle) \equiv X \in \mathrm{HYP}(Y) \;\&\; a \in \mathcal{O}^Y \;\&\; X = \{e\}^{H^Y_a} \tag{3.64} \]
Since the relation $X \in \mathrm{HYP}(Y)$ is $\Pi^1_1$ and membership in $H^Y_a$ is $\Delta^{1,Y}_1$ for $a \in \mathcal{O}^Y$, we have that $P(Y \oplus X, n)$ is a $\Pi^1_1$-predicate. By Kondo uniformization (Theorem 49), there is a $\Pi^1_1$-uniformization $P'$ of $P$. For $Y \in 2^\omega$, define $\partial_Y(X) = n$ if and only if $P'(Y \oplus X, n)$. Since $\partial_Y(X) = n$ implies that $n$ is a hyperarithmetical-in-$Y$ index of $X$, we have that $\partial_Y : \mathrm{HYP}(Y) \to \omega$ is an injection and hence $M_Y = (\omega, \mathrm{HYP}(Y), \partial_Y)$ is a model of Basic Law V. Note that since $\partial_Y : \mathrm{HYP}(Y) \to \omega$ has a $\Pi^{1,Y}_1$-graph, the Corollary to the Spector-Gandy Theorem (cf. Corollary 48) implies that $\partial_Y : \mathrm{HYP}(Y) \to \omega$ is definable in the structure $(\omega, \mathrm{HYP}(Y), 0, S, +, \times, \leq)$, and this establishes (ii)(a).
To establish (i)(a), note that since $\partial_Y : \mathrm{HYP}(Y) \to \omega$ is an injection, it follows that $M_Y = (\omega, \mathrm{HYP}(Y), \partial_Y)$ is a model of $\mathsf{ABL}_0$ (as in the proof of Proposition 33). To see that it also models the $\Sigma^1_1$-choice schema (3.18), suppose that $M_Y \models \forall z \, \exists X \, \varphi(z, X, \partial_Y(X))$, where $\varphi$ is an arithmetical formula. (The proof for the case where $z$ is replaced by a tuple $\overline{z}$, or where there are multiple existential set quantifiers and multiple existential relation quantifiers, or where there are parameters from the model present in $\varphi$, is exactly similar.) Then $M_Y \models \forall z \, \exists X \, \exists e \, [\partial_Y(X) = e \wedge \varphi(z, X, e)]$. Define a relation $Q(Y \oplus \{z\}, X)$ as follows:
\[ Q(Y \oplus \{z\}, X) \iff X \in \mathrm{HYP}(Y) \;\&\; \exists e \, [\partial_Y(X) = e \wedge \varphi(z, X, e)] \tag{3.65} \]
Then $Q$ is a $\Pi^1_1$-predicate. By Kondo uniformization, there is a $\Pi^1_1$-uniformization $Q'$ of $Q$. For $Y \in 2^\omega$, define $q_Y(z) = X$ if and only if $Q'(Y \oplus \{z\}, X)$ and let
\[ R_Y = \{(z, x) : \exists X \in \mathrm{HYP}(Y) \;\; q_Y(z) = X \wedge x \in X\} \tag{3.66} \]
Then by Kleene's Theorem on Restricted Quantification (Theorem 44), $R_Y$ is $\Pi^{1,Y}_1$-definable. Moreover, since $Q'$ is a uniformization, we also have
\[ R_Y = \{(z, x) : \forall X \in \mathrm{HYP}(Y) \;\; q_Y(z) = X \to x \in X\} \tag{3.67} \]
Again, by Kleene's Theorem on Restricted Quantification (Theorem 44), the set $R_Y$ is $\Sigma^{1,Y}_1$-definable. Hence $R_Y$ is $\Delta^{1,Y}_1$ and so $R_Y \in \mathrm{HYP}(Y)$. Finally, since $Q'$ is a uniformization, we have that $M_Y \models \forall z \, \varphi(z, (R_Y)_z, \partial_Y((R_Y)_z))$, so in fact $M_Y$ is a model of $\Sigma^1_1\text{-}\mathsf{LB}_0$ and this establishes (i)(a).
To show (i)(b) and (ii)(b), we first prove (ii)(b) and then note how our proof of (ii)(b) in fact establishes (i)(b). Recall that by Proposition 34, there is an injective function $s_Y : \omega \to \omega$ whose graph is in $\mathrm{HYP}(Y)$ such that $s_Y(n) = \partial_Y(\{n\})$ for all $n \in \omega$. Define an $s_Y$-recursive function $f_Y : \omega \to \omega$:
\[ f_Y(0) = \partial_Y(\emptyset) \quad \& \quad f_Y(n+1) = s_Y(f_Y(n)) \tag{3.68} \]
Let NY be the range of fY , so that both the graph of fY and its range NY are
in HYP(Y ). Since NY = rng(fY ) and dom(fY ) = ω, the following induction
principle holds:
∀ P [fY (0) ∈ P & ∀ n ∈ ω fY (n) ∈ P → fY (n+ 1) ∈ P ]→ NY ⊆ P (3.69)
Using this form of induction, one can show that $f_Y : \omega \to N_Y$ is injective, so that its inverse $f_Y^{-1} : N_Y \to \omega$ is likewise in $\mathrm{HYP}(Y)$. Further, one can arithmetically define from $N_Y$, $f_Y$ and $f_Y^{-1}$ the functions $\oplus_Y : N_Y^2 \to N_Y$ and $\otimes_Y : N_Y^2 \to N_Y$ as follows, for $x, y \in N_Y$:
\[ x \oplus_Y y = f_Y(f_Y^{-1}(x) + f_Y^{-1}(y)) \qquad x \otimes_Y y = f_Y(f_Y^{-1}(x) \cdot f_Y^{-1}(y)) \tag{3.70} \]
and then arithmetically define a relation $\preceq_Y \subseteq N_Y^2$ by
\[ x \preceq_Y y \iff \exists z \in N_Y \;\; x \oplus_Y z = y \tag{3.71} \]
Further one can extend the map to $\overline{f}_Y : \mathrm{HYP}(Y) \to (P(N_Y) \cap \mathrm{HYP}(Y))$ by setting
\[ \overline{f}_Y(X) = \{f_Y(n) : n \in X\} \tag{3.72} \]
and define the following structure in the signature of $(\omega, 0, S, +, \times, \leq, \mathrm{HYP}(Y))$:
\[ H_Y = (N_Y, \partial_Y(\emptyset), s_Y, \oplus_Y, \otimes_Y, \preceq_Y, \overline{f}_Y(\mathrm{HYP}(Y))) \tag{3.73} \]
Then the functions $f_Y$ and $\overline{f}_Y$ witness that the two structures $(\omega, 0, S, +, \times, \leq, \mathrm{HYP}(Y))$ and $H_Y$ are isomorphic.
Further, note that HY is definable within MY : for, by the induction princi-
ple (3.69) one can show that NY is the unique smallest set containing ∂Y (∅) and
closed under sY , and using equation (3.70) and the induction principle (3.69) one
can show that ⊕Y and ⊗Y are the unique functions on NY satisfying the following
recursion clauses
x⊕Y ∂Y (∅) = x x⊕Y (sY (z)) = sY (x⊕Y z) (3.74)
x⊗Y ∂Y (∅) = ∂Y (∅) x⊗Y (sY (z)) = (x⊗Y z)⊕Y x (3.75)
Hence, since HY and (ω,HYP(Y ), 0, s,+,×,≤) are isomorphic and since HY is
definable in MY , we have established (ii)(b). Finally, note by construction that
the structure HY witnesses that MY is a model of the axiom Inf, so that we have
established (i)(b).
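The transfer of arithmetic along $f_Y$ can be illustrated on a toy example. The following is only a sketch under stated assumptions: the successor `s(x) = 2x + 3` and the zero element `ZERO = 1` are hypothetical stand-ins for $s_Y$ and $\partial_Y(\emptyset)$ (chosen so that `s` is injective and `ZERO` is not a successor), and they are not the maps constructed in the proof. The sketch defines `f` by the recursion (3.68), transfers addition and multiplication as in (3.70), and lets one check the recursion clauses (3.74)-(3.75) on an initial segment.

```python
ZERO = 1  # hypothetical stand-in for the zero element ∂_Y(∅)


def s(x: int) -> int:
    """Toy injective, non-surjective successor standing in for s_Y(x) = ∂_Y({x})."""
    return 2 * x + 3


def f(n: int) -> int:
    """f(0) = ZERO and f(n + 1) = s(f(n)), as in equation (3.68)."""
    x = ZERO
    for _ in range(n):
        x = s(x)
    return x


def f_inv(u: int) -> int:
    """Inverse of f on its range N; raises ValueError outside the range."""
    n, x = 0, ZERO
    while x < u:  # valid here only because this toy s is strictly increasing
        x, n = s(x), n + 1
    if x != u:
        raise ValueError(f"{u} is not in the range of f")
    return n


def oplus(u: int, v: int) -> int:
    """Addition on N transferred along f, as in equation (3.70)."""
    return f(f_inv(u) + f_inv(v))


def otimes(u: int, v: int) -> int:
    """Multiplication on N transferred along f, as in equation (3.70)."""
    return f(f_inv(u) * f_inv(v))


def preceq(u: int, v: int) -> bool:
    """u ⪯ v iff some z in N satisfies u ⊕ z = v, as in equation (3.71)."""
    return any(oplus(u, f(k)) == v for k in range(f_inv(v) + 1))
```

In this toy model the range of `f` begins 1, 5, 13, and the transferred operations satisfy the recursion clauses for every pair of elements one cares to test, which is the finite shadow of the uniqueness claim made in the proof.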
Corollary 61. $\Sigma^1_1\text{-}\mathsf{AC}_0 \leq_I \Sigma^1_1\text{-}\mathsf{LB}_0 + \mathrm{Inf} <_I \Pi^1_1\text{-}\mathsf{CA}_0$.
Proof. Note that Σ11 − AC0 ≤I Σ11 − LB0 + Inf because the sentence Inf (cf. Defini-
tion 58) literally provides an interpretation. To see that Σ11 − LB0+Inf <I Π11 − CA0,
note that since the previous theorem can be proven in Π11 − CA0, it follows that
Π11 − CA0 proves the consistency of Σ11 − LB0 + Inf. The construction above also
provides an interpretation of Σ11 − LB0 + Inf in Π11 − CA0, so that the result follows
from Proposition 13.
3.4 Barwise-Schlipf Models of Subsystems of BL2 and HP2
In this section, we turn to building models of subsystems of BL2 and HP2 on
top of various recursively saturated fields. In particular, § 3.4.1 is devoted to
the statement and proof of a generalization of a theorem of Barwise-Schlipf and
Ferreira-Wehmeier (Theorem 70). Then in §§ 3.4.2-3.4.4 three applications of this
theorem are presented. The major result here is Corollary 99, which says that
Σ11 − PH0 <I ACA0, and this fills in a key piece of Figure 3.2 about the inter-
pretability relation.
3.4.1 Generalized Barwise-Schlipf/Ferreira-Wehmeier Theorem
The main theorem of this section (Theorem 70) is a generalization of the way in
which Barwise-Schlipf ([6]) built models of ∆11 − CA0 on top of recursively saturated
models of Peano arithmetic, and the way in which Ferreira-Wehmeier ([40]) built
models of ∆11 − BL0 on top of recursively saturated structures. The new addition
is the concept of a uniformly definable function ∂ : D(M) → M (Definition 62).
Subsequent to defining this notion, the definitions of definable skolem functions
and recursively saturated structures are recalled, and then Theorem 70 is stated
and proven.
Definition 62. Suppose that $M$ is an $L$-structure and let $D(M^n)$ be the collection of definable subsets of $M^n$. Then $\partial : D(M) \to M$ is uniformly definable if for every $L$-formula $\theta(x, y)$ with all free variables displayed and with a non-empty set $y$ of parameter variables, there is an $L$-formula $\theta'(x, y)$ with the same free variables, such that $\{\partial(\theta(\cdot, a))\} = \{x : M \models \theta'(x, a)\}$ for all $a \in M$.
Definition 63. Suppose that L is countable and that M is an L-structure and
that B ∈ 2ω. Then ∂ : D(M) → M is B-computably uniformly definable if it is
uniformly definable and the map θ 7→ θ′ is B-computable.
Definition 64. Suppose that $M$ is an $L$-structure. Then $M$ has definable skolem functions if for every definable set $P \subseteq M^{m+n}$ there is a definable set $P' \subseteq M^{m+n}$ such that
\[ M \models \forall x, y \; [P'xy \to Pxy] \tag{3.76} \]
\[ M \models \forall x \; [\exists y \; Pxy] \to [\exists! y \; P'xy] \tag{3.77} \]
Remark 65. Note that in this definition, the parameters used to define P ′ may
exceed those used to define P . Note also the obvious similarity between definable
skolem functions and the uniformization results, such as Kondo’s Uniformization
Theorem 49, which we employed in Theorem 60. In particular, equations (3.76)-
(3.77) are nearly identical to equations (3.59)-(3.60).
Definition 66. Suppose that $M$ is an $L$-structure and $A \subseteq M$. A set of $A$-formulas $p(v)$ in finitely many variables $v$ is realized in $M$ if there is a $b$ in $M$ such that $M \models \theta(b)$ for every $A$-formula $\theta(v)$ in $p(v)$. A set of $A$-formulas $p(v)$ is finitely realized in $M$ if every finite subset $p_0(v)$ of $p(v)$ is realized in $M$. The structure $M$ is saturated if for every $A \subseteq M$ with $|A| < |M|$ and every set of $A$-formulas $p(v)$, if $p(v)$ is finitely realized in $M$ then $p(v)$ is realized in $M$.
Definition 67. Suppose that L and M are countable and B ∈ 2ω. Then M is
B-recursively saturated if for every finite A ⊆ M and every B-computable set of
A-formulas p(v), if p(v) is finitely realized in M then p(v) is realized in M .
Remark 68. The following proposition records the very elementary observation
that saturated structures (resp. B-recursively saturated structures) have a kind
of compactness property, in that each covering of Mn by definable sets has a finite
sub-covering (resp. each B-recursive covering of Mn by definable sets has a finite
sub-covering).
Proposition 69. Suppose that M is a saturated L-structure (resp. B-recursively
saturated L-structure) and that A ⊆ M with |A| < |M |. Further, suppose that
{θi(v)}i∈I is a set of A-formulas (resp. B-computable set of A-formulas). Then
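\[ \Big[ M \models \forall v \bigvee_{i \in I} \theta_i(v) \Big] \Longrightarrow \Big[ \exists \text{ finite } I_0 \subseteq I \;\; M \models \forall v \bigvee_{i \in I_0} \theta_i(v) \Big] \tag{3.78} \]
Proof. The contrapositive of equation (3.78) says that if the set of $A$-formulas $p(v) = \{\neg\theta_i(v) : i \in I\}$ is finitely realized, then it is realized.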
Theorem 70. Suppose that M is an L-structure and ∂ : D(M) → M such that
the structure N = (M,D(M), D(M2), . . . , ∂) models ABL0 (resp. AHP0). Suppose
that B ∈ 2ω. Then
(i) If ∂ : D(M) → M is uniformly definable and M is saturated, then the
structure N models ∆11 − BL0 (resp. ∆11 − HP0).
(ii) If ∂ : D(M) → M is uniformly definable and M is saturated, then the
structure N models Σ11 − LB0 (resp. Σ11 − PH0) if and only if M has definable
skolem functions.
(iii) If $\partial : D(M) \to M$ is $B$-computably uniformly definable and $M$ is $B$-recursively saturated, then the structure $N$ models $\Delta^1_1\text{-}\mathsf{BL}_0$ (resp. $\Delta^1_1\text{-}\mathsf{HP}_0$).
(iv) If ∂ : D(M) → M is B-computably uniformly definable and M is B-
recursively saturated, then the structure N models Σ11 − LB0 (resp. Σ11 − PH0)
if and only if M has definable skolem functions.
Proof. In all four parts of this proof, the proof is identical between Basic Law V
and Hume’s Principle, and so we only include the proofs for the case of Ba-
sic Law V. Further, the proof of (i) and (iii) are parallel and the proof of (ii)
and (iv) are parallel, and so we present the proofs of (i) and (iii) simultaneously
and the proofs of (ii) and (iv) simultaneously. For (i) and (iii), suppose that
∂ : D(M)→ M is uniformly definable (resp. B-computably uniformly definable)
and M is saturated (resp. B-recursively saturated). To see that N is a model
of $\Delta^1_1\text{-}\mathsf{BL}_0$, suppose that there is a subset $Z$ of $M^n$ which is defined on $N$ by a $\Sigma^1_1$-formula $\varphi(z)$ and by a $\Pi^1_1$-formula $\psi(z)$. Let us suppose that $\varphi(z)$ and $\psi(z)$
use exactly one set parameter A ∈ D(M) where
A = {w ∈M : M |= ρ(w, a)} (3.79)
and where ρ(w, v) is an ∅-formula with a ∈ M , since the proof in the case where
there are multiple parameters, with some being objects, some sets, and some
binary relations etc., is exactly identical. Further, let us suppose that ϕ(z) ≡
∃ X ϕ0(z,X, ∂(X), A) and that ψ(z) ≡ ∀ X ψ0(z,X, ∂(X), A), since the proof in
the case where there are multiple existential (resp. universal) set-quantifiers or
relation-quantifiers in ϕ(z) (resp. ψ(z)) is exactly identical. Then
z ∈ Z ⇐⇒ N |= ∃ X ϕ0(z,X, ∂(X), A)⇐⇒ N |= ∀ X ψ0(z,X, ∂(X), A) (3.80)
Then
N |= ∀ z ∃ X ϕ0(z,X, ∂(X), A) ∨ ¬ψ0(z,X, ∂(X), A) (3.81)
Let us abbreviate
ξ0(z,X, ∂(X), A) ≡ ϕ0(z,X, ∂(X), A) ∨ ¬ψ0(z,X, ∂(X), A) (3.82)
so that equation (3.81) becomes
N |= ∀ z ∃ X ξ0(z,X, ∂(X), A) (3.83)
Then this translates into M as
\[ M \models \forall z \bigvee_{\theta(x,y)} \exists b \;\; \xi_0(z, \theta(\cdot, b), \partial(\theta(\cdot, b)), \rho(\cdot, a)) \tag{3.84} \]
where θ(x, y) ranges over ∅-formulas with non-empty set of parameter variables
y. Since the map ∂ : D(M) → M is uniformly definable (resp. B-computably
uniformly definable) via the map θ 7→ θ′, we have
\[ M \models \forall z \bigvee_{\theta(x,y)} \exists b \, \exists c \; \big(\theta'(c, b) \;\&\; \xi_0(z, \theta(\cdot, b), c, \rho(\cdot, a))\big) \tag{3.85} \]
Since M is saturated (resp. B-recursively saturated), an application of Proposi-
tion 69 implies that there is K > 0 and there are ∅-formulas θ1(x, y), . . . , θK(x, y)
such that
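\[ M \models \forall z \bigvee_{i=1}^{K} \exists b \, \exists c \; \big(\theta'_i(c, b) \;\&\; \xi_0(z, \theta_i(\cdot, b), c, \rho(\cdot, a))\big) \tag{3.86} \]
Then by definition of $\xi_0$ (cf. equation (3.82)), we have:
\[ M \models \forall z \bigvee_{i=1}^{K} \exists b \, \exists c \; \big(\theta'_i(c, b) \;\&\; (\varphi_0(z, \theta_i(\cdot, b), c, \rho(\cdot, a)) \vee \neg\psi_0(z, \theta_i(\cdot, b), c, \rho(\cdot, a)))\big) \tag{3.87} \]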
It follows from equation (3.80) that
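\[ Z = \Big\{ z \in M^n : M \models \bigvee_{i=1}^{K} \exists b \, \exists c \; \big(\theta'_i(c, b) \;\&\; \varphi_0(z, \theta_i(\cdot, b), c, \rho(\cdot, a))\big) \Big\} \tag{3.88} \]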
Hence Z ∈ D(Mn) and so N satisfies ∆11 − BL0. Hence, this completes the proof
of parts (i) and (iii).
We turn to the proofs of parts (ii) and (iv). First, we handle the proof of the
right-to-left direction, which is quite similar to the proof from the above para-
graph. Suppose that ∂ : D(M)→M is uniformly definable (resp. B-computably
uniformly definable) and M is saturated (resp. B-recursively saturated) and has
definable skolem functions. To see that N is a model of Σ11 − LB0, suppose that
N |= ∀ z ∃ X ξ0(z,X, ∂(X), A) (3.89)
where ξ0 is arithmetical and where A ∈ D(M) is a set parameter with
A = {w ∈M : M |= ρ(w, a)} (3.90)
and where ρ(w, v) is an ∅-formula with a ∈ M . (As in the proof in the previous
paragraph, the case of multiple parameters or multiple set or relation quantifiers
is exactly similar). Then equation (3.89) translates into M as
\[ M \models \forall z \bigvee_{\theta(x,y)} \exists b \;\; \xi_0(z, \theta(\cdot, b), \partial(\theta(\cdot, b)), \rho(\cdot, a)) \tag{3.91} \]
where θ(x, y) ranges over ∅-formulas with non-empty set of parameter variables
y. Since ∂ : D(M) → M is uniformly definable (resp. B-computably uniformly
definable) via the map θ 7→ θ′, we have
\[ M \models \forall z \bigvee_{\theta(x,y)} \exists b \, \exists c \; \big(\theta'(c, b) \;\&\; \xi_0(z, \theta(\cdot, b), c, \rho(\cdot, a))\big) \tag{3.92} \]
Since M is saturated (resp. B-recursively saturated), an application of Proposi-
tion 69 implies that there is K > 0 and there are ∅-formulas θ1(x, y), . . . , θK(x, y)
such that
\[ M \models \forall z \bigvee_{i=1}^{K} \exists b \, \exists c \; \big(\theta'_i(c, b) \;\&\; \xi_0(z, \theta_i(\cdot, b), c, \rho(\cdot, a))\big) \tag{3.93} \]
Then by adding dummy variables if need be, we can move the disjunction to the
right as follows:
\[ M \models \forall z \, \exists b \, \exists c \bigvee_{i=1}^{K} \big(\theta'_i(c, b) \;\&\; \xi_0(z, \theta_i(\cdot, b), c, \rho(\cdot, a))\big) \tag{3.94} \]
and one can take the first such i as follows:
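\[ M \models \forall z \, \exists b \, \exists c \bigvee_{i=1}^{K} \Big[ \big(\theta'_i(c, b) \;\&\; \xi_0(z, \theta_i(\cdot, b), c, \rho(\cdot, a))\big) \;\&\; \bigwedge_{j<i} \neg\big(\theta'_j(c, b) \;\&\; \xi_0(z, \theta_j(\cdot, b), c, \rho(\cdot, a))\big) \Big] \tag{3.95} \]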
Then since M has definable skolem functions, there is a possibly larger finite set
of parameters a′ ⊇ a and a′-definable functions f, g such that
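\[ M \models \forall z \bigvee_{i=1}^{K} \Big[ \big(\theta'_i(g(z), f(z)) \;\&\; \xi_0(z, \theta_i(\cdot, f(z)), g(z), \rho(\cdot, a))\big) \;\&\; \bigwedge_{j<i} \neg\big(\theta'_j(g(z), f(z)) \;\&\; \xi_0(z, \theta_j(\cdot, f(z)), g(z), \rho(\cdot, a))\big) \Big] \tag{3.96} \]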
Then there is a partition of Mn into the a′-definable sets P1, . . . , PK which are
defined as follows:
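\[ P_i = \Big\{ z \in M^n : M \models \Big[ \big(\theta'_i(g(z), f(z)) \;\&\; \xi_0(z, \theta_i(\cdot, f(z)), g(z), \rho(\cdot, a))\big) \;\&\; \bigwedge_{j<i} \neg\big(\theta'_j(g(z), f(z)) \;\&\; \xi_0(z, \theta_j(\cdot, f(z)), g(z), \rho(\cdot, a))\big) \Big] \Big\} \tag{3.97} \]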
Then define the a′-definable relation
\[ R = \Big\{ (z, w) : \bigvee_{i=1}^{K} [z \in P_i \;\&\; \theta_i(w, f(z))] \Big\} \tag{3.98} \]
so that
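\[ z \in P_i \Longrightarrow R_z = \{w \in M : (z, w) \in R\} = \{w \in M : M \models \theta_i(w, f(z))\} = \theta_i(\cdot, f(z)) \tag{3.99} \]
\[ z \in P_i \Longrightarrow \{\partial(R_z)\} = \{\partial(\theta_i(\cdot, f(z)))\} = \{c \in M : M \models \theta'_i(c, f(z))\} = \{g(z)\} \tag{3.100} \]
\[ z \in P_i \Longrightarrow \partial(R_z) = g(z) \tag{3.101} \]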
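Putting these things together and glancing back at the definition of $P_i$ in equation (3.97) we have,
\[
\begin{aligned}
z \in P_i &\Longrightarrow M \models \big(\theta'_i(g(z), f(z)) \;\&\; \xi_0(z, \theta_i(\cdot, f(z)), g(z), \rho(\cdot, a))\big) \\
&\Longrightarrow N \models \xi_0(z, R_z, \partial(R_z), A)
\end{aligned}
\tag{3.102}
\]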
Since the sets P1, . . . , PK partition Mn we have
N |= ∀ z ξ0(z,Rz, ∂(Rz), A) (3.103)
and this implies that $N$ models $\Sigma^1_1\text{-}\mathsf{LB}_0$. Hence we have established the right-to-left direction of (ii) and (iv).
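The step of "taking the first such i" in equations (3.95)-(3.97) is what turns a finite cover into a partition. As a minimal sketch, assuming a finite list of Python predicates in place of the finitely many formulas (the function name `first_index_partition` and the sample predicates are hypothetical, introduced only for illustration), the idea looks as follows:

```python
from typing import Callable, Iterable, List, Sequence


def first_index_partition(
    points: Iterable[int],
    conditions: Sequence[Callable[[int], bool]],
) -> List[List[int]]:
    """Place each point z in cell i, where i is least with conditions[i](z) true.

    This mirrors the definition of the sets P_1, ..., P_K: membership in P_i
    requires the i-th condition to hold and every earlier condition to fail.
    """
    cells: List[List[int]] = [[] for _ in conditions]
    for z in points:
        for i, condition in enumerate(conditions):
            if condition(z):
                cells[i].append(z)
                break
        else:
            # In the proof this case cannot occur, since the disjunction
            # over i = 1, ..., K holds for every z.
            raise ValueError(f"no condition holds at {z}")
    return cells
```

For example, with the conditions "even", "divisible by 3" and "always true" on the points 0 through 9, the cells are the even numbers, then 3 and 9, then 1, 5 and 7: the conditions overlap, but the cells are pairwise disjoint and cover every point.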
We now establish the left-to-right direction of (ii) and (iv). Suppose that $\partial : D(M) \to M$ is uniformly definable (resp. $B$-computably uniformly definable) and $M$ is saturated (resp. $B$-recursively saturated) and that $N$ models $\Sigma^1_1\text{-}\mathsf{LB}_0$.
Suppose that P ⊆ Mm+n is definable, perhaps with a finite set a of parameters
from M . Note that for every x ∈ Mm with a tuple y ∈ Mn such that Pxy, we
can arbitrarily choose one such y ∈ Mn and form the y-definable singleton {y}.
This implies that
N |= ∀ x ∃ R [∃ y Pxy]→ [(∃! y Ry) & (∀ y Ry → Pxy)] (3.104)
Since N |= Σ11 − LB0, one then has
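\[ N \models \exists P' \; \forall x \; [\exists y \; Pxy] \to [(\exists! y \; (P')_x y) \;\&\; (\forall y \; (P')_x y \to Pxy)] \tag{3.105} \]
Since $(P')_x y$ if and only if $P'xy$, this implies that
\[ N \models \exists P' \; \forall x \; [\exists y \; Pxy] \to [(\exists! y \; P'xy) \;\&\; (\forall y \; P'xy \to Pxy)] \tag{3.106} \]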
Finally, let P ′′ = P ′ ∩ P . Then
M |= ∀x, y [P ′′xy → Pxy] (3.107)
M |= ∀ x [∃ y Pxy]→ [∃! y P ′′xy] (3.108)
Hence, M has definable skolem functions.
3.4.2 Application to Algebraically Closed Fields
Remark 71. In this section, we apply Theorem 70 to construct models of ∆11 − HP0
on top of certain algebraically closed fields (cf. Theorem 77). The primary applica-
tion of this construction is to answer a question posed by Linnebo (cf. Remark 81
and Theorem 83). Prior to doing this, we recall Ax’s Theorem and note one
elementary consequence of this theorem.
Theorem 72. (Ax’s Theorem) Suppose that k is an algebraically closed field and
f : k → k is a definable injective function. Then f is surjective.
Proof. See Ax [4] Theorem C pp. 241, 270 or Poizat [123] Lemma 4.3 pp. 70-71,
in which is proved the stronger result wherein k is replaced by a definable subset
of kn.
Proposition 73. Suppose that k is an algebraically closed field and that X, Y ⊆ k
are definable. Then the following are equivalent:
(i) There is a definable bijection f : X → Y
(ii) Either both X and Y are finite and of the same cardinality, or both X and
Y are cofinite and k \X and k \ Y are of the same cardinality.
Proof. Suppose that (i) holds. Then by strong minimality and the fact that an
infinite set cannot be bijective with a finite set, either both X and Y are finite or
both X and Y are cofinite. If X and Y are both finite then the fact that there is a
definable bijection between them implies that X and Y have the same cardinality.
If X and Y are both cofinite but k \X and k \ Y are not of the same cardinality,
then without loss of generality k \ X = {a1, . . . , am} and k \ Y = {b1, . . . , bn}
where $m < n$. Then define a function $\overline{f} : k \to k$ by $\overline{f} \upharpoonright X = f$ and $\overline{f}(a_i) = b_i$ for $i \leq m$. Then $\overline{f} : k \to k$ is an injection that is not a surjection, since $b_n$ is not in the range of $\overline{f}$. This contradicts Ax's Theorem 72. So, in fact, $k \setminus X$ and $k \setminus Y$
are of the same cardinality. Then (ii) holds.
Conversely, suppose that (ii) holds. If both X and Y are finite of the same
cardinality, then simply enumerate the elements of X and Y and use these elements
as parameters to define a bijection f : X → Y . If X and Y are both cofinite and
k \ X and k \ Y are of the same finite cardinality, then enumerate k \ X =
{y1, . . . , yn} and k \ Y = {x1, . . . , xn}. By renumbering, we can assume without
loss of generality that (k\X)∩(k\Y ) = {x1, . . . , xm} = {y1, . . . , ym} where m ≤ n
and x1 = y1, . . . , xm = ym. If m = n then this implies that (k \X) = (k \ Y ) and
X = Y , and we can choose the definable bijection f : X → Y to be the identity
map. If m < n, then note that {xm+1, . . . , xn} ⊆ X and {ym+1, . . . , yn} ⊆ Y and
X \ {xm+1, . . . , xn} ⊆ Y and Y \ {ym+1, . . . , yn} ⊆ X. Then we can choose the
definable bijection f : X → Y which is given by the identity on X \{xm+1, . . . , xn}
and by f(xi) = yi on {xm+1, . . . , xn}.
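The cofinite case of the converse direction can be sketched concretely. In the sketch below, cofinite sets are represented only by their finite complements, and integers stand in for field elements; both choices, and the name `cofinite_bijection`, are assumptions made for illustration. The function is the identity except on the finitely many points that lie in X but not in Y, which it sends to the points that lie in Y but not in X.

```python
from typing import Callable, Set


def cofinite_bijection(comp_x: Set[int], comp_y: Set[int]) -> Callable[[int], int]:
    """Return a bijection from X to Y, where X and Y are the complements
    of the finite sets comp_x and comp_y, assumed to have the same size."""
    if len(comp_x) != len(comp_y):
        raise ValueError("complements must have the same cardinality")
    # Points missing from Y but present in X (x_{m+1}, ..., x_n in the proof).
    xs = sorted(comp_y - comp_x)
    # Points missing from X but present in Y (y_{m+1}, ..., y_n in the proof).
    ys = sorted(comp_x - comp_y)
    swap = dict(zip(xs, ys))

    def f(x: int) -> int:
        if x in comp_x:
            raise ValueError(f"{x} is not an element of X")
        # Identity away from the finitely many exceptional points.
        return swap.get(x, x)

    return f
```

For instance, with complements {0, 1, 2} and {2, 3, 4}, the resulting map sends 3 to 0 and 4 to 1 and fixes everything else in X, so only finitely many parameters are needed to define it.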
Definition 74. A structure k is strongly minimal if every definable X ⊆ k is finite
or cofinite.
Proposition 75. Every algebraically closed field is strongly minimal.
Proof. See Marker [108] p. 5.
Proposition 76. Algebraically closed fields do not have definable skolem func-
tions.
Proof. Let $\varphi(x, y) \equiv x = y^2$. Then $k \models \forall x \, \exists y \; x = y^2$. If $k$ has definable skolem functions or parametrically definable skolem functions, then there is a definable function $f : k \to k$ such that $k \models \forall x \; x = (f(x))^2$. Then rng(f) is a definable
set which includes exactly one square root for each x ∈ k. Then rng(f) is infinite
and coinfinite, which contradicts strong minimality.
Theorem 77. Suppose that $k$ is a saturated algebraically closed field of characteristic zero. Then there is a uniformly definable function $\# : D(k) \to k$ such that $(k, D(k), D(k^2), \ldots, \#)$ is a model of $\Delta^1_1\text{-}\mathsf{HP}_0 + \neg\Sigma^1_1\text{-}\mathsf{PH}_0 + \neg\Pi^1_1\text{-}\mathsf{HP}_0$. Further, there is no function $\partial : D(k) \to k$ such that $(k, D(k), D(k^2), \ldots, \partial)$ is a model of $\Delta^1_1\text{-}\mathsf{BL}_0$.
Proof. Since k is a field of characteristic zero, the prime field of k is Q and
the integers Z are hence embedded into k via Q. Using this embedding, de-
fine # : D(k) → k by #X = |X| if X is finite and #X = −(|k \X| + 1) if
X is cofinite. Then by Proposition 73, the structure (k,D(k), D(k2), . . . ,#) is a
model of Hume’s Principle. To apply Theorem 70 (i)-(ii), we need to show that
# : D(k) → k is uniformly definable. Suppose that θ(x, y) is an ∅-formula with
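non-empty set $y$ of parameter variables. Then by strong minimality, for any $a$ we have that $\theta(\cdot, a)$ is finite or $\neg\theta(\cdot, a)$ is finite. Then
\[ k \models \forall a \bigvee_{N \geq 0} \big[\, |\theta(\cdot, a)| \leq N \;\vee\; |\neg\theta(\cdot, a)| \leq N \,\big] \tag{3.109} \]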
Since k is saturated, by Proposition 69, there is an integer Nθ > 0 such that
\[ k \models \forall a \bigvee_{i=0}^{N_\theta} \big[\, |\theta(\cdot, a)| \leq i \;\vee\; |\neg\theta(\cdot, a)| \leq i \,\big] \tag{3.110} \]
Then for each such formula θ(x, y) we define the following ∅-formula θ′(x, y) as
follows:
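\[ \theta'(x, y) \equiv \bigvee_{i=0}^{N_\theta} \big[\, |\theta(\cdot, y)| = i \;\&\; x = i \,\big] \vee \big[\, |\neg\theta(\cdot, y)| = i \;\&\; x = -(i+1) \,\big] \tag{3.111} \]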
Hence, by definition, we have that for any a
{#(θ(·, a))} = {c : k |= θ′(c, a)} (3.112)
Thus the map $\# : D(k) \to k$ is uniformly definable. Hence, by Theorem 70 (i)-(ii) and Proposition 76, we have that $(k, D(k), D(k^2), \ldots, \#)$ is a model of $\Delta^1_1\text{-}\mathsf{HP}_0 + \neg\Sigma^1_1\text{-}\mathsf{PH}_0$. Further, since the set $\mathrm{rng}(\#) = \mathbb{Z}$ is definable by a $\Sigma^1_1$-formula in the structure $(k, D(k), D(k^2), \ldots, \#)$ but is not definable in $k$ since $k$ is strongly minimal, we have that $(k, D(k), D(k^2), \ldots, \#)$ is a model of $\neg\Pi^1_1\text{-}\mathsf{HP}_0$.
Now let us note why there is no function ∂ : D(k)→ k such that the structure
$(k, D(k), D(k^2), \ldots, \partial)$ is a model of $\Delta^1_1\text{-}\mathsf{BL}_0$. If there were such a function, then by Corollary 38 it would follow that there is an injective non-surjective function $s : k \to k$ whose graph is in $D(k^2)$, which would contradict Ax's Theorem (72).
Remark 78. If we knew that all the parts of the proof of the above theorem were
formalizable in ACA0, then we could infer from the proof of the above theorem
and Proposition 13 that ∆11 − HP0 <I ACA0. It is clear from the proof that this
comes down to determining whether or not Ax’s Theorem 72 is provable in ACA0.
However, note that in the next subsection, we will prove Corollary 99, which
assures us that ∆11 − HP0 <I ACA0.
Remark 79. In conjunction with Corollary 38, the following corollary shows that
there is a stark contrast between ∆11 − HP0 and ∆11 − BL0 on the score of whether
they require the existence of injective non-surjective functions.
Corollary 80. There is a model (M,S1, S2, . . . ,#) of ∆11 − HP0 such that there is
no injective non-surjective function s : M →M such that graph(s) is in S2.
Proof. This follows immediately from the construction in Theorem 77 and Ax’s
Theorem 72.
Remark 81. Linnebo presented a description of properties that models of AHP0
and ∆11 − HP0 must have if they fail to model a certain sort of successor axiom
([100] pp. 164-165), and he additionally showed that there was a model of AHP0
which did not model this successor axiom ([100] Theorem 2 p. 164). Linnebo then
remarked that it was unknown whether there was a model of ∆11 − HP0 that did not
model the successor axiom (cf. [100] Remark 6 p. 168). Subsequent to defining
this successor axiom, we now show that the model from the previous theorem
does not model this axiom. We also explain why certain properties identified by
Linnebo hold in this model.
Definition 82. The following are formulas in the language of HP2 (cf. Lin-
nebo [100] pp. 158-160):
(i) P (n,m)⇐⇒ ∃ X, Y #X = n & #Y = m & ∃ y ∈ Y X = Y \ {y}
(ii) F is hereditary if Fn and P (n,m) implies Fm
(iii) F is closed if P (#∅,m) implies Fm
(iv) n is a pseudo-number if n = #∅ or n is contained in all hereditary, closed F .
(v) The successor axiom (SA) says that for any pseudo-number n, there is m
such that P (n,m).
Proposition 83. Suppose that k is a saturated algebraically closed field of char-
acteristic zero. Suppose that # : D(k)→ k by #X = |X| if X is finite and #X =
−(|k \X|+ 1) if X is cofinite. Then (k,D(k), D(k2), . . . ,#) |= ∆11 − HP0 + ¬SA.
Proof. Before we begin, it is perhaps helpful to informally state the definition of
# given above and describe how it interacts with the predicate P (n,m). If X is a
finite set with n elements, then #X = n, and if X is a cofinite set with n elements
in its complement, then #X = −(n+ 1). So, for example, the set X = {√
2,−1}
has #X = 2, and the set X = {a ∈ k : k |= a2 +1 6= 0} has #X = −(2+1) = −3,
and the set X = k has #X = −1, and the set X = ∅ has #X = 0. Further,
if X is finite, then by choosing an element y /∈ X, we have P (#X,#(X ∪ {y})).
For example, if X is finite and has n elements and y /∈ X, we have that #X = n
and #(X ∪ {y}) = n + 1, so that P (n, n + 1). Conversely, if X is cofinite and
has n > 0 elements in its complement and y /∈ X, then we have that X ∪ {y}
has n − 1 elements in its complement, so that #X = −(n + 1) = −n − 1 and
#(X∪{y}) = −((n−1)+1) = −n and hence so that P (−n−1,−n). For example,
we have P (0, 1), P (1, 2), P (2, 3), . . . and . . . , P (−4,−3), P (−3,−2), P (−2,−1).
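The informal description of # and its interaction with P can be checked on small examples. The sketch below is an illustration under stated assumptions: a definable subset of k is represented either by its finitely many elements or by its finitely many missing elements, field elements are represented by arbitrary hashable labels such as the strings "sqrt2" and "i", and the names `Definable`, `card` and `add_point` are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Hashable


@dataclass(frozen=True)
class Definable:
    """A definable subset of k: finite (points are its elements) or
    cofinite (points are the elements of its complement)."""
    cofinite: bool
    points: FrozenSet[Hashable]


def card(x: Definable) -> int:
    """#X = |X| for finite X, and #X = -(|k minus X| + 1) for cofinite X."""
    if x.cofinite:
        return -(len(x.points) + 1)
    return len(x.points)


def add_point(x: Definable, y: Hashable) -> Definable:
    """Return X ∪ {y} for a point y not already in X."""
    if x.cofinite:
        if y not in x.points:
            raise ValueError("y is already an element of X")
        return Definable(True, x.points - {y})
    if y in x.points:
        raise ValueError("y is already an element of X")
    return Definable(False, x.points | {y})
```

With this representation, adding a point to X always raises # by exactly one, whether X is finite or cofinite with non-empty complement, which is the pattern P(n, n + 1) recorded above; the one set to which no point can be added is k itself, with # equal to -1.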
Now we begin the proof. In particular, we want to begin by describing what
the hereditary, closed sets F ∈ D(k) look like. So suppose that F ∈ D(k) is
hereditary and closed. First we claim that N \ {0} ⊆ F . For, by the definition
of P (n,m) and #, we have that F ’s being closed implies that P (0, 1) and hence
1 ∈ F . So suppose that n ∈ (N \ {0}) ∩ F . Then by the definition of P (n,m)
and #, we have that F ’s being hereditary implies that P (n, n + 1) and hence
n+ 1 ∈ F . By induction, we have that if F ∈ D(k) is hereditary and closed then
N \ {0} ⊆ F .
We next claim that $\{n \in \mathbb{Z} : n \neq 0\} \subseteq F$. Suppose not. That is, suppose that there are some negative integers that are not in $F$. Then, since $F \in D(k)$ is infinite, strong minimality implies that $F$ is cofinite. So there are at most finitely many negative integers that are not in $F$. Suppose that we write these negative integers in increasing order as $a_1 < a_2 < \cdots < a_n$. (E.g. if $\mathbb{Z} \setminus F = \{-5, -10, -12\}$ then $a_1 = -12$, $a_2 = -10$ and $a_3 = -5$.) This implies that $a_1 - 1 \in F$. But then by the definition of $P(n,m)$ and #, we have that $F$'s being hereditary implies that $P(a_1 - 1, a_1)$ and hence $Fa_1$, which is a contradiction. Hence, in fact we have that $\{n \in \mathbb{Z} : n \neq 0\} \subseteq F$. So, what we have shown in this paragraph is that if $F \in D(k)$ is hereditary and closed, then $\{n \in \mathbb{Z} : n \neq 0\} \subseteq F$.
This, of course, implies that every element of $\mathbb{Z}$ is a pseudo-number. Conversely, it is not difficult to see that all the pseudo-numbers are elements of $\mathbb{Z}$. Suppose that $a \in k$ is not an integer. Then the set $F = k \setminus \{a\}$ is a hereditary closed set that does not contain $a$. Hence, what we have shown in this paragraph is that the pseudo-numbers in the structure $(k, D(k), D(k^2), \ldots, \#)$ are precisely the integers.
Now we are in a position to show that (k,D(k), D(k2), . . . ,#) |= ¬SA. For,
consider the set k ∈ D(k). By definition #k = −(|k \ k| + 1) = −1. Hence,
by the results of the previous paragraph, we have that #k is a pseudo-number.
So suppose that SA held on the structure (k,D(k), D(k2), . . . ,#). Then there
would be m such that P (#k,m). Then by definition, there would be sets X, Y ∈
D(k) such that #k = #X and m = #Y and ∃ y ∈ Y X = Y \ {y}. Since
Hume’s Principle holds on the structure (k,D(k), D(k2), . . . ,#), we have that
#k = #X implies that there is a bijection f : k → X that is definable in the
structure k. By Proposition 73, we have that k \ k and k \ X are of the same
cardinality, so that X = k. But then the condition that y ∈ Y \ X implies that
y ∈ k \ k, which is a contradiction. So, in fact, SA does not hold on the structure
(k,D(k), D(k2), . . . ,#).
Remark 84. In the course of his proof of the existence of a model of AHP0 +
¬SA, Linnebo noted several properties which must be had by such models ([100]
pp. 164-165). Since models of ∆11 − HP0 +¬SA are automatically models of AHP0 +
¬SA, Linnebo’s results predict several properties of the model from the previous
proposition. In this remark, we briefly explain why the properties identified by
Linnebo hold on this structure. First, Linnebo notes that the example of a pseudo-number $n$ witnessing that SA fails on the structure $(k, D(k), D(k^2), \ldots, \#)$ must be such that $n = \#k$. In the last paragraph of the previous proposition, we showed that $n = \#k$ was such a counterexample. Second, Linnebo notes that the example of a structure $(k, D(k), D(k^2), \ldots, \#) \models \neg\mathrm{SA}$ must be such that $k \setminus X \neq \emptyset$ implies $\#k \neq \#X$. In the context of the model constructed in the previous proposition,
this is a consequence of Ax’s Theorem (or Proposition 73). Finally, Linnebo notes
that the example of a structure (k,D(k), D(k2), . . . ,#) |= ¬SA must in effect
contain a copy of both ω and ω∗ ordered by the P -relation, that is, this structure
must contain a copy of the positive integers and the negative integers ordered by
the P -relation. In the model constructed in the previous theorem, this is reflected
in the fact that the pseudo-numbers are precisely the integers.
3.4.3 Application to O-Minimal Expansions of Real-Closed Fields
Remark 85. In this section, we apply Theorem 70 to construct models of Σ11 − PH0
on top of certain o-minimal expansions of real-closed fields (cf. Theorem 97). The
primary application of this construction is to note that an effectivization of this
construction allows us to conclude that Σ11 − PH0 <I ACA0 (cf. Corollary 99), thus
filling in a key piece of the interpretability relation (cf. Figure 3.2). Prior to doing
this, we recall some basic notions pertaining to the model theory of o-minimal
expansions of real-closed fields, such as dimension and Euler characteristic; the
reader who is already familiar with these notions may wish to proceed directly to
Theorem 97.
Definition 86. Suppose that L is a signature extending the signature of linear
orders, and suppose that M is an L-structure such that (M,≤) is a dense linear
order. Then M is o-minimal if every definable subset of M is a finite union of points and
intervals.
Proposition 87. Every real-closed ordered field is o-minimal.
Proof. See Marker [108] Corollary 2.5 p. 11.
Definition 88. Suppose that M is an o-minimal structure. If X is a definable
subset of Mn, then let C(X) be the set of definable continuous functions f : X →
M , and let C∞(X) be C(X) plus the two constant functions −∞,∞. Further, if
f, g ∈ C∞(X) and f < g on X, then let
(f, g)X = {(x, r) ∈ X ×R : f(x) < r < g(x)} (3.113)
Then inductively define the notion of a σ-cell, where σ ∈ 2^{<ω} is a finite sequence
of zeros and ones. First, 0-cells are points and 1-cells are open intervals, including
(−∞, a), (a,+∞). Second, given a σ-cell X, the σ0-cells are graphs of functions
f ∈ C(X), and the σ1-cells are sets (f, g)X where f, g ∈ C∞(X).
Definition 89. Suppose that M is an o-minimal structure. A decomposition of
Mn is defined inductively as follows. A decomposition of M1 is a finite partition
of M with the following form:
{(−∞, a1), (a1, a2), . . . , (ak,+∞), {a1}, . . . , {ak}} (3.114)
where a1 < a2 < · · · < ak. A decomposition of Mm+1 = Mm × M is a fi-
nite partition of Mm+1 into cells {A1, . . . , An} such that the set of projections
{π(A1), . . . , π(An)} is a decomposition of Mm, where π : Mm+1 → Mm by
π(x1, . . . , xm+1) = (x1, . . . , xm). A decomposition of Mm is said to partition a
definable set X ⊆ Mm if X can be written as a finite union of pairwise disjoint
cells in the decomposition.
Theorem 90. (Cell Decomposition Theorem) Suppose that M is an o-minimal
structure. For any finite sequence of B-definable sets A1, . . . , Ak ⊆ Mm, there is
a decomposition of Mm partitioning each of the Ai. Moreover, the cells in the
decomposition are B-definable.
Proof. See van den Dries [147] Theorem 2.11 p. 52.
Definition 91. Suppose that M is an o-minimal structure and that X ⊆ Mn.
Then define
dim(X) = max{i1 + · · ·+ in : X contains a (i1, . . . , in)-cell} (3.115)
E(X) = k_0 − k_1 + k_2 − · · · = ∑_{d=0}^{n} k_d (−1)^d (3.116)
where kd is the number of d-dimensional cells contained in some cell decomposition
of X.
Remark 92. Note that if X ⊆M , then dim(X) > 0 if and only if X contains an
open interval. Note that the above definition of Euler characteristic can be shown to
be independent of the choice of the cell decomposition (cf. [147] Proposition 2.2
p. 70).
Proposition 93. Suppose that M is an o-minimal structure and that θ(x, y) is
a ∅-formula. Then there is a positive integer Nθ > 0 such that for all b ∈M , it is
the case that
|dim(θ(·, b))|, |E(θ(·, b))| < Nθ (3.117)
Further, for each integer k, it is the case that the sets
{b ∈M : dim(θ(·, b)) = k} & {b ∈M : E(θ(·, b)) = k} (3.118)
are ∅-definable. Moreover, the formulas that define these sets and the positive
integer Nθ can be uniformly computed from θ.
Proof. See van den Dries [147] Proposition 1.5 p. 65 and Proposition 2.10 p.
72.
Proposition 94. Suppose that M is an o-minimal expansion of a real-closed
field, and suppose that X ⊆Mn and Y ⊆Mm are definable sets. Then there is a
definable bijection f : X → Y if and only if dim(X) = dim(Y ) and E(X) = E(Y ).
Proof. See van den Dries [147] p. 132.
Remark 95. As a simple illustration of this fact, consider the example of the two
sets
X = (−2,−1) ⊔ {0} ⊔ (1, 2)        Y = (−1, 1) (3.119)
Both have dimension 1, since they both contain intervals, and their Euler charac-
teristics are the same, namely, E(X) = 1−2 = −1 and E(Y ) = 0−1 = −1. Hence,
the above proposition predicts that there is a definable bijection f : X → Y , and
in fact this is the case: one simply sends (−2,−1) to (−1, 0) and one sends 0 to
0 and one sends (1, 2) to (0, 1).
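The computation in this remark can be replayed mechanically. The following is only a minimal sketch, under the assumption that a definable subset of the line is handed to us already decomposed into disjoint cells; the encoding of cells as tagged tuples and the names dim, euler and f are illustrative choices and do not come from the text.

```python
from fractions import Fraction

# A definable subset of the line, given as a list of pairwise disjoint cells:
# ("pt", a) is the point {a}; ("iv", a, b) is the open interval (a, b).

def dim(cells):
    """Dimension of a non-empty set: 1 if some cell is an interval, else 0."""
    return 1 if any(c[0] == "iv" for c in cells) else 0

def euler(cells):
    """Euler characteristic: number of 0-cells minus number of 1-cells."""
    points = sum(1 for c in cells if c[0] == "pt")
    intervals = sum(1 for c in cells if c[0] == "iv")
    return points - intervals

X = [("iv", -2, -1), ("pt", 0), ("iv", 1, 2)]
Y = [("iv", -1, 1)]
# A finer decomposition of the same set Y; the invariants should not change.
Y_refined = [("iv", -1, 0), ("pt", 0), ("iv", 0, 1)]

def f(x):
    """The bijection X -> Y described in the remark."""
    if -2 < x < -1:
        return x + 1   # sends (-2, -1) onto (-1, 0)
    if x == 0:
        return 0       # sends 0 to 0
    if 1 < x < 2:
        return x - 1   # sends (1, 2) onto (0, 1)
    raise ValueError("x is not an element of X")

samples = [Fraction(-3, 2), Fraction(0), Fraction(3, 2)]
images = [f(x) for x in samples]
```

Both X and Y come out with dimension 1 and Euler characteristic −1, and the refined decomposition of Y gives the same values, in line with the independence from the choice of decomposition noted in Remark 92.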
Proposition 96. O-minimal expansions of real-closed fields have definable Skolem
functions.
Proof. See van den Dries [147] p. 94 for details.
Theorem 97. Suppose that k is a recursively-saturated o-minimal expansion
of a real-closed field. Then there is a computably uniformly definable function
# : D(k)→ k such that (k,D(k), D(k2), . . . ,#) is a model of Σ11 − PH0+¬Π11 − HP0.
Further, there is no function ∂ : D(k) → k such that (k,D(k), D(k2), . . . , ∂) is a
model of ∆11 − BL0.
Proof. Since k is a field of characteristic zero, the prime field of k is Q and the
integers Z are hence embedded into k via Q. Choose a recursive bijection 〈·, ·〉 :
Z2 → Z. Using this embedding and this bijection, define # : D(k)→ k by #X =
〈dimX,E(X)〉. Then by Proposition 94, the structure (k,D(k), D(k2), . . . ,#) is
a model of Hume’s Principle. To apply Theorem 70 (iii)-(iv), we need to show that
# : D(k) → k is computably uniformly definable. So suppose that θ(x, y) is an
∅-formula with non-empty set y of parameter variables. Then by Proposition 93,
from the formula θ(x, y) we can uniformly compute a positive integer Nθ > 0 such
that
k |= ∀ b [ |dim(θ(·, b))|, |E(θ(·, b))| < Nθ ] (3.120)
as well as ∅-formulas defining the sets {b : dim(θ(·, b)) = n} and {b : E(θ(·, b)) = n}.
Then for each such formula θ(x, y) we define the following ∅-formula θ′(x, y) as
follows:
θ′(x, y) ≡ ⋁_{i=0}^{Nθ} ⋁_{j=0}^{Nθ} [dim(θ(·, y)) = i & E(θ(·, y)) = j] → x = 〈i, j〉 (3.121)
Hence, by definition, we have that for any a
{#(θ(·, a))} = {c : k |= θ′(c, a)} (3.122)
Hence, by Theorem 70 (iii)-(iv) and Proposition 96, we have that the structure
(k,D(k), D(k2), . . . ,#) is a model of Σ11 − PH0. Further, since the set rng(#) = Z
is definable by a Σ11-formula in the structure (k,D(k), D(k2), . . . ,#) but is not de-
finable in k since k is o-minimal, we have that (k,D(k), D(k2), . . . ,#) is a model
of ¬Π11 − HP0.
Now let us note why there is no function ∂ : D(k)→ k such that the structure
(k,D(k), D(k2), . . . , ∂) is a model of ∆11 − BL0. If there was such a function, then
by Proposition 34, there would be a function s : k2 → k whose graph was in D(k2)
and which satisfied s(x, y) = ∂({x, y}). Consider the definable set X = {(x, y) ∈
k2 : x < y}, and note that dim(X) = 2. Then s ↾ X : X → k is an injection. For,
suppose s(x, y) = s(x′, y′) for (x, y), (x′, y′) ∈ X. Then ∂({x, y}) = ∂({x′, y′}) and
x < y and x′ < y′. Then by Basic Law V, {x, y} = {x′, y′} and x < y and x′ < y′.
Then x = x′ and y = y′. Hence, in fact, s ↾ X : X → k is an injection. Then
trivially s ↾ X : X → rng(s ↾ X) is a bijection whose graph is in D(k2). Then by
the left-to-right direction of Proposition 94, it would follow that
2 = dim(X) = dim(rng(s ↾ X)) ≤ dim(k) = 1 (3.123)
which is a contradiction.
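The proof of Theorem 97 only asks for some recursive bijection 〈·, ·〉 : Z2 → Z and does not fix one. As an illustration, here is one possible choice, assumed for this sketch rather than taken from the text: fold Z onto the natural numbers, apply the Cantor pairing, and unfold again. The names pair and unpair are hypothetical.

```python
from math import isqrt

def z_to_n(z):
    """Bijection Z -> N sending 0, -1, 1, -2, 2, ... to 0, 1, 2, 3, 4, ..."""
    return 2 * z if z >= 0 else -2 * z - 1

def n_to_z(n):
    """Inverse of z_to_n."""
    return n // 2 if n % 2 == 0 else -(n + 1) // 2

def cantor(x, y):
    """Cantor pairing, a bijection N^2 -> N."""
    return (x + y) * (x + y + 1) // 2 + y

def cantor_inv(n):
    """Inverse of the Cantor pairing."""
    w = (isqrt(8 * n + 1) - 1) // 2      # largest w with w(w+1)/2 <= n
    y = n - w * (w + 1) // 2
    return w - y, y

def pair(i, j):
    """A recursive bijection Z^2 -> Z, playing the role of <i, j>."""
    return n_to_z(cantor(z_to_n(i), z_to_n(j)))

def unpair(z):
    """Inverse of pair."""
    x, y = cantor_inv(z_to_n(z))
    return n_to_z(x), n_to_z(y)

# The value #X = <dim X, E(X)> for the set X of Remark 95,
# which has dimension 1 and Euler characteristic -1.
code_of_X = pair(1, -1)
```

With this choice, two definable sets receive the same code exactly when they agree in both dimension and Euler characteristic, which is the feature of # that Proposition 94 turns into Hume's Principle.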
Remark 98. It is our claim that all of the results quoted and proved in this
subsection can be proven in ACA0 for o-minimal structures M with ACA0-provable
quantifier-elimination, such as real-closed fields (cf. Marker [109] Theorem 2.3
p. 10, Simpson [138] Lemma II.9.6 p. 98). The reason for this is that (i) the proofs
from van den Dries [147] all concern properties of definable sets, as opposed to
properties of the defining formula, and (ii) the proofs from van den Dries [147] all
proceed by induction on the cartesian power of the definable set. It is worthwhile
to say a little bit more about each of these points.
In regard to (i), the proofs in this section from van den Dries [147] are all
concerned with properties of a definable set X, so that the definable set X has
the property regardless of which particular formula is used to define X. For
instance, the property of X’s being a cell has this feature, since a definable set
X ⊆ M is e.g. an interval or a point regardless of whether the formula ϕ or the
formula ψ is being used to define it (where ϕ and ψ are two formulas that do in fact
define X). By the same token, the proofs in this section from van den Dries [147]
are not concerned with the syntactic complexity of given formulas, for instance,
whether or not they are Π^0_3-formulas or Π^0_4-formulas. Hence, if M has
quantifier-elimination, then for the purposes of the proofs in this section from van den Dries
[147], we can take the quantifier-free formulas as representatives for the definable
sets. For instance, in proving the Cell Decomposition Theorem in this manner, we
would in fact prove that e.g. for every finite sequence of quantifier-free formulas
ϕ1(x), . . . , ϕk(x) in m free variables, there is a quantifier-free decomposition of
Mm partitioning each of the ϕi(x).
In regard to (ii), the proofs in this section from van den Dries [147] all proceed
by induction, where it is first shown that the definable subsets of M have a given
property, and then it is shown that if the definable subsets of Mn have a given
property, then the definable subsets of Mn+1 have this given property. Given our
discussion in the previous paragraph, when proving these theorems in ACA0, we
would in fact prove that the quantifier-free formulas ϕ(x) have a given property,
and that if the quantifier-free formulas ϕ(x1, . . . , xn) have a given property, then
the quantifier-free formulas ϕ(x1, . . . , xn+1) have a given property. Since ACA0 has
the mathematical induction axiom for all sets X, it suffices to note that ACA0
has enough comprehension to show that the sets X on which it is doing induc-
tion exist. Here it suffices to note that the proofs in this section from van den
Dries [147] all concern properties of the definable sets that can be expressed by
(iii) finitely many quantifiers over quantifier-free definable sets and by (iv) finitely
many quantifiers over the structure M . For instance, to reiterate the point made
in the last paragraph, in proving the Cell Decomposition Theorem in this fashion,
we must show that for every m and every finite sequence of quantifier-free formulas
ϕ1(x), . . . , ϕk(x) in m free variables, there is a quantifier-free decomposition of
Mm partitioning each of the ϕi(x). In terms of (iii), this involves a universal quan-
tifier over quantifier-free definable sets followed by an existential quantifier over
quantifier-free definable sets. In terms of (iv), this involves a universal quantifier
to say that e.g. the cells in the decomposition are disjoint and another universal
quantifier to say that e.g. ϕi(x) can be written as a finite union of pairwise dis-
joint cells in the decomposition. Since the number of quantifiers in (iii) and (iv)
is fixed in advance, ACA0 can prove that the set on which one is doing induction
exists. In this way, the proofs from van den Dries [147] can be translated word-
for-word into proofs in ACA0 for o-minimal structures M which have ACA0-provable
quantifier-elimination, such as real-closed fields.
Corollary 99. Σ11 − PH0 <I ACA0.
Proof. This follows from Proposition 13, the fact that ACA0 proves the existence
of recursively saturated elementary extensions (cf. Simpson [138] Lemma IX.4.2
p. 379), and the fact that the proof of the previous theorem can be formalized in
ACA0 for o-minimal expansions of real-closed fields with ACA0-provable quantifier-
elimination, such as real-closed fields.
3.4.4 Application to Separably Closed Fields
Remark 100. In the two previous subsections, we applied Theorem 70 to con-
struct models of ∆11 − HP0 on top of various fields, such as certain algebraically
closed fields and o-minimal expansions of real-closed fields. We noted in both
Theorem 77 and Theorem 97 that this construction cannot result in models of
∆11 − BL0. Hence, this raises the question of whether there is some natural field
such that one can apply Theorem 70 to it to obtain models of ∆11 − BL0. In this
section, we isolate certain model-theoretic conditions on a field (such as uniform
elimination of imaginaries) which suffice to ensure that such a construction can
succeed (cf. Theorem 106). Then we note that separably closed fields of finite
imperfection degree satisfy these model-theoretic conditions (cf. Theorem 108).
Definition 101. Suppose that M is an L-structure. Then M has uniform elimi-
nation of imaginaries if for every ∅-definable equivalence relation E on Mn there
is an ∅-definable function f : Mn →Mm for some m > 0 such that
zEy ⇐⇒ f(z) = f(y) (3.124)
Definition 102. Suppose that M is an L-structure. Then M has a ∅-definable
pairing function if there is an ∅-definable injection ι : M2 →M .
Proposition 103. Suppose that M has uniform elimination of imaginaries and
an ∅-definable pairing function. Then for every ∅-definable equivalence relation E
on Mn there is an ∅-definable function f : Mn →M such that
zEy ⇐⇒ f(z) = f(y) (3.125)
Proof. By hypothesis, M has an ∅-definable pairing function ι : M2 → M . Then
define injections jn : Mn →M recursively as follows:
j1(x1) = x1 (3.126)
j2(x1, x2) = ι(x1, x2) (3.127)
jn+1(x1, . . . xn, xn+1) = ι(jn(x1, . . . , xn), xn+1) (3.128)
Finally, given a function f : Mn → Mm for some m > 0 which witnesses the
uniform elimination of imaginaries, simply define f ∗ = jm ◦ f .
Proposition 104. Suppose that M has an ∅-definable pairing function and that
dcl(∅) has at least two elements. Then there is a uniformly computable sequence
of injections ιn : M →M such that n ≠ m implies rng(ιn) ∩ rng(ιm) = ∅.
Proof. Suppose that ι : M2 →M is the ∅-definable injection and that b, c ∈ dcl(∅)
are distinct. Then define injections ιn : M →M recursively as follows:
ι0(x) = ι(c, ι(c, x)) (3.129)
ι2s+1(x) = ι(b, ι2s(x)) (3.130)
ι2s+2(x) = ι(c, ι2s+1(x)) (3.131)
By construction, all the functions ιn : M → M are injections. So it remains to
show by induction on n that rng(ιn) ∩ rng(ιm) = ∅ for all m < n. Clearly
this holds for n = 0. So suppose it holds for n. If n is even then n = 2s and
n + 1 = 2s + 1. Suppose that m < n + 1 is such that rng(ιn+1) ∩ rng(ιm) ≠ ∅.
Then there are x, y such that ιn+1(x) = ιm(y). Expanding this equation on the
left, we have ι(b, ι2s(x)) = ι2s+1(x) = ιn+1(x) = ιm(y). Since ι is an injection
and b ≠ c, the value ιm(y) must be of the form ι(b, ·), so m is odd and by construction
ιm(y) = ι(b, ι2t(y)) for some 2t + 1 = m. Then ιm−1(y) = ι2t(y) = ι2s(x) = ιn(x),
which contradicts our induction hypothesis on n. On the other hand, if n is odd
then n = 2s + 1 and n + 1 = 2s + 2. Suppose that m < n + 1 is such that
rng(ιn+1) ∩ rng(ιm) ≠ ∅. Then there are x, y such that ιn+1(x) = ιm(y). Then
expanding this equation on the left we have ι(c, ι2s+1(x)) = ι2s+2(x) = ιn+1(x) =
ιm(y). There are then two cases. First suppose that m = 0. Then by construction
ιm(y) = ι(c, ι(c, y)). Then ι(b, ι2s(x)) = ι2s+1(x) = ι(c, y), and so b = c, which
is a contradiction. Second, suppose that m > 0. Since ι is an injection and b ≠ c,
m must be even, so by construction ιm(y) = ι(c, ι2t+1(y)) for some 2t + 2 = m.
Then ιm−1(y) = ι2t+1(y) = ι2s+1(x) = ιn(x),
contradicting our induction hypothesis on n.
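The recursion (3.129)-(3.131) can be tried out on a concrete structure. The sketch below assumes, purely for illustration, that M is the set of natural numbers, that ι is the Cantor pairing function, and that b = 0 and c = 1; none of these choices come from the text, and a finite sample can only spot-check the disjointness that the proposition proves in general.

```python
def pair(x, y):
    """Cantor pairing, an injection N^2 -> N standing in for the map ι."""
    return (x + y) * (x + y + 1) // 2 + y

B, C = 0, 1  # two distinct elements standing in for b and c in dcl(∅)

def iota(n, x):
    """ι_n(x), following equations (3.129)-(3.131)."""
    value = pair(C, pair(C, x))                    # ι_0(x) = ι(c, ι(c, x))
    for k in range(1, n + 1):
        # odd k: ι_k = ι(b, ι_{k-1}); even k: ι_k = ι(c, ι_{k-1})
        value = pair(B if k % 2 == 1 else C, value)
    return value

# Finite samples of rng(ι_0), ..., rng(ι_4).
ranges = [{iota(n, x) for x in range(50)} for n in range(5)]

# Pairs (m, n) with m < n whose sampled ranges overlap; expected to be empty.
overlaps = [(m, n) for n in range(5) for m in range(n) if ranges[m] & ranges[n]]
```

Each sampled range has exactly 50 elements, reflecting injectivity of each ιn, and the list of overlapping pairs is empty on this sample, as the proposition predicts.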
Remark 105. The intuitive idea of the proof of the following theorem is very clear.
For, suppose that M has uniform elimination of imaginaries and a ∅-definable
pairing function. Then given a formula θ(x, y) with a set of parameter variables
y of length ℓ > 0, these assumptions yield an ∅-definable function ∂θ : M^ℓ → M
such that
M |= [∀ x θ(x, a) ↔ θ(x, b)] ⇐⇒ ∂θ(a) = ∂θ(b) (3.132)
Intuitively, the idea is to build a model (M,D(M), D(M2), . . . , ∂) of Basic Law V
by setting
∂(θ(·, a)) = ∂θ(a) (3.133)
However, there are two potential problems. First, such a function will not be
well-defined, since a given set X ∈ D(M) will be defined by many formulas
θ1(·, a), θ2(·, b), . . .. Second, it is not obvious that such a function will be in-
jective, which is required by Basic Law V. Overcoming these problems is the only
thing that makes the below proof non-trivial. In particular, the first problem is
overcome simply by fixing beforehand an enumeration of all the potential defining
formulas θ1(x, y), . . . , θn(x, y), . . ., and then defining ∂(X) to be ∂θn(a) for the first
θn(x, a) in the enumeration that defines X for some tuple a. The second problem
is overcome by including additional hypotheses on M which ensure that we can
partition M = ⊔_n Mn and likewise ensure that ∂θn(a) always takes values in Mn.
The previous proposition was in effect devoted to explaining why the hypothesis
of a ∅-definable pairing function with |dcl(∅)| > 1 ensures that we can construct
such a partition.
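The two repairs described in this remark can be seen in miniature on a finite toy. Everything in the sketch below is an assumption made for illustration: the six-element universe, the three one-parameter formulas, and the use of the pair (m, b) in place of a value ιm(fm(b)), where the tag m stands in for the disjoint ranges and the least parameter b stands in for the elimination of imaginaries. It is not the construction of the theorem itself, which needs saturation and works over an infinite structure.

```python
UNIVERSE = range(6)

# A hypothetical enumeration of one-parameter formulas, each given as a
# predicate in (x, a). Index 0 plays the role of θ_1, index 1 of θ_2, etc.
FORMULAS = [
    lambda x, a: x == a,             # singletons {a}
    lambda x, a: x <= a,             # initial segments {0, ..., a}
    lambda x, a: x == 0 or x == a,   # the sets {0, a}
]

def extension(n, a):
    """The set defined by the n-th formula with parameter a."""
    return frozenset(x for x in UNIVERSE if FORMULAS[n](x, a))

def code(n, a):
    """Stand-in for the value of ∂ on the set defined by θ_n(., a)."""
    target = extension(n, a)
    for m in range(len(FORMULAS)):      # first formula in the enumeration ...
        for b in UNIVERSE:              # ... with the least parameter defining the set
            if extension(m, b) == target:
                return (m, b)
    raise AssertionError("unreachable: (n, a) itself defines the target")

names = [(n, a) for n in range(len(FORMULAS)) for a in UNIVERSE]
```

On this toy, two names (n, a) and (n′, a′) receive the same code exactly when they define the same set, which is the combination of well-definedness and injectivity that Basic Law V demands.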
Theorem 106. Suppose that M is a Th(M)-computably saturated structure
such that (i) M has uniform elimination of imaginaries, (ii) M has an ∅-definable
pairing function, and (iii) dcl(∅) has at least two elements. Then there is a
Th(M)-computably uniformly definable function ∂ : D(M) → M such that
(M,D(M), D(M2), . . . , ∂) is a model of ∆11 − BL0.
Proof. To apply Theorem 70 (iii)-(iv), we need to define an injection ∂ : D(M)→
M that is Th(M)-computably uniformly definable. Choose a fixed computable
enumeration of the ∅-formulas θ(x, y) with non-empty set y of parameter variables
of length ℓ_n as θ1(x, y), . . . , θn(x, y), . . .. For each n > 0 and 0 < m ≤ n, consider
the following ∅-definable sets Un,m ⊆ M^{ℓ_n}, where again ℓ_n is the length of the
tuple y in θn(x, y):
U1,1 = M^{ℓ_1} (3.134)
U2,1 = {a ∈ M^{ℓ_2} : ∃ b ∈ M^{ℓ_1} [∀ x θ2(x, a) ↔ θ1(x, b)]} (3.135)
U2,2 = M^{ℓ_2} \ U2,1 (3.136)
U3,1 = {a ∈ M^{ℓ_3} : ∃ b ∈ M^{ℓ_1} [∀ x θ3(x, a) ↔ θ1(x, b)]} (3.137)
U3,2 = {a ∈ M^{ℓ_3} : ∃ b ∈ M^{ℓ_2} [∀ x θ3(x, a) ↔ θ2(x, b)]} \ U3,1 (3.138)
U3,3 = M^{ℓ_3} \ (U3,1 ∪ U3,2) (3.139)
Note that for a fixed n > 0 the sets Un,1, . . . , Un,n partition M^{ℓ_n} and that
the formulas defining these sets are uniformly computable from n. Then define
∅-definable equivalence relations on M^{ℓ_n} as follows:
yEnz ⇐⇒M |= [∀ x θn(x, y)↔ θn(x, z)] (3.140)
Note by definition that any two elements y and z which are En-equivalent are in
the same member of the partition Un,1, . . . , Un,n of M^{ℓ_n}. By Proposition 103, from
θn(x, y) we can uniformly Th(M)-compute a ∅-definable function fn : M^{ℓ_n} → M
such that
M |= [∀ x θn(x, y)↔ θn(x, z)]⇐⇒ yEnz ⇐⇒ fn(y) = fn(z) (3.141)
By Proposition 104, we can uniformly compute a sequence of injections ιn : M →
M with disjoint ranges, and we can define gn = ιn◦fn. Finally, define ∂ : D(M)→
M by setting ∂(θn(·, a)) = c if and only if
⋀_{m=1}^{n} [a ∈ Un,m → (∃ b ∈ M^{ℓ_m} & ∀ x θn(x, a) ↔ θm(x, b) & c = gm(b))] (3.142)
First let us show that ∂ : D(M) → M is a well-defined function. So suppose
that θn(·, a) and c satisfy the right-hand side of equation (3.142) and that θn′(·, a′)
and c′ also satisfy the right-hand side of equation (3.142), and suppose that θn(·, a)
and θn′(·, a′) define the same set. Then we must show that c = c′. Without loss of
generality, n′ ≤ n. If n′ = n, then since θn(·, a) and θn′(·, a′) define the same set,
we have that a and a′ are En-equivalent and hence are in the same set Un,m. Then
by the right-hand side of equation (3.142), we have that there are b, b′ ∈ M^{ℓ_m} such
that
M |= ∀ x θm(x, b)↔ θn(x, a)↔ θn(x, a′)↔ θm(x, b′) (3.143)
c = gm(b) (3.144)
c′ = gm(b′) (3.145)
But by equation (3.143), we have that b and b′
are Em-equivalent, and hence by
equation (3.141), we have that fm(b) = fm(b′) and so by equations (3.144)-(3.145)
we have that
c = gm(b) = ιm ◦ fm(b) = ιm ◦ fm(b′) = gm(b′) = c′ (3.146)
In the case where n′ < n, we have that a ∈ Un,m and a′ ∈ Un′,m′ and so by the
right-hand side of equation (3.142), we have that there are b ∈ M^{ℓ_m}, b′ ∈ M^{ℓ_{m′}} such
that
M |= ∀ x θm(x, b)↔ θn(x, a)↔ θn(x, a′)↔ θm′(x, b′) (3.147)
c = gm(b) (3.148)
c′ = gm′(b′) (3.149)
Then by equation (3.147) and the definition of the sets Un,m, we must have that
m = m′. Then by equation (3.147) again, we have that b and b′ are Em-equivalent,
and, hence, by equation (3.141), we have that fm(b) = fm(b′), and so by equa-
tions (3.148)-(3.149) we have that
c = gm(b) = ιm ◦ fm(b) = ιm ◦ fm(b′) = gm(b′) = c′ (3.150)
Therefore, ∂ : D(M)→M is a well-defined function.
Now let us show that ∂ : D(M)→M is an injection. Suppose that θn(·, a) and
c satisfy the right-hand side of equation (3.142) and that θn′(·, a′) and c′ satisfy
the right-hand side of equation (3.142) and suppose that c = c′. Then we must
show that θn(·, a) and θn′(·, a′) define the same set. We have that a ∈ Un,m and
a′ ∈ Un′,m′ , and by the right-hand side of equation (3.142), we have that there are
b ∈ M^{ℓ_m}, b′ ∈ M^{ℓ_{m′}} such that
M |= ∀ x θn(x, a)↔ θm(x, b) (3.151)
M |= ∀ x θn′(x, a′)↔ θm′(x, b′) (3.152)
gm(b) = c = c′ = gm′(b′) (3.153)
Since gm = ιm ◦ fm and since the functions ιm have disjoint ranges, equation (3.153)
implies that m = m′ and since gm = ιm ◦ fm and ιm is an injection, we have
that equation (3.153) implies that fm(b) = fm′(b′), which by equation (3.141)
implies that θm(·, b) and θm′(·, b′) define the same set. This in turn implies with
equations (3.151)-(3.152) that θn(·, a) and θn′(·, a′) define the same set, which is
what we wanted to show. Hence, in fact ∂ : D(M)→M is an injection.
So, ∂ : D(M) → M is well-defined and indeed an injection. Note that by
its very definition in equation (3.142), we have that ∂ : D(M) → M is Th(M)-
computably uniformly definable. Hence, by Theorem 70 (iii)-(iv), we have that
(M,D(M), D(M2), . . . , ∂) is a model of ∆11 − BL0.
Definition 107. Suppose that k is a field of characteristic p > 0. Then k is a
separably closed field of finite imperfection degree if (i) there is a finite set B ⊆ k
such that the set of monomials {b_1^{m_1} · · · b_e^{m_e} : 0 ≤ m_i < p & b_1, . . . , b_e ∈ B} is a
basis for k over k^p, and if (ii) every f ∈ k[x] such that f′ ≠ 0 has a root in k.
Theorem 108. Suppose that k is a recursively saturated separably closed field
of finite imperfection degree. Then there is a computably uniformly definable
function ∂ : D(k)→ k such that (k,D(k), D(k2), . . . , ∂) is a model of ∆11 − BL0.
Proof. This follows immediately from the fact that such fields satisfy the an-
tecedents of the previous theorem and have a computable theory when names
are added for the finite set B from the previous definition (cf. Messmer [110]
Proposition 4.2 p. 140, p. 143, Remark 4.4 p. 141).
Remark 109. If we knew that all the elements of the proof of the previous
theorem were formalizable in ACA0, then we could infer from the proof of the
above theorem and Proposition 13 that we have ∆11 − BL0 <I ACA0. It is clear
from the proof that this comes down to determining whether or not the uniform
elimination of imaginaries for separably closed fields of finite imperfection degree
is provable in ACA0.
3.5 Further Questions
Question 110. In Figure 3.1, we summarized what is known about the provability
relation. Two questions which remain open are the following: does ∆11 − BL0 imply
Σ11 − LB0 and does Π11 − HP0 imply Σ11 − PH0?
Question 111. In Remark 78, we noted that if Ax’s Theorem 72 is provable in
ACA0, then we would have another proof of ∆11 − HP0 <I ACA0 besides the proof
from Corollary 99. Hence, is Ax’s Theorem 72 provable in ACA0?
Question 112. In Remark 109, we noted that if the uniform elimination of
imaginaries for separably closed fields is provable in ACA0, then we would have
∆11 − BL0 <I ACA0. Hence, is the uniform elimination of imaginaries for separably
closed fields provable in ACA0?
Question 113. Is HP2 faithfully interpretable in PA2? See the discussion of this
question at Remark 31.
Question 114. The results in Heck [64], Ganea [50], and Visser [149] imply that
ABL0 is mutually interpretable with Robinson’s Q. Is ∆11 − BL0 mutually inter-
pretable with Robinson’s Q?
Question 115. What is the exact interpretability strength of AHP0 and ∆11 − HP0?
Are these theories interpretable in Robinson’s Q?
Question 116. In § 2.2, and in particular around equation (3.20), we pointed out
that there is no function symbol in our language for the mapping (R, n) ↦ #(Rn),
where R is a binary relation and Rn = {m : Rnm}. The inclusion of such
a function symbol will not affect systems which contain the ∆11-comprehension
schema, since the graph of this function is ∆11-definable (cf. equation (3.20)).
However, in Propositions 55-56, we pointed out that AHP0 and ABL0 do not prove
the existence of the graph of this function (R, n) ↦ #(Rn), in the sense that
AHP0 and ABL0 do not prove that the binary relation {(n,m) : #(Rn) = m}
exists for every binary relation R. Does the addition of this function symbol
affect the interpretability strength of AHP0 and ABL0? In particular, do the Heck-
Visser-Ganea results about the mutual interpretability of ABL0 and Robinson’s Q
mentioned in § 3.1.5 still hold if we add a function symbol for (R, n) ↦ #(Rn)?
CHAPTER 4
DENJOY INTEGRATION: DESCRIPTIVE SET THEORY AND MODEL
THEORY
4.1 Introduction
One of the classical results of descriptive set theory is Mazurkiewicz’s result
that Diff[a, b], the set of everywhere differentiable real-valued functions on [a, b], is a
Π11-complete subset of the Polish space C[a, b] of continuous real-valued functions
on [a, b] (cf. Kechris [84] § 33.D Theorem 33.9 p. 248). The set {F ∈ Diff[a, b] :
F (a) = 0} is in one-one correspondence with Derv[a, b], the set of derivatives
of everywhere differentiable real-valued functions on [a, b]. Building on unpublished
work of Ajtai, in the 1980s it was shown by Dougherty and Kechris that Derv[a, b],
viewed as a subspace of the countable product space (C[a, b])ω of the Polish space
C[a, b], is co-analytic but not analytic (and indeed not even analytic on the co-
analytic subspace of (C[a, b])ω of sequences which converge pointwise) (cf. [28]
Theorem 1-2 p. 147, [83] Theorem 3.1-3.2 p. 310). This work raises the natural
question of the descriptive set theory complexity of integration, and this is how
Dougherty and Kechris frame the question:
A second problem is related to the definability aspects of the so-called “descriptive definitions of integrals” [. . . ]. These are essentially implicit definitions like the original one of the primitive. For example, the Lebesgue integral F of an integrable function f can be defined as
the unique (up to a constant) F such that (i) F is absolutely continuous and (ii) F ′ = f(x) for almost all x. By replacing in (i) absolute continuity by more general conditions, one can obtain descriptive definitions of integrals involving any derivative. The question is whether these conditions can possibly be Borel ([28] p. 166).
The “more general” condition which Dougherty and Kechris refer to is known as
“generalized absolute continuity in the restricted sense” or ACG∗[a, b] (cf. Defi-
nition 121 and Theorems 128 and 134), and the resulting integrals are known as
Denjoy integrals. These turn out to be equivalent to the Henstock-Kurzweil inte-
grals and the Perron integrals (cf. Theorem 138 and Gordon [56] esp. Chapter 11
and Swartz [142]). In this chapter, Dougherty and Kechris’ question is answered
by showing that ACG∗[a, b] is a coanalytic but not analytic subset of the Polish
space of real-valued continuous functions C[a, b] (cf. Corollary 197). Using the
same methods, it is also shown that the operation of indefinite Denjoy integration
is coanalytic but not analytic. In particular, it is shown that the relation “f is
Denjoy integrable and F is equal to its indefinite integral” is a co-analytic but not
analytic relation on the product space M [a, b]×C[a, b], where M [a, b] is the Polish
space of real-valued measurable functions on [a, b] and where C[a, b] is again the
Polish space of real-valued continuous functions on [a, b] (cf. Corollary 195 and
Figure 4.1).
Dougherty and Kechris’ question was essentially a question of how difficult it
is to define the Denjoy integral. One can also ask about the complexity of the sets
which are defined by this integral. Here the appropriate setting seems to be that
of model theory, where one asks what can be defined in a first-order way from the
Denjoy integral, and a natural language for this is the language of R[X]-modules,
where the indeterminate X is interpreted as the indefinite Denjoy integral, so
that the atomic formulas are a very elementary type of integral equation. One
of the basic questions to ask here is whether there is any first-order difference
between the Denjoy integrable functions, the Lebesgue integrable functions, and
the continuous functions with the Riemann integral. This question is answered
here in the negative, in that it is shown that these R[X]-modules are elementar-
ily equivalent, and taken as Q[X]-modules their complete theory is computable
(Corollary 226). Hence, the conclusion of this paper is that from a descriptive set
theory standpoint, the Denjoy integrals are much more difficult to describe than
the Lebesgue or Riemann integrals, while from an admittedly elementary model-
theoretic standpoint, these integrals are indistinguishable. For suggestions as to
less elementary model-theoretic standpoints, see the further questions in § 4.5.
4.2 Background
The primary goal of this section is to review and collate the background ma-
terial on Denjoy integration which will be employed in subsequent sections. The
basic idea of the Denjoy integral is that it generalizes the Lebesgue integral by
replacing the notion of absolute continuity with the notion of “generalized abso-
lute continuity in the restricted sense”. Hence in § 4.2.1 this notion is defined and
several of its basic properties are recorded. Then in § 4.2.2, the definition of the
Denjoy integral is stated, and the manner in which this integral generalizes the
Lebesgue integral is explicitly discussed (cf. Theorems 128 and 134). In § 4.2.2,
several equivalent characterizations of the Denjoy integral are also recalled, such
as its equivalence with the Henstock-Kurzweil integral, and several of the basic
properties of this integral are noted. Finally, in § 4.2.3, two important lemmas
about the Denjoy integral are noted: namely, the Improper Integrals Lemma and
Lebesgue’s Lemma (Lemmas 143 and 146 and 149). These two lemmas allow for
the definition of a sequence of subspaces of the Denjoy integrable functions which
will prove important in what follows (cf. Definition 153 and Figure 4.1). Further,
in § 4.2.3, attention is paid to the exact closure properties possessed by these
subspaces, as these properties will be particularly important for the model theory
results discussed later (cf. Remark 199).
Definition 117. Let M [a, b] be the space of measurable real-valued functions on
[a, b] under the equivalence relation of almost everywhere equality. Let C[a, b] be
the space of continuous real-valued functions on [a, b]. Let K[a, b] be the space of
closed subsets of [a, b]. Let L1[a, b] be the space of Lebesgue integrable functions
on [a, b]. (See Figure 4.1, and see Remarks 180-181 for the Polish structure on
C[a, b], K[a, b] and M [a, b]).
Remark 118. In what follows we always identify almost everywhere equal el-
ements of M [a, b]. Of course, several of the results also hold for the pointwise
functions, but especially when we consider the topology on M [a, b] in § 4.3.3, it
will be important to identify almost everywhere equal elements of M [a, b].
4.2.1 Absolutely Continuous Functions and Generalizations
Definition 119. Suppose that K ⊆ [a, b]. Then a K-edged sub-partition D of [a, b]
is a finite non-empty collection J1, . . . , Jn of non-overlapping closed sub-intervals
of [a, b] which have both their endpoints in K. A sub-partition D of [a, b] is called
a partition if [a, b] = ⋃_{J∈D} J. The length of a closed interval J will be denoted by
its Lebesgue measure µ(J).
Figure 4.1. Containment Diagram for Subsets of M [a, b] and C[a, b]
Remark 120. The above terminology is introduced purely for the purpose of
not having to explicitly write out sub-partitions as [a1, b1], . . . , [an, bn], which can
be quite cumbersome when one is quantifying over such sub-partitions, as in the
following definitions.
Definition 121. Suppose that F : [a, b] → R and K ⊆ [a, b].
(i) Then F is said to be absolutely continuous on K, written F ∈ AC(K),
if for every ε > 0 there is δ > 0 such that for all K-edged sub-partitions D
of [a, b], if $\sum_{J \in D} \mu(J) < \delta$ then $\sum_{J \in D} |F(\max(J)) - F(\min(J))| < \varepsilon$.
(ii) Further, F is said to be absolutely continuous in the restricted sense on K,
written F ∈ AC∗(K), if for every ε > 0 there is δ > 0 such that for all
K-edged sub-partitions D of [a, b], if $\sum_{J \in D} \mu(J) < \delta$ then $\sum_{J \in D} \omega(F, J) < \varepsilon$,
where $\omega(F, J) = \sup\{|F(x) - F(y)| : x, y \in J\}$.
(iii) Finally, F is said to be generalized absolutely continuous in the restricted
sense, written F ∈ ACG∗(K), if there are Kn ∈ K[a, b] such that
$K = \bigcup_n K_n$ and F ∈ AC∗(Kn) for each n.
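The two sums in Definition 121 (i) and (ii) can be compared numerically. The following is a minimal sketch and not part of the original text: the helper names `endpoint_variation`, `oscillation` and `oscillation_sum` are hypothetical, and the supremum in ω(F, J) is only approximated from below by sampling J on a finite grid.

```python
import math

def endpoint_variation(F, D):
    """Sum of |F(max J) - F(min J)| over the intervals J = (lo, hi) in D."""
    return sum(abs(F(hi) - F(lo)) for lo, hi in D)

def oscillation(F, lo, hi, samples=1000):
    """Grid approximation (a lower bound) of sup{|F(x) - F(y)| : x, y in [lo, hi]}."""
    values = [F(lo + i * (hi - lo) / samples) for i in range(samples + 1)]
    return max(values) - min(values)

def oscillation_sum(F, D, samples=1000):
    """Sum of the approximate oscillations omega(F, J) over J in D."""
    return sum(oscillation(F, lo, hi, samples) for lo, hi in D)

# Illustrative choice (an assumption): one full period of a sine wave.
F = lambda x: math.sin(2 * math.pi * x)
D = [(0.0, 1.0)]
ac_sum = endpoint_variation(F, D)       # the sum used in AC(K)
ac_star_sum = oscillation_sum(F, D)     # the sum used in AC*(K)
```

For this single interval the endpoint sum is essentially 0 while the oscillation sum is essentially 2, which illustrates why the sum in (ii) is in general larger than the sum in (i).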
Remark 122. In the above definitions, note that no topological restrictions are
placed on F or K, although typically we will restrict ourselves to F ∈ C([a, b])
and K ∈ K[a, b].
Remark 123. Note that F ∈ AC([a, b]) or F ∈ AC∗([a, b]) trivially implies
F ∈ C[a, b], but F ∈ ACG∗([a, b]) does not in general imply that F ∈ C([a, b]).
For example consider F = 0 on $[0, \tfrac{1}{2})$ and F = 1 on $[\tfrac{1}{2}, 1]$. Let $K_1 = [\tfrac{1}{2}, 1]$ and let
$K_n = [0, \tfrac{1}{2} - \tfrac{1}{n+1}]$ for n > 1. Then $[0, 1] = \bigcup_n K_n$ and F ∈ AC∗(Kn) for each n, and hence
F ∈ ACG∗([0, 1]), but by construction F ∉ C([0, 1]). However, it turns out that
functions in ACG∗([a, b]) are differentiable almost everywhere:
Proposition 124. If F ∈ ACG∗([a, b]) then F is differentiable almost everywhere
on [a, b]
Proof. See Gordon [56] Corollary 6.19 p. 100.
Remark 125. The following proposition enumerates several elementary proper-
ties of absolutely continuous functions which shall be appealed to at various points
in what follows.
Proposition 126. Suppose that F ∈ C([a, b]).
(i) If E = [c, d] then F ∈ AC(E) if and only if F ∈ AC∗(E).
(ii) If E ∈ K[a, b] and F ∈ AC∗(E) and $(a, b) - E = \bigsqcup_n (c_n, d_n)$ then $\sum_n \omega(F, [c_n, d_n]) < \infty$.
(iii) If E ∈ K[a, b] and Q ⊆ E is dense in E, then F ∈ AC∗(E) if and only if
F ∈ AC∗(Q).
(iv) If k ∈ R and F ∈ AC∗(K) then F + k ∈ AC∗(K).
(v) If k ≠ 0 then F ∈ AC∗(K) if and only if kF ∈ AC∗(K).
(vi) If K ∈ K[a, b] and F ∈ AC∗(K) then F ∈ AC∗(K ∪ {a} ∪ {b}).
(vii) If L ⊆ K then F ∈ AC∗(K) implies F ∈ AC∗(L).
(viii) If K ⊆ [c, d] ⊆ [a, b], F ∈ AC∗(K), G = F on [c, d] for G ∈ C([a, b]) then
G ∈ AC∗(K).
Proof. For (i), the proof splits into two directions. For the left-to-right direction of
(i), since F is continuous on the bounded interval [a, b], given J ∈ D we define the
subinterval IJ ⊆ J so that |F (max(IJ))− F (min(IJ))| = ω(F, J). For the right-
to-left direction of (i), simply note that |F (max(J))− F (min(J))| ≤ ω(F, J).
For (ii), simply apply the definition of AC∗(E) for ε = 1 to obtain a δ > 0.
Then there is N > 0 such that $\sum_{i=n}^{\infty} \mu([c_i, d_i]) < \delta$ for all n ≥ N. Then
$\sum_n \omega(F, [c_n, d_n]) = \sum_{n<N} \omega(F, [c_n, d_n]) + \sum_{n \geq N} \omega(F, [c_n, d_n])$, and this sum in
turn is $\leq \sum_{n<N} \omega(F, [c_n, d_n]) + 1$.
For (iii), this elementary fact is stated without proof in Gordon [56] Theorem 6.2 (d)
pp. 90-91, and a proof is included here merely for the sake of completeness.
First, note that the left-to-right direction holds trivially, since any Q-edged
sub-partition is automatically an E-edged sub-partition. For the right-to-left
direction of (iii), suppose that ε > 0. Choose δ > 0 such that for all Q-edged
sub-partitions D of [a, b] we have that $\sum_{J \in D} \mu(J) < \delta$ implies that $\sum_{J \in D} \omega(F, J) < \frac{\varepsilon}{8}$.
Let D be an E-edged sub-partition of [a, b] such that $\sum_{J \in D} \mu(J) < \delta$. Since
F ∈ C([a, b]), choose η > 0 such that $|F(x) - F(y)| < \frac{\varepsilon}{8|D|}$ whenever |x − y| < η.
Partition D into two E-edged sub-partitions D0 and D1 such that Di has the
property that no two intervals in it have common endpoints, i.e., list out the elements
of D in order and let D0 be the even ones in the list and let D1 be the odd ones.
Clearly it suffices to show that $\sum_{J \in D_i} \omega(F, J) < \frac{\varepsilon}{2}$. For each J ∈ Di, let mid(J)
be its midpoint and choose non-overlapping Q-edged intervals $I_J$ so that
(a) $\min(I_J) \in (\min(J) - \eta, \mathrm{mid}(J)) \cap (\min(J) - \eta, \min(J) + \eta)$
(b) and $\max(I_J) \in (\mathrm{mid}(J), \max(J) + \eta) \cap (\max(J) - \eta, \max(J) + \eta)$
(c) and $\sum_{J \in D_i} \mu(I_J) < \delta$.
Then $\{I_J : J \in D_i\}$ is a Q-edged sub-partition of [a, b] such that $\sum_{J \in D_i} \mu(I_J) < \delta$,
and hence by choice of δ we have that $\sum_{J \in D_i} \omega(F, I_J) < \frac{\varepsilon}{8}$. It suffices to show that
for all J ∈ Di we have $\omega(F, J) \leq \omega(F, I_J) + \frac{\varepsilon}{4|D|}$, since this implies
$$\sum_{J \in D_i} \omega(F, J) \leq \sum_{J \in D_i} \Big(\omega(F, I_J) + \frac{\varepsilon}{4|D|}\Big) = \Big(\sum_{J \in D_i} \omega(F, I_J)\Big) + \Big(\sum_{J \in D_i} \frac{\varepsilon}{4|D|}\Big) < \frac{\varepsilon}{8} + \frac{\varepsilon}{4} < \frac{\varepsilon}{2}.$$
So we let J ∈ Di and show that $\omega(F, J) \leq \omega(F, I_J) + \frac{\varepsilon}{4|D|}$. There are four cases, depending
on how J and $I_J$ relate:
(C1) min(IJ) ≤ min(J) < mid(J) < max(J) ≤ max(IJ).
(C2) min(J) < min(IJ) < mid(J) < max(J) ≤ max(IJ).
(C3) min(IJ) ≤ min(J) < mid(J) < max(IJ) < max(J).
(C4) min(J) < min(IJ) < mid(J) < max(IJ) < max(J).
In case C1, we have $I_J \supseteq J$ and hence $\omega(F, J) \leq \omega(F, I_J)$. In case C2, we can
conclude that $\omega(F, J) \leq \omega(F, [\min(J), \min(I_J)]) + \omega(F, [\min(I_J), \max(J)]) \leq \frac{\varepsilon}{8|D|} + \omega(F, I_J)$.
In case C3, $\omega(F, J) \leq \omega(F, [\min(J), \max(I_J)]) + \omega(F, [\max(I_J), \max(J)]) \leq \omega(F, I_J) + \frac{\varepsilon}{8|D|}$.
In case C4, we have that $\omega(F, J) \leq \omega(F, [\min(J), \min(I_J)]) + \omega(F, [\min(I_J), \max(I_J)]) + \omega(F, [\max(I_J), \max(J)]) \leq \frac{\varepsilon}{8|D|} + \omega(F, I_J) + \frac{\varepsilon}{8|D|} \leq \omega(F, I_J) + \frac{\varepsilon}{4|D|}$.
Hence, in all four cases, we are done, and so we have established
the right-to-left direction of (iii).
For (iv), note that ω(F, J) = ω(F + k, J) since one has |F (x)− F (y)|=
|(F (x) + k)− (F (y) + k)|. Likewise, for (v), note that ω(kF, J) = |k|ω(F, J)
since |kF (x)− kF (y)| = |k| |F (x)− F (y)|.
For (vi), suppose that K ∈ K[a, b] and F ∈ AC∗(K). It suffices to show that
F ∈ AC∗(K ∪ {a}). Since K is closed, if a is a limit point of K, then a is already
in K, and we are done. Hence, we can suppose that there is some η > 0 such that
(a, a + η) ∩K = ∅. For ε > 0, choose δ > 0 corresponding to F ∈ AC∗(K) from
Definition 121 (ii) such that δ < η. Suppose that D is a K ∪ {a}-edged
sub-partition of [a, b] such that $\sum_{J \in D} \mu(J) < \delta < \eta$. Since each µ(J) < η, it cannot be
the case that one of the endpoints of J is a. Hence D is a K-edged sub-partition of
[a, b], from which we conclude that $\sum_{J \in D} \omega(F, J) < \varepsilon$ by the hypothesis on δ > 0.
Hence, in fact F ∈ AC∗(K ∪ {a}).
For (vii), this follows immediately from the definitions. For (viii), simply note
that any K-edged sub-partition D is such that if J ∈ D then J ⊆ [c, d], and since
the values of F and G are the same on this interval, it follows that F ∈ AC∗(K)
implies G ∈ AC∗(K).
4.2.2 Basic Properties of the Denjoy Integral
Remark 127. The following is a version of the Fundamental Theorem of Calculus
for Lebesgue Integrals (cf. Folland [43] Theorem 3.35 p. 106).
Theorem 128. Suppose that f ∈ M [a, b] and F ∈ C[a, b] and F (a) = 0. Then
the following are equivalent:
(i) f ∈ L1[a, b] and $F(x) = \int_a^x f$
(ii) F ∈ AC([a, b]) and F ′ = f a.e.
Remark 129. By Proposition 126 (i), it follows that F ∈ AC([a, b]) if and only
if F ∈ AC∗([a, b]). Hence, the above theorem can be restated as follows:
Theorem 130. Suppose that f ∈ M [a, b] and F ∈ C[a, b] and F (a) = 0. Then
the following are equivalent:
(i) f ∈ L1[a, b] and $F(x) = \int_a^x f$
(ii) F ∈ AC∗([a, b]) and F ′ = f a.e.
Remark 131. Stated in this way, this theorem motivates the following definition
of the Denjoy integral (cf. Gordon [56] Definition 7.1 p. 108, Peng-Yee [121]
Definition 6.8 p. 30).
Definition 132. Suppose that f : [a, b] → R. Then f is Denjoy integrable or
f ∈ Den[a, b] if and only if there is F ∈ C([a, b]) ∩ ACG∗([a, b]) such that F ′ = f
a.e.
Remark 133. It can be shown that if F ∈ C([a, b]) ∩ ACG∗[a, b] is such that
F ′ = 0 a.e. then F is constant everywhere on [a, b] (cf. Gordon [56] p. 108
and Corollary 6.26 p. 104, Peng-Yee [121] Theorem 6.11 p. 30). From this it
follows that for each f ∈ M [a, b], there is at most one F ∈ C([a, b]) ∩ ACG∗[a, b]
such that F ′ = f a.e. and F (a) = 0. Hence, when such a function exists, it
is called the indefinite Denjoy integral of f , and we write $F(x) = \int_a^x f$. Hence,
it follows trivially from these definitions that we have the following analogue of
Theorem 128:
Theorem 134. Suppose that f ∈ M [a, b] and F ∈ C[a, b] and F (a) = 0. Then
the following are equivalent:
(i) f ∈ Den[a, b] and $F(x) = \int_a^x f$
(ii) F ∈ ACG∗([a, b]) and F ′ = f a.e.
Remark 135. This analogy between Theorems 130 and 134 may not be enough
to convince one that the Denjoy integral in fact deserves to be called an
integral. It turns out that the Denjoy integral is equivalent to the
Henstock-Kurzweil integral, which directly generalizes the notion of the Riemann integral:
Definition 136. Suppose that f : [a, b] → R and suppose that a ≤ x ≤ b.
Then f on [a, x] is Henstock-Kurzweil integrable with value F(x) if for every
ε > 0 there is a family of strictly positive values $\{\delta_t\}_{t \in [a,x]}$ such that
$\left| \sum_{i=1}^{N} f(t_i)(b_i - a_i) - F(x) \right| < \varepsilon$ for all partitions $[a_1, b_1], \ldots, [a_N, b_N]$ of [a, x]
and all points $t_1, \ldots, t_N$ with $t_i - \delta_{t_i} < a_i \leq t_i \leq b_i < t_i + \delta_{t_i}$.
Remark 137. Hence, one sees immediately from this definition that every Rie-
mann integrable function f is Henstock-Kurzweil integrable with all the δt set
equal to a fixed constant δ > 0. It is easy to see the motivation for the Henstock-
Kurzweil integral by pursuing this analogy: for, the basic idea of the Riemann
integral is that given an error threshold ε > 0, there is a fixed width δ > 0 such
that so long as one takes boxes with width less than this fixed width, then the
estimates for the area under the curve in terms of these boxes will be within the
error threshold. Likewise, the idea of the Henstock-Kurzweil integral is that one
is allowed to vary the width-estimates along the domain of the integrable func-
tion, perhaps requiring greater precision on those areas of the domain where the
integrable function oscillates more frequently between large positive and nega-
tive values. This idea is very different from the motivating idea of the Denjoy
integral, which was defined to generalize the Fundamental Theorem of Calculus.
It is thus surprising that one can demonstrate that the Denjoy integral and the
Henstock-Kurzweil integral are one and the same:
Theorem 138. Suppose that f : [a, b] → R and F : [a, b] → R. Then the
following are equivalent:
(i) f is Denjoy integrable with $F(x) = \int_a^x f$.
(ii) f is Henstock-Kurzweil integrable with value F (x).
Proof. See Gordon [56] Chapter 11, and in particular Theorems 11.3-11.4 pp. 171-
173. See also Peng-Yee [121] Theorem 6.12-6.13 pp. 31-32.
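The tagged sums appearing in Definition 136 can be sketched in a few lines. This is only an illustration under stated assumptions: the names `is_gauge_fine`, `tagged_sum` and `uniform_midpoint_partition` are hypothetical, the gauge is taken to be constant (the Riemann case), and the integrand f(t) = 2t on [0, 1] is chosen purely for convenience.

```python
def is_gauge_fine(tagged, gauge):
    """Check t - gauge(t) < a <= t <= b < t + gauge(t) for each (a, t, b)."""
    return all(t - gauge(t) < a <= t <= b < t + gauge(t) for a, t, b in tagged)

def tagged_sum(f, tagged):
    """Sum of f(t_i) * (b_i - a_i) over the tagged intervals (a_i, t_i, b_i)."""
    return sum(f(t) * (b - a) for a, t, b in tagged)

def uniform_midpoint_partition(a, x, n):
    """Partition [a, x] into n equal intervals, each tagged at its midpoint."""
    h = (x - a) / n
    return [(a + i * h, a + (i + 0.5) * h, a + (i + 1) * h) for i in range(n)]

# Assumed example: f(t) = 2t on [0, 1], whose integral is 1.
f = lambda t: 2.0 * t
gauge = lambda t: 0.01            # constant gauge, as for the Riemann integral
tagged = uniform_midpoint_partition(0.0, 1.0, 1000)
fine = is_gauge_fine(tagged, gauge)
approx = tagged_sum(f, tagged)
```

A non-constant `gauge` could be passed in the same way; producing a partition that is fine for an arbitrary gauge is a separate matter that this sketch does not address.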
Remark 139. There is a certain infelicity in the statement of the above theo-
rem, in that part (ii) of the theorem is stated in terms of the “values” from the
Definition 136 of the Henstock-Kurzweil integral. This was done merely for the
sake of not having to introduce subscripted integral signs. That is, one could
have stated the above theorem by subscripting the indefinite Henstock-Kurzweil
integral with “HK”, subscripting the indefinite Denjoy integral with “Den”, and
then stating in the above theorem that these two indefinite integrals are the same.
Such a formalism tends to obscure the main point: if one proceeds on the basis
of the definition of the Denjoy integral from Definition 132 and the definition of
the Henstock-Kurzweil integral from Definition 136, then one obtains one and the
same class of integrable functions and one and the same values for these integrals.
Given this equivalence, one can quickly enumerate several elementary properties of
the Denjoy integral, many of which are easily proven using the Henstock-Kurzweil
characterization:
Proposition 140.
(i) L1[a, b] ⊆ Den[a, b] and the values of the integrals are the same.
(ii) Derv[a, b] ⊆ Den[a, b] and $\int_a^b F' = F(b) - F(a)$.
(iii) Den[a, b] ⊆ M [a, b].
(iv) If f ∈ Den[a, b] then there are Kn ∈ K[a, b] with $[a, b] = \bigcup_n K_n$ and $f\chi_{K_n} \in L^1[a, b]$.
(v) If f ∈ Den[a, b] and $F(x) = \int_a^x f$ then F ∈ C[a, b].
(vi) If f ∈ Den[a, b] and $F(x) = \int_a^x f$ then F′ = f a.e.
Proof. For (i) see Pfeffer [122] Proposition 4. For (ii) see Swartz [142] Theo-
rem 5 p. 6. For (iii) see Peng-Yee [121] Theorem 5.10 pp. 23-24. For (iv) see
Gordon [56] Theorem 9.18 pp. 148-149. For (v), see either Definition 132 and
the subsequent remark, or Swartz [142] Corollary 2 p. 25. For (vi) see either
Definition 132 and the subsequent remark, or Swartz [142] Theorem 2 p. 135.
4.2.3 Lebesgue’s Lemma and the Subspaces
Definition 141. A subset X ⊆ M [a, b] is called subinterval-closed if f ∈ X and
(c, d) ⊆ (a, b) implies fχ(c,d) ∈ X .
Remark 142. Note that Den[a, b] is subinterval-closed. See, for example, Swartz
[142] Theorem 7 p. 16.
Lemma 143. (Improper Integrals Lemma) Suppose f ∈ M [a, b]. If $f\chi_{[c,b]} \in \mathrm{Den}[a, b]$
for every c ∈ (a, b), then f ∈ Den[a, b] with $\int_a^b f = L$ if and only
if $\lim_{c \searrow a^+} \int_c^b f$ exists and is equal to L. Likewise, if $f\chi_{[a,c]} \in \mathrm{Den}[a, b]$ for every
c ∈ (a, b), then f ∈ Den[a, b] with $\int_a^b f = L$ if and only if $\lim_{c \nearrow b^-} \int_a^c f$ exists and is
equal to L.
Proof. Cf. Swartz [142] Chapter 3 Theorem 4 pp. 25-26.
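As a simple illustration of how the Improper Integrals Lemma is applied, consider the following worked example. It is not taken from the original text: the function f(x) = x^{-1/2} on (0, 1] with f(0) = 0 is an assumed example, and it happens to be Lebesgue integrable already, so the lemma is not needed for it but is easy to check against.

```latex
% Assumed example: [a,b] = [0,1], f(x) = x^{-1/2} for 0 < x \le 1 and f(0) = 0.
% For each c in (0,1), f\chi_{[c,1]} is bounded and measurable, hence in L^1[0,1].
\[
  \int_c^1 x^{-1/2}\,dx = 2 - 2\sqrt{c}
  \quad\text{for } 0 < c < 1,
  \qquad\text{so}\qquad
  \lim_{c \searrow 0^+} \int_c^1 x^{-1/2}\,dx = 2 .
\]
```

Since each fχ[c,1] lies in L1[0, 1] ⊆ Den[0, 1] (cf. Proposition 140 (i)), the first half of the lemma gives f ∈ Den[0, 1] with integral 2, in agreement with the Lebesgue integral of f.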
Definition 144. Suppose that X ⊆ Den[a, b]. Then f ∈ Den[a, b] is an improper
integral of X if there is a countable sequence (an, bn) ⊆ (a, b) such that (i) $(a, b) = \bigcup_n (a_n, b_n)$
and $(a_n, b_n) \subseteq (a_{n+1}, b_{n+1})$ and (ii) $f\chi_{(a_n,b_n)} \in \mathcal{X}$ and (iii) $\lim_{c \searrow a^+} \int_c^{b_1} f$
exists, and (iv) $\lim_{c \nearrow b^-} \int_{a_1}^c f$ exists. Further, let Lim(X) be the set of improper
integrals of X.
Proposition 145. If X is a subset of Den[a, b] which is subinterval-closed and
which is closed under scalar multiplication, then Lim(X) is a subinterval-closed
subset of Den[a, b] which contains X and which is closed under scalar multiplication.
Further, if X is closed under addition, then so is Lim(X).
Proof. First we show that Lim(X) contains X. Since X is a subset of Den[a, b],
if f ∈ X then the left-to-right direction of the Improper Integrals Lemma implies
that $\lim_{c \searrow a^+} \int_c^{b_1} f$ and $\lim_{c \nearrow b^-} \int_{a_1}^c f$ exist for any countable sequence (an, bn) ⊆
(a, b) such that $(a, b) = \bigcup_n (a_n, b_n)$ and $(a_n, b_n) \subseteq (a_{n+1}, b_{n+1})$. Hence, by choosing
any such sequence as a witness, and by using the fact that X is subinterval-closed,
we have that Lim(X) contains X.
Second we note that Lim(X) is closed under scalar multiplication. So suppose
that f is in Lim(X) with the associated sequence of intervals (an, bn) and
suppose that s ∈ R is the scalar multiple. Since $f\chi_{(a_n,b_n)} \in \mathcal{X}$ and since X is
by hypothesis closed under scalar multiplication, it follows that $sf\chi_{(a_n,b_n)} \in \mathcal{X}$,
and $\lim_{c \searrow a^+} \int_c^{b_1} sf = s \cdot \lim_{c \searrow a^+} \int_c^{b_1} f$ exists, and $\lim_{c \nearrow b^-} \int_{a_1}^c sf = s \cdot \lim_{c \nearrow b^-} \int_{a_1}^c f$
exists. Hence, Lim(X) is closed under scalar multiplication.
Third we show that it is subinterval-closed. So suppose that f is in Lim(X)
with associated sequence of intervals (an, bn). Let (u, v) ⊆ (a, b). Since X is
subinterval-closed, it follows that $(f\chi_{(u,v)})\chi_{(a_n,b_n)} = (f\chi_{(a_n,b_n)})\chi_{(u,v)}$ is in X. Since
Den[a, b] is subinterval-closed and X is a subset of Den[a, b], we further have that
$f\chi_{(u,v)}$ is in Den[a, b]. Applying the left-to-right direction of the Improper Integrals
Lemma, we then have that $\lim_{c \searrow a^+} \int_c^{b_1} f\chi_{(u,v)}$ and $\lim_{c \nearrow b^-} \int_{a_1}^c f\chi_{(u,v)}$ exist. Hence,
it follows that $f\chi_{(u,v)}$ is in Lim(X), so that Lim(X) is subinterval-closed.
Finally, supposing that X is closed under addition, we show that Lim(X)
is closed under addition. So suppose that f, g are in Lim(X) with associated
sequences of intervals (an, bn) and (cn, dn). It must be shown that f + g is in
Lim(X). Choose N > 0 such that $(a_N, b_N) \cap (c_N, d_N) \neq \emptyset$, and define a sequence of
intervals by $(u_n, v_n) = (a_{N+n}, b_{N+n}) \cap (c_{N+n}, d_{N+n})$. Since X is subinterval-closed
and contains $f\chi_{(a_n,b_n)}$ and $g\chi_{(c_n,d_n)}$, it likewise contains $f\chi_{(u_n,v_n)}$ and $g\chi_{(u_n,v_n)}$.
Since X is closed under addition, it contains $(f + g)\chi_{(u_n,v_n)}$. Since X is a subspace
of Den[a, b], the left-to-right direction of the Improper Integrals Lemma implies
that each of $\lim_{c \searrow a^+} \int_c^{v_1} f$ and $\lim_{c \nearrow b^-} \int_{u_1}^c f$ and $\lim_{c \searrow a^+} \int_c^{v_1} g$ and $\lim_{c \nearrow b^-} \int_{u_1}^c g$
exist, which in turn implies that $\lim_{c \searrow a^+} \int_c^{v_1} (f + g)$ and $\lim_{c \nearrow b^-} \int_{u_1}^c (f + g)$ exist. Hence,
in fact Lim(X) is closed under addition whenever X is closed under addition.
Lemma 146. (Lebesgue’s Lemma, first version) Suppose that f ∈ M [a, b] and
K ∈ K[a, b] and $(a, b) - K = \bigsqcup_{n=1}^{\infty} (c_n, d_n)$. Further suppose that $f\chi_K \in L^1[a, b]$,
$f\chi_{[c_n,d_n]} \in \mathrm{Den}[a, b]$ and $\sum_{n=1}^{\infty} \omega(\int_{c_n}^x f, [c_n, d_n]) < \infty$. Then f ∈ Den[a, b] and
$\int_a^b f = \int_K f + \sum_{n=1}^{\infty} \int_{c_n}^{d_n} f$.
Proof. Cf. Pfeffer [122] Lemma 8, Peng-Yee [121] Theorem 7.1 and Corollary 7.11
and Gordon [56] Theorem 9.22 pp. 151-152.
Definition 147. Suppose that X ⊆ Den[a, b]. Then f ∈ Den[a, b] is given by
the first version of Lebesgue’s Lemma from X if there is a K ∈ K[a, b] with
$(a, b) - K = \bigsqcup_{n=1}^{\infty} (c_n, d_n)$ such that $f\chi_K \in L^1[a, b]$ and $f\chi_{(c_n,d_n)} \in \mathcal{X}$, and
$\sum_{n=1}^{\infty} \omega(\int_{c_n}^x f, [c_n, d_n]) < \infty$. Further, let Leb(X) be the set of elements which
are given from the first version of Lebesgue’s Lemma by X.
Proposition 148. If X is a subset of Den[a, b] which is subinterval-closed and
which is closed under scalar multiplication, then Leb(X) is a subinterval-closed
subset of Den[a, b] which contains X and which is closed under scalar multiplication.
Proof. First we note that Leb(X ) contains X . Since X is a subset of Den[a, b], we
can choose the closed set K = ∅ to witness that any element f ∈ X is contained in
Leb(X ). Hence Leb(X ) contains X . Second we note that Leb(X ) is closed under
scalar multiplication, simply because if f is in Leb(X ) via the closed set K, then
the scalar multiple kf is in Leb(X ) via the closed set K.
Finally we note that Leb(X) is sub-interval closed. So suppose that f is in
Leb(X) via the closed set K, and suppose that (c, d) ⊆ (a, b) is the sub-interval.
Now, note that since $(a, b) - K = \bigcup_n (c_n, d_n)$ it follows that
$$(a, b) - (K \cap [c, d]) = (a, c) \cup (d, b) \cup \bigcup_n (c_n, d_n) \qquad (4.1)$$
Without loss of generality, we may assume that none of the (cn, dn) are subsets
of (a, c) or subsets of (d, b), since otherwise these (cn, dn) can be omitted without
affecting the equation (4.1). There are now four cases, depending on whether
(a, c) and (d, b) intersect any of the (cn, dn).
First suppose that there are no intersections, so that
$$(a, b) - (K \cap [c, d]) = (a, c) \sqcup (d, b) \sqcup \bigsqcup_n (c_n, d_n) \qquad (4.2)$$
Then fχ(c,d)χ(cn,dn) ∈ X since fχ(cn,dn) ∈ X by hypothesis and X is sub-interval
closed. Likewise fχ(c,d)χ(a,c), fχ(c,d)χ(d,b) ∈ X since these functions are equal to
zero and since X is closed under scalar multiplication. Finally, we have that
fχ(c,d)χK ∈ L1[a, b] since fχK ∈ L1[a, b] by hypothesis. Hence, putting all these
elements together, one has fχ(c,d) ∈ Leb(X ).
Second suppose that (a, c) intersects $(c_\ell, d_\ell)$ but that (d, b) does not intersect
any of the (cn, dn). Then $a \leq c_\ell < c < d_\ell \leq d$ and
$$(a, b) - (K \cap [c, d]) = (a, d_\ell) \sqcup (d, b) \sqcup \bigsqcup_{n \neq \ell} (c_n, d_n) \qquad (4.3)$$
Then for $n \neq \ell$ we have $f\chi_{(c,d)}\chi_{(c_n,d_n)} \in \mathcal{X}$ since $f\chi_{(c_n,d_n)} \in \mathcal{X}$ by hypothesis and
X is sub-interval closed. Similarly, $f\chi_{(c,d)}\chi_{(a,d_\ell)} = f\chi_{(c,d_\ell)} \in \mathcal{X}$ since $f\chi_{(c_\ell,d_\ell)} \in \mathcal{X}$
by hypothesis and X is sub-interval closed and $c_\ell < c < d_\ell$ by our case hypothesis.
Likewise $f\chi_{(c,d)}\chi_{(d,b)} \in \mathcal{X}$ since this function is equal to zero and since X is
closed under scalar multiplication. Finally, we have that $f\chi_{(c,d)}\chi_K \in L^1[a, b]$ since
$f\chi_K \in L^1[a, b]$ by hypothesis. Hence, putting all these elements together, we see
that $f\chi_{(c,d)} \in \mathrm{Leb}(\mathcal{X})$.
The proofs of the remaining two cases are similar to this second case.
Lemma 149. (Lebesgue’s Lemma, second version) Suppose that f ∈ M [a, b] and
K ∈ K[a, b] and $(a, b) - K = \bigsqcup_{n=1}^{\infty} (c_n, d_n)$. Further suppose that $f\chi_K \in L^1[a, b]$
and $f\chi_{[c_n,d_n]} \in \mathrm{Den}[a, b]$ and that there is F ∈ AC∗(K) such that
$F(x) - F(c_n) = \int_{c_n}^x f$ on [cn, dn]. Then f ∈ Den[a, b] and
$\int_a^b f = \int_K f + \sum_{n=1}^{\infty} \int_{c_n}^{d_n} f$.
Proof. This follows from the first version by Proposition 126 (ii).
Definition 150. Suppose that X ⊆ Den[a, b]. Then f ∈ Den[a, b] is given by
the second version of Lebesgue’s Lemma from X if there is a K ∈ K[a, b] with
$(a, b) - K = \bigsqcup_{n=1}^{\infty} (c_n, d_n)$ such that $f\chi_K \in L^1[a, b]$ and $f\chi_{(c_n,d_n)} \in \mathcal{X}$, and
F ∈ AC∗(K) where $F(x) = \int_a^x f$. Further, let Leb∗(X) be the set of elements which
are given from the second version of Lebesgue’s Lemma by X.
Proposition 151. If X is a subset of Den[a, b] which is subinterval-closed and
which is closed under scalar multiplication, then Leb∗(X) is a subinterval-closed
subset of Den[a, b] which contains X and which is closed under scalar multiplication.
Proof. The argument that Leb∗(X ) contains X and that Leb∗(X ) is closed under
scalar multiplication is exactly identical to the argument from Proposition 148.
For the argument that Leb∗(X) is sub-interval closed, suppose that f is in
Leb∗(X) via the closed set K, and suppose that (c, d) ⊆ (a, b) is the sub-interval.
Let $F(x) = \int_a^x f$ so that F ∈ AC∗(K) and let $G(x) = \int_a^x f\chi_{(c,d)}$. Note that on
[c, d] we have $F(x) = \int_a^x f = \int_a^x f\chi_{(c,d)} + \int_a^c f = G(x) + \int_a^c f$, so that G and F differ
by the constant $\int_a^c f$ on [c, d]. Since F ∈ AC∗(K) implies F ∈ AC∗(K ∩ [c, d]) (cf.
Proposition 126 (vii)), and since F and G differ by a constant on [c, d], it follows
from Proposition 126 (iv) & (viii) that G ∈ AC∗(K ∩ [c, d]). The argument then
proceeds exactly as in the proof of Proposition 148.
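Definition 152. $\mathrm{Den}_0[a, b] = L^1[a, b]$ and $\mathrm{Den}_\alpha[a, b] = \mathrm{Leb}(\mathrm{Lim}(\bigcup_{\beta<\alpha} \mathrm{Den}_\beta[a, b]))$
when α > 0.
Definition 153. $\mathrm{Den}^*_0[a, b] = L^1[a, b]$ and $\mathrm{Den}^*_\alpha[a, b] = \mathrm{Leb}^*(\mathrm{Lim}(\bigcup_{\beta<\alpha} \mathrm{Den}^*_\beta[a, b]))$
when α > 0.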
Remark 154. The sets Denα[a, b] are standard: they are
the usual way of understanding the Denjoy totalization process (cf. Gordon [56]
pp. 117 ff). However, it seems as though the sets Den∗α[a, b] are more amenable
to descriptive set-theoretic analysis (cf. Theorem 172, Corollary 177, Corollary 196),
and hence in what follows we typically work with Den∗α[a, b] as opposed to Denα[a, b].
Proposition 155. For all α ≥ 0, it is the case that Denα[a, b] (resp. Den∗α[a, b])
is (i) a subinterval-closed subset of Den[a, b], and (ii) contains Denβ[a, b] (resp.
Den∗β[a, b]) for β < α, and (iii) is closed under scalar multiplication. Further, for
all α ≥ 0, it is the case that (iv) Denα[a, b] contains Den∗α[a, b].
Proof. By induction on α, using Proposition 145, Proposition 148, and Proposi-
tion 151 and the fact that L1[a, b] is sub-interval closed and is closed under scalar
multiplication. For (iv), this follows from the fact that Leb∗(X ) is a subset of
Leb(X ) by Proposition 126 (ii).
Definition 156. Suppose that X ⊆M [a, b]. Then let 〈X 〉 ⊆M [a, b] be the vector
subspace of M [a, b] generated by X .
Proposition 157. For all α ≥ 0, it is the case that 〈Denα[a, b]〉 (resp. 〈Den∗α[a, b]〉)
is (i) a subinterval-closed vector subspace of Den[a, b], and (ii) is equal to the set
of $\sum_{i=1}^{n} f_i$ for fi ∈ Denα[a, b] (resp. the set of $\sum_{i=1}^{n} f_i$ for fi ∈ Den∗α[a, b]).
Further, for all α ≥ 0, it is the case that (iii) 〈Denα[a, b]〉 contains 〈Den∗α[a, b]〉 (cf.
Figure 4.1).
Proof. The proof is parallel between 〈Denα[a, b]〉 and 〈Den∗α[a, b]〉, and so we give
the proof for 〈Den∗α[a, b]〉. For (ii), note that formally 〈Den∗α[a, b]〉 is equal to the
set of $\sum_{i=1}^{n} k_i f_i$ for ki ∈ R and fi ∈ Den∗α[a, b]. But since Den∗α[a, b] is closed
under scalar multiplication by Proposition 155, it follows that kifi ∈ Den∗α[a, b].
For (i), it must be shown that if f ∈ 〈Den∗α[a, b]〉 then $f\chi_{(c,d)}$ ∈ 〈Den∗α[a, b]〉. So
by (ii), suppose that $f = \sum_{i=1}^{n} f_i$ for fi ∈ Den∗α[a, b]. Since Den∗α[a, b] is
sub-interval closed by Proposition 155, it follows that $f_i\chi_{(c,d)}$ ∈ Den∗α[a, b], from which
it follows that $f\chi_{(c,d)} = \sum_{i=1}^{n} f_i\chi_{(c,d)}$ ∈ 〈Den∗α[a, b]〉. Finally, part (iii) is a trivial
consequence of Proposition 155 (iv).
4.3 Descriptive Set Theory
The goal of this section is to prove that ACG∗[a, b] is coanalytic but not
analytic (cf. Corollary 197). Using the same methods, it is also shown that the
operation of indefinite Denjoy integration is coanalytic but not analytic. In par-
ticular, it is shown that the relation “f is Denjoy integrable and F is equal to its
indefinite integral” is a co-analytic but not analytic relation on the product space
M [a, b]×C[a, b], where M [a, b] is the Polish space of real-valued measurable func-
tions on [a, b] and where C[a, b] is again the Polish space of real-valued continuous
functions on [a, b] (cf. Corollary 195 and Figure 4.1). The proofs here proceed
by identifying three derivatives in § 4.3.1 which measure the extent to which a
measurable function f and a continuous function F deviate from satisfying the
Fundamental Theorem of Calculus for the Lebesgue Integral (cf. Theorem 128).
Likewise, associated with these derivatives is an ordinal-valued rank, and in § 4.3.1
it is shown that there are functions of arbitrarily high countable rank (cf. Corol-
lary 166). Then, in § 4.3.2, it is shown that these ranks are correlated with the
entry into the subsets Den∗α[a, b] (cf. Theorem 172). Finally, in § 4.3.3, it is shown
that the derivatives are Borel (cf. Corollary 192), which allows us to apply an
important theorem linking the vanishing of Borel derivatives and coanalyticity
(cf. Kechris [84] Theorem 34.10 and Exercise 34.13).
4.3.1 Three Derivatives and Functions of Arbitrarily High Rank
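Definition 158. Let K[a, b] be the set of closed subsets of [a, b]. Suppose that
B ⊆ K[a, b] is closed under subsets, i.e., if K ∈ B and L ∈ K[a, b] with L ⊆ K then L ∈ B.
Let K ∈ Bσ if K is the countable union of elements from B. Define a map
$D_{\mathcal{B}} : \mathcal{K}[a, b] \to \mathcal{K}[a, b]$ by
$$D_{\mathcal{B}}(K) = \{x \in K : \overline{U \cap K} \notin \mathcal{B} \text{ for any open } U \ni x\} \qquad (4.4)$$
Define maps $D^{\alpha}_{\mathcal{B}} : \mathcal{K}[a, b] \to \mathcal{K}[a, b]$ inductively by
$$D^{0}_{\mathcal{B}}(K) = K, \qquad D^{\alpha+1}_{\mathcal{B}}(K) = D_{\mathcal{B}}(D^{\alpha}_{\mathcal{B}}(K)), \qquad D^{\alpha}_{\mathcal{B}}(K) = \bigcap_{\beta<\alpha} D^{\beta}_{\mathcal{B}}(K) \text{ when } \alpha \text{ limit} \qquad (4.5)$$
Let $|K|_{\mathcal{B}}$ be the least α such that $D^{\alpha}_{\mathcal{B}}(K) = D^{\alpha+1}_{\mathcal{B}}(K)$ and let $D^{\infty}_{\mathcal{B}}(K) = D^{|K|_{\mathcal{B}}}_{\mathcal{B}}(K)$.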
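Proposition 159. Suppose that B ⊆ K[a, b] is closed under subsets. Then
(i) If L ⊆ K then $D^{\alpha}_{\mathcal{B}}(L) \subseteq D^{\alpha}_{\mathcal{B}}(K)$
(ii) $D^{\alpha}_{\mathcal{B}}(K) \cap U \subseteq D^{\alpha}_{\mathcal{B}}(\overline{K \cap U})$ for any open U
(iii) $|K|_{\mathcal{B}} < \omega_1$
(iv) $D_{\mathcal{B}}(K) = \{x \in K : \overline{(p, q) \cap K} \notin \mathcal{B} \text{ for any rational } p, q \in \mathbb{Q} \text{ with } (p, q) \ni x\}$
(v) $D^{\infty}_{\mathcal{B}}(K) = \emptyset$ if and only if K ∈ Bσ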
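Proof. For (i) & (iii)-(v), see Kechris [84] pp. 271-272. For (ii), suppose that
U is open. Suppose that α = 0. Then $D^{0}_{\mathcal{B}}(K) \cap U = K \cap U \subseteq \overline{K \cap U} = D^{0}_{\mathcal{B}}(\overline{K \cap U})$.
Suppose that α = β + 1. Suppose that $x \in D^{\alpha}_{\mathcal{B}}(K) \cap U$ but
$x \notin D^{\alpha}_{\mathcal{B}}(\overline{K \cap U})$. Then there is open V ∋ x such that $\overline{V \cap D^{\beta}_{\mathcal{B}}(\overline{K \cap U})} \in \mathcal{B}$.
Since $x \in U \cap V \cap D^{\beta}_{\mathcal{B}}(K) \subseteq V \cap D^{\beta}_{\mathcal{B}}(\overline{K \cap U}) \subseteq \overline{V \cap D^{\beta}_{\mathcal{B}}(\overline{K \cap U})}$ we have that
$\overline{U \cap V \cap D^{\beta}_{\mathcal{B}}(K)} \subseteq \overline{V \cap D^{\beta}_{\mathcal{B}}(\overline{K \cap U})} \in \mathcal{B}$ and since B is closed under subsets
we have that $\overline{U \cap V \cap D^{\beta}_{\mathcal{B}}(K)} \in \mathcal{B}$. But since U ∩ V ∋ x this contradicts that
$x \in D^{\alpha}_{\mathcal{B}}(K)$. Suppose that α is a limit. Then
$D^{\alpha}_{\mathcal{B}}(K) \cap U = \bigcap_{\beta<\alpha} (D^{\beta}_{\mathcal{B}}(K) \cap U) \subseteq \bigcap_{\beta<\alpha} D^{\beta}_{\mathcal{B}}(\overline{K \cap U}) = D^{\alpha}_{\mathcal{B}}(\overline{K \cap U})$.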
Definition 160. For each f ∈M [a, b] define
Bf = {K ∈ K[a, b] : fχK ∈ L1[a, b]} (4.6)
For each F ∈ C[a, b] define
BF = {K ∈ K[a, b] : F ∈ AC∗(K)} (4.7)
For each pair f ∈M [a, b] and F ∈ C[a, b] define
Bf,F = Bf ∩ BF = {K ∈ K[a, b] : fχK ∈ L1[a, b] & F ∈ AC∗(K)} (4.8)
Since $\mathcal{B}_f$, $\mathcal{B}_F$, and $\mathcal{B}_{f,F}$ are closed under subsets (cf. Proposition 126 (vii)), define
$D_f(K) = D_{\mathcal{B}_f}(K)$, $D_F(K) = D_{\mathcal{B}_F}(K)$, and $D_{f,F}(K) = D_{\mathcal{B}_{f,F}}(K)$ and define
$D^{\alpha}_f(K)$, $D^{\alpha}_F(K)$, and $D^{\alpha}_{f,F}(K)$ as in Definition 158. Likewise, define $|K|_f = |K|_{\mathcal{B}_f}$,
$|K|_F = |K|_{\mathcal{B}_F}$ and $|K|_{f,F} = |K|_{\mathcal{B}_{f,F}}$. Finally define $|f| = |[a, b]|_f$, $|F| = |[a, b]|_F$
and $|f, F| = |[a, b]|_{f,F}$.
Remark 161. For ease of future reference, it is helpful to unpack some of the
previous definition. So suppose that f ∈M [a, b] and F ∈ C[a, b]. Then it follows
from Definition 160 and Equation 4.4 of Definition 158 that
$$D_f(K) = \{x \in K : f\chi_{\overline{U \cap K}} \notin L^1[a, b] \text{ for any open } U \ni x\} \qquad (4.9)$$
$$D_F(K) = \{x \in K : F \notin AC∗(\overline{U \cap K}) \text{ for any open } U \ni x\} \qquad (4.10)$$
$$D_{f,F}(K) = \{x \in K : [f\chi_{\overline{U \cap K}} \notin L^1[a, b] \text{ or } F \notin AC∗(\overline{U \cap K})] \text{ for any open } U \ni x\} \qquad (4.11)$$
That is, Df(K) is the set of points of K where f is not locally Lebesgue integrable,
while DF(K) is the set of points of K where F is not locally absolutely continuous in the
restricted sense. Comparing this to the Fundamental Theorem of Calculus for
Lebesgue Integrals (cf. Theorem 128), one sees that these derivatives record the
points at which the Fundamental Theorem locally fails for a measurable function
f and a continuous function F .
After enumerating some of the elementary properties of these derivatives in
the next proposition, the goal in this section is to prove two facts about these
derivatives. First, that these derivatives vanish for each Denjoy integrable func-
tion and its indefinite integral (cf. Proposition 163), and second that there are
Denjoy integrable functions whose indefinite integrals have derivatives that vanish
at arbitrarily high stages (cf. Corollary 166). Intuitively, these two results tell
us that after countably many stages, each Denjoy integrable function becomes
Lebesgue integrable on its derivatives, and that for each countable stage, there
is some Denjoy integrable function on [a, b] which has yet to become Lebesgue
integrable on its derivatives.
Proposition 162. Suppose f ∈M [a, b] and F ∈ C[a, b]. Then
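(i) $D_{f,F}(K) = D_f(K) \cup D_F(K)$
(ii) $D^{\alpha}_f(K) \subseteq D^{\alpha}_{f,F}(K)$
(iii) $D^{\alpha}_F(K) \subseteq D^{\alpha}_{f,F}(K)$
(iv) If $D^{\infty}_{f,F}(K) = \emptyset$ then $D^{\infty}_f(K) = \emptyset$ and $D^{\infty}_F(K) = \emptyset$
(v) If $D^{\infty}_{f,F}(K) = \emptyset$ then $|K|_f, |K|_F \leq |K|_{f,F}$
(vi) If $D^{\infty}_{f,F}([a, b]) = \emptyset$ then $|f|, |F| \leq |f, F|$
(vii) If k ∈ R then $D_F(K) = D_{F+k}(K)$ and $D^{\alpha}_F(K) = D^{\alpha}_{F+k}(K)$
(viii) If k ≠ 0 then $D_F(K) = D_{kF}(K)$ and $D^{\alpha}_F(K) = D^{\alpha}_{kF}(K)$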
Proof. (i) Suppose that $x \in D_{f,F}(K)$ but $x \notin (D_f(K) \cup D_F(K))$. Then there is
open U ∋ x and open V ∋ x such that $\overline{K \cap U} \in \mathcal{B}_f$ and $\overline{K \cap V} \in \mathcal{B}_F$. Then U ∩ V
is open and contains x, and $\overline{K \cap U \cap V} \subseteq \overline{K \cap U} \in \mathcal{B}_f$ and $\overline{K \cap U \cap V} \subseteq \overline{K \cap V} \in \mathcal{B}_F$,
and hence since $\mathcal{B}_f$ and $\mathcal{B}_F$ are closed under subsets, we have that $\overline{K \cap U \cap V} \in \mathcal{B}_f \cap \mathcal{B}_F = \mathcal{B}_{f,F}$,
which contradicts that $x \in D_{f,F}(K)$. Conversely, suppose that
$x \in D_f(K)$ but $x \notin D_{f,F}(K)$. Then there is open U ∋ x such that
$\overline{K \cap U} \in \mathcal{B}_{f,F} = \mathcal{B}_f \cap \mathcal{B}_F \subseteq \mathcal{B}_f$, which contradicts that $x \in D_f(K)$. (ii) Suppose that
α = 0. Then by the previous item, $D^{\alpha}_f(K) = K \subseteq K \cup K = D^{\alpha}_{f,F}(K)$. Suppose
that α = β + 1. Then
$D^{\alpha}_f(K) = D_f(D^{\beta}_f(K)) \subseteq D_f(D^{\beta}_{f,F}(K)) \subseteq D_f(D^{\beta}_{f,F}(K)) \cup D_F(D^{\beta}_{f,F}(K)) = D_{f,F}(D^{\beta}_{f,F}(K)) = D^{\alpha}_{f,F}(K)$.
Suppose that α is a limit ordinal.
Then $D^{\alpha}_f(K) = \bigcap_{\beta<\alpha} D^{\beta}_f(K) \subseteq \bigcap_{\beta<\alpha} D^{\beta}_{f,F}(K) = D^{\alpha}_{f,F}(K)$. (iii) The proof is
identical to the previous item. (iv) This follows directly from the previous two
items. (v) If $D^{\infty}_{f,F}(K) = \emptyset$, then by (ii) we have
$D^{|K|_{f,F}+1}_f(K) \subseteq D^{|K|_{f,F}}_f(K) \subseteq D^{|K|_{f,F}}_{f,F}(K) = D^{\infty}_{f,F}(K) = \emptyset$.
Since $|K|_f$ is by definition the least α such that
$D^{\alpha+1}_f(K) = D^{\alpha}_f(K)$, we have that $|K|_f \leq |K|_{f,F}$. Similarly, we have $|K|_F \leq |K|_{f,F}$.
(vi) This follows directly from the previous item and Definition 160, which
said that e.g. $|f| = |[a, b]|_f$. For (vii) and (viii), note that these follow directly
from Proposition 126 (iv)-(v).
Proposition 163. Suppose that f ∈ Den[a, b] and $F(x) = \int_a^x f$ and K ∈ K[a, b].
Then (i) $D^{\infty}_f(K) = \emptyset$, (ii) $D^{\infty}_F(K) = \emptyset$, and (iii) $D^{\infty}_{f,F}(K) = \emptyset$.
Proof. For (i), note that Proposition 159 (v) implies that $D^{\infty}_f(K) = \emptyset$ if and only
if $K \in (\mathcal{B}_f)_{\sigma}$, i.e. if there are Kn ∈ K[a, b] such that $K = \bigcup_n K_n$ and
$f\chi_{K_n} \in L^1[a, b]$. But Proposition 140 (iv) says that this happens when f ∈ Den[a, b]. (ii)
Likewise, by Proposition 159 (v), we have that $D^{\infty}_F(K) = \emptyset$ if and only if
$K \in (\mathcal{B}_F)_{\sigma}$, i.e. if there are Lm ∈ K[a, b] such that $K = \bigcup_m L_m$ and F ∈ AC∗(Lm).
But this is just to say that F ∈ ACG∗[a, b], and so this follows immediately from
the Fundamental Theorem of Calculus for Denjoy Integrals (cf. Theorem 134).
(iii) Now, retaining the closed sets Kn from part (i) and the closed sets Lm from
part (ii), consider the sequence of closed sets $C_{n,m} = K_n \cap L_m$. Then we have
that $K = K \cap K = (\bigcup_n K_n) \cap (\bigcup_m L_m) = \bigcup_{n,m} K_n \cap L_m = \bigcup_{n,m} C_{n,m}$. Further,
since $f\chi_{K_n} \in L^1[a, b]$ and F ∈ AC∗(Lm), we have that $f\chi_{C_{n,m}} \in L^1[a, b]$ and
F ∈ AC∗(Cn,m). This is to say that $K \in (\mathcal{B}_{f,F})_{\sigma}$, so that by Proposition 159 (v)
it follows that $D^{\infty}_{f,F}(K) = \emptyset$.
Remark 164. The construction in the successor step of the following example
is based on the example discussed in Gordon [56] pp. 117-118, although this
discussion does not treat the derivatives $D^{\alpha}_F([a, b])$ which we introduced above in
Definition 160.
Proposition 165. For every α < ω1 and every [a, b] and r > 0 there is
f ∈ Den[a, b] with $F(x) = \int_a^x f$ and $\int_a^b f = 0$ and f(a) = f(b) = 0 and $a, b \in D^{\alpha}_F([a, b])$
and ω(F, [a, b]) = r.
Proof. Suppose that α = 0. Let $f(x) = \sin(2\pi (b - a)^{-1}(x - a))$. Since
$\omega(F, [a, b]) = \frac{b-a}{\pi} > 0$ where $F(x) = \int_a^x f$, to ensure that for any r > 0 we can obtain
ω(F, [a, b]) = r, simply multiply f by $\frac{r}{\omega(F, [a, b])}$ if need be. Here of course we
tacitly appeal to Proposition 162 (viii), which says that multiplying by non-zero
scalars does not affect the closed sets $D^{\alpha}_F([a, b])$.
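The value of the oscillation in the base case, (b − a)/π, can be checked numerically. The following is a minimal sketch and not part of the original argument: the function name `oscillation_of_integral` is hypothetical, and the indefinite integral is approximated by the cumulative trapezoid rule on a uniform grid.

```python
import math

def oscillation_of_integral(a, b, n=20000):
    """Approximate omega(F, [a, b]) for F(x) = integral from a to x of
    sin(2*pi*(t - a)/(b - a)) dt, using a cumulative trapezoid rule."""
    h = (b - a) / n
    f = lambda x: math.sin(2 * math.pi * (x - a) / (b - a))
    F = 0.0                 # running value of the indefinite integral, F(a) = 0
    lo = hi = 0.0           # running minimum and maximum of F
    prev = f(a)
    for i in range(1, n + 1):
        cur = f(a + i * h)
        F += 0.5 * (prev + cur) * h
        lo = min(lo, F)
        hi = max(hi, F)
        prev = cur
    return hi - lo

osc_unit = oscillation_of_integral(0.0, 1.0)   # should be close to 1/pi
```

Dividing a target r by the returned value gives, approximately, the scalar by which f would be multiplied to obtain oscillation r.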
Suppose now that α = β + 1. Let C be the Cantor 1/3-set on [a, b] and let
$(a,b) - C = \bigsqcup_{n>0}(c_n,d_n)$ and let $C_n$ be the Cantor 1/3-set on $[c_n,d_n]$ and let
$(c_n,d_n) - C_n = \bigsqcup_{m>0}(c_{nm},d_{nm})$. Choose $f_{nm} \in \mathrm{Den}[c_{nm},d_{nm}]$ with
$F_{nm}(x) = \int_{c_{nm}}^x f_{nm}$ and $\int_{c_{nm}}^{d_{nm}} f_{nm} = 0$ and $f_{nm}(c_{nm}) = f_{nm}(d_{nm}) = 0$ and
$c_{nm}, d_{nm} \in D^\beta_{F_{nm}}([c_{nm},d_{nm}])$
and $\omega(F_{nm},[c_{nm},d_{nm}]) = 2^{-n}$ if $m < 2^n$ and $\omega(F_{nm},[c_{nm},d_{nm}]) = 2^{-n}2^{-m+2^n-1}$
otherwise. Then by fixing n we have
$$\sum_{m>0} \omega(F_{nm},[c_{nm},d_{nm}]) = (2^n-1)2^{-n} + 2^{-n}\sum_{m\ge 2^n} 2^{-m+2^n-1} = 1 \tag{4.12}$$
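The arithmetic behind equation (4.12) can be spelled out as follows. This is a worked check rather than part of the original argument: the first line counts the indices 0 < m < 2^n, and the second line reindexes the tail as a geometric series (the index j is introduced here only for illustration).

```latex
\begin{align*}
\sum_{0<m<2^n} 2^{-n} &= (2^n-1)\,2^{-n},\\
\sum_{m\ge 2^n} 2^{-m+2^n-1} &= \sum_{j\ge 1} 2^{-j} = 1
   \qquad (j = m-2^n+1),\\
(2^n-1)\,2^{-n} + 2^{-n}\cdot 1 &= 1 .
\end{align*}
```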
Still fixing n, let $f_n = f_{nm}$ on $[c_{nm},d_{nm}]$ and $f_n = 0$ otherwise, so that
$f_n \in \mathrm{Den}[c_n,d_n]$ with $\int_{c_n}^{d_n} f_n = 0$ by the first version of Lebesgue’s Lemma 146, and
set $F_n(x) = \int_{c_n}^x f_n$. Fixing n for the remainder of the paragraph, we claim that
$\omega(F_n,[c_n,d_n]) \le 2\cdot 2^{-n}$. For, let ε > 0 and let $[x,y] \subseteq [c_n,d_n]$. Since $F_n$ is
continuous (cf. Proposition 140 (v)), choose δ > 0 such that 0 < u − x < δ
implies $\left|\int_x^u f_n\right| < \frac{\varepsilon}{2}$ and such that 0 < y − v < δ implies
$\left|\int_v^y f_n\right| < \frac{\varepsilon}{2}$. Choose
$u, v \notin C_n$ such that $c_n \le x < u < v < y \le d_n$ and 0 < u − x < δ and
0 < y − v < δ. If $[u,v] \subseteq [c_{nm},d_{nm}]$ then $\left|\int_u^v f_n\right| \le \omega(F_{nm},[c_{nm},d_{nm}]) \le 2^{-n}$ and
hence $\left|\int_x^y f_n\right| \le \varepsilon + 2^{-n}$. Otherwise, we have that $c_{n\ell} \le u \le d_{n\ell} < c_{nm} \le v \le d_{nm}$,
and then estimating as before we have $\left|\int_x^y f_n\right| \le \varepsilon + 2\cdot 2^{-n} + \left|\int_{d_{n\ell}}^{c_{nm}} f_n\right|$, and so
it suffices to show that $\int_{d_{n\ell}}^{c_{nm}} f_n = 0$, which follows as above from the first version
of Lebesgue’s Lemma 146. Hence we have in fact shown that, fixing n, we have
$\omega(F_n,[c_n,d_n]) \le 2\cdot 2^{-n}$.
This of course implies that $\sum_{n>0}\omega(F_n,[c_n,d_n]) \le \sum_{n>0} 2\cdot 2^{-n} \le 2$, and so
letting $f = f_n$ on $[c_n,d_n]$ and f = 0 otherwise, we have that f ∈ Den[a, b] with
$\int_a^b f = 0$ by the first version of Lebesgue’s Lemma 146, and set $F(x) = \int_a^x f$. To
see that $a, b \in D^\alpha_F([a,b])$, note that by hypothesis $c_{nm}, d_{nm} \in D^\beta_{F_{nm}}([c_{nm},d_{nm}])$ and
hence $c_{nm}, d_{nm} \in D^\beta_F([a,b])$, since
$D^\beta_{F_{nm}}([c_{nm},d_{nm}]) = D^\beta_{F\upharpoonright[c_{nm},d_{nm}]}([c_{nm},d_{nm}]) \subseteq D^\beta_F([a,b])$
respectively by Proposition 162 (vii) and Proposition 159 (i). Since a
subsequence of the $c_{nm}$ converge to $c_n$ and since a subsequence of the $d_{nm}$ converge
to $d_n$ we have that $c_n, d_n \in D^\beta_F([a,b])$. Then we claim that $c_n, d_n \in D_F(D^\beta_F([a,b]))$.
For, otherwise there is open $U \ni c_n$ such that $F \in AC_*(\overline{U \cap D^\beta_F([a,b])})$ and
hence $F \in AC_*(U \cap D^\beta_F([a,b]))$ by Proposition 126 (vii). Since
$F \in AC_*(U \cap D^\beta_F([a,b]))$, choose δ > 0 corresponding to $\varepsilon = \frac{1}{2}$ in the definition of $AC_*(K)$
from Definition 121 (ii). Since U is open and intersects C, and since C is perfect
and nowhere dense, U contains infinitely many intervals $(c_\ell,d_\ell)$ and hence
an interval $(c_\ell,d_\ell)$ with $d_\ell - c_\ell < \delta$. By equation 4.12, choose a subsequence
$(c_{\ell 1},d_{\ell 1}),\ldots,(c_{\ell M},d_{\ell M})$ such that $\sum_{m=1}^M \omega(F_{\ell m},[c_{\ell m},d_{\ell m}]) > \frac{1}{2}$. But this is a
contradiction, since $(c_{\ell 1},d_{\ell 1}),\ldots,(c_{\ell M},d_{\ell M})$ is a $U \cap D^\beta_F([a,b])$-edged sub-partition
with $\sum_{m=1}^M d_{\ell m} - c_{\ell m} \le d_\ell - c_\ell < \delta$. Hence in fact $c_n, d_n \in D_F(D^\beta_F([a,b]))$ for
all n which of course implies that $a, b \in D_F(D^\beta_F([a,b])) = D^\alpha_F([a,b])$, since there
is a subsequence of the $c_n$ converging to a and likewise a subsequence of the $d_n$
converging to b.
Finally, note that $\omega(F,[a,b]) > 0$ since $0 < \frac{1}{2} = \omega(F_{1,1},[c_{1,1},d_{1,1}]) \le \omega(F,[a,b])$.
Hence, to ensure that for any r > 0 we can obtain $\omega(F,[a,b]) = r$, simply multiply
f by $\frac{r}{\omega(F,[a,b])}$ if need be. Here we are appealing to Proposition 162 (viii), which
says that multiplying by non-zero scalars does not affect the closed sets $D^\alpha_F[a,b]$.
Suppose that α < ω1 is a limit ordinal. Let $\alpha_n$ be an enumeration of the
ordinals less than α. Let w be the midpoint of [a, b]. Choose $u_n \searrow a^+$ from above
with $u_0 = w$ and $v_n \nearrow b^-$ from below with $v_0 = w$. Choose h : ω → ω such
that $h^{-1}(n)$ is infinite for all n. Choose $f_n \in \mathrm{Den}[u_{n+1},u_n]$ with $F_n(x) = \int_{u_{n+1}}^x f_n$
and $\int_{u_{n+1}}^{u_n} f_n = 0$ and $f_n(u_{n+1}) = f_n(u_n) = 0$ and $u_{n+1}, u_n \in D^{\alpha_{h(n)}}_{F_n}([u_{n+1},u_n])$ and
$\omega(F_n,[u_{n+1},u_n]) = \frac{1}{n}$. Likewise, choose $g_n \in \mathrm{Den}[v_n,v_{n+1}]$ with $G_n(x) = \int_{v_n}^x g_n$
and $\int_{v_n}^{v_{n+1}} g_n = 0$ and $g_n(v_n) = g_n(v_{n+1}) = 0$ and $v_n, v_{n+1} \in D^{\alpha_{h(n)}}_{G_n}([v_n,v_{n+1}])$ and
$\omega(G_n,[v_n,v_{n+1}]) = \frac{1}{n}$.
Let $f = f_n$ on $[u_{n+1},u_n]$ and $f = g_n$ on $[v_n,v_{n+1}]$ and f(a) = f(b) = 0.
Since $\omega(F_n,[u_{n+1},u_n]) = \omega(G_n,[v_n,v_{n+1}]) = \frac{1}{n}$, we claim that f ∈ Den[a, b] with
$\int_a^b f = 0$ by the Improper Integrals Lemma 143. For, to apply this lemma in this
way, it must be shown that $\lim_{c \searrow a^+} \int_c^w f$ and $\lim_{c \nearrow b^-} \int_a^c f$ exist and are equal
to zero, where recall that w is the midpoint of [a, b]. Without loss of generality,
consider the case of the first limit $\lim_{c \searrow a^+} \int_c^w f$. Let ε > 0. Choose N such that
$\frac{1}{N} < \varepsilon$ and set $\delta = u_N - a$. Suppose that 0 < c − a < δ, so that $a < c < u_N$.
Let n ≥ N such that $a < u_{n+1} \le c < u_n \le u_N$. Since $\omega(F_n,[u_{n+1},u_n]) = \frac{1}{n}$
and $\int_{u_{i+1}}^{u_i} f = 0$, it follows that
$$\left|\int_c^w f\right| \le \left|\int_c^{u_n} f\right| + \sum_{i=0}^{n-1}\left|\int_{u_{i+1}}^{u_i} f\right| \le \frac{1}{n} + 0 \le \frac{1}{N} < \varepsilon \tag{4.13}$$
Hence, in fact f ∈ Den[a, b] with $\int_a^b f = 0$ by the Improper Integrals Lemma 143,
and so we define $F(x) = \int_a^x f$.
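The estimate (4.13) rests on splitting the interval [c, w] at the points of the sequence u_i. The following sketch records the steps, assuming (as the notation suggests) that ω(F_n, I) bounds |F_n(y) − F_n(x)| for x, y in I, and using u_0 = w and u_{n+1} ≤ c < u_n.

```latex
\begin{align*}
\int_c^w f &= \int_c^{u_n} f + \sum_{i=0}^{n-1}\int_{u_{i+1}}^{u_i} f,\\
\Big|\int_c^{u_n} f\Big| &= |F_n(u_n)-F_n(c)|
   \le \omega(F_n,[u_{n+1},u_n]) = \frac{1}{n} \le \frac{1}{N},\\
\int_{u_{i+1}}^{u_i} f &= \int_{u_{i+1}}^{u_i} f_i = 0
   \qquad (0 \le i \le n-1).
\end{align*}
```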
To show that $a \in D^\alpha_F([a,b])$, it suffices to show that $a \in D^{\alpha_n}_F([a,b])$ for all
n. So, fixing n and recalling that $h^{-1}(n)$ is infinite, choose sequence $u_{n_k} \searrow a^+$
from above such that $u_{n_k} \in D^{\alpha_n}_{F_{n_k}}([u_{n_k+1},u_{n_k}])$. Since $u_{n_k} \in D^{\alpha_n}_{F_{n_k}}([u_{n_k+1},u_{n_k}])$
and $D^{\alpha_n}_{F_{n_k}}([u_{n_k+1},u_{n_k}]) = D^{\alpha_n}_{F\upharpoonright[u_{n_k+1},u_{n_k}]}([u_{n_k+1},u_{n_k}]) \subseteq D^{\alpha_n}_F([a,b])$ respectively by
Proposition 162 (vii) and Proposition 159 (i), it follows that $u_{n_k} \in D^{\alpha_n}_F([a,b])$.
Since $u_{n_k} \searrow a^+$ from above, it follows that $a \in D^{\alpha_n}_F([a,b])$. Since the $\alpha_n$
enumerated the ordinals below the limit ordinal α, it follows that $a \in D^\alpha_F([a,b])$. An
analogous argument shows that $b \in D^\alpha_F([a,b])$.
Finally, note that $\omega(F,[a,b]) > 0$ since $0 < 1 = \omega(F_1,[u_2,u_1]) \le \omega(F,[a,b])$.
Hence, to ensure that for any r > 0 we can obtain $\omega(F,[a,b]) = r$, simply multiply
f by $\frac{r}{\omega(F,[a,b])}$ if need be. Here again we are appealing to Proposition 162 (viii),
which says that multiplying by non-zero scalars does not affect the closed sets
$D^\alpha_F[a,b]$.
Corollary 166. For all α < ω1 there is f ∈ Den[a, b] with $F(x) = \int_a^x f$ and
$\alpha < |F| \le |f,F|$.
Proof. For each α < ω1, choose f ∈ Den[a, b] with $F(x) = \int_a^x f$ from the previous
proposition so that $D^\alpha_F([a,b]) \ne \emptyset$. By Proposition 163, we have that
$D^\infty_F([a,b]) = \emptyset$ and $D^\infty_{f,F}([a,b]) = \emptyset$. Since $D^\alpha_F([a,b]) \ne \emptyset$, we have $\alpha < |F|$.
By Proposition 162 (vi), we have $|F| \le |f,F|$.
4.3.2 Totalization: Calibrating Rank and Entry into Subspaces
Remark 167. Recall that in Remark 154, we noted that we would be working
with the less traditional subsets Den∗α[a, b] as opposed to the more traditional
subsets Denα[a, b]. In this section, we prove that the normal totalization result
for Denα[a, b] (in particular, the equivalence of (i) and (iv) in Corollary 174) also
holds for Den∗α[a, b] (in particular, the equivalence of (i) and (ii) in Corollary 174).
Hence, the subsets Den∗α[a, b] can be regarded as constituting a reasonable variant
on Denjoy totalization. Further, as indicated previously in Remark 154, it is the
subsets Den∗α[a, b] which seem more amenable to a descriptive set theory analysis
(cf. Corollary 196). In this section, the groundwork for this analysis is laid, in
that we precisely calibrate entry into Den∗α[a, b] with the vanishing of the derivative
Df,F introduced in the previous section (cf. Theorem 172). Later in § 4.3.3, it
will be proven that this derivative is Borel (cf. Corollary 192), which will allow
us to characterize the descriptive set-theory complexity of the subsets Den∗α[a, b]
(cf. Corollary 196).
Proposition 168. Suppose f ∈ M [a, b] and K ∈ K[a, b]. If $D_f(K) = \emptyset$ then
$f\chi_K \in L^1[a,b]$.
Proof. If $D_f(K) = \emptyset$ then for every x ∈ K there is an open neighborhood $U_x \ni x$ such
that $f\chi_{U_x \cap K} \in L^1[a,b]$. By the compactness of K, there is a finite subcovering
$U_{x_1},\ldots,U_{x_N}$. Then $|f\chi_K| \le \sum_{i=1}^N \left|f\chi_{U_{x_i}\cap K}\right|$ and
$\left|\int f\chi_K\right| \le \int |f\chi_K| \le \sum_{i=1}^N \int \left|f\chi_{U_{x_i}\cap K}\right| < \infty$, and so $f\chi_K \in L^1[a,b]$.
Remark 169. In the estimates in the proof of the above proposition, it is im-
portant to note that we are working with the Lebesgue integral, since in general
it is not true that the absolute value of a Denjoy integrable function is Denjoy
integrable (cf. Swartz [142] Example 12 pp. 18-19). Indeed, it can be shown that
the Denjoy integrable functions whose absolute values are Denjoy integrable are
exactly the Lebesgue integrable functions (cf. Peng-Yee [121] p. 22).
Proposition 170. Suppose F ∈ C[a, b] and K ∈ K[a, b]. If $D_F(K) = \emptyset$ then
$F \in AC_*(K)$.
Proof. If $D_F(K) = \emptyset$ then for every x ∈ K there is $(c_x,d_x) \ni x$ such that
$F \in AC_*(\overline{(c_x,d_x)\cap K})$. By the compactness of K, there is a finite subcovering
$(c_1,d_1),\ldots,(c_N,d_N)$ such that $F \in AC_*(\overline{(c_i,d_i)\cap K})$. Define $a_i = \inf((c_i,d_i)\cap K)$
and $b_i = \sup((c_i,d_i)\cap K)$ so that $a_i, b_i \in \overline{(c_i,d_i)\cap K}$. Let η > 0 be strictly less
than all the nonzero $|a_i - b_j|$, $|b_i - a_j|$ for $i \ne j$. Let ε > 0 and choose $\delta_i > 0$ such
that for every $\overline{(c_i,d_i)\cap K}$-edged sub-partition D of [a, b] if $\sum_{J\in D}\mu(J) < \delta_i$ then
$\sum_{J\in D}\omega(F,J) < N^{-1}\cdot\varepsilon$. Choose δ > 0 such that $\delta < \delta_i$ for all i and δ < η. Suppose that
D is a K-edged sub-partition of [a, b] with $\sum_{J\in D}\mu(J) < \delta$.
First we establish the claim that if some closed interval J ∈ D is not $\overline{(c_j,d_j)\cap K}$-edged
for any j, then there are non-overlapping closed intervals $I_J, L_J$ such that
$J = I_J \cup L_J$ and $I_J$ is $\overline{(c_i,d_i)\cap K}$-edged and $L_J$ is $\overline{(c_k,d_k)\cap K}$-edged for some $i \ne k$.
So suppose that J ∈ D is not $\overline{(c_j,d_j)\cap K}$-edged for any j. Then for some $i \ne k$
we have $\min(J) \in (c_i,d_i)\cap K$ and $\max(J) \in (c_k,d_k)\cap K$ such that $\min(J) \le c_k$
and $d_i \le \max(J)$ and $a_i \le \min(J) \le b_i$ and $a_k \le \max(J) \le b_k$. If $b_i < a_k$ then
$i \ne k$ implies $\eta < a_k - b_i \le \max(J) - \min(J) = \mu(J) < \delta < \eta$. Hence $a_k \le b_i$.
It suffices to show that $b_i \in \overline{(c_k,d_k)\cap K}$ since then we may set $I_J = [\min(J), b_i]$
and $L_J = [b_i, \max(J)]$. If $a_k = b_i$ then $b_i = a_k \in \overline{(c_k,d_k)\cap K}$. If $a_k < b_i$ then
$c_k \le a_k < b_i \le d_i \le \max(J) < d_k$ and so $b_i \in (c_k,d_k)$. Since $b_i = \sup((c_i,d_i)\cap K)$,
choose a sequence $x_n \in ((c_i,d_i)\cap K)$ which converges upwards to $b_i$, so that the
sequence $x_n$ is eventually in $(c_k,d_k)$ and hence $b_i \in \overline{(c_k,d_k)\cap K}$. Hence our claim
is established.
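Let $\mathcal{K}$ be a [c, d] ∩ K-edged sub-partition of [a, b] which (i) contains J where
J ∈ D is $\overline{(c_j,d_j)\cap K}$-edged for some j, and which (ii) contains $I_J, L_J$ where
J ∈ D is not $\overline{(c_j,d_j)\cap K}$-edged for any j. Then for every $J \in \mathcal{K}$ there is some j
such that J is $\overline{(c_j,d_j)\cap K}$-edged. Let $\mathcal{K}_j$ be a $\overline{(c_j,d_j)\cap K}$-edged sub-partition
of [a, b] which consists of those $J \in \mathcal{K}$ such that J is $\overline{(c_j,d_j)\cap K}$-edged. Then
$\mathcal{K}_j$ is a $\overline{(c_j,d_j)\cap K}$-edged sub-partition of [a, b] such that $\sum_{J\in\mathcal{K}_j}\mu(J) < \delta < \delta_j$
so that $\sum_{J\in\mathcal{K}_j}\omega(F,J) < N^{-1}\cdot\varepsilon$. Then
$$\sum_{J\in D}\omega(F,J) \le \sum_{J\in\mathcal{K}}\omega(F,J) = \sum_{j=1}^N \sum_{J\in\mathcal{K}_j}\omega(F,J) < \sum_{j=1}^N N^{-1}\varepsilon = \varepsilon.$$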
Corollary 171. Suppose f ∈M [a, b], F ∈ C[a, b], and K ∈ K[a, b].
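(i) If $D^{\alpha+1}_f(K) = \emptyset$ then $f\chi_{D^\alpha_f(K)} \in L^1[a,b]$.
(ii) If $D^{\alpha+1}_F(K) = \emptyset$ then $F \in AC_*(D^\alpha_F(K))$.
(iii) If $D^{\alpha+1}_{f,F}(K) = \emptyset$ then $f\chi_{D^\alpha_{f,F}(K)} \in L^1[a,b]$ and $F \in AC_*(D^\alpha_{f,F}(K))$.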
Theorem 172. For all f ∈ Den[a, b] with $F(x) = \int_a^x f$ and all α < ω1 and all
[c, d] ⊆ [a, b] we have $D^{\alpha+1}_{f,F}[c,d] = \emptyset$ if and only if $f\chi_{[c,d]} \in \mathrm{Den}^*_\alpha[a,b]$.
Proof. Suppose that α = 0. First suppose that $D^{\alpha+1}_{f,F}[c,d] = \emptyset$. By
Corollary 171 (iii), we have that $f\chi_{[c,d]} \in L^1[a,b] = \mathrm{Den}^*_0[a,b]$. Second, suppose that
$f\chi_{[c,d]} \in \mathrm{Den}^*_0[a,b] = L^1[a,b]$. By the Fundamental Theorem of Calculus for
Lebesgue Integrals (Theorem 128), F ∈ AC([c, d]) and hence $F \in AC_*([c,d])$ by
Proposition 126 (i). Then $D^{\alpha+1}_{f,F}([c,d]) = D_{f,F}([c,d]) = \emptyset$.
Suppose now that α > 0. First suppose that $D^{\alpha+1}_{f,F}([c,d]) = \emptyset$. By
Corollary 171 (iii), we have that $f\chi_{D^\alpha_{f,F}([c,d])} \in L^1[a,b]$ and $F \in AC_*(D^\alpha_{f,F}([c,d]))$.
Suppose $(c,d) - D^\alpha_{f,F}([c,d]) = \bigsqcup_n (c_n,d_n)$. If $[c',d'] \subseteq (c_n,d_n)$, then
$$D^\alpha_{f,F}([c',d']) \subseteq [c',d'] \subseteq (c_n,d_n) \subseteq (c,d) - D^\alpha_{f,F}([c,d]) \tag{4.14}$$
Hence $D^\alpha_{f,F}([c',d']) = \emptyset$. Then there is β < α such that $D^{\beta+1}_{f,F}([c',d']) = \emptyset$ and
hence by induction hypothesis $f\chi_{[c',d']} \in \mathrm{Den}^*_\beta[a,b]$. Hence, since we are
supposing that f ∈ Den[a, b] it follows from the left-to-right direction of the
Improper Integrals Lemma 143 that $f\chi_{[c_n,d_n]} \in \mathrm{Lim}(\bigcup_{\beta<\alpha}\mathrm{Den}^*_\beta[a,b])$. Since by
definition we have $(c,d) - D^\alpha_{f,F}([c,d]) = \bigsqcup_n (c_n,d_n)$ and since we have already
established that $F \in AC_*(D^\alpha_{f,F}([c,d]))$, it follows from Definition 150 that
$f\chi_{[c,d]} \in \mathrm{Leb}^*(\mathrm{Lim}(\bigcup_{\beta<\alpha}\mathrm{Den}^*_\beta[a,b])) = \mathrm{Den}^*_\alpha[a,b]$.
Second, suppose that $f\chi_{[c,d]} \in \mathrm{Den}^*_\alpha[a,b] = \mathrm{Leb}^*(\mathrm{Lim}(\bigcup_{\beta<\alpha}\mathrm{Den}^*_\beta[a,b]))$. By
Definition 150, there is a closed set K ∈ K[a, b] with $(a,b) - K = \bigsqcup_n (c_n,d_n)$
such that $f\chi_{[c,d]}\chi_K \in L^1[a,b]$ and $f\chi_{[c,d]}\chi_{(c_n,d_n)} \in \mathrm{Lim}(\bigcup_{\beta<\alpha}\mathrm{Den}^*_\beta[a,b])$ and
$G \in AC_*(K)$ where $G(x) = \int_a^x f\chi_{[c,d]}$. Note that by Proposition 126 (vi), we may
assume without loss of generality that a, b ∈ K. Note that on [c, d] we have
$F(x) = \int_a^x f = \int_a^x f\chi_{[c,d]} + \int_a^c f = G(x) + \int_a^c f$, so that G and F differ by the
constant $\int_a^c f$ on [c, d]. Since $G \in AC_*(K)$ implies $G \in AC_*(K \cap [c,d])$ (cf.
Proposition 126 (vii)), and since G and F differ by a constant on [c, d], it follows
from Proposition 126 (iv) & (viii) that $F \in AC_*(K \cap [c,d])$.
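Further, since $f\chi_{[c,d]}\chi_{(c_n,d_n)} \in \mathrm{Lim}(\bigcup_{\beta<\alpha}\mathrm{Den}^*_\beta[a,b])$, it follows from
Definition 144 that $(a,b) = \bigcup_m (c_{nm},d_{nm})$ and $f\chi_{[c,d]}\chi_{(c_n,d_n)}\chi_{(c_{nm},d_{nm})} \in \mathrm{Den}^*_{\beta_{nm}}[a,b]$ for
some $\beta_{nm} < \alpha$. Let $[c'_{nm},d'_{nm}] = [c,d] \cap [c_n,d_n] \cap [c_{nm},d_{nm}]$, so that
$f\chi_{[c'_{nm},d'_{nm}]} \in \mathrm{Den}^*_{\beta_{nm}}[a,b]$ for some $\beta_{nm} < \alpha$. By induction hypothesis,
$D^{\beta_{nm}+1}_{f,F}([c'_{nm},d'_{nm}]) = \emptyset$ and so $D^\alpha_{f,F}([c'_{nm},d'_{nm}]) = \emptyset$. Since a, b ∈ K, it follows that
$$[c,d] \subseteq (K\cap[c,d]) \cup (a,b) - K = (K\cap[c,d]) \cup \bigcup_{nm} (c_n,d_n)\cap(c_{nm},d_{nm}) \tag{4.15}$$
and by successively applying this, Proposition 159 (ii), and Proposition 159 (i),
we obtain
\begin{align*}
D^\alpha_{f,F}([c,d]) &\subseteq (K\cap[c,d]) \cup \bigcup_{nm} D^\alpha_{f,F}([c,d]) \cap (c_n,d_n)\cap(c_{nm},d_{nm}) \tag{4.16}\\
&\subseteq (K\cap[c,d]) \cup \bigcup_{nm} D^\alpha_{f,F}([c,d] \cap (c_n,d_n)\cap(c_{nm},d_{nm})) \tag{4.17}\\
&\subseteq (K\cap[c,d]) \cup \bigcup_{nm} D^\alpha_{f,F}([c,d] \cap [c_n,d_n]\cap[c_{nm},d_{nm}]) \tag{4.18}\\
&= (K\cap[c,d]) \cup \bigcup_{nm} D^\alpha_{f,F}([c'_{nm},d'_{nm}]) \tag{4.19}\\
&= (K\cap[c,d]) \tag{4.20}
\end{align*}
From this and the fact that $f\chi_{[c,d]}\chi_K \in L^1[a,b]$ and $F \in AC_*(K\cap[c,d])$ it follows
that
$$D^{\alpha+1}_{f,F}([c,d]) \subseteq D_{f,F}(K\cap[c,d]) = D_f(K\cap[c,d]) \cup D_F(K\cap[c,d]) = \emptyset \tag{4.21}$$
which is what we wanted to establish.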
Corollary 173. Suppose that f ∈ Den[a, b] and $F(x) = \int_a^x f$. Then
(i) $D^\infty_{f,F}([a,b]) = \emptyset \Longrightarrow f \in \mathrm{Den}^*_{|f,F|}[a,b]$
(ii) $f \in \mathrm{Den}^*_\alpha[a,b]$ and $F(x) = \int_a^x f \Longrightarrow |f,F| \le \alpha$
(iii) $\mathrm{Den}^*_{\omega_1}[a,b] = \mathrm{Den}[a,b]$.
(iv) $\mathrm{Den}^*_\alpha[a,b] \subseteq \mathrm{Den}^*_\beta[a,b] \subsetneq \mathrm{Den}[a,b]$ for all α < β < ω1
Proof. For (iii), this follows from the left-to-right direction of the previous theorem
and Proposition 163. For (iv), this follows from the right-to-left direction of
the previous theorem and Corollary 166.
Corollary 174. Let f ∈ M [a, b] and F ∈ C[a, b] with F (a) = 0. Then the
following are equivalent:
(i) f ∈ Den[a, b] and $F(x) = \int_a^x f$
(ii) There is α < ω1 such that $f \in \mathrm{Den}^*_\alpha[a,b]$ and $F(x) = \int_a^x f$
(iii) There is α < ω1 such that $f \in \langle\mathrm{Den}^*_\alpha[a,b]\rangle$ and $F(x) = \int_a^x f$
(iv) There is α < ω1 such that $f \in \mathrm{Den}_\alpha[a,b]$ and $F(x) = \int_a^x f$
(v) There is α < ω1 such that $f \in \langle\mathrm{Den}_\alpha[a,b]\rangle$ and $F(x) = \int_a^x f$
(vi) There is α < ω1 such that $D^{\alpha+1}_{f,F}([a,b]) = \emptyset$ and $F' = f$ a.e.
(vii) There is α < ω1 such that $D^{\alpha+1}_F([a,b]) = \emptyset$ and $F' = f$ a.e.
Proof. The proof proceeds by showing (i) ⇒ (ii) ⇒ (vi) ⇒ (vii) ⇒ (i) and
(ii)⇒ (iii)⇒ (i) and (ii)⇒ (iv)⇒ (i) and (iii)⇒ (v)⇒ (i).
For (ii) ⇒ (iv) ⇒ (i) and (iii) ⇒ (v) ⇒ (i), note that this follows trivially
from the fact that Den∗α[a, b] ⊆ Denα[a, b] ⊆ Den[a, b] by Proposition 157.
For (i) ⇒ (ii), note that this follows from item (iii) of the previous corollary.
For (ii) ⇒ (vi), note that this follows from the right-to-left direction of
Theorem 172 and Proposition 140 (vi). For (vi) ⇒ (vii), note that by Proposition 162,
$D^{\alpha+1}_F([a,b]) \subseteq D^{\alpha+1}_{f,F}([a,b]) = \emptyset$. For (vii) ⇒ (i), by Proposition 159 (vii), there are
$E_n \in K[a,b]$ such that $[a,b] = \bigcup_n E_n$ and $F \in AC_*(E_n)$, so that $F \in ACG_*[a,b]$,
and hence f ∈ Den[a, b] and $F(x) = \int_a^x f$ by Definition 132 and the subsequent
remark.
For (ii) ⇒ (iii), note that this follows trivially from the fact that $\langle\mathrm{Den}^*_\alpha[a,b]\rangle$
contains $\mathrm{Den}^*_\alpha[a,b]$. For (iii) ⇒ (i), this follows from the fact that $\langle\mathrm{Den}^*_\alpha[a,b]\rangle$ is
a subspace of Den[a, b] (cf. Proposition 157).
Remark 175. One sees from the previous corollary that there is an asymmetry
between the derivatives $D^\alpha_F(K)$ and $D^\alpha_f(K)$. The following proposition shows that
this asymmetry is in a certain sense necessary. The construction below is based
on Gordon [56] Exercise 9 p. 119, which was used to produce an example of a
non-Denjoy integrable function, and so what we do is essentially just verify
that we can obtain $D^{\alpha+1}_f([a,b]) = \emptyset$ with this construction.
Proposition 176. Let f ∈ M [a, b] and F ∈ C[a, b] with F (a) = 0. Then the
following are not equivalent, and in particular, while (i) implies (ii), it is not the
case that (ii) implies (i):
(i) f ∈ Den[a, b] and $F(x) = \int_a^x f$
(ii) There is α < ω1 such that $D^{\alpha+1}_f([a,b]) = \emptyset$ and $F' = f$ a.e.
Proof. That (i) implies (ii) follows immediately from Proposition 163 (i) and
Proposition 140 (vi). To see that (ii) does not imply (i), let C be the Cantor
1/3-set on [a, b] with $(a,b) - C = \bigsqcup_{n>0}(c_n,d_n)$. Define F = 0 on C and on
$[c_n,d_n]$ define F to be everywhere differentiable with $F(c_n) = F(d_n) = 0$ and
$\omega(F,[c_n,d_n]) = 2^{-k}$ if $d_n - c_n = 3^{-k}$. Then F is differentiable a.e. and hence we
can choose f ∈ M [a, b] such that $F' = f$ a.e. Further, consider the closed sets
$E_{-1} = \{a\}$, $E_0 = \{b\}$, $E_n = [c_n,d_n]$, so that $[a,b] = \bigcup_n E_n$. Then $f\chi_{E_n} \in L^1[a,b]$
for each n ≥ −1, so that $D^\infty_f([a,b]) = \emptyset$ by Proposition 159 (v), and hence
$D^\alpha_f([a,b]) = \emptyset$ for some α < ω1. Hence, the example of f and F satisfies the
hypotheses of (ii).
To see that this example does not satisfy the hypotheses of (i), it suffices to
show that $F \notin ACG_*[a,b]$. So suppose that $F \in ACG_*[a,b]$ with $[a,b] = \bigcup_n K_n$
and $F \in AC_*(K_n)$. Then $C \cap [a,b] = \bigcup_n C \cap K_n$. By the Baire Category Theorem,
there is n and open U such that $C \cap U \ne \emptyset$ and $C \cap U \subseteq C \cap K_n$. Since
$F \in AC_*(K_n)$, we have $F \in AC_*(C \cap K_n)$ by Proposition 126 (vii). Let δ > 0
correspond to $\varepsilon = \frac{1}{2}$ from the definition of $F \in AC_*(C \cap K_n)$ in Definition 121 (ii).
Choose a ball V of radius $< \frac{\delta}{2}$ such that $C \cap V \ne \emptyset$ and $C \cap V \subseteq C \cap K_n$. Then
there is N > 0 and a $C \cap K_n$-edged sub-partition
$D_N = \{(c_{n_{k,j}}, d_{n_{k,j}}) : 0 < k \le 2^N,\ 0 < j \le 2^k,\ d_{n_{k,j}} - c_{n_{k,j}} = 3^{-N-k}\}$ such that $J \in D_N$ implies J ⊆ V.
That is, just as in the Cantor set there is one middle third, two “middle” ninths,
four “middle” 1/27ths, etc. so within V there will be one “middle” $3^{-N}$-th,
two “middle” $3^{-N-1}$-ths, four “middle” $3^{-N-2}$-ths, etc. Then
$$\sum_{J\in D_N}\omega(F,J) = \sum_{k=1}^{2^N}\sum_{j=1}^{2^k} 2^{-N-k} = \sum_{k=1}^{2^N} 2^k 2^{-N-k} = \sum_{k=1}^{2^N} 2^{-N} = 2^N 2^{-N} = 1.$$
This contradicts that $\sum_{J\in D_N}\mu(J) < \delta$.
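The point of the sub-partition D_N is that its total oscillation stays at 1 while its total length becomes small. The following sketch makes the length estimate explicit. It assumes that D_N consists of exactly 2^k intervals of length 3^{-N-k} for each k = 1, ..., 2^N, which is how the displayed sum counts them.

```latex
\begin{align*}
\sum_{J\in D_N}\mu(J)
  &= \sum_{k=1}^{2^N} 2^k\, 3^{-N-k}
   = 3^{-N}\sum_{k=1}^{2^N}\Big(\tfrac{2}{3}\Big)^{k}
   < 3^{-N}\sum_{k\ge 1}\Big(\tfrac{2}{3}\Big)^{k}
   = 2\cdot 3^{-N},\\
\sum_{J\in D_N}\omega(F,J) &= 1 > \tfrac{1}{2} = \varepsilon .
\end{align*}
```

So the lengths tend to zero as N grows while the oscillations do not, which is exactly the failure of the AC∗ condition used in the proof.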
Corollary 177. Let f ∈ M [a, b] and F ∈ C[a, b] with F (a) = 0. Then the
following are equivalent:
(i) $f \in \mathrm{Den}^*_\alpha[a,b]$ and $F(x) = \int_a^x f$
(ii) $D^{\alpha+1}_{f,F}([a,b]) = \emptyset$ and $F' = f$ a.e.
Proof. (i) ⇒ (ii). By the right-to-left direction of Theorem 172 and
Proposition 140 (vi). (ii) ⇒ (i). By the equivalence of (vi) and (i) in Corollary 174,
we have that f ∈ Den[a, b] and $F(x) = \int_a^x f$. By the left-to-right direction of
Theorem 172, we have that $f \in \mathrm{Den}^*_\alpha[a,b]$.
Corollary 178. Let f ∈ M [a, b] and F ∈ C[a, b] with F (a) = 0. Then the
following are equivalent:
(i) $f \in \langle\mathrm{Den}^*_\alpha[a,b]\rangle$ and $F(x) = \int_a^x f$
(ii) There are $f_1,\ldots,f_n \in M[a,b]$ and $F_1,\ldots,F_n \in C[a,b]$ with $F_i(a) = 0$ such
that $D^{\alpha+1}_{f_i,F_i}([a,b]) = \emptyset$ and $F_i' = f_i$ a.e. and $f = \sum_{i=1}^n f_i$ and $F = \sum_{i=1}^n F_i$
Proof. By the previous corollary and Proposition 157.
4.3.3 Definability: The Derivatives are Borel
Remark 179. The goal of this subsection is to prove that the three derivatives
Df (K), DF (K) and Df,F (K) introduced in § 4.3.1 and Definition 160 are Borel,
in the sense that as maps from the Polish space K[a, b] to K[a, b] they are Borel.
Hence, after briefly recalling the Polish structure on K[a, b], C[a, b], and M [a, b],
the goal here is to show that these three derivatives are Borel (cf. Corollar-
ies 188, 191, and 192). Since these derivatives are Borel, an important theorem
linking coanalyticity and the vanishing of Borel derivatives may be applied (cf.
Kechris [84] Theorem 34.10 and Exercise 34.13). In particular, using this theorem,
it can be shown that the relation f ∈ Den[a, b] and $F(x) = \int_a^x f$ is coanalytic but
not analytic on M [a, b]×C[a, b] (cf. Corollary 195), and likewise it is shown that
F ∈ ACG∗[a, b] is coanalytic but not analytic on C[a, b] (cf. Corollary 197 and
Figure 4.1).
Remark 180. Recall that K[a, b], the space of compact (or closed) subsets of
[a, b], is a Polish space, where the topology is generated by the “miss” sets
$\{K \in \mathcal{K}[a,b] : K \cap U^c = \emptyset\}$ and the “hit” sets $\{K \in \mathcal{K}[a,b] : K \cap U \ne \emptyset\}$, where U ⊆ [a, b]
is open (cf. Kechris [84] § 4.F pp. 24 ff). Likewise C[a, b], the space of continuous
real-valued functions on [a, b], is a Polish space, where the topology is given by the
sup-metric $\|F - G\|_u = \sup\{|F(x) - G(x)| : x \in [a,b]\}$ (cf. Kechris [84] § 4.E
p. 24).
Remark 181. The Polish space structure on M [a, b], the space of real-valued
measurable functions on [a, b] (where functions which are equal a.e. are identified),
is less familiar. It is given by the following metric:
d(f, g) = inf{ε > 0 : µ({x ∈ [a, b] : |f(x)− g(x)| > ε}) < ε} (4.22)
This metric is defined so that fn → f in M [a, b] if and only if fn → f in measure,
that is limn µ({x ∈ [a, b] : |fn(x)− f(x)| > ε}) = 0 for all ε > 0 (cf. Folland [43]
§ 2.4 pp. 60 ff and Doob [27] § V.12 pp. 67 ff). This Polish space does not appear
in standard references on descriptive set theory, such as Kechris [84]. Hence, for
the sake of completeness, we record here a proof that M [a, b] is a Polish space,
where this proof is based on Doob [27] § 12 pp. 67-68, but where we key whatever
results we can to the standard real analysis textbook Folland [43].
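To get a feel for the metric in equation (4.22), it may help to evaluate it in two simple cases. The following worked examples are illustrations only and are not taken from the cited references; they assume [a, b] = [0, 1] with Lebesgue measure µ, and the functions f, g, f_n are chosen here purely for the sake of the example.

```latex
\begin{align*}
f = \chi_{[0,1/2]},\ g = 0:\quad
 & \mu(\{|f-g| > \varepsilon\}) =
   \begin{cases} 1/2 & 0<\varepsilon<1\\ 0 & \varepsilon \ge 1\end{cases}
 && \Longrightarrow\quad d(f,g) = \inf\,(1/2,\infty) = 1/2,\\
f_n = n\,\chi_{[0,1/n]},\ g = 0:\quad
 & \mu(\{|f_n| > \varepsilon\}) =
   \begin{cases} 1/n & 0<\varepsilon<n\\ 0 & \varepsilon \ge n\end{cases}
 && \Longrightarrow\quad d(f_n,0) = 1/n \to 0 .
\end{align*}
```

In the second example each f_n has integral 1, so the sequence converges to 0 in M[a, b] but not in L1[a, b]; this is the usual sense in which convergence in measure is weaker than L1 convergence.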
Proposition 182. M [a, b] is a Polish space, and in particular, the countable
dense set can be taken to be the rational-valued simple functions formed from
open intervals with rational endpoints.
Proof. For completeness, see Folland [43] Theorem 2.30. For separability, fix ε > 0
and choose a sequence of simple functions $s_n$ which converge a.e. to f (see
Folland [43] Theorem 2.10b). By Egoroff’s Theorem [43] 2.33, there is a measurable
set E such that µ(E) < ε and $s_n \to f$ uniformly on $E^c$. Choose N such
that $|s_N - f| < \varepsilon$ on $E^c$. Then $\mu(\{x \in [a,b] : |s_N - f| > \varepsilon\}) \le \mu(E) < \varepsilon$.
Choose a sequence $\varphi_m$ of rational-valued simple functions formed from open
intervals with rational endpoints such that $\varphi_m \to s_N$ a.e. (see Folland [43]
Theorem 2.26 & Corollary 2.32). By Egoroff’s Theorem [43] 2.33, there is a
measurable set D such that µ(D) < ε and $\varphi_m \to s_N$ uniformly on $D^c$. Choose M
with $|\varphi_M - s_N| < \varepsilon$ on $D^c$. Then $\mu(\{x \in [a,b] : |\varphi_M - s_N| > \varepsilon\}) \le \mu(D) < \varepsilon$ and
$\mu(\{x \in [a,b] : |f - \varphi_M| > 2\varepsilon\}) \le \mu(\{x \in [a,b] : |f - s_N| > \varepsilon\}) + \mu(\{x \in [a,b] : |s_N - \varphi_M| > \varepsilon\}) < 2\varepsilon$.
Then $d(\varphi_M, f) \le 2\varepsilon$. Hence, the countable dense set can
be chosen to be the rational-valued simple functions formed from open intervals
with rational endpoints.
Proposition 183. The map E 7→ χE from K[a, b] into M [a, b] is Borel.
Proof. By the previous proposition, it suffices to show that the set
Xf = {D ∈ K[a, b] : d(χD, f) < ε} (4.23)
is Borel in K[a, b] for every rational-valued simple function f ∈ M [a, b] formed
from open intervals with rational endpoints. However, note that
$$X_f = \bigcup_{r\in(0,\varepsilon)\cap\mathbb{Q}} X_{f,r} \tag{4.24}$$
where we define
$$X_{f,r} = \{D \in \mathcal{K}[a,b] : \mu(\{x \in [a,b] : |\chi_D - f| > r\}) < r\} \tag{4.25}$$
To see equation (4.24), note that the right-to-left containment follows trivially
from the definition of the metric in equation (4.22). To see the left-to-right
containment, suppose that $d(\chi_D, f) < \varepsilon$. Then there is η such that $d(\chi_D, f) \le \eta < \varepsilon$
and $\mu(\{x \in [a,b] : |\chi_D - f| > \eta\}) < \eta$. Choose a strictly decreasing
sequence of rational values $r_n$ which converge to η from above and which are all
strictly less than ε: that is, $r_n \searrow \eta^+$ and $\eta < r_{n+1} < r_n < \varepsilon$ and $r_n \in \mathbb{Q}$.
Then $\{x \in [a,b] : |\chi_D - f| > r_n\} \subseteq \{x \in [a,b] : |\chi_D - f| > r_{n+1}\}$ and
$\{x \in [a,b] : |\chi_D - f| > \eta\} = \bigcup_n \{x \in [a,b] : |\chi_D - f| > r_n\}$. Then
$$\lim_n \mu(\{x \in [a,b] : |\chi_D - f| > r_n\}) = \mu(\{x \in [a,b] : |\chi_D - f| > \eta\}) < \eta = \lim_n r_n \tag{4.26}$$
Then $0 < \lim_n [r_n - \mu(\{x \in [a,b] : |\chi_D - f| > r_n\})]$, and hence there is N > 0
such that $0 < r_N - \mu(\{x \in [a,b] : |\chi_D - f| > r_N\})$ or $\mu(\{x \in [a,b] : |\chi_D - f| > r_N\}) < r_N$,
and hence $D \in X_{f,r_N}$ as defined in equation (4.25).
Hence, it suffices to show that the sets $X_{f,r}$ from equation (4.25) are Borel,
where f is a fixed rational-valued simple function formed from open intervals with
rational endpoints and where likewise the rational r is fixed. Since functions in
M [a, b] are identified a.e., we have that $f = \sum_{i=1}^N b_i\chi_{[a_i,a_{i+1}]}$ where $a = a_1 < a_2 < \cdots < a_{N+1} = b$.
Define constants $c_i$ by $c_i = 1$ if $|1 - b_i| > r$ and $c_i = 0$ otherwise,
and likewise define constants $d_i$ by $d_i = 1$ if $|b_i| > r$ and $d_i = 0$ otherwise. Note
that the values $a_i, b_i, c_i, d_i$ depend only on f and r, which are fixed. Then note
that for arbitrary D ∈ K[a, b], we have
\begin{align*}
\mu(\{x \in [a,b] : |\chi_D(x) - f(x)| > r\})
&= \sum_{i=1}^N \mu(\{x \in [a_i,a_{i+1}] : |\chi_D(x) - f(x)| > r\})\\
&= \sum_{i=1}^N \mu(\{x \in [a_i,a_{i+1}] : |\chi_D(x) - b_i| > r\})\\
&= \sum_{i=1}^N \big[\mu(\{x \in [a_i,a_{i+1}] \cap D : |\chi_D(x) - b_i| > r\}) + \mu(\{x \in [a_i,a_{i+1}] \setminus D : |\chi_D(x) - b_i| > r\})\big]\\
&= \sum_{i=1}^N \big[\mu(\{x \in [a_i,a_{i+1}] \cap D : |1 - b_i| > r\}) + \mu(\{x \in [a_i,a_{i+1}] \setminus D : |b_i| > r\})\big]\\
&= \sum_{i=1}^N \big[c_i\,\mu([a_i,a_{i+1}] \cap D) + d_i\,\mu([a_i,a_{i+1}] \setminus D)\big]\\
&= \sum_{i=1}^N \big[(c_i - d_i)\,\mu([a_i,a_{i+1}] \cap D) + d_i(a_{i+1} - a_i)\big] \tag{4.27}
\end{align*}
Then it follows from the definition in equation (4.25) that
$$X_{f,r} = \Big\{D \in \mathcal{K}[a,b] : \sum_{i=1}^N (c_i - d_i)\,\mu([a_i,a_{i+1}] \cap D) < r - \Big[\sum_{i=1}^N d_i(a_{i+1} - a_i)\Big]\Big\} \tag{4.28}$$
But this is Borel, since the maps (D,E) ↦ D ∩ E and E ↦ µ(E) are Borel (cf.
Kechris [84] Exercise 11.4 ii p. 71 and Exercise 17.29 p. 114).
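As a sanity check on equations (4.27) and (4.28), one can specialize to the two constant functions f = 0 and f = 1. The following sketch assumes N = 1, a_1 = a, a_2 = b and a fixed rational 0 < r < 1; these choices are made here only for illustration.

```latex
\begin{align*}
f = 0:\quad & b_1 = 0,\ c_1 = 1,\ d_1 = 0
  &&\Longrightarrow\quad X_{0,r} = \{D : \mu(D) < r\},\\
f = 1:\quad & b_1 = 1,\ c_1 = 0,\ d_1 = 1
  &&\Longrightarrow\quad X_{1,r} = \{D : -\mu(D) < r-(b-a)\}
     = \{D : \mu(D) > (b-a)-r\}.
\end{align*}
```

Both answers agree with a direct reading of (4.25): χ_D differs from 0 exactly on D, and differs from 1 exactly on the complement of D.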
Remark 184. By e.g. Folland [43] p. 63, addition and multiplication are contin-
uous functions on M [a, b]. Note also that absolute value is continuous on M [a, b]
since if fn → f in measure, then |fn| → |f | in measure because {x ∈ [a, b] :
||fn(x)| − |f(x)|| ≥ ε} ⊆ {x ∈ [a, b] : |fn(x)− f(x)| ≥ ε}.
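The containment used for the continuity of absolute value is an instance of the reverse triangle inequality. A short derivation, with u and v standing for arbitrary real numbers (notation introduced here only for this check):

```latex
\begin{align*}
|u| &\le |u-v| + |v|, \qquad |v| \le |v-u| + |u|
  \quad\Longrightarrow\quad \big||u|-|v|\big| \le |u-v|,\\
\mu(\{x : \big||f_n(x)|-|f(x)|\big| \ge \varepsilon\})
  &\le \mu(\{x : |f_n(x)-f(x)| \ge \varepsilon\}) \longrightarrow 0
  \quad\text{when } f_n \to f \text{ in measure.}
\end{align*}
```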
Proposition 185. The following sets are Borel:
(i) {f ∈M [a, b] : f ≥ 0}
(ii) {f ∈M [a, b] : ∃ measurable E ⊆ [a, b] f = χE}
(iii) {(f, g) ∈ (M [a, b])2 : ∃ disjoint measurable D,E ⊆ [a, b] f = χD, g = χE}
(iv) {(f, g) ∈ (M [a, b])2 : ∃ measurable E ⊆ [a, b] f = χE, g = χ[a,b]\E}
(v) {(f, r) ∈M [a, b]× R : ∃ measurable E ⊆ [a, b] f = χE &∫|f | > r}
(vi) {(f, r) ∈M [a, b]× R : ∃ measurable E ⊆ [a, b] f = χE &∫|f | < r}
(vii) {(f, r) ∈M [a, b]× R : ∃ measurable E ⊆ [a, b] f = χE &∫|f | = r}
Proof. In the proof of this proposition, it is helpful to keep in mind that elements of
the Polish space M [a, b] are identified when they are a.e. equal. For (i), note that
f ≥ 0 a.e. if and only if |f | = f a.e., and recall that absolute value is continuous
by Remark 184. For (ii), note that f is a.e. a characteristic function if and only
if f · f = f a.e., and recall that multiplication is continuous by Remark 184. For
(iii), note that two characteristic functions f, g represent disjoint sets a.e. if and
only if f · g = 0 a.e. Likewise, for (iv), note that two characteristic functions f, g
represent complementary sets a.e. if and only if f · g = 0 a.e. and f + g = 1 a.e.,
where recall that addition and multiplication are continuous by Remark 184.
For (v), note that if f is a.e. the characteristic function χE, then∫|f | = µ(E).
Further µ(E) > r if and only if there is a closed set K ∈ K[a, b] such that K ⊆ E
and µ(K) > r. That is, µ(E) > r if and only if there is a closed set K ∈ K[a, b]
such that 0 ≤ χE − χK a.e. and µ(K) > r, which is analytic by the previous
proposition and Kechris [84] Exercise 17.29 p. 114. Hence, by Souslin’s Theorem
([84] Theorem 14.11), it suffices to show that relation µ(E) ≤ r is analytic. But
note that µ(E) ≤ r if and only if (b− a)− µ([a, b] \E) ≤ r, which happens if and
only if µ([a, b] \ E) > q for every rational q < (b − a) − r, which is analytic by
part (iv) of this proposition and what was said previously in this paragraph.
For (vi), it again suffices to note that µ(E) < r if and only if (b−a)−µ([a, b]\
E) < r if and only if (b − a) − r < µ([a, b] \ E), so that the result follows from
parts (iv)-(v) of this proposition. Finally, for (vii), note that µ(E) = r if and only
if for every rational q > 0, it follows that r− q < µ(E) < r+ q, so that the result
follows immediately from parts (v)-(vi).
Proposition 186. L1[a, b] is Borel in M [a, b].
Proof. First note that by the previous proposition, we can talk of rational-valued
simple functions as finite sums of pairwise disjoint characteristic functions mul-
tiplied by a rational value. Hence on the one hand, f ∈ L1[a, b] if and only if
there is rational M > 0 such that for any rational-valued simple function ϕ with
|ϕ| ≤ |f |, it is the case that∫|ϕ| < M , and so L1[a, b] is co-analytic in M [a, b]. On
the other hand, to show that L1[a, b] is analytic in M [a, b], it suffices to show that
$f \in L^1[a,b]$ if and only if there is a sequence of rational-valued simple functions
$|\varphi_n| \to |f|$ in M [a, b] with $|\varphi_n| \le |\varphi_{n+1}| \le |f|$ a.e. such that $\lim_n \int |\varphi_n|$ exists and
is finite.
To see this equivalence, first suppose that $f \in L^1[a,b]$. Then $|f| \in L^1[a,b]$ and
hence choose a sequence of rational-valued simple functions $0 \le \varphi_n \le \varphi_{n+1} \le |f|$
and $\varphi_n \to |f|$ a.e. Let $\psi_n = |f| - \varphi_n$. Then $\psi_n \in L^1[a,b]$ and $\psi_n \to 0$ a.e.
and $|\psi_n| \le |f| + |\varphi_n| \le 2|f|$. By the Dominated Convergence Theorem
(Folland [43] Theorem 2.24 p. 54), we have that $0 = \int_a^b 0 = \lim_n \int_a^b (|f| - \varphi_n)$. Hence
$\lim_n \int_a^b \varphi_n = \int_a^b |f|$ and so $\lim_n \int_a^b \varphi_n$ exists. Further, $\varphi_n \to |f|$ in $L^1[a,b]$ and
so $\varphi_n \to |f|$ in M [a, b] by Folland [43] Proposition 2.29 p. 61. Second,
suppose that there is a sequence of rational-valued simple functions $|\varphi_n| \to |f|$ in
M [a, b] with $|\varphi_n| \le |\varphi_{n+1}| \le |f|$ a.e. such that $\lim_n \int |\varphi_n|$ exists. By
Folland [43] Theorem 2.30, there is a subsequence $\varphi_{n_k}$ such that $|\varphi_{n_k}| \to |f|$ a.e.
By the Monotone Convergence Theorem (Folland [43] Theorem 2.14), we have
that $\int_a^b |f| = \lim_k \int_a^b |\varphi_{n_k}| < \infty$, so that in fact $f \in L^1[a,b]$.
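It may help to see the two characterizations from this proof side by side, since the Borel conclusion comes from their quantifier shapes. The following display is only a schematic summary: φ and φ_n range over rational-valued simple functions in M[a, b], and the labels on the right are informal.

```latex
\begin{align*}
f \in L^1[a,b]
 &\iff \exists M \in \mathbb{Q}^{+}\ \forall \varphi\
   \Big( |\varphi| \le |f| \text{ a.e.} \implies \int |\varphi| < M \Big)
   && \text{(co-analytic form)}\\
 &\iff \exists (\varphi_n)_n\
   \Big( |\varphi_n| \to |f| \text{ in } M[a,b],\
         |\varphi_n| \le |\varphi_{n+1}| \le |f| \text{ a.e.},\
         \lim_n \int |\varphi_n| < \infty \Big)
   && \text{(analytic form)}
\end{align*}
```

A set that is both analytic and co-analytic is Borel by Souslin’s Theorem (Kechris [84] Theorem 14.11), which is the step that yields the proposition.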
Proposition 187. Suppose that f ∈ M [a, b] and K ∈ K[a, b]. Then
$(p,q) \cap D_f(K) = \emptyset$ if and only if for all rational [r, s] ⊆ (p, q) it is the case that
$f\chi_{[r,s]\cap K} \in L^1[a,b]$.
Proof. The left-to-right direction follows immediately from Corollary 171. For the
right-to-left direction, suppose for the sake of contradiction that $x \in (p,q) \cap D_f(K)$.
Choose rational $(r,s) \ni x$ such that [r, s] ⊆ (p, q). By hypothesis $f\chi_{[r,s]\cap K} \in L^1[a,b]$
and so $f\chi_{(r,s)\cap K} \in L^1[a,b]$. But this contradicts that $x \in D_f(K)$.
Corollary 188. The map (f,K) 7→ Df (K) is Borel from M [a, b] × K[a, b] to
K[a, b].
Proof. First recall that being a Borel map is the same as having a Borel graph
(cf. Kechris [84] Theorem 14.12). Hence, it suffices to show that the following set
is Borel
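$$G = \{(f,K,E) \in M[a,b] \times (\mathcal{K}[a,b])^2 : E = D_f(K)\} \tag{4.29}$$
But note that since K,E are closed sets, it follows that
$$G = \{(f,K,E) \in M[a,b] \times (\mathcal{K}[a,b])^2 : \forall\, (p,q) \in \mathbb{Q}^2 \text{ with } p < q\ [(p,q) \cap E = \emptyset \iff (p,q) \cap D_f(K) = \emptyset]\} \tag{4.30}$$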
But the left-hand side of this biconditional is Borel in K[a, b] by definition of
the topology on K[a, b] (cf. Remark 180), while the right-hand side of this bi-
conditional is Borel in M [a, b]×K[a, b] by Proposition 187, Proposition 186, and
Proposition 183.
Proposition 189. The relation F ∈ AC∗(E) is Borel on C[a, b]×K[a, b].
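Proof. By definition, $F \in AC_*(E)$ if for every ε > 0 there is a δ > 0 such that for
all E-edged sub-partitions D of [a, b] if $\sum_{J\in D}\mu(J) < \delta$ then $\sum_{J\in D}\omega(F,J) < \varepsilon$.
Since F is continuous, Proposition 126 (iii) says that we may replace E by a
countable dense subset $\{d_n(E)\}$ of E. Further, maps $d_n : \mathcal{K}[a,b] \to [a,b]$ may be
chosen to be Borel (see Kechris [84] Theorem 12.13 p. 76). Moreover, consider the
closed subset $\triangle = \{(c,d) \in \mathbb{R}\times\mathbb{R} : c \le d\}$ which is thus a Polish space, and note
that the map $(F,c,d) \mapsto \omega(F,[c,d])$ from $C([a,b]) \times \triangle$ to $\mathbb{R}$ is clearly a continuous
map. Finally, let $\omega^{<\omega}$ denote the set of finite strings of natural numbers, where |σ|
denotes the length of the string σ. Then it follows that
$$\{(F,E) \in C[a,b] \times \mathcal{K}[a,b] : F \in AC_*(E)\} = \bigcap_{\varepsilon\in\mathbb{Q}\cap(0,\infty)}\ \bigcup_{\delta\in\mathbb{Q}\cap(0,\infty)}\ \bigcap_{\ell>0}\ \bigcap_{|\sigma|=2\ell} X_{\varepsilon,\delta,\ell,\sigma} \tag{4.31}$$
where for $\sigma = \langle\sigma(1),\ldots,\sigma(2\ell)\rangle$ in $\omega^{<\omega}$ of length 2ℓ we define
\begin{align*}
X_{\varepsilon,\delta,\ell,\sigma} = \Big\{(F,E) \in C[a,b] \times \mathcal{K}[a,b] :\ &\Big[\bigwedge_{i=1}^{\ell} d_{\sigma(2i-1)}(E) < d_{\sigma(2i)}(E)\\
&\ \&\ \bigwedge_{i=1}^{\ell-1} d_{\sigma(2i)}(E) \le d_{\sigma(2i+1)}(E)\\
&\ \&\ \sum_{i=1}^{\ell} (d_{\sigma(2i)}(E) - d_{\sigma(2i-1)}(E)) < \delta\Big] \Rightarrow \sum_{i=1}^{\ell} \omega(F,[d_{\sigma(2i-1)}(E), d_{\sigma(2i)}(E)]) < \varepsilon\Big\} \tag{4.32}
\end{align*}
Since the maps $E \mapsto d_n(E)$ and $(F,c,d) \mapsto \omega(F,[c,d])$ are Borel, it thus follows
that $X_{\varepsilon,\delta,\ell,\sigma}$ is Borel and hence that the relation $F \in AC_*(E)$ is Borel.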
Proposition 190. Suppose that F ∈ C[a, b] and K ∈ K[a, b]. Then (p, q) ∩
DF (K) = ∅ if and only if for all rational [r, s] ⊆ (p, q) it is the case that F ∈
AC∗([r, s] ∩K).
Proof. The left-to-right direction follows immediately from Corollary 171. For
the right-to-left direction, suppose for the sake of contradiction that x ∈ (p, q) ∩
DF (K). Choose rational (r, s) ∋ x such that [r, s] ⊆ (p, q). By hypothesis F ∈
AC∗([r, s] ∩ K) and so F ∈ AC∗((r, s) ∩ K). But this contradicts that x ∈ DF (K).
Corollary 191. The map (F,K) ↦ DF (K) is Borel from C[a, b] × K[a, b] to
K[a, b].
Proof. First recall that being a Borel map is the same as having a Borel graph
(cf. Kechris [84] Theorem 14.12). Hence, it suffices to show that the following set
is Borel
G = {(F,K,E) ∈ C[a, b]× (K[a, b])2 : E = DF (K)} (4.33)
But note that since K,E are closed sets, it follows that
\[
G = \{(F,K,E) \in C[a,b] \times (K[a,b])^2 : \forall\, p < q \text{ in } \mathbb{Q}^2 \;
[(p,q) \cap E = \emptyset \iff (p,q) \cap D_F(K) = \emptyset]\} \tag{4.34}
\]
But the left-hand side of this biconditional is Borel in K[a, b] by definition of the
topology on K[a, b] (cf. Remark 180), while the right-hand side of this biconditional
is Borel in C[a, b]×K[a, b] by Proposition 190, Proposition 189, and the fact
that the map (D,L) ↦ D∩L from $(K[a, b])^2$ to K[a, b] is Borel (cf. Kechris [84]
Exercise 11.4 ii p. 71).
Corollary 192. The map $(f, F, K) \mapsto D_{f,F}(K)$ is Borel from M [a, b]× C[a, b]×
K[a, b] to K[a, b].
Proof. This follows directly from the two previous corollaries, as well as the fact
that $D_{f,F}(K) = D_f(K) \cup D_F(K)$ (cf. Proposition 162 (i)) and the fact that the
map (D,E) ↦ D ∪ E is continuous (see Kechris [84] Exercise 4.29 iv p. 27).
Theorem 193. The following sets are co-analytic and the ranks $|K|_f$, $|K|_F$, and
$|K|_{f,F}$ are co-analytic ranks on these sets:
(i) $\{(f,K) \in M[a,b] \times K[a,b] : \exists\, \alpha < \omega_1 \; D_f^{\alpha}(K) = \emptyset\}$
(ii) $\{(F,K) \in C[a,b] \times K[a,b] : \exists\, \alpha < \omega_1 \; D_F^{\alpha}(K) = \emptyset\}$
(iii) $\{(f,F,K) \in M[a,b] \times C[a,b] \times K[a,b] : \exists\, \alpha < \omega_1 \; D_{f,F}^{\alpha}(K) = \emptyset\}$
Proof. By the previous corollaries, the derivatives $D_f$, $D_F$ and $D_{f,F}$ are Borel
derivatives. The result then follows immediately from Kechris [84] Theorem 34.10 &
Exercise 34.13.
Proposition 194. The class of (F, f) in C[a, b]×M [a, b] such that F is differen-
tiable a.e. and F ′ = f is Borel.
Proof. Let us define
C = {(F, f) ∈ C[a, b]×M [a, b] : F ′ exists a.e. & F ′ = f} (4.35)
B = {F ∈ C[a, b] : F ′ exists a.e.} (4.36)
Then the proposition requires us to prove that C is Borel. Note that C is the
graph of the function Γ : B → M [a, b] given by Γ(F ) = F ′. For fixed ε > 0 and
function f ∈M [a, b], define the sets
Bf = {F ∈ B : d(Γ(F ), f) < ε} (4.37)
Af = {F ∈ C[a, b] : d(F, f) < ε} (4.38)
where d is the metric on M [a, b] from equation (4.22). We claim that to show that
C is Borel, it suffices to show that (i) B is Borel, that (ii) $B_f$ is Borel for each
f from a countable dense subset of M [a, b], and that (iii) $A_f$ is likewise
Borel for these same f. For, suppose that (i)-(iii) have been
established. Define $\overline{\Gamma} : C[a, b] \to M[a, b]$ by $\overline{\Gamma}(F) = \Gamma(F)$ on B and $\overline{\Gamma}(F) = F$ on
C[a, b] \ B. Then it follows from (i)-(iii) that $\overline{\Gamma}$ is a Borel map. For, suppose
that $U_f = \{g \in M[a, b] : d(g, f) < \varepsilon\}$ where f is an element of the countable dense
set, so that it suffices to show that $\overline{\Gamma}^{-1}(U_f)$ is Borel. But this follows immediately
from our hypotheses (i)-(iii) and the equality
\[
\overline{\Gamma}^{-1}(U_f) = B_f \sqcup (A_f \setminus B) \tag{4.39}
\]
But since being a Borel map is the same as having a Borel graph (cf. Kechris [84]
Theorem 14.12), it follows that $\overline{\Gamma}$ has a Borel graph, and hence we can infer that C is
Borel from the following equality and the hypothesis (i):
\[
C = \operatorname{graph}(\overline{\Gamma}) \cap (B \times M[a, b]) \tag{4.40}
\]
Hence, in fact, it suffices to establish (i)-(iii).
For (i), we must establish that B from equation (4.36) is Borel. To this end,
define
D = {(F, x) ∈ C[a, b]× [a, b] : F ′(x) exists} (4.41)
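Further, for F ∈ C[a, b], x ∈ [a, b] and |h| > 0, define
$\Delta_{(F,x)}(h) = \frac{F(x+h) - F(x)}{h}$.
Then D is analytic, since for F ∈ C[a, b] we have (where $\mathbb{Q}^+ = \mathbb{Q} \cap (0,\infty)$)
\[
(F, x) \in D \iff \exists\, L \in \mathbb{R} \;\forall\, \varepsilon \in \mathbb{Q}^+ \;\exists\, \delta \in \mathbb{Q}^+ \;
\forall\, |h| \in \mathbb{Q} \cap (0, \delta) \;\; \big| \Delta_{(F,x)}(h) - L \big| < \varepsilon \tag{4.42}
\]
Likewise, D is co-analytic, since for F ∈ C[a, b] we have
\[
(F, x) \in D \iff \forall\, h_n, h'_n \to 0 \;
\big[ \Delta_{(F,x)}(h_n), \Delta_{(F,x)}(h'_n) \text{ Cauchy } \;\&\;
\big| \Delta_{(F,x)}(h_n) - \Delta_{(F,x)}(h'_n) \big| \to 0 \big] \tag{4.43}
\]
Hence, by Souslin’s Theorem (Kechris [84] Theorem 14.11), it follows that D is Borel. Since
D is Borel, the following set is Borel by Kechris [84] Theorem 17.25: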
{F ∈ C([a, b]) : µ(DF ) = b− a} (4.44)
But this set is precisely equal to B, so that B too is Borel.
The proofs of (ii) and (iii) are nearly identical, and so we include only the
proof of (ii). For this, it must be shown that $B_f$ from equation (4.37) is Borel for
f ∈ M [a, b] from some countable dense set in M [a, b]. So by Proposition 182, we
may suppose that f ∈ M [a, b] is a rational-valued simple function formed from
open intervals with rational endpoints, so that $f = \sum_{i=1}^{N} b_i \chi_{[a_i, a_{i+1}]}$ where
$a = a_1 < a_2 < \cdots < a_{N+1} = b$. Further note that, as in the proof of Proposition 183
(cf. equation (4.24)), it suffices to show that the following set is Borel:
\[
B_{f,r} = \{F \in B : \mu(\{x \in [a, b] : |F'(x) - f(x)| > r\}) < r\} \tag{4.45}
\]
Then in analogue to equation (4.41), define
\[
D_{i,r} = \{(F, x) \in C[a, b] \times [a_i, a_{i+1}] : F'(x) \text{ exists } \;\&\; |F'(x) - b_i| > r\} \tag{4.46}
\]
Then, just as in the proof of (i) in the above paragraph, it can be shown that $D_{i,r}$
is Borel. Since it is Borel, the following set is Borel by Kechris [84] Theorem 17.25:
\[
\Big\{F \in B : \sum_{i=1}^{N} \mu\big((D_{i,r})_F\big) < r \Big\} \tag{4.47}
\]
But this set is precisely equal to Bf,r, which is what we wanted to show.
Corollary 195. The set of (f, F ) in M [a, b] × C[a, b] such that f ∈ Den[a, b]
and $F(x) = \int_a^x f$ is co-analytic but not analytic, and hence, assuming analytic
determinacy, this set is $\Pi^1_1$-complete.
Proof. That this set is co-analytic follows immediately from the previous proposition
and theorem, as well as Corollary 174. That the set is not analytic follows
from the fact that if the set were analytic (and hence, being also co-analytic,
Borel), then there would be α < ω1 such that |f, F | ≤ α
for all f, F in the class (see Kechris [84] Theorem 35.23). But this contradicts
Corollary 166. The last statement about determinacy is just Kechris [84]
Theorem 26.4.
Corollary 196. For all α < ω1 the set of (f, F ) in M [a, b] × C[a, b] such that
f ∈ Den∗α[a, b] and $F(x) = \int_a^x f$ is Borel, and the set Den∗α[a, b] is analytic.
Proof. That this set is Borel follows from the previous theorem and Corollary 177.
That the set Den∗α[a, b] is analytic follows from the fact that its definition requires
us to say that there are g ∈ Den∗β[a, b] and $G(x) = \int_a^x g$.
Corollary 197. The set of F in C[a, b] such that F ∈ ACG∗([a, b]) is co-analytic
but not analytic, and hence, assuming analytic determinacy, this set
is $\Pi^1_1$-complete.
Proof. That this set is co-analytic follows immediately from the previous theorem
and Proposition 159 (v). That the set is not analytic follows from the fact that
if the set were analytic (and hence, being also co-analytic, Borel), then there
would be α < ω1 such that |F | ≤ α for all F in the class
(see Kechris [84] Theorem 35.23). But this contradicts Corollary 166. The last
statement about determinacy is just Kechris [84] Theorem 26.4.
4.4 Model Theory
There are many different languages in which one can view C[a, b], L1[a, b],
〈Den∗α[a, b]〉, 〈Denα[a, b]〉 and Den[a, b]. For instance, as abelian groups, they
are all isomorphic since all are divisible torsion-free abelian groups of cardinality
$2^{\omega}$, and divisible torsion-free abelian groups are uncountably categorical (cf.
Marker [107] Corollary 3.1.11 p. 72). In this section, the relationship between
C[a, b], L1[a, b], 〈Den∗α[a, b]〉, 〈Denα[a, b]〉 and Den[a, b] as Q[X]-modules (resp.
R[X]-modules) is studied, where we interpret the map f ↦ Xf as the indefinite
integral, so that $Xf = \int_a^x f$. More specifically, it is assumed that this indefinite
integral is the Riemann integral on C[a, b], the Lebesgue integral on L1[a, b], and
the Denjoy integral on 〈Den∗α[a, b]〉, 〈Denα[a, b]〉 and Den[a, b]. Note too that this
integral is the indefinite integral, so that e.g. if Xf = 0, then f = 0.
Recall that the signature of R-modules is simply the signature of abelian groups
equipped with linear maps r for each element r of R (cf. Marker [107]
Example 1.2.7 p. 17, Hodges [70] p. 37 and Appendix A1 pp. 653 ff, Prest [124] p. 2).
Hence, e.g. the signature of R[X]-modules is uncountable, whereas the signature
of Q[X]-modules is countable. Likewise, since elements r of R correspond to
linear maps in an R-module M , subsets of M such as rM = {ra : a ∈ M} and
ker(r) = {a ∈M : ra = 0} are definable without parameters in M .
There are two main results of this section. The first is that the structures
C[a, b], L1[a, b], 〈Den∗α[a, b]〉, 〈Denα[a, b]〉 and Den[a, b] are stable but not
superstable as Q[X]-modules or R[X]-modules, and that the indefinite integral
$f \mapsto \int_a^x f$ is not definable in these structures as Q-vector spaces or R-vector
spaces (cf. Corollary 204). The second main result is that as Q[X]-modules, or
R[X]-modules, these structures are elementarily equivalent, and as Q[X]-modules
their complete theory is decidable (cf. Corollary 227).
4.4.1 Indexes of Subgroups and Non-Definability of the Integral
Remark 198. The goal of this subsection is to establish that the index
$[X^k M : X^{k+1} M]$ of $X^{k+1}M$ in $X^k M$ is infinite, where M ⊆ M [a, b] is one of C[a, b], L1[a, b],
〈Den∗α[a, b]〉, 〈Denα[a, b]〉 or Den[a, b] (cf. Theorem 200). This result is used to
show that these modules are stable but not superstable, which in turn shows us
that the indefinite integral $Xf = \int_a^x f$ is not definable in M as a vector space over
the reals or rationals (cf. Corollary 204). This information about the cardinality
of $[X^k M : X^{k+1} M]$ will also be used in the next subsection to show that all of
these modules are elementarily equivalent.
Remark 199. Recall from Definition 141 that a subset X ⊆ M [a, b] is said to
be subinterval-closed if f ∈ X and (c, d) ⊆ (a, b) implies $f \chi_{(c,d)} \in X$. Further,
recall that it was shown in Proposition 157 that the subspaces 〈Denα[a, b]〉 and
〈Den∗α[a, b]〉 are subinterval-closed. Finally, note that L1[a, b] and Den[a, b] are
subinterval-closed. Hence, the following theorem can be applied to all these
modules.
Theorem 200. Suppose that M is a submodule of Den[a, b] which contains C[a, b].
Suppose further that one of the following conditions holds: (i) M = C[a, b] or (ii)
M is subinterval-closed. Then $[X^k M : X^{k+1} M]$ is infinite.
Proof. First we show this for M satisfying hypothesis (i). For each f ∈ M we
may choose g ∈ C[a, b] such that f = g a.e., and so M may be identified with
C[a, b]. This implies that for k ≥ 0 we have
\[
X^k M = \{f \in C^k[a, b] : \forall\, i < k \;\; f^{(i)}(a) = 0\} \tag{4.48}
\]
where we stipulate $X^0 M = M$ and $C^0[a, b] = C[a, b]$. For, in the case of k = 0,
this follows by our stipulation. Suppose that (4.48) holds for k. To see it holds
for k + 1, consider first the left-to-right containment. That is, suppose that
$f \in X^{k+1}M$. Then $f = \int_a^x g$ where $g \in X^k M \subseteq M = C[a, b]$. Then since this is the
Riemann integral applied to continuous functions, it follows that f is differentiable
everywhere and that f ′ = g, where g is by hypothesis a continuous function. Then
$f^{(i+1)}(a) = g^{(i)}(a) = 0$ for all i < k by induction hypothesis and $f(a) = \int_a^a g = 0$, so that
$f \in C^{k+1}[a, b]$ and $\forall\, i < k + 1 \;\; f^{(i)}(a) = 0$. For the right-to-left containment of
(4.48) in the case of k + 1, suppose that $f \in C^{k+1}[a, b]$ and $\forall\, i < k + 1 \;\; f^{(i)}(a) = 0$.
Let g = f ′, which by hypothesis is in C[a, b] = M . Then by induction hypothesis, it
follows that $g \in X^k M \subseteq M = C[a, b]$, so that $\int_a^x g = \int_a^x f' = f(x) - f(a) = f(x)$,
so that $f \in X^{k+1}M$. Hence, in fact (4.48) holds for all k ≥ 0.
Now $C^k[a, b]$ is a Banach space with norm given by
\[
\|f\|_{u,k} = \sum_{0 \le i \le k} \|f^{(i)}\|_u \tag{4.49}
\]
where $\|\cdot\|_u$ is the sup-norm on C[a, b] (cf. Folland [43] Exercise 9 p. 155). From
this and equation (4.48) it follows that $X^k M$ is a closed subgroup of $C^k[a, b]$ and
hence is itself a Polish group (cf. Gao [51] Proposition 2.2.1 p. 45). Now, note
that for all k ≥ 0, it is the case that $X^k M$ and $X^{k+1}M$ are homeomorphic by
the map f ↦ Xf . This map is clearly bijective, and it is continuous since if
$\|f - g\|_{u,k} < \min\{\frac{\varepsilon}{2}, \frac{\varepsilon}{2(b-a)}\}$, then
\[
\begin{aligned}
\|Xf - Xg\|_{u,k+1} &= \sum_{0 \le i \le k+1} \|(X(f - g))^{(i)}\|_u
\le \sup_{x \in [a,b]} \int_a^x |f - g| + \sum_{0 \le i \le k} \|(f - g)^{(i)}\|_u \\
&< \Big[ \sup_{x \in [a,b]} \int_a^x \frac{\varepsilon}{2(b-a)} \Big] + \frac{\varepsilon}{2}
= \frac{\varepsilon}{2(b-a)} \cdot (b - a) + \frac{\varepsilon}{2} = \varepsilon
\end{aligned} \tag{4.50}
\]
Note that in this equation, it is important to remember that the integral is the
Riemann integral, and hence it is permissible to infer from the integrability of a
function to the integrability of its absolute value (cf. Remark 169). Further, this
map is open since if $\|Xf - Xg\|_{u,k+1} < \varepsilon$ then
\[
\begin{aligned}
\|f - g\|_{u,k} &\le \|Xf - Xg\|_u + \sum_{0 \le i \le k} \|(f - g)^{(i)}\|_u \\
&= \|Xf - Xg\|_u + \sum_{1 \le i \le k+1} \|(Xf - Xg)^{(i)}\|_u = \|Xf - Xg\|_{u,k+1} < \varepsilon
\end{aligned} \tag{4.51}
\]
Hence, in fact $X^k M$ and $X^{k+1}M$ are homeomorphic via the map f ↦ Xf .
By induction on k ≥ 0, it follows from this that $X^{k+1}M$ is meager in $X^k M$.
For k = 0, note that XM = XC[a, b] is meager in M = C[a, b] since the nowhere
differentiable functions are comeager in M and contained in the set M \ XM
(cf. Munkres [115] Theorem 49.1 p. 300). Suppose that it holds for k, that is
suppose that $X^{k+1}M$ is meager in $X^k M$. Since meagerness is preserved under
homeomorphisms, it follows that $X^{k+2}M$ is meager in $X^{k+1}M$, which is just to
say that the statement holds for k + 1.
From this it easily follows that $[X^k M : X^{k+1}M]$ is infinite, and indeed
uncountable. For, suppose that $[X^k M : X^{k+1}M]$ were countable. Then
$X^k M = \bigsqcup_n (g_n + X^{k+1}M)$, where $g_n \in X^k M$. Since $X^k M$ is a Polish group and
$X^{k+1}M$ is meager in $X^k M$, we have that each $g_n + X^{k+1}M$ is meager
in $X^k M$ (since addition of a constant is a homeomorphism in any Polish
group). Hence, the Polish space $X^k M$ is a countable union of meager subsets,
contradicting the Baire Category Theorem. So $[X^k M : X^{k+1}M]$ is infinite
(and indeed uncountable) for M satisfying hypothesis (i).
Now we show this for M satisfying hypothesis (ii). Suppose that this fails,
and $[X^k M : X^{k+1}M]$ is finite. Then $X^k M = \bigsqcup_{i=1}^{n} (X^k f_i + X^{k+1}M)$, where $f_i \in M$.
Choose a continuous nowhere differentiable function g ∈ C[a, b] ⊆ M . Choose a
partition $[a, b] = [a_1, b_1] \sqcup \cdots \sqcup [a_n, b_n]$, and let $h = X^k[g + \sum_{i=1}^{n} f_i \chi_{[a_i, b_i]}]$, which is
in $X^k M$ since M is subinterval-closed. So, by hypothesis, there is j ∈ [1, n] such
that $h \in X^k f_j + X^{k+1}M$. Then
\[
h - X^k f_j = X^k\Big[g + \Big(\sum_{i=1}^{n} f_i \chi_{[a_i, b_i]}\Big) - f_j\Big] \in X^{k+1}M \tag{4.52}
\]
From this, since X is injective, it follows that
\[
g + \Big(\sum_{i=1}^{n} f_i \chi_{[a_i, b_i]}\Big) - f_j \in XM \tag{4.53}
\]
But then this function is differentiable a.e. and so differentiable a.e. on each
$[a_i, b_i]$. But on the interval $[a_j, b_j]$, this function is equal to g, which contradicts
the choice of g. So $[X^k M : X^{k+1}M]$ is infinite when M satisfies hypothesis (ii).
Remark 201. It is a classical result that all modules are stable (cf. Prest [124]
Theorem 3.1 (a) p. 55, or Hodges [70] Theorem A.1.13 p. 660). However, on the
basis of the following proposition, it can be inferred from the previous theorem
that the modules of integrable functions which we are concerned with are not
superstable.
Proposition 202. A module M is superstable if and only if there is no infinite
descending sequence of definable subgroups, each of infinite index in its predeces-
sor.
Proof. See Prest [124] Theorem 3.1 (b) p. 55, or Ziegler [162] Theorem 2.1 p. 156.
Corollary 203. Suppose that M is a submodule of Den[a, b] which contains
C[a, b]. Suppose further that one of the following conditions holds: (i) M = C[a, b]
or (ii) M is subinterval-closed. Then M is stable but not superstable. Further,
the map X : M →M is not definable in M as a vector space over R or Q.
Proof. The part about stability and superstability follows immediately from the
above proposition (Proposition 202) and the previous theorem (Theorem 200).
Suppose that the map X : M → M were definable in M as a vector space over R
or Q. Then since M as a vector space over either of these fields is superstable (cf.
Hodges [70] p. 330 and Exercise 6 p. 283), and since superstability is preserved
downward under definability, it would follow that M as an R[X]-module or
Q[X]-module would be superstable, contradicting the first part of the corollary.
Corollary 204. Suppose that M is C[a, b], L1[a, b], 〈Den∗α[a, b]〉, 〈Denα[a, b]〉 or
Den[a, b], considered as an R[X]-module (resp. Q[X]-module), where $Xf = \int_a^x f$.
Then M is stable but not superstable, and the map X : M →M is not definable
in the structure of M as an R-vector space (resp. Q-vector space).
Proof. This follows immediately from the previous corollary, keeping in mind that
the conditions of the previous corollary are satisfied for these modules, as we noted
in Remark 199.
4.4.2 Elementary Equivalence and Decidability
Remark 205. Considered as an abelian group or as a Q-module, the structures
C[a, b], L1[a, b], 〈Den∗α[a, b]〉, 〈Denα[a, b]〉, and Den[a, b] have elementarily equiva-
lent and decidable complete theories, since these theories are respectively the the-
ories of divisible torsion-free abelian groups and Q-vector spaces. But considered
in the language of rings, the theory of C[a, b] is very complex, since it is known to
interpret full second-order arithmetic (cf. Cherlin [20] pp. 47-48), and it is known
that Den[a, b] is not closed under multiplication (cf. Swartz [142] Example 14
p. 43). Hence, it is a natural question to ask after the elementary equivalence and
decidability of the complete theories of C[a, b], L1[a, b], 〈Den∗α[a, b]〉, 〈Denα[a, b]〉,
and Den[a, b] as Q[X]-modules, where again X is interpreted as the indefinite
integral, so that $Xf = \int_a^x f$. In this section, we show that the complete theories of
these structures are elementarily equivalent and decidable (cf. Corollary 227).
Definition 206. If M is a module over a ring R, then a pp-formula $\varphi(x_1, \ldots, x_j)$
is a formula of the form $\exists\, y_1, \ldots, y_k \bigwedge_{i=1}^{n} \varphi_i(x_1, \ldots, x_j, y_1, \ldots, y_k)$ where each $\varphi_i$ is an
atomic formula. Further, since the language is that of R-modules, atomic formulas
have the form $\psi(x_1, \ldots, x_j, y_1, \ldots, y_k) \equiv \sum_{\ell=1}^{j} r_\ell x_\ell + \sum_{\ell=1}^{k} s_\ell y_\ell = 0$, where
$r_\ell, s_\ell \in R$. Hence, for a pp-formula $\varphi(x_1, \ldots, x_j)$, there is an n × j matrix A and an n × k
matrix B with entries in R such that $M \models \exists\, y \bigwedge_{i=1}^{n} \varphi_i(x, y) \iff \exists\, y \; Ax + By = 0$,
where we view x as a j × 1 matrix and y as a k × 1 matrix, and where 0 is the
n× 1 matrix with entries all equal to 0.
Remark 207. Note that any subset $G \subseteq M^j$ defined by a pp-formula is a
subgroup of $M^j$.
Definition 208. The invariant sentences of Th(M) are sentences of the form
[G : G ∩ H] = k or [G : G ∩ H] > k, where k ≥ 0 and where G,H ⊆ M are
pp-definable subgroups of M which are definable without parameters.
Theorem 209. (pp-Elimination of Quantifiers) (i) Every set definable without
parameters in an R-module M is a Boolean combination of pp-definable sets. (ii)
For an R-module M , the theory Th(M) is axiomatized by the R-module axioms
and the invariant sentences of M .
Proof. See Prest [124] Corollaries 2.16 & 2.19 p. 37 and Hodges [70] p. 655.
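Definition 210. A pp-formula $\varphi(x_1, \ldots, x_j)$ is said to be basic if it can be written
as $\exists\, y \; (\sum_{\ell=1}^{j} r_\ell x_\ell) + sy = 0$ or as $r_\ell x_\ell = 0$. That is, over an R-module M , the
basic pp-formula definable sets are $r^{-1} s M^j$ or ker(r) since
\[
\begin{aligned}
a \in r^{-1} s M^j &\iff r \cdot a \in s M^j \iff \exists\, b \in M \;\; r \cdot a = sb \iff \exists\, b \in M \;\; r \cdot a + sb = 0 \\
a \in \ker(r) &\iff r \cdot a = 0
\end{aligned}
\]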
Proposition 211. (i) If R is a PID, then every pp-formula is equivalent to
a finite conjunction of basic pp-formulas. (ii) Further if R is countable, then given
a pp-formula one can compute from R the finite conjunction of basic pp-formulas.
Proof. The proof of (i) is from Prest [124] Theorem 2.Z.1 pp. 46-47, which we
include merely for the sake of noting that this proof also gives us a proof of (ii).
The pp-formula defines a set ∃ y Ax + By = 0. Since R is a PID, there is a
diagonal matrix D and invertible matrices U, V such that UBV = D. Then
\[
\begin{aligned}
\exists\, y \; Ax + By = 0 &\iff \exists\, y \; UAx + UBVV^{-1}y = 0 \\
&\iff \exists\, y \; UAx + DV^{-1}y = 0 \iff \exists\, y \; UAx + Dy = 0
\end{aligned} \tag{4.54}
\]
Since D is diagonal, this is equivalent to a finite conjunction of basic pp-formulas.
Moreover, since there is an algorithm for computing the matrices D,U, V from
the matrix B and an oracle for R, this procedure is computable in R.
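The diagonalization step UBV = D in the proof above can be sketched concretely. The block below is a minimal illustration under a simplifying assumption: it works over the integers Z, which are a PID with a computable Euclidean division, rather than over Q[X] or a general PID with an oracle. It uses elementary row and column operations and does not claim to be the algorithm of Prest [124]. The function names `identity`, `matmul` and `diagonalize` are hypothetical.

```python
def identity(n):
    return [[int(i == j) for j in range(n)] for i in range(n)]


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def diagonalize(B):
    """Return (U, D, V) with U*B*V == D, D diagonal, U and V invertible over Z."""
    m, n = len(B), len(B[0])
    D = [row[:] for row in B]
    U, V = identity(m), identity(n)

    def swap_rows(i, j):
        D[i], D[j] = D[j], D[i]
        U[i], U[j] = U[j], U[i]

    def add_row(i, j, q):  # row_i += q * row_j, mirrored in U
        D[i] = [x + q * y for x, y in zip(D[i], D[j])]
        U[i] = [x + q * y for x, y in zip(U[i], U[j])]

    def swap_cols(i, j):
        for mat in (D, V):
            for row in mat:
                row[i], row[j] = row[j], row[i]

    def add_col(i, j, q):  # col_i += q * col_j, mirrored in V
        for mat in (D, V):
            for row in mat:
                row[i] += q * row[j]

    for t in range(min(m, n)):
        # Find a nonzero pivot in the remaining lower-right block.
        pivot = next(((i, j) for i in range(t, m) for j in range(t, n)
                      if D[i][j] != 0), None)
        if pivot is None:
            break
        swap_rows(t, pivot[0])
        swap_cols(t, pivot[1])
        while True:
            # Euclidean elimination below the pivot.
            for i in range(t + 1, m):
                while D[i][t] != 0:
                    add_row(i, t, -(D[i][t] // D[t][t]))
                    if D[i][t] != 0:
                        swap_rows(i, t)
            # Euclidean elimination to the right of the pivot.
            for j in range(t + 1, n):
                while D[t][j] != 0:
                    add_col(j, t, -(D[t][j] // D[t][t]))
                    if D[t][j] != 0:
                        swap_cols(j, t)
            # Column swaps may have reintroduced entries below the pivot.
            if all(D[i][t] == 0 for i in range(t + 1, m)):
                break
    return U, D, V
```

Each swap strictly decreases the absolute value of the pivot, so the inner loop terminates. Once D is diagonal, each row of UAx + Dy = 0 involves at most one of the variables y, which is what makes the formula a conjunction of basic pp-formulas.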
Definition 212. Suppose that M is a normed space. Then a compact linear
operator p : M → M is a linear operator which maps bounded sets to sets with
compact closure.
Theorem 213. (Riesz Theorem) Suppose that M is a normed space and p : M → M
is a compact linear operator, and consider the map 1+p given by a ↦ a+p(a).
Then 1 + p is surjective if and only if 1 + p is injective.
Proof. See Kress [94] Theorem 3.4 p. 32.
Definition 214. Suppose that U ⊆ C[a, b]. Then U is pointwise bounded if for
every x ∈ [a, b] there is M > 0 such that |f(x)| ≤ M for all f ∈ U . Further, U is
equicontinuous if for every ε > 0 there is δ > 0 such that |x− y| < δ implies
|f(x)− f(y)| < ε for all f ∈ U .
Theorem 215. (Arzela-Ascoli) Suppose that U ⊆ C[a, b]. Then U has compact
closure if and only if U is pointwise bounded and equicontinuous.
Proof. See Folland [43] Theorem 4.43.
Proposition 216. The map X : C[a, b] → C[a, b] given by $Xf = \int_a^x f$ is a
compact linear operator.
Proof. Suppose that U ⊆ C[a, b] is bounded, say $\|f\|_u < M$ for all f ∈ U . We must
show that XU is relatively compact, which by the Arzela-Ascoli Theorem comes down to
showing that it is pointwise bounded and equicontinuous. For pointwise boundedness,
simply note that
\[
|(Xf)(x)| = \Big| \int_a^x f(t)\,dt \Big| \le \int_a^x |f(t)|\,dt \le \int_a^x M\,dt \le M(b - a).
\]
For equicontinuity, suppose that ε > 0 and let $\delta < \frac{\varepsilon}{M}$. If 0 < x − y < δ then
\[
|(Xf)(x) - (Xf)(y)| = \Big| \int_a^x f(t)\,dt - \int_a^y f(t)\,dt \Big| = \Big| \int_y^x f(t)\,dt \Big|
\le \int_y^x |f(t)|\,dt \le \int_y^x M\,dt = M(x - y) < M \frac{\varepsilon}{M} = \varepsilon.
\]
Proposition 217. Suppose that M is a module over a commutative ring R and
that r ∈ R is such that r : M →M is a bijection. Further suppose that s ∈ R is such
that (i) sa = 0 implies a = 0 for all a ∈M , and such that (ii) s is invertible in R.
Then sr : M →M is a bijection.
Proof. If (sr)a = 0 then s(ra) = 0, so by hypothesis (i) on s we have that ra = 0,
and by injectivity of r we have a = 0. Hence sr is injective. Suppose that b ∈M .
By surjectivity of r choose a ∈ M such that ra = b. By hypothesis (ii) on s and
the commutativity of R, we have $(sr)(s^{-1}a) = (srs^{-1})a = (ss^{-1}r)a = ra = b$.
Hence sr is surjective.
Proposition 218. Suppose that p ∈ R[X] is such that X ∤ p. Then p : C[a, b] →
C[a, b] is a bijection.
Proof. Since X ∤ p, by the previous proposition we may without loss of generality
write $p(X) = 1 + a_1 X + \cdots + a_k X^k$. Further, set $C = \max\{|a_i| : i \in [1, k]\}$.
Proposition 216 implies that $a_1 X + \cdots + a_k X^k$ is a compact linear operator, and
hence, by the Riesz Theorem 213, it suffices to show that p is injective. Before
proceeding, note that if 1 denotes the real-valued function which is equal to 1
everywhere on [a, b], then $X^k(1) = \frac{(x-a)^k}{k!}$ for k ≥ 1. Recall also that if f ∈ C[a, b]
then $\|f\|_u$ denotes the supremum of |f| on [a, b]. From this it follows easily that
$|X^k f(x)| \le \|f\|_u \cdot \frac{(x-a)^k}{k!}$ for k ≥ 1 and f ∈ C[a, b].
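The identity for the iterated integral of the constant function 1 can be checked with exact rational arithmetic. The sketch below assumes a representation not found in the text: a polynomial in t = x − a is stored as a list of coefficients from degree 0 upward, and X acts by integrating from t = 0. The names `integrate_from_a` and `iterate_X` are hypothetical.

```python
from fractions import Fraction
from math import factorial


def integrate_from_a(coeffs):
    """Send sum c_i t^i to sum c_i/(i+1) t^(i+1), the integral from t = 0."""
    return [Fraction(0)] + [Fraction(c) / (i + 1) for i, c in enumerate(coeffs)]


def iterate_X(coeffs, k):
    """Apply the indefinite-integral operator X to a polynomial k times."""
    for _ in range(k):
        coeffs = integrate_from_a(coeffs)
    return coeffs


# Coefficients of X^3 applied to the constant function 1, as a polynomial in t.
cube = iterate_X([1], 3)
```

Here `cube` has a single nonzero coefficient, in degree 3, equal to `Fraction(1, factorial(3))`, in agreement with the formula for k = 3.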
So suppose that f ∈ ker p. We must show that f(x) = 0 for all x ∈ [a, b]. It
suffices to show that there is $e_x \ge 0$ such that for all n ≥ 0:
\[
|f(x)| \le \|f\|_u \frac{e_x^n}{n!} \tag{4.55}
\]
For, choose N > 0 such that $e_x < N$. Then for n ≥ N it follows that the ratio
\[
\frac{e_x^{n+1} / (n+1)!}{e_x^n / n!} = \frac{e_x}{n+1} < \frac{N}{n+1} < 1 \tag{4.56}
\]
Hence, by the ratio test, the series $\sum_{n=1}^{\infty} \frac{e_x^n}{n!}$ converges, from which it follows that
$\lim_n \frac{e_x^n}{n!} = 0$, so that (4.55) implies that f(x) = 0.
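The ratio-test step can be illustrated numerically. The sketch below computes the terms e^n / n! exactly for a sample value of e, which is an arbitrary choice made only for illustration; the name `terms` is hypothetical.

```python
from fractions import Fraction


def terms(e, count):
    """Return the list [e^n / n! for n = 0, ..., count - 1], computed exactly."""
    out, term = [], Fraction(1)
    for n in range(count):
        out.append(term)
        term = term * e / (n + 1)  # e^(n+1)/(n+1)! = (e^n/n!) * e/(n+1)
    return out


# Sample value e = 5: consecutive ratios are 5/(n+1), which drop below 1 once n >= 5.
sample = terms(Fraction(5), 40)
```

Past the index N = 5 the terms in `sample` decrease strictly, and the last one is already far below 10^-10, which is the behaviour that forces f(x) = 0 in (4.55).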
The values of $e_x$ depend on whether x−a ≥ 1. So first suppose that x−a ≥ 1.
Then define $e_x = Ck(x - a)^k$, so that it suffices to show for all n ≥ 1 that
\[
|f(x)| \le \|f\|_u \frac{(Ck(x - a)^k)^n}{n!} = \|f\|_u \frac{C^n k^n (x - a)^{kn}}{n!} \tag{4.57}
\]
For n = 1, this follows since 0 = pf implies $f = -a_1 Xf - \cdots - a_k X^k f$ and hence
since x− a ≥ 1 it follows that
\[
|f(x)| \le \sum_{i=1}^{k} C \|f\|_u \cdot \frac{(x - a)^i}{i!} \le \sum_{i=1}^{k} C \|f\|_u \cdot (x - a)^k = \|f\|_u Ck(x - a)^k \tag{4.58}
\]
Suppose that (4.57) holds for n. To show that it holds for n + 1, first note again
that 0 = pf implies $f = -a_1 Xf - \cdots - a_k X^k f$. From this, the fact that (4.57)
holds for n, and x− a ≥ 1, it follows that
\[
|f(x)| \le \sum_{i=1}^{k} C \|f\|_u X^i\Big(\frac{C^n k^n (x - a)^{kn}}{n!}\Big)
= \sum_{i=1}^{k} C \|f\|_u \frac{C^n k^n (x - a)^{kn+i}}{n! \cdot (kn + 1) \cdots (kn + i)} \tag{4.59}
\]
\[
\le \sum_{i=1}^{k} \|f\|_u \frac{C^{n+1} k^n (x - a)^{k(n+1)}}{(n + 1)!}
= \|f\|_u C^{n+1} k^{n+1} \frac{(x - a)^{k(n+1)}}{(n + 1)!} \tag{4.60}
\]
Hence, we have that (4.57) holds for n + 1. Hence, by mathematical induction,
(4.57) holds for all n ≥ 1, which is what we wanted to show.
Now consider the second case, where x− a < 1. Then define $e_x = Ck(x - a)$,
so that it suffices to show for all n ≥ 1 that
\[
|f(x)| \le \|f\|_u \frac{(Ck(x - a))^n}{n!} = \|f\|_u \frac{C^n k^n (x - a)^n}{n!} \tag{4.61}
\]
For n = 1, this follows since 0 = pf implies $f = -a_1 Xf - \cdots - a_k X^k f$ and hence
since x− a < 1 it follows that
\[
|f(x)| \le \sum_{i=1}^{k} C \|f\|_u \cdot \frac{(x - a)^i}{i!} \le \sum_{i=1}^{k} C \|f\|_u \cdot (x - a) = \|f\|_u Ck(x - a) \tag{4.62}
\]
Suppose that (4.61) holds for n. To show that it holds for n + 1, first note again
that 0 = pf implies $f = -a_1 Xf - \cdots - a_k X^k f$. From this, the fact that (4.61)
holds for n, and x− a < 1, it follows that
\[
|f(x)| \le \sum_{i=1}^{k} C \|f\|_u X^i\Big(\frac{C^n k^n (x - a)^n}{n!}\Big)
= \sum_{i=1}^{k} C \|f\|_u \frac{C^n k^n (x - a)^{n+i}}{(n + i)!} \tag{4.63}
\]
\[
\le \sum_{i=1}^{k} \|f\|_u \frac{C^{n+1} k^n (x - a)^{n+1}}{(n + 1)!}
= \|f\|_u C^{n+1} k^{n+1} \frac{(x - a)^{n+1}}{(n + 1)!} \tag{4.64}
\]
Hence, we have that (4.61) holds for n + 1. Hence, by mathematical induction,
(4.61) holds for all n ≥ 1, which is what we wanted to show.
Remark 219. The following trick of lifting the Riesz theory to Den[a, b] is more or
less explicit in the proof of Theorem 3.10 of Federson and Bianconi ([35] pp. 103 ff),
although they restrict themselves to the case of Den[a, b] and do not frame this in
the language of modules.
Proposition 220. Suppose that M is a submodule of Den[a, b] which contains
C[a, b]. Suppose that p ∈ R[X] is such that X ∤ p. Then p : M →M is a bijection.
Proof. By Proposition 217, we may assume that $p(X) = 1 + a_1 X + \cdots + a_k X^k$.
To see that p is injective, note that if pf = 0 then $f = -a_1 Xf - \cdots - a_k X^k f$.
Since XM ⊆ C[a, b], we have that f ∈ C[a, b] and pf = 0 in C[a, b]. But by the
previous proposition, p : C[a, b]→ C[a, b] is an injection, and hence f = 0. So in
fact p : M →M is an injection.
To see that p : M → M is a surjection, suppose that g ∈ M . Since XM ⊆
C[a, b], we have that (p − 1)g ∈ C[a, b] and hence −(p − 1)g ∈ C[a, b]. By
the previous proposition, p : C[a, b] → C[a, b] is a surjection, and hence there is
f ∈ C[a, b] such that pf = −(p−1)g. Then p(f+g) = pf+pg = −(p−1)g+pg =
(−(p− 1) + p)(g) = g. Hence, in fact p : M →M is a surjection.
Proposition 221. Suppose that M is a module over a commutative ring R and
that r ∈ R is such that r : M → M is bijective. Then r is an automorphism of
M .
Proof. Let α : M → M by α(a) = ra. By hypothesis, α is a bijection. Further,
by the definition of a module, α(a + b) = r(a + b) = ra + rb = α(a) + α(b), and
likewise since R is commutative we have α(sa) = r(sa) = (rs)a = (sr)a = s(ra) =
sα(a).
Corollary 222. Suppose that M is a submodule of Den[a, b] which contains
C[a, b]. Suppose that p ∈ R[X] is such that X ∤ p. Then p : M → M is an
automorphism of M as an R[X]-module (or Q[X]-module).
Proposition 223. Suppose that M is a submodule of Den[a, b] which contains
C[a, b]. Suppose further that p, q ∈ R[X]. Then $p^{-1}qM$ is either M or $X^{\ell}M$ for
some ℓ > 0. Further, there is a computable procedure which (i) given p, q ∈ Q[X]
determines which of these occurs and which (ii) returns ℓ > 0 if the latter occurs.
Proof. Compute the largest k such that $X^k$ divides both p and q. Let $p = X^k p_0$
and $q = X^k q_0$. Then $p^{-1}qM = p_0^{-1}q_0 M$ since pf + qg = 0 if and only if
$X^k(p_0 f + q_0 g) = 0$ if and only if $p_0 f + q_0 g = 0$. Now either $X \mid q_0$ or $X \nmid q_0$, and we can
compute which of these occurs.
If $X \mid q_0$ then by definition of k we have $X \nmid p_0$ and so by Corollary 222,
$p_0$ is an automorphism of M as an R[X]-module. Further, if $X \mid q_0$ then
compute the largest ℓ > 0 such that $X^{\ell} \mid q_0$. Let $q_0 = X^{\ell} q_1$, where $X \nmid q_1$. Then by
Corollary 222, $q_1$ is an automorphism of M as an R[X]-module. Then we
have the following, where the last equality is due to the fact that automorphisms
fix definable sets:
\[
p^{-1}qM = p_0^{-1} q_0 M = p_0^{-1} X^{\ell} q_1 M = p_0^{-1} X^{\ell} M = X^{\ell} M \tag{4.65}
\]
On the other hand, suppose that $X \nmid q_0$. Then by Corollary 222, $q_0$
is an automorphism of M as an R[X]-module. Then we have the following, where
the last equality follows from the fact that $r^{-1}M = M$ for any R-module M and
r ∈ R:
\[
p^{-1}qM = p_0^{-1} q_0 M = p_0^{-1} M = M \tag{4.66}
\]
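The computable procedure described in this proof only needs the order of vanishing of p and q at X = 0. The sketch below assumes that polynomials are given as coefficient lists from degree 0 upward and that q is nonzero, which the proof appears to assume implicitly when it takes the largest ℓ with X^ℓ dividing q0. The names `x_order` and `classify_preimage` are hypothetical.

```python
def x_order(p):
    """Largest k such that X^k divides p; None for the zero polynomial."""
    for k, c in enumerate(p):
        if c != 0:
            return k
    return None


def classify_preimage(p, q):
    """Return l >= 0 such that p^{-1} q M = X^l M, where l == 0 means M itself.

    Assumption: q is nonzero (otherwise p^{-1} q M is ker(p), which is
    handled separately in the text).
    """
    order_p, order_q = x_order(p), x_order(q)
    if order_q is None:
        raise ValueError("q must be a nonzero polynomial")
    # k is the largest power of X dividing both p and q (every power divides 0).
    k = order_q if order_p is None else min(order_p, order_q)
    # q0 = q / X^k has order order_q - k; this is the exponent l of the proof.
    return order_q - k
```

For instance, with p = 1 + X and q = X^2 the function returns 2, matching (4.65), while with p = X^3 and q = X^2 + X^3 it returns 0, matching (4.66).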
Proposition 224. Suppose that M is a submodule of Den[a, b] which contains
C[a, b]. Suppose further that p ∈ R[X]. Then ker(p) is either 0 or M . Further,
there is a computable procedure which given p ∈ Q[X] determines which of
these occurs.
Proof. If p is zero then ker(p) = M , and we can compute whether this occurs. If
p is non-zero, then compute the largest k such that $X^k$ divides p. Let $p = X^k p_0$.
Then $\ker(p) = \ker(p_0)$ since $X^k p_0 f = 0$ if and only if $p_0 f = 0$. Then $X \nmid p_0$ and so
by Corollary 222, we have that $p_0$ is an automorphism of M as an R[X]-module
and so $\ker(p_0) = 0$.
Corollary 225. Suppose that M is a submodule of Den[a, b] which contains
C[a, b]. Suppose further that one of the following conditions holds: (i) M = C[a, b]
or (ii) M is subinterval-closed. Suppose finally that G,H are pp-definable
subgroups of M definable without parameters. Then [G : G ∩H] = 1 or [G : G ∩H]
is infinite, and from formulas defining G and H we can compute which of these
occurs. Further, this procedure is uniform in such M , in that formulas for G and H
will return the same values for [G : G ∩H] for all such M .
Proof. By Proposition 211 and the two previous propositions, G and H are finite
conjunctions of the subgroups 0, $X^{\ell}M$, and M , and hence themselves are among
the subgroups 0, $X^{\ell}M$, and M . Further, from Proposition 211 and the two
previous propositions, given formulas defining G and H we can computably determine
whether G (resp. H) is 0, $X^{\ell}M$, or M . So there are nine possible cases to consider.
The cases in which 0 occurs are trivial, and so there are really only four interesting
cases to consider. Case one: G = M and H = M . Then [G : G ∩H] = 1. Case
two: G = M and $H = X^k M$. Then [G : G∩H] is infinite by Theorem 200. Case
three: $G = X^{\ell}M$ and H = M . Then [G : G ∩H] = 1. Case four: $G = X^{\ell}M$ and
$H = X^k M$. Then [G : G ∩ H] = 1 if ℓ ≥ k and [G : G ∩ H] is infinite if ℓ < k by
Theorem 200.
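The case analysis in this proof amounts to a small decision table. The sketch below assumes an encoding not found in the text: the trivial subgroup 0 is represented by `ZERO`, and the subgroup X^l M by the integer l, with l = 0 standing for M itself. The name `index` is hypothetical, and `math.inf` is used only as a marker for an infinite index.

```python
import math

ZERO = None  # stands for the trivial subgroup 0


def index(G, H):
    """Return [G : G ∩ H] as 1 or math.inf.

    G and H are either ZERO or an integer l >= 0 standing for X^l M.
    """
    if G is ZERO:
        return 1          # [0 : 0 ∩ H] = 1
    if H is ZERO:
        return math.inf   # G is an infinite group and G ∩ 0 = 0
    # X^l M is contained in X^k M exactly when l >= k.
    return 1 if G >= H else math.inf
```

Because the table depends only on the exponents and not on M, the same inputs give the same answer for every M covered by the corollary, which is the uniformity claimed in its statement.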
Corollary 226. Suppose that M is a submodule of Den[a, b] which contains
C[a, b]. Suppose further that one of the following conditions holds: (i) M = C[a, b]
or (ii) M is subinterval-closed. Then the theory of M as a Q[X] module is com-
putable, and any two such M have the same theory as R[X]-modules or Q[X]-
modules.
Proof. Consider the following computable theory T of Q[X]-modules. This theory
has all the axioms of Q[X]-modules, and if G and H are pp-definable subgroups
then T has the axiom [G : G ∩H] = 1 or the axioms [G : G ∩H] > k for each k, according
to what the computation in the previous corollary returns. By Theorem 209 (ii),
this theory T is the complete theory of M as a Q[X]-module.
Corollary 227. The Q[X]-modules C[a, b], L1[a, b], 〈Den∗α[a, b]〉, 〈Denα[a, b]〉, and
Den[a, b] have elementarily equivalent and decidable complete theories. Further,
as R[X]-modules they are elementarily equivalent.
Proof. This follows immediately from the previous corollary and the fact, noted
in Remark 199, that we can apply Theorem 200 to all these modules.
4.5 Further Questions
Question 228. Can the assumptions of analytic determinacy be removed in
Corollaries 195 & 197? By Corollary 195, we have that Den[a, b] is $\Sigma^1_2$. Is
Den[a, b] co-analytic or $\Sigma^1_2$, perhaps $\Sigma^1_2$-complete? By Corollary 196, we have
that Den∗α[a, b] is analytic. Is Den∗α[a, b] also Borel?
Question 229. Viewing Derv[a, b] as a subspace of $(C[a, b])^{\omega}$, Dougherty and
Kechris [28] show that Derv[a, b] is co-analytic but not analytic (and indeed not
even analytic on the co-analytic subspace of $C^{\omega}[a, b]$ of sequences which converge
pointwise). Is Derv[a, b] co-analytic in M [a, b]?
Question 230. Do the stability, elementary equivalence, and decidability results
from § 4.4 still hold if one views C[a, b], L1[a, b], 〈Den∗α[a, b]〉, 〈Denα[a, b]〉, and
Den[a, b] as R[X] or Q[X]-modules, where Xf 7→∫ baK(x, y)f(y)dy for appropriate
real-valued continuous functions K(x, y)? Note that some care has to be exercised
with respect to the choice of K, since Den[a, b] is not closed under multiplication
(cf. Swartz [142] Example 14 p. 43).
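For orientation, the module structure asked about here can be spelled out as follows. This is a sketch under the standard convention that a polynomial acts through iterates of the operator; the symbols T_K and c_i are introduced here for illustration only:

```latex
% The operator attached to the kernel K:
(T_K f)(x) = \int_a^b K(x, y)\, f(y)\, dy

% Action of a polynomial p(X) = \sum_{i=0}^{n} c_i X^i on f:
p(X) \cdot f = \sum_{i=0}^{n} c_i\, T_K^{\,i} f,
\qquad T_K^{\,0} f = f, \quad T_K^{\,i+1} f = T_K\bigl(T_K^{\,i} f\bigr)
```

For this to define a module, the underlying space must be closed under T_K, which is where the choice of K matters.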
BIBLIOGRAPHY
1. Aetas Kantiana. Culture et Civilisation, Bruxelles, 1968-1981.
2. Brad Armendt. Dutch Books, Additivity, and Utility Theory. Philosophical Topics, 21:1–20, 1993.
3. Emil Artin. Geometric Algebra. Interscience, New York, 1957.
4. James Ax. The Elementary Theory of Finite Fields. Annals of Mathematics, 88:239–271, 1968.
5. Alan Baker. Is there a Problem of Induction for Mathematics? In Mary Leng, Alexander Paseau, and Michael Potter, editors, Mathematical Knowledge, pages 59–72. Oxford University Press, Oxford, 2007.
6. Jon Barwise and John Schlipf. On Recursively Saturated Models of Arithmetic. In A. Dold and B. Eckmann, editors, Model Theory and Algebra, volume 498 of Lecture Notes in Mathematics, pages 42–55. Springer, Berlin, 1975.
7. Paul Benacerraf. Logicism, Some Considerations. PhD Thesis, Princeton University, 1960.
8. Paul Benacerraf. Frege: The Last Logicist. Midwest Studies in Philosophy, 6:17–35, 1981. Reprinted in [24].
9. Frederick C. Beiser. The Fate of Reason: German Philosophy from Kant to Fichte. Harvard University Press, Cambridge, 1987.
10. George Boolos. The Consistency of Frege’s Foundations of Arithmetic. In Judith Jarvis Thomson, editor, On Being and Saying: Essays in Honor of Richard Cartwright, pages 3–20. MIT Press, Cambridge, 1987. Reprinted in [13], [24].
11. George Boolos. Frege’s Theorem and the Peano Postulates. The Bulletin of Symbolic Logic, 1(3):317–326, 1995. Reprinted in [13].
12. George Boolos. On the Proof of Frege’s Theorem. In Adam Morton and Stephen P. Stich, editors, Benacerraf and His Critics, pages 143–159. Blackwell, 1996. Reprinted in [13].
13. George Boolos. Logic, Logic, and Logic. Harvard University Press, Cambridge, MA, 1998. Edited by Richard Jeffrey.
14. George Boolos and Richard G. Heck Jr. Die Grundlagen der Arithmetik, Sections 82–83. In Matthias Schirn, editor, Philosophy of Mathematics Today, pages 407–428. Clarendon Press, 1998. Reprinted in [13].
15. John P. Burgess. Fixing Frege. Princeton Monographs in Philosophy. Princeton University Press, Princeton, 2005.
16. John P. Burgess and A. P. Hazen. Predicative Logic and Formal Arithmetic. Notre Dame Journal of Formal Logic, 39(1):1–17, 1998.
17. Samuel R. Buss. Nelson’s Work on Logic and Foundations and Other Reflections on the Foundations of Mathematics. In William G. Faris, editor, Diffusion, Quantum Theory, and Radically Elementary Mathematics, volume 47 of Mathematical Notes, pages 183–208. Princeton University Press, Princeton, NJ, 2006.
18. Ernst Cassirer. Substanzbegriff und Funktionsbegriff. Cassirer, Berlin, 1910. Reprinted in [19] vol. 6.
19. Ernst Cassirer. Gesammelte Werke. Meiner, Hamburg, 1998–2009. 26 volumes. Edited by Birgit Recki.
20. Gregory Cherlin. Rings of Continuous Functions: Decision Problems. In L. Pacholski, J. Wierzejewski, and A.J. Wilkie, editors, Model Theory of Algebra and Arithmetic, volume 834 of Lecture Notes in Mathematics, pages 44–91. Springer, Berlin, 1980.
21. A.P. Dawid. Probability, Symmetry, and Frequency. British Journal for the Philosophy of Science, 36(2):107–128, 1985.
22. Richard Dedekind. Was sind und was sollen die Zahlen? Vieweg, Braunschweig, 1888. Second edition 1893. Reprinted in [23] vol. 3 pp. 335–391.
23. Richard Dedekind. Gesammelte mathematische Werke. Vieweg, Braunschweig, 1930–1932. Three volumes. Edited by Robert Fricke, Emmy Noether, and Øystein Ore.
24. William Demopoulos, editor. Frege’s Philosophy of Mathematics. Harvard University Press, Cambridge, 1995.
25. William Demopoulos and Peter Clark. The Logicism of Frege, Dedekind, and Russell. In Stewart Shapiro, editor, The Oxford Handbook of Philosophy of Mathematics and Logic, pages 129–165. Oxford University Press, 2005.
26. Keith J. Devlin. Constructibility. Perspectives in Mathematical Logic. Springer, Berlin, 1984.
27. J. L. Doob. Measure Theory, volume 143 of Graduate Texts in Mathematics. Springer, New York, 1994.
28. Randall Dougherty and Alexander S. Kechris. The Complexity of Antidifferentiation. Advances in Mathematics, 88(2):145–169, 1991.
29. Michael Dummett. Frege: Philosophy of Mathematics. Harvard University Press, Cambridge, 1991.
30. John Earman. Bayes or Bust? A Critical Examination of Bayesian Confirmation Theory. MIT Press, Cambridge, 1992.
31. Kenny Easwaran. Probabilistic Proofs and Transferability. Philosophia Mathematica, 17:341–362, 2009.
32. Herbert B. Enderton. A Mathematical Introduction to Logic. Harcourt, Burlington, second edition, 2001.
33. Don Fallis. The Epistemic Status of Probabilistic Proof. The Journal of Philosophy, 94(4):165–186, 1997.
34. Don Fallis. The Reliability of Randomized Algorithms. British Journal for the Philosophy of Science, 51:255–271, 2000.
35. Marcia Federson and Ricardo Bianconi. Linear Fredholm Integral Equations and the Integral of Kurzweil. Journal of Applied Analysis, 8(1):83–110, 2002.
36. Solomon Feferman. Reflecting on Incompleteness. Journal of Symbolic Logic, 56(1):1–49, 1991.
37. Solomon Feferman. In the Light of Logic. Logic and Computation in Philosophy. Oxford University Press, New York, 1998.
38. Solomon Feferman, Harvey M. Friedman, Penelope Maddy, and John R. Steel. Does Mathematics need New Axioms? The Bulletin of Symbolic Logic, 6(4):401–446, 2000.
39. Pierre Fermat. Remarques sur l’Arithmétique des Infinis du S. J. Wallis. In Commercium epistolicum de quaestionibus quibusdam mathematicis nuper habitum, pages 24–29. Lichfield, 1656.
40. Fernando Ferreira and Kai F. Wehmeier. On the Consistency of the ∆^1_1-CA fragment of Frege’s Grundgesetze. Journal of Philosophical Logic, 31(4):301–311, 2002.
41. Branden Fitelson. The Plurality of Bayesian Measures of Confirmation and the Problem of Measure Sensitivity. Philosophy of Science, 66:S362–S378, 1999.
42. Branden Fitelson. A Decision Procedure for Probability Calculus with Applications. Review of Symbolic Logic, 1(1):111–125, 2008.
43. Gerald B. Folland. Real Analysis. Pure and Applied Mathematics. John Wiley & Sons Inc., New York, second edition, 1999.
44. Gottlob Frege. Die Grundlagen der Arithmetik. Koebner, Breslau, 1884.
45. Gottlob Frege. Kleine Schriften. Olms, Hildesheim, 1967. Edited by Ignacio Angelelli.
46. Jakob Friedrich Fries. Die mathematische Naturphilosophie nach philosophischer Methode bearbeitet: ein Versuch. Mohr & Winter, Heidelberg, 1822. Reprinted in [47] vol. 13.
47. Jakob Friedrich Fries. Sämtliche Schriften. Scientia, Aalen, 1967–2004. 26 volumes. Edited by Gert König and Lutz Geldsetzer.
48. Haim Gaifman. Reasoning with Limited Resources and Assigning Probabilities to Arithmetical Statements. Synthese, 140:97–119, 2004.
49. R. O. Gandy. Proof of Mostowski’s Conjecture. Bulletin de l’Académie Polonaise des Sciences. Série des Sciences Mathématiques, Astronomiques et Physiques, 8:571–575, 1960.
50. Mihai Ganea. Burgess’ PV is Robinson’s Q. Journal of Symbolic Logic, 72(2):618–624, 2007.
51. Su Gao. Invariant Descriptive Set Theory. Pure and Applied Mathematics. CRC Press, Boca Raton, 2009.
52. Clark Glymour. Relevant Evidence. Journal of Philosophy, 72(14):403–426, 1975.
53. Clark Glymour. The Epistemology of Geometry. Noûs, 11(3):227–251, 1977.
54. Clark N. Glymour. Theory and Evidence. Princeton University Press, Princeton, 1980.
55. Kurt Gödel. Collected Works. Vol. III. Unpublished Lectures and Essays. Clarendon, New York, 1995. Edited by Solomon Feferman et al.
56. Russell A. Gordon. The Integrals of Lebesgue, Denjoy, Perron, and Henstock, volume 4 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, 1994.
57. Ronald L. Graham, Donald E. Knuth, and Oren Patashnik. Concrete Mathematics. Addison-Wesley, Reading, second edition, 1994.
58. Robert E. Greene and Steven G. Krantz. Function Theory of One Complex Variable, volume 40 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI, third edition, 2006.
59. Petr Hájek and Pavel Pudlák. Metamathematics of First-Order Arithmetic. Perspectives in Mathematical Logic. Springer, Berlin, 1998.
60. Bob Hale and Crispin Wright. The Reason’s Proper Study. Oxford University Press, Oxford, 2001.
61. Valentina S. Harizanov. Pure Computable Model Theory. In Yu. L. Ershov, S. S. Goncharov, A. Nerode, J.B. Remmel, and V. W. Marek, editors, Handbook of Recursive Mathematics, Vol. 1, volume 138 of Studies in Logic and the Foundations of Mathematics, pages 3–114. North-Holland, Amsterdam, 1998.
62. Robin Hartshorne. Geometry: Euclid and Beyond. Undergraduate Texts in Mathematics. Springer, New York, 2000.
63. Allen Hatcher. Algebraic Topology. Cambridge University Press, Cambridge, 2002.
64. Richard G. Heck, Jr. The Development of Arithmetic in Frege’s Grundgesetze der Arithmetik. The Journal of Symbolic Logic, 58(2):579–601, 1993.
65. Richard G. Heck, Jr. The Consistency of Predicative Fragments of Frege’s Grundgesetze der Arithmetik. History and Philosophy of Logic, 17(4):209–220, 1996.
66. Richard G. Heck, Jr. Frege’s Theorem: An Introduction. The Harvard Review of Philosophy, 7:56–73, 1999.
67. Richard G. Heck, Jr. Cardinality, Counting, and Equinumerosity. Notre Dame Journal of Formal Logic, 41(3):187–209, 2000.
68. Thomas Hobbes. Six Lessons to the Professors of Mathematiques, One of Geometry, the Other of Astronomy: in the Chaires set up by Sir Henry Savile in the University of Oxford. Crook, London, 1656. Reprinted in [69] vol. 7.
69. Thomas Hobbes. The English Works of Thomas Hobbes. Bohn, 1839. Eleven volumes.
70. Wilfrid Hodges. Model Theory, volume 42 of Encyclopedia of Mathematics and its Applications. Cambridge University Press, Cambridge, 1993.
71. Walter Hoering. Anomalies of Reduction. In Wolfgang Balzer, David A. Pearce, and Heinz-Jürgen Schmidt, editors, Reduction in Science: Structure, Examples, Philosophical Problems, pages 33–50. Dordrecht, 1984.
72. Thomas Hofweber. Proof-Theoretic Reduction as a Philosopher’s Tool. Erkenntnis, 53:127–146, 2000.
73. Paul Horwich. Probability and Evidence. Cambridge University Press, Cambridge, 1982.
74. Colin Howson and Peter Urbach. Scientific Reasoning: The Bayesian Approach. Open Court, Chicago, second edition, 1993.
75. Kenneth Ireland and Michael Rosen. A Classical Introduction to Modern Number Theory, volume 84 of Graduate Texts in Mathematics. Springer, New York, second edition, 1990.
76. Daniel Isaacson. Arithmetical Truth and Hidden Higher-Order Concepts. In Paris Logic Group, editor, Logic Colloquium ’85 (Orsay, 1985), volume 122 of Studies in Logic and the Foundations of Mathematics, pages 147–169. North-Holland, Amsterdam, 1987.
77. Daniel Isaacson. Some Considerations on Arithmetical Truth and the ω-Rule. In Michael Detlefsen, editor, Proof, Logic and Formalization, pages 94–138. Routledge, London, 1992.
78. St. Iwan. On the Untenability of Nelson’s Predicativism. Erkenntnis, 53(1–2):147–154, 2000.
79. Frank Jackson. Petitio and the Purpose of Arguing. Pacific Philosophical Quarterly, 65:26–36, 1984.
80. Thomas Jech. Set Theory. Springer Monographs in Mathematics. Springer, Berlin, 2003. The Third Millennium Edition.
81. James Joyce. How Probabilities Reflect Evidence. Philosophical Perspectives, 19:153–178, 2005.
82. Abraham Gotthelf Kästner. Ueber die geometrischen Axiome. Philosophisches Magazin, 2(4):420–430, 1790. Reprinted in Aetas Kantiana [1] vol. 63.
83. Alexander S. Kechris. The Complexity of Antidifferentiation, Denjoy Totalization, and Hyperarithmetic Reals. In Proceedings of the International Congress of Mathematicians. Berkeley, California, August 3–11, 1986, volume 1, pages 307–313, Providence, RI, 1987. American Mathematical Society.
84. Alexander S. Kechris. Classical Descriptive Set Theory, volume 156 of Graduate Texts in Mathematics. Springer, New York, 1995.
85. H. Jerome Keisler. Model Theory for Infinitary Logic. Logic with Countable Conjunctions and Finite Quantifiers, volume 62 of Studies in Logic and the Foundations of Mathematics. North-Holland, Amsterdam, 1971.
86. Kevin T. Kelly. The Logic of Reliable Inquiry. Oxford University Press, New York, 1996.
87. Kevin T. Kelly. The Logic of Success. In Peter Clark and Katherine Hawley, editors, Philosophy of Science Today, pages 11–38. Oxford University Press, Oxford, 2000.
88. Kevin T. Kelly and Clark Glymour. Why Probability does not Capture the Logic of Success. In Christopher Hitchcock, editor, Contemporary Debates in Philosophy of Science, pages 94–114. Blackwell, Malden, 2004.
89. Kevin T. Kelly and Oliver Schulte. Church’s Thesis and Hume’s Problem. In Maria Luisa Dalla Chiara, editor, Logic and Scientific Methods, pages 159–177. Kluwer, Dordrecht, 1997.
90. Bakhadyr Khoussainov and Anil Nerode. Automata Theory and Its Applications, volume 21 of Progress in Computer Science and Applied Logic. Birkhäuser, Boston, MA, 2001.
91. Philip Kitcher. The Nature of Mathematical Knowledge. Oxford University Press, Oxford, 1984.
92. Stephen Cole Kleene. Quantification of Number-Theoretic Functions. Compositio Mathematica, 14:23–40, 1959.
93. Peter Koellner. Truth in Mathematics: The Question of Pluralism. In Otávio Bueno and Øystein Linnebo, editors, New Waves in the Philosophy of Mathematics, pages 80–116. Palgrave, 2009.
94. Rainer Kress. Linear Integral Equations, volume 82 of Applied Mathematical Sciences. Springer, second edition, 1999.
95. Serge Lang. Algebra, volume 211 of Graduate Texts in Mathematics. Springer, New York, third edition, 2002.
96. Shaughan Lavine. Something about Everything: Universal Quantification in the Universal Sense of Universal Quantification. In Agustín Rayo and Gabriel Uzquiano, editors, Absolute Generality, pages 98–148. Clarendon Press, Oxford, 2006.
97. Gottfried Wilhelm Leibniz. New Essays on Human Understanding. Cambridge Texts in the History of Philosophy. Cambridge University Press, Cambridge, 1996. Edited and translated by Peter Remnant and Jonathan Bennett. References are to the page numbers of the original, which are given in the margins of this edition.
98. Hannes Leitgeb. On Formal and Informal Provability. In Otávio Bueno and Øystein Linnebo, editors, New Waves in the Philosophy of Mathematics, pages 263–299. Palgrave, New York, 2009.
99. Per Lindström. Aspects of Incompleteness, volume 10 of Lecture Notes in Logic. Association for Symbolic Logic, Urbana, IL, second edition, 2003.
100. Øystein Linnebo. Predicative Fragments of Frege Arithmetic. The Bulletin of Symbolic Logic, 10(2):153–174, 2004.
101. John Locke. An Essay Concerning Human Understanding. Oxford University Press, Oxford, 1979. Edited by Peter H. Nidditch.
102. Fraser MacBride. Speaking with the Shadows: A Study of Neo-Logicism. The British Journal for the Philosophy of Science, 54(1):103–163, 2003.
103. Fraser MacBride. Can Ante Rem Structuralism Solve the Access Problem? Philosophical Quarterly, 58(230):155–164, 2008.
104. Patrick Maher. Betting on Theories. Cambridge University Press, Cambridge, 1993.
105. Kenneth Manders. Diagram-Based Geometric Practice. In Paolo Mancosu, editor, The Philosophy of Mathematical Practice, pages 65–79. Oxford University Press, Oxford, 2008.
106. Kenneth Manders. The Euclidean Diagram (1995). In Paolo Mancosu, editor, The Philosophy of Mathematical Practice, pages 80–133. Oxford University Press, Oxford, 2008.
107. David Marker. Model Theory, volume 217 of Graduate Texts in Mathematics. Springer, New York, 2002.
108. David Marker. Introduction to the Model Theory of Fields. In Model Theory of Fields, volume 5 of Lecture Notes in Logic, pages 1–37. Association for Symbolic Logic, La Jolla, CA, second edition, 2006.
109. David Marker, Margit Messmer, and Anand Pillay. Model Theory of Fields, volume 5 of Lecture Notes in Logic. Association for Symbolic Logic, La Jolla, CA, second edition, 2006.
110. Margit Messmer. Some Model Theory of Separably Closed Fields. In Model Theory of Fields, volume 5 of Lecture Notes in Logic, pages 135–152. Association for Symbolic Logic, La Jolla, CA, second edition, 2006.
111. William Mitchell. Beginning Inner Model Theory. In Matthew Foreman and Akihiro Kanamori, editors, Handbook of Set Theory, pages 1449–1496. Springer, Berlin, 2010.
112. Antonio Montalbán and André Nies. Borel Structures: A Brief Survey. Unpublished. Dated January 27, 2010.
113. Yiannis N. Moschovakis. Descriptive Set Theory, volume 100 of Studies in Logic and the Foundations of Mathematics. North-Holland, Amsterdam, 1980.
114. Rajeev Motwani and Prabhakar Raghavan. Randomized Algorithms. Cambridge University Press, Cambridge, 1995.
115. James R. Munkres. Topology. Prentice Hall, Upper Saddle River, second edition, 2000.
116. Mark Nadel. L_{ω_1 ω} and Admissible Fragments. In Jon Barwise and Solomon Feferman, editors, Model-Theoretic Logics, Perspectives in Mathematical Logic, pages 271–316. Springer, New York, 1985.
117. Edward Nelson. Predicative Arithmetic, volume 32 of Mathematical Notes. Princeton University Press, Princeton, NJ, 1986.
118. Piergiorgio Odifreddi. Classical Recursion Theory. Vol. I, volume 125 of Studies in Logic and the Foundations of Mathematics. North-Holland, Amsterdam, 1989.
119. J.B. Paris and P. Waterhouse. Atom Exchangeability and Instantial Relevance. Journal of Philosophical Logic, 38:313–332, 2009.
120. Charles Parsons. Mathematical Thought and Its Objects. Harvard University Press, Cambridge, 2008.
121. Lee Peng Yee. Lanzhou Lectures on Henstock Integration, volume 2 of Series in Real Analysis. World Scientific, Singapore, 1989.
122. Washek F. Pfeffer. A Note on the Generalized Riemann Integral. Proceedings of the American Mathematical Society, 103(4):1161–1166, 1988.
123. Bruno Poizat. Stable Groups, volume 87 of Mathematical Surveys and Monographs. American Mathematical Society, Providence, RI, 2001.
124. Mike Prest. Model Theory and Modules, volume 130 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, 1988.
125. Proclus. A Commentary on the First Book of Euclid’s Elements. Princeton University Press, Princeton, 1970. Translated by Glenn R. Morrow. Page references are to the critical edition, which are given in the margins of this edition.
126. Thomas Reid. Essays on the Intellectual Powers of Man. Bell, Edinburgh, 1785. Reprinted in [127].
127. Thomas Reid. The Works of Thomas Reid, D.D. MacLachlan and Stewart, Edinburgh, 1863. Two volumes. Edited by Sir William Hamilton.
128. Michael D. Resnik. Mathematics as a Science of Patterns: Ontology and Reference. Noûs, 15(4):529–550, 1981.
129. Michael D. Resnik. Mathematics as a Science of Patterns. Clarendon, Oxford, 1997.
130. Lance J. Rips and Jennifer Asmuth. Mathematical Induction and Induction in Mathematics. In Aidan Feeney and Evan Heit, editors, Inductive Reasoning: Experimental, Developmental, and Computational Approaches, pages 248–268. Cambridge University Press, 2007.
131. Lance J. Rips, Amber Bloomfield, and Jennifer Asmuth. From Numerical Concepts to Concepts of Number. Behavioral and Brain Sciences, 31:623–687, 2008.
132. Juho Ritola. Begging the Question: A Study of a Fallacy, volume 12 of Reports from the Department of Philosophy. Painosalama Oy, Turku, 2004. Dissertation, Department of Philosophy, University of Turku, Finland.
133. Hartley Rogers, Jr. Theory of Recursive Functions and Effective Computability. MIT Press, Cambridge, MA, second edition, 1987.
134. Gerald E. Sacks. Higher Recursion Theory. Perspectives in Mathematical Logic. Springer, Berlin, 1990.
135. Stewart Shapiro. Foundations without Foundationalism: A Case for Second-Order Logic, volume 17 of Oxford Logic Guides. The Clarendon Press, New York, 1991.
136. Stewart Shapiro. Philosophy of Mathematics: Structure and Ontology. Oxford University Press, Oxford, 2000.
137. Stephen G. Simpson. An Extension of the Recursively Enumerable Turing Degrees. Journal of the London Mathematical Society, 75(2):287–297, 2007.
138. Stephen G. Simpson. Subsystems of Second Order Arithmetic. Cambridge University Press, Cambridge, second edition, 2009.
139. Robert I. Soare. Recursively Enumerable Sets and Degrees. Perspectives in Mathematical Logic. Springer, Berlin, 1987.
140. Clifford Spector. Hyperarithmetical Quantifiers. Fundamenta Mathematicae, 48:313–320, 1959/1960.
141. John R. Steel. Forcing with Tagged Trees. Annals of Mathematical Logic, 15(1):55–74, 1978.
142. Charles Swartz. Introduction to Gauge Integrals. World Scientific, Singapore, 2001.
143. William Tait. Finitism. Journal of Philosophy, 78(9):524–546, 1981. Reprinted in [144].
144. William Tait. The Provenance of Pure Reason. Logic and Computation in Philosophy. Oxford University Press, New York, 2005.
145. Alfred Tarski, Andrzej Mostowski, and Raphael M. Robinson. Undecidable Theories. Studies in Logic and the Foundations of Mathematics. North-Holland, 1953.
146. Adolf Trendelenburg. Logische Untersuchungen. Hirzel, Leipzig, second edition, 1862. Two volumes.
147. Lou van den Dries. Tame Topology and O-Minimal Structures, volume 248 of London Mathematical Society Lecture Note Series. Cambridge University Press, Cambridge, 1998.
148. Albert Visser. Categories of Theories and Interpretations. In Ali Enayat, Iraj Kalantari, and Mojtaba Moniri, editors, Logic in Tehran, volume 26 of Lecture Notes in Logic, pages 284–341. Association for Symbolic Logic, La Jolla, 2006.
149. Albert Visser. The Predicative Frege Hierarchy. Unpublished. Dated October 13, 2006.
150. John Wallis. Due Correction for Mr Hobbes, or Schoole Discipline, for not Saying his Lessons Right. Lichfield, Oxford, 1656.
151. John Wallis. A Treatise of Algebra. Davis, Oxford, 1685.
152. John Wallis. The Arithmetic of Infinitesimals. Sources and Studies in the History of Mathematics and Physical Sciences. Springer, New York, 2004. Translated by Jacqueline A. Stedall.
153. Andrew Wayne. Bayesianism and Diverse Evidence. Philosophy of Science, 62(1):111–121, 1995.
154. Kai F. Wehmeier. Consistent Fragments of Grundgesetze and the Existence of Non-Logical Objects. Synthese, 121(3):309–328, 1999.
155. Kai F. Wehmeier. Russell’s Paradox in Consistent Fragments of Frege’s Grundgesetze der Arithmetik. In One Hundred Years of Russell’s Paradox, volume 6 of de Gruyter Series in Logic and its Applications, pages 247–257. de Gruyter, Berlin, 2004.
156. Jon Williamson. Countable Additivity and Subjective Probability. British Journal for the Philosophy of Science, 50:401–416, 1999.
157. Timothy Williamson. Vagueness. Routledge, London and New York, 1994.
158. Crispin Wright. Frege’s Conception of Numbers as Objects, volume 2 of Scots Philosophical Monographs. Aberdeen University Press, Aberdeen, 1983.
159. Crispin Wright. On the Harmless Impredicativity of N= (Hume’s Principle). In Matthias Schirn, editor, Philosophy of Mathematics Today, pages 339–368. Clarendon Press, Oxford, 1998. Reprinted in [60].
160. Crispin Wright. Response to Dummett. In Philosophy of Mathematics Today, pages 389–405. Clarendon Press, Oxford, 1998. Reprinted in [60].
161. Crispin Wright. Is Hume’s Principle Analytic? Notre Dame Journal of Formal Logic, 40(1):6–30, 1999. Reprinted in [60].
162. Martin Ziegler. Model Theory of Modules. Annals of Pure and Applied Logic, 26(2):149–213, 1984.
163. Boris Zilber. Zariski Geometries: Geometry from a Logician’s Point of View, volume 360 of London Mathematical Society Lecture Note Series. Cambridge University Press, 2010.