LOGIC Advanced 1994

This book presents an up-to-date, unified treatment of research in bounded arith-metic and complexity ofpropositional logic with emphasis on independence proofsand lower bound proofs. The author discusses the deep connections between logicand complexity theory and lists a number of intriguing open problems.

An introduction to the basics of logic and complexity theory is followed by dis-cussion of important results in propositional proof systems and systems of boundedarithmetic. Then more advanced topics are treated, including polynomial simula-tions and conservativity results, various witnessing theorems, the translation ofbounded formulas (and their proofs) into propositional ones, the method of ran-dom partial restrictions and its applications, direct independence proofs, completesystems of partial relations, lower bounds to the size of constant-depth propo-sitional proofs, the method of Boolean valuations, the issue of hard tautologiesand optimal proof systems, combinatorics and complexity theory within boundedarithmetic, and relations to complexity issues of predicate calculus.

ENCYCLOPEDIA OF MATHEMATICS AND ITS APPLICATIONS

EDITED BY G.-C. ROTA

Volume 60

Bounded Arithmetic, Propositional Logic, and Complexity Theory


4 W. Miller, Jr. Symmetry and separation of variables6 H. Minc Permanents

11 W. B. Jones and W. J. Thron Continued fractions12 N. F. G. Martin and J. W. England Mathematical theory of entropy18 H. O. Fattorini The Cauchy problem19 G. G. Lorentz, K. Jetter, and S. D. Riemenschneider Birkhoff interpolation21 W. T. Tutte Graph theory22 J. R. Bastida Field extensions and Galois theory23 J. R. Cannon The one-dimensional heat equation25 A. Salomaa Computation and automata26 N. White (ed.) Theory of matroids27 N. H. Bingham, C. M. Goldie, and J. L. Teugels Regular variation28 P. P. Petrushev and V. A. Popov Rational approximation of real functions29 N. White (ed.) Combinatorial geometries30 M. Pohst and H. Zassenhaus Algorithmic algebraic number theory31 J. Aczel and J. Dhombres Functional equations containing several variables32 M. Kuczma, B. Chozewski, and R. Ger Iterative functional equations33 R. V. Ambartzumian Factorization calculus and geometric probability34 G. Gripenberg, S.-O. Londen, and O. Staffans Volterra integral and functional

equations35 G. Gasper and M. Rahman Basic hypergeometric series36 E. Torgersen Comparison of statistical experiments37 A. Neumaier Interval methods for systems of equations38 N. Korneichuk Exact constants in approximation theory39 R. A. Brualdi and H. J. Ryser Combinatorial matrix theory40 N. White (ed.) Matroid applications41 S. Sakai Operator algebras in dynamical systems42 W. Hodges Model theory43 H. Stahl and V. Totik General orthogonal polynomials44 R. Schneider Convex bodies45 G. Da Prato and J. Zabczyk Stochastic equations in infinite dimensions46 A. Bjomer, M. Las Vergnas, B. Sturmfels, N. White, and G. Ziegler Oriented

matroids47 G. A. Edgar and L. Sucheston Stopping times and directed processes48 C. Sims Computation with finitely presented groups49 T. Palmer Banach algebras and the general theory of *-algebras50 F. Borceux Handbook of Categorical Algebra I51 F. Borceux Handbook of Categorical Algebra II52 F. Borceux Handbook of Categorical Algebra III54 A. Katok and B. Hasselblatt Introduction to the modern theory of dynamical

systems58 R. Gardner Geometric tomography


Bounded Arithmetic, Propositional Logic,and Complexity Theory

JAN KRAJICEKAcademy of Sciences of the Czech Republic

AMBRIDGEUNIVERSITY PRESS

Published by the Press Syndicate of the University of CambridgeThe Pitt Building, Trumpington Street, Cambridge C132 I RP

40 West 20th Street, New York, NY 10011-4211, USA10 Stamford Road, Oakleigh, Melbourne 3166, Australia

© Cambridge University Press 1995

First published 1995

Library of Congress Cataloging-in-Publication Data

Krajicek, Jan.

Bounded arithmetic, propositional logic, and complexity theory / Jan Krajicek.

p. cm. - (Encyclopedia of mathematics and its applications; v. 60)

Includes bibliographical references (p. 000-000) and indexes.

ISBN 0-521-45205-8

1. Constructive mathematics. 2. Proposition (Logic).3. Computational complexity. I. Title. II. Series.

QA9.56.K73 19955 1 1.3 - dc20 94-47054

CIP

A catalog record for this book is available from the British Library.

ISBN 0-521-45205-8 hardback

Transferred to digital printing 2004

To Karel Tesaf

CONTENTS

Preface

Acknowledgments

1 Introduction

2 Preliminaries2.1 Logic

2.2 Complexity theory

3 Basic complexity theory3.1 The P versus NP problem

3.2 Bounded arithmetic formulas

3.3 Bibliographical and other remarks

4 Basic propositional logic4.1 Propositional proof systems4.2 Resolution4.3 Sequent calculus4.4 Frege systems4.5 The extension and the substitution rules4.6 Quantified propositional logic4.7 Bibliographical and other remarks

5 Basic bounded arithmetic5.1 Theory IAo5.2 Theories S2 and T2

5.3 Theory PV

vii

viii Contents

5.4 Coding of sequences 79

5.5 Second order systems 83

5.6 Bibliographical and other remarks 92

6 Definability of computations 93

6.1 Polynomial time with oracles 94

6.2 Bounded number of queries 97

6.3 Interactive computations 97


7 Witnessing theorems 102

7.1 Cut-elimination for bounded arithmetic 102

7.2 Eb-definability in Sz and oracle polynomial time 105

7.3 E +2- and E +, -definability in S`2 and bounded queries 113

7.4 E+2-definability in TZ and counterexamples 120

7.5 Eb-definability in T2 and polynomial local search 121

7.6 Model-theoretic constructions 126


8 Definability and witnessing in second order theories 132

8.1 Second order computations 132

8.2 Definable functionals 134


9 Translations of arithmetic formulas 139

9.1 Bounded formulas with a predicate 139

9.2 Translation into quantified propositional formulas 144

9.3 Reflection principles and polynomial simulations 158


9.5 Witnessing and test trees 180


10 Finite axiomatizability problem 185

10.1 Finite axiomatizability of S2 and T2 185

10.2 T2' versus 52+ 186

10.3 SZ versus TZ 193

10.4 Relativized cases 194

10.5 Consistency notions 20210.6 Bibliographical and other remarks 208

11 Direct independence proofs 210

11.1 Herbrandization of induction axioms 21011.2 Weak pigeonhole principle 213

Contents ix

11.3 An independence criterion 22011.4 Lifting independence results 22511.5 Bibliographical and other remarks 231

12 Bounds for constant-depth Frege systems 23212.1 Upper bounds 23212.2 Depth d versus depth d + 1 23612.3 Complete systems 24312.4 k-evaluations 25212.5 Lower bounds for the pigeonhole principle and for counting

principles 25812.6 Systems with counting gates 26612.7 Forcing in nonstandard models 27212.8 Bibliographical and other remarks 277

13 Bounds for Frege and extended Frege systems 27913.1 Counting in Frege systems 27913.2 An approach to lower bounds 28613.3 Boolean valuations 28913.4 Bibliographical and other remarks 297

14 Hard tautologies and optimal proof systems 29914.1 Finitistic consistency statements and optimal proof systems 29914.2 Hard tautologies 30414.3 Bibliographical and other remarks 307

15 Strength of bounded arithmetic 30815.1 Counting 30815.2 A circuit lower bound 31215.3 Polynomial hierarchy in models of bounded arithmetic 31615.4 Bibliographical and other remarks 324

References 327

Subject index 335

Name index 339

Symbol index 341

PREFACE

The central problem of complexity theory is the relation of deterministic andnondeterministic computations: whether P equals NP, and generally whether thepolynomial time hierarchy PH collapses. The famous P versus NP problem is oftenregarded as one of the most important and beautiful open problems in contemporarymathematics, even by nonspecialists (see, for example, Smale [1992]).

The central problem of bounded arithmetic is whether it is a finitely axiomati-zable theory. That amounts to deciding whether there is a model of the theory inwhich the polynomial time hierarchy does not collapse.

The central problem of propositional logic is whether there is a proof system inwhich every tautology has a proof of size polynomial in the size of the tautology. Inthis generality the question is equivalent to asking whether the class NP is closedunder complementation. Particular cases of the problem, to establish lower boundsfor usual calculi, are analogous to constructing models of associated systems ofbounded arithmetic in which NP # coNP.

Notions, problems, and results about complexity (of predicates, functions,proofs, ...) are deep-rooted in mathematical logic, and (good) theorems aboutthem are among the most profound results in the field. Bounded arithmetic andpropositional logic are closely interrelated and have several explicit and implicitconnections to the computational complexity theory around the P versus NP prob-lem. Central computational notions (Turing machine, Boolean circuit) are crucialin the metamathematics of the logical systems, and models of these systems arenatural structures for concepts of computational complexity.

Moreover, the only approach in sight universal enough to have a chance ofproducing lower bounds to the size of general Boolean circuits needed for the firstproblem is the method of approximations, which is a version of the ultraproductconstruction (and of forcing); forcing bears a relation to the second and the thirdproblems, and a general framework for the last problem is in terms of Booleanvaluations (Sections 3.1, 9.4, 12.7, and 13.3).

xi

xii Preface

Much of the contemporary research in computational complexity theory con-centrates on proving weaker versions of P NP, for example, on proving lowerbounds to the size of restricted models of circuits, and some deep results (althoughtelling little about the P versus NP problem) have been obtained.

It is, however, possible to approach the same problem differently and to tryto prove statement P : NP first for other structures than natural numbers N, inparticular for nonstandard models of systems of bounded arithmetic.

Such an approach is, in fact, common in mathematics, where for example anumber-, theoretic conjecture about the field of rational numbers is first tested forfunction fields that share many properties with the rationals. Similarly, we can tryto prove that P 0 NP holds in a model of a system of bounded arithmetic. Nonstan-dard models of systems of bounded arithmetic are not ridiculously pathologicalstructures, and a part of the difficulty in constructing them stems exactly from thefact that it is hard to distinguish these structures, by the studied properties, fromnatural numbers.

Methods (all essentially combinatorial) used for known circuit lower boundsare demonstrably inadequate for the general problem. It is to be expected that anontrivial combinatorial or algebraic argument will be required for the solutionof the P versus NP problem. However, I believe that the close relations of thisproblem to bounded arithmetic and propositional logic indicate that such a solutionshould also require a nontrivial insight into logic. For example, recent stronglower bounds for constant-depth proof systems needed a reinterpretation of logicalvalidity (Section 12.4).

The relations among bounded arithmetic, propositional logic, and complexitytheory are not ad hoc but are reflected in numerous more specific relations, rangingfrom intertranslatability of arithmetic and propositional proofs and computationsof machines, to characterizations of provably total functions in various subsystemsof bounded arithmetic in terms of familiar computational models, correspondencein definability of predicates by restricted means and their decidability in a par-ticular computational model, to proof methods based on analogous combinatorialbackgrounds in all three areas, and finally to formalizability of basic concepts andmethods of complexity theory within bounded arithmetic. It is the main aim of thisbook to explain these relations.

The last several years have seen important developments in areas of complexitytheory, as well as in bounded arithmetic and complexity of propositional logic,and other deep relations between these areas have been established. Althoughthere are several monographs on computational complexity theory and very goodsurvey articles covering the main fields of research, many recent results in boundedarithmetic and propositional logic are scattered in research articles, and someimportant facts, such as relations between various theorems and methods, are onlya part of unpublished folklore or appeared in a longer but now less significant, andhence less read, work.

Preface xiii

To my knowledge there are three published monographs treating, at least par-tially, bounded arithmetic and its relation to complexity theory: Wilkie (1985),Buss (1986), and the last part of Hajek and Pudlak (1993) (Chapter 5, pp. 267-408). Although these are very interesting books, the first two contain none of thedevelopments of the last several years (obviously) and none of the three treatspropositional logic.

This book is not intended to be a textbook of either logic or complexity theory.It merely wants to present the main aspects of contemporary research in boundedarithmetic and complexity of propositional logic in a coherent way and to illustratetopics pointed out at the beginning of this Preface. It is aimed at research mathe-maticians, computer scientists, and graduate students. No previous knowledge ofthe topics is required, but it is expected that the reader is willing to learn whatis needed along the way. My hope is that the book will stimulate more people tocontribute to this fascinating area.

Prague Jan KrajicekJuly 28, 1994

ACKNOWLEDGMENTS

I thank my colleagues Petr Hajek, Pavel Pudlak, JiiI Sgall, Antonin Sochor, andVitezslav Svejdar from our logic seminar at the Mathematical Institute of theAcademy of Sciences at Prague for creating an extremely stimulating and en-couraging research environment. In particular, I have learned a lot from extensivecollaboration with Pavel Pudlak.

I am also indebted to Gaisi Takeuti (Urbana), Sam Buss (San Diego), and PeterClote (Boston), with whom I had the privilege of collaborating on various researchprojects. Gaisi Takeuti never failed to provide an inspiration during my two yearsat the Department of Mathematics of the University of Illinois at Urbana.

I also wish to thank Steve Cook from the Department of Computer Science ofthe University of Toronto, where I wrote a large part of the first version of this bookduring my stay in spring semester 1993, for many inspiring discussions concerningtopics related to this book.

Finally I thank the following people for comments on parts of the manuscript:M. Baaz (Vienna), S. R. Buss (San Diego), M. Chiari (Parma), S. A. Cook (Toronto),F. Pitt (Toronto), P. Pudlak (Prague), A. A. Razborov (Moscow), G. Takeuti (Ur-bana), and D. Zambella (Amsterdam).

xiv

1

Introduction

Ten years ago I had the wonderful opportunity to attend a series of lectures givenby Jeff Paris in Prague on his and Alec Wilkie's work on bounded arithmetic andits relations to complexity theory. Their work produced fundamental informationabout the strength and properties of these weak systems, and they developed avariety of basic methods and extracted inspiring problems.

At that time Pavel Pudlak studied sequential theories and proved interestingresults about the finitistic consistency statements and interpretability (Pudlak1985,1986,1987). A couple of years later Sam Buss's Ph.D. thesis (Buss 1986)came out with an elegant proof-theoretic characterization of the polynomial timecomputations. Then I learned about Cook (1975), predating the above develop-ments and containing fundamental ideas about the relation of weak systems ofarithmetic, propositional logic, and feasible computations. These ideas were de-veloped already in the late 70s by some of his students but unfortunately remained,to a large extent, unavailable to a general audience. New connections and opportu-nities opened up with Miki Ajtai's entrance with powerful combinatorics appliedearlier in Boolean complexity (Ajtai 1988).

The work of these people attracted other researchers and allowed, quite recently,further fundamental results.

It appears to me that with a growing interest in the field a text surveying somebasic knowledge could be helpful. The following is an outline of the book.

Chapter 2 lists notions and results from logic and complexity theory the reader isexpected to have heard about. Chapter 3 overviews basic Boolean complexity andbasic facts about predicates definable by bounded arithmetic formulas. Sketchesof a few proofs are offered there, but mostly I refer the reader to other survey texts.All later chapters contain all necessary proofs.

Chapters 4 and 5 present basic information about the main propositional proofsystems and complexity of proofs issues, and about the main first order systemsof bounded arithmetic and their strength and mutual relations.

1

2 Introduction

Chapters 6 and 7 survey the characterizations of definable functions in thesystems of arithmetic known as witnessing theorems. Chapter 8 treats the secondorder systems of bounded arithmetic using RSUV isomorphism and transfers someresults of the previous three chapters to these systems.

Chapter 9 defines and studies propositional translations of arithmetic formulasand propositional simulations of arithmetic proofs with applications to polynomialsimulation results.

Chapter 10 is devoted to the fundamental problem of whether bounded arith-metic S2 is finitely axiomatizable, the central problem in the area. It surveys allrelevant results known (to me) to date.

Chapter 11 studies direct combinatorial arguments allowing separation of thelowest relativized subsystems of bounded arithmetic.

Chapters 12 and 13 concern the central question of propositional logic, whetherthere is a proof system admitting polynomial size proofs of all tautologies, for Fregeand extended Frege systems and for constant-depth systems. The main results areseveral exponential lower bounds for the constant-depth systems and a certainconceptual framework for the unrestricted system.

Chapter 14 presents finitistic consistency statements and studies the issue ofhard tautologies and optimal proof systems.

The final chapter, Chapter 15, develops some combinatorics and Boolean com-plexity theory within bounded arithmetic and studies several model-theoretic con-structions relevant to all the basic questions studied earlier in the book.

I have made an attempt to present the chosen material as completely and asup to date as possible but I did not try to compile a handbook of the whole field(hence the Bibliography also does not attempt to list the whole literature in thefield). Open problems are occasionally mentioned in the text (see the Index), butI refer the reader to Clote and Krajicek (1993) for a comprehensive annotated listof open problems in the area.

In the main text I give explicit credit only for main ideas and results. Thechapters end with a section of bibliographical and other remarks where completebibliographical information is given, and where I briefly comment on related butnot covered topics.

Finally I want to comment on material that is not covered in the book. Thisincludes, in particular, intuitionistic versions of bounded arithmetic systems, func-tional interpretations of these theories (and the issue of feasible functionals ingeneral), equational theories and machine-independent characterizations of vari-ous computational classes, and modifications of the basic systems relating themto a variety of subclasses of the polynomial time. Some of this material is omittedas I do not feel familiar with it; some is omitted because I think that it - althoughtechnically difficult and innovative - builds on basic ideas already apparent fromearlier results presented in the book.

2

Preliminaries

In this chapter we briefly review the basic notions and facts from logic and com-plexity theory whose knowledge is assumed throughout the book. We shall alwayssketch important arguments, both from logic and from complexity theory, and so adetermined reader can start with only a rough familiarity with the notions surveyedin the next two sections and pick the necessary material along the way.

For those readers who prefer to consult relevant textbooks we recommend thefollowing books: The best introduction to logic are parts of Shoenfield (1967);for elements of structural complexity theory I recommend Balcalzar, Diaz, andGabbarro (1988, 1990); for NP-completeness Garey and Johnson (1979); and fora Boolean complexity theory survey of lower bounds Boppana and Sipser (1990)or the comprehensive monograph Wegener (1987). A more advanced (but self-contained) text on logic of first order arithmetic theories is Hajek and Pudlak(1993).

2.1. LogicWe shall deal with first order and second order theories of arithmetic. The secondorder theories are, in fact, just two-sorted first order theories: One sort are numbers;the other are finite sets. This phrase means that the underlying logic is always thefirst order predicate calculus; in particular, no set-theoretic assumptions are a partof the underlying logic.

From basic theorems we shall use Godel completeness and incompleteness the-orems, Tarski's undefinability of truth, and, in arithmetic, constructions of partialtruth definitions.

A prominent theory is Peano arithmetic (PA), in the language of arithmeticLpA = {0, 1, +, , <, _} axiomatized by Robinson's arithmetic Q

1. a+1 #0

3

4 Preliminaries

2. a+1=b+l--> a=b3. a + 0 = a4. a+(b+1)=(a+b)+15.

6. b)+a7. a=0-3x,x+1=a

see Tarski, Mostowski, and Robinson (1953), and by the induction scheme IND

(0(0,a)AVx(0(x,d)-0(x+I,d)))- VxO(x,d)

for every formula 0(x, d) in the language LpA.We shall use the letters x, y, z.... mostly for bounded variables; the letters

a, b, c, ... will be reserved for free variables (also called parameters). Free vari-ables in axioms are assumed to be universally quantified; for example, the firstaxiom given is equivalent to the formula Vx, x + 10 0.

There are other schemes that can equivalently replace the induction scheme,for example, the least number principle LNP scheme

O(b,d)-3xVy(¢(x,d)A(y<x--.O(y,d))).

The standard model N of PA is the set of natural numbers with the symbolsof LpA interpreted with the usual meaning. A crucial fact about PA is that thereare nonstandard models (models not isomorphic with N) of PA and indeed ofthe theory of N, Th(N). Natural numbers N are isomorphic to a unique initialsubstructure of any nonstandard model M and we shall usually simply assumethatNc M.

A cut in a nonstandard model M is any nonempty I C M satisfying1. a<bAbEI --* a EI,alla,bEM2. aEI --> a + I E I, all a E M.

For example, N is a cut in every nonstandard model. Cuts in nonstandard modelsof PA closed under both addition and multiplication have special prominence asthey are particular models of bounded arithmetic IAO: They satisfy induction forall bounded arithmetic formulas Do, which are formulas in the language LpA withall quantifiers bounded (Section 3.2 is devoted to bounded formulas).

Nonstandard models of PA and even of its proper subtheories are difficult toconstruct; it is a theorem of Tennenbaum (1959) that there are no countable re-cursive nonstandard models of PA (and, indeed, of a weak subtheory IEi with theinduction just for bounded existential formulas, cf. Paris (1984). In particular, theseresults show that every nonstandard countable model of IE 1 has a nonstandard cutthat is a model of whole PA; hence, in a sense, the model theory of bounded arith-metic is as complex as that of PA. Consult Hajek and Pudlak (1993), Kaye (1991),or Smorynski (1984) for the model theory of PA.

2.2 Complexity theory 5

From proof theory we shall use theorems of Gentzen and Herbrand in variousversions. The reader is advised to refer to Takeuti (1975) for Gentzen's sequentcalculus.

We close this section with some remarks on notation. Logical connectives weshall use are the standard -,, v, A, --*, and = with the usual meaning - negation,disjunction, conjunction, implication, and equivalence - and the constants 1, 0 fortruth and falsity.

The symbols c and C are used in the sense of proper inclusion and inclusion.The symbols f(n) = O(g(n)), f(n) = S2(g(n)) and f(n) = 6(g(n)) denote

that eventually f(n) cg(n), f(n) > cg(n), and cjg(n) < f(n) < c2g(n)where c, cl, and c2 are positive constants, and f(n) = o(g(n)) means thatf(n)lg(n) - 0.

2.2. Complexity theoryI assume that the reader is acquainted with such notions as Turing machine, oracleTuring machine, and time and space complexity measures. We adopt the multi-tape version of Turing machines with a read-only input tape and with a finite butarbitrarily large alphabet.

The basic relations between classes of languages Time(f) and Space(f) rec-ognized by a deterministic Turing machine in time (respectively space) boundedby f(n), n the length of the input, and their nondeterministic versions NTime(f )and NSpace(f) are

1. Time(f (n)) C NTime(f (n)) C Space(f (n))2. Space(f (n)) c U, Time(cf("))3. (Hartmanis and Steams 1965) Time(f (n)) = Time(c f (n)) and

Space(f(n)) = Space(c f(n)) whenever n = o(f(n)) and n < f(n)4. (Hartmanis and Steams 1965, Hartmanis, Lewis, and Steams 1965)

Space(f) C Space(g)

and

Time(f) C Time(g log(g))

whenever f = o(g(n)).5. (Savitch 1970)

NSpace(f) C Space(f2)

whenever f (n) is itself computable in space f (n)6. (Szelepcsenyi 1987, Immerman 1988)

NSpace(f) = coNSpace(f)

for f(n) > log(n) and f itself computable in nondeterministic space f (n).

6 Preliminaries

7. (Hopcroft, Paul, and Valiant 1975) For n < f (n)

Time(f(n)) c Space f(n)(lon))

Particular bounds to time or space define the usual complexity classes

LinTime = U Time(cn)C

P = U Time(ne)C

NP = U NTime(n`)C

L = Space(log(n))

PSpace = U Space(n`)C

LinSpace = Space(n)

E = U Time(c")C

EXP = U Time(2n)C

Oracle computations allow one to define hierarchies of languages, the mostimportant of which are the linear time hierarchy LinH of Wrathall (1978)

Eli" = LinTime and E;+i, =

and the polynomial time hierarchy PH of Stockmeyer (1977)

EP = P and E +i = NPE[

The class of complements of languages from class X is denoted coX, and specialclasses of this form coE!'" and coE f are denoted l1'" and 11r, respectively.

The class +i is the class of functions computable by a polynomial-timemachine with access to an oracle from the class Ep.

Some important facts about these classes include the following: f'" C EP (andgenerally more resource in the "same" computational class properly increases theclass; see Zak 1983 for a general diagonalization technique), and LinH contains Land is, in fact, equal to the class of rudimentarypredicates as defined by Smullyan(1961) (cf. Wrathall 1978). It is also known that LinH also equals the class ofpredicates definable by Do-formulas; we shall prove that in Section 3.2. Also notethat either LinH PH or LinH does not collapse (i.e., LinH Ef'" for all i).

2.2 Complexity theory 7

The notion of NP-completeness, Cook's theorem, and the P versus NP problemare central to complexity theory, as well as to the connections with logic, andin Section 3.1 we shall review more basics, in particular some facts from circuitcomplexity.

Many interesting problems and notions arise in connection with counting func-tions. For R(x, y) a binary predicate with the property that for every x there areonly finitely many y's satisfying R(x, y) defines the function

#R(x) := the number of y's such that R(x, y)

Class #P consists of all functions #R(x) with the polynomial time computablerelation R (x, y) and satisfying the preceding finiteness property in a stronger form(cf. Valiant 1979):

R(x, y) -+ lyl < IxIO(l)

An important result of Toda (1989) is that every language in PH is polynomial-timereducible to a function in #P.

Nonuniform versions of the preceding classes are defined with the help ofadvice, functions. Polynomially bounded advice is a function f : N -+ (0, 1)*such that:

I f(n) I =nO(l)

The class P/poly, a nonuniform version of P, is the class of all sets A such thatthere are a set B E P and a polynomially bounded advice function f for which itholds

xEAiff(x,f(IxI))EBThe classes NP/poly, L/poly, and so on, are defined analogously (see the paragraphafter Theorem 3.1.4).

3

Basic complexity theory

We shall survey the basic notions and results of Boolean complexity (Section 3.1)and bounded formulas (Section 3.2) in this chapter. Most of the results in thefirst section are stated without a proof; some proofs appear later (Chapter 15, inparticular) formalized in bounded arithmetic. In the second section, most proofsare at least sketched.

3.1. The P versus NP problemThe central problem in complexity theory, and a major problem of contemporarylogic and mathematics, is whether the class P equals the class NP, the famous Pversus NP problem (Cook 1971). By Cook's theorem the problem is equivalentto asking whether there is a polynomial time deterministic algorithm recognizingthe set of satisfiable propositional formulas, or equivalently, such an algorithmrecognizing the set of propositional tautologies.

One approach to this problem is via investigating the circuit-complexity ofBoolean functions. Some interesting, although only preliminary, results were ob-tained in Boolean complexity.

Definition 3.1.1. A Boolean function with n inputs and m outputs is a function

f: {0,1}"H {0,1}'".

Examples of Boolean functions are obtained from any language Z C {0, 1)*:For any n, define the Boolean function Z" : {0, 1)" i-+ {0, 1) to be the character-istic function of Z n {0, 1)".

On the other hand, a sequence of Boolean functions

f": {0,1}"H 10, 1), n=0,_1,...

8

3.1 The P versus NP problem 9

defines a language

U{w E {0, 1}" I fn(w) = 1}n

Definition 3.1.2.(a) A Boolean connective is a Boolean function with one output. A basis is a

finite set of connectives.(b) A Boolean circuit with input variables xl, ... , x,; output variables y], ... ,

yn; and basis ofconnectives 0 = {gI , ... , gk} is a labeled acyclic directedgraph whose out-degree 0 nodes are labeled by yj 's, in-degree 0 nodes arelabeled by x, 's or by constants from S2, and whose in-degree f > I nodesare labeled by , functions, from n of arity .£.

(c) A Boolean formula is a Boolean circuit in which every node has the out-degree at most 1.

We shall mostly consider the de Morgan basis S2 = {0, 1, v, n).A Boolean circuit with input variables xi, ... , x" naturally computes a Boolean

function with domain 10, 1)": given input E E 10, 11" evaluate consecutively thenodes of the circuit by 0, 1 where a node gets the value computed by the connectivelabeling the node from the values at the incoming nodes. The requirement that acircuit is acyclic guarantees that this can be done consistently (and uniquely).

Definition 3.1.3.(a) The size of a circuit is the number of its nodes.(b) The depth of a circuit is the maximum length of a directed path in the

circuit.(c) For a Boolean. function f, Cc (f) denotes the minimal size of a circuit

with basis 0 computing f, DepthQ (f) denotes the minimal depth of acircuit with basis Q computing f, and L0(f) denotes the minimal size ofa formula with basis Q computing f.When S2 is the de Morgan basis then the index S2 is usually omitted.

The following theorem is the stimulus for investigating the circuit complexityof Boolean functions.

Theorem 3.1.4. Let Z C {0, 1)* be a polynomial-time recognizable languageZ e P. Then there exist a polynomial p(x) and a sequence {Cn }n of circuits inde Morgan basis with one output such that for all n

1. Z" is computed by C"2. the size of Cn is at most p(n).

In other words, C(Zn) < p(n), for all n.

Note that the languages Z with polynomially bounded C(Zn) are exactly thosefrom the class P/poly: circuits Cn can act as advice for inputs of length n (as

10 Basic complexity theory

the evaluation of a circuit can be performed by a polynomial time algorithm),and, for each n, an algorithm computing whether (x, f (IX I)) E B (see the end ofSection 2.2) can be turned into a polynomial size circuit.

Corollary3.1.5. Assume that, for some Z E NP the function C(Z") is not boundedby any polynomial in n.

Then P # NP.

Hence the following problem is a fundamental one.

Fundamental problem. Is there a language Z E NP with superpolynomial circuit-size complexity?

The next theorem shows that even the unexpected negative answer has importantcorollaries.

Theorem 3.1.6 (Karp and Lipton 1982). Assume that every NP language can becomputed by a, family of polynomial size circuits; that is, NP C P/poly.

Then the polynomial time hierarchy PH collapses to its second level

PH = E2 = fl',

Theorem 3.1.7 (Shannon 1949, Muller 1956). For every n there are Booleanfunctions with n inputs and one output having the circuit complexity Q (2n-10ge)

Proof There are 22n Boolean functions with n unknowns and one output, whereasthere are at most

(n + m)O(m)

circuits of size < m, which is less than 22" form < (2"/c n) and c a sufficientlylarge constant. Q.E.D.

Next we give an account of the method of approximations of Razborov (1989),following to some extent an exposition of Karchmer (1993) but stressing the ultra-product interpretation of the construction. We include this material here becauseit is the only framework that can be, at least in principle, applied to unrestrictedcircuits.

Let f : {0, 11" -o- {0, 11 be a Boolean function of n inputs and one output, andU := f-'(0).

Let C be a circuit in n inputs and of size m. We shall denote the nodes of Cby zi , ... , zn where the first n nodes are labeled by xI , ... , xn and the last oneis the output node y. We identify node zi with the Boolean function of n inputscomputed by the subcircuit ending in zi and we shall occasionally write xi insteadof zi if i < n and y if i = m.

The idea is to take the set of all computations of C on inputs u E U andproduce by "ultraproduct" a new computation on some w ¢ U. As all original

computations are rejecting (assuming that C computes f), the new computationwill also be rejecting. Hence w U will mean that C cannot compute f correctlyon all inputs.

Let B be the Boolean algebra of subsets of U. For any g : {0, 1 }" -> (0, 11 put

I lgl l := {u E Ul g(u) = 11

If F C B were an ultrafilter then we would define a new computation bylabeling node zi by I if I Izi I

IE F , and by 0 otherwise. This would produce a

correct rejecting computation, but the particular input defined by the choice of Fwould be from U as all ultrafilters on a finite set are principal. Hence we have torelax the notion of ultrafilter if anything nontrivial should be achieved.

A subset F C B is closed upward if a E F and a c b implies b E F. Fpreserves the pair (a, b) if

aEFAbEF -+ albEFWF is the set of all w E 10, 11" satisfying

IIxiIIEFandw;=0-III-xiIIEF

If exactly one of each I I xi I I and I I -xi I Iis in F, then W F consists of one 0-1

vector denoted WF.Let p (f) be the minimal t such that there exist t pairs (ai, bi )i < t of elements

of B having the property that if F C_ B is closed upward, 0 V F and F preservesall pairs (ai, bi )i<, then WF C U.

Theorem 3.1.8 (Razborov 1989).

1 P(.f) < C(f) = O(p3(f)) + Q(n3)

Proof. We first prove the easier of the two inequalities:2

p(f) < C(f ). Let Cbe an optimal circuit for f and assume all - are at the bottom level (we mayassume that by de Morgan rules possibly increasing the size of the circuit twice).Let (ci, di)i<t be all pairs of nodes of C such that ci and di fan into a commonnode labeled by A. That is, if (gi, hi) is a pair of Boolean functions computed atnodes ci, di then the conjunction gi A hi is computed at some node of C too (and(gi, hi)i<t exhaust all such pairs).

Set ai := I Igi I I and bi := I I hi I I . Assume that F C B is closed upward and thatit preserves all pairs (ai, bi)1<t.

Claim 1. Let g be a function computed by a subcircuit of C and let w E WE. Ifg(w) = I then Ilgll E F.

The claim is readily established by induction on the size of the subcircuit com-puting g: It holds for xi and -xi by the definition of WF, it holds for subcircuits

with top gate v as F is closed upward, and it holds for circuits with top gate A asF preserves all (ai, bi)1<t: that is, all possible conjunctions in C.

Now assume that for some w E WF, W U: that is, f (w) = 1. By the claimthen I I f I I E F. But I I .f I I = 0: that is, 0 E F. This proves the first inequality.

For the second inequality assume that we have (a1, bi )i < t such that for anyF C B closed upward such that 0 0 F and that preserves all (a1, bi)i<t, we haveWFCU.

Claim 2. For any w E {0, 1}n, f(w) = 1 iff0 E F, where F,,, is the minimalsubset of B that is closed upward, preserves all pairs (a1, bi)i<t, and satisfiesI I xt I I E F,(resp. I I -xi I I E F,,)) for wi = 1 (resp. wi = 0).

To see the claim assume first f (w) = 1, from which WFu, ¢ U follows, as bythe definition of F,,,, W E WFu,. Hence 0 E F,,,.

Now assume f(w) = 0: that is, w E U. Take F = {X C UIw E X}. ThenFw C F, F is closed upward and preserves all (a1, bi)i<t, but 0 F. Hence also0 F.

We are ready to construct a circuit computing the function f based on an ideathat for w E {0, I )n the circuit will try to prove that 0 E F,,,.

Let

A = {al, b1, a,nbl, ..., at, bt,atnbt}U{Ilxi II, II-xi II, , Ilx, II, II,xnII)U{0}

The size of A is at most 3t + 2n + 1.For any a e A and k < 3t +2n + 1 consider a function vQ inductively introduced

by

0_ J 1 ifa=IIxrIIAwi=1, or ifa=II=xiIlAwi=0Va

0 otherwise

and

k+I := V k V k kva / Ub V / (Uai A Ubi

bca,bEA jEJa

where Ja={j <tIa.nbj=a}.Obviously, if for some r

V k kvllXill A vll,x,ll

i<n

VaEA; v=va+1

then for all s > r and all a e A : vq = vs and A n F,, = {a I vu = 1). Hence0 E F,,, if vy = I for k = 3t + 2n + 1, and the definition of vk constitutes adefinition of a circuit of size 0((3t + 2n + 1)3) = 0(t3) + 0(n3) computingwhether 0 E Fw, that is, computing f (w). Q.E.D.

In principle thus, one can establish a lower bound t to C(f) by showing thatfor each set of 2t pairs (ai, b,)i<t of subsets of U there is a nontrivial F closedupward and preserving all pairs (a,, b,) i < t such that WF ¢ U.

The method has been successfully applied in the monotone case (i.e., for mono-tone functions and circuits in the basis {0, 1, v, A}); (cf. Razborov 1985). It seemsthat that was possible as the condition posed on a vector to be in WF is in themonotone case relaxed to one implication, wi = 1 ->. I Ixi I I E F, only.

The following function has (2) variables xi/. Denote by G(xi j) the undirectedgraph with vertices V = 11, ... , n) and edges

E={{i,j}Ixi.i=1}

A clique in a graph is a complete subgraph.The clique function

CLIQUEn k(x1j) =1 if G (xij) has a clique of size at least k

0 otherwise

Let C+(f) denote the minimal size of a circuit in the basis {0, 1, v, A} com-puting a monotone function f. C+(f) is sometimes also denoted Cm (f ).

Theorem 3.1.9 (Razborov 1985, Alon and Boppana 1987). Fork < n1/a:

C+(CLIQUE,.k) = nQ(f)

Now we turn our attention to another restricted model of circuits: constant-depth circuits. To obtain a nontrivial model of computation we have to allow Vand A of unbounded arity. Note that any Boolean function can then be computedby a depth 2 circuit: Take its disjunctive or conjunctive normal form.

It is easy to see that any depth d size m circuit can be rewritten as depth d size< and formula; hence there is no essential difference in studying constant-depthcircuits versus formulas.

There are three simple Boolean functions of particular importance.

The parity function

i if Ei xi is oddVX1, X")

0 otherwise

The majority function

1 if (n/2) < >i xiMAJ(xi, ... , xn) _

0 otherwise

The Sipser function of depth d

Sd(xi,...id) =V A... xii...idil <ni2<n

where i j range over { 1, ... , n} and the d connectives V, A alternate.

Note that the disjunctive normal form of all these functions has exponential size(in nrt'i).

Theorem 3.1.10 (Yao 1985, Hastad 1989). For any d > 2, any depth d circuitcomputing the parity, function

®(xl,...,xn)

must have size at least

The proof of the optimal lower bound utilizes the method of random restrictionsand the so-called Hastad's switching lemma (there are two, in fact); the idea ofthe structure of the lower bound proof goes back to Ajtai (1983); Furst, Saxe, andSipser (1984); and Yao (1985).

A restriction of a circuit is a partial evaluation of its inputs

p:(xI,...,xn}H{0,1,*}

where the value p(x) = * is a convenient abbreviation of "p(x) is undefined."The restricted circuit then computes a function of the inputs that received * by p.

The idea of the lower bound proofs by the method of restrictions is that ifthe circuit is small then there will be a restriction leaving some input variablesunevaluated, such that the restricted circuit will compute a constant function. If acircuit computes the parity function, each restriction computes either the parity orits negation and specifically cannot be a constant function. Hence there cannot besmall constant-depth circuits computing the parity function.

We shall now only state the Hastad's switching lemma and leave a proof of itsvariant to Chapter 15 (Lemma 15.2.2), where we shall formalize its proof withinbounded arithmetic.

Lemma 3.1.11 (Hastad 1989). Let 0 < p < 1. Construct a restriction p by thefollowing random process: for any xi define p (xi) independently

* with probability p

p (xi) = 1 with probability (1 - p)12

0 with probability (1 - p) /2

Assume that a circuit C is a depth 2 circuit that is a disjunction of conjunctions ofliterals, with each conjunction of arity at most t.

Then with a probability of at least

1 - (5pt)s

the function computed by the circuit restricted by a random restriction can also be


computed by a depth 2 circuit that is a conjunction of disjunctions of literals, witheach disjunction of arity at most s.

In applications t, s are of the form ns and p is of the form nE-1. In that case thebound from the lemma is very close to 1 : 1 - 2' . Hence it is highly probablethat a disjunction of small conjunctions can be "switched" into a conjunction ofsmall disjunctions.

That this really simplifies the function is clarified by a simple lemma. First weneed a definition.

Definition 3.1.12.(a) A branching program is a directed acyclic graph with one in-degree 0 node

(the source); with all nodes of out-degree either 2 (the inner nodes) or 0(the leaves); the inner nodes labeled by variables with the two outgoingedges labeled by 0, 1, respectively; and the leaves labeled by elements ofa set Y.

(b) A decision tree is a branching program that is a tree with the edges directedfrom the root toward the leaves.

Any evaluation a of variables determines a path P(a) through the program orthe tree: The path uses the edge labeled 1 from a node labeled by x, if and only ifa(xi) = 1.

We say that a branching program (a decision tree) computes a functionAx I ---,

iffor every evaluation a

f (a(xl ), ... , y iffpath P(a) ends with a leaf labeled by y

The size of a branching program (a decision tree) is the number of nodes.The height of a decision tree is the maximum length of a path through it.

Most often the set Y is just the set 10, 1), in which case the branching program(decision tree) computes a Boolean function.

For the first part of the next lemma note that any conjunction in a disjunction (ofconjunctions) computing f has to have a literal in common with every disjunctionin a conjunction (of disjunctions) computing f.

Lemma 3.1.13. Assume that a function f (xl , ... , is computed by a depth 2circuit that is a disjunction of conjunctions of arity at most s, and at the same timealso by a depth 2 circuit that is a conjunction of disjunctions of arity at most s.

Then f can be computed by a decision tree of height at most s2.On the other hand, if a function is computed by a decision tree of height t then

it can be computed by a depth 2 circuit that is a conjunction of disjunctions ofarityat most t and also by a depth 2 circuit that is a disjunction of conjunctions of arityat most t.

Modular counting functions are defined similarly to the parity function.Modular counting function

MODp (x 1, ... , xn) --I

I if Yi xi is not divisible by p

0 otherwise

Theorem 3.1.14 (Razborov 1987, Smolensky 1987). Let p, q be different primesand let d > 2 be arbitrary. Then any depth d circuit in the basis {0, 1, v, A,MODq } computing the function MOD p(x 1, ..., x") must have size at least

2Q("(11(2d)))

In particular, any such circuit computing the majority function

MAJ(x1,...,x")

must have size at least2Q ("(1/(2d+))))

The bound to the majority function follows from the first part as it is easyto see that all MODp functions have depth 2 circuits of size O(n) in the basis10, 1, -, V, A, MAJ}.

We shall now turn our attention from the size of circuits to the depth of circuits.We consider circuits in the de Morgan basis again with the binary v and A. Thereare two simple but important facts. The first is about the relation of the circuitdepth to the size of formulas. The idea for the proof of this theorem is used inseveral proofs in later chapters.

Theorem 3.1.1 5 (Spira 1971). For any Boolean function f

Depth(f) = O(logL(f)) and L(f)<2 I+Depth(f)

In particular, the notions "polynomial size formulas" and " 0 (log n)-depthcircuits " are equivalent concepts.

The second fact concerns the important notion of communication complexity.Consider a game played by two players A and B; player A receives a E to, 1)"and player B receives b E (0, 1 )" such that f (ii) f (T). Their task is to find isuch that ai bi. They send each other bits of information and the game endswhen the players agree on an answer.

The communication complexity of function f(Y) is the minimal number ofbits they need to exchange in the worst case before the game ends. We shalldenote it CC(f). The following is an important characterization of the circuit-depth measure. The theorem is proved by a straightforward induction.

3.2 Bounded arithmetic formulas 17

Theorem 3.1.16 (Karchmer and Wigderson 1988). For any Boolean_function f

Depth(f) = CC(f)

As little is known about the size of formulas as about the circuit-size.

Theorem 3.1.17 (Andreev 1987, Hastad 1993). There is a polynomial-time lan-guage Z such that

L Z > n3-o(I)

The language Z from the theorem is a rather artificial one. Earlier Chrapchenko(1971) showed

L((D(xi, ..., n2

3.2. Bounded arithmetic formulasWe shall consider several languages of arithmetic as underlying languages forvarious systems of bounded arithmetic, but there are two basic ones: the languageof Peano arithmetic LPA defined in Section 2.1, and the language of the theory S2,denoted simply L, which extends the language LPA by three new function symbols

L2J lxI x#y

The intended values of IxI and x#y are Ilog2(x + 1)1 for x > 0 and 101 = 0, and211 Hyl, respectively. Note that Ix I is the length of the binary representation of x, ifx>0.

We shall consider the class of bounded formulas in the language L PA first. Theywere first defined by Smullyan (1961), who called sets defined by such formulasconstructive arithmetic sets.

Definition 3.2.1.1. Eo = Uo is the class of quantifier free formulas.2. Class Ei+i is the class of formulas logically equivalent (i.e., in the predicate

calculus) to a formula of the form

3xi <ti(a)...3xk <tk(a)o(a,x)

with the formula 0 E U; and t, (a) s terms of the language L PA3. U;+1 is the class of formulas logically equivalent to a. formula of the form

`dxI <tJ(a)...Yxk <tk(a)Q(a,7)

with the formula 0 E E1.

4. Class AO of bounded arithmetic formulas is the union of classes E; and U;

Ao=UE,=UU,

Note that both Ei and U; are contained in both E;+1 and U,+1 I.For M a structure for language LPA symbols E;(Mt), U;(Mt), and Ao(Mt),

respectively, denote the classes of subsets of Mt definable by the E,, U, and Doformulas, respectively (we shall usually omit the superscript f when it is obvi-ous from the context). Already the class El (M) can be quite nontrivial from thecomplexity-theoretic point of view, as according to Adleman and Manders (1977)the class El (N) contains an NP-complete set

{(a, b, c) l 3x < c3y < c, axe + by = c}

There are several important characterizations of the class io(N). We start withthe notion of rudimentary sets introduced by Smullyan (1961).

The intended structure for the language of rudimentary sets is the set of wordsover {0, 1) or, via dyadic coding, the set of natural numbers.

The language of rudimentary sets consists of1. A: the empty word,2. the concatenation,3. 0, 1: constants,

and two special kind of quantifiers4. 3x c_ p y and Vx c_P y: the part-of quantifiers,

5. 3Ix l IYI and Vlx l < l yl: the length-bounded quantifiers.The meaning of x c p y is that the word x is a part of the word y

3z1, Z2; Z1 xz2 = y

and the meaning of lx I < Iyl is obvious: the length of x is at most the length of y.

Definition 3.2.2 (Smullyan 1961).1. The class of rudimentary sets RUD is the class of subsets of Nt definable

in the language of rudimentary sets with all quantifiers either part-of orlength-bounded.

2. The class of strictly rudimentary sets SRUD is the class of subsets of Ntdefinable in the language of rudimentary sets with all quantifiers of thepart-of type.

3. The class of positive rudimentary sets RUD+ is the class of subsets of Ntdefinable in the language of rudimentary sets with all quantifiers are eitherpart-of or length-bounded, and in which all quantifiers 3lx I < IYI appearpositively and all quantifiers VIxI < IyI appear negatively.

4. The class of strongly rudimentary sets strRUD is the class of sets that arepositive rudimentary and whose complements are also positive rudimen-tary.

Note that terms are allowed to appear in the quantifiers.5. A function f : Nt i--> N is rudimentary if its graph is a rudimentary set

and the function is majorized by a polynomial.

Theorem 3.2.3 (Bennett 1962).

RUD = Ao(N)

Proof (sketch). Clearly there are only two claims to be established:

Claim 1. The graphs of addition and the multiplication are in RUD.

Claim 2. The graph of the operation of concatenation is in Ao(N).

The idea of the proof of Claim 1 is in Bennett's lemma saying that any functiondefined by bounded recursion on notation is rudimentary (see Lemma 3.2.4). Weshall see a bit stronger argument of the same type in Theorem 3.2.8.

For Claim 2 note that

x - y = z if 3w < z, y < w A x w + y = z A " w is a power of two"

where the last condition is expressed by

Vl <u, v <w3t < u, u v = w --> 2 t = u

Q.E.D.

Lemma 3.2.4 (Bennett 1962). Assume that a junction f is defined from two rudi-mentaryfunctions g and h by bounded recursion on the notation

1. f(0j)=h(T)2. f(xj E {O, 1}"

and satisfies the condition3. If (x,Y)I <0( A,Ix,I+EjlyjI

Then the function f is rudimentary too.

Theorem 3.2.5 (Wrathall 1978).

LinH = RUD

Proof (sketch). Using the natural coding of computations of machines by 0-1strings one verifies that Eo" c_ RUD, from which LinH CRUD follows immedi-ately.

The opposite inclusion is obvious. Q.E.D.

The possibility of coding in Do(N) merits further discussion. We shall nowmention two results and return to this topic again in Section 5.4.

Theorem 3.2.6 (Bennett 1962). The graph of exponentiation

{(x, y, z) I xl' = z}

is rudimentary.

Theorem 3.2.7 (Wrathall 1978). All context free languages are rudimentary andhence in DO(N).

The last theorem finds a root in an important theorem of Nepomnjascij (1970),generalizing Lemma 3.2.4.

The term TimeSpace(f(n), g(n)) denotes the class of languages recognized bya Turing machine working simultaneously in time f (n) and space g(n).

Theorem 3.2.8 (Nepomnjascij 1970). Let c > 0 and 1 > e > 0 be two constants.Then

TimeSpace(n`, nE) c AO(N)

Proof (sketch). We shall give an idea of the proof. By induction on k prove that

TimeSpace(nk'(1-E), nE) 9 Do(N)

If k = 1 then the sequence consisting of the instantaneous descriptions of aTimeSpace(nk-(I -E), nE) computation has size O(n), and hence its code is boundedby a polynomial in input x, Ix I = n.

Assume we have

TimeSpace(nk'(I-E), nE) 9 Ao(N)

and let

L E nE)

Then x e L if and only if there exists a sequence w = (wo, ..., w,-) such thatwo = x, each wi+i is an instantaneous description obtained from the instantaneousdescription wi by a TimeSpace(nk-(I -E), nE) computation, w, is a halting acceptingposition, and r < n 1 -5.

The length of any such w is again O(n) and the conditions defining it are Do-definable by the induction assumption. Q.E.D.

Corollary 3.2.9.

LcOo(N)

The main problem about Do(N) is whether the hierarchy collapses, which isthe same as whether LinH collapses: that is, whether

Do(N) = Ei(N)

for some i.The only partial result is the following weak hierarchy theorem of Wilkie and

Woods.

Theorem 3.2.10 (Wilkie 1980, Woods 1986). Denote by Vk(N) the class ofsub-sets of N definable by a to formula 0 (x) with at most k quantifiers bounded by

X.

Then for all k

Vk (N) C Vk+I (N)

The rest of this section is devoted to bounded formulas in the language L.

Definition 3.2.11 (Buss 1986).1. The class Eo = TIo of sharply bounded formulas consists of formulas in

which all quantifiers have the form

3x < Itl or Vx < Itl

That is, the quantifiers are bounded by the length of a term.2. For 0 < i the classes E+1 and II+I are the smallest classes satisfying

(a) EbU17hCE+nn+(b) both E<+I and II +I are closed under sharply bounded quantifica-

tion, disjunction v, and conjunction A

(c) EI+ is closed under bounded existential quantification

(d) IIb+ is closed under bounded universal quantification

(e) the negation ofa Ei+I formula is IZb+I , and the negation ofa Ilb+Iformula is E+I

3. The class Ebb of bounded L -formulas is the union U, Eb = U, nb

4. A Eb formula is Ob (respectively Ob in a theory T) wit is equivalent to aIIb formula in predicate logic (respectively in T).

In words: The complexity of bounded formulas in language L is defined bycounting the number of alternations of bounded quantifiers, ignoring the sharplybounded ones, analogously to the definition of levels of the arithmetical hierarchywhere one counts the number of alternations of quantifiers, ignoring the boundedones.

Theorem 3.2.12. The subsets of N defined by E00 formulas are exactly the setsfrom the polynomial-time hierarchy PH.

In fact, for i > 1 the Eb formulas exactly define the EP predicates.

Proof (sketch). The only difference from Lemma 3.2.4 and Theorem 3.2.5 isthat now we need to code computations of length no(i), n = Ix I. If iyi < no(')then y < x# ... #x, which is a term of L; hence such y's can appear in boundedquantifiers. Q.E.D.

We should note that Bennett (1962) also considered a class of the extendedrudimentary sets, which are defined similarly to the rudimentary sets except thatthe language is augmented by a function of the growth rate of the function #. It isthen a straightforward extension of Theorem 3.2.5 that the extended rudimentarysets are exactly those from the polynomial time hierarchy PH.

3.3. Bibliographical and other remarksFor the history of results and ideas from Section 3.1 the reader should consultBoppana and Sipser (1990), Wegener (1987), and Sipser (1992). Important top-ics omitted are NP-completeness, for which Garey and Johnson (1979) is a goodsource, and the completeness results for other classes, in particular, the complete-ness of directed st-connectivity for class NL and the completeness of undirectedst-connectivity for class L/poly (Aleliunas et al. 1979). Karchmer and Wigderson(1988) and Raz and Wigderson (1990) study the depth of monotone circuits forconnectivity and matching.

Other interesting facts, but not used later in the book, concern branching pro-grams: Barrington (1989) characterized Boolean functions with polynomial sizeformulas as those computed by width 5, polynomial-size branching programs, anda relation of space bounded Turing computations to size of branching programs:L/poly = BP (cf. Wegener 1987).

Very important but unfortunately unpublished is Bennett's Ph.D. thesis (Ben-nett 1962), containing either explicitly or implicitly most later definability resultssuch as Cobham (1965) and Nepomnjascij (1970). Paris and Wilkie (1981b) studyrudimentary sets explicitly.

4

Basic propositional logic

This chapter will present basic propositional calculus. By that I mean propertiesof propositional calculus established by direct combinatorial arguments as distin-guished from high level arguments involving concepts (or motivations) from otherparts of logic (bounded arithmetic) and complexity theory.

Examples of the former are various simulation results or the lower bound forresolution from Haken (1985). Examples of the latter are the simulation of theFrege system with substitution by the extended Frege system (Lemma 4.5.5 andCorollary 9.3.19), or the construction of the provably hardest tautologies from thefinitistic consistency statements (Section 14.2).

We shall define basic propositional proof systems: resolution R, extended res-olution ER, Frege system F, extended Frege system EF, Frege system with thesubstitution rule SF, quantified propositional calculus G, and Gentzen's sequentcalculus LK. We begin with the general concept of a propositional proof system.

4.1. Propositional proof systemsA property of the usual textbook calculus is that it can be checked in deterministicpolynomial time whether a string of symbols is a proof in the system or not. Thisis generalized into the following basic definition of Cook and Reckhow (1979).

Definition 4.1.1. Let TAUT be the set ofpropositional tautologies in the languagewith propositional connectives: constants 0 (FALSE) and 1 (TRUE), - (negation),v (disjunction), and n (conjunction), and atoms pl, p2, ... .

A propositional proof system is a polynomial time function P whose range isthe set TAUT.

For a tautology r E TAUT, any string w such that P(w) = r is called a P-proofof r.

23

24 Basic propositional logic

The size of a string w is the total number of occurrences of symbols in it, it isdenoted I w I.

Note that standard calculi, like those mentioned earlier, are all covered by thisdefinition as one can define a function P

P(w) := J r w is a proof oft in the calculus

1 w is not a proof in the calculus

This function is polynomial-time, the P-proofs oft are precisely the originalproofs (except when r = 1), and the range of P is the whole set TAUT as all thesecalculi are complete (resolution and extended resolution after a suitable encodingof all formulas by formulas in the disjunctive normal form).

The following problem is one of the most fundamental open problems of logic.

Fundamental problem. Is there a propositional proof system P in which everytautology has a polynomial size proof? That is, are there a proof system P and apolynomial p(x) such that any tautology r has a P -proof of size at most p(I r I)?

Theorem 4.1.2 (Cook and Reckhow 1979). There exists a propositional proofsystem in which every tautology has a polynomial size proof if and only if theclass NP is closed under complementation: NP = coNP.

Proof The "only if" part follows as 3w(IwI < p(I r I); P(w) = r defines a non-deterministic acceptor for the coNP-complete set TAUT whenever all tautologieshave size < p(I r I) P-proofs.

For the "if part" let M be a nondeterministic polynomial time acceptor forTAUT and define

P(w):=r w is an accepting computation of M on r

1 otherwiseQ.E.D.

Not surprisingly, then, there are no lower bounds known for the size of proofsin a general propositional proof system. The current research activity concentratesrather on proving lower bounds for particular natural propositional calculi or oncomparing the efficiency of various systems. For the latter task the followingdefinition is basic.

Definition 4.1.3. Let P and Q be two propositional proof systems.(a) Let g : N -+ N be a function. We say that a system P has a speed-up g

over a system Q ifs`

P(w) = r - 3wi, Iwil < g(IwI) A Q(wi) = r

holds, for all w and r.

4.2 Resolution 25

(b)

We say that P has a polynomial speed-up over Q wit has an nk speed-up(k a constant), and we say that it has a superpolynomial speed-up (resp.an exponential) iff it has no nk speed-up for all k (resp. if it has no 2"Espeed-up, some E > 0).We denote by P < Q the fact that P has a polynomial speed-up over Q.A polynomial simulation of P by Q (or briefly a p-simulation) is a poly-nomial time function f (w, r) such that for all w and r

P(w) = r te Q(.f(w, r)) = r

We write P <,, Q if there is a polynomial simulation of P by Q.

An immediate corollary of the definition is that both < and <p are quasi-orderings of propositional proof systems, and that <P is a finer quasi-orderingthan <. Note that if tautologies have polynomial size P-proofs then they havepolynomial size proofs in every system <-greater than P. A system P admittingpolynomial size proofs for all tautologies would be, in particular, a maximal el-ement in the quasi-ordering <. It is unknown, however, whether there are anymaximal elements in this quasi-order (cf. Krajicek and Pudlak 1989a).

4.2. ResolutionThe resolution system was introduced by Blake (1937) and developed in Davisand Putnam (1960) and Robinson (1965). The system operates with atoms p, ,P2, ... and their negations -PI, -P2, ..., but has no other logical connectives.The basic object is a clause, a finite (possibly empty) set of literals, that is, atomsor negated atoms. We think of a clause as of a disjunction of the literals in it. Atruth assignment

a:(PI,p2,...}-{0, 11

satisfies a clause C if and only if it satisfies at least one literal in C. This will bedenoted by a C. It follows that no assignment satisfies the empty clause.

The resolution rule allows us to derive new clause C, U C2 from two clausesC, U (p) and C2 U {-'p}

C, U{p} C2U{-'p}C, UC2

There are no restrictions on occurrences of literals p, -p in C, or C2. An obviousproperty of the resolution rule is that if a truth assignment satisfies both upperclauses of the rule then it also satisfies the lower clause. This is the soundness ofthe resolution rule.

A resolution refutation of a set C = {C1, ..., Ck) of clauses is a sequence ofclauses D1, ..., D, such that each D, is either an element of C or derived by the

resolution rule from some earlier D, D, u, v < i, and such that the last clauseDt is the empty clause.

Although the system does not allow one to work with propositional formulasdirectly, there is an indirect way called limited extension similar to reducing thegeneral satisfiability problem to the satisfiability of sets of clauses. Let 6 be aformula in atoms pl, ... , pn. Introduce for each subformula 0 of 0, including 6itself, a new atom q0 and a set Ext(6) of all clauses of the form

1. {q,, -pi), {-.q0, p; }, if 0 is atom p;2. {q0, q*}, {-.q0, -q*) if 0 = -. f3. {'qo, q*,, q*2 }, (q0, -q*,), (q0, -q*2 } if o = *1 v *24. {-q0, q*, }, {-q0, q*2}, {q0, -'q*1, -q*2) ifO = *t A 7/r2

Clearly the set Ext(6) is satisfiable if and only if the formula 6 is satisfiable.

Theorem 4.2.1. A set of clauses is unsatisfiable: that is, there is no truth as-signment satisfying simultaneously all clauses in the set, if and only if there is aresolution refutation of the set.

Proof Any truth assignment satisfying all clauses in set C would have to satisfy,by the soundness of the resolution rule, all clauses in a resolution refutation of C.In particular, the empty clause would also be satisfied, and that is impossible. Thisproves the "if part" of the theorem.

Assume now that C is unsatisfiable and let P 1 . . . . . p, be all atoms such thatonly the literals pl, -pi,. p,, , -p,, appear in C. We shall prove by inductionon n that for any such C there is a resolution refutation of C.

Decompose C into four disjoint sets Coo U Co, U Cio U C11, of those clauseswhich contain no pn and no -,pn, no p,,, but do contain ,p,,, do contain pn butnot p, and contain both p, , -p,, respectively.

Let Coi x CIO be the set of clauses obtained by the resolution rule applied toall pairs of clauses C1 U {'p,} from Col and to C2 U {p,} from CIO; these newclauses do not contain either pn or -'p,. Form a new set of clauses

C' = Coo U (Col X CIO)

The set C' is unsatisfiable too. This is because any assignment a' : {Pi , ... , pn }

{0, 1) satisfies either all clauses Ci such that C1 U {-pn } E COI, or all clauses C2such that C2 U {pn } E CIO.

Apply the same procedure to atoms pn_l, pn_2,... to produce sets C", C"', .. .which are all unsatisfiable and derivable by the resolution rule from the original setC. The nth such set will contain just the empty clause: that is, we get a resolutionrefutation of set C. Q.E.D.

As a corollary of the completeness proof we get a simple upper bound to thenumber of clauses appearing in a resolution refutation of an unsatisfiable set of kclauses.

4.2 Resolution 27

Corollary 4.2.2. Let C be an unsatisfiable set of k clauses formed from literalsbuilt from n atoms. Then for every 0 < t < n there is a resolution refutation of Cwith at most

t k 2s4" t+E

4So

clauses.

Proof. By the completeness of R any unsatisfiable set of clauses formed from fatoms or their negations has a resolution refutation with < 4e clauses as 4e is thenumber of all such clauses.

In the proof of Theorem 4.2.1 we start with k clauses in n atoms; after the firststep we create an unsatisfiable set of at most (k2/4) new clauses in n - 1 atoms,and after t steps we have k + (k2/4) +. .. (k2` /4t) clauses with the last set being anunsatisfiable set in n - t atoms, that is, with a refutation with at most 4"-t clauses.

Q.E.D.

We shall see later that this upper bound cannot be much improved.The following search problem is associated with an unsatisfiable set C =

{C1, ... , Ch}: given a truth assignment a find C, E C false under a. This searchproblem can be solved by a branching program (cf. Definition 3.1.12), with leaveslabeled by the clauses from C. A branching program solves the search problem ifand only if for every truth assignment a, the leaf of the path P(a) determined bya is labeled by a clause false under a.

Theorem 4.2.3. Let t be the minimal number of clauses in a regular resolutionrefutation of C, where "regular" means that on every path through the refutationevery atom is resolved at most once.

Lets be the minimal number of nodes in a read-once branching program solvingthe search problem associated with C, where "read-once "means that on everypaththrough the branching program every atom occurs at most once as a label of anode.

Then

t = s

Proof. Let n be a regular resolution refutation of C. Construct a branching programas follows: The last clause in jr (the empty one) becomes the root, the initialclauses from C become the leaves, and all other clauses in r become out-degree 2nodes with the two edges directed from a clause to the two clauses that form thehypotheses of the inference giving the clause. If the resolved atom in that inferenceis p, then the edge to the clause containing p, is labeled by 0, and vice versa.

Now it is straightforward to verify that all clauses on a path determined by atruth assignment a are false under a: That is, the branching program solves thesearch problem and it is read-once as it was regular.

For the opposite direction let or be a read-once branching program of the minimal

size. With every node v in or associate a clause C having the property that everyassignment determining a path going through v falsifies C,,.

If v is a leaf then C is the clause from C labeling v in or. Assume that the nodev is labeled by atom pi and the edge (v, vl) is labeled by 1, and (v, vo) by 0.

We claim that C,,, does not contain pi and does not contain -pi. This isbecause or is read-once and so no path reaching v (and at least one path does reachv ass is minimal possible size) determines the value of pi. Hence we could prolongsuch a path by giving value 1 to pi if pi E C,,, or value 0 if -pi E C o. This newpath would satisfy Cvo or C,,, respectively, contradicting the previous assumption.

It follows that either one of the clauses C,,, , CV 0 contains none of pi, -pi, orthat contains pi but not -pi and C,,, contains -pi but not pi. In the formercase define C to be the clause containing none of pi, -'pi, and in the latter casedefine C to be the resolution of clauses C,,, and C o with respect to (w.r.t.)atom pi.

It is easy to verify (using an argument similar to the earlier one) that no paththrough v satisfies C,,.

The root of or has to be assigned the empty clause as all paths go through it.Hence the constructed object is a regular resolution refutation. Q.E.D.

Note that the first part of the proof showed that s < t even without the restrictionto regular and read-once.

In the rest of the section we prove a strong lower bound to the number of clausesin resolution refutations of particular sets of clauses having clear combinatorialmeaning: the weak pigeonhole principle.

Let m > n be two natural numbers. Consider set PHPn of clauses with atomspi3,i =0,...,m - l and j =0,...,n - 1

{-Pik, -pJk}, all i j and k

{Pro,... , Pi(n-l)}, alli

Interpreting conditions pi] = 1 as defining a relation C 10, ... , m - 1) x{0, ... , n - 1) the clauses say that the relation is a graph of an injective mapfrom {0, ... , m - 1) to {0, ..., n - 1). Hence the set is unsatisfiable and has (byTheorem 4.2.1) a resolution refutation. The following theorem was first proved byHaken (1985) form = n + 1.

Theorem 4.2.4 (Haken 1985, Buss and Turan 1988). In any resolution refutationof set PHPn at least

different clauses must occur.

4.2 Resolution 29

Proof A 1-1 truth assignment to atoms pij is a truth assignment a for which therelation

{(i,j)EIn xnla(pi1)=1}

is a partial 1-1 function from m to n. We shall denote by dom(a) and mg(a) itsdomain and range.

A maximal 1-1 assignment is a 1-1 assignment a that assigns n ones.Let Jr be a resolution refutation of -' PHPn' and assume n is divisible by 4.

Claim 1. For every maximal 1-1 truth assignment a there is a clause Ca in rsatisfying

1. a does not satisfy Ca2. for every i dom(a) there are at most (n/2) j 's such that pig E C«3. there is exactly one i 0 dom(a) such that there are exactly (n/2) j s such

that pij E Ca

To see the claim consider the unique path P« in 7r consisting of clauses CO, C1,, Ct such that(i) Co is the end clause of 7r, that is, the empty clause

(ii) C;+1 is one of the two clauses that are the hypothesis of the resolutioninference giving Ci

(iii) a does not satisfy any of C,(iv) Ct E PHP ;'As a is maximal 1-1, clause Ct must have the form

Ct = {pio, . , pi(,-1)}

for some i 0 dom(a). Let C be the first such clause in Co, ..., Ct such that forsome i 0 dom(a), there are at least (n/2) j's such that pij e C. This C satisfiesall conditions 1-3.

For a a maximal 1-1 assignment let Ca be the first clause in 7r satisfying 1-3.For ,B a 1-1 assignment, not necessarily maximal, let C'5 be the first clause Ca

for some maximal a extending P.

Claim 2. Assume that fi is a 1-1 assignment which assigns (n /4) ones. Then thereare at least (n /4) + 1 i 's for which either there is j such that pij E C'5 or thereare at least (n/2) j s such that pij E Co.

To prove the claim let C' Ca for a maximal 1-1 assignment a. Define:COL- {i 12j, -'pij E CO}COL+ {i I i E dom(a) A (Vj, -pi j 0 CO) A 3'-(n/2) j, Pij E CO}io := the i 0 dom(a) such that for exactly (n/2) j's, pij E COA:={pij I a(pij) = 1A,8(pij)=0}

The sets COL-,COL+and(i0)are mutually disjoint (as COL-f1(n\dom(a)) _0 holds because CO is not satisfied by a). Point io satisfies the condition of the

claim, and as, by the definition, for all i 0 dom(a), i 0 io there are less than (n /2)i's for which pi.j E CP, it remains to show

ICOL-I + ICOL+I ? 4

Assume otherwise; then there must be Pi j E A for which {pioj, pioj} f1 co = 0and i COL- U COL+. This is true because the former condition excludes only(n/2) elements of A and the latter excludes less then (n/4) elements of A: Thatis, both conditions exclude less than (3n/4) = IA I elements of A. Now take suchan atom pig and define a new maximal 1-1 assignment a' from a by changing thevalue of pi,j from 1 to 0 and of pioj from 0 to 1. Clearly P c a' and a' does notsatisfy C'5 either, but for all u V dom(a') there are less than (n/2) v's for whichpuv c= CO (as we replaced io by i). It follows that the clause Ca precedes CO inthe refutation, contradicting the definition of CO.

We are ready to prove the bound. Let h (n, m) be the number of 1-1 assignmentsa with I dom(a)I = (n /4) and let g(n, m) be the maximal number of such 1-1assignments 0 sharing the clause CO. Clearly then the ratio

h(n, m)g(n, m)

is a lower bound to the number of different clauses in the refutation of - PHPn .

To simplify the expression put k (n/4). Then clearly

h(n, m) = ()()k!but h (n , m) can also be expressed differently. For fixed clause C = CO let X,I XI = k + 1, be a set of some k + 1 i's satisfying the property of Claim 2. If tdenotes the number of atoms pig, ,B (pig) = 1, such that i e X then we also have

h(n. ml =k

1 m-k-1 n!O(k+

t )( k t )(n -t - K)

For each i e X there are at most (n /2) j's for which it is possible that 0 (pig) = 1,for a 1-1 assignment ,B such that C = CO, I dom(,B) I = k. Thus

-t)!Vk1 -k-1 (nlt (ng(n,rn)< +

t

)(m

k-t )\2// (n-k)!t=O

Therefore we obtain a lower bound to the ratio

i:kO(kt l)lmkkt l) > tI=01k\t lllmkth(n,m) >

/g(n, m) Ek=olkt

l)(mkt 1j(n/2)t((n - t)!/n!) Et=0 lkt 1)(2/3)'

4.3 Sequent calculus 31

because

(n t (n - t)! < (2)t`2/ n! 1\3

holds fort < k = (n/4). The last ratio is 20(n2l ') . Q.E.D.

Note that when m < n' _' the bound is exponential and for m = o(n2/ log(n))it is still superpolynomial.

Open problem. Are there resolution refutations of - PHPn , m = n2, with poly-nomially many clauses?

4.3. Sequent calculusGentzen's sequent calculus is the most elegant proof system, for both propositionaland predicate logic, allowing sharp proof-theoretic analysis. In this section we shalltreat the basics of the propositional part of Gentzen's system LK, which we shallalso denote by LK as there is no danger of confusion. The full predicate system LKand its modifications for bounded arithmetic are considered in the next chapter.For the details of Gentzen's system we refer the reader to Takeuti (1975).

We shall confine ourselves to the same language as in Section 4.1: 0, 1, -, v, n,and atoms. The formulas are built by using the connectives from atoms and con-stants. First we define the notion of the depth of a formula (a proof).

Definition 4.3.1.1. The logical depth offormula 0, denoted tdp(O), is the maximum nesting

of connectives in 0. Precisely:

(a) 2dp(0) = £dp(l) = £dp(p) = 0, any atom p

(b) fdp(-/i) = 1 + £dp(*), any t/i

(c) fd p(ri o i/i) = 1 +max(tdp(rl), fdp(*)), for any rl, * and o = v, n

2. The depth offormula 0, denoted dp(O), is the maximum number of alter-nations of connectives in 0. Precisely:

(a) dp(i) = 0 fftdp(/) = 0(b) dp(->/i) = dp(tf) if the outmost connective of i/i is and

dp(-'i/i) = I +dp(i/i) otherwise

(i) dp(ri v i/i) = max(dp(ri), dp(i/i)) if the outmost connective inboth rl, * is v,

(ii) dp(rl V f) = 1 + max(dp(rl), d p(t/i)) if none of ri, Vi has v asthe outmost connective,

(iii) dp(rl V') = max(1 + dp(rl), dp(i)) if V is the outmost con-nective of Vi but not of ri,


(iv) dp(rl v *) = max(dp(rl), 1 + dp(i)) if v is the outmost con-nective of n but not of i/i.

3. The depth of proof 7r (resp. the logical depth) is the maximal depth of aformula in r (resp. the maximal logical depth). It is denoted dp(7r) orfdp(7r), respectively.

The basic object of the sequent calculus is a sequent, an ordered pair of twofinite sequences (possibly empty) of formulas written as

Ol,...,. A« *I,...,C'The symbol -+ separates the antecedent 01 .., (Pt, f r o m the succedent *J, ... ,*,,. We shall use letters r, A, Ii, ... to denote finite sequences of formulas, alsocalled cedents.

The truth definition is extended from formulas to sequents by the following: Anassignment a satisfies a sequent I' ) A if and only if a satisfies a formula fromthe succedent A or the negation of a formula from the antecedent F. In particular,the empty sequent 0 0, also written simply -, is unsatisfiable.

Definition 4.3.2. An LK-proof is a sequence of sequents in which every sequenteither is an initial sequent, that is, a sequent having one of the forms

A -- A, 0 -+, - 1with A an atom, or is derived from previous sequents by one of the, following rules:

1. weakening rules

leftr

A and rightr

AA,r -A r-A,A2. exchange rules

F1, A,

B, A, F2 -A

r A, A2

A A, A2

left and rightr A

r A

A : introduction rules

leftA A

and right r-) A,AAB


6. v : introduction rules

leftA,r-O B,r'-*0andAvB,r-A

rA,A r A,Aright r A,AvB and r- A,BvA

7. cut-ruler - A, A A, r ) A

r -)AThe new, formula introduced in a rule is called the principal formula of the rule;

formulas from which it is inferred are called the minor formulas of the rule. Allother formulas are called side formulas.

For a formula in A or r in the lowersequent ofa rule, the same occurrence in theupper sequent(s) is called the immediate ancestor of the formula. The immediateancestor(s) ofa principal, formula of a rule are the minor formulas of the rule.

An ancestor ofa formula is any formula obtained by repeating the immediateancestor step.

Theorem 4.3.3. The system LK is complete. That is, whenever a sequent r -* Ais satisfied by every truth assignment then there is an LK-proof of r - A.

Proof. We shall define a tree with nodes labeled by sequents by the followingprocess:

1 Th t i l l rb d b A. e roo as e e y

2. If a node is labeled by.

II-*E,-0where ,0 is a formula with the maximal logical depth in the sequent, thenthe node has only one successor labeled by

0,rl-) E3. If a node is labeled by

rl-E,0Awhere 0 A f has the maximal logical depth, then the node has exactly twosuccessors labeled by

11 E, 0 and Fl ---) E, t

respectively.

4. If a node is labeled by

rI--+E,4vrwhere 0 v has the maximal logical depth, then it has one successorlabeled by

rl ) E,OVr,OV*which has one successor labeled by

I'l ) E,O,Ovf

and this node also has one successor labeled by

II) E,0,i/r'

5. The cases where a formula of the maximal logical depth is only in theantecedent are defined dually, exchanging the roles of v and A.

Claim. All sequents appearing as labels are valid.

The claim follows from the assumption that the sequent r - A labeling theroot is valid.

It follows that the leaves of the tree must be labeled by a sequent consistingof atoms in which an atom appears in both the antecedent and the succedent, orcontaining 1 (resp. 0) in the succedent (resp. in the antecedent).

Such sequents can be obtained from the initial sequents by the weakening rules.The rest of the tree then defines an LK-derivation of I' -) A. Q.E.D.

Definition 4.3.4.(a) An LK-proof is called treelike if every sequent in the proof is used as a

hypothesis of an inference at most once.(b) An LK-proof is called cut-free iff the cut rule is not used in the proof.(c) The height of an LK-proof is the maximal number h such that there is a

sequence So, ... , Sh ofsequents of the proof in which Si is a hypothesis ofthe inference yielding Si+1, all i < h.

Note that if the LK-proof is treelike its height is just the usual height of theproof tree.

Corollary 4.3.5. Assume that I' - A is a valid sequent of size n. Then there isa cut free, treelike LK-proof of r - A of size at most 0 (n 2n).

Proof Let f (n) be the least upper bound to the size of treelike LK-derivationsof valid sequents of size n. Then the proof of the previous theorem shows that f

satisfies inequality

which yields

f(n)=O(n)+2f(n-1)

f(n) = O(n2")

.E.D.

Corollary 4.3.6. Every valid sequent

has a (cut free) LK-proof in which everyformula is a subformula of one of the.formulas Ot , ... , 0 .,'i , ... , i/i,,.

Proof. By Theorem 4.3.3 every valid sequent has a cut-free proof. Observe that inevery rule except the cut-rule all formulas appearing in the upper sequents of therule also appear as subformulas in the lower one. Q.E.D.

Important applications of cut-free proofs are the Craig interpolation theoremand the Beth definability theorem.

Theorem 4.3.7 (Craig 1957). Let the sequent

r(p,q) A(p,r)

have a cut free, treelike LK-proof of height h.Then there is a formula 1()5), an interpolant of the sequent, satisfying the

following conditions:1. I(p) contains no atoms q or -F2. both sequents

I' -* I and I-) Aare valid

3. £dp(I) < h + 1Morevover both sequents from condition 2 have a cut free, treelike pro of of heightO(h).

Proof. Let 11 -) E be a sequent in the proof. Assume that II = II i U 112 andE = E U E2 where 11 1, E i are the ancestors of r and 112, E2 are the ancestors ofA. We shall show by induction on h', the height ofthe subproof yielding 11 E,that there is a formula I' with no atoms q, r such that both sequents

II i E1, I and I, 112 ) E2

are valid, and such that fdp(I') < h' + 1.

For h' = 0 the sequent FI E is an initial sequent. If it is 0 put

I to if0EIII1 if 0 E 112

and dually in the case of the sequent - 1

1 ifl EE20 if lEEi

If 11 - ) E is the initial sequent pi -k pi define

ifFI2=El = 0ifFI2=E2=0if n, = E, = 0if I'll =E2=0

If 11 E is the initial sequent qi -> qi then both occurrences of qi mustbe ancestors of r and we put I' := 0. Dually, if 11 ---> E is the initial sequentri -) ri then we put I' := 1.

The only case of an initial sequent when the logical depth of I' is not 0 is whenI' = -pi, that is, always £dp(I') 0 we distinguish several cases according to which inference rulewas used to derive 11 -f E and according to which of the sets II 1, II2, E 1, E2contain the principal formula of the inference.

Let us consider just the case of the A : right inference of 11 - * E fromn --) E', a and 11 --k E', f8 where a A f8 is the principal formula and Ii and12 are two interpolants associated with the upper sequents. Define:

I, Iin12 ifaA EE2

Ii v 12 ifa A E El

Hence fdp(I) < 1 + max(fdp(Ii ), fdp(I2)) < 1 + h'.The cases of the other rules are treated analogously. We also leave it to the

reader to verify that all sequents

I'll -) E1, I and I, 112 E2

have cut-free, treelike proofs of the height 0(h'). Q.E.D.

Theorem 4.3.8 (Beth 1959). Let the sequent

r(p,q),r(p,r),glwith q = (q, ... , q,,, ), r = (ri, ... , rm) have a cut free, treelike LK proof ofheight h.

Then there is formula E(-P), an explicit definition ofq l , satisfying the followingconditions:

1. E(P) contains no ;y or T2. both sequents

r(P,9),gt E(P) and r(P,q),E(P) qi

are valid3. fdp(E) <h+1.

Moreover, both sequents in 2 have treelike, cut free LK-proofs of height 0(h).

Proof. Let E be the interpolant constructed in the previous proof for the sequentrli,r12 -) E1,E2 with III = r(P,g),gi, 112 = r(P,Y), Et = 0 andE2 = ri. Then

and

r(P,r),E-frifrom which immediately follows by renaming r into

r'(P,i ),E-kqt

Q.E.D.

It is an open problem whether the assumption that the proof is cut-free can bedropped from the hypothesis of the last two theorems; see Krajicek (1995b) andKrajicek and Pudlak (1995).

The last but one theorem of this section is a lower bound to the worst caseincrease of the size of proofs after cut-elimination, showing that the upper boundsfrom Corollary 4.3.5 are good even if we would strengthen the hypothesis of 4.3.5to the assumption that the sequent has a short LK-derivation.

Theorem 4.3.9. There is a sequent S of size I S1 = m such that every cut free,treelike LK proof of S has at least 2"(" sequents.

Moreover, the sequent S has an L K proof of size m 00).

Proof Let S be the sequent whose succedent is empty and whose antecedentconsists of n + 1 + (n+') ') n disjuncts V C (arbitrarily bracketed), one for eachclause C of -' PHPn+I from Section 4.2.

Let 7r be a cut-free, treelike LK-proof of S. By Lemma 4.3.6 every sequent Zin it has the form

D1,..., D qi,...,q,


where Dr are disjunctions of literals p;j or -p;j and qs are atoms among pig. Byinduction on the number k of inferences in the subproof of it yielding the sequentwe show that there is a resolution derivation of the clause

from the clauses D1, ..., D (we identify a disjunction of literals with the clauseconsisting of those literals) with at most k resolution inferences.

If Z is an initial sequent in r or derived by a structural rule, then there is nothingto prove. The only other inferences in r can be a V : left inference or a -' : leftinference. In the former case,

r, E -+ A r, F -- Ar,EVF) A

where r, E v F -p A is the sequent Z, assume that ao and ai are two resolutionderivations of A from r, E and from r, F, respectively. Construct a new derivationa of A from r, E v F by first replacing E in ao by E v F, getting a derivation ofA, F, and then joining this with ai, where F is replaced by A, F.

In the latter caser -k A,qr, -q -+ A

we have by the induction assumption a resolution derivation of A, q from r.Prolong this derivation by a resolution inference applied to A, q, and {,q}.

Clearly the number of resolution inferences is bounded by the number infer-ences in ir; we use the assumption that it is treelike to maintain this bound duringthe simulation of a V : left inference.

By Theorem 4.2.4 every resolution refutation of -, PHP'+' must contain atleast 22(n) clauses. Hence n must contain 2n0) inferences, which is 2""' form=ISO=0(n3).

By Corollary 13.1.8 and Lemma 4.4.15 the sequent S has a size m 0(1) LK-proof.

Q.E.D.

We shall see in Section 12.5 that Theorem 12.5.3 has a much stronger lowerbound for the sequent S.

The next proposition will show that every treelike proof can be balanced withonly a polynomial increase in size. The lemma itself looks rather technical so letus first motivate it. Assume that we have a proof 7r of sequent S from sequentsS1, ..., Sk (additional axioms: initial sequents). Let a be a truth assignment notsatisfying S. It determines a path P(a) through n: a sequence Zo, ..., Zt ofsequents such that Zo = S, each Zi is false under a, Z;+i is a hypothesis of theinference yielding Z;, and Z, is initial. Obviously Z, E {S1, ..., Ski and hence7r acts as a branching program solving the search problem to find false Si (cf.Theorem 4.2.3). Z;+1 is uniquely determined by Z; alone in all rules except inA : right, V : left and in the cut-rule. It is thus of interest to estimate the number

of these inferences on any path through it as such an estimate gives informationabout the complexity of the search problem.

We introduce two new derived rules:

1. special n : right

r -f A, A

r -* A, Y1 A...AyaA-81 A...A-3b AA

2. special V :left

A,r -* A-ry,v...v-Yavs1v...V3bvA,r)A

where r = (y1, ... , ya) and A = (81, ... , 3b). Their advantage, in connectionwith the preceding discussion, is that they have only one hypothesis. We call aproof special if it uses special A : right and special V : left but does not useA : right or V : left. Note that the only inference in special proofs with morethan one hypothesis is the cut.

Lemma 4.3.10. Let it be a treelike LK-proofofsequent S. Assume that d = d p(ir )is the maximal depth of a, formula in Jr and that it has size m.

Then there is a treelike special LK-proof n' of S from a set IT,, ..., Tk} ofsome tautological sequents such that

1. dp(ir') < l + d2. the size of any cut, formula in n' is O(m3)3. the number of cuts on any path through TI' is at most O (log m)

Proof. Assume it satisfies the hypothesis of the lemma. Construct a proof at of Sby replacing in Jr any A : right with a principal formula A A B (resp. v : left witha principal formula A v B) by two cuts with the sequent A, B -* A A B (resp.with A v B ) A, B); let these new sequents form the set T = IT,, ... , Tk} oftautological sequents used as axioms in at. Clearly Ia, I = O(m).

Denote by H(a) the number of cuts in a proof or and by h (a) the maximumnumber of cuts on a path through or.

The following claim borrows from the idea ofthe proof of Theorem 3.1.15 fromSpira (1971).

Claim 1. Let or be an LK proof and assume

(3/2)r-1 < H(a) < (3/2)t

Then there is a subproof ao of a ending with cut-inference

r A,A A,r -) Ar --) A

and with two immediate subproofs al and a2 (with end-sequents r -k A, A andA, r ) A, resp.) such that for i = 1 or i = 2 it holds

1 3t-1 3\t-1

22

H(ai)2/

and

H(a) - H(a,) < (3/2)t-1

The lemma follows from Claim 2 for or = a, as dp(a,,) = dp(n) and la, I =O(IirD.

Claim 2. Let or be a treelike special LK proof from axioms T. Let d be the max-imum depth of a cut formula in a and n(a) the maximum size of a sequent in or.Lett := [1og3/2(H(a))1

Then there is a treelike special proof a' with the same end-sequent as or suchthat

1. dp(a') < l + d2. the size of any cut formula in a' is < 2t n (a )

3. h(a') < 1 +t = O(log(lal))

We prove Claim 2 by induction on t: Assume or satisfies these inequalities. ByClaim 1 there is a subproof ao ending with a cut-inference

r--) A,A A,r ->Ar )A

with two subproofs al, a2. Assume without loss of generality (w.l.o.g.) i = 1 inClaim 1. Denote by rt (a) the maximum size of a cut-formula in or.

By the induction hypotheses applied to al we get a special proof a' of

r -+A,A

such that ri(a') = 2t-1 . n(ai), dp(ai) < 1 + d and h(a1) < 1 + t - 1 < t.Prolonging a' by several right and v : right and contractions we get aspecial proof ai' of

-YIV...-ya V81 v...V8bVA

where y,'s and St's are as in the definition of the special rules.After the proof a2 insert one special V : left to derive from A, F A

-YIV...-yav81 v...VSbvA,r-+A

and then continue as in a carrying the extra side formula ,y1 v ... -rya v Si v... V Sb v A in the antecedents. This yields a special proof a2, n(a2) < 2n (a2),and H(a2) = H(a) - H(al).

Apply the induction hypothesis to v2 to get a special proof o2 of

where I1 -) E is the end-sequent of dp(or2) < 1 + d, h(az) t, and(QZ) < 2t-1 n(a2') < 2' n(Q2)

- -Joining the proofs d' and a by one cut-inference gets the required special

proof a', estimating the size of the last cut-formula by 2n (02) < 2' m. Q.E.D.

There are other forms of this lemma, depending on a particular formulation ofthe proof system and on a particular purpose (see 12.2.4).

In later chapters we shall frequently talk about constant-depth LK-proofs: theproofs in which the depth of formulas is bounded by an independent constant.For such systems it is more elegant to allow the connectives A and V to have anunbounded arity and change Rules 5 and 6 of Definition 4.3.2 to allow introductionof these new conjunctions and disjunctions from several hypotheses in one step.We will conclude this section by the definition of these constant-depth systemsand by a simple statement.

Definition 4.3.11. Unbounded arity connectives A and V are connectives suchthat NA I__ , An) and V (A 1, ... , An) (written briefly as / \i < n Ai or Vi, n Ai)are formulas whenever Ai 's are formulas.

Rules for these connectives are extensions of Rules 5 and 6 of Definition 4.3.2:5.' A : introduction rules

left

right

A,r - AAi<nA1,r-) A

r)A,Al ... r) A, Anr-o O,Ai<nA1

6.' V : introduction rules

A, r --a A ... A, r Aleft

right

Vi<n Ai, r -. A

r-) A,Ar ) A, Vi<n Ai

where A is among A1, ... , An MA: left and V : right.The depth d LK-system is the system LK with Rules 5' and 6' and allowing in

proofs only formulas of depth at most d.

The next lemma follows from the proof of Theorem 4.3.3.

Lemma 4.3.12. Let r A be a valid sequent consisting of formulas of depthat most d.

Then r -. A is provable in the depth d LK-system.

4.4. Frege systemsA Frege system is a propositional calculus based on finitely many axiom schemesand inference rules that is implicationally complete and sound. We shall restrictourselves to the language 0, 1, v, and A as before, but we consider the gen-eral case at the end of the section. Most of the propositions in this section holdidentically for all Frege systems.

Definition 4.4.1. A Frege rule is a k + 1-tuple of formulas Ao,... , Ak in atomspl, ... , pn written as

AI,...,AkAo

such that any assignment a : { pl, ... , pn } -> {0, 1) satisfying all formulasA l , ... , Ak also satisfies formula Ao (the soundness of the rule).

A Frege rule in which k = 0 is called a Frege axiom scheme.Formula 00 is inferred by the rule from formulas 0i , ... , Ok if there exists a

substitution p, := iji .for whichOj = Aj (p!/,/i;).

A well-known example of a Frege rule is modus ponens

A A-+ BB

where A B is an abbreviation for -'A v B.

Definition 4.4.2. Let F be a, finite collection of Frege rules.1. A Frege proof (or briefly, an F proof) offormula r1 o from, formulas 117 1, ... ,

ij, } is a finite sequence 91, . . . , 9k of formulas such that every B; is eitheran element of (q 1, ... , q,,) or inferred from some earlier 9j 's (j < i) by arule from F, and such that the last formula 9k is rjo.

2. F is implicationally complete if and only if for any set { n. , ... , t7,,) suchthat every assignment satisfying all formulas in this set also satisfies 170,the formula r/o is F provable from {r7 , ... , r71, }.

3. F is a Frege proof system if and only if it is implicationally complete.

A Frege proof system that is often used consists of modus ponens and finitelymany axiom schemes.

We shall also define several complexity measures, besides the size, which makesense for the sequent calculus, Frege systems, and systems treated in a future sectionbut not for a general propositional proof system. We shall introduce these measuresnow as we want to understand some later results also in the perspective of thesecomplexity measures.

Definition 4.4.3.1. The number of steps in a proof 9j , ... , 9k is k; it is the number of in ferences.

The minimal number of steps in a proof of tautology r is denoted k p (r) (Pdenotes the propositional proof system considered).

4.4 Frege systems 43

2. The number of distinct formulas in a proof 7r is the minimal number a suchthat there are formulas O1 , ... , Of, such that any formula occurring as as u b f o r m u l a in n is identical to one o f ol , ... , Of. The minimal number offormulas in a proof of tautology t is denoted £ p(r).

The next several lemmas clarify to some extent the relations among the size, thedepth, and these measures. Recall that I w I denotes the size of string w; in particularIn I denotes the size of proof it. The following lemma is a particular corollary ofa more general result valid for predicate calculus.

Lemma 4.4.4. For every Frege system F there is a constant c > 0 such that everytautology r has an F proofir such that

fdp(ir) < c kF(t) + £dp(t)

In fact, for any F -proof 01, ... , 4k there is another F proof 01, ... , 9k suchthat

1. £dp(9;) <k2. there is a substitution 8 such that:

forall i <k.

s(9i) = Oi

Proof. Think of formulas as of labeled binary trees: The inner nodes are labeledby connectives and the leaves by atoms or constants. If formula P is a substitutioninstance of formula a then the tree representing a can be uniquely embedded intothe tree representing f, mapping the root of a onto the root of P and the sons ofa node onto the sons of the image of the node. Moreover, the labels of the innernodes are preserved by such a map.

We say that the subformula y of P is in the image of a in P if the root of y isin the image of a in the embedding.

Let

be a Frege rule and

A1....,AkAo

to

its instance. The Z-part of the instance of the rule consists of all subformulas of4; that are in the image of A; , i = 0, ... , k.

The Z-part of a proof 01, ..., Ok is the union of the Z -parts of all inferencesused in the proof. Note that the Z-part of a proof with k steps has cardinality atmost c k, where constant c is the maximal number of subformulas occurring in aFrege rule of the system.

Call those subformulas in the proof special whose identical copy is in theZ-part. Those occurrences of special formulas a form the border, which has twoproperties:

(i) if a is a subformula of fi then 0 is special.(ii) at least one of immediate subformulas of a is not special.Define from , ... , Ok a new sequence of formulas 01, ... , Ok by replacing

every border occurrence of a formula a by new atom pa.

Claim. Sequence 01, ... , Ok is a Frege proof and fd p(0;) < c k.

The sequence is a Frege proof as each Oi can be derived from 0,, , ... , Oii bythe same rule as'i was derived from /i. , ... , iii. The logical depth of 0; is atmost c k as any proper chain of subformulas formed from special formulas maycontain at most the cardinality of Z members.

Finally, note that the substitution

S : poHa

gives the wanted substitution: 6 (0i) = 0i. Q.E.D.

Lemma 4.4.5. Let r be a tautology and assume that it is a Frege proof of r of theminimal possible size. Then

2dp(7r) < O( I7rl fdp(r))

P r o o f . Let it = 0 1 , Ok be a proof of r of the minimal size. Take a step Oj ofit and write

v/ _ 1Up (0j) somet > 0

Let the set S of subformulas occurring in it consist of those in the Z-part of itand of those occurring in the last formula. Note that any proper chain of subfor-mulas (each one included in the previous one) from S can have cardinality at mostmax(c, £dp(r)), with c the constant associated to the system in Lemma 4.4.4.

Let Oi 0 1 ,..Ok be the sequence constructed in Lemma 4.4.4 and let So be aminimal substitution such that 8o(Ok) = ¢k, but not necessarily: So(0i) = 4.i fori k. But by the hypothesis it is of the minimal size, so, in fact, SO (0j) _ Oi musthold for all i, as otherwise 80(01), ..., 30(0k) would be a proof of r of size smallerthan jr.11From

the construction in the previous proof it follows that for every subformulaa of Oj there exists an identical subformula in the set S. Put

d := 1 + max(c, tdp(r))

Hence every proper chain of subformulas from S has cardinality less than d.

Assume as. ... . ao is a maximal sequence of subformulas of (k, such that(i) a, is a subformula of ai+i

(ii) the depth of the root of a, in the tree 0, is (s - i) dThus we have

s=t-d, some0<u<dd

Let a s. ... . ao be subformulas from S such that a is identical to a,. Let Pi denotethe path in the tree 4., starting with a, on which ai, ... , ao lies and let P' be thecorresponding path originating from a'. The cardinality of P' is

IP,'l

Now for i 0 j: a' 0 P' by the property that any proper chain of subformulasfrom S has cardinality less than d. This implies that all paths Pi are disjoint, so

s s

I n I >>- I (d j+u+1)j=0 j=0

Substituting into this inequality the value of s (and using 0 2d

, i.e., t < 2d

Hence:

21irld = O( Inl . £dp(T))

Q.E.D.

This bound is, in fact, also optimal (see Krajicek 1989a).The following lemma shows that the measures kF(r) and CF(r) are essentially

the same.

Lemma 4.4.6. For every Frege system F there is a constant c > 0 such that forevery tautology r

kF(r) -<tF(T) <c-kF(T)+ITI

Proof. Let 4)1, ..., Ok be a proof of r, and let the constant c and the formulas9i , ... , 6k be those from Lemma 4.4.4. Let So be a minimal substitution such that30 (0k) = Ok = r. Then sequence So(01), ... , So(0k) is a proof of r and it containsat most c k + I r I distinct formulas, as any subformula in it must be identical toone from the Z-part of the original proof or to a subformula of r.

This proves the second inequality. The first one is trivial. Q.E.D.

The following three lemmas are simple structural properties of Frege proofs.

Definition 4.4.7. A Frege proof 61, ... , 0k is treelike if and only if every step Oi isa hypothesis of at most one inference in the proof.

Lemma 4.4.8. Assume that t has a Frege proof with k steps, of depth d and withsize M.

Then t has a treelike F proof 7r such that:1. dp(rr) < d + c2. 17r1

3. r has at most c k log(k) stepsfor some constant c > 0 depending only on F.

Proof Let Ol , ... , Ok be an F-proof oft satisfying the assumptions. Define 4, :_Ol A ... A 0,, brackets balancing the conjunction into a binary tree of depth atmost O(log(i)). We show that ¢,+1 has a treelike proof from 4, with a O(log(i))number of steps, size O(i I), and depth d + O(1). This follows from the nextclaim.

Claim. For j < i, any Oj can be proved from 4i by a treelike proof with 0(log(i))steps, size O(log(i) Ioil), and depth dp(o,) + O(1).

Then the whole proof has Ei 0(log(i)) = O(k log(k)) steps and size>i O(log(i) 10i 1) < O(log(k) m). Q.E.D.

Lemma 4.4.9. Let t (pl, ... , p,,) be a tautology. Then there are constants k andc such that for every 01, ... , q5n the tautology

t(01,...,0n)

has a treelike proof with k steps, depth at most c + maxi (dp(oi) ), and size at most

c Ti AD.

Proof Let r be any fixed treelike proof of r. Then a proof obtained from 7r bysubstituting 4, for pi satisfies all the requirements. Q.E.D.

The following lemma provides a bound for the deduction lemma.

Lemma 4.4.10. Let 0 be F -provable from 4i by a proof with k steps, size m,and depth d. Then the implication Eli -+ 0 has an F -proof with O(k) steps, size0(k m), and depth d + O(1).

Proof. Let Ol , ... , Ok be an F-proof of 0 from *(= Ol ). Construct successivelythe proofs of implications * -> 0,.

If 0i was inferred from Oj, ..... Of,. in the original proof then the implication

(*-+ Oj,)-((r-+ OJ2)-*(...((* 9jr)Oi)...is a substitution instance of tautology

A

whereA1 , .. , Aj,.

Ai

is an F-rule. Hence by Lemma 4.4.9 the implication has a treelike proof with O (1)steps, size O(I'I + 10i j + 2s<r laj,. I) = O(m), and depth d + 0(1).

From this nested implication and from the implications

V1 --) s < r

proved earlier the desired implication

tJ/

follows in 0(l) steps, size 0(m), and depth d + 00).Hence the total number of steps in the proof of li -. 0 will be O(k), the size

will be O(k m), and the depth d + O(1). Q.E.D.

The knowledge of optimal bounds to the complexity of proofs of a tautologymeasured by any of the measures is poor, except for the depth, which follows from4.3.6.

Lemma 4.4.11. Let r be a tautology with logical depth d. Then r has an F -proofof logical depth at most d + O (1).

For the number of steps and the size only the following trivial consequence ofLemma 4.4.4 is known.

Lemma 4.4.12. Let r be a tautology which is not a substitution instance of anyshorter tautology. Let m be the sum of the sizes of all subformulas of r (including r).

Then any F -proof of r must have at least S2(fdp(r)) steps and size at least7(m).

In particular, any F -proof of r :_ -' ... 1, negation - 2n-times, must haveQ(n) steps and size S2 (n2).

P r o o f . Let 7r = 4 ) 1 ,. . ., Ok be an F-proof of r and let constant c and proof01, ..., Ok be those constructed in Lemma 4.4.4. As r is not a substitution in-stance of any shorter tautology we must have: Ok = Ok = r and 01 , ... , Ok is aproof of T. Hence also fdp(r) c-I < k.

It follows from the construction of 61, ..., Ok that any subformula has an iden-tical occurrence in the Z-part of 01, ..., Ok and, in particular, every subformulaof r is special in every F-proof of T. A subformula from a Z-part has at most csubformulas also in the Z-part; this condition entails that the size of any F-proofof r must be at least Q(m). Q.E.D.

I have postponed till the end of this section the most important result of Reckhow(1976). To appreciate it we now allow the language of Frege systems to be arbitrary


complete finite set of propositional connectives, where complete means that anyBoolean function can be expressed by a formula in this language.

Recall Definition 4.1.3. In this definition we considered only the case when Pand Q had the same language. For the following theorem we extend the defini-tion to P, Q, which do not necessarily have the same language, by replacing theimplication in part (b) of Definition 4.1.3 by the implication

1'(w) = g(t) - Q(f(w, r)) = r

where g is a polynomial-time function translating formulas in the language of Qinto the language of P. It is assumed that if r is a tautology then g(r) is also atautology.

Theorem 4.4.13 (Reckhow 1976). Any Frege system polynomially simulates anyother Frege system. Moreover, if the language of F1 contains the language of F2then g can be the identityfunction and F1 has at most a polynomial speed-up overF2, and the simulation f can be chosen so that both the number of steps and thesize increase at most proportionally and the depth increases by a constant.

Proof (sketch). Instead of giving the full proof we shall outline the easy partand point out the difficulty with the general case. The way how to overcomethe difficulty is then described separately in Lemma 4.4.14.

Let F be a Frege system in the language {0, 1, v, A), with finitely manyaxiom schemes and the modus ponens as the only rule of inference.

Let F' be any other Frege system. We shall show that F and F' mutually poly-nomially simulate each other.

This is rather easy to see when, the language of F' is also {0, 1, v, A). Anyrule of F' has the form

AI(P),...,Ak(P)A0(P)

and is sound, so the formula

AI(P) - (A2(P) - (... - (Ak(P) - Ao(P))...)

is a tautology and hence any of its instances of size m has size 0(m) F-proof (cf.Lemma 4.4.9). Thus k applications of modus ponens simulate the inference in F'using the rule, and this part of the simulation has size 0(m).

On the other hand, all instances of axiom schemes of F have short F'-proofs,by Lemma 4.4.9 again. System F' is implicationally complete so there is a fixedsize F'-proof of q from { p, p - q). Thus any instance of modus ponens can besimulated by a linear size F'-proof too.

To see the difficulty that may arise when F' has a different language considerthe case when = ( the equivalence connective) is in the language of F'. In the

language of F the equivalence p q is defined as (p A q) V (,p A -'q), and, infact, there is no equivalent definition in which a variable would occur only once.

Consider the formula

P1

which has size O (m). If we translate it into the language of F by using the precedingdefinition we obtain a formula of size S2 (2m); generally if the nesting of -='s is k,then the translation will have size S2 (2k).

This shows that we cannot just translate the connectives in one language into theother one and literally translate proofs. However, it also suggests that a statementanalogous to the Spira theorem 3.1.15 might help: First transform the size m F'-proof into a size m 0(I) F'-proof in which all formulas have the logical depth0(log m), and then apply the straightforward translation to construct size m0(1)F-proof.

We extract this idea as a particular lemma explicitly in 4.4.14, and leave therest of the details to the reader. Q.E.D.

In the next lemma we again use the language 0, 1, v, A,

Lemma 4.4.14. Assume that r has an F -proof of size m. Then r has a proof ofsize m0(1) and of logical depth at most 0(log(m)) + £dp(r).

Proof We first state a claim allowing inductive proof of Spira's theorem. Theclaim is verified by induction on the logical complexity of a.

Claim 1. Let a (p) be a formula of size n. Then there are a formula B (p, q) withone occurrence of atom q and a formula y (p) such that

1. a= f(gly)2. (1/3)n < 101, Iyl < (2/3)n

We assume that among several such decompositions fl(y) we pick one in somecanonical way, and we shall call it the canonical decomposition. We also assumethat -' is pushed by de Morgan rules into literals and that -* is an abbreviationfor the formula arising from * by interchanging v and A, 0 and 1, and atoms andtheir negations.

Then we define the balanced form of a, denoted a*, by induction on the sizeof a:

1 . a* := a, if IaI < f2. a* :_ (p*(q/l) A y*) V A,y* ), where Jul > f and 0(y)

is the canonical decomposition of a.The constant f is used for convenience and we shall specify it later.The following claim is readily verified by induction on the logical depth of

formula P*.


Claim 2. The equivalence

((0*(q/l) A Y*) V (0*(q/0) A .Y*)) = P*(Y*)

is valid and has an F proofofsize 0(1#* (y*)12) and oflogical depth Cd p(P* (y *)) +0(1).

The next claim is crucial.

Claim 3. Let a = #(ql /yl, ..., qk/yk) where each q, has exactly one occurrencein 0. Then the equivalence

has an F -proof ofsize 0(Ial°(I)) and logical depth £dp(a*) + 0(1).

In proving the claim we shall proceed by induction on r := flog3/2(lal)l andwe shall distinguish two cases:

1. The canonical decomposition of a = p(t/v) has the form

p= O(gl/Yi,...,ge/Ye,t)

and

or

where

ge,t//(gt+l,...,qk))2. The canonical decomposition of a = p(t/Q) has the form

p = P(qt /Yl, ... , qk-I IA-I , gklO (t))

where

Yk = 0(t/a)

Let us consider the first case, leaving the analogous treatment of the secondone to the reader. For notational simplicity we shall consider a completely generalcase with k = 2, f = 1, denoting yl by y and y2 by S.

By the definition a* is

((4(Y, t/l))* A (,(b))*) v ((4'(y, t/0))* A ,((s))*)

AsFlog3/2(10(Y,t)I)1,[log3/2(I r(8)1)1 r - l,by the induction hypothesis thisis (shortly provably) equivalent to

(O*(Y*11) A *(s*)) V (O*(Y*00) A -r*(s*))

which is by Claim 2 (shortly provably) equivalent to

c*(Y*, .*(s*))

Now we would like to show the equivalence

P* = 0*(ql, .*(q2))

which would yield the wanted equivalence of a* with

fl*(Y*, S*)

We cannot apply the induction hypothesis yet as it might hold that1log3/2(IO(gi,')I)1 = r, but we note that 11og3/2(1¢1)1, [log3/2(I'I)1 < r - 1.Instead we shall simply take the canonical decomposition O(w) of ¢ (q i , ') anduse this to prove the equivalence.

There are three cases to distinguish (the substitution is always for only oneoccurrence of an atom)

(a) for some Bo

0 = 0(9o) and ( = Oo(w)

(b) for some coo

0=0((oo) and w=wo(k)

(c) neither (a) nor (b) holds: That is, the occurrence of w in 0(*) is disjointwith *.

We shall consider the first case only; the other two are identical.By the definition (0(Vl))* is the formula

((0(0o(l)))* A(0*) v ((0(0o(0)))* A -Co*)

which is, by the induction hypothesis applied to O(Oo), (shortly provably) equiva-lent to

((o*(B* (l))) A w*) V ((0*(00 (0))) A -'w*)

and by Claim 2 to

0* (00 ((0*))

As 1log3/2(Ieo((O)I)1 = 1log3/2(IiI)1 < r - 1, by the induction hypothesis this is(shortly provably) equivalent to

0* ((0 (w))*)

which is just

0*(f*)

Let f(m) denote the smallest size of an F-proof of (v(K))* - v*(K*) form = 1 v (K) 1. As the induction hypothesis is in the preceding argument applied to

formulas of length < (2/3)m we have (using the bound from Claim 2 too)

f(m) = O (f (3m)+m2)

which yields f(m) = m°(1).This proves Claim 3.Choose constant f to be a number bigger than the size of any formula A; in a

Frege rule of system F (cf. Definition 4.4.1). Let 7r = 01, ..., 8t be an F-proof ofsize n of formula r (= Ot ) . W e construct F-proofs of 0,, i = 1, ... , t by inductionon i.

Take i = io and assume B;o was inferred from 9;, .... , 8k' i 1, ... , ik < io bythe rule

AI,...,AkA0

such that 9,, = A j (i1, We already have F-proofs of B(j = 1, ... , k)and hence by Claim 3 also of

e

As A* = Aj by the choice of f, applying the same inference rule yields

A0('l/1,...,'Pt

.from which we deduce (by Claim 3 again) formula 04,By Claim 3 the prolongation of the constructed proof needed to get e o is

E j < k I Big 1°(1) = m 0(l), and hence we eventually get size m 0(1) F-proof of for-mula r*, which is equivalent (Claim 3 again)-with a short proof-to T.

The logical depth of the proof of r* is < maxi fdp(9;) + 0(1) = 0(log m),and of r from r* is < fdp(r) + O(1). Hence the logical depth of the new proofis 0(log m) + fdp(r), as required. Q.E.D.

We conclude the section by stating a straightforward relation between the sys-tems LK and F.

Lemma 4.4.15. The system LK and F mutually polynomially simulate each other.In particular, if a. formula ¢ has an F -proof of size m then the sequent

)0

has an LK-proof of size 0(m2), and vice versa: If a sequent

has an LK-proof of size m then the formula

(v) v (voiI

4.5 Extension and substitution rules 53

(disjunction bracketed arbitrarily) has an F -proof of size 0(m2).Moreover, the depth of proofs in these simulations increases at most by a con-

stant.

The proof is omitted.

4.5. The extension and the substitution rulesIn this section we consider two extensions of system F. As before we confine thediscussion to the language 0, 1, v, n,

Definition 4.5.1. The substitution rule allows simultaneous substitution of formu-las, for atoms in one inference step

Pn)

A Frege system F augmented by the substitution rule will be denoted SF.

Another useful rule is the extension rule, whose version we saw already inSection 4.2. Its introduction is a little less straightforward, and rather than definea system, we define its proofs.

Definition 4.5.2. Let F be a Frege system. An extended Frege proof is a sequenceo f formulas 01, ... , 0k such that every Oi either is obtained from some previousBj 's by a rule from F or has the form

9=fwhere:

1. the atom q appears in neither 1 nor Oj. for some j < i2. the atom q does not appear in 0k3. a =_ /3 abbreviates (a n ,B) V (-a n

A formula of this. form is called an extension axiom and the atom q is called anextension atom.

An extended Frege system EFis the proof system whose proofs are the extendedFrege proofs.

The first two lemmas are trivial.

Lemma 4.5.3. For every r

kEF(T) < kF(T) = 0 (kEF(T))

Proof. The first inequality is obvious. For the second one, replace an EF-proofwith the extension axioms

91 °*1(P), q2=*G2(P,91), > 9r= r(5,91>,9r-1)

first successively q, by 1/!,, qr-1 by *,-1, ..., and so on. This is an F-proof of rfrom axioms of the form * = 1/r that remained from the extension axioms; eachhas an F-proof with constant number of steps by Lemma 4.4.9. Q.E.D.

Lemma 4.5.4. A Frege system with the substitution rule SFpolynomially simulatesany Frege system with the extension rule EF.

Proof Let ql = .1, ... , q, = 1/fr be all extension axioms introduced in an EF-proof 91, ... , Ok of r of size m. We may assume that these r formulas are the firstr steps of the proof. Note that none of q j, ... , q, occurs in r but that q j may occurin>/ij+1,...,1/!r.

By the deduction Lemma 4.4.10 there is an F-proof of size O(m2) of the im-plication of size O(m)

qr = *r - - (qr-1 = Yfr-1 - (... (ql = *1)) ...) T .

Apply the substitution rule to this formula by substituting *r for qr

*r = *r , (qr-1 = *r-I ± (... (q1 = *1)) ...) -* r

and then separately derive 1/!r = *r (by a proof of size O 0 *r 1): Lemma 4.4.9),and by the modus ponens thus infer

qr-IY/r-1

By repeating this r-times we derive r by a proof of the total size 0(m2). Q.E.D.

The next lemma is the converse of Lemma 4.5.4 and is considerably moredifficult.

Lemma 4.5.5. Any extended Frege system EFpolynomially simulates any Fregesystem with the substitution rule SE

Proof. We shall assume for simplicity that the only rules in an SF system are themodus ponens and the substitution rule (plus any number of axiom schemes).

Let 01(p ), ... , Ok (p) be an SF-proof where (pl .... pm) = p are all atomsappearing in the proof, and let (q 1 .... qk ) be tuples of mutually distinct atomssuch that qk := p.

A formula 1, denotes ¢; (q, ), for i < k. For j < k define a tuple ,B j of mformulas as follows:

1. if O j (p) is an axiom or has been inferred by modus ponens then set Pj

;j2. if / j (p) has been inferred by the substitution rule (say by substitution a)

from formula ¢, (P ), ¢ j (p) = O,(a()), then set 16 j = a()

4.5 Extension and substitution rules 55

Denote by

'Pij=>/i1A...A>/ij, 0<i-1<j<kwith Ti, i -1 denoting the constant 1.

The simulation of the original proof in system EF proceeds as follows. Firstintroduce variables ik-1, ... , 91 by the extension rule

qi,P := ('Pi+l,i A -'*i+l A 0i+I,t) V ... V (Pi+l,k-I A ,1lik A

Obviously

'Pi+l,j-l A -*j -4 qi,P = 16j,t, i < jis valid and can be derived by a polynomial size proof. Hence by 4.4.9 we havepolynomial size proofs of

'Pi+1,j-I A-*j 1 <j

which can also be written as

''i+l.j-I A-'*j - *i < j

Now we prove consecutively 1/i1, *2, ..., *A. and since 1k = q5k we obtain apolynomial simulation.

Assume that the formulas rl, ..., t/ij_l have been proved. Consider two pos-sibilities: Either 4j is an axiom, in which case so is i/ij, or 4j follows from 0", ¢,,,u, v < j, by modus ponens, or ¢j has been inferred by the substitution rule.

In the case of modus ponens, first derive j-1 and PU+l, j_ 1 and thus getfrom the previous implication (with i = u and i = v)

)

Applying modus ponens to o,, ($ j) and O (,O j ) we get -*j --* 0,, ($ ), which is-Vij -a ,lii, and hence i/ij follows.

In the case that ¢j was inferred by the substitution a from 0i we obtain ->/ij -./i in the same way as in the preceding formula. by the definition then:

ci (f j) = Oi (a (Q j)) = ci (4 j) = Y j

This completes the derivation of t/ij. Q.E.D.

Note that the number of steps in the simulation from the previous proof some-times increases exponentially. In fact, kSF(r) may be exponentially smaller than

kEF(r)

Lemma 4.5.6. For any sufficiently large n there is tautology r" with an SF-proofhaving 0 (n) steps but whose every EF-proof must have at least 0 (2") steps.

Proof Definer, := --.(2n) (1) with ,(') standing fort occurrences of -.Consider formulas Pk = p (_)2k (p). Obviously SF derives,Bk from &-I by

0(1) steps (by first substituting(_)2A-I

(p) for p in,Bk_I and by modus ponens).As Po is provable, every Pk has an SF-proof with 0(k) steps. In particular, r"

has an SF-proof with 0(n) steps.Let t" have an EF-proof with k steps; thus by Lemma 4.5.3 there is an F-proof

7r of t" with 0(k) steps.By Lemma 4.4.4 there is an F-proof of logical depth 0(k) of some Q" such

that r" is a substitution instance of v". But then Q" = in and hence k = S2(2"),which entails the lower bound. Q.E.D.

There is a close relation between the size and the number of steps in the extendedFrege system.

Lemma 4.5.7. Any tautology r has an EF proof of size O(kEF(r) + It I ).

Proof. Let r be a tautology with atoms qi , ... , qn. For all formulas 0 in atomsqi, ..., q" consider the atom pe and a set of defining conditions like Ext(B) inSection 4.2:

1. pq; = qi2. p,o = -pe3. po, Fez = po, A P824. pe, v82 = Pe, V p82 .

Also every formula V1r built from q,'s and p8's is equivalent to some pp: Replacep8's by 9's and let 0 be the resulting formula. We will denote such 4, by -ili*.

Now assume that we have an EF-proof 7r of r with k steps, and assume w.l.o.g.that the first f steps are the extension axioms

ri=¢i,...,re °Oewhere 01 contains only qj's, and 4,j+l may also contain the extension atomsri,...,rj.

Let

Be+l,...,ek

be the remaining k - f steps of n.We shall construct a new EF-proof zr' by the following process. First replace

01 in the first formula ri _ 01 by pp, and replace the atom ri by p,, in the wholeproof. Then replace 02 (which now has p0, in place of rl) in the second formular2 02 by p0* and r2 by pot in the whole proof. Generally in the t''7' step, t < f,replace (pt by po in the t'h formula and r, everywhere in the proof also by po-.

After this transformation the first f formulas of rr are transformed to formulasof the form

PO = PO

4.6 Quantified propositional logic 57

which all have constant size F-proofs, and no atoms rj's appear in the remainingk - f steps.

Our aim is to show that for each 0;, f < i < k, the atom pe* has constant sizer

proof f r o m pee . , .... Poi * Iand from some defining conditions. This is readily

shown by induction on i, considering cases distinguished by the rule used inJr to infer 0, . As an example consider the case when 9,' was inferred from 0',and 0,(= 9,s -+ 9,u) by modus ponens. Assume you , pov and use the definingconditions to infer from you the formula peu -+ po*, and apply modus ponensto get pou, . Finally, using the defining conditions we infer from pok (= pt) theformula i itself. The defining conditions are just the extension axioms, so thenewly constructed sequence is an EF-proof n' with O (k) + O (1 r 1) steps and totalsize O(k) + 0(1,r I). Q.E.D.

The previous proof can be used to give another proof that any two extended Fregesystems polynomially simulate each other. This is because we could construct thenew proof ir' in another system with other language in the same way; only thedefining conditions for po (0 in the first language) would be written by using thesecond language.

Lemma 4.5.8. Extended resolution ER is the system R augmented by all clausesfrom Ext(O), for all formulas 0, as extra initial clauses (cf. Section 4.2). Then ERand EFpolynomially simulate each other.

The substitution rule allows several seemingly weaker variants that are, how-ever, polynomially equivalent. For example, we may restrict the substitution ruleby allowing substitution of only one formula at a time or, more interestingly, byallowing only atoms in place of formulas 01, ... , 0" in Definition 4.5.1. We leavethe proofs of the equivalence of these versions to the reader.

4.6. Quantified propositional logicQuantified propositional calculus is formed from a Frege system or the sequentcalculus LK by introducing propositional quantifiers: `dx9(75, x) whose meaningis 0('P, 0) A 0 (75, 1), and 3x9(75, x) whose meaning is 9 (p, 0) v 0()5, 1). Thisalso defines the satisfiability of quantified propositional formulas.

In first order logic quantifiers allow one to define relations and functions notdefinable by quantifier-free formulas, but the propositional quantifiers do not in-crease the expressive power of formulas. Instead they allow us to shorten somequantifier-free propositional formulas. For example, V E 0 (E) with E ranging over{0, 11" has size 0(2"101) but an equivalent quantified formula 3x i ... 3x0(x)has size only O (n) + 10 I

Definition 4.6.1.1. The class Eo = rig consists of the quantifier free propositional formulas.


2. The classes E+1 and II+1 are the smallest classes satisfying:

(a) Eq U fl c E+1 n n+(b) both E+1 and n+] are closed under v and A

(c) if .0 E E+1 then - E n+1

(d) if 0 E II+1 then -'4 E E +1

(e) E+1 is closed under existential quantification

6 IIi+1 is closed under universal quantification

3. A quantified propositional formula is any formula appearing in one of E7;EQ denotes the class of all quantified propositional formulas

We shall base our definition of the quantified propositional calculus on systemLK.

Definition 4.6.2.1. Quantified propositional calculus G extends the system LK by allowing

quantified propositional formulas in sequents and by adopting the follow-ing extra quantifier rules:

(a) V : introduction

leftA(B), r ----> A

and rightr --> A, A(p)

VxA(x), r A r ) A, VxA(x)

(b) 3 : introduction

leftA(p), r A

and rightA, A(B)

3xA(x), I' - A I' -- A, 3xA(x)

where B is any formula, and with the restriction that the atom p does notoccur in the lower sequents of V : right and 3 : left.

2. The system G; is a subsystem of G that allows only Eq formulas in se-quents.

3. The system G, is a subsystem of G, allowing only treelike proofs.

The definition of the systems Gi and G; may seem ad hoc, but these systemsnaturally occur in connection with bounded arithmetic in Chapter 9.

Note that Go = LK and hence by Lemmas 4.4.8 and 4.4.15 Go and Go polyno-mially simulate each other. This is unknown for G; and G for i > 0. We considerthese quantified systems primarily as proof systems for quantifier-free tautologies,but later we shall also consider sets TAUT; of tautological E7-formulas, and weshall compare the systems Gi and G as proof systems for TAUT; rather then justfor TAUT.

Lemma 4.6.3. The systems G i and EFpolynomially simulate each other.

4.6 Quantified propositional logic 59

Proof First we show that G i polynomially simulates treelike SF, which impliesthat it also polynomially simulates EF, by Lemmas 4.4.8 and 4.5.4. Clearly it isenough to describe how G i simulates an instance of the substitution rule

B(P1, ... , Pn)

To -+ 9 (pl , ... , pn) apply n-times V : right to obtain

b'xl ...Vxn8(xl,...,xn).

Any sequent of the form

has a short G i -proof, from which follows

dxl...Vxna(xl,...,xn) ) 0(01,...,On)

by n applications of V : left. The cut-rule infers from this sequent and the previoussequent the desired sequent

) 0(0],...,On)Now we want to show that EF p-simulates G*. We shall first deal with a simpler

case in which all formulas in a G i -proof either are quantifier free or begin with ablock of existential quantifiers followed by a quantifier-free kernel; we shall callsuch formulas strict Eq.

In this case every sequent in a proof looks like

...,lgs(P),...,3ytO'(P,Yt),...

where a; , ,Bs range over the quantifier-free formulas in the sequent and3 Y pt (p, xj), 3 yto,'( p, yt ) over formulas with a block of existential quantifiersin front of quantifier-free kernels c t,'., ,BI's.

Call a sequence of formulas satisfying the conditions of Definition 4.5.2 withthe requirement that the extension atoms cannot appear in the last formula droppedan EF-sequence.

Claim. Assume that the sequent S of the form given previously has a G *-proof inwhich all formulas are either quantifier free or strict Eq, with k sequents and ofsize m.

Then there is an EF-sequence from I.... ai (7i), ... , a (p, qj ), ...} with ex-tension atoms F I_., with the last formula

Vfls(P)'/VfI(P,Ft)S t

and with 0(k) steps and size 0(m).


The claim is proved readily by induction on k and clearly implies the lemma.The case when not all formulas are strictE' is treated similarly, only the defi-

nition of EF-sequences is slightly more complicated. We leave it to the reader.Q.E.D.

The following lemma is proved completely analogously.

Lemma 4.6.4. For any i > 0, the system Gi polynomially simulates G+] -proofsof Eq -formulas.

The system G is akin to sequent predicate calculus and shares some analogousproperties. We mention only one, the midsequent theorem (see Takeuti 1975), butwe will not prove it (the reader can follow the idea of the proof of Theorem 4.3.3).

Lemma 4.6.5. Let IF -* A be a valid sequent consisting of quantified proposi-tional formulas in a prenex normal form.

Then there is a treelike, cut free G -proof of F -+ A in which there is a se-quent S (the midsequent) such that

1. no quantifier inferences occur in the proof above S2. no propositional inferences occur in the proof below S

There is an obvious generalization of the systems LK and G to the "highertype" propositional calculi. We shall not pursue this generalization as any availableinformation about such systems is only a trivial generalization of facts about LKand G.

4.7. Bibliographical and other remarksThe notions of a propositional proof system and of a polynomial simulation (Def-initions 4.1.1 and 4.1.2(b)) are from Cook and Reckhow (1979).

Theorem 4.2.1 is from Davis and Putnam (1960).Theorem 4.2.3 was used implicitly by various authors and explicitly noted

in Lovasz et al. (1991). The lower bound for resolution in Theorem 4.2.4 waspreceded by a lower bound for regular resolution in Tseitin (1968). Urquhart(1987,1992) investigated the complexity ofresolution further, in relation to cut-freeGentzen's systems and distinguishing between sequencelike and treelike resolutionrefutations (for regular resolution treelike and sequencelike systems are equallyefficient). The proof of Theorem 4.2.4 follows Buss and Turan (1988) closely. SeeKrajicek (1995b) for another proof.

As we excluded the implication from the language of LK, our system has anadditional structural property: All weakening inferences can be postponed till theend of any proof. Interpolation theorems and their relevance to lower boundsare studied in Krajicek (1995b); see also Krajicek and Pudlak (1995). Theorem4.3.9 was originally proved differently by Statman (1978). The proof of Theorem


4.3.10 follows Krajidek (1994). Notions of Frege and extended Frege systems weredefined in Cook and Reckhow (1979). Gentzen refers to such systems as Hilbert-style systems; see Gentzen (1969). Parikh (1973) was the first to observe that abound to the number of steps implies a bound to the logical depth of a proof. Hisproof implicitly gave an exponential upper bound. The optimal bound of Lemma4.4.4 was proved in Krajicek (1989a) for the predicate calculus of arbitrary order.The corollary for the propositional calculus follows also from Lemma 4.5.7: Theproof constructed there has formulas of constant logical depth, and thus replacingthe extension atoms by their definitions increases the logical depth only propor-tionally to the number of steps. Lemma 4.4.5 is taken from Krajicek (1989a).Lemma 4.4.6 was noted in Buss (1993a) and Krajidek (1995a). Lemma 4.4.8 isfrom Krajidek (1989a). A detailed study of the deduction lemma, 4.4.10, is inBonet (1993).

Lemma 4.4.14 is implicitly contained in Reckhow (1976).Definitions 4.5.1 and 4.5.2 are from Cook and Reckhow (1979).Dowd (unpublished) observed that Lemma 4.5.5 is a corollary of a relation of

EF to a bounded arithmetic theory PV established by Cook (1975); see Sections5.3 and 9.3. This was independently observed in Krajicek and Pudlak (1989a),where the explicit polynomial simulation was constructed.

Lemma 4.5.6 is due to Tseitin and Choubarian (1975); the proof is fromKrajidek (1989b). Lemma 4.5.7 is from Statman (1977).

Definitions 4.6.1 and 4.6.2 are from Krajicek and Pudlak (1990a).A system that I did not include into this chapter is the system of cutting planes

of Cook, Coulard, and Turin (1987). It is defined in Section 13.1. For naturaldeduction systems I refer the reader to Smullyan (1968).

The measure e P of Definition 4.4.3 appears a bit artificial. But for systemsP >P EF it is polynomially related to the size (the most important measure byTheorem 4.1.2) and, in fact, all important lower bounds are actually lower boundsto the measure f p.

5

Basic bounded arithmetic

Bounded arithmetic was proposed in Parikh (1971), in connection with length-of-proofs questions. He called his system PB, presumably as the alphabetical succes-sor to PA, but we shall stay with the established name IDo (for "induction for Doformulas"). This theory and its extensions by 11° axioms saying that some particu-lar recursive function is total were studied and developed in the fundamental work

of J. Paris and A. Wilkie, and their students C. Dimitracopoulos, R. Kaye, andA. Woods.

They studied this theory both from the logical point of view, in connections with

models of arithmetic, and in connection with computational complexity theory,mostly reflected by the definability of various complexity classes by subclassesof bounded formulas. They also investigated the relevance of G6del's theorem tothese weak subtheories of PA and closely related interpretability questions.

Further impetus to the development of bounded arithmetic came with Buss(1986), who formulated a bounded arithmetic system S2, a conservative extensionof the system IL + 01 investigated earlier by J. Paris and A. Wilkie, and itsvarious subsystems and second order extensions. The particular choice of thelanguage and the definition of suitable subtheories of S2 allowed him to formulatea very precise relation between the quantifier complexity of a bounded formulaand the complexity of the relation it defines, measured in terms of the levels ofthe polynomial time hierachy PH. He also proved a first witnessing theorem forbounded arithmetic precisely characterizing the computational complexity of aclass of functions definable in certain subtheories of S2.

Later developments, which established some deeper connections betweenbounded arithmetic and the complexity theory, built on all this foundational work.

Paris and Wilkie also considered the relevance of the length-of-proof questionsin propositional logic to independence questions in bounded arithmetic (cf. Parisand Wilkie 1985). Earlier Cook (1975) constructed an equational theory PV (a

62

5.1 Theory IAo 63

subsystem of bounded arithmetic, as we shall see later) and proved another relationof PV to propositional logic.

In this chapter we first define and study Parikh's system I00 and its extensions.Then we introduce Buss's first order systems S2 and T2, Cook's PV, and Buss'ssecond order systems U2 and V2, and we prove basic relations among varioussubsystems of these theories.

5.1. Theory IDaRecall the language L PA from Section 3.2 and Definition 3.2.1 of 00-formulas.

Definition 5.1.1. Bounded arithmetic theory I Do is a first order theory of arith-metic in the language LPA, a subtheory of Peano arithmetic PA. It is axiomatizedby the following axioms called PA-:

1.

2.

3.

4.

5.

6.

7.

8.

9.

10.

H.12.

13.

14.

a+0=a(a + b) + c = a + (b + c)a+b=b+aa<b -x,a+x=b0 = a v 0 < a

0<10<a-+ 1<aa<b -* a + c < b + c

a 0=0

(a - b) - c = a-

a c)and by the i0-induction scheme IND:

(0(0) AVx(0(x) - ¢(x + 1))) - Vx¢(x)

where 0 is a bounded formula (= Do formula), which may have other free vari-ables besides x.

Note that the theory PA- extends Robinson's arithmetic Q, originally con-sidered by Parikh. Its models are exactly nonnegative parts of discretely orderedcommutative rings.

It is often easier to work with basic theory PA- instead of Q. Then notions likeorder and cut make sense; this is not true in every model of Q: A model of Q is, forexample, a copy of N followed by one nonstandard element e in which we definee+x =eforx 0.

Natural models for IDo are cuts in models of PA. We shall define the generalnotion of a cut in a model of PA-.

64 Basic bounded arithmetic

Definition 5.1.2. Let M be a model of PA-. A cut in M is any subset I C Mclosed under < and x + 1

`dx,yEM,(xEIAy<x)-+ (yEInx+l EI)

We denote this by I ce M (e is for "end-extension').

Note that a cut is not necessarily closed under the addition and the multiplicationin M. Examples of cuts are the model itself and N, the initial segment of Misomorphic to the natural numbers.

Lemma 5.1.3. Let M be a model of I AO and I c_e M any cut in M closed underthe addition and the multiplication of M.

Then I is a model of I Ap.

Proof. This lemma is simple but it rests on an important property of boundedformulas which is worthwhile to state explicitly:

Claim. Let 0 (a, b) be a bounded formula with all free variables displayed. As-sume that I is a cut in M closed under the addition and the multiplication, and vare some elements of I. Then for every u E I

iff M=cb(u,v)

The claim is easily established by induction on the quantifier complexity ofusing the fact that I is closed under <.

To prove the lemma it is enough to establish that for any bounded formula 0 (a)with parameters from I the induction for 0(a) holds in I, as the axioms of PA-are clearly preserved from M to I.

Assume

I 0(0) A `dx(O(x) - O(x + 1)) A

for some v E I. By the Claim also

0(0)A(O(u)- O(u+1))

for all u E I and also M -.0 (v). Hence in M the induction fails for the boundedformula

x>vV0(x)which contradicts the assumption that M IDO. Q.E.D.

If M is not a model of IAO, not all cuts in M closed under + and are modelsof IDO (e.g., M itself) but provided M satisfies at least theory Q, a cut that is amodel of ISO is actually definable in M. This is a nontrivial result of A. Wilkie(unpublished); see also Hajek and Pudlak (1993) and Section 10.6.

5.1 Theory IDo 65

Generally, to construct a model of IDo is difficult: There are no countablerecursive models of IDo (a theorem of Tennenbaum 1959) and, in fact, not evenof a weak subtheory IE1 of IDo with the induction scheme restricted only toE1 -formulas, that is, bounded existential formulas (Paris 1984). A subtheory ofIDo with the induction restricted to open formulas only, called IOpen, does haverecursive nonstandard models (Sheperdson 1964). The construction of such modelsand the question of deciding whether a Diophantine equation has a solution ina model of IOpen is of relevance to criteria for solvability of such equations innumber theory, but less to issues treated here. A curious reader is advised to consultvan den Dries (1980, 1990) and Otero (1991).

Theorem 5.1.4 (Parikh 1971). Assume that 9(a, b) is a Do formula and that

IDo HVx3y9(x,Y)

Then there is a term t (x ) such that

IAO I-Vx3y < t(x),0(x, y)

Proof. We prove the theorem by a simple compactness argument.Assume that IDo proves V x 3y9 (x, y), but it does not prove V x 3y < t (x) ,

9(7, y), for any term t. This means that it also does not prove any disjunction ofthe form

V vx3y < t 9(7, y)

as otherwise it would prove

VY3y<t,9(x,y)

fort := tl +t2+----By compactness then the theory

IDo - {`dy < t(c)-O(c, y) I t aterm}

is consistent where c are new constants.Let M be a model of this theory and define a cut I in M by

b E I if M= b< t (c ), some term t

Then I is a model of IDo (by Lemma 5.1.3) but also I 3 x `d y-.9 (x, y),contradicting the hypothesis of the theorem. Q.E.D.

It follows from Parikh's theorem that IAO cannot Do-define a function thateventually majorizes all polynomials, for example, the exponentiation xy = z.Consequently, IDo cannot directly formalize constructions requiring exponentia-tion (such as the cut-elimination).

In fact, it is far from obvious that there is even a 00-definition of the graph ofthe exponentiation {(x, y, z) I xy = z}.

The existence of such a definition follows from Bennett's Lemma 3.2.6 andTheorem 3.2.3; see also Bennett (1962). Paris (described in Dimitracopoulos 1980)and later Pudlak (1983) constructed a definition of the graph of the exponentiationabout which IDo could prove the recursive properties of exp:

x0 = 1 and x(y+') = xy x

With such a definition we can define other functions in IO01. 1-1092 W1 = y iff 2(y-') < x < 2y2. Ixl := F1og2(x + 1)1 ifx > 0, and 101 := 0.

and, more generally, one can define in IO0 the basic relations and operations ofthe rudimentary sets (i.e., the concatenation, the part-of quantification) and proveagain the basic closure properties.

Of particular importance are functions helping to formalize syntactic construc-tions so that IDo can speak about proofs or computations. The existence of DO-definitions of these basic notions follows from Theorem 3.2.7.

Coding of finite sequences is the most important function of this type and westate its properties in a separate lemma. Section 5.4 is devoted to constructions ofsuch a definition.

Lemma 5.1.5. There is a L o-formula 9(w, i, x) that we shall write as

such that I D0 proves:1. (w);=xA(w), y -* x=y2. (w),=xnj<i-Dy<w,(w)j =y3. 2wdiVx, (w); Ox4. bw`diVx, z3w'Vj i, (w)j = (w')j A ((w)i = z -. (w')t = x)

Theories that can interpret a theory of a binary operation satisfying conditions1-4 of the lemma are called sequential in Pudlak (1985). They are very importantas they can interpret theory Q and it is often easier to interpret a sequential theoryin a theory first than some fragment of arithmetic directly.

What one cannot prove in IA0, however, is that there is a sequence of allnumbers smaller than x, as its code would be exponentially large in contradictionto Parikh's theorem.

One also cannot prove in ID0 that in a word a letter can be substituted for byanother word

`dw, x, uDw', w' = w(x/u)

as the code of w' would have length bounded by J w i I UI only: That is, its size would

be bounded by 2I01"l, which is superpolynomial. The substitution plays, however,

5.1 Theory IAO 67

an essential role in basic logical notions, including the definitions of predicatecalculus, and to avoid its use one would be forced into unnatural definitions offormulas with only one explicit occurrence of a variable, and so forth.

Perhaps more importantly, coding of polynomial length proofs and polynomialtime computations requires totality of functions 21xlk, all k.

For this reason Paris and Wilkie (1981a,b; 1987b) studied the extensionIDo + 21 obtained by adding to IDo a If2-sentence

01 : Vx3Y, xlxl = Y

Function xlxI is denoted by w(x).We shall treat this in greater detail in the next section, but first we examine

the question whether one could at least interpret (cf. Section 10.6) theories likeIDo + Q1 or IAO + Exp in IDo and, in particular, whether one can define in IDocuts closed under w (x) or exp. A positive result is due to Solovay (unpublished);see Hajek and Pudlak (1993).

Theorem 5.1.6. Define functions wk(x) by

wl (x) (o (x), wk+l(x) 2Wk(Ix1)

Then for every definable cut J(x) and every k there is formula Ik(x) such thatIDo proves three properties:

1. Ik is closed under wk

Vx, (Ik(x) - 3y, Ik(Y) A wk (x) = Y)

2. Ik defines a cut

Ik(0)AVx,y((Ik(x)Ay<x)-+(Ik(x+l)nIk(Y)))

3.

Vx, Ik(x) -)' J(x)

Proof. Note that wk (x) majorizes all we (x) for £ < k and is majorized by 2x.First define Jl (x) by

Jl (x) := Vy, J(Y) -> J(y + x)

and J2(x) by

J2(x) :=VY, J' (Y) -

J2(x) defines a cut closed under addition and multiplication.Then define II (x) by

I, (X) Vy, J2(Y) * 3z, 21YIIxI = Z A J2(z)

Then IDo proves that I, is a nonempty cut closed under wI.Having the formula Ik(x) about which IDo proves that it defines a nonempty

cut closed under wk(x) define a formula Ik+I (x) by

Ik+I (x) `dy, Ik(y) - 3z, 2,l I(k)1XI(k) = z n Ik(z)

where 2k is the k-times iterated function 2x. It is easy to verify that this formulais in IDo a nonempty cut closed under wk+1 (x). Q.E.D.

The following negative result is interesting.

Theorem 5.1.7 (Wilkie 1986, Paris and Wilkie 1987a). There is no definable cutthat would be provably in IDo closed under all wk(x), k > 1.

In particular, no definable cut is provably closed under 2x.

5.2. Theories S2 and T2The language L of the systems to be defined in this section extends the languageLpA. The idea is that one adds enough function symbols to spare the tediousintroduction of coding of syntactical objects within the systems.

The language L extends LPA by three new function symbols

L LPA U [21, lxI,x#y} ,

where the last symbol has the meaning 2IxI-I I

For a convenience we shall sometimes also add the symbol (w);, for codingsequences, into the language, so

L+ := L U {(w);}

The basic theory that will be included in all our systems is denoted BASIC and itextends Robinson's Q.

Definition 5.2.1. The theory BASIC consists of the following 32language L:

1. a 2a#06. a <bvb <a7.

8. (a<bAb<c)-+a<c9. 101= 0

10. (12a1=1al+1n12a+11=lal+1)

axioms in the

5.2 Theories S2 and T2 69

11. 111=112. a<b -+ Ial<lbI13. la#bl = lal Ibl + 114. 0#a = 115. a 0 - (1#(2a) = 2(1#a) A 1#(2a + 1) = 2(1#a))16. a#b = b#a17. lal = Ibl - a#c = b#c18. lal =lbl+Icl --*

a <a+b20. (a <bAa0b)- (2a+1 <2bA2a+1 2b)21. a+b=b+a22. a+0=a23. a+(b+1)=(a+b)+124. (a+b)+c=a+(b+c)25. a+b<a+c--* b < c26.

27.

28.

29.

a 0

a = L(b/2)J = (2a = b v 2a + I = b)

For future theories, when L is replaced by L+ the common theory BASICwill automatically be extended to BASIC+ by adding the four conditions fromLemma 5.1.5.

The next two definitions introduce the basic systems of bounded arithmetic.

Definition 5.2.2. TZ is a theory in the language L extending BASIC by the induc-tion axiom IND

0(0) A Vx(0(x) - 0(x + 1)) - VxO(x)

for all Eb formulas O (a). The formula O (a) may have other free variables than a.The theory T2 is the union of all theories T2, .

Note that I E, C T2 .

Definition 5.2.3. SS is a theory in language L extending BASIC by the polynomialinduction axiom PIND

0(0) nVx(O(LzJ) - O(x)) - dxq5(x)

for all Eb formulas 0 (a). The formula 0 (a) may have other free variables than a.The theory S2 is the union of all theories S.

Finally we introduce a third version of induction.

Definition 5.2.4. The scheme of the length induction axioms LIND is

0(0) AVx(O(x) - 0(x + 1)) - VxO(Ixl)

The formula ¢ (a) may have other free variables than a.The theory Eb- LIND is BASIC augmented by the axioms of LIND for all

Eb formulas 0(a).

Letters S, T are next to letters P, Q, R taken previously in Tarski, Mostowski,and Robinson (1953) to denote PA, Robinson's Q, and a theory R defined there.The superscript i in TZ and S2 refers to the restriction of the induction scheme toEb-formulas, while the index 2 refers to the presence of function # in L. An indexk would refer to the presence of a function symbol #k in L, a function of the growthrate of approximately wk+i. We shall consider such systems only exceptionally(see, e.g., Corollary 10.5.4).

Lemma 5.2.5. For any i > 1:

S2=Eb- LIND =II' - PIND =nb - LIND

and

T2=Eb-IND=rib- IND

Proof. Let 0 be a E('-formula. Assume first that

0(0) A t/x 0([2j)1 0(x)) A --0 (a)

Then

V1 (0) nVx < laI(fi(x) - * VG(x+ 1)) A-*(lal)

where * (x) 0 (ax) for ax denoting the number consisting of the first x bits ofa. Formula tlr(x) is E' as ax = y is Eb-definable

ax=y if lyl =xn3z<a,a=y+2lvl.z

where 2' ' is just y#1.That contradicts EP- LIND.Analogous reasoning gives the rest of the lemma. Q.E.D.

Definition 5.2.6. The following are four minimization/maximization principles:(a) MIN

0(a)-3x<aVy<x,0(x)A 'O(y)

(b) LENGTH-MIN

O(a) 3x<a`dy<a,O(x)A(IYI<Ixl--* -O(y))

(c) MAX

0(0) - 3x < ally < a, 0(x) A (x < y -'q(y))

(d) LENGTH-MAX

0(0)-3x<aVy<a,O(x)A(Ixl< IYI- -'cb(y))

Recall the definition of A'-formulas (cf. Definition 3.2.11, part 4).

Lemma 5.2.7. For all i > 1:(a) TT = Eb-MAX = Eb-MIN = rib_1-MAX = f1b_1-MIN(b) SS = Eb-LENGTH-MAX = Eb-LENGTH-MIN

= rlb_ 1-LENGTH-MAX = 11b_ 1-LENGTH-MIN

where for i = 1 the class rib_ is replaced by the class of formulas Ob in S.

Proof(a) Let O(x) be a Eb-formula satisfying

.0 (0) n --0 (a)

By Eb-MAX there is maximal b < a such that 0 (b), that is, also -4 (b + 1).Hence 0 (x) does not satisfy the induction hypothesis. This shows that '-MAX

EimpliesT2. For the opposite direction let 0(x) be a -formula and

define another Eb-fbrmula *(x) by

*(x):=3y<a,x<yA¢(y)

Condition ¢ (0) implies f (0). By Eb-IND either f (a) holds, in which case0 (a) holds too, or b (b) A -, (b + 1) holds for some b < a, in which caseb is the maximal element < a satisfying 0(x).The MAX and MIN principles for Eb (resp. for rlb_ 1) are equivalent as xis the maximal element < a satisfying 0(x) iffy = a - x is the minimalelement < a satisfying 0(a - y). Note that a - x is Ab-definable in S.Now we show that Eb-MAX implies l1b_1-MIN: Take a l76_1-formulaO(x) and define Eb-formula fi(x) by:

1(x):=3y<a,x+y=aA (y)

Formula 0(a) implies t(0), and if b is the maximal element < a satisfyingf , then a - b is the minimal element satisfying 0.

Finally assume II' _t-MIN and let O(x) = 3y < t(x, a) *(x, y) be a Eb-formula with* being nb_

1(or L if i = 1). Define a formula 6 by

6((z,y)):=`dx <a,x+z=a-a (y<t(x,a)A*(x,y))

where (z, y) is the pairing function, which is increasing w.r.t the lexico-graphic ordering. Formula 0(0) implies 6((a, yo)) for some yo < t(0, a).By the MIN axiom there is minimal (b, c) for which 6 holds; clearly thena - b is the maximal element < a for which 0 holds.

(b) The proofs of Part (b) are completely analogous.Q.E.D.

Lemma 5.2.8. For all i > 1: SS C TT C S2+1 and thus: T2 = S2.

Proof. S2' C T2' is easier: Let be a E'-formula and assume

0(0) nVx(0(x) - 0(x+ 1))

Hence by TT also `dxo(x) and, in particular, also dx4(lxI). Hence TT implies theE'-LIND which is by Lemma 5.2.5 equivalent to S.

For TZ C S2+1 we use the idea of shortening cuts from Theorem 5.1.6. Let ¢be a Eb-formula satisfying

0(0) A dx(0(x) - 0(x + 1)) A -0(a)

Define formula i/i by

'G(x):dy<a,0(y)-O(x+y)Then t/i satisfies the assumptions of the PIND scheme and hence *(a) followsfrom S2+1. Buttr(a) implies 0(a). Q.E.D.

The next lemma is proved by the same shortening of cuts as in the second partof the previous lemma; we leave it to the reader.

Lemma 5.2.9. For all i > i and for arbitrary Eb formula 0 and IIb formulaif the theory S2 proves the equivalence

Vx<a,0(x)_*(x)

then S2 proves the, formula

0(0) A Vx < a(¢(x) - 0(x + 1)) - 0(a)

Such a scheme is called zb-IND.

The lemma can be strengthened (Corollary 8.2.7) but that requires first someother nontrivial results.

The class of Boolean combinations of formulas from a class r is denoted byta(r).

Lemma 5.2.10. For all i > 1, the theory T2 proves the induction scheme IND forall B(Eb) formulas.

Proof. Every B(Eb)-formula ile (x) is logically equivalent to a formula of the form

bk(x) A -00k-] (X) A -(... -'(02(x) A -Ol (X))...)

with all Oj E I1b (analogously with the difference hierarchy of Hausdorf 1978).We shall call a formula * such that i/i or -1/i can be expressed in this form a level

k formula.We shall show by induction on e that TZ proves the induction axiom for every

level f formula. Denote

*e (X) 4'e(x)A-(Oe-i(x)A,(...-'(412(x)A-01 (x))...)

For f = 1 this follows from Lemma 5.2.5. Let f > I and let a be arbitrary, andassume that the induction assumption for i/ie holds on the interval [0, a].

If Yx < ape(x) holds, then the induction for >/ie on [0, a] is equivalent to theinduction for - e-1, which is a level £ - 1 formula. If Vx < ape(x) fails, takeb < a the least number such that -¢e (b); it exists provably in T2 by Lemma 5.2.7as -of E E. Then the induction hypothesis for 1t on [0, b - 1] implies theinduction hypothesis for the level f - 1 formula -Vie_I and hence -*e-l (b - 1)holds. But that means that the implication

e(b - 1) - Oe(b)

fails, contradicting the induction assumption for 1/re.This establishes the induction for level £ formula 1i, which can be represented

as lie. If * is a level £ formula such that -* can be represented as Vie, the inductionfor.

follows from the induction for -1/i(a - x), which can be represented as ,/ie.Q.E.D.

Definition 5.2.11.(a) Bounded collection scheme B is the scheme

Vx < a3y, 0(x, y) - 3zbx < a3y < z, 0(x, y)

where 0 is a bounded formula. Symbol B Eb denotes the scheme restrictedto E6 -formulas only.

(b) Sharply bounded collection scheme BB is the scheme

Vi <1a13y<b,0(i,y)-3wdi <la1,4(i,(w)i)

Symbol BBE' denotes the scheme restricted to Eb-formulas only.

The collection scheme is sometimes also called the replacement scheme or, inthe context of second order systems, the choice scheme. We only remark that thebounded collection scheme is as strong as the induction scheme when acceptedfor all formulas (cf. Hajek and Pudlak 1993).

Lemma 5.2.12. For all i > I

SS F- BBE'

Proof. Note that the formula

*(x) := 3w < a#b Vi < lal, i < x -* 4)(i, (w),)

is Eb whenever ¢ is, and satisfies the induction assumption

(0) AVx < laI (V, (x) --> r(x + 1))

and hence also, by S2, / (j a l) holds.The bound a#b to w follows as w codes a sequence of at most la I numbers each

of length at most lbl, hence itself has length at most la l lbl. That is, its code is atmost a#b. (In fact, the bound a#b depends on a particular way of codingsequences and could be replaced by another term; see Section 5.4.) Q.E.D.

Note that the same proof actually shows that SZ proves the scheme of the strongsharply bounded collection scheme

3wVi < lal, (3y < b4)(i, y) O(i, (w)i))

A conservation result for the sharply bounded collection scheme was obtainedby Ressayre (1986).

Lemma 5.2.13. Let i > 1. Then the theory

S`2+BBE,+I

is VE+1-conservative over theory S2.

The proof follows the idea of the proof of VE°+i

-conservativity of B E°+1

overtheory IE° (cf. Hajek and Pudlak 1993).

Lemma 5.2.14. Denote by Sz (L+) a theory defined as SS but in the language L +and with BASIC+ instead of BASIC. Call a bounded, formula in the language L+strict E' if it has the form

3xI <tIVx2 <t2...4)(a,X1,...,xi)

with i alternating bounded quantifiers and a sharply bounded kernel 0.Then any Eb formula is in S2(L+) equivalent to a strict Eb formula.

5.3 Theory PV 75

Proof. Let fi(x) be a Eb-formula in prenex normal form. We want to show thatis in SS equivalent to a formula * in prenex form, in which all sharply boundedquantifiers follow after all bounded but not sharply bounded ones. Let

Vi <ItIJY<s,9(i,Y)

be a subformula of 0: That is, 9 is a Eb-formula too. By the sharply bounded col-lection BBEb available through Lemma 5.2.12 in S2 this subformula is equivalentto

w < t#sVi < ItI, 9(i, (w);)

This demonstrates how to switch a pair of sharply bounded/bounded quantifiers.Repeating this yields r.

To get from 1 a strict 1b-formula we only have to replace two consecutiveoccurrences of the same bounded, but not sharply bounded quantifier by one;but this is easily achieved by a pairing function that can be defined by using thefunction (w),. Q.E.D.

Lemma 5.2.15. The theory S2 is a conservative extension of the theory I Do + Q1.That is: Any formula in language L pA provable in S2 is also provable in I Do + S21.

In fact, eve,y model M of I A0 + S21 can be expanded to a model of S2.

P r o o f. The theory S 2 defines the function &)I (x) and proves the axiom 2 i : : This

follows from a trivial bound

(01 (x) < ((x#x) + 2) 2

This shows that 100 + S2I C S2.Let M be a model of I00 + 521. By Theorem 3.2.6 there is a D0-definition of

the graph of exponentiation and by the remark before Lemma 5.1.5 there is suchdefinition for which I00 can prove the recursive equations. It follows that IA ocan also 00-define the graph of the function a#b, and from the bound

it follows that I AO + Q i proves the totality of a#b. The axioms of BASIC poseno problem; nor do the D0-definitions of IxI and L(x/2)J. This demonstrates thatM can be extended by functions to obey BASIC, and from the fact that they areDo- definable in M it follows that induction will hold for all bounded formulas inthe expanded language. Q.E.D.

5.3. Theory PVBuilding on an earlier work of Bennett (1962), Cobham (1965) characterized theclass of polynomial time functions in the following "machine independent" way.


We say that a function f is defined from functions g, ho, h, , and e by limitedrecursion on notation if:

1 . f(x, 0) = g(x ),2. f( T, si(y)) = hi(x, y, f( 7, y)), for i = 0, 1,3. f(x, y) S e(x, y),

where so(y) and s, (y) are two functions adding 0, respectively 1, to the right ofthe binary representation of y

so(y) := 2y

s,(y):=2y+1

Theorem 5.3.1 (Cobham 1965). The class of polynomial time functions is thesmallest class of functions containing constant 0,.functions so(y), sl (y) and x#y,and closed under:

1. permutation and renaming of variables2. composition of functions3. limited recursion on notation

We might note at this point that it is possible to enlarge basic functions by finitely

many polynomial time functions such that requirement 3 becomes redundant inthe theorem: That is, the class of polynomial time functions has a finite basis (cf.Muchnik 1970).

Building on this theorem Cook (1975) defined formal system PV (for poly-nomially verifiable). There are two motivations for considering a system like that:One is its relation to the extended Frege system (Corollary 9.2.4); another is morephilosophical, to define a system in which instances of general proofs can beverified by constructive, computationally feasible procedures.

Definition 5.3.2. We simultaneously define function symbols of rank k and PV-derivations of rank k, k = 0, 1, .... The language of PV will then consist of allfunction symbols of any rank, and a P V-derivation will be a P V-derivation of anyrank.

(a) Function symbols of rank 0 are constant 0; unary so (y), s, (y), and Tr(x);and binary x - y, x#y, and Less(x, y).

(b) Defining equations of rank 0 are:

Tr(0) = 0

Tr(si(x)) = x, i = 0, 1

x- 0=,rx - (s; (y)) = s; (x --- y), i = 0, 1

x#0=0

5.3 Theory PV 77

x#si (y) = x - (x#y), i = 0, 1

Less(x, 0) = x

Less(x, s; (y)) = Tr(Less(x, y)), i = 0, 1

Tr.(x) truncates x (i.e., deletes the rightmost bit), x -- y is the concatena-tion, x#y is JyI concatenated copies ofx, andLess(x, y) is x with I yI rightbits deleted.

(c) P V rules are as follows

R1

R2

R3

t=uu=t

t=u u=ut = U

tl =U1,...,tk=uk.f(tl,...,tk) =f(u),...,uk)

R4

t=ut(x/u) = u(x/u)

R5 Let El , ... , E6 be the equations (1-3) from the definition of the lim-ited recursion on notation: three for f1 and three for f2 in place off. Then

El,...,E6.fi (7, y) = .f2 (7, Y)

(d) PV-derivations of rank k are sequences of equalities El , ... , E, in whichevery function symbol is of rank < k and every E; is either a definingequation of rank < k or derived, from some earlier equations by one of theP V -rules.

(e) Let t be a term consisting of function symbols of rank < k. Then f is afunction symbol of rank k + 1 and f = t is a defining equation of rankk+ 1.

(f) Other, function symbols of rank k + 1 are obtained as follows. Wheneverg, ho, h 1, 20, e 1 are function symbols of maximum rank k and n,, i = 0, 1,are P V -derivations of rank k of equality

Less(hi(x, y, z), z - ei(x, y)) = 0

then

f = f(g,ho,hi,to,ei,no,,ri)


is a, function symbol ofrank k+ 1, and the two equations (1) and (2) defining

f from g, h i by limited recursion on notation are defining equations of rankk+1.

PV has a function symbol f for every function introduced from earlier ones bylimited recursion on notation, provided one can first prove (an equivalent statementto) that f is bounded by a function obtained earlier.

Lemma 5.3.3. Let n denote the dyadic numeral of number n:

0:=0, 2n:=so(n), 2n+l:=si(n)

and let f (x 1, ... , xt) be any polynomial time. function.Then there is a PV. function symbol fpv ( x . . . . . . xk) such that for every m 1, ... ,

ink and n = P171 I --.,mt) the theoryPV proves the equation

fPV(mI,.,mtnOn the other hand, every PV. function symbol defines in N a polynomial timefunction.

Proof The last part follows from Cobham's Theorem 5.3.1. The first part is notobvious because of the requirement in part (f) of the definition of PV that be-fore we may introduce a symbol for the function defined by limited recursion onnotation we must have a proof that it is bounded by some previously introducedfunction. An inspection of the proof of Theorem 5.3.1 shows that as one constructspolynomial time functions, one also has obvious PV-proofs of their boundedness.

Q.E.D.

Now we state a relation between the systems SZ and PV. Denote by S1(PV) thetheory defined as SZ is in the language L augmented by all PV-function symbols,with all PV defining equations as new axioms and with the PIND rule extended toall El'-formulas in the new language.

Theorem 5.3.4 (Buss 1986). The theory S2 '(PV) is conservative over PV. Thatis: Whenever an equation t = tr between PV-terms is provable in Sz (PV), then itis provable already in PV

The proof requires some results from Chapter 7 and we postpone it till Section7.2 (cf. Corollary 7.2.4 and the text after it).

Cook (1975) also defines an extension PV 1 of PV, allowing open formulas andpropositional reasoning instead of only equations, and adding PIND scheme forall open formulas. It holds then (and follows immediately from Theorem 5.3.4)that PV I is also conservative over PV.

Instead we shall define theories PV,, i = 1 , 2, ... , which will act as a universalaxiomatization of a conservative extension of theories TT - I .

PV I consists of all equations t = u provable in PV but has also a form ofinduction axiom: For an open formula fi(x) define a function h(b, u) by

(a) h (b, 0) = (0, b)(b) if h (b, L(u/2) j) = (x, y) and u > 0 then set

(1(x + y/2)], y) if 1(x + y/2)1 < y A *(f (X + y/2)1)

h (b, u) := (x, r (x + y/2)1) if x < F (x + yl2)1 A -* (f (x + y/2)1)

(x, y) otherwise

Then PV I contains the axiom

(*(0) n -*(b) n h (b, b) = (x, y)) - (x + 1 = Y A *(X) A -*(y))

This axiom simulates the binary search and is thus related to the PIND scheme.The logic of PV I is the usual first order predicate calculus.

TheoryPV;+i contains PV i and has inductively defined characteristic functionsof all Ei'- predicates in its language, in particular universal axioms of the form

3Y<t(7),f(x,y)=I-+ g(T)<t(x)Af(x,g(x))=Iwhere g are new function symbols introduced inductively for all formulas of theform

3Y<t(x),f(x,y)= IFurthermore, PV;+1 is closed under definition by cases, composition, and limitedrecursion on notation (i.e., has function symbols for all functions introduced bythese processes) and also contains the preceding axiom for every open * in itslanguage. The logic of PV;+i is also the first order predicate calculus.

Note that each PV, is a universal theory. We state a theorem that will be provedin Section 6.1 (cf. Corollary 6.1.3).

Theorem 5.3.5. For every i > 1, PV;+1 is fully conservative over the theory T.The theory PVi is conservative over PV.

5.4. Coding of sequencesIn this section we sketch a way to code finite sets and sequences in S. This isnecessary in order to be able to formalize various syntactic and logical notions andcomputations of machines in subsystems of S2.

For the language LPA (and IDo + 0 i) this is quite nontrivial as one must firstfind a well-behaved Do-definition of the graph of exponentiation, in order to speakabout lengths of numbers and their bits. The existence of a Do-definition of thegraph of exponentiation follows from Bennett (1962) (cf. Theorem 3.2.6), but thetheorem does not imply that there is such a definition about which IA0 or IA0+S2 i

could prove the basic recursive properties1. x0=12. xy+1 =xy.x

Such a bounded definition was constructed by J. Paris (in Dimitracopulos 1980)(see also Pudlak 1983).

Another crucial function whose well-behaved bounded definition is needed isNumones(x): the number of ones in the binary expansion of x. The function isclearly computable in logarithmic space, and thus its graph is Ao-definable byCorollary 3.2.9, but proving the basic recurrence properties

1. Numones(0) = 02. Numones(2x) = Numones(x)3. Numones(2x + 1) = Numones(x) + 1

again requires some work. In IAo + Sgt this is easier. Theorem 3.2.7 is a generaltool showing that all usual concepts defined by inductive properties can be definedin a well-behaved way in IDo + 521.

With these two definitions in hand one defines the basic relations and functionson finite words, identifying a number with its dyadic representation and the codingof sequences then follows the development of rudimentary sets in Bennett (1962).In IAo + S21 the formalization of syntax and logic is then smooth.

If one wants to formalize logical notions in IDo only, there are other compli-cations. For example, the term resulting from substitution of a term u into a termv for a variable x will not in general have length proportional to I u I + I v 1; henceits code will not be bounded by a polynomial in u, v and by Theorem 5.1.4 I AOcannot prove that the substitution is always defined.

In some situations one can restrict the syntax, for example, to terms and formulaswith only one occurrence of each variable (or to their representation with thisproperty), but the formalizations obtained in this way are unnatural.

We refer the reader to Paris and Wilkie (1987b) or Hajek and Pudlak (1993) fordetailed development of coding, sequences, and syntax in IOo and IDo + Q1.

With the language L of S2 the situation is much simpler because we have thelength function Ix I in language allowing us to define the graph of exponentiationimmediately by

2x=y = 3z<y,z+l=yAIzl=xAIy =x+1which is equivalent to

2x=y - `dz<y,z+1=yam IzI=xAIYI=x+1

We want to define the basic notions of rudimentary sets and of coding of sequencesby means of Ob-formulas in SZ , in such a way that SZ can prove the basic propertiesand, in particular, the properties of Lemma 5.1.5. This is done in great detail inBuss (1986). Another approach is to follow the development of Paris and Wilkie(1987b) and Hajek and Pudlak (1993) in SZ and to verify that all notions that are

only 0o there are Ob in S. To illustrate these topics we outline a way of codingsequences, but we shall proceed rather swiftly, leaving details to the reader.

First we define the pairing function

(a b) I (a+b)(a+b+ 1) I +aL 2 J

It is defined by a term (hence is ob) and S1 can prove the basic property

(a,b)=(u,v) - (a=unb=v)

Then we define the predicate "a is a power of 2"

Pow(a)-3x <a,x+l=anlxl+l=IaI

which is provably in SZ equivalent to

`dx <a,x+l =a-+ lxl+1 = IaI

Next define the function the ith bit of a

1 if3u,v,w <a,u+v+2vw=an Pow(v)A IvI =i+1bit(a,i):= Au<v

0 otherwise

which is also Ob as bit(a, i) = 1 is also equivalent to

du,v,w < a, u + v + 2vw = a A Pow(v)AJul <i IvI=i+1

Using this function we define the elementhood predicate

i E a - bit(a, i) = 1

Claim 1. Functions and predicates (a, b), Pow(a), bit(a, i), and i E a are Obin S2.

We want to code arbitrary 0-1 words. This cannot be done just by binaryexpansions of numbers that always start with 1. So we think of a word as pair(u, v), coding the word consisting of first right I v I bits of u. With this interpretationin mind define the equality of words a = b by

3x,y<a3u,v<b,(x,y)=a A (u,v)=bA

A(di<IYI,iEx-(iEuhi<IvI))A(di<Ivl,iEu-(iExni<IYI))which is again Ob as x, y and u, v are unique. We also define the function the ithletter in word a

Letter(a, i)1 if 3u, v < a, (u, v) = a A i Euni <IvI0 otherwise

The idea of coding sequences of words is that a sequence will be coded by apair (a, b), where the ith bit 1 in b marks the end of the ith subword of a: Thatis, a sequence w1 , ... , wt of t words will be coded by number a whose binaryexpansion is wt - S ... - w I and number b, which has bit 1 in positions I w I I ,

lal.This idea requires that we must be able to define the function counting the

number of ones among the first i bits of a

Numones(a, i):= I{jI j <1 A] E a}(

Define

Numones(a, i) = k if

3x<(a#a)#adu<Ial,(I,u)Ex-AVt,u,v<IaI,(t,u)Exnu<VAVEaA(V5<v,u<s-+ sVa)-4 ( t + 1 , v ) Exn 3u <i,(k,u) E x n Vu < i , ( k + 1 , u ) V

In words: x codes an increasing map f r o m { 1, ... , k} onto the l's of a. Such an xis unique; hence the definition is Ob, and the inductive character of the definitionof x allows us to prove basic inductive properties of Numones(a) (see previousdiscussion).

We are ready to define sequences and the function (w)l

Seq(w)-3x,y<w,(x,y)=wnlxl=lYI

and for w a sequence

(w)t=u= 3x,y<w,(x,y)=wAVt<luldj, k<Ixl,

Numones(y, j) = i- l n Numones(y, k) = i-+ k= j+ I u IAt Eu-(j+t)EX

Lemma 5 . 4 . 1 . T h e f u n c t i o n (w), = u is I -definable in SZ and SZ proves theconditions of Lemma 5.1.5.

We shall conclude this section by a lemma stating that some predicates can bein a sense coded in S2. It extends Lemma 5.2.12.

Lemma 5.4.2. Let A(a) be a Eo (E') -formula, that is, a formula obtained fromE 'formulas by logical connectives and sharply bounded quantification. Let i > 1.

The theory SS proves

Vx3yVt < Ixl, A(t) = (t E y)

That is: Any bounded set of lengths defined by a Eo(Eb) formula can be codedby a number.

Proof It is enough to show that for A E Eb, which also implies that any E b (Eb)-predicate can be (on any interval [0, Ixl]) expressed as Ob whose coding followsfrom the case i = 1.

Consider a Eb-formula B(s) with parameter x

3y < xYt < Ixl, (t (= y -* A(t)) A lyl = S

Clearly B(0) holds as y corresponds to the empty set, and so by the Eb-LENGTH-MAX principle available in S2' by Lemma 5.2.7 there is maximal s < Ix I satisfyingB. It is straightforward to verify that y corresponding to this s codes A on interval

[0, 1X II. Q.E.D.

Corollary 5.4.3. For i > 1 the theory S2 proves the Eb(Eb)-PIND scheme.

An interesting topic related to coding are partial truth definitions; see Paris andDimitracopoulos (1982).

5.5. Second order systemsIn this section we shall introduce some second order systems of bounded arithmetic,most from Buss (1986). We shall, however, proceed by model-theoretic reasoningrather than by direct proof-theoretic investigations. This will allow us to give simplemodel-theoretic proofs for the so-called RSUV isomorphism and translate severalresults from the previous section directly to these systems. It also allows us torelate the use of a second order object to the limited use of exponentiation.

Consider M a nonstandard model of S2 and define a particular cut I ce M by

foraEM: aEI iff 2x,a=lxl

For the obvious reason we shall denote this cut by Log(M). This cut is closedunder addition and multiplication (as la I lb I + 1 = l a#b 1), but it is not necessarilyclosed under # (that would require that M is closed under (02 W, which we do notassume).

Take a collection XM of those subsets of Log(M) coded in model M, that is,those a c Log(M) such that for some a E M

Vi E Log(M), (M i E a) _ (bit(a, i) = 1)

We shall denote such an a by a.Consider now the two-sorted first order structure (Log(M), Xm) with all sym-

bols of L \ {x#y} defined for elements of Log(M) by restricting the operationsand relations from M, with = defined on XM and with the relation i E a definedfor pairs from Log(M) x Xm by the preceding condition. We call elements of thesecond sort XM sets.

We call our systems second order although we treat them only as two-sorted firstorder structures (as the underlying logic always remains just the first order pred-icate calculus). In second order logic the underlying logic has to assume variousprinciples about sets and some set theory. In fact, there is no specific second orderlogic. We shall, however, call our systems second order, to honor the establishedterminology.

Second order bounded arithmetic theories UJ and were introduced in Buss(1986). In this section we define these theories and obtain some basic informa-tion about their strength. L2 is a second order language whose first order partis L, with second order variables at, fS, ... ranging over finite sets of numbers,where t, s, ... are first order terms, and with a membership relation x E at. Thesuperscript t in at is introduced for technical reasons as an explicit upper boundto elements in a'; we will mostly omit the superscript as such upper bounds areimplicit in (proofs of) bounded formulas, and we shall display it only to simplifythe presentation. More systems are introduced in Buss (1986), using varying lan-guage (whether variables for functions are included, or variables for unboundedsets, etc). In accordance with the notation of Buss (1986) the systems we shallintroduce should be called U (BD) or P (BD), but we shall abuse the originalnotation slightly and adopt the simplest notation. In any case, all these theoriesprove the same bounded formulas.

Definition 5.5.1. Bounded second order formulas are, formulas of L 2 all of whosefirst order quantifiers are bounded.

o'bE formulas are bounded formulas without second order quantifiers.The classes ofE`b'b formulas and 1-I, ,bformulas are classes of bounded formu-

las defined analogously to classes Eb and 11b, counting the number of alternationsof second order quantifiers and not counting the first order quantifiers.

AE`1,bformula is A1,b (resp. A1,b in a theory T) if it is equivalent to a 11,'b-

formula in predicate calculus (resp. equivalent to it in T). In particular, all Eo'b-formulas are A, 'b.

Definition 5.5.2. The theory I E I ,b is a theory in language L2 with the followingaxioms:

1. BASIC2. the extensionality axiom

Vat(x), fis(x),(Vy<t(x)+s(x),yEa=yEa

3. axioms stating that all sets are bounded

Yat(x)Yx,y;yEat - y<t(x)

4. Eo'b-IND scheme

5. bounded comprehension scheme for Eob formulasEo'b - CA

VXBi,VVy<x,yEi/?=A(y), forallAEEo'6

Definition 5.5.3. The theory Vi is the theory of all structures (Log(M), XM)obtained from all models M of the theory S2.

Lemma 5.5.4. The theory VI' is equivalent to the theory I Eo'b extended by the

IND scheme for all # -free E, 'bformulas.

Proof. Let M be a model of SZ and (Log(M), XM) be a model of V,', definedearlier. We want to show first that in all such structures the theory IEo holds and

holds (for formulas in the language of VZ ).Note that statement like u E at or at _ fS in (Log(M), XM) can be equiva-

lently stated in M by i -formulas

bit(a, u) = 1 n u< t or a= b

for a = a and ,B = b, first order bounded quantifiers 3u < v and `du < v in(Log(M), XM) are equivalent to sharply bounded ones 3u < 12V 1 and Vu < 12° 1in M, as 2° exists in M, and second order quantifiers 3a` and Vol translate tofirst order bounded quantifiers in M: 3a < 21 and Vb < 2S.

In this view the first three axioms of I Eo'b are trivially satisfied. To see thatbounded comprehension scheme 5 holds, let A(x) be a Eo'b with x the onlyfree variable and with possibly some other first or second order parameters from(Log(M), XM ), and let u e Log(M). Then the formula

3i'Vy<u,yEt/r°-A(y)

translates in M into a E'-formula (as A translates into a M formula) that clearlyholds for a = 0 and satisfies the induction assumption. By Eb-LIND in M thenholds for any length, in particular for u = 12" I E Log(M).

To verify the IND scheme for Ell 'b-formulas let A(y) be a Ell 'b-formula, withy the only free variable and with parameters. The formula A( y) translates in Minto a Eb-formula A*(y), and the assumption

(Log(M), XM) 1= A(0) A Vx, A(x) -- > A(x + 1)

implies that

M A*(0) nVx < I2"I, A*(x) A*(x + 1)

holds for all u E Log(M).By Eb-LIND in M there follows

M = A*(u)

which gives back

(Log(M), X,M) A(u)

This proves that V' contains IEo'b + E1'b-IND.For the opposite inclusion let (K, X) be a model of I Eo'b + E ll'b-IND. We

want to show that (K, X) also satisfies V11'. To this end we shall define a modelM of SS such that

(K, X) = (M,XM)

The idea of the construction is to use pairs (a, a b ) E K x X to code elements ofM; specifically, (a, ab) would code an element with the value Ei< a, iE a' 2'. Weshall omit the superscripts of the second order variables.

Claim 1. There are relations A1'b-definable in V1'

R_((ai,ai),(a2,a2)), R<((al,al),(a2,a2))

and

Rf ((al, ai ), ... , (ak+l, ak+i ))

one. for each k-aryfunction symbol f of the language L, such that V,' proves thetranslations of all axioms of BASIC obtained by replacing = and < by R= and R<,and f (a 1, ... , ak) = ak+i by R f. V1' also proves all translations of the equalityaxioms.

The relation R is Eo'b-definable by utilizing the extensionality axiom, andthe relation R< is also Eo'b-definable by formalizing the lexicographic orderingof sets. The relation R f is defined as a formalization of computations of bits off (XI, ... , xk) from bits of xi, ... , xk. In the case of L(x/2)i this is trivial, in thecase of addition and multiplication this is a formalization of the table of compu-tations, and in the cases of Ix I and x#y this involves formalizing computations ofthe values of the functions via the basic recurrence properties (in fact, the sameapplies to all polynomial time functions introduced by Cobham definition 5.3.1).These computations are unique and hence the defined relations will be A:,b, butthen we need bounded comprehension for All 'b-formulas.

The formula

3t/fa`dy<a,YE Va=A(Y)

is E,'b if A is 0,'b and is easily proved by induction on a; hence bounded 0,'b-CA

is provable in V11. The details are left to the reader.Let M be the structure (K, X)/R= with the symbols of L interpreted by the

relations of the claim. Numbers U E K can be best represented by pairs (I u 1, a )with a,, := {io < ... < ik} for which K u = 2'0 + ... 2`k.

Claim 2. K is isomorphic to Log(M).

To see this one needs to show in (K, X)

(b, P)R=(Ial, a) - 2c < a, (b, fi)R=(IcI, ac)

and

R f ((lal I, aai ), ... , (l ak+l I , aak+j )) - .f (aI__ ak) = ak+l

Both implications are readily proved by Di'b-lND on a (resp. on the sum al +... ak+l), utilizing Claim 1. Moreover, the length of every (a, a)/R= E M is a(i.e., Log(M) c K), and for every a E K, M a = I (a + 1, {a})/R=I. That is,K C Log(M).

Claim 3. The model M satisfies S2.

Let A be a Eb-formula with parameters from M. Using the definition of M itcan be translated into a EI'b-formula A'((a, a)) with parameters from (K, X).Assume that A satisfies in M

A(O) A (Vx < a, A (l2 J) A(x)) A -A((a, a)/R=)

Then (K, X) satisfies

A' ((1, {0})) A (`dx < aV , A'("L(x, 0)/2j") -- A'((x, 0))) A -A'((a, a))

Now "L(x, 0)/2j" is just (x - 1, 0) and w.l.o.g. (1, a)R=(1, {0}), so specificallywe have

(K, X) B(1) A (Yx < a, B(x) -+ B(x + 1)) A -B(a)

for a Ei'b-formula B(x)

B(x) := A'((x, a))

But this contradicts in (K, X). This proves the claim. Q.E.D.

Definition S.S.S.(a) The theory VV extends V1' by augmenting the language by the function

symbol # and accepting the axiom scheme IND for Ei 'b -formulas in theextended language.

(b) The theory Uj, j = 1, 2, extends the theory IEo'b by EJI'b-PIND schemein the language L \ {#} for j = 1 and Lfor j = 2.

The next lemma is entirely analogous to Lemma 5.2.8.

contains V.Lemma 5.5.6. The theory Vi contains UU and Uz+

Lemma 5.5.7. The theory U' proves the separation scheme II,''b-SEP scheme

(Vx, --'A(x) v -'B(x)) --> Vx3*x`d y < x,

(A(Y)--*YEi/i)A(YEf-'-'B(Y))

with A, B E I7;'b

Proof. Consider a formula C(a, b).

Vs < b3>/rbVy < b; s < y < s+a --+ ((A(Y) _ Y E tfb)A(Y E

Assuming Vx, -.A(x) V -'B(x), the theory U' proves

C([ ],b)-C(a,b)

b _ -'B(Y)))

as it can prove that two sets have a union.C(0, b) is trivial. By E "-PIND then there follows the formula C(b, b), and

the instance of Fl',b-SEP scheme follows. Q.E.D.

The separation scheme immediately implies a form of comprehension scheme.

Lemma 5.5.8. The theory U' proves the 0i''b-CA scheme

(Vx,A(x)-B(x))-+Vx3 Vy<x,yE - A(y)

with A E Eil 'b and B E II1.b

A 'b-CA over I Eo'b readily gives the A I'b-IND scheme.

Lemma 5.5.9. The theory U proves Al'b-IND.

Lemma 5.5.10. The theory U' proves the choice scheme Ei1'b-AC.

b'x < a3*A(x, fir) - 3coVx < a, A(x, (tp)x)

with A E Ei1'b and y E ((p)x defined as [x, y] E cp, [x, y] a pairing function.

Proof The lemma is proved analogously as Lemma 5.5.7: show that the requiredco exists when x is drawn from subintervals of [0, a] of length 1, 2, 4, .... HenceEtl'b-LIND, which is available in U analogously with Lemma 5.2.5, is sufficientto obtain the statement. Q.E.D.

A modified choice scheme is the scheme of the dependent choice Eli'b-DC:

`dx < adcp3*A(x,'p, i/r) Vcp3B, (B)o = cp A `dx < a, A (x, (O)x, (B)x+i )

with A E Ei1,b.

Lemma 5.5.11. Et'b-DC is provable in V2 and EIi'b-DC +IEo'b proves V2-

Proof It is clear that Et'b-DC is provable in VV by induction on a in a Ei''b-formula with parameter q

38, (8)0 = cp A `dx < a, A (x, (9)x, (8)x+i )

To see the opposite direction assume that a El'b-formula 3t/rA(x, yr), withA E 111' b1, satisfies

A(0,/o)AVx <a,(3t/ixA(x,V/x))- . (3*x+1 A(x+1,t//x+i))

The second conjuct can be written as

`Ix < aV x31/ix+t, A(x, 1x) -> A(x + 1, rx+t)

which is the antecedent of an instance of DC scheme for the A1'b-formulaA(x, -+ A(x + 1, fx+I ). By the DC scheme with co = *o we get

30,(0)o=*0AVx<a,A(x,(0)x)-->A(x+1,(O)x+1)

By 111"1-Iion the formula A(x, (9)x) then 3,/iA(a, ') follows.The argument so far shows that Eli'b-DC +I11'b1-IND proves E,',b-IND. But

analogously with Lemma 5.2.5 171i1,bi-IND is implied by Ei'' ,-IND. We may thenrepeat the same argument for i - 1 in place of i to derive -IND in E) 'b -DC +E; ' 2-IND and hence also E; 'b-IND in E,' 'b-DC +EiI-' b2-IND. Iterating thisprocess i-times gets the wanted result. Q.E.D.

We shall now state precisely the relation between first order subsystems of S2and second order systems like U', V1 implicit in the proof of Lemma 5.5.4.

Definition 5.5.12. The theory R' is a first order theory in the language L axiom-atized by BASIC and the axiom scheme

0(0) A dx ((p ([2J) - 0(x)) - VxO(Ixl)

for all Eb formulas 0.R2 is the theory Ua R.

Analogously with Lemma 5.2.8 S2 ' c_ RZ C S2, hence R2 = S2, holds. Therelation of T2' -' to Rz is open.

The reason for defining a theory like this is the following result.

Theorem 5.5.13 (Takeuti 1993, Razborov 1993). There exists a translation ofsec-ond order bounded formulas into first order bounded formulas

EEbeo

and a translation of first order bounded formulas into second order boundedformulas

Be -+BZEE,00 00

having the following properties:1. if S21 .8 then V' I- B22. if V1' I- A then SS I- A'

and3. S2 B = (B2)i4. V,'F-A(A')2

The same properties hold for RZ and Uj in place of Sz and V , respectively.

We shall omit the proof as it is a direct generalization of the argument in theproof of Lemma 5.5.4. Proof-theoretic arguments can be found in Takeuti (1993)and Razborov (1993). The pair of translations is called RSUV-isomorphism.

We shall use this theorem in Chapter 8 in characterizing functions and func-tionals definable in second order systems.

Note that the second order choice principle AC translates in the RSUV isomor-phism into the sharply bounded collection scheme BB; hence, for example, a resultlike Lemma 5.2.12 is essentially equivalent to Lemma 5.5.10. We shall see moreexamples of such translations of statements.

A very important closure property is the closure under counting functions (cf.Section 2.2).

Let a "'-formula Enum(f, u, a) denote "f is an increasing bijection from uonto a," that is, the conjunction of conditions

1. `dx<y<u,f(x)<f(y)2. Vx<u,f(x)Ea3. VzEa3x<u,f(x)=z

Lemma 5.5.14. U,1 proves that every set can be enumerated in increasing order

Va3u3 f, Enum(f, u, a)

Proof. Let a bound all elements of a and consider the formula

A(t) Vx, y < a3i 3f, u, y < x + 2'- (/3= {zEaix <z<y}A Enum(f,u,i4))

This is a El,b-formula clearly satisfying induction assumptions A(0) and Vt,A(t) -- A(t + 1), and hence by E I Ib-L1ND A(Ja 1) holds: That is, f, u witnessingA(Ia 1) for x = 0 and y = a increasingly enumerate a. Q.E.D.

The lemma implies that in U,' one can AIIb-define the cardinality of a set

Jai:=u if 3f, Enum(f,u,a)iffVf,v, Enum(f,v,a)-+ u=v

and hence also all combinatorial principles provable by rudimentary counting areprovable in U,1. We shall return to this in Chapters 13 and 15.

We shall conclude this section by characterizing a limited use of exponentiationin first order proofs using second order systems. This is of interest as many com-binatorial or number-theoretic arguments using exponentiation (or equivalentlypower set operation) use it only once. For example, the usual proof of the infini-tude of primes requires one to compute the number n! + 1, or a combinatorialargument requires one to consider the set of all graphs on a given vertex set of sizen and so forth. In the following discussion we will consider only a particular caseof the theory S2, but the arguments apply to other subsystems as well.

Definition 5.5.15. The set SZ + 1-Exp consists of all Eb,- formulas O(a) suchthat there is a term t (a) for which SZ proves the implication

t(a) < Icl - 0(a)

where c is a free variable not occurring in t (a) or 0.

Theorem 5.5.16. Let 0 (a) be a first order bounded formula. Then

(a) E S1 + 1 Exp iff Vz F- (P (a)

Proof. Assume first that Vz Ff 4(a). That is, there is a model of V2 such that

(K, X) -'O(m)

for some m E K.The construction form the proof of Lemma 5.5.4 gives a model M SZ with

Log(M) = K. Assume

S'F-t(a)<Icl-+0(a)

As (K, X) V2, t (M) E K so there is 21(1) E M, and hence O (m) should holdin M. But by the claim in the proof of Lemma 5.1.3 the truth value of 0 (m) has tobe the same in K and M: a contradiction.

Assume now 0 (a) 0 SZ + 1- Exp. By compactness we can find a model Mof SZ with a cut I Ce M such that

(i) 1 = SZ(ii) 3c E M \ I Vb E I, M = 2b < c

Consider a family X of subsets a C I coded by some a E M, M a < c.We claim that (I, X) is a model of Vi . This is because in the translation of

second order bounded formulas over (I, X) into first order ones over M, secondorder quantifiers 30, VO can be replaced by 3a < c and Va < c, and bounded firstorder ones into sharply bounded quantifiers of the form 3x < Ici, Vx < Icl. ThusE i'b-IND in (I, X) follows from Eb-LIND in M. Q.E.D.

We refer the reader to Krajicek (1990) for more on this topic.


5.6. Bibliographical and other remarksDefinition 5.1.1 is from Parikh (1971) (he used Q in place of PA-). The theoryI A0 was studied by Paris and Wilkie in a series of papers (e.g., Paris and Wilkie1981a,b; 1987b) where its extensions by axioms S2k and Exp were also defined.

Definitions and most lemmas in Sections 5.2 up to 5.2.12 are due to Buss (1986,1990a). Lemma 5.2.13 is due to Ressayre (1986). Definition of PV in 5.3 is dueto Cook (1975), and definition of PV1's follows Krajidek, Pudlak, and Takeuti(1991).

The presentation of second order systems in 5.5 follows the model-theoreticconstruction of Krajicek (1990), developed proof-theoretically in Takeuti (1993)and Razborov (1993). Originally the systems were defined in Buss (1986), who ob-served Lemma 5.5.6. Lemmas 5.5.7-5.5.10 are from Krajicek and Takeuti (1992).

Theories RZ were defined in Takeuti (1993). Theorem 5.5.13, Definition 5.5.15,and Theorem 5.5.16 are from Krajicek (1990).

6

Definability of computations

This chapter presents important definability results for fragments ofbounded arith-metic.

A Turing machine M will be given by its set of states Q, the alphabet E, thenumber of working tapes, the transition function, and its clocks, that is, an explicittime bound. Most results of the form "Given machine M the theory T can prove... " could be actually proved in a bit stronger form: "For any k the theory T canprove that for any M running in time < nk ..."A natural formulation for suchresults is in terms of models of T and computations within such models, but inthis chapter we shall omit these formulations.

An instantaneous description of a computation of machine M on input x con-sists of the current state, the positions of the heads, the content of all tapes, andthe current time: That is, it is a sequence of symbols whose length is proportionalto the time bound for n := Ix 1.

A computation will be coded by the sequence of the consecutive instantaneousdescriptions.

Now we shall consider several bounded formulas defining these elementaryconcepts. They are all Ab in the language L+ and thus also (by Lemma 5.4.1)in L.

Subsequently oracles will be represented by formulas, but to make the presen-tation uniform we shall augment the language L by new unary predicate symbolsa (x); the new language will be denoted L (a). Definitions of classes Eb (a), nb(a ),Ab (a) and theories SZ (a),TT (a) are straightforward generalizations of the originaldefinitions for the language L. The characterizations from Section 3.2 generalizeto Eb(a) too: For example, subsets of N defined by E'(a)-formulas are exactlythose in NP", NP with oracle In I a(n)}.

Let M be an oracle Turing machine with the explicit time bound t(n) codedin the description of M. The property that u is an instantaneous description of a

93

94 Definability of computations

computation of M is defined by a simple Ab -formula

Instants (u)

saying that u is a tuple of sequences, one coding the current state, several codingcontents of tapes with positions of heads, one coding the content of the query tape,and one coding the current time.

Another Ob -formula

Initm(x, u)

expresses that u is an initial instantaneous description of M with the input x (i.e.,in the initial state and positions of heads, with x on the input tape, with emptyworking tapes and empty query tape, and time 0).

A halting configuration is defined by the Ob-formula

Haltm (u)

The property that u and v are two consecutive configurations of a computation ofM with oracle a is expressed by the formula

Consects(u, v, a)

which is Ab(a); it says that both u and v are instantaneous descriptions, the timein v is 1 plus the time in u, v was obtained from u by the transition function of M,and, in particular, if u was in a query state then v answers the query according toa.

Finally, the formula

CompM(x, w, a) := Initts(x, (w)1) A

Vi < t(Ixl), Consecm((w)t, (w)r+i, a) A RJ < t(IxI), Haltm((w)j)

expresses that w is a computation of M on the input x with the oracle a, and thefunction

Outputn4 (x, a)

is the content of the first working tape in the halting configuration of a computationof M on x with the oracle a.

It will follow from results in Section 6.1 that all these notions are, in fact, Ob (a)in PV i (a).

6.1. Polynomial time with oraclesIn this section we shall define polynomial time computations with oracles fromthe polynomial time hierarchy PH in fragments of S2.

6.1 Polynomial time with oracles 95

Lemma 6.1.1. For every polynomial time Turing machine M the theory PVIproves

Vx3!w, CompM(x, w, 0)

Proof Let the time bound (which is an explicit part of the definition of M) be< nk.

Then the function

f (x, y) := the sequence of first IyI instantaneous

descriptions of the computation of M on x

is defined by the limited recursion on notation

f(x, 0) := u ifflnitM(u)

and

f (X, Y) :=f(x, [21) -. u where ConsecM(f(x, LZAyi_i, u)

f(x, LZJ) if IYI > Ixlk

with the implicit bound

If(x,Y)I < IX I O(k)

that is,

f(x, y) <x#...#xThe uniqueness of a computation is obviously provable by LIND for open formulasof PVI, available in PVI by its definition. Q.E.D.

Theorem 6.1.2. Let i > 0, let O(a) be a Eb formula and assume T = PVI ifi = 0 and T = TZ if i > 0. Assume that M is an oracle polynomial time Turingmachine. Then theory T proves

Vx3!w, CompM(x, w, O(a))

Proof. Consider a formula A(x, v) defined as

A(x, v) := 3u, w < v; v = (u, w) A CompM(x, w, u) A Vi < IwlVb < w, ("if the i th oracle query was [0 (b)?]" A bit(u, i) = 1)

0(b)

Formula A is Eb for i > 0 and Eb for i = 0 (as u, w are uniquely determinedby v and b is uniquely determined by i, w). We think of u as coding the oracle{iIiEu}.

By Lemma 6.1.1, PV I proves that there is wo such that CompM (x , wo, 0) holds.Put vo := (0, wo).

By Lemma 5.2.6 the theory T proves that there exists maximal v1 < 2"k

satisfying A(x, v); note that such v1 is then bounded by a term in x. Let v1

(ul, wl)

Claim. The theory T proves that for any i < I w 1 I , b < w 1, if [0 (b)?] was the i th

oracle query then

[bit(u1, i) = 1] _ O(b)

The claim holds as all the positive answers of u 1 are correct by the definitionof the formula A, and if some negative answer would be wrong, then we couldchange it into a positive one and all later answers into negative ones. In this waywe would construct u2 > u1 such that for W2 for which CompM(x, W2, u2) holdsthe number v2 = (u2, w2) is bigger than v1, contradicting the choice of v1.

It follows from the claim that CompM(x, w1, O(a)) holds. The uniqueness ofw1 is simple. Q.E.D.

Corollary 6.1.3. For all i > 1, the theory PVl+1 is fully conservative over T2

Proof. The idea is to augment TZ by function symbols for all E +1-definablefunctions. By Theorem 6.1.2 all o +-functions are E +1-definable in T2, henceevery function of PV;+1 will have a counterpart in this conservative extension of T2 .Moreover, the defining equations for the PV;+1-functions determine a polynomialtime oracle Turing machine computing that function using the defming equation,and hence open induction on the computation proves in TZ the defining equation.Finally, open induction in PV;+1 translates into open induction in the extension ofT2, which in turn translates into +1-IND, available in TZ by Lemma 5.2.2.

Q.E.D.

Corollary 6.1.4. For every i > 0 there is a E+1 formula UNIV,+1 (x, y, z) suchthat for any E +1 formula 0 (a) there are a term t (a) and a natural number e suchthat SS proves the equivalence

O(a) = UNIV;+1(a,t(a),e)

(e is the numeral of e).

Proof. The idea of the proof is to use the universal Turing machine. The problemis that this machine has to be itself in polynomial time, but then it cannot simulatemachines running in a greater polynomial time. This is repaired by an extra input,produced by term t(a): if the time of the universal machine is p(n) and time ofthe machine to be simulated is q (n), then we need term t (a) so that

q(lal) < p(It(a)I)

Number e codes the description of a machine computing 0. Q.E.D.

6.2. Bounded number of queriesBounded query classes are defined similarly to oracle computations except that

the machine is equipped with an extra bound to the number of oracle queries it canask (cf. Krentel 1986). We shall consider only one type of such classes.

Definition 6.2.1. Class Pyp[0(log n)] is the class of languages recognized by apolynomial-time Turing machine querying at most 0 (log n)-times a E p-oracle.

Theorem 6.2.2. Let i > 1, let 0(a) be a Eb formula, and let M be a boundedquery oracle polynomial time Turing machine allowing 0(log n) queries. Thenthe theory SZ proves

`dx3!w, Comp ,y(x, w, 0(a))

Proof The proof of this theorem is completely analogous to the proof of Theorem6.1.2 except that we can use the length-maximum principle in the Claim insteadof the original maximum principle, which is by Lemma 5.2.6 provable in S2. Thisis because the length of u's is now bounded by the log of the Ix I; that is, u's aresharply bounded. Q.E.D.

Although the definitions of these classes seem a bit ad hoc, they, in fact, coincidewith several other, perhaps more naturally defined classes.

Theorem 6.2.3. The class of sets PNP[0(log n)] is identical with the. followingthree classes:

(i) sets log-space Turing reducible to SAT. L N P(ii) sets truth-table reducible to SAT: <tt (NP)

(iii) sets definable by Eo(Eb) formulas.

The interest in considering these classes in connection with bounded arithmeticstems from Corollary 7.3.6.

6.3. Interactive computationsIn this section we define the notion of counterexample computations, and then weformalize a slightly more general notion in 6.3.2.

The intended environment for the counterexample computations are optimiza-tion problems. An optimization problem is a binary relation R(x, y) and a functionc(x) which are both polynomial time computable and where the relation R(x, y)satisfies

R(x, y) - Iyl < Ixl°(')Call any y such that R(x, y) holds a solution to x, and given two solutions yj andy2 to x call yt better than y2 if c(yi) > c(y2). A solution is optimal if there is nobetter one.

There are two prominent examples: the optimization problem CLIQUE withthe relation

R(x, y) :_ "y is a clique in the graph x"

with the cost function

c(y) := "the size of y"

and the problem TSP with the relation

R(x, y) := "y is a tour in the graph x with

the edges labeled by natural numbers"

and the cost function

c(y) :_ "the sum of the labels of the tour y"

We refer the reader to Krentel (1986) and Papadimitriou and Yannakakis (1988)for alternative definitions and treatment of optimization problems.

A specific way of searching for an optimal solution requires interaction betweentwo players: the teacher and the student. The student will be a polynomial timemachine, whereas the teacher has unlimited ability: It is an arbitrary function.

On input x the student computes a solution yi : R(x, yl), the candidate foran optimal solution. If yj is optimal then the teacher says so and the computationis finished with the output yi. Otherwise the teacher presents the student with abetter solution y2 : c(y2) > c(y1). Now knowing x and y2 the student produces thesolution y3 and learns from the teacher that it is optimal or gets a better solution y4.

In this way the interaction proceeds until the student finds an optimal solution.The total computational time of the student must be polynomial in the length of x.

In the example CLIQUE the student has a trivial strategy; first he producesa trivial clique consisting of one vertex and then he just repeats the teacher'scounterexamples (y3 := y2, y5 := y4, ...). As there are only < Ix I possible costvalues he always outputs a clique of maximal size. Note that a similar strategyfails in the second example TSP as there are, in general, exponentially many costvalues.

There is a natural reducibility notion between optimization problems in whichTSP is complete and in which CLIQUE is complete among those optimizationproblems whose cost function is bounded by a polynomial: c(y) = Ix 10(' ). One caneven prove, assuming that NP does not have polynomial size circuits, a hierarchytheorem for computations allowing at most f (lx 1) < Ix I'-' counterexamples in asingle computation. We refer the reader to Krajicek, Pudlak, and Sgall (1990) fordetails and proofs.

We shall now generalize the counterexample computations to a slightly moregeneral context. Assume `dx3vvz, A(x, y, z) holds, with A a polynomial timepredicate and with a polynomial bound < t(x) to y and z implicit in A. Given a

the student should compute b such that Vz, A (a, b, z). A counterexample is anyc < t(a) for which -A (a, b, c). Thus the student produces bI < t(a) and eitherlearns that it satisfies Vz, A(a, bI, z) or receives from the teacher cl < t(a) suchthat -A (a, b I, cl ). Having a, cl the student computes b2 < t (a) and again eitherlearns that it is a solution or gets a counterexample c2, and so on.

Note that the original situation with optimization problems is contained inthis scheme as the property "b is an optimal solution to a" can be expressed asVz, A(a, b, z): That is, it is a coNP-property.

A counterexample to the claim that Vz, A(a, b, z) holds is, in fact, a witness tothe existential quantifier in 3z, -A (a, b, z). Hence the following notion of com-putation (cf. Krajicek 1993, Buss et al. 1993) generalizes the counterexamplecomputations.

Definition 6.3.1. A polynomial time Turing machine M with a witness-oracleQ(x) = 3y < t(x)R(x, y) is a polynomial time machine with a query tape. forqueries to Q that answers a query a as follows:

1. if 3y < t(a)R(a, y) holds, then it returns YES and some b < t(a) suchthat R(a, b): That is, it returns a witness to the affirmative answer.

2. ifVy < t(a)-R(a, y) holds, then it returns NO.

Note that since there may be multiple witnesses to affirmative oracle answers,the computation is not uniquely determined and witness-oracle Turing machinesthus generally compute only multivalued functions rather than functions. A mul-tivalued k-ary function f (x) is just a (k + 1)-ary relation F(x, y) . We call anyy such that F(x, y) a value of f (Y) .

Definition 6.3.2. Let i > 1 and let q(n) be a function.FPEI [wit, q (n)] is the class of multivalued, functions f computable by a poly-

nomial time witness-oracle Turing machine such that1. on an input of length n the machine M makes at most q (n) oracle queries2. the witness-oracle has the form

3y < t(a), R(a, y)

with REFO ifi> land in Ebifi=l3. on an input x the machine M outputs some y such that f (x) = y

Note that we do not require that all possible values of f (x) appear as outputsof some computations. Also observe that without changing the strength of suchmachines we may allow the machine unlimited access to a EP 1-oracle. This isbecause a question whether there is a computation with correct answers of a E r I -oracle is a ET-query that always has a unique witness. Hence such a computationmay be found by a single query to a Ep-witness-oracle.

Let

WitCompM(x, W, Q)

formalizes that "w is a computation of a witness-oracle machine M on an input xwith an oracle Q."

Theorem 6.3.3. Let i > 1. Every. function f from FP' ' [wit, c log n] is E +1definable in S2.

That is, for each such function f there are a polynomial time witness-oraclemachine M and E ,P-witness-oracle Q such that S2 proves

dx3w, WitCompM(x, w, Q)

Proof. Let Q(a) = 3y < t(a), R(a, y) and assume R is a O'-formula w.r.t. S.Let p(n) be the time bound of the machine M computing f with witness-oracle Q.Define a formula A (a, h, w) to be the conjunction of the following four conditions:

1. w satisfies all conditions posed on a computation of M on a with possiblyincorrect oracle answers

2. that iI <i2<...ir<Iwland Ji .... , jr = 0, 1

3. oracle query in steps is, s = 1, . . . , r, is answered YES if js = 1 and NOifjs=0

4. whenever [Q(us)?] is the query in step is and js = 1 then ws is a witnessto it and it is a part of the instantaneous description

The first three conditions are Ob, the last is Ob, so the formula A is i '.

Claim. The theory S1 proves that there is a lexicographically maximal e of theform

e = (j1, , jr)

such that

3h, w, i , h = ( ( i 1 , j1), ... , (ir, jr)) A A(a, h, w)

The formula in the claim is Eb, the lexicographically maximal e is just themaximal e, and any e of such form satisfies

e < 2r < laic

Hence the existence of such e follows by the Eb-LENGTH-MAX principle thatis available in SZ by Lemma 5.2.7.

To conclude the proof, argue as in the proof of Theorem 6.1.2: For h and wwitnessing the formula for the maximal e, all affirmative oracle answers are correctas they are witnessed, and all negative answers are also correct as otherwise a 0 ine could be changed to 1, all later l's to 0, in this way creating e' > e satisfying theclaim. Hence w is a computation of M on input a. Q.E.D.

Lemma 6.3.4. For any i > 1, any f E FPF-P[wit, q(n)] is computable by a ma-chine M with a E p-witness-oracle allowing q (n) + I oracle queries but requiringthe oracle to return a witness only in the last query. Moreover, this last witness isan ordered pair whose first component is the output of the computation.

Proof. Let M be a polynomial time machine M with witness-oracle Q. Considermachine M', which by a binary search constructs the maximal 0-1 sequence e =(jl, ..., jr) satisfying the formula B(a, e) from the claim from the previous proof.

This requires < q (n) queries to a E f -oracle

3f,B(a,e-- f

but it does not need the witnesses to the affirmative answers.Having found such maximal e the machine M' states the query

[3(y, (h, w, i )); h = ((ii, ii), ... , (i2, jr))A A(a, h, w) A "y is the output of w"?]

The answer must be YES and a witness is an output of a computation of M on a,that is, a correct output. Q.E.D.

Corollary 6.3.5. Predicates, from the class

U FPEP[wit, c logn]C

that is, functions with unique values from {0, 1 }, are exactly the predicates fromthe class PEP[O(logn)] (cf Definition 6.2.1).

Proof That PEP [O (log n)] is included among predicates in FPEP [wit, c log n] istrivial. The opposite inclusion follows similarly to Lemma 6.3.4 (in the last queryadd condition y = 1 and do not require a witness to the oracle answer). Q.E.D.

6.4. Bibliographical and other remarksTheorem 6.1.2 is due to Buss (1986), Definition 6.2.1 is from Krentel (1986),Theorem 6.2.2 is from Krajicek (1993), and Theorem 6.2.3 is due to Buss and Hay(1988) and Wagner (1990).

Counterexample computations (student-teacher) were defined explicitly inKrajicek et al. (1990) and used earlier in Krajicek et al. (1991). The witness-oracle computations were used in Krajicek (1993) and the classes based on them(6.3.2) were explicitly defined in Buss, Krajicek, and Takeuti (1993). Theorem6.3.3 is from Krajicek (1993).

Witnessing theorems

This chapter considers various witnessing theorems, which are theorems char-acterizing functions definable in various systems of arithmetic in terms of theircomputational complexity. A prototype of such a theorem (and its proof) is thecharacterization of primitive recursive functions as provably total recursive func-tions in fragment IE? of PA (cf. Parsons 1970, Takeuti 1975, and Mints 1976).

There are other approaches to proving witnessing theorems, for example,skolemizing the given theory by Skolem functions from a particular class andthen applying Herbrand's theorem. Or there are intrigued model-theoretic con-structions. I shall mention these methods too, but my opinion is that one reallyhas to know in advance which class of functions one targets before formulatingan argument while the methods based on cut-elimination (Section 7.1) and gener-alizing Theorem 7.2.3 help to discover the right class. This certainly was the casefor all witnessing theorems discussed in this chapter.

7.1. Cut-elimination for bounded arithmeticWe first extend the sequent predicate calculus by rules allowing the introductionof bounded quantifiers and by the induction rules and then we prove the cut-elimination for such a system.

The predicate calculus LK extends the propositional LK from Section 4.3 byfour rules for introducing quantifiers to a sequent as in Definition 4.6.2:

V : introduction

l ft dI' -f A, A(a)

i hte `dxA(x), r - A an r g r -f t , b'xA(x)

2. 3 : introductionA( r) O h - A A(t)a ,

l ft d,

i hte3 A( r) A

an r g F -) A 3 A ( )x x , , x x

102

7.1 Cut-elimination for bounded arithmetic 103

where t is any term and variable a in d : right, and 3 : left must not occur in thelower sequent

3. and by instances of equality axioms

tl =Sl,...,tk =sk , .f(tl.... )= f(SI.... )

and

tl =sl,...,R(t) R(s)

for all function symbols f and predicate symbols R of the language.For technical reasons (an easier formulation of Theorem 7.2.3 )we stipulate that

formula A in any initial sequent

A -* A

must be atomic.

Definition 7.1.1. LKB is a proof system for the predicate logic extending LK bythe following four rules allowing introduction of bounded quantifiers:

1. V < :introduction

leftA(t), I' O

and righta < t, r -a A, A(a)

t<s,Vx<sA(x),F'----> A I') D,Vx<tA(x)2. 3 < : introduction

A, A(t)left

a t, A(a),and right3x<tA(x),i,A

with the requirement that the variable a in V < : right and 3 < : left does notoccur in the lower sequent.

The system LKB is sound and complete and, for that matter, any valid sequentformed from bounded formulas can be proved in LKB without the use of unboundedquantifiers. We shall see this later.

Definition 7.1.2.(a) The IND-rule is the following inference rule

r, A(a) -* A(a + 1), AF, A(0) --+ A(t), A

(b) The PIND-rule is the, following inference rule

r, A(La/2J) -f A(a), AI', A(0) -p A(t), A

(c) The LIND-rule is the following inference rule

I', A(a) -* A(a + 1), AP, A(0) A(Itl), A

104 Witnessing theorems

where t is any term and where the variable a does not occur in the lower sequent.The Eb-IND rule denotes the rule restricted to Eb formulas A, and similarly

for other classes offormulas and the other two rules.

Lemma 7.1.3. Denote by BASICLK the set ofsequents of the form

A

with A an axiom from BASIC.Let i > 1 and let B be any formula. Then

TZ F- B

if and only if sequent

- Bis derivable in LKB +Eb-IND from the initial sequents BASICLK.

The same holds for PIND- and LIND-rules: That is, Ei 1.PIND (resp. Eb-LIND) rules are equivalent to Eb PIND- (resp. Eb-LIND) axioms.

The "if part" of the lemma is obvious and for the "only if part" it is sufficientto infer any instance of an Eb-IND-axiom in LKB using the E'-IND-rule only;this is left to the reader (note that the restriction to Eb applies only to the inductionformulas of the IND-rule, other formulas in a derivation can be arbitrary).

The next theorem, the cut-elimination theorem, is the crucial technical propertyof the sequent calculus. The cut-elimination for LK was proved by Gentzen and itis called Gentzen's Hauptsatz (see Gentzen 1969).

Recall from Section 4.3 (Definition 4.3.2) the notion of an ancestor and suc-cessor of a formula in a proof. A formula 0 in an LKB derivation with inductionrules is called free if 0 is in an initial sequent or no successor of 0 is identical to¢ or to one of the two principal formulas of an induction rule.

Theorem 7.1.4. Assume i > I and assume that the sequent

r- Ais provable in T2. Then

I,- fAhas a proof in LKB with the Eb IND-rule in which no cut formula is free.

The same is true for the SS and the EbPIND-rule.

The theorem is proved by double induction on the complexity of a proof in TZand on the complexity ofthe cut-formulas in it. We refer the reader to Takeuti (1975)for a clear treatment of the cut-elimination that applies to LKB with the inductionrules. We shall state explicitly one immediate corollary of cut-elimination: thesubformula property.

7.2 Eb-defnability in SS and oracle polynomial time 105

Corollary 7.1.5. Let i > 1 and assume that the sequent

IF --) A

is provable in TZ (resp. in Si).Then there is a proof of

1` 0 A

in the same theory in which everyformula is either a subformula of a Eb- or aIIb formula or a subformula of a formula from

r-) AIn particular, if all formulas in r, are Eb formulas, then all formulas in the

proof are Eb or 1-Ib.

Proof Let q5 be a formula in a proof of r ) o without free cut-formulas; suchproof exists by the previous theorem. Assume .q is not a subformula of a Eb-formula. It follows that 0 is free, as are all its successors. If 0 would not occur asa subformula in r A then it would have to disappear before the end sequentby a cut, in which case the cut-formula would be free, a contradiction. Q.E.D.

7.2. Eb-defnability in S2 and oracle polynomial timeThe idea of the witness function method is the following. Assume that

r -) Ais a valid sequent consisting of formulas starting with existential quantifiers, say

2xi/i(a, x) --) Dy9(a, y)

Assume that a are free variables that do not have to all appear in all formulas of thesequent. That means that for any x satisfying t/i(a, x) there is y satisfying 9(a, y)so that the function

some y such that 9(a, y) if 1/i(a, x)

any y if -*(a, x)

is well defined and witnesses the validity of the sequent.The witnessing functions for initial sequents are trivial so one hopes that pro-

gressing along a proof of the sequent allows an explicit description of the witness-ing functions for each sequent in the proof and, in particular, an estimate of theircomputational complexity. Obviously this cannot be done with an arbitrary proofin which any formula may appear and so one has to appeal to the cut-eliminationtheorem to assure the existence of proofs for which there is information about theformulas in it.

The next definition introduces a method to deal with sequents whose formulasdo not start with an existential quantifier. We use sequence coding from Section 5.4.

Definition 7.2.1 (Buss 1986). Let i > 1 and A be a Eb formula with free vari-ables among J. By induction on the logical complexity of A define the formula

Witness Aa (w, a )

by:

1.IfAEEb, U11'_, then

Witness " a (w, a) = A (U)A

2.IfA=BACthen

WitnessA°(w, a) = (Witness. ((w)1, i) A Witness' ((w)2, Z O)

3. IfA=BvCthen

WitnessA°(w,a)-(Witness.i,ii ((w)1,a)V Witness C ((w)2,a))

4. If A(a) = Vx < I s(a) B(a, x) and A EbI

U fIb , then

WitnessAa(w,a). (Seq(w)nVx < ls(i )j Witnessga'd((w)x+1,a,x))

5. IfA(a) = 3x < s(a)B(a,x) andA 0 EbI U l1b_, then

Witness 5°(w, a) -_ (Seq(w)n(w)2 < s(i )AWitnesse '6((w)I, a, (w)z))

6. If A = -B and A Eb i U l1'_ 1 then use the prenex operations andde Morgan rules to push the negations into the Eb

1-subformulas and then

apply one of clauses 1-4.

Lemma 7.2.2. Let i > I and let A be a Eb formula. Then:(a) The. formula WitnessA°(w, a) is Ab in S2-(b)

PV1 H Witness Aa (w, a) -a A(a )

(c)

S2+BBEbHA(a)-3w, Witness5a(w,a)

Proof. Property (a) is obvious from the definition of the witness formula. Property(b) is proved by induction on the logical complexity of A, which is straightforward.

To prove property (c) proceed also by induction on the logical complexity ofA. The only nontrivial cases of Definition 7.2.1 are those treating the quantifiers.Let

A(a) = 3x < t(a)B(a,x)

7.2 Eb-defnability in S2 and oracle polynomial time 107

Argue in SS : If A(a) holds, then B(a, b) holds for some b < t (ii) and by theinduction assumption

WitnessB° (w, a, b)

holds for some w. Setting w' := (w, b), clearly

WitnessA° (w', j j)

Now let

A(a) =`dx < Is(a)IB(a,x)

By the induction assumption we already know

Sz + BBEb I- B(a, b) - 3vb < t(a ), WitnessB°'b(vb, a, b)

So

Sz + BBEb H A(a) - Vx < Is(a )I3vx < t(a ), WitnessB°'b(vx, d, x)

Applying BBEb we get

Sz + BBEb H A(a) -* 3wVx < Is(a)I,

(WitnessB,b((w)x+1, a, x) A (w)x+i < t(ii))

and thus also

Sz + BBE' H A(a) - 3w, WitnessA°(w,

Q.E.D.

We are arriving at Buss s witnessing theorem; our formulation is a combinationof the original version with a later improvement.

Theorem 7.2.3 (Buss 1986, 1990a). Let i > 1 and assume that

SSHA(a)- B(a)

where A and B are Eb and a are all free variables in the sequent.Then there is a PV; function symbol f (w, a) such that

1. the.function f is and it is E'-definable in Sz2.

PV; H WitnessA°(w,a) --) WitnessB°(f(w,a), a)

Proof. From the assumption and Corollary 7.1.5 there is an LKB proof Jr of thesequent with the Eb-PIND rule in which all formulas are Eb or rlb. A general


sequent in it has the form (with possibly permuted formulas)

El,...IEk,U1,...,Ur ± F1,...,Ft,V1,...,Vs

with Ei's and F's Eb and Uj's and Vj's rlb.By induction on the number of inferences in it above the sequent we show that

there is a function g(w, b) such that:

PVi I- Witnessi (w, b) -f Witness y (g(w, T), T)

where

G :=AEi A/(,V,) and H:=V F; vV(-Uj)i i i >

and b are all free variables of the sequent.In the induction step we shall distinguish cases according to the type of the

last inference applied in it to derive the sequent. To simplify the notation we shallpicture sequents in the form

E,r-+A or r -* A,Fthinking that all formulas are Ei , and we shall write, for example, Witness,,,. (w, b )instead of the conjunction G in the index. We shall also describe only the inductivedefinition of the function g leaving the provability of its properties in PVi to thereader, and treat only the right rules of LKB as the left ones are treated dually.

Initial sequents. An initial sequent has either the form C ) C for C atomicor -k C for C an axiom from BASIC; in both cases C is open and by definition

C(b) __ Witnesscb(w, b )

Structural inferences. Structural inferences are easy to handle, utilizing thepermutation of the components of the witness (for the exchange rule), defining bycases distinguished by a Ab-property (for the contraction rule) or adding a dummycomponent (for the weakening rule).

Propositional inferences. Assume that for a A : right inference

r--A,C r ) A,Dr - A,CAD

functions $1 and $2 witness the upper sequents where g1 computes from a witnessw a tuple (w I, wc) (a witness for the succedent of the left upper sequent), and $2computes on w a tuple (w2, wD). Then define the function g by

(w 1, 0) if Witnesso (w 1, b ))g(w - j

l (02, (WC, WD)) otherwise

7.2 Eb-definability in SZ and oracle polynomial time 109

Assume that w witnesses F. Then (wl, wc) witnesses A, C, so either wlwitnesses A, in which case g(w) = (w 1, 0) witnesses A, C A D, or we witnessesC, in which case g(w) = (02, (wc, WD)) witnesses A, C A D, as well, sinceeither w2 witnesses A or pair (wc, wD) witnesses the conjunction C A D.

Assume that for a V : right inference

F -+ O,CF--)A,CvD

function gi is witnessing the upper sequent, computing on w a tuple (w 1, wc).Define function g by

g(w) (J1, (wc, 0))

It trivially witnesses the lower sequent.Finally assume that gl is a witnessing function for the upper sequent of a: right inference

C,F -> AF -) A, -C

Since we assume that all formulas in r are Eb, C is in fact in Eb 1 U IIb_1 and

hence Witnesscb(w, b) is by definition formula C itself.Define the function g by

g(w) (gi ((0, w)), 0)

Quantifier inferences. Let gl witness the upper sequent of the V < : rightinference

c<t,F ) A,C(c)r -) 0, `dx < t c(x)

We must consider two cases, when t is of the form Is I, and when it is not of thisform.

In the latter case the formula Vx < tC(x) must be IIb_1 and hence its witnessformula is identical with it and the definition of g poses no problem.

In the former case assume that gl computes on input ((u, w), c) the output(w`, w2). Define the function g by

if w` witnI w`g(W) :_(0, (wo, ... , w1s)) otherwise

esses A for some c < IsI

If w witnesses r then g(w) witnesses the lower sequent because eachgl ((u, w, c)), c = 0, ... , Isl, witnesses either A or C(c).

Note that the sequence (wo, ..., wls) exists by BBEb available in S.

1 10 Witnessing theorems

Now assume that gi (w) = (w 1, v) witnesses the upper sequent of a 3 < : rightinference

r - A, C(t)t <s,r -* A, ]x <sC(x)

Then define the function g by

g(u, w) :=0

(w(val(t), v))ift>sift <s

where val(t) is the value of the term t. If w witnesses r and t < s then (u, w)witnesses t < s, r, in which case either wI witnesses A or v witnesses C(t). Thatis, (val (t), v) witnesses 3x < c, C(x).

Cut-rule. Let gI (w) = (w1, u) and g2(v, w) = W2, respectively, witness theleft and the right upper sequents of a cut-rule inference

r-) A,c C,r ) Ar ) A

Define the function g by

W1 if Witness`og(w)

(w , b )

g2(u, w2) otherwise

The function g(w) is either wl, in which case it witnesses A, or otherwise uwitnesses C and hence g(u, w) witnesses A.

Eb-PIND rule Let gi (u, w, c) = (wi, v) witness the upper sequent of a Eb-PIND inference

r, C(Lc/21) - C(c), Ar, C(0) -- C(t), A

We shall consider only the case when C EbI

U IIh , as otherwise the construc-tion of g is trivial.

The idea of the construction of the function g(u, w) witnessing the lower se-quent is to iterate the function g, f o r values c = 0, ... , t. We shall proceed by thelimited recursion on notation

f(u, w, 1) :_ J0I gi (u, W, 1)

if - Witnesscb0) (u, b )

otherwise

f(u, w, Y) = gi It\.f (u, W, Y21)), w, y)

and then we put

g(u, w) := f(u, w, t)

7.2 Eb-definability in S2 and oracle polynomial time 111

The definition off is correct as f is a priori bounded by 2e, where f is the maximalsum of the lengths of witnesses for C(c) plus I w I plus It I.

To see that g witnesses the lower sequent verify by induction on the length ofy that f (u, w, y) witnesses the sequent

I', C(0) - C(y), A

This completes the proof of the theorem. Q.E.D.

The witnessing Theorem 7.2.3 has several important corollaries. We shall men-tion four of them now.

Corollary 7.2.4. The theory SZ is Eb-conservative over the theory PVI and. fori > 1 the theory S2+I is E +I -conservative over the theory PV,+I and also overthe theory I.

Proof Assume A(a) E E,+1 and

Si+I H A(a)

By Theorem 7.2.3

PV; +I H WitnessA°(f(a),a)

for some PV;+I-function symbol f. But by Lemma 7.2.2 even

PVI H WitnessAa(f(a),a) -+ A(a)

That is, together

PVi+I I- A(a)

This proves a part of the corollary. The part about the conservativeness over Tzfollows from Theorem 5.3.5. Q.E.D.

Note that the theory TZ has a VEi+1-axiomatization; hence the preceding state-ment is the best possible unless in fact TZ = S2+1.

Corollary 7.2.5. Let i > 1 and let 0(a) be a Eb formula and V' (a) be a fIh-formula. Assume that

S2f-¢(a)_*(a)Then the formula 0(a) defines a Op predicate in N.

In particular any property that is provably in Sz inactually in the class P.

the class NP fl coNP is

Proof From the hypothesis it follows that

SZ F- 0(a) V -*(a)


where this disjunction is a Eb-formula. By Theorem 7.2.3 there is a P-functionf so that

PV; I Witness (f (a), a)

That is, either (f (a)) I witnesses (P (a) or (f (a))2 witnesses ,l (a), and both casescannot occur at the same time as

Sz f- -(O(a) A -Vf(a))

Hence the characteristic function of ¢

1 if Witnessa ((f (a)) i )

0 otherwise

is a P-function: That is, .0 (a) defines a predicate in the AP level of the polynomialtime hierarchy PH.

For i = 1, EP = NP, III = coNP and Op = P (cf. Section 2.2 and Theo-rem 3.2.12). Q.E.D.

Corollary 7.2.6. Let i > 1 and let (a, b) be a Ed :formula. Assume that

Sz H Vx3yg(x, Y)

Then there is a Eb formula ¢*(a, b) such that SZ also proves1. 0* (a, b) _ 0(a, b)2. Vx3! y¢*(x, Y)

Proof By Theorem 7.2.3 the hypothesis implies

S2' I- Vx, Witness' a (f (x), x)

for some P function f fib-definable in S. Define the formula (P* by

0*(a, b) (b = (f(a))1)

It clearly satisfies the requirement. Q.E.D.

Corollary 7.2.7. For i > 1 let Q! E Ei+, and 1/1 E IIb+1 be arbitrary, and assume

T2' H0(a)-*(a)

Then

T2' H (0(0) A Vx(0(x) -* ¢(x + 1))) Vx, 0(x)

The same holds for i = 0 and PV1 in place of T°.This scheme is called the Ob+i IND scheme.

7.3 E+2 and E+t definability in S2 and bounded queries 113

Proof By Lemma 5.2.9 S2+' admits A +l-PIND. Using Vx, O(x) =_ *(x) wecan write the IND-axiom for 0 as

->>/*(0)V x<a(O(x)A-,i/i(x+1))v0(a)

which is a E+,-formula. By Corollary 7.2.4 T2 proves all E +1-consequences ofSZ+1. Q.E.D.

7.3. E +2- and E+,-definability in S2 and bounded queriesWe give two witnessing theorems for E+1 -consequences of S. The first oneuses a simple idea of introducing a symbol for a Herbrand function to reduce thequantifier complexity of the formula and then reinterpreting the Herbrand function.

Consider a formula of the form ili (a) = 3x`d yO (a, x, y). Let f (a, x) be a newfunction symbol and define the formula ilH(a) by

lH(a) := 3xO(a, x, y/f(a, x))

That is, iliH arises from i by reducing the quantifier complexity with the help ofone Herbrand function for the first universal quantifier (a herbrandization of a for-mula would require introduction of Herbrand functions for all universal quantifiersand the new formula would be existential). The following lemma is trivial.

Lemma 7.3.1. Let 1/i(a) = 3xVy¢(a, x, y). Then the implication

*(a) - 'H(a)is logically valid.

We shall define a particular interpretation of the function f (a, x)

.f*(a, x)I

y y is the minimal y s.t. -0(a, x, y) holds

0 there is no such y

Lemma 7.3.2. For every n the sentence 3x0(n, x, f *(n, x)) is valid if and onlyif 3xVyo(n, x, y) is valid.

Proof. Assume 3x4 (n, x, f *(n, x)) and let x = a witness the existential quanti-fier, while Vx3y-(n, x, y) and, in particular 3y-(n, a, y). But the definitionof f *(n, a) also implies that -0(n, a, f *(n, a)), which is a contradiction.

Assume now 3xVyo(n, x, y) with x = a witnessing the existential quantifier:dyO(n, a, y). IfVx-'O(n, x, f *(n, x)) would hold, then also -O(n, a, f *(n, a)),which is again a contradiction. Q.E.D.

Now we use the Herbrand function to derive from Theorem 7.2.3 a witnessingtheorem for the E+2-consequences of S.

Theorem 7.3.3. Let *(a, b, c) be a Eb formula and assume that

SZ I- 3xvy < a 0(a, x, y)

Then there is a function g(a) such that

N envy < n, O(n, g(n), y)

and g is computable by a counterexample computation with the student being apolynomial time Turing machine with a E v 1-oracle and the teacher producingcounterexamples to Vy < ao(a, b, y), that is, some c < a for which -.4 (a, b, c),if such c exists.

In particular, g E FPS+I [wit, poly]. (See Section 6.3 for the definitions of thecomputational models involved.)

Proof. Consider the formula

(3xvy < aO (a, x, y))H = 3x, f(a, x) < a - 0(a, x, f(a, x))

The hypothesis of the theorem implies that

SS(f) F- 3x, f(a,x) <a q(a,x, f(a,x))

where S S (f) is a theory defined like SS but in the language L U If) and with axiomf (a, x) < a. For such a theory we have a statement straightforwardly analogousto the witnessing Theorem 7.3.2 with the witnessing functions now being fromthe class f (f). That is: The witnessing function is computable by a polynomialtime machine querying a EP

,-oracle and querying also for values of the function

f. Such a witnessing functional F then satisfies

f(a, F(a, f)) < a - 0(a, F(a, f), f(a, F(a, f)))

Now substitute into the algorithm computing F(a, f) for oracle f a particularfunction f *

f *(a, b)min c < a s.t.-(a, b, c) if it exists

a+1 if Vy < aO (a, b, y)

Then

f*(a,b) <a - 0(a,b, f*(a,b))implies

Vy < a, O(a, b, y)

and hence the preceding implication yields

Vy < a, 4'(a, F(a, f*), y)

7.3 E +2- and E+ -definability in S2' and bounded queries 115

Note that the algorithm computing F(a, f *) specifies a counterexample compu-tation with the required properties. Q.E.D.

In principle we may apply the herbrandization to bounded formulas of ar-bitrary complexity and obtain a E' (f , ... , fk)-formula. The Herbrand func-tions, however, do not appear to have a nice interpretation (cf. Krajicek 1992 andLemma 11.1.2).

Corollary 7.3.4. Let i > 1 and assume that a function f is E<+1-definable in S.Let the graph off be defined by a formula of the form 3xdy < aA(a, b, x, y),whereVy < aA(a, b, c, y) is l1b.

Then there is a. function g that can be computed by a counterexample compu-tation where the student is a polynomial time machine with a Eb -oracle andreceives from the teacher counterexamples to the formula

Vy < a, A(a, b, c, y)

and such that the function f (a) is a projection of g(a). That is, for any a the valueg(a) has the, form g(a) = (f (a), x), for some x.

Proof Let f be defined by a E+1-formula

3x`dy < aA(a, b, x, y)

The statement then follows from Theorem 7.3.3. Q.E.D.

If a function f is El+1-defnable in SZ and thus also in then it is aE +i-function. Corollary 7.3.4 implies that such a function is, in particular, in

FPEr [wit, poly]. The class FPEn [wit, poly] contains, however, the class +, triv-ially. Thus the only new information the corollary gives is the form of the formulato which the teacher offers counterexamples. That appears to be rather poor infor-mation to yield some estimate of the computational complexity of the functionsE+t -definable in S2' and we need another approach to it. We shall use witness-oracle computations to obtain a witnessing theorem for E<+1-consequencesof S.

Theorem 7.3.5 (Krajicek 1993). Let i > 1 and let A(a, b) be a Ed ,-formulasuch that

Sz 1- dx3yA(x, y)

Then there is a multivalued function f (x) EFPErn[wit,

O(log n)] that is E+definable in Si such that

SZ H Yx, A(x, f(x))

This means that any value y = f (x) satisfies A(x, y).


Note that the converse of this theorem, Theorem 6.3.3, is also true.

Proof We shall show that there is a function f E FPEb[wit, O(log n)]; its E+]-defmability in SS follows from Theorem 6.3.3.

Assume for simplicity that A E IIb. By Corollary 7.1.5 the hypothesis of thetheorem then implies that there is a derivation Tr of the sequent

-) 3yA(a, y)

in which every sequent has the form

1 7 1,A] - r2, A2

where(i) r1 u r2 c Eb u rib

(ii) A 1 has the form

3Y1a1(b, yi ), ..., 3Yrar(b, Yr)

(iii) A2 has the form:

3z111(b, z1)..... 3zsps(b, zs)

where ai's and ni's are nb-formulas.We will assume for simplicity that bounds to yi's and zi's are parts of the

formulas ai and ,Bi, respectively.Now we modify the notion of a witness from Section 7.2. We say that u is a

witness to r1, 01 (for parameters b) if u has the form

u:=(b,Yl, ,Yr)

and the conjunction

nrl(b)AAai(b,Yi)i

is true. We say that v of the form

v:=(b,zl,...,zs)

is a witness to r2, A2 if the disjunction

V 1`2 (b) v VPi(b, zi)i

is true.By induction on the number of inferences above the sequent in r we prove that

there is a multivalued function g(u) EFPE!b[wit,

O(logn)] having the propertythat if u witnesses r1, A1, then any value of g(u) witnesses r2, A2

7.3 E +2- and E +1-definability in S2' and bounded queries 117

This is proved analogously with the proof of Theorem 7.2.3 with an additionalcase in the 3 < : introduction-rules and a bit different treatment of the PIND-rule.

The case of 3 < : right only involves evaluating a term; this is done by apolynomial time machine. In the case of 3 < : left there are two different situations:The inference introduces one of the 3yj's or it introduces an existential quantifierinto a *formula.

In the former case the witnessing algorithm remains the same, except that avariable treated as a parameter now moves into the nonparametrical part of thewitness.

In the latter case assume that the formula introduced by the rule was

3ty(b, t)

and that it was inferred from y (b, c), c a parameter not occurring among b (bythe proviso of the 3 < : left rule). Assume that M(b, c, y) computes the witnessfunction for the upper sequent.

Consider a new algorithm M': on the input (b, y) it first asks the witness-oraclea Eb-query [3ty(b, t) ?]. If the answer is negative, M' outputs 0 and halts. This iscorrect as (b, y) was not a witness to the antecedent of the lower sequent.

If the answer is affirmative then M' is also provided with a witness c to it:Y (b, c). M' then forms a tuple u = (b, c, y) and runs as M on u. Clearly if (b, y)witnessed r 1, A 1, then the output of this computation must witness 172, A2

The principal formula of a V < : right inference is nb. This case is treateddually to the 3 < : left inferences.

Now consider a PIND-inference

y(Lc/2j) - y(c)y(0) - y(t)

with y E E' (we omit the side formulas). Let M be the witness-oracle machinewitnessing the upper sequent. Define machine M' as follows.

On input u' _ (b', ...) it computes the value v of the term t and asks the query[y(v)?]. If the answer is affirmative it outputs 0 (any tuple witnesses true y(t)).If it is negative M' asks query [y(0)?]. If the answer to this query is negative itanswers 0 and again stops (no tuple can witness false y (0)).

In the remaining case (answers to y (v) and y (0) negative and affirmative,respectively) it finds by binary search some c such that y (Lc/2 j) is true whiley(c) is false. Note that this takes 0(logn) E'-queries as we deal with PIND (and

.where n = t i ) .

Then M' forms u = (b, c.... ) and computes as M on u. Any output v of thiscomputation witnesses the succedent of the upper sequent, but as y (c) is false itmust witness the disjunction of the remaining formulas in the succedent. Theseformulas are also in the lower sequent, so v witnesses the succedent of the lowersequent too. Q.E.D.

Corollary 7.3.6. For i > 1, a predicate is E+1-definable in SZ if and only if it is

a predicate from the class Prp[O(log n)].

Proof. By Corollary 6.3.5 the class PEp[O(log n)] equals the class of predicatesin

FPEtP

[wit, O(log n)]. Theorem 7.3.5 then implies the corollary. Q.E.D.

In the rest of this section we shall consider a witnessing theorem for a theoryextending SS by a particular combinatorial principle. The witnessing functionswill be computed by probabilistic algorithms.

A pigeonhole principle is the obvious fact that m pigeons cannot sit in n holesif m > n, each hole accommodating at most one pigeon. There are several ways toformalize this principle in arithmetic (see Dimitracopoulos and Paris 1986), andwe will now consider one of them. Later (in Chapters 11, 12, and 15) we shallconsider some other ways.

The pigeonhole principle PHP(EC) is the scheme

Vx < n3y < n, A(x, y) --) 3x1<x2<n,y<n,A(x1,y)nA(x2,y)

for bounded formulas A. Observe that PHP(EQ) proves (over BASIC) all induc-tion axioms of T2. It is unknown whether T2 proves all instances of PHP(Eb 00).

A weak pigeonhole principle for function f, )APHP(f ), is the formula

3y<2aVx<a,f(x)0 y

stating that no map from [0, a] can be onto [0, 2a]. The word weak indicates thata stronger version is also valid with a + 1 in place of 2a.

Let WPHP(PVI) denote the set of axioms WPHP(f) for every PV1-functionsymbol f (x) (f (x) may have other arguments besides x and they are treated asparameters in the axioms). Recall from Section 5.3 that S1 (PV 1) is the theory SZin the language of PV1; it is a conservative extension of S21 by Theorem 5.3.4and Corollary 7.2.4. Later (Theorem 11.2.4) we shall see that T2 (PV 1) proves thescheme WPHP(PV I ). It is open whether S1(PV I) proves it too.

Theorem 7.3.7. Let 3z < tA(a, z) be a Eb(PV1) formula and assume that

S2 (PVI) + WPHP(PV1) !- 3!z < t, A(a, z)

Then there is a multifunction g, computable by a probabilistic polynomial timealgorithm with bounded error, that witnesses the formula b'x3z < t, A (x, z)

Vx, g(x) < t n A(x, g(x))

In particular, predicates E'-definable in S21 (PV1) + WPHP(PV1) are witnessedby a function from R (random polynomial time; cf. Balcazkr Diaz and Gabarro1988).

7.3 E2- and E 1-definability in SS and bounded queries 119

We should remark on what we mean by a bounded error probabilistic com-putation of a function g. This means that there is a probabilistic polynomial timealgorithm that on input x outputs some g(x) with probability > (3/4).

Proof. Assume that there is a SZ (PV 1)-proof of

3!z < t, A(a, z)

using the WPHP(PV1) axioms for functions f1, ... , fk. For simplicity assumek = 1; case k > 1 is similar.

It follows that there is a S1 (PV 1)-proof of

(3bVy < 2b3x < b, f(x) = y) v (3!z < t, A(a, z))

Introduce a Herbrand function h to get rid of the universal quantifier:

SS(PVI, h) !- (3b3x < b, h(b) < 2b -). f(x) = h(b)) V (3!z < t, A(a, z))

By the relativization of Theorem 7.2.3 there is a polynomial time machine Mquerying an oracle for values of the function h and computing from a a witness

M(a, h) := ((b, x), z)

to the disjunction, where either (b, x) are witnesses to the existential quantifiersin the first disjunct or z witnesses the second disjunct.

For a particular function h = h* computing counterexamples

h*(b) := some y < 2b s.t. `dx < b, f(x) # y

it holds (for reasons analogous to those in the proof of Theorem 7.3.3) that thealgorithm M outputs ((b, x), z) such that z is always a witness to the seconddisjunct: That is,

`dx, A(x, (M(x, h*))2)

Assume that p(n) is the time bound of the algorithm M. It is therefore sufficientto describe a probabilistic algorithm for h* with the probability of error at most< (1 /4 p(n )): The probability that algorithm answers at least one query of M abouth* wrongly is then < (1/4). Such algorithm is, however, trivial: Pick randomlywith a uniform distribution £ numbers < 2b. Each of them has probability < 1/2)to be in the range of f ; hence the probability that all of them are in the range is< (1/2t). So it is sufficient to pick f > 2 + log(p(n)) such numbers.

Assume now that B(x) is a predicate whose characteristic function is Eb-definable in SS (PV 1)+ WPHP(PV 1). Then the algorithm can check whether g(a)is actually a witness for B(a): That is, it can make an error only if a does notsatisfy B(x). This shows B E R. Q.E.D.


7.4. E +2-definability in T2 and counterexamplesIn this section we prove that Theorem 7.3.3 can be substantially improved whenthe theory SS is replaced by PV1, that is, by T2-1 (if i > 1).

Theorem 7.4.1 (Krajicek, Pudlak, and Takeuti 1991). Let i > 1 and assume that(a, x, y) is an 3116 formula. Suppose

T2' F- 3xVy, 0(a, x, y)

Then there a r e D+1 functions f l (a), f2(a, b1), ... , fk(a, bI , ... , bk_1) with allfree variables shown such that TZ proves

0(a, f i (a), bl) v 0(a, f2(a, bl), b2) v ... V O(a, fk(a, bl, ... , bk-1), bk)

This is also true for PV;+I in place of T T and for PVI if i = 0.

Proof Take PV;+I, a universal conservative extension of TZ by Theorem 5.3.5.Assume that

0(a, x, y) = 3z* (a, x, y, z)

with * E 116. By the definition of PV;+I, the formula i/i is in PV;+I equivalentto an open formula, say to a formula il(a, x, y, z).

Since PV;+I is a universal theory we may apply the Herbrand theorem to obtainfrom a PV,+I-proof of 3z >)(a, x, y, z) the terms t, s,,,, such that the disjunction

[rl(a, ti (a), bi, sl,1) v ... v rl(a, ti (a), b1, sl,n)]

V...V [q(a,tk(a,bl,...,bk-1),bk,sk,l)V... V 17(a,tk(a,bl,...,bk-1),bk,sk,n)]

is PVi+1-provable. Terms t1 , ... , tk depend only on the variables shown; termss,,,,, may depend on all variables a, b1, ... , be,.

From the disjunction follows by a repeated introduction of 3-quantifiers insteadof terms s,,,,, (and contracting occurrences of identical formulas) the disjunction

3z r7 (a,tl(a),b1,z)V...V3zr7(a,tk(a,bl,...,bk-1),bk,z)

The functions

fj :=tjare then the desired functions. Q.E.D.

Corollary 7.4.2. For i > 1, a function E +2-definable in PV1 (or in T2-1 fori > 1) is computable by a counterexample computation where the student is apolynomial time machine with a EP r oracle and the teacher answers at most fqueries for counterexamples to a IIi+I -property, where f is a constant.

7.5 Eb-definability in TZ and polynomial local search 121

Proof Let f (a) = b be defined by

3x <tVy<t,0(a,b,x,Y)

with 0 E Eb, that is,

PV; I- 3b3xVy, 0(a, b, x, y)

(we leave out the bounds < t for simplicity of notation). Write this as

PV1 I- 3cVy, */i(a, c, y)

where

*(a, c, y) := 0(a, (C) 1, (c)2, Y)

It is obviously sufficient to describe a computation of c.By the previous theorem there are p-functions fI , ..., fk such that

* (a, fl (a), YI) V ... V * (a, fk (a, YI , ... , Yk- I ), Yk)

The counterexample computation, with f = k - 1, will look like this: The stu-dent computes cI := fI (a) and submits to the teacher that `dyi/i (a, cl, y). If hefails he also receives a counterexample: yI s.t. -*(a, cl, yI ). Then the studentcomputes c2 := f 2 (a, yI) and submits it to the teacher, and so on. As the pre-ceding disjunction is valid, one of cI, ... , ck computed with the help of at most 2counterexamples must work. Q.E.D.

This corollary will play a crucial role in Section 10.2.

7.5. E b -definability in T2 and polynomial local searchIn this section we shall characterize the E

i-consequences of TZ in terms of poly-

1

local search problems. These problems occur naturally in polynomial timeversion only, but their obvious generalizations to p-functions would, via a state-ment analogous to Theorem 7.5.3, characterize E'-consequences of Tz T.

The following definition is from Johnson, Papadimitriou, and Yannakakis(1988).

Definition 7.5.1. A polynomial local search problem (PLS-problem) P is an op-timization problem satisfying the following conditions:

1. Instances of the problem P are x E {0, 1 }*, and for any x there is a set ofsolutions Fp(x) satisfying

(i) the binary predicates E Fp (x) is polynomial time

(ii) VxYs E Fp(x), IsI < p(Ixl), some polynomial p

(iii) dx, 0 E Fp(x).

2. A cost function is a polynomial time, function

cp(s,x) : {0, 1}* x {0, 1}* -+ N

3. A neighborhood function Np(s, x) is polynomial-time and it satisfies

dx, s, Np(s, X) E Fp(x)

4. The cost and the neighborhood functions satisfy

Vx, s, Np(s, x) s -+ cp(s, x) < cp(Np(s, x), x)

5. The task is, given x.find a locally optimal solution s E Fp(x), that is, asolution s E Fp(x) for which

Np(s, x) = s

It follows from the definition that there is a polynomial time computable func-tion Mp(x) such that Mp(x) > cp(s, x) for all s E Fp(x).

A PLS-problem P can be expressed by a rlb-sentence: the conjunctions of thefirst four conditions. If this is provable in TZ then we say P is definable in T'ZThe formula Optp(x, s) is the Ob-formula formalizing Np(s, x) = s.

Lemma 7.5.2. Let P be a PLS problem definable in TZ . Then

T2 H `dx3y, Optp(x, y).

Proof By Lemma 5.2.7 TZ proves the Eb-MAX axioms. Hence in T2 1, for all x,there is a maximum value co < Mp(x) satisfying

3s E Fp(x), Cp(s, X) = Co

Takings to be a witness for this last formula, s is even globally optimal and hence,specifically, satisfies Optp(x, s). Q.E.D.

We shall use the Witness formula from Section 7.2.

Theorem 7.5.3 (Buss and Krajicek 1994). Let R(a) be a Eb formula such thatTz F- bx R(x). Then there is a PLS-problem P definable in Tz' such that TI proves

°YxYs,Optp(x, s) Witness, (s, X).

Proof. Assume that

TZ I- R(a)

and that R is a strict Eb-formula. By Theorem 7.2.3 there is a TZ -proof 7r in thesystem LKB of the sequent - R(a) such that every sequent in tr has the form

AI(b), ... , Ak(b) - B1 (T), ..., Bf(b)

7.5 E'-definability in TT and polynomial local search 123

where b are all r (including a) free variables and where all the formulas A; andB; are strictEb-formulas.

We shall prove by induction on the number of inferences in it above the sequentthat the sequent corresponds computationally to a PLS-problem. Namely, there isa PLS-problem P' such that

1. inputs to Pare k + r-tuples (ml, ... , m,. , v l , ... , vk) where ml, ..., m-are values for the variables b I, ... , b,.

2. for an input tuple (m, v ), the locally optimal solutions are the k + r + 1-tuples of the form (m, v, w) such that if each v; witnesses A, (m) then wis a witness f o r one of the formulas B1 (m ), ... , Be (m ).

From such a problem P' we get problem P satisfying the requirement of thetheorem by adding to each P'-solution (U, w) a new neighbor w with a highercost, provided w is a witness to R.

The existence of the PLS-problem is obvious for the initial sequents. The casesof the propositional inferences and the structural inferences are obvious, requiringonly minor changes to the PLS-problem.

The case of an 3 < : right inference

r -+ A, A(t)t<s,r-A,3x<sA(x)

is simple too: By the induction hypothesis there is a PLS problem Po that appliesto the upper sequent. We modify Po to get a PLS problem P' that works for thelower sequent. First, let

cp'(s, x) = cpo(s, x) + 1

for s E Fpo(x).Inputs (m, vo, v) to P' that provide witnesses to I' are assigned cost 0 and

have as neighbor the input (m, u) to PO. An output (m, u, w) of Po has as itsP'-neighbor a tuple (m, vo, U, w') with the cost Mpo ((m, u )) + 1, where w' = wor w' = (t(m ), w), depending on which one of them provides a witness to aformula in the succedent

A, 3x <sA

Clearly P' has the desired properties.

The cases of 3 < : left and V < : left are handled by simple modifications to thePLS-problem too. The case where the final inference is a V right is analogousto the case of the induction rule IND treated later.

Consider the case where the sequent was inferred using the cut-inference

r - A A- Ar -) A

By the induction hypothesis, two PLS-problems P1 and P2 for the upper sequentssatisfy the requirements.

A PLS problem for the lower sequent is formed as a "composition" of PLSproblems. To simplify this case, we assume w.l.o.g. that the cut formula A is theonly formula in the succedent (antecedent) of the left (resp. right) upper sequent.W.l.o.g. assume that domains Fp, and Fp2 are disjoint. The local optima of theproblem P1 will have as neighbors the instances of P2. By adding Mp, to thecost function of P2, the cost of any P2-solution is greater than the cost of anyP1-solution. This allows us to arrange that any local optimum of the combinedproblem can be found by applying P2 to a local optimum of P1. The details areleft to the reader.

Finally consider the case when the sequent was inferred by the induction infer-ence Eb-IND

A(bo,b)-> A(bo+l,b)A(O,b) A(t(b),b)

Assume w.l.o.g. that there are no side formulas. Given a PLS-problem Po for theupper sequent, define problem P for the lower sequent by an exponentially longiteration of instances of Po.

First, the set Fp'((m, v)) is the set of tuples (mo, z, s) where mo < t(R)and s E Fp0((mo, m, z)); thus FP, is a disjoint union of the solution sets of theinstances of Po. Then define

cp'((mo, z, s), (m, v)) = mo M(m, cp(s, (mo, m, z))

where M is large enough to dominate MPG, (s) whenever m o < t (R) and s EFp0((mo, in, z)). The neighborhood function is defined such that

NP'((mo, z, s), (m, v)) = (mo, z, Npo(s, (mo, m, z)))

with the exception of the case when s = Np0 (s, (m o, m, z)), in which case we set

Np'((mo, z, s), (m, v)) = (mo + 1, z', (mo + 1, m, z'))

for mo < t(m) - 1, where z' is the last component of s, that is, the witness forA(mo+1,h ).Ifmo=t(m)- 1, then

NP'((mo, z, s), (m, v)) = v, Y)

This last case gives a local optimum for P'. It is easy to verify that P' gives aPLS-problem that satisfies the requirement of the theorem for the sequent.

Q.E.D.

Corollary 7.5.4. A multivalued function f is Eb-definable in TZ i f can beexpressed as a PLS-problem composed with a projection, function.

7.5 E'-definability in TT and polynomial local search 125

Proof. Any PLS-problem (and hence also its projection) is by Lemma 7.5.2 Eb-definable in T2 1. On the other hand, by Theorem 7.5.3, witnesses to a E b-definablefunction can be computed by a PLS-problem (i.e., its local optima are all witnesses)and a witness to the existential quantifier of the Eb-formula is obtainable by aprojection from the witness. Q.E.D.

It is a bit unnatural to speak about definable multivalued functions but in thissituation it appears inevitable. It is an open question whether any PLS-problem Pcan be reduced to a PLS-problem P' that has a unique optimal solution in everyinstance. Reduction refers to an obvious notion: By polynomial time functionstranslate P-instances into P'-instances and then P'-local optima back to P-localoptima (cf. Johnson et al. 1988 for details).

On the side of bounded arithmetic the related problem (by Lemma 7.5.2 andTheorem 7.5.3) is whether for every Eb -formula A for which

Tz I- Vx3y, A(x, y)

one can find another '-formula A* such that(i) Tz 1- A*(x, y) --*A(x, y)

(ii) T2 l Vx3!y, A*(x, y)(an analogous statement is open for T2 too). Note that by Corollary 7.2.6 thisimplication holds for S.

Several other search classes were introduced by Papadimitriou and Yannakakis(1988) and Papadimitriou (1990, 1994), in particular the classes PPA, PPAD, andPPP. A search problem in PPA is defined as a PLS-problem except that there is nocost function and the neighborhood function satisfies

1 . Vx, s, Np(s, x) c_ Fp(x) \ {s} A I x) I < 22. `dx, S, S', S' E Np(s, x) = S E Np(s', x)3. Vx3! s, s E Np(0, x).

The task is to find s E Fp(x) \ {0} such that

3! s', S' E Np(s, x)

Thinking about the conditions' E Np(s, x) as defining the edge (s, s') in a graphG p(x) with the set of vertices Fp(x), the conditions imply that G p(x) is a sym-metric graph of degree < 2 in which vertex 0 has degree 1. Hence the task is tofind another degree I vertex.

The problems in the class PPAD are defined similarly but the graph G p(x) isnow directed and satisfies the conditions

1. indeg(0) = OA outdeg(0) = 1

2. indeg(v) < 1 A outdeg(v) < I for any vertex vand the task is to find a vertex v 0 such that

indeg(v) + outdeg(v) < 1

A problem in class PPP is specified by a polynomial time function

fp(s, x) : Fp(x) - Fp(x) \ {0}

and the task is to find two s, s' E Fp(x), s 0 s', such that

fp(s,x) = fp(s',x)

In other words: the task is to find two witnesses to the fact that fp(s, x) does notviolate the pigeonhole principle.

Denote by PPA(PV 1) the set of `d Eb (PV 1)-sentences expressing that no PV 1-function Np (s, x) can define a symmetric graph of degree < 2 with deg(0) = 1and no other degree 1 node. Similarly denote by PPAD(PV 1) and PPP(PV,) thecorresponding principles underlying the classes PPAD and PPP.

The following theorem is proved similarly to Theorem 7.3.7 (interpreting theHerbrand function h as a function h* witnessing the principle).

Theorem 7.5.5. A multivaluedfunction f is Eb-definable in SZ (PV1) + PPA(PV1)it it can be witnessed by PPA problem.

Similarly for PPAD and PPP.

A reason for studying these classes is that numerous other seemingly differentsearch problems are reducible to one of them (cf. Papadimitriou 1990).

7.6. Model-theoretic constructionsIn this section we shall give a few model-theoretic constructions for some re-sults proved earlier in the chapter by proof-theoretic methods and obtain a newproposition.

We begin with a general discussion of complexity classes in models of PV. Thisis elaborated on more in Sections 15.2-3.

In model M of PV the class P is the class of subsets of M definable by an atomicPV-formula with parameters from M (in SZ this would be provably Ab-formulaswith parameters), or equivalently, recognizable by a standard DTM with an extrainput (the parameter) that may be nonstandard, or equivalently, recognizable bya DTM possibly with a nonstandard description but whose time is bounded by astandard degree polynomial from M[n].

The class P/ poly is defined in the same way except that the parameter of thestandard DTM may vary with the length of the inputs, the class pstan is defined asthe class P but allowing no parameters, and the classes NP, NP/poly, NPs" aredefined analogously using NDTMs.

The first theorem is a strengthening of Corollary 7.2.5.


Theorem 7.6.1. Let M be a countable model of PV. Then there exists a modelM* ofPV a E'-elementary extension of M, such that

M* "P=NPncoNP"

Proof. Let a (x) E E b and #(X) E 116 be two formulas with parameters from M.There are two cases:

(a)

PV + Thyn6 (M) F- Vx(a(x) =_ ,e(x))

in which case there is by the Herbrand theorem a PV-symbol f (x, y) and b E Msuch that

PV + Thynh (M) I- Vx(a(x) (f(x, b) = 1))

so the set defined by a(x) is in the class P of M, or(b)

PV + Thyn6 (M) V `dx(a(x) __ fi(x))

in which case either(1)

PV + Thynh(M) V Vx(-a(x) V /3(x))

which means that

M a(c) n -'8(c)

some c E M, and this will remain true in every extension of M, or(2)

PV + Thynh (M) I f Vx(a(x) V -,B(x))

In case (2) take a new constant c and form a model M' satisfying

PV + Thyll (M) + -'a(c) +,6(c)

Any Eb-elementary extension of M' will satisfy this theory and hence also

-'dx(a(x) ,8(x))

Enumerate with infinite repetitions (ao, ,eo), (a i , 0i ), ... all pairs of a Eb anda 116-formula in the language of PV augmented by a name for every element ofM and by countably many new constants. Construct a countable chain of Eb-elementary extensions

M=MocMi c...where M;+i is obtained from M; as M' from M for pair (a, (a;, ,B; ), and

such that every element of any M; has a name among the constants of the language.

Then the union of the chain

M* UMi

is the desired model. Q.E.D.

The nonuniform analog of this theorem for P/ poly, NP/ poly, and coNP/ polyalso holds as one can treat every length from Log(M) separately in the constructionof the chain.

Next we give a model-theoretic proof for the counterexample witnessing The-orem 7.4.1.

Lemma 7.6.2. Let M be a model of T2, for i > 1, or of PVl for i = 0. Assumethat M* CM is a subset of M closed under all standard O+t functions definablein M using parameters, from M*.

Then it holds that1. M* is a E'-elementary substructure of M2. M* = T2fori > 1, orM* = PVI fori = 0

Proof. The Skolem functions for the E'-formulas are E<+, -definable in T21 as TZproves the Eb MAX principle (by Lemma 5.2.7). This shows condition 1.

For condition 2 we want to show that IND holds for any E'-formula 0

X0(0)vO(a)v(3x <a,O(x)A-(x+ 1))

Since M* is a E' -elementary substructure of M, it suffices to find a +1-function

in M that on the input a for which 0 (0) A -'0 (a) holds outputs x < a for which

O(x) A -4(x + 1)

Such a function is available in PV;+i by definition (cf. Section 5.3): Take thefunction (h (b, b)) I. Q.E.D.

With the help of this lemma we give an alternative proof of Theorem 7.4.1.Let 0 satisfy the hypothesis of that theorem and assume f o r the sake of contra-

diction that f o r no k and fi, ... , fk E D+1, T2' proves the disjunction required inthe theorem.

Let fi, f2.... be an enumeration of all D+, -functions with the followingproperties:

(i) f is < j-ary,(ii) each DPI-function occurs in the list infinitely many times.

Let c, d1, d2.... be new constants. By compactness then the theory

TZ + -'0 (c, fl (c), d i) V -'0 (c, .f2 (c, d i ), d2) v ...

is consistent. Take M as a model of this theory.

Let M* C M be the set

M* {fi (c), f2 (c, dl), ...}

It holds that1. 1c, dl ....}CM*2. M* is closed under the Op, -functions.

This is because the projections are o +, -definable and because each +, -functionoccurs infinitely many times.

Lemma 7.6.2 then implies that M* TZ and it is Ev-elementary. That is,

M* TZ +Vx3y, -O(c, x, y)

(for x = f (c, d1.... , d1_1) take y = dj). This contradicts the hypothesis, andhence Theorem 7.4.1 is proved.

The last theorem in this section is a model-theoretic statement implying thewitnessing Theorem 7.2.3. In proving that theorem, one would like to reason asfollows. Assume that for A E Eb, SS does not prove A(a, f (a)) for any f E t'.By compactness then there is a model M and a E M such that

M Sz + -A (a, f (a))

for all f E t'. Take M* C M to be the subset of M generated from a by allp-functions. M* is a substructure of M and

M* 3xVy-'A(x, y)

Unfortunately the argument fails at this point as there is no apparent reason whyM* should satisfy S.

However, a chain construction similar to the one underlying Theorem 7.6.1 andLemma 7.6.2 works.

Theorem 7.6.3. Any countable model M of PV; has a Zb-elementary extensionM' such that

1.

M' Sz (PV;)

2. for any open PV1 formula 0 (x, y) with parameters from M' there is aPV; -term f (x) with parameters from M' for which

M' Vx3yO(x, y) - VxO (x, f(x))

Proof. Let M be a countable model of PV; and let 0 be an open PV; -formula withparameters from M. Let Thh(M) denote the II'-theory of M. If

i

PV, + Thai (M) I- Vx3y < t(x)0(x, y)

then by the Herbrand theorem (analogously to the proof of Theorem 7.6.1) thereis a PV,-term f(x) such that

Vx, f(x) < t(x) A0 (x, y)

will hold in every Eb-elementary extension of M. Otherwise there is either a c Msuch that

M=Vy<t(a)-4(a,y)

and that will remain valid in all E'-elementary extensions of M or, if there isno such element a, take a new constant c and form a model Mi of the theoryPV1+Thn (M) +Vy < t(c)-(c, y). Hence Vx3y < t(x)4(x, y) will fail in allextensions of M'.

Iterating this construction countably many times as in the proof of Theorem7.6.1 yields a model M' satisfying condition 2. However, that already implies thatM' is also a model of SS(PV; ): if * is an open PV;-formula and b E M' such that

M' *(0, u) A Vx < lblVv3w, r(x, v) -* (x + 1, w)

holds (with the bounds to v, w implicit in then by condition 2 also

M' Vx < IblVvi(x, v) - *(x + 1, f(x, v))

for some PV;-term f. As M' is a model of PV, there is w E M' of the form

where

But then

w = ( o . . . . . UIbl )

uo := u and uj+i := f(j, uj)

VJ<Ibl,/(J,u;)--* *(>+1,u;+i)

and by induction for the formula tr(x, ux) available in PV, t'(Ibl, ull,l) follows:That is, 3vi/i(Ibl, v) holds in M'. Q.E.D.

Following Zambella (1994) we derive Corollary 7.2.4 from Theorem 7.6.3.Assume that SS(PVi) proves Vx3y < t(x)4(x, y), where 0 is an open PV;-formula (every Eb-formula is equivalent to a Eb(PV,)-formula). Assume for thesake of contradiction that for no PV,-term f (x) does PV; prove Vx, f (x) < t (x)A¢ (x, f (x)). By compactness we may take a countable model M of PV, in which

M -(f(a) < t(a) A¢ (a, f(a)))

holds for some a E M and all PV, -terms f. Take M' to be the extension of M guar-anteed by Theorem 7.6.3. Clearly b'x3y < t(x)4(x, y) fails in M', contradictingthe assumption that the formula is provable in S.

7.7. Bibliographical and other remarksThe proof-theoretic method for proving the witnessing theorems for systems S%and Ti was developed by Buss (1986) and Theorem 7.1.4 is stated there. Thewitness formula (7.2.1) is also defined there. Theorem 7.2.3 follows from Buss(1986, 1990a) from which Corollaries 7.2.4-7.2.7 were also obtained.

The use of Herbrand functions for witnessing Theorem 7.3.3 comes fromKrajidek (1992); it was obtained by using a direct method in Pudlak (1992b).Theorem 7.3.5 and Corollary 7.3.6 are from Krajidek (1993).

Theorem 7.3.7 was proved by Wilkie (unpublished). Theorem 7.4.1 is fromKrajidek et al. (1991). Section 7.5 is based on Buss and Krajidek (1994). Topicsmentioned at the end of Section 7.5 are considered in Chiari and Krajicek (1994).

Theorem 7.6.1 is new. I do not know whether it is valid for Pstan, NPstan andcoNPstan in place of P, NP, and coNP. Lemma 7.6.2 and the model-theoretic proofof Theorem 7.4.1 follow Krajidek et al. (1991).

The first model-theoretic proof of Corollary 7.2.4 was obtained by Wilkie (un-published). The proof here follows Zambella (1994).

61

Definability and witnessing in secondorder theories

This chapter is devoted primarily to proving several definability and witnessingtheorems for the second order system U2 and V,, analogous to those in Chapters6 and 7.

Our tool is the RSUV isomorphism (Theorem 5.5.13), or rather the definitionof V,' (Definition 5.5.3), together with the model-theoretic construction of Lemma5.5.4.

The first section discusses and defines the second order computations. In the sec-ond section are proved some definability and witnessing theorems for the secondorder systems and further conservation results for first order theories (Corollaries8.2.5-8.2.7). The proofs are sketched and the details of the RSUV isomorphismarguments are left to the reader.

8.1. Second order computationsLet A(a, $`(b)) be a second order bounded formula and (K, X) a model of V, .By Definition 5.5.3 we may think of K as of K = Log(M) for some M = S'21

with X being the subsets of K coded in M. Pick some a, b E K of length n andsome pt(b). Then

(K, X) = A(a,,Bt(b))

if and only if (see Theorem 5.5.13 for the notation)

Mk-- A'(a,u)

where u codes pl(b). The length of u is thus 21t(b)l = 20(n)Moreover,if A E E, 'b then A' E Eb; hence a second order query to a E=l,b

oracle with the first order inputs of length n and the second order inputs (i.e., the

132

8.1 Second order computations 133

characteristic function of 0`(b)) of length 20(n) is just a query to a E'-oracle oflength 20(n)

Hence the second order computations in a model of V11 are analogous to the firstorder computations in a model M of S' stipulating that some inputs and outputs arefrom Log(M). With such a notion we shall be able to use the RSUV-isomorphism5.5.13 to characterize the functions definable in the second order systems.

One more remark concerns the time bound. If (K, X) V1 then M S3,2

where S3 is defined as SZ is but in the extended language L(#3) := L U {#3},where

x#3y = 21x1#IYI

Compare the definition of the function w2 (x) in Theorem 5.1.5. This is completelyanalogous to Lemma 5.5.4 as K is closed under # if the lengths in M are closedunder # if M itself is closed under #3. Similarly R3 is defined as RZ (Definition5.5.12) but in the language L (#3). We use a statement analogous to Theorem 5.5.13as a lemma, as we shall use it later.

Lemma 8.1.1. The RSUV-isomorphism (Theorem 5.5.13) holds identically forpairs of theories S3 and V21 and R3 and U.

In addition definability and witnessing theorems of Chapters 6 and 7 straight-forwardly translate to the theories in the language L (#3). Call #3-time a time boundt (n) where t is a term of language L. We have, for example, a statement analogousto Theorems 6.1.2 and 7.2.3: The theory T3 Ei+1(#3) -defines exactly the functionscomputable in the #3-time using a Eb-oracle.

In the following definition we identify a finite set i/rY with its characteristicfunction on the interval [0, y].

Definition 8.1.2. Let Q(x, t/!Y) be a second order oracle set, with x its first orderinput and *Y its second order input.

A Turing machine M with the time bound t (n), with first and second orderinputs and with the oracle Q, is required to satisfy the following conditions:

1. M has two read-only input tapes: a first order one with inputs of size nand a second order one with inputs of size < 21(n)

2. M writes the queries on a pair of write-only tapes. On the first order tapeM writes a string x of size < t (n), on the second order tape a string /Y ofsize < 2'(). The oracle answers YES/NO according to whether Q(x, tl,Y)holds or not, and erases the query tapes.

3. The total computational time is bounded by2'(n)4.M writes the outputs on two write-only tapes: one for a first order output,which can have size at most < t(n), and one for a second order output,which may have size < 2'(n).

134 Definability and witnessing in second order theories

TIME(2t("))Q denotes the class of the functionals computable by a Turingmachine M satisfying the conditions stated. Set

EXPQ := U TIME(2"k)Qk

and

EAPEi := U EXPQQEE`I,b

The space complexity of such a computation is the total space used on workingtapes of M. SPACE(s(n))Q denotes the class of the functionals computable bysuch computations with the space complexity < s(n) and

PSPACEFr,n U U SPACE(nk)QQEE`l,b k

We say that a machine from the definition computes a function, if it has nosecond order inputs or outputs (but it may ask second order queries).

Lemma 8.1.3. Let i > 0. The functionals from the class EXP',b

that are functionsare precisely the functions from the class EXP£p; for i = 0 this is EXP.

Similarly the functionals from the class PSPACEE;I'h

that are functions areprecisely the functions from PSPACEEP, and for i = 0 this is just PSPACE.

Proof The statement follows from the discussion before Lemma 8.1.1. The casesfor i = 0 follow from the observation that any Eo'b-property can be decided inPSPACE. Q.E.D.

8.2. Definable functionalsThe first result is analogous to Theorem 7.2.3.

In

Theorem 8.2.1. Let i > 1. The functionals from the class are preciselythose Eil'b-definable in V2. In particular, the functions Ej'b definable in V2 areprecisely the functions from the class EXP,

Proof Let F(a, a) = (b, $) be a functional (for simplicity we suppress the su-perscripts in the second order variables). Let (K, X) be a model of V2, a, b EK, a, 6 E X such that

(K, X) F(a, a) = (b, ,B)

Take M S3 s.t. K = Log(M) and s.t. X are the subsets of K coded in M, andlet a=u,8=ufor u,vEM.

Assume that a E,1,b A(a, a, b, ,B) defines the graph of F in (K, X).Then the *formula A1(a, u, b, v) defines the graph in M (cf. Theorem 5.5.13for the definition of A 1). As A defines F provably in V2, by Theorem 7.2.3 A 1defines a p(#3)-function f provably in S3. On the input a, u with Jal = n andJul = 2"0 1' the running time is 2"0(0, and it queries a Eb 1-oracle.

The same machine then computes F when a, P are identified with their char-acteristic functions., and the oracle becomes, by the remarks before Lemma 8.1.1,

Iba Ei'b -oracle. Thus F is in

The extra statement for i = I follows then from Lemma 8.1.3. Q.E.D.

If we would augment the machine in Definition 8.1.2 by a witness-oracle ratherIb

than just an oracle (cf. Definition 6.3.1) then it would not change the class EXP£i 'This is because the number of queries is unlimited so M may ask consecutivelyfor all < 21 (') bits of a witness. However, when the number of queries is bounded,then this trivial simulation does not apply and we get an apparently distinct class.

Definition 8.2.2. Class EXPEl1'6

[wit, poly] is a class of multivalued functionalscomputable by a Turing machine M in time 2"

0(1)equipped with a witness-oracle

from Ei 'b satisfying all conditions of Definition 8.1.2 and in addition satisfying1. If the oracle answer to the query (x, k) is YES then the oracle writes on a

special write-only witness tape a witness for the Et1,b formula Q(x,2. The total number of queries made is bounded by a polynomial n 0(1).

IbLemma 8.2.3. Let i > 0 and assume that values off e EXPEL [wit, poly] areonly first order.

Then, in fact

f (= PSPACE, i1'b

Proof. Computing a particular bit of the value off does not require witnesses tothe oracle answers, analogously to Lemma 6.3.4 and Corollary 6.3.5.

For f with only first order values we may compute each bit of the outputseparately. By Definition 8.1.2 there are at most n 0(1) bits in the output and in acomputation of each the machine asks n0(1) queries. So f is computable by anEXP£t

I'b-machine with onlypolynomially many queries to the oracle: Call the class

of the functionals computable in this way EXPEp [poly]. Completely analogouslywith Theorem 6.2.3 (condition (i) of that theorem) we get

PSPACE'+'b =EXPE;I.b

[poly]

Q.E.D.

Theorem 8.2.4. For i > 1, multivalued functionals Ei 'b-definable in U2 are

£precisely those from the class EXP!-I [wit, poly]. In particular, the functions

'b-definable in UU are those from the class PSPACE

i > 2 the same is true with V2iin place of U2.

Proof. In the RSUV isomorphism (Theorem 5.5.13) the theory UU corresponds toRi Vi-1 to Sz-1 and E1'b to Eb.3 2 3 i I

By statements analogous to Theorems 6.3.3 and Corollary 7.3.6 for the #3-time, the multivalued functions Eb-definable in 53-1 are precisely those from the

classTIME(#3)Ep

1 [poly]. Hence the RSUV-isomorphism implies the part of thetheorem for Vi -1.

To prove the part for UU we establish a statement analogous to Theorem 6.3.3and Theorem 7.3.6 for the theory R3 in place of S. In the proof of the witnessingTheorem 7.3.5 the number of queries in the functions witnessing the sequents in aproof increases by a constant for all LKB rules except for the cut, by a factor of 2for the cut rule, and by O (log n) queries in the case of the E b 1 P1ND rule. Now,if the rule

4(la/2J) - O(a)0(o) - O(ItI

of R3 from Definition 5.5.12 is used instead of the EP 1 PIND, the witness con-tains a witness to the existential quantifiers of the Eb-formula 0, and hence thef u n c t i o n g witnessing the lower sequent is obtained f r o m the function $1 witness-

ing the upper sequent by < I I t I I iterations. The value of l i t I I is (log n)° (1 ) for thelanguage L (#3). Hence if the number of queries asked in the computation of $1 is(log n) 0(1), so is the number of queries asked in the computation of g.

Thus a witnessing argument identical to that in the proof of Theorem 7.3.5shows that the multivalued functions *definable in R3 are in the class

TIME(#3)En

1 [(logn)o(1)].The proof of Theorem 6.3.3 needs no change at all to show that S3 can

Eb-define all functions from TIME(#3)[(logn)0(1)], and hence also R3 asS3 c R3 (see the remark after Definition 5.5.12).

This gives the wanted characterization of the functions Eb-defmable in R3", andhence the theorem is proved. Q.E.D.

Corollary 8.2.5. For i > 1, the theory U2+1 is `dE11+1-conservative over V2.Also the theory R3+1 is `dEi+1-conservative over S.

Proof. The corollary follows from the previous theorem similarly to the way Corol-lary 7.2.4 follows from Theorem 7.2.3. Q.E.D.

From this statement and its proof we can deduce three more interesting corol-laries. Recall that B(E11'b) denotes the class of Boolean combinations of E11'b-formulas, AC is the axiom of choice scheme (cf. Lemma 5.5.8), and BBEb is thesharply bounded collection scheme (cf. Definition 5.2.11).

Corollary 8.2.6. For i > 1, the theory U2+1 is d13(E,l+l)-conservative over thetheory V2 + E"+l AC.

Also the theory R3+I is V8(Ei+1)-conservative over the theory S3+ BBEi+1.

Proof. Work with the first order systems. As any Boolean combination can bewritten in a conjunctive normal form, it suffices to prove the statement for a formulaof the form

nvawith r E rli+1 and a E E,+1, that is, for sequents of the form

err -) aThis is a sequent of Ei+l-formulas, and hence the witnessing theorem for R3+1(outlined in the proof of Theorem 8.2.4) implies that R3+' proves the sequent

i+l,a i+1 a3w Witness_7r (w, a) - 3w Witnesses , (w, a)

which is, by Corollary 8.2.5, provable in S3 too.Hence it would be enough to show in S3 the equivalence

3w' Witnesses 1°a(w', a) - a(a)

By Lemma 7.2.2 (a statement analogous to it for the #3-time), however, this isprovable in S3+

This proves the theorem for the first order case and by RSUV-isomorphism alsofor the second order case. Q.E.D.

The next corollary is related to Lemma 5.2.9.

Corollary 8.2.7. For i > 1, both S3 and S2 prove the 0+1 FIND scheme.Also VV proves the L '1 PIND scheme.

Proof. As with Lemma 5.2.9 one can show

R3I-0+1 PIND

An instance of PIND for a Ab+l-formula is `dEl+l; hence Corollary 8.2.5 impliesthat

S3 H Ae+ PIND

too.This proof does not apply to SS but we may argue as follows. By Lemma 5.2.13

we know that SS + BBiE+ is `dEt+1-conservative over S2. It is thus sufficient toprove O+1 PIND in SS + BBE<+1. We have

SS + BBEi+1 F- Yx3ydt < IX I, a(t) = t E y

for any A +, -formula a, as follows from BBE +I applied to the formula

Vt < IxI3y, (y = 1 A Or (t)) V (y = 0 A -or (t))

Hence LIND for a follows from Ab-LIND for t E y, which is provable in S.0i+1 PIND is a special case of E;+1 PIND, and hence provable in U2+1 and

its instance is a VEi1+1-formula, hence by Corollary 8.2.5 also provable in V2 .Q.E.D.

We conclude this section with a useful statement.

Corollary 8.2.8. For i > 1

Vz I- At+1 AC

Proof. By Lemma 5.2.8 U2+1 proves 0i+1-AC and by Corollary 8.2.5 it is prov-able in V2 too. Q.E.D.

8.3. Bibliographical and other remarksThe treatment of the second order computations follows Buss (1986) and Buss etal. (1993). Theorem 8.2.1 is from Buss (1986); Definition 8.2.2 and the statements8.2.3-8.2.8 are from Buss et al. (1993).

We did not use the proof-theoretic treatment of the second order systems butwe at least remark on the sequent calculus formalization. Namely, the Eo'b-CAscheme becomes provable from the introduction rules for the second order 3

I' - A, A(0/B)P A,30A(0)

where A is any formula and B is Eo'b. The CA for B then follows by applying therule to

Vx < a, B(x) __ B(x)

quantifying only one occurrence of B.

9

Translations of arithmetic formulas

We shall define in this chapter two translations of bounded arithmetic formulasinto propositional formulas and, more importantly, we shall also define translationsof proofs in various systems of bounded arithmetic into propositional proofs inparticular proof systems.

In the first section we shall consider the case when the language of IDo isaugmented by new predicate or function symbols, and the case of the theories U11and V11. In the second section we treat formulas in the language L and the theoriesS2, T2, and U'2*

In the third section we study the provability of the reflection principles forpropositional proof systems in bounded arithmetic and the relation of these re-flection principles to the polynomial simulations. In the fourth section we presentsome model-theoretic proofs for statements obtained earlier. The final section thensuggests another relation of arithmetic proofs to Boolean logic, namely the relationbetween witnessing arguments and test (decision) trees.

9.1. Bounded formulas with a predicateFirst we shall treat the theory I00(R) and then generalize the treatment to thetheories U1 and V 1 . Instead of IDo(R) we could consider the theory 1 1,b butthe presentation for the former is simpler. The language LPA(R) of IDo(R) is thelanguage LpA augmented by a new binary predicate symbol R(x, y).

Definition 9.1.1. Let9(al, ... , ak) be a boundedformula in the language LPA(R)and let pig be propositional atoms, one for each pair (i, j) E N x N.

F o r n 1, ... , nk E N we define the propositional formula (9)(n, ,...,nk) by induc-

tion on the logical depth of 9:

139

140 Translations of arithmetic formulas

1. if 9 is the atomic formula s(a) = t(a) then

1 ifs (n) = t(n) is true(e)(n)

0 ifs(n)=t(n)isfalse

2. f 9 is the atomic formula s(a) < t (ii) then

11 ifs(n) <t(n)is true0 ifs(n) <t(n)isfalse

3. f9 is the atomic formula R(s(a ), t(a )), s(n) = i and t(n) = j then

(0)(n) := Pij

4. if 9 = - then

(6)(n)

5.

(6>(n) M(n) 0

6. f9 =2x <s(a) v(a,x)and s(n) =uthen

M(n) V (v)(n,m)m<u

7. if9=`dx<s(a)v(a,x)and s(n)=uthen

(9)(n) A (v)(n,m)m<u

In clause 6 (resp. 7) the disjunction (resp. the conjunction) is formed from thebinary connectives with the brackets associated, for example, to the left.

The next lemma is proved by induction on the complexity of 9.

Lemma 9.1.2. Let O(a) be a bounded formula in the language L PA (R). Thenthere are d and 2 such that for every n

1. dp((9)(n)) < d2. I(0)(n)I < (max(i)+2)

Note that we used the depth of 9 and not the logical depth, as the connectivesin clauses 6 and 7 of the preceding definition are of unbounded arity.

The theory I Do (R) is defined exactly as the theory IDo is in Section 5.1 exceptthat the axiom scheme of induction IND is accepted for all bounded formulas inLpA(R). The theories T2 (R), SS(R), and S2(R) are defined analogously.

Theorem 9.1.3. Let 9(a) be a bounded formula in L pA(R) and assume

IDo(R) I- `dx9(x)

Then there are d and £ such that every propositional formula (0) (n) has a depth dF proof of size at most n e.

Moreover, there is a polynomial time algorithm producing on input 1 ... 1(n-times) a depth d F -proof of (0)(n).

Proof. We shall describe the construction of a depth d size ne LK-proof of thesequent

(0)(n)

and it will be obvious that the required algorithm exists. This is equivalent to therequired task by Lemma 4.4.15.

By cut-elimination Theorem 7.1.4 (straightforwardly modified for IDo(R))there is an LKB-proof 7r using the Do(R)-IND rule of the sequent

-) 0 (a)

A sequent in the proof it has the form

01(b),...0,(b) *1(b).... *,(b)

where all ¢i, ilij are io(R) (by Corollary 7.1.5). By induction on the number ofinferences above the sequent in it, prove that there are d and f such that for anytuple n the sequent

(01)(7i),...,(0r) (n) ) (1I)(n),...,(Y's)(n)

has a depth d size (max(i) + 2)e LK-proof.By Definition 7.1.1 all initial sequents have the form A -* A, A atomic, or

-* A, A an axiom of PA-. In the former case the propositional translation iseither 0 -f 0, 1 1, or pig - pig. In the latter case the translation is of theform -+ r, where r is a true Boolean sentence (i.e., without atoms). Moreover,the depth of r is constant (= the maximal logical depth of a PA--axiom). Anysuch sentence has a dp(r) LK-proof of size O(Ir1).

The case when the sequent was obtained by structural or propositional rules orby the cut-rule is obvious: The same rules of propositional logic should be appliedto the propositional translations of the upper sequents.

For the closed terms t < s(n) is a formula of the form (17(t))(n) one of thedisjuncts of (3x < s, >)(x))(n ); thus 3 < : right rule is simulated by repeated(polynomially many times) V : right rule of LK.

For the V < : right inference

a <t,l' - A,A(a)I'-*O,Vx<tA(x)

assume that f o r each a = 0, 1, ... , val(t) there is an LK-proof of the translationof the upper sequent with the required properties. Then all a < t translate to 1,

and thus can be cut out with the initial sequent -* 1, and the sequents obtainedare joined by repeated applications of the A : right rule for a = 0, 1, . . . , val(t).Hence the size of this translation is val(t)°(1) = max(n)°(1). The left quantifierrules are treated analogously.

Finally, the IND-rule

A(a), r --* A, A(a + 1)A(0), r A, A(t)

is simulated by applying the cut rule to the LK-proofs of the translations of theupper sequent f o r a = 0, 1, ... , val(t) - 1. Q.E.D.

Corollary 9.1.4. Assume that T is a theory defined as I Ao(R) except that thelanguage of T might be richer than LPA(R) and that T contains finitely manybounded axioms besides the induction axioms. Let f (n) be a non-decreasingfunc-tion such that any term in the language of T can be majorized by some iterationof f(n).

Then if 0 (a) is a bounded formula in the language of T and

T I- 9(a)

there are d and e such that everypropositional formula (0) (n) has depth d LK proofofsize < f(e)(n) (the fth iteration of f).

Proof. The argument is entirely the same as in the preceeding proof, estimatingthe values of the terms in the axioms, in the quantifier rules, and in the IND-ruleby iterations of the function f (n) rather then by the terms themselves. Q.E.D.

We shall extend the simulation from Theorem 9.1.3 to the second order theoriesU1 and V1' .

First we have to amend Definition 9.1.1 as the language of Ull and V11 doesnot contain symbol R(x, y) but has second order variables at and new atomicformulas at = /V and s E at.

For each variable at (Q) and U E w introduce new atoms po, p1, ... , p wherev = val(t (u)), and define

1. (as(x) = Pt(Y))(n,m) Ai<s(n)(Pi = qi) A As(n) t(m) is defined analogously.

.

t(Y) _(s(x) E

qn

toifu=s(n) <t(m)otherwise

where atoms pi and qj are associated with as and Pt.A sequence of formulas 91, ..., 9k is called an EF-sequence if it satisfies

the conditions of Definition 4.5.2 to be an EF-proof with the condition that noextension atom appears in 9k dropped.


Theorem 9.1.5. Let A(x) be a Eo'b -formula and assume that

V11 F- VxA(x).

Then the formulas (A (x)), have polynomial size EF-proofs.

Proof We shall consider V11 formalized in the sequent calculus with the Ell 'b-

IND-rule in place of the E1,b-IND-axioms and with the introduction rules for thesecond order quantifiers replacing Eo'b-CA (cf. Lemma 5.5.4 and the discussionof Eo'b-CA in Section 8.3).

Assume that it is a V11-proof of the sequent -* A (a); w.l.o.g. we may assumethat all formulas in it are These are E,1'b-formulas in which all secondorder quantifiers precede all first order quantifiers and all connectives (a notionanalogous to strictEb from Lemma 5.2.15).

By induction on the number of steps in it above a sequent show that if

3*lBl(x,a, 1), ...>3*.B,,(x,a,*u)

is a sequent in it, then there is a constant k such that for all m there is an EF-sequence of size at most (max(m) + 2)k ending with the sequent

(CoiWce, -P"

and such that in this EF-sequence none of the atoms p"` or pr" correspondingto a free second order variable a; (resp. to a second order variable *j) from anantecedent is an extension atom.

The construction follows the proof of Theorem 9.1.3 and we need to treat onlytwo new rules: the introduction of the second order 3 to the succedent and EI.b_

IND (the introduction of the second order 3 to the antecedent does not change thetranslation).

Assume that in the former case the minor formula of the inference is

t EC X, a, E(X,t,a)

with both C, E E Eo'b, and that the principal formula is ii, i ). Introducea new atom p7 _ Then the equivalence

(C)m p", (E)m t= (C)m(p", h )

can be derived from the new extension axioms by an F-derivation of size

0 ((I (C); l + I (E)m,,1)2) = (max(m +2)0(1)) ,


as t is implicitly bounded in E by a power of max(m ). This concludes the firstcase.

Now consider the Ei'b IND inference

3 bC(b, b) --* 34b+I C(b + 1, sib+1)

34oC(0, 0) - 34.C(n, sin)

(the other free variables and the side formulas are omitted for simplicity). Bythe induction hypothesis we have polynomial size EF-sequences ending with theformulas

(C)m,u(p ") -+ (C)m,11 1

(phi,+ )

for u = 0, 1, ... , n -1. Joining these sequences by n -1 cuts gives an EF-sequenceending with the implication

(Oi (Oi ,n(p'")

of total size polynomial in max(m, n).

As the formula A is Eo'b, atoms in (A)O correspond to free second ordervariables in A and hence cannot be the extension atoms. Thus the final EF-sequenceis, in fact, an EF-proof. Q.E.D.

The proof can be modified to yield the following statement.

Theorem 9.1.6. Let A(x) be a Eo'b formula and assume that

Ull l- VxA(x)

Then the formulas (A (x)), have F proofs of size n(logn)0(')

Proof. By Theorem 9.1.5 there is an EF-proof of (A (x)) n of size no(). Moreover,by its proof in these EF-proofs will be introduced only (log n) 0(1) extension atomsas the U,1-proofir uses EI'b There will beO((logn)')extension atoms if there are k nested induction inferences.

Having such an EF-proof, replace in it the last introduced extension atom by itsdefinition: This yields a new EF-proof of size n 0(1). n 0(1) with one less applicationof the extension rule. Repeating this procedure until all extension axioms areeliminated produces an F-proof of size n(logn)0(" Q.E.D.

9.2. Translation into quantified propositional formulasWe define a translation of EO.-formulas of the language L into the quantifiedpropositional formulas.

Let A(a1.... , ak) be a E,,-formula. As all terms in L are polynomial timefunctions there exists a polynomial PAW such that for n 1, ... , nk the truth value ofA(n 1, ... nk) can be computed in the interval [0, 2PA(m)], where m = maxi (In; I).

Some canonical PA (x) is easy to define by induction on the logical complexityof A and we shall assume that such polynomials are fixed. Any polynomial q (x)majorizing PA (x) is called a bounding polynomial of A.

We shall also need to express the bits of numbers, and for better readability weshall write the function bit(a, i) as a(i). Hence

n=i<InI

Define a (i) := 0 for i > j a 1.For B a quantified propositional formula with atoms (po, pi , ...), B(n)

denotes B with the bits n (i) substituted for pi.Any term t(al,... , ak) can be computed f o r lal I, , lakI < in by a Boolean

circuit Ct (by Theorem 3.1.4) with k (m + 1) bits of input and poly(m) bits ofoutput (the bits of the value oft (a )). Moreover, the size of Ct is polynomial inin. If we introduce new atoms for every node of the circuit, the statement "Ct oninputs pl...... k outputs q " can be expressed by a Eq-propositional formula(saying that there exists a computation of Ct), which we shall denote

Bm(P1...... Pk,g)

Such a formula is defined precisely by induction on the complexity of the term tand we leave it to the reader. Note that

AM (P1,...,pk,q)I =mo(1)

Finally, recall Definition 4.6.2 of the systems G, G and G*.

Definition 9.2.1. Let A (al, ... , ak) be a bounded formula in the language L ofS2. Let q (x) be a bounding polynomial for the formula A.

For every in we construct a quantified propositional formula

IJA llq(m)

with the atoms p;, i = 1, ... , k where each pt = (p°, ... , p9(m)). We proceed byinduction on the logical complexity of A:

(a) For A the atomic formula t (ii) = s (i7) define

llAllq(,n) :=3xo,...,Xq(m),Yo,...,Yq(m),Bt(Pl,...,pk,gjlxj)

ABs(P1..... k,gjlyj)A A Xi=Yti <q(m)

where we define that tuples pl...... k, q in B have q(m) + 1 bits, withp! for j > in not occurring and with not all qj necessarily occurring.

(b) For A the atomic formula t (ii) < s (ii) define

-IlAllm

y(m) 3x0, ... , xq(m), Y0, ... , yq(m), Br(P1, ... , Pk, gj/xj)

A B5(P1,..., Pk,gj/yj)

A A (( A xj = Yj) n x; --+ yl )O <i <q(m) i+1 <j <q(m)

(the last conjunct defines the lexicographic order on .z, y).(c) For A = -A1 define

IIAIIq(m) -IIA1IIg(m)

(d) ForA=A1oA2,o=V, A, define

IIAII'q (m) = IIA1llq(m) o IIA2IIq(m)

(e) For A(a) = 3x < I t I Al (a, x) define

'IAIIq(m) V 11b I q (m) I.F o r A (a) = 3x < t A l (a, x) with t not o f the f o r m I s I define

IIAllq(m) :=3xo,...,xq(m)IIb <tAAl(a,b)Iiq(m)(g/x)

and f o r A (a) = `dx < t A (a, x) with t not o f the f o r m I s I define

IIAIIqm) :=dxo,...,xq(m)llb <_ t -* A1(a,b)llqm)(glx)

where ;Y is the tuple associated with b.

Lemma 9.2.2. Let A be a bounded formula and q (x) its bounding polynomial.Then it holds that

1. There is a polynomial r(x) such that for all in

I IlAll m)l < r(m)

2. If A E Eo then II A Ilq(m) is Oq in G i . That is: Provably in G i the formulaII A II q(m) is equivalent to a E9 formula and to a 119 formula.

3. If A E Eb then I!Allq(m) is provably in G* a Eq formula.

Proof. Condition 1 is obvious from the definition of 11 All q'(.). For condition 2 it is

sufficient to verify the claim for atomic formulas, as translations of the Eo-formulaare formed from translation of atomic formulas by Boolean connectives only.

For A of the f o r m t = s, II 4ll q'(m) can be expressed as

dxo,...,xq(m),YO,...,Yq(m),Bt(PI ...... Pk,gjlXJ)

ABs(PI...... Pk, gj/yj) - A xi = Yii<q(n:)

and f o r A= t < s, I I A I I q (m) can be expressed as

VX0,...,xq(m),YO,...,Yq(m), Bt(Pl,..., Pk,gjlxj)

ABS(Pl...... Pk,qjlYJ)--+^ A xi=Yi Axi -* Yii $q(m) i+1 <J<q(m)

That these formulas are equivalent to the original ones depends on the fact that acircuit has a unique computation on an input, and the provability of the equivalencerests on the provability of the formulas

`dxo, ... , xq(m), YO..... Yq(m), Bt(PI...... Pk, gilxi)

ABt(pl...... Pk,gJIYJ) A xi =Yii<q(m)

(and similarly for BS).To see that this formula is provable in G* in size poly(m) assume that Bt has

the form

Bt :=3w1,...,wr(m)Dt(Pl,...,Pk,w,x)

where Dt is quantifierfree, the bits w I, ... correspond to the nodes of the circuitCt, and Dt is a conjunction of < r(m) conditions saying that the value at the nodew, is computed from the values at the incoming nodes according to the definitionof Ct, and that pi's are the inputs and Y's are the outputs. Now assume w.l.o.g.that Ct computes we's in the order w1, w2..... Then by induction on £ constructGo-proofs of the implications

D,(1 ,.. .)5k,w,x)ADt(Pl,.. >Pk,v,Y)- /\wi<e

vi

This needs 0(t) steps, each of which has size poly(m): That is, the total size ispoly(m). This proves condition 2.

Condition 3 follows from 2 by induction on the logical complexity of A.Q.E.D.


Lemma 9.2.3. Let A E Eo, t be a term and q (x) a bounding polynomial for A (t).Then for every in there are size m 0(1) G proofs of

Ilt = a - A(a)llq(m) -' IIA(t)llg'(m)

Moreover, these proofs are constructible from m by a polynomial time functiondefinable in SS .

Proof. The lemma is proved by induction on the logical complexity of the formulaA and the term t, and we leave it to the reader.

The definability of the proofs in S21 follows immediately. Q.E.D.

Lemma 9.2.4. Let A bean axiom of BASIC and q (x) a bounding polynomial of A.Then for all m there are size m 01' 1 G *-proofsproofs of the formula 11 All'.).). Moreover,

these proofs are constructible by a polynomial time function definable in S.

Proof. This lemma is a special case of a theorem of Cook (1975) . That theo-rem says that the translations of all axioms of PV have polynomial size extendedresolution proofs and hence by Lemmas 4.5.8 and 4.6.3 also G'-proofs.

Such proofs are constructed in two steps: First find a PV-derivation of A (Aexpressed as an equation) consisting of the equations El, ..., Et, and then byinduction on i construct size poly(m) proofs of the formulas II Ei II m), wherer(m) is a bounding polynomial for all El , ... , Ee.

The first step is straightforward as the axioms of BASIC mostly state just therecursive properties of functions that are used in their PV-definition by the limitedrecursion on notation.

The second step is also simple. The only nontrivial part is a simulation of ruleR5 in Definition 5.3.2, which is handled in the same way as the proofs of the lastimplication in the proof of Lemma 9.2.2. That is: The equality ll fi = f211g(m) ofthe new functions introduced in the rule is proved by induction on the length of y.Q.E.D.

Theorem 9.2.5. Let i > 1 and A(a) be a Eb formula. Assume that

TZ F- A (a)

Then there is a bounding polynomial q (x) for A such that for all in formulas

11 A Il (m) have size m 0(1) G; proofs.

Moreover, these G; -proofs are definable in S.

Proof. For this proof it is convenient to extend the system G by two derived rulesfor introduction of implication:

Impl : leftr ) A,B C,F -

B -+ c, r-) A

Impl : right

B,r-*A,Cr - A,B -C

Recall that B - C is an abbreviation of -B v C and the preceding rules justabbreviate these two derivations, left : from r - A, B derive -B, r -± A andthen B --* C, I" - A, and right : from B, I' - A, C derive r - A, B, Cand by two V : right rules and a contraction derive r - A, B -+ C.

Assume that

TZ F- A(a)

By Corollary 7.1.5 there is an LKB-proof 7r with the Eb-IND rule of the sequent

A(a)

in which all formulas are in Eb U IZb.Choose q (x) to be a bounding polynomial of all formulas occurring in the proof

ir. The idea of the simulation of Tr by a G; -proof is to translate every formula B inn into 11BIIq(m), and possibly fill in some derivations to obtain a valid G,-proof.

As usual we proceed by induction on the number of inferences above a sequentin it of the form

r- A

to show that the sequent

Ilrll -± IIAII

has size m °') G1-proof, where for r = (A 1, ... , Ak) the symbol 11 I'll abbreviatesthe cedent 1 1A1 1 1 '( .) , , 1 1 A k l l (m )

The case of the initial sequents, that is, the axioms of BASIC, the equalityaxioms or the logical axioms of the form B -* B, B atomic, are obvious withthe exception of the axioms of BASIC, for which the statement follows fromLemma 9.2.4.

The case of the structural or of the propositional rules or the cut-rule is handledby the same rules of G; .

Consider now the V < : right rule

a <s, r A, B(a)r - A, Vx < sB(x)

There are two possibilities: s is or is not of the form It 1. For the former let the setX be

X:={EE{0,1}*IEi=O fori> Iq(m)II

and first derive the sequent

V Ila = E AT < Itl II Ila < Itl II (S1)EEX

and the sequent

IIB(a)II - ) IIB(a)II (S2)

and from these two derive by the introduction of the implication to the left and tothe right

Ila<ItI-B(a)II IIVa=EAT <Iti---> B(a)II (S3)EEX

Then derive the sequent

V Ila = E A E < Itl II - 11B(a)II A Ila =A E < Itl -+ B(a)II (S4)EEX EEX

and by the cut rule applied to (S3) and (S4) derive

Ila<Itl-B(a)II ) AIla=EAT<ItI--* B(a)II (S5)EEX

Separately derive

Alla=EAE<ItI-kB(a)I1-Alla<Itl-B(a)II(p/E) (S6)EEX EEX

and by the cut-rule applied to (S5) and (S6) also

Ila < Itl - B(a)ll(P) - A Ila < Itl -> B(a)ll(P/ ) (S7)EEX

From the upper sequent of the simulated inference infer by Impl : right

IIrll-+II All, Ila<s-+B(a)IlAlla<Iti-*B(a)ll(p/E)EEX

and by the cut with (S7) infer the translation of the lower sequent of the simulatedinference.

This shows the simulation ifs = I t I. Assume now that s does not have the formI t I. From the upper sequent by Impl : right derive

or -f iill, la < 5 B(a)li

and by q(m) + 1 inferences d : right applied top associated with a derive thewanted sequent

Ilrll - IIoll, Ildx <sB(x)II

Now consider the rule V < : left

B(t), r -, At < s, Vx < sB(x), I` A

Assume first that t = Irl for some term r. In both cases we shall first derive thesequent

lit < sli, Il`dx < sB(x)li -f IIB(t)II (So)

and by the cut-rule with the upper sequent obtain the wanted sequent

lit < sil, IIVx < sB(x)II, IIFI IIAII

The cases differ as to the derivation of (So). If t = I r I derive

lit ri Ii Vlit=aAa<irlll(p/E) (S1)EEX

and then

IIdx IrI B(x)li, VEEx ll t = a Aa < Irl II(P/E )

V lit=aAB(a)ll(P/E)EEX

By the cut-rule applied to (S1) and (S2) derive

(S2)

lit < Irl II, IlVx < IrIB(x)ll -f V lit = a A B(a)ll(P/E) (S3)EEX

By Lemma 9.2.3 applied to the formula B(a) derive the sequent

V lit =aAB(a)ii(PlE)) IIB(t)II (S4)EEX

and by the cut derive (So) from (S3) and (S4).Assume now that t is not of the form IrI. First derive

lit<sll-*3xlla<sAa=tll(P/x) (SO)

and

(S2)

From (Si) and (S2) the cut-rule yields the sequent

lit <_ sll, IIdx < sB(x)II - 3x Ila = t A B(a)II(plx) (S3)

Using Lemma 9.2.3 also derive

3xlla=tAB(a)il(P/x)-fIIB(t)II (S4)

and the sequent (So) can be obtained from (S3) and (S4) by the cut-rule.

This concludes the treatment of the V < : introduction rules. The cases of the3 < : introduction rules are dual and we leave them to the reader.

It remains to treat the Eb-IND rule

B(a), I' -* A, B(a + 1)B(O), F - A, B(t)

Assume that the atoms p are associated with the variable a and the atoms q withthe term t. For simpler readability we shall omit the side formulas I', A.

The first idea would be to simulate the IND-rule by repeated cuts applied to theupper sequent for a = 0, 1, ... , t - 1. This would, however, have an exponentialsize. We shall instead shorten the simulation by the substitution rule, which is bythe proof of Lemma 4.6.3 available in G.

Assume that we have the sequent

IIB(a)II IIB(a+1)11 (S)

Derive the sequents

IIB(a)II ---) IIB(a+2')II (U1)

for i = 0, ... , q(m). The sequent (U0) is just (S), and (Ui+1) is derived from(U;) as follows: Let F be a (q (m) + 1)-tuple of new atoms. By substituting rd's forp in (U;) derive

IIB(a)II(p/F)---) IIB(a+2')II(p/F) (V)

Using the equality axioms whose translations are shortly provable derive

Ila+2' =b1I(p,F),IIB(a+2')II(T)- JIB(a)II(TI) (V2)

where F is associated with the variable b. By the cut-rule applied to (U;) and (V2)get

Ila+2' =bll()5,F), IIB(a)II(p) -* IIB(a)II(p/ ) (V3)

By the cut obtain from (VI) and (V3) the sequent

Ila+2' =bII(p,F), IIB(a)II(p) - IIB(a)II(p/ ) (V4)

Again using the equalit, :ioms derive

Ila+2' =bll(p,F. IIB(a+2')II(7i/F) -> IIB(a+2'+')II(7i) (VS)

and by the cut from (V4) and (V5) get

Ila+2' =bll(p,F), IIB(a)II(p) - IIB(a+2'+')II(p) (V6)

Derive

3xlIa+2' =bII(p,x) (V7)

9.2 Translation into quantified propositional fonnulas 153

and from (V6) by (q (m) + 1) 3 : left rules

37 1la+2'=bII(P,7 (V8)

and by the cut-rule from (V7) and (V8) the sequent (U,+1).The next idea is to derive the sequent

112' > bll(r), JIB(a)II(P) - IIB(a+b)ll(P,T) (Si)

consecutively for i = 0, ... , q(m). The sequent (So) follows from (Uo), usingsimple

112°>bll(r)- Ila=bva+l=bll(P,T)Assume that we have (Si) and derive (S;+i) as follows. Let c, d be new variableswith the associated atoms u, T. By substituting u for P and v for r in (Si) derive

112' > cll(u), JIB(d)II(v) IIB(d+c)II(v,u) (Z1)

From (U,) derive

(Z2)

and by the cut from (Z I) and (Z2) the sequent

112'>cll(u),Ila+2'=d11(P,3),IIB(a)11(P)-IIB(d+c)11(v,u) (Z3)

From (Z3) get

(Z4)

From the sequents (Si) and (Z4) infer by V : left

112' > bv(2' > cAb = 2'+c)II (Y, ii), II B(a)II (p) - II B(a+b)ll (P, Y) (Z5)

By (q (m) + 1) -applications of 3 : left to u get

3x112`>bv(2'>cAb=2'+c)Il(T,x),IIB(a)II(P)(Z6)-) IIB(a+b)ll(P,r)

Separately derive

112'+' >bll(r)3x112'>bv(2'>cAb=2'+c)11(Y,x) (Z7)

and by the cut-rule applied to (Z6) and (Z7) derive the sequent (S,+1).Having (Sq(m)) substitute 0 for P and q for r to obtain

112g1m) > t11(q), IIB(O)11 -) IIB(t)II(q)

Since the sequent

112q(m) > tIl(q)

has a simple proof, the cut-rule applied to these two sequents yields the wantedsequent

II B(o) II II B(t) II

This concludes the proof of the theorem. Q.E.D.

Theorem 9.2.6. Let i > 1 and let A(a) be a Eb formula. Assume that

S2 H A(a)

Then there is a bounding polynomial q (x) for A such that for all m the formulasIIAIIg(m) have size m 0(1)

Moreover these proofs are Eb-defnable in S.

Proof. The simulations constructed in the proof of Theorem 9.2.5 are all treelikeexcept the simulation of the Eb-IND rule where the sequents (U,) and (Si) areused repeatedly. But in S2' the Eb-IND is replaced by the Eb-PIND rule

B(Lal2)), r - A, B(a)B(O), r - A, B(t)

which can be simply simulated by < (q (m) + 1) cuts. Let it be a G; -proof of

-) 11B(a)II

Substituting in the whole proof for atoms p associated with a consecutively thebits of t, Lt/2J, ..., 1, 0 get < (q(m) + 1) proofs nt, .r.Zj, ... , iri, pro each ofsize m°(l). Joining these proofs by the cut-rule entails the wanted sequent

11B(0)11 --> IIB(t)11

Q.E.D.

Corollary9.2.7. Assume that the equation t = sin the language ofPV is provablein PV. Then there is a bounding polynomial q(x) for t = s such that for all mformulas 11t = s ll q (m) have size m °( 1) EF proof.

Proof. From PV I- t = s follows S2' (PV) I- t = s and by the preceding theorem(trivially modified to the language of SS (PV)) the formula ll t = s ll q gym) has size

m 0(1) G *-proofs. The corollary then follows from Lemma 4.6.3. Q.E.D.

The rest of this section is devoted to the extension of the previous simulationtheorems to the theory U2 and to the full system G. Such a simulation is not un-expected as Uz relates to PSPACE by Theorem 8.2.4 and the set of tautological


quantified propositional formulas is PSPACE-complete (cf. Balcazar et al. 1988).In fact, Dowd (1979) proposed an equational theory PSA related to PSPACE anal-ogous to the relation of PV to P and showed a simulation of PSA by G. His trans-lation, however, was not the II . . . II translation. In his translation the quantifiercomplexity of a Eb-formula was not Eq but increased with the length of the inputand with the space bound.

We shall prove the simulation using the II II translation.

Theorem 9.2.8. Let i > 1 and let A(a) be a bounded first order Eb formula.Assume that

UZ I- A(a)

Then there is a bounding polynomial q (x) for A such that for all m the formulam(m) has size m°(l) G -proof

Moreover, these proofs are E'-definable in S.

Proof. We shall prove, in fact, a stronger statement (the following claim), whichis a propositional version of the witnessing Theorem 8.2.4 for the case of UZ . Firstwe shall make some simplifications and conventions.

We shall assume that all E I'b-formulas are, in fact, strictE i'b, that is, a block ofsecond order 3-quantifiers followed by a Er'b-formula. This is achieved by addingthe pairing function (x, y) to the language and coding sequences of sets by

j E (a)i :_ (i, j) E a

and by enlarging BASIC by a few axioms implying E,'' AC (over Eo'b P]ND).We shall also extend the translation II II from Eb- to E 11b-formulas by

stipulating that atomic formulas a E a are translated as

IIallq(m)(Pl,...,pq(m))

where IlaII is a new metavariable for quantified propositional formulas. Thesemetavariables will actually never occur in a G-proof, but they allow convenientnotation. We shall manipulate them freely; for instance, II A (a, 01/,B) II is the sameas IIA(a, a)II(IIall/IIfII).

Claim. Let 30'A (a, as, 0t) and 3t/rrB(a, as, *r) be strictEI'b formulas suchthat

UU I- 34tA(a, as,,t) ) 3lirrB(a, as, rr)

Then for every m there is a E00 formula with the metavariables

W. (Ilasll, IWll)

and a bounding polynomial q (x) for A, B such that for any two Eq formulas

C, D without metavariables the implication

IIAII9(m) (p, IIasII/C', IIO`II/D)

IIBIIq(m) (p, IIashI/C, III'll/Wm(IIashh/C, IIo`II/D))

has a G proofofsize poly(m, Cl,IIDI).The formulas C, D have q (m) + 1 atoms and may contain the atoms p too.

Under the hypothesis of the claim there is a UU -proof it in the second order LKBwith E 1'b FIND in which all formulas are strictEI' ", ending with the implication.Any sequent in 7r has the form

.... 3 0 i (a, b, a, 0, Ot), ..., r A, ... , 3*j Bj (a, b, a, , j), ...

where b, ,B are the other free variables different from a, a, and F, A are the cedents'b-ofElformulas. For simplicity of notation we shall omit b, P, and we shall write

II II instead of II . . . Ilq'(m)

A remark on the choice of polynomial q (x): It is a bounding polynomial of allformulas in 7r; in the case of second order formulas with variables ys this meansthat q (m) also bounds the length of all possible values of s needed in the evaluationof the formula for first order inputs of length < in.

We shall also write

IIAiII(C, Di)

instead of

IlAill (Ilall/C, Iloill/Di)

(and similarly for Bj), as there is no danger of confusion.By induction on the number of inferences in 7r above such a sequent we construct

E,,-formulas without metavariables C..... Di,... (with (q (m) + 1) new atomsand with possible occurrences of other atoms free in the sequent) such that thereare size m°(1) G-proofs of

..., IIA1II(C, Di),..., IIFII IIoII,..., IIB;II(C, Wj(C,..., Di,...)),...

Formulas Wk are called witnessing formulas.We proceed now considering several cases according to the type of the inference

giving the sequent.The principal formulas of all propositional inferences must be Eo'b, and the

only two nontrivial cases are V : left and A : right.Let a principal formula of an v : left inference be Ei V E2 with Wj (resp. WZ )

the witnessing formulas for the upper sequent with El (resp. with E2). Define thewitnessing formulas W j for the lower sequent by the definition by cases

W':=( IIElIIAW/) V(11-E1 11 AW2)

9.2 Translation into quantified propositional fonnulas 157

If II El v E2 II is true, then W 1 is correctly defined as a valid witness; otherwise W ican be anything. Moreover, there are polynomial size G-proofs of the sequents

11 E, 11 ) W' - W

and

II-EIII - W' W2

Using these proofs one can construct from the corresponding proofs for the uppersequents polynomial size proofs of the lower sequent with the witness Wi.

The A : right rule is treated dually.The structural rules are simple; in particular, the contraction : right rule is treated

by the definition by cases again: If two occurrences of formula 3r B(a, a, *) in thesuccedent are contracted, and W1 j, WW are the witnessing formulas correspondingto these two occurrences, set

Wi (IIBII(Ilall, W,) AW,) v (,IIBII(Ilall, W,) A W2)

Now let us consider the second order 3 : right inference (which in the sequentcalculus represents Eo'b-CA [cf. Section 8.3])

.... B(a, a, '/E(a, a)).. , 3VB(a, a, V)

with E E Eob Then just define the new witness formula for new variable 1 by

W := IIEII(Ilall)

The last nontrivial case to consider is E' ,b-PIND, where the induction formula3¢E(a, b, a, 0) is Ei'b and not Eo'b (the latter case is treated as in the proof ofTheorem 9.2.5).

Assume that W (a, b, I l a l l , ..., 1 1 0 i1 1 ,.. . , 11011) is the witness formula associ-ated with 30 in the succedent of the upper sequent of the induction inference

.... 30E(a, LZ],a,0) - 30E(a,b,a,0),....... 3¢E(a, 0, a, 0) -* 30E(a, t, a, O), .. .

Define terms to, t1,... , tk by to := t and ti+1 := Lt1/2J, for k = q(m). Note thattk - 0. Using these terms then define

VI := W(a,b/tk-1,Ilall,...,ll0lI,...,II0II)

and

Ve+t := W (a, b/tk-e_l, Ilall,... ,

for£ = 1,...,q(m) - 1, and set

W := Vq(m)


It is clear, and easily G-proved, that Vt witnesses the implication

3¢E(a, 0, a, -0) -+ 30E(a, tk_t, a, 0)

and hence W witnesses the lower sequent of the induction rule.There is one more requirement that is not obviously satisfied: namely, W has

to have a size of at most m 00). Indeed, if 11411 has at least two occurrences in W,then the size of Vt grows exponentially with 2, whereas if II¢II occurs in W onlyonce, the size of VQ increases only proportionally. Hence a way to overcome thisobstacle is first to put W into a G-equivalent form with only one occurrence of1 1 01 1. For example, replace W by

Vx,X - 11011 - W(11 011/x)

This will increase the quantifier complexity of W, but we do not care about thatas the complexity of W is greater than that of W in any case. Q.E.D.

9.3. Reflection principles and polynomial simulationsIn this section we show that the provability of the reflection principles for proposi-tional proof systems in bounded arithmetic implies polynomial simulation. As anillustration of this idea assume that we can verify in S1 the soundness of a proofsystem P. Then the simulation of SZ by EF (Corollary 9.2.7) allows to "prove" thesoundness of P in EF, and then to use this proof to simulate P-proofs by EF-proofs.This idea is due to Cook (1975).

We shall apply the idea to systems considered in the first two sections of thischapter, and we shall also derive some corollaries about bounded arithmetic, notjust about the propositional proof systems.

First we shall consider the language L2 of V1l , and the translation of Section 9.1.In this language we have finite sets and all basic operations with sets are Eo'b-definable (in I Eo'b using Eo'b-CA). In particular, recall that a sequence of setscan be coded by a set by

jE(a);-(j,i)Eaand again such coding applies CA to the definition of the sequence; this will alwaysbe Eo'b or A 1'b. Thus we can carry in I E 1,b some usual set-theoretic coding ofpropositional formulas, say as finite binary trees with inner nodes labeled by theconnectives and leaves labeled by atoms or constants. Proofs are then particularsequences of formulas, and for systems F or EF the definitions of F-proofs (resp.EF-proofs) are obviously also Eo'b. A truth evaluation of a formula will be codeda 0, 1-labeling of the nodes of the formula computed according to truth tablesof the connectives. Moreover, these definitions allow us to prove in IEo'b suchelementary syntactic properties as "A formula has unique immediate subformulas,"

and so on. We leave the reader to design her/his own definitions and to apply themto the following arguments. We just stipulate a certain notation.

Definition 9.3.1.1. Fla(a) is a Eo'b-definition of "a is a propositional formula"2. For P = E EF, Prfp (7r, a) is a E,'b-definition of "7r is a P proof of a "3. Assign (17, a) is a E/'b-definition of "?7 is a truth assignment to the atoms

of the formula a," and Assign(77, a) implies in I Eo'b Fla(a)4. Eval(p, a, y) is a El ,b-definition of "y is the evaluation of the formula

a over the truth assignment i7 to its atoms," and Eval(r7, of, y) implies inI Eo'b the conjunction Fla(a) A Assign(r), a)

5. q a is a Ai'b-definition in U1 of " t7 is a satisfying truth assignment tothe atoms of the formula a," and it is in I E 11b-defined by

3y, Eval(r7, a, y) A "y evaluates to 1 "

6. TAUT(a) is a IIi'b formula defined in IEob as

Vr7, Assign(r7, a) - q H a

Formula z a is 0;'b in UI as even I Eo'b can prove the implication

Eval(i7, a, y,) A Eval(r7, a, y2) -+ yi = y2

(by induction on the size of y, , y2), and by the following lemma, which is notobvious.

Lemma 9.3.2. The theory Ui proves that every propositional formula can beevaluated over any truth assignment to its atoms

Vr7, a3y, Assign(r7, a) Eval(r7, a, y)

Proof We would like to proceed by the induction on the logical depth of a, but thiswould require EI'b IND, whereas we have only EI'b PIND. We shall thereforemake use of a proof of the Spira theorem 3.1.15, reducing the depth to a logarithmicone.

Let a be a formula and n an evaluation of its atoms. Assume that a is a boundto all nodes in a and for b < a let ab denote the subformula of a with the root b.For P a subformula of a and S a set of some mutually incomparable nodes of a,let fi S denote a formula consisting of those nodes of P not majorized by anynode of 3, with y E S a leaf of a J. 3 labeled by a new atom qy. Thus P 4. S has atmost 13 1 new atoms that do not occur in a.

Consider the formula

A(u) := [Vx < aYS(JSI > lal - u), Fla(ax 4, 3) A lax 4- 31 < (3/2)u]

-+ V [Assign(r7 U , ax 4. 3) - 3y, Eval(i7 U l; , ax .(- S, y)]

As the size of S is bounded by la 1, l; can be coded by a number bounded by a andhence A(u) E EI,b.

For u = 0, lax 4. S 1 = 1 so ax 4. S is just an atom or a constant and A (0)clearly holds. The idea of the proof of implication A (u) A (u + 1) comes fromSpira (1971), where the following claim was stated.

Claim. Any formula at has a subformula a2 such that Ia21 < (2/3)lai I andlai I -1a21 < (2/3) la 11.

This claim is easy to formalize (we have a A 11 'b-definition of the cardinality ofa set, by Lemma 5.5.14) and to prove by P1ND on the size of the formula.

Assume A (u) and let ax 4, 8 fulfill the hypotheses of A(u + 1). From the claimit follows that there is a node y in ax 4, S (= *1) such that both a y J, S (= ,2) andax 4. (S U {y}) have size at most (3/2)'. By A(u) then there is an evaluation y'of a y .{ 8 over any q U and an evaluation y" of ax J, (S U (y)) over q U U X,where x assigns to q y the value computed by y'. It is then straightforward to Eo'bdefine from y', y" the evaluation y of ax y S over n U l; . This entails A(u + 1).

By E 'b-LIND, available in U,1, it follows then that A(1093/2 (a)) : That is, there

is an evaluation y of a over q. Q.E.D.

Theorem 9.3.3. The theory U1 proves that F is a sound proof system

Va, n, PrfF(n, CO - TAUT(a)

P r o o f Argue in U,'. Let jr, a satisfy P r f F (n, a), and let (7r) 1, ... , (1r )k = a be thesteps in n. Assume that n is a truth assignment to the atoms of a: Assign(i, a). Wemay assume that no other atoms occur in n (as otherwise they could be substitutedfor by 0, for example).

Consider the formula A(u)

A(u) := n = (n)u

This formula is by Definition 9.3.1 O1'b in U,. By Lemma 5.5.9 U, admits l'b-

IND; hence we have the induction for the formula A(u).Clearly A(1) holds (as (7r)i must be a logical axiom), and A(u) -+ A(u + 1)

(as all Frege rules are sound). Hence A(k) holds: that is, q a.This applies to any rt; hence a is a tautology. Q.E.D.

Theorem 9.3.4. The theory Vii proves that EF is a soundproof system

Va, n, PrfEF(Ir, CO - TAUT(a)

P r o o f Argue in V11. Let n be an EF-proof of a with steps ( 7 r )1 , (n )k = a,where (7r) 1, ... , (7r). are the extension atoms used in Jr, in < k.

Let Assign(rt, a) hold and w.l.o.g. assume that in n only the atoms from a orthe extension atoms occur. Take the formula A (u)

A(u) := 33 "l; is a truth assignment to the extension atoms in n"

A Vv < u, 1i U (ir)v

This is a E,11 b-formula clearly satisfying A(1) and A(u) -+ A(u + 1), givingto the extension atoms truth values computed from q by their definitions, foru = 1, ... , in. Hence E1'b-IND implies A(k), and q a follows. Q.E.D.

There is no Eo'b-definition of the relation ri a. This is because such adefinition would allow us to express every Boolean formula by a constant depthcircuit (ri = a) of size polynomial in n = Jul, which is impossible (e.g., theparity function (D(xl, ..., x,) has a formula of size n2 but it has no polynomialsize constant depth circuits, c£ Theorem 3.1.10). Thus we cannot translate theprevious arguments into I Eo'b. The theory I Eo'b does not even prove that everyformula can be evaluated

Vil, a, Assign(rj, a) - 3y, Eval(ri, a, y)

The unprovability of this implication is shown as follows: Assume that I Eo'bproves the implication. Then the witnessing argument of Theorem 9.1.5 wouldgive the witness formulas for y in terms of the bits of rt, a, and these witnessformulas would be E.'b. Hence again it would follow that formulas can be writtenas polynomially larger constant depth circuits: a contradiction with Theorem 3.1.10(and with the existence of small formulas for the parity function; see the end ofSection 3.1).

There is, however, a Eo'b-definition of ri = a assuming that the depth of a isbounded by a standard constant.

Lemma 9.3.5. Let d be a constant and let Flad (a) be a Eo'b-definition of "a is adepth < d formula." Then

I Eo'b H Vrl, a, Flad(a) A Assign(ri, a) 3y, Eval(q, a, y)

Proof. Prove the statement by induction on d showing that the evaluation y isactually (for fixed d) Eo'b-definable from ri and a (the implication then followsby Eo'b-CA). This is because IEo'b can prove (by the remark before Definition9.3.1) that a depth d formula with the outmost connective A is a conjunctionof (arbitrarily bracketed) depth d - 1 formulas: That is, it is true if "all thesesubformulas of depth d - 1 are true" Assuming that a truth definition for d - 1formulas is already formed, this allows us to define the truth for depth d formulaswith the help of a V< quantifier; similarly when the outmost connective is v. If itis then first apply de Morgan rules to rewrite a so that all negations apply onlyto the atomic subformulas. Q.E.D.

Theorem 9.3.6. Let d > 0 be a constant. Then the theory I EO'b proves that anyFrege proof of depth < d is sound

V7r, a, PrfF(Jr, a) A "dp(r) < d " -+ TAUT(a)

Proof The proof goes by induction on the number of steps in r as in Theorem9.3.3, using the EO'b-definition of the satisfaction relation for depth < d formulasprovided by Lemma 9.3.5. Q.E.D.

Before turning to the quantified propositional proof systems and the translationof Section 9.2 we formulate in model-theoretic terms a sufficient condition for ademonstration of superpolynomial lower bounds.

Lemma 9.3.7. Assume that A is a EO'b formula. Then I Eo'b proves the equiva-lence

A(ax) = (a = (A)x(P))

where a is a truth assignment, EO'b -definable in I Eo'b, assigning to p, the value 1iff i E ax.

Proof. This is readily established by induction on the logical complexity of A.Q.E.D.

Recall that n° := Uk« nk for n, an element of a nonstandard model of arith-metic.

Theorem 9.3.8. Let A(a, a) be a EO'b -formula with a and a the only free vari-ables. Let M be a nonstandard model ofthe true arithmetic T h (cv) and let n E M\wbe its nonstandard element.

Assume that for every bounded set it c_ n' coded in M there is a familyX c_ exp(nw) of bounded subsets of n' and a E X such that

(i) jr E X( i i ) I E

O

(iii) (nw, X) -A (n, a).Then the formulas (A(a))m, m < w, do not have polynomial size constant-depth

F -proofs.If (n', X) Uj then the formulas (A(a))m do not have polynomial size F-

proofs, and if even (n°', X) = V1' then they do not have polynomial size EF-proofs.

Proof. Assume that the formulas (A(a))m, in < co, do have polynomial sizeconstant-depth F-proofs. As M satisfies the true arithmetic there is k < w suchthat for every element n E M, M codes a constant-depth F-proof of (A(a)), ofsize at most nk. Let jr c nk be such a proof.

Take X and a E X satisfying the conditions (i)-(iii). Then (n', X) is a modelin which the propositional formula (A(a)), has a depth d F-proof, someof I EO'b

d E w. By Theorem 9.3.6 the formula (A (a)), must be a tautology. However,-A (n, o' true, hence the assignment a defined in Lemma 9.3.7 does not satisfy

hat lemma. This is a contradiction.of U, and V,1 follow analogously, using Theorems 9.3.3 and 9.3.4 in

place of Theorem 9.3.6. Q.E.D.

Now we shall turn to the quantified propositional formulas and the translationof Section 9.2.

Lemma 9.3.9. There are a Ab in S2 definition w =o A of the satisfaction relationfor Eo formulas A, and a B(EP) definition w [--i A of the satisfaction relationfor Eg u fg formulas.

Moreover, S2 proves Tarski's conditions for truth definitions (i > 0):1. (w A V B) ((w =i A) v (w --i B))2. (w3. (w =i -A) = (-' w --i A)4. (w =i 3xA(x)) ((w A(0)) v (w =i A(1)))5. (w =j VxA(x)) ((w =i A(O)) A (w = A(l)))

In fact, 1 and 4 are provable in a stronger form6. (w --i VEEX A(E )) _ (3 E E X, w --i A(E )), where X C 10, 11'7.

(w =i(3w', "w' is a is a truth assignment to atoms

p1... ,pn" A w- A(p))

Moreover S2 proves (i > 0)

VA E Eg u IIg vw, (w =j A) _ (w [--i+l A)

Definition 9.3.10. For i > 0 define a do +1 formula

Tauti(A) := Vw(IwI < IAI), w =j A

It defines the set of the tautological Eq U 11q formulas. The set is denoted TAUTi.

The following definition extends Definitions 4.1.1 and 4.1.3 to systems forquantified propositional logic.

Definition 9.3.11.(a) A polynomial time computable binary relation P(x, y) is a quantified

propositional proof system iff

(37r, P(7r, A)) -a A E U TAUTii>o


(b) Let P and Q be two quantified propositional proof systems and i > 0.Then P i-polynomially simulates Q, P >i Q in symbols, iff there is apolynomial time function f (x, y) such that

d7r, A, Q(7r, A) A A E Eq U IIg -) P(f (n, A), A)

(c) Assume that a quantified propositional proof system p is Ab-defined inS. Then we define the formula

i-RFN(P)

as

V 7r, A, P(Jr, A) A A E E7 U fl -+ Tauti (A)

Note that iRFN(P)isenbfori =0anddEbfori >_ 1.( d ) Assume that a p r o o f system p is / '-defined in S. Then we define the

formula

Con(P)

as

dnr, -P(ir, 0)

We do not require in the definition of a quantified propositional proof systemthat it is complete (with respect to all quantified tautologies). This is because wewant the definition also to cover systems like F, EF, Gi, G, that are not completew.r.t. all tautological quantified propositional formulas.

Lemma 9.3.12.(a) Let i > 0 and assume that 0(a) E Eb for i >_ 1 and that 0(a) is Ab in

S2 .for i = 0. Let a be a truth assignment assigning to the atom pj thebit a(j), j > 0; it is Lb-definable in S. Assume that q(x) is a boundingpolynomial for 0 (a).Then

Sz 1- w(a) = a =i IIWq11

(b) For any proof system that is Ab-definable in SZ and closed under thesubstitution rule, modus ponens, and that is complete.for TAUT0 it holds

SZ I- Con(P) = 0-RFN(P)

Proof. Part (a) of the lemma is proved by induction on the logical complexity of0; it is analogous to Lemma 9.3.7.

For part (b) it is enough to prove the implication

Con(P) -f 0-RFN(P)

as the opposite one is trivial. Argue in SZ : Assume that P proves a Eo -formulaA(p) but w =o --A for some assignment w.

The former implies that P F- A(-PIT), and the latter implies P F- -A(-PIT),from which P F- 0 follows. Here we use the assumption that P is closed under thesubstitution rule and the modus ponens. Q.E.D.

The following lemma extends Lemma 9.2.4.

Lemma 9.3.13. Assume that 0(a) E Eb and let q(x) be a bounding polynomialfor 0. Then

S2 F- O(a) -+ (G1 F- II.IIq(lIal)(a))

Proof. The statement is proved by induction on the logical complexity of q5, treat-ing general true equations similarly to the axioms of BASIC in Lemma 9.2.4.

Q.E.D.

Lemma 9.3.14. Let 0 be a Eb formula, i > 1, and let q (x) be a bounding poly-nomial for q5. Assume that P is a proof system At-defined in S.

Then SS proves the implication

(i-RFN(P) A (P I- II0IIq(lXl)) - dy(IYI < Ixl - 0(Y))

Proof The lemma follows from the following claim.

Claim. SS proves

Tautl (110llq(Ilxl) A lyl < Ixl - 0(y)

The claim follows from Lemma 9.3.12, part (a). Q.E.D.

The following lemma is a useful, intuitively clear property of the formula Taut;.

Lemma 9.3.15. Let i > 1 and assume that q(x) is a bounding polynomial of theformula Taut. Then SS proves

(AEEgAn>JAIAGZI-IITauti(A)Ilq[n))- G, F- A

The same holds for i = 0 with G i in place of Go.

Proof. We present the case i = 0; the general case is analogous. The formulaTaut; (A) is defined as

Vw(IwI < IAI), w J:=t A

We think of the formula w =i A as being defined analogously to the formulas inDefinition 9.3.1 by formalizing

Vu, "u is the computation of the value of A over w"

-+ "u assigns to A the value 1"

The "computation" u is an assignment of values (u) 1, ... , (u)t, each 0 or 1, to thesubformulas of A, and we assume that values of the subformulas of a subformulaappear in the sequence (u) 1, ..., (u), sooner. Assume that (u)j corresponds to thesubformula A j of A.

Let Comp(w, A, u) denote the formalization of "u is the evaluation of Aover w "

Let p be the atoms of A.

Claim. S2 proves

Vj < IAI; Comp(-P, A, u) --+ (u)j = Aj

and, in particular

Comp( p, A, u) -a (u)1 - A

The claim is proved by PIND on j in the formula

Comp(w, A, u) (u)j - Aj(p/w)

and taking w := p.The theory S2 also proves that any formula can be evaluated by induction on

the length of the formula. Thus, in particular, SZ proves

3u, Comp(p, A, u)

and by Theorem 9.2.6 the last two formulas imply that SZ proves

G* I- 3x, IlCompll(p,A,x)

and

G' I-IICompll(p,A,9)- qt=A(p)

which entails the lemma.

Theorem 9.3.16. For i > 1

TZ H i-RFN(G,)

and

Q.E.D.

SZ I- i-RFN(G; )

Proof Using the formula w =i A one can define an b 1-formula STauti (S )formalizing that "the sequent S consisting of Eq U IIq-formulas is satisfied by alltruth assignments."

In S2+t then by II+t PIND on the length of a G,-proof 7r prove that anysequent in n is tautological. Corollary 7.2.4 then also implies that

T2 F- i-RFN(G,)

The proof of the second part of the theorem is more involved and we only sketchit, leaving the details to the reader. Work in S2. Assume that 7r is a G; -proof andw.l.o.g. assume that all formulas in 7r are Eq. Then following the proof of Theo-rem 7.2.3 define functions witnessing every sequent in 7r. This is straightforwardexcept for the estimate of the time bound to the running time of the algorithmcomputing the witnessing function. To obtain the polynomial bound one needs touse the assumption that 7r is treelike.

We shall explain this on an example. If f (w) is a function witnessing a sequentin a proof that is not treelike, then in the course of the construction of the wit-nessing functions for later sequents we may iterate f polynomially many times.That can, however, give an exponential time (e.g., a repeated squaring gives theexponentiation in polynomially many steps). If 7 is treelike, however, then f willnever be iterated but only composed with the other witnessing functions.

The reader is invited to fill in the details, or see Krajicek and Takeuti (1992).Q.E.D.

The following theorem is the main application of the reflection principles andit generalizes a statement from Cook (1975).

Theorem 9.3.17. Assume that P is a proof system, Ab-defined in S2, and let0 < j < i. Assume also that

TZ F- j-RFN(P)

Then G, j-polynomially simulates P and, in fact, this simulation is definable in SZ

SZFGi>7P

The same is true for Sz and G* in place of TZ and Gi.

Proof From the hypothesis and Theorem 9.2.5 it follows that

Si F- (Gi F- Ilj-RFN(P)IIla

q[jla u)

for some bounding polynomial q(x) for j-RFN(P). This implies (as Gi is,


provably in S2, closed under modus ponens) that

S, F- Gi F- IIP(x, Y)IIq(IIaD(PI4) A Gi F- IIY E EgIIq(IIai(9 ))

- Gi F- IITautj(Y)IIq(IaD(q )

Now argue in S2: let 7r be a P-proof of A E Eg: That is, P(7r, A) holds. P(x, y)is a E'-formula, so by Lemma 9.3.13 also

G; F- IIP(x, Y)II(it, A)

holds, and so also

Gi F- IIPII(n, A)

The same argument applies to the true Ab-formula A E Eq, so

GiF'IIYEE9II(A)

Hence by the previous implication

Gi F- IlTautj(y)II(A)

By Lemma 9.3.15 then

GiF- A

This concludes the proof of

SZF- Gi>j P

That the simulation is polynomial time follows then from Theorem 7.2.3.An identical argument, using Theorem 9.2.6 instead of 9.2.5, proves the state-

ment for S2 and G, in place of T and G. Q.E.D.

Corollary 9.3.18. Let P be a propositional proof system in the sense of Defini-tion 4.1.1 that is L -definable in SZ and closed under substitution and modusponens. Assume

Sz F- Con(P)

Then EFpolynomially simulates P and, in fact

SSF- EF?P

Proof. By Lemma 9.3.12 the hypothesis implies that

S' F- O-RFN(P)

which by Theorem 9.3.17 implies

SZF- G>oP

The statement then follows from Lemma 4.6.3, whose proof, as is readily veri-fiable, can be formalized in S. Q.E.D.

The following corollary again states Lemma 4.5.5. The reason for doing sois that this time it is obtained in a simpler way (though building on nontrivialmachinery). This proof was, in fact, chronologically the first one.

Corollary 9.3.19.

S1 I- EF > SF

Proof By Corollary 9.3.18 it is enough to prove Con(SF) in S2, which is straight-forward: By induction on the number of steps show that every formula in it is atautology. This requires flb-LIND available in S2 through Lemma 5.2.5.

Q.E.D.

The following corollary is a very important effective version of Theorem 4.1.2.Its first version, for PV and EF, was proved by Cook (1975).

Corollary 9.3.20. Let Tauto (a) be a fIb formula defining in SS the set of tautolog-ical propositional (quantifier free) formulas. Assume that there is a Eb formulaa(a) such that

TZ H a(a) - Tauto(a)

That is

TT I- NP = coNP

Then tautologies TAUTo have polynomial size G; proofs. In fact, this is provablein S2

Sz F- VA, Tauto(A) -+ (G; I- A)

The same is true for S2 and Gz (resp. Sz and EF) in place of T2 and G.

Proof Let

or (a) = 3x < t(a), S(a, x)

with S E Ob Define a proof system P by

P(ir,A) :=n <t(A)nS(A,Jr)

Then the hypothesis of the statement implies that

T2 F- Con(P)

That is by Corollary 9.3.18

S2' - Gi > P

But the hypothesis also implies that every tautology has a polynomial size(< It(A)I) P-proof, hence also a Gi-proof.

For Si and G, (and SZ and EF) the same argument applies. Q.E.D.

Note that the condition of the previous statement can be strengthened as, in fact,all E9-tautologies will have polynomial size Gi-proofs. This is because the hy-pothesis implies that every Eb-formula, and hence the formula Taut; in particular,is in T2' equivalent to a Eb-formula.

The following statement is a propositional version of the previous statement.

Theorem 9.3.21. Assume that all E'-tautologies (TAUT1) have polynomial sizeGi -proofs.

Then all Eq-tautologies (TAUT;) have polynomial size Gi proofs.The same is true for G7 in place of Gi.

P r o o f. The hypothesis implies that the following V -sentence A is true

A := bx, Tauto(x) 3y, IYI _< Ixlk A PrfG1(y, x)

where PrfG (y, x) is a Ab-formalization of "y is a G,-proof of x"As TT F- i-RFN(Gi), by Theorem 9.3.16, we have

T Z + A F- Tauto(A) _ (3y, lyi < I AIk A PrfG (y, A))

That is,

TT + A I- NP = coNP

Define a proof system GA to be an extension of Gi by new initial sequents

IIAII9(m)

m = 1, 2, ..., where q (x) is a fixed bounding polynomial for A. Then analogouslywith Theorems 9.2.5 and 9.3.16 we have that TZ + A proves i-RFN(GA) and thatGA simulates TZ + A-proofs of the E'-consequences. Hence Corollary 9.3.20 alsoholds for GA and TZ + A in place of G, and T.

Thus

TT + A I- NP = coNP

established previously implies that all TAUT; have polynomial size G-proofs.But all formulas II All

q(m)E TAUT, (as A E E '), and hence they all have by the

hypothesis of the theorem polynomial size Gi-proofs. So Gi has a polynomialspeed-up (see Definition 9.1.3) over GA and the statement follows. Q.E.D.

We shall now prove a statement about the reflection principles of G in UU .


Theorem 9.3.22. The theory U i proves that G is a sound proof system. That is,2for every i > 0

U1 I- i-RFN(G)

Proof The satisfaction relation for the quantified propositional formulas is A 11 'b-definable in U2 1. A definition of

A

for general A E Eq is constructed by formalizing the following:if

b'x, 3y, ... b'x,3y B(x, y, p) is a prenex normal form of A(p )

then there are Skolem functions

F1(xi, p),..., p)s.t.VxB(x, yi/F,w) istrue

This works as the prenex normal form is Ob-definable in SZ, F,, ... , F can becoded by set variables of the language of U2, and the last formula is 1-I b (Fi, ... , F ).

By induction on the length of A (i.e., by Ei'b PIND) UZ proves the Tarskiconditions for this definition of satisfaction.

This definition as stated is Ell '', but it can be made A '' by choosing functionsF1, ... , F in some canonical way.

Argue now in U2 . Let n be a G-proof of A. By induction on the length of Jrshow that all sequents in 7r are satisfied by all assignments; this needs [I " LIND,which is available in U2. Hence

A

But w A implies w It A (the formula from Lemma 9.3.9); this is provableby induction on the logical complexity of A using Tarski's conditions satisfied byboth = and H. Q.E.D.

The next corollary summarizes all corollaries of the previous theorem obtainedanalogously to the earlier ones for the system Gi.

Corollary 9.3.23.1. Let P be a propositional proof system A' -defined in Si and assume that

for some i > 0

Ui I- i-RFN(P)

Then G i-polynomially simulates P and, in fact

S2 I- G>i P

(after 9.3.17)

2. IfUZ I- NP = coNP

then all tautologies from TAUT,, := Ui TAUT; have polynomial size G-proofs and, in fact, for each i > 0

Sz F- VA, Taut; (A) ---> (G I- A)

(after 9.3.20)3. Assume that all E -tautologies have polynomial size G-proofs. Then all

quantified tautologies from TAUT have polynomial size G proofs.(after 9.3.21)

We conclude this section by an important corollary of Theorems 9.3.16 and9.3.22.

Theorem 9.3.24. Let P be any of the following proofs systems: EF, SF, Gi orG; (i > 1), or G.

Then there is a polynomial p(x) such that for each n there is a P proof of

II Con(P)II9(n)

of size < p(n) (q(x) is a fixed bounding polynomial ofCon(P)).

Proof. For Gi, Gi , or G the theorem follows from Theorems 9.3.16, 9.2.5, and9.2.6, or from 9.3.22 and 9.2.8, respectively.

For EF and SF it follows from the statement for G i using Corollaries 9.3.18and 9.3.19. Q.E.D.

In the proof we could also use Theorems 9.3.4 and 9.1.5 for the case P = EF.Note that Theorems 9.1.6 and 9.3.3 imply a similar statement for P = F withn(log n)o(') in place of p(n).

9.4. Model-theoretic constructionsIn this section we give model-theoretic proofs for Theorems 9.1.3 and 9.2.7. Ibelieve that this side of the simulation results is important for understanding theinterplay between arithmetic and propositional logic, and the fundamental problemof lower bounds for proof systems.

The following statement is a version of Theorem 9.1.3.

Theorem 9.4.1. Let M be a countable model of the true arithmetic T h (w) and letn, t E M \ w be its two nonstandard elements. Let 0(a) be a Do(R) formula witha the only free variable.

Assume that in M there is no Frege proof of (0) (n) of depth < t and size < n`(i.e., no element < 2 n` codes such a proof).

Then it is possible to define R C M x M such that

(n', R) 0 Ioo(R)

and

(nW, R) -.6(n)

Before we give the proof we should understand that this theorem implies The-orem 9.1.3. Assume that the formulas (0) (,n), in < w, do not have polynomialsize constant-depth Frege proofs. This means that for any k < to and any d < wthere are in < to such that (6)(,n) does not have a depth d size < ink F-proof. Bycompactness there is a model of Th(w) and nonstandard d, k E M such that forsome m E M, M thinks that there is no depth d F-proof of (6)(,n) of size < mk.Take t := min(d, k). By this theorem then there is a model of ID0(R) in whichVx6(x) fails: That is, `dx6(x) is not provable in I AO (R).

Proof Let M, 0, n, and t satisfy the hypothesis of the theorem. Let Fle denote theset of propositional formulas coded in M, having a standard depth, built from theatoms pig, and of size < nW. We shall form a set T C Fle satisfying:

1. -'(6)(n) E T2. for any t/r E Fle: f E T or -t e T, but not both3. if , E Fle has the f o r m / \i Oj then t/r E T iff Oj E T all i4. if t E Fle has the form Vi Oi then 1 E T iff t6i c T some i5. for all ri (x) E AO(R) with parameters from n' and x the only free variable,

either -'(rl)(o) E T or (rj)(n) E T for all u E n' or (17) (u) A ,(r1)(u+1) E Tfor some u E n'

having such set T define a relation R c nW x nW by

R(i, j) if pig E T

Claim 1. For any Do(R)-sentence with parameters from nW

(nW, R) iff E T

The claim follows by conditions 2-4 posed on T.

Claim 2. (n°, R) = IL0(R) + ,6(n)

This follows from conditions 1 and 5.It remains to construct the set T having the required properties. This can be done

by a completeness-type argument, but we shall cast it as a forcing-type argument.Let P denote the class of subsets S C_ Fle satisfying the conditions(t) -(0)(n) E S

(ii) for any k < w there is no depth k size < nk F-proof of contradiction (= 0)from formulas in S

(iii) S is definable (and hence coded) in M.Note that for S E P there is s > w such that there is no F-proof of 0 from S of

depth < s and size < ns; this follows by induction as it is true for all standard s.The next claim is obvious.

Claim 3. Let S E P and i/i E Fle.Then either S U {i/i} E P or S U{-'t/i} E P.

Claim 4. Let S E P and t E S, and assume that t/i has the form V j<r 4j.Then for some jo < r, S U {0jo } E P.

Assume otherwise: That is, for every j < r there is a depth kj size < nk%F-proof of 0 from S U 10j), and hence depth £ j size < ne_i F-proof nj of -'¢j fromS, k, Li <w.As i/rEFle,r<I*I<ne,some £E(.

Take s > co such that there is no depth s size < ns F-proof of 0 from S. Eachproof irj has depth < < s and size << (ns/nt), so joining these < ne proofs gets adepth < s size < ns proof of 0 from S, a contradiction.

Claim S. Let S E P and t/i E S, and let' have the form Ai oi.Then SU{4iIalli}EP.

This is seen analogously to Claim 4.

Claim 6. Let S E P and let 11(x) be a AO (R) formula with the parameters fromn°' and with x the only free variable.

Then one of the following sets is in P too:(a) SU {-.(j)(o)}(b) S U 1(11)(u) I U E n'}(c) S U U {'(r1)(u-+1)} some u E n1°

To prove Claim 6 assume otherwise, so there is a depth ko size < nk0 proof n_of (q) (0) from S, a depth k1, size < nk« proof 7r,, of

(1))(u) - (17)(u+1)

from S for all u E n', and a depth k size < nk proof from S of the disjunction

V -(17)(")UEX

for some X c n' of size < nk.For any nonstandard s, joining proofs.Tr_ 1, no, ... , ,r for v = max (X) by cuts

entails all (17) (u), U E X by a depth s size < ns proofs, obtaining thus a depth ssize < ns proof of 0 from S, contradicting S E P.

Now we are ready to construct the set T. Let *1, t/i2, ... enumerate the set Fleand ill (x), 112(x), ... enumerate all Ao(R) formulas with parameters from n', andwith one free variable x.

Construct a sequence So, Si .... E P such that

(i) So(ii) Si C Si+1

(iii) t/li E Si or -*, E Si(iv) if r, E Si and, = Vj <, O j then l/ jo E Si, some jo < r(v) if , E Si and *, = Aj<r Oj then all (aj E Si

(vi) either -(rii)(o) E Si or A E Si for some it < n`°, or(rii)(u) E Si all u E n'.

Having Si satisfying the conditions, 5,+1 exists by Claims 3-6. Set

T:=USii

The set T fulfills requirements 1-5. Q.E.D.

The reason for the particular forcing-type formulation of the argument is itssimilarity with the following, more involved construction, which is convenientlyexpressed by using forcing. Another reason is a later finite version of this con-struction developed in Section 13.3.

Theorem 9.4.2. Let (M, X) be a model of V11 and let t (p1, ... , p,) E X be apropositionalformula in (M, X).

Then the following two conditions are equivalent:1. In (M, X) there is no EF proof off.2. There is a E'-elementary extension (M', X') of (M, X) in which -f is

satisfiable.

Proof. Assume that 1 fails, and let 7r E X be an EF-proof off in (M, X). As(M', X') is a "-elementary extension, 7r is an EF-proof off in (M', X') aswell (see Definition 9.3.1). But by Theorem 9.3.4 f must be tautologically true in(M', X'); hence 2 fails.

Assume now that I holds and assume also that (M, X) is countable. Construct(M', X') as follows.

By compactness there is a countable elementary extension (MO, Xo) of (M, X)satisfying V,' such that:

(i) there is t E MO such that for all v E M, v < t(ii) in (Mo, Xo) there is no EF-proof of f.Let (M*, X*) be a substructure of (Mo, Xo) defined by:(i) M*={vEM012wEM,v<w}

(ii) X* = { E Xo 1$ c M*}.We define in (Mo, Xo) several families. Let IT) be the atoms off and let

Fle(p) C Xo be the formulas with the atoms among { p }. Further let A be the setof the atoms { T) plus new atoms of the form q,j, one for each f E Fle(p ), andlet Fle c Xo be the set of the formulas with atoms among A.

Let C C X0 be the family of tuples of elements of A U {0, 1). Let

C*:=to eCIA EM*}

where I (q* , , ... , q*m) m, and let

Fle* := {0 E Fle 1101 E M*}

We will consider ,B E Xo simultaneously also as an element of C: the tuple of bitsof the characteristic function of,8 (so for such $: $ E C* _ 0 E X*).

The following claim is established by induction on the logical complexityof B.

C l a i m 1. Let B(f3) be a EO

and let $ E X*.Then

(M*, X*) I= B($) --+ 37rPrfF (n, (B)(4l/

where q are atoms corresponding to 0.

This is analogous to Lemma 9.3.13.We will construct a set G C Fle satisfying the following conditions:

(1) -i E G,(2) for all t1i E Fle* exactly one of t/i, -* is in G,(3) whenever 7r E Xo is an EF-proof of * from the assumptions t/i1,

In I E M* and all *; E G, then also t1r E G,(4) if' E G, t/f E Fle*, and t/i = V I <,< r *j, then t/ij E G for some

1 < j <r,(5) for any E I'1'-formula H(O, x) with the parameters from C* and any v E

M*, one of the following three conditions holds

(a) -.(H(O, E G, all S E C* of length < t(v),

(b) (H(O, v)),,(S) E G, for some S E C* of length < t(v),

(c) there is v' < v such that

(H(O, v' E G

E C* of length < t (v) and for all E E C* of length < t (v).

The term t(v) implicitly bounds the size of the interval whose subsets can besubstituted for 0 in H for x < v.

Assume for a moment that we have such a set G. Define a structure(M*[G], X*[G]) by

M*[G] := M* and X*[G] := C*/

where is an equivalence relation defined by

I IB2 lff (IBI = 02),4l , f2) E G


(u the maximum of the lengths of $1, $2). Note that (M*[G], X*[G]) is an exten-sion of (M*, X*) and hence of (M, X) too.

Claim 2. Let B(P) be any Eo'b formula with parameters from C* and $ E C*.Then we have for all sufficiently large u

(M*[G], X*[G]) = B($l -) ((B),($) E G

In particular, (M*[G], X*[G]) is a Eo'b-elementary extension of (M, X).

The claim follows from conditions (2)-(4) posed on G. For example, that allEo'b-sentences true in (M, X) also hold in (M*[G], X*[G]) follows from condi-tion (3) and Claim 1.

Claim 3. Structure (M*[G], X*[G]) is a model of V1

Condition (5) posed on G guarantees that the induction for every E,11 b-for-mula 2 H(o, x) holds up to every v E M*[G]. The other axioms hold in(M*[G], X*[G]) obviously. In particular, the Eo'b-CA is guaranteed by Claim 2.

Claim 4. There is a E X*[G] such that

(M*[G], X*[G]) = (a i)By condition (1) posed on G and Claim 2, a is a satisfying assignment for -'f

in (M*[G], X*[G]), where

a:=pal,.,It remains to construct the set G satisfying the preceding five requirements. We

shall use two simple technical properties of system EF.Form a set T C Fle consisting of all formulas:(i) qp; pi, whenever pi E { p ],(ii) q-1 (-'q*), whenever t/r E Fle(p ),

(iii) q,1,1011,2 - (q,,1 o q+.2), whenever 1, V2 E Fle(p) and o = V, A.A set of formulas S C Fle is said to f-entail formula t/r if there is an F-proof

of size at most f of iJr with the axioms from S U T. A set S is called f-consistentif S does not f-entail 0.

Claim S. Let S c Fle be a L b-definable in (M0, X0), and assume that tlr has anEF proof from S of size f in (M0, Xp).

Then S also O(f2)-entails tlr in (M0, Xp).

This follows as every extension axiom of size t in the EF-proof can be proved(after suitably renaming the extension atoms) from T by an F-proof of size 0(t2).

Claim 6. Let S CFle be a All 'b- definable in (M0, Xa) and assume that S isf-consistent in (M0, Xp), where f is nonstandard.

Then foreveryformula r ofsize at most t2 'one of the sets SU{tji}orSU{-,1ris f2-' -consistent.

Also, for every disjunction \/i<r >/ii E Fle of size at most f3-' one of the sets

S U {i \i<r -tjii} or S U { V i<r i/ii} U {ijij}, some j < r, is E3-'-consistent.

The first part is obvious. For the second part, assuming that all r + 2 sets areE3-' -inconsistent would allow us to construct in an obvious way a proof of 0 fromS of size at most

(r + 2)E3-1 + 0 1 I V *i I2 1<( i<r I

This is a contradiction.

Claim 7. Let S CFle be a OI'b-definable family of formulas in (Mo, Xo), andassume that S is E-consistent in (MO, Xo), where E E (Mo\M*).

Let H(O, x) be a Eo'b formula with parameters from C* and M*. Let v E M*,and assume that a term t (v) bounds the size of the interval whose subsets can besubstituted for ¢ in H for all x < v.

Then one of the following sets is E3-1 -consistent:(i) S U{-'(H(O, I s E C*, ISI < t(v)},(ii) SU {(H(o, some S E C* of length < t(v),(iii) SU {(H(o, U {-'(H(O, v'+ 1 p c C*, I pI < t(v)}, some

SEC*ofsize<t(v)andv' <v.

To prove Claim 7 take the formula D(u)

Vw < u3Fw E C, SE3 ' -entails formula (H(o, w))(Fu,)

The formula D(u) is a E1'b-formula and witnesses Fw are actually from C*(using the bound I Fwd < t (v)).

As (Mo, Xo) V,1 one of the two cases must occur:(a) D(v) holds in (MO, Xo)(b) there exists minimal u < v for which D(u) fails in (Mo, Xo).In case (a) define

S' := SU {(H(o,

where F is a witness to the existential quantifier of D(v). The set S' is f/2-consistent as otherwise one could f/2 + f3-' < E-entail 0 from S, which wouldbe a contradiction.

In case (b) let u < v be the first u such that D(u) fails. Take a set

S' := S U {(H(O, u - 1))(Fu-1)} U{-'(H(O, u))(q) 1;7 E C, I q I < t(v)}

for u > 1 (and again 1 the relevant witness) or

S' := S U {,(H(O, 0))(4) I q E C, I i I< t(v)}

for u = 0.We claim that S' is E3-'-consistent. Assume otherwise and w.l.o.g. let u > 1.

The set S + (H(O, u - 1))(ru_1) then O(t2/3)-entails some disjunction of theform

V (H(4,, u)) (q)qEI

where I C C*. But then (H(O, u)) (r) can also be O(22/3)-entailed from S +(H(¢, u - 1))(T.- I), where r is a new tuple defined by extension atoms using acase distinction considering which disjunct in the disjunction is true (cf. Claim 5).Note that I T I < t (v). This contradicts the assumption that D(u) fails; hence S' isj3-' -consistent.

Define now the family P of all H C Fle that are AI'b-definable in (MO, Xo)and that are f-consistent for some f E (Mo\M*); such f exists by our assumptionabout (Mo, Xo). Note that {-.f} E P.

Family P is partially ordered by the inclusion relation C. Class Q C P is denseif

'HEP3H'EQ,HCH'Class Q is definable if there is a formula 11(X) in the language of V'1 augmentedby new metavariable X such that

Q={HEPI (M,X,H) W(H)}

Class 9 C_ P is generic if it satisfies the following conditions:(i) if H E Q and H' C H then H' E 9,

(ii) 9 intersects every dense, definable subclass of P.

Claim 8. Let C C P be a generic class and assume that {,f } E G. Put

G:=UG

Then G satisfies conditions (1)-(5) and hence (M*[G], X*[G]) is a model ofV11 in which the formula -f is satisfiable.

As model (MO, Xo) is countable there are only countably many dense definablesubclasses of P; hence by the standard argument a generic class G exists. ByClaims 5, 6, 7 the classes of those K E P that fulfill condition (2) for f E Fle*

tfEK or ''fEKare clearly definable and dense, as are the classes of K E P that fulfill condition

(4) for V,<r'/i; E Fle*, that is

A,ri E K or {i/i,t/ij}cK, somej <ri<r

and the classes of K E P that fulfill condition (5) for H(O, x) and v E M*, that is{-'(H(O, 0))v(S) I S E C, ISI < t(v)} C K or

{(H(O, v))v(S)} C K, some S E C of length < t(v) or

{(H(0, v'))v(S)} U {,(H(,P, v' + 1))v(A) I A E C, IPI t(v)} C K,some S E C of length < t(v) and v' < v

Hence any G defined from a generic 9 satisfies conditions (1)-(5).This concludes the description of the forcing construction of the model

(M', X') = (M*[G], X*[G])

Q.E.D.

We leave it as an exercise for the reader to give proofs for the preceding twotheorems by modifying the proofs of Theorems 9.1.3 and 9.2.7 and by workingwith ID0(R) (resp. V11) plus the Eo'b-diagram of the original model.

9.5. Witnessing and test treesA class of search problems comes from tasks to witness an existential quantifier ina first order sentence valid over all finite structures. Let 3 TO( 7 ) be a first ordersentence in a relational language consisting, for simplicity, of one binary relationR. Assume that 3 x¢ (x) is valid in every finite structure (i.e., in every digraph inthe case of the language {R}). The search task is, Given a digraph find its verticesU such that q5 ( v) holds in the digraph.

A particular model for solving search problems is the test tree, defined as adecision tree (cf. Definition 3.1.12) but allowing labeling of the inner nodes byarbitrary formulas and leaves by tuples U.

A bounded formula of the form 3x < t(a)9(a, x ) defines for every naturalnumber n a sentence 3.7 < t (n )6 (n, x ) whose validity is determined in a finitestructure: an initial interval of N of the form [0, s(n)], s(a) a suitable term.

If 3 x < t (n)9 (n, x ) is valid for all n we obtain a sequence of search problems,and if the formula 3.z < t(a)9(a, x) is actually provable in bounded arithmetic,then by the results of Chapter 7 we have information in terms of the computationalcomplexity of the witnessing function. In this section we translate this into theterms of the complexity of the test trees solving the associated search problems.

Call a propositional formula ES 't if it satisfies conditions (a) -(d) of part 1 ofDefinition 10.4.8. For example, E S" -formulas are disjunctions of conjunctions ofsize < t.

9.5 Witnessing and test trees 181

Lemma 9.5. 1. Let i > 1. Let 3u < a0 (a, u) be a EZ+1 formula, 6 E F1', Assumethat

TT (R) I- Vx3u < x 6(x, u)

Then for every n there is a test tree T finding, given R, a valid formula from theset

{(0)n,. I u < t(n)}

and satisfying the following conditions:1. the height of Tn is 1og(n)oO)2. every test formula in Tn is ES", where S = 2t and t = log(n)°(1)

Proof The proof of Theorem 9.1.3 (based on Theorem 7.1.4) shows how to trans-late the T2 (R)-proof of the formula 3u < t(a)6(a, u) using the translation (...)of Definition 9.1.1 into a constant-depth LK-proof n' (cf. Definition 4.3.10). Thesize of Jr' is now 21og(n)oal as the term may contain the # function (cf. Lemma9.1.2).

As the cut-elimination was applied first to the original T2 (R)- proof all formulasin 7[' or their negations are equivalent to a ES"-formula (S and t as earlier),or equal to (3u < t(a)0(a, u))n. This is because Eo(R)-formulas translate intoformulas of size log(n) 0(1) (Lemma 9.1.2) and can be written both as disjunctionof conjunctions of size at most t and as conjunctions of disjunctions of size atmost t.

The IND-rule of T2 (R) is simulated in n' by < S cuts that can be arrangedin a binary tree of height < t. The V < : right and 3 < : left are simulated byA : right and V : left with A, V of unbounded (< S) arity.

The proof it' is easily turned into a constant-depth LK refutation it of the setof sequents

{) (,0(a, u))n,u I u < t(n)}

Having the truth evaluation for the atoms (i.e., the relation R) construct a pathP(R) = .1, ... , l;, of sequents of it satisfying

1. 1 is the end-sequent2. all sequents are false for R3. 1;i+1 is a hypothesis of the inference of Jr giving ,

4. ', is an initial sequentConditions 2 and 4 imply that , must have the form

(-O)n,u, some u < t(n)

Hence 0(n, u) is true for R. Thus the path P(R) contains an answer to the searchproblem.

We need to define i+l knowing i. This is trivial in all inferences except thecut-rule, A : right and V : left.

In case of the cut-rule let 1 i+1 be the unique false hypothesis; to decide whichone it is one needs to query the truth value of the cut formula.

In case of A : right with the principal formula A;<. A; , in < S, it is necessaryto find A, false for R. This is done by binary search querying log(m) < t truthvalues of formulas of the form A5<

1S, A; .

The case of V : left is dual to right. As the number of V : lefts and

A : rights on any path through r is bounded by an independent constant (e.g. bythe number of 3 < : left and d < : right inferences in the original TT (R)-proofthat is independent of m) the total number of queries is 0(t) on any path P(R).

This shows that the test tree can be constructed from 7r by turning it upside downand simulating the V : left and A : right inferences as earlier (cf. Theorem 4.2.3).

Q.E.D.

Note that for the formula 3u < t(a)9(a, u) that is E'(R) this gives no infor-mation as then the binary search for u works as well.

We shall state here without a proof a lemma relevant to the test trees but basedon ideas developed later in Chapter 11. The proof of the lemma is almost identicalwith the proof of Theorem 11.3.2 (based on the proof of Theorem 11.2.5).

Lemma 9.5 2. Let 3x9(x) be a sentence in a relational language L' and assumethat -3x6(x) has an infinite model. Assume also that 3x6(x) is valid in everyfinite structure.

Denote by Xn the search problem to find in a given structure with n elementssome u satisfying 9(u).

Then there is no constant k such that each Xn could be solved by a test tree ofheight < log(n)k querying the validity of the Ei'` formulas t = log(n)k, S = 2r.

Note that such test trees for Xn are guaranteed to exist if 9(x) has the formVyo(x, y) and the formula 3x < aby < ao(x, y) is provable in TZ (L'

Next we reexamine Lemma 9.5.1 for the theory S2(R) in place of T2 (R). Weshall consider witnessing test trees instead of ordinary test trees. An ordinary testtree in a node querying 3x < n f (x) branches into two paths, one for the affirmativeanswer and one for the negative one. The witnessing test tree will allow, besidesthese nodes, nodes that branch into n + 2 paths: one for the negative answer andn + 1 for all possible witnesses u < n for the affirmative answer. Hence we mustknow a witness if we want to proceed. Such nodes are called witnessing.

Lemma 9.5.3. Let i > 1. Let 3u < a69 (a, u) be a E,+i formula, 0 E 11'. Assume

SS(R) I- Vx3u < a9(a, u)

Then for every n there is a witnessing test tree Tn finding, given R, a valid formulafrom the set

{(B)n,u I u < t(n)}

and satisfying the following conditions:1. the height of T is O(log(log(n)))2. on every path in T there are only O(1) witnessing nodes3. every test formula in T is ES", where S = 2r and t = log(n)°(1)

Proof. The proof of the lemma is similar to the proof of Lemma 9.5.1. We needto notice only that as the PIND-rule is used instead of the IND-rule there are only< t cuts (arranged in a binary tree of height O(log(t))) needed for the simulationof the PIND-rule.

The A : right and V : left inferences are treated this time with the newwitnessing nodes of the tree. Q.E.D.

The reader may observe that Lemmas 9.5.1 and 9.5.3 are versions of the wit-nessing Theorems 7.2.3 (for T2-1 in place of SZ there) and 7.3.5.

9.6. Bibliographical and other remarksThe translation of Section 9.1 was first considered in Paris and Wilkie (1985)for IDo(R) in a model-theoretic context (Theorem 9.4.1). The proof-theoreticpresentation (and the extension to Ull and V11) follows Krajidek (1995a).

The translation of E a-formulas in Section 9.2 generalizes the translation de-fined in Cook (1975) for PV-equations and was developed in Krajicek and Pudlak(1990a) and somewhat differently in Dowd (unpublished). Propositions 9.2.2-9.2.6 are from Krajicek and Pudlak (1990a). Corollary 9.2.7 is from Cook (1975).The relation between G and Ui was proved in Krajidek and Takeuti (1990). Usinga different translation a relation between G and an equational system PSA wasproved in Dowd (1979).

The idea to use the reflection principles for simulations is due to Cook (1975)and was extended to systems G;, G1 , G, and subsystems of F and EF inKrajicek and Pudlak (1990a), Krajidek and Takeuti (1990, 1992), and Krajidek(1995a). The underlying idea is always the same, but different systems often presentsome nontrivial technical problems.

Corollary 9.3.19 was first observed in Dowd (unpublished). Corollary 9.3.20is from Krajicek and Pudlak (1990a), as well as Theorem 9.3.2 1.

In view of the remark after the proof of Theorem 9.3.24 it would be interestingto have a theory satisfying Theorem 9.3.3 and Theorem 9.2.6 with a polynomialbound instead of bound n (log ")°(') . This would entail (by the same proof) Theorem9.3.24 for P = F. In fact, Theorem 9.3.24 for P = F was proved by a directconstruction in Buss (1991), and analyzing that construction Arai (1991) suggesteda subtheory of Ul fulfilling the preceding requirements: It is an extension of I Eo'bby a form of inductive definition. Another approach would be using the equationaltheory ALV proposed in Clote (1992) relating to F similarly as PV relates to EF.


However, Clote (1992) does not verify that the soundness of F is provable in thetheory.

The proof of Theorem 9.4.1 is a variant of the original proof of Paris and Wilkie(1985) to allow generalization to Vil (Theorem 9.4.2), which is from Krajicek(1995a), and generalizes an unpublished construction of Wilkie.

The content of Section 9.5 expands a bit on a remark in Krajicek (1994).

10

Finite axiomatizability problem

This chapter surveys the known facts about the

Fundamental problem. Is bounded arithmetic S2 finitely axiomatizable?

As we shall see (Theorem 10.2.4), this question is equivalent to the questionwhether there is a model of S2 in which the polynomial time hierarchy PH doesnot collapse.

10.1. Finite axiomatizability of SZ and T2In this section we summarize the information about the fundamental problem thatwe have on the grounds of the knowledge obtained in the previous chapters.

Theorem 10.1.1. Each ofthe theories S2 and TZ isfinitely axiomatizable for i > I.

Proof. By Lemma 6.1.4, for i > I there is a Eb-formula UNIV; (x, y, z) that is auniversal Eb-formula (provably in SS). This implies that T2 and S2+1, i > 1, arefinitely axiomatizable over S.

To see that SS is also finitely axiomatizable, verify that only a finite part of SSis needed in the proof of Lemma 6.1.4. Q.E.D.

The next statement generalizes this theorem.

Theorem 10.1.2. Let 1 < i and 2 < j. Then the set of the VEb -consequencesof T2

dEb(Tz):=EE'ITTI-0}is finitely axiomatizable.

The sets V E' (SS) and V Eb (UU) are also .finitely axiomatizable.

185

186 Finite axiomatizability problem

Proof Let 0(a) E Eb. Assume

T2 F--0 (a)

Then by Theorem 9.2.5

s2H(dxG;1-il4)IIq(ixu)

for some bounding polynomial q (y) for 0, and hence also

SS + j-RFN(Gi) H (Vx Tautj(IIOllgxllxp))

This implies by Lemma 9.3.12

SS + j-RFN(Gi) H Vx O(x)

Hence every VEb-consequence of TT follows from SS + j-RFN(G1). Note thatalso j-RFN(G,) E VEj(Tz) (by Theorem 9.3.16) and that SS has a finite VEZ-axiomatization.

An analogous argument applies to VE'(SS), using G* in place of G,, and toVE'(UZ ), using G in place of Gi. Q.E.D.

The following statement is a separation of the theories S2 and T2 conditionedon a combinatorial property.

Theorem 10.1.3. Let i > 1, i > j > 0 and assume that the proofsystem G* doesnot j-polynomially simulate the system Gi.

Then SS TT and, in fact, T2' is not V Eb-conservative over S.

Proof. The VE'-formula j-RFN(Gi) is provable in TT (by Theorem 9.3.16) and(by Theorem 9.3.17) is provable in SZ if and only if G, j-polynomially simulatesGi (provably in Si). Q.E.D.

10.2. T2' versus 52+1

Inspired by Theorem 7.4.1 we consider the following computational complexityprinciple.

The principle is associated with an optimization problem (see Section 6.3) ofthe type: R(x, y) is a binary relation satisfying

VxR(x, 0) A Vx, y (R(x, y) -* lyl < lxl)

and the task is, given a find b such that

R(a, b) and Ibl is maximal

In Section 6.4 we considered only polynomial time predicates R(x, y), but nowwe allow higher complexity.

10.2 T21 versus 52+i 187

Definition 10.2.1. For i > 0, the principle S2; is the following computationalprinciple:

For any relation R(x, y) E 11' there are D+I -functions

f i (a), f2 (a, b1) .... fk(a, bl, ... , bk-1)

that solve the preceding optimization problem in the counterexample interactiveway described at the beginning of Section 6.3. That is, the following is true:either `dzR*(a, fi(a),z) is true, or if b1 is s.t. _,R*(a, f1(a),b1)then `dzR*(a, f2(a, b1), z) is true, or if b2 is s.t. _R*(a, f2 (a, bj), b2)then ...

then VzR*(a, fk(a, b1, ..., bk_1), z) is truewhere the relation R* (x, y, z) is defined by

R(x, y) A (lyl < IzI < IxI - -R(x, z))

Recall the convention that rIo = oP = P.

Lemma 10.2.2. The principle S2; is implied by E +I = O +,, and S2j implies

E +1 c +1 /poly and hence also E+2 = I1ip+2

Proof. That Q, is implied by E+1 = +1 follows from the fact that the binarysearch for an optimal b can be performed (under the hypothesis E+I = 0 +I) bya D +1-function. The principle Q, then holds with k = 1, in fact.

The proof of the second part is more involved. We show that the hypothesis thatS2; is valid implies E+I c 0+1 /poly. That the latter implies E+2 = n +2 is by

Theorem 3.1.6 straightforwardly generalized to i > 0.We shall treat only the case i = 0; the general case is completely analogous.Let A(v) be a Ep-predicate and assume A(v) is defined

A(v) := 3w < v B(v, w)

with B(v, w) a AP -predicate.We want to show that there is a polynomially bounded advice, function h (n) (cf.

Section 2.2) such that for some DP-function g(u, v)

Vv, (3w < v B(v, w)) - (B(v, g(h(IvI), v)))

holds. Call w a witness for v if w < v A B(v, w), and define a AP-relation

R(a, b) := ifa=(vi,...,vr) and b=(wl,...,w,) then

s < r and d.f < s: "wt is a witness for vt"

Now we apply the principle S2o: We assume that there are f -functions f1 (a),f2(a, b1), ... , fk(a, b1 .... , bk_1) interactively computing, given a, an optimal bsuch that R(a, b) holds.

Fix length n. We shall describe how the advice h(n) is constructed. Let V1 =(l v i = n 13w < v B(v, w)) and let w(v) denote a witness for v E V1. For a k-tuple(the same k as is the number of functions f1, ... , f k ) a = (vl I ,--- vk) of distinctelements from V1 consider the following algorithm constructing a pair (e, w):

Step 1 Compute fl (a).Step 2If f1(a) = (w'., ... , j > 1, and R(a, f1 (a)) holdsthen put (C, w) (1, w,) and STOP.Else compute f2(a, (w(v1))) and GO TO Step 3.Step m (2 < m < k):If fm-1 (a, (w(vl )), ... , (w(vl ), ... , W(Vm-2))) = (w'1 , ... , W j), jm - 1, and R(a, (w,_., w')) holdsthen put (e, w) := (m - 1, w'm_1) and STOP.Else compute fm(a, (w(v1)),..., (w(v1),..., w(v,,,- 1))) and GO TO Stepm + 1.Step k + 1 If the algorithm reaches this step then it necessarily holds that

fk(a, (w(vl)), ... , (w(vl), ... , w(vk-1))) = (w'1,..., wk) and

R(a, (w'., ... , wk)) is true

Put (t, w) := (k, wk) and STOP.The idea of the algorithm is that from the witnesses w (v 1), ..., w (ve _ 1) it is

possible to compute by a P -algorithm a witness (namely w) for ve E V. Nowwe utilize this for a construction of the advice h(n).

For Q a (k - 1)-element subset of V1 and V E Vl \ Q we say that Q helps vif and only if there is an ordering IV],- ve-1, ve+l, ... , vk) of Q such that theprevious algorithm assigns to the k-tuple

(vl,...,ve-l, v,ve+l,...,Vk)

the pair (t, w), where w is a witness for v.Put N1 = I V1 1. The previous algorithm produces the pair (f, w) from any k-

tuple of elements of V1; this shows that the number of pairs (Q, v) such that Qhelps v is at least (k'

On the other hand, there are only (k '' 1) (k - 1)-element subsets Q of V1, andso there must be one subset Q1 c V1 such that Q1 helps to at least

(k') _ N1-k+l(^'') kk-1)

elements v E V1.

10.2 TT versus S2+i 189

Put

V2 := V1 \ {v E V1 I Q1 helps v}

and N2 V2 1. Analogously there must be a (k - 1)-element subset Q2 c V2 thathelps at least

(N2) N2 -k+ 1(kNZ1) k

elements V E V2.Iterating this procedure we obtain a decreasing chain

V1 DV2D...

and a sequence

Q1 c V1, Q2cV2,...

of (k -1)-element subsets Q j of Vj, such that Q j helps all elements v E Vj \ V j+1.A simple computation shows

Nj+1 <(kk1)J

N1+k

and so we get Nt < k after at most t steps of the procedure for

t:=0(1og2(kk. 1og2(Nl)I=0(n)- 1) /

Define the advice h (n) to be the sequence of all pairs

(v, w(v))

for all elements

V E Q1 U ... U Qt-1 U Vt

The function g(u, v) for u = h(IvI) is then defined by the algorithm1. try whether v E Q1 U ... U Qt_1 U Vt and if so output w(v) (which is a

part ofh(IvI)).2. if 1 fails try consecutively whether Qj helps v , j = 1, ..., t - 1. One of

them must, giving a witness for v (which g(u, v) outputs).The second part of the lemma then follows, observing that, for example, the

problem of computing the maximal size of a clique in a graph, that is, optimizingIbI in

R(a, b) := "b is a clique in graph a"

is an NP-complete problem. Q.E.D.

190 Finite axioinatizability problem

The disadvantage of the construction of the advice in the proof of the previouslemma is that it is not apparent how to formalize it within the theory S2. If onehad a construction that could be formalized in S2, then Theorem 10.2.4 could alsobe formalized in S2. The next lemma offers such a construction, albeit for a bitweaker statement.

Lemma 10.2.3. Let i > 0 and assume that.for every 11' predicate R(x, y) thereare +1-functions fl (a), ... , fk(a, bI, ... , bk_I) such that for R*(x, y, z) de-fined from R(x, y) as in Definition 10.2.1 T2 proves

`dx, yl , .... yk; R* (x, f1(x), Yi) V ... V R*(x, fk(x, Yl, ..., yk- I ), yk)

Then

TZ I- E PI C R+1 /poly

This means that for every E+1 -formula A(v) there is a TI+1 formula C(v, u) forwhich

TZ F b'x3uVv; IxI = I v I -a A(v) - C(v, u)

Proof. Let A(v) be a E+1-formula of the form 3w < v B(v, w) with B E l1',and let R(x, y) be defined for B as in the proof of Lemma 10.2.2, where functionsfl, ... , fk fulfill the hypothesis of the lemma.

Fix length n and consider a formula E(t, u) that is the conjunction of theconditions

(i)(ii)

u=(v.+l,...,vk)A t> lAIvr+1I=...=lvk1=n

Vvl,... , v1, WI.... , Wt Vt, w; n (wi < vi A B(vi, wi))1<<i<1

A (t, w) = D((v1, ... , Vk)) -+ £ < t

where D(a) is the algorithm described in the proof of Lemma 10.2.2 with w (vi)wi,i = 1,...,t.

Let to be the minimal t < k such that for some u, E(to, u) holds. Such toexists as there are only finitely many possibilities for to; in particular, no inductionaxioms are needed to assure the existence of to.

Claim. TT proves that, for I v I = n and

u = (vrp+I, ... , vk )

such that E(to, u) holds the following conditions are equivalent1. A(v) holds

10.2 TZ versus S2+1 191

2. C(v, u) holds where C(v, u) is the formula

`dv1,...,vto_1 VwJ,...,wt0_t V ,w,

A (Ivi I = n n wi < vi n B(vi, wi ))0toA (t, w) = D((vo, , Vto-1, v, vto+1, .. , Vk)))

((e<tones<venB(ve,w))v(f=tones<vnB(v,w)))

To see the claim, reason in T2 and assume first that A (v) holds: that is, that forsome w < v the formula B(v, w) holds. Then the second condition is satisfied byS2i, that is, by the definition of the algorithm D(a) f r o m the functions f i , ... , fk,and by the choice of to and u.

Assume now that condition 2 is satisfied but that A (v) does not hold. Thatmeans that (f, w) from 2 must satisfy f < to, but then u' = (v, Vto+l, , vk)would satisfy E(to - 1, u'), contradicting the previous choice of to as the minimalt for which 3x E(t, x) is valid.

The definition of the algorithm D(a) is D+, and the formula C(v, u) fromcondition 2 is IIb+l. Q.E.D.

We can now employ these two lemmas to get a strong theorem about the relationof theories T2 and S2+'.

Theorem 10.2.4 (Krajicek, Pudlak and Takeuti 1991). Let i > 1. Then the the-ories T2 and S2+1 are distinct, assuming that Ei+1 ¢ A+, /poly, or assumingthat the polynomial time hierarchy PH does not collapse to its (i + 2)"d level:

P PEP1 i+2'

In fact, the theories T2 and S2+1 are distinct, assuming that T2 does not provethat EP I C rl,

1/poly.

The same is true for i = 0 with PV1 in place of T2.

Proof We shall show that T2 = S2+1 implies that the principle S2i is prov-able in T2, that is, that for every rIb-formula R(x, y) there are o +i-functionsfi (a), ... , fk(a, b1, ... , bk_ I) O +i -definable in T2 such that T2 proves the dis-junction

R*(a,.fi(a),bJ)v...vR*(a,fk(a,bl,...,bk-1),bk)

with R*(x, y, z) defined from R as in Definition 10.2.1.Consider the formula

*(x, y) := IyI < Ixl A R(x, y)

It is a n'-formula and i(x, 0) holds (by the assumption that `dx R(x, 0) holds).

By Lemma 5.2.7 SS+' admits the IIb-LENGTH-MAX principle: That is, S2+1proves that there is y of the maximal length satisfying i (x, y)

3y(IYI < Ixi) Vz(lzi < Ixl), R(x, y) A (IYI < Izl - -R(x, z))

That is

Vx 3yVz R*(x, y, z)

(with the bounds < x to y and z implicit).Now use the hypothesis of the theorem to get

T2' I- Vx3yVz R*(x, y, z)

Theorem 7.4.1 then applies (with 0 = R*) and guarantees the existence of theD+' -functions fi , ... , fk with the required property

T2' F-R*(a,fi(a),bi)v...vR*(a,fk(a,bl,...,bk-1),bk)

By Lemma 10.2.2 this implies that E+' c +I /poly and hence E +2 = II +Z,while Lemma 10.2.3 implies that T2 proves that E+' c f1+' /poly. Q.E.D.

Corollary 10.2.5. Let i > 1 and assume that T2 = S2+'. Then T2 = S2. Thesame holds for i = 0 with PV1 in place of T2.

Proof We use the second part of Theorem 10.2.4 by which the hypothesis of thelemma implies that TT proves EP I c n+ /poly.

First we show that T' = T2+' . Let a (x) be a E+, -formula and, working inT21 assume that

a(0) A`/x < v(a(x) --* a(x + 1)) A --a(v)

By E +i c II,+' /poly we know that for every length n there is an advice un suchthat

Vx, IxI = n a(x) C(x, un)

where C(x, y) is a fixed I1+' -formula. Joining the advice uo, ... , uIvi into one uwe have

Vx <v,or (x)=C'(x,u)

where

C'(x, u) := C(x, uIxI)

is a R+i-formula.But then or(x) is provably A

+'on the interval [0, v] and as S2+' admits A +I-

IND (by Lemma 5.2.9), T2(= Sr') satisfies the induction for or (x) on any interval

[0, v]. This shows that T2' = T2+'.

10.3 SZ versus T2 193

To show that, in fact, TT = S2, observe that

T2 F- Ei+1 c 11+1 /poly

implies that any q(x) is in T2 on any interval [0, v] equivalent toa 11 +1-formula (with parameters depending on v); this is proved by inductionon the logical complexity of rt. Hence the induction for any rt(x) follows fromn +1 IND, and hence from T2' as we have already established that T2' = T2+1

Q.E.D.

Corollary10.2.6. To show that S2 is not finitely axiomatizable it is necessary andsufficient to show the existence of a model M of S2 in which the polynomial timehierarchy PH does not collapse, that is, in which for every i the class E +1(M) of

Esubsets ofM definable by +1-formulas with parameters is strictly larger thanthe class E,'(M).

Proof. S2 is finitely axiomatizable iff (by Theorem 10.1.1) S2 = TT for some i if(by Theorem 10.2.5) T2i = 52+1 for some i. This implies (by Theorem 10.2.4)

Tz I- Ei+1 c 11+1 /poly

for some i.We shall show that the last condition implies that

T2E+3-II+3

which trivially implies T2+3 = S2, completing thus the proof of the corollary.The inclusion E+1 C II +1 /poly obviously implies (in T2 )

E +1 /poly c II+1 /poly

and in particular PH c rI +1 /poly. By analogy with Theorem 3.1.6 (for i + 2 inplace of 1) this gives (again in TT)

P 1PH = E3 = ni+3

Q.E.D.

We note that Buss (1993b) showed that Er+1 _c I1+1/poly implies even thatPH C 8(E ,P2): that every set in PH is a Boolean combination of E+2-sets. Thisis not needed, however, for the previous corollary.

10.3. S2 versus T2

This section presents a conditional separation of theories S2i and T2 .


Theorem 10.3.1 (Krajicek 1993). Let i > 1. Then the theories S2 and T2 aredistinct, assuming that the classes PEp[O(Iogn)] and 0+i are distinct. In fact,the assumption can be weakened to the assumption that SS does not provePEf [O(logn)] = O+i.

Proof Let i > 1. By Theorem 6.1.2 TZ can E+i -define all D+1-functions and, inparticular, all 0 +i -predicates. By Corollary 7.3.6 the predicates E+1-definable

in S2 are exactly those from the class PEP[O(log n)] and, in fact, Theorem 7.3.5

shows that the equivalence of a PEf[O(log n)]-predicate to a E+1-predicate isprovable in S.

Hence SS = T2 implies that any E -definable predicate in TT (i.e., any 0+ -

predicate) is in SS equivalent to a P£, [O(logn)]-predicate. Q.E.D.

Note that it is open whether Pyip[O(logn)] = D+, implies the collapse ofthe polynomial time hierarchy PH. We thus cannot obtain a statement analogousto Corollary 10.2.5.

Corollary 10.3.2. Assume that there is a model M of S2 in which the classesEo(E6)(M) and O +i (M) are distinct for all i > 1.

Then all theories S2 and T2 are mutually distinct.

Proof. The class E0(Eb) is the class of formulas that are obtained by logicalconnectives and sharply bounded quantification from Eb-formulas. These define

exactly thePyiv[O(logn)]-predicates

(cf. Theorem 6.2.3). Hence the hypothesisimplies, via Theorem 10.3.1, that SS T2, all i > 1.

The inequality P£%P[O(logn)] 0 O+I implies, in particular, that EP = E+1,

and hence by Corollary 10.2.6 also that PV1 ¢ S; and TT # S2+1, i > 1.Q.E.D.

10.4. Relativized casesIn this section we prove, using the results of previous two sections, that all theoriesSS (R) and T2 (R) are distinct.

Recall that these are the theories defined as S2 and TZ but in the languageL U {R} extending L by a new binary relation symbol R. Equivalently we couldwork with the theories S2 and TZ in the language L2 of the second order boundedarithmetic.

Lemma 10.4.1. Assume that there is an oracle A such that the relativized poly-nomial hierarchy PHA does not collapse. Then the theories T2 (R) and S2+' (R)are distinct.

Assuming that there is an oracle A such that all relativized classes LEipiAIO +i (A) are different, then the theories S2 (R) and T2 (R) are distinct.

Proof. The lemma is a corollary of the proofs of Theorems 10.2.4 and 10.3.1, whichboth "relativize" to theories SS (R) and T2 (R). This is because the definability andthe witnessing theorems from Chapters 6 and 7 readily relativize. Q.E.D.

A construction of an oracle separating the levels of the polynomial time hier-archy was proposed by Furst, Saxe, and Sipser (1984). It reduced the existenceof such an oracle to the existence of exponential lower bounds on the size ofdepth d - 1 circuits computing Sipser's function Sd(xl,,,,.,;d) (cf. Section 3.1).Such bounds were announced by Yao (1985) and proved by Hastad (1989). In thefollowing theorem we use similar combinatorics to construct oracle A for whicheven

(Ef (A)) # API (A)

for all i > 1. Recall the class <tt (E°)

from Theorem 6.2.3.

Definition 10.4.2.(a) For i > 1 define the l1'_1(a) formulas

(1) '1(x,yi):=Y1 =0Va((i,x,YI))(2) 2(x, yl) := yl = 0 v VY2 < (x log(x))1i2 a((i, x, yl, y2))

(3)

ti (x, YO := Yl = 0 (1vVy2 < x3Y3 < x ... Qi-lYi-1 < x QiYi < ixlo (x) 1/2

\ 2 1

xa((i,x, y1,...,Yi))

where Q1-1 (resp. Qi) is V (resp. 3) iff i is odd.

(b) For i > 1 define the predicate

Pi (x, a) :_ "the maximal yl < x satisfying ti (x, y1) is odd "

The predicate Pi generalizes the ODDMAXSAT problem.

Lemma 10.4.3. The predicate Pi(x, A) is in the class 0+1(A), for all i > 1 andAcw.

Proof The algorithm finding maximal Y1 < x such that ii (x, yl) based on thebinary search defines a Op 1(A)-function. Q.E.D.

The crucial idea allowing later combinatorial arguments is to fix in ,fi (x, yl )values of x and yl and to think of such *i (m, u) as of a predicate defined for setsA c_ w. Next we shall define the Boolean circuitsi (m, u) computing such apredicate.

Definition 10.4.4.(a) The input variables of circuit Sri (m, u) are of the form

Pu,y2, ..y;-i.t

for every (i - 2)-tuple yZ, ... , y,_I< m and t < (i m log(m)/2)'/2.(b) The circuit ri(m, u) is thepropositional translation (t/r,(x, yi))(m, u) with

the atomic formula a((i, x, y, ... , yi)) translated as p,,. yi (cf. Defini-tion 9.1.1).

(c) The circuit Ci' is a disjunction of Fm/21 conjunctions

'lri(m,u)A A -Ti (m, v)u<u<m

one for each odd u < m.

The following lemma is straightforward.

Lemma 10.4.5. For any i >I and m:1. The circuit ifri(m, u) computes the truth value of i/ri(m, u) under the as-

signment

Py...... yi := 1 i T (i,m, yi,...,yi) E A

2. Under the same assignment the circuit C,"' computes the truth value of thepredicate Pi (m, A).

The next definition and lemma are crucial technical concepts in the method ofrandom restrictions in Boolean complexity.

Definition 10.4.6.(a) Let (Bj)j be a partition of the atoms of the circuit C!" into mi classes of

the, form

IP t I t <y1/2

2 jone f o r e v e r y choice o f yl, ... , yi _ I < m.

(b) Let 0 < q < 1 be a real number. A probability space Rq of randomrestrictions (cf. Section 3.1) is a space of restrictions p determined by thefollowing process

(a)

I *, with probability q

0, with probability 1 - q

(b) for every atom p E Bj

p(p)sj, with probability q

jt 1, with probability I - q

(c) A probability space R9 is defined as R9 interchanging the roles of 0and 1.

(d) For any p E Rq, g(p) is a further restriction and renaming of the atomsdefined as_follows (for all j):

(a) for j s. t. sj = * let pj = py, be an atom from Bj given value* by p with the least t,

(b) g(p) gives value I to all p E Bj, p # pj s.t. p(p)

(c) g(p) renames pj to atom py,,...,y;_,.

(e) For P E Rq , g(p) is defined as in (d) but again interchanging 0 with 1.(f) For G a circuit with the atoms among those of the circuit C7' , GP denotes

the circuit obtained from G by first performing the restriction p and thenalso the restriction g(p). Note that the atoms of GP are among those of

Ct,,,-I

The following lemma is crucial. We refer to Hastad (1989) for full details ofthe proof.

Lemma 10.4.7. Fix a value q :_ (2 i . log(m)/m)'/2 and assume that m issufficiently large. Then the following three conditions hold.

1. Let G be a depth 2 subcircuit of C"': That is, G is either an OR ofANDs ofsize < (i - in log(m)/2)'/2 or an AND of OR5 of size < (i - in log(m)/2)1/2.Pick a random p from R9 if G is an OR of ANDs and from Rq if it is anAND of ORs.Then with the probability at least

1 - 1 m-i+i3

GP is an OR (resp. an AND) of at least ((i - 1) m log(m )/2)1 /2 differentatoms.

2. Let i > 3. Pick p random from Rg for i even or from Rq for i odd.Then with the probability at least

2

3

I . That is, after some renaming of the atomsthe circuit (Cr )P contains C!"(Cr')P becomes C;" 1.


3. Let i = 2. Pick p from RIq

randomly. Then with a probability at least

2

3

the circuit (C2')p contains circuit Cn, for n = (m log(m)/2) 1/2.

Proof We sketch the proofs of the parts of the lemma.(1) Assume that G is an OR of ANDs and p is drawn from RI

q(the case when

G is an AND of ORs and p E Rq is treated similarly).An AND gate of G corresponds to class Bj of atoms and takes after p the value

sj with the probability at least

ini logos)) 1/2lie (

1 - (1-q)IB1I _ (I -(2ilog(m)2

> 1 - 6m-i

Hence with probability at least

1 _ 1 m-i+l6

this is true for all ANDs in G.The expected number of ANDs assigned the values sj in the definition of p and

not the value 0 is m q = (2im log(m))1/2, and thus with probability at least

1 - 1 M-i6

we can get at least ((i - i)m log(m)/2) 1/2 se's assigned.Hence with probability at least

1 _ 1 m-i+l3

the circuit GP is an OR of at least ((i - 1)m log(m)/2) 1/2 different atoms.(2) In C"' there are m' -2 different subcircuits G of depth 2. From (1) we have

that with the probability at least

1 - 1 m-1 > 2

3 -3all of them are restricted by p as required in (1). So, after renaming the atoms,(Ci"T becomes the circuit Cm 1.

(3) For i = 2 the circuit -i/i;(m, u) is just an AND of size (m log(m))1/2 corre-sponding to the classes Bj, and there are m of them. Hence (1) implies that with theprobability at least (5/6) they are all assigned the value sj, which is again with theprobability at least (5/6) equal to * for at least (m log(m)/2)1/2 of these ANDs.

Q.E.D.

Definition 10.4.8.1. A Boolean circuit is called E ,. if

(a) it has depth i + 1

(b) its top gate is an OR and ORs and ANDs alternate in levels

(c) it has at most S gates in levels 2, 3, ..., i + 1

(d) its bottom (depth 1) gates have arity at most t

(e) its atoms are among those of Cr.

2. A tt-reducibility D = (f; El, ..., Er) of type (i, m, k) is a Boolean func-tion f(wl,..., w,.) in r < log(m)k variables and with r Esm-circuitsE1, ... , E,., where S = 21og(m)k and t = log(S).

3. A tt-reducibility D of type (i, m, k) computes a junction of the atoms of Cm:First evaluate wj Ej on the atoms and then evaluate f (w1, ... , w,.)on wj's.

Lemma 10.4.9. Let G be an AND of ORs of size < t with the atoms among thoseof C. Pick p randomly, from Ry or from Rq

Then with probability at least

1 - (6gt)s

the circuit GP can be written as an OR of ANDs of size < s.The same is valid for the probability of the switching an OR of ANDs into an

AND of ORs.

This is one of Hastad's two switching lemmas. For the proof consult Hastad(1989).

Lemma 10.4.10. Let D be a tt-reducibility of type (i, m, k) and let q(2i log(m)/m)1/2. Pick p at random from Rq orfrom Rq .

Then with the probability at least

1/2

is

D,' (f;EP,...,Ep)a tt-reducibility of type (i - 1, m, k).

Proof. Putt := s := log(m)k and apply Lemma 10.4.9. Then the probability of afailure to switch one depth 2 subcircuit of any Ej is at most

(6qt)' =(6(2i log(m) )1/2 log(M)k )Iog(ni)k < 2-2.1og(m)k

for sufficiently large m.

There are < log(m)k 21,g("')' such depth 2 subcircuits, so with probability atleast

1 - log(m)k2-log(m)k > 1/2

all of them are switched. The switched subcircuits can then be merged with thegates at level 3, to decrease the depth of circuits Ej by 1. Q.E.D.

The following lemma shows the nonexistence of a particular tt-reducibility.

Lemma 10.4.11. Let i > 1 and k be fixed. Then, for in sufficiently large there isno tt-reducibility of type (i, m, k) correctly computing the predicate P, (m, A) forall ACw.

Proof. The proof of the lemma divides into two claims.

Claim 1. Let D be a tt-reducibility of type (i, m, k) computing the predicatePi (m, A) for all A c co.

Then there is a tt-reducibility D, of type (1, in, k) computing correctly thepredicate P, ((m log(m)/2)1"2, B) for every B c w.

Claim 2. For k fixed and in sufficiently large there is no tt-reducibility of type(1, m, k) computing correctly P, ((m log(m)/2)'/2, B) for every B c co.

We prove Claim 1 first. The predicate P, (m, A) is computed by the circuit CT(Lemma 10.4.5). By Lemmas 10.4.7 and 10.4.10 (and the probability q definedthere) a random restriction p (drawn from Rq for i even and from Rq for i odd)converts with probability > 1 /6 simultaneously C"' into C' and Di = D into att-reducibility Di_I of the type (i - 1, m, k). Hence there exists such a restrictionp and clearly C', and Di-1 compute the same predicate, that is, P-1 (m, A).

Apply this conversion (i - 1)-times (with part 3 of Lemma 10.4.7 in the laststep) to prove the claim.

Now we prove Claim 2 by a direct diagonalization of any type (1, m, k) tt-reducibility D.

Put n := (m log(m)/2)'/2 and t := log(m)k. Let D = (f; Ei, ... , E,.) be atype (1, m, k) tt-reducibility and for simplicity of notation denote C' by C.

We shall construct a sequence of sets of numbers AS , AS , IS. satisfying1. AS fl AS = 0 and any number in AS U AS is smaller than t.2. IAS I <sand IAs UAS 1 _<st.3. at least half of the numbers < max(A) belong in AS U A-s4. IS S; (1,...,r}andIIj=s.5. for every B C w for which

AS CB A AS flB=0

and for every j E Is it holds that

EB = 1

EB denotes the circuit Ej evaluated according to B.Put Ao := A- := 10 := 0. Assume that we have AS , AS , IS satisfying the

conditions stated.Put B := AS (hence E q = I for all j E IS) and consider three cases:(a) DB = 1 but max(B) is even, or DB = 0 but max(B) is odd. In this case

STOP.

(b) DB = 1 and max(B) is odd. Consider a set

S={x <2` 1max(AS) <x,xeven, x 0 As }.

By conditions 1, 2, and 3, the set S is nonempty. Two subcases may occur:

(b l) We may add some x c= S to B to form B' := B U [x) such thatDBE=DB=1.In this subcase put A +1 := AS U {x} and AS+1 := AS and STOP.

(b2) We cannot add any x E S to B with the property (b 1).

Take x := min(S) and put A+1 := AS U {x}. As the circuit Dchanges value, some Ego for jo Is had to receive new value 1. Takean AND of Ego (containing x), which becomes true, and add indicesof all atoms negatively occurring there to AS to form As+1. Note thatthis is correct as none of them is in AS .

Put Is+1 := Is U { jo} and GO TO step s + 2.

(Note that the new sets A+,, AS+1, and Is+I fulfill the conditions1-5. For example, 3 holds as x was chosen to be the minimal elementof S.)

(c) DB = 0 and max(Aa) is even. Take a set

S = {x < 2` I max(AS) < x, x odd, x V AS }

and proceed analogously with case (b).If the sequence is not completed at step s, necessarily Is ¢ I,s+l and hence the

construction must eventually halt.Take B := AS for the final s. Clearly DB does not agree with CO; this proves

Claim 2 and hence also the lemma. Q.E.D.

Theorem 10.4.12. There is an oracle A such that for all i

<tr (ET(A)) 0 +I (A)

and hence also

EP(A) 0 FlP(A)

Proof. We construct the oracle A c_ w such that for all i > 1 the predicate P; (x, A)is not in <t, (EP(A)). This is enough by Lemma 10.4.3.

Let Mi, j = 0, 1.... enumerate all polynomial time machines. We shall con-sider successively pairs (i, j) and build A in stages to assure that Mj is not att-reducibility of P, (x, A) to EP(A) (to some canonical EP(A)-complete set).

Let A s be the approximation of A constructed in the firsts stages and let (i, j) bea pair to be considered next. Choose number m := ms+1 so large that all numbersconsidered in the firsts stages are small w.r.t. m. The machine Mj outputs on m aBoolean function f (w1, ... , wr) and queries zj, ... , Zr to EP (A). Such queriesnaturally correspondand to the evaluations of a E S'log("-circuit (a propositional trans-Y P I.,m

lation of the instances of the E'(A)-formula defining the EP(A) complete set),with S = 2Iog(m)'

In these circuits first evaluate the atoms "n E a" according to AS and set to 0all atoms "n E a" f o r n not of the f o r m (i, m, yl, ... , yt ).

This leaves us with a tt-reducibility of type (i, m, k), and by Lemma 10.4.11 itcannot compute the predicate Pi (x, A) correctly for all A C w.

Take a finite As+, D AS in such a way that the reducibility fails, and hence themachine Mj does also.

Proceed to the next pair (i, j).This completes the proof of the theorem. Q.E.D.

Corollary 10.4.13. For all i

SS(R) 0 T2 (R) 0 Sz (R)

and

PVC (R) 0 SZ (PVC , R)

Proof. The proof follows from Theorem 10.4.12 and Lemma 10.4.11. Q.E.D.

10.5. Consistency notionsThis section is devoted to examining what can be achieved by Godel-type ar-guments in the context of the problem of the finite axiomatizability of boundedarithmetic.

That Peano arithmetic PA is not finitely axiomatizable was proved by Ryll-Nardzewski (1952). Rabin (1961) strengthened this to show that PA is not ax-iomatizable by any set of axioms of a bounded quantifier complexity. PA hassubtheories IDo c IEO C IE° C ..., based on induction for the E?-formulas,

10.5 Consistency notions 203

and each of them is finitely axiomatizable (utilizing universal E°-formulas, anal-ogously to Section 10.1). Every theory IE+1 proves the consistency Con(IE?)of theory I E9, which is, by Godel's theorem, not provable in I E9 itself. Hence

I E+1 # IE0. We refer the reader to Hajek and Pudlak (1993) for information onthis topic.

Of course, we would like to apply the same idea in bounded arithmetic butthere are serious obstacles. First is a theorem of Paris and Wilkie (1987b) (for thedefinition of Q see Section 2.1; Exp is a IIZ axiom Vx 2y, I yl = x).

Theorem 10.5.1. The theory S2+ Exp does not prove the consistency of Robin-son's arithmetic Q.

It follows that we cannot separate the theories SS and TZ by statements of theform Con(SS).

A natural notion to consider then is the notion of bounded consistency,BdCon(T), formalizing "there is no T proof of 0 = 1 using only bounded formu-las." But Pudlak (1990) showed that S2 does not prove BdCon(SZ) either.

However, we do not have to guess various modifications of the consistencystatements as Theorem 10.1.2 tells us that Con(G,), defined in Section 9.3, isthe strongest consistency statement (over SZ) provable in T2: That is, every dllb-consequence of Tz follows in SZ from Con(G; ). Thus the problem is to prove thatSZ does not prove Con(G1). A problem in carrying out the usual diagonalizationargument is that the metanotion of provability (provability in S2) is different fromthe notion (the provability in G;) to which the diagonalization is applied.

This obstacle can be removed to some extent. In particular, we define a notionof regular provability in (fragments of) S2 and show that RCon(T,` ), a regularconsistency of TT, is in SZ equivalent to Con(G1). However, we are unable toprove that SS does not prove RCon(T,).

I include this material primarily to illustrate what cannot be achieved by Godel-type arguments. Some corollaries are nevertheless obtained.

Definition 10.5.2. An i-regular proof of a # -free sequent

of Eb00-formulas is a triple

(7r,t,d )

satisfying1. 7r is an LKB proof of IF -+ A using the Eb IND-rule2. the function symbol # does not occur in n3. all formulas in r are strictE'

4. proof r is in a free variable normal form: the eigenvariables (the variableseliminated in V < : right and 3 < : left) are all mutually distinct and arealso distinct from all parameters

5. if a are all parameters (i.e., the free variables in I' -+ A) and b =(bo, ... , bk) all other free variables in n then

(i) if the elimination inference of bu is below the elimination inferenceof b then u < v

(ii) the elimination rule of any bu is one of

Eb IND

A(bu), 11 ---). E, A(bu + 1)A(0), n -+ E, A(r(a, bp, ... , bu-1))

3 <: left

bu < r(a, bo, ... , bu-1), A(bu), 11 --> E3x < r(a, bo, ..., bu-1) A(x), n - E

V <: right

bu < r(a, bo,... , bu-1),11 -± E, A(bu)II E, Vx < r(a, bo,... , bu-1) A(x)

6. i is a tuple of k + 1 terms to (a) and it holds that

bo <to(a),...,bu-1 <tu-1(a)) r(a,bo,...,bu-1) <tu(a)

where r (ii, l, o, ... , bu_ 1) is the term from the elimination inference of buin S.

7. d is a (k + 1)-tuple of proofs du such that each du is a proof of the sequentfrom 6 that is without the IND-rule, is quantifier-free and # -free, and theonly variables in du area, bo, ... , bu_1.

RCon(T1) is a b'11b formula formalizing that "there is no i-regular proof ofthe empty sequent."

We only sketch the idea of the following statement; the details are in Krajicekand Takeuti (1992).

Theorem 10.5.3. For i > 1, Tz is not `d nb-conservative over T,".In particular, the formula RCon(T1') is provable in TZ but not in Tl .

Proof (sketch). First we shall discuss why RCon(T1') is provable in T2. The ideais to use a partial truth definition for the #-free strictE' formulas. Such a definitionis a Eb -formula Tr1 (x, y) in two variables that has the property that for any #-freestrictE b formula 0 (a) and any tuple n = (n 1, ... , nk)

Tri(1ol,n) = O(n)

where (0l is the GOdel number of ¢: that is, a number coding formula 0 as asyntactic object. Such a definition is constructed by induction on i, and for theinduction step as well as for the case i = 0 a Ab-definition of a value of a #-freeterm

val((tl,n) = t(n1,...,nk)

for any term t (a) is needed.The function val(x, y) is defined straightforwardly using a sequence of the

values of subterms (which is unique and hence the function is Ab -definable) butwe have to, in advance, bound val (x, y) by a term of L.

Assume that the coding of terms and formulas is defined in a natural way (cf.Section 5.4) such that, in particular

11111 = 0(111)

By induction on the syntactic complexity of the term t (a ) one shows

t(n) < (max(i) +2)O(ItI)

and hence the bound

val(x, y) < (x#y)o(t)

is sufficient.The partial truth definition Tr; (x, y) we obtain in this way is a Eb-formula, and

S2 proves Tarski's conditions. We may extend Tr; to a partial truth definition STr;for sequents consisting only of strictEr-formulas. The formula STr; is then Ab+tin S.

Now let the triple (n, t, d) be an i-regular proof, where Jr is a sequence of thesequents St , ..., Se. Let a be the parameter variables and bo, ..., bk all other freevariables. Consider the IIt+ -formula ¢ (s)

Vr <sVbo<to(')....,bk <tk(a)STr,(Sr)

where to, ... , tk are the terms t provided by the i-regular proof.The formula 0 (1) holds as any initial sequent is trivially true, and the implica-

tion 4(s) -+ ¢(s + 1) holds as all inference rules are sound (and using Tarski'sconditions SS can prove it) and the terms to, ... , tk are correctly chosen, withproofs of the correctness provided by the proofs do, ... , dk, which are a part ofthe i-regular proof as well.

Hence 11+t-LIND implies 0(f) and, in particular, that Se is true and so Secannot be the empty sequent. Hence Si+t proves RCon(Tt') and the first part ofthe theorem follows from Corollary 7.2.4.

Take a diagonal formula A (a) such that the equivalence

A(a) - (b'w < a", w is not an i-regular proof of A(a))

is provable in T,, where A (a) is a #-free 11'5-formula and a formalizes the dyadicnumeral, and u is a constant to be specified later.

By the same argument TT proves that "A (a) is true," that is, A (a). Assume forthe sake of contradiction that A (a) is provable in T1 and we may take D to besome i-regular proof of A(a).

Let m be any number and m its dyadic numeral. As

a = m,A(a)-A(m)

can be proved substituting m for b in some fixed i-regular proof of

a = b, A(a) - A(b)

and hence (by cut with A(a))

a=m --> A (m)

and thus also

3x<m,x=m-*A(m)

has an i-regular proof of size O(Im I) = O(log m). Trivially all sequents

- 3x <m,x=m

have i-regular proofs of size O(log m) and consequently also all

--> A (m )

have i-regular proofs of size O(log m), say < v log(m), for all sufficiently largem (the constant v is independent of m, u in A and D). We may assume that wehave v < u. Then for m large enough there is an i-regular proof wm < 2v tog(m) <m° < m" of A(m ). But that implies -A (m ): a contradiction. Hence the formulaA(a) is not provable in

This proves that T' is not Vflb-conservative over T,1. To prove that, in fact,RCon(T,') is not provable in T,' one shows that its provability would imply that thediagonal formula A is provable. However, this requires slightly more estimates tothe lengths of the proofs as we do not have a priori all Lob's conditions sufficientfor the derivability of Godel's theorem (cf. Smoryrisky 1977). In particular, thecondition

Prf; (a, b) - 3x Prf, (x, [ Prf; [a, b]] )

is not generally valid in T1', where Prf, (a, b) formalizes that "a is an i-regularproofofb"

We shall skip these estimates as we have no further use for them. Q.E.D.

Recall that the theory Tk is defined as T2 but with the function symbolsWI (x), ... , 04- 1 (x) in the language (cf. Theorem 5.1.6) and with the additionalaxioms

Iw;+I(x)I = W;(IXI)

j = 1, ... , k - 2, in BASIC. The theories Sk are defined similarly.

Corollary 10.5.4. For i > I and k > 1 it holds that1. The theory Tk+I is not dll, -conservative over Tk

2. The theory Sk+2 is not dlIb-conservative over Sk+1

Proof. Let k > 1. Say that a triple w is a restricted Tk proof of a #-free strictEb-formula A (a) iff w is an i-regular proof of a sequent of the form

2 < Icl(k), lal(k) . laIlk) ... IaIlk) < ICI(k) -* A(a)

where l a l (k) appears j times and

j < IWl(k+l)

Symbol j a 1(k) denotes the k-times iterated function lx I on a.The number j grows slowly but exceeds any standard number and thus the

compactness argument implies that any #-free strict'-formula A(a) provable inTk actually has a restricted Tk -proof (note that the antecedent of the sequent stands

for "(okj)(a) exists").Given a and w the number c := Wk(a w) satisfies the antecedent of the

preceding sequent, and as TZ proves that every i-regular proof is also sound formulaA (a) has to be true. Hence Tk+I proves that every restricted Tk proof is sound.

We show that Tk itself does not prove that every restricted Tk proof is sound.Take a #-free IIb-formula A (a) obtained by a diagonalization such that

SS I- A(a) - (Vw < a", "w is not restricted Tk proof of A(a)")

By an argument similar to the end of the proof of the previous statement Tk Ff A (a),but 7k +I H A(a).

Part 2 of the corollary follows from the first part, as a simple consequence ofCorollary 7.2.4 is that Sk+2 is dE<+I -conservative over Q.E.D.Q.E.D.

We may use a similar provability notion to show that the theory U2 is not`d1Z,-conservative over IDo (and U3 over S2, etc.). First note, however, that wealready know that UZ is different from I Eo'b; for example, UZ can prove that every

Boolean formula can be evaluated, whereas IEo'bcannot prove that (cf. Section9.3, the discussion after Theorem 9.3.4).


Call a triple (n, t, d) a regularproofif it is i-regular for some i, and let RCon (T1 )denotes a #-free Vflb-formula (in the first order language L) formalizing that "thereis no regular proof of the empty sequent."

Theorem 10.5.5. The theory U2 is not `d lI1'-conservative over TI. In particular,U2 proves, formula RCon(T1) whereas Ti does not prove this, formula.

Proof. The idea of a proof of RCon(T1) in UZ is that one can find a A," b_definition

of truth for the #-free E,,-formulas. This is somewhat similar to the truth definitionfor the quantified propositional formulas (cf. Section 9.3, the proof of Theorem9.3.22), and we leave the details to the reader.

The rest is shown similarly to the argument in the proof of Theorem 10.5.3.Q.E.D.

We conclude this section by a technical lemma that we state for the completenessof the presentation but shall not prove (the proof would follow the idea of the proofof Theorem 9.2.5).

Lemma 10.5.6. Let i > 1. Then

SZ F Con(G1) RCon(Ti)

and

SZ P Con(G) =_ RCon(Ti)

10.6. Bibliographical and other remarksTheorems 10.1.2 and 10.1.3 are from Krajicek and Pudlak (1990a). Definition10.2.1, Lemma 10.2.2, and the first part of Theorem 10.2.4 are from Krajicek etal. (1991). Lemma 10.2.3 is from Buss (1993b) and Zambella (1994) (based onKadin 1988), and it implies the last part of Theorem 10.2.4 and Corollaries 10.2.5and 10.2.6.

The content of Sections 10.3 and 10.4 is from Krajicek (1993) and the pre-sentation of 10.4.2-10.4.12 closely follows that paper. Corollary 10.4.13 is fromKrajicek et al. (1991) and Krajicek (1993).

The content of Section 10.5 is from Krajicek and Takeuti (1992), with theexception ofTheorem 10.5.1, which is from Paris and Wilkie (1987b), and Theorem10.5.5, which is from Krajicek and Takeuti (1990). Paris and Wilkie (1987b) haveconsidered restricted provability notions.

There are several topics related to questions considered in Section 10.5, althoughthey do not seem to have some immediate implications for bounded arithmetic (or,more accurately, for the questions about bounded arithmetic we study here). But


these topics appear to me quite important so I shall at least mention three ofthem: the truth definition for bounded formulas, in Lessan (1978), Dimitracopou-los (1980), Paris and Dimitracopoulos (1988), and Takeuti (1988); interpretabilityin theories of arithmetic in Pudlak (1985) and Wilkie (1986); and restricted con-sistency statements in Pudlak (1986, 1987) and Paris and Wilkie (1987b).

11

Direct independence proofs

From Section 10.4 we know that all theories S2(R) and T2 (R) are distinct. In thischapter we examine specific, more direct independence proofs for theories SS (R),T2 (R), and S2 (R), and we strengthen Corollary 10.4.3.

11.1. Herbrandization of induction axiomsIn this section we shall examine the following idea for independence proofs: Takean induction axiom for a Eb(a)-formula. It has the complexity bE+j (a). Intro-duce a new function syml :)l to obtain a Herbrand_form of the axiom, as at thebeginning of Section 7.3. But this time we reduce the axiom to an existential for-mula. This allows us to use a simpler witnessing theorem (Theorem 7.2.3) thanthe original form of the axiom would require.

Consider first the simplest case (which will turn out to be the only one for whichthe idea works). Let a (x, y) be a binary predicate. Then the herbrandization of theinduction axiom for the formula A(a) := 3u < a, a(u, a)

A(0) n `dx < a, (A (x) --* A(x + 1)) -* A(a)

is the formula

a(0, 0) A (Vx, y < a, (a(x, y) n x < y)

- (a(f(x, y), y+ 1) A f(x, y) < Y+ 1)) 3u < a, a(u, a)

Denote this formula INDH(A(a)).

Theorem 11.1.1. The formula INDH(A(a)) is provable in TZ (a, f) but not inSZ (a, f ). Hence T21 (a, f) is not `d Eb (a, f)-conservative over SZ (a, f).

Proof. The antecedent of the formula INDH(A(a)) implies the antecedent ofthe induction axiom for A(x) := 3y < x, a(y, x); hence T2 (a, f) implies the

210

11.1 Herbrandization of induction axioms 211

formula 3u < a, a(u, a). This shows

TZ (a, f) I- INDH(A(a)).

Assume now that also S1(a, f) H INDH(A(a)). By (the relativization of)Theorem 7.2.3 there must be a polynomial time oracle machine M",f computingon an input a a witness for INDH(A(a)), for every a c w x w and f : w2 --> wof a polynomial growth. We shall show, however, that for any such machine thereare a, f, and a for which M', f (a) is not a witness to INDH(A(a)).

Let Ma, f be any machine and fix a sufficiently large (depending on the runningtime of Ma' f ). Start a computation of M'' f on the input a. When the machine willask queries a (x, y) or ask for values of f (x, y), we shall determine the answer tothe query by the following rules:

1. Assign a(0, 0) the value TRUE.2. To the query [a(x, y)?] assign TRUE (and answer YES) if for all t <

y, t x query a (t, y) was already assigned FALSE. Otherwise assign toa(x, y) FALSE (and answer NO).

3. If the machine requests the value [f (x, y) =?] consider three cases:

(a) a (x, y) is assigned TRUE. Then choose t < y + 1 s.t. a (t, y + 1) isalready assigned TRUE if such t exists; otherwise choose arbitraryt < y + 1 such that a (t, y + 1) was not assigned a truth value yet.Answer f (x, y) = t and assign to a(t, y + 1) the value TRUE.

(b) a(x, y) is assigned FALSE. Choose some t < y + 1 s.t. a(t, y + 1)is already assigned FALSE if such t exists; otherwise pick arbitraryt < y + 1 such that a(t, y + 1) has no truth value yet. Answerf (x, y) = t and assign to a(t, y + 1) the value FALSE.

(c) a (x, y) has no truth value yet. Assign to it some value following rule2 and then define f (x, y) according to (a) or (b).

Claim. a(x, y) cannot receive the truth value TRUE during the answers to thefirst y oracle queries.

The claim is verified by induction on y.Now assume that a is large enough that for the running time nk, la Ik < a. Then

no a(x, a) could receive the value TRUE.Assume M''f(a) claims to output a witness to INDH(A(a)). If it is a pair

(x, y), x < y < a such that a(x, y) is assigned TRUE, then either f (x, y) isdefined and < y + 1 and a (f (x, y), y + 1) is also assigned TRUE, or undefined(and then define f (x, y) following 3).

Take ao to be the relation consisting of those pairs (x, y) such that a(x, y)received value TRUE and fo any total function extending the partial function f.

Clearly then the output of M"o'fo(a) is not a witness to INDH(A(a)).Q.E.D.

212 Direct independence proofs

A natural thing is to try to apply the same idea to the induction axioms forformulas of the form

A(a) = 3x1 < aVY1 < a ... Qx1 < a, a( T, y, a)

(with i alternating quantifiers). A herbrandization INDH (A (a)) of the inductionaxiom for the formula A (a) has the form

a(0,5,0)A(Vb,x1... ,xm,tl,...,tn <a,a(x,Yflf,b)- &(zklgk,t,b+1))- 3u1,...,um Via, &(u,velht,a)

where(i) m = Ii /21 and n = Li /2J and & is the formula

xl _bA(y1 <b -+ (x2 <bA(...(a(x,Y,b))...)))

(ii) the function symbol f has the arguments b, XI , ... , xj, ti, ... , tj _ 1(iii) the function symbol gk has the arguments b, x1, ... , Xk, t1, ... , tk_I(iv) the function symbol he has the arguments a, u1, ... , ut.

Lemma 11.1.2. Let i >I and let A(a) be a E,(a) formula of the form

3x1 < aVyt <a ... a(x, y, a)

a an (i + 1)-arypredicate.Then the formula INDH(A(a)) defined earlier is provable in T2 (a, f, g, h ).

Proof. Consider the formula B(b)

B(b) = 3u1, ... , um < b, &(u, ve/he, b)

with he a function symbol depending on b, uI.... , ue, f = 1, ... , n.Assume the conjunction in the antecedent of the formula INDH(A(a)) is valid.

Then B(0) is valid as it follows f r o m a(0, ... , 0). Also the implication

B(b) B(b + 1)

is valid for b < a, which is seen as follows. Assume B(b), that is

3u1,...,um <b, &(u,velht,b)

which by the second conjunct of the antecedent of INDH(A(a)) implies

V u, t, &(zk/gk, t, b + 1)

where g k = gk(b, u1 , Uk, t1, ... , tk_I). Substitute for to the function he andexistentially quantify the terms gk (tt/ ht) on the places of zk's. This gives B(b + 1).

By Eb((X, h)-IND then B(a), which is the desired succedent of INDH(A(a)),follows. Q.E.D.

11.2 Weak pigeonhole principle 213

11.2. Weak pigeonhole principleWe have already considered the weak pigeonhole principle in Section 4.2 (Theorem4.2.4) and in Section 7.3 (Theorem 7.3.7) but in both cases in slightly differentversions, so now we define the principle again, as it will be used in this and furtherchapters.

Definition 11.2.1. PHP(R)n is the following bounded LPA(R) formula

(Vx < m3!y < nR(x, y) A t/y < n3!x < mR(x, y))

It says that the relation R is not a graph of a 1-1 function from m = {0, 1, . .

m - 1) onto n = {O,1,...,n-1}.

In the case when m > n + 1 the formula is called the weak pigeonhole principle;most prominent cases of the weak pigeonhole principle are m = 2n, n2, and 2".Later we shall also consider a stronger version of the principle saying that there isnot an injective map from m into n.

In the rest of the section we omit explicit reference to the relation R in theformula PHP(R)"'.

Lemma 11.2.2. The theory SS (R) proves the weak pigeonhole principle PHPnn.In other words

SZ(R) I- PHPIa1

Proof. Assume - for R c 2" x n, where n = ja 1. Define the formula

"j E R-1(i)" if 3x < 2n, bit(x, j) = 1 A R(x, i)

which is Ab (R) in SZ (R) as the witness x is unique, and consider a Ab (R)-formula

As the bounded comprehension axiom CA is available in SS(R) for Ab(R)-formulas (cf. Lemma 5.4.2), we have

3y < 2"b'i < n, (bit(y, i) = 1) - "i R-1(i)"

and the usual diagonal argument applies to y. Q.E.D.

Theorem 11.2.3. For all k

Si (R) + 3y, y =alal(k)

F- PHP(R)a2

where Ial(k) is k-times iterated function Ix 1.

Proof. We shall work in S2(R) first and later see how much induction we usedand how much we used the function #. Assume -PHPa2 with R c a2 x a, and let

F : a x a -+ a be the bijection whose graph is R, that is

F(x, y) = z if R(a x + y, z)

Consider a complete binary tree of height t. We identify its nodes with the 0-1sequences of length at most t and consequently with the numbers < 21.

Given i < a, we shall label the nodes of the tree by numbers < a by inductionon the height:

1. label f(i, E) = i, where E the empty sequence (i.e., E is the root)2. if f (i, x) = u and x0, x 1 are the left and the right sons of x, and F(v, w) _

u, then set

f(i, x0) =v and f(i, xl) = w

In this sense i < a codes a sequence of numbers < a of length 2`: the labels ofthe leaves ordered lexicographically. Note that f(i, x) is Ob(R)-definable from iand in the definition one needs to code sequences of numbers < a of the lengthO(lx1) = 0(t): That is, one needs that the number at exists.

On the other side let g : 2` -* a be any map definable by a -formula. Thenthere is i < a such that

g(x) = f(i, x)

for all x < 2`. This is proved by induction as follows. Consider the formula

Ag(r):=r<t-*b'xE{0,1}t-` 3i <aVyE{0,1)',f(i,y)=g(x-- y)

Then Ag(O) holds as for x E 10, 1}t we can take i := g(x). Assume that Ag(r)holds and let x E 10, 1}r-r-I be arbitrary. By the induction hypothesis there areio, i I < a such that for all y E {0, 1 }r

f (io, y) = g(x 0 -- y) and f (i l , y) = g(x ^ I - y)

Take i := F(io, i1); hence i < a. It is straightforward to verify that for all y E{0, l}r+I

f(i, Y) = g(x - Y)

The formula Ag(r) is fI so S (R) implies that Ag(t) holds, fort < 1al. Takei < a witnessing Ag(t) (the quantifier Vx is void). Then

`dy E (0, 1}, f(i, Y) = g(Y)

We shall summarize the preceding arguments in a claim.

Claim. The theory S3 (R) +PHpa2

proves that if at exists, t < la1, then thereis a Ob-definable map

f(i, x) : a x 2t -+ a

such that for any Lb-definable map

g(x) : 2' a

there is ig < a such that

Vx < 2r, e(ig, x) = g(x)

Now we are ready to obtain the contradiction from the existence of the map F.Take t := la I and consider the Ab-definable map h : 2` = a a defined by

h(i) =i,

0,

ife(i, i) 0 iif e(i, i) = i and i 00

1, if e(i, i) = i and i = 0

By the previous observation above there is ih < a such that

V y < a, e(ih, Y) = h(y)

and, in particular

e(ih, ih) = h(ih)

which is impossible by the definition.This shows that Sl (R) proves -'

PHPa2

assuming at exists for t = la t.Assume now that a' exists for t = hall. The claim implies that there is a Ab

definable map e(i, x) enumerating for i < a all -definable maps from 2' to a,that is, from la l to a. Hence using this map we can encode sequences of numbers< a of length < la I needed in the definition of the function e' enumerating alli. -definable functions from 21al to a, and the contradiction is obtained as before.

Generally, if t = J a I (k) then we use the claim k-times to obtain the contradiction.As all maps e(i, x), e'(i, x), ... constructed in the repeated use of the claim areOb (R), the formula Ag(r) used in its proof is n3 (R) and hence the whole argumentcan be formalized in Sj (R). Q.E.D.

Theorem 11.2.4 (Paris, Wilkie, and Woods 1988).

TZ (R) f- PHP(R)a2

and

T2 (R) I- PHP(R)a

Proof. The first part of the theorem follows from Theorem 11.2.3 and Corollary7.2.4. For the second part let G be an injection of 2a into a. If we iterate G k-timeswe get a map

which is Ab (G) -definable.

For k = Ja I this gives an injection

F: a2 -+a

and Theorem 11.2.3 applies. Q.E.D.

Theorem 11.2.5 (Krajicek 1992).

S22(R) If PHP(R)az

In fact, let f be a function such that a is eventually majorized by 2f(a)E, for anye>0.Then

S22(R) V PHP(R) f(a)

Proof. Assume for the sake of contradiction that

S22(R) I- PHP(R)a'

The formula PHP(R)a2 is EZ (R), and hence by Theorem 7.2.3 there is a polynomialtime machine MB with access to a Eb(R)-oracle B that on input a outputs one ofthe following elements:

(a) x < a2 such that `dy < a, -'R(x, y)(b) xt < x2 < a2 and y < a such that R(xi , y) A R(x2, y)(c) y < a such that Vx < a2, -R(x, y)(d) x <a 2 and y, < Y2 < a such that R(x, yl) A R(x, y2).

We shall contradict this hypothesis by proving the following claim.

Claim. Let MB be a polynomial time oracle Turing machine with access to aEb(R)-oracle B.

Then there is R C_ a2 x a such that MB on input a with the oracle B does notoutput a witness to PHP(R)a2.

To prove the claim assume that the oracle B(b) has the form

B(b) := 3w < t(b), NR(w, b)

where NR(w, b) is a Ab(R)-formula formalizing (cf. Section 6.1)"w is an accepting computation of machine NR with oracle R on input b"

where NR is a polynomial time oracle Turing machine.

Fix a large enough (we shall analyze the precise magnitude of a needed later).Start the computation of MB on a. As in the proof of Theorem 11.1.1 we shallanswer the oracle queries and construct the approximations Ro , R , ... to R andRo , RI , ... to (a2 x a) \ R. Put Ro := Ro := 0. Assume that after i queries[B(bi)?], ... , [B(b,)?] we have Ro C Rt c ... C R+ and Ro C Ri C ... CRr satisfying for j < i the conditions

1. R nR7 =0

2. Rt is a graph of a partial 1-1 function from a2 to a3. IRS U RJ I < j tM ON (I a 1)), where tM (n) and tN (n) are the time bounds

of the machines M and N.4. For any S 2 R+, S fl RI = 0 that is a graph of partial 1-1 function from

a2 to a, the oracle answer to the queries [BS(bj)?], j = 1, ... , i, are fixedas given in the construction.

Let [B(bi+1)?] be the (i + 1)-st query. Consider two cases:(i) There is S D Ri , S fl RI = 0 that is a graph of partial 1-1 function from

a2 to a, and w < t (bi+1) such that NS(w, b1+1) holds.(ii) otherwise.

In case (i) answer YES. Extend RI (resp. RI) to R+1 (resp. to R-1) by adding toit all pairs (x, y) E a2 x a such that [S(x, y)?] is a query in the computation w andis in w answered affirmatively (resp. negatively). As there are at most tM (tN (J a 1))such queries, the sets R+1, Ri+1 will satisfy condition 3.

In case (ii) answer NO and put R+1 := R; and RI+i := Rl.

Clearly R+ I, R[1

satisfy all four conditions 1-4.

Let R := R+, for io the last query, and run the machine MR on input a. Itcannot output xl < x2 < a2 and y < a such that

R(xl, y) A R(x2, y)

as R is a partial 1-1 function. Assume it outputs xo < a2, which should be awitness to

Vy<a, -R(x,y)

But by condition 3

IR+UR 1

so we can always find yo < a such that

R':=RU{(xo,yo)}

is a partial 1-1 map disjoint with Ro. This means, by condition 4, that MB(R')would on input a output the same xo < a2, but now

3y < a, R'(xo, y)

This shows that no output of such a machine can satisfy (a) or (b). Analogouslythe output cannot satisfy (c) or (d) either. Hence the machine does not witnessPHP(R)az.

For the second part of the theorem note that earlier we needed only that

tM(n) . tN(tM(n)) = n0(1) < f(a)

where n = lal, that is, that for any k < co we can choose a large enough that

talk < f(a)

That is

a <

Q.E.D.Q.E.D.

As the formula PHP(R)a2 is VEZ (R), last two theorems imply the next corollary.

Corollary 11.2.6. The theory T2 2(R) is not VE2 (R)-conservative over the theoryS2 (R).

We shall improve upon this corollary. Let us consider now the weak pigeonholeprinciple formalized with function symbols f for the bijection whose graph is therelation R and g for its inverse function. The formula WPHP(a, f, g) is

3x <2a f(x) >a V 3y<ag(y)>2av 3x < 2a g(f(x)) x v 3y < a f(g(y)) = y

Note that the formula WPHP(a, f, g) is 2 (f, g)-witnessed as the binary searchlooking for appropriate x or y asks only Ef(f, g)-questions. Recall Definition7.5.1.

Theorem 11.2.7. The formula WPHP(a, f, g) is not witnessed by a PLS-problem.In particular, WPHP(a, f, g) is not provable in S2 (f, g).

Proof. The second sentence follows from Theorem 7.5.3 and Corollary 7.2.4.For the first part assume that WPHP(a, f, g) is witnessed by a PLS-problem

P = Pf g: That is, the locally optimal solutions s from the solution space Fp (a) arewitnesses to WPHP(a, f, g). W.l.o.g. assume that the solution space is Fp(a) :=(0, 1)IaIk and let Cf g(a, s) and Nf g(a, s) be the polynomial-time algorithmswith the oracle for the functions f and g computing the cost and the neighborhoodfunction.

Assume for the sake of contradiction that the local optima do provide witnessesfor WPHP(a, f, g): Whenever s = (w, ti,...) E Fp(a) is locally optimal, thenw witnesses one of the four existential quantifiers in the formula WPHP(a, f, g).

We shall show that for any Cf g and Nf g there are f, g for which a localoptimum fails to provide a witness.

Fix a large enough. Call a computation of a polynomial time machine withthe oracle for f, g good if the oracle answers form a partial 1-1 map from < 2ato<a.

Let M E N be the minimal number such that there are s E Fp(a) and a goodcomputation of Cf g yielding Cf g(a, s) = m. Set sn to be one such s and fn to

be the oracle answers in some good computation of Cf g(a, sm) = m. Note thatfm is a partial 1-1 map and I f,,, I = log(a) 0(1).

Extend fm to h by adding to fm the oracle answers in some good computationofNfg(a, s,,,) consistent with fm. Hence IhI = log(a)°(1) and by our assumption

Nh (a, sm) = sm

as sm is even globally optimal.It follows that sm is locally optimal in every problem Pf g for all f D h and

g h(-1 ), and thus it has either the form

sm = (w, tl, ...)

in which case f (w) > a should hold for all f 2 h or g(w) > 2a for all g 2 hor the form

S. = ((x, Y), tl , ...)

in which case x 0 y and f (x) = f (y) should hold for all f h or g(x) = g(y)for all g D h").

In both cases it is, however, easy to find f Q h and g Q h t I not obeying therequired property, as 1hI = log(a)011W << a. Q.E.D.

From this theorem and Theorem 11.2.4 we get the following corollary.

Corollary 11.2.8. The theory T2 (R) is not VEb(R)-conservative over the theoryS2 (R).

Proof. The formula WPHP(a, f, g) is VEb(f, g) but in a language with the func-tion symbols f, g. If we translate the formula naively into a language with a rela-tion symbol R interpreting R as the graph of f we get only a VE2(R)-formula.However, a better translation into a relational language is derived by interpretingRC2ax(Gal+1)as

R(x, i) true if bit(f(x), i) = 1

and S C a x (Ia!)as

S(y, j) true if bit(g(y), j) = 1

The relation R is called the bit-graph of f.Then the formula translates into a VEb (R, S )-formula. Moreover, two relations

R and S can be easily ib-coded by one relation.It is a trivial corollary of Theorem 11.2.7 that this formula is not PLSR,s-

witnessed either, and hence the corollary follows. Q.E.D.

11.3. An independence criterionS. Riis (1993a) obtained a nice sufficient condition (11.3.2) for a combinatorialprinciple to be unprovable in SZ (R). His original proof is model-theoretic but wegive another proof of a slightly stronger statement.

T h e o r e m 11.3.1. Assume that (D = d x 3 TO (x, y ), 0 open, is a sentence in arelational language {R) consisting of one binary relation (the case of a generalrelational language is completely analogous). Assume that 4' has an infinite model.

Then there are no polynomial-time machine M and NP(R)-oracle A (u) (i.e., aEb(R) formula) such thatfor everyfinite structure ([0, a], R) given as an input toM the machine queries A evaluated in ([0, a], R) and eventually outputs x E [0, a]satisfying

([0,a],R) = dY-0(x,Y)

In other words, the formula

35E <aVY<a, -O(x,

is not witnessed by a polynomial time machine with an NP(R) oracle.

Proof. We shall follow the idea of the proof of Theorem 11.2.5. Let (K, RK) bean infinite model of fi.

Let M be a machine running in time < ne, n = ja 1; let A(u) have the form

A(u) = 3w(Iwl < ju1 A NR(u, w))

where NR (u, w) formalizes that"w is a computation of the polynomial-time machine N on the input uworking with the oracle R."

For any w, I w(< Iu1, there are Dw c [0, a] and Rw C Dw x D,,, such thatfor every R c [0, a]2 satisfying

RwCR A (D2 \Rw)nR=0

the truth value of NR(u, w) over the structure ([0, a], R) is fixed. We say thatD, Rw forces that truth value. Note that I Dw I < n`.

Start a computation of M on a, a sufficiently large. We claim that there is asequence of D, c [0, a] Ri C Di x Di and

Fi : (Di, Ri) i-+ (K, RK)

for i = 0, 1, . . ., satisfying1.

2. DicDi+i,RiCRi+,,(D?\Ri)nRi+i=03. Fj+j Di = Fi

11.3 An independence criterion 221

4. F is an embedding and

Yx,YE Di,Ri(x,y)=RK(F,(x),Fi(Y))

5. for all R c_ [0, a]2, Ri c_ R, and (D, \ Ri) n R = 0, the truth values off i r s t i queries A(u 1), .. , A(ui) in ([0, a], R) are fixed.

Set Do = Ro = FO = 0. Assume that we have (Di, Ri, Fi) and that [A(u)?] isthe (i + 1)-st query. Consider two cases.

(i) There is w, I w I < I u I` such that

(a) (D, Rw) forces A(u) be true

(b) R = R; U Rw satisfies (D? \ Ri) n R = 0

(c) there is an embedding

F: (Di UDw,Ri URw) H (K,RK)

such that F I Di = F,

(ii) otherwiseIn Case (i) set Di+1 := Di U Dw, Ri+1 := Ri U Rw and Fi+1 := F, answer

the query affirmatively, and resume the computation of the machine.In Case (ii) set

(Di+1, Ri+l, F+1) := (Di, Ri, Fi)

answer the query negatively, and resume the computation.Conditions 1-4 are satisfied trivially; 5 is true as we prefer, by Case (i), the

affirmative answer to the negative one.Let (D, R, F) = (Dio, Rio, Fio) be the last triple in this sequence, so io < n.

Let x be the output of M; w.l.o.g. we assume that x is disjoint from D. Let x' beany tuple from K disjoint with the range Rng(F) of F.

As (K, RK) ='tthere is a tuple y' such that

(M, RM) 0 (x', Y')

Again w.l.o.g. assume y' is disjoint with (x'}U Rng(F). Pick any y disjoint with(x ) U D (this is possible as I DI < ne+` << a) and extend R to R' such that theembedding

((x, Y, D), R') H ((x', Y', Rng(F)), RK)

extends F.This R' fulfills the hypothesis of condition 5 and so M outputs x when the

queries are evaluated over ([0, a], R'). But ([0, a], R') sox does notwitness 3 x d y-4 (x, y ). Q.E.D.

Note that the same proof shows a slightly stronger (a nonuniform version) state-ment with the Turing machine M replaced by a witnessing test tree satisfying the

conditions of Lemma 9.5.3. Examples of the weak pigeonhole principle formu-lated with a relation symbol and with function symbols show that the theorem isnot valid for formulas in language with function symbols (cf. the discussion beforeTheorem 11.2.7).

Theorem 11.3.2 (Riis 1993a). Let 4) = Vx3y 0(x, y) be a first order sentencein a language L' (not necessarily relational) disjoint with L and assume that ithas an infinite model (we do not require 0 to be open).

Then

S2 (L') V 3x < ab'y < a, _O<° (x, y)

where O<a denotes 0 with all quantifiers bounded by a.

Proof. Assume 0(x, y) has the form

Vul3vl ... Vuk3Vk, (x, y, u, i )

Let 4)S be the skolemization of d>

cs:=Vx,ul,...,uk, l'(x,ylG(a,x),u,vtlH,(a,x,ul,...,ui))

For every r-ary function symbol f (Z] -, Zr) in 4)s consider a new (r + 1)-arypredicate symbol Rf and a formula Def(Rf)

Vzl,.. ,zr3t,R(zl,...,zr,t)AVZ1....,zr,t1,t2, R(zl,...,zr,t1)AR(zl,...,zr,t2)±tl =t2

and replace in (DS the occurrences off using Rf without an increase of the quan-tifier complexity to get a formula 4)S. For example, replace a negative occurrenceof a formula like P(f(z)) in 4S by 3t, Rf(z, t) A P(t) and a positive one byb't, R f(z, t) -* P(t). Hence Vs is also universal.

Take a formula 4)rel to be the conjunction

VS A A Def(Rf)f

f ranging over the function symbols of 4)s. The formula 4),l is thus V3.If 4) has an infinite model, so does 4) and 4)rel By Theorem 11.3.1 the formula

(-'(Drel)<Q is not witnessed by a machine with an NP(L") oracle, where L" listsall relations of 4)rel. Hence by Theorem 7.2.3 ('4)rel) <Q is not provable in SS (L").

Take a model M of S2 (L") in which

(M, L") = (4)rel)<m

holds for some m E M. The relation symbols Rf of L" satisfy in M the conditionsDef(Rf). Hence Rf is a graph of a unique function; denote it f.

11.3 An independence criterion

Expand M to M' by these function symbols. Clearly then

M' ('S)<m

and hence also

223

M' ((D)<m

We have to show, however, that M' is still a model of S2 in the language L"extended by the function symbols f. This follows from a simple observation thatany Ei-formula in this extended language is equivalent in M' to the EZ (L")-formula because an atomic formula f (z) = t is equivalent to a Ob (L")-formula(same as previously). Q.E.D.

Corollary 11.3.3. None of the following principles is provable in S22(L1. the weak pigeonhole principle PHP(R)Q

2

2. the modular counting principle MODk (R, S )a formalizing (for fixed stan-dard k) that for each a at least one of the following properties must fail:

(a) R is a partition of a = {O..... a - 1) into k-element classes

(b) S is a partition of a \ {0} into k-element classes

3. no linear ordering R of [0, a] can be dense4. every linear ordering R of [0, a] has a least element

Proof. By Theorem 11.3.2 it is enough to construct infinite models for the nega-tions of the principles. This is trivial for all of them (interpreting a2 as a x a).

Q.E.D.

Before the next theorem recall that in S2 we can code sets of size < lalk (cf.Section 5.4).

Theorem 11.3.4. Let O (S) be a first order sentence in the language L' disjointwith L, and let L' contain a unary predicate S. Assume that there is an infinitestructure K for L' \ {S} such that for every finite S C K

(K, S) = ¢(S)

Then for every k

S22(L') V 2S C [0, a], I SO < JaIk A _.p <a (S )

Proof. We shall indicate how the proof of Theorem 11.3.1 can be extended andleave the extension of the proof of Theorem 11.3.2 to this case to the reader.

We want to show that no polynomial time machine M with access to an NP(L')oracle evaluated over an (L' \ {S})-structure with the universe [0, a] can, given as

input a, output S C [0, a] such that

(i) ISI < log(a)k(ii) ..<a(5) holds in [0, a].

Choose a large enough and construct (L' \ {S})-structure on [0, a] as in the proofof Theorem 11.3.1, utilizing the embeddings into the given infinite (L' \ {S})-structure K.

Assume that M outputs S, that is, a list of < nk elements of [0, a], IaI = n.Then O<a (S) cannot hold in [0, a] as the image F"S of S in the embedding intoK would be a finite set satisfying 0, which contradicts the assumption. Q.E.D.

Corollary 11.3.5. None of the following true principles is provable in S2 (L'):1. every binary tree R c [0, a] has a path of length < [log(a)]2. every vector space V C [0, a] over F2 has a basis of size < [log(a)]

Proof. The corollary follows from Theorem 11.3.4 as there is an infinite binarytree without a finite path and no infinite vector space over F2 has a finite basis.

Q.E.D.

The following statement is a bit surprising.

Theorem 11.3.6. Let ¢ (S) be a first order sentence in the language L' disjointwith L and assume that for some f

S2 (L') I- 3S C [0, a], I SI _< J a I(e) A _4<a (S)

Then, in fact, for every k:

3SC[0,a],ISI<IaI(k)A-O(S)

is true in every finite (L' \ {S})-structure M on [0, a], a < w, where J a I (k) is thelength function k-times iterated.

Proof. Assume that the conclusion is not valid: That is, for arbitrary large a < cothere is an (L' \ {S})-structure on [0, a] such that

As Ia I (k) eventually majorizes any constant, by compactness there is an infinite(L' \ {S})-structure K such that for any finite set S C K

(K,S) SO(S)

The theorem then follows as in the proof of Theorem 11.3.4. Q.E.D.

11.4 Lifting independence results 225

11.4. Lifting independence resultsIn this section we develop a method, utilizing Sipser's functions (cf. Section 3.1),to lift the independence results for the theories S2(a), TT(a), i = 1, 2 to higheri > 2. In this way we shall lift Theorem 11.2.5, a version of the weak pigeonholeprinciple, but similar arguments work for the principles from Section 11.3.

The idea is the following. Formula PHP(R)a2 is a VEZ (R)-formula provable inT2 (R) but not in S2 (R) (Theorems 11.2.4 and 11.2.5). For technical reasons weshall represent the numbers x < a2 by pairs of numbers xi , x2 < a, and we takethe weak pigeonhole principle WPHP(a, R (x, y, z)) in the form

_ (du1, u2 < al!w < aR(ul, u2, w) A Vw < al!ul < u2 < aR(ul, u2, w))

formalizing that 3u 1, u2 <a, a u I+ u2 = x A R(u 1, 142, y) is not a graph of abijection between a2 = a x a and a.

If we replace the predicate R(x1, x2, y) by a formula id (a, x, y)

i/4(a, x, y) := Vz1 < a3z2 < a ... Qzd_1 < a

Qrzd < (taloga)1/2, a(x1, x2, y,

where a is a (d + 3)-ary predicate (and £ a constant to be specified later), theformula PHP(R)az becomes a E2+d (a) -formula and just substituting (a, x, y)for R(x, y) in a TZ(R)-proof of PHP(R)a2 shows that it is provable in T22+d (a)(as Ab (R)-formulas transform into Ab+d(a)-formulas).

We want to argue that WPHP (d (a, x, y))a2 is not provable in S22+d (a). Assum-ing otherwise, Theorem 7.2.3 implies that WPHP(d (a, x, y))a is witnessed by aC]2+d (a)-function: by a polynomial time Turing machine M querying a E +d (a)-oracle. For a fixed sufficiently large input a we want to find a such that M failsto output the required witness. Using the partial restrictions from Section 10.4 wecollapse all possible E+d(a)-queries to something analogous to Ep(a)-queries,and at the same time the formula WPHP(fd(a, x, y))a2 will almost collapse tothe original formula WPHP(R)az. finally an argument analogous to the proof ofTheorem 11.2.5 should give the wanted contradiction.

Now we present these ideas formally. We shall work with Boolean circuits withatoms of the form

(mIo(m))2Pxl,xz,y,Z....,, for Y> zt , , z1 < m> z<

2

The set of all these variables is denoted B',t(m).

Recall from Definition 10.4.6 the space of random restrictions Rq withq

Bi,t(m) partitioned into classes of the form

{ Pxi,x2,y,zi,...,z;_i,t t <fm

2log(m) 1 /2

1

and the definition of the restricted circuit Cp, for P E R9+ or p E Rq R. Definition

10.4.8 of ES;`-circuits is used in the same form except that atoms of the circuitare from B', (m) now.

Hastad's two Lemmas 10.4.7 (converting a depth d Sipser function into depthd -1 one) and 10.4.9 (switching a Esm circuit into a E LI m circuit) hold literally,with the same choice of the probability q.

Definition 11.4.1. Fix i and C. A circuit-oracle is a junction C assigning to anynatural number u a Boolean circuit Ct, with variables from B',t (m), in = in (u)also a function of u.

For any a c w2+' the circuit oracle C defines a particular set C"

C":= {uIC'=1}

where C" is the evaluation of Cu under the evaluation of atoms

pxI,x2,y,z,,...,zi = 1 i f f (xl, x2, y, zl, ... , zi) E a

For 5, t, and m functions of U, a circuit oracle is called iffC, E ES(u),t(u)

t,m m (u)for all u.

Note that the propositional translation (...) (Definition 9.1.1) of bounded for-mulas into Boolean formulas provides a correspondence between a E ,P (a) -oracle

and a ESm-circuit-oracle, with the functions S =21og(m)0"), t = log(S), and

in = 2109(m)On)

Definition 11.4.2. Fix in and k, and let [m]k denote the set of k-tuples of elementsof 10.... M - 11.

A (k - u)-dimensional cylinder in [m]k is any set of the form

{(xI,...,xk) E [M]k I xi, =at,...,xi, =au}

for any fixed i I < < i,1 and a l, ... , a,, < M.

The notion of a cylinder is a technical one and its relevance comes from thenext theorem. Note that there are ( )mk-r many r-dimensional cylinders in [m]k.

Theorem 11.4.3. Let d > 1, let a be a (3 + d)-ary predicate symbol, and leta(a, x, y) be a l1

d(a) formula defined at the beginning of the section.

Let A(a, R) be any EA(R) formula with all quantifiers bounded by < a, inwhich the only free variable is a, in which there are no function symbols, and inwhich all variables in all occurrences of a are bounded.

Assume that M is a polynomial-time oracle machine with E +d-oracle B suchthat MB computes on input a some witness to the formula A(a, / (a, x, y)).

Then there is a constant c > 1 such that, for any sufficiently large m there isQ C N3 and a Ej'm -circuit oracle C1 with variables from B°,P d (M) satisfyingthe conditions

1. for all u: m(u) = in, S(u) = 21og(m)`, t(u) = log(m)c2. for any r = 1, 2, 3 and any r-dimensional cylinder U in [m]3

IU \ Q1 > mr-1/2

3. for every Ro C N3. for which R° fl Q = 0 the machine M with the oracleC1 computes on input in a witness to the formula B(m, R°[xl, x2, y]).

Proof. Choose in large enough so that Lemmas 10.4.7 and 10.4.9 hold with f >d + 4, and fix a := m. W.l.o.g. we forbid as an element into any a any (3 + d)-tuple (xl, x2, y, z 1 , ... , zd) f o r which one of x1, x2, y, Z 1 ,- . , zd_1 is >_ in orfor which zd > (2m log(m)/2)1/2. Identify the formula >/id(a,x1,x2, y) with adepth d circuit (* (a, x1, x2, y))(m) computing *d (a, xl, x2, y) from the atomsof

B be the Ed+l (a)-oracle. Since M runs in polynomial time any u, for whichB is queried is bounded by < 21og(m)c, some c < w. Moreover, all quantifiers inall such instances B(u) are bounded by numbers of the same magnitude. Forany u < 2105(m)' B(u) is computed by a circuit (B)(u) built from the atoms ofBd'e(in); each such (B) (u) is a ES,`S'` l,m circuit with m(u) = m, S(u) = 21og(m)e

and t (u) = log(S(u)). Hence we may think of B as a Ed+l m-circuit oracle ratherthan a Ed+1(a)-oracle.

Choose randomly partial restrictions pi from R m(qi) for i odd and fromR_ nt(q,) for i even, i = d, d - 1, ... , 1, where the probability qj is set to be

(2(2-d+i)log(m)\1/2qi :_

m

Denote by Cn the circuit ( ((CPd )A'-') ...)PI .

S.tl m-circuit collapses to a E,,'nt-circuit with proba-By Lemma 10.4.9 any Ed+

bility at least

q; >l<i<d

(as qd > > ql) and hence with probability at least

1 - S2 d (6gdt)` > 1 - 2-I log(m)"'


this happens for all S circuits (B) (u) of the circuit oracle (form large enough w.r.t.c and £).

To see what happens with the circuits *d (a, x1, x2, y) we apply Lemma 10.4.7.Restricting the circuit by random pd, ... , pi produces circuits

( 'a(a,xl,x2,Y))pd ,

((id(a,xi,x2,Y))pd)pd-,

..

which with the probabilities at least 1- (1 /3)m contain (a, x 1, x2, y),

*;-2 (a, xI, x2, y), .... There is m3 of such circuits; hence after the restrictionsPd, ... , P2 all these circuits contain

d-d+l (a, XI, x2, Y)

with the probability at least

> 1 - 3 (d - I)m-t+d+2

We have to analyze what happens with these circuits Vle -d+l (a, xi , x2, y) afterthe random restriction pi more closely to be able to establish condition 2 of thetheorem.

Assume U is an r-dimensional cylinder, r = 1, 2, 3. Analogously with theproof of Lemma 10.4.7 (part 2) we have that with the probability at least

1 -t+d-I+r6M-

there are at least

Mr((C -d)log(m))l/2 mr-(I/2)

M

many *'s assigned to the Mr circuits corresponding to (XI, x2, y) E U. But at thesame time, with probability at least

1 -f+d-I+r6m

none of these circuits collapses to 1. Summing up these two probabilities, with theprobability at least

1 m-t+d-l+r1 - -3

all Mr circuits corresponding to U collapse to either * (i.e., to atom pxi,x2, y) or to0, and at most m - mr-(I /2) of them collapse to 0.

There are 3m2 1-dimensional cylinders, 3m 2-dimensional ones, and 1 3-dimensional, so the preceding holds for all of them with probability at least

> 1 -M- t+d+l


By the earlier computations the Ed+t m-circuit oracle B collapses tocircuit oracle with probability at least

1 - 2 3log(m)c+1

ES'tl,m

and by the preceding with probability at least

> 1 - m-e+d+l - 3 (d - 1)m-e+d+l > 1 - 3m-f+d+l

all circuitsd (a, x1, x2, y) collapse to Px1,x2, y or 0 with the "cylinder property"2 of the theorem holding.

Hence for m large enough this probability is at least 1/2 and hence there is acombined restriction q satisfying all these conditions. Finally, for some such fixed77 put

Q {(x1,x2,Y) I (Vd(x,,x2,Y))'1 =0}

and define the E j It-circuit oracle C 1 by

C :_ ((B)(,,))"

Q.E.D.

Now we apply the previous general theorem to a particular formula.

Theorem 11.4.4. For all d > 0, the EZ+d(a) formula

WPHP (a, 1d (xl, x2, y))

where *d (x, , x2, y) (and £) were defined previously, is provable in the theoryTZ+d(a) but not in S2+d(a).

Proof. By Theorem 11.2.4 WPHP(a, R) is provable in T2 (R). If we replaceR in the T2 2(R)-proof by (x1, x2, y), the E2 (R)-induction formulas be-come E2+d(a) formulas, and hence the proof becomes a T +d(a)-proof ofWPHP(a, (x,, x2, Y))

Let d > 1 and assume for the sake of contradiction that SZ+d (a) does prove theformula. By Theorem 7.2.3 there must by a polynomial time oracle machine Mwith access to a E'+d (a) -oracle witnessing the formula WPHP(a, *d (xl , x2, y)).By Theorem 11.4.3 for any sufficiently large m there are

(t) Q C [m]3(ii) Es"-circuit oracle Cl

with the properties stated there, such that M with access to CR witnessesWPHP(m, R) f o r all R c [m]3, R n Q = 0.

Now construct a sequence Ro , Ro , Ri , Rj , ... C_ [m]3 such that

1. Ro := 0 and Ro := Q

2. R+ n RI = 03. R+ is a graph of a partial 1-1 function from m x m to in4. IR; U Rt I < IQ I + j log(m)c, where c is the constant from Theorem

11.4.3

5. For any S c [m]3, S D R+, and S n Rb = 0 that is a graph of a partial1-1 function from m x m tom, the oracle answers to first i queries [CL1.?],j = 1, ... , i, are fixed.

This sequence is constructed identically to the sequence in the proof of Theorem11.2.5, noting that if a E ', -circuit oracle is true, its truth can be forced by < tvalues of some atoms.

Let R+ := R t and R- := Rio be the last pair in the construction, so inparticular:

l(R+ U R-) \ QI < io log(m)c < log(m)2c

Hence the cylinder condition 2 of Theorem 11.4.3 implies that for every cylinder U

IU\(R+UR )I>m1'2-log(m)2c>1

Now assume that M with the oracle CR+ on the input in outputs u 1, u2 < mthat should witness the formula; that is, it should hold that

Vv < m, -R(u1, u2, v)

Take the cylinder U

U (XI, x2, Y) E [m]3 I xl = u 1, x2 = u2 }

By the previous discussion there is at least one v < m such that

(U I, u2, v)R-

Hence for

R := R+ U {(ui, U2, v)}

machine M with oracle CR on input m also outputs pair (u1, u2), but now

3v<m,(UI,U2,v)ER

Candidates of witnesses for the other quantifiers of the formula WPHP are treatedsimilarly. This concludes the proof. Q.E.D.

Corollary 11.4.5 (Buss and Krajicek 1994 ). For any i > 1, the theory T2 (a) isnot V E b (a) -conservative over SZ(a).

Corollary 11.4.5 reduces the case i > 2 via Theorem 11.4.3 to the base casei = 2. We could, however, also reduce it to the base case i = 1 and argue in

the base case in analogy with Theorem 11.1.1. Or to use a more elegant versionof the herbrandization of the induction axioms for a E b (a)-formula, the iterationprinciple from Buss and Krajidek (1994)

(0 < f (O) n Vx < a, x < f (x) --* f (x) < f (f (x))) --* 3x < a, f (x) > a

Theorem 11.4.3 has to be modified to be applicable to formulas with functionsymbols. The main difficulty is that after the series of the restrictions the bit-graphof a function is determined to such an extent that the values of the function aredetermined for most arguments as well. That leaves no room for a diagonalization.

This technical problem can be overcome; we refer the reader to Chiari andKrajidek (1994). Here we confine ourselves to stating without proof the mainapplication of the modified lifting method.

Theorem 11.4.6 (Chiari and Krajidek 1994 ). For any i > 2, the theory T2(a) isnot b Eb 1(a)-conservative over S? (a).

11.5. Bibliographical and other remarksSection 11.1 is based on Krajidek (1992); earlier Buss (1986) contained a separationof SZ (f) and TZ (f ). Paris et al. (1988) prove WPHP in IDo + Q1; the analysisof the proof gives Theorems 11.2.3 and 11.2.4. Theorem 11.2.5 is from Krajidek(1992); Corollary 11.2.6 was obtained independently by Pudlak (1992a). Theorem11.2.7 and Corollary 11.2.8 are from Chiari and Krajidek (1994).

The content of 11.3 is essentially that of Riis (1993 a), generalized and with newproofs (the original ones were model-theoretic and applied to a language withoutfunction symbols). One corollary of his approach that we do not derive in this waywas a construction of a model of TZ (R) in which SZ (R) does not hold.

The content of Section 11.4 is based on Buss and Krajidek (1994), which ex-tended a construction from Krajidek (1993). Theorem 11.4.6 is from Chiari andKrajidek (1994).

12

Bounds for constant-depth Fregesystems

Constant depth Frege systems, or equivalently constant-depth LK systems (re-call Definition 4.3.11), are the strongest systems for which some nontrivial lowerbounds are known.

We present these bounds in this chapter. We shall start with the known upperbounds, however, to get an idea of the strength of the systems.

12.1. Upper boundsTheorem 9.1.3 and Corollary 9.1.4 can be used as sources of the upper bounds forconstant-depth Frege proofs: To assure the existence of polynomial size constant-depth F-proofs of instances of a combinatorial principle (and generally of anytranslation (A)n of a bounded formula), one only needs to prove the principlein bounded arithmetic IDo(R), or perhaps in I1 0(R) augmented by an extrafunction. The results of Section 11.2 then imply the following theorems.

Theorem 12.1.1. There are constant-depth Frege proofs of size nO(!og(n)) of theweak pigeonhole principle PHPnn:

V / \ -Pij V V V (pi, Pi2.i)i<2n j<n it<i2<2n j<n

v V A -Pij v V V (Pij, A Pij2)j<n i<2n j,<J2<n i<2n

Proof. By Theorem 11.2.4, I Ao(R) proves

alai < b - PHP(R)2a

232

12.1 Upper bounds 233

Hence by Theorem 9.1.3 the propositional formula

alai < b - PHP(R)2a )a=n,b=n "I

has no oogw) size, constant-depth Frege proofs.But this formula is clearly (shortly) provably equivalent to the preceding propo-

sitional version. Q.E.D.

Analogously as before, using Theorem 11.2.3 in place of Theorem 11.2.4, weget the following theorem.

Theorem 12.1.2. For every k there are constant-depth Frege proofs of the weakpigeonhole principle PHP",2

V / \-Pij v V V(pi,jAPi21)i<n2 j<n i1<12<n2 j<n

v V A -Pi j v V V (Pij, A Pij2 )j<n i<n2 j,<12<n i<n2

ofsize

We shall mention a few more combinatorial principles: the Ramsey theoremand the Tournament principle. We shall confine the analysis to the simplest casesof these principles.

Ramsey theorem. Every graph with n vertices contains either a clique or anindependent set of size > (log(n)/2)

Tournament principle. In a digraph G = (V, E) with n vertices satisfying

VV

W C V

VvEV\W 3weW,(w,v)EE

of size O(log(n)).

Note that as the sets whose existence is claimed in these principles are of alogarithmic size, they are coded by a number of size n 0(log(")), and hence, in fact,these principles are expressible as Eb (R)-formulas with the parameter n.00

Theorem 12.1.3. The theory T2 5(R) proves the Ramsey theorem:

Ilog(n)3X C {O, ... , n - 1 }, jXj>

L2

L2 J

A ((Vx, y E X, R(x, y)) V (Vx, y E X, -R(x, y)))

234 Bounds for constant-depth Frege systems

Proof The idea of the proof is to reduce the Ramsey theorem to the weak pigeon-hole principle PHP(S )2a for S definable by a bounded formula from R.

Claim 1. For x, y < n define the function F(x, y) by

1, if R(x, y)F(x, y) - jl

0, if f-R(x, y)

Then there are x1 < x2 < < xs < n and Eo, ..., Es_ I E (0, 1 ), where s = In I,such that

Vi < j < s, F(x;, xj) = Ei

We shall prove the claim in a sequence of steps. First we define the relationEc{0,...,n-I}`-sx{0,1}`s by

E((xl,...,xj),(EI,...,Ek)) := XI =0/ x1 <X2 <... <xjnk+l=jAVU

j`dy<xu+13v<u,F(xv,y) Ev

The last conjunct means that xu+I is the minimal xu+I such that

Vv < u, F(xv, xu+I) = Ev

Note that E E IIb(R) as u, v are sharply bounded.W e want to show that f o r some x = (x I , ..., xs) and E = (E I , ..., Es_ I) the

relation E(x, E) holds. For the sake of the construction assume that there is nosuch .z and E. We want to define a function

H : n {0, 1}:5s-I (< n/2)

which will be 1-1 and hence will contradict the weak pigeonhole principle.Define H by1. H(0) := 02. for 0 < x < n put

H(x) := y

where y = (E I , ... , Ej) E {0, l}', j < s, is the lexicographically minimaly (with minimal j) such that

3x1 < ... x j < x, E((xi, .. , xj, x), y)

Note that H E B(Ed(E)): That is, H E 13(Ez(R)).The range of H is included in {0, 1}s-I (that we identify via the dyadic rep-

resentation of numbers with n/2) so it remains to prove that H is defined on thewhole n and that it is 1-1.

12.1 Upper bounds 235

Let H(x) = H(x') = y, where y = (El, ..., Ej) and let it hold forx= (x1,...,xj,x) and x' _ (x,,...,xj,x')

E(x,y)AE(x',y)

Then we show xi = x' by induction on is It is true for i = 1 (as xl = x, = 0)and if xl = x,,, ... , xi = x,' then necessarily xi+1 = x,+i as xi+I is determinedby xi, ..., xi and E1, ... , Ei. This shows that H is 1-1.

To show that H is defined everywhere on n let x < n. If x = 0 then H(x) isthe empty word, that is, 0. Let 0 < x < n. We define the sequence xl , X2.... andEl,E2,...by

1. xl = 0 , El := F(xj,x)2. xi+l is the minimal y > xi such that for j = 1, ... , i

F(xj,y)=Ej

3. Ei+1 = F(xi+l, x)Clearly for all i : E((xl, ... , xi+l ), (61, ..., Ei )); hence if xi+l = x it holds

H(x) = (El,...,Ei)

Assume that xi # x f o r i = 1, ... , s - 1. That is,

xl < < xs_1 < x

Take xs < x to be the minimal z < x such that

xl < ... < xs-1 < z

and

F(xi,z) =Ei, i = 1,...,s- 1But then E((x1, ..., xs), (El, ..., Es-1)) holds, thus contradicting our assumptionthat there are no such x, E.

This means that in the construction xi = x, some i < s. That is,

H(x)=(EI,...,Ei), somei <s

Hence H is defined everywhere on n.By Theorem 11.2.4, TZ (H) and so TZ (R) proves that such H cannot exist.

This proves the claim.T o prove the theorem take x j, ... , xs and c l, ... , Es _ I such that

E(x, E )

and let

A:_{x,IEi=0}, B:_{xiIEi=I}

Then either I A I > (s/2) or IBI > (s/2), and A is an independent set while B is aclique in the graph ({0, ... , n - 1), R). Q.E.D.

The propositional version of the Ramsey theorem is a formula built from (2)atoms pij, {i, j} c {0, ... , n - 1), i $ j and has the form RAM,:

V ( A p4niv VA _ Piu,iva<u<s us

where s = L1Og(n) J. The length ofRAMn is 0(n log(') log(n)2).

Corollary 12.1.4. The formula RAM, has constant-depth F -proofs of size

nO(log(n)) = IRAMn10(1)

Proof. From the previous theorem follows, via Corollary 9.1.4, the bound of theform niog(")0"". The bound nO(iog(n)) comes from the inspection of the proof ofthe Theorem 11.2.4 showing that in the T2 2(R) proof of PHP(R)a we use onlynumbers of size < 0(a#a) = aO(IOg(a)). Q.E.D.

Despite the fact that proofs of the Tournament principle are similar to proofs ofthe Ramsey theorem, it is open whether it is provable in bounded arithmetic. Thiswould be interesting to know in relation to the proof of Lemma 10.2.2.

12.2. Depth d versus depth d + IIn this section we shall prove the exponential lower bound to the size of constant-depth LK-proofs. We shall also show a superpolynomial speed-up of the depthd + 1 system over the depth d system, which demonstrates that the depth d systemdoes not polynomially simulate the depth d + 1 system (w.r.t. to the proofs of thedepth d formulas). Note that by Theorem 4.4.15 constant-depth LK-proofs andconstant-depth F-proofs are equally efficient, though the depth in LK (resp. in F)proofs might differ by a constant.

We shall begin with a slightly refined version of Theorem 12.1.2. In this sectionwe shall denote the propositional version of PHP by PHP(A, B) to stress explicitlythe domain A and the range B of the map; this is because these will not alwayshave the form of an initial interval of numbers.

For technical reasons we shall work with set - PHP(A, B) of the formulas1. V,EB pii, for each i E A2. ,pik v -pjk, for each i 0 j, i, j E A and k r= B3. ViEA pi_i, for each j E B4. - pil v -pik, for each j 0 k, j, k E B and i E A

12.2 Depth d versus depth d + 1 237

and we shall study the LK-refutations of -PHP(A, B), that is, the LK-proofs ofthe empty sequent from axioms of PHP(A, B).

Recall that we identify n with {0, ... , n - 1).

Lemma 12.2.1. Let 0 < E < 1, and let A and B be two sets ofsizes IAA = [n'+'51and I B I = n.

Then there is an LK-refutation of - PHP(A, B) of depth 1 and length at mostnO(log(n)). Moreover, every formula in the LK-proof is either a conjunction or a

disjunction of size at most O(log(n)).

Proof. By Theorem 11.2.4 the theory TZ (R) proves PHP(R)n2. Assume that S isa graph of bijection between n'+E and n. Identify n'+E with the set n x nE andn 1+2E with n l +E x nE, and for (i, j) E n l +E x nE define the function S' by

Sl (i, j) := S(S(i), j)

Then S' is a bijection between n 1+2E and n.Similarly define the maps Sk between n1+2AE and n. Fork := Ilog(E-')1, Sk

is a bijection between n2 and n, and it is clearly Ab (S)-definable.By the preceding TZ (Sk), and so T2 (S), proves PHP(S )n'+'

Moreover, in the proof of Theorem 11.2.4 only numbers of size at most no (log(e))

are used; in particular, all quantifiers in the T22(S)-proof are bounded by n 00og(n )>

By Theorem 9.1.3 and Corollary 9.1.4 there is a constant-depth LK-refutationof - PHP(n 1+E, n) of size n O(bog(n)). Moreover, the proof of that theorem gives adepth 3, treelike refutation in which every depth 3 formula is a disjunction, andthe number of formulas in any sequent in the refutation is bounded by constant cindependent of n.

Claim. Assume that a sequent of the form

V01. \...., V 0k, r V V *JeII ik JI JP

occurs in the refutation, where IT, A are cedents of depth < 2 formulas, and otherformulas are depth 3.

Then for any choice of ii, ... , ik the sequent

has a depth 2, treelike LK proof ni I .....ik from the axioms -, PHP(n'+E , n), of totalsize at most

< n0(log(n))

lI ,...,lk

The claim is readily established by induction on the number of inferences inthe original refutation above the sequent, utilizing the existence of the constant c.

Analogously with Lemma 4.6.4 any depth d + 1 treelike proof (of a depth dsequent) can be transformed with only a polynomial increase in size into a depthd proof of that sequent (not necessarily treelike).

By the claim applied to the last sequent of the refutation there is a treelike,depth 2 refutation of - PHP(n i+F, n) in which every depth 1 subformula has sizeO(log(n)). Hence there is a depth 1 refutation (not necessarily treelike) of sizenO(Iog(n)), in which every formula is of size at most O(log(n)): That is, it is aconjunction or a disjunction of size O(log(n)). Q.E.D.

As in Sections 10.4 and 11.4, a formula is Ed'`, d > 0, if it is equivalent to aformula i having the properties

1. dp(fl <d+1,2. if dp(t1i) = d + 1 then the outmost connective of * is V3. f has < S subformulas of depths 2, 3, ..., d + 14. all depth 1 subformulas are (conjunctions or disjunctions) of arity at

most t.A formula is 11 " if its negation is Ed'`. A formula is D

iif both and are

expressible as disjunctions of conjunctions of arity at most t, that is, if they areboth E i ' `.

Ford, n > 1 arbitrary and any atom pu(i E A, j E B) let Sv be the depth d

Si ser function built from the atoms ' I "P pijd,n

Si,j AV ... Qid<n pijil<n i2<n

where Q is A if d is odd. For d = 0, let Sf " be just pi j. Let Var(Si, ") denotethe set of the atoms of Sia".

Recall also that for a formula 4) and a restriction p, O denotes the restrictedformula.

We will make an essential use of the following technical proposition.

Lemma 12.2.2. Let c, d, n > 1 and I AI < n2, I BI < n. Let S < 2"" 3' andt = log(s).

Assume that *1 .... u are formulas from Ed'` u la'` with the atoms fromV := Ui j Var(Sfd'") such that

Y,Ifil<Si{0,1}U{pijIiEA, jEB}

satisfying the following conditions:

1. all formulas fp are AP i < u2. for all i E A, j E B, the formula (Std'")p is either 0 or pij3. for every Us, s < r, for at least I US I n-(1 /2) pairs (i, j) E US it holds that

(S j'")'° = Pij

Proof The proof of this lemma is almost identical to the combinatorial part ofthe proof of Theorem 11.4.3 where sets U. are in place of cylinders. It utilizesHastad's switching Lemma 10.4.9 analogously to the proof of Lemma 10.4.10.

Here is a brief sketch; the details are left to the reader.For each i E A, j E B, partition the set Var(S.d'") into nd-1 classes of

the form

It < n}

one for each i 1 , ... , id- 1 < n.For each i E A , j E B, the probability spaces of random restrictions R (q),(q) are defined for restrictions of atoms from Var(Std'"). Hence for different

i, j these restrictions are independent.The random restrictions that are applied to *1, are disjoint unions of pij

independently drawn from R' (q).The rest of the argument is similar to that of Theorem 11.4.3. Q.E.D.

Definition 12.2.3. The E-depth ofan LK-proofir of length S = 17r I is the minimald such that every formula in it is from Ed'r U fi 't, fort = log(S).

Theorem 12.2.4 (Krajidek 1994). Let d > 0, n > 1, 1 > E > 0, and let A, B betwo sets of size I A I = n 1 +E and I B I = n. For sufficiently large n the following twostatements are valid:

1. Every E-depth d, treelike refutation of the set -PHP(A, B)(ptj/Std'")must have size at least

2'1/5

2. There are E-depth d, sequencelike, and E-depth d + 1, treelike refutationsof the set-,PHP(A, B)(p;j/Std'") of size atmost no(1og("))

Proof. Fix d, n, c and A, B satisfying the hypothesis. We shall need the followingmodification of the pigeonhole principle.

Let

r : A exp(B)

be a function satisfying(a) I r (i) I > n 1/2, for all i E A(b) Ir(-1)(j)I > nl'+E for all j E B, where R(-1)(j) := {i E A I j E r(i)}.

Then we define modified PHP, - MPHP(A, B, r) to be the set of sequents

-PikV -pjk, f o r j E A, kEr(i)f1r(j)

fori E A, r(i)={ji,.. , jr}

-pij V -pik, for j 0 k E r(i), i E A

pi, j, ... , Pi.,. j, for j E B, r(-1)(j) = {ii, ... , is}

We shall prove the first part of the theorem now. Assume that 7r is an LK-refutation of - PHP(A, B) (Sidthat is treelike and has size S = I7r 1 < 2"

/3

Let *1, ... , " be all formulas occurring in 7r (hence >i<,, I'i I < S) and letU1, ... , Ur list all n i+E + n 1-dimensional cylinders in A x B, that is, sets of theform {i} x B or A x {j}.

By Lemma 12.2.2 there is a map p

p : Var(Si`ij'") - {0, 1, pij} all i, j

such that(i) all 'p are A', t = log(S)

(ii) (Sid'")P is 0 or pij

(iii) I{(i, j) E Us I (Sd"")P = Pij}I > n IUsIDefine a map r : A -* exp(B) by

r(i) := {j I (Sid'")'° = pij}

Note that the map r satisfies (a) and (b).Proof 7r P, with every *i replaced by is a refutation of -' PHP(A, B) ((Sid'")P )

This implies that there is a treelike LK-refutation 7r' of -, MPHP(A, B, r) suchthat:

1. I7r'I < S+n°(d)2. every formula in 7r' is A', t = log(S)

This is because all formulas from -PHP(A, B)((Sid'")P) follow from-MPHP(A, B, r) by a treelike LK-proof of size no(d) in which all formulasare A' (just note that every (Sid'")P is proved to be either 0 or pij, depending onwhether j r(i) or j c: r(i)).

To finish the proof of the first part of the theorem it suffices to show thatMPHP(A, B, r) cannot be refuted by an LK-proof satisfying 1 and 2.

Claim. For n sufficiently large, r : A -* exp(B) a map satisfying (a) and (b), and7r' a treelike LK-refutation of - MPHP(A, B, r) satisfying I and 2, it must holdthat

S > 2""4_(I/2)

For a binary tree T labeled by formulas or sequents (proof trees are examplesof such trees) and a the node, denote by T" the subtree with the root a and with

the label of a changed to 0, and denote by Ta tree T \ Ta, with label of a (now aleaf) changed to 1.

We shall use a simple combinatorial fact that in a binary tree T there is node asuch that

ITal<2ITI and ITal<2ITI3 3

where I T I denotes the number of nodes in the tree (cf. Lemma 4.3.10 or the claimin the proof of Lemma 9.3.2 taken from Spira theorem 3.1.15).

To prove the claim we shall construct a sequence of binary trees T (1), T (2), .. .obtained from the original LK-refutation 7r' satisfying the hypothesis of the claim,and a sequence (Ro , Ro ), (Rt , R ), ... of pairs of subsets of A x B satisfying

1. Ru nR, =02.

3. (i,j)ER=+ -+ jEr(i)Put T (O) := n'. Assume that we have T(O)..., T(i). To construct T(i + 1)

find in T (i) a sequent a = F - A such that

Now distinguish two cases:(i) there is S C A x B such that

(a) SD R, and s n Rt = 0(b) S is a graph of a 1-1 function from A to B

(C) (i, j) E S -)- j E r(i)

(d) the evaluation

Du; =1, if (i, j) E S

0 if (' ') 0 S

makes the sequent r - A true.

(ii) there is no such S.In case (i), as r A is equivalent to a E i ''-formula (the disjunction of the

negations of the formulas from i' and of the formula from A), pick R+1 C- S,R+ 1 D R+ and Rr+1 nS = 0, Ri-+I D Rt satisfying conditions 1-3 (in particular,

(R1 U R1+1) \ (R! U R!) I < t) forcing one of the disjuncts of size < t of theE S" -formula to be true. Also set

T(i + 1) := T(i)a

and label a by 1.

In case (ii) define R+1 := R! , R_I := R- and put

T(i + 1) := T(i)a

and label a by 0.We obviously have the following two properties:4. the nodes of T(i) are labeled by the sequents of 7r' or by 0, 15. IT(i)I < (3)' S

Let T (f) be the last tree in the sequence: That is, T (f) is one node. We have6. 2 <_ 1og3/2(S) < t log3/2(2) < 2t7. the node T (f) is labeled by 0 and was an initial sequent R - E of 7r'8. for every S C A x B satisfying (a)-(d) with i = £, the evaluation

Pij _-1

1,

0,

if(i,j)ESif (i, j) V S

makes n - E false.Condition 6 is obvious (from 5); 7 and 8 are proved by induction on i (the root

of every T(i) is labeled by 0 and any S satisfying (a)-(d) makes the sequent inthat node of jr' false).

It follows that 11 -p E must be a sequent of the form

Piji,...,Pijr

for some i E A, r(i) = { ji , ... , j,) or of the form

-) Pi 1 j , ... , Pi' j

for some j E B, r(-')(j)But

So for

IRe I

if i 0 dom(RQ) we can find j E B such that (i, j) ¢ R-. This shows (as f < 2t)that if

<n1/2

no Re , Re can force that all S considered make the first sequent false. Similarlyno S can force the second sequent to fail. Hence we get

t>ni/4.

1

2

12.3 Complete systems 243

and

S >

This proves the first part of the theorem.To prove part 2 note that substituting S.d'n for pig in the refutation guaranteed

by Lemma 12.2.1 yields a E-depth d refutation of -PHP(A, B)(Sid'y`) of sizen O(log(n)). Q.E.D.

Corollary 12.2.5. For any d > 0 there is a superpolynomial speed-up (m ver-sus exp(exp(Q (log 1/2 m)))) between the sequencelike and the treelike E-depth drefutations of sets of depth d sequents.

We conclude this section by a problem.

Open problem. Is there a constant c > 0 such that for every d > c there is asequence of sets (Td )n of depth < c sequents of the total length n'(I) satisfying

(i) any E-depth d, treelike refutation of Tdn

must have the size exp(n"(1)),(ii) Td

nhas more than polynomially shorter E-depth d + 1, treelike refutations?

12.3. Complete systemsThis section is devoted to the main technical notion in the lower bound proofsfor constant-depth Frege systems, the notion of complete systems. The completesystems generalize the truth tables or, from a different point of view, the disjunctivenormal forms or decision trees. We defer the discussion of these connections untilthe beginning of section 12.4.

We shall consider two combinatorial situations, related to the pigeonhole prin-ciple and to the modular counting principle.

Let n be a natural number and D and R two sets of cardinalities n + 1 and n,respectively. MPHP denotes the set of partial 1-1 maps from D into R. For a EMPHP, Ial denotes the number of the pairs forming a: That is, IaI = I dom(a)I =

I rng(a)ILet A be a set of the cardinality a n + 1. Apartial a-partition of A is a set a of

disjoint a-element subsets of A. Let MMODQ be the set of all partial a-partitionsof A. For U E M MOD Ia I denotes the number of the equivalence classes of a.

Further M denotes either MPHP or ,MMODp

Definition 12.3.1.(a) ForHcM,thenormilHllofHis

IIHII:=maxlalaEH

(b) The set S C_ M is a k-complete system iff the following conditions arefulfilled:

1. Va,0ES; a

2. VyEM; IYI<n - k --* ( 3 U yUCI E.M)

3. `daES; I a I <k.

(Note that S#0by2fora=0.)

Lemma 12.3.2. The following are examples of k-complete systems:1. Let io E D and let S C MPHP be

S := {(io, j) I j E R}

Then S is 1-complete.2. Let io E D, jo E R, and let S C MPHP be

S:={(io,Jo)}U{{(io,J),(i,jo)} Iio0i E D, jo0j E R}

Then S is 2-complete.3. Let X C_ A be a set with at most k elements and let SX be the set of all

partial a-partitions a of A such that

(a) X is contained in the support of a U a)

(b) every equivalence class of a intersects X

Then S is k-complete.

Definition 12.3.3.1. The set S C M refines the set H C M, H a S in symbols, wit holds for

allaES

or 3yEH,yca.2. For S, T C M, a common refinement of S and T, S x T in symbols, is

the set

SxT:=let Up EMIaES,19ET)3. A projection S(H) of H C M on S C M is the set

S(H) := {al 3y E H, y c a}.

Lemma 12.3.4. Let H a S a T for H, S, T C_ M and assume that S is k-completeand that I I T I I+ k< n.

Then

HiT

Proof By the k-completeness of S and by I I T I I + k< n

VPET3a'ES,a'U/3E.M

hence by S < T it holds that

V E T3a E S, a C f (*)

Assume now that y E H,,6 E T are such that y U ,B E M. By (*) there isa E S s.t. a C P. Thus Y U a E Nl too, and hence y' C_ a for some h' E H, byH < S. We proved y' C ,B as it was required. Q.E.D.

Lemma 12.3.5. Let S, T C M be such that S is k-complete, T is f-complete,IISII +f <n, IlTll +k <n, andk+f <n.

Then the following two conditions hold:1. S<SxT,T<SxT.2. S x T is (k + f)-complete.

Proof To prove the first part let a E S and a U (0 U y) E .M hold for someelement 0 U y E S x T. By k-completeness of S then a= ,B so a C_ ,B U y. Thisshows S< S x T. T< S x T follows identically.

To prove the second part assume 0 U y, /3' U Y' E M for 0 U y, 0' U y'two elements of S x T. Then either /3 ,B' in which case /3 U /B' 0 M by thecompleteness of S or y # y' (i.e., y U y' M by the completeness of T resp.). Inboth cases thus (/3 U y) U (#'U y') 0 M. This proves condition (i) of Definition12.3.1

To verify (ii) let lal + k + f < n. Then, as IISII < k and IITII < f, by thecompleteness of S, T there are,6 E S and Y E T for which a U (/B U y) E M.

Condition (iii) holds too as clearly I IS x T I I < IISII + IITII < k + f. Q.E.D.

Lemma 12.3.6. Let H, S, T C M, let S be k-complete and T be f-complete.Assume that IISII + f < n, IITI I + k < n and that H < S < T.

Then

T(S(H)) = T(H) and T(S) = T

and

S(H)=SET(H)=T

Proof. T(S(H)) C T (H) is obvious from the definition of a projection.To see T(H) C T(S(H)) let /3 E T(H) and y C ,B for some y E H. Then

from (*) in the proof of Lemma 12.3.4 a c_ 0 for some a E S. Hence y U a E Mand, as H < S, we have y' C_ a for some y' E H. Thus also y' C_ a C ,B, that is,,B E T(S(H)).

T (S) = T follows by taking H :_ {0}.

Finally, to establish equivalence first assume S(H) = S. By the first partT(S) = T and T(S(H)) = T(H), and so T(H) = T.

For the opposite implication assume T(H) = T, and take a E S. By the 2-completeness of T (and by la I + £ < I I SI I + e < n) a U ,B E M holds for some0 E T. By the assumption y c ,B holds for some y E H. Hence a U Y E M andH a S implies that y' C a some y' E H. Thus a E S(H). Q.E.D.

Lemma 12.3.7.1. Let S, Hi E M, i E I be arbitrary sets.

Then:

S(uHi) =US(Hi)/ I

2. Let S C_ M be a k-complete set and So, S1 C S its two disjoint subsets,and let T C M.Then it holds that

T(So)nT(Si)=0

3. Let S, T C 4 be two sets such that S< T, S is k-complete and I T I I +k < n.Let So C S be any set.Then

T(S \ So) = T \ T(So)

Proof. The first two parts follow straightforwardly from the definition of a pro-jection. In the third part Lemma 12.3.6 implies

T(S) = T

By parts 1 and 2

T(S \ So) = T \ T(So)

Q.E.D.

In the applications of the complete systems in the next section we shall movefrom a situation with n, D, R, MPHP (or n, A, MMOD. ) and some H, S, T.... CM to a situation where some information about a partial map (resp. a partialpartition) will be available. The next definition and lemma formalize this.

Definition 12.3.8. Let a, p E M. Define the restriction ap of a by p

ap.- a\p, ifaUpEMundefined, if a U p 0 .M

Further put Hp :_ {ap I a E H} for H C M, DP := D \ dom(p) andRP := R \ mg(p) in case of the PHP (and AP := A \ supp(p) in case of theMODQ),andnP:=n-IpI

Note that aP undefined is not the same as aP = 0. The support supp(p) is theset UeEP e. For P E MPHP this is dom(p) U rng(p).

Lemma 12.3.9. Let H, S, K C M and let p E M be arbitrary.Then the following three conditions hold:

1. if H a S then HP a SP,2. if S is k-complete and I p I +k < n then SP is also k-complete,3. if K = S(H) and H a S then KP = SP (HP).

Proof. For the first part let y E H, a E S satisfy yP Uap E M. Then Y Ua E .Mand thus y' c a for some y' E H. Clearly (y')P C aP.

For the second part assume that up U a2 E M for al, a2 E S. Then alsoal U a2 c .M and hence aI = a2, that is, also up = UP

Now let I y i + k < (n)' for some y E MP C M. Since IpI = n - (n)P we getI y U p I +k < n. By the k-completeness of S then for some a E S: (y U p) U a E M.But as y = yP clearly then y U ap E Mp.

The inequality I I SP I I < 1 1 5 1 1 < k is trivial.For the last part of the lemma let KP E KP, for some K E K. Then we have

K E S and y c K for some y E H, which implies KP E SP and also yP c KP, andthus KP E SP(HP).

Finally let Up E SP, yP E HP for which yP c_ aP. Then y U a E M andy' C a for some y' E H, as H a S. Therefore, a E S(H), which entails a E K,and so ap E KP. Q.E.D.

We will want an object p E M to have certain particular properties. It is difficultto construct suitable p explicitly, and instead we shall show that such p exists bya counting argument.

The following lemma is a crucial technical result, a statement analogous to theswitching Lemma 3.1.11 for this situation. See the discussion at the beginning ofSection 12.4 and also Lemma 15.2.2.

Lemma 12.3.10. Let H1, ... , HN C_ M = MPHP, such that for all i < N,I I Hi I I < t < s. Let w < n. Assume that

ws

(n + 1 - w)4st3s> N

Then there is p e A4, I p I = w, such that. for every i < N there is Si C MP suchthat

1. Si is 2s-complete (wnt. MP)2. HfiS,,all i <N.

In particular, for 0 < S < c < (1/7), w = n -n' and t = s = n8, the precedinginequality is satisfied for any N < 2"a. If w = n - nE and N = no(' then theinequality can be satisfied with t = s = O(1) sufficiently large.

Proof First we shall consider the case when N = 1. Denote H := H1. We shalldivide the proof into several steps.

1. Enumerate by h', h2, h3, ... the elements of H, where H C M, IIHII <t < s.

2. We shall define a game played by two players. They construct a sequenceof maps So c 31 c ..., So = 0, Si E M.Assume that during a play the sequence So c 31 C_ ... c St is alreadyconstructed.

(a) Player I plays he+1: the first h'e+1 in the enumeration h', h2, ... suchthat he+1 U Se E M. Hence its move is completely determined by Seand by the enumeration.

(b) Assume that he+1 = p;lji ... p;,j,. Player II chooses an arbitraryC-minimal Se+1 Se such that {i 1, ... , it) c dom(Se+1) and{J], ..., it) C rng(Se+1)

The play is finished if either player I cannot make a move or player IIconstructed Se+1 ? he+1

3. Define the set S by

S {Sk+lI

some So C 61 c ... c Sk+l is a sequence

constructed in a finished play}

Claim.

(a) S is I I SI I -complete.

(b) H < S.

Assume that 8 0 S' E S. Let Se S' be the first move of II that wasdifferent in the two plays. Clearly (as Se, SP are C_-minimal) Se U 8e .M,so6U3'VM.Take a E .M such that lal + I ISI I < n. Player II will follow the followingstrategy: On supp(er) answer according to a; otherwise play arbitrarily butconsistently with a. The inequality I a I + IISII < n implies that he canalways move. Clearly a U S E M for any output 8 of such a game.To see the second part of the claim notice that S is a refinement of H bydefinition when the play terminates.In general it is not true that I I SI I < 2s. We shall use map p E M and showsuch an inequality for H. By Lemma 12.3.9 the claim will remain validafter the restriction by p.

4. Let h1, h2, ..., hk+l be the elements of H played by player I in a playS o c 81 c ... c 3k+1. We shall call hi the ith critical map.Now we consider the game played with Hp instead of H, but we shall usethe same ordering on Hp induced by taking the first element of H amongall hi identified by restriction p.

5. Suppose that in a play we obtained the critical maps h°, ... , hk+l and80931 c ... c Sk+l The pairs of hp \ 8i_1 will be called critical pairs.

6. We wish to show that for some p E .M, IpI = w, in every play will occurless than s critical pairs. That would imply IISII < 2s.Assume, for the sake of contradiction, that for every such p there is a playin which at least s critical pairs occur.We assign a set of parameters for each such a play with > s critical pairs.Denote by

so:=0

s;:=8i\8i.i

8; = (dom(hi) x mg(hi)) n 8i = (dom(hp) x mg(hp)) n 81

Let

si = Ihp\Si_1 I for 1 < i < k

be the number of critical pairs in hp and put

Sk+1 =s-S1...-Sk

Also take di = I81 I and note that

18; I = 2si - di

The sets T1, ... , Tk+1 c { 1, ... , t) are defined as follows. Let i < k befixed and let

hi={ei,...,e11}

Then

hp \ Si !_= {ej}jET,

The map hi and the set Ti thus determine the critical pairs of he. We defineTk+1 in a similar way, but we take only the initial part of h k+1 \8k consisting

of the first sk+i pairs.7. Let yl, ... , yk be the partial one-to-one maps with the support contained

in { 1, . . . , t) defined by the following. For i < k fixed suppose

dom(hi) = Jai,..., at,) and al < . <ai;

mg(hi) = {b1, ... , bi } and b1 < ... < b,,

Then put

S _ I(ae, by,(e)) I f E dom(y;))

Hence h, and y; determine S* for all i < k.8. Let r be the union of p with the first s critical pairs. Hence T E M and

Irk = w + S.9. We also definemaps,Bl,...,Pkfrom(1,...,2(s, -d,)) into {1,...,n+

1 - w - s} such that (for i = 1, ... , k) Pi determines S' - 8*. Let

{a , ..., as. _d; } = (dom (S;) - dom (S, )) n dom (h;) ,

with a,, < < as. and

{bI,... , bs _d,} _ (mg(S,) - mg(S; )) n rng(hi)

with bl, < < bs,-d,Also let

D\dom(r) = {UI, ... , _w_s} and R\rng(r) = {vI, ... , vn_w_s}

Then

S;(aj)=v#;(j) forj=1,...,s,-d,and

uPI(s;-d,+j) for ] = I.... , s, - d;

10. The parameters r will be the pair (r, 7r0), where Jro is the tuple consistingof the following objects

T1..... Tk+I; YI..... Yk: A, - .. , Pk

11. The number of Tl , ... , Tk+I is bounded by

t< (111

...\Sk+1/ < \sl/

...\Sk+l/ C is

The number of yl, ... , yk is clearly at most

< (dl) . td.. ... (;:) . tkdk < t2s

The number of ,81, ... , Pk is also obviously bounded by the product

(n+l -w-s)2(si-d1).....(n+1 -w-s)2(sk-dk) < (n+l -w-s)2s

Hence we get an upper bound to the number of tuples Iro

<t3s,(n+1-w-S)2s

12. Take the parameters ir. We shall show that they, in fact, determine the mapp. This is done by the following process:

(a) Put rI := r; rl determines hl as the first hil consistent with it.Knowing h I and TI yields the set KI of the critical pairs in h 1. FromK1 and the maps yl and,Bl we reconstruct the first move S 1 of player II.

(b) Put r2 := 81 U (rI \ KI ). The map r2 determines h2 as the first hitconsistent with it. Hence, as earlier, with the help of T2 we get the setK2 of the critical pairs in h2 and further from y2, 02 also the map S2.

(c) Repeating the process k + 1-times allows us to reconstruct the set ofthe firsts critical pairs Ui<k+I Ki. But then

p=r\ U Kii<k+l

13. As it defines p uniquely, the number of such parameters 7r cannot besmaller than the number of possible p. Define

A:=I{pEMIIPI=w}I

and

B:=I{rE MI rl=w+s}Iand compute the number

A

which is equal to

`w +S)S J

(n-w)(n+l-w)sl . t3s(n + 1 - w - s)2SS / s

as (wSS) is the number of p contained in one r while (n sw)(n+S-w)s! is

the number of r extending a given p.By a simple calculation we get that this fraction is at least

ws

(n + 1 - w)4st3s

which is

>N>lby the hypothesis of the lemma. Hence the number of possible maps p isbigger than the number of possible parameters 7r. Thus there must be p forwhich the assumption that there is at least one play with > s critical pairsfails.

Now return to the case N > 1. Assuming that for each p there is at least onei < N such that there is a play on hp with > s critical pairs, we may reconstructp knowing i and the parameters it for such a play.

This shows that, in fact, the same argument works as long as

ws

(n + 1 - w)4st3sN> 1

Q.E.D.

We conclude this section by stating an analogous lemma for the case A4 =,MMOD°. Its proof follows identical lines.

Lemma 12.3.11. Let a > 2. Let H1 .., HN C M = MMODQ such that for alli <N, IIHi II <t <s. Let w < n. Assume that

ws >N

where c > 1 is a constant depending on a only.Then there is p E M, I p I = w, such that for every i < N there is Si C MP

such that1. Si is (a s)-complete (w.rt.MP).2. HipiSi,alli <N.

In particular, for 0 < 8 < e sufficiently small (depending on a) and w = n -nE andt = s = n3, the preceding inequality is satisfied for any N <

2' s.If w = n - nE

and N = n0(t) then the inequality can be satisfied with t = s = 0(1) sufficientlylarge.

12.4. k-evaluationsThis section introduces the crucial notion of k-evaluation. First, however, we shalldiscuss the notion of a complete system as promised at the beginning of the previ-ous section. We shall confine discussion to the situation with the Boolean variablesX1, ... , x and the notion of a complete system w.r.t. the set M"'' of all total truthassignments to the variables. That is a simpler situation than those considered inSection 12.3 but well explains the intuition behind the definitions and the state-ments of that section. See Section 15.2 also.

A complete system w.r.t..A4"v' is a set S of partial truth assignments tox t , ... , x such that

1. any two a # ,B from S are incompatible2. every total truth assignment a E .M"" contains some 0 E S.

We identify the elements a E S with the blocks

[a] := {fi E Mtrivi I a C ,B}

12.4 k-evaluations 253

of total assignments extending a. Then S is a complete system if the set of theblocks [a], a E S, is a partition of M"ivi

Let O(xi, ..., x,z) be a propositional formula. The truth table of 0 is simply amap assigning to every a E Mtr'vi the truth value 0(a). If S is a complete systemsuch that

0(-t)(0) = U [alaES0

and 0(-t)(1) = U [a]aES1

for some partition So U Si = S of S then the truth table for ¢ can be abbreviatedby a map from S into 10, 1): map a E S to 0 iff a E So.

Moreover, the formulas 0 and -0 are expressible in particular disjunctive nor-mal forms:

V aCx and -.P- V otcaES1 aESo

where a c X abbreviates the maximal conjunction of literals made true by a. Notethat a complete system is just a particular disjunctive normal form of the formula1, namely such a form in which the disjuncts are mutually incompatible.

Disjunctive normal forms for 0 and -0 with this property are obtained fromany decision tree for 0 (see Section 3.1), and conversely, a complete system Sallowing the expression of 0 and -0 as earlier also yields a decision tree for 0(see Lemma 3.1.13). The depth of that decision tree will be < 11S112. Hence it isof interest to have complete systems of small norm.

If 0 = x; then the complete system Sxi = {x, r+ 0, x, H 11 of the norm 1allows the expression of 0 and ,0 as previously. If we have such a system SOfor 41 then it obviously works for -0 too. Hence the only difficult case in theconstruction of a complete system So of small norm by induction on the depth of0 is the case when 0 _ Vi 41; . A system S allows an expression of 0 and -0 asearlier if the sets Qi(-')(0) and 4(-1)(1) are unions of some blocks determined byS. However, having systems S pi for ¢; allows an expression

(P(-1)(1) = U [a]

aEH

of 0 as a union of blocks, where

H=UiaESs,

and it has the norm II H11 = maxi II So, II small. The problem is that H is notnecessarily a subset of a complete system. But if S is a system refining H

VaES, (3p EH, Pca) V(VPEH, P.La)


then

V acx and -0-= V aCxaES1 aESO

where

S1={aESI3jf EH, 0Ca}and

So={aESIY,5EH, fiIa}A lemma stating that there is a partial truth assignment p such that there is a smallnorm S refining Hp thus replaces the switching Lemma 3.1.11 in this situation(see Lemma 15.2.2 and Lemmas 12.3.10 and 12.3.11).

Assume that we have for formulas 0 complete systems So allowing the expres-sion of 0 and -as earlier. This allows us to assign to 0 the Boolean algebraexp(SS) of the subsets of So with a value H,

HO :=(aESo I[a]c0")(1)

We may think of HO as of those a E S forcing 0 true (see Section 12.7). Henceis a tautology iff HO = So if all a E So force 0 true.

Now, in the particular definition of k-complete systems w.r.t..MPHP no elementa intuitively forces PHP true. Hence one expects to get an "evaluation" of formulasin which PHP will not be true. The notion of k-evaluation formalizes these ideas.

Definition 12.4.1. Let r be a set of formulas closed under subformulas. Theatoms of the formulas from r are pig if .M = MPHP and pX if M = MMODa.

A k-evaluation of r is a pair of mappings (H, S)

H : F --* exp(M) S : r -+ exp(M)

satisfying the conditions1. V4OEF, H.CS(pCM2. E r, S. is k-complete3. Ho = 0, Hl = {0}, So = S1 = {0}

4. For M = .MPHP:

Hpi; = {{(i, j)})

Spit = {{(i, j'), (i', j)}Ii' # i, j' # j} U {{(i, j)}}

For M = .Mmoou

HpX = {X}

SPX ={aEMIXC(Ua) A (VYEa, YnX#0)}5. `d-'pEF, Hv =S,\H. and S,=S,

12.4 k-evaluations 255

6.

V(OEF, <o=V<Oi---> A Hw=S (UHviiE/ iEt iEI

Note that by Lemma 12.3.2 the system Sp i (resp. SPX) from 4 is 2-complete(resp. a-complete).

In the sequel we denote by px the atoms of both forms pig and px to sim-plify the notation. In the former case X = {i, j}. The next lemma follows fromLemma 12.3.9 and from the observation that the set {0} is refined by any other set.

Lemma 12.4.2. Let P E .M and px be an atom. Define

1 ifX c p

(px)p = 0 if XU p 0 .M

px otherwise

For cp a formula define coP to be a formula obtained from cp by replacing allatoms px by (px)P (but performing no further simplifications). For r a set offormulas, put F'- := {cpP

I(p E F}.

Then for any set of formulas I', any p E M such that I AI + k < n, and anyk-evaluation (H, S) of r, the pair (HP, SP) is a k-evaluation of F.

. The following is a crucial theorem.

Theorem 12.4.3. Letd > 1 and0 < E besuf cientlysmall, andlet 0 < 6 < Ed-1.Let r be a set of formulas of depth d closed under the subformulas, and assumethat Fl <2's.

Then there e x i s t p E .M, I pI < n -nEd 1,

and k < O(ns) such that there existsa k-evaluation of F.

Proof. For d = 1 the formulas in r are atoms and constants, and we have theirevaluation by 2 and 3 of Definition 12.4.2. In this case p = 0.

Assume now that the lemma is true for d > 1 and let r be a set of formulas ofdepth d + 1 that is closed under the subformulas. Let F' be the depth < d formulasfrom F. Let 0 < Ed(= Ed+1-1) be given.

By the assumption we have p' E M, Ip'I < n - nEd-I, k < O(ns), and ak-evaluation (H', S') of (1-')P'.

Let in = n - lp'l, that is, m > nEd '. We shall extend p' by a suitable p"to form p. By Lemma 12.4.2 the restrictions of MPG and S' by p" will be < k-evaluations of ((I")P')P" again; thus we only need to find p" so that we can extendthis evaluation to the whole F.

Consider the case M = MPHP. For a negation it is obvious for any p". For adisjunction 0 = Vi ¢i apply Lemma 12.3.10 with n, D, R, and H replaced by

m, DP', RP', and IJ, H. , and taking t = s = ns, w = m - mE. The followinginequalities hold

ns = nEa I.(31Ed i) < m(s' I) andS

< EEd-1

By Lemma 12.3.10 then there exist p" E Mp and S C ,A4P'P" such that

U(H " < S and I p"I = m - mE

For such p" extend the evaluation (H', S') to cp by defining

Sv=S and H,, =S(U(H',)P" I

From the hypothesis that I T I < 2"s it follows that there are < 2' such disjunctionsand hence by Lemma 12.3.10 there is at least one p" with these properties satisfiedfor all such cp e F. This is enough as we also have

IPI=Ip'Up"I <n-m+m-mE=n-mE <n - nE

The case of M = MMODII is treated analogously using Lemma 12.3.11.Q.E.D.

Lemma 12.4.4. Let

R :Y1,..., Yt

Yo

be a Frege rule.Then there is a constant r satisfying the following condition: Whenever

Y1 (*I, ,*m), , yt(Y'1, , Yam)YOM, ,*m)

is an instance of the rule and (H, S) is a k-evaluation of all subformulas occurringin the instance with k < n/r, and if H0, = St, for i = 1, ... , t and 9i _Yi (* 1, , yam ). then also

He. = So.

Proof. Let R be given and take r to be greater than the number of subformulasin R.

Let (H, S) be a k = (n/r)-evaluation of the set r of formulas of the formy(*,, ..., Yam ), where y is a subformula of some y; , i = 0, ... , t. Suppose thatHo,=Se;forl<i<t.

Take T to be a common refinement of all Sy, y E F. Such system exists byLemma 12.3.5 and is (n (r - 1)/r)-complete. Hence I I S I I + I I T I I < n for everyyEF.

12.4 k-evaluations

Assume first that -y E F. Then

Hy=Sy\Hy

and so by Lemma 12.3.7

T(H y) = T \ T(Hy)

257

Now suppose that a, 0, a V ,B E F with a and 0 having the forms Vi E A t andVIEB 4j, respectively. Hence, using Lemma 12.3.7 again,

Havp=Savd (UH.ti)US"'VP (UH1).

iEA iE

From Lemmas 12.3.7 and 12.3.6 we obtain

T(Hav,6) = T CSav,B (UHHHiUT \Savf \"

UHkiiEA ) / iEB

= T (tUHk)UT (`UHt)

= T (S-( tUHHi))UT (S$(UHk)) =T(Ha)UT(HH)

Moreover, by Lemma 12.3.6

T(H0)=T(S0,)=T

f o r i = 1, ... , r, since He; = Se, and Se; is complete.The mapping

y - T(Hy)

is thus a mapping of r into the Boolean algebra of subsets of T mapping , on theoperation of the complement, v on the operation of the union and O (i = 1, ... , t)on the whole set T.

Because any Frege rule is sound, we must have

T (He,,) = T

Hence by Lemma 12.3.6 also

Heo = S00 (He0) = S60

This concludes the proof of the lemma. Q.E.D.

12.5. Lower bounds for the pigeonhole principleand for counting principles

In this section we obtain strong lower bounds to the size of constant-depth Fregeproofs of the pigeonhole principle formulas PHPn and the modular counting prin-ciples Count'. First we shall define the formulas as we used some of them earlierin various forms. Recall that [A]a is the set of a-element subsets of A.

Definition 12.5.1.1. For I DI = n + 1 and I R I = n the formula PHPn is built, from atoms quv,

uEDandvER:V V (q11IV A qu2u) V

V V(qnv, A quv2)

ui<u2<n+l v<n u<n+1 vl<v2<n

v V A -quv V V A -quvv<n u<n+l a<n+l u<n

2. For I A I = a n + 1 the formulas Count' are formed from atoms px,X E [A ]a:

Count' := V (pX A PY) V V A -pxX¢Y,Xf1Yoo i<n iEX

Note that PHPn formalizes that there is no bijection between D and R and it is aweaker statement than the often used statement that there is no injection of D into R(and thus lower bounds will be stronger statements). The latter formula expressesthat no a -partition of A can be total and it is a tautology as I A I - 1(mod a).

We shall obtain exponential lower bounds to constant-depth proofs of PHPnand Count'. We shall also study the dependence among various instances of theseprinciples.

It is easy to see that PHPn, (q) follows from Count 2M (4'x) by a constant-depthsize O(m2) Frege proof, where A = {0, ... , 2m} and Ox is for X = {u, v},u < m + 1, m + 1 < v < 2m + 1, and Ox is 0 otherwise. In the opposite directionAjtai (1990) showed that there are no polynomial size constant-depth Frege proofsof Count,2,(px) from any instances PHPn,(0,,,,) of the pigeonhole principle,formulas in atoms px (implicitly of bounded depth). We shall also give a proofof a lower bound to the constant-depth Frege proofs of formulas Count" from theinstances of formulas Counts for different primes a and b.

Lemma 12.5.2.1. Let M = M PHP and let (H, S) be a k-evaluation of the set ofsubformulas

of the formula PHPn and let k < (n /2) - 3.Then HPHP = 0 and hence

HPHP' 0 SPHP

12.5 Lower bounds for the pigeonhole principle and for counting principles 259

In particular if p E .M and k < ((n - Ip1)/2) - 3, then

H(PHPP)P S(PHPP)P

2. Let M = M MODa and let (H, S) be a k-evaluation of the set of subfor-mulas of the formula Counts and let k < (n/2) - 2a.Then HCountn, = 0 and hence

HCountn S Counta

In particular, if p E A4 and k < ((n - Ip1)/2) - 2a, then

H(Counta)P 0 S(Countn)P

Proof. We shall prove part 2 of the lemma; part 1 is completely analogous.Count' is a disjunction of the formulas of the form

Px A Pr

where X n Y¢ 0, X 0 Y, and

A\ -PxiEX

We shall show that for any such formula n : H,t = 0. This will be enough as

HCountn = SCounta (y H) = Scountn(0) = 0n

Consider first the formula

-(-Px V -pr)for X n Y 0, X 0 Y. By Definition 12.4.1

Hp X={aE.MIXcsupp(a)AVZEa,ZnX 0}\{X}

and

H pY={P EMIYcsupp(,B)Ac'ZE,B,ZnY 0}\{Y}

(recall supp(a) U a). For 2a-complete set T

T:=(aE.MIXUYcsupp(a)AVZEa,Zf(XUY) 0}

it holds that

T(H pXUH pY)=T

By Lemma 12.3.5 there is k + 2a < n/2-complete system W refining both T andS pXV-.pY. Hence Lemma 12.3.6 implies

W(H pXUH pY)=W

and then also

HPXV-pY = S

This gives

Now take the other formula

The set

HPXAPY = 0

n ViEXPX

T:={XE[A]aIiEX}is 1-complete and by Definition 12.4.1

HH = S,i(T)

By Lemma 12.3.6

hence

Sn (T) = Sn

H,i=0The more general statement about the restricted formula (Count n)P follows in

exactly the same way observing that for any rl of the preceding form if one of itsatoms gets the value by p then all its atoms get a value and (1)P - 0, that is,H(,?),, = 0. Q.E.D.

Now we are equipped to prove a strong lower bound.

Theorem 12.5.3 (Ajtai 1988, Beame et al. 1992). Assume that F is a Frege proofsystem and d a constant, and let n > 1.

Then in every depth d F -proof of the formula PHPn at least

2n

different formulas must occur.

In particular, each depth d F proof of PHPn must have size at least 2n`) andmust have at least c2

(2n()16))

) proof steps.

Proof. Let F be a Frege system and d a constant, let n > 1 be sufficiently large,and assume that jr = (91, ... , 9e) is a depth d F-proof of PHPn. Take F to be theset of the formulas occurring in 7r as the subformulas.

Assume for the sake of contradiction that

Iri < 2W

Pxv-PY (HPx U HPY) = SPxv,PY

forsome 8 < (1/5d-I).

Let f be a constant greater than the constant r assured by Lemma 12.4.4 forall rules R of F. Let 0 < E < 1/5 such that 8 < Ed-1. By Theorem 12.4.3 there isp E M = MPHP such that

I pl < n - nE

and k = 2ns-evaluation (H, S) of (F')p. By Lemma 12.4.4, as (nP/f) > k

H(,9,), = S(er ),

for all steps 6; in it. At the same time by Lemma 12.5.2, ask < (nP/2) - 3

H(PHP,),

This is a contradiction; hence

InI>2"sThe lower bound to the size follows from trivial IJr I > 1 r 1, and the lower bound

to the number of proof steps follows from Lemma 4.4.6. Q.E.D.

The following theorem follows completely analogously.

Theorem 12.5.4. Assume that F is a Frege proofsystem and d a constant, and leta > 2.

Then there is c > 0 such that for sufficiently large n, in every depth d F proofof the, formula Count' at least

different, formulas must occur. "IIn particular, each depth d F -proof of Count' must have size at least 2" and

must have at least Q (2nE ) proof steps.

Theorem 9.1.3 and Corollary 9.1.4 then imply the following independence resultfor bounded arithmetic.

Corollary 12.5.5. None ofthe o (R)-formulas PHP(R) or Count' (R) is provablein S2 (R). In fact, the, formulas are not provable in any Sti (R), k > 1.

We now want to investigate the mutual relation of formulas PHP" and Count'.First note a simple lemma.

Lemma 12.5.6. For a > 2, the theory

I L1o(R) + Vx Counta (Do(R))

proves the . formula PHP" (R). In particular, there are constants c, d > 1 suchthat any propositional instance PHP" of the pigeonhole principle has a depth dF prooffrom some instances Count" of size < n` (hence also m < n`).

Proof. Assume PHP (R) and define A:= {0, ... , an) and

{xl < < xa} E S - 3y < n + 1,

xl <n A A xi = x,_l + (i - 1)n A xa = (a - 1)n + y A R(y, xl)a>i>2

This clearly defines an a-partition of A.The upper bounds to F-proofs follow from Theorem 9.1.3. Q.E.D.

Note that we used in an essential way that the pigeonhole principle speaksabout bijections between n + 1 and n rather than just about an injection of n + 1into n. For the latter version it is an open problem whether it can be proved fromCountn(io(R)).

The principle Countn is, in fact stronger than PHP, in the sense of the followingtheorem.

Theorem 12.5.7. Let a > 2 and let d be fixed. Then there is E > 0 such that forsufficiently large n in any depth d Frege proof of Countn (pX) from the instancesof the pigeonhole principle PHPm at least

2nEd

distinct formulas must occur.d

E

In particular, the size of all such proofs has to be at least2'

and each suchd

proof must have at least 0 (2" ) steps.

Proof. Assume for the sake of contradiction that there is a depth d F-proof 7r =(01, ... , 0r) of Countn from some instances of PHP. Clearly we may assumew.l.o.g. (possibly increasing d by a constant and the proof polynomially) that onlyone instance

PHPm (1kuv)

of PHP is used as an axiom in r (use the definition by cases to combine severalinstances into one). The formulas iuv are built from the atoms pX of Countn.

Let F be the set of all subformulas occurring in ir; in particular it containsformulas of the form

V iuv and - V *uv and

v<m u<m+l*uv, A uv2 and Vu,, A Iku2v

As in the proof of Theorems 12.5.3 and 12.5.4 let p E .MMOD°, P 1 < n -nEd-'

be such that there is a k < an3-evaluation (H, S) of rp (constants E, S are thesame as in Theorem 12.5.4).

Claim 1. For = PHPm (*uv)

HH=SS

To prove the claim assume Hg SS . We may assume then that, in fact, Ht = 0as otherwise we may restrict all formulas in (1')P further by some p' E St \ Hthat would collapse (HI)P to 0.

We first note two claims:

Claim 2. Assume that Hg = 0, from Claim 1. Then for every u < m + I andevery v < m the sets

U andv< in

U H,`Hvn+l

are k-complete.

Claim 3. We may assume w l.o.g. that all 2m + 1 complete systems from Claim 2have the same cardinality, say M.

Assuming Claims 2 and 3, Claim 1 follows as we may then count the sum

U (m + 1) Mu<rn+1 v< n,

also as

E I U H,1,, =mMv<ni a<nr+l

which is a contradiction.So it remains to demonstrate Claims 2 and 3. Denote

S,, U H,1 ,,v<in

and

St:= U Hu<In+1

As H = 0 we have for v I : v?

nVI = 0

which implies

Va e V,6 e H,1,,,.,, a U.$ MOn the other hand, HH = 0 also implies

that is,

HvtG, = Sv, ,,, (U\ 1'

Hence

VaEM,lal <n-k-+3#EUH*,,,,, aUf EMU

This shows that S,, is k-complete. Similarly for S°, and Claim 2 follows.Claim 3 is demonstrated by a combinatorial construction of systems refining the

systems from Claim 2 (using the idea behind Lemma 3.1.13 explained in Section3.1). These refined systems resemble decision trees with all paths of equal length.Precisely, we say that a complete system is of the form of a depth i decision treeif (with A =a land [X]1' = {Y C XIYI= k})

1. for i = 1, S has the form

S={{Y}IxEY, YE[A]'I

for some x E A2. for i > 1, there is x E A such that for all Y E [A \ {x}](°-1) the set

{a\({x}UY) E SI {x}UY Ea}

is a complete system over A \ ({x} U Y) of the form of a depth (i - 1)decision tree.

Note a related example in 3 of Lemma 12.3.2.Clearly all complete systems of the form of a depth i decision tree have the

same cardinality.Claim 1 implies with Lemma 12.4.4 that

H0,=S,i=1, r

which contradicts Lemma 12.5.2. This proves the theorem. Q.E.D.

The next corollary follows from the previous theorem by Corollary 9.1.4.

Corollary 12.5.8. Let k > 1 and a > 2 be arbitrary. Then the formula Countn (R)is not provable in the theory

Sk(R) + Vx PHPx (A o (R))

It remains to understand the relation of Countn and Countm principles fordifferent a, b > 2. First a simple result in the spirit of Lemma 12.5.6. For itwe generalize Countn to a conjunction of formulas saying that none of the setsa n + k, k = 1, ... , a - 1, can be partitioned into a-element blocks. Denote thisgeneralized principle gCountn.

Lemma 12.5.9. Let a, b > 2 and assume that b divides a. Then the theory

IDo(R) + Vx gCounta(Do(R))

proves the formula Countn (R).

In particular there are constants c, d such that for every n there is a depth dF -proof of size < nc of the formula Countn from some instances of the formulasg Count..

P r o o f. Work in IDo(R) and assume that the R is a b-partition of the set B, B I =bn + 1. Let k :_ (a/b), put A := an + k, and let the partition S consist of theclasses X E [A]° such that

3YER,X= U {y+(bn+l)iIyEY}0<i<k

Then S is an a-partition of A, but as 1 < k < a, I Al # O(mod a); that is, g Countnis violated. Q.E.D.

The case when b does not divide a is also understood. In the following weconfine discussion to the case when a, b are two distinct primes.

Theorem 12.5.10. Assume p, q > 2 are two different primes. Then for every depthd and every c > 1 it holds that for sufficiently large n, in every depth d F -proofof the formula Countn from the instances of the Countqm formula at least

> n`

different formulas must occur.In particular, the theory

I0o(R) + Vx Countq (Do (R))

does not prove the principle Countn(R).

Proof. Let M := MMODp and let .7r be a depth d F- proof of Countn from oneinstance

Countq (i*/X)

X E [A ]q, Al C= l(modq) (as in the proof of Theorem 12.5.7 we assume w.l.o.g.that only one instance of Count. is used in sr).

Take I' to be the set of all formulas occurring in it plus all formulas of the form(i E A)

V Vx and V V'xiEX iEA 1EX

By the proof of Theorem 12.4.3 from Lemma 12.3.11 there are p e M, Ipl <n -n'd-1,

and a k-evaluation (H, S) of (r)p where k is a constant depending onc but independent of n (this is because if I T 1 < nC then Lemma 12.3.11 allows kto be a large enough constant).

As in the proof of Theorem 12.5.7 it is then sufficient to establish the followingclaim.

Claim 1. For = Counts, (tfx) it holds that

Ht = St

In the proof of the claim we follow the strategy of the proof of Claim 1 in theproof of Theorem 12.5.7, but in this case it is complicated.

Claim 2. Assume Ht = 0. Then for every i E A the set UiE x H*X is a k-completesystem.

Claim 2 is established similarly to Claim 2 in the proof of Theorem 12.5.7.We defer the proof of Claim 1 from Claim 2 after Theorem 12.6.2. Q.E.D.

12.6. Systems with counting gatesThe formula Count' (R) formalizes a counting principle but it is natural to consideralso a situation when the language allows direct counting. This was, in the contextof bounded arithmetic, studied in Paris and Wilkie (1985). They extended languageLpA by a quantifier of the form Qax < t with the meaning

Qax < to (x) is true if I{x < t 10(x) is true}I = 0(moda)

Denote by Qa Do the set of the bounded formulas formed by also using the Qa-quantifier. It is easy then to extend IDo to the theory I Qa Ao by adding a fewrules for handling the new quantifier (and allowing all new bounded formulas inthe induction scheme). We shall instead consider in greater detail the correspondingextension of a Frege system by new connectives.

Definition 12.6.1. Let a > 2 be fixed and i =0,...,a- 1. MODa,i is a propo-sitional connective of unbounded arity such that

MODa,i (0i .... , Ik) i s true i f f I { j 10j true} I - i (mod a)

MODa-axioms are the, following axiom schemes(a)

(b)

(c)

MODa,o(0)

MODa,i (0), for i = 1, ... , a - 1

MODa,i (I', o) = [(MODa,i (I')) n -O) V (MODa,i-l(I') A 4>)]

,for i = 0, ... , a - 1, where i - 1 means i - 1 modulo a, and where rstands for a sequence (possibly empty) of formulas.

12.6 Systems with counting gates 267

The system F(MODa) or LK(MODa) is the system F (resp. LK) whose languageis extended by the connectives MODa,j, i = 0, ... , a - 1, and that is augmentedby the preceding axioms.

The following theorem is straightforward, but, in fact, it is the only informationabout the constant-depth F(MODa)-proof systems. In particular, it is an openproblem to apply the methods leading to the results about constant-depth circuitswith MODp gates (see Theorem 3.1.14) to the lower bounds for constant-depthF(MODp)-proofs.

Theorem 12.6.2. Fora > 2, theformulasCounta have polynomial size, constant-depth F(MODa) proofs.

Proof. We shall denote by the symbol a F-,, P the fact that there are constant-depthsize n°(1) F(MODa)-proofs of ,B from a. Thus we have

1.

F-* MODa,1((1)iEA)

2.

3.

4.

asl - IAI(moda)

-'Counts F-. MODa,1((pX)jEX)

for every i E A as - Count a implies that i is in exactly one X

F-* MODa,o((PX)IEX)

any fixed X as IXI = 0 (mod a)

-Counts F-* MODa,1((pX)iEA,iEX)

5.

from 1 and 2.

F-* MODa,o((PX)icA,iEX)

from 3.6. From the last two steps we get

-Count a F-* MODa, t (0)

which contradicts an axiom.

Q.E.D.

The proof does not not really use propositional reasoning but rather only rudi-mentary counting modulo a. This suggests formulation of the counting principles


in the language of rings and consideration of an equational logic as a propositionalproof system.

Fix a > 2 and n, and A := a n + 1. Consider the equations

T vX = 1iEX

(i)

(X,Y)

for each i E A and each X, Y E [A]a such that X 1 Y, which will be shorthandfor XflY 0 A X Y.

The equations imply, in any field, that each vX is 0 or 1. A particular solutiondetermines the set

a := { X E [A ]a I vX = 1}

and the equations force that a is a total a-partition of A, which is impossible.Hence the system has no solution in any field. In fact, the system consisting ofequations (i) is not solvable in rings of characteristic a as the sum of the left sidesis 0 while the sum of the right sides is 1.

Now we resume the proof of Claim 1 from the proof of Theorem 12.5.10.We continue using the notation from that proof.

For p and n take the system

T u E = 1 (e)eEE

(E, F)

one equation (e) for each e E B = p n + 1 and one for each E, F E [B]P, E 1 F.As earlier, the system expresses that {E I uE = 1) is a total p-partition of B and iscontradictory in every field. We shall reach the contradiction, however, by makinguse of the alleged proof 7r of Count, from Countm ( X)

Let r(u) = 0 be an equation following from the (e) and (E, F) equations ina particular finite field Fq. As the system is contradictory, every equation is itsconsequence, but for some such equations we will get a nontrivial information.Denote by we the polynomial

we:=EuE-1eEE

and by WE, F the polynomial

Hence the system (e)'s and (E, F)'s is equivalent to the system

we=0 and WE,F=O

12.6 Systems with counting gates 269

As it is contradictory, the polynomials generate in the ring Fq [ u ] the trivial ideal,and hence for every r ( u) there are polynomials ce and CE, F such that

Y, Ce'We + ECE,F - WE,F ° r(u)e E,F

is valid in Fq [ u ] (this is a simple consequence of Hilbert's Nullstellensatz but hasa simple direct proof too). In general we are able to say nothing useful about thepolynomials ce and CE, F, but for some polynomials r ( u) there is extra information.

Lemma 12.6.3. For a E M \ {0} define the monomials a by

a=uEi ... UE

where a = { El, Et } and define 0Let S C M be a k-complete system, qk2 < n, and rs the polynomial

rs(u):=Ea -1aES

Then rs can be expressed in the ring Fq [ u ] as a linear combination

rs=Tce-we+TCE,F'WE,Fe E,F

in which the degree of all coefficients Ce, CE,F is at most < qk2.

Proof. The lemma is proved by induction on II SII Pick a E S and let Sa be thekq -complete system

1{Y EMI UaCUy A YEEy,EflUa#0}

Any ,B E S \ {a} is incompatible with a. So for every y E S, either IIBY I < k - 1or /3Y is undefined. Hence S1

S1:= U(y)xSYYES.

is a kq + k - 1-complete system. Each SY is k-complete (Lemma 12.3.9) andIS Y I < k - 1; hence we may repeat the same process with each SY separately.In i steps this produces a (k + (k - 1) + + (k - i + 1))q + (k - i)-completesystem Si. That is, a Sk is () q < qk2- complete system.

Call a complete system S t-supported if there is Z C B of size t such that

S={aIZCUa A VEEot,EnZ 0}

Define S to be i-good by induction on i1. S is 0-good if it is t-supported, some t

2. S is (i + 1)-good if there is a supported complete system T such that Srefines T and such that for every a E T system Sa is i-good

Note that Sa is qk-supported and that Sk is k-good.To simplify the notation we say that the degree of a linear combination L of we's

and WE, F's is the maximum degree of its coefficients, and a "linear combination"automatically means a "linear combination of we's and WE, F's."

The lemma follows from three claims.

Claim 1. If T is t-supported then there is a linear combination LT of degree< II T II such that

E&=LT+1aET

Claim 2. If T is i-good then there is a linear combination LT of degree < II T IIsuch that

J& = LT+1aET

Claim 3. If T refines system U and T is i-good then there is a linear combinationL U of degree < II T II such that

T& = LU+1aEU

The first claim is readily established by induction on t. To see the second claimwrite

aET aETp (PETO

where T refines To and To is supported, and each Ta is (i - 1)-good. By theinduction hypothesis there are linear combinations LTa such that

E =LTa+1SETa'

That is,

61 &LTa +aET aETp lJ aETp

and by Claim 1 the last sum is expressible as LTo + 1, LTo a linear combination.Clearly the total degree is < it T II

The third claim is computed similarly:

1:4_1: , =E&I , +j&PET aEU PET,PDa aEU \\8ETa // (aEU ) aEU

12.6 Systems with counting gates

where LT- are linear combinations of degree < II T a II such that

1: ,B=LTa+1,BET'

271

(their existence follows from Claim 2 as all Ta are clearly i-good if T is i-good).But the first sum is by Claim 2 expressible as LT + 1. Hence

&=LT-1:

&LTa) + 1aEU aEU J

The alert reader noticed that L Ta are linear combinations over B \ (U a) and notover B. In particular, polynomials we miss unknowns uF, for F intersecting U a.However, if we add these missing terms, the difference it causes in the product

LTa is a linear combination of polynomials WF,E, and u E, - U2 , E, e a. Thelatter are themselves degree 1 linear combinations over B. Hence each &LT- is alinear combination over B, even if LTor is not. This proves the claims.

The lemma then follows from Claim 3 as Skis k-good refines Sand 11 Sk 11 < qk2.

Q.E.D.

By Claim 2 in the proof of 12.5.10 each UiEX HEX is a k-complete system;denote it S, . Then by the previous lemma there are linear combinations Li of we'sand wE,F's of degree < qk2 such that

1] a=L1+1aES,

and hence

1: a=1:(Li+1)= YLil +1iEA aESS jEA iEA ff

But the first sum can be also computed as

iEA aES, XE [A]v iEX aEH*X

because IXI = 0(modq). Hence there is a linear combination L = Ei Li ofdegree < qk2 such that L + 1 = 0 in Fq [ u ].

The proof of Claim 2 from the proof of Theorem 12.5.10, and thus the proof ofthe theorem itself, is concluded by the following lemma, taking d = qk2.

Lemma 12.6.4. Let p, q be differentprimes and d a constant. Then forsufcientlylarge n and for any linear combination L of we 's and WE, Fs of degree < d thereis a partial p -partition a of B = pn + 1 such that

L(ua) = 0


where ua is defined by

uE _1, if E E a

jt 0, otherwise

We shall not prove this lemma. The reader may consult Beame et al. (1994).Note that a lower bound d on the degree of a linear combination L for whichL + 1 = 0 holds generally implies a lower bound in Theorem 12.5.10 of the formn2(d). In particular, d = n t) would yield an exponential lower bound in thattheorem.

Open question. Let a, b > 2 be two different numbers. When are there polynomialsize, constant-depth F(MODa) proofs offormulas Countn?

Theorem 12.5.10 does not yield any information about this problem.

12.7. Forcing in nonstandard modelsIn this section we reinterpret some of the previous material of this chapter asforcing constructions. We shall begin with a simple but illustrative result.

Recall that 31 -formulas are the existential formulas; we shall consider them inthe form 3xq5(x, u ), 0 open.

Theorem 12.7.1. The fragment of Peano arithmetic axiomatized by Q and by theleast number principle. for existential formulas in the language L PA (R) does notprove the pigeonhole principle PHP(R):

L31 (R) If PHP(R).

Proof. Let M be a countable, nonstandard model of true arithmetic Th(N). Letn E M be a nonstandard number.

Let P= {g E M I g: c n+ 1-+ n, g injective}. The symbol g E M meansthat g is coded in M.

Enumerate all elements of n + 1: uI, U2.... and also all 31 (R)-formulas 01(x),02(x), . . . with one free variable x and with parameters from M.

We shall build a sequence go c_ g, c_ ... E P such that1. U,dom(g) =n+12. for g:= U; g;,

(a) (M, g) L31(g)

(b) (M, g) f= "g is injective"

3. the cardinality of each g, is standard(Step 0) go :_ 0(Step 2i) Choose any g21 2 g2i_1 such that ui E dom(g2,) and such that condi-tion 3 is satisfied

12.7 Forcing in nonstandard models 273

(Step 2i + 1) Choose g2i+1 > g2i of standard size such that for any F : n + 1 i-* n,F Q g2i+1, injective, if R is the graph of F, then

(M, F) least number principle for Bi (x)

To perform the odd steps we need the following simple claim.

Claim. Let B (x) be a 31(R) formula. Then there is standard k such that for everygEPand aEMif

3RQg,(M,R)0(a)then there is h E P, Ih \ gI < k such that

VR D h, (M, R) = 0(a)

where R ranges over the graphs of the total injective maps from n + 1 to n in thelast two, formulas.

To see the claim observe that in the open kernel 0 (x, u) of 0 constantly manyatomic subformulas of the form R(s, t), where s and t are terms built from x, u,occur.

If some R Q g makes 0(a) true in M we need to fix the truth value of onlyconstantly many such subformulas. To make R(i, j) true add the pair (i, j) to g;to make it false add (i, j') to g for some j' 0 j.

With the claim and g = $2i in an odd step, consider the formula

q(x) = 3h E P, I h \ gj < k A "R Q h forces 9i (x) true"

where k is the number guaranteed for 9i by the claim.Take a E M to be the minimal element satisfying q (x) in M, if there is any.

Such an element exists as M satisfies the least number principle for all formulas.Let h be the map guaranteed for a by the claim. Put

g2i+1 h

Q.E.D.

Note that the same proof works for those LpA(R)-formulas in place of 31(R)-formulas in which every universal quantifier is bounded by some m such thatm' < n for all standard c: A statement analogous to the claim holds with thenumber k being replaced by a number of the form m', some c standard.

This is seen as follows. Let 9(x) be a formula of the form

b'vl <m3uldv2 <m...3ueo(x,v,u)

with 0 open. Formula 0(x) is equivalent (in M) to the formula

3H1,...,He :m-. MVvl,...,vt <m0(x,U,ui/Hi(x,vl,...,ve))

where Hi are Skolem functions. The subformula

Vu1.... ,ve <m.0 (x,9,uilHi(x,vi... ,ve))

can be replaced by the conjunction

A o(x,U,uilHi(x,v1,...,ve))Vt.....Ve<m

which contains < c me distinct terms (in parameters x, U and also using thesymbols Hi ). Hence we may use this conjunction in place of the open kernel inthe proof of the claim.

Note that this shows that the theory Eb (R)-MIN (which is equivalent to TZ (R)by Lemma 5.2.7) does not prove PHP(R): That is, it is an alternative proof ofTheorem 11.2.5. In fact, the proofofCorollary 11.3.2 from Riis (I 993a) generalizesthe proof of Theorem 12.7.1.

The rest of this section is devoted to an interpretation of the method of k-evaluations from Sections 12.3-5 as a forcing type of construction (we refer thereader to Takeuti and Zaring 1973 for basic information on forcing).

Let M be a countable nonstandard model of true arithmetic Th(N), n e M \ wits nonstandard element and I c_e M the cut in which the elements 21"1e, £ < w,are cofinal. Hence I is a model of S2 (cf. Lemma 5.1.2). Our aim is to find abijection f : n + I n such that

(I, f) S2(f)+-,PHP(f)

The bijection f will be obtained by forcing. The set of the forcing conditions Pis defined by

P:=IUEMIa:cn+l--) n,aisI-1,IaI <n - some t<w}

and partially ordered by inclusion.Call subset D C P dense if

VaEP3,BED,ac,6

and definable if there is a formula 0 (x) with parameters from M such that

D = t(a)}

A subset G C P is called generic if it satisfies four properties:1. G 02. Vol EG`d,BCa,,BEG3. Va,0EG3yEG, aUP Cy4. D fl G 0, for every dense definable subset D C P

The next lemma is standard and its proof is omitted.

12.7 Forcing in nonstandard models 275

Lemma 12.7.2.1. Let a E P be arbitrary. Then there is generic G such that a E G.2. Let G be generic and define the map fG by

fG:=UG

That is,

fG(x)=y iff 3aEG,a(x)=y

Then fG is a bijection between n + 1 and n.

Let 4) (f) be an L (f )-sentence with parameters from I. The notion of genericityallows us to define the notion of forcing.

For a E P : a .forces 0 (f ), a 0 (f) in symbols, if for every genericGCP,aEG

(I, fG) 4)(f)

This forcing notion satisfies the usual properties (cf. Takeuti and Zaring 1973).We would like to find a generic G such that for every bounded L(f)-formula

4)(x) with the parameters from I and with one free variable x some a E G forcesthat 4)(x) satisfies in (I, fG) the least number principle. It is, however, not at allclear that such G exists and we will have to employ the restriction method fromSections 12.3-5 to achieve this.

Consider the set .F of such bounded L (f ) -formulas 0 (x). Let m E M \ I be anyfixed element and d a standard number. Define (in M) the set I'd of propositionalformulas to be the set of formulas 9 E M that satisfy

1. dp(9) < d2. 191<m3. the atoms of 9 are among pij, i < n + 1 and j < n

Note that for any 0 (x) E F there is d such that for all u E I

(o),, EI'd

where (0) is the translation from Definition 9.1.1.As M is a model of true arithmetic, Theorem 12.4.3 holds in M. Choose con-

stants e = 1 /6 and 3 = 1 /2, some fixed f > (o. Hence in M it holds that for somep E P, IpI < n - n1116d '), there is k-evaluation (H, s) of (I')p with k < 2ns.

The following lemma is an important technical vehicle.

Lemma 12.7.3. Let 0 be a bounded L (f )-sentence with parameters, from I andlet d < co be such that (0) E I,d. Let E, 8, and p be as earlier and let (H, S) be ak = 2n8-evaluation of (r'd)p.

Then for a generic G containing p it holds that

GIH 0 iff (I, fG) V aaEH(,p)p

where (I, fG) a iffa C fG.

Proof. The proof proceeds by induction on the logical complexity of ¢. It is clearfor the atomic 0 and for the negation. We shall consider only the case when4) = 3x < uilr(x). For simplicity of notation we shall skip the reference to p.

By Definition 12.4.1 we have

U 1H(O) = S(O)

Cv<U

and

Va V v 0aEH(O) v<u ,BEH(*)v

The second equality follows from

Va= VaEH $ES(H)

whenever a complete system S refines H, which in turn follows from the identity

VaaES

valid for every complete system S.Hence G If- 4) iff G IF- 3v < ui/r(v) if

(I, .fG) V V sv<u PEH(,p)v

iff (I, fG) k- V«EH(,b) a. Q.E.D.

We are equipped now to describe a construction of a generic G C P for which

(I, fG) S2(f)

Let 01 (x), 02(x), ... enumerate set.F so that each 4), appears twice. Put A0 := 0and construct po c pl C . . . E P in steps.

If we are in an even step, extend p2i to some P2i+1 from the i `h dense definableset. If we are in an odd step p2i+1 consider the formula 0,+1. There are twopossibilities:

1. for some d < w s.t. (0,+1) E I'd there is k-evaluation (H, S) of (rd)P2'+isuch that k`' < n, all c < w

2. otherwise.

In the first case pick in M the minimal u < m such that

3p2p2i+l,lp\p21+11 <kApEPApH 1+1(u)

This is definable because of the bound l p \ p21+1 I k and because p 0r+1 (u)if p H Vu H(0 +i )u a, which is clearly definable.

Pick p2i+2 P21+1 any p witnessing the validity of the formula for the mini-mal u.

In the second case take any d < w such that (0t+1 ) E I'd and any per-2extending Per+1 for which there is a k-evaluation of (rd)P2r--2

Take

G:={aEPI pt Qa, somei <w}

By Lemma 12.7.2

(I, .fG) H - PHPn (fG )

and by Lemma 12.7.3

(I. fG) 0 3xo(x) 3uVv < u. 0(u) A -0(L)

That is, also

(I. fG) H S2(fG)

12.8. Bibliographical and other remarksTheorem 12.1.3 and Corollary 12.1.4 are due to Pudlak (1992a). Section 12.2 isafter Krajicek (1994): Theorem 12.2.4 was the first exponential lower bound forthe constant-depth systems. Earlier bounds (Ajtai 1988. Bellantoni. Pitassi, andUrquhart 1992) were slightly superpolynomial.

Sections 12.3 and 12.4 contain material from Krajicek et al. (1991). Lemma12.3.10 was proved there using a probabilistic argument based on an unpublishedwork of Woods. who, inturn. built on Yao (1985). Cai (1989), and Hastad (1989).The present proof recasts the original as a counting argument: see Lemma 15.2.2for a similar argument.

Ajtai (1988. 1990) investigated the lengths of proofs of PHP and Count; studiedearlier in connection with the bounded arithmetic in Paris and Wilkie (1985) andWoods (1981). His arguments showed that there are no polynomial size constant-depth proofs of PHP. and of Count;, from the instances of PHP. Bellantoni etal. (1992) extracted from Ajtai (1988) an explicit superpolynomial lower boundto PHP. The exponential lower bound was obtained independently in Krajiceket al. (1991) and Pitassi. Beame. and Impagliazzo (1993). and the exponentialseparation of PHP and Count; was obtained by Beanie and Pitassi (1993) and Riis(1993b). The proof of Theorem 12.5.7 in part follows Riis (1993b). The separation


of Count' and Countq for p, q primes (Theorem 12.5.10) was announced byAjtai (1994a) and Riis (1994). The proof presented here is from Beame et al.(1994), where the dependence of the principles is completely characterized alsofor composite p, q.

Paris, Handley, and Wilkie (1984) and Paris and Wilkie (1985) characterizesome of the classes Qa 0o in terms of Bel'tyukov's machines (see Bel'tyukov1979), and Gandy (unpublished, see Theorem 11 in Paris and Wilkie 1985) char-acterized class Q2 AO in terms of a limited primitive recursion.

Nothing is known about the proof systems with the counting gates. The construc-tion of i-good refinement in the proof of Lemma 12.6.3 is similar to a constructionof "canonical systems" in Riis (1993b), and it is based on the idea behind Lemma3.1.13. The bounded arithmetic systems with the counting quantifiers are also notunderstood.

Theorem 12.7.1 is due to Paris and Wilkie (1985), and it is the first forcingargument in the context of weak arithmetic. Later forcing constructions of Ajtai(1988) or Riis (1993 a,1993b) are based on the same principle, although technicallymore complicated. Our presentation is slightly modified from Riis (1993b).

13

Bounds for Frege and extended Fregesystems

In this chapter we shall discuss the complexity ofFrege systems without any restric-tions on the depth. There is some nontrivial information, in particular nontrivialupper bounds, but no nontrivial lower bounds are known at present (only boundsfrom Lemma 4.4.12).

13.1. Counting in Frege systemsTheorems 9.1.5 and 9.1.6 are useful sufficient conditions guaranteeing the exis-tence of the polynomial size EF-proofs and of quasipolynomial n019 ")O"' sizeF-proofs, respectively. For example, U,' proves the pigeonhole principle PHP(R)and hence there are quasipolynomial size F-proofs of PHP,. A subtheory of Ucorresponding to the polynomial size F-proofs, based on a version of inductivedefinitions, was considered by Arai (1991); see Section 9.6. Its axiomatizationhowever, stresses a logical construction, whereas we would like a theory based ona more combinatorial principle.

The most important property of a Frege system relevant for the upper boundsis that it can count. We shall make this precise by showing that F simulates anextension of IAo(a) by counting functions, and that F p-simulates a propositionalproof system cutting planes.

Definition 13.1.1.(a) Let Lobe the language of the second order bounded arithmetic but without

the symbol #. The language L;+1 is obtained from the language Li byadding, for every bounded L1-formula

B(xi,...,xk, Y1,..., YO

279

280 Bounds for Frege and extended Frege systems

with k + 2 free variables a new function symbol Fa y(x; T). The languageLw is the union U; Li.

(b) The theory I Do (a)`Ou"` is a theory in the language Lu, axiomatized by PA(see Definition 5.1.1), the induction axioms. for all bounded LO, formulas,and by all axioms of the, form

FAv(O;bl,...,be):=0

and for j = 1, ..., k

FB y(al,...,aj-l,a+ 1,aj+1,...,ak;bl,...,be)FB y(ai,...,aj-l,a,aj+l,...,ak;bl,...,be)+FA,Y,x; (al,...,aj-l,aj+l,...,ak;bl,...,be,a)

The meaning of the function Fx,9 Y(x; y) is

FB,(a;b) = {xEal

We note a simple logical property of IDo(a)count

Lemma 13.1.2. Let T be the theory I Eo'b (Definition 5.5.2) in the language ofI D0(a)cou"t (i.e., without the # symbol, in particular), plus the axiom

`da3u, f, Enum(f, u, a)

where Enum is the formula from Lemma 5.5.14.Then the theory T is conservative over the theory IDo(a)`ou"t wr.t. all

V0 (a)- formulas.

Proof. Note that any model of IDo(a)cou t can be expanded to a model of thetheory T, leaving the first order part unchanged. Q.E.D.

The following lemma is not surprising.

Lemma 13.1.3. The theory IDo(a).ou"t proves the pigeonhole principleVxPHP(R, x).

Proof. Assume - PHP(R, a), that is, R(xl, x2) is a graph of a 1-1 map from a + 1into a.

By induction on y and z the theory IDo(a)count proves for y < a + 1 and z < a

FR"x'(y,a;)=y, and FRR',xz(a+1,z;) z

and hence

a+1=FR''x2(a+1,a;)<awhich is a contradiction. Q.E.D.

13.1 Counting in Frege systems

Next we define the cutting planes proof system.

Definition 13.1.4. A CP-inequality in x1, ... , x is an inequality of the, form

281

where aI,...,a0,bareintegers.CP-inference rules are the following four rules allowing us to infer a CP-

inequalityfrom other CP-inequalities:1. initial

andxi>0 -xi>-2. addition

3. multiplication

>2i aixi > b >2i cixi > d>2i (ai + ci )xi > b + d

Ei aixi > b>i (cai )xi > cb

where c is a nonnegative integer4. division

Ei aixi > b

>2i (ai l k)xi > (bl klwhenever k is a positive integer dividing all ai

A cutting plane refutation (a CP-refutation, shortly) ofCP-inequalities Il , ... , Ikis a sequence J1, ..., Je such that each Ji is either one of II, ..., Ik or derivedfrom the previous inequalities J1 , ... , J_1 by one of the CP-rules, and such thatthe last inequality Je is 0 > 1.

The size ofa CP-inequality >iaixi > bisJbJ+>i Jail. The size ofa CF-proofis the sum of the sizes of its steps.

The proof system CP is complete, meaning that any set of CP-inequalitieswithout a 0-1 solution has a CP-refutation (cf. Cook, Coullard, and Turin 1987).

_xj,, }For the following lemma define that a clause C = {xi...... xi., -x11,...,is represented by a CP-inequality I(C):

xi- xj>1-biEU\V jEV\U

where U={i1....,ia}andV={J1,..., jb}.

Lemma 13.1.5. The cutting planes proof system p-simulates the resolution sys-tem.

P r o o f . Let x1, ... , x be all atoms. If a clause contains both xk and -,xk then theCP-inequality representing the clause can be inferred from 0 > 0 (which follows,

e.g., from xi > 0 by multiplying both sides by 0) by adding to it some initialinequalities xi > 0 and -xi > -1.

Let C1 and C2 be two clauses such that xj E C, and ,xj E C2, and let C bethe resolvent of Ci and C2 w.r.t. xj

C= (Cl \{xj})U(C2\{-.xj})

By the preceding material we may assume that for any k 0 j, at most one of theliterals Xk, --'xk occurs in C1 U C2. Moreover, we may also assume that -xj V Ciand x j C2 as otherwise I (C) can be obtained from I (C2) if,x j E C1 (resp. fromI (Ci )) if x j E C2, by adding to it suitable initial inequalities xi > 0, -xi > -1.

Let 1(C1) : E aixi > 1 -band I (C2) : E cixi > I - d. Note that all ai, ciare 0, 1 or -1. Add to the inequality I (Ci) + I (C2)

E(ai+ci)xi>2-b-dall initial inequalities xi > 0 if ai + ci = 1 and all -xi > - l if ai + ci = -1 toget an inequality J

T, uixi > 2-2b-2d+ 1

where all u i are 0, 2 or -2. Note that u j = 0 as a j + c j = 0. Apply to J thedivision rule with k = 2

2xi > I 1 -(b+d)+2 = 1-(b+d-1)

It is easy to verify that J represents the resolvent of C1, C2 w.r.t. x j. Hence one res-olution inference has a size O(n) CP-simulation. This readily implies the lemma.

Q.E.D.

Lemma 13.1.6. Let PHPn P be the set of CP-inequalities representing the clausesof PHPn. That is, PHPn P consists of

xil+....+xin> 1, .for] -1, forl n+lif

For a fixed k derive the inequalities (r = 2, ... , n + 1)

-Xlk - X2k - .. - xrk > -1

13.1 Counting in Frege systems 283

For r = 2 this is just an inequality of PHPnP. The inequality for r + 1 is obtainedby summing r - 1 copies of

-Xlk-X2k-...-Xrk>-1

with all inequalities of PHPn P of the form

-Xik - Xr+l k> - 1 - 2r

which after division by r entails the inequality for r + 1.Sum up all inequalities

-Xlk - ... - Xn+l k >- -1

for k = 1, ... , n to get

T, -xi j > -nij

and adding this to the first inequality of the proof yields

0>1

The length of any inequality is 0(n logn) and there are 0(n3) steps: That is,the total size is n 0 . Q.E.D.

Note that the lemma together with Theorem 12.5.3 implies that the constant-depth systems do not p-simulate CP. The following theorem is the main result ofthis section. Recall Definition 9.1.1.

Theorem 13.1.7. Assume that 9(x) is a bounded AO(a)formula in the languageLu, and that

IAO(a)count I- Vx9(x)

Then there is a polynomial p(x) such that all formulas (9), have size < p(n)F -proofs.

Proof The simulation is constructed as in Theorem 9.1.3. We only need to showhow to simulate the new axioms from Definition 13.1.1, that is, how to define theformulas pQ h j of the translation of formulas (Fy 3,(x; y) = z)a b and how toprove the translations of the axioms. To simplify the notation we shall considerthe formulas 9(x) with one free variable only (parameters implicit) and we shalluse the atoms pi instead of the formulas (0)i.

For any fixed n we need to define the formulas qi j representing FB (i) = j. Anaive approach would be to use the inductive property of F to define qoo := 0 and

qi+1 j (qij A -pi) V (qi j-1 A Pi)

This, however, gives q, ,j of size S2 (2n). A slightly less naive method would be todefine F on subintervals of n of the form [u (n/2v), (u + 1) (n/2v)), u < 2v,and proceed by induction on v = log n, log n - 1, ... , 0; this yields formulas ofsize nO(logn) only.

We resolve this problem by using the carry-save addition. That is a techniqueallowing us to compute the sum of n numbers with n bits each by a Boolean formulaof size n0(1). Thinking about pl, . . . , pn as n numbers with one bit each we shallfind size nO(1 j formulas computing the logn bits r0, ... , rkgn-1 of E1<k pi interms of pl, ... , p, and then put

qkj A\ rk n A -rk](i)=I .j(i)=0

where AO).... , j (log n - 1) are the bits of j. From the construction of rk it willbe apparent that the system F can verify the inductive property

qkj ° (qk-1, j A -pk-1) V (qk-1, j-1 A pk-1)

We define the formulas (in p l, ... , pn) a U and b U , where u = 0, ... , f - 1,v = 0, ... , (n/2u) - 1, w = 0, ... , 2 - 1, and 2 = [lognl. Denote by auv thef-tuple (a°v, ... , auU 1), similarly buv.

Put

w pv ifw = 0aov- 0 ifw>0

and bov := 0, and then define

au+l,v := A(au,2v, bu,2v, au,2v+1 , bu,2v+1)

and

bu+l,v B(au,2v, bu,2v, au,2v+1, bu,2v+1)

where A and B are f-tuples of propositional formulas in 4f atoms defining thecarries and the sums in the sumofthefour £-bit-long numbers au,2v, bu,2v, au,2v+1,bu,2v+1, and are defined by

A := Ao(Ao(au,2v, bu,2v, au,2v+l ), BO(au,2v, bu,2v, au,2v+1), bu,2v+1)

and

B := BO(AO(au,2v, bu,2v, au,2v+1), Bo(au,2v, bu,2v, au,2v+1), bu,2v+1)

13.1 Counting in Frege systems 285

where

Ao (x y, E) _= x" ED yw ®zw

and B 0 0 ( , , ) := 0 and

Bp (x,.Y,z) (xw-I Ayw-1) V (xw-I Azu,-1) V (zw-1 A yw-1)

if0<w<f.The formula t ® s is a shorthand for (t A -s) V (,t A s). Note that Ao is the

sum modulo 2 of the wth bits of x, y, E and that Bo is the carry on the wth place.Hence the sum (modulo 2e, but all our numbers are smaller than 2) of numbersx, y, z is equal to the sum of A0(x, T, E) and Bo(x, y, z ).

We shall use some fixed formula C(e, f) in 2f atoms representing the sum oftwo f-bit-long numbers. We take for C the disjunctive normal form ofthe associatedBoolean function. Such a formula has size 20(e) = no(1) and F can prove all theusual properties of the addition by simply considering all assignments.

Define

clly := C(allU, b11 )

Hence c" are the bits of E v-211pi, and ce_ I o are the bits of the wanted

sum Ti pi. This allows us to put

rk:=ce-I o(PI,...,pk-1,0,...,0)

The inductive property of qkj then follows from

pk- C(rk, 0 ... 0, 1) = rk+l

-Pk -+ rk = rk+l

which is verified by induction on u in

Pk = (C(cllv(pk10), 0...0, 1) = cuv)

where v is chosen to be the unique number such that k E [v 2", (v + 1) . 2").All these verifications rest only on the F-provability of the basic properties of

the addition for formula C that are available by our choice of C. Q.E.D.

The theorem and Lemma 13.1.3 entail the following corollary.

Corollary 13.1.8 (Buss 1987). There are polynomial size Frege proofs of the pi-geonhole principle PHP".

From Theorem 12.2.4 we know that constant-depth systems cannot p-simulatesystem F. Note that the last corollary and Theorem 12.5.3 are also proof of thisfact.


A CP-inequality with variables x I, ... , x can be coded by an (n + 1)-tuple ofnumbers, each of them represented by a set of its bits. So the whole inequality isrepresented by a set, as is any CP-refutation. Recall a similar situation with thecoding of F- and EF-proofs in Section 9.3.

Theorem 13.1.9. The theory Ii.o(a)" proves the soundness of the cuttingplanes proof system CP.

Proof. It is enough to show that the theory T from Lemma 13.1.2 can define theaddition, multiplication, and division of numbers coded by sets of their bits andprove the basic properties, say PA-. This is done similarly to the proof of Lemma5.5.4; there we used A 'b-induction to verify the basic properties, but we mayreplace that by a Eo'b-induction with the parameter (the multiplication table, etc.)

witnessing the Ell 'b-definition of the operation. These witnesses are definable bya Eo'b-formula with the help of the new counting functions. We leave the detailsto the reader. Q.E.D.

The following corollary is obtained from the previous theorem and Theorem13.1.7 analogously with 9.3.17 followed from 9.2.5 and 9.3.13.

Corollary 13.1.10. Frege system F p-simulates cutting planes proof system CP.

13.2. An approach to lower boundsThis section points out an open problem rather than presenting results.

The only known lower bound to the size of F-proofs is provided by Lemma4.4.12. Its general form, with an identical proof, is the following lemma.

Lemma 13.2.1. Let t be a tautology that is not a substitution instance of anytautology of a smaller size. Denote by s(r) the sum of the sizes of all subformulasoft.

Then every F -proof it oft must have size at least

Inl = Q(s(t))

The weakness of the lemma is not so much in the poor bounds it provides (asalways s(r) = O(Ir12)) as in the fact that the value ofs(t) depends on a syntacticform oft rather than on the meaning oft. For example, a disjunction a of n literalshas the value s(a) = Q (n2) if the brackets are associated to the left but only thevalue s(a) = O(n logn) if the disjunctions are arranged in a balanced binarytree.

We would therefore be interested in obtaining a lower bound of the form atleast n 1 +E but robust w.r.t. to the distribution of brackets and similar syntactical

13.2 An approach to lower bounds 287

questions. In particular, we would like to obtain such a bound to the F-refutations ofan unsatisfiable formula represented by a set of small clauses. Such a formulationseems to depend least on the syntactic form.

Tautologies that might be useful in this respect were considered by Tseitin(1968) . Let G be an undirected graph with n vertices labeled by 0 and 1 suchthat an odd number of vertices get label 1. Let each edge of G have an associatedatom. Consider n sets of clauses {C;,}1, one set for each vertex v, representing inthe conjunctive normal form Al C;, the formula

el ®... (D et = 4

where el, ... , et are the atoms associated to the edges incident with v and £v isthe label of v. Clearly i < 2`.

Denote by C(G) the set of all clauses C;,, for all v and i. The set is unsatisfiable(as the sum of the labels of the nodes has to be even but is odd by the hypothesis).

There is a relation of the formula C (G) to the instances of the counting principleCount,' . Let G = (V, E) be a graph with labeling a of its edges satisfying C(G).We may assume that G has exactly one vertex v1 labeled by 1 and that for thatvertex deg(v1) = 1(a graph satisfying this can be defined from G in a simple way,e.g., by a DO(G)-formula). Then the labeling a of the edges defines a completepairing on the set

A = (({u, v}, u) I u E V, {u, v} E E} \ ((e, vl))

where e is the unique edge incident with vi: pair ({u, v}, u) with ({u, v), v) ifa({u, v}) = 0 and ({u, v}, v) with ({u', v), v) if {u, v) and {u', v} are the (2i + 1)stand the (2i + 2)nd edge incident with v and labeled 1 by a. The set A has an oddcardinality, however. This shows that C(G) can be refuted from an instance ofCount,'. Note that this refutation is polynomial size and is bounded depth if thedegree of G is bounded.

The opposite implication also works. From any equivalence relation R falsi-fying Count,' one can obtain a Ao(R)-definable labeling a of the edges of thecomplete graph K2n+l whose exactly one vertex is labeled by 1, such that a sat-isfies C(K2n+l )

Assume that G has the degree bounded by an independent constant, thenIC(G)I = 0(n). Urquhart (1987) showed that bounded degree expanders yieldsets of clauses with exponential size resolution proofs. On the other hand, theclauses have polynomial size F-refutation for any graph G, utilizing the countingin F from the previous section. We would like to find bounded degree graphs forwhich any F-refutation of C(G) must have the size nl+Q(1).

Next we consider a possible approach to showing such a lower bound (forsuitable graphs). Consider the following search problem. Given a labeling a of the

edges of G find a vertex v for which the parity condition

ej ®...®et =fU

is violated. We want to solve the problem by using a modification of decision trees(cf. Definition 3.1.12) where at a node the tree branches into 2` subtrees dependingon the labels of the edges incident with a vertex v.

Denote by D(G) the minimal height of such a tree solving the search problemfor all labelings of the edges of G (labels of vertices are fixed: a part of thespecification of G).

For example, if G is a complete graph with just one vertex labeled by 1 thenD(G) = n - 1. If G is a circle (again with one vertex labeled by 1) then D(G) <log n, by the binary search.

Assume we have an F-refutation n of C(G). Knowing the truth values of allformulas in it allows us to construct a path *o, ..., *, through it (similarly tothe proof of Lemma 9.5.1) and to solve the search problem in this way. Definenewsize(/) for the formula 1 to be the minimal number k such that there is a setW of k vertices such that the truth value of can be determined knowing only thelabels of the edges incident with a vertex from W. Hence

D(G) < newsize(,r) newsize(i)

where 1 runs over steps of it.We would like to find a type of graph G and a space of random partial evaluations

p leaving p n edges unlabeled and having with a high probability two properties1. D(GP) = Q(pn)2. newsize(irP) = 0 (p'+E newsize(7r))

where E > 0. Choosing p - n-1+8 gives

S2(pn) = D(GP) < newsize(irP) = 0 (p '+E newsize(lr))

and thus

Q(p-En) = 22 (n1+[1-s]E) = newsize(ir) < IirI

This is motivated by the method of Subbotovskaja (1961), which showed that theexpected shrinking of a formula c in atoms x 1, ... , x under random restrictionsfrom Lemma 3.1.11 is by a factor 0 (p3/2 ). We would need therefore a type of graphG for which the decision complexity D(G) is S2 (n) and such that the restrictedgraphs GP are of the same type (which would yield condition 1) and for whichthe shrinking of the formulas w.r.t. the newsize occurs with factor p'+"' ('). Again,expander-type graphs might be good candidates.

13.3 Boolean valuations 289

13.3. Boolean valuationsIn this section we define the concept of Boolean valuations, which was introducedin Kraj icek (1995a). It aims at describing a framework for proving the lower boundsto £F (recall the measure CF from Section 4.4: the number of distinct formulas ina proof). In the rest of the section we consider a Frege system F in the language{0, 1, V, A). We have to start with three rather formal definitions.

Definition 13.3.1. Let I' be a set of formulas. We say that a formula t can beproved within I, Hr in symbols, if there is an F-proof of r in which only formulasfrom the set r occur as subformulas.

Definition 13.3.2.1. A partial Boolean algebra is a structure

13(0, 1, n, v, -,)

in the language of Boolean algebras {0, 1, n, v, -} but in which the oper-ations A, v, - are only partial; in which the elements 0 and 1 are distinct;and that satisfies those axioms t = s of the theory of Boolean algebras BAfor which both t and s are defined. For definiteness we take as the axiomsof BA the instances of the following identities:

(a) -0=1 -,1=0, Ovu=u and lnu=u(b) uv -'u= 1, un-'u=0(c) uvv=vvu, unv=vnu(d) uv(vvw)=(uvv)vw, uA(vnw)=(unv)nw(e) uv(vnw) = (uvv)n(uvw), un(vvw) = (unv)V(unw)

2. A homomorphism h : 131 -* 132 ofa partial Boolean algebra 131 into 132 is amap h of the universe of 131 into the universe of 132 such that h (O8) = 0132,h(1n,) = 182, and

(a) -'h(u) is defined and equal to h(-'u), whenever -'u is defined in B1,

(b) h (u) oh(v) is defined and equal to h (u o v), whenever u o v is definedin 131 (o=A,v).

3. A congruence relation on a partial Boolean algebra 13 is a partition = ofthe universe of 13 such that

(a) u = v implies (-'u) - (-'v), provided both -u, w are defined in B,

(b) u = v and u' - v' imply (u o u') - (v o v'), provided both u o u' andvov'aredefinedin13(o=A,v).

See Gratzer (1979) for details on partial algebras. The following definition iscrucial.

Definition 13.3.3. Let r be a set of formulas. A Boolean valuation of r is a map

v:F-8of the ,formulas from r into a partial Boolean algebra 8(0, 1, A, V, --,) satisfyingthe following conditions:

1. v(0)=O8and v(l)=1n,if0EF(resp.IEF)2. -'v(') is defined and equal to v(-' ), whenever Vi, -* E r3. v(*) o v(0) i s d e f i n e d and equal to v( o 0), whenever , ¢, i// o 0 E r,

for o = V, A.

Lemma 13.3.4. There are c > 0 and an assignment * assigning to any finite setF offormulas and any F proof r of r within r a set offormulas rn F such that

jr; I < cIFI and max {dp(¢)I -0 E rn } < c + max {dp(O)I 0 E r}

and such that there is no Boolean valuation v : F; -). 13 in which v(r) 113.

Proof. For R (and p = (pl, ... , pn))

Yr(p)

Yo(P)

a rule of the system F let TR(p) be the set of all Boolean terms occurring as thesubterms in some fixed equational proof eR of the equation

YO (p) = l

from the equations

YA(P) = 1, i = 1,...,rin the theory BA. Such a derivation exists as R is sound.

Let n = 91, ..., 9k be an F-proof of r within r. Define the set Fn to be thesmallest set satisfying

1. r;Dr2. whenever

Yl(*I,...,Y'n), ,Y,.(*i, ,Yn)*n)

is an inference in jr using the rule R then

TR(/I,..., *n) C rn

It is clear that the depth of r; increases by a constant over d p(F) and the cardinalityof r; is proportional to F.

Construct an equational derivation of

r=1

constructing consecutively the derivations of

8;=1

using the derivations eR if 9j was inferred by using rule R. As the valuation vpreserves the connectives on rn, clearly

v(9,) = la

all i, and so also

v(i) = 18

That is a contradiction. Q.E.D.

The following construction is a converse to the previous lemma.

Lemma 13.3.5. There are a constant c > 0 and an assignment + assigning toany finite set r of formulas a set of formulas r+ D I' such that

lr+l< IFI° and max{dp(0)1 0 (=-r+} < c+max(dp(O)J ¢ E r}

and such that for each r r. r, if there is no F-proof of r within r+ then thereexists a Boolean valuation v : IF -- 8 in which v(r) 0 113.

Proof. For ti = t2 an identity of the language of BA define the set of formulasF,,,t2 in the following way:

1. for every subterm s oft; , i = 1, 2, introduce a new atom qs2. for o = v, A, i = 1, 2, and sl o s2 a subterm oft; put the formula

qs os2 = (qa, o qa2 )

into the set At,, t23. for i = 1, 2 and -s a subterm oft; put the formula

q's = (-'qs )

into the set A,,,t24. set F,, ,,2 is the set of all formulas occurring in a fixed F-proof 7r,, ,,2 of the

formula

/ \ A,,,t2 - qt', = q'

(A bracketed to the left, say). Recall that we use - as an abbreviation.Define the set r+ to be the smallest set r+ D r closed under subformulas and

satisfying the following conditions:

1. for any a, Y E F, all formulas

a

are in r+2. whenever tI = t2 is one of the identities -a = -a, a v b = a v b, a A b =

a A b then

Fti,t2(ala, blf) C r+

for alla,f E I'3. whenever ti = t2 is one of the axioms of BA and U' E I' then

Fh,t2(Q,/as) C r+

Clearly dp(r+) < c + dp(I') and II'+l < II'I`' for some constant c > 1.Define the relation - on I' by

if I-r+ O lr

Condition I implies that is an equivalence relation and 2 implies that it is evena congruence relation. From 1 it also follows that r is not equivalent to 1 as bythe hypothesis of the lemma r has no F-proof included in r+.

From define a partial Boolean algebra B and a valuation v : I' B by(a) the elements of B are the congruence classes from r/(b) 0L = 0/ 113 = 1/

(c) -a = -a/ whenever a e F and a E a(d) a o b = (a o ,B)/ whenever a o ,B E F and a E a,,8 E b, o= V, A(e) v(a) := a/

It is straightforward to verify using 2 and 3 that the operations V, A are correctlydefined and that v is a Boolean valuation. As v(r) ¢ l , we are done. Q.E.D.

We shall state the main theorem on Boolean valuations.

Theorem 13.3.6 (Krajicek 1995a). Let r be a propositional.formula and let nrbe the maximal number n such that, for every set A of < n formulas containing r,there is a Boolean valuation v : A -. 13 in which v(r) ln.

Then

nr = O($F(r))

and

£F(r) = no(t)

The same is valid i f we restrict F to the depth d systems and consider for A onlythe sets of depth < d + c formulas, c a constant depending on F only.

Proof. Let jr be an F-proof oft with £F(r) distinct formulas. Take r to be theset of the formulas occurring in n and put A := F. By Lemma 13.3.4 there is noBoolean valuation v : A -± B in which v(r) 1B. Also Al I= O(Irl)

For the second inequality take r a set of n r + I formulas containing t for whichthere is no Boolean valuation v in which v(t) 0 113. By Lemma 13.3.5 there mustbe an F-proof oft within r+, hence

fF(r) < Ir+I = Ir100) = (nt)o(t)

The generalization to the constant-depth systems holds because of the boundsto the depth of r*, r+ in Lemmas 13.3.4 and 13.3.5. Q.E.D.

The reader should find modifying the concept of Boolean valuations to theextensions of F by a set A C TAUT of extra axioms straightforward; require thatv(a) = 113 for all a E A. This gives nontrivial information, as we shall see in thenext chapter that for any propositional proof system P in the sense of Definition4.1.1 there is a polynomial time subset A p c TAUT such that the minimal size ofP-proof oft is bounded by O(fF(A,)(r)), where F(Ap) is F augmented by Apas extra axioms.

It is apparently quite difficult to construct a Boolean valuation v giving to t avalue different from 18. This is because a nontrivial lower bound to nr implies bythe previous theorem and by Lemma 4.4.6 a nontrivial lower bound to the numberof proof-steps (in F), and by Lemma 4.5.7 also to the size of EF-proofs oft. Nonontrivial lower bounds are known for these two measures.

However, in the case of constant-depth Frege systems there are nontrivial bounds(cf. Sections 12.3-5), and, in fact, there is a general construction allowing us toconstruct a Boolean valuation from a k-evaluation. It is a modification of thealgebraic direct limit construction (see Gratzer 1979). We first describe a generalsetting and then the particular case arising from k-evaluations.

Let (I, <) be a partial order that is decomposed into three levels 11, 12, 13 suchthat for any eight a l , ... , as E I, there is b E 12 such that a i , ... , as < b, andsuch that for any two b1, b2 E 12 there is c E 13 for which b1, b2 < c. Assume thatfor every a E I there is a total Boolean algebra Bp, and that for every a < b thereis an embedding hab of Ba into Bb such that the whole collection (Ba)a, (hab)abforms a commutative diagram. That is

hab o hbc = hac

whenever a < b < c.Let the set X be the disjoint union of the universes of algebras Ba for a E Ii.

For U E Ba and V E Bb, a, b E Ii define

u - v iff hac(u) = hbc(V), some c E 12

The following observation follows from the condition that for any two b 1, b2 E 12the algebras Bb, and B62 can be jointly embedded into some B, C E 13.

Lemma 13.3.7. Let a 1, a2 E Il and b1, b2 E 12 such that a 1, a2 < b 1, b2, and letul E Ba,, U2 E Ba2. Then

haibi(ul) = hazb,(u2) f haibz(ul) = ha2b2(u2)

Lemma 13.3.8. For any collection (Ba)a, (hab)ab satisfying the preceding condi-tions, the relation = is a partition of the set X.

Proof. It is clear that = is reflexive and symmetric. The transitivity follows fromLemma 13.3.7 as any three Ba, , BaZ , Ba3 , al, a2, a3 E II, can be simultaneouslyembedded in some Bb, b E 12. Q.E.D.

Definition 13.3.9. Denote by [u] the =-equivalence class of u E X. A partialstructure 8 is defined as follows (u E Ba, V E Bb, W E Be, and a, b, c E Ii):

1. the universe of 13 is the quotient X/ -,2. OB=[0B;]and ln=[lB;],3. -.[u] = [v], provided -had(u) = hbd(v) for some d E 12,4. [u] o [v] = [w], provided hRe(u) o hbe(v) = hce(w) for some e E 12,

o=A,V.The partial structure 13 is called the limit of the collection (Ba)a, (hab)ab.

Lemma 13.3.10. The limit B of the collection (Ba)a, (hab)ab satisfying the con-ditions in Definition 13.3.9 is a partial Boolean algebra.

Proof Take an identity of BA, say the distributivity law

Ul n (u2 V U3) = (ul n u2) V (ul A u3)

Assume that in B it holds

[u2] V [u3] = [u4], [u l] A [u4] = [u5]

and

[U I] A [u2] = [u6], [ul] A [u3] = [u7], [u6] V [u7] = [u8]

for u, E Ba,, ai E Ii, i = 1, ... , 8. We need to show that [u5] = [u8]. Assumeotherwise and let b > a 1, ... , a8 be such that all Ba; can be simultaneously embed-ded in Bb. Then by Lemma 13.3.7 the preceding equalities hold for ha;b(ui)'s inplace of [ui]'s too but also hacb(u5) hasb(u8). That is impossible as Bb satisfiesall identities of BA.

An analogous argument applies to all other identities of BA as they all containat most eight different subterms. Q.E.D.

The following theorem is then clear.

Theorem 13.3.11. Let r be a set of formulas and assume that (I, <) and thecollection (Ba)a, (hab)ab satisfies the preceding conditions. Let B be the limit ofthe collection (Ba)a, (hab)ab as defined. Assume that It = r and that there is amap f : I' --* X such that f (O) E BO for all 0 E r.

Moreover assume that for any collection {01, 02, 031 c r there is b E 12 suchthat the map

hm,b o f : 0 H hm,b (f(O))

is a Boolean valuation of {01, ¢2, 03} into Bb.Then the map

vErH[f(O)]E13is a Boolean valuation of r.

Theorem 13.3.11 can be used to obtain Boolean valuations from complete sys-tems. As an example we shall show that a k-evaluation of a set of formulas con-structed in Section 12.5 gives rise via the limit construction to a Boolean valuationof that set.

The next lemma follows from Lemmas 12.3.6 and 12.3.7. M = MMOD° in therest of the section.

Lemma 13.3.12. Let S and T be a k-complete system (resp. an 2-complete sys-tem), and let BS and BT be the Boolean algebras of the subsets of S and T, re-spectively. Assume that T refines S and that k + £ < n. Define map h : Bs - BTby

h(H) := T(H)

Then h is an embedding of Bs into BT.

Let a pair (H, S) be a k-evaluation of the set r of formulas built from the atomsof the form pX, X E [A]a, and assume that 24k < n. Let [r]" denote the set of< t-element subsets of r. Set I1 := r, 12 :_ [r] `s, and 13 := [r]`16

For a E I, and b E Ii define a < b if a C b. To every set b = (*I, ..., *e) EIj assign the Boolean algebra of the subsets of the k-complete system Sb

So X ... X So

defined as the minimal common refinement of all systems S*,,, u = 1, ... , f.Such a system exists by Lemma 12.3.5 and is 2k-complete. Define further the maphab : 13a - 13b by

hab(H) := Sb(H)

By 13.3.12 these maps are embeddings and by Lemma 12.3.6 the collection(Ba)a, (hab)ab forms a commutative diagram. The next lemma is obvious.

Lemma 13.3.13, For any three formulas i/it , t/r2, t/i3 E r there is a E 12 such thatthe composite map

t/i H H,1, -* hl*}a (HO

is a Boolean valuation of the set I* 1, 1fi2, i/i3).

Let B be a partial Boolean algebra that is the limit ofthe collection (Ba )a, (hab)ab.Define the map v of formulas from r into B by

v(*) := [H,,,]

That is, the value of ili is the congruence class of H,,. The following lemma nowfollows from Theorem 13.3.11.

Lemma 13.3.14. The map v : I' -. 13 is a Boolean valuation of r. If HO = O forsome 0 E r then v(¢) = O.

Theorem 12.4.3 implies the following lemma.

Lemma 13.3.15. For every d there are E > 8 > 0 such that for every sufficientlylarge n and every set r of at most 2"8 depth d formulas built from the atoms px,X E [A]a, there is a partial a -partition p e M such that

Ipl < n - nE

and there exists a Boolean valuation

v:f'P - B

in which v([Countn]P) 11.

Note that the proofs of lower bounds from Theorems 12.5.7 and 12.5.10 consistof showing that the instances of PHP" (resp. of Counts (a = p in this case))receive by v the value 113 (cf. the remark after Theorem 13.3.6).

Let us close this section by giving an explicit description of the Boolean valuein 8 assigned to 0 E r by the limit construction applied to k-evaluations, and byreexamining the role of the k-complete systems.

A formula 0 gets the value

v(0) = {(H, S) I (S x Sp) (H) = (S x SO)(H,p)}

where (H, S) runs over all H C S and S a k-complete system.

Lemma 13.3.16. Let 8a for some a E II be the Boolean algebra of the subsetsof the k-complete system Sa a,-) c_ M. Let bi := [fail] E B be thevalues of Jai) in B.

Then bl, ... , br form a partition of unity in B:1. b;Abj =0t3fori j2. V I<i<r bi = 113.

The proof is omitted.

13.4. Bibliographical and other remarksThe proof of PHP from the counting (13.1.3) was noted in Woods (1981) and Parisand Wilkie (1985). The proof system CP was defined in Cook et al. (1987), whereLemmas 13.1.5 and 13.1.6 are observed. Counting in F was developed in Buss(1987), where Corollary 13.1.8 was proved directly. Theorem 13.1.7 was noted inKrajicek (1995a) and we have followed Buss (1987) in the use of the carry-saveaddition. Corollary 13.1.10 is from Goerdt (1990). When the size of an inequalityis measured by the sum of the absolute values of the coefficients, an exponentiallower bound for CP was proved in Bonet, Pitassi, and Raz (1995) (see Krajicek(1995b) for an alternative proof via interpolation). Pudlak (unpublished) extendedthis to ordinary size. Cook et al. (1987) suggest that Tseitin's graph formulas basedon bounded degree expanders are candidates for not having short CP-proofs.

For topics related to Lemma 13.2.1 see Bonet (1993). A communication com-plexity approach to the lower bounds for F is briefly discussed in Krajicek (1995a)(as are other related search problems). The graph tautologies were introduced inTseitin (1968) and studied in Urquhart (1987). Hastad (1993) improved the shrink-ing factor to p2-o(l).

The next chapter is devoted to the question of hard tautologies. No naturalcombinatorial principles (like PHP or Count' of Chapter 12) are known that wouldform plausible candidates for having only long proofs in F (but see Krajicek andPudlak 1995). Arai (unpublished) suggested a principle of linear algebra

An+1x = 0 -+ Any = 0

(A an n x n 0-1 matrix, x a 0-1 vector), and Karchmer (unpublished) suggested thatthe Graham and Pollak theorem (saying that no n - 2 complete bipartite subgraphsof the complete graph K can partition the edges) may be hard for F as all itsknown proofs utilize the linear algebra operations not known to be definable bypolynomial size formulas. Both these principles are, however, provable in UI andthus admit quasi-polynomial size F-proofs (Theorem 9.1.6). This author suggestedearlier (see Clote and Krajicek 1993) that a theorem of Bondy (1972) might bea good candidate, but Bonet, Buss, and Pitassi (1995) constructed its polynomialsize F-proofs.


Urquhart (unpublished) posed the question whether the set C(G), for G boundeddegree expanders, must have exponential size constant-depth F-proofs. An affir-mative answer might allow one to prove for constant-depth systems a result in thespirit of Chvatal and Szemeredi (1988).

The content of Section 13.3 is from Krajicek (1995a); the limit constructionand Theorem 13.3.11 are new.

14

Hard tautologies and optimal proofsystems

We shall study in this chapter the topic of hard tautologies: tautologies that are can-didates for not having short proofs in a particular proof system. The closely relatedquestion is whether there is an optimal propositional proof system, that is, a proofsystem P such that no other system has more than a polynomial speed-up over P.We shall obtain a statement analogous to the NP-completeness results character-izing any propositional proof system as an extension of EF by a set of axioms ofparticular form. Recall the notions of a proof system and p-simulation from Sec-tion 4.1, the definitions of translations of arithmetic formulas into propositionalones in Section 9.2, and the relation between reflection principles (consistencystatements) and p-simulations established in Section 9.3. We shall also use thenotation previously used in Chapter 9.

14.1. Finitistic consistency statements andoptimal proof systems

We shall denote by Taut (x) the IIb-formula Tauto(x) from Section 9.3 definingthe set of the (quantifierfree) tautologies, denoted TAUT itself.

Recall from Section 9.2 the definition of the translation

O(x) - IIoIIq(n)(P1.... ,P,,)

producing from a IIb-formula a sequence of propositional formulas (Definition9.2.1, Lemma 9.2.2). Also recall that a number a (or a formula, . . . coded by anumber a) is represented by a tuple a of 0, 1: the bits of a. We use the notation >vfor p-simulation from Definition 4.1.3.

299

300 Hard tautologies and optimal proof systems

Definition 14.1.1.(a) Let P be a propositional proof system. Function c p (r) : TAUT -* N is

defined by

cp(r) := min(12rI 17r isa P proofoft)

(b) Let P, Q be two propositional proof systems. Then system P is better thanQ, P > Q in symbols, iff there is a polynomial p(x) such that

VT E TAUT, cp(t) .

Observe that P is better than Q if Q has a polynomial speed-up over P andthat P > p Q implies P > Q but the converse does not necessarily apply.

Problem. Does there exist an optimal propositional proof system?

Any proof system P that proves all tautologies in polynomial size is optimal;thus NP = coNP implies the affirmative answer to the problem (cf. Theorem 4.1.2).It is unknown, however, whether the converse implication is also true.

Nontrivial information about the problem is provided by Theorem 9.3.17 andCorollary 9.3.18: Relative to any theory S2 or TZ there is an optimal proof system.That is, there is a >-greatest proof system among those whose consistency isprovable in the theory. We use the idea of the proof of these results to obtain aparticular representation of a general proof system.

Recall from Definition 9.3.11 the formula 0-RFN(P)(x)

Vy(IYI < IxI)Vz(JzI _< IxI), P(y, z) A Fla(z) -- Taut(z)

where P(u, v) and Fla(v) are Ob-formulas defining the relations "u is a P-proofof v" and "v is a propositional formula," respectively.

Denote by II O-RFN(P) II the sequence of the propositional formulas

I10-RFN(P)II := {I10-RFN(P)(x)Ilq(n)Il n < w}

where q(x) is a fixed bounding polynomial of formula O-RFN(P)(x).

Theorem 14.1.2. Let P be a propositional proof system. Let

EF + II0-RFN(P)II

be the proofsystem obtained from EF by adding tautologies from II O-RFN(P) I I as

extra axioms.Then:

EF + I I0-RFN(P) I I ?p P

14.1 Finitistic consistency statements and optimal proof systems 301

In particular:

EF + I I0-RFN(P) I I ? P

Proof Let Jr be a P-proof of i. By Lemmas 9.3.13 and 4.6.3 there is a polynomialsize EF-proof r11 of

II P(u, v)IIm(fr, t) A II Fla(v)IIm(t)

where m = max(IjrI, Ir1) (we skip the explicit reference to particular boundingpolynomials, similarly as in Section 9.3). From this formula (and ql) and the newaxiom

I IO-RFN(P)I Im

we get by substitution a polynomial size (EF + I10-RFN(P)I I)-proof rig of

II Taut(v) I Im (t)

There is a polynomial size EF-proof 173 of the implication (analogously to Lemma9.3.15)

I I Taut(v) I Im (t) -* r

From 172 and 113 one obtains by modus ponens a polynomial size (EF +110-RFN(P)I I)-proof n4 of T.

Note that r7l and r13 are actually constructible by a polynomial time algorithm,and so this gives a p-simulation of P by (EF + I 10-RFN(P)11). Q.E.D.

For P sufficiently strong (simulating the modus ponens and the substitutionof constants in polynomial size, e.g., P D EF) we could replace the reflectionprinciple 1IO-RFN(P)II by a bit more elegant consistency statement Con(P) (cf.Lemma 9.3.12).

Note that a natural P (like the systems SF, G;, G, is, in fact, p-equivalent to EF +I I O-RFN(P)1 I. This is because such a P admits a polynomialtime construction of proofs of the formulas I I Con(P) I I' (cf. Theorem 9.3.24).

Now we link the problem of the existence of an optimal proof system to twoquestions, one from logic and one from structural complexity theory. The logicalquestion deals with the lengths of first order proofs of the so-called fmitisticconsistency statements. Let T be a consistent theory extending S2 and with apolynomial time set of axioms. Then there is a A -formula PrfT(y, z) expressingthat "y is a T-proof of formula z " Consider a formula Cony (a) naturally expressingthat no T-proof of length < a is a proof of 0 = 1:

Vy, IYI a -+ Prfp (y, f0 = 11)

Note that ConT (a) is not a bounded formula.

It is a fundamental problem to estimate the length of the shortest proof of thesentence ConT(n) in a theory S; see Pudlak (1986, 1987) for more background.The length of the numeral n is O(logn); hence the only a priori lower boundto such proofs is the length of the formula: Q (log n). The next theorem sharplyestimates the length of the shortest S-proofs in the case when S = T.

Theorem 14.1.3 (Pudlak 1986, 1987). Let T 2 SZ be a consistent theory with apolynomial time set of axioms and let ConT (a) be the formula defined previously.

Then there are constants E > 0 and c > 1 such that for all n the minimal sizem n of a T proof of the sentence Con T (n) satisfies

nE < mn < nc

Note that (ConT (n) I << nE and hence the lower bound is nontrivial. The upperbound is also nontrivial. To see this take, for example, S = SS and T = ZFC.There does not seem to be another way to prove ConT (n) in S than to list (in S)all T-proofs of length < n and check that none of them is a proof of 0 = 1. Thisgives, however, only the estimate 20(n)

The question whether there is S admitting size n 0(1) proofs of ConT (n) for allT can be linked to the problem posed earlier.

Theorem 14.1.4. The following two propositions are equivalent:1. there exists an optimal propositional proof system2. there exists a consistent theory S 2 SZ with a polynomial-time set of axioms

such that for every consistent theory T 2 SZ with a polynomial-time set ofaxioms there is polynomial p(x) such thatfor each n the sentence ConT (n)has S proof of size < p(n).

Proof. Assume that P is an optimal proof system. Define the theory Sp by

Sp := Sz + 0-RFN(P)

Now let T 2 SZ be a consistent theory with a polynomial-time set of axioms.Define the formula 0(x)

O(x) := ConT(IxI)

Then 0(x) is a TIb-formula. Consider a proof system Q

Q:=P+111011mIm<w}

The formulas 1 1011' are tautologies as T is consistent.Because P is optimal, there is a polynomial q(x) such that each 110111 has

P-proof of size < q(m). Hence the theory Sp admits proofs of

Taut(Ii-011m)

14.1 Finitistic consistency statements and optimal proof systems 303

of size m°M and, similarly with the proof of Theorem 14.1.2, also size m°(1)proofs of

Ixl<m -* O(x)

Hence 0(n) has Sp-proof of size log(n)0(1), and consequently ConT(N), equiv-alent to O(2"), has Sp-proof of size N°O). This proves that the first statementimplies the second.

Now let S be a theory satisfying the second statement. Define the propositionalproof system Ps by

Ps(r, r) iffPrfs (n, [Taut(f)1)

Let Q be an arbitrary propositional proof system. By Theorem 14.1.2 Q is p-simulated by the system EF + I Io-RFN(Q)I I. As S Q S2, Ps p-simulates EF (byCorollary 9.3.18). It is thus sufficient to construct polynomial size Ps-proofs forthe tautologies

I I0-RFN(Q) I l m

Consider the theory TQ

TQ := Si + O-RFN(Q)

By the hypothesis there are polynomial size S-proofs of

ConTQ (h)

Assume that rr is the size in = 17r I Q-proof of r. As - Taut E Eb, there is k < wsuch that the implication

-Taut(f) --+ 3y(lyI Mk n Prfs2 (y, [,Taut[f1])

is provable in SS and hence also in S: This is an instance of a general statementproved by induction on the logical complexity of q5 (x).

Claim. Let (x) be a Eb formula. Then there is k < w such that for every u < co

N 10(u) - SS Hjulk -O(u)

where F,,, is an abbreviation for 'provable by a proof of size < in.In fact, this itself is formalizable in S21. That is, SZ proves

0(u) -). (3y(lyl l ulk) Prfs2 (y, FOOD

where Pijsi is a Ob formalization of "y is an S2 proof of 0 (u)."

For the same reason there is a size < mk TQ-proof of Q(fr, f), and by theaxioms 110-RFN(Q)I I TQ also admits size < mk proofs of

Taut(?)

and hence there are size m 0(1) S-proofs of

3z, Izl < In lk A PrfTQ (z, (Taut[?]l )

This formula and the last but one entails that there is a size mo(i) S-proof of

Taut(?) -p -'ConTQ(me)

some fixed f < w.By the hypothesis there is a constant t < w such that all ConTQ(n) have size

< n' S-proofs; hence there are size < met S-proofs of Taut(f).By the definition of PS this proof is also a PS-proof of r of size < met < 17r let.

Q.E.D.

One may speculate about a construction of a theory T for which given S doesnot admit size n0(i) proofs of ConT(n). Possible candidates are T := S + Consor a theory formed from S by adding to the language a truth predicate for formulasin the language of S, Tarski's conditions on this predicate and the statement (usingthe new predicate) that all axioms of S are true (such theory is called "jump" of Sin Buss 1986). However, if for S one can find T without short S-proofs of ConT (h)it follows that S does not prove that NP = coNP. This is because the formula 0considered in the preceding proof is IIb and NP = coNP would allow us to expressit also as a Eb -formula and so its instances (and consequently the instances ofConT (x)) would have polynomial size proofs by the claim in the previous proof.

The next theorem links the problem of the existence of an optimal proof systemto a problem in structural complexity theory.

Theorem 14.1.5. The following two propositions are equivalent:1. there exists an optimal propositional proof system2. for every coNP-set X there exists a nondeterministic Turing machine M

accepting exactly X and such that for everypolynomial-time, sparse Y C_ Xthere is a polynomial p(x) such that every u E Y is accepted by M in time

< P(hil).

We refer the reader to Krajicek and Pudlak (1989a) for the proof.

14.2. Hard tautologiesThe first definition formalizes a notion of hard tautologies. Recall the definitionof the function cp(r) in Definition 14.1.1.

14.2 Hard tautologies 305

Definition 14.2.1. A sequence {Tn }n <w of tautologies is hard for a propositionalproof system P ifthe. following three conditions are fulfilled:

1. there exists a polynomial time machine computing from I(') the, formulaTn

2. n < I rn l..for all n3. there is no polynomial p(x) for which

cp(r) < P(ITnI)

would hold, for all n

Note that ITnI = nOM. Conditions 1 and 2 imply that set {rn I n < (o} ispolynomial-time recognizable and so we may add it to P as extra axioms to forma new proof system Q := P + {Tn I n < (o}. Adding extra axioms to a generalproof system precisely means that n is a Q-proof of T if it is a P-proof of a -* T,where or is a conjunction of substitution instances of new axioms. P is then notbetter than Q (in the sense of Definition 14.1.1). Hence the task to construct a hardsequence { r }n « for P is the same as the task to find proof system Q such thatP Q and its axiomatization over P by a polynomial-time set of tautologies.

Assume P > EF. Having Q for which P Q we may take the proof systemQ' := EF + I10-RFN(Q) 11. By Theorem 14.1.2 Q' > Q; hence P Q' and thesequence (II0-RFN(Q)IIn}n,, is hard for P. This gives the following simple butuseful statement.

Theorem 14.2.2. Let P be aproofsystem and assume that P > EF. The, followingthree statements are equivalent:

1. there exists a sequence of tautologies {r n }n < hard. for P2. there exists a proof system Q such that P is not better than Q : P Q

3. there exists a p r o o f system Q such that the sequence {II O RFN(Q) I I n }n <,,

is hard for P

The quasi-ordering > (Definition 14.1.1) ofproof systems induces a reducibilityamong sequences { r, ), <, , {or, }, < w over a given system P:

{Tn}n<m >p {Qn}n<a, if P + [-in I n < w} >p P + {an I n < co}

that is, formulas an can be deduced by polynomial size P-proofs from substitutioninstances of some Tin's.

For example, by Lemma 12.5.6 and Theorem 12.5.7

{Countn}n >p {PHPn}n

but not the converse for P being the depth d Frege system (d > 5).The question whether there exists a > p-complete sequence (i.e., a maximal

element of the quasi-order >p) is obviously equivalent with the main problem

of Section 14.1 whether there is an optimal proof system. If one restricts to thesequences with polynomial size proofs in some system Q, then among those therewill be a >p-complete one, provided Q has a polynomial-time axiomatizationover P.

We are interested in natural sequences {rn }n<,,, that would be hard for systemsfrom Chapter 4, such as F or EF. We would like to find a Do(a)-formula ¢(x, a),expressing a natural combinatorial principle for finite structures, such that thesequence

{Wn}n<m

is hard for F or EF (recall Definition 9.1.1 for translation (0)n). For example,the formulas 0 representing the pigeonhole principle or the modular countingprinciples of Section 12.5 give rise to tautologies hard for constant-depth Fregesystems but not for F (by Theorems 12.5.3 and 12.5.4 and Corollary 13.1.8).Similarly the Ramsey theorem from Section 12.1 is not hard for F.

In fact, if `dxVa, 0 (x, a) represents a scaled down 111-form of a combinatorialprinciple whose usual proof depends on the existence of large numbers (like variousmodifications of Ramsey theorem), then it is actually likely to have polynomial sizeF-proofs. This is because the III -version takes the large number as a parameter;hence the length of the tautologies is also large.

Take a coNP set defined by a combinatorial property, for example, the set ofgraphs without a large clique where large means of size greater than nE, n thenumber of vertices and E > 0 fixed. For fixed n let p;/ be (n) atoms for thepossible edges in a graph with n vertices and q,,,,, 1 < u < mnEl, I < v < nanother n 1+E atoms. Consider the formula A (p, q )

AV qj,v A A A(-'g1I u v ,quzu)u V U10112 V

V V (q«, v, A qu2V2 A 'PV, V2)VI V2 11154112

expressing that if J (u, v) l q,,, = ])defines a 1-1 map then its range is not a cliquein the graph {(i, j) I pig = 1}. Note that any coNP-property can be representedin a similar fashion as it can be defined by a 11 -formula (use the translations ofSections 9.1-2).

For a particular graph H of size n let AH(q) denote the instance of An (p, q )with pig replaced by 1 if the edge (i, j) is in H. Then AH is a tautology if Hhas no large clique, and such tautologies cannot have polynomial size proofs inany proof system unless NP = coNP (cf. Theorem 4.1.2). Our aim is, given aparticular proof system P, to find a sequence of graphs H, without large cliquessuch that the sequence { A Hn }n < ,,, is hard for P; in particular, the graphs Hn willbe constructible by a p-time function.

Theorem 14.2.3. Let P > E F be a proof system that is not optimal. Then thereexist graphs H such that the formulas (A HH }nom u, form a sequence of tautologieshard for P.

In particular, there is a p-time function f constructing Hn from 1(n) and theset {Hn I n < (o} is itself p-time.

Proof Let X denote the set of graphs without a large clique and 0 (x) be its IIb-definition in S. The theory SS obviously defines the usual reduction of TAUTto X (cf. Garey and Johnson 1979); that is, there is a p-time function g (Ab-definable in SS) such that SZ proves

Taut(a) - O(g(a))

Take a proof system Q such that P is not better than EF +IIO-RFN(Q)11; we mayassume the existence of such Q by the hypothesis of the theorem and by Theorem14.2.2.

The function f defined by

f(1(n)) .-g(IIO-RFN(Q)Iin)

has the required properties. l1 Q.E.D.

We remark that if P = EF and, for example, EF G i then Hn can be found that,in addition, the formulas AHn have polynomial size Gi-proofs. This is because Giadmits polynomial size proofs of the instances of 110-RFN(Gi)11, by Theorems9.3.16 and 9.2.5.

14.3. Bibliographical and other remarksThe question whether there is an optimal proof system was studied in Krajidek andPudlak (I 989a) and the statements of Section 14.1 are from there, except Theorem14.1.3, which is due to Pudlak (1986, 1987).

Pitassi and Urquhart (1992) show how to translate non-three-colorable graphsinto tautologies, and vice versa, such that the minimal size of an EF-proof of s ispolynomially related with the minimal number of steps needed in a constructionof the associated graph Gi using the rules of the Hajos calculus. A corollaryto Theorem 14.1.2 is that any nondeterministic acceptor of non-three-colorablegraphs can be p-simulated by the Hajos calculus augmented by a p-time set ofextra initial graphs (associated to the formulas from II O-RFN(Q) 11, for suitable Q).

15

Strength of bounded arithmetic

The previous chapters dealt mostly with the metamathematical properties of thesystems of bounded arithmetic and of the propositional proof systems. We studiedthe provability and the definability in these systems and their various relations.The reader has by now perhaps some feeling for the strength of the systems. In thischapter we shall consider the provability of several combinatorial facts in boundedarithmetic.

In the first section we study the counting functions for predicates in PH, thebounded PHP, the approximate counting, and the provability of the infinitude ofprimes. In the second section we demonstrate that a lower bound on the size ofconstant-depth circuits can be meaningfully formalized and proved in boundedarithmetic. The last, third section studies some questions related to the main prob-lem whether there is a model of S2 in which the polynomial-time hierarchy doesnot collapse.

15.1. CountingA crucial property that allows a theory to prove a lot of elementary combinatorialfacts is counting. In the context of bounded arithmetic this would require havingE,-definitions of the counting functions for E00-predicates.

The uniform counting is not available.

Theorem 15.1.1. There is no Er(a)-formula 9(a, a) that would define for eachset a and each n < (o the parity of the set {x < n I a(x)}.

Proof. Assume 9(a, a) defines the parity of the sets. Then the propositional for-mula (9),,, provided by Definition 9.1.1, is of size < 2( log

n)o(''<< 2"Q`" and of

constant depth (Lemma 9.1.2) and computes the parity of {i < n I p; = 1) forpi := (a);. This contradicts Theorem 3.1.10. Q.E.D.

308

15.1 Counting 309

We may still hope for a nonuniform counting. That is, the definitions of thecounting functions would not be instances of one Eb(a)-formula, but each

70definable set would require a special E00-definition of its counting function.As mentioned at the end of Section 2.2, Toda (1989) puts a strong block to such

a hope. Recall the definitions of the function #R and of the class #P from the endof Section 2.2 and define the function ®R by

®R(x) := the parity of the set {y I R(x, y)}

and let ® P be the class of the functions ®R with R E P satisfying the conditionR(x, y) -* lyl < Ixl°('). Note that the counting function for a predicate A(x) isthe function #R for R(x, y) := (A(x) A y < x).

We state Toda' s result explicitly; recall that P denotes the class of the func-tions computable by a polynomial time machine with an oracle from PH.

Theorem 15.1.2 (Toda 1989). Assume that PH does not collapse.Then

and even ®P54 P

A form of the counting that is available in S2 is provided by the next simple butuseful lemma.

Lemma 15.1.3. Let g(x) be a junction with a Eb0

-graph and majorized by 2X `""Then there is a. function f (a, b), E ,-definable in S2, such that S2 proves

f(a,0)=0AVi <lal, f(a,i+l)= f(a,i)+g(i)

In particular, if A is a E00-definable set then there is a junction h(a, i),definable in S2, such that S2 proves h (a, 0) = 0 and that for all i < I a

I h(a, i) +I if A(i)h(a,i+1)=h(a, i)

i < Ial the length Ig(i)I is bounded by i°M = lal°WWW. Hence thesequence

( 0, g(0), g(0) + g(1)..... g(0) + ... + g(lal - 1) )

consisting of the values of f (a , i) for i < la l is coded by a number < 21° 1 °(') andbounded induction can be applied to prove the existence of such a sequence forall a.

The second part follows, applying the first part to a particular g, the character-istic function of A. Q.E.D.

Note that the lemma implies that all E,,-consequences of S2(a)c0 t (cf. Sec-tion 13.1) are provable in the theory SZ + 1-Exp of Section 5.5.

310 Strength of bounded arithmetic

Although the counting is not available within PH, an approximate counting isavailable.

Theorem 15.1.4 (Sipser 1983b). Let A be a Eb00 definable set and let e > 0 befixed. Then there is a. function f (x) with a Eb00-graph such that for any a

IA nal < f(a) < (1 +E) IA flat

Proof. Fix a of the form a = 2n and identify a with the n-dimensional vectorspace over F2. Denote Aa := A fl a and assume that2t-' < l Aa l < 2t. Take U tobe the set of (1 + t) x n, 0-1 matrices.

Claim 1. For any x E Aa there are at least > I U I /2 = 2t +n matrices M E U. forwhich

VyEAa\{x}, Mx0My

For any y x, mx = my holds for 2n-' vectors m; hence Mx = My holdsfor I UI/2t+' = 2n matrices M (as they have t + 1 rows). Hence the number ofmatrices satisfying the formula from the claim is at least

IUI-2U1,(IAal-1)> Izl =2t+n

using 2- I A 1/2 and such that

/xEBVyEAa\B,Mx0My

The claim follows as by Claim 1 there are at least > (I U I /2) 1 Aa I pairs (M, X) EU x Aa satisfying Claim 1, whereas the failure of Claim 2 would imply that thereare < IUIIAaI/2 such pairs.

Iterating Claim 2 < log(IA,, 1) = t-times yields

Claim 3. There are matrices MI, ..., Mt E U and sets Bp = 0 C BI C ... CBt = Aa such that

Bi+1 =BiU{xEAa\Bi IVyEAa\Bi,x y - Mi+lx Mi+1y}

Define a map F : a = 2" 2('+t)t < 2I Aal log(IAal) by

F(x) :_ (Mix..... Mtx)

The map F is 1-1 on the set Aa. To make F E!' -definable take such F for theminimal t < n = Ia l and for the lexicographi All ..... M, E U forwhich it is 1-1 on Aa.

15.1 Counting 311

Take k minimal such that

(2kn) < (1 + E)k

Clearly k < 0(n) = O(IaI) and so we may define a set A' of the sequences ofelements of Aa of length k, identified with elements of ak. Hence I A' I = I Aa I k.

Applying Claim 3 to the set A' we get a map F' : A' -* 2IA'I log(I A'j) that is1-1 on A'. By the choice of k we have

(2IA'I log(IA'I))k I = (2IAaIkk log(IAaI))k ' < IAaI(2kn)k < (1 +E)IAaI

This allows us to define a function f (a) to be the kth-root of the supremum ofthe range of F. Then

IAaI < f(a) < (1 +E)IA,I

as required, and f is E,,-definable. Q.E.D.

The inability to count in bounded arithmetic can sometimes be replaced bythe bounded PHP, which was pioneered by Woods (1981). To ensure that thepresentation is unambiguous we define the principle once again.

Definition 15.1.5. The Eb pHPQ is the set of axioms of the form00

3x<bb'y<a,-'9(x,y) V 3xi <x2<b3y<a,8(xi,y)n8(x2,y)

where 8 E Eb and may contain other parameters (different from a, b).DoPHPQ is the same scheme but, for Ao formulas 9 only.EbPHP and OoPHP denote the scheme for b = a + 1.

Note that we formulate the pigeonhole principle as a statement about injectivefunctions rather than about bijections as earlier.

We shall state explicitly an important open problem.

Problem. Does the theory S2 prove Does the theory I Do proveDoPHP?

By (the proof of the) Theorem 11.2.4 the theory S2 proves E0PHPa for b =a2, 2a; this is open for the Do-case. On the other hand, S2(a) does not provePHPQ+1(a) (Corollary 12.5.5).

Lemma 15.1.6. Assume that S2 is not finitely axiomatizable.Then no theory S2 proves all formulas from EbPHP

Proof. Assume that 8(x) is a formula for which the induction fails.

0(0) A (dx < a, 0 (x) -> 0 (x + 1)) A -'9(a)

Define a 1-1 function f: a + 1 -> a by

f(x) _-x ifO(x)

jlx-1 if-.9(x)

Hence proves S2. That is, S2 I- would imply SS = S2 and thelemma follows by Theorem 10.1.1. Q.E.D.

An important case when the counting (or the existence of exponentially largenumbers) could be replaced by EbPHP is the proof of the existence of infinitelymany primes. The usual proof of the existence of a prime bigger than a requiresus to take a! + 1, which is not definable in S2 (by Theorem 5.1.4).

Theorem 15.1.7 (Woods 1981, Paris et al. 1988). The theory S2 proves that thereexist unboundedly many primes

b'x3p > x, Prime(p)

where Prime(p) is a IIb-definition ofprimes

In fact, the theory S2 proves Sylvester's theorem

Vx > lb'y > x3z, y < z < y+x A (3p < z, Prime(p) A p > x A pIz)

where pIz denotes the divisibility

The reader may find the proof of the first part in Paris et al. (1988) and of thesecond part in Woods (1981).

It is an open problem whether the theory I Do can prove the infinitude of primes.It is also unknown which of the basic elementary number-theoretic theorems, suchas Fermat little theorem or Lagrange's four-square theorem, can be proved in IDo;see Berarducci-Intrigila (1991). Woods (1981) conjectured that the theory I Do (Jr )can prove the infinitude of primes, where the formulas ir(0) := 0 and

Yr(n +n(n)+1 ifn+1isaprime

1)=7r(n) otherwise

are added to PA-.

15.2. A circuit lower boundIn this Section we show that a weaker form of Theorem 3.1.10 can be proved in asubtheory of S2. Recall from Section 3.1 the relevant definitions.

15.2 A circuit lower bound 313

We shall adopt the following framework. The input 0-1 vectors are coded bynumbers a, b, ..., hence the number of the input bits is a length n = Ia I..... Theset of the input vectors of a given length is thus definable, but it is not coded by anumber.

The circuits are also coded by numbers. This implicitly indicates that we cannotdefine circuits of a superpolynomial size as numbers of length

21"'are not defin-

able (Theorem 5.1.4). Clearly the size and the depth of a circuit are A b -defin-able, as is the function computed by a circuit (the circuit is a parameter in thedefinition).

Boolean functions are identified with subsets of 2" = a and thus are not codedby numbers. But all interesting functions, in particular all characteristic functionsof the sets from PH, are definable by a bounded formula (by Ab- resp. Eb- in thecase of P-resp. NP-sets; cf. Theorem 3.2.12).

This framework does not allow us to prove (even to formulate) Theorem 3.1.7or to prove an exponential lower bound. It allows us, however, to rephrase suchbounds as the nonexistence of polynomial upper bounds. By Theorem 3.1.4 theexistence of polynomial size circuits for NP-functions is the chief problem andthat is perfectly formalizable in the framework.

We begin by proving (a weaker form of) Theorem 3.1.10. The notions from thefollowing definition occurred in a slightly different form in Section 12.3 and inthe discussion at the beginning of Section 12.4.

Definition 15.2.1. Let x , ... , x be Boolean variables and let M be the set ofpartial truth assignments to xi , ... , x,,.

1. A complete system is a subset S C M such that every total truth assignmentextends exactly one a E S.

2. For H, S C M, S refines H if

`daeS,(VheH,h1a)v(2hEH,hca)

where h 1 a means that h, a are contradictory.3. For H C M, the norm of H is IIHII := maxhEH I dom(h)I.

For a E M denote by [a] the set of the tuples a for which ai = a(xi),if xi E dom(a). Then UaEs[a] partitions 2" if S is a complete system. If f is aBoolean function such that for some complete system S and a partition So U S1 = Sof S it holds that

f (i) = U [a], i = 0, 1acs,

then both f and -f are expressible in a disjunctive normal f o r m whose conjunc-tions are of size < 1 1S1 1 . Assume f = UU f and

fu 1(1) = U [a]a E S11`

where JI S" 11 < t for all u. To express f and -f as a disjunction of conjunctionsof size < s it is sufficient to find a complete system S refining H := U,, Su andsatisfying 11 S11 < s as then

/- (1) = U [0lPEST

where

S1 :={PESI3u&tESu, aC0)

Hence the following lemma substitutes the switching Lemma 3.1.11 in this ap-proach to collapsing circuits.

For XCMand pEM

XP :={a\pIaUpeM, aEX}

In the next proof we follow the strategy of the proof of Lemma 12.3.10.

Lemma 15.2.2. Let HQ C_ M be sets for which I) He II < t < s in.

Assume 2m < (n' -E - 1/8t)', some fixed E > 0.Then for sufficiently large n there is p E .M for which Idom (p) I = n - nE and

such that

3 a complete St C .M p, 11 Se 11 < s and SQ refines He

holds for all £.

Proof. First fix f and denote Ht simply by H. Also fix p E M.Let H = h' , ... , h'' be a fixed listing of H. Construct the following two-player

game.

Step 1: Player I picks h I := the first hi consistent with p; player II picksassignment al on dom(h 1) \ dom(p).Step i : Player I picks hi := the first hJ consistent with p U a1 U ... U ai _ 1;player II picks ai an assignment on dom(hi) \ (dom(p) U Uj< i dom(aj)).

The game ends when either all hi's are exhausted or hi g p U U<i aj.The following claim is analogous to the claim in the proof of Lemma 12.3.10.

Claim. The set of all

ai U...Uak

in a finished game forms a complete system refining H.

We would like to show that for some p, in no game can II assign > s values.A marked value sequence is v E {0, 11' together with jo = 0 < j, < <

jk = s. We think of a marked value sequence as of a coding that II played ji - ji - i

values vJ,_i+j, ... , v1, when facing hi.

15.2 A circuit lower bound 315

The history of the game (specifying it uniquely) is the list of h j, ... , h k that IIfaced plus the sequence D = (D1, ... , Dk) s.t. D, is a ji - j; _ i -element subsetof t, specifying the positions of the atoms in hi that II had to evaluate.

Notice that because by definition h; is the first hi not given 0 by p U U,< r ai,instead of knowing the history of the game it is enough to know the (unique)assignment r with the domain dom(p) U U,<k dom(Q;) that extends p and s.t.

pUQ1 U(r dom(Q;))

makes hi true, all i < k. Then from r, using the marked value sequence v and D,it is easy to reconstruct all h 1 , ai, ... , hk, 6k, and p.

Thus the triple v, D, r determines p and hence a map

FH(p) := (v, D, r)

where v, D are associated to a fixed game in which II assigns > s values is a 1-1map.

Assume that I dom(p) I = k. Then there are ()2" such p's and at most ( k+s)2k+spartial assignments r. Clearly there are < 4s marked value sequences and < issequences D. Hence FH is a 1-1 map from A :_ (k)2k into B := (k+s)2k+s(4t).`An elementary computation shows

A > (8t)-s k + I > (8t)-s k + I s >

(k

s

B n-k-s+1s

n-k+1 8t(n-k)Choosing k = n - nE we get

A n1-E-1B - ( 8t )

that is 2"o"' for t < s < n6 such that c + 8 < 1, and is > no(s), for t a fixedconstant.

Thus the map FH violates the weak PHP in both cases.Now assume that for every p, Idom(p) I = k, there is HE for which player II

assigns in a game on HQ at least s values. Define

F(p) := (f, FHe (p))

Then F is a 1-1 map from A into B m and by the hypothesis of the lemma(A/Bm) > 2. Hence F violates the weak PHP and there must be p for which it isundefined, that is, for which player II assigns < s values in any game on any Hf.

Q.E.D.

Now we use the lemma to formalize a lower bound to the constant-depth circuitsin the theory PV I + WPHP(PV 1); recall PV I from Section 5.3 and WPHP(PV 1)from Section 7.3.

Theorem 15.2.3. Let d be a constant and p(x) a polynomial. Then the theoryPV1 + WPHP(PV,) proves that, for sufficiently large n there are no depth d size< p(n) circuits computing the parity function ®(xi, ... , x").

Proof Work in a nonstandard model M of PV1 + WPHP(PV i) and let n ELog(M) \ w. Let C E M be a depth d size < p(n) circuit in xl , ... , x, .

As in the standard proofofTheorem 3.1.10 we show that for some p, I dom(p) I >n - nEd,

the function computed by the restricted circuit Cp has an associated com-plete system of the norm < s << nEd. Such a p is obtained by iterating Lemma15.2.2 < d-times (as outlined before that lemma). Hence we only need to showthat the lemma holds in M with m = p(n).

We shall take t = s a large (w.r.t. d and p(x)) but fixed constant. Then thez -definable subsets of M of norm < s are coded by numbers < t (C), tan L-term. Elements of M are also coded, and there are only finitely many (< (0)marked value sequences and tuples D; specifically, a quantification over them issharply bounded. Taking in the definition of FH (p) the first v, D (r is unique) forwhich it is defined makes FH Ab-definable.

As we cannot count in M the domain and the range of F, we cannot follow theproof of Lemma 15.2.2 in demonstrating that F violates WPHP. Instead we shallcertify such a violation by constructing a zb-definable 1-1 map G of two disjointcopies of (a set containing) Rng(F) into Dom(F). Dom(F) is supposed to be theset of all p E M, Idom(p) I = k = n - nE. The composed function F o G violatesthe WPHP.

Assume we have a Ob-definable 1-1 map Go of 2 (8t)S m copies of [n]k+,into [n]k. Construct the map G (we use the notation from the proof of Lemma15.2.2) as follows

1. identify the elements of the form (1, (v, D, r)) with m (4t)S copies of r2. identify the set of r E M, IT I = k + s, with 2' copies of the set of

pairs (X, rt), where X E [n]k+' and ri E M with 117I = k (thinking ofX = dom(r) and ri the restriction of r to the first k elements of X, ands-tuples of the values of r \ q determining one of the 2S copies)

3. map the pair (X, ri) to p, where dom(p) = G0(X) and such that the valuesof p are given by n

Map G is 1-1 and clearly Ab-definable. It remains to construct Go, which issimple and left as an exercise. Q.E.D.

15.3. Polynomial hierarchy in models of bounded arithmeticIn this section we study the complexity classes within the models of boundedarithmetic systems and we show some connections among various open problemsmentioned earlier in the book.

We begin with a definition; see also Section 7.6.

15.3 Polynomial hierarchy in models of bounded arithmetic 317

Definition 15.3.1. Let M be a model of PV.1. P(M), the class of the polynomial-time subsets of M, is the class of the

subsets of M definable by an open PV-formula with parameters from M.E f (M), the class of the E f -subsets of M, is the class of the subsets of Mdefinable by a Eb formula with parameters from M.IIp(M), the class of the rlf-subsets of M, is the class of the subsets of Mdefinable by a 11b -formula with parameters from M.The classes EP(M) and rI p(M) are also denoted NP(M) and coNP(M).PH(M) is the class of the subsets of M definable by a with

parameters from M.2. The classes Pstan(M), (EP)S1Qn(M), (IIP)stan(M), and PHstan(M) are

defined analogously to the classes in 1 but allowing no parameters.3. The classes (P/poly)(M), (E f lpoly)(M), (IIp/poly)(M), and

(PH/poly)(M) are defined analogously to the classes in 1 but allowingdifferent parameters for every length n E Log(M) of elements of M.

For example, X E (P/poly)(M) if there is an open PV-formula 0(x, y) suchthat for every n e Log(M) there is cn E M such that for any a e M, Ia I = n

aEX iff

The following is one of the main open problems.

Fundamental problem. Is there a model M S2 such that PH(M) does notcollapse? That is: such that PH(M) E f (M), all i < w?

The problem is equivalent, by Corollary 10.2.6, to the fundamental problemwhether S2 is finitely axiomatizable. A little is known about the problem, but thereare some surprising connections with the bounded PHP. We first prove two resultsabout end-extensions. K D M is an end-extension of M if M is a cut in K. K is aproper end-extension of M if, moreover, K \ M 0 0.

Lemma 15.3.2. Let M, K be two models of I AO and assume that K is a properend-extension of M. Then M BE 01

Proof. Recall the collection scheme B from Definition 5.2.11. Let 9(x, y, z) bea 0o-formula with parameters from M and assume

M = `dx < a3y3z6(x, y, z)

As M is a cut in K, Lemma 5.1.3 implies that for all b E K \ M

K l=Vx <a3y,z<b6(x,y,z)

K satisfies the least number principle for Ao-formulas; hence there is the minimalb with this property and clearly b E M. Then (by Lemma 5.1.3 again)

M = Vx < a3y < b3z9(x, y, z)

Q.E.D.

The following lemma constructs a particular end-extension.

Lemma 15.3.3. Let M be a model of I AO and a, t E M \ co such that there isb E M, M = b = at. Assume that for some R(x, Y) E AO (M)

M PHP(R)az

That is, the function F with graph R violates the weak PHP.Then there is an end-extension K of M, a model of I Do, such that, for some

z

Proof. Assume a'2 0 M as otherwise we could take K := M. Similarly, as in theproof of Theorem 11.2.3, we may find a function

H:a x2' --* a

with a 0o(M)-graph such that for any G : 2' -* a, G E D0(M), there is g < asuch that

Mk-- dx<2t,H(g,x)=G(x)Thinking about g < a as about a "number"

k:= H(g, i) a'i<2'

we may D0-define in M the structure MO with the universe {k I g < a) anddefine the operations +, and relations =, < on it, simulating the constructionfrom Lemma 5.5.4.

In M0 there is a largest element. We take a cut I C_ M such that t2 E I but2' 0 I, closed under addition, and then define K to be the substructure of Mo with

E K iffmax(i I H(g, i) ¢ 0) E I.The model M itself embeds in K as a cut, and a'2 E K is the element g with

H(g, i) --0 otherwise

Q.E.D.Q.E.D.

We are now ready to prove a theorem with an interesting proof.

Theorem 15.3.4 (Paris et al. 1988). Assume that I00 does not prove DoPHPa2.Then I AO is not finitely axiomatizable.

Proof. Assume that IAo does not prove AoPHPa2. By a compactness argumentit follows from Theorem 11.2.3 that there is a model M = I AO and a, t E M \ wsuch that

1. at exists in M2. akt, k = 1, 2, ... , are cofinal in M3. M H -PHP(R)a2, some R E A0(M).

By 2 ate is not in M and so Lemma 15.3.3 implies that M has a proper end-extension and hence satisfies by Lemma 15.3.2 the collection scheme BE°.

Now assume that IAo is finitely axiomatizable. We use this to construct a modelM satisfying 1-3 but not satisfying BE?, which will be a contradiction.

From the finite axiomatizability assumption it follows that in any model of 1 AOthere are finitely many Skolem functions fi, ... , fr, A0-definable in M, such thatany substructure closed under them is a model of IAo too.

Take some Mo satisfying 1-3 and define a function Z : N -* Mo by(i) Z(O) :_ (a, t, at, b),

where b are all parameters in the definition of R(ii) Z((i, u)) := f (Z (u)), for i = 1, . . . , r(iii) Z((r + 1, u)) := Z(ul) + Z(u2), for u = (ui, u2)(iv) Z((r + 2, u)) := Z(ui) Z(u2), for u = (ul, u2)(v) Z((s, u)) := 0, for s > r + 2Let M C M0 be the subset of M0 defined by

cEM if c = Z(k), some k E N

Then M is, in fact, a substructure of Mo (by (iii) and (iv)), is a model of IAo (as itis by (ii) closed under the Skolem functions), contains a, t, a' (by (i)), and henceR is A0-definable in M too and therefore violates PHP(R)a2. But also a'2 V M.

To see that BE? fails in M note that the function Z has a E?-definition (commonfor M0 and M) and that

M Vx < a3k, y, (Z(k) = x A y = akt)

But by 2 there is no bound in M one could put on y. Q.E.D.

The idea behind the proof of Lemma 15.3.3 can be used to prove anotherinteresting result. Recall Definition 3.2.1 of the classes E, . The symbols Es tan and

tan indicate that again parameters are not allowed.

Theorem 15.3.5 (Paris and Wilkie 1985). Let M be a model of 'I Ao, a, t E M \ wsuch that at exists in M and assume that R E AO(M) violates PHP(R)a2.

Then the hierarchy Aoa"(M) does not collapse. That is, there is no i such that

Asstan(M) = Etan(M)

Proof. Denote by E' and A' the classes of E;- respectively of Ao-formulas withparameters among u.

Without loss of generality we may assume that all parameters in the definitionof R are smaller than at . As in the previous propositions let G : a' -+ a bean injective Do-definable map; we may assume that it is A0 'a-definable as theminimal parameters in the definition of R that yield a 1-1 map G are definablefrom at, a.

Claim 1. For any j there is a O0',a formula Try (x, y) such that for any Ei 'aformula 0 (x) there is n < w such that

Vx<a',0(x)= Trj (x,n)

Any 0 is for x < a' equivalent to a formula of the form

3yil ,...,yi <atb'Y2,.... y2 <a'...Q11 < a' 00(x, T, at, a)

where 0o is an open formula with atomic subformulas of the form tj = t2, tt + t2 =t3, or t1 t2 = t3 with ti, t2, t3 E 10, 1, at, a, x, y}. We shall simply assume that 0is of this form.

Let q (x, y, u) be a Ao ,a-formula, a partial truth definition, such that for everyformula 0 of the preceding form there is f < co such that for x, y < at

00 (x, Y, at, a)=q (x,Y,f)

Such a formula is constructed easily.Using the map G we may rewrite 0(x) in the form

<a'Az1 cRng(G),

V(zz, ... , zz) < at , z2 c Rng(G) (x, z, a', a)

where tJr is a Oo -formula obtained from ri(x, y, t) by replacing variable y by(the DD 'a-definition of) G(-»(z).

In this way we constructed a 0o'a-formula Tri(x, v) such that every E -

formula 9(x) is for x < at equivalent to Trj(x, n ), suitable n encoding both kand £.

Claim 2. There is no j such that

Oa'a(M) C Ea',a(M)

The claim follows from the previous claim as, for example, the A0a"a -formula

-, Try (x, x) cannot be expressed as an E ra-formula.

As any t(M)-set ofthe form {x I0(x, u)}canberetrievedfromtheA,an(M)-set { (x, y) I 0(x, y) }, the assumption that for some i

Asstan(M) C Estan(M)

would imply that for some j

A"(M) C Ej"(M)

But this contradicts Claim 2, and the theorem is proved. Q.E.D.

We conclude the first part of the Section by stating a proposition relating thePHP and consistency statements. It rests upon two facts: that S2 + Exp provesEb PHP and that S2 F- NP = coNP allows us to express each bounded formula as00F b and apply the provable E b-completeness (cf. the claim in the proof of Theorem14.1.4).

Theorem 15.3.6. Assume that the theory S2+ Con(S2) does not prove Eb PHP.Then S2 does not prove that NP = coNP. That is, there is a H11' -formula notequivalent in SZ to any fib formula.

We turn our attention to the simplest case of the fundamental problem.

Problem. Does there exist a model M of PV in which NP coNP that is,NP(M) 0 coNP(M)?

Further we say that a theory proves NP = coNP if NP(M) = coNP(M) holdsfor each model of the theory, and similarly for other classes. We shall also freelymove between the languages of PV and SZ; note that the L -formulas in L areequivalent to the open PV-formulas (cf. Section 5.3).

Theorem 15.3.7. Consider three propositions:1. NP = coNP2. P = NP3. EF is a complete proof system

Then if S2 proves one of the propositions, it proves all three, and then also PVproves all three propositions.

Proof. SZ proves I if it proves 2 by Corollary 7.2.5. If SZ proves

Vr, - Taut(r) V EF F- r

then there is, by Theorem 7.2.3, a polynomial-time function deciding whether -ris satisfiable or r a tautology (as by Theorem 9.3.16 and Lemma 4.6.3 SZ provesthat EF is sound). That is, SZ F- P = NP.

Assume that S2 proves NP = coNP. That is, Taut(a) (2y < t(a), 6(a, y)),for some 0 E A,. Then we may think of b < t (a) A 0 (a, b) as a proof system


P (b a proof of a) whose soundness (and completeness) is provable in SZ S. Henceby Corollary 9.3.18 SZ proves that EF polynomially simulates P and so it is alsocomplete.

The provability of the propositions in PV follows from Corollary 7.2.4 as theyare all V301b. Q.E.D.

A simple corollary to Corollary 7.2.5 is the following.

Corollary 15.3.8. Assume that P NP. Then there is a model of SS in which NPcoNP.

We link the completeness of EF in a model of PV to the extendability of themodel. First we need a technical lemma. Recall Definition 9.2.1. To simplify thenotation we shall not show the bounding polynomials explicitly.

Lemma 15.3.9. Let M be a model ofPV and let (a) be a IIb formula. Let m e M.Then the. following two conditions are equivalent:

1. there is an extension N of M (necessarily Ab elementary) such that

N PV + -¢(m)

2. M = EF V 11.0 llImI(m)

Proof Assume that condition 1 holds while 2 fails and let 7r be an EF-proof of110 11 mI(th) in M. Then 7r E N and it is an EF-proof in N too. As EF is sound inmodels of PV, I10111m1 (m) is a tautology in N and hence by Lemma 9.3.12 0(m)holds in N. That contradicts the assumption.

Now assume that 1 fails. This implies that the theory PV + Diag(M) in thelanguage with a constant for every element of M, where Diag(M) is the diagramof M, proves the formula 0(m). That means that PV proves an implication

a(m, u) -f 0(m)

where u are some other constants and a is an open formula. By Theorem 9.2.6 inM there is EF-proof of

IlaIll»1I,I11I(m, u) -, 11011 ml(th)

As or is true in M, by Lemma 9.3.12 Ilor lllml'I"I(m, u) has in Man EF-proof andthus also II0II m (th) has (by modus ponens) an EF-proof in M. This shows that 2fails. Q.E.D.

We get the following corollary.

Corollary 15.3.10. Let M be a model of PV. Then the following two conditionsare equivalent:

1. The system EF is complete in M.2. Any extension of M is Eb-elementary.

Proof. Let (P (m) be a IIb-sentence. Then by Lemma 9.3.12 it holds in M ifIIoII""`gy(m) is a tautology in M. The rest follows by Lemma 15.3.9. Q.E.D.

The last two statements demonstrate that a way to show that EF is not completein a model M is to find a suitable tautology r and an extension N of M containinga new truth assignment not satisfying r. A forcing-type construction of such anextension is provided (in the context of the theory V related to SZ by the RSUV-isomorphism) by Theorem 9.4.2. Such a construction can be also viewed as aconstruction of a particular Boolean valuation v (outside the model) of the setof the propositional formulas coded in M that gives r a value v(r) 0 1a (cf.Section 13.3). The problem is to produce such an evaluation without assumingthat EF is not complete in M.

Another possibility how to prove that PV does not prove NP = coNP wouldbe to demonstrate a superpolynomial lower bound to the size of EF-proofs. Thefollowing lemma is a direct corollary of Parikh's Theorem 5.1.4.

Lemma 15.3.11. Assume that the theory S2 proves that the system EF is complete.Then all tautologies have polynomial size EF-proofs.

We shall show at least that PV itself does not prove such superpolynomial lowerbounds. Let Bound(f) be the formula

Vx3r > x, Taut(r) A EF Fff(Irr r

where Ff,,, means "not provable by a proof of size < m."We say that f is provably in PV superpolynomial if for every polynomial p(x)

PV proves

Vx3y > x3z < y, f(z) = y A p(z) < y

Theorem 15.3.12. Let f be a junction definable in PV and provably in PV super-polynomial. Then

PV F/ Bound(f)

Proof. Let M be a countable nonstandard model of PV such that for some elementa E M the numbers laIk, k = 1 , 2, ... , are cofinal in Log(M).

We shall construct a countable chain of cofinal extensions of M : Mo = M CMl _c .... Fix in advance a countable language L' with names for all elementsof all M, and let cl, ... be an enumeration of L' with infinite repetitions. PutMO := M and assume that we have Mi. Consider two cases:

1. C, E M; and M; Taut (c;) A EF 1 f c,2. otherwise.

In the first case take an extension Ni of M; in which -'c; is satisfiable and put

M1+1 :={a EN; I3bE M;,N; [= a <b)

hence M,+1 is a cofinal extension of M;.In the second case put M,+1 := M;.Note that if c E L' is a propositional formula in some M; then in some Mj

either it has an EF-proof or its negation is satisfiable. Hence M' := U; M; is acofinal extension of M in which every tautology has an EF-proof of size < ja 1k,some k < w. This implies that M' -. Bound(f) whenever f is PV-provablysuperpolynomial. Q.E.D.

In the model M' in the previous proof the exponentiation is not a total function.It would be interesting to improve the construction to obtain a model of SZ + - Expin which EF is complete. Such model would exist, in particular, if any countablemodel of SZ would admit a Ob-elementary extension to a model of SZ in whichEF is complete. This is because we could take for the ground model a modelsatisfying the negation of some Vl-Ib-consequence of SZ + Exp (such formulasexist as SZ + Exp is not Vl-Ib-conservative over SZ ; cf. Section 10.5).

Another direction in which it would be interesting to improve the constructionis to find a model of PV +- Exp in which EF is complete and that satisfiesthe collection scheme BEb (Definition 5.2.11). This would eliminate a possiblepathological property of M' in which the shortest EF-proofs of tautologies < a,for some a, are cofinal in the model. It would also imply that the function f (n) :=maxj11<n cEF(r) is total in M' and subexponential (see Definition 14.1.1 for thedefinition of CEF(T))

We conclude the Section recalling that a nonuniform version of the problem,a construction of a model of PV in which NP/poly 0 coNP/poly, would implythat PV S2' (PV) (Theorem 10.2.4, second part). Hence even the very first steptoward the fundamental problem has important consequences. Indeed, I believe thata construction of a model of PV in which EF is not complete would yield a hintof how to construct Boolean valuations of sets of polynomially many formulasand how to prove the nonexistence of a polynomial upper bound to the size ofEF-proofs.

15.4. Bibliographical and other remarksTheorem 15.1.1 follows from Yao (1985) and Hastad (1989); the superpolynomiallower bounds of Ajtai (1983) and Furst, Saxe, and Sipser (1984) imply only thenonexistence of a AO(a)-counting.

Paris and Wilkie (1985) show that fin Lemma 15.1.2 can be AO if A is 00. Theproof of Theorem 15.1.4 follows an unpublished presentation of Sipser (1983b)by A. Wilkie; Paris and Wilkie (1985) show by another proof that for A E AO afunction f satisfying a weaker inequality IAQ 1 < f(a) < (AQ 11 +1 can also be D0.

Bounded PHP was studied in Woods (1981), where Sylvester's theorem is de-duced by using AoPHPQ°.

The proof of Lemma 15.2.2 follows to a large extent an unpublished proof ofWoods of the switching lemma 3.1.11. He estimates the probability (over p) thatin a game player II assigns > s values and that that game follows a given markedvalue sequence v, splitting the computation further according to the history of thegame. Prior to our presentation Razborov (1994a) found a similar presentation ofHastad's original proof of 3.1.11.

Razborov (1995) considers a framework in which Boolean complexity is for-malized in second order bounded arithmetic systems. Input vectors are still codedby numbers, but Boolean functions and circuits are coded by sets. This allows us,in particular, to speak directly about exponentially large circuits. He shows that insuch formalization all main known lower bounds can be proved in the theory V11(and often in a weaker theory). By Lemma 5.5.14 there is no problem with directcounting arguments. The only potential problem may arise in formalizing proofswhere Boolean functions are quantified (e.g., in various conditional probabilitiesas in the original proof of 3.1.11 in Hastad 1989), which could increase the com-plexity of induction formulas. Razborov (1994a) overcomes this in the case of theswitching lemma by a new proof, another option is to replace the original quanti-fication over arbitrary functions by a quantification over a A 11 'b-definable sequenceof functions.

Razborov (1994) complements this by a result in the opposite direction: namelythat the theory S2 (a) does not prove a superpolynomial lower bound to the circuitscomputing the satisfiability predicate unless a standard cryptographic assumptionfails; see also Krajicek (1995b) for an alternative proof. This assumption, say-ing that strong pseudo-random number generators exist, implies in particular thatNP ¢ P/poly.

Lemma 15.3.2 is due to Paris and Kirby (1978). Lemma 15.3.3 and Theorem15.3.4 are from Paris et al. (1988); Theorem 15.3.5 is from Paris and Wilkie (1985).

Theorem 15.3.6 is from Paris and Wilkie (1987b) and relates to the interpretabilityquestions; see also Hajek and Pudlak (1993) for details and related topics.

Lemma 15.3.9 is from Krajicek and Pudlak (1990b); A. Wilkie (unpublished)gave a model-theoretic proof. Theorem 15.3.12 is also from Krajiirek and Pudlak(1990b); Cook and Urquhart (1993) proved a similar result for the intuitionisticversion of S2 1, and Buss (1990b) showed that the two results are actually equivalent.

A research program some reader may contemplate is to show that a statementlike NP coNP is not provable in PV, for example. This is a task to construct a


model of PV in which NP = coNP and it is somewhat interesting. Theorem 7.6.1is in this spirit.

I shall close the book, however, recalling the Preface: In my opinion, instead ofconstructing a model in which NP = coNP and thus confirming what we alreadyknow, namely that the problem is not elementary and that the available methodsare insufficient, it would be far more interesting to construct a model M for whichone could prove NP(M) 0 coNP(M) by an argument using the underlying ideabehind our belief that the conjecture NP # coNP is true. Results in the oppositedirection, as well as many oracle-independence results, have less to do with theproblem and rather exploit technical details of the definitions.

REFERENCES

Adleman, L., and Manders, K. (1977) Reducibility, randomness, and intractability, in: Proceedingsof the 9th Annual ACM Symposium on Theory of Computing, pp. 151-63. ACM Press.

Ajtai, M. (1983) E'-formulae on finite structures, Annals of Pure and Applied Logic, 24: 1-48.Ajtai, M. (1988) The complexity of the pigeonhole principle, in: Proceedings of the IEEE 29th

Annual Symposium on Foundations of Computer Science, pp. 346-55.Ajtai, M. (1990) Parity and the pigeonhole principle, in: Feasible Mathematics, eds. S.R.Buss and

P.J.Scott, pp. 1-24. Birkhauser.Ajtai, M. (1994a) The independence of the modulo p counting principles, in: Proceedings of the

26th Annual ACM Symposium on Theory of Computing, pp. 402-11. ACM Press.Ajtai, M. (1994b) Symmetric systems of linear equations modulo p, preprint.Ajtai, M. (1995) On the existence of modulo p cardinality functions, in: Feasible Mathematics II,

eds. P. Clote and J. Remmel, pp. 1-14. Birkhauser.Aleliunas, R., Karp, R. M., Lovasz, L., Lipton, M. J., and Rackoff, C. (1979) Random walks,

universal traversal sequences, and the maze problems, in: Proceedings of the IEEE 20th AnnualSymposium on Foundations of Computer Science, pp. 218-23.

Alon, N., and Boppana, R. (1987) The monotone circuit complexity of boolean functions, Com-binatorica, 7(1): 1-22.

Andreev, A. E. (1985) On a method for obtaining lower bounds for the complexity of individualmonotone functions (in Russian), Doklady AN SSSR, 282(5): 1033-7.

Andreev, A. E. (1987) On a method for obtaining more than quadratic effective bounds for thecomplexity of;r-schemes (in Russian), Vestnik Moskovskove Univ. Matemat., 42(1): 70-3.

Arai, T. (1991) Frege system, ALOGTIME and bounded arithmetic, manuscript.Baker, T., Gill, J., and Solovay, R. (1975) Relativizations of the P = ? NP question, SIAM J.

Computing, 4: 431-42.Balcazar, J. L., Diaz, J., and Gabarro, J. (1988) Structural Complexity L Berlin, Springer.Balcazar, J. L., Diaz, J., and Gabarro, J. (1990) Structural Complexity II. Berlin, Springer.Barrington, D. A. (1989) Bounded-width polynomial-time branching programs recognize exactly

those languages in NC', Comp. Systems Sci., 38(1): 150-64.Beame, P., Impagliazzo, R., Krajicek, J., Pitassi, T., Pudlak, P., and Woods, A. (1992) Exponential

lower bounds for the pigeonhole principle, in: Proceedings of the 24th Annual ACMSymposiumon Theory of Computing, pp. 200-21. ACM Press.

Beame, P., Impagliazzo, R., Krajicek, J., Pitassi, T., and Pudlak, P. (1994) Lower bounds onHilbert's Nullstellensatz and propositional proofs, submitted.

Beame, P., and Pitassi, T. (1993) Exponential separation between the matching principles and thepigeonhole principle, submitted.

Bellantoni, S., Pitassi, T., and Urquhart, A. (1992) Approximation and small depth Frege proofs,SIAMJ. Computing, 21(6): 1161-79.

327

328 References

Bel'tyukov, A. (1979) A computer description and a hierarchy of initial Grzegorczyk classes,Zapiski Nauc. Sem. LOMI, 88, Steklov Institute.

Bennett, J. H. (1962) On spectra, Ph.D. Thesis, Princeton University.Berarducci, A., and Intrigila, B. (1991) Combinatorial principles in elementary number theory,

Annals of Pure and Applied Logic, 55: 35-50.Beth, E. W. (1959) The foundations of mathematics. Amsterdam, North-Holland.Blake, A. (1937) Canonical expressions in boolean algebra, Ph.D. Thesis, University of Chicago.Bondy, J.A. (1972) Induced subsets, J. Combinatorial Theory (B), 12: 201-2.Bonet, M. (1993) Number of symbols in Frege proofs with and without the deduction rule, in:

Arithmetic, Proof Theory and Computational Complexity, eds. P. Clote and J. Krajicek, pp.61-95. Oxford, Oxford University Press.

Bonet, M., and Buss, S. (1993) The deduction rule and linear and near-linear proof simulations,J. Symbolic Logic, 58(2): 688-709.

Bonet, M., Buss, S. and Pitassi, T. (1995) Are there hard examples for Frege systems? in: FeasibleMathematics II, eds. P. Clote and J. Remmel, pp. 30-56. Birkhauser.

Bonet, M., Pitassi, T., and Raz, R. (1995) Lower bounds for cutting planes proofs with smallcoefficients, preprint.

Boppana, R., and Sipser, M. (1990) Complexity of finite functions, in: Handbook of TheoreticalComputer Science, ed. J. van Leeuwen, pp. 758-804. Amsterdam, Elsevier.

Buss, S. R. (1986) Bounded Arithmetic. Naples, Bibliopolis. (Revision of 1985 Princeton Univer-sity Ph.D. thesis.)

Buss, S. R. (1987) The propositional pigeonhole principle has polynomial size Frege proofs, J.Symbolic Logic, 52: 916-27.

Buss, S. R. (1988) Weak formal systems and connections to computational complexity, lecturenotes, University of California, Berkeley.

Buss, S. R. (1990a) Axiomatizations and conservation results for fragments of bounded arithmetic,in: Logic and Computation, Contemporary Mathematics, 106: 57-84. Providence, AmericanMathematical Society.

Buss, S. R. (1990b) On model theory for intuitionistic bounded arithmetic with applications toindependence results, in: Feasible Mathematics, eds. S.R. Buss and P.J. Scott, pp. 27-47.Birkhauser.

Buss, S. R. (1991) Propositional consistency proofs, Annals of Pure and Applied Logic, 52: 3-29.Buss, S. R. (1993a) Some remarks on lengths of propositional proofs, submitted.Buss, S. R. (1993b) Relating the bounded arithmetic and polynomial time hierarchies, submitted.Buss, S. R., and Hay, L. (1988) On truth-table reducibility to SAT and the difference hierarchy

over NP, Proceedings Structure in Complexity, IEEE Computer Society Press, 224-33.Buss, S. R., and Krajicek, J. (1994) An application of boolean complexity to separation problems

in bounded arithmetic, Proceedings of the London Mathematical Society, 69(3): 1-21.Buss, S. R., Krajicek, J., and Takeuti, G. (1993) On provably total functions in bounded arithmetic

theories R3, U2 and V2, in: Arithmetic, Proof Theory and Computational Complexity, eds.P. Clote and J. Krajicek, pp. 116-61. Oxford, Oxford University Press.

Buss, S. R., and Turin, G. (1988) Resolution proofs of generalized pigeonhole principles, Theo-retical Computer Science 62: 311-17.

Cai, J. (1989) With probability one, a random oracle separates PSPACE from the polynomial-timehierarchy, J Comput. System Sci., 38: 68-85.

Chiari, M., and Krajicek, J. (1994) Witnessing functions in bounded arithmetic and search prob-lems, submitted.

Chrapchenko, V. M. (1971) A method of determining lower bounds for the complexity of 7r-schemes (in Russian), Matemat. Zametki, 10(1): 83-92.

Chvital, V., and Szemeredi, E. (1988) Many hard examples for resolution, J. of ACM, 35(4):759-68.

Clote, P. (1992) ALOGTIME and a conjecture of S.A. Cook, Annals ofMathematics and ArtificialInteligence, 6: 57-106.

Clote, P., and Krajicek, J. (1993) Open problems, in: Arithmetic, Proof Theory and ComputationalComplexity, eds. P. Clote and J. Krajicek, pp. 1-19. Oxford, Oxford University Press.

References 329

Clote, P., and Takeuti, G. (1986) Exponential time and bounded arithmetic, in: Proceedings Struc-ture in Complexity, LN Comp. Sci. 223, Springer-Verlag.

Clote, P., and Takeuti, G. (1992) Bounded arithmetic for NC, ALOGTIME, L and NL, Annals ofPure and Applied Logic, 56: 73-117.

Cobham, A. (1965) The intrinsic computational difficulty of functions, in: Proceedings of theInternational Congress Logic, Methodology and Philosophy of Science, ed. Y. Bar-Hillel,pp. 24-30. North-Holland.

Cook, S A. (1971) The complexity of theorem proving procedures, in: Proceedings of the 3rdAnnual ACM Symposium on Theory of Computing, pp. 151-8. ACM Press.

Cook, S A. (1975) Feasibly constructive proofs and the propositional calculus, in: Proceedings ofthe 7th Annual ACM Symposium on Theory of Computing, pp. 83-97. ACM Press.

Cook, S A. (1991) Computational complexity of higher type functions, in: Proceedings ICMKyoto1990, pp. 55-69. Springer-Verlag.

Cook, S. A., and Reckhow, A. R. (1979) The relative efficiency of propositional proof systems,J. Symbolic Logic, 44(1): 36-50.

Cook, S. A., and Urquhart, A. (1993) Functional interpretation of feasibly constructive arithmetic,Annals of Pure and Applied Logic, 63; 103-200.

Cook, W., Coullard, C. R., and Turin, G. (1987) On the complexity of cutting plane proofs,Discrete Applied Mathematics, 18: 25-38.

Craig, W. (1957) Three uses of the Herbrand-Gentzen theorem in relating model theory and prooftheory, J. Symbolic Logic, 22(3): 269-85.

Davis, M., and Putnam, H. (1960) A computing procedure for quantification theory, J. ACM, 7(3):210-15.

Dimitracopoulos, C. (1980) Matijavic's theorem and fragments of arithmetic, Ph.D. Thesis, Uni-versity of Manchester.

Dimitracopoulos, C., and Paris, J. (1986) The pigeonhole principle and fragments of arithmetic,Zeitschrifi f. Mathematikal Logik u. Grundlagen d. Mathematik, 32: 73-80.

Dowd, M. (1979) Propositional representations of arithmetic proofs, Ph.D. Thesis, University ofToronto.

Furst, M., Saxe, J. B., and Sipser, M. (1984) Parity, circuits and the polynomial-time hierarchy,Math. Systems Theory, 17: 13-27.

Gaifman, H., and Dimitracopoulos, C. (1982) Fragments of Peano's arithmetic and the MRDPtheorem, Logic and algorithmic, Monogr. Enseign. Mathemat., 30: 187-206.

Garey, MR., and Johnson, D. S. (1979) Computers and intractability. New York, W.H. Freeman.Gentzen, G. (1969) The collected papers of Gerhard Gentzen. Amsterdam, North-Holland.Godel, K. (1986) Collected Works I. eds. S. Feferman et. al. New York and Oxford, Oxford

University Press.Godel, K. (1990) Collected works II. eds. S. Feferman et. al. New York and Oxford, Oxford

University Press.Goerdt, A. (1990) Cutting plane versus Frege proof systems, in: Computer Science Logic '90,

LNCS 533, pp. 174-94. Springer-Verlag.Gratzer G. (1979) Universal algebra, 2nd ed. Springer-Verlag.Grzegorczyk, A. (1953) Some classes of recursive functions, in: Rozprawy Matematiczne IV,

Warszawa.Hijek, P., and Pudlik, P. (1993) Metamathematics of first-order arithmetic, Perspectives in Math-

ematical Logic. Springer-Verlag.Haken,A. (1985) The intractability of resolution, Theoretical Computer Science, 39: 297-308.Hartmanis, J., Lewis, P. M., and Steams, R. E. (1965) Hierarchies of memory limited computations,

IEEE Conference Research on Switching Circuit th. and Logic design, Ann Arbor, Michigan,pp. 179-90.

Hartmanis, J., and Steams, R. E. (1965) On the computational complexity of algorithms, Trans-actions of the American Mathematical Society, 117: 285-306.

Hastad, J. (1989) Almost optimal lower bounds for small depth circuits. in: Randomness andComputation, ed. S. Micali, Ser. Adv. Comp. Res., 5: 143-70. JAI Press.

330 References

Hastad, J. (1993) The shrinkage factor is 2, in: Proceedings of the IEEE 34th Annual Symposiumon Foundations of Computer Science, pp. 114-23.

Hausdorf, F. (1978) Set Theory, 3rd ed. Chelsea.Hopcroft, J., Paul, W., and Valiant, L. (1975) On time versus space, J. Assoc. Computing Machinery,

24: 332-7.Immerman, N. (1988) Nondeterministic space is closed under complementation, SIAM J. Com-

puting, 17(5): 935-8.Johnson, D. S., Papadimitriou, C. H., and Yannakakis, M. (1988) How easy is local search?

J. Comput. System Sci., 37: 79-100.Kadin, J. (1988) The polynomial time hierarchy collapses if the boolean hierarchy collapses,

in: Proceedings of the IEEE 29th Annual Symposium on Foundations of Computer Science,pp. 278-92.

Karchmer, M. (1993) On proving lower bounds for circuit size, in: Proceedings of Structure inComplexity, 8th Annual Conference, pp. 112-19. IEEE Computer Science Press.

Karchmer, M., and Wigderson, A. (1988) Monotone circuits for connectivity require super-logarithmic depth, in: Proceedings of the 20th Annual ACM Symposium on Theory of Com-puting, pp. 539-50. ACM Press.

Karp, R., and Lipton, R. J. (1982) Turing machines that take advice, Enseign. Mathem., 28:191-209.

Kaye R. (1991) Models of Peano arithmetic. Oxford, Oxford University Press.Krajicek, J. (1989a) On the number of steps in proofs, Annals of Pure and Applied Logic, 41:

153-78.Krajicek, J. (1989b) Speed-up for propositional Frege systems via generalizations of proofs,

Commentationes Mathematicae Universitas Carolinae, 30(1): 137-40.Krajicek, J. (1990) Exponentiation and second-order bounded arithmetic, Annals of Pure and

Applied Logic, 48: 261-76.Kraj icek, J. (1992) No counter-example interpretation and interactive computation, in: Logic From

Computer Science, Proceedings of a Workshop held November 13-17, 1989 in Berkeley,ed. Y. N. Moschovakis, Mathematical Sciences Research Institute Publication, 21: 287-93.Springer-Verlag.

Krajii;ek, J. (1993) Fragments of bounded arithmetic and bounded query classes, Transactions ofthe A.M.S., 338(2): 587-98.

Krajicek, J. (1994) Lower bounds to the size of constant-depth propositional proofs, J. SymbolicLogic, 59(1): 73-86.

Krajicek, J. (1995a) On Frege and Extended Frege proof systems, in: Feasible Mathematics II,eds. P. Clote and J. Remmel, pp. 284-319. Birkhauser.

Krajicek, J. (1995b) Interpolation theorems, lower bounds for proof systems and independenceresults for bounded arithmetic, submitted.

Krajicek, J., and Pudlak, P. (1988) The number of proof lines and the size of proofs in first orderlogic, Archive for Mathematical Logic, 27: 69-84.

Krajicek, J., and Pudlak, P. (1989a) Propositional proof systems, the consistency of first ordertheories and the complexity of computations, J. Symbolic Logic, 54(3): 1063-79

Krajicek, J., and Pudlak, P. (1989b) On the structure of initial segments of models of arithmetic,Archive for Mathematical Logic, 28(2): 91-8.

Krajicek, J., and Pudlak, P. (1990a) Quantified propositional calculi and fragments of boundedarithmetic, Zeitschrift f. Mathematikal Logik u. Grundlagen d. Mathematik, 36: 29-46.

Krajicek, J., and Pudlak, P. (1990b) Propositional provability in models of weak arithmetic, in:Computer Science Logic (Kaiserslautern, Oct. '89), eds. E. Boerger, H. Kleine-Bunning, andM.M. Richter, LNCS 440, pp. 193-210. Springer-Verlag.

Krajicek, J., and Pudlak, P. (1995) Some consequences of cryptographical conjectures for Sz andEF, submitted.

Krajicek, J., Pudlak, P., and Sgall, J. (1990) Interactive computations of optimal solutions, in:Mathematical Foundations of Computer Science (B. Bystrica, August '90), ed. B. Rovan,LNCS 452, pp. 48-60. Springer-Verlag.

References 331

Krajicek, J., Pudlak, P., and Takeuti, G. (1991) Bounded arithmetic and the polynomial hierarchy,Annals of Pure and Applied Logic, 52: 143-53.

Krajicek, J., Pudlak, P., and Woods, A. (1991) Exponential lower bound to the size of boundeddepth Frege proofs of the pigeonhole principle, Random Structures and Algorithms, to appear.

Krajicek, J., and Takeuti, G. (1990) On bounded -polynomial induction, in: Feasible Mathe-matics, eds. S.R. Buss and P.J. Scott, pp. 259-80. Birkhauser.

Kraji6ek, J., and Takeuti, G. (1992) On induction-free provability, Annals of Mathematics andArtificial Intelligence, 6: 107-26.

Krentel, M. W. (1986) The complexity of optimization problems, in: Proceedings of'the 18thAnnual ACM Symposium on Theory of Computing, pp. 69-76. ACM Press.

Lesan, H. (1978) Models of arithmetic, Ph.D. Thesis, University of Manchester.Lovasz, L. (1990) Communication complexity: a survey, in: Paths, Flows and VLSI Layout, eds.

Korte, Lovasz, Promer, and Schrijver, pp. 325-66. Springer-Verlag.Lovasz, L., Naor, M., Newman, I., and Wigderson, A. (1991) Search problems in the decision tree

model, in: Proceedings of the 32nd IEEE Symposium on Foundations of Computer Science,pp. 576-85. IEEE Computer Society Press.

Macintyre, A. (1987) The strength of weak systems, in: Schriftenreihe der Wittgenstein-Gesell-schqft, 13, Logic, Philosophy of Science and Epistemology, pp. 43-59, Vienna.

Mehlhorn, K., and Schmidt, F. M. (1982) Las Vegas is better than determinism in VLSI anddistributed computing, in: 14th Annual ACMSymposium on Theory of Computing, pp. 330-7.ACM Press.

Meyer, A., and Stockmeyer, L. (1973) The equivalence problem for regular expressions withsquaring requires exponential time, in: Proceedings of the IEEE 13th Symposium on Switchingand Automata Theory, pp. 125-9.

Mints, G. E. (1976) What can be done in PRA? Zapiski Nauc. Sem. LOMI, 60, Steklov Institute,pp. 93-102.

Muchnik, A. A. (1970) On two approaches to the classification of recursive functions, in: Problemsin Mathematical Logic, Complexity ofAlgorithms and Classes of'Computable Functions, eds.V.A. Kozmidiadi and A.A.Muchnik, pp. 123-8. Mir, Moscow.

Muller, D. E. (1956) Complexity of electronic switching circuits, IRE Trans. Electronic Computers,5: 15-19.

Nepomnjascij, V.A. (1970) Rudimentary predicates and Turing calculations, Doklady AN SSSR,195.

Otero, M. (1991) Models of'open indunction, Ph.D. Thesis, Oxford University.Papadimitriou, C. H. (1990) On graph-theoretic lemmata and complexity classes (extended ab-

stract), in: Proceedings of the 31st IEEE Symposium on Foundations of Computer Science(Volume 17), pp. 794-801. IEEE Computer Society Press.

Papadimitriou, C. H. (1994) On the complexity of the parity argument and other inefficient proofsof existence, J. Comput. System Sci., 48(3): 498-532.

Papadimitriou, C. H., and Yannakakis, M. (1988) Optimization, approximation and complexityclasses, in: 20th Annual ACM Symposium on Theory of Computing, pp. 229-34. ACM Press.

Parikh, R. (1971) Existence and feasibility in arithmetic, J. Symbolic Logic, 36: 494-508.Parikh, R. (1973) Some results on the length of proofs, Transactions of the A.M.S., 177: 29-36.Paris, J. B. (1984) O struktufe modelu omezene El indukce (in Czech), Casopis pestovkni matem-

atiky, 109: 372-9.Paris, J., and Dimitracopoulos, C. (1982) Truth definitions for t10 formulae, in: Logic and Algo-

rithmic, I'Enseignement Mathematique, 30: 318-29. Geneve.Paris, J., and Dimitracopoulos, C. (1983) A note on undefinability of cuts, J. Symbolic Logic, 48:

564-9.Paris, J.B., Handley, W.G., and Wilkie, A.J. (1984) Characterizing some low arithmetic classes,

Colloquia Mathematica Soc. J. Bolyai, 44: 353-64.Paris, J.B., and Kirby, L. (1978) En-collection schemes in arithmetic, in: Logic Colloquium '77,

pp. 199-209. Amsterdam, North-Holland.Paris, J., and Wilkie, A. J. (1981a) tip sets and induction, in: Proceedings of'the Jadwisin Logic

Conference, Poland, pp. 237-48. Leeds University Press.

332 References

Paris, J., and Wilkie, A. J. (1981b) Models of arithmetic and rudimentary sets, Bull. Soc. Mathem.Belg., Ser. B, 33: 157-69.

Paris, J., and Wilkie, A. J. (1984) Some results on bounded induction, in: Proceedings of the2nd Easter Conference on Model Theory (Wittenberg '84), pp. 223-8. Humboldt Universitat,Berlin.

Paris, J., and Wilkie, A. J. (1985) Counting problems in bounded arithmetic, in: Methods inMathematical Logic, LNM 1130, pp. 317-40. Springer-Verlag.

Paris, J., and Wilkie, A. J. (1987a) Counting t10 sets, Fundamenta Mathematica, 127: 67-76.Paris, J., and Wilkie, A. J. (1987b) On the scheme of induction for bounded arithmetic formulas,

Annals of Pure and Applied Logic, 35: 261-302.Paris, J. B., Wilkie, A. J., and Woods, A. R. (1988) Provability of the pigeonhole principle and

the existence of infinitely many primes, J. Symbolic Logic, 53: 1235-44.Parsons, C. (1970) On a number theoretic choice schema and its relation to induction, in: Intu-

itionism and Proof Theory, eds. A. Kino et. al., pp. 459-73. Amsterdam, North-Holland.Pitassi, T., Beame, P., and Impagliazzo, R. (1993) Exponential lower bounds for the pigeonhole

principle, Computational complexity, 3: 97-140.Pitassi, T., and Urquhart, A. (1992) The complexity of the Hajos calculus, in: Proceedings of

the IEEE 33rd Annual Symposium on Foundations of Computer Science, pp. 187-96. IEEEComputer Society Press.

Pudlak, P. (1983) A definition of exponentiation by a bounded arithmetic formula, CommentationesMathematicae Universitas Carolinae, 24(4): 667-71.

Pudlak, P. (1985) Cuts, consistency statements and interpretations, J. Symbolic Logic, 50:423-41.

Pudlak, P. (1986) On the length of finitistic consistency statements in first order theories, in: LogicColloquim 84, pp. 165-96. North-Holland.

Pudlak, P. (1987) Improved bounds to the length of proofs of finitistic consistency statements,Contemporary Mathematics, 65: 309-31.

Pudlak, P. (1990) A note on bounded arithmetic, Fundamenta Mathematica, 136: 85-9.Pudlak, P. (1992a) Ramsey's theorem in bounded arithmetic, in: Proceedings of Computer Science

Logic, eds. Boerger E. et. al., Comp. Sci., 553: 308-12.Pudlak, P. (1992b) Some relations between subsystems of arithmetic and the complexity of com-

putations, in: Logic from Computer Science, Proceedings of a Workshop held November 13-17, 1989 in Berkeley, ed. Y.N. Moschovakis, Mathematical Sciences Research Institute Pub-lication, 21: 499-519. Springer-Verlag.

Rabin, M. O. (1961) Non-standard models and independence of the induction axioms, in: Essayson the Foundation of Mathematics, pp. 287-99. Amsterdam, North-Holland.

Rabin, M. O. (1962) Diophantine equations and non-standard models of arithmetic, in: Proceed-ings of the 1st International Congress on Logic, Methodology and Philosophy of Science,pp. 151-8, Stanford University Press.

Raz, R., and Wigderson, A. (1990) Monotone circuits for matching require linear depth, in: Pro-ceedings of the 22nd Annual ACM Symposium on Theory of Computing, pp. 287-92. ACMPress.

Razborov, A. A. (1985) Lower bounds on the monotone complexity of some Boolean functions,Soviet Mathem. Doklady, 31: 354-7.

Razborov, A. A. (1987) Lower bounds on the size of bounded depth networks over a completebasis with logical addition, Matem. Zametki, 41(4): 598-607.

Razborov, A. A. (1989) On the method of approximations, in: Proceedings of the 211h AnnualACM Symposium on Theory of Computing, pp. 168-76. ACM Press.

Razborov, A. A. (1993) An equivalence between second order bounded domain bounded arith-metic and first order bounded arithmetic, in: Arithmetic, Proof Theory and ComputationalComplexity, eds. P. Clote and J. Krajicek, pp. 247-77. Oxford University Press.

Razborov, A. A. (1994) Unprovability of lower bounds on the circuit size in certain fragments ofbounded arithmetic, submitted.

Razborov, A. A. (1995) Bounded arithmetic and lower bounds in boolean complexity, in: FeasibleMathematics 17, eds. P. Clote and J. Remmel, pp. 344-86. Birkhauser.

References 333

Reckhow, R. A. (1976) On the lengths of proofs in the propositional calculus, Ph.D. Thesis,University of Toronto, Technical Report No. 87.

Ressayre, J.-P. (1986) A conservation result for system of bounded arithmetic, unpublishedpreprint.

Riis, S. (1993a) Making infinite structures finite in models of second order bounded arithmetic, in:Arithmetic,Proof Theory and Computational Complexity, eds. P. Clote and J. Krajicek, pp. 289-319. Oxford University Press.

Riis, S. (1993b) Independence in bounded arithmetic, Ph.D. Thesis, Oxford University.Riis, S. (1994) Count(q) does not imply Count(p), submitted.Robinson, J., A. (1965) A machine-oriented logic based on the resolution principle, J. ACM, 12(1):

23-41.Ryll-Nardzewski, C. (1952) The role of the axiom of induction in elementary arithmetic, Funda-

menta Mathematica, 39: 239-63.Savitch, W. J. (1970) Relationships between nondeterministic and deterministic tape complexities,

J. Computer and System Sciences, 4: 177-92.Shannon, C. E. (1949) The synthesis of two-terminal switching circuits, Bell Systems Techn. J.,

28(1): 59-98.Sheperdson, J. C. (1964) A non-standard model of a free variable fragment of number theory,

Bull. Acad. Pol. Sci., 12: 79-86.Shoenfield, J. (1967) Mathematical Logic. Reading, Mass., Addison-Wesley.Sipser, M. (1983a) Borel sets and circuit complexity, in: Proceedings of the 15th Annual ACM

Symposium on Theory of Computing, pp. 61-9. ACM Press.Sipser, M. (1983b) A complexity theoretic approach to randomness, in: Proceedings of the 15th

Annual ACM Symposium on Theory of Computing, pp. 330-5. ACM Press.Sipser, M. (1992) The history and status of the P versus NP question, in: Proceedings of the 24th

Annual ACM Symposium on Theory of Computing, pp. 603-18. ACM Press.Smale, S. (1992) Theory of computation, in: Mathematical Research Today and Tomorrow, eds.

C. Casacuberta and M. Castellat, pp. 59-69. Berlin and Heidelberg, Springer-Verlag.Smolensky, R (1987) Algebraic methods in the theory of lower bounds for Boolean circuit com-

plexity, in: Proceedings of the 19th Annual ACM Symposium on Theory of Computing, pp.77-82. ACM Press.

Smorynsky, C. (1977) The incompleteness theorems, in: Handbook of Mathematical Logic, ed.J. Van Leeuwen, Studies in Logic and Found. of Mathematics 90, pp. 821-65. Amsterdam,North-Holland.

Smorynsky, C. (1984) Lectures on non-standard models of arithmetic, in: Logic Colloquium '82,Stud. in Logic and Found. of Math. 112, pp. 1-70. Amsterdam, North-Holland.

Smullyan, R. (1961) Theory of Formal Systems, Annals of Mathematical Studies, 47. Princeton,Princeton University Press.

Smullyan, R. (1968) First-Order Logic, Springer-Verlag.Spira, P. M. (1971) On time-hardware complexity of tradeoffs for Boolean functions, in: Pro-

ceedings of the 4th Hawaii Symposium System Sciences, pp. 525-7. North Hollywood, Calif.,Western Periodicals.

Statman, R. (1977) Complexity of derivations from quantifier-free Horn formulae, mechani-cal introduction of explicit definitions, and refinement of completeness theorems, in: LogicColloquium '76, pp. 505-517. Amsterdam, North-Holland.

Statman, R. (1978) Bounds for proof-search and speed-up in the predicate calculus. Annals ofMathematical Logic, 15: 225-87.

Stockmeyer, L. (1977) The polynomial-time hieararchy, Theoretical Computer Science, 3: 1-22.Subbotovskaja, B. A. (1961) Realizations of linear functions by formulas using +, , - (in Rus-

sian), Doklady AN SSSR, 136(3): 553-5.Szelepcsenyi, R. (1987) The method of forcing for non-deterministic automata, Bull. Europ. Assoc.

Theor. Comput.Sci., 32: 96-100.Tait, W. W. (1968) Normal derivability in classical logic, in: The Syntax and Semantics ofinfinitary

Languages, ed. J. Barwise, pp. 204-36. Springer-Verlag.

334 References

Takeuti, G. (1975) Proof Theory, North-Holland.Takeuti, G. (1988) Bounded arithmetic and truth definition, Annals of Pure and Applied Logic,

39: 75-104.Takeuti, G. (1993) RSUV isomorphism, in: Arithmetic. Proof Theory and Computational Com-

plexity, eds. P. Clote and J. Krajicek, pp. 364-86. Oxford, Oxford University Press.Takeuti, G, and Zaring, W. M. (1973) Axiomatic Set Theory, Springer-Verlag.Tarski, A., Mostowski, A., and Robinson, R. M. (1953) Undecidable Theories. Amsterdam, North-

Holland.Tennenbaum S. (1959) Non-archimedean models of arithmetic, Notices of the A.M.S., 6: 270.Toda, S. (1989) On the computational power of PP and ®P, in: Proceedings of the 30th IEEE

Symposium on the Foundations of Computer Science, pp. 514-19. IEEE Computer SciencePress.

Tseitin, G. C. (1968) On the complexity of derivations in propositional calculus, in: Studies inmathematics and mathematical logic, Part II, ed. A.O. Slisenko, pp. 115-25.

Tseitin, G. C., and Choubarian, A. A. (1975) On some bounds to the lengths of logical proofs inclassical propositional calculus in Russian, Trudy Vvcisl Centra AN Arm SSR i ErevanskovoUniv., 8: 57-64.

Urquhart, A. (1987) Hard examples for resolution, J. ACM, 34: 209-19.Urquhart, A. (1992) The relative complexity of resolution and cut-free Gentzen systems, Annals

of Mathematics and Artificial Intelligence, 6: 1157-68.Valiant, L. (1979) The complexity of computing the permanent, Theoretical Computer Science,

8: 189-201.Van den Dries, L. (1980) Some model theory and number theory for models of weak systems of

arithmetic, in: Model Theory ofAlgebra and Arithmetic, eds. L. Pacholski et al., LN Mathe-matics, pp. 346-67. Springer-Verlag.

Van den Dries, L. (1990) Which curves over Z have points with coordinates in a discretely orderedring? Transactions of the American Mathematical Society, 264: 33-56.

Wagner, K. W. (1990) Bounded query classes, SIAMJ. Computing, 19(5): 833-46.Wegener, 1. (1987) The Complexity ofBoolean Functions, New York and Stuttgart, Wil ley/ Teubner.Wilkie, A. (1978) Some results and problems on weak systems of arithmetic, in: Logic Colloquium

'77, Studies in the Logical Foundations of Mathematics 96, pp. 285-96. North-Holland.Wilkie, A. (1980) Applications of complexity theory to Eo-definability problems in arithmetic,

in: Model Theory ofAlgebra and Arithmetic, LNM 834, pp. 363-9. Springer-Verlag.Wilkie, A. (1985) Modeles non standard de l'arithmetique, et complexity algorithmique, in:

Modeles non standard en arithmetique et theorie des ensembles, Publications Mathematiquesde l'Universite Paris VII, 22.

Wilkie, A. (1986) On sentences interpretable in systems of arithmetic, in: Logic Colloquium '84,Studies in the Logical Foundations of Mathematics 120, pp. 329-42. North-Holland.

Wilkie, A. (1987) On schemes axiomatizing arithmetic, in: Proceedings of the ICMBerkelev 1986,pp. 331-7.

Woods, A. (1981) Some problems in logic and number theory and their connections, Ph.D. Thesis,University of Manchester.

Woods, A. (1986) Bounded arithmetic formulas and Turing machines of constant alternation, in:Logic Colloquium '84, Studies in the Logical Foundations of Mathematics 120, pp. 355-77.North-Holland.

Wrathall, C. (1978) Rudimentary predicates and relative computation, SIAMJ. Computing, 7(2):149-209.

Yao, Y. (1985) Separating the polynomial-time hierarchy by oracles, in: Proceedings of the 26thAnnual IEEE Symposium on Foundations of Computer Science, pp. 1-10. IEEE ComputerScience Press.

Zak, S. (1983) A Turing machine hierarchy, Theoretical Computer Science, 26: 327-33.Zambella, D. (1994) Notes on polynomially bounded arithmetic, preprint.

SUBJECT INDEX

advice 7, 9, 187, 190ancestor 33antecedent 32approximation method 10axiom scheme 42

basis 9de Morgan 9

Bel'tyukov machine 278binary tree 224, 241bit graph 219Boolean algebra 254, 257

partial 289, 294Boolean circuit 9, 199, 226, 313Boolean combination 73, 137, 193Boolean connective 9, 266Boolean formula 9Boolean function 8, 313Boolean valuation 289, 292, 295, 296, 323, 324bounded arithmetic 4, 63bounded query 97bounding polynomial 145branching program 15, 22, 27, 38

carry save addition 284cedent 32clause 25, 287clique 13, 189, 233, 306

function 13coding 66communication complexity 16complete system 244, 252, 313comprehension

axiom 88bounded 85

concatenation 18congruence 289

conservativity 78, 79, 96, 111, 136, 204, 207,218, 280

consistency statement 202, 208, 321finitistic 302

constant depth 41, 141,232,258,267,277,283,285, 305

constructive sets 17counterexample 97, 114, 115, 120,428counting 308

approximate 310function 7, 90, 280, 309principle 287

modular 16, 223, 258, 265, 305cryptographic 325cut 65, 104

in a model 4, 64, 67, 68, 72, 83, 274, 317rule 33

cutting planes 61, 279, 281, 283, 286cut-free 34, 60cylinder 226

decision tree 15, 243, 288deduction lemma 46dense 274depth 141, 236

logical 31, 46of a circuit 9, 16of a formula 31of a proof 31

Diophantine equation 65dyadic numeral 78

eigenvariable 204equational

logic 268theory 2

end-extension 317, 319

335

336

explicit definition 37exponentiation 20, 65, 309, 324extended

Frege

system 53sequence 142

resolution 57extension 127, 129

atom 53axiom 53rule 53

Subject index

field 268finite axiomatizability 185, 193, 202, 311, 317,

318forcing 175, 254, 272, 274formula

bounded arithmetic 4, 18, 21free 104minor 33open 65principal 33second order 84sharply bounded 21side 33

Fregeproof 42proof system 42rule 42, 256

free-variable normal form 204function

Herbrand 113multivalued 99, 124Sipser 13, 195, 225, 230Skolem 102, 171, 319

functional 134feasible 2

interpretation 2

multivalued 135

generic 274, 275Godel number 205graph 287

expander 287, 288

Hajos calculus 307halting configuration 94hard tautologies 299, 304, 307height 15, 34herbrandization 210hierarchy

linear-time 6polynomial-time 6, 22, 62, 191, 309, 317

Hilbert-style system 61homomorphism 289

implicationally complete 42instantaneous description 93interactive 94intuitionism 2interpolant 35interpretability 62, 66, 209iteration principle 231

k-complete system 244, 266, 269, 295k-evaluation 252, 254, 258, 275, 295, 296

language 139, 266of arithmetic 3, 17context-free 20second order 83

limit 294, 295, 296direct limit 293

limited extension 26linear combination 270linear ordering 223literal 25Lob conditions 206

machine independent 2, 75majority function 13minimal size 44model 126, 128, 193, 220, 272, 316, 319

nonstandard 4, 162, 172, 272, 316, 323recursive 4, 65standard 4

modus ponens 42monotone function 13

natural deduction 61norm 243, 313NP-completeness 7, 22number

of formulas 43of steps 42

open problem 10, 24, 31, 125, 185, 243, 262,267,272,286,300,311,312,317,321

optimal proof system 299, 300, 302optimization problem 97, 186

parameter 4parity function 13partition 244part-off quantification 18path 15, 39

Subject index

pigeonhole principle 258, 261, 272, 280, 285,305, 311

bounded 308

weak 28, 118, 213, 216, 218, 223, 225, 232,234

polynomial simulation 25, 48, 52, 57, 58, 60,158, 164, 168, 186, 281, 283, 285, 303

polynomial-timemachine 94probabilistic 118

principleleast number 4, 272, 275MAX 71MIN 70

problemNP =?coNP 169, 304, 321, 323, 326P =?NP 7, 10, 321

projection 244propositional proof system 23, 268

quantified propositionalcalculus 58formula 58, 145proof system 163

recursion on notation 19, 76refinement 244, 313reflection principle 158, 167, 171regular provability 203resolution 25, 281

refutation 25regular 27, 60rule 25

restriction 14, 226, 246partial 288random 14, 196, 226

RSUV-isomorphism 83, 90, 133, 323rudimentary

function 19set 6, 18, 19

extended 22positive 18strictly 18strongly 19

ruleinference 32, 41introduction 148IND 103LIND 103PIND 103

337

schemechoice 74, 88, 90, 136

dependent 88collection 136, 317, 319, 324

bounded 73sharply bounded 73, 90strong sharply bounded 74

induction 4, 69polynomial 69length 70

replacement 74separation 88

search problem 27, 38, 121, 125, 180, 182, 288second order oracle 133sequent 32

initial 32sequent calculus 31sequential theory 66E-depth 239size 9

soundness 160, 171, 286space 5sparse set 304speed-up 24, 236, 243, 299

exponential 25polynomial 25superpolynomial 25

specialproof 39rule 39

st-connectivity 22subformula property 104substitution Frege system 53substitution rule 53succedent 32switching lemma 14, 199, 226, 239, 247, 314,

323

test tree 180, 182witnessing 182, 221

theoremBeth definability 36Bondy 297Buss 107Cobham 76Cook 7, 8Craig interpolation 35Fermat little 312Gentzen (Haupsatz) 5, 104GOdel

completeness 3incompleteness 3, 62, 202, 206

Graham and Pollak 297Herbrand 5, 102, 120, 127, 130

338 Subject index

Hilbert Nullstellensatz 269Lagrange four-square 312midsequent 60Nepomnjascij 20Parikh 65, 323Ramsey 233, 236, 306Spira 16, 39, 49, 159, 241Sylvester 312, 325Tennenbaum 4, 65

theoryfirst order 3second order 3

time 5tournament 233, 236translation 139, 144, 158treelike 34, 45, 46, 154, 237truth

assignment 159, 313partial truth definition 3, 163, 204, 209, 304undefinability of 3

Tseitin graph formula 297Turing machine 5, 93, 133, 304

oracle 5, 99

ultrafilter 11ultraproduct 10

variablebounded 4free 4

vector space 224, 310

witnessingformula 106, 156function method 105oracle 99, 115

Z-part 43

NAME INDEX

Adleman, L. 18Ajtai, M. 1, 14, 258, 260, 277, 278, 324Aleliunas, R. 22Alon, N. 13Andreev, A. E. 17Arai, T. 183, 279, 297

Balcazar, J. L. 3, 118, 155Barrington, D. A. 22Beame, P. 260, 277, 278Bellantoni, S. 277Bel'tyukov, A. 278Bennett, J. H. 19, 20, 22, 66, 75, 79, 80Berarducci, A. 312Beth, E. W. 35, 36Blake, A. 25Bondy, J. A. 297Bonet, M. 61, 297Boppana, R. 3, 13, 22Buss, S. R. 1, 21, 60, 61, 62, 63, 78, 80, 83, 84,

92, 99, 101, 106, 107, 122, 131, 138,183,193,208,230,231,285,297,304,325

Cai, J. 277Chiari, M. 131, 231Choubarian, A. A. 61Chrapchenko, V. M. 17Chvatal, V. 298Clote, P. 2, 183, 184, 297Cobham, A. 22, 75, 76Cook, S. A. 1,8,23,24,60,61,63,76,78,92,

148, 158, 167, 169, 183, 325Cook, W. 61, 281, 297Coullard, C. R. 61, 281, 297Craig, W. 35

Davis, M. 25, 60Diaz, J. 3, 118Dimitracopoulos, C. 62, 66, 80, 83, 118, 209Dowd, M. 61, 155, 183

Furst, M. 14, 195, 324

Gabarro, J. 3, 118Gandy, R. 0. 278Garey, M. R. 3, 22, 307Gentzen, G. 31, 61, 104Godel, K. 3, 202, 203Goerdt, A. 297Gratzer, G. 289, 293

Hajek, P. 3, 4, 64, 67, 74, 80, 203, 325Haken, A. 23, 28Handley, W. G. 278Hartmanis, J. 5Hastad, J. 14, 17, 195, 197, 199, 226, 239, 277,

297, 324, 325Hausdorf, F. 73Hay, L. 101Herbrand, J. 113, 115, 131, 210Hopcroft, J. 6

Immerman, N. 5Impagliazzo, R. 277Intrigila, B. 312

Johnson, D. S. 3, 22, 121, 125, 307

Kadin, J. 208Karchmer, M. 10, 17, 22, 297Karp, R. 10Kaye, R. 4, 62Kirby, L. 325

339

340 Name index

Krajicek, J. 2, 25, 37, 45, 60, 61, 92, 98, 99,101,115,120,122,131,138,167,183,184, 191, 194, 204, 208, 216, 230, 231,239, 277, 289, 292, 297, 298, 304, 307,325

Krentel, M. W. 97, 98, 101

Lessan, H. 209Lewis, P. M. 5Lipton, R. J. 10Lob, M. H. 206Lovasz, L. 60

Manders, K. 18Mints, G. E. 102de Morgan 9, 161Mostowski, A. 4, 70Muchnik, A. A. 76Muller, D. E. 10

Nepomnjascij, V. A. 20, 22

Otero, M. 65

Papadimitriou, C. H. 98, 121, 125, 126Parikh, R. 61, 62, 63, 65, 92Paris, J. B. 1, 4, 22, 62, 65, 66, 67, 68, 80, 83,

92, 118, 183, 184, 203, 208, 209, 215,231, 266, 277, 278, 297, 318, 319, 325

Parsons, C. 102Paul, W. 6, 7Peano, G. 3, 251Pitassi, 1277, 297, 307Pudlak, P. 1, 3, 4, 25, 37, 60, 61, 64, 66, 67,

74, 80, 92, 98, 101, 120, 131, 191, 203,208, 209, 231, 277, 302, 304, 307, 325

Putnam, H. 25, 60

Rabin, M. 0. 202Raz, R. 22, 297Razborov, A. A. 10, 11, 13, 16, 89, 90, 92, 325Reckhow, R. A. 23, 24, 48, 60, 61Ressayre, J.-P. 74, 92Riis, S. 220, 222, 231, 274, 277, 278Robinson, J. A. 25Robinson, R. M. 3, 4, 63, 68, 70, 203Ryll-Nardzewski, C. 202

Savitch, W. J. 5Saxe, J. B. 14, 195, 324

Shannon, C. E. 10Sgall, J. 98, 101Sheperdson, J. C. 65Shoenfield, J. 3Sipser, M. 3, 14, 22, 195, 310, 324Smolensky, R. 16Smorynsky, C. 4, 206Smullyan, R. 6, 17, 18, 61Solovay, R. 67Spira, P. M. 16, 39, 49, 160Statman, R. 60, 61Steams, R. E. 5Stockmeyer, L. 6Subbotovskaja, B. A. 288Szelepcsenyi, R. 5Szemeredi, E. 298

Takeuti, G. 5, 31, 60, 89, 90, 92, 101, 102, 104,120, 131, 138, 167, 183, 191, 204, 208,209, 274, 275

Tarski, A. 3, 4, 70, 163, 171, 205, 304Tennenbaum S. 4, 65Toda, S. 7, 309Tseitin, G. C. 60, 61, 287, 297Turan, G. 28, 60, 61, 281, 297

Urquhart, A. 60, 287, 297, 298, 307, 325

Valiant, L. 6, 7Van den Dries, L. 65

Wagner, K. W. 101Wegener, I. 3, 22Wigderson, A. 17, 22Wilkie, A. 1, 21, 22, 62, 65, 67, 68, 80, 92, 131,

183, 184, 203, 208, 209, 215, 266, 277,278, 297, 319, 325

Woods, A. 21, 62, 215, 277, 297, 311, 312, 325Wrathall, C. 6, 19, 20

Yannakakis, M. 98, 121, 125Yao, Y. 14, 195, 277, 324

Zak, S. 6Zambella, D. 131, 208Zaring, W. M. 274, 275

SYMBOL INDEX

2.1 LpA, Q, IND, LNP, PA, N, Th(N), IEI, C, C, ,, V, A,

G(g(n)), Q(g(n)), O(g(n)), o(g(n))

2.2 Time(f ), Space(f ), NTime(f ), NSpace(f ), LinTime, P, NP, L,

PSpace, LinSpace, E, EXP, LinH, EI'',

Ep, II,'", II!', coX, #R(x), #P, P/poly, NP/poly, L/poly, o

3.1 Cc(f),DepthQ(f), Lu(f), CLIQUE, C+(f),

C"'(.f), ®, MAJ, Sd(xil,....id)' MODE, CC(f)

3.2 L, L1J, Ix1, x#y, Ei, U Ao, Do (M), A,

CE, RUD, SRUD, RUD+, strRUD, TimeSpace(f, g), Vk(N), Eb,Fib

i 00, Eb, Ab

4.1 TAUT, P<Q, P<EQ

4.2 Ext(O), -'PHPn , R

4.3 LK

4.4 F, kp(r), £p(r)

4.5 SF, EF, ER

4.6 E7, I1', Eq, G, Gi, G*, TAUT1

5.1 IDo, PA , Ce, IOpen, 521, w(x), Exp, wk(x),

IDo+Q1, IAo+Exp5.2 L+, BASIC, T2, T2, S2, S2, PIND, LIND,

#k, MIN, LENGTH-MIN, MAX, LENGTH-MAX,

A'-IND,13(F), BEb, BBE', BASIC+, S'2 (L+), strictEb

341

342 Symbol index

5.3 PV, SZ(PV), PV,

5.4 Numones(x), (a, b), bit(a, i), i E a, Seq(w)

5.5 Log(M), a' L2 E,1,b n, ,b Di ,hI Ep'b, Vjj, Vj,

Uj', Uj, SEP, AC, DC, R2, R2, R3, Enum(f, u,a), l-Exp

6.1 L(a), E'(a), nb(a), O' (a), S2i(a),

Tz (a), CompM, UNIV,

6.2 PNP[O(logn)], PEP[O(logn)], LNP, <,, (NP), Eo(Eb)

6.3 WitCompM,FPEiv

[wit, q(n)]

7.1 LKB, BASICLK

7.2 Witness''

7.3 *y, WPHP(f), WPHP(PV 1)

7.5 PLS, Fp, NP, CPI PPA, PPAD, PPP, PPA(PV 1)

8.1 #3, Rz, L(#3), TIMEQ, SPACEQ, EXPQ, PSPACEQh

8.2EXPEiI

" [wit,poly]

9.1 LPA(R), IL0(R), (6)(n)

9.2 n(i), IIAIIq(,,,), Ili'll

9.3 Fla(a), Prfp(7r, a), Assign(ij, a), Eval(rj, a, y),

n a, TAUT(a), Flad(a), a,

w = a, Taut, (A), TAUT,, P >; Q, i-RFN(P), Con(P)

10.2 Q,

10.4 Ra,Rq,Esm

10.5 IEO, Con(T), BdCon(T), RCon(T),

Tr, (x, y), STr, (x, y)

11.2 PHPn

11.3 MODk(R, S)Q

11.4 WPHP(a, R)

12.1 RAM

12.2 Ea,t Hds,t A], Sd

R+ (q), R- j(q), E-depth, -, MPHP(A, B, r)

Symbol index 343

12.3 MPHP MMODO, IIHII, S x T, S(H),

HaS,aP,HP,supp(p)12.4 [a], Mtrivi OP, HO, So

12.5 Count', PHPn, gCount'12.6 Qax < t, Q, AO, IQ, Ao,

MODa,i, F(MODa), LK(MOD0), X I Y12.7 L31, P, II-13.1 Li, L,,,, CP, PHPn P, IDo(a)count

13.2 C(G), D(G)13.3 F-1,BA,I'*, r+14.1 Taut(x), cp(r), PrfT, ConT, Hn:

14.2 >p15.1 ®P, Dam, D0PHP, IDo(7r)

15.3 P(M), EP(M), IIp(M), NP(M), coNP(M),PH(M), pstan( ) (EP)stan(M), (nP)stan(M), PHstan(M)

(P/ poly)(M), (NP / poly)(M), (coNP /poly) (M), Diag(M), Ooan(M),Bound(f)

LOGIC Advanced 1994

Documents

Transcript of LOGIC Advanced 1994