Introduction to Analysis in One Variable

Michael E. Taylor

Transcript, 8/21/2019, http://slidepdf.com/reader/full/analysis-in-one-variable


Contents

Chapter I. Numbers

1. Peano arithmetic
2. The integers
3. Prime factorization and the fundamental theorem of arithmetic
4. The rational numbers
5. Sequences
6. The real numbers
7. Irrational numbers
8. Cardinal numbers
9. Metric properties of R
10. Complex numbers

Chapter II. Spaces

1. Euclidean spaces
2. Metric spaces
3. Compactness

Chapter III. Functions

1. Continuous functions
2. Sequences and series of functions
3. Power series
4. Spaces of functions

Chapter IV. Calculus

1. The derivative
2. The integral
3. Power series
4. Curves and arc length
5. Exponential and trigonometric functions
6. Unbounded integrable functions
A. The fundamental theorem of algebra
B. π^2 is irrational
C. More on (1 − x)^b
D. Archimedes’ approximation of π
E. Computing π using arctangents


Chapter V. Further topics in analysis

1. Convolutions and bump functions
2. The Weierstrass approximation theorem
3. The Stone-Weierstrass theorem
4. Fourier series
5. Newton’s method
A. Inner product spaces


Introduction

This is a text for students who have had a three-course calculus sequence, and who are ready for a course that explores the logical structure of this area of mathematics, which forms the backbone of analysis. It is intended for a one-semester course. An accompanying text, Introduction to Analysis in Several Variables [T2], can be used in the second semester of a one-year sequence.

The main goal of Chapter 1 is to develop the real number system. We start with a treatment of the “natural numbers” N, obtaining its structure from a short list of axioms, the primary one being the principle of induction. Then we construct the set Z of all integers, which has a richer algebraic structure, and proceed to construct the set Q of rational numbers, which are quotients of integers (with a nonzero denominator). After discussing infinite sequences of rational numbers, including the notions of convergent sequences and Cauchy sequences, we construct the set R of real numbers, as ideal limits of Cauchy sequences of rational numbers. At the heart of this chapter is the proof that R is complete, i.e., Cauchy sequences of real numbers always converge to a limit in R. This provides the key to studying other metric properties of R, such as the compactness of (nonempty) closed, bounded subsets. We end Chapter 1 with a section on the set C of complex numbers. Many introductions to analysis shy away from the use of complex numbers. My feeling is that this forecloses the study of way too many beautiful results that can be appreciated at this level. This is not a course in complex analysis. That is for another course, with another text (such as [T3]). However, I hope that various topics covered in this text make it clear that the use of complex numbers in analysis not only gives the subject an infusion of power, but also often makes things simpler, not more complicated.

In fact, the structure of analysis is revealed more clearly by moving beyond R and C, and we undertake this in Chapter 2. We start with a treatment of n-dimensional Euclidean space, R^n. There is a notion of Euclidean distance between two points in R^n, leading to notions of convergence and of Cauchy sequences. The spaces R^n are all complete, and again closed bounded sets are compact. Going through this sets one up to appreciate a further generalization, the notion of a metric space, introduced in §2. This is followed by §3, exploring the notion of compactness in a metric space setting.

Chapter 3 deals with functions. It starts in a general setting, of functions from one metric space to another. We then treat infinite sequences of functions, and study the notion of convergence, particularly of uniform convergence of a sequence of functions. We move on to infinite series. In such a case, we take the target space to be R^n, so we can add functions. Section 3 treats power series. Here, we study series of the form

(1) ∑_{k=0}^∞ a_k (z − z_0)^k,

with a_k ∈ C and z running over a disk in C. For results obtained in this section, regarding the radius of convergence R and the continuity of the sum on D_R(z_0) = {z ∈ C : |z − z_0| < R}, there is no extra difficulty in allowing a_k and z to be complex, rather than insisting they be real, and the extra level of generality will pay big dividends in Chapter 4. A final section in Chapter 3 is devoted to spaces of functions, illustrating the utility of studying spaces beyond the case of R^n.

Chapter 4 gets to the heart of the matter, a rigorous development of differential and integral calculus. We define the derivative in §1, and prove the Mean Value Theorem, making essential use of compactness of a closed, bounded interval and its consequences, established in earlier chapters. This result has many important consequences, such as the Inverse Function Theorem, and especially the Fundamental Theorem of Calculus, established in §2, after the Riemann integral is introduced. In §3, we return to power series, this time of the form

(2) ∑_{k=0}^∞ a_k (t − t_0)^k.

We require t and t_0 to be in R, but still allow a_k ∈ C. Results on the radius of convergence R and continuity of the sum f(t) on (t_0 − R, t_0 + R) follow from material in Chapter 3. The essential new result in §3 of Chapter 4 is that one can obtain the derivative f′(t) by differentiating the power series for f(t) term by term. In §4 we consider curves in R^n, and obtain a formula for arc length for a smooth curve. We show that a smooth curve with nonvanishing velocity can be parametrized by arc length. When this is applied to the unit circle in R^2 centered at the origin, one is looking at the standard definition of the trigonometric functions,

(3) C(t) = (cos t, sin t).

We provide a demonstration that

(4) C′(t) = (− sin t, cos t)

that is much shorter than what is usually presented in calculus texts. In §5 we move on to exponential functions. We derive the power series for the function e^t, introduced to solve the differential equation dx/dt = x. We then observe that with no extra work we get an analogous power series for e^{at}, with derivative a e^{at}, and that this works for complex a as well as for real a. It is a short step to realize that e^{it} is a unit speed curve tracing out the unit circle in C ≈ R^2, so comparison with (3) gives Euler’s formula

(5) e^{it} = cos t + i sin t.

That the derivative of e^{it} is i e^{it} provides a second proof of (4). Thus we have a unified treatment of the exponential and trigonometric functions, carried out further in §5, with details developed in numerous exercises. Section 6 extends the scope of the Riemann integral to a class of unbounded functions. Chapter 4 has several appendices: one proving the fundamental theorem of algebra, one showing that π^2 is irrational, one exploring in more detail than in §3 the power series for (1 − x)^b, and one describing an approximation to π pioneered by Archimedes.

Chapter 5 treats further topics in analysis. If time permits, the instructor might cover one or more of these at the end of the course. The topics center around approximating functions, via various infinite sequences or series. Topics include approximating continuous functions by polynomials, Fourier series, and Newton’s method for approximating the inverse of a given function.


Chapter I

Numbers

Introduction

One foundation for a course in analysis is a solid understanding of the real number system. Texts vary on just how to achieve this. Some take an axiomatic approach. In such an approach, the set of real numbers is hypothesized to have a number of properties, including various algebraic properties satisfied by addition and multiplication, order axioms, and, crucially, the completeness property, sometimes expressed as the supremum property.

This is not the approach we will take. Rather, we will start with a small list of axioms for the natural numbers (i.e., the positive integers), and then build the rest of the edifice logically, obtaining the basic properties of the real number system, particularly the completeness property, as theorems.

Sections 1–3 deal with the integers, starting in §1 with the set N of natural numbers. The development proceeds from axioms of G. Peano. The main one is the principle of mathematical induction. We deduce basic results about integer arithmetic from these axioms. A high point is the fundamental theorem of arithmetic, presented in §3.

Section 4 discusses the set Q of rational numbers, deriving the basic algebraic properties of these numbers from the results of §§1–3. Section 5 provides a bridge between §4 and §6. It deals with infinite sequences, including convergent sequences and “Cauchy sequences.” This prepares the way for §6, the main section of this chapter. Here we construct the set R of real numbers, as “ideal limits” of rational numbers. We extend basic algebraic results from Q to R. Furthermore, we establish the result that R is “complete,” i.e., Cauchy sequences always have limits in R. Section 7 provides examples of irrational numbers, such as √2, √3, √5, ...

Section 8 deals with cardinal numbers, an extension of the natural numbers N, that can be used to “count” elements of a set, not necessarily finite. For example, N is a “countably” infinite set, and so is Q. We show that R is “uncountable,” and hence much larger than N or Q.

Section 9 returns to the real number line R, and establishes further metric properties of R and various subsets, with an emphasis on the notion of compactness. The completeness property established in §6 plays a crucial role here.

Section 10 introduces the set C of complex numbers and establishes basic algebraic and metric properties of C. While some introductory treatments of analysis avoid complex numbers, we embrace them, and consider their use in basic analysis too precious to omit.

Sections 9 and 10 also have material on continuous functions, defined on a subset of R or C, respectively. These results give a taste of further results to be developed in Chapter 3, which will be essential to material in Chapters 4 and 5.


1. Peano arithmetic

In Peano arithmetic, we assume we have a set N (the natural numbers). We assume given 0 ∉ N, and form Ñ = N ∪ {0}. We assume there is a map

(1.1) s : Ñ −→ N,

which is bijective. That is to say, for each k ∈ N, there is a j ∈ Ñ such that s(j) = k, so s is surjective; and furthermore, if s(j) = s(j′) then j = j′, so s is injective. The map s plays the role of “addition by 1,” as we will see below. The only other axiom of Peano arithmetic is that the principle of mathematical induction holds. In other words, if S ⊂ Ñ is a set with the properties

(1.2) 0 ∈ S, k ∈ S ⇒ s(k) ∈ S,

then S = Ñ.

Actually, applying the induction principle to S = {0} ∪ s(Ñ), we see that it suffices to assume that s in (1.1) is injective; the induction principle ensures that it is surjective.

We define addition x + y, for x, y ∈ Ñ, inductively on y, by

(1.3) x + 0 = x, x + s(y) = s(x + y).

Next, we define multiplication x · y, inductively on y, by

(1.4) x · 0 = 0, x · s(y) = x · y + x.

We also define

(1.5) 1 = s(0).
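As an illustration (ours, not the text’s), the definitions (1.3)–(1.5) can be transcribed directly into Python, modeling the natural numbers together with 0 by the nonnegative integers and taking s(k) = k + 1; each function recurses on its second argument exactly as the definitions do.

```python
# A toy model of the Peano axioms: N together with 0 is modeled by the
# nonnegative integers, and the successor map is s(k) = k + 1.

def s(k):
    """The successor map of (1.1)."""
    return k + 1

def add(x, y):
    """Addition, following (1.3): x + 0 = x, x + s(y) = s(x + y)."""
    return x if y == 0 else s(add(x, y - 1))

def mul(x, y):
    """Multiplication, following (1.4): x*0 = 0, x*s(y) = x*y + x."""
    return 0 if y == 0 else add(mul(x, y - 1), x)

one = s(0)  # (1.5)
```

In this model, add(x, one) == s(x) for every x, which is exactly Proposition 1.1 below.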

We now establish the basic laws of arithmetic.

Proposition 1.1. x + 1 = s(x).

Proof. x + 1 = x + s(0) = s(x + 0) = s(x).

Proposition 1.2. 0 + x = x.

Proof. Use induction on x. First, 0 + 0 = 0. Now, assuming 0 + x = x, we have

0 + s(x) = s(0 + x) = s(x).


Proposition 1.3. s(y + x) = s(y) + x.

Proof. Use induction on x. First, s(y + 0) = s(y) = s(y) + 0. Next, we have

s(y + s(x)) = s(s(y + x)),

s(y) + s(x) = s(s(y) + x).

If s(y + x) = s(y) + x, the two right sides are equal, so the two left sides are equal, completing the induction.

Proposition 1.4. x + y = y + x.

Proof. Use induction on y. The case y = 0 follows from Proposition 1.2. Now, assuming x + y = y + x, for all x ∈ N, we must show s(y) has the same property. In fact,

x + s(y) = s(x + y) = s(y + x),

and by Proposition 1.3 the last quantity is equal to s(y) + x.

Proposition 1.5. (x + y) + z = x + (y + z).

Proof. Use induction on z. First, (x + y) + 0 = x + y = x + (y + 0). Now, assuming (x + y) + z = x + (y + z), for all x, y ∈ N, we must show s(z) has the same property. In fact,

(x + y) + s(z) = s((x + y) + z),

x + (y + s(z)) = x + s(y + z) = s(x + (y + z)),

and we perceive the desired identity.

Proposition 1.6. x · 1 = x.

Proof. We have

x · s(0) = x · 0 + x = 0 + x = x,

the last identity by Proposition 1.2.

Proposition 1.7. 0 · y = 0.

Proof. Use induction on y. First, 0 · 0 = 0. Next, assuming 0 · y = 0, we have 0 · s(y) = 0 · y + 0 = 0 + 0 = 0.

Proposition 1.8. s(x) · y = x · y + y.

Proof. Use induction on y. First, s(x) · 0 = 0, while x · 0 + 0 = 0 + 0 = 0. Next, assuming s(x) · y = x · y + y, for all x, we must show that s(y) has this property. In fact,

s(x) · s(y) = s(x) · y + s(x) = (x · y + y) + (x + 1),

x · s(y) + s(y) = (x · y + x) + (y + 1),

and the identity then follows via the commutative and associative laws of addition, Propositions 1.4 and 1.5.


Proposition 1.9. x · y = y · x.

Proof. Use induction on y. First, x · 0 = 0 = 0 · x, the latter identity by Proposition 1.7.

Next, assuming x · y = y · x for all x ∈ N, we must show that s(y) has the same property. In fact,

x · s(y) = x · y + x = y · x + x,

s(y) · x = y · x + x,

the last identity by Proposition 1.8.

Proposition 1.10. (x + y) · z = x · z + y · z.

Proof. Use induction on z. First, the identity clearly holds for z = 0. Next, assuming it holds for z (for all x, y ∈ N), we must show it holds for s(z). In fact,

(x + y) · s(z) = (x + y) · z + (x + y) = (x · z + y · z) + (x + y),

x · s(z) + y · s(z) = (x · z + x) + (y · z + y),

and the desired identity follows from the commutative and associative laws of addition.

Proposition 1.11. (x · y) · z = x · (y · z).

Proof. Use induction on z. First, the identity clearly holds for z = 0. Next, assuming it holds for z (for all x, y ∈ N), we have

(x · y) · s(z) = (x · y) · z + x · y,

while

x · (y · s(z)) = x · (y · z + y) = x · (y · z) + x · y,

the last identity by Proposition 1.10. These observations yield the desired identity.

We next demonstrate the cancellation law of addition:

Proposition 1.12. Given x,y,z ∈ N,

(1.6) x + y = z + y =⇒ x = z.

Proof. Use induction on y. If y = 0, (1.6) obviously holds. Assuming (1.6) holds for y, we must show that

(1.7) x + s(y) = z + s(y)

implies x = z. In fact, (1.7) is equivalent to s(x + y) = s(z + y). Since the map s is assumed to be one-to-one, this implies that x + y = z + y, so we are done.

We next define an order relation on Ñ. Given x, y ∈ Ñ, we say

(1.8) x < y ⇐⇒ y = x + u, for some u ∈ N.

Similarly there is a definition of x ≤ y. We have x ≤ y if and only if y ∈ R_x, where

(1.9) R_x = {x + u : u ∈ Ñ}.


Proposition 1.13. If x ≤ y and y ≤ x then x = y.

Proof. The hypotheses imply

(1.10) y = x + u, x = y + v, u, v ∈ Ñ.

Hence x = x + u + v, so, by Proposition 1.12, u + v = 0. Now, if v ≠ 0, then v = s(w), so u + v = s(u + w) ∈ N. Thus v = 0, and u = 0.

Proposition 1.14. Given x, y ∈ Ñ, either

(1.11) x < y, or x = y, or y < x,

and no two can hold.

Proof. That no two of (1.11) can hold follows from Proposition 1.13. To show that one must hold, we want to show that

(1.12) y ∉ R_x =⇒ y < x.

To do this, use induction on y. If 0 ∉ R_x, then x ≠ 0, so x ∈ N, and hence x = 0 + x shows that 0 < x. Now, assuming that y has the property (1.12), we must show that s(y) has this property.

So assume s(y) ∉ R_z. Since R_0 = Ñ, we deduce that z ≠ 0, hence z ∈ N, hence z = s(x) for some x. But

s(y) ∉ R_{s(x)} ⇐⇒ y ∉ R_x.

The inductive hypothesis gives x = y + u, u ∈ N, hence s(x) = s(y) + u, and we are done.

We can now establish the cancellation law for multiplication.

Proposition 1.15. Given x, y, z ∈ N,

(1.13) x · y = x · z, x ≠ 0 =⇒ y = z.

Proof. If y ≠ z, then either y < z or z < y. Suppose y < z, i.e., z = y + u, u ∈ N. Then the hypotheses of (1.13) imply

x · y = x · y + x · u, x ≠ 0,

hence, by Proposition 1.12,

(1.14) x · u = 0, x ≠ 0.

We thus need to show that (1.14) implies u = 0. In fact, if not, then we can write u = s(w), and x = s(a), with w, a ∈ Ñ, and we have

(1.15) x · u = x · w + x = x · w + s(a) = s(x · w + a) ∈ N.

This contradicts (1.14), so we are done.

Remark. Note that (1.15) implies

(1.16) x, y ∈ N =⇒ x · y ∈ N.

We next establish the following variant of the principle of induction, called the well-ordering property of Ñ.


Proposition 1.16. If S ⊂ Ñ is nonempty, then S contains a smallest element.

Proof. Suppose S contains no smallest element. Then 0 ∉ S. Let

(1.17) T = {x ∈ Ñ : x < y, ∀ y ∈ S}.

Then 0 ∈ T. We claim that

(1.18) x ∈ T =⇒ s(x) ∈ T.

Indeed, suppose x ∈ T, so x < y for all y ∈ S. If s(x) ∉ T, we have s(x) ≥ y_0 for some y_0 ∈ S. Now, using Proposition 1.13, one can show that

(1.19) x < y_0, s(x) ≥ y_0 =⇒ s(x) = y_0.

In turn, from this one can deduce that y_0 must be the smallest element of S. Thus, if S has no smallest element, (1.18) must hold. The induction principle then implies that T = Ñ, which implies S is empty.

Exercises

Given n ∈ N, we define ∑_{k=1}^n a_k inductively, as follows.

(1.20) ∑_{k=1}^1 a_k = a_1, ∑_{k=1}^{n+1} a_k = (∑_{k=1}^n a_k) + a_{n+1}.

Use the principle of induction to establish the following identities.

1. 2 ∑_{k=1}^n k = n(n + 1).

2. 6 ∑_{k=1}^n k^2 = n(n + 1)(2n + 1).

3. (a − 1) ∑_{k=1}^n a^k = a^{n+1} − a, if a ≠ 1.

In Exercise 3, we define a^n inductively by a^1 = a, a^{n+1} = a^n · a. We also set a^0 = 1 if a ∈ N, and ∑_{k=0}^n a_k = a_0 + ∑_{k=1}^n a_k. Verify that

4. (a − 1) ∑_{k=0}^n a^k = a^{n+1} − 1, if a ≠ 1.

5. Given k ∈ N, show that

2^k ≥ 2k,

with strict inequality for k > 1.

6. Show that, for x, x′, y, y′ ∈ N,

x < x′, y ≤ y′ ⇒ x · y < x′ · y′ and x + y < x′ + y′.
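The summation identities above can be spot-checked numerically before attempting the inductive proofs; the following sketch (ours, not the text’s) simply tests them for small n.

```python
def check_identities(n_max=25, a=3):
    """Numerically verify the exercise identities 1-4 for n = 1..n_max.
    This is a check, not a proof: the exercises ask for induction."""
    for n in range(1, n_max + 1):
        ks = range(1, n + 1)
        assert 2 * sum(ks) == n * (n + 1)                                # 1
        assert 6 * sum(k * k for k in ks) == n * (n + 1) * (2 * n + 1)   # 2
        assert (a - 1) * sum(a ** k for k in ks) == a ** (n + 1) - a     # 3
        assert (a - 1) * sum(a ** k for k in range(n + 1)) == a ** (n + 1) - 1  # 4
    return True
```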


2. The integers

An integer is thought of as having the form x − a, with x, a ∈ N. To be more formal, we will define an element of Z as an equivalence class of ordered pairs (x, a), x, a ∈ N, where we define

(2.1) (x, a) ∼ (y, b) ⇐⇒ x + b = y + a.

We claim (2.1) is an equivalence relation. In general, an equivalence relation on a set S is a specification s ∼ t for certain s, t ∈ S, which satisfies the following three conditions.

(a) Reflexive. s ∼ s, ∀ s ∈ S.

(b) Symmetric. s ∼ t ⇐⇒ t ∼ s.

(c) Transitive. s ∼ t, t ∼ u =⇒ s ∼ u.

We will encounter various equivalence relations in this and subsequent sections. Generally, (a) and (b) are quite easy to verify, and we will be content with verifying (c).

Proposition 2.1. The relation (2.1) is an equivalence relation.

Proof. We need to check that

(2.2) (x, a) ∼ (y, b), (y, b) ∼ (z, c) =⇒ (x, a) ∼ (z, c),

i.e., that, for x,y,z,a,b,c ∈ N,

(2.3) x + b = y + a, y + c = z + b =⇒ x + c = z + a.

In fact, the hypotheses of (2.3), and the results of §1, imply

(x + c) + (y + b) = (z + a) + (y + b),

and the conclusion of (2.3) then follows from the cancellation property, Proposition 1.12.

Let us denote the equivalence class containing (x, a) by [(x, a)]. We then define addition and multiplication in Z to satisfy

(2.4) [(x, a)] + [(y, b)] = [(x, a) + (y, b)], [(x, a)] · [(y, b)] = [(x, a) · (y, b)],

(x, a) + (y, b) = (x + y, a + b), (x, a) · (y, b) = (xy + ab, ay + xb).
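As a sketch (our code, not the text’s), the relation (2.1) and the operations (2.4) act directly on pairs, without ever forming x − a:

```python
def equiv(p, q):
    """(2.1): (x, a) ~ (y, b) iff x + b = y + a."""
    (x, a), (y, b) = p, q
    return x + b == y + a

def add_pairs(p, q):
    """(2.4): (x, a) + (y, b) = (x + y, a + b)."""
    (x, a), (y, b) = p, q
    return (x + y, a + b)

def mul_pairs(p, q):
    """(2.4): (x, a) . (y, b) = (xy + ab, ay + xb)."""
    (x, a), (y, b) = p, q
    return (x * y + a * b, a * y + x * b)
```

For example, (2, 5) and (3, 6) both represent the integer −3, and equiv((2, 5), (3, 6)) is True; mul_pairs((2, 5), (4, 1)) is equivalent to (0, 9), reflecting (−3) · 3 = −9.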

To see that these operations are well defined, we need:


Proposition 2.2. If (x, a) ∼ (x′, a′) and (y, b) ∼ (y′, b′), then

(2.5) (x, a) + (y, b) ∼ (x′, a′) + (y′, b′),

and

(2.6) (x, a) · (y, b) ∼ (x′, a′) · (y′, b′).

Proof. The hypotheses say

(2.7) x + a′ = x′ + a, y + b′ = y′ + b.

The conclusions follow from results of §1. In more detail, adding the two identities in (2.7) gives

x + a′ + y + b′ = x′ + a + y′ + b,

and rearranging, using the commutative and associative laws of addition, yields

(x + y) + (a′ + b′) = (x′ + y′) + (a + b),

implying (2.5). The task of proving (2.6) is simplified by going through the intermediate step

(2.8) (x, a) · (y, b) ∼ (x′, a′) · (y, b).

If x′ > x, so x′ = x + u, u ∈ N, then also a′ = a + u, and our task is to prove

(xy + ab, ay + xb) ∼ (xy + uy + ab + ub, ay + uy + xb + ub),

which is readily done. Having (2.8), we apply similar reasoning to get

(x′, a′) · (y, b) ∼ (x′, a′) · (y′, b′),

and then (2.6) follows by transitivity.

Similarly, it is routine to verify the basic commutative, associative, etc. laws incorporated in the next proposition. To formulate the results, set

(2.9) m = [(x, a)], n = [(y, b)], k = [(z, c)] ∈ Z.

Also, define

(2.10) 0 = [(0, 0)], 1 = [(1, 0)],

and

(2.11) −m = [(a, x)].


Proposition 2.3. We have

(2.12)

m + n = n + m,

(m + n) + k = m + (n + k),

m + 0 = m,

m + (−m) = 0,

mn = nm,

m(nk) = (mn)k,

m · 1 = m,

m · 0 = 0,

m · (−1) = −m,

m · (n + k) = m · n + m · k.

To give an example of a demonstration of these results, the identity mn = nm is equivalent to

(xy + ab, ay + xb) ∼ (yx + ba, bx + ya).

In fact, commutative laws for addition and multiplication in N imply xy + ab = yx + ba and ay + xb = bx + ya. Verification of the other identities in (2.12) is left to the reader.

We next establish the cancellation law for addition in Z.

Proposition 2.4. Given m, n, k ∈ Z,

(2.13) m + n = k + n =⇒ m = k.

Proof. We give two proofs. For one, we can add −n to both sides and use the results of Proposition 2.3. Alternatively, we can write the hypotheses of (2.13) as

x + y + c + b = z + y + a + b

and use Proposition 1.12 to deduce that x + c = z + a.

Note that it is reasonable to set

(2.14) m − n = m + (−n).

This defines subtraction on Z.

There is a natural injection

(2.15) N → Z, x → [(x, 0)],

whose image we identify with N. Note that the map (2.15) preserves addition and multiplication. There is also an injection x → [(0, x)], whose image we identify with −N.


Proposition 2.5. We have a disjoint union:

(2.16) Z = N ∪ {0} ∪ (−N).

Proof. Suppose m ∈ Z; write m = [(x, a)]. By Proposition 1.14, either

a < x, or x = a, or x < a.

In these three cases,

x = a + u, u ∈ N, or x = a, or a = x + v, v ∈ N.

Then, either

(x, a) ∼ (u, 0), or (x, a) ∼ (0, 0), or (x, a) ∼ (0, v).

We define an order on Z by:

(2.17) m < n ⇐⇒ n − m ∈ N.

We then have:

Corollary 2.6. Given m, n ∈ Z, either

(2.18) m < n, or m = n, or n < m,

and no two can hold.

The map (2.15) is seen to preserve the order relations.

Another consequence of (2.16) is the following.

Proposition 2.7. If m, n ∈ Z and m · n = 0, then either m = 0 or n = 0.

Proof. Suppose m ≠ 0 and n ≠ 0. We have four cases:

m > 0, n > 0 =⇒ mn > 0,

m < 0, n < 0 =⇒ mn = (−m)(−n) > 0,

m > 0, n < 0 =⇒ mn = −m(−n) < 0,

m < 0, n > 0 =⇒ mn = −(−m)n < 0,

the first by (1.16). This finishes the proof.

Using Proposition 2.7, we have the following cancellation law for multiplication in Z.


Proposition 2.8. Given m, n, k ∈ Z,

(2.19) mk = nk, k ≠ 0 =⇒ m = n.

Proof. First, mk = nk ⇒ mk − nk = 0. Now

mk − nk = (m − n)k.

See Exercise 3 below. Hence

mk = nk =⇒ (m − n)k = 0.

Given k ≠ 0, Proposition 2.7 implies m − n = 0. Hence m = n.

Exercises

1. Verify Proposition 2.3.

2. We define ∑_{k=1}^n a_k as in (1.20), this time with a_k ∈ Z. We also define a^k inductively, as in the exercises for §1, with a^0 = 1 if a ≠ 0. Use the principle of induction to establish the identity

∑_{k=1}^n (−1)^{k−1} k = −m if n = 2m, and = m + 1 if n = 2m + 1.
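Before proving the identity of Exercise 2 by induction, one can spot-check it numerically (a check only, in our notation):

```python
def alt_sum(n):
    """Left side of Exercise 2: sum of (-1)^(k-1) * k for k = 1, ..., n."""
    return sum((-1) ** (k - 1) * k for k in range(1, n + 1))

def closed_form(n):
    """Claimed right side: -m if n = 2m, and m + 1 if n = 2m + 1."""
    m = n // 2
    return -m if n % 2 == 0 else m + 1
```

For instance, alt_sum(5) = 1 − 2 + 3 − 4 + 5 = 3, matching closed_form(5).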

3. Show that, if m, n, k ∈ Z,

−(nk) = (−n)k, and mk − nk = (m − n)k.

4. Deduce the following from Proposition 1.16. Let S ⊂ Z be nonempty and assume there exists m ∈ Z such that m < n for all n ∈ S. Then S has a smallest element.

5. Show that Z has no smallest element.


3. Prime factorization and the fundamental theorem of arithmetic

Let x ∈ N. We say x is composite if one can write

(3.1) x = ab, a, b ∈ N,

with neither a nor b equal to 1. If x ≠ 1 is not composite, it is said to be prime. If (3.1) holds, we say a|x (and that b|x), or that a is a divisor of x. Given x ∈ N, x > 1, set

(3.2) D_x = {a ∈ N : a|x, a > 1}.

Thus x ∈ D_x, so D_x is non-empty. By Proposition 1.16, D_x contains a smallest element, say p_1. Clearly p_1 is a prime. Set

(3.3) x = p_1 x_1, x_1 ∈ N, x_1 < x.

The same construction applies to x_1, which is > 1 unless x = p_1. Hence we have either x = p_1 or

(3.4) x_1 = p_2 x_2, p_2 prime, x_2 < x_1.

Continue this process, passing from x_j to x_{j+1} as long as x_j is not prime. The set S of such x_j ∈ N has a smallest element, say x_{µ−1} = p_µ, and we have

(3.5) x = p_1 p_2 · · · p_µ, p_j prime.
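The construction of (3.3)–(3.5) — repeatedly splitting off the smallest divisor > 1, which is automatically prime — is an algorithm, and can be sketched as follows (our code, not the text’s):

```python
def factor(x):
    """Prime factorization of x > 1 as a nondecreasing list, obtained by
    repeatedly removing the smallest element of D_x (cf. (3.2)-(3.5))."""
    primes = []
    while x > 1:
        p = 2
        while x % p != 0:
            p += 1          # first divisor > 1 reached; it must be prime
        primes.append(p)    # p plays the role of p_1 for the current x
        x //= p             # pass from x to x_1 = x / p
    return primes
```

For example, factor(60) returns [2, 2, 3, 5].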

This is part of the Fundamental Theorem of Arithmetic:

Theorem 3.1. Given x ∈ N, x ≠ 1, there is a unique product expansion

(3.6) x = p_1 · · · p_µ,

where p_1 ≤ · · · ≤ p_µ are primes.

Only uniqueness remains to be established. This follows from:

Proposition 3.2. Assume a, b ∈ N, and p ∈ N is prime. Then

(3.7) p|ab =⇒ p|a or p|b.

We will deduce this from:


Proposition 3.3. If p ∈ N is prime and a ∈ N is not a multiple of p, or more generally if p, a ∈ N have no common divisors > 1, then there exist m, n ∈ Z such that

(3.8) ma + np = 1.

Proof of Proposition 3.2. Assume p is a prime which does not divide a. Pick m, n such that (3.8) holds. Now, multiply (3.8) by b, to get

mab + npb = b.

Thus, if p|ab, i.e., ab = pk, we have

p(mk + nb) = b,

so p|b, as desired.

To prove Proposition 3.3, let us set

(3.9) Γ = {ma + np : m, n ∈ Z}.

Clearly Γ satisfies the following criterion:

Definition. A nonempty subset Γ ⊂ Z is a subgroup of Z provided

(3.10) a, b ∈ Γ =⇒ a + b, a − b ∈ Γ.

Proposition 3.4. If Γ ⊂ Z is a subgroup, then either Γ = {0}, or there exists x ∈ N such that

(3.11) Γ = {mx : m ∈ Z}.

Proof. Note that n ∈ Γ ⇔ −n ∈ Γ, so, with Σ = Γ ∩ N, we have a disjoint union

Γ = Σ ∪ {0} ∪ (−Σ).

If Σ ≠ ∅, let x be its smallest element. Then we want to establish (3.11), so set Γ_0 = {mx : m ∈ Z}. Clearly Γ_0 ⊂ Γ. Similarly, set Σ_0 = {mx : m ∈ N} = Γ_0 ∩ N. We want to show that Σ_0 = Σ. If y ∈ Σ \ Σ_0, then we can pick m_0 ∈ N such that

m_0 x < y < (m_0 + 1)x,

and hence

y − m_0 x ∈ Σ

is smaller than x. This contradiction proves Proposition 3.4.


Proof of Proposition 3.3. Taking Γ as in (3.9), pick x ∈ N such that (3.11) holds. Since a ∈ Γ and p ∈ Γ, we have

a = m_0 x, p = m_1 x

for some m_j ∈ Z. The assumption that a and p have no common divisor > 1 implies x = 1. We conclude that 1 ∈ Γ, so (3.8) holds.
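Proposition 3.3 is also constructive: the extended Euclidean algorithm (standard, though not developed in this section) produces m and n with ma + np = gcd(a, p). A sketch, in our notation:

```python
def bezout(a, p):
    """Return (g, m, n) with g = gcd(a, p) and m*a + n*p == g.
    When a and p have no common divisor > 1, g = 1 and (3.8) holds."""
    old_r, r = a, p
    old_m, m = 1, 0
    old_n, n = 0, 1
    while r != 0:
        q = old_r // r
        old_r, r = r, old_r - q * r   # Euclidean step on the remainders
        old_m, m = m, old_m - q * m   # carry the coefficient of a along
        old_n, n = n, old_n - q * n   # carry the coefficient of p along
    return old_r, old_m, old_n
```

For a = 10, p = 7 this yields m = −2, n = 3, and indeed (−2)·10 + 3·7 = 1.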

Exercises

1. Prove that there are infinitely many primes.

Hint. If p_1, . . . , p_m is a complete list of primes, consider

x = p_1 · · · p_m + 1.

What are its prime factors?

2. Referring to (3.10), show that a nonempty subset Γ ⊂ Z is a subgroup of Z provided

(3.12) a, b ∈ Γ =⇒ a − b ∈ Γ.

Hint. a ∈ Γ ⇒ 0 = a − a ∈ Γ ⇒ −a = 0 − a ∈ Γ, given (3.12).

3. Let n ∈ N be a 12-digit integer. Show that if n is not prime, then it must be divisible by a prime p < 10^6.

4. Determine whether the following number is prime:

(3.13) 201367.

Hint. This is for the student who can use a computer.

5. Find the smallest prime larger than the number in (3.13). Hint. Same as above.
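For Exercises 3–5, trial division is enough: by the reasoning of Exercise 3, if n is composite it has a prime divisor p with p · p ≤ n. A sketch (ours), which leaves the answers to the reader:

```python
def is_prime(n):
    """Trial division: n > 1 is prime iff no d with d*d <= n divides n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def next_prime(n):
    """Smallest prime strictly greater than n (for Exercise 5)."""
    n += 1
    while not is_prime(n):
        n += 1
    return n
```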


4. The rational numbers

A rational number is thought of as having the form m/n, with m, n ∈ Z, n ≠ 0. Thus, we will define an element of Q as an equivalence class of ordered pairs m/n, m ∈ Z, n ∈ Z \ {0}, where we define

(4.1) m/n ∼ a/b ⇐⇒ mb = an.

Proposition 4.1. This is an equivalence relation.

Proof. We need to check that

(4.2) m/n ∼ a/b, a/b ∼ c/d =⇒ m/n ∼ c/d,

i.e., that, for m, a, c ∈ Z, n, b, d ∈ Z \ {0},

(4.3) mb = an, ad = cb =⇒ md = cn.

Now the hypotheses of (4.3) imply (mb)(ad) = (an)(cb), hence

(md)(ab) = (cn)(ab).

We are assuming b ≠ 0. If also a ≠ 0, then ab ≠ 0, and the conclusion of (4.3) follows from the cancellation property, Proposition 2.8. On the other hand, if a = 0, then m/n ∼ a/b ⇒ mb = 0 ⇒ m = 0 (since b ≠ 0), and similarly a/b ∼ c/d ⇒ cb = 0 ⇒ c = 0, so the desired implication also holds in that case.

We will (temporarily) denote the equivalence class containing m/n by [m/n]. We then define addition and multiplication in Q to satisfy

(4.4) [m/n] + [a/b] = [(m/n) + (a/b)], [m/n] · [a/b] = [(m/n) · (a/b)],

(m/n) + (a/b) = (mb + na)/(nb), (m/n) · (a/b) = (ma)/(nb).

To see that these operations are well defined, we need:

Proposition 4.2. If m/n ∼ m′/n′ and a/b ∼ a′/b′, then

(4.4A) (m/n) + (a/b) ∼ (m′/n′) + (a′/b′),

and

(4.4B) (m/n) · (a/b) ∼ (m′/n′) · (a′/b′).


Proof. The hypotheses say

(4.4C) mn′ = m′n, ab′ = a′b.

The conclusions follow from the results of §2. In more detail, multiplying the two identities in (4.4C) yields

(ma)(n′b′) = (m′a′)(nb),

which implies (4.4B). To prove (4.4A), it is convenient to establish the intermediate step

(4.4D) (m/n) + (a/b) ∼ (m′/n′) + (a/b).

This is equivalent to

(mb + na)/(nb) ∼ (m′b + n′a)/(n′b),

hence to

(mb + na)n′b = (m′b + n′a)nb,

or to

mn′bb + nn′ab = m′nbb + nn′ab.

This in turn follows readily from (4.4C). Having (4.4D), we can use a similar argument to establish that

(m′/n′) + (a/b) ∼ (m′/n′) + (a′/b′),

and then (4.4A) follows by transitivity of ∼.

From now on, we drop the brackets, simply denoting the equivalence class of m/n by m/n, and writing (4.1) as m/n = a/b. We also may denote an element of Q by a single letter, e.g., x = m/n. There is an injection

(4.5) Z → Q, m → m/1,

whose image we identify with Z. This map preserves addition and multiplication. We define

(4.6) −(m/n) = (−m)/n,

and, if x = m/n ≠ 0 (i.e., m ≠ 0 as well as n ≠ 0), we define

(4.7) x−1 = n/m.

The results stated in the following proposition are routine consequences of the results of §2.


Proposition 4.3. Given x, y, z ∈ Q, we have

x + y = y + x,

(x + y) + z = x + (y + z),

x + 0 = x,

x + (−x) = 0,

x · y = y · x,

(x · y) · z = x · (y · z),

x · 1 = x,

x · 0 = 0,

x · (−1) = −x,

x · (y + z) = x · y + x · z.

Furthermore,

x ≠ 0 =⇒ x · x−1 = 1.

For example, if x = m/n, y = a/b with m, n, a, b ∈ Z, n, b ≠ 0, the identity x · y = y · x is equivalent to (ma)/(nb) ∼ (am)/(bn). In fact, the identities ma = am and nb = bn follow from Proposition 2.3. We leave the rest of Proposition 4.3 to the reader.

We also have cancellation laws:

Proposition 4.4. Given x, y, z ∈ Q,

(4.8) x + y = z + y =⇒ x = z.

Also,

(4.9) xy = zy, y ≠ 0 =⇒ x = z.

Proof. To get (4.8), add −y to both sides of x + y = z + y and use the results of Proposition 4.3. To get (4.9), multiply both sides of x · y = z · y by y−1.

It is natural to define

(4.10) x − y = x + (−y),

and, if y ≠ 0,

(4.11) x/y = x · y−1.

We now define the order relation on Q. Set

(4.12) Q+ = {m/n : mn > 0},


where, in (4.12), we use the order relation on Z, discussed in §2. This is well defined. In fact, if m/n = m′/n′, then mn′ = m′n, hence (mn)(m′n′) = (mn′)^2, and therefore mn > 0 ⇔ m′n′ > 0. Results of §2 imply that

(4.13) Q = Q+ ∪ {0} ∪ (−Q+)

is a disjoint union, where −Q+ = {−x : x ∈ Q+}. Also, clearly

(4.14) x, y ∈ Q+ =⇒ x + y, xy, x/y ∈ Q+.

We define

(4.15) x < y ⇐⇒ y − x ∈ Q+,

and we have, for any x, y ∈ Q, either

(4.16) x < y, or x = y, or y < x,

and no two can hold. The map (4.5) is seen to preserve the order relations. In light of (4.14), we see that

(4.17) given x, y > 0, x < y ⇔ x/y < 1 ⇔ 1/y < 1/x.

As usual, we say x ≤ y provided either x < y or x = y. Similarly there are natural definitions of x > y and of x ≥ y.

The following result implies that Q has the Archimedean property.

Proposition 4.5. Given x ∈ Q, there exists k ∈ Z such that

(4.18) k − 1 < x ≤ k.

Proof. It suffices to prove (4.18) assuming x ∈ Q+; otherwise, work with −x (and make a few minor adjustments). Say x = m/n, m, n ∈ N. Then

S = {ℓ ∈ N : ℓ ≥ x}

contains m, hence is nonempty. By Proposition 1.16, S has a smallest element; call it k. Then k ≥ x. We cannot have k − 1 ≥ x, for then k − 1 would belong to S. Hence (4.18) holds.
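The proof is effective: the smallest element of S can be found by direct search, using only integer arithmetic. A small illustration (our code, not the text's):

```python
def archimedean_k(m, n):
    """Smallest k in N with k >= m/n, for m, n in N; then (4.18) holds:
    k - 1 < m/n <= k.  The comparison k >= m/n is done as k*n >= m."""
    k = 1
    while k * n < m:
        k += 1
    return k

k = archimedean_k(7, 3)            # x = 7/3
assert k == 3
assert (k - 1) * 3 < 7 <= k * 3    # k - 1 < x <= k
```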

Exercises

1. Verify Proposition 4.3.


2. Look at the exercise set for §1, and verify (3) and (4) for a ∈ Q, a ≠ 1, n ∈ N.

3. Here is another route to (4) of §1, i.e.,

(4.19) ∑_{k=0}^{n} a^k = (a^{n+1} − 1)/(a − 1), a ≠ 1.

Denote the left side of (4.19) by Sn(a). Multiply by a and show that

aSn(a) = Sn(a) + a^{n+1} − 1.

4. Given a ∈ Q, n ∈ N, define a^n as in Exercise 3 of §1. If a ≠ 0, set a^0 = 1 and a^{−n} = (a^{−1})^n, with a^{−1} defined as in (4.7). Show that, if a, b ∈ Q \ {0},

a^{j+k} = a^j a^k, a^{jk} = (a^j)^k, (ab)^j = a^j b^j, ∀ j, k ∈ Z.

5. Prove the following variant of Proposition 4.5.

Proposition 4.5A. Given ε ∈ Q, ε > 0, there exists n ∈ N such that ε > 1/n.

6. Work through the proof of the following.

Assertion. If x = m/n ∈ Q, then x^2 ≠ 2.

Hint. We can arrange that m and n have no common factors. Then

(m/n)^2 = 2 ⇒ m^2 = 2n^2 ⇒ m even (m = 2k) ⇒ 4k^2 = 2n^2 ⇒ n^2 = 2k^2 ⇒ n even.

Contradiction? (See Proposition 7.2 for a more general result.)

7. Given xj , yj ∈ Q, show that

x1 < x2, y1 ≤ y2 =⇒ x1 + y1 < x2 + y2.

Show that

0 < x1 < x2, 0 < y1 ≤ y2 =⇒ x1y1 < x2y2.


5. Sequences

In this section, we discuss infinite sequences. For now, we deal with sequences of rational numbers, but we will not explicitly state this restriction below. In fact, once the set of real numbers is constructed in §6, the results of this section will be seen to hold also for sequences of real numbers.

Definition. A sequence (aj) is said to converge to a limit a provided that, for any n ∈ N, there exists K(n) such that

(5.1) j ≥ K(n) =⇒ |aj − a| < 1/n.

We write aj → a, or a = lim aj, or perhaps a = lim_{j→∞} aj.

Here, we define the absolute value |x| of x by

(5.2) |x| = x if x ≥ 0, |x| = −x if x < 0.

The absolute value function has various simple properties, such as |xy| = |x| · |y|, whichfollow readily from the definition. One basic property is the triangle inequality:

(5.3) |x + y| ≤ |x| + |y|.

In fact, if either x and y are both positive or they are both negative, one has equality in (5.3). If x and y have opposite signs, then |x + y| ≤ max(|x|, |y|), which in turn is dominated by the right side of (5.3).

Proposition 5.1. If aj → a and bj → b, then

(5.4) aj + bj → a + b,

and

(5.5) ajbj → ab.

If furthermore, bj ≠ 0 for all j and b ≠ 0, then

(5.6) aj/bj → a/b.

Proof. To see (5.4), we have, by (5.3),

(5.7) |(aj + bj) − (a + b)| ≤ |aj − a| + |bj − b|.


To get (5.5), we have

(5.8) |ajbj − ab| = |(ajbj − abj) + (abj − ab)| ≤ |bj| · |aj − a| + |a| · |b − bj|.

The hypotheses imply |bj| ≤ B, for some B, and hence the criterion for convergence is readily verified. To get (5.6), we have

(5.9) |aj/bj − a/b| ≤ (1/(|bj| · |b|)) · (|b| · |a − aj| + |a| · |b − bj|).

The hypotheses imply 1/|bj| ≤ M for some M, so we also verify the criterion for convergence in this case.

We next define the concept of a Cauchy sequence.

Definition. A sequence (aj) is a Cauchy sequence provided that, for any n ∈ N, there exists K(n) such that

(5.10) j, k ≥ K(n) =⇒ |aj − ak| ≤ 1/n.

It is clear that any convergent sequence is Cauchy. On the other hand, we have:

Proposition 5.2. Each Cauchy sequence is bounded.

Proof. Take n = 1 in the definition above. Thus, if (aj) is Cauchy, there is a K such that j, k ≥ K ⇒ |aj − ak| ≤ 1. Hence, j ≥ K ⇒ |aj| ≤ |aK| + 1, so, for all j,

|aj| ≤ M, M = max{|a1|, . . . , |aK−1|, |aK| + 1}.

Now, the arguments proving Proposition 5.1 also establish:

Proposition 5.3. If (aj) and (bj) are Cauchy sequences, so are (aj + bj) and (ajbj). Furthermore, if, for all j, |bj| ≥ c for some c > 0, then (aj/bj) is Cauchy.

The following proposition is a bit deeper than the first three.

Proposition 5.4. If (aj) is bounded, i.e., |aj | ≤ M for all j, then it has a Cauchy subsequence.

Proof. We may as well assume M ∈ N. Now, either aj ∈ [0, M] for infinitely many j or aj ∈ [−M, 0] for infinitely many j. Let I1 be any one of these two intervals containing aj for infinitely many j, and pick j(1) such that aj(1) ∈ I1. Write I1 as the union of two closed intervals, of equal length, sharing only the midpoint of I1. Let I2 be any one of them with the property that aj ∈ I2 for infinitely many j, and pick j(2) > j(1) such that aj(2) ∈ I2.

Continue, picking Iν ⊂ Iν−1 ⊂ · · · ⊂ I1, of length M/2^{ν−1}, containing aj for infinitely many j, and picking j(ν) > j(ν − 1) > · · · > j(1) such that aj(ν) ∈ Iν. Setting bν = aj(ν), we see that (bν) is a Cauchy subsequence of (aj), since, for all k ∈ N,

|bν+k − bν| ≤ M/2^{ν−1}.
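The interval-halving in this proof can be simulated on a long finite prefix of a bounded sequence; "infinitely many j" is replaced by "the most remaining indices", which is enough to exhibit the shrinking intervals. A rough sketch (ours, not part of the text):

```python
def cauchy_subsequence(a, M, levels):
    """Interval halving as in the proof of Proposition 5.4, run on a finite
    prefix a[0..N-1] with |a_j| <= M.  At each stage keep the half interval
    holding the most remaining indices (a stand-in for "infinitely many"),
    and record one index from it, larger than the previous pick."""
    lo, hi = -M, M
    candidates = list(range(len(a)))
    picked = []
    for _ in range(levels):
        mid = (lo + hi) / 2
        left = [j for j in candidates if lo <= a[j] <= mid]
        right = [j for j in candidates if mid < a[j] <= hi]
        if len(left) >= len(right):
            candidates, hi = left, mid
        else:
            candidates, lo = right, mid
        last = picked[-1] if picked else -1
        picked.append(next(j for j in candidates if j > last))
    return picked

# A bounded, non-convergent sequence: a_j = (-1)^j (1 + 1/(j+1)), |a_j| <= 2.
a = [(-1) ** j * (1 + 1 / (j + 1)) for j in range(10000)]
idx = cauchy_subsequence(a, 2, 8)
assert idx == sorted(idx)
# successive picks lie in nested intervals of length 2M/2^(nu-1), as in the proof
assert abs(a[idx[-1]] - a[idx[-2]]) <= 2 * 2 / 2 ** 7
```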


Proposition 5.5. Each bounded monotone sequence (aj) is Cauchy.

Proof. To say (aj) is monotone is to say that either (aj) is increasing, i.e., aj ≤ aj+1 for all j, or that (aj) is decreasing, i.e., aj ≥ aj+1 for all j. For the sake of argument, assume (aj) is increasing.

By Proposition 5.4, there is a subsequence (bν) = (aj(ν)) which is Cauchy. Thus, given n ∈ N, there exists K(n) such that

(5.11) µ, ν ≥ K(n) =⇒ |aj(ν) − aj(µ)| < 1/n.

Now, if ν0 ≥ K(n) and k ≥ j ≥ j(ν0), pick ν1 such that j(ν1) ≥ k. Then

aj(ν0) ≤ aj ≤ ak ≤ aj(ν1),

so

(5.12) k ≥ j ≥ j(ν0) =⇒ |aj − ak| < 1/n.

We give a few simple but basic examples of convergent sequences.

Proposition 5.6. If |a| < 1, then a^j → 0.

Proof. Set b = |a|; it suffices to show that b^j → 0. Consider c = 1/b > 1, hence c = 1 + y, y > 0. We claim that

c^j = (1 + y)^j ≥ 1 + jy,

for all j ≥ 1. In fact, this clearly holds for j = 1, and if it holds for j = k, then

c^{k+1} ≥ (1 + y)(1 + ky) > 1 + (k + 1)y.

Hence, by induction, the estimate is established. Consequently,

b^j < 1/(jy),

so the appropriate analogue of (5.1) holds, with K(n) = Kn, for any integer K > 1/y.

Proposition 5.6 enables us to establish the following result on geometric series.

Proposition 5.7. If |x| < 1 and

aj = 1 + x + · · · + x^j,

then

aj → 1/(1 − x).


Proof. Note that xaj = x + x^2 + · · · + x^{j+1}, so (1 − x)aj = 1 − x^{j+1}, i.e.,

aj = (1 − x^{j+1})/(1 − x).

The conclusion follows from Proposition 5.6.

Note in particular that

(5.13) 0 < x < 1 =⇒ 1 + x + · · · + x^j < 1/(1 − x).
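A quick numerical check of Proposition 5.7 and the bound (5.13), with x = 1/2 (our sketch):

```python
# Partial sums a_j = 1 + x + ... + x^j of the geometric series.
def geom(x, j):
    return sum(x ** k for k in range(j + 1))

x = 0.5
limit = 1 / (1 - x)                     # = 2
for j in [5, 10, 20]:
    assert geom(x, j) < limit           # the bound (5.13)
# the exact error is 1/(1-x) - a_j = x^{j+1}/(1-x); for x = 1/2 every quantity
# here is a binary fraction, so the float computation is exact
assert limit - geom(x, 20) == x ** 21 / (1 - x)
```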

It is an important mathematical fact that not every Cauchy sequence of rational numbers has a rational number as limit. We give one example here. Consider the sequence

(5.14) aj = ∑_{ℓ=0}^{j} 1/ℓ!.

Then (aj) is increasing, and

a_{n+j} − a_n = ∑_{ℓ=n+1}^{n+j} 1/ℓ! ≤ (1/n!) (1/(n + 1) + 1/(n + 1)^2 + · · · + 1/(n + 1)^j),

since (n + 1)(n + 2) · · · (n + j) ≥ (n + 1)^j. Using (5.13), we have

(5.15) a_{n+j} − a_n ≤ (1/(n + 1)!) · 1/(1 − 1/(n + 1)) = (1/n!) · (1/n).

Hence (aj) is Cauchy. Taking n = 2, we see that

(5.16) j > 2 =⇒ 5/2 < aj < 11/4.

Proposition 5.8. The sequence (5.14) cannot converge to a rational number.

Proof. Assume aj → m/n with m, n ∈ N. By (5.16), we must have n > 2. Now, write

(5.17) m/n = ∑_{ℓ=0}^{n} 1/ℓ! + r, r = lim_{j→∞} (a_{n+j} − a_n).

Multiplying both sides of (5.17) by n! gives

(5.18) m(n − 1)! = A + r · n!

where

(5.19) A = ∑_{ℓ=0}^{n} n!/ℓ! ∈ N.


Thus the identity (5.18) forces r · n! ∈ N, while (5.15) implies

(5.20) 0 < r · n! ≤ 1/n.

This contradiction proves the proposition.
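The bounds (5.16) and the Cauchy estimate (5.15) are easy to check in exact rational arithmetic; a minimal sketch (ours, not the text's):

```python
from fractions import Fraction
import math

# Partial sums a_j = sum over l = 0..j of 1/l!, as in (5.14), exactly.
def a(j):
    return sum(Fraction(1, math.factorial(l)) for l in range(j + 1))

# The bounds (5.16): 5/2 < a_j < 11/4 for j > 2.
for j in [3, 5, 10]:
    assert Fraction(5, 2) < a(j) < Fraction(11, 4)

# The Cauchy estimate (5.15): a_{n+j} - a_n <= (1/n!)(1/n).
n = 4
assert a(n + 10) - a(n) <= Fraction(1, math.factorial(n) * n)
```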

Exercises

1. Show that

lim_{k→∞} k/2^k = 0,

and more generally for each n ∈ N,

lim_{k→∞} k^n/2^k = 0.

2. Show that

lim_{k→∞} 2^k/k! = 0,

and more generally for each n ∈ N,

lim_{k→∞} 2^{nk}/k! = 0.

The following exercises discuss continued fractions. We assume

(5.21) aj ∈ Q, aj ≥ 1, j = 1, 2, 3, . . . ,

and set

(5.22) f1 = a1, f2 = a1 + 1/a2, f3 = a1 + 1/(a2 + 1/a3), . . . .

Having fj, we obtain fj+1 by replacing aj by aj + 1/aj+1. In other words, with

(5.23) fj = ϕj(a1, . . . , aj),

given explicitly by (5.22) for j = 1, 2, 3, we have

(5.24) fj+1 = ϕj+1(a1, . . . , aj+1) = ϕj(a1, . . . , aj−1, aj + 1/aj+1).

3. Show that

f1 ≤ fj, ∀ j ≥ 2, and f2 ≥ fj, ∀ j ≥ 3.


Going further, show that

(5.25) f1 ≤ f3 ≤ f5 ≤ · · · ≤ f6 ≤ f4 ≤ f2.

4. If also a′j+1 ∈ Q, a′j+1 ≥ 1, show that

(5.26) ϕj+1(a1, . . . , aj, aj+1) − ϕj+1(a1, . . . , aj, a′j+1) = ϕj(a1, . . . , aj−1, bj) − ϕj(a1, . . . , aj−1, b′j),

with

(5.27) bj = aj + 1/aj+1, b′j = aj + 1/a′j+1, bj − b′j = 1/aj+1 − 1/a′j+1 = (a′j+1 − aj+1)/(aj+1 a′j+1).

Note that bj, b′j > 1. Iterating this, show that

(5.28) f2j − f2j+1 → 0, as j → ∞.

Deduce that (fj) is a Cauchy sequence.

5. Suppose a sequence (aj) has the property that there exist

r < 1, K ∈ N

such that

j ≥ K =⇒ |aj+1/aj| ≤ r.

Show that aj → 0 as j → ∞. How does this result apply to Exercises 1 and 2?

6. If (aj) satisfies the hypotheses of Exercise 5, show that there exists M < ∞ such that

∑_{j=1}^{k} |aj| ≤ M, ∀ k.

Remark. This yields the ratio test for infinite series.


6. The real numbers

We think of a real number as a quantity that can be specified by a process of approximation arbitrarily closely by rational numbers. Thus, we define an element of R as an equivalence class of Cauchy sequences of rational numbers, where we define

(6.1) (aj) ∼ (bj) ⇐⇒ aj − bj → 0.

Proposition 6.1. This is an equivalence relation.

Proof. This is a straightforward consequence of Proposition 5.1. In particular, to see that

(6.2) (aj) ∼ (bj), (bj) ∼ (cj) =⇒ (aj) ∼ (cj),

just use (5.4) of Proposition 5.1 to write

aj − bj → 0, bj − cj → 0 =⇒ aj − cj → 0.

We denote the equivalence class containing a Cauchy sequence (aj) by [(aj)]. We then define addition and multiplication on R to satisfy

(6.3)

[(aj)] + [(bj)] = [(aj + bj)],

[(aj)] · [(bj)] = [(ajbj)].

Proposition 5.3 states that the sequences (aj + bj) and (ajbj) are Cauchy if (aj) and (bj) are. To conclude that the operations in (6.3) are well defined, we need:

Proposition 6.2. If Cauchy sequences of rational numbers are given which satisfy (aj) ∼ (a′j) and (bj) ∼ (b′j), then

(6.4) (aj + bj) ∼ (a′j + b′j),

and

(6.5) (ajbj) ∼ (a′jb′j).

The proof is a straightforward variant of the proof of parts (5.4)–(5.5) in Proposition 5.1, with due account taken of Proposition 5.2. For example, ajbj − a′jb′j = ajbj − ajb′j + ajb′j − a′jb′j, and there are uniform bounds |aj| ≤ A, |b′j| ≤ B, so

|ajbj − a′jb′j| ≤ |aj| · |bj − b′j| + |aj − a′j| · |b′j| ≤ A|bj − b′j| + B|aj − a′j|.


There is a natural injection

(6.6) Q → R, a ↦ [(a, a, a, . . . )],

whose image we identify with Q. This map preserves addition and multiplication. If x = [(aj)], we define

(6.7) −x = [(−aj)].

For x ≠ 0, we define x−1 as follows. First, to say x ≠ 0 is to say there exists n ∈ N such that |aj| ≥ 1/n for infinitely many j. Since (aj) is Cauchy, this implies that there exists K such that |aj| ≥ 1/2n for all j ≥ K. Now, if we set αj = aK+j, we have (αj) ∼ (aj); we propose to set

(6.8) x−1 = [(1/αj)].

We claim that this is well defined. First, by Proposition 5.3, (1/αj) is Cauchy. Furthermore, if for such x we also have x = [(bj)], and we pick K so large that also |bj| ≥ 1/2n for all j ≥ K, and set βj = bK+j, we claim that

(6.9) (1/αj) ∼ (1/βj).

Indeed, we have

(6.10) |1/αj − 1/βj| ≤ |βj − αj|/(|αj| · |βj|) ≤ 4n^2 |βj − αj|,

so (6.9) holds.

It is now a straightforward exercise to verify the basic algebraic properties of addition and multiplication in R. We state the result.

Proposition 6.3. Given x, y,z ∈ R, all the algebraic properties stated in Proposition 4.3 hold.

For example, if x = [(aj)] and y = [(bj)], the identity xy = yx is equivalent to (ajbj) ∼ (bjaj). In fact, the identity ajbj = bjaj for aj, bj ∈ Q follows from Proposition 4.3. The rest of Proposition 6.3 is left to the reader.

As in (4.10)–(4.11), we define x − y = x + (−y) and, if y ≠ 0, x/y = x · y−1.

We now define an order relation on R. Take x ∈ R, x = [(aj)]. From the discussion above of x−1, we see that, if x ≠ 0, then one and only one of the following holds. Either, for some n, K ∈ N,

(6.11) j ≥ K =⇒ aj ≥ 1/2n,

or, for some n, K ∈ N,

(6.12) j ≥ K =⇒ aj ≤ −1/2n.


If (aj) ∼ (bj) and (6.11) holds for aj, it also holds for bj (perhaps with different n and K), and ditto for (6.12). If (6.11) holds, we say x ∈ R+ (and we say x > 0), and if (6.12) holds we say x ∈ R− (and we say x < 0). Clearly x > 0 if and only if −x < 0. It is also clear that the map Q → R in (6.6) preserves the order relation.

Thus we have the disjoint union

(6.13) R = R+ ∪ {0} ∪ R−, R− = −R+.

Also, clearly

(6.14) x, y ∈ R+ =⇒ x + y, xy ∈ R+.

As in (4.15), we define

(6.15) x < y ⇐⇒ y − x ∈ R+.

The following results are straightforward.

Proposition 6.4. For elements of R, we have

(6.16) x1 < y1, x2 < y2 =⇒ x1 + x2 < y1 + y2,

(6.17) x < y ⇐⇒ −y < −x,

(6.18) 0 < x < y, a > 0 =⇒ 0 < ax < ay,

(6.19) 0 < x < y =⇒ 0 < y−1 < x−1.

Proof. The results (6.16) and (6.18) follow from (6.14); consider, for example, a(y − x). The result (6.17) follows from (6.13). To prove (6.19), first we see that x > 0 implies x−1 > 0, as follows: if −x−1 > 0, the identity x · (−x−1) = −1 contradicts (6.14). As for the rest of (6.19), the hypotheses imply xy > 0, and multiplying both sides of x < y by a = (xy)−1 gives the result, by (6.18).

As in (5.2), define |x| by

(6.20) |x| = x if x ≥ 0, |x| = −x if x < 0.

It is straightforward to verify

(6.21) |xy| = |x| · |y|, |x + y| ≤ |x| + |y|.

We now show that R has the Archimedean property.


Proposition 6.5. Given x ∈ R, there exists k ∈ Z such that

(6.22) k − 1 < x ≤ k.

Proof. It suffices to prove (6.22) assuming x ∈ R+. Otherwise, work with −x. Say x = [(aj)], where (aj) is a Cauchy sequence of rational numbers. By Proposition 5.2, there exists M ∈ Q such that |aj| ≤ M for all j. By Proposition 4.5, we have M ≤ ℓ for some ℓ ∈ N. Hence the set S = {ℓ ∈ N : ℓ ≥ x} is nonempty. As in the proof of Proposition 4.5, taking k to be the smallest element of S gives (6.22).

Proposition 6.6. Given any real ε > 0, there exists n ∈ N such that ε > 1/n.

Proof. Using Proposition 6.5, pick n > 1/ε and apply (6.19). Alternatively, use the reasoning given above (6.8).

We are now ready to consider sequences of elements of R.

Definition. A sequence (xj) converges to x if and only if, for any n ∈ N, there exists K(n) such that

(6.23) j ≥ K(n) =⇒ |xj − x| < 1/n.

In this case, we write xj → x, or x = lim xj.

The sequence (xj) is Cauchy if and only if, for any n ∈ N, there exists K(n) such that

(6.24) j, k ≥ K(n) =⇒ |xj − xk| < 1/n.

We note that it is typical to phrase the definition above in terms of picking any real ε > 0 and demanding that, e.g., |xj − x| < ε, for large j. The equivalence of the two definitions follows from Proposition 6.6.

As in Proposition 5.2, we have that every Cauchy sequence is bounded.

It is clear that, if each xj ∈ Q, then the notion that (xj) is Cauchy given above coincides with that in §5. If also x ∈ Q, the notion that xj → x also coincides with that given in §5. Furthermore, if each aj ∈ Q, and x ∈ R, then

(6.25) aj → x ⇐⇒ x = [(aj)].

In fact, given x = [(aj)],

(6.26) (j, k ≥ K ⇒ |aj − ak| ≤ 1/n) =⇒ (j ≥ K ⇒ |aj − x| ≤ 1/n).

The proof of Proposition 5.1 extends to the present case, yielding:


Proposition 6.7. If xj → x and yj → y, then

(6.27) xj + yj → x + y,

and

(6.28) xjyj → xy.

If furthermore yj ≠ 0 for all j and y ≠ 0, then

xj/yj → x/y.

So far, statements made about R have emphasized similarities of its properties with corresponding properties of Q. The crucial difference between these two sets of numbers is given by the following result, known as the completeness property.

Theorem 6.8. If (xj) is a Cauchy sequence of real numbers, then there exists x ∈ R such that xj → x.

Proof. Take xj = [(ajℓ : ℓ ∈ N)] with ajℓ ∈ Q. Using (6.26), take aj,ℓ(j) = bj ∈ Q such that

(6.29) |xj − bj | ≤ 2−j .

Then (bj) is Cauchy, since |bj − bk| ≤ |xj − xk| + 2−j + 2−k. Now, let

(6.30) x = [(bj)].

It follows that

(6.31) |xj − x| ≤ |xj − bj | + |x − bj | ≤ 2−j + |x − bj |,

and hence xj → x.

If we combine Theorem 6.8 with the argument behind Proposition 5.4, we obtain the following important result, known as the Bolzano-Weierstrass Theorem.

Theorem 6.9. Each bounded sequence of real numbers has a convergent subsequence.

Proof. If |xj| ≤ M, the proof of Proposition 5.4 applies without change to show that (xj) has a Cauchy subsequence. By Theorem 6.8, that Cauchy subsequence converges.

Similarly, adding Theorem 6.8 to the argument behind Proposition 5.5 yields:

Proposition 6.10. Each bounded monotone sequence (xj) of real numbers converges.

A related property of R can be described in terms of the notion of the “supremum” of a set.


Definition. If S ⊂ R, one says that x ∈ R is an upper bound for S provided x ≥ s for all s ∈ S, and one says

(6.32) x = sup S

provided x is an upper bound for S and further x ≤ x′ whenever x′ is an upper bound for S.

For some sets, such as S = Z, there is no x ∈ R satisfying (6.32). However, there is the following result, known as the supremum property.

Proposition 6.11. If S is a nonempty subset of R that has an upper bound, then there is a real x = sup S.

Proof. We use an argument similar to the one in the proof of Proposition 5.4. Let x0 be an upper bound for S, pick s0 ∈ S, and consider

I0 = [s0, x0] = {y ∈ R : s0 ≤ y ≤ x0}.

If x0 = s0, then already x0 = sup S. Otherwise, I0 is an interval of nonzero length, L = x0 − s0. In that case, divide I0 into two equal intervals, having in common only the midpoint; say I0 = Il0 ∪ Ir0, where Ir0 lies to the right of Il0.

Let I1 = Ir0 if S ∩ Ir0 ≠ ∅, and otherwise let I1 = Il0. Let x1 be the right endpoint of I1, and pick s1 ∈ S ∩ I1. Note that x1 is also an upper bound for S.

Continue, constructing

Iν ⊂ Iν−1 ⊂ · · · ⊂ I0,

where Iν has length 2−ν L, such that the right endpoint xν of Iν satisfies

(6.33) xν ≥ s, ∀ s ∈ S,

and such that S ∩ Iν ≠ ∅, so there exist sν ∈ S such that

(6.34) xν − sν ≤ 2−ν L.

The sequence (xν) is bounded and monotone (decreasing) so, by Proposition 6.10, it converges; xν → x. By (6.33), we have x ≥ s for all s ∈ S, and by (6.34) we have x − sν ≤ 2−ν L. Hence x satisfies (6.32).

We turn to infinite series ∑_{k=0}^{∞} ak, with ak ∈ R. We say this series converges if and only if the sequence of partial sums

(6.35) Sn = ∑_{k=0}^{n} ak

converges:

(6.36) ∑_{k=0}^{∞} ak = A ⇐⇒ Sn → A as n → ∞.

The following is a useful condition guaranteeing convergence.


Proposition 6.12. The infinite series ∑_{k=0}^{∞} ak converges provided

(6.37) ∑_{k=0}^{∞} |ak| < ∞,

i.e., there exists B < ∞ such that ∑_{k=0}^{n} |ak| ≤ B for all n.

Proof. The triangle inequality (the second part of (6.21)) gives, for ℓ ≥ 1,

(6.38) |S_{n+ℓ} − S_n| = |∑_{k=n+1}^{n+ℓ} ak| ≤ ∑_{k=n+1}^{n+ℓ} |ak|,

and we claim this tends to 0 as n → ∞, uniformly in ℓ ≥ 1, provided (6.37) holds. In fact, if (6.37) holds, we can apply Proposition 6.10 to

(6.39) σn = ∑_{k=0}^{n} |ak|, σn increasing, σn ≤ B,

to deduce that there exists β ∈ R such that σn → β as n → ∞. Then

(6.40) |S_{n+ℓ} − S_n| ≤ σ_{n+ℓ} − σ_n ≤ β − σ_n → 0 as n → ∞.

Thus (6.37) ⇒ (S n) is Cauchy. Convergence follows, by Theorem 6.8.

When (6.37) holds, we say the series ∑_{k=0}^{∞} ak is absolutely convergent.

The following result on alternating series gives another sufficient condition for convergence.

Proposition 6.13. Assume ak > 0, ak ↘ 0. Then

(6.41) ∑_{k=0}^{∞} (−1)^k ak

is convergent.

Proof. Denote the partial sums by Sn, n ≥ 0. We see that, for m ∈ N,

(6.42) S_{2m+1} ≤ S_{2m+3} ≤ S_{2m+2} ≤ S_{2m}.

Iterating this, we have, as m → ∞,

(6.43) S_{2m} ↘ α, S_{2m+1} ↗ β, β ≤ α,


and

(6.44) S_{2m} − S_{2m+1} = a_{2m+1},

hence α = β, and convergence is established.

Here is an example:

∑_{k=0}^{∞} (−1)^k/(k + 1) = 1 − 1/2 + 1/3 − 1/4 + · · · is convergent.

This series is not absolutely convergent (cf. Exercise 6 below). For an evaluation of this series, see exercises in §5 of Chapter 4.

Exercises

1. Verify Proposition 6.3.

2. If S ⊂ R, we say that x ∈ R is a lower bound for S provided x ≤ s for all s ∈ S, and we say

(6.45) x = inf S,

provided x is a lower bound for S and further x ≥ x′ whenever x′ is a lower bound for S.

Mirroring Proposition 6.11, show that if S ⊂ R is a nonempty set that has a lower bound, then there is a real x = inf S.

3. Given a real number ξ ∈ (0, 1), show it has an infinite decimal expansion, i.e., there exist bk ∈ {0, 1, . . . , 9} such that

(6.46) ξ = ∑_{k=1}^{∞} bk · 10−k.

Hint. Start by breaking [0, 1] into ten subintervals of equal length, and picking one to which ξ belongs.
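The hint's subdivision amounts to repeatedly multiplying by 10 and taking integer parts; a minimal sketch (ours), run in exact arithmetic:

```python
from fractions import Fraction

def decimal_digits(xi, n):
    """First n digits b_1, ..., b_n of xi in (0,1): at each step, 10*xi
    selects which of the ten subintervals xi lies in."""
    digits = []
    for _ in range(n):
        xi *= 10
        b = int(xi)
        digits.append(b)
        xi -= b
    return digits

assert decimal_digits(Fraction(1, 7), 6) == [1, 4, 2, 8, 5, 7]
```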

4. Show that if 0 < x < 1,

∑_{k=0}^{∞} x^k = 1/(1 − x) < ∞.

Hint. As in (4.19), we have

∑_{k=0}^{n} x^k = (1 − x^{n+1})/(1 − x), x ≠ 1.


5. Assume ak > 0 and ak ↘ 0. Show that

(6.47) ∑_{k=1}^{∞} ak < ∞ ⇐⇒ ∑_{k=0}^{∞} bk < ∞,

where

(6.48) bk = 2^k a_{2^k}.

6. Deduce from Exercise 5 that the harmonic series 1 + 1/2 + 1/3 + 1/4 + · · · diverges, i.e.,

(6.49) ∑_{k=1}^{∞} 1/k = ∞.

7. Deduce from Exercises 4–5 that

(6.50) p > 1 =⇒ ∑_{k=1}^{∞} 1/k^p < ∞.

For now, we take p ∈ N. We will see later that (6.50) is meaningful, and true, for p ∈ R, p > 1.

8. Given a, b ∈ R \ {0}, k ∈ Z, define a^k as in Exercise 4 of §4. Show that

a^{j+k} = a^j a^k, a^{jk} = (a^j)^k, (ab)^j = a^j b^j, ∀ j, k ∈ Z.

9. Given k ∈ N, show that, for xj ∈ R,

xj → x =⇒ xj^k → x^k.

Hint. Use Proposition 6.7.

10. Given xj, x, y ∈ R, show that

xj ≥ y ∀ j, xj → x =⇒ x ≥ y.

11. Given the alternating series ∑ (−1)^k ak as in Proposition 6.13 (with ak ↘ 0), with sum S, show that, for each N,

∑_{k=0}^{N} (−1)^k ak = S + rN, |rN| ≤ |aN+1|.


The following exercises deal with the sequence (fj) of partial fractions associated to a sequence (aj) as in (5.21), via (5.22)–(5.24), leading to Exercises 3–4 of §5.

12. Deduce from (5.25) that there exist fo, fe ∈ R such that

f_{2k+1} ↗ fo, f_{2k} ↘ fe, fo ≤ fe.

13. Deduce from (5.28) that fo = fe (= f, say), and hence

fj −→ f, as j → ∞,

i.e., if (aj) satisfies (5.21),

ϕj(a1, . . . , aj) −→ f, as j → ∞.

We denote the limit by ϕ(a1, . . . , aj, . . . ).

14. Show that ϕ(1, 1, . . . , 1, . . . ) = x solves x = 1 + 1/x, and hence

ϕ(1, 1, . . . , 1, . . . ) = (1 + √5)/2.

Note. The existence of such x implies that 5 has a square root, √5 ∈ R. See Proposition 7.1 for a more general result.
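Exercises 12–14 can be checked numerically: with aj ≡ 1 the fj are ratios of consecutive Fibonacci numbers, interlacing as in (5.25) and converging to (1 + √5)/2. A sketch (ours):

```python
from fractions import Fraction

def f(a):
    """The finite continued fraction phi_j(a_1, ..., a_j) of (5.22),
    evaluated from the inside out."""
    val = Fraction(a[-1])
    for aj in reversed(a[:-1]):
        val = aj + 1 / val
    return val

# the interlacing (5.25): f_1 <= f_3 <= f_5 <= f_4 <= f_2 when a_j = 1
assert f([1]) <= f([1]*3) <= f([1]*5) <= f([1]*4) <= f([1]*2)

# convergence to the golden ratio (1 + sqrt 5)/2
assert abs(float(f([1]*30)) - (1 + 5 ** 0.5) / 2) < 1e-10
```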


7. Irrational numbers

There are real numbers that are not rational. One, called e, is given by the limit of the sequence (5.14); in standard notation,

(7.1) e = ∑_{ℓ=0}^{∞} 1/ℓ!.

Proposition 5.8 implies that e is not rational. One can approximate e to high accuracy. In fact, as a consequence of (5.15), one has

(7.2) e − ∑_{ℓ=0}^{n} 1/ℓ! ≤ (1/n!) · (1/n).

For example, one can verify that

(7.3) 120! > 6 · 10^{198},

and hence

(7.4) e − ∑_{ℓ=0}^{120} 1/ℓ! < 10^{−200}.
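The sum in (7.4) can be evaluated exactly with rational arithmetic; the sketch below (ours, not the text's program) extracts the first 40 digits after the decimal point.

```python
from fractions import Fraction
import math

# s = sum over l = 0..120 of 1/l!, exactly; by (7.4), s agrees with e
# to within 10^{-200}, so its leading digits are those of e.
s = sum(Fraction(1, math.factorial(l)) for l in range(121))

digits = ""
frac = s - 2                      # fractional part (the integer part of e is 2)
for _ in range(40):
    frac *= 10
    d = int(frac)
    digits += str(d)
    frac -= d

assert digits == "7182818284590452353602874713526624977572"
```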

In a fraction of a second, a personal computer with the right program can perform a highly accurate approximation to such a sum, yielding

2.7182818284 5904523536 0287471352 6624977572 4709369995

9574966967 6277240766 3035354759 4571382178 5251664274

2746639193 2003059921 8174135966 2904357290 0334295260

5956307381 3232862794 3490763233 8298807531 · · ·

accurate to 190 places after the decimal point.

A number in R \ Q is said to be irrational. We present some other common examples

of irrational numbers, such as √2. To begin, one needs to show that √2 is a well defined real number. The following general result includes this fact.

Proposition 7.1. Given a ∈ R+, k ∈ N, there is a unique b ∈ R+ such that b^k = a.

Proof. Consider

(7.5) S_{a,k} = {x ≥ 0 : x^k ≤ a}.


Then S_{a,k} is a nonempty bounded subset of R. Note that if y > 0 and y^k > a, then y is an upper bound for S_{a,k}. Take b = sup S_{a,k}. We claim that b^k = a. In fact, if b^k < a, it follows from Exercise 9 of §6 that there exists b1 > b such that b1^k < a, hence b1 ∈ S_{a,k}, so b < sup S_{a,k}. Similarly, if b^k > a, there exists b0 < b such that b0^k > a, hence b0 is an upper bound for S_{a,k}, so b > sup S_{a,k}. Either conclusion contradicts b = sup S_{a,k}.

We write

(7.6) b = a^{1/k}.
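The supremum in this proof can be approximated by bisection between a point of S_{a,k} and an upper bound. A rough numerical sketch (ours):

```python
def kth_root(a, k, steps=60):
    """Approximate b = a^{1/k} of Proposition 7.1 by bisection, keeping
    lo in S_{a,k} = {x >= 0 : x^k <= a} and hi an upper bound for it."""
    lo, hi = 0.0, max(a, 1.0)     # max(a,1)^k >= a, so hi is an upper bound
    for _ in range(steps):
        mid = (lo + hi) / 2
        if mid ** k <= a:
            lo = mid
        else:
            hi = mid
    return lo

assert abs(kth_root(2, 2) ** 2 - 2) < 1e-12
assert abs(kth_root(5, 3) - 5 ** (1 / 3)) < 1e-12
```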

Now for a list of some irrational numbers:

Proposition 7.2. Take a ∈ N, k ∈ N. If a^{1/k} is not an integer, then a^{1/k} is irrational.

Proof. Assume a^{1/k} = m/n, with m, n ∈ N. Then

(7.7) m^k = a n^k.

Using the Fundamental Theorem of Arithmetic, Theorem 3.1, write

(7.8) m = p1^{µ1} · · · pℓ^{µℓ}, n = p1^{ν1} · · · pℓ^{νℓ}, a = p1^{α1} · · · pℓ^{αℓ},

with p1 < · · · < pℓ prime and µj, νj, αj ∈ N ∪ {0}. The identity (7.7) implies

(7.9) p1^{kµ1} · · · pℓ^{kµℓ} = p1^{α1+kν1} · · · pℓ^{αℓ+kνℓ},

and the uniqueness part of Theorem 3.1 then implies that kµj = αj + kνj, 1 ≤ j ≤ ℓ, hence

(7.10) αj = kβj, βj ∈ N ∪ {0},

and hence

(7.11) a = b^k, b = p1^{β1} · · · pℓ^{βℓ} ∈ N.

Hence a^{1/k} = b ∈ N, contradicting the hypothesis.

Noting that 1^2 = 1, 2^2 = 4, 3^2 = 9, we have:

Corollary 7.3. The following numbers are irrational:

√2, √3, √5, √6, √7, √8.

The real line is thick with both rational numbers and irrational numbers. By (6.25), given any x ∈ R, there exist aj ∈ Q such that aj → x. Also, given any x ∈ R, there exist irrational bj such that bj → x. To see this, just take aj ∈ Q, aj → x, and set bj = aj + 2−j√2.


In a sense that can be made precise, there are more irrational numbers than rational numbers. Namely, Q is countable, while R is uncountable. See §8 for a treatment of this.

Perhaps the most intriguing irrational number is π. See Chapter 4 for material on π, including a proof that it is irrational.

Exercises

1. Let ξ ∈ (0, 1) have a decimal expansion of the form (6.46), i.e.,

(7.12) ξ = ∑_{k=1}^{∞} bk · 10−k, bk ∈ {0, 1, . . . , 9}.

Show that ξ is rational if and only if (7.12) is eventually repeating, i.e., if and only if there exist N, m ∈ N such that

k ≥ N =⇒ bk+m = bk.

2. Show that

∑_{k=1}^{∞} 10^{−k^2}

is irrational.

3. Making use of Proposition 7.1, define a^p for real a > 0, p = m/n ∈ Q. Show that if also q ∈ Q,

a^p a^q = a^{p+q}.

Hint. You might start with a^{m/n} = (a^{1/n})^m, given n ∈ N, m ∈ Z. Then you need to show that if k ∈ N,

(a^{1/nk})^{mk} = (a^{1/n})^m.

You can use the results of Exercise 8 in §6.

4. Show that, if a, b > 0 and p ∈ Q, then

(ab)^p = a^p b^p.

Hint. First show that (ab)^{1/n} = a^{1/n} b^{1/n}.

5. Using Exercises 3 and 4, extend (6.50) to p ∈ Q, p > 1.

Hint. If ak = k^{−p}, then bk = 2^k a_{2^k} = 2^k (2^k)^{−p} = 2^{−(p−1)k} = x^k with x = 2^{−(p−1)}.

6. Show that √2 + √3 is irrational.

Hint. Square it.


8. Cardinal numbers

We return to the natural numbers considered in §1 and make contact with the fact that these numbers are used to count objects in collections. Namely, let S be some set. If S is empty, we say 0 is the number of its elements. If S is not empty, pick an element out of S and count "1." If there remain other elements of S, pick another element and count "2." Continue. If you pick a final element of S and count "n," then you say S has n elements. At least, that is a standard informal description of counting. We wish to restate this a little more formally, in the setting where we can apply the Peano axioms.

In order to do this, we consider the following subsets of N. Given n ∈ N, set

(8.1) In = {j ∈ N : j ≤ n}.

While the following is quite obvious, it is worthwhile recording that it is a consequence of the Peano axioms and the material developed in §1.

Lemma 8.1. We have

(8.2) I_1 = {1}, I_{n+1} = I_n ∪ {n + 1}.

Proof. Left to the reader.

Now we propose the following

Definition 8.1. A nonempty set S has n elements if and only if there exists a bijective map ϕ : S → I n.

A reasonable definition of counting should permit one to demonstrate that, if S has n elements and it also has m elements, then m = n. The key to showing this from the Peano postulates is the following.

Proposition 8.2. Assume m, n ∈ N. If there exists an injective map ϕ : I m → I n, then m ≤ n.

Proof. Use induction on n. The case n = 1 is clear (by Lemma 8.1). Assume now that N ≥ 2 and that the result is true for n < N. Then let ϕ : I_m → I_N be injective. Two cases arise: either there is an element j ∈ I_m such that ϕ(j) = N, or not. (Also, there is no loss of generality in assuming at this point that m ≥ 2.)
If there is such a j, define ψ : I_{m−1} → I_{N−1} by

ψ(ℓ) = ϕ(ℓ) for ℓ < j,
ψ(ℓ) = ϕ(ℓ + 1) for j ≤ ℓ < m.

Then ψ is injective, so m − 1 ≤ N − 1, and hence m ≤ N.
On the other hand, if there is no such j, then we already have an injective map ϕ : I_m → I_{N−1}. The induction hypothesis implies m ≤ N − 1, which in turn implies m ≤ N.


Corollary 8.3. If there exists a bijective map ϕ : I m → I n, then m = n.

Proof. We see that m ≤ n and n ≤ m, so Proposition 1.13 applies.

Corollary 8.4. If S is a set, m, n ∈ N, and there exist bijective maps ϕ : S → I_m, ψ : S → I_n, then m = n.

Proof. Consider ψ ◦ ϕ^{−1}.

Definition 8.2. If either S = ∅ or S has n elements for some n ∈ N, as in Definition 8.1, we say S is finite.

The next result implies that any subset of a finite set is finite.

Proposition 8.5. Assume n ∈ N. If S ⊂ I n is nonempty, then there exists m ≤ n and a bijective map ϕ : S → I m.

Proof. Use induction on n. The case n = 1 is clear (by Lemma 8.1). Assume the result is true for n < N. Then let S ⊂ I_N. Two cases arise: either N ∈ S or N ∉ S.
If N ∈ S, consider S′ = S \ {N}, so S = S′ ∪ {N} and S′ ⊂ I_{N−1}. The inductive hypothesis yields a bijective map ψ : S′ → I_m (with m ≤ N − 1), and then we obtain ϕ : S′ ∪ {N} → I_{m+1}, equal to ψ on S′ and sending the element N to m + 1.
If N ∉ S, then S ⊂ I_{N−1}, and the inductive hypothesis directly yields the desired bijective map.

Proposition 8.6. The set N is not finite.

Proof. If there were an n ∈ N and a bijective map ϕ : I_n → N, then, by restriction, there would be a bijective map ψ : S → I_{n+1} for some subset S of I_n, hence by the results above a bijective map I_m → I_{n+1} for some m ≤ n < n + 1. This contradicts Corollary 8.3.

The next result says that, in a certain sense, N is a minimal set that is not finite.

Proposition 8.7. If S is not finite, then there exists an injective map Φ : N → S .

Proof. We aim to show that there exists a family of injective maps ϕ_n : I_n → S, with the property that

(8.3) ϕ_n|_{I_m} = ϕ_m, ∀ m ≤ n.

We establish this by induction on n. For n = 1, just pick some element of S and call it ϕ_1(1). Now assume this claim is true for all n < N. So we have ϕ_{N−1} : I_{N−1} → S injective, but not surjective (since we assume S is not finite), and (8.3) holds for n ≤ N − 1.
Pick x ∈ S not in the range of ϕ_{N−1}. Then define ϕ_N : I_N → S so that

(8.3A) ϕ_N(j) = ϕ_{N−1}(j), j ≤ N − 1,
ϕ_N(N) = x.

Having the family ϕ_n, we define Φ : N → S by Φ(j) = ϕ_n(j) for any n ≥ j.

Two sets S and T are said to have the same cardinality if there exists a bijective map between them; we write Card(S) = Card(T). If there exists an injective map ϕ : S → T, we write Card(S) ≤ Card(T). The following result, known as the Schroeder-Bernstein theorem, implies that Card(S) = Card(T) whenever one has both Card(S) ≤ Card(T) and Card(T) ≤ Card(S).


Theorem 8.8. Let S and T be sets. Suppose there exist injective maps ϕ : S → T and ψ : T → S . Then there exists a bijective map Φ : S → T .

Proof. Let us say an element x ∈ T has a parent y ∈ S if ϕ(y) = x. Similarly there is a notion of a parent of an element of S. Iterating this gives a sequence of “ancestors” of any element of S or T. For any element of S or T, there are three possibilities:

a) The set of ancestors never terminates.
b) The set of ancestors terminates at an element of S.
c) The set of ancestors terminates at an element of T.

We denote by S_a, T_a the elements of S, T, respectively, for which case a) holds. Similarly we have S_b, T_b and S_c, T_c. We have disjoint unions

S = S_a ∪ S_b ∪ S_c, T = T_a ∪ T_b ∪ T_c.

Now note that

ϕ : S_a → T_a, ϕ : S_b → T_b, ψ : T_c → S_c

are all bijective. Thus we can set Φ equal to ϕ on S_a ∪ S_b and equal to ψ^{−1} on S_c, to get the desired bijection.

The terminology above suggests regarding Card(S) as an object (some sort of number). Indeed, if S is finite we set Card(S) = n if S has n elements (as in Definition 8.1). A set that is not finite is said to be infinite. We can also have a notion of cardinality of infinite sets. A standard notation for the cardinality of N is

(8.4) Card(N) = ℵ0.

Here are some other sets with the same cardinality:

Proposition 8.9. We have

(8.5) Card(Z) = Card(N × N) = Card(Q) = ℵ0.

Proof. We can define a bijection of N onto Z by ordering the elements of Z as follows:

0, 1, −1, 2, −2, 3, −3, · · · .

We can define a bijection of N and N × N by ordering the elements of N × N as follows:

(1, 1), (1, 2), (2, 1), (1, 3), (2, 2), (3, 1), · · · .

We leave it to the reader to produce a similar ordering of Q.
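The two orderings in the proof can be realized as generators. A small sketch (our own illustration; the names enum_Z and enum_NxN are hypothetical):

```python
from itertools import count, islice

def enum_Z():
    """The ordering 0, 1, -1, 2, -2, 3, -3, ... : a bijection N -> Z."""
    yield 0
    for n in count(1):
        yield n
        yield -n

def enum_NxN():
    """The ordering (1,1), (1,2), (2,1), (1,3), (2,2), (3,1), ...,
    running along the anti-diagonals i + j = s, for s = 2, 3, 4, ..."""
    for s in count(2):
        for i in range(1, s):
            yield (i, s - i)

print(list(islice(enum_Z(), 7)))
print(list(islice(enum_NxN(), 6)))
```

Each generator visits every target element exactly once, which is what the bijectivity claim amounts to.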

An infinite set that can be mapped bijectively onto N is called countably infinite. A set that is either finite or countably infinite is called countable. The following result is a natural extension of Proposition 8.5.


Proposition 8.10. If X is a countable set and S ⊂ X , then S is countable.

Proof. If X is finite, then Proposition 8.5 applies. Otherwise, we can assume X = N, and we are looking at S ⊂ N, so there is an injective map ϕ : S → N. If S is finite, there is no problem. Otherwise, by Proposition 8.7, there is an injective map ψ : N → S, and then Theorem 8.8 implies the existence of a bijection between S and N.

There are sets that are not countable; they are said to be uncountable.

Proposition 8.11. The set R of real numbers is uncountable.

Proof. We may as well show that (0, 1) = {x ∈ R : 0 < x < 1} is uncountable. If it were countable, there would be a bijective map ϕ : N → (0, 1). Expand the real number ϕ(j) in its infinite decimal expansion:

(8.6) ϕ(j) = Σ_{k=1}^∞ a_{jk} · 10^{−k}, a_{jk} ∈ {0, 1, . . . , 9}.

Now set

(8.7) b_k = 2 if a_{kk} ≠ 2,
b_k = 3 if a_{kk} = 2,

and consider

(8.8) ξ = Σ_{k=1}^∞ b_k · 10^{−k}, ξ ∈ (0, 1).

It is seen that ξ is not equal to ϕ(j) for any j ∈ N, contradicting the hypothesis that ϕ : N → (0, 1) is onto.
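The diagonal construction (8.7) can be sketched concretely (an illustration of our own, not part of the text) on a finite table of digits:

```python
def diagonal_digits(rows):
    """Given rows of decimal digits, where row j lists a_{j1}, a_{j2}, ...,
    produce digits b_k by the rule (8.7): b_k = 2 unless a_{kk} = 2, else 3.
    The resulting expansion differs from row j in its jth digit."""
    return [3 if row[k] == 2 else 2 for k, row in enumerate(rows)]

rows = [[2, 7, 1],
        [3, 2, 9],
        [5, 5, 5]]
b = diagonal_digits(rows)
# b differs from each listed expansion on the diagonal
assert all(b[j] != rows[j][j] for j in range(len(rows)))
print(b)  # [3, 3, 2]
```

Restricting b_k to {2, 3} also sidesteps the ambiguity of decimal expansions ending in all 0s or all 9s.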

A common notation for the cardinality of R is

(8.9) Card(R) = c.

We leave it as an exercise to the reader to show that

(8.10) Card(R× R) = c.

Further development of the theory of cardinal numbers requires a formalization of the notions of set theory. In these notes we have used set theoretical notions rather informally. Our use of such notions has gotten somewhat heavier in this last section. In particular, in the proof of Proposition 8.7, the innocent looking use of the phrase “pick x ∈ S . . . ” actually assumes a weak version of the Axiom of Choice. For an introduction to the axiomatic treatment of set theory we refer to [Dev].

Exercises


1. What is the cardinality of the set P of prime numbers?

2. Let S be a nonempty set and let T be the set of all subsets of S. Adapt the proof of Proposition 8.11 to show that

Card(S) < Card(T),

i.e., there is not a surjective map ϕ : S → T.

Hint. There is a natural bijection of T and T̃, the set of functions f : S → {0, 1}, via f ↔ {x ∈ S : f(x) = 1}. Given ϕ : S → T, describe a function g : S → {0, 1} not in the range of ϕ, taking a cue from the proof of Proposition 8.11.

3. Finish the proof of Proposition 8.9.

4. Use the map f (x) = x/(1 + x) to prove that

Card(R+) = Card((0, 1)).

5. Find a one-to-one map of R onto R+ and conclude that Card(R) = Card((0, 1)).

6. Use an interlacing of infinite decimal expansions to prove that

Card((0, 1) × (0, 1)) = Card((0, 1)).

7. Prove (8.10).

8. Let m ∈ Z, n ∈ N, and consider

S_{m,n} = {k ∈ Z : m + 1 ≤ k ≤ m + n}.

Show that Card S_{m,n} = n.

Hint. Produce a bijective map I n → S m,n.

9. Let S and T be sets. Assume

Card S = m, Card T = n, S ∩ T = ∅,

with m, n ∈ N. Show that

Card(S ∪ T) = m + n.

Hint. Produce bijective maps S → I_m and T → S_{m,n}, leading to a bijection S ∪ T → I_{m+n}.


9. Metric properties of R

We discuss a number of notions and results related to convergence in R. Recall that a sequence of points (p_j) in R converges to a limit p ∈ R (we write p_j → p) if and only if for every ε > 0 there exists N such that

(9.1) j ≥ N =⇒ | pj − p| < ε.

A set S ⊂ R is said to be closed if and only if

(9.2) pj ∈ S, pj → p =⇒ p ∈ S.

The complement R \ S of a closed set S is open . Alternatively, Ω ⊂ R is open if and onlyif, given q ∈ Ω, there exists ε > 0 such that Bε(q ) ⊂ Ω, where

(9.3) Bε(q) = {p ∈ R : |p − q| < ε},

so q cannot be a limit of a sequence of points in R \ Ω.
We define the closure S̄ of a set S ⊂ R to consist of all points p ∈ R such that Bε(p) ∩ S ≠ ∅ for all ε > 0. Equivalently, p ∈ S̄ if and only if there exists an infinite sequence (p_j) of points in S such that p_j → p.

An important property of R is completeness, which we recall is defined as follows. A sequence (p_j) of points in R is called a Cauchy sequence if and only if

(9.4) | pj − pk| −→ 0, as j, k → ∞.

It is easy to see that if pj → p for some p ∈ R, then (9.4) holds. The completeness propertyis the converse, given in Theorem 6.8, which we recall here.

Theorem 9.1. If ( pj) is a Cauchy sequence in R, then it has a limit.

Completeness provides a path to the following key notion of compactness. A nonempty set K ⊂ R is said to be compact if and only if the following property holds.

(9.5) Each infinite sequence (p_j) in K has a subsequence that converges to a point in K.

It is clear that if K is compact, then it must be closed. It must also be bounded, i.e., there exists R < ∞ such that K ⊂ B_R(0). Indeed, if K is not bounded, there exist p_j ∈ K such that |p_{j+1}| ≥ |p_j| + 1. In such a case, |p_j − p_k| ≥ 1 whenever j ≠ k, so (p_j) cannot have a convergent subsequence. The following converse statement is a key result.

Theorem 9.2. If a nonempty K ⊂ R is closed and bounded, then it is compact.

Clearly every nonempty closed subset of a compact set is compact, so Theorem 9.2 is a consequence of:


Proposition 9.3. Each closed bounded interval I = [a, b] ⊂ R is compact.

Proof. This is a direct consequence of Theorem 6.9, the Bolzano-Weierstrass theorem.

Let K ⊂ R be compact. Since K is bounded from above and from below, we have well defined real numbers

(9.6) b = sup K, a = inf K,

the first by Proposition 6.11, and the second by a similar argument (cf. Exercise 2 of §6). Since a and b are limits of elements of K, we have a, b ∈ K. We use the notation

(9.7) b = max K, a = min K.

We next discuss continuity. If S ⊂ R, a function

(9.8) f : S −→ R

is said to be continuous at p ∈ S provided

(9.9) pj ∈ S, pj → p =⇒ f ( pj) → f ( p).

If f is continuous at each p ∈ S, we say f is continuous on S.
The following two results give important connections between continuity and compactness.

Proposition 9.4. If K ⊂ R is compact and f : K → R is continuous, then f(K) is compact.

Proof. If (q_k) is an infinite sequence of points in f(K), pick p_k ∈ K such that f(p_k) = q_k. Since K is compact, we have a subsequence p_{k_ν} → p in K, and then q_{k_ν} → f(p) in R.

This leads to the second connection.

Proposition 9.5. If K ⊂ R is compact and f : K → R is continuous, then there exists p ∈ K such that

(9.10) f(p) = max_{x∈K} f(x),

and there exists q ∈ K such that

(9.11) f(q) = min_{x∈K} f(x).

Proof. Since f (K ) is compact, we have well defined numbers

(9.12) b = max f (K ), a = min f (K ), a, b ∈ f (K ).

So take p, q ∈ K such that f ( p) = b and f (q ) = a.

The next result is called the intermediate value theorem.


Proposition 9.6. Take a, b, c ∈ R, a < b. Let f : [a, b] → R be continuous. Assume

(9.13) f (a) < c < f (b).

Then there exists x ∈ (a, b) such that f (x) = c.

Proof. Let

(9.14) S = {y ∈ [a, b] : f(y) ≤ c}.

Then a ∈ S, so S is a nonempty, closed (hence compact) subset of [a, b]. Note that b ∉ S. Take

(9.15) x = max S.

Then a < x < b and f(x) ≤ c. If f(x) < c, then there exists ε > 0 such that a < x − ε < x + ε < b and f(y) < c for x − ε < y < x + ε. Thus x + ε/2 ∈ S, contradicting (9.15).
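The proof above is nonconstructive (it invokes max S), but the same hypothesis (9.13) supports a constructive bisection search. A sketch of our own, not part of the text (the name ivt_solve is made up):

```python
def ivt_solve(f, a, b, c, tol=1e-12):
    """Locate x in (a, b) with f(x) = c by bisection, assuming f is
    continuous on [a, b] and f(a) < c < f(b), as in (9.13).
    Each step halves the bracketing interval while keeping
    f(a) < c and f(b) >= c, so the endpoints converge to a solution."""
    while b - a > tol:
        m = (a + b) / 2
        if f(m) < c:
            a = m
        else:
            b = m
    return (a + b) / 2

# solve x^3 + x = 3 on [0, 2]; here f(0) = 0 < 3 < 10 = f(2)
x = ivt_solve(lambda t: t**3 + t, 0.0, 2.0, 3.0)
print(x)
```

The same idea, applied to an odd-degree polynomial on a large interval, answers Exercise 1 below numerically.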

Returning to the issue of compactness, we establish some further properties of compact sets K ⊂ R, leading to the important result, Proposition 9.10 below.

Proposition 9.7. Let K ⊂ R be compact. Assume X_1 ⊃ X_2 ⊃ X_3 ⊃ · · · form a decreasing sequence of closed subsets of K. If each X_m ≠ ∅, then ∩_m X_m ≠ ∅.

Proof. Pick x_m ∈ X_m. Since K is compact, (x_m) has a convergent subsequence, x_{m_k} → y. Since {x_{m_k} : k ≥ ℓ} ⊂ X_{m_ℓ}, which is closed, we have y ∈ ∩_m X_m.

Corollary 9.8. Let K ⊂ R be compact. Assume U_1 ⊂ U_2 ⊂ U_3 ⊂ · · · form an increasing sequence of open sets in R. If ∪_m U_m ⊃ K, then U_M ⊃ K for some M.

Proof. Consider X m = K \ U m.

Before getting to Proposition 9.10, we bring in the following. Let Q denote the set of rational numbers. The set Q ⊂ R has the following “denseness” property: given p ∈ R and ε > 0, there exists q ∈ Q such that |p − q| < ε. Let

(9.16) R = {B_{r_j}(q_j) : q_j ∈ Q, r_j ∈ Q ∩ (0, ∞)}.

Note that Q is countable, i.e., it can be put in one-to-one correspondence with N. Hence R is a countable collection of balls. The following lemma is left as an exercise for the reader.

Lemma 9.9. Let Ω ⊂ R be a nonempty open set. Then

(9.17) Ω = ∪{B : B ∈ R, B ⊂ Ω}.

To state the next result, we say that a collection {U_α : α ∈ A} covers K if K ⊂ ∪_{α∈A} U_α. If each U_α ⊂ R is open, it is called an open cover of K. If B ⊂ A and K ⊂ ∪_{β∈B} U_β, we say {U_β : β ∈ B} is a subcover. This result is called the Heine-Borel theorem.


Proposition 9.10. If K ⊂ R is compact, then it has the following property.

(9.18) Every open cover {U_α : α ∈ A} of K has a finite subcover.

Proof. By Lemma 9.9, it suffices to prove the following.

(9.19) Every countable cover {B_j : j ∈ N} of K by open intervals has a finite subcover.

For this, we set

(9.20) U m = B1 ∪ · · · ∪ Bm

and apply Corollary 9.8.

Exercises

1. Consider a polynomial p(x) = x^n + a_{n−1}x^{n−1} + · · · + a_1x + a_0. Assume each a_j ∈ R and n is odd. Use the intermediate value theorem to show that p(x) = 0 for some x ∈ R.

We describe the construction of a Cantor set. Take a closed, bounded interval [a, b] = C_0. Let C_1 be obtained from C_0 by deleting the open middle third interval, of length (b − a)/3. At the jth stage, C_j is a disjoint union of 2^j closed intervals, each of length 3^{−j}(b − a). Then C_{j+1} is obtained from C_j by deleting the open middle third of each of these 2^j intervals. We have C_0 ⊃ C_1 ⊃ · · · ⊃ C_j ⊃ · · · , each a closed subset of [a, b].

2. Show that

(9.21) C = ∩_{j≥0} C_j

is nonempty, and compact. This is the Cantor set.

3. Suppose C is formed as above, with [a, b] = [0, 1]. Show that points in C are precisely those of the form

(9.22) ξ = Σ_{j=1}^∞ b_j 3^{−j}, b_j ∈ {0, 2}.

4. If p, q ∈ C (and p < q), show that the interval [p, q] must contain points not in C. One says C is totally disconnected.

5. If p ∈ C, ε > 0, show that (p − ε, p + ε) contains infinitely many points in C. Given that C is closed, one says C is perfect.


6. Show that Card(C) = Card(R).
Hint. With ξ as in (9.22), show that

ξ → η = Σ_{j=1}^∞ (b_j/2) 2^{−j}

maps C onto [0, 1].

7. Show that Proposition 9.6 implies Proposition 7.1.


10. Complex numbers

A complex number is a number of the form

(10.1) z = x + iy, x, y ∈ R,

where the new object i has the property

(10.2) i2 = −1.

We denote the set of complex numbers by C. We have R ⊂ C, identifying x ∈ R with x + i0 ∈ C.

We define addition and multiplication in C as follows. Suppose w = a + ib, a, b ∈ R. We set

(10.3) z + w = (x + a) + i(y + b),
zw = (xa − yb) + i(xb + ya).

It is routine to verify various commutative, associative, and distributive laws, parallel to those in Proposition 4.3. If z ≠ 0, i.e., either x ≠ 0 or y ≠ 0, we can set

(10.4) z^{−1} = 1/z = x/(x² + y²) − i y/(x² + y²),

and verify that zz^{−1} = 1.
For some more notation, for z ∈ C of the form (10.1), we set

(10.5) z̄ = x − iy, Re z = x, Im z = y.

We say z̄ is the complex conjugate of z, Re z is the real part of z, and Im z is the imaginary part of z.
We next discuss the concept of the magnitude (or absolute value) of an element z ∈ C.

If z has the form (10.1), we take a cue from the Pythagorean theorem, giving the Euclidean distance from z to 0, and set

(10.6) |z| = √(x² + y²).

Note that

(10.7) |z|² = z z̄.

With this notation, (10.4) takes the compact (and clear) form

(10.8) z^{−1} = z̄/|z|².

We have

(10.9) |zw| = |z| · |w|,

for z, w ∈ C, as a consequence of the identity (readily verified from the definition (10.5))

(10.10) (zw)‾ = z̄ · w̄.

In fact, |zw|² = (zw)(zw)‾ = z w z̄ w̄ = z z̄ w w̄ = |z|²|w|². This extends the first part of (6.21) from R to C. The extension of the second part also holds, but it requires a little more work. The following is the triangle inequality in C.


Proposition 10.1. Given z, w ∈ C,

(10.11) |z + w| ≤ |z| + |w|.

Proof. We compare the squares of each side of (10.11). First,

(10.12) |z + w|² = (z + w)(z̄ + w̄)
= |z|² + |w|² + w z̄ + z w̄
= |z|² + |w|² + 2 Re z w̄.

Now, for any ζ ∈ C, Re ζ ≤ |ζ|, so Re z w̄ ≤ |z w̄| = |z| · |w|, so (10.12) is

(10.13) ≤ |z|² + |w|² + 2|z| · |w| = (|z| + |w|)²,

and we have (10.11).

We now discuss matters related to convergence in C. Parallel to the real case, we say a sequence (z_j) in C converges to a limit z ∈ C (and write z_j → z) if and only if for each ε > 0 there exists N such that

(10.14) j ≥ N =⇒ |zj − z| < ε.

Equivalently,

(10.15) z_j → z ⇐⇒ |z_j − z| → 0.

It is easily seen that

(10.16) zj → z ⇐⇒ Re zj → Re z and Im zj → Im z.

The set C also has the completeness property, given as follows. A sequence (z_j) in C is said to be a Cauchy sequence if and only if

(10.17) |zj − zk| → 0, as j, k → ∞.

It is easy to see (using the triangle inequality) that if zj → z for some z ∈ C, then (10.17)holds. Here is the converse:

Proposition 10.2. If (zj) is a Cauchy sequence in C, then it has a limit.

Proof. If (zj) is Cauchy in C, then (Re zj) and (Im zj) are Cauchy in R, so, by Theorem6.8, they have limits.

We turn to infinite series Σ_{k=0}^∞ a_k, with a_k ∈ C. We say this converges if and only if the sequence of partial sums

(10.18) S_n = Σ_{k=0}^n a_k

converges:

(10.19) Σ_{k=0}^∞ a_k = A ⇐⇒ S_n → A as n → ∞.

The following is a useful condition guaranteeing convergence. Compare Proposition 6.12.


Proposition 10.3. The infinite series Σ_{k=0}^∞ a_k converges provided

(10.20) Σ_{k=0}^∞ |a_k| < ∞,

i.e., there exists B < ∞ such that Σ_{k=0}^n |a_k| ≤ B for all n.

Proof. The triangle inequality gives, for ℓ ≥ 1,

(10.21) |S_{n+ℓ} − S_n| = |Σ_{k=n+1}^{n+ℓ} a_k| ≤ Σ_{k=n+1}^{n+ℓ} |a_k|,

which tends to 0 as n → ∞, uniformly in ℓ ≥ 1, provided (10.20) holds (cf. (6.39)–(6.40)). Hence (10.20) implies that (S_n) is Cauchy. Convergence then follows, by Proposition 10.2.

As in the real case, if (10.20) holds, we say the infinite series Σ_{k=0}^∞ a_k is absolutely convergent.

An example to which Proposition 10.3 applies is the following power series, giving the exponential function e^z:

(10.22) e^z = Σ_{k=0}^∞ z^k/k!, z ∈ C.
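As a quick numerical sanity check (a sketch of our own, not part of the text; the name exp_series is made up), partial sums of (10.22) can be compared with the library exponential:

```python
import cmath
from math import factorial

def exp_series(z, N=60):
    """Partial sum S_N of the series (10.22): sum_{k=0}^{N} z^k / k!.
    Absolute convergence (Proposition 10.3) holds since
    sum_k |z|^k / k! = e^{|z|} < infinity."""
    return sum(z**k / factorial(k) for k in range(N + 1))

z = 1 + 2j
print(exp_series(z), cmath.exp(z))
```

For moderate |z| the partial sums agree with cmath.exp to machine precision after a few dozen terms, since the factorial in the denominator eventually dominates.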

We turn to a discussion of polar coordinates on C. Given a nonzero z ∈ C, we can write

(10.23) z = rω, r = |z|, ω = z/|z|.

Then ω has unit distance from 0. If the ray from 0 to ω makes an angle θ with the positive real axis, we have

(10.24) Re ω = cos θ, Im ω = sin θ,

by definition of the trigonometric functions cos and sin. Hence

(10.25) z = r cis θ,

where

(10.26) cis θ = cos θ + i sin θ.


If also

(10.27) w = ρ cis ϕ, ρ = |w|,

then

(10.28) zw = rρ cis(θ + ϕ),

as a consequence of the identity

(10.29) cis(θ + ϕ) = (cis θ)(cis ϕ),

which in turn is equivalent to the pair of trigonometric identities

(10.30) cos(θ + ϕ) = cos θ cos ϕ − sin θ sin ϕ,
sin(θ + ϕ) = cos θ sin ϕ + sin θ cos ϕ.
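The identity (10.29) is easy to confirm numerically. A sketch of our own (not part of the text), which also checks that cis θ agrees with the complex exponential supplied by the library:

```python
import cmath, math

def cis(theta):
    """cis θ = cos θ + i sin θ, as in (10.26)."""
    return complex(math.cos(theta), math.sin(theta))

theta, phi = 0.7, 1.9
# the multiplicative identity (10.29)
assert abs(cis(theta + phi) - cis(theta) * cis(phi)) < 1e-12
# cis θ coincides with the complex exponential e^{iθ}
assert abs(cis(theta) - cmath.exp(1j * theta)) < 1e-12
print(cis(theta + phi))
```

Such a check is of course no substitute for a proof; the self-contained derivation is given in Chapter 4.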

There is another way to write (10.25), using the classical Euler identity

(10.31) eiθ = cos θ + i sin θ.

Then (10.25) becomes

(10.32) z = r eiθ.

The identity (10.29) is equivalent to

(10.33) ei(θ+ϕ) = eiθeiϕ.

We will present a self-contained derivation of (10.31) (and also of (10.30) and (10.33)) in Chapter 4.
We next define closed and open subsets of C, and discuss the notion of compactness. A set S ⊂ C is said to be closed if and only if

(10.34) zj ∈ S, zj → z =⇒ z ∈ S.

The complement C \ S of a closed set S is open. Alternatively, Ω ⊂ C is open if and only if, given q ∈ Ω, there exists ε > 0 such that Bε(q) ⊂ Ω, where

(10.35) Bε(q) = {z ∈ C : |z − q| < ε},

so q cannot be a limit of a sequence of points in C \ Ω. We define the closure S̄ of a set S ⊂ C to consist of all points p ∈ C such that Bε(p) ∩ S ≠ ∅ for all ε > 0. Equivalently, p ∈ S̄ if and only if there exists an infinite sequence (p_j) of points in S such that p_j → p.

Parallel to (9.5), we say a nonempty set K ⊂ C is compact if and only if the following property holds.

(10.36) Each infinite sequence (p_j) in K has a subsequence that converges to a point in K.

As in §9, if K ⊂ C is compact, it must be closed and bounded. Parallel to Theorem 9.2, we have the converse.


Proposition 10.4. If a nonempty K ⊂ C is closed and bounded, then it is compact.

Proof. Let (z_j) be a sequence in K. Then (Re z_j) and (Im z_j) are bounded, so Theorem 6.9 implies the existence of a subsequence such that Re z_{j_ν} and Im z_{j_ν} converge. Hence the subsequence (z_{j_ν}) converges in C. Since K is closed, the limit must belong to K.

If S ⊂ C, a function

(10.37) f : S −→ C

is said to be continuous at p ∈ S provided

(10.38) pj ∈ S, pj → p =⇒ f ( pj) → f ( p).

If f is continuous at each p ∈ S, we say f is continuous on S. The following result has the same proof as Proposition 9.4.

Proposition 10.5. If K ⊂ C is compact and f : K → C is continuous, then f (K ) is compact.

Then the following variant of Proposition 9.5 is straightforward.

Proposition 10.6. If K ⊂ C is compact and f : K → C is continuous, then there exists p ∈ K such that

(10.39) |f(p)| = max_{z∈K} |f(z)|,

and there exists q ∈ K such that

(10.40) |f(q)| = min_{z∈K} |f(z)|.

There are also straightforward extensions to K ⊂ C of Propositions 9.7–9.10. We omitthe details. But see §1 of Chapter 2 for further extensions.

Exercises

1. Use (10.25)–(10.28) in conjunction with Proposition 7.1 to prove the following:

Given a ∈ C, a ≠ 0, n ∈ N, there exist z_1, . . . , z_n ∈ C such that z_j^n = a.

2. Compute

(1/2 + (√3/2)i)³,


and verify that

(10.41) cos(π/3) = 1/2, sin(π/3) = √3/2.

3. Find z_1, . . . , z_n such that

(10.42) z_j^n = 1,

explicitly in the form a + ib (not simply as cis(2πj/n)), in case

(10.43) n = 3, 4, 6, 8.

Hint. Use (10.41), and also the fact that the equation u_j² = i has solutions

(10.44) u_1 = 1/√2 + i/√2, u_2 = −u_1.

4. Take the following path to finding the 5 solutions to

(10.45) z_j^5 = 1.

One solution is z_1 = 1. Since z^5 − 1 = (z − 1)(z^4 + z^3 + z^2 + z + 1), we need to find 4 solutions to z^4 + z^3 + z^2 + z + 1 = 0. Write this as

(10.46) z² + z + 1 + 1/z + 1/z² = 0,

which, for

(10.47) w = z + 1/z,

becomes

(10.48) w² + w − 1 = 0.

Use the quadratic formula to find 2 solutions to (10.48). Then solve (10.47), i.e., z² − wz + 1 = 0, for z.

5. Take the following path to explicitly finding the real and imaginary parts of a solution to

z² = a + ib.

Namely, with x = Re z, y = Im z, we have

x² − y² = a, 2xy = b,

and also

x² + y² = ρ = √(a² + b²),

hence

x = √((ρ + a)/2), y = b/(2x),

as long as a + ib ≠ −|a|.
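The formulas of Exercise 5 translate directly into code. A sketch of our own (the helper name complex_sqrt is made up):

```python
import math

def complex_sqrt(a, b):
    """One solution z = x + iy of z^2 = a + ib, following Exercise 5:
    rho = sqrt(a^2 + b^2), x = sqrt((rho + a)/2), y = b/(2x).
    Requires a + ib not to be a real number <= 0, so that x > 0."""
    rho = math.hypot(a, b)
    x = math.sqrt((rho + a) / 2)
    return complex(x, b / (2 * x))

z = complex_sqrt(3, 4)
print(z, z * z)   # z squares back to 3 + 4i
```

The excluded case (a nonpositive real number) is exactly where x = 0 and the division by 2x breaks down; there one takes z = i√(−a) instead.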

6. Taking a cue from Exercise 4 of §6, show that

(10.49) 1/(1 − z) = Σ_{k=0}^∞ z^k, for z ∈ C, |z| < 1.

7. Show that

1/(1 − z²) = Σ_{k=0}^∞ z^{2k}, for z ∈ C, |z| < 1.

8. Produce a power series expansion in z, valid for |z| < 1, for

1/(1 + z²).


Chapter II

Spaces

Introduction

In Chapter 1 we developed the real number line R, and established a number of metrical properties, such as completeness of R, and compactness of closed, bounded subsets. We also produced the complex plane C, and studied analogous metric properties of C. Here we examine other types of spaces, which are useful in analysis.

Section 1 treats n-dimensional Euclidean space, Rn. This is equipped with a dot product x · y ∈ R, which gives rise to a norm |x| = √(x · x). Parallel to (6.21) and (10.11) of Chapter 1, this norm satisfies the triangle inequality. In this setting, the proof goes through an inequality known as Cauchy’s inequality. Then the distance between x and y in Rn is given by d(x, y) = |x − y|, and it satisfies a triangle inequality. With these structures, we have the notion of convergent sequences and Cauchy sequences, and can show that Rn is complete. There is a notion of compactness for subsets of Rn, similar to that given in (9.5) and in (10.36) of Chapter 1, for subsets of R and of C, and it is shown that nonempty, closed, bounded subsets of Rn are compact.

Analysts have found it useful to abstract some of the structures mentioned above, and apply them to a larger class of spaces, called metric spaces. A metric space is a set X, equipped with a distance function d(x, y), satisfying certain conditions (see (2.1)), including the triangle inequality. For such a space, one has natural notions of a convergent sequence and of a Cauchy sequence. The space may or may not be complete. If not, there is a construction of its completion, somewhat similar to the construction of R as the completion of Q in §6 of Chapter 1. We discuss the definition and some basic properties of metric spaces in §2. There is also a natural notion of compactness in the metric space context, which we treat in §3.

Most metric spaces we will encounter are subsets of Euclidean space. One exception introduced in this chapter is the class of infinite products; see (3.3). Another important class of metric spaces beyond the Euclidean space setting consists of spaces of functions, which will be treated in §4 of Chapter 3.


1. Euclidean spaces

The space Rn, n-dimensional Euclidean space, consists of n-tuples of real numbers:

(1.1) x = (x1, . . . , xn) ∈ Rn, xj ∈ R, 1 ≤ j ≤ n.

The number x_j is called the jth component of x. Here we discuss some important algebraic and metric structures on Rn. First, there is addition. If x is as in (1.1) and also y = (y_1, . . . , y_n) ∈ Rn, we have

(1.2) x + y = (x1 + y1, . . . , xn + yn) ∈ Rn.

Addition is done componentwise. Also, given a ∈ R, we have

(1.3) ax = (ax_1, . . . , ax_n) ∈ Rn.

This is scalar multiplication.
We also have the dot product,

(1.4) x · y = Σ_{j=1}^n x_j y_j = x_1y_1 + · · · + x_ny_n ∈ R,

given x, y ∈ Rn. The dot product has the properties

(1.5)

x · y = y · x,

x · (ay + bz) = a(x · y) + b(x · z),

x · x > 0 unless x = 0.

Note that

(1.6) x · x = x_1² + · · · + x_n².

We set

(1.7) |x| = √(x · x),

which we call the norm of x. Note that (1.5) implies

(1.8) (ax) · (ax) = a2(x · x),

hence

(1.9) |ax| = |a| · |x|, for a ∈ R, x ∈ Rn.


Taking a cue from the Pythagorean theorem, we say that the distance from x to y in Rn is

(1.10) d(x, y) = |x − y|.

For us, (1.7) and (1.10) are simply definitions. We do not need to depend on the Pythagorean theorem. Significant properties will be derived below, without recourse to the Pythagorean theorem.

A set X equipped with a distance function is called a metric space. We will consider metric spaces in general in the next section. Here, we want to show that the Euclidean distance, defined by (1.10), satisfies the “triangle inequality,”

(1.11) d(x, y) ≤ d(x, z) + d(z, y), ∀ x,y,z ∈ Rn.

This in turn is a consequence of the following, also called the triangle inequality.

Proposition 1.1. The norm (1.7) on Rn has the property

(1.12) |x + y| ≤ |x| + |y|, ∀ x, y ∈ Rn.

Proof. We compare the squares of the two sides of (1.12). First,

(1.13) |x + y|² = (x + y) · (x + y)
= x · x + x · y + y · x + y · y
= |x|² + 2x · y + |y|².

Next,

(1.14) (|x| + |y|)2 = |x|2 + 2|x| · |y| + |y|2.

We see that (1.12) holds if and only if x · y ≤ |x| · |y|. Thus the proof of Proposition 1.1 is finished off by the following result, known as Cauchy’s inequality.

Proposition 1.2. For all x, y ∈ Rn,

(1.15) |x · y| ≤ |x| · |y|.

Proof. We start with the chain

(1.16) 0 ≤ |x − y|2 = (x − y) · (x − y) = |x|2 + |y|2 − 2x · y,

which implies

(1.17) 2x · y ≤ |x|2 + |y|2, ∀ x, y ∈ Rn.


If we replace x by tx and y by t^{−1}y, with t > 0, the left side of (1.17) is unchanged, so we have

(1.18) 2x · y ≤ t²|x|² + t^{−2}|y|², ∀ t > 0.

Now we pick t so that the two terms on the right side of (1.18) are equal, namely

(1.19) t² = |y|/|x|, t^{−2} = |x|/|y|.

(At this point, note that (1.15) is obvious if x = 0 or y = 0, so we will assume that x ≠ 0 and y ≠ 0.) Plugging (1.19) into (1.18) gives

(1.20) x · y ≤ |x| · |y|, ∀ x, y ∈ Rn.

This is almost (1.15). To finish, we can replace x in (1.20) by −x = (−1)x, getting

(1.21) −(x · y) ≤ |x| · |y|,

and together (1.20) and (1.21) give (1.15).

We now discuss a number of notions and results related to convergence in Rn. First, asequence of points ( pj) in Rn converges to a limit p ∈ Rn (we write pj → p) if and only if

(1.22) | pj − p| −→ 0,

where | · | is the Euclidean norm on Rn, defined by (1.7), and the meaning of (1.22) is that for every ε > 0 there exists N such that

(1.23) j ≥ N =⇒ | pj − p| < ε.

A set S ⊂ Rn is said to be closed if and only if

(1.24) pj ∈ S, pj → p =⇒ p ∈ S.

The complement Rn \ S of a closed set S is open. Alternatively, Ω ⊂ Rn is open if and only if, given q ∈ Ω, there exists ε > 0 such that Bε(q) ⊂ Ω, where

(1.25) Bε(q) = {p ∈ Rn : |p − q| < ε},

so q cannot be a limit of a sequence of points in Rn \ Ω.
An important property of Rn is completeness, a property defined as follows. A sequence

( pj) of points in Rn is called a Cauchy sequence if and only if

(1.26) | pj − pk| −→ 0, as j, k → ∞.

It is easy to see that if pj → p for some p ∈ Rn, then (1.26) holds. The completeness property is the converse.


Theorem 1.3. If (pj) is a Cauchy sequence in Rn, then it has a limit, i.e., (1.22) holds for some p ∈ Rn.

Proof. Since convergence pj → p in Rn is equivalent to convergence in R of each component, the result is a consequence of the completeness of R. This was proved in Chapter 1.

Completeness provides a path to the following key notion of compactness. A nonempty set K ⊂ Rn is said to be compact if and only if the following property holds.

(1.27) Each infinite sequence (pj) in K has a subsequence that converges to a point in K.

It is clear that if K is compact, then it must be closed. It must also be bounded, i.e., there exists R < ∞ such that K ⊂ BR(0). Indeed, if K is not bounded, there exist pj ∈ K such that |pj+1| ≥ |pj| + 1. In such a case, |pj − pk| ≥ 1 whenever j ≠ k, so (pj) cannot have a convergent subsequence. The following converse statement is a key result.

Theorem 1.4. If a nonempty K ⊂ Rn is closed and bounded, then it is compact.

Proof. If K ⊂ Rn is closed and bounded, it is a closed subset of some box

(1.28) B = {(x1, . . . , xn) ∈ Rn : aj ≤ xj ≤ bj, ∀ j}.

Clearly every closed subset of a compact set is compact, so it suffices to show that B is compact. Now, each closed bounded interval [aj, bj] in R is compact, as shown in §9 of Chapter 1, and the compactness of B follows readily from this.

We establish some further properties of compact sets K ⊂ Rn, leading to the important result, Proposition 1.8 below. This generalizes results established for n = 1 in §9 of Chapter 1. A further generalization will be given in §3.

Proposition 1.5. Let K ⊂ Rn be compact. Assume X1 ⊃ X2 ⊃ X3 ⊃ · · · form a decreasing sequence of closed subsets of K. If each Xm ≠ ∅, then ∩mXm ≠ ∅.

Proof. Pick xm ∈ Xm. Since K is compact, (xm) has a convergent subsequence, xmk → y. Since {xmk : k ≥ ℓ} ⊂ Xmℓ, which is closed for each ℓ, we have y ∈ ∩mXm.

Corollary 1.6. Let K ⊂ Rn be compact. Assume U1 ⊂ U2 ⊂ U3 ⊂ · · · form an increasing sequence of open sets in Rn. If ∪mUm ⊃ K, then UM ⊃ K for some M.

Proof. Consider Xm = K \ Um.

Before getting to Proposition 1.8, we bring in the following. Let Q denote the set of rational numbers, and let Qn denote the set of points in Rn all of whose components are rational. The set Qn ⊂ Rn has the following “denseness” property: given p ∈ Rn and ε > 0, there exists q ∈ Qn such that |p − q| < ε. Let

(1.29) R = {Br(q) : q ∈ Qn, r ∈ Q ∩ (0, ∞)}.

Note that Q and Qn are countable, i.e., they can be put in one-to-one correspondence with N. Hence R is a countable collection of balls. The following lemma is left as an exercise for the reader.


Lemma 1.7. Let Ω ⊂ Rn be a nonempty open set. Then

(1.30) Ω = ∪{B : B ∈ R, B ⊂ Ω}.

To state the next result, we say that a collection {Uα : α ∈ A} covers K if K ⊂ ∪α∈A Uα. If each Uα ⊂ Rn is open, it is called an open cover of K. If B ⊂ A and K ⊂ ∪β∈B Uβ, we say {Uβ : β ∈ B} is a subcover.

Proposition 1.8. If K ⊂ Rn is compact, then it has the following property.

(1.31) Every open cover {Uα : α ∈ A} of K has a finite subcover.

Proof. By Lemma 1.7, it suffices to prove the following.

(1.32) Every countable cover {Bj : j ∈ N} of K by open balls has a finite subcover.

For this, we set

(1.33) U m = B1 ∪ · · · ∪ Bm

and apply Corollary 1.6.

Exercises

1. Identifying z = x + iy ∈ C with (x, y) ∈ R2 and w = u + iv ∈ C with (u, v) ∈ R2, show that the dot product satisfies

z · w = Re zw̄.

In light of this, compare the proof of Proposition 1.1 with that of Proposition 10.1 in Chapter 1.

2. Show that the inequality (1.12) implies (1.11).

3. Prove Lemma 1.7.

4. Use Proposition 1.8 to prove the following extension of Proposition 1.5.

Proposition 1.9. Let K ⊂ Rn be compact. Assume {Xα : α ∈ A} is a collection of closed subsets of K. Assume that for each finite set B ⊂ A, ∩α∈B Xα ≠ ∅. Then

∩α∈A Xα ≠ ∅.


Hint. Consider U α = Rn \ X α.

5. Let K ⊂ Rn be compact. Show that there exist x0, x1 ∈ K such that

|x0| ≤ |x|, ∀ x ∈ K,

|x1| ≥ |x|, ∀ x ∈ K.

We say

|x0| = minx∈K |x|, |x1| = maxx∈K |x|.


2. Metric spaces

A metric space is a set X, together with a distance function d : X × X → [0, ∞), having the properties that

(2.1)

d(x, y) = 0 ⇐⇒ x = y,

d(x, y) = d(y, x),

d(x, y) ≤ d(x, z) + d(y, z).

The third of these properties is called the triangle inequality. We sometimes denote this metric space by (X, d). An example of a metric space is the set of rational numbers Q, with d(x, y) = |x − y|. Another example is X = Rn, with

d(x, y) = [(x1 − y1)2 + · · · + (xn − yn)2]1/2.

This was treated in §1.
If (xν) is a sequence in X, indexed by ν = 1, 2, 3, . . . , i.e., by ν ∈ Z+, one says

(2.2) xν → y ⇐⇒ d(xν , y) → 0, as ν → ∞.

One says (xν ) is a Cauchy sequence if and only if

(2.3) d(xν , xµ) → 0 as µ, ν → ∞.

One says X is a complete metric space if every Cauchy sequence converges to a limit in X. Some metric spaces are not complete; for example, Q is not complete. You can take a sequence (xν) of rational numbers such that xν → √2, which is not rational. Then (xν) is Cauchy in Q, but it has no limit in Q.

If a metric space X is not complete, one can construct its completion X̂ as follows. Let an element ξ of X̂ consist of an equivalence class of Cauchy sequences in X, where we say

(2.4) (xν ) ∼ (yν ) =⇒ d(xν , yν ) → 0.

We write the equivalence class containing (xν ) as [xν ]. If ξ = [xν ] and η = [yν ], we can set

(2.5) d(ξ, η) = limν →∞

d(xν , yν ),

and verify that this is well defined, and makes X̂ a complete metric space.
If the completion of Q is constructed by this process, you get R, the set of real numbers. This construction was carried out in §6 of Chapter 1.


There are a number of useful concepts related to the notion of closeness. We define some of them here. First, if p is a point in a metric space X and r ∈ (0, ∞), the set

(2.6) Br(p) = {x ∈ X : d(x, p) < r}

is called the open ball about p of radius r. Generally, a neighborhood of p ∈ X is a set containing such a ball, for some r > 0.

A set S ⊂ X is said to be closed if and only if

(2.7) pj ∈ S, pj → p =⇒ p ∈ S.

The complement X \ S of a closed set is said to be open. Alternatively, U ⊂ X is open if and only if

(2.8) q ∈ U =⇒ ∃ ε > 0 such that Bε(q ) ⊂ U,

so q cannot be a limit of a sequence of points in X \ U.
We state a couple of straightforward propositions, whose proofs are left to the reader.

Proposition 2.1. If {Uα} is a family of open sets in X, then ∪αUα is open. If {Kα} is a family of closed subsets of X, then ∩αKα is closed.

Given S ⊂ X, we denote by S̄ (the closure of S) the smallest closed subset of X containing S, i.e., the intersection of all the closed sets Kα ⊂ X containing S. The following result is straightforward.

Proposition 2.2. Given S ⊂ X, p ∈ S̄ if and only if there exist xj ∈ S such that xj → p.

Given S ⊂ X, p ∈ X, we say p is an accumulation point of S if and only if, for each ε > 0, there exists q ∈ S ∩ Bε(p), q ≠ p. It follows that p is an accumulation point of S if and only if each Bε(p), ε > 0, contains infinitely many points of S. One straightforward observation is that all points of S̄ \ S are accumulation points of S.

If S ⊂ Y ⊂ X, we say S is dense in Y provided S̄ ⊃ Y.
The interior of a set S ⊂ X is the largest open set contained in S, i.e., the union of all the open sets contained in S. Note that the complement of the interior of S is equal to the closure of X \ S.

We next define the notion of a connected space. A metric space X is said to be connected provided that it cannot be written as the union of two disjoint nonempty open subsets. The following is a basic example. Here, we treat I as a stand-alone metric space.

Proposition 2.3. Each interval I in R is connected.

Proof. Suppose A ⊂ I is nonempty, with nonempty complement B ⊂ I, and both sets are open. (Hence both sets are closed.) Take a ∈ A, b ∈ B; we can assume a < b. Let ξ = sup{x ∈ [a, b] : x ∈ A}. This exists, by Proposition 6.11 of Chapter 1.

Now we obtain a contradiction, as follows. Since A is closed, ξ ∈ A. (Hence ξ < b.) But then, since A is open, ξ > a, and furthermore there must be a neighborhood (ξ − ε, ξ + ε) contained in A. This would imply ξ ≥ ξ + ε. Contradiction.


See the next chapter for more on connectedness, and its connection to the Intermediate Value Theorem.

Exercises

1. Prove Proposition 2.1.

2. Prove Proposition 2.2.

3. Let (X, d) be a metric space, and let X̂ be the set of equivalence classes of Cauchy sequences in X, with equivalence relation (2.4), and distance function d̂ given by (2.5). Show that (X̂, d̂) is complete.

4. Show that if p ∈ Rn and R > 0, the ball BR(p) = {x ∈ Rn : |x − p| < R} is connected.
Hint. Suppose BR(p) = U ∪ V, a union of two disjoint open sets. Given q1 ∈ U, q2 ∈ V, consider the line segment

ℓ = {tq1 + (1 − t)q2 : 0 ≤ t ≤ 1}.

5. Let X = Rn, but replace the distance d(x, y) = [(x1 − y1)2 + · · · + (xn − yn)2]1/2 by

d1(x, y) = |x1 − y1| + · · · + |xn − yn|.

Show that (X, d1) is a metric space. In particular, verify the triangle inequality. Show that a sequence pj converges in (X, d1) if and only if it converges in (X, d).
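The metric d1 of this exercise can be probed numerically. The Python sketch below (an illustration only) checks the triangle inequality for d1, together with the bounds d2 ≤ d1 ≤ √n d2 relating it to the Euclidean distance; the √n constant is an assumption taken from Cauchy’s inequality, not something the exercise itself asserts:

```python
import random

def d2(x, y):
    # Euclidean distance on R^n
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

def d1(x, y):
    # The "taxicab" distance of the exercise
    return sum(abs(a - b) for a, b in zip(x, y))

random.seed(1)
n = 4
for _ in range(1000):
    x, y, z = ([random.uniform(-5, 5) for _ in range(n)] for _ in range(3))
    assert d1(x, y) <= d1(x, z) + d1(z, y) + 1e-9      # triangle inequality
    assert d2(x, y) <= d1(x, y) + 1e-9                 # d2 <= d1
    assert d1(x, y) <= n ** 0.5 * d2(x, y) + 1e-9      # d1 <= sqrt(n) d2
```

The two bounds explain why convergence in (X, d1) agrees with convergence in (X, d).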


3. Compactness

We return to the notion of compactness, defined in the Euclidean context in (1.27). We say a (nonempty) metric space X is compact provided the following property holds:

(A) Each sequence (xk) in X has a convergent subsequence.

We will establish various properties of compact metric spaces, and provide various equivalent characterizations. For example, it is easily seen that (A) is equivalent to:

(B) Each infinite subset S ⊂ X has an accumulation point.

The following property is known as total boundedness:

Proposition 3.1. If X is a compact metric space, then

(C) Given ε > 0, ∃ finite set {x1, . . . , xN} such that Bε(x1), . . . , Bε(xN) covers X.

Proof. Take ε > 0 and pick x1 ∈ X. If Bε(x1) = X, we are done. If not, pick x2 ∈ X \ Bε(x1). If Bε(x1) ∪ Bε(x2) = X, we are done. If not, pick x3 ∈ X \ [Bε(x1) ∪ Bε(x2)]. Continue, taking xk+1 ∈ X \ [Bε(x1) ∪ · · · ∪ Bε(xk)], if Bε(x1) ∪ · · · ∪ Bε(xk) ≠ X. Note that, for 1 ≤ i, j ≤ k,

i ≠ j =⇒ d(xi, xj) ≥ ε.

If one never covers X this way, consider S = {xj : j ∈ N}. This is an infinite set with no accumulation point, so property (B) is contradicted.
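The proof of Proposition 3.1 is a greedy construction, and it can be run directly on a finite sample of a space. A Python sketch (an illustration only; the sample of [0, 1] and the choice ε = 0.25 are arbitrary):

```python
def greedy_net(points, dist, eps):
    # Follow the proof of Proposition 3.1: repeatedly pick a point
    # not covered by the eps-balls around the centers chosen so far.
    centers = []
    for p in points:
        if all(dist(p, c) >= eps for c in centers):
            centers.append(p)
    return centers

dist = lambda p, q: abs(p - q)
pts = [k / 100 for k in range(101)]      # a finite sample of [0, 1]
net = greedy_net(pts, dist, 0.25)
# Every sample point lies within eps of some center, and distinct
# centers are >= eps apart, exactly as in the proof.
assert all(min(dist(p, c) for c in net) < 0.25 for p in pts)
assert all(dist(a, b) >= 0.25 for a in net for b in net if a != b)
```

On a compact space the process must stop; on a non-totally-bounded space it would run forever, producing the ε-separated set S of the proof.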

Corollary 3.2. If X is a compact metric space, it has a countable dense subset.

Proof. Given ε = 2−n, let Sn be a finite set of points {xj} such that {Bε(xj)} covers X. Then C = ∪nSn is a countable dense subset of X.

Here is another useful property of compact metric spaces, which will eventually be generalized even further, in (E) below.

Proposition 3.3. Let X be a compact metric space. Assume K1 ⊃ K2 ⊃ K3 ⊃ · · · form a decreasing sequence of closed subsets of X. If each Kn ≠ ∅, then ∩nKn ≠ ∅.

Proof. Pick xn ∈ Kn. If (A) holds, (xn) has a convergent subsequence, xnk → y. Since {xnk : k ≥ ℓ} ⊂ Knℓ, which is closed for each ℓ, we have y ∈ ∩nKn.

Corollary 3.4. Let X be a compact metric space. Assume U 1 ⊂ U 2 ⊂ U 3 ⊂ · · · form an increasing sequence of open subsets of X . If ∪nU n = X , then U N = X for some N .

Proof. Consider K n = X \ U n.

The following is an important extension of Corollary 3.4. Note how this generalizes Proposition 1.8.


Proposition 3.5. If X is a compact metric space, then it has the property:

(D) Every open cover U α : α ∈ A of X has a finite subcover.

Proof. Each U α is a union of open balls, so it suffices to show that (A) implies the following:

(D’) Every cover Bα : α ∈ A of X by open balls has a finite subcover.

Let C = {zj : j ∈ N} ⊂ X be a countable dense subset of X, as in Corollary 3.2. Each Bα is a union of balls Brj(zj), with zj ∈ C ∩ Bα, rj rational. Thus it suffices to show that

(D”) Every countable cover Bj : j ∈ N of X by open balls has a finite subcover.

For this, we set

Un = B1 ∪ · · · ∪ Bn

and apply Corollary 3.4.

The following is a convenient alternative to property (D):

(E) If Kα ⊂ X are closed and ∩α Kα = ∅, then some finite intersection is empty.

Considering U α = X \ K α, we see that

(D) ⇐⇒ (E ).

The following result completes Proposition 3.5.

Theorem 3.6. For a metric space X ,

(A) ⇐⇒ (D).

Proof. By Proposition 3.5, (A) ⇒ (D). To prove the converse, it will suffice to show that (E) ⇒ (B). So let S ⊂ X and assume S has no accumulation point. We claim:

Such S must be closed.

Indeed, if z ∈ S̄ and z ∉ S, then z would have to be an accumulation point. Say S = {xα : α ∈ A}. Set Kα = S \ {xα}. Then each Kα has no accumulation point, hence Kα ⊂ X is closed. Also ∩αKα = ∅. Hence, if (E) holds, there exists a finite set F ⊂ A such that ∩α∈F Kα = ∅. Hence S = ∪α∈F {xα} is finite, so indeed (E) ⇒ (B).

Remark. So far we have that for every metric space X ,

(A) ⇐⇒ (B) ⇐⇒ (D) ⇐⇒ (E ) =⇒ (C ).

We claim that (C) implies the other conditions if X is complete. Of course, compactness implies completeness, but (C) may hold for incomplete X, e.g., X = (0, 1) ⊂ R.


Proposition 3.7. If X is a complete metric space with property (C), then X is compact.

Proof. It suffices to show that (C) ⇒ (B) if X is a complete metric space. So let S ⊂ X be an infinite set. Cover X by balls B1/2(x1), . . . , B1/2(xN). One of these balls contains infinitely many points of S, and so does its closure, say X1 = B̄1/2(y1). Now cover X by finitely many balls of radius 1/4; their intersection with X1 provides a cover of X1. One such set contains infinitely many points of S, and so does its closure X2 = B̄1/4(y2) ∩ X1. Continue in this fashion, obtaining

X1 ⊃ X2 ⊃ X3 ⊃ · · · ⊃ Xk ⊃ Xk+1 ⊃ · · · , Xj ⊂ B̄2−j(yj),

each containing infinitely many points of S. One sees that (yj) forms a Cauchy sequence. If X is complete, it has a limit, yj → z, and z is seen to be an accumulation point of S.

If Xj, 1 ≤ j ≤ m, is a finite collection of metric spaces, with metrics dj, we can define a Cartesian product metric space

(3.1) X = ∏mj=1 Xj, d(x, y) = d1(x1, y1) + · · · + dm(xm, ym).

Another choice of metric is δ(x, y) = [d1(x1, y1)2 + · · · + dm(xm, ym)2]1/2. The metrics d and δ are equivalent, i.e., there exist constants C0, C1 ∈ (0, ∞) such that

(3.2) C0 δ(x, y) ≤ d(x, y) ≤ C1 δ(x, y), ∀ x, y ∈ X.
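The equivalence (3.2) can be checked numerically when each factor is R with dj(s, t) = |s − t|. The Python sketch below (an illustration only) uses the constants C0 = 1 and C1 = √m, which are the bounds from Exercise 3 at the end of this section:

```python
import random

def d(x, y):
    # Sum metric of (3.1); here each factor is R with d_j(s, t) = |s - t|
    return sum(abs(a - b) for a, b in zip(x, y))

def delta(x, y):
    # The alternative metric delta of (3.2)
    return sum((a - b) ** 2 for a, b in zip(x, y)) ** 0.5

random.seed(2)
m = 3
for _ in range(1000):
    x = [random.uniform(-1, 1) for _ in range(m)]
    y = [random.uniform(-1, 1) for _ in range(m)]
    # (3.2) with C0 = 1, C1 = sqrt(m):
    assert delta(x, y) <= d(x, y) + 1e-12
    assert d(x, y) <= m ** 0.5 * delta(x, y) + 1e-12
```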

A key example is Rm, the Cartesian product of m copies of the real line R.
We describe some important classes of compact spaces.

Proposition 3.8. If Xj are compact metric spaces, 1 ≤ j ≤ m, so is X = ∏mj=1 Xj.

Proof. If (xν) is an infinite sequence of points in X, say xν = (x1ν, . . . , xmν), pick a convergent subsequence of (x1ν) in X1, and consider the corresponding subsequence of (xν), which we relabel (xν). Using this, pick a convergent subsequence of (x2ν) in X2. Continue. Having a subsequence such that xjν → yj in Xj for each j = 1, . . . , m, we then have a convergent subsequence in X.

The following result is useful for analysis on Rn.

Proposition 3.9. If K is a closed bounded subset of Rn, then K is compact.

Proof. This has been proved in §1. There it was noted that the result follows from the compactness of a closed bounded interval I = [a, b] in R, which in turn was proved in §9 of Chapter 1. Here, we just note that compactness of [a, b] is also a corollary of Proposition 3.7.

We next give a slightly more sophisticated result on compactness. The following extension of Proposition 3.8 is a special case of Tychonov’s Theorem.


Proposition 3.10. If {Xj : j ∈ Z+} are compact metric spaces, so is X = ∏∞j=1 Xj.

Here, we can make X a metric space by setting

(3.3) d(x, y) = ∑∞j=1 2−j dj(pj(x), pj(y)) / [1 + dj(pj(x), pj(y))],

where pj : X → Xj is the projection onto the jth factor. It is easy to verify that, if xν ∈ X, then xν → y in X, as ν → ∞, if and only if, for each j, pj(xν) → pj(y) in Xj.
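Since each summand of (3.3) is at most 2−j, the series converges, and a partial sum with N terms approximates d(x, y) with tail error at most 2−N. A Python sketch (an illustration only; the choice of factors Xj = [0, 1] and the points x, y is hypothetical):

```python
def prod_metric(x, y, d_factors, terms=50):
    # Partial sum of (3.3); the omitted tail is at most 2 ** -terms,
    # since each summand is bounded by 2 ** -j.
    total = 0.0
    for j in range(1, terms + 1):
        dj = d_factors(j, x(j), y(j))
        total += 2 ** (-j) * dj / (1 + dj)
    return total

# Two points of the product of intervals [0, 1]: x_j = 1/j, y_j = 0
x = lambda j: 1.0 / j
y = lambda j: 0.0
d_factors = lambda j, s, t: abs(s - t)
dist = prod_metric(x, y, d_factors)
assert 0 < dist < 1      # each term is < 2 ** -j, so the whole sum is < 1
```

The 1/(1 + dj) factor keeps each coordinate's contribution bounded, which is what makes the weighted sum finite regardless of the factor metrics.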

Proof. Following the argument in Proposition 3.8, if (xν) is an infinite sequence of points in X, we obtain a nested family of subsequences

(3.4) (xν) ⊃ (x1ν) ⊃ (x2ν) ⊃ · · · ⊃ (xjν) ⊃ · · ·

such that pℓ(xjν) converges in Xℓ, for 1 ≤ ℓ ≤ j. The next step is a diagonal construction. We set

(3.5) ξν = xνν ∈ X.

Then, for each j, after throwing away a finite number N(j) of elements, one obtains from (ξν) a subsequence of the sequence (xjν) in (3.4), so pℓ(ξν) converges in Xℓ for all ℓ. Hence (ξν) is a convergent subsequence of (xν).

Exercises

1. Let ϕ : [0, ∞) → [0, ∞) have the following properties:

ϕ(0) = 0, ϕ(s) < ϕ(s + t) ≤ ϕ(s) + ϕ(t), for s ≥ 0, t > 0.

Prove that if d(x, y) is symmetric and satisfies the triangle inequality, so does

δ (x, y) = ϕ(d(x, y)).

2. Show that the function d(x, y) defined by (3.3) satisfies (2.1).
Hint. Consider ϕ(r) = r/(1 + r).

3. In the setting of (3.1), let

δ(x, y) = [d1(x1, y1)2 + · · · + dm(xm, ym)2]1/2.

Show that

δ(x, y) ≤ d(x, y) ≤ √m δ(x, y).

4. Let X be a metric space, p ∈ X, and let K ⊂ X be compact. Show that there exist x0, x1 ∈ K such that

d(x0, p) ≤ d(x, p), ∀ x ∈ K,

d(x1, p) ≥ d(x, p), ∀ x ∈ K.


Chapter III

Functions

Introduction

The playing fields for analysis are spaces, and the players themselves are functions. In this chapter we develop some frameworks for understanding the behavior of various classes of functions. We spend about half the chapter studying functions f : X → Y from one metric space (X) to another (Y), and about half specializing to the case Y = Rn.

Our emphasis is on continuous functions, and §1 presents a number of results on continuous functions f : X → Y, which by definition have the property

xν → x =⇒ f(xν) → f(x).

We devote particular attention to the behavior of continuous functions on compact sets. We bring in the notion of uniform continuity, a priori stronger than continuity, and show that f continuous on X ⇒ f uniformly continuous on X, provided X is compact. We also introduce the notion of connectedness, and extend the intermediate value theorem given in §9 of Chapter 1 to the setting where X is a connected metric space, and f : X → R is continuous.

In §2 we consider sequences and series of functions, starting with sequences (fj) of functions fj : X → Y. We study convergence and uniform convergence. We move to infinite series

∑∞j=1 fj(x),

in case Y = Rn, and discuss conditions on fj yielding convergence, absolute convergence, and uniform convergence. Section 3 introduces a special class of infinite series, power series,

∑∞k=0 akzk.

Here we take ak ∈ C and z ∈ C, and consider conditions yielding convergence on a disk DR = {z ∈ C : |z| < R}. This section is a prelude to a deeper study of power series, as it relates to calculus, in Chapter 4.

In §4 we study spaces of functions, including C(X, Y), the set of continuous functions f : X → Y. Under certain hypotheses (e.g., if either X or Y is compact) we can take

D(f, g) = supx∈X dY(f(x), g(x)),

as a distance function, making C(X, Y) a metric space. We investigate conditions under which this metric space can be shown to be complete. We also investigate conditions under which certain subsets of C(X, Y) can be shown to be compact. Unlike §§1–3, this section will not have much impact on Chapters 4–5, but we include it to indicate further interesting directions that analysis does take.


1. Continuous functions

Let X and Y be metric spaces, with distance functions dX and dY, respectively. A function f : X → Y is said to be continuous at a point x ∈ X if and only if

(1.1) xν → x in X =⇒ f (xν ) → f (x) in Y,

or, equivalently, for each ε > 0, there exists δ > 0 such that

(1.1A) dX(x′, x) < δ =⇒ dY(f(x′), f(x)) < ε.

We say f is continuous on X if it is continuous at each point of X. Here is an equivalent condition.

Proposition 1.1. Given f : X → Y , f is continuous on X if and only if

(1.1B) U open in Y =⇒ f −1(U ) open in X.

Proof. First, assume f is continuous. Let U ⊂ Y be open, and assume x ∈ f−1(U), so f(x) = y ∈ U. Continuity of f at x forces the image of Bε(x) to lie in a small ball Bδ(y) about y if ε is small enough, hence to lie in U, given that U is open. Thus Bε(x) ⊂ f−1(U) for ε small enough, so f−1(U) must be open.

Conversely, assume (1.1B) holds. If x ∈ X, and f(x) = y, then for all δ > 0, f−1(Bδ(y)) must be an open set containing x, so f−1(Bδ(y)) contains Bε(x) for some ε > 0. Hence f is continuous at x.

We record the following important link between continuity and compactness. This extends Proposition 9.4 of Chapter 1.

Proposition 1.2. If X and Y are metric spaces, f : X → Y continuous, and K ⊂ X compact, then f (K ) is a compact subset of Y.

Proof. If (yν) is an infinite sequence of points in f(K), pick xν ∈ K such that f(xν) = yν. If K is compact, we have a subsequence xνj → p in X, and then yνj → f(p) in Y.

If f : X → R is continuous, we say f ∈ C (X ). A useful corollary of Proposition 1.2 is:

Proposition 1.3. If X is a compact metric space and f ∈ C (X ), then f assumes a maximum and a minimum value on X.

Proof. We know from Proposition 1.2 that f(X) is a compact subset of R, hence bounded. Proposition 6.1 of Chapter 1 implies f(X) ⊂ R has a sup and an inf, and, as noted in (9.7) of Chapter 1, these numbers are in f(X). That is, we have

(1.2) b = max f(X), a = min f(X).


Hence a = f (x0) for some x0 ∈ X , and b = f (x1) for some x1 ∈ X .

For later use, we mention that if X is a nonempty set and f : X → R is bounded from above, disregarding any notion of continuity, we set

(1.3) supx∈X f(x) = sup f(X),

and if f : X → R is bounded from below, we set

(1.4) infx∈X f(x) = inf f(X).

If f is not bounded from above, we set sup f = +∞, and if f is not bounded from below, we set inf f = −∞.

Given a set X, f : X → R, and xn ∈ X , we set

(1.5) lim supn→∞ f(xn) = limn→∞ supk≥n f(xk),

and

(1.6) lim infn→∞ f(xn) = limn→∞ infk≥n f(xk).
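For a concrete instance of (1.5) and (1.6), take f(xn) = an with an = (−1)n(1 + 1/n); then lim sup an = 1 and lim inf an = −1, even though no an equals ±1. A Python sketch (an illustration only; the long-tail approximation stands in for the limits in n):

```python
# a_n = (-1) ** n (1 + 1/n): lim sup a_n = 1, lim inf a_n = -1.
a = lambda n: (-1) ** n * (1 + 1 / n)
N = 10000
tail = [a(n) for n in range(N, N + 1000)]
# sup_{k >= n} a_k and inf_{k >= n} a_k over a long tail approximate
# the inner sup and inf in (1.5) and (1.6).
assert abs(max(tail) - 1) < 1e-3
assert abs(min(tail) + 1) < 1e-3
```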

We return to the notion of continuity. A function f ∈ C (X ) is said to be uniformly continuous provided that, for any ε > 0, there exists δ > 0 such that

(1.7) x, y ∈ X, d(x, y) ≤ δ =⇒ |f (x) − f (y)| ≤ ε.

More generally, if Y is a metric space with distance function dY, a function f : X → Y is said to be uniformly continuous provided that, for any ε > 0, there exists δ > 0 such that

(1.8) x, y ∈ X, dX(x, y) ≤ δ =⇒ dY (f (x), f (y)) ≤ ε.

An equivalent condition is that f have a modulus of continuity, i.e., a monotonic function ω : [0, 1) → [0, ∞) such that δ ↘ 0 ⇒ ω(δ) ↘ 0, and such that

(1.9) x, y ∈ X, dX(x, y) ≤ δ ≤ 1 =⇒ dY (f (x), f (y)) ≤ ω(δ ).

Not all continuous functions are uniformly continuous. For example, if X = (0, 1) ⊂ R, then f(x) = sin(1/x) is continuous, but not uniformly continuous, on X. The following result is useful, for example, in the development of the Riemann integral in Chapter 4.
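The failure of uniform continuity for sin(1/x) can be seen concretely: there are points arbitrarily close together in (0, 1) whose images differ by 2, so no single δ can serve every pair of nearby points. A Python sketch (an illustration only):

```python
import math

f = lambda x: math.sin(1 / x)

# x_k = 1/((2k + 1/2) pi) and y_k = 1/((2k + 3/2) pi) satisfy
# f(x_k) = 1 and f(y_k) = -1, while x_k - y_k -> 0.
for k in (1, 10, 100):
    xk = 1 / ((2 * k + 0.5) * math.pi)
    yk = 1 / ((2 * k + 1.5) * math.pi)
    assert abs(f(xk) - f(yk)) > 1.9      # images stay 2 apart (up to rounding)
    assert abs(xk - yk) < 1 / k          # yet the points get arbitrarily close
```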

Proposition 1.4. If X is a compact metric space and f : X → Y is continuous, then f is uniformly continuous.

Proof. If not, there exist ε > 0 and xν , yν ∈ X such that dX(xν , yν ) ≤ 2−ν but

(1.10) dY (f (xν ), f (yν )) ≥ ε.

Taking a convergent subsequence xνj → p, we also have yνj → p. Now continuity of f at p implies f(xνj) → f(p) and f(yνj) → f(p), contradicting (1.10).

If X and Y are metric spaces and f : X → Y is continuous, one-to-one, and onto, and if its inverse g = f−1 : Y → X is continuous, we say f is a homeomorphism. Here is a useful sufficient condition for producing homeomorphisms.


Proposition 1.5. Let X be a compact metric space. Assume f : X → Y is continuous,one-to-one, and onto. Then its inverse g : Y → X is continuous.

Proof. If K ⊂ X is closed, then K is compact, so by Proposition 1.2, f(K) ⊂ Y is compact, hence closed. Now if U ⊂ X is open, with complement K = X \ U, we see that f(U) = Y \ f(K), so U open ⇒ f(U) open, that is,

U ⊂ X open =⇒ g−1(U ) open.

Hence, by Proposition 1.1, g is continuous.

We next define the notion of a connected space. A metric space X is said to be connected provided that it cannot be written as the union of two disjoint nonempty open subsets. The following is a basic class of examples.

Proposition 1.6. Each interval I in R is connected.

Proof. This is Proposition 2.3 of Chapter 2.

We say X is path-connected if, given any p, q ∈ X, there is a continuous map γ : [0, 1] → X such that γ(0) = p and γ(1) = q. It is an easy consequence of Proposition 1.6 that X is connected whenever it is path-connected.

The next result is known as the Intermediate Value Theorem. Note that it generalizes Proposition 9.6 of Chapter 1.

Proposition 1.7. Let X be a connected metric space and f : X → R continuous. Assume p, q ∈ X, and f(p) = a < f(q) = b. Then, given any c ∈ (a, b), there exists z ∈ X such that f(z) = c.

Proof. Under the hypotheses, A = {x ∈ X : f(x) < c} is open and contains p, while B = {x ∈ X : f(x) > c} is open and contains q. If X is connected, then A ∪ B cannot be all of X; so any point in its complement has the desired property.
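When X = [a, b] ⊂ R, the intermediate value theorem also admits a constructive proof by bisection, which doubles as a root-finding algorithm. A Python sketch (an illustration only, not the connectedness argument of the text):

```python
def ivt_bisect(f, a, b, c, tol=1e-10):
    # Assumes f continuous on [a, b] with f(a) < c < f(b).
    # Bisection maintains the invariant f(lo) < c <= f(hi).
    lo, hi = a, b
    while abs(hi - lo) > tol:
        mid = (lo + hi) / 2
        if f(mid) < c:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

# Find z in [0, 2] with z ** 3 = 5, i.e. a cube root of 5.
z = ivt_bisect(lambda x: x ** 3, 0.0, 2.0, c=5.0)
assert abs(z ** 3 - 5.0) < 1e-8
```

The interval halves at each step, so about log2((b − a)/tol) iterations suffice; continuity is exactly what guarantees the limit point satisfies f(z) = c.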

Exercises

1. If X is a metric space, with distance function d, show that

|d(x′, y′) − d(x, y)| ≤ d(x, x′) + d(y, y′),

and hence

d : X × X −→ [0, ∞) is continuous.

2. Let pn(x) = xn. Take b > a > 0, and consider

pn : [a, b] −→ [an, bn].


Use the intermediate value theorem to show that pn is onto.

3. In the setting of Exercise 2, show that pn is one-to-one, so it has an inverse

q n : [an, bn] −→ [a, b].

Use Proposition 1.5 to show that q n is continuous. The common notation is

q n(x) = x1/n, x > 0.

Note. This strengthens Proposition 7.1 of Chapter 1.

4. Let f, g : X → C be continuous, and let h(x) = f(x)g(x). Show that h : X → C is continuous.

5. Define pn : C → C by pn(z) = zn. Show that pn is continuous for each n ∈ N.
Hint. Start at n = 1, and use Exercise 4 to produce an inductive proof.

6. Let X, Y, Z be metric spaces. Assume f : X → Y and g : Y → Z are continuous. Define g ◦ f : X → Z by g ◦ f(x) = g(f(x)). Show that g ◦ f is continuous.

7. Let fj : X → Yj be continuous, for j = 1, 2. Define g : X → Y1 × Y2 by g(x) = (f1(x), f2(x)). Show that g is continuous.

We present some exercises that deal with functions that are semicontinuous. Given a metric space X and f : X → [−∞, ∞], we say f is lower semicontinuous provided

f −1((c, ∞]) ⊂ X is open, ∀ c ∈ R.

We say f is upper semicontinuous provided

f −1([−∞, c)) is open, ∀ c ∈ R.

8. Show that

f is lower semicontinuous ⇐⇒ f −1([−∞, c]) is closed, ∀ c ∈ R,

andf is upper semicontinuous ⇐⇒ f −1([c, ∞]) is closed, ∀ c ∈ R.

9. Show that

f is lower semicontinuous ⇐⇒ xn → x implies lim inf f (xn) ≥ f (x).


Show that

f is upper semicontinuous ⇐⇒ xn → x implies lim sup f (xn) ≤ f (x).

10. Given S ⊂ X , show that

χS is lower semicontinuous ⇐⇒ S is open.

χS is upper semicontinuous ⇐⇒ S is closed.

Here, χS (x) = 1 if x ∈ S, 0 if x /∈ S .

11. If X is a compact metric space, show that

f : X → R is lower semicontinuous =⇒ min f is achieved.


2. Sequences and series of functions

Let X and Y be metric spaces, with distance functions dX and dY, respectively. Consider a sequence of functions fj : X → Y, which we denote (fj). To say (fj) converges at x to f : X → Y is simply to say that fj(x) → f(x) in Y. If such convergence holds for each x ∈ X, we say (fj) converges to f on X, pointwise.

A stronger type of convergence is uniform convergence. We say f j → f uniformly on X provided

(2.1) supx∈X dY(fj(x), f(x)) −→ 0, as j → ∞.

An equivalent characterization is that, for each ε > 0, there exists K ∈ N such that

(2.2) j ≥ K =⇒ dY(fj(x), f(x)) ≤ ε, ∀ x ∈ X.
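The gap between pointwise and uniform convergence is illustrated by fj(x) = xj on [0, 1): the pointwise limit is f = 0, but supx∈[0,1) |fj(x) − f(x)| = 1 for every j, so (2.1) fails. A Python sketch (an illustration only; the finite grid stands in for the interval):

```python
# f_j(x) = x ** j on [0, 1) converges pointwise to 0, but the sup of
# |f_j| over [0, 1) equals 1 for every j, so convergence is not uniform.
grid = [k / 1000 for k in range(1000)]       # finite sample of [0, 1)
for j in (5, 50, 500):
    sup_err = max(x ** j for x in grid)
    assert sup_err > 0.5                     # sup error does not tend to 0
assert 0.5 ** 500 < 1e-100                   # yet at each fixed x, f_j(x) -> 0
```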

A significant property of uniform convergence is that passing to the limit preserves continuity.

Proposition 2.1. If f j : X → Y is continuous for each j and f j → f uniformly, then f : X → Y is continuous.

Proof. Fix p ∈ X and take ε > 0. Pick K ∈ N such that (2.2) holds. Then pick δ > 0 such that

(2.3) x ∈ Bδ( p) =⇒ dY (f K (x), f K ( p)) < ε,

which can be done since f K : X → Y is continuous. Together, (2.2) and (2.3) imply

(2.4)

x ∈ Bδ(p) ⇒ dY(f(x), f(p))
≤ dY(f(x), fK(x)) + dY(fK(x), fK(p)) + dY(fK(p), f(p))
≤ 3ε.

Thus f is continuous at p, for each p ∈ X .

We next consider Cauchy sequences of functions fj : X → Y. To say (fj) is Cauchy at x ∈ X is simply to say (fj(x)) is a Cauchy sequence in Y. We say (fj) is uniformly Cauchy provided

(2.5) supx∈X dY(fj(x), fk(x)) −→ 0, as j, k → ∞.

An equivalent characterization is that, for each ε > 0, there exists K ∈ N such that

(2.6) j, k ≥ K =⇒ dY (f j(x), f k(x)) ≤ ε, ∀ x ∈ X.

If Y is complete, a Cauchy sequence (fj) will have a limit f : X → Y. We have the following.


Proposition 2.2. Assume Y is complete, and f j : X → Y is uniformly Cauchy. Then (f j) converges uniformly to a limit f : X → Y .

Proof. We have already seen that there exists f : X → Y such that fj(x) → f(x) for each x ∈ X. To finish the proof, take ε > 0, and pick K ∈ N such that (2.6) holds. Then taking k → ∞ yields

(2.7) j ≥ K =⇒ dY (f j(x), f (x)) ≤ ε, ∀ x ∈ X,

yielding the uniform convergence.

If, in addition, each f j : X → Y is continuous, we can put Propositions 2.1 and 2.2 together. We leave this to the reader.

It is useful to note the following phenomenon in the case where, in addition, X is compact.

Proposition 2.3. Assume X is compact, f j : X → Y continuous, and f j → f uniformly on X . Then

(2.8) K = f (X ) ∪ (∪_{j≥1} f j(X )) ⊂ Y is compact.

Proof. Let (yν ) ⊂ K be an infinite sequence. If there exists j ∈ N such that yν ∈ f j(X ) for infinitely many ν , convergence of a subsequence to an element of f j(X ) follows from the known compactness of f j(X ). Ditto if yν ∈ f (X ) for infinitely many ν . It remains to consider the situation yν ∈ f jν (X ), jν → ∞ (after perhaps taking a subsequence). That is, suppose yν = f jν (xν ), xν ∈ X , jν → ∞. Passing to a further subsequence, we can assume xν → x in X , and then it follows from the uniform convergence that

(2.9) yν −→ y = f (x) ∈ K.

We move from sequences to series. For this, we need some algebraic structure on Y . Thus, for the rest of this section, we assume

(2.10) f j : X −→ Rn,

for some n ∈ N. We look at the infinite series

(2.11) ∑_{k=0}^{∞} f k(x),

and seek conditions for convergence, which is the same as convergence of the sequence of partial sums,

(2.12) S j(x) = ∑_{k=0}^{j} f k(x).


Parallel to Proposition 6.12 of Chapter 1, we have convergence at x ∈ X provided

(2.13) ∑_{k=0}^{∞} |f k(x)| < ∞,

i.e., provided there exists Bx < ∞ such that

(2.14) ∑_{k=0}^{j} |f k(x)| ≤ Bx, ∀ j ∈ N.

In such a case, we say the series (2.11) converges absolutely at x. We say (2.11) converges uniformly on X if and only if (S j) converges uniformly on X . The following sufficient condition for uniform convergence is called the Weierstrass M-test.

Proposition 2.4. Assume there exist M k such that |f k(x)| ≤ M k, for all x ∈ X , and

(2.15) ∑_{k=0}^{∞} M k < ∞.

Then the series (2.11) converges uniformly on X , to a limit S : X → Rn.

Proof. This proof is also similar to that of Proposition 6.12 of Chapter 1, but we review it. We have

(2.16) |S m+ℓ(x) − S m(x)| = |∑_{k=m+1}^{m+ℓ} f k(x)| ≤ ∑_{k=m+1}^{m+ℓ} |f k(x)| ≤ ∑_{k=m+1}^{m+ℓ} M k.

Now (2.15) implies σm = ∑_{k=0}^{m} M k is uniformly bounded, so (by Proposition 6.10 of Chapter 1), σm ↗ β for some β ∈ R+. Hence

(2.17) |S m+ℓ(x) − S m(x)| ≤ σm+ℓ − σm ≤ β − σm → 0, as m → ∞,

independent of ℓ ∈ N and x ∈ X . Thus (S j) is uniformly Cauchy on X , and uniform convergence follows by Proposition 2.2.

Bringing in Proposition 2.1, we have the following.


Corollary 2.5. In the setting of Proposition 2.4, if also each f k : X → Rn is continuous, so is the limit S .
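As a numerical aside (an illustration added here, not part of the original text), one can take the hypothetical example f k(x) = sin(kx)/k^2 on X = R, with M k = 1/k^2 and ∑ M k < ∞, and check the uniform Cauchy estimate from the proof on a grid:

```python
import numpy as np

# Hypothetical example (not from the text): f_k(x) = sin(k*x)/k**2 on R,
# with M_k = 1/k**2, so sum M_k < infinity and the M-test applies.
xs = np.linspace(-10.0, 10.0, 2001)

def partial_sum(j):
    # S_j(x) = sum_{k=1}^{j} f_k(x), evaluated on the grid
    return sum(np.sin(k * xs) / k**2 for k in range(1, j + 1))

def tail(m, big=100000):
    # numerical bound for sum_{k > m} M_k, dominating the gap in (2.16)
    return sum(1.0 / k**2 for k in range(m + 1, big + 1))

# |S_200(x) - S_100(x)| <= sum_{k=101}^{200} M_k <= tail(100), uniformly in x
gap = np.max(np.abs(partial_sum(200) - partial_sum(100)))
print(gap <= tail(100))
```

The bound is uniform in x, which is exactly what makes (S j) uniformly Cauchy.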

Exercises

1. For j ∈ N, define f j : R → R by

f 1(x) = x/(1 + x^2), f j(x) = f 1( jx).

Show that f j → 0 pointwise on R. Show that, for each ε > 0, f j → 0 uniformly on R \ (−ε, ε). Show that (f j) does not converge uniformly to 0 on R.

2. For j ∈ N, define gj : R → R by

g1(x) = x/√(1 + x^2), gj(x) = g1( jx).

Show that there exists g : R → R such that gj → g pointwise. Show that g is not continuous on all of R. Where is g discontinuous?

3. Let X be a compact metric space. Assume f j , f : X → R are continuous and

f j(x) ↗ f (x), ∀ x ∈ X.

Prove that f j → f uniformly on X . (This result is called Dini’s theorem.)
Hint. For ε > 0, let K j(ε) = {x ∈ X : f (x) − f j(x) ≥ ε}. Note that K j(ε) ⊃ K j+1(ε) ⊃ · · · . What about ∩_{j≥1} K j(ε)?

4. Take gj as in Exercise 2 and consider

∑_{k=1}^{∞} (1/k^2) gk(x).

Show that this series converges uniformly on R, to a continuous limit.

5. Take f j as in Exercise 1 and consider

∑_{k=1}^{∞} (1/k) f k(x).

Where does this series converge? Where does it converge uniformly? Where is the sum continuous?
Hint. For use in the latter questions, note that, for ℓ ∈ N, ℓ ≤ k ≤ 2ℓ, we have f k(1/ℓ) ∈ [2/5, 1/2].


3. Power series

An important class of infinite series is the class of power series

(3.1) ∑_{k=0}^{∞} ak z^k,

with ak ∈ C. Note that if z1 ≠ 0 and (3.1) converges for z = z1, then there exists C < ∞ such that

(3.2) |ak z1^k| ≤ C, ∀ k.

Hence, if |z| ≤ r|z1|, r < 1, we have

(3.3) ∑_{k=0}^{∞} |ak z^k| ≤ C ∑_{k=0}^{∞} r^k = C/(1 − r) < ∞,

the last identity being the classical geometric series computation. (Compare (10.49) in Chapter 1.) This yields the following.

Proposition 3.1. If (3.1) converges for some z1 ≠ 0, then either this series is absolutely convergent for all z ∈ C, or there is some R ∈ (0, ∞) such that the series is absolutely convergent for |z| < R and divergent for |z| > R.

We call R the radius of convergence of (3.1). In case of convergence for all z, we say the radius of convergence is infinite. If R > 0 and (3.1) converges for |z| < R, it defines a function

(3.4) f (z) = ∑_{k=0}^{∞} ak z^k, z ∈ DR,

on the disk of radius R centered at the origin,

(3.5) DR = {z ∈ C : |z| < R}.

Proposition 3.2. If the series (3.4) converges in DR, then it converges uniformly on DS for all S < R, and hence f is continuous on DR, i.e., given zn, z ∈ DR,

(3.6) zn → z =⇒ f (zn) → f (z).

Proof. For each z ∈ DR, there exists S < R such that z ∈ DS , so it suffices to show that f is continuous on DS whenever 0 < S < R. Pick T such that S < T < R. We know that there exists C < ∞ such that |ak T^k| ≤ C for all k. Hence

(3.7) z ∈ DS =⇒ |ak z^k| ≤ C (S/T )^k.


Since

(3.8) ∑_{k=0}^{∞} (S/T )^k < ∞,

the Weierstrass M-test, Proposition 2.4, applies, to yield uniform convergence on DS . Since

(3.9) ∀ k, akzk is continuous,

continuity of f on DS follows from Corollary 2.5.

More generally, a power series has the form

(3.10) f (z) = ∑_{n=0}^{∞} an(z − z0)^n.

It follows from Proposition 3.1 that to such a series there is associated a radius of convergence R ∈ [0, ∞], with the property that the series converges absolutely whenever |z − z0| < R (if R > 0), and diverges whenever |z − z0| > R (if R < ∞). We identify R as follows:

(3.11) 1/R = limsup_{n→∞} |an|^{1/n}.

This is established in the following result, which complements Propositions 3.1–3.2.

Proposition 3.3. The series (3.10) converges whenever |z − z0| < R and diverges whenever |z − z0| > R, where R is given by (3.11). If R > 0, the series converges uniformly on {z : |z − z0| ≤ R′}, for each R′ < R. Thus, when R > 0, the series (3.10) defines a continuous function

(3.12) f : DR(z0) −→ C,

where

(3.13) DR(z0) = {z ∈ C : |z − z0| < R}.

Proof. If R′ < R, then there exists N ∈ Z+ such that

n ≥ N =⇒ |an|^{1/n} < 1/R′ =⇒ |an|(R′)^n < 1.

Thus

(3.14) |z − z0| < R′ < R =⇒ |an(z − z0)^n| ≤ (|z − z0|/R′)^n,


for n ≥ N , so (3.10) is dominated by a convergent geometric series in DR′(z0).

For the converse, we argue as follows. Suppose R′ > R, so infinitely many |an|^{1/n} ≥ 1/R′, hence infinitely many |an|(R′)^n ≥ 1. Then

|z − z0| ≥ R′ > R =⇒ infinitely many |an(z − z0)^n| ≥ (|z − z0|/R′)^n ≥ 1,

forcing divergence for |z − z0| > R. The assertions about uniform convergence and continuity follow as in Proposition 3.2.

It is useful to note that we can multiply power series with radius of convergence R > 0. In fact, there is the following more general result on products of absolutely convergent series.

Proposition 3.4. Given absolutely convergent series

(3.15) A = ∑_{n=0}^{∞} αn, B = ∑_{n=0}^{∞} βn,

we have the absolutely convergent series

(3.16) AB = ∑_{n=0}^{∞} γn, γn = ∑_{j=0}^{n} αj βn−j.

Proof. Take Ak = ∑_{n=0}^{k} αn, Bk = ∑_{n=0}^{k} βn. Then

(3.17) Ak Bk = ∑_{n=0}^{k} γn + Rk

with

(3.18) Rk = ∑_{(m,n)∈σ(k)} αm βn, σ(k) = {(m, n) ∈ Z+ × Z+ : m, n ≤ k, m + n > k}.

Hence

(3.19) |Rk| ≤ ∑_{m≤k/2} ∑_{k/2≤n≤k} |αm| |βn| + ∑_{k/2≤m≤k} ∑_{n≤k} |αm| |βn| ≤ Ā ∑_{n≥k/2} |βn| + B̄ ∑_{m≥k/2} |αm|,

where

(3.20) Ā = ∑_{n=0}^{∞} |αn| < ∞, B̄ = ∑_{n=0}^{∞} |βn| < ∞.

It follows that Rk → 0 as k → ∞. Thus the left side of (3.17) converges to AB and the right side to ∑_{n=0}^{∞} γn. The absolute convergence of (3.16) follows by applying the same argument with αn replaced by |αn| and βn replaced by |βn|.


Corollary 3.5. Suppose the following power series converge for |z| < R:

(3.21) f (z) = ∑_{n=0}^{∞} an z^n, g(z) = ∑_{n=0}^{∞} bn z^n.

Then, for |z| < R,

(3.22) f (z)g(z) = ∑_{n=0}^{∞} cn z^n, cn = ∑_{j=0}^{n} aj bn−j.
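As a numerical aside (added here, not part of the original text), the Cauchy product formula can be checked on the hypothetical example an = bn = 1, i.e. f (z) = g(z) = 1/(1 − z) for |z| < 1, where the product coefficients are cn = n + 1:

```python
# Hypothetical check (not from the text) of (3.22) with a_n = b_n = 1,
# so f(z) = g(z) = 1/(1-z) for |z| < 1, and f(z)*g(z) = 1/(1-z)**2.
N = 60
a = [1.0] * N
c = [sum(a[j] * a[n - j] for j in range(n + 1)) for n in range(N)]
print(c[:5])   # c_n = n + 1: [1.0, 2.0, 3.0, 4.0, 5.0]

z = 0.5
approx = sum(c[n] * z**n for n in range(N))
print(abs(approx - 1.0 / (1.0 - z)**2))   # essentially 0
```

This matches the known expansion 1/(1 − z)^2 = ∑ (n + 1) z^n, obtained here purely by convolving coefficients.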

The following result, which is related to Proposition 3.4, has a similar proof.

Proposition 3.6. If ajk ∈ C and ∑_{j,k} |ajk| < ∞, then ∑_j ajk is absolutely convergent for each k, ∑_k ajk is absolutely convergent for each j, and

(3.23) ∑_{j=0}^{∞} (∑_{k=0}^{∞} ajk) = ∑_{k=0}^{∞} (∑_{j=0}^{∞} ajk) = ∑_{j,k} ajk.

Proof. Clearly the hypothesis implies ∑_j |ajk| < ∞ for each k and ∑_k |ajk| < ∞ for each j. It also implies that there exists B < ∞ such that

S N = ∑_{j=0}^{N} ∑_{k=0}^{N} |ajk| ≤ B, ∀ N.

Now S N is bounded and monotone, so there exists a limit, S N ↗ A < ∞ as N ↗ ∞. It follows that, for each ε > 0, there exists N ∈ N such that

∑_{(j,k)∈C(N)} |ajk| < ε, C(N ) = {( j, k) ∈ N × N : j > N or k > N }.

Now

|∑_{j=0}^{∞} ∑_{k=0}^{∞} ajk − ∑_{j=0}^{N} ∑_{k=0}^{N} ajk| ≤ ∑_{(j,k)∈C(N)} |ajk| < ε.

We have a similar result with the roles of j and k reversed, and clearly the two finite sums agree. It follows that

|∑_{j=0}^{∞} ∑_{k=0}^{∞} ajk − ∑_{k=0}^{∞} ∑_{j=0}^{∞} ajk| < 2ε, ∀ ε > 0,

yielding (3.23).
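As a numerical aside (added here, not part of the original text), the interchange of summation order in (3.23) can be checked on a hypothetical absolutely summable double sequence:

```python
# Hypothetical check (not from the text) of (3.23) with a_jk = x**j * y**k,
# |x|, |y| < 1, for which sum_{j,k} |a_jk| < infinity.  Both iterated sums
# equal 1/((1-x)(1-y)).
x, y, N = 0.3, 0.5, 80
sum_jk = sum(sum(x**j * y**k for k in range(N)) for j in range(N))
sum_kj = sum(sum(x**j * y**k for j in range(N)) for k in range(N))
exact = 1.0 / ((1 - x) * (1 - y))
print(abs(sum_jk - exact), abs(sum_kj - exact))   # both essentially 0
```

Absolute summability is what licenses the interchange; without it (e.g. for conditionally convergent double arrays) the two iterated sums can differ.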

Using Proposition 3.6, we demonstrate the following. (Thanks to Shrawan Kumar for this argument.)


Proposition 3.7. If (3.10) has a radius of convergence R > 0, and z1 ∈ DR(z0), then f (z) has a convergent power series about z1:

(3.24) f (z) = ∑_{k=0}^{∞} bk(z − z1)^k, for |z − z1| < R − |z1 − z0|.

Proof. There is no loss in generality in taking z0 = 0, which we will do here, for notational simplicity. Setting f z1(ζ ) = f (z1 + ζ ), we have from (3.10)

(3.25) f z1(ζ ) = ∑_{n=0}^{∞} an(ζ + z1)^n = ∑_{n=0}^{∞} ∑_{k=0}^{n} an (n choose k) ζ^k z1^{n−k},

the second identity by the binomial formula. Now,

(3.26) ∑_{n=0}^{∞} ∑_{k=0}^{n} |an| (n choose k) |ζ|^k |z1|^{n−k} = ∑_{n=0}^{∞} |an|(|ζ| + |z1|)^n < ∞,

provided |ζ | + |z1| < R, which is the hypothesis in (3.24) (with z0 = 0). Hence Proposition 3.6 gives

(3.27) f z1(ζ ) = ∑_{k=0}^{∞} (∑_{n=k}^{∞} an (n choose k) z1^{n−k}) ζ^k.

Hence (3.24) holds, with

(3.28) bk = ∑_{n=k}^{∞} an (n choose k) z1^{n−k}.

This proves Proposition 3.7. Note in particular that

(3.29) b1 = ∑_{n=1}^{∞} n an z1^{n−1}.
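As a numerical aside (added here, not part of the original text), the recentering formula (3.28) can be checked on the hypothetical example f (z) = 1/(1 − z) (an = 1, z0 = 0, R = 1), whose expansion about z1 is known in closed form:

```python
from math import comb

# Hypothetical check (not from the text) of (3.28) for f(z) = 1/(1-z)
# (a_n = 1, z0 = 0, R = 1), recentered at z1 = 0.5.  The known expansion
# about z1 is 1/(1-z) = sum_k (z - z1)**k / (1 - z1)**(k+1), so
# b_k = 1/(1 - z1)**(k+1).
z1, N = 0.5, 200   # truncate the sum over n at N
b = [sum(comb(n, k) * z1**(n - k) for n in range(k, N)) for k in range(5)]
for k in range(5):
    print(b[k], 1.0 / (1 - z1)**(k + 1))   # the pairs agree
```

The truncation at N = 200 is harmless here since the tail terms decay geometrically.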

For more on power series, see §3 of Chapter 4.

Exercises


1. Let ak ∈ C. Assume there exist K ∈ N, α < 1 such that

(3.30) k ≥ K =⇒ |ak+1/ak| ≤ α.

Show that ∑_{k=0}^{∞} ak is absolutely convergent.

Note. This is the ratio test.

2. Determine the radius of convergence R for each of the following power series. If 0 < R < ∞, try to determine when convergence holds at points on |z| = R.

(3.31) ∑_{n=0}^{∞} z^n, ∑_{n=1}^{∞} z^n/n, ∑_{n=1}^{∞} z^n/n^2, ∑_{n=1}^{∞} z^n/n!, ∑_{n=1}^{∞} z^n/2^n, ∑_{n=1}^{∞} z^{2n}/2^n, ∑_{n=1}^{∞} n z^n, ∑_{n=1}^{∞} n^2 z^n, ∑_{n=1}^{∞} n! z^n.

3. Prove Proposition 3.6.

4. We have seen that

(3.32) 1/(1 − z) = ∑_{k=0}^{∞} z^k, |z| < 1.

Find power series in z for

(3.33) 1/(z − 2), 1/(z + 3).

Where do they converge?

5. Use Corollary 3.5 to produce a power series in z for

(3.34) 1/(z^2 + z − 6).

Where does the series converge?

6. As an alternative to the use of Corollary 3.5, write (3.34) as a linear combination of the functions (3.33).


4. Spaces of functions

If X and Y are metric spaces, the space C (X, Y ) of continuous maps f : X → Y has a natural metric structure, under some additional hypotheses. We use

(4.1) D(f, g) = sup_{x∈X} d(f (x), g(x)).

This sup exists provided f (X ) and g(X ) are bounded subsets of Y , where to say B ⊂ Y is bounded is to say d : B × B → [0, ∞) has bounded image. In particular, this supremum exists if X is compact.

Proposition 4.1. If X is a compact metric space and Y is a complete metric space, then C (X, Y ), with the metric (4.1), is complete.

Proof. That D(f, g) satisfies the conditions to define a metric on C (X, Y ) is straightforward. We check completeness. Suppose (f ν ) is a Cauchy sequence in C (X, Y ), so, as ν → ∞,

(4.2) sup_{k≥0} sup_{x∈X} d(f ν+k(x), f ν (x)) ≤ εν → 0.

Then in particular (f ν (x)) is a Cauchy sequence in Y for each x ∈ X , so it converges, say to g(x) ∈ Y . It remains to show that g ∈ C (X, Y ) and that f ν → g in the metric (4.1).

In fact, taking k → ∞ in the estimate above, we have

(4.3) sup_{x∈X} d(g(x), f ν (x)) ≤ εν → 0,

i.e., f ν → g uniformly. It remains only to show that g is continuous. For this, let xj → x in X and fix ε > 0. Pick N so that εN < ε. Since f N is continuous, there exists J such that j ≥ J ⇒ d(f N (xj), f N (x)) < ε. Hence

j ≥ J ⇒ d(g(xj), g(x)) ≤ d(g(xj), f N (xj)) + d(f N (xj), f N (x)) + d(f N (x), g(x)) < 3ε.

This completes the proof.

In case Y = R, we write C (X,R) = C (X ). The distance function (4.1) can then be written

(4.4) D(f, g) = ‖f − g‖sup, ‖f ‖sup = sup_{x∈X} |f (x)|.

‖f ‖sup is a norm on C (X ). Generally, a norm on a vector space V is an assignment f → ‖f ‖ ∈ [0, ∞), satisfying

(4.5) ‖f ‖ = 0 ⇔ f = 0, ‖af ‖ = |a| ‖f ‖, ‖f + g‖ ≤ ‖f ‖ + ‖g‖,


given f , g ∈ V and a a scalar (in R or C). A vector space equipped with a norm is called a normed vector space. It is then a metric space, with distance function D(f, g) = ‖f − g‖. If the space is complete, one calls V a Banach space.

In particular, by Proposition 4.1, C (X ) is a Banach space, when X is a compact metric space.
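As a numerical aside (added here, not part of the original text), the sup-norm distance (4.1) on C ([0, 1]) can be approximated on a grid, and the norm properties (4.5) spot-checked, for a few hypothetical sample functions:

```python
import numpy as np

# Hypothetical sketch (not from the text): grid approximation of the
# sup-norm distance D(f, g) = sup_x |f(x) - g(x)| on C([0, 1]).
xs = np.linspace(0.0, 1.0, 10001)
f, g, h = np.sin(xs), xs**2, np.exp(-xs)
zero = np.zeros_like(xs)

def D(u, v):
    # grid approximation to sup_{x in [0,1]} |u(x) - v(x)|
    return float(np.max(np.abs(u - v)))

print(D(f, h) <= D(f, g) + D(g, h))          # triangle inequality
print(abs(D(3 * f, zero) - 3 * D(f, zero)))  # homogeneity, essentially 0
```

A grid supremum only bounds the true supremum from below, but for continuous functions on a fine grid the error is negligible for illustrative purposes.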

The next result is a special case of Ascoli’s Theorem. To state it, we say a modulus of continuity is a strictly monotonically increasing, continuous function ω : [0, ∞) → [0, ∞) such that ω(0) = 0.

Proposition 4.2. Let X and Y be compact metric spaces, and fix a modulus of continuity ω(δ ). Then

(4.6) Cω = {f ∈ C (X, Y ) : d(f (x), f (x′)) ≤ ω(d(x, x′)) ∀ x, x′ ∈ X}

is a compact subset of C (X, Y ).

Proof. Let (f ν ) be a sequence in Cω. Let Σ be a countable dense subset of X , as in Corollary 3.2 of Chapter 2. For each x ∈ Σ, (f ν (x)) is a sequence in Y , which hence has a convergent subsequence. Using a diagonal construction similar to that in the proof of Proposition 3.10 of Chapter 2, we obtain a subsequence (ϕν ) of (f ν ) with the property that ϕν (x) converges in Y , for each x ∈ Σ, say

(4.7) ϕν (x) → ψ(x),

for all x ∈ Σ, where ψ : Σ → Y .

So far, we have not used (4.6). This hypothesis will now be used to show that ϕν converges uniformly on X . Pick ε > 0. Then pick δ > 0 such that ω(δ ) < ε/3. Since X is compact, we can cover X by finitely many balls Bδ(xj), 1 ≤ j ≤ N , xj ∈ Σ. Pick M so large that ϕν (xj) is within ε/3 of its limit for all ν ≥ M (when 1 ≤ j ≤ N ). Now, for any x ∈ X , picking ℓ ∈ {1, . . . , N } such that d(x, xℓ) ≤ δ , we have, for k ≥ 0, ν ≥ M ,

(4.8) d(ϕν+k(x), ϕν (x)) ≤ d(ϕν+k(x), ϕν+k(xℓ)) + d(ϕν+k(xℓ), ϕν (xℓ)) + d(ϕν (xℓ), ϕν (x)) ≤ ε/3 + ε/3 + ε/3.

Thus (ϕν (x)) is Cauchy in Y for all x ∈ X , hence convergent. Call the limit ψ(x), so we now have (4.7) for all x ∈ X . Letting k → ∞ in (4.8) we have uniform convergence of ϕν to ψ. Finally, passing to the limit ν → ∞ in

(4.9) d(ϕν (x), ϕν (x′)) ≤ ω(d(x, x′))

gives ψ ∈ Cω.

We want to re-state Proposition 4.2, bringing in the notion of equicontinuity. Given metric spaces X and Y , and a set of maps F ⊂ C (X, Y ), we say F is equicontinuous at a point x0 ∈ X provided

(4.10) ∀ ε > 0, ∃ δ > 0 such that ∀ x ∈ X, f ∈ F , dX(x, x0) < δ =⇒ dY (f (x), f (x0)) < ε.


We say F is equicontinuous on X if it is equicontinuous at each point of X . We say F is uniformly equicontinuous on X provided

(4.11) ∀ ε > 0, ∃ δ > 0 such that ∀ x, x′ ∈ X, f ∈ F , dX(x, x′) < δ =⇒ dY (f (x), f (x′)) < ε.

Note that (4.11) is equivalent to the existence of a modulus of continuity ω such that F ⊂ Cω, given by (4.6). It is useful to record the following result.

Proposition 4.3. Let X and Y be metric spaces, F ⊂ C (X, Y ). Assume X is compact. Then

(4.12) F equicontinuous =⇒ F is uniformly equicontinuous.

Proof. The argument is a variant of the proof of Proposition 1.4. In more detail, suppose there exist xν , x′ν ∈ X , ε > 0, and f ν ∈ F such that d(xν , x′ν ) ≤ 2^{−ν} but

(4.13) d(f ν (xν ), f ν (x′ν )) ≥ ε.

Taking a convergent subsequence xνj → p ∈ X , we also have x′νj → p. Now equicontinuity of F at p implies that there exists N < ∞ such that

(4.14) d(g(xνj ), g( p)) < ε/2, ∀ j ≥ N, g ∈ F ,

contradicting (4.13).

Putting together Propositions 4.2 and 4.3 then gives the following.

Proposition 4.4. Let X and Y be compact metric spaces. If F ⊂ C (X, Y ) is equicontinuous on X , then it has compact closure in C (X, Y ).

Exercises

1. Let X and Y be compact metric spaces. Show that if F ⊂ C (X, Y ) is compact, then F is equicontinuous. (This is a converse to Proposition 4.4.)

2. Let X be a compact metric space, and r ∈ (0, 1]. Define Lipr(X,Rn) to consist of continuous functions f : X → Rn such that, for some L < ∞ (depending on f ),

|f (x) − f (y)| ≤ LdX(x, y)r, ∀ x, y ∈ X.

Define a norm

‖f ‖r = sup_{x∈X} |f (x)| + sup_{x,y∈X, x≠y} |f (x) − f (y)|/d(x, y)^r.


Show that Lipr(X,Rn) is a complete metric space, with distance function Dr(f, g) = ‖f − g‖r.

3. In the setting of Exercise 2, show that if 0 < r < s ≤ 1 and f ∈ Lips(X,Rn), then

‖f ‖r ≤ C ‖f ‖sup^{1−θ} ‖f ‖s^θ, θ = r/s ∈ (0, 1).

4. In the setting of Exercise 2, show that if 0 < r < s ≤ 1, then

{f ∈ Lips(X,Rn) : ‖f ‖s ≤ 1}

is compact in Lipr(X,Rn).

5. Let X be a compact metric space, and define C (X ) as in (4.4). Take

P : C (X ) × C (X ) −→ C (X ), P (f, g)(x) = f (x)g(x).

Show that P is continuous.


Chapter IV

Calculus

Introduction

Having foundational material on numbers, spaces, and functions, we proceed further into the heart of analysis, with a rigorous development of calculus, for functions of one real variable.

Section 1 introduces the derivative, establishes basic identities like the product rule and the chain rule, and also obtains some important theoretical results, such as the Mean Value Theorem and the Inverse Function Theorem. One application of the latter is the study of x^{1/n}, for x > 0, which leads more generally to x^r, for x > 0 and r ∈ Q.

Section 2 brings in the integral, more precisely the Riemann integral. A major result is the Fundamental Theorem of Calculus, whose proof makes essential use of the Mean Value Theorem. Another topic is the change of variable formula for integrals (treated in some exercises).

In §3 we treat power series, continuing the development from §3 of Chapter 3. Here we treat such topics as term by term differentiation of power series, and formulas for the remainder when a power series is truncated. An application of such remainder formulas is made to the study of convergence of the power series about x = 0 of (1 − x)^b.

Section 4 studies curves in Euclidean space Rn, with particular attention to arc length. We derive an integral formula for arc length. We show that a smooth curve can be reparametrized by arc length, as an application of the Inverse Function Theorem. We then take a look at the unit circle S1 in R2. Using the parametrization of part of S1 as (t, √(1 − t^2)), we obtain a power series for arc lengths, as an application of material of §3 on power series of (1 − x)^b, with b = −1/2, and x replaced by t^2. We also bring in the trigonometric functions, having the property that (cos t, sin t) provides a parametrization of S1 by arc length.

Section 5 goes much further into the study of the trigonometric functions. Actually, it begins with a treatment of the exponential function e^t, observes that such treatment extends readily to e^{at}, given a ∈ C, and then establishes that e^{it} provides a unit speed parametrization of S1. This directly gives Euler’s formula

eit = cos t + i sin t,

and provides for a unified treatment of the exponential and trigonometric functions. We also bring in log as the inverse function to the exponential, and we use the formula x^r = e^{r log x} to generalize results of §1 on x^r from r ∈ Q to r ∈ R.

In §6 we give a natural extension of the Riemann integral from the class of bounded (Riemann integrable) functions to a class of unbounded “integrable” functions. The treatment here is perhaps a desirable alternative to discussions one sees of “improper integrals.”


This chapter concludes with some appendices. Appendix A gives a proof of the Fundamental Theorem of Algebra, that every nonconstant polynomial has a complex root. Appendix B presents a proof that π^2 is irrational. Appendix C refines material on the power series of (1 − x)^b, in case b > 0. This will prove useful in Chapter 5. Appendix D discusses a method of calculating π that goes back to Archimedes. Appendix E discusses calculations of π using arctangents.


1. The derivative

Consider a function f , defined on an interval (a, b) ⊂ R, taking values in R or C. Given x ∈ (a, b), we say f is differentiable at x, with derivative f ′(x), provided

(1.1) lim_{h→0} (f (x + h) − f (x))/h = f ′(x).

We also use the notation

(1.2) df /dx (x) = f ′(x).

A characterization equivalent to (1.1) is

(1.3) f (x + h) = f (x) + f ′(x)h + r(x, h), r(x, h) = o(h),

where

(1.4) r(x, h) = o(h) means r(x, h)/h → 0 as h → 0.

Clearly if f is differentiable at x then it is continuous at x. We say f is differentiable on (a, b) provided it is differentiable at each point of (a, b). If also g is defined on (a, b) and differentiable at x, we have

(1.5) (d/dx)(f + g)(x) = f ′(x) + g′(x).

We also have the following product rule:

(1.6) (d/dx)(f g)(x) = f ′(x)g(x) + f (x)g′(x).

To prove (1.6), note that

(f (x + h)g(x + h) − f (x)g(x))/h = ((f (x + h) − f (x))/h) g(x) + f (x + h) ((g(x + h) − g(x))/h),

and let h → 0 in this identity.

We can use the product rule to show inductively that

(1.7) (d/dx) x^n = n x^{n−1},


for all n ∈ N. In fact, this is immediate from (1.1) if n = 1. Given that it holds for n = k, we have

(d/dx) x^{k+1} = (d/dx)(x · x^k) = (dx/dx) x^k + x (d/dx) x^k = x^k + k x^k = (k + 1) x^k,

completing the induction. We also have

(1/h)(1/(x + h) − 1/x) = −1/(x(x + h)) → −1/x^2, as h → 0,

for x ≠ 0, hence

(1.8) (d/dx)(1/x) = −1/x^2, if x ≠ 0.

From here, we can extend (1.7) from n ∈ N to all n ∈ Z (requiring x ≠ 0 if n < 0). A similar inductive argument yields

(1.9) (d/dx) f (x)^n = n f (x)^{n−1} f ′(x),

for n ∈ N, and more generally for n ∈ Z (requiring f (x) ≠ 0 if n < 0).

Going further, we have the following chain rule. Suppose f : (a, b) → (α, β ) is differentiable at x and g : (α, β ) → R (or C) is differentiable at y = f (x). Form G = g ◦ f , i.e., G(x) = g(f (x)). We claim

(1.10) G = g ◦ f =⇒ G′(x) = g′(f (x))f ′(x).

To see this, write

(1.11) G(x + h) = g(f (x + h)) = g(f (x) + f ′(x)h + rf (x, h)) = g(f (x)) + g′(f (x))(f ′(x)h + rf (x, h)) + rg(f (x), f ′(x)h + rf (x, h)).

Here, rf (x, h)/h −→ 0 as h → 0, and also rg(f (x), f ′(x)h + rf (x, h))/h −→ 0 as h → 0,

so the analogue of (1.3) applies.

The derivative has the following important connection to maxima and minima.


Proposition 1.1. Let f : (a, b) → R. Suppose x ∈ (a, b) and

(1.12) f (x) ≥ f (y), ∀ y ∈ (a, b).

If f is differentiable at x, then f ′(x) = 0. The same conclusion holds if f (x) ≤ f (y) for all y ∈ (a, b).

Proof. Given (1.12), we have

(1.13) (f (x + h) − f (x))/h ≤ 0, ∀ h ∈ (0, b − x),

and

(1.14) (f (x + h) − f (x))/h ≥ 0, ∀ h ∈ (a − x, 0).

If f is differentiable at x, both (1.13) and (1.14) must converge to f ′(x) as h → 0, so we simultaneously have f ′(x) ≤ 0 and f ′(x) ≥ 0.

We next establish a key result known as the Mean Value Theorem.

Theorem 1.2. Let f : [a, b] → R. Assume f is continuous on [a, b] and differentiable on (a, b). Then there exists ξ ∈ (a, b) such that

(1.15) f ′(ξ ) = (f (b) − f (a))/(b − a).

Proof. Let g(x) = f (x) − κ(x − a), where κ denotes the right side of (1.15). Then g(a) = g(b). The result (1.15) is equivalent to the assertion that

(1.16) g′(ξ ) = 0

for some ξ ∈ (a, b). Now g is continuous on the compact set [a, b], so it assumes both a maximum and a minimum on this set. If g has a maximum at a point ξ ∈ (a, b), then (1.16) follows from Proposition 1.1. If not, the maximum must be g(a) = g(b), and then g must assume a minimum at some point ξ ∈ (a, b). Again Proposition 1.1 implies (1.16).

We use the Mean Value Theorem to produce a criterion for constructing the inverse of a function. Let

(1.17) f : [a, b] −→ R, f (a) = α, f (b) = β.

Assume f is continuous on [a, b], differentiable on (a, b), and

(1.18) 0 < γ 0 ≤ f ′(x) ≤ γ 1 < ∞, ∀ x ∈ (a, b).


Then (1.15) implies

(1.19) γ 0(b − a) ≤ β − α ≤ γ 1(b − a).

We can also apply Theorem 1.2 to f , restricted to an interval [x1, x2] ⊂ [a, b], to get

(1.20) γ 0(x2 − x1) ≤ f (x2) − f (x1) ≤ γ 1(x2 − x1), if a ≤ x1 < x2 ≤ b.

It follows that

(1.21) f : [a, b] −→ [α, β ] is one-to-one.

The intermediate value theorem implies f : [a, b] → [α, β ] is onto. Consequently f has an inverse

(1.22) g : [α, β ] −→ [a, b], g(f (x)) = x, f (g(y)) = y,

and (1.20) implies

(1.23) γ 0(g(y2) − g(y1)) ≤ y2 − y1 ≤ γ 1(g(y2) − g(y1)), if α ≤ y1 < y2 ≤ β.

The following result is known as the Inverse Function Theorem.

Theorem 1.3. If f is continuous on [a, b] and differentiable on (a, b), and (1.17)–(1.18) hold, its inverse g : [α, β ] → [a, b] is differentiable on (α, β ), and

(1.24) g′(y) = 1/f ′(x), for y = f (x) ∈ (α, β ).

The same conclusion holds if in place of (1.18) we have

(1.25) −γ 1 ≤ f ′(x) ≤ −γ 0 < 0, ∀ x ∈ (a, b),

except that then β < α.

Proof. Fix y ∈ (α, β ), and let x = g(y), so y = f (x). From (1.22) we have, for h small enough,

x + h = g(f (x + h)) = g(f (x) + f ′(x)h + r(x, h)),

i.e.,

(1.26) g(y + f ′(x)h + r(x, h)) = g(y) + h, r(x, h) = o(h).

Now (1.23) implies

(1.27) |g(y1 + r(x, h)) − g(y1)| ≤ (1/γ 0)|r(x, h)|,


provided y1, y1 + r(x, h) ∈ [α, β ]. So, with h̃ = f ′(x)h, we have

(1.28) g(y + h̃) = g(y) + h̃/f ′(x) + o(h̃),

yielding (1.24) from the analogue of (1.3).

Remark. If one knew that g were differentiable, as well as f , then the identity (1.24) would follow by differentiating g(f (x)) = x, applying the chain rule. However, an additional argument, such as given above, is necessary to guarantee that g is differentiable.

Theorem 1.3 applies to the functions

(1.29) pn(x) = xn, n ∈ N.

By (1.7), p′n(x) > 0 for x > 0, so (1.18) holds when 0 < a < b < ∞. We can take a ↘ 0 and b ↗ ∞ and see that

(1.30) pn : (0, ∞) −→ (0, ∞) is invertible,

with differentiable inverse q n : (0, ∞) → (0, ∞). We use the notation

(1.31) x1/n = q n(x), x > 0,

so, given n ∈ N,

(1.32) x > 0 =⇒ x = x1/n · · · x1/n, (n factors).

Note. We recall that x^{1/n} was constructed, for x > 0, in Chapter 1, §7, and its continuity discussed in Chapter 3, §1.

Given m ∈ Z, we can set

(1.33) xm/n = (x1/n)m, x > 0,

and verify that (x^{1/kn})^{km} = (x^{1/n})^m. Thus we have x^r defined for all r ∈ Q, when x > 0. We have

(1.34) xr+s = xrxs, for x > 0, r ,s ∈ Q.

See Exercises 3–5 in §7 of Chapter 1. Applying (1.24) to f (x) = xn, g(y) = y1/n, we have

(1.35) (d/dy) y^{1/n} = 1/(n x^{n−1}), y = x^n, x > 0.


Now xn−1 = y/x = y1−1/n, so we get

(1.36) (d/dy) y^r = r y^{r−1}, y > 0,

when r = 1/n. Putting this together with (1.9) (with m in place of n), we get (1.36) for all r = m/n ∈ Q.
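As a numerical aside (added here, not part of the original text), (1.36) for a rational exponent can be spot-checked with a difference quotient at a hypothetical sample point:

```python
# Hypothetical check (not from the text) of (1.36), (d/dy) y**r = r*y**(r-1),
# for the rational exponent r = 2/3 at y = 5, via a difference quotient.
r, y, h = 2.0 / 3.0, 5.0, 1e-6
quotient = ((y + h)**r - y**r) / h
print(quotient, r * y**(r - 1))   # agree to several digits
```

The agreement is limited only by the step size h and floating-point roundoff.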

The definition of x^r for x > 0 and the identity (1.36) can be extended to all r ∈ R, with some more work. We will find a neat way to do this in §5.

We recall another common notation, namely

(1.37) √x = x^{1/2}, x > 0.

Then (1.36) yields

(1.38) (d/dx) √x = 1/(2√x).

In regard to this, note that, if we consider

(1.39) (√(x + h) − √x)/h,

we can multiply numerator and denominator by √(x + h) + √x, to get

(1.40) 1/(√(x + h) + √x),

whose convergence to the right side of (1.38) for x > 0 is equivalent to the statement that

(1.41) lim_{h→0} √(x + h) = √x,

i.e., to the continuity of x → √ x on (0, ∞). Such continuity is a consequence of the fact

that, for 0 < a < b < ∞, n = 2,

(1.42) pn : [a, b] −→ [an, bn]

is continuous, one-to-one, and onto, so, by the compactness of [a, b], its inverse is continuous. Thus we have an alternative derivation of (1.38).

If I ⊂ R is an interval and f : I → R (or C), we say f ∈ C 1(I ) if f is differentiable on I and f ′ is continuous on I . If f ′ is in turn differentiable, we have the second derivative of f :

(1.43) d^2f /dx^2 (x) = f ′′(x) = (d/dx) f ′(x).


If f ′ is differentiable on I and f ′′ is continuous on I , we say f ∈ C 2(I ). Inductively, we can define higher order derivatives of f , f (k), also denoted d^kf /dx^k. Here, f (1) = f ′, f (2) = f ′′, and if f (k) is differentiable,

(1.44) f (k+1)(x) = (d/dx) f (k)(x).

If f (k) is continuous on I , we say f ∈ C k(I ).

Sometimes we will run into functions of more than one variable, and will want to differentiate with respect to each one of them. For example, if f (x, y) is defined for (x, y) in an open set in R2, we set

ferentiate with respect to each one of them. For example, if f (x, y) is defined for (x, y) inan open set in R2, we set

(1.45) ∂f /∂x (x, y) = lim_{h→0} (f (x + h, y) − f (x, y))/h, ∂f /∂y (x, y) = lim_{h→0} (f (x, y + h) − f (x, y))/h.

We will not need any more than the definition here. A serious study of the derivative of a function of several variables is given in the companion [T2] to this volume, Introduction to Analysis in Several Variables .

We end this section with some results on the significance of the second derivative.

Proposition 1.4. Assume f is differentiable on (a, b), x0 ∈ (a, b), and f ′(x0) = 0. Assume f ′ is differentiable at x0 and f ′′(x0) > 0. Then there exists δ > 0 such that

(1.46) f (x0) < f (x) for all x ∈ (x0 − δ, x0 + δ ) \ x0.

We say f has a local minimum at x0.

Proof. Since

(1.47) f ′′(x0) = lim_{h→0} (f ′(x0 + h) − f ′(x0))/h,

the assertion that f ′′(x0) > 0 implies that there exists δ > 0 such that the difference quotient in (1.47) is > 0 for all nonzero h ∈ [−δ, δ ]. Hence

(1.48) −δ ≤ h < 0 =⇒ f ′(x0 + h) < 0, 0 < h ≤ δ =⇒ f ′(x0 + h) > 0.

This plus the mean value theorem imply (1.46).

Remark. Similarly,

(1.49) f ′′(x0) < 0 =⇒ f has a local maximum at x0.


These two facts constitute the second derivative test for local maxima and local minima.

Let us now assume that f and f′ are differentiable on (a, b), so f″ is defined at each point of (a, b). Let us further assume

(1.50)    f″(x) > 0, ∀ x ∈ (a, b).

The mean value theorem, applied to f′, yields

(1.51)    a < x0 < x1 < b ⇒ f′(x0) < f′(x1).

Here is another interesting property.

Proposition 1.5. If (1.50) holds and a < x0 < x1 < b, then

(1.52) f (sx0 + (1 − s)x1) < sf (x0) + (1 − s)f (x1), ∀ s ∈ (0, 1).

Proof. For s ∈ [0, 1], set

(1.53) g(s) = sf (x0) + (1 − s)f (x1) − f (sx0 + (1 − s)x1).

The result (1.52) is equivalent to

(1.54) g(s) > 0 for 0 < s < 1.

Note that

(1.55) g(0) = g(1) = 0.

If (1.54) fails, g must assume a minimum at some point s0 ∈ (0, 1). At such a point, g′(s0) = 0. A computation gives g′(s) = f(x0) − f(x1) − (x0 − x1) f′(sx0 + (1 − s)x1), and hence

(1.56)    g″(s) = −(x0 − x1)^2 f″(sx0 + (1 − s)x1).

Thus (1.50) ⇒ g″(s0) < 0. Then (1.49) ⇒ g has a local maximum at s0. This contradiction establishes (1.54), hence (1.52).

Remark. The result (1.52) implies that the graph of y = f (x) over [x0, x1] lies below thechord, i.e., the line segment from (x0, f (x0)) to (x1, f (x1)) in R2. We say f is convex .
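The chord inequality (1.52) is easy to probe numerically. The following sketch (Python, added here purely as an illustration; the choice f(x) = x^4 + x^2, with f″(x) = 12x^2 + 2 > 0, is ours) checks that the graph lies strictly below each chord:

```python
# Convexity check, as in (1.52), for f(x) = x^4 + x^2, which has
# f''(x) = 12x^2 + 2 > 0 everywhere.
def f(x):
    return x ** 4 + x ** 2

x0, x1 = -1.0, 2.0
for s in [0.1, 0.25, 0.5, 0.75, 0.9]:
    graph = f(s * x0 + (1 - s) * x1)         # point on the graph
    chord = s * f(x0) + (1 - s) * f(x1)      # point on the chord
    assert graph < chord, (s, graph, chord)
print("graph lies strictly below each chord, as (1.52) predicts")
```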

Exercises


Compute the derivative of each of the following functions. Specify where each of these derivatives is defined.

(1)    √(1 + x^2),

(2)    (x^2 + x^3)^{−4},

(3)    √(1 + x^2) / (x^2 + x^3)^4.

4. Let f : [0, ∞) → R be a C^2 function satisfying

(1.57)    f(x) > 0,  f′(x) > 0,  f″(x) < 0,  for x > 0.

Show that

(1.58) x, y > 0 =⇒ f (x + y) < f (x) + f (y).

5. Apply Exercise 4 to

(1.59)    f(x) = x/(1 + x).

Relate the conclusion to Exercise 1 in §3 of Chapter 2. Give a direct proof that (1.58) holds for f in (1.59), without using calculus.

6. If f : I → R^n, we define f′(x) just as in (1.1). If f(x) = (f_1(x), . . . , f_n(x)), then f is differentiable at x if and only if each component f_j is, and

f′(x) = (f_1′(x), . . . , f_n′(x)).

Parallel to (1.6), show that if g : I → R^n, then the dot product satisfies

d/dx [f(x) · g(x)] = f′(x) · g(x) + f(x) · g′(x).

7. Establish the following variant of Proposition 1.5. Suppose (1.50) is weakened to

(1.60)    f″(x) ≥ 0, ∀ x ∈ (a, b).

Show that, in place of (1.52), one has

(1.61)    f(sx0 + (1 − s)x1) ≤ s f(x0) + (1 − s) f(x1), ∀ s ∈ (0, 1).

Hint. Consider f_ε(x) = f(x) + εx^2.

8. The following is called the generalized mean value theorem. Let f and g be continuous on [a, b] and differentiable on (a, b). Then there exists ξ ∈ (a, b) such that

[f(b) − f(a)] g′(ξ) = [g(b) − g(a)] f′(ξ).

Show that this follows from the mean value theorem, applied to

h(x) = [f(b) − f(a)] g(x) − [g(b) − g(a)] f(x).


2. The integral

In this section, we introduce the Riemann version of the integral, and relate it to the derivative. We will define the Riemann integral of a bounded function over an interval I = [a, b] on the real line. For now, we assume f is real valued. To start, we partition I into smaller intervals. A partition P of I is a finite collection of subintervals {J_k : 0 ≤ k ≤ N}, disjoint except for their endpoints, whose union is I. We can order the J_k so that J_k = [x_k, x_{k+1}], where

(2.1)    x_0 < x_1 < · · · < x_N < x_{N+1},   x_0 = a,  x_{N+1} = b.

We call the points x_k the endpoints of P. We set

(2.2)    ℓ(J_k) = x_{k+1} − x_k,   maxsize(P) = max_{0≤k≤N} ℓ(J_k).

We then set

(2.3)    Ī_P(f) = Σ_k sup_{J_k} f(x) ℓ(J_k),   I̲_P(f) = Σ_k inf_{J_k} f(x) ℓ(J_k).

Here,

sup_{J_k} f(x) = sup f(J_k),   inf_{J_k} f(x) = inf f(J_k),

and we recall that if S ⊂ R is bounded, sup S and inf S were defined in §6 of Chapter 1; cf. (6.32) and (6.45). We call Ī_P(f) and I̲_P(f) respectively the upper sum and lower sum of f, associated to the partition P. Note that I̲_P(f) ≤ Ī_P(f). These quantities should approximate the Riemann integral of f, if the partition P is sufficiently "fine."

To be more precise, if P and Q are two partitions of I, we say P refines Q, and write P ≻ Q, if P is formed by partitioning each interval in Q. Equivalently, P ≻ Q if and only if all the endpoints of Q are also endpoints of P. It is easy to see that any two partitions have a common refinement; just take the union of their endpoints, to form a new partition. Note also that refining a partition lowers the upper sum of f and raises its lower sum:

(2.4)    P ≻ Q ⇒ Ī_P(f) ≤ Ī_Q(f), and I̲_P(f) ≥ I̲_Q(f).

Consequently, if P_j are any two partitions and Q is a common refinement, we have

(2.5)    I̲_{P_1}(f) ≤ I̲_Q(f) ≤ Ī_Q(f) ≤ Ī_{P_2}(f).


Now, whenever f : I → R is bounded, the following quantities are well defined:

(2.6)    Ī(f) = inf_{P∈Π(I)} Ī_P(f),   I̲(f) = sup_{P∈Π(I)} I̲_P(f),

where Π(I) is the set of all partitions of I. We call I̲(f) the lower integral of f and Ī(f) its upper integral. Clearly, by (2.5), I̲(f) ≤ Ī(f). We then say that f is Riemann integrable provided I̲(f) = Ī(f), and in such a case, we set

(2.7)    ∫_a^b f(x) dx = ∫_I f(x) dx = I̲(f) = Ī(f).

We will denote the set of Riemann integrable functions on I by R(I). We derive some basic properties of the Riemann integral.

Proposition 2.1. If f, g ∈ R(I), then f + g ∈ R(I), and

(2.8)    ∫_I (f + g) dx = ∫_I f dx + ∫_I g dx.

Proof. If J_k is any subinterval of I, then

sup_{J_k} (f + g) ≤ sup_{J_k} f + sup_{J_k} g,  and  inf_{J_k} (f + g) ≥ inf_{J_k} f + inf_{J_k} g,

so, for any partition P, we have Ī_P(f + g) ≤ Ī_P(f) + Ī_P(g). Also, using common refinements, we can simultaneously approximate Ī(f) and Ī(g) by Ī_P(f) and Ī_P(g), and ditto for Ī(f + g). Thus the characterization (2.6) implies Ī(f + g) ≤ Ī(f) + Ī(g). A parallel argument implies I̲(f + g) ≥ I̲(f) + I̲(g), and the proposition follows.

Next, there is a fair supply of Riemann integrable functions.

Proposition 2.2. If f is continuous on I, then f is Riemann integrable.

Proof. Any continuous function on a compact interval is bounded and uniformly continuous (see Propositions 1.1 and 1.3 of Chapter 3). Let ω(δ) be a modulus of continuity for f, so

(2.9)    |x − y| ≤ δ ⇒ |f(x) − f(y)| ≤ ω(δ),   ω(δ) → 0 as δ → 0.

Then

(2.10)    maxsize(P) ≤ δ ⇒ Ī_P(f) − I̲_P(f) ≤ ω(δ) · ℓ(I),

which yields the proposition.

We denote the set of continuous functions on I by C(I). Thus Proposition 2.2 says C(I) ⊂ R(I).

The proof of Proposition 2.2 provides a criterion on a partition guaranteeing that Ī_P(f) and I̲_P(f) are close to ∫_I f dx when f is continuous. We produce an extension, giving a condition under which Ī_P(f) and Ī(f) are close, and I̲_P(f) and I̲(f) are close, given f bounded on I. Given a partition P_0 of I, set

(2.11)    minsize(P_0) = min{ℓ(J_k) : J_k ∈ P_0}.


Lemma 2.3. Let P and Q be two partitions of I. Assume

(2.12)    maxsize(P) ≤ (1/k) minsize(Q).

Let |f| ≤ M on I. Then

(2.13)    Ī_P(f) ≤ Ī_Q(f) + (2M/k) ℓ(I),
          I̲_P(f) ≥ I̲_Q(f) − (2M/k) ℓ(I).

Proof. Let P_1 denote the minimal common refinement of P and Q. Consider on the one hand those intervals in P that are contained in intervals in Q, and on the other hand those intervals in P that are not contained in intervals in Q. Each interval of the first type is also an interval in P_1. Each interval of the second type gets partitioned, to yield two intervals in P_1. Denote by P_1^b the collection of such divided intervals. By (2.12), the lengths of the intervals in P_1^b sum to ≤ ℓ(I)/k. It follows that

|Ī_P(f) − Ī_{P_1}(f)| ≤ Σ_{J∈P_1^b} 2M ℓ(J) ≤ 2M ℓ(I)/k,

and similarly |I̲_P(f) − I̲_{P_1}(f)| ≤ 2M ℓ(I)/k. Therefore

Ī_P(f) ≤ Ī_{P_1}(f) + (2M/k) ℓ(I),   I̲_P(f) ≥ I̲_{P_1}(f) − (2M/k) ℓ(I).

Since also Ī_{P_1}(f) ≤ Ī_Q(f) and I̲_{P_1}(f) ≥ I̲_Q(f), we obtain (2.13).

The following consequence is sometimes called Darboux’s Theorem.

Theorem 2.4. Let P_ν be a sequence of partitions of I into ν intervals J_{νk}, 1 ≤ k ≤ ν, such that

maxsize(P_ν) −→ 0.

If f : I → R is bounded, then

(2.14)    Ī_{P_ν}(f) → Ī(f) and I̲_{P_ν}(f) → I̲(f).

Consequently,

(2.15)    f ∈ R(I) ⇐⇒ I(f) = lim_{ν→∞} Σ_{k=1}^ν f(ξ_{νk}) ℓ(J_{νk}),

for arbitrary ξ_{νk} ∈ J_{νk}, in which case the limit is ∫_I f dx.


Proof. As before, assume |f| ≤ M. Pick ε > 0. Let Q be a partition such that

Ī(f) ≤ Ī_Q(f) ≤ Ī(f) + ε,   I̲(f) ≥ I̲_Q(f) ≥ I̲(f) − ε.

Now pick N such that

ν ≥ N ⇒ maxsize(P_ν) ≤ ε · minsize(Q).

Lemma 2.3 (with 1/k = ε) then yields, for ν ≥ N,

Ī_{P_ν}(f) ≤ Ī_Q(f) + 2M ℓ(I) ε,   I̲_{P_ν}(f) ≥ I̲_Q(f) − 2M ℓ(I) ε.

Hence, for ν ≥ N,

Ī(f) ≤ Ī_{P_ν}(f) ≤ Ī(f) + [2M ℓ(I) + 1] ε,   I̲(f) ≥ I̲_{P_ν}(f) ≥ I̲(f) − [2M ℓ(I) + 1] ε.

This proves (2.14).

Remark. The sums on the right side of (2.15) are called Riemann sums, approximating ∫_I f dx (when f is Riemann integrable).

Remark. A second proof of Proposition 2.1 can readily be deduced from Theorem 2.4.

One should be warned that, once such a specific choice of P_ν and ξ_{νk} has been made, the limit on the right side of (2.15) might exist for a bounded function f that is not Riemann integrable. This and other phenomena are illustrated by the following example of a function which is not Riemann integrable. For x ∈ I, set

(2.16)    ϑ(x) = 1 if x ∈ Q,   ϑ(x) = 0 if x ∉ Q,

where Q is the set of rational numbers. Now every interval J ⊂ I of positive length contains points in Q and points not in Q, so for any partition P of I we have Ī_P(ϑ) = ℓ(I) and I̲_P(ϑ) = 0, hence

(2.17)    Ī(ϑ) = ℓ(I),   I̲(ϑ) = 0.

Note that, if P_ν is a partition of I into ν equal subintervals, then we could pick each ξ_{νk} to be rational, in which case the limit on the right side of (2.15) would be ℓ(I), or we could pick each ξ_{νk} to be irrational, in which case this limit would be zero. Alternatively, we could pick half of them to be rational and half to be irrational, and the limit would be ℓ(I)/2.


Associated to the Riemann integral is a notion of size of a set S, called content. If S is a subset of I, define the "characteristic function"

(2.18)    χ_S(x) = 1 if x ∈ S,   0 if x ∉ S.

We define "upper content" cont⁺ and "lower content" cont⁻ by

(2.19)    cont⁺(S) = Ī(χ_S),   cont⁻(S) = I̲(χ_S).

We say S "has content," or "is contented," if these quantities are equal, which happens if and only if χ_S ∈ R(I), in which case the common value of cont⁺(S) and cont⁻(S) is

(2.20)    m(S) = ∫_I χ_S(x) dx.

It is easy to see that

(2.21)    cont⁺(S) = inf { Σ_{k=1}^N ℓ(J_k) : S ⊂ J_1 ∪ · · · ∪ J_N },

where J_k are intervals. Here, we require S to be contained in the union of a finite collection of intervals.

See the appendix at the end of this section for a generalization of Proposition 2.2, givinga sufficient condition for a bounded function to be Riemann integrable on I , in terms of the upper content of its set of discontinuities.

There is a more sophisticated notion of the size of a subset of I, called Lebesgue measure. The key to the construction of Lebesgue measure is to cover a set S by a countable (either finite or infinite) collection of intervals. The outer measure of S ⊂ I is defined by

(2.22)    m*(S) = inf { Σ_{k≥1} ℓ(J_k) : S ⊂ ∪_{k≥1} J_k }.

Here {J_k} is a finite or countably infinite collection of intervals. Clearly

(2.23)    m*(S) ≤ cont⁺(S).

Note that, if S = I ∩ Q, then χ_S = ϑ, defined by (2.16). In this case it is easy to see that cont⁺(S) = ℓ(I), but m*(S) = 0. Zero is the "right" measure of this set. More material on the development of measure theory can be found in a number of books, including [Fol] and [T1].

It is useful to note that ∫_I f dx is additive in I, in the following sense.


Proposition 2.5. If a < b < c, f : [a, c] → R, f_1 = f|_{[a,b]}, f_2 = f|_{[b,c]}, then

(2.24)    f ∈ R([a, c]) ⇐⇒ f_1 ∈ R([a, b]) and f_2 ∈ R([b, c]),

and, if this holds,

(2.25)    ∫_a^c f dx = ∫_a^b f_1 dx + ∫_b^c f_2 dx.

Proof. Since any partition of [a, c] has a refinement for which b is an endpoint, we may as well consider a partition P = P_1 ∪ P_2, where P_1 is a partition of [a, b] and P_2 is a partition of [b, c]. Then

(2.26)    Ī_P(f) = Ī_{P_1}(f_1) + Ī_{P_2}(f_2),   I̲_P(f) = I̲_{P_1}(f_1) + I̲_{P_2}(f_2),

so

(2.27)    Ī_P(f) − I̲_P(f) = {Ī_{P_1}(f_1) − I̲_{P_1}(f_1)} + {Ī_{P_2}(f_2) − I̲_{P_2}(f_2)}.

Since both terms in braces in (2.27) are ≥ 0, we have equivalence in (2.24). Then (2.25) follows from (2.26) upon taking sufficiently fine partitions.

Let I = [a, b]. If f ∈ R(I), then f ∈ R([a, x]) for all x ∈ [a, b], and we can consider the function

(2.28)    g(x) = ∫_a^x f(t) dt.

If a ≤ x0 ≤ x1 ≤ b, then

(2.29)    g(x1) − g(x0) = ∫_{x0}^{x1} f(t) dt,

so, if |f| ≤ M,

(2.30)    |g(x1) − g(x0)| ≤ M |x1 − x0|.

In other words, if f ∈ R(I), then g is Lipschitz continuous on I.

Recall from §1 that a function g : (a, b) → R is said to be differentiable at x ∈ (a, b) provided there exists the limit

(2.31)    lim_{h→0} (1/h) [g(x + h) − g(x)] = g′(x).

When such a limit exists, g′(x), also denoted dg/dx, is called the derivative of g at x. Clearly g is continuous wherever it is differentiable.

The next result is part of the Fundamental Theorem of Calculus.


Theorem 2.6. If f ∈ C([a, b]), then the function g, defined by (2.28), is differentiable at each point x ∈ (a, b), and

(2.32)    g′(x) = f(x).

Proof. Parallel to (2.29), we have, for h > 0,

(2.33)    (1/h) [g(x + h) − g(x)] = (1/h) ∫_x^{x+h} f(t) dt.

If f is continuous at x, then, for any ε > 0, there exists δ > 0 such that |f(t) − f(x)| ≤ ε whenever |t − x| ≤ δ. Thus the right side of (2.33) is within ε of f(x) whenever h ∈ (0, δ]. Thus the desired limit exists as h ↘ 0. A similar argument treats h ↗ 0.

The next result is the rest of the Fundamental Theorem of Calculus.

Theorem 2.7. If G is differentiable and G′(x) is continuous on [a, b], then

(2.34)    ∫_a^b G′(t) dt = G(b) − G(a).

Proof. Consider the function

(2.35)    g(x) = ∫_a^x G′(t) dt.

We have g ∈ C([a, b]), g(a) = 0, and, by Theorem 2.6,

g′(x) = G′(x), ∀ x ∈ (a, b).

Thus f(x) = g(x) − G(x) is continuous on [a, b], and

(2.36)    f′(x) = 0, ∀ x ∈ (a, b).

We claim that (2.36) implies f is constant on [a, b]. Granted this, since f(a) = g(a) − G(a) = −G(a), we have f(x) = −G(a) for all x ∈ [a, b], so the integral (2.35) is equal to G(x) − G(a) for all x ∈ [a, b]. Taking x = b yields (2.34).

The fact that (2.36) implies f is constant on [a, b] is a consequence of the Mean Value Theorem. This was established in §1; see Theorem 1.2. We repeat the statement here.

Theorem 2.8. Let f : [a, β] → R be continuous, and assume f is differentiable on (a, β). Then ∃ ξ ∈ (a, β) such that

(2.37)    f′(ξ) = [f(β) − f(a)]/(β − a).

Now, to see that (2.36) implies f is constant on [a, b]: if not, ∃ β ∈ (a, b] such that f(β) ≠ f(a). Then just apply Theorem 2.8 to f on [a, β]. This completes the proof of Theorem 2.7.

We now extend Theorems 2.6–2.7 to the setting of Riemann integrable functions.


Proposition 2.9. Let f ∈ R([a, b]), and define g by (2.28). If x ∈ [a, b] and f is continuous at x, then g is differentiable at x, and g′(x) = f(x).

The proof is identical to that of Theorem 2.6.

Proposition 2.10. Assume G is differentiable on [a, b] and G′ ∈ R([a, b]). Then (2.34) holds.

Proof. We have

G(b) − G(a) = Σ_{k=0}^{n−1} [ G(a + (b − a)(k + 1)/n) − G(a + (b − a)k/n) ]
            = ((b − a)/n) Σ_{k=0}^{n−1} G′(ξ_{kn}),

for some ξ_{kn} satisfying

a + (b − a)k/n < ξ_{kn} < a + (b − a)(k + 1)/n,

as a consequence of the Mean Value Theorem. Given G′ ∈ R([a, b]), Darboux's theorem (Theorem 2.4) implies that, as n → ∞, one gets G(b) − G(a) = ∫_a^b G′(t) dt.

Note that the beautiful symmetry in Theorems 2.6–2.7 is not preserved in Propositions2.9–2.10. The hypothesis of Proposition 2.10 requires G to be differentiable at each x ∈[a, b], but the conclusion of Proposition 2.9 does not yield differentiability at all points.For this reason, we regard Propositions 2.9–2.10 as less “fundamental” than Theorems2.6–2.7. There are more satisfactory extensions of the fundamental theorem of calculus,involving the Lebesgue integral, and a more subtle notion of the “derivative” of a non-smooth function. For this, we can point the reader to Chapters 10-11 of the text [T1],Measure Theory and Integration.

So far, we have dealt with integration of real valued functions. If f : I → C, we set f = f_1 + i f_2 with f_j : I → R, and say f ∈ R(I) if and only if f_1 and f_2 are in R(I). Then

∫_I f dx = ∫_I f_1 dx + i ∫_I f_2 dx.

There are straightforward extensions of Propositions 2.5–2.10 to complex valued functions. Similar comments apply to functions f : I → R^n.

Complementary results on Riemann integrability

Here we provide a condition, more general than Proposition 2.2, which guarantees Riemann integrability.


Proposition 2.11. Let f : I → R be a bounded function, with I = [a, b]. Suppose that the set S of points of discontinuity of f has the property

(2.38)    cont⁺(S) = 0.

Then f ∈ R(I).

Proof. Say |f(x)| ≤ M. Take ε > 0. As in (2.21), take intervals J_1, . . . , J_N such that S ⊂ J_1 ∪ · · · ∪ J_N and Σ_{k=1}^N ℓ(J_k) < ε. In fact, fatten each J_k such that S is contained in the interior of this collection of intervals. Consider a partition P_0 of I, whose intervals include J_1, . . . , J_N, amongst others, which we label I_1, . . . , I_K. Now f is continuous on each interval I_ν, so, subdividing each I_ν as necessary, hence refining P_0 to a partition P_1, we arrange that sup f − inf f < ε on each such subdivided interval. Denote these subdivided intervals I′_1, . . . , I′_L. It readily follows that

0 ≤ Ī_{P_1}(f) − I̲_{P_1}(f) < Σ_{k=1}^N 2M ℓ(J_k) + Σ_{k=1}^L ε ℓ(I′_k) < 2εM + ε ℓ(I).

Since ε can be taken arbitrarily small, this establishes that f ∈ R(I).

Remark. An even better result is that such f is Riemann integrable if and only if

(2.38A) m∗(S ) = 0,

where m∗(S ) is defined by (2.22). Standard books on measure theory, including [Fol] and[T1], establish this.

We give an example of a function to which Proposition 2.11 applies, and then an examplefor which Proposition 2.11 fails to apply, though the function is Riemann integrable.

Example 1. Let I = [0, 1]. Define f : I → R by

f(0) = 0,   f(x) = (−1)^j for x ∈ (2^{−(j+1)}, 2^{−j}], j ≥ 0.

Then |f| ≤ 1 and the set of points of discontinuity of f is

S = {0} ∪ {2^{−j} : j ≥ 1}.

It is easy to see that cont⁺(S) = 0. Hence f ∈ R(I).

See Exercises 16-17 below for a more elaborate example to which Proposition 2.11 applies.


Example 2. Again I = [0, 1]. Define f : I → R by

f(x) = 0 if x ∉ Q,   f(x) = 1/n if x = m/n, in lowest terms.

Then |f| ≤ 1 and the set of points of discontinuity of f is

S = I ∩ Q.

As we have seen below (2.23), cont⁺(S) = 1, so Proposition 2.11 does not apply. Nevertheless, it is fairly easy to see directly that I̲(f) = Ī(f) = 0, so f ∈ R(I). In fact, given ε > 0, f ≥ ε only on a finite set, hence

Ī(f) ≤ ε, ∀ ε > 0.

As indicated below (2.23), (2.38A) does apply to this function.

By contrast, the function ϑ in (2.16) is discontinuous at each point of I .

We mention an alternative characterization of Ī(f) and I̲(f), which can be useful. Given I = [a, b], we say g : I → R is piecewise constant on I (and write g ∈ PK(I)) provided there exists a partition P = {J_k} of I such that g is constant on the interior of each interval J_k.

Clearly PK(I) ⊂ R(I). It is easy to see that, if f : I → R is bounded,

(2.39)    Ī(f) = inf { ∫_I f_1 dx : f_1 ∈ PK(I), f_1 ≥ f },
          I̲(f) = sup { ∫_I f_0 dx : f_0 ∈ PK(I), f_0 ≤ f }.

Hence, given f : I → R bounded,

(2.40)    f ∈ R(I) ⇔ for each ε > 0, ∃ f_0, f_1 ∈ PK(I) such that f_0 ≤ f ≤ f_1 and ∫_I (f_1 − f_0) dx < ε.

This can be used to prove

(2.41) f, g ∈ R(I ) =⇒ f g ∈ R(I ),

via the fact that

(2.42)    f_j, g_j ∈ PK(I) =⇒ f_j g_j ∈ PK(I).

In fact, we have the following, which can be used to prove (2.41).


Proposition 2.12. Let f ∈ R(I), and assume |f| ≤ M. Let ϕ : [−M, M] → R be continuous. Then ϕ ∘ f ∈ R(I).

Proof. We proceed in steps.

Step 1. We can obtain ϕ as a uniform limit on [−M, M] of a sequence ϕ_ν of continuous, piecewise linear functions. Then ϕ_ν ∘ f → ϕ ∘ f uniformly on I. A uniform limit g of functions g_ν ∈ R(I) is in R(I) (see Exercise 9). So it suffices to prove Proposition 2.12 when ϕ is continuous and piecewise linear.

Step 2. Given ϕ : [−M, M] → R continuous and piecewise linear, it is an exercise to write ϕ = ϕ_1 − ϕ_2, with ϕ_j : [−M, M] → R monotone, continuous, and piecewise linear. Now ϕ_1 ∘ f, ϕ_2 ∘ f ∈ R(I) ⇒ ϕ ∘ f ∈ R(I).

Step 3. We now demonstrate Proposition 2.12 when ϕ : [−M, M] → R is monotone and Lipschitz. By Step 2, this will suffice. So we assume

−M ≤ x1 < x2 ≤ M =⇒ ϕ(x1) ≤ ϕ(x2) and ϕ(x2) − ϕ(x1) ≤ L(x2 − x1),

for some L < ∞. Given ε > 0, pick f_0, f_1 ∈ PK(I), as in (2.40). Then

ϕ ∘ f_0, ϕ ∘ f_1 ∈ PK(I),   ϕ ∘ f_0 ≤ ϕ ∘ f ≤ ϕ ∘ f_1,

and

∫_I (ϕ ∘ f_1 − ϕ ∘ f_0) dx ≤ L ∫_I (f_1 − f_0) dx ≤ Lε.

This proves ϕ ∘ f ∈ R(I).

Exercises

1. Let c > 0 and let f : [ac, bc] → R be Riemann integrable. Working directly with the definition of integral, show that

(2.43)    ∫_a^b f(cx) dx = (1/c) ∫_{ac}^{bc} f(x) dx.

More generally, show that

(2.44)    ∫_{a−d/c}^{b−d/c} f(cx + d) dx = (1/c) ∫_{ac}^{bc} f(x) dx.

2. Let f : I × S → R be continuous, where I = [a, b] and S ⊂ R^n. Take ϕ(y) = ∫_I f(x, y) dx.


Show that ϕ is continuous on S.
Hint. If f_j : I → R are continuous and |f_1(x) − f_2(x)| ≤ δ on I, then

(2.45)    |∫_I f_1 dx − ∫_I f_2 dx| ≤ ℓ(I) δ.

3. With f as in Exercise 2, suppose g_j : S → R are continuous and a ≤ g_0(y) < g_1(y) ≤ b. Take ϕ(y) = ∫_{g_0(y)}^{g_1(y)} f(x, y) dx. Show that ϕ is continuous on S.
Hint. Make a change of variables, linear in x, to reduce this to Exercise 2.

4. Let ϕ : [a, b] → [A, B] be C^1 on a neighborhood J of [a, b], with ϕ′(x) > 0 for all x ∈ [a, b]. Assume ϕ(a) = A, ϕ(b) = B. Show that the identity

(2.46)    ∫_A^B f(y) dy = ∫_a^b f(ϕ(t)) ϕ′(t) dt,

for any f ∈ C(J), follows from the chain rule and the Fundamental Theorem of Calculus.
Hint. Replace b by x, B by ϕ(x), and differentiate.

4A. Show that (2.46) holds for each f ∈ PK(J). Using (2.39)–(2.40), show that f ∈ R(J) ⇒ f ∘ ϕ ∈ R([a, b]) and (2.46) holds. (This result contains that of Exercise 1.)

5. Show that, if f and g are C^1 on a neighborhood of [a, b], then

(2.47)    ∫_a^b f′(s) g(s) ds = − ∫_a^b f(s) g′(s) ds + [f(b)g(b) − f(a)g(a)].

This transformation of integrals is called "integration by parts."
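As a quick numerical sanity check of (2.47), the following sketch (Python; the choices f(s) = s^2, g(s) = s^3 on [0, 1] and the midpoint-rule quadrature are ours, for illustration) compares the two sides:

```python
# Check integration by parts (2.47) for f(s) = s^2, g(s) = s^3 on [0, 1]:
#   int_a^b f'(s) g(s) ds = - int_a^b f(s) g'(s) ds + [f(b)g(b) - f(a)g(a)].
def quad(F, a, b, n=100000):
    """Midpoint-rule approximation of int_a^b F(s) ds."""
    h = (b - a) / n
    return sum(F(a + (k + 0.5) * h) for k in range(n)) * h

def f(s):  return s ** 2
def fp(s): return 2 * s         # f'
def g(s):  return s ** 3
def gp(s): return 3 * s ** 2    # g'

lhs = quad(lambda s: fp(s) * g(s), 0.0, 1.0)                    # exactly 2/5
rhs = -quad(lambda s: f(s) * gp(s), 0.0, 1.0) + (f(1) * g(1) - f(0) * g(0))
assert abs(lhs - rhs) < 1e-8 and abs(lhs - 0.4) < 1e-6
print(lhs, rhs)
```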

6. Let f : (−a, a) → R be a C^{j+1} function. Show that, for x ∈ (−a, a),

(2.48)    f(x) = f(0) + f′(0)x + (f″(0)/2) x^2 + · · · + (f^{(j)}(0)/j!) x^j + R_j(x),

where

(2.49)    R_j(x) = ∫_0^x ((x − s)^j / j!) f^{(j+1)}(s) ds.

This is Taylor's formula with remainder.
Hint. Use induction. If (2.48)–(2.49) holds for 0 ≤ j ≤ k, show that it holds for j = k + 1, by showing that

(2.50)    ∫_0^x ((x − s)^k / k!) f^{(k+1)}(s) ds = (f^{(k+1)}(0)/(k + 1)!) x^{k+1} + ∫_0^x ((x − s)^{k+1}/(k + 1)!) f^{(k+2)}(s) ds.


To establish this, use the integration by parts formula (2.47), with f(s) replaced by f^{(k+1)}(s), and with appropriate g(s). See §3 for another approach. Note that another presentation of (2.49) is

(2.51)    R_j(x) = (x^{j+1}/(j + 1)!) ∫_0^1 f^{(j+1)}((1 − t^{1/(j+1)}) x) dt.
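The integral formula (2.49) can also be checked numerically. In the sketch below (Python, illustrative; the choice f(x) = 1/(1 − x), for which f^{(k)}(s) = k!/(1 − s)^{k+1} and all Taylor coefficients at 0 equal 1, is ours), R_j(x) computed by quadrature is compared with f(x) minus its Taylor polynomial:

```python
import math

# Taylor remainder (2.49) for f(x) = 1/(1-x), whose derivatives are
# f^{(k)}(s) = k!/(1-s)^{k+1}; the Taylor coefficients at 0 are all 1.
def deriv(k, s):
    return math.factorial(k) / (1 - s) ** (k + 1)

def remainder_quad(j, x, n=20000):
    # midpoint-rule approximation of int_0^x (x-s)^j/j! f^{(j+1)}(s) ds
    h = x / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        total += (x - s) ** j / math.factorial(j) * deriv(j + 1, s)
    return total * h

j, x = 3, 0.5
taylor = sum(x ** k for k in range(j + 1))     # 1 + x + x^2 + x^3 = 1.875
exact = 1 / (1 - x) - taylor                   # true remainder, 0.125
assert abs(remainder_quad(j, x) - exact) < 1e-6
print(remainder_quad(j, x), exact)
```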

7. Assume f : (−a, a) → R is a C^j function. Show that, for x ∈ (−a, a), (2.48) holds, with

(2.52)    R_j(x) = (1/(j − 1)!) ∫_0^x (x − s)^{j−1} [f^{(j)}(s) − f^{(j)}(0)] ds.

Hint. Apply (2.49) with j replaced by j − 1. Add and subtract f^{(j)}(0) to the factor f^{(j)}(s) in the resulting integrand.

8. Given I = [a, b], show that

(2.53) f, g ∈ R(I ) =⇒ f g ∈ R(I ),

as advertised in (2.41).

9. Assume f_k ∈ R(I) and f_k → f uniformly on I. Prove that f ∈ R(I) and

(2.54)    ∫_I f_k dx −→ ∫_I f dx.

10. Given I = [a, b], I_ε = [a + ε, b − ε], assume f_k ∈ R(I), |f_k| ≤ M on I for all k, and

(2.55)    f_k −→ f uniformly on I_ε,

for all ε ∈ (0, (b − a)/2). Prove that f ∈ R(I) and (2.54) holds.

11. Use the fundamental theorem of calculus and results of §1 to compute

(2.56)    ∫_a^b x^r dx,   r ∈ Q \ {−1},

where −∞ < a < b < ∞ if r ≥ 0, and 0 < a < b < ∞ if r < 0. See §5 for (2.56) with r = −1.

12. Use the change of variable result of Exercise 4 to compute

∫_0^1 x √(1 + x^2) dx.


13. We say f ∈ R(R) provided f|_{[k,k+1]} ∈ R([k, k + 1]) for each k ∈ Z, and

(2.57)    Σ_{k=−∞}^∞ ∫_k^{k+1} |f(x)| dx < ∞.

If f ∈ R(R), we set

(2.58)    ∫_{−∞}^∞ f(x) dx = lim_{k→∞} ∫_{−k}^k f(x) dx.

Formulate and demonstrate basic properties of the integral over R of elements of R(R).

14. This exercise discusses the integral test for absolute convergence of an infinite series, which goes as follows. Let f be a positive, monotonically decreasing, continuous function on [0, ∞), and suppose |a_k| = f(k). Then

Σ_{k=0}^∞ |a_k| < ∞ ⇐⇒ ∫_0^∞ f(x) dx < ∞.

Prove this.
Hint. Use

Σ_{k=1}^N |a_k| ≤ ∫_0^N f(x) dx ≤ Σ_{k=0}^{N−1} |a_k|.

15. Use the integral test to show that, if p > 0,

Σ_{k=1}^∞ 1/k^p < ∞ ⇐⇒ p > 1.

Note. Compare Exercise 7 in §6 of Chapter 1. (For now, p ∈ Q⁺. Results of §5 allow one to take p ∈ R⁺.)
Hint. Use Exercise 11 to evaluate I_N(p) = ∫_1^N x^{−p} dx, for p ≠ 1, and let N → ∞. See if you can show ∫_1^∞ x^{−1} dx = ∞ without knowing about log N.
Subhint. Show that ∫_1^2 x^{−1} dx = ∫_N^{2N} x^{−1} dx.

In Exercises 16–17, C ⊂ [a, b] is the Cantor set introduced in the exercises for §9 of Chapter 1. As in (9.21) of Chapter 1, C = ∩_{j≥0} C_j.

16. Show that cont⁺(C_j) = (2/3)^j (b − a), and conclude that

cont⁺(C) = 0.


17. Define f : [a, b] → R as follows. We call an interval of length 3^{−j}(b − a), omitted in passing from C_{j−1} to C_j, a "j-interval." Set

f(x) = 0, if x ∈ C,   f(x) = (−1)^j, if x belongs to a j-interval.

Show that the set of discontinuities of f is C. Hence Proposition 2.11 implies f ∈ R([a, b]).

18. Let f_k ∈ R([a, b]) and f : [a, b] → R satisfy the following conditions.

(a)    |f_k| ≤ M < ∞, ∀ k,

(b)    f_k(x) −→ f(x), ∀ x ∈ [a, b],

(c)    Given ε > 0, there exists S_ε ⊂ [a, b] such that cont⁺(S_ε) < ε, and f_k → f uniformly on [a, b] \ S_ε.

Show that f ∈ R([a, b]) and

∫_a^b f_k(x) dx −→ ∫_a^b f(x) dx, as k → ∞.

Remark. In the Lebesgue theory of integration, there is a stronger result, known as the

Lebesgue dominated convergence theorem. See Exercises 12–14 in §6 for more on this.


3. Power series

In §3 of Chapter 3 we introduced power series, of the form

(3.1)    f(z) = Σ_{k=0}^∞ a_k (z − z0)^k,

with a_k ∈ C, and established the following.

Proposition 3.1. If the series (3.1) converges for some z1 ≠ z0, then either this series is absolutely convergent for all z ∈ C, or there is some R ∈ (0, ∞) such that the series is absolutely convergent for |z − z0| < R and divergent for |z − z0| > R. The series converges uniformly on

(3.2)    D_S(z0) = {z ∈ C : |z − z0| < S},

for each S < R, and f is continuous on D_R(z0).

We now restrict attention to cases where z0 ∈ R and z = t ∈ R, and apply calculus tothe study of such power series. We emphasize that we still allow the coefficients ak to becomplex numbers.

Proposition 3.2. Assume a_k ∈ C and

(3.3)    f(t) = Σ_{k=0}^∞ a_k t^k

converges for real t satisfying |t| < R. Then f is differentiable on the interval −R < t < R, and

(3.4)    f′(t) = Σ_{k=1}^∞ k a_k t^{k−1},

the latter series being absolutely convergent for |t| < R.

We first check absolute convergence of the series (3.4). Let S < T < R. Convergence of (3.3) implies there exists C < ∞ such that

(3.5)    |a_k| T^k ≤ C, ∀ k.

Hence, if |t| ≤ S,

(3.6)    |k a_k t^{k−1}| ≤ (C/S) k (S/T)^k,

which readily yields absolute convergence. (See Exercise 1 below.) Hence

(3.7)    g(t) = Σ_{k=1}^∞ k a_k t^{k−1}

is continuous on (−R, R). To show that f′(t) = g(t), by the fundamental theorem of calculus, it is equivalent to show

(3.8)    ∫_0^t g(s) ds = f(t) − f(0).

The following result implies this.

Proposition 3.3. Assume b_k ∈ C and

(3.9)    g(t) = Σ_{k=0}^∞ b_k t^k

converges for real t satisfying |t| < R. Then, for |t| < R,

(3.10)    ∫_0^t g(s) ds = Σ_{k=0}^∞ (b_k/(k + 1)) t^{k+1},

the series being absolutely convergent for |t| < R.

Proof. Since, for |t| < R,

(3.11)    |(b_k/(k + 1)) t^{k+1}| ≤ R |b_k t^k|,

convergence of the series in (3.10) is clear. Next, write

(3.12)    g(t) = S_N(t) + R_N(t),   S_N(t) = Σ_{k=0}^N b_k t^k,   R_N(t) = Σ_{k=N+1}^∞ b_k t^k.

As in the proof of Proposition 3.2 in Chapter 3, pick S < T < R. There exists C < ∞ such that |b_k T^k| ≤ C for all k. Hence

(3.13)    |t| ≤ S ⇒ |R_N(t)| ≤ C Σ_{k=N+1}^∞ (S/T)^k = C ε_N → 0, as N → ∞,

so

(3.14)    ∫_0^t g(s) ds = Σ_{k=0}^N (b_k/(k + 1)) t^{k+1} + ∫_0^t R_N(s) ds,

and, for |t| ≤ S,

(3.15)    |∫_0^t R_N(s) ds| ≤ |∫_0^t |R_N(s)| ds| ≤ C R ε_N.

This gives (3.10).
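Proposition 3.3 can be illustrated with the geometric series g(s) = Σ_{k≥0} s^k = 1/(1 − s), |s| < 1. The sketch below (Python, purely illustrative) compares the term-by-term integral Σ t^{k+1}/(k + 1) from (3.10) with a direct quadrature of ∫_0^t ds/(1 − s):

```python
# Term-by-term integration (3.10) for g(s) = sum_{k>=0} s^k = 1/(1-s):
# int_0^t g(s) ds should match sum_{k>=0} t^{k+1}/(k+1).
t, N = 0.5, 60
series = sum(t ** (k + 1) / (k + 1) for k in range(N))   # truncated right side

n = 200000                                               # midpoint quadrature
h = t / n
integral = sum(1 / (1 - (k + 0.5) * h) for k in range(n)) * h

assert abs(series - integral) < 1e-6
print(series, integral)
```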

Second proof of Proposition 3.2. As shown in Proposition 3.7 of Chapter 3, if |t1| < R, then f(t) has a convergent power series about t1:

(3.16)    f(t) = Σ_{k=0}^∞ b_k (t − t1)^k, for |t − t1| < R − |t1|,

with

(3.17)    b_1 = Σ_{n=1}^∞ n a_n t1^{n−1}.

This clearly implies f is differentiable at t1, and f′(t1) is given by (3.17).

Remark. The definition of (3.10) for t < 0 follows standard convention. More generally, if a < b and g ∈ R([a, b]), then

∫_b^a g(s) ds = − ∫_a^b g(s) ds.

More generally, if we have a power series about t0,

(3.18)    f(t) = Σ_{k=0}^∞ a_k (t − t0)^k, for |t − t0| < R,

then f is differentiable for |t − t0| < R, and

(3.19)    f′(t) = Σ_{k=1}^∞ k a_k (t − t0)^{k−1}.

We can then differentiate this power series, and inductively obtain

(3.20)    f^{(n)}(t) = Σ_{k=n}^∞ k(k − 1) · · · (k − n + 1) a_k (t − t0)^{k−n}.


In particular,

(3.21)    f^{(n)}(t0) = n! a_n.

We can turn (3.21) around and write

(3.22)    a_n = f^{(n)}(t0)/n!.

This suggests the following method of taking a given function and deriving a power series representation. Namely, if we can, we compute f^{(k)}(t0) and propose that

(3.23)    f(t) = Σ_{k=0}^∞ (f^{(k)}(t0)/k!) (t − t0)^k,

at least on some interval about t0. To take an example, consider

(3.24)    f(t) = (1 − t)^{−r},

with r ∈ Q (but −r ∉ N), and take t0 = 0. (Results of §5 will allow us to extend this analysis to r ∈ R.) Using (1.36), we get

(3.25)    f′(t) = r(1 − t)^{−(r+1)},

for t < 1. Inductively, for k ∈ N,

(3.26)    f^{(k)}(t) = (∏_{ℓ=0}^{k−1} (r + ℓ)) (1 − t)^{−(r+k)}.

Hence, for k ≥ 1,

(3.27)    f^{(k)}(0) = ∏_{ℓ=0}^{k−1} (r + ℓ) = r(r + 1) · · · (r + k − 1).

Consequently, we propose that

(3.28)    (1 − t)^{−r} = Σ_{k=0}^∞ (a_k/k!) t^k,   |t| < 1,

with

(3.29)    a_0 = 1,   a_k = ∏_{ℓ=0}^{k−1} (r + ℓ), for k ≥ 1.

We can verify convergence of the right side of (3.28) by using the ratio test:

(3.30) |a_{k+1} t^{k+1}/(k + 1)!| / |a_k t^k/k!| = ((k + r)/(k + 1)) |t|.

This computation implies that the power series on the right side of (3.28) is absolutely convergent for |t| < 1, yielding a function

(3.31) g(t) = Σ_{k=0}^∞ (a_k/k!) t^k, |t| < 1.

It remains to establish that g(t) = (1 − t)^{−r}. We take up this task, on a more general level. Establishing that the series

(3.32) Σ_{k=0}^∞ (f^(k)(t_0)/k!) (t − t_0)^k

converges to f(t) is equivalent to examining the remainder R_n(t, t_0) in the finite expansion

(3.33) f(t) = Σ_{k=0}^n (f^(k)(t_0)/k!) (t − t_0)^k + R_n(t, t_0).

The series (3.32) converges to f(t) if and only if R_n(t, t_0) → 0 as n → ∞. To see when this happens, we need a compact formula for the remainder R_n, which we proceed to derive.

It seems to clarify matters if we switch notation a bit, and write

(3.34) f(x) = f(y) + f′(y)(x − y) + ··· + (f^(n)(y)/n!)(x − y)^n + R_n(x, y).

We now take the y-derivative of each side of (3.34). The y-derivative of the left side is 0, and when we apply ∂/∂y to the right side, we observe an enormous amount of cancellation. There results the identity

(3.35) (∂R_n/∂y)(x, y) = −(1/n!) f^(n+1)(y)(x − y)^n.

Also,

(3.36) R_n(x, x) = 0.

If we concentrate on R_n(x, y) as a function of y and look at the difference quotient [R_n(x, y) − R_n(x, x)]/(y − x), an immediate consequence of the mean value theorem is that, if f is real valued,

(3.37) R_n(x, y) = (1/n!)(x − y)(x − ξ_n)^n f^(n+1)(ξ_n),

for some ξ_n between x and y. This is known as Cauchy's formula for the remainder. If f^(n+1) is continuous, we can apply the fundamental theorem of calculus to (3.35)–(3.36), and obtain the integral formula

(3.38) R_n(x, y) = (1/n!) ∫_y^x (x − s)^n f^(n+1)(s) ds.

This works regardless of whether f is real valued. Another derivation of (3.38) arose in the exercise set for §1. The change of variable x − s = t(x − y) gives the integral formula

(3.39) R_n(x, y) = (1/n!)(x − y)^{n+1} ∫_0^1 t^n f^(n+1)(ty + (1 − t)x) dt.

If we think of this as 1/(n + 1) times a weighted mean of f^(n+1), we get the Lagrange formula for the remainder,

(3.40) R_n(x, y) = (1/(n + 1)!)(x − y)^{n+1} f^(n+1)(ζ_n),

for some ζ_n between x and y, provided f is real valued. The Lagrange formula is shorter and neater than the Cauchy formula, but the Cauchy formula is actually more powerful. The calculations in (3.43)–(3.54) below will illustrate this.

Note that, if I(x, y) denotes the interval with endpoints x and y (e.g., (x, y) if x < y), then (3.38) implies

(3.41) |R_n(x, y)| ≤ (|x − y|/n!) sup_{ξ∈I(x,y)} |(x − ξ)^n f^(n+1)(ξ)|,

while (3.39) implies

(3.42) |R_n(x, y)| ≤ (|x − y|^{n+1}/(n + 1)!) sup_{ξ∈I(x,y)} |f^(n+1)(ξ)|.

In case f is real valued, (3.41) also follows from the Cauchy formula (3.37) and (3.42) follows from the Lagrange formula (3.40).

Let us apply these estimates with f as in (3.24), i.e.,

(3.43) f(x) = (1 − x)^{−r},

and y = 0. By (3.26),

(3.44) f^(n+1)(ξ) = a_{n+1}(1 − ξ)^{−(r+n+1)},   a_{n+1} = Π_{ℓ=0}^{n} (r + ℓ).

Thus, if n is sufficiently large that r + n + 1 > 0,

(3.45) sup_{ξ∈I(x,0)} |f^(n+1)(ξ)| = |a_{n+1}|, if −1 ≤ x ≤ 0,
       |a_{n+1}|(1 − x)^{−(r+n+1)}, if 0 ≤ x < 1.

Thus (3.42) implies

(3.46) |R_n(x, 0)| ≤ (|a_{n+1}|/(n + 1)!) |x|^{n+1}, if −1 ≤ x ≤ 0,
       (|a_{n+1}|/(n + 1)!) (1/(1 − x)^r) (x/(1 − x))^{n+1}, if 0 ≤ x < 1.

Now

(3.47) b_n = |a_{n+1}|/(n + 1)!

satisfies

(3.48) b_{n+1}/b_n = (n + 1 + r)/(n + 2) = 1 + (r − 1)/(n + 2),

if n + 1 + r > 0, and this tends to 1 as n → ∞, so we conclude from the first part of (3.46) that

(3.49) R_n(x, 0) → 0 as n → ∞, if −1 < x ≤ 0.

On the other hand, x/(1 − x) is < 1 for 0 ≤ x < 1/2, but not for 1/2 ≤ x < 1. Hence the factor (x/(1 − x))^{n+1} decreases geometrically for 0 ≤ x < 1/2, but not for 1/2 ≤ x < 1. Thus the second part of (3.46) yields only

(3.50) R_n(x, 0) → 0 as n → ∞, if 0 ≤ x < 1/2.

This is what the estimate (3.42) yields. To get the better result

(3.51) R_n(x, 0) → 0 as n → ∞, if 0 ≤ x < 1,

we use the estimate (3.41). This gives

(3.52) |R_n(x, 0)| ≤ (|a_{n+1}|/n!) |x| sup_{ξ∈I(x,0)} |x − ξ|^n / |1 − ξ|^{n+1+r}.

Now

(3.53) 0 ≤ ξ ≤ x < 1 ⇒ (x − ξ)/(1 − ξ) ≤ x,

since the conclusion is equivalent to x − ξ ≤ x − xξ, and hence to xξ ≤ ξ. Hence we deduce from (3.52) that

(3.54) 0 ≤ x < 1 ⇒ |R_n(x, 0)| ≤ (|a_{n+1}|/n!) x^{n+1}/(1 − x)^{1+r} = (n + 1) b_n x^{n+1}/(1 − x)^{1+r},

with b_n as in (3.47). The analysis (3.48) of b_n then gives (3.51), as a consequence of (3.54). We can now confidently state that (3.28) holds, with a_k given by (3.29).
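As a quick numerical sanity check of (3.28)–(3.29) (a Python sketch of ours; the helper name `binom_series` is not from the text), one can build the coefficients a_k/k! by the recursion a_{k+1} = (r + k)a_k and compare partial sums with (1 − t)^{−r}:

```python
def binom_series(r, t, n_terms=100):
    """Partial sum of (3.28): sum of (a_k/k!) t^k, with a_k as in (3.29)."""
    total, coef = 0.0, 1.0          # coef holds a_k/k!, starting at a_0/0! = 1
    for k in range(n_terms):
        total += coef * t ** k
        coef *= (r + k) / (k + 1)   # a_{k+1}/(k+1)! = (a_k/k!) * (r+k)/(k+1)
    return total

# For |t| < 1 the partial sums should approach (1 - t)^(-r), per (3.49)-(3.51).
r, t = 0.5, 0.7
approx = binom_series(r, t)
exact = (1.0 - t) ** (-r)
```

The case r = 1 reproduces the geometric series (3.56).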

There are some important examples of power series representations for which one does not need to use remainder estimates like (3.41) or (3.42). For example, as seen in Chapter 1, we have

(3.55) Σ_{k=0}^n x^k = (1 − x^{n+1})/(1 − x),

if x ≠ 1. The right side tends to 1/(1 − x) as n → ∞, if |x| < 1, so we get

(3.56) 1/(1 − x) = Σ_{k=0}^∞ x^k, |x| < 1,

without further ado, which is the case r = 1 of (3.28)–(3.29). We can differentiate (3.56) repeatedly to get

(3.57) (1 − x)^{−n} = Σ_{k=0}^∞ c_k(n) x^k, |x| < 1, n ∈ N,

and verify that (3.57) agrees with (3.28)–(3.29) with r = n. However, when r ∉ Z, such an analysis of R_n(x, 0) as made above seems necessary.

Let us also note that we can apply Proposition 3.3 to (3.56), obtaining

(3.58) Σ_{k=0}^∞ x^{k+1}/(k + 1) = ∫_0^x dy/(1 − y), |x| < 1.

Material covered in §5 will produce another formula for the right side of (3.58).
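Both sides of (3.58) can be compared directly by machine (a Python sketch of ours, using an ad hoc midpoint rule for the integral; nothing here is from the text):

```python
def series_lhs(x, n_terms=200):
    # Left side of (3.58): sum over k >= 0 of x^(k+1)/(k+1)
    return sum(x ** (k + 1) / (k + 1) for k in range(n_terms))

def integral_rhs(x, n=100_000):
    # Midpoint-rule approximation of the right side: integral of 1/(1-y) on [0,x]
    h = x / n
    return sum(h / (1.0 - (j + 0.5) * h) for j in range(n))

x = 0.5
lhs, rhs = series_lhs(x), integral_rhs(x)
```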

Exercises

1. Show that (3.6) yields the absolute convergence asserted in the proof of Proposition 3.2. More generally, show that, for any n ∈ N, r ∈ (0, 1),

Σ_{k=1}^∞ k^n r^k < ∞.

Hint. Refer to the ratio test, discussed in §3 of Chapter 3.

2. A special case of (3.18)–(3.21) is that, given a polynomial p(t) = a_n t^n + ··· + a_1 t + a_0, we have p^(k)(0) = k! a_k. Apply this to

P_n(t) = (1 + t)^n.

Compute P_n^(k)(t) using (1.7) repeatedly, then compute P_n^(k)(0), and use this to establish the binomial formula:

(1 + t)^n = Σ_{k=0}^n (n choose k) t^k,   where (n choose k) = n!/(k!(n − k)!).

3. Find the coefficients in the power series

1/√(1 − x^4) = Σ_{k=0}^∞ b_k x^k.

Show that this series converges to the left side for |x| < 1.
Hint. Take r = 1/2 in (3.28)–(3.29) and set t = x^4.

4. Expand

∫_0^x dy/√(1 − y^4)

in a power series in x. Show this holds for |x| < 1.

5. Expand

∫_0^x dy/√(1 + y^4)

as a power series in x. Show that this holds for |x| < 1.

6. Expand

∫_0^1 dt/√(1 + x t^4)

as a power series in x. Show that this holds for |x| < 1.

7. Show that another formula for R_n(x, y) in (3.34) is

(3.59) R_n(x, y) = (1/(n − 1)!) ∫_y^x (x − s)^{n−1} [f^(n)(s) − f^(n)(y)] ds.

Hint. Do (3.34)–(3.38) with n replaced by n − 1, and then write

R_{n−1}(x, y) = (f^(n)(y)/n!)(x − y)^n + R_n(x, y).

Remark. An advantage of (3.59) over (3.38) is that for (3.59), we need only f ∈ C^n, rather than f ∈ C^{n+1}.

8. Since 2 = (9 − 1)/4, we have

√2 = (3/2)√(1 − 1/9).

Expand the right side in a power series, using (3.28)–(3.29). How many terms suffice to approximate √2 to 12 digits?

9. In the setting of Exercise 8, investigate series that converge faster, such as

√2 = (1.4)√(1 + 4/196).

10. Apply variants of the methods of Exercises 8–9 to approximate √3, √5, √7, and √1001.
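To see the point of Exercises 8–9 numerically, here is a Python sketch (our own helper, not the text's; it applies (3.28)–(3.29) with r = −1/2, so (1 − t)^{1/2} is being expanded) comparing the two series for √2:

```python
def sqrt_series(c, t, n_terms=20):
    """Approximate c * sqrt(1 - t) via (3.28)-(3.29) with r = -1/2."""
    total, coef = 0.0, 1.0          # coef = a_k/k!
    for k in range(n_terms):
        total += coef * t ** k
        coef *= (-0.5 + k) / (k + 1)
    return c * total

s8 = sqrt_series(1.5, 1.0 / 9.0)     # Exercise 8: terms shrink like (1/9)^k
s9 = sqrt_series(1.4, -4.0 / 196.0)  # Exercise 9: terms shrink like (1/49)^k
```

Both converge to √2, the second noticeably faster, since |t| is smaller.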

11. Assume F ∈ C([a, b]), g ∈ R([a, b]), F real valued, and g ≥ 0 on [a, b]. Show that

∫_a^b g(t)F(t) dt = (∫_a^b g(t) dt) F(ζ),

for some ζ ∈ (a, b). Show how this result justifies passing from (3.39) to (3.40).
Hint. If A = min F, B = max F, and M = ∫_a^b g(t) dt, show that

A·M ≤ ∫_a^b g(t)F(t) dt ≤ B·M.

4. Curves and arc length

The term "curve" is commonly used to refer to a couple of different, but closely related, objects. In one meaning, a curve is a continuous function from an interval I ⊂ R to n-dimensional Euclidean space:

(4.1) γ : I → R^n,   γ(t) = (γ_1(t), ..., γ_n(t)).

We say γ is differentiable provided each component γ_j is, in which case

(4.2) γ′(t) = (γ_1′(t), ..., γ_n′(t)).

γ′(t) is the velocity of γ at "time" t, and its speed is the magnitude of γ′(t):

(4.3) |γ′(t)| = √(γ_1′(t)^2 + ··· + γ_n′(t)^2).

We say γ is smooth of class C^k provided each component γ_j(t) has this property.

One also calls the image of I under the map γ a curve in R^n. If u : J → I is continuous, one-to-one, and onto, the map

(4.4) σ : J → R^n,   σ(t) = γ(u(t))

has the same image as γ. We say σ is a reparametrization of γ. We usually require that u be C^1, with C^1 inverse. If γ is C^k and u is also C^k, so is σ, and the chain rule gives

(4.5) σ′(t) = u′(t)γ′(u(t)).

Let us assume I = [a, b] is a closed, bounded interval, and γ is C^1. We want to define the length of this curve. To get started, we take a partition P of [a, b], given by

(4.6) a = t_0 < t_1 < ··· < t_N = b,

and set

(4.7) ℓ_P(γ) = Σ_{j=1}^N |γ(t_j) − γ(t_{j−1})|.

We will massage the right side of (4.7) into something that looks like a Riemann sum for ∫_a^b |γ′(t)| dt. We have

(4.8) γ(t_j) − γ(t_{j−1}) = ∫_{t_{j−1}}^{t_j} γ′(t) dt
    = ∫_{t_{j−1}}^{t_j} [γ′(t_j) + γ′(t) − γ′(t_j)] dt
    = (t_j − t_{j−1})γ′(t_j) + ∫_{t_{j−1}}^{t_j} [γ′(t) − γ′(t_j)] dt.

We get

(4.9) |γ(t_j) − γ(t_{j−1})| = (t_j − t_{j−1})|γ′(t_j)| + r_j,

with

(4.10) |r_j| ≤ ∫_{t_{j−1}}^{t_j} |γ′(t) − γ′(t_j)| dt.

Now if γ′ is continuous on [a, b], so is |γ′|, and hence both are uniformly continuous on [a, b]. We have

(4.11) s, t ∈ [a, b], |s − t| ≤ h ⇒ |γ′(t) − γ′(s)| ≤ ω(h),

where ω(h) → 0 as h → 0. Summing (4.9) over j, we get

(4.12) ℓ_P(γ) = Σ_{j=1}^N |γ′(t_j)|(t_j − t_{j−1}) + R_P,

with

(4.13) |R_P| ≤ (b − a)ω(h), if each t_j − t_{j−1} ≤ h.

Since the sum on the right side of (4.12) is a Riemann sum, we can apply Theorem 2.4 to get the following.

Proposition 4.1. Assume γ : [a, b] → R^n is a C^1 curve. Then

(4.14) ℓ_P(γ) → ∫_a^b |γ′(t)| dt as maxsize P → 0.

We call this limit the length of the curve γ, and write

(4.15) ℓ(γ) = ∫_a^b |γ′(t)| dt.
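Proposition 4.1 is easy to watch in action. The sketch below (plain Python; `polygonal_length` is our name for the sum (4.7), not the text's) compares ℓ_P(γ) on a fine uniform partition with the known length π/2 of a unit-speed quarter circle:

```python
import math

def polygonal_length(gamma, a, b, n):
    # The sum (4.7) over the uniform partition t_j = a + j(b - a)/n
    pts = [gamma(a + (b - a) * j / n) for j in range(n + 1)]
    return sum(math.dist(pts[j - 1], pts[j]) for j in range(1, n + 1))

# Quarter of the unit circle, traversed at unit speed: length pi/2.
gamma = lambda t: (math.cos(t), math.sin(t))
approx = polygonal_length(gamma, 0.0, math.pi / 2, 10_000)
```

The polygonal sums approach the integral from below here, since each chord is shorter than the arc it subtends.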

Note that if u : [α, β] → [a, b] is a C^1 map with C^1 inverse, and σ = γ ∘ u, as in (4.4), we have from (4.5) that |σ′(t)| = |u′(t)| · |γ′(u(t))|, and the change of variable formula (2.46) for the integral gives

(4.16) ∫_α^β |σ′(t)| dt = ∫_a^b |γ′(t)| dt,

hence we have the geometrically natural result

(4.17) ℓ(σ) = ℓ(γ).

Given such a C^1 curve γ, it is natural to consider the length function

(4.18) ℓ_γ(t) = ∫_a^t |γ′(s)| ds,   ℓ_γ′(t) = |γ′(t)|.

If we assume also that γ′ is nowhere vanishing on [a, b], Theorem 1.3, the inverse function theorem, implies that ℓ_γ : [a, b] → [0, ℓ(γ)] has a C^1 inverse

(4.19) u : [0, ℓ(γ)] → [a, b],

and then σ = γ ∘ u : [0, ℓ(γ)] → R^n satisfies

(4.20) σ′(t) = u′(t)γ′(u(t)) = (1/ℓ_γ′(s)) γ′(u(t)), for t = ℓ_γ(s), s = u(t),

and by (4.18), ℓ_γ′(s) = |γ′(s)| = |γ′(u(t))|, so

(4.21) |σ′(t)| ≡ 1.

Then σ is a reparametrization of γ, and σ has unit speed. We say σ is a reparametrization by arc length.

We now focus on that most classical example of a curve in the plane R^2, the unit circle

(4.22) S^1 = {(x, y) ∈ R^2 : x^2 + y^2 = 1}.

We can parametrize S^1 away from (x, y) = (±1, 0) by

(4.23) γ_+(t) = (t, √(1 − t^2)),   γ_−(t) = (t, −√(1 − t^2)),

on the intersection of S^1 with {(x, y) : y > 0} and {(x, y) : y < 0}, respectively. Here γ_± : (−1, 1) → R^2, and both maps are smooth. In fact, we can take γ_± : [−1, 1] → R^2, but these functions are not differentiable at ±1. We can also parametrize S^1 away from (x, y) = (0, ±1), by

(4.24) γ_ℓ(t) = (−√(1 − t^2), t),   γ_r(t) = (√(1 − t^2), t),

again with t ∈ (−1, 1). Note that

(4.25) γ_+′(t) = (1, −t(1 − t^2)^{−1/2}),

so

(4.26) |γ_+′(t)|^2 = 1 + t^2/(1 − t^2) = 1/(1 − t^2).

Hence, if ℓ(t) is the length of the image γ_+([0, t]), we have

(4.27) ℓ(t) = ∫_0^t 1/√(1 − s^2) ds, for 0 < t < 1.

The same formula holds with γ_+ replaced by γ_−, γ_ℓ, or γ_r.

We can evaluate the integral (4.27) as a power series in t, as follows. As seen in §3,

(4.28) (1 − r)^{−1/2} = Σ_{k=0}^∞ (a_k/k!) r^k, for |r| < 1,

where

(4.29) a_0 = 1, a_1 = 1/2, a_k = (1/2)(3/2) ··· (k − 1/2).

The power series converges uniformly on [−ρ, ρ], for each ρ ∈ (0, 1). It follows that

(4.30) (1 − s^2)^{−1/2} = Σ_{k=0}^∞ (a_k/k!) s^{2k}, |s| < 1,

uniformly convergent on [−a, a] for each a ∈ (0, 1). Hence we can integrate (4.30) term by term to get

(4.31) ℓ(t) = Σ_{k=0}^∞ (a_k/k!) t^{2k+1}/(2k + 1), 0 ≤ t < 1.
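A short Python sketch of the series (4.31) (our own code, using the recursion a_{k+1} = (k + 1/2)a_k noted in Exercise 7 of §5): since the integral (4.27) is the classical arcsine, ℓ(1/2) should come out to π/6.

```python
import math

def ell_series(t, n_terms=80):
    # Partial sum of (4.31): sum of (a_k/k!) t^(2k+1)/(2k+1), a_k from (4.29)
    total, coef = 0.0, 1.0           # coef = a_k/k!
    for k in range(n_terms):
        total += coef * t ** (2 * k + 1) / (2 * k + 1)
        coef *= (k + 0.5) / (k + 1)  # since a_{k+1} = (k + 1/2) a_k
    return total

val = ell_series(0.5)                # expect something close to pi/6
```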

One can use (4.27)–(4.31) to get a rapidly convergent infinite series for the number π, defined as

(4.31A) π is half the length of S^1.

See Exercise 7 in §5.

Since S^1 is a smooth curve, it can be parametrized by arc length. We will let C : R → S^1 be such a parametrization, satisfying

(4.32) C(0) = (1, 0),   C′(0) = (0, 1),

so C(t) traverses S^1 counter-clockwise, as t increases. For t moderately bigger than 0, the rays from (0, 0) to (1, 0) and from (0, 0) to C(t) make an angle that, measured in radians, is t. This leads to the standard trigonometric functions cos t and sin t, defined by

(4.33) C(t) = (cos t, sin t),

when C is such a unit-speed parametrization of S^1.

We can evaluate the derivative of C(t) by the following device. Applying d/dt to the identity

(4.34) C(t) · C(t) = 1

and using the product formula gives

(4.35) C′(t) · C(t) = 0.

Since both |C(t)| ≡ 1 and |C′(t)| ≡ 1, (4.35) allows only two possibilities. Either

(4.36) C′(t) = (sin t, − cos t),

or

(4.37) C′(t) = (− sin t, cos t).

Since C′(0) = (0, 1), (4.36) is not a possibility. This implies

(4.38) (d/dt) cos t = − sin t,   (d/dt) sin t = cos t.

We will derive further important results on cos t and sin t in §5.

One can think of cos t and sin t as special functions arising to analyze the length of arcs in the circle. Related special functions arise to analyze the length of portions of a parabola in R^2, say the graph of

(4.39) y = (1/2)x^2.

This curve is parametrized by

(4.40) γ(t) = (t, (1/2)t^2),

so

(4.41) γ′(t) = (1, t).

In such a case, the length of γ([0, t]) is

(4.42) ℓ_γ(t) = ∫_0^t √(1 + s^2) ds.

Methods to evaluate the integral in (4.42) are provided in §5. See Exercise 10 of §5.
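As a numerical preview (a Python sketch, not from the text), one can approximate (4.42) by a midpoint rule and compare with the closed form ½(t√(1+t²) + sinh⁻¹ t) that the substitution x = sinh u of §5 produces:

```python
import math

def parabola_length(t, n=200_000):
    # Midpoint rule for (4.42): integral of sqrt(1 + s^2) over [0, t]
    h = t / n
    return sum(h * math.hypot(1.0, (j + 0.5) * h) for j in range(n))

t = 1.0
closed = 0.5 * (t * math.hypot(1.0, t) + math.asinh(t))
approx = parabola_length(t)
```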

The study of lengths of other curves has stimulated much work in analysis. Another example is the ellipse

(4.43) x^2/a^2 + y^2/b^2 = 1,

given a, b ∈ (0, ∞). This curve is parametrized by

(4.44) γ(t) = (a cos t, b sin t).

In such a case, by (4.38), γ′(t) = (−a sin t, b cos t), so

(4.45) |γ′(t)|^2 = a^2 sin^2 t + b^2 cos^2 t = b^2 + c sin^2 t,   c = a^2 − b^2,

and hence the length of γ([0, t]) is

(4.46) ℓ_γ(t) = b ∫_0^t √(1 + σ sin^2 s) ds,   σ = c/b^2.

If a ≠ b, this is called an elliptic integral, and it gives rise to a more subtle family of special functions, called elliptic functions. Material on this can be found in §33 of [T3], Introduction to Complex Analysis.

We end this section with a brief discussion of curves in polar coordinates. We define a map

(4.47) Π : R^2 → R^2,   Π(r, θ) = (r cos θ, r sin θ).

We say (r, θ) are polar coordinates of (x, y) ∈ R^2 if Π(r, θ) = (x, y). Now, Π in (4.47) is not bijective, since

(4.48) Π(r, θ + 2π) = Π(r, θ),   Π(r, θ + π) = Π(−r, θ),

and Π(0, θ) is independent of θ. So polar coordinates are not unique, but we will not belabor this point. The point we make is that an equation

(4.49) r = ρ(θ),   ρ : [a, b] → R,

yields a curve in R^2, namely (with θ = t)

(4.50) γ(t) = (ρ(t) cos t, ρ(t) sin t), a ≤ t ≤ b.

The circle (4.33) corresponds to ρ(θ) ≡ 1. Other cases include

(4.51) ρ(θ) = a cos θ, −π/2 ≤ θ ≤ π/2,

yielding a circle of diameter a centered at (a/2, 0), and

(4.52) ρ(θ) = a cos 3θ,

yielding a figure called a three-leaved rose.

To compute the arc length of (4.50), we note that, by (4.38),

(4.53) x(t) = ρ(t) cos t, y(t) = ρ(t) sin t
    ⇒ x′(t) = ρ′(t) cos t − ρ(t) sin t, y′(t) = ρ′(t) sin t + ρ(t) cos t,

hence

(4.54) x′(t)^2 + y′(t)^2 = ρ′(t)^2 cos^2 t − 2ρ(t)ρ′(t) cos t sin t + ρ(t)^2 sin^2 t
       + ρ′(t)^2 sin^2 t + 2ρ(t)ρ′(t) sin t cos t + ρ(t)^2 cos^2 t
       = ρ′(t)^2 + ρ(t)^2.

Therefore

(4.55) ℓ(γ) = ∫_a^b |γ′(t)| dt = ∫_a^b √(ρ′(t)^2 + ρ(t)^2) dt.
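For a concrete check of (4.55) (a Python sketch; the names are ours): with ρ(θ) = a cos θ as in (4.51), the integrand √(ρ′² + ρ²) is identically a, so the length over [−π/2, π/2] should be πa, the circumference of a circle of diameter a.

```python
import math

def polar_length(rho, drho, a, b, n=100_000):
    # Midpoint rule for (4.55): integral of sqrt(rho'(t)^2 + rho(t)^2)
    h = (b - a) / n
    return sum(h * math.hypot(drho(a + (j + 0.5) * h), rho(a + (j + 0.5) * h))
               for j in range(n))

a_val = 2.0
L = polar_length(lambda t: a_val * math.cos(t),
                 lambda t: -a_val * math.sin(t),
                 -math.pi / 2, math.pi / 2)
```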

Exercises

1. Let γ(t) = (t^2, t^3). Compute the length of γ([0, t]).

2. With a, b > 0, the curve

γ(t) = (a cos t, a sin t, bt)

is a helix. Compute the length of γ([0, t]).

3. Let

γ(t) = (t, (2√2/3) t^{3/2}, (1/2) t^2).

Compute the length of γ([0, t]).

4. In case b > a for the ellipse (4.44), the length formula (4.46) becomes

ℓ_γ(t) = b ∫_0^t √(1 − β^2 sin^2 s) ds,   β^2 = (b^2 − a^2)/b^2 ∈ (0, 1).

Apply the change of variable x = sin s to this integral (cf. (2.46)), and write out the resulting integral.

5. The second half of (4.48) is equivalent to the identity

(cos(θ + π), sin(θ + π)) = −(cos θ, sin θ).

Deduce this from the definition (4.31A) of π, together with the characterization of C(t) in (4.33) as the unit speed parametrization of S^1 satisfying (4.32). For a more general identity, see (5.44).

6. The curve defined by (4.51) can be written

γ(t) = (a cos^2 t, a cos t sin t), −π/2 ≤ t ≤ π/2.

Peek ahead at (5.44) and show that

γ(t) = (a/2 + (a/2) cos 2t, (a/2) sin 2t).

Verify that this traces out a circle of radius a/2, centered at (a/2, 0).

7. Use (4.55) to write the arc length of the curve given by (4.52) as an integral. Show this integral has the same general form as (4.45)–(4.46).

5. The exponential and trigonometric functions

The exponential function is one of the central objects of analysis. In this section we define the exponential function, both for real and complex arguments, and establish a number of basic properties, including fundamental connections to the trigonometric functions.

We construct the exponential function to solve the differential equation

(5.1) dx/dt = x,   x(0) = 1.

We seek a solution as a power series

(5.2) x(t) = Σ_{k=0}^∞ a_k t^k.

In such a case, if this series converges for |t| < R, then, by Proposition 3.2,

(5.3) x′(t) = Σ_{k=1}^∞ k a_k t^{k−1} = Σ_{ℓ=0}^∞ (ℓ + 1)a_{ℓ+1} t^ℓ,

so for (5.1) to hold we need

(5.4) a_0 = 1,   a_{k+1} = a_k/(k + 1),

i.e., a_k = 1/k!, where k! = k(k − 1) ··· 2 · 1. Thus (5.1) is solved by

(5.5) x(t) = e^t = Σ_{k=0}^∞ (1/k!) t^k, t ∈ R.

This defines the exponential function e^t.
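The recursion (5.4) translates directly into a few lines of Python (a sketch of ours, compared against the standard library's exp):

```python
import math

def exp_series(t, n_terms=40):
    # Partial sum of (5.5), building the terms t^k/k! via the recursion (5.4)
    total, term = 0.0, 1.0
    for k in range(n_terms):
        total += term
        term *= t / (k + 1)   # a_{k+1} = a_k/(k+1), with another factor of t
    return total

approx = exp_series(2.0)
```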

More generally, we can define

(5.6) e^z = Σ_{k=0}^∞ (1/k!) z^k, z ∈ C.

The ratio test then shows that the series (5.6) is absolutely convergent for all z ∈ C, and uniformly convergent for |z| ≤ R, for each R < ∞. Note that, again by Proposition 3.2,

(5.7) e^{at} = Σ_{k=0}^∞ (a^k/k!) t^k

solves

(5.8) (d/dt) e^{at} = a e^{at},

and this works for each a ∈ C.

We claim that e^{at} is the unique solution to

(5.9) dy/dt = ay,   y(0) = 1.

To see this, compute the derivative of e^{−at}y(t):

(5.10) (d/dt)(e^{−at}y(t)) = −a e^{−at}y(t) + e^{−at}a y(t) = 0,

where we use the product rule, (5.8) (with a replaced by −a), and (5.9). Thus e^{−at}y(t) is independent of t. Evaluating at t = 0 gives

(5.11) e^{−at}y(t) = 1, ∀ t ∈ R,

whenever y(t) solves (5.9). Since e^{at} solves (5.9), we have e^{−at}e^{at} = 1, hence

(5.12) e^{−at} = 1/e^{at}, ∀ t ∈ R, a ∈ C.

Thus multiplying both sides of (5.11) by e^{at} gives the asserted uniqueness:

(5.13) y(t) = e^{at}, ∀ t ∈ R.

We can draw further useful conclusions from applying d/dt to products of exponential functions. In fact, let a, b ∈ C; then

(5.14) (d/dt)(e^{−at}e^{−bt}e^{(a+b)t})
    = −a e^{−at}e^{−bt}e^{(a+b)t} − b e^{−at}e^{−bt}e^{(a+b)t} + (a + b)e^{−at}e^{−bt}e^{(a+b)t}
    = 0,

so again we are differentiating a function that is independent of t. Evaluation at t = 0 gives

(5.15) e^{−at}e^{−bt}e^{(a+b)t} = 1, ∀ t ∈ R.

Again using (5.12), we get

(5.16) e^{(a+b)t} = e^{at}e^{bt}, ∀ t ∈ R, a, b ∈ C,

or, setting t = 1,

(5.17) e^{a+b} = e^a e^b, ∀ a, b ∈ C.

We next record some properties of exp(t) = e^t for real t. The power series (5.5) clearly gives e^t > 0 for t ≥ 0. Since e^{−t} = 1/e^t, we see that e^t > 0 for all t ∈ R. Since de^t/dt = e^t > 0, the function is monotone increasing in t, and since d^2e^t/dt^2 = e^t > 0, this function is convex. (See Proposition 1.5 and the remark that follows it.) Note that, for t > 0,

(5.18) e^t = 1 + t + t^2/2 + ··· > 1 + t → +∞,

as t → +∞. Hence

(5.19) lim_{t→+∞} e^t = +∞.

Since e^{−t} = 1/e^t,

(5.20) lim_{t→−∞} e^t = 0.

As a consequence,

(5.21) exp : R → (0, ∞)

is one-to-one and onto, with positive derivative, so there is a smooth inverse

(5.22) L : (0, ∞) → R.

We call this inverse the natural logarithm:

(5.23) log x = L(x).

See Figures 5.1 and 5.2 for graphs of x = e^t and t = log x.

Applying d/dt to

(5.24) L(e^t) = t

gives

(5.25) L′(e^t)e^t = 1, hence L′(e^t) = 1/e^t,

i.e.,

(5.26) (d/dx) log x = 1/x.


Figure 5.1

Figure 5.2

Since log 1 = 0, we get

(5.27) log x = ∫_1^x dy/y.

An immediate consequence of (5.17) (for a, b ∈ R) is the identity

(5.28) log xy = log x + log y, x, y ∈ (0, ∞).

We move on to a study of e^z for purely imaginary z, i.e., of

(5.29) γ(t) = e^{it}, t ∈ R.


This traces out a curve in the complex plane, and we want to understand which curve it is. Let us set

(5.30) e^{it} = c(t) + i s(t),

with c(t) and s(t) real valued. First we calculate |e^{it}|^2 = c(t)^2 + s(t)^2. For x, y ∈ R,

(5.31) z = x + iy ⇒ \overline{z} = x − iy ⇒ z\overline{z} = x^2 + y^2 = |z|^2.

It is elementary that

(5.32) z, w ∈ C ⇒ \overline{zw} = \overline{z} \overline{w} ⇒ \overline{z^n} = \overline{z}^n, and \overline{z + w} = \overline{z} + \overline{w}.

Hence

(5.33) \overline{e^z} = Σ_{k=0}^∞ \overline{z}^k/k! = e^{\overline{z}}.

In particular,

(5.34) t ∈ R ⇒ |e^{it}|^2 = e^{it}e^{−it} = 1.

Hence t ↦ γ(t) = e^{it} traces out the unit circle centered at the origin in C. Also

(5.35) γ′(t) = i e^{it} ⇒ |γ′(t)| ≡ 1,

so γ(t) moves at unit speed on the unit circle. We have

(5.36) γ(0) = 1,   γ′(0) = i.

Thus, for moderate t > 0, the arc from γ(0) to γ(t) is an arc on the unit circle, pictured in Figure 5.3, of length

(5.37) ℓ(t) = ∫_0^t |γ′(s)| ds = t.

Figure 5.3


Standard definitions from trigonometry (cf. (4.33)) say that the line segments from 0 to 1 and from 0 to γ(t) meet at angle t (in radians), and that

(5.38) cos t = c(t),   sin t = s(t).

Thus (5.30) becomes

(5.39) e^{it} = cos t + i sin t,

which is Euler's formula. The identity

(5.40) (d/dt) e^{it} = i e^{it},

applied to (5.39), yields

(5.41) (d/dt) cos t = − sin t,   (d/dt) sin t = cos t.

Compare the derivation of (4.38). We can use (5.17) to derive formulas for sin and cos of the sum of two angles. Indeed, comparing

(5.42) e^{i(s+t)} = cos(s + t) + i sin(s + t)

with

(5.43) e^{is}e^{it} = (cos s + i sin s)(cos t + i sin t)

gives

(5.44) cos(s + t) = (cos s)(cos t) − (sin s)(sin t),
       sin(s + t) = (sin s)(cos t) + (cos s)(sin t).

Further material on the trigonometric functions is developed in the exercises below.
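The identities (5.42)–(5.44) are easy to spot-check numerically with Python's complex exponential (a sketch of ours; any values of s and t would do):

```python
import cmath, math

s, t = 0.7, 1.1
lhs = cmath.exp(1j * (s + t))                 # (5.42)
rhs = cmath.exp(1j * s) * cmath.exp(1j * t)   # (5.43)
# Real and imaginary parts of lhs are the right sides of (5.44):
cos_sum = math.cos(s) * math.cos(t) - math.sin(s) * math.sin(t)
sin_sum = math.sin(s) * math.cos(t) + math.cos(s) * math.sin(t)
```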

Exercises.

1. Show that

(5.45) |t| < 1 ⇒ log(1 + t) = Σ_{k=1}^∞ ((−1)^{k−1}/k) t^k = t − t^2/2 + t^3/3 − ···.


Hint. Rewrite (5.27) as

log(1 + t) = ∫_0^t ds/(1 + s),

expand

1/(1 + s) = 1 − s + s^2 − s^3 + ···, |s| < 1,

and integrate term by term.

2. In §4, π was defined to be half the length of the unit circle S^1. Equivalently, π is the smallest positive number such that e^{πi} = −1. Show that

e^{πi/2} = i,   e^{πi/3} = 1/2 + (√3/2) i.

Hint. See Figure 5.4.

Figure 5.4

3. Show that

cos^2 t + sin^2 t = 1,

and

1 + tan^2 t = sec^2 t,

where

tan t = sin t / cos t,   sec t = 1/cos t.


4. Show that

(d/dt) tan t = sec^2 t = 1 + tan^2 t,
(d/dt) sec t = sec t tan t.

5. Evaluate

∫_0^y dx/(1 + x^2).

Hint. Set x = tan t.

6. Evaluate

∫_0^y dx/√(1 − x^2).

Hint. Set x = sin t.

7. Show that

π/6 = ∫_0^{1/2} dx/√(1 − x^2).

Use (4.27)–(4.31) to obtain a rapidly convergent infinite series for π.
Hint. Show that sin π/6 = 1/2. Use Exercise 2 and the identity e^{πi/6} = e^{πi/2}e^{−πi/3}. Note that a_k in (4.29)–(4.31) satisfies a_{k+1} = (k + 1/2)a_k. Deduce that

(5.45A) π = Σ_{k=0}^∞ b_k/(2k + 1),   b_0 = 3,   b_{k+1} = (1/4)·((2k + 1)/(2k + 2))·b_k.
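The recursion in (5.45A) is immediate to run (a Python sketch of ours); successive terms shrink by roughly a factor of 4, so a few dozen terms give full double precision:

```python
def pi_series(n_terms=60):
    # (5.45A): pi = sum of b_k/(2k+1), with b_0 = 3 and
    # b_{k+1} = (1/4)((2k+1)/(2k+2)) b_k
    total, b = 0.0, 3.0
    for k in range(n_terms):
        total += b / (2 * k + 1)
        b *= 0.25 * (2 * k + 1) / (2 * k + 2)
    return total

approx_pi = pi_series()
```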

8. Set

cosh t = (1/2)(e^t + e^{−t}),   sinh t = (1/2)(e^t − e^{−t}).

Show that

(d/dt) cosh t = sinh t,   (d/dt) sinh t = cosh t,

and

cosh^2 t − sinh^2 t = 1.

9. Evaluate

∫_0^y dx/√(1 + x^2).

Hint. Set x = sinh t.


10. Evaluate

∫_0^y √(1 + x^2) dx.

11. Using Exercise 4, verify that

(d/dt)(sec t + tan t) = sec t (sec t + tan t),
(d/dt)(sec t tan t) = sec^3 t + sec t tan^2 t = 2 sec^3 t − sec t.

12. Next verify that

(d/dt) log |sec t| = tan t,
(d/dt) log |sec t + tan t| = sec t.

13. Now verify that

∫ tan t dt = log |sec t|,   ∫ sec t dt = log |sec t + tan t|,
2 ∫ sec^3 t dt = sec t tan t + ∫ sec t dt.

(Here and below, we omit the arbitrary additive constants.)

14. Here is another approach to the evaluation of ∫ sec t dt. Using Exercise 8 and the chain rule, show that

(d/du) cosh^{−1} u = 1/√(u^2 − 1).

Take u = sec t and use Exercises 3–4 to get

(d/dt) cosh^{−1}(sec t) = (sec t tan t)/tan t = sec t,

hence

∫ sec t dt = cosh^{−1}(sec t).

Compare this with the analogue in Exercise 13.

15. Show that

E_n^a(t) = Σ_{k=0}^n (a^k/k!) t^k

satisfies

(d/dt) E_n^a(t) = a E_{n−1}^a(t).

From this, show that

(d/dt)(e^{−at}E_n^a(t)) = −(a^{n+1}/n!) t^n e^{−at}.

16. Use Exercise 15 and the fundamental theorem of calculus to show that

∫ t^n e^{−at} dt = −(n!/a^{n+1}) E_n^a(t) e^{−at}
    = −(n!/a^{n+1}) (1 + at + a^2t^2/2! + ··· + a^n t^n/n!) e^{−at}.

17. Take a = −i in Exercise 16 to produce formulas for

∫ t^n cos t dt and ∫ t^n sin t dt.

Exercises on x^r

In §1, we defined x^r for x > 0 and r ∈ Q. Now we define x^r for x > 0 and r ∈ R, as follows:

(5.46) x^r = e^{r log x}.

18. Show that if r = n ∈ N, (5.46) yields x^n = x ··· x (n factors).

19. Show that if r = 1/n, x^{1/n} defined by (5.46) satisfies

x = x^{1/n} ··· x^{1/n} (n factors),

and deduce that x^{1/n}, defined by (5.46), coincides with x^{1/n} as defined in §1.

20. Show that x^r, defined by (5.46), coincides with x^r as defined in §1, for all r ∈ Q.

21. Show that, for x > 0,

x^{r+s} = x^r x^s, ∀ r, s ∈ R.

22. Show that, given r ∈ R,

(d/dx) x^r = r x^{r−1}, ∀ x > 0.


23. Show that, given r, r_j ∈ R, x > 0,

r_j → r ⇒ x^{r_j} → x^r.

24. Given a > 0, compute

(d/dx) a^x, x ∈ R.

25. Compute

(d/dx) x^x, x > 0.

26. Prove that

x^{1/x} → 1, as x → ∞.

Hint. Show that (log x)/x → 0, as x → ∞.

27. Verify that

∫_0^1 x^x dx = ∫_0^1 e^{x log x} dx
    = ∫_0^∞ e^{−y e^{−y}} e^{−y} dy
    = Σ_{n=0}^∞ ∫_0^∞ ((−1)^n/n!) y^n e^{−(n+1)y} dy.

28. Show that, if α > 0, n ∈ N,

∫_0^∞ y^n e^{−αy} dy = (−1)^n F^(n)(α),

where

F(α) = ∫_0^∞ e^{−αy} dy = 1/α.

29. Using Exercises 27–28, show that

∫_0^1 x^x dx = Σ_{n=0}^∞ (−1)^n (n + 1)^{−(n+1)} = 1 − 1/2^2 + 1/3^3 − 1/4^4 + ···.
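Exercises 27–29 can be cross-checked numerically (a Python sketch of ours, comparing the series of Exercise 29 against a crude midpoint quadrature of the integral of x^x over (0, 1)):

```python
import math

def xx_series(n_terms=20):
    # Exercise 29: sum of (-1)^n (n+1)^-(n+1)
    return sum((-1) ** n * (n + 1.0) ** (-(n + 1)) for n in range(n_terms))

def xx_quadrature(n=200_000):
    # Midpoint rule for the integral of x^x = exp(x log x) over (0, 1)
    h = 1.0 / n
    return sum(h * math.exp(x * math.log(x))
               for x in ((j + 0.5) * h for j in range(n)))

series_val, quad_val = xx_series(), xx_quadrature()
```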


Some special series

30. Using (5.45), show that

Σ_{k=1}^∞ (−1)^{k−1}/k = log 2.

Hint. Using properties of alternating series, show that if t ∈ (0, 1),

Σ_{k=1}^N ((−1)^{k−1}/k) t^k = log(1 + t) + r_N(t),   |r_N(t)| ≤ t^{N+1}/(N + 1),

and let t ↗ 1.

31. Using the result of Exercise 5, show that

Σ_{k=0}^∞ (−1)^k/(2k + 1) = π/4.

Hint. Exercise 5 implies

tan^{−1} y = Σ_{k=0}^∞ ((−1)^k/(2k + 1)) y^{2k+1}, for −1 < y < 1.

Use an argument like that suggested for Exercise 30, taking y ↗ 1.

Alternative approach to exponentials and logs

An alternative approach is to define log : (0, ∞) → R first and derive some of its properties, and then define the exponential function Exp : R → (0, ∞) as its inverse. The following exercises describe how to implement this. To start, we take (5.27) as a definition:

(5.47) log x = ∫_1^x dy/y, x > 0.

32. Using (5.47), show that

(5.48) log(xy) = log x + log y, ∀ x, y > 0.


Also show

(5.49) log(1/x) = −log x, ∀ x > 0.

33. Show from (5.47) that

(5.50) (d/dx) log x = 1/x, x > 0.

34. Show that log x → +∞ as x → +∞.
(Hint. See the hint for Exercise 15 in §2.)
Then show that log x → −∞ as x → 0.

35. Deduce from Exercises 33 and 34, together with Theorem 1.3, that

log : (0, ∞) → R is one-to-one and onto,

with a differentiable inverse. We denote the inverse function by

Exp : R → (0, ∞), and also set e^t = Exp(t).

36. Deduce from Exercise 32 that

(5.51) e^{s+t} = e^s e^t, ∀ s, t ∈ R.

Note. (5.51) is a special case of (5.17).

37. Deduce from (5.50) and Theorem 1.3 that

(5.52) (d/dt) e^t = e^t, ∀ t ∈ R.

As a consequence,

(5.53) (d^n/dt^n) e^t = e^t, ∀ t ∈ R, n ∈ N.

38. Note that e^0 = 1, since log 1 = 0. Deduce from (5.53), together with the power series formulas (3.34) and (3.40), that, for all t ∈ R, n ∈ N,

(5.54) e^t = Σ_{k=0}^n (1/k!) t^k + R_n(t),


where

(5.55) R_n(t) = (t^{n+1}/(n + 1)!) e^{ζ_n},

for some ζ_n between 0 and t.

39. Deduce from Exercise 38 that

(5.56) e^t = Σ_{k=0}^∞ (1/k!) t^k, ∀ t ∈ R.

Remark. Exercises 35–39 develop e^t only for t ∈ R. At this point, it is natural to segue to (5.6) and from there to arguments involving (5.7)–(5.17), and then on to (5.29)–(5.41), renewing contact with the trigonometric functions.


6. Unbounded integrable functions

There are lots of unbounded functions we would like to be able to integrate. For example,consider f (x) = x−1/2 on (0, 1] (defined any way you like at x = 0). Since, for ε ∈ (0, 1),

(6.1) ∫_ε^1 x^{−1/2} dx = 2 − 2√ε,

this has a limit as ε ↘ 0, and it is natural to set

(6.2) ∫_0^1 x^{−1/2} dx = 2.
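Numerically (a sketch of ours, not the text's), midpoint Riemann sums for ∫_ε^1 x^{−1/2} dx track the closed form 2 − 2√ε in (6.1), making the limit 2 in (6.2) visible.

```python
def integral_inv_sqrt(eps, n=200_000):
    # Midpoint rule for ∫_ε^1 x^(-1/2) dx; by (6.1) this equals 2 - 2√ε
    h = (1.0 - eps) / n
    return sum(h / (eps + (k + 0.5) * h) ** 0.5 for k in range(n))

for eps in (0.1, 0.01, 0.001):
    assert abs(integral_inv_sqrt(eps) - (2 - 2 * eps ** 0.5)) < 1e-6
```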

Sometimes (6.2) is called an "improper integral," but we do not consider that to be a proper designation. Here, we define a class R#(I) of not necessarily bounded "integrable" functions on an interval I = [a, b], as follows.

First, assume f ≥ 0 on I, and for A ∈ (0, ∞), set

(6.3) f_A(x) = f(x), if f(x) ≤ A,
      A, if f(x) > A.

We say f ∈ R#(I) provided

(6.4) f_A ∈ R(I), ∀ A < ∞, and ∃ a uniform bound ∫_I f_A dx ≤ M.

If f ≥ 0 satisfies (6.4), then ∫_I f_A dx increases monotonically to a finite limit as A ↗ +∞, and we call the limit ∫_I f dx:

(6.5) ∫_I f_A dx ↗ ∫_I f dx, for f ∈ R#(I), f ≥ 0.

We also use the notation ∫_a^b f dx, if I = [a, b]. If I is understood, we might just write ∫ f dx. It is valuable to have the following.

Proposition 6.1. If f, g : I → R+ are in R#(I ), then f + g ∈ R#(I ), and

(6.6) ∫_I (f + g) dx = ∫_I f dx + ∫_I g dx.


Proof. To start, note that (f + g)_A ≤ f_A + g_A. In fact,

(6.7) (f + g)_A = (f_A + g_A)_A.

Hence (f + g)_A ∈ R(I) and ∫ (f + g)_A dx ≤ ∫ f_A dx + ∫ g_A dx ≤ ∫ f dx + ∫ g dx, so we have f + g ∈ R#(I) and

(6.8) ∫ (f + g) dx ≤ ∫ f dx + ∫ g dx.

On the other hand, if B > 2A, then (f + g)_B ≥ f_A + g_A, so

(6.9) ∫ (f + g) dx ≥ ∫ f_A dx + ∫ g_A dx,

for all A < ∞, and hence

(6.10) ∫ (f + g) dx ≥ ∫ f dx + ∫ g dx.

Together, (6.8) and (6.10) yield (6.6).

Next, we take f : I → R and set

(6.11) f = f⁺ − f⁻, f⁺(x) = f(x) if f(x) ≥ 0, 0 if f(x) < 0.

Then we say

(6.12) f ∈ R#(I) ⇐⇒ f⁺, f⁻ ∈ R#(I),

and set

(6.13) ∫_I f dx = ∫_I f⁺ dx − ∫_I f⁻ dx,

where the two terms on the right are defined as in (6.5). To extend the additivity, we begin as follows.

Proposition 6.2. Assume that g ∈ R#(I) and that g_j ≥ 0, g_j ∈ R#(I), and

(6.14) g = g_0 − g_1.

Then

(6.15) ∫ g dx = ∫ g_0 dx − ∫ g_1 dx.


Proof. Take g = g⁺ − g⁻ as in (6.11). Then (6.14) implies

(6.16) g⁺ + g_1 = g_0 + g⁻,

which by Proposition 6.1 yields

(6.17) ∫ g⁺ dx + ∫ g_1 dx = ∫ g_0 dx + ∫ g⁻ dx.

This implies

(6.18) ∫ g⁺ dx − ∫ g⁻ dx = ∫ g_0 dx − ∫ g_1 dx,

which yields (6.15).

We now extend additivity.

Proposition 6.3. Assume f_1, f_2 ∈ R#(I). Then f_1 + f_2 ∈ R#(I) and

(6.19) ∫_I (f_1 + f_2) dx = ∫_I f_1 dx + ∫_I f_2 dx.

Proof. If g = f_1 + f_2 = (f_1⁺ − f_1⁻) + (f_2⁺ − f_2⁻), then

(6.20) g = g_0 − g_1, g_0 = f_1⁺ + f_2⁺, g_1 = f_1⁻ + f_2⁻.

We have g_j ∈ R#(I), and then

(6.21) ∫ (f_1 + f_2) dx = ∫ g_0 dx − ∫ g_1 dx
   = ∫ (f_1⁺ + f_2⁺) dx − ∫ (f_1⁻ + f_2⁻) dx
   = ∫ f_1⁺ dx + ∫ f_2⁺ dx − ∫ f_1⁻ dx − ∫ f_2⁻ dx,

the first equality by Proposition 6.2, the second tautologically, and the third by Proposition 6.1. Since

(6.22) ∫ f_j dx = ∫ f_j⁺ dx − ∫ f_j⁻ dx,

this gives (6.19).

If f : I → C, we set f = f_1 + i f_2, f_j : I → R, and say f ∈ R#(I) if and only if f_1 and f_2 belong to R#(I). Then we set

(6.23) ∫ f dx = ∫ f_1 dx + i ∫ f_2 dx.


Similar comments apply to f : I → Rⁿ. Given f ∈ R#(I), we set

(6.24) ‖f‖_{L¹(I)} = ∫_I |f(x)| dx.

We have, for f, g ∈ R#(I), a ∈ C,

(6.25) ‖af‖_{L¹(I)} = |a| ‖f‖_{L¹(I)},

and

(6.26) ‖f + g‖_{L¹(I)} = ∫_I |f + g| dx ≤ ∫_I (|f| + |g|) dx = ‖f‖_{L¹(I)} + ‖g‖_{L¹(I)}.

Note that, if S ⊂ I,

(6.27) cont⁺(S) = 0 =⇒ ∫_I |χ_S| dx = 0,

where cont⁺(S) is defined by (2.21). Thus, to get a metric, we need to form equivalence classes. The set of equivalence classes [f] of elements of R#(I), where

(6.28) f ∼ f̃ ⇐⇒ ∫_I |f − f̃| dx = 0,

forms a metric space, with distance function

(6.29) D([f], [g]) = ‖f − g‖_{L¹(I)}.

However, this metric space is not complete. One needs the Lebesgue integral to obtain a complete metric space. One can see [Fol] or [T1].

We next show that each f ∈ R#(I) can be approximated in L¹ by a sequence of bounded, Riemann integrable functions.

Proposition 6.4. If f ∈ R#(I), then there exist f_k ∈ R(I) such that

(6.30) ‖f − f_k‖_{L¹(I)} −→ 0, as k → ∞.


Proof. If we separately approximate Re f and Im f by such sequences, then we approximate f, so it suffices to treat the case where f is real. Similarly, writing f = f⁺ − f⁻, we see that it suffices to treat the case where f ≥ 0 on I. For such f, simply take

(6.31) f_k = f_A, A = k,

with f_A as in (6.3). Then (6.5) implies

(6.32) ∫_I f_k dx ↗ ∫_I f dx,

and Proposition 6.3 gives

(6.33) ∫_I |f − f_k| dx = ∫_I (f − f_k) dx = ∫_I f dx − ∫_I f_k dx → 0, as k → ∞.

So far, we have dealt with integrable functions on a bounded interval. Now, we say f : R → R (or C, or Rⁿ) belongs to R#(R) provided f|_I ∈ R#(I) for each closed, bounded interval I ⊂ R and

(6.34) ∃ A < ∞ such that ∫_{−R}^R |f| dx ≤ A, ∀ R < ∞.

In such a case, we set

(6.35) ∫_{−∞}^∞ f dx = lim_{R→∞} ∫_{−R}^R f dx.

One can similarly define R#(R⁺).

Exercises

1. Let f : [0, 1] → R⁺ and assume f is continuous on (0, 1]. Show that

f ∈ R#([0, 1]) ⇐⇒ ∫_ε^1 f dx is bounded as ε ↘ 0.


In such a case, show that

∫_0^1 f dx = lim_{ε→0} ∫_ε^1 f dx.

2. Let a > 0. Define p_a : [0, 1] → R by p_a(x) = x^{−a} if 0 < x ≤ 1, and set p_a(0) = 0. Show that

p_a ∈ R#([0, 1]) ⇐⇒ a < 1.
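Since ∫_ε^1 x^{−a} dx = (1 − ε^{1−a})/(1 − a) for a ≠ 1, Exercise 1 reduces Exercise 2 to the behavior of this quantity as ε ↘ 0. A numerical sketch (our illustration, not part of the exercise):

```python
def tail_integral(a, eps):
    # ∫_ε^1 x^(-a) dx in closed form, for a ≠ 1
    return (1 - eps ** (1 - a)) / (1 - a)

# a = 1/2 < 1: bounded as ε ↘ 0 (limit 2), consistent with p_a ∈ R#([0,1])
assert abs(tail_integral(0.5, 1e-12) - 2.0) < 1e-5
# a = 2 > 1: blows up like 1/ε, consistent with p_a ∉ R#([0,1])
assert tail_integral(2.0, 1e-6) > 1e5
```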

3. Let b > 0. Define q_b : [0, 1/2] → R by

q_b(x) = 1/(x |log x|^b),

if 0 < x ≤ 1/2. Set q_b(0) = 0. Show that

q_b ∈ R#([0, 1/2]) ⇐⇒ b > 1.

4. Show that if a ∈ C and if f ∈ R#(I), then

af ∈ R#(I), and ∫ af dx = a ∫ f dx.

Hint. Check this for a > 0, a = −1, and a = i.

5. Show that

f ∈ R(I), g ∈ R#(I) =⇒ fg ∈ R#(I).

Hint. Use (2.53). First treat the case f, g ≥ 1, f ≤ M. Show that in such a case,

(fg)_A = (f_A g_A)_A, and (fg)_A ≤ M g_A.

6. Compute

∫_0^1 log t dt.

Hint. To compute ∫_ε^1 log t dt, first compute (d/dt)(t log t).

7. Given g ∈ R(I), show that there exist g_k ∈ PK(I) such that

‖g − g_k‖_{L¹(I)} −→ 0.


Given h ∈ PK(I), show that there exist h_k ∈ C(I) such that

‖h − h_k‖_{L¹(I)} −→ 0.

8. Using Exercise 7 and Proposition 6.4, prove the following: given f ∈ R#(I), there exist f_k ∈ C(I) such that

‖f − f_k‖_{L¹(I)} −→ 0.

9. Recall Exercise 4 of §2. If ϕ : [a, b] → [A, B] is C¹, with ϕ′(x) > 0 for all x ∈ [a, b], then

(6.36) ∫_A^B f(y) dy = ∫_a^b f(ϕ(t)) ϕ′(t) dt,

for each f ∈ C([A, B]), where A = ϕ(a), B = ϕ(b). Using Exercise 8, show that (6.36) holds for each f ∈ R#([A, B]).

10. If f ∈ R#(R), so (6.34) holds, prove that the limit exists in (6.35).

11. Given f(x) = x^{−1/2}(1 + x²)^{−1} for x > 0, show that f ∈ R#(R⁺). Show that

∫_0^∞ (1/(1 + x²)) (dx/√x) = 2 ∫_0^∞ dy/(1 + y⁴).

12. Let f_k ∈ R#([a, b]), f : [a, b] → R satisfy

(a) |f_k| ≤ g, ∀ k, for some g ∈ R#([a, b]),

(b) given ε > 0, ∃ contented S_ε ⊂ [a, b] such that ∫_{S_ε} g dx < ε, and f_k → f uniformly on [a, b] \ S_ε.

Show that f ∈ R#([a, b]) and

∫_a^b f_k(x) dx −→ ∫_a^b f(x) dx, as k → ∞.

13. Let g ∈ R#([a, b]) be ≥ 0. Show that for each ε > 0, there exists δ > 0 such that

S ⊂ [a, b] contented, cont S < δ =⇒ ∫_S g dx < ε.


Hint. With g_A defined as in (6.3), pick A such that ∫ g_A dx ≥ ∫ g dx − ε/2. Then pick δ < ε/2A.

14. Deduce from Exercises 12–13 the following. Let f_k ∈ R#([a, b]), f : [a, b] → R satisfy

(a) |f_k| ≤ g, ∀ k, for some g ∈ R#([a, b]),

(b) given δ > 0, ∃ contented S_δ ⊂ [a, b] such that cont S_δ < δ, and f_k → f uniformly on [a, b] \ S_δ.

Show that f ∈ R#([a, b]) and

∫_a^b f_k(x) dx −→ ∫_a^b f(x) dx, as k → ∞.

Remark. Compare Exercise 18 of §2. As mentioned there, the Lebesgue theory of integration has a stronger result, known as the Lebesgue dominated convergence theorem.


A. The fundamental theorem of algebra

The following result is the fundamental theorem of algebra.

Theorem A.1. If p(z) is a nonconstant polynomial (with complex coefficients), then p(z) must have a complex root.

Proof. We have, for some n ≥ 1, a_n ≠ 0,

(A.1) p(z) = a_n zⁿ + · · · + a_1 z + a_0 = a_n zⁿ (1 + O(z^{−1})), |z| → ∞,

which implies

(A.2) lim_{|z|→∞} |p(z)| = ∞.

Picking R ∈ (0, ∞) such that

(A.3) inf_{|z|≥R} |p(z)| > |p(0)|,

we deduce that

(A.4) inf_{|z|≤R} |p(z)| = inf_{z∈C} |p(z)|.

Since D_R = {z : |z| ≤ R} is compact and p is continuous, there exists z_0 ∈ D_R such that

(A.5) |p(z_0)| = inf_{z∈C} |p(z)|.

The theorem hence follows from:

Lemma A.2. If p(z) is a nonconstant polynomial and (A.5) holds, then p(z_0) = 0.

Proof. Suppose to the contrary that

(A.6) p(z_0) = a ≠ 0.

We can write

(A.7) p(z_0 + ζ) = a + q(ζ),

where q(ζ) is a (nonconstant) polynomial in ζ, satisfying q(0) = 0. Hence, for some k ≥ 1 and b ≠ 0, we have q(ζ) = bζ^k + · · · + b_n ζⁿ, i.e.,

(A.8) q(ζ) = bζ^k + O(ζ^{k+1}), ζ → 0,


so, uniformly on S¹ = {ω : |ω| = 1},

(A.9) p(z_0 + εω) = a + bω^k ε^k + O(ε^{k+1}), ε ↘ 0.

Pick ω ∈ S¹ such that

(A.10) (b/|b|) ω^k = −a/|a|,

which is possible since a ≠ 0 and b ≠ 0. In more detail, since −(a/|a|)(|b|/b) ∈ S¹, Euler's identity implies

−(a/|a|)(|b|/b) = e^{iθ},

for some θ ∈ R, so we can take ω = e^{iθ/k}.

Given (A.10),

(A.11) p(z_0 + εω) = a(1 − (|b|/|a|) ε^k) + O(ε^{k+1}),

which contradicts (A.5) for ε > 0 small enough. Thus (A.6) is impossible. This proves Lemma A.2, hence Theorem A.1.
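The descent step in Lemma A.2 can be watched numerically. In the sketch below (our example, taking p(z) = z² + 1 and z_0 = 0, so that a = 1, b = 1, k = 2 in (A.8)), the direction ω chosen by (A.10) strictly decreases |p|:

```python
import cmath

p = lambda z: z ** 2 + 1          # here q(ζ) = ζ², so a = 1, b = 1, k = 2
a, b, k = 1, 1, 2
theta = cmath.phase(-(a / abs(a)) * (abs(b) / b))   # -(a/|a|)(|b|/b) = e^{iθ}
omega = cmath.exp(1j * theta / k)                   # then (b/|b|)ω^k = -a/|a|, as in (A.10)
for eps in (0.5, 0.1, 0.01):
    # |p(z0 + εω)| < |p(z0)|, contradicting minimality in (A.5)
    assert abs(p(eps * omega)) < abs(p(0))
```

Here ω comes out as i, and |p(εi)| = 1 − ε² (up to rounding), visibly below |p(0)| = 1.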

Now that we have shown that p(z) in (A.1) must have one root, we can show it has n roots (counting multiplicity).

Proposition A.3. For a polynomial p(z) of degree n, as in (A.1), there exist r_1, . . . , r_n ∈ C such that

(A.12) p(z) = a_n(z − r_1) · · · (z − r_n).

Proof. We have shown that p(z) has one root; call it r_1. Dividing p(z) by z − r_1, we have

(A.13) p(z) = (z − r_1) p̃(z) + q,

where p̃(z) = a_n z^{n−1} + · · · + ã_0 and q is a polynomial of degree < 1, i.e., a constant. Setting z = r_1 in (A.13) yields q = 0, so

(A.14) p(z) = (z − r_1) p̃(z).

Since p̃(z) is a polynomial of degree n − 1, the result (A.12) follows by induction on n.

The numbers r_j, 1 ≤ j ≤ n, in (A.12) are called the roots of p(z). If k of them coincide (say with r_ℓ) we say r_ℓ is a root of multiplicity k. If r_ℓ is distinct from r_j for all j ≠ ℓ, we say r_ℓ is a simple root.


B. π² is irrational

The following proof that π² is irrational follows a classic argument of I. Niven, [Niv]. The idea is to consider

(B.1) I_n = ∫_0^π ϕ_n(x) sin x dx, ϕ_n(x) = (1/n!) xⁿ(π − x)ⁿ.

Clearly I_n > 0 for each n ∈ N, and I_n → 0 very fast, faster than geometrically:

(B.1A) 0 < I_n < (π/n!)(π/2)^{2n}.

The next key fact, to be established below, is that I_n is a polynomial of degree n in π² with integer coefficients:

(B.2) I_n = Σ_{k=0}^n c_{nk} π^{2k}, c_{nk} ∈ Z.

Given this, it follows readily that π² is irrational. In fact, if π² = a/b, a, b ∈ N, then multiplying (B.2) by bⁿ gives

(B.3) Σ_{k=0}^n c_{nk} a^k b^{n−k} = bⁿ I_n.

But the left side of (B.3) is an integer for each n, while by the estimate (B.1A), the right side belongs to the interval (0, 1) for large n, yielding a contradiction. It remains to establish (B.2).

A method of computing the integral in (B.1), which works for any polynomial ϕ_n(x), is the following. One looks for an antiderivative of the form

(B.4) G_n(x) sin x − F_n(x) cos x,

where F_n and G_n are polynomials. One needs

(B.5) G_n(x) = F_n′(x), G_n′(x) + F_n(x) = ϕ_n(x),

hence

(B.6) F_n″(x) + F_n(x) = ϕ_n(x).

One can exploit the nilpotence of ∂_x² on the space of polynomials of degree ≤ 2n and set

(B.7) F_n(x) = (I + ∂_x²)^{−1} ϕ_n(x) = Σ_{k=0}^n (−1)^k ϕ_n^{(2k)}(x).


Then

(B.8) (d/dx)[F_n′(x) sin x − F_n(x) cos x] = ϕ_n(x) sin x.

Integrating (B.8) over x ∈ [0, π] gives

(B.9) ∫_0^π ϕ_n(x) sin x dx = F_n(0) + F_n(π) = 2F_n(0),

the last identity holding for ϕ_n(x) as in (B.1) because then ϕ_n(π − x) = ϕ_n(x) and hence F_n(π − x) = F_n(x). For the first identity in (B.9), we use the defining property that sin π = 0 while cos π = −1.

In light of (B.7), to prove (B.2) it suffices to establish an analogous property for ϕ_n^{(2k)}(0). Comparing the binomial formula and Taylor's formula for ϕ_n(x):

(B.10) ϕ_n(x) = (1/n!) Σ_{ℓ=0}^n (−1)^ℓ \binom{n}{ℓ} π^{n−ℓ} x^{n+ℓ}, and ϕ_n(x) = Σ_{k=0}^{2n} (1/k!) ϕ_n^{(k)}(0) x^k,

we see that

(B.11) k = n + ℓ =⇒ ϕ_n^{(k)}(0) = (−1)^ℓ ((n + ℓ)!/n!) \binom{n}{ℓ} π^{n−ℓ},

so

(B.12) 2k = n + ℓ =⇒ ϕ_n^{(2k)}(0) = (−1)ⁿ ((n + ℓ)!/n!) \binom{n}{ℓ} π^{2(k−ℓ)}.

Of course ϕ_n^{(2k)}(0) = 0 for 2k < n. Clearly the multiple of π^{2(k−ℓ)} in (B.12) is an integer. In fact,

(B.13) ((n + ℓ)!/n!) \binom{n}{ℓ} = ((n + ℓ)!/(n! ℓ!)) · (n!/(n − ℓ)!) = \binom{n + ℓ}{n} n(n − 1) · · · (n − ℓ + 1).

Thus (B.2) is established, and the proof that π² is irrational is complete.
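For n = 1 everything can be made explicit (a numerical check of ours, not part of Niven's argument as presented): ϕ_1(x) = x(π − x), F_1(x) = ϕ_1(x) − ϕ_1″(x) = πx − x² + 2, so (B.9) gives I_1 = 2F_1(0) = 4, an integer, as (B.2) predicts.

```python
import math

def simpson(f, a, b, n=2000):
    # composite Simpson rule (n even)
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

I1 = simpson(lambda x: x * (math.pi - x) * math.sin(x), 0.0, math.pi)
assert abs(I1 - 4.0) < 1e-8                     # I_1 = 2·F_1(0) = 4
assert 0 < I1 < math.pi * (math.pi / 2) ** 2    # consistent with (B.1A) at n = 1
```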


C. More on (1 − x)^b

In §3 we showed that

(C.1) (1 − x)^b = Σ_{k=0}^∞ (a_k/k!) x^k,

for |x| < 1, with

(C.2) a_0 = 1, a_k = ∏_{ℓ=0}^{k−1} (−b + ℓ), for k ≥ 1.

There we required b ∈ Q, but in §5 we defined y^b, for y > 0, for all b ∈ R (and for y ≥ 0 if b > 0), and noted that such a result extends. Here, we prove a further result, when b > 0.

Proposition C.1. Given b > 0, a_k as in (C.2), the identity (C.1) holds for x ∈ [−1, 1], and the series converges absolutely and uniformly on [−1, 1].

Proof. Our main task is to show that

(C.3) Σ_{k=0}^∞ |a_k|/k! < ∞,

if b > 0. This implies that the right side of (C.1) converges absolutely and uniformly on [−1, 1] and its limit, g(x), is continuous on [−1, 1]. We already know that g(x) = (1 − x)^b on (−1, 1), and since both sides are continuous on [−1, 1], the identity also holds at the endpoints. Now, if k − 1 > b,

(C.4) a_k/k! = −(b/k) ∏_{1≤ℓ≤b} (1 − b/ℓ) ∏_{b<ℓ≤k−1} (1 − b/ℓ),

which we write as (B/k) p_k, where p_k denotes the last product in (C.4). Then

(C.5) log p_k = Σ_{b<ℓ≤k−1} log(1 − b/ℓ) ≤ − Σ_{b<ℓ≤k−1} b/ℓ ≤ −b log k + β,

for some β ∈ R. Here, we have used

(C.6) log(1 − r) < −r, for 0 < r < 1,


and

(C.7) Σ_{ℓ=1}^{k−1} 1/ℓ > ∫_1^k dy/y.

It follows from (C.5) that

(C.8) p_k ≤ e^{−b log k + β} = γ k^{−b},

so

(C.9) |a_k|/k! ≤ |Bγ| k^{−(1+b)},

giving (C.3).
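A numerical sketch of Proposition C.1 (ours; the coefficient recurrence follows directly from (C.2)): for b = 1/2, chosen as an illustration, the coefficients a_k/k! decay like k^{−3/2}, as (C.9) predicts, and the series sums to (1 − x)^b.

```python
b = 0.5
c = [1.0]                                   # c_k = a_k/k!, with a_k as in (C.2)
for k in range(1, 4000):
    c.append(c[-1] * (-b + (k - 1)) / k)    # a_k = a_{k-1}·(−b + (k−1))

x = 0.9
assert abs(sum(ck * x ** k for k, ck in enumerate(c)) - (1 - x) ** b) < 1e-8
# (C.9): |a_k|/k! ≤ const · k^{-(1+b)}
assert abs(c[2000]) < 5 * 2000 ** (-1.5)
```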

Exercise

1. Why did we not put this argument in §3?
Hint. Logs.


D. Archimedes’ approximation to π

Here we discuss an approximation to π proposed by Archimedes. It is based on the fact (known to the ancient Greeks) that the unit disk D = {(x, y) ∈ R² : x² + y² ≤ 1} has the property

(D.1) Area D = π.

We have not discussed area in this text. This topic is treated in the companion text [T2]. Actually, (D.1) was originally the definition of π. Here, we have taken (4.31A) as the definition. To get the equivalence, we appeal to notions from first-year calculus, giving areas of regions bounded by graphs in terms of integrals. We have

(D.2) Area D = 2 ∫_{−1}^1 √(1 − x²) dx = 2 ∫_{−π/2}^{π/2} cos² θ dθ = ∫_{−π/2}^{π/2} (cos 2θ + 1) dθ = π.

Here, the second identity follows from the substitution x = sin θ and the third from the identity

cos 2θ = cos² θ − sin² θ = 2 cos² θ − 1,

a consequence of (5.44), with s = t = θ. One can also get (D.1) by computing areas in polar coordinates (cf. [T2]).

Having (D.1), Archimedes proceeded as follows. If P_n is a regular n-gon inscribed in the unit circle, then Area P_n → π as n → ∞, with

(D.3) π − c/n² < Area P_n < π.

See (D.18)–(D.20) below for more on this. Note that such a polygon decomposes into n equal sized isosceles triangles, with two sides of length 1 meeting at an angle α_n = 2π/n. Such a triangle T_n has

(D.4) Area T_n = sin(α_n/2) cos(α_n/2) = (1/2) sin α_n,

so

(D.5) Area P_n = (n/2) sin(2π/n).


One can obtain an inductive formula for Area P_n for n = 2^k as follows. Set

(D.6) S_k = sin(2π/2^k), C_k = cos(2π/2^k).

Then, for example, S_2 = 1, C_2 = 0, and

(D.7) (C_{k+1} + i S_{k+1})² = C_k + i S_k,

i.e.,

(D.8) C_{k+1}² − S_{k+1}² = C_k, 2 C_{k+1} S_{k+1} = S_k.

We are in the position of solving

(D.9) x² − y² = a, 2xy = b,

for x and y, knowing that a ≥ 0, b, x, y > 0. We substitute y = b/2x into the first equation, obtaining

(D.10) x² − b²/4x² = a,

then set u = x² and get

(D.11) u² − au − b²/4 = 0,

whose positive solution is

(D.12) u = a/2 + (1/2)√(a² + b²).

Then

(D.13) x = √u, y = b/(2√u).

Taking a = C_k, b = S_k, and knowing that C_k² + S_k² = 1, we obtain

(D.14) S_{k+1} = S_k/(2√U_k),

with

(D.15) U_k = (1 + C_k)/2 = (1 + √(1 − S_k²))/2.


Then

(D.16) Area P_{2^k} = 2^{k−1} S_k.

Alternatively, with P_k = Area P_{2^k}, we have

(D.17) P_{k+1} = P_k/√U_k.

As we show below, π is approximated to 15 digits of accuracy in 25 iterations of (D.14)–(D.17), starting with S_2 = 1 and P_2 = 2.
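The iteration (D.14)–(D.17) is easy to run in double precision (a sketch of ours): starting from S_2 = 1, P_2 = 2, twenty-three steps produce P_25, which agrees with π to about fourteen digits, as claimed.

```python
import math

S, P = 1.0, 2.0              # S_2 = sin(π/2) = 1; P_2 = area of the inscribed square
for k in range(2, 25):       # produces S_25 and P_25
    U = (1.0 + math.sqrt(1.0 - S * S)) / 2.0   # (D.15)
    S = S / (2.0 * math.sqrt(U))               # (D.14)
    P = P / math.sqrt(U)                       # (D.17)

assert 0 < math.pi - P < 1e-12   # cf. (D.21)-(D.23)
```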

First, we take a closer look at the error estimate in (D.3). Note that

(D.18) π − Area P_n = (n/2)(2π/n − sin(2π/n)),

and that

(D.19) δ − sin δ = δ³/3! − δ⁵/5! + · · · < δ³/3!, for 0 < δ < 6,

so

(D.20) π − Area P_n < (2π³/3) · (1/n²), for n ≥ 2.

Thus we can take c = 2π³/3 in (D.3) for n ≥ 2, and this is asymptotically sharp. From (D.20) with n = 2²⁵, we have

(D.21) π − P_25 < (2π³/3) · 2^{−50}.

Since

(D.22) 2¹⁰ = 1024 ⇒ 2⁵⁰ ≈ 10¹⁵, and 2π³/3 ≈ 20,

we get

(D.23) π − P_25 ≈ 10^{−14}.

The Archimedes method often gets bad press because the error given in (D.20) decreases slowly with n. However, given that we take n = 2^k and iterate on k, the error actually decreases exponentially in k. Nevertheless, use of the infinite series suggested in Exercise 7 of §5 has advantages over the use of (D.14)–(D.17), particularly in that it does not require one to calculate a bunch of square roots.

There is another disadvantage of the iteration (D.14)–(D.17), though it does not show up in a mere 25 iterations (at least, not if one is using double precision arithmetic). Namely,


any error in the approximate calculation of P_k (compared to its exact value), due for example to truncation error, can get magnified in the approximate calculation of P_{k+ℓ} for ℓ ≥ 1. This will ultimately lead to an instability, and a breakdown in the viability of the iterative method (D.14)–(D.17).
We end this appendix by showing how the approximation to π described here can be justified without any notion of area. In fact, setting

(D.24) A_n = (n/2) sin(2π/n),

(cf. (D.5)), we get

(D.25) 0 < π − A_n < (2π³/3) · (1/n²), for n ≥ 2,

directly from (D.19); cf. (D.20). Thus, we can simply set P_k = A_{2^k}, and then the estimates (D.21)–(D.23) hold, and the iteration (D.14)–(D.17) works, without recourse to area.

In effect, the previous paragraph took the geometry out of Archimedes' approximation to π. Finally, we note the following variant, bringing in arc length (treated thoroughly in §4) in place of area. Namely, the perimeter Q_n of the regular n-gon P_n is a union of n line segments, each being the base of an isosceles triangle with two sides of length 1, meeting at an angle α_n = 2π/n. Hence each such line segment has length 2 sin(α_n/2), so

(D.26) ℓ(Q_n) = 2n sin(π/n).

The fact that

(D.27) ℓ(Q_n) −→ 2π, as n → ∞,

follows from the definition (4.31A) together with Proposition 4.1. Note that (D.26) implies

(D.28) ℓ(Q_n) = 2 A_{2n},

leading us back to Archimedes' approximation.

Note. Actually, Archimedes started with the regular hexagon and proceeded from there to evaluate P_k = Area P_{3·2^k}, for k up to 5. The basic iteration (D.7)–(D.15) also applies to this case. By (D.20),

0 < π − Area P_96 < 0.00225.

Archimedes' presentation was

3 + 10/71 < π < 3 + 1/7.


E. Computing π using arctangents

In Exercise 3 of §5, we defined tan t = sin t/cos t. It is readily verified (via Exercise 4 of §5) that

(E.1) tan : (−π/2, π/2) −→ R

is one-to-one and onto, with positive derivative, so it has a smooth inverse

(E.2) tan⁻¹ : R −→ (−π/2, π/2).

It follows from Exercise 5 of §5 that

(E.3) tan⁻¹ x = ∫_0^x ds/(1 + s²).

We can insert the power series for (1 + s²)⁻¹ and integrate term by term to get

(E.4) tan⁻¹ x = Σ_{k=0}^∞ ((−1)^k/(2k + 1)) x^{2k+1}, if −1 < x < 1.

This provides a way to obtain rapidly convergent series for π, alternative to that proposed in Exercise 7 of §5, which can be called an evaluation of π using the arcsine.

For a first effort, we use

(E.5) tan(π/6) = 1/√3,

which follows from

(E.6) sin(π/6) = 1/2, cos(π/6) = √3/2 ⇐⇒ e^{πi/6} = √3/2 + (1/2)i,

compare Exercises 2 and 7 of §5. Now (E.4)–(E.5) yield

(E.7) π/6 = (1/√3) Σ_{k=0}^∞ ((−1)^k/(2k + 1)) (1/3)^k.

We can compare (E.7) with the series (5.45A) for π. One difference is the occurrence of the factor 1/√3, which is irrational. To be sure, it is not hard to compute √3 to high precision. Compare Exercises 8–10 of §3; for a faster method, see the treatment of Newton's method in §5 of Chapter 5. Nevertheless, the presence of this irrational factor in (E.7) is a bit


of a glitch. Another disadvantage of (E.7) is that this series converges more slowly than (5.45A).

We can do better by expressing π as a finite linear combination of terms tan⁻¹ x_j for certain fairly small rational numbers x_j. The key to this is the following formula for tan(a + b). Using (5.44), we have

(E.8) tan(a + b) = sin(a + b)/cos(a + b) = (sin a cos b + cos a sin b)/(cos a cos b − sin a sin b) = (tan a + tan b)/(1 − tan a tan b).

Since tan(π/4) = 1, we have, for a, b, a + b ∈ (−π/2, π/2),

(E.9) π/4 = a + b ⇐= (tan a + tan b)/(1 − tan a tan b) = 1.

Taking a = tan⁻¹ x, b = tan⁻¹ y gives

(E.10) π/4 = tan⁻¹ x + tan⁻¹ y ⇐= x + y = 1 − xy ⇐= x = (1 − y)/(1 + y).

If we set y = 1/2, we get x = 1/3, so

(E.11) π/4 = tan⁻¹(1/3) + tan⁻¹(1/2).

The power series (E.4) for tan⁻¹(1/3) and tan⁻¹(1/2) both converge faster than (E.7), but that for tan⁻¹(1/2) converges at essentially the same rate as (5.45A). We might optimize by taking x = y in (E.10), but that yields x = y = √2 − 1, and we do not want to plug this irrational number into (E.4). Taking a cue from √2 − 1 ≈ 0.414, we set y = 2/5, which yields x = 3/7, so

(E.12) π/4 = tan⁻¹(2/5) + tan⁻¹(3/7).

Both resulting power series converge faster than (5.45A), but not by much.

To do better, we bring in a formula for tan(a + 2b). Note that setting a = b in (E.8) yields

(E.13) tan 2b = 2 tan b/(1 − tan² b),

and concatenating this with (E.8) (with b replaced by 2b) yields, after some elementary calculation,

(E.14) tan(a + 2b) = (tan a (1 − tan² b) + 2 tan b)/(1 − tan² b − 2 tan a tan b).


Thus, parallel to (E.9),

(E.15) π/4 = a + 2b ⇐= (tan a (1 − tan² b) + 2 tan b)/(1 − tan² b − 2 tan a tan b) = 1.

Taking a = tan⁻¹ x, b = tan⁻¹ y gives

(E.16) π/4 = tan⁻¹ x + 2 tan⁻¹ y ⇐= x(1 − y²) + 2y = 1 − y² − 2xy ⇐= x = (1 − y² − 2y)/(1 − y² + 2y).

Taking y = 1/3 yields x = 1/7, so

(E.17) π/4 = tan⁻¹(1/7) + 2 tan⁻¹(1/3).

Both resulting power series converge significantly faster than (5.45A). Alternatively, we can take y = 1/4, yielding x = 7/23, so

(E.18) π/4 = tan⁻¹(7/23) + 2 tan⁻¹(1/4).

The power series for tan⁻¹(7/23) converges a little faster than that for tan⁻¹(1/3).
One can go still farther, iterating (E.13) to produce a formula for tan 4b, and concatenating this with (E.8) to produce a formula for

(E.19) tan(a + 4b).

An argument somewhat parallel to that involving (E.15)–(E.16) yields identities of the form

(E.20) π/4 = tan⁻¹ x + 4 tan⁻¹ y,

including the following, known as Machin's formula:

(E.21) π/4 = 4 tan⁻¹(1/5) − tan⁻¹(1/239),

with y = 1/5, x = −1/239. For many years, this was the most popular formula for high precision approximations to π, until the 1970s, when a more sophisticated method (actually discovered by Gauss in 1799) became available. For more on this, the reader can consult Chapter 7 of [AH].
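Machin's formula (E.21) combined with the power series (E.4) already yields π to full double precision with a handful of terms (a sketch of ours; `arctan_series` is our name for the partial sums).

```python
import math

def arctan_series(x, terms=40):
    # Partial sum of (E.4): Σ (−1)^k x^{2k+1}/(2k+1)
    return sum((-1) ** k * x ** (2 * k + 1) / (2 * k + 1) for k in range(terms))

# (E.21): π/4 = 4·tan⁻¹(1/5) − tan⁻¹(1/239)
pi_machin = 4 * (4 * arctan_series(1 / 5) - arctan_series(1 / 239))
assert abs(pi_machin - math.pi) < 1e-13
```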

Returning to the arctangent function, we record a series that converges much faster than (E.4), for such values of x as occur in (E.11), (E.12), (E.17), (E.18), and (E.21). The following is due to Euler.


Proposition E.1. For x ∈ R,

(E.22) x tan⁻¹ x = ϕ(x²/(1 + x²)),

with

(E.23) ϕ(z) = z(1 + (2/3)z + ((2·4)/(3·5))z² + ((2·4·6)/(3·5·7))z³ + · · ·).

The power series (E.23) has the same radius of convergence as (E.4). The advantage of (E.22)–(E.23) over (E.4) lies in the fact that x²/(1 + x²) is substantially smaller than x, for the values of x that appear in our various formulas for π.

To start the proof of Proposition E.1, note that

(E.24) z = x²/(1 + x²) ⇐⇒ x² = z/(1 − z).

Hence, by (E.4),

(E.25) x tan⁻¹ x = Σ_{k=1}^∞ ((−1)^{k−1}/(2k − 1)) x^{2k} = Σ_{k=1}^∞ ((−1)^{k−1}/(2k − 1)) z^k (1 − z)^{−k}.

Now

(E.26) (1 − z)⁻¹ = Σ_{n=0}^∞ zⁿ,

and differentiating repeatedly gives

(E.27) (1 − z)^{−k} = Σ_{n=0}^∞ \binom{k + n − 1}{n} zⁿ,

for |z| < 1. Thus, with z = x²/(1 + x²), we have (E.22) with

(E.28) ϕ(z) = Σ_{k=1}^∞ Σ_{n=0}^∞ ((−1)^{k−1}/(2k − 1)) \binom{k + n − 1}{n} z^{n+k}
   = Σ_{ℓ=1}^∞ [ Σ_{n=0}^{ℓ−1} ((−1)^{ℓ−n−1}/(2ℓ − 2n − 1)) \binom{ℓ − 1}{n} ] z^ℓ.


Hence

(E.29) ϕ(z) = Σ_{ℓ=1}^∞ [ Σ_{m=0}^{ℓ−1} ((−1)^m/(2m + 1)) \binom{ℓ − 1}{m} ] z^ℓ.

To get (E.23), it remains to show that

(E.30) ϕ_ℓ = Σ_{m=0}^{ℓ−1} ((−1)^m/(2m + 1)) \binom{ℓ − 1}{m} =⇒ ϕ_ℓ = (2 · 4 · · · 2(ℓ − 1))/(3 · 5 · · · (2ℓ − 1)), ℓ ≥ 2,

while

(E.31) ϕ_1 = 1.

In fact, (E.31) is routine, so it suffices to establish that

(E.32) ϕ_{ℓ+1} = (2ℓ/(2ℓ + 1)) ϕ_ℓ.

To see this, note that the binomial formula gives

(E.33) (1 − s²)^{ℓ−1} = Σ_{m=0}^{ℓ−1} (−1)^m \binom{ℓ − 1}{m} s^{2m},

and integrating over s ∈ [0, 1] gives

(E.34) ϕ_ℓ = ∫_0^1 (1 − s²)^{ℓ−1} ds.

To get the recurrence relation (E.32), we start with

(E.35) (d/ds)(1 − s²)^{ℓ+1} = −2(ℓ + 1) s (1 − s²)^ℓ,
   (d²/ds²)(1 − s²)^{ℓ+1} = −2(ℓ + 1)(1 − s²)^ℓ + 4ℓ(ℓ + 1) s² (1 − s²)^{ℓ−1}.

Integrating the last identity over s ∈ [0, 1] gives

(E.36) 2ℓ ∫_0^1 (1 − s²)^{ℓ−1} s² ds = ∫_0^1 (1 − s²)^ℓ ds.

Hence

(E.37) 2ℓ(−ϕ_{ℓ+1} + ϕ_ℓ) = ϕ_{ℓ+1},

which gives (E.32). This finishes the proof of Proposition E.1.
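The coefficients ϕ_ℓ of (E.23) are generated by the recurrence (E.32), which makes Euler's accelerated series easy to evaluate (a sketch of ours; `phi` is our name for ϕ).

```python
import math

def phi(z, terms=60):
    # ϕ(z) = Σ ϕ_ℓ z^ℓ, with ϕ_1 = 1 and ϕ_{ℓ+1} = (2ℓ/(2ℓ+1)) ϕ_ℓ, as in (E.32)
    total, coef, zpow = 0.0, 1.0, z
    for ell in range(1, terms + 1):
        total += coef * zpow
        coef *= 2 * ell / (2 * ell + 1)
        zpow *= z
    return total

# (E.22): x·tan⁻¹x = ϕ(x²/(1+x²)); at x = 1/3 the argument is z = 1/10
x = 1.0 / 3.0
assert abs(phi(x * x / (1 + x * x)) - x * math.atan(x)) < 1e-14
```

Note how small z = 1/10 is compared with x = 1/3; this is exactly the acceleration the proposition promises.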


Chapter V

Further Topics in Analysis

Introduction

In this final chapter we apply results of Chapters 3 and 4 to a selection of topics in analysis. One underlying theme here is the approximation of a function by a sequence of "simpler" functions.

In §1 we define the convolution of functions on R,

f ∗ u(x) = ∫_{−∞}^∞ f(y) u(x − y) dy,

and give conditions on a sequence (f_n) guaranteeing that f_n ∗ u → u as n → ∞. In §2 we treat the Weierstrass approximation theorem, which states that each continuous function on a closed, bounded interval [a, b] is a uniform limit of a sequence of polynomials. We give two proofs, one using convolutions and one using the uniform convergence on [−1, 1] of the power series of (1 − x)^b, whenever b > 0, established in Appendix C of Chapter 4. (Here, we take b = 1/2.) Section 3 treats a far reaching generalization, known as the Stone-Weierstrass theorem. A special case, of use in §4, is that each continuous function on T¹ is a uniform limit of a sequence of finite linear combinations of the exponentials e^{ikθ}, k ∈ Z.

Section 4 introduces Fourier series,

f(θ) = Σ_{k=−∞}^∞ a_k e^{ikθ}.

A central question is when this holds with

a_k = (1/2π) ∫_{−π}^π f(θ) e^{−ikθ} dθ.

This is the Fourier inversion problem, and we examine several aspects of this. Fourier analysis is a major area in modern analysis, and it is hoped that the material treated here will provide a useful stimulus for further study.

For further material on Fourier analysis, one can look at Chapter 13 of [T3], dealing with Fourier series on a similar level as here, but with a different perspective, followed by Chapters 14–15 of [T3], on the Fourier transform and Laplace transform. Progressively more advanced treatments of Fourier analysis can be found in [Fol], Chapter 8, and [T4], Chapter 3.

Section 5 treats the use of Newton’s method to solve

f(ξ) = y


for ξ in an interval [a, b], given that f(a) − y and f(b) − y have opposite signs and that

|f″(x)| ≤ A, |f′(x)| ≥ B > 0, ∀ x ∈ [a, b].

It is seen that if an initial guess x_0 is close enough to ξ, then Newton's method produces a sequence (x_k) satisfying

|x_k − ξ| ≤ C β^{2^k}, for some β ∈ (0, 1).

It is extremely useful to have such a rapid approximation of the solution ξ .


1. Convolutions and bump functions

If u is bounded and continuous on R and f is integrable (say f ∈ R(R)), we define the convolution f ∗ u by

(1.1) f ∗ u(x) = ∫_{−∞}^∞ f(y) u(x − y) dy.

Clearly

(1.2) ∫ |f| dx = A, |u| ≤ M on R =⇒ |f ∗ u| ≤ AM on R.

Also, a change of variable gives

(1.3) f ∗ u(x) = ∫_{−∞}^∞ f(x − y) u(y) dy.

We want to analyze the convolution action of a sequence of integrable functions f_n on R that satisfy the following conditions:

(1.4) f_n ≥ 0, ∫ f_n dx = 1, ∫_{R\I_n} f_n dx = ε_n → 0,

where

(1.5) I_n = [−δ_n, δ_n], δ_n → 0.

Let u ∈ C(R) be supported on a bounded interval [−A, A], or more generally, assume

(1.6) u ∈ C(R), |u| ≤ M on R,

and u is uniformly continuous on R, so with δ_n as in (1.5),

(1.7) |x − x′| ≤ 2δ_n =⇒ |u(x) − u(x′)| ≤ ε̃_n → 0.

We aim to prove the following.

Proposition 1.1. If f_n ∈ R(R) satisfy (1.4)–(1.5) and if u ∈ C(R) is bounded and uniformly continuous (satisfying (1.6)–(1.7)), then

(1.8) u_n = f_n ∗ u −→ u, uniformly on R, as n → ∞.


Proof. To start, write

(1.9) u_n(x) = ∫ f_n(y) u(x − y) dy
   = ∫_{I_n} f_n(y) u(x − y) dy + ∫_{R\I_n} f_n(y) u(x − y) dy
   = v_n(x) + r_n(x).

Clearly

(1.10) |r_n(x)| ≤ M ε_n, ∀ x ∈ R.

Next,

(1.11) v_n(x) − u(x) = ∫_{I_n} f_n(y)[u(x − y) − u(x)] dy − ε_n u(x),

so

(1.12) |v_n(x) − u(x)| ≤ ε̃_n + M ε_n, ∀ x ∈ R,

hence

(1.13) |u_n(x) − u(x)| ≤ ε̃_n + 2M ε_n, ∀ x ∈ R,

yielding (1.8).

Here is a sequence of functions (f_n) satisfying (1.4)–(1.5). First, set

(1.14) g_n(x) = (1/A_n)(x² − 1)ⁿ, A_n = ∫_{−1}^1 (x² − 1)ⁿ dx,

and then set

(1.15) f_n(x) = g_n(x), |x| ≤ 1,
       0, |x| ≥ 1.

It is readily verified that such (f_n) satisfy (1.4)–(1.5). We will use this sequence in Proposition 1.1 for one proof of the Weierstrass approximation theorem, in the next section.
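That verification can also be done numerically (our sketch; `simpson` and `bump` are our names): ∫ f_n dx = 1, and the mass of f_n outside a fixed interval, here [−0.3, 0.3], shrinks as n grows, which is the content of (1.4)–(1.5).

```python
def simpson(f, a, b, n=2000):
    # composite Simpson rule (n even)
    h = (b - a) / n
    s = f(a) + f(b) + sum((4 if k % 2 else 2) * f(a + k * h) for k in range(1, n))
    return s * h / 3

def bump(n):
    # f_n from (1.14)-(1.15): (x²−1)^n / A_n on [−1,1], zero outside
    A = simpson(lambda x: (x * x - 1) ** n, -1.0, 1.0)
    return lambda x: (x * x - 1) ** n / A if abs(x) <= 1 else 0.0

masses = []
for n in (5, 20, 80):
    f = bump(n)
    assert abs(simpson(f, -1.0, 1.0) - 1.0) < 1e-6      # ∫ f_n dx = 1
    masses.append(simpson(f, 0.3, 1.0) + simpson(f, -1.0, -0.3))
assert masses[0] > masses[1] > masses[2] and masses[2] < 1e-3   # ε_n → 0
```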

The functions f_n defined by (1.14)–(1.15) have the property

(1.16) f_n ∈ C^{n−1}(R).

Furthermore, they have compact support, i.e., vanish outside some compact set. We say

(1.17) f ∈ C^k_0(R),

provided f ∈ C^k(R) and f has compact support. The following result is useful.


Proposition 1.2. If f ∈ C^k_0(R) and u ∈ R(R), then f ∗ u ∈ C^k(R), and (provided k ≥ 1)

(1.18) (d/dx)(f ∗ u)(x) = f′ ∗ u(x).

Proof. We start with the case k = 0, and show that

f ∈ C^0_0(R), u ∈ R(R) =⇒ f ∗ u ∈ C(R).

In fact, by (1.3),

|f ∗ u(x + h) − f ∗ u(x)| = |∫_{−∞}^{∞} [f(x + h − y) − f(x − y)] u(y) dy|
   ≤ sup_x |f(x + h) − f(x)| ∫_{−∞}^{∞} |u(y)| dy,

which clearly tends to 0 as h → 0, since f, being continuous and compactly supported, is uniformly continuous.

From here, it suffices to treat the case k = 1, since if f ∈ C^k_0(R), then f′ ∈ C^{k−1}_0(R), and one can use induction on k. Using (1.3), we have

(1.19) [f ∗ u(x + h) − f ∗ u(x)]/h = ∫_{−∞}^{∞} g_h(x − y) u(y) dy,

where

(1.20) g_h(x) = (1/h)[f(x + h) − f(x)].

We claim that

(1.21) f ∈ C^1_0(R) =⇒ g_h → f′ uniformly on R, as h → 0.

Given this,

(1.22) |∫_{−∞}^{∞} g_h(x − y) u(y) dy − ∫_{−∞}^{∞} f′(x − y) u(y) dy|
   ≤ sup_x |g_h(x) − f′(x)| ∫_{−∞}^{∞} |u(y)| dy,

which yields (1.18).

It remains to prove (1.21). Indeed, the fundamental theorem of calculus implies

(1.23) g_h(x) = (1/h) ∫_x^{x+h} f′(y) dy,


if h > 0, so

(1.24) |g_h(x) − f′(x)| ≤ sup_{x≤y≤x+h} |f′(y) − f′(x)|,

if h > 0, with a similar estimate if h < 0. This yields (1.21).
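Proposition 1.2 can also be checked numerically. In the sketch below (an illustration, not the text's argument), f is a C¹ function supported in [−1, 1], u is the merely Riemann integrable unit step, and a centered difference quotient of f ∗ u is compared with f′ ∗ u; with this u, both convolutions reduce to integrals over [−1, min(x, 1)].

```python
import math

def f(x):   # a C^1 function supported in [-1, 1]
    return math.cos(math.pi * x / 2) ** 2 if abs(x) <= 1 else 0.0

def fp(x):  # its derivative f'
    return -(math.pi / 2) * math.sin(math.pi * x) if abs(x) <= 1 else 0.0

def conv_with_step(g, x, m=4000):
    # (g * u)(x) with u = 1 on (0, infty), 0 elsewhere:
    # this is the integral of g over [-1, min(x, 1)], by the midpoint rule
    b = min(x, 1.0)
    h = (b + 1.0) / m
    return sum(g(-1 + (j + 0.5) * h) for j in range(m)) * h

x0, eps = 0.3, 1e-4
deriv_fd = (conv_with_step(f, x0 + eps) - conv_with_step(f, x0 - eps)) / (2 * eps)
deriv_conv = conv_with_step(fp, x0)   # f' * u at x0, the right side of (1.18)
```

The two quantities agree to high accuracy, as (1.18) predicts.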

We say

(1.25) f ∈ C^∞(R) provided f ∈ C^k(R) for all k,

and similarly f ∈ C^∞_0(R) provided f ∈ C^k_0(R), for all k. It is useful to have some examples of functions in C^∞_0(R). We start with the following. Set

(1.26) G(x) = e^{−1/x²} if x > 0,
       G(x) = 0 if x ≤ 0.

Lemma 1.3. G ∈ C ∞(R).

Proof. Clearly G ∈ C^k for all k on (0, ∞) and on (−∞, 0). We need to check its behavior at 0. The fact that G is continuous at 0 follows from

(1.27) e^{−y²} −→ 0, as y → ∞.

Note that

(1.28) G′(x) = (2/x³) e^{−1/x²} if x > 0,
       G′(x) = 0 if x < 0.

Also,

(1.29) G′(0) = lim_{h→0} G(h)/h = 0,

as a consequence of

(1.30) y e^{−y²} −→ 0, as y → ∞.

Clearly G′ is continuous on (0, ∞) and on (−∞, 0). The continuity of G′ at 0 is a consequence of

(1.31) y³ e^{−y²} −→ 0, as y → ∞.

The existence and continuity of higher derivatives of G follows a similar pattern, making use of

(1.32) y^k e^{−y²} −→ 0, as y → ∞,

for each k ∈ N. We leave the details to the reader.


Corollary 1.4. Set

(1.33) g(x) = G(x)G(1 − x).

Then g ∈ C^∞_0(R). In fact, g(x) ≠ 0 if and only if 0 < x < 1.
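For a concrete look at this bump function, the following sketch evaluates g = G(·)G(1 − ·) in Python; note how flat g is at the endpoints (the values underflow to 0 in double precision already at x = 10⁻³).

```python
import math

def G(x):
    # G(x) = exp(-1/x^2) for x > 0, and 0 for x <= 0, as in (1.26)
    return math.exp(-1.0 / (x * x)) if x > 0 else 0.0

def g(x):
    # the bump of Corollary 1.4: nonzero exactly on (0, 1)
    return G(x) * G(1 - x)

samples = [g(0.0), g(0.25), g(0.5), g(0.75), g(1.0)]
```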

Exercises

1. Let f ∈ R(R) satisfy

(1.34) f ≥ 0, ∫ f dx = 1,

and set

(1.35) f_n(x) = n f(nx), n ∈ N.

Show that Proposition 1.1 applies to the sequence f_n.

2. Take

(1.36) f(x) = (1/A) e^{−x²}, A = ∫_{−∞}^{∞} e^{−x²} dx.

Show that Exercise 1 applies to this case.

Note. In [T2] it is shown that A = √ π in (1.36).

3. Modify the proof of Lemma 1.3 to show that, if

G_1(x) = e^{−1/x} if x > 0,
G_1(x) = 0 if x ≤ 0,

then G_1 ∈ C^∞(R).

4. Establish whether each of the following functions is in C ∞(R).

(a) ϕ(x) = G(x) sin(1/x) if x ≠ 0,
    ϕ(x) = 0 if x = 0.

(b) ψ(x) = G_1(x) sin(1/x) if x ≠ 0,
    ψ(x) = 0 if x = 0.

Here G(x) is as in (1.26) and G1(x) is as in Exercise 3.


2. The Weierstrass approximation theorem

The following result of Weierstrass is a very useful tool in analysis.

Theorem 2.1. Given a compact interval I , any continuous function f on I is a uniform limit of polynomials.

Otherwise stated, our goal is to prove that the space C(I) of continuous (real valued) functions on I is equal to P(I), the uniform closure in C(I) of the space of polynomials.

We will give two proofs of this theorem. Our starting point for the first proof will be the result that the power series for (1 − x)^a converges uniformly on [−1, 1], for any a > 0. This was established in Chapter 4, Appendix C, and we will use it, with a = 1/2.

From the identity x^{1/2} = (1 − (1 − x))^{1/2}, we have x^{1/2} ∈ P([0, 2]). More to the point, from the identity

(2.1) |x| = (1 − (1 − x²))^{1/2},

we have |x| ∈ P([−√2, √2]). Using |x| = b^{−1}|bx|, for any b > 0, we see that |x| ∈ P(I) for any interval I = [−c, c], and also for any closed subinterval, hence for any compact interval I. By translation, we have

(2.2) |x − a| ∈ P (I )

for any compact interval I . Using the identities

(2.3) max(x, y) = (1/2)(x + y) + (1/2)|x − y|, min(x, y) = (1/2)(x + y) − (1/2)|x − y|,

we see that for any a ∈ R and any compact I ,

(2.4) max(x, a), min(x, a) ∈ P (I ).

We next note that P (I ) is an algebra of functions, i.e.,

(2.5) f, g ∈ P(I), c ∈ R =⇒ f + g, fg, cf ∈ P(I).

Using this, one sees that, given f ∈ P(I), with range in a compact interval J, one has h ∘ f ∈ P(I) for all h ∈ P(J). Hence f ∈ P(I) ⇒ |f| ∈ P(I), and, via (2.3), we deduce that

(2.6) f, g ∈ P (I ) =⇒ max(f, g), min(f, g) ∈ P (I ).

Suppose now that I′ = [a′, b′] is a subinterval of I = [a, b]. With the notation x₊ = max(x, 0), we have

(2.7) f_{I′I}(x) = min((x − a′)₊, (b′ − x)₊) ∈ P(I).


This is a piecewise linear function, equal to zero off I′, with slope 1 from a′ to the midpoint m of I′, and slope −1 from m to b′.

Now if I is divided into N equal subintervals, any continuous function on I that is linear on each such subinterval can be written as a linear combination of such “tent functions,” so it belongs to P(I). Finally, any f ∈ C(I) can be uniformly approximated by such piecewise linear functions, so we have f ∈ P(I), proving the theorem.
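The analytic heart of this proof — that |x| is a uniform limit of polynomials — can be made concrete. The sketch below (an illustration with arbitrary truncation degrees) forms partial sums of the binomial series for (1 − t)^{1/2} and substitutes t = 1 − x², as in (2.1); the uniform error on [−1, 1] shrinks as the degree grows.

```python
def abs_approx(m):
    # q_m(x) = S_m(1 - x^2), where S_m is the m-th partial sum of the binomial
    # series (1 - t)^{1/2} = sum_k d_k t^k, with d_0 = 1, d_k = d_{k-1}(k - 3/2)/k
    d = [1.0]
    for k in range(1, m + 1):
        d.append(d[-1] * (k - 1.5) / k)
    return lambda x: sum(dk * (1 - x * x) ** k for k, dk in enumerate(d))

grid = [j / 100 - 1 for j in range(201)]
err = lambda m: max(abs(abs_approx(m)(x) - abs(x)) for x in grid)
```

The worst error occurs at x = 0 and decays slowly (on the order of m^{−1/2}), which reflects the delicacy of the uniform convergence on the closed interval.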

For the second proof, we bring in the sequence of functions f_n defined by (1.14)–(1.15), i.e., first set

(2.8) g_n(x) = (1/A_n)(x² − 1)^n, A_n = ∫_{−1}^{1} (x² − 1)^n dx,

and then set

(2.9) f_n(x) = g_n(x) for |x| ≤ 1,
      f_n(x) = 0 for |x| ≥ 1.

It is readily verified that such (f_n) satisfy (1.4)–(1.5). We will use this sequence in Proposition 1.1 to prove that if I ⊂ R is a closed, bounded interval, and f ∈ C(I), then there exist polynomials p_n(x) such that

(2.10) p_n −→ f, uniformly on I.

To start, we note that by an affine change of variable, there is no loss of generality in assuming that

(2.11) I = [−1/4, 1/4].

Next, given I as in (2.11) and f ∈ C (I ), it is easy to extend f to a function

(2.12) u ∈ C(R), u(x) = 0 for |x| ≥ 1/2.

Now, with f_n as in (2.8)–(2.9), we can apply Proposition 1.1 to deduce that

(2.13) u_n(x) = ∫ f_n(y) u(x − y) dy =⇒ u_n → u uniformly on R.

Now

(2.14) |x| ≤ 1/2 =⇒ u(x − y) = 0 for |y| > 1
              =⇒ u_n(x) = ∫ g_n(y) u(x − y) dy,


that is,

(2.15) |x| ≤ 1/2 =⇒ u_n(x) = p_n(x),

where

(2.16) p_n(x) = ∫ g_n(y) u(x − y) dy = ∫ g_n(x − y) u(y) dy.

The last identity makes it clear that each p_n(x) is a polynomial in x. Since (2.13) and (2.15) imply

(2.17) p_n −→ u uniformly on [−1/2, 1/2],

we have (2.10).
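The second proof can also be run numerically end to end. In the sketch below (all parameters are illustrative), f(x) = eˣ on I = [−1/4, 1/4] is extended to a continuous u vanishing off [−1/2, 1/2], and u_n = f_n ∗ u is evaluated by quadrature; by (2.15), on I these values are the values of the polynomial p_n.

```python
import math

def make_f_n(n, m=20000):
    # the kernels (2.8)-(2.9), with A_n computed by a midpoint rule
    h = 2.0 / m
    A = abs(sum(((-1 + (j + 0.5) * h) ** 2 - 1) ** n for j in range(m)) * h)
    return lambda x: abs((x * x - 1) ** n) / A if abs(x) <= 1 else 0.0

def u(x):
    # continuous extension of f(x) = e^x on [-1/4, 1/4], vanishing for |x| >= 1/2
    ax = abs(x)
    if ax <= 0.25:
        return math.exp(x)
    if ax <= 0.5:
        return math.exp(math.copysign(0.25, x)) * (0.5 - ax) / 0.25
    return 0.0

def sup_err(n, m=2000):
    # sup over a grid in I of |u_n(x) - e^x|, with u_n = f_n * u
    fn = make_f_n(n)
    h = 2.0 / m
    def un(x):
        return sum(fn(-1 + (j + 0.5) * h) * u(x - (-1 + (j + 0.5) * h))
                   for j in range(m)) * h
    pts = [j / 20 - 0.25 for j in range(11)]
    return max(abs(un(x) - math.exp(x)) for x in pts)
```

The error decreases as n grows, in accordance with (2.10), though the convergence of this particular kernel sequence is slow.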

Exercises

1. As in Exercises 1–2 of §1, take

f(x) = (1/A) e^{−x²}, A = ∫_{−∞}^{∞} e^{−x²} dx,

f_n(x) = n f(nx).

Let u ∈ C (R) vanish outside [−1, 1]. Let ε > 0 and take n ∈ N such that

sup_x |f_n ∗ u(x) − u(x)| < ε.

Approximate f_n by a sufficient partial sum of the power series

f_n(x) = (n/A) Σ_{k=0}^{∞} (1/k!) (−x²)^k n^{2k},

and use this to obtain a third proof of Theorem 2.1.

Remark. A fourth proof of Theorem 2.1 is indicated in Exercise 8 of §4.

2. Let f be continuous on [−1, 1]. If f is odd, show that it is a uniform limit of finite linear combinations of x, x³, x⁵, …, x^{2k+1}, …. If f is even, show it is a uniform limit of finite linear combinations of 1, x², x⁴, …, x^{2k}, ….


3. If g is continuous on [−π/2, π/2], show that g is a uniform limit of finite linear combinations of

sin x, sin² x, sin³ x, …, sin^k x, ….

Hint. Write g(x) = f (sin x) with f continuous on [−1, 1].

4. If g is continuous on [−π, π] and even, show that g is a uniform limit of finite linear combinations of

1, cos x, cos² x, …, cos^k x, ….

Hint. cos : [0, π] → [−1, 1] is a homeomorphism.

5. Assume h : R → C is continuous, periodic of period 2π, and odd , so

(2.18) h(x + 2π) = h(x), h(−x) = −h(x), ∀ x ∈ R.

Show that h is a uniform limit of finite linear combinations of

sin x, sin x cos x, sin x cos² x, …, sin x cos^k x, ….

Hint. Given ε > 0, find δ > 0 and continuous h_ε, satisfying (2.18), such that

sup_x |h(x) − h_ε(x)| < ε, h_ε(x) = 0 if |x − jπ| < δ, j ∈ Z.

Then apply Exercise 4 to g(x) = h_ε(x)/sin x.


3. The Stone-Weierstrass theorem

A far-reaching extension of the Weierstrass approximation theorem, due to M. Stone, is the following result, known as the Stone-Weierstrass theorem.

Theorem 3.1. Let X be a compact metric space, A a subalgebra of C_R(X), the algebra of real valued continuous functions on X. Suppose 1 ∈ A and that A separates points of X, i.e., for distinct p, q ∈ X, there exists h_pq ∈ A with h_pq(p) ≠ h_pq(q). Then the closure Ā is equal to C_R(X).

We present the proof in eight steps.

Step 1. Let f ∈ Ā and assume ϕ : R → R is continuous. If sup |f| ≤ A, we can apply the Weierstrass approximation theorem to get polynomials p_k → ϕ uniformly on [−A, A]. Then p_k ∘ f → ϕ ∘ f uniformly on X, so ϕ ∘ f ∈ Ā.

Step 2. Consequently, if f_j ∈ Ā, then

(3.1) max(f₁, f₂) = (1/2)|f₁ − f₂| + (1/2)(f₁ + f₂) ∈ Ā,

and similarly min(f₁, f₂) ∈ Ā.

Step 3. It follows from the hypotheses that if p, q ∈ X and p ≠ q, then there exists f_pq ∈ A, equal to 1 at p and to 0 at q.

Step 4. Apply an appropriate continuous ϕ : R → R to get g_pq = ϕ ∘ f_pq ∈ Ā, equal to 1 on a neighborhood of p and to 0 on a neighborhood of q, and satisfying 0 ≤ g_pq ≤ 1 on X.

Step 5. Fix p ∈ X and let U be an open neighborhood of p. By Step 4, given q ∈ X \ U, there exists g_pq ∈ Ā such that g_pq = 1 on a neighborhood O_q of p, equal to 0 on a neighborhood Ω_q of q, satisfying 0 ≤ g_pq ≤ 1 on X.

Now {Ω_q : q ∈ X \ U} is an open cover of X \ U, so there exists a finite subcover Ω_{q₁}, …, Ω_{q_N}. Let

(3.2) g_pU = min_{1≤j≤N} g_{pq_j} ∈ Ā.

Then g_pU = 1 on O = ∩_{j=1}^{N} O_{q_j}, an open neighborhood of p, g_pU = 0 on X \ U, and 0 ≤ g_pU ≤ 1 on X.

Step 6. Take K ⊂ U ⊂ X, K closed, U open. By Step 5, for each p ∈ K, there exists g_pU ∈ Ā, equal to 1 on a neighborhood O_p of p, and equal to 0 on X \ U.


Now {O_p : p ∈ K} covers K, so there exists a finite subcover O_{p₁}, …, O_{p_M}. Let

(3.3) g_KU = max_{1≤j≤M} g_{p_j U} ∈ Ā.

We have

(3.4) g_KU = 1 on K, 0 on X \ U, and 0 ≤ g_KU ≤ 1 on X.

Step 7. Take f ∈ C_R(X) such that 0 ≤ f ≤ 1 on X. Fix k ∈ N and set

(3.5) K_ℓ = {x ∈ X : f(x) ≥ ℓ/k},

so X = K₀ ⊃ ⋯ ⊃ K_ℓ ⊃ K_{ℓ+1} ⊃ ⋯ ⊃ K_k ⊃ K_{k+1} = ∅. Define open U_ℓ ⊃ K_ℓ by

(3.6) U_ℓ = {x ∈ X : f(x) > (ℓ − 1)/k}, so X \ U_ℓ = {x ∈ X : f(x) ≤ (ℓ − 1)/k}.

By Step 6, there exist ψ_ℓ ∈ Ā such that

(3.7) ψ_ℓ = 1 on K_ℓ, ψ_ℓ = 0 on X \ U_ℓ, and 0 ≤ ψ_ℓ ≤ 1 on X.

Let

(3.8) f_k = max_{0≤ℓ≤k} (ℓ/k) ψ_ℓ ∈ Ā.

It follows that f_k ≥ ℓ/k on K_ℓ and f_k ≤ (ℓ − 1)/k on X \ U_ℓ, for all ℓ. Hence f_k ≥ (ℓ − 1)/k on K_{ℓ−1} and f_k ≤ ℓ/k on X \ U_{ℓ+1}. In other words,

(3.9) (ℓ − 1)/k ≤ f(x) ≤ ℓ/k =⇒ (ℓ − 1)/k ≤ f_k(x) ≤ ℓ/k,

so

(3.10) |f(x) − f_k(x)| ≤ 1/k, ∀ x ∈ X.

Step 8. It follows from Step 7 that if f ∈ C_R(X) and 0 ≤ f ≤ 1 on X, then f ∈ Ā. It is an easy final step to see that f ∈ C_R(X) ⇒ f ∈ Ā.
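The staircase construction of Step 7 is easy to visualize numerically. In the sketch below (illustrative, with X = [0, 1] and f(x) = x²), each ψ_ℓ is realized as φ_ℓ ∘ f with φ_ℓ(t) = min(1, max(0, kt − (ℓ − 1))), a continuous function applied to f — hence, by Step 1, a member of Ā; the resulting f_k satisfies (3.10).

```python
def staircase(f, k):
    # f_k = max over 0 <= l <= k of (l/k) psi_l, as in (3.8), where
    # psi_l = 1 where f >= l/k and psi_l = 0 where f <= (l-1)/k
    def psi(l, x):
        return min(1.0, max(0.0, k * f(x) - (l - 1)))
    return lambda x: max((l / k) * psi(l, x) for l in range(k + 1))

f = lambda x: x * x          # continuous, with 0 <= f <= 1 on [0, 1]
grid = [j / 500 for j in range(501)]

def sup_gap(k):
    fk = staircase(f, k)
    return max(abs(f(x) - fk(x)) for x in grid)
```

One checks that sup_gap(k) ≤ 1/k, as (3.10) asserts.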

Theorem 3.1 has a complex analogue.


Theorem 3.2. Let X be a compact metric space, A a subalgebra (over C) of C (X ), the algebra of complex valued continuous functions on X . Suppose 1 ∈ A and that A separates the points of X . Furthermore, assume

(3.11) f ∈ A =⇒ \overline{f} ∈ A.

Then the closure Ā = C(X).

Proof. Set A_R = {f + \overline{f} : f ∈ A}. One sees that Theorem 3.1 applies to A_R.

Here are a couple of applications of Theorems 3.1–3.2.

Corollary 3.3. If X is a compact subset of Rn, then every f ∈ C (X ) is a uniform limit of polynomials on Rn.

Corollary 3.4. The space of trigonometric polynomials, given by

(3.12) Σ_{k=−N}^{N} a_k e^{ikθ},

is dense in C(S¹).

Exercises

1. Prove Corollary 3.3.

2. Prove Corollary 3.4, using Theorem 3.2.
Hint. e^{ikθ} e^{iℓθ} = e^{i(k+ℓ)θ}, and \overline{e^{ikθ}} = e^{−ikθ}.

3. Use the results of Exercises 4–5 in §2 to provide another proof of Corollary 3.4.
Hint. Use cos^k θ = ((e^{iθ} + e^{−iθ})/2)^k, etc.

4. Let X be a compact metric space, and K ⊂ X a compact subset. Show that A = {f|_K : f ∈ C(X)} is dense in C(K).

5. In the setting of Exercise 4, take f ∈ C(K), ε > 0. Show that there exists g₁ ∈ C(X) such that

sup_K |g₁ − f| ≤ ε, and sup_X |g₁| ≤ sup_K |f|.

6. Iterate the result of Exercise 5 to get g_k ∈ C(X) such that

sup_K |g_k − (f − g₁ − ⋯ − g_{k−1})| ≤ 2^{−k}, sup_X |g_k| ≤ 2^{−(k−1)}.

7. Use the results of Exercises 4–6 to show that, if f ∈ C(K), then there exists g ∈ C(X) such that g|_K = f.


4. Fourier series

We work on T¹ = R/(2πZ), which under θ → e^{iθ} is equivalent to S¹ = {z ∈ C : |z| = 1}. Given f ∈ C(T¹), or more generally f ∈ R(T¹) (or still more generally, if f ∈ R^#(T¹), defined as in §6 of Chapter 4), we set, for k ∈ Z,

(4.1) f̂(k) = (1/2π) ∫_0^{2π} f(θ) e^{−ikθ} dθ.

We call f̂(k) the Fourier coefficients of f. We say

(4.2) f ∈ A(T¹) ⟺ Σ_{k=−∞}^{∞} |f̂(k)| < ∞.

We aim to prove the following.

Proposition 4.1. Given f ∈ C (T1), if f ∈ A(T1), then

(4.3) f(θ) = Σ_{k=−∞}^{∞} f̂(k) e^{ikθ}.

Proof. Given Σ |f̂(k)| < ∞, the right side of (4.3) is absolutely and uniformly convergent, defining

(4.4) g(θ) = Σ_{k=−∞}^{∞} f̂(k) e^{ikθ}, g ∈ C(T¹),

and our task is to show that f ≡ g. Making use of the identities

(4.5) (1/2π) ∫_0^{2π} e^{iℓθ} dθ = 0 if ℓ ≠ 0, 1 if ℓ = 0,

we get ĝ(k) = f̂(k), for all k ∈ Z. Let us set u = f − g. We have

(4.6) u ∈ C(T¹), û(k) = 0, ∀ k ∈ Z.

It remains to show that this implies u ≡ 0. To prove this, we use Corollary 3.4, which implies that, for each v ∈ C(T¹), there exist trigonometric polynomials, i.e., finite linear combinations v_N of {e^{ikθ} : k ∈ Z}, such that

(4.7) v_N −→ v uniformly on T¹.


Now (4.6) implies

∫_{T¹} u(θ) \overline{v_N(θ)} dθ = 0, ∀ N,

and passing to the limit, using (4.7), gives

(4.8) ∫_{T¹} u(θ) \overline{v(θ)} dθ = 0, ∀ v ∈ C(T¹).

Taking v = u gives

(4.9) ∫_{T¹} |u(θ)|² dθ = 0,

forcing u ≡ 0, and completing the proof.

We seek conditions on f that imply (4.2). Integration by parts for f ∈ C¹(T¹) gives, for k ≠ 0,

(4.10) f̂(k) = (1/2π) ∫_0^{2π} f(θ) (i/k) ∂_θ(e^{−ikθ}) dθ
            = (1/(2πik)) ∫_0^{2π} f′(θ) e^{−ikθ} dθ,

hence

(4.11) |f̂(k)| ≤ (1/(2π|k|)) ∫_0^{2π} |f′(θ)| dθ.

If f ∈ C 2(T1), we can integrate by parts a second time, and get

(4.12) f̂(k) = −(1/(2πk²)) ∫_0^{2π} f″(θ) e^{−ikθ} dθ,

hence

|f̂(k)| ≤ (1/(2πk²)) ∫_0^{2π} |f″(θ)| dθ.

In concert with

(4.13) |f̂(k)| ≤ (1/2π) ∫_0^{2π} |f(θ)| dθ,

which follows from (4.1), we have

(4.14) |f̂(k)| ≤ (1/(2π(k² + 1))) ∫_0^{2π} (|f(θ)| + |f″(θ)|) dθ.


Hence

(4.15) f ∈ C²(T¹) =⇒ Σ |f̂(k)| < ∞.

We will sharpen this implication below. We start with an interesting example. Consider

(4.16) f(θ) = |θ|, −π ≤ θ ≤ π,

and extend this to be periodic of period 2π, yielding f ∈ C (T1). We have

(4.17) f̂(k) = (1/2π) ∫_{−π}^{π} |θ| e^{−ikθ} dθ
            = −[1 − (−1)^k] (1/(πk²)),

for k ≠ 0, while f̂(0) = π/2. This is clearly a summable series, so f ∈ A(T¹), and Proposition 4.1 implies that, for −π ≤ θ ≤ π,

(4.18) |θ| = π/2 − Σ_{k odd} (2/(πk²)) e^{ikθ}
           = π/2 − (4/π) Σ_{ℓ=0}^{∞} (1/(2ℓ + 1)²) cos(2ℓ + 1)θ.

Now, evaluating this at θ = 0 yields the identity

(4.19) Σ_{ℓ=0}^{∞} 1/(2ℓ + 1)² = π²/8.

Using this, we can evaluate

(4.20) S = Σ_{k=1}^{∞} 1/k²,

as follows. We have

(4.21) Σ_{k=1}^{∞} 1/k² = Σ_{k≥1 odd} 1/k² + Σ_{k≥2 even} 1/k²
                        = π²/8 + (1/4) Σ_{ℓ=1}^{∞} 1/ℓ²,

hence S − S/4 = π2/8, so

(4.22) Σ_{k=1}^{∞} 1/k² = π²/6.
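Both the coefficient formula (4.17) and the resulting identities (4.19) and (4.22) are easy to confirm numerically; the sketch below uses a midpoint-rule quadrature for f̂(1) and partial sums of the series (the truncation points are arbitrary choices).

```python
import math, cmath

# fhat(1) for f(theta) = |theta|, by quadrature; (4.17) predicts -2/pi
m = 20000
h = 2 * math.pi / m
fhat1 = sum(abs(-math.pi + (j + 0.5) * h) * cmath.exp(-1j * (-math.pi + (j + 0.5) * h))
            for j in range(m)) * h / (2 * math.pi)

# partial sums of (4.19) and (4.22)
odd_sum = sum(1.0 / (2 * l + 1) ** 2 for l in range(10 ** 5))       # near pi^2 / 8
full_sum = sum(1.0 / k ** 2 for k in range(1, 2 * 10 ** 5 + 1))     # near pi^2 / 6
```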

We see from (4.17) that if f is given by (4.16), then f̂(k) satisfies

(4.23) |f̂(k)| ≤ C/(k² + 1).

This is a special case of the following generalization of (4.15).


Proposition 4.2. Let f be Lipschitz continuous and piecewise C 2 on T1. Then (4.23)holds.

Proof. Here we are assuming f is C² on T¹ \ {p₁, …, p_ℓ}, and f′ and f″ have limits at each of the endpoints of the associated intervals in T¹, but f is not assumed to be differentiable at the endpoints p_ℓ. We can write f as a sum of functions f_ν, each of which is Lipschitz on T¹, C² on T¹ \ {p_ν}, and such that f′_ν and f″_ν have limits as one approaches p_ν from either side. It suffices to show that each f̂_ν(k) satisfies (4.23). Now g(θ) = f_ν(θ + p_ν − π) is singular only at θ = π, and ĝ(k) = f̂_ν(k) e^{ik(p_ν−π)}, so it suffices to prove Proposition 4.2 when f has a singularity only at θ = π. In other words, f ∈ C²([−π, π]), and f(−π) = f(π).

In this case, we still have (4.10), since the endpoint contributions from integration by parts still cancel. A second integration by parts gives, in place of (4.12),

(4.24) f̂(k) = (1/(2πik)) ∫_{−π}^{π} f′(θ) (i/k) ∂_θ(e^{−ikθ}) dθ
            = −(1/(2πk²)) ( ∫_{−π}^{π} f″(θ) e^{−ikθ} dθ + (−1)^k [f′(−π) − f′(π)] ),

which yields (4.23).

We next make use of (4.5) to produce results on ∫_{T¹} |f(θ)|² dθ, starting with the following.

Proposition 4.3. Given f ∈ A(T1),

(4.25) Σ_k |f̂(k)|² = (1/2π) ∫_{T¹} |f(θ)|² dθ.

More generally, if also g ∈ A(T1),

(4.26) Σ_k f̂(k) \overline{ĝ(k)} = (1/2π) ∫_{T¹} f(θ) \overline{g(θ)} dθ.

Proof. Switching order of summation and integration and using (4.5), we have

(4.27) (1/2π) ∫_{T¹} f(θ) \overline{g(θ)} dθ = (1/2π) ∫_{T¹} Σ_{j,k} f̂(j) \overline{ĝ(k)} e^{i(j−k)θ} dθ
                                            = Σ_k f̂(k) \overline{ĝ(k)},

giving (4.26). Taking g = f gives (4.25).

We will extend the scope of Proposition 4.3 below. Closely tied to this is the issue of convergence of S_N f to f as N → ∞, where

(4.28) S_N f(θ) = Σ_{|k|≤N} f̂(k) e^{ikθ}.


Clearly f ∈ A(T¹) ⇒ S_N f → f uniformly on T¹ as N → ∞. Here, we are interested in convergence in L²-norm, where

(4.29) ‖f‖²_{L²} = (1/2π) ∫_{T¹} |f(θ)|² dθ.

Given f ∈ R(T¹), this defines a “norm,” satisfying the following result, called the triangle inequality:

(4.30) ‖f + g‖_{L²} ≤ ‖f‖_{L²} + ‖g‖_{L²}.

See Appendix A for details on this. Behind these results is the fact that

(4.31) ‖f‖²_{L²} = (f, f)_{L²},

where, when f and g belong to R(T1), we set

(4.32) (f, g)_{L²} = (1/2π) ∫_{T¹} f(θ) \overline{g(θ)} dθ.

Thus the content of (4.25) is that

(4.33) Σ_k |f̂(k)|² = ‖f‖²_{L²},

and that of (4.26) is that

(4.34) Σ_k f̂(k) \overline{ĝ(k)} = (f, g)_{L²}.

The left side of (4.33) is the square norm of the sequence (f̂(k)) in ℓ². Generally, a sequence (a_k), k ∈ Z, belongs to ℓ² if and only if

(4.35) ‖(a_k)‖²_{ℓ²} = Σ_k |a_k|² < ∞.

There is an associated inner product

(4.36) ((a_k), (b_k))_{ℓ²} = Σ_k a_k \overline{b_k}.

As in (4.30), one has (see Appendix A)

(4.37) ‖(a_k) + (b_k)‖_{ℓ²} ≤ ‖(a_k)‖_{ℓ²} + ‖(b_k)‖_{ℓ²}.

As for the notion of L2-norm convergence, we say

(4.38) f_ν → f in L² ⟺ ‖f − f_ν‖_{L²} → 0.


There is a similar notion of convergence in ℓ². Clearly

(4.39) ‖f − f_ν‖_{L²} ≤ sup_θ |f(θ) − f_ν(θ)|.

In view of the uniform convergence S N f → f for f ∈ A(T1) noted above, we have

(4.40) f ∈ A(T¹) =⇒ S_N f → f in L², as N → ∞.

The triangle inequality implies

(4.41) | ‖f‖_{L²} − ‖S_N f‖_{L²} | ≤ ‖f − S_N f‖_{L²},

and clearly (by Proposition 4.3)

(4.42) ‖S_N f‖²_{L²} = Σ_{k=−N}^{N} |f̂(k)|²,

so

(4.43) ‖f − S_N f‖_{L²} → 0 as N → ∞ =⇒ ‖f‖²_{L²} = Σ_k |f̂(k)|².

We now consider more general functions f ∈ R(T¹). With f̂(k) and S_N f defined by (4.1) and (4.28), we define R_N f by

(4.44) f = S_N f + R_N f.

Note that ∫_{T¹} f(θ) e^{−ikθ} dθ = ∫_{T¹} S_N f(θ) e^{−ikθ} dθ for |k| ≤ N, hence

(4.45) (f, S_N f)_{L²} = (S_N f, S_N f)_{L²},

and hence

(4.46) (S_N f, R_N f)_{L²} = 0.

Consequently,

(4.47) ‖f‖²_{L²} = (S_N f + R_N f, S_N f + R_N f)_{L²}
                = ‖S_N f‖²_{L²} + ‖R_N f‖²_{L²}.

In particular,

(4.48) ‖S_N f‖_{L²} ≤ ‖f‖_{L²}.

We are now in a position to prove the following.


Lemma 4.4. Let f, f_ν belong to R(T¹). Assume

(4.49) lim_{ν→∞} ‖f − f_ν‖_{L²} = 0,

and, for each ν ,

(4.50) lim_{N→∞} ‖f_ν − S_N f_ν‖_{L²} = 0.

Then

(4.51) lim_{N→∞} ‖f − S_N f‖_{L²} = 0.

Proof. Writing f − S_N f = (f − f_ν) + (f_ν − S_N f_ν) + S_N(f_ν − f), and using the triangle inequality, we have, for each ν,

(4.52) ‖f − S_N f‖_{L²} ≤ ‖f − f_ν‖_{L²} + ‖f_ν − S_N f_ν‖_{L²} + ‖S_N(f_ν − f)‖_{L²}.

Taking N → ∞ and using (4.48), we have

(4.53) limsup_{N→∞} ‖f − S_N f‖_{L²} ≤ 2 ‖f − f_ν‖_{L²},

for each ν . Then (4.49) yields the desired conclusion (4.51).

Given f ∈ C(T¹), we have trigonometric polynomials f_ν → f uniformly on T¹, and clearly (4.50) holds for each such f_ν. Thus Lemma 4.4 yields the following.

(4.54) f ∈ C(T¹) =⇒ S_N f → f in L², and Σ_k |f̂(k)|² = ‖f‖²_{L²}.

Lemma 4.4 also applies to many discontinuous functions. Consider, for example

(4.55) f(θ) = 0 for −π < θ < 0,
       f(θ) = 1 for 0 < θ < π.

We can set, for ν ∈ N,

(4.56) f_ν(θ) = 0 for −π < θ < 0,
       f_ν(θ) = νθ for 0 ≤ θ ≤ 1/ν,
       f_ν(θ) = 1 for 1/ν ≤ θ < π.

Then each f_ν ∈ C(T¹). (In fact, f_ν ∈ A(T¹), by Proposition 4.2.) Also, one can check that ‖f − f_ν‖²_{L²} ≤ 1/ν. Thus the conclusion in (4.54) holds for f given by (4.55).


More generally, any piecewise continuous function on T¹ is an L² limit of continuous functions, so the conclusion of (4.54) holds for them. To go further, let us consider the class of Riemann integrable functions. A function f : T¹ → R is Riemann integrable provided f is bounded (say |f| ≤ M) and, for each δ > 0, there exist piecewise constant functions g_δ and h_δ on T¹ such that

(4.57) g_δ ≤ f ≤ h_δ, and ∫_{T¹} (h_δ(θ) − g_δ(θ)) dθ < δ.

Then

(4.58) ∫_{T¹} f(θ) dθ = lim_{δ→0} ∫_{T¹} g_δ(θ) dθ = lim_{δ→0} ∫_{T¹} h_δ(θ) dθ.

Note that we can assume |h_δ|, |g_δ| < M + 1, and so

(4.59) (1/2π) ∫_{T¹} |f(θ) − g_δ(θ)|² dθ ≤ ((M + 1)/π) ∫_{T¹} |h_δ(θ) − g_δ(θ)| dθ
                                        < ((M + 1)/π) δ,

so g_δ → f in L²-norm. A function f : T¹ → C is Riemann integrable provided its real and imaginary parts are. In such a case, there are also piecewise constant functions f_ν → f in L²-norm, giving the following.

Proposition 4.5. We have

(4.60) f ∈ R(T¹) =⇒ S_N f → f in L², and Σ_k |f̂(k)|² = ‖f‖²_{L²}.
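Parseval's identity (4.60) can be spot-checked for the function f(θ) = |θ| of (4.16), whose coefficients are given by (4.17): the left side is |f̂(0)|² plus twice the sum over positive odd k, and the right side is (1/2π) ∫_{−π}^{π} θ² dθ = π²/3. The truncation point below is an arbitrary choice.

```python
import math

# sum over k of |fhat(k)|^2 for f(theta) = |theta|, using (4.17):
# fhat(0) = pi/2, fhat(k) = -2/(pi k^2) for odd k, 0 for even k != 0
lhs = (math.pi / 2) ** 2 + 2 * sum((2 / (math.pi * k ** 2)) ** 2
                                   for k in range(1, 200001, 2))
rhs = math.pi ** 2 / 3   # (1/2pi) * integral of theta^2 over [-pi, pi]
```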

This is not the end of the story. Lemma 4.4 extends to unbounded functions on T¹ that are square integrable, such as

(4.61) f(θ) = |θ|^{−α} on [−π, π], 0 < α < 1/2.

In such a case, one can take f_ν(θ) = min(f(θ), ν), ν ∈ N. Then each f_ν is continuous and ‖f − f_ν‖_{L²} → 0 as ν → ∞. The conclusion of (4.60) holds for such f. We can fit (4.61) into the following general setting. If f : T¹ → C, we say

f ∈ R²(T¹) ⟺ f, |f|² ∈ R^#(T¹),

where R^# is defined in §6 of Chapter 4. Though we will not pursue the details, Lemma 4.4 extends to f, f_ν ∈ R²(T¹), and then (4.60) holds for f ∈ R²(T¹).


The ultimate theory of functions for which the result

(4.62) S_N f −→ f in L²-norm

holds was produced by H. Lebesgue in what is now known as the theory of Lebesgue measure and integration. There is the notion of measurability of a function f : T¹ → C. One says f ∈ L²(T¹) provided f is measurable and ∫_{T¹} |f(θ)|² dθ < ∞, the integral here being the Lebesgue integral. Actually, L²(T¹) consists of equivalence classes of such functions, where f₁ ∼ f₂ if and only if ∫ |f₁(θ) − f₂(θ)|² dθ = 0. With ℓ² as in (4.35), it is then the case that

(4.63) F : L²(T¹) −→ ℓ²,

given by

(4.64) (F f)(k) = f̂(k),

is one-to-one and onto, with

(4.65) Σ_k |f̂(k)|² = ‖f‖²_{L²}, ∀ f ∈ L²(T¹),

and

(4.66) S_N f −→ f in L², ∀ f ∈ L²(T¹).

We refer to books on the subject (e.g., [T2]) for information on Lebesgue integration. We mention two key propositions which, together with the arguments given above, establish these results. The fact that F f ∈ ℓ² for all f ∈ L²(T¹) and that (4.65)–(4.66) hold follows via Lemma 4.4 from the following.

Proposition A. Given f ∈ L2(T1), there exist f ν ∈ C (T1) such that f ν → f in L2.

As for the surjectivity of F in (4.63), note that, given (a_k) ∈ ℓ², the sequence

f_ν(θ) = Σ_{|k|≤ν} a_k e^{ikθ}

satisfies, for µ > ν ,

f µ − f ν 2L2 =

ν<|k|≤µ

|ak|2 → 0 as ν → ∞.

That is to say, (f ν ) is a Cauchy sequence in L2(T1). Surjectivity follows from the fact thatCauchy sequences in L2(T1) always converge to a limit:


Proposition B. If (f ν ) is a Cauchy sequence in L2(T1), there exists f ∈ L2(T1) such that f ν → f in L2-norm.

Proofs of Propositions A and B can be found in the standard texts on measure theoryand integration, such as [T2].

We now establish a sufficient condition for a function f to belong to A(T¹), more general than that in Proposition 4.2.

Proposition 4.6. If f is a continuous, piecewise C¹ function on T¹, then Σ |f̂(k)| < ∞.

Proof. As in the proof of Proposition 4.2, we can reduce the problem to the case f ∈ C¹([−π, π]), f(−π) = f(π). In such a case, with g = f′ ∈ C([−π, π]), the integration by parts argument (4.10) gives

(4.67) f̂(k) = (1/(ik)) ĝ(k), k ≠ 0.

By (4.60),

(4.68) Σ_k |ĝ(k)|² = ‖g‖²_{L²}.

Also, by Cauchy’s inequality (cf. Appendix A),

(4.69) Σ_{k≠0} |f̂(k)| ≤ (Σ_{k≠0} 1/k²)^{1/2} (Σ_{k≠0} |ĝ(k)|²)^{1/2} ≤ C ‖g‖_{L²}.

This completes the proof.

Moving beyond square integrable functions, we now provide some results on Fourier series for a function f ∈ R^#(T¹). For starters, if f ∈ R^#(T¹), then (4.1) yields

(4.70) |f̂(k)| ≤ (1/2π) ∫_0^{2π} |f(θ)| dθ = (1/2π) ‖f‖_{L¹(T¹)}.

Using this, we can establish the following result, which is part of what is called the Riemann-Lebesgue lemma.

Proposition 4.7. Given f ∈ R^#(T¹),

(4.71) f̂(k) −→ 0, as |k| → ∞.

Proof. By Proposition 6.4 of Chapter 4, there exist f ν ∈ R(T1) such that

(4.72) ‖f − f_ν‖_{L¹(T¹)} −→ 0, as ν → ∞.


Now Proposition 4.5 applies to each f_ν, so Σ_k |f̂_ν(k)|² < ∞, for each ν. Hence

(4.73) f̂_ν(k) −→ 0, as |k| → ∞, for each ν.

Since

(4.74) sup_k |f̂(k) − f̂_ν(k)| ≤ (1/2π) ‖f − f_ν‖_{L¹(T¹)},

(4.71) follows.

We now consider conditions on f ∈ R^#(T¹) guaranteeing that S_N f(θ) converges to f(θ) as N → ∞, at a particular point θ ∈ T¹. Note that

(4.75) S_N f(θ) = Σ_{k=−N}^{N} f̂(k) e^{ikθ}
              = (1/2π) Σ_{k=−N}^{N} ∫_{T¹} f(ϕ) e^{ik(θ−ϕ)} dϕ
              = ∫_{T¹} f(ϕ) D_N(θ − ϕ) dϕ,

where D_N(θ), called the Dirichlet kernel, is given by

(4.76) D_N(θ) = (1/2π) Σ_{k=−N}^{N} e^{ikθ}.

The following compact formula is very useful.

Lemma 4.8. We have D_N(0) = (2N + 1)/2π, and if θ ∈ T¹ \ {0},

(4.77) D_N(θ) = (1/2π) sin((N + 1/2)θ) / sin(θ/2).

Proof. The formula (4.76) can be rewritten

(4.78) D_N(θ) = (1/2π) e^{−iNθ} Σ_{k=0}^{2N} e^{ikθ}.

Using the geometric series Σ_{k=0}^{2N} z^k = (1 − z^{2N+1})/(1 − z), for z ≠ 1, we have

(4.79) D_N(θ) = (1/2π) e^{−iNθ} (1 − e^{i(2N+1)θ})/(1 − e^{iθ})
             = (1/2π) (e^{i(N+1)θ} − e^{−iNθ})/(e^{iθ} − 1),


and multiplying numerator and denominator by e−iθ/2 gives (4.77).
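The closed form (4.77) is easy to test against the defining sum (4.76); the sketch below compares the two at a few arbitrary points.

```python
import math, cmath

def D_sum(N, theta):
    # Dirichlet kernel from the definition (4.76)
    return sum(cmath.exp(1j * k * theta) for k in range(-N, N + 1)).real / (2 * math.pi)

def D_closed(N, theta):
    # the closed form (4.77), valid for theta != 0
    return math.sin((N + 0.5) * theta) / (2 * math.pi * math.sin(theta / 2))
```

At theta = 0, the definition gives (2N + 1)/2π, matching the limit of (4.77).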

Note that if R_ϕ f(θ) = f(θ + ϕ), then, for each f ∈ R^#(T¹),

(4.80) S_N R_ϕ f = R_ϕ S_N f,

so to test for convergence of S_N f to f at ϕ, it suffices to test for convergence of S_N R_ϕ f to R_ϕ f at θ = 0. Thus we seek conditions under which

(4.81) S_N f(0) = ∫_{T¹} f(ϕ) D_N(ϕ) dϕ

converges to f(0) as N → ∞. (Note that D_N(ϕ) = D_N(−ϕ).) We have

(4.82) S_N f(0) = (1/2π) ∫_{−π}^{π} (f(θ)/sin(θ/2)) sin((N + 1/2)θ) dθ.

Also,

(4.83) sin((N + 1/2)θ) = (sin Nθ)(cos θ/2) + (cos Nθ)(sin θ/2).

Using this in concert with Proposition 4.7, we have the following.

Lemma 4.9. Let f ∈ R^#(T¹). Assume f “vanishes” at θ = 0 in the sense that

(4.84) f(θ)/sin(θ/2) ∈ R^#([−π, π]).

Then

(4.85) S_N f(0) −→ 0, as N → ∞.

Applying Lemma 4.9 to f (θ) = g(θ) − g(0), we have the following.

Corollary 4.10. Let g ∈ R^#(T¹), and assume

(4.86) (g(θ) − g(0))/sin(θ/2) ∈ R^#([−π, π]).

Then

(4.87) S_N g(0) −→ g(0), as N → ∞.

Bringing in (4.80), we have the following.


Proposition 4.11. Let f ∈ R^#(T¹). Fix θ₀ ∈ T¹. If

(4.88) (f(θ) − f(θ₀))/sin((θ − θ₀)/2) ∈ R^#([−π + θ₀, π + θ₀]),

then

(4.89) S_N f(θ₀) −→ f(θ₀), as N → ∞.

Proposition 4.11 has the following application. We say a function f ∈ R^#(T¹) is Hölder continuous at θ₀ ∈ T¹, with exponent α ∈ (0, 1], provided there exist δ > 0, C < ∞, such that

(4.90) |θ − θ₀| ≤ δ =⇒ |f(θ) − f(θ₀)| ≤ C |θ − θ₀|^α.

Proposition 4.11 implies the following.

Proposition 4.12. Let f ∈ R^#(T¹). If f is Hölder continuous at θ₀, with some exponent α ∈ (0, 1], then (4.89) holds.

Proof. We have

(4.91) |(f(θ) − f(θ₀))/sin((θ − θ₀)/2)| ≤ C |θ − θ₀|^{−(1−α)},

for |θ − θ₀| ≤ δ. Since sin((θ − θ₀)/2) is bounded away from 0 for θ ∈ [−π + θ₀, π + θ₀] \ [θ₀ − δ, θ₀ + δ], the hypothesis (4.88) holds.

We now look at the following class of piecewise regular functions, with jumps. Take points p_j,

(4.92) −π = p₀ < p₁ < ⋯ < p_N = π.

Take functions

(4.93) f_j : [p_j, p_{j+1}] −→ C,

Hölder continuous with exponent α > 0, for 0 ≤ j ≤ N − 1. Define f : T¹ → C by

(4.94) f(θ) = f_j(θ), if p_j < θ < p_{j+1},
       f(θ) = [f_j(p_{j+1}) + f_{j+1}(p_{j+1})]/2, if θ = p_{j+1}.

By convention, we take f_N ≡ f₀ (recall that π ≡ −π in T¹).


Proposition 4.13. With f as specified above, we have

(4.95) S_N f(θ) −→ f(θ), ∀ θ ∈ T¹.

Proof. If θ ∉ {p₀, …, p_N}, this follows from Proposition 4.12. It remains to consider the case θ = p_j for some j. By (4.80), there is no loss of generality in taking p_j = 0. Parallel to (4.80), we have

(4.96) S_N T f = T S_N f, where T f(θ) = f(−θ).

Hence

(4.97) S_N f(0) = (1/2) S_N(f + T f)(0).

However, f + T f is Hölder continuous at θ = 0, with value 2f(0), so Proposition 4.12 implies

(4.98) (1/2) S_N(f + T f)(0) −→ f(0), as N → ∞.

This gives (4.95) for θ = p_j = 0.

Exercises

1. Prove (4.80).

2. Prove (4.96).

3. Compute f̂(k) when

(4.99) f(θ) = 1 for 0 < θ < π,
       f(θ) = 0 for −π < θ < 0.

Then use (4.60) to obtain another proof of (4.22).

4. Apply Proposition 4.13 to f in (4.99), when

θ = 0, π/2, π.

5. Apply (4.60) when f(θ) is given by (4.16). Use this to show that

Σ_{k=1}^{∞} 1/k⁴ = π⁴/90.


6. Use Proposition 4.12 in concert with Proposition 4.2 to demonstrate that (4.3) holds when f is Lipschitz and piecewise C² on T¹, without recourse to Corollary 3.4 (whose proof in §3 uses the Stone-Weierstrass theorem). Use this in turn to prove Proposition 4.1, without using Corollary 3.4.

7. Use the results of Exercise 6 to give a proof of Corollary 3.4 that does not use the Stone-Weierstrass theorem.
Hint. As in the end of the proof of Theorem 2.1, each f ∈ C(T1) can be uniformly approximated by a sequence of Lipschitz, piecewise linear functions.

Recall that Corollary 3.4 states that each f ∈ C(T1) can be uniformly approximated by a sequence of finite linear combinations of the functions e^{ikθ}, k ∈ Z. The proof given in §3 relied on the Weierstrass approximation theorem, Theorem 2.1, which was used in the proof of Theorems 3.1 and 3.2. Exercise 7 indicates a proof of Corollary 3.4 that does not depend on Theorem 2.1.

8. Give another proof of Theorem 2.1, as a corollary of Corollary 3.4.
Hint. You can take I = [−π/2, π/2]. Given f ∈ C(I), you can extend it to f ∈ C([−π, π]), vanishing at ±π, and identify such f with an element of C(T1). Given ε > 0, approximate f uniformly to within ε on [−π, π] by a finite sum

∑_{k=−N}^{N} a_k e^{ikθ}.

Then approximate e^{ikθ} uniformly to within ε/(2N + 1) for each k ∈ {−N, . . . , N}, by a partial sum of the power series for e^{ikθ}.


5. Newton’s method

Here we describe a method to approximate the solution to

(5.1) f (ξ ) = 0.

We assume f : [a, b] → R is continuous and f ∈ C²((a, b)). We assume it is known that f vanishes somewhere in (a, b). For example, f(a) and f(b) might have opposite signs. We take x0 ∈ (a, b) as an initial guess of a solution to (5.1), and inductively construct the sequence (x_k), going from x_k to x_{k+1} as follows. Replace f by its best linear approximation at x_k,

(5.2)    g(x) = f(x_k) + f′(x_k)(x − x_k),

and solve g(xk+1) = 0. This yields

(5.3)    x_{k+1} − x_k = −f(x_k)/f′(x_k),

or

(5.4)    x_{k+1} = x_k − f(x_k)/f′(x_k).

Naturally, we need to assume f′(x) is bounded away from 0 on (a, b). This production of the sequence (x_k) is Newton's method, and as we will see, under appropriate hypotheses it converges quite rapidly to ξ.
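The recursion (5.4) is straightforward to implement. Here is a minimal Python sketch, our own illustration rather than part of the text; the function name `newton` and the stopping parameters `tol` and `max_iter` are our own choices:

```python
def newton(f, fprime, x0, tol=1e-12, max_iter=50):
    # Newton's method (5.4): x_{k+1} = x_k - f(x_k)/f'(x_k).
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)   # requires f'(x) bounded away from 0
        x -= step
        if abs(step) < tol:
            break
    return x

# Example: solve x^2 - 2 = 0 starting from x0 = 1.5.
root = newton(lambda x: x * x - 2, lambda x: 2 * x, 1.5)
print(root)  # close to sqrt(2) = 1.41421356...
```

The stopping rule here uses the step size |x_{k+1} − x_k| as a proxy for the error, which is reasonable precisely because of the rapid convergence established below.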

We want to give a condition guaranteeing that |xk+1 − ξ | < |xk − ξ |. Say

(5.5) xk = ξ + δ.

Then (5.4) yields

(5.6)    x_{k+1} − ξ = δ − f(ξ + δ)/f′(ξ + δ) = (f′(ξ + δ)δ − f(ξ + δ))/f′(ξ + δ).

Now the mean value theorem implies

(5.7)    f(ξ + δ) − f(ξ) = f′(ξ + τδ)δ, for some τ ∈ (0, 1).


Since f(ξ) = 0, we get from (5.6) and (5.7) that

(5.7A)    x_{k+1} − ξ = ((f′(ξ + δ) − f′(ξ + τδ))/f′(ξ + δ)) δ.

A second application of the mean value theorem gives

(5.8)    f′(ξ + δ) − f′(ξ + τδ) = (1 − τ)δ f″(ξ + γδ),

for some γ ∈ (τ, 1), hence

(5.9)    x_{k+1} − ξ = ((1 − τ) f″(ξ + γδ)/f′(ξ + δ)) δ²,   τ ∈ (0, 1), γ ∈ (τ, 1).

Consequently,

(5.10)    |x_{k+1} − ξ| ≤ sup_{0<γ<1} |f″(ξ + γδ)/f′(ξ + δ)| δ².

A favorable condition for convergence is that the right side of (5.10) is ≤ βδ for some β < 1. This leads to the following.

Proposition 5.1. Let f ∈ C([a, b]) be C² on (a, b). Assume there exists a solution ξ ∈ (a, b) to (5.1). Assume there exist A, B ∈ (0, ∞) such that

(5.11)    |f″(x)| ≤ A,   |f′(x)| ≥ B,   ∀ x ∈ (a, b).

Pick x0 ∈ (a, b). Assume

(5.12)    |x_0 − ξ| = δ_0,   [ξ − δ_0, ξ + δ_0] ⊂ (a, b),

and

(5.13)    (A/B) δ_0 = β < 1.

Then xk, defined inductively by (5.4), converges to ξ as k → ∞.

When Proposition 5.1 applies, one clearly has

(5.14)    |x_k − ξ| ≤ β^k δ_0.

In fact, (5.10) implies much faster convergence than this. With |x_k − ξ| = δ_k, (5.10) implies

(5.15)    δ_{k+1} ≤ (A/B) δ_k²,


hence

(5.16)    δ_1 ≤ (A/B) δ_0²,   δ_2 ≤ (A/B)^{1+2} δ_0⁴,   δ_3 ≤ (A/B)^{1+2+4} δ_0⁸,

and, inductively,

(5.17)    δ_k ≤ (A/B)^{2^k − 1} δ_0^{2^k} = β^{2^k − 1} δ_0,

with β as in (5.13). Note that the exponent on β in (5.17) is much larger (for moderately large k) than that in (5.14). One says the sequence (x_k) converges quadratically to the limit ξ, solving (5.1). Roughly speaking, x_{k+1} has twice as many digits of accuracy as x_k.

If we change (5.1) to

(5.18) f (ξ ) = y,

then the results above apply to f̃(x) = f(x) − y, so we get the sequence of approximate solutions defined inductively by

(5.19)    x_{k+1} = x_k − (f(x_k) − y)/f′(x_k),

and the formula (5.9) and estimate (5.10) remain valid.

As an example, let us take

(5.20)    f(x) = x² on [a, b] = [1, 2],

and approximate ξ = √2, which solves (5.18) with y = 2. Note that f(1) = 1 < 2 and f(2) = 4 > 2. In this case, (5.19) becomes

(5.21)    x_{k+1} = x_k − (x_k² − 2)/(2x_k) = x_k/2 + 1/x_k.

Let us pick

(5.22)    x_0 = 3/2.

Examining (1.4)² and (1.5)², we see that 1.4 < √2 < 1.5. Thus (5.12) holds with δ_0 < 1/10. Furthermore, (5.11) holds with A = B = 2, so (5.13) holds with β < 1/10. Hence, by (5.17),

(5.23)    |x_k − √2| ≤ 10^{−2^k}.


Explicit computations give

(5.24)    x_0 = 1.5
          x_1 = 1.41666666666666
          x_2 = 1.41421568627451
          x_3 = 1.41421356237469
          x_4 = 1.41421356237309.

We have |x_4² − 2| ≤ 4 · 10^{−16}, consistent with (5.23).
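The table (5.24) can be reproduced in a few lines of Python; this is our own illustration of the iteration (5.21), not part of the text:

```python
x = 1.5  # x_0, from (5.22)
for k in range(4):
    x = x / 2 + 1 / x   # iteration (5.21), converging to sqrt(2)
    print(x)            # matches x_1, ..., x_4 in (5.24)
print(abs(x * x - 2))   # on the order of 1e-16, consistent with (5.23)
```

In double precision the iteration reaches the rounding floor after four steps, just as (5.24) shows.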

Under certain circumstances, Newton's method can be even better than quadratically convergent. This happens when f″(ξ) = 0, assuming also that f is C³. In such a case, the mean value theorem implies

(5.25)    f″(ξ + γδ) = f″(ξ + γδ) − f″(ξ) = γδ f⁽³⁾(ξ + σγδ),

for some σ ∈ (0, 1). Hence, given |x_k − ξ| = δ_k, we get from (5.10) that

(5.26)    |x_{k+1} − ξ| ≤ sup_{0<γ<1} |f⁽³⁾(ξ + γδ_k)/f′(ξ + δ_k)| δ_k³.

Thus x_k → ξ cubically.

Here is an application to the production of a sequence that rapidly converges to π, based on

(5.27) sin π = 0.

We take f(x) = sin x. Then f″(x) = −sin x, so the considerations above apply. The iteration (5.4) becomes

(5.28)    x_{k+1} = x_k − sin x_k/cos x_k.

If xk = π + δ k, note that

(5.29)    cos(π + δ_k) = −1 + O(δ_k²),

so the iteration

(5.30) xk+1 = xk + sin xk

is also cubically convergent, if x0 is chosen close enough to π. Now, the first few terms of the series (4.27)–(4.31) of Chapter 4, applied to

(5.31)    π/6 = ∫_0^{1/2} dx/√(1 − x²)


(cf. Chapter 4, §5, Exercise 7, (5.45A)), yields π = 3.14 · · · . We take

(5.32) x0 = 3,

and use the iteration (5.30), obtaining

(5.33)    x_1 = 3.14112000805987
          x_2 = 3.14159265357220
          x_3 = 3.14159265358979.

The error π − x_2 is < 2 · 10^{−11}, and all the printed digits of x_3 are accurate. If the computation were done to higher precision, x_3 would approximate π to quite a few more digits.
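The cubically convergent iteration (5.30) takes only a few lines of Python; this sketch is our own illustration, run in ordinary double precision:

```python
import math

x = 3.0  # x_0, from (5.32)
for k in range(3):
    x = x + math.sin(x)  # iteration (5.30), cubically convergent to pi
    print(x)             # matches x_1, x_2, x_3 in (5.33)
```

After three steps x agrees with π to the full double precision available, as (5.33) records.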

By contrast, we apply Newton’s method to

(5.34)    sin(π/6) = 1/2

(equivalent to (5.31)). In this case, f(x) = sin(x/6), and (5.19) becomes

(5.35)    x_{k+1} = x_k − 6 (sin(x_k/6) − 1/2)/cos(x_k/6).

If we take x0 = 3, as in (5.32), the iteration (5.35) yields

(5.36)    x_1 = 3.14066684291090
          x_2 = 3.14159261236234
          x_3 = 3.14159265358979.

Note that x_1 here is almost as accurate an approximation to π as is x_1 in (5.33), but x_2 here is substantially less accurate than x_2 in (5.33). Here, x_3 has full accuracy, though as noted above, x_3 in (5.33) could be much more accurate if the computation (5.30) were done to higher precision.

Exercises

Using a calculator or a computer, implement Newton's method to get approximate solutions to the following equations.

1. x⁵ − x³ + 1 = 0.

2. e^x = 2x.

3. tan x = x.


4. x log x = 2.

5. x^x = 3.

6. Apply Newton's method to f(x) = 1/x, obtaining the sequence

(5.37)    x_{k+1} = 2x_k − a x_k²

of approximate solutions to f(x) = a. That is, x_k → 1/a, if x_0 is close enough. Try this out with a = 3, x_0 = 0.3. Note that the right side of (5.37) involves only multiplication and subtraction.
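A quick check of the division-free iteration (5.37), with the suggested starting data; this is our own illustration, not part of the exercise:

```python
a = 3.0
x = 0.3   # initial guess x_0 = 0.3
for k in range(6):
    x = 2 * x - a * x * x   # (5.37): multiplication and subtraction only
    print(x)
# x converges to 1/a = 0.333...; the error 1 - a*x is squared at each step,
# so the number of correct digits roughly doubles per iteration.
```

Iterations of this kind are how early computers implemented division using only a multiplier.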

7. Prove Proposition 5.1 when the hypothesis (5.12) is replaced by

(5.38)    |f(x_0)| ≤ Bδ_0,   [x_0 − δ_0, x_0 + δ_0] ⊂ (a, b).


A. Inner product spaces

In §4, we have looked at norms and inner products on spaces of functions, such as C(S1) and R(S1), which are vector spaces. Generally, a complex vector space V is a set on which there are operations of vector addition:

(A.1) f, g ∈ V =⇒ f + g ∈ V,

and multiplication by an element of C (called scalar multiplication):

(A.2) a ∈ C, f ∈ V =⇒ af ∈ V,

satisfying the following properties. For vector addition, we have

(A.3) f + g = g + f, (f + g) + h = f + (g + h), f + 0 = f, f + (−f ) = 0.

For multiplication by scalars, we have

(A.4) a(bf ) = (ab)f, 1 · f = f.

Furthermore, we have two distributive laws:

(A.5) a(f + g) = af + ag, (a + b)f = af + bf.

These properties are readily verified for the function spaces mentioned above.

An inner product on a complex vector space V assigns to elements f, g ∈ V the quantity

(f, g) ∈ C, in a fashion that obeys the following three rules:

(A.6)    (a_1 f_1 + a_2 f_2, g) = a_1 (f_1, g) + a_2 (f_2, g),
         (f, g) = \overline{(g, f)},
         (f, f) > 0 unless f = 0.

A vector space equipped with an inner product is called an inner product space. For example,

(A.7)    (f, g) = (1/2π) ∫_{S^1} f(θ) \overline{g(θ)} dθ

defines an inner product on C(S1), and also on R(S1), where we identify two functions that differ only on a set of upper content zero. Similarly,

(A.8)    (f, g) = ∫_{−∞}^{∞} f(x) \overline{g(x)} dx


defines an inner product on R(R) (where, again, we identify two functions that differ only on a set of upper content zero).

As another example, we define ℓ² to consist of sequences (a_k)_{k∈Z} such that

(A.9)    ∑_{k=−∞}^{∞} |a_k|² < ∞.

An inner product on ℓ² is given by

(A.10)    ((a_k), (b_k)) = ∑_{k=−∞}^{∞} a_k \overline{b_k}.

Given an inner product on V, one says the object ‖f‖ defined by

(A.11)    ‖f‖ = (f, f)^{1/2}

is the norm on V associated with the inner product. Generally, a norm on V is a function f ↦ ‖f‖ satisfying

(A.12)    ‖af‖ = |a| · ‖f‖,   a ∈ C, f ∈ V,
(A.13)    ‖f‖ > 0 unless f = 0,
(A.14)    ‖f + g‖ ≤ ‖f‖ + ‖g‖.

The property (A.14) is called the triangle inequality. A vector space equipped with a norm is called a normed vector space. We can define a distance function on such a space by

(A.15)    d(f, g) = ‖f − g‖.

Properties (A.12)–(A.14) imply that d : V × V → [0, ∞) makes V a metric space.

If ‖f‖ is given by (A.11), from an inner product satisfying (A.6), it is clear that (A.12)–(A.13) hold, but (A.14) requires a demonstration. Note that

(A.16)    ‖f + g‖² = (f + g, f + g) = ‖f‖² + (f, g) + (g, f) + ‖g‖² = ‖f‖² + 2 Re(f, g) + ‖g‖²,

while

(A.17)    (‖f‖ + ‖g‖)² = ‖f‖² + 2‖f‖ · ‖g‖ + ‖g‖².

Thus to establish (A.14) it suffices to prove the following, known as Cauchy's inequality.


Proposition A.1. For any inner product on a vector space V, with ‖f‖ defined by (A.11),

(A.18)    |(f, g)| ≤ ‖f‖ · ‖g‖,   ∀ f, g ∈ V.

Proof. We start with

(A.19)    0 ≤ ‖f − g‖² = ‖f‖² − 2 Re(f, g) + ‖g‖²,

which implies

(A.20)    2 Re(f, g) ≤ ‖f‖² + ‖g‖²,   ∀ f, g ∈ V.

Replacing f by af for arbitrary a ∈ C of absolute value 1 yields 2 Re a(f, g) ≤ ‖f‖² + ‖g‖² for all such a, hence

(A.21)    2|(f, g)| ≤ ‖f‖² + ‖g‖²,   ∀ f, g ∈ V.

Replacing f by tf and g by t⁻¹g for arbitrary t ∈ (0, ∞), we have

(A.22)    2|(f, g)| ≤ t²‖f‖² + t⁻²‖g‖²,   ∀ f, g ∈ V, t ∈ (0, ∞).

If we take t² = ‖g‖/‖f‖, we obtain the desired inequality (A.18). This assumes f and g are both nonzero, but (A.18) is trivial if f or g is 0.
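Proposition A.1 is easy to test numerically. The following Python sketch is our own illustration, using the standard inner product on C⁵ as an instance of (A.6); the helper names `inner` and `norm` are hypothetical:

```python
import math
import random

def inner(f, g):
    # standard inner product on C^n, an instance of (A.6): (f, g) = sum f_j * conj(g_j)
    return sum(fj * gj.conjugate() for fj, gj in zip(f, g))

def norm(f):
    # the associated norm (A.11): ||f|| = (f, f)^{1/2}
    return math.sqrt(inner(f, f).real)

random.seed(0)
f = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(5)]
g = [complex(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(5)]

# Cauchy's inequality (A.18) and the triangle inequality (A.14):
print(abs(inner(f, g)) <= norm(f) * norm(g))                     # True
print(norm([a + b for a, b in zip(f, g)]) <= norm(f) + norm(g))  # True
```

For generic (non-parallel) vectors both inequalities hold strictly, with a comfortable margin over floating-point rounding.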

An inner product space V is called a Hilbert space if it is a complete metric space, i.e., if every Cauchy sequence (f_ν) in V has a limit in V. The space ℓ² has this completeness property, but C(S1), with inner product (A.7), does not, nor does R(S1). Chapter 2 describes a process of constructing the completion of a metric space. When applied to an incomplete inner product space, it produces a Hilbert space. When this process is applied to C(S1), the completion is the space L²(S1). An alternative construction of L²(S1) uses the Lebesgue integral.


References

[AH] J. Arndt and C. Haenel, π Unleashed, Springer-Verlag, New York, 2001.
[BS] R. Bartle and D. Sherbert, Introduction to Real Analysis, J. Wiley, New York, 1992.
[Be] P. Beckmann, A History of π, St. Martin's Press, New York, 1971.
[Dev] K. Devlin, The Joy of Sets: Fundamentals of Contemporary Set Theory, Springer-Verlag, New York, 1993.
[Fol] G. Folland, Real Analysis: Modern Techniques and Applications, Wiley-Interscience, New York, 1984.
[Niv] I. Niven, A simple proof that π is irrational, Bull. AMS 53 (1947), 509.
[T1] M. Taylor, Measure Theory and Integration, American Mathematical Society, Providence RI, 2006.
[T2] M. Taylor, Introduction to Analysis in Several Variables, Preprint, 2014.
[T3] M. Taylor, Introduction to Complex Analysis, Preprint, 2014.
[T4] M. Taylor, Partial Differential Equations, Vols. 1–3, Springer-Verlag, New York, 1996 (2nd ed., 2011).
[T5] M. Taylor, Introduction to Differential Equations, American Mathematical Society, Providence RI, 2011.


Index

absolute value 26, 55
absolutely convergent series 38, 57, 84
accumulation point 70, 72
alternating series 38
arc length 131
Archimedean property 24, 34
Ascoli's theorem 93
associative law 9, 14

ball 70
Banach space 93
Bolzano-Weierstrass theorem 36

cancellation law 10
Cantor set 53, 120
Card 46
cardinal number 45
cardinality 46
Cauchy inequality 64

Cauchy remainder formula 127
Cauchy sequence 27, 35, 65, 69
change of variable 119
circle 134
cis 57
closed set 50, 58, 70
closure 70
commutative law 9, 14
compact set 50, 58, 66, 72, 100
completeness property 36, 56, 65

completion 69
complex conjugate 55
complex number 55
composite number 18
connected 70, 79
cont+ 111
cont− 111
continued fraction 30
continuous 51, 77, 98, 108
convergent sequence 26, 35, 56


convex 105
convolution 179
cos 57, 135, 145

cosh 146
countable 47
countably infinite 47
cover 52
cubic convergence 209
curve 132

Darboux theorem 109, 114
dense 70
derivative 98
diagonal construction 75
differentiable function 98
differential equation 140
Dirichlet kernel 201
distance 69
disk 86
dot product 63

e 42, 140
elliptic function 137
elliptic integral 137

equicontinuity 93
equivalence class 13
equivalence relation 13
Euclidean space 63
Euler identity 58, 145
exponential function 57, 140

Fourier inversion 177
Fourier series 177, 191
Fundamental theorem of algebra 162

Fundamental theorem of arithmetic 18
Fundamental theorem of calculus 112, 113
function 77

Generalized mean value theorem 106
geometric series 86

Heine-Borel theorem 52
Hölder continuous 203


improper integral 154
induction principle 8
infinite decimal expansion 39

infinite series 37, 56, 83
inner product 212
inner product space 212
integer 13
integral 107
integral remainder formula 127
integral test 120
integration by parts 118
Intermediate value theorem 51, 71, 79
interval 51, 70
Inverse function theorem 101, 134
irrational number 42, 163

Lagrange remainder formula 127
limit 26
log 141
lower content 111

max 51, 59, 68, 184
maximum 77, 99
maxsize 107

Mean value theorem 100, 113, 114, 206
metric space 69
min 51, 59, 68, 81, 184
minimum 77, 99
modulus of continuity 78
monotone sequence 28, 36
multiplying power series 88

natural logarithm 142
neighborhood 70

Newton's method 206
norm 63

open set 50, 58, 70
order relation 10, 15, 23
outer measure 111

parametrization by arc length 134
partition 107
path-connected 77


Peano axioms 8
perfect set 53
π 135, 146, 147, 164, 209

piecewise constant 116
piecewise regular 201
polar coordinates 57, 137
polynomial 162, 179
power series 86, 122, 140
prime number 18
principle of induction 8
product rule 98, 141
Pythagorean theorem 55, 64

quadratic convergence 208

radius of convergence 86
ratio test 31
rational number 21
real number 32
refinement 107
remainder in a power series 126
reparametrization 132
Riemann integrable 108
Riemann integral 107

Riemann-Lebesgue lemma 200
Riemann sum 110, 132

Schroeder-Bernstein theorem 46
second derivative 103
second derivative test 105
semicontinuous 80
sec 146
sequence 26
sin 57, 135, 145

sinh 147
speed 132
Stone-Weierstrass theorem 188
subgroup 19
sup 37, 78
supremum property 37

tan 145
triangle inequality 26, 55, 64, 69, 213
trigonometric function 135, 145


trigonometric polynomial 190
Tychonov theorem 74

unbounded integrable function 154
uncountable 48
uniform convergence 82
uniformly continuous 78
uniformly equicontinuous 94
upper bound 37
upper content 111

vector space 211