Real Analysis II

Rutgers University, andromeda.rutgers.edu/~loftin/ra2/ra2.pdf

John Loftin∗

May 13, 2017

1 Spaces of functions

1.1 Banach spaces

Many natural spaces of functions form infinite-dimensional vector spaces. Examples are the space of polynomials and the space of smooth functions. If we are interested in solving differential equations, then, it is important to understand analysis in infinite-dimensional vector spaces (over $\mathbb{R}$ or $\mathbb{C}$).

First of all, we should recognize the following straightforward fact about finite-dimensional vector spaces:

Homework Problem 1. Let $x = (x^1, \dots, x^m)$ denote a point in $\mathbb{R}^m$, and let $x_n = (x^1_n, \dots, x^m_n)$ be a sequence of points in $\mathbb{R}^m$. Then $x_n \to x$ if and only if $x^i_n \to x^i$ for all $i = 1, \dots, m$.

(Recall the standard metric on $\mathbb{R}^m$ is given by $|x - y|$, where the norm $|\cdot|$ is given by $|x| = \sqrt{(x^1)^2 + \cdots + (x^m)^2}$.)

Thus for taking limits in $\mathbb{R}^m$, we could even dispense with the notion of taking limits using the metric on $\mathbb{R}^m$, and simply define $x_n \to x$ by $x^i_n \to x^i$ for each $i = 1, \dots, m$. This reflects the fact that there is only one natural topology on a finite-dimensional vector space: that given by the standard norm.

For infinite-dimensional vector spaces, say with a countable basis, so that $x = (x^1, x^2, \dots)$, it is possible to define a topology by $x_n \to x$ if and only if each $x^i_n \to x^i$. It turns out, however, that this is not usually the most useful way to define limits in infinite-dimensional spaces (though a related construction is used in defining the topology of Fréchet spaces).

∗partially supported by NSF Grant DMS0405873

Finite-dimensional vector spaces are also all complete with respect to their standard norm (in other words, they are all Banach spaces). Given a norm on an infinite-dimensional vector space, however, completeness must be proved. There are many examples of Banach function spaces. On a measure space, the $L^p$ spaces of functions are all Banach spaces for $1 \le p \le \infty$. Also, on a metric space $X$, the space of all bounded continuous functions $C^0(X)$ is a Banach space under the norm

$$\|f\|_{C^0(X)} = \sup_{x \in X} |f(x)|.$$

The $L^p$ and $C^0$ spaces form the basis of most other useful Banach spaces. Extensions are typically provided by measuring not just the functions themselves, but also their partial derivatives (as in Sobolev and $C^k$ spaces) or their difference quotients (Hölder spaces).

Completeness of a metric space of course means that every Cauchy sequence converges to a limit in the space (which is unique, as limits in a metric space always are). More roughly, this means that any sequence that should converge, in that its elements are becoming infinitesimally close to each other, will converge to a limit in the space. As we will see, taking such limits is a powerful way to construct solutions to analytic problems. Unfortunately, many of the most familiar spaces of functions (such as smooth functions) do not have the structure of a Banach space, and so it is difficult to ensure that a given limit of smooth functions is smooth. In fact we have the following theorem, which we state without proof:

Theorem 1. On $\mathbb{R}^n$ equipped with Lebesgue measure, the space $C^\infty_0(\mathbb{R}^n)$ of smooth functions with compact support is dense in $L^p(\mathbb{R}^n)$ for all $1 \le p < \infty$. In other words, the completion of the space of smooth functions with compact support on $\mathbb{R}^n$ with respect to the $L^p$ norm is simply the space of all $L^p$ functions, for $1 \le p < \infty$.

If we are working in $L^2$, for example, it is possible for the limit of smooth functions to be quite non-smooth: there are many $L^2$ functions which are discontinuous everywhere. This poses a potential problem if the limit we have produced is supposed to be a solution to a differential equation. In particular, such a limit may be nowhere differentiable. Some of our goals, then, are to understand (1) how to make sense of taking derivatives of functions which are not classically differentiable (the theory of distributions and weak derivatives), and (2) how to show that a limit function actually has enough derivatives to solve the equation (bootstrapping).

Theorem 1 reminds us that the $L^p$ Banach spaces have a very large overlap, which of course includes many more functions than the smooth functions with compact support. It is therefore often useful to view these Banach function spaces not so much as different spaces but as different tools. They can be used to study either the space of all functions or (via the completion process) the space of only very nice functions (e.g., smooth functions of compact support).

In particular, two function spaces which are very closely related to each other are $L^\infty$ and $C^0$. As we will see below, they have essentially the same norm. First of all, we show that $C^0(X)$ is a Banach space for any metric space $X$.

1.2 The Banach space $C^0$

Given a metric space $X$, define

$$C^0(X) = \{ f : X \to \mathbb{R} : f \text{ is continuous and } \sup_X |f| < \infty \}.$$

Define the norm

$$\|f\|_{C^0(X)} = \sup_X |f|.$$

It is straightforward to verify that $\|\cdot\|_{C^0}$ satisfies the requirements for a norm:

• $\|f\|_{C^0} = 0 \iff f \equiv 0$,

• $\|\lambda f\|_{C^0} = |\lambda| \, \|f\|_{C^0}$,

• $\|f + g\|_{C^0} \le \|f\|_{C^0} + \|g\|_{C^0}$.

Remark. If $f_i \to f$ in $C^0(X)$, then we say $f_i \to f$ uniformly on $X$, and $C^0(X)$ convergence is the same as uniform convergence.

The main thing to check is that the norm gives $C^0(X)$ the structure of a complete metric space:

Proposition 1. For any metric space $X$, $C^0(X)$ is a Banach space with norm $\|\cdot\|_{C^0}$.


Proof. We simply need to check that the metric induced on $C^0(X)$ is complete. Let $d$ denote the metric on $X$, and consider a Cauchy sequence $\{f_i\} \subset C^0(X)$. In other words, for all $\varepsilon > 0$, there is an $N$ so that $n, m > N$ implies $\|f_n - f_m\|_{C^0} < \varepsilon$. By the definition of the norm, this implies $|f_n(x) - f_m(x)| < \varepsilon$ for all $x \in X$. Now for each $x \in X$, $\{f_i(x)\} \subset \mathbb{R}$ is a Cauchy sequence. Since $\mathbb{R}$ is complete, there is a limit $f_\infty(x) = \lim_i f_i(x)$.

Now we have produced a limit function $f_\infty$. Next we need to show that $\|f_i - f_\infty\|_{C^0} \to 0$ and $f_\infty \in C^0(X)$. The first statement is straightforward: for all $\varepsilon > 0$, there is an $N$ so that for all $n, m > N$ and all $x \in X$,

$$|f_n(x) - f_m(x)| < \varepsilon.$$

Now let $m \to \infty$ to see that

$$|f_n(x) - f_\infty(x)| \le \varepsilon.$$

So we have that for all $\varepsilon > 0$, there is an $N$ so that for all $n > N$ and for all $x \in X$,

$$|f_n(x) - f_\infty(x)| \le \varepsilon.$$

Since this is true for all $x \in X$, we have

$$\|f_n - f_\infty\|_{C^0} = \sup_{x \in X} |f_n(x) - f_\infty(x)| \le \varepsilon,$$

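and so $\|f_i - f_\infty\|_{C^0} \to 0$. In particular, $\sup_X |f_\infty| \le \|f_n\|_{C^0} + \varepsilon < \infty$ for any such $n$, so $f_\infty$ is bounded.

We still need to prove that the limit function $f_\infty$ is continuous. So let $x \in X$ and choose $\varepsilon > 0$. Then there is an $N$ so that for $n > N$, $\|f_n - f_\infty\|_{C^0} < \varepsilon$. By the previous paragraph and the definition of $\|\cdot\|_{C^0}$,

$$|f_n(x) - f_\infty(x)| < \varepsilon \quad \text{and} \quad |f_n(y) - f_\infty(y)| < \varepsilon \quad \text{for all } y \in X.$$

Choose a particular $n > N$. Since $f_n$ is continuous at $x$, there is a $\delta > 0$ so that $|f_n(x) - f_n(y)| < \varepsilon$ for all $y$ with $d(x, y) < \delta$. Then for such $y$ in a $\delta$-ball around $x$,

$$\begin{aligned} |f_\infty(x) - f_\infty(y)| &= \big|[f_\infty(x) - f_n(x)] + [f_n(x) - f_n(y)] + [f_n(y) - f_\infty(y)]\big| \\ &\le |f_\infty(x) - f_n(x)| + |f_n(x) - f_n(y)| + |f_n(y) - f_\infty(y)| \\ &< \varepsilon + \varepsilon + \varepsilon = 3\varepsilon. \end{aligned}$$

So we have proved that for all $\varepsilon > 0$ and $x \in X$, there is a $\delta > 0$ so that $d(x, y) < \delta \implies |f_\infty(x) - f_\infty(y)| < 3\varepsilon$. This proves $f_\infty$ is continuous.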


The last bit of the proof can be remembered as this: any uniform limit of continuous functions is continuous.

Remark. The previous proposition works as well for functions whose range is the complex numbers $\mathbb{C}$, or a vector space $\mathbb{R}^n$, or in fact any Banach space $B$. The proof is the same. In this last case, we could refer to the Banach space $C^0(X; B)$ as the Banach space of bounded continuous functions from $X$ into $B$.

Consider an open set $\Omega \subset \mathbb{R}^n$. On $\Omega$, the $C^0$ norm is essentially the same as the $L^\infty$ norm. It is simpler to define, because we can consider functions as elements of $C^0$, while we need equivalence classes of functions to define $L^\infty$. In fact, more is true. Let $\Omega$ inherit the standard metric and Lebesgue measure from $\mathbb{R}^n$. For a measurable function $f : \Omega \to \mathbb{R}$, let $[f]$ be the equivalence class whose members are all functions from $\Omega \to \mathbb{R}$ which agree with $f$ almost everywhere.

Proposition 2. The map $\Phi : C^0(\Omega) \to L^\infty(\Omega)$ given by $\Phi(f) = [f]$ is one-to-one and preserves the norm.

Proof. First of all, note that it follows immediately from the definitions that for $f \in C^0(\Omega)$, $\Phi(f) \in L^\infty(\Omega)$. To show that $\Phi$ preserves the norm, we must also show that $\|f\|_{C^0} = \|\Phi(f)\|_{L^\infty}$.

The proof hinges on the simple fact that every full-measure subset $V$ of $\Omega$ is dense in $\Omega$. (Recall $V \subset \Omega$ has full measure if $\Omega \setminus V$ has Lebesgue measure zero.) This fact may be proved as follows. Let $V \subset \Omega$ have full measure. Then there is no open ball contained in $\Omega \setminus V$ (since open balls have positive measure). This shows $V$ is dense in $\Omega$. (Question: we need to use that $\Omega$ is an open subset of $\mathbb{R}^n$ in this paragraph. Where did we use that $\Omega$ is open?)

Now we prove the map $\Phi$ is injective. So suppose $f$ and $g$ are in $C^0(\Omega)$ and $[f] = [g]$. Then by definition, $f \equiv g$ on a set $V$ of full measure. Let $x \in \Omega$. Since $V$ is dense, there is a sequence $x_n \to x$ with $x_n \in V$. Then

$$f(x) = f\big(\lim_n x_n\big) = \lim_n f(x_n) = \lim_n g(x_n) = g\big(\lim_n x_n\big) = g(x),$$

since $f$ and $g$ are continuous and $f(x_n) = g(x_n)$. So $f$ and $g$ coincide at each point of $\Omega$, and so $f = g$ in $C^0(\Omega)$.

Finally, we show that for $f \in C^0(\Omega)$, $\|f\|_{C^0} = \|f\|_{L^\infty}$. Let $\mu$ denote Lebesgue measure and compute (recall we often write $\|f\|_{L^\infty}$ instead of the more correct $\|[f]\|_{L^\infty} = \|\Phi(f)\|_{L^\infty}$):

$$\|f\|_{L^\infty(\Omega)} = \inf\{a : |f(x)| \le a \text{ for almost every } x \in \Omega\} = \inf\{a : \mu\{x : |f(x)| > a\} = 0\}.$$

But $\mu\{x : |f(x)| > a\} = 0$ implies that $\{x : |f(x)| > a\} = \emptyset$. (Proof: if the set is not empty, it is an open subset of $\Omega$ since $|f|$ is continuous. The only open subset of $\Omega$ with measure zero is the empty set.) So now

$$\begin{aligned} \|f\|_{L^\infty(\Omega)} &= \inf\{a : \mu\{x : |f(x)| > a\} = 0\} \\ &= \inf\{a : \{x : |f(x)| > a\} = \emptyset\} \\ &= \inf\{a : |f(x)| \le a \text{ for all } x \in \Omega\} \\ &= \sup_{x \in \Omega} |f(x)| = \|f\|_{C^0(\Omega)}. \end{aligned}$$

Remark. The previous proposition is true for any measurable subset $\Omega$ of $\mathbb{R}^n$ with the following property: every nonempty (relatively) open subset of $\Omega$ has positive measure.

Remark. The map $\Phi$ from $C^0(\Omega)$ to $L^\infty(\Omega)$ is far from being onto. A typical discontinuous function $g$ cannot be changed on a set of measure zero to become continuous. The following homework problem is to show this is the case with the Heaviside function.

Homework Problem 2. Let $g(x)$ be the Heaviside function on $\mathbb{R}$. In other words, let $g(x) = 0$ if $x < 0$ and $g(x) = 1$ if $x \ge 0$.

(a) Show there is no function in $C^0(\mathbb{R})$ which is equal to $g$ almost everywhere.

(b) Show that there is no sequence of functions $f_n \in C^0(\mathbb{R})$ which satisfy $f_n \to g$ in $L^\infty(\mathbb{R})$.

Hint for (b): Show that if $f_n \to g$ in $L^\infty(\mathbb{R})$, then $\{f_n\}$ is a Cauchy sequence in $C^0(\mathbb{R})$. Then use Proposition 1 and show the resulting limit function $f_\infty \in C^0(\mathbb{R})$ must be equal to $g$ almost everywhere. (This amounts to showing that $\Phi(C^0)$ is a closed subspace of $L^\infty$.) Derive a contradiction.

1.3 Quantifiers

It is worth taking the time to look in some detail at $C^0$ convergence, and to compare it to pointwise convergence. As noted above, $C^0$ convergence is often called uniform convergence.


For a metric space $X$, $f_n \to f$ in $C^0(X)$ if for all $\varepsilon > 0$, there is an $N$ so that

$$n > N \implies \|f_n - f\|_{C^0(X)} < \varepsilon.$$

In other words, for all $\varepsilon > 0$, there is an $N$ so that

$$n > N \implies \sup_{x \in X} |f_n(x) - f(x)| < \varepsilon.$$

So then $f_n \to f$ in $C^0(X)$ implies that for all $\varepsilon > 0$, there is an $N$ so that for all $x \in X$,

$$n > N \implies |f_n(x) - f(x)| < \varepsilon.$$

A few easy manipulations in fact imply the following:

Lemma 3. Let $X$ be a metric space and let $f_n \in C^0(X)$. Then $f_n \to f$ in $C^0(X)$ if and only if for every $\varepsilon > 0$, there is an $N = N(\varepsilon)$ so that for all $x \in X$,

$$n > N \implies |f_n(x) - f(x)| < \varepsilon.$$

Homework Problem 3. Prove Lemma 3.

Since $C^0(X)$ is a Banach space, we know that the limit function $f \in C^0(X)$ as well, and thus the uniform limit of continuous functions is continuous. $C^0$ convergence is called uniform convergence because the $N$ in Lemma 3 depends only on $\varepsilon > 0$ and not on $x \in X$: thus $N$ is uniform over all $x \in X$.

We contrast this with pointwise convergence. If $f_n$ are functions on $X$, then $f_n \to f$ pointwise if for all $\varepsilon > 0$ and $x \in X$, there is an $N = N(\varepsilon, x)$ so that

$$n > N \implies |f_n(x) - f(x)| < \varepsilon.$$

The difference between pointwise and uniform convergence is subtle but very important. In pointwise convergence $N = N(\varepsilon, x)$ may depend on $\varepsilon$ and $x$, while in uniform convergence $N = N(\varepsilon)$ depends only on $\varepsilon$ and is independent of $x$.

We have belabored this point because it is one of the major issues in analysis: keeping track of which constants, or quantifiers, depend on which other quantifiers. (It is even better to have explicit bounds, or estimates, on the behavior of quantifiers with respect to each other.) Of course it is desirable (though not always possible) to have more uniform dependence of quantifiers, as we see in the following standard example:

We have seen that the uniform limit of continuous functions is continuous. On the other hand, a pointwise limit of continuous functions may not be:


Example 1. Consider $X = [0, 1]$ and $f_n(x) = x^n$. Then $f_n \to f$ pointwise on $[0, 1]$, where

$$f(x) = \begin{cases} 0 & \text{for } x \in [0, 1), \\ 1 & \text{for } x = 1. \end{cases}$$

So the pointwise limit $f$ is discontinuous, and thus we see that $f_n \not\to f$ uniformly.
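The failure of uniform convergence in Example 1 can also be seen numerically. Below is a minimal sketch; the helper names are our own, not from the notes. For each $n$, the point $x_n = (1/2)^{1/n}$ lies in $[0, 1)$ and satisfies $x_n^n = 1/2$. Hence $\sup_{[0,1]} |f_n - f| \ge 1/2$ for every $n$, while at any fixed $x < 1$ the values $x^n$ still tend to 0.

```python
def witness(n):
    """A point of [0, 1) where x**n is still 1/2, namely x = (1/2)**(1/n)."""
    return 0.5 ** (1.0 / n)


def pointwise_limit(x):
    """The pointwise limit f of Example 1: 0 on [0, 1), 1 at x = 1."""
    return 1.0 if x == 1.0 else 0.0


def gap_at_witness(n):
    """|f_n(x) - f(x)| at the witness point; a lower bound for the C^0 distance."""
    x = witness(n)
    return abs(x ** n - pointwise_limit(x))


for n in (1, 10, 100, 1000):
    # the gap stays at 1/2 (up to rounding), so f_n does not converge uniformly
    print(n, witness(n), gap_at_witness(n))

# at the fixed point x = 0.9 the values do go to 0: pointwise convergence
print(0.9 ** 1000)
```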

1.4 Derivatives

The theory of derivatives in one variable is fairly straightforward. If a function $f : \mathbb{R} \to \mathbb{R}$ is differentiable at $p$ (i.e., $f'(p)$ exists), then $f$ must be continuous at $p$. For functions of more than one variable, however, consider the following example:

Example 2. The function

$$f(x, y) = \begin{cases} \dfrac{xy}{x^2 + y^2} & \text{for } (x, y) \ne (0, 0), \\ 0 & \text{for } (x, y) = (0, 0), \end{cases}$$

has first partial derivatives everywhere but is not even continuous at $(0, 0)$.
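Here is a short numerical illustration of Example 2. It is a sketch: the Python function `f` simply transcribes the formula above. Both difference quotients at the origin vanish identically, while $f$ is constantly $1/2$ on the diagonal.

```python
def f(x, y):
    """The function of Example 2: xy / (x^2 + y^2) away from the origin, 0 at it."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)


# f vanishes on both coordinate axes, so the difference quotients at the
# origin are exactly 0 and both first partial derivatives there equal 0.
for h in (1e-1, 1e-4, 1e-8):
    print(h, (f(h, 0.0) - f(0.0, 0.0)) / h, (f(0.0, h) - f(0.0, 0.0)) / h)

# Along the diagonal f(t, t) = 1/2 for every t != 0, so f(t, t) does not
# tend to f(0, 0) = 0: f is not continuous at the origin.
for t in (1e-1, 1e-4, 1e-8):
    print(t, f(t, t))
```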

Even though $f$ has all its first partial derivatives at $(0, 0)$, we do not consider $f$ to be differentiable at $(0, 0)$. For functions of more than one variable, we introduce the following definition of differentiability, which is stronger than just the existence of all the partial derivatives. Instead of $\mathbb{R}$-valued functions, we consider the slightly more general case of maps from $\mathbb{R}^n$ to $\mathbb{R}^m$. A basic reference is Spivak, Calculus on Manifolds, Chapter 2.

Let $O \subset \mathbb{R}^n$ be a domain, and let $f = (f^1, \dots, f^m) : O \to \mathbb{R}^m$. Then $f$ is differentiable at a point $a \in O$ if there is a linear map $Df(a) : \mathbb{R}^n \to \mathbb{R}^m$ which satisfies

$$\lim_{h \to 0} \frac{|f(a + h) - f(a) - Df(a)(h)|}{|h|} = 0,$$

where $h \in \mathbb{R}^n$. $Df(a)$ is called the derivative, or total derivative, of $f$ at $a$.

Lemma 4. In terms of the standard bases of $\mathbb{R}^n$ and $\mathbb{R}^m$, $Df(a)$ is written as the Jacobian matrix

$$Df(a) = \left( \frac{\partial f^i}{\partial x^j}(a) \right), \qquad i = 1, \dots, m, \quad j = 1, \dots, n.$$

In particular, if $f$ is differentiable at $a$, then all the partial derivatives $\partial f^i / \partial x^j$ exist at $a$.

Proof. Write $Df(a)$ as the matrix $(\lambda^i_j)$. Also consider a path $h = (0, \dots, k, \dots, 0)$, where $k \to 0$ sits in the $j$th slot. (In other words, $h^l = \delta^l_j k$, where $\delta^l_j$ is the Kronecker delta, which is 1 if $l = j$ and 0 otherwise.) We also use Einstein's summation convention. In $n$-space, this convention requires that any repeated index appearing in both up and down positions (such as the $l$ in the computation below) is assumed to be summed from 1 to $n$. Compute

$$\begin{aligned} \frac{\partial f^i}{\partial x^j}(a) &= \lim_{k \to 0} \frac{f^i(a^1, \dots, a^j + k, \dots, a^n) - f^i(a)}{k} \\ &= \lim_{k \to 0} \frac{[f^i(a^1, \dots, a^j + k, \dots, a^n) - f^i(a) - \lambda^i_l h^l] + \lambda^i_l h^l}{k} \\ &= 0 + \lim_{k \to 0} \frac{\lambda^i_l \delta^l_j k}{k} \\ &= \lambda^i_l \delta^l_j = \lambda^i_j. \end{aligned}$$

The key step, going from the second to the third line, follows from the assumption that $f$ is differentiable at $a$. Since $|h| = |k|$, the bracketed term divided by $k$ tends to 0.
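Lemma 4 and the definition of $Df(a)$ can be checked numerically on a concrete map. Below is a minimal sketch, assuming a sample map $f(x, y) = (x^2 y, \sin x + y)$ of our own choosing; the helper names are hypothetical. `jacobian_fd` approximates each column $\partial f / \partial x^j$ by central difference quotients, a standard substitute for the one-sided quotient used in the proof. `remainder_ratio` evaluates the quotient whose limit defines differentiability.

```python
import math


def f(p):
    """Sample map R^2 -> R^2 (our own example): f(x, y) = (x^2 y, sin x + y)."""
    x, y = p
    return [x * x * y, math.sin(x) + y]


def exact_jacobian(p):
    """Hand-computed Jacobian matrix (df^i/dx^j) of the sample map."""
    x, y = p
    return [[2 * x * y, x * x], [math.cos(x), 1.0]]


def jacobian_fd(func, a, h=1e-6):
    """Approximate the Jacobian at a by central difference quotients, column by column."""
    n, m = len(a), len(func(a))
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        ap, am = list(a), list(a)
        ap[j] += h                      # step in the j-th slot only, as in Lemma 4
        am[j] -= h
        fp, fm = func(ap), func(am)
        for i in range(m):
            J[i][j] = (fp[i] - fm[i]) / (2 * h)
    return J


def remainder_ratio(func, J, a, h):
    """|f(a + h) - f(a) - J h| / |h|; differentiability says this tends to 0."""
    fa = func(a)
    fah = func([a[k] + h[k] for k in range(len(a))])
    r = [fah[i] - fa[i] - sum(J[i][j] * h[j] for j in range(len(a)))
         for i in range(len(fa))]
    return math.hypot(*r) / math.hypot(*h)


a = [1.0, 2.0]
print(jacobian_fd(f, a))
print(exact_jacobian(a))
for t in (1e-1, 1e-2, 1e-3):
    print(t, remainder_ratio(f, exact_jacobian(a), a, [t, -t]))
```

For this map the ratio shrinks roughly in proportion to $|h|$, as expected for a $C^2$ map.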

Another important result with essentially the same proof concerns directional derivatives. For a vector $v = (v^j) \in \mathbb{R}^n$, the directional derivative at $a$ in the direction $v$ is the vector

$$D_v f(a) = \lim_{t \to 0} \frac{f(a + tv) - f(a)}{t}.$$

(Note we do not require $|v| = 1$ to define the directional derivative.) We have the following lemma:

Lemma 5. If $f$ is differentiable at $a$, then the directional derivative $D_v f(a)$ exists and

$$D_v f(a) = v^j \frac{\partial f}{\partial x^j}(a).$$
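Lemma 5 can be tested the same way. This sketch uses our own example $f(x, y, z) = xy + e^z$ and hypothetical helper names. The limit defining $D_v f(a)$ is approximated by a symmetric difference quotient and compared with $v^j\, \partial f / \partial x^j(a)$. The vector $v$ is deliberately not a unit vector.

```python
import math


def f(p):
    """Sample function R^3 -> R (our own example): f(x, y, z) = x y + e^z."""
    x, y, z = p
    return x * y + math.exp(z)


def grad_f(p):
    """Hand-computed partial derivatives (df/dx, df/dy, df/dz)."""
    x, y, z = p
    return [y, x, math.exp(z)]


def directional_fd(func, a, v, t=1e-6):
    """Symmetric difference approximation of D_v f(a) = lim (f(a + t v) - f(a)) / t."""
    ap = [a[k] + t * v[k] for k in range(len(a))]
    am = [a[k] - t * v[k] for k in range(len(a))]
    return (func(ap) - func(am)) / (2 * t)


a = [1.0, -2.0, 0.5]
v = [3.0, 0.0, -1.0]                                    # |v| != 1 on purpose
lhs = directional_fd(f, a, v)
rhs = sum(vj * dj for vj, dj in zip(v, grad_f(a)))      # v^j df/dx^j (a)
print(lhs, rhs)
```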

As Example 2 above shows, the converse of Lemma 4 is not true without extra assumptions on the partial derivatives. The following proposition gives an easy criterion for a function to be differentiable:


Proposition 6. If $f = (f^1, \dots, f^m)$ has continuous first partial derivatives $\partial f^i / \partial x^j$ on a neighborhood of $a$, then $f$ is differentiable at $a$.

Proof. For a component function $f^i$, write

$$\begin{aligned} f^i(a + h) - f^i(a) &= f^i(a^1 + h^1, a^2, \dots, a^n) - f^i(a^1, a^2, \dots, a^n) \\ &\quad + f^i(a^1 + h^1, a^2 + h^2, \dots, a^n) - f^i(a^1 + h^1, a^2, \dots, a^n) \\ &\quad + \cdots + f^i(a^1 + h^1, a^2 + h^2, \dots, a^n + h^n) - f^i(a^1 + h^1, a^2 + h^2, \dots, a^{n-1} + h^{n-1}, a^n). \end{aligned}$$

Now consider the first term in terms of the function $f^i(x^1, a^2, \dots, a^n)$ of the first variable $x^1$ alone. The Mean Value Theorem shows that there is a $b^1$ between $a^1$ and $a^1 + h^1$ so that

$$f^i(a^1 + h^1, a^2, \dots, a^n) - f^i(a^1, a^2, \dots, a^n) = h^1 \frac{\partial f^i}{\partial x^1}(b^1, a^2, \dots, a^n).$$

Similarly, for all other terms the difference equals

$$h^j \frac{\partial f^i}{\partial x^j}(a^1 + h^1, \dots, a^{j-1} + h^{j-1}, b^j, a^{j+1}, \dots, a^n)$$

for $b^j$ between $a^j$ and $a^j + h^j$. So if we set $c_j = (a^1 + h^1, \dots, a^{j-1} + h^{j-1}, b^j, a^{j+1}, \dots, a^n)$, then we have

$$f^i(a + h) - f^i(a) = \sum_{j=1}^n h^j \frac{\partial f^i}{\partial x^j}(c_j),$$

where each $c_j \to a$ as $h \to 0$. So compute

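$$\begin{aligned} \lim_{h \to 0} \frac{\left| f^i(a + h) - f^i(a) - \sum_{j=1}^n \frac{\partial f^i}{\partial x^j}(a) h^j \right|}{|h|} &= \lim_{h \to 0} \frac{\left| \sum_{j=1}^n \left[ \frac{\partial f^i}{\partial x^j}(c_j) - \frac{\partial f^i}{\partial x^j}(a) \right] h^j \right|}{|h|} \\ &\le \lim_{h \to 0} \sum_{j=1}^n \left| \frac{\partial f^i}{\partial x^j}(c_j) - \frac{\partial f^i}{\partial x^j}(a) \right| \frac{|h^j|}{|h|} \\ &\le \lim_{h \to 0} \sum_{j=1}^n \left| \frac{\partial f^i}{\partial x^j}(c_j) - \frac{\partial f^i}{\partial x^j}(a) \right| = 0, \end{aligned}$$

since each $\partial f^i / \partial x^j$ is assumed to be continuous at $a$ (and $|h^j| \le |h|$). So we have proved that each component function $f^i$ is differentiable at $a$.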

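To show $f$ is differentiable, just note

$$\frac{\left| f(a + h) - f(a) - \frac{\partial f}{\partial x^j}(a) h^j \right|}{|h|} \le \sum_{i=1}^m \frac{\left| f^i(a + h) - f^i(a) - \frac{\partial f^i}{\partial x^j}(a) h^j \right|}{|h|},$$

which goes to 0 as $h \to 0$.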

Recall a function is (locally) $C^1$ if its first partial derivatives are continuous. The previous Proposition 6 shows that such functions are differentiable, and Lemma 5 then shows that directional derivatives work as expected for $C^1$ functions.

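Now, for functions $f$ on $\Omega$, an open subset of $\mathbb{R}^m$, consider the norm

$$\|f\|_{C^1(\Omega)} = \|f\|_{C^0(\Omega)} + \sum_{i=1}^m \left\| \frac{\partial f}{\partial x^i} \right\|_{C^0(\Omega)}$$

and the space

$$C^1(\Omega) = \{ f : \Omega \to \mathbb{R} : f, \partial_1 f, \dots, \partial_m f \text{ are bounded and continuous} \}.$$

Similarly, we can consider $\mathbb{R}^p$-valued $C^1$ functions, the difference being that the functions $f$, $\partial_i f$ have bounded values in $\mathbb{R}^p$.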

Proposition 7. On any open set $\Omega \subset \mathbb{R}^m$, $C^1(\Omega, \mathbb{R}^p)$ is a Banach space.

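Proof. It is straightforward to check that $\|\cdot\|_{C^1}$ is a norm. Note that $\|f\|_{C^1} \ge \|f\|_{C^0}$ and $\|f\|_{C^1} \ge \|\partial f / \partial x^j\|_{C^0}$. Hence for any Cauchy sequence $\{f_n\}$ in $C^1$, the sequences $\{f_n\}$ and $\{\partial f_n / \partial x^j\}$ are Cauchy sequences in $C^0$. Therefore, since $C^0$ is a Banach space, there are uniform limits

$$f_\infty = \lim_n f_n, \qquad g_i = \lim_n \frac{\partial f_n}{\partial x^i}, \quad i = 1, \dots, m, \tag{1}$$

and $f_\infty, g_i \in C^0$. Since

$$\|f\|_{C^1} = \|f\|_{C^0} + \sum_{i=1}^m \left\| \frac{\partial f}{\partial x^i} \right\|_{C^0},$$

(1) shows it suffices to prove that

$$\frac{\partial f_\infty}{\partial x^i} = g_i, \qquad i = 1, \dots, m.$$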

As usual, we recognize that integrating has better properties than differentiating. For $x \in \Omega$, choose $x_0 = x - (0, \dots, k, \dots, 0)$, where $k > 0$ is in the $i$th slot. Since $\Omega$ is open, we may choose $k$ small enough so that the line segment from $x_0$ to $x$ is contained in $\Omega$. Compute

$$\begin{aligned} f_\infty(x) &= \lim_n f_n(x) \\ &= \lim_n \left[ f_n(x_0) + \int_{y = x^i - k}^{y = x^i} \frac{\partial f_n}{\partial x^i}(x^1, \dots, x^{i-1}, y, x^{i+1}, \dots, x^m) \, dy \right] \\ &= f_\infty(x_0) + \int_{y = x^i - k}^{y = x^i} g_i(x^1, \dots, x^{i-1}, y, x^{i+1}, \dots, x^m) \, dy. \end{aligned} \tag{2}$$

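The key step in the computation is the last one. The convergence $f_n(x_0) \to f_\infty(x_0)$ is easy, and the integral converges by the Dominated Convergence Theorem. Since $g_i \in C^0$, there is a constant $C$ so that $|g_i| \le C$ on $\Omega$. Moreover, since $\partial f_n / \partial x^i \to g_i$ in $C^0$, there is an $N$ so that $|\partial f_n / \partial x^i - g_i| \le 1$ for all $n \ge N$. Thus the functions $\partial f_n / \partial x^i$ with $n \ge N$ are all bounded by the constant $C + 1$, which is integrable over the bounded interval of integration, and the Dominated Convergence Theorem applies.

Now keep $x_0$ fixed. The same computation applies at every point $x + t e_i$ with $|t|$ small (where $e_i$ is the $i$th standard basis vector), so in (2) only the upper limit of integration varies with $x^i$. Since $g_i$ is continuous, the Fundamental Theorem of Calculus lets us differentiate (2) with respect to $x^i$. We see that $\partial f_\infty / \partial x^i = g_i$ at each $x \in \Omega$. This completes the proof.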

The last part of the proof is of independent interest. We record it as

Proposition 8. Let $f_n$ be $C^1$ functions on a domain $\Omega \subset \mathbb{R}^m$. If $f_n \to f$ uniformly and $\partial f_n / \partial x^i \to g_i$ uniformly for $i = 1, \dots, m$, then $g_i = \partial f / \partial x^i$.
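Proposition 8 needs the derivatives to converge uniformly as well, because uniform convergence of $f_n$ alone gives no control over $\partial f_n / \partial x^i$. A standard example (our own choice, not from the notes) is $f_n(x) = \sin(nx)/n$ on $[0, 2\pi]$. Here $\|f_n\|_{C^0} = 1/n \to 0$, but $f_n'(x) = \cos(nx)$ has $C^0$ norm 1 for every $n$ and does not converge. The sketch below approximates both norms by maxima over a grid. This is an approximation assumption: a grid maximum only approaches the supremum from below.

```python
import math


def c0_norm_on_grid(func, a, b, samples=20001):
    """Approximate sup over [a, b] of |func| by its maximum on an evenly spaced grid
    (a grid maximum can only underestimate the true supremum)."""
    return max(abs(func(a + (b - a) * k / (samples - 1))) for k in range(samples))


def norms(n):
    """Grid approximations of ||f_n||_{C^0} and ||f_n||_{C^1} on [0, 2 pi]
    for f_n(x) = sin(n x) / n, whose derivative is f_n'(x) = cos(n x)."""
    fn = lambda x: math.sin(n * x) / n
    dfn = lambda x: math.cos(n * x)
    c0 = c0_norm_on_grid(fn, 0.0, 2 * math.pi)
    return c0, c0 + c0_norm_on_grid(dfn, 0.0, 2 * math.pi)


for n in (1, 10, 100):
    print(n, norms(n))   # C^0 norm is about 1/n, C^1 norm stays above 1
```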

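Remark. We can also define $C^k(\Omega, \mathbb{R}^p)$ to be the space of all functions $f : \Omega \to \mathbb{R}^p$ so that $f$ and all its partial derivatives up to order $k$ are continuous and bounded. The norm is given by

$$\|f\|_{C^k} = \sum_{|\alpha| \le k} \|\partial^\alpha f\|_{C^0}, \tag{3}$$

where $\alpha = (\alpha_1, \dots, \alpha_m)$, each $\alpha_i \ge 0$, $|\alpha| = \alpha_1 + \cdots + \alpha_m$, and

$$\partial^\alpha f = \frac{\partial^{|\alpha|} f}{(\partial x^1)^{\alpha_1} \cdots (\partial x^m)^{\alpha_m}}$$

(if some $\alpha_i = 0$, then there is no differentiation with respect to $x^i$). We can use the same proof as above to conclude that $C^k$ is a Banach space. In particular, we can apply Proposition 7 to $F = (f, f_{,1}, \dots, f_{,m})$, where $f_{,i} = \partial f / \partial x^i$. Relating $\|F\|_{C^1}$ to $\|f\|_{C^2}$ then provides an inductive step.

$C^\infty$ is not a Banach space under a norm of this type, as the analog of (3) would involve an infinite sum.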

We’ve used the following problem implicitly a few times above.

Homework Problem 4. Show that if $f : \mathbb{R}^n \to \mathbb{R}^m$ is differentiable at a point $a$, then it is continuous at $a$.

Homework Problem 5. Let $f$ be a real-valued function defined on a domain in $\mathbb{R}^2$. Show that if the second mixed partials $f_{,12} = \dfrac{\partial^2 f}{\partial x^1 \partial x^2}$ and $f_{,21} = \dfrac{\partial^2 f}{\partial x^2 \partial x^1}$ are continuous in a neighborhood of a point $y$, then

$$\frac{\partial^2 f}{\partial x^1 \partial x^2}(y) = \frac{\partial^2 f}{\partial x^2 \partial x^1}(y).$$

Hint: If the two are not equal, assume without loss of generality that the difference $f_{,12} - f_{,21} > 0$ at $y$. Then it must be positive on a rectangular neighborhood. Integrate this quantity over the rectangular neighborhood, and use Fubini's Theorem and the Fundamental Theorem of Calculus to arrive at a contradiction.

Finally, we introduce the Chain Rule. We need the following lemma first:

Lemma 9. Let $A : \mathbb{R}^n \to \mathbb{R}^m$ be a linear map. Then there is a constant $C = C(A)$ so that $|Ax| \le C|x|$ for all $x \in \mathbb{R}^n$.

Homework Problem 6. Prove Lemma 9. Hint: write down $Ax$ in terms of the matrix entries of $A$.

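Proposition 10 (Chain Rule). Let $g : O \to \mathbb{R}^n$ and $f : U \to O$, where $O \subset \mathbb{R}^m$ and $U \subset \mathbb{R}^l$ are domains. Assume $f$ is differentiable at $a \in U$, and $g$ is differentiable at $f(a) \in O$. Then $g \circ f$ is differentiable at $a$, and its derivative is the composition of linear maps

$$D(g \circ f)(a) = Dg(f(a)) \circ Df(a).$$

In terms of partial derivatives, this is equivalent to

$$\frac{\partial g^p}{\partial x^i} = \frac{\partial g^p}{\partial y^j} \frac{\partial y^j}{\partial x^i},$$

where $x^i$ are coordinates on $\mathbb{R}^l$, $y^j$ are coordinates on $\mathbb{R}^m$, and we follow the usual rules of Leibniz notation and Einstein summation.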


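Proof. Let $A = Df(a)$ and $B = Dg(f(a))$. Now consider the remainder terms in the definition of differentiable maps. For $h \in \mathbb{R}^l$ and $k \in \mathbb{R}^m$, let

$$\begin{aligned} \varphi(h) &= f(a + h) - f(a) - A(h), \\ \psi(k) &= g(f(a) + k) - g(f(a)) - B(k), \\ \rho(h) &= (g \circ f)(a + h) - (g \circ f)(a) - (B \circ A)(h). \end{aligned}$$

Then since $f$ and $g$ are differentiable,

$$\lim_{h \to 0} \frac{|\varphi(h)|}{|h|} = 0, \tag{4}$$

$$\lim_{k \to 0} \frac{|\psi(k)|}{|k|} = 0, \tag{5}$$

and we want to show that

$$\lim_{h \to 0} \frac{|\rho(h)|}{|h|} = 0.$$

So compute

$$\begin{aligned} \rho(h) &= g(f(a + h)) - g(f(a)) - B(A(h)) \\ &= g(f(a + h)) - g(f(a)) - B\big(f(a + h) - f(a) - \varphi(h)\big) \\ &= \big[g(f(a + h)) - g(f(a)) - B\big(f(a + h) - f(a)\big)\big] + B(\varphi(h)) \\ &= \psi\big(f(a + h) - f(a)\big) + B(\varphi(h)). \end{aligned}$$

So then

$$\frac{|\rho(h)|}{|h|} \le \frac{|\psi(f(a + h) - f(a))|}{|h|} + \frac{|B(\varphi(h))|}{|h|}.$$

The term $|B(\varphi(h))|/|h| \to 0$ as $h \to 0$ by (4) and Lemma 9. On the other hand, (5) shows that for all $\varepsilon > 0$ there is a $\delta > 0$ so that

$$|k| < \delta \implies |\psi(k)| \le \varepsilon |k|.$$

Suppose $|f(a + h) - f(a)| < \delta$. This can be achieved by requiring $|h| < \gamma$ for some $\gamma > 0$, since $f$ is continuous at $a$ by Homework Problem 4. Then

$$\frac{|\psi(f(a + h) - f(a))|}{|h|} \le \varepsilon \frac{|f(a + h) - f(a)|}{|h|} \le \varepsilon \left( \frac{|A(h)|}{|h|} + \frac{|\varphi(h)|}{|h|} \right).$$

Now if we let $h \to 0$, using (4) and Lemma 9 (with $C = C(A)$),

$$\limsup_{h \to 0} \frac{|\psi(f(a + h) - f(a))|}{|h|} \le C\varepsilon.$$

Now we may let $\varepsilon \to 0$ to show that $|\rho(h)|/|h| \to 0$ as $h \to 0$.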
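The Chain Rule can be sanity-checked numerically. Below is a minimal sketch with sample maps $f : \mathbb{R}^2 \to \mathbb{R}^2$ and $g : \mathbb{R}^2 \to \mathbb{R}$ of our own choosing, so that $l = m = 2$ and $n = 1$. The helper names are hypothetical. The product of the hand-computed Jacobians $Dg(f(a))\,Df(a)$ is compared with a finite-difference Jacobian of $g \circ f$.

```python
import math


def f(p):
    """Sample f : R^2 -> R^2 (our own example): f(x1, x2) = (x1 x2, x1 + x2^2)."""
    x1, x2 = p
    return [x1 * x2, x1 + x2 ** 2]


def Df(p):
    x1, x2 = p
    return [[x2, x1], [1.0, 2 * x2]]


def g(q):
    """Sample g : R^2 -> R^1: g(y1, y2) = sin(y1) y2, returned as a 1-vector."""
    y1, y2 = q
    return [math.sin(y1) * y2]


def Dg(q):
    y1, y2 = q
    return [[math.cos(y1) * y2, math.sin(y1)]]


def compose(B, A):
    """Matrix of the linear map B o A: entries sum_j B[p][j] A[j][i]."""
    return [[sum(B[p][j] * A[j][i] for j in range(len(A))) for i in range(len(A[0]))]
            for p in range(len(B))]


def jacobian_fd(func, a, h=1e-6):
    """Central-difference approximation of the Jacobian of func at a."""
    cols = []
    for j in range(len(a)):
        ap, am = list(a), list(a)
        ap[j] += h
        am[j] -= h
        cols.append([(u - v) / (2 * h) for u, v in zip(func(ap), func(am))])
    return [[cols[j][i] for j in range(len(a))] for i in range(len(cols[0]))]


a = [0.3, -1.2]
chain = compose(Dg(f(a)), Df(a))                 # Dg(f(a)) o Df(a)
direct = jacobian_fd(lambda x: g(f(x)), a)       # D(g o f)(a), numerically
print(chain, direct)
```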

1.5 Contraction mappings

Another tool we need is a basic fact about complete metric spaces, the Contraction Mapping Theorem.

A fixed point of a map $f : X \to X$ is a point $x \in X$ so that $f(x) = x$. For a metric space $X$ with metric $d$, a contraction map is a map $g : X \to X$ so that there is a constant $\lambda \in (0, 1)$ for which

$$d(g(x), g(y)) \le \lambda \, d(x, y) \quad \text{for all } x, y \in X.$$

Remark. It is important that the constant $\lambda < 1$ is independent of $x$ and $y$ in $X$. As we'll see below in a homework exercise, the following theorem is false if we let $\lambda$ depend on $x$ and $y$.

Theorem 2 (Contraction Mapping). Any contraction mapping on a complete metric space has a unique fixed point.

Proof. As above, denote our metric space by $X$ with metric $d$, and let $\lambda \in (0, 1)$ be the constant for the contraction map $g$: for all $x, y \in X$, $d(g(x), g(y)) \le \lambda \, d(x, y)$.

First we prove uniqueness. If $x$ and $y$ are fixed points of $g$ (so $g(x) = x$ and $g(y) = y$), then

$$d(x, y) = d(g(x), g(y)) \le \lambda \, d(x, y).$$

So $(1 - \lambda) d(x, y) \le 0$. Since $\lambda < 1$ and $d(x, y) \ge 0$ (since $X$ is a metric space), we must have $d(x, y) = 0$, and so $x = y$ (again since $X$ is a metric space).

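To prove existence of the fixed point, we consider any point $x_0 \in X$, and consider iterates defined inductively by $x_{n+1} = g(x_n)$ for all $n \ge 0$. We claim $\{x_n\}$ is a Cauchy sequence and the limit $x_\infty$ of $\{x_n\}$ is the fixed point. For $n \ge m \ge 0$, compute

$$\begin{aligned} d(x_n, x_m) &\le d(x_n, x_{n-1}) + \cdots + d(x_{m+1}, x_m) \\ &= d(g(x_{n-1}), g(x_{n-2})) + \cdots + d(g(x_m), g(x_{m-1})) \\ &\le \lambda \, d(x_{n-1}, x_{n-2}) + \cdots + \lambda \, d(x_m, x_{m-1}) \\ &\le \lambda^2 d(x_{n-2}, x_{n-3}) + \cdots + \lambda^2 d(x_{m-1}, x_{m-2}) \\ &\le \lambda^{n-1} d(x_1, x_0) + \cdots + \lambda^m d(x_1, x_0) \\ &= d(x_1, x_0) \sum_{i=m}^{n-1} \lambda^i \le d(x_1, x_0) \sum_{i=m}^{\infty} \lambda^i = d(x_1, x_0) \frac{\lambda^m}{1 - \lambda}. \end{aligned}$$

(In other words, each term satisfies $d(x_{j+1}, x_j) \le \lambda^j d(x_1, x_0)$, by induction on $j$.) Note that in this computation we've used the exact sum of the geometric series. It is crucial that $\lambda \in (0, 1)$, since the geometric series diverges for $\lambda \ge 1$. So if $N$ is a positive integer, then for all $n, m > N$, $d(x_n, x_m) \le d(x_1, x_0) \lambda^N / (1 - \lambda)$. This last quantity tends to 0 as $N \to \infty$. Thus $\{x_n\}$ is a Cauchy sequence, which has a limit $x_\infty \in X$ since $X$ is a complete metric space.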

Now we prove that $x_\infty$ is a fixed point. Since $x_\infty = \lim_i x_i = \lim_i x_{i+1}$, we have

$$g(x_\infty) = g\big(\lim_i x_i\big) = \lim_i g(x_i) = \lim_i x_{i+1} = x_\infty,$$

and so $x_\infty$ is a fixed point. One point to note is that we have interchanged $g$ with $\lim$, which is justified because $g$ is continuous (this is a homework problem below).
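The existence proof is constructive, and its Cauchy estimate gives an explicit stopping rule. Below is a minimal sketch using our own example, not one from the notes. On the complete metric space $X = [0, 1]$, $g = \cos$ is a contraction with $\lambda = \sin 1 < 1$. Indeed, $\cos$ maps $[0, 1]$ into $[\cos 1, 1] \subset [0, 1]$, and $|\cos y - \cos z| \le \sin(1)\,|y - z|$ there by the Mean Value Theorem. The function name `iterate_contraction` is hypothetical.

```python
import math


def iterate_contraction(g, x0, lam, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = g(x_n) starting at x0.

    Uses the bound from the proof: d(x_n, x_m) <= d(x_1, x_0) lam**m / (1 - lam)
    for n >= m, hence |x_inf - x_m| <= d(x_1, x_0) lam**m / (1 - lam).
    Returns (x_m, m) for the first m at which that bound is <= tol.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("a contraction constant must lie in (0, 1)")
    d10 = abs(g(x0) - x0)
    x = x0
    for m in range(max_iter):
        if d10 * lam ** m / (1.0 - lam) <= tol:
            return x, m
        x = g(x)
    raise RuntimeError("tolerance not reached within max_iter steps")


# cos maps [0, 1] into [cos 1, 1], and |cos y - cos z| <= sin(1) |y - z| there.
x_star, steps = iterate_contraction(math.cos, 0.0, math.sin(1.0))
print(x_star, steps, abs(math.cos(x_star) - x_star))
```

The a priori bound is pessimistic here. The true contraction rate near the fixed point is about $|\sin x_\infty| \approx 0.67$, so the iterates are already more accurate than the bound guarantees.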

Homework Problem 7. Show any contraction map is continuous.

Homework Problem 8. Newton's method is an iterative method for finding zeros of differentiable functions. For an initial $x_0$, we proceed by the recursive definition

$$x_{i+1} = x_i - \frac{f(x_i)}{f'(x_i)}.$$

Then the limit $\lim x_n$ should produce a zero of the function $f$.

A differentiable function $f : \mathbb{R} \to \mathbb{R}$ has a nondegenerate zero at $x$ if $f(x) = 0$ and $f'(x) \ne 0$.

Assume $f : \mathbb{R} \to \mathbb{R}$ is a locally $C^2$ function (i.e., $f''$ is continuous on all of $\mathbb{R}$). Show that every nondegenerate zero $x$ of $f$ has a neighborhood $N_x$ so that for any initial $x_0 \in N_x$, Newton's method converges to $x$. Hints:

(a) The main point is to exhibit the Newton iteration as a contraction map on a complete metric space (recall a closed subset of any complete metric space is complete). You must find an appropriately small neighborhood of $x$ on whose closure Newton's method is a contraction map.

(b) You will need the following lemma: for a $C^1$ function $g : \mathbb{R} \to \mathbb{R}$,

$$y \ne z \in [a, b] \implies \frac{|g(y) - g(z)|}{|y - z|} \le \max_{w \in [a, b]} |g'(w)|.$$

(c) Show that any fixed point of Newton's method is a zero.

(d) Show the zero you have produced via Newton's method must be the original zero $x$.
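For intuition on Homework Problem 8 (this is not a solution of the problem), here is a minimal sketch of the iteration. It uses the sample function $f(x) = x^2 - 2$, our own choice, whose zeros $\pm\sqrt{2}$ are nondegenerate. The starting point determines which zero the iteration finds. This is why the problem asks for a neighborhood $N_x$ of one particular zero.

```python
import math


def newton(f, fprime, x0, tol=1e-14, max_iter=50):
    """Newton's method x_{i+1} = x_i - f(x_i) / f'(x_i); stops once a step is below tol."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / fprime(x)
        x -= step
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge within max_iter steps")


def f(x):
    """Sample function with nondegenerate zeros at +sqrt(2) and -sqrt(2)."""
    return x * x - 2.0


def fprime(x):
    return 2.0 * x


print(newton(f, fprime, 1.0))    # converges to +sqrt(2)
print(newton(f, fprime, -1.0))   # a start near the other zero converges to -sqrt(2)
```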

1.6 Differentiating under the Integral

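Proposition 11. Let $f = f(y, x)$ be a locally $C^1$ real-valued function for $y \in \mathbb{R}^n$ and $x \in O$, an open subset of $\mathbb{R}^m$. Then on a measurable $\Omega \subset\subset O \subset \mathbb{R}^m$ equipped with Lebesgue measure $dx$,

$$\frac{\partial}{\partial y^i} \int_\Omega f(y, x) \, dx = \int_\Omega \frac{\partial f}{\partial y^i}(y, x) \, dx.$$

Moreover, $\int_\Omega f(y, x) \, dx$ is $C^1$ as a function of $y$.

Remark. $\Omega \subset\subset O$ means that the closure $\bar{\Omega}$ in $\mathbb{R}^m$ is a compact subset of $O$.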

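Proof. Let $e_i$ be the standard $i$th basis vector in $\mathbb{R}^n$, and compute

$$\frac{\partial}{\partial y^i} \int_\Omega f(y, x) \, dx = \lim_{k \to 0} \frac{1}{k} \left[ \int_\Omega f(y + k e_i, x) \, dx - \int_\Omega f(y, x) \, dx \right] = \lim_{k \to 0} \int_\Omega \frac{f(y + k e_i, x) - f(y, x)}{k} \, dx.$$

Clearly as $k \to 0$, the integrand goes to $\dfrac{\partial f}{\partial y^i}(y, x)$ pointwise. To use the Dominated Convergence Theorem, we need to show that the integrands are bounded in absolute value by a fixed integrable function. This follows from the Mean Value Theorem, which shows that the integrand is equal to

$$\frac{\partial f}{\partial y^i}(\tilde{y}, x)$$

for $\tilde{y} = (y^1, \dots, y^{i-1}, b^i, y^{i+1}, \dots, y^n)$, with $b^i$ between $y^i$ and $y^i + k$. Since $f$ is $C^1$, $\partial f / \partial y^i$ is continuous. Moreover, $\bar{\Omega}$ is compact, and $\tilde{y}$ stays in a compact neighborhood of $y$ (say for $|k| \le 1$). So the absolute value of the integrand is bounded by a constant $M$. Since $\int_\Omega M \, dx < \infty$ (as $\Omega$ is bounded), the Dominated Convergence Theorem shows that

$$\begin{aligned} \frac{\partial}{\partial y^i} \int_\Omega f(y, x) \, dx &= \lim_{k \to 0} \int_\Omega \frac{f(y + k e_i, x) - f(y, x)}{k} \, dx \\ &= \int_\Omega \lim_{k \to 0} \frac{f(y + k e_i, x) - f(y, x)}{k} \, dx \\ &= \int_\Omega \frac{\partial f}{\partial y^i}(y, x) \, dx. \end{aligned}$$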

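To show that $\int_\Omega f(y, x) \, dx$ is $C^1$ as a function of $y$, note that its partial derivatives

$$g_i(y) = \int_\Omega \frac{\partial f}{\partial y^i}(y, x) \, dx$$

are continuous in $y$ by the Dominated Convergence Theorem again, since if $y \to y_0$, then

$$\begin{aligned} \lim_{y \to y_0} g_i(y) &= \lim_{y \to y_0} \int_\Omega \frac{\partial f}{\partial y^i}(y, x) \, dx \\ &= \int_\Omega \lim_{y \to y_0} \frac{\partial f}{\partial y^i}(y, x) \, dx \\ &= \int_\Omega \frac{\partial f}{\partial y^i}(y_0, x) \, dx \\ &= g_i(y_0) \end{aligned}$$

because $\partial f / \partial y^i$ is continuous in $y$.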

Remark. The last argument also shows the following. Suppose $f = f(z, x)$ is a continuous function of $z$ and $x$, with $x \in \Omega$, a compact subset of $\mathbb{R}^n$. Then the function

$$z \mapsto \int_\Omega f(z, x) \, dx$$

is continuous.
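Here is a numerical sanity check of Proposition 11. It is a sketch, and the example and the Simpson-rule helper are our own assumptions. Take $n = m = 1$, $O = \mathbb{R}$, $\Omega = [0, 1]$ and $f(y, x) = \sin(yx)$. Then $\int_0^1 \sin(yx)\,dx = (1 - \cos y)/y$ for $y \ne 0$, and the proposition predicts that its derivative in $y$ equals $\int_0^1 x \cos(yx)\,dx$. The code compares a difference quotient of the integral with the integral of the derivative.

```python
import math


def simpson(func, a, b, n=1000):
    """Composite Simpson's rule for the integral of func over [a, b]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3


def F(y):
    """F(y) = integral over [0, 1] of f(y, x) = sin(y x) dx."""
    return simpson(lambda x: math.sin(y * x), 0.0, 1.0)


def F_prime_under_integral(y):
    """Integral over [0, 1] of (df/dy)(y, x) = x cos(y x) dx."""
    return simpson(lambda x: x * math.cos(y * x), 0.0, 1.0)


y0, k = 0.7, 1e-5
difference_quotient = (F(y0 + k) - F(y0 - k)) / (2 * k)    # derivative of the integral
print(difference_quotient, F_prime_under_integral(y0))      # integral of the derivative
```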


1.7 The Inverse Function Theorem

We need the following lemma first:

Lemma 12. If $f$ is a $C^1$ function from a ball $B$ in $\mathbb{R}^n$ to $\mathbb{R}^m$ which satisfies

$$\left| \frac{\partial f^i}{\partial x^j} \right| \le C$$

on $B$, then for $y, z \in B$,

$$|f(y) - f(z)| \le Cmn|y - z|.$$

Proof. If $y, z \in B$, then the line segment $\{ty + (1 - t)z : 0 \le t \le 1\}$ between them is also contained in $B$ (see Homework Problem 13 below). Then use the Chain Rule to compute, for $i = 1, \dots, m$,

$$|f^i(y) - f^i(z)| = \left| \int_0^1 \frac{\partial}{\partial t} f^i(ty + (1 - t)z) \, dt \right| = \left| \int_0^1 (y^j - z^j) \frac{\partial f^i}{\partial x^j}(ty + (1 - t)z) \, dt \right| \le Cn|y - z|.$$

(Note this argument is essentially the same as the use of the Mean Value Theorem.) Now apply

$$|f(y) - f(z)| \le \sum_{i=1}^m |f^i(y) - f^i(z)|.$$

Theorem 3 (Inverse Function Theorem). Let $f : O \to U$ be a $C^1$ map between domains in $\mathbb{R}^m$. Assume that for $a \in O$, $Df(a)$ is an invertible matrix (i.e., $\det Df(a) \ne 0$). Then there are neighborhoods $O' \ni a$ and $U' \ni f(a)$ so that $f : O' \to U'$ is a bijection and $f^{-1}$ is also a $C^1$ map. For every $b \in O'$, $D(f^{-1})(f(b)) = (Df(b))^{-1}$.

Proof. First of all, we may reduce to the case that $a = f(a) = 0$ and $Df(a) = I$, the identity map from $\mathbb{R}^m$ to itself. (This can be achieved by replacing $f(x)$ by $(Df(a))^{-1}(f(x + a) - f(a))$. Then use the Chain Rule and the fact that the derivative of the linear map $(Df(a))^{-1}$ is $(Df(a))^{-1}$ itself.)


Now consider $g(x) = x - f(x)$ and note that $Dg(0) = 0$, the zero linear transformation. Since $g$ is $C^1$, there is an $r > 0$ so that $|x| < 2r$ implies

$$\left| \frac{\partial g^i}{\partial x^j}(x) \right| < \frac{1}{2m^2}, \qquad i, j = 1, \dots, m. \tag{6}$$

Let $B(r) = \{x \in \mathbb{R}^m : |x| < r\}$. Then Lemma 12 and $g(0) = 0$ imply that $g(B(r)) \subset B(r/2)$.

Now let $y \in B(r/2)$ and consider

$$g_y(x) = g(x) + y = x - f(x) + y.$$

Then

• $g_y(x) = x$ is equivalent to $f(x) = y$, and so a fixed point of $g_y$ is equivalent to a solution to $f(x) = y$.

• Let $x$ lie in the closed ball $\bar{B}(r) = \{x : |x| \le r\}$. Lemma 12 (applied on $B(2r)$) gives $|g(x)| \le r/2$, so $|g_y(x)| \le |g(x)| + |y| < r$. Thus $g_y$ maps the complete metric space $\bar{B}(r)$ (a closed subset of $\mathbb{R}^m$) to itself, and in fact into $B(r)$.

• Lemma 12 and (6) imply $g_y$ is a contraction map (with $\lambda = 1/2$). In other words, for $x_1, x_2 \in \bar{B}(r)$,

$$|g_y(x_1) - g_y(x_2)| = |g(x_1) - g(x_2)| \le \tfrac{1}{2} |x_1 - x_2|. \tag{7}$$

Therefore, for each $y \in B(r/2)$, there is a unique fixed point $x$ of $g_y$ in $\bar{B}(r)$, and it lies in $B(r)$. This shows there is a unique solution $x$ to $f(x) = y$ in $B(r)$.

Now we show $x = f^{-1}(y)$ is continuous. For $x_1, x_2 \in B(r)$, we have, by the definition $g = x - f$ and (7),

$$\begin{aligned} |x_1 - x_2| &\le |g(x_1) - g(x_2)| + |f(x_1) - f(x_2)| \le \tfrac{1}{2} |x_1 - x_2| + |f(x_1) - f(x_2)|, \\ \tfrac{1}{2} |x_1 - x_2| &\le |f(x_1) - f(x_2)|, \\ |f^{-1}(y_1) - f^{-1}(y_2)| &\le 2 |y_1 - y_2| \end{aligned} \tag{8}$$

for $y_i = f(x_i)$. Thus $f^{-1}$ is continuous.

To show $f^{-1}$ is differentiable at $y_2$ with total derivative $(Df(x_2))^{-1}$, we need to show that

$$\lim_{y_1 \to y_2} \frac{|f^{-1}(y_1) - f^{-1}(y_2) - (Df(x_2))^{-1}(y_1 - y_2)|}{|y_1 - y_2|} = 0.$$


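To show this, compute

$$\begin{aligned} |f^{-1}(y_1) - f^{-1}(y_2) - (Df(x_2))^{-1}(y_1 - y_2)| &= |x_1 - x_2 - (Df(x_2))^{-1}(f(x_1) - f(x_2))| \\ &= |(Df(x_2))^{-1}[Df(x_2)(x_1 - x_2) - (y_1 - y_2)]| \\ &\le C |Df(x_2)(x_1 - x_2) - (y_1 - y_2)| \qquad \text{(by Lemma 9)} \\ &= C |Df(x_2)(x_1 - x_2) - [f(x_1) - f(x_2)]|. \end{aligned} \tag{9}$$

Therefore,

$$\frac{|f^{-1}(y_1) - f^{-1}(y_2) - (Df(x_2))^{-1}(y_1 - y_2)|}{|y_1 - y_2|} = \frac{|f^{-1}(y_1) - f^{-1}(y_2) - (Df(x_2))^{-1}(y_1 - y_2)|}{|x_1 - x_2|} \cdot \frac{|x_1 - x_2|}{|y_1 - y_2|}.$$

(Note $y_1 \ne y_2$ implies $x_1 \ne x_2$ since $y_i = f(x_i)$.) This expression goes to zero as $y_1 \to y_2$ by (8) and (9), since $f$ is differentiable at $x_2$. Here we also use that $x_1 \to x_2$ as $y_1 \to y_2$, by (8).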

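Finally we show the total derivative $(Df(x))^{-1}$ is continuous in $y$. We can think of $Df$ as a map from $x$ to $\mathbb{R}^{m^2}$, which represents the space of $m \times m$ matrices. $Df(x)$ is continuous in $x$ (since $f$ is $C^1$), and thus is continuous in $y$ (since $x = f^{-1}(y)$ is continuous in $y$). The determinant function $\det : \mathbb{R}^{m^2} \to \mathbb{R}$ is continuous, since it is a polynomial in the matrix entries. Moreover, $\det Df \ne 0$ on the closed ball $\bar{B}(r)$. Indeed, by (6), $|Dg(x)h| \le \frac{1}{2}|h|$ for $|x| \le r$, so $Df(x)h = h - Dg(x)h \ne 0$ for $h \ne 0$. So $\det Df(x)$ is bounded away from zero, by compactness of $\bar{B}(r)$. We are left to prove the continuity of the matrix inverse operation for square matrices with determinant bounded away from 0. This follows from the formula for the inverse in terms of cofactor matrices: each entry of the inverse matrix $A^{-1} = (a_{ij})^{-1}$ is of the form

$$\frac{(m-1)\text{st-order polynomial in the } a_{ij}}{\det(a_{ij})}.$$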
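The existence step of the proof is itself an algorithm. Once $f(0) = 0$ and $Df(0) = I$, a solution of $f(x) = y$ is the limit of the iterates $x_{k+1} = g_y(x_k) = x_k - f(x_k) + y$. Below is a minimal sketch using a hypothetical map of our own choosing, $f(x^1, x^2) = (x^1 + 0.1 (x^2)^2,\ x^2 + 0.1 (x^1)^2)$. For this map each entry $\partial g^i / \partial x^j$ is $0$, $-0.2x^1$ or $-0.2x^2$. So (6) with $m = 2$ (bound $1/8$) holds for $|x| < 0.6$, and we may take $r = 0.3$. The names `R` and `local_inverse` are ours.

```python
import math

R = 0.3  # the radius r for which (6) holds for this particular f


def f(x):
    """Hypothetical map with f(0) = 0 and Df(0) = I."""
    u, v = x
    return [u + 0.1 * v ** 2, v + 0.1 * u ** 2]


def local_inverse(y, tol=1e-13, max_iter=200):
    """Solve f(x) = y for y in B(r/2) by iterating g_y(x) = x - f(x) + y from x = 0."""
    if math.hypot(*y) >= R / 2:
        raise ValueError("y must lie in B(r/2)")
    x = [0.0, 0.0]
    for _ in range(max_iter):
        fx = f(x)
        new_x = [x[i] - fx[i] + y[i] for i in range(2)]   # new_x = g_y(x)
        if math.hypot(new_x[0] - x[0], new_x[1] - x[1]) < tol:
            return new_x
        x = new_x
    raise RuntimeError("fixed-point iteration did not converge")


y1, y2 = [0.1, -0.05], [0.08, -0.03]
sol1, sol2 = local_inverse(y1), local_inverse(y2)
print(sol1, f(sol1))
# the Lipschitz bound (8): |f^{-1}(y1) - f^{-1}(y2)| <= 2 |y1 - y2|
print(math.hypot(sol1[0] - sol2[0], sol1[1] - sol2[1]),
      2 * math.hypot(y1[0] - y2[0], y1[1] - y2[1]))
```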

Homework Problem 9. If, in the Inverse Function Theorem, f is a smooth ($C^\infty$) map, then $f^{-1} : U' \to O'$, the $C^1$ local inverse of f, is also $C^\infty$. Hints:

(a) If $A = A(s)$ is a family of invertible $n \times n$ matrices which depend differentiably on a real parameter s, differentiate the equation $AA^{-1} = I$ to show

$$\frac{d(A^{-1})}{ds} = -A^{-1} \frac{dA}{ds} A^{-1}.$$


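(b) Use the formula for $D(f^{-1})$ to show that $f^{-1}$ is $C^\infty$.

Hints: It may be helpful to use the following notation. If $f = f(x) = f(x^1, \dots, x^n)$, we may write $(y^1, \dots, y^n) = y = y(x) = f(x)$. And so $f^{-1}(y) = x$ may be written simply as $x = x(y)$. To show $f^{-1}$ is $C^2$, for example, you should write

$$\frac{\partial^2 (f^{-1})^k}{\partial y^i \partial y^j} = \frac{\partial^2 x^k}{\partial y^i \partial y^j}$$

in terms of (the components of) the first and second derivatives

$$\frac{\partial f}{\partial x^i} = \frac{\partial y}{\partial x^i} \quad \text{and} \quad \frac{\partial^2 f}{\partial x^i \partial x^j} = \frac{\partial^2 y}{\partial x^i \partial x^j},$$

and verify that the resulting expression is continuous.

Remember to use the Chain Rule, as in, e.g.,

$$\frac{\partial}{\partial y^j} = \frac{\partial x^i}{\partial y^j} \frac{\partial}{\partial x^i},$$

and recall that $Df^{-1} = (Df)^{-1}$ can be written as

$$\left( \frac{\partial x^i}{\partial y^j} \right) = \left( \frac{\partial y^k}{\partial x^l} \right)^{-1}.$$

It will also be helpful to use Einstein's summation notation. In particular, the matrix notation used in part (a) is insufficient, as there may be quantities with more than 2 indices which need to be summed.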
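A quick numerical sanity check of the identity in part (a) can help catch sign or ordering mistakes, though it is of course not a proof. The sketch below assumes a hypothetical 2 × 2 family A(s) (any smooth invertible family would do) and compares a central-difference derivative of A(s)⁻¹ with −A⁻¹(dA/ds)A⁻¹ at s = 0.5.

```python
import math


def inv2(a):
    """Inverse of a 2x2 matrix [[p, q], [r, s]] by the cofactor formula."""
    (p, q), (r, s) = a
    d = p * s - q * r
    return [[s / d, -q / d], [-r / d, p / d]]


def mul2(a, b):
    """Product of two 2x2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def A(s):
    # Hypothetical smooth family A(s), invertible for s near 0.5.
    return [[2.0 + s, s * s], [math.sin(s), 1.0 + s]]


def dA(s):
    # Entrywise derivative dA/ds of the family above.
    return [[1.0, 2.0 * s], [math.cos(s), 1.0]]


def claimed_derivative(s):
    """The right-hand side -A^{-1} (dA/ds) A^{-1} from part (a)."""
    ainv = inv2(A(s))
    m = mul2(mul2(ainv, dA(s)), ainv)
    return [[-x for x in row] for row in m]


def numerical_derivative(s, h=1e-5):
    """Central-difference approximation to d(A^{-1})/ds."""
    plus, minus = inv2(A(s + h)), inv2(A(s - h))
    return [[(plus[i][j] - minus[i][j]) / (2 * h) for j in range(2)] for i in range(2)]


s0 = 0.5
error = max(abs(x - y)
            for row_x, row_y in zip(claimed_derivative(s0), numerical_derivative(s0))
            for x, y in zip(row_x, row_y))
print(error)  # should be tiny: only finite-difference and rounding error remain
```

The step size h = 1e-5 is an arbitrary choice balancing truncation error against rounding error.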

Theorem 4 (Implicit Function Theorem). Suppose $f : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$ is $C^1$ in an open set containing (a, b), and assume f(a, b) = 0. Assume the $m \times m$ matrix

$$\left( \frac{\partial f^i}{\partial x^{n+j}}(a, b) \right), \qquad 1 \le i, j \le m,$$

is invertible. Then there is an open set $O \subset \mathbb{R}^n$ containing a and an open set $U \subset \mathbb{R}^m$ containing b so that for each $x \in O$, there is a unique $g(x) \in U$ so that f(x, g(x)) = 0. g is locally $C^1$.

Homework Problem 10. Prove the Implicit Function Theorem. Hints:


(a) Consider $F : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^n \times \mathbb{R}^m$ defined by F(x, y) = (x, f(x, y)) and apply the Inverse Function Theorem to F.

(b) Show that, on a suitably small neighborhood, $F^{-1}$ is of the form $F^{-1}(x, y) = (x, p(x, y))$ for $p : \mathbb{R}^n \times \mathbb{R}^m \to \mathbb{R}^m$.

(c) Show that g(x) = p(x, 0) satisfies the conditions of the theorem.
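To see the Implicit Function Theorem in a case where g is known in closed form, here is a minimal sketch, assuming the hypothetical example f(x, y) = x² + y² − 1 with n = m = 1 and base point (a, b) = (0.6, 0.8), where ∂f/∂y = 2b ≠ 0. It computes g(x) by a chord iteration that freezes ∂f/∂y at (a, b). This solver is our choice, in the spirit of the contraction-mapping arguments in these notes, and is not necessarily the construction used in the proof.

```python
import math


def f(x, y):
    # Hypothetical example with n = m = 1: f(x, y) = x^2 + y^2 - 1 (the unit circle).
    return x * x + y * y - 1.0


def f_y(x, y):
    # Partial derivative of f in the y variable; the theorem needs f_y(a, b) != 0.
    return 2.0 * y


def implicit_g(x, a, b, tol=1e-14, max_iter=200):
    """Approximate g(x): the solution y near b of f(x, y) = 0, for x near a.

    Uses the chord iteration y -> y - f(x, y) / f_y(a, b), which freezes the
    derivative at the base point (a, b); for (x, y) near (a, b) this map is a
    contraction in y, and its fixed point is the y with f(x, y) = 0.
    """
    slope = f_y(a, b)
    y = b
    for _ in range(max_iter):
        step = f(x, y) / slope
        y -= step
        if abs(step) < tol:
            return y
    raise RuntimeError("no convergence; x may be too far from a")


for x in (0.5, 0.6, 0.7):
    print(x, implicit_g(x, 0.6, 0.8), math.sqrt(1.0 - x * x))
```

Starting instead from the base point (0.6, −0.8) produces the other branch, g(x) = −√(1 − x²), which illustrates why the theorem only asserts uniqueness of g(x) inside the neighborhood U of b.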

1.8 Lipschitz constants and functions

A closely related concept to the contraction map is the Lipschitz constant. A map f : X → Y has Lipschitz constant

$$L = \sup_{x, x' \in X,\ x \ne x'} \frac{d_Y(f(x), f(x'))}{d_X(x, x')}.$$

Here of course $d_X$ and $d_Y$ are the metrics on X and Y respectively. An equivalent definition is that L is the smallest constant so that

$$d_Y(f(x), f(x')) \le L\, d_X(x, x') \quad \text{for all } x, x' \in X.$$
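The supremum defining L can be probed numerically: every difference quotient over a pair of sample points is a lower bound for L. The following sketch (a hypothetical helper, `lipschitz_lower_bound`, with an arbitrary grid on [−3, 3]) applies this to sin, whose derivative cos has supremum 1 in absolute value.

```python
import itertools
import math


def lipschitz_lower_bound(f, points):
    """Largest difference quotient |f(x) - f(x')| / |x - x'| over distinct sample pairs.

    Each quotient is at most the Lipschitz constant L, so the result is a lower
    bound for L; it is not an upper bound, since the sup may be approached
    between the sample points.
    """
    best = 0.0
    for x, xp in itertools.combinations(points, 2):
        if x != xp:
            best = max(best, abs(f(x) - f(xp)) / abs(x - xp))
    return best


grid = [k / 100 for k in range(-300, 301)]
print(lipschitz_lower_bound(math.sin, grid))  # close to, but not above, 1
```

Sampling can only certify a lower bound; an upper bound needs an argument such as the Mean Value Theorem, which is the content of Homework Problem 11 below.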

A function with finite Lipschitz constant is called Lipschitz. A basic fact is the following:

Lemma 13. Any Lipschitz function is continuous.

If f : X → X, then the Lipschitz constant gives a criterion for a mapping to be a contraction mapping:

Lemma 14. f : X → X is a contraction map if and only if the Lipschitz constant L of f is strictly less than 1.

Idea of proof. The Lipschitz constant is the smallest value of λ for which $d(f(x), f(x')) \le \lambda\, d(x, x')$ holds for all x, x′, and f is a contraction map exactly when such a λ can be taken strictly less than 1.

If $f : \mathbb{R} \to \mathbb{R}$, then the Lipschitz constant is simply

$$L = \sup_{x \ne y} \frac{|f(x) - f(y)|}{|x - y|},$$

which of course is suggestive of the definition of the derivative. In fact, the following is true:


Homework Problem 11. The Lipschitz constant of a locally $C^1$ function $f : \mathbb{R} \to \mathbb{R}$ is equal to $\sup_{x \in \mathbb{R}} |f'(x)|$.

Hint: To show the two quantities are equal, you need to relate the sup of the derivative to the sup of the difference quotients. To relate the derivative f′(x) to difference quotients, use the definition of the derivative. To relate a given difference quotient to a derivative, use the Mean Value Theorem.

The previous problem shows that any differentiable function with bounded derivative is Lipschitz. The converse is false, as we see in the following example.

Example 3. The function $x \mapsto |x|$ is a Lipschitz function from ℝ to ℝ. This follows from the observation that for each $x \ne y \in \mathbb{R}$,

$$\frac{\big| |x| - |y| \big|}{|x - y|} \le 1.$$

(This can be proved using the Triangle Inequality.)

Example 4. For any constant α ∈ (0, 1), the function $x \mapsto |x|^\alpha$ from ℝ to ℝ is not Lipschitz. In particular,

$$\lim_{x \to 0} \frac{\big| |x|^\alpha - |0|^\alpha \big|}{|x - 0|} = \lim_{x \to 0} |x|^{\alpha - 1} = \infty.$$

In terms of the graph of a function, a function whose graph has a corner (as does $x \mapsto |x|$) is Lipschitz, while a function whose graph has a cusp (as does $x \mapsto |x|^\alpha$) is not Lipschitz.
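The contrast between the corner and the cusp shows up directly in the difference quotients at 0. In this sketch α = 1/2 is an arbitrary choice from (0, 1): the quotients for |x| stay equal to 1, while those for |x|^α grow like |x|^(α − 1), which is 10, 100, and 1000 (up to rounding) at the sample points below.

```python
def quotient_at_zero(f, x):
    """Difference quotient |f(x) - f(0)| / |x - 0| at the base point 0."""
    return abs(f(x) - f(0.0)) / abs(x)


alpha = 0.5  # an arbitrary choice from (0, 1)


def power(t):
    """The cusp function t -> |t|^alpha."""
    return abs(t) ** alpha


for x in (1e-2, 1e-4, 1e-6):
    # abs (the corner) stays at 1; |x|^alpha (the cusp) grows like |x|^(alpha - 1).
    print(x, quotient_at_zero(abs, x), quotient_at_zero(power, x))
```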

Another basic fact we establish is this: the conclusion of the Contraction Map Theorem may be false if the Lipschitz constant is equal to 1. An easy example is the map x ↦ x + 1 from ℝ → ℝ. The Lipschitz constant is obviously 1, and there is no fixed point. A related, but somewhat more surprising, fact is outlined in the following problem:

Homework Problem 12. Find an example of a differentiable function $f : \mathbb{R} \to \mathbb{R}$ so that for each $x \ne y$,

$$\frac{|f(x) - f(y)|}{|x - y|} < 1,$$

and yet f has no fixed point. Prove your answer works.


Hint: The point of this problem is that there should be no uniform L < 1 which works for all x and y. To construct such a function f, use Problem 11 above. In particular, first construct the derivative f′ and then integrate to find f. (You'll need $\sup_x |f'(x)| = 1$; why?) Use the Mean Value Theorem to relate values of f′ to difference quotients.

A subset C of a real vector space is convex if every line segment connecting two points in C is contained in C. More formally, C is convex if

$$x, y \in C,\ t \in [0, 1] \implies tx + (1 - t)y \in C.$$

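Proposition 15. Any globally $C^1$ function from a convex domain $\Omega \subset \mathbb{R}^n$ to $\mathbb{R}^m$ is globally Lipschitz.

Proof. Lemma 12 above shows that for any $x, y \in \Omega$,

$$|f(x) - f(y)| \le Cnm\,|x - y|, \qquad \text{for } C = \sup\left\{ \left| \frac{\partial f^i}{\partial x^j}(z) \right| : z \in \Omega,\ i \le m,\ j \le n \right\}.$$

C < ∞ since f is $C^1$. Thus f is Lipschitz.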

Consider X a locally compact metric space and Y any metric space. Then we say a function f : X → Y is locally Lipschitz if f satisfies one of the two following equivalent definitions:

1. f is Lipschitz when restricted to any compact set of X. In other words, if K ⊂ X is compact, then there is a constant $L_K$ so that

$$x, x' \in K \implies d_Y(f(x), f(x')) \le L_K\, d_X(x, x').$$

2. Each x ∈ X has a neighborhood on which f is Lipschitz.

We prove these two definitions are equivalent below.

Corollary 16. On any domain $\Omega \subset \mathbb{R}^n$, any locally $C^1$ function f is locally Lipschitz.

Proof. Any ball is convex (see the following homework problem), and so if f is $C^1$ on a small ball, then it is Lipschitz on the ball by the previous Proposition 15.


Homework Problem 13. Show that any ball $B_x(r) = \{y \in \mathbb{R}^n : |y - x| < r\}$ is convex.

Proposition 17. Let X be a locally compact metric space and Y be any metric space. Then for maps f from X to Y, the two definitions (1) and (2) above are equivalent.

Proof. To prove (1) ⟹ (2), consider x ∈ X. Since X is locally compact, there is a neighborhood O of x with compact closure $\overline{O}$. By the definition of locally Lipschitz, f is Lipschitz when restricted to $\overline{O}$, and is thus Lipschitz on O also.

To prove (2) ⟹ (1), let K ⊂ X be a compact subset. Given that all points in X have neighborhoods on which f is Lipschitz, we need to prove that f is Lipschitz on K. The set of all neighborhoods of points in K on which f is Lipschitz forms an open cover of K, and thus there is a finite subcover $O_1, \dots, O_n$. The set

$$P = K \times K \setminus \left( \bigcup_{i=1}^{n} O_i \times O_i \right)$$

is compact (it is a closed subset of the compact set K × K), and so the function

$$\frac{d_Y(f(x), f(x'))}{d_X(x, x')},$$

which is continuous on P (it is defined there, since every diagonal point (x, x) with x ∈ K lies in some $O_i \times O_i$), attains its maximum M on P.

Consider any $x \ne x' \in K$. Then either $(x, x') \in P$ or $x, x' \in O_i$ for some i = 1, …, n. Let $L_i$ be the Lipschitz constant of $f|_{O_i}$. Choose $L = \max\{M, L_1, \dots, L_n\}$. Then for every $x \ne x' \in K$,

$$\frac{d_Y(f(x), f(x'))}{d_X(x, x')} \le L,$$

and f is Lipschitz on K.


2 Ordinary Differential Equations

2.1 Introduction

An ordinary differential equation (an ODE) is an equation of the form

$$x^{(n)}(t) = F(x^{(n-1)}, \dots, \dot{x}, x, t), \tag{10}$$

where $x : I \to \mathbb{R}$ is a function of t, I is an open interval in ℝ,

$$\dot{x} = \frac{dx}{dt}, \quad \text{and} \quad x^{(n)} = \frac{d^n x}{dt^n}.$$

The order of the above equation is n, the highest derivative of x which appears. It is also useful to consider the case

$$x = (x^1, \dots, x^m) : I \to \mathbb{R}^m,$$

which is called a system of ODEs.

Some ODEs can be solved explicitly by using integration techniques, but most cannot. For most ODEs, instead of explicit solutions, we must rely on an abstract existence theorem to show that for nice enough F (Lipschitz suffices), there is a unique solution locally. We also investigate the regularity of solutions, showing, for example, that if F is smooth, then any solution to (10) is smooth. Existence, uniqueness, and regularity are three main themes in the theory of all differential equations, and there are satisfactory theorems to handle all three for ODEs.

Consider the following example (where x, not t, is the independent variable):

Example 5. Consider the differential equation $dy/dx = x^2 y$. This first-order ODE is called separable, since it is written in the form $dy/dx = f(x)g(y)$. Recall the solution procedure for a separable ODE:

• If c is a root of g(y), then y = c is a solution. (Why?) So in the present case, y = 0 is a solution.


• Where g(y) ≠ 0, compute

$$\begin{aligned}
\frac{dy}{dx} &= x^2 y, \\
\frac{dy}{y} &= x^2\, dx, \\
\int \frac{dy}{y} &= \int x^2\, dx, \\
\ln|y| &= \frac{x^3}{3} + C, \\
y &= \pm e^C e^{x^3/3} = C' e^{x^3/3},
\end{aligned}$$

where $C' = \pm e^C$ is a nonzero constant.

• If we let C′ be any real number, then we capture both cases above, and the general solution is $y = C' e^{x^3/3}$.
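The general solution y = C′e^{x³/3} can be compared against a numerical solution. The sketch below uses the classical fourth-order Runge-Kutta method (a standard scheme, not something from these notes) on [0, 1.5]. Since y(0) = C′, each run starts at y(0) = C′ and should agree with the formula to many digits, including the case C′ = 0, which is the solution y = 0 from the first bullet.

```python
import math


def rk4(v, x0, y0, x_end, steps):
    """Classical fourth-order Runge-Kutta for dy/dx = v(x, y), y(x0) = y0."""
    h = (x_end - x0) / steps
    x, y = x0, y0
    for _ in range(steps):
        k1 = v(x, y)
        k2 = v(x + h / 2, y + h / 2 * k1)
        k3 = v(x + h / 2, y + h / 2 * k2)
        k4 = v(x + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        x += h
    return y


def general_solution(c_prime, x):
    """y = C' e^{x^3/3}; note that y(0) = C'."""
    return c_prime * math.exp(x ** 3 / 3)


for c_prime in (-2.0, 0.0, 0.5):
    numeric = rk4(lambda x, y: x * x * y, 0.0, c_prime, 1.5, 5000)
    print(c_prime, numeric, general_solution(c_prime, 1.5))
```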

Homework Problem 14. Consider the ODE

$$\frac{dy}{dx} = \frac{1 + y^2}{1 + x^2}.$$

(a) Find the general solution to this differential equation. Your answer should be rational functions of x. You may need to write your answer using more than one case.

(b) Find the particular solution passing through (x, y) = (1, 1).

(c) Find the particular solution passing through (x, y) = (1, −1). (Hint: What is the formula for $\tan(\varphi + \frac{\pi}{2})$?)

2.2 Local Existence and Uniqueness

The most natural setting for systems of ODEs is in terms of an initial value problem. Let $x = (x^1, \dots, x^n) = x(t)$. An initial value problem for a first-order system of ODEs at $t = t_0$ consists of

• a system of ODEs $\dot{x} = v(x, t)$

• and an initial condition $x(t_0) = x_0$.


We'll see below that if v satisfies a Lipschitz condition, then for t in a small interval around $t_0$, there is a unique solution to the initial value problem.

Example 6. Consider the following problem: Find a solution to the ODE $\dot{y} = y^2$ subject to the initial condition y(0) = 1. Interpreting t as a time variable, what happens as time goes forward from t = 0?

Solution: $dy/dt = y^2$ is separable, and so compute

$$\frac{dy}{y^2} = dt \implies \int \frac{dy}{y^2} = \int dt \implies -\frac{1}{y} = t + C \implies y = -\frac{1}{t + C}.$$

Plug in the initial condition y = 1 and t = 0 to solve for C to find C = −1 and

$$y = \frac{1}{1 - t}.$$

Note that $y(t) \to \infty$ as $t \to 1^-$, so as time goes forward from t = 0, the solution only exists until time 1. Also note there is no problem going backward in time, and so the solution to the initial value problem is

$$y = \frac{1}{1 - t}, \qquad t \in (-\infty, 1).$$

It does not make sense to talk about the solution to the initial value problem beyond t = 1.
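Numerically, the blow-up in Example 6 is visible as the solution tracking 1/(1 − t) while growing without bound as t approaches 1. The following sketch reuses the classical Runge-Kutta scheme (our choice of integrator, with an arbitrary fixed step count); it can only be run up to times strictly less than 1.

```python
def rk4(v, t0, y0, t_end, steps):
    """Classical fourth-order Runge-Kutta for dy/dt = v(t, y), y(t0) = y0."""
    h = (t_end - t0) / steps
    t, y = t0, y0
    for _ in range(steps):
        k1 = v(t, y)
        k2 = v(t + h / 2, y + h / 2 * k1)
        k3 = v(t + h / 2, y + h / 2 * k2)
        k4 = v(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y


for t_end in (0.5, 0.9, 0.99):
    numeric = rk4(lambda t, y: y * y, 0.0, 1.0, t_end, 50_000)
    # Both columns grow without bound as t_end approaches 1.
    print(t_end, numeric, 1.0 / (1.0 - t_end))
```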

The previous example shows that it is not in general possible to extend a solution to an initial value problem for all time. However, we can still hope to find a solution to an initial value problem on a neighborhood $(t_0 - \varepsilon, t_0 + \varepsilon)$ of $t_0$.

Theorem 5. Consider the initial value problem

$$\begin{cases} \dot{x} = v(x, t), \\ x(t_0) = x_0 \end{cases} \tag{11}$$

for $x : I \to \mathbb{R}^n$ for I an open neighborhood of $t_0$. Assume v is a Lipschitz function from $O \times I \to \mathbb{R}^n$, where $O \subset \mathbb{R}^n$ is an open neighborhood of $x_0$. Then on a neighborhood $\bar{I}$ of $t_0$ contained in I, there is a unique solution φ to (11).

Before we give the proof, let us consider a few examples.


Example 7. The differential equation $\dot{x} = x^2 + t$ has no solution which can be written down in terms of standard algebraic and transcendental functions (such as roots, exponentials, trigonometric functions). Theorem 5 states that there is a local solution for every initial value problem. For example, for initial conditions x(0) = 1, there is a solution valid on an open interval containing t = 0.

Theorem 5 does not guarantee a solution which is valid for all time t (see Example 6 above). In fact the solution for the present initial value problem will also blow up in finite time. This is basically because for t ≥ 0, $\dot{x} = x^2 + t \ge x^2$, and so the solution should grow faster than the solution to Example 6, which goes to infinity in finite time.
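Example 7 can be explored the same way. The sketch below (with a hypothetical helper `blow_up_time`) steps both ẋ = x² + t and ẋ = x² forward from x(0) = 1 and records the first time |x| exceeds a large threshold, here 10⁶, used as a stand-in for infinity. The recorded times only approximate the blow-up times, to within a few steps, and such an experiment suggests, but does not prove, the claim in Homework Problem 15 below.

```python
def blow_up_time(v, t0, x0, h=1e-4, threshold=1e6, t_max=2.0):
    """Step x' = v(t, x) with classical RK4 from x(t0) = x0.

    Returns the first grid time at which |x| exceeds threshold (a stand-in for
    blow-up), or None if that does not happen before t_max.
    """
    t, x = t0, x0
    while t < t_max:
        k1 = v(t, x)
        k2 = v(t + h / 2, x + h / 2 * k1)
        k3 = v(t + h / 2, x + h / 2 * k2)
        k4 = v(t + h, x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
        if abs(x) > threshold:
            return t
    return None


t_riccati = blow_up_time(lambda t, x: x * x + t, 0.0, 1.0)  # Example 7
t_square = blow_up_time(lambda t, x: x * x, 0.0, 1.0)       # Example 6; exact answer t = 1
print(t_riccati, t_square)
```

With these settings the first time should be noticeably smaller than the second, which should be close to 1.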

If v in Theorem 5 is not Lipschitz, then it is possible to lose the uniqueness statement from Theorem 5 (although existence is still valid).

Example 8. Consider the initial value problem

$$\dot{x} = x^{2/3}, \qquad x(0) = 0.$$

Then it is straightforward to verify that x(t) = 0 is a solution. There is another solution, however. Solve the equation

$$\begin{aligned}
\frac{dx}{dt} &= x^{2/3}, \\
x^{-2/3}\, dx &= dt, \\
\int x^{-2/3}\, dx &= \int dt, \\
3x^{1/3} &= t + C, \\
x &= \left( \tfrac{1}{3}t + \tfrac{1}{3}C \right)^3.
\end{aligned}$$

Then plug in x(0) = 0 to find C = 0 and the solution $x(t) = (\tfrac{1}{3}t)^3$.

The point of this example is that $v = x^{2/3}$ is not Lipschitz (see Example 4 above). Therefore, Theorem 5 does not apply.
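Both solutions in Example 8 can be checked against the equation numerically. The sketch below interprets x^{2/3} as the real value |x|^{2/3} (an assumption that makes negative times meaningful) and measures the residual |x′(t) − x(t)^{2/3}| with a central difference on a grid of t in [−2, 2]. Both residuals should be at the level of finite-difference error, even though the two functions differ at every t ≠ 0.

```python
def v(x):
    # Right-hand side x^{2/3}, taken as the real value |x|^{2/3} = (x^{1/3})^2.
    return abs(x) ** (2.0 / 3.0)


def zero_solution(t):
    """The solution x(t) = 0."""
    return 0.0


def cubic_solution(t):
    """The second solution x(t) = (t/3)^3 found above."""
    return (t / 3.0) ** 3


def max_residual(x, ts, h=1e-6):
    """Largest |x'(t) - v(x(t))| over ts, with x' from a central difference."""
    return max(abs((x(t + h) - x(t - h)) / (2 * h) - v(x(t))) for t in ts)


ts = [k / 10 for k in range(-20, 21)]
print(max_residual(zero_solution, ts), max_residual(cubic_solution, ts))
```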

Proof of Theorem 5. The idea of the proof is to set up the problem in terms of a contraction mapping. We first find an iteration whose fixed point solves the differential equation and then find an appropriate complete metric space on which the iteration is a contraction map.


For a continuous $\mathbb{R}^n$-valued function φ defined on a neighborhood of $t_0$, let Aφ be another such function defined as follows:

$$(A\varphi)(t) = x_0 + \int_{t_0}^{t} v(\varphi(\tau), \tau)\, d\tau. \tag{12}$$

(Note we are integrating an $\mathbb{R}^n$-valued function. This may be related to the usual ℝ-valued integration theory by considering each component separately.) A will be our iterative map, and we consider φ, Aφ, A²φ, etc., to be the Picard approximations for the initial value problem. We consider Picard approximations because of the following

Lemma 18. A continuous fixed point of the Picard approximation (12) is a solution to the initial value problem (11). In particular, any such fixed point is continuously differentiable.

Proof. If Aφ = φ, then compute

$$\dot{\varphi} = \frac{d}{dt}\left[ x_0 + \int_{t_0}^{t} v(\varphi(\tau), \tau)\, d\tau \right] = v(\varphi(t), t)$$

by the Fundamental Theorem of Calculus. In particular, since φ and v are continuous (Lemma 13), $\dot{\varphi}$ is continuous, and so φ is continuously differentiable. Lastly, check the initial condition

$$\varphi(t_0) = x_0 + \int_{t_0}^{t_0} v(\varphi(\tau), \tau)\, d\tau = x_0$$

to complete the proof of the lemma.

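Our complete metric space will be

$$X = \left\{ \varphi \in C^0(\bar{I}, \mathbb{R}^n) : \varphi(t_0) = x_0,\ \sup_{t \in \bar{I}} |\varphi(t) - x_0| \le P \right\},$$

where $\bar{I} = [t_0 - \varepsilon, t_0 + \varepsilon] \subset I$ for a small positive ε to be determined later, $|\cdot|$ is the norm on $\mathbb{R}^n$, and P is chosen so that the closed ball $\bar{B}_{x_0}(P) = \{x : |x - x_0| \le P\} \subset O$. We first demonstrate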

Lemma 19. X is a complete metric space.


Proof. First of all, $C^0(\bar{I}, \mathbb{R}^n)$ is complete by Proposition 1. Moreover, the conditions imposed give closed subsets of the Banach space $C^0$. The second condition is obviously closed since the norm on any Banach space is continuous. To check the condition $\varphi(t_0) = x_0$ is closed, use the following lemma, whose proof is immediate:

Lemma 20. For a metric space J and y ∈ J, the map from the Banach space $C^0(J, \mathbb{R}^n)$ to $\mathbb{R}^n$ given by $f \mapsto f(y)$ is continuous.

Since these two conditions are closed, X is a closed subset of the complete metric space $C^0(\bar{I}, \mathbb{R}^n)$, and so is complete with the induced metric.

Remark. Lemma 20 is false for the Banach space L∞. Why?

So we have proved that X is a complete metric space. Next we show

Lemma 21. For ε > 0 small enough, A : X → X.

Proof. First of all, choose δ > 0 so that $[t_0 - \delta, t_0 + \delta] \subset I$. Since v is continuous and $\{x : |x - x_0| \le P\} \times [t_0 - \delta, t_0 + \delta]$ is compact, there is a constant M so that

$$\sup_{|t - t_0| \le \delta,\ |x - x_0| \le P} |v(x, t)| \le M.$$

In order for this bound to work below, we must have ε ≤ δ (so then $\bar{I} \subset [t_0 - \delta, t_0 + \delta]$). To check A : X → X, we need to check for each φ ∈ X,

1. Aφ is continuous. This follows as in Lemma 18 above.

2. $(A\varphi)(t_0) = x_0$. This is easy to check as in Lemma 18.

3. $\sup_{t \in \bar{I}} |(A\varphi)(t) - x_0| \le P$. To check this, write

$$|(A\varphi)(t) - x_0| = \left| \int_{t_0}^{t} v(\varphi(\tau), \tau)\, d\tau \right| \le M|t - t_0| \le M\varepsilon,$$

where we have used the fact that φ ∈ X and the definition of M to show the first inequality. So this condition is satisfied if ε ≤ P/M.

So A : X → X if $\varepsilon \le \min\{\delta, P/M\}$.


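Finally we use the Lipschitz hypothesis on v to show that A is a contraction map. Let L be the Lipschitz constant for v. Then for φ, ψ ∈ X, compute

$$\begin{aligned}
|(A\varphi)(t) - (A\psi)(t)| &= \left| \int_{t_0}^{t} [v(\varphi(\tau), \tau) - v(\psi(\tau), \tau)]\, d\tau \right| \\
&\le \int_{t_0}^{t} |v(\varphi(\tau), \tau) - v(\psi(\tau), \tau)|\, d\tau \\
&\le \int_{t_0}^{t} L|\varphi(\tau) - \psi(\tau)|\, d\tau \\
&\le L\|\varphi - \psi\|_{C^0}\, |t - t_0| \le \varepsilon L \|\varphi - \psi\|_{C^0}.
\end{aligned}$$

Then since $\|A\varphi - A\psi\|_{C^0} = \sup_{t \in \bar{I}} |(A\varphi)(t) - (A\psi)(t)|$, we see that

$$\|A\varphi - A\psi\|_{C^0} \le \varepsilon L \|\varphi - \psi\|_{C^0}.$$

So A is a contraction map if ε < 1/L. Thus all together, if we require $\varepsilon < \min\{\delta, P/M, 1/L\}$, then A is a contraction map on X, and its fixed point is a solution to the initial value problem.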
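The contraction estimate ‖Aφ − Aψ‖ ≤ εL‖φ − ψ‖ can be watched in action. The sketch below is a discretized version of the Picard map (12): φ is stored on a uniform grid and the integral is replaced by the cumulative trapezoid rule, which is our discretization and not part of the proof. For the test problem ẋ = x, x(0) = 1, we have L = 1 and take ε = 0.5 < 1/L.

```python
import math


def picard_iterates(v, t0, x0, eps, n_grid=2000, n_iter=25):
    """Repeatedly apply the Picard map (A phi)(t) = x0 + int_{t0}^{t} v(phi(tau), tau) dtau.

    phi is stored by its values on a uniform grid of [t0, t0 + eps], and the
    integral is replaced by the cumulative trapezoid rule (our discretization).
    Returns the grid, the last iterate, and the sup-norm gaps ||A phi - phi||.
    """
    h = eps / n_grid
    ts = [t0 + k * h for k in range(n_grid + 1)]
    phi = [x0] * (n_grid + 1)  # start at the constant function x0, which lies in X
    gaps = []
    for _ in range(n_iter):
        integrand = [v(p, t) for p, t in zip(phi, ts)]
        new = [x0]
        for k in range(n_grid):
            new.append(new[-1] + h * (integrand[k] + integrand[k + 1]) / 2)
        gaps.append(max(abs(a - b) for a, b in zip(new, phi)))
        phi = new
    return ts, phi, gaps


# x' = x, x(0) = 1: v(x, t) = x has Lipschitz constant L = 1, and eps = 0.5 < 1/L.
ts, phi, gaps = picard_iterates(lambda x, t: x, 0.0, 1.0, 0.5)
print(phi[-1], math.exp(0.5))
print(gaps[:6])  # each gap is at most eps * L = 0.5 times the previous one
```

For this linear problem the iterates are, up to quadrature error, the Taylor partial sums of e^t, so the gaps actually shrink much faster than the guaranteed factor εL = 1/2.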

In order to show uniqueness of the initial value problem, note that the Contraction Mapping Theorem automatically proves that any two continuous solutions $\varphi_1$ and $\varphi_2$ to the initial value problem from $\bar{I}$ to $\mathbb{R}^n$ must coincide if the additional constraint

$$\sup_{t \in \bar{I}} |\varphi(t) - x_0| \le P$$

is satisfied. Since $\varphi_1$ and $\varphi_2$ are continuous and satisfy the initial condition, this condition is automatically satisfied for both $\varphi_1$ and $\varphi_2$ on a (perhaps smaller) interval $\tilde{I} \subset \bar{I}$ containing $t_0$. Then uniqueness applies on this smaller interval, since A is a contraction map for any ε small enough. Note that the interval $\tilde{I}$ on which $\varphi_1 = \varphi_2$ may depend on $\varphi_1$ and $\varphi_2$. The proof that the two solutions must coincide on all of $\bar{I}$ depends on the Extension Theorem 6 below.

We record what we have proven so far with respect to uniqueness here.

Proposition 22. Any two solutions $\varphi_1$ and $\varphi_2$ to the initial value problem (11) coincide on a small interval containing $t_0$. The interval may depend on the solutions $\varphi_1$ and $\varphi_2$.


Remark. Note that in the proof of the previous theorem, we only use that v is Lipschitz in the x variables (with a Lipschitz constant valid uniformly for all t). We still require v to be continuous in t.

The previous theorem provides a continuously differentiable solution on an interval $\bar{I}$ containing the initial time $t_0$ and proves uniqueness on a (perhaps) smaller interval $\tilde{I}$. There is a satisfactory, more global theory of ODEs, which we detail in the next subsection.

2.3 Extension of solutions

Recall, from Corollary 16 above, that any locally $C^1$ function f from Ω, a domain in $\mathbb{R}^n$, to $\mathbb{R}^m$ is locally Lipschitz. In other words, f is Lipschitz when restricted to any compact subset of Ω.

Theorem 6 (Extension). Consider an initial value problem

$$\dot{x} = v(x, t), \qquad x(t_0) = x_0. \tag{13}$$

Assume v is continuous and locally Lipschitz in $\mathbb{R}^n \times I$, where I is an open interval containing $t_0$. Then there is an open interval J satisfying $t_0 \in J \subset I$ and a unique solution $\varphi : J \to \mathbb{R}^n$ to the initial value problem. Moreover, J is maximal in the following sense: if there is a time $T \in I \cap \partial J$, then $\limsup_{t \to T} |\varphi(t)| = \infty$.

So this theorem says that if we start with an initial condition $x(t_0) = x_0$ and flow forward (or backward) in time by satisfying the ODE, then there is a unique solution which continues until (1) the end of the interval I is reached, or (2) the solution blows up.

Proof. We first consider the following lemma, which is a consequence of the proof of Theorem 5 above:

Lemma 23. On any compact subset K of $\mathbb{R}^n \times I$, there is an ε > 0 so that for any $(x_0, t_0) \in K$, there exists a solution to the initial value problem $\dot{x} = v(x, t)$, $x(t_0) = x_0$ which is valid on $[t_0 - \varepsilon, t_0 + \varepsilon]$.

The point is that there is a uniform ε which works for all initial conditions $(x_0, t_0) \in K$.


Proof. Recall that in the proof of Theorem 5, any $\varepsilon < \min\{\delta, P/M, 1/L\}$ works. By compactness of K and since I is open, we can choose a uniform δ > 0 so that for all $(x_0, t_0) \in K$, $[t_0 - \delta, t_0 + \delta] \subset I$. We may choose P to be any positive number (since $O = \mathbb{R}^n$ in the present case). The Lipschitz constant can be chosen uniformly over any compact set by the locally Lipschitz property of v (Proposition 17); we take $L = L_{\tilde{K}}$ for the compact set $\tilde{K}$ defined below. Let

$$M = \max_{(x, t) \in \tilde{K}} |v(x, t)|,$$

where

$$\tilde{K} = \{(x, t) \in \mathbb{R}^{n+1} : \exists (x_0, t_0) \in K : |t - t_0| \le \delta,\ |x - x_0| \le P\}.$$

It is straightforward to check $\tilde{K}$ is compact (it is the image of the compact set $K \times \bar{B}_P(0) \times [-\delta, \delta] \subset \mathbb{R}^{n+1} \times \mathbb{R}^{n+1}$ under the continuous map $+ : \mathbb{R}^{n+1} \times \mathbb{R}^{n+1} \to \mathbb{R}^{n+1}$). Therefore, since v is continuous, M can be chosen independently of $(x_0, t_0) \in K$.

(Note the reason we need to go to all of $\tilde{K}$: the definition of M in the proof of Theorem 5 above is

$$M = \sup_{|t - t_0| \le \delta,\ |x - x_0| \le P} |v(x, t)|.$$

In order to have a single M work for all $(x_0, t_0) \in K$, we must let (x, t) range over all of $\tilde{K}$. L must be valid on all of $\tilde{K}$ as well, since we consider integrals from $t_0$ to t, where $(x_0, t_0) \in K$, $|t - t_0| \le \varepsilon < \delta$.)

Now we must ensure that $\varepsilon < \min\{\delta, P/M, 1/L\}$. All of these quantities can be chosen independently of $(x_0, t_0) \in K$.

Lemma 24 (Gluing solutions). Consider any two solutions to $\dot{x} = v(x, t)$ which are defined on intervals in ℝ. If the two coincide on any interval in ℝ, then they must coincide on the entire intersection of their intervals of definition. Thus they can be glued together to form a solution on the union of their intervals of definition.

Proof. Consider two solutions $\varphi_1, \varphi_2$ to $\dot{x} = v(x, t)$ defined on intervals $I_1$ and $I_2$. Assume they coincide on an interval $I_3 \subset I_1 \cap I_2$. We want to show $\varphi_1 = \varphi_2$ on all of $I_1 \cap I_2$. Let $I_4$ be the largest interval containing $I_3$ on which $\varphi_1$ and $\varphi_2$ coincide (take $I_4$ to be the path-connected component of the closed set $\{t : \varphi_1(t) = \varphi_2(t)\}$ containing $I_3$). Now we will show that $I_4 = I_1 \cap I_2$.


Assume $I_4 \ne I_1 \cap I_2$. Then since $I_4$ is a relatively closed subinterval of $I_1 \cap I_2$, there is an endpoint T of $I_4$ in the interior of $I_1 \cap I_2$. Now $\varphi_1$ and $\varphi_2$ are both solutions of

$$\dot{x} = v(x, t), \qquad x(T) = \varphi_1(T)\ [= \varphi_2(T)].$$

Proposition 22 shows that $\varphi_1$ and $\varphi_2$ must agree on a small interval $I_5 \ni T$. Thus $I_4$ must contain $I_5$, and we have a contradiction to the assumption that T is an endpoint of $I_4$ in the interior of $I_1 \cap I_2$. Thus $I_4 = I_1 \cap I_2$.

It may help to refer to the following picture of the intervals involved.

[Figure: the intervals $I_1$, $I_2$, $I_1 \cap I_2$, $I_3$, $I_4$ (with its endpoint T marked), and $I_5$.]

Now we have proved that $\varphi_1 = \varphi_2$ on the intersection of their domains of definition $I_1 \cap I_2$. To extend to $I_1 \cup I_2$, define

$$\varphi(t) = \begin{cases} \varphi_1(t) & \text{for } t \in I_1, \\ \varphi_2(t) & \text{for } t \in I_2 \setminus I_1. \end{cases}$$

Note that φ is a solution to the differential equation since both $\varphi_1$ and $\varphi_2$ are. There is no trouble with the differentiability of this piecewise-defined function since $\varphi_1 = \varphi_2$ on the whole interval $I_1 \cap I_2$.

For simplicity, consider only solutions moving forward in time. Let

$$E = \{t \in I^+ : \text{there is a unique solution } \varphi \text{ to (13) on } [t_0, t)\},$$

where $I^+ = I \cap (t_0, \infty)$. We will set this E to be equal to $J^+ = J \cap (t_0, \infty)$. Uniqueness on $[t_0, t)$ means any other solution to the initial value problem defined on an interval containing $[t_0, t)$ must coincide with φ there. It will suffice to prove the following

Lemma 25. If $\sup_E |\varphi| \le C < \infty$, then $E = I^+$.

Proof. Assume |φ| is uniformly bounded on E. Then to prove the lemma it is enough to show that E is a nonempty, open, and closed subset of $I^+$ (and so $E = I^+$ since $I^+$ is connected). E is nonempty by Theorem 5 and Lemma 24 above.

To show E is open in $I^+$, let T ∈ E. Then there is a unique solution φ defined on $[t_0, T)$. First we note that $(t_0, T] \subset E$. To see this, let $T' \in (t_0, T]$. Then the restriction of $\varphi = \varphi_T$ to $[t_0, T')$ is a solution to (13) on $[t_0, T')$. Moreover, it is unique, since any other solution to (13) on $[t_0, T')$ agrees with φ on a neighborhood of $t_0$, and so Lemma 24 shows they must agree on all of $[t_0, T')$.

So to show E is open, we may restrict our attention to times larger than T. Since |φ| is uniformly bounded by C and $[t_0, T]$ is a compact subinterval of I, we may apply Lemma 23 to show there is a uniform ε so that any solution to the differential equation with initial condition $x(\tau) = \chi$ for $\tau \in [t_0, T]$, $|\chi| \le C$ must exist on $[\tau - \varepsilon, \tau + \varepsilon]$. Now we may consider the initial value problem

$$\dot{x} = v(x, t), \qquad x\left(T - \tfrac{\varepsilon}{2}\right) = \varphi\left(T - \tfrac{\varepsilon}{2}\right). \tag{14}$$

So Lemma 23 shows there is a solution $\tilde{\varphi}$ to this initial value problem which exists on $[T - \tfrac{3\varepsilon}{2}, T + \tfrac{\varepsilon}{2}]$. Moreover, Lemma 24 says that $\varphi = \tilde{\varphi}$ on the intersection of their intervals of definition, and moreover, that φ may be extended by $\tilde{\varphi}$ to a solution on $[t_0, T + \tfrac{\varepsilon}{2}]$. Lemma 24 also implies this extension is unique on every subinterval containing $t_0$, and so in particular $[T - \tfrac{\varepsilon}{2}, T + \tfrac{\varepsilon}{2}] \subset E$ and E is open.

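[Figure: the intervals $[t_0, T]$ and $[T - \tfrac{3\varepsilon}{2}, T + \tfrac{\varepsilon}{2}]$, with the points $T - \tfrac{\varepsilon}{2}$ and T marked.]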

It remains to show that E is closed in $I^+$. Let $T \in \bar{E} \cap I^+$. Let $t_i \in E$, $t_i \to T$. Then the assumption that |φ| ≤ C on E implies there is a uniform ε so that for all $t_i$, there is a solution on $[t_i - \varepsilon, t_i + \varepsilon]$. Choose $t_i$ so that $|T - t_i| < \varepsilon$. Also, let $\tau < t_i$ so that $|T - \tau| < \varepsilon$. Now we use the same argument as in previous paragraphs: use the solution φ on $[t_0, t_i)$ to construct a solution $\tilde{\varphi}$ on $[\tau - \varepsilon, \tau + \varepsilon] \ni T$. Lemma 24 allows us to glue φ and $\tilde{\varphi}$ together to form a unique solution valid on $[t_0, \tau + \varepsilon] \ni T$. So T ∈ E as above and E is closed in $I^+$.

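[Figure: the points $T - \varepsilon$, τ, $t_i$, and T, together with the intervals $[t_0, T]$ and $[\tau - \varepsilon, \tau + \varepsilon]$.]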


This Lemma 25 completes the proof of the Extension Theorem 6, at least for solutions moving forward in time. The reason is this: if there is a time $T \in I^+ \cap \partial J$ (we may choose $I^+$ since we are only moving forward in time), then

$$E = J^+ \ne I^+.$$

Therefore, by the contrapositive of Lemma 25, $\sup_E |\varphi| = \infty$. But since φ is continuous on $[t_0, T)$, we must have $\limsup_{t \to T} |\varphi(t)| = \infty$.

The argument for solutions moving backward in time is the same.

The above theorem may be improved as follows:

Theorem 7. Consider an initial value problem $\dot{x} = v(x, t)$, $x(t_0) = x_0$. Assume v is continuous and locally Lipschitz in U, where U is a connected open subset of $\mathbb{R}^n \times \mathbb{R}$ containing $(x_0, t_0)$. Then there is an open interval J satisfying $t_0 \in J$ and a unique solution $\varphi : J \to \mathbb{R}^n$ to the initial value problem. Moreover, J is maximal in the following sense: Let $J^+ = J \cap (t_0, \infty)$ and $J^- = J \cap (-\infty, t_0)$. Then neither of the graphs $G^\pm = \{(t, \varphi(t)) : t \in J^\pm\}$ is contained in any compact subset of U.

The proof is essentially the same as that of Theorem 6.

Here is an important principle which follows from the basic theorems:

Proposition 26. Consider the graph of a solution (t, x(t)) to a differential equation $\dot{x} = v(x, t)$, where v is Lipschitz. If any two solutions have graphs which cross, then they must coincide on the intersection of their intervals of definition.

Proof. Let $\varphi_1$ and $\varphi_2$ be the two solutions. If their graphs cross at $(t_0, x_0)$, then they both solve the initial value problem

$$\dot{x} = v(x, t), \qquad x(t_0) = x_0.$$

The solutions must coincide on a small interval by Proposition 22, and then must coincide on the whole intersection of their intervals of definition by Lemma 24.

Homework Problem 15. Consider the initial value problem $\dot{x} = x^2 + t$, x(0) = 1. Show that the solution to this problem (moving forward in time) exists only until some time T > 0, where T < 1.


Hint: See Examples 6 and 7 above. Let φ(t) be the solution to the current initial value problem. We will compare φ to the solution $\psi(t) = \frac{1}{1 - t}$ of the initial value problem $\dot{x} = x^2$, x(0) = 1. Let J be the maximal interval on which φ can be extended. Let $J^+ = J \cap (0, \infty)$; T is then the positive endpoint of $J^+$. Now consider the interval

$$E = \{t \in J^+ : \varphi(\tau) \ge \psi(\tau) \text{ for all } \tau \in (0, t]\}.$$

(a) Show that $E = J^+$ implies T ≤ 1. (Use Theorem 6.)

(b) Proceed to show $E = J^+$. It suffices to show E is nonempty, open and closed in $J^+$. Why?

(c) To show E is nonempty, differentiate the equation $\dot{\varphi} = \varphi^2 + t$ at t = 0. This will allow you to compute $\ddot{\varphi}(0)$. Show that $\varphi(0) = \psi(0)$, $\dot{\varphi}(0) = \dot{\psi}(0)$, and $\ddot{\varphi}(0) > \ddot{\psi}(0)$. Why does this show E is nonempty? (Use Taylor's Theorem or integrate in t twice; in particular, by the regularity results in Subsection 2.5 below, $\ddot{\varphi}$ is continuous.)

(d) To show E is open, show that φ(t) > ψ(t) for t ∈ E.

(e) To show E is closed, use the continuity of φ and ψ. So this proves $E = J^+$ and so T ≤ 1.

(f) To show T < 1, note that part (c) implies there is a point τ ∈ E where φ(τ) > ψ(τ). Let $\tilde{\psi}(t)$ be the solution to the initial value problem $\dot{x} = x^2$, $x(\tau) = \varphi(\tau)$. Solve this equation explicitly and show that $\tilde{\psi}$ blows up at a time $\tilde{T} < 1$. Then note that parts (a)-(e) can be repeated to show that $J^+ \subset (0, \tilde{T})$.

2.4 Linear systems

If $x \in \mathbb{R}^n$, a homogeneous linear system is a system of the form $\dot{x} = A(t)x$, where A(t) is an n × n matrix-valued function of t alone. In this case, it is straightforward to see that the space of solutions is a vector space over ℝ. In other words, if α ∈ ℝ and φ, ψ satisfy the equation, then αφ + ψ also satisfies the equation. The existence and uniqueness theorem allows us to find the dimension of the solution space.


Proposition 27. Consider the equation $\dot{x} = A(t)x$, where A(t) is a continuous n × n matrix-valued function of t, and $x(t) \in \mathbb{R}^n$. For each $t_0$, there is an interval $I \ni t_0$ so that the space of solutions φ(t) on I has dimension n. Consider an initial value condition $x(t_0) = x_0$. Let $\varphi_{x_0}(t)$ be the solution to this initial value problem. Then the map $S : x_0 \mapsto \varphi_{x_0}$ is a linear isomorphism from $\mathbb{R}^n$ to the space of solutions defined on I.

Remark. It is not too hard to show that the interval I can be taken to be the maximal open interval containing $t_0$ on which A(t) is continuous. (See Michael Taylor, Partial Differential Equations, Basic Theory.)

Proof. A(t)x is locally Lipschitz in x and continuous in t, as needed for Theorems 5 and 6. First of all, for a basis $\xi_i$ of $\mathbb{R}^n$, let I be a small interval on which all the solutions $\varphi_{\xi_i}$ exist. If $x_0 = a^i \xi_i$, then by uniqueness $\varphi_{x_0} = a^i \varphi_{\xi_i}$, so every $\varphi_{x_0}$ exists on I, and the map $x_0 \mapsto \varphi_{x_0}$ is linear. S is injective since if $x_0 \ne y_0$, then $\varphi_{x_0}(t_0) \ne \varphi_{y_0}(t_0)$, and thus $\varphi_{x_0} \ne \varphi_{y_0}$. Again by uniqueness, any solution φ to $\dot{x} = A(t)x$ is determined by the initial value $\varphi(t_0) = x_0$, and so S is onto.

Given a linear equation $\dot{x} = A(t)x$, for $x = x(t) \in \mathbb{R}^n$, we can consider a similar equation $\dot{X} = A(t)X$ for X = X(t) an n × n matrix-valued function. The solution Φ(t) of the initial value problem

$$\dot{X} = A(t)X, \qquad X(t_0) = I \text{ the identity matrix},$$

is called the fundamental solution of the equation $\dot{x} = A(t)x$. It is straightforward to see that the ith column of Φ(t) is the solution to $\dot{x} = A(t)x$, $x^j(t_0) = \delta^j_i$. Moreover, the fundamental solution can be used to compute any solution to the differential equation near $t_0$.

Lemma 28. On the maximal interval of existence of the fundamental solution Φ(t) of $\dot{x} = A(t)x$, the solution to the initial value problem

$$\dot{x} = A(t)x, \qquad x(t_0) = x_0,$$

is given by $\Phi(t)x_0$.

Proof. The proof is an immediate calculation.
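Lemma 28 can be illustrated numerically. The sketch below assumes a hypothetical continuous coefficient matrix A(t), builds Φ(3) column by column from the initial conditions e₁, e₂ (the columns of the identity), and compares Φ(3)x₀ with a direct numerical solution started from x₀. Runge-Kutta is our choice of integrator; because each of its stages is linear in the state for a linear system, the two results should agree up to rounding error.

```python
import math


def A(t):
    # Hypothetical continuous 2x2 coefficient matrix, chosen only for illustration.
    return [[0.0, 1.0], [-1.0 - 0.5 * math.sin(t), -0.1]]


def matvec(m, x):
    """Matrix-vector product m x."""
    return [sum(m[i][k] * x[k] for k in range(len(x))) for i in range(len(m))]


def shifted(y, k, c):
    """The vector y + c k."""
    return [yi + c * ki for yi, ki in zip(y, k)]


def rk4_system(f, t0, y0, t_end, steps):
    """Classical RK4 for y' = f(t, y), where y is a list of floats."""
    h = (t_end - t0) / steps
    t, y = t0, list(y0)
    for _ in range(steps):
        k1 = f(t, y)
        k2 = f(t + h / 2, shifted(y, k1, h / 2))
        k3 = f(t + h / 2, shifted(y, k2, h / 2))
        k4 = f(t + h, shifted(y, k3, h))
        y = [yi + h / 6 * (a + 2 * b + 2 * c + d)
             for yi, a, b, c, d in zip(y, k1, k2, k3, k4)]
        t += h
    return y


def rhs(t, x):
    """The linear system x' = A(t) x."""
    return matvec(A(t), x)


def fundamental_solution(t0, t, steps=2000):
    """Phi(t): column i is the solution of x' = A(t)x with x(t0) = e_i."""
    columns = [rk4_system(rhs, t0, e, t, steps) for e in ([1.0, 0.0], [0.0, 1.0])]
    return [[columns[j][i] for j in range(2)] for i in range(2)]


x0 = [2.0, -1.0]
direct = rk4_system(rhs, 0.0, x0, 3.0, 2000)
via_phi = matvec(fundamental_solution(0.0, 3.0), x0)
print(direct, via_phi)
```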

Homework Problem 16. An inhomogeneous linear system is a system of the form

$$\dot{x} = A(t)x + b(t), \tag{15}$$

where A(t) and x are as above and b(t) is a continuous $\mathbb{R}^n$-valued function.


(a) Let ψ(t) be a solution to (15). Show that the solution space to (15) is equal to

$$\{\psi(t) + \varphi(t) : \varphi(t) \text{ solves } \dot{x} = A(t)x\}.$$

(b) In dimension 1, let Φ(t) be the fundamental solution to $\dot{x} = A(t)x$. Show that the general solution to (15) is

$$\Phi(t)\left[ \int \frac{b(t)}{\Phi(t)}\, dt + C \right].$$

(c) Still in dimension 1, solve the initial value problem $\dot{x} = x + t$, x(0) = 1.

An important class of equations consists of those with constant coefficients: $\dot{x} = Ax$, for A a constant n × n matrix. The fundamental solution to such an equation (with $t_0 = 0$) can be calculated directly. In the case that A is diagonalizable, write $A = PDP^{-1}$, with $D = \operatorname{diag}(\lambda_1, \dots, \lambda_n)$ the diagonal matrix with the eigenvalues $\lambda_i$ of A along the diagonal and P the matrix whose columns are a basis of eigenvectors for the appropriate eigenvalues. Then if we define

$$e^{tD} = \operatorname{diag}(e^{t\lambda_1}, \dots, e^{t\lambda_n}),$$

then the fundamental solution to $\dot{x} = Ax$ is given by

$$e^{tA} \equiv P e^{tD} P^{-1}.$$

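To check that $e^{tA}$ is the fundamental solution, note that $e^{0A} = I$ and

$$\begin{aligned}
\frac{d}{dt} e^{tA} &= \frac{d}{dt}\left( P e^{tD} P^{-1} \right) \\
&= P \left( \frac{d}{dt} e^{tD} \right) P^{-1} \\
&= P D e^{tD} P^{-1} \\
&= P D P^{-1} P e^{tD} P^{-1} \\
&= A e^{tA}.
\end{aligned}$$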

One thing to note is D and P may be complex-valued matrices. This doesn't cause any problem if we use Euler's formula

$$e^{x + iy} = e^x(\cos y + i \sin y).$$
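The formula $e^{tA} = Pe^{tD}P^{-1}$, including the complex case handled by Euler's formula, can be checked against the power series Σ (tA)^k/k!. The sketch below uses two 2 × 2 examples whose eigenvectors we worked out by hand (hypothetical test matrices, not from the notes): a symmetric matrix with eigenvalues 3 and −1, and the rotation generator with eigenvalues ±i, for which $e^{tA}$ should be the rotation by angle t. In the second case P has complex entries, yet the product comes out real up to rounding.

```python
import cmath
import math


def mat_mul(a, b):
    """Product of square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def inv2(m):
    """Inverse of a 2x2 (possibly complex) matrix by the cofactor formula."""
    (p, q), (r, s) = m
    d = p * s - q * r
    return [[s / d, -q / d], [-r / d, p / d]]


def exp_tA_eigen(P, eigenvalues, t):
    """e^{tA} = P e^{tD} P^{-1}, where A = P D P^{-1} with D = diag(eigenvalues)."""
    etD = [[cmath.exp(t * eigenvalues[0]), 0], [0, cmath.exp(t * eigenvalues[1])]]
    return mat_mul(mat_mul(P, etD), inv2(P))


def exp_tA_series(A, t, terms=60):
    """Partial sum of sum_k (tA)^k / k!, used only as an independent check."""
    n = len(A)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[x * t / k for x in row] for row in mat_mul(term, A)]
        result = [[r + x for r, x in zip(r_row, t_row)] for r_row, t_row in zip(result, term)]
    return result


t = 0.7
# Real eigenvalues: A1 has eigenvalues 3 and -1 with eigenvectors (1, 1) and (1, -1).
A1 = [[1.0, 2.0], [2.0, 1.0]]
E1 = exp_tA_eigen([[1, 1], [1, -1]], [3, -1], t)
# Complex eigenvalues: A2 has eigenvalues i and -i with eigenvectors (1, -i) and (1, i).
A2 = [[0.0, -1.0], [1.0, 0.0]]
E2 = exp_tA_eigen([[1, 1], [-1j, 1j]], [1j, -1j], t)
print(E2)  # entries close to [[cos t, -sin t], [sin t, cos t]] with ~0 imaginary parts
```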

Not every matrix B is diagonalizable. To find a general formula for the fundamental solution $e^{tB}$, we need to deal with the case of Jordan blocks. The following problem addresses this.


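Homework Problem 17. Let B be the n × n Jordan block matrix

$$\begin{pmatrix}
\lambda & 1 & 0 & \cdots & 0 \\
0 & \lambda & 1 & \cdots & 0 \\
0 & 0 & \lambda & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \lambda
\end{pmatrix} \tag{16}$$

with λ on the diagonal, 1 just above the diagonal, and 0 elsewhere. Find the fundamental solution $e^{tB}$ to $\dot{x} = Bx$.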

Hint: Write out the system of equations in terms of components. Note that $\dot{x}^n$ only involves $x^n$ and not any other $x^i$. So first solve the appropriate initial value problems for $x^n$ (you'll need to do one initial value problem for each column of the identity matrix I). Then do $x^{n-1}$, then $x^{n-2}$, etc., and find a formula that works for all $x^i$.

Alternatively, it is possible to write out $e^{tB}$ as a power series. If you approach the problem this way, you must check to be sure your answer works.

Of course the reason we consider Jordan blocks is the following famous theorem.

Theorem 8 (Jordan Canonical Form). Let A be an n × n complex matrix. Then we can write $A = PBP^{-1}$, where B is an upper triangular, block diagonal matrix of the form

$$B = \begin{pmatrix}
B_1 & 0 & 0 & \cdots & 0 \\
0 & B_2 & 0 & \cdots & 0 \\
0 & 0 & B_3 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & B_m
\end{pmatrix},$$

where each $B_i$ is an $l_i \times l_i$ Jordan block matrix of the form (16) for i = 1, …, m, with $\lambda = \lambda_i$ an eigenvalue of A. Of course $l_1 + \cdots + l_m = n$. If λ is a root of the characteristic polynomial $\det(\lambda I - A)$ repeated k times, then

$$\sum_{\lambda_i = \lambda} l_i = k.$$

B is unique up to the ordering of the blocks $B_i$.


Remark. $A$ is diagonalizable if and only if each Jordan block is $1\times 1$. If the characteristic polynomial of $A$ has distinct roots, then $A$ is diagonalizable, but the converse is false in general ($A = I$, the identity matrix, is a counterexample).

Homework Problem 18. Assume that all the eigenvalues of the $n\times n$ matrix $A$ have negative real part. ($A$ is not necessarily diagonalizable.) Show that $e^{tA}\to 0$ as $t\to\infty$. (Just check that each entry in the matrix $e^{tA}$ goes to 0.)

Homework Problem 19. Solve the initial value problem

$$\dot x = 2x - y,\qquad \dot y = 2x + 5y,\qquad x(0) = 2,\qquad y(0) = 1.$$

2.5 Regularity

Regularity of a function refers to how many times the function may be differentiated. A function is (locally) $C^k$ if it and all of its partial derivatives up to order $k$ are continuous. A function is $C^\infty$ if it and all of its partial derivatives of all orders are continuous. For the purposes of this course, a function is smooth if it is $C^\infty$ (in other settings a function may be called smooth if it has as many derivatives as the purpose at hand requires). There are other notions of regularity in which the function and perhaps its derivatives, suitably defined, are in $L^p$ or other Banach spaces.

A vector-valued function is smooth or $C^k$ if and only if each of its component functions is smooth or $C^k$ respectively.

Theorem 9. Assume $v\colon O\times I\to\mathbb R^n$ is smooth ($O\subset\mathbb R^n$ is a domain and $I\subset\mathbb R$ is an open interval). Any solution to $\dot x = v(x,t)$ is smooth.

Proof. Let $\phi$ be a solution. Since $\phi$ exists, $\phi$ is differentiable, and thus continuous. Since $v$ is continuous as well, $\dot\phi = v(\phi,t)$ is continuous and so $\phi$ is (locally) $C^1$. Now since $v$ is smooth, we may differentiate to find

$$\ddot\phi(t) = \frac{\partial v}{\partial x^i}(\phi,t)\,\dot\phi^i(t) + \frac{\partial v}{\partial t}(\phi,t).$$

Now since $\phi$ and $\dot\phi$ and the partial derivatives of $v$ are continuous, we see that $\ddot\phi$ is continuous and $\phi$ is (locally) $C^2$. Since $v$ is smooth, we can keep differentiating, using the chain and product rules, to find by induction that $d^m\phi/dt^m$ is continuous for all $m$, and so $\phi$ is $C^\infty$.


Remark. The technique used in the proof of Theorem 9 above is called bootstrapping. In this process, once we know that $\phi$ is $C^0$, we plug into the equation to find that $\phi$ is $C^1$. Then we use the fact that $\phi$ is $C^1$ to prove $\phi$ is $C^2$, etc.

Remark. The proof above also shows that if $v$ is $C^k$, then $\phi$ is $C^{k+1}$.

2.6 Higher order equations

A higher-order system of ODEs is of the form

$$x^{(m)} = v(x^{(m-1)},\dots,\dot x,x,t), \tag{17}$$

where of course $x^{(m)} = \dfrac{d^mx}{dt^m}$. There is an easy trick to transform this system to an equivalent first-order system with more variables. Let $y_1 = \dot x,\dots,y_{m-1} = x^{(m-1)}$. Then it is easy to see the system (17) above is equivalent to the system

$$\begin{aligned} \dot y_{m-1} &= v(y_{m-1},\dots,y_1,x,t),\\ \dot y_{m-2} &= y_{m-1},\\ &\;\;\vdots\\ \dot y_1 &= y_2,\\ \dot x &= y_1. \end{aligned}\tag{18}$$
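To see the reduction (18) in action, here is a small pure-Python sketch for the case $m = 2$: it rewrites $\ddot x = v(\dot x,x,t)$ as the first-order system $\dot y_1 = v(y_1,x,t)$, $\dot x = y_1$ and integrates it with the classical fourth-order Runge-Kutta method. The integrator, the helper names `rk4` and `first_order`, and the test equation $\ddot x = -x$ (exact solution $\cos t$) are assumptions made for illustration only.

```python
import math

def rk4(f, state, t0, t1, steps=1000):
    """Classical fourth-order Runge-Kutta for a first-order system s' = f(s, t)."""
    h = (t1 - t0) / steps
    t, s = t0, list(state)
    for _ in range(steps):
        k1 = f(s, t)
        k2 = f([si + h / 2 * ki for si, ki in zip(s, k1)], t + h / 2)
        k3 = f([si + h / 2 * ki for si, ki in zip(s, k2)], t + h / 2)
        k4 = f([si + h * ki for si, ki in zip(s, k3)], t + h)
        s = [si + h / 6 * (a + 2 * b + 2 * c + d)
             for si, a, b, c, d in zip(s, k1, k2, k3, k4)]
        t += h
    return s

def first_order(v):
    """Rewrite x'' = v(x', x, t) as system (18) with m = 2: y1' = v(y1, x, t), x' = y1."""
    def f(state, t):
        y1, x = state
        return [v(y1, x, t), y1]
    return f

# x'' = -x with x(0) = 1, x'(0) = 0; the state is ordered (y1, x) = (x', x).
y1_end, x_end = rk4(first_order(lambda y1, x, t: -x), [0.0, 1.0], 0.0, 2.0)
print(abs(x_end - math.cos(2.0)) < 1e-8)    # True
print(abs(y1_end + math.sin(2.0)) < 1e-8)   # True, since y1 = x' = -sin t
```

Any convergent one-step method would do here; Runge-Kutta is chosen only because it is short and accurate enough that the comparison with $\cos t$ is tight.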

This first-order system leads us to the appropriate formulation of the initial-value problem:

Theorem 10. Let $U$ be a neighborhood of $(x^{m-1}_0,\dots,x^1_0,x_0,t_0)$ in $\mathbb R^{nm+1} = \mathbb R^n\times\cdots\times\mathbb R^n\times\mathbb R$. Let $v\colon U\to\mathbb R^n$ be locally Lipschitz. Then there is an interval $J$ on which there is a unique solution to the initial value problem

$$\begin{aligned} x^{(m)} &= v(x^{(m-1)},\dots,\dot x,x,t),\\ x^{(m-1)}(t_0) &= x^{m-1}_0,\\ &\;\;\vdots\\ \dot x(t_0) &= x^1_0,\\ x(t_0) &= x_0. \end{aligned}\tag{19}$$

Moreover, if $T$ is an endpoint of $J$ (either finite or infinite), then as $t\to T$, $(x^{(m-1)},\dots,\dot x,x,t)$ leaves every compact subset of $U$.

Proof. Apply Theorems 5 and 7 to the equivalent first-order system (18).


So for an $m$th order differential equation, we need initial conditions for the function and its derivatives up to order $m-1$.

Remark. The trick of introducing new variables into a system of ODEs is standard in physics. For a particle at position $x = x(t)$, a typical equation involves how a force acts on the particle. The sum $F$ of the forces acting on the particle must be equal to $m\ddot x$, where $m$ is a constant called the mass. It is standard to introduce a new vector quantity, called the momentum $q = m\dot x$. Then $F = m\ddot x$ is equivalent to the system

$$\dot q = F,\qquad \dot x = \frac{q}{m}.$$

Again, an important class of examples is linear equations with constant coefficients. If

$$x^{(m)} + a_{m-1}x^{(m-1)} + \cdots + a_1\dot x + a_0x = 0,$$

for $x$ a real-valued function, then the functions $e^{\lambda_kt}$ are linearly independent in the solution space, where the distinct $\lambda_k$ solve the characteristic equation

$$\lambda^m + a_{m-1}\lambda^{m-1} + \cdots + a_1\lambda + a_0 = 0.$$

If all the roots are distinct, then the $e^{\lambda_kt}$ form a basis. If a root $\lambda_k$ is repeated $l$ times, then we must consider functions of the form $t^je^{\lambda_kt}$ for $j = 0,\dots,l-1$ to form a basis of the solution space.

Euler’s formula again allows us to handle complex roots of the characteristic equation.
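As an illustration of this recipe for $m = 2$, the following Python sketch builds a real basis of solutions of $\ddot x + a_1\dot x + a_0x = 0$ from the roots of $\lambda^2 + a_1\lambda + a_0$, using $te^{\lambda t}$ for a repeated root and, via Euler’s formula, the real and imaginary parts of $e^{(\alpha\pm i\beta)t}$ for a complex pair. It then checks each basis function against the equation with finite differences. The function names, the tolerance used to detect a repeated root, and the sample coefficients are hypothetical choices.

```python
import math

def real_basis(a1, a0, tol=1e-12):
    """A real basis of solutions of x'' + a1 x' + a0 x = 0 (the case m = 2)."""
    disc = a1 * a1 - 4 * a0                   # discriminant of lambda^2 + a1 lambda + a0
    if abs(disc) < tol:                       # repeated root: e^{lam t} and t e^{lam t}
        lam = -a1 / 2
        return [lambda t: math.exp(lam * t), lambda t: t * math.exp(lam * t)]
    if disc > 0:                              # two distinct real roots
        l1 = (-a1 + math.sqrt(disc)) / 2
        l2 = (-a1 - math.sqrt(disc)) / 2
        return [lambda t: math.exp(l1 * t), lambda t: math.exp(l2 * t)]
    al, be = -a1 / 2, math.sqrt(-disc) / 2    # complex pair al +- i be
    return [lambda t: math.exp(al * t) * math.cos(be * t),
            lambda t: math.exp(al * t) * math.sin(be * t)]

def residual(x, a1, a0, t, h=1e-4):
    """Central-difference estimate of x''(t) + a1 x'(t) + a0 x(t)."""
    d1 = (x(t + h) - x(t - h)) / (2 * h)
    d2 = (x(t + h) - 2 * x(t) + x(t - h)) / (h * h)
    return d2 + a1 * d1 + a0 * x(t)

# Repeated root 1, distinct roots -1 and -2, complex roots -1 +- 2i.
for a1, a0 in [(-2.0, 1.0), (3.0, 2.0), (2.0, 5.0)]:
    print([abs(residual(x, a1, a0, 1.0)) < 1e-5 for x in real_basis(a1, a0)])
```

Each line printed should be `[True, True]`; the residuals are only small up to finite-difference error, which is why the tolerance is loose.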

Homework Problem 20. For which real values of the constants $a$ and $b$ do all the solutions to

$$\ddot x + a\dot x + bx = 0$$

go to 0 as $t\to\infty$? Prove your answer, and draw your answer as a region in the $(a,b)$ plane.

2.7 Dependence on initial conditions and parameters

We’ve shown above that if $v = v(x,t)$ is smooth, then the resulting solution to $\dot x = v(x,t)$, $x(t_0) = x_0$ is also smooth as a function of $t$. The initial value problem also depends on the initial point $x_0$. We investigate the regularity of the solution as a function of $x_0$.


First of all we remark that, by Lemma 23, there is a neighborhood $N$ of $(x_0,t_0)$ in $\mathbb R^{n+1}$ and an $\varepsilon > 0$ so that every solution to the equation with initial condition $x(\tau) = y$ for $(y,\tau)\in N$ exists for $|t-\tau| < \varepsilon$. This existence on a neighborhood allows us to consider taking derivatives in $y$ in what follows.

Theorem 11. Let $v$ be a $C^2$ function on a neighborhood of the initial conditions $(y,t_0)\in\mathbb R^n\times\mathbb R$. Then the solution $\phi = \phi(y,t)$ to the initial value problem

$$\dot x = v(x,t),\qquad x(t_0) = y,$$

is $C^1$ in $y$.

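Proof. If $\partial\phi/\partial y^i$ exists, then it must satisfy

$$\frac{\partial}{\partial t}D_y\phi = D_xv(\phi,t)\,D_y\phi.$$

(Here $D_y\phi$ is the total derivative matrix with respect to the $y$ variables, so its entries are $\partial\phi^j/\partial y^i$.) So $\Phi = (\phi,D_y\phi) = (x,z)$ should satisfy the initial value problem

$$\begin{aligned} \dot x &= v(x,t), & x(t_0) &= y,\\ \dot z &= D_xv(x,t)\,z, & z(t_0) &= I \text{ the identity matrix.} \end{aligned}\tag{20}$$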

Note that since $v$ is $C^2$, $D_xv = (\partial v^k/\partial x^j)$ is $C^1$ and is thus locally Lipschitz by Proposition 15. Even though we don’t yet know that the derivative $D_y\phi$ satisfies the equation, we do know that the initial value problem (20) is solvable.

In order to show that the solution to (20) is the partial derivative, we return to the proof of Theorem 5. Let $\phi_0 = y$, $\psi_0 = I$ the identity matrix. Then $(\phi_0,\psi_0)$ satisfy the initial conditions in (20). Now we form Picard approximations

$$\phi_{n+1}(y,t) = y + \int_{t_0}^t v(\phi_n(y,\tau),\tau)\,d\tau,$$

$$\psi_{n+1}(y,t) = I + \int_{t_0}^t D_xv(\phi_n(y,\tau),\tau)\,\psi_n(y,\tau)\,d\tau.$$

It is easy to show by induction that $D_y\phi_n = \psi_n$. We already have the initial step $n = 0$, and since we can differentiate under the integral sign (see Proposition 11 above), we can easily check that $D_y\phi_n = \psi_n$ implies $D_y\phi_{n+1} = \psi_{n+1}$.

We know by the proof of Theorem 5 that $\phi_n\to\phi$ and $\psi_n\to\psi$ uniformly on a small interval containing $t_0$. Then Proposition 8 shows that $\partial\phi/\partial y^i = \psi_i$, the $i$th column of $\psi$, for $i = 1,\dots,n$. Since these partial derivatives are continuous (the uniform limit of continuous functions is continuous), Proposition 6 shows $D_y\phi = \psi$.
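The conclusion $D_y\phi = z$ can be checked numerically on a simple scalar example. The sketch below assumes $n = 1$, $t_0 = 0$ and the sample right-hand side $v(x,t) = x^2$, for which $\phi(y,t) = y/(1-yt)$ and $\partial\phi/\partial y = 1/(1-yt)^2$ are known in closed form. It integrates system (20) with a classical Runge-Kutta step (an illustrative choice; any convergent scheme would do) and compares $z$ both with the exact derivative and with a finite difference of $\phi$ in $y$. All helper names are hypothetical.

```python
def rk4(f, state, t0, t1, steps=2000):
    """Classical fourth-order Runge-Kutta for a first-order system s' = f(s, t)."""
    h = (t1 - t0) / steps
    t, s = t0, list(state)
    for _ in range(steps):
        k1 = f(s, t)
        k2 = f([si + h / 2 * ki for si, ki in zip(s, k1)], t + h / 2)
        k3 = f([si + h / 2 * ki for si, ki in zip(s, k2)], t + h / 2)
        k4 = f([si + h * ki for si, ki in zip(s, k3)], t + h)
        s = [si + h / 6 * (a + 2 * b + 2 * c + d)
             for si, a, b, c, d in zip(s, k1, k2, k3, k4)]
        t += h
    return s

def v(x, t):
    return x * x             # sample right-hand side (smooth, so certainly C^2)

def dv_dx(x, t):
    return 2 * x             # D_x v for n = 1

def solve_system_20(y, t):
    """x' = v(x,t), z' = D_x v(x,t) z with x(0) = y, z(0) = 1: system (20), t0 = 0, n = 1."""
    def f(s, tau):
        x, z = s
        return [v(x, tau), dv_dx(x, tau) * z]
    return rk4(f, [y, 1.0], 0.0, t)

y, t = 0.8, 0.5
x, z = solve_system_20(y, t)
phi_exact = y / (1 - y * t)          # exact solution of x' = x^2, x(0) = y
dphi_dy = 1 / (1 - y * t) ** 2       # its partial derivative in y
h = 1e-5
fd = (solve_system_20(y + h, t)[0] - solve_system_20(y - h, t)[0]) / (2 * h)

print(abs(x - phi_exact) < 1e-9, abs(z - dphi_dy) < 1e-9, abs(fd - z) < 1e-6)
```

The chosen time $t = 0.5$ stays well before the blow-up time $1/y = 1.25$ of this example.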

Remark. The previous theorem is true if we assume $v$ is only $C^1$ and not necessarily $C^2$. The proof is more involved in the case $v$ is only $C^1$. (See Taylor, Partial Differential Equations, Basic Theory, section 1.6.)

A bootstrapping argument can be used to prove the following

Proposition 29. For $r\ge 2$, let $v$ be a $C^r$ function on a neighborhood of the initial conditions $(x_0,t_0)\in\mathbb R^n\times\mathbb R$. Then the solution $\phi = \phi(y,t)$ to the initial value problem

$$\dot x = v(x,t),\qquad x(t_0) = y,$$

is $C^{r-1}$ in $y$.

Proof. Let Proposition $T_r$ be the proposition for a given $r\ge 2$. We proceed by induction. The case $r = 2$ is proved above in Theorem 11. Now assume that Proposition $T_r$ has been proved. To prove $T_{r+1}$, assume that $v$ is locally $C^{r+1}$ and let $\phi$ be a solution to the initial value problem. Then $D_xv$ is locally $C^r$. Now as above, the pair $(\phi,D_y\phi) = (x,z)$ satisfies

$$\dot x = v(x,t),\qquad \dot z = D_xv(x,t)\,z. \tag{21}$$

Now analyze the right-hand side of the equations in (21). They are $C^r$ functions of $x$, $z$, $t$. Therefore, Proposition $T_r$ shows that $z = D_y\phi$ is locally $C^{r-1}$ in $y$. Since the first partial derivatives of $\phi$ are $C^{r-1}$, $\phi$ is $C^r$. This proves the inductive step, and the proposition.

We also have the following

Corollary 30. If $v = v(x,t)$ is smooth ($C^\infty$), then the solution $\phi$ to the initial value problem $\dot x = v(x,t)$, $x(t_0) = y$ is smooth in $y$.

Moreover, it is not too hard to prove the following:


Theorem 12. Let $r\ge 2$. If $v(x,t)$ is $C^r$ jointly in $x$ and $t$, and if $\phi$ is the solution to $\dot x = v(x,t)$, $x(t_0) = y$, then $\phi$ is jointly $C^{r-1}$ in $y$, $t$ and $t_0$.

Idea of proof. The difficult part is already done (the $C^{r-1}$ dependence on $y$). For the rest, recall that any solution $\phi = \phi(y,t_0,t)$ satisfies

$$\phi(y,t_0,t) = y + \int_{t_0}^t v(\phi(y,t_0,\tau),\tau)\,d\tau.$$

Then use the Fundamental Theorem of Calculus and Proposition 11 above to produce a bootstrapping argument to show that the appropriate partial derivatives are continuous.

For a complete proof, see Arnol’d, Ordinary Differential Equations, section 32.5.

Homework Problem 21. For $f = f(x,t,y)$ a smooth function of the real variables $x$, $t$, and $y$, compute

$$\frac{d}{dt}\int_0^{t^2} f(x(t,y),t,y)\,dy.$$

Make sure your answer works for the functions $f(x,t,y) = x^2ty + t^3y^2 + x$, $x(t,y) = y^2 + t^2$.

Hint: Carefully rename all intermediate variables and apply the Chain Rule. It should also help to write down the antiderivative

$$F = \int f(x(t,y),t,y)\,dy$$

and work with the function $F$ using the Fundamental Theorem of Calculus.

Homework Problem 22 (Smooth dependence on parameters). Show that if $v = v(x,t,\alpha)$ is jointly smooth on a neighborhood of $(x_0,t_0,\alpha_0)$ in $\mathbb R^n\times\mathbb R\times\mathbb R^m$, then the solution $\phi$ to the initial value problem

$$\dot x = v(x,t,\alpha),\qquad x(t_0) = x_0$$

is smooth as a function of $\alpha$.

Hint: Show that this initial value problem is equivalent to the problem

$$\dot x = v(x,t,\beta),\quad x(t_0) = x_0,\quad \dot\beta = 0,\quad \beta(t_0) = \alpha.$$


2.8 Autonomous equations

An ODE system of the form $\dot x = v(x)$ is autonomous. In other words, a system is autonomous if there is no explicit dependence on $t$. The main fact about autonomous systems is the following proposition, whose proof is an easy computation:

Proposition 31. If $\phi$ is a solution to $\dot x = v(x)$, then for all $T\in\mathbb R$, $\tilde\phi(t) = \phi(t+T)$ is also a solution.

A constant solution to an ODE system is called an equilibrium solution. The equilibrium solutions to autonomous equations correspond to the roots of $v$.

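Example 9. Consider the initial value problem $\dot x = x^2 - 1$. To solve it, note that we have the equilibrium solutions $x = 1$ and $x = -1$. If $x^2 - 1\ne 0$, compute

$$\begin{aligned}
\frac{dx}{dt} &= x^2 - 1,\\
\int\frac{dx}{x^2-1} &= \int dt,\\
\int\frac12\left(\frac{1}{x-1} - \frac{1}{x+1}\right)dx &= t + C,\\
\frac12\ln\left|\frac{x-1}{x+1}\right| &= t + C,\\
\frac{x-1}{x+1} &= \pm e^{2t+2C} = Ae^{2t},\qquad A = \pm e^{2C}\ne 0,\\
x &= \frac{1 + Ae^{2t}}{1 - Ae^{2t}},\\
A &= \frac{x(0)-1}{x(0)+1}.
\end{aligned}$$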

If $x(0)\in(-1,1)$, then $A < 0$, and the solution $x$ exists for all time and is bounded between the equilibrium solutions at $1$ and $-1$. Moreover, $x$ approaches the equilibrium solutions: $x\to -1$ as $t\to\infty$ and $x\to 1$ as $t\to -\infty$. If $x(0) > 1$, then $A\in(0,1)$ and the solution exists only for $t\in(-\infty,-\frac12\ln A)$. If $x(0) < -1$, then $A > 1$ and the solution exists only for $t\in(-\frac12\ln A,\infty)$.
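The closed-form solution of Example 9 can be compared with a direct numerical integration. The Python sketch below (the function names and the Runge-Kutta integrator are illustrative assumptions) evaluates the formula for $x(t)$, computes the finite endpoint $-\frac12\ln A$ of the maximal interval when $|x(0)| > 1$, and checks the formula against a fourth-order Runge-Kutta solution.

```python
import math

def closed_form(x0, t):
    """Example 9: x(t) = (1 + A e^{2t}) / (1 - A e^{2t}), A = (x0 - 1)/(x0 + 1); assumes x0 != -1."""
    A = (x0 - 1) / (x0 + 1)
    return (1 + A * math.exp(2 * t)) / (1 - A * math.exp(2 * t))

def finite_endpoint(x0):
    """The finite endpoint -ln(A)/2 of the maximal interval, valid when |x0| > 1 (so A > 0)."""
    A = (x0 - 1) / (x0 + 1)
    return -0.5 * math.log(A)

def rk4_autonomous(v, x, t_end, steps=4000):
    """Classical RK4 for a scalar autonomous equation x' = v(x) on [0, t_end]."""
    h = t_end / steps
    for _ in range(steps):
        k1 = v(x)
        k2 = v(x + h / 2 * k1)
        k3 = v(x + h / 2 * k2)
        k4 = v(x + h * k3)
        x += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
    return x

def v(x):
    return x * x - 1

print(abs(rk4_autonomous(v, 0.5, 3.0) - closed_form(0.5, 3.0)) < 1e-9)  # True
print(closed_form(0.5, 10.0))    # very close to the equilibrium -1
print(finite_endpoint(2.0))      # ln(3)/2, about 0.549: right endpoint for x(0) = 2
print(finite_endpoint(-2.0))     # about -0.549: left endpoint for x(0) = -2
```

Integrating from $x(0) = 2$ past $t\approx 0.549$ would run into the blow-up described above, so any numerical comparison for $x(0) > 1$ has to stop before that time.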


This behavior is typical of the behavior of scalar autonomous equations $\dot x = v(x)$ for Lipschitz $v$. Any bounded solution which exists for all time must be asymptotic to equilibrium solutions as $t\to\pm\infty$. Also note that any integral curve $I$ acts as a barrier to other solutions, in that no other integral curves can cross $I$ (see Proposition 26 above).

Homework Problem 23. Let $v\colon\mathbb R\to\mathbb R$ be locally Lipschitz. Show that any bounded solution $\phi$ of $\dot x = v(x)$ which exists for all time satisfies $\lim_{t\to\infty}\phi(t) = c$, where $v(c) = 0$.

Hint: There are three cases:

Case 1: $v(\phi(0)) = 0$. Show that $\phi$ is constant by uniqueness.

Case 2: $v(\phi(0)) > 0$. Show that $v(\phi(t)) > 0$ for all $t$ (if it is ever equal to zero, apply the argument of Case 1 above to show $\phi$ is constant; also use the continuity of $v\circ\phi$). Now show $\phi(t)$ is always increasing, and so must have a finite limit $c$ as $t\to\infty$. Compute $\lim_{t\to\infty}v(\phi(t))$. Write

$$\infty > c = \phi(0) + \int_0^\infty\dot\phi(t)\,dt = \phi(0) + \int_0^\infty v(\phi(t))\,dt,$$

and show that $v(c) = 0$.

Case 3: $v(\phi(0)) < 0$ is essentially the same as Case 2.

2.9 Vector fields and flows

An important interpretation of autonomous systems of equations is given in terms of vector fields. Interpret $x(t)$ as a parametrized curve $x\colon I\to\mathbb R^n$, where $I\subset\mathbb R$ is an interval. Then $\dot x(t)$ is the tangent vector to the curve at time $t$. For $O\subset\mathbb R^n$ an open set, a function $v\colon O\to\mathbb R^n$ can be thought of as a vector field. In other words, at every point $x\in O$, $v(x)$ is a vector in $\mathbb R^n$ based at $x$. Then we have a natural interpretation of an autonomous differential equation $\dot x = v(x)$ as the flow along the vector field $v$.

For any solution to $\dot x = v(x)$, the tangent vector $\dot x(t)$ must be equal to the value of the vector field $v(x(t))$. The solution $x(t)$ is an integral curve to the equation $\dot x = v(x)$. The integral curves are tangent to the vector field at each point $x$. Moreover, if $v(x)$ is locally Lipschitz, then the solutions are unique, and we may think of the vector field as giving unique directions for how to proceed in time at each point in space. By the invariance of solutions in time, we have the following strong version of uniqueness:


Proposition 32. Let $O\subset\mathbb R^n$ be an open set, and let $v\colon O\to\mathbb R^n$ be locally Lipschitz. If $\phi_1$ and $\phi_2$ are two maximally extended solutions to $\dot x = v(x)$ which satisfy $\phi_1(t_1) = \phi_2(t_2)$, then $\phi_1(t) = \phi_2(t+t_2-t_1)$ for all $t$ in the maximal interval of definition of $\phi_1$.

Proof. $\phi_1(t)$ and $\tilde\phi_2(t) = \phi_2(t+t_2-t_1)$ both satisfy the initial value problem

$$\dot x = v(x),\qquad x(t_1) = \phi_1(t_1),$$

and so must be the same by Theorems 5 and 6.

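For a vector field $v$ on $O\subset\mathbb R^n$, a picture of all the integral curves on $O$ is called the phase portrait of $v$. Recall we drew in class the phase portraits of the two systems in $\mathbb R^2$

$$\dot x = \begin{pmatrix}1 & 0\\ 0 & -1\end{pmatrix}x,\qquad \dot x = \begin{pmatrix}-3 & 4\\ -2 & 3\end{pmatrix}x.$$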

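Homework Problem 24.

(a) Draw the phase portrait of the system in $\mathbb R^2$

$$\dot x = \begin{pmatrix}1 & 0\\ 0 & 2\end{pmatrix}x.$$

Show that each integral curve lies in a parabola or a line in $\mathbb R^2$.

(b) Draw the phase portrait of the system in $\mathbb R^2$

$$\dot x = \begin{pmatrix}\frac32 & -\frac12\\ -\frac12 & \frac32\end{pmatrix}x.$$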

Here is the principal theorem regarding flows of vector fields on open sets:

Theorem 13. Let $O\subset\mathbb R^n$ be open, and $v\colon O\to\mathbb R^n$ be smooth. Then there is an open set $U$ with $O\times\{0\}\subset U\subset O\times\mathbb R$ on which the solution $\phi(y,t)$ to

$$\dot x = v(x),\qquad x(0) = y$$

exists, is unique, and is smooth jointly as a function of $(y,t)$.

Proof. This follows immediately from Theorems 5, 7 and 11.


Remark. It may not be possible to find an $\varepsilon > 0$ so that $O\times(-\varepsilon,\varepsilon)\subset U$. The reason is that solutions may leave $O$ in shorter and shorter times for initial conditions $y\to\partial O$. A simple example is given by $v(x) = 1$, $O = (0,1)$. This problem cannot be fixed by considering $O = \mathbb R^n$, since we may have $v(y)\to\infty$ rapidly as $y\to\infty$ in $\mathbb R^n$. However, see the following corollary.

Corollary 33. Under the conditions of Theorem 13 above, if $K\subset O$ is compact, then there is an $\varepsilon > 0$ so that the solution is defined as a map

$$\phi\colon K\times(-\varepsilon,\varepsilon)\to O.$$

Proposition 34. Consider $\phi(y,t)$, the solution to $\dot x = v(x)$, $x(0) = y$, for $v$ smooth. Then as long as $\phi(y,t_1),\ \phi(y,t_1+t_2)\in O$,

$$\phi(y,t_1+t_2) = \phi(\phi(y,t_1),t_2).$$

Proof. Consider

$$\psi(t) = \phi(y,t_1+t),\qquad \theta(t) = \phi(\phi(y,t_1),t).$$

If we show $\psi$ and $\theta$ satisfy the same initial value problem, then uniqueness will show that $\psi(t) = \theta(t)$ and we are done.

Compute

$$\begin{aligned} \psi(0) &= \phi(y,t_1),\\ \theta(0) &= \phi(\phi(y,t_1),0) = \phi(y,t_1),\\ \dot\psi(t) &= \dot\phi(y,t_1+t)\cdot 1 = v(\phi(y,t_1+t)) = v(\psi(t)),\\ \dot\theta(t) &= \dot\phi(\phi(y,t_1),t) = v(\phi(\phi(y,t_1),t)) = v(\theta(t)). \end{aligned}$$
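Proposition 34 can be checked on the flow of Example 9, where $\phi(y,t)$ is known in closed form. In this minimal Python sketch ($O = \mathbb R$, and `flow` is a hypothetical helper name), the sample times are chosen so that every point stays in the maximal interval of existence; for $y = 2$ this means $t_1 + t_2 < \frac12\ln 3\approx 0.549$.

```python
import math

def flow(y, t):
    """phi(y, t) for x' = x^2 - 1, from the closed form of Example 9 (assumes y != -1)."""
    A = (y - 1) / (y + 1)
    return (1 + A * math.exp(2 * t)) / (1 - A * math.exp(2 * t))

# (y, t1, t2) with t1 + t2 inside the maximal interval of existence for the given y.
samples = [(0.5, 0.4, 1.1), (2.0, 0.2, 0.3), (-3.0, 1.0, 2.0)]
for y, t1, t2 in samples:
    lhs = flow(y, t1 + t2)            # phi(y, t1 + t2)
    rhs = flow(flow(y, t1), t2)       # phi(phi(y, t1), t2)
    print(abs(lhs - rhs) < 1e-9)      # True
```

Choosing, say, $y = 2$, $t_1 = 0.4$, $t_2 = 0.4$ would make the left-hand side meaningless, which is exactly the restriction to times where the solution stays in $O$ discussed next.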

Note that it is necessary in the previous Proposition 34 to restrict to times in which the solution does not leave $O$. In fact, long-time existence of flows along vector fields is problematic on open subsets of $\mathbb R^n$. Recall we require our subsets to be open for ODEs since we want to be able to take two-sided limits for any derivatives involved. On the other hand, compactness guarantees a uniform time interval for existence. But compact subsets of $\mathbb R^n$ are closed and bounded, and thus (if nonempty) cannot be open. The way out of this problem is to consider compact manifolds, which we will realize as compact lower-dimensional subsets of $\mathbb R^n$. For example,

$$S^1 = \{(x^1,x^2) : (x^1)^2 + (x^2)^2 = 1\}$$

is a compact one-dimensional submanifold of $\mathbb R^2$.


2.10 Vector fields as differential operators

A vector field $v$ on $O$ naturally differentiates functions $f$ on $O$ by the directional derivative:

$$vf = D_vf = v^i\frac{\partial f}{\partial x^i}$$

for $v^i$ the components of $v$. Therefore, we often write

$$v = v^i\frac{\partial}{\partial x^i}.$$

We say that $v$ is a first-order differential operator on functions $f$. This observation is natural from the point of view of ODEs by the following

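Proposition 35. For an interval $I\subset\mathbb R$, let $\phi\colon I\to\mathbb R^n$ be a solution to the autonomous system $\dot x = v(x)$, where $v\colon O\to\mathbb R^n$ is a continuous function and $O$ an open subset in $\mathbb R^n$. Also consider a differentiable function $f\colon O\to\mathbb R$. Then the derivative

$$(f\circ\phi)'(t) = (D_vf)(\phi(t)) = (vf)(\phi(t)).$$

Proof. Compute

$$\begin{aligned} (f\circ\phi)'(t) &= (Df)(\phi(t))\,(D\phi)(t)\\ &= \frac{\partial f}{\partial x^i}(\phi(t))\,\frac{d\phi^i}{dt}(t)\\ &= \frac{\partial f}{\partial x^i}(\phi(t))\,v^i(\phi(t))\\ &= \left(v^i\frac{\partial}{\partial x^i}f\right)(\phi(t))\\ &= (vf)(\phi(t)). \end{aligned}$$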
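Here is a quick numerical check of Proposition 35, assuming the sample rotation field $v(x) = (-x^2, x^1)$ on $\mathbb R^2$ (superscripts are component indices), whose solution through $(1,0)$ is $\phi(t) = (\cos t,\sin t)$, and the sample function $f(x) = (x^1)^2 + 3x^1x^2$. The derivative $(f\circ\phi)'(t)$ is approximated by a central difference and compared with $(vf)(\phi(t)) = v^i\,\partial f/\partial x^i$ evaluated at $\phi(t)$. All names are illustrative.

```python
import math

def f(x1, x2):
    return x1 ** 2 + 3 * x1 * x2

def grad_f(x1, x2):
    return (2 * x1 + 3 * x2, 3 * x1)      # (df/dx^1, df/dx^2)

def v(x1, x2):
    return (-x2, x1)                      # rotation field; phi(t) = (cos t, sin t) solves x' = v(x)

def v_applied_to_f(x1, x2):
    """(vf)(x) = v^i(x) (df/dx^i)(x), the directional derivative of f along v."""
    (g1, g2), (w1, w2) = grad_f(x1, x2), v(x1, x2)
    return w1 * g1 + w2 * g2

def phi(t):
    return (math.cos(t), math.sin(t))

t, h = 0.9, 1e-6
lhs = (f(*phi(t + h)) - f(*phi(t - h))) / (2 * h)    # (f o phi)'(t) by central difference
rhs = v_applied_to_f(*phi(t))                          # (vf)(phi(t))
print(abs(lhs - rhs) < 1e-7)     # True
```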

Define the bracket $[v,w]$ of two operators to be

$$[v,w]f = (vw - wv)f = v(wf) - w(vf).$$

Homework Problem 25. Let $v$ and $w$ be two smooth vector fields on $O$.


(a) Show that the differential operator $[v,w]$ is also a first-order differential operator determined by a vector field (which we also write as $[v,w]$). What are the components of $[v,w]$?

(b) For smooth vector fields $u$, $v$ and $w$, show that

$$[u,v] = -[v,u]$$

and

$$[[u,v],w] + [[v,w],u] + [[w,u],v] = 0.$$

(This last identity is the Jacobi identity.)

Remark. Part (b) of the previous problem shows that the vector space of smooth vector fields on $O$ is a Lie algebra. The bracket $[\cdot,\cdot]$ is called the Lie bracket.


3 Manifolds

3.1 Smooth manifolds

We define smooth manifolds as subsets of $\mathbb R^N$. We basically follow Spivak, Calculus on Manifolds, Chapter 5. When we say smooth in this section, we mean $C^\infty$.

We say a subset $M\subset\mathbb R^n$ is a smooth $k$-dimensional manifold (or, more properly, a submanifold of $\mathbb R^n$), if for all $x\in M$, there are open subsets $U\subset\mathbb R^k$ and $O\subset M$ with $x\in O$ and a one-to-one $C^\infty$ map $\phi\colon U\to\mathbb R^n$ satisfying

1. $\phi(U) = O$.

2. For all $y\in U$, $D\phi(y)$ has rank $k$.

3. $\phi^{-1}\colon O\to U$ is continuous.

Such a pair $(\phi,U)$ is called a local parametrization of $M$. The components of the map $\phi^{-1}\colon O\to\mathbb R^k$ are local coordinates on $M$. A set of triples $(\phi_\alpha,U_\alpha,O_\alpha)$ is called an atlas of $M$ if $\{O_\alpha\}$ is an open cover of $M$.

Since $O$ is an open subset of $M$, there is an open subset $W\subset\mathbb R^n$ so that $O = M\cap W$. In this case, we may rewrite condition (1) as

(1′) $\phi(U) = M\cap W$.

Also note that $\phi\colon U\to O$ is a homeomorphism from $U$ to $O$ since it is smooth (hence continuous), one-to-one, onto, and $\phi^{-1}$ is continuous.

Now we note with a few examples why conditions (2) and (3) are necessary. First of all, consider $\phi\colon\mathbb R\to\mathbb R^2$ given by $\phi(t) = (t^2,t^3)$. Then $\phi$ is smooth, one-to-one, and $\phi^{-1}\colon\phi(\mathbb R)\to\mathbb R$ is continuous. But we note the image $\phi(\mathbb R)$, which is the graph of $x^1 = (x^2)^{2/3}$ in $\mathbb R^2$, is not smooth at $(0,0)\in\mathbb R^2$. We also check that

$$D\phi = \begin{pmatrix}2t\\ 3t^2\end{pmatrix} = 0\quad\text{when } t = 0 \text{ and } \phi(t) = (0,0),$$

and so $D\phi$ has rank $0 < 1$ at the point at which $\phi(\mathbb R)$ is not smooth. Condition (3) is necessary by the following problem:


Homework Problem 26. Recall polar coordinates $(x,y) = (r\cos\theta,r\sin\theta)$ in $\mathbb R^2$. Show that a portion of the polar graph $r = \sin 2\theta$ can be parametrized, for $I$ an open interval in $\mathbb R$, by $\phi\colon I\to\mathbb R^2$ so that $\phi$ is one-to-one, $C^\infty$, and $D\phi$ is never 0, but so that $\phi^{-1}\colon\phi(I)\to I$ is not continuous. Sketch the graph and indicate pictorially why $\phi(I)$ should not be considered a submanifold of $\mathbb R^2$.

If $W$ and $V$ are open subsets of $\mathbb R^n$, then a map $f\colon W\to V$ is a diffeomorphism if $f$ is one-to-one, onto, $C^\infty$, and $f^{-1}$ is $C^\infty$. The Inverse Function Theorem and Problem 9 show

Lemma 36. $f\colon W\to V$ is a diffeomorphism if and only if $f$ is one-to-one, onto, $C^\infty$, and $\det Df(x)\ne 0$ for all $x\in W$.

The following theorem is useful in proving properties about manifolds:

Theorem 14. $M\subset\mathbb R^n$ is a $k$-dimensional manifold if and only if for all $x\in M$, there are two open subsets $V$, $W$ of $\mathbb R^n$, with $x\in W$, and a diffeomorphism $h\colon W\to V$ satisfying

$$h(W\cap M) = V\cap(\mathbb R^k\times\{0\}) = \{y\in V : y^{k+1} = \cdots = y^n = 0\}.$$

Proof. (⇐) Let $U = \{a\in\mathbb R^k : (a,0)\in h(W)\}$, and define $\phi\colon U\to\mathbb R^n$ by $\phi(a) = h^{-1}(a,0)$. $\phi$ is smooth and one-to-one since $h$ is a diffeomorphism. Moreover, $\phi(U) = M\cap W$, so condition (1′) holds. The inverse $\phi^{-1}$ is $h|_{W\cap M}$ followed by the projection $(a,0)\mapsto a$, and so is continuous.

So all that is left to check is the rank condition (2). Consider $H\colon W\to\mathbb R^k$,

$$H(z) = (h^1(z),\dots,h^k(z)).$$

Then $H(\phi(y)) = y$ for all $y\in U$. Then use the Chain Rule to compute $DH(\phi(y))\circ D\phi(y) = I$, and so $D\phi(y)$ must be an injective linear map, and so must have rank $k$. Thus $M$ is a smooth manifold.

(⇒) Now assume $M$ is a manifold, and define $y = \phi^{-1}(x)$. Then $D\phi(y)$ has rank $k$, and so there is at least one $k\times k$ submatrix of $D\phi(y)$ with nonzero determinant. (We may think of $D\phi(y)$ as an $n\times k$ matrix mapping column vectors in $\mathbb R^k$ to column vectors in $\mathbb R^n$. Then a $k\times k$ submatrix is simply a collection of $k$ distinct rows of $D\phi(y)$.) By a linear change of basis, if necessary, then, we may assume that

$$\det_{1\le i,j\le k}\left(\frac{\partial\phi^i}{\partial y^j}\right)(y)\ne 0.$$


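By continuity, this is true on an open neighborhood $U'$ of $y$. Define $g\colon U'\times\mathbb R^{n-k}\to\mathbb R^n$ by $g(a,b) = \phi(a) + (0,b)$. Then, in block matrix form,

$$Dg(a,b) = \begin{pmatrix} \left(\dfrac{\partial\phi^i}{\partial y^j}\right)_{1\le i,j\le k} & 0\\[2ex] \left(\dfrac{\partial\phi^i}{\partial y^j}\right)_{1\le j\le k,\ k<i\le n} & I_{n-k} \end{pmatrix}.$$

So $\det Dg(a,b) = \det_{1\le i,j\le k}\left(\dfrac{\partial\phi^i}{\partial y^j}\right)\ne 0$. So we may apply the Inverse Function Theorem to find that there are open subsets $V_1'\ni(y,0)$ and $V_2'\ni g(y,0) = x$ of $\mathbb R^n$ so that $g\colon V_1'\to V_2'$ has a smooth inverse $h\colon V_2'\to V_1'$.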

Define $O$ via

$$O = \{\phi(a) : (a,0)\in V_1'\} = (\phi^{-1})^{-1}(\iota^{-1}(V_1')),$$

where $\iota\colon\mathbb R^k\to\mathbb R^n$ sends $a$ to $(a,0)$. Since $\phi^{-1}$ is continuous, $O$ is an open subset of $\phi(U')$, and of $M$. Therefore, there is an open subset $\tilde V$ of $\mathbb R^n$ so that $O = M\cap\tilde V$.

Let $W = \tilde V\cap V_2'$, and $V = g^{-1}(W)\cap V_1'$. Then $h\colon W\to V$ is a diffeomorphism and

$$W\cap M = \{\phi(a) : (a,0)\in V\} = \{g(a,0) : (a,0)\in V\},$$

$$h(W\cap M) = g^{-1}(W\cap M) = g^{-1}\big(\{g(a,0) : (a,0)\in V\}\big) = V\cap(\mathbb R^k\times\{0\}).$$

This completes the proof.

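This characterization of manifolds is quite useful. Consider two smooth local parametrizations $\phi_\alpha\colon U_\alpha\to O_\alpha$ and $\phi_\beta\colon U_\beta\to O_\beta$. Then if $O_\alpha\cap O_\beta\ne\emptyset$, we have the following

Proposition 37. $\phi_\beta^{-1}\circ\phi_\alpha\colon\phi_\alpha^{-1}(O_\beta)\to\phi_\beta^{-1}(O_\alpha)$ is a diffeomorphism.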

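Proof. Consider $\pi\colon\mathbb R^n\to\mathbb R^k$ given by $(a,b)\mapsto a$ for $(a,b)\in\mathbb R^k\times\mathbb R^{n-k}$, and $\iota\colon\mathbb R^k\to\mathbb R^n$ given by $\iota(a) = (a,0)$. Let $h_\alpha$ and $h_\beta$ be the diffeomorphisms guaranteed by Theorem 14. Then $\phi_\alpha(a) = h_\alpha^{-1}(a,0)$, $\phi_\alpha^{-1}(x) = \pi(h_\alpha(x))$ (and similarly for $\beta$), and so

$$\phi_\beta^{-1}\circ\phi_\alpha = \pi\circ h_\beta\circ h_\alpha^{-1}\circ\iota$$

is smooth since $h_\alpha$, $h_\beta$ are diffeomorphisms. Exchanging the roles of $\alpha$ and $\beta$ shows that the inverse map $\phi_\alpha^{-1}\circ\phi_\beta$ is smooth as well.

The maps $\phi_\beta^{-1}\circ\phi_\alpha$ are called gluing maps.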

Remark. It is often useful to think of a manifold $M$ as being glued together from domains $U_\alpha$ in $\mathbb R^k$ by the gluing maps. In fact, the previous proposition is the starting point for the abstract definition of a smooth manifold: A smooth $k$-dimensional manifold is a Hausdorff, sigma-compact topological space for which each point $x$ has a neighborhood $O_\alpha$ homeomorphic to a domain $U_\alpha\subset\mathbb R^k$ via $\phi_\alpha\colon U_\alpha\to O_\alpha$. In addition, we require the gluing maps $\phi_\beta^{-1}\circ\phi_\alpha$ to be smooth on $\phi_\alpha^{-1}(O_\beta)$.

If $M$ is a smooth manifold, then a function $f\colon M\to\mathbb R^p$ is said to be smooth if for each smooth parametrization $\phi\colon U\to M$, $f\circ\phi\colon U\to\mathbb R^p$ is smooth. If $N\subset\mathbb R^p$ is a smooth submanifold, then $f\colon M\to N$ is said to be smooth if the induced map $f\colon M\to\mathbb R^p$ is smooth. (For abstract target manifolds $N$, we may work with local parametrizations instead.) This definition of smooth maps from manifolds is consistent in the following sense:

Proposition 38. If $f\colon M\to\mathbb R^p$, and $f\circ\phi_\alpha$ is smooth from $U_\alpha\to\mathbb R^p$, then on $\phi_\beta^{-1}(O_\alpha)\subset U_\beta$, $f\circ\phi_\beta$ is also smooth.

Proof. Apply Proposition 37 and the Chain Rule.

Proposition 39. If $M\subset\mathbb R^n$ is a smooth manifold and $f\colon M\to\mathbb R^p$, then $f$ is smooth if and only if $f$ can be locally extended to smooth functions from domains in $\mathbb R^n$ to $\mathbb R^p$. In other words, $f$ is smooth if and only if every $x\in M$ has a neighborhood $W$ in $\mathbb R^n$, and there is a smooth function $F\colon W\to\mathbb R^p$ so that $F|_{W\cap M} = f$.

Proof. (⇒) For $x\in M$, consider the local diffeomorphism $h\colon W\to V$ guaranteed by Theorem 14. Then for the smooth parametrization $\phi(a) = h^{-1}(a,0)$, we know $f\circ\phi$ is smooth. Now define

$$F = f\circ h^{-1}\circ\iota\circ\pi\circ h\colon W\to\mathbb R^p$$

for $\pi\colon(a,b)\mapsto a$ and $\iota\colon a\mapsto(a,0)$. $F$ is smooth since

$$F = f\circ h^{-1}\circ\iota\circ\pi\circ h = (f\circ\phi)\circ\pi\circ h.$$

(⇐) For a local parametrization $\phi$, $f\circ\phi$ is smooth since locally, $f\circ\phi = F\circ\phi$, which is smooth by the Chain Rule.

$X\subset\mathbb R^N$ is a smooth manifold of dimension $k$ if every $x\in X$ has a neighborhood that is diffeomorphic to an open subset of $\mathbb R^k$. In other words, there is an open cover $\{O_\alpha\}$ of $X$ so that each $O_\alpha$ is diffeomorphic to an open subset $U_\alpha\subset\mathbb R^k$. Let $\phi_\alpha\colon U_\alpha\to O_\alpha$ be the diffeomorphism. $\phi_\alpha$ is called a parametrization of $O_\alpha\subset X$, and the inverse map $\phi_\alpha^{-1}$ is called a coordinate system. The open cover, together with the coordinate systems

$$\{O_\alpha,\phi_\alpha,U_\alpha\},$$

is called a smooth atlas of $X$, and $X$ is a smooth manifold if and only if it has a smooth atlas.

Example 10. The unit sphere

$$S^2 = \{(x^1,x^2,x^3)\in\mathbb R^3 : (x^1)^2 + (x^2)^2 + (x^3)^2 = 1\}$$

is a two-dimensional submanifold of $\mathbb R^3$.

To show this, we provide an atlas. Let $N = (0,0,1)$ be the north pole and $S = (0,0,-1)$ be the south pole. Then let $O_1 = S^2\setminus\{N\}$, $O_2 = S^2\setminus\{S\}$, $U_1 = U_2 = \mathbb R^2$. We construct the coordinate systems $\phi_\alpha^{-1}$, $\alpha = 1,2$, by stereographic projection. We may realize $\mathbb R^2$ as the plane $x^3 = 0$ in $\mathbb R^3$.

For a point $x$ in $O_1$, consider the line $L_{x,N}$ in $\mathbb R^3$ through $N$ and $x$. We define $\phi_1^{-1}(x)$ to be the unique point in $\mathbb R^2\cap L_{x,N}$. It is easy to compute

$$(y^1,y^2) = \phi_1^{-1}(x^1,x^2,x^3) = \left(\frac{x^1}{1-x^3},\frac{x^2}{1-x^3}\right),$$

$$(x^1,x^2,x^3) = \phi_1(y^1,y^2) = \left(\frac{2y^1}{|y|^2+1},\frac{2y^2}{|y|^2+1},\frac{|y|^2-1}{|y|^2+1}\right).$$

Similarly, for any point $x\in O_2$, define $\phi_2^{-1}(x)$ to be the unique point in $\mathbb R^2\cap L_{x,S}$, and we find as above

$$(z^1,z^2) = \phi_2^{-1}(x^1,x^2,x^3) = \left(\frac{x^1}{1+x^3},\frac{x^2}{1+x^3}\right),$$

$$(x^1,x^2,x^3) = \phi_2(z^1,z^2) = \left(\frac{2z^1}{|z|^2+1},\frac{2z^2}{|z|^2+1},-\frac{|z|^2-1}{|z|^2+1}\right).$$

It is straightforward to check that each of these coordinate systems is a diffeomorphism, and since $S^2 = O_1\cup O_2$, we have produced a smooth atlas of $S^2$ and thus have shown that $S^2$ is a two-dimensional manifold.
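Part of this "straightforward" check can be spot-checked numerically: that $\phi_1$ and $\phi_2$ land on $S^2$ and that $\phi_\alpha^{-1}\circ\phi_\alpha$ is the identity. The following pure-Python sketch simply transcribes the four formulas above (the function names and sample points are hypothetical) and tests them at a few points; it is not a proof of smoothness.

```python
def phi1_inv(x):
    x1, x2, x3 = x
    return (x1 / (1 - x3), x2 / (1 - x3))        # projection from N = (0, 0, 1)

def phi1(y):
    y1, y2 = y
    r2 = y1 * y1 + y2 * y2
    return (2 * y1 / (r2 + 1), 2 * y2 / (r2 + 1), (r2 - 1) / (r2 + 1))

def phi2_inv(x):
    x1, x2, x3 = x
    return (x1 / (1 + x3), x2 / (1 + x3))        # projection from S = (0, 0, -1)

def phi2(z):
    z1, z2 = z
    r2 = z1 * z1 + z2 * z2
    return (2 * z1 / (r2 + 1), 2 * z2 / (r2 + 1), -(r2 - 1) / (r2 + 1))

def close(a, b, tol=1e-12):
    return all(abs(ai - bi) < tol for ai, bi in zip(a, b))

points = [(0.0, 0.0), (0.3, -1.2), (5.0, 2.0)]
for y in points:
    for param, inv in [(phi1, phi1_inv), (phi2, phi2_inv)]:
        p = param(y)
        on_sphere = abs(sum(c * c for c in p) - 1) < 1e-12
        print(on_sphere, close(inv(p), y))       # True True
```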


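Given a smooth manifold $X$ with a smooth atlas $\{O_\alpha,\phi_\alpha,U_\alpha\}$, let $O_{\alpha\beta} = O_\alpha\cap O_\beta$. Also define $U_{\alpha\beta} = U_\alpha\cap\phi_\alpha^{-1}(O_{\alpha\beta})$. As long as $O_{\alpha\beta}\ne\emptyset$, the map

$$\phi_{\alpha\beta}\equiv\phi_\beta^{-1}\circ\phi_\alpha\colon U_{\alpha\beta}\to U_{\beta\alpha}$$

is a diffeomorphism. These maps $\phi_{\alpha\beta}$ are called the gluing maps of the manifold $X$ associated to the atlas. In particular, the manifold can be thought of as the union of the coordinate charts $U_\alpha$ glued together by the gluing maps. It is straightforward to see that, at least as a set, we may identify

$$X = \Big(\bigsqcup_\alpha U_\alpha\Big)\Big/\sim,$$

where $\sqcup$ means disjoint union and the equivalence relation $\sim$ is given by

$$x\sim y\ \text{ if } x\in U_{\alpha\beta}\subset U_\alpha,\ y\in U_{\beta\alpha}\subset U_\beta,\ y = \phi_{\alpha\beta}(x).$$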

Gluing maps may be used to define smooth manifolds which are not necessarily subsets of $\mathbb R^N$ (though we won’t do so here). It is instructive to think of $k$-dimensional smooth manifolds as spaces that are smoothly glued together from open sets in $\mathbb R^k$.

Example 11. Recall the example of the atlas of $S^2$ above. Compute

$$O_{12} = S^2\setminus\{S,N\},\qquad U_{12} = \mathbb R^2\setminus\{0\},\qquad U_{21} = \mathbb R^2\setminus\{0\},$$

$$z = \phi_{12}(y) = \phi_2^{-1}(\phi_1(y)) = \left(\frac{y^1}{|y|^2},\frac{y^2}{|y|^2}\right) = \frac{y}{|y|^2}.$$

This gluing map is called inversion across the circle $|y|^2 = 1$ in $\mathbb R^2$. Each point is mapped to a point on the same ray through the origin, but the distance to the origin is replaced by its reciprocal. So we can think of $S^2$ as two copies of $\mathbb R^2$ glued together along $\mathbb R^2\setminus\{0\}$ by the inversion map across the unit circle.
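One can similarly spot-check the formula $\phi_{12}(y) = y/|y|^2$ by composing the two formulas from Example 10 at a few sample points, and confirm that inversion across the unit circle is its own inverse. This is only a numerical sketch; the helper names are hypothetical.

```python
def phi1(y):
    y1, y2 = y
    r2 = y1 * y1 + y2 * y2
    return (2 * y1 / (r2 + 1), 2 * y2 / (r2 + 1), (r2 - 1) / (r2 + 1))

def phi2_inv(x):
    x1, x2, x3 = x
    return (x1 / (1 + x3), x2 / (1 + x3))

def inversion(y):
    """Inversion across the unit circle, y -> y / |y|^2, defined on R^2 minus {0}."""
    r2 = y[0] ** 2 + y[1] ** 2
    return (y[0] / r2, y[1] / r2)

def close(a, b, tol=1e-12):
    return all(abs(ai - bi) < tol for ai, bi in zip(a, b))

points = [(1.0, 0.0), (0.3, -1.2), (-4.0, 0.25)]
for y in points:
    glued = phi2_inv(phi1(y))                       # the gluing map phi_12 = phi_2^{-1} o phi_1
    print(close(glued, inversion(y)), close(inversion(inversion(y)), y))   # True True
```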

3.2 Tangent vectors on manifolds

Recall that for a solution $\phi$ to an autonomous system $\dot x = v(x)$, the parametric curve $\phi(t)$ has tangent vector $\dot\phi(t) = v(\phi(t))$ at time $t$. We will use this to define tangent vectors to manifolds. A tangent vector at a point $p$ in a smooth manifold $X$ is given by the derivative $\dot\alpha(0)$ of a smooth curve $\alpha\colon(-\varepsilon,\varepsilon)\to X\subset\mathbb R^N$ so that $\alpha(0) = p$. (Note that the fact that $\mathbb R^N$ is a vector space allows us to differentiate $\alpha$.) The space of all tangent vectors at $p$ is called the tangent space $T_pX$ of $X$ at $p$, and it is characterized by the following proposition.

Proposition 40. If $X\subset\mathbb R^N$ is a $k$-dimensional smooth manifold, then the tangent space $T_pX$ is the following: Given a local parametrization of $X$

$$\phi\colon U\to O\ni p$$

so that $\phi(0) = p$,

$$T_pX = D\phi(0)(\mathbb R^k).$$

In particular, $T_pX$ is naturally a $k$-dimensional vector space.

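Proof. First of all, given a curve $\alpha\colon(-\varepsilon,\varepsilon)\to X$ so that $\alpha(0) = p$, we can ensure (by shrinking $\varepsilon$ if necessary) that the image of $\alpha$ is contained in the coordinate neighborhood $O$. Now

$$\alpha = \phi\circ(\phi^{-1}\circ\alpha)$$

and the chain rule shows that

$$\alpha'(0) = D\phi(0)\big[(\phi^{-1}\circ\alpha)'(0)\big]\in D\phi(0)(\mathbb R^k).$$

Thus we’ve shown $T_pX\subset D\phi(0)(\mathbb R^k)$. To show $D\phi(0)(\mathbb R^k)\subset T_pX$, for any vector $v\in\mathbb R^k$, consider $\alpha(t) = \phi(tv)$ for $|t|$ small enough that the image of $\alpha$ is contained in $O$. Then

$$\alpha'(0) = D\phi(0)v$$

and so $D\phi(0)(\mathbb R^k) = T_pX$.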
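For the sphere, Proposition 40 can be made concrete: differentiating $|\alpha(t)|^2 = 1$ at $t = 0$ shows every tangent vector at $p$ is orthogonal to $p$, so the columns of $D\phi_1(q)$ should be orthogonal to $p = \phi_1(q)$ and linearly independent. The sketch below approximates those columns by central differences (the step size and helper names are illustrative assumptions) and checks both properties at a sample point $q$ (the proposition normalizes $q = 0$, but any point of the chart works the same way).

```python
def phi1(y1, y2):
    """Stereographic parametrization phi_1 from Example 10."""
    r2 = y1 * y1 + y2 * y2
    return (2 * y1 / (r2 + 1), 2 * y2 / (r2 + 1), (r2 - 1) / (r2 + 1))

def jacobian_columns(q, h=1e-6):
    """Columns dphi/dy^1 and dphi/dy^2 at q, by central differences; they span D phi(q)(R^2)."""
    cols = []
    for j in range(2):
        e = [0.0, 0.0]
        e[j] = h
        plus = phi1(q[0] + e[0], q[1] + e[1])
        minus = phi1(q[0] - e[0], q[1] - e[1])
        cols.append([(a - b) / (2 * h) for a, b in zip(plus, minus)])
    return cols

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

q = (0.4, -0.7)
p = phi1(*q)
c1, c2 = jacobian_columns(q)
print(abs(dot(c1, p)) < 1e-8, abs(dot(c2, p)) < 1e-8)   # True True: both columns are orthogonal to p
print(dot(cross(c1, c2), cross(c1, c2)) > 1e-6)         # True: the columns are independent (rank 2)
```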

Also note the following corollary of our definition of $T_pX$:

Corollary 41. $T_pX$ is independent of the coordinate neighborhood $O$ of $p$.

If $f\colon X\to\mathbb R^m$ is a smooth map from a smooth $k$-dimensional manifold $X$, and if $p\in X$, then we define

$$Df(p)\colon T_pX\to\mathbb R^m$$

by using a local parametrization $\phi\colon U\to X$ so that $\phi(q) = p$. Then we define

$$Df(p) = D(f\circ\phi)(q)\circ(D\phi(q))^{-1}.$$

The following exercise verifies this definition makes sense (see Guillemin and Pollack).

Homework Problem 27.

(a) Show that $D\phi(q)$ is invertible as a linear map from $\mathbb R^k$ to $T_pX$.

(b) Show that the definition of $Df(p)$ is independent of the coordinate parametrization $\phi$.

(c) Show that if $f\colon X\to Y$ for $Y\subset\mathbb R^m$ a manifold, then $Df(p)(T_pX)\subset T_{f(p)}Y$.

Tangent vectors naturally differentiate functions at a point. So if $f\colon X\to\mathbb R$, and $v = \alpha'(0)$ is the tangent vector of a curve $\alpha$ with $\alpha(0) = p$, then we may define

$$(vf)(p) = (f\circ\alpha)'(0) = Df(p)\alpha'(0) = Df(p)v.$$

This definition depends only on $v$, and not on the curve $\alpha$ used. (For each $v$ there are many $\alpha$, since $v$ only depends on the first derivative $\alpha'(0)$ and no higher Taylor coefficients.)

For a coordinate system

$$\phi^{-1} = (x^1,\dots,x^k)\colon O\to\mathbb R^k$$

(where we assume as usual that $\phi(0) = p$), the coordinate basis of $T_pX$ induced by $\phi$ may be written as $\partial/\partial x^i$, which are thought of as tangent vectors differentiating functions $f$ by

$$\left(\frac{\partial}{\partial x^i}\right)f\,\bigg|_p = \frac{\partial}{\partial x^i}(f\circ\phi)\,\bigg|_0 = \frac{\partial}{\partial x^i}f(x^1,\dots,x^k)\,\bigg|_0.$$

($\partial/\partial x^i$ is the tangent vector associated to the curve $\alpha = \phi(te_i)$, for $e_i$ the $i$th standard basis vector in $\mathbb R^k$.) Thus we can write any tangent vector $v$ at $p$ as

$$v = v^i\frac{\partial}{\partial x^i}.$$


Writing tangent vectors in terms of the coordinate basis of $T_pX$ is much more useful than writing them in terms of a basis of $\mathbb R^N\supset T_pX$.

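The components $v^i$ will change depending on the local coordinates. On $O_{\alpha\beta} = O_\alpha\cap O_\beta$, the intersection of two coordinate neighborhoods of $p$, we have two coordinate systems $\phi_\alpha^{-1} = (x^1,\dots,x^k)$ and $\phi_\beta^{-1} = (y^1,\dots,y^k)$. We can write by using the chain rule

$$v = v^i(x)\frac{\partial}{\partial x^i} = v^i(x)\frac{\partial y^j}{\partial x^i}\frac{\partial}{\partial y^j} = v^j(y)\frac{\partial}{\partial y^j}.$$

Therefore, we know how the $v^i$ change under coordinate transformations $x\to y$:

$$v^j(y) = v^i(x)\frac{\partial y^j}{\partial x^i}. \tag{22}$$

(In a more coordinate-free notation, the Jacobian matrix $\partial y^j/\partial x^i$ is the derivative of the gluing map $\phi_{\alpha\beta} = \phi_\beta^{-1}\circ\phi_\alpha$. It is easy to check that $y = \phi_{\alpha\beta}\circ x$.)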
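Formula (22) can be tested on a simple example, assuming $X$ is the open right half-plane in $\mathbb R^2$ with Cartesian coordinates $x = (x^1,x^2)$ and polar coordinates $y = (r,\theta)$, so that $\partial r/\partial x^i = x^i/r$, $\partial\theta/\partial x^1 = -x^2/r^2$ and $\partial\theta/\partial x^2 = x^1/r^2$. The sketch below computes $v^j(y)$ from (22) and compares it with the tangent vector of the curve $x + tv$ read off directly in polar coordinates by a central difference. The names and sample values are hypothetical.

```python
import math

def polar(x1, x2):
    """Second coordinate system y = (r, theta) on the open right half-plane."""
    return (math.hypot(x1, x2), math.atan2(x2, x1))

def transform(v, x1, x2):
    """Formula (22): v^j(y) = v^i(x) dy^j/dx^i for (x1, x2) -> (r, theta)."""
    r2 = x1 * x1 + x2 * x2
    r = math.sqrt(r2)
    J = [[x1 / r, x2 / r],          # dr/dx^1, dr/dx^2
         [-x2 / r2, x1 / r2]]       # dtheta/dx^1, dtheta/dx^2
    return tuple(sum(J[j][i] * v[i] for i in range(2)) for j in range(2))

# Tangent vector of the curve alpha(t) = x + t v, read off in polar coordinates.
x, v, h = (1.0, 0.5), (0.3, -2.0), 1e-6
ahead = polar(x[0] + h * v[0], x[1] + h * v[1])
behind = polar(x[0] - h * v[0], x[1] - h * v[1])
numeric = tuple((a - b) / (2 * h) for a, b in zip(ahead, behind))
print(all(abs(a - b) < 1e-7 for a, b in zip(numeric, transform(v, *x))))   # True
```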

All the tangent spaces of a manifold $X$ patch together to make a larger manifold $TX$ called the tangent bundle. We define the tangent bundle

$$TX = \{(p,w)\in\mathbb R^N\times\mathbb R^N : p\in X,\ w\in T_pX\}.$$

Homework Problem 28. If $X$ is a $k$-dimensional manifold, show that $TX$ is a $2k$-dimensional submanifold of $\mathbb R^{2N}$. To prove this, consider a local parametrization $\phi\colon U\to X\subset\mathbb R^N$.

(a) Define $\Phi\colon U\times\mathbb R^k\to\mathbb R^{2N}$ for $y = (y^1,\dots,y^k)$ by

$$\Phi(x,y) = \left(\phi(x),\ \frac{\partial\phi}{\partial x^i}(x)\,y^i\right).$$

Show that $\Phi(U\times\mathbb R^k)$ is an open subset of $TX$ and that $\Phi$ is one-to-one.

(b) Show that $D\Phi$ has rank $2k$.

(c) Show that $\Phi^{-1}$ is continuous from $\Phi(U\times\mathbb R^k)$ to $U\times\mathbb R^k$.

There is a natural smooth map

$$\pi\colon TX\to X,\qquad \pi(p,w) = p,$$

and each $\pi^{-1}(p)$ is the vector space $T_pX$. Each coordinate system $\phi^{-1} = (x^1,\dots,x^k)$ provides a local frame $\{\partial/\partial x^i\}$ of the tangent bundle. A local frame is a basis of the tangent space for every $p$ in a neighborhood $O\subset X$. These frames are patched together in the following paragraph.

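A more abstract view of the tangent bundle is given by looking at a given smooth atlas $\{O_\alpha, \phi_\alpha, U_\alpha\}$ of $X$. Then as a set, we may identify
$$TX = \Big(\bigsqcup_\alpha U_\alpha \times \mathbb{R}^k\Big)\Big/\approx,$$
where the equivalence relation $\approx$ is given by
$$(x, v) \approx (y, w) \quad\text{if } x \in U_{\alpha\beta},\ y \in U_{\beta\alpha},\ y = \phi_{\alpha\beta}(x),\ w = D\phi_{\alpha\beta}(x)\,v.$$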

A vector field on a manifold $X$ provides a tangent vector at every point in $X$. More precisely, a vector field is a section of the tangent bundle. In other words, $v\colon X \to TX$ is a vector field if $\pi(v(p)) = p$ for all $p \in X$. So $v(p) = (p, w(p))$ for $w(p) \in T_pX$. In fact, for $X \subset \mathbb{R}^N$, giving a map $w\colon X \to \mathbb{R}^N$ with $w(p) \in T_pX$ is equivalent to giving $v(p) = (p, w(p))$. (Clearly $v$ and $w$ carry the same amount of information, and we often will refer to both of them using the same symbol $v$.)

A vector field $v$ is smooth if it is given as a smooth map from $X$ to $\mathbb{R}^N \times \mathbb{R}^N \supset TX$ as above. Equivalently, $v$ is smooth if for every local coordinate system $(x^1, \dots, x^k)$,
$$v = v^i(x)\frac{\partial}{\partial x^i}$$
for $v^i$ smooth on $U \subset \mathbb{R}^k$.

3.3 Flows on manifolds

A smooth vector field $v$ on a manifold $X$ defines a system of ODEs in the local coordinates of $X$ (or we may say more simply a system on $X$). The ODE system is given by
$$\dot x = v(x)$$
for $x\colon I \to X$ a parametric curve.


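In order to describe the relationship between the local and global pictures of the ODE system, consider $X \subset \mathbb{R}^N$ and $v\colon X \to \mathbb{R}^N$ so that for each $p \in X$, $v(p) \in T_pX$. Consider a local parametrization $\phi_\alpha\colon U_\alpha \to O_\alpha$. Let $\phi_\alpha^{-1} = (x_\alpha^1, \dots, x_\alpha^k)$. Locally on $U_\alpha \subset \mathbb{R}^k$, we represent $v$ by
$$v_\alpha = v_\alpha^i\frac{\partial}{\partial x_\alpha^i}.$$
In other words, for $p \in O_\alpha \subset X$ and $x = \phi_\alpha^{-1}(p)$, we have
$$v(p) = D\phi_\alpha(x)\,v_\alpha(x).$$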

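Proposition 42. Consider $v$ a smooth vector field on $X \subset \mathbb{R}^N$. Consider a solution $\psi_\alpha$ to $\dot x_\alpha = v_\alpha(x_\alpha)$, where $\psi_\alpha\colon I \to U_\alpha$ for a time interval $I$. Then
$$\psi = \phi_\alpha\circ\psi_\alpha$$
is a solution to $\dot x = v(x)$ from $I$ to $O_\alpha \subset X$. Every solution to $\dot x = v(x)$ restricted to $O_\alpha$ is of this form.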

Proof. First of all, note that $\dot x_\alpha = v_\alpha(x_\alpha)$ is a well-defined system of ODEs on the open set $U_\alpha \subset \mathbb{R}^k$. On the other hand, on $X$, the system $\dot x = v(x)$ is not an ODE system on $\mathbb{R}^N \supset X$, since $v$ is only defined on $X$. This may be remedied locally as follows: for each $p \in X$, $v(p) \in T_pX \subset \mathbb{R}^N$. Then since $v$ is a smooth function, we may locally extend $v$ to a smooth function on $\mathbb{R}^N$ (we refer to each local extension simply as $v$).

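Consider a solution $\psi_\alpha$ to $\dot x_\alpha = v_\alpha(x_\alpha)$, and let $\psi = \phi_\alpha\circ\psi_\alpha$. Then by the chain rule,
$$\dot\psi = D\phi_\alpha(\psi_\alpha)\,\dot\psi_\alpha = D\phi_\alpha(\psi_\alpha)\,v_\alpha(\psi_\alpha) = v(\psi).$$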

Thus $\psi$ is a solution. To show that every solution $\psi$ to $\dot x = v(x)$ is of this form, note that since $T_pX$ is the image of $D\phi_\alpha(q)$ for $\phi_\alpha(q) = p$ (Proposition 40), every smooth vector field $v$ is locally equal to $D\phi_\alpha v_\alpha$. Then by uniqueness of solutions to ODEs, the solution to $\dot x = v(x)$ must be the image of the solution to $\dot x_\alpha = v_\alpha(x_\alpha)$.

Remark. The restriction to autonomous equations $\dot x = v(x)$ is unnecessary. The same proof works for non-autonomous systems $\dot x = v(x, t)$ on manifolds.

Recall a subset $X$ of a metric space $Y$ is compactly contained in another subset $Z$ if the closure $\overline{X}$ is compact and $\overline{X} \subset Z$. In this case we write $X \subset\subset Z$, and say $X$ is a precompact subset of $Z$.


Theorem 15. Let $v$ be a smooth vector field on a compact manifold $X$. Then the flow $F(y, t)$ along the vector field, that is, the solution to
$$\dot x = v(x), \qquad x(0) = y,$$
is a smooth function from $X \times \mathbb{R}$ to $X$. In particular, any flow on a compact manifold exists for all time.

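Proof. Consider an atlas $\{O_\alpha, \phi_\alpha, U_\alpha\}$ of $X$. First of all, by Lemma 43 below, there is an open cover $\{Q_\beta\}$ of $X$ so that each $Q_\beta \subset\subset O_\alpha$ for some $O_\alpha$ in the atlas. Then each $\phi_\alpha^{-1}(\overline{Q_\beta})$ is a compact subset of $U_\alpha$. Our differential equation is equivalent to $\dot x_\alpha = v_\alpha(x_\alpha)$ on each $U_\alpha$.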

Since $X$ is compact, we can choose a finite subcover $Q_1, \dots, Q_n$ of the open cover $\{Q_\beta\}$. For each $i = 1, \dots, n$, a straightforward analog of Lemma 23 shows there is an $\varepsilon_i > 0$ so that if $x_0 \in \phi_\alpha^{-1}(\overline{Q_i})$, then the solution to
$$\dot x_\alpha = v_\alpha(x_\alpha), \qquad x(0) = x_0$$
stays in $U_\alpha$ for $t \in [-\varepsilon_i, \varepsilon_i]$. Moreover, by Proposition 31, for any $T \in \mathbb{R}$, the solution with initial condition $x(T) = x_0 \in \phi_\alpha^{-1}(\overline{Q_i})$ stays within $U_\alpha$ for time $t \in [T - \varepsilon_i, T + \varepsilon_i]$.

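Let $\varepsilon = \min\{\varepsilon_1, \dots, \varepsilon_n\} > 0$. Then for every $T \in \mathbb{R}$ and $p \in X$, we claim the solution to $\dot x = v(x)$, $x(T) = p$ exists for all $t \in [T - \varepsilon, T + \varepsilon]$. To prove the claim, note that each $p \in X$ lies in one of the $Q_i \subset O_\alpha$, and that the solution to
$$\dot x_\alpha = v_\alpha(x_\alpha), \qquad x(T) = \phi_\alpha^{-1}(p)$$
lies in $U_\alpha$ for $t \in [T - \varepsilon, T + \varepsilon]$. Thus Proposition 42 shows that the solution to $\dot x = v(x)$, $x(T) = p$ is in $O_\alpha$ for $t \in [T - \varepsilon, T + \varepsilon]$, and the claim is proved.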

In order to prove the Theorem, continue as in the proof of Lemma 25 to show the solution exists for all time. The smoothness of the solution follows from Theorem 12 and Proposition 42.

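Lemma 43. Given an atlas $\{O_\alpha, \phi_\alpha, U_\alpha\}$ of a manifold $X$, there is an open cover $\{Q_\beta\}$ of $X$ so that each $Q_\beta$ is precompact in some $O_\alpha$.

Proof. We can cover each open $U_\alpha \subset \mathbb{R}^k$ by open balls $B_\beta \subset\subset U_\alpha$. Then the sets $Q_\beta = \phi_\alpha(B_\beta)$ form an open cover of $X$.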

The support of an $\mathbb{R}^m$-valued function $f$ is the closure
$$\operatorname{supp}(f) = \overline{\{x : f(x) \neq 0\}}.$$


An important class of functions is smooth functions with compact support. Prominent examples can be constructed using the smooth function on $\mathbb{R}$
$$f(x) = \begin{cases} e^{-1/x} & \text{for } x > 0,\\ 0 & \text{for } x \le 0.\end{cases}$$
See the notes on bump functions.
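The notes on bump functions are not reproduced here, so what follows is only a sketch of one standard construction, offered under that assumption. Dividing $f(x)$ by $f(x) + f(1 - x)$, which never vanishes, gives a smooth step that is $0$ for $x \le 0$ and $1$ for $x \ge 1$. Composing that step with $2 - |x|$ gives a smooth function equal to $1$ on $[-1, 1]$ with support $[-2, 2]$. The kink of $|x|$ at $0$ is harmless, because the step is identically $1$ there. The names `smooth_step` and `bump` are ours.

```python
import math

def f(x):
    """The smooth function e^{-1/x} for x > 0, extended by 0 for x <= 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def smooth_step(x):
    """Smooth, 0 for x <= 0, 1 for x >= 1; f(x) + f(1 - x) > 0 for every x."""
    return f(x) / (f(x) + f(1.0 - x))

def bump(x):
    """Smooth, equal to 1 on [-1, 1], vanishing outside (-2, 2)."""
    return smooth_step(2.0 - abs(x))
```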

Homework Problem 29. Let $\Omega \subset \mathbb{R}^n$ be a domain. Consider a smooth vector field $v\colon \Omega \to \mathbb{R}^n$ with compact support. Show that any solution $\psi$ to $\dot x = v(x)$, $x(0) = x_0 \in \Omega$, exists for all time $t \in \mathbb{R}$.

Hint: First show that if $v(y) = 0$, then any solution to $\dot x = v(x)$, $x(t_0) = y$, must be constant for all time. Use this to show that any solution to $\dot x = v(x)$ with $v(x(0)) \neq 0$ must remain in $\operatorname{supp}(v)$ for its entire maximal interval of definition. Apply Theorem 7.

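Given a smooth manifold $X$, consider the set $\operatorname{Diff}(X)$ of diffeomorphisms from $X$ to itself. Then for $f, g \in \operatorname{Diff}(X)$, it is easy to see that
$$f\circ g \in \operatorname{Diff}(X), \qquad f^{-1} \in \operatorname{Diff}(X), \qquad f\circ f^{-1} = \mathrm{id}$$
for $\mathrm{id}$ the identity map. Therefore, $\operatorname{Diff}(X)$ is a group under composition.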

Proposition 44. Let $v$ be a smooth vector field on a compact manifold $X$. For the flow $F(y, t)$, define $F_t(y) = F(y, t)$. Then $F_t \in \operatorname{Diff}(X)$, $F_{t_1 + t_2} = F_{t_1}\circ F_{t_2}$, and $F_{-t} = F_t^{-1}$. (And so $t \mapsto F_t$ is a group homomorphism from the additive group $\mathbb{R}$ to $\operatorname{Diff}(X)$.)

Proof. Theorem 15 shows that $F_t$ is smooth for any $t$. The group homomorphism property is simply a restatement of Proposition 34. Therefore, $F_t\circ F_{-t} = F_0$, which is the flow along $v$ for time $0$. By definition, $F_0 = \mathrm{id}$, the identity map. Now $F_t^{-1} = F_{-t}$ is smooth, and so $F_t$ is a diffeomorphism.

Remark. Note the only place we used the fact that $X$ is compact is to guarantee the existence of the flow for all time. So the proposition still holds for any smooth vector field $v$ on a smooth manifold $X$ whose flow exists for all time.

Example 12. For the sphere $S^2 \subset \mathbb{R}^3$, consider the vector field defined by $v(x^1, x^2, x^3) = (-x^2, x^1, 0)$. It is straightforward to show that the tangent space to $S^2$ at $(x^1, x^2, x^3)$ consists of the vectors $v = (v^1, v^2, v^3) \in \mathbb{R}^3$ with $v^1x^1 + v^2x^2 + v^3x^3 = 0$. (Proof: $S^2 = \{f = 1\}$ for $f = (x^1)^2 + (x^2)^2 + (x^3)^2$, and so for any local parametrization $\phi$, we have $f\circ\phi = 1$. Thus the Chain Rule shows that $Df(x)(T_xS^2) = 0$, and so $T_xS^2 \subset \ker Df(x)$. They must be equal since both are two-dimensional vector spaces. Then simply compute $\ker Df(x)$.) Since $-x^2x^1 + x^1x^2 + 0\cdot x^3 = 0$, $v$ is a smooth vector field on $S^2$.

Recall that the coordinate systems of the atlas introduced above are
$$(y^1, y^2) = \phi_1^{-1}(x^1, x^2, x^3) = \left(\frac{x^1}{1 - x^3}, \frac{x^2}{1 - x^3}\right),$$
$$(z^1, z^2) = \phi_2^{-1}(x^1, x^2, x^3) = \left(\frac{x^1}{1 + x^3}, \frac{x^2}{1 + x^3}\right).$$

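On $U_1$, compute at $x = (x^1, x^2, x^3) \in O_1 \subset S^2$,
$$D\phi_1^{-1}(x)(v) = \begin{pmatrix} \frac{1}{1 - x^3} & 0 & \frac{x^1}{(1 - x^3)^2}\\ 0 & \frac{1}{1 - x^3} & \frac{x^2}{(1 - x^3)^2}\end{pmatrix}\begin{pmatrix} -x^2\\ x^1\\ 0\end{pmatrix} = \begin{pmatrix} -\frac{x^2}{1 - x^3}\\ \frac{x^1}{1 - x^3}\end{pmatrix} = \begin{pmatrix} -y^2\\ y^1\end{pmatrix}.$$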

It turns out that for $x \in O_2$,
$$D\phi_2^{-1}(x)(v) = \begin{pmatrix} -z^2\\ z^1\end{pmatrix}$$
as well.

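In the coordinate charts, these systems can be solved explicitly. For $A = \begin{pmatrix} 0 & -1\\ 1 & 0\end{pmatrix}$, compute the fundamental solution
$$\begin{aligned} e^{At} &= Pe^{tD}P^{-1} = \begin{pmatrix} 1 & 1\\ -i & i\end{pmatrix}\exp\left[\begin{pmatrix} i & 0\\ 0 & -i\end{pmatrix}t\right]\begin{pmatrix} \frac12 & \frac{i}{2}\\ \frac12 & -\frac{i}{2}\end{pmatrix}\\ &= \begin{pmatrix} 1 & 1\\ -i & i\end{pmatrix}\begin{pmatrix} \cos t + i\sin t & 0\\ 0 & \cos t - i\sin t\end{pmatrix}\begin{pmatrix} \frac12 & \frac{i}{2}\\ \frac12 & -\frac{i}{2}\end{pmatrix} = \begin{pmatrix} \cos t & -\sin t\\ \sin t & \cos t\end{pmatrix}.\end{aligned}$$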


Therefore, for $y \in U_1$, the solution to $\dot y = v(y)$, $y(0) = y_0$ is
$$y(t) = \begin{pmatrix} \cos t & -\sin t\\ \sin t & \cos t\end{pmatrix}y_0. \tag{23}$$

And also, for $z \in U_2$, the solution to $\dot z = v(z)$, $z(0) = z_0$ is
$$z(t) = \begin{pmatrix} \cos t & -\sin t\\ \sin t & \cos t\end{pmatrix}z_0. \tag{24}$$
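The matrix exponential used in (23) and (24) can be checked numerically. In this sketch, which is ours and not from the notes, summing the truncated power series $\sum_{m < 30}(At)^m/m!$ for the $2\times 2$ matrix $A$ reproduces the rotation matrix. The cutoff of $30$ terms is an arbitrary choice that is ample for moderate $|t|$.

```python
import math

def matmul(P, Q):
    """Product of two 2x2 matrices stored as nested lists."""
    return [[sum(P[i][k] * Q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def expm_series(A, t, terms=30):
    """Approximate e^{At} by the truncated series sum_{m < terms} (At)^m / m!."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    term = [[1.0, 0.0], [0.0, 1.0]]          # (At)^0 / 0!
    At = [[a * t for a in row] for row in A]
    for m in range(1, terms):
        # (At)^m / m! = ((At)^{m-1} / (m-1)!) * At / m
        term = [[x / m for x in row] for row in matmul(term, At)]
        result = [[r + s for r, s in zip(rr, ss)] for rr, ss in zip(result, term)]
    return result

def rotation(t):
    """The closed form obtained above by diagonalizing A."""
    return [[math.cos(t), -math.sin(t)], [math.sin(t), math.cos(t)]]

A = [[0.0, -1.0], [1.0, 0.0]]
```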

Proposition 42 implies that these two flows should be related, since they both correspond to flows on $S^2$. In particular, for $y_0 \in U_{12}$, let $z_0 = \phi_{12}(y_0) = y_0|y_0|^{-2}$. We check that
$$z(t) = \phi_{12}(y(t))$$
for $y(t)$ from (23) and $z(t)$ from (24). So compute

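$$y(t) = \begin{pmatrix} y_0^1\cos t - y_0^2\sin t\\ y_0^1\sin t + y_0^2\cos t\end{pmatrix} \quad\text{for } y_0 = \begin{pmatrix} y_0^1\\ y_0^2\end{pmatrix},$$
$$|y(t)|^2 = (y_0^1\cos t - y_0^2\sin t)^2 + (y_0^1\sin t + y_0^2\cos t)^2 = |y_0|^2,$$
$$\phi_{12}(y(t)) = \frac{y(t)}{|y(t)|^2} = \frac{1}{|y_0|^2}\begin{pmatrix} \cos t & -\sin t\\ \sin t & \cos t\end{pmatrix}y_0 = \begin{pmatrix} \cos t & -\sin t\\ \sin t & \cos t\end{pmatrix}z_0 = z(t).$$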

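Therefore, the flow patches from $U_1$ to $U_2$. The flow itself can be represented on $U_1$ by
$$F_t(y) = \begin{pmatrix} \cos t & -\sin t\\ \sin t & \cos t\end{pmatrix}y,$$
on $U_2$ by
$$F_t(z) = \begin{pmatrix} \cos t & -\sin t\\ \sin t & \cos t\end{pmatrix}z,$$
and even on $S^2 \subset \mathbb{R}^3$ itself by
$$F_t(x) = \exp\left[\begin{pmatrix} 0 & -1 & 0\\ 1 & 0 & 0\\ 0 & 0 & 0\end{pmatrix}t\right]x = \begin{pmatrix} \cos t & -\sin t & 0\\ \sin t & \cos t & 0\\ 0 & 0 & 1\end{pmatrix}x.$$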
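The patching can also be checked numerically. In the sketch below, the sample point $y_0$ and the function names are our own arbitrary choices. We rotate $y_0$ by (23), map the result to $U_2$ with $\phi_{12}(y) = y/|y|^2$, and compare with (24) applied to $z_0 = \phi_{12}(y_0)$. We also check two further facts. First, the stereographic chart $\phi_1$, written out explicitly in Example 13 below, carries the rotation on $U_1$ to the rotation of $S^2 \subset \mathbb{R}^3$ about the $x^3$-axis. Second, this flow satisfies the group law $F_{t_1 + t_2} = F_{t_1}\circ F_{t_2}$ of Proposition 44.

```python
import math

def rotate(t, p):
    """Apply the rotation matrix from (23) and (24) to a point p in R^2."""
    c, s = math.cos(t), math.sin(t)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def phi12(y):
    """Transition map z = phi_12(y) = y / |y|^2 on the overlap (y != 0)."""
    r2 = y[0] ** 2 + y[1] ** 2
    return (y[0] / r2, y[1] / r2)

def phi1(y):
    """The stereographic chart phi_1 of Example 13, from U_1 to S^2 in R^3."""
    r2 = y[0] ** 2 + y[1] ** 2
    return (2 * y[0] / (r2 + 1), 2 * y[1] / (r2 + 1), (r2 - 1) / (r2 + 1))

def flow_R3(t, x):
    """The flow on S^2 in R^3: rotation about the x^3-axis."""
    c, s = math.cos(t), math.sin(t)
    return (c * x[0] - s * x[1], s * x[0] + c * x[1], x[2])

y0 = (0.8, -1.3)  # hypothetical initial point in U_12
```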


Homework Problem 30. Consider the atlas given above for $S^2$. On $U_1$, consider the vector field
$$v = -y^1\frac{\partial}{\partial y^1} - y^2\frac{\partial}{\partial y^2}.$$
Show that $D\phi_1 v$ extends to a smooth vector field on all of $S^2$ (i.e., it extends smoothly across $N = S^2 \setminus O_1$). Write down this vector field in the $z$ coordinates on $U_2$ as well. Solve for the flow on $U_1$ and $U_2$, and explicitly check they agree on the overlap $O_{12}$.

3.4 Riemannian metrics

For a vector $v$ at a point $p$ on a manifold $X \subset \mathbb{R}^N$, we can measure the length of $v$ by using the inner product on $\mathbb{R}^N$. So if $v \in T_pX \subset \mathbb{R}^N$, and
$$v = v^a\frac{\partial}{\partial y^a}$$
for $y = (y^1, \dots, y^N)$ coordinates on $\mathbb{R}^N$, then the length $|v|$ of $v$ is given by
$$|v|^2 = \sum_{a=1}^N (v^a)^2 = \delta_{ab}v^av^b$$
for the Kronecker $\delta_{ab} = 1$ if $a = b$ and $\delta_{ab} = 0$ if $a \neq b$. In this usage for computing the length of a tangent vector on $\mathbb{R}^N$, the Kronecker $\delta$ is a Riemannian metric.

(Note we use the following convention for an $n$-dimensional manifold $X \subset \mathbb{R}^N$: use indices $a, b, c$ from $1$ to $N$ to represent coordinates in $\mathbb{R}^N$, and use $i, j, k$ from $1$ to $n$ to represent local coordinates on $X$.)

On a manifold $X$, a Riemannian metric is a smoothly varying positive definite inner product on $T_pX$ for all $p \in X$. Recall the definitions involved. An inner product on a real vector space $V$ is a pairing $g\colon V \times V \to \mathbb{R}$ which is bilinear and symmetric. $g$ is bilinear if for every $v \in V$, the maps $g(v, \cdot)$ and $g(\cdot, v)$ from $V$ to $\mathbb{R}$ are linear maps, and $g$ is symmetric if $g(v, w) = g(w, v)$ for each $v, w \in V$. An inner product is positive definite if $g(v, v) \ge 0$ for all $v \in V$ and $g(v, v) = 0$ only if $v = 0$.

If the vector space $V$ has a basis $\{e_i\}$, then the inner product $g$ is determined by $g_{ij} = g(e_i, e_j)$, since for any linear combinations $v = v^ie_i$, $w = w^je_j$, bilinearity shows
$$g(v, w) = g(v^ie_i, w^je_j) = v^ig(e_i, w^je_j) = v^iw^jg(e_i, e_j) = v^iw^jg_{ij}.$$

The fact that $g$ is symmetric is equivalent to $g_{ij} = g_{ji}$. Note that a positive definite inner product $g$ provides a way to measure the length of a vector, $|v|_g = \sqrt{g(v, v)}$, and it also provides a measurement of the angle $\theta$ between two nonzero vectors $v$ and $w$:
$$\cos\theta = \frac{g(v, w)}{|v|_g|w|_g}.$$

A Riemannian metric on $X$ gives a positive definite inner product on each tangent space $T_pX$. We also require these inner products to vary smoothly as the point $p$ varies in $X$. To describe this, consider a smooth atlas on $X$, and a local coordinate system $(x^1, \dots, x^k)$ around $p$. Then a smooth vector field $v$ can be represented as $v = v^i\frac{\partial}{\partial x^i}$ for the standard local frame $\{\partial/\partial x^i\}$ of the tangent bundle. Then at each point, the inner product $g$ is represented by $g_{ij}(x)$, and
$$g(v, w) = g_{ij}v^iw^j, \qquad v^i = v^i(x),\ w^j = w^j(x),\ g_{ij} = g_{ij}(x).$$
Then $g$ is smoothly varying on $X$ if the functions $g_{ij}$ are smooth on each coordinate chart in the smooth atlas of $X$.

Euclidean space $\mathbb{R}^N$ has a standard Riemannian metric given by the standard inner product $\delta_{ab}$. As we've seen above, this inner product endows any submanifold $X \subset \mathbb{R}^N$ with a Riemannian metric: for $v, w \in T_pX \subset \mathbb{R}^N$, we can form $g(v, w)$ using the inner product $\delta_{ab}$. More concretely, consider a smooth parametrization $\phi\colon U \to O \subset X \subset \mathbb{R}^N$, and write $\phi = (\phi^1, \dots, \phi^N)$. A vector field represented by
$$v = v^i\frac{\partial}{\partial x^i}$$
on $U \subset \mathbb{R}^n$ is represented by
$$D\phi(x)(v) = \frac{\partial\phi^a}{\partial x^i}(x)\,v^i(x) \in T_{\phi(x)}X \subset \mathbb{R}^N.$$

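$D\phi(x)(v)$ is called the push-forward of $v$ under the map $\phi$. For $v, w \in T_pX$, we may define the metric
$$g_{ij}v^iw^j = g(v, w) = \left(\frac{\partial\phi^a}{\partial x^i}v^i\right)\left(\frac{\partial\phi^b}{\partial x^j}w^j\right)\delta_{ab} = \left(\frac{\partial\phi^a}{\partial x^i}\frac{\partial\phi^b}{\partial x^j}\delta_{ab}\right)v^iw^j.$$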


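Therefore, the Euclidean inner product on $\mathbb{R}^N$ induces the Riemannian metric on $X$ locally given by the formula
$$g\left(\frac{\partial}{\partial x^i}, \frac{\partial}{\partial x^j}\right) = g_{ij} = \frac{\partial\phi^a}{\partial x^i}\frac{\partial\phi^b}{\partial x^j}\delta_{ab}. \tag{25}$$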

Given a real vector space $V$, the dual vector space $V^*$ is the set of all linear functions from $V$ to $\mathbb{R}$. It is easy to check $V^*$ is a vector space. If $V$ has a basis $\{e_i\}$, then there is a dual basis $\{\eta^i\}$ of $V^*$, which is defined as follows:
$$\eta^i(e_j) = \delta^i_j.$$

Given a local coordinate frame $\{\partial/\partial x^i\}$ of $TX$, the dual basis at each point is written as $\{dx^i\}$. Each $dx^i$ is called a differential. The dual space $T_p^*X$ of $T_pX$ is called the cotangent space of $X$ at $p$.

Lemma 45. If $y = y(x)$ is a coordinate change as in (22), then
$$dy^j = \frac{\partial y^j}{\partial x^i}dx^i.$$

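Proof. Write $dy^j = \xi^j_\ell\,dx^\ell$. Then we have
$$\delta^j_i = dy^j\left(\frac{\partial}{\partial y^i}\right) = \xi^j_\ell\,dx^\ell\left(\frac{\partial x^k}{\partial y^i}\frac{\partial}{\partial x^k}\right) = \xi^j_\ell\frac{\partial x^k}{\partial y^i}\,dx^\ell\left(\frac{\partial}{\partial x^k}\right) = \xi^j_\ell\frac{\partial x^k}{\partial y^i}\delta^\ell_k = \xi^j_k\frac{\partial x^k}{\partial y^i}.$$
Therefore, $(\xi^j_k)$ is the inverse matrix of $\left(\frac{\partial x^k}{\partial y^i}\right)$, and so $\xi^j_k = \frac{\partial y^j}{\partial x^k}$.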

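A Riemannian metric can be naturally written in coordinates as $g_{ij}\,dx^idx^j$. Under a coordinate change $y = y(x)$, Lemma 45 gives
$$g_{k\ell}\,dy^kdy^\ell = g_{k\ell}\frac{\partial y^k}{\partial x^i}\frac{\partial y^\ell}{\partial x^j}\,dx^idx^j.$$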

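This makes sense because the natural pairing
$$dx^i\left(\frac{\partial}{\partial x^j}\right) = \delta^i_j$$
between the tangent and cotangent spaces implies that
$$g(v, w) = g_{ij}\,dx^i\left(v^k\frac{\partial}{\partial x^k}\right)dx^j\left(w^\ell\frac{\partial}{\partial x^\ell}\right) = g_{ij}(v^k\delta^i_k)(w^\ell\delta^j_\ell) = g_{k\ell}v^kw^\ell.$$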


A Riemannian metric is an example of a tensor on $X$. The tensor product $V \otimes W$ of two real vector spaces with bases $\{\nu_i\}$ and $\{\omega_j\}$ respectively is the real vector space formed from the basis
$$\{\nu_i \otimes \omega_j\}.$$
This implies $\dim V \otimes W = (\dim V)(\dim W)$.

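A tensor of type $(k, \ell)$ on a manifold $X$ assigns to each point $p \in X$ an element of
$$(T_pX)^{\otimes k} \otimes (T_p^*X)^{\otimes\ell},$$
which has as its basis
$$\frac{\partial}{\partial x^{i_1}} \otimes \cdots \otimes \frac{\partial}{\partial x^{i_k}} \otimes dx^{j_1} \otimes \cdots \otimes dx^{j_\ell}.$$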

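Locally, we write a tensor $\omega$ as
$$\omega^{i_1\cdots i_k}_{j_1\cdots j_\ell}\,\frac{\partial}{\partial x^{i_1}} \otimes \cdots \otimes \frac{\partial}{\partial x^{i_k}} \otimes dx^{j_1} \otimes \cdots \otimes dx^{j_\ell},$$
or simply as $\omega^{i_1\cdots i_k}_{j_1\cdots j_\ell}$. We say $\omega$ is smooth if each $\omega^{i_1\cdots i_k}_{j_1\cdots j_\ell}$ is smooth locally for all coordinates in a smooth atlas of $X$.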

A Riemannian metric is then a smooth symmetric, positive-definite $(0, 2)$ tensor on a manifold $X$. Since the product is symmetric, we omit the $\otimes$ and simply write $g_{ij}\,dx^idx^j$ for a Riemannian metric in local coordinates $x$. (There are also antisymmetric $(0, k)$ tensors, or $k$-forms, for which the tensor product $\otimes$ is replaced by $\wedge$.)

Example 13. For $S^2$, in the local coordinates given by stereographic projection, recall the coordinate chart $\phi = \phi_1$:
$$\phi(y^1, y^2) = \left(\frac{2y^1}{|y|^2 + 1}, \frac{2y^2}{|y|^2 + 1}, \frac{|y|^2 - 1}{|y|^2 + 1}\right),$$


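and the Riemannian metric induced from $\mathbb{R}^3$ is
$$\begin{aligned} g_{ij}\,dy^idy^j &= \delta_{ab}\frac{\partial\phi^a}{\partial y^i}\frac{\partial\phi^b}{\partial y^j}\,dy^idy^j\\ &= \delta_{ab}\,d\phi^ad\phi^b\\ &= d\phi^1d\phi^1 + d\phi^2d\phi^2 + d\phi^3d\phi^3\\ &= \left(\frac{-2(y^1)^2 + 2(y^2)^2 + 2}{(|y|^2 + 1)^2}dy^1 + \frac{-4y^1y^2}{(|y|^2 + 1)^2}dy^2\right)^2\\ &\quad + \left(\frac{-4y^1y^2}{(|y|^2 + 1)^2}dy^1 + \frac{2(y^1)^2 - 2(y^2)^2 + 2}{(|y|^2 + 1)^2}dy^2\right)^2\\ &\quad + \left(\frac{4y^1}{(|y|^2 + 1)^2}dy^1 + \frac{4y^2}{(|y|^2 + 1)^2}dy^2\right)^2\\ &= \frac{4}{(|y|^2 + 1)^2}(dy^1dy^1 + dy^2dy^2).\end{aligned}$$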

Note in the previous example, we used the formula for differentials
$$d\phi^a = \frac{\partial\phi^a}{\partial y^i}dy^i.$$

It is also useful to have the following notation: If $h = h_{ab}\,dz^adz^b$ is a Riemannian metric on $Z$, and $\phi\colon Y \to Z$ is a smooth map, then we denote the pullback metric
$$\phi^*h = h_{ab}(\phi)\,d\phi^ad\phi^b$$
on $Y$. Thus in the construction above, if $\delta = \delta_{ab}\,dx^adx^b$ is the Euclidean metric on $\mathbb{R}^N$, then the metric $g$ induced on a submanifold $\phi\colon X \to \mathbb{R}^N$ is the pullback $\phi^*\delta$.

Homework Problem 31. Let $\phi\colon X \to Y$ be a smooth map of manifolds. Let $Y$ have a Riemannian metric $h$ on it. Show that $\phi^*h$ is a Riemannian metric on $X$ if and only if the tangent map $D\phi(x)\colon T_xX \to T_{\phi(x)}Y$ is injective for every $x \in X$. (In this case $\phi$ is called an immersion.)

Hint: Do the calculations in local coordinates on $X$ and $Y$. The key point to check is whether $\phi^*h$ is positive definite. Show $\phi^*h(x)$ is $0$ on the kernel of $D\phi(x)$.

Note in the previous example, we considered the Riemannian metric on $S^2$ pulled back from the Euclidean metric on $\mathbb{R}^3$. It is possible to write down other Riemannian metrics as well.


Example 14. Consider hyperbolic space
$$H^n = \{x = (x^1, \dots, x^n) \in \mathbb{R}^n : x^n > 0\}$$
equipped with the Riemannian metric
$$\frac{dx^1dx^1 + \cdots + dx^ndx^n}{(x^n)^2}.$$
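Computing lengths shows how far this metric is from the Euclidean one. A curve $c(t)$ in $H^n$ has length $\int |\dot c(t)|_g\,dt = \int |\dot c(t)|/c^n(t)\,dt$. For the vertical segment from $(0, a)$ to $(0, b)$ in $H^2$ this gives $\log(b/a)$. A horizontal segment of Euclidean length $1$ at height $2$ has length $1/2$. The sketch below is our own: it uses a plain midpoint rule with an arbitrary number of steps to check both numbers.

```python
import math

def hyperbolic_length(curve, velocity, t0, t1, steps=20000):
    """Midpoint-rule approximation of the length of a curve in H^n.

    curve(t) and velocity(t) return tuples in R^n.  For the metric
    (dx^1 dx^1 + ... + dx^n dx^n) / (x^n)^2 we have |c'(t)|_g = |c'(t)| / c^n(t).
    """
    dt = (t1 - t0) / steps
    total = 0.0
    for k in range(steps):
        t = t0 + (k + 0.5) * dt
        c, dc = curve(t), velocity(t)
        total += math.sqrt(sum(v * v for v in dc)) / c[-1] * dt
    return total

# Vertical segment from (0, a) to (0, b) in H^2.
a, b = 1.0, 5.0
L = hyperbolic_length(lambda t: (0.0, t), lambda t: (0.0, 1.0), a, b)
```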

A famous theorem of John Nash shows that for every Riemannian metric $g$ on a smooth manifold $X$, there is an embedding $i\colon X \to \mathbb{R}^N$ so that $g$ is induced from the standard metric on $\mathbb{R}^N$. (In most cases, however, it is not obvious what the embedding is.)

3.5 Vector bundles and tensors

In order to explain better what tensors are, we introduce the idea of a vector bundle. The tangent bundle $TX$ of a smooth $n$-dimensional manifold $X$ is a vector bundle. Recall there is a map
$$\pi\colon TX \to X.$$
The fiber $\pi^{-1}(p) = T_pX$ over a point $p \in X$ is an $n$-dimensional vector space. Moreover, over each coordinate neighborhood $O \subset X$ with coordinates $x^1, \dots, x^n$, $\pi^{-1}O$ is diffeomorphic to $O \times \mathbb{R}^n$, the diffeomorphism being
$$(p, v) \mapsto (p, v^1, \dots, v^n)$$
for $p \in O$, $v = v^i\frac{\partial}{\partial x^i} \in T_pX$.

We generalize these properties of $TX$ to define a vector bundle. A vector bundle of rank $k$ over an $n$-dimensional manifold $X$ is given by an $(n + k)$-dimensional manifold $V$ with a smooth map $\pi\colon V \to X$. $V$ is called the total space of the vector bundle. Every point in $X$ has a neighborhood $O$ so that $\pi^{-1}O$ is diffeomorphic to $O \times \mathbb{R}^k$. Under this diffeomorphism, $\pi$ is simply the natural projection from $O \times \mathbb{R}^k$ to $O$. Thus vector bundles are locally trivial, in that each vector bundle is locally a product of a neighborhood times $\mathbb{R}^k$. Note that each diffeomorphism
$$\pi^{-1}O \to O \times \mathbb{R}^k$$
provides for each $p \in O$ a basis of the vector space $\pi^{-1}(p)$ by taking the preimage of the standard basis of $\mathbb{R}^k$ under the diffeomorphism. Such a smoothly varying basis is called a local frame of the vector bundle over $O$.

Given a gluing map $y = y(x)$ of two small coordinate neighborhoods $O_x$ and $O_y$ in $X$, there is a corresponding gluing map of $O_x \times \mathbb{R}^k$ and $O_y \times \mathbb{R}^k$. We require this gluing map to be of the form
$$(x, v) \mapsto (y(x), A(x)v)$$
for $v$ a vector in $\mathbb{R}^k$ and $A(x)$ a nonsingular matrix varying smoothly in $x$. Therefore, above each point $p$, if we change coordinates from $x$ to $y$, the frame changes by the matrix $A(x)$. $A(x)$ is a transition function of the vector bundle $V$. So the transition functions act on the fibers of a vector bundle as linear isomorphisms. This preserves the vector-space structure on each fiber when changing coordinates.

Remark. We have defined real vector bundles of rank $k$, for which each fiber is diffeomorphic to $\mathbb{R}^k$. We may also define complex vector bundles with fibers diffeomorphic to $\mathbb{C}^k$.

A section of a vector bundle $\pi\colon V \to X$ is a map $s\colon X \to V$ satisfying $\pi(s(p)) = p$ for all $p \in X$. So for each $p \in X$, $s(p)$ is an element of the vector space $\pi^{-1}(p)$. A vector field is precisely a section of the tangent bundle. Locally, $k$ sections which are linearly independent on each fiber form a frame of the vector bundle. For example, the $\partial/\partial x^i$ are $n$ linearly independent sections of the tangent bundle over a coordinate chart.

Since vector bundles preserve the linear structure on each fiber, we may do linear algebra on the fibers to create new vector bundles. In particular, we can take duals and tensor products of the fiber spaces to form new vector bundles. The tensor bundle of type $(k, \ell)$ over an $n$-dimensional manifold $X$ is the vector bundle of rank $n^{k+\ell}$ with the fiber over $p$ given by
$$(T_pX)^{\otimes k} \otimes (T_p^*X)^{\otimes\ell}.$$

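Over each coordinate chart, the natural frame of the tensor bundle is
$$\frac{\partial}{\partial x^{i_1}} \otimes \cdots \otimes \frac{\partial}{\partial x^{i_k}} \otimes dx^{j_1} \otimes \cdots \otimes dx^{j_\ell}$$
for $i_1, \dots, i_k, j_1, \dots, j_\ell \in \{1, \dots, n\}$. The transition functions of a tensor bundle are determined by the formulas
$$\frac{\partial}{\partial x^i} = \frac{\partial y^k}{\partial x^i}\frac{\partial}{\partial y^k}, \qquad dx^j = \frac{\partial x^j}{\partial y^\ell}dy^\ell.$$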


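For example, the transition functions for the $(0, 2)$ tensor bundle are given by
$$dx^i \otimes dx^j = \frac{\partial x^i}{\partial y^k}\frac{\partial x^j}{\partial y^\ell}\,dy^k \otimes dy^\ell.$$
Note we can view
$$\frac{\partial x^i}{\partial y^k}\frac{\partial x^j}{\partial y^\ell}$$
as a nonsingular $n^2 \times n^2$ matrix, with rows indexed by the pairs $(i, j)$ and columns by the pairs $(k, \ell)$. This matrix is the tensor product of the matrix $\frac{\partial x^i}{\partial y^k}$ with itself.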
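The tensor-product description of the $(0, 2)$ transition functions can be checked concretely. The sketch below is ours, and the Jacobian and metric entries are arbitrary sample values at a single point; `J[i][k]` stands for $\partial x^i/\partial y^k$. We transform the components $g_{ij}$ of a $(0, 2)$ tensor in two ways. The first is direct: $g'_{k\ell} = g_{ij}\frac{\partial x^i}{\partial y^k}\frac{\partial x^j}{\partial y^\ell}$. The second contracts the $n^2 \times n^2$ Kronecker product $J \otimes J$ against the flattened vector of components.

```python
def kron(A, B):
    """Kronecker product: entry [(i, j), (k, l)] = A[i][k] * B[j][l]."""
    n, m = len(A), len(B)
    return [[A[i][k] * B[j][l] for k in range(n) for l in range(m)]
            for i in range(n) for j in range(m)]

def transform_direct(g, J):
    """g'_{kl} = sum_{i,j} g_{ij} J[i][k] J[j][l], with J[i][k] = dx^i/dy^k."""
    n = len(J)
    return [[sum(g[i][j] * J[i][k] * J[j][l] for i in range(n) for j in range(n))
             for l in range(n)] for k in range(n)]

def transform_kron(g, J):
    """The same transformation, using the n^2 x n^2 matrix J (x) J on flattened g."""
    n = len(J)
    K = kron(J, J)
    flat = [g[i][j] for i in range(n) for j in range(n)]          # index (i, j)
    out = [sum(K[r][c] * flat[r] for r in range(n * n)) for c in range(n * n)]
    return [out[k * n:(k + 1) * n] for k in range(n)]             # index (k, l)

J = [[2.0, 1.0], [0.5, 3.0]]   # hypothetical Jacobian dx^i/dy^k at a point
g = [[1.0, 0.2], [0.2, 4.0]]   # hypothetical components g_ij at that point
```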

A smooth tensor of type $(k, \ell)$ is a smooth section of the $(k, \ell)$ tensor bundle. Thus a Riemannian metric is a smooth symmetric, positive-definite $(0, 2)$ tensor.

3.6 Integration and densities

We begin by introducing the Change of Variables Formula for multiple integrals:

Theorem 16 (Change of Variables). Let $\Omega \subset \mathbb{R}^n$ be an open set, and let $g\colon \Omega \to \mathbb{R}^n$ be one-to-one and locally $C^1$. Then for every $L^1$ function $f$ on $g(\Omega)$, with Lebesgue measure $dx$ and $dy$,
$$\int_{g(\Omega)} f(y)\,dy = \int_\Omega f(g(x))\,|\det Dg(x)|\,dx.$$

Proof. See Spivak Calculus on Manifolds.

Here is another useful concept. Given an open cover $\{O_\alpha\}$ of a smooth manifold $X$, a partition of unity subordinate to the cover is a collection of smooth functions $\rho_\beta\colon X \to \mathbb{R}$ satisfying

1. $\rho_\beta(x) \in [0, 1]$.

2. For each $\rho_\beta$, there is an $\alpha$ so that $\operatorname{supp}(\rho_\beta) \subset\subset O_\alpha$.

3. Every $x \in X$ has a neighborhood which intersects only finitely many of the supports of the $\rho_\beta$.

4. $\sum_\beta \rho_\beta(x) = 1$.


Proposition 46. For every open cover of a smooth manifold $X$, there exists a subordinate partition of unity.

For a proof, see Spivak or Guillemin and Pollack.
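In one dimension a partition of unity can be written down explicitly. The sketch below is our own illustration and not the construction in Spivak or in Guillemin and Pollack. It covers $\mathbb{R}$ by the intervals $O_k = (k - 1, k + 1)$, $k \in \mathbb{Z}$. It then uses the bump $b(x) = f(9/16 - x^2)$ built from $f(x) = e^{-1/x}$, so that $\operatorname{supp} b = [-3/4, 3/4] \subset\subset (-1, 1)$, and normalizes the translates: $\rho_k(x) = b(x - k)/\sum_j b(x - j)$. Every $x$ lies within $1/2$ of an integer, so the denominator is positive, and at most two terms are nonzero at any point.

```python
import math

def f(x):
    """e^{-1/x} for x > 0 and 0 otherwise (smooth on R)."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def b(x):
    """Smooth bump with support [-3/4, 3/4], compactly contained in (-1, 1)."""
    return f(0.5625 - x * x)

def rho(k, x):
    """rho_k(x) = b(x - k) / sum_j b(x - j), subordinate to O_k = (k - 1, k + 1).

    Only translates with |x - j| < 3/4 are nonzero, so the sum is finite; the
    integer nearest to x contributes a positive term, so the denominator is > 0.
    """
    nearby = range(math.floor(x) - 1, math.floor(x) + 3)
    total = sum(b(x - j) for j in nearby)
    return b(x - k) / total
```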

Theorem 17. A Riemannian metric $g$ on a manifold $X$ provides a measure on $X$ called the Riemannian density.

The construction of this measure follows below, along with a sketch of a proof.

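Let $\{O_\alpha, \phi_\alpha, U_\alpha\}$ be a smooth atlas of $X$. A function $f\colon X \to \mathbb{R}$ is measurable if each $f\circ\phi_\alpha\colon U_\alpha \to \mathbb{R}$ is measurable. For a Riemannian metric $g$ on $X$, the density $dV_g$ is defined first for measurable functions $f\colon X \to \mathbb{R}$ whose supports are contained in some $O_\alpha$. In this case, define
$$\int_X f\,dV_g = \int_{O_\alpha} f\,dV_g = \int_{U_\alpha} f(x)\sqrt{\det g_{ij}(x)}\,dx$$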

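for local coordinates $x$ on $O_\alpha$ and Lebesgue measure $dx$ on $U_\alpha \subset \mathbb{R}^n$. The key point is to make sure this definition makes sense for functions $f$ whose support is contained in two open charts $O_\alpha$ and $O_\beta$. As above, let $x$ be the local coordinates on $O_\alpha$, and let $y$ be the coordinates on $O_\beta$. Then we use the rule (25) for changing $g_{ij}$ under a change $y = y(x)$ and the Change of Variables Theorem 16 to show
$$\begin{aligned} \int_{U_\beta} f(y)\sqrt{\det g_{ij}(y)}\,dy &= \int_{U_\alpha} f(x)\sqrt{\det g_{ij}(y)}\,\left|\det\frac{\partial y^i}{\partial x^j}\right|dx\\ &= \int_{U_\alpha} f(x)\sqrt{\det\left(g_{k\ell}(x)\frac{\partial x^k}{\partial y^i}\frac{\partial x^\ell}{\partial y^j}\right)}\,\left|\det\frac{\partial y^i}{\partial x^j}\right|dx\\ &= \int_{U_\alpha} f(x)\sqrt{\det g_{k\ell}(x)}\,\left|\det\frac{\partial x^k}{\partial y^i}\right|\left|\det\frac{\partial y^i}{\partial x^j}\right|dx\\ &= \int_{U_\alpha} f(x)\sqrt{\det g_{k\ell}(x)}\,dx.\end{aligned}$$
(The last step uses that the two Jacobian matrices are inverse to each other.)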

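Let $\{\rho_\beta\}$ be a partition of unity subordinate to the atlas $\{O_\alpha\}$ of $X$. For any measurable subset $\Omega \subset X$, consider its characteristic function $\chi_\Omega$. Then
$$V_g(\Omega) = \int_X \chi_\Omega\,dV_g = \sum_\beta \int_X \rho_\beta\chi_\Omega\,dV_g.$$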


The calculation in the previous paragraph can be used to ensure that this definition is independent of the atlas and partition of unity used. It is straightforward to check that $dV_g$ defines a measure on $X$. Then for any $L^1$ function $f$ on $X$ (measured by $dV_g$, of course),
$$\int_X f\,dV_g = \sum_\beta \int_X \rho_\beta f\,dV_g.$$
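As a consistency check of the Riemannian density, we can combine it with Example 13; this example is ours. There $\sqrt{\det g_{ij}} = 4/(|y|^2 + 1)^2$. The disk $|y| < 1$ in the chart $\phi_1$ is the lower hemisphere $\{x^3 < 0\}$, whose area should be $2\pi$. More generally, the disk $|y| < R$ should have volume $4\pi(1 - 1/(R^2 + 1))$. The sketch below integrates the density with a midpoint rule in polar coordinates. The grid sizes are arbitrary.

```python
import math

def sqrt_det_g(y1, y2):
    """sqrt(det g_ij) for the metric 4/(|y|^2+1)^2 (dy^1 dy^1 + dy^2 dy^2) of Example 13."""
    return 4.0 / (y1 * y1 + y2 * y2 + 1.0) ** 2

def chart_volume(density, radius, nr=400, ntheta=64):
    """Midpoint rule for the integral of density over the disk |y| < radius,
    in polar coordinates, where Lebesgue measure is dy = r dr dtheta."""
    dr, dth = radius / nr, 2 * math.pi / ntheta
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(ntheta):
            th = (j + 0.5) * dth
            total += density(r * math.cos(th), r * math.sin(th)) * r * dr * dth
    return total
```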

Homework Problem 32. Check that Vg is a measure on X.

Remark. To complete a proof of Theorem 17, it is necessary to check that the definition depends only on $g$ and not on the atlas $\{O_\alpha, \phi_\alpha, U_\alpha\}$ or the partition of unity $\{\rho_\beta\}$ subordinate to the open cover $\{O_\alpha\}$.

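If $\Omega$ is a domain in $\mathbb{R}^n$ with smooth boundary, then the measure on the boundary $\partial\Omega$ is given by the restriction of the Riemannian metric on $\mathbb{R}^n$. (So this gives a Riemannian metric on $\partial\Omega$, and thus a density as above.) If $\partial\Omega$ is locally given by the graph of a function, $(x^1, \dots, x^{n-1}, f(x^1, \dots, x^{n-1}))$, then
$$\phi(x^1, \dots, x^{n-1}) = (x^1, \dots, x^{n-1}, f(x^1, \dots, x^{n-1}))$$
is a local parametrization of the $(n-1)$-dimensional manifold $\partial\Omega \subset \mathbb{R}^n$. Its derivative is the $n \times (n-1)$ matrix
$$D\phi = \begin{pmatrix} 1 & 0 & \cdots & 0\\ 0 & 1 & \cdots & 0\\ \vdots & \vdots & \ddots & \vdots\\ 0 & 0 & \cdots & 1\\ f_{,1} & f_{,2} & \cdots & f_{,n-1}\end{pmatrix}.$$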

Then the pullback metric is
$$\sum_{i,j=1}^{n-1} g_{ij}\,dx^idx^j = \phi^*\delta = \delta_{ab}\,d\phi^ad\phi^b = (dx^1)^2 + \cdots + (dx^{n-1})^2 + (f_{,1}dx^1 + \cdots + f_{,n-1}dx^{n-1})^2.$$
As a matrix, $(g_{ij}) = (\delta_{ij} + f_{,i}f_{,j})$.

In order to compute the volume form, we should compute $\det g_{ij}$. Fortunately, it is easy to compute in this case:
$$\det g = 1 + |df|^2 = 1 + f_{,1}^2 + \cdots + f_{,n-1}^2$$
(see Problem 33 below). So the density is
$$dV_g = \sqrt{1 + |df|^2}\,dx^{n-1}$$
for $dx^{n-1}$ Lebesgue measure on $\mathbb{R}^{n-1}$.

Homework Problem 33. For $w$ an $n$-dimensional column vector, and $I$ the $n \times n$ identity matrix, show that $\det(I + ww^\top) = 1 + |w|^2$.

Hint: Show that $I + ww^\top$ can be diagonalized, with one eigenvalue $1 + |w|^2$, and with the eigenvalue $1$ repeated $n - 1$ times. (For this last step, show that on the $(n-1)$-dimensional subspace orthogonal to the natural $(1 + |w|^2)$-eigenvector, $I + ww^\top$ acts as the identity. What is a natural eigenvector to try?)
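The identity in Problem 33 can be sanity-checked numerically before it is proved. The sketch below is ours and is a check, not a proof. It computes $\det(\delta_{ij} + w_iw_j)$ by Gaussian elimination with partial pivoting for a sample vector $w$, which plays the role of $(f_{,1}, \dots, f_{,n-1})$ in the graph metric above.

```python
def det(M):
    """Determinant by Gaussian elimination with partial pivoting (M is copied)."""
    A = [row[:] for row in M]
    n = len(A)
    sign, result = 1.0, 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(A[r][c]))  # pivot row
        if A[p][c] == 0.0:
            return 0.0
        if p != c:
            A[c], A[p] = A[p], A[c]
            sign = -sign          # a row swap flips the sign
        result *= A[c][c]
        for r in range(c + 1, n):
            factor = A[r][c] / A[c][c]
            for k in range(c, n):
                A[r][k] -= factor * A[c][k]
    return sign * result

def graph_metric(w):
    """The matrix (delta_ij + w_i w_j), e.g. w_i = f_{,i} for the graph metric."""
    n = len(w)
    return [[(1.0 if i == j else 0.0) + w[i] * w[j] for j in range(n)] for i in range(n)]

w = [0.5, -1.25, 2.0, 0.3]  # hypothetical sample vector
```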

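For a function $f\colon \Omega \to \mathbb{R}$, the differential, or one-form, is $df = \frac{\partial f}{\partial x^i}dx^i$. Under a change of coordinates $y = y(x)$, $df$ transforms via the chain rule:
$$\frac{\partial f}{\partial y^j}dy^j = df = \frac{\partial f}{\partial x^i}dx^i = \frac{\partial f}{\partial y^j}\frac{\partial y^j}{\partial x^i}dx^i.$$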

In particular, this gives the formula for differentials (cf. Lemma 45)
$$dy^j = \frac{\partial y^j}{\partial x^i}dx^i.$$

It also shows that for each point $p$ of a manifold $X$, we can think of $df(p)$ as an element of the cotangent space $T_p^*X$. This is investigated further in the following problem:

Homework Problem 34. If $f$ is a smooth function on $X$ and $v$ is a smooth vector field, show that at each point $p \in X$,
$$(vf)(p) = df(p)(v(p)).$$
(In the expression on the right, consider $df(p)$ as an element of the dual space $T_p^*X$.)

Hint: Check it in a single coordinate chart.

On a Riemannian manifold $(X, g)$ (i.e., $g$ is a Riemannian metric on the manifold $X$), for each smooth function $f$, there is a vector field called the gradient of $f$. We define the gradient $\nabla f$ in local coordinates to be
$$(\nabla f)^i = g^{ij}f_{,j}, \qquad g^{k\ell}g_{\ell m} = \delta^k_m.$$
(So $g^{ij}$ is the inverse of the matrix $g_{ij}$.) Note that the Einstein convention with one index up (typically) indicates that $\nabla f$ is a vector field.


Homework Problem 35. Show that $\nabla f$ transforms as a vector field under coordinate changes. In other words, check that if $y = y(x)$,
$$(\nabla f)^j(y) = \frac{\partial y^j}{\partial x^i}(\nabla f)^i(x)$$
as in (22).

Hint: First check how the inverse of the metric $g^{ij}$ transforms. Note that in the definition $g^{ij}g_{jk} = \delta^i_k$, $\delta^i_k$ is independent of coordinate changes.

In the case of Euclidean space, it is common to use the gradient of a function instead of its differential. In this case, $(\nabla f)^b = \delta^{ab}f_{,a}$. Note that on any Riemannian manifold
$$|df|^2 = g^{ab}f_{,a}f_{,b} = g^{ac}g_{cd}g^{db}f_{,a}f_{,b} = g_{cd}(\nabla f)^c(\nabla f)^d = |\nabla f|^2.$$
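These index manipulations can be made concrete at a single point. In the sketch below, which is ours, the metric components and the components $f_{,j}$ of $df$ are arbitrary sample values in two dimensions. We raise the index with $g^{ij}$ to get $\nabla f$ and check $|df|^2 = |\nabla f|^2$. We also check $df(\nabla f) = |df|^2$.

```python
def inverse_2x2(g):
    """Inverse g^{ij} of an invertible 2x2 matrix g_ij."""
    (a, b), (c, d) = g
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def gradient(g, df):
    """(grad f)^i = g^{ij} f_{,j}."""
    ginv = inverse_2x2(g)
    return [sum(ginv[i][j] * df[j] for j in range(2)) for i in range(2)]

def norm_sq_covector(g, df):
    """|df|^2 = g^{ab} f_{,a} f_{,b}."""
    ginv = inverse_2x2(g)
    return sum(ginv[a][b] * df[a] * df[b] for a in range(2) for b in range(2))

def norm_sq_vector(g, v):
    """|v|^2 = g_{cd} v^c v^d."""
    return sum(g[c][d] * v[c] * v[d] for c in range(2) for d in range(2))

g = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical metric components at a point
df = [1.0, -3.0]              # hypothetical components f_{,1}, f_{,2}
grad = gradient(g, df)
```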

Let $v = v^i\frac{\partial}{\partial x^i}$ be a vector field on a domain in $\mathbb{R}^n$. Then the divergence of $v$ is the function
$$\nabla\cdot v = \frac{\partial v^i}{\partial x^i}.$$

The divergence of a vector field may also be defined on Riemannian manifolds,but the definition is somewhat more involved.

Here is another important theorem, which is a consequence of Stokes’s Theorem (see Spivak, Guillemin and Pollack, or Taylor). We only state it for domains in $\mathbb{R}^n$, and not in its more general context of compact manifolds with boundary.

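Theorem 18 (Divergence Theorem). Let $\Omega \subset\subset \mathbb{R}^n$ be a domain with smooth boundary $\partial\Omega$. Then for any $C^1$ vector field $v$ on $\overline{\Omega}$,
$$\int_\Omega \nabla\cdot v\,dx^n = \int_{\partial\Omega} v\cdot n\,dV.$$
(Here $n$ is the unit outward normal vector field to $\partial\Omega$, and $dV$ is the measure on $\partial\Omega$ induced from the Euclidean metric.)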
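Here is a numerical check of Theorem 18 on an example of our choosing. Take the unit disk $\Omega = \{|x| < 1\} \subset \mathbb{R}^2$ and $v = ((x^1)^3, (x^2)^3)$. Then $\nabla\cdot v = 3|x|^2$ and $n = x$ on $\partial\Omega$, and both sides equal $3\pi/2$. The sketch below uses midpoint rules in polar coordinates, with arbitrary grid sizes.

```python
import math

def v(x1, x2):
    """Sample vector field v = ((x^1)^3, (x^2)^3)."""
    return (x1 ** 3, x2 ** 3)

def div_v(x1, x2):
    """Divergence dv^1/dx^1 + dv^2/dx^2 of the sample field."""
    return 3 * x1 ** 2 + 3 * x2 ** 2

def interior_integral(nr=300, ntheta=300):
    """Integral of div v over the unit disk, using dx = r dr dtheta."""
    dr, dth = 1.0 / nr, 2 * math.pi / ntheta
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * dr
        for j in range(ntheta):
            th = (j + 0.5) * dth
            total += div_v(r * math.cos(th), r * math.sin(th)) * r * dr * dth
    return total

def boundary_flux(ntheta=300):
    """Integral of v . n over the unit circle, with n = (cos t, sin t) and dV = dt."""
    dth = 2 * math.pi / ntheta
    total = 0.0
    for j in range(ntheta):
        t = (j + 0.5) * dth
        n = (math.cos(t), math.sin(t))
        vn = v(*n)
        total += (vn[0] * n[0] + vn[1] * n[1]) * dth
    return total
```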

Remark. The way we have stated the integration formula depends on the Euclidean metric (to form the dot product, $dV$ and $n$). In the general form of Stokes’s Theorem, it is unnecessary to use the metric. (We may recast $v$ and $\nabla\cdot v$ as differential forms.)


Idea of proof. We do the computation in a very special case, for $v$ having compact support in $\overline{\Omega}$, where $\overline{\Omega}$ is the lower half-space $\{x = (x^1, \dots, x^n) \in \mathbb{R}^n : x^n \le 0\}$.

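In this case the unit normal vector is $n = (0, \dots, 0, 1)$ and $dV = dx^{n-1}$ is Lebesgue measure on $\mathbb{R}^{n-1} = \{x^n = 0\}$. Then, using Fubini’s Theorem, we want to prove
$$\int_{-\infty}^\infty \cdots \int_{-\infty}^\infty \int_{-\infty}^0 \frac{\partial v^i}{\partial x^i}\,dx^n\,dx^{n-1}\cdots dx^1 = \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty v^n\,dx^{n-1}\cdots dx^1.$$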

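Note that the left-hand integrand is a sum from $i = 1$ to $n$. For $i = n$, compute
$$\int_{-\infty}^0 \frac{\partial v^n}{\partial x^n}\,dx^n = v^n(x^1, \dots, x^{n-1}, 0) - \lim_{t\to-\infty} v^n(x^1, \dots, x^{n-1}, t) = v^n(x^1, \dots, x^{n-1}, 0)$$
since $v$ has compact support. On the other hand, for $i \neq n$,
$$\int_{-\infty}^\infty \frac{\partial v^i}{\partial x^i}\,dx^i = 0$$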

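since $v$ has compact support. Therefore, using Fubini’s Theorem, for each $i \neq n$, we can integrate $\partial v^i/\partial x^i$ with respect to $x^i$ first to get zero. The remaining term is the case $i = n$, and so
$$\begin{aligned} \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty \int_{-\infty}^0 \frac{\partial v^i}{\partial x^i}\,dx^n\,dx^{n-1}\cdots dx^1 &= \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty \int_{-\infty}^0 \frac{\partial v^n}{\partial x^n}\,dx^n\,dx^{n-1}\cdots dx^1\\ &= \int_{-\infty}^\infty \cdots \int_{-\infty}^\infty v^n\,dx^{n-1}\cdots dx^1.\end{aligned}$$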

This proves the Divergence Theorem in this special case. The general case can be reduced to this special case by using a partition of unity and the Implicit Function Theorem (see Spivak). In particular, near each point in $\partial\Omega$, there is a local diffeomorphism of $\overline{\Omega}$ to the lower half-space, sending the boundary to the boundary. Together with open subsets of $\Omega$, these form an open cover of the compact set $\overline{\Omega}$, and so we may take a finite subcover, and a partition of unity subordinate to this subcover. Then we can apply the above special case to $\rho v$ for $\rho$ in the partition of unity and $v$ the vector field.

It is also necessary to make sure that the various terms in the integrals transform well with respect to the local diffeomorphisms. This can be checked directly, but it is better to use the language of differential forms (see Spivak or Guillemin and Pollack).

Homework Problem 36. Let $\Omega$ be a domain in $\mathbb{R}^n$ with smooth boundary. On a neighborhood $N \subset \mathbb{R}^n$ of a point in the boundary $\partial\Omega$, assume that
$$\Omega \cap N = \{x \in N : x^n < f(x^1, \dots, x^{n-1})\}$$
so that $\Omega$ is locally the region under the graph of a smooth function $f$. Compute $n$ and $dV$. For a smooth vector field $v$, compute
$$\int_{\partial\Omega\cap N} v\cdot n\,dV$$
in terms of the integral of a function times Lebesgue measure on $\mathbb{R}^{n-1}$.

Hint: Locally, $\partial\Omega$ is a submanifold of $\mathbb{R}^n$ which is the image of
$$\phi(x^1, \dots, x^{n-1}) = (x^1, \dots, x^{n-1}, f(x^1, \dots, x^{n-1})).$$
Show that $n$ is proportional to $\nabla\psi$, for
$$\psi(x^1, \dots, x^n) = x^n - f(x^1, \dots, x^{n-1}).$$
Your answer should be of the form
$$\int_{\phi^{-1}(\partial\Omega\cap N)} h\,dx^{n-1}$$
for $h$ a function of $x^1, \dots, x^{n-1}$.

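Corollary 47 (Integration by Parts). Let $\Omega \subset\subset \mathbb{R}^n$ be a domain with smooth boundary $\partial\Omega$. Then for any $C^1$ vector field $v$ on $\overline{\Omega}$ and $C^1$ function $f$ on $\overline{\Omega}$,
$$\int_\Omega v\cdot\nabla f\,dx^n = -\int_\Omega f\,\nabla\cdot v\,dx^n + \int_{\partial\Omega} f\,v\cdot n\,dV.$$

Proof. It is easy to check that $\nabla\cdot(fv) = (\nabla f)\cdot v + f\,\nabla\cdot v$, and by the Divergence Theorem,
$$\int_\Omega \nabla\cdot(fv)\,dx^n = \int_{\partial\Omega} f\,v\cdot n\,dV.$$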


3.7 The ε-Neighborhood Theorem

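Theorem 19. Let $X \subset \mathbb{R}^n$ be a compact $k$-dimensional manifold. Then there is an $\varepsilon > 0$ so that for
$$X_\varepsilon = X + B_\varepsilon(0) = \{y \in \mathbb{R}^n : \text{there is an } x \in X \text{ so that } |x - y| < \varepsilon\},$$
there is a smooth projection map from $X_\varepsilon$ to $X$ which restricts to the identity on $X$.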

Before we prove Theorem 19, we need to introduce the normal bundle $NX$, which is a vector bundle over $X$ for $X \subset \mathbb{R}^n$. Let $\langle\cdot, \cdot\rangle$ denote the standard inner product on $\mathbb{R}^n$. Define
$$NX = \{(x, y) \in \mathbb{R}^n \times \mathbb{R}^n : x \in X,\ \langle y, z\rangle = 0 \text{ for all } z \in T_xX\}.$$
Then $NX$ is a vector bundle of rank $n - k$, with $\pi\colon NX \to X$ given by $\pi\colon (x, y) \mapsto x$. For a given $x \in X$, $N_xX = \pi^{-1}(x)$ is the normal space to $X$ at $x$, which consists of all vectors in $\mathbb{R}^n$ perpendicular to the tangent space $T_xX$.

First of all, we show that NX is a smooth n-dimensional manifold.

Homework Problem 37. $NX$ is a smooth manifold of dimension $n$.

(a) Show that $X \subset \mathbb{R}^n$ is a smooth manifold if and only if for each $x \in X$, there is a neighborhood $W$ of $x$ in $\mathbb{R}^n$ and a smooth function $\psi\colon W \to \mathbb{R}^{n-k}$ so that $D\psi$ has constant rank $n - k$ and $X \cap W = \psi^{-1}(0)$. (To show $\Longrightarrow$, use Theorem 14, and to show $\Longleftarrow$, use the Implicit Function Theorem.)

(b) At each $x \in X$, and given a smooth function $\psi$ as above, show that the normal space $N_xX$ is the image of the transpose of the tangent map, $D\psi(x)^\top\colon \mathbb{R}^{n-k} \to \mathbb{R}^n$.

(c) Use the previous section and the techniques of Problem 28 to show $NX$ is a manifold.

We will prove the $\varepsilon$-Neighborhood Theorem by showing that there is a neighborhood of $X$ in $\mathbb{R}^n$ which is diffeomorphic to a neighborhood of the zero section $\{(x, 0) : x \in X\} \subset NX$, and the map required by the $\varepsilon$-Neighborhood Theorem then comes from $\pi\colon NX \to X$.


Proof of the $\varepsilon$-Neighborhood Theorem. Consider the map $F\colon NX \to \mathbb{R}^n$ given by $F\colon (x, y) \mapsto x + y$. For each $x \in X$, $DF(x, 0)\colon T_{(x,0)}(NX) \to \mathbb{R}^n$ is a linear isomorphism. This can be proved since $T_{(x,0)}(NX)$ can be written as a sum $T_x(X) + N_x(X)$, and $DF(x, 0)$, when restricted to each factor, is a linear isomorphism onto the corresponding summand of $\mathbb{R}^n = T_xX \oplus N_xX$. The Inverse Function Theorem then shows that for each $x \in X$, there are neighborhoods $\mathcal{N}_x$ of $(x, 0)$ in $NX$ and $W_x$ of $x$ in $\mathbb{R}^n$ so that $F|_{\mathcal{N}_x}$ is a diffeomorphism from $\mathcal{N}_x$ to $W_x$. Note we may apply the Inverse Function Theorem by considering a local parametrization of $NX$, since diffeomorphisms of (open subsets of) manifolds are defined in terms of these parametrizations.

Consider the following lemma:

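Lemma 48. There are open sets $\mathcal{N}$ and $\mathcal{X}$ so that $X \times \{0\} \subset \mathcal{N} \subset NX$ and $X \subset \mathcal{X} \subset \mathbb{R}^n$, and the restriction of $F$ is a diffeomorphism from $\mathcal{N}$ to $\mathcal{X}$.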

Proof. First of all, we note that $DF$ is a linear isomorphism on $N' = \bigcup_{x \in X} N_x$. The Inverse Function Theorem then shows that $F|_{N'}$ is a diffeomorphism onto its image as long as it is one-to-one. Therefore, we need only find an open $\mathcal{N}$ satisfying $X \times \{0\} \subset \mathcal{N} \subset N'$ on which $F$ is one-to-one.

Now assume by contradiction that no such $\mathcal{N}$ exists. Then there are points $(x_n, y_n) \neq (x'_n, y'_n)$ in $NX$ satisfying $F(x_n, y_n) = F(x'_n, y'_n)$ and $|y_n|, |y'_n| < \frac{1}{n}$. (Why? You must use the compactness of $X$.) Since $X$ is compact, there must be a subsequence $n_i$ so that $(x_{n_i}, y_{n_i}) \to (x, 0)$ as $i \to \infty$. Then we may take a further subsequence $n_{i_j}$ so that $(x'_{n_{i_j}}, y'_{n_{i_j}}) \to (x', 0)$ as $j \to \infty$. For simplicity, we rename the subsequence $n_{i_j}$ as simply $n$. Then the continuity of $F$ shows that

$$x = F(x, 0) = \lim_{n\to\infty} F(x_n, y_n) = \lim_{n\to\infty} F(x'_n, y'_n) = F(x', 0) = x'.$$

Thus $x = x'$, so both $(x_n, y_n)$ and $(x'_n, y'_n)$ converge to $(x, 0)$ and therefore lie in the neighborhood $N_x$ for large $n$. But $F|_{N_x}$ is injective, which contradicts our assumption that $(x_n, y_n) \neq (x'_n, y'_n)$.

Therefore, the lemma is proved.

Now since $X$ is compact, there is a small $\varepsilon > 0$ so that $X^\varepsilon \subset F(\mathcal{N})$. The projection map $X^\varepsilon \to X$ is then given by $\pi \circ F^{-1}$, which is smooth. This completes the proof of the ε-Neighborhood Theorem.
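As a concrete sanity check of this construction, one can take the simplest hypothetical example $X = S^1 \subset \mathbb{R}^2$ ($n = 2$, $k = 1$), where $N_xX$ is the line spanned by $x$. Then $F(x, rx) = (1 + r)x$ is injective on $\{|r| < 1\}$, so one may take $\varepsilon = 1$, and $\pi \circ F^{-1}(p) = p/|p|$ is the closest-point map. The sketch below (function names are illustrative, not from the text) checks this numerically and shows injectivity failing once the normal vectors are allowed to be large.

```python
import math

# Hypothetical example: X = unit circle in R^2, so N_xX is the line spanned by x.
# A point of NX is stored as (x, r), meaning the normal vector y = r * x.

def F(x, r):
    """F(x, y) = x + y with y = r * x."""
    return (x[0] + r * x[0], x[1] + r * x[1])

def F_inverse(p):
    """Inverse of F on {|r| < 1}, for p != 0: x = p / |p| and r = |p| - 1."""
    norm = math.hypot(p[0], p[1])
    return (p[0] / norm, p[1] / norm), norm - 1.0

def nearest_point_projection(p):
    """pi o F^{-1}: sends p in X^eps (here eps = 1) to its closest point on X."""
    x, _ = F_inverse(p)
    return x

x = (math.cos(0.7), math.sin(0.7))
print(F_inverse(F(x, 0.3)))                  # recovers (x, 0.3) up to rounding
print(nearest_point_projection((0.3, 0.4)))  # closest point of the circle
# Injectivity fails for large normal vectors: r = -1 sends every x to 0.
print(F((1.0, 0.0), -1.0), F((0.0, 1.0), -1.0))
```

For a general compact $X$ there is no closed form for $F^{-1}$; the lemma only guarantees that it exists and is smooth near the zero section.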


4 The Calculus of Variations

4.1 The variational principle

In this section, we want to consider the problem of constructing a function which minimizes a given functional. (A functional is a map from functions to $\mathbb{R}$.)

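Example 15. Let $\Omega \subset\subset \mathbb{R}^n$ be a domain with smooth boundary. Then we consider the class

$$\mathcal{F} = \{f \in C^2(\Omega) \cap C^0(\bar\Omega) : f = g \text{ on } \partial\Omega\}$$

for a given $C^2$ function $g$ on $\partial\Omega$. Consider the graph of $f$,

$$\{(x, f(x)) : x \in \Omega\} \subset \Omega \times \mathbb{R}.$$

By pulling back the Euclidean metric on $\mathbb{R}^{n+1}$, we can consider the $n$-volume of the graph. We have computed above

$$\mathrm{Vol}(f) = \int_\Omega \sqrt{1 + |\nabla f|^2}\, dx^n.$$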

Then we want to consider the following question: Is there an $f \in \mathcal{F}$ which minimizes $\mathrm{Vol}(f)$ over all of $\mathcal{F}$?

If it exists, $f$ must satisfy

$$\left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} \mathrm{Vol}(f + \varepsilon h) = 0$$

for every $h$ so that $f + \varepsilon h \in \mathcal{F}$. We compute and integrate by parts to find a differential equation $f$ must satisfy. First of all, $f + \varepsilon h \in \mathcal{F}$ if and only if $h \in C^2(\Omega) \cap C^0(\bar\Omega)$ and $h = 0$ on $\partial\Omega$.

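$$\begin{aligned}
0 &= \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} \mathrm{Vol}(f + \varepsilon h) \\
&= \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} \int_\Omega \sqrt{1 + |\nabla f + \varepsilon \nabla h|^2}\, dx^n \\
&= \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} \int_\Omega \sqrt{1 + |\nabla f|^2 + 2\varepsilon\, \nabla f \cdot \nabla h + \varepsilon^2 |\nabla h|^2}\, dx^n \\
&= \left.\int_\Omega \frac{2\nabla f \cdot \nabla h + 2\varepsilon |\nabla h|^2}{2\sqrt{1 + |\nabla f + \varepsilon \nabla h|^2}}\, dx^n\right|_{\varepsilon=0} \\
&= \int_\Omega \frac{\nabla f \cdot \nabla h}{\sqrt{1 + |\nabla f|^2}}\, dx^n \\
&= -\int_\Omega h\, \nabla \cdot \left(\frac{\nabla f}{\sqrt{1 + |\nabla f|^2}}\right) dx^n + \int_{\partial\Omega} h \left(\frac{\nabla f}{\sqrt{1 + |\nabla f|^2}}\right) \cdot n\, dV \\
&= -\int_\Omega h\, \nabla \cdot \left(\frac{\nabla f}{\sqrt{1 + |\nabla f|^2}}\right) dx^n.
\end{aligned}$$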

This last integral must be equal to zero for every $h \in C^2(\Omega) \cap C^0(\bar\Omega)$ which vanishes on $\partial\Omega$. We claim this forces

$$G = \nabla \cdot \left(\frac{\nabla f}{\sqrt{1 + |\nabla f|^2}}\right) = 0$$

on $\Omega$.

To prove the claim, note that since $f$ is $C^2$, $G$ is continuous on $\Omega$. We prove the claim by contradiction. If $G$ is nonzero at any point $x \in \Omega$, assume without loss of generality that $G(x) > 0$. Then by continuity, $G > 0$ in a small ball $B$ centered at $x$. Now it is easy to find a smooth bump function $h \geq 0$, not identically zero, whose support is contained in $B$. In this case

$$\int_\Omega hG\, dx^n = \int_B hG\, dx^n > 0,$$

which provides the contradiction.
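The first-variation computation above can be checked numerically in one dimension, where $\nabla \cdot \left(f'/\sqrt{1 + f'^2}\right) = f''/(1 + f'^2)^{3/2}$. The sketch below assumes hypothetical data $\Omega = (0, 1)$, $f(x) = x^2$ and the variation $h(x) = \sin(\pi x)$, which vanishes on $\partial\Omega = \{0, 1\}$. It compares a central-difference approximation of $\frac{d}{d\varepsilon}\mathrm{Vol}(f + \varepsilon h)$ at $\varepsilon = 0$ with the integral before and after integration by parts; integrals use the midpoint rule, so agreement is only up to discretization error.

```python
import math

# Hypothetical 1-D data on Omega = (0, 1): f(x) = x^2 and h(x) = sin(pi x),
# so that the variation h vanishes on the boundary {0, 1}.
def fp(x):
    return 2.0 * x                          # f'

def fpp(x):
    return 2.0                              # f''

def h(x):
    return math.sin(math.pi * x)

def hp(x):
    return math.pi * math.cos(math.pi * x)  # h'

def midpoint(func, n=4000):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    dx = 1.0 / n
    return sum(func((i + 0.5) * dx) for i in range(n)) * dx

def vol(eps):
    """Vol(f + eps h) = integral of sqrt(1 + (f' + eps h')^2) over (0, 1)."""
    return midpoint(lambda x: math.sqrt(1.0 + (fp(x) + eps * hp(x)) ** 2))

eps = 1e-4
lhs = (vol(eps) - vol(-eps)) / (2 * eps)    # d/d(eps) Vol(f + eps h) at 0
weak = midpoint(lambda x: fp(x) * hp(x) / math.sqrt(1.0 + fp(x) ** 2))
# In 1-D, (f' / sqrt(1 + f'^2))' = f'' / (1 + f'^2)^(3/2).
strong = -midpoint(lambda x: h(x) * fpp(x) / (1.0 + fp(x) ** 2) ** 1.5)
print(lhs, weak, strong)
```

The three numbers agree to several digits and are nonzero, consistent with $f = x^2$ not being a minimizer (in one dimension the equation reduces to $f'' = 0$).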


Thus any function $f$ which minimizes the functional $\mathrm{Vol}$ satisfies the Euler-Lagrange equation of the functional

$$\nabla \cdot \left(\frac{\nabla f}{\sqrt{1 + |\nabla f|^2}}\right) = 0.$$

This equation is known as the minimal surface equation. So a solution to our problem satisfies the minimal surface equation and the boundary condition $f = g$ on $\partial\Omega$. This sort of boundary condition, specifying the value of a solution $f$, is called a Dirichlet boundary condition, and the problem of finding a solution to the equation with this boundary condition is a Dirichlet boundary value problem. Note that the Dirichlet boundary condition is essential in making sure the variation $h$ vanishes on the boundary, so that there are no boundary terms when we integrate by parts. There is another useful type of boundary condition, the Neumann boundary condition, in which the normal derivative $\nabla f \cdot n = 0$. Notice that this also makes the integral over $\partial\Omega$ vanish in the integration by parts.
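A numerical check of the minimal surface equation for $n = 2$: Scherk's surface $f(x, y) = \log\cos y - \log\cos x$ on $(-\pi/2, \pi/2)^2$ is a classical explicit solution. The sketch below evaluates $\nabla \cdot \left(\nabla f/\sqrt{1 + |\nabla f|^2}\right)$ with nested central differences (the step sizes are ad hoc choices) and compares it with the non-minimal paraboloid $f = x^2 + y^2$. The function names are illustrative.

```python
import math

def scherk(x, y):
    """Scherk's surface, defined for |x|, |y| < pi/2."""
    return math.log(math.cos(y)) - math.log(math.cos(x))

def paraboloid(x, y):
    """A graph that is not minimal, for comparison."""
    return x * x + y * y

def gradient(f, x, y, h=1e-5):
    """Central-difference approximation of (f_x, f_y)."""
    fx = (f(x + h, y) - f(x - h, y)) / (2 * h)
    fy = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return fx, fy

def flux(f, x, y):
    """The vector field grad f / sqrt(1 + |grad f|^2)."""
    fx, fy = gradient(f, x, y)
    w = math.sqrt(1.0 + fx * fx + fy * fy)
    return fx / w, fy / w

def mse_residual(f, x, y, h=1e-3):
    """Central-difference divergence of the flux: left side of the equation."""
    dx = (flux(f, x + h, y)[0] - flux(f, x - h, y)[0]) / (2 * h)
    dy = (flux(f, x, y + h)[1] - flux(f, x, y - h)[1]) / (2 * h)
    return dx + dy

print(mse_residual(scherk, 0.3, 0.2), mse_residual(paraboloid, 0.3, 0.2))
```

The first value is at the level of the finite-difference error, while the second is of order one.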

In the previous example, we computed the Euler-Lagrange equation for $\mathrm{Vol}$. There may be solutions to the Euler-Lagrange equation which are not minimizers of $\mathrm{Vol}$, since we have only checked the first-derivative test. A solution to the Euler-Lagrange equation may correspond to a local maximum, a saddle point, or a local but non-global minimum. Below we will see specific techniques for finding a global minimizer, which we apply to another geometric problem.

The Euler-Lagrange equations come from the first variation formula that a minimizer must satisfy: given a family $f_\varepsilon$ with $f = f_0$, if $f$ minimizes a functional $P$, then

$$\left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} P(f_\varepsilon) = 0.$$

This is the formula of the first variation, which comes from the first derivative test in calculus. We may also use the second derivative test. A minimizer $f$ as above must satisfy the second variation formula

$$\left.\frac{d^2}{d\varepsilon^2}\right|_{\varepsilon=0} P(f_\varepsilon) \geq 0.$$


Homework Problem 38. Consider a variational problem for $C^2$ functions $y = y(x)$ on a domain $[a, b]$ with fixed endpoints $y(a) = y_0$, $y(b) = y_1$. Assume the functional is of the form

$$J(y) = \int_a^b F(y, y')\, dx,$$

for $F$ a smooth function of 2 variables.

(a) Compute the general Euler-Lagrange equation for J .

(b) Multiply the Euler-Lagrange equation by $y'$ to show that any solution to the Euler-Lagrange equation must satisfy

$$\frac{dG}{dx} = 0$$

for a function $G$ depending on $F$, $y$ and their derivatives.

(c) A graph $y = y(x)$ of a $C^1$ positive function determines a surface of revolution around the $x$-axis with surface area

$$A(y) = 2\pi \int_a^b y\sqrt{1 + (y')^2}\, dx.$$

Compute the Euler-Lagrange equation for $A$ (assume $y$ is $C^2$) and compute its general solution. (The graph of this solution is called a catenary.)

4.2 Geodesics

Given a $C^1$ path $\gamma\colon I \to X$ for $I = [\alpha, \beta]$ an interval and $X \subset \mathbb{R}^N$ a manifold with Riemannian metric $g$ induced from the Euclidean metric on $\mathbb{R}^N$, the length of the path $\gamma(I)$ is given by

$$L(\gamma) = \int_\alpha^\beta |\dot\gamma|_g\, dt = \int_\alpha^\beta \sqrt{g(\dot\gamma, \dot\gamma)}\, dt = \int_\alpha^\beta \sqrt{g_{ij}(\gamma(t))\, \dot\gamma^i(t)\, \dot\gamma^j(t)}\, dt.$$

(In the last formulation, note the use of local coordinates, so the last formulation is strictly only true when $\gamma(I)$ is contained in a single coordinate chart.) $L(\gamma)$ is called the length functional, which takes paths $\gamma$ to $\mathbb{R}$.


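Proposition 49. The length of a path is independent of the parametrization. In other words, if $\tilde\gamma(\tau) = \gamma(t(\tau))$ for $t = t(\tau)$ a $C^1$ diffeomorphism from an interval $[\tilde\alpha, \tilde\beta]$ onto $I$, then $L(\tilde\gamma) = L(\gamma)$.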

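Proof. Suppose first that $t(\tilde\alpha) = \alpha$ and $t(\tilde\beta) = \beta$ with $\tilde\alpha < \tilde\beta$; since $t$ is a diffeomorphism, $dt/d\tau > 0$. Then compute

$$\begin{aligned}
L(\tilde\gamma) &= \int_{\tilde\alpha}^{\tilde\beta} \sqrt{g\left(\frac{d\tilde\gamma}{d\tau}, \frac{d\tilde\gamma}{d\tau}\right)}\, d\tau
= \int_{\tilde\alpha}^{\tilde\beta} \sqrt{g\left(\frac{d\gamma}{dt}\frac{dt}{d\tau}, \frac{d\gamma}{dt}\frac{dt}{d\tau}\right)}\, d\tau \\
&= \int_{\tilde\alpha}^{\tilde\beta} \sqrt{g\left(\frac{d\gamma}{dt}, \frac{d\gamma}{dt}\right)}\, \frac{dt}{d\tau}\, d\tau
= \int_\alpha^\beta \sqrt{g\left(\frac{d\gamma}{dt}, \frac{d\gamma}{dt}\right)}\, dt = L(\gamma).
\end{aligned}$$

The case when $dt/d\tau < 0$ (so that $t(\tilde\alpha) = \beta$ and $t(\tilde\beta) = \alpha$) is similar.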

So this definition corresponds to the usual definition of the arc length of a parametric curve. In particular, it is invariant under change of parametrization. This particular feature turns out to cause trouble analytically. In the following sections, we'll seek to find paths minimizing arc length by constructing a sequence of paths approaching a length-minimizing one. The fact that a potentially minimizing path has many different parametrizations will make the analysis more difficult, since it will be hard to find a sequence of paths which approaches a particular minimizing path among all the possible parametrizations. Another analytic objection to the length functional is that it is the $L^1$ norm of the length of the tangent vector $\dot\gamma$. $L^2$ norms tend to behave better, since we can use the structure of Hilbert spaces.
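The invariance in Proposition 49 is easy to observe numerically. The sketch below assumes Euclidean $\mathbb{R}^2$ (so $g_{ij} = \delta_{ij}$), the upper unit half-circle $\gamma(t) = (\cos \pi t, \sin \pi t)$ on $[0, 1]$, and the hypothetical orientation-preserving reparametrization $t(\tau) = (\tau + \tau^2)/2$, for which $dt/d\tau = (1 + 2\tau)/2 > 0$. Both lengths should approximate $\pi$.

```python
import math

def midpoint(func, n=2000):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    dx = 1.0 / n
    return sum(func((i + 0.5) * dx) for i in range(n)) * dx

def gamma_dot(t):
    """Velocity of gamma(t) = (cos(pi t), sin(pi t))."""
    return (-math.pi * math.sin(math.pi * t), math.pi * math.cos(math.pi * t))

def t_of_tau(tau):
    """Hypothetical reparametrization with t(0) = 0, t(1) = 1, dt/dtau > 0."""
    return 0.5 * (tau + tau * tau)

def gamma_tilde_dot(tau):
    """Chain rule: d/dtau gamma(t(tau)) = gamma'(t(tau)) * dt/dtau."""
    dt_dtau = 0.5 * (1.0 + 2.0 * tau)
    vx, vy = gamma_dot(t_of_tau(tau))
    return (vx * dt_dtau, vy * dt_dtau)

def length(velocity):
    """L = integral over [0, 1] of the Euclidean speed |velocity|."""
    return midpoint(lambda s: math.hypot(*velocity(s)))

print(length(gamma_dot), length(gamma_tilde_dot), math.pi)
```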

Assume for convenience that the interval $I = [0, 1]$. This can always be achieved by using a linear map to take a given $I$ to $[0, 1]$.

Thus we introduce a related functional, the energy of a $C^1$ path $\gamma\colon [0, 1] \to X$. Define

$$E(\gamma) = \int_0^1 |\dot\gamma|_g^2\, dt.$$

The energy is related to the length by the following proposition.


Proposition 50. For a given homotopy class $C$ of curves $\gamma\colon [0, 1] \to X$, a $C^1$ curve $\gamma$ minimizes $E$ in $C$ if and only if it minimizes $L$ among $C^1$ curves in $C$ and the speed $|\dot\gamma(t)|_g$ is constant.

Before we start the proof, we recall a little about homotopy classes. Two continuous curves $\gamma_i\colon [0, 1] \to X$, $i = 0, 1$, are homotopic if $\gamma_i(0) = p$, $\gamma_i(1) = q$ for $i = 0, 1$, and if there is a continuous function (called a homotopy) $G\colon [0, 1] \times [0, 1] \to X$ so that $G(0, t) = \gamma_0(t)$, $G(1, t) = \gamma_1(t)$ for all $t \in [0, 1]$, and $G(s, 0) = p$ and $G(s, 1) = q$ for all $s \in [0, 1]$. (More generally, if $Y$ and $X$ are both metric spaces, then two continuous maps $f_0, f_1\colon Y \to X$ are said to be homotopic if there is a continuous map $F\colon [0, 1] \times Y \to X$ with $F(0, y) = f_0(y)$, $F(1, y) = f_1(y)$ for all $y \in Y$. In the present case, the space $Y = [0, 1]$, and we impose the extra conditions that the values at the endpoints $t = 0, 1$ are fixed at $p, q$ respectively.)

Since we are measuring length and energy, we are only interested in curvesγi which are C1, while we allow the homotopy G to be only continuous.

Proposition 51. The condition of two paths being homotopic is an equivalence relation, and thus we may consider homotopy classes of paths.

Proof. We need to show the property is reflexive, symmetric, and transitive. If $\gamma\colon [0, 1] \to X$ is a continuous path, then it is homotopic to itself via the homotopy $G(s, t) = \gamma(t)$ for $s \in [0, 1]$. This shows the reflexive property.

If $\gamma_0$ is homotopic to $\gamma_1$ via the homotopy $G$, then $\gamma_1$ is homotopic to $\gamma_0$ via the homotopy $\tilde G(s, t) = G(1 - s, t)$. This shows the symmetric property.

Finally, to show the transitive property, if $\gamma_0$ is homotopic to $\gamma_1$ via a homotopy $G$ and $\gamma_1$ is homotopic to $\gamma_2$ via a homotopy $F$, then we construct a homotopy from $\gamma_0$ to $\gamma_2$ by the formula

$$H(s, t) = \begin{cases} G(2s, t) & \text{for } s \in [0, 1/2], \\ F(2s - 1, t) & \text{for } s \in [1/2, 1]. \end{cases}$$

Note that $H$ is well-defined, since both formulas give $H(1/2, t) = \gamma_1(t)$. This observation also shows that $H$ is continuous, since its restrictions to the closed sets $[0, 1/2] \times [0, 1]$ and $[1/2, 1] \times [0, 1]$ are continuous. It is straightforward to show $H$ is a homotopy.

A $C^1$ diffeomorphism $t = t(\tau)$ of $[0, 1]$ is called orientation preserving if $dt/d\tau > 0$. Another fact about homotopy we'll presently use is the following:


Lemma 52. If $\tilde\gamma(\tau) = \gamma(t(\tau))$ for $t = t(\tau)$ an orientation-preserving diffeomorphism of $[0, 1]$, then $\tilde\gamma$ and $\gamma$ are homotopic.

Proof. For $s, \tau \in [0, 1]$, define $\psi(s, \tau) = s\tau + (1 - s)t(\tau)$. Then we will show that $G(s, \tau) = \gamma(\psi(s, \tau))$ is the required homotopy. First of all, since $t(\tau)$ is an orientation-preserving diffeomorphism, we see $t(0) = 0$, $t(1) = 1$. Now check that for $s, \tau \in [0, 1]$, $\psi(s, \tau) \in [0, 1]$: because $0 \le \tau \le 1$ and $0 \le t(\tau) \le 1$,

$$0 = s \cdot 0 + (1 - s) \cdot 0 \le s\tau + (1 - s)t(\tau) \le s \cdot 1 + (1 - s) \cdot 1 = 1.$$

This shows the homotopy $G$ is well-defined, and it is continuous as a composition of continuous maps. For $\tau \in [0, 1]$, we have $G(0, \tau) = \gamma(t(\tau)) = \tilde\gamma(\tau)$ and $G(1, \tau) = \gamma(\tau)$. Also, for $s \in [0, 1]$, $G(s, 0) = \gamma(0)$ and $G(s, 1) = \gamma(1)$.

Also, note the following

Lemma 53. For any $C^1$ path $\gamma$, $E(\gamma) \ge L(\gamma)^2$, and they are equal if and only if $|\dot\gamma(t)|_g$ is constant.

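Proof. Apply Hölder's inequality with exponents $p = q = 2$:

$$L(\gamma) = \int_0^1 |\dot\gamma(t)|_g\, dt \le \left(\int_0^1 1^2\, dt\right)^{1/2} \left(\int_0^1 |\dot\gamma(t)|_g^2\, dt\right)^{1/2} = \sqrt{E(\gamma)},$$

with equality if and only if the continuous function $|\dot\gamma(t)|_g$ is proportional to $1$, which is the same as $|\dot\gamma(t)|_g$ being constant.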
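Lemma 53 can be illustrated with two hypothetical parametrizations of the unit segment from $(0, 0)$ to $(1, 0)$ in Euclidean $\mathbb{R}^2$: $\gamma(t) = (t, 0)$ has constant speed $1$, while $\gamma(t) = (t^2, 0)$ has speed $2t$. Both have length $1$, but only the constant-speed one has $E = L^2$; the other has $E = \int_0^1 4t^2\, dt = 4/3$. The sketch below only needs the Euclidean speed as a function of $t$.

```python
def midpoint(func, n=2000):
    """Midpoint-rule approximation of the integral of func over [0, 1]."""
    dx = 1.0 / n
    return sum(func((i + 0.5) * dx) for i in range(n)) * dx

def length_and_energy(speed):
    """Return (L, E) = (int |gamma'| dt, int |gamma'|^2 dt) over [0, 1]."""
    return midpoint(speed), midpoint(lambda t: speed(t) ** 2)

# Euclidean speeds of two parametrizations of the same segment.
L_const, E_const = length_and_energy(lambda t: 1.0)   # gamma(t) = (t, 0)
L_var, E_var = length_and_energy(lambda t: 2.0 * t)   # gamma(t) = (t^2, 0)
print(L_const, E_const, L_const ** 2)  # E equals L^2 up to rounding
print(L_var, E_var, L_var ** 2)        # E is strictly larger than L^2
```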

Proof of Proposition 50. Let $\gamma \in C$ satisfy $E(\gamma) \le E(\gamma')$ for all $\gamma' \in C$. Given $\gamma$, let $\gamma_c$ be the constant speed reparametrization of $\gamma$ (this exists by Problem 39 below, and $\gamma_c \in C$ by Lemma 52). Then we have by Proposition 49 and Lemma 53

$$L(\gamma_c)^2 = L(\gamma)^2 \le E(\gamma) \le E(\gamma_c) = L(\gamma_c)^2.$$

Thus all the inequalities in the above equation must be equalities and $L(\gamma)^2 = E(\gamma)$. Then Lemma 53 implies $\gamma$ must have constant speed. So we've shown so far that if $\gamma$ minimizes $E$, then $\gamma$ has constant speed.

Let $\gamma$ minimize $E$. For each $C^1$ curve $\gamma' \in C$, let $\gamma'_c$ be a constant speed reparametrization. Then since $\gamma$ has constant speed, Lemma 53 and Proposition 49 show

$$L(\gamma)^2 = E(\gamma) \le E(\gamma'_c) = L(\gamma'_c)^2 = L(\gamma')^2.$$

So we've shown that if $\gamma$ minimizes $E$ in $C$, then $\gamma$ minimizes $L$ in $C$. We leave the converse statement as Problem 40 below.


Homework Problem 39. Let $\gamma\colon [0, 1] \to X$, $\gamma = \gamma(t)$, be a $C^1$ path into a Riemannian manifold $X$. Assume $|\dot\gamma(t)|_g \neq 0$ for all $t \in [0, 1]$. Show that there is a reparametrization $t(\tau)$ so that $t(0) = 0$, $t(1) = 1$, $dt/d\tau > 0$, and $\left|\frac{d\gamma}{d\tau}\right|_g$ is constant.

Hint: Show the constant must be equal to $L(\gamma)$. Then show the condition is an ODE in $\tau = \tau(t)$. (Note that if $dt/d\tau > 0$, then $t(\tau)$ is strictly increasing and thus has an inverse on $[0, 1]$.)

Homework Problem 40. For a given homotopy class $C$ of curves $\gamma\colon [0, 1] \to X$, assume $\gamma$ has constant speed $|\dot\gamma(t)|_g$ and $\gamma$ minimizes $L$ among $C^1$ curves in $C$. Then $\gamma$ minimizes $E$ among $C^1$ curves in $C$.

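Now we compute the first variation of the energy functional. Let $\gamma$ be a smooth curve from $[0, 1]$ to $X$ so that $\gamma(0) = p$, $\gamma(1) = q$, where $X \subset \mathbb{R}^N$ has the Riemannian metric pulled back from $\mathbb{R}^N$. Assume $\gamma$ minimizes $E$ in a homotopy class $C$, and that $\gamma$ is $C^2$. Then for each smooth family $\gamma_\varepsilon(t)$ in $C$ with $\gamma_\varepsilon = \gamma$ at $\varepsilon = 0$, we have

$$\left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} E(\gamma_\varepsilon) = 0.$$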

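Consider a variation of the following special form. Near a point in $\gamma([0, 1])$, pick local coordinates $x\colon O \to U \subset \mathbb{R}^n$. Then $\gamma^{-1}(O)$ contains a small open time interval $I \subset [0, 1]$. Assume for simplicity that $I$ doesn't contain either endpoint $0$ or $1$. In terms of the local coordinates $x$, write $x(\gamma(t)) = \gamma(t) \in U \subset \mathbb{R}^n$ for $t \in I$. Then let $h\colon \mathbb{R} \to \mathbb{R}^n$ be a smooth function so that $\operatorname{supp}(h) \subset\subset I$. For $\varepsilon$ near $0$,

$$\gamma_\varepsilon(t) = \gamma(t) + \varepsilon h(t) \in U$$

for $t \in I$. We define $\gamma_\varepsilon$ outside of $I$ to be simply $\gamma$. Apply the first variational formula

$$\begin{aligned}
\left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} E(\gamma_\varepsilon) &= \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} \int_0^1 g(\dot\gamma_\varepsilon(t), \dot\gamma_\varepsilon(t))\, dt \\
&= \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} \int_I g_{ij}(\gamma(t) + \varepsilon h(t))\, [\dot\gamma^i(t) + \varepsilon \dot h^i(t)]\, [\dot\gamma^j(t) + \varepsilon \dot h^j(t)]\, dt \\
&= \int_I \left[\frac{\partial g_{ij}}{\partial x^k}(\gamma(t))\, h^k(t)\right] \dot\gamma^i(t)\, \dot\gamma^j(t)\, dt
+ \int_I g_{ij}(\gamma(t))\, \dot h^i(t)\, \dot\gamma^j(t)\, dt
+ \int_I g_{ij}(\gamma(t))\, \dot\gamma^i(t)\, \dot h^j(t)\, dt.
\end{aligned}$$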


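Now we integrate by parts in the last two integrals. Note that since $h$ has compact support in $I$, all the boundary terms involving $h$ vanish. Compute

$$\int_I g_{ij}(\gamma(t))\, \dot h^i(t)\, \dot\gamma^j(t)\, dt = -\int_I \left[\frac{\partial g_{ij}}{\partial x^k}(\gamma(t))\, \dot\gamma^k(t)\right] h^i(t)\, \dot\gamma^j(t)\, dt - \int_I g_{ij}(\gamma(t))\, h^i(t)\, \ddot\gamma^j(t)\, dt.$$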

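We may plug this in (together with the analogous formula for the third integral) to find, for a minimizer,

$$\begin{aligned}
0 &= \left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} E(\gamma_\varepsilon) \\
&= \int_I \left( \frac{\partial g_{ij}}{\partial x^k} h^k \dot\gamma^i \dot\gamma^j - \frac{\partial g_{ij}}{\partial x^k} \dot\gamma^k h^i \dot\gamma^j - g_{ij} h^i \ddot\gamma^j - \frac{\partial g_{ij}}{\partial x^k} \dot\gamma^k \dot\gamma^i h^j - g_{ij} \ddot\gamma^i h^j \right) dt \\
&= \int_I h^k \left( \frac{\partial g_{ij}}{\partial x^k} \dot\gamma^i \dot\gamma^j - \frac{\partial g_{kj}}{\partial x^i} \dot\gamma^i \dot\gamma^j - g_{kj} \ddot\gamma^j - \frac{\partial g_{ik}}{\partial x^j} \dot\gamma^j \dot\gamma^i - g_{jk} \ddot\gamma^j \right) dt.
\end{aligned}$$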

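Since this is true for each $h$ with compact support in $I$, we must have, for each $k = 1, \dots, n$ and for all $t$ in the open interval $I$,

$$0 = \frac{\partial g_{ij}}{\partial x^k} \dot\gamma^i \dot\gamma^j - \frac{\partial g_{kj}}{\partial x^i} \dot\gamma^i \dot\gamma^j - g_{kj} \ddot\gamma^j - \frac{\partial g_{ik}}{\partial x^j} \dot\gamma^j \dot\gamma^i - g_{jk} \ddot\gamma^j.$$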

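Since $g_{kj} = g_{jk}$, dividing by $-2$ gives

$$0 = g_{jk} \ddot\gamma^j + \frac{1}{2} \left( -\frac{\partial g_{ij}}{\partial x^k} + \frac{\partial g_{kj}}{\partial x^i} + \frac{\partial g_{ik}}{\partial x^j} \right) \dot\gamma^i \dot\gamma^j,$$

and multiplying by the inverse matrix $g^{k\ell}$ and summing over $k$ gives

$$0 = \ddot\gamma^\ell + \frac{1}{2} g^{k\ell} \left( \frac{\partial g_{kj}}{\partial x^i} + \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^k} \right) \dot\gamma^i \dot\gamma^j = \ddot\gamma^\ell + \Gamma^\ell_{ij} \dot\gamma^i \dot\gamma^j,$$

where

$$\Gamma^\ell_{ij} = \frac{1}{2} g^{k\ell} \left( \frac{\partial g_{kj}}{\partial x^i} + \frac{\partial g_{ik}}{\partial x^j} - \frac{\partial g_{ij}}{\partial x^k} \right).$$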

$\Gamma^\ell_{ij}$ are called the Christoffel symbols of the metric $g_{ij}$, and

$$\ddot\gamma^\ell + \Gamma^\ell_{ij} \dot\gamma^i \dot\gamma^j = 0 \tag{26}$$

is called the geodesic equation for the metric $g$. Note that

$$\Gamma^\ell_{ij} = \Gamma^\ell_{ji}.$$

Any curve satisfying this second-order system is called a geodesic on the Riemannian manifold $X$.
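The formula for $\Gamma^\ell_{ij}$ translates directly into code. The following is a sketch, assuming the metric is supplied as a Python function returning the matrix $g_{ij}(x)$ in a single chart; the derivatives $g_{ij,k}$ are approximated by central differences and the inverse $g^{k\ell}$ is computed by Gauss-Jordan elimination (both are implementation choices, not part of the text). As a check one can use polar coordinates $(r, \theta)$ on the plane, $g = \mathrm{diag}(1, r^2)$, for which the formula gives $\Gamma^r_{\theta\theta} = -r$ and $\Gamma^\theta_{r\theta} = \Gamma^\theta_{\theta r} = 1/r$.

```python
def invert(a):
    """Gauss-Jordan inverse of a small nonsingular matrix (list of lists)."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))  # partial pivoting
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [v / pivot for v in m[c]]
        for r in range(n):
            if r != c:
                factor = m[r][c]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[c])]
    return [row[n:] for row in m]


def christoffel(metric, x, h=1e-5):
    """Return G with G[ell][i][j] approximating Gamma^ell_{ij} at the point x.

    `metric(x)` must return the matrix g_{ij}(x) in a single chart.
    """
    n = len(x)
    dg = []  # dg[k][i][j] approximates g_{ij,k} = d g_{ij} / d x^k
    for k in range(n):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        gp, gm = metric(xp), metric(xm)
        dg.append([[(gp[i][j] - gm[i][j]) / (2 * h) for j in range(n)]
                   for i in range(n)])
    ginv = invert(metric(list(x)))  # ginv[k][ell] = g^{k ell}
    # Gamma^ell_{ij} = 1/2 g^{k ell} (g_{kj,i} + g_{ik,j} - g_{ij,k})
    return [[[0.5 * sum(ginv[k][ell] * (dg[i][k][j] + dg[j][i][k] - dg[k][i][j])
                        for k in range(n))
              for j in range(n)]
             for i in range(n)]
            for ell in range(n)]


def polar_metric(x):
    """Euclidean plane in polar coordinates x = (r, theta): g = diag(1, r^2)."""
    r = x[0]
    return [[1.0, 0.0], [0.0, r * r]]


G = christoffel(polar_metric, [2.0, 0.7])
print(G[0][1][1], G[1][0][1])  # approximately -2 and 0.5
```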


Remark. Our definition of geodesic requires a specific parametrization to solve the equation (the constant speed parametrization). Many other authors define a geodesic to be a curve which satisfies the first variational equation of arc length. These geodesics are the same as our geodesics as subsets of the Riemannian manifold, but the parametrization is not required to be constant speed.

Note that this analysis does not work at the endpoints $0$ and $1$. There, we simply have the conditions $\gamma(0) = p$ and $\gamma(1) = q$ to remain in the class $C$. This is essentially a Dirichlet boundary condition on the problem.

Homework Problem 41. Let $p, q$ be points in a manifold $X$, and consider the class $C$ of all smooth paths from $p$ to $q$.

(a) Compute the Euler-Lagrange equations for the length functional $L(\gamma)$ for $\gamma \in C$. Show that any $\gamma\colon [0, 1] \to X$ which is a critical point of $L$ must satisfy

$$\ddot\gamma^\ell(t) + \Gamma^\ell_{ij}(\gamma(t))\, \dot\gamma^i(t)\, \dot\gamma^j(t) = c(t)\, \dot\gamma^\ell(t)$$

for $t \in (0, 1)$ and $c(t)$ a real-valued function of $t$.

(b) Use part (a) to prove the following generalization of Proposition 50: A curve $\gamma$ in $C$ is a critical point of $E$ if and only if it is a critical point of $L$ and it has constant speed.

Homework Problem 42. Let $(X, g)$ be an $n$-dimensional smooth compact Riemannian manifold. By Nash's Theorem, we may assume that $g = i^*\delta$, the pull-back of the Euclidean metric $\delta$ on $\mathbb{R}^N$ for some embedding $i\colon X \to \mathbb{R}^N$. If $(p, v) \in TX$ (i.e. $p \in X$ and $v \in T_pX$), show that the solution to the geodesic equation (26) on $X$ with initial conditions $\gamma(0) = p$ and $\dot\gamma(0) = v$ exists for all time.

Hints:

(a) Show that if $\gamma(t)$ solves the geodesic equation (26), then the speed $|\dot\gamma(t)|_g$ is constant in $t$.

(b) Reduce the problem to the case that the initial speed $|v|_{g(p)} = 1$.

(c) The unit tangent bundle $UTX$ is defined by

$$UTX = \{(p, v) \in TX : |v|_{g(p)} = 1\}.$$

Show $UTX$ is compact as long as $X$ is compact.


(d) Mimic the proof of Theorem 15 to complete the proof.

Example 16. Euclidean space is $\mathbb{R}^n$ with the standard Euclidean metric $\delta = \delta_{ij}\, dx^i\, dx^j$. In this case, all the Christoffel symbols $\Gamma^k_{ij}$ vanish, since each term involves differentiating the components of the metric tensor, all of which are constant. Therefore, the geodesic system is simply $\ddot\gamma^k = 0$. Solutions to this ODE are simply linear functions of $t$, and so geodesics are of the form $\gamma = tv + w$ for $v, w \in \mathbb{R}^n$. So geodesics on Euclidean space are straight lines traversed at constant speed.

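Example 17. For hyperbolic space, recall the metric $g_{ij} = (x^n)^{-2}\delta_{ij}$ on $\{x \in \mathbb{R}^n : x^n > 0\}$. Compute the Christoffel symbols:

$$\begin{aligned}
g^{ij} &= (x^n)^2 \delta^{ij}, \\
g_{ij,k} &= -2(x^n)^{-3} \delta_{ij} \delta^n_k, \\
\Gamma^k_{ij} &= \tfrac{1}{2} (x^n)^2 \delta^{k\ell} (g_{i\ell,j} + g_{\ell j,i} - g_{ij,\ell}) \\
&= \tfrac{1}{2} (x^n)^2 \delta^{k\ell} \left[-2(x^n)^{-3}\right] (\delta_{i\ell}\delta^n_j + \delta_{\ell j}\delta^n_i - \delta_{ij}\delta^n_\ell) \\
&= -(x^n)^{-1} (\delta^k_i \delta^n_j + \delta^k_j \delta^n_i - \delta^{kn} \delta_{ij}).
\end{aligned}$$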

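Now consider $i, j, k$ distinct integers in $\{1, \dots, n\}$. Then, with no summation over repeated indices,

$$\begin{aligned}
\Gamma^k_{ij} &= 0, \\
\Gamma^i_{ik} = \Gamma^i_{ki} &= -(x^n)^{-1} \delta^n_k, \\
\Gamma^k_{ii} &= (x^n)^{-1} \delta^{kn}, \\
\Gamma^i_{ii} &= -(x^n)^{-1} \delta^n_i.
\end{aligned}$$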

First, we look for solutions in which $\dot\gamma^k = 0$ for $k = 1, \dots, n - 1$ (so only $\gamma^n$ varies in $t$). It is plausible to look for such solutions since the coefficients $g_{ij}$ of the metric depend only on $x^n$.

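In this case, for $k < n$, the only nonzero component of $\dot\gamma$ is $\dot\gamma^n$, so

$$0 = \ddot\gamma^k = -\Gamma^k_{ij} \dot\gamma^i \dot\gamma^j = -\Gamma^k_{nn} \dot\gamma^n \dot\gamma^n = -(x^n)^{-1} \delta^{kn} \dot\gamma^n \dot\gamma^n = 0.$$

Thus if $\dot\gamma^1 = \cdots = \dot\gamma^{n-1} = 0$, then the geodesic equations for $\gamma^k$ for $k < n$ are automatically solved.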


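Now compute the geodesic equation for $\gamma^n$:

$$\ddot\gamma^n = -\Gamma^n_{ij} \dot\gamma^i \dot\gamma^j = -\Gamma^n_{nn} \dot\gamma^n \dot\gamma^n = (x^n)^{-1} \dot\gamma^n \dot\gamma^n = (\gamma^n)^{-1} \dot\gamma^n \dot\gamma^n. \tag{27}$$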

This is a second-order nonlinear equation in $\gamma^n$, and we do not have any general technique to solve such an equation. We can, however, make some educated guesses. In particular, note that

$$\frac{d}{dt}\left(\gamma^n \dot\gamma^n\right) = \dot\gamma^n \dot\gamma^n + \gamma^n \ddot\gamma^n,$$

and that each of these terms is similar to those in the geodesic equation (27) above.

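In particular, compute for a function $f$ of $\gamma^n$

$$0 = \frac{d}{dt}\left(f(\gamma^n) \dot\gamma^n\right) = f(\gamma^n) \ddot\gamma^n + f'(\gamma^n) \dot\gamma^n \dot\gamma^n, \tag{28}$$

$$0 = \ddot\gamma^n + \frac{f'(\gamma^n)}{f(\gamma^n)} \dot\gamma^n \dot\gamma^n. \tag{29}$$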

This last equation is the same as the geodesic equation (27) if

$$\frac{f'(\gamma^n)}{f(\gamma^n)} = -\frac{1}{\gamma^n},$$

and this is now a first-order separable equation for $f$. We may solve to find that $f = (\gamma^n)^{-1}$ is a solution.

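Now plug into (28) to find

$$\begin{aligned}
0 &= \frac{d}{dt}\left(\frac{\dot\gamma^n}{\gamma^n}\right), \\
C &= \frac{\dot\gamma^n}{\gamma^n} = \frac{d}{dt} \log \gamma^n, \\
Ct + D &= \log \gamma^n, \\
\gamma^n &= Ae^{Ct}
\end{aligned}$$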


for $A = e^D$ a positive constant (since in hyperbolic space, we have $x^n = \gamma^n > 0$) and $C$ any real constant. Therefore,

$$\gamma^1 = \gamma^1_0, \ \dots, \ \gamma^{n-1} = \gamma^{n-1}_0, \quad \gamma^n = Ae^{Ct}$$

solves the geodesic system on hyperbolic space.

So far we have only found geodesics in the special case that $\dot\gamma^1 = \cdots = \dot\gamma^{n-1} = 0$. To find all the geodesics on hyperbolic space, we introduce the notion of an isometry of a Riemannian manifold.

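Given a Riemannian manifold $(X, g)$, a diffeomorphism $\Phi\colon X \to X$ is an isometry if $\Phi^*g = g$. Isometries of $H^n$ are well understood, and we introduce a specific type. For $\alpha > 0$, let

$$\iota_\alpha\colon x \mapsto \frac{\alpha x}{|x|^2},$$

where $x \in H^n \subset \mathbb{R}^n$ and $|x|^2 = (x^1)^2 + \cdots + (x^n)^2$ comes from $\mathbb{R}^n$. It is easy to see that $\iota_\alpha$ is a diffeomorphism of $H^n$ (indeed $\iota_\alpha \circ \iota_\alpha$ is the identity). To show that it is an isometry, let $y = \iota_\alpha(x)$. Then

$$\iota_\alpha^* g = \iota_\alpha^* \left( \frac{\sum_{j=1}^n (dy^j)^2}{(y^n)^2} \right).$$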


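Dropping the pull-back $\iota_\alpha^*$ notation, we compute

$$\begin{aligned}
y^j &= \frac{\alpha x^j}{|x|^2}, \\
dy^j &= \frac{\partial y^j}{\partial x^i}\, dx^i = \alpha \sum_{i=1}^n \frac{|x|^2 \delta^j_i - 2x^i x^j}{|x|^4}\, dx^i, \\
(dy^j)^2 &= \alpha^2 \left( \sum_{i=1}^n \frac{|x|^2 \delta^j_i - 2x^i x^j}{|x|^4}\, dx^i \right)^2 \\
&= \alpha^2 \left( \sum_{i=1}^n \frac{|x|^2 \delta^j_i - 2x^i x^j}{|x|^4}\, dx^i \right) \left( \sum_{k=1}^n \frac{|x|^2 \delta^j_k - 2x^k x^j}{|x|^4}\, dx^k \right) \\
&= \frac{\alpha^2}{|x|^8} \sum_{i,k=1}^n \left[ 4x^i x^k (x^j)^2 - 2|x|^2 x^i x^j \delta^j_k - 2|x|^2 x^k x^j \delta^j_i + |x|^4 \delta^j_i \delta^j_k \right] dx^i\, dx^k \\
&= \frac{\alpha^2}{|x|^8} \left[ 4(x^j)^2 \sum_{i,k=1}^n x^i x^k\, dx^i\, dx^k - 2|x|^2 x^j\, dx^j \sum_{i=1}^n x^i\, dx^i - 2|x|^2 x^j\, dx^j \sum_{k=1}^n x^k\, dx^k + |x|^4 (dx^j)^2 \right] \\
&= \frac{\alpha^2}{|x|^8} \left[ 4(x^j)^2 \sum_{i,k=1}^n x^i x^k\, dx^i\, dx^k - 4|x|^2 x^j\, dx^j \sum_{i=1}^n x^i\, dx^i + |x|^4 (dx^j)^2 \right],
\end{aligned}$$

$$\begin{aligned}
\sum_{j=1}^n (dy^j)^2 &= \frac{\alpha^2}{|x|^8} \left[ 4 \left( \sum_{j=1}^n (x^j)^2 \right) \left( \sum_{i,k=1}^n x^i x^k\, dx^i\, dx^k \right) - 4|x|^2 \sum_{i,j=1}^n x^j x^i\, dx^i\, dx^j + |x|^4 \sum_{j=1}^n (dx^j)^2 \right] \\
&= \frac{\alpha^2}{|x|^8} \left[ 4|x|^2 \sum_{i,k=1}^n x^i x^k\, dx^i\, dx^k - 4|x|^2 \sum_{i,k=1}^n x^i x^k\, dx^i\, dx^k + |x|^4 \sum_{j=1}^n (dx^j)^2 \right] \\
&= \frac{\alpha^2}{|x|^4} \sum_{j=1}^n (dx^j)^2,
\end{aligned}$$

$$\frac{\sum_{j=1}^n (dy^j)^2}{(y^n)^2} = \frac{\frac{\alpha^2}{|x|^4} \sum_{j=1}^n (dx^j)^2}{\frac{\alpha^2 (x^n)^2}{|x|^4}} = \frac{\sum_{j=1}^n (dx^j)^2}{(x^n)^2}.$$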

Therefore, $\iota_\alpha^* g = g$ and $\iota_\alpha$ is an isometry.

Moreover, it is trivial to check that any translation $x \mapsto x + x_0$ is an isometry of $H^n$ if the last component $x^n_0 = 0$. Also, note that the composition of two isometries is again an isometry (indeed the set of isometries of a Riemannian manifold $X$ forms a subgroup of the diffeomorphism group called the isometry group).
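The pull-back computation can be double-checked numerically at a sample point. The sketch below (names and sample values are illustrative) builds the Jacobian $\partial y^j/\partial x^i = \alpha(|x|^2\delta^j_i - 2x^ix^j)/|x|^4$ from the computation above, compares it with finite differences, and compares $(\iota_\alpha^* g)_{ik} = (y^n)^{-2} \sum_j \frac{\partial y^j}{\partial x^i} \frac{\partial y^j}{\partial x^k}$ with $g_{ik} = (x^n)^{-2}\delta_{ik}$.

```python
def iota(alpha, x):
    """iota_alpha(x) = alpha * x / |x|^2."""
    r2 = sum(c * c for c in x)
    return [alpha * c / r2 for c in x]

def jacobian_iota(alpha, x):
    """J[j][i] = d y^j / d x^i = alpha (|x|^2 delta_ij - 2 x^i x^j) / |x|^4."""
    r2 = sum(c * c for c in x)
    n = len(x)
    return [[alpha * (r2 * (1.0 if i == j else 0.0) - 2.0 * x[i] * x[j]) / r2 ** 2
             for i in range(n)] for j in range(n)]

def jacobian_fd(alpha, x, h=1e-6):
    """Central-difference Jacobian of iota_alpha, for comparison."""
    n = len(x)
    J = [[0.0] * n for _ in range(n)]
    for i in range(n):
        xp, xm = list(x), list(x)
        xp[i] += h
        xm[i] -= h
        yp, ym = iota(alpha, xp), iota(alpha, xm)
        for j in range(n):
            J[j][i] = (yp[j] - ym[j]) / (2 * h)
    return J

def pullback_defect(alpha, x):
    """Largest entry of |iota_alpha^* g - g| at x, for g = delta / (x^n)^2."""
    y = iota(alpha, x)
    J = jacobian_iota(alpha, x)
    n = len(x)
    worst = 0.0
    for i in range(n):
        for k in range(n):
            pulled = sum(J[j][i] * J[j][k] for j in range(n)) / y[-1] ** 2
            original = (1.0 if i == k else 0.0) / x[-1] ** 2
            worst = max(worst, abs(pulled - original))
    return worst

alpha, x = 2.5, [0.4, -1.2, 0.7]   # a sample point of H^3
print(pullback_defect(alpha, x))    # rounding-level only
```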

Proposition 55 below shows that for any geodesic $\psi\colon \mathbb{R} \to H^n$, $\iota_\alpha \circ \psi$ is also a geodesic. Recall we know so far that

$$\gamma = (\gamma^1_0, \dots, \gamma^{n-1}_0, Ae^{Ct})$$

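are geodesics for $A > 0$, $C \in \mathbb{R}$. Compute for $\alpha > 0$,

$$\iota_\alpha \circ \gamma = \frac{\alpha\gamma}{|\gamma|^2} = \frac{\alpha(\gamma^1_0, \dots, \gamma^{n-1}_0, Ae^{Ct})}{(\gamma^1_0)^2 + \cdots + (\gamma^{n-1}_0)^2 + A^2 e^{2Ct}}.$$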

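If $C \neq 0$ and $(\gamma^1_0, \dots, \gamma^{n-1}_0) \neq 0$, the image $\iota_\alpha \circ \gamma(\mathbb{R})$ is then the half-circle in $\mathbb{R}^n$ which intersects $\{x^n = 0\}$ perpendicularly at

$$0 \quad \text{and} \quad \frac{\alpha(\gamma^1_0, \dots, \gamma^{n-1}_0, 0)}{(\gamma^1_0)^2 + \cdots + (\gamma^{n-1}_0)^2}.$$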


Then if we apply the isometry given by adding a constant $x_0$ with $x^n_0 = 0$, we see that every half-circle in $H^n$ which intersects $\{x^n = 0\}$ perpendicularly at both endpoints is the image of a geodesic path in $H^n$.

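All together, for constants

$$\gamma^1_0, \dots, \gamma^{n-1}_0, x^1_0, \dots, x^{n-1}_0, C \in \mathbb{R}, \qquad A, \alpha > 0,$$

the path for $t \in \mathbb{R}$

$$\psi(t) = \frac{\alpha(\gamma^1_0, \dots, \gamma^{n-1}_0, Ae^{Ct})}{(\gamma^1_0)^2 + \cdots + (\gamma^{n-1}_0)^2 + A^2 e^{2Ct}} + (x^1_0, \dots, x^{n-1}_0, 0) \tag{30}$$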

is a geodesic in $H^n$, and for $C \neq 0$ the image $\psi(\mathbb{R})$ is a ray or a half-circle in $\mathbb{R}^n$ perpendicular to $\{x^n = 0\}$. All such rays and semicircles are represented by such geodesic paths.

We claim that we have found all the geodesics in $H^n$. The way to check this is to recognize that the geodesic system, as a second-order ODE system with smooth coefficients, has a unique solution for each initial value problem

$$\ddot\gamma^k = -\Gamma^k_{ij} \dot\gamma^i \dot\gamma^j, \qquad \gamma(0) = y_0, \quad \dot\gamma(0) = v_0.$$

Then if we can check that every initial condition $(y_0, v_0) \in TH^n$ occurs as $(\psi(0), \dot\psi(0))$ for a geodesic $\psi(t)$ in (30), uniqueness for the geodesic system will imply that we have found all the geodesics in $H^n$.

So we must check that every $(y_0, v_0) \in TH^n = H^n \times \mathbb{R}^n$ can be represented by $(\psi(0), \dot\psi(0))$ for a $\psi(t)$ in (30). For a given point $y_0 \in H^n$ and vector $v_0 \in T_{y_0}H^n = \mathbb{R}^n$, consider first the case when

$$v^1_0 = \cdots = v^{n-1}_0 = 0.$$

In this case, we can choose $A > 0$ and $C$ so that

$$\psi(t) = (y^1_0, \dots, y^{n-1}_0, Ae^{Ct})$$

satisfies $\psi(0) = y_0$ and $\dot\psi(0) = v_0$. Otherwise, let $P$ be the vertical 2-plane through $y_0$ spanned by $v_0$ and $e_n = (0, \dots, 0, 1)$, and let $L = P \cap \{x^n = 0\}$. It is straightforward to check that there is a unique semicircle in the plane $P$ which hits $L$ perpendicularly, passes through $y_0$, and is tangent to $v_0$ at $y_0$. This is the image of some geodesic $\psi(t)$ in (30). Then we can adjust $C$ and $A$ to ensure that $\psi(0) = y_0$ and $\dot\psi(0) = v_0$. Therefore, every initial condition $(y_0, v_0)$ is achieved by a geodesic on our list, and we have found all the geodesics in hyperbolic space.
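A numerical experiment supports this description of the geodesics. The sketch below assumes the closed form $\Gamma^k_{ij} = -(x^n)^{-1}(\delta^k_i\delta^n_j + \delta^k_j\delta^n_i - \delta^{kn}\delta_{ij})$ computed in this example, which gives $-\Gamma^k_{ij}v^iv^j = (x^n)^{-1}(2v^kv^n - \delta^{kn}|v|^2)$, and integrates the geodesic equation (26) with the classical fourth-order Runge-Kutta method (a standard numerical choice, not one prescribed by the text). Starting at $(0, 1)$ with velocity $(1, 0)$ in $H^2$, the exact solution is $(\tanh t, \operatorname{sech} t)$, which traces the unit semicircle; a vertical initial velocity reproduces $\gamma^n = Ae^{Ct}$.

```python
import math

def geodesic_acceleration(x, v):
    """-Gamma^k_{ij} v^i v^j on H^n, i.e. (2 v^k v^n - delta^{kn} |v|^2) / x^n."""
    n = len(x)
    vv = sum(c * c for c in v)
    return [(2.0 * v[k] * v[-1] - (vv if k == n - 1 else 0.0)) / x[-1]
            for k in range(n)]

def integrate_geodesic(x0, v0, t_end, steps=2000):
    """Classical RK4 for the first-order system x' = v, v' = -Gamma(v, v)."""
    dt = t_end / steps
    x, v = list(x0), list(v0)

    def shift(a, b, s):
        return [ai + s * bi for ai, bi in zip(a, b)]

    for _ in range(steps):
        k1x, k1v = v, geodesic_acceleration(x, v)
        x2, v2 = shift(x, k1x, dt / 2), shift(v, k1v, dt / 2)
        k2x, k2v = v2, geodesic_acceleration(x2, v2)
        x3, v3 = shift(x, k2x, dt / 2), shift(v, k2v, dt / 2)
        k3x, k3v = v3, geodesic_acceleration(x3, v3)
        x4, v4 = shift(x, k3x, dt), shift(v, k3v, dt)
        k4x, k4v = v4, geodesic_acceleration(x4, v4)
        x = [xi + dt / 6 * (a + 2 * b + 2 * c + d)
             for xi, a, b, c, d in zip(x, k1x, k2x, k3x, k4x)]
        v = [vi + dt / 6 * (a + 2 * b + 2 * c + d)
             for vi, a, b, c, d in zip(v, k1v, k2v, k3v, k4v)]
    return x, v

# Start at (0, 1) with velocity (1, 0): exact solution (tanh t, sech t).
x, v = integrate_geodesic([0.0, 1.0], [1.0, 0.0], 1.0)
print(x, x[0] ** 2 + x[1] ** 2)   # stays on the unit semicircle
# Vertical start: gamma^n(t) = A e^{Ct} with A = 2, C = 0.3.
x, v = integrate_geodesic([0.5, 2.0], [0.0, 0.6], 1.0)
print(x[1], 2.0 * math.exp(0.3))
```

The same routine also shows that the hyperbolic speed $|v|/x^n$ stays constant along the computed path, as Problem 42(a) predicts.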


The following proposition was used in Example 17 above.

Proposition 54. Consider a Riemannian manifold $(X, g)$. Given $p \in X$, $v \in T_pX$, there is an $\varepsilon > 0$ and a unique geodesic $\gamma\colon (-\varepsilon, \varepsilon) \to X$ with $\gamma(0) = p$, $\dot\gamma(0) = v$.

Remark. In general, the geodesic $\gamma$ may not exist for all time, although we have seen that all the geodesics on hyperbolic space (Example 17) and on compact Riemannian manifolds (Problem 42) do exist for all time.

A map $\Phi\colon X \to Y$ for manifolds $X$ and $Y$ with Riemannian metrics $g$ and $h$ respectively is a local isometry if every point in $X$ has a neighborhood $O$ on which $\Phi\colon O \to \Phi(O) \subset Y$ is an isometry.

Proposition 55. If $\Phi\colon X \to Y$ is a local isometry of Riemannian manifolds, then for every geodesic $\psi\colon (-\varepsilon, \varepsilon) \to X$, $\Phi \circ \psi$ is a geodesic on $Y$. Any geodesic on $\Phi(X) \subset Y$ is of this form.

Proof. In local coordinates on $X$ and $Y$, we can write the isometry as $y = y(x)$. Note this is the same form as a coordinate change, and the condition that the map is an isometry is simply that the metric pulls back as a $(0, 2)$ tensor when changing coordinates.

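Therefore, the proof boils down to the following fact: for a local isometry, and for any $C^2$ path $\gamma$, the quantity

$$w^k = \ddot\gamma^k + \Gamma^k_{ij} \dot\gamma^i \dot\gamma^j$$

transforms like a tangent vector (i.e. a $(1, 0)$ tensor) under changes of coordinates. Therefore,

$$w^k \frac{\partial}{\partial x^k} = w^k \frac{\partial y^I}{\partial x^k} \frac{\partial}{\partial y^I},$$

and $w^k(x) = 0$ for $k = 1, \dots, n$ is equivalent to $w^I(y) = 0$ for $I = 1, \dots, n$. This is because the matrix $\left(\frac{\partial y^I}{\partial x^k}\right)$ is nonsingular for $y = y(x)$ a diffeomorphism.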

In order to compute how $w^k$ transforms, we use the following index convention. Indices $i, j, k, \dots$ are with respect to the $x$ variables, while indices $I, J, K, \dots$ are with respect to the $y$ variables. For example, $g_{ij}$ is the metric in the $x$ coordinates, while $g_{IJ}$ is the metric in the $y$ coordinates.

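First of all, note

$$g_{IJ} = g_{ij} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J}, \qquad g^{IJ} = g^{ij} \frac{\partial y^I}{\partial x^i} \frac{\partial y^J}{\partial x^j}.$$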


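Compute

$$\begin{aligned}
g_{IJ,K} = \frac{\partial g_{IJ}}{\partial y^K} &= \frac{\partial}{\partial y^K} \left( g_{ij} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} \right) \\
&= \frac{\partial g_{ij}}{\partial y^K} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} + g_{ij} \frac{\partial^2 x^i}{\partial y^I \partial y^K} \frac{\partial x^j}{\partial y^J} + g_{ij} \frac{\partial x^i}{\partial y^I} \frac{\partial^2 x^j}{\partial y^J \partial y^K} \\
&= g_{ij,k} \frac{\partial x^k}{\partial y^K} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} + g_{ij} \frac{\partial^2 x^i}{\partial y^I \partial y^K} \frac{\partial x^j}{\partial y^J} + g_{ij} \frac{\partial x^i}{\partial y^I} \frac{\partial^2 x^j}{\partial y^J \partial y^K}.
\end{aligned}$$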

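Then compute

$$\begin{aligned}
g_{KJ,I} + g_{IK,J} - g_{IJ,K}
={}& g_{ij,k} \frac{\partial x^k}{\partial y^I} \frac{\partial x^i}{\partial y^K} \frac{\partial x^j}{\partial y^J} + g_{ij} \frac{\partial^2 x^i}{\partial y^I \partial y^K} \frac{\partial x^j}{\partial y^J} + g_{ij} \frac{\partial x^i}{\partial y^K} \frac{\partial^2 x^j}{\partial y^J \partial y^I} \\
&+ g_{ij,k} \frac{\partial x^k}{\partial y^J} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^K} + g_{ij} \frac{\partial^2 x^i}{\partial y^I \partial y^J} \frac{\partial x^j}{\partial y^K} + g_{ij} \frac{\partial x^i}{\partial y^I} \frac{\partial^2 x^j}{\partial y^J \partial y^K} \\
&- g_{ij,k} \frac{\partial x^k}{\partial y^K} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} - g_{ij} \frac{\partial^2 x^i}{\partial y^I \partial y^K} \frac{\partial x^j}{\partial y^J} - g_{ij} \frac{\partial x^i}{\partial y^I} \frac{\partial^2 x^j}{\partial y^J \partial y^K} \\
={}& g_{ij,k} \frac{\partial x^k}{\partial y^I} \frac{\partial x^i}{\partial y^K} \frac{\partial x^j}{\partial y^J} + g_{ij,k} \frac{\partial x^k}{\partial y^J} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^K} - g_{ij,k} \frac{\partial x^k}{\partial y^K} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} \\
&+ 2 g_{ij} \frac{\partial^2 x^i}{\partial y^I \partial y^J} \frac{\partial x^j}{\partial y^K}.
\end{aligned}$$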


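Then the Christoffel symbols are

$$\begin{aligned}
\Gamma^L_{IJ} &= \tfrac{1}{2} g^{KL} (g_{KJ,I} + g_{IK,J} - g_{IJ,K}) \\
&= \tfrac{1}{2} g^{m\ell} \frac{\partial y^K}{\partial x^m} \frac{\partial y^L}{\partial x^\ell} \left( g_{ij,k} \frac{\partial x^k}{\partial y^I} \frac{\partial x^i}{\partial y^K} \frac{\partial x^j}{\partial y^J} + g_{ij,k} \frac{\partial x^k}{\partial y^J} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^K} - g_{ij,k} \frac{\partial x^k}{\partial y^K} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} + 2 g_{ij} \frac{\partial^2 x^i}{\partial y^I \partial y^J} \frac{\partial x^j}{\partial y^K} \right) \\
&= \tfrac{1}{2} g^{m\ell} \frac{\partial y^L}{\partial x^\ell} \left( g_{mj,k} \frac{\partial x^k}{\partial y^I} \frac{\partial x^j}{\partial y^J} + g_{im,k} \frac{\partial x^k}{\partial y^J} \frac{\partial x^i}{\partial y^I} - g_{ij,m} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} + 2 g_{im} \frac{\partial^2 x^i}{\partial y^I \partial y^J} \right) \\
&= \Gamma^\ell_{ij} \frac{\partial y^L}{\partial x^\ell} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} + \frac{\partial y^L}{\partial x^\ell} \frac{\partial^2 x^\ell}{\partial y^I \partial y^J}.
\end{aligned}$$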

Note that the second term in the last formula shows that the Christoffel symbols do not transform as a tensor. In fact, this is fortunate, as the extra non-tensorial term will cancel out a similar term coming from the second derivative $\ddot\gamma^k$.

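Note that

$$\dot\gamma^I = \frac{\partial y^I}{\partial x^i} \dot\gamma^i, \qquad \ddot\gamma^L = \frac{d}{dt} \left( \frac{\partial y^L}{\partial x^\ell}(\gamma)\, \dot\gamma^\ell \right) = \frac{\partial y^L}{\partial x^\ell} \ddot\gamma^\ell + \frac{\partial^2 y^L}{\partial x^\ell \partial x^j} \dot\gamma^j \dot\gamma^\ell.$$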

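Compute

$$\begin{aligned}
\Gamma^L_{IJ} \dot\gamma^I \dot\gamma^J &= \left( \Gamma^\ell_{ij} \frac{\partial y^L}{\partial x^\ell} \frac{\partial x^i}{\partial y^I} \frac{\partial x^j}{\partial y^J} + \frac{\partial y^L}{\partial x^\ell} \frac{\partial^2 x^\ell}{\partial y^I \partial y^J} \right) \dot\gamma^m \frac{\partial y^I}{\partial x^m}\, \dot\gamma^p \frac{\partial y^J}{\partial x^p} \\
&= \Gamma^\ell_{ij} \dot\gamma^i \dot\gamma^j \frac{\partial y^L}{\partial x^\ell} + \frac{\partial^2 x^k}{\partial y^I \partial y^J} \frac{\partial y^L}{\partial x^k} \frac{\partial y^I}{\partial x^j} \frac{\partial y^J}{\partial x^\ell} \dot\gamma^j \dot\gamma^\ell.
\end{aligned}$$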

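Therefore,

$$\ddot\gamma^L + \Gamma^L_{IJ} \dot\gamma^I \dot\gamma^J = \frac{\partial y^L}{\partial x^\ell} \left( \ddot\gamma^\ell + \Gamma^\ell_{ij} \dot\gamma^i \dot\gamma^j \right) + \left( \frac{\partial^2 y^L}{\partial x^\ell \partial x^j} + \frac{\partial^2 x^k}{\partial y^I \partial y^J} \frac{\partial y^L}{\partial x^k} \frac{\partial y^I}{\partial x^j} \frac{\partial y^J}{\partial x^\ell} \right) \dot\gamma^j \dot\gamma^\ell,$$

so $\ddot\gamma^L + \Gamma^L_{IJ} \dot\gamma^I \dot\gamma^J$ will transform like a tensor if we can show that the non-tensorial terms cancel. We need to show

$$\frac{\partial^2 y^L}{\partial x^\ell \partial x^j} + \frac{\partial^2 x^k}{\partial y^I \partial y^J} \frac{\partial y^L}{\partial x^k} \frac{\partial y^I}{\partial x^j} \frac{\partial y^J}{\partial x^\ell} = 0. \tag{31}$$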


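This equation follows from the formula for the first derivative of an inverse matrix. If $\dot A$ represents the derivative of a matrix-valued function $A$ (with respect to any parameter or variable), then

$$\frac{d}{dt}\left(A^{-1}\right) = -A^{-1} \dot A A^{-1}.$$

(Proof: Differentiate the equation $AA^{-1} = I$ to find $\dot A A^{-1} + A \frac{d}{dt}\left(A^{-1}\right) = 0$.) Then since $\left(\partial y^L/\partial x^k\right)$ is the inverse matrix of $\left(\partial x^k/\partial y^J\right)$,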

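$$\begin{aligned}
\frac{\partial^2 y^L}{\partial x^\ell \partial x^j} = \frac{\partial}{\partial x^j} \left( \frac{\partial y^L}{\partial x^\ell} \right) &= -\frac{\partial y^L}{\partial x^k} \left[ \frac{\partial}{\partial x^j} \left( \frac{\partial x^k}{\partial y^J} \right) \right] \frac{\partial y^J}{\partial x^\ell} \\
&= -\frac{\partial y^L}{\partial x^k} \left[ \frac{\partial y^I}{\partial x^j} \frac{\partial}{\partial y^I} \left( \frac{\partial x^k}{\partial y^J} \right) \right] \frac{\partial y^J}{\partial x^\ell}.
\end{aligned}$$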

Upon plugging in, this proves formula (31) and the proposition.
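The inverse-matrix formula $\frac{d}{dt}(A^{-1}) = -A^{-1}\dot A A^{-1}$ used in this proof can be sanity-checked numerically. The sketch below uses a hypothetical $2 \times 2$ family $A(s)$ (any smooth invertible family would do) and compares a central-difference derivative of $A(s)^{-1}$ with the right-hand side of the formula.

```python
def inv2(a):
    """Inverse of a 2 x 2 matrix [[p, q], [r, d]] with nonzero determinant."""
    (p, q), (r, d) = a
    det = p * d - q * r
    return [[d / det, -q / det], [-r / det, p / det]]

def matmul(a, b):
    """Product of two 2 x 2 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def A(s):
    """Hypothetical smooth family of matrices, det A(s) = 3 + 3s - 2s^3."""
    return [[1.0 + s, s * s], [2.0 * s, 3.0]]

def A_dot(s):
    """Entrywise derivative of A(s) with respect to s."""
    return [[1.0, 2.0 * s], [2.0, 0.0]]

def max_discrepancy(s, h=1e-5):
    """Max entry of |d/ds A^{-1} + A^{-1} A' A^{-1}| at s (should be near 0)."""
    numeric = [[(inv2(A(s + h))[i][j] - inv2(A(s - h))[i][j]) / (2 * h)
                for j in range(2)] for i in range(2)]
    a_inv = inv2(A(s))
    formula = matmul(matmul(a_inv, A_dot(s)), a_inv)
    return max(abs(numeric[i][j] + formula[i][j])
               for i in range(2) for j in range(2))

print(max_discrepancy(0.2))
```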

Remark. There is also a more geometric proof of the previous proposition. Recall that we derived the geodesic equation as the Euler-Lagrange equation of the energy functional. So any path which minimizes the energy satisfies the geodesic equation. It is easy to see that the energy of a path is invariant under an isometry; therefore, the notion of energy-minimizing path is invariant under isometries.

The problem is that there are geodesics which do not minimize the energy. (They may be saddle points of the energy functional.) This can be surmounted by restricting to small domains, using the following fact from Riemannian geometry: Every point in a Riemannian manifold has a neighborhood $O$ so that all geodesic paths in $O$ are energy-minimizing for endpoints in $O$. (In Riemannian geometry books, this fact is usually stated in terms of the length functional instead; to translate to the present situation, recall that energy-minimizing paths are length-minimizing paths parametrized with constant speed.)

Homework Problem 43. Given a smooth function $f$ on a Riemannian manifold, the Hessian of $f$ is defined locally by the formula

$$H(f)_{ij} = \frac{\partial^2 f}{\partial x^i \partial x^j} - \Gamma^k_{ij} \frac{\partial f}{\partial x^k}.$$

Show that the Hessian of f is a symmetric (0, 2) tensor.


Homework Problem 44. Compute all the geodesics on $S^2$.

Hint: Use the expression for the metric in local coordinates $(y^1, y^2)$ from Example 13. Compute the Christoffel symbols. Analyze the case when $y^2 = 0$ and only $y^1$ varies. Solve the resulting second-order ODE for $\gamma^1 = y^1$. Then move these geodesics around via the isometry group of $S^2$.

(The isometry group of $S^2$ is given by the orthogonal group of $3 \times 3$ matrices

$$O(3) = \{A : AA^\top = I\}.$$

Show that each such linear action is an isometry of $\mathbb{R}^3$ which takes the unit sphere $S^2$ to itself. For every line $L$ through the origin in $\mathbb{R}^3$, show that rotating by an angle $\theta$ around the line $L$ is a linear map in $O(3)$. Show that every initial condition $(p, v) \in TS^2$ of the geodesic equation on $S^2$ can be realized by the examples you computed above, when acted on by such a rotation in $O(3)$.)

4.3 The direct method: An example

We have computed the Euler-Lagrange equations of the energy functional. Now we introduce an example of the direct method in the calculus of variations.

The direct method is this: Given a functional $E\colon C \to \mathbb{R}$, if there is a lower bound $I = \inf_{\gamma \in C} E(\gamma) > -\infty$, then there is a sequence of paths $\gamma_i$ so that $E(\gamma_i) \to I$. The direct method is to show that there is a subsequence of $\gamma_i$ which converges to some $\gamma$, and to show that the limiting $\gamma \in C$ and that $E(\gamma) = I$. Thus we have constructed a minimizer $\gamma$ over the class $C$ of the functional $E$. There are subtle points to deal with along the way. Typically, the class $C$ is a closed subset of a Banach space, and in passing to the limit of a subsequence, the limit $\gamma$ we construct may be in a weaker Banach space (for example, a sequence in $C^1$ may produce a limit only in $C^0$, which will be problematic if the functional involves any derivatives). A related issue is that in passing to the limit $\gamma_{i_j} \to \gamma$, we may not have $E(\gamma_{i_j}) \to E(\gamma)$. In particular, below we will have to deal with the situation in which we only know $\lim_{j\to\infty} E(\gamma_{i_j}) \ge E(\gamma)$, so that the functional is only lower semi-continuous under the limit. Thus we will typically need to spend time improving the regularity of the limit $\gamma$ and showing some semi-continuity of the functional under the limiting subsequence.

The direct method of the calculus of variations is very useful in solving elliptic PDEs. The problem we approach involves geodesics, and thus the solution we produce will be a solution to an ODE. This will allow us to proceed with much of the general picture of the calculus of variations while avoiding some of the more technical points. In particular, we will learn about distributions, weak derivatives, Hilbert spaces, and compact maps between Banach spaces in solving our problem.

Given a smooth manifold $X$, a loop is a continuous map from the circle $S^1$ to $X$. Each such loop is equivalent to a continuous map $\gamma\colon \mathbb{R} \to X$ which is periodic in the sense that $\gamma(t + 1) = \gamma(t)$ for all $t \in \mathbb{R}$. We will abuse notation by using the same $\gamma$ for $\gamma\colon S^1 \to X$ and the periodic $\gamma\colon \mathbb{R} \to X$. (This is because $S^1$ is naturally the quotient $\mathbb{R}/\mathbb{Z}$, where $\mathbb{Z}$ acts on $\mathbb{R}$ by adding integers to real numbers.) Two loops $\gamma_0, \gamma_1\colon S^1 \to X$ are freely homotopic if there is a continuous homotopy

$$G\colon [0, 1] \times S^1 \to X, \qquad G(0, t) = \gamma_0(t), \quad G(1, t) = \gamma_1(t).$$

The condition of being freely homotopic is an equivalence relation, and thuseach loop on a manifold X is a member of a free homotopy class.

Here is our problem:

Problem: Find a curve of least length in a free homotopy class of loops on a compact Riemannian manifold.

The problem may have no solution on a noncompact Riemannian manifold. There may be loops of arbitrarily small length in a given nontrivial free homotopy class, corresponding to loops slipping off a narrowing end of the manifold.

Homotopy classes are objects defined by continuity, and the following result should come as no surprise.

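Proposition 56. For a smooth compact manifold $X \subset \mathbb{R}^N$, there is an $\varepsilon > 0$ so that if two loops $\gamma_0, \gamma_1 : S^1 \to X \subset \mathbb{R}^N$ satisfy

$$\|\gamma_0 - \gamma_1\|_{C^0(S^1,\mathbb{R}^N)} < \varepsilon,$$

then $\gamma_0$ and $\gamma_1$ are freely homotopic as loops in $X$.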

Proof. We apply the ε-Neighborhood Theorem (19): For $\varepsilon > 0$, let $X^\varepsilon$ be the open subset of $\mathbb{R}^N$ consisting of all points at distance less than $\varepsilon$ from $X$. There is an $\varepsilon > 0$ small enough so that every point in $X^\varepsilon$ has a unique closest point in $X$. Then the map $\pi : X^\varepsilon \to X$ which sends a point in $X^\varepsilon$ to its closest point in $X$ is a smooth map of $X^\varepsilon$ to $X$, and it fixes each point in $X \subset X^\varepsilon$.

Let $\gamma_0$ and $\gamma_1$ be loops on $X$ satisfying

$$\|\gamma_0 - \gamma_1\|_{C^0(S^1,\mathbb{R}^N)} < \varepsilon.$$

Then consider the homotopy in $\mathbb{R}^N$

$$G(s,t) = (1-s)\gamma_0(t) + s\gamma_1(t) \in \mathbb{R}^N.$$

For $s, t \in [0,1]$, the distance in $\mathbb{R}^N$ satisfies

$$|G(s,t) - \gamma_0(t)| = s|\gamma_0(t) - \gamma_1(t)| < 1\cdot\varepsilon.$$

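So $G(s,t) \in X^\varepsilon$ for all $s, t \in [0,1]$, and we may define a homotopy in $X$ by

$$\tilde G(s,t) = \pi(G(s,t)).$$

Since $\pi$ fixes each point of $X$, we have $\tilde G(0,t) = \gamma_0(t)$ and $\tilde G(1,t) = \gamma_1(t)$.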
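To see the construction of Proposition 56 in the simplest case, here is a short numerical sketch, assuming $X = S^1$ is the unit circle in $\mathbb{R}^2$. For this $X$ the closest-point map is $\pi(p) = p/|p|$, which is smooth on $X^\varepsilon$ for any $\varepsilon \le 1$. The two loops `gamma0` and `gamma1` are hypothetical choices made for illustration; the code only checks that the projected straight-line homotopy stays on $X$, closes up in $t$, and has the right endpoints.

```python
import math

TWO_PI = 2 * math.pi

def pi_circle(p):
    """Closest-point map onto the unit circle: pi(p) = p / |p| (requires p != 0)."""
    r = math.hypot(p[0], p[1])
    return (p[0] / r, p[1] / r)

def gamma0(t):
    """Hypothetical loop winding once around S^1."""
    return (math.cos(TWO_PI * t), math.sin(TWO_PI * t))

def gamma1(t):
    """Hypothetical C^0-close reparametrization of gamma0."""
    s = t + 0.02 * math.sin(TWO_PI * t)
    return (math.cos(TWO_PI * s), math.sin(TWO_PI * s))

def sup_distance(a, b, samples=1000):
    """Approximate the C^0 distance between two loops by sampling t in [0, 1)."""
    return max(math.dist(a(k / samples), b(k / samples)) for k in range(samples))

def G(s, t):
    """Straight-line homotopy in R^2 (may leave the circle)."""
    a, b = gamma0(t), gamma1(t)
    return ((1 - s) * a[0] + s * b[0], (1 - s) * a[1] + s * b[1])

def G_tilde(s, t):
    """Homotopy in X obtained by composing G with the closest-point map."""
    return pi_circle(G(s, t))
```

Here the $C^0$ distance is about $2\sin(0.02\pi) \approx 0.13$, well below the admissible $\varepsilon = 1$, so $G$ never passes through the origin where $\pi$ is undefined.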

Remark. The homotopy constructed in the proof is a smooth homotopy if $\gamma_0$ and $\gamma_1$ are smooth. Thus the same proposition works with smooth homotopy classes (as considered in Guillemin and Pollack).

Corollary 57. If $\gamma_i$ is a sequence of loops in a free homotopy class in $X \subset \mathbb{R}^N$, and

$$\lim_{i\to\infty}\|\gamma_i - \gamma\|_{C^0(S^1,\mathbb{R}^N)} = 0,$$

then the loop $\gamma$ is in the same free homotopy class.

Proof. For the $\varepsilon > 0$ of Proposition 56 above, there is a $\gamma_i$ so that

$$\|\gamma_i - \gamma\|_{C^0(S^1,\mathbb{R}^N)} < \varepsilon.$$

Apply Proposition 56 to show $\gamma$ and $\gamma_i$ are in the same free homotopy class.

The ε-Neighborhood Theorem, together with the mollifier technique of approximation, allows us to prove an important foundational result in topology:

Theorem 20. Let $f : \mathbb{R}^n \to Y$ be uniformly continuous, where $Y \subset \mathbb{R}^N$ is a compact submanifold without boundary. Then $f$ is homotopic to a smooth map from $\mathbb{R}^n \to Y$.

Proof. Since $f$ is uniformly continuous, for all $\varepsilon > 0$ there is a $\delta > 0$ so that if $|x - x'| < \delta$, then $|f(x) - f(x')| < \varepsilon$. The ε-Neighborhood Theorem shows that there is an $\varepsilon > 0$ so that the map $\pi : Y^\varepsilon \to Y$ is well-defined and smooth. Let $\delta$ be the corresponding $\delta$ from the uniform continuity of $f$.

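Let $\rho$ be a smooth nonnegative bump function with support in the unit ball $B_1(0)$ in $\mathbb{R}^n$ so that $\int_{\mathbb{R}^n}\rho\,dx^n = 1$. Then for $\alpha > 0$, define $\rho_\alpha(x) = \alpha^{-n}\rho(x/\alpha)$. Note $\operatorname{supp}\rho_\alpha \subset \overline{B_\alpha(0)}$. Define

$$f_\alpha(x) = \int_{\mathbb{R}^n} f(y)\rho_\alpha(x-y)\,dy^n = \int_{\{y:|x-y|\le\alpha\}} f(y)\rho_\alpha(x-y)\,dy^n.$$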

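(Note each $f_\alpha$ is $\mathbb{R}^N$-valued.) If $\alpha < \delta$, then $|f(y) - f(x)| < \varepsilon$ for $y$ in the domain of integration, and so

$$\begin{aligned}
f_\alpha(x) &= \int_{\{y:|x-y|\le\alpha\}} f(y)\rho_\alpha(x-y)\,dy^n \\
&= \int_{\{y:|x-y|\le\alpha\}} [f(y)-f(x)]\rho_\alpha(x-y)\,dy^n + \int_{\{y:|x-y|\le\alpha\}} f(x)\rho_\alpha(x-y)\,dy^n \\
&= \int_{\{y:|x-y|\le\alpha\}} [f(y)-f(x)]\rho_\alpha(x-y)\,dy^n + f(x)
\end{aligned}$$

since

$$\int_{\{y:|x-y|\le\alpha\}} \rho_\alpha(x-y)\,dy^n = \int_{\mathbb{R}^n}\rho_\alpha(x-y)\,dy^n = \int_{\mathbb{R}^n}\rho_\alpha(z)\,dz^n = 1$$

for the substitution $z = x - y$. So

$$\begin{aligned}
|f_\alpha(x) - f(x)| &= \left|\int_{\{y:|x-y|\le\alpha\}} [f(y)-f(x)]\rho_\alpha(x-y)\,dy^n\right| \\
&\le \int_{\{y:|x-y|\le\alpha\}} |f(y)-f(x)|\,\rho_\alpha(x-y)\,dy^n \\
&< \varepsilon\int_{\{y:|x-y|\le\alpha\}} \rho_\alpha(x-y)\,dy^n = \varepsilon.
\end{aligned} \tag{32}$$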

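Therefore if $\alpha \in (0,\delta)$, then $f_\alpha(x) \in Y^\varepsilon$, since $f(x) \in Y$. Then we check that $\tilde f_\alpha(x) = \pi(f_\alpha(x))$ is the desired homotopy. In particular, as $\alpha \to 0$, $f_\alpha(x) \to f(x)$ uniformly by (32) (view $\varepsilon$ as varying to zero instead of fixed for this interpretation). Since $\pi$ and $f_\alpha$ are smooth, $\tilde f_\alpha$ is smooth for small $\alpha > 0$. In particular, we have shown that

$$F(\alpha,x) = \begin{cases} \tilde f_\alpha(x) & \text{for } \alpha > 0 \text{ small},\\ f(x) & \text{for } \alpha = 0 \end{cases}$$

is the desired homotopy.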
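The mollify-then-project step can be tried numerically in the smallest case. The sketch below assumes $n = 1$, $Y = S^1 \subset \mathbb{R}^2$ (so $\pi(p) = p/|p|$), and the hypothetical map $f(x) = (\cos|x|, \sin|x|)$, which is uniformly continuous (in fact 1-Lipschitz) but not differentiable at $0$. The integral defining $f_\alpha$ is approximated by a midpoint rule, an illustrative choice rather than part of the proof.

```python
import math

def rho(u):
    """Smooth bump supported in (-1, 1), not yet normalized."""
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0

def midpoint(g, a, b, n=1000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

RHO_MASS = midpoint(rho, -1.0, 1.0)  # normalizing constant, so rho_alpha has mass 1

def rho_alpha(u, alpha):
    """rho_alpha(u) = alpha^{-1} rho(u / alpha) / RHO_MASS (dimension n = 1)."""
    return rho(u / alpha) / (alpha * RHO_MASS)

def f(x):
    """Hypothetical uniformly continuous map R -> S^1, not differentiable at x = 0."""
    theta = abs(x)
    return (math.cos(theta), math.sin(theta))

def f_alpha(x, alpha):
    """Componentwise mollification, integrating over {y : |x - y| <= alpha}."""
    return tuple(
        midpoint(lambda y: f(y)[c] * rho_alpha(x - y, alpha), x - alpha, x + alpha)
        for c in (0, 1)
    )

def f_tilde(x, alpha):
    """pi(f_alpha(x)): project the mollified value back onto S^1."""
    p = f_alpha(x, alpha)
    r = math.hypot(p[0], p[1])
    return (p[0] / r, p[1] / r)
```

Since $f$ is 1-Lipschitz, the role of (32) is played by the bound $|f_\alpha(x) - f(x)| \le \alpha$, and projecting costs at most another factor of 2, because $|\pi(p) - p| \le |p - f(x)|$ when $f(x) \in S^1$.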

Theorem 21. Let $f : X \to Y$ be a continuous map between smooth manifolds. Then $f$ is homotopic to a smooth map from $X \to Y$.

Sketch of proof. We may assume $X \subset \mathbb{R}^M$ (and likewise $Y \subset \mathbb{R}^N$) by Whitney's Embedding Theorem. Then there is a $\nu > 0$ so that $\pi_M : X^\nu \to X$ is well-defined and smooth. Define $g : \mathbb{R}^M \to \mathbb{R}^N$ by $g(p) = f(\pi_M(p))$ for $p \in X^\nu$ and $g(p) = 0$ for $p \notin X^\nu$. Note $g$ is uniformly continuous on a neighborhood of $X$. Apply the mollifier argument as above to $g$ and show that the homotopy constructed in the proof of Theorem 20, when restricted to $X \subset \mathbb{R}^M$, has the desired properties.

The discussion above about energy and length still holds. Assuming the minimizer is smooth enough, a constant-speed length-minimizing loop is the same as an energy-minimizing loop. Thus we may as well consider energy-minimizing loops, and we have the equivalent problem.

Problem: Find a curve of least energy in a free homotopy class of loops on a compact Riemannian manifold.

So far in our discussion, the formulations of length and energy depend on the loop $\gamma$ being $C^1$ (so that the derivative $\dot\gamma$ is $C^0$ and thus can be integrated). If we look more closely, the energy is defined as the square of the $L^2$ norm of $\dot\gamma$:

$$E(\gamma) = \int_0^1 |\dot\gamma|_g^2\,dt.$$

Therefore, we really do not need $\dot\gamma$ to be continuous, but only $L^2$. In terms of $\gamma$ itself, we need to develop a theory of how to take a derivative which ends up not being continuous, but only $L^2$. For this purpose, we define derivatives in the sense of distributions, or weak derivatives.
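The relation between energy and length used above can be checked on a polygonal approximation. By the Cauchy-Schwarz inequality on $[0,1]$, $L(\gamma)^2 = \left(\int_0^1|\dot\gamma|\,dt\right)^2 \le \int_0^1|\dot\gamma|^2\,dt = E(\gamma)$, with equality exactly for constant speed. The sketch below assumes the Euclidean metric on $\mathbb{R}^2$ and a hypothetical family of parametrizations of the unit circle, with `speed_wobble = 0` giving constant speed; the discrete sums satisfy the same inequality by the discrete Cauchy-Schwarz inequality.

```python
import math

TWO_PI = 2 * math.pi

def circle_loop(speed_wobble):
    """Hypothetical loop t -> (cos 2 pi s(t), sin 2 pi s(t)),
    s(t) = t + speed_wobble * sin(2 pi t) / (2 pi).

    s'(t) = 1 + speed_wobble * cos(2 pi t) > 0 when |speed_wobble| < 1, so each
    choice traces the unit circle once; speed_wobble = 0 is constant speed."""
    def gamma(t):
        s = t + speed_wobble * math.sin(TWO_PI * t) / TWO_PI
        return (math.cos(TWO_PI * s), math.sin(TWO_PI * s))
    return gamma

def length_and_energy(gamma, n=2000):
    """Discrete length sum |D gamma| and discrete energy sum |D gamma|^2 / Dt over [0, 1]."""
    dt = 1.0 / n
    pts = [gamma(k * dt) for k in range(n + 1)]
    steps = [math.dist(pts[k], pts[k + 1]) for k in range(n)]
    length = sum(steps)
    energy = sum(d * d for d in steps) / dt
    return length, energy
```

Both parametrizations have length close to $2\pi$, but only the constant-speed one has energy close to $L^2 = 4\pi^2$; the wobbling one has energy near $4\pi^2(1 + 0.5^2/2)$.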


4.4 Distributions

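On $\mathbb{R}^n$, we consider each smooth function $\varphi$ with compact support to be a test function. For any $C^1$ function $f$ on $\mathbb{R}^n$ and test function $\varphi$, we have the following formula by integrating by parts:

$$\int_{\mathbb{R}^n} f_{,i}\,\varphi\,dx^n = -\int_{\mathbb{R}^n} f\,\varphi_{,i}\,dx^n. \tag{33}$$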

For two locally $L^1$ functions $f$ and $h$ on $\mathbb{R}^n$, we say $f_{,i} = h$ in the sense of distributions if for all test functions $\varphi$,

$$\int_{\mathbb{R}^n} h\varphi\,dx^n = -\int_{\mathbb{R}^n} f\varphi_{,i}\,dx^n.$$

Let $\mathcal{D}(\mathbb{R}^n)$ be the vector space of all smooth functions with compact support in $\mathbb{R}^n$. For our purposes, we will define a distribution on $\mathbb{R}^n$ to be a linear map $\mathcal{D}(\mathbb{R}^n) \to \mathbb{R}$. We often allow $\mathbb{C}$-valued test functions and consider complex linear maps to $\mathbb{C}$; complex-valued functions are useful when doing Fourier analysis. (The usual definition of a distribution is more involved: one must define a topology on $\mathcal{D}(\mathbb{R}^n)$ and then consider distributions to be only continuous linear maps to $\mathbb{C}$. For our purposes, the simpler definition suffices. See Section 4.9 below for a more standard treatment of distributions on the circle $S^1$.) Recall a measurable function $f$ is locally $L^1$ if $\int_K |f| < \infty$ for every compact subset $K$ of the domain of $f$. Any locally $L^1$ function $f$ on $\mathbb{R}^n$ gives a distribution by sending

$$f : \varphi \mapsto f(\varphi) = \int_{\mathbb{R}^n} f\varphi\,dx^n.$$

Notice that there is a slight abuse of notation: $f(\varphi)$ for $\varphi$ a test function is not to be confused with $f(x)$ for $x \in \mathbb{R}^n$. Two locally $L^1$ functions $f_1, f_2$ are said to be equal in the sense of distributions if for every test function $\varphi$,

$$\int_{\mathbb{R}^n} f_1\varphi\,dx^n = \int_{\mathbb{R}^n} f_2\varphi\,dx^n \iff \int_{\mathbb{R}^n} (f_1 - f_2)\varphi\,dx^n = 0.$$

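Remark. On $\mathbb{R}^n$, note that any locally $L^p$ function for $p \ge 1$ is also locally $L^1$. This is because for $K \subset\subset \mathbb{R}^n$, $\frac1p + \frac1q = 1$, and $f$ locally $L^p$, Hölder's inequality states

$$\int_K |f|\,dx^n \le \left(\int_K 1\,dx^n\right)^{1/q}\left(\int_K |f|^p\,dx^n\right)^{1/p} < \infty.$$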

Example 18. Any locally finite Borel measure $d\mu$ on $\mathbb{R}^n$ defines a distribution by sending

$$\varphi \mapsto \int_{\mathbb{R}^n} \varphi\,d\mu$$

for any test function $\varphi$.

An important example of this is the inaptly named δ-function, or unit point mass, at the origin. The δ-function is a measure on $\mathbb{R}^n$ so that for any subset $\Omega \subset \mathbb{R}^n$,

$$\delta(\Omega) = \begin{cases} 1 & \text{if } 0 \in \Omega,\\ 0 & \text{if } 0 \notin \Omega. \end{cases}$$

So the distribution defined by this measure is

$$\delta : \varphi \mapsto \varphi(0),$$

which is just evaluation of $\varphi$ at the origin. The following problem shows there is no locally $L^1$ function which is equal to the δ-function.

Homework Problem 45. Show that there is no $L^1$ function $f$ on $\mathbb{R}^n$ so that $\int_{\mathbb{R}^n} f\varphi\,dx^n = \varphi(0)$ for all $\varphi \in \mathcal{D}(\mathbb{R}^n)$.

Hint: Consider a smooth nonnegative function $\rho : \mathbb{R}^n \to \mathbb{R}$ with support in the unit ball $B_1(0)$ centered at $0$ and so that $\int_{\mathbb{R}^n}\rho\,dx^n = 1$. Use this $\rho$ to define $\rho_\varepsilon(x) = \varepsilon^{-n}\rho(x/\varepsilon)$. If there were such an $L^1$ function $f$, recall that if

$$f^\varepsilon(x) = \int_{\mathbb{R}^n} f(y)\rho_\varepsilon(x-y)\,dy^n,$$

then $f^\varepsilon \to f$ in $L^1$ as $\varepsilon \to 0$.

(a) Show that for all $x \ne 0$, $f^\varepsilon(x) = 0$ for $\varepsilon$ small enough. (Follow the proof of Proposition 58.)

(b) Suppose a family of continuous functions $f^\varepsilon \to f$ in $L^1(\mathbb{R}^n)$ as $\varepsilon \to 0^+$, and let $O \subset \mathbb{R}^n$ be a measurable subset on which $f^\varepsilon = 0$ identically for all $\varepsilon$ sufficiently small. Show that $f = 0$ almost everywhere on $O$. (Split up the relevant integrals on $\mathbb{R}^n$ into integrals on $O$ and $\mathbb{R}^n \setminus O$.)

(c) Show our $f = 0$ almost everywhere on $\mathbb{R}^n$.

(d) Find a contradiction.

We have just seen that distributions are more general than functions. In particular, it is possible to differentiate any distribution by mimicking formula (33). A distributional derivative of a function may no longer be a function, but it will be well-defined as a distribution. Given a distribution $f$ defined by a map $f : \varphi \mapsto f(\varphi) \in \mathbb{R}$, the partial derivative $f_{,i}$ in the sense of distributions is defined to be the distribution

$$f_{,i} : \varphi \mapsto -f(\varphi_{,i}).$$

It is this innovation which allows us to define the derivatives of $L^2$ functions.
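A distribution is just a rule that eats test functions, so it can be modeled directly as a function of a function. The sketch below is restricted to $\mathbb{R}^1$ and to distributions coming from functions; it assumes a hypothetical test function $\varphi(x) = \rho(x - 0.3)$ with $\rho(u) = e^{-1/(1-u^2)}$ on $(-1,1)$, and approximates the integrals by a midpoint rule. It checks (33) for the $C^1$ function $\sin$, and checks that $|x|$, which is not $C^1$, has distributional derivative equal to the function $\operatorname{sign}(x)$.

```python
import math

def bump(u):
    """Smooth bump exp(-1/(1 - u^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0

def dbump(u):
    """Derivative of bump, by the chain rule."""
    if abs(u) >= 1.0:
        return 0.0
    return bump(u) * (-2.0 * u / (1.0 - u * u) ** 2)

def midpoint(g, a, b, n=4000):
    """Composite midpoint rule for the integral of g over [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

class ShiftedBump:
    """Hypothetical test function phi(x) = bump(x - c), supported in [c - 1, c + 1]."""
    def __init__(self, c):
        self.c = c
        self.support = (c - 1.0, c + 1.0)

    def __call__(self, x):
        return bump(x - self.c)

    def derivative(self, x):
        return dbump(x - self.c)

def from_function(f):
    """Locally L^1 function f -> distribution phi |-> integral of f * phi."""
    return lambda phi: midpoint(lambda x: f(x) * phi(x), *phi.support)

def weak_derivative_of_function(f):
    """Distributional derivative of f: phi |-> -f(phi') = -integral of f * phi'."""
    return lambda phi: -midpoint(lambda x: f(x) * phi.derivative(x), *phi.support)
```

For example, `weak_derivative_of_function(abs)(ShiftedBump(0.3))` agrees with `from_function(sign)(ShiftedBump(0.3))` to quadrature accuracy, where `sign(x)` is $+1$, $0$, or $-1$.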

Remark. Note that the equation (33) motivating the distributional derivative is essentially the same as the integration by parts used to calculate the Euler-Lagrange equations for $\gamma + \varepsilon h$. Thus if $h$ is smooth with compact support, we can still integrate by parts even if $\gamma$ is no longer regular enough for ordinary differentiation; we simply consider the derivatives to be taken in the sense of distributions.

Homework Problem 46. Consider the Heaviside function

$$h(x) = \begin{cases} 1 & \text{if } x \ge 0,\\ 0 & \text{if } x < 0. \end{cases}$$

Show that the derivative $h'$ (taken in the sense of distributions) is the δ-function on $\mathbb{R}$.

Homework Problem 47. Consider for any test function $\varphi \in \mathcal{D}(\mathbb{R})$,

$$\mathrm{PV}\left(\frac1x\right)(\varphi) = \lim_{\varepsilon\to0^+}\left(\int_{-\infty}^{-\varepsilon}\frac1x\varphi(x)\,dx + \int_\varepsilon^\infty\frac1x\varphi(x)\,dx\right).$$

Part (a) shows that $\mathrm{PV}(\frac1x)$ is a distribution. It is called the principal value of $\frac1x$.

(a) Show $\mathrm{PV}(\frac1x)(\varphi)$ converges for all smooth test functions $\varphi$. (Hint: The potential problem is clearly at $x = 0$. Use Taylor's Theorem to write $\varphi = \varphi(0) + O(x)$, where $O(x)$ represents a term so that $O(x)/x$ converges to a real limit as $x \to 0$.)

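(b) Show that the first derivative in the sense of distributions of $\mathrm{PV}(\frac1x)$ is given in terms of $\varphi \in \mathcal{D}(\mathbb{R})$ as

$$\lim_{\varepsilon\to0^+}\left[\int_{-\infty}^{-\varepsilon}\left(-\frac1{x^2}\right)\varphi(x)\,dx + \int_\varepsilon^\infty\left(-\frac1{x^2}\right)\varphi(x)\,dx + \frac2\varepsilon\varphi(0)\right].$$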
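A numerical sanity check of the formula in (b), not a proof, can be made on one test function. Folding $x \mapsto -x$ gives $\mathrm{PV}(\frac1x)(\psi) = \int_0^\infty \frac{\psi(x) - \psi(-x)}{x}\,dx$, whose integrand is bounded. Similarly, if $\operatorname{supp}\varphi \subset [-R,R]$, then subtracting $2\varphi(0)$ inside the integral shows the bracket in (b) equals $-\int_\varepsilon^R \frac{\varphi(x)+\varphi(-x)-2\varphi(0)}{x^2}\,dx + \frac{2\varphi(0)}{R}$, again with a bounded integrand. The sketch assumes the hypothetical test function $\varphi(x) = \rho(x - 0.3)$ (chosen so that $\varphi(0) \ne 0$ and $\varphi$ is not even) and a midpoint rule.

```python
import math

def bump(u):
    """Smooth bump exp(-1/(1 - u^2)) on (-1, 1), zero elsewhere."""
    return math.exp(-1.0 / (1.0 - u * u)) if abs(u) < 1.0 else 0.0

def dbump(u):
    """Derivative of bump, by the chain rule."""
    if abs(u) >= 1.0:
        return 0.0
    return bump(u) * (-2.0 * u / (1.0 - u * u) ** 2)

def midpoint(g, a, b, n=20000):
    """Composite midpoint rule; never evaluates g at the endpoint x = 0."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

C = 0.3           # hypothetical shift of the bump
R = 1.0 + abs(C)  # supp phi lies in [-R, R]

def phi(x):
    return bump(x - C)

def dphi(x):
    return dbump(x - C)

def pv_one_over_x(psi):
    """PV(1/x)(psi) folded onto (0, R): integrand (psi(x) - psi(-x)) / x."""
    return midpoint(lambda x: (psi(x) - psi(-x)) / x, 0.0, R)

def derivative_by_definition():
    """Distributional derivative: PV(1/x)'(phi) = -PV(1/x)(phi')."""
    return -pv_one_over_x(dphi)

def formula_b():
    """Limit in (b), rewritten as -int_0^R [phi(x)+phi(-x)-2phi(0)]/x^2 dx + 2phi(0)/R."""
    p0 = phi(0.0)
    integral = midpoint(lambda x: (phi(x) + phi(-x) - 2.0 * p0) / (x * x), 0.0, R)
    return -integral + 2.0 * p0 / R
```

The two numbers agree to quadrature accuracy, which is what the identity in (b) predicts.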

One more thing is needed to complete the picture of distributions as generalizations of functions. Recall that every locally $L^p$ function for $p \ge 1$ defines a distribution. The following proposition shows this map is injective.

Proposition 58. If two locally $L^1$ functions $f_1$ and $f_2$ on $\mathbb{R}^n$ define the same distribution, then $f_1 = f_2$ almost everywhere.

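Proof. We first consider the case when $f_1$ and $f_2$ are both globally $L^1$ on $\mathbb{R}^n$. Then recall that we can use a mollifier to approximate each in $L^1$ by smooth functions. In particular, if $\rho$ is a smooth nonnegative function with compact support so that $\int_{\mathbb{R}^n}\rho\,dx^n = 1$, then define

$$\rho_\varepsilon(x) = \frac{1}{\varepsilon^n}\rho\left(\frac{x}{\varepsilon}\right), \qquad f_i^\varepsilon(x) = \int_{\mathbb{R}^n}\rho_\varepsilon(x-y)f_i(y)\,dy^n, \quad i = 1,2.$$

Then each $f_i^\varepsilon$ is a smooth $L^1$ function on $\mathbb{R}^n$ and $f_i^\varepsilon \to f_i$ in $L^1$ as $\varepsilon \to 0$. Now for each fixed $x \in \mathbb{R}^n$, $\rho_\varepsilon(x-y)$ is a smooth test function with compact support in $y$, and $f_i^\varepsilon(x)$ is simply the evaluation of the distribution $f_i$ on this test function. Since $f_1 = f_2$ in the sense of distributions, $f_1^\varepsilon(x) = f_2^\varepsilon(x)$ for all $x \in \mathbb{R}^n$. So then

$$\|f_1 - f_2\|_{L^1} = \lim_{\varepsilon\to0}\|f_1^\varepsilon - f_2^\varepsilon\|_{L^1} = \lim_{\varepsilon\to0} 0 = 0.$$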

Then $f_1 = f_2$ in $L^1$, which is equivalent to $f_1 = f_2$ almost everywhere.

If $f_1$ and $f_2$ are only locally $L^1$, consider a smooth function $\beta_R$ with compact support which is identically equal to 1 on $B_R = \{|x| \le R\}$. The condition $f_1 = f_2$ in the sense of distributions implies $\beta_R f_1 = \beta_R f_2$ in the sense of distributions, since $\beta_R\varphi$ is a test function whenever $\varphi$ is. Then since each $f_i$ is locally $L^1$, each $\beta_R f_i$ is globally in $L^1$. We apply the argument of the previous paragraph, so $\beta_R f_1 = \beta_R f_2$ almost everywhere on $\mathbb{R}^n$. This implies that $f_1 = f_2$ almost everywhere on the ball $B_R$. Now let $R \to \infty$ through a countable sequence to conclude that $f_1 = f_2$ almost everywhere on $\mathbb{R}^n$.


So far, we have discussed distributions on $\mathbb{R}^n$. On the circle $S^1$, the definitions are similar, the main difference being that since $S^1$ is compact, our test functions are simply all smooth functions on $S^1$. In particular, we can think of test functions on $S^1$ as smooth periodic functions on $\mathbb{R}$ with period 1. In this way, an $L^1$ function $f$ on $S^1$ acts on test functions by

$$f : \varphi \mapsto \int_0^1 f\varphi\,dt.$$

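One thing to check is that integration by parts still works. If $f$ is $C^1$ on $S^1$ and $\varphi$ is smooth on $S^1$, then

$$\begin{aligned}
\int_{S^1} \dot f\varphi\,dt &= \int_0^1 \dot f\varphi\,dt \\
&= -\int_0^1 f\dot\varphi\,dt + (f\varphi)\Big|_0^1 \\
&= -\int_{S^1} f\dot\varphi\,dt + f(1)\varphi(1) - f(0)\varphi(0) \\
&= -\int_{S^1} f\dot\varphi\,dt
\end{aligned}$$

because $f(0) = f(1)$ and $\varphi(0) = \varphi(1)$ since $f$ and $\varphi$ are periodic. So we have the same basic formula as in (33), and we may define distributions and distributional derivatives in the same manner as above.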
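The vanishing of the boundary term for periodic functions is easy to confirm numerically. The sketch below assumes the hypothetical periodic choices $f(t) = e^{\sin 2\pi t}$ and $\varphi(t) = \cos 2\pi t + \tfrac12\sin 4\pi t$, and uses the equally spaced rule $\frac1n\sum_{k=0}^{n-1} g(k/n)$, which for smooth 1-periodic $g$ converges faster than any power of $1/n$.

```python
import math

TWO_PI = 2 * math.pi

def periodic_mean(g, n=256):
    """Equally spaced rule for the integral of a 1-periodic g over [0, 1]."""
    return sum(g(k / n) for k in range(n)) / n

def f(t):
    return math.exp(math.sin(TWO_PI * t))

def f_dot(t):
    return TWO_PI * math.cos(TWO_PI * t) * f(t)

def phi(t):
    return math.cos(TWO_PI * t) + 0.5 * math.sin(2 * TWO_PI * t)

def phi_dot(t):
    return -TWO_PI * math.sin(TWO_PI * t) + 0.5 * 2 * TWO_PI * math.cos(2 * TWO_PI * t)

lhs = periodic_mean(lambda t: f_dot(t) * phi(t))    # integral over S^1 of f' phi
rhs = -periodic_mean(lambda t: f(t) * phi_dot(t))   # minus integral of f phi'
```

The two sides agree to rounding error, with no boundary term, exactly as in the computation above.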

Now we return to our problem. We want to consider all loops $\gamma : S^1 \to X \subset \mathbb{R}^N$ so that

$$E(\gamma) = \int_0^1 |\dot\gamma|_g^2\,dt = \|\dot\gamma\|^2_{L^2(S^1,\mathbb{R}^N)} < \infty.$$

Therefore, we consider the Sobolev space

$$L^2_1(S^1,\mathbb{R}^N) = \left\{\gamma : S^1 \to \mathbb{R}^N : \|\gamma\|^2_{L^2_1} = \|\gamma\|^2_{L^2} + \|\dot\gamma\|^2_{L^2} < \infty\right\},$$

where the derivative $\dot\gamma$ is taken in the sense of distributions. Note that the condition $\dot\gamma \in L^2(S^1,\mathbb{R}^N)$ means that the distribution $\dot\gamma$ may be represented by a function (and an $L^2$ function at that).

We may consider each component $\gamma^1, \dots, \gamma^N$ separately, and it should be clear that $\gamma_i \to \gamma$ in $L^2_1(S^1,\mathbb{R}^N)$ if and only if $\gamma_i^a \to \gamma^a$ in $L^2_1(S^1,\mathbb{R})$ for each $a = 1, \dots, N$. Thus we may work with each component of $\gamma$ in $\mathbb{R}^N$ separately. Below we will see that $L^2_1$ is a Hilbert space, but for now we are content to show that every function in $L^2_1(S^1)$ is continuous. Recall that elements of $L^2_1(S^1)$ are only equivalence classes of functions, two functions being equivalent if they agree almost everywhere.

Proposition 59. Every element of $L^2_1(\mathbb{R})$ contains a continuous representative.

Remark. This proposition is an important example of the Sobolev embedding theorem, which gives a means to embed Sobolev spaces $L^p_k(\mathbb{R}^n)$ into appropriate $C^\ell$ spaces, $\ell = \ell(p,k,n)$. In particular, the present result depends strongly on the fact that the dimension of the domain $\mathbb{R}$ of the functions is one. (There are elements of $L^2_1(\mathbb{R}^2)$ which do not have continuous representatives.)

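Proof. Let $f \in L^2_1(\mathbb{R})$, so $\int_{\mathbb{R}}|\dot f|^2\,dt = C^2 < \infty$. Then compute for $t_2 \ge t_1$

$$|f(t_2) - f(t_1)| = \left|\int_{t_1}^{t_2}\dot f(t)\,dt\right| \le \left(\int_{t_1}^{t_2}|\dot f(t)|^2\,dt\right)^{1/2}\left(\int_{t_1}^{t_2}dt\right)^{1/2} \le C(t_2 - t_1)^{1/2}.$$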

So this formula shows $f$ is continuous, as long as we can justify using the Fundamental Theorem of Calculus

$$f(t_2) - f(t_1) = \int_{t_1}^{t_2}\dot f(t)\,dt.$$

We achieve this by defining $g(t) = \int_0^t \dot f(s)\,ds$. The previous argument implies that $g$ is continuous. Now we argue that there is a constant $K$ so that $f - g = K$ almost everywhere. This will show there is a continuous representative $g + K$ in the equivalence class of $f$.

First we show that $\dot g = \dot f$ in the sense of distributions. Consider a test function $\varphi$. Then

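$$\begin{aligned}
\dot g(\varphi) &= -\int_{-\infty}^\infty g(t)\dot\varphi(t)\,dt \\
&= -\int_{-\infty}^\infty\left(\int_0^t \dot f(s)\,ds\right)\dot\varphi(t)\,dt \\
&= -\iint_{R_1}\dot f(s)\dot\varphi(t)\,ds\,dt + \iint_{R_2}\dot f(s)\dot\varphi(t)\,ds\,dt
\end{aligned}$$

by Fubini's Theorem, for the regions in the plane

$$R_1 = \{(s,t) : s \ge 0,\ t \ge s\}, \qquad R_2 = -R_1.$$

Then again by Fubini, and since $\varphi$ has compact support,

$$\begin{aligned}
\dot g(\varphi) &= -\int_0^\infty\left(\int_s^\infty\dot\varphi(t)\,dt\right)\dot f(s)\,ds + \int_{-\infty}^0\left(\int_{-\infty}^s\dot\varphi(t)\,dt\right)\dot f(s)\,ds \\
&= -\int_0^\infty(-\varphi(s))\dot f(s)\,ds + \int_{-\infty}^0\varphi(s)\dot f(s)\,ds \\
&= \int_{-\infty}^\infty\varphi(s)\dot f(s)\,ds \\
&= \dot f(\varphi).
\end{aligned}$$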

Therefore, $\dot g = \dot f$ in the sense of distributions.

The following proposition, applied to $h = f - g$ (which satisfies $\dot h = 0$), shows that there is a constant $K$ so that $f = g + K$ in the sense of distributions. Then Proposition 58 above shows $f = g + K$ almost everywhere, and thus there is a continuous representative in the equivalence class of $f$.
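The Hölder-type estimate in the proof of Proposition 59 is local, and it can be checked exactly on a simple example. The sketch assumes the hypothetical function $f(t) = t^{3/4}$ on $[0,1]$: its derivative $\dot f = \frac34 t^{-1/4}$ is unbounded near $0$, so $f$ is not Lipschitz, yet $\int_0^1|\dot f|^2\,dt = \frac98$ is finite, and the bound $|f(t_2)-f(t_1)| \le \left(\int_{t_1}^{t_2}|\dot f|^2\right)^{1/2}(t_2-t_1)^{1/2}$ still forces continuity. The integral of $|\dot f|^2$ is computed in closed form.

```python
import math

def f(t):
    """Hypothetical example f(t) = t^{3/4} on [0, 1]."""
    return t ** 0.75

def energy_between(t1, t2):
    """Exact integral of |f'(t)|^2 = (9/16) t^{-1/2} over [t1, t2]."""
    return 9.0 / 8.0 * (math.sqrt(t2) - math.sqrt(t1))

def cauchy_schwarz_bound(t1, t2):
    """(int_{t1}^{t2} |f'|^2)^{1/2} (t2 - t1)^{1/2}, as in the proof."""
    return math.sqrt(energy_between(t1, t2) * (t2 - t1))

grid = [k / 100 for k in range(101)]
worst = max(
    (f(t2) - f(t1)) - cauchy_schwarz_bound(t1, t2)
    for t1 in grid for t2 in grid if t2 > t1
)
```

On this grid `worst` is never positive, and with the global constant $C = (9/8)^{1/2}$ one gets the uniform bound $|f(t_2)-f(t_1)| \le C(t_2-t_1)^{1/2}$, while the difference quotient $f(h)/h = h^{-1/4}$ blows up as $h \to 0$.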

Proposition 60. If a distribution $h$ on $\mathbb{R}$ satisfies $\dot h = 0$ in the sense of distributions, then there is a constant $K$ so that $h = K$ as distributions.

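Proof. Let $\varphi$ be a test function with integral $\int_{\mathbb{R}}\varphi\,dt = 1$. Let $K = h(\varphi)$. Then for a test function $\psi$ with $\int_{\mathbb{R}}\psi\,dt = L$, compute

$$h(\psi) = h(\psi - L\varphi) + Lh(\varphi) = h(\psi - L\varphi) + LK.$$

But now

$$\int_{-\infty}^\infty(\psi - L\varphi)\,dt = L - L\cdot1 = 0,$$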

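and thus the function

$$\chi(t) = \int_{-\infty}^t[\psi(s) - L\varphi(s)]\,ds \tag{34}$$

is a smooth function with compact support. Proof: Let $\operatorname{supp}(\psi - L\varphi) \subset [T, T']$. It is clear that $\chi(t) = 0$ for $t < T$. For $t > T'$, note that $\chi'(t) = \psi(t) - L\varphi(t) = 0$, and so $\chi$ is constant on $(T',\infty)$. Since the total integral of $\psi - L\varphi$ vanishes, (34) shows that $\chi(t) \to 0$ as $t \to \infty$, and so $\chi = 0$ on $(T',\infty)$.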

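Then since $\dot\chi = \psi - L\varphi$,

$$h(\psi) = LK + h(\psi - L\varphi) = LK + h(\dot\chi) = LK - \dot h(\chi) = LK$$

since $\dot h = 0$ in the sense of distributions. But then

$$h(\psi) = LK = K\int_{\mathbb{R}}\psi\,dt = \int_{\mathbb{R}}K\psi\,dt,$$

and $h = K$ as distributions.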

Homework Problem 48. Prove Propositions 59 and 60 above for distributions on $S^1$ instead of on $\mathbb{R}$. Here are the key steps:

(a) Let $f : S^1 \to \mathbb{R}$ be an $L^2$ function, and assume that the distributional derivative $\dot f$ is $L^2$ as well. Represent $f$ and $\dot f$ as periodic functions from $\mathbb{R} \to \mathbb{R}$. For any $t \in \mathbb{R}$, define

$$g(t) = \int_0^t \dot f(s)\,ds.$$

Show that $g$ is periodic and continuous (and so defines a continuous function on $S^1$). Note that the constant function 1 is a test function on $S^1$.

(b) Show that $\dot f = \dot g$ in the sense of distributions. In other words, for every smooth periodic test function $\varphi \in \mathcal{D}(S^1)$, show that

$$\int_0^1 \dot f\varphi\,dt = -\int_0^1 g\dot\varphi\,dt.$$

(c) If $h$ is a distribution on $S^1$ which satisfies $\dot h = 0$ in the sense of distributions, show there is a constant $K$ so that $h = K$ as distributions. In other words, show that for every periodic smooth $\psi : \mathbb{R} \to \mathbb{R}$,

$$h(\psi) = \int_0^1 K\psi\,dt.$$

Now since any $L^2_1$ map from $S^1 \to X \subset \mathbb{R}^N$ is continuous, each one is in a free homotopy class of loops on $X$. With that in mind, we formulate our final version of the problem:

For $X \subset \mathbb{R}^N$ a smooth submanifold with Riemannian metric pulled back from the Euclidean metric on $\mathbb{R}^N$, define

$$L^2_1(S^1, X) = \{\gamma \in L^2_1(S^1,\mathbb{R}^N) : \gamma(S^1) \subset X\}.$$

Here we assume that $\gamma$ is continuous, as we may by Proposition 59 above.

Problem: Let $X \subset \mathbb{R}^N$ be a smooth compact manifold equipped with the Riemannian metric pulled back from the Euclidean metric on $\mathbb{R}^N$. Let $C$ be the class of loops $\gamma : S^1 \to X$ in a free homotopy class on $X$ and in $L^2_1(S^1, X)$. Find a loop of least energy in $C$.

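Proposition 61. Let $\gamma \in L^2_1(S^1, X)$ be energy minimizing in a free homotopy class on $X$, for $X \subset \mathbb{R}^N$ a smooth manifold without boundary. Then $\gamma$ solves (a version of) the geodesic equation

$$2\frac{d}{dt}\left(g_{ik}\dot\gamma^i\right) - g_{ij,k}\dot\gamma^i\dot\gamma^j = 0$$

for all $k = 1, \dots, n$ (in local coordinates on $X$, with $n = \dim X$), in the sense of distributions.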

Proof. First of all, note that we can choose $\gamma$ to be continuous by Problem 48 above. Thus it makes sense that $\gamma$ is in a free homotopy class. Since $\gamma$ minimizes energy, for each $h$ smooth with compact support so that $\gamma(\operatorname{supp}h) \subset\subset U$ for a single coordinate chart $U$ in $X$, we have

$$\left.\frac{d}{d\varepsilon}\right|_{\varepsilon=0} E(\gamma + \varepsilon h) = 0.$$

Compute the first variation as in the derivation of the Euler-Lagrange equations in Subsection 4.2 above:

$$\int_{S^1} g_{ij,k}h^k\dot\gamma^i\dot\gamma^j\,dt + \int_{S^1} g_{ij}\dot h^i\dot\gamma^j\,dt + \int_{S^1} g_{ij}\dot\gamma^i\dot h^j\,dt = 0.$$

Since the components of $h$ are smooth with compact support, they act as test functions, and we may then integrate by parts in the second and third integrals, in the sense of distributions, to conclude that

$$0 = \frac{d}{dt}\left(g_{kj}\dot\gamma^j\right) + \frac{d}{dt}\left(g_{ik}\dot\gamma^i\right) - g_{ij,k}\dot\gamma^i\dot\gamma^j = 2\frac{d}{dt}\left(g_{ik}\dot\gamma^i\right) - g_{ij,k}\dot\gamma^i\dot\gamma^j$$

in the sense of distributions.

Remark. In the previous proposition, we cannot immediately apply the usual rules of calculus, since the objects involved are only distributions. In particular, we show in the next homework problem that functions which are only continuous cannot be meaningfully multiplied by distributions which are not Borel measures.

Homework Problem 49. Note that if $\lambda : \mathbb{R}^n \to \mathbb{R}$ is a smooth function, and $f$ is a locally $L^1$ function, then the product $\lambda f$ is also a locally $L^1$ function.

(a) If $\lambda : \mathbb{R}^n \to \mathbb{R}$ is a smooth function, and $p$ is a distribution on $\mathbb{R}^n$, show that it is possible to define the product $\lambda p$ in such a way that if $p$ is induced from a locally $L^1$ function, then $\lambda p$ is induced from the usual product of two functions.

(b) Let $\delta$ be the δ-function on $\mathbb{R}$. Compute its first derivative $\dot\delta$ in the sense of distributions.

(c) Show that if $g : \mathbb{R} \to \mathbb{R}$ is a continuous function which is not differentiable at $0$, then the formula for the product developed in part (a) above does not give a reliable answer for the product $g\dot\delta$ of the continuous function $g$ and the distribution $\dot\delta$.

4.5 Hilbert spaces

Recall that a Hilbert space is a Banach space whose norm comes from a positive definite inner product. We now show that $L^2_1(S^1,\mathbb{R})$ is a Hilbert space. Recall that $L^2_1(S^1,\mathbb{R})$ consists of all $L^2$ functions on $S^1$ whose derivative in the sense of distributions is also $L^2$. This suggests a natural inner product:

$$\langle f, h\rangle_{L^2_1} = \int_{S^1} fh\,dt + \int_{S^1}\dot f\dot h\,dt.$$

Then plug in $f = h$ to find

$$\|f\|^2_{L^2_1} = \int_{S^1}|f|^2\,dt + \int_{S^1}|\dot f|^2\,dt = \langle f, f\rangle_{L^2_1},$$

and so the norm on $L^2_1$ is induced by the inner product. Below in Corollary 67, we show that any positive definite inner product defines a norm.
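For a trigonometric polynomial $f(t) = a_0 + \sum_{k\ge1}(a_k\cos 2\pi kt + b_k\sin 2\pi kt)$, orthogonality of the trigonometric functions on $[0,1]$ gives $\|f\|^2_{L^2_1} = a_0^2 + \sum_{k\ge1}(1 + 4\pi^2k^2)\frac{a_k^2 + b_k^2}{2}$, which shows how the $L^2_1$ norm penalizes high frequencies. The sketch below checks this against direct quadrature for hypothetical coefficients `A0` and `COEFFS`; the equally spaced rule with $n = 64$ points is exact for trigonometric polynomials of degree less than $64$.

```python
import math

TWO_PI = 2 * math.pi

# Hypothetical coefficients: f(t) = A0 + sum_k (a_k cos 2 pi k t + b_k sin 2 pi k t)
A0 = 0.5
COEFFS = {1: (1.0, -0.25), 3: (0.0, 0.5), 4: (0.2, 0.1)}  # k -> (a_k, b_k)

def f(t):
    return A0 + sum(a * math.cos(TWO_PI * k * t) + b * math.sin(TWO_PI * k * t)
                    for k, (a, b) in COEFFS.items())

def f_dot(t):
    return sum(TWO_PI * k * (-a * math.sin(TWO_PI * k * t) + b * math.cos(TWO_PI * k * t))
               for k, (a, b) in COEFFS.items())

def periodic_mean(g, n=64):
    """Equally spaced rule on [0, 1]; exact for trigonometric polynomials of degree < n."""
    return sum(g(j / n) for j in range(n)) / n

def l21_norm_sq_quadrature():
    """||f||^2_{L^2} + ||f'||^2_{L^2} computed directly."""
    return periodic_mean(lambda t: f(t) ** 2) + periodic_mean(lambda t: f_dot(t) ** 2)

def l21_norm_sq_coefficients():
    """a0^2 + sum_k (1 + 4 pi^2 k^2)(a_k^2 + b_k^2) / 2."""
    return A0 ** 2 + sum((1 + (TWO_PI * k) ** 2) * (a * a + b * b) / 2
                         for k, (a, b) in COEFFS.items())
```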


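Remark. $L^2_1(S^1,\mathbb{R}^N)$ is also naturally a Hilbert space, with inner product given by

$$\langle f, h\rangle_{L^2_1} = \int_{S^1}\langle f, h\rangle\,dt + \int_{S^1}\langle\dot f, \dot h\rangle\,dt,$$

where $\langle\cdot,\cdot\rangle$ is the inner product on $\mathbb{R}^N$.

It is also useful to define complex Hilbert spaces, in which the inner product $\langle\cdot,\cdot\rangle$ is Hermitian and positive definite. A Hermitian inner product on a complex vector space $V$ is a map from $V \times V \to \mathbb{C}$ which satisfies, for $\lambda \in \mathbb{C}$ and $f, g, h \in V$,

$$\langle\lambda f + g, h\rangle = \lambda\langle f, h\rangle + \langle g, h\rangle, \qquad \langle f, \lambda g + h\rangle = \bar\lambda\langle f, g\rangle + \langle f, h\rangle, \qquad \langle f, g\rangle = \overline{\langle g, f\rangle}.$$

These three conditions are respectively that the inner product is complex linear in the first slot, complex antilinear in the second slot, and conjugate symmetric. The first two conditions together are called sesquilinear.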

Then $L^2_1(S^1,\mathbb{C})$ is a complex Hilbert space with inner product

$$\langle f, g\rangle = \int_{S^1} f\bar g\,dt + \int_{S^1}\dot f\,\overline{\dot g}\,dt.$$

We can also define the Sobolev space $L^2_1(\mathbb{R}^n,\mathbb{R})$ by the inner product

$$\langle f, g\rangle = \int_{\mathbb{R}^n} fg\,dx^n + \sum_{i=1}^n\int_{\mathbb{R}^n} f_{,i}\,g_{,i}\,dx^n,$$

the derivatives taken in the sense of distributions. The elements of $L^2_1(\mathbb{R}^n,\mathbb{R})$ are then equivalence classes of functions in $L^2$ so that all the first partials in the sense of distributions are also in $L^2$.

We will work with $L^2_1(S^1,\mathbb{R})$ instead of $L^2_1(S^1,\mathbb{R}^N)$, since convergence in $L^2_1(S^1,\mathbb{R}^N)$ is equivalent to each component converging in $L^2_1(S^1,\mathbb{R})$. The proofs that follow will work with minor modifications for the spaces $L^2_1(S^1,\mathbb{R}^N)$ and $L^2_1(S^1,\mathbb{C})$.

We focus on $L^2_1(S^1,\mathbb{R})$, which we refer to simply as $L^2_1$.

Proposition 62. $L^2_1(S^1,\mathbb{R})$ is a Hilbert space.

Proof. We've exhibited an inner product on $L^2_1$, and it is easy to check that it is positive definite (if we consider elements to be equivalence classes of functions, two functions being equivalent if they agree almost everywhere). Thus the remaining thing to check is that the metric on $L^2_1(S^1,\mathbb{R})$ is complete (and so it is a Banach space).

First of all note that $f_n \to f$ in $L^2_1$ is equivalent to $f_n \to f$ in $L^2$ and $\dot f_n \to \dot f$ in $L^2$.

Let $f_n$ be a Cauchy sequence in $L^2_1$. Then by the definition of the norm, it is clear that $f_n$ and $\dot f_n$ are both Cauchy sequences in $L^2$. Since $L^2$ is complete, we have limits $f_n \to f$ and $\dot f_n \to g$ in $L^2$. In order to show that $f_n \to f$ in $L^2_1$, it suffices to show that $\dot f = g$ in the sense of distributions.

Let $\varphi$ be a test function, and note that $f_n \to f$ in $L^2$ implies by Hölder's inequality that

$$|f_n(\varphi) - f(\varphi)| = \left|\int_{S^1}(f_n - f)\varphi\,dt\right| \le \|f_n - f\|_{L^2}\|\varphi\|_{L^2} \to 0$$

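as $n \to \infty$. We use this fact for both $f_n \to f$ and $\dot f_n \to g$ to compute for a test function $\varphi$

$$\begin{aligned}
g(\varphi) = \int_{S^1} g\varphi\,dt &= \lim_{n\to\infty}\int_{S^1}\dot f_n\varphi\,dt \\
&= -\lim_{n\to\infty}\int_{S^1} f_n\dot\varphi\,dt \\
&= -\int_{S^1} f\dot\varphi\,dt \\
&= -f(\dot\varphi) = \dot f(\varphi).
\end{aligned}$$

Therefore, $g = \dot f$ in the sense of distributions.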

Remark. Essentially the same proof shows that $L^2_1(\mathbb{R}^n,\mathbb{R}^m)$ is a Hilbert space.

For a real Hilbert space $H$, an orthonormal basis is a collection of elements $\{e_\alpha\}_{\alpha\in A}$ which are orthonormal in that

$$\langle e_\alpha, e_\beta\rangle = \delta_{\alpha\beta}$$

and so that every element $v \in H$ can be written as

$$v = \sum_{\alpha\in A} v^\alpha e_\alpha$$

for $v^\alpha \in \mathbb{R}$. Here $A$ is an index set, which may be finite, countably infinite, or uncountable (and of course the convergence of any infinite sum is controlled by the norm). A Hilbert space which has a countable (finite or infinite) orthonormal basis is called separable. The following is true:

Proposition 63. Every Hilbert space has an orthonormal basis. In fact, every orthonormal set in a Hilbert space can be completed to an orthonormal basis.

We omit the proof, which is similar to the proof of the corresponding fact for vector spaces (any linearly independent set can be completed to a basis). In particular, Zorn's Lemma is needed in the case of non-separable Hilbert spaces. But see Problem 54 below for a proof of this Proposition for separable Hilbert spaces, and for a discussion of how this special case is adequate for the proofs of the results in this section.

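Theorem 22 (Pythagorean Theorem). If $v, w$ are elements of a Hilbert space $H$ and $\langle v, w\rangle = 0$, then

$$\|v\|^2 + \|w\|^2 = \|v + w\|^2.$$

Proof. Compute

$$\|v + w\|^2 = \langle v + w, v + w\rangle = \langle v, v\rangle + 2\langle v, w\rangle + \langle w, w\rangle = \|v\|^2 + \|w\|^2.$$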

Lemma 64 (Bessel's Inequality). If $e_1, \dots, e_n$ is a finite orthonormal set in $H$, then for all $y \in H$,

$$\|y\|^2 \ge \sum_{i=1}^n|\langle y, e_i\rangle|^2.$$

Proof. Check that for $w = \sum_{i=1}^n\langle y, e_i\rangle e_i$, $\langle y - w, w\rangle = 0$. Then apply the Pythagorean Theorem to $y = (y - w) + w$, and note that $\|w\|^2 = \sum_{i=1}^n|\langle y, e_i\rangle|^2$.
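The proof of Bessel's Inequality can be traced step by step in a finite-dimensional example. The sketch below assumes $H = \mathbb{R}^5$ with the dot product, builds a hypothetical orthonormal pair $e_1, e_2$ by classical Gram-Schmidt, and checks that the projection $w$ of a vector $y$ satisfies $\langle y - w, w\rangle = 0$, that the Pythagorean Theorem splits $\|y\|^2$, and hence that $\|y\|^2 \ge \sum_i|\langle y, e_i\rangle|^2$.

```python
import math

def dot(u, v):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (classical Gram-Schmidt)."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            c = dot(v, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        basis.append([wi / norm for wi in w])
    return basis

# Hypothetical data: an orthonormal set {e_1, e_2} in R^5 and a vector y.
E = gram_schmidt([[1.0, 1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 2.0, 0.0, 1.0]])
y = [3.0, -1.0, 2.0, 0.5, 4.0]

coeffs = [dot(y, e) for e in E]                                     # <y, e_i>
w = [sum(c * e[i] for c, e in zip(coeffs, E)) for i in range(len(y))]  # projection
residual = [yi - wi for yi, wi in zip(y, w)]                        # y - w
```

Here $\|y\|^2 = 30.25$, while $\sum_i|\langle y, e_i\rangle|^2 = 2 + 36/5.5 \approx 8.55$.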

Corollary 65. If $\{e_i\}$ is a countable orthonormal set, then

$$\|y\|^2 \ge \sum_{i=1}^\infty|\langle y, e_i\rangle|^2.$$

Proof. Use Bessel's Inequality and take limits of partial sums.

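Theorem 23. Given a Hilbert space $H$ with an orthonormal basis $\{e_\alpha\}_{\alpha\in A}$, for every element $v \in H$,

$$v = \sum_{\alpha\in A}\langle v, e_\alpha\rangle e_\alpha, \tag{35}$$

$$\|v\|_H^2 = \langle v, v\rangle = \sum_{\alpha\in A}|\langle v, e_\alpha\rangle|^2, \tag{36}$$

where the (possibly uncountable) sums are defined by using Homework Problem 50 below. Moreover, if there are $v^\alpha \in \mathbb{R}$ so that $\sum_{\alpha\in A}|v^\alpha|^2 < \infty$, then $v = \sum_{\alpha\in A}v^\alpha e_\alpha$ converges to an element of $H$.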

Remark. For each $v \in H$, only a countable number of the coefficients $v^\alpha = \langle v, e_\alpha\rangle$ are nonzero. This is due to the following fact:

Homework Problem 50. Let $A$ be an uncountable set, and for each $\alpha \in A$, let $x_\alpha \ge 0$.

(a) If $A' \subset A$ is a finite set, let $S_{A'} = \sum_{\alpha\in A'}x_\alpha$. Show that if the set

$$\{S_{A'} : A' \subset A \text{ is a finite set}\}$$

is bounded, then $x_\alpha = 0$ for all but countably many $\alpha \in A$.

(b) Use part (a) to define

$$\sum_{\alpha\in A}x_\alpha = \sup\{S_{A'} : A' \subset A \text{ is a finite set}\}$$

as an element of $[0,\infty]$ for any $x_\alpha \ge 0$. In particular, if the sum is finite, show that

$$\sum_{\alpha\in A}x_\alpha = \sum_{\alpha\in\tilde A}x_\alpha,$$

where $\tilde A = \{\alpha \in A : x_\alpha > 0\}$ is countable. Show that if $\tilde A$ is infinite, the right-hand sum is the usual sum of a convergent countably infinite series (for any bijection between $\tilde A$ and the natural numbers).

Hint for (a): Each $x_\alpha > 0$ satisfies $x_\alpha \in [2^n, 2^{n+1})$ for some $n \in \mathbb{Z}$. Derive a contradiction if the number of positive $x_\alpha$ is uncountable.

Remark. Note that

$$\sum_{\alpha\in A}x_\alpha = \int_A x\,dc$$

for $dc$ the counting measure on $A$. If $A' \subset A$, then the counting measure $c(A') = |A'|$ is the cardinality of $A'$ (and so $c(A') = +\infty$ when $A'$ is infinite).

Proof of Theorem 23. First assume that $v^\alpha \in \mathbb{R}$ and $\sum_{\alpha\in A}|v^\alpha|^2 < \infty$. Then Homework Problem 50 above shows that all but countably many of the $v^\alpha$ are zero, and so we may write $v$ as a countable sum $\sum_{i=1}^\infty v^ie_i$. Let $v_n = \sum_{i=1}^n v^ie_i$.

Then for $n > m$

$$\|v_n - v_m\|^2 = \left\|\sum_{i=m+1}^n v^ie_i\right\|^2 = \sum_{i=m+1}^n|v^i|^2 \le \sum_{i=m+1}^\infty|v^i|^2.$$

Here, the second equality is by the Pythagorean Theorem. Since the series $\sum_{i=1}^\infty|v^i|^2$ converges, the tail of the series $\sum_{i=m+1}^\infty|v^i|^2$ must go to zero as $m \to \infty$, and thus $v_n$ is a Cauchy sequence in $H$. Since $H$ is complete, $v_n$ converges to the limit $v \in H$.

Now let $v \in H$ and $v^\alpha = \langle v, e_\alpha\rangle$. Then Bessel's Inequality shows that for all finite subsets $A' \subset A$,

$$\sum_{\alpha\in A'}|v^\alpha|^2 \le \|v\|^2.$$

So for the collection $\{|v^\alpha|^2\}_{\alpha\in A}$, the set $S$ of finite partial sums is bounded. So Homework Problem 50 shows that all but countably many $v^\alpha = 0$. Enumerate the countable number of nonzero terms as $v^1, v^2, \dots$, and the corresponding elements of the orthonormal basis as $e_1, e_2, \dots$.

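Since the sequence $\sum_{i=1}^N|v^i|^2$ is bounded and increasing, it has a finite limit as $N \to \infty$. We have shown above that the series $\sum_{i=1}^\infty v^ie_i$ converges to a limit $v' \in H$. Compute

$$\langle v - v', e_i\rangle = \lim_{n\to\infty}\left\langle v - \sum_{j=1}^n v^je_j,\ e_i\right\rangle = v^i - \sum_j v^j\delta_{ij} = v^i - v^i = 0.$$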

And for any $e_\alpha \notin \{e_1, e_2, \dots\}$, compute

$$\langle v - v', e_\alpha\rangle = \lim_{n\to\infty}\left\langle v - \sum_{j=1}^n v^je_j,\ e_\alpha\right\rangle = 0.$$

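So for all $e_\alpha$ in the orthonormal basis,

$$\langle v, e_\alpha\rangle = \langle v', e_\alpha\rangle = \left\langle\sum_{i=1}^\infty v^ie_i,\ e_\alpha\right\rangle = v^\alpha.$$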

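Now the definition of orthonormal basis shows that there are $\tilde v^\alpha \in \mathbb{R}$ so that $\sum_{\alpha\in A}\tilde v^\alpha e_\alpha = v$. By the same analysis as above, all but countably many $\tilde v^\alpha$ are zero, and we may write $v = \sum_{i=1}^\infty \tilde v^ie_i$. Moreover, as in the previous paragraph,

$$v^\alpha = \langle v, e_\alpha\rangle = \left\langle\sum_{i=1}^\infty\tilde v^ie_i,\ e_\alpha\right\rangle = \tilde v^\alpha,$$

and so (35) is proved.

To prove (36), note that (35) shows that $v = \lim_{n\to\infty}v_n$ in $H$, for $v_n = \sum_{i=1}^n\langle v, e_i\rangle e_i$. Since the norm is continuous,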

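$$\|v\|^2 = \lim_{n\to\infty}\|v_n\|^2 = \lim_{n\to\infty}\sum_{i=1}^n|\langle v, e_i\rangle|^2 = \sum_{i=1}^\infty|\langle v, e_i\rangle|^2 = \sum_{\alpha\in A}|\langle v, e_\alpha\rangle|^2.$$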

This concludes the proof of the theorem.
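Equation (36) can be watched converging on a concrete function. The sketch assumes, without proof here, the standard fact that $\{1, \sqrt2\cos 2\pi kt, \sqrt2\sin 2\pi kt\}_{k\ge1}$ is an orthonormal basis of $L^2(S^1)$, and takes the hypothetical example $f(t) = t - \frac12$ on $[0,1)$. Its cosine and constant coefficients vanish, its coefficient on $e_k = \sqrt2\sin 2\pi kt$ is $-1/(\sqrt2\pi k)$, and $\|f\|^2 = \int_0^1(t-\frac12)^2\,dt = \frac1{12}$. The partial sums of squared coefficients increase toward $\frac1{12}$ (Bessel), with a tail of about $1/(2\pi^2N)$.

```python
import math

def coefficient(k, n=10000):
    """<f, e_k> for f(t) = t - 1/2 and e_k(t) = sqrt(2) sin(2 pi k t), by the midpoint rule."""
    h = 1.0 / n
    return h * sum(
        ((j + 0.5) * h - 0.5) * math.sqrt(2) * math.sin(2 * math.pi * k * (j + 0.5) * h)
        for j in range(n)
    )

def coefficient_exact(k):
    """Closed form: sqrt(2) * integral of t sin(2 pi k t) over [0, 1] = -1 / (sqrt(2) pi k)."""
    return -1.0 / (math.sqrt(2) * math.pi * k)

NORM_SQ = 1.0 / 12.0  # integral of (t - 1/2)^2 over [0, 1]

def bessel_partial_sum(N):
    """Sum of |<f, e_k>|^2 for k = 1, ..., N."""
    return sum(coefficient_exact(k) ** 2 for k in range(1, N + 1))
```

Since $\sum_k \frac1{2\pi^2k^2} = \frac1{2\pi^2}\cdot\frac{\pi^2}{6} = \frac1{12}$, this example also recovers $\sum_k 1/k^2 = \pi^2/6$ from (36).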

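Corollary 66. If $v = \sum_{i=1}^\infty v^ie_i$ and $w = \sum_{i=1}^\infty w^ie_i$ for $\{e_i\}$ an orthonormal basis of a separable Hilbert space, then

$$\langle v, w\rangle = \sum_{i=1}^\infty v^iw^i.$$

Proof. Compute

$$\|v + w\|^2 = \|v\|^2 + 2\langle v, w\rangle + \|w\|^2,$$

$$\begin{aligned}
\langle v, w\rangle &= \tfrac12\left[\|v + w\|^2 - \|v\|^2 - \|w\|^2\right] \\
&= \tfrac12\sum_{i=1}^\infty\left[(v^i + w^i)^2 - (v^i)^2 - (w^i)^2\right] \\
&= \sum_{i=1}^\infty v^iw^i.
\end{aligned}$$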

Remark. The formula for a complex Hilbert space is

$$\langle v, w\rangle = \sum_{i=1}^\infty v^i\,\overline{w^i}.$$

Remark. Homework Problem 50 shows that this result still holds for non-separable Hilbert spaces, since the number of basis elements with nonzero coefficients for $v$ and/or $w$ is countable.

Here is another basic result in Hilbert spaces:

Homework Problem 51 (Cauchy-Schwarz Inequality). If $v, w$ are elements of a real Hilbert space $H$, then $|\langle v, w\rangle| \le \|v\|\|w\|$, and there is equality if and only if $v$ and $w$ are linearly dependent.

Hint: Use calculus to compute the minimum value of $\|tv + w\|^2$ as a function of $t$, and note the minimum value must be nonnegative.

Remark. The Cauchy-Schwarz Inequality is also true for complex Hilbert spaces, but for the proof, note that the minimum value of $\|te^{i\theta}v + w\|^2$, for $t \in \mathbb{R}$ and $\theta$ so that $e^{i\theta}\langle v, w\rangle = |\langle v, w\rangle|$, is nonnegative.

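Corollary 67. Any positive definite inner product on a real vector space $V$ produces a norm by the formula $\|v\|^2 = \langle v, v\rangle$.

Proof. The main thing to check is the triangle inequality. Let $v, w \in V$ and note that

$$\begin{aligned}
\|v + w\| \le \|v\| + \|w\| &\iff \|v + w\|^2 \le \|v\|^2 + 2\|v\|\|w\| + \|w\|^2 \\
&\iff \|v\|^2 + 2\langle v, w\rangle + \|w\|^2 \le \|v\|^2 + 2\|v\|\|w\| + \|w\|^2 \\
&\iff \langle v, w\rangle \le \|v\|\|w\|,
\end{aligned}$$

which holds by the Cauchy-Schwarz Inequality (whose proof uses only the inner product, not completeness).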

The main results we will use regarding Hilbert spaces involve another topology on the Hilbert space which is different from the topology defined by the metric. The usual metric convergence of sequences is called strong convergence. So a sequence $v_i \to v$ in $H$ strongly if

$$\|v_i - v\|_H \to 0,$$

and we write $v_i \to v$. On the other hand, a sequence $v_i \in H$ is weakly convergent to a limit $v \in H$ if

$$\langle v_i, w\rangle \to \langle v, w\rangle \quad\text{for all } w \in H,$$

and we write $v_i \rightharpoonup v$. If $v_i \to v$, then $v_i \rightharpoonup v$ (Homework Problem 52 below), but the converse is not true in general, as the following example shows:

Example 19. Let $H$ be a Hilbert space with a countably infinite orthonormal basis $e_1, e_2, \dots$. Then $e_i \rightharpoonup 0$ in $H$, but $e_i$ does not converge strongly.

Proof. Let $w \in H$. Then since $\|w\|^2 = \sum_{i=1}^\infty|\langle w, e_i\rangle|^2 < \infty$, we must have each term $|\langle w, e_i\rangle|^2 \to 0$ as $i \to \infty$. This shows $e_i \rightharpoonup 0$ weakly in $H$ as $i \to \infty$.

To show $e_i$ does not converge strongly, note that

$$\|e_i - e_j\|_H = \sqrt2 \quad\text{for } i \ne j$$

by the Pythagorean Theorem. Thus $e_i$ cannot be a Cauchy sequence in $H$, and thus cannot converge strongly.
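Example 19 can be made concrete in $\ell^2$, the space of square-summable sequences with its standard basis. The sketch assumes the hypothetical test vector $w_j = 1/j$, which lies in $\ell^2$ since $\sum 1/j^2 = \pi^2/6$. Basis vectors are stored sparsely as dictionaries, so only finitely many coordinates are ever touched. The pairings $\langle e_i, w\rangle = 1/i$ tend to $0$, while the distances $\|e_i - e_j\|$ stay equal to $\sqrt2$.

```python
import math

def w(j):
    """Hypothetical fixed element of l^2: w_j = 1/j for j = 1, 2, ..."""
    return 1.0 / j

def e(i):
    """Standard basis vector e_i, stored sparsely as {index: coefficient}."""
    return {i: 1.0}

def inner_with_w(v):
    """<v, w> for a finitely supported v."""
    return sum(c * w(j) for j, c in v.items())

def distance(u, v):
    """||u - v|| for finitely supported u, v."""
    keys = set(u) | set(v)
    return math.sqrt(sum((u.get(j, 0.0) - v.get(j, 0.0)) ** 2 for j in keys))

weak_pairings = [inner_with_w(e(i)) for i in (1, 10, 100, 1000)]
```

The weak limit $0$ has norm $0$, while every $e_i$ has norm $1$, so norms can drop in a weak limit.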

Homework Problem 52. Show that if $v_i \to v$ converges strongly in a Hilbert space $H$, then $v_i \rightharpoonup v$ weakly in $H$.

Hint: Use Cauchy-Schwarz.

Theorem 24. Let $v_i$ be a sequence in a Hilbert space $H$ satisfying $\|v_i\| \le K$ for a uniform constant $K$. Then there is a subsequence converging weakly to a limit $v$ which satisfies $\|v\| \le K$. In other words, the closed ball of radius $K$ is (sequentially) compact in the weak topology on $H$.

Proof. Let $\{e_\alpha\}_{\alpha\in A}$ be an orthonormal basis. Problem 50 shows that for each of $v_1, v_2, \dots$, only a countable subset $A_{v_1}, A_{v_2}, \dots \subset A$ have nonzero coefficients in the orthonormal decomposition. Then the union

$$\bigcup_{i=1}^\infty A_{v_i}$$

is also countable, and it represents all the basis elements with nonvanishing coefficients for all the $v_i$. Enumerate these elements as $e_1, e_2, \dots$, and write

$$v_i = \sum_{j=1}^\infty v_i^je_j.$$

Since there is a constant $K$ so that $\|v_i\| \le K$, Theorem 23 shows for each $N$

$$\sum_{j=1}^N|v_i^j|^2 \le K^2. \tag{37}$$

Thus, since the interval $[-K,K] \subset \mathbb{R}$ is compact, there is a subsequence ${}^1v_i$ of $v_i$ so that

$$\lim_{i\to\infty}{}^1v_i^1 = v^1 \in [-K,K].$$

Now there is a subsequence ${}^2v_i$ of ${}^1v_i$ so that

$$\lim_{i\to\infty}{}^2v_i^1 = v^1, \qquad \lim_{i\to\infty}{}^2v_i^2 = v^2, \qquad |v^1|^2 + |v^2|^2 \le K^2.$$

This is because ${}^1v_i^2 \in [-K,K]$, which is compact, and the bound follows from (37). Recursively, we may define for each $N$ a subsequence ${}^Nv_i$ and a real number $v^N$ so that

$${}^Nv_i \text{ is a subsequence of } {}^{(N-1)}v_i,$$

$$\lim_{i\to\infty}{}^Nv_i^j = v^j \quad\text{for } j = 1, \dots, N, \tag{38}$$

$$|v^1|^2 + \cdots + |v^N|^2 \le K^2. \tag{39}$$

We use a diagonalization procedure to find a weakly convergent subsequence. The diagonal sequence ${}^iv_i$ is a subsequence of $v_i$, and we will show that it converges weakly to $v = \sum_{i=1}^\infty v^ie_i$ and that $v \in H$. Note by construction that ${}^iv_i^j \to v^j$ as $i \to \infty$ for each $j = 1, 2, \dots$. This is because, for each $j$, $\{{}^iv_i\}_{i=j}^\infty$ is a subsequence of $\{{}^jv_i\}_{i=1}^\infty$, and by condition (38).

That $v \in H$ follows directly from (39) and Theorem 23. Now we show ${}^iv_i \rightharpoonup v$ weakly in $H$. Let $w \in H$, and let $\varepsilon > 0$. Write

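$$\begin{aligned}
|\langle {}^iv_i, w\rangle - \langle v, w\rangle| &= |\langle {}^iv_i - v, w\rangle| \\
&\le \sum_{\alpha\in A} |({}^iv_i^\alpha - v^\alpha) w^\alpha| \\
&= \sum_{j=1}^{\infty} |({}^iv_i^j - v^j) w^j| \\
&= \sum_{j=1}^{n} |({}^iv_i^j - v^j) w^j| + \sum_{j=n+1}^{\infty} |({}^iv_i^j - v^j) w^j|
\end{aligned}$$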

for all $n$. Here the third line follows from the second since ${}^iv_i^\alpha = v^\alpha = 0$ if $e_\alpha \notin \{e_1, e_2, \dots\}$. Since $\|{}^iv_i\| \le K$ and $\|v\| \le K$, then $\|{}^iv_i - v\| \le 2K$, and Cauchy-Schwarz shows that

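$$\begin{aligned}
\sum_{j=n+1}^{\infty} |({}^iv_i^j - v^j) w^j| &\le \Big(\sum_{j=n+1}^{\infty} |{}^iv_i^j - v^j|^2\Big)^{1/2} \Big(\sum_{j=n+1}^{\infty} |w^j|^2\Big)^{1/2} \\
&\le \Big(\sum_{j=1}^{\infty} |{}^iv_i^j - v^j|^2\Big)^{1/2} \Big(\sum_{j=n+1}^{\infty} |w^j|^2\Big)^{1/2} \\
&\le 2K \Big(\sum_{j=n+1}^{\infty} |w^j|^2\Big)^{1/2}.
\end{aligned}$$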

Since $w \in H$, $\sum_{j=1}^{\infty} |w^j|^2 \le \sum_{\alpha\in A} |w^\alpha|^2 = \|w\|^2$ converges, and there is an $n$ so that

$$\Big(\sum_{j=n+1}^{\infty} |w^j|^2\Big)^{1/2} < \varepsilon.$$

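Now for $j = 1, 2, \dots, n$, each ${}^iv_i^j \to v^j$ as $i \to \infty$. Choose $\varepsilon' > 0$ so that $(|w^1| + \cdots + |w^n|)\varepsilon' < \varepsilon$; then we may choose an $I$ so that $|{}^iv_i^j - v^j| < \varepsilon'$ for all $i \ge I$ and all $j = 1, \dots, n$. Therefore, for $i \ge I$,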

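$$\begin{aligned}
|\langle {}^iv_i, w\rangle - \langle v, w\rangle| &\le \sum_{j=1}^{n} |({}^iv_i^j - v^j) w^j| + \sum_{j=n+1}^{\infty} |({}^iv_i^j - v^j) w^j| \\
&\le \varepsilon'(|w^1| + \cdots + |w^n|) + 2K\varepsilon \\
&< \varepsilon + 2K\varepsilon.
\end{aligned}$$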

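Since $\varepsilon > 0$ is arbitrary and $K$ is independent of $i$ and $\varepsilon$, $\langle {}^iv_i, w\rangle \to \langle v, w\rangle$ as $i \to \infty$, and thus ${}^iv_i \rightharpoonup v$ weakly in $H$.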

Theorem 25. Let $v_i \rightharpoonup v$ in a Hilbert space. Then

$$\|v\| \le \liminf_{i\to\infty} \|v_i\|.$$

In other words, the Hilbert space norm is lower semicontinuous under weak convergence.

Proof. The proof is to translate the current problem into Fatou's Lemma. Let $\{e_\alpha\}_{\alpha\in A}$ be an orthonormal basis of our Hilbert space $H$. Then put the counting measure $c$ on the index set $A$. Let $f\colon A \to [0, \infty)$, $f\colon \alpha \mapsto f^\alpha$ be a nonnegative real-valued function on $A$. Then it is straightforward to check that

$$\int_A f\, dc = \sum_{\alpha\in A} f^\alpha,$$

and thus each sum may be thought of as an integral with respect to the counting measure.

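In our case, if

$$v_i = \sum_{\alpha\in A} v_i^\alpha e_\alpha, \qquad v = \sum_{\alpha\in A} v^\alpha e_\alpha,$$

we may view $v_i$ as a function from $A \to \mathbb{R}$ by $v_i\colon \alpha \mapsto v_i^\alpha$. (The same holds for $v$.) Theorem 23 shows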

$$\|v_i\|^2 = \sum_{\alpha\in A} |v_i^\alpha|^2, \qquad \|v\|^2 = \sum_{\alpha\in A} |v^\alpha|^2. \qquad (40)$$

Now since $v_i \rightharpoonup v$,

$$v_i^\alpha = \langle v_i, e_\alpha\rangle \to \langle v, e_\alpha\rangle = v^\alpha$$

as $i \to \infty$ for all $\alpha$. Thus, with respect to the counting measure on $A$, $v_i \to v$ everywhere on $A$. Since each term in the sums in (40) is nonnegative and $|v_i^\alpha|^2 \to |v^\alpha|^2$ for each $\alpha$, the functions $|v_i|^2$ satisfy the hypotheses of Fatou's Lemma with respect to the counting measure $c$, and so

$$\liminf_{i\to\infty} \|v_i\|^2 = \liminf_{i\to\infty} \sum_{\alpha\in A} |v_i^\alpha|^2 = \liminf_{i\to\infty} \int_A |v_i|^2\, dc \ge \int_A |v|^2\, dc = \sum_{\alpha\in A} |v^\alpha|^2 = \|v\|^2.$$
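The inequality in Theorem 25 can be strict. As a sketch under stated assumptions (a truncation of $\ell^2$ to `DIM` coordinates, and the illustrative choices $v = (\tfrac12, \tfrac14, \tfrac18, \dots)$ and $v_i = v + e_i$), the sequence $v_i$ converges weakly to $v$, while $\|v_i\|^2 = \|v\|^2 + 2v^i + 1 \to \|v\|^2 + 1$.

```python
DIM = 60  # hypothetical truncation of l^2 to its first DIM coordinates

v = [0.5 ** j for j in range(1, DIM + 1)]   # v = (1/2, 1/4, 1/8, ...)
w = [1.0 / j for j in range(1, DIM + 1)]    # a fixed test vector for the weak pairing

def norm_sq(x):
    return sum(a * a for a in x)

def v_seq(i):
    """v_i = v + e_i (1-indexed)."""
    x = list(v)
    x[i - 1] += 1.0
    return x

def pairing_gap(i):
    """<v_i - v, w>, which equals w_i = 1/i and so tends to 0 (weak convergence)."""
    return sum((a - b) * c for a, b, c in zip(v_seq(i), v, w))

print(norm_sq(v))            # close to 1/3
print(norm_sq(v_seq(DIM)))   # close to 1/3 + 1: the norms do not converge to ||v||
print(pairing_gap(DIM))      # close to 1/60
```

Here $\liminf \|v_i\|^2 = \|v\|^2 + 1 > \|v\|^2$: weak limits can lose norm, which is why the theorem is only an inequality.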

Note that the above proofs depend heavily on the existence of an orthonormal basis, Proposition 63, which we did not prove. The following problem outlines a standard procedure for getting around the proof of Proposition 63, by proving the existence of an orthonormal basis for any Hilbert space with a countable spanning set. A subset $S$ of a Hilbert space $H$ is said to be a spanning set if the (strong) closure of finite linear combinations of elements in $S$ is equal to all of $H$. For example, in the proof of Theorem 24, we need only deal with the closure $H'$ of the span of $v_1, v_2, \dots$. The existence of an orthonormal basis of $H'$ is sufficient for the proof of Theorem 24.

Homework Problem 53. Show that any strongly closed linear subspace of a Hilbert space $H$ is again a Hilbert space (with the same inner product).

We say a subset $\{v_\alpha\}_{\alpha\in A} \subset H$ is linearly independent (in the sense of Banach spaces) if any convergent sum

$$\sum_{\alpha\in A} b_\alpha v_\alpha = 0$$

implies $b_\alpha = 0$ for all $\alpha \in A$. Note in particular that the implication holds for any finite sum (and thus linear independence in this Banach-space sense implies linear independence in the usual vector-space sense).

Homework Problem 54 (Gram-Schmidt Orthogonalization).

(a) Let $H$ be a Hilbert space with a countable spanning set $\{v_1, v_2, \dots\}$, which is finite or countably infinite. Show that there is a subset of $\{v_1, v_2, \dots\}$ which is a linearly independent spanning set of $H$.

(b) Given a linearly independent spanning set $\{v_1, v_2, \dots\}$ of a Hilbert space $H$, define $f_i$ and $e_i$ recursively by

$$f_1 = v_1, \quad e_1 = \frac{f_1}{\|f_1\|}, \qquad f_2 = v_2 - \langle v_2, e_1\rangle e_1, \quad e_2 = \frac{f_2}{\|f_2\|}, \qquad f_n = v_n - \sum_{i=1}^{n-1} \langle v_n, e_i\rangle e_i, \quad e_n = \frac{f_n}{\|f_n\|}.$$

Show that this recursive definition can be carried out (in particular, show that $f_n \neq 0$). Then show that $\{e_1, e_2, \dots\}$ is an orthonormal basis for $H$. In other words, show that $\langle e_i, e_j\rangle = \delta_{ij}$ and that any $v$ in $H$ can be written as a convergent sum $v = \sum_{i=1}^{\infty} v^i e_i$.
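The recursion in part (b) can be run directly in the finite-dimensional case. The following Python sketch is an illustration in $\mathbb{R}^3$ only (the input vectors, the test vector, and the tolerance `tol` used to detect $f_n = 0$ are assumptions for the example); it follows the recursion $f_n = v_n - \sum_{i<n} \langle v_n, e_i\rangle e_i$, $e_n = f_n/\|f_n\|$ literally.

```python
import math

def inner(x, y):
    """Euclidean inner product on R^n."""
    return sum(a * b for a, b in zip(x, y))

def gram_schmidt(vectors, tol=1e-12):
    """Orthonormalize v_1, ..., v_m in R^n via
    f_n = v_n - sum_{i<n} <v_n, e_i> e_i,  e_n = f_n / ||f_n||.
    Raises ValueError if some ||f_n|| <= tol (numerically dependent input)."""
    basis = []
    for v in vectors:
        f = list(v)
        for e in basis:
            c = inner(v, e)                          # coefficient <v_n, e_i>
            f = [fj - c * ej for fj, ej in zip(f, e)]
        norm_f = math.sqrt(inner(f, f))
        if norm_f <= tol:
            raise ValueError("f_n = 0: the vectors are linearly dependent")
        basis.append([fj / norm_f for fj in f])      # e_n = f_n / ||f_n||
    return basis

es = gram_schmidt([[1.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])

# Every v expands as sum_i <v, e_i> e_i, since the e_i span R^3.
v = [3.0, -1.0, 2.0]
recon = [sum(inner(v, e) * e[j] for e in es) for j in range(3)]
print(recon)  # approximately [3.0, -1.0, 2.0]
```

The `ValueError` branch corresponds to the case $f_n = 0$ that the problem asks you to rule out for a linearly independent input.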


The use of the previous problem isn't strictly necessary for our purposes, as $L^2_1(S^1, \mathbb{R})$ is separable (though we won't prove that it is).

Recall that for every Banach space $B$, the dual Banach space $B^*$ is the space of all continuous linear functionals $\lambda\colon B \to \mathbb{R}$, with norm given by

$$\|\lambda\|_{B^*} = \sup_{x\in B\setminus\{0\}} \frac{|\lambda(x)|}{\|x\|_B}.$$

Also recall that for any $p \in (1, \infty)$, the dual Banach space of $L^p(\mathbb{R}^n)$ is $L^q(\mathbb{R}^n)$ for $p^{-1} + q^{-1} = 1$. Thus $L^2(\mathbb{R}^n)$ is dual to itself. This fact is true for all Hilbert spaces, as the following problem shows in the separable case.

Homework Problem 55. Let $H$ be a separable real Hilbert space. Show that the dual Banach space $H^*$ is naturally equal to $H$. In particular, the inner product provides a map from $H \to H^*$ by

$$x \mapsto \lambda_x = \langle \cdot, x\rangle.$$

Show that this map preserves the norm, is one-to-one and onto.

Hint: The most significant step is showing the map is onto. First reduce to the case $\lambda \neq 0$. Show that $L = \lambda^{-1}(0)$ is also a separable Hilbert space, and let $\{e_i\}$ be an orthonormal basis for $L$. Let $y \notin L$ and use a version of Gram-Schmidt to show we may assume $y \perp L$. Construct $x$ from $y$ and $\lambda$.

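A sequence $\lambda_i$ in the dual space $B^*$ of a Banach space $B$ converges to $\lambda \in B^*$ in the weak* topology if $\lambda_i(x) \to \lambda(x)$ for every $x \in B$. (A sequence $v_i \in B$ converges weakly to $v \in B$ if $\lambda(v_i) \to \lambda(v)$ for every $\lambda \in B^*$; under the identification $H = H^*$ of the previous problem, the two notions agree on a Hilbert space.) The previous problem therefore shows that Theorem 24, at least for separable $H$, is a special case of the following more general theorem about Banach spaces:

Theorem 26 (Banach-Alaoglu). Let $B$ be a Banach space. The closed unit ball $\{\lambda \in B^* : \|\lambda\|_{B^*} \le 1\}$ of the dual space is compact in the weak* topology. If in addition $B$ is separable, then every sequence $\lambda_i$ in this unit ball has a subsequence $\lambda_{i_j}$ and a limit $\lambda \in B^*$ with $\|\lambda\|_{B^*} \le 1$ so that $\lambda_{i_j}(x) \to \lambda(x)$ as $j \to \infty$ for all $x \in B$.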

Example 20 (Fourier series). In the following theorem, we compute perhaps the easiest nontrivial example of an orthonormal basis on an infinite-dimensional Hilbert space. $L^2(S^1, \mathbb{C})$ is a complex Hilbert space with inner product given by

$$\langle f, g\rangle = \int_{S^1} f\, \overline{g}\, dt.$$

Theorem 27. The complex exponential functions

$$\{e^{2\pi i k t} : k \in \mathbb{Z}\}$$

form an orthonormal basis of $L^2(S^1, \mathbb{C})$.

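Proof. It is clear that each $e^{2\pi i k t} \in L^2(S^1, \mathbb{C})$, and we compute

$$\langle e^{2\pi i k t}, e^{2\pi i \ell t}\rangle = \int_{S^1} e^{2\pi i k t}\, \overline{e^{2\pi i \ell t}}\, dt = \int_0^1 e^{2\pi i (k-\ell) t}\, dt = \begin{cases} \dfrac{e^{2\pi i (k-\ell) t}}{2\pi i (k-\ell)}\Big|_0^1 = 0 & \text{if } k \neq \ell, \\[2ex] \displaystyle\int_0^1 dt = 1 & \text{if } k = \ell. \end{cases}$$

Therefore, $\{e^{2\pi i k t}\}_{k=-\infty}^{\infty}$ forms an orthonormal set in $L^2(S^1, \mathbb{C})$.

We must show that every element $f \in L^2(S^1, \mathbb{C})$ can be written as a Fourier series

$$f = \sum_{k=-\infty}^{\infty} \langle f, e^{2\pi i k t}\rangle e^{2\pi i k t},$$

with the convergence in the $L^2$ sense.

First, we address this problem for smooth functions $f \in C^\infty(S^1, \mathbb{C})$. Recall that $C^\infty(S^1, \mathbb{C})$ is dense in $L^2(S^1, \mathbb{C})$ (which may be proved by mollifying $L^2$ functions).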
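The orthonormality computation at the start of this proof can be sanity-checked on a computer. In this Python sketch (an illustration only; the sample count `N_SAMPLES` is an arbitrary choice), the integral over $S^1 = [0, 1)$ is replaced by an equally spaced Riemann sum. For such a sum the relations $\langle e^{2\pi i k t}, e^{2\pi i \ell t}\rangle = \delta_{k\ell}$ hold exactly, up to rounding, whenever $|k - \ell| <$ `N_SAMPLES`, mirroring the computation above.

```python
import cmath
import math

N_SAMPLES = 64  # hypothetical number of equally spaced points on [0, 1)

def inner(f, g, n=N_SAMPLES):
    """Riemann-sum approximation of <f, g> = integral over [0, 1) of f(t) * conj(g(t))."""
    return sum(f(m / n) * g(m / n).conjugate() for m in range(n)) / n

def exp_k(k):
    """The function t -> e^{2 pi i k t}."""
    return lambda t: cmath.exp(2j * math.pi * k * t)

# Gram matrix of e^{2 pi i k t}, k = -3, ..., 3; it should be the identity.
gram = {(k, l): inner(exp_k(k), exp_k(l)) for k in range(-3, 4) for l in range(-3, 4)}
print(max(abs(gram[(k, l)] - (1 if k == l else 0)) for (k, l) in gram))  # rounding error only
```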

Lemma 68. If $f \in C^\infty(S^1, \mathbb{C})$, then for every polynomial $P = P(k)$,

$$\lim_{k\to\infty} P(k)\langle f, e^{2\pi i k t}\rangle = \lim_{k\to-\infty} P(k)\langle f, e^{2\pi i k t}\rangle = 0.$$

Proof. We use the following claim: For any $L^2$ function $f$, the Fourier coefficients $\langle f, e^{2\pi i k t}\rangle \to 0$ as $k \to \pm\infty$. This follows from Bessel's Inequality

$$\sum_{k=-\infty}^{\infty} |\langle f, e^{2\pi i k t}\rangle|^2 \le \|f\|_{L^2}^2 < \infty.$$

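If $f$ is smooth, then $\dot f$ is also smooth (and thus is in $L^2$), and integration by parts gives us

$$\langle \dot f, e^{2\pi i k t}\rangle = \int_0^1 \dot f\, e^{-2\pi i k t}\, dt = -\int_0^1 f\, \frac{d}{dt}\big(e^{-2\pi i k t}\big)\, dt + f(t) e^{-2\pi i k t}\Big|_0^1 = 2\pi i k \langle f, e^{2\pi i k t}\rangle + 0,$$

where the boundary term vanishes because $f$ is periodic.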

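Now by the claim, $\langle \dot f, e^{2\pi i k t}\rangle = 2\pi i k \langle f, e^{2\pi i k t}\rangle \to 0$ as $k \to \pm\infty$. Applying the same computation to the higher derivatives gives $\langle f^{(n)}, e^{2\pi i k t}\rangle = (2\pi i k)^n \langle f, e^{2\pi i k t}\rangle$, and so we may use induction to show that

$$\lim_{k\to\pm\infty} k^n \langle f, e^{2\pi i k t}\rangle = 0 \quad \text{for each } n = 0, 1, 2, \dots.$$

Thus any polynomial $P(k)$ times the Fourier coefficients also goes to zero as $k \to \pm\infty$.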
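Lemma 68 can be observed numerically. The sketch below is an illustration only: the test function $f(t) = e^{\cos 2\pi t}$ and the sample count are arbitrary choices, and the coefficients are approximated by an equally spaced Riemann sum, which is very accurate for smooth periodic functions. The products $k^4 |\langle f, e^{2\pi i k t}\rangle|$ shrink rapidly, as the lemma predicts.

```python
import cmath
import math

def fourier_coeff(f, k, n_samples=64):
    """Approximate <f, e^{2 pi i k t}> = integral_0^1 f(t) e^{-2 pi i k t} dt by an
    equally spaced Riemann sum (the trapezoid rule for a 1-periodic f)."""
    total = sum(f(m / n_samples) * cmath.exp(-2j * math.pi * k * m / n_samples)
                for m in range(n_samples))
    return total / n_samples

def f(t):
    """Illustrative smooth 1-periodic function."""
    return math.exp(math.cos(2 * math.pi * t))

for k in (2, 4, 6, 8, 10, 12):
    c = abs(fourier_coeff(f, k))
    print(k, c, k ** 4 * c)   # k^4 |<f, e^{2 pi i k t}>| decreases rapidly
```

Replacing `k ** 4` by any higher power gives the same qualitative picture, which is the content of the lemma.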

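The previous lemma shows that for any smooth function $f \in C^\infty(S^1, \mathbb{C})$, the Fourier series

$$g(t) = \sum_{k=-\infty}^{\infty} \langle f, e^{2\pi i k t}\rangle e^{2\pi i k t}$$

converges uniformly: This is because there is a constant $C > 0$ so that

$$|\langle f, e^{2\pi i k t}\rangle| \le \frac{C}{1 + k^2}$$

(why?), which shows that the $C^0$ norm of the Fourier series satisfies

$$\sum_{k=-\infty}^{\infty} \big\|\langle f, e^{2\pi i k t}\rangle e^{2\pi i k t}\big\|_{C^0} \le \sum_{k=-\infty}^{\infty} \frac{C}{1 + k^2} < \infty.$$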

So the sup norms of the tails of the series

$$\sum_{k=-\infty}^{\infty} \langle f, e^{2\pi i k t}\rangle e^{2\pi i k t}$$

must go to zero, as they are bounded by the tails of an absolutely convergent series.

Therefore, uniform convergence implies that $g(t)$ is continuous (and thus is in $L^2$ as well; why?). (In fact, $g(t)$ is smooth; see Homework Problem 58 below.) If we let

$$h(t) = f(t) - g(t) = f(t) - \sum_{k=-\infty}^{\infty} \langle f, e^{2\pi i k t}\rangle e^{2\pi i k t},$$

then by the same techniques as in the proof of Theorem 23 above, we see that

$$\langle h, e^{2\pi i k t}\rangle = 0 \quad \text{for all } k \in \mathbb{Z}.$$

The following lemma shows that $h = 0$:

Lemma 69. If $h \in C^0(S^1, \mathbb{C})$ and all of its Fourier coefficients $\langle h, e^{2\pi i k t}\rangle$ vanish, then $h = 0$ identically.

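Proof. We prove by contradiction. If $h$ is not identically zero, then there is a point $\tau \in S^1$ at which $h(\tau) \neq 0$. Then we know that at least one of the following is true:

$$\operatorname{Re} h(\tau) > 0, \quad \operatorname{Re} h(\tau) < 0, \quad \operatorname{Im} h(\tau) > 0, \quad \operatorname{Im} h(\tau) < 0.$$

Assume that $\operatorname{Re} h(\tau) > 0$ (the other cases are similar), and let $\alpha(t) = \operatorname{Re} h(t)$. Since $\alpha$ is continuous, there is a $\delta > 0$ (which, after shrinking, we may take with $\delta < \tfrac12$) so that

$$\alpha(t) > \tfrac12 \alpha(\tau) > 0 \quad \text{if } t \in (\tau - \delta, \tau + \delta).$$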

We will construct an approximate bump function to prove a contradiction. For $n$ a positive integer, define

$$b_n(t) = \Big[\tfrac12 + \tfrac12 \cos 2\pi(t - \tau)\Big]^n = \Big[\tfrac12 + \tfrac14 e^{-2\pi i \tau} e^{2\pi i t} + \tfrac14 e^{2\pi i \tau} e^{-2\pi i t}\Big]^n.$$

It is obvious that $b_n(t)$ is real-valued, periodic with period 1 (and so defines a function on $S^1$), and is equal to a finite Fourier series. Moreover, note that

$$\tfrac12 + \tfrac12 \cos 2\pi(t - \tau) \in [0, 1]$$

always, and is equal to 1 only if $t = \tau$ in $S^1$. Thus the powers $b_n(t) \to 0$ as $n \to \infty$ away from $t = \tau$, while $b_n(\tau) = 1$ for all $n$. This is the property that makes $b_n$ similar to bump functions centered around $t = \tau$.
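To see concretely that $b_n$ is a finite Fourier series, write $\tfrac12 + \tfrac12\cos\theta = \cos^2(\theta/2) = \big(\tfrac{e^{i\theta/2} + e^{-i\theta/2}}{2}\big)^2$ and expand by the binomial theorem:

$$b_n(t) = \sum_{k=-n}^{n} 4^{-n}\binom{2n}{n+k} e^{-2\pi i k\tau} e^{2\pi i k t}.$$

The Python sketch below (the values of `tau`, `n`, and the test points are arbitrary choices for illustration) compares this expansion with the closed form and shows the concentration of $b_n$ near $t = \tau$.

```python
import cmath
import math

def b_direct(n, t, tau):
    """b_n(t) = (1/2 + 1/2 cos 2 pi (t - tau))^n, evaluated from the definition."""
    return (0.5 + 0.5 * math.cos(2 * math.pi * (t - tau))) ** n

def b_coeffs(n, tau):
    """Coefficient of e^{2 pi i k t} in b_n, |k| <= n: C(2n, n+k) / 4^n * e^{-2 pi i k tau}."""
    return {k: math.comb(2 * n, n + k) / 4 ** n * cmath.exp(-2j * math.pi * k * tau)
            for k in range(-n, n + 1)}

def b_series(n, t, tau):
    """b_n(t) evaluated from its finite Fourier series (2n + 1 terms)."""
    return sum(c * cmath.exp(2j * math.pi * k * t) for k, c in b_coeffs(n, tau).items())

tau = 0.3
for n in (1, 10, 100):
    # value at the peak t = tau, and at distance 0.1 from the peak
    print(n, b_direct(n, tau, tau), b_direct(n, tau + 0.1, tau))
```

Only the $2n+1$ frequencies $|k| \le n$ appear, which is why $\langle h, b_n\rangle$ is a finite combination of Fourier coefficients of $h$.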


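Now compute (since $b_n$ is real, $\langle h, b_n\rangle = \int_{S^1} h\, b_n\, dt$)

$$\begin{aligned}
|\operatorname{Re}\langle h, b_n\rangle| &= \Big|\operatorname{Re}\int_{S^1} h(t) b_n(t)\, dt\Big| = \Big|\int_{S^1} \alpha(t) b_n(t)\, dt\Big| \\
&= \Big|\int_{\tau-\delta}^{\tau+\delta} \alpha(t) b_n(t)\, dt + \int_{S^1\setminus[\tau-\delta, \tau+\delta]} \alpha(t) b_n(t)\, dt\Big| \\
&\ge \Big|\int_{\tau-\delta}^{\tau+\delta} \alpha(t) b_n(t)\, dt\Big| - \Big|\int_{S^1\setminus[\tau-\delta, \tau+\delta]} \alpha(t) b_n(t)\, dt\Big| \\
&> \int_{\tau-\delta/2}^{\tau+\delta/2} \alpha(t) b_n(t)\, dt - \Big|\int_{S^1\setminus[\tau-\delta, \tau+\delta]} \alpha(t) b_n(t)\, dt\Big|.
\end{aligned}$$

(Note the last inequality follows since the integrand $\alpha b_n$ is positive on $(\tau - \delta, \tau + \delta)$.) Also, we have the following bounds: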

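$$\begin{aligned}
t \in [\tau - \tfrac{\delta}{2}, \tau + \tfrac{\delta}{2}] &\implies \alpha(t) > \tfrac12 \alpha(\tau) > 0, \quad b_n(t) \ge \big(\tfrac12 + \tfrac12 \cos \pi\delta\big)^n, \\
t \notin [\tau - \delta, \tau + \delta] &\implies |\alpha(t)| < C, \quad b_n(t) \le \big(\tfrac12 + \tfrac12 \cos 2\pi\delta\big)^n
\end{aligned}$$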

for some constant $C$ (since $\alpha$ is continuous on the compact space $S^1$). The bounds on $b_n$ follow by examining the graph of the cosine function. The key point is that

$$\tfrac12 + \tfrac12 \cos \pi\delta > \tfrac12 + \tfrac12 \cos 2\pi\delta > 0. \qquad (41)$$

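Now compute

$$\begin{aligned}
|\operatorname{Re}\langle h, b_n\rangle| &> \int_{\tau-\delta/2}^{\tau+\delta/2} \alpha(t) b_n(t)\, dt - \Big|\int_{S^1\setminus[\tau-\delta, \tau+\delta]} \alpha(t) b_n(t)\, dt\Big| \\
&\ge \delta \cdot \tfrac12 \alpha(\tau) \big(\tfrac12 + \tfrac12 \cos \pi\delta\big)^n - (1 - 2\delta) C \big(\tfrac12 + \tfrac12 \cos 2\pi\delta\big)^n.
\end{aligned}$$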

Now (41) shows that the ratio of the first term to the second goes to $+\infty$ as $n \to \infty$, and thus there is an $n$ so that $|\operatorname{Re}\langle h, b_n\rangle| > 0$.

Now the contradiction is this: Since $b_n$ is a finite Fourier series, $\langle h, b_n\rangle$ is a finite linear combination of Fourier coefficients $\langle h, e^{2\pi i k t}\rangle$, which we assumed are all zero. Thus $\langle h, b_n\rangle = 0$, and we have a contradiction.

Since $h$ is the difference between the smooth $f$ and its Fourier series, we have shown

Lemma 70. Let $f \in C^\infty(S^1, \mathbb{C})$. Then

$$f(t) = \sum_{k=-\infty}^{\infty} \langle f, e^{2\pi i k t}\rangle e^{2\pi i k t},$$

and the series converges uniformly in $t$.
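Lemma 70, together with the identity $\|f\|_{L^2}^2 = \sum_k |\langle f, e^{2\pi i k t}\rangle|^2$ for smooth $f$, can be illustrated numerically. In this sketch the test function $f(t) = 1/(2 - \cos 2\pi t)$, the sample count, and the evaluation grid are assumptions made for the example. For this $f$ one can check that $\langle f, e^{2\pi i k t}\rangle = (2-\sqrt3)^{|k|}/\sqrt3$ and $\|f\|_{L^2}^2 = 2/3^{3/2}$, so both the uniform error of the partial sums and the sum of squared coefficients can be compared with exact values.

```python
import cmath
import math

N_SAMPLES = 256  # hypothetical sample count for the Riemann-sum coefficients

def f(t):
    """Illustrative smooth 1-periodic function."""
    return 1.0 / (2.0 - math.cos(2 * math.pi * t))

def coeff(k):
    """Riemann-sum approximation of <f, e^{2 pi i k t}>."""
    return sum(f(m / N_SAMPLES) * cmath.exp(-2j * math.pi * k * m / N_SAMPLES)
               for m in range(N_SAMPLES)) / N_SAMPLES

COEFFS = {k: coeff(k) for k in range(-20, 21)}

def partial_sum(K, t):
    """S_K(t) = sum over |k| <= K of <f, e^{2 pi i k t}> e^{2 pi i k t}."""
    return sum(COEFFS[k] * cmath.exp(2j * math.pi * k * t) for k in range(-K, K + 1))

grid = [j / 200 for j in range(200)]
for K in (2, 5, 10):
    sup_err = max(abs(partial_sum(K, t) - f(t)) for t in grid)
    print(K, sup_err)   # sup-norm error on the grid shrinks as K grows

# Parseval for smooth f: ||f||_{L^2}^2 equals the sum of |coefficients|^2.
print(sum(abs(c) ** 2 for c in COEFFS.values()), 2 / 3 ** 1.5)
```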

Uniform convergence on $S^1$ implies $L^2$ convergence (since $S^1$ has finite measure; why does this imply $L^2$ convergence?). Therefore, as in Theorem 23, we have

$$\|f\|_{L^2}^2 = \int_{S^1} |f|^2\, dt = \sum_{k=-\infty}^{\infty} |\langle f, e^{2\pi i k t}\rangle|^2$$

for $f \in C^\infty(S^1, \mathbb{C})$.

To complete the proof of Theorem 27, first define the Hilbert space $\ell^2 = L^2(\mathbb{Z}, \mathbb{C})$ for the counting measure on $\mathbb{Z}$. In other words, $\ell^2$ is the set of all complex-valued integer-indexed sequences $\{v_k\}_{k\in\mathbb{Z}}$ so that $\sum_{k=-\infty}^{\infty} |v_k|^2 < \infty$.

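Then we have the operation $\mathcal{F}$ of taking Fourier coefficients:

$$\mathcal{F}\colon L^2(S^1, \mathbb{C}) \to \ell^2, \qquad \mathcal{F}\colon f \mapsto \{\hat f_k\}_{k\in\mathbb{Z}}, \quad \hat f_k = \langle f, e^{2\pi i k t}\rangle.$$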

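Moreover, on the dense subset $C^\infty(S^1, \mathbb{C}) \subset L^2(S^1, \mathbb{C})$, $\mathcal{F}$ is an isometry. Bessel's Inequality and the fact that $\{e^{2\pi i k t}\}$ is an orthonormal set in $L^2(S^1, \mathbb{C})$ show that for all $f \in L^2(S^1, \mathbb{C})$,

$$\|f\|_{L^2}^2 \ge \sum_{k=-\infty}^{\infty} |\langle f, e^{2\pi i k t}\rangle|^2 = \|\mathcal{F}(f)\|_{\ell^2}^2.$$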

Therefore $\mathcal{F}$ is a bounded linear map from $L^2(S^1, \mathbb{C})$ to $\ell^2$. A linear map $L$ from a Banach space $B_1$ to another Banach space $B_2$ is called bounded if there is a positive constant $C$ so that for all $v \in B_1$,

$$\|L(v)\|_{B_2} \le C \|v\|_{B_1}.$$

A linear map between Banach spaces is bounded if and only if it is continuous (see Problem 56 below). Therefore, $\mathcal{F}$ is continuous.

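Also, define the linear map $\mathcal{G}\colon \ell^2 \to L^2(S^1, \mathbb{C})$ by

$$\mathcal{G}(v) = \sum_{k=-\infty}^{\infty} v_k e^{2\pi i k t}.$$

The proof of Theorem 23 shows that $\mathcal{G}$ preserves the norms. In other words,

$$\|\mathcal{G}(v)\|_{L^2(S^1, \mathbb{C})} = \|v\|_{\ell^2}.$$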

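Let $f \in L^2(S^1, \mathbb{C})$. Since smooth functions are dense in $L^2$, there is a sequence $f_n \to f$ in $L^2$ with $f_n \in C^\infty(S^1, \mathbb{C})$. Since $\mathcal{F}$ is continuous, $\mathcal{F}(f_n) \to \mathcal{F}(f)$ in $\ell^2$ as $n \to \infty$. Writing $\hat f_n = \mathcal{F}(f_n)$ and $\hat f = \mathcal{F}(f)$, this says

$$0 = \lim_{n\to\infty} \|\mathcal{F}(f_n) - \mathcal{F}(f)\|_{\ell^2}^2 = \lim_{n\to\infty} \|\hat f_n - \hat f\|_{\ell^2}^2 = \lim_{n\to\infty} \|\mathcal{G}(\hat f_n) - \mathcal{G}(\hat f)\|_{L^2}^2.$$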

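Now recall that

$$\mathcal{G}(\hat f_n) = \sum_{k=-\infty}^{\infty} \langle f_n, e^{2\pi i k t}\rangle e^{2\pi i k t} = f_n$$

since $f_n$ is smooth (Lemma 70). Therefore,

$$0 = \lim_{n\to\infty} \|\mathcal{G}(\hat f_n) - \mathcal{G}(\hat f)\|_{L^2} = \lim_{n\to\infty} \|f_n - \mathcal{G}(\hat f)\|_{L^2}.$$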

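So in $L^2$,

$$f_n \to \mathcal{G}(\hat f) = \sum_{k=-\infty}^{\infty} \hat f_k e^{2\pi i k t}.$$

Since we assumed $f_n \to f$ in $L^2$, this shows

$$f = \sum_{k=-\infty}^{\infty} \hat f_k e^{2\pi i k t}$$

in $L^2$. Since the sum converges in $L^2$, finite linear combinations of the orthonormal set $\{e^{2\pi i k t}\}$ are dense in $L^2(S^1, \mathbb{C})$, and so $\{e^{2\pi i k t}\}$ is an orthonormal basis of $L^2(S^1, \mathbb{C})$. So Theorem 27 is proved.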

Homework Problem 56. Let $L\colon B_1 \to B_2$ be a linear map between Banach spaces. Show that $L$ is bounded if and only if $L$ is continuous.

Homework Problem 57. Using the notation of the proof of Theorem 27 above, show that $\mathcal{F}\colon L^2(S^1, \mathbb{C}) \to \ell^2$ is an isometry and that $\mathcal{F} \circ \mathcal{G}$ is the identity map.

Homework Problem 58. Let $f_k \in \mathbb{C}$ for all $k \in \mathbb{Z}$, and assume for all $n \ge 0$ that

$$\lim_{k\to\infty} k^n f_k = \lim_{k\to-\infty} k^n f_k = 0.$$

Then the Fourier series

$$f(t) = \sum_{k=-\infty}^{\infty} f_k e^{2\pi i k t}$$

converges uniformly to a smooth function from $S^1 \to \mathbb{C}$.

Hint: The key is being able to change the order of the derivative $d/dt$ with the summation $\sum_{k=-\infty}^{\infty}$. Recall that the summation $\sum_{k=-\infty}^{\infty}$ can be interpreted as an integral over $\mathbb{Z}$ with respect to the counting measure $d\mu$. Thus for all $t \in S^1$,

$$f(t) = \int_{\mathbb{Z}} f_k e^{2\pi i k t}\, d\mu(k).$$

To show that $f(t) \in C^1(S^1, \mathbb{C})$, show that there is a constant $C > 0$ so that

$$|f_k| \le \frac{C}{1 + |k|^3}.$$

Mimic the proof of Proposition 11: Show that the absolute value of the difference quotient

$$\frac{f_k e^{2\pi i k (t+h)} - f_k e^{2\pi i k t}}{h}$$

is uniformly $\le \dfrac{C'(|k| + 1)}{1 + |k|^3}$ for a constant $C'$. (Apply the Mean Value Theorem to the real and imaginary parts of $e^{2\pi i k t}$ separately.) Show that the series

$$\sum_{k=-\infty}^{\infty} \frac{C'(|k| + 1)}{1 + |k|^3}$$

converges by using the procedure in the proof of Lemma 70 above. Use induction to show $f(t)$ is smooth.

4.6 Compact maps and the Ascoli-Arzela Theorem

Recall that every element of $L^2_1(S^1) = L^2_1(S^1, \mathbb{R})$ has a continuous representative (Proposition 59). So there is a natural linear map $L^2_1(S^1) \to C^0(S^1)$.

In this section, we show that this map is compact. A linear map between Banach spaces $\Lambda\colon B_1 \to B_2$ is called compact if the closure of the image of the unit ball in $B_1$ is strongly compact in $B_2$. In other words, if $v_i \in B_1$ satisfy $\|v_i\|_{B_1} \le 1$, then $\Lambda(v_i)$ has a strongly convergent subsequence in $B_2$: i.e., there is a subsequence $v_{i_j}$ and an element $w \in B_2$ so that

$$\lim_{j\to\infty} \|\Lambda(v_{i_j}) - w\|_{B_2} = 0.$$

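The basic observation which allows us to conclude that the natural inclusion map $L^2_1(S^1) \to C^0(S^1)$ is compact comes from the proof of Proposition 59. If $f \in L^2_1(S^1)$ and $t_1 < t_2$ in $[0, 1)$, then

$$\begin{aligned}
|f(t_2) - f(t_1)| &= \Big|\int_{t_1}^{t_2} \dot f(t)\, dt\Big| \le \Big(\int_{t_1}^{t_2} |\dot f(t)|^2\, dt\Big)^{1/2} \Big(\int_{t_1}^{t_2} dt\Big)^{1/2} \\
&\le \Big(\int_{S^1} |\dot f(t)|^2\, dt\Big)^{1/2} (t_2 - t_1)^{1/2} \le \|f\|_{L^2_1} (t_2 - t_1)^{1/2}.
\end{aligned}$$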

(Note that the first equality was justified in the proof of Proposition 59.) Therefore, $f$ is continuous. But moreover, for every $\varepsilon > 0$, we may choose

$$\delta = \Big(\frac{\varepsilon}{\|f\|_{L^2_1}}\Big)^2$$

so that

$$|t_2 - t_1| < \delta \implies |f(t_2) - f(t_1)| < \varepsilon.$$

So the modulus of continuity $\delta$ does not depend on $t$, and depends only on the norm $\|f\|_{L^2_1}$, and on no other information about $f$.
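The estimate $|f(t_2) - f(t_1)| \le \|f\|_{L^2_1}|t_2 - t_1|^{1/2}$ can be tested on a specific function. In this sketch the choices are illustrative: $f(t) = \sin 2\pi t$, for which $\|f\|_{L^2_1}^2 = \tfrac12 + 2\pi^2$, and a 200-point grid in $[0, 1)$. The largest observed ratio $|f(t_2) - f(t_1)|/|t_2 - t_1|^{1/2}$ stays below $\|f\|_{L^2_1}$, and `delta_for` computes the modulus of continuity $\delta = (\varepsilon/\|f\|_{L^2_1})^2$ from the text.

```python
import math

def f(t):
    """Illustrative element of L^2_1(S^1)."""
    return math.sin(2 * math.pi * t)

# For this f, ||f||_{L^2_1}^2 = int_0^1 (f^2 + fdot^2) dt = 1/2 + 2 pi^2.
NORM = math.sqrt(0.5 + 2 * math.pi ** 2)

def delta_for(eps, norm=NORM):
    """Modulus of continuity delta = (eps / ||f||_{L^2_1})^2 from the estimate above."""
    return (eps / norm) ** 2

grid = [j / 200 for j in range(200)]
worst_ratio = max(abs(f(t2) - f(t1)) / math.sqrt(t2 - t1)
                  for t1 in grid for t2 in grid if t1 < t2)
print(worst_ratio, NORM)  # the observed ratio stays below ||f||_{L^2_1}
```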

A family $\Omega$ of functions from a metric space $X$ to a metric space $Y$ is called equicontinuous at a point $x \in X$ if for all $\varepsilon > 0$, there is a $\delta > 0$ so that

$$d_X(x, x') < \delta \implies d_Y(f(x), f(x')) < \varepsilon$$

for all $f \in \Omega$. The point is that $\delta$ does not depend on $f$. Such a family of functions $\Omega$ is called equicontinuous on $X$ if it is equicontinuous at each point $x \in X$.

Note that if $\Omega$ is equicontinuous on $X$, then each $f \in \Omega$ is continuous. The computations above show

Lemma 71. The unit ball in $L^2_1(S^1)$ is equicontinuous on $S^1$.

Theorem 28 (Ascoli-Arzela). Let $X$ be a compact metric space, and let $\Omega$ be an equicontinuous family of real-valued functions on $X$. Assume there is a uniform $C$ so that $|f(x)| \le C$ for all $f \in \Omega$ and $x \in X$. Then each sequence $\{f_n\} \subset \Omega$ has a uniformly convergent subsequence.

Proof. We’ll prove the theorem with the help of a few lemmas.

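Lemma 72. Any compact metric space has a countable dense subset.

Proof. Let $X$ be the compact metric space. For $\varepsilon = 1/n$, obviously

$$X = \bigcup_{x\in X} B_\varepsilon(x), \qquad B_\varepsilon(x) = \{y \in X : d_X(x, y) < \varepsilon\}.$$

For each positive integer $n$, this open cover of $X$ has a finite subcover consisting of balls of radius $1/n$ centered at points $x_{n,1}, \dots, x_{n,m_n}$. The union

$$\bigcup_{n=1}^{\infty} \{x_{n,1}, \dots, x_{n,m_n}\}$$

is a countable dense subset of $X$, since for each $n$ every point of $X$ lies within distance $1/n$ of one of $x_{n,1}, \dots, x_{n,m_n}$.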

Lemma 73. Let $P$ be a countable set, and let $f_n\colon P \to \mathbb{R}$ be a sequence of functions. Assume there is a constant $C$ so that $|f_n(p)| \le C$ for all $n = 1, 2, \dots$ and all $p \in P$. Then there is a subsequence of $\{f_n\}$ which converges everywhere on $P$ to a function $f\colon P \to \mathbb{R}$.

Proof. See Problem 59 below.

Lemma 74. Let $\{f_n\}$ be an equicontinuous sequence of mappings from a compact metric space $X$ to $\mathbb{R}$. If the sequence $f_n(x)$ converges for each $x$ in a dense subset of $X$, then $f_n$ converges uniformly on $X$ to a continuous limit function.

Proof. First we show that $f_n(x)$ converges pointwise everywhere to a function $f(x)$. Let $y \in X$ and let $\varepsilon > 0$. Then by equicontinuity, there is a $\delta > 0$ so that

$$d_X(x, y) < \delta \implies |f_n(x) - f_n(y)| < \varepsilon.$$

(Note $\delta$ is independent of $n$.) Since $f_n$ converges on a dense subset of $X$, there is an $x \in B_\delta(y)$ for which $f_n(x)$ converges. Therefore, $\{f_n(x)\}$ is a Cauchy sequence in $\mathbb{R}$, and so there is an $N$ so that

$$n, m \ge N \implies |f_n(x) - f_m(x)| < \varepsilon.$$

Therefore, for $n, m \ge N$,

$$|f_n(y) - f_m(y)| \le |f_n(y) - f_n(x)| + |f_n(x) - f_m(x)| + |f_m(x) - f_m(y)| < 3\varepsilon.$$

Therefore, $\{f_n(y)\}$ is a Cauchy sequence in the complete metric space $\mathbb{R}$, and so it converges to a limit which we call $f(y)$.

Let $y \in X$ and $\varepsilon > 0$. Then equicontinuity shows that there is a $\delta = \delta_y > 0$ so that

$$x \in B_\delta(y) \implies |f_n(x) - f_n(y)| < \varepsilon \qquad (42)$$

for all $n$. By letting $n \to \infty$, we also have

$$x \in B_\delta(y) \implies |f(x) - f(y)| \le \varepsilon. \qquad (43)$$

These balls $B_{\delta_y}(y)$ form an open cover of $X$, and so, since $X$ is compact, there is a finite subcover

$$X = \bigcup_{i=1}^{k} B_{\delta_i}(y_i).$$

Choose $N$ large enough so that

$$n \ge N \implies |f_n(y_i) - f(y_i)| < \varepsilon, \quad i = 1, \dots, k. \qquad (44)$$

Then for $x \in X$, $x \in B_{\delta_i}(y_i)$ for some $y_i$, and so (42), (43) and (44) show

$$|f_n(x) - f(x)| \le |f_n(x) - f_n(y_i)| + |f_n(y_i) - f(y_i)| + |f(y_i) - f(x)| < 3\varepsilon.$$

Since the same $N$ works for all $x \in X$, the convergence is uniform. The limit $f$, as the uniform limit of continuous functions, is continuous.

This completes the proof of Theorem 28.


Homework Problem 59. Let $P$ be a countable set, and let $f_n\colon P \to \mathbb{R}$ be a sequence of functions. Assume that for each $p \in P$, there is a constant $C = C_p$ so that $|f_n(p)| \le C$ for all $n = 1, 2, \dots$. Show there is a subsequence of $\{f_n\}$ which converges everywhere on $P$ to a function $f\colon P \to \mathbb{R}$.

Hint: Use a diagonalization argument.

An important version of the Ascoli-Arzela Theorem is the following:

Theorem 29. Let $X$ be a metric space so that there is a countable number of open subsets $O_i$ satisfying

$$X = \bigcup_{i=1}^{\infty} O_i, \qquad O_i \subset\subset O_{i+1}, \qquad (45)$$

and let $\Omega$ be an equicontinuous set of real-valued functions on $X$. If for a sequence of functions $\{f_n\} \subset \Omega$ there is a uniform $C$ so that $|f_n(x)| \le C$ for all $n$ and all $x \in X$, then there is a subsequence of $\{f_n\}$ which converges pointwise to a function $f\colon X \to \mathbb{R}$, and the convergence is uniform on every compact subset of $X$.

Remark. Recall $A \subset\subset B$ for $A$ a subspace of a topological space $B$ means that the closure $\overline{A}$ relative to $B$ is compact.

Remark. A sequence of functions converging uniformly on compact subsetsof X is said to converge normally on X.

We relegate the proof of Theorem 29 to the following problem:

Homework Problem 60. Prove Theorem 29.

Hint: Consider $X$, $O_i$ as in the previous theorem. Note we may apply Theorem 28 to each of the compact sets $\overline{O_i}$. Use a diagonalization argument to find a single subsequence which converges uniformly on each $\overline{O_i}$. Show that every compact subset of $X$ is contained in some $O_i$.

Remark. For every smooth manifold $X$ (which is Hausdorff and sigma-compact), there is a countable collection of open sets $O_i$ satisfying condition (45). See the notes on “The Real Definition of a Smooth Manifold.”

The Ascoli-Arzela Theorem provides the following.

Proposition 75. If $C > 0$ and $\{f_n\}$ is a sequence of functions in $L^2_1(S^1, \mathbb{R})$ which satisfy $\|f_n\|_{L^2_1} \le C$, then there is a uniformly convergent subsequence.

Proof. This follows from the Ascoli-Arzela Theorem and Lemma 71 above, once we know in addition that there is a constant $K$ so that $|f_n| \le K$ pointwise. First of all, note that

$$|f_n(t_2) - f_n(t_1)| \le \|f_n\|_{L^2_1} |t_2 - t_1|^{1/2} \le C |t_2 - t_1|^{1/2}$$

shows that for every $t_2, t_1 \in S^1$,

$$|f_n(t_2) - f_n(t_1)| \le C,$$

since we may choose $t_2, t_1 \in [0, 1)$. Since

$$\Big(\int_0^1 |f_n|^2\, dt\Big)^{1/2} = \|f_n\|_{L^2} \le \|f_n\|_{L^2_1} \le C,$$

there must be a $t_1 \in S^1$ so that $|f_n(t_1)| \le C$. Then for any $t_2 \in S^1$,

$$|f_n(t_2)| \le |f_n(t_1)| + |f_n(t_2) - f_n(t_1)| \le 2C.$$

Thus the hypotheses of the Ascoli-Arzela Theorem are satisfied.

Corollary 76. The inclusion $L^2_1(S^1, \mathbb{R}) \to C^0(S^1, \mathbb{R})$ is compact.

Proof. Take $C = 1$ in the above proposition.

Corollary 77. Let $C > 0$, let $X \subset \mathbb{R}^N$ be a compact manifold, and let $\gamma_n \in L^2_1(S^1, X) \subset L^2_1(S^1, \mathbb{R}^N)$ satisfy $E(\gamma_n) \le C$. Then there is a uniformly convergent subsequence of $\{\gamma_n\}$, and the limit is a continuous function $\gamma\colon S^1 \to X$.

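Proof. Recall

$$\|\gamma_n\|_{L^2_1(S^1, \mathbb{R}^N)}^2 = \|\gamma_n\|_{L^2(S^1, \mathbb{R}^N)}^2 + \|\dot\gamma_n\|_{L^2(S^1, \mathbb{R}^N)}^2 = \|\gamma_n\|_{L^2(S^1, \mathbb{R}^N)}^2 + E(\gamma_n).$$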

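Since $\gamma_n(S^1) \subset X$ and $X$ is compact, there is a constant $K$ so that $|\gamma_n(t)| \le K$ for all $n$ and $t$. Therefore,

$$\|\gamma_n\|_{L^2(S^1, \mathbb{R}^N)}^2 \le \int_{S^1} K^2\, dt = K^2,$$

and moreover,

$$\|\gamma_n\|_{L^2_1(S^1, \mathbb{R}^N)}^2 \le C + K^2$$

independently of $n$. So each component function $\gamma_n^a$ for $a = 1, \dots, N$ satisfies

$$\|\gamma_n^a\|_{L^2_1(S^1, \mathbb{R})} \le \sqrt{C + K^2}.$$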

Then Proposition 75 shows that there is a subsequence ${}^1\gamma_n$ of $\gamma_n$ so that the component ${}^1\gamma_n^1$ converges uniformly. Let ${}^2\gamma_n$ be a subsequence of ${}^1\gamma_n$ so that ${}^2\gamma_n^1$ and ${}^2\gamma_n^2$ converge uniformly. By induction, as in the proof of Theorem 24, there is a subsequence ${}^N\gamma_n$ of $\gamma_n$ so that ${}^N\gamma_n^a$ converges uniformly for $a = 1, \dots, N$. Since this subsequence converges uniformly on each component in $\mathbb{R}^N$, ${}^N\gamma_n$ converges uniformly as $n \to \infty$ to a limit $\gamma$ in $C^0(S^1, \mathbb{R}^N)$.

Since $X$ is closed in $\mathbb{R}^N$ and the subsequence converges pointwise, the limit $\gamma$ maps $S^1$ into $X$.

It is also useful to define the Hölder norm for functions $f\colon S^1 \to \mathbb{R}$:

$$\|f\|_{C^{0,\frac12}(S^1)} = \|f\|_{C^0} + \sup_{t_1 \neq t_2} \frac{|f(t_1) - f(t_2)|}{d_{S^1}(t_1, t_2)^{1/2}}.$$

(Here we define

$$d_{S^1}(t_1, t_2) = \inf_{k\in\mathbb{Z}} |(t_1 + k) - t_2|.$$

This definition is necessary, since we identify the real numbers $t$ and $t + k$ on the circle $S^1$. For example, $d_{S^1}(0, 0.9) = |1 - 0.9| = 0.1$.) It is easy to check that this defines a norm. Define the space $C^{0,\frac12}(S^1)$ to be all $f$ from $S^1 \to \mathbb{R}$ so that $\|f\|_{C^{0,\frac12}(S^1)} < \infty$.
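The distance $d_{S^1}(t_1, t_2)$ is simply the distance from $t_1 - t_2$ to the nearest integer. The Python sketch below computes it and uses it in `holder_half_norm`, a hypothetical helper (not from the text) that approximates $\|f\|_{C^{0,\frac12}(S^1)}$ from below by taking the suprema only over a finite grid; the grid size and the test function $\sin 2\pi t$ are illustrative choices.

```python
import math

def d_circle(t1, t2):
    """d_{S^1}(t1, t2) = inf over integers k of |(t1 + k) - t2|."""
    r = (t1 - t2) % 1.0          # representative of t1 - t2 in [0, 1)
    return min(r, 1.0 - r)       # distance to the nearest integer

def holder_half_norm(f, n_points=50):
    """Hypothetical helper: grid approximation (from below) of ||f||_{C^{0,1/2}(S^1)}."""
    pts = [j / n_points for j in range(n_points)]
    sup_f = max(abs(f(t)) for t in pts)
    seminorm = max(abs(f(s) - f(t)) / math.sqrt(d_circle(s, t))
                   for s in pts for t in pts if s != t)
    return sup_f + seminorm

print(d_circle(0.0, 0.9))                                   # 0.1, up to rounding
print(holder_half_norm(lambda t: math.sin(2 * math.pi * t)))
```

For $f = \sin 2\pi t$ the computed value stays below $1 + \pi\sqrt2 = \sup|f| + \|\dot f\|_{L^2}$, consistent with the $L^2_1$ estimate above applied along the shorter arc of the circle.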

$C^{0,\frac12}(S^1)$ is a Banach space (Proposition 78 below), and the calculations above show that there is a natural continuous inclusion map $L^2_1(S^1) \to C^{0,\frac12}(S^1)$. Moreover, the natural inclusion map $C^{0,\frac12}(S^1) \to C^0(S^1)$ is compact. Then Problem 63 below shows that the composite inclusion $L^2_1(S^1) \to C^0(S^1)$ is compact.

In general, for any metric space $X$ and $\alpha \in (0, 1]$, we can define

$$C^{0,\alpha}(X) = \{f\colon X \to \mathbb{R} : \|f\|_{C^{0,\alpha}} < \infty\},$$

$$\|f\|_{C^{0,\alpha}} = \sup_{x\in X} |f(x)| + \sup_{x \neq y \in X} \frac{|f(x) - f(y)|}{d_X(x, y)^\alpha}.$$

These are called Hölder spaces and Hölder norms respectively.

Example 21. This is the standard example: for $X = [-1, 1] \subset \mathbb{R}$, $f(x) = |x|^\alpha$ is in $C^{0,\alpha}(X)$.

Proof. It clearly suffices to bound the difference quotient

$$q(x, y) = \frac{\big||x|^\alpha - |y|^\alpha\big|}{|x - y|^\alpha}, \qquad x \neq y \in [-1, 1].$$

We will show that this is always $\le 1$. First, we reduce to the case where $x$ and $y$ have the same sign: if they have opposite signs, then $q(x, y) < q(-x, y)$ when $y \neq -x$, while $q(x, -x) = 0$. So we may assume $x$ and $y$ have the same sign. By possibly interchanging $(x, y) \leftrightarrow (-x, -y)$ and switching $x$ and $y$, we may assume $x > y \ge 0$. Then write

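$$q(x, y) = \frac{x^\alpha - y^\alpha}{(x - y)^\alpha} = \frac{1 - \rho^\alpha}{(1 - \rho)^\alpha}, \qquad \rho = \frac{y}{x} \in [0, 1).$$

Then we compute

$$\frac{dq}{d\rho} = \frac{\alpha(1 - \rho^{\alpha-1})}{(1 - \rho)^{\alpha+1}} \le 0 \quad \text{for } \rho \in (0, 1),$$

since $\rho^{\alpha-1} \ge 1$ there when $\alpha \le 1$. Therefore, the max of $q(\rho)$ is achieved at $\rho = 0$, where $q = 1$.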
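The bound $q(x, y) \le 1$ can be spot-checked on a grid. This is a numerical sanity check under illustrative assumptions (the grid size and the sample exponents are arbitrary), not a substitute for the proof; the grid contains $0$, where the bound is attained.

```python
def q(x, y, alpha):
    """Difference quotient ||x|^alpha - |y|^alpha| / |x - y|^alpha for x != y."""
    return abs(abs(x) ** alpha - abs(y) ** alpha) / abs(x - y) ** alpha

def max_quotient(alpha, n=100):
    """Largest q(x, y) over the grid -1, -1 + 2/n, ..., 1 (which contains 0)."""
    pts = [-1 + 2 * j / n for j in range(n + 1)]
    return max(q(x, y, alpha) for x in pts for y in pts if x != y)

for alpha in (0.25, 0.5, 1.0):
    print(alpha, max_quotient(alpha))   # 1.0: the bound q <= 1 is attained, e.g. at y = 0
```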

We also say $f(x) = |x|^\alpha$ is locally $C^{0,\alpha}$ on $\mathbb{R}$, since the $\alpha$-Hölder norm of $f$ is finite on any compact subset of $\mathbb{R}$.

In the case $\alpha = 1$, note that a function in $C^{0,1}$ is simply a $C^0$ function which is globally Lipschitz.


Homework Problem 61.

(a) Show that the inclusion $C^1(S^1) \to C^0(S^1)$ is compact (Hint: use the Mean Value Theorem).

(b) Show that every bounded sequence $f_n \in C^1(\mathbb{R})$ (i.e., there is a uniform $C$ so that $\|f_n\|_{C^1} \le C$ for all $n$) has a subsequence which converges uniformly on compact subsets of $\mathbb{R}$ to a continuous limit $f$. Hint: It is easy to show that $\mathbb{R}$ satisfies condition (45).

(c) Find an example of a bounded sequence of functions $f_n \in C^1(\mathbb{R})$ which does not have a convergent subsequence in $C^0(\mathbb{R})$. Thus the inclusion $C^1(\mathbb{R}) \to C^0(\mathbb{R})$ is not compact. (Hint: How is this situation different from parts (a) and (b)? You must use the noncompactness of $\mathbb{R}$. Therefore, the interesting behavior of the $f_n$ should be “moving off to infinity.”)

It is also useful to apply Hölder norms to the derivatives of a function. In particular, on $\mathbb{R}^n$, we may define for $k$ a positive integer and $\alpha \in (0, 1]$,

$$\|f\|_{C^{k,\alpha}} = \sum_{|\beta| \le k} \|\partial^\beta f\|_{C^{0,\alpha}},$$

where, as in (3) above, we use the multi-index notation to denote all the partial derivatives of $f$ of order $\le k$.

Remark. It is not useful to define $C^{0,\alpha}$ for $\alpha > 1$, as the following problem shows.

Homework Problem 62. Let $\alpha > 1$, and let $f\colon \mathbb{R} \to \mathbb{R}$. Assume that

$$\sup_{x \neq y} \frac{|f(x) - f(y)|}{|x - y|^\alpha} = C < \infty.$$

Show that $f$ is a constant function.

Hint: Use the definition of the derivative to show that $f'(x) = 0$ for all $x$.

Proposition 78. Let $X$ be a metric space and $\alpha \in (0, 1]$. Then $C^{0,\alpha}(X)$ is a Banach space.

Proof. It is straightforward to show that $\|\cdot\|_{C^{0,\alpha}}$ is a norm. As always, we must check completeness carefully.

Let $\{f_n\}$ be a Cauchy sequence in $C^{0,\alpha}(X)$. We want to show that there is a limit $f \in C^{0,\alpha}$ and that $\|f_n - f\|_{C^{0,\alpha}} \to 0$ as $n \to \infty$.

First of all, it is obvious from the definition of the Hölder norm that $\{f_n\}$ is a Cauchy sequence in $C^0(X)$, and since $C^0$ is complete, there is a continuous limit function $f$, and $f_n \to f$ uniformly.

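Now we show $f \in C^{0,\alpha}$. Let $\varepsilon > 0$. Then there is an $N$ so that

$$m, n \ge N \implies \|f_m - f_n\|_{C^{0,\alpha}} < \varepsilon. \qquad (46)$$

Then for all $m \ge N$, $\|f_m\|_{C^{0,\alpha}} < \|f_N\|_{C^{0,\alpha}} + \varepsilon \equiv C_\varepsilon$. By the definition of the Hölder norm, for all $x, y \in X$,

$$|f_m(x) - f_m(y)| \le C_\varepsilon\, d_X(x, y)^\alpha.$$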

Taking $m \to \infty$ shows that $f \in C^{0,\alpha}$. Now (46) also implies that for all $x, y \in X$ and all $m, n \ge N$,

$$|f_m(x) - f_n(x) - f_m(y) + f_n(y)| \le \varepsilon\, d_X(x, y)^\alpha,$$

and so again let $m \to \infty$ to show for all $x, y \in X$, and for all $n \ge N$,

$$|f(x) - f_n(x) - f(y) + f_n(y)| \le \varepsilon\, d_X(x, y)^\alpha.$$

Since we already know $f_n \to f$ in $C^0$, this is exactly the additional statement we need to show $f_n \to f$ in $C^{0,\alpha}$.

Remark. If $X$ is a smooth manifold, then it is possible (by using an atlas and a subordinate partition of unity) to define $C^{k,\alpha}(X)$. If $X$ is compact, then $C^{k,\alpha}(X) \to C^k(X)$ is a compact inclusion.

Homework Problem 63. Let $\Lambda\colon B_1 \to B_2$ and $\Phi\colon B_2 \to B_3$ be linear maps between Banach spaces.

(a) Assume $\Lambda$ is continuous and $\Phi$ is compact. Then $\Phi \circ \Lambda$ is compact.

(b) Assume $\Lambda$ is compact and $\Phi$ is continuous. Then $\Phi \circ \Lambda$ is compact.

Homework Problem 64. Let $\Lambda\colon B_1 \to B_2$ be a compact linear map of Banach spaces. Show $\Lambda$ is continuous.

Hint: It suffices to show $\Lambda$ is bounded. For $B_1(0)$ the unit ball in $B_1$, consider the image of the compact set

$$\overline{\Lambda B_1(0)} \subset\subset B_2$$

under the norm map $\|\cdot\|_{B_2}\colon B_2 \to \mathbb{R}$.

Remark. The Hölder spaces $C^{k,\alpha}$, for $\alpha \in (0, 1)$, and the Sobolev spaces $L^p_k$, for $p \in (1, \infty)$, play a very important role in the theory of partial differential equations. In particular, they behave much better than the more obvious spaces $C^k$. Our simple proofs that $L^2_1(S^1)$ embeds continuously in $C^{0,\frac12}(S^1)$ and compactly in $C^0(S^1)$ constitute some of the easiest cases of the Sobolev embedding theorem. The Sobolev embedding theorem allows us to embed certain Sobolev spaces, in which derivatives are defined only in the sense of distributions, into Hölder and $C^k$ spaces, in which we may take derivatives in the usual sense. These spaces are crucial to the regularity theory of solutions to PDEs.

4.7 Convergence

Now we have finally developed the tools needed to solve our problem. Recall

Problem: Let $X \subset \mathbb{R}^N$ be a smooth compact manifold equipped with the Riemannian metric pulled back from the Euclidean metric on $\mathbb{R}^N$. Let $\mathcal{C}$ be the class of loops $\gamma\colon S^1 \to X$ in a free homotopy class on $X$ and in $L^2_1(S^1, X)$. Find a loop of least energy in $\mathcal{C}$.

Our strategy is as follows: Define

$$L = \inf_{\gamma\in\mathcal{C}} E(\gamma).$$

Since $E(\gamma) \ge 0$ always, $L \ge 0$. Now there is a sequence of $\gamma_i \in \mathcal{C}$ so that $E(\gamma_i) \to L$. We want to find a subsequence $\gamma_{i_j}$ which converges to a limit $\gamma \in \mathcal{C}$ so that $E(\gamma) = L$. Moreover, we expect $\gamma$ to be a geodesic: it should satisfy the geodesic equations not just in the sense of distributions, but also in the usual sense. Therefore, by the theory of ODEs, $\gamma$ should be smooth.

First of all, we show the existence of a limit $\gamma$. Since $E(\gamma_i) \to L$, the energies $E(\gamma_i)$ are bounded, so Corollary 77 shows that there is a subsequence of $\gamma_i$ which converges uniformly to a continuous $\gamma\colon S^1 \to X$. (For simplicity, we just refer to this subsequence as $\gamma_i$ again.) Since $\gamma_i \to \gamma$ uniformly, Corollary 57 shows that $\gamma$ is in the same free homotopy class. Thus we have

Proposition 79. There is a subsequence of $\gamma_i$ which converges uniformly to a limit $\gamma$ in the same free homotopy class.

Proposition 80. Let $X \subset \mathbb{R}^N$ be a compact manifold. If $\gamma_i\colon S^1 \to X$ satisfy $E(\gamma_i) \to L$, then there is a constant $K$ independent of $i$ so that $\|\gamma_i\|_{L^2_1(S^1, \mathbb{R}^N)} \le K$.

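Proof. Since $X$ is compact, there is a uniform $C$ so that $\|\gamma_i\|_{L^2(S^1, \mathbb{R}^N)} \le C$. Since $E(\gamma_i) \to L$, $\{E(\gamma_i)\}$ is a bounded sequence. Therefore,

$$\|\gamma_i\|_{L^2_1(S^1, \mathbb{R}^N)}^2 = E(\gamma_i) + \|\gamma_i\|_{L^2(S^1, \mathbb{R}^N)}^2$$

is bounded independent of $i$.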

This proposition shows, by Theorem 24, that there is a further subsequence of $\gamma_i$ which converges weakly to some $\tilde\gamma \in L^2_1(S^1, \mathbb{R}^N)$. (Explanatory note: a further subsequence means that we take a subsequence not just of the original $\gamma_i$, but of the subsequence taken in the paragraph above Proposition 79.) We still refer to this further subsequence as $\gamma_i$. Then Theorem 25 shows that the Hilbert space norm satisfies

$$\|\tilde\gamma\|_{L^2_1(S^1, \mathbb{R}^N)} \le \liminf_{i\to\infty} \|\gamma_i\|_{L^2_1(S^1, \mathbb{R}^N)}.$$

Note a potential problem: We have taken a subsequence of the original $\gamma_i$ to converge uniformly to a continuous $\gamma$, and then we take a further subsequence to converge weakly in $L^2_1$ to $\tilde\gamma$. We must show $\gamma$ and $\tilde\gamma$ are the same. This will follow from the fact that they must be equal in the sense of distributions, and thus are equal almost everywhere (Proposition 58). Since both $\gamma$ and $\tilde\gamma$ are continuous (taking the continuous representative of $\tilde\gamma$), they must be equal everywhere. In particular, we require

Proposition 81. $\tilde\gamma = \gamma$ in the sense of distributions.

Proof. It suffices to show each component $\tilde\gamma^a = \gamma^a$ in the sense of distributions for $a = 1, \dots, N$.

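For each $a = 1, \dots, N$, $\gamma_i^a \to \gamma^a$ uniformly as $i \to \infty$. So if $\phi \in \mathcal{D}(S^1)$ is a smooth test function, then

$$|\gamma_i^a(\phi) - \gamma^a(\phi)| = \Big|\int_{S^1} (\gamma_i^a - \gamma^a)\phi\, dt\Big| \le \|\phi\|_{L^1} \|\gamma_i^a - \gamma^a\|_{C^0},$$

which goes to 0 as $i \to \infty$ by uniform convergence. Therefore,

$$\gamma^a(\phi) = \lim_{i\to\infty} \gamma_i^a(\phi). \qquad (47)$$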

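Also, $\gamma_i^a \rightharpoonup \tilde\gamma^a$ weakly in $L^2_1(S^1)$. Let $\phi \in \mathcal{D}(S^1) \subset L^2_1(S^1)$ be a test function. Let $f_i = \gamma_i^a - \tilde\gamma^a$. Then $f_i \rightharpoonup 0$ weakly in $L^2_1$. Compute

$$\langle f_i, \phi\rangle_{L^2_1} = \int_{S^1} (f_i\phi + \dot f_i\dot\phi)\, dt = \int_{S^1} (f_i\phi - f_i\ddot\phi)\, dt = f_i(\phi - \ddot\phi),$$

the last term denoting $f_i$ acting in the sense of distributions. Therefore, for all $\phi \in \mathcal{D}(S^1)$,

$$\lim_{i\to\infty} f_i(\phi - \ddot\phi) = \lim_{i\to\infty} \langle f_i, \phi\rangle_{L^2_1} = 0.$$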

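By Proposition 82 below, for every $\psi \in \mathcal{D}(S^1)$, there is a $\phi \in \mathcal{D}(S^1)$ so that $\phi - \ddot\phi = \psi$. Therefore, for all $\psi \in \mathcal{D}(S^1)$,

$$\lim_{i\to\infty} f_i(\psi) = 0 \iff \lim_{i\to\infty} \gamma_i^a(\psi) = \tilde\gamma^a(\psi).$$

Therefore, by (47) above, $\tilde\gamma = \gamma$ in the sense of distributions.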

Proposition 82. For every $\psi \in \mathcal{D}(S^1)$, there is a $\phi \in \mathcal{D}(S^1)$ so that $\psi = \phi - \ddot\phi$.

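Proof. Recall $\mathcal{D}(S^1) = C^\infty(S^1, \mathbb{R})$. Moreover, Lemma 70 and Problem 58 show that

$$C^\infty(S^1, \mathbb{C}) = \Big\{\sum_{k=-\infty}^{\infty} f_k e^{2\pi i k t} : \lim_{k\to\pm\infty} f_k |k|^n = 0 \text{ for } n = 1, 2, \dots\Big\}. \qquad (48)$$

The convergence of each such series is uniform, and the sum commutes with the derivative $d/dt$.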

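Therefore, if

$$\phi = \sum_{k=-\infty}^{\infty} \phi_k e^{2\pi i k t} \in C^\infty(S^1, \mathbb{C}),$$

then

$$\ddot\phi = \sum_{k=-\infty}^{\infty} (-4\pi^2 k^2)\phi_k e^{2\pi i k t}, \qquad \phi - \ddot\phi = \sum_{k=-\infty}^{\infty} (1 + 4\pi^2 k^2)\phi_k e^{2\pi i k t}.$$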

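So if

$$\psi = \sum_{k=-\infty}^{\infty} \psi_k e^{2\pi i k t} \in C^\infty(S^1, \mathbb{C}),$$

then we may let

$$\phi = \sum_{k=-\infty}^{\infty} \frac{\psi_k}{1 + 4\pi^2 k^2} e^{2\pi i k t},$$

so that $\phi - \ddot\phi = \psi$.

We must prove that $\phi \in C^\infty(S^1, \mathbb{C})$. Let $n$ be a positive integer. Then

$$\lim_{k\to\pm\infty} \phi_k |k|^n = \lim_{k\to\pm\infty} \frac{\psi_k |k|^n}{1 + 4\pi^2 k^2} = 0,$$

because $|\psi_k||k|^{n-2} \to 0$. So $\phi$ is smooth. (Note that we went from a $|k|^n$ limit to a $|k|^{n-2}$ limit. This is because the differential equation is of order two.)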

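We have considered $\mathbb C$-valued functions so far. It is easy to check that $\psi \in C^\infty(S^1, \mathbb R)$ implies $\varphi \in C^\infty(S^1, \mathbb R)$: a function is real-valued exactly when its coefficients satisfy $\psi^{-k} = \overline{\psi^k}$, and this symmetry is preserved when we divide by the real factor $1 + 4\pi^2 k^2$, which is even in $k$.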
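The proof of Proposition 82 is constructive: divide each Fourier coefficient of $\psi$ by $1 + 4\pi^2 k^2$. A minimal Python sketch of this recipe might look as follows, assuming $\psi$ is given by its values on a uniform grid of $S^1 = \mathbb R/\mathbb Z$ and approximating the coefficients $\psi^k$ by a discrete Fourier transform (a naive $O(n^2)$ one, to stay self-contained). The function names are hypothetical. For a trigonometric polynomial $\psi$ of degree below $n/2$ the discrete transform recovers the $\psi^k$ exactly, so the output should agree with the exact $\varphi$ up to rounding.

```python
import cmath
import math


def signed_frequency(index, n):
    # Map a DFT index 0..n-1 to the signed frequency k in (-n/2, n/2].
    return index if index <= n // 2 else index - n


def fourier_coefficients(samples):
    # Discrete approximation of psi^k = int_{S^1} psi(t) e^{-2 pi i k t} dt;
    # index m corresponds to the frequency signed_frequency(m, n).
    n = len(samples)
    return [
        sum(samples[j] * cmath.exp(-2j * math.pi * m * j / n) for j in range(n)) / n
        for m in range(n)
    ]


def solve_phi_minus_phi_ddot(samples):
    """Grid values of phi with phi - phi'' = psi, following Proposition 82."""
    n = len(samples)
    coeffs = fourier_coefficients(samples)
    # 1 - d^2/dt^2 acts on e^{2 pi i k t} as multiplication by 1 + 4 pi^2 k^2.
    phi_coeffs = [
        c / (1 + 4 * math.pi ** 2 * signed_frequency(m, n) ** 2)
        for m, c in enumerate(coeffs)
    ]
    # Sum the Fourier series back up on the grid; for real psi the result is real.
    return [
        sum(
            phi_coeffs[m] * cmath.exp(2j * math.pi * signed_frequency(m, n) * j / n)
            for m in range(n)
        ).real
        for j in range(n)
    ]


n = 32
grid = [j / n for j in range(n)]
psi = [math.cos(2 * math.pi * t) + math.sin(6 * math.pi * t) for t in grid]
phi = solve_phi_minus_phi_ddot(psi)
# Exact solution: phi^k = psi^k / (1 + 4 pi^2 k^2), here with |k| = 1 and |k| = 3.
exact = [
    math.cos(2 * math.pi * t) / (1 + 4 * math.pi ** 2)
    + math.sin(6 * math.pi * t) / (1 + 36 * math.pi ** 2)
    for t in grid
]
print(max(abs(a - b) for a, b in zip(phi, exact)))
```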


Remark. The previous proposition uses a standard technique for solving constant-coefficient differential equations on $S^1$: the differential equation breaks into an algebraic equation for each Fourier coefficient, each of which can typically be solved.

This also works for functions on the $n$-torus $(S^1)^n$. In this case, the Fourier series is summed over $\mathbb Z^n$, and we can solve constant-coefficient PDEs. Also, on $\mathbb R^n$, the Fourier transform turns constant-coefficient PDEs into algebraic equations in the Fourier transform variable.
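To illustrate the remark on the $n$-torus, here is a Python sketch for $n = 2$ that solves $\varphi - \Delta\varphi = \psi$ on $(S^1)^2 = \mathbb R^2/\mathbb Z^2$, with $\Delta = \partial_x^2 + \partial_y^2$ (an example equation chosen for illustration). The Fourier series is indexed by $(k_1, k_2) \in \mathbb Z^2$, and each coefficient is divided by $1 + 4\pi^2(k_1^2 + k_2^2)$. Grid sampling, the naive discrete transform, and the function name are assumptions made to keep the example short and self-contained.

```python
import cmath
import math


def signed_frequency(index, n):
    # Map a DFT index 0..n-1 to the signed frequency in (-n/2, n/2].
    return index if index <= n // 2 else index - n


def solve_on_two_torus(samples):
    """Grid values of phi with phi - (phi_xx + phi_yy) = psi on (S^1)^2."""
    n = len(samples)  # samples[j1][j2] = psi(j1 / n, j2 / n)
    points = [(j1, j2) for j1 in range(n) for j2 in range(n)]
    phi_coeffs = {}
    for m1 in range(n):
        for m2 in range(n):
            # Discrete approximation of the Fourier coefficient at (k1, k2).
            c = sum(
                samples[j1][j2] * cmath.exp(-2j * math.pi * (m1 * j1 + m2 * j2) / n)
                for j1, j2 in points
            ) / n ** 2
            k1, k2 = signed_frequency(m1, n), signed_frequency(m2, n)
            # One algebraic equation per lattice point: divide by the symbol.
            phi_coeffs[(m1, m2)] = c / (1 + 4 * math.pi ** 2 * (k1 ** 2 + k2 ** 2))
    # On grid points e^{2 pi i m j / n} = e^{2 pi i k j / n} for the signed k,
    # so the unsigned index m can be used when summing the series back up.
    return [
        [
            sum(
                c * cmath.exp(2j * math.pi * (m1 * j1 + m2 * j2) / n)
                for (m1, m2), c in phi_coeffs.items()
            ).real
            for j2 in range(n)
        ]
        for j1 in range(n)
    ]


n = 8
psi = [[math.cos(2 * math.pi * j1 / n) * math.sin(4 * math.pi * j2 / n)
        for j2 in range(n)] for j1 in range(n)]
phi = solve_on_two_torus(psi)
# psi only involves (k1, k2) = (+-1, +-2), so phi should equal psi / (1 + 20 pi^2).
scale = 1 + 20 * math.pi ** 2
error = max(abs(phi[j1][j2] - psi[j1][j2] / scale) for j1 in range(n) for j2 in range(n))
print(error)
```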

Homework Problem 65. $L^2_2(S^1, \mathbb C)$ is the complex Hilbert space defined by the inner product
$$\langle f, g \rangle_{L^2_2} = \int_{S^1} \big( f \bar g + \dot f\, \overline{\dot g} + \ddot f\, \overline{\ddot g} \big)\, dt.$$
The elements of $L^2_2(S^1, \mathbb C)$ are all complex-valued functions $f$ on $S^1$ which are $L^2$ and whose first and second derivatives $\dot f$ and $\ddot f$ in the sense of distributions are also $L^2$ functions. (You may assume $L^2_2(S^1, \mathbb C)$ is a Hilbert space, as in Proposition 62.)

Show that if $f_n \to f$ weakly in $L^2_2(S^1, \mathbb C)$, then for all $\varphi \in \mathcal D(S^1)$, $f_n(\varphi) \to f(\varphi)$. Hint: mimic the proofs of Propositions 81 and 82.

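To recap, so far we have a sequence of loops $\gamma_i$ in $\mathcal C$ so that
$$\lim_{i\to\infty} E(\gamma_i) = L = \inf_{\alpha \in \mathcal C} E(\alpha),$$
$$\lim_{i\to\infty} \gamma_i = \gamma \quad \text{uniformly and weakly in } L^2_1(S^1, \mathbb R^N).$$
Moreover, $\gamma \in \mathcal C$, the same free homotopy class of $L^2_1$ loops containing the $\gamma_i$.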

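Since $\gamma_i \to \gamma$ uniformly, and $S^1$ has length 1, we have
$$\|\gamma_i - \gamma\|^2_{L^2(S^1, \mathbb R^N)} = \int_{S^1} |\gamma_i - \gamma|^2\, dt \le \sup_t |\gamma_i - \gamma|^2 \to 0,$$
and so $\gamma_i \to \gamma$ in $L^2$.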


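Now Theorem 25 shows that
$$\|\gamma\|^2_{L^2_1(S^1, \mathbb R^N)} \le \liminf_{i\to\infty} \|\gamma_i\|^2_{L^2_1(S^1, \mathbb R^N)} = \liminf_{i\to\infty} \left[ E(\gamma_i) + \|\gamma_i\|^2_{L^2(S^1, \mathbb R^N)} \right] = L + \|\gamma\|^2_{L^2(S^1, \mathbb R^N)},$$
using $E(\gamma_i) \to L$ and $\gamma_i \to \gamma$ in $L^2$ for the last equality. Therefore
$$E(\gamma) = \|\gamma\|^2_{L^2_1(S^1, \mathbb R^N)} - \|\gamma\|^2_{L^2(S^1, \mathbb R^N)} \le L.$$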

Since $L$ is the infimum of the energy over all loops in $\mathcal C$, and $\gamma \in \mathcal C$, we also have $E(\gamma) \ge L$. So $E(\gamma) = L$. Thus we have proved

Theorem 30. Let $X$ be a compact Riemannian manifold without boundary. Then in each free homotopy class of loops, there is a $\gamma \in L^2_1(S^1, X)$ which minimizes the energy.

Corollary 83. This minimizing $\gamma$ satisfies the geodesic equations (in local coordinates on $X$) in the form
$$2\, \frac{d}{dt}\big( g_{ik} \dot\gamma^i \big) - g_{ij,k}\, \dot\gamma^i \dot\gamma^j = 0$$
in the sense of distributions.

Proof. See Proposition 61.

Note that in the proof of Theorem 30 above, we implicitly use the fact that the map $L^2_1(S^1) \to L^2(S^1)$ is compact, by using the inclusions
$$L^2_1(S^1) \hookrightarrow C^0(S^1) \hookrightarrow L^2(S^1),$$
the first of which is compact and the second of which is continuous. The following problem gives a direct proof.

Homework Problem 66. Show directly that the inclusion $L^2_1(S^1, \mathbb C) \hookrightarrow L^2(S^1, \mathbb C)$ is a compact linear map. Hints:

(a) Use the characterization of $L^2_1(S^1, \mathbb C)$ in terms of Fourier series from Proposition 87 below.


(b) If $\|f_i\|_{L^2_1} \le 1$ for all $i$, then use a diagonalization argument to produce a subsequence $f_{i_j}$ so that for each $k \in \mathbb Z$, the Fourier coefficients $f^k_{i_j}$ converge to constants $g^k \in \mathbb C$ as $j \to \infty$.

(c) For all $\varepsilon > 0$, show that there is an $N$ so that
$$\sum_{|k| \ge N} |f^k|^2 < \varepsilon$$
for all $f$ such that $\|f\|_{L^2_1} \le 1$.

(d) Conclude that the subsequence $f_{i_j}$ converges strongly to $\sum_{k\in\mathbb Z} g^k e^{2\pi i k t}$ in $L^2(S^1, \mathbb C)$.

Remark. The proof presented in the previous problem works for Sobolev spaces in higher dimensions (for functions on the $n$-dimensional torus $S^1 \times \cdots \times S^1$), whereas the use of the Sobolev embedding theorem for the compact inclusion $L^2_1(S^1, \mathbb C) \hookrightarrow C^0(S^1, \mathbb C)$ is only available in dimension $n = 1$.

4.8 Regularity

Now we show that $\gamma$ is smooth. First of all, note that $\Gamma^k_{ij}$ is smooth in each set of local coordinates $x$ on $X$. Also, since $\gamma \in L^2_1(S^1, \mathbb R^N)$, we know that $\gamma$ is continuous in $t \in S^1$, and so $\Gamma^k_{ij}(\gamma)$ is continuous on $S^1$.

Until now, we've been lax about distinguishing between $\gamma = (\gamma^1, \dots, \gamma^N) \in X \subset \mathbb R^N$ and $\gamma$ in local coordinates. There is an important point at which we should make a distinction. Recall that we are working on a coordinate chart $\varphi : U \to O \subset X \subset \mathbb R^N$, where $U \subset \mathbb R^n$. Our notation has been this: $\gamma^a$ is the $a$th coordinate of $\gamma$ in $\mathbb R^N \supset X$, while $\gamma^i$ has been shorthand for $(\varphi^{-1} \circ \gamma)^i$, the $i$th coordinate of $\varphi^{-1} \circ \gamma$ in $\mathbb R^n \supset U$.

In the previous subsections, we have dealt with the $L^2_1$ norm of $\gamma$ in $\mathbb R^N$, while in local coordinates, we should deal with the $L^2_1$ norm of $\varphi^{-1} \circ \gamma$ in $U \subset \mathbb R^n$. Let $\varphi^{-1} : O \to U$ be the restriction of the smooth map
$$y = (y^1, \dots, y^n) : Q \to U,$$


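where $Q$ is an open subset of $\mathbb R^N$ which contains $O \subset X \subset \mathbb R^N$. (Recall that we may do this by the definition of smooth maps from $O$ to $\mathbb R^n$.) Let $x = (x^1, \dots, x^N)$ represent coordinates on $\mathbb R^N$. Compute for $k = 1, \dots, n$
$$\frac{\partial}{\partial t} (y \circ \gamma)^k = \frac{\partial y^k}{\partial x^a}\, \dot\gamma^a,$$
where $a$ is summed from 1 to $N$.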

Proposition 84. Let $\varphi : U \to O$ be a smooth coordinate parametrization of $X$. Let $I \subset \mathbb R$ be a compact interval, and let $K \subset O$ be compact. Then there are positive constants $C_1, \dots, C_5$ so that
$$C_1 \|\gamma\|_{L^2_1(I, \mathbb R^N)} + C_3 \ge \|\varphi^{-1} \circ \gamma\|_{L^2_1(I, \mathbb R^n)} + C_4 \ge C_2 \|\gamma\|_{L^2_1(I, \mathbb R^N)} + C_5 \tag{49}$$
for all $\gamma$ such that $\gamma(I) \subset K$. (The point is that $C_1, C_2, C_3, C_4, C_5$ are independent of $\gamma$.)

Corollary 85. $\|\gamma\|_{L^2_1(I, \mathbb R^N)}$ is bounded if and only if $\|\varphi^{-1} \circ \gamma\|_{L^2_1(I, \mathbb R^n)}$ is bounded.

Remark. A related, simpler notion is the following: two norms $\|\cdot\|_{B_1}$ and $\|\cdot\|_{B_2}$ on a single linear space $B$ are called equivalent if there are constants $C_1 > C_2 > 0$ so that for all $x \in B$,
$$C_1 \|x\|_{B_1} \ge \|x\|_{B_2} \ge C_2 \|x\|_{B_1}.$$

Remark. As long as we restrict to compact subsets of coordinate charts, the norms in $\mathbb R^N \supset X$ and in local coordinates on $\mathbb R^n$ are equivalent. The corollary holds for all the Banach function spaces we have discussed, not just for $L^2_1$. Also, a similar proposition holds for Banach spaces of functions from $X$ to $\mathbb R$, not simply spaces of maps from $S^1$ to $X$:

For $K \subset\subset O$, the norms on $L^2_1(K)$ and $L^2_1(\varphi^{-1}K)$ are equivalent under the map
$$L^2_1(K) \to L^2_1(\varphi^{-1}K), \qquad f \mapsto f \circ \varphi.$$

Proof of Proposition 84. We claim it suffices to prove the bound (49) separately for the $L^2$ norm of $\gamma$ and for the $L^2$ norm of $\dot\gamma$. Proof: if $A = \|\gamma\|_{L^2}$ and $B = \|\dot\gamma\|_{L^2}$, then $\|\gamma\|_{L^2_1} = \sqrt{A^2 + B^2}$.


Then it is easy to check that for $A, B \ge 0$,
$$\frac{1}{\sqrt 2}(A + B) \le \sqrt{A^2 + B^2} \le A + B.$$
In other words, the norm on $\gamma$ given by the sum of the $L^2$ norm of $\gamma$ and the $L^2$ norm of $\dot\gamma$ is equivalent to the $L^2_1$ norm. It is straightforward to use this fact to prove the claim.
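For completeness, here is one way to check the elementary inequality above, assuming only $A, B \ge 0$ (a short derivation added for the reader):

```latex
(A+B)^2 = A^2 + 2AB + B^2 \ge A^2 + B^2,
\qquad
(A+B)^2 = A^2 + 2AB + B^2 \le 2(A^2 + B^2),
```

where the second bound uses $2AB \le A^2 + B^2$, that is, $(A - B)^2 \ge 0$. Taking nonnegative square roots gives $\frac{1}{\sqrt 2}(A+B) \le \sqrt{A^2+B^2} \le A + B$.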

Since $\varphi^{-1}$ is $C^1$ on $K$, it is locally Lipschitz and thus globally Lipschitz on $K$ (see Proposition 17). So for $C$ the Lipschitz constant and $x_0$ a point in $K$, for all $x \in K$,
$$|\varphi^{-1}(x)| \le |\varphi^{-1}(x_0)| + C|x - x_0| \le C' + C|x|, \qquad C' = |\varphi^{-1}(x_0)| + C|x_0|.$$

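Therefore, the Triangle Inequality gives
$$\begin{aligned}
\|\varphi^{-1}(\gamma)\|_{L^2(S^1, \mathbb R^n)} &= \left( \int_{S^1} |\varphi^{-1}(\gamma(t))|^2\, dt \right)^{1/2} \le \left( \int_{S^1} \big( C' + C|\gamma(t)| \big)^2\, dt \right)^{1/2} \\
&\le \left( \int_{S^1} (C')^2\, dt \right)^{1/2} + \left( \int_{S^1} \big[ C|\gamma(t)| \big]^2\, dt \right)^{1/2} = C' + C\|\gamma\|_{L^2(S^1, \mathbb R^N)}.
\end{aligned}$$
This is essentially one half of (49) for the $L^2$ norm of $\gamma$. The other half follows from the fact that $\varphi$ is a $C^1$ function on the compact set $\varphi^{-1}K$.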

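We still must address the $L^2$ norm of $\dot\gamma$. Recall for $y = \varphi^{-1}$ as above that
$$\frac{d}{dt}(\varphi^{-1} \circ \gamma) = \frac{\partial y}{\partial x^a}\, \dot\gamma^a.$$
On the compact set $K$, since $\varphi^{-1}$ is $C^1$, there is a constant $C$ so that
$$\left| \frac{\partial y}{\partial x^a} \right| \le C \quad \text{on } K,$$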


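and so on $K$
$$\left| \frac{d}{dt}(\varphi^{-1} \circ \gamma) \right| = \left| \frac{\partial y}{\partial x^a}\, \dot\gamma^a \right| \le C \sum_{a=1}^N |\dot\gamma^a| \le CN |\dot\gamma|.$$
Thus, as in the previous paragraph,
$$\left\| \frac{d}{dt}(\varphi^{-1} \circ \gamma) \right\|_{L^2(S^1, \mathbb R^n)} \le CN \|\dot\gamma\|_{L^2(S^1, \mathbb R^N)}.$$
The opposite inequality can be obtained by considering $\varphi$ as a $C^1$ map instead of $\varphi^{-1}$.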

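Remark. In the previous proof, it sufficed to consider the $L^2$ norms of $\gamma$ and $\dot\gamma$ separately. For higher derivatives, this is no longer adequate: compute
$$\frac{d^2}{dt^2}(\varphi^{-1} \circ \gamma) = \frac{\partial y}{\partial x^a}\, \ddot\gamma^a + \frac{\partial^2 y}{\partial x^a \partial x^b}\, \dot\gamma^a \dot\gamma^b.$$
So first-derivative terms of $\gamma$ come into the calculation of the second derivatives of $\varphi^{-1} \circ \gamma$.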
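The second-derivative formula in the remark can be sanity-checked numerically. The Python sketch below uses a hypothetical map $y(x^1, x^2) = (x^1)^2 x^2 + \sin x^2$ from $\mathbb R^2$ to $\mathbb R$ (so $N = 2$ and $n = 1$) and the hypothetical curve $\gamma(t) = (\cos t, \sin t)$, and compares the chain-rule expression with a centered second difference of $y \circ \gamma$ (step size $10^{-4}$, an arbitrary choice). The Hessian term, quadratic in $\dot\gamma$, is the contribution that a bound on $\ddot\gamma$ alone would not control.

```python
import math


def y(x1, x2):
    # Hypothetical smooth map R^2 -> R.
    return x1 ** 2 * x2 + math.sin(x2)


def gradient(x1, x2):
    return (2 * x1 * x2, x1 ** 2 + math.cos(x2))


def hessian(x1, x2):
    return ((2 * x2, 2 * x1), (2 * x1, -math.sin(x2)))


def gamma(t):
    return (math.cos(t), math.sin(t))


def gamma_dot(t):
    return (-math.sin(t), math.cos(t))


def gamma_ddot(t):
    return (-math.cos(t), -math.sin(t))


def chain_rule_second_derivative(t):
    # (y o gamma)'' = dy/dx^a * gamma''^a + d^2 y/dx^a dx^b * gamma'^a * gamma'^b
    x, v, a = gamma(t), gamma_dot(t), gamma_ddot(t)
    g, h = gradient(*x), hessian(*x)
    first_order = sum(g[i] * a[i] for i in range(2))
    second_order = sum(h[i][j] * v[i] * v[j] for i in range(2) for j in range(2))
    return first_order + second_order


def finite_difference_second_derivative(t, step=1e-4):
    # Centered second difference of t -> y(gamma(t)).
    def composite(s):
        return y(*gamma(s))

    return (composite(t + step) - 2 * composite(t) + composite(t - step)) / step ** 2


t0 = 0.7
print(chain_rule_second_derivative(t0), finite_difference_second_derivative(t0))
```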

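The geodesic equation is written in terms of the coordinates on $U \subset \mathbb R^n$, and for an open interval $I \subset S^1$, $\gamma(I) \subset O$. On any compact subinterval of $I$, there is a constant $C$ so that the components of the metric $g_{k\ell}(\gamma)$ and its first derivatives $g_{k\ell,m}(\gamma)$ have absolute values bounded by $C$ (this is because $\gamma$ is continuous on the compact subinterval). Replacing $I$ by such a subinterval, note that since $\gamma \in L^2_1$, each $\dot\gamma^i \in L^2$. Therefore, Hölder's inequality shows that
$$\int_I \tfrac12 \big| g_{ij,k} \dot\gamma^i \dot\gamma^j \big|\, dt \le \frac C2 \sum_{i,j=1}^n \left( \int_I |\dot\gamma^i|^2\, dt \right)^{1/2} \left( \int_I |\dot\gamma^j|^2\, dt \right)^{1/2} < \infty.$$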

Thus $\frac12 |g_{ij,k} \dot\gamma^i \dot\gamma^j| \in L^1(I)$ for each $k$, and thus Corollary 83 shows $\frac{d}{dt}(g_{ik} \dot\gamma^i) \in L^1(I)$ for each $k$ in the sense of distributions. Lemma 86 below and the proof of Proposition 59 above then show that $g_{ik}(\gamma) \dot\gamma^i$ is continuous. Now since the inverse metric $g^{\ell m}(\gamma)$ is continuous in $t$, we may multiply by it to show that each $\dot\gamma^i$ is continuous as well. Thus $\gamma$ is locally $C^1$.

Now bootstrap using Corollary 83 again to show that $\frac{d}{dt}(g_{ik}(\gamma) \dot\gamma^i)$ is continuous. Thus $g_{ik}(\gamma) \dot\gamma^i$ is, in the sense of distributions, a $C^1$ function. As above, this shows $\dot\gamma^i$ is also $C^1$, and thus $\gamma$ is locally $C^2$.


We now have enough regularity to rewrite Corollary 83 as the geodesic equation
$$\ddot\gamma^k = -\Gamma^k_{ij}(\gamma)\, \dot\gamma^i \dot\gamma^j$$
for $\gamma^k$ $C^2$ functions. The equation holds in the usual sense of ODEs. Therefore, since $\Gamma^k_{ij}$ is smooth, the usual regularity theory for ODEs, Theorem 9, applies, and the geodesic $\gamma$ is smooth.

Lemma 86. Let $f \in L^1_{\mathrm{loc}}(\mathbb R)$. Then
$$g(t) = \int_{t_0}^t f(s)\, ds$$
is continuous.

Proof. Let $t \in \mathbb R$, and let $h > 0$ (the case $h < 0$ is similar). Compute
$$g(t+h) - g(t) = \int_t^{t+h} f(s)\, ds = \int_{\mathbb R} \chi_{[t,t+h]}(s)\, f(s)\, ds$$
for $\chi_{[t,t+h]}$ the characteristic function of the interval $[t, t+h]$. Then as $h \to 0$,
$$\chi_{[t,t+h]}(s)\, f(s) \to 0$$
almost everywhere on $\mathbb R$. For small $h$ (namely $0 < h \le 1$),
$$\big| \chi_{[t,t+h]}(s)\, f(s) \big| \le \big| \chi_{[t-1,t+1]}(s)\, f(s) \big|,$$
and the right-hand function is integrable since $f$ is locally $L^1$. Then the Dominated Convergence Theorem says that
$$g(t+h) - g(t) = \int_{\mathbb R} \chi_{[t,t+h]}(s)\, f(s)\, ds \to \int_{\mathbb R} 0\, ds = 0$$
as $h \to 0^+$. The case $h \to 0^-$ is similar. Thus $g(t+h) \to g(t)$ as $h \to 0$, and $g$ is continuous at each $t \in \mathbb R$.

Homework Problem 67. Let $f : \mathbb R \to \mathbb R$ be an $L^1$ function. Show that
$$\psi(t) = \exp\left( \int_0^t f(\tau)\, d\tau \right)$$
is a continuous function satisfying $\psi(0) = 1$, and that $\psi$ solves $\dot\psi = f(t)\psi$ in the sense of distributions. (Hint: approximate $f$ in $L^1$ by a sequence of $C^\infty$ functions.)


4.9 Sobolev spaces, distributions, and Fourier series

In this subsection, we provide some more background results about Sobolev spaces and distributions on $S^1$.

First of all, we describe $\mathbb C$-valued distributions. A complex-valued distribution is a $\mathbb C$-linear map from $C^\infty(S^1, \mathbb C)$ to $\mathbb C$.

Example 22. For $k \in \mathbb Z$, the map
$$\varphi \mapsto \varphi_k = \int_{S^1} \varphi\, e^{2\pi i k t}\, dt$$
is a distribution.

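Proposition 87.
$$L^2_1(S^1, \mathbb C) = \left\{ \sum_{k\in\mathbb Z} f^k e^{2\pi i k t} : \sum_{k\in\mathbb Z} |f^k|^2 (k^2 + 1) < \infty \right\}.$$
Moreover, the norm $\|f\|_{L^2_1}$ is equivalent to
$$\left( \sum_{k\in\mathbb Z} |f^k|^2 (k^2 + 1) \right)^{1/2}.$$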

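Proof. First we show $\subset$. Let $f \in L^2_1(S^1, \mathbb C)$ and compute
$$\dot f(e^{-2\pi i k t}) = \int_{S^1} \dot f(t)\, e^{-2\pi i k t}\, dt = -\int_{S^1} f(t)(-2\pi i k)\, e^{-2\pi i k t}\, dt = 2\pi i k f^k.$$
Since $\dot f \in L^2$, Parseval's identity gives
$$\sum_{k\in\mathbb Z} 4\pi^2 k^2 |f^k|^2 = \|\dot f\|^2_{L^2} < \infty, \quad \text{and so} \quad \sum_{k\in\mathbb Z} k^2 |f^k|^2 < \infty.$$
Now since $f \in L^2$ also, $\sum_{k\in\mathbb Z} |f^k|^2 < \infty$, and so $\sum_{k\in\mathbb Z} |f^k|^2 (k^2 + 1) < \infty$. This proves $\subset$.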


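To show $\supset$, note that
$$\sum_{k\in\mathbb Z} |f^k|^2 (k^2 + 1) < \infty \iff \sum_{k\in\mathbb Z} |f^k|^2 < \infty \text{ and } \sum_{k\in\mathbb Z} k^2 |f^k|^2 < \infty.$$
Therefore,
$$f = \sum_{k\in\mathbb Z} f^k e^{2\pi i k t} \in L^2,$$
and by the computations in the previous paragraph (which only use the definition of the distributional derivative), $\dot f^k \equiv \dot f(e^{-2\pi i k t}) = 2\pi i k f^k$. Consider a test function $\varphi \in C^\infty(S^1, \mathbb C)$. Writing Parseval's identity without complex conjugation as $\int_{S^1} g h\, dt = \sum_{k\in\mathbb Z} g^k h^{-k}$ for $g, h \in L^2$, compute
$$\begin{aligned}
\dot f(\varphi) &= -f(\dot\varphi) = -\int_{S^1} f \dot\varphi\, dt = -\sum_{k\in\mathbb Z} f^k (\dot\varphi)^{-k} = -\sum_{k\in\mathbb Z} f^k (-2\pi i k)\, \varphi^{-k} \\
&= \sum_{k\in\mathbb Z} \dot f^k \varphi^{-k} = \int_{S^1} \Big( \sum_{k\in\mathbb Z} \dot f^k e^{2\pi i k t} \Big) \varphi\, dt.
\end{aligned}$$
This shows that
$$\dot f = \sum_{k\in\mathbb Z} \dot f^k e^{2\pi i k t} = \sum_{k\in\mathbb Z} (2\pi i k) f^k e^{2\pi i k t}$$
in the sense of distributions. Therefore, both $f$ and $\dot f$ are in $L^2$, and thus $f \in L^2_1(S^1, \mathbb C)$. The statement about equivalence of the norms follows easily.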
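One way to make the last step explicit, assuming (as in the proof of Proposition 84) that $\|f\|^2_{L^2_1} = \|f\|^2_{L^2} + \|\dot f\|^2_{L^2}$: Parseval's identity and $\dot f^k = 2\pi i k f^k$ give the first line below, and comparing the weights gives explicit constants.

```latex
\|f\|_{L^2_1}^2 = \sum_{k\in\mathbb Z} |f^k|^2 + \sum_{k\in\mathbb Z} 4\pi^2 k^2 |f^k|^2
               = \sum_{k\in\mathbb Z} |f^k|^2 \,(1 + 4\pi^2 k^2),
\\
1 + k^2 \le 1 + 4\pi^2 k^2 \le 4\pi^2 (1 + k^2)
\quad\Longrightarrow\quad
\sum_{k\in\mathbb Z} |f^k|^2 (k^2+1) \le \|f\|_{L^2_1}^2 \le 4\pi^2 \sum_{k\in\mathbb Z} |f^k|^2 (k^2+1).
```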

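Remark. Similar easy calculations show that
$$L^2_m(S^1, \mathbb C) = \left\{ \sum_{k\in\mathbb Z} f^k e^{2\pi i k t} : \sum_{k\in\mathbb Z} |f^k|^2 (k^2 + 1)^m < \infty \right\}$$
for every $m = 0, 1, 2, \dots$. Our characterization of smooth functions in (48) above then shows that
$$C^\infty(S^1, \mathbb C) = \bigcap_{m=0}^\infty L^2_m(S^1, \mathbb C).$$
Proof: it is straightforward to show that $L^2_m(S^1, \mathbb C)$ compactly embeds in $C^{m-1}(S^1, \mathbb C)$ for all $m \ge 1$.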
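The key estimate behind the inclusion $L^2_m(S^1, \mathbb C) \subset C^{m-1}(S^1, \mathbb C)$ is a Cauchy-Schwarz bound on the differentiated Fourier series. A sketch, for $0 \le p \le m - 1$ and using $|k|^p \le (1 + k^2)^{p/2}$:

```latex
\sup_{t\in S^1} \big| f^{(p)}(t) \big|
  \le \sum_{k\in\mathbb Z} (2\pi|k|)^p \, |f^k|
  \le (2\pi)^p \sum_{k\in\mathbb Z} |f^k| (1+k^2)^{m/2} \,(1+k^2)^{(p-m)/2}
  \le (2\pi)^p \Big( \sum_{k\in\mathbb Z} |f^k|^2 (1+k^2)^m \Big)^{1/2}
       \Big( \sum_{k\in\mathbb Z} (1+k^2)^{p-m} \Big)^{1/2},
```

and the last sum is finite because $p - m \le -1$. So the series for $f^{(p)}$ converges absolutely and uniformly, and the inclusion is continuous; compactness takes a further argument.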

The Fourier series isometry between $L^2(S^1, \mathbb C)$ and the sequence space $\ell^2 = L^2(\mathbb Z, \mathbb C)$ also allows us to define even more Sobolev spaces.

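For any $s \in \mathbb R$, define $L^2_s(S^1, \mathbb C)$ to be the set of distributions $f$ which act on
$$\varphi = \sum_{k\in\mathbb Z} \varphi^k e^{2\pi i k t}$$
by
$$f(\varphi) = \sum_{k\in\mathbb Z} f_k \varphi^k, \tag{50}$$
where $f_k = f(e^{2\pi i k t})$, and we assume that
$$\sum_{k\in\mathbb Z} |f_k|^2 (1 + k^2)^s < \infty. \tag{51}$$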
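As an illustration of (50) and (51) (an example not taken from the text above), evaluation at $0$, $\delta(\varphi) = \varphi(0) = \sum_k \varphi^k$, is of the form (50) with $f_k = \delta(e^{2\pi i k t}) = 1$ for all $k$. So (51) holds exactly when $\sum_k (1 + k^2)^s < \infty$, that is, when $s < -1/2$. The Python sketch below computes symmetric partial sums of (51) for this sequence; the helper names are hypothetical. For $s = -1$ the partial sums should approach $\pi \coth \pi$, the classical value of $\sum_{k\in\mathbb Z} 1/(1+k^2)$, while for $s = -1/2$ they grow like $2 \log K$.

```python
import math


def weighted_partial_sum(coeff, s, K):
    # Symmetric partial sum of (51): sum over |k| <= K of |f_k|^2 (1 + k^2)^s.
    return sum(abs(coeff(k)) ** 2 * (1 + k * k) ** s for k in range(-K, K + 1))


def delta(k):
    # Coefficients f_k = delta(e^{2 pi i k t}) = 1 of evaluation at t = 0.
    return 1.0


converging = weighted_partial_sum(delta, -1.0, 100_000)
print(converging, math.pi / math.tanh(math.pi))  # close: the tail is about 2 / K

diverging = [weighted_partial_sum(delta, -0.5, K) for K in (100, 10_000)]
print(diverging[1] - diverging[0])  # roughly 2 * log(100); unbounded as K grows
```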

Homework Problem 68. Show that if $f_k$ is a sequence of complex numbers satisfying (51), then for any $\varphi \in C^\infty(S^1, \mathbb C)$, the sum in (50) converges.

Now we are able to put a topology on $C^\infty(S^1, \mathbb C)$. We only describe this topology in terms of convergence of sequences. We say $\varphi_j \to \varphi$ in $C^\infty(S^1, \mathbb C)$ if $\varphi_j \to \varphi$ in $L^2_m(S^1, \mathbb C)$ for all $m \ge 0$.

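Homework Problem 69. Show that $\varphi_j \to \varphi$ in $C^\infty(S^1, \mathbb C)$ if and only if $\varphi_j \to \varphi$ in $C^p(S^1, \mathbb C)$ for all $p \ge 0$.

Hint: You may use the fact that $L^2_m(S^1, \mathbb C)$ embeds compactly into $C^{m-1}(S^1, \mathbb C)$ for each $m \ge 1$. Also show that $C^p(S^1, \mathbb C)$ embeds continuously into $L^2_p(S^1, \mathbb C)$ for all $p \ge 0$.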

Now we finally give the correct definition of complex distributions on $S^1$. A distribution on $S^1$ is a continuous $\mathbb C$-linear map from $C^\infty(S^1, \mathbb C)$ to $\mathbb C$. Denote the space of complex distributions on $S^1$ by $\mathcal D'(S^1, \mathbb C)$.


Proposition 88. $\mathcal D'(S^1, \mathbb C) = \bigcup_{m\in\mathbb Z} L^2_m(S^1, \mathbb C)$, and the image of $\mathcal D'(S^1, \mathbb C)$ under the Fourier transform is the set of all polynomially bounded complex sequences. In other words, it is the set of all sequences $f_k$ for which there are $m \in \mathbb Z$ and $C > 0$ so that $|f_k| \le C(k^2 + 1)^{m/2}$ for all $k \in \mathbb Z$.

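Proof. We prove the first equality, and leave the rest as an exercise. To prove $\supset$, if $f$ is in the union, then $f \in L^2_{-m}(S^1, \mathbb C)$ for some positive $m$. To show $f \in \mathcal D'(S^1, \mathbb C)$, consider a sequence $\varphi_j \to \varphi$ in $C^\infty(S^1, \mathbb C)$. Then by definition, $\varphi_j \to \varphi$ in $L^2_m$. Then
$$\begin{aligned}
|f(\varphi_j) - f(\varphi)| = |f(\varphi_j - \varphi)| &\le \sum_{k\in\mathbb Z} |f_k (\varphi_j^k - \varphi^k)| = \sum_{k\in\mathbb Z} \frac{|f_k|}{(1 + k^2)^{m/2}} \Big[ |\varphi_j^k - \varphi^k| (1 + k^2)^{m/2} \Big] \\
&\le \left( \sum_{k\in\mathbb Z} \frac{|f_k|^2}{(1 + k^2)^m} \right)^{1/2} \left( \sum_{k\in\mathbb Z} |\varphi_j^k - \varphi^k|^2 (1 + k^2)^m \right)^{1/2}.
\end{aligned}$$
The second factor in the last line goes to zero by the remark after Proposition 87, while the first factor is finite because $f \in L^2_{-m}$. Therefore, $f(\varphi_j) \to f(\varphi)$ whenever $\varphi_j \to \varphi$ in $C^\infty(S^1, \mathbb C)$, and $f \in \mathcal D'(S^1, \mathbb C)$.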

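We prove $\subset$ by contradiction. If $f \in \mathcal D'(S^1, \mathbb C)$ is not in $L^2_m(S^1, \mathbb C)$ for any $m \in \mathbb Z$, then for all $m \in \mathbb Z$,
$$\sum_{k=-\infty}^{\infty} |f_k|^2 (1 + k^2)^m = \infty.$$
This implies that
$$\sup_{k\in\mathbb Z} |f_k|^2 (1 + k^2)^m = \infty \quad \text{for all } m \in \mathbb Z.$$
(Proof of the contrapositive: if $|f_k|^2 (1 + k^2)^m \le C$ for all $k$, then $\sum_{k\in\mathbb Z} |f_k|^2 (1 + k^2)^{m-1} \le \sum_{k\in\mathbb Z} \frac{C}{1 + k^2} < \infty$.)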


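So for each $j$, taking $m = -j$, there is a $k_j$ so that
$$\frac{|f_{k_j}|}{(1 + k_j^2)^{j/2}} \ge 1.$$
We may assume $k_j \ne 0$, since the supremum is infinite and so infinitely many $k$ satisfy this inequality. Now we construct a sequence $\varphi_j$ which converges to 0 in $C^\infty(S^1, \mathbb C)$, but for which $f(\varphi_j) \not\to 0$. Define
$$\varphi_j = \frac{\overline{f_{k_j}}}{|f_{k_j}| (1 + k_j^2)^{j/2}}\, e^{2\pi i k_j t}.$$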

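Compute
$$\|\varphi_j\|^2_{L^2_n} \approx \frac{(1 + k_j^2)^n}{(1 + k_j^2)^j} = (1 + k_j^2)^{n-j},$$
where $\approx$ denotes equivalence of norms. For each fixed $n$, since each $k_j^2 \ge 1$, we have $(1 + k_j^2)^{n-j} \le 2^{n-j}$ for $j \ge n$, so
$$\lim_{j\to\infty} \|\varphi_j\|^2_{L^2_n} = 0,$$
and so $\varphi_j \to 0$ in $C^\infty(S^1, \mathbb C)$. On the other hand,
$$f(\varphi_j) = f_{k_j} \cdot \frac{\overline{f_{k_j}}}{|f_{k_j}| (1 + k_j^2)^{j/2}} = \frac{|f_{k_j}|}{(1 + k_j^2)^{j/2}} \ge 1.$$
So $f(\varphi_j) \not\to 0 = f(0) = f(\lim \varphi_j)$, where $\varphi_j \to 0$ in $C^\infty(S^1, \mathbb C)$. This contradicts the continuity of $f$.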
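The construction in the proof of $\subset$ can be traced numerically. The Python sketch below uses the hypothetical coefficient sequence $f_k = e^{|k|}$, which is not polynomially bounded, so by Proposition 88 it does not correspond to a distribution. For each $j$ it picks the first $k_j \ge 1$ with $|f_{k_j}|/(1 + k_j^2)^{j/2} \ge 1$ (comparing logarithms to avoid overflow), and records $f(\varphi_j) = |f_{k_j}|/(1 + k_j^2)^{j/2}$ together with the weight $(1 + k_j^2)^{n-j}$, which is comparable to $\|\varphi_j\|^2_{L^2_n}$. The pairing stays at least 1 while the weights go to 0.

```python
import math


def log_abs_coefficient(k):
    # Hypothetical sequence f_k = e^{|k|}; only log |f_k| is ever needed.
    return float(abs(k))


def choose_k(j):
    # First k >= 1 with |f_k| / (1 + k^2)^{j/2} >= 1, compared in logarithms.
    k = 1
    while log_abs_coefficient(k) < (j / 2) * math.log1p(k * k):
        k += 1
    return k


def construction(j_max, n):
    rows = []
    for j in range(1, j_max + 1):
        k_j = choose_k(j)
        # f(phi_j) = |f_{k_j}| / (1 + k_j^2)^{j/2}, which is >= 1 by the choice of k_j.
        pairing = math.exp(log_abs_coefficient(k_j) - (j / 2) * math.log1p(k_j * k_j))
        # (1 + k_j^2)^{n - j}, comparable to the squared L^2_n norm of phi_j.
        weight = (1 + k_j * k_j) ** (n - j)
        rows.append((j, k_j, pairing, weight))
    return rows


for row in construction(j_max=12, n=3):
    print(row)
```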
