Introduction to Applicable Analysis: Part II
(1997 Spring version)
Y. Oono
Beckman Institute and Department of Physics
405 N. Mathews Av.
University of Illinois at Urbana-Champaign
Urbana, IL 61801
February 13, 1997
Teaching is the Fine Art of Imparting Knowledge without Possessing It. -Mark Twain
Hence, you must be careful about using these notes. This Part II is still crude, especially towards its end.
Table of Contents

19 Integration Revisited
    Appendix a19 Measure
20 Hilbert Space
21 Orthogonal Polynomials
    21A General Theory
    21B Representative Examples
22 Numerical Integration
    22A Gauss Formulas
    22B Variable Transformation Schemes
    22C Multidimensional Integrals
    Appendix a2C Electrodynamics
23 Separation of Variables - General Consideration
24 General Linear ODE
    24A General Theory
    24B Frobenius' Theory
    24C Representative Examples
    Appendix a24 Floquet Theory
25 Asymptotic Expansion
26 Spherical Harmonics
    26A Basic Theory
    26B Application to PDE
27 Cylinder Functions
    27A General Theory
    27B Application to PDE
28 Diffusion Equation: How irreversibility is captured
29 Laplace Equation: Consequence of spatial moving average
30 Wave Equation: Finiteness of propagation speed
31 Numerical solution of PDE
32 Fourier Transformation
    32A Basics
    32B Application of Fourier Transform
    32C Fourier Analysis of Generalized Functions
    32D Radon Transformation
    Appendix a32 Bessel Transform
33 Laplace Transformation
    Appendix a33 Mellin Transformation
34 Linear Operators
    34A Self-Adjointness
    34B Spectral Decomposition
    34C Spectrum
35 Spectrum of Sturm-Liouville Problem
36 Green's Function: Laplace Equation
37 Spectrum of Laplacian
38 Green's Function: Diffusion Equation
39 Green's Function: Helmholtz Equation
40 Green's Function: Wave Equation
41 Colloquium: What is Computation?
    41A Recursive Functions and Church Thesis
    41B Turing Machine
    41C Decision Problem
    41D Computable Analysis
    41E Algorithmic Randomness
    41F Randomness as a Fundamental Concept
Appendix A Rudiments of Analysis
    Table of Standard Symbols
    A1 Points and Limits
    A2 Function
    A3 Differentiation
    A4 Integration
    A5 Infinite Series
    A6 Function of Two Variables
    A7 Fourier Series and Fourier Transform
    A8 Ordinary Differential Equation
    A9 Vector Analysis
Index
19 Integration Revisited
The Riemann integral can handle piecewise continuous functions. However, there are many functions that cannot be integrated in the Riemann sense, although the values of their integrals are more or less obvious. In this section, the basic idea of the Lebesgue integral is given with a practical summary. The theory is a natural prerequisite for understanding Hilbert space. The most natural integral concept for Fourier expansion is the Lebesgue integral. In the Appendix, rudiments of measure theory are outlined.

Key words: measure zero, almost everywhere, Lebesgue integral, dominated convergence theorem, Beppo-Levi's theorem, Fubini's theorem, Gaussian integral, Wick's theorem.

Remember:
(1) The Lebesgue integral is defined through the integral of simple functions (= functions taking at most countably many values) (19.7-8).
(2) There are several very powerful theorems for Lebesgue integration (19.11-17). Basically, they justify what looks formally OK to physicists.
(3) The Lebesgue integral is the most natural framework in which to consider Fourier analysis (19.18).
(4) Gaussian integrals should be very familiar (19.19-20).
[19.0 Practical Check].
Exercise. Before going into the discussion of the Lebesgue integration theory, let us check our practical ability to compute Riemann integrals.
(1) Compute the following indefinite integral:
$$\int dx\,\frac{ax+b}{cx+d}. \tag{19.1}$$
Here we assume that $a$, $b$, $c\,(\neq 0)$, $d$ are constants.
(2) Let $n \in \mathbf{N}$. For
$$I_n \equiv \int_0^{\pi/2} \sin^n x\,dx \tag{19.2}$$
demonstrate that
$$I_n = \left(1 - \frac{1}{n}\right) I_{n-2}. \tag{19.3}$$
Then, compute $I_n$.
(3) Find the range of $\alpha$ where
$$\int_0^\infty \frac{\sin^2 x}{x^\alpha}\,dx \tag{19.4}$$
exists.
(4) [Fresnel integral]. Show that
[equation (19.5) is missing in the source]
exists (as a Riemann integral). cf. 8B.8(1).
(5) Does
$$\int_0^\infty \sin(\cosh x)\,dx \tag{19.6}$$
exist (as a Riemann integral)?
(6) Show
[equation (19.7) is garbled in the source; it is an integral over $[0,\infty)$ involving $e^{-\alpha x}$ and $\sin\lambda x$, with $\alpha^2+\lambda^2$ in the denominator of the right-hand side]
Use (→8B.7).
(7) Show that
$$\int_0^\infty \frac{\sin ax\,\cos bx}{x}\,dx = \frac{\pi}{2}, \tag{19.8}$$
if $a > b > 0$. What happens otherwise?
(8) Show that
$$\int_0^{\pi/2} \log\sin\theta\,d\theta = -\frac{\pi}{2}\log 2. \tag{19.9}$$
(9) Compute
$$\lim_{n\to\infty}\frac{1}{n}\left[1 + \cos\frac{x}{n} + \cos\frac{2x}{n} + \cdots + \cos\frac{(n-1)x}{n} + \cos x\right]. \tag{19.10}$$
(10) Compute
$$\frac{d^n}{dx^n}\int_0^x \frac{(x-y)^{n-1}}{(n-1)!}\,f(y)\,dy. \tag{19.11}$$

Discussion.
(1) Let
$$I(a,b) = \int_0^\infty \frac{dx}{\sqrt{(a^2+x^2)(b^2+x^2)}} \tag{19.12}$$
for positive $a$ and $b$. Show that
$$I(a,b) = I(a_n, b_n)$$
for any $n = 1, 2, \ldots$, where $a_{n+1} = (a_n + b_n)/2$ and $b_{n+1} = \sqrt{a_n b_n}$, with $a_1 = a$ and $b_1 = b$. $a_n$ and $b_n$ converge to a common limit $\mu$ determined by $a$ and $b$. Gauss (→7.15) used the above observation to compute $\mu = \pi/(2I)$. Show this conclusion.
(2) Let $f$ be integrable on $[0,1]$. Then
$$\int_0^1 \exp(f(t))\,dt \ge \exp\left(\int_0^1 f(t)\,dt\right). \tag{19.15}$$
Note that $\int_0^1 f(t)\,dt$ may be understood as the average of $f$ on $[0,1]$ (→2A.1, Discussion (A)).
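The claim in Discussion (1) can be checked numerically before it is proved. The following is only a sketch: the function names, the substitution $x = b\tan\theta$ (which turns $I(a,b)$ into $\int_0^{\pi/2} d\theta/\sqrt{a^2\cos^2\theta + b^2\sin^2\theta}$), the Simpson rule, and the sample values $a = 1$, $b = 2$ are all choices made here for illustration, not part of the notes.

```python
import math

def agm(a, b, tol=1e-14):
    """Common limit of a_{n+1} = (a_n + b_n)/2, b_{n+1} = sqrt(a_n b_n)."""
    while abs(a - b) > tol * max(a, b):
        a, b = (a + b) / 2.0, math.sqrt(a * b)
    return (a + b) / 2.0

def gauss_integral(a, b, panels=2000):
    """I(a, b) after the substitution x = b*tan(theta), by the composite
    Simpson rule on [0, pi/2] (panels must be even)."""
    def g(theta):
        return 1.0 / math.sqrt((a * math.cos(theta)) ** 2 + (b * math.sin(theta)) ** 2)
    h = (math.pi / 2) / panels
    total = g(0.0) + g(math.pi / 2)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * g(k * h)
    return total * h / 3.0

a, b = 1.0, 2.0
# One step of the iteration should leave the integral unchanged.
print(gauss_integral(a, b), gauss_integral((a + b) / 2, math.sqrt(a * b)))
# The integral should agree with pi / (2 * mu).
print(gauss_integral(a, b), math.pi / (2 * agm(a, b)))
```

Such a check does not replace the demonstration asked for, but it makes the invariance $I(a,b) = I(a_n,b_n)$ plausible.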
19.1 Dirichlet function. The Dirichlet function is defined as$^{275}$
$$D(x) = \begin{cases} 0 & \text{for } x \notin \mathbf{Q},\\ 1 & \text{for } x \in \mathbf{Q}. \end{cases} \tag{19.16}$$
$\int_0^1 dx\,D(x)$ must be zero, but obviously this function is not Riemann integrable.
19.2 The area below $D(x)$ must be zero. We know (→17.18(4), A1.16) that all the rational numbers can be counted, so we may write the totality of rational numbers in $[0,1]$ as $Q \equiv \{y_n\}_{n=1}^{\infty} = \mathbf{Q}\cap[0,1]$. Let us cover $y_n$ with an interval $E_n$ of length $\epsilon/2^n$ centered at $y_n$. Obviously, $\cup_n E_n \supset Q$ for any positive $\epsilon$, but the total length of $\cup_n E_n$ is not larger than $\epsilon$, because $\mathrm{length}(\cup_n E_n) \le \sum_n \mathrm{length}(E_n) = \epsilon$. Here $\epsilon$ is an arbitrary positive number, so it can be indefinitely small. Hence, the total area occupied by $Q$ must be zero. This must be the area below $D(x)$ on $[0,1]$. Hence, '$\int_0^1 dx\,D(x)$' $= 0$ (→19.7).
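The covering argument of 19.2 can be illustrated with exact rational arithmetic. The sketch below makes several assumptions: the rationals are enumerated by increasing denominator (any enumeration would do), only the first 200 of them are covered, and the helper names are invented here.

```python
from fractions import Fraction

def first_rationals(count):
    """First `count` distinct rationals in [0, 1], ordered by denominator."""
    seen, out = set(), []
    q = 1
    while len(out) < count:
        for p in range(q + 1):
            r = Fraction(p, q)
            if r not in seen:
                seen.add(r)
                out.append(r)
                if len(out) == count:
                    break
        q += 1
    return out

def covering_intervals(eps, count):
    """Interval E_n of length eps/2^n centered at y_n, for n = 1..count."""
    return [(y - eps / 2 ** (n + 1), y + eps / 2 ** (n + 1))
            for n, y in enumerate(first_rationals(count), start=1)]

def union_length(intervals):
    """Exact total length of a union of intervals (overlaps are merged)."""
    total = Fraction(0)
    lo_cur, hi_cur = None, None
    for lo, hi in sorted(intervals):
        if hi_cur is None or lo > hi_cur:
            if hi_cur is not None:
                total += hi_cur - lo_cur
            lo_cur, hi_cur = lo, hi
        else:
            hi_cur = max(hi_cur, hi)
    if hi_cur is not None:
        total += hi_cur - lo_cur
    return total

eps = Fraction(1, 100)
cover = covering_intervals(eps, 200)
print(float(union_length(cover)), "<=", float(eps))
```

However many rationals are included, the covered length stays below $\epsilon$, which is the content of the estimate $\mathrm{length}(\cup_n E_n) \le \epsilon$.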
19.3 Measure zero. We have demonstrated that $Q$ is measure zero. A set $U \subset \mathbf{R}$ is called a measure zero set if it can be covered by countably many open intervals whose total length is less than $\epsilon$ for any $\epsilon\,(> 0)$. 19.2 tells us that any countable set is measure zero. See Appendix a19 for a general discussion about measure (→a19.4).
19.4 Lebesgue's characterization of Riemann integrability. In his thesis, Lebesgue showed the following theorem.
Theorem. A bounded function $f$ is integrable in the sense of Riemann on $[a,b]$ if and only if the set of discontinuous points of $f$ is measure zero. □
Obviously, $D(x)$ is not integrable in the sense of Riemann.
19.5 "Almost everywhere". Lebesgue also introduced the concept of almost everywhere: if a property 'A' is true for a function $f$ except on a measure zero set, we say $f$ has the property 'A' almost everywhere. Thus the theorem above can be restated as: A bounded function $f$ is Riemann integrable if and only if $f$ is almost everywhere continuous.

275 This is the characteristic function of the set of all the rational numbers.
19.6 Simple function. A function which takes at most countably many (→17.18(4), A1.16) values is called a simple function. The Dirichlet function (→19.1) is a simple function, because it assumes only two values, 0 and 1.
19.7 Lebesgue integral of simple functions. Let $f$ be a real valued simple function defined on an interval $I$, taking the values $y_1, y_2, \ldots$. If the right-hand side of the following formula converges absolutely, we say $f$ is Lebesgue integrable, and the sum is denoted by the same symbol as the Riemann integral:
$$\int_I f(x)\,dx = \sum_n y_n |I_n|, \tag{19.17}$$
where $|*|$ is the total length of the set $*$, and $I_n \equiv \{x \mid x \in I,\ f(x) = y_n\}$. Cantor showed $|Q| = 0$ (→19.2). Hence, the Dirichlet function is Lebesgue integrable and the value of the integral is zero.$^{276}$
Note that the values of a function on measure zero sets are irrelevant to the value of the integral.
19.8 Lebesgue integral of general function: $L_1([a,b])$. The Lebesgue integral of a function $f$ on an interval $[a,b]$ is defined as follows. Make a uniform approximation sequence of Lebesgue integrable simple functions $f_i$ for $f$:
$$\sup_{x\in[a,b]} |f_i(x) - f(x)| \to 0 \quad \text{as } i \to \infty. \tag{19.18}$$
Then
$$\int_a^b f(x)\,dx \equiv \lim_{i\to\infty}\int_a^b f_i(x)\,dx. \tag{19.19}$$
[Of course, if we cannot find such a sequence, $f$ is not Lebesgue integrable.]

The totality of functions Lebesgue integrable on the interval $[a,b]$ is denoted by $L_1([a,b])$.
276 In this definition, it is crucial that all $I_n$ have lengths. More generally, if we wish to define an integral of functions on a multidimensional space, then $I_n$ must have a definite volume. Therefore, Lebesgue had to contemplate the concept of 'volume.' This led him to his measure theory (→a19). We say a simple function $f$ is measurable if all $I_n$ have well-defined volumes (→a19.4). A function $f$ is said to be measurable (more precisely, Borel measurable) if the set $\{x \mid a < f(x) < b\}$ has a definite length (measure) for any $a$ and $b\,(> a)$.
Discussion [Fundamental properties of integrals].
(I) Double Linearity. We know that the integral is linear with respect to the integrand. There is one more linearity, with respect to the domain, as we already noticed in 6.2:
$$\int_a^c f(t)\,dt = \int_a^b f(t)\,dt + \int_b^c f(t)\,dt \tag{19.20}$$
or
$$\int_{[a,b]+[b,c]} f(t)\,dt = \int_{[a,b]} f(t)\,dt + \int_{[b,c]} f(t)\,dt. \tag{19.21}$$
If we define
$$\int_{\alpha[a,b]} f(t)\,dt = \alpha\int_{[a,b]} f(t)\,dt, \tag{19.22}$$
then $\int$ becomes a linear map on geometrical objects (in this case we discussed only 1D objects, but this can be generalized to general dimensional spaces). Notice that the convention is meaningful if we interpret the integral over $-[a,b]$ to be the integral on $[a,b]$ from $b$ to $a$ instead of from $a$ to $b$ ($-$ is the reversing of orientation).
(II) Non-negativity and monotonicity. If the integrand is nonnegative, its integral is nonnegative. Consequently, if $f \ge g$, then $\int_a^b dt\,f(t) \ge \int_a^b dt\,g(t)$.
(III) Boundedness. If the integrand is bounded, then its integral over a bounded set is bounded.
19.9 Remark. We must demonstrate that the limit in 19.8 does not depend on the choice of the approximation sequences, but it is a technical detail. An important difference between the Riemann and the Lebesgue integrations is that the latter requires absolute convergence. A. N. Kolmogorov and S. V. Fomin, Introductory Real Analysis (Revised English edition, Englewood Cliffs, 1970)$^{277}$ is an excellent self-study textbook for measure theory and Lebesgue integration (and standard functional analysis, say, spectral analysis).
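19.10 Relation between Riemann and Lebesgue integrals.
(1) If $f$ is integrable in both senses, the two integrals have the same value.
(2) If $f$ is bounded and Riemann integrable, then it is Lebesgue integrable. But
(3) There are functions that are Riemann integrable (in the improper sense) but not Lebesgue integrable, and vice versa. For example, $\int_0^\infty (\sin x/x)\,dx$ converges only conditionally, so it exists as an improper Riemann integral but not as a Lebesgue integral, while the Dirichlet function (→19.1) is Lebesgue integrable but not Riemann integrable.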
The practical merit of the Lebesgue integral is that the conditions for exchanging the order of operations (say, limit and integral) can be simpler than those for Riemann integrals (→19.11, 19.14, 19.17). This simplicity is due to the absolute convergence in the definition (→19.7).

277 Its original Russian version is an undergraduate textbook for Analysis III (designed by Kolmogorov) of the Dept of Engineering Mathematics of Moscow State University.
19.11 Theorem [Lebesgue's dominated convergence theorem]. Let $I$ be an interval. If $\lim_{n\to\infty} f_n(x) = f(x)$ for almost all $x \in I$ (i.e., $f_n$ converges to $f$ except on a measure zero set (→19.3)), and if there is a Lebesgue integrable function (→19.8) $\varphi(x)$ such that $|f_n(x)| \le \varphi(x)$ on $I$, then
$$\lim_{n\to\infty}\int_I f_n(x)\,dx = \int_I f(x)\,dx. \tag{19.23}$$
□

19.12 Theorem [Beppo-Levi]. Let $f_n$ be Lebesgue integrable on an interval $I$, $\int_I f_n(x)\,dx < K$ for some number $K$ for all $n$, and $f_1 \le f_2 \le \cdots \le f_n \le \cdots$. Then
$$\lim_{n\to\infty}\int_I f_n(x)\,dx = \int_I \lim_{n\to\infty} f_n(x)\,dx. \tag{19.24}$$
□
19.13 Example. Termwise integration of $\sum x^n = (1-x)^{-1}$. For $t \in [0,1)$, we may apply Beppo-Levi's theorem to the partial sums to integrate this termwise:
$$\int_0^t \sum_{n=0}^\infty x^n\,dx = \sum_{n=0}^\infty \int_0^t x^n\,dx = \sum_{n=1}^\infty \frac{t^n}{n} = -\ln(1-t). \tag{19.25}$$
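A quick numerical comparison of the two ends of (19.25) may be reassuring. This is only a sketch; the function name, the number of terms, and the sample values of $t$ are arbitrary choices made here.

```python
import math

def termwise_sum(t, terms):
    """Partial sum of sum_{n>=1} t^n / n, the termwise integral of sum_{n>=0} x^n."""
    return sum(t ** n / n for n in range(1, terms + 1))

for t in (0.1, 0.5, 0.9):
    # The last two columns should agree to many digits.
    print(t, termwise_sum(t, 400), -math.log(1.0 - t))
```

The agreement deteriorates as $t$ approaches 1 for a fixed number of terms, consistent with the restriction $t \in [0,1)$.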
Exercise.
Compute the following integrals in the $n \to \infty$ limit:
(1)
$$\int_0^1 \frac{x}{1+nx}\,dx. \tag{19.26}$$
(2)
$$\int_0^1 \frac{1}{1+nx^2}\,dx. \tag{19.27}$$
Notice that the exchange of the order of limit and integration does not work for
$$\int_0^1 \frac{n}{1+n^2x^2}\,dx. \tag{19.28}$$
See 14.19.
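The failure of the exchange for (19.28) can be seen numerically. In the sketch below the midpoint rule, the helper names, and the sample values of $n$ are choices made here. The closed form $\int_0^1 n/(1+n^2x^2)\,dx = \arctan n$ tends to $\pi/2$, although the integrand tends to 0 at every fixed $x > 0$.

```python
import math

def midpoint(f, a, b, panels=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / panels
    return h * sum(f(a + (k + 0.5) * h) for k in range(panels))

def spike(n):
    """Integrand of (19.28) for a given n."""
    return lambda x: n / (1.0 + (n * x) ** 2)

for n in (1, 10, 100):
    # Columns: n, numerical integral, arctan(n), value of the integrand at x = 0.5.
    print(n, midpoint(spike(n), 0.0, 1.0), math.atan(n), spike(n)(0.5))
```

The integrals approach $\pi/2$ while the pointwise values shrink. This is consistent with 19.11: the supremum over real $n$ of $n/(1+n^2x^2)$ is $1/(2x)$, which is not integrable on $[0,1]$, so this route yields no integrable dominating function.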
19.14 Theorem [Fubini]. If $\int dx\left(\int dy\,|f(x,y)|\right)$ or $\int dy\left(\int dx\,|f(x,y)|\right)$ is finite, then we may exchange the order of the two integrations in $\int dx\int dy\,f(x,y)$. □
Discussion.
(1) Using the integral of $f(x,y) \equiv x^y$ on $[0,1]\times[a,b]$ for $0 < a < b$, demonstrate
$$\int_0^1 \frac{x^b - x^a}{\log x}\,dx = \log\frac{1+b}{1+a}. \tag{19.29}$$
(2) Demonstrate that
$$\iint_{x\ge 0,\,y\ge 0} dx\,dy\,f(a^2x^2 + b^2y^2) = \frac{\pi}{4ab}\int_0^\infty f(x)\,dx. \tag{19.30}$$
(3) Compute
[equations (19.31)-(19.33) are missing in the source]
19.15 Pathological example. Do not think the order of integrations can be freely changed:
$$\int_0^1 dx\int_0^1 dy\,\frac{x^2-y^2}{(x^2+y^2)^2} = \frac{\pi}{4}, \qquad \int_0^1 dy\int_0^1 dx\,\frac{x^2-y^2}{(x^2+y^2)^2} = -\frac{\pi}{4}. \tag{19.34}$$
Demonstrate that the condition for 19.14 is violated.
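The two values in 19.15 and the violation of the condition of 19.14 can be made concrete as follows. This is a sketch under stated assumptions: the inner integrals are done by hand using the antiderivatives $y/(x^2+y^2)$ (in $y$) and $-x/(x^2+y^2)$ (in $x$), the outer integrals use a midpoint rule, and all function names are invented here.

```python
import math

def midpoint(f, a, b, panels=20000):
    """Composite midpoint rule for the integral of f over [a, b]."""
    h = (b - a) / panels
    return h * sum(f(a + (k + 0.5) * h) for k in range(panels))

def inner_over_y(x):
    """Integral over y in [0,1] of (x^2-y^2)/(x^2+y^2)^2 for x > 0."""
    return 1.0 / (x * x + 1.0)

def inner_over_x(y):
    """Integral over x in [0,1] of (x^2-y^2)/(x^2+y^2)^2 for y > 0."""
    return -1.0 / (1.0 + y * y)

def abs_integral_down_to(delta):
    """Integral over x in [delta,1] of the y-integral of |integrand|,
    which equals 1/x - 1/(1+x^2) for 0 < x < 1."""
    return math.log(1.0 / delta) - (math.pi / 4 - math.atan(delta))

print(midpoint(inner_over_y, 0.0, 1.0), math.pi / 4)    # dx outside, dy inside
print(midpoint(inner_over_x, 0.0, 1.0), -math.pi / 4)   # dy outside, dx inside
for delta in (1e-2, 1e-4, 1e-8):
    print(delta, abs_integral_down_to(delta))           # grows like log(1/delta)
```

The last loop suggests that the iterated integral of the absolute value diverges logarithmically, so the hypothesis of 19.14 is indeed not met.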
Discussion.
The reason for the pathology was explained by Legendre with the aid of the following formula:
$$\int_0^\alpha dx\int_0^\beta dy\,\frac{x^2-y^2}{(x^2+y^2)^2} = \frac{\pi}{2} - \arctan\frac{\beta}{\alpha}.$$
Demonstrate the formula and complete the argument.
19.16 Good function principle. In short, if a relation is correct for simple functions (→19.6), then it is correct for integrable functions. This is sometimes called the good function principle.
19.17 Exchanging differentiation and integration. Suppose $f(x,a)$ is integrable for any $a$ in its range, and $\partial f/\partial a$ is integrable. Then
$$\frac{\partial}{\partial a}\int f(x,a)\,dx = \int \frac{\partial}{\partial a} f(x,a)\,dx. \tag{19.37}$$
Very crudely speaking, for Lebesgue integration, if the formal result is mathematically meaningful, then the result is (eventually) justifiable.
Discussion.
(1) Let $f$ be continuous. Demonstrate that $g$ defined by
$$g(x) = \int_0^x \frac{(x-y)^{n-1}}{(n-1)!}\,f(y)\,dy \tag{19.38}$$
is $C^n$ and $g^{(n)}(x) = f(x)$. [Almost the same as 19.0 (10).]
(2) Hadamard representation. Let $f(x,y)$ be $C^1$ in the ball of radius $r$ centered at $(x_0,y_0)$. Then
$$f(x,y) = f(x_0,y_0) + h_1(x,y)(x-x_0) + h_2(x,y)(y-y_0),$$
where
$$h_1(x,y) = \int_0^1 \frac{\partial f}{\partial x}(x_t,y_t)\,dt, \qquad h_2(x,y) = \int_0^1 \frac{\partial f}{\partial y}(x_t,y_t)\,dt$$
with $x_t = tx + (1-t)x_0$ and $y_t = ty + (1-t)y_0$.

Exercise.
(1) Show that
$$F(x) = \int_0^\infty e^{-y^2}\sin 2xy\,dy$$
satisfies
$$F'(x) + 2xF(x) = 1.$$
(2) A similar question is: Let
$$I(a) = \int_0^\infty e^{-x^2}\cos 2ax\,dx.$$
Show that
$$\frac{dI}{da} = -2aI.$$
Use this to demonstrate that
$$I = \frac{\sqrt{\pi}}{2}\,e^{-a^2}.$$
[Hint. The change of variables $z = x + a/x$ works.]
(3) Let
[equation missing in the source]
Demonstrate that
$$\frac{dI}{da} = -2bI.$$
Then, show
[equation missing in the source]
19.18 Why is the Lebesgue integral most natural for Fourier analysis? As we have already mentioned in 17.10(3), if $f$ is square Lebesgue integrable, then its Fourier series is almost everywhere convergent to $f$. See also Carleson's theorem (→17.9). Physicists know that Fourier transform is a powerful tool to disentangle convolution (→32A.2). This can be done freely only when we interpret all integrals as Lebesgue integrals. We can make a continuous and absolutely integrable function $f$ such that its convolution with itself, $\int dx\,f(t-x)f(x)$, is Lebesgue integrable but diverges for all rational $t$ (so that it is not Riemann integrable).$^{278}$ That is, if we use the Riemann integral, then we cannot freely use Fourier transformation to disentangle the convolution. The Lebesgue integration theory is much more elegant and fundamental in Fourier analysis than the Riemann integration.
19.19 Gaussian integral, 'Wick's theorem'. The following integral (the generator of the multidimensional Gaussian distribution) is of vital importance in theoretical physics:
$$I(A,b) = \int_{-\infty}^\infty\cdots\int_{-\infty}^\infty dx_1\cdots dx_n\,\exp\left(-\frac{1}{2}\sum_{i,j=1}^n A_{ij}x_ix_j + \sum_{i=1}^n x_ib_i\right), \tag{19.48}$$
where $A = \mathrm{Matr}(A_{ij})$ is an $n\times n$ symmetric non-singular matrix (positive definite, so that the integral converges), and $b$ is an $n$-vector. We get
$$I(A,b) = (2\pi)^{n/2}(\det A)^{-1/2}\exp\left(\frac{1}{2}\sum_{i,j}(A^{-1})_{ij}b_ib_j\right). \tag{19.49}$$
$I(A,b)/I(A,0)$ is called the generator (generating function) of the Gaussian distribution with mean zero and covariance matrix given by $A^{-1}$.
The standard method to compute this is to shift the origin to the minimum point of the function in the parentheses as
$$y_i = x_i - \sum_j (A^{-1})_{ij}b_j. \tag{19.50}$$
This leads to
$$I(A,b) = \exp\left(\frac{1}{2}\sum_{i,j}(A^{-1})_{ij}b_ib_j\right)\int_{-\infty}^\infty\cdots\int_{-\infty}^\infty dy_1\cdots dy_n\,\exp\left(-\frac{1}{2}\sum_{i,j=1}^n A_{ij}y_iy_j\right). \tag{19.51}$$
The integral can be computed by diagonalizing the matrix.
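Formula (19.49) can be spot-checked for $n = 2$ by brute-force quadrature. The following is a sketch only: the sample matrix and vector, the truncation of the plane to a square of half-width 10, the midpoint rule, and the function names are all assumptions made here.

```python
import math

def gaussian_closed_form(A, b):
    """Right-hand side of (19.49) for n = 2, with A symmetric positive definite."""
    (a11, a12), (a21, a22) = A
    det = a11 * a22 - a12 * a21
    inv = ((a22 / det, -a12 / det), (-a21 / det, a11 / det))
    quad = sum(inv[i][j] * b[i] * b[j] for i in range(2) for j in range(2))
    return (2 * math.pi) / math.sqrt(det) * math.exp(0.5 * quad)

def gaussian_numeric(A, b, half_width=10.0, panels=400):
    """Midpoint rule for (19.48) with n = 2 on the square [-L, L]^2."""
    h = 2 * half_width / panels
    pts = [-half_width + (k + 0.5) * h for k in range(panels)]
    total = 0.0
    for x1 in pts:
        for x2 in pts:
            q = A[0][0] * x1 * x1 + (A[0][1] + A[1][0]) * x1 * x2 + A[1][1] * x2 * x2
            total += math.exp(-0.5 * q + b[0] * x1 + b[1] * x2)
    return total * h * h

A = ((2.0, 0.5), (0.5, 1.0))   # sample symmetric positive definite matrix
b = (0.3, -0.2)                # sample vector
print(gaussian_numeric(A, b), gaussian_closed_form(A, b))
```

The two numbers should agree closely, since the integrand is negligible outside the chosen square for this sample $A$.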
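According to 19.17 we can freely change the order of differentiation with respect to $b$ and integration in (19.48). In this way we arrive at the so-called Wick's theorem: For $b = 0$
$$\langle x_a x_b\cdots x_z\rangle = \sum \langle x_{k_1}x_{k_2}\rangle\cdots\langle x_{k_{n-1}}x_{k_n}\rangle, \tag{19.52}$$
where $\langle\ \rangle$ denotes the average with respect to the Gaussian distribution, $\{k_1,\ldots,k_n\} = \{a,\ldots,z\}$, and the sum is over all the possible pairings of $a, b, \ldots, z$. For example,
$$\langle x_1x_2x_3x_4\rangle = \langle x_1x_2\rangle\langle x_3x_4\rangle + \langle x_1x_3\rangle\langle x_2x_4\rangle + \langle x_1x_4\rangle\langle x_2x_3\rangle. \tag{19.53}$$

278 See Körner, Example C.6 on p570.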
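The sum over pairings can be generated mechanically. The sketch below is an illustration, not the notes' derivation: the function names are invented here, and `cov[i][j]` stands for the two-point average $\langle x_ix_j\rangle$, which for the distribution of 19.19 is $(A^{-1})_{ij}$.

```python
def pairings(items):
    """Yield every way of splitting `items` (even length) into unordered pairs."""
    items = list(items)
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for k, partner in enumerate(rest):
        remaining = rest[:k] + rest[k + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def wick_moment(indices, cov):
    """Average of x_{i1} ... x_{i2m} as a sum over pairings of products of cov[i][j]."""
    if len(indices) % 2:
        return 0.0          # odd moments of a mean-zero Gaussian vanish
    total = 0.0
    for pairing in pairings(indices):
        term = 1.0
        for i, j in pairing:
            term *= cov[i][j]
        total += term
    return total

print(list(pairings([1, 2, 3, 4])))        # the three pairings of (19.53)
print(wick_moment([0, 0, 0, 0], [[2.0]]))  # fourth moment for variance 2
```

For $2m$ indices there are $(2m-1)!! = 1\cdot 3\cdots(2m-1)$ pairings, so the one-dimensional fourth moment is three times the squared variance.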
Exercise.
(A) Compute the following integrals:
(1)
$$\iint_{x\ge 0,\,y\ge 0} dx\,dy\,e^{-(x^2+2xy\cos\theta+y^2)}. \tag{19.54}$$
(2)
$$\iint_{\mathbf{R}^2} dx\,dy\,e^{-(x^2+2xy\cos\theta+y^2)}. \tag{19.55}$$
(B) Using the spherical symmetry of the Gaussian integral, find the following integrals in terms of
$$U \equiv \int d^dk\,e^{-ak^2/2}. \tag{19.56}$$
(1)
$$I = \int d^dk\,[\text{factor illegible in the source}]\,e^{-ak^2/2}. \tag{19.57}$$
(2)
$$J = \int d^dk\,\frac{k_x^2k_y^2}{k^4}\,e^{-ak^2/2}. \tag{19.58}$$
[Hint. (19.53) and $\langle k^4\rangle = d\langle k_x^4\rangle + d(d-1)\langle k_x^2k_y^2\rangle$. Also, differentiation and integration with respect to $a$ (or $-a/2$) are useful.]
19.20 Gaussian integral: complex case. We have the following analogous formula
$$I(A,b) \equiv \int_{-\infty}^\infty\cdots\int_{-\infty}^\infty d\bar z_1dz_1\cdots d\bar z_ndz_n\,\exp\left(-\sum_{i,j=1}^n A_{ij}\bar z_iz_j + \sum_{i=1}^n(\bar z_ib_i + z_i\bar b_i)\right), \tag{19.59}$$
where $A$ is any nonsingular $n\times n$ matrix, and $b$ is a complex $n$-vector. In terms of real variables $x_i$ and $y_i$ as
$$z_i = \frac{x_i + iy_i}{\sqrt{2}}, \qquad \bar z_i = \frac{x_i - iy_i}{\sqrt{2}}, \tag{19.60}$$
we get $d\bar z_idz_i = dx_idy_i$.$^{279}$ Integration is understood as the integration with respect to these real variables. The result is
$$I(A,b) = (2\pi)^n(\det A)^{-1}\exp\left(\sum_{i,j}\bar b_i(A^{-1})_{ij}b_j\right). \tag{19.61}$$
The cleverest proof of this relation is: (i) (if necessary) to slightly perturb $A$ so that all the eigenvalues of $A + \delta A$ are distinct (so that $A + \delta A$ is diagonalizable); (ii) compute the integral analogously to 19.19; then (iii) use the continuity of the integral as a function of the components of $A$ to obtain the result for the unperturbed case.

279 Although the calculation here formally seems to justify the equality, it is better to understand that $d\bar z\,dz$ is a shorthand notation for $dx\,dy$.
APPENDIX a19 Measure
In this appendix the general theory of the Lebesgue measure is outlined. Without measure theory, proper understanding of statistical mechanics and dynamical systems is impossible. However, as with all important topics, the essence of measure theory is not at all hard to understand. The theory could be read as a very nice example of the analysis of a concept that we seem to know intuitively. For a more formal introduction Kolmogorov-Fomin is strongly recommended.
a19.0 Reader's guide to this appendix. (1) + (3) is the minimum of this appendix:
(1) The ordinary Lebesgue measure = volume is explained up to a19.6. These entries should be very easy to digest. Remember that Archimedes reached this level of sophistication more than 2000 years ago.
(2) General Lebesgue measure is outlined in a19.9-11. This is an abstract repetition of (1), so the essence should be already obvious.
(3) The Lebesgue integral is redefined in terms of the Lebesgue measure in a19.15 with a preparation in a19.14. This leads us naturally to the concept of functional and path integrals (a19.16).
(4) Probability is a measure with total mass 1 (i.e., normalized) (a19.19).
(5) If we read any probability book, we encounter the triplet (P, X, B). The reason why we need such a nonintuitive device is explained in a19.20-21.
a19.1 What is volume? For simplicity, we confine our discussion to 2-space, but our discussion can easily be extended to higher dimensional spaces. The question is: what is 'area'? It is not easy to answer this question for an arbitrary shape.$^{280}$ Therefore, we should start with a seemingly obvious example. The area of a rectangle $[0,a]\times[0,b]$ in $\mathbf{R}^2$ is $ab$. Do we actually know this? Why can we say the area of the rectangle is $ab$ without knowing what area is? To be logically conscientious we must accept:
Definition. The area of a rectangle which is congruent$^{281}$ to $\langle 0,a\rangle\times\langle 0,b\rangle$ (here $\langle$ is $[$ or $($, and $\rangle$ is $]$ or $)$) is defined to be $ab$. Notice that area is defined so that it is not affected by whether the boundary is included or not.

280 As we will see soon in a19.21, if we stick to our usual axiomatic system of mathematics ZF+C (→17.18(5) for references), then there are figures without area.

281 This word is defined by superposability. That is, if we move (translate, rotate) a figure $A$ and can exactly superpose it on $B$, we say $A$ and $B$ are congruent. As Hilbert (→20.4) realized, we must guarantee that the figure does not deform, etc., while being moved, so that we need an axiom, which was never stated in Euclid, although freely used by him (just as the Axiom of Choice was in the early 20th century).
a19.2 Area of fundamental set. A set which is a direct sum (disjoint union) of a finite number of rectangles is called a fundamental set. The area of a fundamental set is defined by the sum of the areas of the constitutive rectangles.

It should be intuitively obvious that the join and the common set of fundamental sets are again fundamental.

a19.3 Heuristic consideration. For an arbitrary shape, the strategy for defining its area should be to approximate the figure with a sequence of fundamental sets. We should use the idea going back to Archimedes; we must approximate the figure from the inside and from the outside. If both sequences converge to the same value, we should define this common value to be the area of the figure.
a19.4 Outer measure. Let $A$ be a set. We consider a cover of $A$ with a finite or countable number of rectangles $P_k$ (inclusion or exclusion of their boundaries can be chosen conveniently →a19.1), and call it a rectangular cover $P = \{P_k\}$ of $A$. Let us denote the area of a rectangle $P_k$ by $m(P_k)$. The outer measure $m^*(A)$ of $A$ is defined by
$$m^*(A) = \inf\sum_k m(P_k), \tag{19.62}$$
where the infimum is taken over all the finite or countable rectangular covers of $A$.
$m^*(A) = 0$ is equivalent to $A$ being a measure zero set (→19.3), i.e., a null set.
a19.5 Inner measure. For simplicity, let us assume that $A \subset E = [0,1]\times[0,1]$. Then, the inner measure $m_*(A)$ of $A$ is defined by
$$m_*(A) = 1 - m^*(E\setminus A). \tag{19.63}$$
Obviously,
$$m_*(A) \le m^*(A) \tag{19.64}$$
for any figure $A$.
a19.6 Measurable set, area = Lebesgue measure. Let $A$ be a bounded subset of $E$.$^{282}$ If $m^*(A) = m_*(A)$, then we say $A$ is measurable (in the sense of Lebesgue), and $m^*(A)$, written as $\mu(A)$, is called its area (= Lebesgue measure).

282 It should be obvious how to generalize our argument to a more general bounded set in $\mathbf{R}^2$.
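The inside/outside strategy of a19.3-a19.6 can be tried on a concrete figure. The sketch below uses the quarter disc $A = \{(x,y)\in E : x^2+y^2\le 1\}$ and an $n\times n$ grid of closed square cells; the figure, the grid, and the function name are choices made here. Cells meeting $A$ form a rectangular cover of $A$, so their total area bounds $m^*(A)$ from above; cells not contained in $A$ cover $E\setminus A$, so by (19.63) the total area of the cells inside $A$ bounds $m_*(A)$ from below.

```python
import math

def grid_bounds(n):
    """Lower and upper grid bounds for the area of the quarter disc in [0,1]^2.
    Integer arithmetic is used for the corner tests to avoid rounding issues."""
    inside = touching = 0
    for i in range(n):
        for j in range(n):
            if (i + 1) ** 2 + (j + 1) ** 2 <= n * n:   # far corner in A: cell lies inside A
                inside += 1
            if i * i + j * j <= n * n:                 # near corner in A: cell meets A
                touching += 1
    return inside / n ** 2, touching / n ** 2

for n in (10, 50, 200):
    lower, upper = grid_bounds(n)
    print(n, lower, math.pi / 4, upper)
```

The two bounds squeeze $\pi/4$ as the grid is refined, which is what measurability of the quarter disc amounts to in this picture.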
a19.7 Additivity. Assume that all the sets here are in a bounded rectangle, say, $E$ above. The join and the common set of finitely many measurable sets are again measurable. This is true even for countably many measurable sets. The second statement follows from the preceding statement thanks to the finiteness of the outer measure of the join or the common set.
a19.8 σ-additivity. Let $\{A_n\}$ be a family of measurable sets satisfying $A_n\cap A_m = \emptyset$ for $n \neq m$. Let $A = \cup_n A_n$. Then,
$$\mu(A) = \sum_n \mu(A_n). \tag{19.65}$$
This is called the σ-additivity of the Lebesgue measure. □
[Demo] $A$ is measurable due to a19.7. Since $\{A_n\}$ covers $A$, $\mu(A) \le \sum_n \mu(A_n)$. On the other hand $A \supset \cup_{n=1}^N A_n$, so that for any $N$, $\mu(A) \ge \sum_{n=1}^N \mu(A_n)$.
a19.9 Measure, general case. A map from a family of sets to $\mathbf{R}$ is called a set function. A set function $m$ satisfying the following three conditions is called a measure.
(1) $m$ is defined on a semiring$^{283}$ $S$. [Note that the set of all the rectangles is a semiring.]
(2) $m(A) \ge 0$.
(3) $m$ is an additive function: If $A$ is direct-sum-decomposed in terms of the elements of $S$ as $A = \cup_{k=1}^n A_k$, then $m(A) = \sum_{k=1}^n m(A_k)$.

Therefore, the area $\mu$ defined in a19.6 is a measure on the set of all the rectangles. In the case of area, the definition of area is extended from rectangles to fundamental sets (→a19.2). This is the next step:
a19.10 Minimum algebra on S, extension of measure. The totality of sets $A$ which are finite joins of the elements in $S$ is called the minimum algebra generated by $S$. Notice that the totality of fundamental sets in a19.2 is the minimum algebra of sets generated by the totality of rectangles. Just as the concept of area could be generalized to the area of a fundamental set, we can uniquely extend $m$ defined on $S$ to the measure defined on the algebra generated by $S$.
a19.11 Lebesgue extension. We can repeat the procedure to define $\mu$ from $m^*$ and $m_*$ in a19.5 for any measure $m$ on $S$ (in an abstract fashion). We define $m^*$ and $m_*$ with the aid of the covers made of the elements in $S$. If $m^*(A) = m_*(A)$, we define the Lebesgue extension $\mu$ of $m$ with $\mu(A) = m^*(A)$, and we say $A$ is $\mu$-measurable.

283 If a family of sets $S$ satisfies the following conditions, it is called a semiring of sets:
(i) $S$ contains $\emptyset$,
(ii) if $A, B \in S$, then $A\cap B$ and $A\cup B$ are in $S$,
(iii) if $A_1$ and $A$ are in $S$ and $A_1 \subset A$, then $A\setminus A_1$ can be written as a direct sum (the join of disjoint sets) of elements in $S$.
a19.12 Remark. When we simply say the Lebesgue measure, we usually mean the volume (or area) defined as in a19.6. However, there is a different usage of the word. $\mu$ constructed in a19.11 is also called a Lebesgue measure. That is, a measure constructed by the Lebesgue extension is generally called a Lebesgue measure. This concept includes the much narrower usage common to physicists.
a19.13 σ-additivity. (3) in a19.9 is often replaced by the following σ-additivity condition: Let $A$ be a sum of countably many disjoint $\mu$-measurable sets, $A = \cup_{n=1}^\infty A_n$. If
$$\mu(A) = \sum_{n=1}^\infty \mu(A_n), \tag{19.66}$$
we say $\mu$ is a σ-additive measure.
The Lebesgue measure defined in a19.6 is σ-additive. Actually, if $m$ is σ-additive on a semiring of sets, then its Lebesgue extension is also σ-additive.
a19.14 Measurable function. A real function defined on a set $D$ is called a $\mu$-measurable function for a Lebesgue measure $\mu$ on the set, if any 'level set' $\{x \mid f(x)\in[a,b]\}\cap D$ is $\mu$-measurable. When we simply say a function is measurable, it means that any level set has a well-defined volume in the ordinary sense.
a19.15 Lebesgue integral with measure $\mu$. Let $\mu$ be a Lebesgue measure on $\mathbf{R}^n$. Then the Lebesgue integral of a $\mu$-measurable function on $U \subset \mathbf{R}^n$ is defined as
$$\int_U f(x)\,d\mu(x) = \lim_{\epsilon\to 0}\sum a\,\mu(\{x \mid f(x)\in[a-\epsilon/2,\,a+\epsilon/2)\}\cap U), \tag{19.67}$$
where the sum is over all the disjoint level sets of 'thickness' $\epsilon\,(> 0)$.$^{284}$
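The sum over level sets in (19.67) can be evaluated explicitly in a simple case. The sketch below assumes $f(x) = x^2$ on $U = [0,1]$ with $\mu$ the ordinary length, and takes the levels $a = 0, \epsilon, 2\epsilon, \ldots$; these choices and the function names are made here for illustration. Because $x^2$ is monotone on $[0,1]$, the length of each level set is known in closed form.

```python
import math

def level_set_length(lo, hi):
    """Length of {x in [0,1] : lo <= x^2 < hi}."""
    lo, hi = max(lo, 0.0), min(hi, 1.0)
    return math.sqrt(hi) - math.sqrt(lo) if hi > lo else 0.0

def lebesgue_sum(eps):
    """Sum of a * mu({f in [a - eps/2, a + eps/2)}) over the levels a = k*eps."""
    total = 0.0
    k = 0
    while k * eps - eps / 2 <= 1.0:      # slabs that can still meet the range [0, 1] of f
        a = k * eps
        total += a * level_set_length(a - eps / 2, a + eps / 2)
        k += 1
    return total

for eps in (0.1, 0.01, 0.001):
    print(eps, lebesgue_sum(eps), 1.0 / 3.0)
```

In contrast to a Riemann sum, the range of $f$ is sliced rather than its domain; the sums approach $1/3$ as $\epsilon \to 0$.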
a19.16 Functional integral. As the reader has seen in a19.15, if we can define a measure on a set, we can define an integral over the set. The set need not be an ordinary finite-dimensional set, but can be a function space. In this case the integral is called a functional integral. If the set is the totality of paths from time $t = 0$ to $T$, that is, if the set is the totality of continuous functions $[0,T] \to \mathbf{R}^d$, we call the integral over the set a path integral. The Feynman-Kac path integral (→30.12) is an example.$^{285}$

284 A measure $m$ satisfying $\mu(A) = 0 \Rightarrow m(A) = 0$, where $\mu$ is the Lebesgue measure (volume), is said to be absolutely continuous with respect to $\mu$. If $m$ is absolutely continuous w.r.t. $\mu$, then the Lebesgue extension, Lebesgue integral, etc., can be constructed without any technical difficulty, just as for the volume. However, careful consideration is needed because there are 'singular' measures.
a19.17 Uniform measure. The Lebesgue measure defined in a19.6 is uniform in the sense that the volume of a set does not depend on its absolute location in the space. That is, the measure is translationally invariant (see a19.20 below for a further comment). However, there is no useful uniform measure in infinite dimensional spaces (→20.2 Discussion (1)). Thus every measure on a function space or path space must be non-uniform.
a19.18 Borel measure. Usually, we mean by a Borel measure a measure which makes measurable all the elements of the smallest algebra (→a19.10) of sets containing all the rectangles.
a19.19 Probability. A (Lebesgue) measure $P$ with the total mass 1 is called a probability measure. To compute the expectation value with respect to $P$ is to compute the Lebesgue integral w.r.t. the measure $P$.

When we read mathematical probability books, we always encounter the 'triplet' (P, X, B), where P is a probability measure, X is the totality of elementary events (the event space; P(X) = 1) and B is the algebra of measurable events. This specification is needed, because if we assume that every composite event has a probability, we have paradoxes.$^{286}$ This question arose from the characterization of 'uniform measure' in a finite dimensional Euclidean space:
a19.20 Lebesgue's measure problem. Consider the $d$-dimensional Euclidean space $\mathbf{R}^d$. Is it possible to define a set function (→a19.9) $m$ on every bounded set $A \subset \mathbf{R}^d$ such that
(1) the $d$-unit cube has value 1,
(2) congruent sets have the same value,
(3) $m(A\cup B) = m(A) + m(B)$ if $A\cap B = \emptyset$, and
(4) $m$ is σ-additive?
This is called Lebesgue's measure problem.

285 However, the definition of the Feynman path integral is too delicate to be discussed in the proper integration theory.

286 There is at least one problem in which the choice of B is crucial. This is the first digit problem. The first significant digits in tables of natural data, such as the heights of mountains, are not uniformly distributed: 1 appears much more often than 9. Why is this so? A conclusive mathematical explanation was given recently: T P Hill, The Significant-digit Phenomenon, Am. Math. Month. April 1995, p322. If we apparently need a uniform probability on an infinite space (in this case $[0,\infty)$), the choice of B seems to be the key (→a19.17).
a19.21 Hausdorff and non-measurable set. Hausdorff demonstrated in 1914 that for any $d$ there is no such $m$ satisfying (1)-(4) of a19.20. Then, Hausdorff asked in 1914 what happens if we drop the condition (4). He showed that $m$ does not exist for $d \ge 3$.$^{287}$ He showed this by constructing a partition of the 2-sphere into sets $A$, $B$, $C$, $D$ such that $A$, $B$, $C$ and $B\cup C$ are all congruent and $D$ is countable (→15.6). Thus if $m$ existed, then we would have to conclude $3 = 2$. Therefore, we must admit non-measurable sets.$^{288}$

287 Banach demonstrated in 1923 that there is a solution for $d = 1$ and for $d = 2$.

288 Under the current popular axiomatic system ZF + C.
20 Hilbert Space
Fourier expansion is quite parallel to the expansion of a vector into a linear combination of basis vectors in a finite dimensional vector space. However, function spaces are generally very different from finite dimensional vector spaces. To understand Fourier expansion more intuitively, it is convenient to introduce an infinite dimensional vector space in which our knowledge of finite dimensional vector spaces can be used almost 'freely.' This is the Hilbert space.

Key words: Hilbert space, scalar product, completeness, l2, L2, H2, Cauchy-Schwarz inequality, bra-ket, dual space, K-vector space, orthonormal basis, Gram-Schmidt orthonormalization, generalized Fourier expansion, orthogonal projection, Bessel's inequality, Parseval's equality

Remember:
(1) Hilbert space is an infinite dimensional vector space in which we can define an angle between vectors (20.3).
(2) Understand Gram-Schmidt orthonormalization geometrically (20.16).
(3) Fourier expansion is an orthogonal decomposition in a Hilbert space (20.14).
(4) Be familiar with the bra-ket notation (20.21-24).
(5) Understand the formal expression of Green's functions (20.28).
20.1 Vector space. Let $V$ be a set such that any (finite) linear combination of its elements with coefficients taken from a field $K$ is again in $V$. $V$ is called a $K$-vector space. $K$ may be $\mathbf{R}$ or $\mathbf{C}$. An $\mathbf{R}$-vector space is called a real vector space and a $\mathbf{C}$-vector space is called a complex vector space. For example, the set $C^0([0,1])$ of continuous real functions on the interval $[0,1]$ is a real vector space. The set of analytic functions on the unit disc is a complex vector space.
Examples.
(1) The set of all the real polynomials of degree at most $n$ forms a real vector space.
(2) The totality of continuous functions on $[a,b]$ is a vector space (with respect to the ordinary $+$ and $\times$).
(3) The totality of sequences $\{x_i\}$ converging to zero is a vector space, if we introduce $+$ as $\{x_i\} + \{y_i\} = \{x_i + y_i\}$ and scalar multiplication by $c\{x_i\} = \{cx_i\}$.
20.2 Infinite dimensional space. Consider the set $C^0([0,1])$ of all the continuous functions on [0,1]. $x^n$ cannot be written as a linear combination of $1, x, x^2, \ldots, x^{n-1}$ for any n. Thus this function space is obviously infinite dimensional, if we wish to define the 'dimension' of the space as in the ordinary vector space by counting the number of components necessary to specify a vector uniquely. Another approach may be to refer to the interpretation of f(x) as the x-component of a vector f as in functional differentiation (→3.7, 20.21).289
Infinite dimensionality causes special difficulties in convergence. For example, the boundedness of a sequence does not guarantee the existence of a convergent subsequence. Consider the following sequence of unit vectors, any two of which are a distance $\sqrt{2}$ apart:
$$(1,0,0,\ldots),\ (0,1,0,\ldots),\ (0,0,1,0,\ldots),\ \ldots \tag{20.1}$$
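The claim about the sequence (20.1) can be checked numerically. The following is a minimal sketch, assuming we truncate each sequence to a finite number `dim` of components (the truncation and the helper names are illustrative, not part of the notes): every vector has norm 1, yet any two of them stay a distance $\sqrt{2}$ apart, so no subsequence can be Cauchy.

```python
import math

def unit_vector(n, dim):
    """Return e_n truncated to `dim` components (1 in slot n, 0 elsewhere)."""
    return [1.0 if k == n else 0.0 for k in range(dim)]

def distance(u, v):
    """l2 distance between two finite sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

dim = 6  # hypothetical truncation; the argument is the same for any dim
vectors = [unit_vector(n, dim) for n in range(dim)]
norms = [distance(v, [0.0] * dim) for v in vectors]
gaps = [distance(vectors[i], vectors[j])
        for i in range(dim) for j in range(i + 1, dim)]

print(norms)      # the sequence is bounded: every norm equals 1
print(set(gaps))  # but every mutual distance equals sqrt(2)
```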
Discussion.
Infinite dimensional spaces have important peculiar features.
(1) We cannot define a 'uniform volume.' More precisely, there is no uniform measure (= volume) $\mu$ (→19a) such that for the unit cube C (of infinite dimension) $\mu(C) = 1$, with translational symmetry (i.e., even if we translate an object, its volume does not change) and additivity ($\mu(A \cup B) = \mu(A) + \mu(B)$, if $A \cap B = \emptyset$). If such a $\mu$ were to exist, then the volumes of most bounded sets would be 0 or $\infty$.290 Therefore, we cannot define the concept of 'almost everywhere' (→19.5).291
(2) Compactness and boundedness are distinct. Compactness means (→A1.25): if a set A is covered by a family of open sets, then A can already be covered by a finite subset of the family. If the space dimension is finite, this is equivalent to being closed and bounded (the Heine-Borel theorem). However, this is obviously untrue for an infinite dimensional space: to cover a unit open ball we need infinitely many open balls of radius 1/2. This distinction between compactness and boundedness in infinite dimensional space makes functional analysis much more difficult. A bounded operator and a compact operator are distinct (→34C.9).
289 In this case one might feel that the dimension is uncountable (→17.15(3)). However, usually we do not attend to the minute details of the functions, but regard equivalence classes of functions as individual elements (for example, we ignore differences on measure zero sets (→19.3)), so that often the dimension is countable. See Weierstrass' theorem 17.3.
290 Here, we are not discussing 'non-measurable' sets. We confine ourselves to the Borel sets. That is, we discuss the sets which can be constructed as joins and intersections of countably many finite cubes. See 19a.
291 See B R Hunt, T Sauer, and J A Yorke, "Prevalence: a translation-invariant 'almost every' on infinite dimensional spaces," Bull. Amer. Math. Soc. 27, 217 (1992). Addendum 28, 306 (1993).

20.3 Hilbert space. An infinite dimensional vector space V, which is complete (see below) with respect to the norm (→3.3 footnote) defined by the scalar product (see below), is called a Hilbert space.292
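A scalar product is a bilinear (more precisely, when K = C, conjugate-linear in the first entry) functional of two vectors $f, g \in V$ denoted by the bracket product $\langle f|g\rangle$ satisfying
$$\langle f|f\rangle \ge 0, \quad \langle f|f\rangle = 0 \iff f = 0, \tag{20.2}$$
$$\langle f_1 + f_2|g\rangle = \langle f_1|g\rangle + \langle f_2|g\rangle, \tag{20.3}$$
$$\langle f|g_1 + g_2\rangle = \langle f|g_1\rangle + \langle f|g_2\rangle, \tag{20.4}$$
$$\langle f|g\rangle = \overline{\langle g|f\rangle}, \tag{20.5}$$
$$\langle af|g\rangle = \overline{a}\langle f|g\rangle, \quad \langle f|ag\rangle = a\langle f|g\rangle. \tag{20.6}$$
Here a is a constant scalar (i.e., an element in K). The norm in a Hilbert space is defined by $\|f\| = \sqrt{\langle f|f\rangle}$. 'Complete' means that all the Cauchy sequences293 do converge to an element of V: in particular, if $\|f_n - g\| \to 0$, then actually $f_n \to g$.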
Introduction of a scalar product allows us to introduce the concept of angle between two vectors. We may say that an infinite dimensional space in which we can talk about not only lengths but also angles is a Hilbert space. In other words, in any vector space we can define magnitudes by a norm, but the concept of direction is not easy to visualize. To this end, we need a scalar product to introduce the angle between vectors.
Discussion.
(A) Banach space. A complete normed space is called a Banach space. It is more important in the study of PDE than the Hilbert space. $L_1$ (→19.8) is a typical Banach space.
(B) Euclidean space. In these notes, Hilbert spaces are defined as infinite dimensional spaces. Hilbert spaces and finite dimensional vector spaces (with the ordinary scalar product) are sometimes called Euclidean spaces (written as $E^d$).
292 The definition of 'Hilbert space' can change slightly from book to book. Many authors include finite dimensional vector spaces. Here, following Kolmogorov and Fomin, we understand that a Hilbert space is always infinite dimensional (need not be countably so).
293 A Cauchy sequence for a given norm $\|\ \|$ is a sequence $\{y_n\}$ such that $\|y_n - y_m\| \to 0$ as n and m go to infinity. If the sequence is a complex number sequence, then the norm is the usual modulus. We know that C is complete.
294 See also C Reid, Hilbert (Springer, 1970).

20.4 Who was Hilbert? 294 David Hilbert was born in 1862. He studied mainly at Königsberg, where he befriended Minkowski (who was already famous when he was a high school student; he died relatively young of appendicitis). From 1895 until his retirement in 1930 he was a named professor at Göttingen. At the Second International Congress of Mathematicians in Paris in 1900, he presented the famous 23 problems for the mathematics of the twentieth century. He had a characteristic optimism that new discoveries would continuously be made and that these discoveries were necessary for the vitality of mathematics.
His scientific study covers a vast area of mathematics: algebra, number theory, functional analysis (as one of the founders; the term 'spectrum' (→34B, 34C) is due to him). His Grundlagen der Geometrie (based first on the lectures delivered in 1898-9; there are many versions, because he continued to improve the work) was epoch-making.295 He endeavored to make axiomatic systems more general; he believed that fundamental terms should not have a single privileged interpretation.
Hilbert's last two main scientific interests were theoretical physics and the foundation of mathematics. His study of the Boltzmann equation was an important contribution.
He was the major proponent of Formalism, trying hard to prove the consistency of the axiomatic systems on which modern mathematics is based (→17.18(5)). This was shown to be untenable by Gödel. However, we must remember that Gödel's sharp result was possible because the problem was posed (formulated) unambiguously by the Hilbert school.
Hilbert died during World War II (1943). The motto on his grave in Göttingen reads, "Wir müssen wissen, wir werden wissen."296
20.5 Examples.
(1) $l_2$-space. Let V be the totality of infinite sequences $\{c_n\} = \{c_1, \ldots, c_n, \ldots\}$ such that $\sum_n |c_n|^2 < +\infty$. If we introduce the natural linear structure $a\{c_n\} = \{ac_n\}$ and $\{a_n\} + \{b_n\} = \{a_n + b_n\}$ and the scalar product $\{a_n\} \cdot \{b_n\} = \sum_n \overline{a_n} b_n$, then V is a Hilbert space, which is called the $l_2$-space.
(2) $L_2([a,b])$. Let V be the totality of square Lebesgue integrable (→19.8) functions (complex valued) on the interval [a, b]. Then, with the definition of the scalar product
$$\langle f|g\rangle = \int_a^b \overline{f(x)}\,g(x)\,dx, \tag{20.7}$$
V becomes a Hilbert space called the $L_2([a,b])$-space (→20.19).297
(3) $H^1([a,b])$. Let V be the totality of Lebesgue square integrable functions defined on [a, b] whose first derivatives are also square integrable. If we introduce the following scalar product
$$\langle f|g\rangle = \int_a^b dx\,\{\overline{f(x)}\,g(x) + \overline{f'(x)}\,g'(x)\}, \tag{20.8}$$
then V becomes a Hilbert space called the $H^1$-space.298 The norm based on this scalar product is called in the context of wave equations the energy norm (→a1D.12).

295 Hilbert's axiomatization of Euclidean geometry is summarized in the book of Mac Lane quoted in Book Guide (p. 63 onward of the book).
296 We must know; we will know.
297 Some authors use $L^2$ and $l^2$ for $L_2$ and $l_2$.
Discussion.
(A) Theorem [Riesz-Fischer]. Let $\{|n\rangle\}$ be an orthonormal set (not necessarily a basis →20.10) of a Hilbert space H. Then for any element $c = \{c_n\}$ of $l_2$ (→20.5(1)), there is $|a\rangle \in H$ such that $\langle n|a\rangle = c_n$. □
In this sense, all separable (→20.10) Hilbert spaces are isomorphic.
(B) $\{(2\pi(n^2+1))^{-1/2} e^{inx}\}$ is a complete orthonormal basis of $H^1([-\pi,\pi])$.
(C) Let $u \in L_2([-\pi,\pi])$. A condition for $u \in H^1([-\pi,\pi])$ is that $\sum_{n \in Z} n^2 |c_n|^2 < \infty$, where $c_n$ is the complex Fourier expansion coefficient (→17.1).
Exercise.
Set up the Gram-Schmidt orthonormalization scheme (→20.16) for the $H^1([-1,1])$ space. Apply it to $\{1, x, x^2, \ldots\}$ and obtain the first three polynomials. Compare them with the Legendre polynomials (→21A.5, 21B.2).
20.6 Parallelogram law and Pythagoras theorem. Let V be a Hilbert space and $x, y \in V$.
(1) Parallelogram law. $\|x+y\|^2 + \|x-y\|^2 = 2(\|x\|^2 + \|y\|^2)$.
(2) Pythagoras' theorem. If $\langle x|y\rangle = 0$, then $\|x+y\|^2 = \|x\|^2 + \|y\|^2$.
Discussion.
The parallelogram law is a necessary and sufficient condition for a (real) normed vector space to be a Euclidean space (→20.3). To demonstrate this we have only to show that
$$(x, y) \equiv \frac{1}{4}\left(\|x+y\|^2 - \|x-y\|^2\right) \tag{20.9}$$
is a respectable scalar product (→20.3). Demonstrating the linearity is not very easy. See Kolmogorov-Fomin.
From this we can show that the $l_p$-space defined by $\sum |c_n|^p < \infty$ is a Euclidean space only when p = 2. Also the vector space $C[a,b]$ can never be a Euclidean space.
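The statement about $l_p$ can be probed with a two-component example. This is only a sketch under the assumption that a single pair of vectors suffices to exhibit the failure; the function names are illustrative. It evaluates the 'defect' of the parallelogram law in the $l_p$ norm, which vanishes for p = 2 and not for the other exponents tried.

```python
def p_norm(x, p):
    """l_p norm of a finite sequence."""
    return sum(abs(c) ** p for c in x) ** (1.0 / p)

def parallelogram_defect(x, y, p):
    """||x+y||^2 + ||x-y||^2 - 2(||x||^2 + ||y||^2), all in the l_p norm."""
    s = [a + b for a, b in zip(x, y)]
    d = [a - b for a, b in zip(x, y)]
    return (p_norm(s, p) ** 2 + p_norm(d, p) ** 2
            - 2 * (p_norm(x, p) ** 2 + p_norm(y, p) ** 2))

x, y = [1.0, 0.0], [0.0, 1.0]
defects = {p: parallelogram_defect(x, y, p) for p in (1, 1.5, 2, 3)}
for p, value in defects.items():
    print(p, value)   # the defect is (numerically) zero only for p = 2
```

For this pair the defect is $2 \cdot 2^{2/p} - 4$, positive for p < 2 and negative for p > 2.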
20.7 Cauchy-Schwarz inequality. Let V be a Hilbert space and $f, g \in V$. Then
$$|\langle f|g\rangle| \le \|f\|\,\|g\|. \tag{20.10}$$
To prove this assume $g \ne 0$, and that g is normalized (without loss of generality). Make $h \equiv f - g\langle g|f\rangle$. $\langle h|h\rangle \ge 0$ implies the desired inequality.
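The last step of the proof can be spelled out as follows. This is a sketch assuming the convention that the scalar product is linear in its second entry and conjugate-linear in its first, with $\langle g|g\rangle = 1$:

```latex
\begin{align*}
0 \le \langle h|h\rangle
  &= \langle f|f\rangle
     - \langle f|g\rangle\langle g|f\rangle
     - \overline{\langle g|f\rangle}\,\langle g|f\rangle
     + \overline{\langle g|f\rangle}\,\langle g|g\rangle\,\langle g|f\rangle \\
  &= \langle f|f\rangle - |\langle g|f\rangle|^2 .
\end{align*}
```

Hence $|\langle g|f\rangle|^2 \le \langle f|f\rangle$ for normalized g; replacing g by $g/\|g\|$ gives the inequality for general $g \ne 0$.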
298 This is an example of the Sobolev space (Sergei L'vovich Sobolev, 1908-1989).
This inequality tells us a very obvious fact that the modulus of cosine cannot be larger than 1. As is often the case, very obvious things tell us deep things. Heisenberg's uncertainty principle is a disguised version of $|\cos\theta| \le 1$ (→32B.1).
From this it is easy to derive the
Triangle inequality: $\|f + g\| \le \|f\| + \|g\|$.
Discussion.
This inequality allows us to show that + and the scalar product are continuous for a Hilbert space. For example, if $x_n \to x$ and $y_n \to y$, then $(x_n, y_n) \to (x, y)$.
20.8 Bracket notation.
(1) Ket. In elementary algebra, we regard an element of a vector space as a column vector a. Dirac introduced a symbol $|f\rangle$ to denote an element f of a vector space, and called it a ket.
(2) Dual space. A map f from a K-vector space (→20.1) V to the field K is called a linear map, if it satisfies the superposition principle (→1.4): $f(\alpha|a\rangle + \beta|b\rangle) = \alpha f(|a\rangle) + \beta f(|b\rangle)$. The totality V* of these linear maps is again a K-vector space.
Exercise.
Demonstrate this statement.
This space V* is called the dual space of V.
(3) Scalar product. In a finite dimensional vector space V, a scalar product is introduced as $(a, b) = a^* b$.299 Any linear map f(a) from a K-vector space to K can be uniquely described as a scalar product f(a) = (b, a) by choosing an appropriate vector b.
Exercise.
Demonstrate the above statement. [It is convenient to use a basis vector set of V.]
This implies that if $a \in V$, then $a^* \in V^*$. That is, (at least for a finite dimensional vector space) we may identify the dual space as the vector space spanned by all the row vectors. We write the hermitian conjugate of a ket $|a\rangle$ as $\langle a|$, which is called a bra. We regard V* as the totality of bras.
Notation. The scalar product of $|a\rangle$ and $|b\rangle$ is written as $\langle a|b\rangle$.
299 * implies the hermitian conjugate. That is, a* is the complex conjugate of the transposition of a.

20.9 How Dirac introduced brackets. The bra-ket notation was introduced by Dirac. See P. A. M. Dirac, Principles of Quantum Mechanics (Oxford UP, 1958). The book is a good example to demonstrate that mathematical depth and mathematical rigor can be different. In this book he introduces kets to describe the states of a quantum mechanical system after explaining that superposition of states is required to understand the double slit interference experiment. What he claims is that the state space of a quantum mechanical system is a vector space. Then, he says that for a given vector space, there is always another space, and introduces the space of bras as the dual vectors of kets.
20.10 Orthonormal basis, separability. A subset $\{e_j\}$ of a Hilbert space V is said to be an orthonormal basis, if $\langle e_i|e_j\rangle = \delta_{ij}$ and the subspace spanned by $\{e_j\}$ is dense300 in V. If a Hilbert space has a countable dense set, then we say the Hilbert space is separable. Separable Hilbert spaces have a countable orthonormal basis.
Discussion.
(A) $L_2(\mathbf{R}^3)$ is separable.
(B) An example of a non-separable Hilbert space is the totality of functions on [0,1] such that they are nonzero only on countably many points, and the square sum of these values is finite. The scalar product is defined by $(x, y) = \sum x(t)y(t)$, where the sum is over all the countably many points at which $x(t)y(t) \ne 0$. (from Kolmogorov-Fomin)
(C) Let $e_n = \{\delta_{nk}\}_{k \in N}$. Then, $\{e_n\}_{n=0}^{\infty}$ is a complete orthonormal system of $l_2$.
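20.11 Bessel's inequality. Let $\{|e_n\rangle\}$ be an orthonormal set of a separable Hilbert space V. Then for $\forall |f\rangle \in V$
$$\sum_{n=1}^{\infty} |\langle e_n|f\rangle|^2 \le \langle f|f\rangle. \tag{20.11}$$
[Demo]
$$\left\|f - \sum_{n=1}^{N} |e_n\rangle\langle e_n|f\rangle\right\|^2 = \langle f|f\rangle - \sum_{n=1}^{N} |\langle e_n|f\rangle|^2 \ge 0 \tag{20.12}$$
for any positive integer N. Hence, (20.11). □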
20.12 Parseval's equality. Let $\{|e_n\rangle\}$ be an orthonormal basis of a separable Hilbert space V. Then, for $\forall |f\rangle \in V$
$$\sum_{n=1}^{\infty} |\langle e_n|f\rangle|^2 = \langle f|f\rangle. \tag{20.13}$$
Conversely, if (20.13) holds for $\forall |f\rangle \in V$, then $\{|e_n\rangle\}$ is an orthonormal basis of V. (This follows easily from $|S[f]\rangle = |f\rangle$ (see 20.13 below). This is a natural extension of Pythagoras' theorem 20.6.)
300 I.e., for any $f \in V$ there is a sequence $\{a_i\}$ such that $b_N = \sum_{i=1}^{N} a_i e_i$ converges to f in the norm as $N \to \infty$. That is, $\{e_i\}$ is complete (→20.3).

Discussion.
(A) Let $Q = \{|n\rangle\}$ be an orthonormal set of a Hilbert space. Q is an orthonormal basis, iff301 $|a\rangle$ satisfying $\langle n|a\rangle = 0$ for all n is actually zero.
[Demo] If Q is an orthonormal basis, vanishing of all the Fourier coefficients implies that $|a\rangle = 0$. Suppose Q is not a basis. Then due to Bessel's inequality 20.11 and Parseval's equality 20.12 there is a nonzero vector $|b\rangle$ such that
$$\langle b|b\rangle > \sum_n |\langle n|b\rangle|^2. \tag{20.14}$$
Thanks to the Riesz-Fischer theorem (→20.5 Discussion (A)), there is a ket $|a\rangle$ such that
$$|a\rangle = \sum_n |n\rangle\langle n|b\rangle. \tag{20.15}$$
Since $\langle b|b\rangle > \langle a|a\rangle$, $|b\rangle - |a\rangle \ne 0$. However, $\langle n|b - a\rangle = 0$ for any n. That is, there is a ket $|c\rangle$ satisfying $\langle n|c\rangle = 0$ for all n but not zero. Hence, if there is no such ket $|c\rangle$, then Q must be a basis.
(B) Rademacher functions. Define $r_n(x)$ as $r_0(x) = 1$ and
$$r_n(x) = (-1)^{x_n}, \tag{20.16}$$
where $x_n$ is the digit in the n-th binary place of x. $R_1 = \{r_n(x)\}_{n \in N}$ is called the Rademacher orthogonal function system.
(1) Show that it is an orthonormal system for $L_2([0,1])$.
(2) Show, however, that the system is not complete.
(3) Let R be the totality of functions made by multiplying a finite number of functions in $R_1$. Then, R is a complete orthonormal system for $L_2([0,1])$.
20.13 Generalized Fourier expansion. Let $\{|e_n\rangle\}$ be an orthonormal basis (→20.10) of a Hilbert space V. The following sum for $|f\rangle \in V$
$$|S[f]\rangle = \sum_{n=1}^{\infty} |e_n\rangle\langle e_n|f\rangle \tag{20.17}$$
is called the generalized Fourier expansion of f (cf. 20.24). Due to the definition of the orthonormal basis, actually $|S[f]\rangle = |f\rangle$.302 The expansion allows us to make a one to one map between any separable Hilbert space (→20.10) and the $l_2$-space (→20.5). Hence, all the separable Hilbert spaces are isomorphic.303
301 i.e., if and only if.
302 This equality is in the $L_2$ sense (→20.5). When this equality holds in the ordinary sense is a non-trivial question as we have seen in 17.
303 In these notes, we use the terminology 'Hilbert space' for infinite dimensional cases only.

20.14 Least square approximation and Fourier expansion. 20.11 tells us that the Fourier coefficients $c_n = \langle e_n|f\rangle$ can be determined by the following minimization problem:
$$\min \left\|f - \sum_{n=0}^{N} c_n e_n\right\|. \tag{20.18}$$
That is, the generalized Fourier series gives the best approximation in the $L_2$-sense. This gives another reason why $L_2$ is a natural space to consider Fourier series (Fourier analysis in general) (→19.18).
20.15 Decomposition of unity. The main result of 20.12 can be abstracted as
$$\sum_n |e_n\rangle\langle e_n| = 1 \tag{20.19}$$
for an orthonormal basis $\{|e_n\rangle\}$ of a Hilbert space V. This formula is called a decomposition of unity.
20.16 Gram-Schmidt orthonormalization. Let V be a Hilbert space, and $\{|1'\rangle, |2'\rangle, \ldots\}$ be a set of linearly independent kets in V whose linear hull is dense in V (i.e., complete →20.3). Then, we can construct an orthonormal basis $\{|1\rangle, |2\rangle, \ldots\}$ of V out of these kets as follows. The procedure is called the Gram-Schmidt orthonormalization.
(1) $|1\rangle = |1'\rangle / |1'|$, where $|a|$ will denote $\sqrt{\langle a|a\rangle}$ in this entry.
(2) $|2\rangle = |2''\rangle / |2''|$, where $|2''\rangle = (1 - |1\rangle\langle 1|)|2'\rangle$.
(3) $|3\rangle = |3''\rangle / |3''|$, where $|3''\rangle = (1 - |1\rangle\langle 1|)(1 - |2\rangle\langle 2|)|3'\rangle$, etc.
This is a method to construct orthogonal polynomials from $1, x, x^2, x^3, \ldots$ (→21A.2).
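The steps above can be sketched for the monomials $1, x, x^2, x^3$ with the $L_2([-1,1])$ scalar product. This is a minimal illustration, not the notes' own procedure verbatim: to stay in exact rational arithmetic it only orthogonalizes (subtracting projections) and then, instead of dividing by the norm, rescales each polynomial so that its value at x = 1 is 1, which is the usual Legendre normalization. Polynomials are stored as coefficient lists (lowest power first); all function names are hypothetical.

```python
from fractions import Fraction

def inner(p, q):
    """Scalar product on L2([-1, 1]) of real polynomials (coefficient lists)."""
    total = Fraction(0)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            if (i + j) % 2 == 0:          # odd powers integrate to zero on [-1, 1]
                total += a * b * Fraction(2, i + j + 1)
    return total

def gram_schmidt(n_max):
    """Orthogonalize 1, x, ..., x^n_max by subtracting projections."""
    basis = []
    for n in range(n_max + 1):
        v = [Fraction(0)] * n + [Fraction(1)]      # the monomial x^n
        for e in basis:
            c = inner(e, v) / inner(e, e)          # component of v along e
            v = [vk - c * (e[k] if k < len(e) else 0) for k, vk in enumerate(v)]
        basis.append(v)
    return basis

def scale_to_one_at_1(p):
    """Rescale so that p(1) = 1 (sum of coefficients equals p(1))."""
    s = sum(p)
    return [c / s for c in p]

legendre = [scale_to_one_at_1(p) for p in gram_schmidt(3)]
for p in legendre:
    print(p)   # coefficient lists of P_0, P_1, P_2, P_3
```

Dividing each orthogonal polynomial by `inner(p, p) ** 0.5` instead of rescaling would give the orthonormal kets of the entry.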
20.17 Respect the order in the basis. Hilbert spaces may almost be treated as finite dimensional vector spaces. However, we must respect the ordering of the basis set. The (generalized) Fourier expansion is usually not absolutely convergent, so this is a very natural thing to respect.
20.18 Orthogonal projection. Let the k-th summand in (20.19) be $P_k \equiv |e_k\rangle\langle e_k|$. Then we have $P_i P_j = P_j P_i = \delta_{ij} P_i$. Especially, $P_i P_i = P_i$. These operators are hermitian, $P_k^* = P_k$.
If a linear operator P satisfies the idempotency, i.e., $P^2 = P$, then P is called a projection (or a projection operator). If it is hermitian, then it is called an orthogonal projection: For a nonzero ket $|a\rangle$, let $|p\rangle \equiv P|a\rangle$ and $|q\rangle \equiv (1 - P)|a\rangle$. $\langle p|q\rangle = \langle a|P^*(1 - P)|a\rangle = \langle a|(P^* - P^*P)|a\rangle$. If P is hermitian, this vanishes. That is, $|p\rangle$ and $|q\rangle$ are orthogonal.
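A finite dimensional illustration of these statements, as a sketch only: we assume a real three-dimensional space (so that hermitian means symmetric and no complex conjugation is needed), build $P = |e\rangle\langle e|$ for one unit vector, and check idempotency and the orthogonality of $P|a\rangle$ and $(1-P)|a\rangle$. The vectors chosen are arbitrary.

```python
import math

def outer(u, v):
    """Matrix |u><v| for real vectors."""
    return [[ui * vj for vj in v] for ui in u]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def apply(m, x):
    return [sum(mij * xj for mij, xj in zip(row, x)) for row in m]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

e = [1 / math.sqrt(3)] * 3               # a unit vector
P = outer(e, e)                          # P = |e><e|
PP = matmul(P, P)                        # should reproduce P

a = [2.0, -1.0, 0.5]                     # an arbitrary nonzero vector
p = apply(P, a)                          # |p> = P|a>
q = [ai - pi for ai, pi in zip(a, p)]    # |q> = (1 - P)|a>
print(dot(p, q))                         # vanishes up to rounding
```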
Discussion.
(A) [What is $P_1P_2$?] Let $P_1$ and $P_2$ be orthogonal projection operators. A necessary and sufficient condition for $P_1P_2$ to be a projection operator is that $P_1$ and $P_2$ commute. Let $P_iV = V_i$, where V is a vector space on which these projection operators are defined. What is $P_1P_2V$?
(B) [System reduction]. We wish to study a nonlinear equation
$$\frac{du}{dt} = N(u). \tag{20.20}$$
Here N is a nonlinear functional (a map). Formally, orthogonal projections are used to reduce a complicated system. Suppose P is a projection to a space spanned by 'important variables' (say, slow variables). Let us write Q = 1 - P. We can formally rewrite
$$\frac{dPu}{dt} = PN(Pu + Qu), \tag{20.21}$$
$$\frac{dQu}{dt} = QN(Pu + Qu). \tag{20.22}$$
If we could solve the second equation for Qu for any Pu as Qu = F(Pu), then the first equation becomes
$$\frac{dPu}{dt} = PN(Pu + F(Pu)). \tag{20.23}$$
In this way we can get rid of unwanted variables, and reduce the number of variables or the dimension of the space in which we work. The procedure is only formal, and the crucial point is how to choose P, and how to obtain F. This is a very active field of research now.
20.19 Space $L_2([a,b],w)$. Let $L_2([a,b],w)$ be the totality of the functions which are square integrable304 with the weight w on the interval [a, b]:
$$L_2([a,b],w) \equiv \left\{f \,\Big|\, \int_a^b |f(x)|^2 w(x)\,dx < \infty\right\}. \tag{20.24}$$
This set is a Hilbert space with the following definition of the scalar product
$$\langle f|g\rangle \equiv \int_a^b \overline{f(x)}\,g(x)\,w(x)\,dx. \tag{20.25}$$
When $w(x) \equiv 1$ we omit w and write $L_2([a,b])$ as in 20.5. $L_2((-\infty,+\infty))$ is often written as $L_2$ or $L_2(\mathbf{R})$. The convergence with respect to the norm (called the $L_2$-norm) defined by $\|f\| = \sqrt{\langle f|f\rangle}$ is called the $L_2$-convergence. As we know from the theory of Lebesgue integrals (→19.8), we may freely change the values of the function on a measure zero set (→19.3), so that the convergence in this sense could be quite different from the ordinary sense of convergence (w.r.t. the sup norm).
304 Usually, 'integrable' means 'Lebesgue integrable' (→19.8).

Discussion.
(A) Measure (→19a). Mathematicians usually avoid discussing the weight function w, because w need not be an ordinary function (i.e., the density need not be well-behaved). Hence, instead of writing $w\,dx$ we usually write $d\mu$, introducing a measure $\mu$. Hence, more officially, it is better to write $L_2([a,b],w)$ as $L_2([a,b],\mu)$:
$$L_2([a,b],\mu) \equiv \left\{f \,\Big|\, \int_a^b |f(x)|^2\,d\mu(x) < \infty\right\}. \tag{20.26}$$
(B) $L_p$-space. The $L_p$-space ($p \ge 1$) is defined by the completion305 of the following function set
$$\{\varphi \mid \|\varphi\|_p < +\infty\}, \tag{20.27}$$
where $\|\ \|_p$ is the $L_p$-norm defined as
$$\|\varphi\|_p \equiv \left(\int |\varphi|^p\,d\mu\right)^{1/p}. \tag{20.28}$$
The $L_p$-space is a Banach space (→20.3 Discussion), but not a Hilbert space except for p = 2, because the parallelogram law (→20.6) does not hold.
20.20 Dirac's "abuse" of symbols. As we have seen, in a Hilbert space306 Dirac's bra-ket notation causes no mathematical problem and is quite useful. However, Dirac wished to unify not only the linear space spanned by normalizable states (physically, localized states →34C.8(4); this part is a Hilbert space) but also the space containing 'plane wave states' which cannot be normalized in the usual way.307 The starting point of his formal approach is the following interpretation of an ordinary function as a vector with uncountably many components.
20.21 f(x) as an x-component of a vector. It is not an unnatural idea to regard the i-th component of a vector $|v\rangle$ as a 'value' v(i) of a function v defined on $\{1, 2, \ldots, n\}$, where n is the dimension of the vector space. Then, as we have already used the idea (→3.7), it is not outrageous to regard f(x) as the 'x-component' of a vector $|f\rangle$. We know the i-th component of a vector v may be written as $v_i = \langle i|v\rangle$ using the basis vector $|i\rangle$. Analogously, we write
$$f(x) = \langle x|f\rangle, \quad \overline{f(x)} = \langle f|x\rangle. \tag{20.29}$$
Thus we may regard a function as a vector in an infinite dimensional vector space spanned by position kets $\{|x\rangle : x \in [a,b]\}$. These position kets may be regarded as orthonormal vectors (→20.10).
305 Completion means to add elements to make all the Cauchy sequences have unique limits.
306 assuming separability (→20.10)
307 Dirac wished to use the Hilbert space notation in a much wider class of spaces now called rigged Hilbert space.
20.22 Inner product of functions. It is natural to interpret summations over the coordinate indices as integrations (weighted with a function w as in 20.19) over the independent variable x. Thus, it is natural to define the scalar product or inner product of two functions f and g defined on the same domain as
$$\langle f|g\rangle \equiv \int dx\,w(x)\langle f|x\rangle\langle x|g\rangle = \int dx\,w(x)\overline{f(x)}\,g(x). \tag{20.30}$$
20.23 Decomposition of unity. The formula (20.30) suggests that we can decompose unity (cf. 20.15) as
$$\int |x\rangle w(x)\,dx\,\langle x| = 1. \tag{20.31}$$
This suggests that we may interpret $\{|x\rangle\}$ as an "orthonormal basis." Often unity is written as the following operator:
$$1 = |x\rangle \int dx\,w(x)\,\langle x|. \tag{20.32}$$
20.24 Trigonometric expansion revisited. Let $V = L_2([-\pi,\pi])$ (→20.5). Let us introduce the kets $|0\rangle$, $|n,c\rangle$, $|n,s\rangle$ such that
$$\langle x|0\rangle = \frac{1}{\sqrt{2\pi}}, \quad \langle x|n,c\rangle = \frac{1}{\sqrt{\pi}}\cos nx, \quad \langle x|n,s\rangle = \frac{1}{\sqrt{\pi}}\sin nx. \tag{20.33}$$
Then $\{|0\rangle, |1,c\rangle, |1,s\rangle, |2,c\rangle, |2,s\rangle, \ldots\}$ is an orthonormal basis, because it is a complete set for $C^0$-functions on $[-\pi,\pi]$ (→17.4). The standard Fourier expansion 17.1 is
$$|f\rangle = |0\rangle\langle 0|f\rangle + \sum_{n=1}^{\infty} \{|n,c\rangle\langle n,c|f\rangle + |n,s\rangle\langle n,s|f\rangle\}. \tag{20.34}$$
Notice, again, that the equality in this formula is in the $L_2$-sense. Bessel's inequality (→20.11) and Parseval's equality (→20.12) adapted to the trigonometric function set are their original forms.
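Bessel's inequality and Parseval's equality for this basis can be watched at work on a concrete function. The sketch below assumes the example f(x) = x on $[-\pi,\pi]$ (our choice, not from the notes). For it, $\langle 0|f\rangle = \langle n,c|f\rangle = 0$ by oddness, and integration by parts gives $\langle n,s|f\rangle = 2\sqrt{\pi}(-1)^{n+1}/n$, while $\langle f|f\rangle = 2\pi^3/3$.

```python
import math

norm_sq = 2 * math.pi ** 3 / 3      # <f|f> for f(x) = x on [-pi, pi]

def sine_coefficient(n):
    """<n,s|f> for f(x) = x, obtained by integrating x sin(nx)/sqrt(pi) by parts."""
    return 2 * math.sqrt(math.pi) * (-1) ** (n + 1) / n

def partial_sum(N):
    """Sum of squared Fourier coefficients up to n = N (cosine terms vanish)."""
    return sum(sine_coefficient(n) ** 2 for n in range(1, N + 1))

for N in (1, 10, 100, 10000):
    # Bessel: each partial sum stays below <f|f>; Parseval: the gap closes as N grows
    print(N, partial_sum(N), norm_sq - partial_sum(N))
```

The gap shrinks only like 1/N, which reflects the slow decay of the coefficients of a function that is discontinuous when extended periodically.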
20.25 δ-function (with weight). We can formally write (→20.23)
$$f(x) = \langle x|1|f\rangle = \int \langle x|y\rangle w(y)\,dy\,\langle y|f\rangle = \int f(y)\langle x|y\rangle w(y)\,dy. \tag{20.35}$$
Therefore, it is natural to introduce
$$\langle x|y\rangle = \delta_w(x - y) \tag{20.36}$$
such that
$$\int \delta_w(x - y)w(y)\,dy = 1, \quad \delta_w(x - y) = 0 \ \text{ for } x \ne y. \tag{20.37}$$
Obviously, $\delta_w$ is a generalization of δ (→14.5). We should identify it as
$$\delta_w(x - y) = \delta(x - y)/w(x). \tag{20.38}$$
Exercise.
Show (for r' > 0)
$$\delta(x - x')\delta(y - y')\delta(z - z') = \delta(r - r')\delta(\theta - \theta')\delta(\varphi - \varphi')/r^2\sin\theta. \tag{20.39}$$
20.26 δ-function for curvilinear coordinates. (20.38) tells us that if we wish to use functions defined in terms of the $O$-$q_1q_2q_3$ coordinates which are orthogonal curvilinear (→2D.3), then it is natural to choose the function space whose scalar product uses the weight function $w = h_1h_2h_3$ (→2D.8). Thus it is convenient to define the position bra-ket with the normalization
$$\langle q_1, q_2, q_3|q_1', q_2', q_3'\rangle = \frac{\delta(q_1 - q_1')\delta(q_2 - q_2')\delta(q_3 - q_3')}{h_1h_2h_3}. \tag{20.40}$$
For example, for the spherical coordinate system (→2D.5)
$$\langle r, \theta, \varphi|r', \theta', \varphi'\rangle = \frac{\delta(r - r')\delta(\theta - \theta')\delta(\varphi - \varphi')}{r^2\sin\theta}. \tag{20.41}$$
Exercise.
Write down the δ-function adapted to the elliptic cylindrical coordinates.
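20.27 Delta function in terms of orthonormal basis. Since $\delta(x - y) = \langle x|y\rangle$ may be interpreted as $\langle x|1|y\rangle$, we may introduce the decomposition of unity 20.15 into this formula to obtain
$$\delta(x - y) = \sum_n \langle x|e_n\rangle\langle e_n|y\rangle = \sum_n e_n(x)\overline{e_n(y)}, \tag{20.42}$$
where $\{|e_n\rangle\}$ is an orthonormal basis, and $e_n(x) = \langle x|e_n\rangle$.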
20.28 Green's operator and Green's function - a formal approach. We have already seen the fundamental idea of Green in 1.8, and know several examples of Green's functions (→15, 16). We wish to solve the following linear equation:
$$[Lu](z) = f(z) \tag{20.43}$$
with the homogeneous boundary condition. Let $\{|x\rangle\}$ be the position kets w.r.t. the Cartesian coordinates (→20.21). With the aid of the decomposition of unity (→20.23), we rewrite (20.43) as
$$\langle z|L\int |y\rangle\,dy\,\langle y|u\rangle = \langle z|f\rangle \tag{20.44}$$
or
$$\int dy\,L(z, y)u(y) = f(z), \tag{20.45}$$
where $L(x, y) = \langle x|L|y\rangle$ (a sort of matrix element). If we can invert the 'matrix' L(x, y), then we can solve this equation. In other words, if we can solve
$$LG = 1 \tag{20.46}$$
for G, then u = Gf thanks to superposition (linearity). (20.46) reads
$$\int dy\,L(x, y)\langle y|G|z\rangle = \langle x|z\rangle = \delta(x - z). \tag{20.47}$$
G is called a Green's operator, and $G(x|y) \equiv \langle x|G|y\rangle$ is called a Green's function. Formally, $G = L^{-1}$, so that $G(x|y) = \langle x|L^{-1}|y\rangle$.
20.29 Eigenfunction expansion of Green's function - a formal approach. Suppose we know the eigenkets $\{|n\rangle\}$ of the operator L:
$$L|n\rangle = \lambda_n|n\rangle. \tag{20.48}$$
If all the eigenvalues are non-zero, then formally
$$G(x|y) = \sum_n \langle x|n\rangle\lambda_n^{-1}\langle n|y\rangle = \sum_n \frac{u_n(x)\overline{u_n(y)}}{\lambda_n}, \tag{20.49}$$
where $\langle x|n\rangle = u_n(x)$. Here we have assumed that the eigenkets of L make a complete orthonormal set. This is the Fourier decomposition formula for the Green's function. We can immediately see the symmetry of the Green's function: G(x|y) = G(y|x) (→16A.20, 35.2, 36.4, 37.7). We will later return to a more careful discussion (→37).
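As a concrete check of the eigenfunction expansion, here is a sketch for one simple operator chosen for illustration (it is our assumption, not an example from this entry): $L = -d^2/dx^2$ on $[0,\pi]$ with the Dirichlet condition. Its orthonormal eigenfunctions are $u_n(x) = \sqrt{2/\pi}\sin nx$ with $\lambda_n = n^2$, and its Green's function is known in closed form, $G(x|y) = x_<(\pi - x_>)/\pi$ with $x_< = \min(x,y)$, $x_> = \max(x,y)$.

```python
import math

def green_series(x, y, n_terms=20000):
    """Truncated eigenfunction expansion sum_n u_n(x) u_n(y) / lambda_n."""
    return sum((2 / math.pi) * math.sin(n * x) * math.sin(n * y) / n ** 2
               for n in range(1, n_terms + 1))

def green_closed(x, y):
    """Closed form for -d^2/dx^2 on [0, pi] with Dirichlet boundary condition."""
    lo, hi = min(x, y), max(x, y)
    return lo * (math.pi - hi) / math.pi

x, y = 1.0, 2.0
print(green_series(x, y), green_closed(x, y))   # agree up to the truncation error
print(green_series(y, x))                       # symmetry G(x|y) = G(y|x)
```

The terms decay like $1/n^2$, so the truncation error after N terms is of order 1/N.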
21 Orthogonal Polynomials
We can construct a polynomial orthonormal basis of a Hilbert space. They are called orthogonal polynomials, which have a beautiful general theory and many important numerical applications (→22).
Key words: generalized Fourier expansion, generalized Rodrigues' formula, generating function, three term recursion relation, zeros, Sturm's theorem, Legendre polynomial, Hermite polynomial, Chebychev polynomial
Summary:
(1) Recognize that there is a set of relations and formulas common to many (all classical) orthogonal polynomials (21A.3-11).
(2) Generating function is a useful tool to derive recursion relations (21B.4, for example).
(3) Remember where the representative polynomials - Legendre, Hermite, and Chebychev - appear (21B).
21A General Theory
21A.1 Existence of general theory. The most important fact about orthonormal polynomials is that there is a general theory shared by all the families of (classical →21A.6 Discussion (A)) orthogonal polynomials. The general theory includes the generalized Rodrigues' formula, associated (Sturm-Liouville type) eigenvalue problems, generating functions, three term recursion formulas, etc.
21A.2 Orthogonal polynomials for $L_2([a,b],w)$ via Gram-Schmidt. $\{1, x, x^2, \ldots\}$ makes a complete set of functions for $L_2([a,b],w)$ (→20.19): notice first that $C^0([a,b])$ (the totality of continuous functions on [a, b]) is dense in this space. Weierstrass' theorem (→17.3) tells us that any continuous function on a finite interval can be uniformly approximated by a polynomial. Hence, the totality of polynomials is dense in $L_2((a,b),w)$. Therefore, the set of kets $\{|n\rangle\}$ such that $\langle x|n\rangle = x^n$ 308 is a complete set (→20.3) of the Hilbert space $L_2([a,b],w)$. In this space the scalar product (→20.5) is defined by
$$\langle f|g\rangle = \int_a^b \overline{f(x)}\,g(x)\,w(x)\,dx, \tag{21.1}$$
and the norm by $\|f\|_w = \sqrt{\langle f|f\rangle}$. We apply the Gram-Schmidt orthonormalization (→20.16) to $\{|n\rangle\}$ as follows:
(1) We define $|p_0\rangle = |0\rangle/\sqrt{\langle 0|0\rangle}$.
(2) Normalizing $|1\rangle - |p_0\rangle\langle p_0|1\rangle$, we construct $|p_1\rangle$.
(3) More generally, normalizing
$$|n\rangle - \sum_{k=0}^{n-1} |p_k\rangle\langle p_k|n\rangle, \tag{21.2}$$
we obtain $|p_n\rangle$.
$\{|p_n\rangle\}$ is an orthonormal basis of $L_2([a,b],w)$.
The family of orthogonal polynomials of $L_2([a,b],w)$ is defined by $\langle x|p_n\rangle$ times an appropriate n-dependent numerical multiplicative factor as seen in 21A.5.

308 For the notational convention see 20.21.
Exercise.
Apply the Gram-Schmidt orthonormalization method to $\{x^n\}_{n=0}^{\infty}$ and make an ON basis for $L_2([0,1])$. Compute the basis up to the third member of the set.
21A.3 Theorem.
(1) $p_n(x) = \langle x|p_n\rangle$ is orthogonal to any (n - 1)-order polynomial.
(2) The orthonormal polynomials for $L_2([a,b],w)$ are unique, if the coefficients of the highest order terms are chosen to be positive.309
These assertions are obviously true by construction, but practically important.

309 Here, it is not meant that the orthonormal basis in terms of polynomials is unique (of course, not). If we demand that there are no two polynomials of the same order in the basis, the choice is unique.

21A.4 Least square approximation and generalized Fourier expansion. Let $\mathcal{P}_n$ be the totality of the polynomials of order less than or equal to n. The polynomial $P \in \mathcal{P}_n$ which minimizes
$$\|f - P\|_w \tag{21.3}$$
for $f \in L_2([a,b],w)$ is called the n-th order least square approximation of f (→20.14). The ket $|P\rangle$ satisfying this condition is given by
$$|P\rangle = \sum_{j=0}^{n} |p_j\rangle\langle p_j|f\rangle, \tag{21.4}$$
where $|p_j\rangle$ are calculated in 21A.2 with respect to w. That is, $|P\rangle$ is the n-th partial sum of the following generalized Fourier expansion (→20.13) of $|f\rangle$
$$|f\rangle = \sum_{j=0}^{\infty} |p_j\rangle\langle p_j|f\rangle. \tag{21.5}$$
Notice that all the general properties of the Fourier series 17.5 apply here as well.
Exercise.
(1) Consider the step function $\langle x|a\rangle = \Theta(x - a)$ on [-1, 1] ($a \in (-1,1)$). Expand this in terms of Legendre polynomials (→21A.5). The coefficients for $n \ge 1$ are
$$\langle p_n|a\rangle = \frac{1}{\sqrt{2(2n+1)}}\,(P_{n-1}(a) - P_{n+1}(a)). \tag{21.6}$$
$\langle p_0|a\rangle = (1 - a)/\sqrt{2}$ as easily seen. Hence,
$$\Theta(x - a) = \frac{1}{2}(1 - a) + \frac{1}{2}\sum_{n=1}^{\infty} [P_{n-1}(a) - P_{n+1}(a)]P_n(x). \tag{21.7}$$
(2) Expand $x^5$ into the generalized Fourier series in terms of Legendre polynomials.
21A.5 Example: Legendre polynomials. A family of orthogonal polynomials of $L_2([-1,1])$ called the Legendre polynomials is defined as
$$P_n(x) = \sqrt{\frac{2}{2n+1}}\,\langle x|p_n\rangle \tag{21.8}$$
in terms of orthonormal kets $\{|p_n\rangle\}$ constructed for a = -1, b = 1 and w = 1 in 21A.2. The coefficient $\sqrt{2/(2n+1)}$ is the multiplicative factor mentioned in 21A.2. $P_n(x)$ is called the n-th order Legendre polynomial. According to our notational rule (→20.22)
$$\langle p_n|f\rangle = \int_{-1}^{1} dx\,\sqrt{\frac{2n+1}{2}}\,P_n(x)f(x). \tag{21.9}$$
Hence, the corresponding generalized Fourier expansion (21.5) in terms of the Legendre polynomials reads
$$f(x) = \sum_{n=0}^{\infty} \frac{2n+1}{2}\,P_n(x)\left[\int_{-1}^{1} dx\,P_n(x)f(x)\right]. \tag{21.10}$$
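The expansion (21.10) can be tried on a simple polynomial. The sketch below expands $f(x) = x^2$ exactly, using rational arithmetic. Two things are assumptions of this illustration rather than content of the entry: the Legendre polynomials are generated with the standard three term recursion $(n+1)P_{n+1} = (2n+1)xP_n - nP_{n-1}$, and polynomials are stored as coefficient lists (lowest power first). The function names are hypothetical.

```python
from fractions import Fraction

def legendre_polys(n_max):
    """Coefficient lists of P_0..P_n_max via (n+1)P_{n+1} = (2n+1) x P_n - n P_{n-1}."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        x_pn = [Fraction(0)] + polys[n]                 # x * P_n
        prev = polys[n - 1] + [Fraction(0)] * 2         # P_{n-1}, padded to equal length
        polys.append([((2 * n + 1) * a - n * b) / (n + 1)
                      for a, b in zip(x_pn, prev)])
    return polys[: n_max + 1]

def integrate(p, q):
    """Integral over [-1, 1] of the product of two polynomials."""
    return sum(a * b * Fraction(2, i + j + 1)
               for i, a in enumerate(p) for j, b in enumerate(q)
               if (i + j) % 2 == 0)                     # odd powers integrate to zero

def legendre_coefficients(f, n_max):
    """c_n = (2n+1)/2 * int P_n f dx, so that f = sum_n c_n P_n as in (21.10)."""
    return [Fraction(2 * n + 1, 2) * integrate(P, f)
            for n, P in enumerate(legendre_polys(n_max))]

coeffs = legendre_coefficients([0, 0, 1], 3)            # f(x) = x^2
print(coeffs)                                           # x^2 = (1/3) P_0 + (2/3) P_2
```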
21A.6 Generalized Rodrigues' formula. Let $F_n(x)$ be defined on $(a,b) \subset \mathbf{R}$ as
$$F_n(x) = \frac{1}{w(x)}\frac{d^n}{dx^n}[w(x)s(x)^n], \tag{21.11}$$
where w and s are chosen as

  a      b      w(x)                                      s(x)
  a      b      $(b-x)^\alpha (x-a)^\beta$, $\alpha, \beta > -1$      $(b-x)(x-a)$
  a      +∞     $e^{-x}(x-a)^\beta$, $\beta > -1$               $x-a$
  -∞     +∞     $e^{-x^2}$                                  1

As can easily be seen $F_n$ is an n-th order polynomial (→2A.1 Exercise (D)). $\{F_n(x)\}$ is an orthogonal polynomial system for $L_2((a,b),w)$ (→20.19),310 because
$$\int_a^b dx\,w(x)F_n(x)F_m(x) = 0 \ \text{ for } n \ne m. \tag{21.12}$$
(Demonstrate this.) If the interval (a, b) and the weight function w are given, the orthogonal polynomial set311 is uniquely fixed as seen from the Gram-Schmidt construction (up to multiplicative constants) (→21A.2).
For example, with w = 1 (that is, $\alpha = \beta = 0$), a = -1 and b = 1, $F_n$ must (→21A.3) be proportional to the Legendre polynomial $P_n$. Indeed, from (21.11)
$$P_n(x) = \frac{1}{(-2)^n n!}\frac{d^n}{dx^n}(1 - x^2)^n = \frac{1}{2^n n!}\frac{d^n}{dx^n}(x^2 - 1)^n. \tag{21.13}$$
This is called Rodrigues' formula.
With a suitable n-dependent numerical coefficient $K_n$ a set of orthogonal polynomials $\{f_n\}$ is defined by
$$f_n(x) = \frac{1}{K_n w(x)}\frac{d^n}{dx^n}[w(x)s(x)^n], \tag{21.14}$$
which is called the generalized Rodrigues formula (→21B.1).312
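For the Legendre case the formula can be evaluated mechanically. The following sketch assumes the standard form of Rodrigues' formula, $P_n(x) = \frac{1}{2^n n!}\frac{d^n}{dx^n}(x^2-1)^n$, expands $(x^2-1)^n$ by the binomial theorem, and differentiates the coefficient list n times; the function name is illustrative.

```python
from fractions import Fraction
from math import comb, factorial

def rodrigues_legendre(n):
    """P_n from (1 / (2^n n!)) d^n/dx^n (x^2 - 1)^n, as a coefficient list."""
    # Binomial expansion: (x^2 - 1)^n = sum_k C(n, k) (-1)^(n-k) x^(2k)
    poly = [Fraction(0)] * (2 * n + 1)
    for k in range(n + 1):
        poly[2 * k] = Fraction(comb(n, k) * (-1) ** (n - k))
    for _ in range(n):                                   # differentiate n times
        poly = [j * c for j, c in enumerate(poly)][1:]
    return [c / (2 ** n * factorial(n)) for c in poly]

for n in range(4):
    print(n, rodrigues_legendre(n))   # coefficient lists, lowest power first
```

The result for n = 2 is $-\frac12 + \frac32 x^2$ and for n = 3 is $-\frac32 x + \frac52 x^3$, the familiar Legendre polynomials.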
310 If a and b are finite, then $L_2((a,b),w) = L_2([a,b],w)$.
311 We assume that the polynomials are ordered according to their order (→20.17).
312 Not all the orthogonal polynomials can be obtained from the formula; only the so-called classical polynomials.

Discussion.
(A) Classical polynomials. The generalized Rodrigues' formula can be introduced in a slightly more abstract fashion as follows:
Consider
$$F_n(x) = w(x)^{-1}\frac{d^n}{dx^n}[w(x)s(x)^n], \tag{21.15}$$
where the following conditions are required:
(1) $F_1(x)$ is a first order polynomial.
(2) s(x) is a polynomial in x of degree less than or equal to 2 with real roots.
(3) w(x) is real, positive and integrable on [a, b] and satisfies the boundary conditions w(a)s(a) = w(b)s(b) = 0.
It turns out that (1)-(3) imply that we can only have the cases in the table in 21A.6 (apart from trivial linear transformations, and multiplicative constants).313 These polynomials are called classical polynomials.
(B) Demonstrate with the aid of Rolle's theorem that all the zeros of $P_n(x)$ are in [-1, 1].
21A.7 Relation to the Sturm-Liouville problem. $f_n(x)$ defined by (21.14) obeys the following equation generally called the Sturm-Liouville equation (→15.4, 35.1)
$$\frac{d}{dx}\left[w(x)s(x)\frac{df_n(x)}{dx}\right] + \lambda_n w(x)f_n(x) = 0, \tag{21.16}$$
where $\lambda_n$ is a pure number given by
$$\lambda_n = -n\left(K_1\frac{df_1(x)}{dx} + \frac{n-1}{2}\frac{d^2s(x)}{dx^2}\right). \tag{21.17}$$
This can be demonstrated by a tedious but straightforward calculation. See 35.3 Discussion.
21A.8 Generating functions. In general, the following power series in $\zeta$ is called the generating function of the orthogonal polynomial set $\{P_n(x)\}$:
$$Q(\zeta, x) = \sum_{n=0}^{\infty}A_nP_n(x)\zeta^n, \tag{21.18}$$
where $A_n$ is a numerical constant introduced to streamline the formula. That there is such a function for any orthogonal polynomial family can be seen by rewriting the generalized Rodrigues' formula (21.14). Using Cauchy's theorem (→6.14), we have
$$f_n(z) = \frac{1}{K_nw(z)}\oint_{\partial D}\frac{dt}{2\pi i}\frac{n!}{(t-z)^{n+1}}w(t)s(t)^n, \tag{21.19}$$
where $D \subset \mathbf{C}$ is a small disk centered at $z$. We define a new variable $\zeta$ as
$$\frac{1}{a\zeta} = \frac{s(t)}{t-z}, \tag{21.20}$$
313 See P Dennery and A Krzywicki, Mathematics for Physicists (Harper and Row,1967), Section 10.3.
where $a$ is a numerical factor introduced to streamline the final outcome. In terms of this variable (21.19) can be rewritten generally as
$$f_n(z) = \frac{a^{-n}n!}{2\pi i\,K_nw(z)}\oint_{\partial D'}\frac{d\zeta}{\zeta^{n+1}}Q(\zeta, z), \tag{21.21}$$
where $Q$ is an appropriate function resulting from the integrand in (21.19) through the change of variables. This implies
(21.22)
21A.9 Generating function for Legendre polynomials. For example, for the Legendre polynomials, $K_n = (-2)^nn!$ and $w(x) = 1$. (21.19) reads (or directly from (21.13))
$$P_n(z) = \frac{1}{2\pi i}\oint_{\partial D}\frac{(t^2-1)^n}{[2(t-z)]^n}\frac{dt}{t-z}, \tag{21.23}$$
which is called Schläfli's integral. We choose $a = -1/2$ in (21.21) to get
$$P_n(z) = \frac{1}{2\pi i}\oint_{\partial D'}\frac{1}{\zeta^{n+1}}\frac{d\zeta}{\sqrt{1-2z\zeta+\zeta^2}}, \tag{21.24}$$
so that (→8B.3(i))
$$w(z,\zeta) = \frac{1}{\sqrt{1-2z\zeta+\zeta^2}} = \sum_{n=0}^{\infty}P_n(z)\zeta^n. \tag{21.25}$$
This is the generating function for the Legendre polynomials.

Exercise.
Derive (21.24). Use the new variable $\zeta$ (following (21.20)) as
$$\frac{1}{\zeta} = \frac{t^2-1}{2(t-z)}. \tag{21.26}$$
[Hint. When the reader solves for $t$, she must choose the correct branch so that $t \to z$ corresponds to $\zeta \to 0$.]
21A.10 Three term recursion formula. Let $\{|P_n\rangle\}$ be a complete set of orthonormal polynomial kets, and $k_n$ be the highest order coefficient of the polynomial $P_n(x) = \langle x|P_n\rangle$. Define
(21.27)
Then,
$$P_{n+1}(x) = (\gamma_nx - \alpha_n)P_n(x) - \beta_nP_{n-1}(x); \tag{21.28}$$
this follows easily from (1) of 21A.3.

Discussion.
Let us demonstrate the assertion.
$$P_n(x) - \frac{k_n}{k_{n-1}}xP_{n-1}(x) \tag{21.29}$$
is a polynomial of degree at most $n-1$. Therefore, it can be expressed as a linear combination of $\{P_{n-1}, \ldots, P_0\}$.
(1) Demonstrate, because of 21A.3, that only $P_{n-2}$ and $P_{n-1}$ are needed to express $P_n - xk_nP_{n-1}/k_{n-1}$. Already we have the form of (21.28). [Hint. What happens if there are other remaining terms?]
(2) Determine the coefficients.
21A.11 Zeros of orthogonal polynomials. Let $\{|P_n\rangle\}$ be the orthogonal polynomial kets of $L_2([a,b], w)$ (→20.19). Then
(1) All the zeros of $P_n(x) = \langle x|P_n\rangle$ are in the interval $(a,b)$. This is practically very important (→22A.3). For a proof see 35.3 Discussion.
(2) All the zeros of $P_n(x)$ are simple, and the zeros of $P_{n+1}(x)$ are separated by those of $P_n(x)$.

Discussion.
The three term recurrence relation can be written as
$$xP(x) = AP(x) + q(x), \tag{21.30}$$
where $P = (P_0, P_1, \ldots, P_{n-1})^T$, $A$ is a symmetric matrix, and $q = (0, \ldots, 0, k_{n-1}P_n/k_n)^T$. Choose $x$ to be a zero $x_i$ of $P_n$; then we have
$$x_iP(x_i) = AP(x_i). \tag{21.31}$$
That is, the zeros of $P_n$ must be the eigenvalues of $A$, so that they must be real.
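Statement (2) can be used constructively. The following is a minimal sketch, assuming the Legendre case and the standard recursion $(n+1)P_{n+1} = (2n+1)xP_n - nP_{n-1}$: since the zeros of $P_{m-1}$ separate those of $P_m$, they (together with $\pm 1$) bracket the zeros of $P_m$, and bisection finds each of them. The function names `legendre`, `bisect` and `legendre_zeros` are illustrative only.

```python
def legendre(n, x):
    """Evaluate P_n(x) by the three term recursion."""
    p_prev, p = 1.0, x
    if n == 0:
        return p_prev
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p


def bisect(f, lo, hi, tol=1e-14):
    """Locate a sign change of f in [lo, hi] by bisection."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_lo < 0) == (f_mid < 0):
            lo, f_lo = mid, f_mid   # same sign as the left end: move left end
        else:
            hi = mid
    return 0.5 * (lo + hi)


def legendre_zeros(n):
    """Zeros of P_n, bracketed by the zeros of P_{n-1} and the end points."""
    zeros = []
    for m in range(1, n + 1):
        brackets = [-1.0] + zeros + [1.0]
        zeros = [bisect(lambda x: legendre(m, x), a, b)
                 for a, b in zip(brackets, brackets[1:])]
    return zeros
```

For instance, `legendre_zeros(2)` should return approximately $\mp 1/\sqrt{3}$, the sampling points of the two point Gauss formula.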
21A.12 Remark: how to locate real zeros of polynomials. Drawing graphs with the aid of Mathematica and zooming into the relevant portion of the graphs may be the most practical method. Analytically, there is a famous
Theorem [Sturm]. Assume that the $n$-th order polynomial $P$ does not have any multiple zero. Let $P_0 = P$ and $P_1 = P'$. Using the division algorithm, construct $P_i$ as follows:
$$P_{i+1} = P_iq_i - P_{i-1} \quad (i = 1, 2, \ldots, n-1), \tag{21.32}$$
where $q_i$ is the quotient of $P_{i-1}$ divided by $P_i$, so that $-P_{i+1}$ is the remainder. Let $V(c)$ be the number of changes of sign in the sequence $P_0(c), P_1(c), \ldots, P_n(c)$.314 The number of zeros in the interval $[a,b]$ is given by $V(a) - V(b)$. □

314 Remove $P_i(c)$ if it is zero from the sequence.
21A.13 Example of Sturm's theorem. Let us study $f(x) = x(x^2-1)$. We trivially know that $0, \pm 1$ are the real zeros. First we construct $P_i$ in the theorem as follows:
$$P_0 = x(x^2-1); \quad P_1 = 3x^2-1; \quad P_2 = 2x/3; \quad P_3 = 1. \tag{21.33}$$
Therefore, we can make, for example, the following table exhibiting the signs and $V$.

a       P_0   P_1   P_2   P_3   V(a)
+∞      +     +     +     +     0
2       +     +     +     +     0
1/2     -     -     +     +     1
-1/2    +     -     -     +     2
-∞      -     +     -     +     3

For example, $V(-1/2) - V(2) = 2$, so there must be two zeros in $(-1/2, 2)$.
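The construction in the example can be mechanized. The following is only a sketch under stated assumptions: polynomials are represented as coefficient lists with the highest power first, exact rational arithmetic from the standard library is used, and all the function names are hypothetical.

```python
from fractions import Fraction


def poly_rem(num, den):
    """Remainder of the division num / den (highest power first)."""
    num = [Fraction(c) for c in num]
    den = [Fraction(c) for c in den]
    while len(num) >= len(den):
        factor = num[0] / den[0]
        for i in range(len(den)):
            num[i] -= factor * den[i]
        num.pop(0)                      # leading term is now zero
    while num and num[0] == 0:          # strip remaining leading zeros
        num.pop(0)
    return num


def derivative(p):
    n = len(p) - 1
    return [c * (n - i) for i, c in enumerate(p[:-1])]


def sturm_chain(p):
    """P_0 = P, P_1 = P', P_{i+1} = -(remainder of P_{i-1} / P_i)."""
    chain = [[Fraction(c) for c in p], [Fraction(c) for c in derivative(p)]]
    while True:
        r = poly_rem(chain[-2], chain[-1])
        if not r:
            break
        chain.append([-c for c in r])
    return chain


def evaluate(p, x):
    value = Fraction(0)
    for c in p:
        value = value * x + c           # Horner scheme
    return value


def sign_changes(chain, x):
    values = [v for v in (evaluate(p, Fraction(x)) for p in chain) if v != 0]
    return sum(1 for u, v in zip(values, values[1:]) if (u > 0) != (v > 0))


def count_zeros(p, a, b):
    """V(a) - V(b) for a polynomial p without multiple zeros."""
    chain = sturm_chain(p)
    return sign_changes(chain, a) - sign_changes(chain, b)
```

With `p = [1, 0, -1, 0]`, that is $x^3 - x$, `sturm_chain(p)` reproduces (21.33), and `count_zeros(p, Fraction(-1, 2), 2)` gives 2 as in the table.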
Discussion.
Find the number of positive real roots of the following polynomials.
(1) $P(x) = 3x^4 + 2x^2 - x - 5$,
(2) $P(x) = 13x^{21} + 3x^3 - 2$,
(3) (Runge's example) $P(x) = 3.22x^6 + 4.12x^4 + 3.11x^3 - 7.25x^2 + 1.88x - 7.84$.
21A.14 Descartes' sign rule. Let
$$P(x) = a_0x^n + a_1x^{n-1} + \cdots + a_n \tag{21.34}$$
be a real coefficient polynomial. Let $W$ be the number of sign changes in the sequence $a_0, a_1, \ldots, a_n$ (remove 0 from this sequence before counting). Then the number of strictly positive roots of $P(x) = 0$ is given by $W$ or $W$ minus some positive even number. (Hence, if $W = 1$, that is the answer.)
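Counting $W$ is a one-line computation. A minimal sketch (the function name is illustrative; coefficients are listed from $a_0$ to $a_n$):

```python
def descartes_sign_changes(coeffs):
    """Number W of sign changes in a_0, ..., a_n after removing zeros."""
    nonzero = [c for c in coeffs if c != 0]
    return sum(1 for u, v in zip(nonzero, nonzero[1:]) if (u > 0) != (v > 0))
```

For $3x^4 + 2x^2 - x - 5$ the coefficient list is `[3, 0, 2, -1, -5]` and $W = 1$, so there is exactly one positive root. For Runge's example $W = 3$, so the rule alone leaves the two possibilities 3 and 1.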
21.B Representative Examples
21B.1 Table of orthogonal polynomials. (→2A.1 Exercise (D)) 21A.6 tells us that various orthogonal polynomial families can be obtained by choosing $w$ and $s$ appropriately and also by choosing appropriate multiplicative numerical factors $K_n$. Some common examples are given as follows.

name        symbol                  domain       w(x)                              s(x)       K_n
Legendre    $P_n$                   [-1, 1]      $1$                               $1-x^2$    $(-1)^n2^nn!$
Chebychev   $T_n$                   [-1, 1]      $1/\sqrt{1-x^2}$                  $1-x^2$    $(-1)^n(2n-1)!!$
Jacobi      $P_n^{(\alpha,\beta)}$  [-1, 1]      $(1-x)^{\alpha}(1+x)^{\beta}$     $1-x^2$    $(-1)^n2^nn!$
Laguerre    $L_n$                   [0, ∞)       $e^{-x}$                          $x$        $n!$
Hermite     $H_n$                   (-∞, ∞)      $e^{-x^2}$                        $1$        $(-1)^n$

Note that $L_n$ is $L_n^{(0)}$ of 2A.1.

Exercise. Show $T_n = n!\sqrt{\pi}\,P_n^{(-1/2,-1/2)}/\Gamma(n+1/2)$.
21B.2 Legendre polynomials. The Legendre polynomials have been discussed above (→21A.5). The orthonormal basis of $L_2([-1,1])$ (→20.19) in terms of the Legendre polynomials is in 21A.5 with the generalized Fourier expansion formula. The decomposition of unity (→20.15) reads
$$\delta(x-y) = \sum_{n=0}^{\infty}\frac{2n+1}{2}P_n(x)P_n(y). \tag{21.35}$$
Rodrigues' formula is in 21A.6, and the generating function is given in 21A.9. We can write down the general formula starting from Rodrigues' formula as
$$P_n(x) = \frac{1}{2^n}\sum_{j=0}^{[n/2]}\frac{(-1)^j(2n-2j)!}{j!(n-j)!(n-2j)!}x^{n-2j}. \tag{21.36}$$
($[\cdot]$ is Gauss' symbol denoting the largest integer not exceeding $\cdot$.)
[Figure: graphs of the Legendre polynomials $P_n(x)$ on $[-1,1]$.]
Discussion.
Let $Q_n(x)$ be the $n$-th order polynomial with its highest order coefficient normalized to be unity. If its $L_2$-distance from 0 is the smallest among such polynomials, $Q_n$ is proportional to $P_n$. That is, minimize
$$\int_{-1}^1Q_n(x)^2\,dx \tag{21.37}$$
with respect to the coefficients. The resultant polynomial is proportional to $P_n$.
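21B.3 Sturm-Liouville equation for Legendre polynomials. The differential equation corresponding to (21.16) reads (→24C.1)
$$\frac{d}{dx}\left[(1-x^2)\frac{dP_n}{dx}\right] + n(n+1)P_n = 0, \tag{21.38}$$
or
$$(1-x^2)P_n'' - 2xP_n' + n(n+1)P_n = 0. \tag{21.39}$$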
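21B.4 Recursion formulas for Legendre polynomials. The three term recursion relation (→21A.10) reads
$$(n+1)P_{n+1}(x) - (2n+1)xP_n(x) + nP_{n-1}(x) = 0 \tag{21.40}$$
with $P_0(x) = 1$ and $P_1(x) = x$. This can also be obtained easily from the generating function (21.25): expand
$$(1-2x\zeta+\zeta^2)\frac{\partial w}{\partial\zeta} + (\zeta - x)w = 0. \tag{21.41}$$
Similarly, we obtain
$$(1-2x\zeta+\zeta^2)\frac{\partial w}{\partial x} - \zeta w = 0. \tag{21.42}$$
This leads to
$$P_{n+1}' - 2xP_n' + P_{n-1}' - P_n = 0. \tag{21.43}$$
If we eliminate $P_{n-1}'$ from (the derivative of) (21.40) and (21.43), we get
$$P_{n+1}' - xP_n' = (n+1)P_n. \tag{21.44}$$
If we eliminate $P_{n+1}'$ from (the derivative of) (21.40) and (21.43), we get
$$xP_n' - P_{n-1}' = nP_n. \tag{21.45}$$
Combining the above two formulas, we obtain
$$P_{n+1}' - P_{n-1}' = (2n+1)P_n. \tag{21.46}$$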
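These relations are easy to check with exact arithmetic. The sketch below is an illustration only, assuming that a polynomial is stored as a list of rational coefficients indexed by the power of $x$; it generates $P_n$ from the three term recursion and compares the result with the explicit sum (21.36). The function names are hypothetical.

```python
from fractions import Fraction
from math import factorial


def legendre_by_recursion(n_max):
    """[P_0, ..., P_n_max] from (n+1)P_{n+1} = (2n+1)xP_n - nP_{n-1}."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, n_max):
        x_pn = [Fraction(0)] + polys[n]            # x * P_n
        prev = polys[n - 1] + [Fraction(0)] * 2    # P_{n-1} padded to degree n+1
        polys.append([((2 * n + 1) * a - n * b) / (n + 1)
                      for a, b in zip(x_pn, prev)])
    return polys[: n_max + 1]


def legendre_explicit(n):
    """Coefficients of P_n from the explicit sum over j."""
    coeffs = [Fraction(0)] * (n + 1)
    for j in range(n // 2 + 1):
        coeffs[n - 2 * j] = Fraction(
            (-1) ** j * factorial(2 * n - 2 * j),
            2 ** n * factorial(j) * factorial(n - j) * factorial(n - 2 * j))
    return coeffs


def derivative(p):
    """Coefficient list of dp/dx."""
    return [k * c for k, c in enumerate(p)][1:]
```

The same representation allows a direct check of the derivative relations, for example that `derivative(P[n+1])` minus `derivative(P[n-1])` equals $(2n+1)$ times `P[n]`.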
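21B.5 Legendre polynomials, some properties.
(1) $P_n(x)$ is an odd (resp., even) function, if $n$ is odd (resp., even): $P_n(x) = (-1)^nP_n(-x)$, $P_n(1) = 1$ and $P_n(-1) = (-1)^n$. $P_{2n}(0) = \binom{-1/2}{n}$ (see Exercise below).
(2) $|P_n(x)| \leq 1$ for $x \in [-1,1]$.
(3) All the zeros of $P_n$ are simple and in $(-1,1)$ (→21A.11).
(4) If $\Pi_n$ is an $n$-th order polynomial satisfying
$$\int_{-1}^1x^k\Pi_n(x)\,dx = 0 \tag{21.47}$$
for all $k \in \{0, 1, \ldots, n-1\}$, then $\Pi_n \propto P_n$ (→21A.3(2)).
[Demo of (2)] This can be proved with the aid of Schläfli's integral (21.23). We choose the integration path to be
$$t = z + \sqrt{z^2-1}\,e^{i\varphi} \tag{21.48}$$
for $\varphi \in [-\pi, \pi)$. Note that $dt/(t-z) = i\,d\varphi$. Changing the integration variable from $t$ to $\varphi$ in (21.23), we get the following Laplace's first integral
$$P_n(z) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left(z + \sqrt{z^2-1}\cos\varphi\right)^nd\varphi. \tag{21.49}$$
From this we get, for $x \in [-1,1]$,
$$|P_n(x)| \leq \frac{1}{2\pi}\int_{-\pi}^{\pi}\left[x^2 + (1-x^2)\cos^2\varphi\right]^{n/2}d\varphi \leq 1. \tag{21.50}$$

Exercise.
$P_{2n}(0)$ can be obtained from Rodrigues' formula (21.11); the result reads
$$P_{2n}(0) = (-1)^n\frac{\Gamma(n+1/2)}{\sqrt{\pi}\,\Gamma(n+1)}. \tag{21.51}$$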
21B.6 Hermite polynomials. The orthonormal basis $\{|h_n\rangle\}$ for $L_2((-\infty,\infty), e^{-x^2})$ (→20.19) obtained by the Gram-Schmidt method applied to monomials (→21A.2) is written in terms of the Hermite polynomials $H_n(x)$ as
$$\langle x|h_n\rangle = \frac{1}{\sqrt{2^nn!\sqrt{\pi}}}H_n(x), \tag{21.52}$$
where
$$H_n(x) = \sum_{m=0}^{[n/2]}(-1)^m\frac{n!}{m!(n-2m)!}(2x)^{n-2m}. \tag{21.53}$$
($[\cdot]$ is Gauss' symbol denoting the largest integer not exceeding $\cdot$.) The generalized Rodrigues formula (→21A.6) for the Hermite polynomials is
$$H_n(x) = (-1)^ne^{x^2}\frac{d^n}{dx^n}e^{-x^2}. \tag{21.54}$$
The generating function (→21A.8) is given by
$$w_H(z,\zeta) = e^{2z\zeta-\zeta^2} = \sum_{n=0}^{\infty}\frac{H_n(z)}{n!}\zeta^n. \tag{21.55}$$
$H_n$ is an even (resp., odd) function, if $n$ is even (resp., odd).
[Figure: plot for the Hermite polynomials $H_n(x)$.]
Warning. Many authors use the weight $e^{-x^2/2}$ instead of $e^{-x^2}$. If we write the Hermite polynomials defined for this weight as $He_n(x)$, then the generalized Rodrigues formula (→21A.6) reads
(21.56)
and
(21.57)
Discussion.
To demonstrate the completeness of the Hermite polynomials, Weierstrass' theorem 17.3 is not enough, because the latter is about a finite interval. To show the completeness with respect to the $L_2$-norm we have only to show the completeness of polynomials. This can be demonstrated with the aid of Weierstrass' theorem on increasingly large intervals.
Exercise.
(A) From the generating function show
$$e^{-x^2/2}H_n(x) = \frac{1}{i^n\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{ixy-y^2/2}H_n(y)\,dy. \tag{21.58}$$
This can be split into real and imaginary part relations (Lebedev).
(B) From the generating function we obtain the following generalized Fourier expansion
$$e^{ax} = e^{a^2/4}\sum_{n=0}^{\infty}\frac{a^n}{2^nn!}H_n(x), \tag{21.59}$$
which is good for all $x \in \mathbf{R}$.
(C) Compute the generalized Fourier expansion of $e^{-ax^2}$ in terms of Hermite polynomials. The expansion coefficients can be written as
$$c_{2n} = \frac{1}{2^{2n}(2n)!\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-(a+1)x^2}H_{2n}(x)\,dx. \tag{21.60}$$
To compute the integral use (21.69) below. The $x$-integration can be done and we are left with
$$c_{2n} = \frac{(-1)^na^n}{\sqrt{\pi}\,(2n)!\,(1+a)^{n+1/2}}\int_0^{\infty}e^{-s}s^{n-1/2}\,ds. \tag{21.61}$$
Use the Gamma function (→9.6) to obtain the final result (Lebedev)
$$c_{2n} = \frac{(-1)^na^n}{2^{2n}n!\,(1+a)^{n+1/2}}. \tag{21.62}$$
21B.7 Sturm-Liouville equation for Hermite polynomials. The formula corresponding to (21.16) reads
$$H_n'' - 2xH_n' + 2nH_n = 0. \tag{21.63}$$
21B.8 Recurrence equations for Hermite polynomials. The three term recurrence relation (→21A.10) reads
$$H_{n+1} - 2xH_n + 2nH_{n-1} = 0, \tag{21.64}$$
which can be obtained from
$$\frac{\partial w_H}{\partial\zeta} = 2(z-\zeta)w_H. \tag{21.65}$$
From
$$\frac{\partial w_H}{\partial z} = 2\zeta w_H \tag{21.66}$$
we obtain
$$H_n' = 2nH_{n-1}. \tag{21.67}$$
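Both relations can be confirmed with integer arithmetic. The following is a minimal sketch, assuming that $H_n$ is stored as a list of integer coefficients indexed by the power of $x$ and that the explicit expression is the standard sum $\sum_m (-1)^m n!\,(2x)^{n-2m}/[m!(n-2m)!]$; the function names are illustrative.

```python
from math import factorial


def hermite_by_recursion(n_max):
    """[H_0, ..., H_n_max] from H_{n+1} = 2x H_n - 2n H_{n-1}."""
    polys = [[1], [0, 2]]
    for n in range(1, n_max):
        two_x_hn = [0] + [2 * c for c in polys[n]]   # 2x * H_n
        prev = polys[n - 1] + [0, 0]                 # H_{n-1} padded to degree n+1
        polys.append([a - 2 * n * b for a, b in zip(two_x_hn, prev)])
    return polys[: n_max + 1]


def hermite_explicit(n):
    """Coefficients of H_n from the explicit sum over m."""
    coeffs = [0] * (n + 1)
    for m in range(n // 2 + 1):
        binomial_part = factorial(n) // (factorial(m) * factorial(n - 2 * m))
        coeffs[n - 2 * m] = (-1) ** m * binomial_part * 2 ** (n - 2 * m)
    return coeffs


def derivative(p):
    """Coefficient list of dp/dx."""
    return [k * c for k, c in enumerate(p)][1:]
```

For example, `hermite_by_recursion(2)[2]` is `[-2, 0, 4]`, that is $H_2 = 4x^2 - 2$, and `derivative` applied to it gives $8x = 4H_1$.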
Exercise.
An integral formula for Hermite polynomials can be obtained with the aid of
$$e^{-x^2} = \frac{2}{\sqrt{\pi}}\int_0^{\infty}e^{-t^2}\cos 2xt\,dt. \tag{21.68}$$
[Hint. Note that the integrand is an even function.] Putting this into the generalized Rodrigues' formula (calculate the odd and even $n$ cases separately, and unify the results), we obtain
$$H_n(x) = \frac{2^n(-i)^ne^{x^2}}{\sqrt{\pi}}\int_{-\infty}^{\infty}e^{-t^2+2itx}\,t^n\,dt. \tag{21.69}$$
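21B.9 Chebychev polynomials. These polynomials are best introduced as
$$T_n(x) = \cos(n\cos^{-1}x). \tag{21.70}$$
The generalized Rodrigues formula (→21A.6) is given by
$$T_n(x) = \frac{(-1)^n}{(2n-1)!!}\sqrt{1-x^2}\,\frac{d^n}{dx^n}(1-x^2)^{n-1/2}. \tag{21.71}$$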
This can be transformed into (21.70) with the aid of the binomial theorem: it is easy to demonstrate that this formula yields
(21. 72)
which reduces to $\cos n\theta$ with $x = \cos\theta$.
The orthonormal basis $\{|t_n\rangle\}$ of $L_2([-1,1], 1/\sqrt{1-x^2})$ (→20.19) obtained by the Gram-Schmidt orthonormalization of monomials (→21A.6) can be written as
$$\langle x|t_n\rangle = \sqrt{\frac{2}{\pi}}\,T_n(x) \quad (n \geq 1), \qquad \langle x|t_0\rangle = \frac{1}{\sqrt{\pi}}. \tag{21.73}$$
The generating function (→21A.8) is
$$\frac{1-z^2}{1-2xz+z^2} = T_0(x) + 2\sum_{n=1}^{\infty}T_n(x)z^n. \tag{21.74}$$
The highest order coefficient of $T_n$ is $2^{n-1}$ for $n \geq 1$. The three term recursion formula (→21A.10) is315
$$T_{n+1}(x) = 2xT_n(x) - T_{n-1}(x) \tag{21.75}$$
for $n = 1, 2, \ldots$ with $T_0 = 1$, $T_1(x) = x$.

Exercise.
(1) Demonstrate that
(21.76)
(2) Demonstrate the generating function for Chebychev polynomials (21.74) as elegantly as possible. [Hint. Use (21.70).]
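The recursion and the trigonometric definition can be compared numerically. A minimal sketch (coefficient lists indexed by the power of $x$; the function names are illustrative):

```python
import math


def chebyshev_by_recursion(n_max):
    """[T_0, ..., T_n_max] from T_{n+1} = 2x T_n - T_{n-1}."""
    polys = [[1], [0, 1]]
    for n in range(1, n_max):
        two_x_tn = [0] + [2 * c for c in polys[n]]   # 2x * T_n
        prev = polys[n - 1] + [0, 0]                 # T_{n-1} padded to degree n+1
        polys.append([a - b for a, b in zip(two_x_tn, prev)])
    return polys[: n_max + 1]


def evaluate(p, x):
    """Value of the polynomial with coefficient list p at x."""
    return sum(c * x ** k for k, c in enumerate(p))


def chebyshev_trig(n, x):
    """T_n(x) = cos(n arccos x) for x in [-1, 1]."""
    return math.cos(n * math.acos(x))
```

The last entry of each coefficient list is the highest order coefficient, which should be $2^{n-1}$ for $n \geq 1$.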
21B.10 Remarkable properties of Chebychev polynomials.
(1) Theorem. Let $P_n(x)$ be a polynomial of order $n$ ($\geq 1$) whose coefficient of $x^n$ is unity. Then,
$$\max_{x\in[-1,1]}|P_n(x)| \geq 2^{1-n}, \tag{21.77}$$
and the equality holds if and only if $P_n(x) = T_n(x)/2^{n-1}$. □
(2) The best (w.r.t. the sup norm) $n$-th order polynomial approximant of $x^{n+1}$ on $[-1,1]$ is given by $x^{n+1} - T_{n+1}(x)/2^n$. This property makes the Chebychev polynomial very important in approximation theory of functions.
(3) $x_{k+1} = T_n(x_k)$ defines a sequence $x_0, x_1, x_2, \ldots$ from the initial condition $x_0$. This is a typical chaotic sequence. Among any continuous functions with $n$ laps, $T_n(x)$ gives the most chaotic sequences on the average.

315 This is nothing but $\cos(n+1)x + \cos(n-1)x = 2\cos x\cos nx$.
Discussion.
(A) (1) above implies that if $Q_n$ is an $n$-th order polynomial defined on $[-1,1]$ with its highest order coefficient normalized to be unity, and if its maximum deviation from zero is the smallest among such polynomials, then $Q_n$ is proportional to the order $n$ Chebychev polynomial.
(B) Take $T_2(x)$. Demonstrate that there are two intervals $I$ and $J$ in $[-1,1]$ which share at most one point such that $T_2(I) \cap T_2(J) \supset I \cup J$. In general, if the reader can find two positive integers $n$ and $m$ and two intervals $I$ and $J$ sharing at most one point such that $f^n(I) \cap f^m(J) \supset I \cup J$, then $f$ exhibits chaos on the interval containing both $I$ and $J$. That is, there is an invariant set $\Omega$ of $f^N$ for some positive integer $N$ such that $f^N$ restricted to $\Omega$ is isomorphic to the coin-tossing process.
22 Numerical Integration
Most integrals cannot be computed analytically. Some of the most important numerical integration algorithms are inseparably connected to the theory of orthogonal polynomials. Also discussed are the effectiveness of the simple trapezoidal rule and high-dimensional integrals.
Key words: Gauss schemes, IMT formula, DE formula,quasi-Monte Carlo method, Monte Carlo method
Summary:
(1) Roughly speaking, Gauss formulas are versatile and useful. Probably, up to 4- or 5-tuple integrals, direct use of the scheme may be practical (→22A.3, 22A.5, 22A.6).
(2) However, if a very accurate integration is needed, variable transformation schemes should be used, esp., the DE formula (→22B.2).
(3) If the integration is over a moderately high (~10) dimensional region, then the quasi-Monte Carlo method (→22C.5) should be considered first with the conditioning of the function according to 22C.2. If the dimension is higher, then currently no better versatile method than the Monte Carlo method is known (→22C.6).
22.A Gauss Formulas
22A.1 Numerical integration. Simple numerical integration methods such as the trapezoidal rule or Simpson's rule have the following general structure (the general Newton-Cotes formula):
(22.1)
We have $N$ degrees of freedom in choosing $c_\nu$. Hence, it is possible to choose them so that the formula is exact for $f(x) = 1, x, \ldots, x^{N-1}$ (cf. Weierstrass' theorem →17.3). Gauss pointed out that there is no necessity to choose equidistant points $\nu/N$ to sample the function values. See the following example.
22A.2 Simple demonstration. We choose $N = 2$:
$$\int_{-1}^1f(x)\,dx \simeq c_1f(x_1) + c_2f(x_2). \tag{22.2}$$
We choose $c_i$ and $x_i$ so that the formula is exact for $f = 1, x, x^2$ and $x^3$. We have four equations:

f = 1:      $2 = c_1 + c_2$,
f = x:      $0 = c_1x_1 + c_2x_2$,
f = x^2:    $2/3 = c_1x_1^2 + c_2x_2^2$,
f = x^3:    $0 = c_1x_1^3 + c_2x_2^3$.

Solving these equations, we get
$$c_1 = c_2 = 1, \quad x_1 = -x_2 = 1/\sqrt{3}.$$
Therefore, the $N = 2$ Gauss formula (G2) is
$$\int_{-1}^1f(x)\,dx \simeq f\left(\frac{1}{\sqrt{3}}\right) + f\left(-\frac{1}{\sqrt{3}}\right). \tag{22.3}$$
If we need the integration
$$I = \int_a^b\phi(u)\,du, \tag{22.4}$$
introduce the variable $x$ running from $-1$ to $1$ such that
$$u = \frac{1}{2}[(b-a)x + a + b] \tag{22.5}$$
and
$$I = \frac{1}{2}(b-a)\int_{-1}^1\phi([(b-a)x + a + b]/2)\,dx. \tag{22.6}$$
Examples for (22.3) are given as316
exact   1            2/3         0.4         0.77751164...   0.306853...
G2      0.99848...   0.6738...   0.3987...   0.77750464...   0.2261...

Here the last column is for $f^*$ on $[0,1]$: $f^*(x) = 1/(x+2)$ for $x \in [0, e-2]$, $f^*(x) = 0$ for $x \in [e-2, 1]$.
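The G2 formula with the change of variables to a general interval takes only a few lines. The sketch below (function names are illustrative) applies it to the discontinuous $f^*$, for which the exact value is $1 - \ln 2 = 0.306853\ldots$ while G2 gives about $0.2261$, and to a cubic, for which G2 is exact.

```python
import math


def gauss2(f, a=-1.0, b=1.0):
    """Two point Gauss formula on [a, b] via u = [(b - a)x + a + b]/2."""
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    node = 1.0 / math.sqrt(3.0)
    return half * (f(mid - half * node) + f(mid + half * node))


def f_star(x):
    """1/(x + 2) on [0, e - 2], zero on (e - 2, 1]."""
    return 1.0 / (x + 2.0) if x <= math.e - 2.0 else 0.0
```

The poor result for $f^*$ comes from the jump at $x = e - 2$: one of the two sampling points falls where the function vanishes. Splitting the interval at the jump would restore the accuracy.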
316 From P. J. Davis and P. Rabinowitz, Methods of Numerical Integration (Academic, 1975); not updated but still useful.
(22.10)
As we see, for smooth functions the method is amazingly powerful. If we choose the 4 point formula for $I = \int_0^{\pi/2}\sin x\,dx$, we get $I = 1.000000$, correct to six decimal places. (The Simpson rule (→22A.8) with 64 points produces 0.99999983.)
Exercise.
(1) Compute the following integral analytically:
$$\int_{-1}^1dx\,(x^2-1)e^{-x^2/2}. \tag{22.7}$$
Prescribe a method to compute this numerically with the aid of (only) G2 with the relative error of $10^{-5}$.
(2) Construct the $N = 2$ Gauss formula for the integral of range $[-1,1]$ with the weight $e^{-|x|}$. Apply it to $\cos x$ and compare the result with the ordinary Gauss-Legendre formula with $N = 2$ applied to $e^{-|x|}\cos x$ on $[-1,1]$.
(3) Compute
$$\int_0^{\pi/2}\cos x\,\mathrm{sgn}(\pi/4 - x)\,dx \tag{22.8}$$
to the relative accuracy of $10^{-4}$ using only G2. In this case if G2 is naively used for the whole interval, the error is about 20%.
22A.3 Fundamental theorem of Gauss quadrature. Let $w(x)$ be a weight function for the interval $[a,b]$. Then, there exist real numbers $x_1, \ldots, x_N$ and $C_1, \ldots, C_N$ with the following properties:
(i) $a < x_1 < x_2 < \cdots < x_N < b$,
(ii) $C_k > 0$ for $k = 1, 2, \ldots, N$,
(iii)
$$\int_a^bf(x)w(x)\,dx = \sum_{k=1}^NC_kf(x_k) \tag{22.9}$$
is exact for every polynomial $f(x)$ of degree not more than $2N-1$. □
Actually, $x_1, \ldots, x_N$ are the zeros of $p_N$, the $N$-th member of the orthogonal polynomial family on $[a,b]$ with the weight $w(x)$ (→21A.2), and
$$C_k = \int_a^b\frac{p_N(x)w(x)}{p_N'(x_k)(x-x_k)}\,dx \quad (k = 1, \ldots, N).$$
For example, for $\int_{-1}^1f(x)\,dx$, $p_N(x) = \sqrt{(2N+1)/2}\,P_N(x)$ (→21A.5) so that the scheme is called the Gauss-Legendre formula.
[Demo] We demonstrate the theorem for $L_2([-1,1])$, the most important case. Let $f$ be an $m$-th order polynomial, and the desired integration formula is given by
$$\int_{-1}^1f(x)\,dx = \sum_{k=1}^NC_kf(x_k) \tag{22.10}$$
as in (iii). Here the fact (→21A.11) that the zeros of orthogonal polynomials are all in its domain has been fully utilized. Notice that $f$ can be uniquely decomposed as
$$f(x) = P_n(x)Q(x) + R(x), \tag{22.11}$$
where $P_n$ is the $n$-th order Legendre polynomial, and $R$ is a polynomial of order less than $n$. Since the order of $Q$ is $m-n$, if $m-n \leq n-1$ (i.e., $m \leq 2n-1$), then $P_n$ is orthogonal to $Q$ (→21A.3(1)). Hence, for $m \leq 2n-1$, we conclude
$$\int_{-1}^1f(x)\,dx = \int_{-1}^1R(x)\,dx. \tag{22.12}$$
According to our formula (22.10), we have
$$\int_{-1}^1f(x)\,dx = \sum_{k=1}^NC_kP_n(x_k)Q(x_k) + \sum_{k=1}^NC_kR(x_k). \tag{22.13}$$
Therefore, we immediately see that if we can choose $x_k$ to be the zeros of $P_n$, then the first term on RHS vanishes. That is, (22.12) is true for our formula under construction. For this to be true, we need to set $n = N$ (→21A.11) and $m = 2N-1$. We have fixed the sampling point locations. If we can choose $C_k$ so that (22.12) holds exactly for all the polynomials of order $N-1$, then we can integrate all the polynomials up to the order $2N-1$ exactly by our integration formula. Therefore, the remaining task is to determine $C_k$ so that
$$\int_{-1}^1R(x)\,dx = \sum_{k=1}^NC_kR(x_k) \tag{22.14}$$
is exact for any choice of $N-1$ order polynomial $R$. Notice that generally we can write
$$R(x) = \sum_{k=1}^NR(x_k)l_k(x), \tag{22.15}$$
where317
$$l_k(x) = \prod_{i\neq k}\frac{x-x_i}{x_k-x_i}. \tag{22.16}$$
Hence, the following choice solves our problem:
$$C_k = \int_{-1}^1l_k(x)\,dx. \tag{22.17}$$
Since $l_k(x)(x-x_k)$ is proportional to $P_N$ (all the zeros are common!),
$$l_k(x) = \frac{P_N(x)}{P_N'(x_k)(x-x_k)}.$$
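The construction can be turned into a small program. The following is a sketch, not the method of any particular library: the zeros of $P_N$ are located by Newton's method from the usual cosine starting guesses, and the weights are computed from the standard closed form $C_k = 2/[(1-x_k^2)P_N'(x_k)^2]$, which is assumed here rather than derived. All the function names are illustrative.

```python
import math


def legendre_and_derivative(n, x):
    """Return P_n(x) and P_n'(x) for n >= 1 and |x| < 1."""
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    dp = n * (x * p - p_prev) / (x * x - 1.0)   # (x^2 - 1)P_n' = n(xP_n - P_{n-1})
    return p, dp


def gauss_legendre(n):
    """Sampling points and weights of the n point Gauss-Legendre formula."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))   # starting guess
        for _ in range(100):                              # Newton iteration
            p, dp = legendre_and_derivative(n, x)
            dx = p / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        _, dp = legendre_and_derivative(n, x)
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights


def integrate(f, a, b, n):
    """n point Gauss-Legendre approximation of the integral of f on [a, b]."""
    nodes, weights = gauss_legendre(n)
    half, mid = 0.5 * (b - a), 0.5 * (a + b)
    return half * sum(w * f(mid + half * x) for x, w in zip(nodes, weights))
```

For example, `integrate(math.sin, 0.0, math.pi / 2, 4)` should agree with 1 to six decimal places, and the 4 point formula should integrate any polynomial of order up to 7 exactly up to rounding.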
317This is the standard Lagrange interpolation formula.
Exercise.
Demonstrate the formula for the weight of the Gauss-Legendre formula:
(22.19)

22A.4 Error estimate of Gauss formulas.
(1) If $f$ is at least $2N$ times continuously differentiable (i.e., in $C^{2N}$), then the integration (on $[-1,1]$) error is bounded by
$$|\mathrm{error}| \leq \frac{2^{2N+1}(N!)^4}{(2N+1)((2N)!)^3}\max_{x\in[-1,1]}|f^{(2N)}(x)|. \tag{22.20}$$
(2) If $f$ is holomorphic (→5.4) in $\Omega = \{z \mid |z+1| + |z-1| = \rho + \rho^{-1}\}$ for $\rho > 1$, then
$$|\mathrm{error}| \leq \frac{\pi(\rho+\rho^{-1})}{\rho^{2N+1}}\max_{z\in\Omega}|f(z)|. \tag{22.21}$$

Exercise.
Calculate the following three integrals:
(22.22)
with the aid of the Gauss-Legendre formulas for $N = 2, 4$, and 8 and discuss the results. (The necessary table is on p916 of Abramowitz and Stegun.)
22A.5 How to get the weights. Abscissas and weight factors are tabulated in, e.g., Abramowitz-Stegun, Handbook of Mathematical Functions (Dover, 1972), but it is recommended to compute them to avoid any transcription mistakes.
22A.6 Many dimensions. We can of course extend the formula to many dimensional cases. [See Davis & Rabinowitz Chapter 5.] For example, a singular integral like
$$\int_{-1}^1\int_{-1}^1dx\,dy\,\frac{1}{1-xy}$$
can be accurately calculated without any special considerations on the singularities.
22A.7 Integral equation solver. The Gauss method may be the best general numerical method to solve integral equations.
22A.8 Trapezoidal vs. Simpson rule.318 Let
$$I_e = 2h\left\{\sum_{r=1}^{n-1}f(a+2rh) + \frac{1}{2}[f(a) + f(a+2nh)]\right\}, \tag{22.23}$$
$$I_o = 2h\left\{\sum_{r=0}^{n-1}f(a+(2r+1)h)\right\}. \tag{22.24}$$
To compute the following integral
$$I = \int_a^{a+2nh}f(x)\,dx, \tag{22.25}$$
the trapezoidal rule uses
$$I_T = \frac{1}{2}(I_e + I_o), \tag{22.26}$$
and the Simpson rule uses
$$I_S = \frac{1}{3}(I_e + 2I_o). \tag{22.27}$$
Usually, it is believed that the Simpson rule is superior to the trapezoidal rule. However, this is not always the case. If
$$I = \int_a^bf(x)\,dx = \int_{a+h}^{b+h}f(x)\,dx, \tag{22.28}$$
where $h$ is the increment of integration, then the trapezoidal rule is superior to the Simpson rule. If $f$ vanishes or becomes very small (like $\exp(-x^2)$) outside the domain sufficiently inside $[a,b]$, or if $f$ is a periodic function and $[a,b]$ is a period, then (22.28) holds. [See 22A.9 for the computation of Fourier coefficients.] The purpose of the modification in the Simpson rule is to eliminate the end effect of the integration range. This is why the trapezoidal rule can be better if there is no end effect. Therefore, the Simpson rule is better than the trapezoidal rule when (22.28) does not hold.
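The claim is easy to observe numerically. In the sketch below (function names are illustrative) both rules are built from the sums over the even and the odd sampling points, combined as $(I_e + I_o)/2$ for the trapezoidal rule and $(I_e + 2I_o)/3$ for the Simpson rule, and applied to the periodic integrand $e^{\cos x}$ over one period. The reference value $2\pi I_0(1)$, with $I_0$ the modified Bessel function, is an assumption of this example and is evaluated from its power series.

```python
import math


def even_odd_sums(f, a, h, n):
    """Sums over even points (end points halved) and odd points, times 2h."""
    i_e = 2 * h * (sum(f(a + 2 * r * h) for r in range(1, n))
                   + 0.5 * (f(a) + f(a + 2 * n * h)))
    i_o = 2 * h * sum(f(a + (2 * r + 1) * h) for r in range(n))
    return i_e, i_o


def trapezoid(f, a, h, n):
    i_e, i_o = even_odd_sums(f, a, h, n)
    return 0.5 * (i_e + i_o)


def simpson(f, a, h, n):
    i_e, i_o = even_odd_sums(f, a, h, n)
    return (i_e + 2.0 * i_o) / 3.0


def periodic_example(n=4):
    """Errors of both rules for exp(cos x) on [0, 2 pi] with 2n intervals."""
    f = lambda x: math.exp(math.cos(x))
    exact = 2 * math.pi * sum(0.25 ** k / math.factorial(k) ** 2 for k in range(20))
    h = 2 * math.pi / (2 * n)
    return abs(trapezoid(f, 0.0, h, n) - exact), abs(simpson(f, 0.0, h, n) - exact)
```

With only 8 intervals the trapezoidal error should already be of order $10^{-6}$, while the Simpson error stays of order $10^{-2}$.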
22A.9 Discrete Fourier transform I. Let
$$a_n = \frac{1}{N}\sum_{k=0}^{2N-1}X_k\cos\left(\frac{nk\pi}{N}\right), \tag{22.29}$$
$$b_n = \frac{1}{N}\sum_{k=0}^{2N-1}X_k\sin\left(\frac{nk\pi}{N}\right). \tag{22.30}$$
Then,
$$X_k = \frac{1}{2}(a_0 + a_N\cos k\pi) + \sum_{n=1}^{N-1}\left\{a_n\cos\left(\frac{nk\pi}{N}\right) + b_n\sin\left(\frac{nk\pi}{N}\right)\right\}. \tag{22.31}$$
If $X_k = f(k\pi/N)$, then (22.29) and (22.30) are obtained from the standard formulas for Fourier coefficients through 'approximating' the integrals with the aid of the trapezoidal rule. However, notice that the formulas are exact. This is an example of the merit of the trapezoidal rule for periodic functions.

318 This section is based on an essay by H. Takahashi, 'Superposition in numerical integration,' Sugaku Seminar, March 1971.
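The exactness can be checked directly for arbitrary data. A minimal sketch, assuming $2N$ samples $X_0, \ldots, X_{2N-1}$ and the normalization $1/N$ for $a_n$ and $b_n$ (function names are illustrative):

```python
import math


def fourier_coefficients(samples):
    """a_n and b_n (n = 0, ..., N) from 2N samples."""
    n_half = len(samples) // 2
    a = [sum(x * math.cos(n * k * math.pi / n_half) for k, x in enumerate(samples)) / n_half
         for n in range(n_half + 1)]
    b = [sum(x * math.sin(n * k * math.pi / n_half) for k, x in enumerate(samples)) / n_half
         for n in range(n_half + 1)]
    return a, b


def reconstruct(a, b, k):
    """Recover X_k from the coefficients."""
    n_half = len(a) - 1
    value = 0.5 * (a[0] + a[n_half] * math.cos(k * math.pi))
    value += sum(a[n] * math.cos(n * k * math.pi / n_half)
                 + b[n] * math.sin(n * k * math.pi / n_half)
                 for n in range(1, n_half))
    return value
```

Any list of even length, not necessarily samples of a smooth function, is reproduced up to rounding errors.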
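22A.10 Discrete Fourier transform II. Let $X = \{X_n\}_{n=0}^{N-1}$ be a sequence of complex numbers, and
$$e(x) \equiv \exp(-2\pi ix). \tag{22.32}$$
The following sequence $\hat{X} = \{\hat{X}_k\}$ is called the discrete Fourier transform of $X$:
$$\hat{X}_k = \sum_{n=0}^{N-1}e\left(\frac{kn}{N}\right)X_n. \tag{22.33}$$
Its inverse transformation is given by
$$X_n = \frac{1}{N}\sum_{k=0}^{N-1}e\left(-\frac{kn}{N}\right)\hat{X}_k. \tag{22.34}$$
Cf. 32B.12.

22.B Variable Transformation Schemes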
22B.1 Functions of double exponential decay. If $f$ is an analytic function, then the trapezoidal rule gives an excellent result for the integral over $\mathbf{R}$. This seems to be a well known fact. If the integrand decays double exponentially, i.e., for some positive constants $A$, $B$ and $C$
$$|f(x)| \sim A\exp(-B\exp(C|x|)), \tag{22.35}$$
the error of the trapezoidal rule truncated at $N$
$$T_h = h\sum_{k=-N}^{N}f(kh) \tag{22.36}$$
for the integral $I$ of $f$ from $-\infty$ to $+\infty$ is given by
$$|T_h - I| \leq \mathrm{const}\cdot\|f\|\exp(-\delta N/\ln N) \tag{22.37}$$
for some positive $\delta$. This means that if $N$ is doubled, then the number of the significant digits roughly doubles.

22B.2 Double exponential (DE) formula. The DE formula was proposed by Takahashi319 and Mori in 1974, and is regarded as the most effective integration formula currently. The essence is to change the independent variable so that the function decays double-exponentially. For example, for the integral of an analytic function $f$ on $[-1,1]$, we use
$$x = \phi(t) \equiv \tanh\left(\frac{\pi}{2}\sinh t\right), \tag{22.38}$$
and the DE formula reads320
$$\int_{-1}^1f(x)\,dx \simeq h\sum_{k=-N}^{N}f(\phi(hk))\phi'(hk). \tag{22.39}$$
However, the DE formula is not effective for the integrals of Fourier transformation type.
Discussion.
The DE formula is powerful even for an integrand with end singularities:
(22.40)
If we use the Gauss-Legendre formula for this, the error is never less than $10^{-2}$ for $n \leq 30$. The DE formula with 5 terms already has less than 1% error. With 10 points, the error is about $10^{-6}$. With $n = 30$ the error is about $10^{-15}$. The improvement is roughly exponential, $10^{-n/2}$. [This is in conformity with the theoretical error estimate.]
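A DE quadrature on $[-1,1]$ with the transformation $x = \tanh((\pi/2)\sinh t)$ fits in a dozen lines. This is a minimal sketch, not the original authors' program: the truncation point `t_max` and the number of points `n` are assumed tuning parameters, and `t_max` is kept moderate so that the hyperbolic functions do not overflow.

```python
import math


def de_quadrature(f, n=20, t_max=3.5):
    """Trapezoidal rule in t after x = tanh((pi/2) sinh t), with h = t_max / n."""
    h = t_max / n
    total = 0.0
    for k in range(-n, n + 1):
        t = k * h
        u = 0.5 * math.pi * math.sinh(t)
        x = math.tanh(u)
        dx_dt = 0.5 * math.pi * math.cosh(t) / math.cosh(u) ** 2   # phi'(t)
        total += f(x) * dx_dt
    return h * total
```

For instance, `de_quadrature(math.exp)` should reproduce $e - 1/e$ and `de_quadrature(lambda x: math.sqrt(1 - x * x))` should reproduce $\pi/2$, the latter despite the square root behavior at both ends. An integrand that is infinite at $x = \pm 1$ would need extra care, because $\tanh$ rounds to exactly $\pm 1$ for large $|t|$ in floating point arithmetic.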
22B.3 Numerical estimate of Fourier transform. For
$$\int_0^{\infty}f(x)\sin\left(\frac{\pi(x-a)}{\lambda}\right)dx \tag{22.41}$$
the following transformation is effective:321
$$\psi(t) = \frac{t}{1-\exp(-2\pi\sinh t)}. \tag{22.42}$$
The formula reads
$$\int_0^{\infty}f(x)\sin\left(\frac{\pi(x-a)}{\lambda}\right)dx \simeq \lambda\sum_{k=-N}^{N}g\left(\frac{\lambda}{h}\psi\left(h\left(k+\frac{a}{\lambda}\right)\right)\right)\psi'\left(h\left(k+\frac{a}{\lambda}\right)\right), \tag{22.43}$$
where $g(x) = f(x)\sin[\pi(x-a)/\lambda]$.

319 This is the same person of the 'Takahashi gas', proving that there is no phase transition in 1-space with short range interactions. When he was young he was the most creative statistical physicist Japan has ever produced, but later he became the leader of computer research in Japan, saying physics was his hobby.
320 H. Takahashi and M. Mori, Publ. RIMS 9, 721 (1974).
22.C Multidimensional Integrals
22C.1 Overview. An immediate idea is to use the one dimensional formulas repeatedly (direct product scheme). Other interesting methods are the Monte Carlo or quasi-Monte Carlo methods. These latter methods are characterized by the error estimate which is independent of the spatial dimensionality but dependent only on the number of sampling points. Here we discuss only two methods for very large dimensions. The quasi-Monte Carlo method is becoming increasingly important, because the error improves as $1/N$ instead of $1/\sqrt{N}$. However, there seems to be no versatile general scheme applicable to all the cases. This is a very active field of research esp., in relation to finance.
22C.2 Polynomial variable transformation: recommended preconditioning. Let $p$ be an integer not less than 2. If a function $f(\{x_i\})$ has continuous partial derivatives
$$\frac{\partial^{j_1+\cdots+j_s}f}{\partial x_1^{j_1}\cdots\partial x_s^{j_s}} \tag{22.44}$$
for all $j_1, \ldots, j_s \in \{0, 1, \ldots, p\}$, then we can use the following transformation
$$x_i = \phi(y_i) = \frac{(2p+1)!}{(p!)^2}\int_0^{y_i}u^p(1-u)^p\,du \tag{22.45}$$
to convert the integrand $f$ to
$$f(\phi(y_1), \ldots, \phi(y_s))\,\phi'(y_1)\cdots\phi'(y_s), \tag{22.46}$$
whose multidimensional Fourier coefficients have the following estimate:
(22.47)
With this smoothness condition (→17.12), many integration formulas become more effective than without the transformation. Thus usually, it is recommended to transform the integrand with the aid of this transformation prior to application of integration schemes.

321 T. Ooura and M. Mori, J. Comp. Appl. Math. 38, 353-360 (1991).
22C.3 Weyl's equidistribution theorem. If $\alpha$ is irrational, then for any $0 \leq a \leq b \leq 1$ we have
$$\frac{1}{N}\#\{n \mid \{n\alpha\} \in [a,b],\ n \in \{1, 2, \ldots, N\}\} \to |b-a|. \tag{22.48}$$
Here $\{x\} = x - [x]$ is the fractional part of $x$, and $\#A$ is the number of members (the cardinality) of the set $A$. We will not give any proof for this,322 but this should be intuitively clear, if the reader imagines a particle geodesically moving (i.e., going straight) on the 2-torus, and $[0,1]$ is the coordinate of its section (the so-called Poincaré section in the theory of dynamical systems). A multidimensional version should not be hard to formulate and understand in a similar fashion. Thus we get
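The statement can be observed with a few lines of code. A minimal sketch (the function name and the choice $\alpha = \sqrt{2}$ are illustrative):

```python
import math


def fraction_in_interval(alpha, a, b, n):
    """Fraction of k in {1, ..., n} whose fractional part {k alpha} lies in [a, b]."""
    hits = sum(1 for k in range(1, n + 1) if a <= (k * alpha) % 1.0 <= b)
    return hits / n
```

For example, `fraction_in_interval(math.sqrt(2), 0.2, 0.5, 10000)` should be close to $0.3$. For a rational $\alpha$ only finitely many points are visited, so the fraction need not approach $|b - a|$.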
22C.4 Theorem [Weyl]. Let $1, \alpha_1, \ldots, \alpha_s$ be rationally independent.323 Then,
$$\lim_{N\to\infty}\frac{1}{N}\sum_{k=1}^Nf(\{k\alpha_1\}, \ldots, \{k\alpha_s\}) = \int_{[0,1]^s}f(\{x\})\,d\{x\}. \tag{22.49}$$

22C.5 Improved Haselgrove method.324
$$\int_{[0,1]^s}f(\{x\})\,d\{x\} \simeq \frac{1}{N}\sum_{k=1}^Nw_q(k/N)f(\{k\alpha_1\}, \ldots, \{k\alpha_s\}), \tag{22.50}$$
where
$$w_q(x) = \frac{(2q+1)!}{(q!)^2}x^q(1-x)^q. \tag{22.51}$$
The representative irrational numbers $\alpha_1, \ldots, \alpha_s$ are chosen (semi-empirically) as
(1) If $2s+3$ is a prime, then $\alpha_j = 2\cos(2\pi j/(2s+3))$,
(2) Otherwise, $\alpha_j = 2^{j/(s+1)}$.
$w_q$ is introduced to reduce the error further. A detailed error estimate is available, but the main feature of the error is that it is bounded by a number proportional to $N^{-q}$.

322 See Section 3 of Körner.
323 That is, there are no integers $p_0, p_1, \ldots, p_s$ (not all of them are simultaneously equal to 0) such that $p_0 + \sum_kp_k\alpha_k = 0$.
324 M. Sugihara and K. Murota, Math. Computation 39, 549-554 (1982).
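A direct transcription of the weighted sum is sketched below. It is only an illustration under stated assumptions: the irrational numbers are always taken as $\alpha_j = 2^{j/(s+1)}$ (choice (2)), irrespective of whether choice (1) would apply, the weight is $w_q(x) = [(2q+1)!/(q!)^2]\,x^q(1-x)^q$, and the function names are hypothetical.

```python
from math import factorial


def weight(q, x):
    """w_q(x) = (2q+1)!/(q!)^2 * x^q (1-x)^q; its integral over [0, 1] is 1."""
    return factorial(2 * q + 1) / factorial(q) ** 2 * (x * (1.0 - x)) ** q


def haselgrove(f, s, n, q=2):
    """Weighted average of f over the points ({k alpha_1}, ..., {k alpha_s})."""
    alphas = [2.0 ** (j / (s + 1)) for j in range(1, s + 1)]
    total = 0.0
    for k in range(1, n + 1):
        point = [(k * alpha) % 1.0 for alpha in alphas]
        total += weight(q, k / n) * f(point)
    return total / n
```

For example, `haselgrove(lambda x: x[0] * x[1], 2, 5000)` should be close to $1/4$. No claim is made here about the actual rate of convergence of this simplified version.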
22C.6 Monte Carlo method. To compute
$$I = \int_{[0,1]^s}f(x)\,dx \tag{22.52}$$
the Monte Carlo method randomly and uniformly samples points in the cube $[0,1]^s$ as $y_1, y_2, \ldots$ and claims
$$I \simeq S_N \equiv \frac{1}{N}\sum_{k=1}^Nf(y_k). \tag{22.53}$$
The principle should be understandable from the random analogue of 22C.3.
Its error can be estimated with the aid of Chebychev's inequality325 as
$$\mathrm{Probability}\left(|I - S_N| \geq 2/\sqrt{\epsilon N}\right) \leq \epsilon \tag{22.54}$$
for $f$ such that $|f| \leq 1$.
For example, if $N = 10^6$, then with probability 99% we can get the answer with 2% relative error independent of the dimension of the space! However, the accuracy improves only as $N^{-1/2}$.
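The estimate can be tried on a simple integrand. A minimal sketch, assuming the error bound $2/\sqrt{\epsilon N}$ for $|f| \leq 1$ and using a seeded generator from the standard library so that the run is reproducible (function names are illustrative):

```python
import math
import random


def monte_carlo(f, s, n, rng):
    """Average of f over n uniformly random points of the cube [0, 1]^s."""
    return sum(f([rng.random() for _ in range(s)]) for _ in range(n)) / n


def chebyshev_error_bound(n, eps):
    """Error not exceeded with probability at least 1 - eps, for |f| <= 1."""
    return 2.0 / math.sqrt(eps * n)
```

`chebyshev_error_bound(10**6, 0.01)` gives $0.02$, the 2% quoted in the text. The bound is conservative: for $f = x_1x_2$ on $[0,1]^2$ with $N = 20000$ the observed error is usually far below the bound of about $0.14$.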
Exercise.
(1) We wish to compute
$$\int_{-1}^1\cdots\int_{-1}^1e^{-(x_1+x_2+\cdots+x_N)^2/N}\,dx_1\cdots dx_N \tag{22.55}$$
with the aid of the Monte Carlo method. How many samples do we need conservatively to obtain the integral with 5% relative error with probability 99.9%?
(2) We wish to compute the following integral by the Monte Carlo method:
$$I = \int_Ddx_1\cdots dx_{100}\,r(1-r), \tag{22.56}$$
where $r = \sqrt{\sum_{i=1}^{100}x_i^2}/5$, and the domain $D$ is the 100 dimensional hypercube $[0,1] \times \cdots \times [0,1]$. How many sample points are (conservatively) needed, if we wish to get $I$ with the error less than 2% with the probability more than 99.5%?
(3) Generalization of the Chebychev inequality. Let $\varphi$ be a positive function,326 and $\varphi_A \equiv \inf_{x\in A}\varphi(x)$. Then,
$$\varphi_A\,\mathrm{Probability}(X \in A) \leq \langle\varphi\rangle. \tag{22.57}$$
The inequality we have used is a special case with $\varphi = x^2$.

325 $a^2\,\mathrm{Probability}(|x| \geq a) \leq \langle x^2\rangle$, which can be derived easily from the obvious inequality $x^2 \geq a^2\Theta(|x| \geq a)$.
326 Measurable w.r.t. the probability measure under consideration (→19a).
23 Separation of Variables - General Consideration -
Separation of variables is probably the only systematic way to solve linear PDEs. Its essence is the construction of the problem-adapted orthogonal function system. We have already studied the method in 18 when the ordinary Fourier expansion is applicable. The principles have been exhausted there. Here the general features of the method are outlined with a summary of prerequisites and limitations. Practically, if the reader wishes to solve a PDE boundary value problem, consult a collection of worked-out problems. We should not forget that if we need an exact method, it is a sure sign of our ignorance about the problem.
Key words: special function, eigenvalue problem.
Summary:
(1) Practically, the method works only when the domain has a special shape. Possible shapes are best seen in 'style books,' that is, books collecting worked-out problems. If the reader cannot find any good example in them, it may be wise to give up exact solutions (→23.2).
(2) The essence of separation is the problem-adapted Fourier-type expansion; consequently, in order to justify the method we need almost all the machinery of functional analysis (→23.3).
23.1 Separation of variables: general idea. All our time-dependent linear problems (→1) have the following form:
$$L_t\psi(x,t) = Q\psi(x,t), \tag{23.1}$$
where the operator $L_t$ acts only on the functions of time, and $Q$ on the functions of space coordinates. With $\psi(x,t) = \psi_1(t)\psi_2(x)$ the time and space coordinates can be separated trivially as
$$L_t\psi_1(t) = \mu\psi_1(t), \tag{23.2}$$
$$Q\psi_2(x) = \mu\psi_2(x). \tag{23.3}$$
Since the first equation is an ODE, it is easy to obtain its general solution. If $Q$ has a 'good' property, we can generalize the eigenvalue expansion method for a finite dimensional vector space. Formally (23.1) can be transformed into
(23.4)
where $\varphi_\mu(x)$ is the eigenfunction of the operator $Q$ ($Q\varphi_\mu(x) = \mu\varphi_\mu(x)$) and (consistently with the notation in 20.21)
(23.5)
This is an analogue of the integral to compute the Fourier coefficients (→20.14, 20.24). The final solution is formally given by
$$\psi(x,t) = \sum_\mu\langle\varphi_\mu(x)|\psi(x,t)\rangle\varphi_\mu(x), \tag{23.6}$$
where the summation is over all the eigenvalues. Hence, the key problem of the separation of variables is to find a problem-adapted generalized Fourier expansion.
23.2 Practical procedure via separation of variables. As we have seen in 18, boundary conditions make the separation procedure more complicated than stated above (→26B, 27B). We will see an illustration in 23.9. A practical procedure to solve a PDE with inhomogeneous boundary conditions by separation of variables can be summarized as follows:
(A) If the domain shape is not regular (roughly speaking, if the boundary does not consist of parts of planes and conic surfaces), forget about exact analytic methods (→23.4).
(B) If the domain is 'well-shaped,' then consult a typical problem source book of the boundary-value problem. For example, the lecturer finds N. N. Lebedev, I. P. Skalskaya and Y. S. Ufliand, Worked Problems in Applied Mathematics (Dover 1965) very useful.327 If the reader cannot find any similar problem, unless she wishes to be an expert of special functions, it is wise for her to give up analytical methods to obtain exact solutions.
(C) If the reader insists on analytical solutions:
(1) Decompose the problem into the problems with inhomogeneous boundary conditions only in one coordinate direction with the aid of the superposition principle exactly as we did in 18.2. The remaining coordinate directions become (generalized) eigenvalue problems.
(2) The (generalized) eigenfunctions of the separated homogeneous problems dictate the form of the solution. (This is the step of constructing the problem-adapted generalized Fourier expansion scheme.)
327 This is an accompanying workbook of N. N. Lebedev, Special Functions & Their Applications (Dover, 1965), which is an excellent book.
(3) Fix the expansion coefficients with the aid of the inhomogeneous boundary conditions and the orthogonality of the eigenfunctions as in 18. See 23.9 for an illustration.
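Steps (C)(1)-(3) can be sketched on the simplest possible case. The following is a minimal illustration, assuming the Laplace equation on the unit square with $\psi = f(x)$ on the side $y = 1$ and $\psi = 0$ on the other three sides, so that only one coordinate direction is inhomogeneous. The domain, the function names and the midpoint-rule quadrature are assumptions made for this example only.

```python
import math

def boundary_coefficients(f, n_modes, n_grid=2000):
    """b_n = 2 * integral_0^1 f(x) sin(n pi x) dx by the midpoint rule (step (3))."""
    h = 1.0 / n_grid
    return [2.0 * h * sum(f((j + 0.5) * h) * math.sin(n * math.pi * (j + 0.5) * h)
                          for j in range(n_grid))
            for n in range(1, n_modes + 1)]

def laplace_square(f, x, y, n_modes=30):
    """Solution of the Laplace equation on the unit square with psi = f(x) on y = 1
    and psi = 0 on the other three sides."""
    total = 0.0
    for n, b_n in enumerate(boundary_coefficients(f, n_modes), start=1):
        k = n * math.pi
        # x direction: homogeneous eigenvalue problem, eigenfunction sin(k x) (step (2))
        # y direction: Z'' = k^2 Z with Z(0) = 0, normalized so that Z(1) = 1
        total += b_n * math.sin(k * x) * math.sinh(k * y) / math.sinh(k)
    return total

if __name__ == "__main__":
    f = lambda x: x * (1.0 - x)
    print(laplace_square(f, 0.5, 0.5))
```

A general Dirichlet problem on the square would be handled by superposing four such solutions, one for each side carrying inhomogeneous data.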
23.3 What do we need to justify and implement our procedure? Here, we summarize the requirements.
(1) When can we justify the expansion (23.6)? To answer this question, we need a rudimentary knowledge of Hilbert space (→20) and the operators on it (→34). After a suitable preparation we can generalize Fourier expansion and integral transformations (→34B.6).
(2) We must be able to find explicitly the eigenfunctions of $Q$ defined on a linear space satisfying the auxiliary conditions. We use the method of separation of variables to reduce the problems to lower dimensional (hopefully 1-space) problems. Therefore, we need methods to solve linear ODEs (→24) and associated eigenvalue problems (the Sturm-Liouville problems →35).
Discussion: Fourier expansion of multivariable functions: addendum to separation of variables.
We have claimed that the key element of the justification of the separation of variables is the (generalized) Fourier expansion of the function in terms of the 'equation adapted' orthonormal basis.328 Generally, we have several variables and we need multiple Fourier expansion. Then a natural question is whether the totality of the tensor products constructed from ON bases for individual coordinates is indeed an ON basis. The answer is in a certain sense affirmative, but somewhat delicate.
(1) The (generalized) Fourier expansion of $f(x_1, \cdots, x_n)$ is well defined if $f$ is integrable thanks to Fubini's theorem (→20.15). That is, the values of the Fourier coefficients do not depend on the order of expansion.
(2) To reconstruct the original function from the Fourier coefficients, we can apply the individual inverse transforms successively. This is allowed, but the Fourier coefficients may not be integrable, so to interpret the inverse transform as an n-tuple integral (not as n successive one dimensional integrals) is delicate329 and some extra condition on $f$ is generally required.330
23.4 What problems can we solve by hand? To have an analytic solution, we must be able to solve the eigenvalue problem by hand. To this end almost always separation of variables is mandatory. As is mentioned in 23.2(A), this requires not only a special form of the operator,
328 The reader might say any ON basis will do for our purpose. If we need not worry about the (termwise) differentiability of the Fourier sum, then this is indeed the case. However, we are solving differential equations, so that we must be sensitive about the uniformity of the convergence of the resultant Fourier series (→17.12-13).
329 That is, we must in general inverse transform in the reverse order of the operation used in the calculation of the coefficients.
330 See Kolmogorov and Fomin, second ed. Chapter 8, Section 4. Perhaps not available in English.
but also a special shape of the domain.331 Therefore, problems we can solve analytically are very limited even for the Laplace equation. For situations frequently encountered in practice (e.g., the Laplacian in a ball) eigenfunctions of separated operators are well known and called special functions. In short, we can solve by hand only very standard PDE under very standard auxiliary conditions. That is why the advice in (B) of 23.2 "see a style book" is practical.
Exercise.
(A) Specify appropriate curvilinear coordinates to solve the following problems (if the problems are separable at all):
(1) From a solid ball of radius $a$, another ball of radius $b$ ($< a$) which is completely inside the first ball is removed. Temperatures of inside and outside surfaces are given. Find the steady temperature distribution in the solid.
(2) There are two osculating identical conducting balls. Compute the electric field when the balls are maintained at $V$ with respect to infinity.
(3) A cylindrical hole of radius $r$ is made through a solid conducting ball of radius $R$ ($> r$) slightly off the center. Find the electric field when the solid has the total charge $Q$.
(4) A lens-shaped conductor is maintained at voltage $V$ relative to infinity. Assume that the surfaces are with the same radius of curvature $R$ and the thickness of the lens is $2d$, where $d < R$.
(5) A conducting plane has a semicylindrical boss of radius $a$. The plane is maintained at the electric potential $V$. Find the electric field in the space.
(B) Two identical conducting spheres of radius $a$ are placed with the separation of $2l$ between the centers. Both the spheres are maintained at voltage $V$ relative to infinity. Find the electrostatic potential due to these spheres.
Discussion: Lamé's problem.
The most general case we can solve with the aid of separation of variables is the confocal rectangular parallelepiped whose surfaces are made of confocal quadratic surfaces given by
(23.7)
The necessary special functions are called Lamé functions, which are not studied very well.
23.5 What is a special function? The word 'special function' is used to denote collectively (1) the $\Gamma$-function (→9) and related functions
331 We must be able to employ the standard orthogonal curvilinear coordinates. For example, for the 3d Schrödinger equation, complete separation of variables is possible only when the potential function $V$ has the following form:
where $h$'s are the ones given in 2D.3 (H. P. Robertson, Math. Ann. 98, 749 (1928); L. P. Eisenhart, Ann. Math. 35, 284 (1934)).
like polygamma functions,332 (2) functions described by indefinite integrals of elementary functions like the probability integral (→25.11), (3) elliptic functions, (4) solutions of second order ODE obtained by separating variables, and (5) solutions to special ODE like Painlevé equations.333 Solutions to the second order linear ODE with 3 regular singular points (special functions of hypergeometric type) or with 1 irregular singular point resulting from the merging of two regular singular points (→24B.2) in the former (special functions of confluent type) are called classical special functions.
23.6 Are the analytic solutions useful? It is not easy to say yes. Often the obtained solutions are series solutions in terms of special functions. Since special functions are mere symbols, one must look up tables or use, e.g., Mathematica or Maple (even trigonometric functions are no exceptions; we need a table or a pocket calculator). Hence, if the reader wants a detailed behavior of the solution, a lot of numerical work is needed anyway. One might say that in order to know qualitative or asymptotic behaviors of a solution, analytic forms are useful. This is true. However, to require a complete solution in order to get qualitative or asymptotic behaviors does not sound elegant.
It should be clearly recognized that the necessity of a full analytic solution is a clear sign of the sad fact that we do not understand the problem.
23.7 Importance of qualitative understanding. It is important to know how to solve the problems by hand: what special functions are suitable, how they behave qualitatively, etc. To teach these has been the main objective of the conventional math-phys courses.334 However, for most scientists (esp. pure scientists) to juggle tons of special functions is not important at all.335 It is much more important to acquire the sense or feeling of correct physics and mathematics so that we will not be outsmarted by computers, or drowned in the flood of numbers. The reader must be able to walk, but in order to go to the Pacific coast she need not retrace the Oregon Trail on foot!
332 Polygamma functions: the $n$th derivative of $\log\Gamma(z)$ is called the $(n+1)$-gamma function. In particular, $n = 1$ is called the digamma function, $n = 2$ is called the trigamma function, etc.
333 See E. L. Ince, Ordinary Differential Equations (Dover, 1956; original 1926) Chapter 14.
334 See, e.g., H. W. Wyld, Mathematical Methods for Physics (Benjamin, 1976).
335 Perhaps more than 50 years ago there were one-year courses solely devoted to trigonometrics in universities (remember that the universities in those days were not remedial schools of the high school education). This sounds absurd now. To realize that some topics are unimportant is an important progress.
23.8 Use of symbol manipulation programs. Many standard analytic methods, e.g., the series expansion method (→24B), are best implemented with the aid of mathematics software like Mathematica or Maple. Special functions are available in the standard mathematics software. For example, with Mathematica, if the reader types in BesselJ[n,z], then she gets $J_n(z)$ (→27A.1). Hence, we need not be extremely familiar with special functions, although we should know their general features. Most analytical calculations can be mechanized, so it is probably wiser to practice the use of these programs than to experience lengthy practice sessions of analytical methods.
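Outside Mathematica the same value is only a few lines away in a general-purpose language. The following is a minimal sketch in Python using only the standard library. It assumes integer order $n \ge 0$ and moderate $|z|$, and sums the standard power series $J_n(z) = \sum_k (-1)^k (z/2)^{2k+n}/(k!\,(n+k)!)$; the function name `bessel_j` is an illustrative choice, not a library routine.

```python
import math

def bessel_j(n, z, terms=40):
    """J_n(z) for integer n >= 0 from its power series (adequate for moderate |z|)."""
    half = z / 2.0
    total = 0.0
    for k in range(terms):
        total += (-1) ** k * half ** (2 * k + n) / (math.factorial(k) * math.factorial(n + k))
    return total

if __name__ == "__main__":
    print(bessel_j(0, 1.0), bessel_j(1, 1.0))
```

For large arguments the alternating series loses accuracy through cancellation, which is one reason to prefer an established library or a symbol manipulation program in actual work.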
23.9 Case study of separation of variables: Laplace equation with Dirichlet condition. The purpose of this entry is to provide a showcase with the aid of a fairly difficult problem. The region is fan-shaped: $z \in [0, h]$, $\varphi \in [0, \phi]$ and $r \in [a, b]$:
$$\left[\frac{1}{r}\frac{\partial}{\partial r}r\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial\varphi^2} + \frac{\partial^2}{\partial z^2}\right]\psi = 0 \tag{23.8}$$
(→2D.10) with the boundary condition
$$\psi(r,\varphi,0) = f_0(r,\varphi), \quad \psi(r,\varphi,h) = f_h(r,\varphi), \tag{23.9}$$
$$\psi(r,0,z) = g_0(r,z), \quad \psi(r,\phi,z) = g_\phi(r,z), \tag{23.10}$$
$$\psi(a,\varphi,z) = h_a(\varphi,z), \quad \psi(b,\varphi,z) = h_b(\varphi,z). \tag{23.11}$$
First we perform the step (C)(1) of 23.2. The separation procedure $\psi = R(r)\Phi(\varphi)Z(z)$ gives three distinct eigenvalue problems. The full solution is the superposition of the solutions to all the following three problems (1)-(3).
(1) With the boundary condition ($r$, $\varphi$ homogeneous; $z$ inhomogeneous):
$$\psi(r,\varphi,0) = f_0(r,\varphi), \quad \psi(r,\varphi,h) = f_h(r,\varphi), \tag{23.12}$$
$$\psi(r,0,z) = 0, \quad \psi(r,\phi,z) = 0, \tag{23.13}$$
$$\psi(a,\varphi,z) = 0, \quad \psi(b,\varphi,z) = 0. \tag{23.14}$$
The separated equations are
$$\frac{d^2\Phi}{d\varphi^2} + m^2\Phi = 0, \tag{23.15}$$
$$\frac{d^2Z}{dz^2} - \alpha^2Z = 0, \tag{23.16}$$
$$\frac{1}{R}\left[\frac{d^2R}{dr^2} + \frac{1}{r}\frac{dR}{dr}\right] - \frac{m^2}{r^2} + \alpha^2 = 0. \tag{23.17}$$
The eigenvalue problems are (23.15) and (23.17) with homogeneous Dirichlet boundary conditions ($\Phi(0) = \Phi(\phi) = 0$ and $R(a) = R(b) = 0$). The positivity of $\alpha^2$ and $m^2$ follows from the negative definiteness of the operators.336 The solution must have the following form:
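$$\psi = \sum_{m,\alpha}\left(A_{m,\alpha}J_m(\alpha r) + B_{m,\alpha}N_m(\alpha r)\right)\left(C_m\sin m\varphi + D_m\cos m\varphi\right)\left(E_\alpha\sinh\alpha z + F_\alpha\cosh\alpha z\right). \tag{23.18}$$
Here $J_m$ is the Bessel function (→27A.2-3), and $N_m$ is the Neumann function (→27A.16). $m$, $C_m$ and $D_m$ are fixed by the Dirichlet condition:
$$D_m = 0; \quad C_m\sin m\phi + D_m\cos m\phi = 0. \tag{23.19}$$
We may choose $C_m = 1$ without any loss of generality. $\alpha$, $A_{m,\alpha}$ and $B_{m,\alpha}$ are fixed by the Dirichlet condition
$$A_{m,\alpha}J_m(\alpha a) + B_{m,\alpha}N_m(\alpha a) = 0, \tag{23.20}$$
$$A_{m,\alpha}J_m(\alpha b) + B_{m,\alpha}N_m(\alpha b) = 0. \tag{23.21}$$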
That is, $J_m(\alpha a)N_m(\alpha b) = J_m(\alpha b)N_m(\alpha a)$ fixes $\alpha$. $E$ and $F$ are determined from the inhomogeneous boundary condition (23.12) with the aid of complete orthogonality (→34B.5) of the eigenfunctions constructed above (not easy or almost impossible by hand for general $a$ and $b$).
(2) With the boundary condition ($r$, $z$ homogeneous; $\varphi$ inhomogeneous):
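$$\psi(r,\varphi,0) = 0; \quad \psi(r,\varphi,h) = 0, \tag{23.22}$$
$$\psi(r,0,z) = g_0(r,z); \quad \psi(r,\phi,z) = g_\phi(r,z), \tag{23.23}$$
$$\psi(a,\varphi,z) = 0; \quad \psi(b,\varphi,z) = 0. \tag{23.24}$$
The separated equations are
$$\frac{d^2\Phi}{d\varphi^2} - m^2\Phi = 0, \tag{23.25}$$
$$\frac{d^2Z}{dz^2} + \alpha^2Z = 0, \tag{23.26}$$
$$\frac{1}{R}\left[\frac{d^2R}{dr^2} + \frac{1}{r}\frac{dR}{dr}\right] + \frac{m^2}{r^2} - \alpha^2 = 0. \tag{23.27}$$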
Here the positivity of $\alpha^2$ is obvious from the condition that (23.26) becomes an eigenvalue problem (it is not elementary to see this →34B.6
336 Intuitively speaking, the eigenfunctions must be oscillatory functions to satisfy the orthogonality condition. "Negative definiteness" of an operator $L$ means $\langle f|L|f\rangle \le 0$ for any ket $|f\rangle$. The Laplacian $\Delta$ is a typical example.
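Discussion (B)). $m^2$ also must be positive so that (23.27) becomes an eigenvalue problem. Hence, we may assume
$$\psi = \sum_{m,\alpha}\left(A_{m,\alpha}I_{im}(\alpha r) + B_{m,\alpha}K_{im}(\alpha r)\right)\left(C_m\sinh m\varphi + D_m\cosh m\varphi\right)\left(E_\alpha\sin\alpha z + F_\alpha\cos\alpha z\right), \tag{23.28}$$
where $I$ and $K$ are modified Bessel functions (→27A.23). Here $\alpha$, $E_\alpha$ and $F_\alpha$ are fixed by the Dirichlet condition
$$F_\alpha = 0; \quad E_\alpha\sin\alpha h + F_\alpha\cos\alpha h = 0. \tag{23.29}$$
$E_\alpha = 1$ is admissible. $m$, $A_{m,\alpha}$ and $B_{m,\alpha}$ are determined by the boundary conditions
$$A_{m,\alpha}I_{im}(\alpha a) + B_{m,\alpha}K_{im}(\alpha a) = 0, \tag{23.30}$$
$$A_{m,\alpha}I_{im}(\alpha b) + B_{m,\alpha}K_{im}(\alpha b) = 0. \tag{23.31}$$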
That is, $I_{im}(\alpha a)K_{im}(\alpha b) = I_{im}(\alpha b)K_{im}(\alpha a)$ determines $m$. $C$ and $D$ are determined from the inhomogeneous boundary condition (23.23) with the aid of complete orthogonality of the eigenfunctions constructed above.337
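(3) With the boundary condition ($\varphi$, $z$ homogeneous; $r$ inhomogeneous):
$$\psi(r,\varphi,0) = 0, \quad \psi(r,\varphi,h) = 0, \tag{23.32}$$
$$\psi(r,0,z) = 0, \quad \psi(r,\phi,z) = 0, \tag{23.33}$$
$$\psi(a,\varphi,z) = h_a(\varphi,z), \quad \psi(b,\varphi,z) = h_b(\varphi,z). \tag{23.34}$$
The separated equations are338
$$\frac{d^2\Phi}{d\varphi^2} + m^2\Phi = 0, \tag{23.35}$$
$$\frac{d^2Z}{dz^2} + \alpha^2Z = 0, \tag{23.36}$$
$$\frac{1}{R}\left[\frac{d^2R}{dr^2} + \frac{1}{r}\frac{dR}{dr}\right] - \frac{m^2}{r^2} - \alpha^2 = 0. \tag{23.37}$$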
The eigenvalue problems are easy ones: (23.35) and (23.36) with homogeneous Dirichlet conditions. We may thus assume
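$$\psi = \sum_{m,\alpha}\left(A_{m,\alpha}I_m(\alpha r) + B_{m,\alpha}K_m(\alpha r)\right)\left(C_m\sin m\varphi + D_m\cos m\varphi\right)\left(E_\alpha\sin\alpha z + F_\alpha\cos\alpha z\right). \tag{23.38}$$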
337 This problem is nontrivial, since we need modified Bessel functions of imaginary order. See N. N. Lebedev, Special Functions & Their Applications (Dover 1972) Section 6.5.
338 In this case obviously $m^2$ and $\alpha^2$ must be non-negative.
Here, $I_m$ and $K_m$ are modified Bessel functions (→27A.23). $A$ and $B$ must be fixed from the boundary condition (23.34).
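To see the radial eigenvalue condition of problem (1) at work, consider a special case chosen here only for illustration (it is not treated in these notes): the slit annulus $\phi = 2\pi$ and the lowest angular mode. The Dirichlet condition $\sin m\phi = 0$ then gives $m = 1/2$, for which the Bessel and Neumann functions are elementary, so the condition $J_m(\alpha a)N_m(\alpha b) = J_m(\alpha b)N_m(\alpha a)$ can be solved in closed form:

```latex
% Assumption: opening angle \phi = 2\pi, so m = n/2 with n = 1, 2, ...; take n = 1.
\begin{align*}
J_{1/2}(x) &= \sqrt{\frac{2}{\pi x}}\,\sin x, \qquad
N_{1/2}(x) = -\sqrt{\frac{2}{\pi x}}\,\cos x, \\
J_{1/2}(\alpha a)N_{1/2}(\alpha b) - J_{1/2}(\alpha b)N_{1/2}(\alpha a)
 &= \frac{2}{\pi\alpha\sqrt{ab}}\,\bigl[\sin\alpha b\,\cos\alpha a - \sin\alpha a\,\cos\alpha b\bigr] \\
 &= \frac{2}{\pi\alpha\sqrt{ab}}\,\sin\alpha(b-a) = 0
 \quad\Longrightarrow\quad \alpha = \frac{k\pi}{b-a}, \quad k = 1, 2, \ldots
\end{align*}
```

For general $m$ no such simplification occurs, and the roots $\alpha$ must be located numerically.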
23.10 Remarks to 23.9.
(1) If the region in the $z$-direction is not bounded, we need Fourier transformations; if the region is not bounded in the $r$-direction, we need the Fourier-Bessel(-Dini) transformation (→27A.22).
(2) The boundary condition in the $\varphi$ direction may be periodic.
(3) The Neumann condition case is analogous.
24 General Linear ODE
The theory of general linear ODE is summarized, and then a constructive solution method (Frobenius' method) is outlined. This series method is best implemented with the aid of symbol manipulation programs. The reader should practice the method for one or two representative examples by hand or a step by step application of mathematics software.
Key words: analyticity of solution, fundamental system of solutions, fundamental matrix, Wronskian, separation theorem, Frobenius' theory, (regular and irregular) singular point, indicial equation, index.
Summary:
(1) First-order n-vector continuous ODE preserves the linear independence of the initial condition vectors (the existence of fundamental systems 24A.4, 24A.11).
(2) If the coefficient functions are holomorphic around $x$, then the solution around $x$ is Taylor-expandable, so a series form fundamental system can be constructed (24B.1). Even if the coefficients are not holomorphic, if their singularities are not very bad (regular 24B.2), then still a series form fundamental system can be constructed (Frobenius' theory) (24B.3-7).
(3) The Frobenius method is best implemented by a computer. See 24B.8 for a 'practical Frobenius.'
(4) Separation of variables of the Laplace equation in the spherical coordinates requires Legendre polynomials (24C.1-2) and associated Legendre functions (24C.5, examples in 26B).
24.A General Theory
24A.1 The problem. We must be able to solve separated equations (→23) which are usually ODE. They are linear but with nonconstant coefficients. We know we have only to consider (→11A.5)
$$\frac{du(x)}{dx} = A(x)u(x), \tag{24.1}$$
where $A(x)$ is an $n \times n$ matrix which is continuous339 on an interval $I \subset \mathbf{R}$.
24A.2 Theorem [Unique existence of solution]. If $A(x)$ is continuous340 in an open interval $I \subset \mathbf{R}$, then for any $u_0 \in \mathbf{R}^n$ and $x_0 \in I$, there is a unique solution $u(x)$ passing through $(u_0, x_0)$ whose domain is $I$. □
This follows directly from the Cauchy-Peano and Cauchy-Lipschitz theorems (→11A.8, 11A.10).
24A.3 Analyticity of solution. $A(x)$ may be considered to be a matrix consisting of functions on $\mathbf{C}$ as $A(z)$.
Theorem. Assume $A(z)$ to be analytic (i.e., all the components are analytic functions →7.1, 7.10) in $D \subset \mathbf{C}$. Then, a solution analytic around $a \in D$ can be analytically continued (→7) to any point in $D$ along any curve in $D$. □
This implies that the singular points of a solution, if any, appear where there are singularities (→8A.2-7) of $A(z)$.
Discussion.
For the 1D Schrödinger equation, the wave function is finite at a point which is not a singularity of the potential. For example, the wave function of the harmonic oscillator is finite for finite $x$. For the Coulomb potential, the singularity can exist only at the origin.
24A.4 Theorem [Fundamental system of solutions]. The totality of solutions of (24.1) makes an $n$-vector space. Any basis set of this space is called a fundamental system of solutions. □
[Demo] Let $v_1, v_2, \cdots, v_n$ be linearly independent vectors and $x_0 \in I$. Write the solution passing through $(v_j, x_0)$ as $\phi_j(x)$ ($j = 1, \ldots, n$). Let $u_0 = c_1v_1 + c_2v_2 + \cdots + c_nv_n$, and
$$u(x) = c_1\phi_1(x) + c_2\phi_2(x) + \cdots + c_n\phi_n(x). \tag{24.2}$$
It is obvious that the space cannot have a dimension larger than $n$. If there is $x$ such that $u(x) = 0$, then due to the uniqueness of the solution (→24A.2) it must agree with the solution starting from 0, which is obviously identically zero, so that $u(x)$ can never be 0 unless $u_0 = 0$. Hence, the dimension of the solution space cannot be less than $n$. □
Notice that this theorem implies that $\phi_1(x), \phi_2(x), \cdots, \phi_n(x)$ are functionally independent: the identity for $x \in I$
$$c_1\phi_1(x) + c_2\phi_2(x) + \cdots + c_n\phi_n(x) = 0 \tag{24.3}$$
339 We say that $A(x)$ is continuous, analytic, etc., if all its components are, as functions, continuous, analytic, etc.
340 Our problem is a linear problem, so this is enough. A related discussion is in 11A.10 Discussion (B).
implies $c_j = 0$ for all $j$.341
24A.5 Fundamental matrix. The matrix $\Phi(x) = (\phi_1(x), \phi_2(x), \ldots, \phi_n(x))$ is called a fundamental matrix of (24.1), if $\{\phi_1(x), \phi_2(x), \ldots, \phi_n(x)\}$ is a fundamental system of solutions (→24A.4).342
24A.6 Wronskian. Let $u_1(x), u_2(x), \ldots, u_n(x)$ be $n$ solutions to (24.1). The determinant of the matrix $(u_1(x), u_2(x), \cdots, u_n(x))$ is called the Wronskian of the set of solutions $\{u_1(x), u_2(x), \cdots, u_n(x)\}$.
If the Wronskian of the set $\{u_1(x), u_2(x), \cdots, u_n(x)\}$ is nonzero, then this set is a fundamental system of solutions.
The converse is also true according to 24A.4. In other words:
24A.7 Theorem. A regular matrix $X(x)$ satisfying
$$\frac{dX(x)}{dx} = A(x)X(x) \tag{24.4}$$
is a fundamental matrix of (24.1).
24A.8 Theorem. Let $W(x)$ be the Wronskian of the set of (any) $n$ solutions to (24.1). Then,
$$\frac{dW(x)}{dx} = [\mathrm{Tr}\,A(x)]\,W(x). \tag{24.5}$$
This should be obvious from
$$\det[(1 + At)X] = \det X + t\,\mathrm{Tr}A\,\det X + O[t^2]. \tag{24.6}$$
This formula follows from
$$\det X = \exp[\mathrm{Tr}\ln X], \tag{24.7}$$
which is a very important formula and essentially follows from $\det X = \prod_i\lambda_i$, where $\lambda_i$ are eigenvalues of $X$.
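Theorem 24A.8 is easy to check numerically. The following is a minimal sketch, assuming a $2\times2$ test matrix $A(x)$ with $\mathrm{Tr}\,A(x) = -x$ chosen only for illustration. It integrates $dX/dx = A(x)X$ from $X(0) = 1$ with the classical fourth-order Runge-Kutta scheme and compares $\det X(1)$ with $\exp\left(\int_0^1 \mathrm{Tr}\,A\,dx\right) = e^{-1/2}$. All names are hypothetical helpers.

```python
import math

def mat_mul(a, b):
    """Product of two 2 x 2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)] for i in range(2)]

def mat_axpy(x, k, s):
    """Return x + s * k for 2 x 2 matrices."""
    return [[x[i][j] + s * k[i][j] for j in range(2)] for i in range(2)]

def fundamental_matrix(A, x0, x1, steps=2000):
    """Integrate dX/dx = A(x) X from X(x0) = identity up to x1 (classical RK4)."""
    h = (x1 - x0) / steps
    X = [[1.0, 0.0], [0.0, 1.0]]
    x = x0
    for _ in range(steps):
        k1 = mat_mul(A(x), X)
        k2 = mat_mul(A(x + h / 2), mat_axpy(X, k1, h / 2))
        k3 = mat_mul(A(x + h / 2), mat_axpy(X, k2, h / 2))
        k4 = mat_mul(A(x + h), mat_axpy(X, k3, h))
        X = [[X[i][j] + h / 6 * (k1[i][j] + 2 * k2[i][j] + 2 * k3[i][j] + k4[i][j])
              for j in range(2)] for i in range(2)]
        x += h
    return X

def det2(m):
    """Determinant of a 2 x 2 matrix, i.e. the Wronskian of its two columns."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

def example_A(x):
    """Assumed test matrix with Tr A(x) = -x."""
    return [[0.0, 1.0], [-(1.0 + x * x), -x]]

if __name__ == "__main__":
    X = fundamental_matrix(example_A, 0.0, 1.0)
    print(det2(X), math.exp(-0.5))
```

The two printed numbers should agree to the accuracy of the integrator, even though the individual entries of $X$ have no closed form here.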
24A.9 Theorem. Let $\Phi(x)$ be a fundamental matrix (→24A.5) of (24.1). Then, for any constant non-singular matrix $P$, $\Phi(x)P$ is again a fundamental matrix of (24.1). Conversely, if $\Phi(x)$ and $\Psi(x)$ are two fundamental matrices of (24.1), then there is a constant non-singular matrix $P$ such that $\Psi(x) = \Phi(x)P$. □
341 This is of course a stronger condition than $u \ne 0$.
342 The evolution operator $T(x,y)$ such that $u(x) = T(x,y)u(y)$ is given by $T(x, y) = \Phi(x)\Phi(y)^{-1}$.
[Demo] Obviously, $\Phi(x)P$ satisfies (24.4) and is non-singular, so it is a fundamental matrix. Next, let $P = \Phi(x)^{-1}\Psi(x)$; then a straightforward calculation shows $dP/dx = 0$. Hence, $P$ must be a constant matrix, and non-singular by definition.
24A.10 Second order linear ODE. Separation of variables (→23) of linear second order PDE often gives second order linear ODE of the following type:
$$\frac{d^2u}{dx^2} + P(x)\frac{du}{dx} + Q(x)u = 0, \tag{24.8}$$
where $P$ and $Q$ are functions of $x \in \mathbf{R}$. This can be transformed into the first order ODE of the form discussed in 24A.1:
$$\frac{d\boldsymbol{u}}{dx} = A(x)\boldsymbol{u} \tag{24.9}$$
with $\boldsymbol{u}(x) = (u, du/dx)^T$ and
$$A(x) = \begin{pmatrix} 0 & 1 \\ -Q(x) & -P(x) \end{pmatrix}. \tag{24.10}$$
24A.11 Fundamental system of solutions. Let $u_1$ and $u_2$ be two solutions for (24.8). The Wronskian $W(x)$ (→24A.6) for these solutions is defined as
$$W(x) = \begin{vmatrix} u_1(x) & u_2(x) \\ u_1'(x) & u_2'(x) \end{vmatrix}. \tag{24.11}$$
That is, $W$ is the Wronskian of (24.9). If we can find $u_1$ and $u_2$ with $W(x) \ne 0$, then the set $\{u_1, u_2\}$ is called a fundamental system of solutions. The general solution to (24.8) is $c_1u_1 + c_2u_2$ for arbitrary constants $c_1$ and $c_2$ (cf. 24A.4).
24A.12 Theorem [Separation theorem]. Let $u$ and $v$ make a fundamental system of solutions of (24.8). Then
(1) The zeros of $u$ and $v$ are all of multiplicity one.
(2) The zeros of $u$ and $v$ separate each other. □
[Demo] Suppose $u$ has a zero of multiplicity larger than one. Then $u$ and $u'$ vanish simultaneously there, so that the Wronskian $W$ (→24A.6) of $u$ and $v$ vanishes there. This contradicts the assumption. Thus (1) must be true. To prove (2) note that $u$ and $v$ cannot have a common zero, since $W \ne 0$. Let $a_1$ and $a_2$ ($> a_1$) be two adjacent zeros of $u$, and assume that $v$ does not vanish in the interval $J = (a_1, a_2)$. Then $u/v$ is well defined in $J$, and is differentiable:
$$\frac{d(u/v)}{dx} = \frac{W(x)}{v^2}, \tag{24.12}$$
where $W = u'v - uv'$ is the Wronskian (24.11) for the ordered pair $(v, u)$. This cannot vanish. However, $u/v = 0$ at both ends of $J$, so Rolle's theorem asserts that (24.12) must vanish in $J$, a contradiction. We can exchange $u$ and $v$ to complete the proof.
Exercise.
Consider the following 1D Schrödinger problem
$$(-\Delta + V)\psi = E\psi, \tag{24.13}$$
where $V$ vanishes at infinity. If this equation has a bound state, it cannot be degenerate. In particular, the lowest energy bound state (ground state) cannot be degenerate. Prove this by showing or answering the following:
(1) Degeneracy implies that there are two independent solutions for a given energy. What must be their Wronskian?
(2) The Wronskian for localized states is zero.
24A.13 Making a partner. Suppose we have found one solution $v$ to (24.8). We wish to make $u$ (a partner of $v$) so that $\{u, v\}$ becomes a fundamental system of solutions (→24A.11). We use (24.12). To compute the Wronskian $W$ we can use (24.5) (→24A.8) with $\mathrm{Tr}\,A = -P$ for (24.8). $W$ can be solved as
$$W = W_0\exp\left(-\int^x P(y)\,dy\right). \tag{24.14}$$
From (24.12) we obtain
$$u(x) = v(x)\left[c + \int^x\frac{W(y)}{v(y)^2}\,dy\right], \tag{24.15}$$
where $c$ is a constant.
Exercise.
One solution of
$$\frac{d^2y}{dx^2} - \left(1 + \frac{1}{x}\right)\frac{dy}{dx} + \frac{1}{x}\,y = 0 \tag{24.16}$$
is $e^x$. Find its partner.
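The procedure of 24A.13 can be illustrated on an Euler-type equation chosen here for convenience (it is not one of the examples of these notes). Given the solution $v(x) = x$, the Wronskian follows from $\mathrm{Tr}\,A = -P$, and the partner from integrating $W/v^2$:

```latex
\begin{align*}
&x^2y'' - xy' + y = 0 \;\Longleftrightarrow\; y'' - \frac{1}{x}\,y' + \frac{1}{x^2}\,y = 0,
 \qquad P(x) = -\frac{1}{x}, \quad v(x) = x, \\
&W(x) = W_0\exp\left(\int^x\frac{dy}{y}\right) = W_0\,x, \\
&u(x) = v(x)\left[c + \int^x\frac{W(y)}{v(y)^2}\,dy\right]
      = x\left[c + W_0\int^x\frac{dy}{y}\right] = c\,x + W_0\,x\ln x .
\end{align*}
```

Dropping the multiple of $v$, the partner is $x\ln x$; indeed $x^2\cdot(1/x) - x(\ln x + 1) + x\ln x = 0$. The constant from the lower limit of the integral has been absorbed into $W_0$ and $c$.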
24.B Frobenius' theory
24B.1 Analyticity of solutions. 24A.3 implies that if $P$ and $Q$ are analytic in a region $D$, then the solution to (24.8) is unique and analytic in $D$. Hence, a local solution can be assumed to be in the power series form around a point where $P$ and $Q$ are holomorphic.
24B.2 Singular points. If $P$ or $Q$ becomes singular (→8A.2) at a point $a$, $a$ is called a singular point of the ODE (24.8).
(1) At a singular point $a$, if the singularity of $P$ is at worst a pole of order one, and that of $Q$ is at worst a pole of order two (→8A.5(4)(ii)), then $a$ is called a regular singular point of the ODE.
(2) Otherwise, $a$ is called an irregular singular point of the ODE.
Discussion.343
In general (in more standard pure math literature) the definition of a regular singular point is as follows. Let $u$ be any solution of (24.8).
Definition. $z_0$ is a regular singular point of (24.8), if there is a positive number $p$ such that any of its solutions $u$ satisfies
$$\lim_{z\to z_0}(z - z_0)^p\,u(z) = 0. \tag{24.17}$$
That is, if the singularity of the solution (remember 24A.3, 24B.1) is at worst algebraic at $z_0$, we say $z_0$ is a regular singular point.
Theorem [Fuchs]. A necessary and sufficient condition for $z_0$ to be a regular singular point of (24.8) is that $z_0$ is a regular singular point in the sense of 24B.2. □
Its proof is not very simple (elementary but lengthy). An intuitive understanding is the 'balance condition' of the singularities (divergences) around $z_0$ in (24.8). Consider only the most singular terms in (24.8) near $z_0$. If the 'aggravation' by differentiation of the singularity in the solution is balanced by the singularities in the coefficients, then we say the singularity is regular.
24B.3 Expansion around regular singular point. Frobenius showed that power series expansion can give a local solution around a regular singular point as well. Around a regular singular point $a$, which we may set to be 0 without any loss of generality, we expect the following form
$$u(z) = z^\mu\sum_{k=0}^\infty a_kz^k, \tag{24.18}$$
where $\mu$ is an appropriate complex constant. We may expand $P$ and $Q$ as (Laurent expansion →8A.8)
$$zP(z) = \sum_{k=0}^\infty p_kz^k, \tag{24.19}$$
$$z^2Q(z) = \sum_{k=0}^\infty q_kz^k. \tag{24.20}$$
Formally substituting these expansions into the differential equation (24.8), we get conditions for the equation to be satisfied identically:
343 Yosida p86
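$$a_0\phi(\mu) = 0, \tag{24.21}$$
$$a_1\phi(\mu+1) + a_0\theta_1(\mu) = 0 \tag{24.22}$$
and generally for $n = 1, 2, \ldots$
$$\phi(\mu+n)\,a_n + \sum_{k=1}^n a_{n-k}\,\theta_k(\mu+n-k) = 0, \tag{24.23}$$
where
$$\phi(\mu) = \mu^2 + (p_0 - 1)\mu + q_0, \tag{24.24}$$
$$\theta_i(\mu) = \mu p_i + q_i. \tag{24.25}$$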
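The recursion of 24B.3 is mechanical and easily delegated to a program. The following is a minimal sketch in exact rational arithmetic. It assumes $zP(z)$ and $z^2Q(z)$ are supplied as lists of Taylor coefficients, takes $a_0 = 1$, and assumes $\phi(\mu + n) \ne 0$ for all $n \ge 1$. The function name and interface are illustrative choices, not part of these notes.

```python
from fractions import Fraction

def frobenius_coefficients(p, q, mu, n_terms):
    """Coefficients a_0, ..., a_{n_terms-1} of u(z) = z^mu * sum_k a_k z^k with a_0 = 1.

    p[k] and q[k] are the Taylor coefficients of z P(z) and z^2 Q(z);
    mu must be a root of the indicial equation phi(mu) = 0.
    """
    def pk(k):
        return Fraction(p[k]) if k < len(p) else Fraction(0)

    def qk(k):
        return Fraction(q[k]) if k < len(q) else Fraction(0)

    def phi(m):                       # indicial polynomial
        return m * m + (pk(0) - 1) * m + qk(0)

    def theta(k, m):                  # theta_k(m) = m p_k + q_k
        return m * pk(k) + qk(k)

    mu = Fraction(mu)
    if phi(mu) != 0:
        raise ValueError("mu is not a root of the indicial equation")
    a = [Fraction(1)]
    for n in range(1, n_terms):
        s = sum(a[n - k] * theta(k, mu + n - k) for k in range(1, n + 1))
        d = phi(mu + n)
        if d == 0:
            raise ZeroDivisionError("phi(mu + n) = 0: the roots differ by an integer")
        a.append(-s / d)
    return a

if __name__ == "__main__":
    # Bessel's equation of order 0: z P(z) = 1, z^2 Q(z) = z^2, double root mu = 0
    print(frobenius_coefficients([1], [0, 0, 1], 0, 5))
```

The `ZeroDivisionError` branch corresponds to the situation where the smaller root cannot be used directly and a logarithmic term may be needed.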
24B.4 Indicial equation. We may assume $a_0 = 1$ without any loss of generality. However, if (24.23) couples only even coefficients $\{a_{2n}\}$ with each other (or only odd coefficients), then even and odd coefficients are decoupled. Therefore, the choice $a_0 = 1$, $a_1 = 0$ and that $a_0 = 0$, $a_1 = 1$ both give different solutions (cf. 24C.2). (24.21) or $\phi(\mu) = 0$ is called the indicial equation. It determines two (possibly identical) values of $\mu$, $\mu_1$ and $\mu_2$ (henceforth, we assume $\mathrm{Re}\,\mu_1 \ge \mathrm{Re}\,\mu_2$).
Exercise.
Find the indicial equation for
$$\frac{d}{dz}\left\{(1 - z^2)\frac{d}{dz}u\right\} + \left\{\ell(\ell+1) - \frac{m^2}{1 - z^2}\right\}u = 0. \tag{24.26}$$
24B.5 Use of symbol manipulation programs. Expanding and regrouping expanded terms is performed by symbol manipulating programs very efficiently. In practice, Frobenius' method will not be used often, but if needed, the best way is to use computers to compute the series.
24B.6 Theorem. Assume that $z = 0$ is a regular singular point (→24B.2(1)) of (24.8) and $\mu_1$, $\mu_2$ are the roots of the indicial equation $\phi(\mu) = 0$ (cf. (24.24)). Then
[1] If $\mu_1 - \mu_2 \notin \mathbf{N}$, there is a fundamental system of solutions (→24A.11) in the form of (24.18) converging in some neighborhood of 0.
[2] If $\mu_1 - \mu_2 \in \mathbf{N}$, generally only one solution in the form of (24.18) is uniquely determined by the expansion method. See 24B.7 for further classification. □
[Demo] Choose $\mu = \mu_1$. Then, $\phi(\mu + n)$ cannot be zero for any $n = 1, 2, \cdots$, so that $a_n$ can be uniquely determined from (24.23). The resultant series is convergent in some small neighborhood of $z = 0$. This can be demonstrated by constructing a majorizing series.344 If $\mu_1 - \mu_2$ is not in $\mathbf{N}$, then $\mu = \mu_2$ also allows us to determine $a_n$ uniquely, and the resultant solution is distinct from the one obtained for $\mu_1$. However, if $\mu_1 - \mu_2 \in \mathbf{N}$, then there is $m \in \mathbf{N}$ such that $\mu_2 + m = \mu_1$ or $\phi(\mu_2 + m) = 0$. Therefore, we may not generally determine $a_m$ for this $\mu_2$.
344 See, for example, H. S. Wilf, Mathematics for the Physical Sciences (Dover, 1962), or E. T. Whittaker and G. N. Watson, A Course of Modern Analysis (Cambridge UP, 1927), Sect. 10.31 for an explicit demonstration.
24B.7 Theorem [For $\mu_1 - \mu_2 \in \mathbf{N}$]. In case [2] of Theorem 24B.6:
[21] If $\mu_1 = \mu_2$, then any partner $u$ (to make a fundamental system) of the solution $v$ constructed for $\mu_1$ in the form (24.18) must contain a logarithmic term and has the following general form
$$u(z) = Av(z)\ln z + z^{\mu_1}\psi(z), \tag{24.27}$$
where $A$ is a nonzero constant, and $\psi$ is analytic around $z = 0$. This function can be determined by substituting the series expansion form of (24.27) into (24.8).
[22] If $\mu_1 - \mu_2 \in \mathbf{N}\setminus\{0\}$, then a partner $u$ of the solution $v$ constructed for $\mu_1$ in the form (24.18) has the following general form
$$u(z) = Av(z)\ln z + z^{\mu_2}\psi(z), \tag{24.28}$$
where $v$ is again the solution constructed for $\mu_1$ in the form (24.18), $A$ is a constant (can be zero), and $\psi$ is analytic around $z = 0$. This function can be determined by substituting the series expansion form of (24.28) into (24.8). □
[Demo] According to (24.15) (→24A.13) the ratio $q(z) = u/v$ of $v$ and its partner $u$ is given by ($c_1$ and $c_0$ are integration constants)
$$\begin{aligned} q(z) &= c_1 + c_0\int^z d\zeta\,v(\zeta)^{-2}\exp\left[-\int^\zeta P(\zeta')\,d\zeta'\right] \\ &= c_1 + c_0\int^z d\zeta\,\left[\zeta^{\mu_1}(1 + a_1\zeta + \cdots)\right]^{-2}\exp\left[-\int^\zeta\left(\frac{p_0}{\zeta'} + p_1 + \cdots\right)d\zeta'\right] \\ &= c_1 + c_0\int^z\zeta^{-(p_0 + 2\mu_1)}h(\zeta)\,d\zeta, \end{aligned} \tag{24.29}$$
where $h(z)$ is analytic around $z = 0$ as can be seen from
$$h(z) = \exp\left[-\int^z d\zeta\,(p_1 + p_2\zeta + \cdots)\right]\Big/(1 + a_1z + \cdots)^2. \tag{24.30}$$
Since from the indicial equation (→24B.4) or $\phi(\mu) = 0$ (cf. (24.24)) $-p_0 + 1 = \mu_1 + \mu_2$, we know $p_0 + 2\mu_1 = 1 + \mu_1 - \mu_2 \in \mathbf{N}\setminus\{0\}$. Therefore, (24.29) has the following form
$$q(z) = A\ln z + z^{-(\mu_1 - \mu_2)}\varphi(z), \tag{24.31}$$
where $A$ is a constant and $\varphi$ is a function analytic around $z = 0$. Hence, $u$ must have the form (24.28). For $\mu_1 = \mu_2$, $A$ cannot be zero, so that $u$ is functionally independent of $v$. □
24B.8 Practical Frobenius.
(0) Check the expansion center is at worst regularly singular (→24B.2).
(1) Compute the indices $\mu_1$ and $\mu_2$ according to 24B.4.
(2) Choose the index with the larger real part $\mu_1$ and construct the series solution following Frobenius (24B.3).
(3) If $\mu_2$ is not equal to $\mu_1$, try to construct the second solution just as before. If the obtained solution is different (functionally independent345) from the first one, we are done.
(4) If we obtain the same solution or $\mu_1 = \mu_2$, assume the form with logarithm as in 24B.7, and determine $\psi$ in a power series form.
Exercise.346
(1) Show that a fundamental system of solutions of the equation
(24.32)
consists of
$$u_1 = x - \frac{1}{12}x^4 + \cdots, \tag{24.33}$$
$$u_2 = 1 - \frac{1}{6}x^3 + \cdots. \tag{24.34}$$
(2) Show that a fundamental system of solutions of the equation
(24.35)
consists of
$$u_1 = x^{1/2}\left\{1 + \frac{1}{16}x^2 + \frac{1}{1024}x^4 + \cdots\right\}, \tag{24.36}$$
$$u_2 = u_1(x)\log x - \frac{1}{16}x^{3/2} + \cdots. \tag{24.37}$$
24B.9 Construction of the second solution by differentiation. Let us write the solution obtained by Frobenius' method with the index $\lambda$ as $u(x; \lambda)$. If $u(x; \lambda_1)$ and $u(x; \lambda_2)$ are functionally independent, then we can use $u(x; \lambda_1)$ and a linear combination of the two as a fundamental system of solutions. Consider
$$\frac{(\lambda_1 - \lambda_2)\,u(x; \lambda_1) - m\,u(x; \lambda_2)}{\lambda_1 - \lambda_2 - m}. \tag{24.38}$$
For the case [21], we choose $m = 0$ and compute the limit of $\lambda_1 \to \lambda_2$ with the aid of l'Hospital's rule. That is, we compute
$$\left.\frac{\partial}{\partial\lambda}u(x; \lambda)\right|_{\lambda = \lambda_1}.$$
Computing this explicitly, we obtain the general form given in 24B.7 [21]. When $\lambda_1 - \lambda_2 = m \in \mathbf{N}$, we perform a similar calculation:
(24.39)
Again we recover the form asserted in 24B.7.
345 That is, their Wronskian (→24A.6) is not identically zero. Often, without checking the Wronskian, we can recognize the independence by inspection.
346 Watson-Whittaker p209.
24B.10 Examples.347
(1) Case [1]: $\mu_1 - \mu_2 \notin \mathbf{N}$.
$$x^2y'' + \left(x^2 + \frac{5}{36}\right)y = 0 \tag{24.40}$$
with
$$v = x^{5/6}\left(1 - \frac{3}{16}x^2 + \frac{9}{896}x^4 + \cdots\right), \tag{24.42}$$
$$u = x^{1/6}\left(1 - \frac{3}{8}x^2 + \cdots\right). \tag{24.43}$$
(2) Case [21]: $\mu_1 = \mu_2$.
$$x(x - 1)y'' + (3x - 1)y' + y = 0 \tag{24.44}$$
with
$$v = \frac{1}{1 - x}, \quad u = \frac{\ln x}{1 - x}. \tag{24.45}$$
(3) Case [22]: $\mu_1 - \mu_2 \in \mathbf{N}\setminus\{0\}$ with a logarithmic term.
$$(x^2 - 1)x^2y'' - (x^2 + 1)xy' + (x^2 + 1)y = 0 \tag{24.46}$$
with
$$v = x, \quad u = x\ln x + \frac{1}{2x}. \tag{24.47}$$
(4) Case [22]: $\mu_1 - \mu_2 \in \mathbf{N}\setminus\{0\}$ without any logarithmic term (cf. 27A.19, 27A.25).
$$x^2y'' + xy' + \left(x^2 - \frac{1}{4}\right)y = 0 \tag{24.48}$$
with
$$v = \frac{\sin x}{\sqrt{x}}, \quad u = \frac{\cos x}{\sqrt{x}}. \tag{24.49}$$
See 27A.2 also, for example.
347 They are taken from E. Kreyszig, Advanced Engineering Mathematics (Wiley, 1983, Fifth edition) p163.
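24B.11 Singularity at infinity. To study the singularity of the equation (24.8) at infinity, we introduce $\zeta = z^{-1}$ as usual in complex function theory. The equation reads in terms of $\zeta$
$$\frac{d^2u}{d\zeta^2} + \left[\frac{2}{\zeta} - \frac{1}{\zeta^2}P\!\left(\frac{1}{\zeta}\right)\right]\frac{du}{d\zeta} + \frac{1}{\zeta^4}Q\!\left(\frac{1}{\zeta}\right)u = 0. \tag{24.50}$$
Therefore (→24B.2),
(1) If $2z - z^2P(z)$ and $z^4Q(z)$ are regular at $\infty$, $z = \infty$ is a non-singular point.
(2) If $zP(z)$ and $z^2Q(z)$ are regular at $\infty$, then $z = \infty$ is a regular singular point.
(3) Otherwise, $z = \infty$ is an irregular singular point.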
24B.12 How to solve inhomogeneous problem. To solve the inhomogeneous version of (24.8)
$$\frac{d^2u}{dx^2} + P(x)\frac{du}{dx} + Q(x)u = f(x), \tag{24.51}$$
where $f$ is a piecewise continuous function, we have only to find one special solution to this inhomogeneous equation; the general solution is the sum of that for (24.8) and this special solution. If one cannot get it by inspection, then perhaps the most systematic way is to use Lagrange's method of variation of constants described in 11B.13.
24.C Representative Examples
24C.1 Legendre equation. If the method of separation of variables is used in the spherical coordinates for the Laplace equation (→2D.10), the angular part can further be split into the parts $\Theta(\theta)$ and $\Phi(\varphi)$ as (cf. 26A.2)
$$\frac{d^2\Phi}{d\varphi^2} + m^2\Phi = 0, \tag{24.52}$$
$$\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \left(\ell(\ell+1) - \frac{m^2}{\sin^2\theta}\right)\Theta = 0. \tag{24.53}$$
If there is no $\varphi$ dependence, then $m = 0$, and (24.53) simplifies to (→26B.6)
$$\frac{d^2P}{dz^2} - \frac{2z}{1-z^2}\frac{dP}{dz} + \frac{\ell(\ell+1)}{1-z^2}P = 0, \tag{24.54}$$
where $z = \cos\theta$ and $P(z) = \Theta(\theta)$. Or, we get
$$\frac{d}{dz}\left[(1-z^2)\frac{d}{dz}P\right] + \ell(\ell+1)P = 0, \tag{24.55}$$
which is called the Legendre equation. $z = \pm 1$ are regular singular points (→24B.2) of (24.54). ($z = \infty$ is also a regular singular point. See 24B.11.)
24C.2 Series expansion method applied to Legendre's equation; around z = 0. Since $z = 0$ is a regular point, solutions can be obtained in the series form $P(z) = \sum_{k=0}^\infty a_kz^k$ with the radius of convergence at least unity (→24B.1, 7.3).
(1) Introducing this into (24.55), we get
$$(n+1)(n+2)a_{n+2} + (\ell-n)(\ell+n+1)a_n = 0. \tag{24.56}$$
(2) This implies that $a_n$ can be expressed in terms of $a_0$ and $a_1$. The choice $a_0 = 1$, $a_1 = 0$ gives an even power series
$$P_{\rm even} = 1 - \frac{\ell(\ell+1)}{2!}z^2 + \frac{\ell(\ell+1)(\ell-2)(\ell+3)}{4!}z^4 - \cdots, \tag{24.57}$$
and $a_0 = 0$, $a_1 = 1$ gives an odd power series
$$P_{\rm odd} = z - \frac{(\ell-1)(\ell+2)}{3!}z^3 + \frac{(\ell-1)(\ell+2)(\ell-3)(\ell+4)}{5!}z^5 - \cdots. \tag{24.58}$$
(3) Notice that these two solutions make a fundamental system of solutions (→24A.11). If $\ell = n \in \mathbb{N}\setminus\{0\}$, then the series with the same parity as $n$ terminates and becomes a polynomial; suitably normalized, it is the Legendre polynomial $P_n(z)$ (→21B.2).
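The recurrence (24.56) is easy to iterate exactly. The sketch below (the function name `legendre_series` and the number of terms are illustrative choices, not part of the notes) generates the coefficients with rational arithmetic and shows that the even series terminates for $\ell = 2$ and the odd series for $\ell = 3$.

```python
from fractions import Fraction

def legendre_series(ell, a0, a1, n_terms):
    """Coefficients a_0 .. a_{n_terms-1} from (n+1)(n+2) a_{n+2} + (ell-n)(ell+n+1) a_n = 0."""
    a = [Fraction(a0), Fraction(a1)]
    for n in range(n_terms - 2):
        a.append(-Fraction((ell - n) * (ell + n + 1), (n + 1) * (n + 2)) * a[n])
    return a

even2 = legendre_series(2, 1, 0, 8)  # a_0 = 1, a_1 = 0, ell = 2
odd3 = legendre_series(3, 0, 1, 8)   # a_0 = 0, a_1 = 1, ell = 3
print(even2)
print(odd3)
```

For $\ell = 2$ the even series is $1 - 3z^2$, proportional to $P_2(z) = (3z^2-1)/2$; for $\ell = 3$ the odd series is $z - \frac{5}{3}z^3$, proportional to $P_3(z) = (5z^3-3z)/2$.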
24C.3 Series expansion method applied to Legendre's equation; around z = 1. The indicial equation (24.24) is $\phi(\mu) = \mu^2 = 0$, so this is the case [21] of Theorem 24B.7. One solution in the series form is
$$P_\ell(z) = \sum_{k=0}^{\infty}\frac{(\ell+1)(\ell+2)\cdots(\ell+k)\,(-\ell)(-\ell+1)\cdots(-\ell+k-1)}{(k!)^2}\left(\frac{1-z}{2}\right)^k. \tag{24.59}$$
This is called the Legendre function of degree $\ell$ of the first kind. Its partner in the fundamental system is obtained in the form of (24.28) (→24B.1). For a positive integer $\ell = n$
$$Q_n(z) = \frac{1}{2}P_n(z)\ln\frac{1+z}{1-z} - \sum_{k=1}^{\lfloor (n+1)/2\rfloor}\frac{2n-4k+3}{(2k-1)(n-k+1)}P_{n-2k+1}(z). \tag{24.60}$$
This is called the Legendre function of degree $n$ of the second kind. Since $P_n$ and $Q_n$ make a fundamental system of solutions (→24A.11), their zeros separate each other (→24A.12(2)).
[Figure: Legendre functions of the first and second kind, $P_n(x)$ and $Q_n(x)$, plotted for $0 \le x \le 1$.]
24C.4 Gauss' hypergeometric equation. The following equation is called Gauss' hypergeometric equation
$$z(1-z)\frac{d^2u}{dz^2} + [\gamma - (\alpha+\beta+1)z]\frac{du}{dz} - \alpha\beta u = 0, \tag{24.61}$$
where $\alpha$, $\beta$ and $\gamma$ are constants. $z = 0$, $1$ and $\infty$ are the regular singular points (→24B.2(1)). The indicial equation (→24B.4) around $z = 0$ is
$$\phi(\mu) = \mu(\mu - 1 + \gamma) = 0. \tag{24.62}$$
For $\mu = 0$ we can get (→24B.6) for $-\gamma \notin \mathbb{N}$
$$F(\alpha,\beta,\gamma;z) = \sum_{k=0}^\infty\frac{(\alpha)_k(\beta)_k}{k!\,(\gamma)_k}z^k, \tag{24.63}$$
where
$$(\lambda)_k = \lambda(\lambda+1)\cdots(\lambda+k-1). \tag{24.64}$$
$F$ is called the hypergeometric function. For $\mu = 1-\gamma$, if $\gamma - 2 \notin \mathbb{N}$, we get a partner of the above solution as
$$z^{1-\gamma}F(\alpha+1-\gamma, \beta+1-\gamma, 2-\gamma; z). \tag{24.65}$$
Notice that from (24.59)
$$P_\nu(z) = F(\nu+1, -\nu, 1; (1-z)/2). \tag{24.66}$$
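As a quick numerical illustration of (24.63), (24.64) and (24.66), the series can be summed term by term. This is a sketch only: `hyp2f1` is a hypothetical helper name, the fixed number of terms is an arbitrary choice, and the partial sum is meaningful only for $|z| < 1$ (or when the series terminates).

```python
import math

def hyp2f1(a, b, c, z, n_terms=200):
    """Partial sum of F(a, b, c; z) = sum_k (a)_k (b)_k / (k! (c)_k) z^k."""
    term, total = 1.0, 1.0
    for k in range(n_terms):
        # ratio of consecutive terms: (a+k)(b+k) / ((k+1)(c+k)) * z
        term *= (a + k) * (b + k) / ((k + 1) * (c + k)) * z
        total += term
    return total

z = 0.3
p2_direct = (3 * z * z - 1) / 2              # Legendre polynomial P_2(z)
p2_hyper = hyp2f1(3, -2, 1, (1 - z) / 2)     # (24.66) with nu = 2
log_direct = math.log(1 + z) / z
log_hyper = hyp2f1(1, 1, 2, -z)              # (1/z) log(1+z) = F(1, 1, 2; -z)
print(p2_direct, p2_hyper)
print(log_direct, log_hyper)
```

The second comparison is the identity $(1/z)\log(1+z) = F(1,1,2;-z)$, which follows from $(1)_k(1)_k/(k!\,(2)_k) = 1/(k+1)$.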
Discussion.
If we scale $z$ as $kz$ in Gauss's equation, we obtain the equation of the following form:
$$z(1-kz)u'' + (c - bz)u' - au = 0. \tag{24.67}$$
Its regular singular points are at $0$, $1/k$ and $\infty$. There are no other singularities. Take the $k \to 0$ limit to make $1/k$ confluent to $\infty$. Then, we obtain
$$zu'' + (c - bz)u' - au = 0. \tag{24.68}$$
If we set $b = 0$, the equation is Bessel's equation (→27A.1). Indeed, replacing $az$ with $-t^2/4$, and setting $c = \nu + 1$ and $v = t^\nu u$, we have
$$\frac{d^2v}{dt^2} + \frac{1}{t}\frac{dv}{dt} + \left(1 - \frac{\nu^2}{t^2}\right)v = 0. \tag{24.69}$$
It is obvious that $\infty$ is its irregular singularity (→24B.2).
24C.5 Associate Legendre functions. Consider the case with $m \neq 0$ for (24.53) (→24C.1). Using the same transformation of the variable $z = \cos\theta$, (24.53) becomes
$$\frac{d}{dz}\left((1-z^2)\frac{d\Theta}{dz}\right) + \left(\ell(\ell+1) - \frac{m^2}{1-z^2}\right)\Theta = 0. \tag{24.70}$$
$z = \pm 1$ are regular singular points (→24B.2). Instead of solving this with the aid of the series expansion, introduce $Z$ as
$$\Theta = (1-z^2)^{m/2}Z. \tag{24.71}$$
Then, we have
$$(1-z^2)\frac{d^2Z}{dz^2} - 2(m+1)z\frac{dZ}{dz} + (\ell-m)(\ell+m+1)Z = 0. \tag{24.72}$$
Differentiating Legendre's equation (24.55) $m$ times, we get
$$(1-z^2)\frac{d^{m+2}P}{dz^{m+2}} - 2(m+1)z\frac{d^{m+1}P}{dz^{m+1}} + (\ell-m)(\ell+m+1)\frac{d^mP}{dz^m} = 0. \tag{24.73}$$
Therefore, in terms of Legendre functions $P_\ell$ and $Q_\ell$ (→24C.3)
$$P_\ell^m(z) = (1-z^2)^{m/2}\frac{d^m}{dz^m}P_\ell(z), \quad Q_\ell^m(z) = (1-z^2)^{m/2}\frac{d^m}{dz^m}Q_\ell(z) \tag{24.74}$$
are the fundamental system of solutions (→24A.11) of (24.70), and are called associate Legendre functions (→26A.5-6). Notice that $P_\ell^m$ is not a polynomial, if $m$ is odd. Also
(24.75)
24C.6 Confluent hypergeometric equation. Replace $z$ in the hypergeometric equation (24.61) with $z/\beta$ and let $\beta \to \infty$. We get
$$z\frac{d^2u}{dz^2} + (\gamma - z)\frac{du}{dz} - \alpha u = 0. \tag{24.76}$$
This is called the confluent hypergeometric equation or Kummer's equation. $z = 0$ is a regular singular point (→24B.2), but $z = \infty$ is an irregular singular point (→24B.2), which is created by the confluence of two regular singular points $1$ (which is scaled to $\beta$ by the variable change) and $\infty$ of the hypergeometric equation. The indicial equation (24.24) is $\phi(\mu) = \mu(\mu-1) + \gamma\mu = 0$. The series solution method gives
$$u_1 = F(\alpha,\gamma;z), \quad u_2 = z^{1-\gamma}F(\alpha-\gamma+1, 2-\gamma; z), \tag{24.77}$$
where
$$F(\alpha,\gamma;z) = \sum_{k=0}^\infty\frac{(\alpha)_k}{k!\,(\gamma)_k}z^k, \quad \gamma \neq 0, -1, -2, \ldots. \tag{24.78}$$
This function is called the confluent hypergeometric function.
Exercise.
Show that
(1) $(1+z)^n = F(-n, \beta, \beta, -z)$,
(2) $(1/z)\log(1+z) = F(1, 1, 2, -z)$.
APPENDIX a24 Floquet Theory
a24.1 We consider (24.1) with periodic $A(x)$, that is, there is $\omega > 0$ such that
$$A(x+\omega) = A(x). \tag{24.79}$$
a24.2 Theorem [Floquet]. If $A$ in (24.1) is periodic, then there is a fundamental matrix such that
$$\Phi(x) = F(x)e^{x\Lambda}, \tag{24.80}$$
where $F$ is an $n \times n$ matrix with period $\omega$, and $\Lambda$ is a constant $n \times n$ matrix. □
[Demo] Let $\Phi(x)$ be a fundamental matrix (→24A.5) for (24.1). Then $\Phi(x+\omega)$ is also a fundamental matrix. Therefore, Theorem 24A.9 tells us that there is a constant non-singular matrix $M$ such that $\Phi(x+\omega) = \Phi(x)M$. Since $M$ is non-singular, its logarithm $\ln M = N$ is well defined. Define $\Lambda = N/\omega$, and set
$$F(x) = \Phi(x)e^{-x\Lambda}. \tag{24.81}$$
We get with the aid of $\Phi(x+\omega) = \Phi(x)M$
$$\Phi(x+\omega) = F(x+\omega)e^{(x+\omega)\Lambda} = F(x+\omega)e^{x\Lambda}M = F(x)e^{x\Lambda}M. \tag{24.82}$$
Hence,
$$F(x+\omega) = F(x). \tag{24.83}$$
In other words,
a24.3 Theorem. A linear ordinary differential equation (24.1) with a periodic matrix $A$ can be converted into a constant coefficient ordinary differential equation
$$\frac{dv(x)}{dx} = \Lambda v(x) \tag{24.84}$$
with $u = F(x)v$, where $F$ is defined by (24.81). □
a24.4 Characteristic exponents. The eigenvalues of $\Lambda$ in (24.81) are called the characteristic exponents. There is no systematic way to obtain these exponents.
25 Asymptotic Expansion
A formal expansion of a solution of a linear ODE discussed in the previous section around an irregular singular point generally gives a divergent series, but the series may still be useful as an asymptotic series. Almost all the expansion series obtained by perturbation calculations in physics are divergent but asymptotic series. The famous perturbation series of QED are examples. We cannot uniquely reconstruct the function from its asymptotic series expansion in general, but we can with some auxiliary conditions. A famous example is Borel summability.
Key words: asymptotic sequence, asymptotic series, optimal truncation, Watson's lemma, Laplace's method, Stirling's formula, acceleration of convergence, Borel sum, Borel transformation, Nevanlinna's theorem.
Summary:
(1) If Frobenius' method is blindly applied around an irregular singular point, we usually obtain divergent formal series, but they are often asymptotic (25.1). Most perturbation series in physics are only asymptotic (25.17).
(2) Divergence does not automatically mean asymptoticity; a series is an asymptotic expansion of a function, if the truncation error at the n-th order is smaller than the n-th order term (25.3). Therefore, its optimal truncation (25.5) is practically very useful.
(3) Computation involving asymptotic series can be performed termwise except for differentiation (25.10).
(4) There are several standard methods to obtain the asymptotic expansion of functions and integrals (25.11-13, 25.15).
(5) The asymptotic expansion (in terms of a given asymptotic sequence) of a function is unique (25.6), but an asymptotic series cannot uniquely determine a function (25.7).
(6) However, if the function satisfies certain auxiliary conditions, then it can be recovered from the asymptotic series. The most important condition is the Borel summability (Nevanlinna's theorem 25.20). In this case the Borel summation allows us to reconstruct the function (25.18-20).
25.1 Irregular singularity and divergence. Try to solve (24.8) following Frobenius (24B) blindly, assuming that $x = 0$ is an irregular singular point (→24B.2):
$$u(x) = x^\lambda\sum_{k=0}^\infty c_kx^k. \tag{25.1}$$
Formally, we get a set of formulas for $c_k$ and $\lambda$ as in 24B.3. If, fortunately, $c_l = 0$ for all $l$ larger than some $N$, we can get a regular solution. However, this is an accidental case, and usually we can prove that for some $k > 0$
$$\lim_{n\to\infty}\left|\frac{c_{n-k}}{c_n}\right| = 0, \tag{25.2}$$
that is, the series (25.1) is divergent.348 However, the resultant divergent series may be used as an asymptotic series around $x = 0$.
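25.2 Asymptotic sequence. Let $\{\phi_n(x)\}$ be an infinite sequence of continuous functions. If $\phi_{n+1}(x) = o[\phi_n(x)]$ around $x_0$, i.e.,
$$\lim_{x\to x_0}\frac{\phi_{n+1}(x)}{\phi_n(x)} = 0 \tag{25.3}$$
for all $n > 0$, the sequence is called an asymptotic sequence (around $x_0$).
25.3 Asymptotic series. Let $\{\phi_n\}$ be an asymptotic sequence around $x_0$. Then, the following formal series
$$\sum_{n=0}^\infty a_n\phi_n(x) \tag{25.4}$$
is called an asymptotic series for a function $f$ at $x_0$, if for each fixed $n$
$$f(x) - \sum_{k=0}^n a_k\phi_k(x) = o[\phi_n(x)] \tag{25.5}$$
as $x \to x_0$. That is, if
$$\lim_{x\to x_0}\frac{f(x) - \sum_{k=0}^n a_k\phi_k(x)}{\phi_n(x)} = 0 \tag{25.6}$$
for all $n$, we say (25.5) is an asymptotic expansion of $f$ around $x_0$ in terms of the asymptotic function sequence $\{\phi_i\}$, and write
$$f(x) \sim \sum_{n=0}^\infty a_n\phi_n(x). \tag{25.7}$$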
Discussion.
(A) Let $A_a(\alpha,\beta)$ denote the angular region
$$A_a(\alpha,\beta) \equiv \{z \mid \alpha < {\rm Arg}(z-a) < \beta\}, \tag{25.8}$$
where Arg is the principal argument (→a4.7). We say a function $f$ is expanded in the (generalized) asymptotic power series around $a$ in the angular region $A_a(\alpha,\beta)$, if (25.5) holds when $z \to a$ is taken inside the angular region. The boundary of the maximal angular region where a given asymptotic expansion holds is called a Stokes line (→25.8).
(B) The Stirling formula 9.11 is admissible in the angular region $A_0(-\pi,\pi)$. This can be shown with the aid of
$$\log\Gamma(z) = \left(z - \frac{1}{2}\right)\log z - z + \log\sqrt{2\pi} - \frac{1}{\pi}\int_0^\infty\frac{z}{z^2+t^2}\log\left(1 - e^{-2\pi t}\right)dt. \tag{25.9}$$
348 See E L Ince, Ordinary Differential Equations (Dover, 1956; original 1926), p422. Also see W R Wasow, Asymptotic Expansions for Ordinary Differential Equations (Interscience, 1965).
25.4 Example. A typical example is:
$$F(x) = \int_0^\infty\frac{e^{-t/x}}{1+t}dt \sim \sum_{n=0}^\infty(-1)^n n!\,x^{n+1}. \tag{25.10}$$
This is an asymptotic series around $x = 0$. If $x = 1/2$, then the series reads
$$\frac{1}{2} - \frac{1}{4} + \frac{1}{4} - \frac{3}{8} + \frac{3}{4} - \cdots. \tag{25.11}$$
This is hardly useful. However, if $x$ is small, then the series should be usable as a numerical tool:349
$$F(0.1) \sim \frac{1}{10} - \frac{1}{100} + \frac{2}{1000} - \frac{6}{10000} + \cdots. \tag{25.12}$$
25.5 Optimal truncation of asymptotic series. As is clear from the definition 25.3, to evaluate $f(\epsilon)$, if we truncate the asymptotic series at the $n$-th order, then the error (i.e., the difference between the true value and the estimate obtained from the truncated series) is asymptotically smaller than $a_n\phi_n(\epsilon)$. Hence, for a given $\epsilon$ we can find an optimal $n$ to truncate the series by looking for the $n$ which minimizes $|a_n\phi_n(\epsilon)|$.
For example, for (25.10), with the aid of Stirling's formula (→9.11, also see 25.14)
$$n!\,x^{n+1} \sim e^{(n+1)\ln x + n\ln(n/e)}. \tag{25.13}$$
The exponent is stationary with respect to $n$ when $\ln x + \ln n = 0$. Hence, $n \sim 1/x$ gives the optimal truncation position.
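The optimal truncation for (25.10) can be seen numerically. The sketch below compares the partial sums of the series with a direct quadrature of the integral at $x = 0.1$. Simpson's rule, the cutoff of the integration range and the names `F_numeric` and `partial_sums` are assumptions made for this illustration, not part of the notes.

```python
import math

def F_numeric(x, upper=60.0, n=60000):
    """Simpson's rule for F(x) = x * int_0^inf e^{-s} / (1 + x s) ds (t = x s in (25.10))."""
    h = upper / n
    g = lambda s: math.exp(-s) / (1 + x * s)
    total = g(0.0) + g(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return x * total * h / 3

def partial_sums(x, n_max):
    """Partial sums of sum_n (-1)^n n! x^(n+1), truncated at n = 0, 1, ..., n_max."""
    sums, s = [], 0.0
    for n in range(n_max + 1):
        s += (-1) ** n * math.factorial(n) * x ** (n + 1)
        sums.append(s)
    return sums

x = 0.1
exact = F_numeric(x)
errors = [abs(s - exact) for s in partial_sums(x, 25)]
best_n = min(range(len(errors)), key=errors.__getitem__)
print("F(0.1) by quadrature:", exact)
print("best truncation order:", best_n, "error:", errors[best_n])
print("error at n = 25:", errors[25])
```

The error first decreases, reaches its minimum near $n \sim 1/x = 10$, and then grows without bound, which is the behavior described in 25.5.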
Discussion: How to efficiently compute series.
(1) Euler transformation. Let
$$f(x) = \sum_{n=0}^\infty a_nx^n \tag{25.14}$$
be a convergent series. Define the difference operator $D$ as
$$Da_n = a_{n+1} - a_n. \tag{25.15}$$
Then,
$$f(x) = (1-x)^{-1}a_0 + (1-x)^{-1}\sum_{n=0}^\infty Da_n\,x^{n+1}. \tag{25.16}$$
This transformation is called the Euler transformation. Practically it is wise to use this only beyond some finite number of terms.
(2) Subtraction trick. The above idea may be understood as subtracting the expansion of $(1-x)^{-1}a_0$ from $f(x)$. If we could find a function $g$ which is close to $f$ and easily expandable analytically, then considering $f - g$ may be a good idea to compute the series for $f$. For example, to compute
$$f = \sum_{n=1}^\infty\frac{1}{1+n^2}, \tag{25.17}$$
it is advantageous to use the knowledge
$$\sum_{n=1}^\infty\frac{1}{n(n+1)} = 1. \tag{25.18}$$
Hence,
$$f = 1 + \sum_{n=1}^\infty\left[\frac{1}{1+n^2} - \frac{1}{n(n+1)}\right] = 1 + \sum_{n=1}^\infty\frac{n-1}{n(n+1)(1+n^2)}, \tag{25.19}$$
whose terms decay as $n^{-3}$ instead of $n^{-2}$.
(3) We wish to compute
$$S \equiv \sum_{n=1}^\infty\frac{1}{1+n^2}. \tag{25.20}$$
(i) The remainder satisfies the following inequalities
$$\int_N^\infty\frac{dx}{1+x^2} < S_N \equiv \sum_{n=N}^\infty\frac{1}{1+n^2} < \int_{N-1}^\infty\frac{dx}{1+x^2}. \tag{25.21}$$
Using this, find the necessary number of terms to obtain $S$ within a 0.01% error.
(ii) Now we use the subtraction trick (→D7.5) with the aid of
(25.22)
What $N$ do you need to obtain the same accuracy?
(4) The same idea works for integrals as well. Consider
$$I(\epsilon) = \int_0^1\frac{1}{\sqrt{\epsilon+x}}dx. \tag{25.23}$$
In this case $I(0) = 2$ is easy, so let us subtract $1/\sqrt{x}$:
$$I(\epsilon) - I(0) = \int_0^1\left[\frac{1}{\sqrt{\epsilon+x}} - \frac{1}{\sqrt{x}}\right]dx. \tag{25.24}$$
Introducing $u = x/\epsilon$ (rescaling trick), we realize that this integral is of order $\epsilon^{1/2}$. The integration range may be replaced by $[0,\infty)$ to the lowest nontrivial order.
349 Read a conversation between a numerical analyst and an asymptotic analyst on p19 of N. G. de Bruijn, Asymptotic Methods in Analysis (Dover, 1958, 1981).
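The gain from the subtraction trick in item (2) of the Discussion can be checked directly. In this sketch the closed form $\sum_{n\ge1}1/(1+n^2) = (\pi\coth\pi - 1)/2$ is used as the reference value; it is a standard result quoted here only for comparison, and the function names are illustrative.

```python
import math

def direct(N):
    """Plain partial sum of 1/(1+n^2), n = 1 .. N."""
    return sum(1.0 / (1 + n * n) for n in range(1, N + 1))

def subtracted(N):
    """Partial sum after subtracting 1/(n(n+1)), whose full sum is exactly 1."""
    # 1/(1+n^2) - 1/(n(n+1)) = (n-1) / (n (n+1) (1+n^2))
    return 1.0 + sum((n - 1) / (n * (n + 1) * (1.0 + n * n)) for n in range(1, N + 1))

exact = (math.pi / math.tanh(math.pi) - 1) / 2
for N in (10, 100, 1000):
    print(N, abs(direct(N) - exact), abs(subtracted(N) - exact))
```

The error of the plain sum decays like $1/N$, that of the subtracted sum like $1/(2N^2)$, so far fewer terms are needed for a given accuracy.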
25.6 Uniqueness of asymptotic expansion. The asymptotic expansion up to a given number of terms of a given function is unique if an asymptotic sequence is specified. □
This follows from the explicit formula for the coefficients:
$$a_n = \lim_{x\to x_0}\frac{f(x) - \sum_{k=0}^{n-1}a_k\phi_k(x)}{\phi_n(x)}.$$
25.7 Warning. However, an asymptotic series cannot uniquely determine a function. $(1+x)^{-1}$, $(1+e^{-x})/(1+x)$ and $(1+e^{-\sqrt{x}}+x)^{-1}$ all have the same asymptotic expansion $\sum(-1)^{n-1}x^{-n}$ ($x \to \infty$) (Demonstrate this statement). If we try to asymptotically expand $e^{-1/x}$ in terms of the asymptotic sequence $\{x^n\}$ ($x \to +0$), all the coefficients vanish, but obviously the function is not equal to 0. Hence, we cannot generally recover a function from its asymptotic expansion, because transcendentally small terms are ignored by asymptotic expansion.
25.8 Stokes line. The transcendentally small term $e^{-1/x}$ ($x \to +0$) cannot be seen through asymptotic expansions as seen in 25.7. However, obviously this is no longer small for $x < 0$. Hence, if we consider the function $f(x)$ as a function $f(z)$ of the complex variable $z$ instead of $x$, then its 'expandability into asymptotic series' should change drastically according to the sectors or regions on the complex plane. The occurrence of this drastic change is called Stokes' phenomenon and the boundary of these regions is called a Stokes line. The existence of this phenomenon signifies nonconvergent asymptotic series.
25.9 Convergent power series is asymptotic. If $f(x)$ is Taylor expandable at $x = a$ (i.e., is analytic (→7.1) around $a$), then the Taylor series is an asymptotic series. Conversely, if $f(x)$ is holomorphic (→5.4) and single valued in $0 < |x - a| < r$ for some positive $r$ and has an asymptotic power series as $x \to a$, then $a$ is a removable singularity (→8A.5(4)(i)), and the asymptotic series is the Taylor series of $f$ around $a$.350
25.10 Operations with asymptotic series.
(1) Termwise addition and subtraction of two asymptotic series (with the same asymptotic sequence 25.2) is again an asymptotic series.
(2) In the case of power series $f \sim \sum a_nx^n$ and $g \sim \sum b_nx^n$, their product $fg$ has the asymptotic power series $\sum c_nx^n$ with $c_n = \sum_{r=0}^n a_{n-r}b_r$.
(3) Also for power series the asymptotic series of $f(g)$ is obtained from that of $f$ and $g$ by substitution.
(4) The termwise integration of the power asymptotic series is the asymptotic series of the integral:
$$\int_0^x f(x)dx \sim \sum_{n=0}^\infty\frac{a_n}{n+1}x^{n+1}. \tag{25.26}$$
(5) However, termwise differentiation may not be allowed. A famous counter example is $e^{-1/x}\sin(e^{1/x})$, which has 0 as its asymptotic power series as guessed easily from 25.7, but its derivative cannot be expanded in powers.
(6) Termwise differentiation is allowed if the derivative of the function also has an asymptotic expansion. See the Discussion below.
Discussion.
In this case, if $f$ is holomorphic near $a$ in the angular region and has an asymptotic power series, then termwise differentiation is allowed so long as $a$ is reached within $A_a(\alpha,\beta)$ (→25.3 Discussion (A)).
350 Encyclopedic Dictionary of Mathematics vol I p124-6.
25.11 How to obtain expansion I: Integration by parts.
(1) Let us estimate the tail of the normal distribution
$$G(x) = \frac{1}{\sqrt{2\pi}}\int_x^\infty e^{-y^2/2}dy. \tag{25.27}$$
Integrating by parts, we get
$$\sqrt{2\pi}\,G(x) = \frac{1}{x}e^{-x^2/2} - \int_x^\infty\frac{1}{y^2}e^{-y^2/2}dy. \tag{25.28}$$
From this we easily get, integrating by parts once more,
$$\sqrt{2\pi}\,G(x) = e^{-x^2/2}\left(\frac{1}{x} - \frac{1}{x^3}\right) + 3\int_x^\infty\frac{1}{y^4}e^{-y^2/2}dy. \tag{25.29}$$
This suggests that $G(x)\exp(x^2/2)$ can be asymptotically expanded in powers of $x^{-1}$. See also 25.12.
(2)
$$-{\rm Ei}(-t) = \int_t^\infty\frac{e^{-s}}{s}ds = \frac{e^{-t}}{t} - \int_t^\infty\frac{e^{-s}}{s^2}ds \tag{25.30}$$
etc., gives an asymptotic expansion.
(3) The decay rate of the Fourier expansion coefficients of a $C^k$-function discussed in 17.14 is an application of this method thanks to the Riemann-Lebesgue lemma (→17.11).
(4) Fourier expansion of piecewise $C^k$-functions. To compute
$$\int f(x)e^{i\omega x}dx \tag{25.31}$$
we decompose the integration range into piecewise $C^k$ sections, and then estimate the integral asymptotically by integration by parts (again thanks to the Riemann-Lebesgue lemma) in each section.
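Repeating the integration by parts in item (1) gives the series $\sqrt{2\pi}\,G(x) \sim e^{-x^2/2}x^{-1}(1 - 1/x^2 + 3/x^4 - 15/x^6 + \cdots)$. The following sketch compares its first few truncations with the exact tail, expressed through the complementary error function of the standard library; the function names are illustrative.

```python
import math

def tail_exact(x):
    """G(x) = (1/sqrt(2 pi)) int_x^inf exp(-y^2/2) dy = erfc(x / sqrt 2) / 2."""
    return 0.5 * math.erfc(x / math.sqrt(2))

def tail_asymptotic(x, n_terms):
    """exp(-x^2/2) / (sqrt(2 pi) x) * sum_{k < n_terms} (-1)^k (2k-1)!! / x^(2k)."""
    total, term = 0.0, 1.0
    for k in range(n_terms):
        total += term
        term *= -(2 * k + 1) / (x * x)  # next coefficient: multiply by -(2k+1)/x^2
    return math.exp(-x * x / 2) / (math.sqrt(2 * math.pi) * x) * total

x = 4.0
rel_errors = [abs(tail_asymptotic(x, n) / tail_exact(x) - 1) for n in (1, 2, 3)]
print(rel_errors)
```

Already at $x = 4$ each additional term reduces the relative error, although the series diverges if continued indefinitely at fixed $x$.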
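Exercise.
(A) Find the asymptotic expansion of Fresnel integrals
$$C(x) \equiv \int_0^x\cos\frac{\pi u^2}{2}du; \quad S(x) \equiv \int_0^x\sin\frac{\pi u^2}{2}du. \tag{25.32}$$
[Hint. Use the value of $\int_0^\infty$ (→8B.8).]
(B) Approximate estimation of integrals351
(1)
$$I(x) = \int_0^x e^{t^2}\frac{dt}{\sqrt{x^2-t^2}}. \tag{25.33}$$
For $x \ll 1$, we may replace $e^{t^2} \simeq 1$. For $x \gg 1$, we introduce $\xi = x - t$, and
$$I(x) = e^{x^2}\int_0^x e^{-2\xi x+\xi^2}\frac{d\xi}{\sqrt{2\xi x - \xi^2}}. \tag{25.34}$$
Plotting the exponent in the integrand, we realize that the exponential factor is the largest when $\xi = 0$, so that
$$I(x) \simeq e^{x^2}\int_0^x e^{-2\xi x}\frac{d\xi}{\sqrt{2x\xi}} \simeq \frac{e^{x^2}}{2x}\int_0^\infty e^{-z}\frac{dz}{\sqrt{z}} = \frac{\sqrt{\pi}\,e^{x^2}}{2x}. \tag{25.35}$$
(2)
$$I(a,b) = \int_0^\infty e^{-ax^2}\sin^2(bx)\,dx. \tag{25.36}$$
This can be rewritten as
$$I(a,b) = \frac{1}{\sqrt{a}}\int_0^\infty e^{-z^2}\sin^2\left(\frac{b}{\sqrt{a}}z\right)dz. \tag{25.37}$$
If $b \gg \sqrt{a}$, then the sine factor oscillates very rapidly, so we may replace it with its average value 1/2. Therefore, $I \simeq \sqrt{\pi}/(4\sqrt{a})$, i.e., $I \sim 1/\sqrt{a}$. Compare this with the exact value of $I$.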
25.12 How to obtain asymptotic series II: Watson's lemma. Consider the following Laplace integral352
$$F(s) = \int_0^\infty e^{-st}f(t)dt. \tag{25.38}$$
Assume that $f(t)$ has a power series expansion
$$f(t) = \sum_{n=0}^\infty\frac{a_n}{n!}t^n \tag{25.39}$$
with the radius of convergence $R$. Replace $f$ in the integral (25.38) with its series expansion (25.39), and perform the integration termwise, using $\int_0^\infty e^{-st}t^ndt = n!/s^{n+1}$. Then we get the following formal result:
$$F(s) \doteq \sum_{n=0}^\infty\frac{a_n}{s^{n+1}}. \tag{25.40}$$
Watson's Lemma. If there is $\alpha > 0$ such that $|f(t)| = O[e^{\alpha t}]$ for sufficiently large $t$, then (25.40) is actually an asymptotic expansion of $F$ around $s = \infty$. □353
Example. An asymptotic expansion of the error function may easily be obtained with the help of Watson's lemma:
$${\rm Erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^\infty e^{-t^2}dt = \frac{2}{\sqrt{\pi}}e^{-x^2}\int_0^\infty e^{-2xt-t^2}dt. \tag{25.41}$$
Now introduce $u = xt$ and expand $e^{-u^2/x^2}$ in power series.
$${\rm Erfc}(x) = \frac{2e^{-x^2}}{x\sqrt{\pi}}\int_0^\infty e^{-2u}\left(1 - \frac{u^2}{x^2} + \frac{u^4}{2x^4} - \frac{u^6}{6x^6} + \cdots\right)du. \tag{25.42}$$
This lemma can be used to estimate the asymptotic form of Fourier transforms as well.
351 Migdal.
352 $F$ is the Laplace transform of $f$ (→33).
Exercise.
(1) Show for $x > 0$
(25.43)
[Hint. Use $s = xt$ as a new integration variable.]
(2) Show
$$\int_0^\infty\frac{e^{-xt}}{1+t}dt \sim \sum_{n=0}^\infty\frac{(-1)^n n!}{x^{n+1}}. \tag{25.44}$$
(3) The asymptotic expansion of Ci and si:
$${\rm Ci}(x) \equiv \int_x^\infty\frac{\cos t}{t}dt, \quad {\rm si}(x) \equiv \int_x^\infty\frac{\sin t}{t}dt. \tag{25.45}$$
These are the real and imaginary parts of
$$J(x) = \int_x^\infty\frac{e^{it}}{t}dt = \int_0^\infty\frac{e^{i(x+u)}}{x+u}du. \tag{25.46}$$
Apply Watson's lemma to obtain the asymptotic expansions of these functions. Of course, repeated integration by parts should also work as can be guessed from the example in 25.11.
(4) (This problem need not be here.) Find the asymptotic expansion in the $x \to \infty$ limit of
(25.47)
in powers of $1/x$. For $n = 1$, what is the optimum truncation of the resultant asymptotic series to compute $E_1(N)$?
353 For a proof see B. Friedman, Lectures on Application-Oriented Mathematics (Wiley, 1969), p78.
25.13 How to obtain asymptotic series III: Laplace's method. Consider
$$F(\theta) = \int_{-\infty}^{+\infty}e^{\theta h(x)}dx, \tag{25.48}$$
where $h$ is a real $C^2$-function with the following properties:
(i) $h(0) = 0$ is an absolute maximum of $h$, and $h < 0$ for any nonzero $x$.
(ii) There are positive constants $a$ and $b$ such that $h \le -a$ for $|x| \ge b$.
We must of course assume that the integral converges for sufficiently large $\theta$. Then, in the $\theta \to \infty$ limit, we get
$$F(\theta) \sim \sqrt{2\pi}\,(-\theta h''(0))^{-1/2}. \tag{25.49}$$
25.14 Gamma function and Stirling's formula. Although we can apply Watson's lemma to get the asymptotic expansion of the Gamma function (→9.1), it is not very easy, so we use the Laplace method. Substituting $t = z(1+x)$ in (9.3), we get
$$\Gamma(z+1) = e^{-z}z^{z+1}\int_{-1}^\infty\left[e^{-x}(1+x)\right]^zdx. \tag{25.50}$$
$h$ in 25.13 reads $-x + \ln(1+x)$, so it satisfies the condition of Laplace's method, and $h''(0) = -1$. Hence, we get
$$\Gamma(z+1) \sim \sqrt{2\pi}\,z^{z+1/2}e^{-z}, \tag{25.51}$$
which is the famous Stirling's formula (→9.11) obtained by Laplace in this way.
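The quality of the leading Laplace estimate $\Gamma(z+1) \sim \sqrt{2\pi}\,z^{z+1/2}e^{-z}$ can be checked against the Gamma function of the standard library. A minimal sketch follows; the sample values of $z$ are arbitrary.

```python
import math

def stirling(z):
    """Leading-order Stirling estimate of Gamma(z+1): sqrt(2 pi) z^(z+1/2) e^(-z)."""
    return math.sqrt(2 * math.pi) * z ** (z + 0.5) * math.exp(-z)

ratios = {z: math.gamma(z + 1) / stirling(z) for z in (1, 5, 10, 50)}
for z, r in ratios.items():
    print(z, r)
```

The ratio approaches 1 from above as $z$ grows; the deviation is roughly $1/(12z)$, the first correction term of the full Stirling series.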
25.15 How to obtain asymptotic series IV: Method of steepest descent. This is perhaps the most famous method to obtain asymptotic expansions of integrals. The principle is explained as follows. We wish to compute the following contour integral on the complex plane
$$I = \int_C G(z)e^{tf(z)}dz, \tag{25.52}$$
where $C$ is a contour from infinity to infinity on the complex plane such that on both ends the real part of the holomorphic function (→5.4) $f$ goes to $-\infty$. $G$ is also assumed to be holomorphic and $t$ is a large positive constant. Let us split $f$ into its real and imaginary parts as $f = \phi + i\psi$. Since $\phi$ is a harmonic function (→5.6), it can have a saddle point (→29.6) $z^*$, which satisfies $f'(z^*) = 0$. Modify the contour $C$ to $C^*$ so that it passes through $z^*$ and is parallel to grad $\phi$ near $z^*$. Along this path
$$f(z) = f(z^*) + \frac{1}{2}(z-z^*)^2f''(z^*) + \cdots \tag{25.53}$$
and $\psi$ must be almost constant, because the Cauchy-Riemann equation (→5.3) tells us that gradients of $\phi$ and $\psi$ are orthogonal. Hence the second term in the above expansion along $C^*$ near $z^*$ must be real non-positive. We may introduce a real coordinate $\zeta$ such that $f(z) = f(z^*) - \zeta^2/2 + \cdots$. Let $\alpha$ be the angle between the real axis and the tangent of $C^*$ at $z^*$. Then
$$z - z^* = \frac{e^{i\alpha}\zeta}{|f''(z^*)|^{1/2}}. \tag{25.54}$$
Changing the integration variable, we get
$$I \simeq e^{tf(z^*)}G(z^*)e^{i\alpha}\left(\frac{2\pi}{t|f''(z^*)|}\right)^{1/2}. \tag{25.55}$$
25.16 Acceleration or improvement of asymptotic series. If we could convert the asymptotic series around 0 in powers of $x$ into another asymptotic series in terms of an asymptotic sequence converging much more quickly to 0 than $x^n$, then the asymptotic estimate should become much more accurate. An example is given here.354
Consider (25.10) with $x$ replaced by $1/x$:
$$F(x) = \frac{1}{x} - \frac{1}{x^2} + \frac{2}{x^3} - \cdots. \tag{25.56}$$
We wish to convert this into the power series in $y = \phi(x)$. We assume $y/x \to 1$ in the $x \to \infty$ limit, and the Taylor-expandability: $x/y = 1 + a/y + b/y^2 + \cdots$. Substituting this into (25.56), we get
$$F(x) = \frac{1}{y+a+b/y} - \frac{1}{(y+a)^2} + \cdots = \frac{1}{y}\left(1 - \frac{a}{y} + \frac{a^2}{y^2} - \frac{b}{y^2} + \cdots\right) - \frac{1}{y^2}\left(1 - \frac{2a}{y} + \cdots\right) + \cdots. \tag{25.57}$$
Hence, choosing $a = -1$, we can kill the $1/y^2$ term. That is, we get
$$F(x) = \frac{1}{x+1} + O\left[\frac{1}{(x+1)^3}\right]. \tag{25.58}$$
This is much better than the original expansion for $x \gg 1$. Of course, one should not believe that the improvement is increasingly better if we continue this procedure indefinitely; the outcome is still an asymptotic expansion.
354 See, for example, C. N. Moore, Summable Series and Convergence Factors (Dover, 1966).
Discussion.
Consider the summation
$$S = \sum_{r=1}^\infty f(r), \tag{25.59}$$
where $f$ is well-behaved. Let $S(n)$ be the partial sum up to the $n$-th term. Then, often
$$S(n) = S + \frac{B}{n} + \frac{C}{n^2} + o[n^{-2}]. \tag{25.60}$$
This can be used to estimate $S$ from partial sums.
A variant of this idea is the estimation of an integral from numerical integration with increment $h$. Let the integral be $I$ and its approximately computed value with the increment $h$ be $I(h)$. Then, often
$$I(h) = I + Bh + Ch^2 + o[h^2]. \tag{25.61}$$
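One way to use (25.60) is to combine two partial sums so that the $B/n$ term cancels: $2S(2n) - S(n) = S + O(n^{-2})$. The sketch below applies this to $f(r) = 1/(1+r^2)$. Calling the combination Richardson extrapolation and using the closed form $(\pi\coth\pi - 1)/2$ as the reference value are additions made for this illustration, not something stated in the notes.

```python
import math

def partial(n):
    """Partial sum S(n) of f(r) = 1/(1+r^2), r = 1 .. n."""
    return sum(1.0 / (1 + r * r) for r in range(1, n + 1))

def richardson(n):
    """Cancel the B/n term of S(n) = S + B/n + C/n^2 + ... using S(n) and S(2n)."""
    return 2 * partial(2 * n) - partial(n)

exact = (math.pi / math.tanh(math.pi) - 1) / 2
print("S(100) error:          ", abs(partial(100) - exact))
print("2S(100) - S(50) error: ", abs(richardson(50) - exact))
```

Both estimates use at most 100 terms of the series, but the extrapolated value is far more accurate because its error is of order $n^{-2}$.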
25.17 Most perturbation series in physics are at best asymptotic. In field theory and statistical mechanics, in many cases we can perform analytical work only with the aid of some sort of perturbation techniques. The resultant perturbation series are usually divergent. Physicists often claim that they are asymptotic, but divergence does not automatically mean that the series is asymptotic. Hence, we have two problems: (1) To show that the series is asymptotic and (2) To recover the desired quantity from the asymptotic series. As we have seen in 25.7, (2) is impossible without some auxiliary knowledge about the function. Read Fejér's theorem (→17.10) for Fourier series. A certain summation method may recover the original function from a divergent series under appropriate auxiliary conditions. (For Fejér's theorem the needed auxiliary condition is the continuity of the function.) Thus, we may expect that a function satisfying certain auxiliary conditions could be recovered from its asymptotic series by a particular summation method. A representative method is the Borel summation (→25.18). Often the perturbation series in field theory are proved to be Borel summable (i.e., the original quantity can be recovered from its asymptotic series as a Borel sum).
25.18 Borel transform. Even if the RHS of
$$f(z) = \sum_{n=0}^\infty a_nz^n \tag{25.62}$$
diverges, its "Borel sum"
$$B(t) = \sum_{n=0}^\infty a_n\frac{t^n}{n!} \tag{25.63}$$
may converge. $B(t)$ is called the Borel transform of the series (25.62).
25.19 Heuristics. Consider
$$n! = \int_0^\infty t^ne^{-t}dt. \tag{25.64}$$
Inserting this into (25.62), and formally changing the order of integration and summation, we obtain
$$f(z) = \frac{1}{z}\int_0^\infty B(t)e^{-t/z}dt = \int_0^\infty B(\lambda z)e^{-\lambda}d\lambda. \tag{25.65}$$
Essentially, the Laplace transform (→33) of $B(t)$ is the desired function.
Exercise.
(1) Apply this to $(1+x)^{-1} \sim \sum(-1)^nx^n$.
(2) We can asymptotically expand as
$${\rm Erfc}(x) = \frac{2}{\sqrt{\pi}}\int_x^\infty e^{-t^2}dt \sim \frac{e^{-x^2}}{\sqrt{\pi}\,x}\sum_{n=0}^\infty\frac{(-1)^n(2n)!}{n!\,(2x)^{2n}}. \tag{25.66}$$
Apply the Borel summation method to this series and recover the error function.
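The heuristic formula (25.65) can be tried numerically on two series whose Borel transforms are known in closed form. This is a sketch under stated assumptions: the integral is truncated at a finite upper limit and evaluated by Simpson's rule, and `borel_sum` is a hypothetical helper name. For $a_n = (-1)^n$ (Exercise (1)) one has $B(t) = e^{-t}$; for the divergent series with $a_n = (-1)^n n!$ one has $B(t) = 1/(1+t)$ after analytic continuation beyond $|t| < 1$.

```python
import math

def borel_sum(B, z, upper=60.0, n=60000):
    """f(z) = int_0^inf B(lambda z) exp(-lambda) d lambda, by Simpson's rule on [0, upper]."""
    h = upper / n
    g = lambda lam: B(lam * z) * math.exp(-lam)
    total = g(0.0) + g(upper)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * g(i * h)
    return total * h / 3

# a_n = (-1)^n: B(t) = exp(-t); the Borel sum is 1/(1+z) even where the series diverges (z = 2).
geometric = borel_sum(lambda t: math.exp(-t), 2.0)

# a_n = (-1)^n n!: B(t) = 1/(1+t); the series diverges for every z != 0.
factorial_series = borel_sum(lambda t: 1.0 / (1.0 + t), 0.1)

print(geometric, 1 / 3)
print(factorial_series)
```

The second value is $F(0.1)/0.1$ for the function $F$ of (25.10), so the Borel sum reconstructs the function whose asymptotic series was truncated optimally in 25.5.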
25.20 Nevanlinna's theorem. Let $f(z)$ be analytic on the open disc $D$ in the figure and its asymptotic expansion satisfies
$$f(z) = \sum_{k=0}^{n-1}a_kz^k + R_n(z) \tag{25.67}$$
with
$$|R_n(z)| \le {\rm const.}\,\sigma^nn!\,|z|^n \tag{25.68}$$
uniformly for all $n$ and all $z \in D$ for some positive $\sigma$. Then (25.67) is Borel summable (→25.17). That is, the Borel transform $B(t)$ of the series converges for $|t| < \sigma^{-1}$ and can be analytically continued to an analytical function $B(t)$ (→7.10) on a strip containing the entire positive real axis. From this $f$ can be recovered as355
$$f(z) = \frac{1}{z}\int_0^\infty B(t)e^{-t/z}dt. \tag{25.69}$$
[Figure: the open disc $D$ in the complex $z$-plane.]
355 For an elegant proof see A D Sokal, J. Math. Phys. 21, 261-3 (1980). However, this is not the general form given by the original author. For applications, see, for example, Itzykson and Zuber, Quantum Field Theory (McGraw-Hill, 1980), Section 9.4; J. Zinn-Justin, Quantum Field Theory and Critical Phenomena (Clarendon Press, 1989), Section 27.
26 Spherical Harmonics
Separation of variables of the Laplace equation in the spherical coordinates requires the spherical harmonic functions which make a complete orthonormal set of functions of spatial directions (i.e., functions on a unit sphere). Derivation of functional forms, the orthonormal relation, the addition theorem related to the multipole expansion, and the application to PDE boundary value problems (potential problems) are discussed.
Key words: spherical harmonics, spherical harmonic function, addition theorem, multipole expansion, interior problem, exterior problem, annular problem
Summary:
(1) The angular part of the Laplacian in the spherical coordinates has the orthonormal eigenfunctions called spherical harmonics $Y_n^m$ (26A.8-9). They are simultaneous eigenfunctions of the total and the z-component of the quantum mechanical angular momentum (26A.10).
(2) The addition theorem is used to decouple two spatial directions (26A.12), and applied to the multipole expansion of the electrostatic potential (26A.14-15).
(3) Spherical potential problems have different general expansion forms depending on the domain of the problem (26B.2-5).
26.A Basic Theory
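26A.1 Separating variables in spherical coordinates. In the polar coordinate system, the 3-Laplacian reads (→2D.10)
$$\Delta = \frac{1}{r}\frac{\partial^2}{\partial r^2}r + \frac{1}{r^2}L, \tag{26.1}$$
where
$$L = \frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta} + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2}. \tag{26.2}$$
Separating the solution as $u(r,\theta,\varphi) = R(r)Y(\theta,\varphi)$, we get
$$\frac{d^2}{dr^2}rR(r) = \frac{l(l+1)}{r}R(r), \tag{26.3}$$
$$LY(\theta,\varphi) = -l(l+1)Y(\theta,\varphi). \tag{26.4}$$
$L$ is essentially the Laplacian on the unit sphere, and is a negative semidefinite operator.
26A.2 Further separation of angular variables. Let us further assume $Y(\theta,\varphi) = \Theta(\theta)\Phi(\varphi)$. The $\varphi$-direction must be the periodic direction, so the equation for $\Phi$ must be an eigenvalue problem (cf. 23.9 or 18.2). Hence,
$$\frac{d^2\Phi}{d\varphi^2} + m^2\Phi = 0 \tag{26.5}$$
with an integer $m$, and the rest is
$$\frac{1}{\sin\theta}\frac{d}{d\theta}\left(\sin\theta\frac{d\Theta}{d\theta}\right) + \left(l(l+1) - \frac{m^2}{\sin^2\theta}\right)\Theta = 0. \tag{26.6}$$
26A.3 Legendre's equation. If we introduce $x = \cos\theta$, then (26.6) reads
$$\frac{d}{dx}\left((1-x^2)\frac{d\Theta}{dx}\right) + \left(l(l+1) - \frac{m^2}{1-x^2}\right)\Theta = 0, \tag{26.7}$$
which is called (modified) Legendre's equation.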
26A.4 m = 0. For $m = 0$ Legendre's equation reads (→24C.1)
$$\frac{d}{dx}\left((1-x^2)\frac{d\Theta}{dx}\right) + l(l+1)\Theta = 0. \tag{26.8}$$
The general solution to this can be written as (→24C.3)
$$\Theta = AP_l(x) + BQ_l(x), \tag{26.9}$$
where $P_l$ and $Q_l$ are Legendre functions of first and second kind, respectively. $Q_l$ is divergent at $x = \pm 1$, so that for a sphere problem this function should not appear. Furthermore, $P_l$ is not finite at $x = -1$ if $l$ is not an integer. Hence, we need $P_n$ ($n \in \mathbb{N}$), the Legendre polynomials (→21B.2, 24C.2(3)). That is, $l$ must be a nonnegative integer (the eigenvalue problem has been solved).
26A.5 m ≠ 0. For convenience 24C.5 is repeated here. If we define $Z(x)$ by
$$\Theta(x) = (1-x^2)^{m/2}Z(x), \tag{26.10}$$
(26.7) with $l = n$ becomes
$$(1-x^2)\frac{d^2Z}{dx^2} - 2(m+1)x\frac{dZ}{dx} + (n-m)(n+m+1)Z = 0. \tag{26.11}$$
This equation can be obtained by differentiating the $m = 0$ equation (26.8) $m$ times. Therefore, the general solution of (26.7) is given by a linear combination of (→24C.5)
$$P_n^m(x) = (1-x^2)^{m/2}\frac{d^m}{dx^m}P_n(x), \quad Q_n^m(x) = (1-x^2)^{m/2}\frac{d^m}{dx^m}Q_n(x). \tag{26.12}$$
These functions are called associate functions of $P_n$ and $Q_n$. If we require that the solution is finite at $x = \pm 1$, then $P_n^m$ are the functions appearing in the solution.
[Figure: associate Legendre functions $P_n^m(x)$ plotted for $0 \le x \le 1$ (two panels).]
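26A.6 Associate Legendre functions. If $m$ is odd, then $P_n^m$ is not a polynomial:
$$P_1^1(x) = (1-x^2)^{1/2} = \sin\theta, \tag{26.13}$$
$$P_2^1(x) = 3(1-x^2)^{1/2}x = 3\sin\theta\cos\theta = \frac{3}{2}\sin 2\theta, \tag{26.14}$$
$$P_2^2(x) = 3(1-x^2) = 3\sin^2\theta = \frac{3}{2}(1-\cos 2\theta), \tag{26.15}$$
$$P_3^1(x) = \frac{3}{2}(1-x^2)^{1/2}(5x^2-1) = \frac{3}{8}(\sin\theta + 5\sin 3\theta), \tag{26.16}$$
$$P_3^2(x) = 15(1-x^2)x = \frac{15}{4}(\cos\theta - \cos 3\theta), \tag{26.17}$$
$$P_3^3(x) = 15(1-x^2)^{3/2} = 15\sin^3\theta = \frac{15}{4}(3\sin\theta - \sin 3\theta), \tag{26.18}$$
etc., where $x = \cos\theta$.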
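26A.7 Orthonormalization of associate Legendre functions. We have
$$\int_{-1}^1P_k^m(x)P_l^m(x)dx = \frac{(l+m)!}{(l-m)!}\frac{2}{2l+1}\delta_{k,l}. \tag{26.19}$$
[Demo]. The LHS is, for $l \ge m$, $k \ge m$
$$f(m) \equiv \int_{-1}^1(1-x^2)^m\frac{d^mP_k}{dx^m}\frac{d^mP_l}{dx^m}dx \tag{26.20}$$
$$= -\int_{-1}^1\frac{d^{m-1}P_k}{dx^{m-1}}\frac{d}{dx}\left[(1-x^2)^m\frac{d^mP_l}{dx^m}\right]dx, \tag{26.21}$$
where we have integrated by parts. On the other hand, replacing $m$ with $m-1$ and $n$ with $l$ in (26.11) and multiplying $(1-x^2)^{m-1}$, we get
$$\frac{d}{dx}\left[(1-x^2)^m\frac{d^mP_l}{dx^m}\right] = -(l+m)(l-m+1)(1-x^2)^{m-1}\frac{d^{m-1}P_l}{dx^{m-1}}. \tag{26.22}$$
Hence, (26.21) implies
$$f(m) = (l+m)(l-m+1)f(m-1) = \cdots = \frac{(l+m)!}{(l-m)!}f(0). \tag{26.23}$$
$f(0) = 2\delta_{k,l}/(2l+1)$ is obtained from 21A.5.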
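The normalization (26.19) can be checked numerically for $m = 1$ with the explicit functions listed in 26A.6. In this sketch the quadrature (Simpson's rule) and the names `simpson` and `rhs` are choices made for the illustration.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule with n (even) subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

root = lambda x: math.sqrt(max(0.0, 1 - x * x))  # guard against rounding at x = +-1
P11 = lambda x: root(x)
P21 = lambda x: 3 * x * root(x)
P31 = lambda x: 1.5 * root(x) * (5 * x * x - 1)

def rhs(l, m):
    """Right-hand side of (26.19) for k = l."""
    return math.factorial(l + m) / math.factorial(l - m) * 2 / (2 * l + 1)

norm21 = simpson(lambda x: P21(x) ** 2, -1, 1)
norm31 = simpson(lambda x: P31(x) ** 2, -1, 1)
cross13 = simpson(lambda x: P11(x) * P31(x), -1, 1)
print(norm21, rhs(2, 1))
print(norm31, rhs(3, 1))
print(cross13)
```

The two norms should agree with $12/5$ and $24/7$, and the cross term should vanish up to quadrature error.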
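26A.8 Spherical harmonics. Now we can construct a complete orthonormal set of $L^2(S^2, \sin\theta)$ ($S^2$ is the unit 2-sphere) (→20.19). Let us define the kets $\{|l,m\rangle\}$ by (→20.21-)
$$\langle\theta,\varphi|l,m\rangle = \sqrt{\frac{2l+1}{2}\frac{(l-|m|)!}{(l+|m|)!}}\,P_l^{|m|}(\cos\theta)\frac{1}{\sqrt{2\pi}}e^{im\varphi}, \tag{26.24}$$
where the ket $|\theta,\varphi\rangle$ satisfies (→20.23, 20.25)
$$\int_0^{2\pi}d\varphi\int_0^\pi d\theta\,|\theta,\varphi\rangle\sin\theta\langle\theta,\varphi| = 1 \tag{26.25}$$
and
$$\langle\theta,\varphi|\theta',\varphi'\rangle = \delta(\theta-\theta')\delta(\varphi-\varphi')/\sin\theta. \tag{26.26}$$
The function (26.24) is the spherical harmonic $Y_l^m(\theta,\varphi)$.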
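26A.9 Orthonormal relation for spherical harmonics. The decomposition of unity (→20.15) reads
$$1 = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}|l,m\rangle\langle l,m| \tag{26.27}$$
with the normalization
$$\langle l,m|l',m'\rangle = \delta_{l,l'}\delta_{m,m'}. \tag{26.28}$$
In the ordinary notation, with $Y_l^m(\theta,\varphi) = \langle\theta,\varphi|l,m\rangle$, these formulas read (→20.26-27)
$$\sum_{l=0}^{\infty}\sum_{m=-l}^{l}Y_l^m(\theta,\varphi)\overline{Y_l^m(\theta',\varphi')} = \frac{\delta(\theta-\theta')\delta(\varphi-\varphi')}{\sin\theta} \tag{26.29}$$
and
$$\int_0^{2\pi}d\varphi\int_0^{\pi}d\theta\sin\theta\,\overline{Y_l^m(\theta,\varphi)}\,Y_{l'}^{m'}(\theta,\varphi) = \delta_{l,l'}\delta_{m,m'}. \tag{26.30}$$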
26A.10 Angular momentum. Quantum mechanically, $-\hbar^2L$ is the operator for the square of the total angular momentum. $|l,m\rangle$ is the simultaneous eigenket of this operator and of the z-component $M_z$ of the angular momentum:
$$(i\hbar)^2L|l,m\rangle = \hbar^2l(l+1)|l,m\rangle, \tag{26.31}$$
$$M_z|l,m\rangle = m|l,m\rangle. \tag{26.32}$$
26A.11 Spherical harmonic function. A function $X$ of angular coordinates $\theta$ and $\varphi$ is called a spherical harmonic function of order $n$, if $r^nX$ becomes a harmonic function (→2C.11). $X$ satisfies
$$LX + n(n+1)X = 0, \tag{26.33}$$
where $L$ is in 26A.1. Because of the completeness (→17.3) of the spherical harmonics (essentially, its proof is in 37.1), any spherical harmonic function of order $n$ can be written as
$$X(\theta,\varphi) = \sum_{m=-n}^{n}A_mY_n^m(\theta,\varphi). \tag{26.34}$$
26A.12 Addition theorem. Let $\gamma$ be the angle between the directions specified by the angular coordinates $(\theta,\varphi)$ and $(\theta',\varphi')$ (footnote 356). Then,
$$P_n(\cos\gamma) = \frac{4\pi}{2n+1}\sum_{m=-n}^{n}\overline{Y_n^m(\theta',\varphi')}\,Y_n^m(\theta,\varphi). \tag{26.35}$$
This theorem allows us to decouple two directions.
[Demo]. Notice that $P_n(\cos\gamma)$ is a spherical harmonic function of order $n$ (due to spherical symmetry), so that we can expand it as
$$P_n(\cos\gamma) = \sum_{m=-n}^{n}Y_n^m(\theta,\varphi)A_m(\theta',\varphi'). \tag{26.36}$$
The coefficients are fixed immediately from the following formula and the orthogonality of $\{Y_n^m\}$.
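26A.13 Lemma. Let $X$ be a spherical harmonic function of order $n$, and let $\gamma$ be the angle in 26A.12. Then,
$$\int_0^{2\pi}d\varphi\int_0^{\pi}d\theta\sin\theta\,X(\theta,\varphi)P_n(\cos\gamma) = \frac{4\pi}{2n+1}X(\theta',\varphi'). \tag{26.37}$$
[Demo]. The integration is over the whole sphere, so we can freely choose the $\theta = 0$ direction. Let us choose it to be the direction of $(\theta',\varphi')$, and write the new angular coordinates as $(\gamma,\psi)$. The integral we wish to compute becomes
$$I = \int_0^{2\pi}d\psi\int_0^{\pi}d\gamma\sin\gamma\,\tilde{X}(\gamma,\psi)P_n(\cos\gamma), \tag{26.38}$$
where $\tilde{X}$ is $X$ in the new variables. $\tilde{X}$ is again a spherical harmonic function of order $n$ (look at the spherical symmetry of (26.33)), so that it can be expanded as
$$\tilde{X}(\gamma,\psi) = \sum_{m=-n}^{n}B_mY_n^m(\gamma,\psi). \tag{26.39}$$
Hence, since $P_n(\cos\gamma) = \sqrt{4\pi/(2n+1)}\,Y_n^0(\gamma,\psi)$, orthonormality gives
$$I = \sqrt{\frac{4\pi}{2n+1}}\,B_0. \tag{26.40}$$
To calculate $B_0$ note the fact that $Y_n^m(0,\psi) = 0$ if $m \neq 0$ (see the definition of $P_n^m$ in 26A.5), and $Y_n^0(0,\psi) = \sqrt{(2n+1)/4\pi}$ ($P_n(1) = 1$ →21B.5(1)). Hence, from (26.39) we obtain
$$B_0 = \tilde{X}(0,\psi)\sqrt{\frac{4\pi}{2n+1}} = X(\theta',\varphi')\sqrt{\frac{4\pi}{2n+1}}. \tag{26.41}$$
Combining (26.40) and (26.41) gives (26.37).
356 We have $\cos\gamma = \cos\theta\cos\theta' + \sin\theta\sin\theta'\cos(\varphi-\varphi')$.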
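The addition theorem 26A.12 lends itself to a quick numerical check. The sketch below is illustrative only: it assumes the normalization $\sqrt{(2n+1)(n-|m|)!/4\pi(n+|m|)!}\,P_n^{|m|}(\cos\theta)e^{im\varphi}$ for the spherical harmonics (any $m$-dependent sign convention cancels in the product $\overline{Y}Y$), and all function names are hypothetical helpers, not notation from the notes.

```python
import cmath
import math

def legendre_coeffs(n):
    """Coefficients (lowest degree first) of P_n from Bonnet's recurrence."""
    p_prev, p = [1.0], [0.0, 1.0]
    if n == 0:
        return p_prev
    for k in range(1, n):
        nxt = [0.0] * (k + 2)
        for i, c in enumerate(p):
            nxt[i + 1] += (2 * k + 1) * c / (k + 1)
        for i, c in enumerate(p_prev):
            nxt[i] -= k * c / (k + 1)
        p_prev, p = p, nxt
    return p

def assoc_legendre(n, m, x):
    """P_n^m(x) = (1 - x^2)^{m/2} d^m P_n / dx^m."""
    c = legendre_coeffs(n)
    for _ in range(m):
        c = [i * a for i, a in enumerate(c)][1:] or [0.0]
    return (1.0 - x * x) ** (m / 2) * sum(a * x ** i for i, a in enumerate(c))

def Y(n, m, theta, phi):
    """Spherical harmonic with the normalization assumed in the lead-in."""
    am = abs(m)
    norm = math.sqrt((2 * n + 1) / (4 * math.pi)
                     * math.factorial(n - am) / math.factorial(n + am))
    return norm * assoc_legendre(n, am, math.cos(theta)) * cmath.exp(1j * m * phi)

def cos_gamma(theta, phi, theta2, phi2):
    """Cosine of the angle between the two directions (footnote 356)."""
    return (math.cos(theta) * math.cos(theta2)
            + math.sin(theta) * math.sin(theta2) * math.cos(phi - phi2))

def addition_rhs(n, theta, phi, theta2, phi2):
    """Right-hand side of (26.35)."""
    s = sum(Y(n, m, theta2, phi2).conjugate() * Y(n, m, theta, phi)
            for m in range(-n, n + 1))
    return 4 * math.pi / (2 * n + 1) * s

args = (0.4, 1.1, 2.0, -0.3)
for n in range(4):
    print(n, assoc_legendre(n, 0, cos_gamma(*args)), addition_rhs(n, *args))
```

The imaginary part of the right-hand side should vanish to rounding error, and its real part should reproduce $P_n(\cos\gamma)$.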
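26A.14 Multipole expansion. Let $\rho(x)$ be the charge distribution. Then the potential due to this charge distribution with respect to the zero potential at infinity is given by
$$V(x) = \int dy\,\frac{\rho(y)}{4\pi\epsilon_0|x-y|}. \tag{26.42}$$
If $\rho(x)$ vanishes for $|x| \ge a$, then for $|x| > a$
$$V(x) = \sum_{n=0}^{\infty}\frac{1}{\epsilon_0|x|^{n+1}}\left[\sum_{m=-n}^{n}\frac{1}{2n+1}q_n^mY_n^m(\theta,\varphi)\right], \tag{26.43}$$
where $(\theta,\varphi)$ is the direction of $x$ and
$$q_n^m = \int_0^{a}dr\int_0^{\pi}d\theta\sin\theta\int_0^{2\pi}d\varphi\,r^{n+2}\,\overline{Y_n^m(\theta,\varphi)}\,\rho(r,\theta,\varphi). \tag{26.44}$$
The expansion (26.43) is called the multipole expansion.
[Demo]. Let the angle between $x$ and $y$ be $\gamma$, $R = |x|$ and $r = |y|$. Then
$$|x-y| = R\sqrt{1 - 2\zeta\cos\gamma + \zeta^2}, \tag{26.45}$$
where $\zeta = r/R$ ($< 1$). With the aid of the generating function of the Legendre polynomials (→21A.9), we get
$$\frac{1}{|x-y|} = \frac{1}{R}\sum_{n=0}^{\infty}\zeta^nP_n(\cos\gamma). \tag{26.46}$$
Now we use the addition theorem 26A.12 to separate the $x$ and $y$ directions as
$$\frac{1}{|x-y|} = \sum_{n=0}^{\infty}\frac{4\pi}{2n+1}\frac{r^n}{R^{n+1}}\sum_{m=-n}^{n}\overline{Y_n^m(\theta',\varphi')}\,Y_n^m(\theta,\varphi), \tag{26.47}$$
where $(\theta',\varphi')$ is the direction of $y$. Putting this into (26.42) and exchanging the order of summation and integration (→19.11), we get the desired formula.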
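The step from (26.45) to the Legendre series can be illustrated numerically. The following minimal sketch (the function names are hypothetical) sums $\sum_n r^nP_n(\cos\gamma)/R^{n+1}$ for $r < R$ and compares the result with $1/|x-y|$; the first three partial sums correspond to the monopole, dipole and quadrupole truncations.

```python
import math

def legendre(n, x):
    """P_n(x) by Bonnet's recurrence."""
    if n == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, n):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def inverse_distance_series(R, r, gamma, nmax=60):
    """Partial sum of sum_n r^n / R^(n+1) P_n(cos gamma), valid for r < R."""
    c = math.cos(gamma)
    return sum(r ** n / R ** (n + 1) * legendre(n, c) for n in range(nmax + 1))

def inverse_distance_exact(R, r, gamma):
    """1/|x - y| from the law of cosines."""
    return 1.0 / math.sqrt(R * R - 2 * R * r * math.cos(gamma) + r * r)

R, r, gamma = 2.0, 0.5, 1.1
exact = inverse_distance_exact(R, r, gamma)
for nmax in (0, 1, 2, 60):
    # The truncation error shrinks roughly like (r/R)^(nmax+1).
    print(nmax, inverse_distance_series(R, r, gamma, nmax) - exact)
```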
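26A.15 Lower order multipole expansion coefficients. For low order expansions, the Cartesian expression is much more popular. It reads
$$V(\mathbf{R}) = \frac{q}{R} + \frac{\mathbf{p}\cdot\mathbf{R}}{R^3} + \frac12\sum_{i,j}Q_{ij}\frac{R_iR_j}{R^5} + \cdots, \tag{26.48}$$
where $\mathbf{R}$ is the position vector from the center of the charge distribution, $q$ is the total charge, $\mathbf{p}$ is the dipole moment
$$\mathbf{p} = \int dx\,\rho(x)\,x, \tag{26.49}$$
and $Q_{ij}$ is the quadrupole moment tensor
$$Q_{ij} = \int dx\,(3x_ix_j - x^2\delta_{ij})\rho(x). \tag{26.50}$$
In terms of these more familiar moments, we can write
$$q_0^0 = \frac{1}{\sqrt{4\pi}}\,q, \tag{26.51}$$
$$q_1^1 = -\sqrt{\frac{3}{8\pi}}\,(p_x - ip_y), \tag{26.52}$$
$$q_1^0 = \sqrt{\frac{3}{4\pi}}\,p_z, \tag{26.53}$$
$$q_1^{-1} = \sqrt{\frac{3}{8\pi}}\,(p_x + ip_y), \tag{26.54}$$
$$q_2^2 = \frac{1}{12}\sqrt{\frac{15}{2\pi}}\,(Q_{11} - 2iQ_{12} - Q_{22}), \tag{26.55}$$
$$q_2^1 = -\frac13\sqrt{\frac{15}{8\pi}}\,(Q_{13} - iQ_{23}), \tag{26.56}$$
$$q_2^0 = \frac12\sqrt{\frac{5}{4\pi}}\,Q_{33}, \tag{26.57}$$
$$q_2^{-1} = \frac13\sqrt{\frac{15}{8\pi}}\,(Q_{13} + iQ_{23}), \tag{26.58}$$
$$q_2^{-2} = \frac{1}{12}\sqrt{\frac{15}{2\pi}}\,(Q_{11} + 2iQ_{12} - Q_{22}). \tag{26.59}$$
Note that, generally
$$q_n^{-m} = (-1)^m\,\overline{q_n^{m}}. \tag{26.60}$$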
26.B Application to PDE
26B.1 Formal expansion of harmonic function in 3-space. 26A.13 and 26A.9 tell us that a harmonic function $\psi$ (→2C.11) can have the following (formal; see footnote 357) expansion in 3-space in terms of spherical harmonic functions:
$$\psi = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}R_{lm}(r)Y_l^m(\theta,\varphi), \tag{26.61}$$
where $R_{lm}(r)$ obeys (→26A.1)
$$\frac{d^2}{dr^2}\left(rR_{lm}\right) = l(l+1)\frac{R_{lm}}{r}. \tag{26.62}$$
Hence, $R_{lm}$ has the following general solution (→11B.14)
$$R_{lm}(r) = A_{lm}r^l + B_{lm}r^{-l-1}. \tag{26.63}$$
That is, we get the following formal expansion:
$$\psi = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}\left(A_{lm}r^l + B_{lm}r^{-l-1}\right)Y_l^m(\theta,\varphi). \tag{26.64}$$
357 If we wish, we could say an expansion as a generalized function (→14).
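26B.2 Interior problem. A harmonic function on the 3-ball of radius $a$ centered at the origin must be finite at the origin, so its general form must be
$$\psi = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}A_{lm}r^lY_l^m(\theta,\varphi) \tag{26.65}$$
for $r \in [0,a]$.
(1) Dirichlet condition on the sphere. The solution to the Laplace equation in the ball with the boundary condition at the surface
$$\psi(a,\theta,\varphi) = V(\theta,\varphi) \tag{26.66}$$
must have the form of (26.65). Hence we must have
$$V(\theta,\varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}A_{lm}a^lY_l^m(\theta,\varphi). \tag{26.67}$$
With the aid of the orthonormality in 26A.9, we obtain
$$A_{lm}a^l = \int_0^{\pi}d\theta\sin\theta\int_0^{2\pi}d\varphi\,\overline{Y_l^m(\theta,\varphi)}\,V(\theta,\varphi). \tag{26.68}$$
(2) Neumann condition on the sphere. Consider the solution to the Laplace equation in the ball with the boundary condition at the surface
$$\left.\frac{\partial\psi}{\partial r}\right|_{r=a} = E(\theta,\varphi). \tag{26.69}$$
Differentiating (26.65), we obtain
$$E(\theta,\varphi) = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}lA_{lm}a^{l-1}Y_l^m(\theta,\varphi). \tag{26.70}$$
Hence, it is easy to obtain an explicit formula analogous to (1). Note that the $l = 0$ term drops out of (26.70), so $A_{00}$ is not determined by the Neumann condition, and $E$ must have no $l = 0$ component.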
26B.3 Exterior problem. If the harmonic function outside of a sphere is bounded and decays at infinity, then the solution must have the following form
$$\psi = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}B_{lm}r^{-l-1}Y_l^m(\theta,\varphi). \tag{26.71}$$
$B_{lm}$ are determined with the aid of the orthonormality of spherical harmonics just as in the interior problem.
26B.4 Uniqueness condition for exterior problem. We have discussed that if the domain $D$ is not bounded, then the uniqueness condition is not trivial (→1.19, 29.9). To study this, first we study the problem in the domain $D\cap V$, where $V$ is a ball of radius $R$. Suppose $\psi_1$ and $\psi_2$ are solutions to a given Dirichlet problem. Let $\psi = \psi_1 - \psi_2$. Then, it is a solution to a homogeneous Dirichlet problem. Green's formula tells us that
$$\int_{D\cap V}(\mathrm{grad}\,\psi)^2d\tau = \int_{\partial(D\cap V)}\psi\,\mathrm{grad}\,\psi\cdot dS = \int_{\partial V\cap D}\psi\,\mathrm{grad}\,\psi\cdot dS. \tag{26.72}$$
Hence, for the integral to vanish in the $R \to \infty$ limit a sufficient condition is
$$|\psi| < \mathrm{const.}\,R^{-1/2-\epsilon}. \tag{26.73}$$
Boundedness of $\psi$ is generally not enough to guarantee the unique solution.
26B.5 Annular problem. If the domain is the region between two concentric spheres, the problem is called an annular problem. In this case both terms in $R_{lm}$ in 26B.1 are needed. The boundary conditions on the two spherical boundary surfaces allow us to determine the coefficients uniquely.
Exercise.
Find the harmonic function on the annular region $r \in [a,3a]$ with the boundary conditions $u = \cos\phi$ on $r = a$ and $u = \cos3\phi$ on $r = 3a$.
26B.6 Cylindrically symmetric case. If the system under consideration is independent of $\varphi$ (→24C.1), then the general solution has the following formal expansion:
$$\psi(r,\theta,\varphi) = \sum_{l=0}^{\infty}\left(A_lr^l + B_lr^{-l-1}\right)P_l(\cos\theta). \tag{26.74}$$
This is certainly a solution of the Laplace equation as can be seen from the result in 26B.1 (also 26B.8). The uniqueness of the solution tells us that this is the general solution.
26B.7 Examples.
(1) A conducting sphere of radius $a$ is separated into the upper and the lower halves. The upper half is maintained at potential $V_1$, and the lower at $V_0$. The electric potential outside the sphere is given by
$$V = \frac{V_1+V_0}{2}\frac{a}{r} + (V_1-V_0)\sum_{l\ \mathrm{odd}}(-1)^{(l-1)/2}\,\frac{2l+1}{2}\,\frac{(l-2)!!}{(l+1)!!}\left(\frac{a}{r}\right)^{l+1}P_l(\cos\theta), \tag{26.75}$$
with the convention $(-1)!! = 1$.
(2) The electric potential due to a uniformly charged disk of radius $a$. For $r > a$
$$V = \frac{Q}{2\pi\epsilon_0r}\sum_{n=1}^{\infty}(-1)^{n-1}\frac{(2n-3)!!}{(2n)!!}\left(\frac{a}{r}\right)^{2n-2}P_{2n-2}(\cos\theta). \tag{26.76}$$
Here $Q$ is the total charge on the disk. Only even Legendre polynomials appear because the potential is symmetric under $\theta \to \pi - \theta$, and the leading term is the monopole potential $Q/4\pi\epsilon_0r$. For $r < a$ there is an extra complication, because $\theta = \pi/2$ is in the disk. However, for $\theta \in [0,\pi/2)$ there is no problem, and the solution is
$$V = \frac{Q}{2\pi\epsilon_0a}\left[1 - \frac{r}{a}P_1(\cos\theta) + \sum_{n=1}^{\infty}\frac{(-1)^{n-1}(2n-2)!}{2^{2n-1}(n-1)!\,n!}\left(\frac{r}{a}\right)^{2n}P_{2n}(\cos\theta)\right]. \tag{26.77}$$
For $\theta > \pi/2$ we use the symmetry $V(r,\theta,\varphi) = V(r,\pi-\theta,\varphi)$.
(3) The equilibrium temperature distribution of a half ball of radius $a$ with the surface temperature specified as $T = f(\cos\theta)$ and the bottom disk maintained at $T = 0$. In this case we use the reflection principle (→16A.10) to extend the problem to the whole ball. The boundary condition for the extended problem is given by $T|_{r=a} = g(\cos\theta)$, where $g(x) = \mathrm{sgn}(x)f(|x|)$. From the symmetry, the boundary condition on the bottom surface is automatically satisfied. The formal expansion of the interior problem with cylindrical symmetry (→26B.6) is given by 26B.2, so the answer reads
$$T = \sum_{l=0}^{\infty}A_lr^lY_l^0(\theta,\varphi) \tag{26.78}$$
with
$$A_la^l = 2\pi\int_0^{\pi}d\theta\sin\theta\,Y_l^0(\theta,\varphi)\,g(\cos\theta). \tag{26.79}$$
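For the charged disk in example (2), the potential on the symmetry axis is known in closed form, $V = (Q/2\pi\epsilon_0a^2)(\sqrt{z^2+a^2} - z)$, which gives a simple check of the exterior coefficients. The sketch below assumes the exterior series has coefficients $(-1)^{n-1}(2n-3)!!/(2n)!!$ multiplying $(a/r)^{2n-2}P_{2n-2}(\cos\theta)$ and works in units of $Q/2\pi\epsilon_0$; the function names are hypothetical.

```python
import math

def double_factorial(k):
    """k!! with the convention (-1)!! = 0!! = 1."""
    result = 1
    while k > 1:
        result *= k
        k -= 2
    return result

def disk_axis_series(z, a, nmax=40):
    """Exterior series at theta = 0, where every P_l(1) = 1 (requires z > a)."""
    total = 0.0
    for n in range(1, nmax + 1):
        coeff = (-1) ** (n - 1) * double_factorial(2 * n - 3) / double_factorial(2 * n)
        total += coeff * (a / z) ** (2 * n - 2)
    return total / z

def disk_axis_exact(z, a):
    """Closed-form on-axis potential of a uniformly charged disk, same units."""
    return (math.sqrt(z * z + a * a) - z) / (a * a)

for z in (1.5, 2.0, 5.0):
    print(z, disk_axis_series(z, 1.0), disk_axis_exact(z, 1.0))
```

The same comparison for $z < a$ would test the interior coefficients in the same way.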
Exercise.
(1) Find the gravitational potential due to a sphere of radius $R$ with the density distribution given by
$$\rho = r^kX_m(\theta,\varphi), \tag{26.80}$$
where $X_m$ is a spherical harmonic function of order $m$ (→26A.11). In this case, due to the superposition principle, the potential $V$ is given (up to the constant factor fixed by the choice of units) by
$$V(x) = \int dy\,\frac{\rho(y)}{|x-y|}. \tag{26.81}$$
Use (26.46) in 26A.14 to expand the Green's function. Then, use 26A.13 to perform the angular integral. In this way, we arrive at, for $|x| > R$,
$$V(x) = \frac{4\pi}{2m+1}\,\frac{R^{m+k+3}}{m+k+3}\,\frac{1}{|x|^{m+1}}\,X_m(\theta,\varphi). \tag{26.82}$$
(2) Discuss the waves in a thin spherical layer of radius $R$. The equation of motion is the wave equation written in the spherical coordinates with $r$ suppressed ($r = R$).
27 Cylinder Functions
Separation of variables of the Laplace equation in the cylindrical coordinates requires Bessel and modified Bessel functions, which may perhaps be the most representative special functions. Bessel and Neumann functions make a fundamental system for the radial part of the separated equation, called the Bessel equation. Classical results about Bessel and Neumann functions are summarized, such as orthonormal relations (Fourier-Bessel-Dini expansion), generating functions, and integrals containing Bessel functions. Bessel functions with half odd integer parameter (or their streamlined version: spherical Bessel functions) are required to solve the Helmholtz equation in the spherical coordinates.
Key words: Bessel equation, Bessel function, Bessel's integral, generating function, recurrence relations, cylinder functions, zeros of Bessel functions, Neumann function, Hankel function, Fourier-Bessel-Dini expansion, modified Bessel function, spherical Bessel function, partial wave expansion.
Summary:
(1) The Laplace equation in the cylindrical coordinates requires Bessel and Neumann functions (27A.1, 27A.2, 27A.16). Pay attention to the general shapes of these functions (27A.4, 27A.16).
(2) Bessel functions make an orthonormal eigenfunction set for the radial part of the Laplacian (27A.21-22).
(3) The Helmholtz equation in the spherical coordinates requires spherical Bessel functions (27A.25-26).
(4) Many second order linear ODE can be solved in terms of cylinder functions (27A.28).
27.A General Theory
27A.1 Bessel's equation. In terms of $z = ar$, the equation (23.17) (→23.9(1)) becomes
$$\frac{d^2u}{dz^2} + \frac{1}{z}\frac{du}{dz} + \left(1 - \frac{m^2}{z^2}\right)u = 0. \tag{27.1}$$
$z = 0$ is a regular singular point (→24B.2(1)), and $z = \infty$ is an irregular singular point (→24B.2(2)) (footnote 358).
27A.2 Series solution to Bessel's equation around $z = 0$. The indicial equation (24.24) (→24B.4) is $\phi(\mu) = \mu^2 - m^2 = 0$. Choose $\mu = m$, $a_1 = 0$, $a_0 = 1/2^m\Gamma(m+1)$ and follow 24B.3. We get
$$J_m(z) = \left(\frac{z}{2}\right)^m\sum_{k=0}^{\infty}\frac{(-1)^k}{k!\,\Gamma(m+k+1)}\left(\frac{z}{2}\right)^{2k}. \tag{27.2}$$
This is called the Bessel function of order $m$ (of the first kind). If $\mu_1 - \mu_2 = 2m$ is not an integer (→24B.6[1]), then $J_{-m}$ is a partner (→24A.13) of $J_m$ in a fundamental system of solutions (→24A.11) of (27.1). If $m$ is a half odd integer, then $J_m$ and $J_{-m}$ are still functionally independent (that is, this is the case with no logarithmic term in 24B.7[22]).
If $m$ is a positive integer, then $J_m$ and $J_{-m}$ are not functionally independent:
$$J_m = (-1)^mJ_{-m}. \tag{27.3}$$
This can be demonstrated from (27.2) with the aid of $\Gamma(-m+k+1) = \infty$ for $k < m$ (→9.1 or 27A.7). In this case we need a different partner: Neumann functions (→27A.16).
Exercise.
(A)
(1) Show that
$$\frac{(-1)^k}{k!\,\Gamma(m+k+1)}\left(\frac{z}{2}\right)^{m+2k} = \left(\frac{z}{2}\right)^m\frac{(-1)^kz^{2k}}{\Gamma(m+1/2)\Gamma(1/2)\,(2k)!}\,B\!\left(m+\frac12,\,k+\frac12\right). \tag{27.4}$$
(2) With the aid of the integral expression of the Beta function (9.22), show that formally (footnote 359)
$$J_m(z) = \frac{1}{\Gamma(1/2)\Gamma(m+1/2)}\left(\frac{z}{2}\right)^m\int_0^1t^{m-1/2}(1-t)^{-1/2}\cos[z(1-t)^{1/2}]\,dt \tag{27.5}$$
for $m + 1/2 > 0$.
(3) Now, changing the integration variable as $t = \sin^2\theta$, this formula can be rewritten as
$$J_m(z) = \frac{1}{\Gamma(1/2)\Gamma(m+1/2)}\left(\frac{z}{2}\right)^m\int_0^{\pi}\cos(z\cos\theta)\sin^{2m}\theta\,d\theta. \tag{27.6}$$
Notice that the integrations from $0$ to $\pi/2$ and from $\pi/2$ to $\pi$ are identical in this case, so $\int_0^{\pi}$ can be replaced by $2\int_0^{\pi/2}$. This formula is called Poisson's integral representation.
(4) If $1 - t = x^2$ is introduced, then (27.5) can be rewritten as
$$J_m(z) = \frac{1}{\Gamma(1/2)\Gamma(m+1/2)}\left(\frac{z}{2}\right)^m\int_{-1}^{1}(1-x^2)^{m-1/2}e^{izx}\,dx. \tag{27.7}$$
(B) Demonstrate the following Whittaker's integral representation
$$J_{n+1/2}(z) = (-i)^n\sqrt{\frac{z}{2\pi}}\int_{-1}^{1}e^{izx}P_n(x)\,dx. \tag{27.8}$$
Here $P_n$ is the Legendre polynomial.
(C) Show
$$\int_0^{\pi/2}J_0(x\cos\theta)\cos\theta\,d\theta = \frac{\sin x}{x}. \tag{27.9}$$
[Hint. Use the integral of $\cos^n\theta$.]
358 Therefore, the Bessel function is a special function of confluent type (→23.5).
359 The exchange of the order of summation and integration can be justified.
27A.3 Definition. The Bessel function of order $\nu$ can also be defined by
$$J_\nu(z) = \frac{1}{2\pi i}\int_Ce^{\frac{z}{2}\left(t - \frac1t\right)}t^{-(\nu+1)}\,dt, \tag{27.10}$$
where $C$ can be a unit circle centered at the origin, and $\nu$ any real number. Obviously,
$$J_0(0) = 1, \quad J_n(0) = 0 \text{ for positive integer } n. \tag{27.11}$$
For integer $\nu$ this definition and the result in 27A.2 are identical as seen in 27A.4.
Discussion: Where did the Bessel functions appear first?
The position of the earth $(x,y)$ on the orbital plane can be written as
$$x = a(\cos\phi - e), \quad y = a\sqrt{1-e^2}\,\sin\phi, \tag{27.12}$$
where $e$ is the eccentricity, $a$ the long radius, and $\phi$ the eccentric angle measured from the perihelion, and
$$\phi - e\sin\phi = \nu t, \tag{27.13}$$
where $t$ is the time since the earth passed the perihelion, and $\nu$ is the average angular velocity. Hence, if we can write down $\phi$ as a function of $t$, then we can explicitly obtain $x(t)$ and $y(t)$. Consider
$$\frac{d\phi}{d\nu t} = \frac{1}{1 - e\cos\phi}, \tag{27.14}$$
which is an even periodic function of $\nu t$. Hence, we can Fourier-expand it as
$$\frac{d\phi}{d\nu t} = 1 + \sum_{n=1}^{\infty}a_n\cos n\nu t. \tag{27.15}$$
The coefficient can be computed as
$$a_n = \frac{2}{\pi}\int_0^{\pi}(1 - e\cos\phi)^{-1}\cos n\nu t\,d(\nu t) \tag{27.16}$$
$$= \frac{2}{\pi}\int_0^{\pi}\cos[n(\phi - e\sin\phi)]\,d\phi. \tag{27.17}$$
Comparing this with Bessel's integral in 27A.6, we obtain
$$a_n = 2J_n(ne). \tag{27.18}$$
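As an illustration of this discussion, the following sketch compares two ways of obtaining the eccentric angle from $\phi - e\sin\phi = \nu t$: Newton iteration, and the series obtained by integrating (27.15) termwise under the assumption $a_n = 2J_n(ne)$, that is, $\phi = \nu t + \sum_n(2/n)J_n(ne)\sin n\nu t$. The function names and the value $e = 0.3$ are arbitrary choices for the example.

```python
import math

def bessel_j(n, z, terms=40):
    """J_n(z) for integer n >= 0 from the power series (27.2)."""
    return sum((-1) ** k / (math.factorial(k) * math.factorial(n + k))
               * (z / 2) ** (n + 2 * k) for k in range(terms))

def eccentric_angle_newton(M, e, tol=1e-14):
    """Solve phi - e sin(phi) = M by Newton iteration (M stands for nu*t)."""
    phi = M
    for _ in range(50):
        step = (phi - e * math.sin(phi) - M) / (1.0 - e * math.cos(phi))
        phi -= step
        if abs(step) < tol:
            break
    return phi

def eccentric_angle_series(M, e, nmax=40):
    """phi = M + sum_n (2/n) J_n(n e) sin(n M), truncated at nmax."""
    return M + sum(2.0 / n * bessel_j(n, n * e) * math.sin(n * M)
                   for n in range(1, nmax + 1))

e = 0.3
for M in (0.5, 1.0, 2.0, 3.0):
    print(M, eccentric_angle_newton(M, e), eccentric_angle_series(M, e))
```

The series converges quickly for small eccentricity; for $e$ close to 1 many more terms would be needed.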
Exercise.
Demonstrate
$$J_n(z) = \frac{1}{i^n\pi}\int_0^{\pi}e^{iz\cos\theta}\cos n\theta\,d\theta. \tag{27.19}$$
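27A.4 Series expansion. With the change of variables from $t$ to $u = tz/2$, we rewrite the RHS of (27.10) as
$$\frac{1}{2\pi i}\left(\frac{z}{2}\right)^{\nu}\int_{C'}e^{u - z^2/4u}u^{-(\nu+1)}\,du = \frac{1}{2\pi i}\left(\frac{z}{2}\right)^{\nu}\sum_{m=0}^{\infty}\frac{(-1)^m}{m!}\left(\frac{z}{2}\right)^{2m}\int_{C'}e^uu^{-(\nu+m+1)}\,du. \tag{27.20}$$
The integral can be computed with the residue theorem (→8B.2) as
$$J_\nu(z) = \left(\frac{z}{2}\right)^{\nu}\sum_{m=0}^{\infty}\frac{(-1)^m}{m!\,\Gamma(\nu+m+1)}\left(\frac{z}{2}\right)^{2m}, \tag{27.21}$$
which is in agreement with (27.2). The series is convergent on the whole complex plane. Due to the factor $(-1)^m$ it is clear that $J_\nu$ cannot have any pure imaginary zero. From the formula, near the origin
$$J_\nu(z) \simeq \frac{1}{\Gamma(\nu+1)}\left(\frac{z}{2}\right)^{\nu}. \tag{27.22}$$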
[Figure: graphs of the Bessel functions $J_n(x)$ for small integer $n$ on $0 \le x \le 12$.]
Exercise.
(This problem need not be here.) Demonstrate
$$\int_0^1t^{n+1}J_n(t)\,dt = J_{n+1}(1). \tag{27.23}$$
27A.5 Generating function.
$$e^{\frac{z}{2}\left(t - \frac1t\right)} = \sum_{n=-\infty}^{\infty}J_n(z)t^n. \tag{27.24}$$
This is from (27.10). This equation implies that $J_n$ for integer $n$ is the coefficient of the Laurent expansion (→8A.11(2)) of $\exp[z(t - t^{-1})/2]$ around $t = 0$.
27A.6 Bessel's integral. Replacing $t$ in (27.10) with $e^{i\theta}$, we have Bessel's integral
$$J_n(z) = \frac{1}{\pi}\int_0^{\pi}\cos(n\theta - z\sin\theta)\,d\theta. \tag{27.25}$$
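The agreement between the power series for $J_n$ and Bessel's integral can be seen numerically. The sketch below is an illustration with hypothetical function names: it evaluates the series $\sum_k(-1)^k(z/2)^{n+2k}/k!(n+k)!$ and the integral $(1/\pi)\int_0^\pi\cos(n\theta - z\sin\theta)d\theta$ by the trapezoidal rule, which converges very fast here because the integrand extends to a smooth even periodic function.

```python
import math

def bessel_j_series(n, z, terms=60):
    """J_n(z) for integer n >= 0 from the power series (27.2)."""
    return sum((-1) ** k / (math.factorial(k) * math.factorial(n + k))
               * (z / 2) ** (n + 2 * k) for k in range(terms))

def bessel_j_integral(n, z, panels=200):
    """J_n(z) from Bessel's integral (27.25) by the trapezoidal rule."""
    h = math.pi / panels

    def integrand(t):
        return math.cos(n * t - z * math.sin(t))

    total = 0.5 * (integrand(0.0) + integrand(math.pi))
    total += sum(integrand(i * h) for i in range(1, panels))
    return total * h / math.pi

for n in range(4):
    for z in (0.5, 2.0, 5.0):
        print(n, z, bessel_j_series(n, z), bessel_j_integral(n, z))
```

The power series is used only for moderate $|z|$; for large arguments cancellation between terms would spoil its accuracy in floating point.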
Exercise.Show
(27.26)
27A.7 $J_{-n}(z) = (-1)^nJ_n(z)$. This can be obtained by replacing $\theta$ with $\pi - \theta$ in (27.25). (We have already shown this in 27A.2.)
27A.8 Sine of sine → Bessel functions.
$$\sin(z\sin\theta) = 2J_1(z)\sin\theta + 2J_3(z)\sin3\theta + 2J_5(z)\sin5\theta + \cdots. \tag{27.27}$$
To show this rewrite (27.24) with the aid of 27A.7 as
$$e^{\frac{z}{2}\left(t - \frac1t\right)} = J_0(z) + \sum_{n=1}^{\infty}J_n(z)\left[t^n + (-1)^nt^{-n}\right]. \tag{27.28}$$
Now replace $t$ with $e^{i\theta}$, and we get
$$e^{iz\sin\theta} = J_0(z) + 2iJ_1(z)\sin\theta + 2J_2(z)\cos2\theta + 2iJ_3(z)\sin3\theta + \cdots. \tag{27.29}$$
Splitting this into real and imaginary parts, we get (27.27) and
$$\cos(z\sin\theta) = J_0(z) + 2J_2(z)\cos2\theta + 2J_4(z)\cos4\theta + \cdots. \tag{27.30}$$
Thus, when sine appears inside a trigonometric function, recall $J_n$.
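The two expansions of 27A.8 can be verified with a few lines of code. This minimal sketch (hypothetical function names) sums the odd-order series for $\sin(z\sin\theta)$ and the even-order series for $\cos(z\sin\theta)$ up to order 30 and compares them with direct evaluation.

```python
import math

def bessel_j(n, z, terms=40):
    """J_n(z) for integer n >= 0 from the power series (27.2)."""
    return sum((-1) ** k / (math.factorial(k) * math.factorial(n + k))
               * (z / 2) ** (n + 2 * k) for k in range(terms))

def sin_of_sine(z, theta, nmax=30):
    """2 * sum over odd n of J_n(z) sin(n theta), truncated at order nmax."""
    return 2 * sum(bessel_j(n, z) * math.sin(n * theta)
                   for n in range(1, nmax + 1, 2))

def cos_of_sine(z, theta, nmax=30):
    """J_0(z) + 2 * sum over even n >= 2 of J_n(z) cos(n theta)."""
    return bessel_j(0, z) + 2 * sum(bessel_j(n, z) * math.cos(n * theta)
                                    for n in range(2, nmax + 1, 2))

z, theta = 1.7, 0.9
print(sin_of_sine(z, theta), math.sin(z * math.sin(theta)))
print(cos_of_sine(z, theta), math.cos(z * math.sin(theta)))
```

Because $J_n(z)$ decays faster than geometrically once $n$ exceeds $|z|$, a handful of terms already gives full double precision for moderate $z$.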
27A.9 Recurrence relations. Differentiating (27.24) with respect to $z$ and comparing the coefficients of the powers of $t$, we get
$$2J_n'(z) = J_{n-1}(z) - J_{n+1}(z). \tag{27.31}$$
In particular, with the aid of 27A.7 we have
$$J_0'(z) = -J_1(z). \tag{27.32}$$
If we differentiate (27.24) with respect to $t$ and then compare the coefficients of the powers of $t$, we get
$$J_{n-1}(z) + J_{n+1}(z) = \frac{2n}{z}J_n(z). \tag{27.33}$$
27A.10 Cylinder function. Any function $f(z,\nu)$ satisfying the following relations is called a cylinder function:
$$f(z,\nu-1) + f(z,\nu+1) = \frac{2\nu}{z}f(z,\nu), \tag{27.34}$$
$$f(z,\nu-1) - f(z,\nu+1) = 2\frac{\partial}{\partial z}f(z,\nu). \tag{27.35}$$
(27.31) and (27.33) thus imply that Bessel functions are cylinder functions.
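The two defining relations of a cylinder function, $f(z,\nu-1) + f(z,\nu+1) = (2\nu/z)f(z,\nu)$ and $f(z,\nu-1) - f(z,\nu+1) = 2\,\partial f/\partial z$, can be tested numerically for $J_\nu$ with non-integer $\nu$ as well. The sketch below is illustrative; `bessel_j` and `ddz` are hypothetical helpers, the derivative is approximated by a central difference, and the order is chosen non-integer so that no Gamma function pole is hit.

```python
import math

def bessel_j(nu, z, terms=40):
    """J_nu(z) for real order nu and z > 0 from the power series (27.21)."""
    return sum((-1) ** k * (z / 2) ** (nu + 2 * k)
               / (math.factorial(k) * math.gamma(nu + k + 1))
               for k in range(terms))

def ddz(f, z, h=1e-5):
    """Central-difference approximation of df/dz."""
    return (f(z + h) - f(z - h)) / (2 * h)

nu, z = 1.3, 2.0
sum_side = bessel_j(nu - 1, z) + bessel_j(nu + 1, z)
diff_side = bessel_j(nu - 1, z) - bessel_j(nu + 1, z)
# First relation: sum of neighbours equals (2 nu / z) J_nu.
print(sum_side, 2 * nu / z * bessel_j(nu, z))
# Second relation: difference of neighbours equals 2 dJ_nu/dz.
print(diff_side, 2 * ddz(lambda t: bessel_j(nu, t), z))
```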
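Exercise.
(1) These relations can be rewritten as
$$\frac{d}{dz}\left[z^{\nu}J_\nu(z)\right] = z^{\nu}J_{\nu-1}(z), \tag{27.36}$$
$$\frac{d}{dz}\left[z^{-\nu}J_\nu(z)\right] = -z^{-\nu}J_{\nu+1}(z). \tag{27.37}$$
(2) Derive
$$z^{-(\nu+m)}J_{\nu+m}(z) = (-1)^m\left(\frac{1}{z}\frac{d}{dz}\right)^m\left[z^{-\nu}J_\nu\right]. \tag{27.38}$$
Similarly, we can obtain
$$z^{\nu-m}J_{\nu-m}(z) = \left(\frac{1}{z}\frac{d}{dz}\right)^m\left[z^{\nu}J_\nu\right]. \tag{27.39}$$
(3) Integral related to the Fraunhofer diffraction through a circular aperture:
(27.40)
[Hint. Use 27A.6 and 27A.10.]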
27A.11 Zeros of Bessel functions.
(1) There are infinitely many zeros of $J_n(z)$.
(2) All the zeros of $J_n$ for $n > -1$ are real and of multiplicity one except $z = 0$.
(3) $z^{-n}J_n(z)$ has no zero of multiplicity larger than one.
(4) The zeros of $J_n(z)$ separate the zeros of $J_{n\pm1}(z)$.
[Demo] From the Bessel equation (27.1) by scaling with a constant $\alpha$ we get
$$\frac{1}{x}\frac{d}{dx}\left(x\frac{dJ_n(\alpha x)}{dx}\right) + \left(\alpha^2 - \frac{n^2}{x^2}\right)J_n(\alpha x) = 0. \tag{27.41}$$
Hence,
$$\frac{1}{x}\frac{d}{dx}\left[x\frac{dJ_n(\alpha x)}{dx}J_n(\beta x) - x\frac{dJ_n(\beta x)}{dx}J_n(\alpha x)\right] = (\beta^2 - \alpha^2)J_n(\alpha x)J_n(\beta x), \tag{27.42}$$
where $\beta$ is another constant. Multiplying by $x$ and integrating (27.42) gives
$$(\beta^2 - \alpha^2)\int_0^bxJ_n(\alpha x)J_n(\beta x)\,dx = \left[x\frac{dJ_n(\alpha x)}{dx}J_n(\beta x) - x\frac{dJ_n(\beta x)}{dx}J_n(\alpha x)\right]_0^b. \tag{27.43}$$
Since $J_n(x)$ is of order $x^n$ near $x = 0$ (→(27.22)), if $n > -1$, the contribution from $x = 0$ of the RHS of (27.43) vanishes. Choose $\alpha$ to satisfy $J_n(\alpha b) = 0$, and set $\beta = \bar\alpha$. Then $J_n(\beta b) = 0$ since all the coefficients in (27.21) are real ($J_n(\bar z) = \overline{J_n(z)}$). For these choices, the RHS of (27.43) is zero, so we have
$$(\bar\alpha^2 - \alpha^2)\int_0^bx|J_n(\alpha x)|^2\,dx = 0. \tag{27.44}$$
This implies $\beta^2 = \alpha^2$, that is $\alpha = \bar\alpha$, since there are no pure imaginary zeros (→27A.4). That is, the zeros of $J_n$ are all real if $n > -1$.
The multiplicity of the zeros is known from the general property of the fundamental system (→24A.12). At $z = 0$ the coefficient function is not $C^1$, so this argument is not applicable to $z = 0$. Thus (2) and (3) have been demonstrated. Bessel's equation can be rewritten as
$$\frac{d^2U}{dz^2} + HU = 0 \tag{27.45}$$
with $U(z) = \sqrt{z}\,J_n(z)$ and $H(z) = 1 + (1/4 - n^2)/z^2$. Let $x \in R$ be sufficiently large to make $H(x) > 0$. Suppose $U > 0$. Then, irrespective of the sign of $dU/dx$, (27.45) tells us $d^2U/dx^2 < 0$. Hence, as long as $U > 0$, $dU/dx$ decreases with the increase of $x$. This continues until $U = 0$, but there $dU/dx < 0$, so $U$ eventually becomes negative. This argument can be continued indefinitely. Thus, there must be infinitely many zeros for $J_n$. This is (1). (4) follows from
$$\frac{d}{dz}\left(z^{\pm\nu}J_\nu\right) = \pm z^{\pm\nu}J_{\nu\mp1}, \tag{27.46}$$
which can be derived from the nature of $J_\nu$ as cylinder functions (→27A.10).
27A.12 Proposition. For $x \in R$
$$|J_0(x)| \le 1, \tag{27.47}$$
$$|J_n(x)| \le 1/\sqrt{2} \quad \text{for } n = 1,2,\ldots. \tag{27.48}$$
The first inequality follows from (27.25). The second inequality follows from the Gegenbauer-Neumann formula (→27A.14; expand the LHS and compare it with the RHS; cf. 27A.15(2)).
27A.13 Addition theorem.
$$J_n(x+y) = \sum_{s=-\infty}^{\infty}J_{n-s}(x)J_s(y) \tag{27.49}$$
$$= \sum_{s=0}^{n}J_s(x)J_{n-s}(y) + \sum_{s=1}^{\infty}(-1)^s\left[J_s(x)J_{n+s}(y) + J_{n+s}(x)J_s(y)\right]. \tag{27.50}$$
This follows from the generating function (27.24):
$$\sum_{n=-\infty}^{\infty}J_n(x+y)t^n = e^{(x+y)(t-1/t)/2} = \sum_{n=-\infty}^{\infty}J_n(x)t^n\sum_{n=-\infty}^{\infty}J_n(y)t^n. \tag{27.51}$$
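27A.14 Gegenbauer-Neumann formula.
$$J_0\left(\sqrt{R^2 + 2Rr\cos\gamma + r^2}\right) = J_0(R)J_0(r) + 2\sum_{m=1}^{\infty}(-1)^mJ_m(R)J_m(r)\cos m\gamma,$$
$$J_0\left(\sqrt{R^2 - 2Rr\cos\gamma + r^2}\right) = J_0(R)J_0(r) + 2\sum_{m=1}^{\infty}J_m(R)J_m(r)\cos m\gamma. \tag{27.52}$$
[Demo] The second formula can be obtained from the first by $r \to -r$ and 27A.7. With the aid of the generating function 27A.5, we obtain
$$\sum_{n=-\infty}^{\infty}J_n(\lambda x)t^n = e^{\frac{x}{2t}\left(\frac1\lambda - \lambda\right)}\sum_{n=-\infty}^{\infty}\lambda^nJ_n(x)t^n. \tag{27.53}$$
Setting $\lambda = e^{i\theta}$, this equation becomes
$$\sum_{n=-\infty}^{\infty}J_n(e^{i\theta}x)t^n = e^{-(ix/t)\sin\theta}\sum_{n=-\infty}^{\infty}e^{in\theta}J_n(x)t^n. \tag{27.54}$$
Following the demonstration of the addition theorem 27A.13, we obtain
$$\sum_{n=-\infty}^{\infty}J_n(e^{i\theta}x + e^{i\varphi}y)t^n = e^{-(i/t)(x\sin\theta + y\sin\varphi)}\left[\sum_{n=-\infty}^{\infty}e^{in\theta}J_n(x)t^n\right]\left[\sum_{n=-\infty}^{\infty}e^{in\varphi}J_n(y)t^n\right]. \tag{27.55}$$
Let $x$, $y$, $\theta$ and $\varphi$ be real and $xe^{i\theta} + ye^{i\varphi}$ be real. Then, $x\sin\theta + y\sin\varphi = 0$. Compare the coefficients of $t^0$:
$$J_0(x\cos\theta + y\cos\varphi) = \sum_{m=-\infty}^{\infty}e^{im(\theta-\varphi)}J_m(x)J_{-m}(y). \tag{27.56}$$
Notice that if the imaginary part of $Re^{i\theta} + re^{i\varphi}$ vanishes, we can write
$$(R\cos\theta + r\cos\varphi)^2 = R^2 + 2Rr\cos\gamma + r^2 \tag{27.57}$$
with $\gamma = \theta - \varphi$. Since $J_0$ is even, this concludes the proof (use (27.3) in 27A.2 to rewrite $J_{-m}$).
Exercise.
Show
$$J_n(z)J_n(z') = \frac{1}{\pi}\int_0^{\pi}J_0\left(\sqrt{z^2 + z'^2 - 2zz'\cos\theta}\right)\cos n\theta\,d\theta \tag{27.58}$$
for $n = 0,1,2,\ldots$.
27A.15 Integrals containing Bessel functions.
(1) From Bessel's integral 27A.6 for $a \ge 0$ and $b > 0$
$$\int_0^{\infty}e^{-ax}J_0(bx)\,dx = \frac{1}{\sqrt{a^2+b^2}}. \tag{27.59}$$
Especially $\int_0^{\infty}J_0(bx)\,dx = 1/b$. Replacing $a$ in (27.59) with $ia$ we get ($b > a$ assumed)
$$\int_0^{\infty}J_0(bx)\cos ax\,dx = \frac{1}{\sqrt{b^2-a^2}}. \tag{27.60}$$
Differentiating these equations w.r.t. $a$, we compute similar integrals with insertions of powers of $x$. With the aid of (27.32) and integration by parts, we obtain
$$\int_0^{\infty}e^{-ax}J_1(bx)\,x\,dx = \frac{b}{(b^2+a^2)^{3/2}}. \tag{27.61}$$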
(2) From the Gegenbauer-Neumann formula 27A.14 with the aid of the orthogonality of $\{\cos m\gamma\}$ we obtain (→17.16 with $l = \pi$)
From this and (27.59), we get
(27.63)
Notice that this formula contains the generating function for the Legendre polynomials (→21A.9), so that the expansion in terms of $r/R$ can be calculated with the aid of $P_n(\cos\gamma)$.
(3) [Weber's integral] Expanding $J_\nu(bx)$ as in 27A.2 and termwise integration (→19.11) give for $a > 0$, $b > 0$ and for $\mathrm{Re}\,\nu > -1$
(27.64)
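(4) [Lommel's integral]
$$\int xJ_n(\alpha x)J_n(\beta x)\,dx = \frac{x}{\alpha^2-\beta^2}\left\{\alpha J_n(\beta x)J_{n+1}(\alpha x) - \beta J_n(\alpha x)J_{n+1}(\beta x)\right\} \tag{27.65}$$
$$= \frac{x}{\alpha^2-\beta^2}\left\{\beta J_n(\alpha x)J_{n-1}(\beta x) - \alpha J_n(\beta x)J_{n-1}(\alpha x)\right\}. \tag{27.66}$$
This follows from (27.43) and recurrence relations (→27A.9).
Exercise.
Show
$$J_0(x) = \frac{2}{\pi}\int_0^1\frac{\cos xt}{\sqrt{1-t^2}}\,dt. \tag{27.67}$$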
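27A.16 Neumann function of order $m$. When $m \in N\setminus\{0\}$, we may use the general theory or the procedure in 24B.7[22], but traditionally, the following partner is chosen:
$$N_m(z) = \frac{\cos m\pi\,J_m(z) - J_{-m}(z)}{\sin m\pi}, \tag{27.68}$$
which is called the Neumann function of order $m$. For non-integer $m$ (27.68) is well defined and obviously a partner of $J_m$ in 24B.1. If $m \in N$, (27.68) becomes $0/0$, so we interpret the formula with the aid of l'Hospital's rule (footnote 360):
$$N_m(z) = \frac{1}{\pi}\left[\frac{\partial J_\nu(z)}{\partial\nu} - (-1)^m\frac{\partial J_{-\nu}(z)}{\partial\nu}\right]_{\nu=m}. \tag{27.69}$$
The general solution of Bessel's equation (27.1) is given by
$$u = AJ_m(z) + BN_m(z). \tag{27.70}$$
Notice that Neumann functions are cylinder functions, as can easily be checked explicitly (→27A.10).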
Exercise.Demonstrate that
(27.71)
27A.17 $N_n(z)$ is singular at $z = 0$. This follows from explicit formulas in the $z \to 0$ limit (see the footnote of the previous entry):
$$N_0(z) \sim (2/\pi)\ln(z/2), \tag{27.72}$$
$$N_n(z) \sim -\frac{(n-1)!}{\pi}\left(\frac{z}{2}\right)^{-n} \quad \text{for } n = 1,2,\ldots. \tag{27.73}$$
[Figure: graphs of the Neumann functions $N_n(x)$ for small integer $n$ on $0 \le x \le 12$.]
360 If we explicitly compute (27.69), we get
$$N_m(z) = \frac{2}{\pi}J_m(z)\ln\frac{z}{2} - \frac{1}{\pi}\sum_{k=0}^{m-1}\frac{(m-k-1)!}{k!}\left(\frac{z}{2}\right)^{-m+2k} - \frac{1}{\pi}\sum_{k=0}^{\infty}(-1)^k\frac{\psi(k+1)+\psi(m+k+1)}{k!\,(m+k)!}\left(\frac{z}{2}\right)^{m+2k}, \tag{27.74}$$
where $\psi(z) = \Gamma'(z)/\Gamma(z)$. Thus this form is in conformity with the general theory 24B.7[22].
27A.18 Lommel's formula. Since $J_\nu$ and $N_\nu$ make a fundamental system of Bessel's equation (→27A.16 and 24A.4), their Wronskian (→24A.6) $W$ must satisfy 24A.13, i.e.,
$$W(x) = W_0e^{-\ln x} = \frac{W_0}{x}.$$
To calculate $W_0$ we may use $\lim_{x\to0}xW(x) = W_0$. If $\nu$ is not an integer, $J_\nu$ and $J_{-\nu}$ make a fundamental system (→27A.2), so
$$\lim_{x\to0}xW(J_\nu(x),J_{-\nu}(x)) = \lim_{x\to0}x\left[J_\nu(x)J_{-\nu}'(x) - J_\nu'(x)J_{-\nu}(x)\right] = -\frac{2\sin\nu\pi}{\pi}, \tag{27.75}$$
where we have used the formula of complementary arguments for the Gamma function 9.5. Thus, we obtain
$$W(J_\nu(x),J_{-\nu}(x)) = -\frac{2\sin\nu\pi}{\pi x}. \tag{27.76}$$
The result is correct even if $\nu$ is an integer due to continuity. With the aid of this formula and the definition of $N_\nu$ in 27A.16, we obtain
$$W(J_\nu(x),N_\nu(x)) = J_\nu(x)N_\nu'(x) - J_\nu'(x)N_\nu(x) = \frac{2}{\pi x}. \tag{27.77}$$
This is called Lommel's formula.
Exercise.Show(1)
(27.78)
(2)
(27.79)
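27A.19 Bessel function with half odd integer parameters (See spherical Bessel functions in 27A.25). The Bessel and Neumann functions with half odd integer parameters can be written in terms of elementary functions:
$$J_{1/2}(z) = \sqrt{\frac{2}{\pi}}\frac{\sin z}{\sqrt{z}}, \quad J_{-1/2}(z) = \sqrt{\frac{2}{\pi}}\frac{\cos z}{\sqrt{z}}, \tag{27.80}$$
$$J_{3/2}(z) = \sqrt{\frac{2}{\pi z}}\left(\frac{\sin z}{z} - \cos z\right), \tag{27.81}$$
$$N_{1/2}(z) = -\sqrt{\frac{2}{\pi z}}\cos z, \quad N_{-1/2}(z) = \sqrt{\frac{2}{\pi z}}\sin z, \tag{27.82}$$
$$N_{3/2}(z) = -\sqrt{\frac{2}{\pi z}}\left(\sin z + \frac{\cos z}{z}\right). \tag{27.83}$$
Exercise.
Derive
$$J_{m+1/2}(z) = (-1)^mz^{m+1/2}\sqrt{\frac{2}{\pi}}\left(\frac{1}{z}\frac{d}{dz}\right)^m\left(\frac{\sin z}{z}\right). \tag{27.84}$$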
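The elementary forms of the half odd integer Bessel functions can be compared with the general power series $J_\nu(z) = (z/2)^\nu\sum_m(-1)^m(z/2)^{2m}/m!\,\Gamma(\nu+m+1)$. The following minimal sketch uses only the standard library; the function names are hypothetical and the argument is restricted to $z > 0$.

```python
import math

def bessel_j(nu, z, terms=40):
    """J_nu(z) for real order nu and z > 0 from the power series (27.21)."""
    return sum((-1) ** k * (z / 2) ** (nu + 2 * k)
               / (math.factorial(k) * math.gamma(nu + k + 1))
               for k in range(terms))

def j_half(z):
    """Elementary form of J_{1/2}."""
    return math.sqrt(2 / (math.pi * z)) * math.sin(z)

def j_minus_half(z):
    """Elementary form of J_{-1/2}."""
    return math.sqrt(2 / (math.pi * z)) * math.cos(z)

def j_three_halves(z):
    """Elementary form of J_{3/2}."""
    return math.sqrt(2 / (math.pi * z)) * (math.sin(z) / z - math.cos(z))

z = 2.3
print(bessel_j(0.5, z), j_half(z))
print(bessel_j(-0.5, z), j_minus_half(z))
print(bessel_j(1.5, z), j_three_halves(z))
```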
27A.20 Hankel functions. Hankel functions are defined as follows:
$$H_\nu^{(1)}(z) = J_\nu(z) + iN_\nu(z), \tag{27.85}$$
$$H_\nu^{(2)}(z) = J_\nu(z) - iN_\nu(z). \tag{27.86}$$
$H_\nu^{(1)}$ and $H_\nu^{(2)}$ make a fundamental system of solutions (→24A.11) for the Bessel equation (→27A.1).
Exercise.
Show
$$H_n^{(2)}(x)H_{n+1}^{(1)}(x) - H_n^{(1)}(x)H_{n+1}^{(2)}(x) = \frac{4}{i\pi x}. \tag{27.87}$$
27A.21 Orthonormal basis in terms of Bessel functions. The set of kets $|i,\nu\rangle$ defined as follows is an orthonormal basis (→20.10) of $L_2([0,a],x)$ (→20.19)
$$\langle x|i,\nu\rangle = \frac{\sqrt{2}}{aJ_{\nu+1}(r_i^{(\nu)})}J_\nu(r_i^{(\nu)}x/a), \tag{27.88}$$
where $r_i^{(\nu)}$ is the $i$-th zero of $J_\nu(x)$ (→27A.11). That $\{|i,\nu\rangle\}$ is a basis follows from the corresponding eigenvalue problem and the general theory (→36.3). That this is normalized (orthogonality follows from the general theory) is seen with the aid of Lommel's integral (→27A.15(4)). Using l'Hospital's rule, we take the $\alpha \to \beta$ limit to obtain
$$\int_0^ax\,J_\nu(r_i^{(\nu)}x/a)^2\,dx = \frac{a^2}{2}\left[J_\nu'(r_i^{(\nu)})\right]^2. \tag{27.89}$$
This can be further transformed into the desired result with the aid of a recurrence relation in 27A.9.
The corresponding decomposition of unit operator $1 = \sum_{i=1}^{\infty}|i,\nu\rangle\langle i,\nu|$ (20.15) implies (cf. 20.26 for the delta function with a weight)
$$\frac{\delta(x-y)}{x} = \sum_{i=1}^{\infty}\frac{2}{a^2J_{\nu+1}(r_i^{(\nu)})^2}J_\nu\!\left(\frac{r_i^{(\nu)}x}{a}\right)J_\nu\!\left(\frac{r_i^{(\nu)}y}{a}\right). \tag{27.90}$$
27A.22 Fourier-Bessel-Dini expansion. $f \in L_2([0,a],x)$ (→20.19) can be expanded as
$$f(x) = \sum_{m=1}^{\infty}c_mJ_\nu(r_m^{(\nu)}x/a), \tag{27.91}$$
where
$$c_m = \frac{2}{a^2J_{\nu+1}(r_m^{(\nu)})^2}\int_0^ax\,f(x)\,J_\nu(r_m^{(\nu)}x/a)\,dx. \tag{27.92}$$
Notice that this is nothing but a standard generalized Fourier expansion with a special choice of the orthonormal basis. Hence the analogues of the three key facts (→17.8) hold.
27A.23 Modified Bessel functions. In terms of $z = ar$, the equation (23.37) becomes (→23.9(3))
$$\frac{d^2u}{dz^2} + \frac{1}{z}\frac{du}{dz} - \left(1 + \frac{m^2}{z^2}\right)u = 0. \tag{27.93}$$
$z = 0$ is a regular singular point (→24B.2(1)), and $z = \infty$ is an irregular singular point (→24B.2(2)). If $z$ in (27.1) is replaced with $iz$, we get this equation. Hence, $J_m(iz)$ and $N_m(iz)$ are solutions. However, with a suitable phase factor the following set is usually chosen as a fundamental system of solutions (→24A.11)
$$I_m(z) = e^{-m\pi i/2}J_m(iz) = \sum_{n=0}^{\infty}\frac{1}{n!\,\Gamma(n+m+1)}\left(\frac{z}{2}\right)^{2n+m}, \tag{27.94}$$
$$K_m(z) = \frac{\pi}{2}\,\frac{I_{-m}(z) - I_m(z)}{\sin m\pi} = \frac{\pi}{2}\,\frac{e^{m\pi i/2}J_{-m}(iz) - e^{-m\pi i/2}J_m(iz)}{\sin m\pi}. \tag{27.95}$$
$I$ and $K$ are called modified Bessel functions. They are not cylinder functions.
[Figure: graphs of the modified Bessel functions.]
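Exercise.
(1) Show the leading singularities:
$$K_0(x) = -\ln x - \gamma + \ln2 + \cdots, \tag{27.96}$$
$$K_n(x) = 2^{n-1}(n-1)!\,x^{-n} + \cdots. \tag{27.97}$$
(2) Demonstrate
$$\cosh x = I_0(x) + 2\sum_{n=1}^{\infty}I_{2n}(x). \tag{27.98}$$
(3) The solutions to
$$u'' - zu = 0 \tag{27.99}$$
are called Airy functions. They become useful to study asymptotic behaviors of the Bessel functions for large $|z|$ and $|\nu|$. We can easily find a fundamental system for this equation, looking at the table in 27A.28:
$$u_1 = \mathrm{Ai}(z) \equiv \frac{1}{\pi}\left(\frac{z}{3}\right)^{1/2}K_{1/3}\!\left(\frac{2z^{3/2}}{3}\right) = \frac{z^{1/2}}{3}\left[I_{-1/3}\!\left(\frac{2z^{3/2}}{3}\right) - I_{1/3}\!\left(\frac{2z^{3/2}}{3}\right)\right],$$
$$u_2 = \mathrm{Bi}(z) \equiv \left(\frac{z}{3}\right)^{1/2}\left[I_{-1/3}\!\left(\frac{2z^{3/2}}{3}\right) + I_{1/3}\!\left(\frac{2z^{3/2}}{3}\right)\right]. \tag{27.100}$$
Ai (resp., Bi) is called the Airy function of the first (resp., second) kind.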
27A.24 Helmholtz equation. For the equation of the type $L_t\psi = \Delta\psi$, where $L_t$ is a differential operator with respect to time, the separation of variables gives us the Helmholtz equation
$$-\Delta\psi = \kappa^2\psi, \tag{27.101}$$
where $-$ is explicitly written, because the Laplacian is a non-positive operator. The separation of variables in the spherical coordinates $\psi = R(r)Y(\theta,\varphi)$ gives
$$\left(\frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta} + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2}\right)Y(\theta,\varphi) = -l(l+1)Y(\theta,\varphi), \tag{27.102}$$
and
$$\frac{1}{r}\frac{d^2}{dr^2}rR(r) = \left(-\kappa^2 + \frac{l(l+1)}{r^2}\right)R(r). \tag{27.103}$$
(27.102) with the periodic boundary condition on the sphere gives eigenfunctions $Y_l^m(\theta,\varphi)$ (→26A.8) ($l = 0,1,2,\ldots$ and $m = -l,-l+1,\ldots,-1,0,1,\ldots,l$ for each $l$). We may assume
$$\psi = \sum_{l=0}^{\infty}\sum_{m=-l}^{l}R_{lm}(r)Y_l^m(\theta,\varphi). \tag{27.104}$$
Here $R_{lm}$ obeys (27.103).
27A.25 Spherical Bessel functions. Introducing $u = \sqrt{\kappa r}\,R(r)$ and $z = \kappa r$, (27.103) becomes
$$\frac{d^2u}{dz^2} + \frac{1}{z}\frac{du}{dz} + \left(1 - \frac{(l+1/2)^2}{z^2}\right)u = 0. \tag{27.105}$$
This is Bessel's equation (27.1) with $m = l + 1/2$. Therefore, the fundamental system of solutions for (27.103) consists of $J_{l+1/2}(\kappa r)/\sqrt{\kappa r}$ and $N_{l+1/2}(\kappa r)/\sqrt{\kappa r}$. Thus the following spherical Bessel function $j_l$ and spherical Neumann function $n_l$ are defined:
$$j_l(z) = \sqrt{\frac{\pi}{2z}}J_{l+1/2}(z), \quad n_l(z) = \sqrt{\frac{\pi}{2z}}N_{l+1/2}(z). \tag{27.106}$$
The general solution to (27.103) is given by
$$Aj_l(z) + Bn_l(z). \tag{27.107}$$
The spherical Hankel function is also defined analogously
$$h_l^{(1,2)}(x) = \sqrt{\frac{\pi}{2x}}H_{l+1/2}^{(1,2)}(x). \tag{27.108}$$
Exercise.
(1) Demonstrate
$$j_0(x) = \frac{\sin x}{x} \tag{27.109}$$
with the aid of the series expansion of the Bessel function. Also demonstrate
$$n_0(x) = -\frac{\cos x}{x}. \tag{27.110}$$
(2) Show
(27.111)
(3) Show
(27.112)
27A.26 Orthonormal basis in terms of spherical Bessel functions. There is nothing new in the present case, since we know the corresponding result for the Bessel function (→27A.21). Therefore,
$$\left\{\sqrt{\frac{2}{a^3}}\,\frac{1}{j_{l+1}(\rho_i^{(l)})}\,j_l(\rho_i^{(l)}r/a)\right\}_{i=1}^{\infty}, \tag{27.113}$$
where $\rho_i^{(l)} = r_i^{(l+1/2)}$ are the zeros of $J_{l+1/2}$ (→27A.11), is an orthonormal basis of $L_2([0,a],r^2)$ (→20.19). For example, the decomposition of unit operator reads (cf. 20.27)
$$\frac{\delta(x-y)}{x^2} = \sum_{i=1}^{\infty}\frac{2}{a^3}\,\frac{1}{j_{l+1}(\rho_i^{(l)})^2}\,j_l\!\left(\frac{\rho_i^{(l)}x}{a}\right)j_l\!\left(\frac{\rho_i^{(l)}y}{a}\right). \tag{27.114}$$
27A.27 Partial wave expansion of plane wave.
$$e^{ikr\cos\theta} = \sum_{l=0}^{\infty}(2l+1)\,i^l\,j_l(kr)\,P_l(\cos\theta). \tag{27.115}$$
[Demo] $e^{ik\cdot r}$ satisfies the Helmholtz equation $(\Delta + k^2)u(r) = 0$ (→27A.24). Hence, we may assume
$$e^{ikr\cos\theta} = \sum_{l=0}^{\infty}c_lj_l(kr)P_l(\cos\theta). \tag{27.116}$$
Therefore, the problem is to determine the coefficients $c_l$. With the aid of the orthogonality of the Legendre polynomial (→21A.5), we obtain
$$c_lj_l(kr) = \frac{2l+1}{2}\int_{-1}^{1}dx\,e^{ikrx}P_l(x). \tag{27.117}$$
To evaluate the integral, integrate it by parts and ignore the terms of higher order in $1/r$. We have
$$\int_{-1}^{1}dx\,e^{ikrx}P_l(x) \simeq \frac{1}{ikr}\left[e^{ikr} - (-1)^le^{-ikr}\right] = \frac{2i^l}{kr}\sin\left(kr - \frac{l\pi}{2}\right). \tag{27.118}$$
Comparing this with the asymptotic formula $j_l(kr) \simeq \sin(kr - l\pi/2)/kr$ for $r \to \infty$, we arrive at $c_l = (2l+1)i^l$.
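The partial wave expansion $e^{ikr\cos\theta} = \sum_l(2l+1)i^lj_l(kr)P_l(\cos\theta)$ can be checked by direct summation. The sketch below is a rough illustration with hypothetical function names: it computes $j_l(x) = \sqrt{\pi/2x}\,J_{l+1/2}(x)$ from the power series of $J_\nu$ (adequate for moderate $x$ only) and $P_l$ from Bonnet's recurrence.

```python
import cmath
import math

def spherical_j(l, x, terms=60):
    """j_l(x) = sqrt(pi / (2x)) J_{l+1/2}(x), with J from its power series."""
    nu = l + 0.5
    series = sum((-1) ** k * (x / 2) ** (nu + 2 * k)
                 / (math.factorial(k) * math.gamma(nu + k + 1))
                 for k in range(terms))
    return math.sqrt(math.pi / (2 * x)) * series

def legendre(l, c):
    """P_l(c) by Bonnet's recurrence."""
    if l == 0:
        return 1.0
    p_prev, p = 1.0, c
    for k in range(1, l):
        p_prev, p = p, ((2 * k + 1) * c * p - k * p_prev) / (k + 1)
    return p

def plane_wave_partial_sum(kr, theta, lmax=40):
    """Partial sum of the partial wave expansion up to l = lmax."""
    c = math.cos(theta)
    return sum((2 * l + 1) * 1j ** l * spherical_j(l, kr) * legendre(l, c)
               for l in range(lmax + 1))

kr, theta = 3.0, 0.7
print(plane_wave_partial_sum(kr, theta))
print(cmath.exp(1j * kr * math.cos(theta)))
```

Roughly $l \lesssim kr$ partial waves contribute appreciably; higher terms decay rapidly, which is why a modest `lmax` suffices here.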
27A.28 ODE solvable in terms of Cylinder functions. Many second order linear ODE can be solved in terms of cylinder functions. See the table.
"" +z.-Iu ' + (:jcltl'.U =0
u" + [13c1a + (li( 4:::'» lu =0
u'~ +a.z"~= 0
u" + (e2<-.') u=O
u" + (e2/·-.2)z-4 u =0
u" + (~+2i)U' - (~±;-) u =0
""+u'- 4.,2-1 u=o4%2
u" + (2. +l_ a )u, _ a(2. +1) u =0% 2%
u" +C-z2« +213ri%7-1) u'
+ [«2~;'r'+13r(r- 2a)iz7 -'] u =0
u"+(~-2lanz )u'-(~+ ta;,')u =0
u"+(~+2cotz )u'- (~-co;,')u=O
u" +(~-21f) u' + (1_~+~2_.p' -~) u=O
" {'!'''''''(Z) _.:!.(~"(zl)2u + 2 .p'(z) 4 ~'(z)
1 ](~'( »)'}+[(~(z»'+4-" ~.~ u=O
u" _ 'P'(z) u' +{~('P'(z»)' _.!. 'P"(z)'P(z) 4 tp(z) 2 'P(z)
_~ (~"(Z»)' + .!.~"'(Z)4 .p'(z) 2 ~'(:::)
+[(~(z»'+.!.-.,] (~'(Z»'}u =04 _ ~(Z)
,," _[1/I"(Z) + (2a -1) Iit'l z)] u'~'(Z) 1/I(z)
+[«'_.'+ (~(z»'1(1/I'(Z»)' u =0Hz)
Z.(z)
Z.(13z)
Z.(13z7 )
..riZ.({3z)
Z.(i%)
Z.(2i..r:z )
Z,,(...I73 .'af,z)
..1% Za(..f7j elaf2:::)
...I"%Z 1 (2..1«z·;');:+:i .+2
Z.(cz)
zZ.(ell')
ectlzZ.(z)
..I%e-'I2Z.(~)
z-'e"Zf2Z{~Z)
%"e±l#.7Z.({3z7)
_1_Z.(z)COS%
_.I_Z.(Z)slnz
exp(J~dz) Z.(z)
N(z) Z (.p(z»\/ lI"(:::) •
I~z(~(z»\/ ~'(z) •
(~(z»"Z.(Hz»
Exercise.
Find the general solution to the following ODE
$$\frac{d^2u}{dz^2} + \left(\frac12 + \sinh^2z - \frac34\left(\tanh^2z + \coth^2z\right)\right)u = 0. \tag{27.119}$$
27.B Applications to Solving PDE
(1) A uniform force $b\sin\omega t$ is applied over a circular membrane of radius $a$. Find the forced oscillation.361
(2)362 Consider a disc of radius $a$ whose center is located at the origin in the xy-plane. The boundary is maintained at $T = 0$, and the initial temperature is given by
$$T(x,y,0) = T_0\left(1 - \frac{x^2+y^2}{r^2}\right). \qquad (27.120)$$
Assume that the thermal diffusivity is $\kappa$. Find $T(x,y,t)$. The solution is given in the form of a Fourier-Bessel-Dini expansion (→27A.22). Compute the expansion coefficients explicitly with the aid of the following formula
$$\int_0^{\pi/2} d\phi\, \sin^{\mu+1}\phi\, \cos^{2\nu+1}\phi\, J_\mu(z\sin\phi) = \frac{2^\nu\,\Gamma(\nu+1)}{z^{\nu+1}}\, J_{\mu+\nu+1}(z). \qquad (27.121)$$
(3) Circular wave guide: The equation for $\phi = B_z$ reads
$$\left(\frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{1}{r^2}\frac{\partial^2}{\partial\varphi^2} + k^2\right)\phi = 0 \qquad (27.122)$$
with the boundary condition $\phi = 0$ on $r = a$. The field can be separated as
$$\phi(r,\varphi) = B(r)\,e^{im\varphi}, \qquad (27.123)$$
where $m \in \mathbf{Z}$ because the field must be single-valued. $B(r)$ obeys
$$\frac{d^2B}{dr^2} + \frac{1}{r}\frac{dB}{dr} + \left(k^2 - \frac{m^2}{r^2}\right)B = 0. \qquad (27.124)$$
Therefore, $B = J_m(kr)$ is the eigenfunction.

361 LSU82.
362 L138.
28 Diffusion Equation: How irreversibility is captured
Our discussion on the diffusion equation in 1 relied very heavily on our physics intuition. We wish to see whether our intuition is correctly captured by the diffusion equation. The maximum principle tells us that the diffusion equation captures well the irreversible nature of diffusion processes. This in turn implies that diffusion problems are well-posed in Hadamard's sense. Diffusion equations allow infinite speed of propagation of signals and matter, but adding second order time derivative terms cures this unphysical feature.

Key words: maximum principle, well-posedness, preservation of order, infinite propagation speed, telegrapher's (Maxwell-Cattaneo) equation.

Summary:
(1) The solution to the diffusion equation evolves in time generally toward the more 'featureless' function. This is guaranteed by the maximum principle (28.2).
(2) When the solution of a problem is unique and depends on the auxiliary conditions continuously, the problem is said to be well-posed in the sense of Hadamard (28.3). Diffusion problems are well-posed (28.4).
(3) Diffusion equations allow infinite speed of propagation (28.9). Only the addition of higher order time derivatives can cure this (28.10).

28.1 Elementary summary. We have learned where diffusion equations appear (→1.2, 1.14, a1B.2, a1C.1, a1F.17). Some Green's functions have been constructed (→16B), and we physically argued that the solution, if it exists, is unique, in particular in a bounded domain, under one of the following conditions with a given initial field (→1.18):
(1) Dirichlet condition: At the boundary all the values of $\psi$ are specified. For the heat conduction problem, this is the condition with the given wall temperature (i.e., thermostated).
(2) Neumann condition: At the boundary the normal derivative of $\psi$ is given. For the heat conduction problem, this is the condition with the given heat flux through the wall.

We heavily relied on the zeroth law of thermodynamics: there is a unique equilibrium state if we wait long enough. Our argument is, however, in a certain sense circular, because we have shown only that if the diffusion equation is physically reasonable, then we can rely on physics arguments. To break this circle, we must demonstrate that the diffusion equation indeed reflects thermodynamics correctly. This is equivalent to demonstrating that our intuition and our mathematics are in harmony (at least for the diffusion equation).
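Exercise.
Solve
$$\frac{\partial u}{\partial t} = \Delta u + \mathbf{b}\cdot\nabla u + e^{t}\sin(x - b_x t), \qquad (28.1)$$
with the initial condition $u(\mathbf{r},0) = |\mathbf{r}|$. Here $\mathbf{b}$ is a constant vector and $b_x$ is its x-component.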
28.2 Maximum principle. Let $u$ be a solution363 of the diffusion equation
$$u_t = u_{xx} \qquad (28.2)$$
on $\Omega = J \times [0,T]$, where $J$ is an interval on the x-axis. Then, its maximum value is taken on the parabolic boundary $\Gamma = \partial J \times [0,T] \cup J \times \{0\}$. In particular, this means the maximum value of $|u|$ on $J$ is a non-increasing function of time.
[Demo] Let $\mu$ be the maximum value of $u$ on the parabolic boundary $\Gamma$, and define
$$v = (u - \mu)\,e^{-t}. \qquad (28.3)$$
$v$ satisfies
$$v_t + v = v_{xx} \qquad (28.4)$$
in $\Omega^\circ$.364 If we can prove that $v \le 0$ on $\Gamma$ implies $v \le 0$ in $J^\circ \times (0,T]$, then we are almost done. Suppose $v$ has a maximum value $v = v_0 > 0$ at $(x_0,t_0) \in \Omega^\circ$. At this point $v_{xx} \le 0$ and $v_t = 0$, so that (28.4) implies $v_0 \le 0$, a contradiction. If there is a maximum at the boundary $t = T$, then $v_t \ge 0$ and $v_{xx} \le 0$, so (28.4) again gives $v \le 0$. We are done.
(1) This principle also holds in d-space. An analogous demonstration works in any d-space, replacing $J$ with a bounded region.
(2) As can be seen from the demonstration, if the solution may be assumed to be bounded everywhere, then the principle holds even if the problem is on an unbounded region.
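A discrete analogue of the maximum principle can be observed with the explicit finite-difference scheme. This is only an illustrative sketch under assumptions not made in the notes: the interval is $[0,1]$ with $u = 0$ at both ends, the initial profile is an arbitrary choice, and the ratio $r = \Delta t/\Delta x^2$ is kept at or below $1/2$ so that each new value is a weighted average of old neighbouring values.

```python
import math

def heat_step(u, r):
    """One explicit step of u_t = u_xx with r = dt/dx**2; end values are kept fixed."""
    new = list(u)
    for i in range(1, len(u) - 1):
        new[i] = u[i] + r * (u[i + 1] - 2.0 * u[i] + u[i - 1])
    return new

def evolve(u0, r, steps):
    """Return the list of profiles at times 0, dt, ..., steps*dt."""
    history = [list(u0)]
    for _ in range(steps):
        history.append(heat_step(history[-1], r))
    return history

def initial_profile(n):
    """Sample sin(pi x) + 0.5 sin(3 pi x) on n+1 grid points of [0, 1], zero at the ends."""
    u0 = [math.sin(math.pi * i / n) + 0.5 * math.sin(3 * math.pi * i / n)
          for i in range(n + 1)]
    u0[0] = u0[-1] = 0.0
    return u0

if __name__ == "__main__":
    history = evolve(initial_profile(50), r=0.4, steps=200)
    # maximum at t = 0 (on the parabolic boundary) versus maximum at the final time
    print(max(history[0]), max(history[-1]))
```

With $r \le 1/2$ the coefficients $r$, $1-2r$, $r$ are non-negative and sum to one, which is the discrete counterpart of the argument in the demonstration; for larger $r$ the scheme is unstable and the bound can fail.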
Discussion.
(1) What can you say about the evolution of the number of peaks of a solution to the diffusion equation (under, say, a time-independent Neumann condition)?

363 There are actually several kinds of solutions. A solution in the ordinary sense of the calculus (requiring necessary differentiability, etc.) is called a classical solution.
364 $A^\circ$ denotes the open kernel of the set $A$. That is, $A^\circ$ is the largest open set in $A$.
Gevrey's uniqueness theorem. Consider
$$\frac{\partial u}{\partial t} - \frac{\partial^2 u}{\partial x^2} + a(x,t)\,u = 0. \qquad (28.5)$$
Here $a$ is positive and continuous in the closed space-time domain in the figure.365
Let $u$ be a solution to (28.5) that is continuous in the closed domain $U$ considered above, satisfying (28.5) on the region $\tilde{U}$ ($U$ with its parabolic boundary removed), and with continuous $\partial_t u$ and $\partial_x^2 u$ there. Then, $u$ cannot have any positive maximum nor negative minimum in $\tilde{U}$. □
[Demo] Suppose we have a positive maximum inside DABC. Then, at the point
$$u > 0, \quad \frac{\partial u}{\partial t} = 0, \quad \frac{\partial^2 u}{\partial x^2} \le 0, \qquad (28.6)$$
so that this contradicts $a > 0$. If there is a positive maximum on the open segment CD, then there,
$$u > 0, \quad \frac{\partial u}{\partial t} \ge 0, \quad \frac{\partial^2 u}{\partial x^2} \le 0. \qquad (28.7)$$
This also contradicts $a > 0$. To show the statement about the minimum, consider $-u$ instead.
28.3 Well-posedness (in the sense of Hadamard).366 Even if the unique solution exists, if the solution is extremely sensitive to the auxiliary conditions such as boundary and initial data, then the PDE may be useless for describing reproducible natural phenomena. A problem is said to be well-posed (in the sense of Hadamard), if
(1) there is a solution which is unique, and
(2) the solution depends continuously on the data (initial and other auxiliary conditions).

Otherwise, the problem is called ill-posed.367 Physically reasonable problems are often well-posed as we will see later. For example, the Dirichlet problem for the Laplace equation is well-posed (→29.9).368

The existence of a solution implies that the problem is not overdetermined. The uniqueness of the solution implies that the problem is not underdetermined.

365 A and B can be coincident. Furthermore, the side curves can wiggle wildly so long as they do not cross the upper and lower lines.
366 Jacques Salomon Hadamard, 1865-1963. Read J. Hadamard, The Psychology of Invention in the Mathematical Field (Dover, 1945) on creativity.
367 The condition (2) must be stated more precisely with the aid of some norm (→3.3 footnote) to make the concept 'continuous' meaningful.
368 One might suggest that chaos is an example of the lack of well-posedness, but most examples of chaos are well-posed, because the continuous dependence of the solution on the initial condition is trivially satisfied for any finite time.
28.4 Cauchy problem of diffusion equation with Dirichlet condition is well-posed. That is, the solution is unique and depends continuously on the initial and boundary data. [This theorem is proved for a bounded region here. Also we will not discuss the existence of a solution.]
[Demo] Let $u_1$ and $u_2$ be two solutions of the same problem. Then, due to the linearity of the problem, the difference $u = u_1 - u_2$ obeys the same diffusion equation with a homogeneous Dirichlet boundary condition (i.e., $u = 0$ at the boundary of the domain) and $u = 0$ initially, as we have discussed (→1.18). From the maximum principle $u$ cannot be larger than 0, and $-u$ cannot be larger than 0. Hence, $u_1 = u_2$. That is, if there is a solution, it is unique. Now, we compare two different problems 1 and 2 whose auxiliary conditions differ slightly. Let the solutions of 1 and 2 be $u_1$ and $u_2$, respectively. Then, the maximum principle tells us that the maximum value of $|u_1 - u_2|$ in the region cannot be larger than the differences in the initial and boundary data. Hence, the solution depends on the auxiliary conditions continuously.369 That is, the problem is well-posed in Hadamard's sense.
Exercise.
Show that $\int u \ln u\, dx$ is non-increasing, if $u$ obeys a diffusion equation. Assume the initial $u \ge 0$, and consider the problem in $\mathbf{R}^2$.
28.5 Anti-diffusion: violation of second law. Thermodynamically destabilizing the world can produce ill-posed problems. A typical example is the 'anti'-diffusion equation
$$\frac{\partial u}{\partial y} + \frac{\partial^2 u}{\partial x^2} = 0. \qquad (28.8)$$
Notice that the amplitude of the mode $e^{ikx}$ is amplified as $e^{+k^2 y}$, so unless the initial data decay faster than this factor in k-space, a kind of Hadamard instability occurs for any finite 'time' $y > 0$.370
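The loss of continuous dependence can be made concrete with a few numbers. The sketch below is illustrative only; the perturbation $(1/n)e^{inx}$ is a hypothetical choice, made so that the initial disturbance tends to zero in the sup norm while its size at any fixed $y > 0$ blows up.

```python
import math

def mode_growth(k, y):
    """Amplification factor of the mode exp(i k x) under u_y + u_xx = 0."""
    return math.exp(k * k * y)

def perturbed_amplitude(n, y):
    """Size at 'time' y of the initial perturbation (1/n) exp(i n x)."""
    return mode_growth(n, y) / n

if __name__ == "__main__":
    for n in (1, 5, 10, 20):
        # initial size 1/n shrinks, size at y = 0.1 explodes
        print(n, 1.0 / n, perturbed_amplitude(n, 0.1))
```

Replacing `k * k * y` by `-k * k * y` gives the ordinary diffusion equation, for which the same perturbations are damped instead.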
Discussion.
$$\frac{\partial u}{\partial t} + t\,\frac{\partial^2 u}{\partial x^2} = f(x,t) \qquad (28.9)$$
cannot make a well-posed problem. The reason should be obvious.371
369 That is, when the sup norm of the change in the auxiliary condition is made indefinitely small, so is the sup norm of the corresponding change of the solution.
370 As we have seen, the ill-posedness of a problem is closely related to instability in the ultraviolet limit ($k \to \infty$).
371 Y. Kannai, Israel J. Math. 9, 306 (1971).
28.6 Preservation of order, positivity. Let $u_1$ and $u_2$ be two solutions of the diffusion equation on the domain $\Omega$ as in 28.2. If $u_1 \le u_2$ on the parabolic boundary (→28.2), then $u_1 \le u_2$ in $\Omega^\circ$. Hence, for example, if $u_1 \le u_2$ at $t = 0$, then this relation holds forever. In particular, if the initial condition is positive and the boundary value is non-negative, then the solution is positive forever. This should be obvious from the maximum principle.

28.7 Spatially inhomogeneous and/or anisotropic diffusion. Physically, the consequences of irreversibility should not be affected by the existence of spatial inhomogeneity and/or anisotropy (with time dependence). We encounter the following equation in such a case (with the summation convention):
$$\frac{\partial u}{\partial t} = a_{ij}(x,t)\,\frac{\partial^2 u}{\partial x_i \partial x_j} + b_i(x,t)\,\frac{\partial u}{\partial x_i} + c(x,t)\,u \qquad (28.10)$$
or its divergence form (with different coefficient functions):
$$\frac{\partial u}{\partial t} = \frac{\partial}{\partial x_i}\left(a_{ij}(x,t)\,\frac{\partial u}{\partial x_j}\right) + b_i(x,t)\,\frac{\partial u}{\partial x_i} + c(x,t)\,u. \qquad (28.11)$$
The second law requires the positive definiteness of the matrix $\mathrm{Matr}(a_{ij})$. Under this condition it is known that so long as $c \le 0$ the maximum principle (→28.2) holds. Thus everything we can conclude intuitively about diffusion based on thermodynamics should also be captured in the spatially inhomogeneous diffusion equation. It is physically very sensible that the existence of the advection (→2B.6) term $b$ is irrelevant to the maximum principle.
28.8 Unbounded space. So far we have heavily relied on the boundedness of the domain of the problem. Note that the diffusion equation can have a rapidly growing solution even if the initial data is zero, $u(x,0) = 0$, as Tikhonov demonstrated.372 See also the warning in 1.18(5). In any case, this episode tells us a danger of mathematical modeling: since diffusion equations are derived as a balance condition of conserved quantities (→a1B.2), it is physically unthinkable that a solution which is initially zero everywhere can grow. (However, if the growth rate of the solution as a function of $x$ is not too rapid, then the initial value problem can be solved uniquely. In particular, a bounded solution is unique.)

28.9 Infinite propagation speed. For a very short time, the solution of the diffusion equation is almost independent of the (bounded) boundary condition away from the boundary, and is given by (3.6). In particular, if the thermal energy is concentrated at the origin at $t = 0$ (i.e., $T(x,0) = \delta(x)$ →14.5):
$$T(x,t) = \frac{1}{(4\pi t)^{d/2}}\, e^{-x^2/4t} \qquad (28.12)$$
is an accurate solution of $\partial_t T = \Delta T$ for short time in d-space (→16B.1).

For any positive $t$, however small it may be, $T(x,t) > 0$ for any $x$. Thus we must conclude that heat can travel at infinite speed. This is true for the diffusion equation for chemical species as well. This is physically unrealistic. However, for most applications of diffusion equations, this is good enough because the tail part of $T$ is much smaller than exponentially small quantities, and because significant error could occur only for extremely short times (when a collective description like diffusion is not applicable).

372 F. John, p211-3.
28.10 Short-time modification of diffusion equation: the Maxwell-Cattaneo equation.373 We must modify the diffusion equation, if we wish to describe the short time behavior of the system more realistically. This is only possible by adding higher order time derivatives.374
Hence, the following modification has been proposed:
(28.13)
where $c$ is a positive constant. This is called, in the context of heat conduction, the Maxwell-Cattaneo equation. We have already come across this type of equation in conjunction with the propagation of electromagnetic waves in matter (e.g., the telegrapher's equation →a1F.17). Therefore, obviously, infinite speed of propagation is eliminated.375

373 cf. Compt. Rend. 247, 431 (1958).
374 In Newton's equation of motion, the inertial effect is described by the second order time derivative, and the dissipative effect by the first order time derivative as in $\ddot{x} = -\eta\dot{x} + f$, where $\eta$ is the friction constant, and $f$ an external driving force. If we pay attention only to the very short time behavior of the system, we do not see the dissipation term. The effect of dissipation sets in only later. Such an observation is also important in hydrodynamics. The Euler equation (→a1E.7) accurately describes the initial motion of a body in a viscous fluid under impulsive force.
375 The equation now becomes a hyperbolic equation (→1.20). One of the important properties of hyperbolic equations is the finiteness of the propagation speed (→30.16).
29 Laplace Equation: Consequence of spatial moving average
A solution of the Laplace equation is called a harmonic function. This must be a function invariant under spatial moving averaging as we discussed in 1.13. This property almost determines the important features of the solutions of the Laplace equation and guarantees its well-posedness, etc.

Key words: harmonic function, Green's formula, mean-value theorem, its converse, maximum principle, analyticity of solution, Liouville's theorem

Summary:
(1) Solutions to the Laplace equation must be invariant under spatial moving average; a precise statement is the spherical mean-value theorem and its converse (29.4-5). The resulting smoothness can also be stated precisely (29.10).
(2) From this, we immediately know that harmonic functions cannot have any local extremum inside the domain (29.6, 29.8). This denies the existence of any stable electrostatic structure (29.7).
(3) Typical potential problems are well-posed (29.9).

29.1 Elementary summary. We have learned where the Laplace equation appears (→1.2, 1.14, a1B.3, a1F.6), and physically argued what auxiliary conditions can ensure the uniqueness of the solution (→1.19). The most important boundary conditions are Dirichlet conditions in which the value of the function $\psi$ on the boundary of the domain is fixed, and Neumann conditions in which the normal derivative of $\psi$ on the boundary is given.

Discussion.
The Cauchy problem of the Laplace equation is not well-posed. This was seen in Discussion 2B.4(7). Physically, this is not surprising. To obtain the Laplace equation instead of the wave equation for electromagnetic waves, we must change the sign of Faraday's law (→a1F.8). This implies that we replace Lenz's law with 'anti-Lenz's law'. Lenz's law is a manifestation of the stability of the world, so there is no surprise that the Laplace equation does not describe a well-behaved time evolution in our world.

29.2 Laplace equation and harmonic functions. Any classical solution to the Laplace equation is called a harmonic function. The electric potential due to point charges is a harmonic function where there is no charge (→a1F.6), and charges correspond to the singularities of the function. The equilibrium drumhead is described by a harmonic function. The real and imaginary parts of an analytic function are harmonic functions (→5.6).

Discussion.
(1) There is no solution to the 3-Laplace equation on the unit ball centered at the origin with the origin removed, with the boundary condition $u = 1$ on $|x| = 1$ and $u(0) = 0$.
(2) Consider the 2D Laplace equation $\Delta u = 0$ on the half plane $x > 0$ with the 'initial condition' $u(0,y) = 0$ and $\partial_x u(0,y) = f(y)$. If $f$ is analytic, then there is a local analytic solution, but if it is not, then there is not even a local solution.
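29.3 Green's formula. Let $D \subset \mathbf{R}^n$ be a bounded region, and $u$ and $v$ be $C^2$-functions defined on the closure of $D$. Here, we record the formulas again for convenience (→16A.19):
$$\int_D (v\Delta u + \mathrm{grad}\,u \cdot \mathrm{grad}\,v)\,d\tau = \int_{\partial D} v\,\mathrm{grad}\,u \cdot d\mathbf{S}, \qquad (29.1)$$
and
$$\int_D (v\Delta u - u\Delta v)\,d\tau = \int_{\partial D} (v\,\mathrm{grad}\,u - u\,\mathrm{grad}\,v) \cdot d\mathbf{S}. \qquad (29.2)$$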
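29.4 Spherical Mean-value theorem. Let $u$ be harmonic on a region $D \subset \mathbf{R}^n$, and $B_r(x)$ be a ball of radius $r$ centered at $x$ such that $B_r(x) \subset D$. Then, we have
$$u(x) = \frac{1}{S_{n-1}(r)} \int_{\partial B_r(x)} u(y)\,d\sigma(y), \qquad (29.3)$$
where $d\sigma(y) = |d\mathbf{S}(y)|$, the area of the surface element, and $S_{n-1}(r)$ is the surface area of the $(n-1)$-sphere (i.e., the skin of the n-ball) of radius $r$.376 □
This should be intuitively expected from the interpretation of the Laplacian (→1.13).
[Demo] Set $v(y) = 1/|x-y|^{n-2}$ ($n > 2$) or $\ln|x-y|$ ($n = 2$) in (29.2), and $D = B_r(x) \setminus B_\epsilon(x)$ ($r > \epsilon$).377 Since $v$ is harmonic in $\mathbf{R}^n \setminus \{x\}$ as a function of $y$, $v(y)$ is harmonic on $D$. To calculate the RHS of (29.2) we need the normal derivative on $\partial B_r(x)$ (written here for $n > 2$; the case $n = 2$ is analogous):
$$\frac{\partial v}{\partial n} = (2-n)\,r^{1-n}. \qquad (29.4)$$
Since both $u(y)$ and $v(y)$ are harmonic on $D$, (29.2) reads
$$0 = \int_{\partial D} (v\,\partial_n u - u\,\partial_n v)\,d\sigma(y) = \int_{\partial B_r(x)} (v\,\partial_n u - u\,\partial_n v)\,d\sigma(y) - \int_{\partial B_\epsilon(x)} (v\,\partial_n u - u\,\partial_n v)\,d\sigma(y). \qquad (29.5)$$
Using (29.4) and (16.35), we can rewrite this as
$$0 = -(2-n)\left[r^{1-n}\int_{\partial B_r(x)} u\,d\sigma(y) - \epsilon^{1-n}\int_{\partial B_\epsilon(x)} u\,d\sigma(y)\right], \qquad (29.6)$$
which implies the theorem, because $S_{n-1}(r) = r^{n-1}S_{n-1}(1)$ and
$$\lim_{\epsilon\to 0} \epsilon^{1-n}\int_{\partial B_\epsilon(x)} u\,d\sigma(y) = S_{n-1}(1)\,u(x). \qquad (29.7)$$

376 $S_{n-1}(r) = 2\pi^{n/2} r^{n-1}/\Gamma(n/2)$.
377 $A \setminus B$ is the set of all the points in $A$ but not in $B$: $A \setminus B \equiv \{x \mid x \in A, x \notin B\}$.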
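The mean-value property is easy to test numerically in the plane ($n = 2$), where the sphere is a circle. The following is a sketch with hypothetical helper names; the two test functions are arbitrary examples chosen here, one harmonic and one not.

```python
import math

def circle_mean(u, x0, y0, r, n=400):
    """Average of u over the circle of radius r centred at (x0, y0)."""
    total = 0.0
    for j in range(n):
        t = 2.0 * math.pi * j / n
        total += u(x0 + r * math.cos(t), y0 + r * math.sin(t))
    return total / n

def harmonic(x, y):
    """Re exp(x + iy), harmonic on the whole plane."""
    return math.exp(x) * math.cos(y)

def not_harmonic(x, y):
    """x^2 + y^2, whose Laplacian is 4."""
    return x * x + y * y

if __name__ == "__main__":
    x0, y0, r = 0.3, -0.2, 0.7
    print(circle_mean(harmonic, x0, y0, r) - harmonic(x0, y0))          # close to 0
    print(circle_mean(not_harmonic, x0, y0, r) - not_harmonic(x0, y0))  # close to r**2
```

For the non-harmonic example the circle mean exceeds the centre value by $r^2$, which reflects the interpretation of the Laplacian as the deviation from the local average.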
The converse of this theorem is also true:
29.5 Theorem [Converse of mean-value theorem]. Let $u$ be a continuous function on a region $D$. If the mean-value theorem 29.4 holds for any $r > 0$ and $x$ such that $B_r(x) \subset D$, then $u$ is $C^\infty$ and harmonic on $D$. □ 378
29.6 Maximum principle. Let $D$ be an open region and $u$ be harmonic (→29.2) there. Suppose $\sup_{x \in D} u(x) = A < \infty$. If $u \not\equiv A$ on $D$, then $u(x) < A$ for all $x \in D$. □
This should be obvious from the mean-value theorem 29.4. Also, since a harmonic function is a steady solution of a diffusion equation, from the maximum principle for the diffusion equation (→28.2), this should be physically sensible. Changing $u$ to $-u$ gives the minimum counterpart. This theorem implies:
Corollary. Let $D$ be a compact set, and $u$ be a harmonic function on the open kernel of $D$ and continuous on $D$; then the extremum of $u$ on $D$ is achieved on $\partial D$. □
This implies that a static electric potential cannot have its extreme values where there is no charge. A grave consequence is the collapse of classical physics.
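A discrete version of the statement can be seen by relaxation on a grid, where 'harmonic' is replaced by 'equal to the average of the four neighbours'. This is only a sketch under assumptions introduced here: a square grid, boundary values 1 on one side and 0 on the others, and Jacobi iteration as the solver.

```python
def make_grid(n, top=1.0, other=0.0, interior=0.5):
    """n x n grid with prescribed boundary values and a constant interior guess."""
    grid = [[interior] * n for _ in range(n)]
    for j in range(n):
        grid[0][j] = top
        grid[n - 1][j] = other
    for i in range(n):
        grid[i][0] = other
        grid[i][n - 1] = other
    return grid

def relax(grid, sweeps):
    """Jacobi sweeps: each interior value becomes the mean of its four neighbours."""
    n, m = len(grid), len(grid[0])
    for _ in range(sweeps):
        new = [row[:] for row in grid]
        for i in range(1, n - 1):
            for j in range(1, m - 1):
                new[i][j] = 0.25 * (grid[i + 1][j] + grid[i - 1][j]
                                    + grid[i][j + 1] + grid[i][j - 1])
        grid = new
    return grid

def interior_values(grid):
    """Flat list of the values at interior grid points."""
    return [v for row in grid[1:-1] for v in row[1:-1]]

if __name__ == "__main__":
    solution = relax(make_grid(12), 2000)
    vals = interior_values(solution)
    print(min(vals), max(vals))  # both strictly between the boundary extremes 0 and 1
```

Every interior value is a convex combination of its neighbours, so no interior value can exceed the largest boundary value; this is the grid analogue of the argument from the mean-value theorem.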
Discussion.Consider
(29.8)
in 3-space on a bounded region $\Omega$. Assume $u = 0$ on $\partial\Omega$. Show that $-1 \le u \le 1$.

378 For a proof, see Folland p91 (2.5).
29.7 Classical physics cannot explain atoms: Earnshaw's theorem. It is impossible to have a stable static configuration of charges in any static electric field. □
Unstable stationary configurations are not impossible (give an example). This theorem and the electromagnetic radiation inevitable from accelerated charges conclusively killed the possibility of explaining atoms within classical physics.

29.8 Strong maximum principle. Let $\Omega$ be a bounded region in $\mathbf{R}^n$, and $u$ be harmonic there. If $u$ attains its maximum value $M$ at an inner point of $\Omega$, then $u$ is constant on $\Omega$.

This is obvious from the mean-value theorem.

29.9 Uniqueness and well-posedness. The solution of the Laplace equation on a bounded domain $D$, if it exists,379 is unique and depends continuously on the boundary data (→29.11). □
The proof is quite parallel to that for the diffusion equation (→28.4).
[Demo] Let $u_1$ and $u_2$ be two solutions of the same problem. Then, due to the linearity of the problem, the difference $u = u_1 - u_2$ obeys the Laplace equation with the homogeneous Dirichlet boundary condition (i.e., $u = 0$ at the boundary of the domain). From the maximum principle (→29.6) $u$ cannot be larger than 0, and $-u$ cannot be larger than 0. Hence, $u_1 = u_2$. That is, if there is a solution, it is unique. Now, we compare two different problems 1 and 2 whose auxiliary conditions differ slightly. Let the solutions of 1 and 2 be $u_1$ and $u_2$, respectively. Then, the maximum principle tells us that the maximum value of $|u_1 - u_2|$ in the region cannot be larger than the differences in the boundary data.

Discussion.
The existence of a solution in a domain in 3 or higher dimensional space is a very difficult problem, even if the boundary condition is continuous.

29.10 Smoothness of the solution. Since a harmonic function is, roughly speaking, invariant under spatial moving average, it must be smooth. Actually,
Theorem. All the solutions of the Laplace equation are real analytic (→13C.6(2) for $d = 2$. Here the assertion is for all $d \ge 2$. Analyticity means the convergence of the Taylor series.). □
Discussion.
(1) A solution to $\Delta u = f$ is analytic if $f$ is analytic (Courant-Hilbert).
(2) Hadamard's example. Let $D$ be a bounded region. There exists a continuous function $F: \partial D \to \mathbf{R}$ such that it becomes the boundary value of a harmonic function $\psi$ on $D$ for which $\int_D |\mathrm{grad}\,\psi|^2 d\sigma$ is not bounded. In this case although $\psi$ is $C^\infty$, its derivatives behave wilder and wilder as the point approaches the boundary of the domain.

If the boundary value is continuous, then the corresponding Dirichlet problem of the Laplace equation on a bounded domain has at most one solution.

379 We have not yet constructed the solution!
29.11 Well-posedness of Poisson's equation. The general Poisson problem has the following form
$$\Delta u = F \ \text{in } D, \qquad u = f \ \text{on } \partial D. \qquad (29.9)$$
Here $D$ is a bounded region. If we are interested in smooth solutions (for example, $C^2$), then

(29.10)

where $\|\ \|_D$ is the $L_2$-norm on $D$, and $c_1, c_2$ are positive constants. This inequality clearly implies the well-posedness of our problem.

It is a good occasion to learn something about the so-called a priori estimate.
The inequality can be demonstrated as follows.
(1) First, the problem is split into $v$ and $w$: $\Delta v = F$ in $\Omega$, $v = 0$ on $\partial\Omega$ and $\Delta w = 0$ in $\Omega$, $w = f$ on $\partial\Omega$.
(2) From the properties of the algebraic and geometric averages we get

(29.11)

for any positive $\epsilon$.
(3) Therefore,

(29.12)

That is, we have only to find bounds for $v$ and $w$, respectively.
(4) With the aid of the variational problem (→34C.13) for the eigenvalue of the Laplacian $-\Delta$:
$$0 < \lambda_1 = \inf_{v|_{\partial\Omega}=0} \frac{\int_\Omega v(-\Delta)v\,dx}{\int_\Omega v^2\,dx}. \qquad (29.13)$$
Hence, with the aid of the Schwarz inequality (→20.7)

(29.14)

Hence,
$$\|v\|^2 \le \frac{1}{\lambda_1^2}\int_\Omega F^2\,dx. \qquad (29.15)$$
(5) Introduce an auxiliary function $\varphi$ such that $\Delta\varphi = w$ on $\Omega$ with the homogeneous Dirichlet condition on $\partial\Omega$ (the existence of the solutions $w$ and $\varphi$ is a prerequisite of our argument). With the aid of Green's formula (→29.3)
$$\int_{\partial\Omega} w\,\frac{\partial\varphi}{\partial n}\,d\sigma - \int_{\partial\Omega} \varphi\,\frac{\partial w}{\partial n}\,d\sigma = \int_\Omega w\Delta\varphi\,dx - \int_\Omega \varphi\Delta w\,dx. \qquad (29.16)$$
Hence,

(29.17)

We have used the Schwarz inequality.
(6) For a function vanishing on the boundary

(29.18)

Hence, $\|w\|^2$ is bounded by the squared norm of the boundary data $f$ on $\partial\Omega$.380
Discussion.
Partial derivatives of a harmonic function with respect to the Cartesian coordinates are again harmonic. However, the partial derivatives with respect to curvilinear coordinates are not necessarily so.

29.12 Comparison theorem. Let $u$ and $v$ be harmonic functions on a bounded domain $\Omega$, and $u \le v$ on $\partial\Omega$. Then, $u \le v$ throughout $\Omega$.

29.13 Liouville's theorem.381 If $u$ is a bounded harmonic function on the whole space $\mathbf{R}^n$, then $u$ is a constant. □ 382
29.14 More general elliptic equation. The essence of the Laplacian is that it is an operator giving the deviation of the value of the function from its local average. The Laplacian is obtained when we assume that the weight for the average is everywhere uniform (→1.13). We should be able to choose a weighted average. Then, a more general equation like
$$a_{ij}\,\frac{\partial^2 u}{\partial x_i \partial x_j} + b_i\,\frac{\partial u}{\partial x_i} + c(x)\,u = 0 \qquad (29.20)$$
with the positive definite matrix $\mathrm{Matr}(a_{ij})$ appears. We may expect that the key properties of the Laplacian should be true even for $a_{ij}\partial_i\partial_j$, because they are due to the averaging principle. Indeed the maximum principle is true if $c \le 0$ as intuitively expected. (Most of the statements above hold if $c \le 0$.383)

380 The following theorem is also relevant.
Aleksandrov's theorem. The solution to Poisson's equation smoothly depends on the charge distribution. Or, more precisely: Let $D$ be a bounded domain and $u$ be a solution of $\Delta u = f$ in $D$ with a homogeneous Dirichlet condition that is continuous up to the boundary of $D$. Then,
$$\sup u \le C\,\|f\|_{d,D}, \qquad (29.19)$$
where $C$ is a constant dependent on the spatial dimensionality and the radius of $D$, and $\|\cdot\|_d$ is the $L_d$-norm. [The $L_p$-norm for any positive $p$ is defined by $\|f\|_p \equiv (\int |f|^p dx)^{1/p}$, where the integral is the Lebesgue integral (→19).] See Egorov-Shubin, p93.
381 Joseph Liouville, 1809-1882.
382 Folland p94 (2.11).
383 See Yu. V. Egorov and M. A. Shubin (eds), Partial Differential Equations III, Chapter 2 (Springer, 1991).
30 Wave Equation: Finiteness of propagation speed
Wave equations are representative hyperbolic equations. With the aid of energy conservation, we discuss the well-posedness of wave equation problems. A general method to solve the 3-space wave equation is given (method of spherical means due to Poisson), which clearly shows Huygens' principle. Finally, the characterization of hyperbolic equations with constant coefficients due to Gårding is summarized.

Key words: characteristic curve, domain of dependence (influence), energy conservation, Huygens' principle, method of spherical means, focusing, hyperbolicity in Gårding's sense, finiteness of propagation speed.

Summary:
(1) Wave equations have well-defined domains of dependence and influence: they are called the past and the future in relativity (30.3). Huygens' principle is correctly captured by the wave equation (28.9).
(2) Wave equations allow propagation of a solution which is not smooth along a special curve (characteristic curve) (30.2).
(3) Wave equations preserve energy. This implies well-posedness of wave equation problems (30.4, 30.6).
(4) All the general methods to solve d-space wave equations are based on reducing them to 1D wave equations (30.19. For another, see 32D.9). In d(≥ 2)-space, the time evolution due to wave equations may reduce the smoothness in the initial waves (30.10).
(5) Gårding conclusively characterized hyperbolicity (30.12-14), which implies finiteness of propagation speed (30.15).

30.1 Elementary summary. We have learned where the wave equations appear (→1.2, a1D.9-11, a1F.8), and physically argued what auxiliary conditions can ensure the uniqueness of the solution (→1.20). We know how to obtain the unique solution to the initial value problem in $\mathbf{R}$ as d'Alembert's formula (→2B.4) for the 1-space problem
$$\frac{\partial^2 u}{\partial t^2} = c^2\,\frac{\partial^2 u}{\partial x^2}. \qquad (30.1)$$
We know from the telegrapher's equation (→a1F.17 or the Maxwell-Cattaneo equation →28.10) that the second order time derivative prohibits infinite speed propagation of the signal.
Exercise.Solve
(30.2)
on R x R.
30.2 Characteristic curve. The solution method in 13C.6(1) reduces the 1-wave equation (30.1) to two first order PDEs whose characteristic curves (→13A.4) are $x \pm ct = \text{const}$. These curves (actually lines) are called the characteristic curves of the wave equation (→(C) below). If $u$ is a solution to (30.1), then we can prove the following general identity:
$$u(A) + u(C) = u(B) + u(D), \qquad (30.3)$$
where A-D are the apices of any parallelogram ABCD in space-time whose edges are parallel to the characteristic curves $x \pm ct = \text{const}$. This equality can be shown easily with the aid of d'Alembert's solution (→2B.4). We may characterize a 'generalized solution' to (30.1) as any function $u$ satisfying (30.3).
[Figure: parallelogram ABCD in the (x, t)-plane with edges along characteristic curves.]
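The parallelogram identity can be verified directly for any solution of the d'Alembert form $F(x - ct) + G(x + ct)$. The sketch below uses arbitrary illustrative choices (not from the notes) for the wave speed, for $F$ and $G$, and for the parallelogram.

```python
import math

SPEED = 2.0  # wave speed c, an arbitrary illustrative value

def wave(x, t):
    """A solution of the d'Alembert form F(x - ct) + G(x + ct)."""
    right_mover = math.exp(-(x - SPEED * t) ** 2)   # F
    left_mover = math.sin(x + SPEED * t)            # G
    return right_mover + left_mover

def characteristic_parallelogram(x0, t0, a, b):
    """Vertices A, B, C, D of a parallelogram whose sides lie along x -+ ct = const."""
    va = (x0, t0)
    vb = (x0 + SPEED * a, t0 + a)                    # from A along x - ct = const
    vd = (x0 - SPEED * b, t0 + b)                    # from A along x + ct = const
    vc = (x0 + SPEED * a - SPEED * b, t0 + a + b)    # opposite to A
    return va, vb, vc, vd

if __name__ == "__main__":
    va, vb, vc, vd = characteristic_parallelogram(0.3, 0.1, 0.7, 1.1)
    print(wave(*va) + wave(*vc) - wave(*vb) - wave(*vd))  # zero up to rounding
```

The identity holds because $x - ct$ takes the same value at A and B and at D and C, while $x + ct$ takes the same value at A and D and at B and C.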
Discussion.
(A) Hyperbolic equations allow propagation of discontinuity without smoothing. Rewrite the wave equation (30.1) in the following form:
$$\frac{\partial v}{\partial t} = c\,\frac{\partial u}{\partial x}, \qquad \frac{\partial u}{\partial t} = c\,\frac{\partial v}{\partial x}. \qquad (30.4)$$
Is there any curve $g(x,t) = 0$ on which $u$ and $v$ are continuous but their derivatives jump? [We have already discussed this in detail in 1.2a Discussion.]
(B) Try the same thing as above for the telegrapher's equation.
(C) We have already discussed the meaning of the characteristic curve in D 2.2. Let us continue the discussion for more general cases. Consider

(30.5)

where $c(x)$ is a positive valued function. Suppose there is a discontinuity of the solution of this equation along a curve $\varphi(x,t) = 0$. We assume the solution is smooth except on this curve. We rewrite the equation with the new coordinates $X = \varphi(x,t)$ and $Y = \psi(x,t)$, where $\psi$ is chosen to make $(X,Y)$ a well-behaved coordinate system.
(1) Show that the result can be written as

(30.6)

where $L(\varphi)$

(30.7)

(30.8)

(2) Suppose $\partial u/\partial X$ has a discontinuity across $\varphi(x,t) = 0$. Then, show that
$$Q(\varphi,\varphi) = 0 \qquad (30.9)$$
must be satisfied. This equation is called the characteristic equation, and $\varphi = \text{const}$ is called a characteristic curve.
(3) See that $x \pm ct = \text{const}$ are characteristic curves for the ordinary wave equation.
(4) There are two characteristic curves passing through a given point. The singularity we are discussing is constrained on them, so its propagating speed should be given by
$$\frac{dx}{dt} = -\frac{\partial\varphi(x,t)}{\partial t} \Big/ \frac{\partial\varphi(x,t)}{\partial x} = \pm c(x). \qquad (30.10)$$
(5) Notice that to solve the equation $Q = 0$ is equivalent to solving (30.10).
30.3 Domain of dependence, finite propagation speed. D'Alembert's solution (→2B.3) clearly shows that $u$ at $x$ at time $t$ is completely determined by the initial data in the interval $[x - ct, x + ct]$. This interval is called the domain of dependence. Conversely, the initial data at $\xi$ can influence the interval $[\xi - ct, \xi + ct]$ of the space at time $t$. This of course means that the disturbance can propagate at fastest with speed $c$, in contradistinction to parabolic equations (→28.9).
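The domain of dependence can be illustrated with a direct evaluation of d'Alembert's formula, assumed here in its usual form $u(x,t) = \frac{1}{2}[f(x-ct) + f(x+ct)] + \frac{1}{2c}\int_{x-ct}^{x+ct} g(s)\,ds$. The helper names and the Simpson quadrature are choices made for this sketch, not part of the notes.

```python
import math

def dalembert(f, g, c, x, t, n=200):
    """d'Alembert's formula for u_tt = c^2 u_xx with u(x,0) = f, u_t(x,0) = g.

    The integral of g over [x - ct, x + ct] uses composite Simpson's rule
    with n (even) subintervals.
    """
    a, b = x - c * t, x + c * t
    h = (b - a) / n
    s = g(a) + g(b)
    for i in range(1, n):
        s += (4.0 if i % 2 else 2.0) * g(a + i * h)
    return 0.5 * (f(a) + f(b)) + (s * h / 3.0) / (2.0 * c)

def bump(center, width=0.5):
    """Disturbance equal to 1 on |x - center| < width and 0 elsewhere."""
    return lambda x: 1.0 if abs(x - center) < width else 0.0

if __name__ == "__main__":
    f = lambda x: math.exp(-x * x)
    g = math.cos
    far = bump(5.0)   # supported outside [-1, 1]
    base = dalembert(f, g, 1.0, 0.0, 1.0)
    changed = dalembert(lambda x: f(x) + far(x), lambda x: g(x) + far(x), 1.0, 0.0, 1.0)
    print(base == changed)  # data outside the domain of dependence do not matter
```

Only values of $f$ at the two end points and values of $g$ inside the interval enter the formula, so any change of the data outside $[x - ct, x + ct]$ leaves $u(x,t)$ untouched.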
Discussion: Characteristic initial value problem.
The light cone is a characteristic surface. If $u$ is given on a characteristic surface as is shown in the figure, then the solution is uniquely determined within its domain of influence. Hence, generally no boundary value problem in a closed domain has a solution for wave equations.
30.4 Energy conservation. The energy integral
$$E(t) = \frac{1}{2}\int \left\{\left(\frac{\partial u}{\partial t}\right)^2 + c^2\left(\frac{\partial u}{\partial x}\right)^2\right\} dx \qquad (30.11)$$
is time independent for classical solutions (→a1D.12).

A formal calculation exchanging the order of differentiation with respect to time and integration is justifiable (→19.17).
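The formal calculation can be sketched as follows. It assumes that $E(t) = \frac{1}{2}\int (u_t^2 + c^2 u_x^2)\,dx$, that $u$ is a classical solution of $u_{tt} = c^2 u_{xx}$, and that the boundary term vanishes (for example because $u_t$ or $u_x$ is zero at the ends of the interval, or because $u$ decays at infinity); these boundary assumptions are added here for the illustration.

```latex
\begin{align*}
\frac{dE}{dt}
  &= \int \left( u_t u_{tt} + c^2 u_x u_{xt} \right) dx \\
  &= \int u_t \left( u_{tt} - c^2 u_{xx} \right) dx
     + \Bigl[ c^2 u_x u_t \Bigr]_{\text{boundary}} \\
  &= 0 .
\end{align*}
```

The second line is an integration by parts of the term $c^2 u_x u_{xt}$; the integrand then vanishes identically by the wave equation.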
Discussion.
Suppose that a vibrating string of length $L$ with a fixed end condition is subjected to a damping force $-a\,\partial u/\partial t$. Discuss how the energy conservation is violated.

30.5 Uniqueness revisited. Although we already know the unique existence of the solution to the initial value problem of (30.1) in $\mathbf{R}$, let us reconsider the problem in terms of the energy integral. Since the equation is linear, to prove the uniqueness, we have only to show that the homogeneous problem has only the zero solution: if $v$ satisfies (30.1) and the auxiliary conditions $v(x,0) = 0$ for $x \in D$, and $v(x,t) = 0$ for $x \in \partial D$ for $t \ge 0$, then $v(x,t) = 0$ in $D \times [0,t]$. For this initial condition the total energy (30.11) is zero, so that the constancy of the energy integral implies that $\partial_t v(x,t) = \partial_x v(x,t) = 0$. This implies (with the aid of the mean value theorem) that $v$ is a constant. Since $v$ is continuous, this implies that $v \equiv 0$.
Discussion.(1) Riemann's method. Let
fJ2 fJ2L=:p(x)otZ - OxZ' (30.12)
Lv
v It=to ,x=xo
0,
1,(30.13)
(30.14)
2Vp(x) ~: +o~v = 0 on characteristic curves.
(30.15)
The solution v is called the Riemann function (fundamental solution). In terms of this function, the solution to the initial value problem can be obtained as
$$(\sqrt{p(x)}\,u)(P) = \frac{1}{2}\left[(\sqrt{p(x)}\,uv)(A) + (\sqrt{p(x)}\,uv)(B)\right] + \frac{1}{2}\int_{x_A}^{x_B} p(x)\left(\frac{\partial u}{\partial t}v - u\frac{\partial v}{\partial t}\right)dx, \qquad (30.16)$$
where A, B, P are the points in the figure. The formula is called Riemann's formula, and d'Alembert's formula is its special case.
(2) How can we determine Riemann's function? The problem is to solve v for which the auxiliary conditions are given on the characteristic curves. Such a problem is called a Goursat problem or characteristic boundary value problem. We change the independent variables from x, t to $\varphi_+$ and $\varphi_-$ (characteristic curves, →30.2). The problem now reads
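$$\frac{\partial^2 v}{\partial\varphi_-\partial\varphi_+} - a\frac{\partial v}{\partial\varphi_-} - b\frac{\partial v}{\partial\varphi_+} = 0 \qquad (30.17)$$
with the boundary conditions
(30.18)
Here a, b and $f_\pm$ are given functions. If we define
$$\Psi_\pm = \frac{\partial v}{\partial\varphi_\pm}, \qquad (30.19)$$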
then the PDE can be cast into the following simultaneous Volterra integral equations:
(30.20)
(30.21)
This can be solved by an iterative replacement method with the starting choice of $\Psi_- = f'_-$, $\Psi_+ = f'_+$.
30.6 Well-posedness. We consider two problems (30.1) with $u(x, 0) = f_i(x)$ and $\partial_t u(x, 0) = g_i(x)$ in R (i = 1, 2). Denoting each solution as $u_i$, we can easily get
(30.22)
(30.23)
from d'Alembert's formula (→2B.4). Hence, the solution depends on the data continuously. That is, small changes of the data cause a small change in the solution for any finite time.
30.7 Inhomogeneous wave equation. Consider
$$\frac{\partial^2 u}{\partial t^2} - c^2\frac{\partial^2 u}{\partial x^2} = F(x, t)$$
in R × R with the initial condition $u(x, 0) = f(x)$ and $\partial_t u(x, 0) = g(x)$, where f is $C^2$ and g is $C^1$. The problem is a superposition of the homogeneous equation with the inhomogeneous initial conditions studied in 2B.4 and the following problem of inhomogeneous equation with homogeneous initial conditions:
(30.24)
(30.25)
(30.26)
with $v(x, 0) = 0$ and $\partial_t v(x, 0) = 0$. The problem can be solved easily with the introduction of the new variables (a standard trick, →2B.3) x ± ct as in
$$v(x, t) = \frac{1}{2c}\int_0^t d\tau \int_{x-c(t-\tau)}^{x+c(t-\tau)} F(y, \tau)\,dy.$$
Notice that if F(x, t) is an odd function of x, then so is v for all t.
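The double-integral formula above can be checked numerically. The following is a minimal sketch, not part of the original notes: the function name `duhamel`, the midpoint rule, and the test forcings are illustrative assumptions. For F = 1 the formula should give v = t²/2, and for the odd forcing F(y, τ) = y it should give v = x t²/2, which is odd in x as remarked.

```python
def duhamel(F, x, t, c=1.0, n=200):
    """Midpoint-rule approximation of
    v(x,t) = (1/2c) * int_0^t dtau int_{x-c(t-tau)}^{x+c(t-tau)} F(y,tau) dy."""
    dtau = t / n
    total = 0.0
    for j in range(n):
        tau = (j + 0.5) * dtau           # midpoint in tau
        a = c * (t - tau)                # half-width of the domain of dependence
        dy = 2.0 * a / n
        inner = sum(F(x - a + (i + 0.5) * dy, tau) for i in range(n)) * dy
        total += inner * dtau
    return total / (2.0 * c)

# F = 1 should give t^2/2; F(y, tau) = y (odd in y) should give x*t^2/2 (odd in x).
v_const = duhamel(lambda y, tau: 1.0, x=0.3, t=2.0, c=1.5)
v_right = duhamel(lambda y, tau: y, x=0.7, t=1.0)
v_left = duhamel(lambda y, tau: y, x=-0.7, t=1.0)
```

The midpoint rule is exact for these two forcings, so the comparison is limited only by rounding; for a general F the accuracy depends on n.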
30.8 Wave equation in 3-space, Huygens' principle. The initial value problem for the wave equation $\partial_t^2 u = c^2\Delta u$ with the initial condition
$$u = f(x), \quad \partial_t u = g(x) \quad \text{for } t = 0 \qquad (30.27)$$
in 3-space has the following solution:
$$u(x, t) = \frac{1}{4\pi c^2 t}\int_{|y-x|=ct} g(y)\,d\sigma(y) + \frac{\partial}{\partial t}\left(\frac{1}{4\pi c^2 t}\int_{|y-x|=ct} f(y)\,d\sigma(y)\right). \qquad (30.28)$$
This is an explicit expression of Huygens' principle. This equation can be a starting point of a numerical scheme. A demonstration of the equation follows.
Exercise.
Solve the following 3-wave equation:
$$u_{tt} = \Delta u \qquad (30.29)$$
with the initial condition $u = x^2 + y^2 + z^2$ and $u_t = z$.
Needless to say, an inhomogeneous problem $\Box u = q$ can be solved by linear decomposition. The inhomogeneous problem with homogeneous auxiliary conditions can be solved easily in terms of Green's functions (→40).
30.9 Method of spherical means [Poisson]. Define
$$M_h(x, r) = \frac{1}{4\pi}\int_{|y|=1} h(x + ry)\,d\sigma(y), \qquad (30.30)$$
where h is a $C^2$-function, and σ is the area element of the sphere. $M_h$ is an even function of r. Using Gauss' theorem (→2C.13), we get the following Darboux's equation
$$\left(\frac{\partial^2}{\partial r^2} + \frac{2}{r}\frac{\partial}{\partial r}\right)M_h(x, r) = \Delta M_h(x, r). \qquad (30.31)$$
Here Δ is the Laplacian acting on the function of x. (30.26) is converted to
$$\frac{\partial^2}{\partial t^2}(rM_u) = c^2\frac{\partial^2}{\partial r^2}(rM_u), \qquad (30.32)$$
where $M_u$ is interpreted as a function of x, r and t as $M_u(x, r, t)$, and the initial condition becomes
$$M_u = M_f, \quad \partial_t M_u = M_g \quad \text{for } t = 0. \qquad (30.33)$$
Notice that $M_u(x, 0, t) = u(x, t)$. (30.32) can be solved as (→2B.4):
$$rM_u(x, r, t) = \frac{1}{2}\left[(r + ct)M_f(x, r + ct) + (r - ct)M_f(x, r - ct)\right] + \frac{1}{2c}\int_{r-ct}^{r+ct} yM_g(x, y)\,dy. \qquad (30.34)$$
Using the fact that $M_f$ and $M_g$ are even functions of r, we can rewrite this as
$$M_u(x, r, t) = \frac{(ct + r)M_f(x, ct + r) - (ct - r)M_f(x, ct - r)}{2r} + \frac{1}{2cr}\int_{ct-r}^{ct+r} yM_g(x, y)\,dy. \qquad (30.35)$$
Now, take the r → 0 limit (l'Hospital's rule is used) and we finally arrive at (30.28). See 32C.14.
30.10 Focusing effect. (30.28) implies that the smoothness of the solution u can be less than that of the initial data due to the derivative in the formula. This effect is called the focusing effect. This can happen when the initial condition is focused into a small set, making caustics. This does not happen in 1-space.
30.11 What is the mathematical essence of the wave equation? Physically, that the singularity can be propagated without smoothing (propagation of shock waves, for example) is a remarkable distinction from the diffusion equation (parabolic equation). Also the finiteness of the speed of propagation is in striking contrast to the diffusion equation (→28.9). Since the wave equation is nothing but Newton's equation of motion of an idealized elastic body (→a1D.9), the Newton-Laplace determinacy should apply. That is, the Cauchy problem must be well posed (→28.3). Gårding [384] answered the question decisively at least for the constant coefficient linear partial differential equations (of any order).
30.12 Hyperbolicity in Gårding's sense. Let $L \equiv L(\partial_t, \nabla)$ be an N-th order linear PDE operator with constant coefficients. If L contains $\partial^N/\partial t^N$ [385] and if the real parts of the zeros $\lambda_i(\xi)$ of the characteristic equation $L(\lambda, i\xi) = 0$ [386] considered as an equation for λ are bounded as a function of ξ, then we say Lu = 0 is a hyperbolic equation in Gårding's sense.
30.13 Example.
(1) Wave equation $(\partial_t^2 - c^2\Delta)u = 0$. $L(\lambda, i\xi) = \lambda^2 + c^2\xi^2$. Therefore, $\lambda(\xi) = \pm ic|\xi|$. That is, the characteristic roots are purely imaginary, so obviously the equation is hyperbolic in Gårding's sense.
(2) Diffusion equation $(\partial_t - D\Delta)u = 0$. $L(\lambda, i\xi) = \lambda + D\xi^2$, so that $\lambda(\xi) = -D\xi^2$ is real and is not bounded as a function of ξ.
[384] Gårding wrote a nice book on mathematics: L. Gårding, Encounter with Mathematics (Springer, 1977). Those who are interested in mathematics as a part of the modern culture will enjoy the book.
[385] If the highest order derivative is not $\partial_t$, then 30.15 below does not hold. That is, the propagation of front has infinite speed.
[386] Here not only the highest order terms but all the derivatives are taken into account. Furthermore, i accompanies ξ.
(3) However, if we add a second order time derivative term with a small positive coefficient as $(\epsilon\partial_t^2 + \partial_t - D\Delta)u = 0$, which is called the telegrapher's equation or Maxwell-Cattaneo equation (→a1F.17, 28.10), the situation is drastically different from the diffusion equation. For this $L(\lambda, i\xi) = \epsilon\lambda^2 + \lambda + D\xi^2$, so that $\lambda(\xi) = (-1 \pm \sqrt{1 - 4\epsilon D\xi^2})/2\epsilon$. Hence its real part is bounded as a function of ξ. That is, the telegrapher's equation is hyperbolic in Gårding's sense.
(4) Certainly, the Laplace equation Δu = 0 is not hyperbolic.
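The boundedness test in examples (1)-(3) can be illustrated numerically. This is only a sketch under assumed parameter values (c = 1, D = 1, ε = 0.1) and an arbitrary sample of wave numbers; the function names are hypothetical and not from the notes. It evaluates the characteristic roots λ(ξ) quoted above and records the largest |Re λ| over the sample.

```python
import cmath

def roots_wave(xi, c=1.0):
    # zeros of lambda^2 + c^2 xi^2
    return [1j * c * abs(xi), -1j * c * abs(xi)]

def roots_diffusion(xi, D=1.0):
    # zero of lambda + D xi^2
    return [complex(-D * xi * xi)]

def roots_telegrapher(xi, eps=0.1, D=1.0):
    # zeros of eps*lambda^2 + lambda + D xi^2
    disc = cmath.sqrt(1.0 - 4.0 * eps * D * xi * xi)
    return [(-1.0 + disc) / (2.0 * eps), (-1.0 - disc) / (2.0 * eps)]

def max_abs_real_part(roots, xis):
    """Largest |Re lambda| over the sampled wave numbers."""
    return max(abs(r.real) for xi in xis for r in roots(xi))

xis = [0.0, 1.0, 10.0, 100.0, 1000.0]
bound_wave = max_abs_real_part(roots_wave, xis)          # stays 0
bound_diff = max_abs_real_part(roots_diffusion, xis)     # grows like xi^2
bound_tel = max_abs_real_part(roots_telegrapher, xis)    # stays below 1/eps
```

A finite sample cannot prove boundedness, of course; it only makes the contrast between the diffusion equation and the other two visible.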
Discussion.The equation for transversal oscillations of a beam is given by
(30.36)
where f is essentially the external load. This equation is hyperbolic.
30.14 Theorem [Gårding]. The Cauchy problem Lu = 0 under the Cauchy condition $\partial^k u/\partial t^k(0, x) = u_k(x)$ ($0 \le k \le N - 1$) is well posed in the sense of Hadamard (→28.3) if and only if L is hyperbolic in Gårding's sense. □ [387]
Hence, the determinacy (and more) for the wave equation is vindicated.
30.15 Theorem [Finiteness of the propagation speed]. Let Ω be the support of the Cauchy data for Lu = 0, where L is a linear partial differential operator with constant coefficients, and is hyperbolic in the sense of Gårding (→30.12). Then the support of the solution at time t > 0 is included in the set $\bigcup_{\xi\in\Omega}\{x : |x - \xi| \le ct\}$, where c is a finite number such that
(30.37)
Here $\bar{\lambda}_i$ are zeros of the symbol of the principal part of the differential operator (and are real for hyperbolic equations). □ [388]
[387] See John, Section 5.2.
[388] S. Mizohata, Partial Differential Equations (Iwanami, 1965), Theorem 4.9.
31 Numerical Solution of PDE
Although we have been discussing analytical methods to solve PDE, most problems are intractable by exact methods. In this section elementary numerical methods to solve PDE are outlined. We require a numerical scheme to be stable and consistent (i.e., converging to the original problem in the continuum limit). This is a section for the ABC of numerical analysis.
Key words: discretization, consistency, stability, convergence, von Neumann condition, Courant-Friedrichs-Lewy condition
Summary:
(1) There are two major methods to discretize a continuum problem: the Galerkin method and sampling at space-time lattice points (31.2). There can be many unconventional discretization schemes (31.4).
(2) Any discrete scheme must recover the original problem in the continuum limit (consistency of the scheme). If the solution to a scheme is bounded, then the scheme is said to be stable. For linear problems, consistency and stability imply convergence of the scheme (i.e., the solution to the scheme converges to the true solution in the continuum limit) (31.7).
(3) Stability conditions for a scheme may be understood, roughly, by the condition that the physical propagation speed of the signal does not outrun the numerical propagation speed (31.9, 31.11).
31.1 Discretization. To use computers to solve a differential equation, unless we use symbolic manipulation, we must discretize everything and express quantities in a finite number of rational numbers. Thus the fundamental question of numerical computations of differential equations is how faithful this map to the discrete world is.
Numerical analysis is a discipline to analyze numerical algorithms and is as old as analysis itself. Already Newton discussed a series expansion method to solve ODE in his first calculus paper (1669). Euler introduced discretization methods in 1743.
Discussion.
(A) Consider [389]
$$\frac{du}{dt} = f(u), \qquad (31.1)$$
where f satisfies f(0) = f(1) = 0, f(u) > 0 for u ∈ (0, 1) and f(u) < 0 for 1 < u < K for some positive K > 1. Then, its Euler differencing result
(31.2)
(31.3)
exhibits chaos for $\Delta t > c_1$ for some positive $c_1$. Here 'exhibiting chaos' means that the solution has a 'natural' relation to random numbers (or the outcome of coin tossing). [390]
(B) Consider the following logistic equation
$$\frac{du}{dt} = u(1 - u).$$
(1) Solve this equation with the initial condition $u = u_0 \in (0, 1)$ analytically.
(2) Get the following type of difference equation with the aid of the center differencing scheme:
(31.4)
(31.5)
where $a = 2\Delta t$, $u_n = u(n\Delta t)$ and $v_n = u_{n-1}$.
(3) The equation (31.4) defines a map from $R^2$ into itself. The map exhibits chaos irrespective of the size of Δt. [391] A more careful statement is as follows. Let time T be fixed and $N \equiv T/\Delta t$. If Δt → 0, then up to N there is no pathological behavior. However, if Δt is fixed, then for sufficiently large N (consequently for large T), pathological behavior will show up.
(4) The equation (31.4) converges to (more generally, see 31.3 Discussion)
$$\frac{du}{dt} = v(1 - v), \quad \frac{dv}{dt} = u(1 - u).$$
This equation does not exhibit chaos, but is unstable near u = v = 1.
31.2 Two major methods of discretization. There are two major methods to map a continuous problem to a discrete problem. One is the sampling method (recall Green's approach, →1.8), and the other is the Fourier expansion method.
The sampling method tries to represent a function f(x) by a set of function values sampled at appropriately located sampling points, and is usually called "the discrete variable method." We have already used its primitive version in 1 (1.15, 1.18, 1.20).
The Fourier expansion method tries to describe a function f(x) as a truncated generalized Fourier expansion $f_N(x)$ (→20.14). A typical method is the one called the Galerkin method: Put $f_N(x) = \sum_{n=1}^{N} a_n\phi_n(x)$, where $\phi_n$ denotes orthonormal functions (→20.10), into the original equation. Then, multiply $\phi_m(x)$ and integrate over x. This will give a set of equations for the Fourier coefficients. This is a finite set of algebraic equations, so there are many ways to solve it. [392]
[389] M. Yamaguti and H. Matano, Euler's finite difference scheme and chaos, Proc. Japan Acad. 55 Ser. A, 78-80 (1979).
[390] Y. Oono, Period ≠ 2^n implies chaos, Prog. Theor. Phys. 59, 1029-1030 (1978).
[391] S. Ushiki, Central differencing scheme and chaos, Physica D 4, 407-424 (1982).
31.3 Consistency, stability and convergence. If the discretization scheme recovers the original equation in the limit which recovers a function from its discretized version, we say the method is consistent. If the discretized solution is bounded in terms of the input data (initial condition, etc.), we say the method is stable. Consistency and stability imply the convergence of the scheme. That is, if a numerical scheme is consistent and stable, then the scheme gives the solution which converges to the true solution of the original continuous problem in the limit recovering a function from its discretized version. There are consistent but unstable schemes. [393]
Discussion.
Probably the most famous example is the center differencing scheme: [394]
Since $dx/dt \simeq [x(t_{n+1}) - x(t_{n-1})]/2h$, where h is the time increment $t_{n+1} - t_n = h$ for all n, we might be able to rewrite dx/dt = f(x) as
$$\frac{x(t_{n+1}) - x(t_{n-1})}{2h} = f(x(t_n)). \qquad (31.6)$$
The scheme is called the center differencing scheme. It is known that this equation converges to the following simultaneous equation:
$$\frac{dx}{dt} = f(y), \quad \frac{dy}{dt} = f(x). \qquad (31.7)$$
If x = y is stable, then there is no problem, but often this is not the case. The method doubles the dimensionality of the phase space (= the space where the trajectories are). Hence, even a two dimensional ODE could produce chaos as an artifact after center differencing discretization.
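The instability of the doubled system can be seen already for the linear test equation dx/dt = −x, for which x = y is unstable in (31.7). The following sketch is illustrative only: the function name, the step size h = 0.1, and the choice of starting the two-step recursion from the exact value at t = h are assumptions, not part of the notes.

```python
import math

def center_difference(f, x0, x1, h, steps):
    """Iterate x_{n+1} = x_{n-1} + 2 h f(x_n), i.e. (31.6), from two starting values."""
    xs = [x0, x1]
    for _ in range(steps - 1):
        xs.append(xs[-2] + 2.0 * h * f(xs[-1]))
    return xs

h = 0.1
# dx/dt = -x with x(0) = 1; the exact solution exp(-t) decays monotonically.
xs = center_difference(lambda x: -x, 1.0, math.exp(-h), h, 300)
early_error = abs(xs[10] - math.exp(-1.0))   # t = 1: still close to exp(-1)
late_value = xs[300]                         # t = 30: dominated by the spurious mode
```

At early times the numerical values track the true solution, but the second (spurious) root of the two-step recursion has modulus larger than 1, so an oscillating component eventually swamps the decaying one.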
31.4 Discretization of PDE. The simplest method to discretize a PDE is to use a regular mesh on its domain and use the values of the functions sampled at the mesh points. [395] As explained in 31.2 we can also use the Galerkin method to discretize the PDE with the aid of generalized Fourier expansion (in terms of an appropriate complete set). Always the consistency and stability of the scheme are crucial. An important point recognized explicitly in recent years is that good modeling of physics on a discrete space can motivate a useful numerical solver for PDE.
[392] The Galerkin method is often used to solve PDE. In this case the resultant set of equations becomes a simultaneous set of ODEs. The method is also very important as a tool to prove the existence of the solutions to PDEs like the Navier-Stokes equation. See Ladyzhenskaya quoted in 1.21 Discussion.
[393] One might think that if a scheme is not consistent, then the scheme is useless. However, the situation is not this simple, because we do not take the h → 0 limit in practice. Hence, even if the limit may be different from the original equation, still the numerical solution for a finite h may be a good solution.
[394] M. Mizutani, T. Niwa and T. Ohno, Chaos and bifurcation phenomena in limiting central difference scheme, J. Math. Kyoto Univ. 23, 39-54 (1983).
[395] A. Iserles, A First Course in the Numerical Analysis of Differential Equations (Cambridge UP, 1996) is an excellent introduction to the mathematical side of numerical analysis. Although, as the author explicitly says, it is not for practitioners, still the comments in the end of each chapter contain updated information and are useful.
Discussion.
A typical example is the numerical schemes for the simple equation
$$\frac{\partial u}{\partial t} + c\frac{\partial u}{\partial x} = 0, \qquad (31.8)$$
where c is a constant. We can solve this equation analytically easily (→1.2B Discussion (2), 2B.6, 13A.4), e.g., for the initial condition u(x, 0) = 1 for x > 0 and 0 otherwise. Ordinary discretization methods give miserable results (try to solve this with the simple Euler scheme). However, we know the essence of the equation is the translational symmetry of space:
$$u(x, t + \delta t) = u(x - c\,\delta t, t) \qquad (31.9)$$
for any δt (this is the equation for weak solutions, cf. 2B.3). The problem is that if we discretize u, then we know only $u(x_i)$ at sampling points $\{x_i\}$. Therefore, it is very hard to describe the translational symmetry. The most natural idea is: (i) first reconstruct the continuous u from the discrete sampled values by interpolation, (ii) then translate the reconstructed continuous function according to (31.9), (iii) finally sample the values of the shifted function at the grid points (see Figure). Actually, this reconstruction-resampling scheme is used in one of the best schemes for (31.8). Thus, the reader should keep in mind that there is still ample room to devise unconventional numerical schemes for PDE.
31.5 Discretization of Poisson's equation. Practically useful numerical schemes use simple discretization to solve a Poisson's equation: [396]
$$\Delta u = f \qquad (31.10)$$
on a region D with the boundary condition u = g on ∂D. Let us consider this in 2-space. To discretize this, we follow Euler: Let h be the lattice spacing of the sampling regular square lattice; the sampling points are (ih, jh), where i and j are integers. Let us denote the value of a function f at (ih, jh) as f[i, j]. The simplest scheme is
$$\Delta_h u[i, j] \equiv \frac{u[i+1, j] + u[i, j+1] + u[i-1, j] + u[i, j-1] - 4u[i, j]}{h^2} = f[i, j] \qquad (31.11)$$
with u[i, j] = g[i, j] if (ih, jh) is on the discretized boundary. Let us denote the set of grid points in the domain by $D_h$ and the discretized boundary by $\Gamma_h$.
[396] If the domain is regular, say, a square, then Fourier transform methods are practical.
31.6 Solvability of (31.11). (31.11) is a linear algebraic equation, so that if the matrix defined by $\Delta_h$ is non-singular, then we can solve it. The non-singularity of the matrix can be shown with the aid of the maximum principle (→29.6) which is still true after discretization, because the mean value theorem is correct as can be seen from the form of $\Delta_h$ (→1.13). More precisely, we can show easily that if
(31.12)
then $v \ge 0$ on $D_h \cup \Gamma_h$. This implies that if v and −v both satisfy (31.12), then v = 0 on $D_h \cup \Gamma_h$. That is, if $\Delta_h v = 0$ on $D_h$ and v = 0 on $\Gamma_h$, then its unique solution is v = 0 everywhere. Hence, the matrix defining the simultaneous linear equation (31.11) is regular, and (31.11) has a unique solution. The matrix is very sparse, so many sparse matrix solvers can be used.
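As a small illustration of solving (31.11), the sketch below uses Gauss-Seidel sweeps on the unit square. The choice of solver, the domain, and the names `solve_poisson`, `f`, `g` are assumptions made for brevity (the notes only say that sparse solvers can be used). Because the five-point formula is exact for quadratic polynomials, the test problem Δu = 4 with boundary data x² + y² has the grid solution u = x² + y², which gives a check on the code.

```python
def solve_poisson(f, g, n, sweeps=500):
    """Gauss-Seidel sweeps for the five-point scheme (31.11) on the unit square, h = 1/n."""
    h = 1.0 / n
    # u[i][j] approximates u(ih, jh); boundary entries are fixed to g, interior starts at 0.
    u = [[g(i * h, j * h) if i in (0, n) or j in (0, n) else 0.0
          for j in range(n + 1)] for i in range(n + 1)]
    for _ in range(sweeps):
        for i in range(1, n):
            for j in range(1, n):
                # Solve (31.11) at (i, j) for u[i][j], using the latest neighbour values.
                u[i][j] = 0.25 * (u[i + 1][j] + u[i - 1][j]
                                  + u[i][j + 1] + u[i][j - 1]
                                  - h * h * f(i * h, j * h))
    return u

n = 8
u = solve_poisson(lambda x, y: 4.0, lambda x, y: x * x + y * y, n)
err = max(abs(u[i][j] - ((i / n) ** 2 + (j / n) ** 2))
          for i in range(n + 1) for j in range(n + 1))
```

For non-polynomial data the remaining error is the discretization error of the scheme itself, not of the iteration, once enough sweeps have been made.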
31.7 Consistency and stability ⇒ convergence. Is this discretization scheme consistent? That is, in the h → 0 limit can we claim that the discretized version converges to the original equation? If u is $C^3$ on the domain, we can demonstrate
(31.13)
Since we know the solution to Poisson's equation is very smooth (→29.10) this is enough to demonstrate that indeed our scheme is consistent.
Our scheme is also stable: the solution to (31.11) is bounded by the 'magnitudes' of f and g in the problem as
(31.14)
where c is a positive constant independent of h, f and g. [397]
[397] In this case, we need not restrict the size of h, but usually the stability holds for h up to some upper bound as we will see in the case of diffusion equation (→31.8).
Now we have
Theorem. The solution $u_h$ to (31.11) converges uniformly to the solution to the original problem. More precisely,
(31.16)
□ We thus know that $u_h$ converges to the true solution, but actually this is shown only on the dense set that are limit points of the lattice points. Since we know from the general theory that the true solution is very smooth, this should be enough.
31.8 Discretizing diffusion equation: θ-method. Let us consider the 1-space diffusion equation
$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} + f$$
on $Q_T = \{(x, t) : x \in (0, 1),\ t \in (0, T)\}$. We impose the initial condition u(x, 0) = a(x) for x ∈ (0, 1). We must also specify a boundary condition at x = 0 and 1, but we will not explicitly write it down. The 1-space version of $\Delta_h$ is given by
$$\Delta_h u[i] = \frac{u[i+1] + u[i-1] - 2u[i]}{h^2}. \qquad (31.17)$$
We must discretize the time axis with the spacing τ. We introduce the following notation
$$u_n[i] = u(ih, n\tau),$$
and
$$u_{n+\theta}[i] = \theta u_{n+1}[i] + (1 - \theta)u_n[i].$$
We introduce the following scheme called the θ-method:
(31.18)
(31.19)
(31.20)
For θ = 0 this is the standard Euler method; for θ = 1/2 it is called the Crank-Nicolson method; for θ = 1 it is called the backward Euler method. The latter two methods are called implicit methods, because we cannot immediately read off the updated data.
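To make the word "implicit" concrete, here is a minimal sketch of one time step. It assumes that the scheme has the usual form $(u_{n+1}[i] - u_n[i])/\tau = \Delta_h u_{n+\theta}[i]$ with f = 0 and homogeneous Dirichlet boundary values; this form, the function name `theta_step`, and the Thomas (tridiagonal) solver are assumptions for illustration, since the displayed scheme is not legible in this copy. A discrete sine mode is an eigenvector of $\Delta_h$, so one step should simply multiply it by a known factor, which is used as a check.

```python
import math

def theta_step(u, r, theta):
    """One step of (u_new - u)/tau = Delta_h[theta*u_new + (1-theta)*u], r = tau/h^2.
    u[0] and u[-1] are boundary values assumed to be 0 and kept fixed."""
    n = len(u) - 1
    m = n - 1                                   # number of interior unknowns
    rhs = [u[i] + (1.0 - theta) * r * (u[i + 1] - 2.0 * u[i] + u[i - 1])
           for i in range(1, n)]
    # Tridiagonal system: b on the diagonal, a on both off-diagonals (Thomas algorithm).
    a, b = -theta * r, 1.0 + 2.0 * theta * r
    cp, dp = [0.0] * m, [0.0] * m
    cp[0], dp[0] = a / b, rhs[0] / b
    for k in range(1, m):
        denom = b - a * cp[k - 1]
        cp[k] = a / denom
        dp[k] = (rhs[k] - a * dp[k - 1]) / denom
    new = [0.0] * (n + 1)
    new[m] = dp[m - 1]
    for k in range(m - 2, -1, -1):              # back substitution
        new[k + 1] = dp[k] - cp[k] * new[k + 2]
    return new

n = 16
h = 1.0 / n
u0 = [math.sin(math.pi * i * h) for i in range(n + 1)]
r = 0.4
s = math.sin(math.pi * h / 2.0) ** 2
errors = {}
for theta in (0.0, 0.5, 1.0):
    # Expected one-step factor for the sine mode under the assumed scheme.
    lam = (1.0 - 4.0 * (1.0 - theta) * r * s) / (1.0 + 4.0 * theta * r * s)
    u1 = theta_step(u0, r, theta)
    errors[theta] = max(abs(p - lam * q) for p, q in zip(u1, u0))
```

For θ = 0 the linear system is the identity and the update is read off directly; for θ > 0 a tridiagonal solve is needed at every step.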
31.9 Stability analysis. A standard method to analyze the stability of a scheme is to compute the so-called amplification factor A:
(31.21 )
The basic idea is that we prepare a spatially bounded 'initial condition' (that is why $e^{ikl}$) and study its time evolution. If |A| > 1, we are in trouble.
31.10 Von Neumann's stability condition. [398] In our case the scheme is stable if $u_n[i]$ is bounded for all i and n by a number proportional to the 'magnitude' of the initial condition a. Let us measure the 'magnitude' with the following 'normalized $l^2$-norm':
$$\|v\|_h = \left\{\frac{1}{N}\sum_{i=1}^{N-1} v[i]^2\right\}^{1/2}$$
The stability is defined by the inequality
(31.22)
(31.23)
for all n with some positive constant c independent of a, h and τ (< 1).
Theorem [von Neumann]. A necessary and sufficient condition for the scheme (31.20) to be stable is that there is a nonnegative constant b such that for any k
(31.24)
(31.25)
for any k ∈ Z. In particular, the scheme is stable for θ ∈ [1/2, 1] unconditionally and for θ ∈ [0, 1/2) under the condition
$$\frac{\tau}{h^2} \le \frac{1}{2(1 - 2\theta)},$$
which is called the stability condition. [399] □
Generally speaking, implicit schemes are more stable as seen here. However, implicit schemes are usually computationally more time consuming. The reader might think that exploiting the stability, we can choose a large τ to compensate the complexity. Sometimes, this indeed works, but stability does not mean that the obtained solution is accurate, so that choosing a large τ is not usually wise.
[398] John von Neumann, 1903-1957.
[399] The stability condition may depend on the norm used. If we use the $l^\infty$-norm, then the RHS of (31.25) reads 1/2(1 − θ) for θ ∈ [0, 1).
Discussion.
(A) In (31.20) put θ = 0 and f = 0. Assume
$$u_{n,j} = \lambda(k)^n e^{ik(jh)}. \qquad (31.26)$$
Then, this is a solution to (31.20), if
$$\lambda(k) = 1 - 4\frac{\tau}{h^2}\sin^2(kh/2). \qquad (31.27)$$
This λ(k) is the amplification factor for the mode k. From this we conclude that
(31.28)
is required for the scheme to be stable. The condition can be rewritten as
$$D < \frac{h^2}{2\tau}. \qquad (31.29)$$
This may be interpreted as a condition for the numerical diffusion constant to be larger than the physical diffusion constant.
If τ/h² = 1/2, the scheme may violate the maximum principle.
(B) In 31.8 try the same and derive the formula for the amplification factor for the θ-method:
$$\lambda(k) = \frac{1 - 4(1 - \theta)(\tau/h^2)\sin^2(kh/2)}{1 + 4\theta(\tau/h^2)\sin^2(kh/2)}. \qquad (31.30)$$
From this the stability condition is given by (the von Neumann stability condition 31.9)
$$\frac{\tau}{h^2}(1 - 2\theta) < \frac{1}{2}. \qquad (31.31)$$
For θ = 1/2, the method is called the Crank-Nicolson scheme. In this case, if τ/h² = 1, the scheme is stable, but does not satisfy the maximum principle (the number of peaks may increase).
(C) Consider the following diffusion-advection equation:
$$\frac{\partial u}{\partial t} = \frac{\partial^2 u}{\partial x^2} - b\frac{\partial u}{\partial x}, \qquad (31.32)$$
where b is a continuous function of x and t with boundedness: |b| < B. Apply a discretization scheme (not a complicated one, please) and study its stability.
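The amplification factor of the θ-method can be scanned over all modes to see where |λ(k)| ≤ 1 fails. The sketch below is illustrative: the function names, the sampling of kh on [0, π], and the small rounding tolerance are assumptions. Here r stands for τ/h².

```python
import math

def amplification(theta, r, kh):
    """Amplification factor of the theta-method for the mode with phase kh, r = tau/h^2."""
    s = math.sin(kh / 2.0) ** 2
    return (1.0 - 4.0 * (1.0 - theta) * r * s) / (1.0 + 4.0 * theta * r * s)

def is_von_neumann_stable(theta, r, samples=2000):
    """True if |lambda(k)| <= 1 for all sampled kh in [0, pi] (up to rounding)."""
    return all(abs(amplification(theta, r, math.pi * j / samples)) <= 1.0 + 1e-12
               for j in range(samples + 1))

euler_ok = is_von_neumann_stable(0.0, 0.5)        # r = 1/2: marginal but stable
euler_bad = is_von_neumann_stable(0.0, 0.51)      # slightly above the limit
quarter_ok = is_von_neumann_stable(0.25, 1.0)     # limit 1/(2(1 - 2*theta)) = 1
quarter_bad = is_von_neumann_stable(0.25, 1.05)
crank_nicolson = is_von_neumann_stable(0.5, 100.0)
backward_euler = is_von_neumann_stable(1.0, 100.0)
```

The worst mode is always kh = π (the sawtooth mode), which is why the threshold in r depends only on θ.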
31.11 Consistency and convergence of θ-method. If u is smooth enough, [400] then we can show that the θ-method is consistent. Under the stability condition discussed in 31.10, the solution $u_h$ to (31.20) converges to the solution to the original PDE in the h → 0 limit. More precisely,
31.12 Courant-Friedrichs-Lewy condition. Let us return to the simple advection problem (31.8). Consider the following simple Euler scheme
(31.34)
[400] $C^4$ in space and $C^3$ in time, for example.
This is called the upstream approximation, because if c is interpretedas the stream velocity, the scheme uses the upstream information only.The scheme satisfies the stability condition, if
$$\tau \le \frac{h}{c}. \qquad (31.35)$$
The condition is called the Courant-Friedrichs-Lewy condition [401] (CFL condition). This implies that the numerical propagation speed h/τ must not be smaller than the physical propagation speed c. In other words, if physics outruns computation, the scheme becomes unstable. A similar interpretation may be possible for 31.10.
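The CFL threshold can be seen from the mode-by-mode growth factor. The sketch below assumes that the upstream scheme has the standard form $u_{n+1}[i] = u_n[i] - \nu(u_n[i] - u_n[i-1])$ with ν = cτ/h (the displayed scheme is not legible in this copy, so this form and the function names are assumptions). Substituting $u_n[i] = \lambda^n e^{ikhi}$ gives $\lambda = 1 - \nu(1 - e^{-ikh})$.

```python
import cmath
import math

def upstream_amplification(nu, kh):
    """|lambda| for the assumed upstream update u[i] - nu*(u[i] - u[i-1]), nu = c*tau/h."""
    return abs(1.0 - nu * (1.0 - cmath.exp(-1j * kh)))

def max_amplification(nu, samples=720):
    """Largest |lambda| over sampled phases kh in [0, 2*pi]."""
    return max(upstream_amplification(nu, 2.0 * math.pi * j / samples)
               for j in range(samples + 1))

growth_half = max_amplification(0.5)   # inside the CFL limit
growth_one = max_amplification(1.0)    # exactly at the limit: pure shift by one cell
growth_over = max_amplification(1.1)   # physics outruns the numerical signal speed
```

At ν = 1 the scheme reduces to an exact shift by one grid cell, which is the discrete counterpart of (31.9).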
Exercise.
(1) Compute the amplification factor for (13.28) and derive the Courant-Friedrichs-Lewy condition.
(2) Show that the downstream scheme, which replaces $u_n[i] - u_n[i-1]$ in the upstream scheme with $u_n[i+1] - u_n[i]$, is always unstable.
31.13 Wave equation. A standard differencing practice to solve the 1-space wave equation $u_{tt} - c^2u_{xx} = 0$ is the simple Euler scheme:
$$u_{n+1}(i) = 2u_n(i) - u_{n-1}(i) + \left(\frac{c\Delta t}{\Delta x}\right)^2\{u_n(i+1) + u_n(i-1) - 2u_n(i)\}. \qquad (31.36)$$
It is easy to generalize this to d-space (the stability limit due to the CFL condition is $c\Delta t/\Delta x \le 1/\sqrt{d}$). This is a very stable and simple scheme, and is widely used. However, it suffers from the dispersion error (the scheme conserves energy very well, but distorts initial conditions with steep wave fronts).
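A direct implementation of (31.36) shows the stability limit in 1-space. In this sketch the periodic boundary, the spike initial data (chosen because it excites all modes), and the function name are illustrative assumptions; `nu` denotes cΔt/Δx.

```python
def wave_steps(u_prev, u_curr, nu, steps):
    """Advance scheme (31.36) on a periodic grid; nu = c*dt/dx."""
    n = len(u_curr)
    for _ in range(steps):
        u_next = [2.0 * u_curr[i] - u_prev[i]
                  + nu * nu * (u_curr[(i + 1) % n] + u_curr[i - 1] - 2.0 * u_curr[i])
                  for i in range(n)]
        u_prev, u_curr = u_curr, u_next
    return u_curr

spike = [0.0] * 64
spike[32] = 1.0
# Same data at the first two time levels (approximately zero initial velocity).
stable = wave_steps(spike, spike, 0.9, 100)
unstable = wave_steps(spike, spike, 1.1, 100)
stable_size = max(abs(v) for v in stable)
unstable_size = max(abs(v) for v in unstable)
```

Below the limit the solution stays of order one (though visibly dispersed), while slightly above it the shortest-wavelength mode grows geometrically.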
Exercise.Study the stability condition of this simple scheme and demonstrate that we indeedneed the Courant-Friedrichs-Lewy condition (the numerical propagation speed mustbe faster than the physical speed).
[401] Richard Courant, 1888-1972; Kurt Otto Friedrichs, 1901-1983.
32 Fourier Transformation
Basics of Fourier transform, including the principle of FFT and major qualitative features like the uncertainty principle, the sampling theorem, and the Wiener-Khinchine theorem, are discussed in the first two subsections. Then, Fourier analysis of generalized functions and related topics such as Poisson's sum formula and the Plemelj formula are treated in the third subsection. As a related topic, the Radon transform, which underlies many tomographic techniques, is discussed in the last subsection.
32.A Basics
Fourier analysis is reviewed. The relation between smoothness of the function and the decay rate of its Fourier transform is important. As theoretical applications, the uncertainty principle, the sampling theorem and the Wiener-Khinchin theorem about spectral analysis are discussed.
Key words: Fourier transform, deconvolution, inverse Fourier transform, sine (cosine) transform, bra-ket notation, Plancherel's theorem, Riemann-Lebesgue lemma
Summary:
(1) Fix your convention of Fourier transform (32A.1, 32A.7). Deconvolution is often the place where Fourier transformation is effective (32A.2). Linear differential operators become multiplicative operators (32A.3).
(2) The decay rate of the Fourier transform and the smoothness of its original function are closely related just as in the Fourier expansion cases (32A.11).
32A.1 Fourier transform. Let f be an integrable function (→19.8) on R. If the following integral exists
$$\hat{f}(k) = \mathcal{F}(f)(k) = \int_{-\infty}^{\infty} dx\, f(x)e^{-ikx}, \qquad (32.1)$$
it is called the Fourier transform of f. Multidimensional cases can be treated similarly.
Exercise.
(A) Consider the Fourier transform of a wave train of finite duration. Or, more concretely, compute the Fourier transform of
$$f(t) = [\Theta(t + T) - \Theta(t - T)]\cos at.$$
Sketch the Fourier transform.
(B)
(1) Demonstrate the Fourier transform of the following triangular function
is given by
$$X(\omega) = \frac{4\sin^2(\omega T/2)}{T\omega^2}.$$
(2) Demonstrate
$$\int_{-\infty}^{\infty} \frac{\sin^2 ax}{\pi ax^2}\,dx = 1$$
for any a > 0 with the aid of (1).
(32.2)
(32.3)
(32.4)
32A.2 Deconvolution. As can be demonstrated with the aid of Fubini's theorem (→19.14),
$$\mathcal{F}(f * g) = \mathcal{F}(f)\mathcal{F}(g).$$
This is a very useful relation.
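The relation can be checked numerically with the convention of 32A.1. The following sketch is an illustration only: the two Gaussians, the grid on [−10, 10], the wave number k = 1.3, and the function names are assumptions. Both the convolution and the transforms are approximated by plain Riemann sums, which are very accurate for rapidly decaying smooth functions.

```python
import cmath
import math

def fourier(values, xs, k):
    """Riemann-sum approximation of int f(x) exp(-ikx) dx on the uniform grid xs."""
    dx = xs[1] - xs[0]
    return sum(v * cmath.exp(-1j * k * x) for v, x in zip(values, xs)) * dx

def convolve(f, g, xs):
    """Riemann-sum approximation of (f*g)(x) = int f(y) g(x-y) dy at each grid point."""
    dx = xs[1] - xs[0]
    return [sum(f(y) * g(x - y) for y in xs) * dx for x in xs]

f = lambda x: math.exp(-x * x / 2.0)
g = lambda x: math.exp(-x * x)
xs = [-10.0 + 0.05 * i for i in range(401)]
k = 1.3
lhs = fourier(convolve(f, g, xs), xs, k)                      # F(f*g)(k)
rhs = fourier([f(x) for x in xs], xs, k) * fourier([g(x) for x in xs], xs, k)
# Closed form for these Gaussians: sqrt(2 pi) e^{-k^2/2} * sqrt(pi) e^{-k^2/4}.
exact = math.pi * math.sqrt(2.0) * math.exp(-0.75 * k * k)
```

With the symmetrized convention of 32A.7 an extra factor of $\sqrt{2\pi}$ would appear on one side, which is the awkwardness mentioned there.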
Exercise.
In the following a and b are positive real numbers.
(i) Fourier transform
$$\chi(x) = \Theta(b - |x|).$$
(ii) Fourier transform $e^{-a|x|}$.
(iii) Fourier transform
$$f(x) = e^{-a|x|}\frac{\sin bx}{x}.$$
(32.5)
(32.6)
(32.7)
32A.3 Differentiation becomes multiplication. We have an important relation
$$\widehat{f'} = +ik\hat{f}. \qquad (32.8)$$
The sign in front of the formula depends on our choice of the definition 32A.1. We have the following formulas (→2C.7, 2C.9, 2C.11):
$$\mathcal{F}(\mathrm{div}\,v) = +ik\cdot v_k, \qquad (32.9)$$
$$\mathcal{F}(\mathrm{curl}\,v) = +ik\times v_k, \qquad (32.10)$$
$$\mathcal{F}(-\Delta f) = k^2 f_k. \qquad (32.11)$$
The last formula explains why −Δ is a natural combination: it is a positive definite operator.
32A.4 Theorem. If f : R → C is continuous (and bounded), and both f and $\hat{f}$ are absolutely integrable, then the inversion formula holds
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \hat{f}(k)e^{+ikx}\,dk = \mathcal{F}^{-1}(\hat{f}). \qquad (32.12)$$
□
The formula could be guessed from the Fourier expansion formula 17.1; actually Fourier reached this result in this way. (32.12) appears so often that we have a fairly standard abbreviation
1 1 Joo ~ ( 1 ) d J=- = - dkk - 211" -00' k - 211" . (32.13)
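32A.5 Theorem [Inversion formula for piecewise $C^1$-function]. Let f be a piecewise $C^1$-function on R. Then (cf. 17.7)
$$\frac{1}{2}[f(x_0 - 0) + f(x_0 + 0)] = \frac{1}{2\pi}P\int_{-\infty}^{\infty} dk\, e^{ikx_0}\hat{f}(k). \qquad (32.14)$$
P denotes the Cauchy principal value (→8B.10, 14.17). □
We can write the formula as
$$\frac{1}{2}[f(x_0 - 0) + f(x_0 + 0)] = \lim_{\lambda\to\infty}\int_{-\infty}^{\infty} d\xi\, \frac{\sin[\lambda(x_0 - \xi)]}{\pi(x_0 - \xi)} f(\xi). \qquad (32.15)$$
□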
32A.6 More general convergence conditions. As can easily be imagined from 17.8 for a pointwise convergence of the Fourier transform, we need some conditions. For example, if f is of bounded variation [402] near x, then (32.12) holds with f(x) being replaced by [f(x + 0) + f(x − 0)]/2. If f is continuous and of bounded variation in (a, b), then (32.12) holds uniformly there.
[402] If a function can be written as a difference of two monotonically increasing functions, we say the function is of bounded variation.
32A.7 Remark
(1) Mathematicians often multiply $1/\sqrt{2\pi}$ to the definition of Fourier transform as
$$\hat{f} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} dx\, f(x)e^{-ikx}, \qquad (32.16)$$
to symmetrize the formulas (as we will see in 32A.9 or 32B.1 sometimes this is very convenient), because
$$f(x) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} \hat{f}(k)e^{ikx}\,dk. \qquad (32.17)$$
However, this makes the convolution formula (32.5) awkward. For physicists and practitioners, the definition in 32A.1 (the sign choice may be different) is the most convenient, because we wish to compute actual numbers.
(2) The integral over k may be interpreted as a sum over n such that k = 2πn/L, where L is the size of the space. The following approximation is very useful in solid-state physics
(32.18)
(32.19)
32A.8 Sine and cosine transforms. If the space is limited to x ≥ 0, then Fourier sine and Fourier cosine transformations may be useful (cf. 17.16). If f(0) = f(0+), then
$$g(k) = \int_0^{\infty} f(x)\cos kx\,dx, \quad f(x) = \frac{2}{\pi}\int_0^{\infty} g(k)\cos kx\,dk.$$
If f(0) = 0, then
$$g(k) = \int_0^{\infty} f(x)\sin kx\,dx, \quad f(x) = \frac{2}{\pi}\int_0^{\infty} g(k)\sin kx\,dk. \qquad (32.20)$$
These can also be written concisely as
$$\frac{2}{\pi}\int_0^{\infty} \cos kx\cos k'x\,dx = \delta(k - k'), \qquad (32.21)$$
$$\frac{2}{\pi}\int_0^{\infty} \sin kx\sin k'x\,dx = \delta(k - k'). \qquad (32.22)$$
They can be shown easily with the aid of the Fourier transform of 1 (→32C.8); put $\cos kx = (e^{ikx} + e^{-ikx})/2$, etc. into (32.21) or (32.22).
Exercise.
There is an infinite medium whose thermal diffusivity is D. Its initial temperature distribution is given by $T|_{t=0} = T_0(x)$. Find the physically meaningful solution (→1.18(5) Warning). There are many ways to solve this. For example, we can use the free space Green's function (→16B.1) and the initial condition trick 16B.5. We can also use the Fourier transformation as follows.
(1) Show that for any [403] function g on $R^3$
$$g(x, y, z) = \frac{1}{\pi^3}\int_0^{\infty}\!\!\int_0^{\infty}\!\!\int_0^{\infty} d\alpha\,d\beta\,d\gamma \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} da\,db\,dc\; g(a, b, c)\cos\alpha(x - a)\cos\beta(y - b)\cos\gamma(z - c). \qquad (32.23)$$
(2) The integrands are linearly independent (no mode coupling, or superposition principle), so that each term must satisfy the diffusion equation. Introducing $A(t)\cos\alpha(x - a)\cos\beta(y - b)\cos\gamma(z - c)$ into the diffusion equation, show that
$$A(t) = f(a, b, c)e^{-D(\alpha^2 + \beta^2 + \gamma^2)t}. \qquad (32.24)$$
(3) Combining (1) and (2), obtain the following formula, which can be obtained directly with the use of the free space Green's function.
$$T(x, y, z, t) = \pi^{-3/2}\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} d\eta\,d\xi\,d\zeta\; e^{-(\eta^2 + \xi^2 + \zeta^2)} f(x + 2\sqrt{Dt}\,\eta,\ y + 2\sqrt{Dt}\,\xi,\ z + 2\sqrt{Dt}\,\zeta). \qquad (32.25)$$
[Perform the integration over Greek letters.]
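32A.9 Bra-ket notation of Fourier transform or momentum (wave-vector) kets. 32A.7 has the following symbolic representation (→20.21-23 for notations).
$$f(x) = \langle x|f\rangle = \int_{-\infty}^{\infty}\langle x|k\rangle\,dk\,\langle k|f\rangle, \qquad (32.26)$$
$$\langle x|k\rangle = \frac{1}{\sqrt{2\pi}}e^{ikx}, \qquad (32.27)$$
$$\hat{f}(k) = \langle k|f\rangle = \int_{-\infty}^{\infty}\langle k|x\rangle\,dx\,\langle x|f\rangle = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-ikx}f(x)\,dx. \qquad (32.28)$$
$\langle k|f\rangle$ is the Fourier transform of f in this bra-ket symmetrized version (32A.7), and the normalization is different from that given in 32A.1. Notice that
$$\langle x|y\rangle = \delta(x - y) = \int\langle x|k\rangle\,dk\,\langle k|y\rangle = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{ik(x-y)}\,dk. \qquad (32.29)$$
To rationalize this, we need the theory of Fourier transform of generalized functions (→32C.8).
[403] If you wish to be within the ordinary calculus, it must be integrable, but we may proceed formally.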
32A.10 Plancherel's theorem.
$$\langle f|f\rangle = \int \langle f|k\rangle\,dk\,\langle k|f\rangle \tag{32.30}$$
is called Plancherel's formula. In our normalization (for physicists) in 32A.1 this reads
(32.31)
(32.32)
The theorem tells us that if $f$ is square integrable (that is, the total energy of the wave is finite), then the total energy is equal to the sum of the energies carried by the individual harmonic modes. This is of course the counterpart of Parseval's equality (→20.12).
32A.11 Theorem [Riemann-Lebesgue Lemma]. For an integrable function $f$
$$\lim_{|k|\to\infty}\hat f(k) = 0.$$
If all the $n$-th derivatives are integrable, then $\hat f(k) = o[|k|^{-n}]$. □
There is an analogue of 17.11. There we have already discussed its physical meaning.$^{404}$
32.B Applications of Fourier Transform
Fundamental applications of Fourier transformation important in practice are summarized: uncertainty principle, sampling theorem, the Wiener-Khinchine theorem (the relation between power spectrum and correlation function). Also the principle of FFT is outlined.

Key words: uncertainty principle, coherent state, band-limited function, sampling theorem, sampling function, aliasing, time-correlation function, power spectrum, Wiener-Khinchine theorem, fast Fourier transform
404 see Katznelson p123.
Summary:
(1) The uncertainty principle is a basic property of Fourier transformation. Its essence is the elementary Cauchy-Schwarz inequality (32B.1).
(2) If the band width of a signal (function) is finite, then discrete sampling with sufficiently frequent sampling points perfectly captures the signal. This is the essence of the sampling theorem (32B.5).
(3) Spectral analysis is a fundamental tool of experimental physics. Its theoretical basis is the Wiener-Khinchine theorem: the Fourier transform of the time-correlation function is the power spectrum (32B.10).
(4) Spectral analysis became practical after the popularization of the fast Fourier transform (FFT) (32B.11-13).
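32B.1 Theorem [Uncertainty principle]. Let $f$ be in $L_2(R)$ (→20.19). Define the following averages
$$\langle x\rangle \equiv \int x|f(x)|^2dx \Big/ \int |f(x)|^2dx, \tag{32.33}$$
$$\langle k\rangle \equiv \int k|\hat f(k)|^2dk \Big/ \int |\hat f(k)|^2dk, \tag{32.34}$$
$$\Delta x^2 \equiv \int (x-\langle x\rangle)^2|f(x)|^2dx \Big/ \int |f(x)|^2dx, \tag{32.35}$$
$$\Delta k^2 \equiv \int (k-\langle k\rangle)^2|\hat f(k)|^2dk \Big/ \int |\hat f(k)|^2dk. \tag{32.36}$$
Then,
$$\Delta x\,\Delta k \ge 1/2. \tag{32.37}$$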
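[Demo] Without loss of generality, we may assume $\langle x\rangle = 0$, and also assume that $f$ is already normalized. Define
$$\hat f(k) = \frac{1}{\sqrt{2\pi}}\int dx\,e^{ikx}f(x). \tag{32.38}$$
Using Plancherel's theorem (→32A.10), we get (cf. 32A.3)
(32.39)
so that
$$\Delta k^2 = \int |f'(x) - \langle k\rangle f(x)|^2dx. \tag{32.40}$$
The Cauchy-Schwarz inequality (→20.7) implies
$$\Delta k^2\,\Delta x^2 = \int |f'(x) - \langle k\rangle f(x)|^2dx \int x^2|f(x)|^2dx \ge \left|\int \big(f'(x) - \langle k\rangle f(x)\big)\,x\,\overline{f(x)}\,dx\right|^2, \tag{32.41}$$
but since $\langle x\rangle = 0$, the last formula reads
$$\left|\int f'(x)\,x\,\overline{f(x)}\,dx\right|^2 \ge \left|\mathrm{Re}\int f'(x)\,x\,\overline{f(x)}\,dx\right|^2 = 1/4. \tag{32.42}$$
The last number comes from the following integration by parts
$$\int f'(x)\,x\,\overline{f(x)}\,dx = -\int \overline{f'(x)}\,x\,f(x)\,dx - \int |f(x)|^2dx. \tag{32.43}$$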
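The inequality can be checked numerically. The following is a minimal sketch, not part of the original notes: it assumes a real, even $f$ (so that $\langle x\rangle = \langle k\rangle = 0$) and evaluates $\Delta k^2$ as $\int |f'|^2dx/\int |f|^2dx$, which is (32.40) with $\langle k\rangle = 0$. The function names `spreads`, `gaussian` and `sech` are illustrative choices. The Gaussian of 32B.3 should give the product 1/2, and any other profile a larger value.

```python
import math

def spreads(f, half_width=30.0, n=60001):
    """Return (dx, dk) for a real, even f, for which <x> = <k> = 0."""
    h = 2.0 * half_width / (n - 1)
    xs = [-half_width + i * h for i in range(n)]
    fs = [f(x) for x in xs]
    # Central-difference derivative; f is negligible at the two end points.
    dfs = [0.0] + [(fs[i + 1] - fs[i - 1]) / (2.0 * h) for i in range(1, n - 1)] + [0.0]
    norm = sum(v * v for v in fs) * h
    x2 = sum(x * x * v * v for x, v in zip(xs, fs)) * h / norm
    # Plancherel: the k^2 average of |f_hat|^2 equals the average of |f'|^2.
    k2 = sum(d * d for d in dfs) * h / norm
    return math.sqrt(x2), math.sqrt(k2)

def gaussian(x, a=1.5):
    return math.exp(-x * x / (2.0 * a * a))

def sech(x):
    return 1.0 / math.cosh(x)

dx, dk = spreads(gaussian)
print("Gaussian:", dx * dk)   # close to 1/2 (equality case)
dx, dk = spreads(sech)
print("sech:    ", dx * dk)   # close to pi/6, which exceeds 1/2
```

The normalization of $f$ drops out because both averages are divided by $\int |f|^2dx$.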
32B.2 Remark. As can be seen from the proof of 32B.1, the uncertainty principle is a disguised Cauchy-Schwarz inequality (→20.7), which says that the modulus of cosine cannot be larger than 1. Note that obvious mathematical theorems can have profound implications in real life.

32B.3 Coherent state. The equality in the uncertainty principle is realized if the wave function $f$ is Gaussian
$$f(x) = \frac{1}{\pi^{1/4}a^{1/2}}\,e^{-x^2/2a^2}. \tag{32.44}$$
Check that indeed $\Delta x\,\Delta k = 1/2$. A state with this equality is called a coherent state.

32B.4 Band-limited function. If a function has a Fourier transform which has a compact support (i.e., $\hat f(k)$ is zero if $|k| > k_0$ for some $k_0 > 0$), then $f$ is called a band-limited function.

32B.5 Theorem [Sampling theorem]. Let $f$ be a band-limited function such that $\hat f(k)$ is zero if $|k| > k_0 > 0$. Then,
$$f(x) = \sum_{n=-\infty}^{\infty} f(n\pi/k_0)\,\frac{\sin(k_0x - n\pi)}{k_0x - n\pi}. \tag{32.45}$$
That is, $f$ can be reconstructed from the discrete sample values $\{f(n\pi/k_0)\}_{n\in Z}$. □
The sampling theorem is extremely important in communication (multichannel communication, bandwidth compression, etc.), and information storage (digitization as in CD).
[Demo] Since $\hat f(k)$ is non-zero only on $[-k_0, k_0]$, we can Fourier expand this as a function of period $2k_0$ (→17.2)
$$\hat f(k) = \sum_{n\in Z} c_n e^{ikn\pi/k_0} \tag{32.46}$$
with
$$c_n = \frac{1}{2k_0}\int_{-k_0}^{k_0}\hat f(k)\,e^{-in\pi k/k_0}\,dk. \tag{32.47}$$
On the other hand, due to the band-limitedness
$$f(x) = \frac{1}{2\pi}\int_{-k_0}^{k_0}\hat f(k)\,e^{-ikx}\,dk. \tag{32.48}$$
Comparing (32.47) and (32.48), we get
$$c_n = \frac{\pi}{k_0}\,f(n\pi/k_0). \tag{32.49}$$
(32.46), (32.48) and (32.49) give the desired result.
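The reconstruction formula of the sampling theorem can be tried on a concrete band-limited function. The sketch below is an illustration under stated assumptions, not the notes' own procedure: the test function $(\sin x/x)^2$ is chosen because its Fourier transform vanishes for $|k| > 2$, the infinite sum is truncated at a hypothetical cutoff `n_max`, and the helper names are made up for this example.

```python
import math

def sinc(u):
    """sin(u)/u with the removable singularity at u = 0 filled in."""
    return 1.0 if abs(u) < 1e-12 else math.sin(u) / u

def reconstruct(f, k0, x, n_max=500):
    """Truncated sampling series: sum of f(n*pi/k0) * sinc(k0*x - n*pi)."""
    return sum(f(n * math.pi / k0) * sinc(k0 * x - n * math.pi)
               for n in range(-n_max, n_max + 1))

def f_band_limited(x):
    # (sin x / x)^2: band-limited with k0 = 2.
    return sinc(x) ** 2

for x in (0.3, 1.234, -2.7):
    print(x, f_band_limited(x), reconstruct(f_band_limited, 2.0, x))

# Undersampling (pretending k0 = 1) uses the samples f(n*pi), which vanish
# for n != 0, so the series collapses to sinc(x) instead of sinc(x)**2.
print(reconstruct(f_band_limited, 1.0, 1.0), sinc(1.0), f_band_limited(1.0))
```

With the correct $k_0 = 2$ the truncated series agrees with the function to within the truncation error. The deliberately wrong $k_0 = 1$ returns a different band-limited function that shares the same samples.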
Exercise.
Determine the minimum sampling rate (or frequency) for the signal $10\cos\omega t + 2\cos 3\omega t$. This is a trivial question, so do not think too much.
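32B.6 Sampling function. The function
$$\varphi_n(x) = \frac{\sin(k_0x - n\pi)}{k_0x - n\pi} \tag{32.50}$$
appearing in (32.45) is called the sampling function. $\{\varphi_n\}_{n\in Z}$ is an orthogonal system. There is an orthogonality relation:
$$\int_{-\infty}^{\infty}\varphi_n(x)\varphi_m(x)\,dx = \frac{\pi}{k_0}\,\delta_{nm}. \tag{32.51}$$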
Exercise.
Demonstrate that the sampling functions $\{\varphi_n\}$ make an orthogonal system.
32B.7 Band-limited periodic function. The sampling theorem would naturally tell us the following. A band-limited periodic function with no harmonics of order higher than $N$ can be uniquely specified by its values sampled at appropriate $2N + 1$ points in a single period.
32B.8 Aliasing. If the function we sample is strictly band-limited, then the above theorem of course works perfectly. However, often the function has higher frequency components beyond the sample frequency. Then, just as when we watch a fast-rotating wheel in a movie, what we sample is the actual frequency modulo the sample frequency (that is, the beat between these frequencies). This phenomenon is called aliasing. To avoid unwanted aliasing, we often filter the original signal (through a low-pass filter) and remove excessively high frequency components.
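The "frequency modulo the sample frequency" statement can be made concrete for a pure cosine. The following is a small sketch under the assumption of a single tone sampled uniformly; `alias_frequency` and `samples_match` are hypothetical helper names introduced only for this illustration.

```python
import math

def alias_frequency(f, fs):
    """Apparent frequency in [0, fs/2] of a cosine of frequency f sampled at rate fs."""
    r = f % fs
    return min(r, fs - r)

def samples_match(f, fs, count=50, tol=1e-9):
    """Check that the samples of cos(2 pi f t) and of its alias coincide at t = n/fs."""
    fa = alias_frequency(f, fs)
    return all(
        abs(math.cos(2 * math.pi * f * n / fs) - math.cos(2 * math.pi * fa * n / fs)) < tol
        for n in range(count)
    )

print(alias_frequency(9.0, 10.0), samples_match(9.0, 10.0))
print(alias_frequency(27.0, 10.0), samples_match(27.0, 10.0))
```

A tone at 9 units sampled at rate 10 is indistinguishable, on the sample points, from a tone at 1 unit, which is the rotating-wheel effect described above.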
32B.9 Time-correlation function. Let $x(t)$ be a stochastic process or time-dependent data which is statistically stationary. Here 'stochastic' means that we have an ensemble of such signals (more precisely, we have a set of signals $\{x(t;\omega)\}$, where $\omega$ is the probability parameter specifying each sample signal. That is, if the reader wishes to start an observation, one $\omega$ is given (by God) and she will observe $x(t;\omega)$). The word 'stationary' implies that the ensemble average of $x(t,\omega)$ does not depend on $t$.$^{405}$ Let us denote the ensemble average by $\langle\ \rangle_\omega$. The time correlation function is defined by
$$C(t) = \langle x(t)x(0)\rangle_\omega \tag{32.52}$$
and is a fundamental observable in many practical cases.
The ensemble average of
(32.53)
is called the power spectrum of the signal $x(t)$, where $x_\nu$ is the Fourier transform of $x(t)$. Thanks to the advent of FFT (→32B.12), it is easy to obtain the power spectrum experimentally (easier than the correlation function).
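32B.10 Theorem [Wiener-Khinchin]. The Fourier transform of the time-correlation function of a stationary stochastic process is its power spectrum. That is,$^{406}$
$$C(t) \propto \int_{-\infty}^{\infty} e^{-i\nu t}\sigma(\nu)\,d\nu. \tag{32.54}$$
Its demonstration is a straightforward calculation. We compute (→32C.8)
$$\langle x_\nu x_{-\mu}\rangle = \left\langle \int_{-\infty}^{\infty}dt\,x(t)e^{i\nu t}\int_{-\infty}^{\infty}ds\,x(s)e^{-i\mu s}\right\rangle = \int_{-\infty}^{\infty}dt\int_{-\infty}^{\infty}ds\,e^{i\nu t}e^{-i\mu s}\langle x(t-s)x(0)\rangle = 2\pi\delta(\nu-\mu)\int_{-\infty}^{\infty}dt\,e^{i\nu t}C(t). \tag{32.55}$$
That is, $\langle x_\nu x_{-\mu}\rangle = \delta(\nu-\mu)\sigma(\nu)$ so that
$$\sigma(\nu) = 2\pi\int_{-\infty}^{\infty}dt\,e^{i\nu t}C(t). \tag{32.56}$$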
405 Actually, in this case we only need the absolute time independence of the correlation function. A process with this property is called a weakly stationary process.
406 Actually, if we normalize $C(t)$ so that $C(0) = 1$ (simply regard $C(t)/C(0)$ as $C(t)$), then we have a probability measure $\sigma$ (→a19.19) such that
$$C(t) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-i\nu t}\,d\sigma(\nu).$$
However, in practice, the numerical constant and normalization are not crucial.
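A finite, deterministic analogue of the Wiener-Khinchin relation is easy to verify. The sketch below is only an analogy under explicit assumptions: a single real sequence of length $N$ treated as periodic replaces the stationary ensemble, the circular autocorrelation replaces $C(t)$, and $|X_k|^2$ replaces the power spectrum. The function names are illustrative.

```python
import cmath
import random

def dft(seq):
    """Plain O(N^2) discrete Fourier transform with kernel exp(-2 pi i k j / N)."""
    n = len(seq)
    return [sum(seq[j] * cmath.exp(-2j * cmath.pi * k * j / n) for j in range(n))
            for k in range(n)]

def circular_autocorrelation(x):
    """c_m = sum_j x[(j + m) mod N] * x[j] for a real sequence x."""
    n = len(x)
    return [sum(x[(j + m) % n] * x[j] for j in range(n)) for m in range(n)]

random.seed(0)
x = [random.gauss(0.0, 1.0) for _ in range(32)]
power = [abs(v) ** 2 for v in dft(x)]          # |X_k|^2
spectrum = dft(circular_autocorrelation(x))    # DFT of the autocorrelation
print(max(abs(p - s) for p, s in zip(power, spectrum)))  # rounding-level difference
```

The two arrays agree up to rounding, which is the discrete counterpart of (32.56) without the $2\pi$ and delta-function factors of the continuum normalization.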
32B.11 Discrete Fourier transformation. Let $X \equiv \{X_n\}_{n=0}^{N-1}$ be a sequence of complex numbers, and
$$e(x) = \exp(-2\pi i x). \tag{32.57}$$
The following sequence $\hat X = \{\hat X^k\}$ is called the discrete Fourier transform of $X$:
$$\hat X^k = \sum_{n=0}^{N-1} e\!\left(\frac{kn}{N}\right) X_n. \tag{32.58}$$
Its inverse transform is given by
$$X_n = \frac{1}{N}\sum_{k=0}^{N-1} e\!\left(\frac{-kn}{N}\right)\hat X^k. \tag{32.59}$$
Notice that a straightforward calculation of these sums ($N$ of them) costs $O[N^2]$ operations and is costly.

Exercise.
Demonstrate the above inverse transform formula by showing
$$\frac{1}{N}\sum_{k=0}^{N-1} e^{2\pi i k(m-n)/N} = \delta_{mn}. \tag{32.60}$$
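The transform pair (32.58) and (32.59) can be written out directly. This is a minimal sketch of the straightforward $O[N^2]$ evaluation, using the notes' kernel $e(x) = \exp(-2\pi i x)$; the Python names `e`, `dft` and `inverse_dft` are chosen here for illustration.

```python
import cmath

def e(x):
    """The kernel of (32.57): e(x) = exp(-2 pi i x)."""
    return cmath.exp(-2j * cmath.pi * x)

def dft(X):
    """Direct evaluation of (32.58); costs O(N^2) operations."""
    N = len(X)
    return [sum(e(k * n / N) * X[n] for n in range(N)) for k in range(N)]

def inverse_dft(Xhat):
    """Direct evaluation of (32.59)."""
    N = len(Xhat)
    return [sum(e(-k * n / N) * Xhat[k] for k in range(N)) / N for n in range(N)]

X = [1, 2 - 1j, 0, -3.5, 4j]
back = inverse_dft(dft(X))
print(max(abs(a - b) for a, b in zip(X, back)))  # rounding-level round-trip error
```

The round trip recovers the input because of the orthogonality relation (32.60).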
32B.12 Principle of fast Fourier transform.$^{407}$ Let $N = N_1N_2$. $n, k \in \{0, 1, \ldots, N-1\}$ can be uniquely written as$^{408}$
$$n = n_1 + n_2N_1, \quad k = k_1N_2 + k_2, \tag{32.61}$$
where $n_i, k_i \in \{0, 1, \ldots, N_i - 1\}$ ($i$ = 1 or 2). Notice that
$$e(kn/N) = e(k_1n_1/N_1)\,e(k_2n_2/N_2)\,e(k_2n_1/N). \tag{32.62}$$
$n_i$ and $k_i$ are uniquely determined, so we may write, e.g., $(n_1n_2)$ instead of $n$. Then, (32.58) can be calculated as
$$\hat X^{(k_1k_2)} = \sum_{n=0}^{N_1N_2-1} e(k_1n_1/N_1)\,e(k_2n_2/N_2)\,e(k_2n_1/N)\,X_{(n_1n_2)} = \sum_{n_1=0}^{N_1-1}\left\{ e(k_1n_1/N_1)\left[ e(k_2n_1/N)\sum_{n_2=0}^{N_2-1} e(k_2n_2/N_2)\,X_{n_1n_2}\right]\right\}. \tag{32.63}$$
407 The algorithm, known sometimes as the Cooley-Tukey algorithm (J W Cooley and J W Tukey, Math. Comp. 19, 297 (1965)), was actually known to Gauss, but the importance was widely recognized after this paper.
408 This is an example of the so-called Chinese remainder theorem.
Consequently, the calculation of the discrete Fourier transform can be decomposed into the following three steps:
(1) Compute for any $k_2$
$$X'_{n_1k_2} = \sum_{n_2=0}^{N_2-1} e(k_2n_2/N_2)\,X_{n_1n_2}. \tag{32.64}$$
(2) Then, rotate the phase:
$$X''_{n_1k_2} = e(k_2n_1/N)\,X'_{n_1k_2}. \tag{32.65}$$
(3) Finally compute for any $k_1$
$$\hat X^{(k_1k_2)} = \sum_{n_1=0}^{N_1-1} e(k_1n_1/N_1)\,X''_{n_1k_2}. \tag{32.66}$$
Now the number of necessary operations is $O[N_1 \times N_2^2] + O[N_1^2 \times N_2]$; if $N_1 = N_2 = \sqrt{N}$, then $O[2N^{3/2}]$. If we can decompose $N$ into $m$ factors of similar order, then the number of necessary operations is roughly $N^{1-1/m}N^{2/m} = N \times N^{1/m}$. Hence, asymptotically, we can guess $N\ln N$ is the best possibility for the discrete Fourier transform of $N$ numbers.
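The three steps can be sketched for a single factorization $N = N_1N_2$ and compared with the direct sum. This is an illustrative one-level decomposition under the index maps of (32.61), not an optimized or recursive FFT; `naive_dft` and `two_factor_dft` are names introduced here.

```python
import cmath

def e(x):
    return cmath.exp(-2j * cmath.pi * x)

def naive_dft(X):
    N = len(X)
    return [sum(e(k * n / N) * X[n] for n in range(N)) for k in range(N)]

def two_factor_dft(X, N1, N2):
    """One level of the decomposition with n = n1 + n2*N1 and k = k1*N2 + k2."""
    N = N1 * N2
    assert len(X) == N
    # Step (1): for each n1, a length-N2 transform over n2.
    step1 = [[sum(e(k2 * n2 / N2) * X[n1 + n2 * N1] for n2 in range(N2))
              for k2 in range(N2)] for n1 in range(N1)]
    # Step (2): phase rotation by e(k2*n1/N).
    step2 = [[e(k2 * n1 / N) * step1[n1][k2] for k2 in range(N2)]
             for n1 in range(N1)]
    # Step (3): for each k2, a length-N1 transform over n1.
    out = [0j] * N
    for k1 in range(N1):
        for k2 in range(N2):
            out[k1 * N2 + k2] = sum(e(k1 * n1 / N1) * step2[n1][k2]
                                    for n1 in range(N1))
    return out

X = [complex(i * i % 7, -i) for i in range(12)]
reference = naive_dft(X)
for N1, N2 in ((3, 4), (4, 3), (2, 6)):
    result = two_factor_dft(X, N1, N2)
    print(N1, N2, max(abs(a - b) for a, b in zip(reference, result)))
```

Applying the same splitting recursively to the inner transforms is what brings the cost down toward $N\ln N$.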
Exercise.
Find the autocorrelation function of the signal
$$f(t) = \Theta(t + T) - \Theta(t - T). \tag{32.67}$$
Then illustrate the Wiener-Khinchine theorem with the example.
32.C Fourier Analysis of Generalized Function
Generalized functions can be Fourier transformed, and physicists' favorite formulas like $\int e^{ikx}dk = 2\pi\delta(x)$ or the Plemelj formula $1/(x+i0) = P(1/x) - i\pi\delta(x)$ can be demonstrated. Fourier expansion of the $\delta$-function gives us the Poisson sum formula, which may be used to accelerate the convergence of series.

Key words: Fourier expansion of unity, Poisson sum formula, Euler-MacLaurin sum formula, Plemelj formula

Summary:
(1) A non-convergent Fourier series may be interpreted as a generalized function. A typical example is Poisson's sum formula (32C.2).
(2) Formal calculation of Fourier transform of generalized functions often works, but whenever there is some doubt, return to the definition (32C.6, 32C.8).
32C.1 Delta function.
$$\delta(x) = \sum_{n=-\infty}^{\infty} e^{i2n\pi x} \tag{32.68}$$
for $x \in (-1, 1)$.
[Demo] We know as an ordinary Fourier series
$$\frac{1-2x}{2} = \sum_{n=1}^{\infty}\frac{\sin(2n\pi x)}{n\pi} \tag{32.69}$$
for $x \in (0, 1)$. We may use the RHS to extend the LHS periodically for all $R$. Differentiate this termwise, interpreting this as a formula for generalized functions (→14.14). We get
$$-1 + \delta(x) = 2\sum_{n=1}^{\infty}\cos 2n\pi x \tag{32.70}$$
for $x \in (-1/2, 1/2)$.
The decomposition of unity (→20.27) can also be used to obtain (32.68).
32C.2 Poisson's sum formula.
$$\sum_{k=-\infty}^{\infty}\delta(x-k) = \sum_{n=-\infty}^{\infty} e^{i2n\pi x} \tag{32.71}$$
for $x \in R$.
This can be obtained easily from (32.68) by 'tessellating' the formula for $(-1/2, 1/2)$ over the whole range of $R$. From (32.71) we get
$$|\lambda|\sum_{k=-\infty}^{\infty}\delta(x-\lambda k) = \sum_{n=-\infty}^{\infty} e^{i2\pi n x/\lambda} \tag{32.72}$$
(cf. 14.11). Applying a test function $\varphi$ to this, we get the following Poisson sum formula:
$$|\lambda|\sum_{k=-\infty}^{\infty}\varphi(\lambda k) = \sum_{n=-\infty}^{\infty}\hat\varphi(2n\pi/\lambda). \tag{32.73}$$
(Be careful with the normalization constant.) Also we can make a cosine version of the Poisson sum formula
$$\sum_{k=-\infty}^{\infty}\delta(x-k) = 1 + 2\sum_{n=1}^{\infty}\cos(2n\pi x). \tag{32.74}$$
If $f(x)$ is a gently decaying function, then its Fourier transform decays rapidly, and vice versa. The Poisson sum formula is useful because it may help accelerate the convergence of the series.
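The trade-off between the two sides of the Poisson sum formula can be seen with a Gaussian. The sketch below assumes $\varphi(x) = e^{-x^2/2}$ and the convention $\hat\varphi(k) = \int\varphi(x)e^{-ikx}dx = \sqrt{2\pi}\,e^{-k^2/2}$ (the 32A.1 normalization, as used in 32C.6); `lhs` and `rhs` are illustrative names for truncated versions of the two sides of (32.73).

```python
import math

def lhs(lam, k_max):
    """|lam| * sum over |k| <= k_max of phi(lam * k), with phi(x) = exp(-x^2 / 2)."""
    return abs(lam) * sum(math.exp(-0.5 * (lam * k) ** 2)
                          for k in range(-k_max, k_max + 1))

def rhs(lam, n_max):
    """Sum over |n| <= n_max of phi_hat(2 pi n / lam), phi_hat(k) = sqrt(2 pi) exp(-k^2 / 2)."""
    return sum(math.sqrt(2.0 * math.pi) * math.exp(-0.5 * (2.0 * math.pi * n / lam) ** 2)
               for n in range(-n_max, n_max + 1))

# Small lam: the left side needs hundreds of terms, the right side only n = 0.
print(lhs(0.1, 20), lhs(0.1, 200), rhs(0.1, 0))
# Larger lam: both sides converge quickly and agree.
print(lhs(2.0, 50), rhs(2.0, 5))
```

For $\lambda = 0.1$ the single term $n = 0$ on the right already matches the fully summed left side to rounding accuracy, which is the acceleration referred to above.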
Exercise.
Demonstrate
$$\sum_{n=1}^{\infty}\frac{\cos na}{1+n^2} = \frac{\pi}{2}\,\frac{\cosh(\pi - a)}{\sinh\pi} - \frac{1}{2}.$$

32C.3 Applications of Poisson sum formula.
(1)
$$\sum_{n\in Z}\frac{1}{1+a^2n^2} = \frac{\pi}{a}\coth\frac{\pi}{a}. \tag{32.75}$$
The key formulas are
(32.76)
(2)
$$\sum_{n=1}^{\infty}\frac{\cos na}{1+n^2} = \frac{\pi}{2}\,\frac{\cosh(\pi - a)}{\sinh\pi} - \frac{1}{2}. \tag{32.77}$$

32C.4 Euler-MacLaurin sum formula.
$$\sum_{n=0}^{\infty} f(n) = \int_0^\infty f(x)\,dx + \frac{1}{2}f(0) - \frac{1}{12}f'(0) + \frac{1}{720}f^{(3)}(0) - \frac{1}{30240}f^{(5)}(0) + \cdots \tag{32.78}$$
[Demo] Let $f$ be a function defined on the positive real axis. Extend it to the whole $R$ as an even function ($f(x) = f(-x)$). Apply the cosine version of the Poisson sum formula (32.74) and integrate from 0 to $\infty$. Using the evenness of the function, we get
$$-\frac{1}{2}f(0) + \sum_{k=0}^{\infty} f(k) = \int_0^\infty f(x)\,dx + 2\sum_{n=1}^{\infty}\int_0^\infty f(x)\cos(2n\pi x)\,dx. \tag{32.79}$$
Integrating by parts the last integrals containing cosines, we get
$$\sum_{k=0}^{\infty} f(k) = \frac{1}{2}f(0) + \int_0^\infty f(x)\,dx - 2\sum_{n=1}^{\infty}\int_0^\infty f'(x)\,\frac{\sin 2n\pi x}{2n\pi}\,dx. \tag{32.80}$$
Keep applying integration by parts to get
$$2\sum_{n=1}^{\infty}\int_0^\infty f'(x)\,\frac{\sin 2n\pi x}{2n\pi}\,dx = -\sum_{n=1}^{\infty}\left[f'(x)\,\frac{\cos 2n\pi x}{2(n\pi)^2}\right]_0^\infty + \sum_{n=1}^{\infty}\int_0^\infty f''(x)\,\frac{\cos 2n\pi x}{2(n\pi)^2}\,dx. \tag{32.81}$$
Thus
$$\sum_{k=0}^{\infty} f(k) = \frac{1}{2}f(0) + \int_0^\infty f(x)\,dx - f'(0)\sum_{n=1}^{\infty}\frac{1}{2n^2\pi^2} + \cdots. \tag{32.82}$$
This gives the $f'(0)$ term of the formula.
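A quick sanity check of (32.78) is possible for a function whose sum, integral and derivatives at 0 are all known in closed form. The sketch below assumes the test function $f(x) = e^{-bx}$ with $b > 0$, an illustrative choice not taken from the notes, for which $f^{(m)}(0) = (-b)^m$ and $\int_0^\infty f\,dx = 1/b$.

```python
import math

def euler_maclaurin_geometric(b):
    """Right-hand side of (32.78), through the f^(5) term, for f(x) = exp(-b x)."""
    integral = 1.0 / b
    f0, f1, f3, f5 = 1.0, -b, -b ** 3, -b ** 5
    return integral + f0 / 2 - f1 / 12 + f3 / 720 - f5 / 30240

def direct_sum(b, terms=2000):
    """Left-hand side: sum of exp(-b n) for n = 0, 1, ..., terms - 1."""
    return sum(math.exp(-b * n) for n in range(terms))

b = 0.3
print(direct_sum(b), euler_maclaurin_geometric(b))
```

For this test function the formula reproduces the expansion of $1/(1 - e^{-b})$ in powers of $b$, so the two numbers differ only by the first neglected term, of order $b^7$.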
32C.5 Mulholland's formula for the canonical partition function for the rotational motion of a heteronuclear diatomic molecule. The rotational partition function $r(T)$ at temperature $T$ is given by
$$r(T) = \sum_{\ell=0}^{\infty}(2\ell+1)\exp\left[-\frac{\hbar^2\ell(\ell+1)}{2Ik_BT}\right], \tag{32.83}$$
where $I$ is the moment of inertia of the molecule, and $k_B$ is the Boltzmann constant. Introduce $\sigma = \hbar^2/2Ik_BT$, and let
$$f(x) = (2x+1)\exp[-x(x+1)\sigma]. \tag{32.84}$$
Applying (32.78) to this function, we get the following Mulholland's formula
$$r(T) = \frac{1}{\sigma} + \frac{1}{3} + \frac{\sigma}{15} + \frac{4\sigma^2}{315} + O[\sigma^3]. \tag{32.85}$$
The first term on the RHS is the classical value.
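Mulholland's expansion can be compared with a direct evaluation of the sum (32.83). This is a minimal numerical sketch; the cutoff `l_max` and the sample value of $\sigma$ are assumptions made for the illustration.

```python
import math

def rotational_sum(sigma, l_max=200):
    """Direct evaluation of (32.83) with sigma = hbar^2 / (2 I k_B T)."""
    return sum((2 * l + 1) * math.exp(-sigma * l * (l + 1)) for l in range(l_max + 1))

def mulholland(sigma):
    """The four displayed terms of (32.85)."""
    return 1.0 / sigma + 1.0 / 3.0 + sigma / 15.0 + 4.0 * sigma ** 2 / 315.0

sigma = 0.1
print(rotational_sum(sigma), mulholland(sigma))
```

At $\sigma = 0.1$ (a temperature well above the rotational temperature) the two values differ by a few parts in $10^6$, consistent with a remainder of order $\sigma^3$.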
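32C.6 Fourier transform of generalized functions. The crucial observation is (for $\hat{\ }$ see 32A.1): if $f$ and $\varphi$ both have well-defined Fourier transforms,
$$\langle\hat f, \varphi\rangle = \int dk\left[\int dx\,f(x)e^{-ikx}\right]\varphi(k) = \langle f, \hat\varphi\rangle. \tag{32.86}$$
The Fourier transform $\hat T = F[T]$ of a generalized function $T$ is defined by
$$\langle\hat T, \varphi\rangle = \langle T, \hat\varphi\rangle, \quad\text{or}\quad \langle F[T], \varphi\rangle = \langle T, F[\varphi]\rangle, \tag{32.87}$$
where $\varphi \in \mathcal{D}$, a test function.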
Exercise.
Demonstrate
$$\lim_{\lambda\to\infty}\frac{\sin\lambda x}{x} = \pi\delta(x), \tag{32.88}$$
$$\lim_{\lambda\to\infty}\int_a^b \sin\lambda x\,dx = 0. \tag{32.89}$$
32C.7 Convenient test function space. For this definition it is desirable that the set of test functions $\mathcal{D}$ (→14.8) and the set of their Fourier transforms $\hat{\mathcal{D}}$ are identical. For the set of Schwartz class functions (→14.8 footnote) this holds (→32A.11). [If we choose $\mathcal{D}$ to be the set of all the functions with compact supports, then $\hat{\mathcal{D}}$ becomes very large, so that the class of generalized functions (for which $\langle T, \varphi\rangle$ must be meaningful) must be severely restricted, and is not very convenient.]
32C.8 Fourier transform of unity = delta function.
$$\hat 1 = 2\pi\delta(k). \tag{32.90}$$
This is the true meaning of the physicists' favorite
$$\int e^{ikx}\,dk = 2\pi\delta(x). \tag{32.91}$$
Obviously, $\hat\delta = 1$ (direct calculation). That is, $F^2$ implies multiplication of $2\pi$ as we know in 32A.10.
[Demo] $\langle\hat 1, \varphi\rangle = \langle 1, \hat\varphi\rangle = \int\hat\varphi(k)\,dk = F^2[\varphi](0)$. Here $F^2[\varphi]$ is a function on the configuration space (that is, a function of $x$) and is equal to $2\pi\varphi(x)$. Therefore we have obtained
$$\langle\hat 1, \varphi\rangle = 2\pi\varphi(0) = \int 2\pi\delta(x)\varphi(x)\,dx = \langle 2\pi\delta, \varphi\rangle. \tag{32.92}$$

Exercise.
Show
$$\delta(t) = \frac{1}{\pi}\int_0^\infty \cos\omega t\,d\omega. \tag{32.93}$$
Cf. 32A.8.

32C.9 Translation. The following formulas should be obvious
(32.94)
32C.10 Fourier transform of $x$, $d/dx \leftrightarrow +ik$. (→32A.3)
$$\hat x = +2\pi i\,\delta'(k). \tag{32.95}$$
In other words, since $F^2 \equiv 2\pi$,
$$\hat{\delta'} = +ik. \tag{32.96}$$
[Demo] Start with the definition $\langle\hat x, \varphi\rangle = \langle x, \hat\varphi\rangle$ (→32C.6) which is equal to
$$\int dx\,x\,\hat\varphi(x) = \int dx\,x\left[\int e^{-ikx}\varphi(k)\,dk\right] = \int dx\int dk\left(-\frac{1}{i}\frac{d}{dk}e^{-ikx}\right)\varphi(k). \tag{32.97}$$
Integrating this by parts, taking into account that the test function $\varphi$ decays sufficiently quickly, we get
$$-\int dx\int dk\,i\,e^{-ikx}\varphi'(k) = -i\int dk\,\hat 1(k)\varphi'(k) = -2\pi i\int dk\,\delta(k)\varphi'(k) = 2\pi i\int dk\,\delta'(k)\varphi(k), \tag{32.98}$$
where we have used (32.90) in 32C.8, and the definition of $\delta'$ (→14.14).

A more formal and direct 'demonstration' is
(32.99)
Convolution of the derivative of delta function is differentiation (→14.23(2)), and the Fourier transform of a convolution is the product of the Fourier transforms, i.e., $F(f * g) = F(f)F(g)$ (→32A.2), so that we easily get (cf. 32A.3)
$$\hat{f'} = +ik\hat f. \tag{32.100}$$
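32C.11 Fourier transform of $x^n$.
$$\widehat{x^n} = 2\pi\left(+i\frac{d}{dk}\right)^n\delta(k). \tag{32.101}$$
In other words,
$$\widehat{\delta^{(n)}} = (+ik)^n. \tag{32.102}$$
Since $\delta' * f = f'$, $\delta^{(n)} = \delta' * \delta^{(n-1)} = \delta' * \delta' * \cdots * \delta' * \delta$ ($n$ $\delta'$ are convoluted) (this is well defined →14.23(2)). This and (32.96) immediately imply (32.102).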
32C.12 Fourier transform of sign function.
$$\widehat{\mathrm{sgn}}(k) = \frac{2}{i}P\frac{1}{k}, \tag{32.103}$$
where $P$ denotes the Cauchy principal value (→14.17).
[Demo] We have demonstrated (→14.15)
$$\frac{d}{dx}\mathrm{sgn}(x) = 2\delta(x). \tag{32.104}$$
Fourier-transforming this, we get (→(32.100) and $\hat\delta = 1$)
$$+ikF(\mathrm{sgn})(k) = 2. \tag{32.105}$$
With the aid of (2) in 14.17, we can solve this equation for $\widehat{\mathrm{sgn}}$ as
$$\widehat{\mathrm{sgn}}(k) = \frac{2}{i}P\frac{1}{k} + c\,\delta(k), \tag{32.106}$$
where $c$ is a constant not yet determined. To fix this constant we apply this equality to an even test function, say $e^{-k^2}$. Since sgn is an odd generalized function, and since the Fourier transform of a Gaussian function is again Gaussian,
$$\langle\widehat{\mathrm{sgn}}, e^{-k^2}\rangle \propto \langle\mathrm{sgn}, e^{-x^2}\rangle = 0. \tag{32.107}$$
$P(1/k)$ is also an odd function, so that this implies $c = 0$.

32C.13 Plemelj formula.
$$\text{w-}\lim_{\epsilon\to+0}\frac{1}{x \pm i\epsilon} = P\frac{1}{x} \mp i\pi\delta(x), \tag{32.108}$$
where w-$\lim_{\epsilon\to+0}$ is the weak limit, that is, the limit is taken after the integration in which the function appears is completed (→8B.12).
[Demo] Obviously,
$$\lim_{\epsilon\to0+} e^{-\epsilon x}\Theta(x) = \Theta(x). \tag{32.109}$$
If we interpret this equation as an equation for generalized functions, then integration and the limit can be freely exchanged. Therefore, we get
$$\hat\Theta(k) = \text{w-}\lim_{\epsilon\to0+}\int_0^\infty e^{-(ik+\epsilon)x}\,dx = \lim_{\epsilon\to0+}\frac{1}{ik+\epsilon}. \tag{32.110}$$
Since $\mathrm{sgn}(x) = 2\Theta(x) - 1$, (32.103), (32.90) and (32.110) imply
$$\frac{2}{i}P\frac{1}{k} = \lim_{\epsilon\to0+}\frac{2}{ik+\epsilon} - 2\pi\delta(k). \tag{32.111}$$
32C.14 Initial value problem for wave equation. This is the second method to solve the wave equation in $d$-space (→30.9, 32D.9). Consider
$$\Box u = \partial_t^2u - \Delta u = 0 \tag{32.112}$$
with the initial condition $u(x, 0) = f(x)$ and $\partial_tu(x, 0) = g(x)$ on $R^d$. Here we assume $f$ and $g$ are with compact supports (i.e., vanish far from the origin). Applying spatial Fourier transformation, we get
$$\partial_t^2\hat u(k,t) = -k^2\hat u(k,t), \tag{32.113}$$
so that we obtain
$$\hat u(k,t) = \hat f\cos kt + \hat g\,\frac{\sin kt}{k}. \tag{32.114}$$
Therefore,
$$u(x,t) = \frac{1}{(2\pi)^d}\int d^dk\left(\hat f\cos kt + \hat g\,\frac{\sin kt}{k}\right)e^{-ik\cdot x}. \tag{32.115}$$
If we introduce the following Fourier transform (in the generalized function sense)
$$K(x,t) = \frac{1}{(2\pi)^d}\int d^dk\,\frac{\sin kt}{k}\,e^{-ik\cdot x}, \tag{32.116}$$
We obtain (cf (30·2~))
$$u(x,t) = \frac{\partial}{\partial t}\int d^dy\,K(x-y,t)f(y) + \int d^dy\,K(x-y,t)g(y). \tag{32.117}$$

Discussion.
We can further transform the result with the aid of ($d \ge 2$)
where $M_f$ is the same as in 30.9 (the spherical average).

Exercise.
Demonstrate that the solution to a wave equation can be written as a superposition of plane waves. Or, demonstrate the following statement. If we introduce
$$\hat h_\pm(k) = \frac{1}{2}\left(\hat f(k) \pm i\,\frac{\hat g(k)}{|k|}\right), \tag{32.119}$$
then (32.115) can be written as
$$u(x,t) = \frac{1}{(2\pi)^d}\int d^dk\,e^{i(k\cdot x - kt)}\hat h_+(k) + \frac{1}{(2\pi)^d}\int d^dk\,e^{i(k\cdot x + kt)}\hat h_-(k). \tag{32.120}$$
32.D Radon Transformation
Radon transformation is a theoretical basis of various tomographies. Its inverse transformation is constructed with the aid of Fourier transformation. Radon transformation allows us to solve the Cauchy problem of the wave equation in any dimensional space. The explicit formula clearly demonstrates the marked difference of even and odd dimensional spaces.

Key words: Radon's problem, Radon transform, modified Radon transform, tomography, wave equation, afterglow.

Summary:
(1) The mathematical principle of tomography is Radon transformation (32D.3) whose inverse transformation is essentially calculable by Fourier transformation (32D.4-5).
(2) Radon transform gives a general method to solve the $d$-wave equation (32D.9). The resultant solution clearly exhibits the afterglow effect in even dimensional spaces (32D.10).
32D.1 Radon's problem. Radon (1917) considered the following problem: Reconstruct a function $f(x, y)$ on the plane from its integral along all lines in the plane. That is, the problem is to reconstruct the shape of a hill from the areas of all its vertical cross-sections.

32D.2 Radon transform. Let $f$ be a function defined on a region in $R^2$.$^{409}$
$$Rf(s,\omega) \equiv \int_{R^2}dx\,\delta(x\cdot\omega - s)f(x) \tag{32.121}$$
is called the Radon transform of $f$, where $\omega$ is the directional vector ($|\omega| = 1$) specifying a line normal to it, and $s \in R$ is the (signed) distance between the line and the origin. The Radon problem 32D.1 is to find $f$ from $Rf$.

That (32.121) is the integral of $f$ along the line specified by $\omega\cdot x = s$ can easily be seen if we introduce the rotated Cartesian coordinate system $O$-$x_1x_2$ such that the $x_2$ axis is parallel to the line and $x_1$ perpendicular to it. The integral now reads $\int\delta(x_1 - s)f(x_1, x_2)\,dx_1dx_2 = \int f(s, x_2)\,dx_2$.

32D.3 Some properties of Radon transform. Note that
(1) $Rf(s,\omega)$ is an even homogeneous function (→13B.1) of $s$ and $\omega$ of degree $-1$:
$$Rf(\lambda s, \lambda\omega) = |\lambda|^{-1}Rf(s,\omega). \tag{32.122}$$
409 The definition given here can easily be extended to general $d$-space. See 32D.7-8. A good introduction to the topic may be found in I. M. Gel'fand, M. I. Graev and N. Ya. Vilenkin, Generalized Functions, vol. 5 Integral Geometry and Representation Theory (Academic Press, 1966). See also R. S. Strichartz, Am. Math. Month. 1982 June-July.
(2) The Radon transform of a convolution (→14.22) is a convolution of Radon transforms:
$$\left(R\left[\int_{R^2}f_1(y)f_2(x-y)\,dy\right]\right)(s,\omega) = \int_{-\infty}^{\infty}dt\,[Rf_1(t,\omega)]\,[Rf_2(s-t,\omega)]. \tag{32.123}$$
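32D.4 Fourier transform of Radon transform.
$$\hat f(\rho\omega) = F(Rf)(\rho,\omega) \equiv \int_{-\infty}^{\infty}Rf(s,\omega)\,e^{-i\rho s}\,ds. \tag{32.124}$$
That is, the Fourier transform of $Rf(s,\omega)$ with respect to $s$ is the Fourier transform of the function $f$ itself with the '$k$-vector' parallel to $\omega$.
[Demo] Using the definition (32.121), we have only to perform a straightforward calculation:
$$\int_{-\infty}^{\infty}Rf(s,\omega)\,e^{-i\rho s}\,ds = \int_{-\infty}^{\infty}ds\int dx\,f(x)\,\delta(s - x\cdot\omega)\,e^{-i\rho s} = \int dx\,f(x)\,e^{-i\rho\omega\cdot x}. \tag{32.125}$$
Thus $f$ can be reconstructed by
$$f(r) = \frac{1}{(2\pi)^d}\int\hat f(\rho\omega)\,e^{i\rho\omega\cdot r}\,d\rho\,d\omega. \tag{32.126}$$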
32D.5 Theorem [Radon inversion formula]. Let $f$ be a piecewise $C^1$-function defined on a region in $R^2$. Then
$$f(x) = \int\tilde Rf(x\cdot\omega, \omega)\,d\sigma(\omega), \tag{32.127}$$
where $d\sigma$ is the arc length element of the unit circle, and $\tilde Rf$ is the modified Radon transform defined by
$$\tilde Rf(s,\omega) \equiv \frac{1}{8\pi^2}\int_{-\infty}^{\infty}d\rho\,e^{-i\rho s}\,|\rho|\,\hat Rf(\rho,\omega). \tag{32.128}$$
32D.6 X-ray tomography. The Radon transformation is the theoretical underpinning of the particle beam tomographies. These are applied not only medically, but also, e.g., to the anatomical study of fossils such as trilobites.

32D.7 d-space version. In $d$-space the Radon transform is defined as
$$Rf(s,\omega) = \int_{R^d}f(x)\,\delta(s - \omega\cdot x)\,dx, \tag{32.129}$$
where $\omega$ is the position vector on the unit $(d-1)$-sphere $S^{d-1}$ (the skin of the $d$-unit ball). The $d$-dimensional version of 32D.5 reads:

32D.8 Theorem.
$$f(x) = \int_{S^{d-1}}d\sigma(\omega)\,\tilde Rf(x\cdot\omega, \omega), \tag{32.130}$$
where
$$\tilde Rf(s,\omega) \equiv \frac{1}{2(2\pi)^d}\int_{-\infty}^{\infty}e^{-i\rho s}\,|\rho|^{d-1}\,\hat Rf(\rho,\omega)\,d\rho, \tag{32.131}$$
$$\hat Rf(\rho,\omega) \equiv \int_{-\infty}^{\infty}Rf(s,\omega)\,e^{i\rho s}\,ds \quad (= \hat f(\rho\omega)) \tag{32.132}$$
with $\sigma$ being the area element of $S^{d-1}$.
32D.9 Solving d-wave equation using Radon transform. Consider a wave equation in the whole $d$-space
$$\partial_t^2u - \Delta u = 0 \tag{32.133}$$
with the initial condition $u = f$ and $\partial_tu = g$ at $t = 0$. If the initial data are constant on all the hyperplanes perpendicular to the direction $\omega$, i.e., $f(x) = F(x\cdot\omega)$ and $g(x) = G(x\cdot\omega)$, where $F$ and $G$ are functions defined on $R$, then we can apply the method to solve the 1-space problem (→2B.4) to get the solution as
$$u(x,t) = \frac{1}{2}[F(x\cdot\omega + t) + F(x\cdot\omega - t)] + \frac{1}{2}\int_{x\cdot\omega - t}^{x\cdot\omega + t}G(s)\,ds. \tag{32.134}$$
Therefore, if we can decompose the initial data into a superposition of data depending only on $x\cdot\omega$, the superposition principle (→1.4) allows us to reconstruct the solution from the pieces like (32.134). As can be seen from (32.130), $d$-dimensional Radon transformation is the very tool to accomplish the desired decomposition.
The strategy is as follows:
(1) Calculate the modified Radon transform (32.132) for $f$ and $g$.
(2) Solve the wave equation for $\tilde Ru$.
(3) Use (32.130) to reconstruct $u$:
$$u(x,t) = \int_{S^{d-1}}d\sigma(\omega)\left\{\frac{1}{2}[\tilde Rf(x\cdot\omega + t, \omega) + \tilde Rf(x\cdot\omega - t, \omega)] + \frac{1}{2}\int_{x\cdot\omega - t}^{x\cdot\omega + t}\tilde Rg(s,\omega)\,ds\right\}. \tag{32.135}$$
32D.10 Waves in odd and even dimensional spaces behave very differently. Let us calculate the modified Radon transform (32.132) explicitly. If $d$ is odd, then $|\rho|^{d-1} = \rho^{d-1}$, so that multiplying $\rho$ can be interpreted as differentiation with respect to $s$ as
$$\tilde Rf(s,\omega) = \frac{1}{2}(-1)^{(d-1)/2}\left(\frac{1}{2\pi}\right)^{d-1}\frac{\partial^{d-1}}{\partial s^{d-1}}Rf(s,\omega). \tag{32.136}$$
In contrast, if $d$ is even then the non-analyticity of $|\rho|$ must be dealt with as $|\rho|^{d-1} = \mathrm{sgn}(\rho)\rho^{d-1}$, so that
$$\tilde Rf(s,\omega) = \frac{1}{2}(-1)^{(d-1)/2}\left(\frac{1}{2\pi}\right)^{d-1}H\left[\frac{\partial^{d-1}}{\partial s^{d-1}}Rf(s,\omega)\right], \tag{32.137}$$
where $H$ is the Hilbert transform (→8B.15) defined by
$$Hf(x) = P\int\frac{f(s)}{x-s}\,ds, \tag{32.138}$$
where $P$ denotes the Cauchy principal value (→14.17). This can be obtained from the convolution formula and the Fourier transform of sgn (→32C.12).

Look at the use of the modified Radon transform in the solution (32.135) when the initial velocity is everywhere zero. This applies to the case of an instantaneous flash of light emitted from a point (that is, $f = \delta(x)$). If $\tilde Rf(s,\omega)$ is determined by $Rf(s,\omega)$ only, then the observer at a distance sees only a flash of light. That is, the wave is localized in time in odd-dimensional ($\ge 3$) spaces. On the other hand, if the spatial dimensionality is even, then the Hilbert transform implies that the wave is not localized in time. Thus, after watching a flash, the observer must feel that the world becomes brighter (the afterglow effect in even dimensional spaces) (→16C.4).
APPENDIX a32 Bessel Transform
a32.1 Theorem [Hankel]. Let $f \in L_1([0,\infty), r)$ and be piecewise continuous. Then
(32.139)
for $\nu \ge 1/2$. This may also be expressed as
(32.140)
Notice that the RHS is the delta function adapted to the weight $r$ (i.e., $\delta_r(r - r')$ →18.25).$^{410}$ □
[Demo] Here (32.139) is proved for continuous $L_1$ (→19.8) functions and integer $\nu = n$. Let
$$F(x,y) = f(r)e^{in\varphi}, \tag{32.141}$$
where $x = r\cos\varphi$ and $y = r\sin\varphi$. With the aid of the Fourier expression of the delta function (→32C.8), we can write
(32.142)
Introduce polar coordinates as
$$\xi = r'\cos\psi, \quad \eta = r'\sin\psi, \tag{32.143}$$
$$k_x = k\cos\theta, \quad k_y = k\sin\theta. \tag{32.144}$$
(32.142) is rewritten as ($F(\xi,\eta) = f(r')e^{in\psi}$)
$$f(r)e^{in\varphi} = \int_0^\infty dk\,k\int_0^\infty dr'\,r'f(r')\left\{\frac{1}{2\pi}\int_{-\pi}^{\pi}d\theta\,e^{ikr\cos(\theta-\varphi)}\,\frac{1}{2\pi}\int_{-\pi}^{\pi}d\psi\,e^{in\psi}e^{-ikr'\cos(\psi-\theta)}\right\}. \tag{32.145}$$
Setting $\psi - \theta = t$, we get
(32.146)
(32.147)
Here the generating function of Bessel functions (→27A.5) has been used. Analogously, we have
(32.148)
Hence, (32.145)-(32.148) imply (32.139) for $\nu = n$.
More convenient formulas may be

410 More generally, $f$ may be of bounded variation. See G. N. Watson, A Treatise on the Theory of Bessel Functions (Cambridge UP, 1962) p456.
a32.2 Bessel transform and its inverse.
$$g(r) = \int_0^\infty h(r')J_\nu(r'r)\,r'\,dr', \tag{32.149}$$
$$h(r) = \int_0^\infty g(r')J_\nu(r'r)\,r'\,dr'. \tag{32.150}$$
Note that these are the formulas for the Fourier sine (or cosine) transform (→32A.8) for $\nu = \pm1/2$ (→27A.19).
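a32.3 Examples. See 27A.15.
$$\int_0^\infty e^{-ax}J_0(xy)\,dx = \frac{1}{\sqrt{a^2+y^2}} \;\leftrightarrow\; \int_0^\infty\frac{y}{\sqrt{a^2+y^2}}J_0(xy)\,dy = \frac{e^{-ax}}{x}. \tag{32.151}$$
$$\int_0^\infty\cos ax\,J_0(xy)\,dx = \frac{1}{\sqrt{y^2-a^2}} \;\leftrightarrow\; \int_0^\infty\frac{y}{\sqrt{y^2-a^2}}J_0(xy)\,dy = \frac{\cos ax}{x}. \tag{32.152}$$
$$\int_0^\infty e^{-a^2x^2}x^{\nu+1}J_\nu(xy)\,dx = \frac{y^\nu}{(2a^2)^{\nu+1}}\,e^{-y^2/4a^2} \;\leftrightarrow\; \int_0^\infty\frac{y^{\nu+1}}{(2a^2)^{\nu+1}}\,e^{-y^2/4a^2}J_\nu(xy)\,dy = e^{-a^2x^2}x^\nu. \tag{32.153}$$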
33 Laplace Transformation
Laplace transformation is a disguised Fourier transformation for causal functions (the functions that are zero in the past), and is a very useful tool to study transient phenomena. The inverse transformation is often not easy, but clever numerical tricks may be used to invert the transforms. Appendix a33 discusses a disguised Laplace transformation, Mellin transformation, which is useful when we wish to solve problems on fan shaped domains.
Key words: Laplace transform, fundamental theorem, convolution, time-delay, fast inverse Laplace transform.
Summary:
(1) Laplace transformation 33.2 allows one to solve many ODE algebraically with the aid of tables (33.14).
(2) Basic formulas like the convolution theorem, delay theorem, etc. should be known to this end (33.7-10).
33.1 Motivation. Due to causality, we often encounter functions of time $t$ that are zero for $t < 0$ (or often so for $t \le 0$ due to continuity). Then, the so-called one-sided Fourier transform
$$\int_0^\infty f(t)\,e^{i\omega t}\,dt \tag{33.1}$$
appears naturally. However, if $f(t)$ grows as $e^{at}$ ($a > 0$), then this does not make sense even in the sense of generalized functions (→14.4). Even in this case, if we choose sufficiently large $c > 0$, the one-sided Fourier transform of $e^{-ct}f(t)$ exists in the ordinary sense. If $f(t)e^{-ct}\Theta(t)$ ($\Theta(t)$ is the Heaviside step function →14.15(3)) is absolutely integrable, and $f'$ is piecewise continuous for $t > 0$, then from the Fourier transform of this function, $f(t)$ for $t > 0$ can be recovered.

33.2 Definition of Laplace transform. The following transformation $\mathcal{L}_s$ is called the Laplace transformation:
$$\mathcal{L}_s[u] = \int_0^\infty u(t)\,e^{-st}\,dt, \tag{33.2}$$
where $s = c - i\omega$ and $c$ is chosen sufficiently large so that the integral exists. $\mathcal{L}_s[u]$ is called the Laplace transform of $u$.$^{411}$
Discussion.
(A) A discrete counterpart is the so-called z-transformation: The z-transform $A(z)$ of $\{a_n\}$ is defined by
$$A(z) = \sum_{n=0}^{\infty}a_nz^n. \tag{33.3}$$
This is also called the generating function of the sequence $\{a_n\}$. The inverse transform is given by
$$a_n = \frac{1}{2\pi i}\int_{\partial D}dz\,\frac{A(z)}{z^{n+1}}, \tag{33.4}$$
where $D$ is a disc containing the origin but excluding all the singularities of $A(z)$.
(B) z-transform is a convenient way to solve linear difference equation:
(33.5)
For example, let us solve
$$x_{n+2} - 2x_{n+1} + x_n = 0 \tag{33.6}$$
with the 'initial conditions' $x_0 = 1$, and $x_1 = 0$. Multiplying (33.6) by $z^{n+2}$ and summing over $n \ge 0$, we see that the z-transform $X(z)$ obeys
$$X(z) - 1 - 2z(X(z) - 1) + z^2X(z) = 0. \tag{33.7}$$
From this we can solve $X(z) = (1 - 2z)/(1 - z)^2$. The inverse transform gives $x_n = 1 - n$.
(C) An inhomogeneous linear difference equation is given by
(33.8)
The general solution to this equation is given by the sum of the general solution of (33.5) and a special solution to (33.8), just as for linear differential equations. If we can compute the z-transform of $\{f_n\}$, then at least $X(z)$ can be obtained. However, to obtain $x_n$ from $X$ may not be very easy.
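The worked example in (B) can be cross-checked in two independent ways. In the sketch below, which is only an illustration with made-up function names, the recurrence (33.6) is iterated directly, and the Taylor coefficients of the rational function $(1 - 2z)/(1 - z)^2$ are generated by power-series division. That rational function is what results from multiplying (33.6) by $z^{n+2}$, summing over $n$, and using $x_0 = 1$, $x_1 = 0$.

```python
def recurrence_solution(n_terms):
    """Iterate x_{n+2} = 2 x_{n+1} - x_n from x_0 = 1, x_1 = 0."""
    x = [1, 0]
    while len(x) < n_terms:
        x.append(2 * x[-1] - x[-2])
    return x[:n_terms]

def series_coefficients(num, den, n_terms):
    """Taylor coefficients of num(z)/den(z) about z = 0 (den[0] must be nonzero).

    num and den are coefficient lists in increasing powers of z.
    """
    coeffs = []
    for n in range(n_terms):
        acc = num[n] if n < len(num) else 0
        for j in range(1, min(n, len(den) - 1) + 1):
            acc -= den[j] * coeffs[n - j]
        coeffs.append(acc / den[0])
    return coeffs

print(recurrence_solution(8))
print(series_coefficients([1, -2], [1, -2, 1], 8))  # X(z) = (1 - 2z) / (1 - 2z + z^2)
print([1 - n for n in range(8)])
```

All three lists coincide, confirming $x_n = 1 - n$.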
33.3 Who was Laplace (1749-1827)? The 'Newton of France' was born into a cultivated provincial bourgeois family in Normandy (Beaumont-en-Auge) in 1749. After his secondary school education he attended the University of Caen in 1766 to study the liberal arts, but two of his professors (Gadbled and LeCanu) urged this gifted student to pursue mathematics. With LeCanu's letter to d'Alembert (→2B.5) he left for Paris at age 18 in 1768. He impressed d'Alembert, who secured a position for him at the Ecole Militaire. In 1773 he demonstrated that the acceleration observed in Jupiter and Saturn was not cumulative but periodic. This was the principal advance in dynamical astronomy since Newton toward establishing the stability of the solar system. This work won him election to the Paris Academy in 1773.

411 For a history, see M. F. Gardner and J. L. Barnes, Transients in Linear Systems vol. 1 (Wiley, 1942) Appendix C.

Between 1778 and 1789 he was at his scientific prime. Laplace introduced his transformation in 1779, which was related to Euler's work. In 1780 he worked together with Lavoisier to make a calorimeter to establish that respiration is a form of combustion. Although he played a decisive role in designing the metric system in 1790, he wisely avoided Paris when the Jacobins dominated until 1794. In the late 1790s, with three well received books (one of which, Système du Monde, was not only a fine science popularizer but also a model of French prose), he became a European celebrity.

Laplace advanced applied mathematics and the theory of probability substantially. He based his theory on generating functions, and extended Jakob Bernoulli's work on the law of large numbers. He was amply honored by Napoleon and by Louis XVIII. During his final years he lived at his country home in Arcueil, next to his friend the chemist Berthollet, surrounded by the adopted children of his thoughts, Arago, Poisson, Biot, Gay-Lussac, von Humboldt and others.
33.4 Fundamental theorem of Laplace transform.
(1) The Laplace transform of $f$ (33.2) exists for $s$ such that $e^{-(\mathrm{Re}\,s)t}f(t) \in L_1([0,\infty))$.
(2) There is a one-to-one correspondence between $f(t)$ and its Laplace transform $\mathcal{L}_s[f]$. More explicitly, we have
$$f(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}e^{st}\mathcal{L}_s[f]\,ds, \tag{33.9}$$
where $c$ is a real number larger than the convergence coordinate $c^*$ such that all the singularities of $\mathcal{L}_s[f]$ lie on the left side of $z = c^*$ in $C$.$^{412}$
[Demo] (1) is obvious. At least formally, (2) follows from the motivation 33.1. Fourier inverse transform of $\mathcal{L}_s[f]$ gives
(33.10)
Since $d\omega = i\,ds$, (33.10) becomes
$$f(t) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty}\mathcal{L}_s[f(t)]\,e^{st}\,ds. \tag{33.11}$$
For this integral to be meaningful, we need the following theorem:
412 This was formally shown by Riemann by 1859. Mellin proved this in Acta Soc. Sci. Fenn. 21, 115 (1896). Hence, there is absolutely no justification to call this integral the 'Bromwich integral.' History must not be distorted due to national interests.
Discussion.
(1) $f(t) = \exp(t^{\alpha})$ with $\alpha > 1$ does not have a Laplace transform.
(2) The minimum real number $r$ making $f(t)e^{-rt} \in L_2([0,+\infty))$ is called the convergence coordinate.
Exercise.
Although there is practically almost no need (→33.14) to calculate the integral (33.9), it is still a good exercise in complex integration. Demonstrate the following inverse transform relations with the aid of the residue theorem (→8B).
(1)
$$\mathcal{L}_s^{-1}\left[\frac{1}{(s+a)^n}\right] = \frac{t^{n-1}}{(n-1)!}e^{-at}, \tag{33.12}$$
where $a > 0$ and $n$ is a positive integer.
(2) How can we do a similar thing, if $n$ is not an integer? In this case, $s = 0$ is a branch point (→8A.2-4). If $n \in (0,1)$, then a straightforward contour integration along the contour in the figure works. The contribution from the small circle vanishes in the small radius limit, and the contribution from the large circle is zero, thanks to the Jordan lemma 8B.7. We need 9.5 to streamline the formula. If $n$ is larger, then probably the cleverest way is to use 33.7(5) and reduce the problem to the case of $n \in (0,1)$.
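The inversion integral (33.9) can also be checked numerically. The following is a minimal sketch, not the fast algorithm of 33.17: it parametrizes the line $\mathrm{Re}\,s = c$ by $s = c + i\omega$, truncates the range of $\omega$, and applies the trapezoidal rule. The function names, the choice $c = 1$, the cutoff and the test image $1/(s+1)^2$ (the case $n = 2$, $a = 1$ of (33.12)) are all illustrative assumptions.

```python
import cmath
import math

def laplace_image(s):
    """F(s) = 1/(s+1)^2, whose inverse transform is t*exp(-t) by (33.12)."""
    return 1.0 / (s + 1.0) ** 2

def bromwich(F, t, c=1.0, omega_max=2000.0, n=200_000):
    """Approximate (1/(2*pi*i)) * int_{c-i*inf}^{c+i*inf} e^{st} F(s) ds.

    With s = c + i*omega the integral becomes
    (1/(2*pi)) * int e^{(c+i*omega)t} F(c+i*omega) d(omega); the range is
    truncated to |omega| <= omega_max and the trapezoidal rule is used.
    """
    h = 2.0 * omega_max / n
    total = 0.0 + 0.0j
    for k in range(n + 1):
        omega = -omega_max + k * h
        s = complex(c, omega)
        weight = 0.5 if k in (0, n) else 1.0   # end points get half weight
        total += weight * cmath.exp(s * t) * F(s)
    return (total * h / (2.0 * math.pi)).real

approx = bromwich(laplace_image, 1.0)   # numerical value of f(1)
exact = 1.0 * math.exp(-1.0)            # t*exp(-t) at t = 1
```

The truncation works here because the image decays like $1/\omega^2$ along the line; images that decay more slowly would need a different treatment.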
33.5 Theorem. $\mathcal{L}_s[f]$ is holomorphic (→5.4) where $\mathcal{L}_s[f]$ exists. □^413
Therefore, if $\mathcal{L}_s[f]$ exists for $c > c^*$, then $\mathcal{L}_s[f]$ has no singularity on the half plane $\mathrm{Re}\,z \ge c$.
This implies that
(1) $\mathcal{L}_s[f]$ is differentiable with respect to $s$,
(2) $\mathcal{L}_s[f]$ is determined by its behavior on the portion of the real axis $x > c^*$ through analytic continuation (→7.8).
33.6 Theorem. If $s$ goes to $s_0$ along a curve lying inside the convergence domain, then
$$\lim_{s\to s_0}\mathcal{L}_s[f] = \mathcal{L}_{s_0}[f]. \tag{33.13}$$
Especially,
$$\lim_{s\to\infty}\mathcal{L}_s[f] = 0. \tag{33.14}$$
[Demo] (33.14) follows from (33.13), which follows trivially from an elementary property of the Lebesgue integral.
413 To prove this we need the following elementary theorem about Lebesgue integration.
Theorem. Suppose
(1) $f(x,s)$ is integrable (→19.8) for each $s$ as a function of $x$,
(2) $f(x,s)$ is holomorphic for almost all $x$ as a function of $s$,
(3) there is an integrable function $\Phi$ such that $|f(x,s)| \le \Phi(x)$.
Then, $\int dx\,f(x,s)$ is holomorphic as a function of $s$. □
33.7 Some properties of Laplace transform.
(1) $a\,\mathcal{L}_s[f(at)] = \mathcal{L}_{s/a}[f(t)]$, where $a$ is a positive constant. This can be shown by a straightforward calculation.
(2) $\mathcal{L}_s[e^{-bt}f(t)] = \mathcal{L}_{s+b}[f(t)]$. This is straightforward, too.
(3) $\mathcal{L}_s[t^n f(t)] = (-1)^n(d/ds)^n\mathcal{L}_s[f(t)]$. In particular, $\mathcal{L}_s[tf(t)] = -(d/ds)\mathcal{L}_s[f(t)]$.
(4) $\mathcal{L}_s[f^{(n)}(t)] = s^n\mathcal{L}_s[f(t)] - s^{n-1}f(0) - s^{n-2}f'(0) - \cdots - s^{n-k}f^{(k-1)}(0) - \cdots - sf^{(n-2)}(0) - f^{(n-1)}(0)$. In particular,
$$\mathcal{L}_s[f'(t)] = s\mathcal{L}_s[f(t)] - f(0). \tag{33.15}$$
This is due to integration by parts.
(5) $\mathcal{L}_s\left[\int_0^t f(t')\,dt'\right] = s^{-1}\mathcal{L}_s[f(t)]$.
(6) $\mathcal{L}_s[t^{-1}f(t)] = \int_s^\infty ds\,\mathcal{L}_s[f(t)]$.
(3)-(6) imply that calculus becomes algebra through the Laplace transformation. This is the most important and useful property facilitating the solution of linear ODE.
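Properties (3) and (4) can be spot-checked numerically. The sketch below is only an illustration under stated assumptions: the Laplace integral is truncated at a finite $t$ and evaluated by Simpson's rule, and the test function $f(t) = e^{-t}\cos 2t$, the value $s = 1.5$ and all names are arbitrary choices.

```python
import math

def laplace(f, s, t_max=60.0, n=60_000):
    """Simpson-rule approximation of int_0^t_max e^{-st} f(t) dt (n even)."""
    h = t_max / n
    total = f(0.0) + math.exp(-s * t_max) * f(t_max)
    for k in range(1, n):
        t = k * h
        total += (4.0 if k % 2 else 2.0) * math.exp(-s * t) * f(t)
    return total * h / 3.0

def f(t):
    return math.exp(-t) * math.cos(2.0 * t)

def f_prime(t):
    # derivative of exp(-t) cos(2t)
    return -math.exp(-t) * (math.cos(2.0 * t) + 2.0 * math.sin(2.0 * t))

s = 1.5

# (33.15): L_s[f'] = s L_s[f] - f(0)
lhs_derivative = laplace(f_prime, s)
rhs_derivative = s * laplace(f, s) - f(0.0)

# Property (3) with n = 1: L_s[t f(t)] = -(d/ds) L_s[f(t)],
# the s-derivative being estimated by a central difference.
ds = 1e-4
lhs_moment = laplace(lambda t: t * f(t), s)
rhs_moment = -(laplace(f, s + ds) - laplace(f, s - ds)) / (2.0 * ds)
```

Both pairs agree to within the quadrature and finite-difference error, which is what "calculus becomes algebra" means in practice.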
Discussion.
The following equation is called the Airy equation (→27A.23 Exercise (3))
$$\frac{d^2y}{dt^2} - ty = 0. \tag{33.16}$$
Since the coefficient is only a linear function of $t$, Laplace transformation is advantageous. Let $z$ be a function of $s$ that is the Laplace transform of $y$ with respect to $t$. Then,
$$\frac{dz}{ds} + s^2 z = 0, \tag{33.17}$$
which can be solved easily as
$$z \propto \exp\left(-\frac{s^3}{3}\right). \tag{33.18}$$
Hence, a solution can be written as
$$\mathrm{Ai}(t) = \frac{1}{2\pi i}\int_C \exp\left(st - \frac{s^3}{3}\right)ds. \tag{33.19}$$
Here $C$ can be a path as shown in the figure. The integral is called the Airy integral. Show that
$$\mathrm{Ai}(0) = 3^{-1/6}\,\Gamma(1/3)/2\pi. \tag{33.20}$$
33.8 Convolution. If we adapt the ordinary definition of convolution 14.22 to functions that are zero for $t < 0$, we get
$$(f_1 * f_2)(t) = \int_0^t f_1(t-u)f_2(u)\,du. \tag{33.21}$$
A straightforward calculation gives
$$\mathcal{L}_s[f_1 * f_2] = \mathcal{L}_s[f_1]\,\mathcal{L}_s[f_2]. \tag{33.22}$$
Exercise.
$$\int_0^x \sin(x-y)u(y)\,dy + u(x) = \cos x. \tag{33.23}$$
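The statement that the transform of a convolution (33.21) is the product of the transforms can be illustrated numerically. This is a sketch under assumptions: $f_1(t) = e^{-t}$ and $f_2(t) = \sin t$ are arbitrary test functions, $s = 2$ is an arbitrary point, and every integral is truncated and evaluated by Simpson's rule.

```python
import math

def simpson(g, a, b, n):
    """Composite Simpson rule for int_a^b g(x) dx with an even number n of panels."""
    if b == a:
        return 0.0
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(a + k * h)
    return total * h / 3.0

def f1(t):
    return math.exp(-t)

def f2(t):
    return math.sin(t)

def convolution(t, n=200):
    """(f1 * f2)(t) = int_0^t f1(t - u) f2(u) du, as in (33.21)."""
    return simpson(lambda u: f1(t - u) * f2(u), 0.0, t, n)

def laplace(f, s, t_max=30.0, n=3000):
    """Truncated Laplace integral int_0^t_max e^{-st} f(t) dt."""
    return simpson(lambda t: math.exp(-s * t) * f(t), 0.0, t_max, n)

s = 2.0
lhs = laplace(convolution, s)              # transform of the convolution
rhs = laplace(f1, s) * laplace(f2, s)      # product of the transforms
exact = 1.0 / ((s + 1.0) * (s * s + 1.0))  # 1/(s+1) times 1/(s^2+1)
```

The nested quadrature is slow but transparent; it is meant as a check of the identity, not as a practical way to compute transforms.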
33.9 Time-delay.
$$\mathcal{L}_s[f(at-b)\Theta(at-b)] = \frac{1}{a}e^{-bs/a}\mathcal{L}_{s/a}[f(t)]. \tag{33.24}$$
This is also demonstrated by a simple calculation. $e^{-\tau s}$ is often called a delay factor.
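33.10 Periodic functions. If $f$ is a function with period $T$, then
$$\mathcal{L}_s[f] = \frac{1}{1-e^{-sT}}\int_0^T e^{-st}f(t)\,dt. \tag{33.25}$$
[Demo] Thanks to the periodicity, we get
$$\int_0^\infty e^{-st}f(t)\,dt = \int_0^\infty e^{-st}f(t+T)\,dt = \int_T^\infty e^{-s\tau}f(\tau)\,d\tau\,e^{sT}, \tag{33.26}$$
where $t = \tau - T$. This implies that
$$\mathcal{L}_s[f] = e^{sT}\left(\mathcal{L}_s[f] - \int_0^T e^{-st}f(t)\,dt\right). \tag{33.27}$$
Solving this equation for $\mathcal{L}_s[f]$, we get the desired formula.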
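The reduction of the transform of a periodic function to an integral over one period can be tried out on $f(t) = |\sin t|$ with period $T = \pi$. The sketch below is illustrative only (the value of $s$, the panel counts and the function names are assumptions); it compares the one-period formula with a brute-force sum over many periods and with the closed form $\coth(\pi s/2)/(s^2+1)$.

```python
import math

def simpson(g, a, b, n):
    """Composite Simpson rule for int_a^b g(x) dx with an even number n of panels."""
    h = (b - a) / n
    total = g(a) + g(b)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(a + k * h)
    return total * h / 3.0

def laplace_periodic(f, period, s, n=1000):
    """One-period integral divided by 1 - exp(-s*period)."""
    one_period = simpson(lambda t: math.exp(-s * t) * f(t), 0.0, period, n)
    return one_period / (1.0 - math.exp(-s * period))

def laplace_direct(f, period, s, periods=60, n=1000):
    """Brute force: add the contributions of many successive periods."""
    return sum(
        simpson(lambda t: math.exp(-s * t) * f(t), k * period, (k + 1) * period, n)
        for k in range(periods)
    )

def abs_sin(t):
    return abs(math.sin(t))

s = 0.7
via_period = laplace_periodic(abs_sin, math.pi, s)
direct = laplace_direct(abs_sin, math.pi, s)
closed_form = 1.0 / (s * s + 1.0) / math.tanh(math.pi * s / 2.0)  # coth = 1/tanh
```

Integrating period by period keeps each Simpson panel away from the kinks of $|\sin t|$, which sit exactly at the period boundaries.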
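33.11 Examples.
(1) $\mathcal{L}_s[1] = 1/s$ is obvious by definition.
(2) This with (2) of 33.7 implies $\mathcal{L}_s[e^{-bt}] = 1/(s+b)$.
(3) Linearity of the Laplace transformation and (2) give, for example,
$$\mathcal{L}_s[\cos\omega t] = \frac{1}{2}\left(\mathcal{L}_s[e^{i\omega t}] + \mathcal{L}_s[e^{-i\omega t}]\right) = \frac{s}{s^2+\omega^2}. \tag{33.28}$$
Analogously, we get $\mathcal{L}_s[\cosh at] = s/(s^2-a^2)$, $\mathcal{L}_s[\sin\omega t] = \omega/(s^2+\omega^2)$, etc.
(4) (3) with (2) of 33.7 gives for example
$$\mathcal{L}_s[e^{-bt}\cos\omega t] = \frac{s+b}{(s+b)^2+\omega^2}. \tag{33.29}$$
(5) (1) and (3) of 33.7 imply
$$\mathcal{L}_s[t^n] = \frac{n!}{s^{n+1}}. \tag{33.30}$$
More generally, for $\nu > -1$
$$\mathcal{L}_s[t^\nu] = \frac{\Gamma(\nu+1)}{s^{\nu+1}}. \tag{33.31}$$
This can be shown immediately by the definition of the Gamma function (→9).
(6) Combining (33.30) and (2) of 33.7 gives
$$\mathcal{L}_s[e^{-bt}t^n] = \frac{n!}{(s+b)^{n+1}}. \tag{33.32}$$
(7) An application of 33.10 is
$$\mathcal{L}_s[|\sin t|] = \frac{1}{s^2+1}\coth\frac{\pi s}{2}. \tag{33.33}$$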
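(8) Applying the convolution theorem 33.8 we can demonstrate
$$\int_0^t J_0(\tau)J_0(t-\tau)\,d\tau = \sin t. \tag{33.34}$$
This follows from (→27A.15)
$$\mathcal{L}_s[J_0(t)] = \frac{1}{\sqrt{s^2+1}}. \tag{33.35}$$
Exercise.
(A) Show (33.36).
(B) Find
(1) $\mathcal{L}_s[\cos^2\omega t]$.
(2) For $\tau > 0$ and $a > 0$, $\mathcal{L}_s[(t-\tau)e^{-a(t-\tau)^2}\Theta(t-\tau)]$.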
33.12 Laplace transform of delta function. We can define Laplace transforms of generalized functions. We will not discuss this, since the relation between Fourier and Laplace transformations 33.1 explains virtually everything we need practically. A subtlety may remain in the definition of the Laplace transformation of $\delta(x)$, since the definition 33.2 requires an integration from 0. That is, we must consider the product of $\delta(x)$ and $\Theta(x)$, which is meaningless (→14.6) as generalized functions. Without any ambiguity for $a > 0$
$$\mathcal{L}_s[\delta(t-a)] = e^{-as}. \tag{33.37}$$
This means the Laplace transform of the weak limit $\lim_{\epsilon\to 0+}\delta(t-\epsilon)$ is 1. Hence, as a generalized function it is sensible to define (→14.18)
$$\mathcal{L}_s[\delta(t)] = 1. \tag{33.38}$$
From this (33.37) is obtained with the aid of the time delay formula 33.9.
33.13 Short time limit.
$$\lim_{t\to 0+}f(t) = \lim_{s\to\infty}s\mathcal{L}_s[f(t)]. \tag{33.39}$$
[Demo] 33.7(4) with $n = 1$ reads $\mathcal{L}_s[f'(t)] = s\mathcal{L}_s[f(t)] - f(0)$. Apply 33.6 to $f'$, and we get $\lim_{s\to\infty}\mathcal{L}_s[f'(t)] = 0$.
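As a quick consistency check of (33.39), the transforms of 33.11 can be inserted directly. The two functions below are illustrative choices only, one with $f(0+) = 1$ and one with $f(0+) = 0$:

```latex
% f(t) = e^{-bt}\cos\omega t, for which f(0+) = 1, using (33.29):
\lim_{s\to\infty} s\,\mathcal{L}_s[e^{-bt}\cos\omega t]
  = \lim_{s\to\infty} \frac{s(s+b)}{(s+b)^2+\omega^2} = 1 = f(0+).
% f(t) = t^n with n \ge 1, for which f(0+) = 0, using (33.30):
\lim_{s\to\infty} s\,\mathcal{L}_s[t^n]
  = \lim_{s\to\infty} \frac{n!}{s^{n}} = 0 = f(0+).
```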
33.14 Practical calculation of Laplace inverse transformation: Use of tables. Although the fundamental theorem 33.4(2) gives a method to compute the inverse transforms, practically, an easier method is to use a table of Laplace transforms of representative functions. The uniqueness of the transforms (→33.4(2)) guarantees that once we can find an inverse transform, that is the inverse transform of a given function of $s$. Also a numerical fast Laplace inverse transform is available.
Exercise.
(1) Solve the following differential equation with the aid of Laplace transformation
Here $a$ and $b$ are positive constants, and the initial condition is $y(0) = y'(0) = 0$.
(2) Using Laplace transformation, solve the following integrodifferential equation
$$y(t) = y'(t) + t + 2\int_0^t (t-u)y(u)\,du$$
with the initial condition $y(0) = 0$.
33.15 Heaviside's expansion formula.^414 Let $F(s)$ be a rational function^415 $F(s) = P(s)/Q(s)$, where $P$ and $Q$ are mutually prime polynomials, and the order of $Q$ is higher than that of $P$. If $Q(s) = A(s-a_1)\cdots(s-a_n)$ and $a_1,\ldots,a_n$ are all distinct, then
$$\frac{P(s)}{Q(s)} = \sum_{k=1}^n\frac{c_k}{s-a_k} \tag{33.40}$$
with $c_k = P(a_k)/Q'(a_k)$. □
This is obvious, and implies that
$$\mathcal{L}_s^{-1}[P(s)/Q(s)] = \sum_{k=1}^n P(a_k)e^{a_k t}/Q'(a_k). \tag{33.41}$$
414 Heaviside (1850-1925) introduced an algebraic method to solve ODEs, which can be understood as the Laplace transform method explained below. The method,
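33.16 Examples.
$$\mathcal{L}_s^{-1}\left[\frac{s^2+s+1}{(s^2+1)^3}\right] = \frac{1}{8}(4+t)\sin t - \frac{1}{8}(4t+t^2)\cos t. \tag{33.42}$$
$$\mathcal{L}_s^{-1}\left[\frac{2s+3}{2s^3+3s^2-2s}\right] = -\frac{3}{2} - \frac{1}{10}e^{-2t} + \frac{8}{5}e^{t/2}. \tag{33.43}$$
$$\mathcal{L}_s^{-1}\left[\frac{s^2+1}{s(s^4+s^2+1)}\right] = 1 - \frac{\sqrt{3}}{3}\left[e^{t/2}\cos\left(\frac{\sqrt{3}}{2}t+\frac{\pi}{6}\right) + e^{-t/2}\cos\left(\frac{\sqrt{3}}{2}t-\frac{\pi}{6}\right)\right]. \tag{33.44}$$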
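Heaviside's expansion 33.15 is easy to mechanize when the roots of $Q$ are known. The following sketch assumes real, distinct roots supplied by the caller (complex roots would need `cmath`), stores polynomials as coefficient lists in descending powers, and uses hypothetical helper names; it reproduces the coefficients of the example (33.43).

```python
import math

def polyval(coeffs, x):
    """Evaluate a polynomial given by coefficients in descending powers (Horner)."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result

def polyder(coeffs):
    """Coefficients of the derivative, again in descending powers."""
    n = len(coeffs) - 1
    return [c * (n - i) for i, c in enumerate(coeffs[:-1])]

def heaviside_coefficients(P, Q, roots):
    """c_k = P(a_k)/Q'(a_k) for the distinct simple roots a_k of Q, cf. (33.40)."""
    dQ = polyder(Q)
    return [polyval(P, a) / polyval(dQ, a) for a in roots]

def inverse_transform(P, Q, roots, t):
    """Sum over k of c_k exp(a_k t), the right-hand side of (33.41)."""
    coeffs = heaviside_coefficients(P, Q, roots)
    return sum(c * math.exp(a * t) for c, a in zip(coeffs, roots))

# (2s + 3)/(2s^3 + 3s^2 - 2s), where Q(s) = 2 s (s - 1/2)(s + 2).
P = [2.0, 3.0]
Q = [2.0, 3.0, -2.0, 0.0]
roots = [0.0, 0.5, -2.0]
c = heaviside_coefficients(P, Q, roots)   # approximately [-1.5, 1.6, -0.1]
```

The three coefficients are the prefactors $-3/2$, $8/5$ and $-1/10$ of the constant, $e^{t/2}$ and $e^{-2t}$ terms in (33.43).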
Exercise.
(1) Find the inverse transform of
$$g(s) = \frac{s^2-\omega s+\omega^2}{s(s^2+\omega^2)}, \tag{33.45}$$
(Answer: $\Theta(t) - \sin\omega t$).
$$g(s) = \frac{1+e^{\pi s}}{s(s^2+1)}. \tag{33.46}$$
33.17 Fast inverse Laplace transform. T. Hosono, "Numerical inversion of Laplace transform and some applications to wave optics," Radio Science 16, 1015 (1981); Fast Laplace transform in Basic (Kyoritsu Publ., 1984).
which requires generalized functions like the Heaviside step function, and even the delta function, was never accepted by mathematicians of his day. According to an anecdote, he said that we could eat even though we did not know the mechanism of digestion. This story is often told as a story of a triumph of a self-educated genius. However, the method was actually invented by Cauchy long ago. Therefore the story must be quoted as a failure of premature ossification of mathematics due to mediocre mathematicians.
415 A rational function is a ratio of two polynomials.
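Table. Laplace transforms $\int_0^\infty e^{-pt}f(t)\,dt$ of representative functions $f(t)$.
$f(t)$ | transform
$\log t$ | $-\frac{1}{p}(\log p + \gamma)$
$t^{\nu-1}\log t$ [$\mathrm{Re}\,\nu > 0$] | $\frac{\Gamma(\nu)}{p^\nu}[\psi(\nu) - \log p]$
$\mathrm{arccosh}\,t = \log(t+\sqrt{t^2-1})$ [$t \ge 1$], 0 [$t < 1$] | $\frac{1}{p}K_0(p)$
$\mathrm{arcsinh}\,t = \log(t+\sqrt{t^2+1})$ | $\frac{\pi}{2p}[H_0(p) - N_0(p)]$
$\sin at$ | $a/(p^2+a^2)$
$\cos at$ | $p/(p^2+a^2)$
$|\sin at|$ [$a > 0$] | $\frac{a}{p^2+a^2}\coth\frac{\pi p}{2a}$
$|\cos at|$ [$a > 0$] | $\frac{1}{p^2+a^2}\left[p + a\,\mathrm{cosech}\frac{\pi p}{2a}\right]$
$t^{\nu-1}\sin at$ [$\mathrm{Re}\,\nu > -1$] | $\frac{i\Gamma(\nu)}{2}\left[\frac{1}{(p+ia)^\nu} - \frac{1}{(p-ia)^\nu}\right] = \frac{\Gamma(\nu)}{(p^2+a^2)^{\nu/2}}\sin\left(\nu\arctan\frac{a}{p}\right)$
$t^{\nu-1}\cos at$ [$\mathrm{Re}\,\nu > 0$] | $\frac{\Gamma(\nu)}{2}\left[\frac{1}{(p+ia)^\nu} + \frac{1}{(p-ia)^\nu}\right] = \frac{\Gamma(\nu)}{(p^2+a^2)^{\nu/2}}\cos\left(\nu\arctan\frac{a}{p}\right)$
$(\sin at)/t$ | $\arctan(a/p)$
$(\sin at)/\sqrt{t}$ | $\sqrt{\frac{\pi}{2}}\sqrt{\frac{\sqrt{p^2+a^2}-p}{p^2+a^2}}$
$(\cos at)/\sqrt{t}$ | $\sqrt{\frac{\pi}{2}}\sqrt{\frac{\sqrt{p^2+a^2}+p}{p^2+a^2}}$
$(1-\cos at)/t$ | $\frac{1}{2}\log\left(1+\frac{a^2}{p^2}\right)$
$e^{\beta t}\sin(at+\theta)$ | $\frac{(p-\beta)\sin\theta + a\cos\theta}{(p-\beta)^2+a^2}$
$e^{\beta t}\cos(at+\theta)$ | $\frac{(p-\beta)\cos\theta - a\sin\theta}{(p-\beta)^2+a^2}$
$e^{at}$ | $1/(p-a)$
$t^{\nu-1}e^{at}$ [$\mathrm{Re}\,\nu > 0$] | $\Gamma(\nu)/(p-a)^\nu$
$e^{at}/\sqrt{t}$ | $\sqrt{\pi}/\sqrt{p-a}$
$(1-e^{-at})/t$ | $\log[1+(a/p)]$
$t^n/(1-e^{-t/a})$ | $(-a)^{n+1}\psi^{(n)}(ap)$
$(1-e^{at})/(1-e^{-t})$ | $\psi(p-a) - \psi(p)$
$(1-e^{-t/a})^{\nu-1}$ [$\mathrm{Re}\,\nu, a > 0$] | $aB(\nu, ap) = a\frac{\Gamma(\nu)\Gamma(ap)}{\Gamma(\nu+ap)}$
$\sinh at$ | $a/(p^2-a^2)$
$\cosh at$ | $p/(p^2-a^2)$
$\sinh^2 at$ | $2a^2/(p^3-4a^2p)$
$\cosh^2 at$ | $(p^2-2a^2)/(p^3-4a^2p)$
$t^{\nu-1}\sinh at$ [$\mathrm{Re}\,\nu > -1$, $\nu \ne 0$] | $\frac{\Gamma(\nu)}{2}\left[\frac{1}{(p-a)^\nu} - \frac{1}{(p+a)^\nu}\right]$
$t^{\nu-1}\cosh at$ [$\mathrm{Re}\,\nu > 0$] | $\frac{\Gamma(\nu)}{2}\left[\frac{1}{(p-a)^\nu} + \frac{1}{(p+a)^\nu}\right]$
$(\sinh at)/t$ | $\frac{1}{2}\log\frac{p+a}{p-a}$
$(\sinh^2 at)/t$ | $-\frac{1}{4}\log\left(1-\frac{4a^2}{p^2}\right)$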
Appendix a33 Mellin Transformation
a33.1 Mellin transformation. The Mellin transform $\tilde f$ of $f(r)$ is defined as
$$\tilde f(p) = \int_0^\infty f(r)r^{p-1}\,dr. \tag{33.47}$$
This is well-defined for $p$ satisfying $\sigma_1 < \mathrm{Re}\,p < \sigma_2$, where
$$\int_0^1 r^{\sigma_1-1}|f(r)|\,dr < +\infty, \quad \int_1^\infty r^{\sigma_2-1}|f(r)|\,dr < +\infty. \tag{33.48}$$
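A concrete instance of the definition: for $f(r) = e^{-r}$ the Mellin transform is the Gamma function, $\tilde f(p) = \Gamma(p)$ for $\mathrm{Re}\,p > 0$. The sketch below checks this numerically at one real $p$; it assumes the substitution $r = e^{-t}$ (which turns the Mellin integral into an integral over the whole real $t$ axis), a truncated $t$ range and Simpson's rule. The function name and the truncation limits are illustrative assumptions.

```python
import math

def mellin_via_laplace(f, p, t_min=-5.0, t_max=40.0, n=18_000):
    """Approximate int_0^inf f(r) r^{p-1} dr using r = exp(-t).

    The substitution gives int e^{-pt} f(e^{-t}) dt over the whole real
    axis; the range is truncated to [t_min, t_max] and Simpson's rule
    (n even) is applied.
    """
    h = (t_max - t_min) / n

    def g(t):
        return math.exp(-p * t) * f(math.exp(-t))

    total = g(t_min) + g(t_max)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(t_min + k * h)
    return total * h / 3.0

p = 2.5
numeric = mellin_via_laplace(lambda r: math.exp(-r), p)
exact = math.gamma(p)   # Mellin transform of exp(-r) is Gamma(p)
```

The truncation limits suit this particular $f$ and $p$; another function in another strip would need its own limits.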
a33.2 Theorem [Fundamental theorem of Mellin transformation].
(1)
$$\tilde f(p) = \int_0^\infty f(r)r^{p-1}\,dr \tag{33.49}$$
is analytic in the strip $\sigma_1 < \mathrm{Re}\,p < \sigma_2$.
(2) Inverse transformation:
$$f(r) = \frac{1}{2\pi i}\int_\Gamma \tilde f(p)r^{-p}\,dp, \tag{33.50}$$
where $\Gamma$ is a straight line (parallel to the imaginary axis) in the above strip. □
[Demo] (1) is shown just as the counterpart for the Laplace transformation (→). (2) is also a disguised version of the inversion formula for the Laplace transformation (→33.2). Introduce $t$ as $r = e^{-t}$. Then (33.47) reads
$$\tilde f(p) = \int_{-\infty}^{\infty} e^{-pt}f(e^{-t})\,dt. \tag{33.51}$$
This is the Laplace transformation (→33.3). Therefore, we can apply the inverse transformation formula to obtain
$$f(e^{-t}) = \frac{1}{2\pi i}\int_{c-i\infty}^{c+i\infty} e^{pt}\tilde f(p)\,dp. \tag{33.52}$$
In terms of $r$, this is just what we wanted.
a33.3 Applications to PDE. If the region of the problem is fan-shaped, then the Mellin transformation is particularly useful. The 2D Laplace problem in the cylindrical coordinates is
$$r^2\left(\frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r}\right)u + \frac{\partial^2}{\partial\varphi^2}u = 0. \tag{33.53}$$
Mellin transforming this, we get
$$p^2\tilde u + \frac{d^2}{d\varphi^2}\tilde u = 0, \tag{33.54}$$
which can be solved easily. The rest is to compute the inverse transform. To calculate it as the Laplace transform (33.52) may be advantageous, since there is the so-called fast Laplace transform algorithm (→33.17).
34 Linear Operators
A linear partial differential operator is understood as a linear map from a function space into another function space. The most important case for physicists may be the linear map on a Hilbert space. We will discuss the meaning of self-adjointness of an operator in conjunction with quantum mechanics in Part A. Part B discusses spectral decomposition of an operator. Part C is a short summary of spectrum theory.
Key words: linear operator (symmetric, self-adjoint), operator extension, observable, spectral decomposition, decomposition of unity, spectral measure, semibounded operator, spectrum (essential, point, discrete, absolutely continuous), compact operator, Hilbert-Schmidt theorem.
Summary:
(1) In quantum mechanics, self-adjoint linear operators are regarded as observables. The reason why self-adjointness is required can be glimpsed in 34A.2-5. [Notice that the explanation is probably very different from the one given in physics courses, because in the ordinary quantum mechanics courses self-adjointness is never explained correctly.]
(2) Spectral decomposition is a generalization of diagonalization of matrices, and is the theoretical underpinning of separation of variables (34B.3, 34B.6).
(3) Whether we may apply the spectral decomposition to a partial differential operator can be checked very formally (34B.5).
(4) The spectrum of an operator is often directly related to physical observables such as electronic and phonon spectra. A clear definition of spectrum must be recognized (34C.2). Physicists call absolutely continuous spectrum band spectrum, and point spectrum discrete spectrum (34C.8). Cantor-set like spectrum has also become relevant to physics, which is the singular continuous spectrum.
34.A Self-Adjointness
34A.1 Linear operator.^416 As discussed in 20.9 the superposition principle requires that the quantum mechanical state is described by a vector in a vector space (→20.1) (Hilbert space →20.3) $V$. A linear operator $A$ is a linear map from a subspace $D(A)$ of $V$ into $V$. $D(A)$ is called the domain of $A$, and $AD(A) = \{Az : z \in D(A)\}$ is called the range of $A$. In quantum mechanics it is assumed that a linear operator (with appropriate properties) $A$ corresponds to a dynamical variable (observable), and that for a state $|x\rangle$, the expectation value of the observable $A$ is given by $\langle x|A|x\rangle$.^417
Example. The domain of $d/dx$ in $L_2([a,b])$ (→20.5(2)) is not the whole space, because $d/dx$ cannot be operated on non-differentiable functions.^418 However, since $C^1([a,b])$ is dense in $L_2([a,b])$, the domain of $d/dx$ is dense in $L_2([a,b])$.
34A.2 When can a linear operator be an observable?
(1) Let $A$ be a linear operator on a Hilbert space $V$ (→20.3). If $D(A)$ is dense in $V$ and $A$ is Hermitian (i.e., $\langle x|Ay\rangle = \langle Ax|y\rangle$^419), we say $A$ is symmetric. Since this is a necessary and sufficient condition for $\langle x|A|x\rangle$ to be real, physical observables must at least be symmetric.
(2) However, this is not enough, because the extension of $A$ may not be symmetric. An operator $\tilde A$ such that $D(\tilde A) \supset D(A)$ and $\tilde A = A$ on $D(A)$ is called an extension of $A$. Unfortunately, indeed some symmetric operators are extended to non-symmetric operators.^420 The whole Hilbert space should be physically meaningful, so that symmetry is not enough to characterize a respectable observable.
416 The most authoritative (and accessible) reference is T. Kato, Perturbation Theory for Linear Operators (Springer, 1966).
417 Dirac explicitly assumes these, while Landau and Lifshitz use spectral decomposition to justify the assumption. However, all the assumptions have come from the observations based on finite dimensional linear algebra.
418 More precisely, $df/dx \in L_2([a,b])$ is required.
419 Of course, this means
$$\int\overline{x(t)}\,(Ay)(t)\,dt = \int\overline{(Ax)(t)}\,y(t)\,dt.$$
420 An example from H. Ezawa, Quantum Mechanics III (Iwanami, 1972) p26 follows. Let $V = L_2(\mathbf{R})$. The operator $Z$ is defined by
(34.1)
(3) It is important that a symmetric operator $A$ which corresponds to a 'physical observable' should not be extended further. A condition is the self-adjointness. To understand this statement, we need the following entries.
34A.3 Adjoint operator. Let $A$ be an operator on a Hilbert space $V$ whose domain is dense. Let $D(A^*)$ be the totality of $x \in V$ such that
$$\langle x|Ay\rangle = \langle z|y\rangle \tag{34.3}$$
for all $y \in D(A)$ for some $z \in V$. For $x \in D(A^*)$, $z$ is unique: if there were two $z_1$ and $z_2$, then $\langle z_1 - z_2|y\rangle = 0$ for $\forall y \in D(A)$. Since $D(A)$ is dense, this implies $z_1 = z_2$. Thus there is a unique map $x \to z$. We will write this as $z = A^*x$, defining a linear map $A^*$. This is called the adjoint of $A$.
For example, $-i\,d/dx$ defined on $C_0^1$^421 is self-adjoint:
$$\int d\tau\,\overline{f(x)}\left(-i\frac{d}{dx}\right)g(x) = \int d\tau\,\overline{\left\{-i\left(\frac{d}{dx}f(x)\right)\right\}}\,g(x), \tag{34.4}$$
so that indeed $(-i\,d/dx)^* = -i\,d/dx$.
34A.4 Self-adjoint operator. If $A$ is a linear operator with a dense domain and $A = A^*$ (i.e., $D(A) = D(A^*)$ and symmetric), then $A$ is called a self-adjoint operator.
34A.5 Observable should be at least self-adjoint. We know that an observable must be a symmetric operator. However, $A^*$ is obviously its extension, so it is natural to interpret that $A^*$ is 'the' observable. However, we know that this may not be symmetric. This strongly suggests that observables must be self-adjoint, so that we will never encounter imaginary eigenvalues. Later, we will learn that for a self-adjoint operator, we can unambiguously determine (define) the probability of observing a particular value (or a particular range of the values) for any state in the state space thanks to the spectral decomposition theorem (→34B.3). This justifies the identification.
with the dense domain spanned by $\{H_n e^{-x^2/2}\}$ (→21B.6). It is easy to check that $Z$ is symmetric. However, if this is applied to
$$\varphi(x) = x^{-3/2}e^{-1/(4x^2)} \text{ for } x > 0; \text{ otherwise } \varphi(x) = 0, \tag{34.2}$$
we know $Z\varphi(x) = -i\varphi(x)$ (except at $x = 0$; this exception may be ignored, because we are in an $L_2$-space), so that $\langle\varphi|Z|\varphi\rangle = -i$, the expectation value is purely imaginary!
421 $C^1$ functions with compact supports, i.e., they vanish outside a sufficiently large sphere centered at the origin.
34.B Spectral Decomposition
34B.1 Spectral decomposition in finite dimensional space. Consider a normal linear operator^422 $A$ on a finite dimensional vector space. Let $\{\lambda\}$ be its eigenvalues, and $|\lambda\rangle$ be the corresponding normalized eigenkets. Then, we have the following spectral decomposition formula
$$A = \sum_\lambda|\lambda\rangle\lambda\langle\lambda| = \sum_\lambda\lambda P(\lambda), \tag{34.5}$$
where $P(\lambda)$ is the orthogonal projection (→20.18) to the eigenspace belonging to $\lambda$.
$$1 = \sum_\lambda|\lambda\rangle\langle\lambda| = \sum_\lambda P(\lambda) \tag{34.6}$$
is called a decomposition of unity (→20.15). If we can have this decomposition, we can spectrally decompose the operator. How can we generalize this to the operators on a Hilbert space (→20.3)?
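In the finite dimensional case both formulas can be verified by hand or by machine. The sketch below treats a real symmetric $2\times 2$ matrix (a special case of a normal operator) with a nonzero off-diagonal element; the matrix entries and all function names are illustrative assumptions.

```python
import math

def eigen_symmetric_2x2(a, b, d):
    """Eigenvalues and orthonormal eigenvectors of [[a, b], [b, d]] with b != 0."""
    mean = 0.5 * (a + d)
    radius = math.hypot(0.5 * (a - d), b)
    pairs = []
    for lam in (mean + radius, mean - radius):
        # (A - lam) v = 0 is solved by v = (b, lam - a) when b != 0.
        vx, vy = b, lam - a
        norm = math.hypot(vx, vy)
        pairs.append((lam, (vx / norm, vy / norm)))
    return pairs

def projector(v):
    """Orthogonal projection |v><v| onto the unit vector v."""
    return [[v[i] * v[j] for j in range(2)] for i in range(2)]

def weighted_sum(pairs, weight):
    """Sum over eigenvalues of weight(lambda) * P(lambda)."""
    total = [[0.0, 0.0], [0.0, 0.0]]
    for lam, v in pairs:
        P = projector(v)
        for i in range(2):
            for j in range(2):
                total[i][j] += weight(lam) * P[i][j]
    return total

A = [[2.0, 1.0], [1.0, 3.0]]
pairs = eigen_symmetric_2x2(A[0][0], A[0][1], A[1][1])
unity = weighted_sum(pairs, lambda lam: 1.0)     # decomposition of unity (34.6)
rebuilt = weighted_sum(pairs, lambda lam: lam)   # spectral decomposition (34.5)
```

Here `unity` comes out as the identity matrix and `rebuilt` as the original matrix, up to rounding.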
34B.2 Decomposition of unity in Hilbert space. This is, for physicists, just (→20.23)
$$1 = \int_{-\infty}^{\infty}|\nu\rangle w(\nu)\,d\nu\,\langle\nu|, \tag{34.7}$$
where $|\nu\rangle$ is an eigenket or improper eigenket (because it may not be normalizable), and $w$ is a weight function (let us call $w(\nu)$ a spectral weight). To find improper eigenkets is called the generalized eigenvalue problem (35.5 solves the problem).
34B.3 Theorem. Let $A$ be a self-adjoint operator (→34A.4) on a Hilbert space $V$. Then, there is a unique decomposition of unity
$$1 = \int|\nu\rangle w(\nu)\,d\nu\,\langle\nu| \tag{34.8}$$
such that
$$A = \int_{-\infty}^{\infty}\nu|\nu\rangle w(\nu)\,d\nu\,\langle\nu|. \tag{34.9}$$
34B.4 Why do we pay attention to spectral decomposition? It is a fundamental tool to understand operators, and is a very useful tool for quantum mechanics. In our current partial differential equation context, the spectral decomposition is of superb importance with respect to, as the reader should have already guessed, the separation of variables (→18, 23). However, to understand the justification of the method in general, we need almost all the machinery of elementary functional analysis. First of all, most partial differential operators are not self-adjoint. For example, the Laplacian with a homogeneous Dirichlet condition is only symmetric. Hence, to use the operator theory, we must consider the self-adjoint extension (→34A.2) of the differential operator. Rather heavy tools are required to obtain it, but the result boils down to:
422 If a linear operator $A$ satisfies $A^*A = AA^*$, then we say $A$ is a normal operator. Its matrix representation is a normal matrix and is diagonalizable with a unitary transformation. Actually, a necessary and sufficient condition for a matrix $A$ to be diagonalizable with a unitary transformation is that $A$ is normal.
34B.5 Practical conclusion. The following is a practical conclusion about differential operators:
(1) If $P(x,D)$ is formally self-adjoint, i.e.,
$$\int_\Omega f(x)P(x,D)g(x)\,dx = \int_\Omega\left(P^T(x,D)f(x)\right)g(x)\,dx, \tag{34.10}$$
where
$$P^T(x,D)f(x) = \sum_{|\alpha|\le m}(-D)^\alpha\left(a_\alpha(x)f(x)\right) \tag{34.11}$$
for
$$P(x,D)f(x) = \sum_{|\alpha|\le m}a_\alpha(x)D^\alpha f(x) \tag{34.12}$$
(this guarantees that the operator is symmetric (→34A.2)), and
(2) if $P(x,D)$ is semibounded, i.e., for any sufficiently differentiable $f \in L_2(\Omega)$, there is a positive $a$ such that
$$\pm\int_\Omega f(x)P(x,D)f(x)\,dx \le a\|f\|^2 \tag{34.13}$$
for $+$ or $-$, then (thanks to Friedrichs-Freudenthal's theorem^423) $P$ can be extended to a self-adjoint operator and,
(A) The totality of normalized eigenfunctions $\{u_n\}$ of the operator:
$$P(x,D)u_n = \lambda_n u_n \tag{34.14}$$
makes an orthonormal basis for $L_2(\Omega)$, and
(B) we may justify the separation of variables:
423 See K. Yosida, Functional Analysis (Springer, 1980 Sixth edition), Chapter XI, Section 7, Theorem 2.
34B.6 Justification of separation of variables. Let $\Omega$ be a region and $P$ be a partial differential operator (with appropriate boundary conditions) on $L_2(\Omega)$ satisfying the conditions (1) and (2) in 34B.5. Then there is an appropriate weight $w$ (→34B.3) such that the solution to
$$L_t u = P(x,D)u, \tag{34.15}$$
where $L_t$ is a differential operator with respect to time, is given by $\varphi$ such that
(34.16)
The formula inside [ ] holds if the spectrum is discrete (if not, the formula is not simple as we will see in 36.5).
Discussion.
(A) The extension may be understood formally as follows. Let $L^*$ be the formal adjoint of $L$. Then the operator $\bar L$ introduced as follows is the extension of $L$ (that is, $L^* = \bar L \supset L$).
$$\langle u|\bar L v\rangle = \langle L^*u|v\rangle. \tag{34.17}$$
(B) We have encountered the following equation in 23.9 (2)
$$\left[\frac{d^2}{dr^2} + \frac{1}{r}\frac{d}{dr} + \frac{m^2}{r^2}\right]R = -\lambda^2 R \tag{34.18}$$
with the boundary conditions $R(a) = R(b) = 0$ ($a < b$). The eigenfunctions are written in terms of the following 'esoteric' functions $J_{im}(x)$ and $K_{im}(x)$. We wish to demonstrate that the eigenfunctions of this problem make a complete system. We wish to use the 'high-tech' functional analytic weapon. That is:
(1) Demonstrate that the operator is formally self-adjoint.
(2) Demonstrate that the operator is semibounded (→25B.14).
(C) With the aid of the same argument as above demonstrate that the totality of spherical harmonics makes a complete set of functions. That is, demonstrate that
$$L^2 = \frac{1}{\sin\theta}\frac{\partial}{\partial\theta}\sin\theta\frac{\partial}{\partial\theta} + \frac{1}{\sin^2\theta}\frac{\partial^2}{\partial\varphi^2} \tag{34.19}$$
is formally self-adjoint and semibounded.^424
424 Do not forget an appropriate weight when you perform integration. This also applies to (B).
34.C Spectrum
34C.1 Introduction to spectrum. Physicists usually write for a linear operator
$$\mathcal{L}|\lambda\rangle = \lambda|\lambda\rangle \tag{34.20}$$
and say that $\lambda$ is an eigenvalue. However, if $\mathcal{L}$ is a linear operator acting on a subset of a Hilbert space, then the equation makes sense, strictly speaking, only when $|\lambda\rangle$ is in the Hilbert space (that is, $|\lambda\rangle$ is normalizable →20.3). We know this is not always the case. If we rewrite (34.20) as
$$(\mathcal{L}-\lambda)|\lambda\rangle = 0, \tag{34.21}$$
we realize that what we wish to mean by (34.20) is that $(\mathcal{L}-\lambda)^{-1}$ is not a bounded operator: a linear operator $A$ is a bounded operator, if its operator norm (→12.2) is bounded: $\|A\| \equiv \sup_{a\in D(A)}\|Aa\|/\|a\| < +\infty$.
34C.2 Resolvent, resolvent set. Let $\mathcal{L}$ be a linear operator on a Hilbert space $V$ with a dense domain (→34A.1). The operator
$$R(\lambda) = (\mathcal{L}-\lambda 1)^{-1} \tag{34.22}$$
is called the resolvent of $\mathcal{L}$. If the domain of $R(\lambda)$ is dense, and $R(\lambda)$ is bounded on its domain, then $\lambda$ is called a regular point. The totality of the regular points of $\mathcal{L}$ is called the resolvent set of $\mathcal{L}$ and is denoted by $\rho(\mathcal{L})$.
Notice that if $\lambda, \mu \in \rho(\mathcal{L})$, then
$$R(\lambda) - R(\mu) = (\lambda-\mu)R(\lambda)R(\mu). \tag{34.23}$$
This is called the resolvent equation.
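In finite dimensions the resolvent is just a matrix inverse, so (34.23) can be tried on a small example. The sketch below uses a $2\times 2$ matrix and two complex regular points; the matrix, the two points and the helper names are illustrative assumptions, and the check is a numerical illustration rather than a demonstration.

```python
def resolvent(L, lam):
    """R(lam) = (L - lam*1)^{-1} for a 2x2 matrix L given as nested lists."""
    a, b = L[0][0] - lam, L[0][1]
    c, d = L[1][0], L[1][1] - lam
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matmul(X, Y):
    """Product of two 2x2 matrices."""
    return [[sum(X[i][k] * Y[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

L = [[2.0, 1.0], [1.0, 3.0]]
lam, mu = 0.5 + 1.0j, -1.0 + 0.25j      # two regular points (not eigenvalues)
R_lam, R_mu = resolvent(L, lam), resolvent(L, mu)
product = matmul(R_lam, R_mu)

# Largest entry of R(lam) - R(mu) - (lam - mu) R(lam) R(mu); zero up to rounding.
defect = max(abs(R_lam[i][j] - R_mu[i][j] - (lam - mu) * product[i][j])
             for i in range(2) for j in range(2))
```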
Exercise.
(1) Demonstrate the resolvent equation.
(2) Construct the resolvent kernel (i.e., $R(x,y;\lambda) \equiv \langle x|(L-\lambda)^{-1}|y\rangle$) for $L = -d^2/dx^2$ with the boundary condition $\psi(0) = \psi(1) = 0$. Cf. 20.28, 20.29.
34C.3 Spectrum. Let $\mathcal{L}$ be a linear operator whose domain is dense in a Hilbert space $V$. Then $\sigma(\mathcal{L}) = \mathbb{C}\setminus\rho(\mathcal{L})$ is called the spectrum of $\mathcal{L}$. In other words, $\lambda$ is a point in the spectrum of $\mathcal{L}$, if $(\mathcal{L}-\lambda)^{-1}$ is not defined, or even if it is defined, its domain is not dense in $V$, or even if dense, it is not a bounded operator.
34C.4 Classification of spectrum. Let $T$ be a linear operator whose domain is dense in a Hilbert space $V$.
(1) If $T - \lambda$ is not one to one, that is, there is a nonzero ket $|u\rangle$^425 such that $T|u\rangle = \lambda|u\rangle$, we say $\lambda$ is an eigenvalue. The totality of such $\lambda$ is called the point spectrum of $T$.
(2) If $T - \lambda$ is one to one, but if $R(\lambda)$ is not a bounded linear operator, and
(21) if the domain of $R(\lambda)$ is dense, then we say $\lambda$ belongs to the continuous spectrum.
[(22) if the domain of $R(\lambda)$ is not dense, then we say $\lambda$ belongs to the residual spectrum.]
425 Of course, the ket must be in the Hilbert space. That is, it must be normalizable.
34C.5 Discrete and essential spectrum. The totality of eigenvalues is called the point spectrum $\sigma_p$. The union of the continuous spectrum and the set of eigenvalues of infinite multiplicity is called the essential spectrum and is denoted by $\sigma_{ess}(L)$. $\sigma(L)\setminus\sigma_{ess}(L)$ is called the discrete spectrum and is denoted by $\sigma_{disc}(L)$.
34C.6 Classification of continuous spectrum. Let $L$ be a linear operator whose domain is dense in a Hilbert space $V$ with a continuous spectrum $\sigma_c(L)$. It is classified as follows:
Let $w(\lambda)$ be the spectral weight (→34B.2). If for any set $A \subset \sigma_c(L)$ with measure zero (→19.3) $\int_A|\lambda\rangle w(\lambda)\,d\lambda\,\langle\lambda|V = \{0\}$, we say the spectrum is absolutely continuous, and the continuous spectrum is called an absolutely continuous spectrum. The definition applies to a subset of $\sigma_c(L)$, so we may say the operator $L$ has an absolutely continuous spectrum in $[a,b]$, if $\int_a^b|\lambda\rangle w(\lambda)\,d\lambda\,\langle\lambda|V$ is a nontrivial subspace of the Hilbert space $V$, but for any measure zero subset $Q$ of $[a,b]$ $\int_Q|\lambda\rangle w(\lambda)\,d\lambda\,\langle\lambda|V = \{0\}$. Otherwise, we say $L$ has a singular continuous spectrum (like the one concentrated on a Cantor set).
34C.7 Pure point spectrum. Let $L$ be a linear operator whose domain is dense in a Hilbert space $V$. If the linear hull of the eigenspaces for all $\lambda \in \sigma_p(L)$ is dense in $V$, then we say $L$ has a pure point spectrum.
34C.8 Is the above classification relevant to physics?
(1) The Hamiltonian of the 1D harmonic oscillator has a pure point spectrum. $\sigma = \sigma_p = \sigma_{disc}$.
(2) The Hamiltonian of a particle in a 1D periodic potential has an absolutely continuous spectrum, which physicists call a band spectrum.
(3) Consider a random 1D harmonic lattice. For example, the value of the spring constant is $k$ or $k'$ ($\ne k$) chosen randomly for each spring, or a harmonic lattice with a uniform spring constant but two kinds of mass points $m$ and $M$ ($\ne m$) randomly placed on the lattice points. In this case all the harmonic modes are localized (i.e., in $\ell_2$ →20.5(1)) and its spectrum is pure point (→34C.7). The reason for the localization is not very hard to understand intuitively; if there is a cluster of lighter atoms, then they tend to behave differently from the rest. If the reader solves a finite size lattice system, then the mode localization lengths may be larger than the system size, so she would see clear localization for higher frequency modes only as illustrated below:
[Figure: mode shapes of a finite random harmonic lattice, with localization visible for the higher frequency modes.]
(4) The problem in (3) is mathematically the same as the random Frenkel model; that is, the discrete Schrödinger equation with random hopping or with random site potential energy can be cast into the harmonic lattice problem. In this case localization is called the Anderson localization.
(5) If the spring constant or hopping probability above is chosen to be almost periodic (that is, it behaves like $\sin kx$ with $k$ being irrational),^426 then the spectrum becomes self-similar.
In this case the eigenfunctions are not localized in the standard sense (i.e., not in $\ell_2$), but very different from the ordinary delocalized wave functions. If the largest peak is normalized, then in many cases the slow algebraic decay is observed. Experimentally, now we can fabricate almost periodic layered structures on which we can perform optical experiments. Numerically, the behavior above can be observed most easily with the most irrational $k = 1/(1+1/(1+1/(1+1/(\cdots$.
426 Physicists say a function $f(x)$ is almost periodic if $f(x)$ is a sum of periodic functions with incommensurate (not rationally related) periods.
(6) If the system exhibits only a point spectrum, then there cannot be any transport of phonons or electrons, because all the eigenfunctions are spatially localized.
34C.9 Compact operator. If a linear operator $A$ has a sequence of finite-dimensional operators^427 converging^428 to it, we say $A$ is a compact operator. If $A$ is self-adjoint, then, roughly speaking, we can write $A \sim \sum_{k=1}^\infty|k\rangle\lambda_k\langle k|$.
34C.10 Integral operator, Fredholm integral equation. Formally we can introduce a linear operator by the following integral^429
$$(\mathcal{K}u)(x) = \int_a^b dy\,w(y)K(x,y)u(y), \tag{34.24}$$
where we assume $u \in L_2([a,b],w)$ (→20.19), and $K$ is an integrable function. $\mathcal{K}$ is often called a Fredholm operator, and $K$ is called its kernel.
$$u = \mathcal{K}u + f \tag{34.25}$$
for some function $f \in L_2([a,b],w)$ is called a Fredholm integral equation.
34C.11 Theorem [Hilbert-Schmidt]. $\mathcal{K}$ in 34C.10 is a compact operator, if
$$\int_a^b dx\,w(x)\int_a^b dy\,w(y)|K(x,y)|^2 < \infty. \tag{34.26}$$
Exercise.
The inverse operator of the regular Sturm-Liouville operator is compact. Demonstrate this statement. Cf. 15.6.
34C.12 Spectral theorem for compact self-adjoint operator [Hilbert-Schmidt]. Let $A$ be a compact self-adjoint operator (→34C.4) on a Hilbert space $V$. Then,
(1) $V$ has an orthonormal basis $\{|e_n\rangle\}$ consisting of eigenvectors of $A$.
(2) Let $A|e_n\rangle = \lambda_n|e_n\rangle$. Then $\lambda_n \to 0$ as $n \to \infty$.
(3) If $|x\rangle = \sum c_n|e_n\rangle$, then $A|x\rangle = \sum c_n\lambda_n|e_n\rangle$. □
427 A linear operator $B$ is said to be finite dimensional, if its non-zero spectrum is point (→34C.4) and the total dimension of its eigenspaces is finite.
428 With respect to the operator norm.
429 Mathematicians introduce a measure $d\mu$ instead of $w$. Cf. a19.
Thus, almost everything true for a finite dimensional Hermitian matrix is true. The only caution we need is that we cannot freely change the order of the vectors in the basis (→20.17). □
Compactness implies $A \sim \sum_{k=1}^\infty|k\rangle\lambda_k\langle k|$, so intuitively, the theorem is plausible.
34C.13 Variational Principle for compact self-adjoint operator. Let $A$ be a compact (→34C.9: do not forget that the theorem is NOT for any self-adjoint operator) self-adjoint linear operator on a Hilbert space $V$. The unit vector $|f\rangle$ which maximizes $|\langle f|A|f\rangle|$ is an eigenvector of $A$ belonging to the eigenvalue with the largest modulus, which is identical to $|\langle f|A|f\rangle|$. □
34C.14 Finding eigenvalues with the aid of variational principle. With the aid of 34C.13 we can determine the largest modulus eigenvalue $\lambda_1$ of a compact self-adjoint linear operator $A$, and a vector maximizing $F(x) = |\langle x|A|x\rangle|$ over unit vectors, to be denoted by $|\lambda_1\rangle$. Let $V_1$ be the perpendicular subspace to $|\lambda_1\rangle$. Since
$$\langle\lambda_1|A|y\rangle = \lambda_1\langle\lambda_1|y\rangle = 0 \tag{34.27}$$
if $|y\rangle \in V_1$, so is $A|y\rangle \in V_1$. Hence we can apply the same argument to $A$ restricted to $V_1$. In this way we can construct the nonincreasing sequence (in modulus) of eigenvalues $\lambda_1, \lambda_2, \cdots$.
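The procedure can be imitated in the smallest nontrivial setting, a real symmetric $2\times 2$ matrix, where the unit vectors form a circle that can simply be scanned. This is a sketch under stated assumptions (the matrix, the scan resolution and the function names are illustrative); in two dimensions the perpendicular subspace $V_1$ is one-dimensional, so the second step needs no further maximization.

```python
import math

def quadratic_form(A, f):
    """<f|A|f> for a real symmetric 2x2 matrix A and a real vector f."""
    return sum(f[i] * A[i][j] * f[j] for i in range(2) for j in range(2))

def largest_modulus_direction(A, samples=100_000):
    """Scan unit vectors (cos th, sin th) and keep the one maximizing |<f|A|f>|."""
    best_value, best_f = -1.0, None
    for k in range(samples):
        theta = math.pi * k / samples        # f and -f give the same value
        f = (math.cos(theta), math.sin(theta))
        value = abs(quadratic_form(A, f))
        if value > best_value:
            best_value, best_f = value, f
    return best_f

A = [[1.0, 2.0], [2.0, -3.0]]
f1 = largest_modulus_direction(A)
lambda1 = quadratic_form(A, f1)              # eigenvalue of largest modulus
f2 = (-f1[1], f1[0])                         # spans the subspace perpendicular to f1
lambda2 = quadratic_form(A, f2)              # remaining eigenvalue
```

For this matrix the exact eigenvalues are $-1 \mp 2\sqrt{2}$, and the scan recovers them in the order of decreasing modulus.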
35 Spectrum of Sturm-Liouville Problem
Eigenvalues for a regular Sturm-Liouville problem can be studied more conveniently through its Green's function, which is a Hilbert-Schmidt kernel.
Key words: Sturm-Liouville eigenvalue problem, fundamental theorem
Summary:
(1) Remember that the inverse operator of the regular Sturm-Liouville operator is compact. All the fundamental properties of its spectrum follow from this fact (→35.3).
(2) Details of the Weyl-Stone-Titchmarsh-Kodaira theorem 35.5 need not be understood, but remember that there is a general way to expand a function in terms of functions in a fundamental system of solutions of a formally self-adjoint differential operator.
35.1 Rewriting of the eigenvalue problem as integral equation. The Sturm-Liouville eigenvalue problem is to find $\lambda$ for
$$ L_{ST}u \equiv \left[\frac{d}{dx}p(x)\frac{d}{dx} + q(x)\right]u = \lambda w(x)u \tag{35.1} $$
(with p > 0) under the following boundary condition:
$$ B_a[u] \equiv A p(a)u'(a) - B u(a) = 0, \tag{35.2} $$
$$ B_b[u] \equiv C p(b)u'(b) - D u(b) = 0. \tag{35.3} $$
The problem can be rewritten with the aid of the Green's function (→15.6) as
$$ u(x) = \lambda\int dy\, w(y)G(x|y)u(y) = \lambda(\mathcal{G}u)(x). \tag{35.4} $$
$G$ is called the kernel of the integral operator $\mathcal{G}$.
35.2 Formal theory. [20.28 repeated] (35.4) can be written as
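$$ |u\rangle = \lambda\mathcal{G}|u\rangle, \tag{35.5} $$
where bras and kets are defined with the weight function $w$ (→20.22, 20.23). Let $|i\rangle$ be an eigenket belonging to the eigenvalue $\lambda_i$:
$$ |i\rangle = \lambda_i\mathcal{G}|i\rangle. \tag{35.6} $$
If $\{|i\rangle\}$ is an orthonormal basis of $L_2([a,b],w)$ (→20.19), then from (35.5) we get
$$ \mathcal{G} = \sum_i |i\rangle\lambda_i^{-1}\langle i|. \tag{35.7} $$
That is, the Green's function can be written as
$$ G(x|y) = \sum_i \lambda_i^{-1}\langle x|i\rangle\langle i|y\rangle. \tag{35.8} $$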
We must justify this result.
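The eigenfunction expansion of the Green's function in the formal theory can be checked numerically in the simplest case. The following is a sketch under stated assumptions, not the text's own example: the operator is taken as $-d^2/dx^2$ on $[0,1]$ with $u(0) = u(1) = 0$ and weight $w = 1$ (a sign convention chosen so that the eigenvalues $(n\pi)^2$ are positive), the normalized eigenfunctions are $\sqrt{2}\sin n\pi x$, and the closed form $\min(x,y)(1 - \max(x,y))$ is the standard Green's function of this problem. The function names are hypothetical.

```python
import math

def green_closed_form(x, y):
    """Green's function of -d^2/dx^2 on [0, 1] with u(0) = u(1) = 0."""
    return min(x, y) * (1.0 - max(x, y))

def green_eigen_series(x, y, terms=20000):
    """Partial sum of sum_n u_n(x) u_n(y) / lambda_n with
    u_n(x) = sqrt(2) sin(n pi x) and lambda_n = (n pi)^2."""
    total = 0.0
    for n in range(1, terms + 1):
        k = n * math.pi
        total += 2.0 * math.sin(k * x) * math.sin(k * y) / (k * k)
    return total
```

The series converges only like $1/N$ in the number of terms, which is one reason why the formal expansion needs the justification given in 35.3.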
35.3 Theorem [Fundamental theorem of Sturm-Liouville eigenvalue problem]. The eigenfunctions of a regular Sturm-Liouville problem (→15.4) form an orthogonal basis of $L_2([a,b],w)$ (→20.19), and the sequence of eigenvalues satisfies $|\lambda_n| \to \infty$ as $n \to \infty$. □
[Demo] We can explicitly construct the Green's function for this problem as in 15.6, which is a continuous function of $x$ and $y$, so that $\mathcal{G}$, whose kernel is given by 15.6, is a compact operator (→34C.9) thanks to Hilbert and Schmidt 34C.11. Its self-adjointness is also easy to demonstrate. Hence, we can apply 34C.12. Note that the eigenvalues here are the reciprocals of those in 34C.12.
Discussion.
(A) Classical approach due to Prüfer.
Our demonstration heavily relied on functional analytic methods. The facts were known before functional analytic methods were widely available. Here a classical proof of the theorem due to Prüfer is given. The argument may seem more complicated and more artful, but it can yield more delicate results than those obtained by high-tech functional analysis.
(1) Suppose there is a solution $u \not\equiv 0$ to (35.1). Then, $pu'$ and $u$ do not vanish simultaneously. Hence, we can introduce a polar coordinate system such as
$$ u(x) = \rho(x)\sin\theta(x), \tag{35.9} $$
$$ p(x)u'(x) = \rho(x)\cos\theta(x). \tag{35.10} $$
(2) Our eigenvalue problem can be rewritten as follows:
$$ \rho'(x) = \left(p(x)^{-1} + q(x) + \lambda w(x)\right)\rho\sin\theta\cos\theta, \tag{35.11} $$
$$ \theta' = p(x)^{-1}\cos^2\theta + \left(-\lambda w(x) - q(x)\right)\sin^2\theta. \tag{35.12} $$
The second equation does not contain $\rho$, so we can integrate it for $\theta(x)$ with an arbitrary initial condition $\theta(a) = \alpha$.
(3) A necessary and sufficient condition for $\lambda$ to be an eigenvalue of 35.1 is that $\theta(x)$ with the initial condition $\theta(a) = \alpha$ satisfies $\theta(b) = \beta + n\pi$, where $n$ is a positive integer. Here the angles $\alpha$ and $\beta$ are determined as
$$ \tan\alpha = A/B, \quad \tan\beta = C/D \tag{35.13} $$
with $\alpha, \beta \in [0, \pi)$.
(4) Prüfer's comparison theorem. Let $\theta(x,\lambda)$ be the solution of (2) with the initial condition $\theta(a) = \alpha$. Then, for $x \in (a, b]$
$$ \lambda_1 < \lambda_2 \Rightarrow \theta(x,\lambda_1) < \theta(x,\lambda_2). \tag{35.14} $$
This tells us that the eigenfunction corresponding to a larger eigenvalue oscillates faster. $\theta$ is monotonically increasing as a function of $\lambda$. In particular,
(5)
$$ \lim_{\lambda\to-\infty}\theta(b,\lambda) = 0, \tag{35.15} $$
$$ \lim_{\lambda\to+\infty}\theta(b,\lambda) = +\infty. \tag{35.16} $$
This implies
(6) The Sturm-Liouville eigenvalue problem has a discrete set of eigenvalues such that
$$ \lambda_1 < \lambda_2 < \cdots < \lambda_n \to +\infty. \tag{35.17} $$
(7) Furthermore, the eigenfunction corresponding to the $n$-th eigenvalue $\lambda_n$ has exactly $n - 1$ simple zeros in $(a, b)$. See 24A.13 (Discussion) for the simplicity of the zeros (non-degeneracy of eigenstates). For nodal sets, see 37.4. Also note that this proves the statement about the positions of the zeros of orthogonal polynomials 21A.11 (2) (see 21A.7).
(8) Completeness of the eigenfunctions: If a continuous function $h(x)$ satisfies
$$ \int_a^b dx\, w(x)h(x)\phi_n(x) = 0 \tag{35.18} $$
for all $n \in \mathbf{N}$, then $h \equiv 0$, where $\phi_n$ is an eigenfunction belonging to $\lambda_n$.
Its proof depends on the fact that if (35.18) is true, then
$$ L_{ST}y = w(x)h(x) \tag{35.19} $$
with the homogeneous boundary condition has a continuous solution for any real $\lambda$. However, this cannot be true unless $h \equiv 0$.
(9) (8) gives us a generalized Fourier expansion: If
(35.20)
is uniformly and absolutely convergent, then the coefficient can be computed as a Fourier coefficient.
(10) Let $f$ be piecewise $C^1$. The formal series (35.20) is actually uniformly and absolutely convergent.
(B) Prüfer's technique allows us to prove the following theorem about the distribution of zeros of a Schrödinger equation:
$$ u'' + q(x)u = 0. \tag{35.21} $$
Suppose
$$ m^2 \le q(x) \le M^2. \tag{35.22} $$
Then, for any solution $u \not\equiv 0$, the spacing $\delta$ of the zeros satisfies
$$ \frac{\pi}{M} \le \delta \le \frac{\pi}{m}. \tag{35.23} $$
Exercise.
Suppose (35.21) is considered on $[a, b]$ with a Dirichlet condition. Demonstrate that the magnitude of the eigenvalue $\lambda_n$ increases asymptotically as $n^2$.
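Prüfer's angle also gives a practical shooting method for eigenvalues. The following is a minimal sketch under explicit assumptions, not the text's own computation: it treats only the special case $u'' + \lambda u = 0$ on $[0, \pi]$ with Dirichlet conditions, written with $u = \rho\sin\theta$, $u' = \rho\cos\theta$, so that $\theta' = \cos^2\theta + \lambda\sin^2\theta$ and $\theta(\pi;\lambda)$ increases with $\lambda$. The eigenvalue condition is $\theta(0) = 0$, $\theta(\pi) = n\pi$. The function names, the RK4 integrator, and the bisection bracket $[0, (n+1)^2]$ are choices made for this illustration.

```python
import math

def prufer_angle_at_end(lam, a=0.0, b=math.pi, alpha=0.0, steps=2000):
    """Integrate theta' = cos^2(theta) + lam * sin^2(theta) from a to b (RK4).

    Assumed special case p = w = 1, q = 0 of the Pruefer angle equation.
    """
    def rhs(theta):
        return math.cos(theta) ** 2 + lam * math.sin(theta) ** 2

    h = (b - a) / steps
    theta = alpha
    for _ in range(steps):
        k1 = rhs(theta)
        k2 = rhs(theta + 0.5 * h * k1)
        k3 = rhs(theta + 0.5 * h * k2)
        k4 = rhs(theta + h * k3)
        theta += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
    return theta

def dirichlet_eigenvalue(n, tol=1e-10):
    """n-th eigenvalue on [0, pi]: solve theta(pi; lam) = n * pi by bisection.

    The bracket [0, (n + 1)^2] is specific to this interval and equation.
    """
    lo, hi = 0.0, float((n + 1) ** 2)
    target = n * math.pi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if prufer_angle_at_end(mid) < target:
            lo = mid  # angle too small: the eigenvalue is larger
        else:
            hi = mid
    return 0.5 * (lo + hi)
```

In this special case the exact eigenvalues are $\lambda_n = n^2$, which the shooting method reproduces; the $n^2$ growth is the simplest instance of the asymptotic behavior asked for in the Exercise.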
Discussion.
(C) Find the eigenvalues and eigenfunctions of the operator $d^2/dx^2 + \lambda$ on $[-1, 1]$ with the following boundary conditions:
(1) $du/dx(-1) = du/dx(1) = 0$.
(2) $u - du/dx = 0$ at $x = \pm 1$.
(D) What happens if the regularity condition is dropped?430
Consider
$$ \frac{d}{dt}\left(t^2\frac{d}{dt}x\right) + \lambda x = 0, \tag{35.24} $$
with the following boundary conditions.
(1) $x(-1) + x'(-1) = x(1) + x'(1) = 0$ (no eigenvalue).
(2) $x(-1) + x'(-1) = 0$ and $x(1) - x'(1) = 0$ ($-2$ is the only eigenvalue. The corresponding eigenfunction is $t$.)
(E) Irrespective of the boundary conditions, the $n$-th eigenvalue of a Sturm-Liouville problem is a continuous function of the coefficients of the equation (Courant-Hilbert).
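The claim in (D)(2) that $x = t$ is an eigenfunction with eigenvalue $-2$ can be verified by direct substitution into (35.24). The following short computation is only this verification; it does not show that $-2$ is the only eigenvalue.

```latex
\begin{align*}
\frac{d}{dt}\left(t^2\frac{d}{dt}t\right) + \lambda t
  &= \frac{d}{dt}\left(t^2\right) + \lambda t = (2 + \lambda)\,t = 0
  \iff \lambda = -2, \\
x(-1) + x'(-1) &= -1 + 1 = 0, \qquad
x(1) - x'(1) = 1 - 1 = 0 .
\end{align*}
```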
35.4 Justification of separation of variables. When the region of the problem is finite, very often the separated problems are regular Sturm-Liouville eigenvalue problems. Hence, 35.3 is the key (if the reader does not wish to use the less elementary Friedrichs extension (→34B.5)). However, notice that 35.3 is not enough to justify what we wish to do on unbounded regions. Friedrichs extensions work even in such cases. Here, however, a more constructive theory is presented.
35.5 Theorem [Weyl-Stone-Titchmarsh-Kodaira]. Let $L$ be a second order linear differential operator which is formally self-adjoint:
$$ L = -\frac{d}{dx}p(x)\frac{d}{dx} + q(x), \tag{35.25} $$
where $p$ and $q$ are $C^\infty$ on $(a, b)$.431 For $\lambda \in \mathbf{R}$, consider
$$ Lu = \lambda u. \tag{35.26} $$
Let $\{\psi_1(x;\lambda), \psi_2(x;\lambda)\}$ be a fundamental system of solutions (→24A.11) of this equation. Then, there is a matrix measure $\rho_{ij}$ ($i, j \in \{1, 2\}$)432
430 N
431 $a$ could be $-\infty$ and $b$ could be $+\infty$.
432 That is, any component of the matrix Matr$\{\rho_{ij}(\lambda)\}$ is a measure.
such that we can make the following decomposition of unity
$$ \delta(x - y) = \sum_{i,j}\int \psi_i(x;\lambda)\,d\rho_{ij}(\lambda)\,\psi_j(y;\lambda). \tag{35.27} $$
The equality here is in the $L_2$-sense.433 Here the so-called density matrix $\rho_{ij}$ can be constructed from the resolvent (→34C.2) of $L$. □
(35.27) implies the following:
$$ f(x) = \sum_{i,j}\int \psi_i(x;\lambda)\,d\rho_{ij}(\lambda)\,\hat f_j(\lambda), \tag{35.28} $$
and
$$ \hat f_j(\lambda) = \int_a^b dy\,\psi_j(y;\lambda)f(y). \tag{35.29} $$
Thus $\hat f_i(\lambda)$ is a kind of generalized Fourier transform of $f$.
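A standard special case may make the theorem less abstract. The following worked example is an illustration under stated assumptions and is not taken from the text: take $p = 1$, $q = 0$ on $(-\infty, \infty)$, choose the fundamental system $\psi_1 = \cos\sqrt{\lambda}x$, $\psi_2 = \sin\sqrt{\lambda}x$ for $\lambda > 0$, and assume the diagonal density $d\rho_{ij}(\lambda) = \delta_{ij}\,d\lambda/(2\pi\sqrt{\lambda})$ on $\lambda > 0$ (and zero for $\lambda < 0$). With $\lambda = k^2$ the decomposition of unity reduces to the ordinary Fourier representation of the delta function.

```latex
\begin{align*}
\sum_{i,j}\int \psi_i(x;\lambda)\,d\rho_{ij}(\lambda)\,\psi_j(y;\lambda)
 &= \int_0^\infty \frac{d\lambda}{2\pi\sqrt{\lambda}}
    \left[\cos\sqrt{\lambda}x\,\cos\sqrt{\lambda}y
        + \sin\sqrt{\lambda}x\,\sin\sqrt{\lambda}y\right] \\
 &= \frac{1}{\pi}\int_0^\infty dk\,\cos k(x - y)
  = \frac{1}{2\pi}\int_{-\infty}^{\infty} dk\,e^{ik(x - y)}
  = \delta(x - y).
\end{align*}
```

In this case $\hat f_1$ and $\hat f_2$ are the cosine and sine transforms of $f$, which is the sense in which (35.29) generalizes the Fourier transform.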
433 That is, when it is applied to a ket, the difference of RHS and LHS measured in terms of the $L_2$-norm is zero.
36 Green's Function: Laplace Equation
The Green's function method to solve the general boundary value problem for the Laplace equation is given. Neumann conditions need special care.
Key words: fundamental solution, Kirchhoff's formula,Neumann function
Summary:
(1) The reader must be able to explain the general idea of Green to her friend, and how to use Green's formula (→36.6).
(2) The Neumann function needs special care, because homogeneous boundary conditions and the unit source are not compatible (→36.7, 37.9).
36.1 Summary up to this point. Definition of Green's functionsand fundamental solutions can be found in 14.2. An intuitive idea wasexplained in 1.8. Green's formula is in 16A.19 and some examples ofGreen's functions are in 16.
36.2 Fundamental solution. The fundamental solution of the Laplace equation is a solution to
$$ -\Delta w(x|y) = \delta(x - y). \tag{36.1} $$
It is customary to put $-$ in front of the Laplacian, because $-\Delta$ is a positive definite operator (→32A.3). In $d$-space the following $w$ is a fundamental solution. For $d \ge 3$ the function vanishes at infinity, so it is also a Green's function for free space $\mathbf{R}^d$ with the vanishing condition at infinity (→16A.4):
$$ w(x|y) = \begin{cases} \dfrac{1}{(d-2)S_{d-1}|x - y|^{d-2}} & \text{for } d \ge 3, \\[2mm] -\dfrac{1}{2\pi}\ln|x - y| & \text{for } d = 2, \end{cases} \tag{36.2} $$
where $S_{d-1}$ is the surface volume of the unit $(d-1)$-sphere.434
Notice that the $d$-space function $w$ can be obtained from its $(d+1)$-space counterpart through integrating along one coordinate direction (→16A.5).
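The descent from 3-space to 2-space can be checked numerically. The following is a minimal sketch, assuming the standard forms $w_3 = 1/(4\pi r)$ and $w_2 = -(1/2\pi)\ln\rho$ for the fundamental solutions of $-\Delta$. Since the integral of $w_3$ along the whole $z$-axis diverges logarithmically, the sketch compares differences between two distances from the axis, where the divergent constant cancels. The function names, the cutoff `half_length`, and the midpoint rule are choices made for this illustration.

```python
import math

def w3(r):
    """Assumed 3-space fundamental solution of -Laplacian: 1 / (4 pi r)."""
    return 1.0 / (4.0 * math.pi * r)

def w2(rho):
    """Assumed 2-space fundamental solution of -Laplacian: -(1 / 2 pi) ln(rho)."""
    return -math.log(rho) / (2.0 * math.pi)

def integrate_w3_along_z(rho, half_length=1000.0, steps=200000):
    """Midpoint rule for the integral of w3 over z in [-L, L],
    evaluated at distance rho from the z-axis."""
    dz = 2.0 * half_length / steps
    total = 0.0
    for i in range(steps):
        z = -half_length + (i + 0.5) * dz
        total += w3(math.sqrt(rho * rho + z * z))
    return total * dz
```

With the cutoff $L$ much larger than both distances, `integrate_w3_along_z(1.0) - integrate_w3_along_z(2.0)` agrees with `w2(1.0) - w2(2.0)`, that is, with $\ln 2/(2\pi)$.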
Discussion: Double layer.
(1) Consider two parallel surfaces with their spacing $d$. We assume that the surfaces are orientable435 and let $\nu$ denote the outward normal direction. Let us assume that the outer surface has a uniformly distributed charge of area density $+\rho$, and the inner surface has the same distribution of the charge but of opposite sign. $p = \rho d$ is the area density of the dipole moment. We take the limit of $d \to 0$ while keeping $p$. The resultant double surface is called an (electrical) double layer. Show that the electrical potential (assuming 0 potential at infinity) is given by (in 3-space) (ignore numerical coefficients)
$$ V(P) = \int_S p\,\frac{\partial}{\partial\nu}\left(\frac{1}{r}\right)d\sigma, \tag{36.3} $$
where $S$ is the surface, and $r$ is the distance between the point on the surface and the point $P$ where we measure the potential.
(2) Let us introduce the angle $\theta$ between the outward normal and the line connecting the point on the surface and the point $P$. Then notice that
$$ -\cos\theta = \frac{dr}{d\nu}, \tag{36.4} $$
so that (36.3) can be written as
$$ V(P) = \int_S \frac{p\cos\theta}{r^2}\,d\sigma. \tag{36.5} $$
(3) Notice that the solid angle of $d\sigma$ seen from $P$ is given by
$$ d\Omega = \pm\frac{d\sigma\cos\theta}{r^2}, \tag{36.6} $$
where the sign convention is $+$, if $P$ is on the positive side of the double layer,436 and $-$, if $P$ is on the negative side. Hence, we have
$$ V = \pm\int_S p\,d\Omega. \tag{36.7} $$
(4) This implies that, when $p$ = const., if $P$ is outside a closed double layer $S$, then $V = 0$. If $P$ is inside, then $V = -4\pi p$.
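The statement in (4) rests on the fact that a closed surface subtends total solid angle $4\pi$ (in magnitude) from an interior point and $0$ from an exterior point. The following is a minimal numerical sketch of this fact for the unit sphere, not a computation from the text. It assumes the observation point $P = (0, 0, h)$ on the $z$-axis, uses the outward normal to orient the surface, and integrates $n\cdot(s - P)/|s - P|^3$ over the sphere; the overall sign convention may differ from the $\pm$ convention of (36.6). The function name and the midpoint rule are choices made for this illustration.

```python
import math

def solid_angle_of_unit_sphere(h, steps=20000):
    """Signed solid angle of the unit sphere seen from P = (0, 0, h), h != 1.

    For a surface point at polar angle alpha with outward normal n,
    n . (s - P) = 1 - h cos(alpha) and |s - P|^2 = 1 + h^2 - 2 h cos(alpha).
    The azimuthal integral gives the factor 2 pi.
    """
    d_alpha = math.pi / steps
    total = 0.0
    for i in range(steps):
        alpha = (i + 0.5) * d_alpha
        cos_a = math.cos(alpha)
        r2 = 1.0 + h * h - 2.0 * h * cos_a
        total += (1.0 - h * cos_a) * math.sin(alpha) / r2 ** 1.5
    return 2.0 * math.pi * total * d_alpha
```

Evaluating at $h = 0.5$ (inside) gives $4\pi$ and at $h = 2$ (outside) gives $0$, within the quadrature error.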
36.3 Theorem [Unique existence of Dirichlet problem Green's function]. For any well behaved437 surface $\partial\Omega$ enclosing an open region $\Omega$, there exists the unique Green's function $G_D(x|y)$ for $-\Delta$ which vanishes on $\partial\Omega$. □
435 That is, there are two sides, unlike the Möbius strip.
436 This does not mean that $P$ is located outside the layer even when the layer is closed. Simply, we draw a tangent plane on the shell and we ask on which side $P$ exists.
437 This vague statement will not be made precise here to avoid the technicality. Piecewise smooth surfaces are admissible. Cf. 1.19(2) Discussion.
[Demo] The Green's function for the homogeneous Dirichlet problem is the solution to
$$ -\Delta G_D(x|y) = \delta(x - y) \tag{36.8} $$
with $G_D = 0$ for $x \in \partial\Omega$. Here $y$ is in $\Omega$.438 The problem can be rewritten as $G_D(x|y) = w(x|y) + u(x|y)$, where $w$ is a fundamental solution in 36.2 and $u$ satisfies
$$ -\Delta u(x|y) = 0 \tag{36.9} $$
with the Dirichlet boundary condition $u(x|y) = -w(x|y)$ for $x \in \partial\Omega$. We have discussed that this problem has a unique solution at least informally (→1.19, 29.9).
36.4 Symmetry of Dirichlet Green's function. In Green's formula (→16A.19) set $u(x) = G_D(x|y)$ and $v(x) = G_D(x|z)$. Then, since both vanish on $\partial\Omega$, we get
$$ \int_\Omega\left[G_D(x|y)\Delta G_D(x|z) - G_D(x|z)\Delta G_D(x|y)\right]dx = 0. \tag{36.10} $$
If we use (36.8), this immediately gives
$$ G_D(y|z) = G_D(z|y), \tag{36.11} $$
the symmetry of the Green's function. We have already discussed this (formally in 35.2, 16A.20) (→37.7).
36.5 Free space Green's function is the largest. Let $G_D(x|y)$ be the Green's function for a region $D$. Then,
(36.12)
Here $w$ is the fundamental solution given in 36.2, that is, the Coulomb potential.
This follows easily from the maximum principle 29.6.
36.6 Solution to Dirichlet problem in terms of Green's function (16A.21 repeated). The solution to the following Dirichlet problem on an open region $\Omega$
$$ -\Delta u = \varphi, \quad u|_{\partial\Omega} = f, \tag{36.13} $$
where $\varphi$ and $f$ are integrable functions, is given by
$$ u(x) = \int_\Omega G_D(x|y)\varphi(y)\,dy - \int_{\partial\Omega}f(y)\,\partial_{n(y)}G_D(x|y)\,d\sigma(y). \tag{36.14} $$
Here $\partial_{n(y)}$ is the outward normal derivative at $y$, $dy$ is the volume element, and $\sigma$ is the surface volume element.
438 Inevitably, $y$ is an internal point of $\Omega$, since it is open.
Discussion.
The Discussion in 36.2 allows us to understand (36.14) in terms of the charge distribution in $\Omega$ and the double layer on $\partial\Omega$. That is, Dirichlet conditions can be understood as appropriate double layers.
36.7 Special feature of homogeneous Neumann condition. For a Neumann problem we do not know $u$ but $\partial_n u$ on the boundary. We need the Green's function satisfying the homogeneous Neumann condition. However, we cannot impose a homogeneous boundary condition on a closed surface $\partial\Omega$ as seen below. Let $G_N$ satisfy
$$ -\Delta G_N(x|y) = \delta(x - y). \tag{36.15} $$
Then, Gauss' theorem (→2C.13) tells us that
$$ \int_{\partial\Omega}\frac{\partial G_N}{\partial n}\,d\sigma = -1. \tag{36.16} $$
Therefore, the homogeneous Neumann condition cannot be imposed.439
The simplest boundary condition compatible with (36.15) is
$$ \frac{\partial G_N}{\partial n} = -1\Big/\int_{\partial\Omega}d\sigma = -\frac{1}{(\text{surface area of }\Omega)}. \tag{36.17} $$
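36.8 Neumann function. The function satisfying (36.15) and (36.17) is called the Neumann function. In terms of the Neumann function, the solution to the following Neumann problem
$$ -\Delta u = \varphi, \quad \partial_n u|_{\partial\Omega} = h \tag{36.18} $$
reads
$$ u(x) = \int_\Omega G_N(x|y)\varphi(y)\,dy + \int_{\partial\Omega}G_N(x|y)h(y)\,d\sigma(y). \tag{36.19} $$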
Note that the solution to a Neumann problem is unique only up to an additive constant (→1.19(3)).
[Demo] In Green's formula let $u$ be the solution and $v$ be the Neumann function $G_N$. Then we have
$$ u(x) = \int_\Omega G_N(x|y)\varphi(y)\,dy + \int_{\partial\Omega}\left[G_N(x|y)h(y) + u(y)\Big/\int_{\partial\Omega}d\sigma\right]d\sigma(y) = \int_\Omega G_N(x|y)\varphi(y)\,dy + \int_{\partial\Omega}G_N(x|y)h(y)\,d\sigma(y) + \text{const.} \tag{36.20} $$
The constant can be ignored, because we need the solution up to an additive constant.
439 If we wish to keep the homogeneous Neumann boundary condition, we must modify (36.15). This will be discussed in 37.9.
36.9 Method of images. (→16A.7, 16A.8, 16A.14) With the aid of the superposition principle and the conformal invariance (say, the reflection principle) (→16A.10), we can construct Green's functions for special cases. For example, the half 3-space Green's function can be obtained by 16A.7. An analogous half 2-space Green's function can be obtained. Notice that this Green's function vanishes at infinity in contrast to the free space counterpart.
37 Spectrum of Laplacian
The spectrum of the Laplacian gives the energy levels of quantum mechanical billiards. It is important to grasp its general features to understand the general spectrum of a particle in a potential well. One of the most interesting questions was whether the shape of the domain can be determined from the spectrum: Can you hear the shape of the drum? Now, we know that this is impossible even in 2-space.
Key words: Fundamental theorem, nodes, eigenfunctionexpansion of Green's function
Summary:
(1) Understand the eigenfunction expansion of Green's functions (→37.7, 37.9).
(2) Remember the general features of the spectrum and eigenfunctions of the Laplacian with the Dirichlet condition on a bounded domain (→37.1). (Theoreticians) This is an example of the spectrum of compact operators.
(3) We cannot hear the shape of the drum (→37.6).
37.1 Theorem [Fundamental theorem].440 Let $\Omega$ be a bounded open region, and $\partial\Omega$ be smooth. Then, the following eigenvalue problem
$$ -\Delta u = \lambda u, \quad u|_{\partial\Omega} = 0 \tag{37.1} $$
has the following properties:
(1) There are countably many eigenvalues $\{\lambda_n\}$ such that $0 \le \lambda_1 \le \lambda_2 \le \cdots$, and $\lim_{n\to\infty}\lambda_n = +\infty$.
(2) There is no finite accumulation point for $\{\lambda_n\}$.
(3) Let $\varphi_n$ be an eigenfunction belonging to $\lambda_n$. Then, $\{\varphi_n\}$ is an orthogonal basis of $L_2(\Omega)$. □
Physically, if we consider the eigenmodes of a drumhead, at least (1) and (2) are understandable. There should not be any upper limit in its frequency for an ideal continuum drumhead. For a finite frequency there cannot be infinitely many independent modes.
[Demo for 3-space] With the aid of the Green's function (→36.3), we can convert (37.1) into an integral equation problem:
$$ u(x) = \lambda\int_\Omega G(x|y)u(y)\,dy \equiv \lambda(\mathcal{G}u)(x). \tag{37.2} $$
Since $G(x|y) - w(x|y)$ is everywhere finite on $\Omega$, if we can show
$$ \int_\Omega|w(x|y)|^2\,dx < +\infty \quad \text{for } \forall y \in \Omega, \tag{37.3} $$
the Hilbert-Schmidt theorem (→34C.11) tells us that $\mathcal{G}$ is a compact (self-adjoint) operator (→34C.9). Let $B_\epsilon$ be a ball of radius $\epsilon$ centered at $y$. On $\Omega\setminus B_\epsilon$ the integral is finite, so we have only to consider
$$ \int_{B_\epsilon}|w(x|y)|^2\,dx. \tag{37.4} $$
But this is finite as can be seen from the order $w^2 = O[|x - y|^{-2}]$. Hence, Theorem 34C.12 tells us (1)-(3) except nonnegativity of the eigenvalues. We know $-\Delta$ is non-negative, so eigenvalues cannot be negative.
440 Actually, much more general theorems are known, since the Laplacian can be defined on any Riemannian manifold.
Discussion.According to the variational principle for the eigenvalues of self-adjoint operators,34C.13, we can say that the fundamental frequency of a drum goes up if the drumhead is constrained; in contrast, if the drum head is torn, then its fundamentalfrequency goes down.
37.2 Theorem [Monotonicity]. Let there be two open regions such that $\Omega \supset \Omega'$. Consider the eigenvalue problem $-\Delta u = \lambda u$ on $\Omega$ with the condition $u|_{\partial\Omega} = 0$, and that with $\Omega$ replaced by $\Omega'$. Let the $n$-th eigenvalue (arranged in increasing order) for the problem with the region $\Omega$ be $\lambda_n$, and that for the region $\Omega'$ be $\lambda_n'$. Then, $\lambda_n \le \lambda_n'$. □
[Demo] We use the variational principle for the eigenvalues of compact self-adjoint operators 34C.13. Notice, however, that the eigenvalue there is the reciprocal of the eigenvalue in our present context. That is, the variational principle gives us the eigenvalue with the smallest modulus. Due to the non-negativity of the eigenvalues, the variational principle actually gives us the smallest eigenvalue $\lambda_1$. More generally, the minimum of $\langle\varphi|-\Delta|\varphi\rangle$ under the condition $\langle\varphi|\varphi\rangle = 1$ is $\lambda_n$ in the orthogonal complement $V_n$ of the direct sum of the eigenspaces for $\lambda_1, \ldots, \lambda_{n-1}$. For any $n$ the minimum value of $\langle\varphi|-\Delta|\varphi\rangle$ on $V_n$ with the condition $\varphi|_{\partial\Omega} = \varphi|_{\partial\Omega'} = 0$ cannot be smaller than that with the condition $\varphi|_{\partial\Omega} = 0$.
37.3 Theorem. Eigenvalues depend on $\Omega$ continuously. □441
37.4 Theorem [Courant]. Let $u_n$ be the eigenfunction belonging to the $n$-th smallest eigenvalue of $-\Delta$ on $\Omega$ under the condition $u|_{\partial\Omega} = 0$. Then the nodal set:
$$ \{x \in \Omega : u_n(x) = 0\} \tag{37.5} $$
separates $\Omega$ into at most $n$ disjoint components. □442
441 See Courant-Hilbert, vol. I Chapter 6, Section 2 Theorem 10.
Discussion.
Consider the Laplace eigenvalue problem in a bounded closed domain with a homogeneous Dirichlet boundary condition in 2-space. The curves on which the eigenfunction vanishes are called nodal curves. Demonstrate that a nodal curve is perpendicular to the boundary curve, when the former touches the latter where the latter is smooth.
37.5 Vibrating drumhead. The eigenmodes of a 2-dimensional drumhead of shape $D$ obey
(37.6)
If D is a disk of radius a, then the eigenfunctions (modes) are given by
(37.7)
where $\omega = r_n^{(m)}/a$ with $r_n^{(m)}$ being the $n$-th zero of $J_m$ (→27A.2). Illustrations of low frequency modes can be found in Wyld p164-5.443
37.6 Can one hear the shape of the drum? Suppose the set of all the eigenvalues of $-\Delta$ on $D_1$ and that on $D_2$ are identical. Can we conclude that the shapes of the domains are congruent: $D_1 \cong D_2$? If yes, we can hear the shape of a drum. Now, we know this is not true even for 2-d drums.444 However, we can hear quite a lot. For example, we can hear the area of the drumhead: Let $N(\lambda)$ be the number of eigenvalues less than $\lambda$. Then,
(37.8)
asymptotically for large $\lambda$, where $\mu(D)$ is the volume of $D$ (conjectured by Lorentz, who gave a lecture on this at Göttingen; it was later proved by Weyl). We can also hear the number of holes.
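The claim that the area can be heard is easy to test on a rectangle, where the Dirichlet eigenvalues are known explicitly (→37.8). The following is a minimal sketch under a stated assumption: it uses the standard 2-space form of Weyl's law, $N(\lambda) \simeq \mu(D)\lambda/(4\pi)$, which is assumed here and is not copied from the displayed formula above. The function name is hypothetical.

```python
import math

def count_dirichlet_eigenvalues(a, b, lam):
    """Number of eigenvalues (m pi / a)^2 + (n pi / b)^2 below lam,
    with positive integers m and n, for the rectangle [0, a] x [0, b]."""
    count = 0
    m = 1
    while (m * math.pi / a) ** 2 < lam:
        n = 1
        while (m * math.pi / a) ** 2 + (n * math.pi / b) ** 2 < lam:
            count += 1
            n += 1
        m += 1
    return count
```

For the unit square and $\lambda = 4\times 10^4$ the ratio of the count to $\lambda/(4\pi)$ is slightly below 1; the deficit is a boundary (perimeter) correction of relative order $\lambda^{-1/2}$.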
442 See Courant-Hilbert, Chapter 6, Section 6 for a proof.
443 Excellent pictures of modes of a kettledrum can be found in T. D. Rossing, "The Physics of Kettledrums," Sci. Am. 247 (5) (1982) [November 1982].
444 A readable account can be found in M. A. Shubin (ed.) Partial Differential Equations VII (Springer, 1994) Section 16.7 (p165-). However, the counterexamples are all on domains with non-smooth boundaries. No smooth counterexample is known. This is still a major problem. Historically, the first negative answer to the question was given in 16-space by Milnor.
37.7 Eigenfunction expansion of Green's function. The formal theory in 35.2 can be justified exactly as in the regular Sturm-Liouville problem (→35.3) thanks to 37.1. Hence we have:
Theorem. The Green's function for the Laplacian in a compact domain $\Omega$ can be written as
$$ G(x|y) = \sum_{i=1}^{\infty}\lambda_i^{-1}u_i(x)u_i(y), \tag{37.9} $$
where $u_i$ is the normalized eigenvector belonging to the eigenvalue $\lambda_i$ of $-\Delta$. □
From this, the symmetry of Green's functions (→36.4) is obvious.
37.8 Examples.
(1) The Green's function for a rectangular domain $[0,a]\times[0,b]$. The eigenvalues and the corresponding normalized eigenfunctions are given by
$$ u_{mn} = \frac{2}{\sqrt{ab}}\sin\frac{m\pi x}{a}\sin\frac{n\pi y}{b}, \quad \lambda_{mn} = \left(\frac{m\pi}{a}\right)^2 + \left(\frac{n\pi}{b}\right)^2 \tag{37.10} $$
for positive integers $m$ and $n$. Hence, the Green's function for the present problem is, according to (37.9),
$$ G(x,y|x',y') = \frac{4}{\pi^2 ab}\sum_{m,n>0}\frac{\sin\frac{m\pi x}{a}\sin\frac{m\pi x'}{a}\sin\frac{n\pi y}{b}\sin\frac{n\pi y'}{b}}{(m/a)^2 + (n/b)^2}. \tag{37.11} $$
(2) Cylindrically symmetric Green's function for 3-space. In this case it is sensible to define the $L_2$-space with weight $r$, because the volume element is $2\pi r\,dr\,dz$. Hence, the delta function with the same weight (→20.25) is convenient (that is, $\delta(r - r')\delta(z - z')/r$ →20.26). The Green's function is the solution to
$$ -\Delta u = -\left(\frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{\partial^2}{\partial z^2}\right)u = \delta(z - z')\frac{\delta(r - r')}{r} \tag{37.12} $$
with the vanishing condition at infinity. We first solve the eigenvalue problem
$$ -\left(\frac{\partial^2}{\partial r^2} + \frac{1}{r}\frac{\partial}{\partial r} + \frac{\partial^2}{\partial z^2}\right)u = \lambda u. \tag{37.13} $$
We get the eigenvalues and the corresponding normalized eigenfunctions as (→27A.21)
$$ u_{\kappa,k} = \frac{1}{\sqrt{2\pi}}e^{i\kappa z}J_0(kr), \quad \lambda_{\kappa,k} = \kappa^2 + k^2. \tag{37.14} $$
Here, $\kappa \in \mathbf{R}$ and $k$ is any positive real. Thus 37.7 (or its natural extension) tells us that the Green's function for our problem is
$$ G(r,z|r',z') = \frac{1}{2\pi}\int_{-\infty}^{+\infty}d\kappa\int_0^\infty dk\,\frac{e^{i\kappa(z - z')}J_0(kr)J_0(kr')}{\kappa^2 + k^2}. \tag{37.15} $$
Exercise.
Construct the Green's functions for the Laplace equation with the following boundary conditions:
(1) On $[0,\pi]\times[0,2\pi]$ with a homogeneous Dirichlet boundary condition along $x = 0$, $x = \pi$ and $y = 2\pi$, and a homogeneous Neumann boundary condition on $y = 0$.
(2) On the same domain with a periodic boundary condition.
37.9 Neumann function in terms of eigenfunctions. Under the homogeneous Neumann boundary condition any constant is an eigenfunction belonging to the zero eigenvalue. Hence, as can clearly be seen in (37.9), we cannot construct the Green's function. However, the following 'generalized Green's function' still works:
$$ G_N(x|y) = {\sum_i}'\,\lambda_i^{-1}u_i(x)u_i(y), \tag{37.16} $$
where $'$ implies that the zero eigenvalue is excluded from the summation, and $u_i$ is the normalized eigenfunction belonging to the eigenvalue $\lambda_i$.
The solution to 37.8 can be written as
(37.17)
(This is essentially (36.19). The difference is a constant which we mayignore.)[Demo] First we find the equation for GN
(37.18)
where $V$ is the volume of $\Omega$. We have used that the normalized eigenfunction belonging to zero is $1/\sqrt{V}$. Since the eigenfunctions are with the homogeneous Neumann condition
(37.19)
This is compatible with the equation (37.18). Now put v = GN in Green's formula16A.19, and we get (37.17), ignoring an additive constant.
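A one-dimensional example shows what the generalized Green's function looks like. The following is a sketch under stated assumptions and not the text's own example: the region is $[0,1]$ (so $V = 1$), the operator is $-d^2/dx^2$ with $u'(0) = u'(1) = 0$, the non-constant normalized eigenfunctions are $\sqrt{2}\cos n\pi x$ with eigenvalues $(n\pi)^2$, and the primed sum is compared with the closed form $(x^2 + y^2)/2 - \max(x,y) + 1/3$. This closed form satisfies $-\partial_x^2 G = \delta(x - y) - 1$, has vanishing derivative at both ends, and has zero mean. The function names are hypothetical.

```python
import math

def neumann_closed_form(x, y):
    """Generalized Neumann Green's function of -d^2/dx^2 on [0, 1]
    (zero-mean choice of the additive constant)."""
    return 0.5 * (x * x + y * y) - max(x, y) + 1.0 / 3.0

def neumann_eigen_series(x, y, terms=20000):
    """Partial sum over the non-zero eigenvalues (n pi)^2 with
    eigenfunctions sqrt(2) cos(n pi x); the constant mode is excluded."""
    total = 0.0
    for n in range(1, terms + 1):
        k = n * math.pi
        total += 2.0 * math.cos(k * x) * math.cos(k * y) / (k * k)
    return total
```

The source term $\delta(x - y) - 1/V$ is what makes the homogeneous Neumann condition admissible: its integral over the region vanishes, so Gauss' theorem no longer forces a non-zero boundary flux.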
38 Green's Function: Diffusion Equation
The Green's function method to solve the general initial boundary value problem for diffusion equations is given. The Markovian property of the free-space Green's function (= heat kernel) is the key to constructing the Feynman-Kac path integral representation of Green's functions.
Key words: reciprocity, general solution formula, eigenfunction expansion, Markovian property, Feynman-Kac formula, path integral.
Summary:
(1) The reader should roughly remember the strategy for constructing Green's function, and its use (→38.9, 38.7).
(2) The relation between the heat kernel and random walk is extremely important.445 Many important properties of the heat kernel can be derived and/or understood with the aid of this interpretation.
(3) The Markovian property (→38.10) of the heat kernel is crucial in developing path integrals (→38.11).
(4) Functional integrals are staple for theoreticians. Cf. Glimm and Jaffe.446
38.1 Summary up to this point. We have constructed the free space Green's function, and used it to solve the initial value problem in 16B.1, 16B.10. The image source method is explained in 16B.9 to construct Green's functions for various simple regions.
38.2 The most general diffusion problem. The general form of the problem on the region $\Omega$ is
$$ (\partial_t - D\Delta)u(x,t) = \varphi(x,t) \quad \text{on } \Omega \tag{38.1} $$
with the boundary condition, a Dirichlet or a Neumann condition on $\partial\Omega$ for $t > 0$, and the initial condition $u(x, 0) = f(x)$. It is a standard trick
445 See 16B.8. An elementary (and classic) introduction is: S. Chandrasekhar, Rev. Mod. Phys. 15, 1 (1943). This does not use the $\delta$-function at all. If you use it, you can rewrite the review in a more concise way. To make a modern version is a good exercise (was good to the lecturer as an undergrad).
446 J. Glimm and A. Jaffe, Quantum Physics, a functional integral point of view (Springer, 1987). However, the book is not recommended to a casual reader.
that the initial condition can be written as a source term as (→16B.5)
$$ (\partial_t - D\Delta)u(x,t) = \varphi(x,t) + f(x)\delta(t) \quad \text{on } \Omega. \tag{38.2} $$
Hence, we have only to solve the homogeneous initial value problem.
38.3 Green's function. The solution to
$$ (\partial_t - D\Delta)G(x,t|y,s) = \delta(t - s)\delta(x - y) \tag{38.3} $$
with the homogeneous boundary condition is called the Green's function.
38.4 Existence of Dirichlet Green's function. We have constructed the Green's function $G_0$ for the free space in 16B.1. Now we wish to determine the Green's function for the homogeneous Dirichlet problem. Note that $u \equiv G(x,t|x',t') - G_0(x,t|x',t')$ obeys the diffusion equation with the boundary condition
$$ u(x,t|x',t') = -G_0(x,t|x',t') \quad \text{on } \partial\Omega,\ t > t' \tag{38.4} $$
and the initial condition $u = 0$ for $t = t'$ (or $t \le t'$). The unique existence of the solution has been discussed (heuristically 1.18; 28.3). Hence, the existence of Green's functions is guaranteed at least for a compact domain.
The Neumann condition can also be treated analogously.
38.5 Counterpart of Green's formula. Let $\mathcal{L} = \partial_t - D\Delta$ and $\mathcal{L}^+ = -\partial_t - D\Delta$. Then for $u$ which is zero for $t \le 0$447 and also $u \to 0$ in the $t \to \infty$ limit we have
$$ \int_0^\infty dt\int_\Omega dx\,[(\mathcal{L}u)v - u\mathcal{L}^+v] = -D\int_0^\infty dt\int_{\partial\Omega}d\sigma\cdot(v\nabla u - u\nabla v). \tag{38.5} $$
This is essentially Green's theorem and can be proved quite analogously (→16A.19).
Exercise. Prove this.
38.6 Reciprocity relations. Notice that the Green's function is a function of $t - s$ (time translational symmetry), so that $G(x, t - T|y, s - T) = G(x,t|y,s)$. If we choose $T = t + s$, we get
$$ G(x,t|y,s) = G(x,-s|y,-t). \tag{38.6} $$
447Since the solution to the diffusion equation is very smooth, we may put theinitial condition at t = 0+ instead of t = o.
We have
$$ \left(-\frac{\partial}{\partial t} - D\Delta_x\right)G(x,-t|y,-s) = \delta(x - y)\delta(t - s) \tag{38.7} $$
as can easily be seen from the change of variables $t \to -t$ and $s \to -s$. Hence, (38.6) implies
$$ \mathcal{L}^+G = \left(-\frac{\partial}{\partial t} - D\Delta_x\right)G(x,s|y,t) = \delta(x - y)\delta(t - s). \tag{38.8} $$
If we set $u = G(z,\tau|y,s)$ and $v = G(z,t|x,\tau)$ in (38.5), we obtain, regarding $u$ and $v$ as functions of $z$ and $\tau$,
$$ \int_0^\infty d\tau\int_\Omega dz\,[(\mathcal{L}G(z,\tau|y,s))G(z,t|x,\tau) - G(z,\tau|y,s)\mathcal{L}^+G(z,t|x,\tau)] = 0. \tag{38.9} $$
(Here the operators act on the functions of $z$ and $\tau$.) That is, with the aid of (38.8)
$$ G(y,t|x,s) = G(x,t|y,s). \tag{38.10} $$
38.7 Solution to general boundary value problem. In terms of the Green's function the solution to
$$ (\partial_t - D\Delta)u(x,t) = \varphi(x,t) \tag{38.11} $$
under the initial condition $u(x, 0) = f(x)$ and an appropriate boundary condition (inhomogeneous Dirichlet or Neumann that may depend on time) reads
$$ u(x,t) = \int_0^t ds\int_\Omega dy\,G(x,t|y,s)\varphi(y,s) + \int_\Omega dy\,G(x,t|y,0)f(y) + D\int_0^t ds\int_{\partial\Omega}d\sigma(y)\left[G(x,t|y,s)\frac{\partial u(y,s)}{\partial n(y)} - u(y,s)\frac{\partial}{\partial n(y)}G(x,t|y,s)\right]. \tag{38.12} $$
Here the surface term simplifies if we specialize the formula to Dirichlet or Neumann cases.
[Demo] In the analogue of Green's theorem 38.5 we set $u$ to be the solution to the problem, and $G$ to be the Green's function for the corresponding homogeneous boundary condition. We know $\mathcal{L}^+G(x,s|y,t) = \mathcal{L}^+G(y,s|x,t) = \mathcal{L}^+G(x,t|y,s) = \delta(x - y)\delta(t - s)$ (→38.6).
38.8 Steady source problem, recurrence of random walk. Let us assume that the source term $\varphi(x,t)$ is a time-independent point source $\delta(x)$, and the problem is in the free space with 0 initial condition. Then, (38.12) gives
$$ u(x,t) = \int_0^t G(x,t|0,s)\,ds, \tag{38.13} $$
which increases without limit for $d \le 2$ and remains finite for $d > 2$. This difference between $d > 2$ and $d \le 2$ can be understood as the recurrence property of random walks.
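The dimension dependence can be seen by evaluating (38.13) numerically. The following is a minimal sketch under stated assumptions: it uses the standard free-space heat kernel $(4\pi D\tau)^{-d/2}\exp(-r^2/4D\tau)$ with $\tau = t - s$, a logarithmic substitution $\tau = e^{s}$, and a lower cutoff `tau_min` below which the kernel at $r > 0$ is negligible. The function name, the cutoff, and the midpoint rule are choices made for this illustration. For $d = 3$ the result is compared with the closed form $\mathrm{erfc}(r/\sqrt{4Dt})/(4\pi D r)$, which tends to the Coulomb potential as $t \to \infty$.

```python
import math

def steady_source_response(d, r, t, diffusivity=1.0, tau_min=1e-4, steps=20000):
    """Integral over tau in [tau_min, t] of the d-dimensional heat kernel
    at distance r > 0 from a unit steady point source.

    Assumption: r * r / (4 D tau_min) is large, so the part of the
    integral below tau_min is negligible.
    """
    s_lo, s_hi = math.log(tau_min), math.log(t)
    ds = (s_hi - s_lo) / steps
    total = 0.0
    for i in range(steps):
        tau = math.exp(s_lo + (i + 0.5) * ds)
        kernel = (4.0 * math.pi * diffusivity * tau) ** (-0.5 * d)
        kernel *= math.exp(-r * r / (4.0 * diffusivity * tau))
        total += kernel * tau  # d tau = tau ds
    return total * ds
```

For $d = 1$ the value grows like $\sqrt{t}$ and for $d = 2$ like $\ln t$, while for $d = 3$ it saturates.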
38.9 Eigenfunction expansion of Green's function. (cf. 35.22) Let $\lambda_n$ be the $n$-th eigenvalue of $-\Delta$ on $\Omega$ with a homogeneous boundary condition (Dirichlet or Neumann), and $u_n$ be the corresponding normalized eigenfunction. Then the Green's function $G(x,t|x',t')$ for the diffusion equation with the same boundary condition reads
(38.14)n
Notice that in this case the zero eigenvalue existing for the Neumann condition is not excluded (this is required by the conservation of the total mass).
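The eigenfunction expansion can be compared with the image source method (→16B.9) in one dimension. The following is a sketch under stated assumptions, not a transcription of the displayed formula: it assumes the standard form $\sum_n e^{-D\lambda_n(t - t')}u_n(x)u_n(x')$ for $t > t'$, takes the region $[0,1]$ with Dirichlet conditions so that $u_n = \sqrt{2}\sin n\pi x$ and $\lambda_n = (n\pi)^2$, and sets $t' = 0$. The image sum places positive sources at $y + 2k$ and negative sources at $-y + 2k$. All function names are hypothetical.

```python
import math

def heat_kernel_1d(x, t, diffusivity=1.0):
    """Free-space heat kernel in 1-space."""
    return math.exp(-x * x / (4.0 * diffusivity * t)) / math.sqrt(4.0 * math.pi * diffusivity * t)

def green_by_eigenfunctions(x, y, t, diffusivity=1.0, terms=200):
    """Assumed expansion sum_n exp(-D lambda_n t) u_n(x) u_n(y) on [0, 1]."""
    total = 0.0
    for n in range(1, terms + 1):
        k = n * math.pi
        total += 2.0 * math.exp(-diffusivity * k * k * t) * math.sin(k * x) * math.sin(k * y)
    return total

def green_by_images(x, y, t, diffusivity=1.0, images=20):
    """Image sum enforcing G = 0 at x = 0 and x = 1."""
    total = 0.0
    for k in range(-images, images + 1):
        total += heat_kernel_1d(x - y - 2 * k, t, diffusivity)
        total -= heat_kernel_1d(x + y - 2 * k, t, diffusivity)
    return total
```

The two representations are complementary: the eigenfunction sum converges quickly for large $t$, the image sum for small $t$.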
Exercise.
Find the Green's function for the following equation on the unit 3-cube $[0,1]\times[0,1]\times[0,1]$
$$ \frac{\partial u}{\partial t} = \frac{1}{2}\Delta u - cu, \tag{38.15} $$
where $c$ is a positive constant, with a homogeneous Dirichlet boundary condition.
38.10 Markov property revisited. (→16B.7) For the heat kernel $G_0$ (→16B.1),
(38.16)
[Demo] Note that the 'translation symmetry' (38.6) allows (38.16) to be rewritten with the introduction of $g(x,t) = G_0(x,t|0,0)$ as
$$ g(x,t) = \int_{\mathbf{R}^d}dy\,g(x - y, t - s)g(y,s). \tag{38.17} $$
(There is NO integral with respect to time.) Introducing the Fourier transform $\hat g$ of $g$ with respect to $x$, this reads $\hat g(k,t) = \hat g(k,t - s)\hat g(k,s)$ (→33.8). This is obvious from
$$ \hat g(k,t) = e^{-Dk^2t}, \tag{38.18} $$
which is directly obtainable from $(\partial_t - D\Delta)g(x,t) = \delta(t)\delta(x)$.448
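The Markov property (38.17) can also be checked by direct quadrature. The following is a minimal sketch in 1-space, assuming the standard heat kernel $g(x,t) = (4\pi Dt)^{-1/2}\exp(-x^2/4Dt)$; the function names, the integration window `half_width`, and the midpoint rule are choices made for this illustration.

```python
import math

def heat_kernel_1d(x, t, diffusivity=1.0):
    """Free-space heat kernel in 1-space."""
    return math.exp(-x * x / (4.0 * diffusivity * t)) / math.sqrt(4.0 * math.pi * diffusivity * t)

def chapman_kolmogorov_rhs(x, t, s, diffusivity=1.0, half_width=20.0, steps=4000):
    """Midpoint rule for the integral over y of g(x - y, t - s) g(y, s),
    for 0 < s < t; the window must be wide compared with sqrt(D t)."""
    dy = 2.0 * half_width / steps
    total = 0.0
    for i in range(steps):
        y = -half_width + (i + 0.5) * dy
        total += heat_kernel_1d(x - y, t - s, diffusivity) * heat_kernel_1d(y, s, diffusivity)
    return total * dy
```

The result does not depend on the intermediate time $s$, which is the content of the Markov property: `chapman_kolmogorov_rhs(0.7, 1.0, 0.4)` and `chapman_kolmogorov_rhs(0.7, 1.0, 0.1)` both reproduce `heat_kernel_1d(0.7, 1.0)`.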
38.11 Feynman-Kac formula for the heat kernel. Using (38.16) repeatedly to divide the time axis into pieces, we get
$$ g(x,t) = \prod_{i=1}^{N-1}\int_{\mathbf{R}^d}dx_i\prod_{i=1}^{N}g(x_i - x_{i-1}, t_i - t_{i-1}), \tag{38.19} $$
where $t_N = t$, $t_0 = 0$, $x_N = x$ and $x_0 = 0$. Let us choose the equal spacing of the time axis $\Delta t = t_i - t_{i-1}$ for all $i$, and let $\Delta x_i \equiv x_i - x_{i-1}$. Then (38.19) and footnote 448 imply
$$ g(x,t) = \prod_{i=1}^{N-1}\int dx_i\,(4\pi D\Delta t)^{-3/2}\exp\left[-\sum_{i=1}^{N}(\Delta x_i)^2/4D\Delta t\right]. \tag{38.20} $$
If $\Delta t$ is sufficiently small, then, formally,
$$ \sum_{i=1}^{N}\frac{(\Delta x_i)^2}{\Delta t} \to \int_0^t dt\left(\frac{dx}{dt}\right)^2. \tag{38.21} $$
Therefore, formally, (38.20) converges to
$$ g(x,t) = \int_{x(0)=0}^{x(t)=x}\mathcal{D}[x(\cdot)]\exp\left[-\frac{1}{4D}\int_0^t dt\left(\frac{dx}{dt}\right)^2\right], \tag{38.22} $$
where $\mathcal{D}$ is the 'uniform measure'449 on the set of continuous functions $[0,t] \to \mathbf{R}^3$. This is the Feynman-Kac formula for the heat kernel.
38.12 Feynman-Kac path integral. The Green's function for
$$ (\partial_t - D\Delta + V)u = 0 \tag{38.23} $$
with $u \to 0$ in the $|x| \to \infty$ limit can be written as
$$ g(x,t) = \int_{x(0)=0}^{x(t)=x}\mathcal{D}[x(\cdot)]\exp\left[-\int_0^t dt\left(\frac{1}{4D}\left(\frac{dx}{dt}\right)^2 + V(x(t))\right)\right], \tag{38.24} $$
where $V$ is a function bounded from below.450
448 We can invert this to get $g(x,t) = (4\pi Dt)^{-d/2}\exp(-x^2/4Dt)$.
449 This is a very delicate object, but is definable in a certain sense. However, in these days, mathematicians seem to avoid this altogether. Cf. 20.2 Discussion (1).
450 A good introductory book on this subject may be R. P. Feynman, Statistical Mechanics, Chapter 3 (Benjamin, 1972). This path integral is well defined as a Lebesgue integral on the set of continuous functions. For the Schrödinger equation, we must replace $t$ with $it$. This replacement completely destroys the currently available justification of the formula as a Lebesgue integral.
39 Green's Function: Helmholtz Equation
The Helmholtz equation results from diffusion and wave equations. Its Green's functions are constructed with the aid of generalized function theory. To single out the physically meaningful solution, we need an extra condition (radiation condition).
Key words: Helmholtz equation, radiation condition, analogue of Green's formula
Summary:
(1) If the region is finite, then there is no special difficulty compared with the Laplace case (→39.2).
(2T) Juggling of generalized functions in 39.4-6 seems to be the simplest way to obtain the physically meaningful Green's function. If the reader can follow the logic, that is enough. However,
(3) She must understand that a special condition is needed to guarantee the causality in the solution (Sommerfeld's radiation condition) (→39.6).
39.1 Helmholtz equation. The Helmholtz equation (→27A.24, 16C.4)
$$ -(\Delta + \kappa^2)\psi = 0 \tag{39.1} $$
appears when we Laplace transform (→33) the diffusion equation, or when we Fourier transform the wave equation (in this case $\kappa^2 = \omega^2/c^2$).
Convention. We will use the time Fourier transform with $e^{i\omega t}$. That is,
$$ \psi(t) = \frac{1}{2\pi}\int_{-\infty}^{+\infty}d\omega\,\psi(\omega)e^{-i\omega t}. \tag{39.2} $$
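39.2 Green's function for Helmholtz equation on bounded domain. The formal formula for the Green's function is immediately obtained from the formal solution in, say, 35.2 or 37.7. Let us solve
$$-(\Delta + \kappa^2)G = 1 \tag{39.3}$$
on a region $D$ under the homogeneous boundary condition at the boundary $\partial D$. We know that the Laplacian has a set of eigenkets $\{|\lambda\rangle\}$ which makes an orthonormal basis (→37.1; $\sum_\lambda |\lambda\rangle\langle\lambda| = 1$ →20.15). Sandwiching (39.3) with an eigenbra and an eigenket, and writing $-\Delta|\lambda\rangle = \lambda|\lambda\rangle$, we get
$$(\lambda - \kappa^2)\langle\lambda|G|\lambda'\rangle = \delta_{\lambda\lambda'}, \tag{39.4}$$
so that we obtain
$$G = \sum_\lambda \frac{|\lambda\rangle\langle\lambda|}{\lambda - \kappa^2}. \tag{39.5}$$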
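A minimal numerical sketch of the eigenfunction expansion idea of 39.2, under assumptions chosen here for illustration only: the interval $[0, a]$ in 1-space with the Dirichlet condition, where the normalized eigenfunctions of $-d^2/dx^2$ are $\sqrt{2/a}\sin(n\pi x/a)$ with eigenvalues $(n\pi/a)^2$, and the expansion taken in the form $\sum_\lambda |\lambda\rangle\langle\lambda|/(\lambda - \kappa^2)$. The names `green_series` and `green_closed` are hypothetical. The truncated sum is compared with the closed form obtained by patching sine solutions at $x = x'$.

```python
import math

def green_series(x, xp, kappa, a=1.0, n_terms=2000):
    """Truncated eigenfunction expansion of G on [0, a] with G = 0 at both ends."""
    total = 0.0
    for n in range(1, n_terms + 1):
        k_n = n * math.pi / a  # square root of the n-th eigenvalue of -d^2/dx^2
        total += math.sin(k_n * x) * math.sin(k_n * xp) / (k_n**2 - kappa**2)
    return 2.0 / a * total

def green_closed(x, xp, kappa, a=1.0):
    """Closed form: sine solutions matched at x = x' with a unit jump in -dG/dx."""
    lo, hi = min(x, xp), max(x, xp)
    return (math.sin(kappa * lo) * math.sin(kappa * (a - hi))
            / (kappa * math.sin(kappa * a)))

if __name__ == "__main__":
    print(green_series(0.3, 0.7, 2.0), green_closed(0.3, 0.7, 2.0))
```

The sketch only makes sense when $\kappa^2$ is not an eigenvalue; otherwise a denominator vanishes and no Green's function exists for this boundary condition.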
39.3 Example: Neumann condition on a rectangular region. The Green's function under a homogeneous Neumann condition (i.e., Neumann's function) for the Helmholtz equation in the rectangular domain $[0, a] \times [0, b]$ can be obtained as
$$N(x, y|x', y') = \frac{4}{ab}\sum_{n \ge 0,\, m \ge 0,\, nm \ne 0} \frac{\cos(n\pi x/a)\cos(n\pi x'/a)\cos(m\pi y/b)\cos(m\pi y'/b)}{(n\pi/a)^2 + (m\pi/b)^2 - \kappa^2}. \tag{39.6}$$
39.4 Green's function for the whole space. We wish to solve
$$-(\Delta + \kappa^2)u = \delta(\mathbf{r} - \mathbf{r}_0) \tag{39.7}$$
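with the boundary condition $|u| \to 0$ as $|\mathbf{r}| \to \infty$. We interpret this equation in the generalized function sense (→14). After Fourier transforming this (→32C.9), we obtain
$$(k^2 - \kappa^2)\hat{u} = e^{i\mathbf{k}\cdot\mathbf{r}_0}. \tag{39.8}$$
Recalling 14.17(2), we can solve this equation as
$$\hat{u} = \hat{h}(k)e^{i\mathbf{k}\cdot\mathbf{r}_0}, \tag{39.9}$$
$$\hat{h}(k) = P\frac{1}{k^2 - \kappa^2} + C\delta(k^2 - \kappa^2), \tag{39.10}$$
where $C$ is a constant, but may depend on $\kappa$. This can be rewritten as (→32C.13, 8B.12)
$$\hat{h}(k) = \frac{1}{2\kappa}\left\{\left[P\left(\frac{1}{k - \kappa}\right) + C\delta(k - \kappa)\right] - \left[P\left(\frac{1}{k + \kappa}\right) - C\delta(k + \kappa)\right]\right\}. \tag{39.11}$$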
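The Fourier inverse transform of $\hat{u}$ is given by the convolution of $h(\mathbf{r})$ and $\delta(\mathbf{r} - \mathbf{r}_0)$ (→32A.2),
$$h(\mathbf{r}) = \frac{1}{4\pi^2 i r}\int_{-\infty}^{\infty} dk\, \frac{k e^{ikr}}{2\kappa}\left\{\left[P\left(\frac{1}{k - \kappa}\right) + C\delta(k - \kappa)\right] - \left[P\left(\frac{1}{k + \kappa}\right) - C\delta(k + \kappa)\right]\right\}. \tag{39.12}$$
Here the angular integral has already been performed.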
39.5 How to interpret the formal solution (39.11)? Using the Plemelj formula (→32C.13), we can rewrite
$$P\left(\frac{1}{k - \kappa}\right) + C\delta(k - \kappa) = \lim_{\epsilon \to +0}\frac{1}{k - \kappa \pm i\epsilon} + (C \pm i\pi)\delta(k - \kappa), \tag{39.13}$$
and
$$P\left(\frac{1}{k + \kappa}\right) - C\delta(k + \kappa) = \lim_{\epsilon \to +0}\frac{1}{k + \kappa \pm i\epsilon} - (C \mp i\pi)\delta(k + \kappa). \tag{39.14}$$
Thus, there are four combinations of + and - for (39.11). Consequently, we need an extra condition to select a solution.
39.6 Radiation condition (Ausstrahlungsbedingung). The extra condition to single out the physically meaningful solution from (39.11) is
(39.15)
for $r \to \infty$. This condition is called the Ausstrahlungsbedingung (out-radiating condition, due to Sommerfeld). This requires that - must be chosen in (39.13) and + in (39.14): the integrand in (39.11) now reads
$$\left\{\frac{1}{k - \kappa - i\epsilon} - \frac{1}{k + \kappa + i\epsilon} + (C - i\pi)[\delta(k - \kappa) + \delta(k + \kappa)]\right\}e^{ikr}. \tag{39.16}$$
Choosing $C = i\pi$, we can remove the unwanted $e^{-i\kappa r}$. Thus, we can get
(39.17)
That is,
$$G(\mathbf{r}|\mathbf{r}_0) = \frac{\exp(i\kappa|\mathbf{r} - \mathbf{r}_0|)}{4\pi|\mathbf{r} - \mathbf{r}_0|}, \tag{39.18}$$
which is called the retarded Green's function (cf. 40.1, 16C.1).
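A minimal numerical sketch of what the selection in 39.6 achieves. It assumes the common textbook form of Sommerfeld's condition in 3-space, $r(\partial_r G - i\kappa G) \to 0$ as $r \to \infty$, which may differ in presentation from (39.15); all function names are hypothetical. The outgoing wave (39.18) passes the test, while the incoming wave $e^{-i\kappa r}/4\pi r$ does not, although both solve the radial Helmholtz equation for $r > 0$.

```python
import cmath
import math

def green_outgoing(r, kappa):
    """Candidate from (39.18): exp(i*kappa*r) / (4*pi*r), with r = |r - r0| > 0."""
    return cmath.exp(1j * kappa * r) / (4 * math.pi * r)

def green_incoming(r, kappa):
    """The rejected alternative: exp(-i*kappa*r) / (4*pi*r)."""
    return cmath.exp(-1j * kappa * r) / (4 * math.pi * r)

def radial_helmholtz_residual(G, r, kappa, h=1e-4):
    """Residual of (1/r) d^2(r G)/dr^2 + kappa^2 G, by a central difference."""
    u = lambda s: s * G(s, kappa)
    second = (u(r + h) - 2 * u(r) + u(r - h)) / h**2
    return second / r + kappa**2 * G(r, kappa)

def radiation_residual(G, r, kappa, h=1e-4):
    """r * (dG/dr - i*kappa*G); it decays like 1/r only for the outgoing wave."""
    dG = (G(r + h, kappa) - G(r - h, kappa)) / (2 * h)
    return r * (dG - 1j * kappa * G(r, kappa))

if __name__ == "__main__":
    for G in (green_outgoing, green_incoming):
        print(G.__name__, abs(radiation_residual(G, 100.0, 2.0)))
```

For the incoming wave the residual tends to $\kappa/2\pi$ in modulus instead of vanishing, which is why the extra condition is needed.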
39.7 Green's functions for 2- and 1-spaces. With the aid of an analogous consideration, we can write down G in 2- and 1-space. For 2-space (→27A.20 for $H_0^{(1)}$)
(39.19)
For 1-space
$$G(r|r_0) = \frac{i}{2\kappa}\exp(i\kappa|r - r_0|). \tag{39.20}$$
The difference comes only from the angular integration.
39.8 Analogue of Green's formula. The equation corresponding to Green's formula 16A.19 is immediately obtained from Green's formula for the Laplace equation as
How to use it should now also be obvious (→16A.21).
40 Green's Function: Wave Equation
The Green's functions of wave equations are constructed directly or from those of the Helmholtz equation. The radiation condition implies the specification of the time arrow.
Key words: retarded and advanced Green's function, propagator, afterglow effect, Helmholtz formula, causality, time arrow
Summary:
(1) If we use the retarded Green's function for the Helmholtz equation, we can obtain the retarded Green's function (→40.1).
(2) For wave equations the time arrow is selected by the radiation condition.
40.1 Fundamental solution. A fundamental solution to the wave equation satisfies
$$\Box w(t, x; t', x') = \delta(t - t')\delta(x - x'), \tag{40.1}$$
where
$$\Box \equiv c^{-2}\partial_t^2 - \Delta \tag{40.2}$$
is called the d'Alembertian. Fourier-transforming this with respect to time (with the convention (39.2)), we obtain (→39.1, 27A.24)
$$-(\Delta + \kappa^2)w(\omega, x; t', x') = e^{i\omega t'}\delta(x - x') \tag{40.3}$$
with $\kappa = \omega/c$. Thus basically this is the same as the problem of finding a fundamental solution for the Helmholtz equation in the whole space. If we use the retarded Green's function for the Helmholtz equation (→39.6), then inverse Fourier transformation gives
$$w(t, x; t', x') = \frac{1}{2\pi}\int_{-\infty}^{+\infty} d\omega\, e^{-i\omega(t - t')}\frac{e^{i\omega|x - x'|/c}}{4\pi|x - x'|}. \tag{40.4}$$
This can easily be integrated to give (→32C.8)
$$w(t, x; t', x') = \frac{1}{4\pi|x - x'|}\delta(t - t' - |x - x'|/c). \tag{40.5}$$
Note that this is zero for any $t < t'$. This function is the Green function for 3-space, and is called the retarded Green function.
Discussion.
In terms of the retarded Green's function, the inhomogeneous wave equation
$$\Box u = q \tag{40.6}$$
can be solved as
$$u(t, x) = \frac{1}{4\pi}\int_{|x - y| \le ct}\frac{q(t - |x - y|/c,\, y)}{|y - x|}\,dy. \tag{40.7}$$
Here $q$ is evaluated at the retarded time, as (40.5) requires, and $q$ is assumed to vanish for $t < 0$. This formula is called Duhamel's formula.
40.2 Advanced Green's function. We see above that the radiation condition (→39.6) imposes time reversal asymmetry (causality). Since the wave equation itself is time-reversal symmetric, the time reversed (40.5) should also be a solution to (40.1):
$$w_A(t, x; t', x') = \frac{1}{4\pi|x - x'|}\delta(t - t' + |x - x'|/c). \tag{40.8}$$
Note that this is zero for $t > t'$ everywhere. This is anti-causal, and is called the advanced Green's function.
40.3 Propagator. A fundamental solution $K(t, x; t', x')$ satisfying the boundary condition and symmetric in time is called the propagator of the problem. Its existence should be clear from the advanced and retarded Green's functions discussed above. The retarded Green's function is related to the propagator as
$$G(t, x; t', x') = \Theta(t - t')K(t, x; t', x'), \tag{40.9}$$
where $\Theta$ is the Heaviside step function. The fundamental solution satisfying the boundary condition and causality is called the retarded Green's function.
40.4 Symmetry of propagator.
$$K(t, x|t', x') = K(t - t', x|0, x'). \tag{40.10}$$
This time translation symmetry directly follows from 40.1. This formula implies
$$K(t, x|t', x') = K(-t', x|-t, x'), \tag{40.11}$$
and consequently
$$\partial_t K(t, x|t', x') = -\partial_{t'}K(t, x|t', x'). \tag{40.12}$$
They imply that
$$K(t, x|t', x') = -K(t', x|t, x'). \tag{40.13}$$
Analogously
$$K(t, x|t', x') = K(t, x'|t', x), \tag{40.14}$$
so that we get
$$K(t, x|t', x') = -K(t', x'|t, x). \tag{40.15}$$
40.5 Eigenfunction expansion of propagator. Introducing the eigenfunctions $\{|\lambda_n\rangle\}$ of the Laplacian with an appropriate homogeneous boundary condition (Dirichlet, Robin or Neumann condition) such that $-\Delta|\lambda_n\rangle = \lambda_n|\lambda_n\rangle$, we can separate the wave equation, to get
$$K(t, x|t', x') = \langle x|\left\{\sum_n |\lambda_n\rangle\frac{c\,\sin[c\sqrt{\lambda_n}(t - t')]}{\sqrt{\lambda_n}}\langle\lambda_n|\right\}|x'\rangle. \tag{40.16}$$
Here, if $\lambda_0 = 0$ (this happens only when the Neumann condition is imposed), the sine term is computed with the aid of l'Hospital's rule.
40.6 Propagator in infinite space. From (38.4) and the symmetry we can easily guess that^451
$$K(t, x|0, 0) = \frac{1}{4\pi x}[\delta(t - x/c) - \delta(t + x/c)]. \tag{40.17}$$
This is indeed the right answer, as can be computed from the continuum version of (40.16):
$$K(t, x|0, 0) = \frac{c}{(2\pi)^3}\int d^3k\, \frac{\sin(ckt)}{k}e^{i\mathbf{k}\cdot\mathbf{x}} = \frac{c}{(2\pi)^3}\frac{4\pi}{x}\int_0^\infty dk\, \sin(ckt)\sin(kx). \tag{40.18}$$
See 32C.8 Exercise.
40.7 Propagator in 2- and 1-spaces. For 2-space,
$$K^{(2)}(t, x|0, 0) = \mathrm{sgn}(t)\frac{\Theta(|t| - x/c)}{2\pi(t^2 - x^2/c^2)^{1/2}}, \tag{40.19}$$
and for 1-space
(40.20)
Of course, they can be obtained by integrating unnecessary coordinates out from the 3-space version (→16C.3).
40.8 Afterglow revisited. We can see explicitly from $G^{(2)}$, obtainable from $K^{(2)}$, that $G^{(2)} > 0$ for $|x| < ct$, but this does not happen for 3-space. This is the afterglow in even dimensional spaces (→16C.4, 32D.10).
40.9 Helmholtz formula. The solution in 3-space to
$$\Box\psi(t, x) = \varphi(t, x) \tag{40.21}$$
can be written as
$$\psi(t, x) = \int_{T_1}^{t} dt' \int_\Omega dx'\, G(t, x; t', x')\varphi(t', x') + \int_{T_1}^{t} dt' \int_{\partial\Omega} d\sigma(x') \left[G(t, x; t', x')\frac{\partial\psi}{\partial n(x')} - \psi\frac{\partial}{\partial n(x')}G(t, x; t', x')\right] + \frac{1}{c^2}\int_\Omega dx'\,[G\partial_{t'}\psi - \psi\partial_{t'}G]_{t'=T_1}. \tag{40.22}$$
Just as in the case of the Helmholtz equation (→39.8), this is not the formula describing $\psi$ in terms of the initial and boundary values.
[Demo] Just as in the proof of Green's formula (→16A.19), we get
$$\int_{T_1}^{T_2} dt \int_\Omega dx\,[(\Box f)g - f\Box g] = -\int_{T_1}^{T_2} dt \int_{\partial\Omega} d\mathbf{S}\cdot[g\nabla f - f\nabla g] + \int_\Omega dx\,\frac{1}{c^2}[g\partial_t f - f\partial_t g]_{T_1}^{T_2}. \tag{40.23}$$
Take $f$ to be the retarded Green's function (→40.1), and $g$ to be the solution to (40.21), then this can be rewritten as the desired formula.
40.10 General causal solution. In (40.22) the surface integrals over the boundary of the 4-volume $\Omega \times [T_1, t]$ describe the effects of the incoming waves into $\Omega$ from the past. Hence this can be rewritten as
$$\psi(t, x) = \psi_{in}(t, x) + \int_{T_1}^{t} dt' \int_\Omega dx'\, G(t, x; t', x')\varphi(t', x'). \tag{40.24}$$
Here $\psi_{in}$ denotes the incoming wave. The Ausstrahlungsbedingung (→39.6) on $\psi$ implies that $\psi_{in} \to 0$ when $\Omega \to \mathbf{R}^3$ and $T_1 \to -\infty$.
41 Colloquium: What Is Computation?
The section contains several deep examples of conceptual analysis - of computer, computation, algorithm, randomness, etc. In A, Church's idea on computation is outlined. Turing's idea in B made this idea convincing. Here Turing machines are outlined. With these preparations, decision problems are briefly discussed in C. With the aid of one of the main results of this part, computability and noncomputability in elementary calculus are outlined in D. Part E explains the basic idea of algorithmic randomness. In the final part F, algorithmic randomness is reconsidered from a more fundamental point of view. The concepts in A-C and E at least must be a part of the elementary knowledge of any civilized person.
Keywords: computer, algorithm, recursive function, Church's thesis, Turing machine, universal Turing machine, Turing computability, decision problem, halting problem, recursive set, recursively enumerable set, effectiveness, computable real, computable function, Myhill's theorem (on differentiability).
Summary
(1) The basic content of this section should be a rudimentary part of the knowledge of every intellectual person.
(2) The reader must be able to explain to her lay friends what computation is (41A.11-12) and what Turing machines are (41B.1).
(3) The concept of effectiveness must be understood (41D.4).
(4) The reader must be able to explain algorithmic randomness (41E.3).
41.A Recursive Functions and Church Thesis
41A.1 What is a computer? It is a device to perform computation. Then, what do we mean by 'computation'? Intuitively, a computation is a transformation of a finite sequence of symbols by a finite number of applications of finitely many well-defined rules (algorithms). This aspect does not change even if we wish to consider the so-called quantum computation. Since we consider precise computations only, i.e., computations without any roundoff errors, we have only to consider procedures to transform a nonnegative integer into another nonnegative integer (i.e., N into itself).^449
41A.2 Arithmetic function. A function which maps N into itself is called an arithmetic function or a number theoretic function. Thus we have only to consider the computation of arithmetic functions.
41A.3 Remark. A trivial fact should be kept in mind: we cannot understand the crucial concept 'computer' by analyzing the material basis of an actual computer. There are important things about the physical world on which materialistic physics cannot say anything meaningful.
41A.4 Obviously computable functions. To characterize precisely the procedure we call 'computation',^450 we must start with intuitively obviously computable arithmetic functions S, U and C:
(A) $S(x) = x + 1$,
(B) $U_i^n(x_1, \cdots, x_n) = x_i$,
(C) $C_m^n(x_1, \cdots, x_n) = m$.
S is the function giving the successor (in N) to x, and $U_i^n$ is the projection operator selecting the i-th coordinate out of n coordinates. $C_m^n$ assigns a constant $m\ (\in N)$ to $\{x_1, \cdots, x_n\}$. These are the functions probably everybody thinks computable.
41A.5 Basic operations on functions. Combining these elementary functions, we can produce more complicated functions. What kind of procedures should we allow as obviously doable?^451 The following three procedures (41A.6-8) are regarded as unambiguously doable and elementary:
449 When we consider the computability of, e.g., irrational numbers, we must carefully treat the computational errors. Roughly speaking, something is computable if there is an algorithm which gives it within any specified error bar (→41D.2). See M. B. Pour-El and J. I. Richards, Computability in Analysis and Physics (Springer, 1989). Recently, Blum et al. have developed a theory of computation over reals: L. Blum, M. Shub and S. Smale, "On a theory of computation and complexity over the real numbers: NP-completeness, recursive functions and universal machines," Bull. Amer. Math. Soc., 21, 1-46 (1989). In the theory of computation we use here, that is, in the traditional theory of computation, a real number is considered as a string of bits. In contrast, in the new theory a real number is not viewed as its decimal or binary expansion, but rather as a mathematical entity.
450 An extremely efficient exposition can be found in the beginning part of R. I. Soare, Recursively Enumerable Sets and Degrees - a study of computable functions and computably generated sets - (Springer, 1987). A classic may be M. Davis, Computability and Unsolvability (Dover, 1982).
451 In the following, in contrast to the ordinary definition of functions in analysis, they are partial functions, i.e., $f(x_1, \cdots, x_n)$ need not make sense (need not be defined) for all n-tuples $(x_1, \cdots, x_n)$. We simply stop assigning a value to f when
41A.6 Composition. The first admissible procedure (I) is: Suppose we already have functions $g_1, \cdots, g_m$ and $h$. From these we can make
$$f(x_1, \cdots, x_n) = h(g_1(x_1, \cdots, x_n), \cdots, g_m(x_1, \cdots, x_n)), \tag{41.1}$$
where h is a function of m variables, and the $g_i$ are functions of n variables. There should not be any difficulty in accepting this.
41A.7 (Primitive) Recursion (II). Suppose we have a function g of n variables and a function h of n + 2 variables. We define a function f of n + 1 variables by first setting
$$f(x_1, \cdots, x_n, 0) = g(x_1, \cdots, x_n). \tag{41.2}$$
Then, starting with this value we can recursively construct $f(x_1, \cdots, x_n, m)$ for $m \in N$ as follows:
$$f(x_1, \cdots, x_n, m) = h(x_1, \cdots, x_n, m - 1, f(x_1, \cdots, x_n, m - 1)). \tag{41.3}$$
This is allowed for any $m \in N$. This may cause some practical problem, but if we are patient enough, for any fixed m there should not be a problem.
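A minimal sketch of the basic functions (A)-(C) and of the procedures (I) and (II), written as ordinary Python closures. The helper names `compose` and `prim_rec`, and the derived functions `add` and `mul`, are hypothetical illustrations; they only show that addition and multiplication arise from the successor, projections and constants by composition and primitive recursion.

```python
def S(x):
    """(A) successor."""
    return x + 1

def U(n, i):
    """(B) projection: selects the i-th of n arguments (i counted from 1)."""
    return lambda *xs: xs[i - 1]

def C(n, m):
    """(C) constant function of n arguments with value m."""
    return lambda *xs: m

def compose(h, *gs):
    """(I) f(x1..xn) = h(g1(x1..xn), ..., gm(x1..xn))."""
    return lambda *xs: h(*(g(*xs) for g in gs))

def prim_rec(g, h):
    """(II) f(x.., 0) = g(x..);  f(x.., m) = h(x.., m - 1, f(x.., m - 1))."""
    def f(*args):
        *xs, m = args
        value = g(*xs)
        for k in range(m):          # k plays the role of m - 1 at each step
            value = h(*xs, k, value)
        return value
    return f

# add(x, 0) = x;  add(x, m) = S(add(x, m - 1))
add = prim_rec(U(1, 1), compose(S, U(3, 3)))
# mul(x, 0) = 0;  mul(x, m) = add(x, mul(x, m - 1))
mul = prim_rec(C(1, 0), compose(add, U(3, 1), U(3, 3)))

if __name__ == "__main__":
    print(add(3, 4), mul(6, 7))
```

The loop in `prim_rec` runs exactly m times, which reflects the remark that the construction always ends for any fixed m.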
41A.8 Minimalization (or unbounded search) (III). Let $f(x_1, \cdots, x_n)$ be a total function.^452 Then, we can make a function $h(x_1, \cdots, x_{n-1})$ which gives the smallest $x_n$ satisfying $f(x_1, \cdots, x_n) = 0$ for each $\{x_1, \cdots, x_{n-1}\}$. (Here f being a total function is crucial.)
(I)-(II) and (III) are markedly different: while the procedures (I) and (II) are guaranteed to end within a finite number of steps, (III) is not guaranteed to end; there may not be any solution (→41A.10).
41A.9 Partial recursive functions. The functions generated by applying the basic operations (I)-(III) finitely many times to the basic functions (A)-(C) are called partial recursive functions.
41A.10 Algorithm and partial recursive functions. An algorithm is a finite set of well-defined finite means. We may say that partial recursive functions are the functions for which we have algorithms to compute them (disregarding whether we can complete the procedure).
it is not defined. If f is defined for all $(x_1, \cdots, x_n)$, we say the function is a total function.
452 That is, it is defined for any $x_1, \cdots, x_n \in N$.
The operation III (minimalization) is a very tricky procedure. Certainly, we can compute the value of $f(x_1, \cdots, x_{n-1}, m)$ for a fixed choice of the set $\{x_1, \cdots, x_{n-1}\}$ for any m, because f is a total function. Putting m into f one by one in increasing order starting from m = 0, we can ask whether $f(x_1, \cdots, x_{n-1}, m)$ vanishes or not. If f vanishes for the first time with m = q, then we know $h(x_1, \cdots, x_{n-1}) = q$.
However, we do not know whether such an integer ever exists. Thus, although we have an algorithm to perform the recursive procedures (we explicitly know each step), we do not know whether we can ever finish it, since there is no guarantee of the existence of the solution. For partial recursive functions, the fact that the procedure has not yet been completed can imply either that the function is not defined for the given instance of variables (that is, no solution exists for $f(x_1, \cdots, x_{n-1}, m) = 0$) or that the function has a value for the given instance but further computation is required to obtain it.
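A minimal sketch of the unbounded search (III). The name `minimalize` and the optional step budget `max_steps` are assumptions added for illustration; the budget is not part of the definition, it only makes visible the ambiguity just described: when the budget runs out we cannot tell whether no solution exists or whether more computation is needed.

```python
def minimalize(f, max_steps=None):
    """Return h with h(x..) = smallest m such that f(x.., m) == 0.

    With max_steps=None the search is truly unbounded and may never return.
    With a budget, None is returned when the budget is exhausted, which says
    nothing about whether a solution exists beyond the budget.
    """
    def h(*xs):
        m = 0
        while max_steps is None or m < max_steps:
            if f(*xs, m) == 0:
                return m
            m += 1
        return None
    return h

# Total function: vanishes as soon as m*m >= x, so the search always ends.
ceil_sqrt = minimalize(lambda x, m: 0 if m * m >= x else 1)

# Total function whose zero exists only if x is a perfect square.
exact_sqrt = minimalize(lambda x, m: 0 if m * m == x else 1, max_steps=100)

if __name__ == "__main__":
    print(ceil_sqrt(10), exact_sqrt(9), exact_sqrt(10))
```

Without the budget, `exact_sqrt(10)` would run forever: the resulting function is partial even though the function being searched is total.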
41A.11 Recursive functions. More convenient functions should be the ones for which not only each step of construction is explicitly known, but also there is a guarantee that its construction (computation) ends with a finite number of steps. They are the recursive functions:
Definition. Total partial recursive functions are called recursive functions.
Notice that there are only countably many recursive functions, but of course uncountably many non-recursive functions.
41A.12 Church's Thesis. Church proposed: Computable functions are recursive functions. □^453
We may say that Church proposed a definition of the word 'computable.'
41A.13 Remark. The crucial elements of the thesis are an explicit description of construction procedures,^454 and the guarantee that the procedure always ends with finite steps ('finitary dogma').
41A.14 The thesis was not well accepted initially. Church's thesis was not recognized as a really convincing definition of computability when it was first proposed. The reason for this is that there was no clear feeling about explicitly describable construction procedures. There may be a completely novel type of algorithm which is not recursive, so that the thesis may be under the limitation of the contemporary mathematics. A more general objection was: even apparently intuitively obvious concepts such as 'continuity' require their definitive characterization by the axioms of topological spaces; one must be cautious when one says 'so-and-so' is intuitively obvious.^455
453 Some people identify computable functions with partial recursive functions, and call this identification Church's thesis. For example, M. Li and P. Vitányi, An Introduction to Kolmogorov Complexity and Its Applications (Springer, 1993); J. E. Hopcroft and J. D. Ullman, Introduction to Automata Theory, Languages and Computation (Addison Wesley, 1979) are among them.
454 Church's thesis implies that logical inference is possible only when the procedure can be made purely syntactic. It is a sort of ultimate reductionism.
41A.15 Turing machine was crucial. The thesis became really convincing after the work of Turing (Turing machine, see the next subsection), who started with the analysis of the limitations of our sensory and mental apparatus. Church himself wrote that Turing-computability has the advantage of making the identification with effectiveness in the ordinary sense evident immediately without any preliminary theorems.
41.B Turing Machine
41B.1 Turing machine. A Turing machine consists of a two-way infinite tape divided into cells, a read-write head which can scan one cell of the tape at a time, and a 'black box' with a finite number of internal states. A Turing machine may essentially be identified with its program:
41B.2 Turing program. A Turing program is a finite set of four-tuples (q, S, *, q'). Here
(1) q and q' are the internal states of the (black box of the) Turing machine,
(2) S is the symbol in the cell currently being scanned by the head, and
(3) * is R, L or S': R (L) implies that at the next step the head moves one cell to the right (resp. left), and S' implies that the head replaces the tape symbol S with S' without moving.
Thus (q, S, *, q') implies that if the Turing machine reads S when its internal state is q, then the head does * and the internal state becomes q'. Usually the symbol in a cell is either blank B or 1. □
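A minimal sketch of a simulator for such four-tuple programs. Several choices here are assumptions made for illustration and are not fixed by the text: the input x is coded as x consecutive 1's with the head on the leftmost one, the state names are strings, a machine with no applicable tuple is reported as stuck (`None`), and a step budget `max_steps` stands in for "never halts". The names `run_turing` and `SUCCESSOR` are hypothetical.

```python
from collections import defaultdict

def run_turing(program, ones, start="q1", halt="qH", max_steps=10_000):
    """Run a set of four-tuples (q, S, action, q') on a tape holding `ones` 1's.

    action is "R", "L", or a tape symbol ("B" or "1") to be written in place.
    Returns the number of 1's on the tape at the halting state, or None if
    the machine gets stuck or exceeds the step budget.
    """
    table = {(q, s): (action, q_next) for (q, s, action, q_next) in program}
    tape = defaultdict(lambda: "B")          # two-way infinite blank tape
    for i in range(ones):
        tape[i] = "1"
    head, state, steps = 0, start, 0
    while state != halt:
        if steps >= max_steps or (state, tape[head]) not in table:
            return None
        action, state = table[(state, tape[head])]
        if action == "R":
            head += 1
        elif action == "L":
            head -= 1
        else:
            tape[head] = action              # write without moving
        steps += 1
    return sum(1 for symbol in tape.values() if symbol == "1")

# x -> x + 1: step left of the block of 1's (if any) and write one more 1.
SUCCESSOR = [
    ("q1", "1", "L", "q2"),
    ("q1", "B", "1", "qH"),
    ("q2", "B", "1", "qH"),
]

# A program that moves right forever, whatever it reads.
RUNAWAY = [("q1", "1", "R", "q1"), ("q1", "B", "R", "q1")]

if __name__ == "__main__":
    print(run_turing(SUCCESSOR, 3), run_turing(RUNAWAY, 3, max_steps=50))
```

The budget only reports that the machine has not halted yet; it cannot certify that the machine never halts.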
41B.3 Turing's motivation. Turing arrived at the concept of Turing machine by analyzing the limitations of our sensory and mental apparatus. The restrictions required are:
(i) only finitely many symbols are allowed in each cell,
(ii) the computer can see only a finite number of cells at a time,
(iii) at each time step the computer may alter the contents of a single cell,
(iv) the range of scanning is finite,
(v) there is an upper bound to the number of states of mind; only a fixed finite set of instructions can be performed.
In short, Turing conceived our brains as finite machines (finite number of distinguishable symbols, finite number of rules or procedures, finite number of distinguishable internal states, etc.).
455 R. Gandy, "The Confluence of Ideas in 1936," in The Universal Turing Machine, a half-century survey (Oxford UP, 1988): Gödel as well as Post did not accept the thesis.
41B.4 How to operate Turing machine. We begin with the machine in its starting state $q_1$ with its head at the leftmost 1 on the tape where x is coded.^456 There is a special state $q_H$ called the halting state. The number y of 1's left on the tape when the Turing machine comes to $q_H$ is the output of the Turing machine. That is, the Turing machine defines a function $x \to y$.
41B.5 Turing computable function. The Turing machine operated as in the previous entry may never halt, so that a Turing machine generally defines a partial function. If a Turing machine always halts for any input, the Turing machine defines a total function. In this case we say the total function is a Turing computable function.
41B.6 Turing computability = Church computability. A Turing machine which halts for all the inputs defines a recursive function, and any recursive function can be realized as a Turing machine which halts for all the inputs. □^457
That is, the set of all the functions computable by Turing machines is the totality of recursive functions.
41B.7 Remark. The identity of the Turing computability and computability due to Church made Church's thesis convincing to many (→41A.14-15). However, one might not accept all five restrictions for the human thinking ability stated by Turing (→41B.3). Gödel and Post always believed that a true account - an acceptable theory - of human mathematical intelligence must be nonmechanical. In particular Gödel has argued that in our ability to handle abstract concepts we are not subject to the restrictions described by Turing. These only apply when we are dealing with (potentially) concrete objects such as strings or symbols; they believed that a satisfactory theory of mathematical intelligence must take account of nonfinitary creative reasoning.
456 There are many different ways to code x. Choose one and fix it.
457 A proof may be found in Davis or Soare op. cit.
41B.8 Universal Turing machine. As is clear from the definition of Turing machines, each machine which halts for all the inputs defines a single recursive function. Turing machines are, so to speak, single function computers. Notice, however, that any Turing program can be coded in terms of positive integers (the Gödel number of the Turing machine program 41B.2). Hence we can imagine a 'master' computer which compiles the program in Gödel numbers into a set of (Turing executable) four-tuples to emulate the corresponding particular Turing machine. That is, there is a Turing machine which can emulate all the Turing machines. Such a Turing machine is called a universal Turing machine. Universal Turing machines may be regarded as idealized digital computers with infinite memory space.
41B.9 Universality of universal Turing machine. Universal Turing machines are not unique, but the following theorem guarantees that in essence all the universal Turing machines have the same computational power:
Theorem [Kolmogorov-Solomonoff]. Let M and M' be two universal Turing machines, and $\ell_M(x)$ be the length (in bits) of the shortest program for M to print out the output x. Then
$$\ell_M(x) \lesssim \ell_{M'}(x), \tag{41.4}$$
where $A(x) \lesssim B(x)$ implies that there is an x-independent positive constant c such that $A(x) \le B(x) + c$ for all x. □
41B.10 Absoluteness of Turing machine. What is impossible for a universal Turing machine cannot be done by any computer (or brain, if we accept Turing's analysis 41B.3). This statement is unaltered even if we consider quantum computation. However, if a computer or a machine can have access to a large oracle set (solution sets), the situation could be very different. This may be the reason for Gödel's and Post's objection to Turing's characterization of our brain function.
41.C Decision Problem
41C.1 Decision problem. Given a set of problems (or a set of instances of a problem, e.g., whether a polynomial has a as its root or not), we ask whether there is an algorithm to answer all the problems in the set. This problem is called a decision problem. If there is such an algorithm, we say the set (or the problem) is decidable. If not, we say it is undecidable. The word 'algorithm' was unclear before Turing, but now we can clearly state that 'algorithm = existence of Turing program.'
41C.2 Remark. If a set is finite, or the problem has only finitely many instances, then it is trivially decidable, because we can check all of them one by one blindly. The decision problem becomes nontrivial only if the problem has infinitely many instances like the one due to Diophantus (→41C.3(1)).
41C.3 Examples.
(1) Hilbert's 10th problem. Decide whether a polynomial $P(x_1, x_2, \cdots, x_n)$ with integer coefficients (Diophantine equations) has an integer root. This is decidable if n = 2,^458 but is undecidable for general n.^459
(2) Is $\exists x_1 \exists x_2 \exists x_3 \forall y_1 \cdots \forall y_m U$ true ($m \in N$)? Here, U is any logical formula within the first order logic^460 without including $\exists$, $\forall$ and free object variables. This is undecidable.
41C.4 Halting problem of Turing machine. Suppose we have a Turing machine T. We feed programs to it, and ask whether it ever stops (that is, whether the solution is given within a finite time or not (→41B.4)). Certainly, we can run the program on T, but the fact that the machine has not yet stopped does not tell us anything about its final result. Is there any algorithm to judge whether a program α ever gives a solution when run on T? This is called the halting problem.
41C.5 Halting problem is undecidable. Suppose there is a desired algorithm which works on a universal Turing machine (→41B.8) X. The meaning of the statement is this. In order to decide whether (T, α) (α is run on T) halts or not, we feed the Gödel number of T (or a code for the Turing program of T) and α to X. X stops after printing 1, if (T, α) halts.^461 Otherwise, X stops after printing 0 (or B). Now we demonstrate that the existence of such an X leads us to absurdity.
(1) We construct another universal Turing machine Y as follows: if X halts after printing 1, Y keeps moving its head to the right; otherwise Y stops after printing 0.
(2) Now we make the third universal Turing machine Z such that if Z is fed a program β, Z makes (β, β) and then does what Y does.
458 A. Baker, Phil. Trans. Roy. Soc. London A 263 (1968).
459 Ju. V. Matyasevich, 1970.
460 I recommend H. D. Ebbinghaus, J. Flum and W. Thomas, Mathematical Logic (Springer Undergraduate Texts in Mathematics, 1984; there is a new edition).
461 Here we identify T with its program or its Gödel number, so program (T, α) on a universal Turing machine means that the machine reads the program and emulates the machine T and then interprets α as T does.
(3) Suppose Z stops with the program Z. Then Y must halt with the input (Z, Z). That is, X halts with (Z, Z), printing 0. However, this means that Z does not halt with the program Z.
(4) Suppose Z does not stop with the program Z. Then Y keeps running with the input (Z, Z). Hence X halts after printing 0 when fed (Z, Z), but then Y must halt after printing 0. That is, Z must halt with the program Z.
Hence, we cannot decide whether (T, α) halts or not.
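A minimal sketch of the construction (1)-(2), with Python functions standing in for Turing programs (an assumption made only for illustration). `make_Z` builds Z from any claimed decider X; the particular candidate `X_pessimist`, which always prints 0, is a hypothetical example. Running Z on itself shows that this candidate is wrong on the instance (Z, Z).

```python
def make_Z(X):
    """Build Z from a claimed halting decider X(program, data) -> 1 or 0."""
    def Z(beta):
        if X(beta, beta) == 1:   # X claims that beta halts on beta ...
            while True:          # ... so Z never stops (Y keeps moving right)
                pass
        return 0                 # otherwise Z stops after printing 0
    return Z

def X_pessimist(program, data):
    """A candidate decider claiming that nothing ever halts."""
    return 0

Z = make_Z(X_pessimist)

if __name__ == "__main__":
    # Z(Z) returns, so (Z, Z) halts, although X_pessimist(Z, Z) printed 0.
    print(Z(Z), X_pessimist(Z, Z))
```

A candidate that prints 1 on (Z, Z) fails in the opposite way: Z would then loop forever, which cannot be demonstrated by running it but follows from reading `make_Z`. The argument in the text says that every candidate fails in one of these two ways.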
41C.6 Recursive set. A set whose characteristic function is a recursive function (→41A.11) is called a recursive set.
What this means is: if a set is a recursive set, then we have an algorithm to tell whether a given number is in the set or not. In this sense, we can recognize the members of the set without referring to how to generate the set.
In other words, a set is recursive if we can construct a Turing machine (or a program for a universal Turing machine) which prints 1 if the element^462 is in the set and 0 otherwise, within finitely many steps.
41C.7 Recursively enumerable set. A set which is the range of a recursive function is called a recursively enumerable set.
Hence, if a set is a recursively enumerable set, we know how to produce the set (there is a computer program which generates the set). We simply feed the elements of N one by one to the recursive function, and collect its outcomes.^463
In other words, we feed all the possible Turing programs into a Turing machine (we must demand that the machine surely stops for all the programs) and collect all the outputs.
41C.8 Theorem: There exists a recursively enumerable but not recursive (RENR) set. This is important, so a demonstration is given here.
Let $\phi_x(y)$ be the output (if any) of the Turing machine whose Gödel number is x (remember that there are only countably many Turing machines), when its input is y. Here, both x and y are in N. Make a set
$$K = \{x : \phi_x(x) \text{ is defined}\}. \tag{41.5}$$
That is, K is the set of all the numbers x such that the corresponding Turing machine halts with the input x. Certainly, this is a recursively enumerable set, because we know how to perform each step needed to compute $\phi_x(x)$, although we do not know whether it actually gives a number or not. Now define a function f such that
$$f(x) = \begin{cases} \phi_x(x) + 1, & \text{if } \phi_x(x) \text{ is defined}, \\ 0, & \text{otherwise}. \end{cases} \tag{41.6}$$
That is,
$$f(x) = \begin{cases} \phi_x(x) + 1, & \text{if } \chi_K(x) = 1, \\ 0, & \text{otherwise}, \end{cases} \tag{41.7}$$
where $\chi_K$ is the characteristic function of K. If K is recursive, then $\chi_K$ is recursive, so there must be a Turing machine which reproduces f. However, there cannot be such a Turing machine; if any, there must be an x such that $f(z) = \phi_x(z)$ for any $z \in N$, but obviously this is untrue for z = x. Thus we cannot assume that $\chi_K$ is a recursive function. (This is an example of the famous diagonal argument.)
462 Of course, this must be suitably encoded so that the machine can understand it.
463 In this case it is known that the enumeration can be done without repetition. See e.g., Zvonkin and Levin, op. cit. Theorem 0.4.
41C.9 Theorem. A set Q is recursive if and only if both Q and $Q^c$ are recursively enumerable. □
This should be obvious from the explanation in 41C.6.
41.D Computable Analysis
41D.1 Computable rational sequence. We say a rational number sequence $\{r_k\}$ is a computable rational number sequence, if there are recursive functions (→41A.11) a, b and s ($b \ne 0$) such that for any $k \in N$
$$r_k = (-1)^{s(k)}\frac{a(k)}{b(k)}. \tag{41.8}$$
41D.2 Effective convergence. Let $\{r_k\}$ be a computable rational sequence. We say it converges effectively to $x \in R$, if there is a recursive function e(N) such that
$$k \ge e(N) \Rightarrow |r_k - x| < 2^{-N}. \tag{41.9}$$
That is, if $\{r_k\}$ converges to x in the ordinary sense of this word and if there is an algorithm to estimate the error, we say $\{r_k\}$ converges effectively to x.
41D.3 Computable real number. x is a computable real number, if there is a computable rational number sequence effectively converging to x.
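A minimal sketch showing that $\sqrt{2}$ is a computable real in this sense, assuming the effective-convergence criterion reads "$k \ge e(N)$ implies $|r_k - x| < 2^{-N}$". The functions `a`, `b`, `s`, `r` and `e` mirror the symbols of 41D.1-2; the particular choice $a(k) = \lfloor \sqrt{2}\,2^k \rfloor$ is an illustration, not the only possible one. Everything is exact integer or rational arithmetic, so no roundoff enters.

```python
from fractions import Fraction
from math import isqrt

def a(k):
    """Numerator: floor(sqrt(2) * 2**k), computed exactly with integers."""
    return isqrt(2 * 4**k)

def b(k):
    """Denominator; never zero."""
    return 2**k

def s(k):
    """Sign exponent; sqrt(2) > 0, so the sign is always +."""
    return 0

def r(k):
    """k-th term r_k = (-1)**s(k) * a(k) / b(k) of the rational sequence."""
    return Fraction((-1) ** s(k) * a(k), b(k))

def e(N):
    """Modulus of convergence: k >= e(N) gives |r_k - sqrt(2)| < 2**-N."""
    return N

if __name__ == "__main__":
    print(r(e(10)), float(r(e(10))))
```

The modulus works because $r_k \le \sqrt{2} < r_k + 2^{-k}$ holds by the definition of the integer square root.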
41D.4 Remark: Effectiveness. We say we can do something effectively, if we have an algorithm. We say a concept is effective, if we can define it with an algorithm (for example, whether it is correct or not can be decided). An asymptotic object such as an irrational number is said to be an effective object when its construction and the distance (error) from the asymptotic limit can be estimated effectively. Thus, 'effectiveness' is a precise formalization of 'constructibility.'
41D.5 How to destroy effectiveness. Let A = {a(n)} be a RENR set without repetition (i.e., $a(n) \ne a(m)$, if $n \ne m$). We can compute each a(n), but we cannot effectively tell whether, say, 10 appears in A or not. Hence, if we can construct a procedure whose error estimate is bounded by $2^{-a(m)}$, then effective estimation is destroyed.
41D.6 Waiting lemma. Let A = {a(n)} be a RENR set (→41C.8) without repetition (i.e., $a(n) \ne a(m)$ if $n \ne m$). Let
$$w(n) \equiv \max\{m \,|\, a(m) \le n\}. \tag{41.10}$$
Then, there is no recursive function (→41A.11) c(n) such that
$$w(n) \le c(n). \tag{41.11}$$
That is, there is no algorithm to estimate the needed m so that $\{1, \cdots, n\} \cap A \subset \{a(1), \cdots, a(m)\}$. □
If c(n) were recursive, then we could tell whether $n \in A$ or not with a finite number of steps. First, compute c(n) = m, then check all a(m') for m' up to m. If we could find n among the outputs, certainly $n \in A$; if we could not, then $n \notin A$. Hence, A would be a recursive set, a contradiction.
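A minimal sketch of the decision procedure used in this argument. The toy enumeration `a` (the squares, indexed from 0) and the bound `c` are assumptions chosen so that the code can actually run; this toy set is recursive precisely because such a bound exists, whereas for a RENR set no recursive `c` can exist. The name `decide_membership` is hypothetical.

```python
def a(m):
    """Toy enumeration without repetition of A = {0, 1, 4, 9, ...}."""
    return m * m

def c(n):
    """A recursive bound with w(n) = max{m : a(m) <= n} <= c(n).

    For the squares w(n) is the integer square root of n, so n itself works.
    """
    return n

def decide_membership(n, a, c):
    """Decide n in A by checking a(m') for all m' up to the bound c(n)."""
    m = c(n)
    return any(a(m_prime) == n for m_prime in range(m + 1))

if __name__ == "__main__":
    print([n for n in range(20) if decide_membership(n, a, c)])
```

Every element not exceeding n has already appeared by index c(n), so a negative answer after the finite check is conclusive; that is exactly the step that fails without the bound.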
41D.7 Theorem. There is a bounded monotone increasing series consisting of computable rational numbers that does not converge effectively (that is, although its convergence is guaranteed, we have no means to compute its value for sure). □
Take A in the above and construct
$S = \sum_{n=0}^{\infty} 2^{-a(n)}$. (41.12)
This is a desired example of the series claimed in the theorem. Since, for example, we do not know effectively whether 2 is in A or not, we cannot estimate S (which must be less than 2) with an error smaller than 1/4.
41D.8 Computable function. We say a function from $\mathbf{R}$ into itself is computable, if its values at computable reals are computable reals. Pour-El and Richards further impose the following effective uniform continuity: there is a recursive function d such that for any $n \in \mathbf{N}$
$|x - y| \le 1/d(n) \Rightarrow |f(x) - f(y)| < 2^{-n}$. (41.13)

41D.9 'Ordinary functions' are computable. sin, cos, exp, $J_n$, etc., are computable. Behind this statement lies the following 'effective Weierstrass theorem.'
If we can find a recursive function D(n) such that
$P_n(x) = \sum_{j=0}^{D(n)} r_{nj} x^j$, (41.14)
where $r_{nj}$ are computable rationals, we say $\{P_n\}$ is a computable sequence of rational polynomials.
Effective Weierstrass. If we can find a recursive function e(N) such that
$m \ge e(N) \Rightarrow |f(x) - P_m(x)| < 2^{-N}$, (41.15)
then f is a computable function.464
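To make the effective Weierstrass condition concrete, here is a minimal Python sketch for f = exp restricted to [0, 1] (the restriction to this interval, and the names D, coeff, P and e, are assumptions made for the illustration). The polynomials are the Taylor polynomials with rational coefficients 1/j!, and the error function uses the elementary tail bound $\sum_{j>m} 1/j! \le 2/(m+1)!$.

```python
from fractions import Fraction
from math import exp, factorial

def D(n):
    """Degree of P_n (a recursive function of n)."""
    return n

def coeff(n, j):
    """Computable rational coefficient r_{nj} = 1/j!."""
    return Fraction(1, factorial(j))

def P(n, x):
    """Rational polynomial P_n(x) = sum_{j=0}^{D(n)} r_{nj} x^j."""
    return sum(coeff(n, j) * x**j for j in range(D(n) + 1))

def e(N):
    """On [0, 1] the tail is at most 2/(m+1)!, which is < 2^-N once m >= N + 2."""
    return N + 2

x = Fraction(1, 3)
print(float(P(e(10), x)), exp(float(x)))
```

For rational x the value P(n, x) is an exact rational, so the pair (P, e) yields approximations of exp(x) together with a guaranteed error bound.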
41D.10 Computable operations on functions. Composition $f \circ g$, sum $f \pm g$, multiplication $fg$, and many other elementary operations preserve computability. Integration also preserves computability. Hence, it is not hard to guess that the derivatives of computable analytic functions are again computable. However,

41D.11 Theorem [Myhill]. Even if f is a computable $C^1$ function, $f'$ may not be computable. □
The following is the counterexample. Let
$\varphi(x) = \exp(-x^2/(1 - x^2))$ for $|x| < 1$, and $\varphi(x) = 0$ otherwise, (41.16)
which is a $C^\infty$ function. Let $A = \{a(n)\}$ be the RENR set mentioned before. Define

(41.17)

Construct
$f(x) = \sum_{k=0}^{\infty} 4^{-a(k)} \varphi_k(x)$. (41.18)
This is computable, but

(41.19)

where $\chi_A$ is the characteristic function of A, which cannot be computed.

464 See Pour-El and Richards, Chapter 0, Sections 5 and 7.
41D.12 PDE and computability.
(1) Laplace and diffusion equations preserve the computability of the auxiliary conditions.
(2) In d ($\ge 2$)-space, the wave equation cannot preserve computability. More explicitly, even if the initial data is computable, the solution at time, say, t = 1 need not be computable. It is not hard to understand this, if we notice that the Radon transformation formula (→37.12, $d \ge 2$) involves differentiation (cf. 41D.11).
41.E Algorithmic Randomness
41E.1 Regularity in sequence. If something (e.g., a sequence) is 'random', then we would not discern any feature in it. Therefore, to communicate it to someone else, the simplest way is to send its faithful copy. If we could discern a certain characteristic feature or regularity in the sequence, we can exploit the feature to shorten its description. For example, the sequence 1123583145943707741561785... is produced by the rule $a_n = a_{n-1} + a_{n-2}$ (mod 10) with $a_1 = a_2 = 1$. Thus to send a million numbers $a_1, \ldots, a_{10^6}$, we send the rule, the initial two numbers, and the total number of digits $N = 10^6$. If one has to send extremely many terms, the length of the message is dominated by the length of N. Hence, we may expect the message length is asymptotically proportional to log N.
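The 'short message' for this example can be written down explicitly. The following Python sketch (the function name digits is an assumption for illustration) regenerates the sequence from the rule and the two initial values; only the argument N changes when more terms are requested, which is why the message length grows like log N.

```python
def digits(N):
    """First N terms of a_n = a_{n-1} + a_{n-2} (mod 10) with a_1 = a_2 = 1."""
    a, b = 1, 1
    out = []
    for _ in range(N):
        out.append(a)
        a, b = b, (a + b) % 10
    return "".join(map(str, out))

print(digits(25))   # the 25 digits quoted in the text
```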
41E.2 Another example. Consider another example: 33057270365759591953092186117381932611793105118548074462379962749567351885752724. This may look random, but this is the 1001st to 1100th digits from the decimal expansion of π. Hence, the statement "the 1001st to 1100th digits in the decimal expansion of π" is already shorter than the sequence itself. Certainly, this would be the case if one wishes to send a million digits from the decimal expansion of π starting from the one millionth digit. Again, in this example, the length of the message would be dominated by the number specifying the total number of digits as in the preceding example. This example gives us another important lesson. It is almost impossible to compress the length of the message by only looking at the message (in this case a sequence). This implies
that there is no general fool-proof method to compress the message (to tell whether the sequence is random or not) (→41E.6).
41E.3 Intuitive introduction to algorithmic randomness. A formalization of the above idea of randomness = information incompressibility is the intuitive essence of algorithmic randomness due to Solomonoff, Kolmogorov and Chaitin. If there is a much shorter program for a computer to print out the sequence than the printout itself, then the sequence cannot be random, because some order or discernible structure must have been used to compress the sequence. Thus the idea seems to capture our intuition about the 'lawlessness' of the random sequence465 (but see 41E.2).
41E.4 A definition of randomness. The randomness $K(\omega)$ of a binary sequence $\omega \in \{0, 1\}^{\mathbf{N}}$ is defined by
$K(\omega) = \limsup_{n \to \infty} \ell_M(\omega[n])/n$, (41.20)
where $\omega[n]$ denotes the first n letters of ω. □
Notice that this does not depend on the choice of the universal Turing machine M thanks to Kolmogorov and Solomonoff (→41B.9).
41E.5 Random sequence. A binary sequence ω is (algorithmically) random if $K(\omega) > 0$. □
41E.6 Noncomputability of randomness. Notice that $K(\omega)$ is not computable, as can easily be guessed from the appearance of phrases such as 'the shortest program for a machine.' This is the difficulty we have already encountered with π. It is usually extremely hard (virtually impossible) to discern an order, even if it exists, which can be exploited to compress the sequence. Thus, except for very obvious cases, we may not be able to tell whether a given sequence is random or not. It is generally impossible to quantify the randomness of a particular sequence in terms of K.466
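A practical compressor can only give an upper bound on the description length, never K itself. The following Python sketch (an illustration with assumed names, using the standard zlib module as a crude stand-in) compresses an obviously regular sequence and a seeded pseudo-random one. The second sequence is generated by a short program, so it is not algorithmically random, yet the compressor finds no regularity in it; this is the same phenomenon as with the digits of π.

```python
import random
import zlib

def compressed_ratio(bits):
    """Compressed size / original size of a 0-1 string (only an upper-bound proxy)."""
    raw = bits.encode("ascii")
    return len(zlib.compress(raw, 9)) / len(raw)

n = 100_000
periodic = "01" * (n // 2)                  # obvious regularity
rng = random.Random(0)                      # seeded, hence reproducible
pseudo = "".join(rng.choice("01") for _ in range(n))

print(compressed_ratio(periodic), compressed_ratio(pseudo))
```

Each character carries at most one bit but is stored in eight, so a ratio near 1/8 for the second sequence means that essentially no further compression was found.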
41E.7 Examples.
(1) There are only countably many algorithmically non-random sequences, so the 01 sequences obtained from the binary expansion of almost all numbers in [0,1] with respect to the Lebesgue measure are algorithmically random.
(2) All the binary expansions of algebraic numbers are nonrandom.
(3) π is not random algorithmically, although its decimal expansion sequence exhibits all the good characteristics of a random sequence statistically.467 Such a fact will be crucial in trying to understand what complexity is.468

465 To clearly define 'lawlessness' itself is a challenging endeavor. Within the ordinary classical logic it is impossible, because due to the law of the excluded middle, if a sequence does not have some property, then it can be characterized by the lack of the same property. Thus excluding one law implies admitting the negation of that law.
466 My math mentor told me, "A random number is like God. If you are told that this is God, you would be extremely suspicious."
41E.8 Randomness and chaos. Chaos in dynamical systems can be characterized by algorithmically random trajectories.469
41.F Randomness as a Fundamental Concept
41F.1 Why do we discuss randomness further? A mathematical reason is stated below. Here a physical motivation is given. Everyone knows that the basic principle of statistical mechanics is the principle of equal probability. That is, the sampling measure for the equilibrium state is the Liouville measure (or the Riemann volume of the phase space). It is in principle impossible to justify this with the aid of mechanics, because this is a statement about the initial condition of the closed systems. Hence, this is a principle beyond any physical law,470 and dictates how we observe Nature. When we sample randomly, statistical mechanics holds. Thus at the heart of statistical mechanics and thermodynamics, which are the only means to relate microscopic and macroscopic observables, there is a characterization of randomness. Thus randomness is of central importance in physics.
41F.2 Mathematization of 'randomness'. In the algorithmic characterization of randomness, 'randomness,' which is an intuitive concept, is mathematized by identifying it with 'the lack of computable regularity.' One may well argue that there is no guarantee that all the regularities are 'computable regularities'; some inspiration or revelation might tell us the existence of a different kind of regularity in the sequence. This is not an outrageous statement, if mathematical intelligence is, as supposed by Gödel and Post, nonfinitary (→41B.7). Furthermore, 'random sampling' may be done by Nature herself. In this case, why do we have to assume that Her capability is restricted to computation?

467 In practice, random number evolves. The random number of today is the sequence which passes all the statistical tests available today.
468 There are attempts to make a measure of randomness which is actually computable. One approach is to use finite automata instead of a universal Turing machine. However, these approaches may be fundamentally flawed, because the concept 'random' may naturally be transcendental. That is, whether a given instance is random or not cannot be judged within the mathematical (or logical) framework we are working in (→41E.4).
469 A. A. Brudno, "Entropy and the complexity of the trajectories of a dynamical system," Trans. Moscow Math. Soc. Issue 2, 127-151 (1983).
470 However, you could imagine that a special initial condition was imposed at time t = 0 of the Universe.
41F.3 Why Axiomatization of Randomness? Fundamental concepts should not have unique and privileged interpretations. That is, if X is very fundamental, we should not be able to answer the question, "What is X?", because we need understanding of more fundamental concepts to answer the question. Thus, axiomatization in which X appears as primitive is the only way to formalize our thought on fundamental objects or foundational issues.
41F.4 Van Lambalgen Axioms. Van Lambalgen's independence axioms are informally as follows. He introduces a relation R such that R(x, y) may be interpreted as 'y has no information about x,' or 'x cannot be information-compressed even with the extra information y' (or 'oracle y'). This relation R is specified by the following axioms, which are put informally here:471
R1. There is a sequence which cannot be information-compressed without any external information (or oracle). [$\exists x\, R(x, \emptyset)$.]
R2. If x cannot be information-compressed with the information of y and z, then it cannot be done so with the information of z alone. [$R(x, yz) \Rightarrow R(x, z)$.] (Here y and z may be understood as sets, and yz their joint set.)
R3. If x cannot be information-compressed by the information of y, then x and y are different. [$R(x, y) \Rightarrow x \neq y$.]
R4. If there is a φ-relation between y and x, and x cannot be information-compressed with the aid of y, then there is a more random sequence w such that there is a φ-relation between w and y, and w cannot be information-compressed with the aid of not only y but also z. [$\exists x (R(x, y) \wedge \phi(x, y)) \Rightarrow \exists w (R(w, zy) \wedge \phi(w, y))$. Here φ should not have any parameters other than those listed in y.] (In a certain sense, w satisfying R(w, y) is 'more random (lawless)' than x satisfying $R(x, \emptyset)$, because R2 implies that w satisfies $R(w, \emptyset)$. That is, w is not only information-incompressible without any extra information, but also incompressible even with the extra information y. Thus this axiom demands that there always exists a 'more' random sequence than a given one. Indeed, R4 implies, when no relation φ is chosen, $\exists x\, R(x, y) \Rightarrow \exists w\, R(w, zy)$. Hence, an x satisfying $R(x, \emptyset)$ is, in a certain sense, random only at the lowest level.)
R5. If y cannot be information-compressed with the information z, and, simultaneously, x cannot be information-compressed with the aid of y and z, then y cannot be information-compressed with the aid of x and z. [$R(y, z) \wedge R(x, yz) \Rightarrow R(y, xz)$.]472,473

471 There are slightly different versions of axioms given in van Lambalgen's papers. Also here the exposition is informal, so the numbering of the axioms is different from the original versions. Probably the latest paper is, "Independence, randomness and the axiom of choice," J. Symbolic Logic 57, 1274-1304 (1992).
41F.5 Grave consequences of R. There is a very grave consequence of van Lambalgen's axioms of randomness. If we add these axioms R to the usual Zermelo-Fraenkel axioms of sets, then the Axiom of Choice does not hold.
472 The following axiom is also sometimes required. R6. In the ordinals, there is no element of randomness. [$R(x, y) \Rightarrow R(x, \alpha y)$, where α is an ordinal.]
473 The axioms of independence and Friedman's quantifier 'almost all' Q have an intimate relation. $Qx\,\varphi(x)$ can be interpreted as: if x is randomly generated, then it is practically certain that $\varphi(x)$. Thus we can translate $Qx\,\varphi(x)$ as $\forall x (R(x, \emptyset) \Rightarrow \varphi(x))$.
APPENDIX A. Rudiments of Analysis
Warning. This is not a substitute for a standard textbook of elementary calculus, but covers most topics every undergraduate analysis course must cover. This is only a summary or a check list of the reader's knowledge. Scan the titles of the numbered entries, and if you find a somewhat unfamiliar concept, read the entry. Try to form a vivid mental image of defined concepts. Try to be able to explain why the statements are plausible intuitively. If you feel a theorem to be obvious, you need not prove it. The following material heavily relies on K. Kodaira, Introductory Calculus I-IV (Iwanami 1986), and Encyclopedic Dictionary of Mathematics (Iwanami 1985, 3rd edition). J. D. DePree and C. W. Swartz, Introduction to Real Analysis (Wiley, 1988) may be recommended as an introductory textbook.
Table of Standard Symbols
I. Point sets and limits
II. Functions
III. Differentiation
IV. Integration
V. Infinite Series
VI. Functions of two variables
VII. Fourier series and Fourier transform
VIII. Ordinary differential equations
IX. Vector analysis
Table of Standard Symbols

$\forall$    all, any, arbitrary.
$\exists$    there exist(s).
$\Rightarrow$    $A \Rightarrow B$ means A implies B.
$\Leftrightarrow$    if and only if (iff).
$\equiv$    $A \equiv B$ means "A is defined by B."
$\in$    $a \in A$ implies that a is an element of A.

$\mathbf{C}$    the set of all complex numbers.
$\mathbf{N}$    the set of all nonnegative integers.
$\mathbf{Q}$    the set of all rational numbers.
$\mathbf{R}$    the set of all real numbers.
$\mathbf{Z}$    the set of all the integers.
$C^r$    the set of all the r-times continuously differentiable functions.
$C^0$    the set of all the continuous functions.
$C^\infty$    the set of all the infinite times differentiable functions.
$C^\omega$    the set of all (real) analytic functions.
$L_1(A, p)$    Lebesgue integrable functions on A with weight p.
$L_2(A, p)$    Square Lebesgue integrable functions on A with weight p.

inf    infimum
sup    supremum
supp    support

L(R)HS    Left (right) hand side

A1 Point Set and Limit
The properties of reals (= real numbers) such as their continuity are assumed to be known.
A1.1 Sequence. Let $a_1, a_2, \ldots$ be reals. $a_1, a_2, a_3, \ldots$ is called a sequence and is denoted as $\{a_n\}$. Each real in the sequence $\{a_n\}$ is called a term.
A1.2 Convergence, limit. A sequence $\{a_n\}$ is said to converge to α if for any positive ε, there is a positive integer $N(\varepsilon)$ such that
$n > N(\varepsilon) \Rightarrow |a_n - \alpha| < \varepsilon$. (A1.1)
α is called the limit of the sequence $\{a_n\}$, and this is often written as $a_n \to \alpha$.
A1.3 Theorem [Cauchy]. A necessary and sufficient condition for a (real) sequence $\{a_n\}$ to converge is that for any positive number ε there is a positive integer $N(\varepsilon)$ such that
$n > N(\varepsilon),\ m > N(\varepsilon) \Rightarrow |a_n - a_m| < \varepsilon$. (A1.2)
□
Such a sequence is called a Cauchy sequence. [In an infinite dimensional space, a Cauchy sequence may not converge.]
A1.4 Symbols 'O' and 'o'.
(1) $f = O[g]$ means that the quantity f is of order g in the appropriate limit in the context. That is, $f/g$ remains bounded in the limit. For example, $1 - \cos x = O[x^2]$ in the $x \to 0$ limit. That is, $\lim_{x \to 0}(1 - \cos x)/x^2 < +\infty$, which is, of course, correct.
(2) $f = o[g]$ means that the quantity f is 'much smaller' than g in the appropriate limit in the context; that is, $f/g \to 0$. For example, $\sin(x^2) = o[x]$ in the $x \to 0$ limit.
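The two examples can be checked numerically. In the following Python sketch (function names are illustrative assumptions), the first ratio stays bounded and in fact approaches 1/2, while the second tends to zero as $x \to 0$.

```python
import math

def ratio_big_o(x):
    """(1 - cos x) / x^2, bounded as x -> 0 (it tends to 1/2)."""
    return (1 - math.cos(x)) / x**2

def ratio_small_o(x):
    """sin(x^2) / x, which tends to 0 as x -> 0."""
    return math.sin(x**2) / x

for x in (1e-1, 1e-2, 1e-3):
    print(x, ratio_big_o(x), ratio_small_o(x))
```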
A1.5 Limit and arithmetic operations commute. Let $a_n \to \alpha$ and $b_n \to \beta$. Then,
(i) If $a_n \ge b_n$ for infinitely many n, then $\alpha \ge \beta$.
(ii) $a_n \pm b_n \to \alpha \pm \beta$.
(iii) $a_n b_n \to \alpha\beta$.
(iv) If $a_n \neq 0$ and $\alpha \neq 0$, then $b_n/a_n \to \beta/\alpha$.
A1.6 Lower and upper bound, supremum and infimum. Let $S \subset \mathbf{R}$. If no element of S exceeds a real μ (i.e., $s \le \mu$ for any $s \in S$) [resp., is smaller than a real μ (i.e., $s \ge \mu$ for any $s \in S$)], we say S is bounded from above [resp., bounded from below] and μ is called an upper bound [resp., lower bound] of S. The smallest upper bound [resp., the largest lower bound] of S is called the supremum [resp., infimum] of S, and is written as $\sup_{s \in S} s$ [resp., $\inf_{s \in S} s$]. If S is bounded from above and from below, S is said to be bounded.
A1.7 Monotone sequences. If $a_1 < a_2 < \cdots < a_n < \cdots$ [resp., $a_1 > a_2 > \cdots > a_n > \cdots$], $\{a_n\}$ is called a monotone increasing [resp., monotone decreasing] sequence. If $a_1 \le a_2 \le \cdots \le a_n \le \cdots$ [resp., $a_1 \ge a_2 \ge \cdots \ge a_n \ge \cdots$], $\{a_n\}$ is called a monotone non-decreasing [resp., monotone non-increasing] sequence.
A1.8 Theorem [Bounded monotone sequences converge]. A monotone non-decreasing [resp., non-increasing] sequence bounded from above [resp., from below] converges to its supremum [resp., its infimum]. □
A1.9 Divergence to ± infinity. If a monotone non-decreasing sequence [resp., non-increasing sequence] is not bounded from above [resp., from below], we say it diverges to positive infinity [resp., negative infinity] and write $\lim_{n \to \infty} a_n = +\infty$ [resp., $\lim_{n \to \infty} a_n = -\infty$].
A1.10 Limsup and liminf. Suppose $\{a_n\}$ is a bounded sequence. Let $\bar{a}_m = \sup_n a_{n+m}$ for $m = 1, 2, 3, \cdots$. Then $\{\bar{a}_m\}$ is a bounded monotone non-increasing sequence. Hence, Theorem A1.8 tells us that $\lim_{m \to \infty} \bar{a}_m$ exists. This is called the superior limit of the sequence $\{a_n\}$, and is written as $\limsup_{n \to \infty} a_n$. Analogously, the limit $\lim_{m \to \infty} \inf_n a_{n+m}$ exists, which is called the inferior limit of the sequence $\{a_n\}$, and is written as $\liminf_{n \to \infty} a_n$.
(i) For any positive ε there are only finitely many $a_n$ larger than $\limsup_{n \to \infty} a_n + \varepsilon$, but there are infinitely many $a_n$ larger than $\limsup_{n \to \infty} a_n - \varepsilon$.
(ii) For any positive ε there are only finitely many $a_n$ smaller than $\liminf_{n \to \infty} a_n - \varepsilon$, but there are infinitely many $a_n$ smaller than $\liminf_{n \to \infty} a_n + \varepsilon$.
(iii) A necessary and sufficient condition for $\{a_n\}$ to converge is $\limsup a_n = \liminf a_n$.
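As a small numerical illustration (an assumed example, not from the original notes), take $a_n = (-1)^n(1 + 1/n)$, whose superior and inferior limits are +1 and -1. For this particular sequence the supremum and infimum of each tail are attained within the first two terms of the tail, so a finite window computes them exactly.

```python
def a(n):
    """Example sequence a_n = (-1)^n (1 + 1/n); it does not converge."""
    return (-1) ** n * (1 + 1 / n)

def tail_sup(m, window=1000):
    """sup of a_{n+m} over n = 1, 2, ... (attained inside the window for this a)."""
    return max(a(n + m) for n in range(1, window + 1))

def tail_inf(m, window=1000):
    """inf of a_{n+m} over n = 1, 2, ... (attained inside the window for this a)."""
    return min(a(n + m) for n in range(1, window + 1))

for m in (1, 10, 100, 10_000):
    print(m, tail_sup(m), tail_inf(m))
```

The tail suprema decrease monotonically toward +1 and the tail infima increase toward -1; since the two limits differ, criterion (iii) confirms that the sequence does not converge.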
A1.11 Infinite series. For a sequence $\{a_n\}$, $a_1 + a_2 + a_3 + \cdots + a_n + \cdots$ is called an infinite series, and is often written as $\sum_{n=1}^{\infty} a_n$. The convergence of the series is defined by the convergence of the sequence $\{s_n\}$ consisting of its partial sums: $s_n \equiv a_1 + \cdots + a_n$. $\lim_{n \to \infty} s_n$, if it converges, is called the sum of the infinite series $\sum_{n=1}^{\infty} a_n$. If $\{s_n\}$ does not converge, the series is said to be divergent.
If $\sum_{n=1}^{\infty} a_n$ converges, then $a_n$ converges to zero.
A1.12 Absolute convergence. If $\sum_{n=1}^{\infty} |a_n|$ converges, $\sum_{n=1}^{\infty} a_n$ is said to be absolutely convergent.
(i) If $\sum_{n=1}^{\infty} a_n$ converges absolutely, it converges.
(ii) Suppose $\sum_{n=1}^{\infty} r_n$ is convergent and $r_n \ge 0$. If $|a_n| \le r_n$ for all n larger than some integer m, then $\sum_{n=1}^{\infty} a_n$ converges absolutely.
A1.13 Power series. A series of the form $\sum_{n=0}^{\infty} a_n (x - b)^n$ is called a power series, where b is a constant.
A1.14 Conditional convergence, alternating series. If a convergent series is not absolutely convergent, it is said to converge conditionally. If positive and negative terms appear alternatingly, the series is called an alternating series.
If $\{a_n\}$ ($a_n > 0$) is a monotone decreasing sequence converging to zero, then the alternating series $a_1 - a_2 + a_3 - a_4 + \cdots$ converges (→[AV7]).
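A standard example (not in the original notes) is $1 - 1/2 + 1/3 - \cdots$, whose sum is known to be ln 2, while the series of absolute values (the harmonic series) diverges. The Python sketch below checks the usual error estimate for such alternating series: the partial sum $s_n$ differs from the limit by at most the first omitted term. The function name is an assumption for illustration.

```python
import math

def partial_sum(n):
    """s_n = 1 - 1/2 + 1/3 - ... up to the term with denominator n."""
    return sum((-1) ** (k + 1) / k for k in range(1, n + 1))

for n in (10, 100, 1000):
    err = abs(partial_sum(n) - math.log(2))
    print(n, partial_sum(n), err, 1 / (n + 1))
```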
A1.15 Theorem [Nested sequence of intervals shrinking to a point share the point]. If a sequence of closed intervals $\{I_n\}$ such that $I_n = [a_n, b_n]$ satisfies (i) $I_1 \supset I_2 \supset \cdots \supset I_n \supset \cdots$ and (ii) $\lim_{n \to \infty}(b_n - a_n) = 0$, then there is a unique real c which is in all $I_n$. □
For this theorem it is crucial that $I_n$ are closed intervals.
A1.16 Denumerability. An infinite set for which we can make a one-to-one correspondence with nonnegative integers $\mathbf{N}$ is called a countable set or denumerable set. An infinite set which is not countable is called an uncountable set or nondenumerable set.
The set of rational numbers $\mathbf{Q}$ is countable.
A1.17 Cantor's Theorem [Continuum is not denumerable]. A closed interval I = [a, b] is nondenumerable. □
A1.18 n-space, distance, ε-neighborhood. The totality of the n-tuples $(x_1, x_2, \ldots, x_n)$ is a direct product set $\mathbf{R} \times \cdots \times \mathbf{R} = \mathbf{R}^n$ and is called the n-space. The (Euclidean) distance between two points $(x_1, \ldots, x_n)$ and $(y_1, \ldots, y_n)$ is defined by $[(x_1 - y_1)^2 + \cdots + (x_n - y_n)^2]^{1/2}$. The (Euclidean) distance between points P and Q is denoted by |PQ|. The totality of the points which are within the distance ε of point P is called the ε-neighborhood (ε-nbh) of P (and is denoted by $U_\varepsilon(P)$ in this Appendix).
A1.19 Inner point, boundary, accumulating point, closure, open kernel. Let S be a subset of $\mathbf{R}^n$.
Inner point: P is an inner point of S if there is ε > 0 such that $U_\varepsilon(P) \subset S$.
Boundary point: If for any ε > 0, $U_\varepsilon(P) \not\subset S$ and $U_\varepsilon(P) \cap S \neq \emptyset$, P is called a boundary point of S.
Boundary: The totality of the boundary points of S is called the boundary of S and is denoted by $\partial S$.
Closure: $S \cup \partial S$ is called the closure of S and is denoted by [S]. If $T \subset S$, then $[T] \subset [S]$.
Open kernel: $S \setminus \partial S$ is called the open kernel of S and is denoted by $S^\circ$.
Dense: Let T be a subset of S. If $[T] \supset S$, T is said to be dense in S.
Accumulating point: If $U_\varepsilon(P) \cap S$ contains infinitely many points of S for any positive ε, we say P is an accumulating point of S.
Isolated point: If a point in S is not an accumulating point of S, the point is called an isolated point.
(i) A necessary and sufficient condition for a point Q to be in [S] is that for any positive ε, $U_\varepsilon(Q) \cap S \neq \emptyset$.
(ii) The totality of rational numbers $\mathbf{Q}$ has no inner point and $[\mathbf{Q}] = \mathbf{R}$.
(iii) All the inner points of S are accumulating points of S. An accumulating point of S is its inner point or its boundary point. If a boundary point of S is not in S, it is an accumulating point of S.
(iv) A necessary and sufficient condition for a point $P \in S$ to be an isolated point of S is that there is a positive ε such that $U_\varepsilon(P) \cap S = \{P\}$.
A1.20 Open set, closed set. If S contains only its inner points, that is, if $S = S^\circ$, then S is called an open set. If all the boundary points are included in S, that is, if S = [S], S is called a closed set. The empty set $\emptyset$ is simultaneously open and closed, and so is $\mathbf{R}^n$.
(i) The intersection of finite or infinite closed sets is a closed set.
(ii) The union of finite or infinite open sets is an open set.
(iii) The intersection of finitely many open sets is an open set.
(iv) The union of finitely many closed sets is a closed set.
A1.21 Limit of point sequence. A sequence of points $\{P_n\}$ ($P_n \in \mathbf{R}^n$) is called a point sequence. If there is a point A such that $\lim_{n \to \infty} |P_n A| = 0$, we say the point sequence $\{P_n\}$ converges to A and write $\lim_{n \to \infty} P_n = A$.
A1.22 Bounded set, diameter. If the distance between any point $P \in S$ and the origin O is bounded from above (→A1.6), then the set S is called a bounded set. When S is a bounded set we can define its diameter δ(S) as $\delta(S) = \sup_{P, Q \in S} |PQ|$. There is a theorem analogous to A1.15:
A1.23 Theorem [Shrinking nested sequence of bounded closed sets]. If a sequence of nonempty bounded closed sets $\{S_n\}$ satisfies the following two conditions (i) and (ii), then there is a unique point P shared by all of the closed sets $S_n$: (i) $S_1 \supset S_2 \supset \cdots \supset S_n \supset \cdots$, (ii) $\lim_{n \to \infty} \delta(S_n) = 0$.
A1.24 Covering. Let $\mathcal{U}$ be a set of sets. The joint set of all the members of $\mathcal{U}$ is written as $\cup_{U \in \mathcal{U}} U$. If a set S satisfies $S \subset \cup_{U \in \mathcal{U}} U$, then $\mathcal{U}$ is called a covering of S. If all the elements of $\mathcal{U}$ are open, it is called an open covering of S. If a covering $\mathcal{U}$ contains only a finite number of elements, $\mathcal{U}$ is called a finite covering. If a subset $\mathcal{V}$ of $\mathcal{U}$ is also a covering of S, $\mathcal{V}$ is called a subcovering of $\mathcal{U}$.
A1.25 Compact set. If any open covering of S has a finite subcovering, S is called a compact set.
A1.26 Theorem [Compactness is equivalent to bounded closedness]. S is compact if and only if S is a bounded closed set. □
The if part (a bounded closed set is compact) is called the Heine-Borel covering theorem. This is true only if the space is finite dimensional.
A1.27 Theorem [Bolzano and Weierstrass]. A bounded infinite set must have an accumulating point (→A1.19). □
Theorem. A bounded point sequence has a converging subsequence. □477
A2 Function
A2.1 Function, domain, range, independent and dependent variables. Let $D \subset \mathbf{R}$. A rule f assigning a single real η to each $\xi \in D$ is called a function f.478 $\eta = f(\xi)$ is called the value of f at ξ. D is called its domain and $f(D) \equiv \{f(\xi) \,|\, \xi \in D\}$ is called the range of f. Usually, f is described as f(x), and x is called the variable, and f(x) is called a function of x. When we write y = f(x), x is called the independent variable and y the dependent variable.
A2.2 Limit of function. Let f(x) be a function whose domain is D. We say f(x) converges to α in the limit $x \to a$, if for any positive ε, there is a positive number $\delta(\varepsilon)$ such that
$|x - a| < \delta(\varepsilon),\ x \in D \Rightarrow |f(x) - \alpha| < \varepsilon$,
and we write $\lim_{x \to a} f(x) = \alpha$. $\lim_{x \to a}$ and arithmetic operations are commutative as in A1.5. We have a theorem analogous to A1.3:
477 These theorems assume that we can always choose one point from each member of a family of infinitely many sets. From the constructive point of view, this is not always possible. That is, we may not be able to write a computer program to do so. In the usual mathematics, we postulate this possibility as an axiom called the Axiom of Choice.
478 This is often called a map as well.
A2.3 Cauchy's criterion. Let f be a function whose domain is D. A necessary and sufficient condition for f to be convergent in the $x \to a$ limit is: For any positive ε there is a positive $\delta(\varepsilon)$ such that for $x, y \in D$
$|x - a| < \delta(\varepsilon),\ |y - a| < \delta(\varepsilon) \Rightarrow |f(x) - f(y)| < \varepsilon$.
□
A2.4 Graph of a function. The graph $G_f$ of a function f is the set $G_f = \{(x, f(x)) \,|\, x \in D\}$.
A2.5 Continuity. A function f is continuous at a, if $\lim_{x \to a} f(x) = f(a)$.
If the definition of the limit is spelled out completely as in A2.2, we say: f is continuous at a, if for any positive ε there is a positive $\delta(\varepsilon)$ such that
$|x - a| < \delta(\varepsilon),\ x \in D \Rightarrow |f(x) - f(a)| < \varepsilon$.
Theorem. If the domain of a continuous function f is a closed interval, then its range is again a closed interval. □
A2.6 Left and right continuity. When taking the $x \to a$ limit, if x is always smaller (resp., larger) than a, we write this limiting procedure as $\lim_{x \to a-0}$ (resp., $\lim_{x \to a+0}$) and call it the left limit (resp., right limit). If $\lim_{x \to a-0} f(x) = f(a)$ (resp., $\lim_{x \to a+0} f(x) = f(a)$), we say f is left (resp., right) continuous at a.
A2.7 Intermediate value theorem. Let a function f be continuous in a closed interval [a, b], and $f(a) \neq f(b)$. For any μ between f(a) and f(b), there is a real c such that a < c < b and f(c) = μ. □
The image of a finite closed interval by a continuous map is again a finite closed interval.
A2.8 Uniform continuity. A function f is uniformly continuous in D if for any positive ε, there is a positive constant $\delta(\varepsilon)$ such that
$|x - y| < \delta(\varepsilon),\ x \in D,\ y \in D \Rightarrow |f(x) - f(y)| < \varepsilon$.
Theorem. A continuous function defined on a closed interval is uniformly continuous on the interval. □
A2.9 Maximum and minimum. Let f be a function whose domain is D. If f(D) is bounded, we say f is bounded. If there is a maximum (resp., minimum) value in f(D), then it is called the maximum (resp., minimum) of f.
Theorem [Maximum value theorem]. A continuous function defined on a closed interval has maximum and minimum values. □
A2.10 Composite function. Let f be a function whose domain is D, and g be a function whose domain contains the range f(D) of f. Then h(x) = g(f(x)) is called the composite function of f and g, and is denoted by $g \circ f$.
A2.11 Monotone function. Let f be a function whose domain is D. If for any $x, y \in D$, x < y implies f(x) < f(y) (resp., f(x) > f(y)), f is called a monotone increasing function (resp., monotone decreasing function). If for any $x, y \in D$, x < y implies $f(x) \le f(y)$ (resp., $f(x) \ge f(y)$), f is called a monotone non-decreasing function (resp., monotone non-increasing function).
A2.12 Inverse function. Let f be a function whose domain is D. If there is only one x such that f(x) = y for each $y \in f(D)$, the correspondence $y \to x$ defines a function. This function, denoted by $f^{-1}$, is called the inverse function of f.
The symbol $f^{-1}$ is used generally to denote the preimage of a point. Thus $f^{-1}(x) = \{y \,|\, f(y) = x,\ y \in D\}$, where D is the domain of f. $f^{-1}$ becomes the inverse function, if $f^{-1}(x)$ is a single point for all x in the range of f.
Theorem. If f is a monotone increasing (resp., decreasing) function defined on an interval, then f has the inverse function which is monotone increasing (resp., decreasing). □
A2.13 Even and odd functions. If a function f has a domain invariant under $x \to -x$, and
(i) f(x) = f(-x), we say f is an even function,
(ii) f(x) = -f(-x), we say f is an odd function.
A3 Differentiation
A3.1 Differentiability, derivative. Let f be a function defined on an interval I, and $a \in I$. If the following limit, denoted by $f'(a)$, exists, we say f is differentiable at a:
$f'(a) = \lim_{x \to a} \frac{f(x) - f(a)}{x - a}$.
$f'(a)$ is called the differential coefficient of f at a. If f is differentiable for any $x \in I$, we say that f is differentiable in I, and $f'(x)$ becomes a function on I. $f'$ is called the derivative of f. To obtain $f'$ from f is said to differentiate f. Recognize that the existence of the limit implies that the limit does not depend on how the point a is reached.
A3.2 Theorem [Differentiability implies continuity]. If f is differentiable at a, then f is continuous there. If f is differentiable in an interval I, it is continuous in the interval. □
Warning. However, continuity does not guarantee differentiability. See A3.12.
A3.3 Increment, differential quotient. Let f be as in A3.1 and write y = f(x), and $\Delta y = f(x + \Delta x) - f(x)$. $\Delta x$ and $\Delta y$ are called increments. Then
$f'(x) = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}$,
so that the derivative is also called the differential quotient and is denoted by dy/dx. If f is differentiable, then we may write
$\Delta y = \frac{dy}{dx} \Delta x + o[\Delta x]$.
For o see A1.4.
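The meaning of the $o[\Delta x]$ term can be seen numerically. In the following Python sketch (the choice $f(x) = x^3$ and all names are assumptions for illustration), the quantity $(\Delta y - f'(x)\Delta x)/\Delta x$ equals $3x\Delta x + \Delta x^2$ and therefore tends to zero with $\Delta x$.

```python
def f(x):
    return x ** 3

def fprime(x):
    return 3 * x ** 2

def remainder_ratio(x, dx):
    """(Delta y - f'(x) Delta x) / Delta x, which must tend to 0 as Delta x -> 0."""
    dy = f(x + dx) - f(x)
    return (dy - fprime(x) * dx) / dx

for dx in (1e-1, 1e-2, 1e-3):
    print(dx, remainder_ratio(1.0, dx))
```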
A3.4 Right or left differentiable. If the right limit (→A2.6) $\lim_{x \to a+0} (f(x) - f(a))/(x - a)$ exists, then we say f is right differentiable at a, and the limit, called the right differential coefficient at a, is denoted by $D^+ f(a)$. Analogously the left differential coefficient $D^- f(a)$ can be defined.
A3.5 Differentiation and arithmetic operations commute. Let f, g be differentiable in some interval, and $c_1$, $c_2$ be constants. Then 'arithmetic operations do not destroy differentiability':
(i) $\frac{d}{dx}(c_1 f(x) + c_2 g(x)) = c_1 f'(x) + c_2 g'(x)$.
(ii) $\frac{d}{dx}(f(x) g(x)) = f'(x) g(x) + f(x) g'(x)$.
(iii) If g is not zero, then $\frac{d}{dx} \frac{f(x)}{g(x)} = \frac{f'(x) g(x) - f(x) g'(x)}{g(x)^2}$.
A3.6 Derivative of composite function. Let f be a differentiable function on an interval I, and g be a differentiable function on an interval J containing f(I). Then, $g \circ f$ (→A2.10) is differentiable and
$\frac{d}{dx} g(f(x)) = g'(f(x)) f'(x)$.
A3.7 Derivative of inverse function. Let f be a differentiable monotone function on an interval I. Then its inverse function (→A2.12) is differentiable wherever $f'(x) \neq 0$, and
$\frac{d}{dy} f^{-1}(y) = \frac{1}{f'(x)}$, where y = f(x).
A3.8 Theorem [Mean-value theorem]. Let f be a continuous function on the closed interval [a, b]. If f is differentiable in (a, b), then there is $\xi \in (a, b)$ such that
$f'(\xi) = \frac{f(b) - f(a)}{b - a}$.
□
A special case of this theorem is:
A3.9 Theorem [Rolle's theorem]. Let f be continuous in [a, b]. If f is differentiable in (a, b) and f(a) = f(b), then there is $\xi \in (a, b)$ such that $f'(\xi) = 0$. □
A3.10 Theorem [Generalization of mean-value theorem]. Let f and g be continuous functions on a closed interval [a, b], and differentiable on (a, b). If $f'$ and $g'$ do not simultaneously vanish in (a, b) and $g(a) \neq g(b)$, then there is $\xi \in (a, b)$ such that
$\frac{f'(\xi)}{g'(\xi)} = \frac{f(b) - f(a)}{g(b) - g(a)}$.
□
A3.11 Theorem [Condition for monotonicity]. A necessary and sufficient condition for a differentiable function f defined on an interval I to be monotone increasing (→A2.11) is that $f'(x) \ge 0$ on I and $f'(x) > 0$ on a dense (→A1.19) subset of I. □
A3.12 Counterexamples.
(i) $f(x) = x\sin(1/x)$ (with f(0) = 0) is continuous at x = 0 but not differentiable there.
(ii) $f(x) = \sum_{n=1}^{\infty} 2^{-n}|\sin(\pi n! x)|$ is continuous on R, but not differentiable on Q.
(iii) $f(x) = \sum_{n=1}^{\infty} 2^{-n}\cos(k^n \pi x)$ (k is an odd integer larger than 13) is continuous on R, but is nowhere differentiable.
A3.13 Higher order derivatives. Suppose f is a differentiable function on an interval I. If f' is again differentiable on I, then we can define the second derivative $df'/dx$. If the function f is sufficiently smooth,
then we can define higher-order derivatives like the n-th derivative, which is denoted by $f^{(n)}(x)$, $d^n f/dx^n$, $D^n f(x)$ or $(d/dx)^n f(x)$. Arithmetic operations do not destroy higher order differentiability, just as in A3.5. The composite function of n-times differentiable functions is n-times differentiable, just as in A3.6.
A3.14 Leibniz' formula.
$$\frac{d^n}{dx^n}(f(x)g(x)) = f^{(n)}(x)g(x) + \binom{n}{1} f^{(n-1)}(x)g'(x) + \cdots + \binom{n}{k} f^{(n-k)}(x)g^{(k)}(x) + \cdots + f(x)g^{(n)}(x).$$
A3.15 Taylor's formula, remainder. Let f be an n-times differentiable function on an interval I, and a ∈ I. Then for any x ∈ I there is a point ξ between a and x such that
$$f(x) = f(a) + \sum_{k=1}^{n-1} \frac{f^{(k)}(a)}{k!}(x - a)^k + \frac{f^{(n)}(\xi)}{n!}(x - a)^n. \quad (A3.1)$$
The last term is called the remainder, and is written as $R_n$. □
For n = 1 this is the mean-value theorem (→A3.8), and this theorem is regarded as an extension of the mean-value theorem.
The remainder can be written as follows: Let $\xi = a + \theta(x - a)$ ($0 < \theta < 1$).
(i) Schlömilch's remainder: Choosing an integer q ($0 \le q \le n - 1$),
$$R_n = \frac{f^{(n)}(\xi)}{(n-1)!\,(n-q)}(1 - \theta)^{q}(x - a)^n.$$
(ii) Cauchy's remainder. This is a special case of (i) with q = n - 1:
$$R_n = \frac{f^{(n)}(\xi)}{(n-1)!}(1 - \theta)^{n-1}(x - a)^n.$$
(iii) The remainder in (A3.1) is another special case of (i) with q = 0, and is called Lagrange's remainder.
A3.16 Taylor's series. If f is infinitely many times differentiable, and $\{R_n\}$ in A3.15 converges to zero, then f can be expanded in a Taylor series about a:
$$f(x) = f(a) + \sum_{k=1}^{\infty} \frac{f^{(k)}(a)}{k!}(x - a)^k.$$
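As a numerical illustration of A3.15 and A3.16, the following minimal sketch compares the truncated Taylor expansion of exp about a = 0 with the bound on $|R_n|$ that follows from Lagrange's remainder. The function names `taylor_exp` and `lagrange_bound` are illustrative choices, not notation from these notes.

```python
import math

def taylor_exp(x, n, a=0.0):
    """Taylor polynomial of exp about a, keeping the terms of order 0, ..., n-1."""
    return sum(math.exp(a) * (x - a) ** k / math.factorial(k) for k in range(n))

def lagrange_bound(x, n, a=0.0):
    """Bound on |R_n| from Lagrange's form: max|f^(n)| * |x - a|^n / n!."""
    # Every derivative of exp is exp, which is increasing, so its maximum
    # between a and x is attained at the larger of the two points.
    m = math.exp(max(a, x))
    return m * abs(x - a) ** n / math.factorial(n)

x = 1.5
for n in (2, 4, 8, 16):
    err = abs(math.exp(x) - taylor_exp(x, n))
    print(n, err, lagrange_bound(x, n))
```

The printed error stays below the printed bound for each n, and both shrink rapidly, which is the statement that $R_n \to 0$ for the exponential function.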
A3.17 Convex and concave function. Let f be a function whose domain is I. Let $x_1, x_2 \in I$ and let λ and μ be positive reals satisfying λ + μ = 1. If
$$f(\lambda x_1 + \mu x_2) \le \lambda f(x_1) + \mu f(x_2), \quad (A3.2)$$
we say f is convex on I, and f is called a convex function. If equality never holds in (A3.2) for $x_1 \neq x_2$, then we say f is strictly convex, and f is called a strictly convex function. If -f is (strictly) convex, we say f is (strictly) concave, and f is called a (strictly) concave function.
A3.18 Theorem [Convexity and second derivative]. Let f be a twice differentiable function on an interval I.
(i) A necessary and sufficient condition that f is convex on I is that $f''(x) \ge 0$ for all the inner points of I.
(ii) If $f''(x) > 0$ for all the inner points of I, then f is strictly convex. □
An analogous theorem for concave functions should be self-evident. Simply switch f to -f.
Remark. (i) assumes that f is twice differentiable. Convex functions must be continuous, but need not even be differentiable once.
A3.19 Local maximum, minimum. Let f be a continuous function on an interval I, and a be an inner point of I. f(a) is a local maximum (resp., local minimum), if for some positive number ε, $0 < |x - a| < \varepsilon$ implies f(x) < f(a) (resp., f(x) > f(a)). These are collectively called local extrema.
Theorem. If f is a differentiable function on an interval I, and has a local extremum at $a \in I^{\circ}$ (see footnote 479), then f'(a) = 0. □
Theorem. Let f be a function which is n-times (n ≥ 2) differentiable, with $f'(a) = f''(a) = \cdots = f^{(n-1)}(a) = 0$ and $f^{(n)}(a) \neq 0$ at some inner point a of I.
(i) If n is odd, then f(a) is not an extremum of f.
(ii) If n is even, and $f^{(n)}(a) > 0$, then f(a) is a local minimum of f(x).
(iii) If n is even, and $f^{(n)}(a) < 0$, then f(a) is a local maximum of f(x). □
A3.20 Stationary value. Suppose f is n-times differentiable in an interval I, and for some inner point a of I, $f'(a) = f''(a) = \cdots = f^{(n-1)}(a) = 0$ but $f^{(n)}(a) \neq 0$. If n ≥ 3 and odd, f(a) is called a stationary value of f, and a is called a stationary point of f.
Theorem. For the f in this item,
(i) If $f^{(n)}(a) > 0$, then there is a positive number ε such that f is strictly convex in [a, a + ε] and strictly concave in [a - ε, a].
479 For ° see A1.19.
(ii) If $f^{(n)}(a) < 0$, then there is a positive number ε such that f is strictly concave in [a, a + ε] and strictly convex in [a - ε, a]. □
A3.21 Class $C^n$. Let f be a function defined on an interval I. If f is n-times differentiable and $f^{(n)}$ is continuous on I, then f is called a function of class $C^n$ (or a $C^n$-function). If f is infinitely many times differentiable, it is called a $C^\infty$-function.
Theorem. Let f and g be $C^n$-functions on an interval I.
(i) Arithmetic operations do not destroy $C^n$-functions.
(ii) $g \circ f$ is again a $C^n$-function.
(iii) If $f'(x) \neq 0$ for all x ∈ I, and f is monotone, then its inverse function $f^{-1}$ is a monotone $C^n$-function. □
These statements hold for $C^\infty$-functions as well.
A3.22 Class $C^\omega$. Let f be a $C^\infty$-function in an open interval I. If f can be Taylor-expanded (→A3.16) in the neighborhood of each a ∈ I, then f is said to be real analytic in I and is called a real analytic function or a $C^\omega$-function.
Warning. A $C^\infty$-function need not be a real analytic function. A typical example is
$$\psi(x) = \begin{cases} 0 & \text{for } x \le 0, \\ e^{-1/x} & \text{for } x > 0. \end{cases}$$
Its derivatives at x = 0 all vanish, so that the formally constructed Taylor series is identically zero, but this contradicts the fact that ψ(x) > 0 for positive x. Hence, this function is not real analytic.
This is an important function used to 'mollify' functions through convolution.
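To make the role of ψ concrete, here is a small sketch that evaluates ψ and builds a smooth step from it. The quotient used in `smooth_step` is one standard construction of a function that is 0 to the left of a and 1 to the right of b; it is offered as an assumption about how such a function can be obtained, not as the construction used in these notes.

```python
import math

def psi(x):
    """psi(x) = 0 for x <= 0 and exp(-1/x) for x > 0."""
    return math.exp(-1.0 / x) if x > 0 else 0.0

def smooth_step(x, a, b):
    """A smooth step: 0 for x <= a, 1 for x >= b, strictly between 0 and 1 inside (a, b).

    For a < b the numbers x - a and b - x are never both nonpositive, so the
    denominator is positive in exact arithmetic. (In floating point it can
    underflow to zero when b - a is extremely small.)
    """
    p, q = psi(x - a), psi(b - x)
    return p / (p + q)

for x in (-1.0, 0.0, 0.25, 0.5, 0.75, 1.0, 2.0):
    print(x, psi(x), smooth_step(x, 0.0, 1.0))
```

Note how quickly psi(x) flattens near x = 0 from the right: this flatness to all orders is exactly what prevents a Taylor expansion about 0 from recovering the function.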
A3.23 Theorem [Existence of mollifier]. Let a and b be two arbitrary points (a < b) in R. There is a $C^\infty$-function ρ(x) on R such that ρ(x) = 0 for x ≤ a, ρ(x) = 1 for x ≥ b and 0 ≤ ρ(x) ≤ 1. □
Corollary. Let f and g be $C^\infty$-functions on R, and let a and b be the same as in the theorem. There is a $C^\infty$-function h such that h(x) = f(x) for x ≤ a, h(x) = g(x) for x ≥ b (and h interpolates f and g between a and b). □
Thus $C^\infty$-functions can be deformed freely. In contradistinction, $C^\omega$-functions cannot be deformed freely, as shown in the following
A3.24 Theorem [Identity theorem]. Let f and g be $C^\omega$-functions defined on an open interval I. If f and g coincide in some neighborhood of a point a ∈ I, then f and g are identical on I. □
A3.25 Complex analysis. Real analytic functions are best understood as special complex-valued functions defined on the complex plane. "The shortest path between two truths in the real domain passes through the complex domain." (J. Hadamard). H. A. Priestley, Introduction to Complex Analysis (Oxford UP, 1990, revised edition) is a convenient introduction to the topic. See also my notes for Physics 413, which are much more complete than the book, with a sizable chapter on conformal mapping and its application to boundary value problems.
A4 Integration
A4.1 Definite integral (Riemann integral). Let f be a continuous function defined on a closed interval I = [a, b]. Let $a = x_0 < x_1 < x_2 < \cdots < x_k < \cdots < x_{m-1} < x_m = b$, and partition [a, b] into m intervals $[x_{k-1}, x_k]$ (k = 1, 2, ..., m). The partition determined by the set $\Delta \equiv \{x_0, x_1, \ldots, x_m\}$ is called the partition Δ. Let the maximum of $|x_k - x_{k-1}|$ (k = 1, 2, ..., m) be δ(Δ). The following limit exists (remember that f is assumed to be continuous) and is called the definite integral of f on [a, b]:
$$\int_a^b f(x)dx = \lim_{\delta(\Delta) \to 0} \sum_k f(\xi_k)(x_k - x_{k-1}),$$
where $\xi_k \in [x_{k-1}, x_k]$. The limit does not depend on the choice of $\xi_k$. f is called the integrand, x the integration variable, and b (resp., a) the upper limit (resp., lower limit) of integration. The integration variable x is a dummy variable in the sense that we may freely replace it with any letter.
We define for b > a, $\int_b^a f(x)dx \equiv -\int_a^b f(x)dx$, and $\int_a^a f(x)dx = 0$.
Sometimes, the definite integral is written as
$$\int_a^b dx\, f(x).$$
This notation clearly shows that integration is an operation applied to f.
A4.2 Riemann-integrability. Integration can be defined even if f is not continuous. Let f be a bounded function on [a, b]. For the partition Δ in A4.1, define
$$S_\Delta = \sum_{k=1}^{m} M_k(x_k - x_{k-1}), \quad s_\Delta = \sum_{k=1}^{m} m_k(x_k - x_{k-1}),$$
where $M_k$ (resp., $m_k$) is the supremum (resp., infimum) of f in $[x_{k-1}, x_k]$. Let $S = \inf_\Delta S_\Delta$ and $s = \sup_\Delta s_\Delta$ (here the infimum and the supremum are taken over all the possible finite partitions of [a, b]). If S = s, we say f is Riemann integrable on [a, b]. In this case, the common value S = s is the definition of $\int_a^b f(x)dx$. Even if a bounded f has finitely many discontinuous points in I, f is Riemann-integrable.
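The upper and lower sums of A4.2 are easy to compute for a monotone function, because the supremum and infimum on each subinterval sit at its end points. The sketch below does this for an increasing function on an equal partition; the helper name `darboux_sums` is a hypothetical name chosen for this illustration.

```python
def darboux_sums(f, a, b, m):
    """Upper and lower sums of an increasing function f on [a, b], m equal subintervals."""
    h = (b - a) / m
    xs = [a + k * h for k in range(m + 1)]
    # For increasing f the infimum on [x_{k-1}, x_k] is f(x_{k-1}) and the supremum is f(x_k).
    lower = sum(f(xs[k - 1]) * h for k in range(1, m + 1))
    upper = sum(f(xs[k]) * h for k in range(1, m + 1))
    return upper, lower

for m in (10, 100, 1000):
    upper, lower = darboux_sums(lambda x: x * x, 0.0, 1.0, m)
    print(m, lower, upper, upper - lower)
```

For $f(x) = x^2$ on [0, 1] the gap between the two sums is (f(1) - f(0))/m, so both squeeze onto the integral 1/3 as the partition is refined.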
A4.3 Basic properties of definite integral. Let f and g be Riemann integrable on the closed interval [a, b].
(i) For c ∈ (a, b)
$$\int_a^b f(x)dx = \int_a^c f(x)dx + \int_c^b f(x)dx.$$
(ii) For arbitrary constants $c_1$ and $c_2$,
$$\int_a^b [c_1 f(x) + c_2 g(x)]dx = c_1\int_a^b f(x)dx + c_2\int_a^b g(x)dx.$$
(iii) If f ≥ 0 on [a, b], then $\int_a^b f(x)dx \ge 0$. If, furthermore, f is continuous and is not identically zero, then the integral is strictly positive.
(iv)
A4.4 Theorem [Mean value theorem].
(1) If f is a continuous function defined on a closed interval [a, b], there exists a point ξ ∈ (a, b) such that
$$\frac{1}{b - a}\int_a^b f(x)dx = f(\xi).$$
(2) If f and g are continuous on the closed interval [a, b], and if g > 0 on the open interval (a, b), then there exists ξ ∈ (a, b) such that
$$\int_a^b f(x)g(x)dx = f(\xi)\int_a^b g(x)dx.$$
□
A4.5 Fundamental theorem of calculus, primitive function, indefinite integral. If f is integrable on a closed interval I = [a, b], then for x ∈ [a, b], we can define the definite integral of f on [a, x]:
$$F(x) = \int_a^x f(t)dt,$$
which is a function of x on I.
Theorem [Fundamental theorem of calculus].
$$\int_a^b f(x)dx = F(b) - F(a)$$
or F'(x) = f(x). □
Any function such that F'(x) = f(x) on I is called a primitive function of f. A primitive function for f, if any, is not unique; it is unique up to an additive constant. Thus any primitive function of f, if any, can be written as
$$F(x) = \int_a^x f(t)dt + C,$$
where C is called the integration constant.
The indefinite integral of f is defined as a primitive function of f, and is denoted by
$$\int f(x)dx.$$
A4.6 Integration by parts. Let f and g be $C^1$-functions (→A3.21) on an interval I containing [a, b]. Then
$$\int_a^b f(x)g'(x)dx = f(x)g(x)\Big|_a^b - \int_a^b f'(x)g(x)dx,$$
where $h(x)|_a^b \equiv h(b) - h(a)$.
A4.7 Improper integral. When
$$\lim_{c \to b}\int_a^c f(x)dx$$
exists, we write this as $\int_a^b f(x)dx$ even if f is not integrable on [a, b) in the sense of A4.2, and call it an improper integral. b may be a discontinuous point of f or ±∞. It is easy to construct the Cauchy convergence criterion (→A1.3) for improper integrals.
Improper integrals satisfy A4.3 (i) and (ii), and if the improper integral of |f| is definable (we say f is absolutely integrable; absolutely integrable functions are integrable), (iii) holds as well.
Also the fundamental theorem of calculus (→A4.5) and the mean value theorem (→A4.4) are valid.
A4.8 Change of integration variables. Let f be a continuous function on an interval I = [a, b], and φ(t) be a $C^1$-function (→A3.21) defined on
an interval J whose range is in I. Let α, β ∈ J (α ≠ β), a = φ(α) and b = φ(β). Then
$$\int_a^b f(x)dx = \int_\alpha^\beta f(\varphi(t))\varphi'(t)dt.$$
If f is an even function (→[AII13]), then
$$\int_a^b f(x)dx = \int_{-b}^{-a} f(x)dx.$$
If f is an odd function (→[AII13]), then
$$\int_a^b f(x)dx = -\int_{-b}^{-a} f(x)dx.$$
A5 Infinite Series
A5.1 Changing the order of summation in infinite series. Absolutely convergent series and conditionally convergent series (→A1.12, A1.14) have diametrically different properties with respect to the rearrangement of the terms in the summation:
Theorem.
(i) The sum of an absolutely convergent series does not depend on the order of summation of the terms in the series.
(ii) If a series $\sum_{n=1}^{\infty} a_n$ is conditionally convergent, then for any given real c there is a reordering $\{a_{\gamma(n)}\}$ of the series such that
$$\sum_{n=1}^{\infty} a_{\gamma(n)} = c.$$
There is also a reordering to make the series divergent to ±∞. □
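Statement (ii) can be watched at work on the alternating harmonic series 1 - 1/2 + 1/3 - .... The sketch below uses the usual greedy reordering (add unused positive terms while at or below the target, unused negative terms while above it); the function name and the choice of example series are assumptions made for this illustration.

```python
def rearranged_partial_sum(target, n_terms):
    """Partial sum of a greedy rearrangement of 1 - 1/2 + 1/3 - ... aimed at `target`."""
    pos, neg = 1, 2  # next unused odd denominator (positive term) and even one (negative term)
    total = 0.0
    for _ in range(n_terms):
        if total <= target:
            total += 1.0 / pos
            pos += 2
        else:
            total -= 1.0 / neg
            neg += 2
    return total

for target in (2.0, 0.0, -1.0):
    print(target, rearranged_partial_sum(target, 100000))
```

Every term of the original series is used exactly once in the limit, yet the sum can be steered to any prescribed value, because the positive and the negative parts each diverge while the individual terms tend to zero.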
A5.2 Product of two series. The product of two absolutely convergent series (→A1.12) can be computed via the distributive law: Let $s = \sum a_n$ and $t = \sum b_n$, and let both be absolutely convergent. Then
$$st = \sum_{m,n} a_m b_n,$$
where the terms may be summed in any order.
This is not necessarily true for conditionally convergent series, e.g., consider $a_n = b_n = (-1)^n/\sqrt{n}$.
A5.3 Theorem [Comparison theorem I. comparison with improper integral]. Let r(x) > 0 be a continuous monotone decreasing function (→A2.11) on [k, +∞) with k being a positive integer such that $\lim_{x \to \infty} r(x) = 0$. Let $r_n = r(n)$. $\sum_{n=k}^{\infty} r_n$ converges (resp., diverges), if $\int_k^{\infty} r(x)dx$ converges (resp., diverges). □
Examples:
(i) $\sum_{n=1}^{\infty} n^{-s}$ (s > 0) converges for s > 1 and diverges for s ≤ 1.
(ii) $\sum_{n=2}^{\infty} 1/[n(\log n)^s]$ (s > 0) converges for s > 1 and diverges for s ≤ 1.
(iii) $\sum_{n=3}^{\infty} 1/[n \log n (\log\log n)^s]$ (s > 0) converges for s > 1 and diverges for s ≤ 1.
A5.4 Theorem [Comparison theorem II. comparison of series]. Let $\sum_{n=1}^{\infty} u_n$ and $\sum_{n=1}^{\infty} v_n$ be positive term series, and suppose there is a positive integer $n_0$ such that for $n > n_0$

Then
(i) If $\sum v_n$ converges, then so does $\sum u_n$.
(ii) If $\sum u_n$ diverges, then so does $\sum v_n$. □
From this theorem, we get useful convergence criteria:
A5.5 Cauchy's convergence criterion. Let $\sum a_n$ be a positive term series. Suppose the limit $\rho = \lim_{n \to \infty}(a_{n+1}/a_n)$ exists. If ρ < 1, then the series converges, and if ρ > 1, the series diverges.
A5.6 Gauss' convergence criterion. Consider a positive term series $\sum a_n$ with
$$\frac{a_n}{a_{n+1}} = 1 + \frac{\sigma}{n} + O\left[\frac{1}{n^{1+\delta}}\right],$$
where δ is positive (see footnote 480). Then the series converges if σ > 1, and diverges if σ ≤ 1.
A5.7 Abel's formula. Let the partial sums be $s_m = \sum_{n=1}^{m} a_n$ and $t_m = \sum_{n=1}^{m} b_n$ (with $s_0 = 0$). Then
$$\sum_{n=k}^{m} a_n t_n = [s_m t_m - s_{k-1} t_k] - \sum_{n=k}^{m-1} s_n b_{n+1}.$$
This is a discrete analogue of integration by parts (→A4.6).
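The summation-by-parts identity is easy to check numerically. The sketch below evaluates both sides of $\sum_{n=k}^{m} a_n t_n = s_m t_m - s_{k-1} t_k - \sum_{n=k}^{m-1} s_n b_{n+1}$, with $s_n$ and $t_n$ the partial sums of $a_n$ and $b_n$ and $s_0 = 0$; note the assumption that the last sum stops at m - 1. The function name `abel_check` is hypothetical.

```python
def abel_check(a, b, k, m):
    """Return (lhs, rhs) of the summation-by-parts identity for 1-based indices k..m.

    a and b are Python lists holding a_1, a_2, ... and b_1, b_2, ... (so a_n is a[n-1]).
    """
    n_max = len(a)
    s = [0.0] * (n_max + 1)  # s[n] = a_1 + ... + a_n, with s[0] = 0
    t = [0.0] * (n_max + 1)  # t[n] = b_1 + ... + b_n
    for n in range(1, n_max + 1):
        s[n] = s[n - 1] + a[n - 1]
        t[n] = t[n - 1] + b[n - 1]
    lhs = sum(a[n - 1] * t[n] for n in range(k, m + 1))
    # b_{n+1} is stored at b[n]; the sum runs over n = k, ..., m-1.
    rhs = s[m] * t[m] - s[k - 1] * t[k] - sum(s[n] * b[n] for n in range(k, m))
    return lhs, rhs

a = [1.0, -0.5, 0.25, 2.0, -3.0, 0.75]
b = [0.5, 0.25, -1.0, 4.0, 0.125, -2.0]
print(abel_check(a, b, 2, 5))
```

Both numbers agree up to rounding for any choice of the lists and of 1 ≤ k ≤ m ≤ len(a).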
This transformation implies the following criteria:
(i) If $\sum_{n=1}^{\infty} a_n$ converges and $\sum_{n=2}^{\infty}(t_n - t_{n-1})$ converges absolutely, then
480 For the symbol O see A1.4.
$\sum_{n=1}^{\infty} a_n t_n$ converges.
(ii) If the sequence $\{s_n\}$ is bounded and $\{t_n\}$ is a monotone decreasing positive sequence converging to zero, then $\sum a_n t_n$ converges.
For example, (ii) implies that $\sum t_n \cos(nx)$ (for x not a multiple of 2π) and $\sum t_n \sin(nx)$ converge, if $\{t_n\}$ is a monotone decreasing positive sequence converging to zero. This is an extension of [AI14] on alternating series.
A5.8 Function sequence, convergence. A sequence of functions $\{f_n(x)\}$ is called a function sequence defined on I, if the domains of all the functions in the sequence are identically I. For a fixed x = c ∈ I, if the sequence $\{f_n(c)\}$ converges, we say the function sequence converges at x = c. If the function sequence converges at every point of I, we say that the sequence converges on I. The limit for each x may be written as f(x), which is regarded as the limit function of the function sequence, and we say the function sequence $\{f_n(x)\}$ converges to f(x). More formally, we say that the function sequence $\{f_n(x)\}$ converges to f(x) if for each x ∈ I and for any positive number ε, there is a positive integer $n_0(\varepsilon, x)$ such that
$$n > n_0(\varepsilon, x) \Rightarrow |f_n(x) - f(x)| < \varepsilon. \quad (A5.1)$$
A5.9 Uniform convergence. Let $\{f_n(x)\}$ be a function sequence defined on an interval I. If in (A5.1) $n_0(\varepsilon, x)$ can be chosen independent of x ∈ I, we say the function sequence $\{f_n\}$ is uniformly convergent to f on I. That $\{f_n\}$ is uniformly convergent to f on I is equivalent to
$$\lim_{n \to \infty} \sup_{x \in I} |f_n(x) - f(x)| = 0.$$
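The sup criterion separates pointwise from uniform convergence in a simple example: $f_n(x) = x^n$ converges to 0 at every point of [0, 1), uniformly on [0, 0.9] but not on [0, 1). The sketch below approximates the supremum on a grid; the example and the helper name `sup_distance` are assumptions chosen for illustration, and the grid maximum only approximates the true supremum.

```python
def sup_distance(n, right_end, samples=10001):
    """Grid approximation of sup over [0, right_end] of |x**n - 0|."""
    return max((right_end * i / (samples - 1)) ** n for i in range(samples))

for n in (5, 50, 500):
    # First column: interval [0, 0.9]. Second column: interval reaching almost up to 1.
    print(n, sup_distance(n, 0.9), sup_distance(n, 0.999999))
```

On [0, 0.9] the supremum is $0.9^n$ and tends to zero; on intervals reaching up to 1 the supremum stays near 1 for every fixed n, so no single $n_0(\varepsilon)$ works for all x.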
A5.10 Theorem [Cauchy's criterion for uniform convergence]. Let $\{f_n(x)\}$ be a function sequence defined on an interval I. A necessary and sufficient condition for the sequence to be uniformly convergent is that for any positive ε there is a positive integer $n_0(\varepsilon)$ such that for any x ∈ I
$$n, m > n_0(\varepsilon) \Rightarrow |f_n(x) - f_m(x)| < \varepsilon.$$
□
A5.11 Function series, convergence, uniform convergence, maximal convergence. $\sum_{n=1}^{\infty} f_n(x)$ is called a function series. Let its partial sum be $s_m(x) = \sum_{n=1}^{m} f_n(x)$. If the function sequence $\{s_n(x)\}$ (uniformly) converges to s(x), we say the series $\sum_{n=1}^{\infty} f_n(x)$ (uniformly) converges to s(x), which is called the sum of the series. If $\sum_{n=1}^{\infty} f_n(x)$ is uniformly and absolutely convergent, we say the series is maximally convergent.
A5.12 Theorem [Uniform convergence preserves continuity]. Let $\{f_n(x)\}$ be a function sequence of continuous functions defined on an interval I.
(i) If the sequence uniformly converges to f on I, then f is continuous in I.
(ii) If the series $\sum_{n=1}^{\infty} f_n(x)$ converges uniformly, then its sum is a continuous function on I. □
A5.13 Theorem [Dini's theorem]. Let $\{f_n(x)\}$ be a sequence of continuous functions defined on the closed interval [a, b]. Suppose the sequence is monotonically decreasing: for any x ∈ [a, b], $f_1(x) \ge f_2(x) \ge \cdots \ge f_n(x) \ge \cdots$. If the sequence $\{f_n(x)\}$ converges on [a, b] to a continuous function f(x), then the sequence uniformly converges to f on [a, b]. □
A5.14 Theorem [Comparison theorem]. Let $\sum_{n=1}^{\infty} a_n$ be a convergent positive term series. For a sequence $\{f_n(x)\}$, suppose $|f_n(x)| \le a_n$ for all n on an interval I. Then the infinite sum $\sum_{n=1}^{\infty} f_n(x)$ is maximally convergent. □
A5.15 Theorem [Exchange of limit and integration]. Let $\{f_n(x)\}$ be a sequence of continuous functions defined on [a, b], uniformly convergent to f(x) there. Then
$$\int_a^b f(x)dx = \lim_{n \to \infty}\int_a^b f_n(x)dx.$$
□
A more general theorem (Arzelà's theorem) will be given in A5.17. The theorem implies that a uniformly convergent series of continuous functions is termwisely integrable.
A5.16 Theorem [Exchange of limit and differentiation]. Let $f_n(x)$ be a $C^1$-function (→A3.21). If $\sum_{n=1}^{\infty} f_n(x)$ converges on I, and $\sum_{n=1}^{\infty} f_n'(x)$ converges uniformly on I, then the sum of the series is differentiable and
$$\frac{d}{dx}\sum_{n=1}^{\infty} f_n(x) = \sum_{n=1}^{\infty} f_n'(x).$$
□
A5.17 Theorem [Arzelà's theorem]. Let $f_n(x)$ be a continuous function defined on a closed set [a, b] (actually this need not be a closed interval) and uniformly bounded, i.e., there is a positive number M independent of n such that $|f_n(x)| < M$ on the interval. If the function sequence $\{f_n(x)\}$ converges to a continuous function f(x) on [a, b], then
$$\int_a^b f(x)dx = \lim_{n \to \infty}\int_a^b f_n(x)dx.$$
□
A5.18 Majorant. For a function sequence $\{f_n(x)\}$ defined on an interval I, a function σ(x) such that $|f_n(x)| < \sigma(x)$ is called a majorant of the sequence.
Theorem. If a majorant σ(x) is integrable on the interval, then the order of integration and $\lim_{n \to \infty}$ can be exchanged. □
A5.19 Convergence radius of power series. For a power series $\sum_{n=0}^{\infty} a_n x^n$,
$$r = \frac{1}{\limsup_{n \to \infty} |a_n|^{1/n}}$$
is called the convergence radius of the power series (the reason for the name is seen from the following theorem A5.20; the formula is called the Cauchy-Hadamard formula). Here, if the lim sup diverges to +∞, then we define r = 0, and if the lim sup converges to zero, then we define r = +∞.
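The lim sup in the Cauchy-Hadamard formula matters when the coefficients oscillate. As a rough sketch, one can estimate it by the maximum of $|a_n|^{1/n}$ over a window of late indices; this finite-window maximum is only a heuristic stand-in for the lim sup, and `radius_estimate` is a hypothetical helper name.

```python
def radius_estimate(coeff, n_from, n_to):
    """Estimate r = 1 / limsup |a_n|^(1/n) from the indices n_from..n_to (n_from >= 1)."""
    roots = [abs(coeff(n)) ** (1.0 / n)
             for n in range(n_from, n_to + 1)
             if coeff(n) != 0]
    return 1.0 / max(roots)

# a_n = 2^n for even n and 0 for odd n: the ordinary limit of |a_n|^(1/n) does not
# exist, but the lim sup is 2, so r = 1/2.
print(radius_estimate(lambda n: 2.0 ** n if n % 2 == 0 else 0.0, 200, 400))

# a_n = n: |a_n|^(1/n) tends to 1 slowly, so the estimate approaches r = 1 from below.
print(radius_estimate(lambda n: float(n), 200, 400))
```

The second example shows why such estimates should be read with care: the convergence of $n^{1/n}$ to 1 is only logarithmic.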
A5.20 Theorem [Power series is termwisely differentiable]. The power series $\sum_{n=0}^{\infty} a_n x^n$ is absolutely convergent for |x| < r, and is divergent for |x| > r, where r is the convergence radius (→A5.19). For any 0 < ρ < r, the series is uniformly convergent (→A5.11) in [-ρ, ρ] to a continuous function (cf. A5.12), so that the series is termwisely differentiable there. □
A5.21 Theorem [Power series defines a real analytic function]. The power series $\sum_{n=0}^{\infty} a_n x^n$ whose convergence radius is r uniquely determines a $C^\omega$-function (→A3.22) f in the open interval (-r, r). Actually, the power series is the Taylor series (→A3.16) for f. □
A5.22 Theorem [Continuity at x = r or -r]. Let r be the convergence radius of the power series $\sum_{n=0}^{\infty} a_n x^n = f(x)$. If $\sum_{n=0}^{\infty} a_n r^n$ is convergent, then f(x) is continuous in (-r, r]. If $\sum_{n=0}^{\infty} a_n (-r)^n$ is convergent, then f(x) is continuous in [-r, r). □
A5.23 Infinite product. For a sequence $\{a_n\}$ ($a_n \neq 0$), $a_1 a_2 \cdots a_n \cdots$ is called an infinite product, and is denoted by $\prod_{n=1}^{\infty} a_n$. $p_n = a_1 a_2 \cdots a_n$ is called the partial product.
A5.24 Convergence of infinite product. Let $\prod_{n=1}^{\infty} a_n$ be an infinite product and its partial product sequence be $\{p_n\}$. If this sequence converges, and $p = \lim_{n \to \infty} p_n$ is not zero, we say the infinite product converges to p: $p = \prod_{n=1}^{\infty} a_n$. Otherwise, we say the infinite product is divergent.
A5.25 Theorem [Convergence condition for infinite product].
(i) A necessary and sufficient condition for the infinite product $\prod_{n=1}^{\infty}(1 + u_n)$ ($u_n > -1$) to be convergent is that the infinite series $\sum_{n=1}^{\infty}\log(1 + u_n)$ converges.
(ii) If $\sum_{n=1}^{\infty} u_n$ ($u_n > -1$) converges absolutely, then $\prod_{n=1}^{\infty}(1 + u_n)$ converges. (In this case we say the infinite product converges absolutely, and the product does not depend on the order of its terms.)
(iii) If $\sum u_n$ ($u_n > -1$) and $\sum u_n^2$ both converge, then the infinite product $\prod_{n=1}^{\infty}(1 + u_n)$ converges. □
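A quick numerical check of criterion (ii): with $u_n = -1/n^2$ for n ≥ 2 (the index starts at 2 here so that $u_n > -1$), the series $\sum u_n$ converges absolutely and the partial products telescope to (N + 1)/(2N), so the product converges to 1/2. The example and the name `partial_product` are illustrative choices.

```python
def partial_product(N):
    """p_N = product over n = 2, ..., N of (1 - 1/n^2)."""
    p = 1.0
    for n in range(2, N + 1):
        p *= 1.0 - 1.0 / (n * n)
    return p

for N in (10, 100, 1000):
    # Compare with the telescoped closed form (N + 1) / (2N).
    print(N, partial_product(N), (N + 1) / (2.0 * N))
```

The limit 1/2 is nonzero, as the definition of convergence in A5.24 requires.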
A5.26 Conditional convergence of infinite product. If an infinite product $\prod_{n=1}^{\infty}(1 + u_n)$ converges but does not converge absolutely (→A5.25(ii)), we can reorder the product to converge to any positive number. □
This is quite parallel to a similar theorem for conditionally convergent series (→A5.1(ii)).
A6 Function of Two Variables
Since real-valued functions of two variables illustrate the complications due to the existence of many independent variables, in this rudimentary part, we discuss only a function defined on a point set D in $R^2$.
A6.1 Rudiments of topology.
(i) If an open set $U \subset R^2$ is not a union of two disjoint nonempty open sets, U is said to be connected.
(ii) Theorem. A necessary and sufficient condition for an open set U to be connected is that any points P, Q ∈ U can be connected by a piecewise straight curve in U. □
(iii) A connected open set is called a region, and its closure (→A1.19) is called a closed region.
(iv) The distance ρ(x, y) of two points x and y in $R^2$ is defined as
$$\rho(x, y) = |x - y| = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2},$$
where x (or y) is identified with its coordinate expression, say, $(x_1, x_2)$.
A6.2 Function, domain, range. Let D be a point set in $R^2$. A rule f which defines a correspondence of each x ∈ D to some real number is called a function from $D \subset R^2$ to R. D is called its domain and $f(D) \subset R$ is called its range (see footnote 481). We write $f : D \to R$.
A6.3 Limit. Let D be a point set in $R^2$. For $f : D \to R$, we say $\lim_{P \to A} f(P) = \alpha \in R$, if for any positive number ε there is a positive number δ(ε) such that
$$\rho(P, A) < \delta(\varepsilon) \Rightarrow |f(P) - \alpha| < \varepsilon.$$
Notice that the limit should not depend on how P approaches A.
It is easy to write down Cauchy's criterion for the convergence (cf. A1.3).
A6.4 Continuity. Let D be a point set in $R^2$. A function $f : D \to R$ is continuous at an accumulation point (→A1.19) P ∈ D, if $\lim_{Q \to P} f(Q) = f(P)$. (To discuss the continuity at the points of D which are not accumulation points of D is uninteresting.)
Uniform continuity can also be defined quite analogously to the one-variable function case (cf. A2.8).
A6.5 Theorem [Maximum value theorem]. A real-valued continuous function f defined on a bounded closed set $D \subset R^2$ has maximum and minimum values on D. The range of f is a closed interval. □
(cf. [AII9]).
A6.6 Partial differentiation. Let f(x, y) be a real-valued function defined in a region $D \subset R^2$, and (a, b) ∈ D. If f(x, b) is differentiable at a with respect to x, we say that f(x, y) is partially differentiable with respect to x at (a, b), and the derivative is denoted by $f_x(a, b)$. More generally, if f is partially differentiable in D with respect to x, we may define $f_x(x, y)$:
$$f_x(x, y) = \lim_{h \to 0}\frac{f(x + h, y) - f(x, y)}{h}.$$
If we write z = f(x, y), $f_x(x, y)$ is written as $\partial z/\partial x$. $f_x(x, y)$ is called the partial derivative of f with respect to x. We say that we partially differentiate f with respect to x to obtain $f_x(x, y)$. Similarly, we can
481 The set $\{f(x) : x \in D\}$ is often written as f(D).
define the partial derivative with respect to y of f. Analogously, we can define higher-order (mixed) partial derivatives like $f_{xxy}$.
Warning. Even if $f_x$ and $f_y$ exist at a point, f need not be continuous at the point. □
This implies that the 'differentiability' of f must be defined separately from its partial differentiability.
A6.7 Differentiability, total differential. Let f(x, y) be a real-valued function defined in a region $D \subset R^2$, and (a, b) ∈ D. We say f is differentiable at (a, b) if there are constants A and B such that
$$f(x, y) = f(a, b) + A(x - a) + B(y - b) + o\left[\sqrt{(x - a)^2 + (y - b)^2}\right].$$
Theorem. If f above is differentiable at (a, b), then f is continuous there, and is partially differentiable with respect to x and y with $A = f_x(a, b)$, $B = f_y(a, b)$. □
$dz = f_x dx + f_y dy$ is called the total differential of f.
We say that f is differentiable in D, if f is differentiable at every point in D.
Intuitively, if a local linear approximation is reliable, we say the function is differentiable.
A6.8 Theorem [Partial differentiability and differentiability]. Let f be a function defined in a region $D \subset R^2$. If $f_x$ and $f_y$ exist and are continuous in D, then f is differentiable in D. □
A6.9 Theorem [Order of partial differentiation]. Let f be a function defined in a region $D \subset R^2$. If the partial derivatives $f_x$, $f_y$, $f_{xy}$ and $f_{yx}$ exist and if $f_{xy}$ and $f_{yx}$ are continuous, then $f_{xy} = f_{yx}$. □
A6.10 Theorem [Young's theorem]. Let f be a function defined in a region $D \subset R^2$. If $f_x$ and $f_y$ exist and are differentiable, then $f_{xy} = f_{yx}$. □
A6.11 Theorem [Schwarz' theorem]. Let f be a function defined in a region $D \subset R^2$. If the partial derivatives $f_x$, $f_y$ and $f_{xy}$ exist and if $f_{xy}$ is continuous, then $f_{yx}$ exists and $f_{xy} = f_{yx}$. □
A6.12 $f_{xy} = f_{yx}$ is not always correct. Let $f(x, y) = xy(x^2 - y^2)/(x^2 + y^2)$ except for (0, 0), where f is defined to be 0. Then $f_{xy}(0, 0) \neq f_{yx}(0, 0)$; the left-hand side is -1, and the right-hand side is 1.
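The two mixed derivatives at the origin can be approximated by nested central differences, as in the sketch below. The inner step h is taken much smaller than the outer step k so that $f_x(0, y)$ and $f_y(x, 0)$ are resolved before the outer difference is formed; these step sizes are ad hoc choices for this illustration, and $f_{xy}$ is read as the y-derivative of $f_x$.

```python
def f(x, y):
    """f(x, y) = xy(x^2 - y^2)/(x^2 + y^2) away from the origin, and f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y * (x * x - y * y) / (x * x + y * y)

def fx(x, y, h=1e-6):
    """Central-difference approximation of the partial derivative f_x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def fy(x, y, h=1e-6):
    """Central-difference approximation of the partial derivative f_y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

k = 1e-3
fxy_00 = (fx(0.0, k) - fx(0.0, -k)) / (2 * k)  # y-derivative of f_x at the origin
fyx_00 = (fy(k, 0.0) - fy(-k, 0.0)) / (2 * k)  # x-derivative of f_y at the origin
print(fxy_00, fyx_00)
```

The two printed values come out close to -1 and +1 respectively, in agreement with the values quoted in A6.12.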
A6.13 $C^n$-class function. If f has all the partial derivatives of order n, which are all continuous, we say that f is a $C^n$-function. If derivatives of any order exist, we say that the function is a $C^\infty$-function.
A6.14 Composite function. Let φ(t) and ψ(t) be continuous functions defined on an interval I such that (φ(t), ψ(t)) ∈ D for all t ∈ I. If f(x, y) is a continuous function defined on D, then f(φ(t), ψ(t)) is continuous.
If φ(t) and ψ(t) are differentiable with respect to t, and f(x, y) is differentiable in D, then f(φ(t), ψ(t)) is differentiable with respect to t, and
$$\frac{d}{dt} f(\varphi(t), \psi(t)) = f_x(\varphi(t), \psi(t))\varphi'(t) + f_y(\varphi(t), \psi(t))\psi'(t).$$
If φ(t) and ψ(t) are $C^n$-functions of t, and f(x, y) is $C^n$ in D, then f(φ(t), ψ(t)) is again $C^n$.
These propositions hold even if we replace the functions of t with functions of s and t. For example, if φ(s, t) and ψ(s, t) are $C^n$-functions of s and t in a domain $D_1$, (φ(s, t), ψ(s, t)) ∈ D, and f(x, y) is $C^n$ in D, then f(φ(s, t), ψ(s, t)) is again $C^n$ in $D_1$.
A6.15 Taylor's formula. Let f(x, y) be a $C^n$-class function defined on a region D, (a, b) ∈ D, and let the line segment AP with A = (a, b) and P = (a + h, b + k) be in D. Then
$$f(a + h, b + k) = f(a, b) + \sum_{m=1}^{n-1}\frac{1}{m!}\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^m f(a, b) + R_n$$
with
$$R_n = \frac{1}{n!}\left(h\frac{\partial}{\partial x} + k\frac{\partial}{\partial y}\right)^n f(a + \theta h, b + \theta k)$$
for some θ ∈ (0, 1). $R_n$ is called the remainder.
If f is a $C^\infty$-function (→A6.13) in a region D and if in some subregion $D_A$ of D $\lim_{n \to \infty} R_n = 0$, then we say f is Taylor-expandable in $D_A$:
$$f(x, y) = f(a, b) + \sum_{n=1}^{\infty}\sum_{p+q=n}\frac{1}{p!\,q!}\frac{\partial^{p+q} f(a, b)}{\partial x^p \partial y^q}(x - a)^p (y - b)^q.$$
If f is Taylor-expandable, we say f is a real analytic function ($C^\omega$-function) of two variables.
This is a double series, so we need some general theory of double series and double sequences.
A6.16 Limit of double sequence. Let $\{a_{mn}\}$ be a double sequence. If for any positive ε, there is a positive integer N(ε) such that
$$m, n > N(\varepsilon) \Rightarrow |a_{mn} - a| < \varepsilon,$$
then we say the double sequence converges to a, and write $\lim_{m,n \to \infty} a_{mn} = a$. It is easy to state Cauchy's convergence criterion (→A1.3) for a double sequence.
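A6.17 Warning. $\lim_{m,n \to \infty}$ and $\lim_{m \to \infty}\lim_{n \to \infty}$ or $\lim_{n \to \infty}\lim_{m \to \infty}$ are distinct. For example, if $a_{mn} = 2mn/(m^2 + n^2)$, $\lim_{m \to \infty}\lim_{n \to \infty} a_{mn} = 0$ and $\lim_{n \to \infty}\lim_{m \to \infty} a_{mn} = 0$, but $\lim_{m,n \to \infty} a_{mn}$ does not exist. If $a_{mn} = (-1)^n/m + (-1)^m/n$, then $\lim_{m,n \to \infty} a_{mn} = 0$ but the other limits do not exist.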
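The first example of the warning can be probed directly: along a row (m fixed, n large) the entries of $a_{mn} = 2mn/(m^2 + n^2)$ become small, but along the rays n = m and n = 2m they stay at two different constants, so no double limit can exist. A minimal sketch, with a large finite index standing in for the limit:

```python
def a(m, n):
    """a_mn = 2mn / (m^2 + n^2)."""
    return 2.0 * m * n / (m * m + n * n)

big = 10 ** 8
print(a(5, big))        # fixed m, large n: close to 0
print(a(big, big))      # along the diagonal n = m: equal to 1
print(a(big, 2 * big))  # along the ray n = 2m: equal to 4/5
```

Since the values along different rays disagree however large the indices are, the condition of A6.16 fails for every candidate limit a.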
A6.18 Theorem [Exchange of limits]. Suppose $\lim_{m,n \to \infty} a_{mn} = a$ exists. If for each n $\lim_{m \to \infty} a_{mn}$ exists, then $\lim_{n \to \infty}\lim_{m \to \infty} a_{mn} = a$. If for each m $\lim_{n \to \infty} a_{mn}$ exists, then $\lim_{m \to \infty}\lim_{n \to \infty} a_{mn} = a$. □
A6.19 Double series, convergence. For a double sequence $\{a_{mn}\}$, $\sum_{m,n=1}^{\infty} a_{mn}$ is called a double series. We say that the double series converges if the double sequence $\{s_{mn}\}$ of its partial sums $s_{mn} = \sum_{p=1}^{m}\sum_{q=1}^{n} a_{pq}$ converges. Its absolute convergence can also be defined analogously to the ordinary series case (→A1.12).
Theorem. If a double series $\sum_{m,n=1}^{\infty} a_{mn}$ converges absolutely, then
$$\sum_{m=1}^{\infty}\sum_{n=1}^{\infty} a_{mn} = \sum_{m,n=1}^{\infty} a_{mn}.$$
□
A6.20 Power series of two variables. The set G such that $\sum_{m,n=0}^{\infty} a_{mn} x^m y^n$ is absolutely convergent for all (x, y) ∈ G is called the convergence domain of the double power series.
Theorem. If for (ξ, η) ≠ (0, 0) the double series $\sum a_{mn}\xi^m\eta^n$ is bounded, then for |x| < |ξ| and |y| < |η| the double power series $\sum_{m,n=0}^{\infty} a_{mn} x^m y^n$ is absolutely convergent. □
A6.21 Exchange of order of limits, uniform convergence. If for any positive ε there is N(ε) independent of m such that
$$n > N(\varepsilon) \Rightarrow |a_{mn} - a_m| < \varepsilon,$$
we say $\{a_{mn}\}$ converges to $a_m$ uniformly with respect to m.
Theorem. If $\{a_{mn}\}$ converges to $a_m$ uniformly with respect to m in the n → ∞ limit, and if $a_m$ converges to a in the m → ∞ limit, then $\lim_{m,n \to \infty} a_{mn} = a$. □
Theorem. If $\lim_{n \to \infty}\lim_{m \to \infty} a_{mn}$ exists, $\{a_{mn}\}$ converges to $a_m$ uniformly with respect to m in the n → ∞ limit, and if $a_m$ converges to a in the m → ∞ limit, then $\lim_{m \to \infty}\lim_{n \to \infty} a_{mn} = \lim_{n \to \infty}\lim_{m \to \infty} a_{mn} = a$. □
In contrast to A6.18, here the existence of $\lim_{m,n \to \infty} a_{mn}$ is not assumed.
A6.22 Counterexample.
(i) For $a_{mn} = (-1)^n m/(m + n)$, $\lim_{m \to \infty}\lim_{n \to \infty} a_{mn} = 0$, but $\lim_{n \to \infty}\lim_{m \to \infty} a_{mn}$ does not exist.
(ii) For $a_{mn} = m/(m + n)$ both limits exist but are not identical.
A6.23 Theorem [Differentiation and integration within integration]. Let f(x, y) be a bounded function defined on a rectangle $K = \{(x, y) \mid x \in [a, b], y \in [c, d]\}$. Assume that f is continuous as a function of x (resp., y) for each y (resp., x). Then
(i) $\int_a^b dx\, f(x, y)$ is a continuous function of y in [c, d].
(ii) If f(x, y) is partially differentiable with respect to y, and if $f_y(x, y)$ is bounded on K, and continuous as a function of x for each y, then
$$\frac{d}{dy}\int_a^b dx\, f(x, y) = \int_a^b dx\, \frac{\partial}{\partial y} f(x, y).$$
(iii)
$$\frac{d}{du}\int_a^u dx\, f(x, y) = f(u, y).$$
(iv)
$$\int_c^d dy\int_a^b dx\, f(x, y) = \int_a^b dx\int_c^d dy\, f(x, y).$$
□
A6.24 Theorem [Differentiation and integration within improper integration]. Let f(x, y) be a bounded function defined on a rectangle $K = \{(x, y) \mid x > a, y \in [c, d]\}$. Assume that f is continuous as a function of x (resp., y) for each y (resp., x). Assume, furthermore, that there is a nonnegative continuous function σ(x) such that $|f(x, y)| \le \sigma(x)$ and $\int_a^{+\infty} dx\, \sigma(x) < +\infty$. Then
(i) $\int_a^{+\infty} dx\, f(x, y)$ is a continuous function of y in [c, d].
(ii) If f(x, y) is partially differentiable with respect to y and if there is a nonnegative continuous function $\sigma_1(x)$ such that $|f_y(x, y)| \le \sigma_1(x)$ and $\int_a^{+\infty} dx\, \sigma_1(x) < +\infty$, then
$$\frac{d}{dy}\int_a^{+\infty} dx\, f(x, y) = \int_a^{+\infty} dx\, \frac{\partial}{\partial y} f(x, y).$$
(iii)
$$\int_c^d dy\int_a^{+\infty} dx\, f(x, y) = \int_a^{+\infty} dx\int_c^d dy\, f(x, y).$$
□
A7 Fourier Series and Fourier Transform

In this section all the integrals are Riemann integrals [AIV1]. Thus integrable or absolutely integrable means Riemann-integrable or absolutely Riemann-integrable.
A7.1 Fourier series. Let f be a function on R with period 2π (see footnote 482). Assume that the following integrals exist (see footnote 483):
$$a_n = \frac{1}{\pi}\int_0^{2\pi} dx\, f(x)\cos nx \quad \text{for } n = 0, 1, 2, \ldots,$$
$$b_n = \frac{1}{\pi}\int_0^{2\pi} dx\, f(x)\sin nx \quad \text{for } n = 1, 2, 3, \ldots.$$
Then
$$S[f] = \frac{1}{2}a_0 + \sum_{n=1}^{\infty}(a_n\cos nx + b_n\sin nx)$$
is called the Fourier series of f. To construct S[f] is said to Fourier expand f.
Notice that the Fourier series converges uniformly (→[AV11]) if $\sum_{n=0}^{\infty}|a_n|$ and $\sum_{n=1}^{\infty}|b_n|$ both converge.
A7.2 Theorem. Let f be a 2π-periodic function which has at most finitely many discontinuities, and is absolutely integrable on [0, 2π]. If S[f] converges uniformly, then $S[f](x_0)$ converges to $f(x_0)$ if f is continuous at $x_0$. Specifically, if f is a 2π-periodic continuous function, then S[f] = f. □
This theorem uses a property of the Fourier series itself (its uniform convergence), so it is not very satisfactory. A7.8 below tells us that we cannot remove this extra condition from the theorem.
A7.3 Complex Fourier series. Let f be a function on R with period 2π. Assume that the following integrals exist:
$$c_k = \frac{1}{2\pi}\int_0^{2\pi} dx\, f(x)e^{-ikx} \quad \text{for } k = \ldots, -2, -1, 0, 1, 2, \ldots.$$
Then
$$S[f] \equiv \sum_{k=-\infty}^{\infty} c_k e^{ikx}$$
482 That is, f(x + 2π) = f(x) for any x ∈ R.
483 The integration range can be [-π, π].
is called the complex Fourier series of f.
Needless to say, a theorem corresponding to A7.2 holds.
A7.4 Theorem [Bessel's inequality]. If f is 2π-periodic and square integrable on [0, 2π], then
$$2\pi\sum_{k=-\infty}^{\infty}|c_k|^2 \le \int_{-\pi}^{\pi} dx\, |f(x)|^2.$$
□
A 7.5 Theorem [Parseval's equality]. If $f$ is a $2\pi$-periodic continuous function, and $f'$ is square integrable (in particular, if $f$ is a $2\pi$-periodic $C^1$-function (→A3.21)), then $S[f]$ uniformly converges to $f$. In this case the following equality holds
$$2\pi\sum_{k=-\infty}^{\infty}|c_k|^2 = \int_{-\pi}^{\pi} dx\, |f(x)|^2,$$
which is called Parseval's equality. □
Warning. The continuity of $f$ is not sufficient even for pointwise convergence of $S[f]$ to $f$. See A 7.8.
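As a worked check of Parseval's equality, take (an illustrative choice, not one made in these notes) $f(x) = |x|$ on $[-\pi, \pi]$, extended $2\pi$-periodically. It is continuous with a square-integrable derivative, so A 7.5 applies. The coefficients follow from the definition in A 7.3, and the sum uses $\sum_{k\ \mathrm{odd} > 0} k^{-4} = \pi^4/96$.

```latex
\begin{aligned}
c_0 &= \frac{\pi}{2}, \qquad
c_k = -\frac{2}{\pi k^2}\ (k \text{ odd}), \qquad
c_k = 0\ (k \neq 0 \text{ even}), \\
2\pi\sum_{k=-\infty}^{\infty}|c_k|^2
 &= 2\pi\left[\frac{\pi^2}{4} + 2\sum_{k\ \mathrm{odd} > 0}\frac{4}{\pi^2 k^4}\right]
  = 2\pi\left[\frac{\pi^2}{4} + \frac{8}{\pi^2}\cdot\frac{\pi^4}{96}\right]
  = \frac{2\pi^3}{3}
  = \int_{-\pi}^{\pi} dx\, |x|^2 .
\end{aligned}
```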
A 7.6 $L_2$-convergence. A function sequence $f_n$ defined on $(-\pi, \pi)$ is said to $L_2$-converge to $f$ (or to converge in the square mean), if
$$\int_{-\pi}^{\pi} dx\, |f_n(x) - f(x)|^2 \to 0$$
as $n \to \infty$.
A7.7 Theorem. If $f$ is a $2\pi$-periodic continuous function, then $S[f]$ $L_2$-converges to $f$, and Parseval's equality (→A7.5) holds. □
A 7.8 Theorem [du Bois-Reymond]. For a $2\pi$-periodic function $f$, its continuity does not guarantee the pointwise convergence of $S[f]$ to $f$. [Counterexamples exist.] □
However,
A 7.9 Theorem [Fejér]. Let $S_n$ be the partial sum of the Fourier series up to the $n$-th term. Define
$$\sigma_n \equiv \frac{1}{n+1}\sum_{k=0}^{n} S_k.$$
If $f$ is a $2\pi$-periodic continuous function, then $\sigma_n$ uniformly converges to $f$. □
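The averaging in A 7.9 can be tried numerically. The sketch below is only an illustration under stated assumptions: it uses the continuous test function $f(x) = |x|$ on $[-\pi, \pi]$ (whose Fourier series is $\pi/2 - (4/\pi)\sum_{m\ \mathrm{odd}} \cos(mx)/m^2$), and the function names and grid size are hypothetical choices.

```python
import math

def partial_sum(x, n):
    """S_n(x) for f(x) = |x| on [-pi, pi]; only a_0 and odd cosine terms are nonzero."""
    s = math.pi / 2.0
    for m in range(1, n + 1, 2):
        s -= 4.0 / (math.pi * m * m) * math.cos(m * x)
    return s

def fejer_mean(x, n):
    """sigma_n(x) = (S_0 + S_1 + ... + S_n) / (n + 1), as defined in A 7.9."""
    return sum(partial_sum(x, k) for k in range(n + 1)) / (n + 1)

def max_error(approx, n, points=201):
    """Largest deviation from |x| on a uniform grid over [-pi, pi]."""
    worst = 0.0
    for j in range(points):
        x = -math.pi + 2.0 * math.pi * j / (points - 1)
        worst = max(worst, abs(approx(x, n) - abs(x)))
    return worst

if __name__ == "__main__":
    for n in (5, 20, 80):
        print(n, max_error(partial_sum, n), max_error(fejer_mean, n))
```

For this smooth-enough $f$ the partial sums already converge uniformly, so the sketch merely illustrates the definition of $\sigma_n$ and the decrease of its maximal error with $n$; the theorem matters for continuous functions whose partial sums misbehave (→A 7.8).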
A 7.10 Piecewise $C^1$-function. A function $f$ is said to be piecewise $C^1$, if there are finitely many points $\gamma_1 < \gamma_2 < \cdots < \gamma_m$ such that on each open interval $(\gamma_l, \gamma_{l+1})$ $f$ and $f'$ are continuous and bounded.
Notice that at each $\gamma_l$ the right and left limits (→A2.6) of $f$ (denoted by $f(\gamma_l + 0)$ and $f(\gamma_l - 0)$) exist.
A 7.11 Theorem. If $f$ is piecewise $C^1$, then $S[f]$ converges to $[f(x + 0) + f(x - 0)]/2$ for all $x$. The same holds if $f'$ is piecewise continuous and square-integrable (i.e., its boundedness need not be assumed). The convergence is uniform outside arbitrarily small neighborhoods of the discontinuities of $f$. □
A 7.12 Theorem. If $f$ is a $2\pi$-periodic function, integrable on $(-\pi, \pi)$ and of bounded variation,^{484} then the conclusion of A 7.11 holds. □
A7.13 Theorem [Locality of convergence]. Let $f_1$ and $f_2$ be piecewise $2\pi$-periodic functions integrable on $(-\pi, \pi)$. If there is a neighborhood of $x_0$ such that $f_1 = f_2$ on it, then $S[f_1]$ converges (resp., diverges) at $x_0$ if and only if $S[f_2]$ converges (resp., diverges) at $x_0$. When they converge, the limits are identical. □
A 7.14 Fourier transform. Let $f$ be an integrable function on $\mathbf{R}$. If the following integral exists
$$\hat{f}(k) = \int_{-\infty}^{\infty} dx\, f(x)e^{-ikx},$$
it is called the Fourier transform of $f$. Mathematicians often multiply this definition by $1/\sqrt{2\pi}$ to symmetrize the formulas. However, this makes the convolution formula A 7.20(iv) awkward. For physicists and practitioners, the definition here is the most convenient.
If a function $f : \mathbf{R} \to \mathbf{C}$ is continuous except for finitely many points, and absolutely integrable, then its Fourier transform $\hat{f} : \mathbf{R} \to \mathbf{C}$ is a bounded continuous function such that $\lim_{k\to\infty}\hat{f}(\pm k) = 0$.
Also we have an important relation
$$\widehat{f'} = ik\hat{f}.$$
A7.15 Rapidly decreasing function. A function $f : \mathbf{R} \to \mathbf{C}$ is called a rapidly decreasing function, if the following two conditions hold:
(i) $f$ is a $C^\infty$-function (→A3.21).
(ii) For any $k, l \in \mathbf{N}$, $x^l f^{(k)}(x) \to 0$ in the $|x| \to \infty$ limit.
The function is also called a Schwartz-class function (or $\mathcal{S}$-function).
484 That is, $f$ can be written as a difference of two monotone increasing functions (→A2.11).
A 7.16 Inverse Fourier transform. If $f$ is a rapidly decreasing function, then the following inversion formula holds:
$$f(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} dk\, \hat{f}(k)e^{ikx}.$$
□
A 7.17 Theorem. If $f : \mathbf{R} \to \mathbf{C}$ is continuous (and bounded), and both $f$ and $\hat{f}$ are absolutely integrable, then the inversion formula holds. □
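A 7.18 Parseval's equality. If the inversion formula holds and if $f$ is square integrable, we have
$$\int_{-\infty}^{\infty} dx\, |f(x)|^2 = \frac{1}{2\pi}\int_{-\infty}^{\infty} dk\, |\hat{f}(k)|^2.$$
□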
A 7.19 Convolution. Let $f$ and $g$ be integrable functions defined on $\mathbf{R}$. The following $h(x)$ is called the convolution of $f$ and $g$ and is denoted by $f * g$:
$$h(x) = (f * g)(x) \equiv \int_{-\infty}^{\infty} dy\, f(x - y)g(y).$$
A 7.20 Properties of convolution.
(i) The definition is symmetric with respect to $f$ and $g$, that is, $f * g = g * f$.
(ii) If $f$ and $g$ are rapidly decreasing, then so is $h$.
(iii) $h^{(k)} = f^{(k)} * g$.
(iv) $\widehat{f * g} = \hat{f}\hat{g}$.
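Property (iv) can be checked numerically for a concrete pair. The following is a minimal sketch under stated assumptions: the Gaussian $f(x) = e^{-x^2}$, the truncation of all integrals to $[-8, 8]$, the step size, and every function name are illustrative choices. For this $f$ one has $\hat{f}(k) = \sqrt{\pi}\,e^{-k^2/4}$ and $(f * f)(x) = \sqrt{\pi/2}\,e^{-x^2/2}$, with the Fourier transform defined as in A 7.14 (no $1/\sqrt{2\pi}$ factor).

```python
import math

STEP = 0.02
GRID = [-8.0 + STEP * j for j in range(801)]   # uniform grid on [-8, 8]

def f(x):
    # Illustrative rapidly decreasing function.
    return math.exp(-x * x)

def convolve(f1, f2, x):
    """(f1 * f2)(x) = integral of f1(x - y) f2(y) dy, by a Riemann sum on GRID."""
    return STEP * sum(f1(x - y) * f2(y) for y in GRID)

def fourier_transform(func, k):
    """hat f(k) = integral of f(x) exp(-ikx) dx, by a Riemann sum on GRID."""
    values = [func(x) for x in GRID]
    re = STEP * sum(v * math.cos(k * x) for v, x in zip(values, GRID))
    im = -STEP * sum(v * math.sin(k * x) for v, x in zip(values, GRID))
    return complex(re, im)

if __name__ == "__main__":
    k = 1.3
    h = lambda x: convolve(f, f, x)
    lhs = fourier_transform(h, k)           # transform of the convolution
    rhs = fourier_transform(f, k) ** 2      # product of the transforms
    print(lhs, rhs, math.pi * math.exp(-k * k / 2.0))
```

The three printed numbers should agree closely; with the symmetrized $1/\sqrt{2\pi}$ convention an extra factor $\sqrt{2\pi}$ would appear on the right-hand side.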
A 7.21 Theorem [Inversion formula for piecewise $C^1$-function]. Let $f$ be a piecewise $C^1$-function (→A7.10) on $\mathbf{R}$. Then
$$\frac{1}{2}\left[f(x_0 - 0) + f(x_0 + 0)\right] = \frac{1}{2\pi}\,\mathrm{P.V.}\int_{-\infty}^{\infty} dk\, e^{ikx_0}\hat{f}(k).$$
Here P.V. denotes Cauchy's principal value of the integral. □
We can write the formula as
$$\frac{1}{2}\left[f(x_0 - 0) + f(x_0 + 0)\right] = \lim_{\lambda\to\infty}\frac{1}{\pi}\int_{-\infty}^{\infty} d\xi\, \frac{\sin\lambda(x_0 - \xi)}{x_0 - \xi}\,f(\xi).$$
□
A 7.22 Multidimensional case. It is easy to generalize the rapidly decreasing property to multidimensional cases. If a function is rapidly decreasing, then the formal generalization of the above results to multidimensional cases is legitimate.
A8 Ordinary Differential Equation
Practical advice. See Schaum's outline series Differential Equations by R. Bronson for elementary methods and practice. To learn the theoretical side, V. I. Arnold, Ordinary differential equations (MIT Press 1973; there is a new version from Springer) is highly recommended.
A8.1 Ordinary differential equation. Let $y$ be an $n$-times differentiable function of $x \in \mathbf{R}$. A functional relation $f(x, y, y', \ldots, y^{(n)}) = 0$ among $x, y, y', \ldots, y^{(n)}$ is called an ordinary differential equation (ODE) for $y(x)$, and $n$ is called its order, where the domain of $f$ is assumed to be appropriate. Such $y(x)$ that satisfies $f = 0$ is called a solution to the ODE.
A8.2 General solution, singular solution. The solution $y = \varphi(x, c_1, c_2, \ldots, c_n)$ to $f = 0$ in A8.1 which contains $n$ arbitrary constants $c_1, \ldots, c_n$ (which are called integral constants) is called the general solution of $f = 0$. A solution which can be obtained from this by specifying the arbitrary constants is called a particular solution. A solution which cannot be obtained as a particular solution is called a singular solution.
A8.3 Normal form. If the highest order derivative of $y$ is explicitly solved as $y^{(n)}(x) = F(x, y, y', \ldots, y^{(n-1)})$, we say the ODE is in the normal form. Notice that ODEs not in the normal form may exhibit many pathological phenomena.
A8.4 Initial value problem of first order ODE. Consider the following first order ODE
$$\frac{dy}{dx} = f(x, y), \tag{A8.1}$$
where $f$ is defined in a region $D \subset \mathbf{R}^2$. To solve this under the condition that $y(x_0) = y_0$ ($(x_0, y_0) \in D$) is called an initial value problem.
A8.5 Theorem [Cauchy-Peano]. If for (A8.1) $f$ is continuous on a region $D \subset \mathbf{R}^2$, then for any $(x_0, y_0) \in D$ there is a solution $y(x)$ of (A8.1) passing through this point whose domain is an open interval $(\alpha, \omega)$ ($-\infty \le \alpha < \omega \le \infty$), and in the limits $x \to \alpha$ and $x \to \omega$, $y(x)$ approaches the boundary of $D$ or the solution becomes unbounded. □
A8.6 Lipschitz condition. Let $f(x, y)$ be a continuous function whose domain is a region $D \subset \mathbf{R}^2$. If for any compact set (→[AI25]) $K \subset D$ there is a positive constant $L_K$ (which usually depends on $K$) such that for any $(x, y), (x, y') \in K$
$$|f(x, y) - f(x, y')| \le L_K |y - y'|,$$
then $f$ is said to satisfy a Lipschitz condition on $D$ for $y$.
If $f$ and $f_y$ are both continuous in $D$, then $f$ satisfies a Lipschitz condition on $D$.
A8.7 Theorem [Cauchy-Lipschitz uniqueness theorem]. For (A8.1), if $f$ satisfies a Lipschitz condition on $D$ for $y$, then a solution passing through $(x_0, y_0) \in D$, if there is one, is unique. □
A8.8 Theorem. Let $f : \mathbf{R} \to \mathbf{R}$ be a continuous and monotone decreasing function. Then the initial value problem $dy/dx = f(y)$ (for $x > x_0$) with $y(x_0) = y_0$ has a unique solution for $x \ge x_0$. □
A8.9 Method of quadrature. To solve an ODE by a finite number of indefinite integrals is called the method of quadrature. Representative examples are given in A8.10-A8.13.
A8.10 Separation of variables. The first order equation of the following form
$$\frac{dy}{dx} = p(x)q(y),$$
where $p$ and $q$ are continuous functions, is solvable by the separation of variables: Let $Q(y)$ be a primitive function (→[AIV5]) of $1/q(y)$ and $P$ that of $p$. Then $Q(y) = P(x) + C$ is the general solution, where $C$ is the integration constant.
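A short worked instance of the recipe in A8.10, using the illustrative equation $dy/dx = xy$ (this example is an assumption for illustration and does not appear in the notes):

```latex
\begin{aligned}
p(x) &= x, \quad q(y) = y
  \;\Longrightarrow\;
  Q(y) = \int \frac{dy}{y} = \ln|y| \ (y \neq 0), \qquad P(x) = \frac{x^2}{2}, \\
\ln|y| &= \frac{x^2}{2} + C
  \;\Longrightarrow\;
  y(x) = A\,e^{x^2/2}, \qquad A = \pm e^{C}.
\end{aligned}
```

The constant solution $y \equiv 0$, where $q(y) = 0$ and the division by $q$ is not allowed, corresponds to $A = 0$.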
A8.11 Perfect differential equation, integrating factor. Consider the first order ODE of the following form
$$\frac{dy}{dx} = -\frac{P(x, y)}{Q(x, y)},$$
where $Q \neq 0$. If there is a function $\Phi$ such that $\Phi_x = P$ and $\Phi_y = Q$, then $\Phi(x, y) = C$, $C$ being the integral constant, is the general solution.
Even if $P$ and $Q$ do not have such a 'potential' $\Phi$, $P$ and $Q$ times some function $I$, called an integrating factor, may have a 'potential.' However, it is generally not easy to find such a factor except for some special cases.
A8.12 Linear first order equation, variation of parameter. The first order equation
$$\frac{dy}{dx} = p(x)y + q(x)$$
is called a linear equation. The equation can be solved by the method of variation of parameters. Let $y(x) = C(x)e^{\int^x p(s)ds}$. Then the equation for $C$ can be integrated easily. As we will see in A8.14, the method of variation of parameters always works for linear equations.
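The step "the equation for $C$ can be integrated easily" can be spelled out as follows. This is a sketch of the standard computation; the shorthand $P(x) \equiv \int^x p(s)\,ds$ and the constant $c$ are notation introduced here for convenience.

```latex
\begin{aligned}
y &= C(x)\,e^{P(x)}
  \;\Longrightarrow\;
  \frac{dy}{dx} = C'(x)\,e^{P(x)} + p(x)\,C(x)\,e^{P(x)} = C'(x)\,e^{P(x)} + p(x)\,y, \\
\frac{dy}{dx} &= p(x)\,y + q(x)
  \;\Longrightarrow\;
  C'(x) = q(x)\,e^{-P(x)}, \\
y(x) &= e^{P(x)}\left[c + \int^x dt\, q(t)\,e^{-P(t)}\right].
\end{aligned}
```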
A8.13 Bernoulli equation. The first order equation of the following form is called a Bernoulli equation:
$$\frac{dy}{dx} = p(x)y + Q(x)y^n,$$
where $n$ is a real number. Introducing the new variable $z(x) = y(x)^{1-n}$, we can reduce this equation to the linear case (→A8.12) for $z(x)$.
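The reduction can be verified in one line (for $n \neq 1$; for $n = 1$ the equation is already linear):

```latex
\frac{dz}{dx} = (1 - n)\,y^{-n}\,\frac{dy}{dx}
             = (1 - n)\,y^{-n}\left[p(x)\,y + Q(x)\,y^{n}\right]
             = (1 - n)\,p(x)\,z + (1 - n)\,Q(x),
```

which is a linear first order equation for $z$ of the form treated in A8.12.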
A8.14 Linear ODE with constant coefficients, characteristic equation. Consider
$$\frac{d^2y}{dx^2} + a\frac{dy}{dx} + by = 0, \tag{A8.2}$$
where $a$ and $b$ are constants.
$$P(\lambda) \equiv \lambda^2 + a\lambda + b = 0$$
is called its characteristic equation, and its roots are called characteristic roots.
A8.15 Theorem [General solution to (A8.2)]. If the characteristic roots of (A8.2) are $\alpha$ and $\beta$ ($\neq \alpha$), then its general solution is the linear combination of $\varphi_1(x) = e^{\alpha x}$ and $\varphi_2(x) = e^{\beta x}$. If $\alpha = \beta$, then the general solution is the linear combination of $\varphi_1(x) = e^{\alpha x}$ and $\varphi_2(x) = xe^{\alpha x}$ (the characteristic roots need not be real). □
$\varphi_1(x)$ and $\varphi_2(x)$ are called fundamental solutions and $\{\varphi_1(x), \varphi_2(x)\}$ is called the system of fundamental solutions for (A8.2).
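The theorem can be checked numerically. The sketch below assumes the homogeneous equation has the form $y'' + ay' + by = 0$ with characteristic polynomial $\lambda^2 + a\lambda + b$; the function names, the finite-difference step, and the tolerance for detecting a double root are hypothetical choices.

```python
import cmath

def characteristic_roots(a, b):
    """Roots of lambda^2 + a*lambda + b = 0 (possibly complex)."""
    disc = cmath.sqrt(a * a - 4.0 * b)
    return (-a + disc) / 2.0, (-a - disc) / 2.0

def fundamental_solutions(a, b, tol=1e-12):
    """Return (phi1, phi2) as in A8.15, switching to x*exp(alpha*x) for a double root."""
    alpha, beta = characteristic_roots(a, b)
    phi1 = lambda x: cmath.exp(alpha * x)
    if abs(alpha - beta) > tol:
        phi2 = lambda x: cmath.exp(beta * x)
    else:
        phi2 = lambda x: x * cmath.exp(alpha * x)
    return phi1, phi2

def residual(phi, a, b, x, h=1e-4):
    """Central-difference estimate of phi'' + a*phi' + b*phi at x."""
    d1 = (phi(x + h) - phi(x - h)) / (2.0 * h)
    d2 = (phi(x + h) - 2.0 * phi(x) + phi(x - h)) / (h * h)
    return d2 + a * d1 + b * phi(x)

if __name__ == "__main__":
    for a, b in ((2.0, 5.0), (-2.0, 1.0), (-3.0, 2.0)):
        phi1, phi2 = fundamental_solutions(a, b)
        print(a, b, abs(residual(phi1, a, b, 0.5)), abs(residual(phi2, a, b, 0.5)))
```

The printed residuals should be small (limited by the finite-difference error), including the double-root case $a = -2$, $b = 1$.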
A8.16 Inhomogeneous equation, Lagrange's method of variation of constants. An ODE
$$\frac{d^2y}{dx^2} + a\frac{dy}{dx} + by = f(x) \tag{A8.3}$$
with nonzero $f$ is called an inhomogeneous ODE (the one with $f \equiv 0$ is called a homogeneous equation). The general solution is given by the sum of the general solution to the corresponding homogeneous equation and one particular solution to the inhomogeneous problem. A method to find one solution to (A8.3) is Lagrange's method of variation of constants. Let $\varphi_i(x)$ be the fundamental solutions and determine the functions $C_i(x)$ so that
$$u(x) = C_1(x)\varphi_1(x) + C_2(x)\varphi_2(x)$$
satisfies (A8.3). One solution can be obtained from
$$\frac{dC_1}{dx} = -\frac{f(x)\varphi_2(x)}{W(x)}, \qquad \frac{dC_2}{dx} = \frac{f(x)\varphi_1(x)}{W(x)},$$
where $W(x) = \varphi_1(x)\varphi_2'(x) - \varphi_2(x)\varphi_1'(x)$ is the Wronskian of the fundamental system. □
If the two characteristic roots $\alpha$ and $\beta$ are distinct, then such a $u$ is given by
$$u(x) = \frac{1}{\alpha - \beta}\left(\int_0^x ds\, f(s)e^{\alpha(x-s)} - \int_0^x ds\, f(s)e^{\beta(x-s)}\right).$$
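The closed formula for distinct roots can be evaluated by quadrature. The following is a minimal sketch assuming the inhomogeneous equation $y'' + ay' + by = f(x)$ with characteristic roots $\alpha \neq \beta$; the function name, the use of composite Simpson's rule, and the test case ($a = -3$, $b = 2$, so $\alpha = 2$, $\beta = 1$, with $f \equiv 1$) are illustrative assumptions.

```python
import math

def particular_solution(f, alpha, beta, x, n=2000):
    """u(x) = (1/(alpha-beta)) * int_0^x f(s) [e^{alpha(x-s)} - e^{beta(x-s)}] ds,
    evaluated by composite Simpson's rule with n (even) subintervals."""
    def integrand(s):
        return f(s) * (math.exp(alpha * (x - s)) - math.exp(beta * (x - s)))
    h = x / n
    total = integrand(0.0) + integrand(x)
    for j in range(1, n):
        total += (4.0 if j % 2 else 2.0) * integrand(j * h)
    return total * h / 3.0 / (alpha - beta)

if __name__ == "__main__":
    x = 1.0
    numeric = particular_solution(lambda s: 1.0, 2.0, 1.0, x)
    # For f = 1, alpha = 2, beta = 1 the integral can be done by hand:
    exact = 0.5 * math.exp(2.0 * x) - math.exp(x) + 0.5
    print(numeric, exact)
```

The hand-computed $u(x) = e^{2x}/2 - e^{x} + 1/2$ satisfies $u'' - 3u' + 2u = 1$ with $u(0) = u'(0) = 0$, so this formula picks the particular solution with vanishing initial data at $x = 0$.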
A9 Vector Analysis
A9.1 Gradient. Suppose we have a sufficiently smooth function $f : D \to \mathbf{R}$, where $D \subset \mathbf{R}^2$ is a region. We may imagine that $f(P)$ for $P \in D$ is the altitude of the point $P$ on the island $D$. Since we assume the landscape to be sufficiently smooth, at each point on $D$ there is a well defined direction $\mathbf{n}$ of the steepest ascent and the slope (magnitude) $s$ ($\ge 0$). That is, at each point on $D$, we may define the gradient vector $s\mathbf{n}$, which will be denoted by a vector field grad $f$.
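A9.2 Coordinate expression of grad f. Although grad $f$ is meaningful without any specific coordinate system, in actual calculations, introduction of a coordinate system is often useful. Choose a Cartesian coordinate system O-xy. Then the vector has the following representation:
$$\mathrm{grad}\, f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right),$$
or
$$\mathrm{grad}\, f = \mathbf{i}\frac{\partial f}{\partial x} + \mathbf{j}\frac{\partial f}{\partial y}, \tag{A9.1}$$
where $\mathbf{i}$ and $\mathbf{j}$ are the unit vectors in the $x$- and $y$-directions.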
A9.3 Remark. Note that to represent grad $f$ in terms of numbers, we need two devices: one is the coordinate system to specify the point in $D$ with two numbers, which allows us to describe $f$ as a function of two independent variables, and the other is a pair of vectors to span the two-dimensional vector 'grad $f$' at each point on $D$. In principle any choice is fine, but practically, it is wise to choose these base vectors to be parallel to the coordinate directions at each point. In the choice A9.2, the coordinate system has globally the same coordinate direction at every point on $D$, and the basis vectors are chosen to be parallel to these directions, so they are again globally uniformly chosen. Nonuniformity in space of representation schemes may cause complications. Especially when we formally use operators as explained below, we must be very careful (→A9.7, A9.9 for a warning).
A9.4 Nabla or del. (A9.1) suggests that grad is a map which maps $f$ to the gradient vector at each point in its domain (if $f$ is once partially differentiable). We often write this linear operator as $\nabla$, which is called nabla,^{485} but is often read 'del' in the US. We write grad $f = \nabla f$. $\nabla$ has the following expression if we use the Cartesian coordinates (read A9.3)
$$\nabla \equiv \sum_{k=1}^{n} \mathbf{i}_k\frac{\partial}{\partial x_k}, \tag{A9.2}$$
where $x_k$ is the $k$-th coordinate and $\mathbf{i}_k$ is the unit directional vector in the $k$-th coordinate direction.
A9.5 Divergence. Suppose we have a flow field (velocity field) $\mathbf{u}$ on a domain $D \subset \mathbf{R}^3$. Let us consider a convex domain^{486} $V \subset \mathbf{R}^3$ which may be imagined to be covered by area elements $d\mathbf{S}$ whose area is $|d\mathbf{S}|$, and whose outward normal unit vector is $d\mathbf{S}/|d\mathbf{S}|$. Then $\mathbf{u}\cdot d\mathbf{S}$ is the volume of fluid going out through the area element per unit time. Hence the area integral
$$\int_{\partial V} d\mathbf{S}\cdot\mathbf{u}$$
is the total volume of the fluid lost from the domain $V$ per unit time. The following limit, if it exists, is called the divergence of the vector field $\mathbf{u}$ at point $P$ and is written as div $\mathbf{u}$:
$$\mathrm{div}\,\mathbf{u} \equiv \lim_{|V|\to 0}\frac{\int_{\partial V}\mathbf{u}\cdot d\mathbf{S}}{|V|}, \tag{A9.3}$$
where the limit is taken over a nested sequence of convex volumes converging to a unique point $P$. Thus its meaning is clear: div $\mathbf{u}$ is the rate of loss of the quantity carried by the flow field $\mathbf{u}$ per unit volume.
485 'Nabla' is a kind of harp (Assyrian harp).
486 A set is said to be convex if the segment connecting any two points in the set is entirely included in the same set.
A9.6 Cartesian expression of div. From (A9.3), assuming the existence of the limit, we may easily derive the Cartesian expression for div. Choose as $V$ a tiny cube whose surfaces are perpendicular to the Cartesian coordinate axes of O-xyz. We immediately get
$$\mathrm{div}\,\mathbf{u} = \frac{\partial u_x}{\partial x} + \frac{\partial u_y}{\partial y} + \frac{\partial u_z}{\partial z}. \tag{A9.4}$$
A9.7 Operator div. (A9.4) again suggests that div is an operator which maps a vector field to a scalar field. Comparing (A9.2) and (A9.4) allows us to write
$$\mathrm{div}\,\mathbf{u} = \nabla\cdot\mathbf{u}.$$
This 'abuse' of the symbol is allowed only in the Cartesian coordinates. Generalization to $n$-space is straightforward.
A9.8 Curl. Let $\mathbf{u}$ be a vector field as in A9.5. Take a singly connected compact surface $S$ in $\mathbf{R}^3$ whose boundary is smooth. The boundary closed curve with the orientation according to the right-hand rule is denoted by $\partial S$ (see Fig.). Consider the following line integral along $\partial S$:
$$\int_{\partial S}\mathbf{u}\cdot d\mathbf{l},$$
where $d\mathbf{l}$ is the line element along the boundary curve. Let us imagine a straight vortex line and take $S$ to be a disc perpendicular to the line with its center on the line. Immediately we see that the integral is the strength of the vortex whose center (singular point) goes through $S$. Thus the following limit, if it exists, describes the 'area' density of the $\mathbf{n}$-component of the vortex (as in the case of angular velocity, the direction of the vortex is the direction of the axis of rotation with the right-hand rule):
$$\mathbf{n}\cdot\mathrm{curl}\,\mathbf{u} = \lim_{|S|\to 0}\frac{\int_{\partial S}\mathbf{u}\cdot d\mathbf{l}}{|S|}, \tag{A9.5}$$
where the limit is over the sequence of smooth surfaces which converges to point $P$ with its orientation in the $\mathbf{n}$-direction. If the limit exists, then obviously there is a vector curl $\mathbf{u}$ called the curl of the vector field $\mathbf{u}$.
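A9.9 Cartesian expression of curl. If we assume the existence of the limit (A9.5), we can easily derive the Cartesian expression for curl $\mathbf{u}$. We have
$$\mathrm{curl}\,\mathbf{u} = \left(\frac{\partial u_z}{\partial y} - \frac{\partial u_y}{\partial z},\ \frac{\partial u_x}{\partial z} - \frac{\partial u_z}{\partial x},\ \frac{\partial u_y}{\partial x} - \frac{\partial u_x}{\partial y}\right), \tag{A9.6}$$
or
$$\mathrm{curl}\,\mathbf{u} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial_x & \partial_y & \partial_z \\ u_x & u_y & u_z \end{vmatrix} = \nabla\times\mathbf{u}. \tag{A9.7}$$
This 'abuse' of the nabla symbol is admissible only with the Cartesian coordinates.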
A9.10 Potential field, potential, solenoidal field, irrotational field. If a vector field $\mathbf{u}$ allows an expression $\mathbf{u} = \mathrm{grad}\,\phi$, then the field is called a potential field and $\phi$ is called its potential. A field without divergence, div $\mathbf{u} = 0$, is called a divergenceless or solenoidal field. A field without curl, curl $\mathbf{u} = 0$, is called an irrotational field.
A9.11.
(i) curl grad $\phi = 0$ (Potential fields are irrotational).
(ii) div curl $\mathbf{u} = 0$.
(iii) If a vector field is irrotational on a singly connected domain,^{487} then the field is a potential field.
(iv) If a vector field $\mathbf{u}$ is solenoidal in a singly connected domain, then there is a vector field $\mathbf{A}$ on the domain such that $\mathbf{u} = \mathrm{curl}\,\mathbf{A}$. $\mathbf{A}$ is called a vector potential.
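Identities (i) and (ii) can be sanity-checked with central finite differences built from the Cartesian expressions for grad, div and curl. This is a minimal sketch: the step size `H`, the helper names, and the sample fields are all hypothetical choices, and a numerical check at one point is of course not a proof.

```python
import math

H = 1e-3  # finite-difference step (an arbitrary illustrative choice)

def partial(g, p, i):
    """Central-difference estimate of the partial derivative of g at p along axis i."""
    q_plus, q_minus = list(p), list(p)
    q_plus[i] += H
    q_minus[i] -= H
    return (g(q_plus) - g(q_minus)) / (2.0 * H)

def grad(phi):
    return lambda p: [partial(phi, p, i) for i in range(3)]

def div(u):
    return lambda p: sum(partial(lambda q, i=i: u(q)[i], p, i) for i in range(3))

def curl(u):
    def c(p):
        d = lambda i, j: partial(lambda q: u(q)[i], p, j)   # d u_i / d x_j
        return [d(2, 1) - d(1, 2), d(0, 2) - d(2, 0), d(1, 0) - d(0, 1)]
    return c

if __name__ == "__main__":
    point = [0.3, -0.7, 1.1]
    phi = lambda q: q[0] ** 2 * q[1] + q[0] * math.sin(q[2])
    u = lambda q: [q[1] * q[2], q[0] ** 2, math.sin(q[0] * q[1])]
    print(curl(grad(phi))(point))   # components should be close to zero
    print(div(curl(u))(point))      # should be close to zero
```

Because the central-difference operators along different axes commute, the mixed differences cancel almost exactly, which mirrors the equality of mixed partial derivatives behind both identities.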
487 A domain is singly connected, if, for any given pair of points in the domain,any two curves connecting them are homotopic. That is, they can be smoothlydeformed into each other without going out of the domain.
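A9.12 Theorem [Gauss-Stokes-Green's theorem]. From our definitions of divergence and curl, we have
(i) Gauss' theorem.
$$\int_{\partial V}\mathbf{u}\cdot d\mathbf{S} = \int_V \mathrm{div}\,\mathbf{u}\, d\tau, \tag{A9.8}$$
where $V$ is a domain in the 3-space and $d\tau$ is the volume element.
(ii) Stokes' theorem.
$$\int_{\partial S}\mathbf{u}\cdot d\mathbf{l} = \int_S \mathrm{curl}\,\mathbf{u}\cdot d\mathbf{S}, \tag{A9.9}$$
where $S$ is a compact surface in 3-space.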
A9.13 Laplacian. The operator $\Delta$ defined by $\Delta f \equiv \mathrm{div}\,\mathrm{grad}\, f$ is called the Laplacian, and is often written as $\nabla^2$. $\Delta$ is defined for a scalar function.
A9.14 Laplacian for vector fields. If we formally calculate curl curl $\mathbf{u}$ in the Cartesian coordinates, then we have
$$\mathrm{curl}\,\mathrm{curl}\,\mathbf{u} = \mathrm{grad}\,\mathrm{div}\,\mathbf{u} - \nabla^2\mathbf{u}.$$
Since the formal calculation treating $\nabla$ as a vector is legitimate only in the Cartesian coordinate system, this calculation is meaningful only in the Cartesian system. Thus, in particular $\nabla^2\mathbf{u} = (\Delta u_x, \Delta u_y, \Delta u_z)$ is meaningful only in this coordinate system. However, the other two terms are coordinate-free expressions. Hence, we define $\Delta\mathbf{u}$ as
$$\Delta\mathbf{u} \equiv \mathrm{grad}\,\mathrm{div}\,\mathbf{u} - \mathrm{curl}\,\mathrm{curl}\,\mathbf{u}. \tag{A9.10}$$
Index
AAbel's formula· . ·541absolute continuous spectrum· . ·473absolute value· . ·96absolutely continuity' . ·286absolutely convergence· . ·526accumulating point·· ·528addition theorem, for spherical
harmonics· . ·374adjoint operator· . ·468adjoint· . ·468admittance· . ·139advanced Green's function· . ·503advection· . ·25afterglow effect .. ·452, 505Airy equation· . ·459Airy function· . ·395Airy integral· . ·459Aleksandrov's theorem· . ·411algebraic branch point· . ·125algebraic function· . ·88aliasing· . ·438almost everywhere· . ·274Ampere's law· . ·39analytic completion· . ·118analytic continuation .. ·117analytic function· . ·114, 117analyticity· . ·114Anderson localization·· ·474annular problem· . ·378anti-difusion .. ·403area element .. ·63argument·· ·96associate Legendre functions· . ·353,
371asymptotic expansion· . ·357
asymptotic sequence· . ·357asymptotic series· . ·357atomic measure .. ·10Ausstrahlungsbedingung .. ·500autonomous" ·166axial vector' . ·41
Bbackward Euler method· . ·426balance equation .. ·25Banach space' . ·291band-limited function· . ·437Beppo-Levi's theorem· . ·277Bernoulli equation .. ·173Bessel function, addition theo-
rem·· ·388Bessel function, generating func
tion .. ·385Bessel function, orthonormal re-
lation .. ·393Bessel function· . ·130, 382Bessel transform .. ·454Bessel's equation· . ·381Bessel's inequality .. ·295, 552Bessel's integral· . ·385beta function·· ·145biholomorphic map· . ·154Biot-Savart's law· . ·39bodily forces· . ·32Bolzano-Weierstrass' theorem· . ·529Borel measure· . ·287Borel summable .. ·367Borel transform· . ·367bounded operator· . ·472bounded set· . ·528bounded variation· . ·432
bra·· ·294branch point· . ·125branch·· ·93Burgers equation· . ·4
CCantor, biography .. ·260Cantor set .. ·168Casorati-Weierstrass theorem· . ·127Cauchy, biography·· ·107Cauchy principal value· . ·209Cauchy sequence· . ·525[Cauchy-Lipschitz uniqueness the-
orem .. ·168,556Cauchy-Peano's theorem·· ·167,
556Cauchy-Riemann equation· . ·98Cauchy's formula" ·106Cauchy's theorem·· ·103Cauchy-Schwartz inequality· . ·293causal solution' . ·505causality· . ·139chain rule· . ·51chaos· . ·317characteristic curve, for wave equa
tion .. ·414characteristic curves .. ·191characteristic differential equa-
tion .. ·191characteristic exponent .. ·355characteristic polynomial· . ·177Chebychev polynomial· . ·316Chebychev's inequality' . ·329circline .. ·157Clairaut's differential equation· . ·165,
180classical polynomials· . ·306classical solution' . ·401classical special functions· . ·335cocircline condition .. ·157coherent state' . ·437Cole-Hopf transformation' . ·4compact operator· . ·475comparison theorem' . ·411
complete· . ·247, 251, 291complex conjugate·· ·95complex Fourier series· . ·552complex function· . ·54, 87complex number·· ·95complex plane· . ·96complexification .. ·182conditional convergence· . ·527conditional extremum .. ·79confluent hypergeometric equa-
tion .. ·354confluent hypergeometric func-
tion .. ·354conformal invariance .. ·161conformal map' . ·153conformal transform .. ·224conformality .. ·153, 155conjugate harmonic function· . ·100conjugate point· . ·84conservation law, local .. ·26conservation of charge· . ·39consistency, in numerical analy-
sis· . ·423constitutive relations· . ·32continuity equation· . ·37continuous spectrum .. ·473continuum· . ·24contour integration .. ·102convection .. ·25convergence· . ·97convergence circle· . ·115convergence coordinate· . ·458convergence disk· . ·115convergence radius· . ·114convex function .. ·50convexity' . ·535convolution, Fourier transform' . ·554convolution, Laplace transform
.. ·459convolution· . ·211cosine-Fourier series· . ·257Coulomb gauge·· ·41Coulomb's law· . ·39countable set· . ·527
Courant-Friedrichs-Lewy condi-tion .. ·429
Cranck-Nicholson method· . ·426cross ratio' . ·157curl· . ·33, 64, 560curl, curvilinear coordinates· . ·75curvilinear coordinates .. ·71cylinder function· . ·386cylindrical coordinates· . ·73
Dd'Alembert, biography· . ·61d'Alembertian .. ·42d'Alembert's formula· . ·57d'Alembert 's transformation· . ·175Darboux's equation· . ·418deBois Raymond's theorem·· ·552decomposition of unity·· ·297,
300, 469deconvolution· . ·431degree of ramification· . ·125delta function· . ·82delta function, definition· . ·203delta function, Fourier expansion
.. ·442delta function, Fourier transform
.. ·445delta function, Laplace transform
.. ·461dense· . ·528derivative· . ·98, 109Descartes' sign rule· . ·310diagonal method .. ·260differentiable· . ·47, 55, 98differential coefficient .. ·47differential form· . ·69differential operator .. ·202diffusion constant .. ·4, 26diffusion equation· . ·4, 27diffusion equation, Brownian par-
ticle .. ·28diffusion equation, short time be
havior .. ·405Dini's condition' . ·253
Dirac·· ·294direct method· . ·85Dirichlet condition· . ·18, 253Dirichlet function· . ·258, 275discrete Fourier transform· . ·440discrete spectrum· . ·473discretization· . ·13, 421discriminant .. ·17displacement current· . ·40displacement vector· . ·32divergence· . ·63, 559divergence, curvilinear coordinates
... 75domain of dependence· . ·415domain, of operator· . ·467double exponential decay formula
.. ·326double layer· . ·483double series· . ·549drumhead·· ·34dual space· . ·294
EEarnshaw's theorem·· ·409eigenvalue problem· . ·263electrical displacement· . ·42elementary functions· . ·88elementary transcendental func-
tions .. ·88elliptic cylindrical coordinates· . ·73elliptic equation·· ·16, 17, 21,
411elliptic integral· . ·160energy integral· . ·35, 415entire function· . ·88equidimensional equation· . ·178essential spectrum· . ·473Euclidean space' . ·291Euler, biography· . ·89Euler-Lagrange equation· . ·78Euler-MacLaurin sum formula· . ·443Euler's constant· . ·147Euler's equation, fluid dynamics
.. ·37
Euler's equation (variational cal-culus) .. ·78
Euler's formula· . ·88Euler's integral· . ·142Euler's theorem (variational cal-
culus) .. ·78existence of mollifier .. ·536exponential function· . ·88exponential, of matrix .. ·181,189extension of operator· . ·467exterior problem· . ·21, 378
Ffactorial· . ·143Faraday's law· . ·39fast Fourier transform· . ·440fast inverse Laplace transform· . ·463Fejer's theorem' . ·254, 552Feynman-Kac formula·· ·496Feynman-Kac path integral· . ·496FFT .. ·440Fick's law· . ·26first digit problem· . ·287Floque's theorem· . ·355flux density· . ·25focusing effect .. ·419Fokker-Planck equation· . ·30formally self-adjoint· . ·470Fourier, biography···9Fourier cosine transform· . ·267,
433Fourier expansion· . ·130, 243Fourier series· . ·244, 300Fourier sine transform· . ·266, 433Fourier transform· . ·430, 553Fourier transform, of generalized
functions .. ·444Fourier-Bessel-Dini expansion· . ·394Fourier-Bessel(-Dini) transforma-
tion .. ·339Fourier's law· . ·26Fredholm integral equation· . ·475Fredholm operator· . ·475Fresnel integral· . ·136
Frobenius' method· . ·345Frobenius' theorem (on minimal
polynomial) .. ·189Fubini's theorem· . ·277function element .. ·117functional .. ·77functional derivative· . ·81functional differentiation· . ·82functional integral· . ·287fundamental matrix· . ·342fundamental set ···284fundamental solution· . ·177, 202fundamental system of solutions
.. ·183,341
GGarding's theorem· . ·420Galerkin method· . ·422Gamma function·· ·142,364gap theorem· . ·119gauge symmetry' . ·41gauge transformation· . ·41Gauss, biography .. ·111Gauss formula, for integration· . ·320Gauss formula (numerical inte-
gration) .. ·320Gauss quadrature· . ·321Gaussian integral .. ·280Gauss-Stokes-Green's theorem· . ·68,
561Gegenbauer-Neumann formula· . ·388general solution, ODE· . ·164, 555generalized eigenspace· . ·189generalized Fourier expansion· . ·296generalized function· . ·202, 444generalized function, differenti-
ation .. ·207generalized homogeneous func
tion .. ·196generalized Rodrigues formula· . ·48,
306generalized solution· . ·5generating function· . ·115, 130generating function, for orthog-
onal polynomials· . ·307genuine singularity· . ·127Gershgorin's theorem· . ·186Gibbs' phenomenon·· ·251Glass pattern·· ·167good function principle· . ·278gradient· . ·53, 62, 558gradient, curvilinear coordinates
· .. 74Gram-Schmidt orthonormaliza-
tion .. ·297, 304Green, biography .. ·11Green's formula· . ·227, 407Green's function· . ·11, 202, 214,
239, 302, 493Green's function, diffusion equa
tion .. ·493Green's function, Sturm-Liouville
problem· . ·215Green's theorem· . ·68group" ·156
HHolder continuity· . ·167Hankel function .. ·393Hankel's theorem· . ·453harmonic function· . ·27, 67,100,
407Hartman-Grobman theorem· . ·185Haselgrove method· . ·328Hausdorff's theorem· . ·288heat conductivity· . ·26Heaviside step function .. ·209Heine-Borel covering theorem· . ·529Helmholtz equation· . ·395, 498Helmholtz formula· . ·505Helmholtz' theorem (vector field)
.. ·70Helmholtz-Hodge theorem· . ·70Helmholtz-Stokes-Blumental the-
orem ···71Hermite polynomial· . ·313Hilbert, biography· . ·291Hilbert space· . ·290
Hilbert transform·· ·140,452Hilbert-Schmidt' theorem· . ·475holomorphic function· . ·99, 155holomorphy .. ·155homogeneous conditions·· ·19homogeneous equation· . ·172homogeneous function .. ·195Hooke's law· . ·33Huygens' principle· . ·418hyperbolic equation· . ·16, 22hyperbolic equation in Garding's
sense· . ·419hyperbolic fixed point· . ·185hyperbolic function· . ·92hyperbolicity in Garding's sense
.. ·419hypergeometric equation .. ·352
Iideal fluid· . ·37idempotency .. ·297identity theorem (real analytic
function) .. ·536image charge .. ·225imaginary axis .. ·96imaginary part· . ·95imaginary unit· . ·95implicit method· . ·426incompressibility· . ·38indentation· . ·138indicial equation· . ·346infinitely many-valued function
.. ·93infinte product· . ·544inner product· . ·300integral equation· . ·324integral, of generalized function
... 211integrating factor .. ·171, 556integration, bilinearity .. ·103interior problem· . ·21, 377inverse trigonometric function· . ·94irregular singular point .. ·345,
357
irrotational field· . ·67irrotational flow .. ·38
JJensen's inequality· . ·50Jordan's lemma· . ·135Joukowski transformation· . ·158
Kket .. ·294kinematic viscosity· . ·37Kramers-Kronig relation·· ·139Kummer's equation· . ·354K-vector space· . ·289
LL2-convergence .. ·298, 552L 2- norm .. ·298L2-space .. ·292, 298lacunary series .. ·118Lagrange, biography .. ·79Lagrange interpolation formula
., ·322Lagrange multiplier· . ·79Lagrange's method of variation
of constants .. ·177, 558Laplace, biography· . ·456Laplace equation· . ·5, 41, 263,
406Laplace equation, Cartesian co
ordinates .. ·265Laplace equation, Cylindrical co
ordinates .. ·336Laplace euqation, fundamental
solution· . ·482Laplace transformation· . ·455Laplace's first integral· . ·313Laplacian, curvilinear coordinates
.. ·75Laplacian· . ·3, 14, 67Laplacian for vector field· . ·562Laplacian for vector field· . ·67Laurent series· . ·128Laurent's theorem" ·128
least square approximation· . ·304Lebesgue extension·· ·286Lebesgue integrable· . ·275Lebesgue integral· . ·275Lebesgue integral, relation to Rie-
mann integral .. ·276Lebesgue's convergence theorem
.. ·277Lebesgue's measure problem· . ·288left differentiability· . ·532Legendre equation· . ·351, 370Legendre function· . ·352Legendre polynomial· . ·305, 311,
351Legendre's condition, variational
calculus· . ·83liminf·· ·526limit· . ·97limsup .. ·526linear operator· . ·7, 467linear phenomenological law .. ·26linear response .. ·52linearity· . ·7linearly stable· . ·185Liouville's theorem· . ·411Lipschitz condition' . ·167, 556logarithm· . ·93logarithmic branch point· . ·125Lommel's formula·· ·392Lommel's integral· . ·390long-time tail .. ·36Lorentz gauge· . ·41
MMobius transformation· . ·156Malgrange-Ehrenpreis' theorem
...197Markov property .. ·495maximum principle· . ·408Maxwell-Cattaneo equation· . ·405Maxwell's equations· . ·40Maxwell's equation, in material
.. ·43Maxwell's equation, in vacuum
.. ·40mean-value theorem' . ·407mean-value theorem, converse· . ·408measurable· . ·275, 284measurable function .. ·286measurable set ···284measure .. ·285measure zero .. ·274Mellin transformation· . ·465method of images· . ·225, 486method of lowering the order· . ·175method of quadrature .. ·170method of spherical means· . ·418metric tensor .. ·72minimal polynomial· . ·189minimization sequence .. ·85minimum algebra·· ·285Mobius transformation· . ·156modified Bessel functions· . ·394modified Radon transform· . ·450modulus·· ·96momentum flux density .. ·35Monte Carlo method·· ·329Morera's theorem· . ·111multipole expansion· . ·375
Nnabla· . ·63, 559natural boundary· . ·118Navier-Stokes equation· . ·37n-ball, volume of· . ·150Neumann condition·· ·19Neumann function·· ·390,485Nevanlinna's theorem· . ·368Newtonian viscosity· . ·36nodal set· . ·488Noether's theorem· . ·84nonautonomous .. ·166non-standard· . ·89norm· . ·78, 291norm, for matrix· . ·181normal form .. ·165normal form, ODE· . ·555normal operator· . ·469
null set .. ·284numerical integration· . ·319
oobservable· . ·467Ohm's law· . ·44optimal truncation, of asymptotic
series· . ·358ordinary differential equation· . ·164orientation· . ·104orthogonal curvilinear coordinate
system" ·72orthogonal projection· . ·297orthonormal basis .. ·295outer measure· . ·284
pparabolic boundary .. ·401parabolic equation .. ·17parallellogram law .. ·293Parseval's equality· . ·295, 552partial derivative· . ·55partial wave expansion· . ·397partially differentiable· . ·55particular solution, ODE· . ·164,
555partner .. ·344path integral· . ·287perfect differential equation· . ·170perfect fluid· . ·37perturbation series· . ·366Plancherel's formula· . ·435Plancherel's theorem· . ·435Plemelj formula·· ·139,447Poincare's lemma· . ·69Poincare's lemma, converse of· . ·69point spectrum· . ·473Poisson equation, fundamental so
lution .. ·210Poisson sum formula· . ·442Poisson's equation, derivation· . ·6,
41Poisson's formula· . ·240, 269Poisson's ratio· . ·33
Poisson's sum formula· . ·442polar form .. ·96polar vector· . ·41polarization· . ·42pole·· ·127potential· . ·67potential field· . ·67potential flow· . ·38power·· ·93power series· . ·114power spectrum·· ·439preservation of order .. ·404principal part·· ·16, 129principal value of integral· . ·137principal value· . ·93, 96, 138principle of invariance of func-
tional relations· . ·117probability measure· . ·287projection· . ·297propagator·· ·503pure point spectrum· . ·473Pythagoras' theorem' . ·293
Qquasi Monte Carlo method· . ·327quasilinearfirst order PDE·· ·190
Rradiation condition· . ·500Radon inversion formula· . ·450Radon transform· . ·449random walk· . ·494rational function, integration· . ·134rational function· . ·88rational integral function· . ·88reaction-diffusion equation· . ·27real axis· . ·96real part .. ·95rectification· . ·166reflection principle· . ·224region·· ·6regular distribution· . ·205regular point .. ·472regular singular point· . ·345
regular Sturm-Liouville problem ... 214
removable singularity ... 127
residue theorem ... 131
residue ... 131
resolvent ... 472
resolvent equation ... 472
resolvent set ... 472
response function ... 139
retarded Green's function ... 500
Reynolds number ... 38
Riccati's equation ... 174
Riemann, biography ... 121
Riemann geometry ... 72
Riemann mapping theorem ... 158
Riemann surface ... 119
Riemann-Lebesgue lemma ... 255, 435
Riemann-Lebesgue's theorem (convergence of Fourier series) ... 253
right differentiability ... 532
Ritz's method ... 85
Robin condition ... 20
Rodrigues' formula ... 306
S
sampling function ... 438
sampling theorem ... 437
Sato hyperfunction ... 141
scalar potential ... 41
scalar product ... 291, 294
scalar product, of functions ... 300
scale invariant equation ... 179
Schläfli's integral ... 308
Schwartz class ... 445
Schwartz-class functions ... 205
Schwarz' theorem ... 241, 547
Schwarz-Christoffel formula ... 159
Schwinger-Feynman parameter formula ... 149
second order PDE, classification ... 16
second variation ... 83
self-adjoint operator ... 468
semibounded operator ... 470
separable ... 295
separation constant ... 263
separation of variables ... 262, 332, 471
separation theorem ... 343
shear viscosity ... 36
σ-additivity ... 285
simple function ... 275
sine-Fourier series ... 257
singly connected ... 64
singular continuous spectrum ... 473
singular point ... 345
singular solution, ODE ... 164, 555
singularity ... 125
Smoluchowski equation ... 29
Sobolev space ... 293
solenoidal field ... 67, 561
solution, classical ... 5
source term ... 26
special functions ... 334
special functions, classical ... 335
spectral decomposition ... 469
spectral weight ... 469
spectrum ... 472
spectrum, absolutely continuous ... 473
spectrum, discrete ... 473
spectrum, essential ... 473
spectrum, point ... 473
spectrum, pure point ... 473
spectrum, singular continuous ... 473
spherical Bessel function ... 396
spherical Bessel function, orthonormal relation ... 397
spherical coordinates ... 73
spherical Hankel function ... 396
spherical harmonic function ... 373
spherical harmonics ... 372
spherical Neumann functions ... 396
stability, in numerical analysis ... 423
stability, of fixed point ... 185
steepest descent ... 365
Stirling's formula ... 150, 364
Stokes approximation ... 38
Stokes equation ... 38
Stokes line ... 360
Stokes' phenomenon ... 360
strain tensor ... 33
stress tensor ... 31
strong derivative ... 53
strong differentiation ... 53
strong maximum principle ... 409
Sturm-Liouville eigenvalue problem ... 478
Sturm-Liouville problem ... 214, 307
Sturm's theorem ... 309
substantial derivative ... 38
superposition principle ... 8
supplementary equations ... 44
symmetric operator ... 467
system of fundamental solutions ... 177
T
telegrapher's equation ... 45
test function ... 204, 445
theorem, Carlson's ... 254
theorem, Cauchy's ... 103
theorem, Dini ... 253
theorem, Gershgorin's ... 186
theorem, Hartman-Grobman ... 185
theorem, Hilbert-Schmidt ... 475
theorem, indefinite integral ... 106
theorem, Morera's ... 111
theorem, Nevanlinna's ... 368
theorem of identity ... 117
theorem, Sturm's ... 309
theorem, von Neumann's ... 427
theorem, Weyl-Stone-Titchmarsh-Kodaira ... 480
theorem, Wiener-Khinchin ... 439
thermal diffusion constant ... 27
thermodynamics ... 172
θ-method ... 426
time correlation ... 438
time-delay ... 460
time-dependent Ginzburg-Landau equation ... 21
total differential ... 547
transversality conditions ... 80
triangle inequality ... 97, 294
trigonometric function ... 91
U
uncertainty principle ... 436
uncountable set ... 527
uniform convergence ... 542
uniform measure ... 287
upstream scheme ... 429
V
variation ... 77
variational calculus ... 77
vector potential ... 41, 70
vector space, complex ... 289
vector space, real ... 289
Veinberg's theorem ... 84
velocity potential ... 38
volume ... 283
volume element ... 24, 75
volume element, curvilinear coordinates ... 75
volume viscosity ... 36
von Neumann's theorem ... 427
W
Watson's lemma ... 362
wave equation ... 3, 34, 419, 452
wave equation, fundamental solution ... 502
wave equation in d-space ... 451
wave speed ... 3
weak solution ... 5
Weierstrass, biography ... 248
Weierstrass function ... 51
Weierstrass' theorem ... 246
well-posed ... 402
well-posedness, of wave equation ... 417
Weyl's equidistribution theorem ... 328
Weyl-Stone-Titchmarsh-Kodaira theorem ... 480
Wick's theorem ... 280
Wiener-Khinchin's theorem ... 439
Wronskian ... 342
X, Y, Z
X-ray tomography ... 450
Young's modulus ... 33
Young's theorem ... 547
zeta function ... 147
z-transformation ... 456