14. Calculus of Several Variables
10/12/21
New Homework: Problems 13.17, 13.20, 14.7, 14.9, and 14.13 are due on Tuesday, October 19.
Although the chapter title refers to calculus, the focus of the chapter is rather narrower, concentrating on various types of derivatives. Much of the material here is referred to as advanced calculus, or even introductory analysis.
14.1 The Ordinary Derivative
We begin by recalling the ordinary derivative. Let U ⊂ R be an open set. A function f : U → R is differentiable at x0 ∈ U if

f′(x0) = lim_{h→0} [f(x0 + h) − f(x0)] / h

exists. It is differentiable on U if the derivative f′(x0) exists for every x0 ∈ U. The expression inside the limit, both here and in similar definitions, is the difference quotient. The derivative is also denoted df/dx.
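As a quick numerical sketch (my illustration, not part of the notes), the difference quotient for f(x) = x² at x0 = 3 approaches f′(3) = 6 as h shrinks:

```python
# Numerical illustration: the difference quotient [f(x0+h) - f(x0)]/h
# approaches f'(x0) as h -> 0.  Here f(x) = x**2, so f'(3) = 6.

def diff_quotient(f, x0, h):
    """Slope of the chord from (x0, f(x0)) to (x0 + h, f(x0 + h))."""
    return (f(x0 + h) - f(x0)) / h

f = lambda x: x**2

for h in (1.0, 0.1, 0.01, 0.001):
    print(h, diff_quotient(f, 3.0, h))   # slopes approach f'(3) = 6
```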
MATH METHODS
14.2 Difference Quotients
The difference quotients used to define the derivative are the slopes of chords running between (x0, f(x0)) and (x0 + h, f(x0 + h)). We take the limit of these slopes to find the slope of the tangent to f at x0. The derivative can be used to define a line L by

y = f(x0) + f′(x0)(x − x0).

The line L is tangent to the graph of f at the point (x0, f(x0)).
Figure 14.2.1: The tangent line L at x0 has the equation y = f(x0) + f′(x0)(x − x0).
Linear Approximation. The tangent line L is a linear approximation of f(x) for x near x0. Setting ∆x = x − x0 and ∆y = y − f(x0), the equation becomes

∆y = y − f(x0) = f′(x0)∆x.

Now y approximates f(x), so f(x) ≈ f(x0) + f′(x0)∆x.
14.3 Partial Derivatives
In economics, we often deal with functions defined on a subset of R^m. Production and utility functions, and cost, expenditure, and profit functions are all examples. Let U ⊂ R^m be an open set and suppose f : U → R. We define the ith partial derivative of f at x0 ∈ U by

∂f/∂x_i (x0) = lim_{h→0} [f(x0 + h e_i) − f(x0)] / h,

provided the limit exists.
14.4 Computing Partial Derivatives
In the partial derivative, all of the variables except xi are held constant. We then compute the partial derivative by taking the limit of difference quotients as h → 0. It gives the slope of a tangent line in the direction e_i. It is also the rate of change of the function f in the direction e_i.

Since all of the variables except xi are held constant when computing ∂f/∂xi, we can compute ∂f/∂xi by treating the other variables as constants and taking the ordinary derivative. Thus

∂(x² + y²)/∂x = 2x,

∂(x²y + xyz + y²z³)/∂x = 2xy + yz, and

∂(x²y + xyz + y²z³)/∂y = x² + xz + 2yz³.
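These computed partials can be sanity-checked numerically. A minimal sketch (the `partial` helper and the test point are my own, not from the notes):

```python
# Check the worked partials by central finite differences.
# f(x, y, z) = x^2 y + x y z + y^2 z^3, with df/dx = 2xy + yz
# and df/dy = x^2 + xz + 2 y z^3.

def f(x, y, z):
    return x**2 * y + x*y*z + y**2 * z**3

def partial(g, point, i, h=1e-6):
    """Central difference approximation to dg/dx_i at `point`."""
    lo, hi = list(point), list(point)
    lo[i] -= h
    hi[i] += h
    return (g(*hi) - g(*lo)) / (2*h)

x, y, z = 1.5, -0.5, 2.0
assert abs(partial(f, (x, y, z), 0) - (2*x*y + y*z)) < 1e-5
assert abs(partial(f, (x, y, z), 1) - (x**2 + x*z + 2*y*z**3)) < 1e-4
```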
14.5 Higher Partial Derivatives
If a partial derivative of f exists, it too may have partial derivatives. Expressions such as

∂/∂y (∂f/∂x) = ∂²f/∂y ∂x and ∂/∂x (∂f/∂x) = ∂²f/∂x²

are used to write second (and higher) partial derivatives. Another common notation uses subscripts. Thus

∂f/∂x = f_x, ∂/∂y (∂f/∂x) = (f_x)_y = f_xy, and ∂²f/∂x² = f_xx.
C^k Functions. The notation C^k denotes the set of k-times continuously differentiable functions. A function f ∈ C^k if and only if all of its partial derivatives up through order k exist and are continuous. When partial derivatives of every order exist and are continuous, we write f ∈ C^∞. A vector-valued function f is C^k if and only if each component function is C^k.
14.6 Partial Derivatives and Utility
In consumer theory, partial derivatives can be used to compute marginal utilities and marginal rates of substitution.

Example 14.6.1: Utility Functions. Suppose we have a Cobb-Douglas utility function

u(x, y) = x^α y^(1−α)

defined on R^2_{++}, where 0 < α < 1. The partial derivatives are the marginal utilities

MU_x = ∂u/∂x = α x^(α−1) y^(1−α) and MU_y = ∂u/∂y = (1 − α) x^α y^(−α).

We can use the marginal utilities to construct the marginal rate of substitution:

MRS_xy = MU_x / MU_y = [α / (1 − α)] (y / x).
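A numeric check of the Cobb-Douglas formulas (the function names and the choice α = 0.3 are illustrative, not from the notes):

```python
# Cobb-Douglas utility u(x, y) = x**a * y**(1-a): verify that the
# closed-form marginal utilities and MRS match finite differences.

def u(x, y, a=0.3):
    return x**a * y**(1 - a)

def mu_x(x, y, a=0.3):          # marginal utility of x: a x^(a-1) y^(1-a)
    return a * x**(a - 1) * y**(1 - a)

def mu_y(x, y, a=0.3):          # marginal utility of y: (1-a) x^a y^(-a)
    return (1 - a) * x**a * y**(-a)

def mrs(x, y, a=0.3):           # MRS = MU_x / MU_y = (a/(1-a)) * (y/x)
    return (a / (1 - a)) * (y / x)

x, y, h = 2.0, 5.0, 1e-7
assert abs(mu_x(x, y) - (u(x + h, y) - u(x - h, y)) / (2*h)) < 1e-6
assert abs(mu_y(x, y) - (u(x, y + h) - u(x, y - h)) / (2*h)) < 1e-6
assert abs(mrs(x, y) - mu_x(x, y) / mu_y(x, y)) < 1e-12
```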
14.7 Partial Derivatives and Production
Partial derivatives are also used in producer theory. For one, they can be used to express marginal products and the marginal rate of technical substitution.

Example 14.7.1: Production Functions. Similarly, if

f(x1, x2) = √x1 + √x2

is a production function on R^2_{++}, the partial derivatives are the marginal products

MP_1 = ∂f/∂x1 = 1/(2√x1) and MP_2 = ∂f/∂x2 = 1/(2√x2).

Then the marginal rate of technical substitution is

MRTS_12 = MP_1/MP_2 = √(x2/x1).
14.8 Approximation via Partial Derivatives
We can also use partial derivatives to approximate functions. The key idea is that if we make a small change ∆x in x, the function will change by approximately

∆f ≈ (∂f/∂x) ∆x.

Here's an example to show how it works.
Example 14.8.1: Approximation. Take the Cobb-Douglas production function F(K, L) = 6K^(1/3)L^(2/3). Suppose K = 8000 and L = 2744, yielding F(8000, 2744) = 23,520. Now suppose K increases by ∆K = 10. We estimate the effect on output using ∂F/∂K:

∂F/∂K (8000, 2744) = 2K^(−2/3)L^(2/3) = 2(8000)^(−2/3)(2744)^(2/3) = 0.98,

so we expect output to increase by about

∂F/∂K (8000, 2744) × ∆K = 0.98 × 10 = 9.8,

to 23,520 + 9.8 = 23,529.8. The actual value is F(8010, 2744) ≈ 23,529.796, so you can see it is a pretty good approximation, at least for relatively small changes in K or L.
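The arithmetic in the example can be verified directly. A short sketch (the variable names are mine):

```python
# Reproduce Example 14.8.1 numerically: F(K, L) = 6 K^(1/3) L^(2/3).

def F(K, L):
    return 6 * K**(1/3) * L**(2/3)

K, L, dK = 8000, 2744, 10
base = F(K, L)                       # 23,520 (8000^(1/3)=20, 2744^(2/3)=196)
mp_K = 2 * K**(-2/3) * L**(2/3)      # dF/dK = 2 K^(-2/3) L^(2/3) = 0.98
approx = base + mp_K * dK            # linear approximation: 23,529.8
actual = F(K + dK, L)                # about 23,529.796

print(base, mp_K, approx, actual)
```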
14.9 Frechet Differentiable Functions
The next step is to consider differentiability for vector-valued functions. The definition may look a little strange if you have never seen it before. However, it is a generalization of the ordinary derivative.

Frechet Derivative. Suppose f : U → R^m where U is an open subset of R^k. The function f is Frechet differentiable at x0 ∈ U if there is a linear function L : R^k → R^m such that

lim_{‖h‖→0} [f(x0 + h) − f(x0) − L(h)] / ‖h‖ = 0.

In that case L is the Frechet derivative at x0. If f is Frechet differentiable at every x0 ∈ U, we say that f is Frechet differentiable on U. The Frechet derivative at x0 is denoted Df(x0), or Df_{x0}. If we need to specify which variables we are using to differentiate, we will use D_x f(x0).
14.10 Derivative of the Dot Product
We apply the definition to the dot product. The derivative of p·x is the covector p^T. This makes sense, as the derivative is a linear functional of the vector x, which must be a covector.

Example 14.10.1: Dot Product. An easy example is f(x) = p·x = Σ_{i=1}^k p_i x_i. Here f : R^k → R. Since it is a linear function, it is its own linear approximation. If we set L(h) = f(h) = p·h, we find that

f(x0 + h) − f(x0) − L(h) = p·(x0 + h) − p·x0 − p·h = 0.

We divide by ‖h‖ and take the limit. The limit is zero, so the linear function L : R^k → R is the Frechet derivative. As a linear functional, it is represented by the row vector p^T. In other words,

D(p·x)_{x0} = p^T = (p1, . . . , pk)

for all x0 ∈ R^k.
14.11 Frechet and Ordinary Derivatives
Suppose k = m = 1, so f : R → R. In this case the Frechet derivative is the same as the ordinary derivative. One way to see that is to rewrite the definition of the ordinary derivative:

f′(x0) = lim_{h→0} [f(x0 + h) − f(x0)] / h

0 = lim_{h→0} ( [f(x0 + h) − f(x0)] / h − f′(x0) )

  = lim_{h→0} [f(x0 + h) − f(x0) − h f′(x0)] / h

  = lim_{h→0} [f(x0 + h) − f(x0) − L(h)] / h,

where L(h) = f′(x0) h is the required linear function of h.
14.12 Matrix Form of the Derivative — The Jacobian
We're about ready to write down the general form of the Frechet derivative for a C¹ function.

When dealing with derivatives of vector functions from R^k to R^m where m > 1, we will have to be somewhat pedantic about how the vectors are written. Writing them properly allows us to express certain relations as matrix products. Write them wrongly, and you get nonsense. Although we sometimes write a vector x ∈ R^m in casual fashion as (x1, . . . , xm), here it needs to be written as a column

x = (x1, . . . , xm)^T.

When dealing with derivatives, we must use the formal column form. If f : R^k → R^m is a vector function, the derivative must be a linear function from R^k to R^m. That means it can be represented by an m × k matrix with entries

[Df(x0)]_{ij} = ∂f_i/∂x_j (x0).

Then we write the derivative of f(x) = (f1(x), . . . , fm(x))^T as

Df(x0) = [ ∂f1/∂x1  ∂f1/∂x2  · · ·  ∂f1/∂xk ]
         [ ∂f2/∂x1  ∂f2/∂x2  · · ·  ∂f2/∂xk ]
         [    ...       ...   . . .     ... ]
         [ ∂fm/∂x1  ∂fm/∂x2  · · ·  ∂fm/∂xk ]

where all of the partial derivatives are evaluated at x0. The derivative matrix Df is sometimes called the Jacobian derivative, Jacobian matrix, or just plain Jacobian. The linear function L(h) that the Jacobian represents is formed by multiplying the Jacobian by a vector h, obtaining a vector in R^m.
14.13 The Derivative: An Example
Consider the function f : R³ → R² given by

f(x) = ( x1 x2² + x1 x3 , x2² + x1² x3 + x2 x3 )^T.

The derivative is a linear mapping from R³ to R², and is represented by a 2 × 3 matrix:

Df(x) = [ x2² + x3   2x1 x2     x1       ]
        [ 2x1 x3     2x2 + x3   x1² + x2 ]

We evaluate the derivative at x0 = (1, 1, 2)^T, obtaining

Df(1, 1, 2) = [ 3 2 1 ]
              [ 4 4 2 ]

Finally, the linear function L : R³ → R² is L(h) = [Df_{x0}] h, the product of the 2 × 3 matrix above with the column vector h = (h1, h2, h3)^T.
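The hand-computed Jacobian can be confirmed by finite differences. A minimal sketch (the `jacobian` helper is my own, not from the notes):

```python
# Verify the Jacobian of f(x) = (x1 x2^2 + x1 x3, x2^2 + x1^2 x3 + x2 x3)
# at x0 = (1, 1, 2) by central finite differences.

def f(x):
    x1, x2, x3 = x
    return [x1*x2**2 + x1*x3, x2**2 + x1**2*x3 + x2*x3]

def jacobian(g, x, h=1e-6):
    """m x k matrix of central-difference partials dg_i/dx_j."""
    k = len(x)
    m = len(g(x))
    J = [[0.0]*k for _ in range(m)]
    for j in range(k):
        hi = list(x); lo = list(x)
        hi[j] += h; lo[j] -= h
        ghi, glo = g(hi), g(lo)
        for i in range(m):
            J[i][j] = (ghi[i] - glo[i]) / (2*h)
    return J

J = jacobian(f, [1.0, 1.0, 2.0])
expected = [[3, 2, 1], [4, 4, 2]]
assert all(abs(J[i][j] - expected[i][j]) < 1e-6
           for i in range(2) for j in range(3))
```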
14.14 A Different Kind of Matrix Derivative
Should we have a function f : R^k → R^m that starts as a row vector, (f1(x), . . . , fm(x)), we will take the derivative to be the k × m matrix Df = (D(f^T))^T. In other words,

Df = [ ∂f1/∂x1  · · ·  ∂fm/∂x1 ]
     [ ∂f1/∂x2  · · ·  ∂fm/∂x2 ]
     [    ...    . . .     ... ]
     [ ∂f1/∂xk  · · ·  ∂fm/∂xk ]

In such a case, the linear mapping L(h) is formed not by post-multiplying Df by h, but by pre-multiplying by h^T.

When we consider second derivatives of functions from R^m to R, the first derivative will be a row vector, and we can use the same method as above to take its derivative, which will then be represented by an m × m matrix. It will define a bilinear functional: one of the vectors will be used to post-multiply D²f, the other will be transposed so it can pre-multiply D²f. In combination, this gives us a bilinear form.
14.15 Frechet Derivatives and Approximation
Examining the definition, we see that the Frechet derivative Df(x0) defines a linear approximation to the function f by

f(x0 + h) ≈ f(x0) + L(h)    (14.15.1)

where L = Df(x0). To understand the approximation a bit better, we define the remainder R by

R(x0, h) = f(x0 + h) − f(x0) − L(h).

The remainder is the difference between the function f and its linear approximation f(x0) + L(h). By the definition of the derivative,

lim_{h→0} R(x0, h) / ‖h‖ = 0.

This can also be written R(x0, h) = o(‖h‖) at 0.

We conclude that the linear approximation is fairly accurate near x0, and gets more accurate the closer we are to x0. In fact, the error term, the remainder, converges to zero enough faster than ‖h‖ that the ratio of the two also converges to zero.
14.16 Marginal Rate of Substitution: Interpretation
We can use our knowledge of approximation to interpret the marginal rate of substitution. Suppose we start at x0. Now consider the indifference curve through x0, {x : u(x) = u(x0)}. Let's make a small change in xi, holding everything else constant. This moves us off the u(x0) indifference curve. Now change xj to return us to the indifference curve. Since indifference curves slope downward (preferences are monotonic), there is a trade-off between goods i and j: the changes ∆xi and ∆xj must have opposite signs.

We can now use the linear approximation. Define h by hi = ∆xi, hj = ∆xj, and hk = 0 for k ≠ i, j, so h = ∆xi e_i + ∆xj e_j. Then

∆u ≈ Du(x0) h = (∂u/∂xi) ∆xi + (∂u/∂xj) ∆xj = MU_i ∆xi + MU_j ∆xj.

Since we returned to the original indifference curve, ∆u ≈ 0. It follows that MU_i ∆xi + MU_j ∆xj ≈ 0, so

MRS_ij = MU_i / MU_j ≈ −∆xj/∆xi.

In other words, if we plot a slice of the indifference curve in xi–xj space, with xi on the horizontal axis, the marginal rate of substitution approximates the negative of the slope of the secant through the points (x0 + ∆x, u(x0 + ∆x)) and (x0, u(x0)). Of course, if we let ∆xi, ∆xj → 0, we obtain the negative of the slope of the tangent at u(x0).
14.17 Total Derivatives and Linear Functionals
When f : R^k → R, the derivative takes the simpler form

Df = (∂f/∂x1, ∂f/∂x2, . . . , ∂f/∂xk).    (14.17.2)

Sometimes the derivative is written in what appears to be a quite different manner:

df = (∂f/∂x1) dx1 + (∂f/∂x2) dx2 + · · · + (∂f/∂xk) dxk.    (14.17.3)

They're both supposed to be the derivative. What is going on here? Recall that the derivative of f is a linear function from R^k to R. As such, it can properly be written as a covector, a row vector. That is what we have done in equation (14.17.2). But what about equation (14.17.3)?

The obvious interpretation is that dxi must be the ith basis covector e*_i = (0, . . . , 0, 1, 0, . . . , 0), with a 1 in the ith position. But why is e*_i written dxi? Consider the ith coordinate function x_i(x) = xi, which maps R^k to R. Its derivative, naturally denoted dxi, must be the linear functional e*_i. This functional maps R^k to R with dxi(e_j) = e*_i(e_j) given by the Kronecker delta δ_ij. Either way we write it, the differentials of the coordinate functions form a basis for the row vectors.
14.18 One-Forms
The expression used for df in equation (14.17.3) is a simple type of differential form. A functional linear combination of the basis covectors dxi is called a 1-form. It is important that there are no products of the dxi terms and that each term has a single dxi. Real-valued functions are sometimes referred to as 0-forms.

One-forms are linear combinations of the basis vectors of (R^m)* where the coefficients are functions; for df, the functions are the derivatives ∂f/∂xi. A one-form produces a linear functional on R^m for every x. In other words, a differential 1-form is a function whose values are linear functionals.
14.19 Derivatives and Approximation
Because the derivative of f is a linear approximation to f, if we feed Df(x0) a vector such as

∆x = (∆x1, . . . , ∆xm)^T = Σ_i ∆xi e_i,

we obtain

∆f ≈ Df_{x0} ∆x
   = Σ_{i=1}^m [ (∂f/∂xi)(x0) dxi ( Σ_{j=1}^m ∆xj e_j ) ]
   = Σ_{i=1}^m Σ_{j=1}^m (∂f/∂xi)(x0) δ_ij ∆xj
   = Σ_{i=1}^m (∂f/∂xi)(x0) ∆xi,

which is a linear approximation to the change in f as x0 is replaced by x0 + ∆x (see also equation (14.15.1)).

Of course, if we use the one-form df(∆x) and compute, we obtain the same answer.
14.20 Derivatives and Differential Forms
If we used the df version, we would get the same result. So why have two versions? Among other things, differential forms are used in integration. Products there appear in higher-order forms (k-forms) and correspond to area or volume elements. The appropriate product here is something called the exterior or wedge product, which is both alternating and multilinear, like determinants.

We also use products for higher Frechet derivatives, D^k f. These are k-multilinear functions, k-tensors. They are not alternating, and are not to be confused with the k-forms used in multi-dimensional integration. The k-forms are alternating, the kth derivatives are not. Quite the contrary, terms such as ∂²f/∂xi² appear and at times are important. They would be zero if the tensors were alternating.
14.21 Curves and their Tangents
A curve in R^m is an m-tuple of continuous functions

x(t) = (x1(t), . . . , xm(t))^T

where each xi : I → R and I is an open subset of R. The functions xi are the coordinate functions of the curve x, and t is a parameter describing the curve. Curves are allowed to cross or repeat themselves.

One of the simplest curves is a straight line. Recall that straight lines can be written in parametric form as

x(t) = x0 + t x1.    (10.39.5)

The curve x(t) = (cos t, sin t) repeatedly traces out a circle of radius one centered at the origin. By adding a coordinate, we obtain y(t) = (cos t, sin t, t), which is a right-handed helix in R³.

If x is differentiable, we can define the tangent vector as

x′(t) = (x1′(t), . . . , xm′(t))^T.

It describes the instantaneous rate of change of the curve, both direction and magnitude. When x(t) is the path of an actual object in motion and t is time, x′(t) is the velocity at time t and ‖x′(t)‖ is the speed. The second derivative x′′(t) = (x1′′(t), . . . , xm′′(t))^T is the acceleration at time t. Here xi′′(t) denotes the ordinary second derivative d²xi/dt².
14.22 Examples of Curves
Our first example is a straight line.

Example 14.22.1: Straight Lines. The straight line x(t) = x0 + t x1 has tangent x′(t) = x1, the direction of the line. In this form, there is no acceleration along the line: x′′(t) = 0. The same line can be traced out in other ways. For example, the curve x(t) = x0 + t³ x1 visits all of the points on that straight line, but does it in a different fashion, with tangent 3t² x1. Except at t = 0, the tangent points in the same direction, but has a different magnitude, reflecting the variously slower and faster motion along the line. Moreover, the acceleration is not identically zero: x′′(t) = 6t x1.

The second example circles around and around the unit circle about zero.

Example 14.22.2: Perpetual Circle. The perpetual circle is defined by x(t) = (cos t, sin t). It continually retraces the circle of radius 1 about the origin. Its tangent vector is x′(t) = (− sin t, cos t). You'll notice that for circular motion, the tangent is orthogonal to the curve: x′(t)·x(t) = 0. The acceleration x′′(t) = (− cos t, − sin t) = −x(t) always points toward the origin.

Our third example is a right-handed helix, meaning that it winds counter-clockwise.

Example 14.22.3: Right-handed Helix. The helix y(t) = (cos t, sin t, t) has tangent y′(t) = (− sin t, cos t, 1). It is not perpendicular to y(t) due to the third component. The acceleration is y′′(t) = (− cos t, − sin t, 0) and points toward the x3-axis. The fact that the third component of the acceleration is zero reflects the constant motion along the x3-axis; the other components reflect the circular motion about it.
14.23 Regular Curves
We will often require that curves be sufficiently smooth. Such curves are called regular.
Regular Curve. A curve x(t) = (x1(t), . . . , xm(t))^T in R^m is regular if each xi is continuously differentiable in t and x′(t) ≠ 0 for all t.

Examples 14.22.1–14.22.3 are all regular. Regularity rules out sudden changes of direction.

Example 14.23.1: Curve with a Cusp. Let x(t) = (t³, t²). Of course, x′(t) = (3t², 2t)^T. This curve is not regular at 0 because x′(0) = (0, 0). It makes a sudden change of direction there.

Figure 14.23.2: The point where the curve makes a sudden change of direction is called a cusp.

It's not possible to re-parameterize this curve to make it regular. The tangent changes from going straight down to straight up. Reversing the tangent requires that it be zero at the reversal point, regardless of how the parameter is applied. One way to see this is that if x is C¹, so is x2′(t). For t < 0, x2′(t) is negative, and for t > 0, x2′(t) is positive. Continuity requires that x2′(0) = 0. The only way x can be regular at zero is if x1′(0) ≠ 0. But in that case the tangent could not be straight up or down at 0.
14.24 Functions, Curves, and Derivatives
It is sometimes useful to evaluate the derivative of a function f defined along a curve. If x(t) is a regular curve in R^m and f : R^m → R, we can define g(t) = (f ∘ x)(t) = f(x(t)). We can take the derivative of g in the following fashion, based on the chain rule:

g′(t) = (∂f/∂x1)(x(t)) x1′(t) + (∂f/∂x2)(x(t)) x2′(t) + · · · + (∂f/∂xm)(x(t)) xm′(t)
      = Df_{x(t)} x′(t).

In the final formula, keep in mind that Df is a 1 × m row vector and x′ is an m × 1 column vector, so the matrix product is a number. Writing the vectors (x′) and covectors (Df) in the right way ensures everything lines up properly in the product.

As for the formula above, we state the relevant theorem without proof.

Chain Rule I. Let x(t) = (x1(t), . . . , xm(t)) be a C¹ curve on an interval about t0 and f a C¹ function defined on a ball in R^m about x(t0). Then g(t) = f(x(t)) is a C¹ function on an interval about t0 and

dg/dt (t0) = (∂f/∂x1)(x(t0)) x1′(t0) + · · · + (∂f/∂xm)(x(t0)) xm′(t0) = Df_{x(t0)} x′(t0).
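Chain Rule I can be verified numerically on a sample curve and function (both chosen by me for illustration, not from the notes):

```python
# Chain Rule I check: g(t) = f(x(t)) with f(x1, x2) = x1 * x2^2 and the
# curve x(t) = (cos t, sin t).  Then g'(t) should equal Df(x(t)) x'(t).
import math

def g(t):
    x1, x2 = math.cos(t), math.sin(t)
    return x1 * x2**2

def g_prime_chain(t):
    x1, x2 = math.cos(t), math.sin(t)
    df = [x2**2, 2*x1*x2]                 # Df(x(t)), a 1 x 2 row vector
    xp = [-math.sin(t), math.cos(t)]      # x'(t), a 2 x 1 column vector
    return df[0]*xp[0] + df[1]*xp[1]      # their product is a number

t, h = 0.7, 1e-6
numeric = (g(t + h) - g(t - h)) / (2*h)   # direct derivative of g
assert abs(numeric - g_prime_chain(t)) < 1e-6
```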
14.25 Directional Derivatives and Gradients
One way to think about derivatives in a particular direction is to consider a curve that goes in that direction. We want a curve with x(t0) = x0 and x′(t0) = v. There are many such curves; one is the line through x0 in the direction v, given by x(t) = x0 + (t − t0)v. If f : R^m → R, we can consider the composite function g(t) = f(x(t)) and take its derivative:

dg/dt (t0) = Df_{x0} x′(t0) = Df_{x0} v.

In fact, Chain Rule I ensures that any curve with x(t0) = x0 and x′(t0) = v will yield the same derivative. We refer to this as the directional derivative of f in the direction v. Two notations sometimes used for the directional derivative are

∂f/∂v (x0) and D_v f(x0).

We prefer the former notation, as D_v f conflicts with our notation for partial Frechet derivatives taken with respect to a subset of the variables.

It is sometimes useful to regard the derivative of a real-valued function f on U ⊂ R^m as a vector rather than a covector. That vector is called the gradient, and is written

∇f(x0) = ( ∂f/∂x1 (x0), ∂f/∂x2 (x0), . . . , ∂f/∂xm (x0) )^T

where x0 ∈ U. Of course, ∇f(x0) = Df(x0)^T. We can then write the directional derivative as

∂f/∂v (x0) = ∇f(x0)·v.
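A small sketch of the directional derivative computed both ways (the sample f and direction v are my own choices):

```python
# Directional derivative via the gradient: df/dv(x0) = grad f(x0) . v,
# checked against the one-dimensional derivative of t -> f(x0 + t v).

def f(x):
    return x[0]**2 + 3*x[0]*x[1]           # sample function on R^2

def grad_f(x):
    return [2*x[0] + 3*x[1], 3*x[0]]       # gradient: Df(x) transposed

x0, v, h = [1.0, 2.0], [0.6, 0.8], 1e-6

directional = sum(gi*vi for gi, vi in zip(grad_f(x0), v))

# Compare with the derivative of g(t) = f(x0 + t v) at t = 0.
g = lambda t: f([x0[0] + t*v[0], x0[1] + t*v[1]])
numeric = (g(h) - g(-h)) / (2*h)
assert abs(directional - numeric) < 1e-6
```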
14.26 The Chain Rule
The following theorem is called "Chain Rule IV" by Simon and Blume. We'll just call it the Chain Rule.

The Chain Rule. Let U and V be open subsets of R^k and R^ℓ, respectively. Suppose f : U → R^ℓ and g : V → R^m are C¹ functions, and that x0 ∈ U and y0 = f(x0) ∈ V. We will write y = f(x) and z = g(y). Then h = g ∘ f is also C¹ on an open ball about x0 with

D_x h(x0) = D_y g(y0) × D_x f(x0).

In other words, the Jacobian derivative of h is the product of the Jacobian derivatives of g and f.

The Chain Rule both asserts that the composite function h = g ∘ f is differentiable and gives a formula for its derivative.

We know that Dg is an m × ℓ matrix and Df is an ℓ × k matrix, so Dh is an m × k matrix. Unpacking the matrix product shows that

∂hi/∂xj = Σ_{n=1}^ℓ (∂gi/∂yn)(∂fn/∂xj)    (14.26.4)

for all i = 1, . . . , m and j = 1, . . . , k. Equation (14.26.4) can be written more fully as

∂hi/∂xj (x0) = Σ_{n=1}^ℓ (∂gi/∂yn)(y0) (∂fn/∂xj)(x0),

again for all i = 1, . . . , m and j = 1, . . . , k. Keep in mind that y0 = f(x0).
14.27 Chain Rule for Direct and Indirect Variables
The Chain Rule has many applications beyond the obvious ones. Some functions will use the same variables both directly and indirectly via another function.

Consider the function

Φ(x) = f(x, g(x)).

Here x appears both directly in Φ, and indirectly via g. If x ∈ R^m and g is real-valued, f takes m + 1 arguments. Partition those arguments as (x, y). Define a function h : R^m → R^{m+1} by

h(x) = ( x , g(x) )^T,

so Dh(x) is the (m + 1) × m matrix whose top block is the m × m identity matrix I_m and whose bottom row is D_x g. Then Φ(x) = f(h(x)), so the Chain Rule tells us that

D_x Φ = Df_{(x,y)} × D_x h = D_x f + D_y f × D_x g.
14.28 An Application of the Chain Rule to Integrals
One useful application of the chain rule is to integrals with variable limits.
Example 14.28.1: Chain Rule and Integrals. Let g : R → R and h : R² → R be C¹ and define

Φ(x) = ∫₀^{g(x)} h(x, t) dt.

Now define

f(x, y) = ∫₀^y h(x, t) dt

and apply the previous result to Φ(x) = f(x, g(x)). This calculation yields

Φ′(x) = d/dx ( ∫₀^{g(x)} h(x, t) dt ) = h(x, g(x)) g′(x) + ∫₀^{g(x)} (∂h/∂x)(x, t) dt.

When g(x) = x, this formula reduces to

d/dx ( ∫₀^x h(x, t) dt ) = h(x, x) + ∫₀^x (∂h/∂x)(x, t) dt.

A similar method can be used in the case where both limits of integration are defined by functions.
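The formula can be checked on a case with closed-form integrals. A sketch assuming h(x, t) = x t² and g(x) = x², both chosen purely for illustration:

```python
# Check Example 14.28.1 on a case with closed forms:
#   h(x, t) = x t^2 and g(x) = x^2, so Phi(x) = x * (x^2)^3 / 3 = x^7 / 3.
# The formula gives Phi'(x) = h(x, g(x)) g'(x) + integral of dh/dx,
# which works out to 2 x^6 + x^6 / 3 = 7 x^6 / 3.

def phi(x):
    return x**7 / 3                      # integral of x t^2, t in [0, x^2]

def phi_prime_formula(x):
    boundary = (x * (x**2)**2) * (2*x)   # h(x, g(x)) * g'(x) = 2 x^6
    interior = (x**2)**3 / 3             # integral_0^{x^2} t^2 dt = x^6 / 3
    return boundary + interior

x, h = 1.3, 1e-6
numeric = (phi(x + h) - phi(x - h)) / (2*h)
assert abs(numeric - phi_prime_formula(x)) < 1e-4
```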
14.29 Second Derivatives: The Hessian 10/14/21
Second derivatives are a little more complex than first derivatives, and we will start with the case where f : R^m → R because it is easier to see what is going on. In that case, the first derivative is a row vector

( ∂f/∂x1 (x), . . . , ∂f/∂xm (x) ).

This was discussed in section 14.14, where we used a modified matrix representation of such derivatives. We apply that method here.

For functions that are twice continuously differentiable, we write the matrix of second partial derivatives, the Hessian matrix, or Hessian, as

D²f(x) = [ ∂²f/∂x1²     ∂²f/∂x1∂x2   · · ·  ∂²f/∂x1∂xm ]
         [ ∂²f/∂x2∂x1   ∂²f/∂x2²     · · ·  ∂²f/∂x2∂xm ]
         [     ...          ...       . . .      ...    ]
         [ ∂²f/∂xm∂x1   ∂²f/∂xm∂x2   · · ·  ∂²f/∂xm²   ]

where

∂²f/∂xk ∂xℓ = ∂/∂xk (∂f/∂xℓ).
14.30 Another Notation
This is one of those times when it is useful to employ an alternate notation for partial derivatives:

f_i = ∂f/∂xi, f_ij = (f_i)_j = ∂²f/∂xj ∂xi = ∂/∂xj (∂f/∂xi), etc.

Notice the reversal of order between the two notations, ij versus ji. The alternate notation allows us to write the Hessian in the more compact form

D²f(x) = [ f11(x)  f12(x)  · · ·  f1m(x) ]
         [ f21(x)  f22(x)  · · ·  f2m(x) ]
         [   ...      ...   . . .    ... ]
         [ fm1(x)  fm2(x)  · · ·  fmm(x) ]

When f is twice continuously differentiable, it does not matter which order is used for the second partial derivatives because

∂²f/∂xk ∂xℓ = ∂²f/∂xℓ ∂xk.

The derivative is the same either way.¹ If f is not C², the order in which we take partial derivatives may affect the result.
¹ In the economics literature, this result is often called Young's Theorem, although a host of mathematicians have worked on the problem, including Cauchy and Lagrange. My experience is that mathematics books usually don't name the theorem after anyone, although I have seen it called the Clairaut-Schwarz Theorem. Clairaut published a proof in 1740 that does not meet modern standards of rigor. The first rigorous proof was due to H.A. Schwarz in 1873. The most commonly used proof is that of Jordan, published in 1883. E.W. Hobson and W.H. Young later proved the theorem under weaker conditions, 1907–1909. The name Young's Theorem may derive from the economist R.G.D. Allen.
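The symmetry of mixed partials is easy to check by finite differences. A sketch on a sample C² function of my own choosing:

```python
# Finite-difference check that f_xy = f_yx for a smooth function:
# f(x, y) = x^3 y^2 + sin(x y), whose mixed partial is
# 6 x^2 y + cos(xy) - xy sin(xy).
import math

def f(x, y):
    return x**3 * y**2 + math.sin(x*y)

def fx(x, y, h=1e-4):
    return (f(x + h, y) - f(x - h, y)) / (2*h)

def fy(x, y, h=1e-4):
    return (f(x, y + h) - f(x, y - h)) / (2*h)

h = 1e-4
x, y = 0.8, 1.1
fxy = (fx(x, y + h) - fx(x, y - h)) / (2*h)   # d/dy of f_x
fyx = (fy(x + h, y) - fy(x - h, y)) / (2*h)   # d/dx of f_y
exact = 6*x**2*y + math.cos(x*y) - x*y*math.sin(x*y)

assert abs(fxy - fyx) < 1e-5     # the two orders agree
assert abs(fxy - exact) < 1e-4   # and match the closed form
```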
14.31 Interpreting The Hessian
So what exactly is the Hessian? The derivative of f : R^m → R is a mapping from points to linear functions from R^m to R. It maps x to the linear functional Df(x), which takes a single argument in R^m. We can write the values of this linear functional for y ∈ R^m as the matrix product [Df(x)]y. The Hessian is a way of writing the derivative of the mapping x ↦ Df(x).

The second derivative at x is a linear function from R^m to (R^m)*. We feed it a (column) vector z ∈ R^m and get a covector (a row vector in (R^m)*). We can then feed that a second column vector y ∈ R^m, obtaining a number. The Hessian matrix makes it possible to do this. The resulting bilinear map has the form

L(y, z) = z^T [D²f(x)] y.    (14.31.5)

It takes pairs of vectors (y, z): one to get a linear functional, and a second to produce a number.
14.32 The Hessian is a Bilinear Form — A 2-Tensor
Writing equation (14.31.5) out, we obtain

L(y, z) = z^T D²f(x) y = (z1, . . . , zm) [ f_ij(x) ] (y1, . . . , ym)^T = Σ_{i,j=1}^m f_ij(x) z_i y_j.

Since D²f(x) is symmetric, it doesn't really matter which vector goes where, but this is the logical order: the vector z combines with the mapping that produces linear functionals, while the vector y is the argument of the resulting linear functional.

The resulting mapping from R^m × R^m to R is bilinear, and is best thought of as a 2-tensor, a linear mapping from R^m ⊗ R^m to R. As such, we can write

D²f(x) = Σ_{i,j=1}^m f_ij(x) dx_i ⊗ dx_j,

where we can use the outer product to write

L(y, z) = [D²f(x)]·(y ⊗ z).

The dot product of the two matrices indicates that we multiply the corresponding terms and add, just like the ordinary dot product, but for matrices. This makes sense because linear functionals on R^m involve dot products; it shouldn't be surprising to find one here.

You may run across constructions where you vectorize the matrices. This is sometimes done in econometrics. Taking the dot product of the resulting vectors would give you the same result.
14.33 Taylor’s Formula: Preview
One important use of the Hessian is Taylor's formula, which we will prove later. First, a couple of definitions. In a vector space, the line segment between x and y is ℓ(x, y) = {(1 − t)x + ty : 0 ≤ t ≤ 1}. A set S is convex if it contains ℓ(x, y) whenever x, y ∈ S.

First Order Taylor's Formula. Let f : U → R be C² on a convex open set U ⊂ R^m. Then for every x, x′ ∈ U, there is a y on the line segment connecting x and x′ such that

f(x′) = f(x) + Df(x)(x′ − x) + (1/2)(x′ − x)^T [D²f(y)] (x′ − x).    (14.33.6)

In Taylor's formula, not only is the Hessian symmetric, but we feed it the same vector on each side, so there is even more symmetry. If L(·, ·) is a bilinear form, then Q(z) = L(z, z) is called a quadratic form. To see why, write it out. When L is the Hessian, we get

Σ_{i,j=1}^m f_ij(x) z_i z_j.

It is a purely second-degree polynomial in the z_i, hence the term quadratic. The Taylor approximation in equation (14.33.6) consists of a constant term, a linear term, and a quadratic term, and for small changes in x it will better approximate f than the derivative alone does.
There are higher Taylor expansions that involve third and higher degree terms, which are 3-tensors, 4-tensors, etc.
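A quick numeric illustration (the sample function is mine, not from the notes) that adding the quadratic Hessian term improves on the purely linear approximation:

```python
# Second-order (quadratic) approximation using the Hessian:
# f(x, y) = exp(x) * y^2 near (0, 1), compared with the exact value.
import math

def f(x, y):
    return math.exp(x) * y**2

def taylor2(dx, dy):
    # at (0, 1): f = 1, Df = (1, 2), Hessian = [[1, 2], [2, 2]]
    fx, fy = 1.0, 2.0
    fxx, fxy, fyy = 1.0, 2.0, 2.0
    linear = fx*dx + fy*dy
    quad = 0.5 * (fxx*dx*dx + 2*fxy*dx*dy + fyy*dy*dy)
    return 1.0 + linear + quad

dx, dy = 0.1, -0.05
exact = f(0 + dx, 1 + dy)
lin_only = 1.0 + 1.0*dx + 2.0*dy
quad_err = abs(taylor2(dx, dy) - exact)
lin_err = abs(lin_only - exact)
assert quad_err < lin_err          # the quadratic term improves the fit
assert quad_err < 1e-3
```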
14.34 Higher Derivatives as Tensors
Let f : R^m → R be k times continuously differentiable with k > 2. What do its higher derivatives, those past the Hessian, look like? We know that D^k f(x) must be a k-linear form. That means it can be written as a linear map from (R^m)^{⊗k} to R, a k-tensor.

Although perhaps a bit tedious, it's not hard to figure out what D^k f(x) looks like, even though these derivatives do not have convenient matrix representations:

[D^k f(x)](y¹, . . . , y^k) = Σ_{i1,...,ik=1}^m f_{i1···ik}(x) y¹_{i1} · · · y^k_{ik},

where y^j_{ij} denotes the ij-th component of the vector y^j. This means that D^k f(x) can be written as the k-tensor

D^k f(x) = Σ_{i1,...,ik=1}^m f_{i1···ik}(x) dx_{i1} ⊗ · · · ⊗ dx_{ik}.
30. Calculus of Several Variables II
30.1 Rolle’s Theorem
We’ve already covered Weierstrass’s Theorem, which was the first thing in section 30.1of Simon and Blume. One important consequence of Weierstrass’s Theorem is Rolle’sTheorem.
Rolle’s Theorem. Suppose f : [a, b] → R is continuous on [a, b] and continuouslydifferentiable on (a, b). If f(a) = f(b) = 0, there is a point c ∈ (a, b) with f′(c) = 0.
Proof. If f is constant on [a, b], then f(x) = 0 for all x ∈ [a, b] and so f′(x) = 0 for all x ∈ (a, b). Then any c ∈ (a, b) will do.

If f is not constant on [a, b], either there is d ∈ (a, b) with f(d) > 0 or a d ∈ (a, b) with f(d) < 0. In the former case, f has a maximum at some c ∈ [a, b] by Weierstrass's Theorem; since f(a) = f(b) = 0 < f(d), the maximum cannot occur at an endpoint, so c ∈ (a, b). The first-order necessary condition for an interior optimum on R shows f′(c) = 0 (remember your calculus!). In the latter case, f has a minimum at some c ∈ (a, b) by Weierstrass's Theorem, and so f′(c) = 0 by the first-order necessary condition for an interior optimum. Either way, we are done.
Figure 30.1.1: The function goes both above and below the axis, meaning there must be at least two points in the closed interval [0, 4] that are extrema, obeying f′(c) = 0. In this case there are exactly two, labeled c1 and c2.
30.2 Mean Value Theorem
The Mean Value Theorem generalizes Rolle’s Theorem, which is the key to the proofof the Mean Value Theorem.
Mean Value Theorem. Let f : I → R be a C¹ function on an interval I ⊂ R. Then for any points a, b ∈ I with a < b there is a point c, a < c < b, with

f′(c) = [f(b) − f(a)] / (b − a), or equivalently, f(b) − f(a) = f′(c)(b − a).
Figure 30.2.1: The Mean Value Theorem gives us a point c ∈ (a, b) where the slope of the tangent T to f at (c, f(c)) is equal to the slope of the secant S through (a, f(a)) and (b, f(b)).
30.3 Proof of the Mean Value Theorem
Mean Value Theorem. Let f : I → R be a C1 function on an interval I ⊂ R. Then for any points a, b ∈ I with a < b there is a point c, a < c < b, with

f′(c) = [f(b) − f(a)]/(b − a), or equivalently, f(b) − f(a) = f′(c)(b − a).

Proof. Define a function g by

g(x) = f(x) − f(a) − {[f(b) − f(a)]/(b − a)}(x − a).

Then g(x) is the vertical distance between the secant line S and f(x). The distance is zero at both a and b:

g(a) = f(a) − f(a) − {[f(b) − f(a)]/(b − a)}(a − a) = 0

and

g(b) = f(b) − f(a) − {[f(b) − f(a)]/(b − a)}(b − a) = 0.

By Rolle’s Theorem, there is a c ∈ (a, b) with

0 = g′(c) = f′(c) − [f(b) − f(a)]/(b − a).

Then f′(c)(b − a) = f(b) − f(a), proving the result.
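The Mean Value Theorem does not tell us how to find c, but for a concrete function we can locate it numerically. A minimal sketch (my example, not from the lecture): for f(x) = x^3 on [0, 2], the secant slope is (8 − 0)/2 = 4, so we solve f′(c) = 3c^2 = 4 by bisection.

```python
# Hedged sketch: numerically locate the Mean Value Theorem point c
# for f(x) = x^3 on [0, 2]. Here f'(x) = 3x^2 and the secant slope
# is (f(2) - f(0)) / (2 - 0) = 4, so we find a root of 3c^2 - 4.

def bisect(h, lo, hi, tol=1e-12):
    """Find a root of h in [lo, hi] by bisection (h must change sign)."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if h(lo) * h(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2

g_prime = lambda c: 3 * c**2 - 4      # f'(c) minus the secant slope
c = bisect(g_prime, 0.0, 2.0)         # c = 2/sqrt(3), about 1.1547
```

The computed c lies in (0, 2), as the theorem guarantees.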
30.4 Mean Value Theorem on Rm
There is a version for Rm that follows directly from the basic Mean Value Theorem.
Theorem 30.4.1. Let f : U → R be a continuously differentiable function defined on an open set U ⊂ Rm. Suppose a, b ∈ U with the line segment ℓ(a,b) ⊂ U. Then there is a point c ∈ ℓ(a,b) such that

f(b) − f(a) = Dfc(b − a).

Proof. Define g : [0, 1] → U by g(t) = a + t(b − a) and set h(t) = f(g(t)). Then h : [0, 1] → R. By the Mean Value Theorem, there is a t∗ ∈ (0, 1) with

h(1) − h(0) = h′(t∗)(1 − 0) = h′(t∗).

Let c = g(t∗) = a + t∗(b − a). Then, by the Chain Rule,

f(b) − f(a) = h(1) − h(0) = h′(t∗) = Df(g(t∗)) g′(t∗) = Dfc(b − a).
This tells us that f(b) = f(a) + Dfc(b − a) for some c on the line segment ℓ(a,b), which is useful for approximating f at a point b based on its value at a point a.
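As a small check (my example, not the lecture’s), take f(x, y) = x^2 + y^2, a = (0, 0), b = (1, 1). Then f(b) − f(a) = 2 while Dfc(b − a) = 2(c1 + c2), so the theorem’s c should be the midpoint of the segment.

```python
# Hedged sketch: verify Theorem 30.4.1 for f(x, y) = x^2 + y^2 on the
# segment from a = (0, 0) to b = (1, 1). Points on the segment are
# c(t) = a + t(b - a) = (t, t), where the gradient is (2t, 2t).

def grad_dot_direction(t):
    """Df at c(t) = (t, t) applied to the direction b - a = (1, 1)."""
    return 2 * t + 2 * t              # (2t, 2t) . (1, 1)

f_diff = 2.0                          # f(1, 1) - f(0, 0)

# Scan the segment for the t* where Df_c(b - a) matches f(b) - f(a).
t_star = min((t / 1000 for t in range(1001)),
             key=lambda t: abs(grad_dot_direction(t) - f_diff))
# t_star = 0.5, so c = (1/2, 1/2) lies on the open segment, as promised.
```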
30.5 Taylor Polynomials
There is a generalization of the Mean Value Theorem using Taylor polynomials. Let f^(k) denote the kth derivative of f,

f^(k)(x) = d^k f/dx^k (x),

with the convention f^(0)(a) = f(a).
Recall that k! denotes k factorial, which is defined inductively for non-negative integers by 0! = 1 and k! = k(k − 1)!. Thus

k! = 1 · 2 · 3 · · · (k − 1) · k.

The gamma function extends the definition to the complex numbers, excepting the non-positive integers. They are related by k! = Γ(k + 1). When z ∈ C has a positive real part,

Γ(z) = ∫_0^∞ x^(z−1) e^(−x) dx.
Taylor Polynomial. The kth order Taylor polynomial of f at a is

Pk(x; a) = f(a) + f′(a)(x − a) + (1/2!)f′′(a)(x − a)^2 + · · · + (1/k!)f^(k)(a)(x − a)^k
         = Σ_{n=0}^{k} (1/n!) f^(n)(a)(x − a)^n.

The fact that Pk(a; a) = f(a) will be useful when we prove various forms of Taylor’s Formula.
Here are the first several Taylor polynomials:

P0(x; a) = f(a)
P1(x; a) = f(a) + f′(a)(x − a)
P2(x; a) = f(a) + f′(a)(x − a) + (1/2)f′′(a)(x − a)^2
P3(x; a) = f(a) + f′(a)(x − a) + (1/2)f′′(a)(x − a)^2 + (1/6)f^(3)(a)(x − a)^3
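A minimal sketch (mine, with the derivative values supplied by hand) of evaluating Pk(x; a) directly from the sum formula:

```python
# Hedged sketch: evaluate P_k(x; a) = sum_{n=0}^{k} f^(n)(a) (x-a)^n / n!
# given a list of derivative values at a.
from math import factorial

def taylor_poly(derivs, a, x):
    """derivs = [f(a), f'(a), ..., f^(k)(a)]; returns P_k(x; a)."""
    return sum(d * (x - a) ** n / factorial(n) for n, d in enumerate(derivs))

# Example: P3 for f = sin at a = 0 uses derivative values 0, 1, 0, -1,
# giving P3(x; 0) = x - x^3/6, already close to sin near 0.
p3 = taylor_poly([0.0, 1.0, 0.0, -1.0], 0.0, 0.1)
```

Here p3 equals 0.1 − (0.1)^3/6, which agrees with sin(0.1) to about 10^−7.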
30.6 First Order Taylor’s Formula in R
To see how the proof of Taylor’s formula works, we start with the first order Taylor’s formula. (The Mean Value Theorem is the zeroth order Taylor’s formula.)
Theorem 30.6.1. Let f : I → R be a C2 function defined on an interval in R. If a, b ∈ I, there exists a c ∈ (a, b) such that

f(b) = f(a) + f′(a)(b − a) + (1/2)f′′(c)(b − a)^2     (30.6.1)
     = P1(b; a) + (1/2)f′′(c)(b − a)^2.

Proof. Define

g(x) = f(x) − P1(x; a) − M(x − a)^2
     = f(x) − [f(a) + f′(a)(x − a)] − M(x − a)^2.     (30.6.2)

By definition, g(a) = 0. Now choose M so that g(b) = 0. Then

M = [f(b) − f(a) − f′(a)(b − a)]/(b − a)^2.

By Rolle’s Theorem, there is a c1 ∈ (a, b) with g′(c1) = 0. Now

g′(x) = f′(x) − f′(a) − 2M(x − a),

so g′(a) = f′(a) − f′(a) = 0. Both g′(a) = g′(c1) = 0, so we apply Rolle’s Theorem again, this time to g′.
From Rolle’s Theorem we obtain a c ∈ (a, c1) ⊂ (a, b) with g′′(c) = 0. Now

g′′(x) = f′′(x) − 2M,

so f′′(c) = 2M. In equation (30.6.2), substitute M = f′′(c)/2 and set x = b to obtain equation (30.6.1).
30.7 kth Order Taylor’s Formula in R
We now consider the kth order Taylor’s formula.
Taylor’s Formula in R. Let f : I → R be a Ck+1 function defined on an interval in R. If a, b ∈ I, there exists a c ∈ (a, b) such that

f(b) = f(a) + f′(a)(b − a) + (1/2!)f′′(a)(b − a)^2 + · · · + (1/k!)f^(k)(a)(b − a)^k
       + (1/(k+1)!)f^(k+1)(c)(b − a)^(k+1)     (30.7.3)
     = Pk(b; a) + (1/(k+1)!)f^(k+1)(c)(b − a)^(k+1).

PROOF SKIPPED
Proof. Define

g(x) = f(x) − Pk(x; a) − M(x − a)^(k+1)
     = f(x) − f(a) − f′(a)(x − a) − (1/2!)f′′(a)(x − a)^2
       − · · · − (1/k!)f^(k)(a)(x − a)^k − M(x − a)^(k+1)     (30.7.4)

where

M = [f(b) − Pk(b; a)]/(b − a)^(k+1).

As before, g(a) = f(a) − Pk(a; a) = f(a) − f(a) = 0, and M has been chosen so that g(b) = f(b) − Pk(b; a) − [f(b) − Pk(b; a)] = 0. By Rolle’s Theorem, there is a c1 ∈ (a, b) with g′(c1) = 0.
Now

g′(x) = f′(x) − f′(a) − f′′(a)(x − a) − · · · − (1/(k−1)!)f^(k)(a)(x − a)^(k−1)
        − (k + 1)M(x − a)^k.

Then g′(a) = f′(a) − f′(a) = 0. A second application of Rolle’s Theorem, now to g′, yields a c2 ∈ (a, c1) ⊂ (a, b) with g′′(c2) = 0.
(Proof continues on next page...)
30.8 Taylor’s Formula Proof Part II
PROOF SKIPPED
Remainder of Proof. Computing g′′ we obtain

g′′(x) = f′′(x) − f′′(a) − · · · − (1/(k−2)!)f^(k)(a)(x − a)^(k−2)
         − (k + 1)k M(x − a)^(k−1).

It follows that g′′(a) = f′′(a) − f′′(a) = 0. A third application of Rolle’s Theorem, now to g′′, yields a c3 ∈ (a, c2) ⊂ (a, b) with g′′′(c3) = 0.
We continue applying Rolle’s Theorem to successive derivatives until we eventually reach g^(k), where

g^(k)(x) = f^(k)(x) − f^(k)(a) − (k + 1)! M(x − a)

and there is a ck ∈ (a, b) with g^(k)(ck) = 0. It follows that g^(k)(a) = f^(k)(a) − f^(k)(a) = 0, so we apply Rolle’s Theorem one last time to find a c ∈ (a, ck) ⊂ (a, b) with g^(k+1)(c) = 0.
We compute

g^(k+1)(x) = f^(k+1)(x) − (k + 1)! M.

Then f^(k+1)(c) = (k + 1)! M, so M = f^(k+1)(c)/(k + 1)!. In equation (30.7.4), substitute M = f^(k+1)(c)/(k + 1)! and set x = b to obtain equation (30.7.3).
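A quick numerical illustration (mine, not the lecture’s): for f = exp with a = 0, b = 1, and k = 2, we can solve for the c in (30.7.3) explicitly and confirm it lands in (a, b).

```python
# Hedged sketch: find the c promised by Taylor's Formula for f = exp
# with a = 0, b = 1, k = 2. Here P2(1; 0) = 1 + 1 + 1/2!, and since
# b - a = 1 the remainder is e - P2(1; 0) = f'''(c)/3! = exp(c)/6.
from math import e, log, factorial

p2 = 1 + 1 + 1 / factorial(2)        # P2(1; 0) for exp at a = 0
remainder = e - p2                    # equals exp(c)/3! for some c
c = log(remainder * factorial(3))     # f''' = exp, so c = log(6 * remainder)
# c is roughly 0.27, which indeed lies in the open interval (0, 1).
```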
30.9 Big O, Little o
Big O and little o are notations used to describe the asymptotic behavior of functions. They can be applied at infinity, or at any finite point such as 0. The “O” stands for order, and indicates that one function is of the same order as the other. They are useful for describing the precision of estimates.1
Thus f(x) = O(g(x)) as x → ∞ means there is an M > 0 such that

|f(x)|/g(x) ≤ M

for x large enough. Similarly, f(x) = O(x^3) at 0 means there is an M > 0 with

|f(x)|/x^3 ≤ M

for x near 0. Thus

10 − 1/x = O(1) as x → ∞.

Big O is used when the ratio of two functions is bounded. We use little o when the ratio converges to zero. So f(x) = o(g(x)) at infinity means that for every ε > 0, there is a K such that

|f(x)|/g(x) < ε

for all x > K. The definition at any finite point is similar. When we say R(x) = o(|x|^k) at x = 0, it means that for every ε > 0, there is a δ > 0 with

|R(x)| < ε|x|^k

for |x| < δ. In this case R converges to zero enough faster than |x|^k that the ratio also converges to zero.
1 This type of asymptotic notation is known as Bachmann–Landau notation. Big O was introduced by Paul Bachmann in 1894; Edmund Landau proposed the little o notation in 1909. There are other variants that are less commonly used, such as Hardy and Littlewood’s Ω notation.
30.10 Big O and Little o as Limits SKIPPED
The definitions of big O and little o can also be stated in terms of limits.
Big O. We write f(x) = O(g(x)) as x → a to mean

lim sup_{x→a} |f(x)|/g(x) < ∞.

The case a = ±∞ is allowed.
That used the limit superior.
Limit Superior. The limit superior or lim sup is defined by

lim sup_{x→a} f(x) = lim_{ε→0} ( sup{f(x) : x ∈ Bε(a)} )

when a is finite, and

lim sup_{x→∞} f(x) = lim_{k→∞} ( sup{f(x) : x > k} )

when a = +∞.
The definition is a little simpler for sequences. When we have a sequence an,

lim sup_{n→∞} an = lim_{n→∞} ( sup{am : m ≥ n} ).

Limit Inferior. The limit inferior, lim inf, is defined analogously. E.g.,

lim inf_{n→∞} an = lim_{n→∞} ( inf{am : m ≥ n} ).

As for little o, we use the same quotient, but now the quotient converges to zero. In that case, it is enough to use the limit itself rather than the limit superior.
Little o. We write f(x) = o(g(x)) as x → a when

lim_{x→a} |f(x)|/g(x) = 0.

The case a = ±∞ is allowed.
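The sequence version is easy to approximate numerically (a sketch of mine): for an = (−1)^n + 1/n, the lim sup is 1 and the lim inf is −1, which we can see by taking the sup and inf over a long tail.

```python
# Hedged sketch: approximate lim sup and lim inf of a_n = (-1)^n + 1/n
# by taking sup/inf over a far tail of the sequence.
a = [(-1) ** n + 1 / n for n in range(1, 10001)]

tail = a[5000:]              # a distant tail of the sequence
approx_limsup = max(tail)    # close to 1 (approached along even n)
approx_liminf = min(tail)    # close to -1 (approached along odd n)
```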
30.11 The Remainder in Taylor’s Formula SKIPPED
Define the kth remainder term by

Rk(x; a) = f(x) − Pk(x; a).

By Taylor’s formula, Rk(x; a) = (1/(k+1)!) f^(k+1)(c)(x − a)^(k+1) for some c between a and x, so

Rk(x; a)/(x − a)^k = (1/(k+1)!) f^(k+1)(c)(x − a)^(k+1)/(x − a)^k
                   = (1/(k+1)!) f^(k+1)(c)(x − a).

As x → a, c → a, so the limit of this ratio is

lim_{x→a} Rk(x; a)/(x − a)^k = (1/(k+1)!) f^(k+1)(a)(a − a) = 0,

or in the little o notation,

Rk(x; a) = o(|x − a|^k).

This shows that the Taylor polynomial Pk is a good approximation of f for x near a.
30.12 Example: A Power Series
Example 30.12.1: Power Series. In some cases, the approximation is perfect as k → ∞, and we get a convergent power series. Consider f(x) = sin x and set a = 0. Then f^(2k)(0) = 0, f^(4k+1)(0) = +1 and f^(4k+3)(0) = −1, so the Taylor expansion for sin x is

sin x = x − x^3/3! + x^5/5! − x^7/7! + · · · = Σ_{n=0}^{∞} [(−1)^n/(2n + 1)!] x^(2n+1).

Applying the ratio test, we find that this series converges absolutely for any x ∈ R because

|x^(2n+3)/(2n + 3)!| · |(2n + 1)!/x^(2n+1)| = |x^2/[(2n + 2)(2n + 3)]| → 0

as n → ∞ for every real x.
Because every derivative of sin is bounded by 1, the remainder obeys

|R_{2k+1}| ≤ |x|^(2k+2)/(2k + 2)! → 0 as k → ∞,

so the power series converges to sin x. The convergence is uniform on any compact interval.
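A sketch (mine) of summing the series numerically and comparing with the library sine:

```python
# Hedged sketch: partial sums of the sine series
# sum_{n=0}^{N-1} (-1)^n x^(2n+1) / (2n+1)!  converge to sin(x).
from math import factorial, sin

def sin_series(x, terms=10):
    """Partial sum of the Taylor series of sin at 0."""
    return sum((-1) ** n * x ** (2 * n + 1) / factorial(2 * n + 1)
               for n in range(terms))

approx = sin_series(1.0)
# With 10 terms the truncation error is below 1/21!, so the partial
# sum agrees with sin(1) to machine precision.
```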
30.13 Taylor’s Formula in Rm
Just as we did with the Mean Value Theorem, we can derive Taylor’s Formula in Rm from the R1 version. We will use the shorthand notation [Dkfa] h⊗k to denote the k-tensor [Dkfa](h, . . . , h), the kth derivative applied to h ⊗ · · · ⊗ h ∈ (Rm)⊗k.
Taylor’s Formula in Rm. Let f : U → R be a Ck+1 function defined on an open set in Rm. Suppose that for every a, b ∈ U, ℓ(a,b) ⊂ U. Then for all a, b ∈ U there exists a c ∈ ℓ(a,b) such that

f(b) = f(a) + [Dfa](b − a) + (1/2!)[D2fa](b − a)⊗2 + · · · + (1/k!)[Dkfa](b − a)⊗k
       + (1/(k+1)!)[Dk+1fc](b − a)⊗(k+1).     (30.13.5)
PROOF SKIPPED
Proof. We will piggyback off Taylor’s Formula in R. Define

φ(t) = f((1 − t)a + tb) = f(a + t(b − a)).

Since U is open and f is Ck+1, φ : I → R is a Ck+1 function defined on an open interval I ⊃ [0, 1]. We apply Taylor’s Formula on the interval I between 0 and 1 to find

φ(1) = Σ_{n=0}^{k} (1/n!) φ^(n)(0) 1^n + (1/(k+1)!) φ^(k+1)(t∗) 1^(k+1)

for some t∗ ∈ (0, 1). Then we apply the Chain Rule to find

φ′(t) = [Df(a + t(b − a))](b − a),
φ′′(t) = [D2f(a + t(b − a))](b − a)⊗2,
φ′′′(t) = [D3f(a + t(b − a))](b − a)⊗3,

etc. Evaluating these derivatives at t = 0, and the remainder term at t = t∗ with c = a + t∗(b − a), we obtain equation (30.13.5).
30.14 The Remainder Term in Rm
Again, the kth remainder term is Rk(x; a) = f(x) − Pk(x; a), so by (30.13.5),

Rk(x; a) = (1/(k+1)!) [Dk+1fc](x − a)⊗(k+1).

Dividing by ‖x − a‖^k, we obtain

Rk(x; a)/‖x − a‖^k = (1/(k+1)!) [Dk+1fc](x − a)⊗(k+1)/‖x − a‖^k.

Let u be the unit vector (x − a)/‖x − a‖. Then

Rk(x; a)/‖x − a‖^k = (1/(k+1)!) ([Dk+1fc] u⊗(k+1)) ‖x − a‖.

As x → a, c → a, and we find

Rk(x; a)/‖x − a‖^k → 0

as x → a. Alternatively,

Rk(x; a) = o(‖x − a‖^k)

as ‖x − a‖ → 0. This means that the remainder goes to zero as x → a enough faster than ‖x − a‖^k → 0 that their ratio converges to zero.
29.3. CONNECTED SETS 49
29.3. Connected Sets
Our last collection of important topological concepts relates to connected sets. Roughly speaking, a connected set is a set that is a single contiguous piece. Before defining connectedness, we need another definition.
29.22 Relative Topology
We start by defining the relative or subspace topology.
Relative (Subspace) Topology. Let (X,T) be a topological space and S a subset of X. The relative or subspace topology on S is defined by TS = {S ∩ U : U ∈ T}, and (S,TS) is called a subspace of X.
The relative topology is the weakest topology in which the inclusion map iS : S → X, defined by iS(x) = x, is continuous. In any topology where iS is continuous, the sets U ∩ S must be open. The relative topology TS demands exactly that: no more, no less. The relative topology makes S into a topological space (in this case a subspace). When (X, d) is a metric space, the subspace topology on S is the same as the metric topology of (S, d).
Sets of the form U ∩ S with U open in X are referred to as relatively open, and sets of the form F ∩ S with F closed in X are called relatively closed.
29.23 Embedding R in R2
Example 29.23.1: Embedding R in R2. When R2 has the usual topology, we can embed R into R2 as A = {x ∈ R2 : x2 = 0}. The relative topology on A is the same as the usual topology on R. This is illustrated in Figure 29.23.2.
Figure 29.23.2: The intersection of an open ball in R2 and the line A = {x : x2 = 0} is an open interval in A. Had we used a closed ball, we would have gotten a closed interval.
29.24 Relative Complements
The relative complement of A in S is defined by S \ A = {x ∈ S : x ∉ A} = S ∩ Ac.
It is easy to show that the relative complements of relatively open sets are relatively closed, and vice-versa.
Theorem 29.24.1. Let S ⊂ X have the relative topology induced by (X,T). Then A is the relative complement of an open set if and only if A = S ∩ F for some set F that is closed in X.
Proof. Let A = S \ U = S ∩ Uc for some open set U. Then A = S ∩ F where F = Uc is closed.
Conversely, if A = S ∩ F for some closed set F, then A = S \ Fc, and U = Fc is open with A = S \ U.
29.25 Connected Sets 10/15/20
Connected and Disconnected Sets. Let (X,T) be a topological space. The space X is connected if there do not exist two disjoint non-empty open sets U and V with X = U ∪ V. If such sets do exist, we say they disconnect X and that X is disconnected.
A subset S of X is connected if it is connected in the relative topology.
These definitions can be recast in terms of closed sets. Theorem 29.25.1 applies regardless of whether we are using the base topology on X, or whether we are considering a subset with the relative topology.
Theorem 29.25.1. A space X is disconnected if and only if there are non-empty disjoint closed subsets A and B that cover X.
Proof. The set X is disconnected if and only if there are non-empty open sets U and V with U ∩ V = ∅ and U ∪ V = X. Taking complements, and defining the closed sets A = Uc and B = Vc, we find these conditions hold if and only if A ∪ B = Uc ∪ Vc = X and A ∩ B = ∅. Finally, U is non-empty if and only if A ≠ X; since A and B are disjoint and cover X, A ≠ X exactly when B is non-empty. Similarly, V is non-empty exactly when A is non-empty. So U and V are both non-empty if and only if A and B are.
29.26 Characterizing Relative Disconnections
Let S have the relative topology and let U, V be relatively open sets that disconnect S. What can we say about the open sets in X that generate them?
Theorem 29.26.1. Let (X,T) be a topological space and S ⊂ X. Suppose U and V are relatively open sets that disconnect S. Then there are open sets U′ and V′ in X obeying:
1. U′ ∩ S and V′ ∩ S are both non-empty.
2. S ⊂ U′ ∪ V′.
3. U′ ∩ V′ ∩ S = ∅.
Conversely, if U′ and V′ are open sets obeying (1)–(3), then U = U′ ∩ S and V = V′ ∩ S disconnect S in the relative topology.
Proof. By the definition of the relative topology, there are open sets U′, V′ ⊂ X with U = U′ ∩ S and V = V′ ∩ S. (1) Now U′ ∩ S and V′ ∩ S are U and V, which are non-empty. (2) Because U and V cover S, S = (U′ ∩ S) ∪ (V′ ∩ S) = (U′ ∪ V′) ∩ S, so S ⊂ U′ ∪ V′. (3) Finally, because U and V are disjoint, U′ ∩ V′ ∩ S = U ∩ V = ∅.
We now prove the converse. By (1), U = U′ ∩ S and V = V′ ∩ S are non-empty. By (3), U and V are disjoint. By (2), their union is all of S. Since U and V are also relatively open, they disconnect S.
Notice that it need not be the case that U′ ∩ V′ is empty, only that it contains no points of S.
There’s a similar result for closed sets. The proof is quite similar, and has been omitted.
Theorem 29.26.2. Let (X,T) be a topological space and S ⊂ X. Suppose A and B are relatively closed sets that disconnect S. Then there are closed sets A′ and B′ in X obeying:
1. A′ ∩ S and B′ ∩ S are both non-empty.
2. S ⊂ A′ ∪ B′.
3. A′ ∩ B′ ∩ S = ∅.
Conversely, if A′ and B′ are closed sets obeying (1)–(3), then A = A′ ∩ S and B = B′ ∩ S disconnect S in the relative topology.
Intuitively, a set is connected if there are no breaks in it: it is all one piece.
Example 29.26.3: A Disconnected Set. A set such as S = [0, 1) ∪ (1, 2] ⊂ R is not connected. One disconnection uses U′ = (−1, 1) and V′ = (1, 3). Both sets are open, and both have non-empty intersection with S: U′ ∩ S = [0, 1) ≠ ∅ and V′ ∩ S = (1, 2] ≠ ∅. The set S is covered because U′ ∪ V′ = (−1, 1) ∪ (1, 3) ⊃ S. Finally, the sets are disjoint: U′ ∩ V′ = ∅.
29.27 Intervals are Connected
In contrast to the previous example, any interval in the real line is a connected set.
Proposition 29.27.1. Any interval I in R is connected.
Proof. Let I be an interval, possibly infinite. We prove this by contradiction. Suppose I is not connected. Then there are non-empty relatively closed sets A and B that disconnect I. Take a ∈ A and b ∈ B. We label the sets so that a < b, and consider the interval [a, b]. Note that [a, b] ⊂ I because I is an interval and a, b ∈ I.
Define z = sup([a, b] ∩ A). The set [a, b] ∩ A is non-empty because a ∈ A, and it is bounded, so the supremum exists. By construction, a ≤ z ≤ b, so z ∈ [a, b] ⊂ I.
Because A and [a, b] are relatively closed,

z ∈ cl([a, b] ∩ A) = [a, b] ∩ A ⊂ A.

Of course z ≠ b, because b ∈ B and A ∩ B is empty, so z < b.
Since z is the least upper bound of A ∩ [a, b], any w ∈ (z, b] must be in B. In particular, z + 1/n ∈ B for n large enough. Letting n → ∞ and using the fact that B is relatively closed, we find z ∈ B. But z ∈ A, so it cannot also be in B. This contradiction means that there are no sets A and B that disconnect I. Therefore, I is connected.
29.28 Totally Disconnected Sets
Totally disconnected sets are the opposite extreme from connected sets. They contain no connected subsets bigger than a single point. A set is totally disconnected if its only connected subsets are the trivial ones: singletons and the empty set. Equivalently, a set is totally disconnected if any two distinct points in the set can be disconnected.
Example 29.28.1: The Rationals are Totally Disconnected. The rational numbers Q provide an example of total disconnection. Suppose a is an irrational number (e.g., √2 or π). Then A = (−∞, a] ∩ Q and B = Q ∩ [a, +∞) are relatively closed sets in Q that disconnect Q. Moreover, you can disconnect any subset of Q that contains at least two distinct points in the same fashion. That means the set Q ⊂ R is totally disconnected.
Another totally disconnected set is the Cantor set of Example 12.33.1.
Example 29.28.2: The Cantor Set is Totally Disconnected. Suppose x < y are distinct elements of the Cantor set C. As we saw, both can be written as ternary numbers consisting entirely of 0’s and 2’s. Take the first digit that differs, call it digit k. Digit k is 0 in x and 2 in y. Let z have ternary expansion identical to x, except that the kth digit is 1. Then x < z < y and z ∉ C. Let A = C ∩ (−∞, z] and B = C ∩ [z, +∞). Then A and B are disjoint relatively closed sets that disconnect C.
29.29 Connected Components of Disconnected Sets
Let (X,T) be a topological space that is not connected. If a subspace S is connected, we can ask whether there is a larger connected subspace that includes it. If there is not, we call S a connected component, or just a component, of X.
Given a connected subset S, a maximal connected subset containing S can be constructed by taking the union of all connected subsets of X that contain S. The key step in showing this is that the union of two connected subsets containing S is also connected. In fact, we can show more than we need: if two connected subsets share even a single point, their union is connected.
Theorem 29.29.1. Suppose A and B are connected subsets of (X,T) containing a pointx. Then A ∪ B is connected.
Proof. Suppose that A ∪ B can be disconnected by relatively open sets U and V. Label the sets so that x ∈ U. Keep in mind that x ∈ A ∩ B, and take y ∈ V ∩ (A ∪ B). There are two possibilities: y ∈ A or y ∈ B.
Suppose y ∈ A. Then both A ∩ U and A ∩ V are non-empty, so U and V disconnect the connected set A, which is impossible.
Otherwise, y ∈ B. Now x ∈ U and y ∈ V, so both B ∩ U and B ∩ V are non-empty. Then U and V disconnect the connected set B, which is also impossible.
As both possibilities are impossible, we cannot disconnect A ∪ B.
Earlier we considered the space [0, 1) ∪ (1, 2]. It has two connected components, [0, 1) and (1, 2]. Both sets are connected because they are intervals, and if we add even a single point of one interval to the other, we get a disconnected set.
29.30 Continuity and Connectedness
Our next result is that the continuous image of a connected set is connected. This means that we can make connected sets from other connected sets by applying continuous functions to them.
Theorem 29.30.1. Suppose f : X → Y is continuous and S ⊂ X is connected. Then f(S)is connected.
Proof. Suppose not. Let A, B be relatively closed sets that disconnect f(S). Now f−1(A) and f−1(B) are relatively closed because f is continuous. Moreover,

f−1(A) ∩ f−1(B) = f−1(A ∩ B) = ∅

and

f−1(A) ∪ f−1(B) = f−1(A ∪ B) ⊃ f−1(f(S)) ⊃ S.

Thus f−1(A) and f−1(B) disconnect S. As this is impossible, f(S) cannot be disconnected.
29.31 Path-Connected Sets
It is often easier to show a set is connected by showing it is path-connected. A path from a to b in S is a continuous function f : [0, 1] → S with f(0) = a and f(1) = b.
Path-connected. A set S is path-connected if for every a, b ∈ S there is a path from ato b in S.
Path-connected sets are connected.
Proposition 29.31.1. Any path-connected set is connected.
Proof. Suppose S is path-connected, and suppose A and B are closed sets that disconnect S in the sense of Theorem 29.26.2: A ∩ S and B ∩ S are non-empty, S ⊂ A ∪ B, and A ∩ B ∩ S = ∅. Choose a ∈ A ∩ S and b ∈ B ∩ S, and let f be a path between them in S.
Now consider f−1(A) and f−1(B). These are closed sets in [0, 1] with 0 ∈ f−1(A) and 1 ∈ f−1(B). Moreover, f−1(A) ∪ f−1(B) = f−1(A ∪ B) ⊃ [0, 1]. Since [0, 1] is connected, there is some x ∈ f−1(A) ∩ f−1(B). It follows that f(x) ∈ A ∩ B ∩ S, contradicting A ∩ B ∩ S = ∅. So A and B cannot disconnect S, and S is connected.
In some spaces the connected and path-connected sets are the same. This happens in R, where the only connected sets are the intervals. This includes trivial intervals such as [a, a] and infinite intervals like (0, +∞). Intervals are always path-connected, so the set of connected subsets and the set of path-connected subsets are identical in R.
29.32 Convex and Star-shaped Sets
A convex set is one that contains the line segment between any pair of its points.
Convex Set. Let V be a vector space. A set S ⊂ V is convex if for every x, y ∈ S, ℓ(x,y) = {(1 − t)x + ty : 0 ≤ t ≤ 1} ⊂ S.
A star-shaped set is a set where all points can be connected to a special point, the star point, using a line segment.
Star-shaped Set. A set S ⊂ Rm is star-shaped if there is a point x0 ∈ S so that for any x ∈ S, ℓ(x0, x) ⊂ S. We then say S is star-shaped with respect to x0. The point x0 is called a star point.
It’s easy to show that convex sets are star-shaped.
Proposition 29.32.1. Any convex set S is star-shaped, and every point of it is a star point.
Proof. Let x0 be any point in S. Since S is convex, ℓ(x0, y) ⊂ S for any y ∈ S, showing that S is star-shaped with x0 as a star point. As x0 was arbitrary, every point of S is a star point.
Figure 29.32.2: The sets A and B are convex. The sets C and D are not convex, as demonstrated by the red line segments that must leave the set to connect their endpoints. The sets C and D are star-shaped with respect to the heavy dots. The annulus E is neither convex nor star-shaped, although it is connected.
29.33 Convex and Star-shaped Sets are Connected
An immediate corollary of the definitions together with Proposition 29.31.1 is that convex sets are connected.
Corollary 29.33.1. Any convex set is connected.
Proof. If S is convex and x, y ∈ S, the function f(t) = (1 − t)x + ty is a path from x to y in S.
Star-shaped sets are also connected.
Corollary 29.33.2. Any star-shaped set is connected.
Proof. Given x, y ∈ S, the function f defined by f(t) = (1 − 2t)x + 2t x0 for t ∈ [0, 1/2] and f(t) = (2 − 2t)x0 + (2t − 1)y for t ∈ [1/2, 1] is a path from x to y that passes through the star point x0.
29.34 Example: Connected, But Not Path-Connected
Path-connectedness is quite useful because it is often easier to show a set is connected by showing it is path-connected rather than trying to use the definition of connectedness directly. There are limitations to this approach because the converse is not true: a set can be connected without being path-connected.
Example 29.34.1: Topologist’s Sine Curve. Consider R2 and let S = {(0, y) : −1 ≤ y ≤ 1} ∪ {(x, sin(1/x)) : x > 0}. This is a variant of the topologist’s sine curve of Example 13.7.1. The set S is connected but not path-connected.
Suppose, by way of contradiction, that S is path-connected. Then there is a continuous path in S, f(t) = (x(t), y(t)), with f(0) = (0, 0) and f(1) = (1/π, 0). The components x and y of f inherit continuity from f.
By the Intermediate Value Theorem, there is a t1, 0 < t1 < 1, with x(t1) = 2/(3π). Then there is a t2, 0 < t2 < t1, with x(t2) = 2/(5π). Continue this process to obtain a decreasing sequence tn with x(tn) = 2/[(2n + 1)π]. Since tn is a decreasing sequence that is bounded below, it converges to some t0.
By continuity, y(tn) → y(t0). But this is impossible because y(tn) = sin[(2n + 1)π/2] = +1 when n is even and −1 when n is odd. This contradiction shows there is no such path f. The set S cannot be path-connected.
It is easy to see that the portion of S with x > 0 is path-connected, as is {(0, y) : −1 ≤ y ≤ 1}. This means that the only possible disconnection is into A = {(0, y) : −1 ≤ y ≤ 1} and B = {(x, sin(1/x)) : x > 0}. This fails because B is not relatively closed: it does not contain the limit of its points (1/(2πn), 0), namely (0, 0). Therefore S is connected.
Figure 29.34.2: Topologist’s Sine Curve
29.35 The Intermediate Value Theorem
An important consequence of Theorem 29.30.1 is the Intermediate Value Theorem, which says that if a continuous function defined on an interval takes two values, it also takes every value in between.
Intermediate Value Theorem. If f : [a, b] → R is continuous, and y is a number between f(a) and f(b), then there is at least one c ∈ [a, b] with f(c) = y.
Proof. We may assume f(a) < f(b) without loss of generality. We proceed by contradiction. If no such c exists, the sets (−∞, y] and [y, +∞) disconnect f([a, b]). This is impossible by Theorem 29.30.1, since [a, b] is connected, and so such a c must exist.
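The proof is non-constructive, but bisection gives a practical way to locate c. A sketch of mine, assuming f is continuous and y lies between f(a) and f(b):

```python
# Hedged sketch: locate the c promised by the Intermediate Value
# Theorem by bisection. Assumes f is continuous on [a, b] and y lies
# between f(a) and f(b).
import math

def ivt_point(f, a, b, y, tol=1e-10):
    """Return c with f(c) approximately y."""
    g = lambda x: f(x) - y                    # reduce to a root of g
    lo, hi = (a, b) if g(a) <= 0 else (b, a)  # keep g(lo) <= 0 <= g(hi)
    while abs(hi - lo) > tol:
        mid = (lo + hi) / 2
        if g(mid) <= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

c = ivt_point(math.cos, 0.0, 2.0, 0.5)   # c is about pi/3, where cos c = 1/2
```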
29.36 A Simple Fixed Point Theorem 10/19/21
A function f : X → X has a fixed point if there is an x∗ ∈ X with f(x∗) = x∗. The Intermediate Value Theorem can be used to show that any continuous function mapping the unit interval [0, 1] to itself has a fixed point.
Theorem 29.36.1. Let f : [0, 1] → [0, 1] be a continuous function. Then there is x∗ ∈ [0, 1] with f(x∗) = x∗.
Proof. If f(0) = 0 or f(1) = 1, we are done.
Otherwise, define g(x) = f(x) − x. Then g is continuous with g(0) > 0 and g(1) < 0. By the Intermediate Value Theorem, there is an x∗, 0 < x∗ < 1, with g(x∗) = 0. But then f(x∗) = x∗ and we are done.
29.37 Contraction Mappings
We temporarily put aside connected sets to prove a stronger fixed point theorem, Banach’s Contraction Mapping Theorem. This powerful theorem can be used to prove the Inverse and Implicit Function Theorems of S&B Chapter 15. It can be used to show the existence of solutions to differential equations. It has economic applications, including finding solutions to Bellman’s dynamic programming equation.
Contraction. Let (X, d) be a metric space. A function f : X → X is called a contraction if there is an r < 1 with d(f(x), f(y)) ≤ r d(x, y) for every x, y ∈ X.
In other words, a mapping is a contraction if the images of any two points are uniformly closer together than the original two points.
29.38 Contraction Mapping Theorem
Contraction Mapping Theorem. Let f be a contraction on a complete metric space (X, d). Then f has a unique fixed point x∗. Moreover, take any x0 ∈ X and define xn = f(xn−1) for n = 1, 2, . . . . Then xn → x∗.
Proof. First we show uniqueness. Suppose there are two fixed points x∗ and y∗. Then

d(x∗, y∗) = d(f(x∗), f(y∗)) ≤ r d(x∗, y∗).

Since r < 1, this implies d(x∗, y∗) = 0, so x∗ = y∗.
Consider the sequence xn given in the statement of the theorem. I claim that d(xn+1, xn) ≤ r^n d(x1, x0). We prove this by induction. It is true for n = 1 because

d(x2, x1) = d(f(x1), f(x0)) ≤ r d(x1, x0).

Also, if it is true for n,

d(xn+2, xn+1) = d(f(xn+1), f(xn)) ≤ r d(xn+1, xn) ≤ r^(n+1) d(x1, x0)

shows it is true for n + 1. By induction, it is true for all n = 1, 2, . . . .
Suppose m ≥ n. Then

d(xm, xn) ≤ Σ_{i=0}^{m−n−1} d(xn+i+1, xn+i)
          ≤ Σ_{i=0}^{m−n−1} r^(n+i) d(x1, x0)
          ≤ [r^n/(1 − r)] d(x1, x0).

Since the right hand side converges to zero as n → ∞, we conclude xn is a Cauchy sequence. By completeness of X, the sequence has a limit x∗. The function f is continuous (every contraction is), so

f(x∗) = f(limn xn) = limn f(xn) = limn xn+1 = x∗,

showing that x∗ is a fixed point of f. By uniqueness, it is the unique fixed point.
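The iteration in the theorem is easy to run. A sketch of mine: cos maps [0, 1] into itself and |cos′| = |sin| ≤ sin 1 < 1 there, so cos is a contraction on that complete metric space, and the iterates converge to its unique fixed point (about 0.739).

```python
# Hedged sketch: Banach iteration x_{n+1} = f(x_n) for the contraction
# f = cos on [0, 1], where |cos'| = |sin| <= sin(1) < 1.
import math

def banach_iterate(f, x0, tol=1e-12, max_iter=10_000):
    """Iterate x_{n+1} = f(x_n) until successive terms are within tol."""
    x = x0
    for _ in range(max_iter):
        x_next = f(x)
        if abs(x_next - x) < tol:
            return x_next
        x = x_next
    return x

x_star = banach_iterate(math.cos, 0.5)
# x_star is about 0.7390851, the unique solution of cos(x) = x.
```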
October 23, 2021
Copyright © 2021 by John H. Boyd III: Department of Economics, Florida International University, Miami, FL 33199