Multivariable Calculus - Bard Collegefaculty.bard.edu/~belk/math461s11/MultivariableCalculus.pdf ·...

Multivariable Calculus

1. The Derivative

Definition: Derivative at a Point
Let $U$ be an open subset of $\mathbb{R}^m$, let $F : U \to \mathbb{R}^n$, and let $a \in U$. We say that $F$ is differentiable at $a$ if there exists an $n \times m$ matrix $DF_a$ satisfying

$$\lim_{h \to 0} \frac{F(a + h) - F(a) - DF_a(h)}{\|h\|} = 0.$$

In this case, $DF_a$ is called the derivative of $F$ at $a$.

A few notes about this definition:

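1. The matrix $DF_a$ is uniquely determined. Specifically, if $A$ and $B$ are two matrices satisfying the definition of $DF_a$, then subtracting the limits gives

$$\lim_{h \to 0} \frac{Ah - Bh}{\|h\|} = 0.$$

Taking $h = tv$ for a fixed unit vector $v$ and letting $t \to 0^+$, the quotient is constantly equal to $(A - B)v$, so $(A - B)v = 0$ for every unit vector $v$, which implies that $A = B$.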

2. Though we have stated the definition using the variable $h$, we could alternatively use the variable $x = a + h$. In this case, the limit definition of $DF_a$ becomes

$$\lim_{x \to a} \frac{F(x) - F(a) - DF_a(x - a)}{\|x - a\|} = 0.$$

3. The limit definition for $DF_a$ is equivalent to the following condition: for every $\varepsilon > 0$, there exists a $\delta > 0$ so that

$$\|h\| < \delta \;\Rightarrow\; \|F(a + h) - F(a) - DF_a(h)\| \le \varepsilon \|h\|.$$

If we use $x = a + h$ instead, this implication becomes

$$\|x - a\| < \delta \;\Rightarrow\; \|F(x) - F(a) - DF_a(x - a)\| \le \varepsilon \|x - a\|.$$

4. The definition of the derivative is closely related to a linear approximation formula. Specifically, the definition states that the error in the approximation

$$F(a + h) \approx F(a) + DF_a(h)$$

is much smaller than $\|h\|$ as $h \to 0$. This is sometimes written

$$F(a + h) = F(a) + DF_a(h) + o(h),$$

where $o(h)$ denotes an arbitrary function satisfying

$$\lim_{h \to 0} \frac{o(h)}{\|h\|} = 0.$$
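The linear approximation property can be checked numerically. The sketch below is only an illustration: the map `F`, its hand-computed Jacobian `DF`, and the base point are hypothetical choices, not taken from these notes. It evaluates the quotient $\|F(a+h) - F(a) - DF_a(h)\| / \|h\|$ for shrinking $h$ and should show it tending to zero.

```python
import math

def F(x, y):
    # Hypothetical example map F : R^2 -> R^2 (an assumption for illustration).
    return (x * x * y, x + y ** 3)

def DF(x, y):
    # Jacobian of the example F, computed by hand; row j is the gradient of component j.
    return ((2 * x * y, x * x),
            (1.0, 3 * y * y))

def error_ratio(a, h):
    """Return ||F(a+h) - F(a) - DF_a(h)|| / ||h|| for points a, h in R^2."""
    Fa = F(*a)
    Fah = F(a[0] + h[0], a[1] + h[1])
    J = DF(*a)
    # Linear part DF_a(h), written out as a 2x2 matrix-vector product.
    lin = (J[0][0] * h[0] + J[0][1] * h[1],
           J[1][0] * h[0] + J[1][1] * h[1])
    err = math.hypot(Fah[0] - Fa[0] - lin[0], Fah[1] - Fa[1] - lin[1])
    return err / math.hypot(h[0], h[1])

a = (1.0, 2.0)
# Shrink h along the fixed direction (1, -1).
ratios = [error_ratio(a, (t, -t)) for t in (1e-1, 1e-2, 1e-3)]
print(ratios)
```

For this particular map the quotient shrinks roughly in proportion to $\|h\|$, which is the behavior one expects whenever the map is twice continuously differentiable.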

2. Partial Derivatives

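Definition: Partial Derivatives
Let $U \subset \mathbb{R}^m$ be open, let $F : U \to \mathbb{R}^n$, and let $a \in U$. If $i \in \{1, \ldots, m\}$, the partial derivative of $F$ with respect to $x_i$ at $a$ is defined as follows:

$$\frac{\partial F}{\partial x_i}(a) = \lim_{h \to 0} \frac{F(a + h e_i) - F(a)}{h}.$$

Here $e_i$ denotes the $i$-th standard basis vector, i.e. the unit vector in the $x_i$ direction.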

A few notes about this definition:

1. If $f : \mathbb{R} \to \mathbb{R}$, then $\dfrac{\partial f}{\partial x}(a)$ is the same as $f'(a)$.

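2. As we have defined it, the partial derivative $(\partial F/\partial x_i)(a)$ is a vector. Specifically, if $F(x) = \bigl(f_1(x), \ldots, f_n(x)\bigr)$, then

$$\frac{\partial F}{\partial x_i}(a) = \left( \frac{\partial f_1}{\partial x_i}(a), \ldots, \frac{\partial f_n}{\partial x_i}(a) \right).$$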


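3. In terms of the components $f_1, \ldots, f_n$ of $F$, the partial derivatives of $F$ fit into a matrix:

$$\begin{bmatrix} \dfrac{\partial f_1}{\partial x_1}(a) & \cdots & \dfrac{\partial f_1}{\partial x_m}(a) \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_n}{\partial x_1}(a) & \cdots & \dfrac{\partial f_n}{\partial x_m}(a) \end{bmatrix}$$

This is called the Jacobian matrix for $F$ at $a$. The $i$-th column of this matrix is equal to $(\partial F/\partial x_i)(a)$.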

4. The definition of a partial derivative closely resembles the limit definition of the derivative of a single-variable function. Indeed, the partial derivative could alternatively be defined by

$$\frac{\partial F}{\partial x_i}(a) = (F \circ \gamma)'(0),$$

where $\gamma : \mathbb{R} \to \mathbb{R}^m$ is the path defined by $\gamma(t) = a + t e_i$. That is, the partial derivative is the velocity vector in $\mathbb{R}^n$ of the image of a path that moves in the $x_i$ direction in $\mathbb{R}^m$.

5. More generally, one can define the directional derivative of $F$ at $a$ in the direction of a vector $v \in \mathbb{R}^m$ by the formula

$$\frac{\partial F}{\partial v}(a) = \lim_{h \to 0} \frac{F(a + hv) - F(a)}{h}.$$

Note then that the partial derivatives of $F$ are simply the directional derivatives of $F$ in the directions of the standard basis vectors $e_1, \ldots, e_m$. Many theorems about partial derivatives can be generalized to directional derivatives.

The following theorem describes the relationship between the partial derivatives and the derivative of a differentiable function:

Theorem 1 Partial Derivatives of a Differentiable Function

Let $U \subset \mathbb{R}^m$ be open, let $F : U \to \mathbb{R}^n$, and let $a \in U$. If $F$ is differentiable at $a$, then all of the partial derivatives of $F$ exist at $a$, and

$$\frac{\partial F}{\partial x_i}(a) = DF_a(e_i)$$

for all $i \in \{1, \ldots, m\}$, where $e_i$ is the $i$-th standard basis vector.


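PROOF Let $\varepsilon > 0$. Since $F$ is differentiable at $a$, there exists a $\delta > 0$ so that

$$\|h\| < \delta \;\Rightarrow\; \|F(a + h) - F(a) - DF_a(h)\| \le \varepsilon \|h\|.$$

Let $h \in \mathbb{R}$, and suppose that $0 < |h| < \delta$. Then $\|h e_i\| = |h| < \delta$, so

$$
\begin{aligned}
\left\| \frac{F(a + h e_i) - F(a)}{h} - DF_a(e_i) \right\|
&= \left\| \frac{F(a + h e_i) - F(a) - h\,DF_a(e_i)}{h} \right\| \\
&= \frac{\|F(a + h e_i) - F(a) - DF_a(h e_i)\|}{|h|} \\
&\le \frac{\varepsilon \|h e_i\|}{|h|} = \varepsilon.
\end{aligned}
$$

We conclude that

$$\lim_{h \to 0} \frac{F(a + h e_i) - F(a)}{h} = DF_a(e_i).$$
∎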

As a generalization of this theorem, one can prove that

$$\frac{\partial F}{\partial v}(a) = DF_a(v)$$

for any vector $v$, where $\partial F/\partial v$ denotes the directional derivative of $F$ in the direction of $v$.

The following corollary provides a formula for the derivative $DF_a$ of a differentiable function at a point $a$.

Corollary 2 The Jacobian Matrix

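Let $U \subset \mathbb{R}^m$ be open, and let $a \in U$. Let $F : U \to \mathbb{R}^n$ be the function defined by

$$F(x) = \bigl(f_1(x), \ldots, f_n(x)\bigr),$$

where each $f_i : U \to \mathbb{R}$. If $F$ is differentiable at $a$, then

$$DF_a = \begin{bmatrix} \dfrac{\partial f_1}{\partial x_1}(a) & \cdots & \dfrac{\partial f_1}{\partial x_m}(a) \\ \vdots & \ddots & \vdots \\ \dfrac{\partial f_n}{\partial x_1}(a) & \cdots & \dfrac{\partial f_n}{\partial x_m}(a) \end{bmatrix}.$$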


PROOF By Theorem 1, we know that $DF_a(e_i)$ is the partial derivative $(\partial F/\partial x_i)(a)$. But for any matrix $A$, the vector $Ae_i$ is simply the $i$-th column of $A$, and therefore $DF_a$ is the matrix whose $i$-th column is $(\partial F/\partial x_i)(a)$. From the definition of the partial derivative, it is easy to see that

$$\frac{\partial F}{\partial x_i}(a) = \left( \frac{\partial f_1}{\partial x_i}(a), \ldots, \frac{\partial f_n}{\partial x_i}(a) \right),$$

and the corollary follows. ∎
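The column-by-column description of the Jacobian matrix translates directly into a numerical approximation. The following is a minimal sketch, assuming a central difference quotient in place of the exact limit; the helper name `jacobian`, the step size `eps`, and the example map are all hypothetical and not part of these notes.

```python
import math

def jacobian(F, a, eps=1e-6):
    """Approximate the n x m Jacobian matrix of F at a using central differences."""
    m = len(a)
    columns = []
    for i in range(m):
        plus, minus = list(a), list(a)
        plus[i] += eps    # a + eps * e_i
        minus[i] -= eps   # a - eps * e_i
        Fp, Fm = F(plus), F(minus)
        # This list approximates the vector (dF/dx_i)(a), i.e. the i-th column.
        columns.append([(p - q) / (2 * eps) for p, q in zip(Fp, Fm)])
    n = len(columns[0])
    # Transpose the list of columns into a list of rows.
    return [[columns[i][j] for i in range(m)] for j in range(n)]

def example_map(x):
    # Hypothetical F : R^2 -> R^3 chosen for illustration.
    return [x[0] * x[1], math.sin(x[0]), x[1] ** 2]

J = jacobian(example_map, [1.0, 2.0])
for row in J:
    print(row)
```

For this example map the exact Jacobian at $(1, 2)$ has rows $(2, 1)$, $(\cos 1, 0)$ and $(0, 4)$, so the printed rows should agree with those values up to the finite-difference error. Note that such an approximation can return numbers even when $F$ is not differentiable, as Example 3 illustrates.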

EXAMPLE 1 Parametric Curves
If $f : (a, b) \to \mathbb{R}$ is a differentiable function, it follows from Corollary 2 that

$$Df_x = f'(x)$$

for any $x \in (a, b)$. Indeed, it can be shown that $f'(x)$ exists if and only if $Df_x$ exists.

More generally, a parametric curve is any function $\gamma : (a, b) \to \mathbb{R}^n$. The derivative of such a curve is the vector $\gamma'(t)$ defined by the formula

$$\gamma'(t) = \lim_{h \to 0} \frac{\gamma(t + h) - \gamma(t)}{h}.$$

That is, $\gamma'(t)$ is equal to $\partial\gamma/\partial t$. Again it follows from Corollary 2 that

$$D\gamma_t = \gamma'(t)$$

for any $t \in (a, b)$. Moreover, one can show that $\gamma'(t)$ exists if and only if $D\gamma_t$ exists. ∎

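EXAMPLE 2 The Gradient
Let $U \subset \mathbb{R}^m$ be open, and let $f : U \to \mathbb{R}$. If $f$ is differentiable at a point $a$, then the derivative of $f$ at $a$ is the $1 \times m$ matrix

$$Df_a = \begin{bmatrix} \dfrac{\partial f}{\partial x_1}(a) & \cdots & \dfrac{\partial f}{\partial x_m}(a) \end{bmatrix}.$$

Note that $Df_a$ is a row vector, not an ordinary (column) vector. As a linear transformation, $Df_a$ is the linear functional $\mathbb{R}^m \to \mathbb{R}$ defined by

$$Df_a(e_i) = \frac{\partial f}{\partial x_i}(a).$$

The transpose of $Df_a$ is a vector, known as the gradient of $f$:

$$\nabla f(a) = (Df_a)^T = \left( \frac{\partial f}{\partial x_1}(a), \ldots, \frac{\partial f}{\partial x_m}(a) \right).$$

The gradient satisfies

$$Df_a(v) = \nabla f(a) \cdot v$$

for every vector $v$, where $\cdot$ denotes the dot product. In particular,

$$\frac{\partial f}{\partial x_i}(a) = \nabla f(a) \cdot e_i$$

for each $i \in \{1, \ldots, m\}$. ∎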

From Corollary 2, we might be tempted to think that the derivative $DF_a$ is the same as the Jacobian matrix of partial derivatives, but this is not always the case. In particular, it is possible for the partial derivatives of $F$ to exist without $F$ being differentiable:

EXAMPLE 3 A Non-Differentiable Function
Let $\varphi : [0, 2\pi] \to \mathbb{R}$ be any continuous function with $\varphi(0) = \varphi(2\pi)$. Define a function $f : \mathbb{R}^2 \to \mathbb{R}$ by

$$f(x, y) = r\,\varphi(\theta),$$

where $(r, \theta)$ denote polar coordinates on $\mathbb{R}^2$. The following statements are easy to verify:

1. The function $f$ is continuous at $(0, 0)$.

2. $(\partial f/\partial x)(0, 0)$ exists if and only if $\varphi(0) = -\varphi(\pi)$.

3. $(\partial f/\partial y)(0, 0)$ exists if and only if $\varphi(\pi/2) = -\varphi(3\pi/2)$.

4. The function $f$ is differentiable at $(0, 0)$ if and only if $f$ is linear, i.e. if and only if $\varphi$ has the form $\varphi(\theta) = a\cos\theta + b\sin\theta$ for some constants $a, b \in \mathbb{R}$.

In particular, if $\varphi$ satisfies conditions (2) and (3) but not (4), then the partial derivatives of $f$ will exist but $f$ will not be differentiable.

For a specific example, consider the case where $\varphi(\theta) = \sin(3\theta)$. In Cartesian coordinates, the resulting function $f$ can be written

$$f(x, y) = \frac{3x^2 y - y^3}{x^2 + y^2},$$

where $f(0, 0) = 0$. A graph of this function is shown in Figure 1. Since

$$f(x, 0) = 0 \quad\text{and}\quad f(0, y) = -y,$$

we have $(\partial f/\partial x)(0, 0) = 0$ and $(\partial f/\partial y)(0, 0) = -1$, so both partial derivatives exist at $(0, 0)$. However, it is not true that $Df_{(0,0)} = \begin{bmatrix} 0 & -1 \end{bmatrix}$. For example, along the line $y = x$ we have $f(x, x) = x$, so the directional derivative in this direction satisfies

$$\frac{\partial f}{\partial (1, 1)}(0, 0) = 1 \ne \begin{bmatrix} 0 & -1 \end{bmatrix} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

Figure 1: A non-differentiable function whose partial derivatives exist everywhere.

An interesting feature of this example is that the partial derivatives of $f$ are not continuous at the point $(0, 0)$. For example, one can show that

$$\frac{\partial f}{\partial y}(x, 0) = \begin{cases} 3 & \text{if } x \ne 0, \\ -1 & \text{if } x = 0, \end{cases}$$

so $\partial f/\partial y$ is not continuous at the point $(0, 0)$. ∎
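The claims in this example are easy to test numerically. The sketch below uses difference quotients with a small step (an approximation, not the exact limits); the helper names `directional_at_origin` and `partial_y` are hypothetical. It checks that both partial derivatives exist at the origin, that the directional derivative along $(1, 1)$ disagrees with the linear prediction, and that $\partial f/\partial y$ jumps along the $x$-axis.

```python
def f(x, y):
    # f(x, y) = r sin(3 theta) in Cartesian form, with f(0, 0) = 0.
    if x == 0 and y == 0:
        return 0.0
    return (3 * x * x * y - y ** 3) / (x * x + y * y)

def directional_at_origin(v, h=1e-6):
    """Difference quotient (f(h v) - f(0, 0)) / h approximating the directional derivative."""
    return (f(h * v[0], h * v[1]) - f(0.0, 0.0)) / h

def partial_y(x, h=1e-6):
    """Central difference approximation of (df/dy)(x, 0)."""
    return (f(x, h) - f(x, -h)) / (2 * h)

fx = directional_at_origin((1.0, 0.0))    # approximates (df/dx)(0, 0)
fy = directional_at_origin((0.0, 1.0))    # approximates (df/dy)(0, 0)
d11 = directional_at_origin((1.0, 1.0))   # approximates the derivative along (1, 1)
linear_prediction = fx * 1.0 + fy * 1.0   # what [fx fy] applied to (1, 1) would give

print(fx, fy, d11, linear_prediction)
print(partial_y(0.5), partial_y(0.0))
```

Because $f$ is homogeneous of degree one, the difference quotients at the origin do not depend on the step size, so the mismatch between `d11` and `linear_prediction` is not a numerical artifact.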

The following theorem gives conditions on the partial derivatives that guarantee the differentiability of a function. We shall not prove this theorem here:

Theorem 3 Continuous Partials Imply Differentiability

Let $U \subset \mathbb{R}^m$ be open, and let $F : U \to \mathbb{R}^n$. Suppose that the partial derivatives

$$\frac{\partial F}{\partial x_1}, \ldots, \frac{\partial F}{\partial x_m}$$

exist at each point of $U$, and are continuous as functions $U \to \mathbb{R}^n$. Then $F$ is differentiable on $U$.


In general, a function $F : U \to \mathbb{R}^n$ is called continuously differentiable if each of its partial derivatives $\partial F/\partial x_i$ exists and is continuous as a function $U \to \mathbb{R}^n$. Equivalently, $F$ is continuously differentiable if it is differentiable and the derivative $DF$ is continuous as a function $U \to \mathbb{R}^{n \times m}$, where $\mathbb{R}^{n \times m}$ is the space of all $n \times m$ matrices.

3. The Chain Rule

We shall now state and prove the Chain Rule for multivariable functions. Our proof will require the following definition of the norm of a matrix:

Definition: Norm of a Matrix
The norm of an $n \times m$ matrix $A$ is defined by

$$\|A\| = \max_{\|u\| = 1} \|Au\|,$$

where the max is taken over all unit vectors $u \in \mathbb{R}^m$.

Note that the maximum in the above definition always exists since the unit sphere in $\mathbb{R}^m$ is compact. Though we shall not prove it here, the norm $\|A\|$ can also be defined as the square root of the largest eigenvalue of the symmetric matrix $A^T A$.

The most important property of the matrix norm is the following inequality:

Proposition 4 Matrix Norm Inequality

Let $A$ be an $n \times m$ matrix, and let $v \in \mathbb{R}^m$. Then

$$\|Av\| \le \|A\|\,\|v\|.$$

PROOF Let $u$ be a unit vector such that $v = \|v\| u$. Then

$$\|Av\| = \bigl\|A(\|v\| u)\bigr\| = \bigl\|\|v\| (Au)\bigr\| = \|v\|\,\|Au\| \le \|v\|\,\|A\|.$$
∎
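For a matrix with two columns, both descriptions of the norm can be compared directly. The sketch below is an illustration under stated assumptions: it approximates the maximum of $\|Au\|$ by sampling unit vectors on the circle, and compares the result with the square root of the largest eigenvalue of $A^T A$, computed from the closed-form eigenvalue formula for a symmetric $2 \times 2$ matrix. The function names and the sample matrix are hypothetical.

```python
import math

def mat_vec(A, v):
    """Matrix-vector product Av for A given as a list of rows."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def norm(v):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))

def op_norm_sampled(A, samples=20000):
    """Approximate max ||Au|| over unit vectors u in R^2 by sampling the unit circle."""
    best = 0.0
    for k in range(samples):
        t = 2 * math.pi * k / samples
        best = max(best, norm(mat_vec(A, [math.cos(t), math.sin(t)])))
    return best

def op_norm_eig(A):
    """Square root of the largest eigenvalue of A^T A, for a matrix A with two columns."""
    # Entries of the symmetric 2x2 matrix A^T A = [[p, q], [q, r]].
    p = sum(row[0] * row[0] for row in A)
    q = sum(row[0] * row[1] for row in A)
    r = sum(row[1] * row[1] for row in A)
    lam_max = (p + r) / 2 + math.sqrt(((p - r) / 2) ** 2 + q * q)
    return math.sqrt(lam_max)

A = [[1.0, 2.0], [3.0, 4.0]]
v = [0.3, -1.7]
print(op_norm_sampled(A), op_norm_eig(A))
print(norm(mat_vec(A, v)) <= op_norm_eig(A) * norm(v))
```

The two printed norms should agree to several decimal places, and the final line checks one instance of the inequality in Proposition 4.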

We will also need the following inequality, which is similar in spirit to the Mean Value Theorem from single variable calculus. Its proof is left as an exercise to the reader:


Lemma 5 Mean Value Inequality

Let $U \subset \mathbb{R}^m$ be open, let $F : U \to \mathbb{R}^n$, and suppose that $F$ is differentiable at a point $a$. Then for any $M > \|DF_a\|$, there exists a $\delta > 0$ so that

$$\|x - a\| < \delta \;\Rightarrow\; \|F(x) - F(a)\| \le M \|x - a\|.$$

As a trivial consequence of this inequality, we see that a function which is differentiable at $a$ must be continuous at $a$.

Theorem 6 Chain Rule

Let $U \subset \mathbb{R}^m$ and $V \subset \mathbb{R}^n$ be open, let $F : U \to V$ and $G : V \to \mathbb{R}^p$ be continuous, and suppose that $F$ is differentiable at $a$ and $G$ is differentiable at $b = F(a)$. Then $G \circ F$ is differentiable at $a$, and

$$D(G \circ F)_a = DG_b\, DF_a.$$

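PROOF Let $\varepsilon > 0$. Let $L_F : \mathbb{R}^m \to \mathbb{R}^n$ and $L_G : \mathbb{R}^n \to \mathbb{R}^p$ be the functions

$$L_F(x) = F(a) + DF_a(x - a) \quad\text{and}\quad L_G(y) = G(b) + DG_b(y - b).$$

Since $(L_G \circ L_F)(x) = (G \circ F)(a) + DG_b\, DF_a(x - a)$, we must show that

$$\|(G \circ F)(x) - (L_G \circ L_F)(x)\| \le \varepsilon \|x - a\|$$

when $x$ is sufficiently close to $a$. By the triangle inequality,

$$\|(G \circ F)(x) - (L_G \circ L_F)(x)\| \le \|(G \circ F)(x) - (L_G \circ F)(x)\| + \|(L_G \circ F)(x) - (L_G \circ L_F)(x)\|.$$

We shall handle these two terms separately: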

1. Fix a positive constant $M > \|DF_a\|$. Since $G$ is differentiable at $b$, there exists an $r > 0$ so that

$$\|y - b\| < r \;\Rightarrow\; \|G(y) - L_G(y)\| \le \frac{\varepsilon}{2M} \|y - b\|.$$

Since $F$ is continuous at $a$, we can find a $\delta_1 > 0$ so that

$$\|x - a\| < \delta_1 \;\Rightarrow\; \|F(x) - F(a)\| < r.$$

Since $F$ is differentiable at $a$, the Mean Value Inequality gives us a $\delta_2 > 0$ so that

$$\|x - a\| < \delta_2 \;\Rightarrow\; \|F(x) - F(a)\| \le M \|x - a\|.$$

Combining these, we find that

$$\|(G \circ F)(x) - (L_G \circ F)(x)\| \le \frac{\varepsilon}{2M} \|F(x) - F(a)\| \le \frac{\varepsilon}{2} \|x - a\|$$

whenever $\|x - a\| < \min(\delta_1, \delta_2)$.

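2. For the second term, fix a positive constant $N > \|DG_b\|$. Since $F$ is differentiable at $a$, there exists a $\delta_3 > 0$ so that

$$\|x - a\| < \delta_3 \;\Rightarrow\; \|F(x) - L_F(x)\| \le \frac{\varepsilon}{2N} \|x - a\|.$$

If $\|x - a\| < \delta_3$, it follows that

$$\|(L_G \circ F)(x) - (L_G \circ L_F)(x)\| = \bigl\| DG_b\bigl(F(x) - L_F(x)\bigr) \bigr\| \le N \|F(x) - L_F(x)\| \le \frac{\varepsilon}{2} \|x - a\|.$$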

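Combining our two bounds together, we find that

$$\|(G \circ F)(x) - (L_G \circ L_F)(x)\| \le \frac{\varepsilon}{2} \|x - a\| + \frac{\varepsilon}{2} \|x - a\| = \varepsilon \|x - a\|$$

whenever $\|x - a\| < \min(\delta_1, \delta_2, \delta_3)$. ∎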
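The Chain Rule can be sanity-checked on a concrete pair of maps. In the sketch below, the maps `F` and `G`, their hand-computed Jacobians, and the base point are hypothetical choices made for illustration. The product $DG_b\, DF_a$ is compared with a central-difference approximation of the Jacobian of $G \circ F$ at $a$.

```python
import math

def F(x, y):
    # Hypothetical inner map F : R^2 -> R^2.
    return (x * y, x + y)

def DF(x, y):
    # Jacobian of F, computed by hand.
    return [[y, x], [1.0, 1.0]]

def G(u, v):
    # Hypothetical outer map G : R^2 -> R^2.
    return (math.sin(u) + v * v, u * v)

def DG(u, v):
    # Jacobian of G, computed by hand.
    return [[math.cos(u), 2 * v], [v, u]]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def numeric_jacobian_of_composite(a, eps=1e-6):
    """Central-difference Jacobian of G o F at a (a 2x2 matrix)."""
    cols = []
    for i in range(2):
        plus, minus = list(a), list(a)
        plus[i] += eps
        minus[i] -= eps
        gp, gm = G(*F(*plus)), G(*F(*minus))
        cols.append([(p - q) / (2 * eps) for p, q in zip(gp, gm)])
    return [[cols[i][j] for i in range(2)] for j in range(2)]

a = (1.0, 2.0)
b = F(*a)
chain_rule = matmul(DG(*b), DF(*a))          # DG_b DF_a
numeric = numeric_jacobian_of_composite(a)   # approximates D(G o F)_a
print(chain_rule)
print(numeric)
```

Note that the order of the factors matters: $DG_b$ is evaluated at $b = F(a)$ and multiplies $DF_a$ on the left.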

4. Diffeomorphisms

One of the most important types of differentiable maps is the diffeomorphism:

Definition: Diffeomorphism
Let $U$ and $V$ be open subsets of $\mathbb{R}^n$. A diffeomorphism from $U$ to $V$ is a map $F : U \to V$ with the following properties:

1. $F$ is bijective,

2. $F$ is differentiable, and

3. $F^{-1}$ is differentiable.


Figure 2: A diffeomorphism $F : U \to V$ between two open subsets of $\mathbb{R}^2$.

That is, a diffeomorphism is an invertible differentiable map whose inverse is also differentiable. Diffeomorphisms play roughly the same role in multivariable calculus that isomorphisms do in group theory, or homeomorphisms do in topology. (To be precise, diffeomorphisms are the invertible morphisms in the category of open sets and differentiable maps.)

Geometrically, a diffeomorphism corresponds to a smooth “distortion” of an open subset of $\mathbb{R}^n$, as shown in Figure 2. The picture for a diffeomorphism in $\mathbb{R}^3$ would be similar: each coordinate plane in the domain would map to a curved surface in the range.

Theorem 7 Derivative of the Inverse

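Let $\varphi : U \to V$ be a diffeomorphism, let $a \in U$, and let $b = \varphi(a)$. Then $D\varphi_a$ is nonsingular, and

$$(D\varphi_a)^{-1} = D(\varphi^{-1})_b.$$

PROOF By the Chain Rule,

$$D(\varphi^{-1})_b\, D\varphi_a = D(\varphi^{-1} \circ \varphi)_a = D(\mathrm{id})_a = I,$$

where $I$ is the $n \times n$ identity matrix. Since both factors are $n \times n$ matrices, the theorem follows. ∎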

EXAMPLE 4 Diffeomorphisms on $\mathbb{R}$
Let $f : \mathbb{R} \to \mathbb{R}$ be a diffeomorphism. According to the above theorem, the derivative $f'$ for such a function must satisfy $f'(a) \ne 0$ for each $a \in \mathbb{R}$. Moreover, the derivatives of $f$ and $f^{-1}$ are related by the formula

$$(f^{-1})'(b) = \frac{1}{f'(a)}$$

for each $a \in \mathbb{R}$, where $b = f(a)$. Indeed, it follows from the Inverse Function Theorem (see below) that any differentiable bijection $f : \mathbb{R} \to \mathbb{R}$ with nonzero derivative is a diffeomorphism.

If $f$ has a critical point, then it cannot be a diffeomorphism. For example, the map $f : \mathbb{R} \to \mathbb{R}$ defined by $f(x) = x^3$ is a differentiable bijection, but is not a diffeomorphism, since the inverse function $f^{-1}(x) = \sqrt[3]{x}$ is not differentiable at $0$. ∎

We can extend the notion of critical points to functions on $\mathbb{R}^n$:

Definition: Critical Points for a Function $\mathbb{R}^n \to \mathbb{R}^n$
Let $U$ and $V$ be open subsets of $\mathbb{R}^n$, and let $F : U \to V$. A point $a \in U$ is called a critical point for $F$ if either

1. $F$ is not differentiable at $a$, or

2. $F$ is differentiable at $a$, but $DF_a$ is a singular matrix.

Note that, if $F$ is differentiable at $a$, then $a$ is a critical point for $F$ if and only if the determinant of the derivative matrix $DF_a$ is zero. This gives a practical test for finding the critical points of a function.

According to Theorem 7, a diffeomorphism cannot have any critical points. The converse is also true, but it will require the Inverse Function Theorem to prove:

Theorem 8 Characterization of Diffeomorphisms

Let $U$ and $V$ be open subsets of $\mathbb{R}^n$, and let $F : U \to V$ be a bijection. Then $F$ is a diffeomorphism if and only if $F$ has no critical points.

Curvilinear Coordinates

Though diffeomorphisms might seem like an entirely new concept, they are closely related to the familiar notion of curvilinear coordinates.


Figure 3: The polar coordinates diffeomorphism $\varphi : U \to V$.

Definition: Curvilinear Coordinates
Let $V$ be an open subset of $\mathbb{R}^n$. A set $\{u_1, \ldots, u_n\}$ of real-valued functions on $V$ is called a set of curvilinear coordinates if there exists an open set $U \subset \mathbb{R}^n$ and a diffeomorphism $\varphi : U \to V$ such that

$$\varphi\bigl(u_1(x), \ldots, u_n(x)\bigr) = x$$

for all $x \in V$.

Note that the coordinate functions $u_1, \ldots, u_n$ are essentially the components of the inverse diffeomorphism $\varphi^{-1} : V \to U$. That is,

$$\varphi^{-1}(x) = \bigl(u_1(x), \ldots, u_n(x)\bigr)$$

for all $x \in V$. Thus, we could alternatively define curvilinear coordinates as the components of a diffeomorphism with domain $V$. For applications, however, the map $\varphi$ is usually more important, so we will stick with the definition given above.

EXAMPLE 5 Polar Coordinates
Let $V$ be the complement of the negative real axis $(-\infty, 0] \times \{0\}$ in $\mathbb{R}^2$. Let $U = (0, \infty) \times (-\pi, \pi)$, and define a function $\varphi : U \to V$ by

$$\varphi(r, \theta) = (r\cos\theta, r\sin\theta).$$

This function is shown in Figure 3. Clearly $\varphi$ is a bijection, and

$$\det D\varphi_{(r,\theta)} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r \ne 0$$

for all $(r, \theta) \in U$, so $\varphi$ is a diffeomorphism. The resulting curvilinear coordinates $r, \theta : V \to \mathbb{R}$ are the familiar polar coordinates. ∎


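EXAMPLE 6 Spherical Coordinates
Let $V$ be the complement of the half-plane $(-\infty, 0] \times \{0\} \times \mathbb{R}$ in $\mathbb{R}^3$. Let $U$ be the region $(0, \infty) \times (-\pi, \pi) \times (-\pi/2, \pi/2)$, and define a function $\varphi : U \to V$ by

$$\varphi(\rho, \theta, \phi) = (\rho\cos\theta\cos\phi,\ \rho\sin\theta\cos\phi,\ \rho\sin\phi).$$

Then $\varphi$ is a bijection, and

$$\det D\varphi_{(\rho,\theta,\phi)} = \begin{vmatrix} \cos\theta\cos\phi & -\rho\sin\theta\cos\phi & -\rho\cos\theta\sin\phi \\ \sin\theta\cos\phi & \rho\cos\theta\cos\phi & -\rho\sin\theta\sin\phi \\ \sin\phi & 0 & \rho\cos\phi \end{vmatrix} = \rho^2\cos\phi \ne 0,$$

so $\varphi$ is a diffeomorphism. The resulting curvilinear coordinates $\rho, \theta, \phi : V \to \mathbb{R}$ are called spherical coordinates. ∎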

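In general, if $\varphi : U \to V$ is a diffeomorphism with $(x_1, \ldots, x_n) = \varphi(u_1, \ldots, u_n)$, then the derivative of $\varphi$ is a matrix of partial derivatives:

$$D\varphi = \begin{bmatrix} \dfrac{\partial x_1}{\partial u_1} & \cdots & \dfrac{\partial x_1}{\partial u_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial x_n}{\partial u_1} & \cdots & \dfrac{\partial x_n}{\partial u_n} \end{bmatrix}.$$

Similarly, since $(u_1, \ldots, u_n) = \varphi^{-1}(x_1, \ldots, x_n)$, the derivative of $\varphi^{-1}$ is also a matrix of partial derivatives:

$$D(\varphi^{-1}) = \begin{bmatrix} \dfrac{\partial u_1}{\partial x_1} & \cdots & \dfrac{\partial u_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial u_n}{\partial x_1} & \cdots & \dfrac{\partial u_n}{\partial x_n} \end{bmatrix}.$$

By Theorem 7, these two matrices are inverses of each other at each pair of corresponding points in $U$ and $V$.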
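For polar coordinates this inverse relationship can be verified directly. The sketch below assumes the explicit inverse $\varphi^{-1}(x, y) = \bigl(\sqrt{x^2 + y^2},\ \operatorname{atan2}(y, x)\bigr)$ on $V$, with Jacobian entries computed by hand; the function names and the sample point are hypothetical. It multiplies $D(\varphi^{-1})$ at $(x, y) = \varphi(r, \theta)$ by $D\varphi$ at $(r, \theta)$ and also evaluates $\det D\varphi$.

```python
import math

def D_phi(r, theta):
    """Jacobian matrix of phi(r, theta) = (r cos theta, r sin theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta),  r * math.cos(theta)]]

def D_phi_inv(x, y):
    """Jacobian matrix of phi^{-1}(x, y) = (sqrt(x^2 + y^2), atan2(y, x))."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    # Rows: (dr/dx, dr/dy) and (dtheta/dx, dtheta/dy).
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def det2(M):
    """Determinant of a 2x2 matrix."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

r, theta = 2.0, 0.7
x, y = r * math.cos(theta), r * math.sin(theta)
product = matmul(D_phi_inv(x, y), D_phi(r, theta))
print(product)                   # should be close to the identity matrix
print(det2(D_phi(r, theta)), r)  # the determinant should be close to r
```

The same check works at any point of $U$, since the entries of both matrices are continuous there.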

Change of Variables

In this section we discuss the change of variables formula for integration. We begin by reviewing the geometric meaning of determinants.


Figure 4: The parallelepiped $P(\mathbf{a}, \mathbf{v}_1, \mathbf{v}_2, \mathbf{v}_3)$.

Definition: Parallelepiped
A parallelepiped in $\mathbb{R}^n$ is a set of the form

$$P(\mathbf{a}, \mathbf{v}_1, \ldots, \mathbf{v}_n) = \{\mathbf{a} + t_1\mathbf{v}_1 + \cdots + t_n\mathbf{v}_n \mid t_1, \ldots, t_n \in [0, 1]\},$$

where $\mathbf{a}, \mathbf{v}_1, \ldots, \mathbf{v}_n \in \mathbb{R}^n$ (see Figure 4).

Parallelepipeds are the $n$-dimensional analogues of parallelograms. In the case where the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$ are orthogonal, the parallelepiped $P(\mathbf{a}, \mathbf{v}_1, \ldots, \mathbf{v}_n)$ is simply a rectangular box.

The following geometric formula is basic to any development of the theory of determinants:

Theorem 9 Volume of a Parallelepiped

Let $\mathbf{a}, \mathbf{v}_1, \ldots, \mathbf{v}_n \in \mathbb{R}^n$, and let $A$ be the matrix whose columns are the vectors $\mathbf{v}_1, \ldots, \mathbf{v}_n$. Then the volume of the parallelepiped $P(\mathbf{a}, \mathbf{v}_1, \ldots, \mathbf{v}_n)$ is $|\det A|$.

Here “volume” refers to area in the case where $n = 2$, or length in the case where $n = 1$. As a result of the above theorem, we obtain the following interpretation for the determinant of a linear transformation:

Theorem 10 Geometric Meaning of the Determinant

Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation, and let $P$ be a parallelepiped in $\mathbb{R}^n$. Then $T(P)$ is also a parallelepiped, and the volume of $T(P)$ is $|\det T|$ times the volume of $P$.


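PROOF Suppose that $P = P(\mathbf{a}, \mathbf{v}_1, \ldots, \mathbf{v}_n)$. Then clearly

$$T(P) = P\bigl(T(\mathbf{a}), T(\mathbf{v}_1), \ldots, T(\mathbf{v}_n)\bigr).$$

Let $A$ be the matrix whose columns are $\mathbf{v}_1, \ldots, \mathbf{v}_n$, and let $B$ be the matrix whose columns are $T(\mathbf{v}_1), \ldots, T(\mathbf{v}_n)$. Then $B$ is equal to $TA$, the product of the matrix for $T$ with $A$, so by Theorem 9 the volume of $T(P)$ is

$$|\det B| = |(\det T)(\det A)| = |\det T|\,|\det A|.$$
∎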
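In the plane, the statement can be checked against an independent area computation. The sketch below is an illustration only: it uses the shoelace formula for polygon area (a standard formula, swapped in here as the independent check) on a parallelogram and on its image under a linear map. The particular vectors and matrix are hypothetical.

```python
def det2(M):
    """Determinant of a 2x2 matrix given as a list of rows."""
    return M[0][0] * M[1][1] - M[0][1] * M[1][0]

def apply(T, p):
    """Apply the 2x2 matrix T to the point p."""
    return (T[0][0] * p[0] + T[0][1] * p[1],
            T[1][0] * p[0] + T[1][1] * p[1])

def parallelogram(a, v1, v2):
    """Vertices of P(a, v1, v2), listed in order around the boundary."""
    return [a,
            (a[0] + v1[0], a[1] + v1[1]),
            (a[0] + v1[0] + v2[0], a[1] + v1[1] + v2[1]),
            (a[0] + v2[0], a[1] + v2[1])]

def shoelace_area(vertices):
    """Area of a simple polygon from its ordered vertices (shoelace formula)."""
    total = 0.0
    for i in range(len(vertices)):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % len(vertices)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2

a, v1, v2 = (1.0, 1.0), (2.0, 0.0), (1.0, 3.0)
T = [[1.0, 2.0], [3.0, -1.0]]   # det T = -7, so T reverses orientation

P = parallelogram(a, v1, v2)
TP = [apply(T, p) for p in P]   # image vertices; T maps edges to edges

area_P = shoelace_area(P)
area_TP = shoelace_area(TP)
print(area_P, area_TP, abs(det2(T)) * area_P)
```

The absolute value matters here: the determinant of this `T` is negative, yet the area of the image is still positive and equal to $|\det T|$ times the original area.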

In fact, a much more general statement is true:

Theorem 11 Linear Transformations and Measure

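Let $T : \mathbb{R}^n \to \mathbb{R}^n$ be a linear transformation. Then

$$\mu\bigl(T(S)\bigr) = |\det T|\,\mu(S)$$

for any measurable set $S \subset \mathbb{R}^n$, where $\mu$ denotes Lebesgue measure on $\mathbb{R}^n$. Furthermore, if $\det T \ne 0$, then

$$\int_{\mathbb{R}^n} f\, d\mu = \int_{\mathbb{R}^n} (f \circ T)\, |\det T|\, d\mu$$

for any $L^1$ function $f : \mathbb{R}^n \to \mathbb{R}$.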

Sketch of Proof The first statement follows from the fact that

$$\mu(S) = \inf\Bigl\{ \sum_k \mu(P_k) \;\Bigm|\; P_1, P_2, \ldots \text{ are parallelepipeds in } \mathbb{R}^n \text{ and } S \subset \bigcup_k P_k \Bigr\}$$

for any measurable set $S$. Then the second statement is clearly true for simple functions, and can be extended to any $L^1$ function using the Dominated Convergence Theorem. ∎

We now consider the determinant of a differentiable map at each point:

Definition: Jacobian
Let $U$ be an open subset of $\mathbb{R}^n$, and let $F : U \to \mathbb{R}^n$ be a differentiable map. The Jacobian of $F$ is the function $J_F : U \to \mathbb{R}$ defined by

$$J_F(a) = \det DF_a.$$

Roughly speaking, the absolute value of the Jacobian measures the amount by which $F$ multiplies volumes at a point $a$. That is, if $S$ is a measurable set in a small neighborhood of $a$, then

$$\mu\bigl(F(S)\bigr) \approx |J_F(a)|\,\mu(S).$$

Incidentally, one can prove that the Jacobian is always a measurable function.

We are now ready to state the change of variables formula, though we are not in a position to prove it:

Theorem 12 Change of Variables

Let $\varphi : U \to V$ be a diffeomorphism, and let $f : V \to \mathbb{R}$ be an $L^1$ function. Then

$$\int_V f\, d\mu = \int_U (f \circ \varphi)\, |J_\varphi|\, d\mu,$$

where $\mu$ denotes Lebesgue measure on $\mathbb{R}^n$.

EXAMPLE 7 Integration in Polar Coordinates
Let $\varphi : U \to V$ be the diffeomorphism

$$\varphi(r, \theta) = (r\cos\theta, r\sin\theta)$$

defined in Example 5, where $U = (0, \infty) \times (-\pi, \pi)$ and $V$ is the complement of the negative real axis. Then the Jacobian of $\varphi$ is given by the formula

$$J_\varphi(r, \theta) = \det D\varphi_{(r,\theta)} = \begin{vmatrix} \cos\theta & -r\sin\theta \\ \sin\theta & r\cos\theta \end{vmatrix} = r.$$

Therefore, by the change of variables formula,

$$\int_V f(x, y)\, d\mu(x, y) = \int_U (f \circ \varphi)(r, \theta)\, r\, d\mu(r, \theta)$$

for any $L^1$ function $f : V \to \mathbb{R}$. Since $\mathbb{R}^2 - V$ has measure zero, it follows that

$$\int_{\mathbb{R}^2} f(x, y)\, d\mu(x, y) = \int_U f(r\cos\theta, r\sin\theta)\, r\, d\mu(r, \theta).$$
∎
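The polar formula can be tried on the Gaussian $f(x, y) = e^{-(x^2 + y^2)}$, whose integral over $\mathbb{R}^2$ is known to equal $\pi$. The sketch below is a rough numerical illustration, assuming a midpoint Riemann sum over a truncated region $(0, r_{\max}) \times (-\pi, \pi)$ in place of the Lebesgue integral over $U$; the function name `polar_integral` and the grid parameters are hypothetical.

```python
import math

def polar_integral(f, r_max=8.0, n_r=400, n_theta=64):
    """Midpoint-rule approximation of the integral of f(r cos t, r sin t) * r
    over (0, r_max) x (-pi, pi)."""
    dr = r_max / n_r
    dtheta = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            theta = -math.pi + (j + 0.5) * dtheta
            # The factor r is the Jacobian of the polar coordinates map.
            total += f(r * math.cos(theta), r * math.sin(theta)) * r * dr * dtheta
    return total

def gaussian(x, y):
    return math.exp(-(x * x + y * y))

approx = polar_integral(gaussian)
print(approx, math.pi)
```

Truncating at $r_{\max} = 8$ discards only a negligible tail for this integrand; for functions that decay more slowly, a larger cutoff or a different quadrature would be needed. Omitting the factor `r` in the sum would give a visibly wrong answer, which is a quick way to see the role of the Jacobian.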
