
0 INTRODUCTION

These notes are intended as a supplement to a lecture course. It is assumed that the teacher will cover the material roughly in the order given here and will certainly use the ideas and proofs as given here. There are enough sample exercises to allow for the construction of myriad similar exercises, if needed. Thus, students taking this course do not need to buy an expensive textbook.

But this is not a traditional text. There are no pictures in the text; but the lectures should have lots of pictures on the blackboard as well as illustrative physical objects, and then you can draw your own pictures in the text. There is no color; but there should be lots of (fluorescent) colored chalk in the lectures. There are no displayed numbered theorems, no special messages in boxes and no end-of-chapter summaries; but there is a detailed list of contents to show how the plot unfolds. Nor are there any historical vignettes; the student who wishes for these can probably do no better than to read Calculus Gems by George Simmons, or Huygens and Barrow, Newton and Hooke by Vladimir Arnold.

I take the view that a first course on multi-variable calculus should be as short and as simple as it can be made. The majority of current calculus texts have unfortunately given the impression that calculus is a massive compendium of formulas and procedures. In fact, there is only a tiny handful of ideas in calculus, and these come up again and again in slightly different guises. It cannot be said often enough that differential calculus is a study of local linear approximation and integral calculus is a study of a generalized summation process.

I do not claim that all the exercises in these notes are simple; some harder problems have been included (with no special warning signs) for students who find the other exercises too easy, or too dull.

This is not a compendium. It is a book to be READ through: think of it as a novel whose plot is the unfolding of vector calculus. But it is not intended as fireside reading. Readers need paper and pencil to fill in some steps in the arguments that have been deliberately left to the reader. Of course these details are expounded in the lectures. Unlike a novel, an index is provided; but these notes are regularly revised and occasionally a reference in the index may be one page off. But a good reader always browses forwards and backwards from the point of initial attack.


Please try to enjoy calculus. It provides truly remarkable tools for the solution of problems in geometry, in the physical sciences and in other sciences.

This latest version has benefited greatly from many suggestions by Serge Tabachnikov.

John Duncan, May 1999


CONTENTS

Chapter 1 Paths in 2-space and 3-space via vectors

Paths, vectors and lines in 2-space. The scalar (or dot) product and perpendicularity. The tangent vector to a path at a point. The length of a path. Dynamical interpretations. The unit tangent and unit normal vectors, curvature and the osculating circle.

Paths, vectors, lines and planes in 3-space. The scalar and vector products. The geometry of lines and planes in 3-space. The scalar triple product and the volume of a parallelepiped. The tangent vector to a path at a point. The length of a path. The unit tangent, normal and binormal vectors, curvature and torsion. The Serret-Frenet formulas. Dynamics in 3-space.

Chapter 2 Differential calculus for several variables

Differentiable functions of two variables, partial derivatives, the tangent plane at a point on a surface. Algebraic rules for differentiable functions, chain rules. The gradient vector of f, directional derivatives, the steepest slope at a point on a surface. ∇f as a normal vector to the curve f = c and optical properties of conics.

Differentiable functions of three variables and generalizations of the above ideas. ∇F as a normal vector to the surface F = c. Quadric surfaces, ruled quadric surfaces, cylinders and cones. Change of variable with the Jacobian matrix, homogeneous functions.

Local extrema: higher partial derivatives, local quadratic approximation and the classification of max/min/saddles via rt − s². Constrained optimization via Lagrange multipliers; applications to inequalities.

Chapter 3 Integral calculus for several variables

Double integrals and repeated integrals over rectangles. Generalization to other regions of integration. Moments and centers of mass. Integration in polar coordinates:

dx dy = r dr dθ.


Generalizations to triple integrals. Integration in cylindrical coordinates:

dx dy dz = r dr dθ dz.

Integration in spherical coordinates:

dx dy dz = ρ² sin ϕ dρ dθ dϕ.

Surface area.

Chapter 4 Vector calculus

Vector fields, gradient vector fields, gravitation. The three fundamental differential expressions: ∇f, div F, curl F.

Work and path integrals; the conservative vector fields are the gradient vector fields. Green’s theorem in the plane (without, and with, singularities). Areas via path integrals. Flux and the divergence theorem in 2-space (without, and with, singularities). Surface integrals and the divergence theorem in 3-space (without, and with, singularities). Stokes’ theorem in 3-space.

Appendix

The general formula for change of variables in double integrals. Complex exponentials and their uses. The Laplacian in polars, cylindricals and sphericals. The theorem of Pappus. The equality of mixed partial derivatives.


1 PATHS IN 2-SPACE AND 3-SPACE VIA VECTORS

In previous calculus classes we have studied graphs (or curves, or paths) in three different ways: explicitly as y = f(x) in Cartesian coordinates, or r = ϕ(θ) in polar coordinates; and implicitly as F(x, y) = c. Each method is suited to specific cases, but none is suitable for describing a general complicated path in 2-space, for example, the path of a person running within a complicated maze towards the center. In fact, there is a very simple way to describe such complicated paths. It is enough to provide the Cartesian coordinates of the runner at any time t. Each coordinate is a function of t, and so we may describe the path by (x(t), y(t)); the path is specified by letting t vary from the initial time t = a to the final time t = b. For the case y = f(x) we take

(x(t), y(t)) = (t, f(t));

for the case r = ϕ(θ) we take

(x(θ), y(θ)) = (ϕ(θ) cos(θ), ϕ(θ) sin(θ)).

Any letter will do for the variable name! We prefer to use t when we are thinking of the path as the motion of a particle. Rather simple formulas for x(t) and y(t) can lead to quite complicated paths; for example, try to draw the path

(x(t), y(t)) = (2t− 3 sin(t), 2− 3 cos(t))

(without technological help). [It’s a drunken version of the line y = 2.] Since differential calculus is about local linear approximation, we do well to begin with a clear understanding of linear paths.

The straight line y = mx + c can be described by (x(t), y(t)) = (t, mt + c). More generally, consider the path given by

(x(t), y(t)) = (a+ αt, b+ βt).

For t = 0 we get the point (a, b). We get the point with t = 1 by adding on (α, β); we get the point with t = 2 by adding on (α, β) twice. Verify for yourself that the path here is just the line determined by the points (a, b) and (a + α, b + β). To say it another way, we get the path by starting at the origin, jumping to the point (a, b) and then heading off in the direction (α, β). All of this can be conveniently described in the language of vectors.


A vector in 2-space is a shove of the plane, that is, a transformation of the form

(x, y) −→ (x+ p, y + q).

Notice that we know a vector as soon as we know where any one point is shoved to. If the origin is shoved to (3, 2) then p = 3 and q = 2. If the point (−1, 2) is shoved to (4, 1) then p = 5 and q = −1. In general, if (x, y) is shoved to (ξ, η) then p = ξ − x and q = η − y. We call the shove from (x, y) to (ξ, η) the arrow with tail at (x, y) and head at (ξ, η). We may thus think of a vector as made up of an infinite collection of arrows. Any two of these arrows are parallel and have the same length. We say that the vector is represented by any one of these arrows; for example, our vector is represented by the arrow with tail at the origin and head at (p, q). We denote a general vector by a single letter such as u and we write

u =< p, q > .

Notice that there is an obvious association between the vector < p, q > and the point (p, q) in 2-space. When we speak of the position vector < p, q > we have in mind the representation by the arrow with tail at (0, 0) and head at (p, q).

It is very easy to do algebra with vectors. Let u = < p, q > and v = < r, s >. We define the sum of u and v to be the shove obtained by performing the shoves u and v in succession (it does not matter which shove we do first; why?) and we denote it by u + v. Evidently we get the very simple formula

< p, q > + < r, s >=< p+ r, q + s > .

The usual rules for addition of numbers hold equally for addition of vectors; check it for yourself! Given a scalar (real number) k, we define ku to be the vector < kp, kq >. Notice that (−1)u is represented by the arrow with tail at (0, 0) and head at (−p, −q), or by the arrow with tail at (p, q) and head at (0, 0); thus multiplication by −1 reverses the direction. Check for yourself such obvious algebraic rules as

k(u+ v) = ku+ kv, (k + l)u = ku+ lu.

The length (or magnitude) of a vector, denoted by |u|, is just the length of any representing arrow, and so we have the formula |u| = √(p² + q²). We easily check that |ku| = |k||u|. Thus, for any nonzero vector u, |u|⁻¹u is a unit vector in the same direction as u.

Now let’s go back to the line given by (x, y) = (a + αt, b + βt). Write r = < x, y >, r0 = < a, b >, v = < α, β > and we get the vector equation

r = r0 + tv.

Conversely, such a vector equation gives a line as follows: jump onto the point with position vector r0 and then move up and down in the direction of the vector v. Notice the linear shape for the above vector equation for a line! But this is not the traditional way to describe a line; we are more familiar with an equation of the form ax + by = c. To link this in with vectors we need to introduce another idea.

Given two lines in the plane, we can work out the angle between them by using their slopes and the formula for tan(A − B); but there is a much better way to do it by vectors. The angle between two vectors is, of course, the angle between representative arrows that share a common tail. Let u = < p, q > and v = < r, s >. We define the scalar product (or dot product) of u and v by the formula

u · v = pr + qs.

Notice that u · u = |u|². Consider triangle PQR where arrow PQ represents u, arrow PR represents v, and so the angle at P is the angle, say θ, between u and v. The Law of Cosines gives

$$RQ^2 = PQ^2 + PR^2 - 2\,(PQ)(PR)\cos\theta.$$

Since arrow RQ represents u − v, in vector notation this says

|u − v|² = u · u + v · v − 2|u||v| cos(θ).

Using the distributive law for the scalar product (which is easily verified), we get

(u − v) · (u − v) = u · u + v · v − 2u · v

and, comparing the two expressions for |u − v|², we deduce the vital formula

u · v = |u||v| cos(θ).

In particular, nonzero vectors u and v are perpendicular if and only if u · v = 0. As a simple application, note that, with n = < a, b > and r = < x, y >, the equation ax + by = 0 can be written as n · r = 0, and so the direction along the line from (0, 0) to (x, y) is perpendicular to the vector n. We say that n is a normal vector to the line. More generally, for the line a(x − x0) + b(y − y0) = 0 through the point (x0, y0), the vector < a, b > is perpendicular to the direction along the line from (x0, y0) to (x, y). Note in passing that the vector < a, b > is perpendicular to the vector < b, −a >. [We remark that many problems in plane geometry can be solved by vector computations; see the exercises.]

Beyond these nice geometrical uses, vectors are indispensable in physics. Force is NOT a vector; it is an arrow, and hence represents a vector, which we may call a force vector. The direction of the arrow gives the direction in which the force is acting; the length of the arrow corresponds to the magnitude of the force. Physical experiments confirm that two forces acting through a point have the same effect as the single force that corresponds to the sum of the two force vectors (the parallelogram law). Similarly, the instantaneous velocity of a particle is an arrow, and we combine velocities by the same parallelogram law. Equally for acceleration (which is essentially equivalent to force by Newton’s second law). This is a good time to ask the obvious question: if so many concepts from physics are arrows and not vectors, why don’t we do our mathematics with arrows instead of the more abstract notion of vectors? The answer is that we want to solve physics problems using algebraic tools, but there is no sensible definition for the sum of two arrows unless they have a common tail. Even in that case we would get a different arrow algebra for each choice of the position of the common tail. So we are going to use vectors.

Return now to a general path (x(t), y(t)) in 2-space. How do we get the tangent line at the point where t = p? We look for the local linear approximation there. For smooth functions x(t), y(t) we have

x(t) ≈ x(p) + x′(p)(t− p), y(t) ≈ y(p) + y′(p)(t− p).

So the tangent line at t = p is the line through the point (x(p), y(p)) in the direction of the vector < x′(p), y′(p) >; for obvious reasons we call this the tangent vector at t = p. Represent the path by the vector equation r(t) = < x(t), y(t) > and then it is natural to write r′(p) for the tangent vector at t = p. So the tangent line is given vectorially by r(p) + (t − p)r′(p). When r(t) is the position vector of a particle at time t, we see that r′(p) is the instantaneous velocity vector at t = p, usually denoted v(p). The speed v is the magnitude of the velocity, and so, provided v ≠ 0, the unit tangent vector at t = p is given by t = (1/v)r′(p). For example, when

r(t) =< 2t− 3 sin(t), 2− 3 cos(t) >

we get r′(t) = < 2 − 3 cos(t), 3 sin(t) >

and we can easily work out when the tangent line is vertical without having to worry about infinite slope. We just look for tangent vectors which have the form < 0, b > with b ≠ 0. For the above example, we take t = p where cos(p) = 2/3; then sin(p) ≠ 0, so b = 3 sin(p) ≠ 0.
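As a quick numerical check of the vertical-tangent computation, here is a small Python sketch (the function name `velocity` is our own). It takes the root p = arccos(2/3) in (0, π); the other roots are ±arccos(2/3) + 2kπ.

```python
import math

def velocity(t):
    """Tangent vector r'(t) = <2 - 3 cos t, 3 sin t> for r(t) = <2t - 3 sin t, 2 - 3 cos t>."""
    return (2 - 3 * math.cos(t), 3 * math.sin(t))

# A vertical tangent needs a zero first component, i.e. cos(p) = 2/3.
p = math.acos(2 / 3)
vx, vy = velocity(p)
print(p, vx, vy)  # vx is zero up to rounding; vy = 3 sin(p) = sqrt(5) is nonzero
```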

We now have a tool for paper-and-pencil path sketching; if we want only a picture, it is of course easier to use a graphing calculator or computer software. On the other hand, a graphics picture may fail to show the finer properties of a path, or even, as we shall see later, give an entirely false impression of the shape of a path near a singular point. We say that t = p gives a singular point for the path r(t) if r′(p) = 0 (the zero vector). As an example, consider the astroid given by

r(t) = a < cos³(t), sin³(t) > .

Then r′(t) = 3a cos(t) sin(t) < − cos(t), sin(t) >

and so we get a singular point when t = kπ/2 for k = 0, 1, 2, 3. For 0 < t < π/2 the unit tangent vector is given by t = < − cos(t), sin(t) >, and this gives the direction < −1, 0 > as t → 0. For 3π/2 < t < 2π the unit tangent vector is given by t = < cos(t), − sin(t) >, and this gives the direction < 1, 0 > as t → 2π. Evidently the astroid is needle sharp at the point (a, 0); we call this a cusp point. [From physical considerations, we can get this instant direction reversal at a point only if the velocity vector is zero at that point.] We can easily construct “pointed” curves where the angle between the one-sided tangent lines is positive; when this angle is small it is hard to tell from a graphics picture whether or not a cusp is present. Sketch the astroid for yourself. Where is the longest tangent vector?
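The singular points and the cusp can be confirmed numerically. This Python sketch assumes a = 1 purely for illustration; it evaluates the velocity at t = kπ/2 and the one-sided unit tangents just after t = 0 and just before t = 2π, where the step h = 1e-6 is an arbitrary choice.

```python
import math

A = 1.0  # the astroid's parameter a (any positive value works)

def astroid_velocity(t, a=A):
    """r'(t) = 3a cos t sin t <-cos t, sin t> for r(t) = a<cos^3 t, sin^3 t>."""
    k = 3 * a * math.cos(t) * math.sin(t)
    return (-k * math.cos(t), k * math.sin(t))

def unit(u):
    """u / |u| for a nonzero 2-vector u."""
    n = math.hypot(*u)
    return (u[0] / n, u[1] / n)

# Singular points: r'(k*pi/2) is the zero vector (up to rounding) for k = 0, 1, 2, 3.
for k in range(4):
    print(k, astroid_velocity(k * math.pi / 2))

# One-sided unit tangents at the cusp (a, 0).
h = 1e-6
print(unit(astroid_velocity(h)))                # close to <-1, 0>
print(unit(astroid_velocity(2 * math.pi - h)))  # close to <1, 0>
```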

We may think of a path as a bent piece of wire, and so the natural measure associated with it is its length. To get a formula for the length we apply the standard technique of integral calculus: calculate the formula for a very small piece and add up by integration! For two points very close together, the path between them is almost a straight line, and so the length ds satisfies

ds² = dx² + dy²

by Pythagoras’ Theorem. But we have dx = x′(t) dt, dy = y′(t) dt, and so the length L from t = a to t = b is given by

$$L = \int_a^b \sqrt{x'(t)^2 + y'(t)^2}\,dt$$

or, more succinctly,

$$L = \int_a^b |\mathbf{r}'(t)|\,dt = \int_a^b v\,dt.$$

This last formula for path length is, of course, obvious physically. When we measure the length from t = a to a general point on the path we prefer to denote the length by s; of course s is a function of t and we have the fundamental formula

$$v = \frac{ds}{dt}.$$

For most paths we are unable to carry out the exact integration; but we can do so for lots of interesting curves. For example, for the astroid we have v = 3a cos(t) sin(t) for 0 < t < π/2, which has antiderivative (3a/2) sin²(t), and so the astroid has length 3a/2 from t = 0 to t = π/2.
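When no antiderivative is available we fall back on numerical integration. The sketch below uses composite Simpson's rule (our choice of method; the notes do not prescribe one) and checks it against the exact astroid length 3a/2, taking a = 2 for illustration. The helper names `simpson` and `path_length` are hypothetical.

```python
import math

def simpson(f, lo, hi, n=1000):
    """Composite Simpson's rule for the integral of f over [lo, hi]; n must be even."""
    if n % 2:
        raise ValueError("n must be even")
    h = (hi - lo) / n
    total = f(lo) + f(hi)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(lo + i * h)
    return total * h / 3

def path_length(dx, dy, a, b, n=1000):
    """L = integral from a to b of sqrt(x'(t)^2 + y'(t)^2) dt."""
    return simpson(lambda t: math.hypot(dx(t), dy(t)), a, b, n)

# Astroid r(t) = a<cos^3 t, sin^3 t> with a = 2, over the first-quadrant arc.
a = 2.0
dx = lambda t: -3 * a * math.cos(t) ** 2 * math.sin(t)
dy = lambda t: 3 * a * math.sin(t) ** 2 * math.cos(t)
print(path_length(dx, dy, 0, math.pi / 2))  # close to 3a/2 = 3.0
```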

For a path r(t) the first derivative r′(t) tells us the direction of the path at any point. We shall see that the second derivative enables us to measure how tightly a curve is bending. Let θ be the angle between the tangent vector and the horizontal. We define the curvature κ at a point to be the magnitude of the rate of change of θ with respect to arc length; thus

$$\kappa = \left|\frac{d\theta}{ds}\right|.$$

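This formula is conceptually natural but computationally useless. Evidently the unit tangent vector is given by t = < cos θ, sin θ >, and hence

$$\frac{d\mathbf{t}}{ds} = \left\langle -\sin\theta\,\frac{d\theta}{ds},\ \cos\theta\,\frac{d\theta}{ds} \right\rangle$$

so that

$$\kappa = \left|\frac{d\mathbf{t}}{ds}\right|.$$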

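Recall that

$$\mathbf{t} = \frac{1}{v}\,\mathbf{r}'(t) = \frac{dt}{ds}\,\frac{d\mathbf{r}}{dt} = \mathbf{r}'(s)$$

(where r′(s) denotes dr/ds), and so we have

$$\kappa = |\mathbf{r}''(s)|.$$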

These are deliciously simple formulas, but unfortunately we almost never get a path given to us with r expressed as a function of s. Happily the chain rule is at hand to tell us that d/ds = (1/v) d/dt. Consider the astroid (an easy example!). We calculated earlier that, in the first quadrant,

t =< − cos(t), sin(t) >

and so we get (d/dt)t = < sin(t), cos(t) > and hence

$$\frac{d\mathbf{t}}{ds} = \frac{1}{v}\,\frac{d\mathbf{t}}{dt} = \frac{1}{v}\,\langle \sin t,\ \cos t \rangle.$$

It follows that the curvature, in the first quadrant, is given by the formula κ = 1/(3a cos(t) sin(t)). We shall discuss the significance of this formula presently.
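The astroid computation can be checked numerically by differentiating the unit tangent with a central difference and dividing by v, just as d/ds = (1/v) d/dt prescribes. This Python sketch assumes a = 1 and a step h = 1e-5, both arbitrary, and compares the result with κ = 1/(3a cos(t) sin(t)).

```python
import math

a = 1.0  # astroid parameter a (an arbitrary positive choice)

def unit_tangent(t):
    """Unit tangent <-cos t, sin t> on the first-quadrant arc 0 < t < pi/2."""
    return (-math.cos(t), math.sin(t))

def speed(t):
    """v = 3a cos t sin t on the same arc."""
    return 3 * a * math.cos(t) * math.sin(t)

def curvature_numeric(t, h=1e-5):
    """kappa = |(1/v)(d/dt) t|, approximating (d/dt) t by a central difference."""
    (x1, y1), (x0, y0) = unit_tangent(t + h), unit_tangent(t - h)
    dtx, dty = (x1 - x0) / (2 * h), (y1 - y0) / (2 * h)
    return math.hypot(dtx, dty) / speed(t)

for t in (0.1, math.pi / 4, 1.4):
    print(t, curvature_numeric(t), 1 / (3 * a * math.cos(t) * math.sin(t)))
```

The printed pairs agree closely; the smallest value occurs at t = π/4, where κ = 2/3 for a = 1.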

Notice from our earlier computation that t is perpendicular to (d/ds)t. There is a deliciously easy way to see this by vector calculus. Check for yourself the following product rule:

$$\frac{d}{dt}\bigl(\mathbf{u}(t)\cdot\mathbf{v}(t)\bigr) = \mathbf{u}'(t)\cdot\mathbf{v}(t) + \mathbf{u}(t)\cdot\mathbf{v}'(t).$$

Since t · t = 1, we deduce immediately (differentiating with respect to s) that 2t · (d/ds)t = 0. A vector perpendicular to a tangent vector is called a normal vector. In 2-space, such a vector is unique up to length and possible reversal of direction. Where κ ≠ 0, we remove the ambiguity in direction by defining the unit normal vector n by the equation

$$\frac{d\mathbf{t}}{ds} = \kappa\,\mathbf{n}.$$

Since v > 0 at non-singular points, we note that n is in the same direction as the vector (d/dt)t; in practice, this often gives us a short cut to the computation of n. As soon as we draw t on a curve, then, without any computation, we already know n up to an ambiguity in direction. In fact, it is easy to see that n is the inward normal [draw a picture of a curve and see where t(p + h) − t(p) points to!].

The computation of curvature is usually not so simple as for the astroid. The astroid is easy because we get a very simple formula for t in that case. In general we have t = (1/v)r′(t). Using another product rule (which you should check for yourself), we get

$$\frac{d\mathbf{t}}{dt} = -\frac{v'}{v^2}\,\mathbf{r}'(t) + \frac{1}{v}\,\mathbf{r}''(t)$$

and we can readily calculate κ and n from this. We can illustrate the procedure by finding the curvature for the case y = f(x).

We have r = < x, f(x) >, and so r′(x) = < 1, f′(x) > and v² = 1 + f′(x)². We differentiate the last equation to give vv′ = f′(x)f″(x). Now we get

$$\frac{d\mathbf{t}}{dx} = \left\langle -\frac{v'}{v^2},\ -\frac{v'f'(x)}{v^2} + \frac{f''(x)}{v} \right\rangle = \frac{f''(x)}{v^3}\,\langle -f'(x),\ 1 \rangle$$

after a little computation. It follows that

$$\kappa = \frac{|f''(x)|}{v^3}, \qquad \mathbf{n} = \pm\frac{1}{v}\,\langle -f'(x),\ 1 \rangle,$$

where v = √(1 + f′(x)²) and where the sign is chosen to make n inward pointing (in fact, the sign is that of f″(x)).

We now calculate the curvature for our favorite curves. Evidently a straight line has zero curvature everywhere. What about a circle? Without loss of generality we take the circle to be x² + y² = a² and we take the point (0, a). Implicit differentiation gives x + yy′ = 0 and so 1 + yy″ + (y′)² = 0. We have y′ = 0 at our point, and so y″ = −1/y = −1/a. This gives κ = 1/a. [We have used this method as an illustration of using the above formula for curvature via implicit differentiation. In fact, it is better to use the parameterization r = < a cos(t), a sin(t) > and the method used for the astroid.] A circle of large radius is almost straight and has small curvature; a circle of small radius bends very quickly and has large curvature. The circle example is so fundamental that we use it as a benchmark. For a point P on a path r(t), the osculating circle is the circle through P which has the same tangent line and the same curvature at P as the given path (this is the same as saying that the path and the circle agree up to the second derivative at P). Thus, the osculating circle has radius 1/κ and its center is given by

ρ = r(p) + (1/κ)n

where n is the unit normal vector to the path at t = p.
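The center ρ = r(p) + (1/κ)n can be computed directly for a graph y = f(x). This Python sketch (the name `osculating_center` is hypothetical; it assumes f″(x) ≠ 0 and uses the sign of f″(x) to orient n inward) evaluates it for the parabola y = x², where the center is (0, 1/2) at x = 0 and (−4, 7/2) at x = 1.

```python
import math

def osculating_center(f, fp, fpp, x):
    """Center rho = r + (1/kappa) n and radius 1/kappa of the osculating circle to y = f(x).

    Assumes f''(x) != 0; the sign of f''(x) orients the unit normal n inward.
    """
    d1, d2 = fp(x), fpp(x)
    v = math.sqrt(1 + d1 * d1)
    kappa = abs(d2) / v ** 3
    s = 1.0 if d2 > 0 else -1.0
    n = (-s * d1 / v, s / v)
    center = (x + n[0] / kappa, f(x) + n[1] / kappa)
    return center, 1 / kappa

# Parabola y = x^2 at two sample points.
for x in (0.0, 1.0):
    print(x, osculating_center(lambda u: u * u, lambda u: 2 * u, lambda u: 2.0, x))
```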


There are many interesting exercises in calculating curvature for well-known functions such as cos, sin, exp, log. It is often not easy to guess where the curvature is greatest. Consider the astroid, where we found that κ = 1/(3a cos(t) sin(t)) in the first quadrant. Here κ is least (and the astroid is flattest) when t = π/4. As t approaches 0 or π/2, κ approaches infinity! Near a cusp, the astroid is not getting flatter (which intuition suggests); it is instead getting more and more curved. This fact is not obvious from computer graphics, no matter how often one zooms in on the cusp.

Curvature is significant for road design. We never want an instantaneous change in curvature; in particular, at a “cloverleaf” we never want to change directly from a circular arc to a straight line. The path traced out by the center of the osculating circle as we move along the given path is called the evolute of the path; it has significance in the theory of optics. The path itself is called the involute of the evolute; it gives us the path of the end of a piece of string that unwinds off the evolute curve. This theme is the source of many interesting exercises.

We come finally to our discussion of paths in 3-space. Of course, such a path is described by three functions (x(t), y(t), z(t)). Again we wish to take a vector approach, and so we begin by considering vectors in 3-space.

A vector in 3-space is a shove of 3-space, that is, a transformation of the form

(x, y, z) −→ (x+ a, y + b, z + c).

We now repeat all the ideas from 2-space. We denote a general vector by u = < a, b, c > and we do vector algebra by

< a, b, c > + < d, e, f >=< a+ d, b+ e, c+ f >

k < a, b, c >=< ka, kb, kc >

< a, b, c > · < d, e, f >= ad+ be+ cf.

The length of u is now given by |u| = √(a² + b² + c²) (use Pythagoras twice).

The same rules of algebra apply as for 2-vectors. In particular we have u · v = |u||v| cos(θ) (and the proof is identical to the 2-space proof).

Before we turn to paths in 3-space we need to study the linear geometrical objects in 3-space, namely, lines and planes. A line in 3-space is evidently given by

r = r0 + tv.


In coordinates, this looks like

x = a+ λt, y = b+ µt, z = c+ νt.

Thus (a, b, c) is one point on the line and the vector < λ, µ, ν > gives the direction of the line. There are MANY ways to write a vector equation for a given line (this is true in 2-space, as in 3-space); we may replace r0 by any other point r1 on the line, and we may replace v by any nonzero multiple of v.

A plane through a point r0 may be thought of as the union of all lines through r0 that are perpendicular to a given direction, namely a normal vector n to the plane. So an implicit equation for the plane is given by

n · (r− r0) = 0.

In coordinates this looks like

ax + by + cz = d, where n = < a, b, c > and d = n · r0.

We can also give an explicit equation for the plane. Let u, v be represented by two non-parallel arrows in the plane with their tails at r0. Then any arrow in the plane with its tail at r0 corresponds to a vector of the form su + tv (where s, t vary independently over all real values). So the plane is given explicitly by

r = r0 + su+ tv.

Notice that we have two parameters for a plane as opposed to one parameter for a line. We sometimes say that a plane has two degrees of freedom, while a line has only one degree of freedom.

Given two points r0, r1, we get the line through them by taking v = r1 − r0. Given three points r0, r1, r2 (not all on one line), we get the plane through them in the explicit form by taking v = r1 − r0, u = r2 − r0. How do we get the plane in the traditional implicit form? We need to find a vector n that is perpendicular to both u and v. This technical exercise occurs so frequently that it becomes a new theme for vectors in 3-space.

Let u = < u1, u2, u3 > and v = < v1, v2, v3 >. We seek n = < n1, n2, n3 > so that n · u = 0 and n · v = 0. We have to solve two simultaneous linear equations in three unknowns. In fact, when u and v are not parallel, there are infinitely many solutions, but each is a scalar multiple of the following vector u × v, which we call the vector product of u and v (the order is important: u first and then v):

u× v =< u2v3 − u3v2, u3v1 − u1v3, u1v2 − u2v1 > .
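The component formula for u × v is easy to code and to test. This is a minimal Python sketch (the names `cross`, `dot` and `plane_through` are illustrative); the last function produces the implicit equation n · r = d of the plane through three points, using v = r1 − r0, u = r2 − r0 and n = u × v as in the text.

```python
def cross(u, v):
    """u x v = <u2 v3 - u3 v2, u3 v1 - u1 v3, u1 v2 - u2 v1>."""
    u1, u2, u3 = u
    v1, v2, v3 = v
    return (u2 * v3 - u3 * v2, u3 * v1 - u1 * v3, u1 * v2 - u2 * v1)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def plane_through(r0, r1, r2):
    """Implicit plane n . r = d through three non-collinear points, with n = u x v."""
    v = tuple(b - a for a, b in zip(r0, r1))   # v = r1 - r0
    u = tuple(b - a for a, b in zip(r0, r2))   # u = r2 - r0
    n = cross(u, v)
    return n, dot(n, r0)

u, v = (1, 2, 3), (4, 5, 6)
n = cross(u, v)
print(n, dot(n, u), dot(n, v))   # (-3, 6, -3) 0 0
```

For the points (1, 0, 0), (0, 1, 0), (0, 0, 1) it returns n = < −1, −1, −1 > and d = −1, which is the plane x + y + z = 1.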


The plane which we gave explicitly as

r = r0 + su+ tv

is given implicitly by

(u × v) · (r − r0) = 0

with the formula for u × v as above. The vector product has some very unusual algebraic rules:

u× u = 0, u× v = −v × u.

Happily, the most useful algebraic rule remains unchanged:

u× (v +w) = u× v + u×w

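(and also the mirror image of this rule). It is natural to ask for a geometrical interpretation for the length of u × v. With some algebraic manipulation we get

$$\begin{aligned}
|\mathbf{u}\times\mathbf{v}|^2 &= (u_2v_3 - u_3v_2)^2 + (u_3v_1 - u_1v_3)^2 + (u_1v_2 - u_2v_1)^2 \\
&= (u_1^2 + u_2^2 + u_3^2)(v_1^2 + v_2^2 + v_3^2) - (u_1v_1 + u_2v_2 + u_3v_3)^2 \\
&= |\mathbf{u}|^2|\mathbf{v}|^2 - \bigl(|\mathbf{u}||\mathbf{v}|\cos\theta\bigr)^2 \\
&= |\mathbf{u}|^2|\mathbf{v}|^2\sin^2\theta
\end{aligned}$$

and hence

$$|\mathbf{u}\times\mathbf{v}| = |\mathbf{u}||\mathbf{v}|\sin\theta$$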

(we always take 0 ≤ θ ≤ π for the angle between two vectors). The above expression is just the area of the parallelogram with sides u, v. Equivalently, the triangle with sides u, v has area (1/2)|u × v|. Note in passing that the formula u · v = |u||v| cos(θ) gives an easy way to calculate the angles of a triangle in 3-space given the coordinates of the three vertices.
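The area and angle recipes combine naturally. This Python sketch (helper names hypothetical) returns (1/2)|u × v| and the three angles of a triangle from the coordinates of its vertices; the sample triangle with vertices (1, 0, 0), (0, 1, 0), (0, 0, 1) is equilateral with area √3/2.

```python
import math

def sub(p, q):
    return tuple(a - b for a, b in zip(p, q))

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])

def triangle_area_and_angles(P, Q, R):
    """Area (1/2)|u x v| with u = Q - P, v = R - P, and the angles at P, Q, R."""
    c = cross(sub(Q, P), sub(R, P))
    area = 0.5 * math.sqrt(dot(c, c))

    def angle_at(A, B, C):
        # Angle at vertex A between the arrows A->B and A->C, via u . v = |u||v| cos(theta).
        x, y = sub(B, A), sub(C, A)
        return math.acos(dot(x, y) / math.sqrt(dot(x, x) * dot(y, y)))

    return area, (angle_at(P, Q, R), angle_at(Q, R, P), angle_at(R, P, Q))

area, angles = triangle_area_and_angles((1, 0, 0), (0, 1, 0), (0, 0, 1))
print(area, [math.degrees(t) for t in angles])  # sqrt(3)/2 and three angles of about 60 degrees
```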

There is another very useful piece of vector notation. The most important vectors in 3-space are given by

i =< 1, 0, 0 >, j =< 0, 1, 0 >, k =< 0, 0, 1 > .

Then we can write any vector as

u =< u1, u2, u3 >= u1i+ u2j+ u3k.


With this notation, the vector product can be written as a formal 3 × 3 determinant

$$\mathbf{u}\times\mathbf{v} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \end{vmatrix}.$$

Alternatively, we can calculate u × v by using the distributive law on

(u1i+ u2j+ u3k)× (v1i+ v2j+ v3k).

All we need to know is the multiplication table for i, j,k. We easily verify:

i× i = 0, i× j = k, i× k = −j

j× i = −k, j× j = 0, j× k = i

k× i = j, k× j = −i, k× k = 0.

One delicate point remains. The vector u × v is one of the two possible vectors w which are perpendicular to both u and v and have length |u||v| sin(θ). This fixes w up to a possible reversal of direction; note, for example, that the vector −k is perpendicular to both i and j, and has length 1. The choice of sign is determined by the condition that

$$\begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix} \ge 0.$$

In physical terms, this turns out to be equivalent (NOT obviously so) to the following situation. Imagine the flat head of a right-handed screw in the plane determined by u and v. Rotate the screw from u to v. Then u × v points in the direction in which the screw moves into the wood (or other medium).

Before returning to paths in 3-space, we shall consider the standard geometrical problems for lines and planes. Given two lines r = r1 + su and r = r2 + tv, how do we decide if they meet at a point? We just check to see if the three real simultaneous equations given by r1 + su = r2 + tv have a solution for s and t. In practice, we solve the simplest two of the equations for s, t and we check if the third equation is satisfied. Most pairs of lines are skew, that is, they are not parallel and yet never meet. Suppose the lines do meet, say, at r0. How do we find the plane containing the two lines? We write down either r = r0 + su + tv or (u × v) · (r − r0) = 0. What do we mean by the angle between two planes? Open a book. You see the angle between the two pages when you see each page as a line and the spine between the pages as their point of intersection. Thus the angle between two planes is just the angle between normal vectors to the planes, and we readily calculate that by using the dot product.

Given two non-parallel planes, say, x + y + z = 1 and x − z = 2, how do we find their line of intersection? Put z = t and solve simultaneously to get x = 2 + t and then y = 1 − x − z = −1 − 2t. Thus, the line is given by

< x, y, z > = < 2, −1, 0 > + t < 1, −2, 1 > .

A longer method is to note that the direction of the line is perpendicular to both n1 = < 1, 1, 1 > and n2 = < 1, 0, −1 > and so is given by the vector n1 × n2. As soon as we get one point on the line (for example, solve the two plane equations with z = 0), we then know the line. How do we find the plane that passes through this line of intersection and also passes through the point (1, 2, 3)? There are long methods, and a very quick method. For any choice of λ, the equation

(x+ y + z − 1) + λ(x− z − 2) = 0

defines a plane through the line of intersection. Put x = 1, y = 2, z = 3 in this equation to find the appropriate value of λ (here 5 − 4λ = 0, so λ = 5/4). Done!
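Both the cross-product route to the line of intersection and the quick λ trick reduce to a little integer arithmetic. This Python sketch (our own, using the standard `fractions` module to keep λ exact) works the example above: with z = 0 the point is (2, −1, 0), the direction n1 × n2 = < −1, 2, −1 > is a multiple of < 1, −2, 1 >, and substituting (1, 2, 3) gives 5 − 4λ = 0. It assumes the given point does not lie on the second plane, so that the division is defined.

```python
from fractions import Fraction

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Planes x + y + z = 1 and x - z = 2, written as n . r = d.
n1, d1 = (1, 1, 1), 1
n2, d2 = (1, 0, -1), 2

direction = cross(n1, n2)   # perpendicular to both normals
point = (2, -1, 0)          # z = 0 gives x = 2 from the second plane, then y = 1 - x = -1

# Quick method: (n1 . r - d1) + lam (n2 . r - d2) = 0 must hold at P = (1, 2, 3).
P = (1, 2, 3)
f1, f2 = dot(n1, P) - d1, dot(n2, P) - d2   # 5 and -4
lam = Fraction(-f1, f2)                     # solves f1 + lam * f2 = 0
normal = tuple(a + lam * b for a, b in zip(n1, n2))
rhs = d1 + lam * d2                         # the plane is normal . r = rhs
print(direction, point, lam)                # (-1, 2, -1) (2, -1, 0) 5/4
```

Clearing denominators, the plane found this way is 9x + 4y − z = 14.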

How do we find where the line r = r1 + tv meets the plane n · (r − r0) = 0? Just substitute the line formula in the plane equation to get the appropriate value of t. How do we find the shortest distance from r1 to the plane given by n · (r − r0) = 0? We simply move from r1 to the plane in the direction of the normal vector n to the plane; in other words, we find the value of t for which r1 + tn lies on the plane n · (r − r0) = 0. In the process we can locate the actual point on the plane that is nearest to r1. How do we find the shortest distance from r1 to the line given by r = r0 + tv? We just have to find the point P on the line so that the segment from P to r1 is perpendicular to the line; in other words, we have to solve the equation v · (r0 + tv − r1) = 0. Notice that in all of these problems we end up having to solve for t from a single linear equation! Other problems are just variations on the above themes.
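Each of these problems reduces to one linear equation in t, which the following Python sketch solves in closed form. The function names are hypothetical, it assumes n ≠ 0 and v ≠ 0, and `math.dist` requires Python 3.8 or later.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def nearest_on_plane(r1, n, r0):
    """Point of the plane n . (r - r0) = 0 nearest to r1.

    Substituting r = r1 + t n into the plane equation gives n . (r1 + t n - r0) = 0,
    so t = n . (r0 - r1) / (n . n).
    """
    t = dot(n, [a - b for a, b in zip(r0, r1)]) / dot(n, n)
    return tuple(a + t * b for a, b in zip(r1, n))

def nearest_on_line(r1, r0, v):
    """Point of the line r = r0 + t v nearest to r1, from v . (r0 + t v - r1) = 0."""
    t = dot(v, [a - b for a, b in zip(r1, r0)]) / dot(v, v)
    return tuple(a + t * b for a, b in zip(r0, v))

# Origin to the plane x + y + z = 1 (normal <1, 1, 1> through (1, 0, 0)).
P = nearest_on_plane((0, 0, 0), (1, 1, 1), (1, 0, 0))
print(P, math.dist((0, 0, 0), P))   # about (1/3, 1/3, 1/3) at distance 1/sqrt(3)

# The point (1, 1, 0) to the x-axis r = t <1, 0, 0>.
Q = nearest_on_line((1, 1, 0), (0, 0, 0), (1, 0, 0))
print(Q, math.dist((1, 1, 0), Q))   # (1.0, 0.0, 0.0) at distance 1.0
```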

There is one other nice geometrical idea in this context. The parallelogram in 2-space determined by u, v generalizes in 3-space to the parallelepiped determined by u, v, w. The area of the base determined by v, w is given by |v × w|. The corresponding height of the parallelepiped is given by |u||cos(θ)|, where θ is the angle between u and v × w. So the volume of the parallelepiped is given by |u · (v × w)|; this is just the absolute value of the determinant

$$\begin{vmatrix} u_1 & u_2 & u_3 \\ v_1 & v_2 & v_3 \\ w_1 & w_2 & w_3 \end{vmatrix}.$$

We say that the triple (u, v, w) is positive if the above determinant is positive, in which case we get the following three formulas for the volume of the parallelepiped:

u · (v ×w) = v · (w × u) = w · (u× v).
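The scalar triple product is a one-liner once the vector product is available. This Python sketch (names illustrative) checks that the three cyclic forms agree for a positive triple and that swapping two of the vectors reverses the sign while leaving the volume unchanged.

```python
def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def triple(u, v, w):
    """Scalar triple product u . (v x w); its absolute value is the parallelepiped's volume."""
    return dot(u, cross(v, w))

u, v, w = (1, 0, 0), (1, 1, 0), (1, 1, 1)
print(triple(u, v, w), triple(v, w, u), triple(w, u, v))   # 1 1 1: a positive triple
print(triple(v, u, w))                                      # -1: the order (v, u, w) is not positive
```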

The quantity u · (v × w) is called the scalar triple product of (u, v, w).

Finally we return to paths in 3-space. By reasoning as in the 2-space case we see that the path

r(t) = < x(t), y(t), z(t) >

has tangent vector (velocity vector) given by

r′(t) =< x′(t), y′(t), z′(t) >

and so the tangent line at t = p is given by

r = r(p) + (t− p)r′(p).

For the motion of a particle, the acceleration vector is, of course, given by r″(t).

In physics we usually start with information, not about the position vector, but about the acceleration vector. Consider, for example, the case of motion under gravity alone; thus we have r″(t) = −gk. It follows that the velocity vector must be of the form r′(t) = −gtk + b, where b is a constant vector. As soon as we know the velocity vector at any one time, we can calculate b. It follows next that the position vector must be of the form

$$\mathbf{r}(t) = -\tfrac{1}{2}gt^2\,\mathbf{k} + t\,\mathbf{b} + \mathbf{c}$$

where c is another constant vector. As soon as we know the position vector at any one time, we can then calculate c. For the general case we start off with three functions inside the vector r″(t) and we calculate three antiderivatives to get r′(t), along with a constant vector of integration, b, say. We then repeat the process to find r(t). Unfortunately, in most problems in particle dynamics we do not start with an explicit formula for the acceleration vector; we just know some equation that links the acceleration, velocity and position vectors, and this leads into the whole subject of differential equations.

How do we measure the length of a path in 3-space? The straight line distance formula gives us

$$ds^2 = dx^2 + dy^2 + dz^2$$

and so we get

$$L = \int_a^b \sqrt{x'(t)^2 + y'(t)^2 + z'(t)^2}\, dt$$

for the length from t = a to t = b. We still use s for the path length variable and the speed v satisfies

$$v = |\mathbf{r}'(t)| = \frac{ds}{dt}.$$

As before, there are few examples where we can calculate the path length exactly. One nice example is the helix (now famous because of DNA) given by

r(t) =< p cos(ωt), p sin(ωt), qt > .

This looks like a coiled spring (or the handrail of a spiral staircase). We easily check that v is constant for the helix, namely, $v = \sqrt{p^2\omega^2 + q^2}$; it follows that there is a linear equation which relates the variables s and t.

Consider now an important classical curve from geometry, the twisted cubic given by

$$\mathbf{r}(t) = \langle t, t^2, t^3 \rangle.$$

The tangent vector is given by

$$\mathbf{r}'(t) = \langle 1, 2t, 3t^2 \rangle$$

and so we have $v = \sqrt{1 + 4t^2 + 9t^4}$. This function has no elementary antiderivative, so we cannot calculate the path length exactly. This is what usually happens, but there is some other useful geometric information about paths that we can always calculate.
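When the length integral has no closed form, numerical quadrature is the practical route. The sketch below assumes composite Simpson's rule (our own choice; any standard quadrature would do) and illustrative helix parameters p = 2, ω = 3, q = 0.5. It first checks the method on the helix, where the exact length over 0 ≤ t ≤ 2π is 2π√(p²ω² + q²), and then applies it to the twisted cubic on 0 ≤ t ≤ 1.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule for the integral of f over [a, b]; n is forced even."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

# Helix r(t) = <p cos(wt), p sin(wt), q t>: speed |r'(t)| is constant.
p, w, q = 2.0, 3.0, 0.5
def helix_speed(t):
    return math.sqrt((p*w*math.sin(w*t))**2 + (p*w*math.cos(w*t))**2 + q**2)

L_helix = simpson(helix_speed, 0.0, 2 * math.pi)
exact_helix = 2 * math.pi * math.sqrt(p*p*w*w + q*q)   # L = v * (length of t-interval)

# Twisted cubic r(t) = <t, t^2, t^3>: speed sqrt(1 + 4t^2 + 9t^4), no closed-form length.
def cubic_speed(t):
    return math.sqrt(1 + 4*t*t + 9*t**4)

L_cubic = simpson(cubic_speed, 0.0, 1.0)
print(L_helix, exact_helix)
print(L_cubic)
```

A sensible check on the cubic answer is that it must exceed the chord length √3 from (0, 0, 0) to (1, 1, 1).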


How are we to quantify the bending and twisting of a path in 3-space? The Navy has three bad words for this: pitching, rolling and yawing. Our study of bending in 2-space led to the equation

$$\frac{d\mathbf{t}}{ds} = \kappa\,\mathbf{n}.$$

The meaning of the unit tangent vector t in 3-space is clear (we divide the tangent vector by its length v). Now we define the curvature κ and the unit normal vector n by the above equation (with the convention that we always take κ ≥ 0). In the very special case when our 3-space path is entirely confined to one plane, this definition of curvature is just what we had in the 2-space case. As before, we never expect to get r given explicitly as a function of s; but we still have the chain rule to give (d/ds) = (1/v)(d/dt) and we can do computations just as in the 2-space case.

Consider the easy case of the helix

r(t) =< p cos(ωt), p sin(ωt), qt > .

Then we get the unit tangent vector

t = (1/v) < −pω sin(ωt), pω cos(ωt), q >

where v is constant. Now we get

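$$\frac{d\mathbf{t}}{dt} = \frac{p\omega^2}{v}\langle -\cos(\omega t), -\sin(\omega t), 0\rangle,$$

$$\frac{d\mathbf{t}}{ds} = \frac{p\omega^2}{v^2}\langle -\cos(\omega t), -\sin(\omega t), 0\rangle$$

and this gives us immediately

$$\kappa = \frac{p\omega^2}{p^2\omega^2 + q^2}, \qquad \mathbf{n} = \langle -\cos(\omega t), -\sin(\omega t), 0\rangle.$$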

Notice that the unit normal points inwards and is perpendicular to the vertical central axis of the helix.
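The definition dt/ds = κn can be checked numerically without any of the algebra above. This sketch (illustrative parameters p = 2, ω = 3, q = 0.5, and a finite-difference step of our own choosing) differentiates the unit tangent by central differences, divides by v as the chain rule prescribes, and compares with the closed forms for κ and n.

```python
import math

p, w, q = 2.0, 3.0, 0.5       # illustrative helix parameters, p > 0

def r_prime(t):
    """Exact tangent vector of r(t) = <p cos(wt), p sin(wt), q t>."""
    return (-p*w*math.sin(w*t), p*w*math.cos(w*t), q)

def unit_tangent(t):
    d = r_prime(t)
    v = math.sqrt(sum(x*x for x in d))
    return tuple(x / v for x in d)

def dT_ds(t, h=1e-5):
    """Derivative of the unit tangent with respect to arc length:
    (1/v) times a central-difference estimate of its t-derivative."""
    v = math.sqrt(sum(x*x for x in r_prime(t)))
    plus, minus = unit_tangent(t + h), unit_tangent(t - h)
    return tuple((a - b) / (2*h*v) for a, b in zip(plus, minus))

t0 = 0.7
k_vec = dT_ds(t0)                              # this is kappa * n
kappa = math.sqrt(sum(x*x for x in k_vec))
n = tuple(x / kappa for x in k_vec)

print(kappa, p*w*w / (p*p*w*w + q*q))          # numerical vs formula
print(n, (-math.cos(w*t0), -math.sin(w*t0), 0.0))
```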

For a path like the twisted cubic, we content ourselves with calculating κ and n for specific values of t. We have $\mathbf{t} = (1/v)\langle 1, 2t, 3t^2\rangle$, where $v^2 = 1 + 4t^2 + 9t^4$. This last equation gives us $vv' = 4t + 18t^3$. We also differentiate the other equation to give

$$\frac{d\mathbf{t}}{dt} = \frac{-v'}{v^2}\langle 1, 2t, 3t^2\rangle + \frac{1}{v}\langle 0, 2, 6t\rangle.$$

We readily calculate this for a given value of t. Take the easy case when t = 0. Then v = 1, v′ = 0 and (d/dt)t = < 0, 2, 0 >. This gives (d/ds)t = < 0, 2, 0 > and so when t = 0 we have κ = 2 and n = j. This procedure can always be carried out (and it often helps the calculation to replace $(-v'/v^2)$ by $(-vv'/v^3)$).
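The procedure above is mechanical enough to code directly. This sketch follows the text's formulas term by term (including the −vv′/v³ rewrite) for the twisted cubic at any t; the function name `cubic_kappa_n` is our own.

```python
import math

def cubic_kappa_n(t):
    """kappa and n for r(t) = <t, t^2, t^3>, following the text's recipe:
    v^2 = 1 + 4t^2 + 9t^4 and v v' = 4t + 18t^3, then
    dT/dt = (-v v'/v^3) <1, 2t, 3t^2> + (1/v) <0, 2, 6t>  and  dT/ds = (1/v) dT/dt."""
    v = math.sqrt(1 + 4*t*t + 9*t**4)
    vvp = 4*t + 18*t**3                         # v * v'
    coef = -vvp / v**3                          # -v'/v^2 rewritten as -v v'/v^3
    dT_dt = tuple(coef*a + b/v for a, b in zip((1, 2*t, 3*t*t), (0, 2, 6*t)))
    dT_ds = tuple(x / v for x in dT_dt)
    kappa = math.sqrt(sum(x*x for x in dT_ds))
    n = tuple(x / kappa for x in dT_ds)
    return kappa, n

print(cubic_kappa_n(0.0))    # kappa = 2, n = j, as found above
print(cubic_kappa_n(1.0))
```

At t = 1 the resulting n should be perpendicular to the tangent direction < 1, 2, 3 >, which is a useful check on hand calculations.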

At each point on the path we have two unit vectors at right angles, namely t and n. It is natural to look for a third unit vector perpendicular to these (then we shall have some rotation of the triple i, j, k). We define the unit binormal vector b by

b = t × n.

It can be shown that (d/ds)b is a multiple of n; we write

$$\frac{d\mathbf{b}}{ds} = -\tau\,\mathbf{n}$$

and we call τ the torsion of the path at this point. Since n = b × t, we can use the product rule to deduce that

$$\frac{d\mathbf{n}}{ds} = -\kappa\,\mathbf{t} + \tau\,\mathbf{b}.$$

(The three equations that give the derivatives of t, n, b are called the Serret-Frenet formulas.)
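All three Serret-Frenet formulas can be checked numerically on the helix. For the torsion we use τ = qω/(p²ω² + q²); this is our own hand computation, not stated in the text: working out b = t × n = (1/v) < q sin(ωt), −q cos(ωt), pω > and differentiating gives db/ds = −(qω/v²)n. The sketch, with illustrative parameters, compares central-difference arc-length derivatives of the frame with the right-hand sides of the formulas.

```python
import math

p, w, q = 2.0, 3.0, 0.5            # illustrative helix parameters
V2 = p*p*w*w + q*q                 # v^2, constant along the helix
v = math.sqrt(V2)
kappa = p*w*w / V2                 # curvature, from the text
tau = q*w / V2                     # torsion: our own hand computation (see lead-in)

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1], a[2]*b[0] - a[0]*b[2], a[0]*b[1] - a[1]*b[0])

def frame(t):
    """Unit tangent T, unit normal N and binormal B = T x N of the helix at parameter t."""
    T = (-p*w*math.sin(w*t) / v, p*w*math.cos(w*t) / v, q / v)
    N = (-math.cos(w*t), -math.sin(w*t), 0.0)
    return T, N, cross(T, N)

def d_ds(i, t, h=1e-5):
    """Arc-length derivative of frame vector i (0 = T, 1 = N, 2 = B), using d/ds = (1/v) d/dt."""
    plus, minus = frame(t + h)[i], frame(t - h)[i]
    return tuple((a - b) / (2*h*v) for a, b in zip(plus, minus))

t0 = 1.1
T, N, B = frame(t0)
checks = {
    "dT/ds = kappa N":          (d_ds(0, t0), tuple(kappa*x for x in N)),
    "dN/ds = -kappa T + tau B": (d_ds(1, t0), tuple(-kappa*a + tau*b for a, b in zip(T, B))),
    "dB/ds = -tau N":           (d_ds(2, t0), tuple(-tau*x for x in N)),
}
for name, (numeric, formula) in checks.items():
    print(name, max(abs(a - b) for a, b in zip(numeric, formula)))
```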

Finally, we return briefly to the dynamics of a particle in 3-space. The velocity vector v(t) points in the same direction as the unit tangent vector t. In what direction does the acceleration vector point? Since v(t) = vt, and since (d/dt)t = v(d/ds)t = vκn, we have

$$\mathbf{a}(t) = \frac{d}{dt}(v\mathbf{t}) = \frac{dv}{dt}\mathbf{t} + v\frac{d\mathbf{t}}{dt} = \frac{dv}{dt}\mathbf{t} + \kappa v^2\,\mathbf{n}$$

and so the acceleration vector never has any component in the direction of the binormal; it always lies in the plane determined by t and n. There is a temptation to think that the magnitude of the acceleration should be the rate of change of the speed. This is almost never the case. In fact, the above formula for a(t) gives us that

$$|\mathbf{a}(t)|^2 = \left(\frac{dv}{dt}\right)^2 + \kappa^2 v^4$$

and we see that the magnitude of the acceleration is equal to |dv/dt| only at a point where κ = 0 or v = 0.
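A numerical check of the identity |a|² = (dv/dt)² + κ²v⁴, sketched for the twisted cubic at t = 1 (an illustrative choice). Here κ comes from the definition |dt/ds| by central differences, and dv/dt = (r′ · r′′)/v, which follows from differentiating v² = r′ · r′.

```python
import math

def r1(t):
    """r'(t) for the twisted cubic r(t) = <t, t^2, t^3>."""
    return (1.0, 2*t, 3*t*t)

def r2(t):
    """r''(t), the acceleration vector."""
    return (0.0, 2.0, 6*t)

def norm(x):
    return math.sqrt(sum(c*c for c in x))

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def kappa(t, h=1e-5):
    """Curvature from the definition |dT/ds|, with d/ds = (1/v) d/dt done by central differences."""
    Tp = tuple(c / norm(r1(t + h)) for c in r1(t + h))
    Tm = tuple(c / norm(r1(t - h)) for c in r1(t - h))
    return norm(tuple((a - b) / (2*h) for a, b in zip(Tp, Tm))) / norm(r1(t))

t0 = 1.0
a = r2(t0)
v = norm(r1(t0))
dv_dt = dot(r1(t0), a) / v          # from differentiating v^2 = r' . r'
k = kappa(t0)

print(dot(a, a), dv_dt**2 + k**2 * v**4)   # |a|^2 versus (dv/dt)^2 + kappa^2 v^4
print(norm(a), abs(dv_dt))                  # |a| is not |dv/dt| here, since kappa != 0
```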


PROBLEMS 1

Note: Some of the problems include parameters a, b, . . . ; these may be replaced by specific numbers. In most problems the specific numbers can be changed to provide lots of similar problems. Usually we list only one problem of each type; the students may formulate lots of similar problems by changing the numbers.

1. Parameterize the Folium of Descartes $x^3 + y^3 = 3xy$ by finding where the folium meets the line y = tx. Hence get $x = 3t/(1 + t^3)$, $y = 3t^2/(1 + t^3)$. Which values of the parameter t describe the loop of the folium? Show that x + y is close to −1 when t is close to −1. Find all the points on the folium where the slope is (i) horizontal, (ii) vertical.

2. For the loop (x, y) = (cos 2t − cos t, sin 2t − sin t), 0 ≤ t ≤ 2π, find the tangent vector at t = 0, π/3, π/2, π, 2π. Find all points where the tangent is (i) horizontal, (ii) vertical. There is a crossover point at (−1, 0); find the angle between the two curves at this point.

3. For the cycloid (x, y) = (a(t − sin t), a(1 − cos t)), 0 ≤ t ≤ 4π, find the tangent vector at t = π/2, π, 2π. The cycloid gives the path traced out by a red dot on the rim of a wheel as the wheel rolls at uniform speed along a horizontal plane.

4. In the above problem, move the red dot in to a fixed point on one spoke. Show that the new path is given by

(x, y) = (at− b sin t, a− b cos t)

where b < a. Describe the shape of such a path.

5. As another variation, now consider a train wheel in which the red dot is beyond the rim of the wheel that turns on the rail line. Show that the above formula still holds, but now we have b > a. Determine the values of t for which the x-component of the velocity of the red dot has negative values. The path now has crossover points; find the angle between the two curves at such a point.

6. A circle of radius b rolls (without slipping) inside a circle of radius a (of course a > b). The path followed by a fixed point P on the smaller circle is called a hypocycloid. Show that the path is given by

$$(x, y) = \left((a - b)\cos t + b\cos\left(\frac{a - b}{b}t\right),\ (a - b)\sin t - b\sin\left(\frac{a - b}{b}t\right)\right).$$

Discuss the shape of such paths, and show that the path reduces to the astroid when a = 4b.

7. Show that vectors u, v, w are represented by three sides of a triangle (with appropriate directions) if and only if u + v + w = 0.

8. Let the arrows from A to C and from A to B represent the vectors u and v respectively, and let b = |u|, c = |v|. Let AD bisect the angle at A with D on side BC. Prove that the arrow from A to D represents the vector (1 − t)u + tv where t = b/(b + c). [Hint: consider the rhombus with sides cu and bv.] Hence, or otherwise, show that AD = h cos(A/2) where h is the harmonic mean of b, c, that is, 2/h = 1/b + 1/c.

9. Let O be the circumcenter of triangle ABC (thus, OA = OB = OC in length). Let the arrows OA, OB, OC represent the vectors u, v, w respectively. Let H be the point such that the directed line segment from O to H represents the vector u + v + w. Prove that H is the point where the three altitudes of the triangle meet (that is, H is the orthocenter of the triangle).

10. Find the angles of the triangle whose vertices are:

P = (1, 0, 0), Q = (−1, 1, 1), R = (−1,−1, 1).

Find also the area of this triangle.

11. Find the angle between the planes x + y + z = 3 and x + 2y + 3z = 6.

12. Find an equation for the plane through the point (−1, −1, −1) and the line of intersection of the planes in problem 11.

13. Show that the lines

r =< 1, 1, 1 > + t < 1, 0,−1 >

and

r = < 4, 0, 0 > + τ < 1, −1, 1 >

meet, at P, say. Find the line through P and the point (0, 0, 1).

14. Find an equation for the plane Π that contains the point P = (−1, 1, 1) and the line given by r = < 0, 0, 1 > + t < 2, 1, 1 >. Find the line of intersection of the plane Π and the plane x + y + z = 5.

15. Find the lengths of the altitudes of the triangle given in problem 10.

16. Find the shortest distance from the origin to the plane of the triangle in problem 10.

17. A parallelepiped is formed by joining the origin to each of the points (−1, 1, 1), (1, −1, 1), (1, 1, −1) and then completing parallelograms in the usual way; find its volume.

18. How many different parallelepipeds have four of their vertices at the points (0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)? Do these parallelepipeds all have the same volume?

19. Prove the general differentiation formulas in 3-space:

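$$\frac{d}{dt}[k(t)\mathbf{u}(t)] = k'(t)\mathbf{u}(t) + k(t)\mathbf{u}'(t)$$

$$\frac{d}{dt}[\mathbf{u}(t)\cdot\mathbf{v}(t)] = \mathbf{u}'(t)\cdot\mathbf{v}(t) + \mathbf{u}(t)\cdot\mathbf{v}'(t)$$

$$\frac{d}{dt}[\mathbf{u}(t)\times\mathbf{v}(t)] = \mathbf{u}'(t)\times\mathbf{v}(t) + \mathbf{u}(t)\times\mathbf{v}'(t).$$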

20. Find the position vector r(t), given that

$$\mathbf{a}(t) = 4\sin(2t)\,\mathbf{i} + 2\cos(t)\,\mathbf{j} - g\,\mathbf{k}$$

and v(0) = k and r(0) = i.

21. Find the position vector r(t), given that

$$\mathbf{a}(t) = (4 + 2t)\,\mathbf{i} + (2 - t)\,\mathbf{j} - 3e^{-t}\,\mathbf{k}$$

and v(0) = j and r(0) = 0.

22. Try to calculate the lengths of appropriate pieces (loops, arches, etc.) for the paths in problems 1-6.

23. Show that the path traced out by the center of curvature of the parabola $y = x^2$ has a cusp at the point (0, 1/2).

24. Find the curvature and center of curvature for a general point on the cycloid. Show that the path traced out by the center of curvature is just the same as the cycloid but shifted across and down.

25. Find the curvature and center of curvature for a general point on the astroid. Show that the path traced out by the center of curvature is just the original astroid but rotated through π/4 and dilated suitably.

26. For the standard functions $x^2$, cos(x), sin(x), tan(x), exp(x), log(x), . . ., guess the location of the points with greatest curvature, then work out the actual answers analytically.


27. For a given path r(s) (here, s is the arc length parameter), the path traced out by the center of curvature is called the evolute of the given path. It is given parametrically by

e(s) = r(s) + ρ(s)n(s)

but notice that s is NOT the arc length parameter for the evolute. The original curve is called the involute of the evolute. Suppose a piece of thread is wrapped along the evolute and anchored at one end; at the other end of the path the thread is extended tangentially and stops at a point on the original path. As we unwind the thread off the evolute (always tangentially at the point of contact), the free end of the thread traverses the original path! For this to be true we need to show that e′(s) = c(s)n(s) for some scalar function c(s). Verify that this holds with c(s) = ρ′(s) provided we can show that

r′(s) + ρ(s)n′(s) = 0.

Use a vector calculus argument to prove this fact, starting by differentiating the equation n(s) · n(s) = 1. Finally, if S is the arc length parameter for the evolute, note that we have proved that S′(s) = ρ′(s). Hence S − ρ is constant and this concludes the proof of the claim about unwinding the thread from the evolute to give the involute.

28. Find v, κ, t, n for the path given by

$$\mathbf{r}(t) = \langle e^{-t}\cos(t), e^{-t}\sin(t), e^{-t}\rangle.$$

What is the length of the path from t = 0 to t = π?

29. For the path given by

$$\mathbf{r}(t) = \langle 5(t - \sin(t)), 4(1 - \cos(t)), 3(1 + \cos(t))\rangle \qquad (0 \le t \le 2\pi)$$

prove that v = 10 sin(t/2) and calculate t, n, κ as functions of t. What is the total length of the path?

30. For the path given by

$$\mathbf{r}(t) = \langle 1 - 3t, 1 + 3t, 3t^3\rangle$$

calculate t, n, κ when t = 1.

31. For the path given by

$$\mathbf{r}(t) = \langle \sqrt{2}\log(t), 1/t, t\rangle$$

prove that $v = 1 + 1/t^2$ and calculate t, n, κ as functions of t. What is the total length of the path from t = 1 to t = 2?

32. Prove that (d/ds)b is a multiple of n. [Hint: start by differentiating the equation b(s) · b(s) = 1.] Since n = b × t, deduce that

$$\frac{d\mathbf{n}}{ds} = -\kappa\,\mathbf{t} + \tau\,\mathbf{b}.$$

2 DIFFERENTIAL CALCULUS FOR SEVERAL VARIABLES

Recall that a real function of one variable is given by y = f(x) and its graph is a path in 2-space. We say that f is differentiable at x = a if it has a good local linear approximation near x = a. We may write this as

f(a + h) = f(a) + λh + error.

The first two terms on the right hand side form a linear function which gives the tangent line at (a, f(a)). For all the elementary functions of calculus, the error function, say ϵ(h), satisfies an inequality

$$|\epsilon(h)| \le M h^2$$

for all small h. [The bound M depends on the function f, the point a, and the size of the interval of h values. For numerical considerations, a large M is undesirable; for theoretical considerations, any M will do.] We write this as $\epsilon = O(h^2)$. Of course we denote λ by f′(a) and we write the tangent line as

y = f(a) + f′(a)(x − a).

A real function of two variables is given by z = f(x, y) and its graph is a surface in 3-space. We say that f is differentiable at (x, y) = (a, b) if it has a good local linear approximation near (a, b), which we write as

$$f(a + h, b + k) = f(a, b) + \lambda h + \mu k + O(h^2 + k^2).$$

The first three terms on the right hand side form a linear function which gives the tangent plane at the point (a, b, f(a, b)). Assuming that f is differentiable, how are we to calculate the numbers λ, µ? Put k = 0 and we get

$$f(a + h, b) = f(a, b) + \lambda h + O(h^2).$$

Write ϕ(x) = f(x, b) and we get

$$\phi(a + h) = \phi(a) + \lambda h + O(h^2)$$

and so λ = ϕ′(a). Geometrically, z = f(x, b) gives a path on the surface above the line of points (x, b), and λ is the slope of this path at x = a. We call λ the partial derivative of f with respect to x at (a, b) and we write

$$\lambda = \frac{\partial f}{\partial x}(a, b).$$


Similarly we may put h = 0 to give

$$f(a, b + k) = f(a, b) + \mu k + O(k^2).$$

Write ψ(y) = f(a, y) and we get

$$\psi(b + k) = \psi(b) + \mu k + O(k^2)$$

and so µ = ψ′(b). Geometrically, z = f(a, y) gives a path on the surface above the points (a, y) and µ is the slope of this path at y = b. We call µ the partial derivative of f with respect to y at (a, b) and we write

$$\mu = \frac{\partial f}{\partial y}(a, b).$$

Write x = a + h, y = b + k and we see that the tangent plane at (a, b, f(a, b)) is given by

z = f(a, b) + λ(x− a) + µ(y − b)

where λ, µ are as above. With c = f(a, b) we may also write the tangent plane as

λ(x− a) + µ(y − b)− (z − c) = 0.

It is worthwhile to relate this to our previous ideas for tangent lines to paths. The first path above is given by

r =< t, b, f(t, b) >

and so the tangent vector at t = a is given by < 1, 0, λ >. The second path is given by

r =< a, t, f(a, t) >

and so the tangent vector at t = b is given by < 0, 1, µ >. A vector perpendicular to each of these tangent vectors is given by

< 1, 0, λ > × < 0, 1, µ > = < −λ,−µ, 1 > .

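Since < −λ, −µ, 1 > is a multiple of the normal vector < λ, µ, −1 > of the tangent plane λ(x − a) + µ(y − b) − (z − c) = 0, this confirms, as expected, that the tangent plane at (a, b, f(a, b)) contains the tangent vectors to each of the above paths.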

To calculate partial derivatives, we simply hold one variable constant and differentiate in the usual way with respect to the other. For example, when $z = x^2 + 2xy + 3y^2$ we get

$$\frac{\partial z}{\partial x} = 2x + 2y, \qquad \frac{\partial z}{\partial y} = 2x + 6y.$$

To find the tangent plane at, say, (1, 1, 6), we simply put x = 1, y = 1 in the above partial derivatives to get λ = 4, µ = 8. Thus the tangent plane is given by

4(x− 1) + 8(y − 1)− (z − 6) = 0.
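Partial derivatives can be approximated numerically in exactly the "hold one variable constant" spirit. This sketch uses central differences with step 10⁻⁶ (an arbitrary choice of ours) to recover λ = 4, µ = 8 for the example above, then shows the surface pulling away from the tangent plane quadratically; along the diagonal the gap is exactly 6h².

```python
def f(x, y):
    return x*x + 2*x*y + 3*y*y

def partials(f, a, b, h=1e-6):
    """Central-difference estimates of df/dx and df/dy at (a, b):
    hold one variable fixed and vary the other."""
    fx = (f(a + h, b) - f(a - h, b)) / (2*h)
    fy = (f(a, b + h) - f(a, b - h)) / (2*h)
    return fx, fy

lam, mu = partials(f, 1.0, 1.0)   # expect about 4 and 8
c = f(1.0, 1.0)                   # 6

def tangent_plane(x, y):
    """z on the plane lam(x - 1) + mu(y - 1) - (z - c) = 0."""
    return c + lam*(x - 1.0) + mu*(y - 1.0)

# Along the diagonal the surface leaves the tangent plane like 6h^2.
for h in (0.1, 0.01, 0.001):
    print(h, f(1 + h, 1 + h) - tangent_plane(1 + h, 1 + h))
```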

Which functions are differentiable? Obviously the functions 1, x, and y are differentiable since they are exactly linear! To extend the list beyond these we follow the same strategy as in the differential calculus for one variable. Recall that if f, g are differentiable functions of one variable, so also are f + g, kf, fg, f ◦ g. Thus as soon as we get just a small list of differentiable functions, we can apply the above recipes again and again to form enormous lists of differentiable functions. We use the same procedure for functions of two variables, except that now we do not have to worry in most cases about a formula for the derivative. We calculate tangent planes by calculating partial derivatives and that is a one variable exercise. [Standard strategy: reduce several-variable problems to several one-variable problems!]

Suppose that f, g are differentiable at (a, b). Then we can write

$$f(a + h, b + k) = f(a, b) + \lambda h + \mu k + O(h^2 + k^2)$$

$$g(a + h, b + k) = g(a, b) + \rho h + \sigma k + O(h^2 + k^2).$$

Add together the two equations and we get the right kind of formula for f + g as long as we know that the sum of two $O(h^2 + k^2)$ terms is another one. But this is an easy exercise in inequalities. Hence f + g is differentiable. An even easier computation shows that cf is differentiable for any constant c. To show that fg is differentiable we multiply together the above two equations. We get sixteen terms on the right hand side! Almost all these terms give an $O(h^2 + k^2)$ term (recall that $2hk \le h^2 + k^2$); the remaining terms give just the kind of linear formula that we need. So fg is differentiable.

We now know that any polynomial formula in x, y gives a differentiable function. To get to more general formulas we need to consider the analogue of the chain rule from calculus of one variable. In fact there are three analogues to consider:

(i) z = ϕ(f(x, y)), (ii) z(t) = f(x(t), y(t)), (iii) z = f(p(x, y), q(x, y)).

In the above formulas, f, p, q are differentiable functions of two variables and ϕ, x, y are differentiable functions of one variable.

In the first case, we have, as above,

$$f(a + h, b + k) = f(a, b) + \lambda h + \mu k + O(h^2 + k^2).$$

Write c = f(a, b) and $\epsilon = \lambda h + \mu k + O(h^2 + k^2)$. Since $\phi(c + \epsilon) = \phi(c) + \phi'(c)\epsilon + O(\epsilon^2)$, we may write

$$\phi(f(a + h, b + k)) = \phi(f(a, b)) + \phi'(c)[\lambda h + \mu k] + O(h^2 + k^2)$$

where we have made some routine manipulations with the O terms. It follows that ϕ ◦ f is differentiable at (a, b). It follows that formulas such as $\exp(-x^2 - y^2)$, $\sin(xy)$, $\log(1 + x^2 + y^2)$, $\sqrt{x^4 + x^2y^2 + y^4}$ all give differentiable functions. In other words, any (reasonable) formula that we build out of our standard calculus functions will be differentiable. Of course, when we have fractional formulas we have to worry about when the denominator is zero, and when we have a square root function we have to worry about when the expression under the square root is zero. When we need to calculate the partial derivatives of ϕ ◦ f, we just use the chain rule for one variable functions.

Consider now the second case when z(t) = f(x(t), y(t)). We have

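$$x(c + \epsilon) = x(c) + x'(c)\epsilon + O(\epsilon^2), \qquad y(c + \epsilon) = y(c) + y'(c)\epsilon + O(\epsilon^2).$$

Let a = x(c), b = y(c), $h = x'(c)\epsilon + O(\epsilon^2)$, $k = y'(c)\epsilon + O(\epsilon^2)$. Since

$$f(a + h, b + k) = f(a, b) + \lambda h + \mu k + O(h^2 + k^2)$$

it follows that

$$z(c + \epsilon) = z(c) + [\lambda x'(c) + \mu y'(c)]\epsilon + O(\epsilon^2)$$

where we have again done some routine manipulations with the O terms. In this case we are very interested in the formula for the derivative. We can write the general formula as

$$\frac{dz}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}.$$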


Notice that this last formula is a dot product:

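$$\frac{dz}{dt} = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right\rangle \cdot \left\langle \frac{dx}{dt}, \frac{dy}{dt}\right\rangle.$$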

We recognize the second vector as the tangent vector to the path given by r(t) = < x(t), y(t) >. The first vector is so important that we give it a special name and symbol; we call it the gradient vector of f (spoken as grad-f) and we write

$$\nabla f = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right\rangle.$$
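The chain rule dz/dt = ∇f · r′ can be checked numerically. In this sketch the surface f(x, y) = sin(xy) + x² and the path r(t) = < cos t, t² > are arbitrary illustrative choices; we compare ∇f · r′ against a central-difference derivative of z(t) = f(x(t), y(t)).

```python
import math

def f(x, y):
    return math.sin(x * y) + x * x          # an illustrative surface z = f(x, y)

def grad_f(x, y):
    """Exact gradient <df/dx, df/dy> of f."""
    return (y * math.cos(x * y) + 2 * x, x * math.cos(x * y))

def path(t):
    return (math.cos(t), t * t)              # r(t) = <x(t), y(t)>

def path_prime(t):
    return (-math.sin(t), 2 * t)             # r'(t)

def z(t):
    return f(*path(t))

t0, h = 0.8, 1e-6
chain_rule = sum(g * d for g, d in zip(grad_f(*path(t0)), path_prime(t0)))   # grad f . r'
direct = (z(t0 + h) - z(t0 - h)) / (2 * h)                                   # dz/dt numerically
print(chain_rule, direct)
```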

Consider the special case in which the path is given by

r(t) =< a, b > +t < cos(α), sin(α) > .

The tangent vector is given by u = < cos(α), sin(α) > and we get z′(t) = ∇f · u. What is this telling us geometrically? We are measuring the slope on the surface z = f(x, y) as we head in the direction of the vector u. In geographical terms we are measuring the slope at a point on a mountain when we head in a specified compass direction. We call this the directional derivative of f in the direction u. Note that the directional derivatives in the directions of i, j are just the partial derivatives ∂f/∂x, ∂f/∂y. We have to use a unit vector to ensure that a change of t along u covers the same distance as the same change of t along i. It makes sense to speak of the directional derivative in the direction of the vector < 3, 4 >, but in order to make the correct computation we have to replace < 3, 4 > by the unit vector < 3/5, 4/5 > in the same direction. For example, to find the directional derivative of $f(x, y) = \log(1 + x^2 + y^2)$ at (1, 2) in the direction of i + j we calculate

$$\frac{\partial f}{\partial x} = \frac{2x}{1 + x^2 + y^2}, \qquad \frac{\partial f}{\partial y} = \frac{2y}{1 + x^2 + y^2}.$$

Thus, at (1, 2) we have ∇f = < 1/3, 2/3 >. The unit vector in the direction of < 1, 1 > is given by < 1/√2, 1/√2 >. Hence the required directional derivative is given by (1/3 + 2/3)/√2 = 1/√2.
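The worked example can be reproduced in a few lines. This sketch (the helper name `directional_derivative` is our own) normalizes the direction before dotting with the gradient, and cross-checks the answer with a central difference along the line through (1, 2) in the unit direction.

```python
import math

def f(x, y):
    return math.log(1 + x*x + y*y)

def grad_f(x, y):
    d = 1 + x*x + y*y
    return (2*x / d, 2*y / d)

def directional_derivative(grad, direction):
    """grad . u, where u is the direction vector scaled to unit length."""
    length = math.hypot(*direction)
    return sum(g * c / length for g, c in zip(grad, direction))

g = grad_f(1.0, 2.0)                                 # <1/3, 2/3>
D = directional_derivative(g, (1.0, 1.0))            # should be 1/sqrt(2)

# Cross-check: slope of z along the line <1, 2> + t<1, 1>/sqrt(2).
h = 1e-6
u = (1 / math.sqrt(2), 1 / math.sqrt(2))
numeric = (f(1 + h*u[0], 2 + h*u[1]) - f(1 - h*u[0], 2 - h*u[1])) / (2*h)
print(D, numeric, 1 / math.sqrt(2))
```

Because the direction is normalized inside the helper, passing < 2, 2 > instead of < 1, 1 > gives the same answer, which is the point of insisting on unit vectors.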

From the geographical point of view (and for some important numerical algorithms) it is natural to ask in which direction we should head to find the steepest slope. Since ∇f · u = |∇f| cos(θ), where θ is the angle between ∇f and u, it is clear that the slope is greatest when cos(θ) = 1, that is, when u is chosen in the direction of ∇f. For this choice of direction, the steepest slope is given by |∇f|. See how much geometrical information is contained in the vector ∇f!
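A brute-force illustration of this claim (our own sketch, not an efficient method): sample compass directions u = < cos α, sin α > every 0.1 degree, record the largest slope ∇f · u, and compare with |∇f| and with the direction of ∇f. The gradient < 1/3, 2/3 > is taken from the example above.

```python
import math

grad = (1/3, 2/3)     # grad f at (1, 2) for f = log(1 + x^2 + y^2), from the example above

best_slope, best_angle = -math.inf, None
for i in range(3600):                       # sample compass directions every 0.1 degree
    alpha = 2 * math.pi * i / 3600
    slope = grad[0] * math.cos(alpha) + grad[1] * math.sin(alpha)   # grad . u
    if slope > best_slope:
        best_slope, best_angle = slope, alpha

print(best_slope, math.hypot(*grad))                     # steepest slope vs |grad f|
print(best_angle, math.atan2(grad[1], grad[0]))          # its direction vs direction of grad f
```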

In studying curves in 2-space it is often best to use an implicit form f(x, y) = c; for example, $x^2 - y^2 = 1$ or $x^3 + y^3 - 3xy = 0$. What does the chain rule do for us here? Suppose that the curve is parameterized by < x(t), y(t) >. This means that we have f(x(t), y(t)) = c for all t. Differentiate with respect to t (by the chain rule) and we get ∇f · r′ = 0. This tells us that the vector ∇f is perpendicular to the tangent vector. In 2-space this means that ∇f (provided it is not the zero vector) gives us a normal vector to the curve; yet more geometrical information from ∇f! These implicitly defined curves in 2-space give us a useful tool for representing a surface z = f(x, y). The contour line at height c is the set of all points (x, y) such that f(x, y) = c. When we draw these contour lines for a sample of values of c we get a geographical map for the surface z = f(x, y). This enables us to visualize the shape of important surfaces such as $z = x^2$, $z = (x + y)^2$, $z = x^2 + y^2$, $z = xy$, etc. Conversely, if we know the shape of a complicated surface such as $z = x^3 + y^3 - 3xy$, then we immediately get a qualitative understanding of its contour lines. In particular, when a surface has a local mountain top, then the contour line for that height will have an isolated point!

We pause for a nice geometrical application of the fact that ∇f gives a normal vector to the curve f = c. A ray of light from one focus of an ellipse reflects off the ellipse to pass through the other focus. [We hope you proved this in Calculus I or II.] We begin a proof of this with a useful calculation. Let A = (a, b) be a fixed point, let P = (x, y) be a variable point and let f(x, y) = AP, the distance from A to P. Now check that ∇f is the unit vector in the direction from A to P. An ellipse with foci at A and B may be specified by f = c where f(x, y) = AP + BP. Then ∇f, a normal vector to the ellipse at P, is the sum of the unit vectors in the directions from A to P and from B to P. Recall from the geometry of the rhombus that the sum of two unit vectors bisects the angle between them; so AP and BP make equal angles with the normal. In other words, the ray of light AP is reflected as the ray PB, as required. Provide similar proofs for the optical properties of parabolas and hyperbolas.
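A numerical sketch of the argument, assuming an illustrative ellipse with semi-axes 5 and 4 (so the foci are at (±3, 0) and AP + BP = 10) and a finite-difference gradient of our own. It checks that the gradient of a distance function has length 1, and that the normal makes equal angles with the directions from the two foci to P. `math.dist` requires Python 3.8 or later.

```python
import math

A, B = (-3.0, 0.0), (3.0, 0.0)           # illustrative foci

def f(x, y):
    """f(P) = AP + BP; the ellipse is a level curve f = c."""
    return math.dist((x, y), A) + math.dist((x, y), B)

def grad(fun, x, y, h=1e-6):
    return ((fun(x + h, y) - fun(x - h, y)) / (2*h),
            (fun(x, y + h) - fun(x, y - h)) / (2*h))

def unit(v):
    n = math.hypot(*v)
    return (v[0] / n, v[1] / n)

# A point on the ellipse with semi-axes 5 and 4 (so AP + BP = 10).
t = 1.0
P = (5 * math.cos(t), 4 * math.sin(t))

N = unit(grad(f, *P))                                  # outward normal direction at P
to_A = unit((A[0] - P[0], A[1] - P[1]))               # from P back towards focus A
to_B = unit((B[0] - P[0], B[1] - P[1]))               # from P towards focus B
# Angles between the normal and the directions focus -> P.
angle_A = math.acos(-(N[0]*to_A[0] + N[1]*to_A[1]))
angle_B = math.acos(-(N[0]*to_B[0] + N[1]*to_B[1]))
print(angle_A, angle_B)
```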

We should like to consider analogues of these ideas in 3-space, but first we need a brief discussion about functions of three (or more) variables. The graph of a function w = f(x, y, z) is in 4-space and so it is hard to visualize. It sometimes helps to think of f as giving the density at each point in a solid region (or perhaps the electric charge); but for more than two variables we are usually forced to depend on formulas and algebra. It is easy to generalize the definition of differentiable. We say that f is differentiable at (a, b, c) if f has a good linear approximation near (a, b, c); in other words

$$f(a + h, b + k, c + l) = f(a, b, c) + \lambda h + \mu k + \nu l + O(h^2 + k^2 + l^2).$$

What shall we call the linear approximation? After tangent line and tangent plane we simply use the generic phrase tangent hyper-plane, no matter how many variables are present. So the tangent hyper-plane is given by

w = f(a, b, c) + λ(x − a) + µ(y − b) + ν(z − c).

To calculate λ we simply put k = l = 0. In other words we are differentiating f with respect to just the variable x. Of course we call this the partial derivative of f with respect to x. Similarly for the other cases and so we write

$$\lambda = \frac{\partial f}{\partial x}(a, b, c), \qquad \mu = \frac{\partial f}{\partial y}(a, b, c), \qquad \nu = \frac{\partial f}{\partial z}(a, b, c).$$

To calculate a partial derivative, we just pretend that all the other variables are constant. Obviously the functions 1, x, y, z are differentiable (they are exactly linear!) and we can repeat our earlier discussion to get all “reasonable” formulas to be differentiable.

What happens to the second chain rule discussed above? Consider the function w(t) = f(x(t), y(t), z(t)). We get the expected formula

$$\frac{dw}{dt} = \nabla f \cdot \mathbf{r}'$$

where we now have

$$\nabla f = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right\rangle.$$

The directional derivative of f in the direction of v is given by ∇f · u where u = (1/|v|)v. As above, this is greatest, with value |∇f|, when u is chosen to be in the direction of the vector ∇f. Similar ideas apply for four or more variables.

Suppose now that we have a surface inside 3-space given implicitly by F(x, y, z) = c (we discuss important examples below). Take a point P on the surface and any path on the surface through P given parametrically by r(t) = < x(t), y(t), z(t) >. Thus we have

F(x(t), y(t), z(t)) = c

for all t. Differentiate with respect to t by the chain rule and we get

∇F · r′(t) = 0.

This says that the vector ∇F is perpendicular to the tangent vector of every path on the surface through P. It follows that ∇F (provided it is not the zero vector) gives us a normal vector to the tangent plane at P. So the tangent plane at (a, b, c) on the surface is given by

∇F · < x− a, y − b, z − c >= 0.
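A small numerical sketch of this fact, assuming the sphere x² + y² + z² = 9 and a tilted great circle on it as the path (both illustrative choices of ours). Along the path F stays constant, and ∇F · r′(t) vanishes up to finite-difference error.

```python
import math

def F(x, y, z):
    return x*x + y*y + z*z          # sphere F = c with c = 9 (radius 3), an illustrative choice

def grad_F(x, y, z):
    return (2*x, 2*y, 2*z)

def path(t):
    """A path on the sphere of radius 3: a tilted great circle (illustrative)."""
    return (3*math.cos(t), 3*math.sin(t)*math.cos(0.4), 3*math.sin(t)*math.sin(0.4))

def path_prime(t, h=1e-6):
    """Central-difference estimate of r'(t)."""
    p, m = path(t + h), path(t - h)
    return tuple((a - b) / (2*h) for a, b in zip(p, m))

t0 = 0.9
P = path(t0)
N = grad_F(*P)
print(F(*P))                                           # stays at 9 along the path
print(sum(n*d for n, d in zip(N, path_prime(t0))))     # grad F . r'(t) is (numerically) 0
```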

Let’s check that this agrees with our story for z = f(x, y)! Here we take F(x, y, z) = f(x, y) − z and we calculate that

$$\nabla F = \left\langle \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, -1\right\rangle$$

in agreement with our earlier discussion. The implicit case F(x, y, z) = c includes the explicit case z = f(x, y) as a special case.

Before turning to the third version of the chain rule (!) we pause to consider important examples of surfaces defined implicitly. The most important example is the sphere, $x^2 + y^2 + z^2 = k^2$. If we “squash” each axis, we get the egg-shaped surface (formally, the ellipsoid)

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1.$$

Notice that the gradient vector at a general point on the sphere is given by 2 < x, y, z >; as expected, it points in the direction of the radius vector from the center of the sphere. For the ellipsoid, the gradient vector is almost never in the same direction as the radius vector! In 2-space we change from an ellipse to a hyperbola by inserting a minus sign in the ellipse equation. In 3-space we can insert either one or two minus signs in the ellipsoid equation:

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1, \qquad \frac{x^2}{a^2} - \frac{y^2}{b^2} - \frac{z^2}{c^2} = 1.$$

The first is called a hyperboloid of one sheet, and the second a hyperboloid of two sheets. The first is a very remarkable surface because it is made up of an infinite number of straight lines! (In modern architecture we can construct buildings with this interesting shape by bolting together straight pieces of reinforced concrete, which is much easier than trying to construct a spherical building.)

To see where these straight lines come from, we consider the simple example $x^2 + y^2 - z^2 = 1$. This can be rewritten as

(x + z)(x − z) = (1 + y)(1 − y).

Choose any real number λ. Then the above equation is satisfied provided we satisfy the simultaneous equations

x + z = λ(1 + y), λ(x − z) = 1 − y.

Each of these equations gives a plane and the planes are obviously not parallel. Thus the simultaneous equations give a line of points which lie on our hyperboloid. Every different choice of λ gives a different line. The family of lines fills up the whole hyperboloid. There is a different collection of lines obtained by matching the factors in a different way:

x + z = µ(1 − y), µ(x − z) = 1 + y.

Each λ-line is parallel to or meets each µ-line (work out the intersection point for yourself) and this gives a two-parameter description of the hyperboloid (with parameters λ, µ). A careful analysis of these lines leads to the following construction for the surface. Make a cylindrical cage of vertical wires attached to disks at the top and bottom. Twist one of these disks and the hyperboloid appears!
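The factorization trick can be checked numerically. For nonzero λ (resp. µ) we solve the two plane equations for x + z and x − z in terms of y, which parameterizes each line by y; every such point should satisfy x² + y² − z² = 1. The function names and sample values below are illustrative.

```python
def lam_line_point(lam, y):
    """Point on the lambda-line x + z = lam(1 + y), lam(x - z) = 1 - y (lam != 0),
    parameterized by y: solve for x + z and x - z, then for x and z."""
    s = lam * (1 + y)          # x + z
    d = (1 - y) / lam          # x - z
    return ((s + d) / 2, y, (s - d) / 2)

def mu_line_point(mu, y):
    """Point on the mu-line x + z = mu(1 - y), mu(x - z) = 1 + y (mu != 0)."""
    s = mu * (1 - y)
    d = (1 + y) / mu
    return ((s + d) / 2, y, (s - d) / 2)

def hyperboloid(p):
    x, y, z = p
    return x*x + y*y - z*z      # equals 1 on the surface

for lam in (0.5, -2.0, 3.0):
    for y in (-4.0, 0.0, 1.7):
        print(lam, y, hyperboloid(lam_line_point(lam, y)), hyperboloid(mu_line_point(lam, y)))
```

Since x and z come out as linear functions of y, each family really does consist of straight lines.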

We get generalizations of the parabola, called paraboloids, by equations of the form

$$4cz = \frac{x^2}{a^2} + \frac{y^2}{b^2}, \qquad 4cz = \frac{x^2}{a^2} - \frac{y^2}{b^2}.$$

The second of these paraboloids is made up of straight lines (play the factorization trick again); it may be seen in the Sydney Opera House. The generic name for all of the above surfaces is quadric surfaces. The adjective comes from the fact that they are of degree two. By messy algebra we can show that any second degree polynomial equation in x, y, z gives such a surface (usually rotated in space). But there are two degenerate cases that merit special attention. The equation $x^2 + y^2 = 1$ gives a circle in 2-space. What does it represent in 3-space? There is no constraint on z and so we can move the circle up and down to give a cylinder with circular cross-section. More generally, the equation f(x, y) = c gives a curve in 2-space but a generalized cylinder in 3-space. In the study of conics we get degenerate hyperbolas that consist of a pair of lines, for example, $x^2 - y^2 = 0$. In 3-space, degenerate hyperboloids give cones, for example, $x^2 + y^2 - z^2 = 0$. This circular cone is made up of infinitely many straight lines through the origin. More generally we can construct cones as follows. Take a curve in the plane z = c. Form lines by joining all points on the curve to the origin. If the curve is given by z = c and f(x, y) = a, we readily show that the cone is given by f(cx/z, cy/z) = a.
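A quick check of the cone construction, assuming an illustrative base curve: the ellipse x²/4 + y² = 1 in the plane z = 2. Every point s·Q on the line through the origin and a curve point Q (for s ≠ 0, including negative s, which gives the other nappe) should satisfy f(cx/z, cy/z) = a, while a generic point should not.

```python
import math

c, a = 2.0, 1.0
def f(x, y):
    return x*x / 4 + y*y          # illustrative base curve: the ellipse f(x, y) = a in the plane z = c

def cone_equation(x, y, z):
    """f(cx/z, cy/z) - a; zero on the cone (for z != 0)."""
    return f(c*x/z, c*y/z) - a

for theta in (0.0, 1.0, 2.5):
    base = (2*math.cos(theta), math.sin(theta), c)      # a point of the curve: f = 1, z = c
    for s in (0.3, 1.0, -1.5):                          # points s * base on the line through the origin
        x, y, z = (s*b for b in base)
        print(theta, s, cone_equation(x, y, z))
```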

We return now to the third form of the chain rule. We are given z = f(x, y) where x = ϕ(u, v), y = ψ(u, v). We can think of this in terms of change of coordinates; we change from (x, y) coordinates to (u, v) coordinates. We end up with z = F(u, v). We want to know how the partial derivatives of z with respect to u and v relate to the partial derivatives of z with respect to x and y; more succinctly, what is the formula that connects the gradient vector of z with respect to the (u, v) variables to the gradient vector of z with respect to the (x, y) variables?

If we specify all the functions involved then there is no difficulty in calculating the partial derivatives. For example, if z = cos(xy) and x = u + v, y = uv, then we have z = cos(uv(u + v)). We are interested in formulas that apply for any function f. In practice, the most common changes of variables are linear changes (for example, x = u + v, y = u − v) and polar coordinates (x = r cos(θ), y = r sin(θ)).

To get the general formulas, all we have to do is to apply the second chain rule with t replaced by u and by v. Thus we get

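$$\frac{\partial z}{\partial u} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial u} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial u} \quad\text{and}\quad \frac{\partial z}{\partial v} = \frac{\partial z}{\partial x}\frac{\partial x}{\partial v} + \frac{\partial z}{\partial y}\frac{\partial y}{\partial v}.$$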

We can write these equations in matrix-vector form. Regard ∇z(u, v), the gradient vector of z with respect to the (u, v) coordinates, as a column vector, and similarly for ∇z(x, y).

Then the two equations may be written as

∇z(u, v) = J∇z(x, y)

where J is the Jacobian matrix given by

$$J = \begin{pmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial y}{\partial u} \\[2mm] \dfrac{\partial x}{\partial v} & \dfrac{\partial y}{\partial v} \end{pmatrix}.$$

Often we want to have the equation the other way around, that is, we want to write the (x, y) gradient vector in terms of the (u, v) gradient vector. The “obvious” method is to write u, v as functions of x, y and proceed analogously to the above. Usually it is impossible to calculate these functions (just try it when $x = u^3 + v^3$, $y = u^2 + v^2$!). It is much easier just to solve the above two equations for the two “unknown” partial derivatives, or, in other words, to calculate the inverse matrix $J^{-1}$.
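Here is a numerical sketch of ∇z(u, v) = J∇z(x, y) for the text's example z = cos(xy), x = u + v, y = uv, at an arbitrary point. We compare against finite differences of z = cos(uv(u + v)) written directly in (u, v), then go back the other way with the explicit 2 × 2 inverse of J (which needs det J = u − v ≠ 0).

```python
import math

def fx_fy(x, y):
    """Gradient of z = cos(xy) in (x, y) coordinates."""
    s = -math.sin(x * y)
    return (y * s, x * s)

def J(u, v):
    """The text's Jacobian for x = u + v, y = uv: rows (dx/du, dy/du), (dx/dv, dy/dv)."""
    return ((1.0, v), (1.0, u))

def F(u, v):
    return math.cos(u * v * (u + v))     # z written directly in (u, v) coordinates

u, v = 0.7, -0.4
x, y = u + v, u * v
gx, gy = fx_fy(x, y)
(a, b), (c, d) = J(u, v)
grad_uv = (a*gx + b*gy, c*gx + d*gy)     # grad z(u, v) = J grad z(x, y)

h = 1e-6
numeric = ((F(u + h, v) - F(u - h, v)) / (2*h), (F(u, v + h) - F(u, v - h)) / (2*h))

# Going back: solve J g = grad_uv with the 2x2 inverse (needs det J = u - v != 0).
det = a*d - b*c
back = ((d*grad_uv[0] - b*grad_uv[1]) / det, (-c*grad_uv[0] + a*grad_uv[1]) / det)
print(grad_uv, numeric, back, (gx, gy))
```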

For the simple linear change of variables, x = u+ v, y = u− v we get

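$$\frac{\partial z}{\partial u} = \frac{\partial z}{\partial x} + \frac{\partial z}{\partial y}, \qquad \frac{\partial z}{\partial v} = \frac{\partial z}{\partial x} - \frac{\partial z}{\partial y}$$

and hence

$$\left(\frac{\partial z}{\partial u}\right)^2 + \left(\frac{\partial z}{\partial v}\right)^2 = 2\left(\frac{\partial z}{\partial x}\right)^2 + 2\left(\frac{\partial z}{\partial y}\right)^2.$$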

In this special case, we get a very nice relationship between the lengths ofthe two gradient vectors. Now consider the change to polar coordinates. Weget

∂z

∂r= cos(θ)

∂z

∂x+ sin(θ)

∂z

∂y

∂z

∂θ= −r sin(θ)∂z

∂x+ r cos(θ)

∂z

∂y.

This is not a very pretty formula but we can improve it by dividing thesecond equation by r to get the same “dimensions” in each equation. We get

1

r

∂z

∂θ= − sin(θ)

∂z

∂x+ cos(θ)

∂z

∂y.

38

When we combine this equation with the first one, we recognize that the matrix involved is just the matrix for rotating backwards through an angle θ; this makes it very easy to calculate the (x, y) partial derivatives in terms of the (r, θ) partial derivatives! Also, by squaring and adding we get the nice formula

$$\left(\frac{\partial z}{\partial r}\right)^2 + \left(\frac{1}{r}\frac{\partial z}{\partial \theta}\right)^2 = \left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2.$$
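Both observations can be checked numerically. This Python sketch estimates ∂z/∂r and ∂z/∂θ by central differences of z expressed in polar coordinates, recovers the (x, y) gradient by rotating forwards through θ (the transpose, hence the inverse, of the backwards rotation), and compares the two sums of squares. The test function and the sample point are arbitrary assumptions.

```python
import math


def f(x, y):
    # Arbitrary smooth test function (illustrative assumption).
    return x * x * y + math.exp(y)


def grad_f(x, y):
    return (2 * x * y, x * x + math.exp(y))


def F(r, theta):
    # The same function z written in polar coordinates.
    return f(r * math.cos(theta), r * math.sin(theta))


r0, t0, h = 1.5, 0.8, 1e-6
z_r = (F(r0 + h, t0) - F(r0 - h, t0)) / (2 * h)
z_t_over_r = (F(r0, t0 + h) - F(r0, t0 - h)) / (2 * h) / r0

c, s = math.cos(t0), math.sin(t0)
# Undo the backwards rotation by applying its transpose (a forward rotation).
recovered = (c * z_r - s * z_t_over_r, s * z_r + c * z_t_over_r)
fx, fy = grad_f(r0 * c, r0 * s)

lhs = z_r**2 + z_t_over_r**2
rhs = fx**2 + fy**2
print(recovered, (fx, fy))
print(lhs, rhs)
```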

Later on, we shall see that we are often interested in the formula

$$u\frac{\partial z}{\partial u} + v\frac{\partial z}{\partial v}.$$

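We get a nice formula for this for suitable rational changes of variables. Suppose that x = u³/v and y = v³/u. Then

$$\frac{\partial z}{\partial u} = \frac{3u^2}{v}\frac{\partial z}{\partial x} - \frac{v^3}{u^2}\frac{\partial z}{\partial y}, \qquad \frac{\partial z}{\partial v} = -\frac{u^3}{v^2}\frac{\partial z}{\partial x} + \frac{3v^2}{u}\frac{\partial z}{\partial y}.$$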

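It follows that

$$u\frac{\partial z}{\partial u} = 3x\frac{\partial z}{\partial x} - y\frac{\partial z}{\partial y}, \qquad v\frac{\partial z}{\partial v} = -x\frac{\partial z}{\partial x} + 3y\frac{\partial z}{\partial y}$$

and so

$$u\frac{\partial z}{\partial u} + v\frac{\partial z}{\partial v} = 2x\frac{\partial z}{\partial x} + 2y\frac{\partial z}{\partial y}.$$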
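A quick numerical check of this identity, as a minimal Python sketch: the test function f and the sample point are arbitrary assumptions, ∂z/∂u and ∂z/∂v are estimated by central differences of z(u, v) = f(u³/v, v³/u), and the right-hand side uses the exact partial derivatives of f.

```python
import math


def f(x, y):
    # Arbitrary smooth test function (illustrative assumption).
    return math.log(1 + x * x) * math.cos(y)


def grad_f(x, y):
    return (2 * x / (1 + x * x) * math.cos(y),
            -math.log(1 + x * x) * math.sin(y))


def z(u, v):
    # Compose f with the change of variables x = u^3/v, y = v^3/u.
    return f(u**3 / v, v**3 / u)


u0, v0, h = 1.2, 0.9, 1e-6
z_u = (z(u0 + h, v0) - z(u0 - h, v0)) / (2 * h)
z_v = (z(u0, v0 + h) - z(u0, v0 - h)) / (2 * h)

x0, y0 = u0**3 / v0, v0**3 / u0
f_x, f_y = grad_f(x0, y0)
lhs = u0 * z_u + v0 * z_v
rhs = 2 * x0 * f_x + 2 * y0 * f_y
print(lhs, rhs)
```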

There is another context in which we get nice formulas. We say that z = f(x, y) is homogeneous of degree k if f(tx, ty) = tᵏf(x, y) whenever t > 0. For example, x² + y² is homogeneous of degree 2, and 1/√(x² + y²) is homogeneous of degree −1. In such a case we get

$$x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} = kf.$$

To see this, just differentiate the defining equation f(tx, ty) = tᵏf(x, y) with respect to t (using the chain rule on the left-hand side) and then put t = 1.
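Euler's relation is easy to test numerically. This Python sketch estimates x ∂f/∂x + y ∂f/∂y by central differences and compares it with kf for the two examples in the text, plus one extra example x³/(x² + y²) of degree 1 that we add for illustration. The sample point is arbitrary.

```python
import math


def euler_lhs(f, x, y, h=1e-6):
    """Estimate x*f_x + y*f_y at (x, y) with central differences."""
    f_x = (f(x + h, y) - f(x - h, y)) / (2 * h)
    f_y = (f(x, y + h) - f(x, y - h)) / (2 * h)
    return x * f_x + y * f_y


# Pairs (function, degree of homogeneity k).
examples = [
    (lambda x, y: x * x + y * y, 2),
    (lambda x, y: 1 / math.sqrt(x * x + y * y), -1),
    (lambda x, y: x**3 / (x * x + y * y), 1),  # extra example, not from the text
]

x0, y0 = 0.8, -1.7
for f, k in examples:
    print(k, euler_lhs(f, x0, y0), k * f(x0, y0))
```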

All of the above chain rule ideas can be repeated for functions of three or more variables, with the obvious elaboration of the formulas. In particular, for three variables the Jacobian matrix becomes 3 × 3.

Our next theme is max/min problems. Given z = f(x, y), how do we find the local maxima (mountain tops) and the local minima (lake bottoms)? It is geometrically obvious that these occur where the tangent plane is horizontal, that is, ∇f = 0. So the first step in the procedure is to find the locations (a, b) where ∇f = 0. How then do we determine the nature of each critical point? For a function of one variable, y = ϕ(x), with ϕ′(a) = 0 we get ϕ(a + h) = ϕ(a) + (1/2)ϕ″(a)h² + O(h³), and hence we get a local minimum if ϕ″(a) > 0 and a local maximum if ϕ″(a) < 0. Evidently we need to find the corresponding quadratic approximation for z = f(x, y). We do this by one of our favorite strategies: reduce to a one-variable problem!

Fix a, b, h, k and let ϕ(t) = f(a + th, b + tk). Notice that ϕ(0) = f(a, b) and ϕ(1) = f(a + h, b + k). But we know that

$$\varphi(1) = \varphi(0) + \varphi'(0) + \frac{1}{2}\varphi''(0) + \cdots.$$

We just need to calculate these derivatives. By the second version of the chain rule we get

$$\varphi'(t) = h\frac{\partial f}{\partial x} + k\frac{\partial f}{\partial y}$$

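where the partial derivatives are evaluated at (a + th, b + tk). Put t = 0 and we get ϕ′(0) = λh + µk as in the linear approximation. Now differentiate again (this introduces higher order partial derivatives!) to get

$$\varphi''(t) = h^2\frac{\partial^2 f}{\partial x^2} + kh\frac{\partial^2 f}{\partial x\,\partial y} + hk\frac{\partial^2 f}{\partial y\,\partial x} + k^2\frac{\partial^2 f}{\partial y^2}$$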

with the partial derivatives evaluated at (a + th, b + tk) again. [For smooth functions we always get

$$\frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial^2 f}{\partial y\,\partial x}$$

and we shall prove this in the appendix.] Put t = 0 and (with ∇f = 0) we get

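$$f(a + h, b + k) = f(a, b) + \frac{1}{2}\left(rh^2 + 2shk + tk^2\right) + \cdots$$

where

$$r = \frac{\partial^2 f}{\partial x^2}(a, b), \qquad s = \frac{\partial^2 f}{\partial x\,\partial y}(a, b), \qquad t = \frac{\partial^2 f}{\partial y^2}(a, b)$$

and the dots represent terms of smaller order of magnitude which may be neglected for h, k very small.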

Now we have to decide whether the quadratic expression

$$Q = rh^2 + 2shk + tk^2$$

is always positive, or always negative, or sometimes positive and sometimes negative. We just complete the square! Suppose first that r ≠ 0. Then

$$rh^2 + 2shk + tk^2 = r\left(h + \frac{sk}{r}\right)^2 + \frac{rt - s^2}{r}\,k^2.$$

Since r and 1/r have the same sign, we see that Q > 0 for all (h, k) ≠ (0, 0) provided rt − s² > 0 and r > 0. This gives a local minimum. Similarly, we always have Q < 0 provided rt − s² > 0 and r < 0, and this gives a local maximum. When rt − s² < 0, the two square terms have opposite signs and sometimes we have Q > 0 and sometimes we have Q < 0. This gives a saddle point. When rt − s² = 0, we come to no conclusion. [The expression rt − s² is called the discriminant of Q.] Suppose now that r = 0. If t ≠ 0, we can mimic the above analysis by completing the square from the other end. If we have both r = 0 and t = 0, then we clearly get rt − s² < 0, unless s = 0, in which case we are in the indeterminate situation. We now have an algorithm to determine the local geography of the surface z = f(x, y). In practice the harder part is to solve the equation ∇f = 0!

For an easy example we take f(x, y) = xy exp(x² − y²). A computation gives

$$\nabla f = e^{x^2 - y^2}\,\big\langle\, y + 2x^2y,\; x - 2xy^2 \,\big\rangle$$

and so ∇f = 0 if and only if we have

$$y(1 + 2x^2) = 0, \qquad x(1 - 2y^2) = 0.$$

The first equation forces y = 0 and then the second forces x = 0. So the only critical point is at (0, 0). We may proceed to calculate r, s, t at (0, 0) and find that rt − s² < 0, but there is a much quicker way to finish the problem. It is clear from the formula (and the expansion of the exponential function) that the quadratic approximation at (0, 0) is given by hk, and so there is obviously a saddle at the origin.
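The test can be packaged as a small routine. The following Python sketch (an illustration, not part of the original notes) estimates r, s, t by central second differences with an assumed step h, then applies the discriminant test described above; the tolerance `tol` for deciding when rt − s² counts as zero is an arbitrary choice. Applied to f(x, y) = xy exp(x² − y²) at the origin it reports a saddle.

```python
import math


def second_partials(f, a, b, h=1e-4):
    """Central-difference estimates of r = f_xx, s = f_xy, t = f_yy at (a, b)."""
    r = (f(a + h, b) - 2 * f(a, b) + f(a - h, b)) / h**2
    t = (f(a, b + h) - 2 * f(a, b) + f(a, b - h)) / h**2
    s = (f(a + h, b + h) - f(a + h, b - h)
         - f(a - h, b + h) + f(a - h, b - h)) / (4 * h * h)
    return r, s, t


def classify(r, s, t, tol=1e-6):
    """Second-derivative test at a critical point, using the sign of rt - s^2."""
    disc = r * t - s * s
    if disc > tol:
        # disc > 0 forces r != 0, and the sign of r decides between min and max.
        return "local minimum" if r > 0 else "local maximum"
    if disc < -tol:
        return "saddle"
    return "indeterminate"


def f(x, y):
    return x * y * math.exp(x * x - y * y)


print(classify(*second_partials(f, 0.0, 0.0)))
```

Finite differences only approximate r, s, t, so near-zero discriminants are reported as indeterminate rather than trusted; this mirrors the case rt − s² = 0 in the text.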

For functions of three or more variables the situation is rather more complicated. It is fairly clear that we have to solve ∇F = 0 to find the critical points, but how are we to decide the nature of a critical point? By modifying the argument above we get the quadratic approximation to F(a + h, b + k, c + l) given by

$$F(a, b, c) + \frac{1}{2}\left(\alpha h^2 + \beta k^2 + \gamma l^2 + 2\delta hk + 2\epsilon kl + 2\zeta lh\right)$$

(with ∇F = 0, of course), where α, β, γ are the second order partial derivatives with respect to x, y, z and δ, ϵ, ζ are the mixed second order partial derivatives. The easy way to decide the sign of the quadratic piece is to complete the square twice in a judicious way. For example,

$$\begin{aligned} h^2 + k^2 + l^2 + 2hk - 2kl + 6lh &= (h + k + 3l)^2 - 8l^2 - 8kl \\ &= (h + k + 3l)^2 - 8\left(l + \frac{k}{2}\right)^2 + 2k^2 \end{aligned}$$

and it is clear that this can take both positive and negative values for small h, k, l, and so we get a saddle.
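Completing the square is error-prone by hand, so a numerical spot check is cheap insurance. This Python sketch compares both sides of the identity above at random points and evaluates the quadratic at two points where its sign differs; the random seed and sample count are arbitrary.

```python
import random


def q(h, k, l):
    # The quadratic form from the example.
    return h * h + k * k + l * l + 2 * h * k - 2 * k * l + 6 * l * h


def completed(h, k, l):
    # The same form after completing the square twice.
    return (h + k + 3 * l) ** 2 - 8 * (l + k / 2) ** 2 + 2 * k * k


random.seed(0)
max_err = 0.0
for _ in range(1000):
    h, k, l = (random.uniform(-1, 1) for _ in range(3))
    max_err = max(max_err, abs(q(h, k, l) - completed(h, k, l)))

print(max_err)                    # tiny: rounding error only
print(q(1, 0, 0), q(-3, 0, 1))    # 1 -8
```

Because the quadratic is homogeneous of degree 2, its values at (εh, εk, εl) are ε² times these, so both signs occur arbitrarily close to the origin.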

Our final theme is constrained optimization. We may wish to find the smallest value of x² + y² when (x, y) is constrained to lie on the curve xy = 1. Or we may wish to find the largest and smallest values of xyz when (x, y, z) is constrained by both x + y + z = 7 and x² + y² + z² = 17. In earlier calculus we would solve the first problem by replacing y by 1/x and then minimizing x² + 1/x². [In Junior High we could do it by noting that x² + y² ≥ 2xy = 2, with equality when x = y = ±1.] Suppose then that we want to find the maximum and/or minimum of f(x, y) subject to g(x, y) = 0. Imagine the contour lines f(x, y) = c moving out like ripples from a pebble in a pond. At the first contact with the curve g(x, y) = 0, that contour line will be tangent to the curve and so their normal vectors will be parallel. This means that ∇f = λ∇g for some scalar λ. A similar situation will occur for the last contact of a contour line with the curve. [Given that both a max and a min occur we shall have a first and last contact; in general we may have only one such special contact, or even none at all.] Thus to find where the max and min occur we just solve the equations ∇f = λ∇g and g = 0.

There is a convenient format for this. Define

F(x, y, λ) = f(x, y) − λg(x, y)

and now we just have to solve ∇F = 0. The number λ is called a multiplier and this method is called the Lagrange multiplier method (in honor of the French mathematician to whom it is attributed). For the very simple problem above we have to solve

2x− λy = 0, 2y − λx = 0, xy = 1.


Substitute for y and deduce that x(4 − λ²) = 0. We cannot have x = 0, for then y = 0, in contradiction to xy = 1. Hence λ = 2 or λ = −2. But λ = −2 gives y = −x and so −x² = 1, which is impossible. So we must have λ = 2, y = x, x² = 1. This gives (x, y) = (1, 1) or (x, y) = (−1, −1). In either case we get x² + y² = 2. How do we tell if this is a max or a min? Clearly there is a min since, firstly, all the values of x² + y² are non-negative and, secondly, we cannot have a maximum because x² + y² takes arbitrarily large values on the hyperbola xy = 1.
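A minimal Python check of this example: it confirms that ∇F = 0 at (±1, ±1) with λ = 2, where F(x, y, λ) = x² + y² − λ(xy − 1), and compares with a brute-force scan of x² + 1/x² along the hyperbola (the earlier-calculus method). The scan range and grid spacing are arbitrary choices.

```python
def grad_F(x, y, lam):
    # Partial derivatives of F(x, y, lam) = x^2 + y^2 - lam * (x*y - 1).
    return (2 * x - lam * y, 2 * y - lam * x, -(x * y - 1))


for point in [(1.0, 1.0), (-1.0, -1.0)]:
    print(point, grad_F(*point, 2.0))

# Brute force: on xy = 1 we have y = 1/x, so minimize x^2 + 1/x^2 on a grid.
xs = [k / 1000 for k in range(-5000, 5001) if k != 0]
best_x = min(xs, key=lambda x: x * x + 1 / (x * x))
print(best_x, best_x**2 + 1 / best_x**2)
```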

There is a useful piece of algebra for many of these problems. At the extreme values we certainly have r · ∇f = λ r · ∇g, where r = (x, y) is the position vector. In the above example this leads to the equation 2(x² + y²) = 2λxy = 2λ, that is, x² + y² = λ, and so we are done as soon as we find λ. In most of these problems we know in advance (often by geometry or by physics) that we are looking for, say, a maximum, and the problem is simply to find the value of that maximum. [There are a few sneaky problems where the Lagrange method may deceive us. If we try to optimize x subject to x² − y² = 1 we shall come to the apparent solutions x = −1 and x = 1. A picture will soon convince us that neither a max nor a min exists in this case!]

Everything we have said above applies equally to the problem of optimizing f(x, y, z) subject to g(x, y, z) = 0. We again have to solve ∇f = λ∇g along with g = 0, but now we have four simultaneous equations in four variables. This is usually formidable, but for a symmetric problem like optimizing xyz subject to x² + y² + z² = 1 we readily find that the extremal values occur for x² = y² = z² = 1/3, and this gives a max of 1/(3√3) and a min of −1/(3√3).
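A grid search over the unit sphere gives an independent check of these values. This Python sketch uses the spherical coordinates x = sin φ cos θ, y = sin φ sin θ, z = cos φ (our choice of parametrization) and an arbitrary grid size N.

```python
import math

N = 400
best_max, best_min = -math.inf, math.inf
for i in range(N + 1):
    phi = math.pi * i / N
    for j in range(2 * N):
        theta = math.pi * j / N
        x = math.sin(phi) * math.cos(theta)
        y = math.sin(phi) * math.sin(theta)
        z = math.cos(phi)
        p = x * y * z
        best_max = max(best_max, p)
        best_min = min(best_min, p)

target = 1 / (3 * math.sqrt(3))
print(best_max, best_min, target)
```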

The argument extends to n variables and becomes a powerful tool for proving inequalities. For example, the classical AGM inequality (Arithmetic-Geometric-Mean inequality) states that, for positive variables x₁, …, xₙ, we have

$$(x_1 x_2 \cdots x_n)^{1/n} \le \frac{1}{n}(x_1 + x_2 + \cdots + x_n).$$

To prove the AGM inequality, first use the Lagrange method to maximize the product x₁x₂⋯xₙ over positive variables that sum to 1. We easily show that the maximum occurs when all variables are equal. This gives the AGM inequality when the variables sum to 1, and the general case follows by homogeneity.

What happens if we have two constraints, say that g(x, y, z) = 0 and h(x, y, z) = 0? Each constraint gives a surface and the intersection is a curve, say Γ. As the contour surfaces f(x, y, z) = c move out we get a first contact with Γ, say at P. It is geometrically clear that the contour surface touches the curve at P. Let t be a tangent vector to the curve at P. Then t is perpendicular to the normal to the contour surface, i.e. to ∇f. Since t is tangent to the intersecting curve it is also perpendicular to the normal vector to each surface; so t is perpendicular to both ∇g and ∇h. So each of these three gradient vectors must “lie” in the plane perpendicular to t. It follows (except in nasty singular cases) that we must have

∇f = λ∇g + µ∇h

for some scalars λ, µ. Now we have to solve this equation together with g = 0 and h = 0. Alternatively we may define

$$F(x, y, z, \lambda, \mu) = f(x, y, z) - \lambda g(x, y, z) - \mu h(x, y, z)$$

and then we just have to solve ∇F = 0. Well, it’s five equations in five unknowns, and we have to be very careful to maintain correct logic in solving the simultaneous system.

Here is an interesting example. Three numbers have sum 7 and sum of squares 17; what is the greatest and least value of their product? Let

$$F(x, y, z, \lambda, \mu) = xyz - \lambda(x + y + z - 7) - \mu(x^2 + y^2 + z^2 - 17).$$

We get ∇F = 0 if and only if

$$yz = \lambda + 2\mu x, \quad zx = \lambda + 2\mu y, \quad xy = \lambda + 2\mu z, \quad x + y + z = 7, \quad x^2 + y^2 + z^2 = 17.$$

Using subtraction on the first three equations leads to

$$z(x - y) = -2\mu(x - y), \qquad x(z - y) = -2\mu(z - y).$$

Thus x = y or z = −2µ, together with y = z or x = −2µ. There are four logical possibilities. The case x = y = z leads to a contradiction (it would force 3x = 7 and 3x² = 17). The case x = y = −2µ gives z = 7 − x − y = 7 + 4µ. The sum of squares equation now gives us

$$4\mu^2 + 4\mu^2 + (4\mu + 7)^2 = 17, \qquad\text{that is,}\qquad 3\mu^2 + 7\mu + 4 = 0$$

and hence we get µ = −1 or µ = −4/3. This leads to (x, y, z) = (2, 2, 3) or (x, y, z) = (8/3, 8/3, 5/3). The other two logical cases just lead to permutations of these values, for example, (2, 3, 2) and (3, 2, 2). The corresponding products are 12 and 320/27 = 12 − 4/27, so there is not much difference between the max and the min. We can see this geometrically. The two constraint equations give a plane and a sphere, and the intersecting circle happens to be a small circle. It is an interesting exercise to see what happens when we replace 7 and 17 by other numbers.

There is another nice way to solve this problem using the theory of equations and symmetric polynomials. Let a, b, c be three numbers such that a + b + c = 7 and a² + b² + c² = 17. These are the three roots of the equation (t − a)(t − b)(t − c) = 0. Multiply out to see that we have the equation

$$t^3 - (a + b + c)t^2 + (bc + ca + ab)t - abc = 0.$$

But

$$49 = (a + b + c)^2 = a^2 + b^2 + c^2 + 2(bc + ca + ab) = 17 + 2(bc + ca + ab)$$

and so bc + ca + ab = 16. So our equation is

$$t^3 - 7t^2 + 16t - k = 0$$

where k = abc. We want to know for which values of k this equation has three real roots (counted with multiplicity). Now we just graph the cubic t³ − 7t² + 16t. Elementary calculus shows that there is a local max at t = 2 with value 12, and a local min at t = 8/3 with value 320/27. We get three real roots only for 320/27 ≤ k ≤ 12. This solves the problem! In fact the graph shows that the three roots are always positive, which means that the circle in this case lies entirely in the positive octant.

There is another elementary, but messy, way to do the problem. We can eliminate z from the two constraint equations to get an ellipse in the (x, y)-plane (the projection of the circle), parametrizable in the form (x, y) = (p, q) + cos(θ)v + sin(θ)w for suitable vectors v, w. Then z = 7 − x − y has a similar formula, and so xyz is a degree three polynomial in cos(θ) and sin(θ), but this is a messy one-variable max/min problem.
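The parametrization approach is painless numerically, even if it is messy by hand. Rather than eliminating z, this Python sketch parametrizes the circle directly in 3-space (our choice): the center is (7/3, 7/3, 7/3), the radius R satisfies R² = 17 − 49/3 = 2/3, and e₁, e₂ are orthonormal vectors in the plane x + y + z = 0. Sampling xyz around the circle should reproduce the range 320/27 ≤ xyz ≤ 12 and show that all coordinates stay positive. The sample count n is arbitrary.

```python
import math

c = 7 / 3
R = math.sqrt(17 - 3 * c * c)  # radius of the circle, sqrt(2/3)
e1 = (1 / math.sqrt(2), -1 / math.sqrt(2), 0.0)
e2 = (1 / math.sqrt(6), 1 / math.sqrt(6), -2 / math.sqrt(6))

products, smallest_coord = [], math.inf
n = 100_000
for i in range(n):
    th = 2 * math.pi * i / n
    # Point on the circle: center + R * (cos(th) e1 + sin(th) e2).
    x, y, z = (c + R * (math.cos(th) * a + math.sin(th) * b)
               for a, b in zip(e1, e2))
    products.append(x * y * z)
    smallest_coord = min(smallest_coord, x, y, z)

print(max(products), min(products))  # close to 12 and 320/27
print(smallest_coord > 0)
```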

Consider another natural problem. What is the greatest volume of a rectangular box with surface area 600? We can apply the Lagrange multiplier method to xyz subject to xy + yz + zx = 300 (with all variables positive), but we can also do it just by algebra. When x = y = z = 10 we get a volume of 1000, and we need to show that no greater volume can occur. By naming the sides appropriately we may suppose that x ≥ y ≥ z. Now consider the box with sides √(xy), √(xy), z. This has the same volume, but the new surface area, 2(xy + 2z√(xy)), is smaller (why?). Increase z to z* to move the surface area back up to 600, and now our box with sides √(xy), √(xy), z* has bigger volume than before. So it is enough to search amongst the boxes with x = y ≥ z. [We could now reduce to a one variable calculus problem, but we’ll carry on by algebra.] We can’t have x < 10, else the surface area would be less than 600. Also, if x > 10 then we must have z < 10, else the surface area is greater than 600. So we may suppose that x = y ≥ 10, z ≤ 10. Now we have

$$1000 - xyz = (10 - x)(10 - y)(10 - z) + 100(x + y + z) - 10(xy + yz + zx)$$

and, since the first term on the right is non-negative when x = y ≥ 10 and z ≤ 10, it is now enough to show that

$$100(x + y + z) \ge 10(xy + yz + zx) = 3000,$$

or, x + y + z ≥ 30. This amounts to showing that

$$x^2 + y^2 + z^2 + 2(xy + yz + zx) \ge 900,$$

or, x² + y² + z² ≥ 300. But we have

$$(x - y)^2 + (y - z)^2 + (z - x)^2 \ge 0$$

and hence

$$2x^2 + 2y^2 + 2z^2 \ge 2xy + 2yz + 2zx = 600.$$

This completes the proof.

For this kind of geometrical problem we often solve a second problem for free by a kind of duality principle. A non-cube rectangular box with surface area S = 600 has volume V < 1000. Increase the sides to achieve V = 1000, and then the surface area gets larger. Thus, for boxes with V = 1000, the smallest surface area occurs for the cube with sides 10, 10, 10. In other words, max{V : S = 600} occurs for the values of (x, y, z) that achieve min{S : V = 1000}. Thus we have found the minimum of xy + yz + zx subject to the constraint xyz = 1000. Unfortunately this nice duality principle does not work for arbitrary functions V and S.
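A random search gives a quick numerical sanity check of the box claim. This Python sketch samples x and y, solves xy + yz + zx = 300 for z, keeps only boxes with z > 0, and records the largest volume found; the sampling range, sample count, and seed are arbitrary assumptions.

```python
import random

random.seed(1)
best = (0.0, None)
for _ in range(200_000):
    x, y = random.uniform(0.1, 40), random.uniform(0.1, 40)
    z = (300 - x * y) / (x + y)  # solves xy + yz + zx = 300 for z
    if z <= 0:
        continue
    volume = x * y * z
    if volume > best[0]:
        best = (volume, (x, y, z))

print(best)  # largest sampled volume; it should not exceed 1000
```

A random search cannot prove the inequality, of course; it only makes a gross error in the algebra unlikely.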

Some other natural max/min problems can be solved by pure geometry or by using trigonometric identities.

PROBLEMS 2


1. Calculate the first order partial derivatives for the functions:

(x² + y²) exp(−xy), xy exp(−x² − y²), xy log(1 + x² + y²),

(x² − y²)/(x² + y²), (x + y) sin(xy).

Make up as many other exercises as needed.

2. For the surface z = (x² − y²)/(x² + y²), find the steepest slope at the point (1, 1, 0) and the direction in which it occurs.

3. For the surface z = xy, calculate the directional derivative at the point (1, 1, 1) in an arbitrary direction. What information does this give you about the shape of the surface near this point? Does it give any quantitative information that is not already given by the tangent plane at this point?

4. Find the tangent plane to the surface z = x³ + y³ − 3xy at the point (1, 2, 3).

5. Find the tangent plane to the surface given by

x³ + y³ + z³ = 3xyz

at each of the points (1, 1, 1), (1, −1, 0).

6. Given any smooth function V of x, y and the change of variables x = u³/v, y = v³/u, show that

u∂V/∂u + v∂V/∂v = 2x∂V/∂x + 2y∂V/∂y.

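7. V is a function of x, y, z where x = u + v + w, y = vw + wu + uv, z = uvw. Express ∂V/∂u, ∂V/∂v, ∂V/∂w in terms of ∂V/∂x, ∂V/∂y, ∂V/∂z and show that

$$u\frac{\partial V}{\partial u} + v\frac{\partial V}{\partial v} + w\frac{\partial V}{\partial w} = x\frac{\partial V}{\partial x} + 2y\frac{\partial V}{\partial y} + 3z\frac{\partial V}{\partial z}.$$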

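8. Let V be a smooth function of x, y and let x = ϕ(u, v), y = ψ(u, v) be a smooth change of variables with ϕ, ψ homogeneous functions of degree k. Prove that

$$u\frac{\partial V}{\partial u} + v\frac{\partial V}{\partial v} = kx\frac{\partial V}{\partial x} + ky\frac{\partial V}{\partial y}.$$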

9. Use problem 8 to make up lots of problems like problem 6.

10. Generalize problem 8 to three variables, and then make up lots of problems like problem 7. What happens to the main equation if each change of variable formula has a different degree of homogeneity?

11. Let V = (x² + y²)^α. For which values of α is V harmonic, that is, ∂²V/∂x² + ∂²V/∂y² = 0? Let W = (x² + y² + z²)^α. For which α is W harmonic, that is, ∂²W/∂x² + ∂²W/∂y² + ∂²W/∂z² = 0?

12. Verify the construction given in the text for the hyperboloid of one sheet. [Hint: for each point P on the circle z = 1, x² + y² = 1, calculate the λ-line through P and then calculate the coordinates of the point Q where this λ-line meets the circle z = −1, x² + y² = 1. Check that the point Q is obtained from P by dropping two units vertically from the top circle to the bottom circle and then rotating through one right angle. Note that the length of PQ is the same for all such choices of P. Calculate this length to make an exact model of the hyperboloid.]

13. Find the local extrema for the surface z = x³ + y³ − 3xy and decide the type of each extremum. Repeat for the surface z = x⁴ + y⁴ − 4xy.

14. Find the local extrema for the surface z = x² + y² − xy² and decide the type of each extremum.

15. The surface z = sin(xy) clearly behaves like the saddle z = xy near the origin. Does the surface have any other local extrema?

16. Describe the set of local extrema for the surface z = xy sin(xy).

17. A positive function of one variable which has exactly one local extremum, which is a local minimum, must have a global minimum (which coincides with the local minimum). Is the analogous statement true for functions of two variables?

18. Find the local extrema for the three variable function

x² + y² + z² − 2xyz.

Same problem for the function x³ + y³ + z³ − 3xyz.

19. Find the greatest and least values of 3x subject to x² + xy + y² = 3.

20. Find the nearest points to the origin on the ellipse 2x² + 4xy + 3y² = 9. Same problem for the hyperbola x² + 4xy + y² = 6.

21. Two numbers have sum of squares 25. What is the greatest value of their product? Does the answer change if “numbers” means “integers”?

22. Two numbers have sum of cubes 91. What is the greatest value of their product? Does the answer change if “numbers” means “positive real numbers”?

23. What is the nearest point to the origin on the surface xy + yz + zx = 3? Same problem for the surface xyz = 8.

24. Find the highest and lowest points on the curve of intersection of the surfaces z = x² + y² and x + 2y − z = 1. Same problem for the intersection of the surfaces z = 3x² + 2y² and z = 5 − 2x² − 3y². Can you solve these problems by a one variable parameterization method instead of the Lagrange multiplier method?

25. Three numbers have sum a (> 0) and sum of squares b². Find necessary and sufficient conditions on a, b so that the three numbers must be positive. For these cases, find the greatest and least values of the product of the three numbers.

26. Find the point P inside the triangle ABC such that the sum of the distances PA + PB + PC is least. [Hint: Let P = (x, y) and let f(x, y) be the sum of the distances. Show that ∇f = −(u + v + w), where u, v, w are the unit vectors in the directions from P to A, B, C respectively. Deduce that the minimum occurs when each angle between PA, PB, PC is 2π/3. Show that the minimizing point F is constructed geometrically as follows. Construct equilateral triangles, external to triangle ABC, on sides AB, AC. Then F is the intersection (other than A) of the circumscribing circles for these two equilateral triangles. The point F is called the Fermat point of the triangle.]

27. Let P, Q, R be arbitrary points on sides BC, CA, AB of triangle ABC. Show that the sum of the distances PQ + QR + RP is minimized when PQR is a billiard trajectory and that the minimizing points are the feet of the altitudes of triangle ABC. [Hint: Use Lagrange multipliers.]

28. Use the method in the text to prove the optical properties of parabolas and hyperbolas. (Specify the parabola by the focus-directrix equation, and the hyperbola by the equation AP − BP = c.)

29. Find a polynomial p(x, y) such that p(x, y) > 0 for all points (x, y) and yet p(x, y) may have values arbitrarily close to 0.

30. For positive variables x₁, …, xₙ, y₁, …, yₙ, and p, q > 1 subject to 1/p + 1/q = 1, prove that

$$x_1y_1 + \cdots + x_ny_n \le \left[x_1^p + \cdots + x_n^p\right]^{1/p}\left[y_1^q + \cdots + y_n^q\right]^{1/q}.$$

3 INTEGRAL CALCULUS FOR SEVERAL VARIABLES

For a positive function of one variable, y = f(x), the definite integral ∫ₐᵇ f(x) dx measures the area enclosed between y = 0 and y = f(x) from x = a to x = b. Indeed we may as well use this as the intuitive definition of the definite integral. For a general function we take the definite integral to be the signed area; areas above the axis are counted as positive and areas below the axis are counted as negative. It helps to think of the integral as a generalized sum of thin rectangles of height f(x) (which may be positive or negative) and width dx. The signed area viewpoint leads almost immediately to the Fundamental Theorem of Calculus, which is the principal tool to calculate definite integrals.

Now consider a positive smooth function of two variables, say, z = f(x, y). We define the double integral of f over the rectangle R = [a, b] × [c, d] to be the volume enclosed between the plane z = 0 and the surface z = f(x, y) above the rectangle R, and we denote it by

$$\iint_R f(x, y)\,dx\,dy.$$

For a general function, we define the integral as the signed volume. Divide the rectangle into a grid of small rectangles with sides dx and dy. Then we may regard the double integral as a generalized sum of the cuboids with signed volume f(x, y)dxdy. We may sum the cuboids with respect to y and then we may sum these sums with respect to x; or we may do it the other way around. This gives us the formulas

$$\iint_R f(x, y)\,dx\,dy = \int_a^b \left\{\int_c^d f(x, y)\,dy\right\}dx = \int_c^d \left\{\int_a^b f(x, y)\,dx\right\}dy.$$

There is a second way to view this formula. Let

$$A(x) = \int_c^d f(x, y)\,dy$$

so that A(x) measures the cross-sectional area of a slice of the region whose volume we are finding, the slice being parallel to the (y, z)-plane. Then A(x)dx gives the volume of a thin cross-section (dx thick) and we get the total volume by integrating. A similar argument applies when we integrate first with respect to x. [This method of slices was already used in earlier calculus classes.] These two repeated integrals enable us to calculate double integrals by using one-variable methods. In particular, they show us that

$$\iint_R (f + g)\,dx\,dy = \iint_R f\,dx\,dy + \iint_R g\,dx\,dy, \qquad \iint_R \alpha f\,dx\,dy = \alpha\iint_R f\,dx\,dy.$$

As a simple example, take f(x, y) = x² + y² and R = [0, 1] × [0, 1]. Then

$$\iint_R (x^2 + y^2)\,dx\,dy = \int_0^1 \left\{\int_0^1 (x^2 + y^2)\,dy\right\}dx = \int_0^1 \left(x^2 + \frac{1}{3}\right)dx = \frac{2}{3}.$$

In some cases the calculation is very quick. Suppose that the function f(x, y) factorizes as f(x, y) = ϕ(x)ψ(y). Then we get

$$\iint_R f\,dx\,dy = \int_a^b \left\{\varphi(x)\int_c^d \psi(y)\,dy\right\}dx = \int_a^b \varphi(x)\,dx\int_c^d \psi(y)\,dy.$$

Of course, most functions f(x, y) do not factorize in this way; but often we can write f(x, y) as a sum of such factorizations. For example, we may also calculate our integral above by

$$\iint_R (x^2 + y^2)\,dx\,dy = \int_0^1 x^2\,dx\int_0^1 dy + \int_0^1 dx\int_0^1 y^2\,dy = \frac{1}{3} + \frac{1}{3} = \frac{2}{3}.$$

We are now equipped to calculate lots of volumes, but in fact we are still severely restricted in that we can calculate only volumes that are above a rectangle. How can we calculate volumes above a triangle or a disk or more general regions in the (x, y)-plane? Consider the simple case of the triangle T with vertices (0, 0), (1, 0) and (1, 1). When we consider a y-slice, the integration with respect to y does not run from 0 to 1 but only from 0 to x (draw a picture!). But the slicing argument still works and we get

$$\iint_T f\,dx\,dy = \int_0^1 \left\{\int_0^x f(x, y)\,dy\right\}dx.$$

As a simple example, let f(x, y) = exp(−x²) and we get

$$\iint_T e^{-x^2}\,dx\,dy = \int_0^1 \left\{\int_0^x e^{-x^2}\,dy\right\}dx = \int_0^1 x e^{-x^2}\,dx = \frac{1}{2}\left(1 - e^{-1}\right).$$

Now let D be the disk whose boundary is the unit circle x² + y² = 1. For the y-slice we have to integrate y from −√(1 − x²) to √(1 − x²), and then we have to integrate x from −1 to 1. These are not very nice limits of integration, but happily there is a better way to integrate over disks (see below). More generally we can do integrals over oval regions R where the y-slice runs from the bottom function β(x) to the top function τ(x), while x runs from a to b. Then we calculate the double integral by

$$\iint_R f(x, y)\,dx\,dy = \int_a^b \left\{\int_{\beta(x)}^{\tau(x)} f(x, y)\,dy\right\}dx.$$

For example, let f(x, y) = x²y² and let R be the region cut off the parabola y² = 4x by x = 0 and x = 1. Then the region is given by the inequalities −2√x ≤ y ≤ 2√x and 0 ≤ x ≤ 1. Thus

$$\iint_R x^2y^2\,dx\,dy = \int_0^1 \left\{\int_{-2\sqrt{x}}^{2\sqrt{x}} x^2y^2\,dy\right\}dx = \int_0^1 \frac{16}{3}x^{7/2}\,dx = \frac{32}{27}.$$
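Iterated integrals with variable limits are easy to approximate numerically, which gives a way to check hand computations like the ones above. This Python sketch applies the midpoint rule in y for each fixed x, then the midpoint rule in x. The helper names `midpoint` and `iterated_integral` and the default grid size n are our own choices, not from the text. It checks the unit-square example, the triangle example with exp(−x²) (where ∫₀¹ x e^(−x²) dx = (1 − e^(−1))/2), and the parabolic region.

```python
import math


def midpoint(g, a, b, n):
    """Composite midpoint rule for the integral of g over [a, b] with n subintervals."""
    w = (b - a) / n
    return w * sum(g(a + (i + 0.5) * w) for i in range(n))


def iterated_integral(f, a, b, bottom, top, n=300):
    """Approximate the integral of f(x, y) over a <= x <= b, bottom(x) <= y <= top(x)."""
    return midpoint(lambda x: midpoint(lambda y: f(x, y), bottom(x), top(x), n), a, b, n)


square = iterated_integral(lambda x, y: x * x + y * y, 0, 1, lambda x: 0, lambda x: 1)
triangle = iterated_integral(lambda x, y: math.exp(-x * x), 0, 1, lambda x: 0, lambda x: x)
parabola = iterated_integral(lambda x, y: x * x * y * y, 0, 1,
                             lambda x: -2 * math.sqrt(x), lambda x: 2 * math.sqrt(x))

print(square, 2 / 3)
print(triangle, (1 - math.exp(-1)) / 2)
print(parabola, 32 / 27)
```

The inner call is exactly the y-slice of the text: for each x, the y-integral runs from the bottom function to the top function.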

Not every region is shaped like an oval, but nastier regions can often be cut up into a finite number of oval pieces, and then we just integrate over each piece and add the answers.

In the above discussion we have focused on y-slices. Clearly we can do a similar analysis for x-slices. For example, the triangle T with vertices (0, 0), (1, 0) and (1, 1) may also be described by the inequalities y ≤ x ≤ 1 and 0 ≤ y ≤ 1. Thus we get

$$\iint_T f(x, y)\,dx\,dy = \int_0^1 \left\{\int_y^1 f(x, y)\,dx\right\}dy.$$

More generally, if the region R can be described by the inequalities λ(y) ≤ x ≤ ρ(y) and c ≤ y ≤ d, then we get

$$\iint_R f(x, y)\,dx\,dy = \int_c^d \left\{\int_{\lambda(y)}^{\rho(y)} f(x, y)\,dx\right\}dy.$$

For example, the parabolic region considered above can be described by the inequalities y²/4 ≤ x ≤ 1 and −2 ≤ y ≤ 2. This gives another way to calculate the double integral using only polynomial calculus, but with much messier arithmetic.

This last example raises an obvious question. Should we work with y-slices or x-slices? After working many problems, one may begin to predict which method will be easier. In some cases, only one of the methods is feasible. For example, in our earlier example with the function exp(−x²) we could not have carried out the integration with respect to x first, since exp(−x²) has no antiderivative in terms of elementary functions. We should also point out that innocuous looking functions can often lead to hopeless integrals at the second stage of the repeated integral. Consider f(x, y) = cos(xy) over the square R = [0, 1] × [0, 1]. Then we get

$$\iint_R \cos(xy)\,dx\,dy = \int_0^1 \frac{\sin(x)}{x}\,dx$$

and we are forced to use numerical approximations for the last integral. [Or we could use the Taylor polynomials for sin(x) for a hand calculated approximation.]
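Following the bracketed suggestion, this Python sketch integrates the Taylor series sin(x)/x = Σ (−1)^k x^(2k)/(2k + 1)! term by term over [0, 1] (the result is the sine integral Si(1)), and compares it with a direct midpoint-rule approximation of the double integral of cos(xy) over the unit square. The number of terms and the grid size are arbitrary choices.

```python
import math


def si1_series(terms=10):
    """Integral of sin(x)/x over [0, 1], integrating the Taylor series term by term."""
    return sum((-1) ** k / ((2 * k + 1) * math.factorial(2 * k + 1))
               for k in range(terms))


def double_midpoint(f, n=400):
    """Midpoint-rule approximation of the integral of f over [0, 1] x [0, 1]."""
    w = 1 / n
    pts = [(i + 0.5) * w for i in range(n)]
    return w * w * sum(f(x, y) for x in pts for y in pts)


series = si1_series()
grid = double_midpoint(lambda x, y: math.cos(x * y))
print(series, grid)
```

The series alternates with rapidly shrinking terms, so a handful of terms already gives many correct digits, which is why the hand calculation suggested in the text is practical.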

As an application we calculate an important volume (the formula was probably given in school). The standard simplex is the tetrahedron with vertices (0, 0, 0), (1, 0, 0), (0, 1, 0) and (0, 0, 1). Alternatively, this is the solid region in the positive octant that is enclosed by the plane x + y + z = 1. To get its volume we want to integrate the function z = 1 − x − y over the triangle T with vertices (0, 0), (1, 0) and (0, 1). The triangle T is described by the inequalities 0 ≤ y ≤ 1 − x and 0 ≤ x ≤ 1. Thus the volume V is given by

$$V = \int_0^1 \left\{\int_0^{1-x} (1 - x - y)\,dy\right\}dx = \int_0^1 \frac{1}{2}(1 - x)^2\,dx = \frac{1}{6}.$$

This agrees with the formula that gives the volume of a pyramid as one third of the base area times the height (here the base has area 1/2 and the height is 1). This latter formula can also be derived by using horizontal slices (the slice at height z has area (1 − z)²/2) and then integrating the area function with respect to z from 0 to 1.

There are several other applications for double integrals. Suppose that a lamina in the plane has the shape of the region R. Suppose that the lamina has variable density, given by the function ρ(x, y). A small rectangle with sides dx, dy then has mass dm = ρ(x, y)dxdy and so the total mass of the lamina is given by

$$M = \iint_R \rho(x, y)\,dx\,dy.$$

[This idea is more of theoretical interest, since it is very difficult to determine a formula for the density function ρ(x, y).]

The small mass dm exerts a moment x dm about the y-axis. So the total moment about the y-axis is given by

$$\iint_R x\,dm = \iint_R x\rho(x, y)\,dx\,dy.$$

Suppose the lamina is compressed to a single point of mass M. How far should this point mass be from the y-axis to give the same moment? If the signed distance is x̄, then we need

$$\bar{x}M = \iint_R x\,dm.$$

A similar discussion applies for moments about the x-axis and leads to the equation

$$\bar{y}M = \iint_R y\,dm.$$

The point (x̄, ȳ) is called the center of mass of the lamina. It is the balancing point for the lamina. To see this, note that the moment about the line x = x̄ is given by

$$\iint_R (x - \bar{x})\,dm = \iint_R x\,dm - \bar{x}\iint_R dm = 0.$$

Similarly we get zero moment about the line y = ȳ.

When the density function ρ(x, y) is constant, the equations simplify to

$$\bar{x}A = \iint_R x\,dx\,dy, \qquad \bar{y}A = \iint_R y\,dx\,dy,$$

where A is the area of the region R. Even in this case, the location of the center of mass may not be obvious. For a triangular lamina with uniform density the center of mass is at the intersection of the medians, since we readily convince ourselves that the triangle balances about each median line. [In fact, this leads to a physics proof for the concurrence of the medians!] Consider now the semicircular lamina x² + y² ≤ a², y ≥ 0, again with uniform density. By symmetry, the center of mass lies on the y-axis, and we have

$$\bar{y}\,\frac{\pi a^2}{2} = \iint_R y\,dx\,dy = \int_{-a}^{a}\int_0^{\sqrt{a^2 - x^2}} y\,dy\,dx = \int_{-a}^{a} \frac{1}{2}(a^2 - x^2)\,dx = \frac{2a^3}{3}$$

and so ȳ = 4a/(3π). It is physically obvious that ȳ < a/2, but the precise value of ȳ is not obvious (without calculus).

Analogously we can consider moments of inertia, where the small contribution is given by x²dm or y²dm, and then we can define the center of inertia (x̂, ŷ), where

$$\hat{x}^2 M = \iint_R x^2\,dm, \qquad \hat{y}^2 M = \iint_R y^2\,dm.$$

The above formulas have a different interpretation in the setting of continuous probability. A probability density function on the plane is a non-negative function ρ(x, y) such that

$$\iint_{\mathbb{R}^2} \rho(x, y)\,dx\,dy = 1.$$

The probability that an observation (x, y) lies in a region R is then given by

$$\Pr[(x, y) \in R] = \iint_R \rho(x, y)\,dx\,dy.$$

The average (or expected) value of x is given by

$$\bar{x} = \iint_{\mathbb{R}^2} x\rho(x, y)\,dx\,dy$$

and we have a similar formula for ȳ. The second moments are used to define the ideas of variance.

In calculating the center of mass of a semidisk above, we ought to have been tempted to ask how to do the problem in polar coordinates, since the semidisk has such a nice description in polar coordinates, namely:

$$0 \le r \le a, \qquad 0 \le \theta \le \pi.$$

How do we carry out integration in polar coordinates? Instead of covering the plane with the grids x = a and y = b we can cover with the grids r = a and θ = b. The region enclosed between r = a, r = a + dr and θ = b, θ = b + dθ is almost a rectangle with sides dr and r dθ and area r dr dθ. [The error is so small that it disappears in the limit.] Thus we get

$$\iint f(x, y)\,dx\,dy = \iint F(r, \theta)\,r\,dr\,d\theta,$$

where F(r, θ) = f(r cos(θ), r sin(θ)) is the same function expressed in polar coordinates.

For the first integral we express the region in terms of inequalities in x and y; for the second we express the same region in terms of inequalities in r and θ. Thus the center of mass problem above could be done as

$$\iint y\,dx\,dy = \int_0^a\int_0^\pi r\sin(\theta)\,r\,dr\,d\theta = \int_0^a r^2\,dr\int_0^\pi \sin(\theta)\,d\theta = \frac{a^3}{3}\cdot 2 = \frac{2a^3}{3}.$$
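This polar computation is simple to check numerically. The Python sketch below applies the midpoint rule to y = r sin(θ), multiplied by the area factor r, over the polar rectangle 0 ≤ r ≤ a, 0 ≤ θ ≤ π; it then divides by the area πa²/2 (uniform density) and compares with ȳ = 4a/(3π). The radius a = 2 and the grid size are arbitrary choices.

```python
import math


def polar_midpoint(F, r_max, th_max, n=400):
    """Midpoint rule for the integral of F(r, theta) * r over [0, r_max] x [0, th_max]."""
    dr, dth = r_max / n, th_max / n
    total = 0.0
    for i in range(n):
        r = (i + 0.5) * dr
        for j in range(n):
            th = (j + 0.5) * dth
            total += F(r, th) * r  # the extra factor r is the polar area element
    return total * dr * dth


a = 2.0
moment = polar_midpoint(lambda r, th: r * math.sin(th), a, math.pi)
y_bar = moment / (math.pi * a * a / 2)
print(moment, 2 * a**3 / 3)
print(y_bar, 4 * a / (3 * math.pi))
```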

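Here is a famous application of the polar method. [Lord Kelvin said that a mathematician was someone to whom this argument was obvious!] Let

$$I = \int_{-\infty}^{\infty} e^{-x^2}\,dx.$$

Then

$$I^2 = \int_{-\infty}^{\infty} e^{-x^2}\,dx\int_{-\infty}^{\infty} e^{-y^2}\,dy = \iint_{\mathbb{R}^2} e^{-(x^2 + y^2)}\,dx\,dy = \int_0^{2\pi}\int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta = \pi,$$

and so I = √π.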
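A numerical check of I = √π: this Python sketch applies the midpoint rule on [−L, L] for I, truncating the tails (the cutoff L = 8 is an arbitrary choice; e^(−64) is far below rounding error), and also evaluates the polar form 2π ∫ e^(−r²) r dr over [0, L].

```python
import math


def midpoint(g, a, b, n=20_000):
    w = (b - a) / n
    return w * sum(g(a + (i + 0.5) * w) for i in range(n))


L = 8.0  # tails beyond |x| = 8 are negligible at double precision
I = midpoint(lambda x: math.exp(-x * x), -L, L)
polar = 2 * math.pi * midpoint(lambda r: math.exp(-r * r) * r, 0.0, L)

print(I * I, polar, math.pi)
print(I, math.sqrt(math.pi))
```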

Notice that an annulus is a “polar rectangle”, for example, a ≤ r ≤ b and 0 ≤ θ ≤ 2π. Such regions of integration are important in physics when we have to avoid a singularity at the origin (where the force in question may become infinite). We can also integrate over other interesting polar regions of the form

$$0 \le r \le \varphi(\theta), \qquad \alpha \le \theta \le \beta.$$

A common type of problem involves finding the volume enclosed between two surfaces, a top surface z_t and a bottom surface z_b. Evidently the volume is given by

$$\iint (z_t - z_b)\,dx\,dy$$

but how do we find the region of integration? The two surfaces meet wherezt = zb and this gives an equation in x, y, namely a cylinder in 3-space; thevolume is enclosed within this cylinder and this determines the region ofintegration. To illustrate, take zt = 12− x2 − 2y2 and zb = 2x2 + y2. Thesemeet where 12 = 3x2 + 3y2 and so the volume is given by

V =

∫ ∫(12− 3x2 − 3y2)dxdy =

∫ 2

0

∫ 2π

0

(12− 3r2)rdrdθ = 24π.
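The polar computation of $V$ can be checked with a plain midpoint-rule sum over $(r, \theta)$. This is a minimal sketch; the grid sizes are arbitrary, and `polar_midpoint` is a hypothetical helper name.

```python
import math

def polar_midpoint(g, r_max, n_r=400, n_theta=64):
    """Midpoint-rule approximation of the integral of g(r, theta) * r over the disk
    0 <= r <= r_max, 0 <= theta <= 2*pi (the factor r is the polar area element)."""
    dr = r_max / n_r
    dtheta = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            total += g(r, theta) * r
    return total * dr * dtheta

# z_t - z_b = 12 - 3x^2 - 3y^2 = 12 - 3r^2, and the surfaces meet on r = 2.
V = polar_midpoint(lambda r, theta: 12 - 3 * r * r, 2.0)
print(V, 24 * math.pi)  # the two numbers should agree closely
```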

When we move from two variables to three or more variables we lose the natural geometrical meaning of integration as a volume, since we are measuring a region in four or more dimensions. For positive functions of three variables we can think of the function value as a variable density of a solid region (say a cube, at first); thus $f(x, y, z)\,dx\,dy\,dz$ gives the mass of a little piece of the solid and then the triple integral

$$\iiint f(x, y, z)\,dx\,dy\,dz$$

measures the mass of the solid. Evidently we calculate it as a repeated integral. For the unit cube $C$ we get

$$\iiint_C f(x, y, z)\,dx\,dy\,dz = \int_0^1 dx \left\{ \int_0^1 dy \left\{ \int_0^1 f(x, y, z)\,dz \right\} \right\}.$$

We can integrate over more general regions by describing them by inequalities. For example, consider the standard simplex, which is the region in the positive octant under the plane $x + y + z = 1$ (equivalently, it is the tetrahedron with vertices $(0, 0, 0)$, $(1, 0, 0)$, $(0, 1, 0)$, $(0, 0, 1)$). This region is described by the inequalities

$$0 \le z \le 1 - x - y, \quad 0 \le y \le 1 - x, \quad 0 \le x \le 1.$$

If the tetrahedron has variable density $\rho = xyz$ then the mass of the tetrahedron is given by

$$M = \int_0^1 x\,dx \int_0^{1-x} y\,dy \int_0^{1-x-y} z\,dz.$$
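Since the iterated integral for $M$ is tedious by hand, here is a hedged numerical sketch that nests three Simpson's-rule integrations in the same order as the formula (innermost in $z$). Simpson's rule is exact for the two inner integrands (polynomials of degree at most 3), so the only error comes from the outer integral. For comparison we use the classical Dirichlet value $1/720$ for the integral of $xyz$ over the standard simplex.

```python
def simpson(f, a, b, n=40):
    """Composite Simpson's rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def simplex_mass():
    """M = int_0^1 x dx int_0^{1-x} y dy int_0^{1-x-y} z dz (density xyz)."""
    def over_y(x):
        # For fixed x: integrate y times (the z-integral over 0 <= z <= 1 - x - y).
        return simpson(lambda y: y * simpson(lambda z: z, 0.0, 1.0 - x - y), 0.0, 1.0 - x)
    return simpson(lambda x: x * over_y(x), 0.0, 1.0)

M = simplex_mass()
print(M, 1 / 720)
```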

Notice that it takes some effort to work out such integrals! We get the same formula for the center of mass of a region, but now there are three equations:

$$\bar{x}M = \iiint x\,dm, \quad \bar{y}M = \iiint y\,dm, \quad \bar{z}M = \iiint z\,dm,$$

where $dm = \rho\,dx\,dy\,dz$. [Related formulas apply for moments of inertia.]

When we integrate over complicated regions, the work involved becomes substantial and we seek simplifications. The most important examples are solid cans and solid spheres. For these, we use different coordinate systems. It is natural to describe a can by using polar coordinates instead of $x$ and $y$, but leaving the $z$ coordinate unchanged. This system is called cylindrical coordinates $(r, \theta, z)$. Evidently our little volume element is given by

$$dx\,dy\,dz = r\,dr\,d\theta\,dz.$$


Thus, for a can of height one and base radius one, sitting on the origin with density $\rho = z^2$, we get total mass

$$M = \iiint z^2\,r\,dr\,d\theta\,dz = \int_0^1 r\,dr \int_0^{2\pi} d\theta \int_0^1 z^2\,dz = \frac{\pi}{3}.$$

We can integrate over a sphere of radius $a$ in cylindrical coordinates by using the inequalities

$$-\sqrt{a^2 - r^2} \le z \le \sqrt{a^2 - r^2}, \quad 0 \le r \le a, \quad 0 \le \theta \le 2\pi.$$

But often it is better to introduce spherical coordinates. In this system there is just one length coordinate $\rho$, the distance of the point from the origin, and two angle coordinates. The angle $\theta$ is just the polar coordinate angle as before, and $\phi$ is the angle by which we fall down from the north pole. Thus, at the equator, $\phi = \pi/2$ and at the south pole, $\phi = \pi$. These are intimately related to longitude and latitude on the surface of the earth, but the easiest way to understand them is to see them as obtained by two polar coordinate changes. First change $(x, y, z)$ to $(r, \theta, z)$ by $x = r\cos(\theta)$ and $y = r\sin(\theta)$. Now think of $(z, r)$ as Cartesian coordinates and change them to polar coordinates $(\rho, \phi)$ by $z = \rho\cos(\phi)$ and $r = \rho\sin(\phi)$. For the complete change of coordinates we get

$$x = \rho\sin(\phi)\cos(\theta), \quad y = \rho\sin(\phi)\sin(\theta), \quad z = \rho\cos(\phi).$$

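A little volume element changes in stages (the second step is the polar change $dz\,dr = \rho\,d\rho\,d\phi$ in the $(z, r)$-plane, and the last step uses $r = \rho\sin(\phi)$):

$$dx\,dy\,dz = r\,dr\,d\theta\,dz = r\,d\theta\,\rho\,d\rho\,d\phi = \rho^2\sin(\phi)\,d\rho\,d\theta\,d\phi.$$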

As an illustration we find the center of mass of a uniform solid hemisphere of radius $a$. By symmetry, it is of the form $(0, 0, \bar{z})$. The northern hemisphere is described by $0 \le \rho \le a$, $0 \le \theta \le 2\pi$, $0 \le \phi \le \pi/2$, and its volume is $\frac{2}{3}\pi a^3$, so we get

$$\bar{z}\,\frac{2}{3}\pi a^3 = \int_0^a \rho^3\,d\rho \int_0^{2\pi} d\theta \int_0^{\pi/2} \cos(\phi)\sin(\phi)\,d\phi = \frac{\pi a^4}{4}$$

and hence $\bar{z} = 3a/8$.

We can form “ice cream cones” by restricting the $\phi$ angle. For example, the region enclosed by the sphere $x^2 + y^2 + z^2 = 1$ and the cone $x^2 + y^2 = 3z^2$ has maximal $\phi$ angle given by the angle between the $z$-axis and the line $x = \sqrt{3}\,z$, namely $\pi/3$ (why?). Thus the region is described by the inequalities

$$0 \le \rho \le 1, \quad 0 \le \theta \le 2\pi, \quad 0 \le \phi \le \pi/3.$$
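Both spherical-coordinate computations above, the hemisphere centroid $\bar{z} = 3a/8$ and the ice cream cone, can be checked with a midpoint-rule sum that includes the volume factor $\rho^2\sin\phi$. This is a sketch: the radius $a = 2$, the grid size and the helper name `spherical_midpoint` are our own assumptions. The exact cone volume $\pi/3$ used for comparison comes from $\int_0^1 \rho^2\,d\rho \int_0^{2\pi} d\theta \int_0^{\pi/3} \sin\phi\,d\phi = \frac{1}{3}\cdot 2\pi \cdot \frac{1}{2}$.

```python
import math

def spherical_midpoint(g, rho_max, phi_max, n=50):
    """Midpoint rule for the integral of g(rho, theta, phi) * rho^2 * sin(phi)
    over 0 <= rho <= rho_max, 0 <= theta <= 2*pi, 0 <= phi <= phi_max."""
    d_rho, d_theta, d_phi = rho_max / n, 2 * math.pi / n, phi_max / n
    total = 0.0
    for i in range(n):
        rho = (i + 0.5) * d_rho
        for j in range(n):
            theta = (j + 0.5) * d_theta
            for k in range(n):
                phi = (k + 0.5) * d_phi
                total += g(rho, theta, phi) * rho * rho * math.sin(phi)
    return total * d_rho * d_theta * d_phi

a = 2.0
volume = spherical_midpoint(lambda rho, theta, phi: 1.0, a, math.pi / 2)
# z = rho * cos(phi) in spherical coordinates
moment = spherical_midpoint(lambda rho, theta, phi: rho * math.cos(phi), a, math.pi / 2)
z_bar = moment / volume
print(z_bar, 3 * a / 8)

cone = spherical_midpoint(lambda rho, theta, phi: 1.0, 1.0, math.pi / 3)
print(cone, math.pi / 3)
```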

We defer to a later appendix the problem of the general formula for change of variables in integration. For now we turn to the general problem of surface area. We begin with two important examples where we can use a deliciously simple argument.

What is the surface area of a sphere of radius $R$? We regard a solid sphere as made up of a series of concentric spheres of thickness $dr$ and surface area $S(r)$. We get the total volume by integrating, and so

$$\frac{4}{3}\pi R^3 = \int_0^R S(r)\,dr.$$

Differentiate with respect to $R$ and the Fundamental Theorem of Calculus gives $S(R) = 4\pi R^2$.

What is the surface area of a torus (the skin of a donut) whose cross-section is a circle of radius $a$ and whose central circle (the circle traced out by the center of the cross-section) has radius $b$? The washer method of Calculus I gave the volume of the torus to be $2\pi^2 a^2 b$. Think of the torus as made up of lots of concentric tori (macaroni inside macaroni!) of thickness $dr$, where $r$ varies from $0$ to $a$. Again we get the total volume by integrating, and so

$$2\pi^2 a^2 b = \int_0^a S(r)\,dr.$$

Differentiate with respect to $a$ and we get the surface area to be $4\pi^2 ab = (2\pi a)(2\pi b)$. Although we cannot cut up the surface of the torus in the obvious way and flatten it out to get a rectangle with sides of length $2\pi a$ and $2\pi b$, the formula tells us that the total surface area amounts to this quantity.

These arguments work because the $dr$’s add up in any direction to give the total (constant) radius. The argument fails when we try it on an ellipsoid (indeed, there is no simple formula in that case).

Suppose now that we have a surface $z = f(x, y)$ that sits above a region $R$ in the $(x, y)$-plane. We begin with the linear case and see how to get the general case from that. When we open up a book, the area of one cover is greater than the projected area on the $(x, y)$-plane. The ratio of the projected area to the area of the cover is $\cos(\theta)$, where $\theta$ is the angle between the planes.

If $z = \lambda x + \mu y + \nu$ then the upward normal vector is given by $\langle -\lambda, -\mu, 1 \rangle$ and so we get $1/\cos(\theta) = \sqrt{\lambda^2 + \mu^2 + 1}$. At the infinitesimal level we have

$$dS = \frac{1}{\cos(\theta)}\,dx\,dy.$$

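(Actually, in general, the little rectangle with sides $dx$ and $dy$ is twisted to a parallelogram on the plane above; but the area formula still works. Why?) For a general surface we identify

$$\lambda = \frac{\partial z}{\partial x}, \quad \mu = \frac{\partial z}{\partial y}$$

and hence we get the formula

$$S = \iint \sqrt{\left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2 + 1}\;dx\,dy.$$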

Not surprisingly, we find that it is usually not possible to calculate this integral exactly. Sometimes the surface function $z$ is given in terms of polar coordinates. Using an earlier chain rule computation we immediately get the polar formula

$$S = \iint \sqrt{\left(r\,\frac{\partial z}{\partial r}\right)^2 + \left(\frac{\partial z}{\partial \theta}\right)^2 + r^2}\;dr\,d\theta.$$
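Here is a hedged illustration of the polar formula on a surface of our own choosing. For $z = r^2$ (that is, $z = x^2 + y^2$) over the unit disk we have $\partial z/\partial r = 2r$ and $\partial z/\partial\theta = 0$, so the integrand is $r\sqrt{4r^2 + 1}$ and the exact area is $\pi(5\sqrt{5} - 1)/6$. The midpoint-rule code below evaluates the polar formula directly; `polar_surface_area` is a hypothetical helper, and the grid sizes are arbitrary.

```python
import math

def polar_surface_area(z_r, z_theta, r_max, n_r=400, n_theta=64):
    """Midpoint rule for the polar surface-area integral of
    sqrt((r z_r)^2 + z_theta^2 + r^2) dr dtheta over 0 <= r <= r_max, 0 <= theta <= 2*pi."""
    dr = r_max / n_r
    dtheta = 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * dr
        for j in range(n_theta):
            theta = (j + 0.5) * dtheta
            total += math.sqrt((r * z_r(r, theta)) ** 2 + z_theta(r, theta) ** 2 + r * r)
    return total * dr * dtheta

# z = r^2: supply the partial derivatives with respect to r and theta.
S = polar_surface_area(lambda r, t: 2 * r, lambda r, t: 0.0, 1.0)
exact = math.pi * (5 * math.sqrt(5) - 1) / 6
print(S, exact)
```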

We can also represent a surface parametrically (with two parameters). Recall that a plane can be represented by

$$\mathbf{r} = \mathbf{r}_0 + s\mathbf{u} + t\mathbf{v}.$$

We now ask: what is the image in 3-space of a rectangle in $(s, t)$-space with sides $ds$, $dt$? We readily check the answer to be a parallelogram with sides given by the vectors $ds\,\mathbf{u}$ and $dt\,\mathbf{v}$. This little parallelogram has area

$$|\mathbf{u} \times \mathbf{v}|\,ds\,dt$$

and so the total area is the integral. For a plane, the vector product has constant length. For a general surface we get $\mathbf{r} = \mathbf{r}(s, t)$, and we identify the coefficients of $s$ and $t$ as (vector) partial derivatives and hence get the formula

$$S = \iint \left| \frac{\partial \mathbf{r}}{\partial s} \times \frac{\partial \mathbf{r}}{\partial t} \right| ds\,dt.$$

Again, we usually cannot calculate this exactly. As an amusing exercise you may like to re-calculate the surface area of a torus using the parameterization

$$\mathbf{r} = \langle (b + a\cos\psi)\cos\theta,\; (b + a\cos\psi)\sin\theta,\; a\sin\psi \rangle.$$
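For the torus exercise, one can at least check numerically that the parametric formula reproduces $4\pi^2 ab$. The sketch below (with $a = 1$, $b = 3$ chosen arbitrarily) writes out the partial derivatives of $\mathbf{r}$ by hand, takes the length of their cross product, and sums with a midpoint rule in $(\psi, \theta)$. Since the integrand is periodic in both parameters, the equally spaced sum is very accurate here.

```python
import math

def cross(u, v):
    """Vector product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def torus_area(a, b, n=200):
    """Midpoint rule for the double integral of |r_psi x r_theta| over [0, 2*pi]^2."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        psi = (i + 0.5) * h
        for j in range(n):
            theta = (j + 0.5) * h
            w = b + a * math.cos(psi)  # distance of the point from the z-axis
            r_psi = (-a * math.sin(psi) * math.cos(theta),
                     -a * math.sin(psi) * math.sin(theta),
                     a * math.cos(psi))
            r_theta = (-w * math.sin(theta), w * math.cos(theta), 0.0)
            c = cross(r_psi, r_theta)
            total += math.sqrt(c[0] ** 2 + c[1] ** 2 + c[2] ** 2)
    return total * h * h

a, b = 1.0, 3.0
print(torus_area(a, b), 4 * math.pi ** 2 * a * b)
```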

As a closing touch, consider the problem of the intersection of two perpendicular right circular cylinders of radius $a$. What is the volume of their common intersection and what is its surface area? Take one cylinder as $x^2 + y^2 = a^2$ and the other as $x^2 + z^2 = a^2$. Then $z_t = \sqrt{a^2 - x^2}$ and $z_b = -\sqrt{a^2 - x^2}$. Hence we get the volume by integrating $z_t - z_b = 2\sqrt{a^2 - x^2}$ over the disk $x^2 + y^2 \le a^2$. By symmetry we can integrate over a quarter disk to get the volume

$$V = 8\int_0^a \int_0^{\sqrt{a^2 - x^2}} \sqrt{a^2 - x^2}\,dy\,dx = 8\int_0^a (a^2 - x^2)\,dx = \frac{16a^3}{3}.$$

Or, we may take the cylinders as $x^2 + z^2 = a^2$ and $y^2 + z^2 = a^2$. Then the cross-section at height $z$ is a square with side length $2\sqrt{a^2 - z^2}$. This gives

$$V = 8\int_0^a (a^2 - z^2)\,dz = \frac{16a^3}{3}.$$

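Or, again, with the cylinders in this form the height function is given by

$$z_t = \min\left(\sqrt{a^2 - x^2}, \sqrt{a^2 - y^2}\right).$$

The integration is over a square, but by symmetry we may reduce to just one triangle $T = \{0 \le y \le x \le a\}$, on which $z_t = \sqrt{a^2 - x^2}$, and get the volume $V$ as

$$16\iint_T z_t\,dx\,dy = 16\int_0^a \int_0^x \sqrt{a^2 - x^2}\,dy\,dx = 16\int_0^a x\sqrt{a^2 - x^2}\,dx = \frac{16a^3}{3}.$$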

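For the surface area we may take the above form. By implicit differentiation, from $x^2 + z^2 = a^2$ we get $x + z\,\frac{\partial z}{\partial x} = 0$ and $\frac{\partial z}{\partial y} = 0$, and this leads to

$$\sqrt{1 + \left(\frac{\partial z}{\partial x}\right)^2 + \left(\frac{\partial z}{\partial y}\right)^2} = \frac{a}{z}.$$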

Again using symmetry we can integrate over just one triangle to get

$$S = 16\int_0^a \int_0^x \frac{a}{\sqrt{a^2 - x^2}}\,dy\,dx = 16a\int_0^a \frac{x}{\sqrt{a^2 - x^2}}\,dx = 16a^2.$$

Notice that we can also solve this problem by the “macaroni” method, so that the surface area is just the derivative of the volume: $\frac{d}{da}\left(\frac{16a^3}{3}\right) = 16a^2$.
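The sketch below checks both the volume $16a^3/3$ and the “macaroni” statement numerically. It integrates the thickness $2z_t = 2\min(\sqrt{a^2 - x^2}, \sqrt{a^2 - y^2})$ (the solid runs from $-z_t$ to $z_t$) over the square $[-a, a]^2$ with a midpoint rule, then takes a central difference in $a$. The value $a = 1.5$, the grid size and the step `d` are arbitrary assumptions.

```python
import math

def steinmetz_volume(a, n=300):
    """Midpoint rule for the volume common to the cylinders x^2 + z^2 = a^2 and
    y^2 + z^2 = a^2: integrate 2 * min(sqrt(a^2 - x^2), sqrt(a^2 - y^2)) over [-a, a]^2."""
    h = 2 * a / n
    total = 0.0
    for i in range(n):
        x = -a + (i + 0.5) * h
        for j in range(n):
            y = -a + (j + 0.5) * h
            z_top = min(math.sqrt(a * a - x * x), math.sqrt(a * a - y * y))
            total += 2 * z_top
    return total * h * h

a = 1.5
V = steinmetz_volume(a)
print(V, 16 * a ** 3 / 3)

# "Macaroni" method: surface area = dV/da, approximated by a central difference.
d = 1e-3
S = (steinmetz_volume(a + d) - steinmetz_volume(a - d)) / (2 * d)
print(S, 16 * a ** 2)
```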

This may be a good time to review some ideas from previous chapters. The boundary surface of the intersection of the two solid cylinders $x^2 + z^2 \le a^2$ and $y^2 + z^2 \le a^2$ is not smooth. There is an arete above and below each of the lines $y = \pm x$. Part of one arete is given by

$$\mathbf{r} = \langle t, t, \sqrt{a^2 - t^2} \rangle.$$

This looks like a circle when viewed along the $x$-axis or the $y$-axis. In fact, it is part of an ellipse. The distance variable along the line $y = x$ is given by $s = \sqrt{2}\,t$, and then we have $z = \sqrt{a^2 - s^2/2}$ for $0 \le s \le \sqrt{2}\,a$, which is an ellipse with semi-axes $\sqrt{2}\,a$ and $a$. What is the angle between the two cylinders at the point $(t, t, \sqrt{a^2 - t^2})$? For the surface $F = x^2 + z^2 = a^2$ we have $\nabla F = \langle 2x, 0, 2z \rangle$; for the surface $G = y^2 + z^2 = a^2$ we have $\nabla G = \langle 0, 2y, 2z \rangle$. The corresponding unit normal vectors are

$$\mathbf{m} = \left\langle t/a,\; 0,\; \sqrt{1 - t^2/a^2} \right\rangle, \quad \mathbf{n} = \left\langle 0,\; t/a,\; \sqrt{1 - t^2/a^2} \right\rangle$$

and so the angle $\theta$ between the surfaces is given by

$$\cos(\theta) = \mathbf{m} \cdot \mathbf{n} = 1 - t^2/a^2.$$

For $t = 0$ we get $\cos(\theta) = 1$, $\theta = 0$, so that we have a smooth point. For $t = a$ we get $\cos(\theta) = 0$, $\theta = \pi/2$. For $t = a/2$ we get $\cos(\theta) = 3/4$. Notice that tangent vectors to the ellipse can be calculated from $\nabla F \times \nabla G$ or else by calculating $\mathbf{r}'(t)$, since we have a parameterization. The parameterization is preferable if we can find it. For example, the surfaces $z = 4 - 2x^2 - y^2$ and $z = 1 + x^2 + 2y^2$ meet where $x^2 + y^2 = 1$, and hence we get a parameterization

$$\mathbf{r} = \langle \cos\theta,\; \sin\theta,\; \cos^2\theta + 2\sin^2\theta + 1 \rangle.$$
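The angle formula and the remark about tangent vectors can be checked at a sample point. In this sketch ($a = 2$ and $t = a/2$ are arbitrary choices) we normalize $\nabla F = \langle 2x, 0, 2z\rangle$ and $\nabla G = \langle 0, 2y, 2z\rangle$, compare $\mathbf{m} \cdot \mathbf{n}$ with $1 - t^2/a^2$, and confirm that $\nabla F \times \nabla G$ is parallel to $\mathbf{r}'(t) = \langle 1, 1, -t/\sqrt{a^2 - t^2} \rangle$.

```python
import math

def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])

def unit(v):
    length = math.sqrt(dot(v, v))
    return tuple(c / length for c in v)

a = 2.0
t = a / 2
z = math.sqrt(a * a - t * t)
grad_F = (2 * t, 0.0, 2 * z)  # gradient of F = x^2 + z^2 at (t, t, z)
grad_G = (0.0, 2 * t, 2 * z)  # gradient of G = y^2 + z^2 at (t, t, z)

cos_angle = dot(unit(grad_F), unit(grad_G))
print(cos_angle, 1 - t * t / (a * a))  # both approximately 0.75

# Tangent to the arete two ways: cross product of the normals, and r'(t).
tangent_from_normals = cross(grad_F, grad_G)
r_prime = (1.0, 1.0, -t / z)
print(cross(tangent_from_normals, r_prime))  # the zero vector, up to rounding
```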


PROBLEMS 3


1. Integrate $x f(xy)$ over the rectangle $[0, 1] \times [0, \pi]$ for the cases when $f(t) = \cos(t), \sin(t), e^{-t}$. Given that $F'(t) = f(t)$, what is the general answer?

2. Let $p(t)$ be a polynomial. Integrate $p(xy)$ and $p(x + y)$ over the square $[0, 1] \times [0, 1]$.

3. Repeat problem 2 when the square is replaced by the triangle with vertices at $(0, 0)$, $(1, 0)$, $(1, 1)$. What happens when you try to do the same exercise using problem 1?

4. Integrate $f(x^2)$ over the triangle with vertices at $(0, 0)$, $(1, 0)$, $(1, 1)$ when $f(t) = \cos(t), \sin(t), e^{-t}$. Given that $F'(t) = f(t)$, what is the general answer?

5. Given $a, b, c > 0$, find the volume cut off from the positive octant [that is, $\{(x, y, z) : x \ge 0, y \ge 0, z \ge 0\}$] by the plane

$$\frac{x}{a} + \frac{y}{b} + \frac{z}{c} = 1.$$

6. Integrate $x^2 + y^2$ over the region enclosed between $y = x^2$ and $y = x$. Integrate $\sqrt{1 + x^2 + y^2}$ over the unit disk. Integrate $1/(1 + x^2 + y^2)$ over the quarter unit disk in the positive quadrant. Integrate $\sqrt{xy - y^2}$ over the triangle with vertices $(0, 0)$, $(1, 1)$, $(10, 1)$.

7. Given $a > 0$, integrate $(x^2 + y^2)^a$ over the unit disk $x^2 + y^2 \le 1$.

8. Find the value of $k$ so that $k(1 + x^2 + y^2)^{-2}$ is a probability density function over the plane; in other words, the integral of the function over the whole plane is 1.

9. Given $a, b, c > 0$, find the volume enclosed between the two surfaces $z = ax^2 + by^2$ and $z = c - bx^2 - ay^2$.

10. The unit sphere $x^2 + y^2 + z^2 = 1$ is sliced through by the plane $x + y + z = c$ (for suitable values of $c$). Find the volume of each piece.

11. Find the volume common to the cylinders $y^2 + z^2 = a^2$ and $z^2 + x^2 = a^2$. Find the volume common to the cylinders $y^2 + z^2 = a^2$, $z^2 + x^2 = a^2$, and $x^2 + y^2 = a^2$.

12. Find the volume of the ellipsoid

$$\frac{x^2}{a^2} + \frac{y^2}{b^2} + \frac{z^2}{c^2} = 1.$$

[Hint: how does volume change if we change the scale on one axis? Reduce the problem to that of the volume of a sphere.]

13. Integrate $\sqrt{x^2 + y^2}$ over the region inside the cardioid $r = 1 + \cos(\theta)$.

14. The uniform annulus $1 \le x^2 + y^2 \le 4$ is cut in two by the $x$-axis. Find the centroid of the upper half.

15. The uniform elliptical region $4x^2 + y^2 \le 1$ is cut in half by the $y$-axis. Find the centroid of the right half.

16. Find the centroid of the semidisk $x^2 + y^2 \le 1$, $y \ge 0$, when the density is given by $\rho(x, y) = x^2$.

17. Find the centroid of the hemisphere $x^2 + y^2 + z^2 \le 1$, $z \ge 0$, when the density function is given by $\rho(x, y, z) = x^2$.

18. Find the surface area of the portion of the surface $z = x^2 + 4y$ which lies above the triangle with vertices $(0, 0, 0)$, $(1, 0, 0)$, $(1, 1, 0)$.

19. Find the surface area of the boundary of the solid region that is common to the two cylinders $y^2 + z^2 = a^2$ and $z^2 + x^2 = a^2$.

20. Use the change of variables $u = x - y$, $v = x + y$ to integrate the function $\exp[(x - y)/(x + y)]$ over the triangle with vertices $(0, 0)$, $(1, 0)$, $(0, 1)$.

21. Integrate $x^2 + y^2$ over (i) the cylinder $x^2 + y^2 \le 1$, $0 \le z \le 1$, (ii) the sphere $x^2 + y^2 + z^2 \le 1$, (iii) the hemisphere $x^2 + y^2 + z^2 \le 1$, $z \ge 0$. Same problem for each of the functions $x^2 + y^2 + z^2$, $(x + y + z)^2$, $yz + zx + xy$. Do all the integrals over the portion of the cone $x^2 + y^2 = z^2$ cut off by $z = 1$. Do it again for the region that is the union of this last cone and the sphere with center $(0, 0, 1)$ and radius 1.

22. Calculate the two repeated integrals of $(x^2 - y^2)/(x^2 + y^2)^2$ over the square $[0, 1] \times [0, 1]$. The two values are different! How can this be? Show that the value of the function in polar coordinates is given by $\cos(2\theta)/r^2$ and hence sketch the surface to see that it has nasty behavior near the $z$-axis.

23. Repeat the first part of the above exercise but use instead the function $(x - y)/(x + y)^3$. Again we get different values! What happens if we try to calculate the repeated integral over the triangle with vertices $(0, 0)$, $(1, 0)$, $(1, 1)$?

24. What is the volume of the intersection of the sphere $x^2 + y^2 + z^2 = 4a^2$ and the cylinder $x^2 + y^2 = a^2$? Same problem for the intersection of the sphere with the cylinder $(x - a)^2 + y^2 = a^2$. Can you do the case of the general intersection of a sphere and a cylinder? Now try the same problem when the cross-section of the cylinder is a square or an equilateral triangle.

25. Where is the centroid of a tetrahedron?

26. A region of area $S$ is rotated about a line lying in its plane (and not passing through the region). A theorem of Pappus says that the volume generated is $Sd$, where $d$ is the distance traveled by the centroid of the region. Prove it! [Hint: start with the volume formula given by the washer method, or the can method.]

27. The average distance $d$ from a point $\mathbf{r}_0$ to a region $R$ is defined by

$$A d = \iint_R |\mathbf{r} - \mathbf{r}_0|\,dx\,dy,$$

where $A$ is the area of the region $R$. Calculate this when $R$ is a disk of radius $a$ and $\mathbf{r}_0$ is the center, a point on the circumference, or any point in the plane. Try some other standard regions. Generalize to three dimensions.

28. Rotate a unit cube about a central diagonal. What is the volume (and shape?) thus generated?

29. What is the volume enclosed by the surface $\sqrt{|x|} + \sqrt{|y|} + \sqrt{|z|} = \sqrt{|a|}$? Same problem for the surface $x^4 + y^4 + z^4 = a^4$.

30. Integrate the function $xyz$ over the portion of the solid cylinder $x^2 + y^2 \le 1$ from $z = 0$ to $z = 1$ that is contained in the positive octant. Same problem when the cylindrical can is replaced by the solid sphere $x^2 + y^2 + z^2 \le 1$.

31. For a positive constant $k$, integrate $\rho e^{-k\rho^2}$ and $(1 + \rho^2)^{-2}$ over the whole of 3-space.

32. What is the change of infinitesimal volume under the change of variables $x = au$, $y = bv$, $z = cw$? Generalize spherical coordinates to ellipsoidal coordinates; what is the formula for the change in infinitesimal volume? Find the mass of the ellipsoid $x^2/a^2 + y^2/b^2 + z^2/c^2 \le 1$ when the density function is given by $1 - (x^2/a^2 + y^2/b^2 + z^2/c^2)$. Try elliptical coordinates to find the surface area of the ellipsoid.

33. The sphere $x^2 + y^2 + z^2 = a^2$ lies within the cylinder $x^2 + y^2 = a^2$. Prove that the area of the slice of the sphere from $z = b$ to $z = c$ is the same as the area of the slice of the cylinder from $z = b$ to $z = c$. How could Archimedes prove this without the calculus?

4 VECTOR CALCULUS

We have already discussed various examples of vector valued functions. For example, a path in 3-space may be regarded as given by a vector valued function of a single real variable $t$. For such a path we have three other associated vector valued functions of $t$, namely $\mathbf{t}$, $\mathbf{n}$, $\mathbf{b}$. As another example, the surface $F(x, y, z) = c$ may be parameterized by a vector valued function of two real variables $u, v$. At each point on the surface, the gradient vector $\nabla F$ gives a normal vector to the surface, and hence a vector valued function of the two real variables $u, v$. A vector field is just a vector valued function. For us, it may be a function of one, two, or three real variables; the vector values may lie in 2-space or in 3-space. There are many examples of vector fields in physics; the vector at each point may represent force, velocity, or acceleration.

The most general vector field in three variables may be written as

$$\mathbf{F}(x, y, z) = P(x, y, z)\,\mathbf{i} + Q(x, y, z)\,\mathbf{j} + R(x, y, z)\,\mathbf{k}$$

or, more briefly, as $\mathbf{F} = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}$.

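The nicest vector fields are the gradient vector fields of the form

$$\mathbf{F} = \nabla f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} + \frac{\partial f}{\partial z}\mathbf{k}.$$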

The important gravitational force field is a gradient vector field. Let the scalar valued function $f$ be defined by $f(x, y, z) = 1/\rho$, where, as in spherical coordinates, $\rho^2 = x^2 + y^2 + z^2$. Then $2\rho\,\frac{\partial \rho}{\partial x} = 2x$, etc., and so we get

$$\nabla f = \frac{-1}{\rho^2}\left[\frac{x}{\rho}\mathbf{i} + \frac{y}{\rho}\mathbf{j} + \frac{z}{\rho}\mathbf{k}\right],$$

and we see that this force is inversely proportional in magnitude to the square of the distance from the origin and that it points towards the origin. This is just the gravitational force field (up to a gravitational constant).
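A quick numerical check of this gradient: central differences at a sample point (here $(1, 2, 2)$, so that $\rho = 3$, chosen for convenience) should reproduce $-\frac{1}{\rho^2}\cdot\frac{\mathbf{r}}{\rho}$, a vector of length $1/\rho^2 = 1/9$ pointing back towards the origin. The step size `h` and the helper name `grad_central` are our own assumptions.

```python
import math

def f(x, y, z):
    """f = 1/rho, where rho is the distance from the origin."""
    return 1.0 / math.sqrt(x * x + y * y + z * z)

def grad_central(func, p, h=1e-5):
    """Central-difference approximation to the gradient of func at the point p."""
    g = []
    for k in range(3):
        forward, backward = list(p), list(p)
        forward[k] += h
        backward[k] -= h
        g.append((func(*forward) - func(*backward)) / (2 * h))
    return tuple(g)

p = (1.0, 2.0, 2.0)
rho = math.sqrt(sum(c * c for c in p))   # rho = 3
numeric = grad_central(f, p)
exact = tuple(-c / rho ** 3 for c in p)   # -(1/rho^2) times the unit position vector
print(numeric)
print(exact)
print(math.sqrt(sum(c * c for c in numeric)), 1 / rho ** 2)  # both about 1/9
```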

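There are two important operators associated with a general vector field $\mathbf{F}$; physical motivation for their definition will come later. The divergence of $\mathbf{F}$ is defined by

$$\operatorname{div}\mathbf{F} = \frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y} + \frac{\partial R}{\partial z}.$$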

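Notice that it is a scalar valued function. For example,

$$\operatorname{div}\nabla f = \frac{\partial^2 f}{\partial x^2} + \frac{\partial^2 f}{\partial y^2} + \frac{\partial^2 f}{\partial z^2}.$$

This expression is called the Laplacian of $f$ and is the most important differential formula in applied mathematics. Suppose that we define $\nabla$ on its own to mean

$$\nabla = \frac{\partial}{\partial x}\mathbf{i} + \frac{\partial}{\partial y}\mathbf{j} + \frac{\partial}{\partial z}\mathbf{k}.$$

Then we may write $\operatorname{div}\mathbf{F}$ symbolically as

$$\operatorname{div}\mathbf{F} = \nabla \cdot \mathbf{F}.$$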

(Note that all of the above have analogues for functions of two variables; we just omit the $z$ and $\mathbf{k}$ terms.)

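This latter formula suggests that we might consider a vector valued function given by $\nabla \times \mathbf{F}$. We call this $\operatorname{curl}\mathbf{F}$, and by analogy with the determinantal formula for the vector product we write

$$\operatorname{curl}\mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \dfrac{\partial}{\partial x} & \dfrac{\partial}{\partial y} & \dfrac{\partial}{\partial z} \\ P & Q & R \end{vmatrix}.$$

The above is the easy way to remember the formula for $\operatorname{curl}\mathbf{F}$. The general (unmemorable) formula is given by

$$\operatorname{curl}\mathbf{F} = \left(\frac{\partial R}{\partial y} - \frac{\partial Q}{\partial z}\right)\mathbf{i} + \left(\frac{\partial P}{\partial z} - \frac{\partial R}{\partial x}\right)\mathbf{j} + \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right)\mathbf{k}.$$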

We can combine the operators $\nabla$, div, and curl, provided the actual combinations make sense. There are two nice cases:

$$\operatorname{curl}\nabla f = \mathbf{0}, \qquad \operatorname{div}\operatorname{curl}\mathbf{F} = 0.$$

The first is suggested by the fact that $\mathbf{u} \times \mathbf{u} = \mathbf{0}$, and the second by the fact that $\mathbf{u} \times \mathbf{v}$ is perpendicular to $\mathbf{u}$. In fact, both are true because second order mixed partial derivatives are equal.
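The two identities can be watched in action with finite differences. In the hypothetical sketch below, `partial`, `grad`, `div` and `curl` are central-difference versions of the operators (our own helpers, with an assumed step `H = 1e-4`), applied to sample fields of our choosing. Central differences in different coordinate directions commute exactly, just as mixed partial derivatives do, so both results come out as zero up to rounding error.

```python
import math

H = 1e-4  # finite-difference step (an arbitrary choice)

def partial(g, p, k):
    """Central-difference approximation to the k-th partial derivative of scalar g at p."""
    forward, backward = list(p), list(p)
    forward[k] += H
    backward[k] -= H
    return (g(*forward) - g(*backward)) / (2 * H)

def grad(f):
    return lambda *p: tuple(partial(f, p, k) for k in range(3))

def component(F, k):
    return lambda *p: F(*p)[k]

def div(F):
    return lambda *p: sum(partial(component(F, k), p, k) for k in range(3))

def curl(F):
    P, Q, R = (component(F, k) for k in range(3))
    return lambda *p: (partial(R, p, 1) - partial(Q, p, 2),
                       partial(P, p, 2) - partial(R, p, 0),
                       partial(Q, p, 0) - partial(P, p, 1))

# Sample scalar function and vector field (illustrative choices).
f = lambda x, y, z: x * x * y + math.sin(y * z)
F = lambda x, y, z: (y * z * z, x ** 3, math.sin(x * y))

point = (0.3, -0.7, 1.1)
print(curl(grad(f))(*point))  # approximately (0, 0, 0)
print(div(curl(F))(*point))   # approximately 0
```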


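In many applications in physics we have spherical symmetry. Thus, let $f(x, y, z) = \phi(\rho)$, where $\rho^2 = x^2 + y^2 + z^2$. A computation gives

$$\nabla f = \phi'(\rho)\,\boldsymbol{\rho},$$

where $\boldsymbol{\rho}$ is the unit position vector given by

$$\boldsymbol{\rho} = \frac{x}{\rho}\mathbf{i} + \frac{y}{\rho}\mathbf{j} + \frac{z}{\rho}\mathbf{k}.$$

Cases of special interest are $\phi(\rho) = \rho^c$ and $\phi(\rho) = \log\rho$. It is a worthwhile exercise to calculate $\operatorname{curl}\phi(\rho)\boldsymbol{\rho}$ for general $\phi$ and also for the above special cases.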

For our next theme it helps to think of our vector field as a force field. In an earlier Calculus course we studied the physical notion of work in the 1-dimensional setting. For that case, the work done by a constant force $F$ in moving along a straight line a distance $c$ is $Fc$. Suppose now the force varies, $F(x)$. The work done in moving a small distance $dx$ is then $F(x)\,dx$, and we get the total work by integrating. Suppose now we have a force field in 3-space,

$$\mathbf{F} = P\mathbf{i} + Q\mathbf{j} + R\mathbf{k}.$$

Suppose a particle moves through this force field along the path $\Gamma$ given by $\mathbf{r}(t) = \langle x(t), y(t), z(t) \rangle$ for $t$ from $a$ to $b$ (we call the first point on $\Gamma$ the point $A$, and the last point $B$). By physical considerations we calculate the work done over a small length $ds$ by using only the component of the force in the direction of the unit tangent vector, namely $\mathbf{F} \cdot \mathbf{t}$. This gives the contribution $\mathbf{F} \cdot \mathbf{t}\,ds$, and we integrate over $\Gamma$ to get the total work done. How do we integrate over $\Gamma$? We recall that

$$\mathbf{t} = \frac{dt}{ds}\langle x'(t), y'(t), z'(t) \rangle$$

and so we get $\mathbf{F} \cdot \mathbf{t}\,ds = (Px' + Qy' + Rz')\,dt$. Thus the total work done is given by

$$W = \int_\Gamma \mathbf{F} \cdot \mathbf{t}\,ds = \int_a^b (Px' + Qy' + Rz')\,dt.$$

Of course, the term $Px'$ means $P(x(t), y(t), z(t))\,x'(t)$, and similarly for the other two terms. Since $x'(t)\,dt = dx$, it is suggestive to write the work integral as

$$W = \int_\Gamma (P\,dx + Q\,dy + R\,dz).$$

This formula is intuitive physically since we are adding the work contributions in the $x$, $y$, and $z$ directions. For paths in 2-space we simply put $R = 0$. There is a subtle point which we glossed over. Our last integral formula for work gives the impression that the work depends only on the force field and the physical path followed. But we just chose one parameterization for the path, and there are infinitely many different parameterizations. How do we know that the integral is the same for each parameterization? Given one parameterization $\mathbf{r}(t)$, it is not hard to show that any other (traversing the path in the same direction) is of the form $\mathbf{r}(\phi(t))$ with $\phi$ increasing, and then the equality of the integrals follows by the substitution rule for integrals of one variable.

As an example in 2-space, take $\mathbf{F} = y\mathbf{i} - x\mathbf{j}$, and let us consider various paths that join $A = (0, 0)$ to $B = (1, 1)$. For the straight line path from $A$ to $B$ we have $x = t$, $y = t$ for $t$ from 0 to 1. Then the work done is given by

$$W = \int_\Gamma y\,dx - x\,dy = \int_0^1 (t\,dt - t\,dt) = 0.$$

Suppose now we take the straight line path from $A$ to $(1, 0)$ followed by the straight line path from $(1, 0)$ to $B$. For the first segment, we have $x = t$, $y = 0$ for $t$ from 0 to 1; for the second segment, we have $x = 1$, $y = t$ for $t$ from 0 to 1. Hence the work done is given by

$$W = \int_\Gamma y\,dx - x\,dy = \int_0^1 0\,dt + \int_0^1 (-1)\,dt = -1.$$

Suppose now we take the parabolic path $y = x^2$; thus $x = t$, $y = t^2$ for $t$ from 0 to 1. Hence the work done is given by

$$W = \int_\Gamma y\,dx - x\,dy = \int_0^1 (t^2\,dt - t \cdot 2t\,dt) = -\frac{1}{3}.$$
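The three work integrals can be reproduced from the formula $W = \int_a^b (Px' + Qy')\,dt$. In this minimal sketch each path is supplied as a pair of functions giving $(x(t), y(t))$ and $(x'(t), y'(t))$ (hypothetical helpers of our own), and Simpson's rule does the $t$-integration, exactly here since every integrand is a polynomial of low degree.

```python
def simpson(f, a, b, n=100):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def work(P, Q, path, velocity, a=0.0, b=1.0):
    """Integral of P x'(t) + Q y'(t) dt along a parameterized path for t in [a, b]."""
    def integrand(t):
        x, y = path(t)
        dx, dy = velocity(t)
        return P(x, y) * dx + Q(x, y) * dy
    return simpson(integrand, a, b)

# The force field F = y i - x j.
P = lambda x, y: y
Q = lambda x, y: -x

straight = work(P, Q, lambda t: (t, t), lambda t: (1.0, 1.0))
broken = (work(P, Q, lambda t: (t, 0.0), lambda t: (1.0, 0.0))     # A to (1, 0)
          + work(P, Q, lambda t: (1.0, t), lambda t: (0.0, 1.0)))  # (1, 0) to B
parabola = work(P, Q, lambda t: (t, t * t), lambda t: (1.0, 2 * t))

print(straight, broken, parabola)  # approximately 0, -1, -1/3
```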

In each case, we get a different value for the work done. Are there force fields in which the work done is the same for all paths from $A$ to $B$? If we change the above force field to $\mathbf{F} = y\mathbf{i} + x\mathbf{j}$ then the work done is $\int_\Gamma y\,dx + x\,dy$, and this ought to be $\int_\Gamma d(xy)$, which ought to be $xy$ calculated at $B$ minus $xy$ calculated at $A$; in other words, independent of the actual path which joins $A$ to $B$. We can justify this since we have $\mathbf{F} = \nabla f$ with $f(x, y) = xy$. In general, suppose that $\mathbf{F} = \nabla f$. Then the work done is given by

$$W = \int_a^b \left(\frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}\right) dt = \int_a^b \frac{d}{dt} f(x(t), y(t))\,dt = f(B) - f(A).$$
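For the gradient field $\mathbf{F} = y\mathbf{i} + x\mathbf{j} = \nabla(xy)$ the same kind of computation should give $f(B) - f(A) = 1$ along every path from $(0, 0)$ to $(1, 1)$. The sketch below tries the three paths of the previous example plus a non-polynomial one, $y = \sin(\pi x/2)$, which is our own illustrative choice; the helpers `simpson` and `work` are hypothetical.

```python
import math

def simpson(f, a, b, n=200):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def work(P, Q, path, velocity, a=0.0, b=1.0):
    """Integral of P x'(t) + Q y'(t) dt along a parameterized path for t in [a, b]."""
    def integrand(t):
        x, y = path(t)
        dx, dy = velocity(t)
        return P(x, y) * dx + Q(x, y) * dy
    return simpson(integrand, a, b)

P = lambda x, y: y  # F = y i + x j = grad(xy)
Q = lambda x, y: x

paths = {
    "straight line": (lambda t: (t, t), lambda t: (1.0, 1.0)),
    "parabola": (lambda t: (t, t * t), lambda t: (1.0, 2 * t)),
    "sine curve": (lambda t: (t, math.sin(math.pi * t / 2)),
                   lambda t: (1.0, (math.pi / 2) * math.cos(math.pi * t / 2))),
}
results = {name: work(P, Q, p, v) for name, (p, v) in paths.items()}
results["via (1, 0)"] = (work(P, Q, lambda t: (t, 0.0), lambda t: (1.0, 0.0))
                         + work(P, Q, lambda t: (1.0, t), lambda t: (0.0, 1.0)))

for name, w in results.items():
    print(name, w)  # each approximately f(1, 1) - f(0, 0) = 1
```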


[You should check that this formula also holds in 3-space.] We say that the force field $\mathbf{F}$ is conservative if the work done in going from $A$ to $B$ is independent of the path joining $A$ to $B$. Are there any conservative force fields other than gradient vector fields? No, as we now show.

Suppose that $\mathbf{F}$ is a force field in some region $R$ which is conservative. [The region $R$ could have “holes” in it; all that matters is that it consist of one piece, so that we can get a path from any $A$ to any $B$ within $R$. We shall need to consider “holes” later when we deal with force fields (like gravitation) that have a singularity at some point, where the force becomes infinite.] We have to define $f(x, y)$ so that $\mathbf{F} = \nabla f$. Fix any point $A$ in the region $R$. Let $B = (x, y)$ and let $f(x, y)$ be the work done in moving from $A$ to $B$ along some path. Since $\mathbf{F}$ is conservative, it does not matter which path we choose. Let $C = (x + h, y)$. To get $f(x + h, y)$ we find the work done in going from $A$ to $C$. We can do this by going along the above path from $A$ to $B$ and then along the line segment $L$ from $B$ to $C$. Hence

$$f(x + h, y) - f(x, y) = \int_L P\,dx + Q\,dy = \int_x^{x+h} P(t, y)\,dt$$

and it follows easily (divide by $h$ and let $h \to 0$) that $\partial f/\partial x = P(x, y)$. A similar argument gives $\partial f/\partial y = Q(x, y)$, and the proof is complete. [Check the 3-space case for yourself.]

In the course of the above discussion, we implicitly used the following facts. If the path $\Gamma$ from $A$ to $C$ consists of the path $\Gamma_1$ from $A$ to $B$, followed by the path $\Gamma_2$ from $B$ to $C$, then we have

$$\int_\Gamma P\,dx + Q\,dy = \int_{\Gamma_1} P\,dx + Q\,dy + \int_{\Gamma_2} P\,dx + Q\,dy.$$

If we reverse the direction on the path $\Gamma$ by going from $C$ to $A$, then we are taking $t$ from $b$ to $a$. We call this path $-\Gamma$. Since $\int_b^a = -\int_a^b$, we see that

$$\int_{-\Gamma} P\,dx + Q\,dy = -\int_\Gamma P\,dx + Q\,dy.$$

When $\mathbf{F}$ is a conservative force field and $\Gamma$ is a closed loop from $A$ to $A$, then the work done in going around $\Gamma$ is 0. Conversely, suppose that we have a force field $\mathbf{F}$ in which the work done around any closed loop is 0. Let $\Gamma_1$ and $\Gamma_2$ be any two paths from $A$ to $B$. Form a closed loop by following $\Gamma_1$ by $-\Gamma_2$. There is zero work done around this closed loop, and it follows from the above discussion that the work is the same along $\Gamma_1$ as along $\Gamma_2$. Hence $\mathbf{F}$ is conservative (and so is a gradient vector field).

There is a convenient way to distinguish between a path from $A$ to $B$ and a closed loop. When $\Gamma$ is a closed loop, we denote the integral along $\Gamma$ by

$$\oint_\Gamma P\,dx + Q\,dy.$$

Evidently there are two directions on a closed loop. For simple examples, it is intuitively clear which direction is positive or counterclockwise, and which is negative or clockwise.

Suppose that $\mathbf{F}$ is a non-conservative vector field and that $\Gamma$ is a closed loop in 2-space. There is a remarkably powerful formula that calculates the work done in going round the loop in terms of a double integral calculated over the inside of the loop. This result, called Green’s Theorem, assumes that there are no singularities of $\mathbf{F}$ inside $\Gamma$ (we shall see below how to modify the formula when singularities are present). As usual, $\mathbf{F} = P\mathbf{i} + Q\mathbf{j}$ and $\Gamma$ is traversed in the positive direction; we write $i(\Gamma)$ for the region inside $\Gamma$. Then

$$\oint_\Gamma P\,dx + Q\,dy = \iint_{i(\Gamma)} \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy.$$

We shall prove Green’s Theorem by a pattern that will be repeated throughout this chapter. We prove the result in a very simple case and then we do a patchwork quilt job for the general case. Suppose first that $\Gamma$ is a rectangle with sides $x = a_1$, $x = a_2$ and $y = b_1$, $y = b_2$. Then $\Gamma$ consists of four line segments; we parameterize the horizontal ones by $x = t$ and the vertical ones by $y = t$. Thus

$$\oint_\Gamma P\,dx + Q\,dy = \int_{a_1}^{a_2} P(t, b_1)\,dt + \int_{b_1}^{b_2} Q(a_2, t)\,dt + \int_{a_2}^{a_1} P(t, b_2)\,dt + \int_{b_2}^{b_1} Q(a_1, t)\,dt.$$

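The two $Q$ integrals give a difference that we recognize as coming from an integration:

$$\int_{b_1}^{b_2} [Q(a_2, t) - Q(a_1, t)]\,dt = \int_{b_1}^{b_2} \int_{a_1}^{a_2} \frac{\partial Q}{\partial x}(s, t)\,ds\,dt.$$

A similar formula applies to the two $P$ integrals (but with a different sign),

$$\int_{a_1}^{a_2} [P(t, b_1) - P(t, b_2)]\,dt = -\int_{a_1}^{a_2} \int_{b_1}^{b_2} \frac{\partial P}{\partial y}(t, s)\,ds\,dt,$$

and hence we get Green’s formula for a rectangle.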

Now consider two such rectangles that meet along part of one edge, but do not overlap otherwise. Apply Green’s formula to each of the rectangles. The double integrals add up to give the appropriate integral over the region formed by combining the two rectangles. The common path on one edge is traversed in opposite directions and so the integrals cancel out along that piece. We are left with the outer boundary traversed in the positive direction. (A picture may help.) Keep adding on more and more rectangles and “in the limit” we get a general path $\Gamma$. [For those who worry about whether we can really approximate the integral around $\Gamma$ by using only horizontal and vertical line segments, there is a half-way house of first proving the result for a triangle (with one side horizontal and one side vertical) by just a slight elaboration of the above argument for a rectangle.]

With $P = 0$ and $Q = x$ we get

$$\oint_\Gamma x\,dy = \iint_{i(\Gamma)} dx\,dy = \operatorname{area}(i(\Gamma)),$$

and with $P = -y$ and $Q = 0$ we get

$$\oint_\Gamma -y\,dx = \iint_{i(\Gamma)} dx\,dy = \operatorname{area}(i(\Gamma)).$$

(Explain why these formulas are obvious from the viewpoint of cutting the region into thin strips. Similarly obtain the polar coordinate formula $\oint_\Gamma \frac{1}{2} r^2\,d\theta$.)

In computations it turns out to be best to take the average of these two formulas to give

$$\operatorname{area}(i(\Gamma)) = \oint_\Gamma -\frac{1}{2}y\,dx + \frac{1}{2}x\,dy.$$

As a nice application of this (used in modern surveying) we shall find the area of the polygon with vertices $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. We just have to calculate the path integral on one line segment and then the rest of the formula will be obvious. The path from $(x_1, y_1)$ to $(x_2, y_2)$ is given by

$$(x, y) = (x_1, y_1) + t(x_2 - x_1, y_2 - y_1)$$

for $t$ from 0 to 1. Thus $dx = (x_2 - x_1)\,dt$ and $dy = (y_2 - y_1)\,dt$, and a “fortuitous” cancellation gives the path integral from $(x_1, y_1)$ to $(x_2, y_2)$ as

$$\frac{1}{2}\int_0^1 [-y_1(x_2 - x_1) + x_1(y_2 - y_1)]\,dt = \frac{1}{2}[x_1 y_2 - x_2 y_1].$$

The area inside $\Gamma$ is just the sum of all such terms (with the vertices listed in counterclockwise order), and the last term is $\frac{1}{2}[x_n y_1 - x_1 y_n]$. Notice that each term in brackets is a $2 \times 2$ determinant. Notice also that the proof does not require that the polygon be convex; we can have re-entrant vertices.
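The polygon formula (often called the shoelace formula) is short enough to code directly. In this sketch the vertices are a list of $(x, y)$ pairs in counterclockwise order; listing them clockwise flips the sign. The L-shaped example, which has a re-entrant vertex at $(1, 1)$, is our own.

```python
def polygon_area(vertices):
    """Area of a simple polygon from Green's theorem:
    (1/2) * sum over edges of (x_k * y_{k+1} - x_{k+1} * y_k), indices taken cyclically.
    Positive for counterclockwise vertex order, negative for clockwise."""
    n = len(vertices)
    total = 0.0
    for k in range(n):
        x1, y1 = vertices[k]
        x2, y2 = vertices[(k + 1) % n]  # the last edge wraps around to the first vertex
        total += x1 * y2 - x2 * y1
    return total / 2

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]
l_shape = [(0, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2)]  # non-convex

print(polygon_area(unit_square))    # 1.0
print(polygon_area(l_shape))        # 3.0
print(polygon_area(l_shape[::-1]))  # -3.0 (clockwise order)
```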

As a more substantial application we can calculate the area inside the loop in the Folium of Descartes given by $x^3 + y^3 = 3axy$. Recall that a parameterization is given by

$$x = \frac{3at}{1 + t^3}, \quad y = \frac{3at^2}{1 + t^3}$$

and that the loop is given by $t$ from 0 to $\infty$. [Those who worry about $\infty$ can calculate the area of half the loop by going along the folium from $t = 0$ to $t = 1$ and then back down the line segment from $(3a/2, 3a/2)$ to $(0, 0)$, but there is no need for such squeamishness.] A little calculation and more “fortuitous” cancellation (since $y/x = t$, we have $x\,dy - y\,dx = x^2\,d(y/x) = x^2\,dt$) gives the area as

$$\int_0^\infty \frac{9a^2 t^2}{2(1 + t^3)^2}\,dt = \frac{3}{2}a^2.$$
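The folium computation can be checked numerically without the “fortuitous” cancellation, by evaluating $\frac{1}{2}\int (x y' - y x')\,dt$ directly from the parameterization. This is a sketch under our own assumptions: $a = 1$, the infinite range is truncated at $t = 200$ (the neglected tail is about $3a^2/(2 \cdot 200^3)$), and Simpson's rule does the integration.

```python
def simpson(f, a, b, n):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

A = 1.0  # the constant a in x^3 + y^3 = 3axy

def half_cross(t):
    """(1/2)(x y' - y x') for x = 3At/(1 + t^3), y = 3At^2/(1 + t^3)."""
    d = 1 + t ** 3
    x, y = 3 * A * t / d, 3 * A * t * t / d
    dx = 3 * A * (1 - 2 * t ** 3) / d ** 2  # derivative of x with respect to t
    dy = 3 * A * (2 * t - t ** 4) / d ** 2  # derivative of y with respect to t
    return 0.5 * (x * dy - y * dx)

area = simpson(half_cross, 0.0, 200.0, 40000)
print(area, 1.5 * A ** 2)
```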

Often, but not always, the double integral in Green’s Theorem is less messy to compute than the path integral, especially if we have to use several different formulas to describe the path.

What happens when there is a singularity inside $\Gamma$? Suppose that

$$\mathbf{F} = \frac{-y}{x^2 + y^2}\mathbf{i} + \frac{x}{x^2 + y^2}\mathbf{j}$$

and that $\Gamma$ is the ellipse $x^2/3^2 + y^2/4^2 = 1$. We build a fence around the singularity! In this case, take $\Gamma_1$ to be the circle $x^2 + y^2 = 1$. Join the circle to the ellipse by two line segments along the $x$-axis. Now we get two closed loops to which we can apply Green’s Theorem (since there is no singularity inside either of them). Again we get cancellation on the two line segments on the $x$-axis, and if $A$ is the annular region between $\Gamma$ and $\Gamma_1$ then

$$\oint_\Gamma (P\,dx + Q\,dy) - \oint_{\Gamma_1} (P\,dx + Q\,dy) = \iint_A \left(\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right) dx\,dy.$$

For this example we can check that $\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y} = 0$, and so the double integral is zero. Hence the path integrals are equal (this is true for any $\Gamma$ that encloses the origin) and it is very easy to calculate the one around the circle. We just take $x = \cos(\theta)$, $y = \sin(\theta)$ for $\theta$ from 0 to $2\pi$ and we get

$$\oint_{\Gamma_1} \frac{-y}{x^2 + y^2}\,dx + \frac{x}{x^2 + y^2}\,dy = \int_0^{2\pi} [-\sin(\theta)(-\sin(\theta)\,d\theta) + \cos(\theta)\cos(\theta)\,d\theta] = 2\pi.$$
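Both claims, that the circulation of this field is $2\pi$ around a loop enclosing the origin and (by Green’s Theorem with no singularity) $0$ around a loop that does not, can be tested numerically. In the sketch below the loops are parameterized counterclockwise over $[0, 2\pi]$ and the integral is approximated by an equally spaced sum, which converges very quickly for smooth periodic integrands. The off-center circle is our own example.

```python
import math

def P(x, y):
    return -y / (x * x + y * y)

def Q(x, y):
    return x / (x * x + y * y)

def circulation(curve, velocity, n=2000):
    """Approximate the loop integral of P dx + Q dy for a closed curve on [0, 2*pi]."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        x, y = curve(k * h)
        dx, dy = velocity(k * h)
        total += P(x, y) * dx + Q(x, y) * dy
    return total * h

ellipse = circulation(lambda s: (3 * math.cos(s), 4 * math.sin(s)),
                      lambda s: (-3 * math.sin(s), 4 * math.cos(s)))
off_center = circulation(lambda s: (5 + math.cos(s), math.sin(s)),
                         lambda s: (-math.sin(s), math.cos(s)))
print(ellipse, 2 * math.pi)  # agree to many decimal places
print(off_center)            # approximately 0
```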

When more than one singularity is present, we just put a fence around each one and we get the obvious generalization of the above formula.

All of the above discussion was motivated by the work integral

$$W = \oint_\Gamma \mathbf{F} \cdot \mathbf{t}\,ds.$$

For the mathematician and the physicist we should expect some significance from the integral

$$\oint_\Gamma \mathbf{F} \cdot \mathbf{n}\,ds.$$

This occurs for the physicist as follows. Suppose that $\mathbf{F}$ describes a flow in the plane, for example as a velocity vector field. We want to calculate the total flux outwardly across $\Gamma$. Here $\mathbf{n}$ is the outward unit normal vector at a point on $\Gamma$. As with work, we measure only the component of the flow that is perpendicular to the path $\Gamma$. This component is given by $\mathbf{F} \cdot \mathbf{n}$, and so the flow across a small piece of $\Gamma$ is given by $\mathbf{F} \cdot \mathbf{n}\,ds$. We get the total flow (or flux) by integrating over all of $\Gamma$. As usual, we take $\mathbf{F} = P\mathbf{i} + Q\mathbf{j}$ and we parameterize $\Gamma$ by $(x(t), y(t))$ for $t$ from $a$ to $b$. We have $\mathbf{t} = \frac{dt}{ds}\langle x', y' \rangle$, and so it follows that $\mathbf{n} = \pm\frac{dt}{ds}\langle y', -x' \rangle$. Draw a picture to convince yourself that, when $\Gamma$ is traversed in the positive direction, the outward normal is given by

$$\mathbf{n} = \frac{dt}{ds}\langle y', -x' \rangle$$

and so we get

$$\oint_\Gamma \mathbf{F} \cdot \mathbf{n}\,ds = \oint_\Gamma (Py' - Qx')\frac{dt}{ds}\,ds = \oint_\Gamma -Q\,dx + P\,dy.$$

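Now apply Green’s Theorem (provided no singularities are present) to find that the flux is given by

$$\oint_{\Gamma} \mathbf{F} \cdot \mathbf{n}\,ds = \iint_{i(\Gamma)} \left[\frac{\partial P}{\partial x} + \frac{\partial Q}{\partial y}\right] dx\,dy = \iint_{i(\Gamma)} \operatorname{div}\mathbf{F}\,dx\,dy.$$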


If Γ is the small circle with center $(x_0, y_0)$ and radius ϵ, so that i(Γ) is a small disk, then the double integral is approximately $\pi\epsilon^2\,\operatorname{div}\mathbf{F}$ (calculated at the point $(x_0, y_0)$). Hence div F gives the limiting flux per unit area at $(x_0, y_0)$, and this explains (in 2-space) why the word divergence was used for this differential expression.
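The "flux per unit area" interpretation can be seen numerically. This is a sketch under stated assumptions: the field P = x²y, Q = y³ is an arbitrary illustrative choice (so div F = 2xy + 3y², which is 16 at (1, 2)), and `flux_per_unit_area` is a hypothetical helper that integrates F · n ds around a circle of radius eps and divides by πeps².

```python
import math


def flux_per_unit_area(P, Q, x0, y0, eps, n=2000):
    """Outward flux of F = P i + Q j across the circle of radius eps centered at
    (x0, y0), divided by the area pi * eps^2 of the enclosed disk."""
    h = 2 * math.pi / n
    flux = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = x0 + eps * math.cos(t), y0 + eps * math.sin(t)
        # outward unit normal is (cos t, sin t) and ds = eps dt
        flux += (P(x, y) * math.cos(t) + Q(x, y) * math.sin(t)) * eps * h
    return flux / (math.pi * eps ** 2)


# Illustrative field (not from the notes): div F = 2xy + 3y^2, equal to 16 at (1, 2)
def P(x, y):
    return x * x * y


def Q(x, y):
    return y ** 3


for eps in (0.1, 0.01, 0.001):
    print(eps, flux_per_unit_area(P, Q, 1.0, 2.0, eps))
```

For this particular field the ratio is the average of div F over the disk, which works out to 16 + 3ϵ²/4, so the printed values settle on 16 as ϵ shrinks.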

Notice that if F is the position vector x i + y j then div F = 2 and so the flux across Γ is precisely twice the area of i(Γ).
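For a polygon the flux of the position vector can be computed exactly edge by edge from ∮ −Q dx + P dy = ∮ x dy − y dx, and on a straight edge from (x1, y1) to (x2, y2) this integral is exactly x1·y2 − x2·y1. A minimal sketch, assuming counterclockwise vertex order (the helper name is hypothetical); it is the familiar "shoelace" area formula with the factor 2 left in.

```python
def position_field_flux(vertices):
    """Outward flux of F = x i + y j across a polygon whose vertices are listed
    counterclockwise, computed edge by edge as the integral of x dy - y dx."""
    flux = 0.0
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # on a straight edge the integral of x dy - y dx is exactly x1*y2 - x2*y1
        flux += x1 * y2 - x2 * y1
    return flux


square = [(0, 0), (2, 0), (2, 2), (0, 2)]   # area 4
triangle = [(0, 0), (3, 0), (0, 1)]         # area 3/2
print(position_field_flux(square), position_field_flux(triangle))  # 8.0 3.0
```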

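When a singularity is present inside Γ, we just put a fence $\Gamma_1$ around it, with R the region between Γ and $\Gamma_1$. Now we argue just as in Green’s Theorem with a singularity to get

$$\oint_{\Gamma} \mathbf{F} \cdot \mathbf{n}\,ds - \oint_{\Gamma_1} \mathbf{F} \cdot \mathbf{n}\,ds = \iint_R \operatorname{div}\mathbf{F}\,dx\,dy.$$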

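For example, let

$$\mathbf{F} = \frac{x}{x^2 + y^2}\,\mathbf{i} + \frac{y}{x^2 + y^2}\,\mathbf{j}$$

and let Γ be the ellipse $x^2/3^2 + y^2/4^2 = 1$ and $\Gamma_1$ be the circle $x^2 + y^2 = 1$. A computation shows that div F = 0 and so the double integral over R is zero. Hence

$$\oint_{\Gamma} \mathbf{F} \cdot \mathbf{n}\,ds = \oint_{\Gamma_1} \mathbf{F} \cdot \mathbf{n}\,ds = \int_0^{2\pi} \big[-\sin(\theta)\,(-\sin(\theta)) + \cos(\theta)\cos(\theta)\big]\,d\theta = 2\pi.$$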

Notice that if we move Γ so that it no longer encloses the singularity at the origin, then the flux across Γ is zero by Green’s Theorem with no singularity (since div F = 0 everywhere inside Γ).
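Both claims (flux 2π when the ellipse surrounds the origin, flux 0 when it does not) can be checked with a short sketch. It assumes an ellipse x = cx + a cos t, y = cy + b sin t traversed counterclockwise and evaluates ∮ −Q dx + P dy by the midpoint rule; the function name and the shifted center cx = 5 are illustrative choices, not from the notes.

```python
import math


def radial_flux_across_ellipse(a, b, cx=0.0, cy=0.0, n=20000):
    """Outward flux of F = (x i + y j)/(x^2 + y^2) across the ellipse
    x = cx + a cos t, y = cy + b sin t (counterclockwise), computed as the
    line integral of -Q dx + P dy with the midpoint rule in t."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x, y = cx + a * math.cos(t), cy + b * math.sin(t)
        dx_dt, dy_dt = -a * math.sin(t), b * math.cos(t)
        r2 = x * x + y * y
        P, Q = x / r2, y / r2
        total += -Q * dx_dt + P * dy_dt
    return total * h


print(radial_flux_across_ellipse(3, 4))          # encloses the origin: about 2*pi
print(radial_flux_across_ellipse(3, 4, cx=5.0))  # origin outside: about 0
```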

Our next step is to generalize the flux theorem to 3-space. Now we want to measure the flow across a closed surface σ such as a sphere or ellipsoid or tetrahedron. The component of F normal to the surface is again given by F · n and so the flow across a small portion of surface is given by F · n dS. Recall that we had various formulas in the previous chapter for dS. We denote the region inside σ by i(σ). It is a remarkable fact that we get essentially the same formula for flux in the 3-space case; the result is called the Divergence Theorem, namely

$$\iint_{\sigma} \mathbf{F} \cdot \mathbf{n}\,dS = \iiint_{i(\sigma)} \operatorname{div}\mathbf{F}\,dV.$$

We prove the theorem first when σ is a cuboid and then generalize by the patchwork quilt (or Lego block) method.


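Let σ be the cuboid given by $a_1 \le x \le a_2$, $b_1 \le y \le b_2$, $c_1 \le z \le c_2$, and write F = P i + Q j + R k. We compute the flux by working over the opposite faces two at a time. On the top and bottom faces, we clearly have dS = dx dy. For the top face we have n = k, and for the bottom face we have n = −k. So, by the Fundamental Theorem of Calculus in the z variable, the flux across these faces is given by

$$\int_{b_1}^{b_2}\!\int_{a_1}^{a_2} [R(x, y, c_2) - R(x, y, c_1)]\,dx\,dy = \iiint_{i(\sigma)} \frac{\partial R}{\partial z}\,dx\,dy\,dz.$$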

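On the other pairs of faces we have dS = dy dz, n = ±i, and finally dS = dz dx, n = ±j; the same argument gives the contributions $\iiint_{i(\sigma)} \partial P/\partial x\,dV$ and $\iiint_{i(\sigma)} \partial Q/\partial y\,dV$. Adding the three contributions gives $\iiint_{i(\sigma)} \operatorname{div}\mathbf{F}\,dV$, and the result follows.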
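The face-by-face computation for a cuboid can be mirrored numerically. This is a sketch under assumptions: the field F = x²y i + yz j + xz² k and the box [0, 1] × [0, 2] × [0, 3] are arbitrary illustrative choices, and `midpoint_2d`/`midpoint_3d` are hypothetical helpers. For this field both sides equal 24 exactly, and the midpoint rule happens to reproduce that value up to rounding because every integrand involved is linear in each variable separately.

```python
def midpoint_2d(g, u0, u1, v0, v1, n=100):
    """Midpoint-rule approximation of the double integral of g(u, v) over a rectangle."""
    hu, hv = (u1 - u0) / n, (v1 - v0) / n
    return sum(g(u0 + (i + 0.5) * hu, v0 + (j + 0.5) * hv)
               for i in range(n) for j in range(n)) * hu * hv


def midpoint_3d(g, box, n=40):
    """Midpoint-rule approximation of the triple integral of g(x, y, z) over a box."""
    (a1, a2), (b1, b2), (c1, c2) = box
    hx, hy, hz = (a2 - a1) / n, (b2 - b1) / n, (c2 - c1) / n
    total = 0.0
    for i in range(n):
        x = a1 + (i + 0.5) * hx
        for j in range(n):
            y = b1 + (j + 0.5) * hy
            for k in range(n):
                total += g(x, y, c1 + (k + 0.5) * hz)
    return total * hx * hy * hz


# Illustrative field (not from the notes): F = x^2 y i + y z j + x z^2 k
def P(x, y, z):
    return x * x * y


def Q(x, y, z):
    return y * z


def R(x, y, z):
    return x * z * z


def div_F(x, y, z):
    return 2 * x * y + z + 2 * x * z


box = ((0.0, 1.0), (0.0, 2.0), (0.0, 3.0))
(a1, a2), (b1, b2), (c1, c2) = box

# Opposite faces two at a time: n = +/-i, n = +/-j, n = +/-k
flux = (midpoint_2d(lambda y, z: P(a2, y, z) - P(a1, y, z), b1, b2, c1, c2)
        + midpoint_2d(lambda x, z: Q(x, b2, z) - Q(x, b1, z), a1, a2, c1, c2)
        + midpoint_2d(lambda x, y: R(x, y, c2) - R(x, y, c1), a1, a2, b1, b2))
volume_integral = midpoint_3d(div_F, box)
print(flux, volume_integral)  # both equal 24 up to rounding
```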

When we start to stack bricks together we get cancellation of the flux contributions across the common pieces of faces. We need to be a little careful about how we approach the limit, but the general result is now intuitively clear. Notice that if div F = 0 (this does NOT imply that F = 0, nor even that F is constant), then we have zero total flux across σ. When F is the position vector x i + y j + z k we get div F = 3 and so the flux across σ is just three times the volume of i(σ).

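When a singularity is present inside σ, we again build a fence around it (usually with a sphere), say $\sigma_1$, and let R be the region between σ and $\sigma_1$. We use the now standard decomposition method to show that in this case

$$\iint_{\sigma} \mathbf{F} \cdot \mathbf{n}\,dS - \iint_{\sigma_1} \mathbf{F} \cdot \mathbf{n}\,dS = \iiint_{R} \operatorname{div}\mathbf{F}\,dV.$$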

As an example, consider the gravitational vector field $\mathbf{F} = -\rho^{-2}\hat{\boldsymbol{\rho}}$, where $\hat{\boldsymbol{\rho}}$ is the unit vector pointing radially outward from the origin. We have earlier seen that div F = 0, except at the singularity at the origin. Thus the flux has the same value for any closed surface σ with the origin inside it. To calculate the actual flux, we may take the unit sphere about the origin. But then F · n = −1 and so the flux is just the negative of the surface area of the sphere, namely −4π. [Aside: the following formula for dS in spherical coordinates is often helpful. Since the volume element is given by $dV = \rho^2 \sin(\phi)\,d\rho\,d\theta\,d\phi$, it is clear by the onion skin method that $dS = \rho^2 \sin(\phi)\,d\theta\,d\phi$ on the sphere of radius ρ.]
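The following sketch computes this flux numerically with the aside's formula for dS, writing the field as F = −(x i + y j + z k)/ρ³ (the same field as −ρ⁻² times the unit radial vector). Spheres are parameterized about an arbitrary center so that one can also move the origin outside; the function name and grid sizes are illustrative assumptions.

```python
import math


def gravity_flux_through_sphere(center, radius, n_phi=400, n_theta=100):
    """Outward flux of F = -(x i + y j + z k)/rho^3 through the sphere with the
    given center and radius, using dS = radius^2 sin(phi) dtheta dphi."""
    cx, cy, cz = center
    h_phi, h_theta = math.pi / n_phi, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_phi):
        phi = (i + 0.5) * h_phi
        for j in range(n_theta):
            theta = (j + 0.5) * h_theta
            # outward unit normal of the sphere at this point
            ux = math.sin(phi) * math.cos(theta)
            uy = math.sin(phi) * math.sin(theta)
            uz = math.cos(phi)
            x, y, z = cx + radius * ux, cy + radius * uy, cz + radius * uz
            rho3 = (x * x + y * y + z * z) ** 1.5
            f_dot_n = -(x * ux + y * uy + z * uz) / rho3
            total += f_dot_n * radius ** 2 * math.sin(phi)
    return total * h_phi * h_theta


print(gravity_flux_through_sphere((0, 0, 0), 1.0))    # unit sphere: about -4*pi
print(gravity_flux_through_sphere((0, 0, 0.5), 1.0))  # origin still inside: about -4*pi
print(gravity_flux_through_sphere((3, 0, 0), 1.0))    # origin outside: about 0
print(-4 * math.pi)
```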

The theme we have followed for integration theorems started with Green’s Theorem for work and then moved to Green’s Theorem for flux and then to the Divergence Theorem for flux in 3-space. In each case, an integral along a “boundary” is equal to an associated integral that takes place in one dimension higher. We have one more case to do. What happens to Green’s Theorem for work when the closed loop is in 3-space? For example we could have a circle in 3-space which we regard as the boundary of a disk in 3-space. Equally that circle might be regarded as the boundary of a hemisphere in 3-space. Or we could take a twisted loop in 3-space (not lying in any plane), dip it in soap solution and see the wire as the boundary of a soap bubble surface (these are called minimal surfaces because the surface adopts a position with minimal energy). To see what happens in these situations we go back and try to rewrite Green’s Theorem for work in 3-space language.

In the plane we have (with no singularities)

$$\oint_{\Gamma} P\,dx + Q\,dy = \iint_{i(\Gamma)} \left[\frac{\partial Q}{\partial x} - \frac{\partial P}{\partial y}\right] dx\,dy.$$

Notice that in the double integral we are integrating the third component of curl F, which we may write as curl F · k. In fact it is better to regard k as the upward pointing normal at any point of the surface i(Γ). Finally we notice that we can replace dx dy by dS. Now we can rewrite our integral formula for work as

$$\oint_{\Gamma} \mathbf{F} \cdot \mathbf{t}\,ds = \iint_{i(\Gamma)} \operatorname{curl}\mathbf{F} \cdot \mathbf{n}\,dS.$$

We can make one small readjustment. We replace i(Γ) by a surface σ in 3-space so that the boundary of σ is the closed loop Γ. There are some technical difficulties in specifying the positive direction along Γ and the upward unit normal on σ. The intuitive rule of thumb is that we are proceeding in the positive direction around Γ when the surface is on our left and the normal vector comes up through the surface. [No difficulty occurs for easy cases like hemispheres, but there are nasty (non-orientable) surfaces where we cannot make sense out of “positive” and “up”. A famous example is the Moebius band. A long strip of paper may be glued at the ends to form a bangle, but if one twist is introduced before the gluing then we get a Moebius band. A spider who walks around this band once comes back to his starting point on the other side of the paper!]

For non-nasty surfaces σ, Stokes’ Theorem states that

$$\oint_{\Gamma} \mathbf{F} \cdot \mathbf{t}\,ds = \iint_{\sigma} \operatorname{curl}\mathbf{F} \cdot \mathbf{n}\,dS.$$
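Here is a small numerical instance of Stokes’ Theorem, offered as a sketch rather than part of the notes. The field F = (−y³ + z) i + x³ j + xy k is an arbitrary illustrative choice; its curl is x i + (1 − y) j + 3(x² + y²) k. Γ is the unit circle in the xy-plane and σ is the northern unit hemisphere with upward normal, parameterized with the dS formula above. Both integrals should come out close to 3π/2.

```python
import math


# Illustrative field (not from the notes) and its curl, computed by hand
def F(x, y, z):
    return (-y ** 3 + z, x ** 3, x * y)


def curl_F(x, y, z):
    return (x, 1 - y, 3 * (x * x + y * y))


def circulation_on_equator(n=2000):
    """Integral of F . t ds once counterclockwise around the unit circle z = 0."""
    h = 2 * math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        Fx, Fy, _ = F(math.cos(t), math.sin(t), 0.0)
        total += Fx * -math.sin(t) + Fy * math.cos(t)  # r'(t) = (-sin t, cos t, 0)
    return total * h


def curl_flux_northern_hemisphere(n_phi=200, n_theta=200):
    """Flux of curl F through the upper unit hemisphere, normal pointing up and out."""
    h_phi, h_theta = (math.pi / 2) / n_phi, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_phi):
        phi = (i + 0.5) * h_phi
        for j in range(n_theta):
            theta = (j + 0.5) * h_theta
            n = (math.sin(phi) * math.cos(theta), math.sin(phi) * math.sin(theta), math.cos(phi))
            c = curl_F(*n)  # on the unit sphere the point and its unit normal coincide
            # dS = sin(phi) dtheta dphi on the unit sphere
            total += (c[0] * n[0] + c[1] * n[1] + c[2] * n[2]) * math.sin(phi)
    return total * h_phi * h_theta


print(circulation_on_equator(), curl_flux_northern_hemisphere(), 1.5 * math.pi)
```

Replacing the hemisphere by the flat unit disk gives the same value, in line with the discussion of different surfaces spanning the same loop.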

At first glance the theorem does not seem right. Clearly there are many different surfaces σ that have the same boundary loop Γ. How can they all give the same surface integral on the right hand side? Because of the Divergence Theorem! As an illustration, let Γ be the equator, $\sigma_1$ the northern hemisphere, and $\sigma_2$ the southern hemisphere. Join together the two hemispheres to give the sphere σ. Remember that we always have div curl F = 0. So the Divergence Theorem gives

$$\iint_{\sigma} \operatorname{curl}\mathbf{F} \cdot \mathbf{n}\,dS = \iiint_{i(\sigma)} \operatorname{div}\operatorname{curl}\mathbf{F}\,dV = 0.$$

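For the Divergence Theorem, the unit normal n is outward and so for the southern hemisphere it is downward instead of upward. With this adjustment of sign (taking n upward on both hemispheres) we thus get

$$\iint_{\sigma_1} \operatorname{curl}\mathbf{F} \cdot \mathbf{n}\,dS = \iint_{\sigma_2} \operatorname{curl}\mathbf{F} \cdot \mathbf{n}\,dS$$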

as claimed. For this example we get the same answer if we replace either hemisphere by the disk through the equator, in which case we are essentially back at Green’s Theorem in the plane. Thus we see that new (and essentially harder) exercises on Stokes’ Theorem call for loops Γ that are twisted in 3-space.

To prove Stokes’ Theorem, we begin with the case of a triangle. Since the integrals have a physical interpretation (so their values do not depend on how we choose the coordinate axes), we may as well regard the triangle as lying in the (x, y)-plane. But then the statement is just Green’s Theorem. Now we start gluing triangles together. The path integrals along common sides occur in opposite directions and cancel out as usual to give Stokes’ Theorem for a more complicated surface. We just keep on doing this and go to the limit. [We really have to be careful in this case to let all the triangles get infinitesimally smaller as we go to the limit.]

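We can apply Stokes’ Theorem to a small circle centered at $(x_0, y_0, z_0)$ and the disk bounded by that circle to give a physical interpretation for curl F: if the disk has radius ϵ and unit normal n, then the circulation ∮ F · t ds around the circle is approximately $\pi\epsilon^2\,\operatorname{curl}\mathbf{F} \cdot \mathbf{n}$, so curl F · n is the limiting circulation per unit area about the axis n. In the context of fluid dynamics, curl F was interpreted as a rotational effect, and in old textbooks it is denoted by rot F!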


PROBLEMS 4


1. Calculate curl(F), curl(curl(F)), div(curl(F)), ∇(div(F)) in each of the following cases:

$$x^2\mathbf{i} + y^2\mathbf{j} + z^2\mathbf{k}, \qquad x^2 z\,\mathbf{i} - 2xz\,\mathbf{j} + z^3\mathbf{k}, \qquad (x/\rho)\,\mathbf{i} + (y/\rho)\,\mathbf{j} + (z/\rho)\,\mathbf{k}$$

where, as usual, $\rho^2 = x^2 + y^2 + z^2$.

2. Write p for the position vector x i + y j + z k. Calculate ∇ · p and ∇ × p. Repeat the exercise when p is replaced by $\rho^k\mathbf{p}$, where k is a real number (positive or negative).

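3. Find an equivalent formula for each of the following:

$$\nabla \cdot (f\mathbf{F}), \qquad \nabla \times (f\mathbf{F}), \qquad \nabla \cdot (\mathbf{F} \times \mathbf{G}), \qquad \nabla \cdot \nabla(fg).$$

Let Δ be the Laplacian in three (Cartesian) variables. Prove that

$$\nabla \times (\nabla \times \mathbf{F}) = \nabla(\nabla \cdot \mathbf{F}) - \Delta\mathbf{F}$$

and

$$\nabla \cdot (f\nabla g - g\nabla f) = f\Delta g - g\Delta f.$$

Find another formula for $\nabla \cdot (\nabla f \times \nabla g)$ and prove also that $\nabla \times (f\nabla f) = \mathbf{0}$.

4. Evaluate $\int_{\Gamma} (x + y)^2\,dx + (x - y)^2\,dy$ where Γ is the path given by $\mathbf{r}(t) = (1 + t)\mathbf{i} + t^2\mathbf{j}$ for t from 0 to 1. Same problem when Γ is the unit circle (in the positive direction). Same problem when Γ is the ellipse $4x^2 + y^2 = 4$ (in the positive direction).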

5. Calculate the work done by the force field

$$\mathbf{F} = (3x^3 + y)\,\mathbf{i} + 4xy^2\,\mathbf{j}$$

in moving a particle in the positive direction once all around the boundary of the region enclosed by $y = \sqrt{x}$, $y = 0$, $x = 4$.

6. Repeat problem 5, but calculate the outward flux across the boundary.


7. Let F be the force field $r^k(x\mathbf{i} + y\mathbf{j})$, where $r^2 = x^2 + y^2$. Calculate the work done in moving a particle in the positive direction once around the unit circle when (i) k > 0, (ii) k < 0. Same problem when the circle is replaced by the ellipse $4x^2 + y^2 = 1$. For which values of k is the force field a gradient vector field?

8. Repeat problem 7 for flux instead of work.

9. Let $C_n$ be the circle centered at the origin with radius n. Calculate

$$\oint_{C_2} \mathbf{F} \cdot \mathbf{n}\,ds - \oint_{C_1} \mathbf{F} \cdot \mathbf{n}\,ds$$

where $\mathbf{F} = (x/r)\,\mathbf{i} + (y/r)\,\mathbf{j}$ and $r^2 = x^2 + y^2$.

10. A cylindrical can, of base radius 1 and height 4, sits on the xy-plane with the center of its base on the origin. Find the outward flux through the can of the vector field $\mathbf{F} = x^2\mathbf{i} + y^2\mathbf{j} + z^2\mathbf{k}$. Same problem for the case $\mathbf{F} = x\mathbf{i} + y^2\mathbf{j} + z^3\mathbf{k}$.

11. Repeat problem 10 when the can is replaced by the unit sphere centered at the origin.

12. Let $\mathbf{F} = (6x^5 - 3x^2 + 12x^3y - 3y^4)\,\mathbf{i} + (3x^4 - 12xy^3 - 6y^5 + 3y^2)\,\mathbf{j}$. Show that F is conservative and find f such that F = ∇f. Same problem when

$$\mathbf{F} = [\cos(xy) - xy\sin(xy) + yz\sin(zx)]\,\mathbf{i} - [x^2\sin(xy) + \cos(zx) + z\sin(yz)]\,\mathbf{j} + [xy\sin(zx) - y\sin(yz)]\,\mathbf{k}.$$

13. Calculate the upward flux through the hemisphere $x^2 + y^2 + z^2 = 1$, $z \ge 0$ of the vector field $\mathbf{F} = x^2\mathbf{i} + y^2\mathbf{j} + z^2\mathbf{k}$. Same problem when $\mathbf{F} = y^2\mathbf{i} + z^2\mathbf{j} + x^2\mathbf{k}$.

14. Repeat problem 13 for the surface which is the bounded portion of the surface $z = 12 - x^2 - 2y^2$ that is cut off by the surface $z = 2x^2 + y^2$.

15. Suppose that curl(F) = 0 in all of 3-space. Prove that F = ∇f for some function f. Show by example that the conclusion can fail if the given equation just holds everywhere in 3-space except on one line (for example, the z-axis).

16. Suppose that div(F) = 0 in all of 3-space. Prove that F = curl(G) for some vector field G. [Hint: look for a G of the form P i + Q j.] Show by example that the conclusion can fail if the given equation just holds everywhere in 3-space except for one point.

17. You want to explain to someone the key ideas and theorems of vector calculus (vector fields, conservative fields, gradient fields, work, independence of path, flux, boundary theorems for integrals, etc.). Write a suitable essay.


18. A particle is on the z-axis at a height greater than a. A sphere of uniform density has center at the origin and radius a. Prove that the total gravitational force exerted on the particle by the sphere is the same as the gravitational force exerted by a particle at the origin with mass equal to that of the sphere. Isaac Newton did it by an ingenious argument; you can do it by hard calculus!

19. A planimeter is a device for measuring the area of a planar region by traversing its boundary. A rod of length a rotates about one end-point A. The other end-point B is hinged to another rod of length b. A pencil attached to the other end C of the second rod traces out a region R (that does not include A inside it). The external angle between the rods at B is χ. Use the polar version of Green’s line integral for area to prove that the area of R is given by $ab\oint_{\Gamma} \cos\chi\,d\theta$, where Γ is the boundary of R (with the positive direction).

Now let Θ be the angle between the x-axis and the rod AB. Prove that the area is equally given by $ab\oint_{\Gamma} \cos\chi\,d\Theta$. Interpret an infinitesimal part of this area as a rectangle with one vertex at B. What kind of measuring device may we attach at B to calculate this line integral?


APPENDIX

We gather together here various odds and ends that were omitted in the previous chapters to avoid disrupting the main flow of ideas.

We begin with the formula for change of variable in a double integral. We have already considered the change from Cartesians to polars. We consider next the linear case and the general case follows readily. We change variables by the equations

$$x = \alpha + \lambda u + \mu v, \qquad y = \beta + \rho u + \sigma v.$$

We consider a (u, v) rectangle with corners at the points (a, b) and (a + du, b + dv). Under the linear change of variables this rectangle corresponds to a parallelogram in the (x, y)-plane. We can calculate the area of this parallelogram as soon as we know vectors for its sides. The point (a, b) in the (u, v)-plane becomes the point (α + λa + μb, β + ρa + σb) in the (x, y)-plane. Similarly we get the image of (a + du, b) in the (x, y)-plane and then subtraction gives us the vector ⟨λ du, ρ du⟩. A similar calculation gives the other vector as ⟨μ dv, σ dv⟩. The area of this parallelogram is given by the magnitude of the vector product and hence is given by

$$|\lambda\sigma - \mu\rho|\,du\,dv.$$

Under the change of variable, the function f(x, y) becomes the function F(u, v) and the region R in the (x, y)-plane becomes the region R* in the (u, v)-plane. So we get the change of variable formula

$$\iint_R f(x, y)\,dx\,dy = \iint_{R^*} F(u, v)\,|\lambda\sigma - \mu\rho|\,du\,dv.$$

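For a general change of variable x = x(u, v), y = y(u, v), we get local linear approximations as above with

$$\lambda = \frac{\partial x}{\partial u}, \qquad \mu = \frac{\partial x}{\partial v}, \qquad \rho = \frac{\partial y}{\partial u}, \qquad \sigma = \frac{\partial y}{\partial v}.$$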

The errors in these local linear approximations are of small enough order of magnitude that they disappear in the limit. Notice that the term |λσ − μρ| is just the absolute value of the determinant of the Jacobian matrix of the change of variables. Hence we get the general formula

$$\iint_R f(x, y)\,dx\,dy = \iint_{R^*} F(u, v)\,|\det J|\,du\,dv.$$

Even linear changes of variable are useful. Another useful change is to “elliptical” polar coordinates. Let

$$x = ar\cos(\theta), \qquad y = br\sin(\theta)$$

so that det J = abr. The elliptical region $x^2/a^2 + y^2/b^2 \le 1$ now changes to the region $0 \le r \le 1$, $0 \le \theta \le 2\pi$. This often simplifies enormously a double integral taken over an ellipse.
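A short sketch of elliptical polar coordinates in action: the double integral over the ellipse becomes an integral over the rectangle 0 ≤ r ≤ 1, 0 ≤ θ ≤ 2π with the factor |det J| = abr. The helper name and the test integrands are illustrative assumptions; with a = 3, b = 2 the area should be πab and the integral of x² should be πa³b/4 (a short hand computation from the same substitution).

```python
import math


def integrate_over_ellipse(f, a, b, n_r=400, n_theta=400):
    """Integrate f(x, y) over x^2/a^2 + y^2/b^2 <= 1 using elliptical polar
    coordinates x = a r cos(theta), y = b r sin(theta), with |det J| = a b r,
    and the midpoint rule on the (r, theta) rectangle."""
    h_r, h_t = 1.0 / n_r, 2 * math.pi / n_theta
    total = 0.0
    for i in range(n_r):
        r = (i + 0.5) * h_r
        for j in range(n_theta):
            t = (j + 0.5) * h_t
            total += f(a * r * math.cos(t), b * r * math.sin(t)) * a * b * r
    return total * h_r * h_t


a, b = 3.0, 2.0
print(integrate_over_ellipse(lambda x, y: 1.0, a, b), math.pi * a * b)             # area
print(integrate_over_ellipse(lambda x, y: x * x, a, b), math.pi * a ** 3 * b / 4)  # integral of x^2
```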

Our next theme is change of variables for Laplacians, but we need some preparatory work on complex exponentials. Recall that we write a complex number as a + ib where a, b are real and $i^2 = -1$. We define $e^{i\theta}$ by

$$e^{i\theta} = \cos(\theta) + i\sin(\theta).$$

This gives

$$\frac{d}{d\theta} e^{i\theta} = -\sin(\theta) + i\cos(\theta) = i[\cos(\theta) + i\sin(\theta)] = ie^{i\theta}$$

which is the direct analogue of the formula for differentiating the real exponential. Notice in passing that

$$e^{i\theta}e^{i\phi} = [\cos(\theta) + i\sin(\theta)][\cos(\phi) + i\sin(\phi)] = \cos(\theta + \phi) + i\sin(\theta + \phi) = e^{i(\theta + \phi)}.$$

If we believe the index law for exponentials, this gives a means to remember the formulas for cos(θ + ϕ) and sin(θ + ϕ).

For λ = a + ib we now define $e^{\lambda t}$ by

$$e^{\lambda t} = e^{at}e^{ibt} = e^{at}[\cos(bt) + i\sin(bt)]$$

and we get

$$\frac{d}{dt} e^{\lambda t} = ae^{at}e^{ibt} + ibe^{at}e^{ibt} = (a + ib)e^{at}e^{ibt} = \lambda e^{\lambda t}.$$

Again, this is the direct analogue of the formula for real exponentials. It is immediate that an anti-derivative for $e^{\lambda t}$ is given by $(1/\lambda)e^{\lambda t}$. This leads to a quick method for some integrals.


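We can evaluate the integrals

$$\int_0^\infty e^{-x}\cos(2x)\,dx, \qquad \int_0^\infty e^{-x}\sin(2x)\,dx$$

by integrating by parts twice. Instead we consider

$$\int_0^\infty e^{-x}[\cos(2x) + i\sin(2x)]\,dx = \int_0^\infty e^{-\lambda x}\,dx = \frac{1}{\lambda},$$

where λ = 1 − 2i (the term $e^{-\lambda x}$ tends to 0 as x → ∞ because the real part of λ is 1 > 0). But then 1/λ = (1 + 2i)/5. Taking real and imaginary parts we immediately get

$$\int_0^\infty e^{-x}\cos(2x)\,dx = \frac{1}{5}, \qquad \int_0^\infty e^{-x}\sin(2x)\,dx = \frac{2}{5}.$$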
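Python's built-in complex numbers make this trick easy to check. The sketch below (helper names and the truncation of the range at x = 40 are illustrative assumptions) compares 1/λ with a direct numerical integral of e^{−λx}. It also checks the standard formula ∫₀^∞ x^k e^{−λx} dx = k!/λ^{k+1} (valid when the real part of λ is positive), which is how the same idea handles the integrals with a factor x^k.

```python
import cmath
import math

lam = 1 - 2j                      # e^{-x} e^{2ix} = e^{-lam x}
exact = 1 / lam                   # integral of e^{-lam x} from 0 to infinity
print(exact)                      # 1/5 + (2/5) i


def midpoint_integral(f, upper=40.0, n=200000):
    """Midpoint rule on [0, upper]; the factor e^{-x} makes the tail beyond 40 negligible."""
    h = upper / n
    return sum(f((k + 0.5) * h) for k in range(n)) * h


numeric = midpoint_integral(lambda x: cmath.exp(-lam * x))
print(numeric.real, numeric.imag)  # close to 0.2 and 0.4

# With a factor x^k: the integral of x^k e^{-lam x} is k!/lam^(k+1)
k = 1
print(math.factorial(k) / lam ** (k + 1))  # -3/25 + (4/25) i
```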

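This method makes it easy to calculate integrals like

$$\int_0^\infty x^k e^{-x}\cos(2x)\,dx, \qquad \int_0^\infty x^k e^{-x}\sin(2x)\,dx$$

where k is a positive integer.

For a smooth function V of two variables, $\nabla V = (\partial V/\partial x)\,\mathbf{i} + (\partial V/\partial y)\,\mathbf{j}$, and so

$$\operatorname{div}\nabla V = \frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2}.$$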

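Recall that this gives the Laplacian of V. We get the obvious generalization when V is a function of three variables. Functions whose Laplacian vanishes are called harmonic functions and they have many nice properties. In applications it is essential to know what the Laplacian changes to when we change the variables; one needs the formulas for polars, cylindricals and sphericals. The hard case is polars (the others follow easily). The traditional textbook proof is long and messy. There is a useful variant available when we use complex numbers. With x = r cos θ, y = r sin θ, the chain rule gives

$$\frac{\partial V}{\partial r} = \frac{\partial V}{\partial x}\frac{\partial x}{\partial r} + \frac{\partial V}{\partial y}\frac{\partial y}{\partial r} = \cos\theta\,\frac{\partial V}{\partial x} + \sin\theta\,\frac{\partial V}{\partial y}$$

and

$$\frac{\partial V}{\partial \theta} = \frac{\partial V}{\partial x}\frac{\partial x}{\partial \theta} + \frac{\partial V}{\partial y}\frac{\partial y}{\partial \theta} = -r\sin\theta\,\frac{\partial V}{\partial x} + r\cos\theta\,\frac{\partial V}{\partial y}.$$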


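When we solve these equations for ∂V/∂x and ∂V/∂y, we get

$$\frac{\partial V}{\partial x} = \cos\theta\,\frac{\partial V}{\partial r} - \frac{\sin\theta}{r}\frac{\partial V}{\partial \theta}, \qquad \frac{\partial V}{\partial y} = \sin\theta\,\frac{\partial V}{\partial r} + \frac{\cos\theta}{r}\frac{\partial V}{\partial \theta}.$$

It follows that

$$\frac{\partial V}{\partial x} + i\frac{\partial V}{\partial y} = e^{i\theta}\left[\frac{\partial V}{\partial r} + \frac{i}{r}\frac{\partial V}{\partial \theta}\right]$$

and

$$\frac{\partial}{\partial x} - i\frac{\partial}{\partial y} = e^{-i\theta}\left[\frac{\partial}{\partial r} - \frac{i}{r}\frac{\partial}{\partial \theta}\right].$$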

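It is easy to check that

$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} = \left(\frac{\partial}{\partial x} - i\frac{\partial}{\partial y}\right)\left(\frac{\partial V}{\partial x} + i\frac{\partial V}{\partial y}\right).$$

Substitute from the above formulas and use the product rule for derivatives (the θ-derivative also falls on the factor $e^{i\theta}$):

$$e^{-i\theta}\left[\frac{\partial}{\partial r} - \frac{i}{r}\frac{\partial}{\partial \theta}\right]\left(e^{i\theta}\left[\frac{\partial V}{\partial r} + \frac{i}{r}\frac{\partial V}{\partial \theta}\right]\right) = \left[\frac{\partial^2 V}{\partial r^2} - \frac{i}{r^2}\frac{\partial V}{\partial \theta} + \frac{i}{r}\frac{\partial^2 V}{\partial r\,\partial \theta}\right] + \left[\frac{1}{r}\frac{\partial V}{\partial r} + \frac{i}{r^2}\frac{\partial V}{\partial \theta} - \frac{i}{r}\frac{\partial^2 V}{\partial \theta\,\partial r} + \frac{1}{r^2}\frac{\partial^2 V}{\partial \theta^2}\right].$$

The imaginary terms cancel and we end up with

$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} = \frac{\partial^2 V}{\partial r^2} + \frac{1}{r}\frac{\partial V}{\partial r} + \frac{1}{r^2}\frac{\partial^2 V}{\partial \theta^2}.$$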
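The polar formula can be spot-checked with finite differences. In this sketch the test function V = x³y + xy² (whose Laplacian is 6xy + 2x) and the step size h are illustrative assumptions; each side is approximated by central differences, one in Cartesian coordinates and one in polar coordinates, at the same point.

```python
import math


def V(x, y):
    # illustrative test function (not from the notes); its Laplacian is 6xy + 2x
    return x ** 3 * y + x * y ** 2


def laplacian_cartesian(f, x, y, h=1e-3):
    """V_xx + V_yy by central second differences."""
    return (f(x + h, y) - 2 * f(x, y) + f(x - h, y)
            + f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h ** 2


def laplacian_polar(f, r, t, h=1e-3):
    """V_rr + V_r / r + V_tt / r^2 for W(r, t) = f(r cos t, r sin t), by central differences."""
    def W(r_, t_):
        return f(r_ * math.cos(t_), r_ * math.sin(t_))

    W_rr = (W(r + h, t) - 2 * W(r, t) + W(r - h, t)) / h ** 2
    W_r = (W(r + h, t) - W(r - h, t)) / (2 * h)
    W_tt = (W(r, t + h) - 2 * W(r, t) + W(r, t - h)) / h ** 2
    return W_rr + W_r / r + W_tt / r ** 2


x, y = 1.2, 0.7
r, t = math.hypot(x, y), math.atan2(y, x)
print(laplacian_cartesian(V, x, y), laplacian_polar(V, r, t), 6 * x * y + 2 * x)
```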

[Aside: this method can be used for other factorizations; what would we get from the difference of squares?]

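Suppose we now wish to change the Laplacian in three variables into cylindrical coordinates. We need only do the polar change from (x, y) to (r, θ) and so we get immediately that

$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} + \frac{\partial^2 V}{\partial z^2} = \frac{\partial^2 V}{\partial r^2} + \frac{1}{r}\frac{\partial V}{\partial r} + \frac{1}{r^2}\frac{\partial^2 V}{\partial \theta^2} + \frac{\partial^2 V}{\partial z^2}.$$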

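To get to spherical coordinates, we just make a further polar change from (z, r) to (ρ, ϕ), namely z = ρ cos ϕ, r = ρ sin ϕ. By the two variable case

$$\frac{\partial^2 V}{\partial z^2} + \frac{\partial^2 V}{\partial r^2} = \frac{\partial^2 V}{\partial \rho^2} + \frac{1}{\rho}\frac{\partial V}{\partial \rho} + \frac{1}{\rho^2}\frac{\partial^2 V}{\partial \phi^2}.$$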

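It remains to rewrite (1/r)(∂V/∂r) in terms of ρ and ϕ. But this is just the analogue of expressing ∂V/∂y in terms of r and θ, and so we get

$$\frac{\partial V}{\partial r} = \sin\phi\,\frac{\partial V}{\partial \rho} + \frac{\cos\phi}{\rho}\frac{\partial V}{\partial \phi}.$$

Since r = ρ sin ϕ, this gives

$$\frac{1}{r}\frac{\partial V}{\partial r} = \frac{1}{\rho}\frac{\partial V}{\partial \rho} + \frac{\cot\phi}{\rho^2}\frac{\partial V}{\partial \phi}.$$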


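Putting all this together (and using $1/r^2 = 1/(\rho^2\sin^2\phi)$ in the θ term) we end up with

$$\frac{\partial^2 V}{\partial x^2} + \frac{\partial^2 V}{\partial y^2} + \frac{\partial^2 V}{\partial z^2} = \frac{\partial^2 V}{\partial \rho^2} + \frac{2}{\rho}\frac{\partial V}{\partial \rho} + \frac{1}{\rho^2\sin^2\phi}\frac{\partial^2 V}{\partial \theta^2} + \frac{1}{\rho^2}\frac{\partial^2 V}{\partial \phi^2} + \frac{\cot\phi}{\rho^2}\frac{\partial V}{\partial \phi}.$$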
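The spherical formula can be spot-checked the same way. This sketch assumes the convention x = ρ sin ϕ cos θ, y = ρ sin ϕ sin θ, z = ρ cos ϕ (consistent with z = ρ cos ϕ, r = ρ sin ϕ above). It uses two illustrative test functions: x²y + z³, whose Laplacian is 2y + 6z, and 1/ρ, which is harmonic away from the origin. All derivatives on the right-hand side are central finite differences, and the helper name is hypothetical.

```python
import math


def laplacian_spherical(f, rho, theta, phi, h=1e-3):
    """Right-hand side of the spherical formula for the Laplacian of f at
    (rho, theta, phi), where x = rho sin(phi) cos(theta), y = rho sin(phi) sin(theta),
    z = rho cos(phi); every derivative is a central finite difference."""
    def W(p, t, s):
        return f(p * math.sin(s) * math.cos(t), p * math.sin(s) * math.sin(t), p * math.cos(s))

    w0 = W(rho, theta, phi)
    W_pp = (W(rho + h, theta, phi) - 2 * w0 + W(rho - h, theta, phi)) / h ** 2
    W_p = (W(rho + h, theta, phi) - W(rho - h, theta, phi)) / (2 * h)
    W_tt = (W(rho, theta + h, phi) - 2 * w0 + W(rho, theta - h, phi)) / h ** 2
    W_ss = (W(rho, theta, phi + h) - 2 * w0 + W(rho, theta, phi - h)) / h ** 2
    W_s = (W(rho, theta, phi + h) - W(rho, theta, phi - h)) / (2 * h)
    return (W_pp + 2 * W_p / rho
            + W_tt / (rho * math.sin(phi)) ** 2
            + W_ss / rho ** 2
            + W_s / (math.tan(phi) * rho ** 2))  # cot(phi) = 1/tan(phi)


# Illustrative test functions (not from the notes)
def V(x, y, z):
    return x * x * y + z ** 3  # Laplacian is 2y + 6z


def inverse_distance(x, y, z):
    return 1 / math.sqrt(x * x + y * y + z * z)  # harmonic away from the origin


rho, theta, phi = 1.5, 0.4, 1.1
y = rho * math.sin(phi) * math.sin(theta)
z = rho * math.cos(phi)
print(laplacian_spherical(V, rho, theta, phi), 2 * y + 6 * z)
print(laplacian_spherical(inverse_distance, rho, theta, phi))  # close to 0
```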

These formulas make it easy to study harmonic functions of the form U(r)v(θ) in two variables, or U(ρ)v(θ)w(ϕ) in three variables.

There is a useful theorem on solids of revolution attributed to Pappus (long before calculus!). Let R be a region in the first quadrant. Rotate R about the y-axis. Then the volume of the solid of revolution is just the area of R times the distance traveled by the center of mass of the region R. A calculus proof is almost immediate when we use the formula for the solid of revolution given by the “tin can” method: the volume is $\iint_R 2\pi x\,dx\,dy = 2\pi\bar{x}\cdot\text{area}(R)$, where $\bar{x}$ is the x-coordinate of the center of mass. In particular this gives an immediate formula for the volume of a solid torus.
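As a numerical illustration of Pappus' theorem for the solid torus, the sketch below takes R to be a disk of radius a whose center is at horizontal distance b > a from the y-axis (placing the center at height greater than a keeps R in the first quadrant; the height does not affect the computation). It computes the volume by the tin can method, ∫ 2πx · height(x) dx, and compares it with πa² · 2πb. The function name and the sample values a = 1, b = 3 are illustrative assumptions.

```python
import math


def torus_volume_tin_can(a, b, n=100000):
    """Volume swept out when a disk of radius a, centered at horizontal distance
    b > a from the y-axis, is rotated about the y-axis, computed by the tin can
    method: the integral of 2*pi*x * height(x) dx over b - a <= x <= b + a."""
    h = 2 * a / n
    total = 0.0
    for k in range(n):
        x = b - a + (k + 0.5) * h
        height = 2 * math.sqrt(max(a * a - (x - b) ** 2, 0.0))  # vertical chord of the disk
        total += 2 * math.pi * x * height
    return total * h


a, b = 1.0, 3.0
pappus = (math.pi * a * a) * (2 * math.pi * b)  # area of R times distance traveled by its center
print(torus_volume_tin_can(a, b), pappus)
```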

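We now fulfill our promise to show that the mixed partial derivatives are equal for nice functions. Let z = f(x, y) be such that $\partial^2 f/\partial x\,\partial y$ and $\partial^2 f/\partial y\,\partial x$ are differentiable functions. Suppose that the mixed partials are not equal at some point (a, b). Then, by continuity and without loss of generality, we get a rectangle R = [p, q] × [r, s] such that

$$\frac{\partial^2 f}{\partial x\,\partial y}(\xi, \eta) > \frac{\partial^2 f}{\partial y\,\partial x}(\xi, \eta)$$

for all points (ξ, η) in the rectangle R. It follows that

$$\iint_R \frac{\partial^2 f}{\partial x\,\partial y}\,d\xi\,d\eta > \iint_R \frac{\partial^2 f}{\partial y\,\partial x}\,d\xi\,d\eta.$$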

We calculate each double integral as a repeated integral (using a different order of integration in each case) to find that each double integral has the value

$$f(q, s) + f(p, r) - f(q, r) - f(p, s).$$

This contradiction completes the proof. Notice in passing that the above ideas give the following fact. Suppose that the mixed partial derivative of f is non-negative on the rectangle R. Then we get the inequality

$$f(q, s) + f(p, r) \ge f(q, r) + f(p, s).$$

For suitable f this can give delicate inequalities which are not readily obtainable by one variable calculus techniques.
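A small numerical illustration, under the assumption f(x, y) = e^{xy} (an illustrative choice, not from the notes). Its mixed partial (1 + xy)e^{xy} is non-negative when x, y ≥ 0, so the inequality above gives e^{qs} + e^{pr} ≥ e^{qr} + e^{ps} for 0 ≤ p ≤ q, 0 ≤ r ≤ s. The sketch checks that the double integral of the mixed partial equals the corner combination, and then tests the inequality on random rectangles. The helper names are hypothetical.

```python
import math
import random


def f(x, y):
    return math.exp(x * y)


def f_xy(x, y):
    # mixed partial of e^{xy}; equals (1 + xy) e^{xy}, which is >= 0 when x, y >= 0
    return (1 + x * y) * math.exp(x * y)


def corner_sum(p, q, r, s):
    return f(q, s) + f(p, r) - f(q, r) - f(p, s)


def mixed_partial_integral(p, q, r, s, n=400):
    """Midpoint-rule approximation of the double integral of f_xy over [p, q] x [r, s]."""
    hx, hy = (q - p) / n, (s - r) / n
    return sum(f_xy(p + (i + 0.5) * hx, r + (j + 0.5) * hy)
               for i in range(n) for j in range(n)) * hx * hy


print(corner_sum(0.5, 1.5, 0.2, 1.0), mixed_partial_integral(0.5, 1.5, 0.2, 1.0))

# The inequality f(q, s) + f(p, r) >= f(q, r) + f(p, s) on random rectangles with x, y >= 0
random.seed(1)
for _ in range(1000):
    p, q = sorted(random.uniform(0, 2) for _ in range(2))
    r, s = sorted(random.uniform(0, 2) for _ in range(2))
    assert corner_sum(p, q, r, s) >= -1e-12  # allow for floating-point rounding
```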


INDEX

Acceleration vector 18, 21
AGM inequality 42
annulus 64
arrow 6
astroid 10, 11, 12, 13, 25

Binormal 21

Cardioid 64
center of mass 54, 58, 64
center of inertia 55
chain rule 31, 34, 37
closed loop 70
complex numbers 83
cone 37
conservative field 70
constrained optimization 42
contour lines 33
crossover point 23
curl 67, 78
curvature 10, 20
cusp 9
cycloid 23, 25
cylinder 37, 63
cylindrical coordinates 57, 85

Differentiable 28
directional derivative 32, 34
discriminant 41
divergence (div) 66, 75
Divergence Theorem 75
dot product 7
double integral 50

Ellipsoid 35, 64
ellipsoidal coordinates 65
evolute 13, 26

Flux 74
Folium of Descartes 23, 73
force vector 8

Gradient vector 32
gradient vector field 66
Green’s Theorem 71

Harmonic function 48
harmonic mean 24
helix 19, 20
homogeneous function 39
hyperboloid 36, 48
hypocycloid 23

Involute 13, 26

Jacobian 38, 83

Lagrange multipliers 42
Laplacian 67, 84
length of path 10, 19
line 5, 7, 13
line integral 68
local linear approximation 5, 28
local max/min 40, 41

Mass 53
mixed partial derivatives 86
moments 54

Normal vector 8

Osculating circle 12

Pappus 65
paraboloid 36
parallelepiped 17, 25


partial derivative 29, 34

path 5, 13

plane 14, 17

planimeter 82

polar coordinates 5, 38, 55, 85

position vector 6

probability density function 55, 63

Quadric surface 36

Repeated integral 50

right handed screw 16

Saddle point 41

scalar product 7, 13

scalar triple product 18

Serret-Frenet formulas 21

signed volume 50

singular point 9

singularity 56, 73, 76

spherical coordinates 58, 85

standard simplex 53, 57
steepest slope 32
Stokes’ Theorem 77
surface area 59, 60

Tangent line 8, 18
tangent plane 28, 34, 47
tangent vector 8, 18, 29
torsion 21
torus 59
triple integral 57
twisted cubic 19, 20

Unit binormal vector 21
unit normal vector 11, 20
unit tangent vector 10, 20

Vector 6, 13
vector field 66
vector product 14

Work 68
