MATH203 Calculus of several variables - University of Otago

MATH203 Calculus of several variables

Lecture Notes as of March 1, 2021

J. Frauendiener

Contents

1 Conic sections
  1.1 Parabola
  1.2 Ellipse
  1.3 Hyperbola

2 Functions
  2.1 Functions of one real variable
  2.2 Functions of two real variables
  2.3 Traces
  2.4 Level curves

3 Limits and Continuity
  3.1 Limits
  3.2 Continuity

4 Partial Derivatives
  4.1 Partial derivatives
  4.2 Higher partial derivatives

5 Linear approximation and derivative
  5.1 Curves and tangent vectors
  5.2 Linear approximation and derivative
  5.3 The chain rule
  5.4 Directional derivative and gradient
  5.5 Beyond the linear approximation: the multivariate Taylor expansion

6 Extremal points
  6.1 Local extrema and critical points
  6.2 An application: linear regression
  6.3 Absolute maxima and minima
  6.4 Constrained optimisation, Lagrange multipliers

7 Functions of several variables
  7.1 Vector valued functions
  7.2 Differentiation of vector valued functions
  7.3 Composition of vector valued functions
  7.4 The chain rule for vector valued functions
  7.5 More on surfaces, implicit differentiation

8 Integration
  8.1 The Riemann integral
  8.2 Double integrals
  8.3 Integrals over irregular domains
  8.4 The change of variables formula for double integrals
  8.5 More on polar coordinates
  8.6 Integrals in polar coordinates

9 Line integrals
  9.1 Line integrals of scalar functions
  9.2 Line integrals of vector fields
  9.3 Conservative vector fields and potentials

10 Vector identities, Green’s and Gauss’ theorem
  10.1 Divergence of vector fields, vector identities
  10.2 Green’s theorem
  10.3 Surface integrals and Gauss’ theorem

1 Conic sections

Definition 1.1. A conic section is a curve that arises from cutting a double circular cone by a plane.

Looking edge-on onto the plane we can see that there are seven different cases:

circle parabola ellipse hyperbola two lines one line point

The last three cases are not interesting and we do not investigate them any further. Also, it is clear that the circle is a special case of an ellipse. It is also worthwhile to point out that the circle and the parabola are special in a particular sense: imagine that we change the intersecting plane slightly. Then, if the change is small enough, both the ellipse and the hyperbola will still remain ellipse and hyperbola. But this is not true for the circle, which will turn into an ellipse no matter how small the change. Similarly, a parabola will turn into either an ellipse or a hyperbola. Therefore, the ellipse and the hyperbola are the stable conic sections. Circle and parabola are unstable.

Note A conic section is either an ellipse (the circle is a special case), a hyperbola or a parabola.

Let us discuss these in turn.

1.1 Parabola

The basic equation for a parabola is

$$y = ax^2.$$

[Figure: graph of the parabola in the (x, y)-plane for a > 0 and for a < 0.]

Example 1.1
Plot the graph of the parabola given by the equation

$$y = 2x^2 - 6x + 5.$$

• Complete the square:

$$y = 2\left(x - \tfrac{3}{2}\right)^2 + \tfrac{1}{2}.$$

• Note how the graph is shifted by 3/2 to the right and 1/2 upwards compared to the standard form.

[Figure: graph of the parabola with vertex at (3/2, 1/2).]
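The completing-the-square step in Example 1.1 can be checked with exact rational arithmetic. The following is only a minimal sketch; the helper name `complete_square` is an assumption made for illustration and is not part of the notes.

```python
from fractions import Fraction

def complete_square(a, b, c):
    """Return (h, k) such that a*x**2 + b*x + c == a*(x - h)**2 + k (a nonzero)."""
    a, b, c = Fraction(a), Fraction(b), Fraction(c)
    h = -b / (2 * a)          # horizontal shift of the vertex
    k = c - b * b / (4 * a)   # vertical shift of the vertex
    return h, k

# Example 1.1: y = 2x^2 - 6x + 5
h, k = complete_square(2, -6, 5)
print(h, k)  # 3/2 1/2
```

The returned pair is the shift to the right and the shift upwards relative to the standard form y = ax².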

1.2 Ellipse

The equation for an ellipse in standard form is

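$$\frac{x^2}{a^2} + \frac{y^2}{b^2} = 1, \qquad a, b > 0.$$

[Figure: ellipse in standard form, showing the centre, the vertices and the semi-axes a and b.]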

Properties of the graph in standard form:

• The ellipse intersects the axes in the vertices (±a, 0) and (0, ±b).
• The centre (point) is at the origin (0, 0).
• The line segments from the centre to the vertices are the semi-axes.
• The longer/shorter one is the major/minor semi-axis.
• When a = b, the intercepts with the axes are equal and the basic equation yields

$$\frac{x^2}{a^2} + \frac{y^2}{a^2} = 1 \implies x^2 + y^2 = a^2,$$

a circle.

Example 1.2
Plot the graph of the ellipse with the equation

$$2x^2 - 4x + y^2 - 6y + 7 = 0.$$

• Complete the squares:

$$2(x - 1)^2 + (y - 3)^2 = 4.$$

• Write in standard form:

$$\frac{(x - 1)^2}{2} + \frac{(y - 3)^2}{4} = 1.$$

• The equation describes an ellipse with major semi-axis b = 2 and minor semi-axis a = √2.
• Note that its centre is shifted by 1 to the right and by 3 upwards.

[Figure: the ellipse with centre (1, 3) and semi-axes a and b.]

1.3 Hyperbola

The equation for a hyperbola in standard form is

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = 1, \quad a, b > 0 \qquad \text{or} \qquad \frac{x^2}{a^2} - \frac{y^2}{b^2} = -1, \quad a, b > 0.$$

We first discuss the formula on the left, the ‘left-right’ hyperbola:

[Figure: ‘left-right’ hyperbola showing the centre, the vertices at x = ±a, the forbidden region between the branches, and the asymptotes y = (b/a)x and y = −(b/a)x.]

Observe:

• There are no points on the curve with −a < x < a, since

$$\frac{y^2}{b^2} = \frac{x^2}{a^2} - 1 \ge 0,$$

so x² ≥ a², i.e., x ≥ a or x ≤ −a.
• x can grow without limit. Then y also grows without limit and

$$\frac{y^2}{b^2} = \frac{x^2}{a^2} - 1 \approx \frac{x^2}{a^2} \implies y \approx \pm\frac{b}{a}\,x.$$

• For increasing x (and y) the curve comes ever closer to these lines, the asymptotes.

Note In order to find the asymptotes, write the equation in standard form, ignore the 1 on the right-hand side and solve for y.

Example 1.3
Plot the graph of the hyperbola defined by the equation

$$x^2 - 4x - 4y^2 - 8y - 4 = 0.$$

• Complete the squares:

$$(x - 2)^2 - 4(y + 1)^2 = 4.$$

• Write in standard form:

$$\frac{(x - 2)^2}{4} - \frac{(y + 1)^2}{1} = 1.$$

• The equation describes a hyperbola with centre at (2, −1) and vertices (0, −1) and (4, −1).
• Note that its centre is shifted by 2 to the right and by 1 downwards.
• To get the asymptotes we use the standard form, ignore the 1 on the right-hand side and solve for y. This yields

$$y = -1 \pm \tfrac{1}{2}(x - 2).$$

[Figure: the hyperbola with centre (2, −1).]
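The "complete the squares" step used in Examples 1.2 and 1.3 follows the same pattern each time, so it can be sketched as a small routine. This is an illustrative sketch only: the function name `complete_squares` and its argument order are assumptions, and it handles only equations without a mixed xy term.

```python
from fractions import Fraction

def complete_squares(A, C, D, E, F):
    """Rewrite A*x^2 + C*y^2 + D*x + E*y + F = 0 (A, C nonzero) as
    A*(x - x0)^2 + C*(y - y0)^2 = R and return (x0, y0, R)."""
    A, C, D, E, F = (Fraction(v) for v in (A, C, D, E, F))
    x0 = -D / (2 * A)                            # x-coordinate of the centre
    y0 = -E / (2 * C)                            # y-coordinate of the centre
    R = D * D / (4 * A) + E * E / (4 * C) - F    # constant moved to the right-hand side
    return x0, y0, R

# Example 1.2: 2x^2 - 4x + y^2 - 6y + 7 = 0
print(complete_squares(2, 1, -4, -6, 7))
# Example 1.3: x^2 - 4x - 4y^2 - 8y - 4 = 0
print(complete_squares(1, -4, -4, -8, -4))
```

Dividing the resulting equation by R gives the standard form, from which the centre and the semi-axes can be read off.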

The second equation for a hyperbola is

$$\frac{x^2}{a^2} - \frac{y^2}{b^2} = -1, \qquad a, b > 0.$$

Notice that we can write this equation in the form

$$\frac{y^2}{b^2} - \frac{x^2}{a^2} = 1, \qquad a, b > 0.$$

This means that we have exactly the same discussion as before except that we need to interchange x and y. Therefore, the curve has no points which lie in the region −b < y < b and the branches of the hyperbola lie in the upper and lower halves of the plane instead of the left and right halves. The asymptotes are determined in the exact same way as before. Here is a sketch of such a ‘top-bottom’ hyperbola:

[Figure: ‘top-bottom’ hyperbola showing the centre, the vertices at y = ±b, and the asymptotes y = (b/a)x and y = −(b/a)x.]

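Question: How does the ‘hyperbola’ given by the equation y = 1/x fit into this picture?

• We write the equation in the form

$$xy = 1.$$

• Define a transformation to ‘new’ coordinates $(\bar{x}, \bar{y})$ by

$$x = \frac{1}{\sqrt{2}}(\bar{x} + \bar{y}), \qquad y = \frac{1}{\sqrt{2}}(\bar{x} - \bar{y}).$$

• Then the equation can be written

$$xy = \frac{1}{2}\left(\bar{x}^2 - \bar{y}^2\right) = 1 \qquad \text{or} \qquad \frac{\bar{x}^2}{2} - \frac{\bar{y}^2}{2} = 1.$$

[Figure: graph of the hyperbola xy = 1 in the (x, y)-plane.]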
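The identity behind this change of coordinates, namely that the product xy equals half the difference of the squares of the new coordinates, can be spot-checked numerically. A minimal sketch; the names `to_old` and `product_xy` and the sample points are assumptions chosen for illustration.

```python
import math

def to_old(xb, yb):
    """Map 'new' coordinates (xb, yb) to the original coordinates (x, y)."""
    s = 1 / math.sqrt(2)
    return s * (xb + yb), s * (xb - yb)

def product_xy(xb, yb):
    """Evaluate x*y at the point whose new coordinates are (xb, yb)."""
    x, y = to_old(xb, yb)
    return x * y

for xb, yb in [(2.0, 1.0), (-1.5, 0.5), (3.0, -2.0)]:
    lhs = product_xy(xb, yb)
    rhs = 0.5 * (xb**2 - yb**2)
    print(f"x*y = {lhs:.6f}, (xb^2 - yb^2)/2 = {rhs:.6f}")
```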

Exercise.
(i) What is the geometric meaning of this coordinate transformation? (ii) Determine the forbidden region for the hyperbola xy = 1. (iii) What are the asymptotes for this hyperbola? (iv) Where are its vertices?

2 Functions

What is a function? In very general terms, a function is an assignment of objects to objects in a univalent way.

[Figure: diagram of an assignment f : M → N. The green assignment is forbidden!]

To each object on the left side there is assigned exactly one object on the right side. Notice that not all objects on the right must be assigned to. These objects can be quite general. Consider the following assignments:

• Assign to every mountain the person that first climbed it.
• Assign to every person their father.
• Assign to every mother her children.
• Assign to every positive number x a number y so that y² = x.

Are these all functions?

We want to be a bit more serious here.

2.1 Functions of one real variable

Definition 2.1. A real-valued function f of one real variable x is a rule that assigns to each x in a set D ⊂ R a uniquely defined number y ∈ R. We write y = f(x) or, more explicitly,

$$f : D \to \mathbb{R}, \quad x \mapsto y = f(x).$$

Example 2.1

$$y = \frac{1}{x^2}$$

This rule assigns to every x ∈ D := (−∞, 0) ∪ (0, ∞) the number 1/x² ∈ R. The corresponding function f is

$$f : (-\infty, 0) \cup (0, \infty) \to \mathbb{R}, \quad x \mapsto \frac{1}{x^2}.$$

Note Here are some more definitions: Consider a function f : M → N.
1. The set M of x for which f is defined (i.e., in the example above, for which 1/x² can be evaluated) is called the domain of f.
2. The set N which contains the objects assigned by f is called the target of f.
3. For x ∈ M, f(x) is the value of f at x.
4. The set of all values of f is called the range of f. In general, it is a subset of the target N.

Example 2.2

$$y = \sqrt{1 - x^2},$$

where √ is the positive square root.
1. Finding the domain: y can be computed as a real number as long as 1 − x² ≥ 0, i.e., when |x| ≤ 1. Hence, the domain is

D = [−1, 1].

2. Finding the range: as x varies from −1 to 1, y increases from 0 to 1, where it reaches a maximum, and then decreases back down to 0. Therefore,

range = [0, 1].

Example 2.3

$$y = \frac{1}{x^2 - 1}$$

The domain is easy to find:

D = R\{±1} = (−∞, −1) ∪ (−1, 1) ∪ (1, ∞).

Finding the range is a bit more complicated. The easy way is to use the shadow method or horizontal line test as explained in STEWART: imagine shining a spot-light parallel to the x-axis onto the graph of the function. Then those regions on the y-axis which are in the dark make up the range of the function. Here, there is no shadow for y ∈ (−1, 0]. Thus, the range is (−∞, −1] ∪ (0, ∞) = R\(−1, 0].

[Figure: graph of y = 1/(x² − 1).]

There is also a more systematic way to find the range. We identify all y’s which can occur as values of the function. This entails finding all values a for which the equation a = 1/(x² − 1) can be solved. So, we have

$$ax^2 - (a + 1) = 0 \implies x^2 = \frac{a + 1}{a} = 1 + \frac{1}{a}.$$

This equation has solutions only when 1 + 1/a ≥ 0. First, we note that a = 0 is not possible (why?). There are two cases to consider:

a > 0 : a + 1 ≥ 0, no restriction,

a < 0 : a + 1 ≤ 0, so a ≤ −1.

So we find that the equation can be solved for all a ≤ −1 or a > 0. These are the values of f, so that the range comes out as before, (−∞, −1] ∪ (0, ∞).
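The systematic method of Example 2.3 amounts to trying to solve a = 1/(x² − 1) for x. The following sketch does exactly that for a few sample values of a; the function name `preimage` and the sample values are assumptions made for illustration.

```python
import math

def preimage(a):
    """Return one solution x of a = 1/(x**2 - 1), or None if there is none."""
    if a == 0:
        return None                 # a = 0 is excluded, as noted in the text
    x_squared = 1 + 1 / a           # from a*x^2 - (a + 1) = 0
    if x_squared < 0:
        return None                 # no real solution, so a is not in the range
    return math.sqrt(x_squared)

for a in [-3.0, -1.0, -0.5, 0.0, 0.25, 4.0]:
    x = preimage(a)
    print(a, "is not a value of f" if x is None else f"= f({x:.4f})")
```

Values of a in (−1, 0] are reported as not attained, in agreement with the range (−∞, −1] ∪ (0, ∞).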

Example 2.4

y = [x] := integer closest to x

To find the domain we ask for which x the value y cannot be determined. Clearly, these are exactly those numbers which lie exactly midway between integers, i.e., the half-integers x = ±1/2, ±3/2, . . . Thus, the domain is

$$D = \{x \in \mathbb{R} : x - \tfrac{1}{2} \notin \mathbb{Z}\}.$$

The range must be a subset of the set of all integers Z and it is easy to see that the range is equal to Z, since [n] = n for each integer n ∈ Z.

2.2 Functions of two real variables

This will be our main case but the theory applies to much more general cases.

Definition 2.2. A real-valued function f of two variables x, y is a rule that assigns to each ordered pair (x, y) in a set D ⊂ R² a uniquely defined number z ∈ R. We write

$$f : D \to \mathbb{R}, \quad (x, y) \mapsto z = f(x, y).$$

Note As before we point out some properties:
• D is the domain of f.
• z is the value of f at (x, y). The set of all values of f is the range of f.
• R² = R × R = {(a, b) : a ∈ R and b ∈ R} can be thought of as the x,y-plane in an (x, y, z) diagram.

A function f of two variables can be represented as a graph in R³ (= R × R × R, 3-dim space), by plotting the point (x, y, f(x, y)) for each (x, y) ∈ D. The graph is a surface (for most of the functions we consider). To be precise:

$$\mathrm{graph}(f) = \{(x, y, z) \in \mathbb{R}^3 : (x, y) \in D,\ z = f(x, y)\}.$$

[Figure: the point (x, y) in the plane and the corresponding point (x, y, f(x, y)) on the graph.]

Example 2.5
If f(x, y) = xy − x + 2y then

f(1, 2) = 1 · 2 − 1 + 2 · 2 = 5,

f(t, t²) = t³ − t + 2t²,

f(u, v) = uv − u + 2v,

in fact: f(□, △) = □△ − □ + 2△.

Only the slots (positions) of the arguments are relevant, not the names of the variables. It follows that

f(x − y, x + y) = (x − y)(x + y) − (x − y) + 2(x + y)

= x² − y² + x + 3y

≠ f(x, y).
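The point that only the argument slots matter is exactly how function parameters behave in a programming language. A short sketch in Python, with sample values chosen purely for illustration:

```python
def f(x, y):
    """f(x, y) = x*y - x + 2*y; the parameter names are only placeholders."""
    return x * y - x + 2 * y

print(f(1, 2))  # 5

t = 3
print(f(t, t**2) == t**3 - t + 2 * t**2)            # True

x, y = 2, 7
print(f(x - y, x + y) == x**2 - y**2 + x + 3 * y)   # True
print(f(x - y, x + y) == f(x, y))                   # False
```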

Example 2.6
Consider the rule

$$(x, y) \mapsto z = \frac{1}{xy - 1}.$$

Where does this assignment make sense? We can evaluate it whenever

$$xy - 1 \ne 0 \implies y \ne \frac{1}{x}.$$

Therefore, the domain is

D = {(x, y) ∈ R² : y ≠ 1/x},

i.e., the x,y-plane without the hyperbola.

[Figure: the (x, y)-plane with the hyperbola xy = 1 removed; z > 0 in the two regions where xy > 1 and z < 0 in the region where xy < 1.]

When the point (x, y) approaches the hyperbola, then z grows without limit to ±∞ depending on the direction of approach. Therefore, the corresponding function is defined as

$$f : \{(x, y) \in \mathbb{R}^2 : xy \ne 1\} \to \mathbb{R}, \quad (x, y) \mapsto \frac{1}{xy - 1}.$$

Example 2.7
The assignment

$$(x, y) \mapsto \sqrt{x + y - 1}$$

makes sense whenever x + y − 1 ≥ 0, i.e., when y ≥ 1 − x. So, the domain is D = {(x, y) ∈ R² : y ≥ 1 − x}, which consists of all points on or above the line with the equation y = 1 − x. The function which is defined by the assignment is

$$f : \{(x, y) \in \mathbb{R}^2 : y \ge 1 - x\} \to \mathbb{R}, \quad (x, y) \mapsto \sqrt{x + y - 1}.$$

[Figure: the domain D, the region on or above the line y = 1 − x.]

Example 2.8
The assignment

$$(x, y) \mapsto \ln(x^2 - y)$$

can be evaluated for all points (x, y) which satisfy the condition x² − y > 0. These are all points below the parabola y = x². So the function defined by the rule is

$$f : \{(x, y) \in \mathbb{R}^2 : y < x^2\} \to \mathbb{R}, \quad (x, y) \mapsto \ln(x^2 - y).$$

[Figure: the domain, the region below the parabola y = x².]

2.3 Traces

Consider the assignment

$$(x, y) \mapsto x^2 + y^2 =: z.$$

Here, the domain is the entire (x, y)-plane, D = R². What does the graph of the defined function look like?

We can get some idea by looking at the traces of the surface. These are curves on the surface which are obtained by cutting it with planes orthogonal to the coordinate axes. For example: cutting the surface with a plane orthogonal to the y-axis means that we put y = c, a constant, and obtain z = x² + c². These are parabolae continually shifted along the y-axis and simultaneously raised in the z-direction as c varies. The surface is the union of all such curves.

Cutting with horizontal planes, i.e., putting z = c, a constant, we find

$$x^2 + y^2 = c.$$

Thus, the z-traces are circles with radii √c. They exist only for c ≥ 0. There are no points on the graph surface for which z < 0.

In a similar way, one considers x-traces, obtained by cutting with planes x = c.

2.4 Level curves

There is another way of getting an idea about the behaviour of functions.

Definition 2.3. A level curve or contour of a function f(x, y) is a curve in the (x, y)-plane on which z = f(x, y) is constant. In other words, f takes the same value at every point of a level curve.

[Figure: contour plot of a function on the square −1 ≤ x, y ≤ 1, with contours labelled by values between −1 and 2.5.]

The diagram is a contour plot of a function. We see several lines labelled by numbers. These are the values that the function takes on the corresponding contour. We find that the function has a ‘trough’ near the point (0.5, 0.4) and a ‘hill’ near (−0.5, −0.6). Furthermore, there is a region around the point (−0.6, 0.5) where the function is almost constant. Leaving this region in the positive or negative y-direction we find higher values of the function, while moving left or right, in the positive or negative x-direction, we detect lower values.

Example 2.9
For the function f defined by the assignment (x, y) ↦ f(x, y) = x² + y² the level curves satisfy the equation x² + y² = c, a constant. They are circles with radius √c and they do not exist for c < 0. How are the level curves spaced in the plane? When c increases by 1 the radius of the corresponding contour grows by

$$\sqrt{c + 1} - \sqrt{c} = \frac{1}{\sqrt{c + 1} + \sqrt{c}} \to 0 \quad \text{as } c \to \infty.$$

So the contours come closer, they become denser, the further out from the origin they are. This means that the corresponding surface becomes steeper, since the distance it takes for an increase by 1 in height shrinks. The figure displays the contours of the function f for all integer values 0 ≤ c ≤ 10.

[Figure: contours x² + y² = c for integer c from 0 to 10 on the square −3 ≤ x, y ≤ 3.]
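The shrinking spacing of the contours in Example 2.9 can be tabulated directly. The sketch below lists, for the integer levels c = 0, ..., 10, the gap between consecutive radii next to the closed form from the example; the variable names are assumptions made for illustration.

```python
import math

# Radii of the level curves x^2 + y^2 = c for integer c = 0, ..., 10.
radii = [math.sqrt(c) for c in range(11)]

# Gap between the contours for c and c + 1: sqrt(c + 1) - sqrt(c).
gaps = [r_next - r for r, r_next in zip(radii, radii[1:])]

for c, gap in enumerate(gaps):
    closed_form = 1 / (math.sqrt(c + 1) + math.sqrt(c))
    print(f"c = {c:2d}: gap = {gap:.4f}, 1/(sqrt(c+1) + sqrt(c)) = {closed_form:.4f}")
```

The gaps decrease monotonically, which is the numerical counterpart of the contours becoming denser away from the origin.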

Example 2.10
Let the function f be defined by the assignment (x, y) ↦ f(x, y) = x² − y². Its level curves satisfy the equation x² − y² = c, a constant. We need to distinguish three cases:

• c > 0: in this case we can write the equation as

$$\frac{x^2}{c} - \frac{y^2}{c} = 1,$$

the equation for a ‘left-right’ hyperbola in standard form.
• c < 0: now we rewrite the equation in the form

$$\frac{x^2}{|c|} - \frac{y^2}{|c|} = -1,$$

i.e., the standard form for a ‘top-bottom’ hyperbola.
• c = 0: in this case we have x² = y² or y = ±x, two lines (the asymptotes of the hyperbolae).

Putting all these different contours together we obtain the figure, which shows all contours for the integer values between −10 and 10.

[Figure: contours x² − y² = c for integer c from −10 to 10 on the square −3 ≤ x, y ≤ 3.]

3 Limits and Continuity

3.1 Limits

We first define a useful concept for localising attention to the vicinity of a given point $\mathbf{a} = (a, b) \in \mathbb{R}^2$.

Definition 3.1. The open disc centred at $\mathbf{a}$ with radius r is the set

$$\{(x, y) \in \mathbb{R}^2 : \sqrt{(x - a)^2 + (y - b)^2} < r\}.$$

Note that $d = |\mathbf{x} - \mathbf{a}| = \sqrt{(x - a)^2 + (y - b)^2}$ is the distance between the points $\mathbf{a} = (a, b)$ and $\mathbf{x} = (x, y)$. The disc is called open because it does not include its boundary points, i.e., those with d = r.

Let f : D → R be a function, where D ⊂ R² is the domain of f. Suppose that D contains an open disc centred at (a, b) with radius r, except that (a, b) may not belong to D, i.e., f may not be defined at (a, b). As an example consider the function defined by f(x, y) = (x² + y²)⁻¹.

[Figure: the domain D containing an open disc of radius r centred at (a, b), and a point (x, y) at distance d from (a, b).]

Definition 3.2. We say that f(x, y) tends to l as (x, y) tends to (a, b) if

$$|f(x, y) - l| \to 0 \quad \text{as} \quad |\mathbf{x} - \mathbf{a}| \to 0$$

for $\mathbf{x} \ne \mathbf{a}$. If this is the case then we call l the limit of f(x, y) as (x, y) tends to (a, b) and we write

$$\lim_{(x,y)\to(a,b)} f(x, y) = \lim_{\mathbf{x}\to\mathbf{a}} f(x, y) = l,$$

or f(x, y) → l as (x, y) → (a, b).

Note When a limit exists then it is unique.

Note In many cases the limit is obvious. For example

$$\lim_{(x,y)\to(1,2)} \frac{x^3}{x^2 + y^2} = \frac{1}{1 + 4} = \frac{1}{5},$$

$$\lim_{(x,y)\to(0,0)} \frac{x^2 - y^2}{x + y} = \lim_{(x,y)\to(0,0)} (x - y) = 0.$$

We discuss some examples.

Example 3.1
Determine the limit

$$\lim_{(x,y)\to(0,0)} \frac{x^3}{x^2 + y^2}.$$

First we rewrite the expression as

$$\frac{x^3}{x^2 + y^2} = \underbrace{\frac{x^2}{x^2 + y^2}}_{\le 1} \cdot\, x,$$

so that

$$\left|\frac{x^3}{x^2 + y^2} - 0\right| = \frac{x^2}{x^2 + y^2} \cdot |x| \le |x| \to 0 \quad \text{as } (x, y) \to (0, 0).$$

Therefore, according to the definition,

$$\lim_{(x,y)\to(0,0)} \frac{x^3}{x^2 + y^2} = 0.$$

Example 3.2
Let the function f be defined by the assignment

$$f(x, y) = \frac{x - y}{x + y}$$

and discuss the limit

$$\lim_{(x,y)\to(0,0)} f(x, y).$$

Note that f is not defined at (0, 0). In order to get an idea about the behaviour of the function near (0, 0) we approach this point from different directions. We choose an arbitrary number α ∈ R with α ≠ −1 (f is not defined on the line y = −x) and consider the line with the equation y = αx. Restricting the function f to points on the line yields

$$f(x, \alpha x) = \frac{x - \alpha x}{x + \alpha x} = \frac{1 - \alpha}{1 + \alpha} \to \frac{1 - \alpha}{1 + \alpha} \quad \text{as } (x, y) \to (0, 0).$$

Thus, we obtain different limiting values depending on the direction of approach. So we must conclude that lim(x,y)→(0,0) f(x, y) does not exist.
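The direction dependence in Example 3.2 is easy to observe numerically by evaluating f at points ever closer to the origin along several lines. A minimal sketch; the helper `along_line` and the chosen slopes are assumptions made for illustration.

```python
def f(x, y):
    return (x - y) / (x + y)

def along_line(alpha, x):
    """Value of f at the point (x, alpha*x) on the line y = alpha*x."""
    return f(x, alpha * x)

for alpha in [0.0, 1.0, 2.0, -0.5]:
    values = [along_line(alpha, x) for x in (1e-1, 1e-3, 1e-6)]
    print(f"alpha = {alpha:5.1f}: {values}")
```

Along each line the values stay at (1 − α)/(1 + α), so different lines give different limiting values.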

Example 3.3
While it is true that if lim(x,y)→(a,b) f(x, y) exists and equals L, then the limits along all possible lines approaching (a, b) exist and equal L, the converse is not true: even when the limits along all lines exist and equal the same number L, this does not mean that the limit lim(x,y)→(a,b) f(x, y) exists. Here is a counter-example. Consider the function defined by

$$f(x, y) = \frac{x^4}{x^4 + y^2}.$$

It is defined for (x, y) ≠ (0, 0). What happens at (0, 0)? Consider a line through (0, 0) defined by y = mx for some arbitrary m ∈ R with m ≠ 0. Then for x ≠ 0

$$f(x, mx) = \frac{x^4}{x^4 + m^2 x^2} = \frac{x^2}{x^2 + m^2} \to 0 \quad \text{for } x \to 0.$$

Hence, the limits along every line y = mx with m ≠ 0 exist and equal 0. (For this particular f the x-axis, m = 0, is an exception, since f(x, 0) = 1 for x ≠ 0. The variant x²y/(x⁴ + y²) has limit 0 along every line through the origin and also has different limits along different parabolas.)

However, this does not imply that the limit lim(x,y)→(0,0) f(x, y) exists. If it existed, then it would have to be equal to 0, because the limit is unique. However, consider the parabola y = x². Approaching (0, 0) along the parabola we obtain for x ≠ 0

$$f(x, x^2) = \frac{x^4}{x^4 + x^4} = \frac{1}{2} \to \frac{1}{2} \quad \text{for } x \to 0.$$

But even worse, approaching along an arbitrary parabola y = ax² for some a ∈ R yields

$$f(x, ax^2) = \frac{x^4}{x^4 + a^2 x^4} = \frac{1}{1 + a^2} \to \frac{1}{1 + a^2} \quad \text{for } x \to 0,$$

giving different limits for different parabolas. Therefore, the limit lim(x,y)→(0,0) f(x, y) does not exist.
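The contrast between straight lines and parabolas in Example 3.3 can be seen by evaluating f at one point very close to the origin on each curve. The following is a sketch under the assumption that the sample point x = 10⁻⁴ and the listed slopes are representative; only lines y = mx with m ≠ 0 are sampled.

```python
def f(x, y):
    return x**4 / (x**4 + y**2)

x = 1e-4  # a point close to the origin

# Along lines y = m*x (m nonzero) the values are already tiny.
for m in [0.5, 1.0, -2.0]:
    print(f"line y = {m}x:       f = {f(x, m * x):.2e}")

# Along parabolas y = a*x^2 the values stay at 1/(1 + a^2).
for a in [1.0, 2.0, 0.5]:
    print(f"parabola y = {a}x^2: f = {f(x, a * x**2):.6f}, 1/(1+a^2) = {1 / (1 + a * a):.6f}")
```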

3.2 Continuity

Definition 3.3. Let f : D → R be a function of two variables. If

$$\lim_{(x,y)\to(a,b)} f(x, y) = f(a, b),$$

then we say that f is continuous at (a, b). If f is continuous at all points in D, we say that f is continuous on D.

Note The function defined by f(x, y) = x³/(x² + y²) is continuous at all points (x, y) ≠ (0, 0); at (0, 0) it is not defined. However, the slightly altered function

$$\tilde{f} : \mathbb{R}^2 \to \mathbb{R}, \quad (x, y) \mapsto \begin{cases} f(x, y), & (x, y) \ne (0, 0), \\ 0, & (x, y) = (0, 0) \end{cases}$$

is continuous at all points in the (x, y)-plane.

Note If the functions f and g are both continuous at a point (a, b), so are the functions f + g, f − g and f · g. The function f/g is continuous at (a, b) if g(a, b) ≠ 0.

Note If φ is a continuous function of one variable and if f is continuous at (a, b) then the composed function φ ◦ f, defined by φ ◦ f(x, y) = φ(f(x, y)), is continuous at (a, b).

With these properties we can build a large class of continuous functions starting with the three functions defined everywhere by

f(x, y) = x, g(x, y) = y, h(x, y) = c

for some arbitrary constant c. It is easy to see that these three functions are continuous everywhere. Now we can conclude that

• the powers ax², by², cxy are continuous everywhere,
• the 3rd order powers ax³, bxy², . . . are continuous everywhere, and, continuing by induction,
• all powers are continuous, and therefore also
• all polynomials are continuous everywhere, so that finally
• all rational functions defined by

$$r(x, y) = \frac{p(x, y)}{q(x, y)}$$

for arbitrary polynomials p and q are continuous at all points (x, y) with q(x, y) ≠ 0.

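Example 3.4
The functions defined by the following expressions are continuous everywhere:

$$\frac{x^2 - xy + y^2}{1 + x^2 + y^2}, \qquad \exp(x^2 - y^2), \qquad \sin\left(\frac{x^2 - y^2}{1 + x^2 + y^2}\right).$$

The function defined by

$$\frac{x^5 - y^5}{x^2 - y^2}$$

is continuous at (a, b) if a ≠ ±b, i.e., wherever the denominator does not vanish.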

Note Series, i.e., infinite sums, need special attention: for example, consider the series

$$f(x) = -\frac{2}{\pi}\left(\sin 2\pi x + \frac{1}{2}\sin 4\pi x + \frac{1}{3}\sin 6\pi x + \frac{1}{4}\sin 8\pi x + \cdots\right).$$

Every term is continuous everywhere so that every partial sum is continuous everywhere. However, one can show that the series converges to a function which is not continuous everywhere. It has jumps at all integer values n. In fact,

$$f(x) = \begin{cases} 2(x - n) - 1, & n < x < n + 1, \\ 0, & x = n. \end{cases}$$

[Figure: graph of f for 0 ≤ x ≤ 4, with a jump at every integer.]
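The claim about this series can be explored numerically by comparing a partial sum with the stated limit function. The sketch below is only an illustration: the number of terms (2000) and the sample points are assumptions, and convergence is slow, so agreement is only to a few decimal places.

```python
import math

def partial_sum(x, n_terms):
    """Partial sum -(2/pi) * sum_{k=1}^{n_terms} sin(2*pi*k*x) / k."""
    total = sum(math.sin(2 * math.pi * k * x) / k for k in range(1, n_terms + 1))
    return -2 / math.pi * total

def limit_function(x):
    """Stated limit: 2(x - n) - 1 for n < x < n + 1, and 0 at integers n."""
    n = math.floor(x)
    return 0.0 if x == n else 2 * (x - n) - 1

for x in [0.25, 0.5, 1.75, 2.0]:
    print(x, partial_sum(x, 2000), limit_function(x))
```

Each partial sum is a continuous function of x, while the limit function jumps from values near 1 to values near −1 across every integer.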

4 Partial Derivatives

4.1 Partial derivatives

Think of the graph of a function f as a hill. Suppose you stand on that hill at a point P. As you walk away from that point you will go up or down or stay at the same level. The change of height will in general depend on the direction in which you proceed.

The rate of change of f in the direction defined by a unit-vector u is the directional derivative of f at the point P in the direction u. It is represented by the slope of the curve C on the surface which lies above the line in direction u.

We see here the exact same phenomenon as with continuity: there are many directions from a point P in which the function can change. We will see later how to compute the directional derivative in the direction of an arbitrary unit-vector u.

If the function f is not constant then there are distinguished directions:

• the one in which the function increases most rapidly,
• the one in which it decreases most rapidly,
• and those in which f does not change.

These directions are determined by the function itself, in contrast to the following other special directions. These are determined by our use of coordinates x and y:

• the direction parallel to the x-axis. In that case the directional derivative is the rate of change of f with x, keeping y fixed. This is denoted by

$$\frac{\partial f}{\partial x} = f_x = \partial_x f$$

and we call it the partial derivative of f with respect to x. From the definition of a derivative we have

$$\frac{\partial f}{\partial x}(a, b) = \lim_{h \to 0} \frac{f(a + h, b) - f(a, b)}{h}.$$

• the direction parallel to the y-axis. In that case the directional derivative is the rate of change of f with y, keeping x fixed, denoted by

$$\frac{\partial f}{\partial y} = f_y = \partial_y f$$

and we call it the partial derivative of f with respect to y. From the definition of a derivative we have

$$\frac{\partial f}{\partial y}(a, b) = \lim_{h \to 0} \frac{f(a, b + h) - f(a, b)}{h}.$$

Definition 4.1. The partial derivatives of a function f of two variables at the point (a, b) are ∂x f(a, b) and ∂y f(a, b) as defined above.
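The limit definitions above suggest a direct numerical approximation: evaluate the difference quotient for a small but finite h. This is a sketch only; the step size h = 10⁻⁶ and the test function g are assumptions chosen for illustration and do not appear in the notes.

```python
def partial_x(f, a, b, h=1e-6):
    """Difference quotient in x at (a, b), keeping y = b fixed."""
    return (f(a + h, b) - f(a, b)) / h

def partial_y(f, a, b, h=1e-6):
    """Difference quotient in y at (a, b), keeping x = a fixed."""
    return (f(a, b + h) - f(a, b)) / h

def g(x, y):
    # Hypothetical test function: g_x = 2xy and g_y = x^2 + 3y^2.
    return x**2 * y + y**3

print(partial_x(g, 1.0, 2.0))  # close to 4
print(partial_y(g, 1.0, 2.0))  # close to 13
```

Each quotient varies only one argument, which is why a single partial derivative carries only part of the information about how the function changes.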

Note These derivatives are called partial, because individually they do not give the full information about the change of the function f. Note that partial derivatives are directional derivatives in the directions of the coordinate axes.

We can get an idea about the partial derivatives from the layout of the level curves of the function.

By comparing the neighbouring level curves we can get a rough idea about the sign of the partial derivatives at various points of the diagram. In the figure we have marked three points and it is not difficult to get these statements:

at P : ∂f/∂x > 0, ∂f/∂y < 0,

at Q : ∂f/∂x < 0, ∂f/∂y > 0,

at R : ∂f/∂x = 0, ∂f/∂y < 0.

[Figure: contour plot on −1 ≤ x ≤ 1, −2 ≤ y ≤ 0 with contours labelled by values between −3.9 and 1.56 and the three marked points P, Q and R.]

4.2 Higher partial derivatives

We continue our discussion with a function f(x, y) of two variables. If the partial derivative f_x can be evaluated at every point of D, the domain of f, then it is by itself a real-valued function of two variables (x, y) defined on D,

$$f_x : D \to \mathbb{R}, \quad (x, y) \mapsto \frac{\partial f}{\partial x}(x, y),$$

and so we can compute its partial derivatives

$$\frac{\partial f_x}{\partial x} = f_{xx} = \partial_x(\partial_x f) = \partial_x^2 f = \frac{\partial^2 f}{\partial x^2} \qquad \text{and} \qquad \frac{\partial f_x}{\partial y} = f_{xy} = \partial_y(\partial_x f) = \frac{\partial^2 f}{\partial y\,\partial x}.$$

Similarly, for f_y we can compute its partial derivatives, denoted by any of the following possibilities:

$$\frac{\partial f_y}{\partial x} = f_{yx} = \partial_x(\partial_y f) = \frac{\partial^2 f}{\partial x\,\partial y} \qquad \text{and} \qquad \frac{\partial f_y}{\partial y} = f_{yy} = \partial_y(\partial_y f) = \partial_y^2 f = \frac{\partial^2 f}{\partial y^2}.$$

The four functions f_xx, f_xy, f_yx and f_yy are real-valued functions of two variables and we can compute their partial derivatives. In this way we can generate partial derivatives of arbitrary order. The derivatives up to third order are displayed in the tree diagram

    f
      ∂x: f_x
          ∂x: f_xx    (∂x: f_xxx,  ∂y: f_xxy)
          ∂y: f_xy    (∂x: f_xyx,  ∂y: f_xyy)
      ∂y: f_y
          ∂x: f_yx    (∂x: f_yxx,  ∂y: f_yxy)
          ∂y: f_yy    (∂x: f_yyx,  ∂y: f_yyy)

Tree diagram for the partial derivatives of up to order three

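Example 4.1
Consider the function

$$f : \mathbb{R}^2 \to \mathbb{R}, \quad (x, y) \mapsto \begin{cases} xy\,\dfrac{x^2 - y^2}{x^2 + y^2}, & (x, y) \ne (0, 0), \\ 0, & (x, y) = (0, 0). \end{cases}$$

Let us compute the mixed derivatives at (0, 0). Observe that on the x-axis (y = 0!) we have for x ≠ 0

$$\frac{\partial f}{\partial y}(x, 0) = x\,\frac{x^2}{x^2} + x \cdot 0 \cdots = x.$$

In this formula the dots indicate terms that we need not compute because they will be multiplied by 0. So we find that on the x-axis

$$\frac{\partial f}{\partial y}(x, 0) = x \qquad \text{and, therefore,} \qquad \frac{\partial^2 f}{\partial x\,\partial y}(x, 0) = 1.$$

Similarly, for y ≠ 0 we obtain along the y-axis

$$\frac{\partial f}{\partial x}(0, y) = y\,\frac{-y^2}{y^2} + 0 \cdot y \cdots = -y.$$

Again, the dots indicate terms which we have not computed because they are multiplied by 0. Now, we find that on the y-axis

$$\frac{\partial^2 f}{\partial y\,\partial x}(0, y) = -1.$$

Combining these two results we obtain that at the origin

$$\lim_{x \to 0} \frac{\partial}{\partial x}\frac{\partial f}{\partial y}(x, 0) = 1 \ne -1 = \lim_{y \to 0} \frac{\partial}{\partial y}\frac{\partial f}{\partial x}(0, y).$$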
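The two values 1 and −1 can be reproduced with nested difference quotients. This is a rough numerical sketch, not part of the notes: the helper names, the inner step h and the outer step k are assumptions, and the inner step has to be much smaller than the outer one for the approximation to be meaningful.

```python
def f(x, y):
    if x == 0 and y == 0:
        return 0.0
    return x * y * (x**2 - y**2) / (x**2 + y**2)

def f_y(x, y, h=1e-7):
    """Central difference approximation of the partial derivative in y."""
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

def f_x(x, y, h=1e-7):
    """Central difference approximation of the partial derivative in x."""
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

k = 1e-3  # outer step, much larger than the inner step h
d_x_of_f_y = (f_y(k, 0.0) - f_y(-k, 0.0)) / (2 * k)  # x-derivative of f_y at the origin
d_y_of_f_x = (f_x(0.0, k) - f_x(0.0, -k)) / (2 * k)  # y-derivative of f_x at the origin
print(d_x_of_f_y, d_y_of_f_x)  # close to 1 and -1
```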

Therefore, the mixed derivatives need not be equal! However, it is clear that the above example is rather involved and this indicates that inequality of mixed derivatives is rather the exception and not the rule. Indeed, there is a theorem which states the conditions under which the mixed derivatives are equal.

Theorem 4.1 (Clairaut’s theorem). Suppose that a function f is defined on a disk D around a point (a, b) ∈ D. If the functions f_xy and f_yx are both continuous on D then

$$f_{xy}(a, b) = f_{yx}(a, b).$$

This theorem can also be found attached to the name of Hermann Amandus Schwarz.

Note Similar statements hold for the mixed higher derivatives: for example, if f_xyy, f_yxy and f_yyx are continuous on D then they are equal.

Note Therefore, when the theorem applies, the order of differentiation in the mixed derivatives does not play a role (for the result, not for the effort of computation).

Note One often paraphrases this theorem by saying that the partial derivatives commute, i.e., for any function f for which the theorem applies one has

$$\partial_x(\partial_y(f)) = \partial_y(\partial_x(f)).$$

Note This property is very important and many mathematical results hinge on it (see later). It is not very often that one encounters a case where the partial derivatives do not commute.

5 Linear approximation and derivative

5.1 Curves and tangent vectors

We want to describe a curve in the plane such as the one in the figure. Imagine driving along the curve and determining your location at every second. Every time we measure the location we obtain a point (x(t), y(t)), where t is the instant when we measure the location.

[Figure: a curve in the (x, y)-plane with the positions at t = 1, 2, 3, 4, 5 marked.]

(See Stewart §13.1.)

This gives a function which assigns to every instant of time within an interval I ⊂ R a point (x(t), y(t)) ∈ R²,

$$\gamma : I \to \mathbb{R}^2, \quad t \mapsto (x(t), y(t)).$$

Definition 5.1. A curve in the plane (or in space) is a function γ which assigns to each t in an interval I ⊂ R a uniquely determined point (x(t), y(t)) ∈ R² (or (x(t), y(t), z(t)) ∈ R³).

Example 5.1

(i) γ : (0, 2π) → R², t ↦ (a cos t, b sin t) defines an ellipse with centre at (0, 0) and semi-axes a and b. Note that the point (a, 0) does not lie on the curve.

(ii) γ : R → R², t ↦ (a cos t, b sin t) defines a curve which runs through the same ellipse as before, except that now the ellipse is traced infinitely often.

(iii) γ : (−∞, ∞) → R³, t ↦ (a cos t, a sin t, bt). This is a space curve. In the (x, y) components the curve is like a circle with radius a, while its z component increases linearly with t. So this curve is a helix which winds upwards around the z-axis in a counter-clockwise direction when b > 0 and downwards when b < 0.

(iv) Let f : R → R be a real-valued function, then the curve γ : R → R², t ↦ (t, f(t)) is the graph of f. Note that here x(t) = t and y(t) = f(t), so the relation between x and y is y = f(x).

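Given a curve $\gamma(t) = (x(t), y(t))$ for $t$ in some interval $I$ and $t_0 \in I$ we consider the vectors (see the figure)

$$\frac{\gamma(t_0 + h) - \gamma(t_0)}{h}.$$

[Figure: the difference-quotient vectors at a point of the curve for h = 2, h = 1.5, h = 1.0 and h = 0.5.]

Definition 5.2. We define the derivative of $\gamma$ at $t_0$ by

$$\dot\gamma(t_0) = \lim_{h \to 0} \frac{\gamma(t_0 + h) - \gamma(t_0)}{h}.$$

If this limit exists we call $\dot\gamma(t_0)$ the tangent vector to $\gamma$ at $t_0$.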


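Note The tangent vector has components given by

$$\dot\gamma(t) = \begin{pmatrix} \dot x(t) \\ \dot y(t) \end{pmatrix}.$$

A very similar definition applies to the tangent vector of a space curve, in which case we have

$$\dot\gamma(t) = \begin{pmatrix} \dot x(t) \\ \dot y(t) \\ \dot z(t) \end{pmatrix}.$$

We compute the tangent vectors for the examples above:

• For (i) and (ii) the tangent vector at $t$ is

$$\dot\gamma(t) = \begin{pmatrix} -a\sin t \\ b\cos t \end{pmatrix}.$$

• For (iii) we obtain

$$\dot\gamma(t) = \begin{pmatrix} -a\sin t \\ a\cos t \\ b \end{pmatrix}.$$

• In case (iv) the tangent vector is

$$\dot\gamma(t) = \begin{pmatrix} 1 \\ f'(t) \end{pmatrix}.$$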
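The limit in Definition 5.2 can be watched numerically. The following is a minimal sketch, not part of the original notes: it takes the ellipse of Example 5.1 (i) with the assumed values a = 2, b = 1 and t0 = 1, and compares the difference quotient for shrinking h with the tangent vector (−a sin t0, b cos t0). The helper names `gamma` and `difference_quotient` are chosen here for illustration.

```python
import math

def gamma(t, a=2.0, b=1.0):
    """Ellipse from Example 5.1 (i): t -> (a cos t, b sin t); a, b are assumed values."""
    return (a * math.cos(t), b * math.sin(t))

def difference_quotient(curve, t0, h):
    """Componentwise (curve(t0 + h) - curve(t0)) / h."""
    p, q = curve(t0), curve(t0 + h)
    return tuple((qi - pi) / h for pi, qi in zip(p, q))

t0 = 1.0
exact = (-2.0 * math.sin(t0), 1.0 * math.cos(t0))  # (-a sin t0, b cos t0)

# The difference quotients approach the tangent vector as h shrinks.
for h in (0.5, 0.1, 0.01, 0.001):
    print(h, difference_quotient(gamma, t0, h))
print("tangent vector:", exact)
```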

[Figure: graph of f(x) with the tangent vector at x = 1, showing Δx = 1 and Δy = f′(1).]

5.2 Linear approximation and derivative

Consider the definition of the derivative of a real-valued function

$$f'(x_0) = \lim_{h \to 0} \frac{f(x_0 + h) - f(x_0)}{h}.$$

This means that

$$\left| \frac{f(x_0 + h) - f(x_0)}{h} - f'(x_0) \right| \to 0, \quad \text{as } h \to 0.$$

With $x = x_0 + h$ this can be written as

$$\left| \frac{f(x) - f(x_0) - f'(x_0)(x - x_0)}{x - x_0} \right| \to 0, \quad \text{as } x \to x_0.$$

In order to abbreviate this one often writes

$$f(x) - f(x_0) - f'(x_0)(x - x_0) = o(x - x_0),$$

or

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0).$$

The symbol $o(x - x_0)$ stands for terms which are left unspecified except for their behaviour as $x$ approaches $x_0$. One says that $o(x - x_0)$ is of higher order than $x - x_0$ when $|x - x_0| \to 0$ [1]. With $\Delta y = f(x) - f(x_0)$ and $\Delta x = x - x_0$ we can also write

$$\Delta y = f'(x_0)\Delta x + o(\Delta x).$$

[1] The exact definition of this symbol is the following: we say that $f(x)$ is of higher order than $g(x)$ as $x \to x_0$ and write $f(x) = o(g(x))$ if and only if $\lim_{x \to x_0} \frac{|f(x)|}{|g(x)|} = 0$. It means that the higher order terms $o(x - x_0)$ vanish faster than $|x - x_0|$ as $x$ tends to $x_0$. For example, $(x - x_0)^2 = o(x - x_0)$, $(x - x_0)^3 = o(x - x_0)$ but $\sqrt{x - x_0} \neq o(x - x_0)$.

Among all lines $y = f(x_0) + m(x - x_0)$ through the point $(x_0, f(x_0))$ with slope $m$ the one with slope $m = f'(x_0)$ agrees best with the graph of $f(x)$ at $x_0$. Therefore,

Definition 5.3. We call the line with equation $y = f(x_0) + f'(x_0)(x - x_0)$ the linear approximation to $f(x)$ in the point $x_0$.

The linear approximation has the form $y = mx + b$, i.e., a constant ($b$) plus a term ($mx$) linear in $x$. We will find this kind of structure again and again.

We now discuss the linear approximation of curves. So let $\gamma$ be a curve in the plane given by its two components $\gamma(t) = (x(t), y(t))$. Its derivative at $t_0$ is

$$\frac{d\gamma}{dt}(t_0) = \dot\gamma(t_0) = \lim_{h \to 0} \frac{\gamma(t_0 + h) - \gamma(t_0)}{h},$$

which is the same as the statement that

$$\left| \frac{\gamma(t_0 + h) - \gamma(t_0)}{h} - \dot\gamma(t_0) \right| \to 0, \quad \text{as } h \to 0.$$

Or, with $t = t_0 + h$,

$$\gamma(t) = \gamma(t_0) + \dot\gamma(t_0)(t - t_0) + o(t - t_0).$$

Among all lines through the point $\gamma(t_0)$ the one with tangent vector $\dot\gamma(t_0)$ agrees best with $\gamma(t)$ for $t$ near $t_0$. So, the linear approximation for $\gamma(t)$ at $t_0$ is the straight line with the parametric equation

$$l(t) = \gamma(t_0) + \dot\gamma(t_0)(t - t_0).$$

Again, we see that the linear approximation has the form 'constant term' + 'linear term'.

Next we want to study the question as to what is the linear approximation for a real-valued function $f$ of several variables. We will use again for convenience the case of two variables but everything generalises to more than two variables. (Stewart §14.4)

[Figure: the graph surface of f cut by the vertical planes x = a and y = b, with the tangent vectors u and v of the two traces.]

Fix a point $(a, b) \in D$ in the domain of the function. We cut the graph of the function $f(x, y)$ by two vertical planes through $(a, b)$ parallel to the $(y, z)$-plane and the $(x, z)$-plane, respectively. They are given by the equations $x = a$ and $y = b$, respectively. The trace obtained on either of the planes is a curve, which can be written as the graph of the function $z = f(a, y)$ or the function $z = f(x, b)$.

Focusing on the trace in the plane $y = b$ we can write it as a space curve

$$\gamma_b(t) = (a + t, b, f(a + t, b)), \quad \text{i.e.,} \quad x(t) = a + t, \; y(t) = b, \; z(t) = f(a + t, b).$$

Note that the points on this curve satisfy the equation which characterises the trace, namely $z = f(a + t, b) = f(x, b)$. Note also that $\gamma_b(0) = (a, b, f(a, b))$. Similarly, we can write the trace in the plane $x = a$ as the space curve

$$\gamma_a(t) = (a, b + t, f(a, b + t)), \quad \text{i.e.,} \quad x(t) = a, \; y(t) = b + t, \; z(t) = f(a, b + t).$$

As before, we find that all points $(x(t), y(t), z(t))$ on this curve satisfy the equation which defines the trace of $f$ in the plane $x = a$: $f(x(t), y(t)) = f(a, b + t) = z(t)$, so $z = f(a, y)$.

We compute the tangent vectors of the two curves at $t = 0$. This gives us two vectors $\mathbf{u}$ and $\mathbf{v}$ at the point $P = (a, b, f(a, b))$:

$$\mathbf{u} = \dot\gamma_b(0) = \begin{pmatrix} 1 \\ 0 \\ f_x(a, b) \end{pmatrix}, \qquad \mathbf{v} = \dot\gamma_a(0) = \begin{pmatrix} 0 \\ 1 \\ f_y(a, b) \end{pmatrix}.$$

These two vectors define a plane through $P$ given in parametric form as

$$\mathbf{x} = \mathbf{p} + r\mathbf{u} + s\mathbf{v}, \quad \text{for } r, s \in \mathbb{R}. \qquad (\ast)$$

Here, $\mathbf{p}$ is the position vector of the point $P$, pointing to $P$ from the origin.

This vector equation can be written out explicitly and yields the three equations

$$x = a + r, \quad y = b + s, \quad z = f(a, b) + r f_x(a, b) + s f_y(a, b).$$

A plane can also be defined in normal form by an equation

$$\mathbf{n} \cdot (\mathbf{x} - \mathbf{p}) = 0. \qquad (\ast\ast)$$

Here, the vector $\mathbf{n}$ is a normal vector to the plane and $\mathbf{p}$ is the position vector to a point $P$ lying on the plane. The equation asserts that every vector from the point $P$ to any other point on the plane with position vector $\mathbf{x}$ is perpendicular to $\mathbf{n}$. What is the normal form of the plane defined by $\mathbf{u}$ and $\mathbf{v}$? Since we already know that $P$ lies on the plane we only need to determine a normal vector, which is perpendicular to all vectors in the plane.

There are two ways to determine a normal vector. We can compute the cross product between $\mathbf{u}$ and $\mathbf{v}$, obtaining

$$\mathbf{n} = \mathbf{u} \times \mathbf{v} = \begin{pmatrix} -f_x(a, b) \\ -f_y(a, b) \\ 1 \end{pmatrix}.$$

Note that this method is quick and easy, but it only works for functions of two variables. The reason is that in higher dimensions (i.e., with more variables) there does not exist a cross product between vectors that produces a vector of the same kind.

The other possibility is to use the normal form equation. We write

$$\mathbf{n} = \begin{pmatrix} A \\ B \\ C \end{pmatrix}$$

and insert into the equation $(\ast\ast)$

$$A(x - a) + B(y - b) + C(z - f(a, b)) = 0. \qquad (\ast\ast\ast)$$

This equation must be fulfilled for every point on the plane. Consequently, when we insert $(\ast)$ into $(\ast\ast)$ we must get a true equation for all values of $r$ and $s$:

$$Ar + Bs + C(r f_x(a, b) + s f_y(a, b)) = 0.$$

Since this needs to hold for all $r$ and $s$ we obtain two equations

$$A + C f_x(a, b) = 0, \qquad B + C f_y(a, b) = 0,$$

and this gives us expressions for $A$ and $B$ in terms of $C$:

$$A = -C f_x(a, b), \qquad B = -C f_y(a, b).$$

Inserting these back into $(\ast\ast\ast)$ (and dividing by $C$) we get the normal form

$$(z - f(a, b)) - f_x(a, b)(x - a) - f_y(a, b)(y - b) = 0.$$

Note that this is exactly what we get from using the normal vector determined above in terms of the cross product.

Definition 5.4. For a function $f : D \to \mathbb{R}$, $D \subset \mathbb{R}^2$ with $(a, b) \in D$ we call the plane with the equation

$$(z - f(a, b)) - f_x(a, b)(x - a) - f_y(a, b)(y - b) = 0$$

the tangent plane to the surface given by $z = f(x, y)$ at the point $(a, b)$.

Note The vector of coefficients

$$\mathbf{n} = \begin{pmatrix} -f_x(a, b) \\ -f_y(a, b) \\ 1 \end{pmatrix}$$

is a normal vector to the plane. It is not necessarily a unit-vector.

Note For a function of more than two variables, say $f(x_1, x_2, \ldots, x_n)$, we do not talk of the tangent plane but the tangent space. This is a higher dimensional object sitting in $\mathbb{R}^{n+1}$. This is so because the graph of this function is described by the equation $x_{n+1} = f(x_1, x_2, \ldots, x_n)$ and the equation describing the tangent space at a point $\mathbf{a} = (a_1, a_2, \ldots, a_n)$ becomes

$$(x_{n+1} - f(\mathbf{a})) - f_{x_1}(\mathbf{a})(x_1 - a_1) - f_{x_2}(\mathbf{a})(x_2 - a_2) - \cdots - f_{x_n}(\mathbf{a})(x_n - a_n) = 0.$$

Example 5.2
Find the tangent plane to the graph of the function $z = f(x, y) = x^2 - y^2$ in the point $(1, 2)$.
At $(1, 2)$ we have $f(1, 2) = -3$. Compute $f_x(x, y) = 2x$ and $f_y(x, y) = -2y$, so that $f_x(1, 2) = 2$ and $f_y(1, 2) = -4$. Now the equation for the tangent plane is obtained by inserting,

$$(z + 3) - 2(x - 1) + 4(y - 2) = 0.$$

Consider again the equation for the tangent plane to a function given in Definition 5.4. We can solve this equation for $z$ and obtain

$$z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b).$$

This defines, again, a function which has a constant part and a part which is linear in $x$ and $y$. This function is the linear approximation of $f$ in the point $(a, b)$.

Definition 5.5. The linear approximation of a function $f$ of two variables in the point $(a, b)$ is the function $L_f$ defined by

$$L_f : \mathbb{R}^2 \to \mathbb{R}, \quad L_f(x, y) = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b).$$

Its graph coincides with the tangent plane of $f$ at the point $(a, b)$. The function $L_f$ is often called the linearisation of $f$ in the point $(a, b)$.


Note The tangent plane of L f at the point (a, b) coincides with the graph of L f . (Show this!)

Definition 5.6. A function of two variables (x , y) is differentiable in a point (a, b) if

f (a+∆x , b+∆y) = f (a, b) + fx(a, b)∆x + f y(a, b)∆y + o(|∆x |+ |∆y|).

Note This condition can be rephrased as follows

f (a+∆x , b+∆y)− L f (a+∆x , b+∆y) = o(|∆x |+ |∆y|).

In words, the linearisation of f is a very good approximation near (a, b), i.e., for small ∆x and ∆y .
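To make the little-o condition concrete, here is a small sketch (an illustration added here, not taken from the notes) using the function and point of Example 5.2. It evaluates the error f − L_f divided by |Δx| + |Δy| along the assumed displacements (Δx, Δy) = (d, 2d); differentiability means this ratio tends to 0. The names `L_f` and `relative_error` are hypothetical helpers.

```python
def f(x, y):
    """Function of Example 5.2."""
    return x**2 - y**2

def L_f(x, y, a=1.0, b=2.0):
    """Linearisation of f at (a, b): f(a,b) + f_x(a,b)(x-a) + f_y(a,b)(y-b)."""
    fx, fy = 2 * a, -2 * b  # partial derivatives of x^2 - y^2 at (a, b)
    return f(a, b) + fx * (x - a) + fy * (y - b)

def relative_error(dx, dy, a=1.0, b=2.0):
    """|f - L_f| at (a+dx, b+dy), divided by |dx| + |dy|."""
    err = abs(f(a + dx, b + dy) - L_f(a + dx, b + dy))
    return err / (abs(dx) + abs(dy))

# The ratio shrinks together with the displacement.
for d in (0.1, 0.01, 0.001):
    print(d, relative_error(d, 2 * d))
```

For this particular f the error is exactly Δx² − Δy², so along (d, 2d) the ratio equals d up to rounding.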

It is not always easy to verify whether a given function is differentiable using this definition. That's why we quote a theorem that puts our mind at ease:

Theorem 5.1. If the partial derivatives $f_x$ and $f_y$ exist and are continuous in $(a, b)$, then $f$ is differentiable in that point.

When $x$ changes from $a$ to $a + \Delta x$ and $y$ changes from $b$ to $b + \Delta y$, then the value of $f$ changes from $f(a, b)$ to $f(a + \Delta x, b + \Delta y)$. Let us denote this difference by $\Delta z$. Then we can write

$$\Delta z = f_x(a, b)\Delta x + f_y(a, b)\Delta y + o(\Delta x) + o(\Delta y)$$

for the change in the value of $f$. The corresponding change in the values of the linear approximation $L_f$ is denoted by $dz$ and we obtain with $dx = \Delta x$ and $dy = \Delta y$

$$dz = L_f(a + \Delta x, b + \Delta y) - L_f(a, b) = f_x(a, b)\,dx + f_y(a, b)\,dy.$$

This expression has a name of its own.

Definition 5.7. The total differential d f (a, b) of a function f (x , y) in the point (a, b) is

(dz =)d f (a, b) = fx(a, b)dx + f y(a, b)dy.

Its interpretation is the following: given arbitrary changes $dx$, $dy$ in the variables the total differential gives an estimate of the change in the value $z$ of the function. This estimate is based on the linear approximation. That means that when $f$ is differentiable this estimate becomes increasingly better, the smaller $dx$ and $dy$ are.

This property is often used to estimate measurement errors. The idea here is that the measurement errors are much smaller than the value obtained by the measurement. Here is an example.


Example 5.3
The volume of a circular cone is $V = \frac{1}{3}\pi r^2 h$, where $r$ is the radius of the base and $h$ is the height. Suppose that $r = 3$, $h = 5$ and that $r$ increases by 0.1, while $h$ decreases by 0.3. What is the approximate change in the volume $V$?
With $dV = \frac{2}{3}\pi r h\,dr + \frac{1}{3}\pi r^2\,dh$ we get

$$dV = 10\pi \times \frac{1}{10} - \frac{1}{3}\pi \times 9 \times \frac{3}{10} = \frac{1}{10}\pi.$$

The actual change in the volume is

$$\Delta V = \frac{1}{3}\pi\left(r + \frac{1}{10}\right)^2\left(h - \frac{3}{10}\right) - \frac{1}{3}\pi r^2 h \approx 0.0557\pi.$$

Here the estimated change is roughly twice as large as the true change. However, compute this for $dr = \frac{1}{100}$ and $dh = -\frac{3}{100}$.
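The comparison suggested at the end of Example 5.3 can be carried out with a few lines of code. This is a sketch for checking the arithmetic only; the function names `volume` and `dV` are chosen here and are not from the notes.

```python
import math

def volume(r, h):
    """Volume of a circular cone, V = (1/3) pi r^2 h."""
    return math.pi * r**2 * h / 3

def dV(r, h, dr, dh):
    """Total differential dV = V_r dr + V_h dh."""
    return (2 * math.pi * r * h / 3) * dr + (math.pi * r**2 / 3) * dh

r, h = 3.0, 5.0
for dr, dh in ((0.1, -0.3), (0.01, -0.03)):
    estimate = dV(r, h, dr, dh)
    actual = volume(r + dr, h + dh) - volume(r, h)
    # Print both in units of pi, as in the example.
    print(dr, dh, estimate / math.pi, actual / math.pi)
```

With the ten times smaller changes the estimate and the actual change should agree much more closely, in line with the interpretation of the total differential given above.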

Example 5.4
In the previous example suppose that $r$ was measured with a margin of error of 2% and $h$ within 3%. This is the same as saying that the relative or percentage error in $r$ is

$$\frac{\Delta r}{r} = \frac{\text{error in } r}{r} = 2\% = 0.02 = \frac{2}{100}$$

and that

$$\frac{\Delta h}{h} = 3\% = 0.03 = \frac{3}{100}.$$

What is the percentage error in $V$? We use differentials for this, i.e., we write $dr/r = 2/100$ and $dh/h = 3/100$ and compute $dV/V$ from the formula for the volume and its differential,

$$\frac{dV}{V} = \frac{\frac{2}{3}\pi r h\,dr + \frac{1}{3}\pi r^2\,dh}{\frac{1}{3}\pi r^2 h} = 2\frac{dr}{r} + \frac{dh}{h} = 2 \cdot \frac{2}{100} + \frac{3}{100}.$$

So the relative error in the volume is approximately 7%.

5.3 The chain rule

We go back to functions of one variable. Suppose we have two functions, $f : \mathbb{R} \to \mathbb{R}$ and $g : \mathbb{R} \to \mathbb{R}$, and we define the composition function $h = f \circ g$ by the assignment

$$h(x) = f(g(x)),$$

then we compute the derivative of $h$ in a point $a$ by the chain rule as given in the

Theorem 5.2. If $f$ is differentiable in $g(a)$ and $g$ is differentiable in $a$, then $h$ is differentiable in $a$ and its derivative is

$$h'(a) = f'(g(a))\,g'(a).$$

How can one see this using differentials? Writing

$$z = h(x), \quad z = f(y), \quad y = g(x)$$

we can write the differentials of these functions as

$$dz = h'(x)\,dx, \quad dz = f'(y)\,dy, \quad dy = g'(x)\,dx.$$

When evaluating $h(x) = f(g(x))$ we see that a change $dx$ in $x$ causes a change in $g(x)$ which we have named $dy$. This, in turn, causes a change in $z = f(y)$ and this is the resulting change in $h(x)$. So, the overall change in $h(x)$ is a combination of the changes in $y = g(x)$ and in $z = f(y)$. Combining the differentials we obtain at the point $a$

$$dz = h'(a)\,dx = f'(g(a))\,dy = f'(g(a))\,g'(a)\,dx.$$

Since this holds for arbitrary changes $dx$, this equation results in the chain rule.

Suppose now that $f : D \to \mathbb{R}$ for some domain $D \subset \mathbb{R}^2$ is a function of two variables and that $\gamma : \mathbb{R} \to D$ is a curve in the plane. We want to find the change in $z = f(x, y)$ as $(x, y)$ varies along the curve. So the question is, what is

$$\frac{d}{dt} f(\gamma(t))\,?$$

To answer this we need to look at the change in $\gamma(t) = (x(t), y(t))$ as $t$ varies around some value $t_0$. Recall the linear approximation for the curve $\gamma$

$$\gamma(t) = \gamma(t_0) + \dot\gamma(t_0)(t - t_0) + o(t - t_0) \implies \Delta\gamma = \dot\gamma(t_0)\Delta t + o(\Delta t)$$

or, in terms of differentials,

$$d\gamma = \begin{pmatrix} dx \\ dy \end{pmatrix} = \begin{pmatrix} \dot x(t_0) \\ \dot y(t_0) \end{pmatrix} dt.$$

Consequently, the change in $z = f(x, y)$ near $(a, b) = \gamma(t_0)$ given changes $dx$ and $dy$ is

$$dz = df(a, b) = f_x(a, b)\,dx + f_y(a, b)\,dy,$$

but the changes in $x$ and $y$ are given in terms of the change along the curve, so

$$dz = df(a, b) = \left( f_x(a, b)\,\dot x(t_0) + f_y(a, b)\,\dot y(t_0) \right) dt.$$

The function that we are considering here is a real-valued function $h$ of one variable, defined by

$$z = h(t) = f(\gamma(t))$$

so its differential is $dz = h'(t_0)\,dt$. Comparing the two expressions for $dz$ we find

$$h'(t_0) = \left.\frac{d}{dt} f(\gamma(t))\right|_{t = t_0} = f_x(a, b)\,\dot x(t_0) + f_y(a, b)\,\dot y(t_0).$$

This is the chain rule in this case.

Theorem 5.3. If $f : D \to \mathbb{R}$, $D \subset \mathbb{R}^2$ is differentiable in $(a, b)$ and if $\gamma : I \to D$ is a curve running through the point $(a, b) = \gamma(t_0)$, which is differentiable at $t_0$, then the composition function $h : I \to \mathbb{R}$, $t \mapsto h(t) = f(\gamma(t)) = f(x(t), y(t))$ is differentiable in $t_0$ and its derivative is

$$h'(t_0) = f_x(a, b)\,\dot x(t_0) + f_y(a, b)\,\dot y(t_0).$$
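The formula of Theorem 5.3 can be checked numerically. The sketch below is an illustration under assumed choices: the function f(x, y) = x²y + sin y, the curve γ(t) = (cos t, t²) and the value t0 = 0.7 are hypothetical examples, not taken from the notes. It compares the chain-rule expression with a central difference quotient of h(t) = f(γ(t)).

```python
import math

def f(x, y):
    return x**2 * y + math.sin(y)        # hypothetical example function

def grad_f(x, y):
    return (2 * x * y, x**2 + math.cos(y))  # (f_x, f_y), computed by hand

def gamma(t):
    return (math.cos(t), t**2)           # hypothetical example curve

def gamma_dot(t):
    return (-math.sin(t), 2 * t)         # tangent vector of gamma

def h(t):
    return f(*gamma(t))

def h_prime_chain(t):
    """Chain rule: f_x * x_dot + f_y * y_dot, evaluated along the curve."""
    fx, fy = grad_f(*gamma(t))
    xd, yd = gamma_dot(t)
    return fx * xd + fy * yd

t0, eps = 0.7, 1e-6
numeric = (h(t0 + eps) - h(t0 - eps)) / (2 * eps)  # central difference
print(h_prime_chain(t0), numeric)
```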


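Note Define the row vector $\nabla f(a, b) = [f_x(a, b), f_y(a, b)]$ and the column vector

$$\dot\gamma(t_0) = \begin{pmatrix} \dot x(t_0) \\ \dot y(t_0) \end{pmatrix},$$

then

$$h'(t_0) = \nabla f(a, b)\,\dot\gamma(t_0) = [f_x(a, b), f_y(a, b)] \cdot \begin{pmatrix} \dot x(t_0) \\ \dot y(t_0) \end{pmatrix}$$

is obtained by matrix multiplication of these two vectors.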

5.4 Directional derivative and gradient

5.4.1 The gradient of a function

We already met the directional derivative earlier. The idea is this: we have a function $f(x, y)$ and a point $(a, b, f(a, b))$ in the graph surface of the function. We also have a unit-vector $\mathbf{u} = (u_1, u_2)$ in the $(x, y)$-plane which specifies a direction from the point $(a, b)$ in the $(x, y)$-plane. We want the rate of change of $z = f(x, y)$ as $(x, y)$ changes in the direction of $\mathbf{u}$, evaluated at the point $(a, b)$.

[Figure: the graph surface with the point P = (a, b, f(a, b)) above (a, b) and the direction u in the (x, y)-plane.]

We define the curve

$$\gamma(t) = \begin{pmatrix} a \\ b \end{pmatrix} + t \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}.$$

Then,

$$\gamma(0) = \begin{pmatrix} a \\ b \end{pmatrix} \quad \text{and} \quad \dot\gamma(0) = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}.$$

Now consider the function $h$ defined by

$$h(t) = f(\gamma(t)) = f(a + tu_1, b + tu_2).$$

This function describes exactly what we want, namely the change of $f$ along the curve $\gamma$. We can compute its derivative using the chain rule and we find

$$h'(0) = \nabla f(\gamma(0)) \cdot \dot\gamma(0) = \nabla f(a, b) \cdot \mathbf{u} = [f_x(a, b), f_y(a, b)] \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}.$$

This is the directional derivative $D_{\mathbf{u}} f(a, b)$ of $f$ in the direction $\mathbf{u}$ evaluated in $(a, b)$.

Theorem 5.4. If $f$ is differentiable at $(a, b)$ then it has a directional derivative $D_{\mathbf{u}} f$ in the direction of any unit-vector $\mathbf{u}$ at $(a, b)$ and

$$D_{\mathbf{u}} f(a, b) = \nabla f(a, b) \cdot \mathbf{u}.$$


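Example 5.5
Find the directional derivative of the function defined by $f(x, y) = x^2 + y^2$ at the point $(1, -1)$ in the direction of the vector $\begin{pmatrix} 2 \\ 3 \end{pmatrix}$.
The corresponding unit-vector is

$$\mathbf{u} = \frac{1}{\sqrt{13}} \begin{pmatrix} 2 \\ 3 \end{pmatrix}$$

so we obtain

$$D_{\mathbf{u}} f(1, -1) = [2x, 2y]\Big|_{(1, -1)} \cdot \begin{pmatrix} \frac{2}{\sqrt{13}} \\ \frac{3}{\sqrt{13}} \end{pmatrix} = \frac{4}{\sqrt{13}} - \frac{6}{\sqrt{13}} = -\frac{2}{\sqrt{13}}.$$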
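The result of Example 5.5 can be cross-checked in two ways: by evaluating the formula ∇f · u and by a difference quotient along the direction u. The following sketch does both; the helper name `directional_derivative` is chosen here for illustration.

```python
import math

def f(x, y):
    return x**2 + y**2

def directional_derivative(grad, direction):
    """grad . u, where u is `direction` rescaled to unit length."""
    norm = math.hypot(*direction)
    return sum(g * d / norm for g, d in zip(grad, direction))

grad = (2 * 1.0, 2 * -1.0)                 # [2x, 2y] at (1, -1)
D = directional_derivative(grad, (2.0, 3.0))

# Independent check: difference quotient of f along the unit vector u.
t = 1e-6
u = (2 / math.sqrt(13), 3 / math.sqrt(13))
numeric = (f(1 + t * u[0], -1 + t * u[1]) - f(1, -1)) / t

print(D, numeric, -2 / math.sqrt(13))
```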

Note $\nabla f(a, b)$ is a 2-dimensional vector (i.e., it has two components), called the gradient vector. It is also denoted by $\operatorname{grad} f(a, b)$.

Example 5.6
For $f(x, y) = x^2 - y^2$ find the directional derivative $D_{\mathbf{u}} f(1, -2)$ in the directions shown in the diagram.
The gradient of $f$ at $(1, -2)$ is

$$\nabla f(1, -2) = [2, 4]$$

and we obtain the following table for the directional derivatives:

    u                   D_u f(1, −2)
    [1, 0]              2
    (1/√2) [1, 1]       3√2
    [0, 1]              4
    (1/√2) [−1, 1]      √2
    [−1, 0]             −2
    (1/√2) [−1, −1]     −3√2
    [0, −1]             −4
    (1/√2) [1, −1]      −√2

[Figure: the eight unit directions u of the table drawn in the (x, y)-plane.]

5.4.2 Steepest ascent/descent

Definition 5.8. At a given point $(a, b)$ the direction in which $D_{\mathbf{u}} f(a, b)$ is largest is called the direction of steepest ascent. Similarly, the direction of steepest descent is the one in which $D_{\mathbf{u}} f(a, b)$ is the most negative (i.e., negative and largest in magnitude).

How can we find these directions? We need to examine the formula for the directional derivative

$$D_{\mathbf{u}} f(a, b) = \operatorname{grad} f(a, b) \cdot \mathbf{u}.$$

This is the scalar product of two vectors, $\operatorname{grad} f(a, b)$ and $\mathbf{u}$. Recall the relationship between the scalar product of two vectors $\mathbf{x}$ and $\mathbf{y}$ and their lengths and enclosed angle $\varphi$:

$$\mathbf{x} \cdot \mathbf{y} = |\mathbf{x}|\,|\mathbf{y}| \cos\varphi.$$

So let $\varphi$ be the angle enclosed by $\operatorname{grad} f(a, b)$ and $\mathbf{u}$, then

$$D_{\mathbf{u}} f(a, b) = |\operatorname{grad} f(a, b)|\,|\mathbf{u}| \cos\varphi = |\operatorname{grad} f(a, b)| \cos\varphi.$$


Since the values of $\cos\varphi$ are constrained by $-1 \le \cos\varphi \le 1$ and $\cos\varphi = \pm 1$ for $\varphi = 0$ and $\varphi = \pi$ we find

• The direction of steepest ascent occurs when $\cos\varphi$ is largest, i.e., when $\varphi = 0$. This is the direction of $\nabla f(a, b)$. Therefore, the largest value of $D_{\mathbf{u}} f(a, b)$ occurs in that direction and its value is equal to the length of the gradient vector, $|\nabla f(a, b)|$.

• The direction of steepest descent occurs when $\cos\varphi$ is the most negative, i.e., when $\varphi = \pi$. This is the direction of $-\nabla f(a, b)$. Therefore, the smallest value of $D_{\mathbf{u}} f(a, b)$ occurs in that direction and its value is $-|\nabla f(a, b)|$.

• The gradient of $f$ at a point $(a, b)$ provides two pieces of information: its direction is the direction of steepest ascent and its length is the maximal rate of change in that direction. The opposite direction is the direction of steepest descent with a rate of change given by the negative of the length of the gradient.

[Figure: a level curve through (a, b) with the vectors grad f and −grad f perpendicular to it.]

Note When $\varphi = \pi/2$ or when $\varphi = 3\pi/2$ the directional derivative $D_{\mathbf{u}} f(a, b)$ vanishes. These two directions are the directions of the level curve through the point $(a, b)$.

In summary:

1. $\nabla f$ has the direction in which $f$ increases most rapidly, i.e., in which $D_{\mathbf{u}} f(a, b)$ is largest. The maximal rate of increase is $|\nabla f| = \sqrt{f_x^2 + f_y^2}$.
2. $-\nabla f$ is the direction in which $f$ decreases most rapidly, i.e., in which $D_{\mathbf{u}} f(a, b)$ is smallest. The maximal rate of decrease is $-|\nabla f|$.
3. $\nabla f$ is perpendicular to the level curves of $f$.
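The three summary statements can be illustrated by brute force. The sketch below (an added illustration, with helper names chosen here) takes the gradient [2, 4] of Example 5.6, samples unit directions (cos φ, sin φ) on an assumed grid of 3600 angles, and looks for the directions with the largest and smallest directional derivative.

```python
import math

grad = (2.0, 4.0)   # gradient of x^2 - y^2 at (1, -2), from Example 5.6

def D_u(phi):
    """Directional derivative in the unit direction (cos phi, sin phi)."""
    return grad[0] * math.cos(phi) + grad[1] * math.sin(phi)

# Sample directions around the full circle (grid size is an arbitrary choice).
angles = [2 * math.pi * k / 3600 for k in range(3600)]
best = max(angles, key=D_u)     # approximate direction of steepest ascent
worst = min(angles, key=D_u)    # approximate direction of steepest descent

grad_angle = math.atan2(grad[1], grad[0])
print(best, grad_angle)                       # nearly the same angle
print(D_u(best), math.hypot(*grad))           # nearly |grad f|
print(D_u(worst), -math.hypot(*grad))         # nearly -|grad f|
print(D_u(grad_angle + math.pi / 2))          # nearly 0: level-curve direction
```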

Example 5.7
What does $\nabla f$ look like qualitatively along the level curve $z = 2$?

[Figure: level curves z = 1, z = 2 and z = 3.]

Example 5.8
Find the direction of the level curve of $f(x, y) = xy + y^5$ at $(1, -1)$.
We know that the level curve through $(1, -1)$ is perpendicular to the gradient of $f$ at that point. Now

$$\nabla f(1, -1) = (y, x + 5y^4)\Big|_{(1, -1)} = (-1, 6).$$

Vectors perpendicular to this are

$$\begin{pmatrix} 6 \\ 1 \end{pmatrix} \quad \text{or} \quad \begin{pmatrix} -6 \\ -1 \end{pmatrix},$$

and arbitrary multiples thereof. Any of these give the direction of the level curve.


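Example 5.9
In which direction is the function $f$ defined by $f(x, y) = \sqrt{1 - x^2 - y^2}$ increasing most rapidly at $(1/2, -1/2)$?
This direction is given by the gradient, which is

$$\nabla f\left(\tfrac{1}{2}, -\tfrac{1}{2}\right) = \left. \left( -\frac{x}{\sqrt{1 - x^2 - y^2}}, -\frac{y}{\sqrt{1 - x^2 - y^2}} \right) \right|_{(\frac{1}{2}, -\frac{1}{2})} = \frac{1}{\sqrt{2}}(-1, 1).$$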

Example 5.10
At a given point $(a, b)$ the level curve of a function $f(x, y)$ through that point is tangent to the direction $(1, 2)$. The directional derivative of $f$ in the direction of the vector $\begin{pmatrix} -1 \\ -1 \end{pmatrix}$ is 3. Find the gradient vector of $f$ at that point.
We know that the gradient is a vector with two components, say $\nabla f(a, b) = (\alpha, \beta)$, where $\alpha$ and $\beta$ are two numbers that we need to determine from the conditions given. These give us two equations: the first is that the gradient is perpendicular to the direction of the level curve, i.e.,

$$0 = \nabla f(a, b) \cdot \begin{pmatrix} 1 \\ 2 \end{pmatrix} = \alpha + 2\beta.$$

The second is the value of the directional derivative in the specified direction. The unit-vector in the direction $(-1, -1)$ is $\frac{1}{\sqrt{2}}(-1, -1)$, so that

$$3 = \nabla f(a, b) \cdot \frac{1}{\sqrt{2}} \begin{pmatrix} -1 \\ -1 \end{pmatrix} = -\frac{1}{\sqrt{2}}(\alpha + \beta).$$

Solving these two equations for $\alpha$ and $\beta$ yields

$$\nabla f(a, b) = (\alpha, \beta) = \sqrt{2}\,(-6, 3).$$

5.5 Beyond the linear approximation: the multivariate Taylor expansion

In the case of a real-valued function of one variable such as defined by $h(t)$ we have the means to get better approximations to the values of the function near a point $t_0$ by making use of the Taylor expansion of $h$ at $t_0$. This is the power series constructed from the derivatives of $h$ at $t_0$ in the following way

$$h(t_0) + \frac{h'(t_0)}{1!}(t - t_0) + \frac{h''(t_0)}{2!}(t - t_0)^2 + \frac{h'''(t_0)}{3!}(t - t_0)^3 + \cdots$$

Of course, in order for this to make sense we need all the derivatives of $h$ at $t_0$ to exist.

Note that if we truncate the expansion after the term with power $n$, then we get the $n$th Taylor polynomial $T_n(t)$ and we can write

$$h(t) = T_n(t) + o((t - t_0)^n).$$

The terms $o((t - t_0)^n)$ are referred to as the remainder terms and they can be cast into different more explicit forms depending on the purposes.

The first Taylor polynomial is

$$T_1(t) = h(t_0) + h'(t_0)(t - t_0)$$

and we recognize it as the linear approximation of the function $h$ at $t_0$.

With higher Taylor polynomials we get approximations for $h(t)$ near $t_0$ beyond the linear approximation, such as a quadratic or cubic approximation.


We now want to investigate whether there exists a similar tool in the case of more variables. To this end we suppose we are given a function $f : D \to \mathbb{R}$ on some domain $D \subset \mathbb{R}^2$ and we are interested in the behaviour of $f$ near a point $\mathbf{a} = (a, b) \in D$. So let $\mathbf{x} = (x, y) \in D$ be a point near $\mathbf{a}$ and consider the curve

$$\gamma(t) = \mathbf{a} + t(\mathbf{x} - \mathbf{a}) = (a + t(x - a), b + t(y - b)) = (x(t), y(t)).$$

Then we have $\gamma(0) = \mathbf{a}$, $\gamma(1) = \mathbf{x}$ and $\dot\gamma(0) = \mathbf{x} - \mathbf{a}$.

As before we define the function $h(t) = f(\gamma(t)) = f(x(t), y(t))$. Since this is a function of one variable we can write down its Taylor expansion near $t = 0$. To do that we need the derivatives at 0. We now compute the first three derivatives

$$\begin{aligned}
h'(t) &= f_x(x(t), y(t))(x - a) + f_y(x(t), y(t))(y - b), \\
h''(t) &= f_{xx}(x(t), y(t))(x - a)^2 + f_{xy}(x(t), y(t))(x - a)(y - b) \\
&\quad + f_{yx}(x(t), y(t))(y - b)(x - a) + f_{yy}(x(t), y(t))(y - b)^2 \\
&= f_{xx}(x(t), y(t))(x - a)^2 + 2 f_{xy}(x(t), y(t))(x - a)(y - b) + f_{yy}(x(t), y(t))(y - b)^2, \\
h'''(t) &= f_{xxx}(x(t), y(t))(x - a)^3 + 3 f_{xxy}(x(t), y(t))(x - a)^2(y - b) \\
&\quad + 3 f_{xyy}(x(t), y(t))(x - a)(y - b)^2 + f_{yyy}(x(t), y(t))(y - b)^3.
\end{aligned}$$

Using these expressions we may now write down the first terms in the Taylor expansion for $h$ at $t = 0$

$$\begin{aligned}
h(t) = f(\mathbf{a}) &+ t \left[ f_x(\mathbf{a})(x - a) + f_y(\mathbf{a})(y - b) \right] \\
&+ \frac{t^2}{2} \left[ f_{xx}(\mathbf{a})(x - a)^2 + 2 f_{xy}(\mathbf{a})(x - a)(y - b) + f_{yy}(\mathbf{a})(y - b)^2 \right] \\
&+ \frac{t^3}{6} \left[ f_{xxx}(\mathbf{a})(x - a)^3 + 3 f_{xxy}(\mathbf{a})(x - a)^2(y - b) + 3 f_{xyy}(\mathbf{a})(x - a)(y - b)^2 + f_{yyy}(\mathbf{a})(y - b)^3 \right] + \cdots
\end{aligned}$$

Evaluating this at $t = 1$ we obtain an approximation for $f(\mathbf{x}) = h(1)$

$$\begin{aligned}
f(\mathbf{x}) = f(\mathbf{a}) &+ \left[ f_x(\mathbf{a})(x - a) + f_y(\mathbf{a})(y - b) \right] \\
&+ \frac{1}{2} \left[ f_{xx}(\mathbf{a})(x - a)^2 + 2 f_{xy}(\mathbf{a})(x - a)(y - b) + f_{yy}(\mathbf{a})(y - b)^2 \right] \\
&+ \frac{1}{6} \left[ f_{xxx}(\mathbf{a})(x - a)^3 + 3 f_{xxy}(\mathbf{a})(x - a)^2(y - b) + 3 f_{xyy}(\mathbf{a})(x - a)(y - b)^2 + f_{yyy}(\mathbf{a})(y - b)^3 \right] + \cdots
\end{aligned}$$

It is not useful to write down the higher terms explicitly because they explode in size very quickly. This expansionis the generalised Taylor expansion for a function of two variables at the point a.
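To see what the quadratic terms buy, here is a small sketch comparing the linear and the quadratic truncation of the expansion. The function f(x, y) = eˣ cos y and the expansion point (0, 0) are assumptions made for this illustration (they are not from the notes); its derivatives at the origin, computed by hand, are f = 1, f_x = 1, f_y = 0, f_xx = 1, f_xy = 0, f_yy = −1.

```python
import math

def f(x, y):
    return math.exp(x) * math.cos(y)   # hypothetical example function

def taylor1(x, y):
    """Linear approximation at (0, 0): f + f_x x + f_y y."""
    return 1 + x

def taylor2(x, y):
    """Quadratic approximation: adds (1/2)(f_xx x^2 + 2 f_xy x y + f_yy y^2)."""
    return 1 + x + 0.5 * (x**2 - y**2)

# Errors of both truncations as the point approaches (0, 0).
for s in (1.0, 0.5, 0.1):
    x, y = 0.1 * s, 0.2 * s
    print(s, abs(f(x, y) - taylor1(x, y)), abs(f(x, y) - taylor2(x, y)))
```

The quadratic error should shrink noticeably faster than the linear one as the point approaches the origin.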


6 Extremal points

6.1 Local extrema and critical points

Definition 6.1. A neighbourhood of a point (a, b) ∈ R2 is an open disc centred at (a, b).

Definition 6.2. A function $f(x, y)$ has a local maximum at $(a, b)$ if, for all $(x, y)$ in some neighbourhood of $(a, b)$, we have $f(x, y) \le f(a, b)$.

Figure 6.1: At a local maximum a function assumes the largest value compared to points in a small neighbourhood

Similarly, we define a local minimum.

How can one find the local extrema (maxima and minima) of a function? Obviously, at a local extremum the tangent plane must be horizontal. The equation for the tangent plane through the point $(a, b, f(a, b))$ is

$$z = f(a, b) + f_x(a, b)(x - a) + f_y(a, b)(y - b)$$

and it is horizontal (i.e., $z$ is constant) only if $f_x(a, b) = f_y(a, b) = 0$, i.e., if $\nabla f(a, b) = 0$.

Definition 6.3. For a function $f : D \to \mathbb{R}$ we call a point $(a, b) \in D$ with $\nabla f(a, b) = 0$ a critical point. The corresponding point on the graph surface is called a stationary point.

Note Local extrema occur at critical points.

Figure 6.2: At a local maximum a function assumes the largest value compared to points in a small neighbourhood

Example 6.1
Find the critical points of $f(x, y) = x^2 + y^2 - 6xy + 2 + 2x - 2y$.
The gradient of $f$ is

$$\nabla f(x, y) = (2x - 6y + 2, \; 2y - 6x - 2).$$

At a critical point $(x, y)$ we have

$$\left. \begin{aligned} 2x - 6y + 2 &= 0, \\ 2y - 6x - 2 &= 0 \end{aligned} \right\} \implies -16x - 4 = 0 \implies x = -\frac{1}{4} \implies y = 3x + 1 = \frac{1}{4}.$$

So there is one critical point at $\left(-\frac{1}{4}, \frac{1}{4}\right)$.

Note Not all critical points correspond to local extrema!

Definition 6.4. A stationary point which is neither a local maximum nor a local minimum is called a saddle point.

How can one tell whether a critical point corresponds to a local maximum, a local minimum or to a saddle point? Here, the second derivatives of the function come into play. We arrange the second derivatives of $f$ in a matrix

$$H_f = \begin{pmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{pmatrix}.$$

This matrix is called the Hessian of $f$. It is automatically symmetric according to Clairaut's theorem. We need the discriminant, i.e., the determinant of the Hessian

$$D = f_{xx} f_{yy} - f_{xy}^2.$$


Figure 6.3: A surface with a saddle point. Note that the surface extends below and above the tangent plane at the saddle point.

Theorem 6.1 (Second derivative test). Suppose $(a, b)$ is a critical point of $f(x, y)$, so that $\nabla f(a, b) = 0$. Then $f$ has

(i) a local maximum at $(a, b)$ if $D(a, b) > 0$ and $f_{xx}(a, b) < 0$,

(ii) a local minimum at $(a, b)$ if $D(a, b) > 0$ and $f_{xx}(a, b) > 0$,

(iii) a saddle point at $(a, b)$ if $D(a, b) < 0$.

If $D(a, b) = 0$ then more information is needed to make a conclusion.

Note Replacing $f_{xx}$ with $f_{yy}$ gives identical conclusions in (i) and (ii). (Show this!)

Example 6.2

$$f(x, y) = x^2 + y^2 - 4x + 10y - 3$$

Critical points: $2x - 4 = 0$, $2y + 10 = 0$ implies $(2, -5)$ is the only critical point.

$$H_f = \begin{pmatrix} 2 & 0 \\ 0 & 2 \end{pmatrix} \implies D = 4 > 0, \quad f_{xx}(2, -5) = 2 > 0.$$

So there is a local minimum at $(2, -5)$.


Example 6.3

$$f(x, y) = x^2 + 2xy + y^3 - y$$

Critical points: $f_x(x, y) = 2x + 2y = 0$, $f_y(x, y) = 2x + 3y^2 - 1 = 0$ implies $3y^2 - 2y - 1 = (3y + 1)(y - 1) = 0$, which gives two solutions $y = -\frac{1}{3}$ and $y = 1$. So we find two critical points: $\left(\frac{1}{3}, -\frac{1}{3}\right)$ and $(-1, 1)$.

$$H_f(x, y) = \begin{pmatrix} 2 & 2 \\ 2 & 6y \end{pmatrix} \implies D(x, y) = 12y - 4.$$

at $\left(\frac{1}{3}, -\frac{1}{3}\right)$: $D = -4 - 4 = -8 < 0$, a saddle point,
at $(-1, 1)$: $D = 8 > 0$, and $f_{xx} = 2 > 0$, a local minimum.

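Example 6.4

$$f(x, y) = \frac{1}{3}x^3 - x^2 y - y^3 + 3y$$

Critical points: $f_x(x, y) = x^2 - 2xy = 0$, $f_y(x, y) = -x^2 - 3y^2 + 3 = 0$.
$x = 0$: $y^2 = 1$ gives two points $(0, -1)$ and $(0, 1)$.
$x = 2y$: $-7y^2 + 3 = 0$ gives another two points $\left(2\sqrt{\tfrac{3}{7}}, \sqrt{\tfrac{3}{7}}\right)$ and $\left(-2\sqrt{\tfrac{3}{7}}, -\sqrt{\tfrac{3}{7}}\right)$.

$$H_f(x, y) = \begin{pmatrix} 2x - 2y & -2x \\ -2x & -6y \end{pmatrix} \implies D(x, y) = -12y(x - y) - 4x^2.$$

at $(0, -1)$: $D = 12 > 0$, $f_{xx} = 2 > 0$, a local minimum,
at $(0, 1)$: $D = 12 > 0$, $f_{xx} = -2 < 0$, a local maximum,
at $\left(2\sqrt{\tfrac{3}{7}}, \sqrt{\tfrac{3}{7}}\right)$: $D = -12\sqrt{\tfrac{3}{7}}\left(2\sqrt{\tfrac{3}{7}} - \sqrt{\tfrac{3}{7}}\right) - 4 \cdot \tfrac{12}{7} = -\tfrac{36}{7} - \tfrac{48}{7} = -12 < 0$, a saddle point,
at $\left(-2\sqrt{\tfrac{3}{7}}, -\sqrt{\tfrac{3}{7}}\right)$: $D = -12 < 0$, a saddle point.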
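The second derivative test is mechanical enough to be coded directly. The sketch below applies Theorem 6.1 to the four critical points of Example 6.4; the gradient and Hessian entries are typed in by hand from the example, and the function names (`grad`, `hessian`, `classify`) are chosen here for illustration.

```python
import math

def grad(x, y):
    """Gradient of f(x, y) = x^3/3 - x^2 y - y^3 + 3y."""
    return (x**2 - 2 * x * y, -x**2 - 3 * y**2 + 3)

def hessian(x, y):
    """Hessian of f as ((f_xx, f_xy), (f_yx, f_yy))."""
    return ((2 * x - 2 * y, -2 * x), (-2 * x, -6 * y))

def classify(x, y):
    """Second derivative test (Theorem 6.1); returns (D, verdict)."""
    (fxx, fxy), (_, fyy) = hessian(x, y)
    D = fxx * fyy - fxy**2
    if D > 0:
        return D, ("local minimum" if fxx > 0 else "local maximum")
    if D < 0:
        return D, "saddle point"
    return D, "inconclusive"

s = math.sqrt(3 / 7)
for point in ((0.0, -1.0), (0.0, 1.0), (2 * s, s), (-2 * s, -s)):
    print(point, grad(*point), classify(*point))
```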

How do the second derivatives come into play? Here is a justification for the theorem.

Figure 6.4: Analysing the traces of a surface graph on vertical planes through the critical point

The idea is this: if $(a, b)$ is a critical point we look at the traces of the graph of the function $f(x, y)$ on a vertical plane through $(a, b)$ which includes a unit-vector $\mathbf{u}$. When we vary $\mathbf{u}$ by rotating the plane around the vertical line through $(a, b)$ we get different traces. Each trace can be described as the graph of a function.

When $(a, b)$ is a local minimum of $f$ then, evidently, each trace has a local minimum and, vice versa, if every trace has a local minimum then $(a, b)$ corresponds to a local minimum of $f$. A similar statement is true for a local maximum. If we find that some traces have a local maximum, while others have a local minimum, then $(a, b)$ is a saddle point. In all other cases we cannot make a decision.

In order to describe the traces we consider the function $h(t) = f(a + tu_1, b + tu_2)$ for a unit-vector $\mathbf{u} = \begin{pmatrix} u_1 \\ u_2 \end{pmatrix}$.

We need the derivatives of $h$:

$$\begin{aligned}
h'(t) &= f_x(a + tu_1, b + tu_2)\,u_1 + f_y(a + tu_1, b + tu_2)\,u_2, \\
h''(t) &= f_{xx}(a + tu_1, b + tu_2)\,u_1^2 + f_{xy}(a + tu_1, b + tu_2)\,u_2 u_1 \\
&\quad + f_{yx}(a + tu_1, b + tu_2)\,u_1 u_2 + f_{yy}(a + tu_1, b + tu_2)\,u_2^2.
\end{aligned}$$

Now we have

$$\begin{aligned}
h(0) &= f(a, b), \\
h'(0) &= f_x(a, b)\,u_1 + f_y(a, b)\,u_2 = 0, \\
h''(0) &= f_{xx}(a, b)\,u_1^2 + 2 f_{xy}(a, b)\,u_2 u_1 + f_{yy}(a, b)\,u_2^2
\end{aligned}$$

and we find that $t = 0$ is a critical point for $h$, as was expected.

Now we focus on $h''(0)$ and assume that $f_{xx}(a, b) \neq 0$. Then we can rearrange the expression for $h''(0)$ (dropping the arguments $(a, b)$ for clarity)

$$h''(0) = f_{xx} \left( u_1 + \frac{f_{xy}}{f_{xx}} u_2 \right)^2 + \frac{\overbrace{f_{xx} f_{yy} - f_{xy}^2}^{= D}}{f_{xx}}\, u_2^2.$$

We discuss some cases:

(i) $f_{xx} > 0$, $D > 0$: both terms in $h''(0)$ are non-negative for all values of $u_1$ and $u_2$, so $h''(0) \ge 0$. But $h''(0)$ cannot vanish: suppose there was a direction $\mathbf{u} = (u_1, u_2)$ for which $h''(0)$ vanishes. Then both terms must vanish individually since they are non-negative. But this means that $u_2 = 0$, since $D > 0$ and $f_{xx} > 0$, and then $u_1 = -\frac{f_{xy}}{f_{xx}} u_2 = 0$. This contradicts the fact that $\mathbf{u}$ must be a unit-vector.
So, in this case $h''(0) > 0$ for all directions $\mathbf{u}$ and $(a, b)$ corresponds to a local minimum.

(ii) $f_{xx} < 0$, $D > 0$: with a similar argument to (i) one shows that $h''(0) < 0$ for all directions $\mathbf{u}$, so that $(a, b)$ corresponds to a local maximum.

(iii) $D < 0$: we write

$$h''(0) = f_{xx} \left( u_1 + \frac{f_{xy}}{f_{xx}} u_2 \right)^2 - \frac{|D|}{f_{xx}}\, u_2^2.$$

Consider the two directions

$$\mathbf{u} = \begin{pmatrix} 1 \\ 0 \end{pmatrix}, \qquad \mathbf{v} = \frac{1}{\sqrt{f_{xy}^2 + f_{xx}^2}} \begin{pmatrix} f_{xy} \\ -f_{xx} \end{pmatrix}.$$

When we compute $h''(0)$ for these two directions we find opposite signs: for $\mathbf{u}$ only the first term survives and $h''(0) = f_{xx}$, while for $\mathbf{v}$ the first term vanishes and $h''(0) = -|D| f_{xx} / (f_{xy}^2 + f_{xx}^2)$. So when $f_{xx} > 0$ then $h''(0) > 0$ for $\mathbf{u}$ and $h''(0) < 0$ for $\mathbf{v}$, while it is the other way around when $f_{xx} < 0$. This means that $(a, b)$ corresponds to a saddle point.

When $f_{xx} = 0$, then necessarily $D = -f_{xy}^2 < 0$ (if $f_{xy} \neq 0$) and in fact one can again find two directions for which $h''(0)$ has opposite signs. In all other cases, we cannot proceed further.

Note The second derivative criterion emerges essentially by discussing the quadratic approximation of the function at $(a, b)$. When this fails, one needs to look at even higher approximations.


6.2 An application: linear regression

The problem: how to get the straight line of "best fit" to a set of data points? This line is called the regression line.

[Figure: data points in the (x, y)-plane with a regression line and vertical segments joining each point to the line.]

First, we need to discuss how to measure the "best fit". The vertical lines represent the discrepancy between the "observed" $y$-values and those "predicted" by the regression line $y = ax + b$. The lengths of these lines are

$$|y_1 - ax_1 - b|, \; |y_2 - ax_2 - b|, \; \ldots, \; |y_n - ax_n - b|.$$

We need to minimise these lengths in an overall sense. There are infinitely many ways of doing this. Here are two of them:

(i) minimise the sum: $\sum_{i=1}^n |y_i - ax_i - b| = \text{minimum}$,
(ii) minimise the sum of squares: $\sum_{i=1}^n (y_i - ax_i - b)^2 = \text{minimum}$.

In general these two methods do not give the same result: the straight line of "best fit" does depend on how we measure the "best fit".

Example 6.5 (A simple case)
Let us find the horizontal line $y = b$ which best fits the data points $(0, 0)$, $(1, 1)$ and $(2, 0)$.

[Figure: the three data points and a horizontal line at height $b$ in the $xy$-plane.]

Evidently, $0 \le b \le 1$. Using method (i) we need to minimise

\[ |0 - b| + |1 - b| + |0 - b| = 1 + b. \]

This is minimal for $b = 0$. So the best fit horizontal line for method (i) is $y = 0$. Using method (ii) we need to minimise

\[ (0 - b)^2 + (1 - b)^2 + (0 - b)^2 = 3b^2 - 2b + 1. \]

The minimum occurs when 6b− 2= 0, i.e., for b = 1/3. So the line of best fit for (ii) is y = 1/3.
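The difference between the two measures of fit can also be seen numerically. The sketch below simply scans candidate heights $b$ in $[0, 1]$ on a grid; the grid spacing $1/300$ and the helper names `abs_error` and `sq_error` are arbitrary choices made for this illustration.

```python
points = [(0, 0), (1, 1), (2, 0)]


def abs_error(b):
    """Method (i): sum of the absolute vertical distances to the line y = b."""
    return sum(abs(y - b) for _, y in points)


def sq_error(b):
    """Method (ii): sum of the squared vertical distances to the line y = b."""
    return sum((y - b) ** 2 for _, y in points)


candidates = [k / 300 for k in range(301)]  # b = 0, 1/300, ..., 1
best_abs = min(candidates, key=abs_error)
best_sq = min(candidates, key=sq_error)
print(best_abs, best_sq)  # roughly 0 and 1/3
```

A grid scan only approximates the minimiser in general; here both minimisers happen to lie on the grid.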

We will focus here on method (ii), the so-called method of least squares. The problem is to find $a$ and $b$ by minimising

\[ \sum_{i=1}^{n} (y_i - a x_i - b)^2. \]

We define a function $f$ by setting

\[ f(a, b) = \sum_{i=1}^{n} (y_i - a x_i - b)^2, \]

where we consider the data points $(x_i, y_i)$ to be given. For a minimum we need the gradient $\nabla f$ to vanish. So,

\[ 0 = f_a(a, b) = -2 \sum_{i=1}^{n} (y_i - a x_i - b)\, x_i = -2 \left[ \Big( \sum_{i=1}^{n} x_i y_i \Big) - a \Big( \sum_{i=1}^{n} x_i^2 \Big) - b \Big( \sum_{i=1}^{n} x_i \Big) \right], \]

\[ 0 = f_b(a, b) = -2 \sum_{i=1}^{n} (y_i - a x_i - b) = -2 \left[ \Big( \sum_{i=1}^{n} y_i \Big) - a \Big( \sum_{i=1}^{n} x_i \Big) - b\, n \right]. \]

To solve these equations we introduce the abbreviations

\[ \bar X = \frac{1}{n} \sum_{i=1}^{n} x_i, \quad \bar Y = \frac{1}{n} \sum_{i=1}^{n} y_i, \quad \overline{X^2} = \frac{1}{n} \sum_{i=1}^{n} x_i^2, \quad S = \frac{1}{n} \sum_{i=1}^{n} x_i y_i. \]

$\bar X$ is the average of the $x$-values, $\bar Y$ is the average of the $y$-values, $\overline{X^2}$ is the average of the squares of the $x$-values and $S$ is a statistical measure for the correlation between the $x$ and $y$ values.

Now, the vanishing of $\nabla f$ implies that

\[ S = a\, \overline{X^2} + b\, \bar X, \qquad \bar Y = a\, \bar X + b. \]

These equations can be solved for $a$ and $b$:

\[ a = \frac{S - \bar X\, \bar Y}{\overline{X^2} - \bar X^2}, \qquad b = \frac{\bar Y\, \overline{X^2} - S\, \bar X}{\overline{X^2} - \bar X^2}. \]

This implies that the function $f$ has one critical point $(a, b)$ given in terms of the data points. To find out the character of the critical point we compute the Hessian

\[ H_f(a, b) = 2n \begin{pmatrix} \overline{X^2} & \bar X \\ \bar X & 1 \end{pmatrix}, \]

which has determinant $4n^2 D$ with $D = \overline{X^2} - \bar X^2$. It can be shown (can you think of an argument?) that $D$ is always strictly positive, as long as the $x_i$ are not all equal. And since $f_{aa}$ is positive, we have demonstrated that the critical point given by $a$ and $b$ as above is in fact a minimum.

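Example 6.6
Consider the data

x_i |  0   3   4   5
y_i | −1   7   7  11

so $n = 4$.

Then, $\bar X = 3$, $\bar Y = 6$, $\overline{X^2} = 25/2$ and $S = 26$, so that

\[ a = \frac{26 - 18}{\frac{25}{2} - 9} = \frac{2 \cdot 8}{7} = \frac{16}{7}, \qquad b = \frac{6 \cdot \frac{25}{2} - 3 \cdot 26}{\frac{25}{2} - 9} = -\frac{6}{7}. \]

The equation for the line of best fit is

\[ y = \frac{16}{7}\, x - \frac{6}{7}. \]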
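The formulas for $a$ and $b$ translate directly into code. The sketch below is only an illustration: the function name `regression_line` is hypothetical, and exact rational arithmetic via `fractions.Fraction` is used so that the result can be compared with the hand calculation (this assumes integer data; for all-equal $x$-values the denominator vanishes and a `ZeroDivisionError` is raised).

```python
from fractions import Fraction


def regression_line(xs, ys):
    """Return (a, b) of the least squares line y = a*x + b for integer data."""
    n = len(xs)
    x_mean = Fraction(sum(xs), n)                            # average of x
    y_mean = Fraction(sum(ys), n)                            # average of y
    x2_mean = Fraction(sum(x * x for x in xs), n)            # average of x^2
    s = Fraction(sum(x * y for x, y in zip(xs, ys)), n)      # average of x*y
    d = x2_mean - x_mean ** 2                                # the quantity D
    a = (s - x_mean * y_mean) / d
    b = (y_mean * x2_mean - s * x_mean) / d
    return a, b


a, b = regression_line([0, 3, 4, 5], [-1, 7, 7, 11])
print(a, b)  # 16/7 -6/7
```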

6.3 Absolute maxima and minima

Suppose that a function $f$ is defined on a bounded domain $S$, which contains all its boundary points. Such a set is called compact. We want to find the maximal and minimal values of $f$ on $S$. We will assume that $f$ is continuous. Otherwise one can construct examples of functions which do not have maximal or minimal values on $S$. We will even assume that $f$ is differentiable.

Consider the two examples given in the two following diagrams.


• In the figure on the left we see a function which has its maximum and minimum values at interior points of $S$. Here, the extremal values occur at critical points and they are local extrema.

• In the figure on the right the extreme values occur at boundary points. They are not critical points because we cannot surround them with an open disc that lies entirely in $S$. Also, it is clear from the figure that the function does not have a horizontal tangent plane at these points.

Note This is true in general: the maximal and minimal values of a function $f$ on a compact set $S$ occur either at critical points of $f$ that lie inside $S$ or at boundary points of $S$.

Note To find the max/min values of $f$ on a compact set $S$ we need to

(i) find the critical points of $f$ in $S$ and evaluate $f$ at these points,

(ii) find the largest and smallest values of $f$ on the boundary of $S$,

(iii) take the largest and smallest values among the values found in (i) and (ii).

Example 6.7

Find the maximum and the minimum of the function

\[ f : \{(x, y) \in \mathbb{R}^2 : -1 \le x \le 1,\ -1 \le y \le 1\} \to \mathbb{R}, \quad (x, y) \mapsto 3x^2 - y^2. \]

(i) Find the critical points:

\[ \nabla f(x, y) = (6x, -2y) = 0 \implies x = y = 0. \]

So, there is one critical point, which corresponds to a saddle point with $f(0, 0) = 0$.

[Figure: the square with corners $(-1,-1)$, $(1,-1)$, $(1,1)$ and $(-1,1)$ in the $xy$-plane; its four sides are labelled $L_1$, $L_2$, $L_3$ and $L_4$.]

(ii) Max/min on the boundary: we break the boundary up into the four pieces $L_1$, $L_2$, $L_3$ and $L_4$ and discuss them in turn.

On $L_1$, $x = 1$ and $-1 \le y \le 1$ and $f(1, y) = 3 - y^2$. To find the extremal values for this function we again look for critical points:

\[ \frac{d}{dy}(3 - y^2) = f_y(1, y) = -2y, \]

which vanishes at $y = 0$. This is a maximum for $f$ on $L_1$ with $f(1, 0) = 3$. The minimal value must occur on the boundary of $[-1, 1]$ and indeed $f(1, 1) = f(1, -1) = 2$. So on $L_1$ we have the maximum 3 at $(1, 0)$ and the minimum 2 at $(1, -1)$ or $(1, 1)$.


On $L_2$, $y = 1$ and $-1 \le x \le 1$, so $f(x, 1) = 3x^2 - 1$. The critical points are where

\[ \frac{d}{dx}(3x^2 - 1) = f_x(x, 1) = 6x \]

vanishes. This gives one critical point at $x = 0$ with the minimal value $f(0, 1) = -1$. The maximal value must occur at the boundary, i.e. for $x = \pm 1$ and we find (as before) $f(1, 1) = 2$ and $f(-1, 1) = 2$. So, on $L_2$ we have the maximum 2 at $(1, 1)$ or $(-1, 1)$ and the minimum $-1$ at $(0, 1)$.

On $L_3$ we get the same max/min values as for $L_1$.

On $L_4$ we get the same max/min values as for $L_2$.

Therefore, on the boundary of $S$ we have the minimum $-1$ and the maximum 3.

(iii) On the entire domain, the maximum value is 3 and it occurs on the boundary at $(1, 0)$ and $(-1, 0)$, while the minimum value is $-1$, also reached on the boundary, at $(0, 1)$ and $(0, -1)$.
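As a sanity check of the result, one can evaluate $f$ on a fine grid covering the square. This brute-force scan is not the method of the example, only a rough numerical cross-check; the resolution `n = 200` is an arbitrary choice that happens to put the points $0$ and $\pm 1$ on the grid.

```python
def f(x, y):
    return 3 * x**2 - y**2


n = 200  # number of grid intervals per side (arbitrary choice)
grid = [-1 + 2 * k / n for k in range(n + 1)]  # -1, ..., 0, ..., 1

# Store (value, x, y) so that max/min also report where the value occurs.
values = [(f(x, y), x, y) for x in grid for y in grid]
largest = max(values)
smallest = min(values)
print(largest[0], smallest[0])  # 3.0 -1.0
```

The scan agrees with the maximum 3 and the minimum $-1$, both attained on the boundary of the square.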

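Example 6.8
Find the extremal values of the function

\[ f : S \to \mathbb{R}, \quad (x, y) \mapsto 4x^3 - 4xy + y^2 - 4x, \]

where $S$ is the triangle with corners $(0, 0)$, $(3, 3)$ and $(0, 3)$ shown in the figure.

[Figure: the triangle $S$ in the $xy$-plane with corners $(0,0)$, $(3,3)$ and $(0,3)$; its sides are labelled $L_1$, $L_2$ and $L_3$.]

(i) Critical points:

\[ f_x(x, y) = 12x^2 - 4y - 4 = 0, \]

\[ f_y(x, y) = -4x + 2y = 0. \]

So $y = 2x$, and $3x^2 - 2x - 1 = 0$, giving $x = 1$ and $x = -\frac{1}{3}$. Thus, we have two critical points: $(1, 2)$ with $f(1, 2) = -4$ and $(-\frac{1}{3}, -\frac{2}{3})$, but this lies outside of $S$ so we ignore it.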

(ii) Max/minimal values on the boundary.

On $L_1$, $y = x$ and $f(x, x) = 4x^3 - 3x^2 - 4x$ and

\[ \frac{d}{dx}\left(4x^3 - 3x^2 - 4x\right) = 12x^2 - 6x - 4 = 0 \implies x = \frac{3 \pm \sqrt{57}}{12} \approx \{0.879, -0.379\}. \]

So check $x = 0$ (with $f(0, 0) = 0$), $x = 3$ ($f(3, 3) = 69$) and $x = 0.879$ ($f(0.879, 0.879) \approx -3.12$).

On $L_2$, $y = 3$ and $0 \le x \le 3$ so that $f(x, 3) = 4x^3 - 16x + 9$. Critical points on $L_2$:

\[ \frac{d}{dx}\left(4x^3 - 16x + 9\right) = 12x^2 - 16 = 0 \implies x = \pm\frac{2}{\sqrt{3}} \quad \left(\text{discard } x = -\frac{2}{\sqrt{3}}\right). \]

Check $x = 0$ ($f(0, 3) = 9$), $x = 2/\sqrt{3}$ ($f(2/\sqrt{3}, 3) = \frac{32}{3\sqrt{3}} - \frac{32}{\sqrt{3}} + 9 \approx -3.317$) and $x = 3$ ($f(3, 3) = 69$).

On $L_3$, $x = 0$ and $0 \le y \le 3$ so that $f(0, y) = y^2$. This is minimal for $y = 0$ with $f(0, 0) = 0$ and maximal for $y = 3$ with $f(0, 3) = 9$.

(iii) The maximal value of $f$ on $S$ is 69 occurring at $(3, 3)$ and its minimal value is $-4$ at $(1, 2)$.
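Step (iii) amounts to comparing the values of $f$ at finitely many candidate points. A minimal sketch of this comparison, with the candidate list typed in by hand from steps (i) and (ii):

```python
import math


def f(x, y):
    return 4 * x**3 - 4 * x * y + y**2 - 4 * x


x1 = (3 + math.sqrt(57)) / 12  # critical point of f(x, x) on the side y = x
x2 = 2 / math.sqrt(3)          # critical point of f(x, 3) on the side y = 3

candidates = [
    (1.0, 2.0),   # interior critical point
    (0.0, 0.0),   # corners of the triangle
    (3.0, 3.0),
    (0.0, 3.0),
    (x1, x1),     # critical point on the side y = x
    (x2, 3.0),    # critical point on the side y = 3
]

values = [(f(x, y), (x, y)) for x, y in candidates]
print(max(values))  # (69.0, (3.0, 3.0))
print(min(values))  # (-4.0, (1.0, 2.0))
```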

Example 6.9
Find the dimensions of a rectangular box, open at the top, having a volume $V$ and requiring the least amount of material for its construction.

Let $x > 0$, $y > 0$ and $z > 0$ be the length, width and height of the box, respectively. Then, we have $V = xyz$ and we need to minimize the surface area $A = xy + 2yz + 2xz$. We eliminate $z = V/(xy)$ in $A$, obtaining

\[ A = xy + \frac{2V}{x} + \frac{2V}{y}. \]

To find extremal values we look at the critical points:

\[ \nabla A(x, y) = \left(y - \frac{2V}{x^2},\ x - \frac{2V}{y^2}\right) = 0 \implies x^2 y = x y^2 \implies x = y \implies x = y = \sqrt[3]{2V} =: l, \quad z = \frac{1}{2}\sqrt[3]{2V} = \frac{l}{2}. \]

The second derivative test gives

\[ H_A(x, y) = \begin{pmatrix} \frac{4V}{x^3} & 1 \\ 1 & \frac{4V}{y^3} \end{pmatrix} \implies D(l, l) = \frac{16V^2}{l^6} - 1 = \frac{16V^2}{4V^2} - 1 = 3 \]

and $A_{xx}(l, l) > 0$: we have a local minimum.

Note that this problem is different from the previous two. We are here looking for the absolute minimum of $A$ on the unbounded, open (axes not included) first quadrant, i.e., a non-compact set. It is true but not immediately clear that the absolute minimum indeed occurs at the local minimum that we have identified. The value at the minimum is $A = 3l^2$.
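The result of this example is easy to check for a concrete volume. In the sketch below the value $V = 4$ and the perturbation size $0.1$ are arbitrary choices; the check that nearby points give a larger area illustrates the local minimum but of course does not prove that it is the absolute one.

```python
def area(x, y, V):
    """Surface area of the open box after eliminating z = V / (x * y)."""
    return x * y + 2 * V / x + 2 * V / y


V = 4.0                  # sample volume (arbitrary choice)
l = (2 * V) ** (1 / 3)   # critical point x = y = l
z = V / (l * l)          # corresponding height, expected to equal l / 2

A_min = area(l, l, V)
nearby = [area(l + dx, l + dy, V)
          for dx in (-0.1, 0.0, 0.1)
          for dy in (-0.1, 0.0, 0.1)]

print(l, z, A_min)            # approximately 2, 1 and 12 = 3 * l^2
print(min(nearby) >= A_min)   # True: no nearby sample point does better
```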

6.4 Constrained optimisation, Lagrange multipliers

The problem we want to consider is the following: maximise (or minimise) $f(x, y)$, i.e., optimise $f(x, y)$, subject to the constraint $g(x, y) = 0$. This means, find the optimal (maximal or minimal) value of $f$ among all pairs $(x, y)$ for which $g(x, y) = 0$.

The following simple example illustrates the problem: Among all rectangles with perimeter 2 find those with the largest area. With the side lengths $x$ and $y$ of the rectangle we have $f$ being the area as $f(x, y) = xy$, while the constraint is the constant perimeter, so that $g(x, y) = 2(x + y) - 2$. There are two ways to approach this problem.

(i) We use the constraint to solve for $y$, say, in terms of $x$, which gives $y = 1 - x$, and insert into the area, $A(x) = f(x, 1 - x) = x(1 - x)$. Now we can find the critical point as usual, obtaining $x = 1/2$, so $y = 1/2$ and $A = 1/4$. Clearly, the critical point corresponds to a maximum and we find that the optimal rectangle is a square of side length $\frac{1}{2}$.

[Figure: the constraint curve $g(x, y) = 0$ and level curves of $f$ in the $xy$-plane; the point $Q$ is marked where a level curve touches the constraint curve.]

(ii) For the second approach we regard the constraint as a curve in the plane and draw the level curves of the area function $f(x, y)$. This curve is drawn in the figure, where we also find the level curves of the area function $f(x, y)$. The constraint curve is intersected by almost all the level curves. So, as we move along the constraint curve the values of $f$ change. Where is the largest value that can be reached on the constraint curve? Evidently, this is at the point $Q$. This is the point where the constraint curve and the level curve do not intersect, but only touch. At that point the tangents of the level curve of $f$ and the constraint curve are the same. Or, putting it differently, the gradients of $f$ and $g$ are parallel, i.e., proportional to each other.

In order to find $Q$ we need to find the location $(x, y)$ where $\nabla f(x, y)$ is parallel to $\nabla g(x, y)$, i.e., where

∇ f (x , y) = λ∇g(x , y), for some λ ∈ R.

Finishing the example we compute the gradients of f and g

∇ f (x , y) = (y, x), ∇g(x , y) = (1, 1).

The proportionality between the two leads to

(y, x) = λ(1, 1) =⇒ x = y = λ.

Inserting into the constraint gives us $g(x, x) = 4x - 2 = 0 \implies x = y = \lambda = \frac{1}{2}$. Thus, we get the same result as with the first method.

The second method has advantages over the first because it is not always possible to solve the constraint equation for one of the variables. So, in summary

Note In order to optimise f (x , y) subject to the constraint g(x , y) = 0 solve the equation

∇ f (x , y) = λ∇g(x , y), for some λ ∈ R.

The proportionality factor λ is called a Lagrange multiplier.

Example 6.10

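[Figure: a rectangle with side lengths $x$ and $y$ and diagonal $L$.]

Find the rectangle of perimeter 1 that has the diagonal with shortest length $L$.

We need to minimize $f(x, y) = \sqrt{x^2 + y^2}$ subject to the constraint $2(x + y) = 1$, i.e., so that $g(x, y) = 2(x + y) - 1 = 0$.

Requiring that the gradients are parallel, i.e., $\nabla f(x, y) = \lambda \nabla g(x, y)$, gives two equations

\[ \frac{x}{\sqrt{x^2 + y^2}} = 2\lambda, \qquad \frac{y}{\sqrt{x^2 + y^2}} = 2\lambda. \]

Eliminate $\lambda$:

\[ \frac{x}{\sqrt{x^2 + y^2}} = \frac{y}{\sqrt{x^2 + y^2}} \implies x = y. \]

Now substitute into the constraint:

\[ 2(x + x) - 1 = 0 \implies x = y = \frac{1}{4} \]

and the shortest diagonal is

\[ L = \sqrt{\frac{1}{16} + \frac{1}{16}} = \frac{1}{2\sqrt{2}}. \]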

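Example 6.11
Maximise and minimise $f(x, y) = x + 2y$ subject to $4x^2 + y^2 = 1$, i.e., $g(x, y) = 4x^2 + y^2 - 1 = 0$.

Look for solutions of $\nabla f(x, y) = \lambda \nabla g(x, y)$:

\[ (1, 2) = \lambda (8x, 2y) \implies \begin{cases} 1 = 8\lambda x, \\ 2 = 2\lambda y. \end{cases} \]

Eliminate $\lambda$: $y = 8x$.

Insert into the constraint:

\[ 4x^2 + 64x^2 = 1 \implies x = \pm\frac{1}{\sqrt{68}}, \quad y = \pm\frac{8}{\sqrt{68}}. \]

Evaluate $f$ at these two points: we get two values, $\frac{17}{\sqrt{68}} = \frac{\sqrt{17}}{2}$ and $-\frac{17}{\sqrt{68}} = -\frac{\sqrt{17}}{2}$.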
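The two values can be cross-checked without Lagrange multipliers by walking along the constraint curve. The sketch below assumes the parametrisation $x = \frac{1}{2}\cos t$, $y = \sin t$ of the ellipse $4x^2 + y^2 = 1$ (this parametrisation is not used in the example itself) and samples $f$ at $n = 10000$ points, an arbitrary resolution.

```python
import math


def f(x, y):
    return x + 2 * y


n = 10000  # number of sample points on the ellipse (arbitrary choice)
values = []
for k in range(n):
    t = 2 * math.pi * k / n
    x, y = 0.5 * math.cos(t), math.sin(t)  # satisfies 4x^2 + y^2 = 1
    values.append(f(x, y))

f_max, f_min = max(values), min(values)
print(f_max, f_min, math.sqrt(17) / 2)  # f_max and -f_min are close to sqrt(17)/2
```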


Example 6.12
Find the max/min values of $f(x, y) = xy$ on the circle $(x + 1)^2 + y^2 = 1$. Here, $g(x, y) = (x + 1)^2 + y^2 - 1 = 0$.

Look for solutions of $\nabla f(x, y) = \lambda \nabla g(x, y)$:

\[ (y, x) = \lambda (2(x + 1), 2y) \implies \begin{cases} y = 2\lambda (x + 1), \\ x = 2\lambda y. \end{cases} \]

Eliminate $\lambda$:

\[ 2y^2 = 2x(x + 1) \implies y = \pm\sqrt{x(x + 1)}. \]

Insert into the constraint:

\[ (x + 1)^2 + x(x + 1) = 1 \implies 2x^2 + 3x = 0 \implies x_1 = 0, \quad x_2 = -\frac{3}{2}. \]

We obtain altogether three points with extremal values:

\[ P_1 = (0, 0), \quad P_2 = \left(-\frac{3}{2}, -\frac{\sqrt{3}}{2}\right), \quad P_3 = \left(-\frac{3}{2}, \frac{\sqrt{3}}{2}\right). \]

Evaluate $f$ at these points: $f(P_1) = 0$, $f(P_2) = \frac{3}{4}\sqrt{3}$, $f(P_3) = -\frac{3}{4}\sqrt{3}$. $P_2$ corresponds to a maximum, and $P_3$ to a minimum of $f$.

Finding the extreme values of a function of more than two variables, subject to a constraint, leads to exactly the same procedure. Now, the constraint $g(x_1, x_2, \ldots, x_n) = 0$ describes a surface in a higher dimensional space, and we need to find the point(s) where the level surfaces of the function just touch the constraint surface. The condition is again that the gradients of $f$ and $g$ should be parallel, i.e., that

\[ \nabla f(x_1, x_2, \ldots, x_n) = \lambda \nabla g(x_1, x_2, \ldots, x_n). \]

Now, the gradients of $f$ and $g$ both have $n$ components, $\nabla f = (f_{x_1}, f_{x_2}, \ldots, f_{x_n})$ and similarly for $g$. This leads to $n$ equations from which $\lambda$ has to be eliminated.

Example 6.13 (Three variable version)
Find the max/min of $f(x, y, z) = x + 2y + 3z$ subject to the constraint $g(x, y, z) = x^2 + y^2 + z^2 - 1 = 0$, i.e., on the sphere with radius 1 around the origin.

The method to solve this problem is completely analogous to the two-variable case:

• Look for solutions of $\nabla f(x, y, z) = \lambda \nabla g(x, y, z)$, i.e., $(1, 2, 3) = 2\lambda (x, y, z)$, giving three equations:

\[ 1 = 2\lambda x, \quad 2 = 2\lambda y, \quad 3 = 2\lambda z. \]

• Eliminate $\lambda$:

\[ \frac{1}{2x} = \frac{1}{y} = \frac{3}{2z} \implies y = 2x, \quad z = 3x. \]

• Insert into the constraint:

\[ x^2 + 4x^2 + 9x^2 = 1 \implies x = \pm\frac{1}{\sqrt{14}}. \]

So we have two points

\[ P_1 = \left(\frac{1}{\sqrt{14}}, \frac{2}{\sqrt{14}}, \frac{3}{\sqrt{14}}\right), \quad P_2 = -\left(\frac{1}{\sqrt{14}}, \frac{2}{\sqrt{14}}, \frac{3}{\sqrt{14}}\right). \]

• Evaluate $f$: $f(P_1) = \sqrt{14}$ (max) and $f(P_2) = -\sqrt{14}$ (min).

When there is more than one constraint, i.e., $g_k(x_1, \ldots, x_n) = 0$ for $k = 1, 2, \ldots, m$, then the condition generalises to the equation

\[ \nabla f(x_1, \ldots, x_n) = \lambda_1 \nabla g_1(x_1, \ldots, x_n) + \lambda_2 \nabla g_2(x_1, \ldots, x_n) + \cdots + \lambda_m \nabla g_m(x_1, \ldots, x_n) \]

for $m$ numbers $\lambda_1, \lambda_2, \ldots, \lambda_m \in \mathbb{R}$.

To understand this equation we consider the case of two constraints $g(x, y, z) = 0$ and $h(x, y, z) = 0$. The variables $x, y, z$ are required to satisfy both of these equations, which means geometrically that the point $(x, y, z)$ lies on both surfaces defined by the constraints. So the admissible points lie on the intersection of the two surfaces. This is illustrated in the left diagram of Fig. 6.5. Obviously, if the two surfaces have no point in common, then there is no solution to the optimisation problem for $f(x, y, z)$ subject to the given two constraints. In the present case, the intersection of the two surfaces is a curve.

Figure 6.5: Left: Two constraint surfaces intersect in a curve (blue). Right: Level surfaces (light yellow) intersecting this curve. Extreme values occur where a level surface touches the constraint curve.

The right diagram in Fig. 6.5 shows two level surfaces of a function $f(x, y, z)$. With the same argument as in the previous cases, we find that the maximal or minimal values of $f$ along the intersection curve occur at points where the intersection curve touches a level surface. At these points, the tangent vector to the intersection curve is perpendicular to the gradient of $f$. This is the geometric condition that we need.

However, every vector that is perpendicular to the tangent vector of the intersection curve is a linear combination of the gradient vectors of the constraint functions, $\nabla g$ and $\nabla h$. To see this, fix a point $P$ on the intersection curve and let $t$ denote its tangent vector at $P$. All the vectors perpendicular to $t$ lie in the plane $\Pi$ through $P$ with normal vector $t$. In particular, the two gradient vectors $\nabla g(P)$ and $\nabla h(P)$ lie in that plane. If the two gradient vectors are linearly independent (which we need to assume) then we can write every vector in the plane $\Pi$ as a linear combination of the two gradients. So, in particular, there are two numbers $\lambda$ and $\mu$ so that

∇ f = λ∇g +µ∇h.


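Example 6.14 (More than one constraint)
Find min/max values for $f(x, y, z) = 3x - y - 3z$ subject to the two constraints $g(x, y, z) = x + y - z = 0$ and $h(x, y, z) = x^2 + 2z^2 - 1 = 0$.

• Solve $\nabla f(x, y, z) = \lambda \nabla g(x, y, z) + \mu \nabla h(x, y, z)$ with two Lagrange multipliers $\lambda$ and $\mu$:

\[ (3, -1, -3) = \lambda (1, 1, -1) + \mu (2x, 0, 4z) \implies \begin{cases} 3 = \lambda + 2\mu x, \\ -1 = \lambda, \\ -3 = -\lambda + 4\mu z. \end{cases} \]

• Eliminate $\lambda$ and $\mu$:

\[ \lambda = -1 \implies \begin{cases} 4 = 2\mu x, \\ -4 = 4\mu z \end{cases} \implies x = -2z. \]

• Substitute into the constraints. The second constraint yields

\[ 4z^2 + 2z^2 - 1 = 6z^2 - 1 = 0 \implies z = \pm\frac{1}{\sqrt{6}}, \quad x = \mp\frac{2}{\sqrt{6}}. \]

Substitution into the first constraint gives the corresponding $y$ values:

\[ y = z - x = \pm\frac{3}{\sqrt{6}}. \]

So we get two points

\[ P_1 = \left(-\frac{2}{\sqrt{6}}, \frac{3}{\sqrt{6}}, \frac{1}{\sqrt{6}}\right), \quad P_2 = \left(\frac{2}{\sqrt{6}}, -\frac{3}{\sqrt{6}}, -\frac{1}{\sqrt{6}}\right). \]

• Evaluate $f$ at these points: $f(P_1) = -2\sqrt{6}$ (min) and $f(P_2) = 2\sqrt{6}$ (max).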
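The two candidate points can be verified numerically: each must satisfy both constraints and the multiplier equation $\nabla f = \lambda \nabla g + \mu \nabla h$. The following sketch does this for $z = \pm 1/\sqrt{6}$, $x = -2z$, $y = z - x$, with $\lambda = -1$ and $\mu = 2/x$ taken from the elimination step; the variable names are chosen only for this illustration.

```python
import math


def f(x, y, z):
    return 3 * x - y - 3 * z


results = []
for sign in (1, -1):
    z = sign / math.sqrt(6)
    x = -2 * z
    y = z - x
    lam = -1.0
    mu = 2 / x  # from the equation 4 = 2 * mu * x

    grad_f = (3.0, -1.0, -3.0)
    grad_g = (1.0, 1.0, -1.0)
    grad_h = (2 * x, 0.0, 4 * z)

    # Residual of grad f - lam * grad g - mu * grad h, expected to be ~0.
    residual = max(abs(df - lam * dg - mu * dh)
                   for df, dg, dh in zip(grad_f, grad_g, grad_h))
    g_val = x + y - z               # first constraint, expected ~0
    h_val = x**2 + 2 * z**2 - 1     # second constraint, expected ~0
    results.append((f(x, y, z), residual, g_val, h_val))

for value, residual, g_val, h_val in results:
    print(value, residual, g_val, h_val)
```

The first entry belongs to the point with $z = +1/\sqrt{6}$ and gives $f \approx -2\sqrt{6}$; the second gives $f \approx 2\sqrt{6}$.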


7 Functions of several variables

7.1 Vector valued functions

We have discussed at length the special case of a real-valued function of two variables. We have seen several occasions where the restriction to two variables was only artificial and where it was quite obvious how one could generalise to a function of more than two variables. We have also had the occasion to encounter functions with several components, such as the functions which describe curves in the plane or in space.

This suggests discussing a more general type of function that comprises both of the examples above.

Definition 7.1. A function $\mathbf{f}$ with $m$ components of $n$ variables is a rule which assigns to every $n$-tuple $(x_1, x_2, \ldots, x_n) \in \mathbb{R}^n$ an $m$-tuple $(y_1, y_2, \ldots, y_m) \in \mathbb{R}^m$. Often, $\mathbf{f}$ is also called a map.

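We write

\[ \mathbf{f} : D \to \mathbb{R}^m, \quad (x_1, x_2, \ldots, x_n) \mapsto \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix} = \begin{pmatrix} f_1(x_1, x_2, \ldots, x_n) \\ f_2(x_1, x_2, \ldots, x_n) \\ \vdots \\ f_m(x_1, x_2, \ldots, x_n) \end{pmatrix}. \]

The functions $f_k : D \to \mathbb{R}$ are called component functions.

To save space we write $\mathbf{x} = (x_1, x_2, \ldots, x_n)$ and $\mathbf{y} = \mathbf{f}(\mathbf{x}) = (f_1(\mathbf{x}), f_2(\mathbf{x}), \ldots, f_m(\mathbf{x}))$. When $m = 1$ the function $\mathbf{f}$ has one component and it is a scalar valued function. When $m \ge 2$ then we say $\mathbf{f}$ is vector valued.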

Example 7.1

(i) $m = n = 1$; $\mathbf{f} = f$ is a real valued function of one variable, $y = f(x)$,

(ii) $m = 2$, $n = 1$; $\mathbf{f}(t) = (x(t), y(t))$ is a planar curve,

(iii) $m = 1$, $n \ge 2$; $f(x_1, x_2, \ldots, x_n)$ is a real valued function of several variables as discussed so far,

(iv) Consider a corn field: this can be described mathematically as a function which assigns to every point $(x, y) \in D$, a certain region, a vector

\[ \mathbf{v}(x, y) = \begin{pmatrix} v_1(x, y) \\ v_2(x, y) \\ v_3(x, y) \end{pmatrix}, \]

(v) vector fields in $\mathbb{R}^2$, $\mathbb{R}^3$: at every point $(x, y) \in \mathbb{R}^2$ or $(x, y, z) \in \mathbb{R}^3$ we imagine attached a vector: [Pictures]

(vi) (iv) allows a different interpretation: instead of assigning vectors to a point $(x, y) \in D$ we interpret the three components as the coordinates of a point in $\mathbb{R}^3$. Then we get what is called a surface patch.

\[ \Phi : D \to \mathbb{R}^3, \quad \Phi(u, v) = (x(u, v), y(u, v), z(u, v)). \]

Consider for example the map $\Phi(\theta, \varphi) = (\sin\theta \cos\varphi, \sin\theta \sin\varphi, \cos\theta)$. What can we say about the domain of $\Phi$? The variables $\theta$ and $\varphi$ can run through all possible real numbers. However, the functions are periodic with period $2\pi$ in $\theta$ and in $\varphi$, so we can restrict the domain to the region $D = (0, 2\pi) \times (0, 2\pi)$. What is the range of the map $\Phi$? We can write $x = \sin\theta \cos\varphi$, $y = \sin\theta \sin\varphi$, $z = \cos\theta$, then we discover that $x^2 + y^2 + z^2 = 1$. This means that every point in the range of $\Phi$ lies on the unit sphere. The map $\Phi$ gives us another way to describe points on the sphere. A point on the sphere is characterised by two angles, such as its longitude and latitude.

(vii) $\Phi : \mathbb{R}^2 \to \mathbb{R}^3$, $(u, v) \mapsto (u, v, \sqrt{u^2 + v^2})$. Again, this map describes a surface in $\mathbb{R}^3$ since all points $(x, y, z) = \Phi(u, v)$ satisfy the equation $z = \sqrt{x^2 + y^2}$, which describes the upper half of a cone.
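The claim in (vi) that every point in the range of $\Phi$ lies on the unit sphere can be spot-checked numerically. The sketch below evaluates the map at a handful of arbitrarily chosen angle pairs; the function name `sphere_patch` is made up for this illustration.

```python
import math


def sphere_patch(theta, phi):
    """The map (theta, phi) -> (sin(theta)cos(phi), sin(theta)sin(phi), cos(theta))."""
    return (math.sin(theta) * math.cos(phi),
            math.sin(theta) * math.sin(phi),
            math.cos(theta))


# Arbitrary sample angles inside (0, 2*pi) x (0, 2*pi).
samples = [(0.3 * i, 0.5 * j) for i in range(1, 20) for j in range(1, 12)]

# Largest deviation of x^2 + y^2 + z^2 from 1 over all samples.
max_error = max(abs(sum(c * c for c in sphere_patch(t, p)) - 1.0)
                for t, p in samples)
print(max_error)  # tiny: only floating point rounding remains
```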

Exercise.
How do you have to restrict the domain so that $\Phi$ in (vi) describes the largest part of the sphere without covering any point more than once?

7.2 Differentiation of vector valued functions

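Vector valued functions can be differentiated component by component. Let $\mathbf{f} : D \to \mathbb{R}^m$, $D \subset \mathbb{R}^n$ and write

\[ \mathbf{f}(\mathbf{x}) = (f_1(x_1, x_2, \ldots, x_n), f_2(x_1, x_2, \ldots, x_n), \ldots, f_m(x_1, x_2, \ldots, x_n)). \]

Each component function is a real valued function on $D$ and we can compute its partial derivatives

\[ \left( \frac{\partial f_k}{\partial x_1}(\mathbf{x}), \frac{\partial f_k}{\partial x_2}(\mathbf{x}), \ldots, \frac{\partial f_k}{\partial x_n}(\mathbf{x}) \right) = \nabla f_k(\mathbf{x}). \]

The collection of all partial derivatives of all component functions forms a matrix

\[ J_{\mathbf{f}}(\mathbf{x}) = \begin{pmatrix} \partial_1 f_1(\mathbf{x}) & \partial_2 f_1(\mathbf{x}) & \cdots & \partial_n f_1(\mathbf{x}) \\ \partial_1 f_2(\mathbf{x}) & \partial_2 f_2(\mathbf{x}) & \cdots & \partial_n f_2(\mathbf{x}) \\ \vdots & \vdots & & \vdots \\ \partial_1 f_m(\mathbf{x}) & \partial_2 f_m(\mathbf{x}) & \cdots & \partial_n f_m(\mathbf{x}) \end{pmatrix}, \]

the Jacobian matrix of $\mathbf{f}$. It is an $m \times n$ matrix, i.e., it has $m$ rows (as many as $\mathbf{f}$ has components) and $n$ columns (as many as $\mathbf{f}$ has variables).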

Let us now consider some special cases:

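(i) $f : \mathbb{R} \to \mathbb{R}$; in this case $m = n = 1$ and the Jacobian matrix has only one element

\[ J_f(x) = [f'(x)]. \]

(ii) $f : \mathbb{R}^2 \to \mathbb{R}$; here $n = 2$, $m = 1$ and

\[ J_f(x, y) = [\partial_1 f(x, y), \partial_2 f(x, y)] = \nabla f(x, y), \]

i.e., the Jacobian matrix is identical with the gradient of $f$.

(iii) $\gamma : \mathbb{R} \to \mathbb{R}^3$, with $m = 3$ and $n = 1$ and

\[ J_\gamma(t) = \begin{pmatrix} \partial_t \gamma_1(t) \\ \partial_t \gamma_2(t) \\ \partial_t \gamma_3(t) \end{pmatrix} = \dot\gamma(t), \]

so the Jacobian matrix coincides with the tangent vector of $\gamma$.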

7.3 Composition of vector valued functions

We want to define the composition of two vector valued functions $\mathbf{f}$ and $\mathbf{g}$, where $\mathbf{f}$ has $m$ components and depends on $k$ variables, i.e.,

\[ (y_1, y_2, \ldots, y_m) = \mathbf{f}(\mathbf{x}) = (f_1(x_1, x_2, \ldots, x_k), f_2(x_1, x_2, \ldots, x_k), \ldots, f_m(x_1, x_2, \ldots, x_k)). \]

We assume that $\mathbf{g}$ has $l$ components and depends on $n$ variables, i.e.,

\[ (v_1, v_2, \ldots, v_l) = \mathbf{g}(\mathbf{u}) = (g_1(u_1, u_2, \ldots, u_n), g_2(u_1, u_2, \ldots, u_n), \ldots, g_l(u_1, u_2, \ldots, u_n)). \]

The composition $\mathbf{f} \circ \mathbf{g}$ of two functions involves the evaluation of $\mathbf{f}(\mathbf{g}(\mathbf{u}))$. Clearly, the composition $\mathbf{f} \circ \mathbf{g}$ takes $n$ variables (as many as $\mathbf{g}$) and it has $m$ components (as many as $\mathbf{f}$). But in order for this to make any sense we need to make sure that $\mathbf{f}$ and $\mathbf{g}$ are compatible: $\mathbf{f}$ depends on $k$ variables so it expects $k$ arguments which $\mathbf{g}$ has to deliver. So the composition of $\mathbf{f}$ and $\mathbf{g}$ only makes sense if $\mathbf{g}$ has exactly as many components as $\mathbf{f}$ expects. Thus, we need $l = k$.

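Definition 7.2. Let $\mathbf{f} : D \to \mathbb{R}^m$, $D \subset \mathbb{R}^k$ and $\mathbf{g} : U \to \mathbb{R}^k$, $U \subset \mathbb{R}^n$ and suppose that $\operatorname{range} \mathbf{g} \subset D$. Then we define the composition function $\mathbf{f} \circ \mathbf{g} : U \to \mathbb{R}^m$ by

\[ \mathbf{f} \circ \mathbf{g}(\mathbf{u}) = \mathbf{f}(\mathbf{g}(\mathbf{u})). \]

$\mathbf{f} \circ \mathbf{g}$ is a function with $m$ components and $n$ variables.

Example 7.2
$f(x, y) = x^2 + y^2$, $\mathbf{g}(r, \varphi) = (r \cos\varphi, r \sin\varphi)$. The composition $f \circ \mathbf{g}$ is defined in terms of $f$ and $\mathbf{g}$ by

\[ f \circ \mathbf{g}(r, \varphi) = f(\mathbf{g}(r, \varphi)) = f(r \cos\varphi, r \sin\varphi) = r^2. \]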

Example 7.3
Consider the tree diagram

[Tree diagram: $w$ at the top branches to $x$, $y$ and $z$; $x$ branches to $s$, $t$; $y$ branches to $s$; $z$ branches to $s$, $t$, $u$.]

How is this related to composition functions? We start at the top. This describes a function $f : \mathbb{R}^3 \to \mathbb{R}$ with $w = f(x, y, z)$. The variables $x, y, z$ depend on three other variables $s$, $t$ and $u$. Therefore, there is a function $\mathbf{g}$ hidden in there which is described by the second and third level of the diagram. This function delivers three arguments to $f$ and depends on three variables, thus

\[ \mathbf{g} : \mathbb{R}^3 \to \mathbb{R}^3, \quad (s, t, u) \mapsto (x, y, z) = (g_1(s, t), g_2(s), g_3(s, t, u)). \]

The composition function $f \circ \mathbf{g}$ is

\[ f \circ \mathbf{g}(s, t, u) = f(x(s, t), y(s), z(s, t, u)). \]

It corresponds exactly to the tree diagram above.

Example 7.4
Consider the expression $f(u^2 + v^2, u - v, u^2 - v^2)$. Again, this can be described by a composition function. Here, $f$ depends on three variables and has one component. But the entire expression depends only on two variables, $u$ and $v$. Thus, there is a function $\mathbf{g}$ hidden in there which delivers three arguments, say $x$, $y$ and $z$, to $f$ and depends on the two variables $u$ and $v$:

\[ \mathbf{g} : \mathbb{R}^2 \to \mathbb{R}^3, \quad (u, v) \mapsto (x, y, z) = (g_1(u, v), g_2(u, v), g_3(u, v)) = (u^2 + v^2, u - v, u^2 - v^2). \]

So we have $f(u^2 + v^2, u - v, u^2 - v^2) = f \circ \mathbf{g}(u, v)$.

Example 7.5
Suppose $f : \mathbb{R}^3 \to \mathbb{R}$ is a scalar function of 3 variables and that $\gamma : \mathbb{R} \to \mathbb{R}^3$ is a curve. So we can write $\gamma(t) = (x(t), y(t), z(t))$. The composition function $f \circ \gamma$ has one component and depends on one variable,

\[ f \circ \gamma(t) = f(\gamma(t)) = f(x(t), y(t), z(t)). \]

7.4 The chain rule for vector valued functions

Let f : D→ Rm, D ⊂ Rk and g : U → Rk, U ⊂ Rn, be such that f ◦ g : U → Rm is defined.

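We want to compute its derivative, i.e., its Jacobian matrix $J_{\mathbf{f} \circ \mathbf{g}}$. The composition $\mathbf{f} \circ \mathbf{g}$ has $m$ components and depends on $n$ variables. So we expect that its Jacobian matrix has $m$ rows and $n$ columns. On the other hand, we know that $J_{\mathbf{f}}$ is an $m \times k$ matrix, while $J_{\mathbf{g}}$ is a $k \times n$ matrix. The only way to combine these two matrices to an $m \times n$ matrix is to multiply them. Schematically

\[ \underbrace{J_{\mathbf{f}}}_{m \times k} \cdot \underbrace{J_{\mathbf{g}}}_{k \times n} = \underbrace{J_{\mathbf{f} \circ \mathbf{g}}}_{m \times n}. \]

This is in fact how to compute the Jacobian of a composition function and we can state the general chain rule.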

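Theorem 7.1. Let $\mathbf{f} : D \to \mathbb{R}^m$, $D \subset \mathbb{R}^k$ and $\mathbf{g} : U \to \mathbb{R}^k$, $U \subset \mathbb{R}^n$, be such that $\mathbf{f} \circ \mathbf{g} : U \to \mathbb{R}^m$ is defined. Suppose $\mathbf{f}$ and $\mathbf{g}$ are differentiable and let $J_{\mathbf{f}}$ and $J_{\mathbf{g}}$ be their Jacobian matrices. Then $\mathbf{f} \circ \mathbf{g}$ is differentiable and we compute its Jacobian $J_{\mathbf{f} \circ \mathbf{g}}$ at a point $\mathbf{u} \in U$ by matrix multiplication

\[ J_{\mathbf{f} \circ \mathbf{g}}(\mathbf{u}) = J_{\mathbf{f}}(\mathbf{g}(\mathbf{u})) \cdot J_{\mathbf{g}}(\mathbf{u}). \]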


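Example 7.6
$f(x, y) = x^2 + y^2$, $\mathbf{g}(r, \varphi) = (r \cos\varphi, r \sin\varphi)$, then $f \circ \mathbf{g}(r, \varphi) = r^2$.

We compute the Jacobians

\[ J_f(x, y) = [2x, 2y], \qquad J_{\mathbf{g}}(r, \varphi) = \begin{pmatrix} \cos\varphi & -r \sin\varphi \\ \sin\varphi & r \cos\varphi \end{pmatrix}. \]

According to the chain rule

\[ J_{f \circ \mathbf{g}}(r, \varphi) = J_f(r \cos\varphi, r \sin\varphi) \cdot J_{\mathbf{g}}(r, \varphi) = [2r \cos\varphi, 2r \sin\varphi] \begin{pmatrix} \cos\varphi & -r \sin\varphi \\ \sin\varphi & r \cos\varphi \end{pmatrix} = [2r, 0]. \]

Compare this with the direct calculation

\[ J_{f \circ \mathbf{g}}(r, \varphi) = \left[ \partial_r r^2, \partial_\varphi r^2 \right] = [2r, 0]. \]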
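The matrix form of the chain rule can also be tested numerically. The sketch below approximates Jacobians by central differences (a numerical stand-in for the exact partial derivatives; the step size `h = 1e-6` and the helper names `jacobian` and `matmul` are choices made for this illustration) and compares both sides of the chain rule for the functions of this example at the arbitrarily chosen point $(r, \varphi) = (2, 0.7)$.

```python
import math


def jacobian(func, point, h=1e-6):
    """Central-difference approximation of the Jacobian of func at point.

    func maps a list of n floats to a list of m floats; the result is an
    m x n matrix stored as a list of rows.
    """
    m, n = len(func(point)), len(point)
    J = [[0.0] * n for _ in range(m)]
    for j in range(n):
        plus, minus = list(point), list(point)
        plus[j] += h
        minus[j] -= h
        f_plus, f_minus = func(plus), func(minus)
        for i in range(m):
            J[i][j] = (f_plus[i] - f_minus[i]) / (2 * h)
    return J


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def f(p):
    x, y = p
    return [x**2 + y**2]


def g(p):
    r, phi = p
    return [r * math.cos(phi), r * math.sin(phi)]


u = [2.0, 0.7]
lhs = jacobian(lambda p: f(g(p)), u)               # Jacobian of the composition
rhs = matmul(jacobian(f, g(u)), jacobian(g, u))    # J_f(g(u)) times J_g(u)
print(lhs, rhs)  # both close to [[4.0, 0.0]], i.e. [2r, 0] with r = 2
```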

Example 7.7
Consider again the tree diagram in Example 7.3. We identified this as the composition of $f(x, y, z)$ and $\mathbf{g}(s, t, u)$ with

\[ \mathbf{g}(s, t, u) = (x(s, t), y(s), z(s, t, u)), \]

with $w = f(x(s, t), y(s), z(s, t, u))$. We compute the Jacobians

\[ J_f(x, y, z) = \left[ f_x(x, y, z), f_y(x, y, z), f_z(x, y, z) \right], \qquad J_{\mathbf{g}}(s, t, u) = \begin{pmatrix} x_s(s, t) & x_t(s, t) & 0 \\ y_s(s) & 0 & 0 \\ z_s(s, t, u) & z_t(s, t, u) & z_u(s, t, u) \end{pmatrix}. \]

Then,

\[ J_{f \circ \mathbf{g}}(s, t, u) = J_f(x(s, t), y(s), z(s, t, u)) \cdot J_{\mathbf{g}}(s, t, u) = \left[ f_x, f_y, f_z \right] \begin{pmatrix} x_s & x_t & 0 \\ y_s & 0 & 0 \\ z_s & z_t & z_u \end{pmatrix} = \Big[ \underbrace{f_x x_s + f_y y_s + f_z z_s}_{\partial_s w},\ \underbrace{f_x x_t + f_z z_t}_{\partial_t w},\ \underbrace{f_z z_u}_{\partial_u w} \Big]. \]

With the explicit arguments we get for example:

\[ \frac{\partial w}{\partial s}(s, t, u) = f_x(x(s, t), y(s), z(s, t, u))\, x_s(s, t) + f_y(x(s, t), y(s), z(s, t, u))\, y_s(s) + f_z(x(s, t), y(s), z(s, t, u))\, z_s(s, t, u). \]

We see from this that $f_x$, $f_y$ and $f_z$ look exactly the same as $f$ (they are evaluated at the same arguments) and so their derivatives are computed in the same way as those of $f$.


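Example 7.8
Compute the derivatives of $f(u^2 + v^2, u - v, u^2 - v^2)$. We define $\mathbf{g}$ as before

\[ \mathbf{g}(u, v) = (u^2 + v^2, u - v, u^2 - v^2) \]

and compute the Jacobians

\[ J_f(x, y, z) = \left[ f_x(x, y, z), f_y(x, y, z), f_z(x, y, z) \right], \qquad J_{\mathbf{g}}(u, v) = \begin{pmatrix} 2u & 2v \\ 1 & -1 \\ 2u & -2v \end{pmatrix} \]

and so (leaving off the arguments in the last line due to space restrictions)

\[ J_{f \circ \mathbf{g}}(u, v) = J_f(x(u, v), y(u, v), z(u, v)) \cdot J_{\mathbf{g}}(u, v) \]

\[ = \left[ f_x(x(u, v), y(u, v), z(u, v)),\ f_y(x(u, v), y(u, v), z(u, v)),\ f_z(x(u, v), y(u, v), z(u, v)) \right] \cdot \begin{pmatrix} 2u & 2v \\ 1 & -1 \\ 2u & -2v \end{pmatrix} \]

\[ = \left[ 2u f_x + f_y + 2u f_z,\ 2v f_x - f_y - 2v f_z \right]. \]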

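Example 7.9
$f : \mathbb{R}^3 \to \mathbb{R}$, $\gamma : \mathbb{R} \to \mathbb{R}^3$. Compute the derivative of $f \circ \gamma$.

\[ \frac{d(f \circ \gamma)}{dt}(t) = J_{f \circ \gamma}(t) = J_f(\gamma(t)) \cdot J_\gamma(t) \]

\[ = \left[ f_x(x(t), y(t), z(t)),\ f_y(x(t), y(t), z(t)),\ f_z(x(t), y(t), z(t)) \right] \cdot \begin{pmatrix} \dot x(t) \\ \dot y(t) \\ \dot z(t) \end{pmatrix} \]

\[ = f_x(\gamma(t))\, \dot x(t) + f_y(\gamma(t))\, \dot y(t) + f_z(\gamma(t))\, \dot z(t). \]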

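Example 7.10
Prove the product rule for two functions $f(x)$ and $g(x)$ using the general chain rule.

Consider the function $\Phi : \mathbb{R}^2 \to \mathbb{R}$ defined by $\Phi(u, v) = uv$ and the function $\mathbf{h} : \mathbb{R} \to \mathbb{R}^2$ defined by $\mathbf{h}(x) = [f(x), g(x)]$. Then

\[ \Phi \circ \mathbf{h}(x) = \Phi(\mathbf{h}(x)) = f(x) g(x). \]

The derivative of $\Phi \circ \mathbf{h}$ can be found from the Jacobians of the individual functions

\[ J_\Phi(u, v) = [v, u], \qquad J_{\mathbf{h}}(x) = \begin{pmatrix} f'(x) \\ g'(x) \end{pmatrix}. \]

Thus,

\[ \frac{d}{dx}\big(f(x) g(x)\big) = \frac{d}{dx}\big(\Phi(\mathbf{h}(x))\big) = J_{\Phi \circ \mathbf{h}}(x) = J_\Phi(\mathbf{h}(x)) \cdot J_{\mathbf{h}}(x) = [g(x), f(x)] \cdot \begin{pmatrix} f'(x) \\ g'(x) \end{pmatrix} = g(x) f'(x) + f(x) g'(x). \]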

7.5 More on surfaces, implicit differentiation

Consider a simple example of a curve given as the graph of a function $f$:

\[ y = f(x) = \sqrt{1 - x^2}, \qquad f : (-1, 1) \to \mathbb{R}, \quad x \mapsto \sqrt{1 - x^2}. \]

[Figure: the graph of $f$ in the $xy$-plane, a half circle.]

All points on the graph satisfy the equation $x^2 + y^2 = 1$. But, clearly, there are more points in the $xy$-plane which satisfy this equation, namely the full circle. In fact, let us define the function $F : \mathbb{R}^2 \to \mathbb{R}$, $(x, y) \mapsto x^2 + y^2 - 1$, then $F(x, y) = 0$ on the entire circle. Hence, the circle is a level curve of $F$ given by $F(x, y) = 0$.

Therefore, we have found a different way to represent the graph of $f$, namely as (part of) a level curve of a function $F$ of two variables. The obvious question we may ask is: how can we get from $F(x, y) = 0$ back to $y = f(x)$? Thinking of $F(x, y) = 0$ as an equation between $x$ and $y$ we need to solve this equation for $y$ in terms of $x$. In the example, we need to solve

\[ 0 = F(x, y) = x^2 + y^2 - 1 \]

for $y$, giving the result

\[ y = \pm\sqrt{1 - x^2}. \]

We get two possibilities and we need to choose the appropriate one.

Note It is impossible to express the complete circle as the graph of a function $y = f(x)$ of one variable. Can you see a reason why this is the case?

Consider now the gradient of $F$: we know that $\nabla F(x, y)$ is perpendicular to the level curves of $F$. So it is perpendicular to the circle; more precisely, it is perpendicular to the tangent line at a point of the circle and we can give the equation for the tangent line at a point $(x_0, y_0)$ on the circle as

\[ n_1 (x - x_0) + n_2 (y - y_0) = 0, \quad \text{where } (n_1, n_2) = \nabla F(x_0, y_0). \]

In our example the gradient is

\[ \nabla F(x_0, y_0) = (2x_0, 2y_0) \]

and, after dividing by 2, the equation for the tangent line becomes $x_0 (x - x_0) + y_0 (y - y_0) = 0$ or, using $x_0^2 + y_0^2 = 1$,

\[ x_0 x + y_0 y = 1. \]

Solving this equation for $y$ yields the equation for a line in the usual form $y = mx + b$, namely

\[ y = \underbrace{-\frac{x_0}{y_0}}_{=m}\, x + \frac{1}{y_0}, \]

where $m$ is the slope of the line.

Consider now the function $f$ whose graph we are discussing. Its derivative at $x_0$ is

\[ f'(x_0) = \frac{-x_0}{\sqrt{1 - x_0^2}} = -\frac{x_0}{y_0} \]

and, necessarily, agrees with the slope of the tangent line at that point. So we find that

\[ f'(x_0) = -\frac{F_x(x_0, y_0)}{F_y(x_0, y_0)} \quad \text{where } y_0 = f(x_0). \]


Exercise.
Clearly, this formula does not make sense when $F_y(x_0, y_0) = 0$. Discuss this case in the example given and relate it to the behaviour of $f$ and its graph.

We can phrase these results in the form of a theorem:

Theorem 7.2. Given a function $F$ of two variables, the equation $F(x, y) = 0$ can be regarded as defining $y$ as a function $y = f(x)$ of $x$. Then $F(x, f(x)) = 0$ for all values of $x$. The derivative of $f$ at $x_0$ is determined in terms of the derivative of $F$ by the formula

\[ f'(x_0) = -\frac{F_x(x_0, y_0)}{F_y(x_0, y_0)} \quad \text{where } y_0 = f(x_0). \]

We can derive this formula easily using the chain rule by differentiating the formula $F(x, f(x)) = 0$ with respect to $x$. This yields at $x_0$

\[ F_x(x_0, f(x_0)) + F_y(x_0, f(x_0))\, f'(x_0) = 0 \]

from which the result follows.

This result can be generalised. To keep it reasonably simple we only discuss the case of a function $F(x, y, z)$ of three variables. Again, the equation $F(x, y, z) = 0$ defines a level surface of $F$. We want to regard this level surface as the graph of a function $f$ which defines $z$ in terms of the other variables $x$ and $y$. As the example above shows we cannot expect to get the entire level surface in this way but in general there is at least a part which can be given as the graph of a function. In order to get this function we regard the equation $F(x, y, z) = 0$ as an equation for $z$ and solve it for $z$. This may not be explicitly possible but we can always think of $z = f(x, y)$ as being implicitly defined by this equation. With $z$ determined in this way we get the equation $F(x, y, f(x, y)) = 0$ for all values $x, y$ for which $f$ is defined.

The tangent plane of the level surface at a point $(x_0, y_0, z_0)$ with $F(x_0, y_0, z_0) = 0$, i.e., with $z_0 = f(x_0, y_0)$, has a normal vector given by the gradient $\nabla F(x_0, y_0, z_0)$ and its equation (in normal form) is therefore

\[ \nabla F(x_0, y_0, z_0) \cdot (\mathbf{x} - \mathbf{x}_0) = 0, \quad \text{where } \mathbf{x} = \begin{pmatrix} x \\ y \\ z \end{pmatrix}, \quad \mathbf{x}_0 = \begin{pmatrix} x_0 \\ y_0 \\ z_0 \end{pmatrix}. \]

Written out explicitly, we obtain

\[ F_x(\mathbf{x}_0)(x - x_0) + F_y(\mathbf{x}_0)(y - y_0) + F_z(\mathbf{x}_0)(z - z_0) = 0. \]

Suppose that $F_z(\mathbf{x}_0) \neq 0$, then we can rewrite this equation in the form

\[ (z - z_0) + \frac{F_x(\mathbf{x}_0)}{F_z(\mathbf{x}_0)}(x - x_0) + \frac{F_y(\mathbf{x}_0)}{F_z(\mathbf{x}_0)}(y - y_0) = 0. \]

This is the equation of the tangent plane and it must agree with our previous formula (see Definition 5.4). For this to be true we must have

\[ z_0 = f(x_0, y_0), \quad f_x(x_0, y_0) = -\frac{F_x(\mathbf{x}_0)}{F_z(\mathbf{x}_0)}, \quad f_y(x_0, y_0) = -\frac{F_y(\mathbf{x}_0)}{F_z(\mathbf{x}_0)}. \qquad (7.1) \]

Another way to obtain these formulae is by using the chain rule. We start with the equation $F(x, y, f(x, y)) = 0$. The left hand side of this equation depends on two variables, $x$ and $y$. But $F$ itself depends on three variables. So we define a function $\mathbf{g}$ which takes two variables and delivers three results:

\[ \mathbf{g} : \mathbb{R}^2 \to \mathbb{R}^3, \quad (x, y) \mapsto (x, y, f(x, y)). \]

Then the equation becomes $F(\mathbf{g}(x, y)) = 0$, or $F \circ \mathbf{g}(x, y) = 0$. But then we also have $J_{F \circ \mathbf{g}}(x, y) = 0$. Computing the Jacobians we find

\[ J_F(\mathbf{x}) = \nabla F(\mathbf{x}) = [F_x(\mathbf{x}), F_y(\mathbf{x}), F_z(\mathbf{x})], \qquad J_{\mathbf{g}}(x, y) = \begin{pmatrix} 1 & 0 \\ 0 & 1 \\ f_x(x, y) & f_y(x, y) \end{pmatrix}. \]

The chain rule implies

\[ J_{F \circ \mathbf{g}}(x, y) = J_F(\mathbf{g}(x, y)) \cdot J_{\mathbf{g}}(x, y) = \left[ F_x(\mathbf{g}(x, y)) + f_x(x, y) F_z(\mathbf{g}(x, y)),\ F_y(\mathbf{g}(x, y)) + f_y(x, y) F_z(\mathbf{g}(x, y)) \right] \]

and $J_{F \circ \mathbf{g}}(x_0, y_0) = 0$ gives the two equations

\[ f_x(x_0, y_0) = -\frac{F_x(\mathbf{g}(x_0, y_0))}{F_z(\mathbf{g}(x_0, y_0))}, \qquad f_y(x_0, y_0) = -\frac{F_y(\mathbf{g}(x_0, y_0))}{F_z(\mathbf{g}(x_0, y_0))}. \]

These equations agree with the equations (7.1) above since $\mathbf{g}(x_0, y_0) = (x_0, y_0, f(x_0, y_0)) = \mathbf{x}_0$.

Example 7.11
The equation $z^4 + z y^2 + x^3 - y = 0$ defines z = f(x, y) as a function f of x and y. Find $f_x$ and $f_y$ (or $z_x$ and $z_y$).
With $F(x, y, z) = z^4 + z y^2 + x^3 - y$ we get $F_x = 3x^2$, $F_y = 2yz - 1$ and $F_z = 4z^3 + y^2$. Then the formulae (7.1) give

\[ f_x = -\frac{F_x}{F_z} = -\frac{3x^2}{4z^3 + y^2}, \quad f_y = -\frac{F_y}{F_z} = -\frac{2yz - 1}{4z^3 + y^2}. \]

Another way to get this result is by regarding z as a function of x and y and differentiating the equation with respect to x. This gives the equation

\[ 4z^3 z_x + z_x y^2 + 3x^2 = 0 \implies z_x = -\frac{3x^2}{4z^3 + y^2}. \]

Similarly, differentiating with respect to y and solving the resulting equation for $z_y$ yields

\[ 4z^3 z_y + z_y y^2 + 2yz - 1 = 0 \implies z_y = \frac{1 - 2yz}{4z^3 + y^2}. \]

In these equations we have to think of z as being determined in terms of x and y.
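The formulae of Example 7.11 can be checked numerically. The following is only a sketch under stated assumptions: the sample point (0.5, 2), the starting guess for z and the helper names `solve_z`, `fx_formula` and `fy_formula` are illustrative choices, not part of the notes. It solves F(x, y, z) = 0 for z with Newton's method and compares central finite differences of z with the formulae (7.1).

```python
def F(x, y, z):
    # F from Example 7.11
    return z**4 + z * y**2 + x**3 - y

def solve_z(x, y, z_guess, steps=50):
    """Solve F(x, y, z) = 0 for z by Newton's method (assumes F_z != 0 near the root)."""
    z = z_guess
    for _ in range(steps):
        Fz = 4 * z**3 + y**2
        z -= F(x, y, z) / Fz
    return z

def fx_formula(x, y, z):
    # f_x = -F_x / F_z
    return -3 * x**2 / (4 * z**3 + y**2)

def fy_formula(x, y, z):
    # f_y = -F_y / F_z
    return (1 - 2 * y * z) / (4 * z**3 + y**2)

x0, y0 = 0.5, 2.0            # hypothetical sample point
z0 = solve_z(x0, y0, 1.0)    # z0 = f(x0, y0) on the branch near z = 0.46

h = 1e-6
fx_numeric = (solve_z(x0 + h, y0, z0) - solve_z(x0 - h, y0, z0)) / (2 * h)
fy_numeric = (solve_z(x0, y0 + h, z0) - solve_z(x0, y0 - h, z0)) / (2 * h)

print(fx_numeric, fx_formula(x0, y0, z0))
print(fy_numeric, fy_formula(x0, y0, z0))
```

The two numbers on each printed line should agree to several digits, which is what (7.1) predicts as long as $F_z \neq 0$ at the chosen point.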

Example 7.12
$F(x, y, z) = x^2 + y^2 + z - \sinh(zx) = 0$. Find $y_x$ and $y_z$.

The gradient of F is $\nabla F(x, y, z) = [2x - z\cosh(zx),\; 2y,\; 1 - x\cosh(zx)]$ and the equations (7.1), with the roles of y and z interchanged, yield

\[ y_x = -\frac{F_x}{F_y} = \frac{z\cosh(zx) - 2x}{2y}, \quad y_z = -\frac{F_z}{F_y} = \frac{x\cosh(zx) - 1}{2y}. \]


Example 7.13
We can also obtain the equations (7.1) using differentials. Suppose we solve the equation F(x, y, z) = 0 for z = f(x, y). Then, when we change x and y and compute the ensuing change in z we should not see any change in F, since we only move from one point on the level surface defined by F to another. Now the change in z is given by the differential of f

\[ dz = f_x\,dx + f_y\,dy. \]

Inserting this into the differential of F yields

\[ dF = F_x\,dx + F_y\,dy + F_z\,dz = F_x\,dx + F_y\,dy + F_z(f_x\,dx + f_y\,dy) = (F_x + F_z f_x)\,dx + (F_y + F_z f_y)\,dy. \]

Putting dF = 0 we get the equations (7.1).


8 Integration

8.1 The Riemann integral

We first recall the definition of the Riemann integral for a function of one real variable:

\[ \int_a^b f(x)\,dx \]

is defined as follows.

[Figure 8.1: Computing a Riemann sum]

• subdivide the interval [a, b] by choosing points $a = x_0 < x_1 < x_2 < \cdots < x_n = b$,
• from each subinterval $[x_{i-1}, x_i]$ choose a point $x_i^*$ and then form the sum

\[ (x_1 - x_0) f(x_1^*) + (x_2 - x_1) f(x_2^*) + \cdots + (x_n - x_{n-1}) f(x_n^*) = \sum_{i=1}^{n} f(x_i^*)\,\Delta x_i, \quad \Delta x_i = x_i - x_{i-1}. \]

Each term in this sum represents the area of the rectangle with width $\Delta x_i$ and height $f(x_i^*)$. Therefore, the sum approximates the area under the graph of the function f.

Definition 8.1. The Riemann integral $\int_a^b f(x)\,dx$ is defined as the limiting value as all $\Delta x_j \to 0$, so

\[ \int_a^b f(x)\,dx := \lim_{\max\{\Delta x_j\} \to 0} \sum_{i=1}^{n} f(x_i^*)\,\Delta x_i. \]

Note This is still a very non-rigorous definition. Obviously, when $\max\{\Delta x_j\} \to 0$ then the number n of intervals (and intermediate points) must diverge since $\sum_{i=1}^{n} \Delta x_i = b - a$, the length of the interval. The correct definition must also ensure that the value that is obtained for the limit does not depend on the choice of the subdivision of the interval [a, b] and the choice of the intermediate points.

We can think of the integral as an infinite sum over infinitesimally thin rectangles (this goes back to Leibniz): if dx is an ‘infinitesimally’ small length, then f(x)dx is an ‘infinitesimally’ thin rectangle between the point x and x + dx and then the integral is the formal infinite sum $\int f(x)\,dx$. The shape of the integral sign is a stylised S.


Note The integral represents the signed area between the graph and the x-axis in the sense that pieces that lie below the x-axis count negatively.
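The Riemann sum of Section 8.1 is easy to experiment with. The sketch below is illustrative only: it assumes a uniform subdivision (the definition allows arbitrary ones), and the function name `riemann_sum` and its parameter `pick`, which selects the intermediate point in each subinterval, are hypothetical names introduced here.

```python
def riemann_sum(f, a, b, n, pick=0.5):
    """Riemann sum of f on [a, b] with n equal subintervals.

    pick in [0, 1] selects the intermediate point x_i* in each subinterval:
    0 is the left endpoint, 0.5 the midpoint, 1 the right endpoint.
    """
    dx = (b - a) / n
    return sum(f(a + (i + pick) * dx) * dx for i in range(n))

def square(x):
    return x * x

# The integral of x^2 over [0, 1] is 1/3.
approx_left = riemann_sum(square, 0.0, 1.0, 1000, pick=0.0)
approx_mid = riemann_sum(square, 0.0, 1.0, 1000, pick=0.5)
print(approx_left, approx_mid)
```

Both choices of intermediate points approach 1/3 as n grows, in line with the requirement that the limit must not depend on how the points $x_i^*$ are chosen.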

8.2 Double integrals

Given a function f(x, y) of two variables defined over a region $R \subset \mathbb{R}^2$ how could we define the integral of f over the region? To start off with the simplest case we think of R as a rectangle, R = [a, b] × [c, d] as indicated in Fig. 8.2. How can we interpret the double integral

\[ \iint_R f(x, y)\,dx\,dy\,? \]

[Figure 8.2: A rectangular domain of integration]

There are several possibilities:

(i) Iterated integral

\[ \int_c^d \left\{ \int_a^b f(x, y)\,dx \right\} dy. \]

This formula tells us to first integrate f(x, y) with respect to x holding y fixed, obtaining a function of y only, say

\[ g(y) = \int_a^b f(x, y)\,dx, \]

and then to integrate this function over the interval [c, d], i.e.,

\[ \int_c^d g(y)\,dy = \int_c^d \left\{ \int_a^b f(x, y)\,dx \right\} dy \]

to obtain the final answer. Fig. 8.3 is a geometric picture for the iterated integral:

[Figure 8.3: The iterated integral as sliced volume under the graph of f. The area of the slice at fixed y represents $\int_a^b f(x, y)\,dx = g(y)$.]


Example 8.1

\[ \int_0^1 \left\{ \int_1^2 x y^3\,dx \right\} dy = \int_0^1 \left. \frac{1}{2} x^2 y^3 \right|_1^2 dy = \int_0^1 \frac{3}{2} y^3\,dy = \left. \frac{3}{8} y^4 \right|_0^1 = \frac{3}{8}. \]

Note The iterated integral represents the signed volume contained between the graph of the function f and the (x, y)-plane. Again, this means that the pieces below the plane count negatively.

(ii) the other iterated integral:

\[ \int_a^b \left\{ \int_c^d f(x, y)\,dy \right\} dx. \]

Note This iterated integral has the same geometric interpretation as the previous iterated integral except that we slice the volume along the y-direction, i.e., perpendicular to the x-axis.

Example 8.2

\[ \int_1^2 \left\{ \int_0^1 x y^3\,dy \right\} dx = \int_1^2 \left. \frac{1}{4} x y^4 \right|_0^1 dx = \int_1^2 \frac{1}{4} x\,dx = \left. \frac{1}{8} x^2 \right|_1^2 = \frac{3}{8}. \]

(iii) the double integral

\[ \iint_R f(x, y)\,dx\,dy. \]

Here, the volume is thought of as a sum of column elements. We subdivide the rectangle R into small rectangles with the side lengths $\Delta x_i$ and $\Delta y_j$, see Fig. 8.4 for an illustration.

[Figure 8.4: The double integral obtained by adding the volumes of small columns.]

In each of the small rectangles we choose a point $(x_{ij}^*, y_{ij}^*)$ and then we sum up the volumes of all the columns with cross-sectional area $\Delta x_i \cdot \Delta y_j$ and height $f(x_{ij}^*, y_{ij}^*)$:

\[ \sum_{i, j} f(x_{ij}^*, y_{ij}^*)\,\Delta x_i \cdot \Delta y_j. \]

To get the double integral we take the limit of this sum as the subdivision of R becomes finer and finer.
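The column sum in (iii) can be evaluated directly on a computer. The following sketch assumes a uniform grid and midpoints as the chosen points $(x_{ij}^*, y_{ij}^*)$; the name `double_riemann_sum` is a hypothetical helper, not notation from the notes. It is applied to the integrand $x y^3$ on [1, 2] × [0, 1] from Examples 8.1 and 8.2.

```python
def double_riemann_sum(f, a, b, c, d, nx, ny):
    """Sum of column volumes f(x*, y*) * dx * dy over a uniform grid on [a, b] x [c, d]."""
    dx = (b - a) / nx
    dy = (d - c) / ny
    total = 0.0
    for i in range(nx):
        x = a + (i + 0.5) * dx          # midpoint of the i-th x-interval
        for j in range(ny):
            y = c + (j + 0.5) * dy      # midpoint of the j-th y-interval
            total += f(x, y) * dx * dy  # volume of one column
    return total

approx = double_riemann_sum(lambda x, y: x * y**3, 1.0, 2.0, 0.0, 1.0, 200, 200)
print(approx)   # should be close to 3/8
```

With a 200 × 200 grid the sum agrees with the value 3/8 of the two iterated integrals to about five decimal places.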


Note As in the 1-dimensional case this limit has to be defined much more rigorously, see section 15.1 of Stewart. As in the previous cases this integral also represents the signed volume under the graph of the function f.

Which of these three possibilities is the correct one? They all are ‘reasonable’ but they have a different flavour to them. While (iii) is the mathematically correct generalisation of the 1-dimensional case the other two cases are helpful for computing the integrals. One would expect that they should all agree and this is in fact true if the function f is ‘nice’. This is the content of

Theorem 8.1 (Fubini’s theorem). If f is a continuous function on $R = \{(x, y) \in \mathbb{R}^2 : x \in [a, b],\, y \in [c, d]\}$ then

\[ \iint_R f(x, y)\,dx\,dy = \int_a^b \left\{ \int_c^d f(x, y)\,dy \right\} dx = \int_c^d \left\{ \int_a^b f(x, y)\,dx \right\} dy. \]

This is not the most general case for when this theorem is true, see section 15.2 of Stewart for more details.

Even though the three formulae must give the same results, they may be very different in practice. Consider the following

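Example 8.3
Evaluate the integral

\[ \iint_{[0,2] \times [0,1]} x e^{xy}\,dx\,dy. \]

Since the function $f(x, y) = x e^{xy}$ is continuous we can compute the integral by either iterated integral. We first compute

\[ \int_0^1 \left\{ \int_0^2 x e^{xy}\,dx \right\} dy. \]

The inner integral becomes (using integration by parts)

\[ \int_0^2 x e^{xy}\,dx = \left. \frac{x}{y} e^{xy} \right|_0^2 - \int_0^2 \frac{1}{y} e^{xy}\,dx = \frac{2}{y} e^{2y} - \left. \frac{1}{y^2} e^{xy} \right|_0^2 = \frac{2}{y} e^{2y} - \frac{1}{y^2} e^{2y} + \frac{1}{y^2}. \]

Next, the outer integral

\[ \int_0^1 \left( \frac{2}{y} e^{2y} - \frac{1}{y^2} e^{2y} + \frac{1}{y^2} \right) dy = \int_0^1 \frac{d}{dy}\left( \frac{e^{2y}}{y} - \frac{1}{y} \right) dy = \left. \frac{e^{2y} - 1}{y} \right|_0^1 = e^2 - 1 - \underbrace{\lim_{y \to 0} \frac{e^{2y} - 1}{y}}_{\text{l'Hôpital's rule}} = e^2 - 3. \]

Now we compute the second iterated integral

\[ \int_0^2 \left\{ \int_0^1 x e^{xy}\,dy \right\} dx = \int_0^2 \left\{ \left. e^{xy} \right|_0^1 \right\} dx = \int_0^2 (e^x - 1)\,dx = \left. (e^x - x) \right|_0^2 = e^2 - 2 - 1 = e^2 - 3. \]

Obviously, we get the same result but with much less effort.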
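As a sanity check on Example 8.3, both orders of integration can be approximated numerically. This is a minimal sketch assuming nested midpoint sums with 400 subintervals per direction; the helper name `midpoint` is hypothetical.

```python
import math

def midpoint(g, lo, hi, n=400):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

def f(x, y):
    return x * math.exp(x * y)

# Integrate over x first (inner), then over y (outer).
x_first = midpoint(lambda y: midpoint(lambda x: f(x, y), 0.0, 2.0), 0.0, 1.0)
# Integrate over y first (inner), then over x (outer).
y_first = midpoint(lambda x: midpoint(lambda y: f(x, y), 0.0, 1.0), 0.0, 2.0)

exact = math.e**2 - 3
print(x_first, y_first, exact)
```

Numerically the two orders cost the same; the difference in effort noted in the example concerns only the hand computation.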

8.3 Integrals over irregular domains

So far we have looked at integrals over rectangular regions. How would we proceed if the region was not a rectangle but a much more general shape?

[Figure 8.5: An irregular domain sliced vertically]

Suppose that the region R is contained between x = a and x = b and that for each value of x ∈ [a, b] the segment of y-values inside R is between $y_1(x)$ and $y_2(x)$, see Fig. 8.5. Then the integral becomes

\[ \iint_R f(x, y)\,dx\,dy = \int_a^b \left\{ \int_{y_1(x)}^{y_2(x)} f(x, y)\,dy \right\} dx. \]

Note One can prove this formula from the definition of the double integral as a limit of Riemann sums. But for motivation it is enough to think of this integral as being obtained by slicing the region vertically into infinitesimally thin slabs and integrating the function

\[ g(x) = \int_{y_1(x)}^{y_2(x)} f(x, y)\,dy, \quad \text{for every } x \in [a, b]. \]

[Figure 8.6: An irregular domain sliced horizontally]

As before, the situation can also be viewed in an alternative way by thinking of the region as being sliced horizontally, see Fig. 8.6.

Then the corresponding formula for the evaluation of the integral becomes

\[ \iint_R f(x, y)\,dx\,dy = \int_c^d \left\{ \int_{x_1(y)}^{x_2(y)} f(x, y)\,dx \right\} dy. \]

And as before we find the two formulae yield the same results if the function f and the region R are ‘nice’. This is a more general version of Fubini’s theorem,

\[ \iint_R f(x, y)\,dx\,dy = \int_a^b \left\{ \int_{y_1(x)}^{y_2(x)} f(x, y)\,dy \right\} dx = \int_c^d \left\{ \int_{x_1(y)}^{x_2(y)} f(x, y)\,dx \right\} dy. \]

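Example 8.4
Compute the double integral $\iint_R (2x - y)\,dx\,dy$ where R is the region bounded by the upper unit semi-circle and the x-axis.

Slicing the region vertically the integral becomes

\[ \int_{-1}^{1} \left\{ \int_0^{\sqrt{1 - x^2}} (2x - y)\,dy \right\} dx = \int_{-1}^{1} \left. \left\{ 2xy - \frac{1}{2} y^2 \right\} \right|_0^{\sqrt{1 - x^2}} dx \]
\[ = \int_{-1}^{1} \left( 2x\sqrt{1 - x^2} - \frac{1}{2}(1 - x^2) \right) dx = \left. \left( -\frac{2}{3}(1 - x^2)^{3/2} - \frac{1}{2} x + \frac{1}{6} x^3 \right) \right|_{-1}^{1} = -\frac{2}{3}. \]

[Figure: the upper unit half-disc R between x = −1 and x = 1]

On the other hand, we can compute the integral also by slicing the region horizontally. This gives the integral

\[ \int_0^1 \left\{ \int_{-\sqrt{1 - y^2}}^{\sqrt{1 - y^2}} (2x - y)\,dx \right\} dy = \int_0^1 \left\{ \left. (x^2 - yx) \right|_{-\sqrt{1 - y^2}}^{\sqrt{1 - y^2}} \right\} dy = \int_0^1 -2y\sqrt{1 - y^2}\,dy = \left. \frac{2}{3}(1 - y^2)^{3/2} \right|_0^1 = -\frac{2}{3}. \]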
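The two slicings of Example 8.4 can be mirrored in a small numerical routine. The sketch below is an illustration under assumptions: it uses nested midpoint sums, and `iterated_integral` and `half_chord` are hypothetical names. The first argument of the integrand is always the outer variable, the second the inner one.

```python
import math

def iterated_integral(f, lo, hi, inner_lo, inner_hi, n=400):
    """Approximate int_lo^hi ( int_{inner_lo(s)}^{inner_hi(s)} f(s, t) dt ) ds by midpoint sums."""
    ds = (hi - lo) / n
    total = 0.0
    for i in range(n):
        s = lo + (i + 0.5) * ds                  # position of the slice
        t_lo, t_hi = inner_lo(s), inner_hi(s)    # extent of the slice inside R
        dt = (t_hi - t_lo) / n
        inner = sum(f(s, t_lo + (j + 0.5) * dt) for j in range(n)) * dt
        total += inner * ds
    return total

def half_chord(s):
    return math.sqrt(1.0 - s * s)

# Vertical slices: outer variable x in [-1, 1], inner variable y in [0, sqrt(1 - x^2)].
vertical = iterated_integral(lambda x, y: 2 * x - y, -1.0, 1.0,
                             lambda x: 0.0, half_chord)
# Horizontal slices: outer variable y in [0, 1], inner variable x in [-sqrt(1 - y^2), sqrt(1 - y^2)].
horizontal = iterated_integral(lambda y, x: 2 * x - y, 0.0, 1.0,
                               lambda y: -half_chord(y), half_chord)
print(vertical, horizontal)   # both should be close to -2/3
```

The same routine covers any region of the type shown in Fig. 8.5 or Fig. 8.6 once the limit functions are supplied.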


Example 8.5
Formulate the integral $\int_1^2 \left\{ \int_1^{x^2} f(x, y)\,dy \right\} dx$ as an integral with the order of integration reversed.

First we need to find the region R:
• from the limits of the inner integral $1 \le y \le x^2$
• from the limits of the outer integral $1 \le x \le 2$
This gives the region as indicated in Fig. 8.5 and the given integral is obtained by slicing it vertically. In order to get the integral with the order reversed we need to slice horizontally and this yields a different description of the same region:
• along the horizontal slices: $\sqrt{y} \le x \le 2$
• for values of y with $1 \le y \le 4$.

[Figure: the region R between x = 1 and x = 2, above y = 1 and below $y = x^2$]

This gives us the desired integral

\[ \int_1^4 \left\{ \int_{\sqrt{y}}^{2} f(x, y)\,dx \right\} dy. \]

Example 8.6
Write $\iint_R f(x, y)\,dx\,dy$ as a repeated integral where R is the region indicated in Fig. 8.6.
Here, for every fixed y ∈ [0, 1] the values of x lie in the interval [y, 2 − y] and we can write the integral as the iterated integral

\[ \int_0^1 \left\{ \int_y^{2 - y} f(x, y)\,dx \right\} dy. \]

[Figure: the triangle R with vertices (0, 0), (1, 1) and (2, 0)]

Viewing the region sliced along the y-direction gives a more complicated expression, namely

\[ \int_0^1 \left\{ \int_0^{x} f(x, y)\,dy \right\} dx + \int_1^2 \left\{ \int_0^{2 - x} f(x, y)\,dy \right\} dx. \]

Example 8.7
Compute $\iint_R x\,dx\,dy$ where R is the region bounded by the two curves $y = 6x - x^2$ and $y = x + 4$.
The curves intersect when $6x - x^2 = x + 4$, i.e., when $x^2 - 5x + 4 = 0$, giving x = 1 and x = 4. If we sliced the region horizontally, we would have to split the integral into two pieces. To avoid that, we slice vertically. Then the integral becomes

\[ \iint_R x\,dx\,dy = \int_1^4 \left\{ \int_{x + 4}^{6x - x^2} x\,dy \right\} dx = \int_1^4 x\left( (6x - x^2) - (x + 4) \right) dx = \left. \left( -\frac{1}{4} x^4 + \frac{5}{3} x^3 - 2x^2 \right) \right|_1^4 = 11\tfrac{1}{4} = \frac{45}{4}. \]

[Figure: the region R between the parabola $y = 6x - x^2$ and the line $y = x + 4$]
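The arithmetic in Example 8.7 depends on using the intersection points x = 1 and x = 4 as the outer limits. A short exact check with rational arithmetic is sketched below; the function name `antiderivative` is a hypothetical helper for the antiderivative of $x((6x - x^2) - (x + 4)) = -x^3 + 5x^2 - 4x$.

```python
from fractions import Fraction

def antiderivative(x):
    """Antiderivative -x^4/4 + (5/3) x^3 - 2 x^2 of -x^3 + 5x^2 - 4x, evaluated exactly."""
    x = Fraction(x)
    return -x**4 / 4 + Fraction(5, 3) * x**3 - 2 * x**2

# The intersection points solve x^2 - 5x + 4 = 0.
roots = [x for x in range(-10, 11) if x * x - 5 * x + 4 == 0]

value = antiderivative(4) - antiderivative(1)
print(roots, value)
```

Evaluating between 0 and 4 instead would give 32/3, so the value 45/4 confirms that the lower limit must be 1.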


Note The following remarks are useful:
(i) $\iint_R f(x, y)\,dx\,dy$ is the signed volume between the graph surface of f and the (x, y)-plane (z = 0).
(ii) If f(x, y) ≥ g(x, y) over the region R then $\iint_R \left( f(x, y) - g(x, y) \right) dx\,dy$ is the volume between the two graph surfaces.
(iii) Taking f(x, y) = 1 gives

\[ \iint_R 1\,dx\,dy = \text{volume of the column of height 1 over } R = \text{area}(R) \cdot 1 = \text{area}(R). \]

(iv) Recall how we compute the area between the graphs of two functions:

\[ \text{area} = \int_a^b \left( f(x) - g(x) \right) dx = \int_a^b \left\{ \int_{g(x)}^{f(x)} 1\,dy \right\} dx. \]

(v) In a completely analogous way one can define triple integrals, quadruple integrals or even integrals with any number of variables.

8.4 The change of variables formula for double integrals

Recall the change of variables formula (substitution rule) in the one-dimensional case:

\[ \int_a^b f(x)\,dx = \int_{x^{-1}(a)}^{x^{-1}(b)} f(x(s))\,x'(s)\,ds. \]

Here it is important that x(s) is a function with the property that $x'(s) \neq 0$. What is the corresponding formula for double integrals? To answer this question we need to talk about coordinates.

Consider the plane. We want to distinguish points in the plane. To this end we need to label them in some way that gives each point a unique label. The usual way to do this is to

• select a special point, called the origin,
• draw two perpendicular lines (the axes) through the origin,
• select a unit on each line,
• assign numbers $x_P$ and $y_P$ to a point P according to where a parallel of an axis through P intersects the other axis.

[Figure: Cartesian axes with origin O, unit marks, and a point P with coordinates $x_P$ and $y_P$]

We can think of this process as covering the plane with a net of lines x = const and y = const. Every point in the plane lies on exactly one line of constant x and one line of constant y. But no one tells us that this is the only way to do it and that we need to do it in this way (see Fig. 8.7). In fact, here is another way: we choose one special point, the origin, and then we draw straight lines through it in all possible angles. Finally, we draw concentric circles around the origin with arbitrary radii. We select one of the lines starting at O (usually the one going horizontally to the right) and assign to every half-line starting at O the angle to that selected line. Now, every point P on the plane lies on a circle and one of the half-lines and we can assign two numbers to the point P, the radius of the circle and the angle of the half-line on which P lies. These are the so-called polar coordinates.

More generally, one can define arbitrary coordinates (u, v). We can think of the plane as being covered by two families of lines given by u = const or v = const, see Fig. 8.7(c). A fixed pair (u, v) specifies exactly one point.


[Figure 8.7: Three ways to assign coordinates in the plane. (a) Cartesian coordinates (left). (b) Polar coordinates r, φ (middle). (c) A general coordinate system with a net of lines u = const and v = const and the point P = (1.5, 3) (right).]

This point has also Cartesian coordinates (x, y). Thus, we obtain an assignment $(u, v) \mapsto (x, y) = \Phi(u, v) = (x(u, v), y(u, v))$ which we can use to express one set of coordinates in terms of other coordinates. For polar coordinates this transformation is

\[ \Phi : (r, \varphi) \mapsto \Phi(r, \varphi) = (r\cos\varphi, r\sin\varphi) = (x, y). \]

Now suppose we want to integrate a function f over a region R but we cover the region with a general coordinate net. We proceed in exactly the same way as for Cartesian coordinates (see Fig. 8.8(a)). We subdivide the region into small sections, select a point in each section and approximate the integral by a Riemann sum as before

\[ \iint_R f\,dx\,dy \approx \sum_i f(P_i)\,\Delta A_i \tag{8.1} \]

where $P_i$ is a point in one of the small patches between two consecutive coordinate lines and where $\Delta A_i$ is the area of that patch. When we make the net finer then we would expect that this sum approaches the value of the integral. But in order to obtain a relationship between the integral computed with Cartesian coordinates (x, y) and that computed in (u, v) coordinates we need an expression for $\Delta A_i$ in terms of Δu and Δv. Fig. 8.8(b) shows the closeup of one particular patch. What is its area?

[Figure 8.8: Subdivision of the region R in general coordinates with patches and selected points. (a) Subdivision of a region R into several patches with selected points. (b) Closeup of one patch with corners O, P, Q between the coordinate lines u, u + Δu and v, v + Δv.]


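The Cartesian coordinates of the marked points are

\[ O : (x(u, v), y(u, v)), \quad P : (x(u, v + \Delta v), y(u, v + \Delta v)), \quad Q : (x(u + \Delta u, v), y(u + \Delta u, v)). \]

When Δu and Δv are small then the patch is approximated quite well by the parallelogram spanned by the vectors $\overrightarrow{OP}$ and $\overrightarrow{OQ}$. These vectors are

\[ \overrightarrow{OP} = \begin{pmatrix} x(u, v + \Delta v) \\ y(u, v + \Delta v) \end{pmatrix} - \begin{pmatrix} x(u, v) \\ y(u, v) \end{pmatrix} = \begin{pmatrix} x(u, v + \Delta v) - x(u, v) \\ y(u, v + \Delta v) - y(u, v) \end{pmatrix} = \begin{pmatrix} x_v(u, v) \\ y_v(u, v) \end{pmatrix} \Delta v + o(\Delta v) \]

and, similarly,

\[ \overrightarrow{OQ} = \begin{pmatrix} x_u(u, v) \\ y_u(u, v) \end{pmatrix} \Delta u + o(\Delta u). \]

The area ΔA of the patch is therefore approximated by the area of the parallelogram, which can be computed from the cross product of the two vectors. Up to terms of higher order in Δu and Δv we get

\[ \Delta A \approx |x_u y_v - x_v y_u|\,\Delta u\,\Delta v. \]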

Putting this back into the approximating sum for the integral (8.1) and taking the limit as Δu, Δv → 0 we obtain the formula

\[ \iint_R f(x, y)\,dx\,dy = \iint_{R'} f(x(u, v), y(u, v)) \left| \frac{\partial(x, y)}{\partial(u, v)} \right| du\,dv. \]

Here R′ is the description of the region R in terms of the (u, v)-coordinates and $\frac{\partial(x, y)}{\partial(u, v)}$ is a common abbreviation for $x_u y_v - x_v y_u$. With $\Phi(u, v) = (x(u, v), y(u, v))$ we find that

\[ J_\Phi(u, v) = \begin{pmatrix} x_u & x_v \\ y_u & y_v \end{pmatrix} \quad \text{so that} \quad \frac{\partial(x, y)}{\partial(u, v)} = x_u y_v - x_v y_u = \det J_\Phi(u, v), \]

the Jacobian determinant. Thus, we can also write the change of variables formula as

Theorem 8.2 (Change of variables formula for double integrals).

\[ \iint_R f(x, y)\,dx\,dy = \iint_{R'} f(x(u, v), y(u, v))\,|\det J_\Phi(u, v)|\,du\,dv. \]

And this is the form which one can generalize to more than two variables.

Theorem 8.3 (Change of variables formula for multiple integrals). Let $\Phi : \mathbb{R}^n \to \mathbb{R}^n$ be a coordinate transformation, i.e., a map for which $\det J_\Phi(u_1, u_2, \ldots, u_n) \neq 0$ for all $(u_1, u_2, \ldots, u_n) \in R'$ and let R denote the image of R′ under Φ, then

\[ \int \cdots \iint_R f(x_1, x_2, \ldots, x_n)\,dx_1 dx_2 \ldots dx_n = \int \cdots \iint_{R'} f(\Phi(u_1, u_2, \ldots, u_n))\,|\det J_\Phi(u_1, u_2, \ldots, u_n)|\,du_1 du_2 \ldots du_n. \]

Note In order for the transformation formula to make sense we need that $\det J_\Phi(u_1, u_2, \ldots, u_n) \neq 0$. In particular, in the two-dimensional case this amounts to $x_u y_v - x_v y_u \neq 0$. Otherwise, the area of the small patches ΔA becomes zero and cannot contribute to the integral.


Example 8.8
Compute the integral $\iint_R (x + y)\,dx\,dy$ where R is the region between the lines y = x, y = x + 4, y = 4 − 2x, y = 8 − 2x.
Note that the lines can be written also in the form y − x = 0, y − x = 4, y + 2x = 4 and y + 2x = 8. This suggests introducing new coordinates u = y − x and v = y + 2x. Then, in the (u, v)-description the region R can be expressed as 0 ≤ u ≤ 4 and 4 ≤ v ≤ 8. Thus, the region R′ is a square with side length 4. We can express x and y in terms of u and v

\[ x = \frac{1}{3}(v - u), \quad y = \frac{1}{3}(v + 2u). \]

[Figure: the parallelogram R in the (x, y)-plane and the square R′ in the (u, v)-plane]

To use the transformation formula we need the coordinate transformation

\[ \Phi(u, v) = (x(u, v), y(u, v)) = \left( \frac{1}{3}(v - u), \frac{1}{3}(v + 2u) \right) \]

and its Jacobian

\[ J_\Phi(u, v) = \begin{pmatrix} -\frac{1}{3} & \frac{1}{3} \\ \frac{2}{3} & \frac{1}{3} \end{pmatrix}. \]

The Jacobian determinant is $\det J_\Phi(u, v) = \frac{\partial(x, y)}{\partial(u, v)} = -\frac{1}{3}$ and so the formula gives

\[ \iint_R (x + y)\,dx\,dy = \iint_{R'} \left( \frac{1}{3}(v - u) + \frac{1}{3}(v + 2u) \right) \left| -\frac{1}{3} \right| du\,dv = \frac{1}{9} \int_4^8 \left\{ \int_0^4 (2v + u)\,du \right\} dv = \cdots = \frac{224}{9}. \]
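The change of variables in Example 8.8 can be replayed numerically. The sketch below is only an illustration: the names `phi`, `jacobian_det` and `integral_in_uv` are hypothetical, the Jacobian determinant is estimated by central differences rather than taken from the hand computation, and the integral over the square R′ is approximated by a midpoint sum.

```python
def phi(u, v):
    """Coordinate transformation of Example 8.8: (u, v) -> (x, y)."""
    return ((v - u) / 3.0, (v + 2.0 * u) / 3.0)

def jacobian_det(u, v, h=1e-6):
    """Estimate det J_phi = x_u * y_v - x_v * y_u by central differences."""
    xp, yp = phi(u + h, v)
    xm, ym = phi(u - h, v)
    x_u, y_u = (xp - xm) / (2 * h), (yp - ym) / (2 * h)
    xp, yp = phi(u, v + h)
    xm, ym = phi(u, v - h)
    x_v, y_v = (xp - xm) / (2 * h), (yp - ym) / (2 * h)
    return x_u * y_v - x_v * y_u

def integral_in_uv(f, n=100):
    """Midpoint sum of f(phi(u, v)) * |det J_phi| over R' = [0, 4] x [4, 8]."""
    du = 4.0 / n
    dv = 4.0 / n
    total = 0.0
    for i in range(n):
        u = (i + 0.5) * du
        for j in range(n):
            v = 4.0 + (j + 0.5) * dv
            x, y = phi(u, v)
            total += f(x, y) * abs(jacobian_det(u, v)) * du * dv
    return total

result = integral_in_uv(lambda x, y: x + y)
print(jacobian_det(1.0, 5.0), result)   # about -1/3 and 224/9
```

Dropping the factor `abs(jacobian_det(u, v))` would give three times the correct value, which shows why the determinant cannot be omitted.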

8.5 More on polar coordinates

Recall the coordinate transformation x = r cos φ, y = r sin φ. Table 8.1 lists Cartesian and polar coordinates for some points in the plane.

Exercise.
Show that if points outside the origin should have a unique set of polar coordinates then the angle φ has to be restricted to [0, 2π) (or any other half open interval of length 2π). What are the allowed values for the r coordinate? What happens at the origin?

Example 8.9
Sketch the curve given in polar coordinates by the equation r = 1 + cos φ = r(φ).

Think of a bead sliding along an arm rotating around the origin. The bead changes its distance from the origin according to the function r(φ). In Cartesian coordinates the same curve is described by

\[ \gamma(t) = ((1 + \cos t)\cos t,\; (1 + \cos t)\sin t). \]

Can you determine the behaviour of the curve at the origin? Compute the tangent vector there and interpret the result.

[Figure: sketch of the curve r = 1 + cos φ in the (x, y)-plane]


Table 8.1: Cartesian and polar coordinates for selected points in the plane

Cartesian (x, y)    polar (r, φ)
(1, 1)              (√2, π/4)
(0, 0.5)            (0.5, π/2)
(−1, −1)            (√2, 5π/4)
(0.5, 0)            (0.5, 0)

8.6 Integrals in polar coordinates

The area of a region R is computed by integrating the constant function 1 over the region. This can also be done when the region is defined in terms of polar coordinates. Let us look at the more general case of integrating an arbitrary function defined on R using polar coordinates. Let Φ(r, φ) = (r cos φ, r sin φ) be the coordinate transformation between Cartesian and polar coordinates, then we use the change of variables formula to write

\[ \iint_R f(x, y)\,dx\,dy = \iint_{R'} f(r\cos\varphi, r\sin\varphi)\,|\det J_\Phi(r, \varphi)|\,dr\,d\varphi. \]

Now,

\[ J_\Phi(r, \varphi) = \begin{pmatrix} \cos\varphi & -r\sin\varphi \\ \sin\varphi & r\cos\varphi \end{pmatrix} \implies \det J_\Phi(r, \varphi) = r\cos^2\varphi + r\sin^2\varphi = r \]

and the integral becomes

\[ \iint_R f(x, y)\,dx\,dy = \iint_{R'} f(r\cos\varphi, r\sin\varphi)\,r\,dr\,d\varphi. \]

In case of the area we get

\[ A(R) = \iint_R 1\,dx\,dy = \iint_{R'} r\,dr\,d\varphi. \]


Example 8.10
Compute the area of an annulus with inner radius $r_1$ and outer radius $r_2$.

[Figure: the rectangle R′ = $[r_1, r_2] \times [0, 2\pi]$ in the (r, φ)-plane is mapped by Φ to the annulus R in the (x, y)-plane]

The region R which is an annulus in (x, y)-coordinates is described in (r, φ)-coordinates by $r_1 \le r \le r_2$ and 0 ≤ φ ≤ 2π, i.e., as a rectangle in the (r, φ)-plane. The area of the annulus is computed using the change of variables formula

\[ \iint_R dx\,dy = \int_{r_1}^{r_2} \left\{ \int_0^{2\pi} r\,d\varphi \right\} dr = 2\pi \int_{r_1}^{r_2} r\,dr = \pi\left( r_2^2 - r_1^2 \right). \]

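Example 8.11
Find the area of the region enclosed by the curve r = φ, 0 ≤ φ ≤ 2π and the positive x-axis.

[Figure: the triangular region R′ = {0 ≤ r ≤ φ, 0 ≤ φ ≤ 2π} in the (r, φ)-plane is mapped by Φ to the region R inside the spiral in the (x, y)-plane]

Now the area can be computed by

\[ \iint_R dx\,dy = \iint_{R'} r\,dr\,d\varphi = \int_0^{2\pi} \left\{ \int_0^{\varphi} r\,dr \right\} d\varphi = \frac{1}{2} \int_0^{2\pi} \varphi^2\,d\varphi = \frac{4}{3}\pi^3. \]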

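Example 8.12
Determine the area inside the cardioid.

The cardioid is the curve

\[ \gamma(\varphi) = ((1 + \sin\varphi)\cos\varphi,\; (1 + \sin\varphi)\sin\varphi), \quad 0 \le \varphi \le 2\pi. \]

In polar coordinates this curve can be given by the equation r = 1 + sin φ. Again, we compute the area using the change of variables formula

[Figure: the region R enclosed by the cardioid r = 1 + sin φ]

\[ \iint_R dx\,dy = \iint_{R'} r\,dr\,d\varphi = \int_0^{2\pi} \left\{ \int_0^{1 + \sin\varphi} r\,dr \right\} d\varphi = \frac{1}{2} \int_0^{2\pi} (1 + \sin\varphi)^2\,d\varphi = \cdots = \frac{3\pi}{2}. \]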
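The polar area formula of this section can be tested on the regions of Examples 8.11 and 8.12. The following sketch assumes the region has the form 0 ≤ r ≤ r_max(φ), 0 ≤ φ ≤ 2π and uses midpoint sums in both r and φ; `polar_integral` is a hypothetical helper name.

```python
import math

def polar_integral(f, r_max, n_r=200, n_phi=400):
    """Midpoint sum of f(r cos(phi), r sin(phi)) * r over 0 <= r <= r_max(phi), 0 <= phi <= 2*pi."""
    dphi = 2.0 * math.pi / n_phi
    total = 0.0
    for k in range(n_phi):
        phi = (k + 0.5) * dphi
        dr = r_max(phi) / n_r
        for i in range(n_r):
            r = (i + 0.5) * dr
            # The factor r is the Jacobian determinant of polar coordinates.
            total += f(r * math.cos(phi), r * math.sin(phi)) * r * dr * dphi
    return total

def one(x, y):
    return 1.0

cardioid_area = polar_integral(one, lambda phi: 1.0 + math.sin(phi))
spiral_area = polar_integral(one, lambda phi: phi)
print(cardioid_area, 3 * math.pi / 2)
print(spiral_area, 4 * math.pi**3 / 3)
```

Replacing `one` by another integrand gives an approximation of $\iint_R f\,dx\,dy$ over the same regions.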


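Example 8.13
Use polar coordinates to evaluate the integral

\[ I := \int_{-\infty}^{\infty} e^{-x^2}\,dx. \]

First step: compute $I^2$

\[ I^2 = \int_{-\infty}^{\infty} e^{-x^2}\,dx \cdot I = \int_{-\infty}^{\infty} I \cdot e^{-x^2}\,dx = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} e^{-y^2}\,dy \right\} e^{-x^2}\,dx = \int_{-\infty}^{\infty} \left\{ \int_{-\infty}^{\infty} e^{-y^2} e^{-x^2}\,dy \right\} dx = \iint_{\mathbb{R}^2} e^{-x^2 - y^2}\,dx\,dy. \]

Second step: use polar coordinates

\[ \iint_{\mathbb{R}^2} e^{-x^2 - y^2}\,dx\,dy = \int_0^{2\pi} \left\{ \int_0^{\infty} e^{-r^2}\,r\,dr \right\} d\varphi = \int_0^{2\pi} \left. -\frac{1}{2} e^{-r^2} \right|_0^{\infty} d\varphi = \frac{1}{2} \int_0^{2\pi} d\varphi = \pi. \]

Therefore

\[ I = \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}. \]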
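A quick numerical cross-check of Example 8.13 is sketched below. It rests on an assumption not made in the notes: the infinite ranges are truncated at ±8 (and at r = 8), which is harmless here because $e^{-64}$ is far below the rounding error. The helper name `midpoint` is hypothetical.

```python
import math

def midpoint(g, lo, hi, n=4000):
    """Midpoint-rule approximation of the integral of g over [lo, hi]."""
    h = (hi - lo) / n
    return h * sum(g(lo + (k + 0.5) * h) for k in range(n))

# I, with the real line truncated to [-8, 8].
I = midpoint(lambda x: math.exp(-x * x), -8.0, 8.0)

# I^2 in polar coordinates: the phi-integral contributes 2*pi, the r-integral carries the factor r.
I_squared_polar = 2.0 * math.pi * midpoint(lambda r: math.exp(-r * r) * r, 0.0, 8.0)

print(I, math.sqrt(math.pi))
print(I * I, I_squared_polar, math.pi)
```

The one-dimensional integral has no elementary antiderivative, whereas the extra factor r in polar coordinates makes the radial integral elementary; numerically both routes agree.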


9 Line integrals

(See Stewart, §16.2.)

9.1 Line integrals of scalar functions

Suppose γ(t) for t ∈ [a, b] is a curve in the plane or in space and that you are given a function f defined along the curve. What is the integral of f along the curve? As an example think of a long cable made of different materials with different mass densities. How would one compute the total mass of the cable?

[Figure: a curve γ(t) in the (x, y)-plane with the graph of f(γ(t)) above it, chopped into pieces of length $\Delta s_i$ at the points $\gamma(t_i) = (x_i, y_i)$]

If f describes the density of mass per centimetre along the cable then we can approximate the total mass of the cable by chopping it into N small pieces, obtaining approximate values for their mass and adding them up:

\[ M \approx \sum_{i=1}^{N} f(x_i, y_i)\,\Delta s_i. \]

We can get approximate values for the mass of a small piece by taking the value of the mass density at one of the points $(x_i, y_i)$ on the small piece and multiplying it with the length $\Delta s_i$ of the chunk. This idea is exactly the same as for the usual Riemann integrals that we discussed in Chapter 8. The complication here is that we need to find the lengths of the different chunks of the curve.

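To find Δs for a small piece along the curve γ(t) consider the figure to the right. It shows a small piece of the curve between the points γ(t) and γ(t + Δt). This figure suggests that when Δt is small the length of the curve between the points is approximated by the length of the straight line between the points, i.e., that $\Delta s \approx \sqrt{\Delta x^2 + \Delta y^2}$.

[Figure: the points γ(t) and γ(t + Δt) on the curve, the chord Δs between them and its components Δx and Δy]

We get expressions for Δx and Δy as follows, where the dot denotes the derivative with respect to t:

\[ \left. \begin{aligned} \Delta x &= x(t + \Delta t) - x(t) = \dot{x}(t)\,\Delta t + o(\Delta t) \\ \Delta y &= y(t + \Delta t) - y(t) = \dot{y}(t)\,\Delta t + o(\Delta t) \end{aligned} \right\} \implies \Delta s \approx \sqrt{\dot{x}(t)^2 + \dot{y}(t)^2}\,\Delta t + o(\Delta t), \]

and therefore the value for the mass becomes

\[ M \approx \sum_{i=1}^{N} f(x(t_i), y(t_i)) \sqrt{\dot{x}(t_i)^2 + \dot{y}(t_i)^2}\,\Delta t + o(\Delta t). \]

In the limit of infinitely many infinitely short pieces we get

\[ M = \int_a^b f(x(t), y(t)) \sqrt{\dot{x}(t)^2 + \dot{y}(t)^2}\,dt. \]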
Therefore, we define the


Definition 9.1 (Line integral of a function). Let $\gamma : [a, b] \to \mathbb{R}^2$ be a curve in the plane and let $f : \mathbb{R}^2 \to \mathbb{R}$ be a function defined in a region of the plane containing the curve. Then the line integral of f along γ is the integral

\[ \int_\gamma f\,ds = \int_a^b f(x(t), y(t)) \sqrt{\dot{x}(t)^2 + \dot{y}(t)^2}\,dt. \]

In a similar way, we define the integral of a function over a curve in space.

Example 9.1
Evaluate the line integral $\int_\gamma \left( 1 + 4(x - y) \right) ds$ where γ is the curve defined by $\gamma(t) = (t^2 + t, t^2)$ for 0 ≤ t ≤ 1.

First, compute the line element

\[ ds = \sqrt{\dot{x}(t)^2 + \dot{y}(t)^2}\,dt = \sqrt{(2t + 1)^2 + (2t)^2}\,dt = \sqrt{8t^2 + 4t + 1}\,dt \]

and insert into the definition for the line integral. Then the integral becomes

\[ \int_\gamma \left( 1 + 4(x - y) \right) ds = \int_0^1 (1 + 4t)\sqrt{8t^2 + 4t + 1}\,dt = \left. \frac{1}{4} \cdot \frac{2}{3} (8t^2 + 4t + 1)^{3/2} \right|_0^1 = \frac{1}{6}\left( 13^{3/2} - 1 \right). \]
Note In the special case when the function f is the constant 1, i.e., f(γ(t)) = 1, its integral over γ gives the length of the curve.

\[ L(\gamma) = \int_\gamma 1 \cdot ds = \int_a^b \sqrt{\dot{x}(t)^2 + \dot{y}(t)^2}\,dt. \]

9.2 Line integrals of vector fields

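Suppose $\mathbf{F} : \mathbb{R}^2 \to \mathbb{R}^2$ is a vector field and γ is a curve in the plane.

[Figure: a vector field F in the (x, y)-plane together with a curve γ]

In many applications one is interested in the integral

\[ \int_\gamma \mathbf{F} \cdot d\mathbf{s} \quad \text{or} \quad \int_\gamma \vec{F} \cdot d\vec{r}. \]

Such applications include

• electromagnetism (voltage)
• mechanics (work, energy)
• chemistry (reaction processes)
• economics (welfare, consumer supply)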


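In order to evaluate such integrals we need to interpret $d\mathbf{s}$. Consider the figure on the right. The vector $\Delta\mathbf{s} = \gamma(t + \Delta t) - \gamma(t)$ has the components

\[ \Delta\mathbf{s} = \begin{pmatrix} x(t + \Delta t) - x(t) \\ y(t + \Delta t) - y(t) \end{pmatrix} = \begin{pmatrix} \dot{x}(t)\,\Delta t \\ \dot{y}(t)\,\Delta t \end{pmatrix} + o(\Delta t) = \dot{\gamma}(t)\,\Delta t + o(\Delta t) \]

for small Δt.

[Figure: the points γ(t) and γ(t + Δt) on the curve, the vector Δs between them and its components Δx and Δy]

Therefore, we define $d\mathbf{s} = \dot{\gamma}(t)\,dt$. This is the vector line element. Note that it is essentially the tangent vector to the curve at the point γ(t). Since F is also a vector we define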

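Definition 9.2. Let γ(t) with a ≤ t ≤ b be a curve in the plane and F a vector field in the plane so that F is defined on all points of the curve. The line integral

\[ \int_\gamma \mathbf{F} \cdot d\mathbf{s} \]

is defined by

\[ \int_\gamma \mathbf{F} \cdot d\mathbf{s} = \int_a^b \mathbf{F}(\gamma(t)) \cdot \dot{\gamma}(t)\,dt = \int_a^b \left( F_1(\gamma(t))\,\dot{x}(t) + F_2(\gamma(t))\,\dot{y}(t) \right) dt. \]

Note Because of this formula one often also writes the line integral as

\[ \int_\gamma \mathbf{F} \cdot d\mathbf{s} = \int_\gamma F_1\,dx + F_2\,dy \]

and then inserts $dx = \dot{x}\,dt$ and $dy = \dot{y}\,dt$.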

We can generalize the line integral from the two variable case to more variables. For three variables the formula for a line integral over a vector field is

\[ \int_\gamma \mathbf{F} \cdot d\mathbf{s} = \int_\gamma F_1\,dx + F_2\,dy + F_3\,dz \]

where now γ(t) = (x(t), y(t), z(t)) is a curve in space and $\mathbf{F} = (F_1, F_2, F_3)$ is a vector field in space. The generalization to even more variables follows in the exact same way.

Note The formula

\[ \int_\gamma \mathbf{F} \cdot d\mathbf{s} = \int_a^b (\mathbf{F} \cdot \dot{\gamma})\,dt \]

for the line integral suggests that the integral only picks up the tangential component of the vector field. The component which is perpendicular to the curve does not contribute to the integral.


Note Every curve γ comes with an orientation that is derived from the parametrization: the points on the curve are run through in the direction of increasing values of the parameter t.
If γ(t), a ≤ t ≤ b is a parameterized curve, then we can define a related curve

\[ \bar{\gamma}(t) = \gamma(-t), \quad -b \le t \le -a. \]

This curve runs through the same set of points as γ but in the opposite direction. It has the opposite orientation from γ and it is denoted by −γ.
It is easy to see that

\[ \int_{-\gamma} \mathbf{F} \cdot d\mathbf{s} = -\int_\gamma \mathbf{F} \cdot d\mathbf{s}. \]

Exercise.
Suppose $\gamma_1$ and $\gamma_2$ are parametrizations of the same curve, in the sense that they run through the same points in the same direction but possibly with different speeds. Show that

\[ \int_{\gamma_1} \mathbf{F} \cdot d\mathbf{s} = \int_{\gamma_2} \mathbf{F} \cdot d\mathbf{s}. \]

This means that the value of the line integral of a vector field does not depend on the speed of the curve. It is independent of the parametrization.
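Both statements, the sign change under reversal of orientation and the independence of the parametrization, can be observed numerically before proving them. The sketch below makes several illustrative assumptions: the vector field F = (−y, x), the quarter of the unit circle as the curve, and the helper name `line_integral_vector`, which sums F · Δs over small chords.

```python
import math

def line_integral_vector(F, gamma, a, b, n=4000):
    """Approximate the line integral of F along gamma by summing F(midpoint) . (chord vector)."""
    dt = (b - a) / n
    total = 0.0
    for i in range(n):
        t0 = a + i * dt
        t1 = t0 + dt
        x0, y0 = gamma(t0)
        x1, y1 = gamma(t1)
        F1, F2 = F(*gamma(0.5 * (t0 + t1)))
        total += F1 * (x1 - x0) + F2 * (y1 - y0)
    return total

def F(x, y):
    return (-y, x)           # a field that is not conservative

def gamma1(t):               # quarter circle at unit speed, 0 <= t <= pi/2
    return (math.cos(t), math.sin(t))

def gamma2(t):               # same points, different speed, 0 <= t <= sqrt(pi/2)
    return (math.cos(t * t), math.sin(t * t))

def gamma_reversed(t):       # gamma1(-t), -pi/2 <= t <= 0
    return gamma1(-t)

I1 = line_integral_vector(F, gamma1, 0.0, math.pi / 2)
I2 = line_integral_vector(F, gamma2, 0.0, math.sqrt(math.pi / 2))
I_rev = line_integral_vector(F, gamma_reversed, -math.pi / 2, 0.0)
print(I1, I2, I_rev)   # about pi/2, pi/2 and -pi/2
```

A numerical agreement of this kind is of course no substitute for the proof asked for in the exercise.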

9.3 Conservative vector fields and potentials

Consider a function $\Phi : D \to \mathbb{R}$ defined on some domain $D \subset \mathbb{R}^3$ (or any $\mathbb{R}^n$ with n ≥ 2). Let γ : [a, b] → D be a curve in the domain D with endpoints γ(a) = P and γ(b) = Q. We construct a vector field $\mathbf{F} : D \to \mathbb{R}^3$ from Φ by taking its gradient

\[ \mathbf{F} = \nabla\Phi = (\Phi_x, \Phi_y, \Phi_z). \]

We want to compute the line integral $\int_\gamma \mathbf{F} \cdot d\mathbf{s}$:

\[ \int_\gamma \mathbf{F} \cdot d\mathbf{s} = \int_\gamma \Phi_x\,dx + \Phi_y\,dy + \Phi_z\,dz. \]

Written explicitly, the first term in the integrand becomes

\[ \Phi_x\,dx = \Phi_x(x(t), y(t), z(t))\,\dot{x}(t)\,dt = \Phi_x(\gamma(t))\,\dot{x}(t)\,dt \]

and similarly for the other two terms. Taken together we can write

\[ \Phi_x(\gamma(t))\,\dot{x}(t) + \Phi_y(\gamma(t))\,\dot{y}(t) + \Phi_z(\gamma(t))\,\dot{z}(t) = \nabla\Phi(\gamma(t)) \cdot \dot{\gamma}(t) = \frac{d}{dt}\Phi(\gamma(t)). \]

When inserting this back into the integral we arrive at

\[ \int_\gamma \mathbf{F} \cdot d\mathbf{s} = \int_a^b \left( \frac{d}{dt}\Phi(\gamma(t)) \right) dt = \Phi(\gamma(b)) - \Phi(\gamma(a)) = \Phi(Q) - \Phi(P). \]

We find the surprising result that the value of the line integral depends only on the endpoints of the curve but not on the particular curve that was chosen to connect them.

We can generalize this a bit more by considering piecewise smooth curves. Such a curve is “broken” into n pieces in the sense that there is a finite number of values $a < t_1 < t_2 < \cdots < t_{n-1} < b$ where the curve is continuous but not differentiable, so there is no unique tangent vector at these values. Here are some cases:


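[Figure: three piecewise smooth curves: one with corners at $\gamma(t_1)$, $\gamma(t_2)$, $\gamma(t_3)$ between γ(a) and γ(b), one with corners at $\gamma(t_1)$, $\gamma(t_2)$, and a closed curve with γ(a) = γ(b) and a corner at $\gamma(t_1)$]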

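Definition 9.3. Let F be an arbitrary vector field on a domain D and let γ be a piecewise smooth curve. On each subinterval $[t_i, t_{i+1}]$ with $t_0 := a$ and $t_n := b$ the curve $\gamma_i : [t_i, t_{i+1}] \to D$ defined by $\gamma_i(t) = \gamma(t)$ is differentiable and we can evaluate the line integral

\[ \int_{\gamma_i} \mathbf{F} \cdot d\mathbf{s}. \]

The line integral of F over the entire curve γ is

\[ \int_\gamma \mathbf{F} \cdot d\mathbf{s} = \sum_{i=0}^{n-1} \int_{\gamma_i} \mathbf{F} \cdot d\mathbf{s}. \]

When F is a gradient field as defined above we obtain for the line integral over any piecewise smooth curve γ

\[ \int_\gamma \mathbf{F} \cdot d\mathbf{s} = \sum_{i=0}^{n-1} \int_{\gamma_i} \mathbf{F} \cdot d\mathbf{s} = \sum_{i=0}^{n-1} \left( \Phi(\gamma(t_{i+1})) - \Phi(\gamma(t_i)) \right) = \Phi(\gamma(b)) - \Phi(\gamma(a)) = \Phi(Q) - \Phi(P) \]

as before.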

Definition 9.4. A vector field F on a domain D is called conservative if the line integral $\int_\gamma \mathbf{F} \cdot d\mathbf{s}$ over any piecewise smooth curve in D depends only on the two endpoints of the curve γ but not on the curve itself.

Note For a conservative vector field F it is true that

\[ \int_{\gamma_1} \mathbf{F} \cdot d\mathbf{s} = \int_{\gamma_2} \mathbf{F} \cdot d\mathbf{s} \]

provided that $\gamma_1$ and $\gamma_2$ share the same starting point P and endpoint Q.

Note A vector field is conservative if and only if the line integral over every piecewise smooth closed curve in D vanishes.

Note We have the result that vector fields F = ∇Φ constructed as the gradient of a function Φ are conservative. Thus, gradient fields are conservative.

This raises a question: Are gradient fields the only conservative vector fields, or are there conservative vector fields which are not the gradient of a function? The answer is that there are none: every conservative vector field is a gradient field, as we see in the following

Theorem 9.1. A vector field $\mathbf{F} : D \to \mathbb{R}^3$ on a domain D is conservative if and only if there exists a function $\Phi : D \to \mathbb{R}$ such that

\[ \mathbf{F} = \nabla\Phi. \]


The “if and only if” phrase here means actually two statements. The “if” part corresponds to the statement that“If Φ is a function on D and we define F = ∇Φ then F is conservative”. The “only if” part states that “A vectorfield F can only be conservative if there is a function Φ on D with F=∇Φ”.

We have seen above that the “if” part is true: every gradient field is conservative. How can we know that the“only if” part is also true? We have to show that for every conservative vector field F on a domain D we canconstruct a function Φ so that F=∇Φ. So let’s do that. Pick a point O = (x0, y0, z0) ∈ D and define

Φ(x , y, z) =

∫

γ

F · ds

where γ is an arbitrary piecewise smooth curve connecting O and P = (x , y, z) in D. We also putΦ(x0, y0, z0) = 0.Then Φ is a function on D.(Why?) We want to show that ∇Φ = F, i.e., that ∂xΦ = F1, ∂yΦ = F2 and ∂zΦ = F3.Let us focus on the first equation. We have

∂xΦ(x , y, z) =ddtΦ(x + t, y, z)

�

�

t=0=ddt[Φ(x + t, y, z)−Φ(x , y, z)]

�

�

t=0 .

Since F is conservative we can write the difference inside the brackets in terms of a line integral along any path γ from (x, y, z) to (x + t, y, z):

\[
\Phi(x + t, y, z) - \Phi(x, y, z) = \int_{\gamma} \mathbf{F} \cdot d\mathbf{s}.
\]

Let us take the curve γ : [0, t] → D, γ(λ) = (x + λ, y, z) connecting (x, y, z) and (x + t, y, z). Then γ̇(λ) = dγ/dλ (λ) = (1, 0, 0) and

\[
\Phi(x + t, y, z) - \Phi(x, y, z) = \int_0^t \mathbf{F}(\gamma(\lambda)) \cdot \dot{\gamma}(\lambda) \, d\lambda = \int_0^t F_1(x + \lambda, y, z) \, d\lambda.
\]

Then we can compute the derivative

\[
\partial_x \Phi(x, y, z) = \frac{d}{dt} \int_0^t F_1(x + \lambda, y, z) \, d\lambda \, \Big|_{t=0} = F_1(x + t, y, z) \Big|_{t=0} = F_1(x, y, z).
\]

Here, we have used the Fundamental Theorem of Calculus. In a similar way, one shows that the other components of the equation ∇Φ = F are satisfied.

Definition 9.5. Let F be a conservative vector field. A function Φ : D → R for which ∇Φ = F holds is called a potential (function) for F.

Note Any two potentials Φ1 and Φ2 for the same conservative vector field F differ only by a constant. This follows from ∇(Φ1 − Φ2) = 0.

Given a vector field F, is there a way to decide whether it is conservative? When F is conservative then it has a potential Φ with ∇Φ = F. Since the mixed second partial derivatives of Φ commute, it follows that

∂x∂yΦ= ∂y∂xΦ ⇐⇒ ∂1F2 − ∂2F1 = 0,

∂y∂zΦ= ∂z∂yΦ ⇐⇒ ∂2F3 − ∂3F2 = 0,

∂z∂xΦ= ∂x∂zΦ ⇐⇒ ∂3F1 − ∂1F3 = 0.

Therefore, if F is conservative, then it is necessary that it satisfies the equations on the right. For a vector field F = (F1, F2, . . . , Fn) in arbitrary dimension n a similar result is true:


Theorem 9.2. Let F be a conservative vector field on a domain D ⊂ Rn then

F=∇Φ =⇒ ∂i Fk − ∂kFi = 0, i, k = 1 : n.

In particular, when n= 2 with F= (F1, F2)

F=∇Φ =⇒ ∂1F2 − ∂2F1 = 0.

Note The equations ∂iFk − ∂kFi = 0, i, k = 1 : n are called integrability conditions because they must be satisfied for F to be the derivative of a function Φ, or in other words, for Φ to be the integral of F.

In three dimensions (and only in three dimensions) the integrability conditions have a particular meaning:

Definition 9.6. Given a vector field F : D→ R3 on some domain D ⊂ R3, we define a new vector field, its curl by

curlF= (∂2F3 − ∂3F2,∂3F1 − ∂1F3,∂1F2 − ∂2F1).

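Note One way to remember the formula for the curl is by considering the determinant

\[
\begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial_1 & \partial_2 & \partial_3 \\ F_1 & F_2 & F_3 \end{vmatrix}
= \mathbf{i}(\partial_2 F_3 - \partial_3 F_2) + \mathbf{j}(\partial_3 F_1 - \partial_1 F_3) + \mathbf{k}(\partial_1 F_2 - \partial_2 F_1)
= \begin{pmatrix} \partial_2 F_3 - \partial_3 F_2 \\ \partial_3 F_1 - \partial_1 F_3 \\ \partial_1 F_2 - \partial_2 F_1 \end{pmatrix}
\]

where i, j and k are the standard unit-vectors. For this reason the curl of F is also very often denoted by ∇ × F.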

Note Thus, in three dimensions we can express Theorem 9.2 also by saying that every conservative vector field is curl-free, i.e.,

F=∇Φ =⇒ curlF= 0.
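The definition of the curl can be tried out numerically. The sketch below approximates the three components of Definition 9.6 by central differences; the helper `curl`, the sample potential Φ = xyz + x² and the rotation field (−y, x, 0) are illustrative assumptions, not taken from the notes. The gradient field should come out (approximately) curl-free, while the rotation field has curl (0, 0, 2).

```python
def curl(F, p, h=1e-5):
    """Central-difference approximation of curl F at the point p = (x, y, z)."""
    def d(i, k):
        # Approximates the partial derivative of component F_k
        # with respect to coordinate number i (both 0-based).
        plus, minus = list(p), list(p)
        plus[i] += h
        minus[i] -= h
        return (F(*plus)[k] - F(*minus)[k]) / (2 * h)

    # (d2 F3 - d3 F2, d3 F1 - d1 F3, d1 F2 - d2 F1) in the 1-based notation of the text
    return (d(1, 2) - d(2, 1), d(2, 0) - d(0, 2), d(0, 1) - d(1, 0))

def gradient_field(x, y, z):
    # Gradient of the assumed potential Phi = x*y*z + x**2
    return (y * z + 2 * x, x * z, x * y)

def rotation_field(x, y, z):
    # Rigid rotation about the z-axis; not a gradient field
    return (-y, x, 0.0)

point = (0.3, -1.2, 2.0)
curl_of_gradient = curl(gradient_field, point)
curl_of_rotation = curl(rotation_field, point)
print(curl_of_gradient, curl_of_rotation)
```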

What about the converse? Is it true that a curl-free vector field has a potential? The answer is “no” and we give a counterexample. Consider the vector field

\[
\mathbf{F}(x, y, z) = \frac{1}{x^2 + y^2} \begin{pmatrix} y \\ -x \\ 0 \end{pmatrix}
\]

which describes the magnetic field of a long wire along the z-axis. It is defined on the domain D = R3 \ {(0, 0, z) : z ∈ R}, i.e., everywhere except on the z-axis. It is straightforward to verify that curl F = 0. If F were conservative, then its line integrals over any closed curve in D would vanish. Let us pick a particular curve in D, namely the unit-circle. It is parametrized by

γ : [0, 2π]→ D, γ(t) = (cos(t), sin(t), 0).

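The tangent vector at t is

\[
\dot{\gamma}(t) = \begin{pmatrix} -\sin(t) \\ \cos(t) \\ 0 \end{pmatrix}
\]

and we can evaluate the line integral

\[
\int_{\gamma} \mathbf{F} \cdot d\mathbf{s} = \int_0^{2\pi} \mathbf{F}(\gamma(t)) \cdot \dot{\gamma}(t) \, dt = \int_0^{2\pi} \big( -\sin^2(t) - \cos^2(t) \big) \, dt = -2\pi \neq 0.
\]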


In fact, we get the same value for every closed curve that winds once counter-clockwise around the z-axis. Therefore, this vector field is not conservative on its domain D.
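The non-vanishing loop integral is easy to reproduce numerically. The sketch below integrates the field exactly as written above, F = (y, −x, 0)/(x² + y²), once counter-clockwise around a circle of given radius about the z-axis; the function names and the midpoint rule are illustrative choices. With this orientation the integrand F · γ̇ equals −1, so the result is −2π for every radius. The sign depends on the orientation of the curve; the essential point is that the value is not zero.

```python
import math

def wire_field(x, y, z):
    # F = (y, -x, 0) / (x^2 + y^2), defined away from the z-axis
    r2 = x * x + y * y
    return (y / r2, -x / r2, 0.0)

def loop_integral(radius=1.0, n=10000):
    """Midpoint-rule line integral of wire_field once counter-clockwise
    around the circle of the given radius in the plane z = 0."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y = radius * math.cos(t), radius * math.sin(t)
        # tangent vector of gamma(t) = (r cos t, r sin t, 0)
        dx, dy = -radius * math.sin(t), radius * math.cos(t)
        fx, fy, _ = wire_field(x, y, 0.0)
        total += (fx * dx + fy * dy) * h
    return total

unit_circle_value = loop_integral(1.0)
large_circle_value = loop_integral(3.0)
print(unit_circle_value, large_circle_value)  # both close to -2*pi
```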

The reason lies in a particular topological property of the domain D. Take any closed curve in D. Is it possible to shrink this curve down to a point without leaving D, i.e., without rupturing it? The answer is “no”, because this is not possible for all curves. Curves that loop around the z-axis cannot be shrunk down to a point because they would necessarily have to cross the z-axis, i.e., leave the domain D. Domains in which every loop can be contracted to a point have a special name.

Definition 9.7. A domain D ⊂ Rn in which every closed curve can be continuously deformed into a point is called a simple domain.

Here are some domains in R2, the blue ones are simple, the others are not:

On simple domains the converse statement about curl-free vector fields is true. More generally, we have the

Theorem 9.3. Let F : D → Rn be a vector field satisfying the integrability conditions and suppose that D is simple. Then there exists a function Φ : D → R on D so that ∇Φ = F.

Note In particular, in three dimensions, a curl-free vector field defined on a simple domain has a potential. Therefore, on simple domains vector fields are conservative if and only if they are curl-free.

Example 9.2 Show that F(x, y) = (2xy² + 1, 2x²y + 2) is conservative and find a potential Φ.

• conservative: check the integrability condition

\[
\partial_1 F_2 - \partial_2 F_1 = \partial_x (2x^2 y + 2) - \partial_y (2x y^2 + 1) = 4xy - 4xy = 0.
\]

The domain is R2, which is simple, and therefore F is conservative.

• potential: there are several ways to compute a potential.

1. The first one is to successively solve the equation ∇Φ = F, i.e., starting with

\[
\partial_x \Phi = F_1 = 2x y^2 + 1 \implies \Phi(x, y) = x^2 y^2 + x + g(y)
\]

with an arbitrary function g(y) (which is killed when we take the partial derivative with respect to x). To determine this function we insert this partial solution for Φ into the second equation

\[
\partial_y \Phi = 2x^2 y + 2 \implies 2x^2 y + g'(y) = 2x^2 y + 2 \implies g'(y) = 2 \implies g(y) = 2y + c
\]

with an arbitrary constant c. We can fix this constant by requiring that Φ(0, 0) = 0, then we get the potential

\[
\Phi(x, y) = x^2 y^2 + x + 2y.
\]


2. The second method is to evaluate a line integral. We know that Φ(x, y) − Φ(0, 0) = ∫_γ F · ds, independently of the curve γ from (0, 0) to (x, y). So we can choose a convenient curve between the two points and evaluate the line integral. Choosing the curve γ(t) = (tx, ty) for t ∈ [0, 1], with tangent vector γ̇(t) = (x, y), yields the integral

\[
\int_{\gamma} \mathbf{F} \cdot d\mathbf{s} = \int_0^1 \mathbf{F} \cdot \dot{\gamma} \, dt = \int_0^1 \big[ (2(tx)(ty)^2 + 1)\,x + (2(tx)^2(ty) + 2)\,y \big] \, dt = \int_0^1 (4t^3 x^2 y^2 + x + 2y) \, dt = x^2 y^2 + x + 2y.
\]

With the same requirement as before we obtain the same potential

\[
\Phi(x, y) = x^2 y^2 + x + 2y.
\]
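The second method translates directly into a few lines of code. The following sketch evaluates the line integral of F from Example 9.2 along the straight line γ(t) = (tx, ty) with the midpoint rule and compares it with the closed form x²y² + x + 2y; the function names and the sample point are illustrative assumptions.

```python
def F(x, y):
    # Vector field of Example 9.2
    return (2 * x * y ** 2 + 1, 2 * x ** 2 * y + 2)

def potential_by_line_integral(x, y, n=2000):
    """Integrate F along gamma(t) = (t*x, t*y), t in [0, 1], by the midpoint rule.
    The tangent vector of this curve is the constant vector (x, y)."""
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        f1, f2 = F(t * x, t * y)
        total += (f1 * x + f2 * y) * h
    return total

def potential_closed_form(x, y):
    # Potential found above, normalised by Phi(0, 0) = 0
    return x ** 2 * y ** 2 + x + 2 * y

numeric = potential_by_line_integral(1.5, -2.0)
exact = potential_closed_form(1.5, -2.0)
print(numeric, exact)  # the two values should agree closely
```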

Example 9.3 Is F(x, y, z) = (4x, z², 2yz) conservative? If so find a potential and evaluate the line integral ∫_γ F · ds where γ is the curve shown in the diagram to the right. Note that the diagram only shows the projection of the curve onto the xy-plane.

[Diagram: projection of the curve γ onto the xy-plane, with the marked points (0, 0, 0), (1, 1, 1), (2, 1, 2) and (3, 0, 3).]

To check whether the vector field is conservative we need to compute its curl

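\[
\operatorname{curl} \mathbf{F} = \begin{vmatrix} \mathbf{i} & \mathbf{j} & \mathbf{k} \\ \partial_x & \partial_y & \partial_z \\ 4x & z^2 & 2yz \end{vmatrix}
= \mathbf{i}(2z - 2z) + \mathbf{j}(0 - 0) + \mathbf{k}(0 - 0) = \begin{pmatrix} 0 \\ 0 \\ 0 \end{pmatrix}.
\]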

So F is conservative (it is defined on R3 which is simple).

For the potential we want Φx(x, y, z) = 4x, Φy(x, y, z) = z², and Φz(x, y, z) = 2yz so we obtain successively:

1. Φx(x, y, z) = 4x ⟹ Φ(x, y, z) = 2x² + g(y, z)
2. Φy(x, y, z) = ∂y g(y, z) = z² ⟹ g(y, z) = z²y + h(z)
3. Φz(x, y, z) = 2zy + h′(z) = 2yz ⟹ h(z) = c.

The potential is therefore Φ(x, y, z) = 2x² + z²y + c.

The constant c is arbitrary and we can choose it at will. To evaluate the line integral we use

\[
\int_{\gamma} \mathbf{F} \cdot d\mathbf{s} = \Phi(3, 0, 3) - \Phi(0, 0, 0) = 18.
\]
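As a cross-check, the line integral can also be computed directly. The exact shape of γ is only given by the diagram, so the sketch below assumes, purely for illustration, that γ is the polyline through the marked points (0, 0, 0), (1, 1, 1), (2, 1, 2), (3, 0, 3). Since F is conservative, any other curve with the same end points would give the same value.

```python
def F(x, y, z):
    # Vector field of Example 9.3
    return (4 * x, z ** 2, 2 * y * z)

def phi(x, y, z):
    # Potential found above with the constant c = 0
    return 2 * x ** 2 + z ** 2 * y

def segment_integral(p, q, n=2000):
    """Midpoint-rule line integral of F along the straight segment from p to q."""
    d = [qi - pi for pi, qi in zip(p, q)]  # constant tangent vector q - p
    h = 1.0 / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        point = [pi + t * di for pi, di in zip(p, d)]
        f = F(*point)
        total += sum(fc * dc for fc, dc in zip(f, d)) * h
    return total

# Assumed polyline through the points marked in the diagram
points = [(0, 0, 0), (1, 1, 1), (2, 1, 2), (3, 0, 3)]
polyline_integral = sum(segment_integral(p, q) for p, q in zip(points, points[1:]))
potential_difference = phi(3, 0, 3) - phi(0, 0, 0)
print(polyline_integral, potential_difference)
```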

Consider again the example of the magnetic field above. Suppose we are only interested in the field on the right half space D = {(x, y, z) ∈ R3 : x > 0}, that is the vector field

\[
\mathbf{F} : D \to \mathbb{R}^3, \qquad \mathbf{F}(x, y, z) = \frac{1}{x^2 + y^2} \begin{pmatrix} y \\ -x \\ 0 \end{pmatrix}.
\]

It is easy to convince oneself that the half space D is simple so that the vector field, being curl-free, must have a potential. In fact, for the function

Φ : D→ R, Φ(x , y, z) = −arctan(y/x)

we find that

\[
\partial_x \Phi(x, y, z) = -\frac{1}{1 + y^2/x^2} \cdot \frac{-y}{x^2} = \frac{y}{x^2 + y^2}, \qquad
\partial_y \Phi(x, y, z) = -\frac{1}{1 + y^2/x^2} \cdot \frac{1}{x} = \frac{-x}{x^2 + y^2}
\]

so that ∇Φ = F and Φ is a potential for F. Clearly, Φ is not defined when x = 0 so it is not possible to extend this function beyond the domain D.

This is an instance where it is very important to look at the domains where the functions are defined. The vector field F is defined everywhere except on the z-axis. However, the potential Φ as defined above makes sense only when x ≠ 0. Therefore, we cannot have this potential beyond D.

However, we can try to define a potential on as large a domain as possible. One finds that the largest simple domain on which F is defined is R3 with one half-plane removed such as e.g. the domain D̂ := R3 \ {(x, y, z) ∈ R3 : y ≤ 0 and x = 0}, where half of the yz-plane (including the z-axis) has been removed. On this domain we can define a potential Φ̂ for F, which agrees with Φ = −arctan(y/x) for x > 0, as follows

\[
\hat{\Phi} : \hat{D} \to \mathbb{R}, \qquad \hat{\Phi}(x, y, z) =
\begin{cases}
-\arctan(y/x) & x > 0, \\
-\pi/2 & x = 0 \text{ and } y > 0, \\
-\pi - \arctan(y/x) & x < 0.
\end{cases}
\]

Fig. 9.1 shows a plot of the function Φ̂(x, y, 0).

Figure 9.1: The potential for the magnetic field of a wire defined on the maximal domain

Exercise. Show that this function is continuous and differentiable on D̂. Verify that it is a potential for F on D̂.

10 Vector identities, Green’s and Gauss’ theorem

10.1 Divergence of vector fields, vector identities

We have seen in the previous chapters that there are useful operations that can be performed in relation to vector fields such as the gradient of a function or the curl of a vector field. There is one more operation that appears very often and that is the divergence of a vector field.

Definition 10.1. Let F : Rn → Rn be a vector field, F(x1, x2, . . . , xn) = (F1(x1, x2, . . . , xn), . . . , Fn(x1, x2, . . . , xn)) then its divergence is the function

\[
\operatorname{div} \mathbf{F} = \partial_1 F_1 + \partial_2 F_2 + \cdots + \partial_n F_n = \sum_{i=1}^{n} \partial_i F_i .
\]

Another notation for divF is ∇ · F.

Note Specifically, in three dimensions with F(x , y, z) = (F1(x , y, z), F2(x , y, z), F3(x , y, z)) the divergence is

∇ · F= divF= ∂x F1 + ∂y F2 + ∂z F3,

and, similarly in two dimensions with F(x , y) = (F1(x , y), F2(x , y)) the divergence becomes

∇ · F= divF= ∂x F1 + ∂y F2.

In three dimensions there are several important relationships between grad, div and curl.

Theorem 10.1 (Vector identities). Let Φ : R3→ R be a function and let F,G : R3→ R3 be vector fields. Then

(i) F= gradΦ =⇒ curlF= 0, curl◦grad = 0,

(ii) F= curlG =⇒ divF= 0, div◦ curl = 0,

(iii) F= gradΦ =⇒ divF=∆Φ, div◦grad =∆.

Here, ∆ is the Laplace operator defined by ∆Φ = ∂xxΦ + ∂yyΦ + ∂zzΦ.

Exercise. Verify these identities.
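As a hint for the exercise, identity (ii) can be sketched as follows, assuming that G is twice continuously differentiable so that its mixed second partial derivatives commute. Identity (i) follows from the same kind of cancellation, and (iii) is immediate from the definitions.

```latex
\[
\begin{aligned}
\operatorname{div}(\operatorname{curl} \mathbf{G})
&= \partial_1(\partial_2 G_3 - \partial_3 G_2)
 + \partial_2(\partial_3 G_1 - \partial_1 G_3)
 + \partial_3(\partial_1 G_2 - \partial_2 G_1) \\
&= (\partial_2\partial_3 - \partial_3\partial_2) G_1
 + (\partial_3\partial_1 - \partial_1\partial_3) G_2
 + (\partial_1\partial_2 - \partial_2\partial_1) G_3
 = 0 .
\end{aligned}
\]
```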

10.2 Green’s theorem

In this section we focus on two dimensions. In the 2-dimensional plane we can characterize simple domains in terms of their boundary. Suppose that γ is a simple (i.e., non-self-intersecting) closed curve, traced out once in the counter-clockwise sense and let D be the enclosed region. Then D is automatically a simple domain, see Fig. 10.2(a).

Figure 10.1: Examples of 2-dimensional vector fields: curl-free (left) and divergence-free (right)

[Figure 10.2(a): A (non-convex) domain D with a simple oriented boundary curve γ]

[Figure 10.2(b): Splitting the boundary into two pieces γ1 and γ2; at height y, with y1 ≤ y ≤ y2, the domain lies between x1(y) and x2(y)]

Figure 10.2: Two domains

Theorem 10.2 (Green). Let F = (F1, F2) be a 2-dimensional vector field, and let D be the interior of a simple,closed curve γ then

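\[
\int_{\gamma} \mathbf{F} \cdot d\mathbf{s} = \iint_D \operatorname{curl} \mathbf{F} \, dx \, dy.
\]

Explicitly, this formula becomes

\[
\int_{\gamma} F_1 \, dx + F_2 \, dy = \iint_D \left( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) dx \, dy.
\]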

We want to give an indication of the proof of this theorem. Specifically, we will prove that

\[
\int_{\gamma} F_2 \, dy = \iint_D \frac{\partial F_2}{\partial x} \, dx \, dy
\]

under the additional assumption that D is convex (“round”). This means that with any two points P and Q in D the entire line segment between P and Q lies in D.

The double integral is easily evaluated

\[
\begin{aligned}
\iint_D \frac{\partial F_2}{\partial x}(x, y) \, dx \, dy
&= \int_{y_1}^{y_2} \left\{ \int_{x_1(y)}^{x_2(y)} \frac{\partial F_2}{\partial x}(x, y) \, dx \right\} dy
 = \int_{y_1}^{y_2} F_2(x, y) \Big|_{x_1(y)}^{x_2(y)} \, dy \\
&= \int_{y_1}^{y_2} F_2(x_2(y), y) \, dy - \int_{y_1}^{y_2} F_2(x_1(y), y) \, dy .
\end{aligned}
\]

To compute the line integrals on the boundary of D we need to split the curve γ into the two pieces γ1 and γ2, see Fig. 10.2(b). Now γ2 is parameterized by γ2(y) = (x2(y), y) for y1 ≤ y ≤ y2 so that

\[
\int_{\gamma_2} F_2 \, dy = \int_{y_1}^{y_2} F_2(x_2(y), y) \cdot 1 \, dy.
\]

Similarly, −γ1 is parameterized by (x1(y), y) for y1 ≤ y ≤ y2 so that

\[
-\int_{\gamma_1} F_2 \, dy = \int_{-\gamma_1} F_2 \, dy = \int_{y_1}^{y_2} F_2(x_1(y), y) \cdot 1 \, dy.
\]

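Putting things together we arrive at

\[
\iint_D \frac{\partial F_2}{\partial x}(x, y) \, dx \, dy
= \int_{y_1}^{y_2} F_2(x_2(y), y) \, dy - \int_{y_1}^{y_2} F_2(x_1(y), y) \, dy
= \int_{\gamma_2} F_2 \, dy - \int_{-\gamma_1} F_2 \, dy
= \int_{\gamma} F_2 \, dy.
\]

In a similar way one can show that

\[
\iint_D \frac{\partial F_1}{\partial y} \, dx \, dy = -\int_{\gamma} F_1 \, dx
\]

with the final result

\[
\iint_D \left( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) dx \, dy = \int_{\gamma} F_1 \, dx + F_2 \, dy.
\]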

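Note This proof can be extended to more general domains by patching together simple domains, such as for example

[Picture: a domain D with boundary γ, cut into two domains D1 and D2 with boundaries γ1 and γ2]

\[
\iint_D \left( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) dx \, dy
= \iint_{D_1} (\ldots) \, dx \, dy + \iint_{D_2} (\ldots) \, dx \, dy
= \int_{\gamma_1} \mathbf{F} \cdot d\mathbf{s} + \int_{\gamma_2} \mathbf{F} \cdot d\mathbf{s}
= \int_{\gamma} \mathbf{F} \cdot d\mathbf{s}
\]

since the piece of the curves separating the two domains is traversed twice but in opposite directions so that the corresponding line integrals cancel.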

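Example 10.1 Suppose F = (F1, F2) satisfies ∂xF2 = ∂yF1, then Green’s theorem for an arbitrary simple, closed curve γ implies

\[
\int_{\gamma} \mathbf{F} \cdot d\mathbf{s} = \iint_D \left( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) dx \, dy = 0
\]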

So F is conservative, as discussed in Section 9.3.

Example 10.2 Compute I = ∫_γ (x² + 10y) dx + (e^{y²} + 8x) dy where γ is the circle with equation x² + y² = 2 traversed once counter-clockwise.

From Green’s theorem

\[
I = \iint_D (8 - 10) \, dx \, dy = -2 \iint_D dx \, dy = -2 \underbrace{(\pi \cdot 2)}_{\text{area of } D} = -4\pi.
\]
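The value obtained from Green’s theorem can be compared with a direct numerical evaluation of the line integral. The sketch below parametrizes the circle by γ(t) = (√2 cos t, √2 sin t) and uses the midpoint rule; the function names are illustrative assumptions. Note that the term e^{y²} dy, which has no elementary antiderivative, causes no difficulty here.

```python
import math

def F(x, y):
    # Vector field of Example 10.2
    return (x ** 2 + 10 * y, math.exp(y ** 2) + 8 * x)

def circle_integral(radius, n=20000):
    """Midpoint-rule line integral of F once counter-clockwise
    around the circle of the given radius centred at the origin."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y = radius * math.cos(t), radius * math.sin(t)
        dx, dy = -radius * math.sin(t), radius * math.cos(t)
        f1, f2 = F(x, y)
        total += (f1 * dx + f2 * dy) * h
    return total

I = circle_integral(math.sqrt(2))
print(I, -4 * math.pi)  # the two values should agree closely
```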


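Example 10.3 Compute the line integral ∫_γ F · ds for the vector field defined by

F = (y cos x − xy sin x, xy + x cos x)

and γ as shown.

[Diagram: the closed curve γ in the xy-plane; the value 1 is marked on both the x-axis and the y-axis.]

\[
\begin{aligned}
\int_{\gamma} \mathbf{F} \cdot d\mathbf{s}
&= \int_{\gamma} (y \cos x - xy \sin x) \, dx + (xy + x \cos x) \, dy \\
&= \iint_D \big[ (y - x \sin x + \cos x) - (\cos x - x \sin x) \big] \, dx \, dy = \iint_D y \, dx \, dy \\
&= \int_0^1 \left\{ \int_0^{1-x} y \, dy \right\} dx = \cdots = \frac{1}{6} .
\end{aligned}
\]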
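The steps hidden behind the dots can be filled in as follows. This is a sketch which assumes, as the integration limits suggest, that D is the triangle with vertices (0, 0), (1, 0) and (0, 1).

```latex
\[
\int_0^1 \left\{ \int_0^{1-x} y \, dy \right\} dx
= \int_0^1 \frac{(1-x)^2}{2} \, dx
= \left[ -\frac{(1-x)^3}{6} \right]_0^1
= \frac{1}{6} .
\]
```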

10.3 Surface integrals and Gauss’ theorem

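We start with Green’s theorem

\[
\iint_D \left( \frac{\partial F_2}{\partial x} - \frac{\partial F_1}{\partial y} \right) dx \, dy = \int_{\gamma} \mathbf{F} \cdot d\mathbf{s} = \int_{\gamma} F_1 \, dx + F_2 \, dy
\]

and rewrite in terms of the vector field E = (E1, E2) = (F2, −F1). This yields

\[
\iint_D \left( \frac{\partial E_1}{\partial x} + \frac{\partial E_2}{\partial y} \right) dx \, dy = \int_{\gamma} E_1 \, dy - E_2 \, dx
\]

or

\[
\iint_D \operatorname{div} \mathbf{E} \, dx \, dy = \int_{\gamma} (E_1 \dot{y} - E_2 \dot{x}) \, dt = \int_{\gamma} (\mathbf{E} \cdot \mathbf{n}) \, ds.
\]

Here, we have used the fact (see section 9.1) that ds = √(ẋ² + ẏ²) dt and we also defined the vector

\[
\mathbf{n} = \frac{1}{\sqrt{\dot{x}^2 + \dot{y}^2}} \begin{pmatrix} \dot{y} \\ -\dot{x} \end{pmatrix}.
\]

What is this vector? It is defined at each point of the curve γ, see Fig. 10.3. We note that n · n = 1 and n · γ̇ = 0.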

Figure 10.3: The outward pointing normal vector n along the curve γ

This shows that n is always perpendicular to the (tangent vector to the) curve γ and always points outside of the enclosed domain.

We have obtained the 2-dimensional version of the divergence theorem, also called Gauss’ law.

Theorem 10.3 (2d divergence theorem). Let E be a smooth vector field defined on a region D which is bounded by a simple closed curve γ and let n be the outward normal to the boundary of D as defined above. Then

\[
\iint_D \operatorname{div} \mathbf{E} \, dx \, dy = \int_{\gamma} (\mathbf{E} \cdot \mathbf{n}) \, ds.
\]
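Both sides of the 2d divergence theorem can be compared numerically on a simple example. The sketch below uses the unit disc and the field E = (x + y², xy), both chosen purely for illustration; its divergence is 1 + x, so both sides should be close to π. The flux is computed from the formula ∫ (E1 ẏ − E2 ẋ) dt derived above, the double integral in polar coordinates.

```python
import math

def E(x, y):
    # Illustrative vector field (an assumption, not from the notes)
    return (x + y ** 2, x * y)

def div_E(x, y):
    # d/dx (x + y^2) + d/dy (x*y) = 1 + x
    return 1.0 + x

def flux_through_unit_circle(n=4000):
    """Flux integral of E over the unit circle, using E1*ydot - E2*xdot."""
    h = 2 * math.pi / n
    total = 0.0
    for i in range(n):
        t = (i + 0.5) * h
        x, y = math.cos(t), math.sin(t)
        xdot, ydot = -math.sin(t), math.cos(t)
        e1, e2 = E(x, y)
        total += (e1 * ydot - e2 * xdot) * h
    return total

def divergence_over_unit_disc(nr=400, nt=400):
    """Double integral of div E over the unit disc in polar coordinates
    (midpoint rule, area element r dr dtheta)."""
    hr, ht = 1.0 / nr, 2 * math.pi / nt
    total = 0.0
    for i in range(nr):
        r = (i + 0.5) * hr
        for j in range(nt):
            t = (j + 0.5) * ht
            total += div_E(r * math.cos(t), r * math.sin(t)) * r * hr * ht
    return total

flux = flux_through_unit_circle()
volume_side = divergence_over_unit_disc()
print(flux, volume_side)  # both should be close to pi
```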

Definition 10.2. The integral ∫_γ (E · n) ds is called the flux (integral) of E through γ, i.e., through the boundary of the region D.

Note This theorem can be interpreted as a balance law [Picture]
• the net sum of a commodity traded in and with a country is equal to its net imports and exports
• the net heat inside a house is equal to the heat flux through the windows

We can generalize this formula to three (in fact, arbitrary) dimensions. Consider a region D ⊂ R3 with a boundary. This is a surface which encloses the region.

[Picture]

Theorem 10.4 (3d divergence theorem). Let E : D → R3 be a vector field defined on a domain D which has a boundary given by a surface S. Then

\[
\iiint_D \operatorname{div} \mathbf{E} \, dx \, dy \, dz = \iint_S (\mathbf{E} \cdot \mathbf{n}) \, dA
\]

The integral on the right hand side is again called the flux integral. We have not yet explained how it is computed. Going back to the 2-dimensional case

\[
\int_{\gamma} \mathbf{E} \cdot \mathbf{n} \, ds = \int (E_1 \dot{y} - E_2 \dot{x}) \, dt = \int \begin{vmatrix} E_1 & \dot{x} \\ E_2 & \dot{y} \end{vmatrix} dt
\]

The second column of the determinant contains the tangent vector γ̇, which is the same as the Jacobian Jγ(t).

The surface S can be defined in terms of a surface patch (see 7.1)

Φ(u, v) = (x(u, v), y(u, v), z(u, v))

with a Jacobian matrix

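\[
J\Phi(u, v) = \begin{pmatrix} x_u & x_v \\ y_u & y_v \\ z_u & z_v \end{pmatrix}.
\]

It is very natural to consider the integral

\[
\iint \begin{vmatrix} E_1 & x_u & x_v \\ E_2 & y_u & y_v \\ E_3 & z_u & z_v \end{vmatrix} du \, dv
\]

Computing the determinant gives

\[
E_1 (y_u z_v - y_v z_u) + E_2 (z_u x_v - z_v x_u) + E_3 (x_u y_v - x_v y_u) = \mathbf{E} \cdot \mathbf{N}
\]

where N is the cross product

\[
\mathbf{N} = \begin{pmatrix} x_u \\ y_u \\ z_u \end{pmatrix} \times \begin{pmatrix} x_v \\ y_v \\ z_v \end{pmatrix}.
\]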


Since N is not in general a unit-vector we define

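\[
\mathbf{n} = \frac{\mathbf{N}}{\sqrt{\mathbf{N} \cdot \mathbf{N}}} .
\]

Note that both N and n are perpendicular to the surface.

Exercise. Verify this statement.

If we now define the area-element dA = √(N · N) du dv then we can write

\[
\iint_S (\mathbf{E} \cdot \mathbf{n}) \, dA = \iint \begin{vmatrix} E_1 & x_u & x_v \\ E_2 & y_u & y_v \\ E_3 & z_u & z_v \end{vmatrix} du \, dv
\]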

and the divergence theorem holds with this interpretation of the flux integral.

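Example 10.4 Let D be the ball with radius R centred at the origin and let the vector field E be the radial vector field E(x, y, z) = (x, y, z). We compute both sides in the divergence theorem. Since div E = 3, the left hand side yields

\[
\iiint_D \operatorname{div} \mathbf{E} \, dx \, dy \, dz = 3 \iiint_D dx \, dy \, dz = 3 \, \mathrm{Volume}(D) = 4\pi R^3.
\]

To evaluate the flux integral we need a surface patch for the sphere bounding D. There are many possibilities. We choose the best known

Φ(θ, φ) = R(sin θ cos φ, sin θ sin φ, cos θ), for 0 ≤ θ ≤ π, 0 ≤ φ ≤ 2π

and compute its Jacobian matrix

\[
J\Phi(\theta, \varphi) = R \begin{pmatrix} \cos\theta \cos\varphi & -\sin\theta \sin\varphi \\ \cos\theta \sin\varphi & \sin\theta \cos\varphi \\ -\sin\theta & 0 \end{pmatrix}
\]

With E(x(θ, φ), y(θ, φ), z(θ, φ)) = R(sin θ cos φ, sin θ sin φ, cos θ) we can compute the flux integral

\[
\iint_S (\mathbf{E} \cdot \mathbf{n}) \, dA
= \iint R^3 \begin{vmatrix} \sin\theta \cos\varphi & \cos\theta \cos\varphi & -\sin\theta \sin\varphi \\ \sin\theta \sin\varphi & \cos\theta \sin\varphi & \sin\theta \cos\varphi \\ \cos\theta & -\sin\theta & 0 \end{vmatrix} d\theta \, d\varphi
= R^3 \int_0^{2\pi} \int_0^{\pi} \sin\theta \, d\theta \, d\varphi = 4\pi R^3.
\]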
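The flux integral of this example can also be evaluated numerically, which gives a quick check of the signs in the Jacobian matrix. The sketch below forms N as the cross product of the two columns of JΦ and sums E · N over a midpoint grid in (θ, φ); the function names and grid sizes are illustrative assumptions.

```python
import math

def cross(a, b):
    # Cross product of two 3-vectors
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def flux_through_sphere(R, n_theta=200, n_phi=400):
    """Flux of E(x, y, z) = (x, y, z) through the sphere of radius R,
    computed as the double integral of E . N over the parameter domain."""
    h_t, h_p = math.pi / n_theta, 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_theta):
        th = (i + 0.5) * h_t
        st, ct = math.sin(th), math.cos(th)
        for j in range(n_phi):
            ph = (j + 0.5) * h_p
            sp, cp = math.sin(ph), math.cos(ph)
            E = (R * st * cp, R * st * sp, R * ct)   # E on the sphere
            d_theta = (R * ct * cp, R * ct * sp, -R * st)  # first column of the Jacobian
            d_phi = (-R * st * sp, R * st * cp, 0.0)       # second column of the Jacobian
            N = cross(d_theta, d_phi)
            total += sum(e * n for e, n in zip(E, N)) * h_t * h_p
    return total

R = 2.0
flux = flux_through_sphere(R)
print(flux, 4 * math.pi * R ** 3)  # the two values should agree closely
```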
