CHAPTER FOUR CRITICAL POINT ANALYSIS AND OPTIMIZATION …sswift/Homepage/Teaching/Mathematical...


tim swift 1

CHAPTER FOUR

CRITICAL POINT ANALYSIS AND OPTIMIZATION

4.0 Motivation

We wish to generalize the techniques of critical point analysis, and of local and global

optimization, which you know in the context of the theory of real-valued functions of one

real variable. In this chapter, we will consider critical point analysis and optimization for

real-valued functions of several real variables.

4.1 Critical Points and Local Extrema

We start off with the key definitions, which are straightforward generalizations of the one-dimensional ones; for neighbourhoods of points, we just replace open intervals by open balls.

Definition 4.1.1

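Suppose that $A \subseteq \mathbb{R}^p$, and that $f : A \to \mathbb{R}$. Let $\mathbf{x}_0 \in A$.

Then, $\mathbf{x}_0$ is said to be a local maximum of $f$ if

$$(\exists r > 0)(\forall \mathbf{x} \in A)\big((\mathbf{x} \in B_r(\mathbf{x}_0)) \Rightarrow (f(\mathbf{x}) \le f(\mathbf{x}_0))\big).$$

Similarly, $\mathbf{x}_0$ is said to be a local minimum of $f$ if

$$(\exists r > 0)(\forall \mathbf{x} \in A)\big((\mathbf{x} \in B_r(\mathbf{x}_0)) \Rightarrow (f(\mathbf{x}) \ge f(\mathbf{x}_0))\big).$$

Finally, $\mathbf{x}_0$ is said to be a local extremum of $f$ if $\mathbf{x}_0$ is either a local maximum of $f$ or a local minimum of $f$.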

Remark 4.1.2

Denote by $\mathrm{Loc.Max.}(f)$ the set of all local maxima of $f$, $\mathrm{Loc.Min.}(f)$ the set of all local minima of $f$, and $\mathrm{Loc.Extr.}(f)$ the set of all local extrema of $f$.

Then, by definition, we have

$$\mathrm{Loc.Extr.}(f) = \mathrm{Loc.Max.}(f) \cup \mathrm{Loc.Min.}(f) \quad (\subseteq A).$$


Definition 4.1.3

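Suppose that $U \subseteq \mathbb{R}^p$ is open, and that $f : U \to \mathbb{R}$ is differentiable.

Then, $\mathbf{x}_0 \in U$ is said to be a critical point of $f$ if

$$Df(\mathbf{x}_0) = \mathbf{0}.$$

Define

$$\mathrm{Crit}(f) = \{\mathbf{x} \in U : Df(\mathbf{x}) = \mathbf{0}\} \quad (\subseteq U).$$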

Remark 4.1.4

Note that

$$\mathrm{Crit}(f) = \Big\{\mathbf{x} \in U : (\forall j \in \{1,\dots,p\})\Big(\frac{\partial f}{\partial x_j}(\mathbf{x}) = 0\Big)\Big\}.$$

Just as in the one-dimensional case, we have the following basic theorem connecting

local extrema and critical points for a differentiable function. The result tells us that we

should look among the critical points to find local extrema (but, of course, it does not say

that every critical point is a local extremum!).

Theorem 4.1.5

Suppose that $U \subseteq \mathbb{R}^p$ is open, and that $f : U \to \mathbb{R}$ is differentiable.

Then,

$$\mathrm{Loc.Extr.}(f) \subseteq \mathrm{Crit}(f).$$

Remark 4.1.6

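The proof of Theorem 4.1.5 is based on the one-dimensional case: fix $\mathbf{h} \in \mathbb{R}^p$, and consider the function $g : t \mapsto f(\mathbf{x}_0 + t\mathbf{h})$, defined on an interval containing $0 \in \mathbb{R}$. If $\mathbf{x}_0$ is a local extremum of $f$, then $0$ is a local extremum of $g$, so that $g'(0) = Df(\mathbf{x}_0)\mathbf{h} = 0$; since $\mathbf{h}$ is arbitrary, it follows that $Df(\mathbf{x}_0) = \mathbf{0}$.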

To complement Theorem 4.1.5, we make the following definition.


Definition 4.1.7

Suppose that $U \subseteq \mathbb{R}^p$ is open, and that $f : U \to \mathbb{R}$ is differentiable.

Then, $\mathbf{x}_0 \in U$ is said to be a saddle point of $f$ if $\mathbf{x}_0$ is a critical point of $f$, but $\mathbf{x}_0$ is not a local extremum of $f$.

Remarks 4.1.8

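Denote by $\mathrm{Sad.Pt.}(f)$ the set of all saddle points of $f$.

Then we have the following:

$$\mathrm{Sad.Pt.}(f) = \mathrm{Crit}(f) - \mathrm{Loc.Extr.}(f),$$

and

$$\mathrm{Crit}(f) = \mathrm{Loc.Extr.}(f) \cup \mathrm{Sad.Pt.}(f) = (\mathrm{Loc.Max.}(f) \cup \mathrm{Loc.Min.}(f)) \cup \mathrm{Sad.Pt.}(f).$$

Note also the following.

Suppose that $\mathbf{x}_0 \in \mathrm{Crit}(f)$.

Then $\mathbf{x}_0 \in \mathrm{Sad.Pt.}(f)$ if and only if

$$(\forall r > 0)\Big(\big((\exists \mathbf{x}_+ \in U \cap B_r(\mathbf{x}_0))(f(\mathbf{x}_+) > f(\mathbf{x}_0))\big) \wedge \big((\exists \mathbf{x}_- \in U \cap B_r(\mathbf{x}_0))(f(\mathbf{x}_-) < f(\mathbf{x}_0))\big)\Big),$$

i.e., no matter how small an open neighbourhood of the critical point $\mathbf{x}_0$ we take, we can always find two points in this neighbourhood, one of which has a function value larger than that at $\mathbf{x}_0$, the other having a function value less than that at $\mathbf{x}_0$.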


Examples 4.1.9

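(i) Define $f : \mathbb{R}^2 \to \mathbb{R};\ (x,y) \mapsto -x^2 - y^2$.

Then, $(0,0) \in \mathrm{Loc.Max.}(f)$.

(ii) Define $f : \mathbb{R}^2 \to \mathbb{R};\ (x,y) \mapsto x^2 - y^2$.

Then, $(0,0) \in \mathrm{Sad.Pt.}(f)$.

(iii) Define $f : \mathbb{R}^2 \to \mathbb{R};\ (x,y) \mapsto x^2 + y^2$.

Then, $(0,0) \in \mathrm{Loc.Min.}(f)$.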
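The three model functions above can be probed numerically. The following is a minimal sketch and not part of the original notes: the helper name `sign_pattern`, the radius and the number of sample points are illustrative assumptions. It samples each function on a small circle around the origin and records which signs of $f(x,y) - f(0,0)$ occur. Sampling can only suggest a classification; it does not prove one.

```python
import math

def sign_pattern(f, r=1e-3, samples=360):
    """Return the set of signs of f(x, y) - f(0, 0) on a circle of radius r."""
    base = f(0.0, 0.0)
    signs = set()
    for k in range(samples):
        t = 2.0 * math.pi * k / samples
        diff = f(r * math.cos(t), r * math.sin(t)) - base
        if diff > 0:
            signs.add(1)
        elif diff < 0:
            signs.add(-1)
        else:
            signs.add(0)
    return signs

# The three model functions of Examples 4.1.9.
models = {
    "(i)   -x^2 - y^2": lambda x, y: -x * x - y * y,
    "(ii)   x^2 - y^2": lambda x, y: x * x - y * y,
    "(iii)  x^2 + y^2": lambda x, y: x * x + y * y,
}

for name, f in models.items():
    # Only -1: consistent with a local maximum; only +1: a local minimum;
    # both +1 and -1: consistent with a saddle point.
    print(name, sorted(sign_pattern(f)))
```

For (ii) both signs appear however small the radius is taken, which is exactly the saddle point criterion of Remarks 4.1.8.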

Remark 4.1.10

The examples just given are important, because, as we will see in the next section,

they provide local models for non-degenerate critical points of a function $f : U \to \mathbb{R}$, where $U \subseteq \mathbb{R}^2$ is open.

4.2 Critical Point Analysis

We will now discuss a means of analyzing critical points, namely their location and

their classification.

Remark 4.2.1

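To locate the critical points of a differentiable function $f : U \to \mathbb{R}$, where $U \subseteq \mathbb{R}^p$ is open, we need to find the subset $\mathrm{Crit}(f)$ of $U$. In other words, we need to find all $\mathbf{x} = (x_1, \dots, x_p) \in U$ which satisfy the $p$ simultaneous equations

$$\Big(\frac{\partial f}{\partial x_1}(\mathbf{x}) = 0\Big) \wedge \dots \wedge \Big(\frac{\partial f}{\partial x_p}(\mathbf{x}) = 0\Big).$$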


Having found all the critical points, we then need to classify each one as a local

maximum, a local minimum or a saddle point. For us, the local extrema will be the most

important, because every global extremum is, of course, a local extremum. Note,

however, that there are situations where saddle points are also important, e.g., in some areas

of Economics.

Remark 4.2.2

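Recall from Theorem 3.2.8 that, if $\mathbf{x}_0 \in U$ and $\mathbf{h}$ is near $\mathbf{0}$, then

$$f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + Df(\mathbf{x}_0)\mathbf{h} + \tfrac{1}{2}\,\mathbf{h}^T\big((\mathrm{hess} f)(\mathbf{x}_0)\big)\mathbf{h} + \text{h.o.t.}$$

Hence, if $\mathbf{x}_0 \in \mathrm{Crit}(f)$, we may write

$$f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) = \tfrac{1}{2}\,\mathbf{h}^T\big((\mathrm{hess} f)(\mathbf{x}_0)\big)\mathbf{h} + \text{h.o.t.},$$

so that we might expect that the local behaviour of $f$ near the critical point is controlled by the value of the hessian matrix at $\mathbf{x}_0$, at least if this matrix is non-singular.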

This remark underlies the second derivative test for classifying (nondegenerate)

critical points.
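The expansion in Remark 4.2.2 can be checked numerically on a concrete function. The sketch below is an illustration only: the function $f(x,y) = \cos x + y^2$, its critical point $(0,0)$ and the hand-computed hessian are assumptions chosen for the example, not taken from the notes. It compares $f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0)$ with the quadratic term $\tfrac{1}{2}\mathbf{h}^T(\mathrm{hess} f)(\mathbf{x}_0)\mathbf{h}$ as $\mathbf{h}$ shrinks.

```python
import math

def f(x, y):
    """Hypothetical example function with a critical point at (0, 0)."""
    return math.cos(x) + y * y

# (hess f)(0, 0), computed by hand: f_xx = -cos(0) = -1, f_xy = 0, f_yy = 2.
HESS = ((-1.0, 0.0), (0.0, 2.0))

def quadratic_term(h1, h2):
    """Return (1/2) h^T (hess f)(0, 0) h."""
    (a, b), (_, c) = HESS
    return 0.5 * (a * h1 * h1 + 2.0 * b * h1 * h2 + c * h2 * h2)

def remainder(h1, h2):
    """The higher order terms: f(h) - f(0) minus the quadratic term."""
    return (f(h1, h2) - f(0.0, 0.0)) - quadratic_term(h1, h2)

for scale in (1e-1, 1e-2, 1e-3):
    h1, h2 = scale, -scale
    print(scale, f(h1, h2) - f(0.0, 0.0), quadratic_term(h1, h2), remainder(h1, h2))
```

In this example the remainder shrinks much faster than the quadratic term, which is what "h.o.t." is meant to convey.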

Before we can state the second derivative test, we need the following definitions from

linear algebra. Recall that we use $M^{\mathrm{symm}}_{p \times p}$ to denote the set of all symmetric $p \times p$ matrices with real entries.


Definition 4.2.3

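Let $S \in M^{\mathrm{symm}}_{p \times p}$, and consider the associated quadratic form

$$q_S : \mathbb{R}^p \to \mathbb{R};\ \mathbf{h} \mapsto \mathbf{h}^T S\,\mathbf{h}.$$

We say that $S$ is negative definite if

$$(\forall \mathbf{h} \in \mathbb{R}^p - \{\mathbf{0}\})(q_S(\mathbf{h}) < 0).$$

We say that $S$ is positive definite if

$$(\forall \mathbf{h} \in \mathbb{R}^p - \{\mathbf{0}\})(q_S(\mathbf{h}) > 0).$$

We say that $S$ is indefinite if

$$\big((\exists \mathbf{h}_- \in \mathbb{R}^p)(q_S(\mathbf{h}_-) < 0)\big) \wedge \big((\exists \mathbf{h}_+ \in \mathbb{R}^p)(q_S(\mathbf{h}_+) > 0)\big).$$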

Theorem 4.2.4 (The second derivative test for classifying critical points)

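Suppose that $U \subseteq \mathbb{R}^p$ is open, and that $f : U \to \mathbb{R}$ has continuous second partial derivatives.

Let $\mathbf{x}_0 \in \mathrm{Crit}(f)$. Then, we have:

( $(\mathrm{hess} f)(\mathbf{x}_0)$ negative definite ) $\Rightarrow$ ( $\mathbf{x}_0 \in \mathrm{Loc.Max.}(f)$ ) ;

( $(\mathrm{hess} f)(\mathbf{x}_0)$ positive definite ) $\Rightarrow$ ( $\mathbf{x}_0 \in \mathrm{Loc.Min.}(f)$ ) ;

( $(\mathrm{hess} f)(\mathbf{x}_0)$ indefinite ) $\Rightarrow$ ( $\mathbf{x}_0 \in \mathrm{Sad.Pt.}(f)$ ) .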

Remark 4.2.5

In order to be able to apply Theorem 4.2.4 in a useful way, we need criteria for determining whether or not a symmetric $p \times p$ matrix is positive definite, negative definite or indefinite. We do not consider this problem in general, but deal only with the cases $p = 1$ and $p = 2$.


Examples 4.2.6

(i) Suppose that $p = 1$.

Let $x_0 \in \mathrm{Crit}(f)$, where $f : (a,b) \to \mathbb{R}$.

Thus, $x = x_0$ is a solution of the equation

$$f'(x) = 0.$$

Note that $(\mathrm{hess} f)(x_0) = f''(x_0)$ (after identifying $M^{\mathrm{symm}}_{1 \times 1} = M_{1 \times 1}$ with $\mathbb{R}$), so that the associated quadratic form is the function

$$q_{f''(x_0)} : \mathbb{R} \to \mathbb{R};\ h \mapsto f''(x_0)\,h^2.$$

Hence, Theorem 4.2.4 just reduces to the familiar Theorem 1.5.13:

$(f''(x_0) < 0) \Rightarrow (x_0 \in \mathrm{Loc.Max.}(f))$ ;

$(f''(x_0) > 0) \Rightarrow (x_0 \in \mathrm{Loc.Min.}(f))$ .

Note that the hypothesis in the third possibility of Theorem 4.2.4 cannot arise if $p = 1$. This is because a $1 \times 1$ symmetric matrix cannot be indefinite. Indeed, the image of $q_{f''(x_0)} : \mathbb{R} \to \mathbb{R};\ h \mapsto f''(x_0)\,h^2$ cannot contain both a positive real number and a negative one.

Thus, in the $p = 1$ case, if $x_0 \in \mathrm{Crit}(f)$ and $x_0$ is a nondegenerate critical point, i.e., if $f''(x_0) \ne 0$, then $x_0$ is automatically a local extremum of $f$.

If, on the other hand, $x_0$ is a degenerate critical point, i.e., if $f''(x_0) = 0$, then $(\mathrm{hess} f)(x_0) = f''(x_0)$ does not provide sufficient information to classify the critical point. Indeed, as you know, if $f''(x_0) = 0$, then the critical point $x_0$ could be a local maximum, or a local minimum, or neither.


(ii) Suppose now that $p = 2$.

Note that this is the most important case for this module.

Suppose that $U \subseteq \mathbb{R}^2$ is open, and that $f : U \to \mathbb{R};\ (x,y) \mapsto f(x,y)$ has continuous second partial derivatives.

Let $(x_0,y_0) \in \mathrm{Crit}(f)$. Thus, $(x,y) = (x_0,y_0)$ is a solution of the simultaneous equations

$$(f_x(x,y) = 0) \wedge (f_y(x,y) = 0).$$

Before we consider the hessian matrix, we need the following linear algebra

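lemma concerning symmetric $2 \times 2$ matrices.

Lemma

Let $S = \begin{pmatrix} a & b \\ b & c \end{pmatrix} \in M^{\mathrm{symm}}_{2 \times 2}$, so that $\det(S) = ac - b^2$. Then, we have

( $S$ is negative definite ) $\Leftrightarrow$ ( $(\det(S) > 0) \wedge (a < 0)$ ) ;

( $S$ is positive definite ) $\Leftrightarrow$ ( $(\det(S) > 0) \wedge (a > 0)$ ) ;

( $S$ is indefinite ) $\Leftrightarrow$ ( $\det(S) < 0$ ) .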
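The criterion of the lemma translates directly into a few lines of code. This is a minimal sketch under stated assumptions: the function name `classify_symmetric_2x2` and the returned strings are hypothetical, and exact arithmetic (integers or fractions) is assumed so that the comparisons with zero are meaningful.

```python
def classify_symmetric_2x2(a, b, c):
    """Classify S = [[a, b], [b, c]] using the determinant criterion of the lemma."""
    det = a * c - b * b
    if det > 0 and a < 0:
        return "negative definite"
    if det > 0 and a > 0:
        return "positive definite"
    if det < 0:
        return "indefinite"
    # det > 0 forces a != 0, so the only remaining case is det == 0,
    # about which the lemma says nothing.
    return "not covered by the lemma (det = 0)"

# Hessians at (0, 0) of the three model functions of Examples 4.1.9.
print(classify_symmetric_2x2(-2, 0, -2))  # hessian of -x^2 - y^2
print(classify_symmetric_2x2(2, 0, -2))   # hessian of  x^2 - y^2
print(classify_symmetric_2x2(2, 0, 2))    # hessian of  x^2 + y^2
```

The case $\det(S) = 0$ is reported separately rather than guessed, since the lemma gives no verdict there.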

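We apply this lemma to the symmetric $2 \times 2$ matrix

$$(\mathrm{hess} f)(x_0,y_0) = \begin{pmatrix} f_{xx}(x_0,y_0) & f_{xy}(x_0,y_0) \\ f_{yx}(x_0,y_0) & f_{yy}(x_0,y_0) \end{pmatrix},$$

thereby arriving at the following version of Theorem 4.2.4 in the case $p = 2$.

Put $\Delta(x_0,y_0) = \det((\mathrm{hess} f)(x_0,y_0))$. Then,

( $(\Delta(x_0,y_0) > 0) \wedge (f_{xx}(x_0,y_0) < 0)$ ) $\Rightarrow$ ( $(x_0,y_0) \in \mathrm{Loc.Max.}(f)$ ) ;

( $(\Delta(x_0,y_0) > 0) \wedge (f_{xx}(x_0,y_0) > 0)$ ) $\Rightarrow$ ( $(x_0,y_0) \in \mathrm{Loc.Min.}(f)$ ) ;

( $\Delta(x_0,y_0) < 0$ ) $\Rightarrow$ ( $(x_0,y_0) \in \mathrm{Sad.Pt.}(f)$ ) .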


We repackage the discussion of Example 4.2.6(ii) as an algorithm as follows.

Remark 4.2.7 (Algorithm for critical point analysis for functions of two variables)

To locate and classify the critical points of $f : U \to \mathbb{R};\ (x,y) \mapsto f(x,y)$, perform the following three steps.

Step Zero

For a general $(x,y) \in U$, compute

$$Df(x,y) = \begin{pmatrix} f_x(x,y) & f_y(x,y) \end{pmatrix},$$

and

$$(\mathrm{hess} f)(x,y) = \begin{pmatrix} f_{xx}(x,y) & f_{xy}(x,y) \\ f_{yx}(x,y) & f_{yy}(x,y) \end{pmatrix}.$$

Step One

Find

$$\mathrm{Crit}(f) = \{(x,y) \in U : Df(x,y) = (0 \ \ 0)\} = \{(x,y) \in U : (f_x(x,y) = 0) \wedge (f_y(x,y) = 0)\}.$$

Step Two

For each $(x_0,y_0) \in \mathrm{Crit}(f)$, compute the symmetric matrix $(\mathrm{hess} f)(x_0,y_0)$, and also the real number $\Delta(x_0,y_0) = \det((\mathrm{hess} f)(x_0,y_0))$.

Then apply the second derivative test as described above to classify each critical point:

( $(\Delta(x_0,y_0) > 0) \wedge (f_{xx}(x_0,y_0) < 0)$ ) $\Rightarrow$ ( $(x_0,y_0) \in \mathrm{Loc.Max.}(f)$ ) ;

( $(\Delta(x_0,y_0) > 0) \wedge (f_{xx}(x_0,y_0) > 0)$ ) $\Rightarrow$ ( $(x_0,y_0) \in \mathrm{Loc.Min.}(f)$ ) ;

( $\Delta(x_0,y_0) < 0$ ) $\Rightarrow$ ( $(x_0,y_0) \in \mathrm{Sad.Pt.}(f)$ ) .
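The three steps can be followed through on a small example. The sketch below is illustrative only: the function $f(x,y) = x^3 - 3x + y^2$ is a hypothetical choice, not an example from the notes, and its partial derivatives and critical points $(1,0)$ and $(-1,0)$ were worked out by hand (Steps Zero and One). The code performs Step Two.

```python
def f(x, y):
    """Hypothetical example: f(x, y) = x^3 - 3x + y^2 on U = R^2."""
    return x ** 3 - 3 * x + y ** 2

def grad(x, y):
    """Step Zero: Df(x, y) = (f_x, f_y), computed by hand."""
    return (3 * x ** 2 - 3, 2 * y)

def hess(x, y):
    """Step Zero: (hess f)(x, y), computed by hand."""
    return ((6 * x, 0), (0, 2))

def classify(x0, y0):
    """Step Two: apply the second derivative test at a critical point."""
    if grad(x0, y0) != (0, 0):
        raise ValueError("not a critical point")
    (fxx, fxy), (fyx, fyy) = hess(x0, y0)
    delta = fxx * fyy - fxy * fyx
    if delta > 0 and fxx < 0:
        return "local maximum"
    if delta > 0 and fxx > 0:
        return "local minimum"
    if delta < 0:
        return "saddle point"
    return "degenerate: test inconclusive"

# Step One (by hand): 3x^2 - 3 = 0 and 2y = 0 give x = 1 or x = -1, and y = 0.
critical_points = [(1, 0), (-1, 0)]
for point in critical_points:
    print(point, classify(*point))
```

Here $\Delta(1,0) = 12 > 0$ with $f_{xx}(1,0) = 6 > 0$, and $\Delta(-1,0) = -12 < 0$, so the test reports a local minimum and a saddle point respectively.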


Note that the algorithm described in Remark 4.2.7 does not deal with the possibility that $\det((\mathrm{hess} f)(x_0,y_0)) = 0$. This is the generalization of the one-dimensional condition $f''(x_0) = 0$. We make the following general definition (cf. Remarks 1.5.14).

Definition 4.2.8

Suppose that $U \subseteq \mathbb{R}^p$ is open, and that $f : U \to \mathbb{R}$ has continuous second partial derivatives.

The critical point $\mathbf{x}_0$ of $f$ is said to be nondegenerate if the symmetric $p \times p$ matrix $(\mathrm{hess} f)(\mathbf{x}_0)$ is nonsingular, i.e., if

$$\det((\mathrm{hess} f)(\mathbf{x}_0)) \ne 0.$$

The critical point $\mathbf{x}_0$ of $f$ is said to be degenerate if the symmetric $p \times p$ matrix $(\mathrm{hess} f)(\mathbf{x}_0)$ is singular, i.e., if

$$\det((\mathrm{hess} f)(\mathbf{x}_0)) = 0.$$

Remarks 4.2.9

Thus our second derivative test in the $p = 2$ case, i.e., Step Two of the algorithm in Remark 4.2.7, deals only with the case of nondegenerate critical points.

If a critical point happens to be degenerate, then, just as in the $p = 1$ case, we need additional information (i.e., information beyond the hessian) to classify the critical point. Ways of doing this include a consideration of higher partial derivatives or of the geometry of the graph.

In this module, we will deal, in the main, only with nondegenerate critical points, so

that our algorithm can be followed through right to completion.


4.3 Global Extrema

We make a few brief remarks on the nature of global extrema. The discussion generalizes that of Section 1.5, where we reviewed the one-dimensional case.

Definition 4.3.1

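Suppose that $A \subseteq \mathbb{R}^p$, and that $f : A \to \mathbb{R}$. Let $\mathbf{x}_0 \in A$.

We say that $\mathbf{x}_0$ is a global maximum of $f$ if

$$(\forall \mathbf{x} \in A)(f(\mathbf{x}) \le f(\mathbf{x}_0)).$$

Similarly, we say that $\mathbf{x}_0$ is a global minimum of $f$ if

$$(\forall \mathbf{x} \in A)(f(\mathbf{x}) \ge f(\mathbf{x}_0)).$$

We say that $\mathbf{x}_0$ is a global extremum of $f$ if $\mathbf{x}_0$ is either a global maximum of $f$ or a global minimum of $f$.

Denote by $\mathrm{Glob.Max.}(f)$ the set of all global maxima of $f$, and $\mathrm{Glob.Min.}(f)$ the set of all global minima of $f$.

If $\mathbf{x}_0 \in \mathrm{Glob.Max.}(f)$, then $f_{\max} \overset{\text{def.}}{=} f(\mathbf{x}_0)$ is called the maximum value of $f$.

If $\mathbf{x}_0 \in \mathrm{Glob.Min.}(f)$, then $f_{\min} \overset{\text{def.}}{=} f(\mathbf{x}_0)$ is called the minimum value of $f$.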

Remark 4.3.2

By definition, we have (cf. Definition 4.1.1)

$$\mathrm{Glob.Max.}(f) \subseteq \mathrm{Loc.Max.}(f),$$

and

$$\mathrm{Glob.Min.}(f) \subseteq \mathrm{Loc.Min.}(f).$$


Remark 4.3.3

Note that global extrema of functions on open sets, even bounded open sets, do not

always exist.

Example 4.3.4

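Consider the differentiable function

$$f : B_1(\mathbf{0}) \to \mathbb{R};\ \mathbf{x} \mapsto \frac{1}{1 - \|\mathbf{x}\|^2}.$$

By inspection, $\mathrm{Glob.Min.}(f) = \{\mathbf{0}\}$, and $f_{\min} = f(\mathbf{0}) = 1$.

However, $\mathrm{Glob.Max.}(f) = \emptyset$, so that $f_{\max}$ does not exist.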
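The absence of a global maximum in Example 4.3.4 can be seen by evaluating the function along a radius. The following is a small sketch for the case $p = 2$ (an assumption made only to have concrete coordinates); the sample radii are arbitrary.

```python
def f(x, y):
    """f(x) = 1 / (1 - |x|^2) on the open unit disc (the case p = 2)."""
    s = x * x + y * y
    if s >= 1:
        raise ValueError("point lies outside the open unit ball")
    return 1.0 / (1.0 - s)

# Walk towards the boundary along the positive x-axis.
for r in (0.0, 0.9, 0.99, 0.999, 0.9999):
    print(r, f(r, 0.0))
```

The values grow without bound as the boundary is approached, so no point of the open ball can be a global maximum, while the value at the centre is the minimum value $1$.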

If the domain is a closed and bounded set, then we do have the following useful

existence result for continuous functions. (See Definition 2.3.10 for the definition of a

closed subset, and Definition 2.3.18 for the definition of a bounded subset.)

Theorem 4.3.5 (An existence result for global extrema)

Suppose that $V \subseteq \mathbb{R}^p$ is closed and bounded, and that $f : V \to \mathbb{R}$ is continuous.

Then,

$$\mathrm{Glob.Max.}(f) \ne \emptyset,$$

and

$$\mathrm{Glob.Min.}(f) \ne \emptyset.$$

This result provides the following useful way of finding global extrema (cf. Theorem

1.5.18).


Theorem 4.3.6

Suppose that $V \subseteq \mathbb{R}^p$ is closed and bounded, and write $V$ as a disjoint union $V = U \cup \partial V$ (see Remark 2.3.15). Here, $U = V^0$ is the interior of $V$, and $\partial V$ is the boundary of $V$.

Suppose further that $f : V \to \mathbb{R}$ is continuous, and that $f|_U : U \to \mathbb{R}$ is differentiable.

Let $\mathbf{x}_0 \in \mathrm{Glob.Max.}(f)$. Then one of the following holds:

(i) $\mathbf{x}_0 \in \partial V$ ;

(ii) $\mathbf{x}_0 \in \mathrm{Loc.Max.}(f|_U)$ .

A corresponding statement holds if $\mathbf{x}_0$ is a global minimum of $f$.

Remark 4.3.7

Theorem 4.3.6 provides us with a means of locating, say, $f_{\max}$, for a function $f : V \to \mathbb{R}$ on a closed and bounded set.

Firstly, we locate the local maxima of the restriction of the function to the interior.

Secondly, we find the local maxima of the restriction of the function to the boundary. To do this, we can sometimes use the method of constrained optimization (see Sections 4.4 and 4.5).

Thirdly, we compute the value of the function at all of the points found in the first two steps. The largest of these values will be the maximum value $f_{\max}$ of the function.

Similarly, we can find the minimum value $f_{\min}$ of the function.
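The three-step recipe can be sketched on a concrete problem. Everything specific below is an assumption made for illustration and is not taken from the notes: the function $f(x,y) = x^2 + 2y^2 - x$, the closed unit disc as $V$, and the candidate points, which were found by hand. For simplicity the lists contain all critical points of the two restrictions (a superset of the local maxima and minima), which does not change the largest or smallest value.

```python
import math

def f(x, y):
    """Hypothetical example function on the closed unit disc V."""
    return x * x + 2 * y * y - x

# Firstly: interior candidates. f_x = 2x - 1 = 0 and f_y = 4y = 0
# give the single critical point (1/2, 0), which lies inside the disc.
interior_candidates = [(0.5, 0.0)]

# Secondly: boundary candidates. On x = cos t, y = sin t we get
# f = 2 - u^2 - u with u = cos t; the derivative in t vanishes where
# sin t = 0 (u = 1 or u = -1) or where u = -1/2.
s = math.sqrt(3.0) / 2.0
boundary_candidates = [(1.0, 0.0), (-1.0, 0.0), (-0.5, s), (-0.5, -s)]

# Thirdly: compare the values of f at all candidates.
candidates = interior_candidates + boundary_candidates
values = [(f(x, y), (x, y)) for (x, y) in candidates]
f_max, argmax = max(values)
f_min, argmin = min(values)
print("f_max =", f_max, "at", argmax)
print("f_min =", f_min, "at", argmin)
```

In this example the maximum value $9/4$ is attained on the boundary and the minimum value $-1/4$ in the interior, illustrating both alternatives of Theorem 4.3.6.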


4.4 Constrained Optimization I - Solving the Constraint Equation

In Section 4.2, we considered the problem of locating and classifying the critical

points of a function $f : U \to \mathbb{R}$, where $U \subseteq \mathbb{R}^p$ is open. This is a crucial step in solving the problem of optimizing the function $f$, because, as we have seen (cf. Theorem 4.1.5), the local extrema of $f$ (i.e., the local maxima and the local minima of $f$) are contained in the set $\mathrm{Crit}(f)$ of critical points of $f$. Moreover, finding the local extrema of $f$ will

lead us to the global extrema (if these exist) (cf. Remarks 4.3.2 and 4.3.3). This process

may be described as unconstrained optimization inasmuch as the domain of the objective

function is the whole of the set U , and is not constrained in any way.

Now, in many situations in Applied Mathematics, we are often concerned not with

unconstrained optimization, but with constrained optimization; the variables in the

optimization problem are not free to take on any value, but are constrained in some way.

We now set up a description of constrained optimization.

As above, consider a function $f : U \to \mathbb{R}$, where $U$ (the domain of optimization) is an open subset of $\mathbb{R}^p$. In the theory of optimization, the function $f$ is often called the objective function. Suppose that the variables $\mathbf{x} = (x_1, \dots, x_p)$ are not free to take just any value in $U$, but rather they are constrained to lie in some subset $S$ of $U$, which we call the constraint set. We assume that there exists a constraint function $\varphi : U \to \mathbb{R}$ and a constraint value $c \in \mathbb{R}$ such that $S$ is the $c$-level set of $\varphi$, i.e., $S = \varphi^{-1}(c) \subseteq U$ (see Definition 2.5.3). Thus, the constraint equation may be written $\varphi(\mathbf{x}) = c$.

We may summarize the above discussion as follows.


Remark 4.4.1 (The language of constrained optimization)

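$U \subseteq \mathbb{R}^p$ (open)    the domain of optimization ;

$f : U \to \mathbb{R}$    the objective function ;

$\varphi : U \to \mathbb{R}$    the constraint function ;

$c \in \mathbb{R}$    the constraint value ;

$S = \varphi^{-1}(c)$    the constraint set ;

$\varphi(\mathbf{x}) = c$    the constraint equation .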

Example 4.4.2

Take $U = \mathbb{R}^p$, $\varphi : U \to \mathbb{R};\ \mathbf{x} \mapsto \|\mathbf{x}\|^2$, and $c = 1$.

The constraint set is then the unit $(p-1)$-dimensional sphere, $S_1(\mathbf{0})$, sitting inside $U = \mathbb{R}^p$.

Our constrained optimization problem in this case would be: find $f_{\max}$ (or $f_{\min}$) for an objective function $f : U \to \mathbb{R}$ subject to the condition that the variable $\mathbf{x}$ is constrained to lie on $S_1(\mathbf{0})$.

Having set up a framework of constrained optimization, the problem now is to optimize $f$ subject to the imposed constraint. In other words, we have to optimize the function $f|_S : S \to \mathbb{R}$, the restriction of the objective function $f$ to the constraint set $S$.

Thus, we have:

Remark 4.4.3 (The problem of constrained optimization)

Optimize the restricted function $f|_S : S \to \mathbb{R}$,

i.e., find $(f|_S)_{\max}$

(or $(f|_S)_{\min}$, depending on the type of problem under consideration).


Ideally, we would obtain an ‘explicit’ description of the function $f|_S : S \to \mathbb{R}$, and we would apply the techniques of unconstrained optimization described in Sections 4.2 and 4.3. In particular, by analogy with the unconstrained case, we might expect that any solutions to the constrained optimization problem would be among the critical points of $f|_{S^0}$ or possibly on the boundary of $S$.

One way of proceeding may be described as follows.

Remark 4.4.4 (Solving the constraint equation)

We try to solve the constraint equation, i.e., we use the implicit relation

$$\varphi(x_1, \dots, x_p) = c,$$

to express $x_p$ (say) as an explicit function of the remaining variables $x_1, \dots, x_{p-1}$,

$$x_p = X(x_1, \dots, x_{p-1}).$$

Then we could substitute for $x_p$ in $f(x_1, \dots, x_p)$, thereby obtaining a function of $(p-1)$ variables, viz.

$$\hat{f} : \hat{U} \to \mathbb{R};\ (x_1, \dots, x_{p-1}) \mapsto \hat{f}(x_1, \dots, x_{p-1}) \overset{\text{def.}}{=} f(x_1, \dots, x_{p-1}, X(x_1, \dots, x_{p-1})),$$

(where the domain $\hat{U}$ must be specified).

Finally we could apply our unconstrained optimization techniques (see Sections 4.2 and 4.3) to the unconstrained function $\hat{f} : \hat{U} \to \mathbb{R}$.

Note that this method of solving the constraint equation transforms a constrained optimization problem in $p$ variables into an unconstrained problem in $(p-1)$ variables.

If it works, then this method is sometimes the most direct way to proceed.


Example 4.4.5

Suppose that we wish to construct an open cuboidal box which contains a given volume $V$ m$^3$. Letting the base dimensions be $x_1$ m and $x_2$ m, and the height $x_3$ m, we see that the volume constraint may be written $x_1 x_2 x_3 = V$. Suppose that we wish to minimize the amount of material used in the construction of the box, i.e., we wish to minimize the total surface area $(x_1 x_2 + 2(x_1 + x_2)x_3)$ m$^2$. Using the language developed above, we have:

domain of optimization    $U = \{(x_1,x_2,x_3) \in \mathbb{R}^3 : x_1, x_2, x_3 > 0\}$ ;

objective function    $f : U \to \mathbb{R};\ (x_1,x_2,x_3) \mapsto x_1 x_2 + 2(x_1 + x_2)x_3$ ;

constraint function    $\varphi : U \to \mathbb{R};\ (x_1,x_2,x_3) \mapsto x_1 x_2 x_3$ ;

constraint value    $c = V$ ;

constraint set    $S = \{(x_1,x_2,x_3) \in U : x_1 x_2 x_3 = V\}$ .

Solving the constraint equation

$$x_1 x_2 x_3 = V$$

gives us

$$x_3 = X(x_1,x_2),$$

where

$$X(x_1,x_2) = \frac{V}{x_1 x_2}.$$


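Hence, defining the open set

$$\hat{U} = \{(x_1,x_2) \in \mathbb{R}^2 : x_1, x_2 > 0\},$$

we must minimize the function

$$\hat{f} : \hat{U} \to \mathbb{R};\ (x_1,x_2) \mapsto \hat{f}(x_1,x_2) = f(x_1,x_2,X(x_1,x_2)) = x_1 x_2 + 2(x_1 + x_2)\frac{V}{x_1 x_2} = x_1 x_2 + 2V\Big(\frac{1}{x_1} + \frac{1}{x_2}\Big).$$

Now, having solved the constraint, it remains for us to perform an unconstrained optimization analysis on the function $\hat{f} : \hat{U} \to \mathbb{R}$ of two variables. Thus, we compute $\mathrm{Crit}(\hat{f})$, etc., as in Section 4.2.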
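The remaining analysis of $\hat{f}$ can be sketched as follows. The closed form used here is a hand computation and should be read as an assumption to be checked rather than as a result quoted from the notes: setting $\partial\hat{f}/\partial x_1 = x_2 - 2V/x_1^2 = 0$ and $\partial\hat{f}/\partial x_2 = x_1 - 2V/x_2^2 = 0$ gives $x_1 = x_2 = (2V)^{1/3}$. The function names and the sample volume $V = 4$ are illustrative.

```python
def f_hat(x1, x2, V):
    """Surface area after eliminating x3 = V / (x1 * x2)."""
    return x1 * x2 + 2.0 * V * (1.0 / x1 + 1.0 / x2)

def grad_f_hat(x1, x2, V):
    """Partial derivatives of f_hat, computed by hand."""
    return (x2 - 2.0 * V / x1 ** 2, x1 - 2.0 * V / x2 ** 2)

def box_minimizer(V):
    """Candidate critical point of f_hat on x1, x2 > 0, and the matching height."""
    side = (2.0 * V) ** (1.0 / 3.0)   # x1 = x2 = (2V)^(1/3)
    height = V / (side * side)        # x3 = X(x1, x2) = V / (x1 * x2)
    return side, side, height

V = 4.0  # sample volume, chosen so that the numbers come out round
x1, x2, x3 = box_minimizer(V)
print("dimensions:", x1, x2, x3)
print("gradient of f_hat there:", grad_f_hat(x1, x2, V))
print("surface area:", f_hat(x1, x2, V))
```

At this point the second partial derivatives are $\hat{f}_{x_1 x_1} = \hat{f}_{x_2 x_2} = 2$ and $\hat{f}_{x_1 x_2} = 1$, so $\Delta = 3 > 0$ with $\hat{f}_{x_1 x_1} > 0$, and the second derivative test of Section 4.2 classifies it as a local minimum. Whether it is the global minimum still has to be argued separately.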

Remark 4.4.6

Note that, in the above example, and, of course, in general optimization problems, we must prove that we actually find a global extremum of $f|_S$. This can be done explicitly, or by using additional information.

For example, suppose that, for some reason, we know that $f|_S$ has a unique global extremum. Then, if there exists only one critical point, we must indeed have located this global extremum.

4.5 Constrained Optimization II - The Method of Lagrange Multipliers

Unfortunately, in general, it is not possible, or not convenient, to solve the constraint equation $\varphi(x_1, \dots, x_p) = c$ to give one of the variables as an explicit function of the remaining $(p-1)$ variables.

Hence, another procedure is required. We now describe such a procedure, namely the

very powerful Method of Lagrange Multipliers. In order to state the result on which the

Method of Lagrange Multipliers is based, it is useful first to introduce the notion of the

gradient vector field.


Definition 4.5.1

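Suppose that $U \subseteq \mathbb{R}^p$ is open, and that $F : U \to \mathbb{R}$ is differentiable.

The gradient of the function $F$ is the vector field $\nabla F : U \to \mathbb{R}^p$ defined by

$$\nabla F = \Big(\frac{\partial F}{\partial x_1}, \dots, \frac{\partial F}{\partial x_p}\Big).$$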

Remarks 4.5.2

On identifying $\mathbb{R}^p$ with the space of $(1 \times p)$-matrices $M_{1 \times p}$, the gradient of $F : U \to \mathbb{R}$ is nothing more than the derivative of $F$ (cf. Definition 3.1.7).

In particular, observe that, for $\mathbf{x} \in U$, we have $\nabla F(\mathbf{x}) = \mathbf{0}$ if and only if $\mathbf{x} \in \mathrm{Crit}(F)$.

The following result partly explains the geometric significance of the gradient vector

field.

Proposition 4.5.3

Suppose that $U \subseteq \mathbb{R}^p$ is open, and that $F : U \to \mathbb{R}$ is continuously differentiable.

Let $c \in \mathbb{R}$. Suppose that $\mathbf{x}_0 \in F^{-1}(c)$, and assume that $\nabla F(\mathbf{x}_0) \ne \mathbf{0}$.

Then, in a neighbourhood of $\mathbf{x}_0$, the $c$-level set $F^{-1}(c)$ is a hypersurface (of dimension equal to $(p-1)$).

Moreover, $\nabla F(\mathbf{x}_0)$ is a normal vector to $F^{-1}(c)$ at $\mathbf{x}_0$

(i.e., $\nabla F(\mathbf{x}_0)$ is orthogonal to every tangent vector to $F^{-1}(c)$ at $\mathbf{x}_0$).

The following Theorem, which gives a necessary condition for a constrained local

extremum, is fundamental to the Method of Lagrange Multipliers:


Theorem 4.5.4

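Suppose that $U \subseteq \mathbb{R}^p$ is open, and that $f : U \to \mathbb{R}$ and $\varphi : U \to \mathbb{R}$ are continuously differentiable functions.

Let $c \in \mathbb{R}$, and denote by $S$ the $c$-level set of $\varphi$, i.e., $S = \varphi^{-1}(c)$. Consider $f|_S : S \to \mathbb{R}$, the restriction of $f$ to the subset $S$ of $U$.

Let $\mathbf{x}_0 \in S$, and assume that $\mathbf{x}_0 \notin \mathrm{Crit}(\varphi)$, i.e., assume that $\nabla\varphi(\mathbf{x}_0) \ne \mathbf{0}$.

Then, if $\mathbf{x}_0 \in \mathrm{Loc.Extr.}(f|_S)$, there exists $\lambda_0 \in \mathbb{R}$ such that

$$\nabla f(\mathbf{x}_0) = \lambda_0 \nabla\varphi(\mathbf{x}_0).$$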

Remark 4.5.5

The conclusion of Theorem 4.5.4 may be interpreted geometrically as follows:

if $f : U \to \mathbb{R}$, when restricted to the hypersurface $S \subseteq U$, has a local extremum at $\mathbf{x}_0 \in S$, then $\nabla f(\mathbf{x}_0)$ is orthogonal to $S$ at $\mathbf{x}_0$.

Remark 4.5.6

Theorem 4.5.4 provides us with two mutually exclusive possibilities for a local

extremum of the restriction of the objective function to the constraint set.

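If $\mathbf{x}_0 \in \mathrm{Loc.Extr.}(f|_S)$, then precisely one of the following two conditions must hold:

(Con 1)  $(\varphi(\mathbf{x}_0) = c) \wedge (\nabla\varphi(\mathbf{x}_0) = \mathbf{0})$ ;

(Con 2)  $(\varphi(\mathbf{x}_0) = c) \wedge (\nabla\varphi(\mathbf{x}_0) \ne \mathbf{0}) \wedge \big((\exists \lambda_0 \in \mathbb{R})(\nabla f(\mathbf{x}_0) = \lambda_0 \nabla\varphi(\mathbf{x}_0))\big)$ .

In (Con 2), such a real number $\lambda_0$ is called a Lagrange multiplier. In certain applications of the method (e.g., to problems in Economics) the Lagrange multiplier may be interpreted in a very useful way.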


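Thus, in order to locate local extremal points of the constrained problem, we need to search among points $\mathbf{x}_0 \in U$ satisfying either (Con 1) or (Con 2).

Define subsets $L_1$ and $L_2$ of the domain of optimization $U$ as follows:

$$L_1 = \{\mathbf{x} \in U : \mathbf{x} \text{ satisfies (Con 1)}\};$$

$$L_2 = \{\mathbf{x} \in U : \mathbf{x} \text{ satisfies (Con 2)}\}.$$

We consider $L_1$ and $L_2$ in turn.

We have

$$L_1 = \{\mathbf{x} \in U : (\varphi(\mathbf{x}) = c) \wedge (\nabla\varphi(\mathbf{x}) = \mathbf{0})\} = S \cap \mathrm{Crit}(\varphi).$$

Thus, in order to obtain $L_1$, we just find all solutions $\mathbf{x} \in U$ of the simultaneous equations $\varphi(\mathbf{x}) = c$ and $\nabla\varphi(\mathbf{x}) = \mathbf{0}$. Alternatively, we could find $\mathrm{Crit}(\varphi)$ and then intersect this with the constraint set $S$. Note that, in general, $L_1$ will be ‘small’.

We have

$$L_2 = \{\mathbf{x} \in U : (\varphi(\mathbf{x}) = c) \wedge (\nabla\varphi(\mathbf{x}) \ne \mathbf{0}) \wedge ((\exists \lambda \in \mathbb{R})(\nabla f(\mathbf{x}) = \lambda\nabla\varphi(\mathbf{x})))\}.$$

In order to find $L_2$, it is convenient to introduce the lagrangian function $g : (U - L_1) \times \mathbb{R} \to \mathbb{R}$ defined by

$$g(\mathbf{x},\lambda) = f(\mathbf{x}) - \lambda(\varphi(\mathbf{x}) - c),$$

for all $(\mathbf{x},\lambda) \in (U - L_1) \times \mathbb{R}$.

Observe that:

$$\mathrm{Crit}(g) = \{(\mathbf{x},\lambda) \in (U - L_1) \times \mathbb{R} : (\nabla f(\mathbf{x}) - \lambda\nabla\varphi(\mathbf{x}) = \mathbf{0}) \wedge (-(\varphi(\mathbf{x}) - c) = 0)\}$$

$$= \{(\mathbf{x},\lambda) \in U \times \mathbb{R} : (\varphi(\mathbf{x}) = c) \wedge (\nabla\varphi(\mathbf{x}) \ne \mathbf{0}) \wedge (\nabla f(\mathbf{x}) = \lambda\nabla\varphi(\mathbf{x}))\}.$$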


Hence, we see that

$$L_2 = \{\mathbf{x} \in U : (\exists \lambda \in \mathbb{R})((\mathbf{x},\lambda) \in \mathrm{Crit}(g))\}.$$

Thus, in order to obtain $L_2$, we can just find all the critical points $(\mathbf{x},\lambda)$ of the lagrangian function $g$, and then project out $\mathbf{x}$ from each one.

Observe that, having found $L_1$, we are left with the problem of finding the critical points of $g : (U - L_1) \times \mathbb{R} \to \mathbb{R}$, a function of $(p+1)$ variables $(\mathbf{x},\lambda) = (x_1, \dots, x_p, \lambda)$. Thus, we might say that the Method of Lagrange Multipliers transforms a constrained optimization problem in $p$ variables into an unconstrained problem in $(p+1)$ variables.

The disjoint union $L \overset{\text{def.}}{=} L_1 \cup L_2 \subseteq S$ contains all the critical points of $f|_S : S \to \mathbb{R}$.

Some of these critical points might be local maxima, some might be local minima, but

some might be neither (just as in unconstrained critical point analysis). The next step

is to classify the elements of L .

One possible classification method uses a second derivative test, which involves the

notion of the so-called bordered hessian matrix, but we will not discuss this here.

Another method is to examine the geometry of the situation using level sets of the

objective function (see Examples 4.5.8).

Alternatively, if we had some theoretical result asserting the existence of global extrema of $f|_S : S \to \mathbb{R}$, then we need only compare the values of the objective function at the critical points on our list (again, see Examples 4.5.8).

Now, for convenience, we express the Method of Lagrange multipliers, as described in

Remarks 4.5.6, in the form of an algorithm.


Remark 4.5.7 (Algorithm for applying the Method of Lagrange Multipliers)

The problem is to optimize $f(\mathbf{x})$ subject to $\varphi(\mathbf{x}) = c$. Perform the following five steps.

Step Zero

Identify the data (cf. Remark 4.4.1):

$U \subseteq \mathbb{R}^p$ (open)    the domain of optimization ;

$f : U \to \mathbb{R}$    the objective function ;

$\varphi : U \to \mathbb{R}$    the constraint function ;

$c \in \mathbb{R}$    the constraint value .

Step One

Compute $L_1 = S \cap \mathrm{Crit}(\varphi)$ by finding all $\mathbf{x} \in U$ which satisfy the simultaneous equations $\varphi(\mathbf{x}) = c$ and $\nabla\varphi(\mathbf{x}) = \mathbf{0}$.

Step Two

(i) Write down the lagrangian function

$$g : (U - L_1) \times \mathbb{R} \to \mathbb{R};\ (\mathbf{x},\lambda) \mapsto f(\mathbf{x}) - \lambda(\varphi(\mathbf{x}) - c).$$

(ii) Find $\mathrm{Crit}(g)$ in the usual way (cf. Section 4.2).

(iii) Compute $L_2$ by projecting out $\mathbf{x}$ from each element $(\mathbf{x},\lambda)$ of $\mathrm{Crit}(g)$.

Step Three

Write down $L = L_1 \cup L_2$.

Then $L$ is a subset of $S$ which contains all possible candidates for a solution to the original constrained optimization problem.


Step Four

If it is known that the problem does have a solution (say from some theoretical

result), then this solution must be an element of the set L constructed in Step

Three. To identify the solution, proceed as follows. For each L∈x , compute

the corresponding value )(xf of the objective function, and see which x give

the highest or lowest value.

For example, suppose that the constraint set $S$ is closed and bounded. Then, because the constrained objective function $f|_S : S \to \mathbb{R}$ is continuous, it is bounded, and, moreover, it attains its global maximum value $(f|_S)_{\max}$ and its global minimum value $(f|_S)_{\min}$ at some points of $S$. In particular, suppose that $|L| = 2$. Then one of the elements of $L$ must be the global maximum, and the other must be the global minimum. However, if $|L| > 2$, then some of the points in $L$ might not correspond to local extrema of $f|_S : S \to \mathbb{R}$.

Note that, if $S$ is not bounded, then $f|_S$ need not possess a global maximum or global minimum; in this case, the original problem may not have a solution.
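
The comparison in Step Four can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name `classify_candidates` and the sample problem (the sum $x_1 + x_2$ on the unit circle, with its two Lagrange candidates) are not from these notes, and the result is meaningful only when global extrema are already known to exist.

```python
from typing import Callable, Iterable, List, Tuple

Point = Tuple[float, ...]


def classify_candidates(
    f: Callable[..., float], candidates: Iterable[Point]
) -> Tuple[List[Point], List[Point]]:
    """Return (maximizers, minimizers) of f among the candidate points.

    Only meaningful when a global maximum and minimum of f on S are
    already known to exist (for example, S closed and bounded).
    """
    values = [(point, f(*point)) for point in candidates]
    if not values:
        raise ValueError("the candidate list L is empty")
    highest = max(value for _, value in values)
    lowest = min(value for _, value in values)
    maximizers = [point for point, value in values if value == highest]
    minimizers = [point for point, value in values if value == lowest]
    return maximizers, minimizers


# Hypothetical illustration: f(x1, x2) = x1 + x2 on the unit circle,
# whose candidate list is L = {(r, r), (-r, -r)} with r = 1/sqrt(2).
r = 2 ** -0.5
maximizers, minimizers = classify_candidates(
    lambda x1, x2: x1 + x2, [(r, r), (-r, -r)]
)
```

The function deliberately returns lists, since several candidates may share the extreme value, as happens in Examples 4.5.8 (iii).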

Examples 4.5.8

(i) Problem: maximize $x_1 x_2$ subject to $x_1 + 3x_2 = 2$.

Step Zero

$U = \mathbb{R}^2$;

$f : U \to \mathbb{R};\ (x_1, x_2) \mapsto x_1 x_2$;

$\phi : U \to \mathbb{R};\ (x_1, x_2) \mapsto x_1 + 3x_2$;

$c = 2$.

Step One

We have $\partial\phi/\partial x_1 = 1$ and $\partial\phi/\partial x_2 = 3$, so that $\mathrm{Crit}(\phi) = \emptyset$.

Hence, $L_1 = S \cap \mathrm{Crit}(\phi) = \emptyset$.

Step Two

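The lagrangian function $g$ is given by

$g : \mathbb{R}^2 \times \mathbb{R} \to \mathbb{R};\ (x_1, x_2, \lambda) \mapsto x_1 x_2 - \lambda(x_1 + 3x_2 - 2)$,

so that $\partial g/\partial x_1 = x_2 - \lambda$, $\partial g/\partial x_2 = x_1 - 3\lambda$, and $\partial g/\partial\lambda = -(x_1 + 3x_2 - 2)$.

Hence, $(x_1, x_2, \lambda) \in \mathrm{Crit}(g)$ if and only if $(x_1, x_2, \lambda)$ satisfies the following system of equations:

\[
\begin{cases}
x_2 - \lambda = 0, \\
x_1 - 3\lambda = 0, \\
x_1 + 3x_2 - 2 = 0.
\end{cases}
\tag{$*$}
\]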

(Note that - as expected! - the equation $\partial g/\partial\lambda = 0$ just gives us back the constraint.)

As can be easily checked - do it! - the system (*) has a unique solution, namely $(x_1, x_2, \lambda) = (1, \tfrac{1}{3}, \tfrac{1}{3})$.

Hence, $L_2 = \{(1, \tfrac{1}{3})\}$.
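
If you want to confirm your hand calculation, the following short Python sketch checks the solution of (*) in exact rational arithmetic. The function name `residuals` is introduced here purely for illustration; the elimination in the comments is one possible route, not necessarily the intended one.

```python
from fractions import Fraction


def residuals(x1, x2, lam):
    """Left-hand sides of the three equations of system (*)."""
    return (x2 - lam, x1 - 3 * lam, x1 + 3 * x2 - 2)


# The first two equations give x2 = lam and x1 = 3*lam, so the
# third equation (the constraint) becomes 6*lam = 2.
lam = Fraction(2, 6)
x1, x2 = 3 * lam, lam
solution = (x1, x2, lam)
```

Because the first two equations determine $x_1$ and $x_2$ in terms of $\lambda$, and the third then fixes $\lambda$, the solution found this way is the only one.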

Step Three

We have $L = L_1 \cup L_2 = \{(1, \tfrac{1}{3})\}$.

Step Four

It remains to check whether $(x_1, x_2) = (1, \tfrac{1}{3})$ is a solution of our original constrained optimization problem.

(Note that, since the constraint set $S = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 + 3x_2 = 2\}$ is a line, we can't use existence arguments based on boundedness.)

In fact, it is straightforward to check that $(x_1, x_2) = (1, \tfrac{1}{3}) \in S$ gives the global maximum of $f|_S : S \to \mathbb{R}$.

One way of doing this is using a geometrical argument. On the same diagram, draw some appropriate level sets (curves) of $f(x_1, x_2) = x_1 x_2$, together with the constraint set (line) $x_1 + 3x_2 = 2$. Convince yourself that the point on $x_1 + 3x_2 = 2$ where $f(x_1, x_2) = x_1 x_2$ takes its maximum value is the unique point where $x_1 + 3x_2 = 2$ is tangent to a level curve of $f$; this point is indeed $(x_1, x_2) = (1, \tfrac{1}{3})$.

Note that this kind of level set analysis is often useful for classifying constrained

extrema.
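
As a complement to the geometrical argument in Example (i), the constraint can also be eliminated directly. This is a sketch, with the parameter $t$ introduced here for illustration. Every point of the line $x_1 + 3x_2 = 2$ has the form $(2 - 3t, t)$ with $t \in \mathbb{R}$, and

\[
f(2 - 3t, t) = (2 - 3t)\,t = 2t - 3t^2 = \tfrac{1}{3} - 3\left(t - \tfrac{1}{3}\right)^2 \le \tfrac{1}{3},
\]

with equality exactly when $t = \tfrac{1}{3}$, that is, at $(x_1, x_2) = (1, \tfrac{1}{3})$. So the constrained maximum value is $\tfrac{1}{3}$, and it is attained only at this point.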

(ii) Problem: minimize $x_1$ subject to $x_1^3 - x_2^2 = 0$.

Step Zero

$U = \mathbb{R}^2$;

$f : U \to \mathbb{R};\ (x_1, x_2) \mapsto x_1$;

$\phi : U \to \mathbb{R};\ (x_1, x_2) \mapsto x_1^3 - x_2^2$;

$c = 0$.

Step One

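We have $\partial\phi/\partial x_1 = 3x_1^2$ and $\partial\phi/\partial x_2 = -2x_2$, so that $\mathrm{Crit}(\phi) = \{(0,0)\}$.

Since $(0,0)$ lies in $S = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^3 - x_2^2 = 0\}$,

we have that $L_1 = S \cap \mathrm{Crit}(\phi) = \{(0,0)\}$.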

Step Two

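The lagrangian function $g$ is given by

$g : (\mathbb{R}^2 - \{(0,0)\}) \times \mathbb{R} \to \mathbb{R};\ (x_1, x_2, \lambda) \mapsto x_1 - \lambda(x_1^3 - x_2^2)$,

so that $\partial g/\partial x_1 = 1 - 3\lambda x_1^2$, $\partial g/\partial x_2 = 2\lambda x_2$, and $\partial g/\partial\lambda = -(x_1^3 - x_2^2)$.

Hence, $(x_1, x_2, \lambda) \in \mathrm{Crit}(g)$ if and only if $(x_1, x_2, \lambda)$ satisfies the following system of equations:

\[
\begin{cases}
1 - 3\lambda x_1^2 = 0, \\
2\lambda x_2 = 0, \\
x_1^3 - x_2^2 = 0.
\end{cases}
\tag{$**$}
\]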

You will see that the system (**) has no solutions - prove this! - so $L_2 = \emptyset$.
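
One possible line of argument, offered as a hint rather than a full proof, takes the three equations of (**) in turn:

\[
1 - 3\lambda x_1^2 = 0 \;\Rightarrow\; \lambda \neq 0 \text{ and } x_1 \neq 0; \qquad
2\lambda x_2 = 0 \;\Rightarrow\; x_2 = 0; \qquad
x_1^3 - x_2^2 = 0 \;\Rightarrow\; x_1 = 0.
\]

The first and last conclusions contradict each other, so no triple $(x_1, x_2, \lambda)$ can satisfy all three equations.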

Step Three

We have $L = L_1 \cup L_2 = \{(0,0)\}$.

Step Four

It remains to check whether $(x_1, x_2) = (0,0)$ is a solution of our original constrained optimization problem. Draw a picture of the constraint set $S = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^3 - x_2^2 = 0\}$. Can you see why $(x_1, x_2) = (0,0)$ is the unique solution of the problem of minimizing $x_1$ subject to $(x_1, x_2) \in S$?
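
If drawing the picture is awkward, a small numerical experiment may help. The Python sketch below assumes the parametrization $t \mapsto (t^2, t^3)$ of the constraint curve, which is not stated in these notes but is easy to verify, since $(t^2)^3 - (t^3)^2 = 0$. The function names and the sampling range are arbitrary illustrative choices.

```python
def cusp_point(t):
    """Point (t**2, t**3) on the curve x1**3 - x2**2 = 0."""
    return (t * t, t ** 3)


def constraint(x1, x2):
    """Value of the constraint function phi(x1, x2) = x1**3 - x2**2."""
    return x1 ** 3 - x2 ** 2


# Sample the curve at integer parameters, where the arithmetic is exact.
samples = [cusp_point(t) for t in range(-5, 6)]
smallest_x1 = min(x1 for x1, _ in samples)
minimizers = [p for p in samples if p[0] == smallest_x1]
```

Along this parametrization the objective is $x_1 = t^2 \ge 0$, which vanishes only at $t = 0$; the sampled minimizer is accordingly the single point $(0, 0)$.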

(iii) Problem: maximize $x_1^2 - x_2$ subject to $x_1^2 + x_2^2 = 1$.

Step Zero

$U = \mathbb{R}^2$;

$f : U \to \mathbb{R};\ (x_1, x_2) \mapsto x_1^2 - x_2$;

$\phi : U \to \mathbb{R};\ (x_1, x_2) \mapsto x_1^2 + x_2^2$;

$c = 1$.

Step One

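We have $\partial\phi/\partial x_1 = 2x_1$ and $\partial\phi/\partial x_2 = 2x_2$, so that $\mathrm{Crit}(\phi) = \{(0,0)\}$.

However, since $S = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}$,

we see that $L_1 = S \cap \mathrm{Crit}(\phi) = \emptyset$.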

Step Two

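The lagrangian function $g$ is given by

$g : \mathbb{R}^2 \times \mathbb{R} \to \mathbb{R};\ (x_1, x_2, \lambda) \mapsto x_1^2 - x_2 - \lambda(x_1^2 + x_2^2 - 1)$,

so that $\partial g/\partial x_1 = 2x_1 - 2\lambda x_1$, $\partial g/\partial x_2 = -1 - 2\lambda x_2$, and $\partial g/\partial\lambda = -(x_1^2 + x_2^2 - 1)$.

Hence, $(x_1, x_2, \lambda) \in \mathrm{Crit}(g)$ if and only if $(x_1, x_2, \lambda)$ satisfies the following system of equations:

\[
\begin{cases}
2x_1 - 2\lambda x_1 = 0, \\
-1 - 2\lambda x_2 = 0, \\
x_1^2 + x_2^2 - 1 = 0.
\end{cases}
\tag{$***$}
\]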

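Solve the system (***).

You should find that $\mathrm{Crit}(g) = \{(0, 1, -\tfrac{1}{2}),\ (0, -1, \tfrac{1}{2}),\ (\tfrac{\sqrt{3}}{2}, -\tfrac{1}{2}, 1),\ (-\tfrac{\sqrt{3}}{2}, -\tfrac{1}{2}, 1)\}$.

Hence, we obtain $L_2 = \{(0, 1),\ (0, -1),\ (\tfrac{\sqrt{3}}{2}, -\tfrac{1}{2}),\ (-\tfrac{\sqrt{3}}{2}, -\tfrac{1}{2})\}$.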

Step Three

We have $L = L_1 \cup L_2 = \{(0, 1),\ (0, -1),\ (\tfrac{\sqrt{3}}{2}, -\tfrac{1}{2}),\ (-\tfrac{\sqrt{3}}{2}, -\tfrac{1}{2})\}$.

Step Four

Note that the constraint set $S = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}$ is the unit circle centred at $(0,0)$, and this is closed and bounded.

Hence the continuous function $f|_S : S \to \mathbb{R}$ attains its global maximum value $(f|_S)_{\max}$ and its global minimum value $(f|_S)_{\min}$ at some points of $S$.

A global maximum of $f|_S$ is, in particular, a local maximum of $f|_S$, and so is an element of the set $L$. Similarly, a global minimum of $f|_S$ is an element of the set $L$.

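To pick out global maxima and global minima, we just compute the value of $f$ at each of the four points in $L$.

We find the following:

$f(0, 1) = -1$; $f(0, -1) = 1$; $f(\tfrac{\sqrt{3}}{2}, -\tfrac{1}{2}) = \tfrac{5}{4}$; $f(-\tfrac{\sqrt{3}}{2}, -\tfrac{1}{2}) = \tfrac{5}{4}$.

We conclude that $(f|_S)_{\max} = f(\tfrac{\sqrt{3}}{2}, -\tfrac{1}{2}) = f(-\tfrac{\sqrt{3}}{2}, -\tfrac{1}{2}) = \tfrac{5}{4}$,

and that $(f|_S)_{\min} = f(0, 1) = -1$.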

The nature of the remaining point $(0, -1)$ may be investigated by drawing, in the same diagram, the constraint set $S$ together with appropriate level curves of $f$. You should be able to see that $(0, -1)$ is, in fact, a local minimum of $f|_S$.
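
The conclusions of Example (iii) can be sanity-checked numerically by sampling the unit circle. The Python sketch below is a brute-force check, not a proof; the angle parametrization, the sample count `N` and the window around $(0, -1)$ are arbitrary choices made here for illustration.

```python
import math


def f(x1, x2):
    """Objective function of Example (iii)."""
    return x1 ** 2 - x2


def on_circle(theta):
    """Point of the unit circle at angle theta (radians)."""
    return (math.cos(theta), math.sin(theta))


N = 3600  # number of sample angles; an arbitrary choice
values = [f(*on_circle(2 * math.pi * k / N)) for k in range(N)]
sampled_max = max(values)
sampled_min = min(values)

# Near (0, -1), i.e. theta = 3*pi/2, the restriction of f to the
# circle should not dip below f(0, -1) = 1.
near_bottom = [f(*on_circle(1.5 * math.pi + d / 100)) for d in range(-10, 11)]
```

On the circle, $f = \cos^2\theta - \sin\theta = 1 - s - s^2$ with $s = \sin\theta$, which is largest at $s = -\tfrac{1}{2}$ and smallest at $s = 1$; the sampled extremes agree with the values $\tfrac{5}{4}$ and $-1$ up to rounding.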

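(iv) Problem: Let $a_1, a_2, a_3 \in \mathbb{R}^+$, and consider the ellipsoid

\[
S = \left\{ (x_1, x_2, x_3) \in \mathbb{R}^3 : \frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \frac{x_3^2}{a_3^2} = 1 \right\}.
\]

Find the maximum and minimum values of the sum $x_1 + x_2 + x_3$ for $(x_1, x_2, x_3) \in S$.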

Step Zero

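$U = \mathbb{R}^3$;

$f : U \to \mathbb{R};\ (x_1, x_2, x_3) \mapsto x_1 + x_2 + x_3$;

$\phi : U \to \mathbb{R};\ (x_1, x_2, x_3) \mapsto \dfrac{x_1^2}{a_1^2} + \dfrac{x_2^2}{a_2^2} + \dfrac{x_3^2}{a_3^2}$;

$c = 1$.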

Step One

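We have $\dfrac{\partial\phi}{\partial x_1} = \dfrac{2x_1}{a_1^2}$, $\dfrac{\partial\phi}{\partial x_2} = \dfrac{2x_2}{a_2^2}$ and $\dfrac{\partial\phi}{\partial x_3} = \dfrac{2x_3}{a_3^2}$, so that $\mathrm{Crit}(\phi) = \{(0,0,0)\}$.

However, $(0,0,0) \notin S$, so $L_1 = S \cap \mathrm{Crit}(\phi) = \emptyset$.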

Step Two

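The lagrangian function $g$ is given by

\[
g : \mathbb{R}^3 \times \mathbb{R} \to \mathbb{R};\ (x_1, x_2, x_3, \lambda) \mapsto x_1 + x_2 + x_3 - \lambda\left( \frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \frac{x_3^2}{a_3^2} - 1 \right),
\]

so that $\dfrac{\partial g}{\partial x_1} = 1 - \dfrac{2\lambda x_1}{a_1^2}$, $\dfrac{\partial g}{\partial x_2} = 1 - \dfrac{2\lambda x_2}{a_2^2}$, $\dfrac{\partial g}{\partial x_3} = 1 - \dfrac{2\lambda x_3}{a_3^2}$, and

\[
\frac{\partial g}{\partial\lambda} = -\left( \frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \frac{x_3^2}{a_3^2} - 1 \right).
\]

Hence, $(x_1, x_2, x_3, \lambda) \in \mathrm{Crit}(g)$ if and only if $(x_1, x_2, x_3, \lambda)$ satisfies the following system of equations:

\[
\begin{cases}
1 - \dfrac{2\lambda x_1}{a_1^2} = 0, \\[1ex]
1 - \dfrac{2\lambda x_2}{a_2^2} = 0, \\[1ex]
1 - \dfrac{2\lambda x_3}{a_3^2} = 0, \\[1ex]
\dfrac{x_1^2}{a_1^2} + \dfrac{x_2^2}{a_2^2} + \dfrac{x_3^2}{a_3^2} - 1 = 0.
\end{cases}
\tag{$****$}
\]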

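Solving the system (****) yields the two solutions

\[
(x_1, x_2, x_3, \lambda) = \left( \frac{a_1^2}{q}, \frac{a_2^2}{q}, \frac{a_3^2}{q}, \frac{q}{2} \right),
\]

and

\[
(x_1, x_2, x_3, \lambda) = \left( -\frac{a_1^2}{q}, -\frac{a_2^2}{q}, -\frac{a_3^2}{q}, -\frac{q}{2} \right),
\]

where, for convenience, we have defined $q = \sqrt{a_1^2 + a_2^2 + a_3^2}$.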
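
For readers who want the intermediate steps, here is one way the solution of (****) can be reached (a sketch of the calculation, which the notes leave to the reader). Each of the first three equations forces $\lambda \neq 0$ and gives

\[
x_i = \frac{a_i^2}{2\lambda}, \qquad i = 1, 2, 3.
\]

Substituting into the constraint equation yields

\[
\sum_{i=1}^{3} \frac{1}{a_i^2} \left( \frac{a_i^2}{2\lambda} \right)^2 = \frac{a_1^2 + a_2^2 + a_3^2}{4\lambda^2} = 1,
\]

so that $\lambda = \pm\tfrac{1}{2}\sqrt{a_1^2 + a_2^2 + a_3^2}$, and hence $x_i = \pm a_i^2 / \sqrt{a_1^2 + a_2^2 + a_3^2}$, with the same choice of sign throughout.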

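Hence, we obtain

\[
L_2 = \left\{ \left( \frac{a_1^2}{q}, \frac{a_2^2}{q}, \frac{a_3^2}{q} \right),\ \left( -\frac{a_1^2}{q}, -\frac{a_2^2}{q}, -\frac{a_3^2}{q} \right) \right\}.
\]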

Step Three

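We have

\[
L = L_1 \cup L_2 = \left\{ \left( \frac{a_1^2}{q}, \frac{a_2^2}{q}, \frac{a_3^2}{q} \right),\ \left( -\frac{a_1^2}{q}, -\frac{a_2^2}{q}, -\frac{a_3^2}{q} \right) \right\}.
\]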

Step Four

Note that the constraint set $S$ is an ellipsoid, and thus it is closed and bounded.

Hence the continuous function $f|_S : S \to \mathbb{R}$ attains its global maximum value $(f|_S)_{\max}$ and its global minimum value $(f|_S)_{\min}$ at some points of $S$.

Since $|L| = 2$, one of the elements of $L$ must be the global maximum of $f|_S$, and the other must be the global minimum.

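We have

\[
f\left( \frac{a_1^2}{q}, \frac{a_2^2}{q}, \frac{a_3^2}{q} \right) = q,
\]

and

\[
f\left( -\frac{a_1^2}{q}, -\frac{a_2^2}{q}, -\frac{a_3^2}{q} \right) = -q,
\]

so we conclude that the maximum value of the function $x_1 + x_2 + x_3$ on the ellipsoid $\dfrac{x_1^2}{a_1^2} + \dfrac{x_2^2}{a_2^2} + \dfrac{x_3^2}{a_3^2} = 1$ is equal to $\sqrt{a_1^2 + a_2^2 + a_3^2}$, whilst the minimum value is equal to $-\sqrt{a_1^2 + a_2^2 + a_3^2}$.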
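
As a concrete check of Example (iv), the Python sketch below evaluates everything exactly for one particular ellipsoid. The semi-axes $a = (1, 2, 2)$ are a hypothetical choice made here so that $q = 3$ is an integer; the names `phi`, `p_max`, `p_min` and `gradient_residuals` are likewise introduced only for illustration.

```python
from fractions import Fraction
from math import isqrt

a = (1, 2, 2)  # hypothetical semi-axes, chosen so that q is an integer
q_squared = sum(ai * ai for ai in a)
q = isqrt(q_squared)  # exact here because q_squared = 9
assert q * q == q_squared


def phi(x):
    """Constraint function: sum of x_i**2 / a_i**2."""
    return sum(xi * xi / Fraction(ai * ai) for xi, ai in zip(x, a))


def f(x):
    """Objective function: x1 + x2 + x3."""
    return sum(x)


# The two candidate points (a_i**2 / q) and their negatives.
p_max = tuple(Fraction(ai * ai, q) for ai in a)
p_min = tuple(-xi for xi in p_max)

# Stationarity equations 1 - 2*lam*x_i/a_i**2 at (p_max, lam = q/2).
lam = Fraction(q, 2)
gradient_residuals = [1 - 2 * lam * xi / (ai * ai) for xi, ai in zip(p_max, a)]
```

For this choice of semi-axes the candidates are $\pm(\tfrac{1}{3}, \tfrac{4}{3}, \tfrac{4}{3})$; both lie on the ellipsoid, and the objective takes the values $\pm 3 = \pm q$ there, in agreement with the general formula.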