
MATH2070 Optimisation: Nonlinear optimisation with constraints. Semester 2, 2012. Lecturer: I.W. Guo. Lecture slides courtesy of J.R. Wishart.


Page 1

MATH2070 Optimisation

Nonlinear optimisation with constraints

Semester 2, 2012. Lecturer: I.W. Guo

Lecture slides courtesy of J.R. Wishart

Page 2

Introduction Lagrange Inequality Constraints and Kuhn-Tucker Second order conditions

Review

The full nonlinear optimisation problem with equality constraints

Method of Lagrange multipliers

Dealing with Inequality Constraints and the Kuhn-Tucker conditions

Second order conditions with constraints

Page 3

The full nonlinear optimisation problem with equality constraints

Method of Lagrange multipliers

Dealing with Inequality Constraints and the Kuhn-Tucker conditions

Second order conditions with constraints

Page 4

Non-linear optimisation

We are interested in optimising a function of several variables with constraints.

Multivariate framework

Variables x = (x1, x2, . . . , xn) ∈ D ⊂ Rn

Objective Function Z = f (x1, x2, . . . , xn)

Constraint equations:

g1(x1, x2, …, xn) = 0,   h1(x1, x2, …, xn) ≤ 0,

g2(x1, x2, …, xn) = 0,   h2(x1, x2, …, xn) ≤ 0,

⋮                        ⋮

gm(x1, x2, …, xn) = 0,   hp(x1, x2, …, xn) ≤ 0.

Page 5

Vector notation

Multivariate framework

Minimise Z = f (x)

such that g(x) = 0, h(x) ≤ 0,

where g : Rⁿ → Rᵐ and h : Rⁿ → Rᵖ.

Objectives?

- Find extrema of the above problem.

- Consider equality constraints first.

- Extension: allow inequality constraints.

Page 6

Initial Example

Minimise the heat loss through the surface area of an open-lid rectangular container.

Formulated as a problem

Minimise S = 2xy + 2yz + xz

such that,

V = xyz ≡ constant and x, y, z > 0.

- Parameters: (x, y, z)

- Objective function: f = S

- Constraint: g = xyz − V

Page 7

Graphical interpretation of problem

[Figure: level contours of f(x, y) tangent to the constraint curve g(x, y) = c at the point P.]

- Optimal feasible points occur at a tangential intersection of the curve g(x, y) = c and a level contour of f(x, y).

- The gradient of f is parallel to the gradient of g.

- Recall from linear algebra and vectors:

∇f + λ∇g = 0,

for some constant λ.

Page 8

Lagrangian

Definition

Define the Lagrangian for our problem,

L(x, y, z, λ) = S(x, y, z) + λg(x, y, z)

= 2xy + 2yz + xz + λ (xyz − V )

- Necessary conditions:

∂L/∂x = ∂L/∂y = ∂L/∂z = ∂L/∂λ = 0.

- This gives a system of four equations in the four unknowns (x, y, z, λ) that must be solved.

Page 9

Solving the system

- Calculate the derivatives:

Lx = 2y + z + λyz,  Ly = 2x + 2z + λxz,  Lz = 2y + x + λxy,  Lλ = xyz − V.

- Ensure the constraint:

Lλ = 0 ⇒ z = V/(xy).   (1)

- Eliminate λ and solve for (x, y, z):

Lx = 0 ⇒ 2y + z = −λyz ⇒ λ = −(2/z + 1/y),   (2)

Ly = 0 ⇒ 2x + 2z = −λxz ⇒ λ = −(2/z + 2/x),   (3)

Lz = 0 ⇒ 2y + x = −λxy ⇒ λ = −(2/x + 1/y).   (4)

Page 10

Solving system (continued)

- From (2) and (3) it follows that x = 2y.

- From (2) and (4) it follows that x = z.

- Use (1) with x = z = 2y, then simplify:

x = (x, y, z) = ( ∛(2V), ∛(V/4), ∛(2V) ).
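This stationary point can be checked numerically; a minimal plain-Python sketch (the volume V = 4 is an illustrative choice, not from the slides), using the multiplier implied by the relation ∇f + λ∇g = 0:

```python
# Check the container solution x = z = (2V)^(1/3), y = (V/4)^(1/3).
V = 4.0                      # illustrative volume (assumption, not from the slides)
x = z = (2 * V) ** (1 / 3)   # analytic optimum: x = z = cube root of 2V
y = (V / 4) ** (1 / 3)       # y = cube root of V/4
lam = -(2 / z + 1 / y)       # multiplier from ∇f + λ∇g = 0 (hence the minus sign)

# First-order conditions for L = S + λ(xyz − V) all vanish at the optimum:
Lx = 2 * y + z + lam * y * z
Ly = 2 * x + 2 * z + lam * x * z
Lz = 2 * y + x + lam * x * y
```

For V = 4 this gives the point (2, 1, 2), and all three partials evaluate to zero along with the constraint xyz − V.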

Page 11

The full nonlinear optimisation problem with equality constraints

Method of Lagrange multipliers

Dealing with Inequality Constraints and the Kuhn-Tucker conditions

Second order conditions with constraints

Page 12

Constrained optimisation

We wish to generalise the previous argument to the multivariate framework:

Minimise Z = f (x), x ∈ D ⊂ Rn

such that g(x) = 0, h(x) ≤ 0,

where g : Rⁿ → Rᵐ and h : Rⁿ → Rᵖ.

- Geometrically, the constraints define a subset Ω of Rⁿ, the feasible set, of dimension n − m or less, in which the minimum lies.

Page 13

Constrained Optimisation

- A value x ∈ D is called feasible if both g(x) = 0 and h(x) ≤ 0.

- An inequality constraint hi(x) ≤ 0 is
  - active if the feasible x ∈ D satisfies hi(x) = 0,
  - inactive otherwise.

Page 14

Equality Constraints

A point x∗ which satisfies the constraint g(x∗) = 0 is a regular point if the set

∇g1(x∗), ∇g2(x∗), …, ∇gm(x∗)

is linearly independent. Equivalently,

Rank [ ∂g1/∂x1 ⋯ ∂g1/∂xn ]
     [    ⋮          ⋮    ]  = m.
     [ ∂gm/∂x1 ⋯ ∂gm/∂xn ]
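The rank condition is easy to test numerically; a small numpy sketch, reusing the container example's constraint g(x, y, z) = xyz − V at the point found earlier (with the assumed value V = 4):

```python
import numpy as np

# Regularity check for the container constraint g(x, y, z) = xyz − V (m = 1).
x, y, z = 2.0, 1.0, 2.0                     # optimum of the container example, V = 4
grad_g = np.array([[y * z, x * z, x * y]])  # 1×n Jacobian of g: (yz, xz, xy)
rank = np.linalg.matrix_rank(grad_g)        # equals m = 1, so the point is regular
```

Since the single gradient row is nonzero, the rank is m = 1 and x∗ is a regular point.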

Page 15

Tangent Subspace

Definition

Define the constrained tangent subspace

T = { d ∈ Rⁿ : ∇g(x∗)d = 0 },

where ∇g(x∗) is an m × n matrix defined by

∇g(x∗) := [ ∂g1/∂x1 ⋯ ∂g1/∂xn ]   [ ∇g1(x∗)ᵀ ]
          [    ⋮          ⋮    ] = [    ⋮     ]
          [ ∂gm/∂x1 ⋯ ∂gm/∂xn ]   [ ∇gm(x∗)ᵀ ]

and ∇giᵀ := ( ∂gi/∂x1, …, ∂gi/∂xn ).
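An orthonormal basis of T can be computed as the null space of ∇g(x∗); a minimal numpy sketch for the container example's single constraint (the evaluation point again assumes V = 4), using the SVD rows beyond the rank:

```python
import numpy as np

# Tangent subspace T = {d : ∇g(x*) d = 0} for g(x, y, z) = xyz − V at (2, 1, 2).
grad_g = np.array([[2.0, 4.0, 2.0]])   # (yz, xz, xy) at the point
_, _, Vh = np.linalg.svd(grad_g)       # rows of Vh are orthonormal
E = Vh[1:].T                           # columns: orthonormal basis of T (n − m = 2)
residual = grad_g @ E                  # ≈ 0: each basis vector lies in T
```

The residual being zero confirms that the two columns of E span the tangent subspace.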

Page 16

Tangent Subspace (Example)

Page 17

Generalised Lagrangian

Definition

Define the Lagrangian for the objective function f(x) with constraint equations g(x) = 0:

L(x, λ) := f + ∑_{i=1}^m λi gi = f + λᵀg.

- Introduce a Lagrange multiplier for each equality constraint.

Page 18

First order necessary conditions

Theorem

Assume x∗ is a regular point of g(x) = 0 and let x∗ be a local extremum of f subject to g(x) = 0, with f, g ∈ C¹. Then there exists λ ∈ Rᵐ such that

∇L(x∗, λ) = ∇f(x∗) + λᵀ∇g(x∗) = 0.

In sigma notation:

∂L(x∗)/∂xj = ∂f(x∗)/∂xj + ∑_{i=1}^m λi ∂gi(x∗)/∂xj = 0,  j = 1, …, n;

and

∂L(x∗)/∂λi = gi(x∗) = 0,  i = 1, …, m.

Page 19

Another example

Example

Find the extremum for f(x, y) = ½(x − 1)² + ½(y − 2)² + 1 subject to the single constraint x + y = 1.

Solution: Construct the Lagrangian:

L(x, y, λ) = ½(x − 1)² + ½(y − 2)² + 1 + λ(x + y − 1).

Then the necessary conditions are:

∂L/∂x = x − 1 + λ = 0

∂L/∂y = y − 2 + λ = 0

∂L/∂λ = 0 ⇒ x + y = 1.

Page 20

Solve system

We can solve this by converting the problem into matrix form, i.e. solve for a in Ma = b:

[ 1 0 1 ] [ x ]   [ 1 ]
[ 0 1 1 ] [ y ] = [ 2 ]
[ 1 1 0 ] [ λ ]   [ 1 ]

Inverting the matrix M and solving gives

(x, y, λ) = (0, 1, 1).
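The 3 × 3 system can be checked with a linear solver; a minimal numpy sketch:

```python
import numpy as np

# First-order system for L(x, y, λ) = ½(x−1)² + ½(y−2)² + 1 + λ(x + y − 1).
M = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])
b = np.array([1.0, 2.0, 1.0])

x, y, lam = np.linalg.solve(M, b)   # expected solution (0, 1, 1)
```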

Page 21

Comparison with unconstrained problem

Example

Minimise f(x, y) = ½(x − 1)² + ½(y − 2)² + 1 subject to the single constraint x + y = 1.

- In the unconstrained problem, the minimum is attained at x = (1, 2), with minimum f∗ = f(1, 2) = 1.

- The constraint modifies the solution: the constrained minimum is at f∗ = f(0, 1) = 2.

- Note: the constrained minimum is bounded from below by the unconstrained minimum.

Page 22

Use of convex functions: Example

Example

Minimise: f(x, y) = x² + y² + xy − 5y such that: x + 2y = 5.

The constraint is linear (therefore convex) and the Hessian matrix of f is positive definite, so f is convex as well.

The Lagrangian is

L(x, y, λ) = x² + y² + xy − 5y + λ(x + 2y − 5).

Page 23

Solve

Finding the first order necessary conditions:

∂L/∂x = 2x + y + λ = 0   (5)

∂L/∂y = 2y + x − 5 + 2λ = 0   (6)

∂L/∂λ = x + 2y − 5 = 0.   (7)

In matrix form the system is

[ 2 1 1 ] [ x ]   [ 0 ]
[ 1 2 2 ] [ y ] = [ 5 ]
[ 1 2 0 ] [ λ ]   [ 5 ]

Inverting and solving yields

(x, y, λ) = (1/3)(−5, 10, 0).
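Both the solution and the convexity claim can be verified numerically; a numpy sketch:

```python
import numpy as np

# Linear system (5)–(7) for f(x, y) = x² + y² + xy − 5y with x + 2y = 5.
M = np.array([[2.0, 1.0, 1.0],
              [1.0, 2.0, 2.0],
              [1.0, 2.0, 0.0]])
b = np.array([0.0, 5.0, 5.0])
x, y, lam = np.linalg.solve(M, b)   # expected (−5/3, 10/3, 0)

# Convexity of f: its Hessian [[2, 1], [1, 2]] must be positive definite.
hess_eigs = np.linalg.eigvalsh(np.array([[2.0, 1.0], [1.0, 2.0]]))
```

The Hessian eigenvalues are 1 and 3, both positive, so f is convex and the stationary point is the constrained minimum.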

Page 24

The full nonlinear optimisation problem with equality constraints

Method of Lagrange multipliers

Dealing with Inequality Constraints and the Kuhn-Tucker conditions

Second order conditions with constraints

Page 25

Inequality constraints

We now consider constrained optimisation problems with p inequality constraints:

Minimise: f(x1, x2, . . . , xn)

subject to: h1(x1, x2, . . . , xn) ≤ 0

h2(x1, x2, . . . , xn) ≤ 0

⋮

hp(x1, x2, . . . , xn) ≤ 0 .

In vector form the problem is

Minimise: f(x)

subject to: h(x) ≤ 0.

Page 26

The Kuhn-Tucker Conditions

- To deal with the inequality constraints, introduce the so-called Kuhn-Tucker conditions:

∂f/∂xj + ∑_{i=1}^p λi ∂hi/∂xj = 0,  j = 1, …, n;

λihi = 0,  i = 1, …, p;

λi ≥ 0,  i = 1, …, p.

- The extra conditions on λ apply in this sense:

hi(x) = 0, if λi > 0, i = 1, …, p,

or

hi(x) < 0, if λi = 0, i = 1, …, p.

Page 27

Graphical Interpretation

[Figure: level contours of f(x, y); the curve h(x, y) = 0 (λ > 0) separates the feasible region h(x, y) < 0 (λ = 0) from the infeasible region h(x, y) > 0; optimum at P.]

- If λ > 0, then find the solution on the curve h(x, y) = 0.

- If λ = 0, then find the solution in the region h(x, y) < 0.

Page 28

Graphical Interpretation

[Figure: level contours of f(x, y) with the optimum (x∗, y∗) on the boundary h(x, y) = 0.]

- Consider h(x, y) = 0; then λ > 0. Optimum at x∗ = (x∗, y∗).

Page 29

Graphical Interpretation

[Figure: level contours of f(x, y) with the optimum (x∗, y∗) inside the feasible region h(x, y) < 0.]

- Consider h(x, y) < 0; then λ = 0. Optimum at x∗ = (x∗, y∗).

Page 30

Kuhn-Tucker

- The Kuhn-Tucker conditions are always necessary.

- The Kuhn-Tucker conditions are sufficient if the objective function and the constraint functions are convex.

- We need a procedure to deal with these conditions.

Page 31

Introduce slack variables

Slack Variables

To deal with the inequality constraints, introduce a slack variable si² for each constraint.

This gives a new constrained optimisation problem with p equality constraints:

Minimise: f(x1, x2, …, xn)

subject to: h1(x1, x2, …, xn) + s1² = 0

h2(x1, x2, …, xn) + s2² = 0

⋮

hp(x1, x2, …, xn) + sp² = 0.

Page 32

New Lagrangian

The Lagrangian for the constrained problem is

L(x, λ, s) = f(x) + ∑_{i=1}^p λi (hi(x) + si²) = f(x) + λᵀh(x) + λᵀs,

where s is a vector in Rᵖ with i-th component si².

- L depends on the n + 2p independent variables x1, x2, …, xn, λ1, λ2, …, λp and s1, s2, …, sp.

- To find the constrained minimum we must differentiate L with respect to each of these variables and set the result to zero in each case.

Page 33

Solving the system

We need to solve for the n + 2p variables using the n + 2p equations generated by

∂L/∂x1 = ∂L/∂x2 = ⋯ = ∂L/∂xn = 0,

∂L/∂λ1 = ∂L/∂λ2 = ⋯ = ∂L/∂λp = ∂L/∂s1 = ∂L/∂s2 = ⋯ = ∂L/∂sp = 0.

- Having λi ≥ 0 ensures that

L = f(x) + ∑_{i=1}^p λi (hi(x) + si²)

has a well defined minimum.

- If λi < 0, we can always make L smaller due to the λisi² term.

Page 34

New first order equations

- The differentiation ∂L/∂λi = 0 ensures that the constraints hold:

Lλi = 0 ⇒ hi + si² = 0,  i = 1, …, p.

- The differentiation ∂L/∂si = 0 controls the region in which the minimum is sought:

Lsi = 0 ⇒ 2λisi = 0,  i = 1, …, p.

- If λi > 0 we are on the boundary, since si = 0.

- If λi = 0 we are inside the feasible region, since si ≠ 0.

- If λi < 0 we are on the boundary again, but the point is not a local minimum. So we only focus on λi ≥ 0.

Page 35

Simple application

Example

Consider the general problem

Minimise: f(x)

subject to: a ≤ x ≤ b .

- Re-express the problem to allow the Kuhn-Tucker conditions:

Minimise: f(x)

subject to: −x ≤ −a,  x ≤ b.

- The Lagrangian L is given by

L = f + λ1(−x + s1² + a) + λ2(x + s2² − b).

Page 36

Solving system

- Find the first order equations:

∂L/∂x = f′(x) − λ1 + λ2 = 0,

∂L/∂λ1 = −x + a + s1² = 0,

∂L/∂λ2 = x − b + s2² = 0,

∂L/∂s1 = 2λ1s1 = 0,

∂L/∂s2 = 2λ2s2 = 0.

- Consider the scenarios for when the constraints are at the boundary or inside the feasible region.

Page 37

Possible scenarios

[Figure: three scenarios on [a, b]: DOWN (f′(x) > 0, minimum at x = a), IN (f′(x) = 0 inside the interval), UP (f′(x) < 0, minimum at x = b).]

- Down variable scenario (lower boundary):

s1 = 0 ⇒ λ1 > 0 and s2² > 0 ⇒ λ2 = 0.

This implies that x = a, and f′(x) = f′(a) = λ1 > 0, so that the minimum lies at x = a.

Page 38

Possible scenarios

[Figure: same three scenarios (DOWN, IN, UP) as above.]

- In variable scenario (inside the feasible region):

s1² > 0 ⇒ λ1 = 0 and s2² > 0 ⇒ λ2 = 0.

This implies f′(x) = 0 for some x ∈ [a, b] determined by the specific f(x), and thus there is a turning point somewhere in the interval [a, b].

Page 39

Possible scenarios

[Figure: same three scenarios (DOWN, IN, UP) as above.]

- Up variable scenario (upper boundary):

s1² > 0 ⇒ λ1 = 0 and s2 = 0 ⇒ λ2 > 0.

This implies x = b, so that f′(x) = f′(b) = −λ2 < 0 and we have the minimum at x = b.

Page 40

Numerical example

Example

Minimise: f(x1, x2, x3) = x1(x1 − 10) + x2(x2 − 50) − 2x3

subject to: x1 + x2 ≤ 10

x3 ≤ 10 .

Write down the Lagrangian,

L(x, λ, s) = f(x) + λᵀ(h(x) + s),

where

h(x) = (x1 + x2 − 10, x3 − 10)ᵀ,   s = (s1², s2²)ᵀ.

Page 41

Solve the system of first order equations

∂L/∂x1 = 2x1 − 10 + λ1 = 0   (8)

∂L/∂x2 = 2x2 − 50 + λ1 = 0   (9)

∂L/∂x3 = −2 + λ2 = 0   (10)

The Kuhn-Tucker conditions are λ1, λ2 ≥ 0 and

λ1(x1 + x2 − 10) = 0 (11)

λ2(x3 − 10) = 0 (12)

Immediately from (10) and (12), λ2 = 2 and x3 = 10.

Page 42

Two remaining cases to consider

- Inside the feasible set: λ1 = 0.

From equations (8) and (9) we find

x = (x1, x2, x3) = (5, 25, 10).

However, the constraint x1 + x2 ≤ 10 is violated, so we reject this solution!

- At the boundary: λ1 > 0.

Using constraint equation (11), x1 + x2 = 10. We find the system:

−2x1 − 30 + λ1 = 0

2x1 − 10 + λ1 = 0.

Solving gives λ1 = 20 and x1 = −5, x2 = 15.
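The accepted boundary solution can be verified against all of the Kuhn-Tucker conditions directly; a plain-Python sketch:

```python
# Verify the Kuhn-Tucker conditions at x = (−5, 15, 10) with λ = (20, 2).
x1, x2, x3 = -5.0, 15.0, 10.0
lam1, lam2 = 20.0, 2.0

stationarity = (2 * x1 - 10 + lam1,       # equation (8)
                2 * x2 - 50 + lam1,       # equation (9)
                -2 + lam2)                # equation (10)
complementarity = (lam1 * (x1 + x2 - 10),  # equation (11)
                   lam2 * (x3 - 10))       # equation (12)
feasible = (x1 + x2 <= 10) and (x3 <= 10)
multipliers_nonneg = (lam1 >= 0) and (lam2 >= 0)
```

All stationarity and complementarity residuals are zero, the point is feasible, and both multipliers are non-negative.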

Page 43

The full nonlinear optimisation problem with equality constraints

Method of Lagrange multipliers

Dealing with Inequality Constraints and the Kuhn-Tucker conditions

Second order conditions with constraints

Page 44

Motivation

- Determine sufficient conditions for minima (maxima) given a set of constraints.

- Define the operator ∇² to denote the Hessian matrix: ∇²f := Hf.

- What are the necessary/sufficient conditions on ∇²f and ∇²gj, j = 1, …, m, that ensure a constrained minimum?

- Consider equality constraints first.

- Extend to inequality constraints.

Page 45

Notation

The Hessians of the constraint equations in the upcoming results need care. Define them as follows.

- Constraint Hessian involving the Lagrange multipliers:

λᵀ∇²g(x) := ∑_{i=1}^m λi ∇²gi(x) = ∑_{i=1}^m λi Hgi(x).

Page 46

Necessary second order condition for a constrained minimum with equality constraints

Theorem

Suppose that x∗ is a regular point of g ∈ C² and is a local minimum of f such that g(x) = 0. Then there exists λ ∈ Rᵐ such that

∇f(x∗) + λᵀ∇g(x∗) = 0.

Also, on the tangent plane T = { y : ∇g(x∗)y = 0 }, the Lagrangian Hessian at x∗,

∇²L(x∗) := ∇²f(x∗) + λᵀ∇²g(x∗),

is positive semi-definite, i.e. yᵀ∇²L(x∗)y ≥ 0 for all y ∈ T.

Page 47

Sufficient second order condition for a constrained minimum with equality constraints

Theorem

Suppose that x∗ and λ satisfy

∇f(x∗) + λᵀ∇g(x∗) = 0,

and consider the Lagrangian Hessian at x∗,

∇²L(x∗) := ∇²f(x∗) + λᵀ∇²g(x∗).

If yᵀ∇²L(x∗)y > 0 for all nonzero y ∈ T, then f(x∗) is a local minimum that satisfies g(x∗) = 0.

Page 48

Determine if positive/negative definite

Options for determining the definiteness of ∇²L(x∗) on the subspace T:

- Complete the square of yᵀ∇²L(x∗)y.

- Compute the eigenvalues of ∇²L(x∗) restricted to T.

Example

Max: Z = x1x2 + x1x3 + x2x3

subject to: x1 + x2 + x3 = 3.
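For this example the stationary point is (1, 1, 1) with λ = −2 (from ∇f + λ∇g = 0, since each partial of Z equals 2 there); the constraint is linear, so ∇²L = ∇²Z, and its definiteness on T settles the question. A numpy sketch of the restricted-eigenvalue check:

```python
import numpy as np

# Max Z = x1x2 + x1x3 + x2x3 s.t. x1 + x2 + x3 = 3; stationary point (1, 1, 1).
H = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])   # ∇²L = ∇²Z (the linear constraint adds no λ term)
a = np.array([[1.0, 1.0, 1.0]])   # ∇g

E = np.linalg.svd(a)[2][1:].T           # orthonormal basis of T = {d : ∇g d = 0}
eigs = np.linalg.eigvalsh(E.T @ H @ E)  # eigenvalues of ∇²L restricted to T
# both eigenvalues are −1 < 0: negative definite on T, so (1, 1, 1) is a maximum
```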

Page 49

Extension: Inequality constraints

Problem:

Min f(x)

subject to: h(x) ≤ 0

g(x) = 0.

where f, g, h ∈ C¹.

- Extend the arguments to include both equality and inequality constraints.

- Notation: define the index set J.

Definition

Let J(x∗) be the index set of active constraints:

hj(x∗) = 0, j ∈ J;   hj(x∗) < 0, j ∉ J.
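A tiny helper makes the index-set definition concrete (active_set is a hypothetical name, and the tolerance handles floating-point zeros):

```python
def active_set(h_values, tol=1e-9):
    """Indices j with h_j(x*) = 0 (active); entries with h_j(x*) < 0 are inactive."""
    return [j for j, hj in enumerate(h_values) if abs(hj) <= tol]

# e.g. at the earlier numerical example's solution (−5, 15, 10):
# h = (x1 + x2 − 10, x3 − 10) = (0, 0), so both constraints are active.
```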

Page 50

First order necessary conditions

The point x∗ is called a regular point of the constraints g(x) = 0 and h(x) ≤ 0 if the gradients

∇gi, i = 1, …, m, and ∇hj, j ∈ J,

are linearly independent.

Theorem

Let x∗ be a local minimum of f(x) subject to the constraints g(x) = 0 and h(x) ≤ 0. Suppose further that x∗ is a regular point of the constraints. Then there exist λ ∈ Rᵐ and µ ∈ Rᵖ with µ ≥ 0 such that

∇f(x∗) + λᵀ∇g(x∗) + µᵀ∇h(x∗) = 0,

µᵀh(x∗) = 0.   (13)

Page 51

Second order necessary conditions

Theorem

Let x∗ be a regular point of g(x) = 0, h(x) ≤ 0 with f, g, h ∈ C². If x∗ is a local minimum point of the fully constrained problem, then there exist λ ∈ Rᵐ and µ ∈ Rᵖ such that (13) holds and

∇²f(x∗) + ∑_{i=1}^m λi ∇²gi(x∗) + ∑_{j=1}^p µj ∇²hj(x∗)

is positive semi-definite on the tangent space of the active constraints, i.e. on the space

T = { d ∈ Rⁿ : ∇g(x∗)d = 0;  ∇hj(x∗)d = 0, j ∈ J }.

Page 52

Second order sufficient conditions

Theorem

Let x∗ satisfy g(x∗) = 0, h(x∗) ≤ 0. Suppose there exist λ ∈ Rᵐ and µ ∈ Rᵖ such that µ ≥ 0 and (13) holds and

∇²f(x∗) + ∑_{i=1}^m λi ∇²gi(x∗) + ∑_{j=1}^p µj ∇²hj(x∗)

is positive definite on the tangent space of the active constraints, i.e. on the space

T′ = { d ∈ Rⁿ : ∇g(x∗)d = 0;  ∇hj(x∗)d = 0, j ∈ J },

where J = { j : hj(x∗) = 0, µj > 0 }. Then x∗ is a strict local minimum of f subject to the full constraints.

Page 53

Ways to compute the sufficient conditions

The nature of the Hessian of the Lagrangian function, together with its auxiliary conditions, can be determined in two ways:

1. Eigenvalues on the subspace T .

2. Find the nature of the Bordered Hessian.

Page 54

Eigenvalues on subspaces

Let e1, …, er be an orthonormal basis of T, where r = dim T. Then any d ∈ T can be written

d = Ez,

where z ∈ Rʳ and

E = [ e1 e2 … er ].

Then use

dᵀ∇²L d = zᵀ (Eᵀ∇²L E) z,

so ∇²L is positive (negative) definite on T exactly when the r × r matrix Eᵀ∇²L E has all positive (negative) eigenvalues.

Page 55

Bordered Hessian

An alternative method for determining the behaviour of the Lagrangian Hessian.

Definition

Given an objective function f with constraints g : Rⁿ → Rᵐ, define the Bordered Hessian matrix

B(x, λ) := [ ∇²L(x, λ)   ∇g(x)ᵀ ]
           [ ∇g(x)       0      ]

Find the nature of the Bordered Hessian.
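For the earlier example Max Z = x1x2 + x1x3 + x2x3 with x1 + x2 + x3 = 3 (n = 3, one linear constraint), the Bordered Hessian can be assembled directly. This sketch only builds B and evaluates its determinant; the sign rules for the trailing minors depend on the convention used, so no classification is claimed here:

```python
import numpy as np

# Bordered Hessian at the stationary point (1, 1, 1).
H = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])      # ∇²L
g = np.array([[1.0, 1.0, 1.0]])      # ∇g (1×3)

B = np.block([[H, g.T],
              [g, np.zeros((1, 1))]])   # [[∇²L, ∇gᵀ], [∇g, 0]]
detB = np.linalg.det(B)                  # −3 for this example
```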