
Nonlinear Optimization
M.M. Pedram, [email protected]
Tarbiat Moallem University of Tehran (Spring 2011)


Page 1

Nonlinear Optimization

M.M. [email protected]

Tarbiat Moallem University of Tehran(Spring 2011)

Page 2

References
v Edwin K. P. Chong, Stanislaw H. Zak, An Introduction to Optimization, John Wiley & Sons, 2nd Edition, 2001.
v David G. Luenberger, Linear and Nonlinear Programming, Addison-Wesley Publishing Company, 2nd Edition, 1989.
v S.S. Rao, Optimization: Theory and Applications, John Wiley & Sons, 2nd Edition, 1984.

Data Mining, Spring 2011, TMU, M.M. Pedram, [email protected]

Page 3

Unconstrained Optimization

Page 4

v Let X ∈ ℜn:

v Let A = (aij) ∈ ℜn×n be an n×n symmetric matrix. The k-th order leading principal minor is defined as the determinant of the k×k submatrix formed by the first k rows and columns.

\[ X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} \]

Quadratic Form

\[ \Delta_k = \det \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{k1} & a_{k2} & \cdots & a_{kk} \end{bmatrix} \]

Page 5

Quadratic Form
v Consider the quadratic form:

\[ Q(X) = X^T A X = \sum_{i=1}^{n} a_{ii} x_i^2 + 2 \sum_{1 \le i < j \le n} a_{ij} x_i x_j \]

v The above quadratic form (or simply matrix A) is called:

 Positive semi-definite if XTAX ≥ 0 for all X.

 Positive definite if XTAX > 0 for all X ≠ 0.

 Negative semi-definite if XTAX ≤ 0 for all X.

 Negative definite if XTAX < 0 for all X ≠ 0.

 Indefinite if XTAX < 0 for some X and XTAX > 0 for other X.
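As a quick numerical sanity check, the expansion of XTAX into diagonal and cross terms can be verified in plain Python (the matrix A and the point x below are made-up examples; the function names are mine):

```python
def quad_form(A, x):
    """Compute x^T A x directly via the full double sum."""
    n = len(x)
    return sum(A[i][j] * x[i] * x[j] for i in range(n) for j in range(n))

def quad_form_expanded(A, x):
    """Compute sum_i a_ii x_i^2 + 2 * sum_{i<j} a_ij x_i x_j (A symmetric)."""
    n = len(x)
    diag = sum(A[i][i] * x[i] ** 2 for i in range(n))
    cross = 2 * sum(A[i][j] * x[i] * x[j]
                    for i in range(n) for j in range(i + 1, n))
    return diag + cross

# A symmetric example matrix and a test point (arbitrary values).
A = [[2, 1, 0],
     [1, 3, -1],
     [0, -1, 4]]
x = [1.0, -2.0, 0.5]

assert quad_form(A, x) == quad_form_expanded(A, x)
```

The two forms agree exactly because A is symmetric: each off-diagonal pair a_ij x_i x_j and a_ji x_j x_i collapses into a single doubled term.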

Page 6

Quadratic Form
Theorems: Sylvester's Criterion
v A quadratic form XTAX, A = AT, is positive definite if and only if all the leading principal minors of A are > 0. (For positive semi-definiteness, all principal minors, not only the leading ones, must be ≥ 0.)

v A quadratic form XTAX, A = AT, is negative definite if and only if the k-th leading principal minor of A has the sign of (−1)^k, k = 1, 2, …, n. (For negative semi-definiteness, every principal minor of odd order must be ≤ 0 and every principal minor of even order must be ≥ 0.)
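Sylvester's criterion is easy to apply mechanically. A small plain-Python sketch (the helper names are mine; the determinant uses Laplace expansion, which is fine for the small matrices in these slides):

```python
def det(M):
    """Determinant by Laplace expansion along the first row (ok for small n)."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]
        total += (-1) ** j * M[0][j] * det(minor)
    return total

def leading_principal_minors(A):
    """Delta_k = det of the top-left k x k submatrix, k = 1..n."""
    n = len(A)
    return [det([row[:k] for row in A[:k]]) for k in range(1, n + 1)]

def is_positive_definite(A):
    """Sylvester's criterion: all leading principal minors > 0."""
    return all(d > 0 for d in leading_principal_minors(A))

def is_negative_definite(A):
    """The k-th leading principal minor must have the sign of (-1)^k."""
    return all(d * (-1) ** (k + 1) < 0
               for k, d in enumerate(leading_principal_minors(A), start=1))

assert leading_principal_minors([[2, -1], [-1, 2]]) == [2, 3]
assert is_positive_definite([[2, -1], [-1, 2]])
assert is_negative_definite([[-2, 1], [1, -2]])

# Leading minors alone cannot certify SEMI-definiteness:
# [[0, 0], [0, -1]] has leading minors [0, 0] but is not positive semi-definite.
assert leading_principal_minors([[0, 0], [0, -1]]) == [0, 0]
```

The last check illustrates why the semi-definite version of the test must look at all principal minors, not only the leading ones.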

Page 7

Quadratic Form
Theorems
v A symmetric matrix A is positive definite (or positive semi-definite) if and only if all eigenvalues of A are positive (or nonnegative).

v A symmetric matrix A is negative definite (or negative semi-definite) if and only if all eigenvalues of A are negative (or nonpositive).

Page 8

Optima
v Let f (X) = f (x1, x2, …, xn) be a real-valued function of the n variables x1, x2, …, xn.

v Suppose:

 A point X0 is said to be a local maximum of f (X) if there exists an ε > 0 such that f (X0) ≥ f (X0 + h) for all |hj| ≤ ε.

 A point X0 is said to be a local minimum of f (X) if there exists an ε > 0 such that f (X0) ≤ f (X0 + h) for all |hj| ≤ ε.

 A point X0 is said to be a strict local maximum of f (X) if there exists an ε > 0 such that f (X0) > f (X0 + h) for all h ≠ 0 with |hj| ≤ ε.

 A point X0 is said to be a strict local minimum of f (X) if there exists an ε > 0 such that f (X0) < f (X0 + h) for all h ≠ 0 with |hj| ≤ ε.

\[ X_0 = (x_1^0, x_2^0, \ldots, x_n^0), \qquad h = (h_1, h_2, \ldots, h_n), \qquad X_0 + h = (x_1^0 + h_1,\; x_2^0 + h_2,\; \ldots,\; x_n^0 + h_n) \]

Page 9

Optima

A point X0 is said to be an absolute maximum or global maximum of f (X) if f (X0) ≥ f (X) for all X.

A point X0 is said to be an absolute minimum or global minimum of f (X) if f (X0) ≤ f (X) for all X.

A point X0 is said to be a strict absolute maximum or strict global maximum of f (X) if f (X0) > f (X) for all X ≠ X0.

A point X0 is said to be a strict absolute minimum or strict global minimum of f (X) if f (X0) < f (X) for all X ≠ X0.

Page 10

Example

x1: strict global minimizer
x2: strict local minimizer
x3: local (not strict) minimizer

Page 11

Conditions for Local Minimizers
Theorem: First-Order Necessary Condition (FONC)
v Let Ω be a subset of ℜn and f ∈ C1 a real-valued function on Ω. If x* is a local minimizer of f over Ω, then for any feasible direction d at x*, we have

 dT ∇f(x*) ≥ 0, or ⟨∇f(x*), d⟩ ≥ 0

Corollary: Interior Case
v Let Ω be a subset of ℜn and f ∈ C1 a real-valued function on Ω. If x* is a local minimizer of f over Ω and x* is an interior point of Ω, then

 ∇f(x*) = 0

Page 12

Conditions for Local Minimizers
Theorem (another statement)
v A necessary condition for x* to be an optimum point of f (x) is that ∇f (x*) = 0, i.e., all the first-order partial derivatives ∂f/∂xi are zero at x*.

Definition
v A point x* for which ∇f (x*) = 0 is called a stationary point of f (x). A stationary point is a potential candidate for a local maximum or a local minimum.

Page 13

Example #1
v Illustration of the FONC for the constrained case: x1 does not satisfy the FONC; x2 satisfies the FONC.

Page 14

Example #2
v Consider the problem:

 min f (x1, x2) = x1² + 0.5x2² + 3x2 + 4.5
 s.t.: x1, x2 ≥ 0

a. Is the FONC for a local minimizer satisfied at x = [1,3]T?
b. Is the FONC for a local minimizer satisfied at x = [0,3]T?
c. Is the FONC for a local minimizer satisfied at x = [1,0]T?
d. Is the FONC for a local minimizer satisfied at x = [0,0]T?

Solution
v ∇f (x1, x2) = [2x1, x2 + 3]T

v A plot of the level sets of f :

Page 15

Example #2
a. At x = [1,3]T, we have ∇f (x1,x2) = [2,6]T. The point x = [1,3]T is an interior point of Ω = {x : x1 ≥ 0, x2 ≥ 0}. Hence, the FONC requires ∇f (x1,x2) = 0. The point x = [1,3]T does not satisfy the FONC for a local minimizer.

b. At x = [0,3]T, we have ∇f (x1,x2) = [0,6]T, and hence dT ∇f (x1,x2) = 6d2, where d = [d1,d2]T. For d to be feasible at x, we need d1 ≥ 0, and d2 can take an arbitrary value in ℜ. The point x = [0,3]T does not satisfy the FONC for a minimizer because d2 is allowed to be less than zero. For example, d = [1, −1]T is a feasible direction, but dT ∇f (x1,x2) = −6 < 0.

c. At x = [1,0]T, we have ∇f (x1,x2) = [2,3]T, and hence dT ∇f (x1,x2) = 2d1 + 3d2. For d to be feasible at x, we need d2 ≥ 0, and d1 can take an arbitrary value in ℜ. For example, d = [−5, 1]T is a feasible direction, but dT ∇f (x1,x2) = −7 < 0. Thus x = [1,0]T does not satisfy the FONC for a local minimizer.

d. At x = [0,0]T, we have ∇f (x1,x2) = [0,3]T, and hence dT ∇f (x1,x2) = 3d2. For d to be feasible at x, we need d1 ≥ 0 and d2 ≥ 0. Hence x = [0,0]T satisfies the FONC for a local minimizer.
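The four cases can be checked numerically. A minimal sketch in plain Python (the helper names are mine):

```python
def grad_f(x):
    """Gradient of f(x1, x2) = x1^2 + 0.5*x2^2 + 3*x2 + 4.5."""
    x1, x2 = x
    return (2 * x1, x2 + 3)

def directional(d, g):
    """The inner product d^T grad f."""
    return d[0] * g[0] + d[1] * g[1]

# (a) interior point [1,3]: FONC requires grad f = 0, but grad f = (2,6) != 0.
assert grad_f((1, 3)) == (2, 6)

# (b) boundary point [0,3]: d = [1,-1] is feasible (d1 >= 0), yet d^T grad f < 0.
assert directional((1, -1), grad_f((0, 3))) == -6

# (c) boundary point [1,0]: d = [-5,1] is feasible (d2 >= 0), yet d^T grad f < 0.
assert directional((-5, 1), grad_f((1, 0))) == -7

# (d) corner [0,0]: every feasible d has d1, d2 >= 0, so d^T grad f = 3*d2 >= 0.
assert directional((1, 1), grad_f((0, 0))) == 3
```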

Page 16

Example #3: Function Approximation
v Suppose that through an experiment the value of a function g is observed at m points, x1, x2, …, xm. Thus, the values g(x1), g(x2), …, g(xm) are known. We wish to approximate the function g(x) by a polynomial

 h(x) = an x^n + an−1 x^(n−1) + … + a0

of degree n (or less), where n < m. Find the ai's.

Solution
v Define: ek = g(xk) − h(xk).
v The best approximation is the polynomial that minimizes the sum of the squares of these errors:

\[ \min \sum_{k=1}^{m} e_k^2 \]

Page 17

Example: Function Approximation
v or

\[ \min f(a_n, a_{n-1}, \ldots, a_0) = \sum_{k=1}^{m} \Big( g(x_k) - \big( a_n x_k^{n} + a_{n-1} x_k^{n-1} + \cdots + a_0 \big) \Big)^2 \]

v For the FONC:

\[ \frac{\partial f}{\partial a_i} = -2 \sum_{k=1}^{m} x_k^{i} \Big( g(x_k) - \big( a_n x_k^{n} + a_{n-1} x_k^{n-1} + \cdots + a_0 \big) \Big) = 0, \qquad i = 0, 1, \ldots, n \]

or

\[ a_n \sum_{k=1}^{m} x_k^{n+i} + a_{n-1} \sum_{k=1}^{m} x_k^{n-1+i} + \cdots + a_0 \sum_{k=1}^{m} x_k^{i} = \sum_{k=1}^{m} x_k^{i}\, g(x_k), \qquad i = 0, 1, \ldots, n \]

Page 18

Example: Function Approximation
v In matrix form:

\[ \left[ \sum_{k=1}^{m} x_k^{\,i+j} \right]_{(n+1)\times(n+1)} \begin{bmatrix} a_n \\ \vdots \\ a_0 \end{bmatrix}_{(n+1)\times 1} = \left[ \sum_{k=1}^{m} x_k^{\,i}\, g(x_k) \right]_{(n+1)\times 1} \]

(row i, column j of the left-hand matrix holds \( \sum_k x_k^{\,i+j} \), for a suitable ordering of the indices i, j = 0, 1, …, n)

v This leads directly to the system of (n+1) equations, which can be solved to determine the ai's.

Page 19

Example: Function Approximation
v Let

\[ A = \left[ \sum_{k=1}^{m} x_k^{\,i+j} \right]_{(n+1)\times(n+1)}, \qquad a = \begin{bmatrix} a_n \\ \vdots \\ a_0 \end{bmatrix}_{(n+1)\times 1}, \]

\[ b = \left[ \sum_{k=1}^{m} x_k^{\,j}\, g(x_k) \right]_{(n+1)\times 1}, \qquad c = \sum_{k=1}^{m} g(x_k)^2 \]

Page 20

Example: Function Approximation
v The problem can be stated as the following quadratic form:

\[ \min f(a_n, a_{n-1}, \ldots, a_0) = a^T A\, a - 2\, b^T a + c \]

v Then, as said before, the solution is determined by solving the following system of (n+1) equations:

\[ A\, a = b \]

v It should be noted that this answer is the solution to the LSE (least-squares estimation) problem.
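For a concrete instance, the normal equations Aa = b can be assembled and solved directly. A plain-Python sketch (the data and function names are mine; the coefficients are ordered a0, …, an for convenience, and the sample points come from an exact quadratic, so the fit recovers its coefficients):

```python
def normal_equations(xs, ys, n):
    """Build A[i][j] = sum_k x_k^(i+j) and b[i] = sum_k x_k^i * g(x_k),
    for coefficients ordered a_0, a_1, ..., a_n."""
    A = [[sum(x ** (i + j) for x in xs) for j in range(n + 1)]
         for i in range(n + 1)]
    b = [sum((x ** i) * y for x, y in zip(xs, ys)) for i in range(n + 1)]
    return A, b

def solve(A, b):
    """Gaussian elimination with partial pivoting on the augmented matrix."""
    m = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, m):
            f = M[r][col] / M[col][col]
            for c in range(col, m + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (M[r][m] - sum(M[r][c] * x[c] for c in range(r + 1, m))) / M[r][r]
    return x

# Sample g(x) = 1 + 2x + 3x^2 at m = 5 points and fit a degree-2 polynomial.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1 + 2 * x + 3 * x ** 2 for x in xs]
A, b = normal_equations(xs, ys, n=2)
a0, a1, a2 = solve(A, b)
assert abs(a0 - 1) < 1e-8 and abs(a1 - 2) < 1e-8 and abs(a2 - 3) < 1e-8
```

Because m = 5 > n + 1 = 3 and the data lie exactly on a quadratic, the least-squares solution reproduces the generating polynomial.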

Page 21

Conditions for Local Minimizers
Definition
v Hessian matrix of f (X): H(X) is the n×n matrix whose (i, j)-th entry is the partial derivative of ∂f/∂xj with respect to xi (i, j = 1, 2, …, n):

\[ H(x) = \nabla^2 f(x) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1^2} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2^2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n^2} \end{bmatrix} \]

Page 22

Example

\[ f(x_1, x_2, x_3) = 7 x_1 - 3 x_2^2 + 4 x_1 x_3^3 \]

\[ \nabla f = \begin{bmatrix} 7 + 4 x_3^3 \\ -6 x_2 \\ 12 x_1 x_3^2 \end{bmatrix}, \qquad \nabla^2 f = \begin{bmatrix} 0 & 0 & 12 x_3^2 \\ 0 & -6 & 0 \\ 12 x_3^2 & 0 & 24 x_1 x_3 \end{bmatrix} \]
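Assuming the reconstruction of f above, the hand-derived gradient can be cross-checked against central finite differences, a useful habit whenever gradients and Hessians are computed by hand (a plain-Python sketch; the names and the test point are mine):

```python
def f(x):
    x1, x2, x3 = x
    return 7 * x1 - 3 * x2 ** 2 + 4 * x1 * x3 ** 3

def grad(x):
    """Hand-derived gradient of f."""
    x1, x2, x3 = x
    return [7 + 4 * x3 ** 3, -6 * x2, 12 * x1 * x3 ** 2]

def num_grad(x, h=1e-6):
    """Central-difference approximation of the gradient."""
    g = []
    for i in range(3):
        xp = list(x); xp[i] += h
        xm = list(x); xm[i] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g

x = [1.0, 2.0, 0.5]
for analytic, numeric in zip(grad(x), num_grad(x)):
    assert abs(analytic - numeric) < 1e-4
```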

Page 23

Conditions for Local Minimizers
Theorem: Second-Order Necessary Condition (SONC)
v Let Ω ⊂ ℜn, f ∈ C2 a real-valued function on Ω, x* a local minimizer of f over Ω, and d a feasible direction at x*. If dT ∇f(x*) = 0, then

 dT H(x*) d ≥ 0

where H is the Hessian of f.

Corollary: Interior Case
v Let x* be an interior point of Ω ⊂ ℜn. If x* is a local minimizer of f : Ω → ℜ, f ∈ C2, then

 ∇f(x*) = 0, and dT H(x*) d ≥ 0 for all d ∈ ℜn,

i.e., H(x*) is positive semi-definite.

Page 24

Conditions for Local Minimizers
Theorem: Second-Order Sufficient Condition (SOSC), Interior Case
v Let f ∈ C2 be defined on a region in which x* is an interior point. Suppose that:
 1. ∇f(x*) = 0,
 2. H(x*) is positive definite, i.e., dT H(x*) d > 0 for all d ≠ 0.
Then x* is a strict local minimizer of f.

Page 25

Conditions for Local Minimizers
Theorem
v Let X0 be a stationary point of f (X). A sufficient condition for X0 to be a

 local minimum of f (X) is that the Hessian matrix H(X0) is positive definite;

 local maximum of f (X) is that the Hessian matrix H(X0) is negative definite.

If H(X0) is neither negative definite nor positive definite:

Ø If det H(X0) = 0, then X0 may be a local minimum, a local maximum, or a saddle point (the test is inconclusive).

Ø If det H(X0) ≠ 0, then X0 is not an optimum (it is a saddle point).

Page 26

Conditions for Local Minimizers
Corollary
v If the Hessian matrix H(X) is indefinite at a point X0 where the necessary conditions are satisfied, then the point X0 is not an extreme point.

Page 27

Conditions for Local Minimizers
Question
v How can a sufficient condition be determined when H(X) is only semi-definite?

Page 28

Review: Necessary and Sufficient Conditions
Local Minimum
v Necessary conditions
 First-order (FONC): ∇f(x0) = 0 (x0: stationary point)
 Second-order (SONC): H(x0) = ∇²f(x0) is positive semi-definite.
v Sufficient conditions
 First-order: ∇f(x0) = 0 (x0: stationary point)
 Second-order (SOSC): H(x0) = ∇²f(x0) is positive definite.

Global Minimum
v Compare all local minima.

Page 29

Example
v Find the stationary points of the function

 f (x1, x2, x3) = 2x1x2x3 − 4x1x3 − 2x2x3 + x1² + x2² + x3² − 2x1 − 4x2 + 4x3

and hence find the extrema of f.

Solution:

\[ \frac{\partial f}{\partial x_1} = 2 x_2 x_3 - 4 x_3 + 2 x_1 - 2 = 0 \qquad (1) \]
\[ \frac{\partial f}{\partial x_2} = 2 x_1 x_3 - 2 x_3 + 2 x_2 - 4 = 0 \qquad (2) \]
\[ \frac{\partial f}{\partial x_3} = 2 x_1 x_2 - 4 x_1 - 2 x_2 + 2 x_3 + 4 = 0 \qquad (3) \]

Page 30

Example
Solving (2) for x2 (x2 = 2 + x3 − x1x3) and substituting in (3):

 2x1 + x1x3 − x1²x3 − 2x1 − 2 − x3 + x1x3 + x3 = −2

or

 x1x3 (2 − x1) = 0

thus x1 = 0, or x3 = 0, or x1 = 2.

v Case (i) x1 = 0:

 (1) ⇒ x2x3 − 2x3 = 1 (4)

 (2) ⇒ x2 − x3 = 2 (5)

 (3) ⇒ −x2 + x3 = −2, same as (5)

 (4) using (5) ⇒ x3(2 + x3) − 2x3 = 1, or x3² = 1, i.e. x3 = ±1

Page 31

Example
 Sub-case (i): x3 = 1 (using (5)) ⇒ x2 = 3

 Sub-case (ii): x3 = −1 (using (5)) ⇒ x2 = 1

 There are 2 stationary points: (0, 3, 1), (0, 1, −1).

v Case (ii) x3 = 0:

 (1) ⇒ x1 = 1

 (2) ⇒ x2 = 2

 (3) is satisfied: x1x2 − 2x1 − x2 = −2. Therefore, the stationary point is (1, 2, 0).

v Case (iii) x1 = 2:

 (1) ⇒ x2x3 − 2x3 = −1 (6)

 (2) ⇒ x2 + x3 = 2 (7)

 (3) ⇒ x2 + x3 = 2, same as (7)

Page 32

Example
(6) using (7) ⇒ x3(2 − x3) − 2x3 = −1, i.e. x3² = 1 ⇒ x3 = ±1

 Sub-case (i): x3 = 1 ⇒ (using (7)) x2 = 1

 Sub-case (ii): x3 = −1 ⇒ (using (7)) x2 = 3

 There are 2 stationary points: (2, 1, 1), (2, 3, −1).

The Hessian matrix:

\[ H(X) = \begin{bmatrix} 2 & 2 x_3 & 2 x_2 - 4 \\ 2 x_3 & 2 & 2 x_1 - 2 \\ 2 x_2 - 4 & 2 x_1 - 2 & 2 \end{bmatrix} \]

Page 33

Example

Point      | Leading principal minors | Nature
(0, 3, 1)  | 2, 0, −32                | Saddle point
(0, 1, −1) | 2, 0, −32                | Saddle point
(1, 2, 0)  | 2, 4, 8                  | Local min
(2, 1, 1)  | 2, 0, −32                | Saddle point
(2, 3, −1) | 2, 0, −32                | Saddle point
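The table can be reproduced by evaluating the leading principal minors of H(X) at each stationary point (a plain-Python sketch; the helper names are mine):

```python
def det3(M):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    a, b, c = M[0]; d, e, f = M[1]; g, h, i = M[2]
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

def hessian(x1, x2, x3):
    """H(X) for f = 2x1x2x3 - 4x1x3 - 2x2x3 + x1^2 + x2^2 + x3^2 - 2x1 - 4x2 + 4x3."""
    return [[2, 2 * x3, 2 * x2 - 4],
            [2 * x3, 2, 2 * x1 - 2],
            [2 * x2 - 4, 2 * x1 - 2, 2]]

def leading_minors(H):
    d1 = H[0][0]
    d2 = H[0][0] * H[1][1] - H[0][1] * H[1][0]
    return [d1, d2, det3(H)]

# Reproduce the table above.
for point, expected in [((0, 3, 1), [2, 0, -32]),
                        ((0, 1, -1), [2, 0, -32]),
                        ((1, 2, 0), [2, 4, 8]),
                        ((2, 1, 1), [2, 0, -32]),
                        ((2, 3, -1), [2, 0, -32])]:
    assert leading_minors(hessian(*point)) == expected
```

At (1, 2, 0) all leading minors are positive, so H is positive definite and the point is a local minimum; at the other four points det H = −32 ≠ 0 while H is not definite, so they are saddle points.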

Page 34

Convex & Concave Functions
Definition
v A function f(X) = f(x1, x2, …, xn) of n variables is said to be convex if for each pair of points X1, X2 on the graph, the line segment joining these two points lies entirely above or on the graph, i.e.

 f((1 − α)X1 + αX2) ≤ (1 − α) f(X1) + α f(X2) for all α, 0 ≤ α ≤ 1.

f is said to be strictly convex if for each pair of distinct points X1, X2,

 f((1 − α)X1 + αX2) < (1 − α) f(X1) + α f(X2) for all α, 0 < α < 1.

v f is called concave (strictly concave) if −f is convex (strictly convex).
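The defining chord inequality can be checked on a grid of sample points. This is only a necessary (sampled) check, not a proof of convexity; the sketch below, with names of my choosing, illustrates the definition:

```python
def is_convex_on_samples(f, points, alphas):
    """Check f((1-a)x1 + a*x2) <= (1-a)f(x1) + a*f(x2) on a finite sample.
    Passing the check does not prove convexity; failing it disproves it."""
    tol = 1e-12
    for x1 in points:
        for x2 in points:
            for a in alphas:
                lhs = f((1 - a) * x1 + a * x2)
                rhs = (1 - a) * f(x1) + a * f(x2)
                if lhs > rhs + tol:
                    return False
    return True

pts = [i / 2 for i in range(-10, 11)]       # grid on [-5, 5]
alphas = [i / 10 for i in range(11)]        # alpha in {0, 0.1, ..., 1}

assert is_convex_on_samples(lambda x: x * x, pts, alphas)       # convex
assert not is_convex_on_samples(lambda x: -x * x, pts, alphas)  # concave
```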

Page 35

Example: convex function
v f(x) = x²

[Figure: parabola y = x² with the chord joining the points above X1 and X2 lying above the graph; (1 − α)X1 + αX2 is marked on the x-axis.]

Page 36

Example: concave function
v f(x) = −x²

[Figure: parabola y = −x² with the chord joining the points above X1 and X2 lying below the graph; (1 − α)X1 + αX2 is marked on the x-axis.]

Page 37

Example: a function that is neither convex nor concave

[Figure: oscillating function over roughly [−60, 60] with values in [−0.4, 1.4]; points X1 and X2 marked.]

Page 38

Convexity for a function of one variable

 Convex: d²f/dx² ≥ 0 (strictly convex if d²f/dx² > 0)

 Concave: d²f/dx² ≤ 0 (strictly concave if d²f/dx² < 0)

Page 39

Convexity test for functions of 2 variables

Principal minors    | Convex | Strictly convex | Concave | Strictly concave
fxx                 | ≥ 0    | > 0             | ≤ 0     | < 0
fxx fyy − (fxy)²    | ≥ 0    | > 0             | ≥ 0     | > 0

\[ H(X) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x^2} & \dfrac{\partial^2 f}{\partial x \partial y} \\ \dfrac{\partial^2 f}{\partial y \partial x} & \dfrac{\partial^2 f}{\partial y^2} \end{bmatrix} = \begin{bmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{bmatrix} \]

Page 40

Example
v Determine whether f is convex, concave, or neither:

 f (x1, x2) = 3x1 + 5x2 − 4x1² + x2² − 5x1x2

Solution
v Put f in matrix form:

\[ f(x_1, x_2) = \begin{bmatrix} 3 & 5 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} x_1 & x_2 \end{bmatrix} \begin{bmatrix} -4 & -\tfrac{5}{2} \\ -\tfrac{5}{2} & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \quad \Rightarrow \quad A = \begin{bmatrix} -4 & -\tfrac{5}{2} \\ -\tfrac{5}{2} & 1 \end{bmatrix} \]

Page 41

Example

\[ \det(A - \lambda I) = 0 \;\Rightarrow\; \det \begin{bmatrix} -4-\lambda & -\tfrac{5}{2} \\ -\tfrac{5}{2} & 1-\lambda \end{bmatrix} = 0 \;\Rightarrow\; \lambda^2 + 3\lambda - \tfrac{41}{4} = 0 \]

\[ \Rightarrow\; \lambda_1 = \frac{-3 + 5\sqrt{2}}{2} > 0, \qquad \lambda_2 = \frac{-3 - 5\sqrt{2}}{2} < 0 \]

v Since one eigenvalue is positive and one is negative, A is neither positive definite nor negative definite, which implies that f is neither convex nor concave.
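For a symmetric 2×2 matrix, the eigenvalues follow from the characteristic equation λ² − tr(A)λ + det(A) = 0, so the computation above is easy to verify numerically (a plain-Python sketch; the function name is mine):

```python
import math

def eig_sym_2x2(A):
    """Eigenvalues of a symmetric 2x2 matrix from its characteristic equation
    lambda^2 - tr(A)*lambda + det(A) = 0."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    disc = math.sqrt(tr * tr - 4 * det)  # always real for symmetric A
    return (tr + disc) / 2, (tr - disc) / 2

A = [[-4.0, -2.5],
     [-2.5, 1.0]]
lam1, lam2 = eig_sym_2x2(A)

# lambda = (-3 +/- 5*sqrt(2)) / 2: one positive, one negative -> indefinite
assert abs(lam1 - (-3 + 5 * math.sqrt(2)) / 2) < 1e-12
assert abs(lam2 - (-3 - 5 * math.sqrt(2)) / 2) < 1e-12
assert lam1 > 0 > lam2
```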

Page 42

Example
v Find a local minimum of f:

 f (x) = x³

Solution

 ∇f (x) = 3x² = 0 ⇒ x = 0
 H(x) = 6x ⇒ H(0) = 0

v The point x = 0 satisfies the FONC and the SONC, but not the SOSC. (In fact, x = 0 is not a local minimizer, since f (x) < f (0) for every x < 0.)
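The claim is easy to confirm numerically (plain Python):

```python
def f(x): return x ** 3
def df(x): return 3 * x ** 2   # f'(x)
def d2f(x): return 6 * x       # f''(x)

# x = 0 satisfies FONC (f'(0) = 0) and SONC (f''(0) = 0 >= 0), but not SOSC.
assert df(0) == 0
assert d2f(0) == 0

# Still not a local minimizer: points just left of 0 have smaller f-values.
eps = 1e-3
assert f(-eps) < f(0)
```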

Page 43

Example
v Find a local minimum of f:

 f (x1, x2) = x1² + x2²

Solution

\[ \nabla f(x_1, x_2) = \begin{bmatrix} 2 x_1 \\ 2 x_2 \end{bmatrix} = 0 \;\Rightarrow\; X = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad H(X) = \begin{bmatrix} 2 & 0 \\ 0 & 2 \end{bmatrix} \text{ for all } X \in \Re^2 \;\Rightarrow\; H \text{ is positive definite} \]

v The point X = [0,0]T satisfies the FONC, SONC, and SOSC. It is a strict local minimizer.
v Actually, X = [0,0]T is a strict global minimizer.

Page 44

Example
v Find a local minimum of f:

 f (x1, x2) = x1² − x2²

Solution

\[ \nabla f(x_1, x_2) = \begin{bmatrix} 2 x_1 \\ -2 x_2 \end{bmatrix} = 0 \;\Rightarrow\; X = \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \qquad H(X) = \begin{bmatrix} 2 & 0 \\ 0 & -2 \end{bmatrix} \text{ for all } X \in \Re^2 \;\Rightarrow\; H \text{ is indefinite} \]

v The point X = [0,0]T satisfies the FONC, but the SONC is not satisfied.
v It is a saddle point.

Page 45

Example

[Figure: surface plot of f (x1, x2) = x1² − x2² (saddle surface).]

Page 46

Importance of Concave Functions in NLP
v Suppose that we have an NLP with the following properties:
 the feasible region, say Ω, is convex;
 the objective function, say f, is concave;
 the objective is to maximize the value of the objective function:

 max z = f(x) s.t. x ∈ Ω

Then:
v Any local maximum is a global maximum!

Page 47

Importance of Concave Functions in NLP
Proof
v Suppose there exists a solution x′ that is a local maximum but not a global maximum. Since x′ is not a global maximum, there exists a solution x with the property that f(x) > f(x′).
v Now use the fact that f is concave. Since f is concave, we have

\[ f(\alpha x' + (1-\alpha) x) \;\ge\; \alpha f(x') + (1-\alpha) f(x) \;>\; \alpha f(x') + (1-\alpha) f(x') \;=\; f(x'), \qquad 0 < \alpha < 1, \]

where the strict inequality uses f(x) > f(x′).
v If α is very close to 1, then αx′ + (1 − α)x is arbitrarily close to x′, and yet f(αx′ + (1 − α)x) > f(x′).
v Therefore, x′ cannot be a local maximum. This is a contradiction to our assumption that there exists a local maximum that is not a global maximum!

Page 48

Importance of Convex Functions in NLP
v Suppose that we have an NLP with the following properties:
 the feasible region, say Ω, is convex;
 the objective function, say f, is convex;
 the objective is to minimize the value of the objective function:

 min z = f(x) s.t. x ∈ Ω

Then:
v Any local minimum is a global minimum!
v The Hessian matrix of a convex function is positive semi-definite.

Page 49

Importance of Convex Functions in NLP
Another reason:
v Basic optimization algorithms search for local optima. Those that try to find global optima generally just run the underlying algorithms several times, starting at different solutions.

Page 50

Properties of Convex/Concave Functions
Theorem
v Let f ∈ C1. Then f is convex over a convex set Ω if and only if

 f(x) ≥ f(x0) + ∇f(x0)(x − x0)

for all x, x0 ∈ Ω.

v A convex function lies above its tangent planes.

[Figure: a convex curve f(x) lying above its tangent line at x0.]
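The tangent-plane inequality can also be sampled numerically; in one dimension it reads f(x) ≥ f(x0) + f′(x0)(x − x0). A sketch (the names are mine; this is a sampled check, not a proof):

```python
def f(x): return x * x
def df(x): return 2 * x

def lies_above_tangent(f, df, x0, xs, tol=1e-12):
    """Check f(x) >= f(x0) + f'(x0)*(x - x0) on a grid of sample points."""
    return all(f(x) >= f(x0) + df(x0) * (x - x0) - tol for x in xs)

xs = [i / 4 for i in range(-20, 21)]  # grid on [-5, 5]

# The convex function x^2 lies above every one of its tangent lines.
assert all(lies_above_tangent(f, df, x0, xs) for x0 in xs)

# The non-convex function x^3 fails the test for some tangent point:
assert not lies_above_tangent(lambda x: x ** 3, lambda x: 3 * x ** 2, 1.0, xs)
```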

Page 51

Properties of Convex/Concave Functions
Theorem
v Let f ∈ C2. Then f is convex over a convex set Ω containing an interior point if and only if the Hessian matrix H of f is positive semi-definite throughout Ω.

Page 52

Properties of Convex/Concave Functions
Notes
1. The Hessian matrix is the generalization to ℜn of the concept of the curvature of a function, and correspondingly, positive definiteness of the Hessian is the generalization of positive curvature. Convex functions have positive (or at least nonnegative) curvature in every direction.
2. We sometimes refer to a function as being locally convex if its Hessian matrix is positive semi-definite in a small region, and locally strictly convex if the Hessian is positive definite in that region.

Page 53

Properties of Convex/Concave Functions
Theorem 1
v Let f be a convex function defined on the convex set Ω. Then the set Γ where f achieves its minimum is convex, and any relative minimum of f is a global minimum.

Page 54

Properties of Convex/Concave Functions
Theorem 2
v Let f ∈ C1 be convex on the convex set Ω. If there is a point x* ∈ Ω such that, for all y ∈ Ω, ∇f(x*)(y − x*) ≥ 0, then x* is a global minimum point of f over Ω.

Page 55

Properties of Convex/Concave Functions
Theorem 3
v Let f be a convex function defined on the bounded, closed convex set Ω. If f has a maximum over Ω, it is achieved at an extreme point of Ω.

Page 56

v Extrema = minimum / maximum

Page 57

Constrained Optimization

Page 58

Level Sets of a Function
Definition
v The level set of a function f : ℜn → ℜ at level c is the set of points

 S = {x | f(x) = c}

v For f : ℜ2 → ℜ, we are usually interested in S when it is a curve.
v For f : ℜ3 → ℜ, the sets S most often considered are surfaces.

Page 59

Non-linear optimization with constraints 3

Data Mining, Spring 2011TMU, M.M.Pedram, [email protected]

Page 60:

Example #3:
v Product Mix Problem:

Eq. (4):  Max Z = 13x1 + 11x2   (Income)
s.t.
4x1 + 5x2 ≤ 1500   (Storage Space)
5x1 + 3x2 ≤ 1575   (Raw Material)
x1 + 2x2 ≤ 420     (Production Rate)
x1 ≥ 0, x2 ≥ 0

Page 61:

Example #3:

Max Z = 13x1 + 11x2
s.t.
d1 : 4x1 + 5x2 + s1 = 1500
d2 : 5x1 + 3x2 + s2 = 1575
d3 : x1 + 2x2 + s3 = 420

Optimal point: (x1, x2) = (270, 75), Zmax = 4335

(figure: the lines d1, d2, d3 in the (x1, x2) plane, with intercepts 375, 315, 420 on the x1 axis and 300, 525, 210 on the x2 axis, bounding the feasible region)
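The reported optimum can be checked directly, without a solver, by evaluating the objective and the three constraints at (270, 75):

```python
x1, x2 = 270, 75                  # candidate optimum from the graphical solution
Z = 13 * x1 + 11 * x2             # income
assert Z == 4335                  # matches the reported Zmax
assert 4 * x1 + 5 * x2 <= 1500    # storage space (slack s1 = 45)
assert 5 * x1 + 3 * x2 == 1575    # raw material: binding at the optimum
assert x1 + 2 * x2 == 420         # production rate: binding at the optimum
```

The two binding constraints (d2 and d3) identify (270, 75) as the corner point where those lines intersect, which is why the graphical solution lands there.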

Page 62:

Example #3:

(figure: level curves of Z = 13x1 + 11x2, at levels 1000 through 8000, over the feasible region in the (x1, x2) plane, with both axes running from 0 to 350)

Page 63:

v f(x1, x2) = x1² + x2²

(figure: surface plot of f over the (x1, x2) plane, x1, x2 ∈ [−10, 10], f ranging from 0 to 200)

Page 64:

(figure: level curves of f(x1, x2) = x1² + x2² — concentric circles at levels 5, 10, 20, 40, 60, 80, 100, 120 — for x1, x2 ∈ [−10, 10])

Page 65:

v f(x1, x2) = x1² − x2²

(figure: saddle-shaped surface plot of f for x1, x2 ∈ [−10, 10], and its level curves — hyperbolas at levels −80, −60, −40, −20, 0, 20, 40, 60, 80)

Page 66:

v Rosenbrock’s function: f(x1, x2) = 100(x2 − x1²)² + (1 − x1)²

(figure: surface plot of f for x1, x2 ∈ [−2, 2], f ranging from 0 to 3000)
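Rosenbrock’s function is easy to evaluate directly; its global minimum value 0 is attained at (1, 1), which makes it a standard test case for optimization methods:

```python
def rosenbrock(x1, x2):
    # f(x1, x2) = 100*(x2 - x1^2)^2 + (1 - x1)^2
    return 100 * (x2 - x1**2)**2 + (1 - x1)**2

assert rosenbrock(1.0, 1.0) == 0.0    # the global minimizer
assert rosenbrock(0.0, 0.0) == 1.0    # nearby points have larger values
assert rosenbrock(-1.0, 1.0) == 4.0
```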

Page 67:

(figure: level curves of Rosenbrock’s function at levels 0.7, 7, 70, 200, 500, 700, 1000, for x1 ∈ [−2, 2], x2 ∈ [−1, 3])

Page 68:

v Peaks function:

(figure: surface plot of the peaks function for x1, x2 ∈ [−3, 3], f ranging from about −10 to 5)

Page 69:

The Importance of Level Sets
Theorem
v The vector ∇f(x₀) is orthogonal to the tangent vector of an arbitrary smooth curve passing through x₀ on the level set determined by f(x) = f(x₀).
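The theorem can be verified numerically for f(x1, x2) = x1² + x2², whose level sets are circles; at a point x₀ on the circle, a tangent direction is (−x2, x1) (this concrete function and point are illustrative choices, not taken from the slide):

```python
def grad_f(x1, x2):
    # gradient of f(x1, x2) = x1^2 + x2^2
    return (2 * x1, 2 * x2)

x0 = (3.0, 4.0)               # a point on the level set f(x) = 25
tangent = (-x0[1], x0[0])     # tangent direction to the circle at x0
g = grad_f(*x0)
dot = g[0] * tangent[0] + g[1] * tangent[1]
assert abs(dot) < 1e-12       # the gradient is orthogonal to the tangent
```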

Page 70:

The Importance of Convexity
v As said before, convexity guarantees that a local optimum is a global optimum.

Page 71:

Possible Optimal Solutions to Convex NLPs (not occurring at corner points)

(figure: four panels, each showing a feasible region, an objective-function level curve, and the optimal solution:
- linear objective, nonlinear constraints
- nonlinear objective, nonlinear constraints
- nonlinear objective, linear constraints
- nonlinear objective, linear constraints)

Page 72:

Local vs. Global Optimal Solutions for Nonconvex NLPs

(figure: a nonconvex feasible region in the (x1, x2) plane with labeled points A–G; several are local optimal solutions, and one is both a local and global optimal solution)

Page 73:

NLP with Equality Constraints

Min f(x),  x = [x1 x2 … xn]ᵀ
s.t.  hj(x) = 0,  j = 1, 2, …, m

Page 74:

The Lagrangian Function
v Let us introduce the Lagrangian function, L(x,λ), as:

L(x,λ) = f(x) + λᵀ h(x)

where

h(x) = [h1(x) h2(x) … hm(x)]ᵀ,  λ = [λ1 λ2 … λm]ᵀ  (the Lagrange multipliers, or dual vector)

v The notation ∇x f(x,y) means the gradient of f with respect to x.
v Thus:

∇x L(x,λ) = ∇x f(x) + λᵀ ∇x h(x)

Page 75:

First Order Necessary Conditions (FONC)
Theorem: Lagrange’s Theorem (FONC)
v Let x* be a local minimizer (or maximizer) of f : ℜⁿ→ℜ, subject to h(x) = 0m×1, h : ℜⁿ→ℜᵐ, m ≤ n. Assume that x* is a regular point. Then there exists λ* ∈ ℜᵐ such that:

∇x L(x*,λ*) = ∇x f(x*) + λ*ᵀ ∇x h(x*) = (0n×1)ᵀ

v Note that the constraints are in the form h(x) = 0m×1. Thus the above FONC can be stated as:

∇x L(x,λ) = 0₁×n
∇λ L(x,λ) = 0₁×m

Page 76:

First Order Necessary Conditions (FONC)
v The Lagrangian can be thought of as an unconstrained optimization problem in the variables x1, x2, …, xn and λ1, λ2, …, λm. The problem can be solved by solving the equations:

∇x L = 0n×1, or ∂L/∂xi = 0 for i = 1, 2, …, n
∇λ L = 0m×1, or ∂L/∂λj = 0 for j = 1, 2, …, m

Page 77:

First Order Necessary Conditions (FONC)
v For n = 2 and m = 1, i.e., when f is a function of 2 variables and there is only one constraint, Lagrange’s FONC for a local minimizer (or maximizer) x* reads:

∇f(x*) + λ* ∇h(x*) = 0

v The equation means that

∇f(x*) and ∇h(x*) must be parallel — pointing in the same or exactly opposite directions — at a minimum or maximum point!

Page 78:

Example
v Consider the problem:

min x1 + x2
s.t. (x1)² + (x2)² − 1 = 0

The feasible region is a circle of radius one. The possible objective-function curves are lines with slope −1. The minimum is the point where the lowest such line still touches the circle.

Page 79:

Example

(figure: the feasible circle in the (x1, x2) plane with the level lines f(x) = 1, f(x) = 0, and f(x) = −1.414; the gradient ∇f(x) points in the direction of increasing f; the minimizer is x* = [−0.707, −0.707]ᵀ)

Page 80:

Example
v Since the objective-function lines are straight parallel lines, the gradient of f is constant and points in the direction of increasing f, which is to the upper right.
v The gradient of h points outward from the circle, so its direction depends on the point at which the gradient is evaluated.

Page 81:

Example

(figure: the same circle with a point x¹ = [0, 1]ᵀ, where ∇f(x¹) and ∇h(x¹) are not parallel, the tangent plane at x¹, and the minimizer x* = [−0.707, −0.707]ᵀ, where ∇f(x*) and ∇h(x*) are parallel; level lines f(x) = 1, 0, −1.414)

Page 82:

Conclusions
v At the optimum point, ∇f(x) is parallel to ∇h(x).
v As we can see at point x¹, ∇f(x) is not parallel to ∇h(x), and we can move (down) to improve the objective function.
v We can say that at a max or min, ∇f(x) must be parallel to ∇h(x); otherwise, we could improve the objective function by changing position.

Page 83:

Example
v Using the FONC for the previous example:

L(x,λ) = f(x) + λ h(x) = (x1 + x2) + λ·((x1)² + (x2)² − 1)

And the FONC equations are:

∂L/∂xi = 0  for i = 1, 2
∂L/∂λ = 0

Page 84:

Example
v This becomes:

∂L/∂x1 = 1 + 2λx1 = 0
∂L/∂x2 = 1 + 2λx2 = 0
∂L/∂λ = (x1)² + (x2)² − 1 = 0

v There are three equations and three unknowns. Solving the system:

x1 = x2 = ±0.707,  λ = ∓0.707

v It can be seen from the graph that positive x1 & x2 correspond to the maximum, while negative x1 & x2 correspond to the minimum.
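The two candidate points can be substituted back into the three FONC equations (0.707 is the rounded value of 1/√2, used exactly below):

```python
import math

s = 1 / math.sqrt(2)                          # exact value behind 0.707
for x, lam in [(s, -s), (-s, s)]:             # x1 = x2 = ±0.707, lambda = ∓0.707
    x1 = x2 = x
    assert abs(1 + 2 * lam * x1) < 1e-12      # dL/dx1 = 0
    assert abs(1 + 2 * lam * x2) < 1e-12      # dL/dx2 = 0
    assert abs(x1**2 + x2**2 - 1) < 1e-12     # dL/dlambda = 0 (the constraint)
```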

Page 85:

Limitations of FONC
v The FONC do not guarantee that the solution(s) will be minima/maxima.
v As in the case of unconstrained optimization, they only provide candidate points that need to be verified by the second order conditions.
v If the problem is convex, then the FONC do guarantee that the solutions are extreme points.

Page 86:

Some Definitions
v Let:
∇²x L(x,λ) : the Hessian matrix of L(x,λ) = f(x) + λᵀ h(x) with respect to x,
∇²f(x) : the Hessian matrix of f(x),
∇²hj(x) : the Hessian matrix of hj(x), j = 1, 2, …, m.
Then:

∇²x L(x,λ) = ∇²f(x) + λ1·∇²h1(x) + λ2·∇²h2(x) + … + λm·∇²hm(x)

Page 87:

Some Definitions
v The tangent space at a point x* on the surface S = {x ∈ ℜⁿ : h(x) = 0} is the set T(x*) = {y | ⟨∇h(x*), y⟩ = 0}.

(figure: tangent plane)

Page 88:

Example
v Let S = {x ∈ ℜ³ : h1(x) = x1 = 0, h2(x) = x1 − x2 = 0}, then:

∇h1(x)ᵀ = [1 0 0]
∇h2(x)ᵀ = [1 −1 0]

T(x) = {y | ∇h1(x)ᵀ y = 0, ∇h2(x)ᵀ y = 0}
     = {y | [1 0 0] y = 0, [1 −1 0] y = 0}
     = {[0 0 y3]ᵀ | y3 ∈ ℜ},

i.e., the x3 axis in ℜ³.
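Computing a tangent space amounts to finding the null space of the constraint Jacobian; for the example above this can be checked with numpy:

```python
import numpy as np

J = np.array([[1.0,  0.0, 0.0],        # row 1: gradient of h1(x) = x1
              [1.0, -1.0, 0.0]])       # row 2: gradient of h2(x) = x1 - x2

y = np.array([0.0, 0.0, 1.0])          # candidate tangent direction: the x3 axis
assert np.allclose(J @ y, 0.0)         # y satisfies both tangency conditions
assert np.linalg.matrix_rank(J) == 2   # so T(x) is 1-dimensional: exactly the x3 axis
```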

Page 89:

Second Order Necessary Conditions (SONC)

Theorem: SONC
v Let x* be a local minimizer (or maximizer) of f : ℜⁿ→ℜ, subject to h(x) = 0m×1, h : ℜⁿ→ℜᵐ, m ≤ n. Assume that x* is a regular point. Then there exists λ* ∈ ℜᵐ such that:
1. ∇x L(x*,λ*) = ∇x f(x*) + λ*ᵀ ∇x h(x*) = 0n×1
2. for all y ∈ T(x*), we have yᵀ ∇²x L(x*,λ*) y ≥ 0.

Page 90:

Second Order Sufficient Conditions (SOSC)

Theorem: SOSC
v Let f and h ∈ C², and suppose there exist a point x* ∈ ℜⁿ and λ* ∈ ℜᵐ such that:
1. ∇x L(x*,λ*) = ∇x f(x*) + λ*ᵀ ∇x h(x*) = 0n×1.
2. for all nonzero y ∈ T(x*), we have yᵀ ∇²x L(x*,λ*) y > 0.

Then x* is a strict minimizer of f subject to h(x) = 0m×1.

Page 91:

The Tangent Space

(figure: the surface h(x) = 0 in (x1, x2, x3) space, the point x*, the gradient ∇h(x), and the tangent plane at x* containing all possible y vectors)

v The tangent plane is the location of all y vectors and passes through x*; it must be orthogonal (perpendicular) to ∇h(x).

Page 92:

Note
v Note the similarity between the Lagrangian approach and unconstrained optimization.

Page 93:

Maximization Problems
v Note that the previous definitions of the SONC & SOSC were for minimization problems!
v For maximization problems, the sense of the inequality sign is reversed (as in unconstrained optimization).
v For maximization problems:

SONC: yᵀ ∇²x L(x,λ) y ≤ 0
SOSC: yᵀ ∇²x L(x,λ) y < 0

Page 94:

Necessary & Sufficient
v The necessary conditions are required for a point to be an extremum, but even if they are satisfied, they do not guarantee that the point is an extremum.
v If the sufficient conditions are true, then the point is guaranteed to be an extremum. But if they are not satisfied, this does not mean that the point is not an extremum.

Page 95:

Procedure
1. Solve the FONC to obtain candidate points.
2. Test the candidate points with the SONC. Eliminate any points that do not satisfy the SONC.
3. Test the remaining points with the SOSC. The points that satisfy them are minima/maxima. For the points that do not, we cannot say whether they are extreme points or not.

Page 96:

NLP with Inequality Constraints
v Consider problems such as:

min f(x),  x = [x1 x2 … xn]
s.t.  hi(x) = 0,  i = 1, …, m
      gj(x) ≤ 0,  j = 1, …, p

v An inequality constraint gj(x) ≤ 0 is called “active” at x* if gj(x*) = 0.
v Let the set I(x*) contain all the indices of the active constraints at x*:

gj(x*) = 0 for all j in the set I(x*)

Page 97:

NLP with Inequality Constraints
v The generalized Lagrangian is written:

L(x, λ, µ) = f(x) + Σ_{i=1..m} λi·hi(x) + Σ_{j=1..p} µj·gj(x)

v We use λ’s for the equalities & µ’s for the inequalities.

Page 98:

FONC for Equality & Inequality Constraints
Karush-Kuhn-Tucker (KKT) Theorem
v For the generalized Lagrangian, the FONC become:

∇x L(x*, λ*, µ*) = ∇x f(x*) + Σ_{i=1..m} λi*·∇x hi(x*) + Σ_{j=1..p} µj*·∇x gj(x*) = 0n×1

and the complementary slackness condition:

µj*·gj(x*) = 0,  µj* ≥ 0,  j = 1, …, p

v Non-negative Lagrange multiplier — two cases:
1. gj(x*) = 0,
2. gj(x*) < 0 → µj* = 0

Page 99:

SONC for Equality & Inequality Constraints
v The SONC (for a minimization problem) are: for all y ∈ T(x*), we have

yᵀ ∇²x L(x*, λ*, µ*) y ≥ 0

where J(x*)·y = 0 as before.
v J(x*) : the matrix of the gradients of all the equality constraints and of only those inequality constraints that are active at x*.

Page 100:

SOSC for Equality & Inequality Constraints
v The SOSC for a minimization problem with equality & inequality constraints are:

yᵀ ∇²x L(x*, λ*, µ*) y > 0

Page 101:

Example
v Solve the problem:

Min f(x) = (x1 − 1)² + (x2)²
s.t.  h(x) = (x1)² + (x2)² + x1 + x2 = 0
      g(x) = x1 − (x2)² ≤ 0

The Lagrangian for this problem is:

L(x, λ, µ) = (x1 − 1)² + (x2)² + λ·((x1)² + (x2)² + x1 + x2) + µ·(x1 − (x2)²)

Page 102:

Example
v The first order necessary conditions:

∂L/∂x1 = 2(x1 − 1) + λ·(2x1 + 1) + µ = 0
∂L/∂x2 = 2x2 + λ·(2x2 + 1) − 2µx2 = 0
∂L/∂λ = (x1)² + (x2)² + x1 + x2 = 0
µ·(x1 − (x2)²) = 0

Page 103:

Example
v Solving the 4 FONC equations, we get 2 solutions:

1. x⁽¹⁾ = [0.2056, −0.4534]ᵀ,  λ = 0.45,  µ = 0.9537
2. x⁽²⁾ = [0, 0]ᵀ,  λ = 0,  µ = 2
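Both candidates can be substituted back into the FONC and the complementary slackness condition (the values are the rounded ones from the slide, so residuals are only near zero):

```python
def fonc_residuals(x1, x2, lam, mu):
    # Partial derivatives of L(x, lambda, mu) from the FONC slide
    dLdx1 = 2 * (x1 - 1) + lam * (2 * x1 + 1) + mu
    dLdx2 = 2 * x2 + lam * (2 * x2 + 1) - 2 * mu * x2
    h = x1**2 + x2**2 + x1 + x2              # equality constraint h(x) = 0
    slack = mu * (x1 - x2**2)                # complementary slackness mu*g(x) = 0
    return dLdx1, dLdx2, h, slack

# Candidate 1 (rounded) and candidate 2 (exact)
for cand in [(0.2056, -0.4534, 0.45, 0.9537), (0.0, 0.0, 0.0, 2.0)]:
    assert all(abs(r) < 2e-3 for r in fonc_residuals(*cand))
```

For the second candidate the residuals are exactly zero, which also confirms µ = 2 there (−2 + λ + µ = 0 with λ = 0).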

Page 104:

Example
v Now try the SONC at the 1st solution: both h(x) & g(x) are active at this point (they both equal zero). So the Jacobian consists of the gradients of both functions evaluated at x⁽¹⁾:

J(x⁽¹⁾) = [ 2x1+1  2x2+1 ; 1  −2x2 ] = [ 1.411  0.0932 ; 1  0.9068 ]

Page 105:

Example
v The only solution to the equation J(x⁽¹⁾)·y = 0 is y = [0, 0]ᵀ.

And the Hessian of the Lagrangian is:

∇²x L(x⁽¹⁾) = [ 2+2λ  0 ; 0  2+2λ−2µ ] = [ 2.9  0 ; 0  0.993 ]

Page 106:

Example
v So, the SONC equation is:

[0 0] · [ 2.9  0 ; 0  0.993 ] · [0 0]ᵀ = 0 ≥ 0

v This inequality is true, so the SONC is satisfied for x⁽¹⁾ and it is still a candidate point.

Page 107:

Example
v The SOSC condition is:

yᵀ ∇²x L(x*, λ*, µ*) y > 0

And we just calculated the left-hand side to be zero. So, in our case, for x⁽¹⁾:

yᵀ ∇²x L y = 0 ≯ 0

Thus, the SOSC is not satisfied.

Page 108:

Example
v For the second solution: again, both h(x) & g(x) are active at this point. The Jacobian is:

J(x⁽²⁾) = [ 2x1+1  2x2+1 ; 1  −2x2 ] = [ 1  1 ; 1  0 ]

Page 109:

Example
v The only solution to the equation J(x⁽²⁾)·y = 0 is y = [0, 0]ᵀ, and the Hessian of the Lagrangian is:

∇²x L(x⁽²⁾) = [ 2+2λ  0 ; 0  2+2λ−2µ ] = [ 2  0 ; 0  −2 ]

Page 110:

Example
v So, the SONC equation is:

[0 0] · [ 2  0 ; 0  −2 ] · [0 0]ᵀ = 0 ≥ 0

v This inequality is true, so the SONC is satisfied for x⁽²⁾ and it is still a candidate point.

Page 111:

Example
v The SOSC condition is:

yᵀ ∇²x L(x*, λ*, µ*) y > 0

And again the left-hand side is zero. So, for x⁽²⁾:

yᵀ ∇²x L y = 0 ≯ 0

Thus, the SOSC is not satisfied.

Page 112:

Example Conclusions
v So, we can say that both x⁽¹⁾ & x⁽²⁾ may be local minima, but we cannot be sure because the SOSC are not satisfied at either point.

Page 113:

Dual Problem
v Using the dual problem: constrained optimization → unconstrained optimization.
v Need to change maximization to minimization.
v Only valid when the original optimization problem is convex/concave (strong duality).

Primal problem:  x* = argmax_x f(x)  subject to g(x) = c
Dual problem:    λ* = argmin_λ l(λ),  where  l(λ) = max_x ( f(x) + λ·(g(x) − c) )

(when the problem is convex/concave, the primal solution x* corresponds to the dual solution λ*)

Page 114:

Example:

max_{x,y} xy  subject to x + y ≤ 6

v Introduce a Lagrange multiplier λ for the constraint.
v Construct the Lagrangian:

L(x, y) = xy + λ·(6 − x − y)

v KKT conditions:

∂L/∂x = y − λ = 0
∂L/∂y = x − λ = 0      ⇒ x = y = λ
x + y ≤ 6              ⇒ λ ≤ 3
λ·(6 − x − y) = 0
6 − (x + y) ≥ 0

o Expressing the objective function using λ:

min_λ l(λ) = λ² − 6λ
s.t. λ ≤ 3

o The solution is λ = 3, so x = y = 3.
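The solution can be checked numerically against the KKT conditions and the dual objective (as reconstructed above; the comparison points 2.0 and 2.5 are arbitrary feasible values of λ chosen for illustration):

```python
lam = 3.0
x = y = lam                              # stationarity gives x = y = lambda
assert x + y <= 6                        # primal feasibility (the constraint binds)
assert abs(lam * (6 - x - y)) < 1e-12    # complementary slackness
assert x * y == 9.0                      # objective value at the KKT point

def l_dual(t):
    # dual objective as stated above: l(lambda) = lambda^2 - 6*lambda
    return t**2 - 6 * t

# over the feasible range lambda <= 3, l decreases toward lambda = 3
assert l_dual(3.0) < l_dual(2.5) < l_dual(2.0)
```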

Page 115:

SVM

Page 116:

Perceptron: Linear Separators
v Binary classification can be viewed as the task of separating classes in feature space:

f(x) = sign(wᵀx + b)

(figure: a separating hyperplane wᵀx + b = 0, with wᵀx + b > 0 on one side and wᵀx + b < 0 on the other)
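The decision rule is a one-liner; a minimal sketch (the weight vector w and bias b below are hypothetical, chosen only to illustrate the two sides of the hyperplane):

```python
def classify(w, b, x):
    # Linear separator: f(x) = sign(w^T x + b)
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if s > 0 else -1

w, b = [1.0, 1.0], -1.0                    # hypothetical separator: x1 + x2 = 1
assert classify(w, b, [2.0, 2.0]) == 1     # on the w^T x + b > 0 side
assert classify(w, b, [0.0, 0.0]) == -1    # on the w^T x + b < 0 side
```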

Page 117:

Linear Separators
v Which of the linear separators is optimal?

Page 118:

Classification Margin
v Distance from example xi to the separator is r = (wᵀxi + b) / ‖w‖.
v Examples closest to the hyperplane are support vectors.
v Margin ρ of the separator is the distance between support vectors.

(figure: a separating hyperplane with margin ρ and the distance r from one example)

Page 119:

Maximum Margin Classification
v Maximizing the margin is good.
v Implies that only support vectors matter; other training examples are ignorable.

Page 120:

Linear SVM Mathematically
v Let training set {(xi, yi)}, i = 1..n, xi ∈ Rᵈ, yi ∈ {−1, 1}, be separated by a hyperplane with margin ρ. Then for each training example (xi, yi):

wᵀxi + b ≤ −ρ/2  if yi = −1
wᵀxi + b ≥  ρ/2  if yi = 1       ⇔  yi(wᵀxi + b) ≥ ρ/2

v For every support vector xs the above inequality is an equality. After rescaling w and b by ρ/2 in the equality, we obtain that the distance between each xs and the hyperplane is

r = ys(wᵀxs + b) / ‖w‖ = 1 / ‖w‖

v Then the margin can be expressed through the (rescaled) w and b as:

ρ = 2r = 2 / ‖w‖
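A quick numeric check of the rescaled geometry (the separator w, b and support vector xs below are hypothetical values picked so that ys(wᵀxs + b) = 1, not data from the slides):

```python
import math

w, b = [3.0, 4.0], -5.0             # hypothetical rescaled separator
norm_w = math.hypot(*w)             # ||w|| = 5

xs, ys = [1.2, 0.6], 1.0            # w^T xs + b = 3.6 + 2.4 - 5 = 1
functional_margin = ys * (w[0] * xs[0] + w[1] * xs[1] + b)
assert abs(functional_margin - 1.0) < 1e-12   # xs behaves like a support vector

r = functional_margin / norm_w                # distance from xs to the hyperplane
assert abs(r - 1.0 / norm_w) < 1e-12          # r = 1/||w||
assert abs(2 * r - 2.0 / norm_w) < 1e-12      # margin rho = 2/||w||
```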

Page 121:

Linear SVMs Mathematically (cont.)
v Then we can formulate the quadratic optimization problem:

Find w and b such that ρ = 2/‖w‖ is maximized
and for all (xi, yi), i = 1...n : yi(wᵀxi + b) ≥ 1

Which can be reformulated as:

Find w and b such that Φ(w) = ‖w‖² = wᵀw is minimized
and for all (xi, yi), i = 1...n : yi(wᵀxi + b) ≥ 1

Page 122:

Linear SVMs Mathematically (cont.)
v Use Lagrangian theory to solve the optimization problem:

L(w, b, α) = ½⟨w, w⟩ − Σ_{i=1..l} αi·[yi(⟨w, xi⟩ + b) − 1]

FOCs:

∂L(w,b,α)/∂w = w − Σ_{i=1..l} αi yi xi = 0   ⇒  w = Σ_{i=1..l} αi yi xi
∂L(w,b,α)/∂b = Σ_{i=1..l} αi yi = 0          ⇒  Σ_{i=1..l} αi yi = 0

Page 123:

Linear SVMs Mathematically (cont.)
v Substitute w into the Lagrangian:

L(w, b, α) = ½⟨w, w⟩ − Σ_{i=1..l} αi·[yi(⟨w, xi⟩ + b) − 1]
           = ½ Σ_{i,j=1..l} αi αj yi yj ⟨xi, xj⟩ − Σ_{i,j=1..l} αi αj yi yj ⟨xi, xj⟩ + Σ_{i=1..l} αi
           = Σ_{i=1..l} αi − ½ Σ_{i,j=1..l} αi αj yi yj ⟨xi, xj⟩

v Dual Problem:

max_α W(α) = Σ_{i=1..l} αi − ½ Σ_{i,j=1..l} αi αj yi yj ⟨xi, xj⟩
s.t.  Σ_{i=1..l} αi yi = 0,  αi ≥ 0, i = 1, …, l
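For a tiny training set the dual can be solved by inspection. A sketch with two hypothetical 1-D points, x = +1 labeled +1 and x = −1 labeled −1 (toy data, not from the slides): the constraint α1y1 + α2y2 = 0 forces α1 = α2 = α, the dual objective reduces to W = 2α − 2α², and its maximum is at α = 1/2.

```python
# Toy training set in R^1
xs = [1.0, -1.0]
ys = [1.0, -1.0]

def W(a1, a2):
    # Dual objective: sum(alpha) - 1/2 * sum_ij alpha_i alpha_j y_i y_j <x_i, x_j>
    s = a1 + a2
    q = sum(ai * aj * yi * yj * xi * xj
            for ai, yi, xi in zip([a1, a2], ys, xs)
            for aj, yj, xj in zip([a1, a2], ys, xs))
    return s - 0.5 * q

# Maximize over the feasible line a1 = a2 = a on a grid
best = max((k / 100 for k in range(101)), key=lambda a: W(a, a))
assert abs(best - 0.5) < 1e-9

# Recover w = sum_i alpha_i y_i x_i and the margin 2/|w|
a = 0.5
w = sum(a * yi * xi for yi, xi in zip(ys, xs))
assert w == 1.0
assert 2.0 / abs(w) == 2.0   # margin = 2, the gap between the two points
```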

Page 124:

Support Vector Classification (cont.)
v Moving to the linearly non-separable situation: make the training sample linearly separable in the feature space implicitly defined by K(x, z).

Primal Problem:

min_{w,b} ⟨w, w⟩
s.t.  yi(⟨w, φ(xi)⟩ + b) ≥ 1,  i = 1, …, l

Dual Problem:

max_α W(α) = Σ_{i=1..l} αi − ½ Σ_{i,j=1..l} αi αj yi yj ⟨φ(xi), φ(xj)⟩
           = Σ_{i=1..l} αi − ½ Σ_{i,j=1..l} αi αj yi yj K(xi, xj)
s.t.  Σ_{i=1..l} αi yi = 0,  αi ≥ 0, i = 1, …, l

Page 125:

Implementation Techniques
v What problem do we have?

max_α W(α) = Σ_{i=1..l} αi − ½ Σ_{i,j=1..l} αi αj yi yj K(xi, xj)
s.t.  Σ_{i=1..l} αi yi = 0,  αi ≥ 0, i = 1, …, l

This is a quadratic programming problem
Ø Standard software packages exist
Ø Poor scalability: the l×l kernel matrix dominates the memory requirement
This is also a very special quadratic programming problem