Introduction to Unconstrained Optimization: Part 1
James Allison
ME 555, January 29, 2007
Monotonicity Analysis
What can MA be used for?
Problem size reduction (elimination of variables and constraints)
Identification of problems such as unboundedness
Solution of optimization problem in some cases
What can be done if MA does not lead to a solution?
Application of optimality conditions
Use optimality conditions to derive analytical solution
Use numerical algorithms based on optimality conditions
Scope
Ch. 4: Unconstrained Optimization
Concerned only with objective function
Constrained optimization covered in Ch. 5
$$\min_x f(x)$$
Assumptions
Functions and variables are continuous
Functions are $C^2$ smooth
Lecture Outline
Derivation of optimality conditions
Analytical solutions
Function approximations
Numerical methods
First order methods (gradient descent)
Second order methods (Newton’s Method)
Problem scaling
Optimality Conditions
Necessary Conditions
B ⇒ A: B is true only if A is true (A is necessary for B)
Sufficient Conditions
A ⇒ B: B is true if A is true (A is sufficient for B)
Necessary and Sufficient Conditions
B ⇔ A: B is true if and only if A is true (A is necessary and sufficient for B)
Math Review
Gradient: ∇f(x)
Multidimensional derivative
Vector valued (column in my documents, row in POD2)
$$\nabla f(x) = \left[\partial f/\partial x_1,\ \partial f/\partial x_2,\ \ldots,\ \partial f/\partial x_n\right]^T$$
Points in direction of steepest ascent
Example: $f(x, y) = x^2 + 2y^2 - xy$
[Figure: contour plot of $f(x, y) = x^2 + 2y^2 - xy$ over $[0, 5] \times [0, 5]$, showing the gradient $\nabla f(x) = [1\ 3]^T$ at the point $x = [1\ 1]^T$]
Math Review
Hessian: $H$, sometimes written $\nabla^2 f(x)$
Multidimensional second derivative
Matrix valued (symmetric)
Provides function shape information
$$H = \begin{bmatrix} \dfrac{\partial^2 f(x)}{\partial x_1^2} & \dfrac{\partial^2 f(x)}{\partial x_1 \partial x_2} \\[4pt] \dfrac{\partial^2 f(x)}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f(x)}{\partial x_2^2} \end{bmatrix}$$
Example: $f(x, y) = x^2 + 2y^2 - xy$
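For this example, differentiating twice gives
$$H = \begin{bmatrix} \partial^2 f/\partial x^2 & \partial^2 f/\partial x \partial y \\ \partial^2 f/\partial y \partial x & \partial^2 f/\partial y^2 \end{bmatrix} = \begin{bmatrix} 2 & -1 \\ -1 & 4 \end{bmatrix}$$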
Math Review: Taylor’s series expansion
Function of one variable:
$$f(x) = \sum_{n=0}^{\infty} \frac{f^{(n)}(x_0)}{n!}(x - x_0)^n$$
Function of multiple variables:
$$f(x) = f(x_0) + \sum_{i=1}^{n} \frac{\partial f(x_0)}{\partial x_i}(x_i - x_{i0}) + \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n} \frac{\partial^2 f(x_0)}{\partial x_i \partial x_j}(x_i - x_{i0})(x_j - x_{j0}) + o\!\left(\|x - x_0\|^2\right)$$
Function of multiple variables: (matrix form)
$$\partial f \triangleq f(x) - f(x_0) = \nabla f(x_0)^T \partial x + \tfrac{1}{2}\,\partial x^T H\, \partial x + o\!\left(\|x - x_0\|^2\right)$$
where $\partial x \triangleq x - x_0$ is the perturbation vector
Derivation of Optimality Conditions
First order necessity
Suppose that x∗ is a local minimum of f(x).
no perturbations about x∗ will result in a function decrease
first order function approximation can be used to derive necessary conditions (i.e., if x∗ is a minimum, then what must be true?)
If x∗ minimizes f(x), then ∇f(x∗) = 0
If ∇f(x†) = 0, then x† is a stationary point, but not necessarily a minimizer
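To sketch the argument: substituting the first order model into the requirement that no perturbation decreases $f$ gives
$$\partial f = \nabla f(x^*)^T \partial x + o(\|\partial x\|) \geq 0 \quad \forall\, \partial x$$
Since $\partial x$ may point in any direction (in particular $\partial x = -\epsilon \nabla f(x^*)$), this can hold only if $\nabla f(x^*) = 0$.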
Derivation of Optimality Conditions
Second order sufficiency
Suppose that x† is a stationary point of f(x).
no perturbations about x† will result in a function decrease
second order function approximation can be used to derive sufficient conditions (i.e., what must be true for x† to be a minimum?)
If H ≻ 0 at x†, then x† is a local minimizer of f(x)
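To sketch the argument: at a stationary point the gradient term of the Taylor model vanishes, leaving
$$\partial f = \tfrac{1}{2}\,\partial x^T H\, \partial x + o\!\left(\|\partial x\|^2\right)$$
which is positive for all sufficiently small $\partial x \neq 0$ exactly when $H \succ 0$.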
Analytical Solution Example
$$\min_x f(x) = x_1^2 + 2x_2^2 - x_1 x_2 + x_1$$
1) Identify stationary point
$$x^\dagger = \begin{bmatrix} -4/7 \\ -1/7 \end{bmatrix}$$
2) Test stationary point
$$H = \begin{bmatrix} 2 & -1 \\ -1 & 4 \end{bmatrix} \succ 0$$
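A minimal numerical check of both steps (a sketch in NumPy; for this quadratic, $\nabla f(x) = Hx + b$ with $b = [1\ 0]^T$, so stationarity reduces to a linear solve):

```python
import numpy as np

# f(x) = x1^2 + 2*x2^2 - x1*x2 + x1, so grad f(x) = H x + b
H = np.array([[2.0, -1.0],
              [-1.0, 4.0]])      # Hessian (constant for a quadratic)
b = np.array([1.0, 0.0])         # linear term

x_stat = np.linalg.solve(H, -b)  # solve grad f = 0
print(x_stat)                    # [-0.5714, -0.1429] = [-4/7, -1/7]

print(np.linalg.eigvalsh(H))     # both eigenvalues positive, so H is PD
```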
Functions, Models, and Algorithms
Analytical solution to the optimization problem is frequently impossible (why?)
Numerical algorithms based on the FONC (first order necessary condition) and function models can be used
First Order Model: Gradient Descent
$$f(x) \approx f(x_k) + \nabla f(x_k)^T \partial x$$
1. Build a linear model about the current point $x_k$
2. Move in the direction of steepest descent ($-\nabla f(x_k)$) until $f(x)$ stops improving
3. Update the linear model and repeat until no descent direction exists ($\nabla f(x_k) = 0$)
Iterative formula: $x_{k+1} = x_k - \alpha \nabla f(x_k)$
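A minimal sketch of the iteration in Python (the function names, fixed step size, and tolerance are my choices for illustration; the fixed α stands in for the line search on the next slide):

```python
import numpy as np

def gradient_descent(grad, x0, alpha=0.05, tol=1e-6, max_iter=10000):
    """Iterate x_{k+1} = x_k - alpha*grad(x_k) until the gradient is small."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:   # no descent direction left (FONC)
            break
        x = x - alpha * g             # step along steepest descent
    return x

# Example: the function from the analytical solution example
grad_f = lambda x: np.array([2*x[0] - x[1] + 1, 4*x[1] - x[0]])
print(gradient_descent(grad_f, [0.0, 0.0]))   # approaches [-4/7, -1/7]
```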
Line Search
$$f(x) = 7x_1^2 + 2.4x_1x_2 + x_2^2, \quad x_0 = [10\ \ {-5}]^T$$
$$\min_\alpha f\!\left(x_0 - \alpha \nabla f(x_0)\right)$$
[Figure: contour plot ("Example 1: Convex Quadratic Function") over $[-15, 15]^2$ showing the steepest descent step from $x_0$ toward the stationary point $x_s$, alongside the line search view: $f(x_0 - \alpha \nabla f(x_0))$ versus $\alpha$ over $[0, 0.2]$]
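For a quadratic $f(x) = x'Ax$ the line search has a closed form (my derivation, not shown on the slide): with $g = \nabla f(x_0) = 2Ax_0$, minimizing $f(x_0 - \alpha g)$ over $\alpha$ gives $\alpha^* = g'g / (2\,g'Ag)$. A sketch:

```python
import numpy as np

# f(x) = 7*x1^2 + 2.4*x1*x2 + x2^2 = x'Ax
A = np.array([[7.0, 1.2],
              [1.2, 1.0]])
x0 = np.array([10.0, -5.0])

g = 2 * A @ x0                       # gradient of x'Ax at x0
alpha = (g @ g) / (2 * g @ A @ g)    # exact line search step
x1 = x0 - alpha * g

f = lambda x: x @ A @ x
print(alpha)                         # about 0.07, matching the plot's scale
print(f(x0), f(x1))                  # the step decreases f
```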
Discussion on Gradient Descent Method
Stability/optimality
Descent
Speed of convergence
Second Order Model: Newton’s Method
$$f(x) \approx f(x_k) + \nabla f(x_k)^T \partial x + \tfrac{1}{2}\,\partial x^T H\, \partial x$$
1. Build a quadratic model about the current point $x_k$
2. Move to the stationary point of the quadratic approximation
3. Update the quadratic model and repeat until the current point is a stationary point ($\nabla f(x_k) = 0$)
Iterative formula: $x_{k+1} = x_k - H^{-1}\nabla f(x_k)$
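A minimal sketch in Python (names are mine; the linear solve avoids forming $H^{-1}$ explicitly). For the quadratic example above, a single step lands on the stationary point:

```python
import numpy as np

def newton(grad, hess, x0, tol=1e-8, max_iter=100):
    """Iterate x_{k+1} = x_k - H^{-1} grad(x_k) until stationary."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:           # stationary point reached
            break
        x = x - np.linalg.solve(hess(x), g)   # Newton step
    return x

grad_f = lambda x: np.array([2*x[0] - x[1] + 1, 4*x[1] - x[0]])
hess_f = lambda x: np.array([[2.0, -1.0], [-1.0, 4.0]])
print(newton(grad_f, hess_f, [10.0, -5.0]))   # [-4/7, -1/7] in one step
```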
Second Perspective on Newton’s Method
Newton’s method for root finding (solve $f(x) = 0$):
$$x_{k+1} = x_k - \frac{f(x_k)}{f'(x_k)}$$
Multidimensional system of equations (solve $\mathbf{f}(x) = 0$):
$$x_{k+1} = x_k - J^{-1}\mathbf{f}(x_k)$$
What system of equations do we need to solve in unconstrained optimization? $\nabla f(x) = 0$, whose Jacobian is the Hessian $H$; substituting recovers $x_{k+1} = x_k - H^{-1}\nabla f(x_k)$.
Discussion on Newton’s Method
Stability/optimality
Descent
Speed of convergence
Quadratic Forms
Quadratic form: the function is a linear combination of $x_i x_j$ terms
Matrix representation:
$$f(x) = x'Ax$$
Example: $f(x) = 2x_1^2 + x_1x_2 + x_2^2 + x_2x_3 + x_3^2$
$$f(x) = [x_1\ x_2\ x_3] \begin{bmatrix} 2 & 0.5 & 0 \\ 0.5 & 1 & 0.5 \\ 0 & 0.5 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} = x'Ax$$
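A quick numerical check that the symmetric matrix reproduces the polynomial (each off-diagonal coefficient is split in half across the two symmetric entries):

```python
import numpy as np

A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.0, 0.5],
              [0.0, 0.5, 1.0]])

poly = lambda x: 2*x[0]**2 + x[0]*x[1] + x[1]**2 + x[1]*x[2] + x[2]**2
quad = lambda x: x @ A @ x

x = np.random.default_rng(0).standard_normal(3)
print(np.isclose(poly(x), quad(x)))   # True: x'Ax matches the polynomial
```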
Types of Quadratic Functions
If $x'Ax > 0\ \forall x \neq 0$: $A$ is positive definite $\Rightarrow$ convex quadratic function
If $x'Ax < 0\ \forall x \neq 0$: $A$ is negative definite $\Rightarrow$ concave quadratic function
If $x'Ax \lessgtr 0$: $A$ is indefinite $\Rightarrow$ hyperbolic quadratic function
Physical interpretation?
Quadratic Function Definitions
$f_1(x) = x'A_1x$, $\quad f_2(x) = x'A_2x$, $\quad f_3(x) = x'A_3x$, where
$$A_1 = \begin{bmatrix} 7 & 1.2 \\ 1.2 & 1 \end{bmatrix}, \quad A_2 = \begin{bmatrix} -7 & 1.2 \\ 1.2 & -1 \end{bmatrix}, \quad A_3 = \begin{bmatrix} -5 & 2.6 \\ 2.6 & 2 \end{bmatrix}$$
Eigenvalues, Eigenvectors, and Function Geometry
$Av = \lambda v$, where $v$ is an eigenvector and $\lambda$ its eigenvalue
Provide insight into function shape
Facilitate useful coordinate system rotations
Helpful for understanding problem condition (ellipticity) and scaling
Interpretation of Eigenvalues
$$f(x) = f_0 + x'b + x'Ax$$
$$\nabla f(x) = b + 2Ax \quad \Rightarrow \quad x^\dagger = -\tfrac{1}{2}A^{-1}b$$
Shift coordinate system: $f(z) = f^\dagger + z'Az$
Rotate coordinate system: $f(p) = f^\dagger + \sum_{i=1}^{n} \lambda_i p_i^2$
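The rotation is the symmetric eigendecomposition $A = V \Lambda V'$ with $p = V'z$, so $z'Az = p'\Lambda p = \sum_i \lambda_i p_i^2$. A quick check using $A_1$ from the earlier definitions:

```python
import numpy as np

A1 = np.array([[7.0, 1.2],
               [1.2, 1.0]])

lam, V = np.linalg.eigh(A1)   # A1 = V @ diag(lam) @ V.T

z = np.array([0.3, -1.7])     # a point in the shifted coordinates
p = V.T @ z                   # rotate into the eigenvector basis

print(z @ A1 @ z)             # z'Az
print(np.sum(lam * p**2))     # sum_i lambda_i p_i^2 (same value)
```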
Numerical Examples
Function 1: $v_1 = [0.189\ \ {-0.982}]^T$, $\lambda_1 = 0.769$; $\quad v_2 = [-0.982\ \ {-0.189}]^T$, $\lambda_2 = 7.23$
Function 2: $v_1 = [-0.982\ \ 0.189]^T$, $\lambda_1 = -7.23$; $\quad v_2 = [-0.189\ \ {-0.982}]^T$, $\lambda_2 = -0.769$
Function 3: $v_1 = [-0.949\ \ 0.314]^T$, $\lambda_1 = -5.86$; $\quad v_2 = [-0.314\ \ {-0.949}]^T$, $\lambda_2 = 2.86$
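These can be reproduced directly (a sketch; np.linalg.eigh returns eigenvalues in ascending order, so the pairing may differ from the listing above):

```python
import numpy as np

for name, A in [("A1", [[7.0, 1.2], [1.2, 1.0]]),
                ("A2", [[-7.0, 1.2], [1.2, -1.0]]),
                ("A3", [[-5.0, 2.6], [2.6, 2.0]])]:
    lam, V = np.linalg.eigh(np.array(A))
    print(name, lam)   # A1: 0.769, 7.23;  A2: -7.23, -0.769;  A3: -5.86, 2.86
```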
[Figure: contour plots of functions 1, 2, and 3, annotated with $\lambda_1 = 0.769$, $\lambda_2 = 7.23$ (convex); $\lambda_1 = -7.23$, $\lambda_2 = -0.769$ (concave); $\lambda_1 = -5.86$, $\lambda_2 = 2.86$ (hyperbolic)]
Connection with Quadratic Form
$x'Ax > 0\ \forall x \neq 0 \iff \lambda_i > 0\ \forall i \Rightarrow A \succ 0$ and the quadratic function is convex
$x'Ax < 0\ \forall x \neq 0 \iff \lambda_i < 0\ \forall i \Rightarrow A \prec 0$ and the quadratic function is concave
$x'Ax \lessgtr 0 \iff \lambda_i \lessgtr 0 \Rightarrow A$ is indefinite and the quadratic function is hyperbolic
Problem Scaling
Condition Number
$$C = \frac{\lambda_{\max}}{\lambda_{\min}}$$
$C \gg 1$ ⇒ numerical difficulties, slow convergence
$C \approx 1$ ⇒ faster convergence
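As a quick worked example (my arithmetic): the convex quadratic from the line search slide is exactly $f_1(x) = x'A_1x$, whose eigenvalues were listed above, so
$$C = \frac{\lambda_{\max}}{\lambda_{\min}} = \frac{7.23}{0.769} \approx 9.4$$
which helps explain the slow progress of gradient descent on its elongated contours.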
Scaling Approaches
Scale design variables to be the same magnitude: y = s′x (see the sketch after this list)
Account for eigenvectors v not aligned with the coordinate axes: y = S⁻¹x
Implement scaling within the algorithm:
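A sketch of the first approach (my construction, not from the slides): a diagonal scaling built from the Hessian diagonal makes the variables comparable in magnitude and reduces the condition number.

```python
import numpy as np

H = 2 * np.array([[7.0, 1.2],
                  [1.2, 1.0]])    # Hessian of the line search example

def cond(M):
    lam = np.linalg.eigvalsh(M)
    return lam.max() / lam.min()

# y = S^{-1} x with diagonal S; in y-coordinates the Hessian becomes S H S
S = np.diag(1.0 / np.sqrt(np.diag(H)))
print(cond(H))           # about 9.4
print(cond(S @ H @ S))   # about 2.7: better conditioned after scaling
```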