Computacion Inteligente
Derivative-Based Optimization
Contents
• Optimization problems
• Mathematical background
• Descent Methods
• The Method of Steepest Descent
• Conjugate Gradient
OPTIMIZATION PROBLEMS
Terms in Mathematical Optimization
1. Objective function – the mathematical function that is optimized by changing the values of the design variables.
2. Design variables – those variables which we, as designers, can change.
3. Constraints – functions of the design variables that establish limits on individual variables or combinations of design variables.
Problem Formulation
Three basic ingredients:
– an objective function,
– a set of decision variables,
– a set of equality/inequality constraints.
The problem is to search for the values of the decision variables that minimize the objective function while satisfying the constraints.
Mathematical Definition
– Design Variables: decision and objective vector
– Constraints: equality and inequality
– Bounds: feasible ranges for variables
– Objective Function: maximization can be converted to minimization due to the duality principle
$\max f(x) \;\Leftrightarrow\; \min\,[-f(x)]$

$\min_{x} \; y = f(x) \quad \text{subject to} \quad x^{L} \le x \le x^{U},\; h(x) = 0,\; g(x) \le 0$

(objective $y$, decision vector $x$, bounds $x^{L}, x^{U}$, equality constraints $h(x)$, inequality constraints $g(x)$)
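To make this formulation concrete, here is a minimal sketch using scipy.optimize; the objective, constraint functions, bounds, and starting point below are invented for illustration, and only the general pattern follows the formulation above (note that SciPy's inequality convention is g(x) ≥ 0, so the slide's g(x) ≤ 0 is passed with a sign flip).

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical objective (invented for illustration): f(x) = (x1 - 1)^2 + (x2 - 2)^2
def f(x):
    return (x[0] - 1.0) ** 2 + (x[1] - 2.0) ** 2

h = lambda x: x[0] + x[1] - 2.0      # equality constraint h(x) = 0
g = lambda x: x[0] ** 2 - x[1]       # inequality constraint g(x) <= 0 (slide convention)

constraints = [{"type": "eq", "fun": h},
               {"type": "ineq", "fun": lambda x: -g(x)}]   # SciPy wants fun(x) >= 0
bounds = [(-5.0, 5.0), (-5.0, 5.0)]                          # x^L <= x <= x^U

res = minimize(f, x0=np.array([0.0, 0.0]), method="SLSQP",
               bounds=bounds, constraints=constraints)
print(res.x, res.fun)
```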
Steps in the Optimization Process
1. Identify the quantity or function, f, to be optimized.
2. Identify the design variables: x1, x2, x3, …,xn.
3. Identify the constraints if any exist
a. Equalities
b. Inequalities
4. Adjust the design variables (x’s) until f is optimized and all of the constraints are satisfied.
Local and Global Optimum Designs
1. Objective functions may be unimodal or multimodal.
a. Unimodal – only one optimum
b. Multimodal – more than one optimum
2. Most search schemes are based on the assumption of a unimodal surface. The optimum determined in such cases is called a local optimum design.
3. The global optimum is the best of all local optimum designs.
Weierstrass Theorem
• Existence of global minimum
• If f(x) is continuous on the feasible set S which is closed and bounded, then f(x) has a global minimum in S
– A set S is closed if it contains all its boundary points.
– A set S is bounded if it is contained in the interior of some circle: $S \subseteq T = \{x : \lVert x \rVert \le c\}$ for some finite number $c$.
– compact = closed and bounded
Example of an Objective Function
(Figure: plot of an objective function over $x_1 \in [-1, 1]$, $x_2 \in [-1, 1]$.)
Multimodal Objective Function
(Figure: plot of a multimodal objective over $x_1 \in [0, 1.5]$, $x_2 \in [0, 1]$, with a local max and a saddle point marked.)
Optimization Approaches
• Derivative-based optimization (gradient based)
– Capable of determining “search directions” according to an objective function’s derivative information
• steepest descent method;
• Newton’s method; Newton-Raphson method;
• Conjugate gradient, etc.
• Derivative-free optimization
• random search method;
• genetic algorithm;
• simulated annealing; etc.
MATHEMATICAL BACKGROUND
Positive Definite Matrices
• A square matrix M is positive definite if
$x^T M x > 0 \quad \text{for all } x \ne 0$
• It is positive semidefinite if
$x^T M x \ge 0 \quad \text{for all } x$
The scalar $x^T M x = \langle x, Mx \rangle$ is called a quadratic form.
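A small numerical sketch of these definitions (the matrix M below is an arbitrary example, not from the slides): evaluate the quadratic form directly and, anticipating the next slide, check the eigenvalues.

```python
import numpy as np

M = np.array([[2.0, 1.0],
              [1.0, 3.0]])          # a symmetric test matrix (chosen for illustration)

def quadratic_form(M, x):
    """Return the scalar x^T M x."""
    return float(x @ M @ x)

# Positive definite <=> x^T M x > 0 for every x != 0; for symmetric M this is
# equivalent to all eigenvalues being positive (see the next slide).
eigvals = np.linalg.eigvalsh(M)
print("eigenvalues:", eigvals, "positive definite:", bool(np.all(eigvals > 0)))

rng = np.random.default_rng(0)
x = rng.standard_normal(2)
print("x^T M x =", quadratic_form(M, x))
```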
Positive Definite Matrices
• A symmetric matrix $M = M^T$ is positive definite if and only if its eigenvalues $\lambda_i > 0$ (semidefinite ↔ $\lambda_i \ge 0$).
– Proof (→): Let $v_i$ be the eigenvector for the i-th eigenvalue $\lambda_i$, so $M v_i = \lambda_i v_i$.
– Then
$0 < v_i^T M v_i = \lambda_i v_i^T v_i = \lambda_i \lVert v_i \rVert^2$
– which implies $\lambda_i > 0$.
Prove that positive eigenvalues imply positive definiteness (the ← direction).
Positive Definite Matrices
• Theorem: If a matrix $M = U^T U$, then it is positive definite.
• Proof. Let f be defined as
$f = x^T M x = x^T U^T U x$
• If we can show that f is always positive, then M must be positive definite. We can write this as
$f = (Ux)^T (Ux)$
• Provided that Ux gives a non-zero vector for all values of x except x = 0, we can write b = Ux, i.e.
$f = b^T b = \sum_i b_i^2 > 0$
• so f must always be positive.
Quadratic Functions
• f: Rn → R is a quadratic function if
$f(x) = \tfrac{1}{2}\, x^T Q x - b^T x + c$
– where Q is symmetric.
Quadratic Functions
• It is not necessary for Q to be symmetric. Suppose the matrix P is non-symmetric:
$f(x) = \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} p_{ij}\, x_i x_j = \tfrac{1}{2}\, x^T P x$
$\;= \tfrac{1}{2}\, x^T Q x, \quad \text{where } q_{ij} = \tfrac{1}{2}(p_{ij} + p_{ji})$
• Q is symmetric.
Quadratic Functions
– Suppose the matrix P is non-symmetric. Example:
$f(x) = \tfrac{1}{2}\left(2x_1^2 + 2x_1 x_2 + 4x_1 x_3 + 6x_2^2 + 4x_2 x_3 + 5x_3^2\right)$
$f(x) = \tfrac{1}{2}\, x^T P x, \quad P = \begin{bmatrix} 2 & 2 & 4 \\ 0 & 6 & 4 \\ 0 & 0 & 5 \end{bmatrix}$
$\;= \tfrac{1}{2}\, x^T Q x, \quad Q = \begin{bmatrix} 2 & 1 & 2 \\ 1 & 6 & 2 \\ 2 & 2 & 5 \end{bmatrix}$
• Q is symmetric.
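The symmetrization above can be checked numerically; this sketch uses the P and Q from this slide and verifies that both quadratic forms give the same value of f(x).

```python
import numpy as np

P = np.array([[2.0, 2.0, 4.0],
              [0.0, 6.0, 4.0],
              [0.0, 0.0, 5.0]])      # non-symmetric matrix from the slide

Q = 0.5 * (P + P.T)                  # q_ij = (p_ij + p_ji) / 2, symmetric

rng = np.random.default_rng(1)
for _ in range(3):
    x = rng.standard_normal(3)
    # Both quadratic forms (times 1/2) give the same f(x).
    assert np.isclose(0.5 * x @ P @ x, 0.5 * x @ Q @ x)

print(Q)   # [[2,1,2],[1,6,2],[2,2,5]], as on the slide
```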
Quadratic Functions
• Given the quadratic function
$f(x) = \tfrac{1}{2}\, x^T Q x - b^T x + c$
If Q is positive definite, then f is a parabolic “bowl.”
Quadratic Functions
• Two other shapes can result from the quadratic form.
– If Q is negative definite, then f is a parabolic “bowl” upside down.
– If Q is indefinite then f describes a saddle.
Quadratic Functions
• Quadratics are useful in the study of optimization.
– Often, objective functions are “close to” quadratic near the solution.
– It is easier to analyze the behavior of algorithms when applied to quadratics.
– Analysis of algorithms for quadratics gives insight into their behavior in general.
One-Dimensional Derivative
• The derivative of f: R → R is a function f ′: R → R given by
• if the limit exists.
$f'(x) = \frac{d f(x)}{dx} = \lim_{h \to 0} \frac{f(x+h) - f(x)}{h}$
Directional Derivatives
• Along the axes:
$\frac{\partial f(x,y)}{\partial x}, \qquad \frac{\partial f(x,y)}{\partial y}$
Directional Derivatives
• In a general direction $v \in \mathbb{R}^2$, $\lVert v \rVert = 1$:
$\frac{\partial f(x,y)}{\partial v}$
Directional Derivatives
(Figure: the partial derivatives $\partial f(x,y)/\partial x$ and $\partial f(x,y)/\partial y$ shown on the graph of f.)
Directional Derivatives
• Definition: A real-valued function f: Rn → R is said to be continuously differentiable if the partial derivatives
$\frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n}$
exist for each x in Rn and are continuous functions of x.
• In this case, we say f ∈ C¹ (a C¹, or smooth, function).
The Gradient Vector
• Definition: The gradient of f: R² → R in the plane is the function ∇f: R² → R² given by
$\nabla f(x, y) := \left[ \frac{\partial f}{\partial x}, \frac{\partial f}{\partial y} \right]^T$
The Gradient Vector
• Definition: The gradient of f: Rn → R is a function ∇f: Rn → Rn given by
$\nabla f(x_1, \ldots, x_n) := \left[ \frac{\partial f}{\partial x_1}, \ldots, \frac{\partial f}{\partial x_n} \right]^T$
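As a sketch of how ∇f can be approximated numerically (this finite-difference helper and the test function are not from the slides), central differences applied coordinate-wise give the gradient vector:

```python
import numpy as np

def numerical_gradient(f, x, eps=1e-6):
    """Approximate grad f(x) by central differences, one coordinate at a time."""
    x = np.asarray(x, dtype=float)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = eps
        grad[i] = (f(x + e) - f(x - e)) / (2.0 * eps)
    return grad

# Example: f(x, y) = x^2 + 3*y^2, so grad f = (2x, 6y).
f = lambda x: x[0] ** 2 + 3.0 * x[1] ** 2
print(numerical_gradient(f, [1.0, 2.0]))   # approximately [2., 12.]
```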
The Gradient Properties
• The gradient defines a (hyper)plane approximating the function infinitesimally:
$\Delta z = \frac{\partial f}{\partial x}\, \Delta x + \frac{\partial f}{\partial y}\, \Delta y$
The Gradient Properties
• By the chain rule
$\frac{\partial f}{\partial v}(p) = \left\langle \nabla f|_p,\, v \right\rangle, \qquad \lVert v \rVert = 1$
The Gradient Properties
• Proposition 1: the directional derivative
$\frac{\partial f}{\partial v}\Big|_p \quad \text{is maximal when choosing} \quad v = \frac{\nabla f|_p}{\lVert \nabla f|_p \rVert}$
– Intuitive: the gradient points in the direction of greatest change.
Prove it!
The Gradient Properties
• Proof:
– Assign
$v = \frac{\nabla f|_p}{\lVert \nabla f|_p \rVert}$
– By the chain rule:
$\frac{\partial f}{\partial v}(p) = \left\langle \nabla f|_p,\, v \right\rangle = \left\langle \nabla f|_p,\, \frac{\nabla f|_p}{\lVert \nabla f|_p \rVert} \right\rangle = \frac{\lVert \nabla f|_p \rVert^2}{\lVert \nabla f|_p \rVert} = \lVert \nabla f|_p \rVert$
The Gradient Properties
• Proof (continued):
– On the other hand, for a general v with $\lVert v \rVert = 1$ (by Cauchy–Schwarz):
$\frac{\partial f}{\partial v}(p) = \left\langle \nabla f|_p,\, v \right\rangle \le \lVert \nabla f|_p \rVert \, \lVert v \rVert = \lVert \nabla f|_p \rVert$
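Proposition 1 can be illustrated numerically (a sketch with an invented test function): sampling unit directions v and evaluating ⟨∇f|p, v⟩ shows the maximum is attained at v = ∇f|p / ‖∇f|p‖ and equals ‖∇f|p‖.

```python
import numpy as np

# Gradient of the test function x^2 + 3*y^2 (chosen for illustration).
f_grad = lambda p: np.array([2.0 * p[0], 6.0 * p[1]])

p = np.array([1.0, 1.0])
g = f_grad(p)

# Directional derivatives <grad f, v> over many unit directions v.
angles = np.linspace(0.0, 2.0 * np.pi, 1000, endpoint=False)
dirs = np.stack([np.cos(angles), np.sin(angles)], axis=1)
dd = dirs @ g

best = dirs[np.argmax(dd)]
print("best direction:      ", best)
print("normalized gradient: ", g / np.linalg.norm(g))
print("max directional derivative:", dd.max(), " ||grad f|| =", np.linalg.norm(g))
```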
The Gradient Properties
• Proposition 2: Let f: Rn → R be a C¹ smooth function around p.
• If f has a local minimum (maximum) at p, then
$\nabla f|_p = 0$
Intuitive: this is a necessary condition for a local min (max).
The Gradient Properties
• Proof: intuitive
The Gradient Properties
• We found the best INFINITESIMAL DIRECTION at each point,
• Looking for minimum: “blind man” procedure
• How can we derive the way to the minimum using this knowledge?
Jacobian
• The derivative of f: Rn → Rm is a function Df: Rn → Rm×n given by the matrix of partial derivatives $[Df(x)]_{ij} = \partial f_i(x) / \partial x_j$, called the Jacobian.
Note that for f: Rn → R, we have $\nabla f(x) = Df(x)^T$.
Derivatives
• If the derivative of ∇f exists, we say that f is twice differentiable.
– Write the second derivative as D2f (or F), and call it the Hessian of f.
Level Sets and Gradients
• The level set of a function f: Rn → R at level c is the set of points S = {x: f(x) = c}.
Level Sets and Gradients
• Fact: ∇f(x0) is orthogonal to the level set at x0
Level Sets and Gradients
• Proof of fact:
– Imagine a particle traveling along the level set.
– Let g(t) be the position of the particle at time t, with g(0) = x0.
– Note that f(g(t)) = constant for all t.
– Velocity vector g′(t) is tangent to the level set.
– Consider F(t) = f(g(t)). We have F′(0) = 0. By the chain rule,
– Hence, ∇f(x0) and g′(0) are orthogonal.
$F'(0) = g'(0)^T \nabla f(g(0)) = g'(0)^T \nabla f(x_0) = 0$
Taylor's Formula
• Suppose f: R → R is in C¹. Then
$f(x) = f(x_0) + f'(x_0)(x - x_0) + o(x - x_0)$
– o(h) is a term such that o(h)/h → 0 as h → 0.
– At x0, f can be approximated by a linear function, and the approximation gets better the closer we are to x0.
Taylor's Formula
• Suppose f: R → R is in C². Then
$f(x) = f(x_0) + f'(x_0)(x - x_0) + \tfrac{1}{2} f''(x_0)(x - x_0)^2 + o\!\left((x - x_0)^2\right)$
– At x0, f can be approximated by a quadratic function.
Taylor's Formula
• Suppose f: Rn → R.
– If f is in C¹, then
$f(x) = f(x_0) + \nabla f(x_0)^T (x - x_0) + o\!\left(\lVert x - x_0 \rVert\right)$
– If f is in C², then
$f(x) = f(x_0) + \nabla f(x_0)^T (x - x_0) + \tfrac{1}{2}(x - x_0)^T F(x_0)(x - x_0) + o\!\left(\lVert x - x_0 \rVert^2\right)$
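A quick numerical sanity check of the second-order expansion (a sketch; the smooth test function, its gradient, and its Hessian below are invented for illustration): as x approaches x0, the error of the quadratic model shrinks faster than ‖x − x0‖².

```python
import numpy as np

# f(x) = exp(x1) + x1*x2^2; gradient and Hessian written in closed form.
def f(x):    return np.exp(x[0]) + x[0] * x[1] ** 2
def grad(x): return np.array([np.exp(x[0]) + x[1] ** 2, 2.0 * x[0] * x[1]])
def hess(x): return np.array([[np.exp(x[0]), 2.0 * x[1]],
                              [2.0 * x[1],   2.0 * x[0]]])

x0 = np.array([0.5, 1.0])
d  = np.array([1.0, -2.0])

for t in [1e-1, 1e-2, 1e-3]:
    x = x0 + t * d
    quad = f(x0) + grad(x0) @ (x - x0) + 0.5 * (x - x0) @ hess(x0) @ (x - x0)
    print(t, abs(f(x) - quad))   # error decays faster than t^2
```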
In What Direction Does a Gradient Point?
• We already know that ∇f(x0) is orthogonal to the level set at x0.
– Suppose ∇f(x0) ≠ 0.
• Fact: ∇f points in the direction of increasing f.
Proof of Fact
• Consider xα = x0 + α∇f(x0), α > 0.
– By Taylor's formula,
$f(x_\alpha) = f(x_0) + \alpha \nabla f(x_0)^T \nabla f(x_0) + o(\alpha) = f(x_0) + \alpha \lVert \nabla f(x_0) \rVert^2 + o(\alpha)$
• Therefore, for sufficiently small α > 0,
$f(x_\alpha) > f(x_0)$
DESCENT METHODS
The Wolfe Theorem
• This theorem is the link from the previous gradient properties to the constructive algorithm.
• The problem: $\min_{x} f(x)$
The Wolfe Theorem
• We introduce a model algorithm:
Data: x0 ∈ Rn
Step 0: set i = 0
Step 1: if ∇f(xi) = 0, stop; else, compute a search direction hi ∈ Rn
Step 2: compute the step-size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, i ← i + 1, and go to Step 1
The Wolfe Theorem
• The theorem:
– Suppose f: Rn → R is C¹ smooth, and there exists a continuous function k: Rn → [0, 1] such that
$\forall x: \; \nabla f(x) \ne 0 \;\Rightarrow\; k(x) > 0$
– and the search vectors constructed by the model algorithm satisfy
$\left\langle \nabla f(x_i),\, h_i \right\rangle \le -k(x_i)\, \lVert \nabla f(x_i) \rVert\, \lVert h_i \rVert$
The Wolfe Theorem
– and hi = 0 only if ∇f(xi) = 0.
• Then, if $\{x_i\}_{i \ge 0}$ is the sequence constructed by the algorithm model, any accumulation point y of this sequence satisfies
$\nabla f(y) = 0$
The Wolfe Theorem
• The theorem has a very intuitive interpretation: always go in a descent direction hi, i.e. one with $\langle \nabla f(x_i), h_i \rangle < 0$.
• The principal differences between the various descent algorithms lie in the procedure for determining the successive search directions.
STEEPEST DESCENT
The Method of Steepest Descent
• We now use what we have learned to implement the most basic minimization technique.
• First we introduce the algorithm, which is a version of the model algorithm.
• The problem: $\min_{x} f(x)$
The Method of Steepest Descent
• Steepest descent algorithm:
Data: x0 ∈ Rn
Step 0: set i = 0
Step 1: if ∇f(xi) = 0, stop; else, compute the search direction $h_i = -\nabla f(x_i)$
Step 2: compute the step-size $\lambda_i = \arg\min_{\lambda \ge 0} f(x_i + \lambda h_i)$
Step 3: set $x_{i+1} = x_i + \lambda_i h_i$, i ← i + 1, and go to Step 1
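A minimal Python sketch of this algorithm (not from the slides), using scipy.optimize.minimize_scalar as a numerical line search for the step size λ; the quadratic test function at the end is an assumption for illustration.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def steepest_descent(f, grad, x0, tol=1e-6, max_iter=500):
    """Steepest descent with an exact (numerical) line search at each step."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:          # Step 1: stop when grad f(x_i) ~ 0
            break
        h = -g                               # search direction h_i = -grad f(x_i)
        # Step 2: lambda_i = argmin_{lam >= 0} f(x_i + lam * h_i)
        lam = minimize_scalar(lambda lam: f(x + lam * h),
                              bounds=(0.0, 10.0), method="bounded").x
        x = x + lam * h                      # Step 3
    return x

# Example on a quadratic bowl f(x) = x1^2 + 5*x2^2 (invented for illustration).
f    = lambda x: x[0] ** 2 + 5.0 * x[1] ** 2
grad = lambda x: np.array([2.0 * x[0], 10.0 * x[1]])
print(steepest_descent(f, grad, [3.0, 1.0]))   # close to the minimizer (0, 0)
```

Because λ is chosen by an exact line search, each new gradient is orthogonal to the previous direction, which produces the zig-zag behavior discussed a few slides below.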
The Method of Steepest Descent
• Theorem:
– If $\{x_i\}_{i \ge 0}$ is a sequence constructed by the SD algorithm, then every accumulation point y of the sequence satisfies $\nabla f(y) = 0$.
– Proof: from the Wolfe theorem.
Remark: the Wolfe theorem gives us numerical stability even if the derivatives are not given analytically (they are computed numerically).
The Method of Steepest Descent
• How long a step to take?
$x_{i+1} = x_i + \lambda h_i, \qquad h_i = -\nabla f(x_i)$
– We are limited to a line search: choose λ to minimize f along the ray, i.e. where the directional derivative equals zero.
The Method of Steepest Descent
• How long a step to take?
– From the chain rule:
$\frac{d}{d\lambda} f(x_i + \lambda h_i) = \left\langle \nabla f(x_i + \lambda h_i),\, h_i \right\rangle = 0$
– So at the optimal step, $\nabla f(x_{i+1})^T h_i = 0$: they are orthogonal!
• Therefore the method of steepest descent looks like this:
The Method of Steepest Descent
Gradient Descent Example
Given: a two-variable objective f(x1, x2) composed of sinusoidal terms in x1 and x2.
Find the minimum when x1 is allowed to vary from 0.5 to 1.5 and x2 is allowed to vary from 0 to 2.
Step size λ: arbitrary (fixed).
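A sketch of gradient descent with a fixed, arbitrary step size λ over the same box 0.5 ≤ x1 ≤ 1.5, 0 ≤ x2 ≤ 2; the sinusoidal objective and the step size below are stand-ins chosen for illustration, not the slide's exact function.

```python
import numpy as np

# Stand-in sinusoidal objective (an assumption, not the slide's exact function).
f    = lambda x: np.sin(x[0]) * np.sin(x[1]) + 0.5 * (x[0] - 1.0) ** 2
grad = lambda x: np.array([np.cos(x[0]) * np.sin(x[1]) + (x[0] - 1.0),
                           np.sin(x[0]) * np.cos(x[1])])

x = np.array([0.6, 1.8])       # start inside the box [0.5, 1.5] x [0, 2]
lam = 0.1                      # arbitrary fixed step size, as on this slide
for _ in range(200):
    x = x - lam * grad(x)
    x = np.clip(x, [0.5, 0.0], [1.5, 2.0])   # keep the iterate inside the box
print(x, f(x))
```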
Optimum Steepest Descent Example
Given: the same two-variable sinusoidal objective f(x1, x2).
Find the minimum when x1 is allowed to vary from 0.5 to 1.5 and x2 is allowed to vary from 0 to 2, now choosing the step size λ by a line search at each iteration.
CONJUGATE GRADIENT
Conjugate Gradient
• From now on, assume we want to minimize the quadratic function
$f(x) = \tfrac{1}{2}\, x^T A x - b^T x + c$
• This is equivalent to solving a linear system: setting the gradient to zero,
$\nabla f(x) = \tfrac{1}{2} A^T x + \tfrac{1}{2} A x - b = 0$
which, if A is symmetric, reduces to $A x = b$.
Sample: 2-D Linear System
• The solution is the intersection of the lines.
$A = \begin{bmatrix} 3 & 2 \\ 2 & 6 \end{bmatrix}, \qquad b = \begin{bmatrix} 2 \\ -8 \end{bmatrix}, \qquad c = 0$
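For this example the minimizer can be checked directly (a small numpy sketch using the A, b, c above): solving Ax = b gives the intersection point, where ∇f vanishes.

```python
import numpy as np

A = np.array([[3.0, 2.0],
              [2.0, 6.0]])
b = np.array([2.0, -8.0])
c = 0.0

f    = lambda x: 0.5 * x @ A @ x - b @ x + c
grad = lambda x: A @ x - b           # valid because A is symmetric

x_star = np.linalg.solve(A, b)       # intersection of the two lines
print(x_star)                        # [ 2., -2.]
print(grad(x_star))                  # ~ [0., 0.]
```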
Sample: 2-D Linear System
– Each ellipsoid has constant f(x).
In general, the solution x lies at the intersection point of n hyperplanes, each having dimension n − 1.
Conjugate Gradient
• What is the problem with steepest descent?
– We can repeat the same directions over and over…
• Wouldn’t it be better if, every time we took a step, we got it right the first time?
Conjugate Gradient
• What is the problem with steepest descent?
– We can repeat the same directions over and over…
• Conjugate gradient requires n gradient evaluations and n line searches.
Conjugate Gradient
• First, let's define the error. Let x̃ be the exact solution, $A\tilde{x} = b$, and define
$e_i = x_i - \tilde{x}$
• ei is a vector that indicates how far we are from the solution.
Conjugate Gradient
• Let's pick a set of orthogonal search directions $d_0, d_1, \ldots, d_j, \ldots, d_{n-1}$ (they should span Rn).
– In each search direction we'll take exactly one step,
$x_{i+1} = x_i + \lambda_i d_i$
and that step will be just the right length to line up evenly with x̃.
Conjugate Gradient
– Unfortunately, this method only works if you already know the answer.
• Using the coordinate axes as search directions…
Conjugate Gradient
• We have:
$A\tilde{x} = b, \qquad x_{i+1} = x_i + \lambda_i d_i, \qquad e_i = x_i - \tilde{x}$
$\nabla f(x) = A x - b = A x - A \tilde{x}$
$\nabla f(x_i) = A (x_i - \tilde{x}) = A e_i$
Conjugate Gradient
• Given $x_{i+1} = x_i + \lambda_i d_i$ and the directions $d_j$, how do we calculate $\lambda_j$?
• ei+1 should be orthogonal to di:
$d_i^T e_{i+1} = 0 \quad (\text{for example, } d_0^T e_1 = 0)$
Conjugate Gradient
• Given the directions $d_j$, how do we calculate $\lambda_j$?
– That is, we require
$d_i^T \nabla f(x_{i+1}) = 0 \;\Longleftrightarrow\; d_i^T A e_{i+1} = 0 \;\Longleftrightarrow\; d_i^T A (e_i + \lambda_i d_i) = 0$
– which gives
$\lambda_i = -\frac{d_i^T A e_i}{d_i^T A d_i} = -\frac{d_i^T \nabla f(x_i)}{d_i^T A d_i}$
Conjugate Gradient
• How do we find the directions $d_j$?
– Since the search vectors form a basis, the initial error can be expanded as
$e_0 = \sum_{i=0}^{n-1} \delta_i d_i$
– and, because $e_{i+1} = e_i + \lambda_i d_i$,
$e_j = e_0 + \lambda_0 d_0 + \lambda_1 d_1 + \dots = e_0 + \sum_{i=0}^{j-1} \lambda_i d_i$
• On the other hand,
$e_j = \sum_{i=0}^{n-1} \delta_i d_i + \sum_{i=0}^{j-1} \lambda_i d_i$
Conjugate Gradient
• We want the error to be 0 after n steps.
– Here is an idea: if $\lambda_j = -\delta_j$, then
$e_j = \sum_{i=0}^{n-1} \delta_i d_i + \sum_{i=0}^{j-1} \lambda_i d_i = \sum_{i=0}^{n-1} \delta_i d_i - \sum_{i=0}^{j-1} \delta_i d_i = \sum_{i=j}^{n-1} \delta_i d_i$
– so after j = n steps, $e_n = 0$.
• So, how can we arrange $\lambda_j = -\delta_j$?
Conjugate Gradient
• So we look for search directions $d_j$ with $\lambda_j = -\delta_j$, which holds if the directions are A-conjugate:
$d_j^T A d_i = 0, \quad i \ne j$
– A simple calculation shows that the correct choice is to build the $d_i$ from the gradients $-\nabla f(x_i)$, made A-conjugate to the previous directions; this leads to the algorithm on the next slide.
Conjugate Gradient
• Conjugate gradient algorithm for minimizing f:
Data: x0 ∈ Rn
Step 0: set $d_0 := r_0 = -\nabla f(x_0)$, i = 0
Step 1: compute the step-size $\lambda_i = \dfrac{r_i^T r_i}{d_i^T A d_i}$
Step 2: set $x_{i+1} = x_i + \lambda_i d_i$
Step 3: compute $r_{i+1} := -\nabla f(x_{i+1})$, $\;\beta_{i+1} = \dfrac{r_{i+1}^T r_{i+1}}{r_i^T r_i}$, $\;d_{i+1} = r_{i+1} + \beta_{i+1} d_i$
Step 4: set i ← i + 1 and repeat n times
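A minimal numpy sketch of this algorithm (linear conjugate gradient for f(x) = ½xᵀAx − bᵀx with A symmetric positive definite), applied to the earlier 2-D example; in exact arithmetic it terminates in at most n steps.

```python
import numpy as np

def conjugate_gradient(A, b, x0):
    """Linear conjugate gradient for f(x) = 0.5 x^T A x - b^T x (A symmetric pos. def.)."""
    x = np.asarray(x0, dtype=float)
    r = b - A @ x                # r_0 = -grad f(x_0)
    d = r.copy()                 # Step 0: d_0 = r_0
    for _ in range(len(b)):      # at most n steps in exact arithmetic
        Ad = A @ d
        alpha = (r @ r) / (d @ Ad)          # Step 1: step size along d_i
        x = x + alpha * d                   # Step 2: x_{i+1} = x_i + alpha_i d_i
        r_new = r - alpha * Ad              # r_{i+1} = -grad f(x_{i+1})
        if np.linalg.norm(r_new) < 1e-12:   # already at the solution
            break
        beta = (r_new @ r_new) / (r @ r)    # Step 3
        d = r_new + beta * d                # d_{i+1} = r_{i+1} + beta_{i+1} d_i
        r = r_new
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_gradient(A, b, [0.0, 0.0]))   # [ 2., -2.] after n = 2 steps
```

The result can be compared against np.linalg.solve(A, b), which gives the same point in this small example.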
Sources
• Jyh-Shing Roger Jang, Chuen-Tsai Sun, and Eiji Mizutani, slides for Ch. 5 of "Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence", First Edition, Prentice Hall, 1997.
• Djamel Bouchaffra. Soft Computing. Course materials. Oakland University. Fall 2005
• Lecture slides, Soft Computing. Course materials. Dipartimento di Elettronica e Informazione, Politecnico di Milano, 2004.
• Jeen-Shing Wang, Course: Introduction to Neural Networks. Lecture notes. Department of Electrical Engineering. National Cheng Kung University. Fall, 2005
Sources
• Carlo Tomasi, Mathematical Methods for Robotics and Vision. Stanford University. Fall 2000
• Petros Ioannou, Jing Sun, Robust Adaptive Control. Prentice-Hall, Inc, Upper Saddle River: NJ, 1996
• Jonathan Richard Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain, Edition 1 1/4. School of Computer Science, Carnegie Mellon University, Pittsburgh. August 4, 1994.
• Gordon C. Everstine, Selected Topics in Linear Algebra. The George Washington University. 8 June 2004.