A Short Tutorial on Evolutionary Multiobjective Optimization
Transcript of Optimization tutorial
1
Sebastian Bernasek, 7-14-2015
Intro to Optimization: Part 1
2
What is optimization?
Identify variable values that minimize or maximize some objective while satisfying
constraints
minimize f(x) (objective)
where x = {x1, x2, ..., xn} (variables)
s.t. Ax < b (constraints)
3
What for?
Finance
• maximize profit, minimize risk
• constraints: budgets, regulations
Engineering
• maximize IRR, minimize emissions
• constraints: resources, safety
Data modeling
4
Given a proposed model: y(x) = θ1 sin(θ2 x)
which parameters (θi) best describe the data?
Data modeling
5
Which parameters (θi) best describe the data?
We must quantify goodness-of-fit
Data modeling
6
A good model will have minimal residual error
Goodness-of-fit metrics
residuals: ei = Yi - y(Xi), where (Xi, Yi) are the data and y(Xi) is the model prediction, e.g. y(Xi) = θ1 sin(θ2 Xi)
7
Goodness-of-fit metrics
Least Squares: SSE = Σi [Yi - y(Xi)]² (all data equally important)
Weighted Least Squares: WSSE = Σi wi [Yi - y(Xi)]² (gives greater importance to more precise data)
We seek to minimize SSE and WSSE
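As a reference point, here is a minimal Python sketch of the two metrics for the model used in these slides; the function names and the weighting choice wi = 1/σi² are illustrative assumptions, not part of the original deck.

```python
import numpy as np

def model(x, theta):
    # the proposed model from the slides: y(x) = theta_1 * sin(theta_2 * x)
    return theta[0] * np.sin(theta[1] * x)

def sse(theta, X, Y):
    # least squares: every data point counts equally
    return np.sum((Y - model(X, theta)) ** 2)

def wsse(theta, X, Y, sigma):
    # weighted least squares with the common choice w_i = 1/sigma_i^2,
    # so more precise measurements carry more weight
    return np.sum(((Y - model(X, theta)) / sigma) ** 2)
```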
8
Log likelihood
Define the likelihood L(θ|Y) = p(Y|θ), the likelihood that θ are the true parameters given the observed data
Goodness-of-fit metrics
9
Log likelihood
Given that the Yi are independent and identically distributed, we can compute p(Y|θ) = Πi p(Yi|θ), so ln L(θ|Y) = Σi ln p(Yi|θ)
(the log transform is for convenience)
Goodness-of-fit metrics
We seek to maximize ln L(θ | Y)
10
Log likelihood
So what is p(Yi|θ) ?
Assume each residual is drawn from a distribution. For example, assume the ei are Gaussian distributed with zero mean and variance σi², so that p(Yi|θ) = (2πσi²)^(-1/2) exp[-(Yi - y(Xi))² / (2σi²)]
Goodness-of-fit metrics
11
Log likelihood
Goodness-of-fit metrics: maximize ln L(θ | Y). For Gaussian residuals this is equivalent to minimizing WSSE with weights wi = 1/σi², since the remaining terms do not depend on θ.
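A minimal sketch of the resulting objective, assuming (as above) independent Gaussian residuals with known standard deviations sigma_i; maximizing it over θ is equivalent to minimizing WSSE because the σ-dependent terms are constants.

```python
import numpy as np

def log_likelihood(theta, X, Y, sigma):
    # ln L(theta | Y) = sum_i ln p(Y_i | theta) for Gaussian residuals
    residuals = Y - model(X, theta)          # model() as defined in the sketch above
    return np.sum(-0.5 * np.log(2 * np.pi * sigma ** 2)
                  - 0.5 * (residuals / sigma) ** 2)
```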
12
Goodness-of-fit metrics
Least Squares
• simple and straightforward to implement
• requires large N for high accuracy
Weighted Least Squares
• accounts for variability in the precision of variables
• converges to least squares for high N
Log Likelihood
• requires an assumption for the residuals' PDF
13
Given a proposed model: y(x) = θ1 sin(θ2 x)
which parameters (θi) best describe the data?
Data modeling
minimize SSE(θ) (objective)
where θ = {θ1, θ2, ..., θn} (variables)
s.t. Aθ < b (constraints)
14
Given a proposed model: y(x) = θ1 sin(θ2 x)
which parameters (θi) best describe the data?
Data modeling
optimum variables: θ = {5, 1}
minimum: SSE(θ) = 277
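A hedged end-to-end sketch of slides 13-14: minimize SSE(θ) with scipy. The synthetic data (generated from the model with θ = {5, 1} plus noise) and the starting guess are assumptions for illustration, not the original dataset, so the recovered SSE will differ from the 277 quoted above.

```python
import numpy as np
from scipy.optimize import minimize

def model(x, theta):
    return theta[0] * np.sin(theta[1] * x)

def sse(theta, X, Y):
    return np.sum((Y - model(X, theta)) ** 2)

# synthetic data consistent with the slide's optimum theta = {5, 1}
rng = np.random.default_rng(0)
X = np.linspace(0, 10, 50)
Y = model(X, [5.0, 1.0]) + rng.normal(scale=1.0, size=X.size)

result = minimize(sse, x0=[4.0, 1.2], args=(X, Y), method='Nelder-Mead')
print(result.x)                # estimated theta, close to [5, 1]
print(sse(result.x, X, Y))     # the minimized objective
```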
15
minimize f(x), where x = {x1, x2, ..., xn}
s.t. Ax < b
So how do we optimize?
16
Types of problems
There are many classes of optimization problems
1. constrained vs. unconstrained
2. static vs. dynamic
3. continuous vs. discrete variables
4. deterministic vs. stochastic variables
5. single vs. multiple objective functions
17
Types of algorithms
There are many more classes of algorithms that attempt to solve these problems
[Figure source: NEOS, UW]
18
Types of algorithms
(the same taxonomy as the previous slide, with the current scope highlighted)
[Figure source: NEOS, UW]
19
Unconstrained Optimization
Here we classify algorithms by the derivative information utilized.
Zero-Order Methods (function calls only)
• Nelder-Mead Simplex (direct search)
• Powell Conjugate Directions
First-Order Methods
• Steepest Descent
• Nonlinear Conjugate Gradients
• Broyden-Fletcher-Goldfarb-Shanno (BFGS)
Second-Order Methods
• Newton's Method
• Newton Conjugate Gradient
scipy.optimize.fmin
20
Unconstrained Optimization in 1-D
All but the simplex and Newton methods call one-dimensional line searches as a subroutine.
General iterative scheme: x_{n+1} = x_n + α d_n, where α = step size and d_n = search direction.
Common option:
• Bisection methods (e.g. golden-section search): linear convergence, but robust
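A minimal golden-section line-search sketch for the 1-D subroutine described above; it assumes the minimum is already bracketed by [a, b] and re-evaluates both interior points each iteration for clarity (a production version would reuse one of them).

```python
import math

def golden_section(f, a, b, tol=1e-6):
    """Minimize a unimodal f on [a, b] by golden-section search (linear convergence)."""
    invphi = (math.sqrt(5) - 1) / 2            # 1/phi, about 0.618
    c, d = b - invphi * (b - a), a + invphi * (b - a)
    while abs(b - a) > tol:
        if f(c) < f(d):
            b, d = d, c                        # keep [a, d]; old c becomes the new d
            c = b - invphi * (b - a)
        else:
            a, c = c, d                        # keep [c, b]; old d becomes the new c
            d = a + invphi * (b - a)
    return (a + b) / 2
```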
21
Unconstrained Optimization in 1-D
Calculus-based option: Newton-Raphson (root finding)
• can use explicit derivatives or a numerical approximation
We want f'(x) = 0, so let g(x) = f'(x) and find its root by Newton-Raphson:
x_{n+1} = x_n - g(x_n)/g'(x_n) = x_n - f'(x_n)/f''(x_n)
22
Newton-Raphson
Move to the minimum of the local quadratic fit at each point; can achieve quadratic convergence for twice-differentiable functions.
[Figure: COS 323 Course Notes, Princeton U.]
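A minimal 1-D Newton-Raphson sketch, applying the root-finding update to f'(x) = 0 as described above; the derivatives are passed in explicitly, but finite-difference approximations would work as well. The example objective is an assumption for illustration.

```python
import math

def newton_1d(fprime, fsecond, x0, tol=1e-8, max_iter=50):
    # x_{n+1} = x_n - f'(x_n)/f''(x_n): jump to the minimum of the local quadratic fit
    x = x0
    for _ in range(max_iter):
        step = fprime(x) / fsecond(x)
        x -= step
        if abs(step) < tol:
            break
    return x

# e.g. minimize f(x) = x^2 - 2 sin(x), whose minimum solves 2x = 2 cos(x)
x_min = newton_1d(lambda x: 2 * x - 2 * math.cos(x),
                  lambda x: 2 + 2 * math.sin(x),
                  x0=1.0)
```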
23
Newton's Method in N-Dimensions
1. Construct a locally quadratic model m_n via Taylor expansion about x_n (for points p near x_n):
   m_n(p) = f(x_n) + ∇f(x_n)·p + (1/2) pᵀ H(x_n) p
2. At each step we want to move toward the minimum of this model, where ∇m_n(p) = 0.
   Differentiating: ∇f(x_n) + H(x_n) p = 0
   Solving: p_n = -H⁻¹(x_n) ∇f(x_n)
24
Newton's Method in N-Dimensions
3. The minimum of the local second-order model lies in the direction p_n, which becomes the search direction.
   Determine the optimal step size, α, by 1-D optimization (golden-section search, Newton's method, Brent's method, Nelder-Mead simplex, etc.)
   General iterative scheme: x_{n+1} = x_n + α d_n, with α = step size and d_n = search direction.
25
Newton's Method in N-Dimensions
4. Take the step: x_{n+1} = x_n + α p_n
5. Check termination criteria and return to step 3. Possible criteria:
• maximum iterations reached
• change in objective function below threshold
• change in local gradient below threshold
• change in local Hessian below threshold
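A minimal NumPy sketch of steps 1-5, assuming the gradient and Hessian are supplied as callables; the 1-D step-size search is delegated to scipy's scalar minimizer, and the termination test (a small gradient) is just one of the criteria listed above.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def newton_nd(f, grad, hess, x0, tol=1e-8, max_iter=100):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:               # termination: gradient below threshold
            break
        p = np.linalg.solve(hess(x), -g)          # Newton direction: H(x_n) p_n = -grad f(x_n)
        alpha = minimize_scalar(lambda a: f(x + a * p)).x   # 1-D optimization for the step size
        x = x + alpha * p
    return x
```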
26
Newton's Method in N-Dimensions: how do we compute the Hessian?
Newton's Method
• define H(x_n) expressions analytically
• invert it and multiply
• accurate
• costly for high N
• requires 2nd derivatives
BFGS Algorithm (quasi-Newton method)
• numerically approximate H⁻¹(x_n)
• multiply matrices
• avoids solving the linear system
• only requires 1st derivatives
• "crazy math I don't get"
scipy.optimize.fmin_bfgs
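A short usage sketch of the scipy routine named on this slide; the objective (the sin(x) + cos(y) surface from the comparison slides later on) and the starting point are illustrative choices. If no gradient is supplied, fmin_bfgs approximates it numerically.

```python
import numpy as np
from scipy.optimize import fmin_bfgs

f = lambda p: np.sin(p[0]) + np.cos(p[1])     # example objective; starting point below is assumed
x_opt = fmin_bfgs(f, x0=[1.0, 1.0])           # gradient estimated by finite differences
print(x_opt)
```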
27
Gradient Descent
Newton/BFGS make use of the local Hessian; alternatively we could just use the gradient.
1. Pick a starting point, x0
2. Evaluate the local derivative
3. Perform a line-search along the gradient
4. Move directly along the gradient
5. Check convergence criteria and return to step 2
28
Gradient Descent
• the function must be differentiable
• subsequent steps are always perpendicular
• can get caught in narrow valleys
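A minimal steepest-descent sketch following steps 1-5 above, assuming an explicit gradient callable; the step size along the downhill direction is chosen by a bounded 1-D search (the upper bound of 1e3 is an arbitrary assumption).

```python
import numpy as np
from scipy.optimize import minimize_scalar

def gradient_descent(f, grad, x0, tol=1e-8, max_iter=500):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        g = grad(x)
        if np.linalg.norm(g) < tol:                # convergence: gradient below threshold
            break
        d = -g                                     # move directly along the (negative) gradient
        alpha = minimize_scalar(lambda a: f(x + a * d),
                                bounds=(0.0, 1e3), method='bounded').x
        x = x + alpha * d
    return x
```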
29
Conjugate Gradient Method
Avoids reversing previous iterations by ensuring that each step is conjugate to all previous steps, creating a linearly independent set of basis vectors.
1. Pick a starting point and evaluate the local derivative
2. First step follows gradient descent
3. Compute the weight for previous steps, β_n (Polak-Ribiere version):
   β_n = g_nᵀ (g_n - g_{n-1}) / (g_{n-1}ᵀ g_{n-1}), where g_n = -∇f(x_n) is the steepest direction
30
Conjugate Gradient Method
Creates a set, s_i, of linearly independent vectors that span the parameter space x_i.
4. Compute the search direction: s_n = g_n + β_n s_{n-1}
5. Move to the optimal point along s_n (1-D line search for the step size)
6. Check convergence criteria and return to step 3
*Note that setting β_i = 0 yields the gradient descent algorithm
31
Conjugate Gradient Method
• for properly conditioned (quadratic) problems, guaranteed to converge in N iterations
• very commonly used
• scipy.optimize.fmin_cg
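A minimal nonlinear conjugate-gradient sketch using the Polak-Ribiere weight from step 3; clamping β at zero (a common restart safeguard, and an assumption beyond the slides) falls back to plain gradient descent whenever conjugacy breaks down. scipy.optimize.fmin_cg provides a production implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar

def nonlinear_cg(f, grad, x0, tol=1e-8, max_iter=200):
    x = np.asarray(x0, dtype=float)
    g = -grad(x)                                   # steepest-descent direction
    s = g                                          # first step follows gradient descent
    for _ in range(max_iter):
        if np.linalg.norm(g) < tol:
            break
        alpha = minimize_scalar(lambda a: f(x + a * s),
                                bounds=(0.0, 1e3), method='bounded').x
        x = x + alpha * s
        g_new = -grad(x)
        beta = max(0.0, g_new @ (g_new - g) / (g @ g))   # Polak-Ribiere weight, clamped at 0
        s = g_new + beta * s                             # new conjugate search direction
        g = g_new
    return x
```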
32
Powell's Conjugate Directions
Performs N line searches along N basis vectors in order to determine an optimal search direction. Preserves minimization achieved by previous steps by retaining the basis-vector set between iterations.
1. Pick a starting point and a set of basis vectors u_i (the coordinate unit vectors are the conventional choice)
2. Determine the optimum step size, α_i, along each vector
3. Let the search vector be the linear combination of basis vectors: s_n = Σ_i α_i u_i
33
Powell's Conjugate Directions
4. Move along the search vector: x_{n+1} = x_n + s_n
5. Add s_n to the basis and drop the oldest basis vector
6. Check the convergence criteria and return to step 2
Problem: the algorithm tends toward a linearly dependent basis set.
Solutions:
1. Reset to an orthogonal basis every N iterations
2. At step 5, replace the basis vector corresponding to the largest change in f(x)
34
Powell's Conjugate Directions: Advantages
• no derivatives required, only function calls
• quadratic convergence
• accessible via scipy.optimize.fmin_powell
35
Nelder-Mead Simplex Algorithm
Direct search algorithm; the default method behind scipy.optimize.fmin
Method consists of a simplex crawling around the parameter space until it finds and brackets a local minimum.
36
Nelder-Mead Simplex Algorithm
Simplex: the convex hull of N+1 vertices in N-space (2-D: a triangle; 3-D: a tetrahedron)
37
Nelder-Mead Simplex Algorithm
1. Pick a starting point and define a simplex around it with N+1 vertices x_i
2. Evaluate f(x_i) at each vertex and rank-order the vertices such that x_1 is the best and x_{N+1} is the worst
3. Evaluate the centroid of the best N vertices: x̄ = (1/N) Σ_{i=1..N} x_i
38
Nelder-Mead Simplex Algorithm
4. Reflection: let x_r = x̄ + α (x̄ - x_{N+1}), reflecting the worst point (highest function value) through the centroid.
   If f(x_r) improves on the worst point, replace x_{N+1} with x_r
[Figure: COS 323 Course Notes, Princeton U.]
39
Nelder-Mead Simplex Algorithm
5. Expansion: if the reflection resulted in the best point so far, try x_e = x̄ + β (x_r - x̄)
   If f(x_e) < f(x_r), replace x_{N+1} (the worst point, highest function value) with x_e
   If not, replace x_{N+1} with x_r
[Figure: COS 323 Course Notes, Princeton U.]
40
Nelder-Mead Simplex Algorithm
6. Contraction: if the reflected point is still the worst, then try a contraction toward the centroid: x_c = x̄ + γ (x_{N+1} - x̄)
[Figure: COS 323 Course Notes, Princeton U.]
41
Nelder-Mead Simplex Algorithm
7. Shrinkage: if the contraction fails, scale all vertices toward the best vertex:
   x_i ← x_1 + δ (x_i - x_1) for i = 2, ..., N+1
[Figure: COS 323 Course Notes, Princeton U.]
42
Nelder-Mead Simplex Algorithm
Advantages:
• doesn't require any derivatives
• few function calls at each iteration
• works with rough surfaces
Disadvantages:
• can require many iterations
• does not always converge, and convergence criteria are unknown
• inefficient in very high N
43
Nelder-Mead Simplex Algorithm
Parameter | Required | Typical
α (reflection) | α > 0 | 1
β (expansion) | β > 1 | 2
γ (contraction) | 0 < γ < 1 | 0.5
δ (shrinkage) | 0 < δ < 1 | 0.5
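Since scipy.optimize.fmin implements this algorithm (with simplex coefficients that default to the typical values in the table above), a short usage sketch; the objective is the sin(x) + cos(y) surface from the comparison slide that follows, and the starting point and tolerances are assumptions.

```python
import numpy as np
from scipy.optimize import fmin

f = lambda p: np.sin(p[0]) + np.cos(p[1])     # objective from the next slide's comparison
x_opt = fmin(f, x0=[1.0, 1.0], xtol=1e-6, ftol=1e-6, disp=True)
print(x_opt)                                  # should converge to a nearby local minimum with f = -2
```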
44
Algorithm Comparison
Objective: f(x, y) = sin(x) + cos(y)
Method | min f(x) | iterations | f(x) evals | f'(x) evals
powell | -2 | 2 | 43 | 0
conjugate gradient | -2 | 4 | 40 | 10
gradient descent | -2 | 3 | 32 | 8
bfgs | -2 | 6 | 48 | 12
simplex | -2 | 45 | 87 | 0
45
Algorithm Comparison
f(x, y) = sin(xy) + cos(y)
Simplex and Powell similarly follow valleys with a more "local" focus
BFGS/CG readily transcend valleys
46
2-D Rosenbrock Function
Method | min f(x) | iterations | f(x) evals | f'(x) evals
powell | 3.8E-28 | 25 | 719 | 0
conjugate gradient | 9.5E-08 | 33 | 368 | 89
gradient descent | 1.1E+01 | 400 | 1712 | 428
bfgs | 1.8E-11 | 47 | 284 | 71
simplex | 5.6E-10 | 106 | 201 | 0
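A sketch of how a comparison like the two tables above could be reproduced with scipy's built-in Rosenbrock helpers; the starting point is an assumed choice, plain gradient descent is not shipped with scipy so it is omitted, and exact counts will vary with the scipy version.

```python
import numpy as np
from scipy.optimize import minimize, rosen, rosen_der

x0 = np.array([-1.2, 1.0])   # assumed starting point for the 2-D Rosenbrock function

for method in ['Powell', 'CG', 'BFGS', 'Nelder-Mead']:
    jac = rosen_der if method in ('CG', 'BFGS') else None   # gradient only for first-order methods
    res = minimize(rosen, x0, method=method, jac=jac)
    print(f"{method:12s}  f(x*) = {res.fun:.1e}  iterations = {res.nit}  f evals = {res.nfev}")
```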