DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS
Delphine Sinoquet (IFPEN)
COURSE 1: MAIN DFO METHODS
https://www.ifpenergiesnouvelles.fr/page/delphine-sinoquet
© 2020 IFPEN
DERIVATIVE FREE OPTIMIZATION AND APPLICATIONS
Course 1: main DFO methods
Course 2: various applications of DFO
Course 3: some challenges in DFO
REFERENCES
Audet, C. and Hare W., Derivative-Free and Blackbox Optimization, Springer Series in Operations Research and Financial Engineering (2017)
Conn, A.R., Scheinberg, K., Vicente, L.N., Introduction to derivative-free optimization. SIAM, Philadelphia (2009)
Rios, L.M. & Sahinidis, N.V., Derivative-free optimization: a review of algorithms and comparison of software implementations, J. Glob. Optim. (2013), Vol. 56, Issue 3, pp. 1247–1293
Cartis, C., Lectures on global and derivative-free optimization, University of Oxford (2018)
WHY DERIVATIVE FREE OPTIMIZATION ?
STUDY OF COMPLEX SYSTEMS

Parameters → Experiments/Simulations → Responses of interest

Goal: reach one or several objectives by tuning the input parameters.

➢ Manual optimization (trial & error): suitable when the expert knows the system very well and can control it, and when the number of parameters is small
➢ Random exploration: how many simulations should we run? How do we know that the current set of values is close to a solution?
➢ Discretisation of the parameter space on a regular grid: 3^n simulations if we consider 3 points per dimension and n parameters!
STUDY OF COMPLEX SYSTEMS: OPTIMIZATION

Parameters → Experiments/Simulations → Responses of interest: min_x f(x)

➢ An optimizer O is an algorithm which iteratively proposes a new x, based on the information from previous trials, in order to approximate the solution of the problem x* = argmin_x f(x):

  x_0 → S → f(x_0) → O → x_1 → … → f(x_k) → O → x_{k+1}

  (the simulator S evaluates x ↦ f(x); the optimizer O proposes the next x)

➢ The cost of the optimization is driven by the number of calls to the simulator S, which evaluates f
WHY DERIVATIVE FREE OPTIMIZATION ?
More and more complex simulators:
➢ black-box simulator (proprietary code or a simulation package): derivatives of the objective function are not available
➢ numerical approximation of ∇f(x) is expensive: finite differences are costly when computing f(x) is expensive or when the number of optimization variables is large
➢ computing f(x) is expensive: time-consuming numerical simulations or experiments

➢ Need for optimization methods adapted to derivative-free problems

[Diagram: Optimizer (min_x f(x)) sends x to the Simulator, which returns f(x)]
DFO METHODS

(Standard derivative-based methods with approximate gradients)

Direct search methods
  Nelder-Mead simplex
  Pattern search

Surrogate optimization / model-based DFO methods
  Local model of the objective function
  Global model of the objective function

Stochastic DFO methods
  Evolutionary strategies
  Simulated annealing
APPROXIMATE GRADIENTS

➢ by finite differences (F.D.): n + 1 simulations per iteration

  ∂f/∂x_i (x_k) ≈ [f(x_k + h e_i) − f(x_k)] / h

➢ by generalized finite differences (G.F.D.):

  ∇f(x_k) ≈ (Δx)⁻¹ [f(x_k + Δx) − f(x_k)]

  where Δx is the perturbation matrix (for F.D., Δx = I_n)

➢ by an approximation model of the objective function: simulations from previous iterations + m (≤ n) new simulations

  F̃(x_k + s) = f(x_k) + g_kᵀ s   or   F̃(x_k + s) = f(x_k) + g_kᵀ s + sᵀ H̃_k s

  ∇f(x_k) ≈ g_k
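The forward-difference formula above can be sketched in a few lines of Python; this is a minimal illustration, where the step h = 1e-6 and the Rosenbrock test function are choices made for the example:

```python
def fd_gradient(f, x, h=1e-6):
    """Forward finite-difference gradient of f at x: n + 1 evaluations."""
    f0 = f(x)                                  # 1 evaluation at the current point
    g = []
    for i in range(len(x)):                    # n extra evaluations, one per coordinate
        xp = list(x)
        xp[i] += h
        g.append((f(xp) - f0) / h)
    return g

def rosenbrock(x):
    return 100.0 * (x[1] - x[0] ** 2) ** 2 + (1.0 - x[0]) ** 2

g = fd_gradient(rosenbrock, [-1.0, 1.0])       # exact gradient at (-1, 1) is (-4, 0)
```

The n + 1 evaluations per call are exactly the per-iteration cost quoted above.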
APPROXIMATE GRADIENTS: FINITE DIFFERENCES

[Figure: successive iterations of a finite-difference gradient method in the (x1, x2) plane]
APPROXIMATE GRADIENTS: MODEL OF THE OBJECTIVE FUNCTION

[Figure: successive iterations of a model-based gradient approximation in the (x1, x2) plane]
A CLASSICAL EXAMPLE FOR NONLINEAR (NL) OPTIMIZATION

Rosenbrock function: f(x1, x2) = 100 (x2 − x1²)² + (1 − x1)²

Initial point: (−1; 1). Solution: (1; 1).

[Figure: contour plot of the Rosenbrock function on [−1, 1] × [−0.5, 1.5]]
DERIVATIVE-BASED METHOD

Solution obtained with the Newton method: 14 iterations
Solution obtained with the BFGS method: 53 iterations
APPROXIMATE GRADIENTS

Solution obtained with the BFGS method and approximate gradients: 92 simulations
APPROXIMATE GRADIENTS: DRAWBACKS

Function-evaluation cost: n + 1 evaluations at each iteration
The finite-difference step h is difficult to choose
Noisy functions → meaningless approximate gradients

➢ Convergence issues
DFO METHODS

Use only objective function values, no gradient approximation

Sample of points (x_i)_{i=1,…,p} → simulations f(x_i), i = 1,…,p → new iterate x_k
DFO METHODS

(Standard derivative-based methods with approximate gradients)

Direct search methods
  Nelder-Mead simplex
  Pattern search

Surrogate optimization / model-based DFO methods
  Local model of the objective function
  Global model of the objective function

Stochastic DFO methods
  Evolutionary strategies
  Simulated annealing
DIRECT SEARCH METHODS

Nelder-Mead simplex algorithm (source: Richards, 2010)

Based on the comparison of objective function values at the n + 1 vertices of a simplex, ordered so that f(x_1) ≤ f(x_2) ≤ … ≤ f(x_{n+1})

Attempt to improve the worst objective function value f(x_{n+1}): x_{n+1} is replaced by a point on the line through x̄ and x_{n+1}, where x̄ = (1/n) Σ_{i=1}^{n} x_i is the centroid of the best n points

→ expansion, reflection or contraction of the simplex at each iteration
[Figure: Nelder-Mead simplex steps (source: Wright, 2013)]
Termination conditions: function values at the vertices are close to each other, or the simplex becomes too small

Simulation cost: (n + 1) evaluations at k = 0 and for any shrinkage step; 1 or 2 evaluations for all other steps

Limited convergence results (only for n = 1 or n = 2); see Torczon (1991) for other simplex methods with better convergence results

Many failure examples
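A single reflection step of the scheme above can be sketched in Python; this is a minimal illustration for n = 2, and the reflection coefficient α = 1 is the standard choice (the expansion, contraction and shrink cases are omitted):

```python
def reflect(simplex, f, alpha=1.0):
    """One Nelder-Mead reflection: replace the worst vertex by its
    reflection through the centroid of the n best vertices."""
    simplex = sorted(simplex, key=f)                   # f(x1) <= ... <= f(x_{n+1})
    worst = simplex[-1]
    n = len(simplex) - 1                               # vertices have dimension n
    centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
    reflected = [c + alpha * (c - w) for c, w in zip(centroid, worst)]
    if f(reflected) < f(worst):                        # accept if it improves the worst value
        simplex[-1] = reflected
    return simplex

f = lambda x: x[0] ** 2 + x[1] ** 2
s = reflect([[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]], f)   # worst vertex (2, 2) is reflected
```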
DIRECT SEARCH METHODS

Linesearch derivative-free methods: e.g. the coordinate search method

  x_1 = x_0 + α_0 e_1
  x_2 = x_1 + α_1 e_2
  ⋮
  x_n = x_{n−1} + α_{n−1} e_n
  x_{n+1} = x_n + α_n e_1
  ⋮

α_k is chosen to produce a sufficient decrease

  f(x_k + α_k e_i) < f(x_k) − ρ(α_k)

with ρ(t) ≥ 0 an increasing function of t such that ρ(t)/t → 0 as t → 0

→ Inefficient when the coordinate directions are (almost) ⊥ ∇f(x_k)
→ Efficient when the variables are essentially uncoupled
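The coordinate search loop can be sketched in Python; this is a minimal version, where the sufficient-decrease function ρ, the starting step α = 0.5 and the halving of the step on failure are choices made for the example:

```python
def coordinate_search(f, x, alpha=0.5, tol=1e-6, rho=lambda t: 1e-4 * t ** 2):
    """Cycle through coordinate directions ±e_i, accepting steps that give
    a sufficient decrease f(x + alpha*e_i) < f(x) - rho(alpha)."""
    x = list(x)
    fx = f(x)
    while alpha > tol:
        improved = False
        for i in range(len(x)):
            for sign in (+1.0, -1.0):          # try both directions along e_i
                y = list(x)
                y[i] += sign * alpha
                fy = f(y)
                if fy < fx - rho(alpha):       # sufficient-decrease condition
                    x, fx, improved = y, fy, True
                    break
        if not improved:
            alpha /= 2.0                       # shrink the step and retry
    return x, fx

x_opt, f_opt = coordinate_search(lambda x: (x[0] - 1) ** 2 + (x[1] + 2) ** 2,
                                 [0.0, 0.0])
```

On this separable quadratic the variables are uncoupled, which is exactly the favourable case noted above.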
DIRECT SEARCH METHODS

Pattern search methods

Motivation: parallelization of function evaluations. Instead of one search direction s_l (= e_l) as in linesearch, explore a set of directions D_k, e.g. D_k = {e_1, e_2, …, e_n, −e_1, −e_2, …, −e_n}

At each iteration, for a given mesh step α_k:
  Search step (optional): evaluate the objective function on a finite number of points chosen with any method (along a given direction, on a simplex, …)
  Poll step: if no better point is found in the optional search step, search for a better point along the D_k directions: x_k + α_k s_i, s_i ∈ D_k

If no better point is found (no smaller function value): x_{k+1} = x_k and decrease the mesh size α_k; else x_{k+1} = x_k + α_k s_i and increase the mesh size
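One poll step of the scheme above can be sketched as follows; this is a minimal illustration, and the expansion/contraction factors 2 and 0.5 are common defaults rather than values prescribed by the slides:

```python
def poll_step(f, x, alpha):
    """One pattern-search poll over D = {±e_i}: return the new point
    and the updated mesh size."""
    n = len(x)
    directions = [[1.0 if j == i else 0.0 for j in range(n)] for i in range(n)]
    directions += [[-d for d in v] for v in directions]      # D = {e_i} ∪ {-e_i}
    fx = f(x)
    for s in directions:                                     # poll the 2n mesh points
        y = [xi + alpha * si for xi, si in zip(x, s)]
        if f(y) < fx:                                        # success: expand the mesh
            return y, 2.0 * alpha
    return x, 0.5 * alpha                                    # failure: shrink the mesh

x, a = poll_step(lambda x: x[0] ** 2 + x[1] ** 2, [1.0, 1.0], 0.5)
```

The 2n poll evaluations are independent, which is what makes the method easy to parallelize.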
[Figure: pattern search mesh and poll points (source: Kolda et al., 2003)]
DIRECT SEARCH METHODS

Pattern search methods: DIRECT (= DIviding RECTangles) (source: Perttunen et al., 1993)

Divide each side of the "rectangle(s)" associated with the smallest function values into 3, in order to define sub-rectangles
Evaluate the center of the new rectangles
Stopping criterion: minimal size of the rectangles

Global convergence for continuous functions
High evaluation cost
DIRECT ALGORITHM

Solution obtained with the DIRECT algorithm: 517 simulations
[Figure: last iterations of the DIRECT algorithm]
DIRECT SEARCH METHODS: SUMMARY

Popular methods: easy to implement, easy to parallelize
But expensive in terms of simulations
Pattern search methods are often coupled with a surrogate model in the search step
DFO METHODS

(Standard derivative-based methods with approximate gradients)

Direct search methods
  Nelder-Mead simplex
  Pattern search

Surrogate optimization / model-based DFO methods
  Local model of the objective function
  Global model of the objective function

Stochastic DFO methods
  Evolutionary strategies
  Simulated annealing
SURROGATE OPTIMIZATION METHODS

Optimization methods based on a surrogate model of the objective function:
  to limit the number of evaluations of the objective function
  the model is updated during the iterations, based on new simulations
  the model is either global or local
SURROGATE OPTIMIZATION METHODS

Global models

Design of experiments: choose the evaluation points used to compute the initial model
  space-filling design (maximin criterion)

Regression: choose a model type
  Gaussian process (kriging), radial basis functions (RBF), neural networks

Sampling criterion: choose new point(s) to evaluate in order to update the model
  minimum of the current model, maximum of the prediction error, …
SURROGATE OPTIMIZATION METHODS

Global models: Gaussian process (kriging)

Assumption: the objective function is a realization of a Gaussian random process (GP) with a parametric mean function and a stationary covariance function:

  F(x) = βᵀ r(x) + Z(x)

where βᵀ r(x) is the regression part (e.g. polynomial of degree 1) and Z(x) is the stochastic part, a zero-mean GP with stationary covariance ρ

The surrogate model is the conditional expectation of the GP:

  F̂(x) = E[ F(x) | (x_i, f(x_i))_{i=1,…,p} ]
        = βᵀ r(x) + kᵀ(x) K⁻¹ (Y_p − R β)

with R = (r_j(x_i))_{i,j}, K = (ρ(x_i, x_j))_{i,j}, k(x) = (ρ(x, x_1), …, ρ(x, x_p)), and Y_p the vector of observations (f(x_1), …, f(x_p))

The variance of the GP is used as an error indicator:

  σ̂²(x) = σ² − kᵀ(x) K⁻¹ k(x)
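The conditional-mean formula can be illustrated for a simple one-dimensional kriging model. This sketch assumes β = 0 (simple kriging) and a squared-exponential covariance ρ, both choices made for the example, and solves K α = Y_p with a small Gaussian-elimination helper:

```python
import math

def solve(A, b):
    """Gaussian elimination with partial pivoting, for small dense systems."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]   # augmented matrix [A | b]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            fac = M[r][c] / M[c][c]
            for j in range(c, n + 1):
                M[r][j] -= fac * M[c][j]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):                 # back substitution
        x[r] = (M[r][n] - sum(M[r][j] * x[j] for j in range(r + 1, n))) / M[r][r]
    return x

def krige(xs, ys, x, length=1.0):
    """Simple-kriging predictor kᵀ(x) K⁻¹ Y_p with Gaussian covariance."""
    rho = lambda a, b: math.exp(-0.5 * ((a - b) / length) ** 2)
    K = [[rho(a, b) for b in xs] for a in xs]
    alpha = solve(K, ys)                           # alpha = K⁻¹ Y_p
    return sum(rho(x, xi) * ai for xi, ai in zip(xs, alpha))

xs, ys = [0.0, 1.0, 2.0], [1.0, 0.0, 1.0]
f_hat = krige(xs, ys, 1.0)                         # interpolates the data points
```

At a data point x = x_i, k(x) equals a row of K, so the predictor reproduces f(x_i) exactly: kriging is an interpolator.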
[Figure: Gaussian process (kriging) surrogate model]
SURROGATE OPTIMIZATION METHODS

Sampling criterion based on Expected Improvement (EI): a balance between exploration and minimization

  x_new = argmax_x EI(x),  EI(x) = E[ I(x) ] = E[ max(0, f_min − F(x)) ]

➢ EGO (Efficient Global Optimization), also known as Bayesian optimization
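For a Gaussian prediction with mean μ(x) and standard deviation σ(x), EI has the well-known closed form EI = (f_min − μ) Φ(z) + σ φ(z) with z = (f_min − μ)/σ, which can be computed with the standard library only:

```python
import math

def expected_improvement(mu, sigma, f_min):
    """Closed-form EI for a Gaussian prediction N(mu, sigma^2), minimization."""
    if sigma <= 0.0:
        return max(0.0, f_min - mu)                          # no uncertainty left
    z = (f_min - mu) / sigma
    Phi = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal CDF
    phi = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal PDF
    return (f_min - mu) * Phi + sigma * phi

ei = expected_improvement(mu=0.0, sigma=1.0, f_min=0.0)
```

Both a low predicted mean (minimization) and a large predicted uncertainty (exploration) increase EI, which is exactly the balance mentioned above.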
SURROGATE OPTIMIZATION METHODS

Local models

Quadratic interpolation models, built from a set of appropriately chosen sample points:

  F_k(s) = c_k + sᵀ g_k + (1/2) sᵀ H_k s,  s ∈ ℬ(x_k, Δ)  (trust region, TR)

with c_k ∈ ℝ, g_k ∈ ℝⁿ and H_k ∈ ℝⁿˣⁿ (symmetric) satisfying the interpolation conditions F_k(x_i − x_k) = f(x_i). The matrix of the linear system must be non-singular and well conditioned.

Minimization of the quadratic model in the trust region: min_{‖s‖ ≤ Δ_k} F_k(s)

Update the model with new evaluations

Improve the geometry of the interpolation set to help with the model interpolation step: one point is replaced by another one that improves the conditioning of the interpolation matrix
SURROGATE OPTIMIZATION METHODS

Local models: trust-region update

Let s_k be the solution of the (TR) minimization problem

Predicted model decrease: F_k(0) − F_k(s_k) = f(x_k) − F_k(s_k)
Actual function decrease: f(x_k) − f(x_k + s_k)

The trust region is updated according to the value of

  ρ_k = [f(x_k) − f(x_k + s_k)] / [f(x_k) − F_k(s_k)]

If ρ_k ≥ η (successful step): x_{k+1} = x_k + s_k, Δ_{k+1} ≥ Δ_k
If ρ_k < η (unsuccessful step): x_{k+1} = x_k, and Δ_k is reduced or the interpolation set is improved
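The acceptance test and radius update can be sketched as follows; this is a minimal illustration, and the threshold η = 0.1 and the factors 2 and 0.5 are typical values rather than values fixed by the slides:

```python
def tr_update(f, model, x_k, s_k, delta_k, eta=0.1):
    """Trust-region step acceptance based on rho = actual / predicted decrease."""
    predicted = model(x_k, [0.0] * len(x_k)) - model(x_k, s_k)  # f(x_k) - F_k(s_k)
    actual = f(x_k) - f([xi + si for xi, si in zip(x_k, s_k)])
    rho = actual / predicted
    if rho >= eta:                       # successful step: move and enlarge the region
        return [xi + si for xi, si in zip(x_k, s_k)], 2.0 * delta_k
    return x_k, 0.5 * delta_k            # unsuccessful: stay and shrink the region

f = lambda x: x[0] ** 2
model = lambda x, s: (x[0] + s[0]) ** 2        # exact model, so rho = 1 here
x1, d1 = tr_update(f, model, [2.0], [-1.0], 1.0)
```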
SURROGATE OPTIMIZATION METHODS

Local models: illustration

[Figure sequence: successive trust-region iterations; legend: objective function, quadratic model to be minimized, evaluation point, solution of the QP, point for model improvement, trust region, minimal region]
57 | © 2 0 2 0 I F P E N
SURROGATE OPTIMIZATION METHODS
Local models (TR DFO)
-1 0 1-0,5
-0
0,5
1
1,5
86 simulations
Global models (Gaussian process): solution obtained in 29 simulations
DFO METHODS

(Standard derivative-based methods with approximate gradients)

Direct search methods
  Nelder-Mead simplex
  Pattern search

Surrogate optimization / model-based DFO methods
  Local model of the objective function
  Global model of the objective function

Stochastic DFO methods
  Evolutionary strategies
  Simulated annealing
STOCHASTIC METHODS

Evolution strategies

Global optimization; very few assumptions on function regularity

Main principle:
1. Random generation of the initial population
2. Evaluation of each individual of the current generation
3. Reproduction: selection of the best individuals
4. Diversification: cross-over and mutation
5. Replacement: survival of the best individuals
6. Repeat from step 2 until a satisfying solution is obtained
STOCHASTIC METHODS

Evolution strategies: operators

An example of a cross-over operator for continuous variables: for a pair of individuals (x, y) selected randomly,

  (x, y) → α x + (1 − α) y,  α ~ U([0; 1])

An example of a mutation operator for continuous variables: addition of a Gaussian noise,

  x_i := x_i + u_i,  u_i ~ N(0, σ_i²),  i = 1, 2, …, n

σ_i is a critical parameter to tune
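Both operators can be sketched with the standard library; this is a minimal illustration, and the population handling and selection steps are omitted:

```python
import random

def crossover(x, y):
    """Arithmetic cross-over: a random convex combination of two parents."""
    alpha = random.random()                     # alpha ~ U([0, 1])
    return [alpha * xi + (1.0 - alpha) * yi for xi, yi in zip(x, y)]

def mutate(x, sigmas):
    """Gaussian mutation: add N(0, sigma_i^2) noise to each coordinate."""
    return [xi + random.gauss(0.0, si) for xi, si in zip(x, sigmas)]

random.seed(0)
child = crossover([0.0, 0.0], [1.0, 1.0])       # lies on the segment between parents
mutant = mutate(child, [0.1, 0.1])
```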
STOCHASTIC METHODS

Evolution strategies: step-size adaptation

1/5-rule (Rechenberg): let p_s be the proportion of successful mutations over the last T generations
  If p_s > 0.2, then σ increases
  If p_s < 0.2, then σ decreases
➢ Can adapt one general step size
➢ But no individual step sizes

Mutative step-size control: strategy parameters (step sizes) are treated similarly to the optimized parameters
➢ Facilitates the adaptation of individual step sizes
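Rechenberg's 1/5 success rule amounts to a one-line update; in this minimal sketch the adaptation factor 1.5 is an arbitrary choice for the example:

```python
def one_fifth_rule(sigma, success_rate, factor=1.5):
    """Rechenberg's 1/5 success rule: enlarge the step size when more than
    1/5 of recent mutations improved the objective, shrink it otherwise."""
    if success_rate > 0.2:
        return sigma * factor        # progress is easy: take bigger steps
    if success_rate < 0.2:
        return sigma / factor        # progress is hard: take smaller steps
    return sigma

sigma = one_fifth_rule(1.0, 0.5)     # half of the mutations succeeded: enlarge
```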
STOCHASTIC METHODS

Evolution strategies: covariance matrix adaptation

Mutation operator with a full covariance matrix ~ inverse of a "global" Hessian matrix

Select the best individuals to compute the mean, then update
  the covariance matrix
  the global standard deviation
STOCHASTIC METHODS

Evolution strategies vs. genetic algorithms

Main difference: in evolution strategies, only the best individuals are allowed to reproduce (elitist selection), and the parents can be included in the next generation
Similar operators: mutation, cross-over
STOCHASTIC METHODS

Evolution strategies: the CMA-ES method (Hansen)

[Figure: CMA-ES populations at generations 1, 20 and 30]
66 | © 2 0 2 0 I F P E N
SURROGATE OPTIMIZATION METHODS
Evolution strategy; CMAES
-1 0 1-0,5
-0
0,5
1
1,5
370 simulations
Evolution strategy (CMA-ES): the 170 last simulations
STOCHASTIC METHODS

Simulated annealing

Principle: emulate the physical cooling of a solid, so that the frozen state corresponds to a minimum-energy configuration

At a given iteration, a new solution (state) x_{k+1} is determined from the previous solution (state) x_k: randomly choose a neighbour y_k of x_k, then

  x_{k+1} = y_k   if f(y_k) ≤ f(x_k)
  x_{k+1} = y_k   if f(y_k) > f(x_k), with probability exp([f(x_k) − f(y_k)] / t_k)
  x_{k+1} = x_k   otherwise

t_k (the temperature of the system) is a decreasing sequence, in order to decrease the probability of accepting a bad solution (one that increases f) in the last iterations
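The acceptance rule above can be sketched with the standard library; this is a minimal illustration, and the Gaussian neighbour generator and the geometric cooling factor 0.95 are choices made for the example:

```python
import math
import random

def anneal(f, x0, t0=1.0, cooling=0.95, iters=2000, step=0.5):
    """Simulated annealing with Gaussian neighbours and geometric cooling."""
    x = list(x0)
    fx = f(x)
    best, f_best = list(x), fx
    t = t0
    for _ in range(iters):
        y = [xi + random.gauss(0.0, step) for xi in x]    # random neighbour of x_k
        fy = f(y)
        # accept downhill moves always, uphill moves with probability exp((fx - fy)/t)
        if fy <= fx or random.random() < math.exp((fx - fy) / t):
            x, fx = y, fy
            if fx < f_best:
                best, f_best = list(x), fx
        t *= cooling                                      # decreasing temperature t_k
    return best, f_best

random.seed(1)
x_best, f_best = anneal(lambda x: x[0] ** 2 + x[1] ** 2, [2.0, 2.0])
```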
STOCHASTIC METHODS

Simulated annealing (SIMPSA): the 280 last simulations
DFO METHODS

Benchmark (Moré & Wild): 53 problems, 2 ≤ n ≤ 12

SQPAL = SQP BFGS – FD
SQA = local surrogate optimization
EGO = global surrogate optimization (kriging)
NMSMAX = Nelder-Mead simplex
CMAES = evolutionary strategy
DFO METHODS

Summary (I)

Classical methods with approximate gradients
➢ popular (no change to the optimizer)
➢ but not adapted to large-scale problems
➢ step-size tuning is cumbersome

Direct search methods
➢ revival of these methods with parallelization
➢ hybrid implementations (coupled with surrogate models)
➢ not adapted to large-scale problems
DFO METHODS

Summary (II)

Surrogate optimization
Local interpolation model with TR
➢ can handle constraints (coupled with SQP)
➢ good performance in terms of number of function evaluations

Global models
➢ global methods
➢ not adapted to large-scale problems: need many evaluations to obtain a good accuracy
DFO METHODS

Summary (III)

Evolutionary strategies / simulated annealing
➢ no assumption on function regularity
➢ discrete optimization is possible
➢ global methods
➢ but expensive in terms of function evaluations
➢ difficult to handle constraints
DFO METHODS

Global versus local

Multi-start optimization: run a local method from several initial points
➢ well adapted to functions with a small number of local minima
➢ handles constraints

Global surrogate optimization (kriging, RBF, NN) / evolutionary algorithms / simulated annealing
➢ not adapted to large-scale problems: need many evaluations to obtain a good accuracy
➢ difficult to handle constraints