Including Uncertainty Models for Surrogate-based Global Optimization
The EGO algorithm
Introduction to optimization with surrogates
• Based on cycles: each consists of sampling design points by simulations, fitting surrogates to the simulations, and then optimizing an objective.
• Construct surrogate, optimize original objective, refine region and surrogate.
– Typically a small number of cycles with a large number of simulations in each cycle.
• Adaptive sampling
– Construct surrogate, then add points by taking into account not only the surrogate prediction but also the uncertainty in the prediction.
– Most popular: Jones's EGO (Efficient Global Optimization).
– Easiest with one added sample at a time.
Background: Surrogate Modeling
Surrogates replace expensive simulations by simple algebraic expressions fit to data.
• Kriging (KRG)
• Polynomial response surface (PRS)
• Support vector regression (SVR)
• Radial basis neural networks (RBNN)
Example: the surrogate prediction ŷ(x) is an estimate of the true function y(x).
• Differences between surrogates are larger in regions of low point density.
Background: Uncertainty
Some surrogates also provide an uncertainty estimate: the standard error, s(x).
Examples: kriging and polynomial response surfaces. Both of these are used in EGO.
Kriging fit and defining Improvement
• First we sample the function and fit a kriging model.
• We note the present best solution (PBS).
• At every x there is some chance of improving on the PBS.
• Then we ask: assuming an improvement over the PBS, where is it likely to be largest?
What is Expected Improvement?
Consider the point x = 0.8 and the random variable Y, the possible value of the function there. Its mean is the kriging prediction, which is slightly above zero.

E[I(x)] = E[max(y_PBS − Y, 0)]

E[I(x)] = (y_PBS − ŷ(x)) Φ((y_PBS − ŷ(x)) / s(x)) + s(x) φ((y_PBS − ŷ(x)) / s(x))

y_PBS : present best solution
ŷ(x) : kriging prediction
s(x) : prediction standard deviation
Φ : cumulative distribution function of the standard normal distribution
φ : probability density function of the standard normal distribution
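As a sanity check of the formula above, a minimal sketch in Python (function and argument names are illustrative):

```python
import math

def expected_improvement(y_pbs, y_hat, s):
    """E[I(x)] for a surrogate prediction y_hat with standard error s,
    given the present best solution y_pbs (minimization)."""
    if s <= 0.0:
        # Deterministic prediction: the improvement, if any, is known exactly.
        return max(y_pbs - y_hat, 0.0)
    z = (y_pbs - y_hat) / s
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)  # standard normal pdf
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))         # standard normal cdf
    return (y_pbs - y_hat) * cdf + s * pdf
```

Note that at ŷ(x) = y_PBS the expression reduces to s(x)·φ(0), so points with larger uncertainty still carry expected improvement.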
Expected Improvement (EI)
• The idea can be found in Mockus's work as far back as 1978.
• Expectation of improving beyond the Present Best Solution (PBS), or current best function value, y_PBS:

EI(x) = (y_PBS − ŷ(x)) Φ((y_PBS − ŷ(x)) / s(x)) + s(x) φ((y_PBS − ŷ(x)) / s(x))

• The first term is the predicted difference between the current minimum and the prediction at x, penalized by the probability (area under the curve) of actually improving. It is large when ŷ(x) is small with respect to y_PBS, which promotes exploitation.
• The second term is large when s(x) is large, which promotes exploration.
• EI thus balances seeking promising areas of the design space against the uncertainty in the model.

Mockus, J., Tiesis, V., and Zilinskas, A. (1978), "The application of Bayesian methods for seeking the extremum," in L.C.W. Dixon and G.P. Szego (eds.), Towards Global Optimisation, Vol. 2, pp. 117–129, North Holland, Amsterdam.
Exploration and Exploitation
EGO maximizes E[I(x)] to find the next point to be sampled.
• The expected improvement balances exploration and exploitation because it can be high either where the uncertainty is high or where the surrogate prediction is low.
• When can we say that the next point is "exploration"?
Recent developments in EI
• Sasena, M.J., "Flexibility and Efficiency Enhancements for Constrained Global Design Optimization with Kriging Approximations," Ph.D. Thesis, University of Michigan, 2002.
– Cool criterion for EI
• Sobester, A., Leary, S.J., and Keane, A.J., "On the design of optimization strategies based on global response surface approximation models," Journal of Global Optimization, 2005.
– Weighted EI
• Huang, D., Allen, T.T., Notz, W.I., and Zeng, N., "Global optimization of stochastic black-box systems via sequential kriging meta-models," Journal of Global Optimization, 2006.
– EGO with noisy data (Augmented EI)
Probability of targeted Improvement (PI)
• First proposed by Harold Kushner (1964).
• PI is given by

PI(x) = P(Y < y_Target) = Φ((y_Target − ŷ(x)) / s(x))

y_Target : set target to be achieved
I_T : target of improvement (usually a constant)
ŷ(x) : kriging prediction
s(x) : prediction standard deviation
Φ : cumulative distribution function of the standard normal distribution

Kushner, H.J., "A new method of locating the maximum point of an arbitrary multipeak curve in the presence of noise," Journal of Basic Engineering, Vol. 86, pp. 97–106, 1964.
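The PI formula can likewise be evaluated directly; a minimal sketch (names illustrative):

```python
import math

def probability_of_improvement(y_target, y_hat, s):
    """PI(x) = P(Y < y_target) = Phi((y_target - y_hat) / s) for a kriging
    prediction y_hat with standard error s (minimization)."""
    if s <= 0.0:
        # Deterministic prediction: improvement is certain or impossible.
        return 1.0 if y_hat < y_target else 0.0
    z = (y_target - y_hat) / s
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
```

For example, when the prediction equals the target, PI = Φ(0) = 0.5 regardless of s(x).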
Probability of targeted Improvement (PI)
[Figure: illustration of PI on a kriging fit, marking the PBS.]
Sensitivity to target setting
[Figures: two panels comparing targets of 25% and 50% improvement on the PBS, illustrating targets that are too modest (25%) and too ambitious (50%). Each panel shows the data, the PBS, y_Target, the true function y(x), the kriging prediction y_KRG(x), and the resulting PI curve with the next point added.]
How to deal with target setting in PI?
• One way is to change to EI.
• Approach proposed by Jones: use several targets corresponding to low, medium, and high desired improvements.
• Leads to selection of multiple points per iteration.
• Local and global search at the same time.
• Takes advantage of parallel computing.
• Recent development:
– Chaudhuri, A., Haftka, R.T., and Viana, F.A.C., "Efficient Global Optimization with Adaptive Target for Probability of targeted Improvement," 8th AIAA Multidisciplinary Design Optimization Specialist Conference, Honolulu, Hawaii, April 23-26, 2012.
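Jones's multiple-target idea can be sketched as follows; the improvement fractions and the rule deriving each target from the PBS are illustrative assumptions, not the published recipe:

```python
import math

def probability_of_improvement(y_target, y_hat, s):
    # PI(x) = Phi((y_target - y_hat) / s); see the PI slide.
    if s <= 0.0:
        return 1.0 if y_hat < y_target else 0.0
    z = (y_target - y_hat) / s
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def multi_target_points(candidates, y_hat, s, y_pbs, fractions=(0.01, 0.10, 0.50)):
    """For each desired improvement fraction, pick the candidate that maximizes
    PI for the corresponding target; duplicates are dropped, so up to
    len(fractions) points are returned per cycle."""
    chosen = []
    for f in fractions:
        # Illustrative target rule: improve on the PBS by a fraction f of its magnitude.
        y_target = y_pbs - f * abs(y_pbs)
        best = max(range(len(candidates)),
                   key=lambda i: probability_of_improvement(y_target, y_hat[i], s[i]))
        if best not in chosen:
            chosen.append(best)
    return [candidates[i] for i in chosen]
```

Modest targets tend to select points near the current optimum (local search), while ambitious targets favor high-uncertainty regions (global search), which is why the targets complement each other.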
Expanding EGO to surrogates other than Kriging
(a) Kriging
(b) Support vector regression
We want to run EGO with the most accurate surrogate, but we have no uncertainty model for SVR.
Considering the root mean square error, e_RMS:

e_RMS = ( ∫₀¹ e²(x) dx )^(1/2)
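The e_RMS above can be approximated numerically; a sketch using the trapezoidal rule (function names illustrative):

```python
def rms_error(f, f_hat, n=10001):
    """Approximate e_RMS = sqrt(integral over [0, 1] of (f(x) - f_hat(x))^2 dx)
    with the trapezoidal rule on n equally spaced points."""
    h = 1.0 / (n - 1)
    total = 0.0
    for i in range(n):
        x = i * h
        w = 0.5 if i in (0, n - 1) else 1.0  # trapezoid endpoint weights
        total += w * (f(x) - f_hat(x)) ** 2
    return (total * h) ** 0.5
```

In practice the true function is unknown, so e_RMS is estimated from test points or cross-validation rather than computed exactly.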
Importation of Uncertainty model
Hartmann 3 example
Hartmann 3 function (initially fitted with 20 points).
After 20 iterations (i.e., a total of 40 points), the improvement (I) over the initial best sample (IBS):

I = (y_IBS − y_optm) / y_IBS
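For reference, a Python sketch of the Hartmann 3 test function and the improvement metric above; the coefficients are the ones commonly tabulated in the global-optimization literature, so verify them against your own source before relying on them:

```python
import math

# Commonly tabulated Hartmann 3 coefficients; the global minimum is about
# -3.8628 near x = (0.1146, 0.5556, 0.8525) on [0, 1]^3.
ALPHA = [1.0, 1.2, 3.0, 3.2]
A = [[3.0, 10.0, 30.0],
     [0.1, 10.0, 35.0],
     [3.0, 10.0, 30.0],
     [0.1, 10.0, 35.0]]
P = [[0.3689, 0.1170, 0.2673],
     [0.4699, 0.4387, 0.7470],
     [0.1091, 0.8732, 0.5547],
     [0.0381, 0.5743, 0.8828]]

def hartmann3(x):
    """Hartmann 3 function (to be minimized)."""
    total = 0.0
    for alpha_i, a_i, p_i in zip(ALPHA, A, P):
        inner = sum(a_ij * (x_j - p_ij) ** 2
                    for a_ij, x_j, p_ij in zip(a_i, x, p_i))
        total += alpha_i * math.exp(-inner)
    return -total

def improvement(y_ibs, y_optm):
    """Improvement over the initial best sample, as defined on the slide."""
    return (y_ibs - y_optm) / y_ibs
```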
Two other Designs of Experiments
[Figures: results for a first and a second alternative DOE.]
Summary: Hartmann 3 example
In 34 DOEs (out of 100), KRG outperforms RBNN (in those cases, the difference between the improvements has a mean of only 0.8%).
Box plot of the difference between improvement offered by different surrogates after 20 iterations (out of 100 DOEs)
Potential of EGO with Multiple surrogates
Hartmann 3 function (100 DOEs with 20 points)
Overall, surrogates are comparable in performance.
EGO with Multiple Surrogates (MSEGO)
Traditional EGO uses kriging to generate one point at a time. We use multiple surrogates to get multiple points.
EGO with Multiple Surrogates (MSEGO)
“krg” runs EGO for 20 iterations adding one point at a time.
“krg-svr” and “krg-rbnn” run 10 iterations adding two points/iteration.
Multiple surrogates offer good results in half the time!
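One MSEGO cycle can be sketched as follows; the surrogate interface (a predict/standard-error pair of callables per surrogate) and the candidate-grid search are illustrative assumptions:

```python
import math

def expected_improvement(y_pbs, y_hat, s):
    # EI formula from the earlier slide.
    if s <= 0.0:
        return max(y_pbs - y_hat, 0.0)
    z = (y_pbs - y_hat) / s
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (y_pbs - y_hat) * cdf + s * pdf

def msego_points(candidates, surrogates, y_pbs):
    """One MSEGO cycle: each surrogate is a (predict, stderr) pair of callables
    and contributes the candidate maximizing its own EI (duplicates dropped)."""
    chosen = []
    for predict, stderr in surrogates:
        best = max(candidates,
                   key=lambda x: expected_improvement(y_pbs, predict(x), stderr(x)))
        if best not in chosen:
            chosen.append(best)
    return chosen
```

All chosen points can then be evaluated by simulation in parallel before the surrogates are refit.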
EGO with Multiple Surrogates and Multiple Sampling Criteria
Multiple sampling (or infill) criteria (each of which can also add multiple points per optimization cycle):
• EI: expectation of improvement beyond the present best solution.
– Multipoint EI (q-EI), Kriging Believer, Constant Liar
• PI: probability of improving beyond a set target.
– Multiple targets, multipoint PI
Multiple sampling criteria with multiple surrogates have very high potential for:
• Providing insurance against failed optimal designs.
• Providing insurance against inaccurate surrogates.
• Leveraging parallel computation capabilities.
Problems: EGO
• What is Expected Improvement?
• What happens if you set very ambitious targets while using Probability of targeted Improvement (PI)? How could this feature be useful?
• What are the potential advantages of adding multiple points per optimization cycle using multiple surrogates and multiple sampling criteria?