Surrogate model based design optimization

Aerospace design is synonymous with long-running, computationally intensive simulations, which are employed in the search for optimal designs in the presence of multiple, competing objectives and constraints. The difficulty of this search is often exacerbated by numerical 'noise' and inaccuracies in simulation data, and by the frailties of complex simulations, that is, they often fail to return a result. Surrogate-based optimization methods can be employed to solve, mitigate, or circumvent the problems associated with such searches.

Alex Forrester, Rolls-Royce UTC for Computational Engineering
Bern, 22nd November 2010
Slide 2
Coming up before the break:
- Surrogate model based optimization: the basic idea
- Kriging: an intuitive perspective
- Alternatives to Kriging
- Optimization using surrogates
- Constraints
- Missing data
- Parallel function evaluations
- Problems with Kriging error based methods
Slide 3
Surrogate model based optimization

Surrogate used to expedite the search for the global optimum; global accuracy of the surrogate is not a priority.

Flowchart: PRELIMINARY EXPERIMENTS → SAMPLING PLAN → OBSERVATIONS → CONSTRUCT SURROGATE(S) (design sensitivities available? multi-fidelity data?) → SEARCH INFILL CRITERION, i.e. optimization using the surrogate(s) (constraints present? noise in data? multiple design objectives?) → ADD NEW DESIGN(S) → back to OBSERVATIONS
Slide 4
Kriging (with a little help from Donald Jones)
Slide 5
Intuition is Important! People are reluctant to use a tool they can't understand. Recall how basic probability was motivated by various games of chance involving dice, balls, and cards? In the same way, we can make kriging intuitive. Therefore, we will now describe the Kriging Game.
Slide 6
Game Equipment: 16 function cards (A1, A2, …, D4), laid out in a grid with columns A–D and rows 1–4.
Slide 7
Rules of the Kriging Game
- Dealer shuffles the cards and draws one at random. He does not show it.
- Player gets to ask the value at either x=1, x=2, x=3, or x=4.
- Based on the answer, the Player must guess the values of the function at all of x=1, x=2, x=3, and x=4.
- Dealer reveals the card. The Player's score is the sum of squared differences between the guesses and the actual values (lower is better).
- The Player and Dealer switch roles and repeat. After 100 rounds, the person with the lowest score wins.
What's the best strategy?
Slide 8
Example: Ask the value at x=2 and the answer is y=1. (Card grid shown: columns A–D, rows 1–4.)
Slide 9
The value at x=2 rules out all but 4 functions: C1, A2, A3, B3. At any value other than x=2, we aren't sure what the value of the function is. But we know the possible values. What guess will minimize our squared error?
Slide 10
Yes, it's the mean. But why?
Slide 11
The best predictor is the mean Our best predictor is the mean
of the functions that match the sampled values. Using the range or
standard deviations of the values, we could also give a confidence
interval for our prediction.
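The claim that the mean minimizes squared error can be checked numerically. A minimal sketch, using made-up stand-in values for the four matching cards (not the actual card values):

```python
import numpy as np

# Hypothetical values at one unsampled x for the four cards that match
# the observed point (illustrative numbers only).
values = np.array([0.0, 1.0, 2.0, 3.0])

def expected_squared_error(guess, values):
    """Mean squared error of a single guess against all possible values."""
    return np.mean((values - guess) ** 2)

# Try a range of guesses; the minimizer should be the mean of the values.
guesses = np.linspace(-1.0, 4.0, 501)
errors = [expected_squared_error(g, values) for g in guesses]
best = guesses[np.argmin(errors)]

print(best)            # minimizing guess
print(values.mean())   # the mean of the candidate values
```

Since all cards are equally likely, the expected squared error is minimized exactly at their mean, which is why the mean is the best predictor in the game.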
Slide 12
Why could we predict with a confidence interval?
- We had a set of possible functions and a probability distribution over them, in this case all equally likely.
- Given the data at the sampled points, we could subset out those functions that match, that is, we could condition on the sampled data.
- To do this for more than a finite set of functions, we need a way to describe a probability distribution over an infinite set of possible functions: a stochastic process.
- Each element of this infinite set of functions would be a random function.
But how do we describe and/or generate a random function?
Slide 13
How about a purely random function? Here we have x values 0, 0.01, 0.02, …, 0.99, 1.00. At each of these we have generated a random number. Clearly this is not the kind of function we want.
Slide 14
What's wrong with a purely random function? No continuity! The values y(x) and y(x+d) for small d can be very different. Root cause: the values at these points are independent. To fix this, we must assume the values are correlated, and that C(d) = Correlation(y(x+d), y(x)) → 1 as d → 0, where the correlation is over all possible random functions. OK. Great. I need a correlation function C(d) with C(0) = 1. But how do I use such a correlation function to generate a continuous random function?
Slide 15
Making a random function
Slide 16
The correlation function
Slide 17
We are ready! Assuming we have estimates of the correlation parameters (more on this later), we have a way of generating a set of functions: the equivalent of the cards in the Kriging Game. Using statistical methods involving conditional probability, we can condition on the data to get an (infinite) set of random functions that agree with the data.
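A minimal sketch of this conditioning step, using the standard Gaussian process conditioning formulas with an assumed theta, a zero mean, and a toy test function (none of these specifics are from the slides):

```python
import numpy as np

theta = 10.0

def corr(a, b):
    """Gaussian correlation matrix between point sets a and b."""
    return np.exp(-theta * (a[:, None] - b[None, :]) ** 2)

X = np.array([0.1, 0.4, 0.7, 0.9])         # sampled locations
y = np.sin(6 * X)                           # observed values (toy function)
xs = np.linspace(0.0, 1.0, 201)             # prediction locations

K = corr(X, X) + 1e-10 * np.eye(len(X))     # correlation among samples
k = corr(xs, X)                             # correlation to new points
w = np.linalg.solve(K, y)

mean = k @ w                                # conditional mean (the predictor)
cov = corr(xs, xs) - k @ np.linalg.solve(K, k.T)  # conditional covariance

# Every conditioned random function interpolates the data, so the
# predictor reproduces the observations at the sample points:
print(np.max(np.abs(corr(X, X) @ w - y)))   # ~0
```

Sampling from a normal distribution with this conditional mean and covariance generates random functions that all pass (numerically) through the data, which is exactly the subset of "cards" consistent with what we observed.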
Slide 18
Random Functions Conditioned on Sampled Points
Slide 19
Slide 20
The Predictor and Confidence Intervals
Slide 21
What it looks like in practice: sample the function to be predicted at a set of points, i.e. run your experiments/simulations.
Slide 22
20 Gaussian bumps with appropriate widths (chosen to maximize the likelihood of the data) centred around the sample points
Slide 23
Multiply by weightings (again chosen to maximize the likelihood of the data)
Slide 24
Add together, with a mean term, to predict the function. (Plot shows the Kriging prediction and the true function.)
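The bumps-times-weights-plus-mean construction on the last three slides can be sketched directly. Here theta and the mean term mu are simple assumptions rather than the maximum-likelihood estimates the slides describe:

```python
import numpy as np

# Kriging predictor as the slides build it: one Gaussian bump per sample
# point, multiplied by a weight, summed, plus a mean term.
theta = 10.0
X = np.array([0.1, 0.4, 0.7, 0.9])          # sample points
y = np.sin(6 * X)                            # observed responses

mu = y.mean()                                # crude stand-in for the MLE mean
Psi = np.exp(-theta * (X[:, None] - X[None, :]) ** 2)
w = np.linalg.solve(Psi, y - mu)             # bump weights

def predict(x):
    bumps = np.exp(-theta * (x - X) ** 2)    # bump heights at x
    return mu + bumps @ w                    # weighted sum + mean term

# The weighted bumps reproduce the data exactly at the sample points:
print(max(abs(predict(xi) - yi) for xi, yi in zip(X, y)))  # ~0
```

Between the sample points the prediction blends the bumps; far from all samples it relaxes back towards the mean term mu.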
Slide 25
Alternatives to Kriging
Slide 26
Moving least squares
- Quick
- Nice regularization parameter
- No useful confidence intervals
- How to choose the polynomial & decay function?
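A minimal moving least squares sketch; the local linear basis, Gaussian decay function, and width h = 0.2 are arbitrary illustrative choices (h plays the role of the regularization parameter mentioned above):

```python
import numpy as np

rng = np.random.default_rng(2)
X = np.linspace(0.0, 1.0, 20)
y = np.sin(6 * X) + 0.05 * rng.standard_normal(20)   # noisy samples

def mls_predict(x, h=0.2):
    """Local weighted linear fit about x, evaluated at x."""
    w = np.exp(-((X - x) / h) ** 2)                  # decay-function weights
    A = np.stack([np.ones_like(X), X - x], axis=1)   # linear basis about x
    W = np.diag(w)
    coef = np.linalg.solve(A.T @ W @ A, A.T @ W @ y) # weighted normal equations
    return coef[0]                                   # local fit's value at x

print(mls_predict(0.5))   # smoothed estimate of sin(6 * 0.5)
```

Refitting at every prediction point is what makes MLS "moving"; shrinking h makes the fit more local (closer to interpolation), growing it makes the fit smoother.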
Slide 27
Slide 28
Support vector regression
- Quick predictions in large design spaces
- Slow training (extra quadratic programming problem)
- Good noise filtering
- Lovely maths!
Slide 29
Slide 30
Multiple surrogates
- Surrogate built using a committee machine (also called an ensemble)
- The hope is to choose the best model from a committee, or to combine a number of methods
- Often not mathematically rigorous, and difficult to get confidence intervals
- Blind Kriging, with a mean function selected by some data-analytic procedure, is perhaps a good compromise
Slide 31
Blind Kriging (mean function selected using Bayesian forward selection)
Slide 32
RMSE ~50% better than ordinary Kriging in this example
Slide 33
Slide 34
Slide 35
Slide 36
Optimization Using Surrogates
Slide 37
Polynomial regression based search (as Devil's advocate)
Slide 38
Gaussian process prediction based optimization
Slide 39
Slide 40
Slide 41
Gaussian process prediction based optimization (as Devil's advocate)
Slide 42
But we have error estimates with Gaussian processes
Slide 43
Error estimates used to construct improvement criteria:
- Probability of improvement
- Expected improvement
Slide 44
Probability of improvement: the probability that there will be any improvement at all. Can be extended to constrained and multi-objective problems.
Slide 45
Expected improvement: a useful metric that balances the prediction and its uncertainty. Can be extended to constrained and multi-objective problems.
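Both criteria have closed forms under the Gaussian process assumption. A sketch with illustrative numbers, where y_hat and s denote the prediction and error estimate at a candidate point and f_best the best (lowest) value sampled so far:

```python
from math import erf, sqrt, pi, exp

def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def norm_pdf(z):
    """Standard normal PDF."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def probability_of_improvement(y_hat, s, f_best):
    """P[Y(x) < f_best]: probability of any improvement at all."""
    return norm_cdf((f_best - y_hat) / s)

def expected_improvement(y_hat, s, f_best):
    """E[max(f_best - Y(x), 0)]: balances prediction and uncertainty."""
    z = (f_best - y_hat) / s
    return (f_best - y_hat) * norm_cdf(z) + s * norm_pdf(z)

# A point predicted slightly worse than f_best but with a large error
# estimate can still carry useful expected improvement:
print(probability_of_improvement(1.1, 0.5, 1.0))
print(expected_improvement(1.1, 0.5, 1.0))
```

Maximizing EI over x is the usual infill search: it rewards both low predictions (exploitation) and large error estimates (exploration).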
Slide 46
Constrained EI
Slide 47
The probability of constraint satisfaction is just like the probability of improvement. (Plot shows the probability of satisfaction, the prediction of the constraint function, the constraint function itself, and the constraint limit.)
Slide 48
Constrained expected improvement: simply multiply the expected improvement by the probability of constraint satisfaction.
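A sketch of that multiplication with illustrative numbers, where g_hat and s_g denote the constraint surrogate's prediction and error estimate and g_limit the constraint limit (the constraint is g(x) <= g_limit):

```python
from math import erf, sqrt, pi, exp

def norm_cdf(z):
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

def norm_pdf(z):
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)

def expected_improvement(y_hat, s, f_best):
    z = (f_best - y_hat) / s
    return (f_best - y_hat) * norm_cdf(z) + s * norm_pdf(z)

def prob_satisfaction(g_hat, s_g, g_limit):
    """P[G(x) <= g_limit], exactly like a probability of improvement."""
    return norm_cdf((g_limit - g_hat) / s_g)

def constrained_ei(y_hat, s, f_best, g_hat, s_g, g_limit):
    """EI weighted by the probability the constraint is satisfied."""
    return expected_improvement(y_hat, s, f_best) * prob_satisfaction(g_hat, s_g, g_limit)

# A likely-feasible point keeps most of its EI; a likely-infeasible one loses it:
print(constrained_ei(0.9, 0.3, 1.0, g_hat=0.2, s_g=0.1, g_limit=0.5))
print(constrained_ei(0.9, 0.3, 1.0, g_hat=0.8, s_g=0.1, g_limit=0.5))
```

The product form assumes the objective and constraint surrogates are independent; with several constraints, the satisfaction probabilities are multiplied in the same way.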
Slide 49
A 2D example
Slide 50
Slide 51
Slide 52
Missing Data
Slide 53
What if design evaluations fail? No infill point is augmented, the surrogate model is unchanged, and the optimization stalls. Need to add some information or perturb the model:
- add a random point?
- impute a value based on the prediction at the failed point, so EI goes to zero there?
- use a penalized imputation (prediction + error estimate)?
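The penalized-imputation option can be sketched with a toy zero-mean Kriging predictor (assumed theta, simplified error estimate; not the actual implementation behind the slides):

```python
import numpy as np

theta = 10.0
X = np.array([0.1, 0.4, 0.9])              # successful evaluations
y = np.sin(6 * X)

def predict_with_error(x, X, y):
    """Toy Kriging prediction and error estimate at x (zero mean assumed)."""
    Psi = np.exp(-theta * (X[:, None] - X[None, :]) ** 2) + 1e-10 * np.eye(len(X))
    psi = np.exp(-theta * (x - X) ** 2)
    w = np.linalg.solve(Psi, psi)
    y_hat = w @ y
    s2 = max(1.0 - w @ psi, 0.0)           # simplified error estimate
    return y_hat, np.sqrt(s2)

x_fail = 0.6                                # simulation failed here
y_hat, s = predict_with_error(x_fail, X, y)

# Penalized imputation: pretend we observed prediction + error estimate.
X_new = np.append(X, x_fail)
y_new = np.append(y, y_hat + s)

# After imputing, the error estimate at the failed point collapses, so
# improvement criteria like EI go to (near) zero there and the search moves on.
_, s_after = predict_with_error(x_fail, X_new, y_new)
print(s, s_after)
```

Adding the error estimate on top of the prediction penalizes the failed region, discouraging (but not outright forbidding) the infill criterion from resampling it.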
Slide 54
Aerofoil design problem
- 2 shape functions (f1, f2) altered
- Potential flow solver (VGK) has a ~35% failure rate
- 20-point optimal Latin hypercube
- max{E[I(x)]} updates until within one drag count of the optimum