8/17/2019 Innovative Methodologies in Evolution Strategies
1/62
ICD Center for Applied Systems Analysis
Innovative Methodologies in Evolution Strategies
INGENET Project Report D 2.2
June 1998
Thomas Bäck, Boris Naujoks
Center for Applied Systems Analysis (CASA)
Informatik Centrum Dortmund
Joseph-von-Fraunhofer-Str. 20
D-44227 Dortmund
Abstract
This INGENET report describes the state of the art in research on and application of evolution strategies. Its goals are to make this knowledge accessible to the INGENET members in a compact form and to outline the technological and economic perspectives of evolution strategies at the European level.
Evolution strategies are one of the main paradigms in the field of evolutionary computation, focusing on algorithms for adaptation and optimization which are gleaned from the model of organic evolution.
The report puts its emphasis on algorithmic and application-oriented aspects of evolution strategies. The algorithmic aspects include an overview of all components of a modern $(\mu,\lambda)$-strategy and a detailed explanation of the concept of strategy parameter self-adaptation, which is considered to be the main distinguishing feature between evolution strategies and genetic algorithms. The self-adaptation process implements an evolutionary optimization process also on the level of strategy parameters such as mutational step sizes and therefore offers an elegant solution to the parameter tuning problem of evolutionary algorithms. The working principles of self-adaptation are explained in detail in section 3 of this report.
A number of recent variations of the basic evolution strategy, including alternatives for the self-adaptation method, the introduction of hierarchies of evolution strategies, and the principle of individual aging in the $(\mu,\kappa,\lambda,\rho)$-strategy, are presented in section 4.
Further aspects which are of strong interest from an application-oriented point of view include noisy and dynamic objective functions as well as multiple criteria decision making problems and constraint handling. These are discussed in section 5, which shows that evolution strategies offer effective techniques for handling all of these additional difficulties of practical applications.
Section 6 gives a brief overview of the parallelization possibilities of evolution strategies, which are suitable for fine-grained as well as coarse-grained parallelization.
An overview of practical applications of evolution strategies is given in section 7, where case studies are grouped into disciplines and the corresponding literature references are given. Because the number of publications in the field of evolutionary computation increased strongly in the 1990s, the collection of case studies stops with the most recent examples from 1994; it nevertheless contains more than 150 examples up to that time.
The report concludes with an outline of the perspectives of evolution strategies, discussing their technological future with a focus on the economic potential of industrial applications of these algorithms. This outline might serve as a technological roadmap for the exploitation of these techniques within a ten-year timeframe.
Thomas Bäck and Boris Naujoks Dortmund, June 1998
Contact information:
Center for Applied Systems Analysis
Informatik Centrum Dortmund
Joseph-von-Fraunhofer-Str. 20
D-44227 Dortmund, Germany
Phone: +49 231 9700 366
Fax: +49 231 9700 959
Email: [email protected]
Contents
1 A Brief History
2 The Algorithm
2.1 Working Principle
2.2 The Structure of Individuals
2.3 Mutation
2.4 Recombination
2.5 Selection
2.6 Termination Criterion
3 Self-Adaptation
4 Variations
4.1 Mutative Step-Size Control
4.2 Derandomized Step-Size Adaptation
4.3 Hierarchical Evolution Strategies
4.4 The $(\mu,\kappa,\lambda,\rho)$-Strategy: Aging of Individuals
5 Application-Oriented Extensions
5.1 Noisy Objective Functions
5.2 Robust Design
5.3 Dynamic Environments
5.4 Multiple Criteria Decision Making
5.5 Constraint Handling
6 Parallel Evolution Strategies
6.1 The Master-Slave Approach
6.2 Coarse Grained Parallelism: The Migration Model
6.3 Fine Grained Parallelism: The Diffusion Model
6.4 A Hybrid Approach
7 Applications
7.1 Artificial Intelligence
7.2 Biotechnology
7.3 Technical Design Applications
7.4 Chemical Engineering
7.5 Telecommunications
7.6 Dynamic Processes, Modeling, Simulation
7.7 Medicine
7.8 Microelectronics
7.9 Military
7.10 Physics
7.11 Pattern Recognition
7.12 Production Planning
7.13 Robotics
7.14 Supply- and Disposal Systems
7.15 Miscellaneous
8 Perspectives
References
1 A Brief History
Evolution Strategies are a joint development of Bienert, Rechenberg and Schwefel, who did preliminary work in this area in the 1960s at the Technical University of Berlin (TUB) in Germany. The first applications were experimental and dealt with hydrodynamical problems such as shape optimization of a bent pipe [119], drag minimization of a joint plate [164], and structure optimization of a two-phase flashing nozzle [210] (see footnote 1). Because such optimization problems could be neither described nor solved analytically or with traditional methods, a simple algorithmic method based on random changes of experimental setups was developed. In these experiments, adjustments were possible in discrete steps only: in the first two cases (pipe and plate) by changing certain joint positions, and in the latter case (nozzle) by exchanging, adding or deleting nozzle segments. Following the observation from nature that smaller mutations occur more often than larger ones, the discrete changes were sampled from a binomial distribution with prefixed variance. The basic working mechanism of the experiments was to create a mutation, adjust the joints or nozzle segments accordingly, perform the experiment and measure the quality criterion of the adjusted construction. If the new construction happened to be better than its predecessor, it served as the basis for the next trial. Otherwise, it was discarded and the predecessor was retained. No information about the amount of improvement or deterioration was necessary. This experimental strategy led to unexpectedly good results both for the bent pipe and the nozzle.
Schwefel was the first to simulate different versions of the strategy on the first available computer at TUB, a Zuse Z23 [200]; several others later applied the simple Evolution Strategy to numerical optimization problems. Based on the theoretical results of Schwefel's diploma thesis, the discrete mutation mechanism was replaced by normally distributed mutations with expectation zero and given variance [200]. The resulting two-membered ES works by creating one $n$-dimensional real-valued vector of object variables from its parent by applying mutation with identical standard deviations to each object variable. The resulting individual is evaluated and compared to its parent, and the better of the two individuals survives to become the parent of the next generation, while the other one is discarded. This simple selection mechanism is fully characterized by the term (1+1)-selection.
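The two-membered scheme described above can be sketched in a few lines. This is a minimal illustration, not the historical implementation: the function name `one_plus_one_es`, the fixed step size `sigma`, the seed handling, and the tie-breaking rule (an equally good offspring replaces the parent) are all assumptions made for the example.

```python
import random


def one_plus_one_es(f, x0, sigma, generations, seed=0):
    """Minimize f with a (1+1)-ES using one fixed standard deviation sigma."""
    rng = random.Random(seed)
    parent, f_parent = list(x0), f(x0)
    for _ in range(generations):
        # Mutation: the same standard deviation for every object variable.
        child = [xi + sigma * rng.gauss(0.0, 1.0) for xi in parent]
        f_child = f(child)
        # (1+1)-selection: the better of parent and offspring survives
        # (ties go to the offspring; this detail is an assumption).
        if f_child <= f_parent:
            parent, f_parent = child, f_child
    return parent, f_parent


def sphere(x):
    """Simple quadratic test function used only for this illustration."""
    return sum(xi * xi for xi in x)


best, value = one_plus_one_es(sphere, [1.0] * 5, sigma=0.1, generations=500)
```

Because the parent is only ever replaced by an offspring that is at least as good, the returned objective value can never exceed that of the starting point.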
For this algorithm, Rechenberg developed a convergence rate theory for $n \gg 1$ for two characteristic model functions, and he proposed a theoretically confirmed rule for changing the standard deviation of mutations (the 1/5-success rule) [166].
Obviously, the (1+1)-ES did not incorporate the principle of a population. A first multi-membered Evolution Strategy, the $(\mu+1)$-ES with $\mu > 1$, was also designed by Rechenberg to introduce a population concept. In a $(\mu+1)$-ES, $\mu$ parent individuals recombine to form one offspring, which after being mutated replaces the worst parent individual if it is better (extinction of the worst). Mutation and adjustment of the standard deviation were realized as in a (1+1)-ES, and a recombination mechanism as explained in section 2.4 was used. This strategy, discussed in more detail in [12], was never widely used but provided the basis for the transition to the $(\mu+\lambda)$-ES and $(\mu,\lambda)$-ES as introduced by Schwefel (see footnote 2) [201, 202, 203].
1 This experiment is one of the first known examples of using operators like gene deletion and gene duplication, i.e., the number of segments the nozzle consisted of was allowed to vary during optimization.
2 The material presented here is based on [203] and a number of research articles, but in the meantime an
Again the notation characterizes the selection mechanism, in the first case indicating that the $\mu$ best individuals out of the union of parents and offspring survive, while in the latter case only the $\mu$ best offspring individuals form the next parent generation (consequently, $\lambda > \mu$ is necessary). Currently, the $(\mu,\lambda)$-strategy characterizes the state of the art in Evolution Strategy research and is therefore the strategy of main interest in the following. As an introductory remark, it should be noted that the major quality of this strategy is seen in its ability to incorporate the most important parameters of the strategy (standard deviations and correlation coefficients of normally distributed mutations) into the search process, such that optimization takes place not only on object variables but also on strategy parameters, according to the actual local topology of the objective function. This capability is termed self-adaptation by Schwefel [204] and will be a major point of interest in discussing the Evolution Strategy.
2 The Algorithm
2.1 Working Principle
In general, evolutionary algorithms mimic the process of natural evolution, the driving process for the emergence of complex and well adapted organic structures, by applying variation and selection operators to a set of candidate solutions for a given optimization problem. The following structure of a general evolutionary algorithm reflects all essential components of an evolution strategy as well (see e.g. [10]):
Algorithm 1:
    t := 0;
    initialize P(t);
    evaluate P(t);
    while not terminate do
        P'(t) := variation(P(t));
        evaluate(P'(t));
        P(t+1) := select(P'(t) ∪ Q);
        t := t + 1;
    od
In case of a $(\mu,\lambda)$-evolution strategy, the following statements regarding the components of algorithm 1 can be made:
- $P(t)$ denotes a population (multiset) of $\mu$ individuals (candidate solutions to the given problem) at generation (iteration) $t$ of the algorithm.
- The initialization at $t = 0$ can be done randomly, or with known starting points obtained by any method.
- The evaluation of a population involves calculation of its members' quality according to the given objective function (quality criterion).
updated and extended edition of Schwefel’s book was published (i.e., [207]).
- The variation operators include the exchange of partial information between solutions (recombination) and its subsequent modification by adding normally distributed variations (mutation) of adaptable step sizes. These step sizes are themselves optimized during the search according to a process called self-adaptation. By means of recombination and mutation, an offspring population $P'(t)$ of $\lambda$ candidate solutions is generated.
- The selection operator chooses the $\mu$ best solutions from $P'(t)$ (i.e., $Q = \emptyset$) as starting points for the next iteration of the loop. Alternatively, a $(\mu+\lambda)$-evolution strategy would select the $\mu$ best solutions from the union of $P'(t)$ and $P(t)$ (i.e., $Q = P(t)$).
- The algorithm terminates if no more improvements are achieved over a number of subsequent iterations or if a given amount of time is exceeded.
- The algorithm returns the best candidate solution ever found during its execution.
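The loop of algorithm 1 and the statements above can be sketched as follows. This is an illustrative sketch under several assumptions that are not part of the report: the function name `evolution_strategy`, a single self-adapted step size per individual, no recombination, a learning rate of exactly $1/\sqrt{n}$, a start population of $\mu$ copies of one point, and a fixed generation budget as the termination criterion.

```python
import math
import random


def evolution_strategy(f, x0, mu, lam, plus=False, sigma0=1.0,
                       generations=100, seed=0):
    """Minimize f with a (mu,lam)- or (mu+lam)-ES shaped like Algorithm 1."""
    rng = random.Random(seed)
    tau0 = 1.0 / math.sqrt(len(x0))  # assumed proportionality constant 1

    # initialize and evaluate P(0); individuals are (fitness, x, sigma)
    population = [(f(x0), list(x0), sigma0) for _ in range(mu)]
    best = population[0]

    for generation in range(generations):
        # variation: lam offspring, each mutated from a randomly chosen parent
        offspring = []
        for _ in range(lam):
            _, x, sigma = rng.choice(population)
            sigma_new = sigma * math.exp(tau0 * rng.gauss(0.0, 1.0))
            x_new = [xi + sigma_new * rng.gauss(0.0, 1.0) for xi in x]
            offspring.append((f(x_new), x_new, sigma_new))

        # selection: Q = P(t) for the plus variant, Q = empty for the comma variant
        pool = offspring + population if plus else offspring
        population = sorted(pool, key=lambda ind: ind[0])[:mu]

        # keep the best candidate solution ever found
        if population[0][0] < best[0]:
            best = population[0]
    return best


def sphere(x):
    """Quadratic test function used only for this illustration."""
    return sum(xi * xi for xi in x)


f_best, x_best, sigma_best = evolution_strategy(sphere, [1.0] * 5, mu=3, lam=21)
```

Tracking `best` separately matters for the comma variant, where the current population may be worse than an earlier one.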
In the following, these basic components of an evolution strategy are explained in some
more detail. For extensive information about evolution strategies, refer to [5, 169, 207].
Using a more formal notation following the outline given in [209, 208], one iteration of the strategy, that is, a step from a population $P^{(T)}$ towards the next reproduction cycle with $P^{(T+1)}$, can be modeled as follows:
$$P^{(T+1)} := \mathrm{opt}_{ES}(P^{(T)}) \tag{1}$$
where $\mathrm{opt}_{ES} : I^{\mu} \to I^{\mu}$ is defined by
$$\mathrm{opt}_{ES} := \mathrm{sel} \circ (\mathrm{mut} \circ \mathrm{rec})^{\lambda} \tag{2}$$
operating on an input population $P^{(T)}$ according to
$$\mathrm{opt}_{ES}(P^{(T)}) = \mathrm{sel}\Big(P^{(T)} \sqcup \bigsqcup_{i=1}^{\lambda} \{\mathrm{mut}(\mathrm{rec}(P^{(T)}))\}\Big) \tag{3}$$
(here, $\sqcup$ denotes the union operation on multisets). Equation (3) clarifies that the population at generation $T+1$ is obtained from $P^{(T)}$ by first applying a $\lambda$-fold repetition of recombination and mutation, which results in an intermediate population $P'$ of size $\lambda$, and then applying the selection operator to the union of $P^{(T)}$ and $P'$. Recall that the recombination operator generates only one individual per application, which can then be mutated directly.
In the following, both the formal as well as the informal way of describing the algorithmic
components will be used as it seems appropriate.
2.2 The Structure of Individuals
For a given optimization problem
$$f : M \subseteq \mathbb{R}^n \to \mathbb{R}, \qquad f(\vec{x}) \to \min$$
an individual of the evolution strategy contains the candidate solution $\vec{x} \in \mathbb{R}^n$ as one part of its representation. Furthermore, there exists a variable amount (depending on the type of strategy used) of additional information, so-called strategy parameters, in the representation of individuals. These strategy parameters essentially encode the $n$-dimensional normal distribution which is to be used for the variation of the solution.
More formally, an individual $\vec{a} = (\vec{x}, \vec{\sigma}, \vec{\alpha})$ consists of up to three components $\vec{x} \in \mathbb{R}^n$ (the solution), $\vec{\sigma} \in \mathbb{R}^{n_\sigma}$ (a set of standard deviations of the normal distribution), and $\vec{\alpha} \in [-\pi, \pi]^{n_\alpha}$ (a set of rotation angles representing the covariances of the $n$-dimensional normal distribution), where $n_\sigma \in \{1, \ldots, n\}$ and $n_\alpha \in \{0, (2n - n_\sigma)(n_\sigma - 1)/2\}$. The exact meaning of these components is described in more detail in section 2.3.
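As a quick check of the parameter counts, the expression for $n_\alpha$ can be evaluated for the two extreme choices of $n_\sigma$. The value $n = 10$ is picked only as an illustration.

$$
n_\sigma = 1:\quad \frac{(2n - 1)(1 - 1)}{2} = 0,
\qquad
n_\sigma = n:\quad \frac{(2n - n)(n - 1)}{2} = \frac{n(n-1)}{2},
\qquad
n = 10:\quad \frac{10 \cdot 9}{2} = 45 .
$$

With $n = n_\sigma = 10$ and full correlations, an individual therefore carries $10$ object variables, $10$ standard deviations and $45$ rotation angles, i.e. $65$ real values in total.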
2.3 Mutation
The mutation in evolution strategies works by adding a normally distributed random vector $\vec{z} \sim N(\vec{0}, \mathbf{C})$ with expectation vector $\vec{0}$ and covariance matrix $\mathbf{C}^{-1}$, where the covariance matrix is described by the mutated strategy parameters of the individual. Depending on the number of strategy parameters incorporated into the representation of an individual, the following main variants of mutation and self-adaptation can be distinguished:
- $n_\sigma = 1$, $n_\alpha = 0$: The standard deviation for all object variables is identical ($\sigma$), and all object variables are mutated by adding normally distributed random numbers with
$$\sigma' = \sigma \cdot \exp(\tau_0 \cdot N(0,1)) \tag{4}$$
$$x_i' = x_i + \sigma' \cdot N_i(0,1) \tag{5}$$
where $\tau_0 \propto (\sqrt{n})^{-1}$. Here, $N(0,1)$ denotes a value sampled from a normally distributed random variable with expectation zero and variance one. The notation $N_i(0,1)$ indicates that the random variable is sampled anew for each setting of the index $i$.
- $n_\sigma = n$, $n_\alpha = 0$: All object variables have their own, individual standard deviation $\sigma_i$, which determines the corresponding modification according to
$$\sigma_i' = \sigma_i \cdot \exp(\tau' \cdot N(0,1) + \tau \cdot N_i(0,1)) \tag{6}$$
$$x_i' = x_i + \sigma_i' \cdot N_i(0,1) \tag{7}$$
where $\tau' \propto (\sqrt{2n})^{-1}$ and $\tau \propto (\sqrt{2\sqrt{n}})^{-1}$.
- $n_\sigma = n$, $n_\alpha = n(n-1)/2$: The vectors $\vec{\sigma}$ and $\vec{\alpha}$ represent the complete covariance matrix of the $n$-dimensional normal distribution, where the covariances are given by rotation angles $\alpha_j$ describing the coordinate rotations necessary to transform an uncorrelated mutation vector into a correlated one. The details of this mechanism can be found in [5] (pp. 68–71) or [180]. The mutation is performed according to
$$\sigma_i' = \sigma_i \cdot \exp(\tau' \cdot N(0,1) + \tau \cdot N_i(0,1)) \tag{8}$$
$$\alpha_j' = \alpha_j + \beta \cdot N_j(0,1) \tag{9}$$
$$\vec{x}' = \vec{x} + N(\vec{0}, \mathbf{C}(\vec{\sigma}', \vec{\alpha}')) \tag{10}$$
where $N(\vec{0}, \mathbf{C}(\vec{\sigma}', \vec{\alpha}'))$ denotes the correlated mutation vector and $\beta \approx 0.0873$.
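The second variant (individual step sizes without rotation angles) might be sketched as below. The function name `mutate_uncorrelated` is hypothetical, and the learning rates are set with proportionality constant 1, which is an assumption; the text only gives proportionalities. Correlated mutations with rotation angles are deliberately left out of this sketch.

```python
import math
import random


def mutate_uncorrelated(x, sigmas, rng, tau_prime=None, tau=None):
    """Self-adaptive mutation with one standard deviation per object variable."""
    n = len(x)
    if tau_prime is None:
        tau_prime = 1.0 / math.sqrt(2.0 * n)          # assumed constant 1
    if tau is None:
        tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))     # assumed constant 1

    # N(0,1): one sample shared by all components of this individual
    common = tau_prime * rng.gauss(0.0, 1.0)
    # N_i(0,1): a fresh sample for every index i
    new_sigmas = [s * math.exp(common + tau * rng.gauss(0.0, 1.0))
                  for s in sigmas]
    # the mutated step sizes are used to mutate the object variables
    new_x = [xi + si * rng.gauss(0.0, 1.0) for xi, si in zip(x, new_sigmas)]
    return new_x, new_sigmas


child_x, child_sigmas = mutate_uncorrelated([0.0] * 4, [1.0] * 4,
                                            random.Random(1))
```

Because the update of the step sizes is multiplicative, every mutated standard deviation stays positive.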
The amount of information included in the individuals by means of the self-adaptation principle increases from the simple case of one standard deviation up to the order of $n^2$ additional parameters in the case of correlated mutations, which reflects an enormous degree of freedom for the internal models of the individuals. This growing degree of freedom often enhances the global search capabilities of the algorithm at the expense of computation time, and it also reflects a shift from the precise adaptation of a few strategy parameters (as in the case of $n_\sigma = 1$) to the exploitation of a large diversity of strategy parameters.
One of the main design parameters to be fixed for the practical application of the evolution strategy concerns the choice of $n_\sigma$ and $n_\alpha$, i.e., the number of self-adaptable strategy parameters required for the problem.
2.4 Recombination
In evolution strategies, recombination is incorporated into the main loop of the algorithm as the first variation operator and generates a new intermediate population of $\lambda$ individuals by $\lambda$-fold application to the parent population, creating one individual per application from $\rho$ ($1 \le \rho \le \mu$) individuals. Normally, $\rho = 2$ or $\rho = \mu$ (so-called global recombination) is chosen (but see also section 4.4 for a generalization). The recombination types for object variables and strategy parameters in evolution strategies often differ from each other. Typical examples are discrete recombination (random choices of single variables from parents, comparable to uniform crossover in genetic algorithms) and intermediary recombination (arithmetic averaging). A typical setting uses discrete recombination for object variables and global intermediary recombination for strategy parameters. For further details on these operators, see [5].
The recombination operator also needs to be specified for a $(\mu,\lambda)$-evolution strategy when $\mu > 1$ is chosen.
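The typical setting described above (discrete recombination of object variables, global intermediary recombination of strategy parameters) could look as follows. The function names, the representation of an individual as a pair `(x, sigmas)`, and drawing the $\rho$ parents without replacement are assumptions made for this sketch.

```python
import random


def discrete_recombination(parents, rng):
    """Pick each component from a randomly chosen parent vector."""
    n = len(parents[0])
    return [rng.choice(parents)[i] for i in range(n)]


def intermediary_recombination(parents):
    """Arithmetic average of the parent vectors, component by component."""
    n = len(parents[0])
    return [sum(p[i] for p in parents) / len(parents) for i in range(n)]


def recombine(population, rng, rho=2):
    """Create one individual (x, sigmas) from a population of (x, sigmas) pairs."""
    # object variables: discrete recombination of rho randomly drawn parents
    chosen = rng.sample(population, rho)
    x = discrete_recombination([ind[0] for ind in chosen], rng)
    # strategy parameters: global intermediary recombination over all parents
    sigmas = intermediary_recombination([ind[1] for ind in population])
    return x, sigmas


parents = [([0.0, 0.0], [1.0, 1.0]),
           ([2.0, 2.0], [3.0, 3.0]),
           ([4.0, 4.0], [5.0, 5.0])]
child = recombine(parents, random.Random(0))
```

Each call produces exactly one individual, so a population of $\lambda$ offspring needs $\lambda$ calls, each followed by mutation.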
2.5 Selection
Essentially, the evolution strategy offers two different variants for selecting candidate solutions for the next iteration of the main loop of the algorithm: $(\mu,\lambda)$-selection and $(\mu+\lambda)$-selection.
The notation $(\mu,\lambda)$ indicates that $\mu$ parents create $\lambda > \mu$ offspring by means of recombination and mutation, and the $\mu$ best offspring individuals are deterministically selected to replace the parents (in this case, $Q = \emptyset$ in algorithm 1). Notice that this mechanism allows the best member of the population at generation $t+1$ to perform worse than the best individual at generation $t$, i.e., the method is not elitist. The strategy can therefore accept temporary deteriorations that might help it leave the region of attraction of a local optimum and reach a better optimum. Moreover, in combination with the self-adaptation of strategy parameters, $(\mu,\lambda)$-selection has demonstrated clear advantages over its competitor, the $(\mu+\lambda)$ method.
In contrast, the $(\mu+\lambda)$-strategy selects the $\mu$ survivors from the union of parents and offspring, such that a monotonic course of evolution is guaranteed ($Q = P(t)$ in algorithm 1). For reasons related to the self-adaptation of strategy parameters, the $(\mu,\lambda)$-evolution strategy is typically preferred.
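The difference between the two selection variants fits in one small function. The name `select` and the representation of individuals as `(fitness, label)` tuples for a minimization problem are assumptions of this sketch.

```python
def select(parents, offspring, mu, plus=False):
    """Return the mu best individuals under comma or plus selection.

    Individuals are (fitness, payload) tuples; smaller fitness is better.
    """
    # plus: Q = P(t), parents compete with offspring; comma: Q is empty
    pool = offspring + parents if plus else offspring
    if len(pool) < mu:
        raise ValueError("not enough individuals to select mu survivors")
    return sorted(pool, key=lambda ind: ind[0])[:mu]


parents = [(1.0, "parent")]
offspring = [(2.0, "child a"), (3.0, "child b")]

comma_survivors = select(parents, offspring, mu=1)            # [(2.0, "child a")]
plus_survivors = select(parents, offspring, mu=1, plus=True)  # [(1.0, "parent")]
```

In this example all offspring are worse than the parent, so comma selection accepts a deterioration while plus selection keeps the parent.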
2.6 Termination Criterion
There are several options for the choice of the termination criterion, including monitoring some absolute or relative measure of the population diversity (see e.g. [5], pp. 80–81), a predefined number of iterations of the main loop of the algorithm, or a predefined amount of CPU time or real time for execution of the algorithm.
3 Self-Adaptation
The settings for the learning rates $\tau$, $\tau'$ and $\tau_0$ are recommended by Schwefel as reasonable heuristic settings (see [202], pp. 167–168), but one should keep in mind that, depending on the particular topological characteristics of the objective function, the optimal setting of these parameters might differ from the values proposed. For $n_\sigma = 1$, however, [26] has recently shown theoretically that, for the sphere model
$$f(\vec{x}) = \sum_{i=1}^{n} (x_i - x_i^*)^2 \tag{11}$$
the setting $\tau_0 \propto 1/\sqrt{n}$ is the optimal choice, maximizing the convergence velocity of the evolution strategy. Moreover, for a $(1,\lambda)$-evolution strategy Beyer derived the result that $\tau_0 \approx c_{1,\lambda}/\sqrt{n}$ (for $\lambda \ge 10$), where $c_{1,\lambda}$ denotes the progress coefficient of the $(1,\lambda)$-strategy.
For an empirical investigation of the self-adaptation mechanism defined by the mutation
operator variants (4)–(8), [204, 205, 206] used the following three objective functions which
are specifically tailored to the number of learnable strategy parameters in these cases:
1. Function
$$f_1(\vec{x}) = \sum_{i=1}^{n} x_i^2 \tag{12}$$
requires learning of one common standard deviation $\sigma$, i.e., $n_\sigma = 1$.
2. Function
$$f_2(\vec{x}) = \sum_{i=1}^{n} i \cdot x_i^2 \tag{13}$$
requires learning of a suitable scaling of the variables, i.e., $n_\sigma = n$.
3. Function
$$f_3(\vec{x}) = \sum_{i=1}^{n} \left( \sum_{j=1}^{i} x_j \right)^2 \tag{14}$$
requires learning of a positive definite metric, i.e., individual $\sigma_i$ and $n_\alpha = n(n-1)/2$ different covariances.
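For experimentation, the three objective functions can be written down directly. The Python names `f1`, `f2` and `f3` simply mirror the function names used above; the running-sum formulation of `f3` is an implementation choice of this sketch.

```python
def f1(x):
    """Sphere model: sum of squares."""
    return sum(xi * xi for xi in x)


def f2(x):
    """Differently scaled variables: the i-th square is weighted by i (1-based)."""
    return sum(i * xi * xi for i, xi in enumerate(x, start=1))


def f3(x):
    """Sum of squared partial sums x_1 + ... + x_i."""
    total, partial = 0.0, 0.0
    for xj in x:
        partial += xj              # partial sum up to the current index
        total += partial * partial
    return total


point = [1.0, 2.0, 3.0]
values = (f1(point), f2(point), f3(point))  # (14.0, 36.0, 46.0)
```

For the point (1, 2, 3) the partial sums in `f3` are 1, 3 and 6, which gives 1 + 9 + 36 = 46.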
As a first experiment, Schwefel compared the convergence velocity of a (1,10)- and a (1+10)-evolution strategy with $n_\sigma = 1$ on the sphere model $f_1$ with $n = 30$. The results of a comparable experiment performed for this study (averaged over ten independent runs, with the standard deviations initialized with a value of 0.3) are shown in figure 1, where the convergence velocity or progress is measured by $\log\left(\sqrt{f_{\min}(0)/f_{\min}(g)}\right)$, with $f_{\min}(g)$ denoting the objective function value in generation $g$. It is somewhat counterintuitive to observe that the non-elitist (1,10)-strategy, where all offspring individuals might be worse than the single parent, performs better than the elitist (1+10)-strategy. This can be explained, however, by taking into account that the self-adaptation of standard deviations might generate an individual with a good objective function value but an inappropriate value of $\sigma$ for the next generation. In a plus-strategy, this inappropriate standard deviation might survive for a number of generations, thus hindering the combined process of search and adaptation. The resulting periods of stagnation can be prevented by allowing the strategy to forget the good search point, together with its inappropriate step size.
From this experiment, Schwefel concluded that the non-elitist $(\mu,\lambda)$-selection mechanism is an important condition for a successful self-adaptation of strategy parameters. Recent experimental findings by Gehlhaar and Fogel [56] on more complicated objective functions than the sphere model give some evidence, however, that the elitist strategy performs as well as or even better than the $(\mu,\lambda)$-strategy in many practical cases.
Figure 1: Comparison of the convergence velocity of a (1,10)-strategy and a (1+10)-strategy in case of the sphere model $f_1$ with $n = 30$ and $n_\sigma = 1$.
For a further illustration of the self-adaptation principle in case of the sphere model $f_1$, we use a time-varying version where the optimum location $\vec{x}^* = (x_1^*, \ldots, x_n^*)$ is changed every 150 generations. Ten independent experiments for $n = 30$ and 1000 generations per experiment are performed with a (15,100)-evolution strategy (without recombination). The average best objective function value (solid curve) and the minimum, average, and maximum standard deviations $\sigma_{\min}$, $\sigma_{\mathrm{avg}}$, and $\sigma_{\max}$ are reported in figure 2. The curve of the objective function value clearly illustrates the linear convergence of the algorithm during the first search interval of 150 generations. After the optimum location is shifted at generation 150, the search stagnates for a while at the new, poor position before linear convergence is observed again.
The behavior of the standard deviations, which are also plotted in figure 2, clarifies the reason for the periods of stagnation of the objective function values: self-adaptation of standard deviations works both by decreasing them during the periods of linear convergence and by increasing them during the periods of stagnation, back to a magnitude at which they have an impact on the objective function value. This process of standard deviation increase, which occurs at the beginning of each interval, needs some time during which no progress is made with respect to the objective function value. According to [25], the number of generations needed for this adaptation is inversely proportional to $\tau_0^2$ (that is, proportional to $n$) in case of a $(1,\lambda)$-evolution strategy.
Figure 2: Best objective function value and minimum, average, and maximum standard deviation in the population plotted over the generation number for the time-varying sphere model. The results were obtained by using a (15,100)-evolution strategy with $n_\sigma = 1$, $n = 30$, without recombination.
Figure 3: Convergence velocity on $f_2$ for a $(\mu,100)$-strategy with $\mu \in \{1, \ldots, 30\}$ for the self-adaptive evolution strategy (dashed curve) and the strategy using optimum prefixed values of the standard deviations $\sigma_i$.
Figure 4: Comparison of the convergence velocity of a (15,100)-strategy with correlated mutations (solid curve) and with self-adaptation of standard deviations only (dashed curve) in case of the function $f_3$ with $n = n_\sigma = 10$, $n_\alpha = 45$.
In case of the objective function $f_2$, each variable $x_i$ is differently scaled by a factor $\sqrt{i}$, such that self-adaptation requires learning the scaling of $n$ different $\sigma_i$. The optimal settings of standard deviations $\sigma_i \propto 1/\sqrt{i}$ are also known in advance for this function, such that self-adaptation can be compared to an evolution strategy using optimally adjusted $\sigma_i$ for mutation. The result of this comparison is shown in figure 3, where the convergence velocity is plotted for $(\mu,100)$-evolution strategies as a function of $\mu$, the number of parents, both for the self-adaptive strategy and the strategy using the optimal setting of $\sigma_i$.
It is not surprising to see that, for the strategy using optimal standard deviations $\sigma_i$, the convergence rate is maximized for $\mu = 1$, because this setting exploits the perfect knowledge in an optimal sense. In case of the self-adaptive strategy, however, a clear maximum of the progress rate is reached for a value of $\mu = 12$, and both larger and smaller values of $\mu$ cause a strong loss of convergence speed. The collective performance of about 12 imperfect parents, achieved by means of self-adaptation, almost equals the performance of the perfect (1,100)-strategy and outperforms the collection of 12 perfect individuals by far. This experiment indicates that self-adaptation is a mechanism that requires the existence of a knowledge diversity (or diversity of internal models), i.e., a number of parents larger than one, and benefits from the phenomenon of collective (rather than individual) intelligence.
Concerning the objective function $f_3$, figure 4 shows a comparison of the progress for a (15,100)-evolution strategy with $n_\sigma = n = 10$, $n_\alpha = 0$ (that is, no correlated mutations) and $n_\alpha = n(n-1)/2 = 45$ (that is, full correlations). In both cases, intermediary recombination of object variables, global intermediary recombination of standard deviations, and no recombination of the rotation angles are chosen. The results demonstrate that, by introducing the covariances, it is possible to increase the effectiveness of the collective learning process in case of arbitrarily rotated coordinate systems. Recently, [180] has shown that an approximation of the Hessian matrix could be computed by correlated mutations with an upper bound of $\mu + \lambda = (n^2 + 3n + 4)/2$ on the population size, but the typical settings ($\mu = 15$, $\lambda = 100$) are often not sufficient to achieve this (an experimental investigation of the scaling behavior of correlated mutations with increasing population sizes and problem dimension has not yet been performed).
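To give a feeling for the size of this bound, it can be evaluated for two problem dimensions and compared with the typical setting $\mu + \lambda = 15 + 100 = 115$. The dimensions $n = 10$ and $n = 30$ are chosen here only because they appear in the experiments of this section.

$$
n = 10:\quad \frac{10^2 + 3 \cdot 10 + 4}{2} = \frac{134}{2} = 67,
\qquad
n = 30:\quad \frac{30^2 + 3 \cdot 30 + 4}{2} = \frac{994}{2} = 497 .
$$

The bound grows quadratically in $n$, so it already exceeds the typical population size of 115 for moderate dimensions such as $n = 30$.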
The choice of a logarithmic normal distribution for the modification of the standard deviations $\sigma_i$ in connection with a multiplicative scheme in equations (4), (6) and (8) is motivated by the following heuristic arguments (see [202], p. 168):
1. A multiplicative process preserves positive values.
2. The median should equal one to guarantee that, on average, a multiplication by a certain
value occurs with the same probability as a multiplication by the reciprocal value (i.e.,
the process would be neutral under absence of selection).
3. Small modifications should occur more often than large ones.
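These arguments can be checked for a multiplicative factor of the form $e^{\tau z}$ with $z \sim N(0,1)$ and $\tau > 0$. The symbols $z$ and $c$ are shorthand introduced only for this check.

$$
e^{\tau z} > 0 \ \text{for all } z,
\qquad
P\!\left(e^{\tau z} \ge c\right) = P\!\left(z \ge \tfrac{\ln c}{\tau}\right) = P\!\left(z \le -\tfrac{\ln c}{\tau}\right) = P\!\left(e^{\tau z} \le \tfrac{1}{c}\right) \quad (c > 0).
$$

The first statement gives positivity. The second uses the symmetry of $z$ around zero and shows that a factor $c$ and its reciprocal are equally likely, so the median of the factor is $e^{0} = 1$. Since the density of $z$ is largest at $z = 0$, factors close to one are the most frequent. Note that the neutrality holds for the median only: the mean of the factor is $e^{\tau^2/2} > 1$.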
The effectiveness of this multiplicative logarithmic normal modification is presently also acknowledged in evolutionary programming, since extensive empirical investigations indicate some advantage of this scheme over the original additive self-adaptation mechanism used in evolutionary programming [185, 184, 186], where
$$\sigma_i' = \sigma_i \cdot (1 + \alpha \cdot N(0,1)) \tag{15}$$
(with a setting of $\alpha \approx 0.2$ [186]). Recent investigations indicate, however, that this becomes reversed when noisy objective functions are considered, where the additive mechanism seems to outperform multiplicative modifications [4].
The study by Gehlhaar and Fogel [56] also indicates that the order of the modifications of
x
i
and i
has a strong impact on the effectiveness of self-adaptation: It is important to mutate
the standard deviations first and to use the mutated standard deviations for the modification of
object variables. As the authors point out in that study, the reversed mechanism might suffer
from generating offspring that have useful object variable vectors but bad strategy parameter
vectors, because these have not been used to determine the position of the offspring itself.
Concerning the sphere model $f_1$ and a $(1,\lambda)$-strategy, Beyer has recently indicated that
equation (15) is obtained from equation (6) by Taylor expansion breaking off after the linear term,
such that both mutation mechanisms should behave identically for small settings of the learning
rates $\tau_0$ and $\alpha$, when $\tau_0 = \alpha$ [25]. This was recently confirmed also with some experiments for
the time-varying sphere model [15].
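As a brief worked step behind this statement, the expansion of the exponential can be written out. The sketch assumes that equation (6) reduces to $\sigma' = \sigma \exp(\tau_0 N(0,1))$ for a single step size; equation (6) is not repeated in this section, so this form is an assumption.

```latex
\sigma' = \sigma \exp(\tau_0 N(0,1))
        = \sigma \left( 1 + \tau_0 N(0,1) + \tfrac{1}{2} \tau_0^2 N(0,1)^2 + \ldots \right)
        \approx \sigma \left( 1 + \tau_0 N(0,1) \right)
```

For small $\tau_0$ the quadratic and higher terms become negligible, which leaves the additive form of equation (15) with $\alpha = \tau_0$.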
4 Variations
4.1 Mutative Step-Size Control
For a $(1,\lambda)$-strategy and $n_\sigma = 1$, the self-adaptation of strategy parameters can also be facilitated
by using the so-called mutational step size control by Rechenberg, which modifies the standard
deviations according to the following rule ([169], p. 47):
$$\sigma' = \begin{cases} \sigma \cdot \alpha & \text{if } u \sim U(0,1) \le 1/2 \\ \sigma / \alpha & \text{if } u \sim U(0,1) > 1/2 \end{cases} \qquad (16)$$
A value of $\alpha = 1.3$ of the learning rate is proposed by Rechenberg.
As shown in [25], this self-adaptation rule also provides a reasonable choice with a con-
vergence velocity comparable to that achieved by equation 4 for the convex case. This result
confirms that the self-adaptation principle works for a variety of different probability density
functions for the modification of step sizes, i.e., it is a very robust technique.
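The rule of equation (16) can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in [169] or [25]: the function names, the sphere objective in the usage note, and the choice of 10 offspring are assumptions made only for this example.

```python
import random


def mutate_step_size(sigma, alpha=1.3, rng=random):
    """Rule (16): multiply or divide sigma by alpha, each with probability 1/2."""
    u = rng.random()  # u ~ U(0, 1)
    return sigma * alpha if u <= 0.5 else sigma / alpha


def one_comma_lambda_step(x, sigma, f, lam=10, alpha=1.3, rng=random):
    """One generation of a (1,lambda)-ES with a single step size (n_sigma = 1)."""
    offspring = []
    for _ in range(lam):
        s = mutate_step_size(sigma, alpha, rng)         # mutate the step size first
        y = [xi + s * rng.gauss(0.0, 1.0) for xi in x]  # then use it for the object variables
        offspring.append((f(y), y, s))
    # comma selection: the parent is always replaced by the best offspring
    _, best_x, best_sigma = min(offspring, key=lambda item: item[0])
    return best_x, best_sigma
```

Calling `one_comma_lambda_step` repeatedly on a simple objective such as `lambda v: sum(t * t for t in v)` shows the step size shrinking together with the distance to the optimum.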
4.2 Derandomized Step-Size Adaptation
In contrast to the techniques discussed so far, the derandomized mutational step size control
proposed in [146] accumulates information about the selected individual's mutation vector $\vec{z}$
over the course of evolution by adding up the successful mutations. The authors claim that the
method enables a reliable adaptation of individual step sizes (i.e., $n$ different standard
deviations $\sigma_i$) even in small populations, namely, in $(1,\lambda)$-strategies with $\lambda = 10$ in the experiments
reported. The proposed method utilizes a vector $\vec{z}^{\,g}$ of accumulated mutations as well as
individual step sizes $\sigma_i$ and a global step size $\sigma$ according to [146]:
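$$\vec{z}^{\,g} = (1 - c)\,\vec{z}^{\,g-1} + c\,\vec{z}, \qquad \vec{z}^{\,0} = \vec{0} \qquad (17)$$
$$\sigma' = \sigma \cdot \left( \exp\left( \frac{|\vec{z}^{\,g}|}{\sqrt{n}\,\sqrt{\frac{c}{2-c}}} - 1 + \frac{1}{5n} \right) \right)^{\beta} \qquad (18)$$
$$\sigma_i' = \sigma_i \cdot \left( \frac{|z_i^g|}{\sqrt{\frac{c}{2-c}}} + 0.35 \right)^{\beta'} \qquad (19)$$
$$x_i' = x_i + \sigma' \cdot \sigma_i' \cdot N_i(0,1) \qquad (20)$$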
Essentially, equation (17) captures the history of successful mutations by a weighted sum
of the mutations selected in preceding generations (i.e., $\vec{z}^{\,g-1}$) and the mutation vector $\vec{z}$ of
the selected parent individual (notice that the method applies to $(1,\lambda)$-strategies, i.e., $\vec{z}$ is the
mutation vector of the single best offspring individual produced in generation $g-1$). The
vector $\vec{z}^{\,g}$ is then used to update both a global step size $\sigma$ and individual step sizes $\sigma_i$ according
to equations (18) and (19), where $|\vec{z}^{\,g}|$ in equation (18) denotes the absolute value of $\vec{z}^{\,g}$, while
$|z_i^g|$ in equation (19) indicates the absolute value of its $i$-th component.
Equation (20) then denotes the generation of offspring individuals from the single parent
(with components $x_i$) in a way similar to equation (6), but now using $\sigma'$ and $\sigma_i'$. Concerning the
choice of the new learning rates $c$, $\beta$, and $\beta'$, both theoretical and empirical arguments are given
in [146] for the settings $c = 1/\sqrt{n}$, $\beta = 1/\sqrt{n}$, $\beta' = 1/n$.
The experimental results presented in [146] demonstrate a clear convergence velocity im-
provement of the derandomized mutational step size control when compared to an (8,50)-
evolution strategy using the update rule given in equation (6), but the investigations focus on
unimodal objective functions.
The general idea of utilizing information from past generations as well is very convincing
and should motivate further research on the derandomized self-adaptation scheme. It should be
noted, however, that the method has to be classified at the border between adaptive and
self-adaptive control methods, because equations (18) and (19) do not define a mutational variation
of step sizes involving a random variation in the sense of those defined previously. Randomness
is introduced only by means of the vector $\vec{z}^{\,g}$, which takes the mutation vector of the parent
individual into account, not an actually generated random variation.
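The update cycle of equations (17) to (20) can be sketched as follows. This is a hedged illustration rather than the implementation of [146]: the function name `derandomized_es`, the variable names `beta` and `beta_ind` for the two exponents, the initial step sizes, and the default of 10 offspring over 200 generations are assumptions chosen for the example.

```python
import math
import random


def derandomized_es(f, x, lam=10, generations=200, sigma=1.0, seed=0):
    """(1,lambda)-ES with accumulated mutation vector, following (17)-(20)."""
    rng = random.Random(seed)
    n = len(x)
    c = 1.0 / math.sqrt(n)         # accumulation weight
    beta = 1.0 / math.sqrt(n)      # exponent of the global step size update (assumed name)
    beta_ind = 1.0 / n             # exponent of the individual update (assumed name)
    norm = math.sqrt(c / (2.0 - c))
    sigmas = [1.0] * n             # individual step sizes sigma_i
    z_acc = [0.0] * n              # accumulated mutations, starts at the zero vector
    for _ in range(generations):
        best = None
        for _ in range(lam):
            z = [rng.gauss(0.0, 1.0) for _ in range(n)]
            y = [xi + sigma * si * zi for xi, si, zi in zip(x, sigmas, z)]  # eq. (20)
            fy = f(y)
            if best is None or fy < best[0]:
                best = (fy, y, z)
        _, x, z_sel = best         # comma selection of the single best offspring
        z_acc = [(1.0 - c) * za + c * zs for za, zs in zip(z_acc, z_sel)]   # eq. (17)
        length = math.sqrt(sum(v * v for v in z_acc))
        sigma *= math.exp(length / (math.sqrt(n) * norm) - 1.0 + 1.0 / (5.0 * n)) ** beta  # eq. (18)
        sigmas = [si * (abs(v) / norm + 0.35) ** beta_ind
                  for si, v in zip(sigmas, z_acc)]                           # eq. (19)
    return x, sigma, sigmas
```

Note that no random number enters the step size update directly; the only stochastic input is the selected mutation vector, which is the property discussed in the paragraph above.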
4.3 Hierarchical Evolution Strategies
This kind of evolution strategy abstracts from the individual and applies genetic operators on
the level of populations as well. It was introduced by Rechenberg [169] and is denoted as
$$[\mu'/\rho' \;{}^{+}_{,}\; \lambda' \,(\mu/\rho \;{}^{+}_{,}\; \lambda)^{\gamma}]\text{-ES}.$$
Here the inner brackets denote a normal $(\mu/\rho \;{}^{+}_{,}\; \lambda)$-ES (the notation $/\rho$ indicates a $\rho$-ary
recombination operator), which is run $\lambda'$ times for $\gamma$ generations each. This yields $\lambda'$
populations, of which $\mu'$ are selected for the next generation on the population level.
These $\mu'$ populations pass through a recombination and mutation cycle ($\mu'/\rho'$) on the level of
populations to generate $\lambda'$ new populations, each of which runs the inner $(\mu/\rho \;{}^{+}_{,}\; \lambda)$-ES again for $\gamma$
cycles. This reproduction cycle on the population level is repeated $\gamma'$ times.
A difficulty is how to define recombination and mutation on the level of populations.
Recombination of populations can be done by simply taking single individuals from all $\mu'$ populations
into the succeeding population. Mutation can then be invoked by mutating each of the single
individuals or by moving the centres of gravity of the populations [169]. The latter, of course,
needs more computational effort.
One can recognize that there are two levels of hierarchy in the approach shown here:
1. The level of individuals, and
2. the level of populations.
The concept however can be applied to more than one level and the nesting can increase to
higher levels like sorts and families in natural evolution [77].
The benefit of these hierarchical or nested evolution strategies is the isolation of populations.
These populations can run in parallel and explore different parts of the search space. Because
this is done several times, it leads to a better exploration of the search space. Rechenberg
indicates that this kind of strategy is qualified for multimodal optimization [169].
This ES can also be used for multicriteria optimization (see also section 5.4) because the
objectives to select for can be different on every level of the hierarchy. This only works with
independent objectives, however, because the objective selected for on the level of populations
is not in effect on the level of individuals. For contradicting objectives, this destroys any
good information gained with regard to one of them.
A detailed description of the implementation is given in [169] but one should have in mind
that this approach again increases the number of parameters for an evolution strategy. This does
not only need more effort in programming but also requires knowledge and experience in the
tuning of the parameters to achieve good results.
4.4 The $(\mu, \kappa, \lambda, \rho)$-Strategy: Aging of Individuals
In the $(\mu + \lambda)$-ES the $\lambda$ offspring and their $\mu$ parents are united before the $\mu$ fittest
individuals are selected from this set of size $\mu + \lambda$ according to a given criterion. Both $\mu$ and
$\lambda$ can be as small as 1 in this case, in principle. Indeed, the first experiments were all
performed on the basis of a $(1 + 1)$-ES. In the $(\mu, \lambda)$-ES, with $\lambda > \mu \ge 1$, the $\mu$ new parents are
selected from the $\lambda$ offspring only, no matter whether they surpass their parents or not. The latter
version is in danger of diverging (especially in connection with self-adapting variances, see below)
if the best position found so far is not stored externally or even preserved within the generation
cycle (so-called elitist strategy). So far, only empirical results have shown that the comma version
has to be preferred when internal strategy parameters have to be learned on-line collectively. For
that to work, $\mu > 1$ and intermediary recombination of the mutation variances seem to be essential
preconditions. It is not true that ESs consider recombination as a subsidiary operator.
The $(\mu, \lambda)$-ES implies that each parent can have children only once (duration of life: one
generation = one reproduction cycle), whereas in the plus version individuals may live eternally
if no child achieves a better or at least the same quality. The new $(\mu, \kappa, \lambda, \rho)$-ES as defined
in [209, 208] introduces a maximal life span of $\kappa \ge 1$ reproduction cycles (iterations). Now,
both original strategies are special cases of the more general strategy, with $\kappa = 1$ resembling
the comma- and with $\kappa = \infty$ resembling the plus-strategy, respectively. Thus, the advantages
and disadvantages of both extremal cases can be scaled arbitrarily. Other new options include:
• Free number of parents involved in reproduction (not only 1, 2, or all).
• Tournament selection as alternative to the standard $(\mu, \lambda)$-selection.
• Free probabilities of applying recombination and mutation.
• Further recombination types including crossover.
In a $(\mu, \kappa, \lambda, \rho)$-ES, the representation of individuals is extended by a non-negative integer value
$\kappa_a \in \mathbb{N}_0$, the remaining life span of the individual $\vec{a}$ in iterations (reproduction cycles). Whenever
a new individual is created by mutation and recombination, its remaining life span is initialized
to $\kappa_a = \kappa$. The remaining life span is decremented by the selection operator for all individuals
which survive selection.
The remaining life span is then used to modify the traditional deterministic ES selection
operator, which can be defined formally as:
$$\mathrm{sel} : I^{\mu+\lambda} \to I^{\mu} \qquad (21)$$
Let $P^{(T)}$ denote some parent population in reproduction cycle $T$, $\tilde{P}^{(T)}$ their offspring produced
by recombination and mutation, and $Q^{(T)} = P^{(T)} \sqcup \tilde{P}^{(T)} \in I^{\mu+\lambda}$ where the operator $\sqcup$ denotes
the union operation on multisets. Then
$$P^{(T+1)} := \mathrm{sel}(Q^{(T)}) \qquad (22)$$
The next reproduction cycle contains the $\mu$ best individuals still having a positive remaining
duration of life, i.e., the following relation is valid:
$$\forall\, \vec{a} \in P^{(T+1)} : \kappa_a > 0 \;\wedge\; \nexists\, \vec{b} \in Q^{(T)} \setminus P^{(T+1)} : \vec{b} \overset{\kappa}{>} \vec{a} \qquad (23)$$
where the relation $\overset{\kappa}{>}$ (read: better than) introduces a maximum duration of life, $\kappa$, that defines
an individual to be better than another one if its remaining duration of life $\kappa_k$ is still positive
and its fitness (measured by the objective function) is better.
The definition of the $\overset{\kappa}{>}$-relation is given by:
$$\vec{a}_k \overset{\kappa}{>} \tilde{\vec{a}}_\ell \;:\Leftrightarrow\; \kappa_k > 0 \;\wedge\; f(\vec{x}_k) \le f(\tilde{\vec{x}}_\ell) \qquad (24)$$
At the end of the selection process, the remaining maximum life durations have to be
decremented by one for each survivor:
$$\kappa_k^{(T+1)} := \tilde{\kappa}_k^{(T)} - 1 \quad \forall\, k \in \{1, \ldots, \mu\} \qquad (25)$$
It should be noted again that, according to the definition (24) of the “better than” relation, a
setting of $\kappa = 1$ results in discarding the parents regardless of their quality (i.e., the $(\mu, \lambda)$-selection
as in traditional evolution strategies) while $\kappa = \infty$ guarantees parents to be discarded
only if they are outperformed by offspring individuals (i.e., the $(\mu + \lambda)$-selection as in traditional
evolution strategies).
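The selection scheme of equations (22) to (25) can be illustrated with a short Python sketch. The `Individual` class, its field names, and the function `kappa_selection` are hypothetical names introduced only for this example; minimization of the fitness is assumed, as in the rest of this section.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Individual:
    x: List[float]
    fitness: float
    life: float  # remaining life span; new offspring start with life = kappa


def kappa_selection(parents, offspring, mu):
    """Keep the mu best individuals that still have a positive remaining life span."""
    candidates = [a for a in parents + offspring if a.life > 0]
    candidates.sort(key=lambda a: a.fitness)   # smaller fitness is better
    survivors = candidates[:mu]
    for a in survivors:
        a.life -= 1                            # decrement for every survivor, as in (25)
    return survivors


# kappa = 1 behaves like comma selection, kappa = math.inf like plus selection
KAPPA_PLUS = math.inf
```

With `life = 1` for every new offspring, a survivor enters the next cycle with a remaining life span of 0 and is therefore excluded, however good it is. With `life = math.inf` a parent is only displaced by better offspring.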
As an alternative to this variant of selection, the tournament selection is well suited for
parallelization of the selection process. This method selects $\mu$ times the best individual from
a random subset $B_k$ of size $|B_k| = \xi$, $2 \le \xi \le \mu + \lambda$, $\forall\, k \in \{1, \ldots, \mu\}$, and transfers it to
the next reproduction cycle (note that there may appear duplicates!). The best individual within
each subset $B_k$ is selected according to the $\overset{\kappa}{>}$ relation which was introduced in (24). A formal
definition of the $(\mu, \xi)$ tournament selection follows: Let
$$B_k \subseteq Q^{(T)} \quad \forall\, k \in \{1, \ldots, \mu\} \qquad (26)$$
be random subsets of $Q^{(T)}$, each of size $|B_k| = \xi$. For each $k \in \{1, \ldots, \mu\}$ choose $\vec{a}_k \in B_k$
such that
$$\forall\, \vec{b} \in B_k : \vec{a}_k \overset{\kappa}{>} \vec{b} \qquad (27)$$
Finally,
$$P^{(T+1)} := \bigsqcup_{k=1}^{\mu} \{\vec{a}_k^{(T+1)}\} \qquad (28)$$
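A minimal sketch of this tournament selection is given below. Individuals are represented as `(fitness, remaining_life)` tuples purely for brevity, and the parameter name `xi` for the tournament size is an assumption. The text does not say what happens when a random subset contains no individual with a positive remaining life span; redrawing the subset in that case is an assumption of this sketch.

```python
import random


def tournament_selection(q, mu, xi, rng=random):
    """Select mu winners, each the best living member of a random subset of size xi."""
    if not any(life > 0 for _, life in q):
        raise ValueError("no individual with a positive remaining life span")
    selected = []
    while len(selected) < mu:
        subset = rng.sample(q, xi)                 # random subset B_k of Q
        alive = [a for a in subset if a[1] > 0]    # only these can be 'better than' others
        if alive:                                  # assumption: redraw if nobody qualifies
            selected.append(min(alive, key=lambda a: a[0]))
    return selected                                # duplicates may appear
```

Because every tournament only needs its own subset, the `mu` tournaments are independent of each other, which is what makes the scheme attractive for parallel execution.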
As an extension to the traditional recombination operator, the generalized recombination
operator $\mathrm{rec} : I^{\mu} \to I$ is defined as follows:
$$\mathrm{rec} := \mathrm{re} \circ \mathrm{co} \qquad (29)$$
where $\mathrm{co} : I^{\mu} \to I^{\rho}$ chooses $1 \le \rho \le \mu$ parent vectors from $I^{\mu}$ with uniform probability, and
$\mathrm{re} : I^{\rho} \to I$ creates one offspring vector by mixing characters from $\rho$ parents.
Let $A \subseteq P^{(T)}$ of size $|A| = \rho$ be a subset of arbitrary parents chosen by the operator $\mathrm{co}$,
and let $\hat{\vec{a}} \in I$ be the offspring to be generated. If $A = \{\vec{a}_1, \vec{a}_2\}$ holds, $\vec{a}_1$ and $\vec{a}_2$ being two out of
$\mu$ parents, recombination is called bisexual. If $A = \{\vec{a}_1, \ldots, \vec{a}_\rho\}$ and $\rho > 2$, recombination
is called multisexual. While recombination in evolution strategies was originally proposed for
the two cases of $\rho = 2$ and $\rho = \mu$ (global recombination), and was restricted to $\rho = 2$ in
genetic algorithms, Eiben generalized the idea for an arbitrary number of parents $\rho \ge 2$
involved in the creation of either one (e.g., in case of scanning crossover) or $\rho$ (e.g., in case
of diagonal crossover) offspring individuals [39, 41, 40]. This generalization is adapted here
for extending discrete and intermediary recombination in evolution strategies to an arbitrary
number of parents, but still generating one offspring only per application of the recombination
operator. First experimental results in parameter optimization indicate that the optimum value of
$\rho$ is problem-dependent, but in many cases $\rho = \mu$ is the most efficient setting for recombination
of the object variables [38].
In contrast to traditional evolution strategies which always apply recombination for the
creation of offspring, we also propose here to introduce recombination probabilities $p_r \in [0,1]^3$
as a further generalization of the algorithm. A recombination probability $p_r$ for one of the
three components of individuals that might undergo recombination is algorithmically realized
by sampling a uniform random variable $u \sim U(0,1)$ and applying no recombination, if $u > p_r$,
or the corresponding recombination operator, if $u \le p_r$.
Finally, an offspring individual created by recombination is equipped with a remaining life
time equal to the maximal life span, $\kappa_{\hat{a}} = \kappa$.
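The generalized recombination operator can be sketched for a single component of an individual (for example the object variables). The function name `recombine` and its parameters are hypothetical. The text does not specify what the offspring looks like when no recombination is applied; copying one uniformly chosen parent is an assumption of this sketch.

```python
import random


def recombine(parents, rho, mode="discrete", p_r=1.0, rng=random):
    """Create one offspring vector from rho uniformly chosen parent vectors."""
    if rng.random() > p_r:                      # u > p_r: no recombination
        return list(rng.choice(parents))        # assumption: copy one random parent
    chosen = rng.sample(parents, rho)           # operator co: pick rho parents
    n = len(chosen[0])
    if mode == "discrete":                      # each component from a random chosen parent
        return [rng.choice(chosen)[i] for i in range(n)]
    if mode == "intermediary":                  # component-wise average over the rho parents
        return [sum(p[i] for p in chosen) / rho for i in range(n)]
    raise ValueError("unknown recombination mode: " + mode)
```

Setting `rho` to the number of parents gives global recombination, while `rho = 2` gives the bisexual case described above.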
5 Application-Oriented Extensions
5.1 Noisy Objective Functions
Originally designed for experimental optimization [166, 203], Evolution Strategies are claimed
to be of general applicability as well as robust in the presence of noise. Whereas the universality
of these algorithms has been validated through many applications [13], little is known about their
robustness in case of perturbations. The ability to deal with noisy functions, however, is a
prerequisite not only for experimental optimization, e.g. because of limited precision of observations,
but also in the context of numerical optimization, as in the field of computer simulation.
Despite their simple structure, Evolution Strategies show a complex dynamic behavior.
Theoretical investigations up to now were successful only for simplified strategy variants and
convex objective functions like the sphere model $f_1(\vec{x}) = \sum_{i=1}^{n} x_i^2$.
Here we cite a result from Beyer [24], which describes the dynamics of the $(1,\lambda)$-ES on the
noisy objective function $f_1(\vec{x}) + N(0, \sigma_\epsilon)$:
$$\frac{\Delta R}{\Delta g} = \sigma^2 \left( \frac{n}{2R} - \frac{2 R\, c_{1,\lambda}}{\sqrt{\sigma_\epsilon^2 + (2 R \sigma)^2}} \right) \qquad (30)$$
$R$ and $g$ denote the remaining distance to the true optimum point ($\vec{0}$) and the current
generation number, respectively. The standard deviations for the mutation and the perturbation
are given by $\sigma$ and $\sigma_\epsilon$. The model is of dimensionality $n$ and $c_{1,\lambda}$ denotes the so-called progress
coefficient, which is a slowly increasing function in $\lambda$ [24]:
$$c_{1,\lambda} \approx \sqrt{2 \ln \lambda} \qquad (31)$$
Expressions (30) and (31) hold for large $n$ and $\lambda$-values, respectively.
Table 1 lists some values of $c_{1,\lambda}$ which are analytically derived for $\lambda \le 5$ and numerically
approximated for $\lambda > 5$ from Scheel [187].

$\lambda$          2        3        5        10       50       100
$c_{1,\lambda}$    0.5642   0.8463   1.1630   1.539    2.249    2.508
Table 1: Some values for $c_{1,\lambda}$
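We will make use of equation (30) to investigate the steady state, i.e. $R_\infty : \Delta R / \Delta g \to 0$.
Assuming $\lim_{g \to \infty} \sigma = 0$ we get
$$R_\infty = \frac{1}{2} \sqrt{\frac{n\,\sigma_\epsilon}{c_{1,\lambda}}} \quad \text{and} \qquad (32)$$
$$f_1(R_\infty) = \frac{n\,\sigma_\epsilon}{4\, c_{1,\lambda}} \qquad (33)$$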
Equation (33) can be used to validate experimental results for the sphere model.
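As a small numerical check of equation (33), the steady-state values can be computed directly. The sketch below uses $n = 30$ and the progress coefficient for 100 offspring from Table 1; the function names are chosen for this example only.

```python
import math


def steady_state_distance(sigma_eps, n=30, c=2.508):
    """Equation (32): remaining distance to the optimum in the steady state."""
    return 0.5 * math.sqrt(n * sigma_eps / c)


def steady_state_fitness(sigma_eps, n=30, c=2.508):
    """Equation (33): sphere function value at the steady-state distance."""
    return n * sigma_eps / (4.0 * c)


if __name__ == "__main__":
    for sigma_eps in (1.0, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001):
        print(f"{sigma_eps:6.3f}  {steady_state_fitness(sigma_eps):.3f}")
```

For a perturbation of 1.0 this gives approximately 2.990, and the other values scale linearly with the perturbation, in agreement with the theory column of Table 2.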
For the experiments, standard deviations $\sigma_\epsilon \in \{0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0\}$ are
utilized to perturb the function values and the evolution strategies’ behavior is compared to the
unperturbed case ($\sigma_\epsilon = 0$). The experiments are performed by running a (1,100)-ES as well as
a (15,100)-ES with $n_\sigma = 1$ for the convergence velocity test. Each experiment is repeated for a
total of $N = 100$ independent runs in order to obtain statistically significant results. The standard
method assesses the quality of an optimization run by concentrating on the individual of best (in
our case, minimal) objective function value. This is not reasonable in case of perturbed evaluations
because the populations’ extreme values represent outliers. Instead,
the evaluations are based on the average objective function value of the offspring population,
which provides a more robust measure of the true (unperturbed) quality of the individuals.
The experiments are performed on the sphere model $f_1$ with $n = 30$. The initial population
consists of object variables chosen uniformly at random from the interval $[-30, 30]$. All initial
standard deviations are set to a value of 25.0, and $n_\sigma = 1$ is used for all runs. Each of the
$N = 100$ runs is terminated after 20000 function evaluations (200 generations), and the objective
function data of all runs is averaged to obtain a result of statistical significance. (Indeed, the
data from 100 runs passes a Kolmogorov-Smirnov test for the hypothesis of normally distributed
data for a significance level of 0.01 and a confidence interval of 1% around the average.)
Figure 5 shows the behavior of a (1,100)-ES for the set of different perturbation magnitudes
as well as the unperturbed case. The average objective function value is plotted against the
number of generations.
Figure 5: Courses of evolution for (1,100)-ES on the sphere model and standard deviations
$\sigma_\epsilon \in \{0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0\}$ for the perturbation.
The courses of evolution clearly demonstrate the capability of an evolution strategy to
proceed as fast as in the unperturbed case as long as the magnitude of $\sigma_\epsilon$ is small in comparison to
$f$. If $f$ decreases beyond a certain level, the selection is based on the perturbation only and the
search process becomes a random walk, thus limiting the convergence precision.
Table 2 shows a remarkable accordance between theoretical and experimental results
comparing the (1,100)-ES steady states. The difference of a factor of approximately 1.3 can be
explained by the fact that equation (33) is valid for $n \to \infty$ and $\sigma \to 0$ only.
Increasing the parent population size to a more practical value of 15, we observe a similar
behavior (figure 6). A closer look not only shows a moderate speed-up due to the influence
of recombination, but also a much better localisation of the optimum point in the steady state
by approximately a factor of 4. This effect is caused by the reduction of selection pressure,
which prevents the outliers from taking over the whole population. A first analysis lets us assume
an optimal parameter value for $\mu$ between 10 and 15 in this configuration.
$\sigma_\epsilon$     (1,100)-ES   (1,100)-ES    (15,100)-ES
          theory       observation   observation
1.0       2.990        3.8           0.975
0.5       1.495        1.9           0.469
0.1       0.299        0.4           0.091
0.05      0.150        0.2           0.047
0.01      0.030        0.038         0.009
0.005     0.015        0.02          0.005
0.001     0.003        0.004         0.001
Table 2: $f(R_\infty)$ for the (1,100)-ES theory, (1,100)-ES experiment and the (15,100)-ES experiment.
5.2 Robust Design
Robustness is an important requirement for almost all kinds of products, i.e. they should keep a
good performance under varying conditions (temperature or humidity). Furthermore, the impact
of wear, as well as manufacturing tolerances, should be limited as much as possible.
Consequently, the production process itself as well as the environmental influences after the product is
put to use have to be regarded during the product design. We have shown for multilayer optical
coatings (MOCs) how robust designs can be achieved by using evolutionary algorithms. MOCs
are used to guarantee specific transmission and/or reflection characteristics of optical devices.
The objective of MOC designs is to find sequences of layers of particular materials with
specific thicknesses showing the desired characteristics as closely as possible. The MOC design
problem is not analytically solvable.
Let $\vec{x} = (x_1, \ldots, x_n)$ be a vector of parameters of a given design problem, e.g., the refraction
indices and thickness of the optical layers. Given a function $f(\vec{x})$ describing the merit of a
design feature, e.g. the color perception of the reflected light, and $\tau$ being a target value for
$f(\vec{x})$, then if disturbances are neglected the task is to find such an $\vec{x}$ that the difference between
$f(\vec{x})$ and $\tau$ is minimized.
On the other hand the usability of two products although manufactured under almost iden-
tical conditions might differ significantly, due to external conditions such as temperature and
humidity, or internal factors such as wear as well as manufacturing tolerances. Some of these
factors are not controllable at all. Others can only be reduced with unjustifiable effort. Thus
they are regarded as disturbances, and it is desired to reduce their influence as much as possible.
Here we focused on manufacturing tolerances, but the approach could easily be extended.
The disturbances are represented by a vector of random numbers $\vec{\delta} = (\delta_1, \ldots, \delta_n)$. If the
probability distributions of the $\delta_i$ are known as well as their influence on $f$ we might rewrite
$f(\vec{x})$ as $\tilde{f}(\vec{x}, \vec{\delta})$. In our example the disturbances are assumed to be normally distributed with
zero mean and will have an additive influence on the parameter values. Thus, we define
$$\tilde{f}(\vec{x}, \vec{\delta}) = f(x_1 + \delta_1, \ldots, x_n + \delta_n) \qquad (34)$$
The task is now to minimize the deviations of $\tilde{f}(\vec{x}, \vec{\delta})$ from $\tau$.
This leads to the question of how to assess these deviations. The traditional approach regards
all products with $|\tilde{f}(\vec{x}, \vec{\delta}) - \tau| \le \epsilon$ as equally good for some predefined $\epsilon$ and all others as
off-cuts. But this approach is somewhat unrealistic, since if such products are assembled to larger
units such as devices on electronic boards, malfunctions might occur due to aggregations of
deviations of single elements.
Figure 6: Courses of evolution for (15,100)-ES on the sphere model and standard deviations
$\sigma_\epsilon \in \{0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0\}$ for the perturbation.
The method of parameter design after Taguchi [218, 93, 179] takes these effects into account
by considering every deviation from the objective $\tau$ as a loss. In practical applications quadratic
loss functions of the form
$$(\tilde{f}(\vec{x}, \vec{\delta}) - \tau)^2 \qquad (35)$$
have proven to be well suited if no better alternative is known. The expected loss then becomes
$$L = k \cdot E\left( (\tilde{f}(\vec{x}, \vec{\delta}) - \tau)^2 \right) \qquad (36)$$
where $k$ is some constant and $E$ denotes the expectation value of the quadratic deviation.
In our work we follow the approach of Greiner [61, 62] who defines the objective function as
$$E_{(\tau - \tilde{f})^2}(\vec{x}) = k \int (\tau - \tilde{f}(\vec{x}, \vec{\delta}))^2\, P(\vec{\delta})\, d\vec{\delta} \qquad (37)$$
where $P(\vec{\delta})$ denotes the joint probability distribution of the disturbances. Since in most
applications the expectation value $E$ cannot be calculated analytically it must be approximated.
Here we use
$$\frac{1}{t} \sum_{i=1}^{t} (\tau - \tilde{f}(\vec{x}, \vec{\delta}_i))^2 \qquad (38)$$
as an estimate, where $\vec{\delta}_i$, $i = 1, \ldots, t$, are vectors of normally distributed random numbers with
mean zero and standard deviation $\sigma$. The estimation error scales proportionally to $1/\sqrt{t}$, and since
in most applications the possible number of evaluations is very limited this approach yields a
stochastic optimization problem. As evolutionary algorithms have proven their robustness in
case of noisy objective functions [46, 24, 9, 64] they are promising candidates here.
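The sampling estimate of equation (38) can be sketched as follows. The function name `estimated_loss` and the linear merit function in the usage note are hypothetical and stand in for the optical merit function, which is not given here. The same standard deviation is assumed for all components of the disturbance vector.

```python
import random


def estimated_loss(f, x, target, sigma, t=100, rng=random):
    """Average squared deviation from the target over t disturbed evaluations."""
    total = 0.0
    for _ in range(t):
        # additive, normally distributed disturbance on every parameter, as in (34)
        disturbed = [xi + rng.gauss(0.0, sigma) for xi in x]
        total += (target - f(disturbed)) ** 2
    return total / t
```

For the illustrative merit function `lambda v: sum(v)` with two parameters that meet the target exactly, the expected loss is twice the variance of the disturbance, so the estimate should approach that value as `t` grows. Since each call draws new disturbances, two evaluations of the same design generally differ, which is why the optimizer faces a noisy objective.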
In order to clarify the relationship between the original merit function $f$ and the expected
loss $L$ we investigated a rectangular function. We could show that optimal points of $L$ do not
necessarily correspond to optimal points of $f(\vec{x}) - \tau$. As already mentioned, we considered as
a practical example the design of multilayer optical coatings most frequently used for optical
filters. During the production process the layer thickness cannot be controlled with arbitrary
precision. Additionally, the refraction indices vary slightly due to pollution of the optical
materials. Thus, we might observe significant variances in the quality of single filters.
Basically, we applied two modified evolution strategies (ES): an extended $(25 + 50)$-ES for
mixed-integer optimization after [14] and a parallel diffusion model after [199], where the
individuals are located on a regular grid. We used 15 subpopulations with a size of 20x25, a
neighborhood size of 7x7 and an isolation time of 30 generations. The MOC designs found by
the evolutionary algorithms are substantially more robust to parameter variations than a
reference design and therefore perform much better in the average case, although for the undisturbed
case the reference design is significantly better. This observation was expected, since
sensitivity analysis shows that many local optima are not robust under parameter variations. For more
details see [230].
5.3 Dynamic Environments
The principle of self-adaptation promises to be useful not only in case of static optimization problems, but also for dynamic optimization problems where the objective function changes
over the course of optimization. The dynamic environment requires the evolutionary algorithm
to maintain sufficient diversity for continuous adaptation to the changes of the landscape, which
should be possible by means of self-adaptation of strategy parameters. Recently, it was demon-
strated that indeed the self-adaptation principle in evolution strategies provides an effective way
of tracking moving optima in case of dynamic objective functions [6].
In the general case of a dynamic environment, the goal is not only to acquire an optimal
solution but also to track its progression through the search space as closely as possible. In
contrast to the static optimization problem $f(\vec{x}) \to \min$ ($\vec{x} \in M$), the dynamic optimization
problem
$$f(\vec{x}, t) \to \min, \quad \vec{x} \in M,\; t \in T$$
depends on an additional parameter $t \in T$ (the time) as well, i.e., the objective function changes
with $t$. Generally, this implies that, for $t_i \ne t_j$, $f(\vec{x}, t_i) \ne f(\vec{x}, t_j)$, i.e., the objective function
might be different after each function evaluation, in contrast to a simplified form of dynamic
behavior where the objective function remains constant within specific time intervals
$[t_k, t_k + \Delta t_k]$, such that
$$t_i, t_j \in [t_k, t_k + \Delta t_k] \;\Rightarrow\; f(\vec{x}, t_i) = f(\vec{x}, t_j)$$
For the investigations reported in [6], it was assumed that the dynamics of the objective
function and the dynamics of the evolutionary algorithm are synchronized by identifying $t$ with
the generation index of the algorithm and by keeping $f$ constant within one generation, such
that $\Delta t_k \ge 1$ and $t_i, t_j, t_k \in \{0, 1, 2, \ldots, t_{\max}\}$. Moreover, $\Delta t_k =: \Delta g$ is also assumed to be
constant, such that the objective function changes every $\Delta g$ generations after completing the
evaluation of the whole population in case of a generational evolutionary algorithm such as the
evolution strategy.
Figure 7: Evolution strategy results for the linear dynamics with update frequency $\Delta g = 1$
(left), $\Delta g = 5$ (middle), $\Delta g = 10$ (right).
Three dynamical environments derived from the sphere model
$$f(\vec{x}) = \sum_{i=1}^{n} x_i^2 \qquad (39)$$
are used for the experiments. The dynamical environments are generated by translating the base
function along a linear trajectory according to
$$f(\vec{x}, t) = \sum_{i=1}^{n} (x_i + \delta_i(t))^2 \qquad (40)$$
where $t \in \mathbb{N}_0$ denotes the time counter (equivalent to the generation number in an evolutionary
algorithm).
The trajectory is defined by setting $\delta_i(0) = 0\ \forall\, i \in \{1, \ldots, n\}$, and
$$\delta_i(t+1) = \begin{cases} \delta_i(t) + s & \text{if } (t+1) \bmod \Delta g = 0 \\ \delta_i(t) & \text{else} \end{cases} \qquad (41)$$
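The moving sphere of equations (40) and (41) is easy to reproduce. The following sketch uses hypothetical function names and evaluates the recursion (41) literally, step by step; since all coordinates share the same offset, a single scalar is sufficient.

```python
def offset(t, severity, delta_g):
    """Offset delta_i(t) obtained by iterating recursion (41) from delta_i(0) = 0."""
    d = 0.0
    for step in range(1, t + 1):
        if step % delta_g == 0:   # the environment moves every delta_g generations
            d += severity
    return d


def dynamic_sphere(x, t, severity, delta_g):
    """Equation (40): sphere model translated along the linear trajectory."""
    d = offset(t, severity, delta_g)
    return sum((xi + d) ** 2 for xi in x)
```

The optimum at time `t` lies at the point whose coordinates all equal `-offset(t, severity, delta_g)`, so an algorithm that tracks it has to cover a distance proportional to the severity at every update.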
The algorithm used here is a standard (15,100)-evolution strategy with local discrete
recombination on the object variables $x_i$ and global intermediary recombination on the strategy
parameters $\sigma_i$. 100 offspring individuals are generated per generation, $n_\sigma = n$ variances are
used for self-adaptation (although it is well known that one variance is optimal for the sphere
model), all object variables are uniformly initialized within the range $[-50, 50]$, and 50
independent runs are performed over 500 generations each. The experiments for the linear dynamics,
with update frequencies $\Delta g \in \{1, 5, 10\}$ and severity $s \in \{0.01, 0.1, 0.5\}$ are shown in figure 7.
In this figure, the left, middle, and right subfigure correspond to an update frequency of
1, 5, and 10 generations, respectively, and each of the subfigures contains the three curves for
the different levels of the severity parameter.
All results reported here give a clear impression that the self-adaptation of variances as
utilized in a $(\mu,\lambda)$-evolution strategy is an effective method for tracking dynamic environments. In
all cases, the optimization proceeds with a linear rate of convergence as predicted by the theory
of evolution strategy behavior on the sphere model, until the objective function value reaches
an order of magnitude corresponding to the squared value of the severity parameter $s$. With
an update frequency of $\Delta g = 1$, the algorithm constantly follows the dynamic environment
without any deteriorations. With larger update frequencies $\Delta g \in \{5, 10\}$, the objective function
values oscillate with a frequency of $\Delta g$ generations between the objective function value
achieved by a continuous update at every generation (left figures) and the further improvement
that can be achieved by holding the environment constant for $\Delta g$ generations. This results in a
larger amplitude of the oscillation when $\Delta g$ increases.
The direct conclusion from the three sets of experiments reported here is that the lognormal
self-adaptation rule as used in $(\mu,\lambda)$-evolution strategies is perfectly able to track the dynamic
optima.
5.4 Multiple Criteria Decision Making
It has become increasingly obvious that the optimization under a single scalar–valued criterion
— often a monetary one — fails to reflect the variety of aspects in a world getting more and more
complex. Often, there are several conflicting optimization criteria (e.g., costs vs. reliability),
such that the objective function is characterized best by a multiple-criteria approach with k > 1
objectives, i.e.:
$$\vec{f} : M \to \mathbb{R}^k, \qquad \vec{f}(\vec{x}) = (f_1(\vec{x}), \ldots, f_k(\vec{x})) \qquad (42)$$
Under such circumstances, the goal of the search is to identify solutions which can not be
improved in any combination of the objectives without degradation in the remaining, i.e., a
solution $\vec{x}_i$ is called Pareto-optimal (nondominated) $:\Leftrightarrow$
6 9 ~x
j
:
~
f ( ~x
j
)
P
Pareto-based approaches, using a population ranking according to Pareto dominance.
While all of these approaches can be used in combination with an evolution strategy, we
focus here on a study which falls in the second of the above mentioned categories and uti-
lizes the concept of polyploidy to deal with different objectives. More precisely, the following
modifications to a ( , )-evolution strategy are made [112, 113]:
- Since the environment now consists of k objectives, the selection step is provided with a
  fixed user–definable vector that determines the probability of each objective becoming
  the sorting criterion in the k iterations of the selection loop. Alternatively, this vector may
  be allowed to change randomly over time.
- Furthermore, the extension of an individual’s genes by recessive information turned out
  to be necessary in order to maintain the population’s capability of coping with a changing
  environment. The recessive genes enable a fast reaction after a sudden variation of
  the probability vector. One can also observe this behaviour in nature: the younger the
  environment, the higher the portion of polyploid organisms.
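The first modification can be sketched as follows. This is only an illustration under stated assumptions, not the implementation from [112, 113]: the function name `select_parents`, the equal share of parents taken in each of the k rounds, and the use of minimization are all choices made here for the example.

```python
import random

def select_parents(offspring, objectives, probs, mu, rng=random):
    """Pick mu parents in k rounds; each round sorts by one randomly drawn objective."""
    k = len(objectives)
    pool = list(offspring)
    parents = []
    for round_idx in range(k):
        # Draw the sorting criterion for this round from the probability vector.
        objective = rng.choices(objectives, weights=probs, k=1)[0]
        pool.sort(key=objective)  # assumption: smaller objective values are better
        # Assumption: every round contributes an (almost) equal share of the parents.
        share = (mu - len(parents)) // (k - round_idx)
        parents.extend(pool[:share])
        pool = pool[share:]
    return parents
```

Letting `probs` change between generations would correspond to the randomly varying vector mentioned in the first item.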
Using these principles, the algorithm is able to generate solutions covering the Pareto front,
such that the user is provided with an idea of the tradeoffs between the objectives. It should be
noted that efficient solutions in one generation may become dominated by individuals emerging
in a later generation. This explains the non–efficient points in figure 8 (left) for the two
objectives
$$f_1(\vec{x}) = \sum_{i=1}^{n} \left( -10 \exp\left( -0.2 \sqrt{x_i^2 + x_{i+1}^2} \right) \right) \qquad (44)$$
$$f_2(\vec{x}) = \sum_{i=1}^{n} \left( |x_i|^{0.8} + 5 \sin(x_i)^3 \right) \qquad (45)$$
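For readers who want to experiment with the two objectives, a minimal sketch follows. Two details are assumptions made for the example: the sum in `f1` runs over adjacent pairs only (so that the index i + 1 always exists), and `f2` takes the absolute value of each variable before raising it to the power 0.8. The helpers `dominates` and `nondominated` are illustrative names for a plain Pareto filter under minimization.

```python
import math

def f1(x):
    # Sum over adjacent pairs; assumed to stop at n-1 so that x[i+1] exists.
    return sum(-10.0 * math.exp(-0.2 * math.sqrt(a * a + b * b))
               for a, b in zip(x, x[1:]))

def f2(x):
    # abs() is an assumption that keeps the power 0.8 real for negative x_i.
    return sum(abs(xi) ** 0.8 + 5.0 * math.sin(xi) ** 3 for xi in x)

def dominates(u, v):
    """True if u is no worse than v in every objective and better in at least one."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))

def nondominated(points):
    """Keep the objective vectors that no other vector in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]
```

Applying `nondominated` to the pairs `(f1(x), f2(x))` of all individuals seen so far removes exactly the kind of non-efficient points that appear when later generations dominate earlier solutions.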
For efficiency reasons, the ‘parents’ of the next generation are stored provisionally in an array
that is cleaned out if there is not enough space left for further individuals. If this operation does
not free enough space, solutions ‘close’ to one another are deleted. As an important
side effect, the elements of the Pareto set are forced apart, thus allowing a good survey with only
a finite number of solutions. Figure 8 (right) displays the situation after tidying up.
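The tidying-up step might look like the sketch below. Several points are assumptions, since the text does not fix them: "cleaning out" is read as removing dominated entries, closeness is measured as Euclidean distance in objective space, and the distance threshold is doubled until the archive fits. The names `tidy_archive`, `capacity` and `min_dist` are hypothetical.

```python
import math

def dominates(u, v):
    """True if u is no worse than v in every objective and better in at least one."""
    return (all(a <= b for a, b in zip(u, v))
            and any(a < b for a, b in zip(u, v)))

def tidy_archive(archive, capacity, min_dist):
    """Shrink a list of objective vectors (tuples) to at most `capacity` entries."""
    if capacity < 1 or min_dist <= 0.0:
        raise ValueError("capacity must be >= 1 and min_dist must be > 0")
    # Step 1 (assumed meaning of 'cleaned out'): drop dominated entries.
    kept = [p for p in archive if not any(dominates(q, p) for q in archive)]
    # Step 2: if still too full, delete entries that lie close to an entry
    # already kept, widening the distance threshold until the archive fits.
    while len(kept) > capacity:
        thinned = []
        for p in kept:
            if all(math.dist(p, q) >= min_dist for q in thinned):
                thinned.append(p)
        kept = thinned
        min_dist *= 2.0
    return kept
```

Because only near neighbours are deleted, the surviving entries end up spread along the front, which is the side effect described above.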
When working with diploid individuals, the inclusion of the recessive genes in the selection
step turns out to be vital. Otherwise, undisturbed by the outside world, they lead such a life
of their own that an individual whose dominant genes have been freshened up with recessive
material has no chance of surviving the next selection step. The best results were achieved with
a probability of about 1/3 for exchanging dominant and recessive genes. This value also serves
as a factor when putting together the overall fitness vector. Only in this way can the additional
recessive material serve as a stock of variants. From further test runs one can also conclude
that diploid or, in general, polyploid individuals are not worth the additional computing time in
a static environment consisting of only one objective function.
Since the algorithm tries to cover the Pareto set as well as possible, a probability distribution
forcing certain minimum changes during the mutation step ought to yield better results. Indeed,
the (symmetric) Weibull distribution turned out to be better than the Gaussian distribution.
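A symmetric Weibull mutation can be sketched with the Python standard library by drawing a Weibull-distributed step length and attaching a random sign. The scale and shape values below are illustrative assumptions, not the settings used in the study. With a shape parameter above 1 the density vanishes at zero, so very small changes become unlikely, which is the "minimum change" effect referred to above.

```python
import random

def symmetric_weibull(scale, shape, rng=random):
    """Weibull-distributed step length with a random sign."""
    magnitude = rng.weibullvariate(scale, shape)  # arguments: scale, then shape
    return magnitude if rng.random() < 0.5 else -magnitude

def mutate(x, scale=1.0, shape=2.0, rng=random):
    """Add an independent symmetric Weibull step to every object variable."""
    return [xi + symmetric_weibull(scale, shape, rng) for xi in x]
```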
Figure 8: Graphical visualization of the output of the algorithm.
The stochastic approach towards vector optimization problems via evolution strategies leads
to one major advantage: In contrast to other methods no subjective decisions are required during
the course of the iterations. Instead of narrowing the control variables space or the objective
space by deciding about the future direction of the search from an ‘information vacuum’ the
decision maker can collect as much information as needed before making a choice which of the
alternatives should be realized. Moreover, using a population while looking for a set of efficient
solutions seems to be more appropriate than just trying to improve one ‘current best’ solution.
One might exploit the algorithm’s capability of self–adapting its parameters even further:
The exchange rate between dominant and recessive genetic material can be adjusted on–line,
thus providing the user with a measure of convergence. The self–adaptation property largely
depends on a selection scheme that forces the algorithm to ‘forget’ the good solutions (‘parents’)
of one generation. When a possible recession from one generation to the next is accepted on the
phenotype level, individuals with a better ‘model’ of their environment, i.e. better step sizes
$\sigma_i$, are likely to emerge in later generations. This kind of selection seems to be lavish at
first sight, but it favours better adapted settings, thus speeding up the search in the long run.
5.5 Constraint Handling
In practical application problems, the feasible region F usually is only a subspace of the whole
search space S , and it is defined by a set of m additional constraints:
$$g_j(\vec{x}) \ge 0 \quad \text{for } j = 1, \ldots, q \qquad (46)$$
$$h_j(\vec{x}) = 0 \quad \text{for } j = q + 1, \ldots, m. \qquad (47)$$
During the optimum seeking process of ESs, inequality constraints so far have been handled
as barriers, i.e., offspring that violate at least one of the restrictions are lethal mutations. Before
the selection operator can be activated, exactly $\lambda$ non-lethal offspring must have been
generated.
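The lethal-mutation rule amounts to a rejection loop: offspring are created until exactly λ feasible ones exist, and infeasible ones are discarded without being evaluated. The sketch below assumes that a constraint is satisfied when its value is non-negative; the names `feasible`, `generate_offspring` and the `max_tries` safeguard are additions for the example.

```python
import random

def feasible(x, constraints):
    # Assumed convention: a constraint g is satisfied when g(x) >= 0.
    return all(g(x) >= 0.0 for g in constraints)

def generate_offspring(parents, constraints, lam, mutate, rng=random,
                       max_tries=100000):
    """Create offspring until exactly lam non-lethal ones have been produced."""
    offspring = []
    tries = 0
    while len(offspring) < lam:
        tries += 1
        if tries > max_tries:
            raise RuntimeError("feasible region is hit too rarely")
        child = mutate(rng.choice(parents))
        if feasible(child, constraints):
            offspring.append(child)  # infeasible children are simply dropped
    return offspring
```

The loop only terminates quickly if the parents are feasible and the step sizes are not too large relative to the feasible region, which is why an infeasible start point needs separate treatment.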
In case of a non-feasible start position $\vec{x}(0)$, a feasible solution must first be found. This
can be achieved by means of an auxiliary objective function
$$\tilde{f}(\vec{x}) = \sum_{j=1}^{m} g_j(\vec{x})\, \delta_j(\vec{x}) \qquad (48)$$
with $\delta_j(\vec{x}) = -1$ if $g_j(\vec{x}) < 0$ and $\delta_j(\vec{x}) = 0$ otherwise.
This way of handling bounds can be used with all optimum seeking methods, provided that
they are started within the feasible region. Some may have trouble with the sine-term due to the
periodicity introduced, however.
6 Parallel Evolution Strategies
Because all individuals of a population act simultaneously in nature, one can speak
of an inherent parallelism in evolution. Although this was already known when the principles
of evolutionary algorithms were designed, no one could at that time imagine the power of the
parallel computers which are now available. Consequently, evolutionary algorithms have usually
been implemented sequentially.
Nowadays we are used to parallel computers, and in recent years many suggestions for
parallelizing evolutionary algorithms have been made. The goals of parallelism are simple:
- Speed: Get the same results as a sequential algorithm in less time.
- Robustness: Get more robust results regarding errors or noisy information.
- Quality: Get better results in the same time as a sequential algorithm.
There are at least two different approaches to parallel evolutionary algorithms [57, 5], which
are described here along with a mixed-model approach that tries to combine the best of both
models. Before that, a very simple but effective way to use parallel hardware is presented,
which does not fit the models presented afterwards.
6.1 The Master-Slave Approach
This approach is very effective if the calculation of the fitness function is time-intensive, e.g. when
optimizing simulation models where the simulation software runs for a long time, as in [7].
In this case the evolutionary algorithm can be divided into a master-process, where the
individuals are generated and the genetic operators are applied, and a number of slave-processes,
where the fitness function is evaluated.
Now the different processes can run on different machines and the fitness calculation for a
whole population can be done in parallel. A special kind of steady-state selection [228] with a
$(\mu + 1)$-selection scheme was presented in [7, 11, 97] which nearly avoids any idle times on
the processors, because every time a fitness value is calculated a new individual is sent to the
idle processor without waiting for any other results from the slaves.
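The idle-free scheduling idea can be illustrated with the standard `concurrent.futures` module. This is a sketch under assumptions, not the scheme of [7, 11, 97]: worker threads stand in for slave processes on other machines, Gaussian mutation with a fixed step size `sigma` replaces self-adaptation, and `sphere` is only a placeholder for an expensive simulation.

```python
import random
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def sphere(x):
    return sum(v * v for v in x)  # placeholder for a costly fitness evaluation

def steady_state_es(fitness, parents, budget, workers=4, sigma=0.3, seed=0):
    rng = random.Random(seed)  # only the master uses the random generator
    population = [(fitness(p), p) for p in parents]
    mu = len(population)

    def new_child():
        _, parent = rng.choice(population)
        return [v + rng.gauss(0.0, sigma) for v in parent]

    submitted, pending = 0, {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Fill every worker once.
        while submitted < min(workers, budget):
            child = new_child()
            pending[pool.submit(fitness, child)] = child
            submitted += 1
        while pending:
            done, _ = wait(pending, return_when=FIRST_COMPLETED)
            for future in done:
                child = pending.pop(future)
                # (mu+1) step: insert the evaluated child, drop the worst.
                population.append((future.result(), child))
                population.sort(key=lambda entry: entry[0])
                del population[mu:]
                # Refill the idle worker immediately, without waiting for others.
                if submitted < budget:
                    nxt = new_child()
                    pending[pool.submit(fitness, nxt)] = nxt
                    submitted += 1
    return population
```

The master never waits for a full generation: as soon as any evaluation returns, the population is updated and the freed worker receives the next individual.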
6.2 Coarse Grained Parallelism: The Migration Model
In the migration model a population is divided into a number of subpopulations, so-called demes
[5]. These subpopulations are still panmictic but exchange genetic information by the migration
of individuals. Two concepts are known [57, 215, 5]:
1. In the Island Model there is a random exchange of information between the subpopula-
tions, and
2. in the Stepping Stone Model this exchange is limited to migration paths which connect
the subpopulations that are placed in a topology (e.g. a ring, or a torus etc.).
These algorithms can be scaled to a balanced usage of processing and communication resources
by tuning the local population size and the migration frequencies.
Different ways to choose the individuals to leave the local population are known. Choosing
one randomly seems to be a good compromise between the danger of premature stagnation
when choosing the best individual and the small chance of survival in the new subpopulation
when choosing the worst one to leave.
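One migration step of the stepping stone model on a ring, with a randomly chosen emigrant, could be sketched as follows. The insertion rule is an assumption made for the example (the immigrant replaces the worst resident under minimization), as is the decision to send a copy so that all deme sizes stay constant; `migrate_ring` is a hypothetical name.

```python
import random

def migrate_ring(demes, fitness, rng=random):
    """One migration step: deme i sends one random individual to deme i+1."""
    # Pick all emigrants first so that the exchange is synchronous.
    emigrants = [rng.choice(deme) for deme in demes]
    for i, migrant in enumerate(emigrants):
        target = demes[(i + 1) % len(demes)]
        # Assumption: the immigrant replaces the worst resident (minimization).
        worst = max(range(len(target)), key=lambda j: fitness(target[j]))
        target[worst] = list(migrant)  # a copy is sent; the original stays
    return demes
```

Calling this function every few generations, and varying that interval together with the deme size, is one way to balance processing against communication as described above.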
Another problem is the way to insert immigrants into the new population. A solution which