
ICD Center for Applied Systems Analysis

Innovative Methodologies in Evolution Strategies

INGENET Project Report D 2.2

June 1998

Thomas Bäck, Boris Naujoks

Center for Applied Systems Analysis (CASA)

Informatik Centrum Dortmund

Joseph-von-Fraunhofer-Str. 20

D-44227 Dortmund



Abstract

This INGENET report describes the state of the art in research and application of evolution strategies, with the goals of making this knowledge accessible to the INGENET members in a compact form and outlining the technological and economic perspectives of evolution strategies on the European level.

Evolution strategies are one of the main paradigms in the field of evolutionary computation, focusing on algorithms for adaptation and optimization which are gleaned from the model of organic evolution.

The report puts its emphasis on algorithmic and application-oriented aspects of evolution strategies. The algorithmic aspects include an overview of all components of a modern $(\mu,\lambda)$-strategy and a detailed explanation of the concept of strategy parameter self-adaptation, which is considered to be the main distinguishing feature between evolution strategies and genetic algorithms. The self-adaptation process implements an evolutionary optimization process also on the level of strategy parameters such as mutational step sizes and therefore offers an elegant solution to the parameter tuning problem of evolutionary algorithms. The working principles of self-adaptation are explained in detail in section 3 of this report.

A number of recent variations of the basic evolution strategy, including alternatives for the self-adaptation method, the introduction of hierarchies of evolution strategies, and the principle of individual aging in the $(\mu,\kappa,\lambda,\rho)$-strategy, are presented in section 4.

Further aspects which are of strong interest from an application-oriented point of view include noisy and dynamic objective functions as well as multiple criteria decision making problems and constraint handling. These are discussed in section 5, which shows that evolution strategies offer effective techniques for handling all of these additional difficulties of practical applications.

Section 6 gives a brief overview of the parallelization possibilities of evolution strategies, which are suitable for fine-grained as well as coarse-grained parallelization.

An overview of practical applications of evolution strategies is given in section 7, where case studies are grouped into disciplines and the corresponding literature references are given. Due to the strong increase in the number of publications in the field of evolutionary computation in the 1990s, the collection of case studies stops with the most recent examples from 1994; it nevertheless contains more than 150 examples up to that time.

The report concludes with an outline of the perspectives of evolution strategies, discussing their technological future with a focus on the economic potential of industrial applications of these algorithms. This outline might serve as a technological roadmap for the exploitation of these techniques within a ten-year timeframe.

Thomas Bäck and Boris Naujoks Dortmund, June 1998

Contact information:

Center for Applied Systems Analysis

Informatik Centrum Dortmund

Joseph-von-Fraunhofer-Str. 20

D-44227 Dortmund, Germany

Phone: +49 231 9700 366

Fax: +49 231 9700 959

Email: [email protected]


Contents

1 A Brief History
2 The Algorithm
2.1 Working Principle
2.2 The Structure of Individuals
2.3 Mutation
2.4 Recombination
2.5 Selection
2.6 Termination Criterion
3 Self-Adaptation
4 Variations
4.1 Mutative Step-Size Control
4.2 Derandomized Step-Size Adaptation
4.3 Hierarchical Evolution Strategies
4.4 The $(\mu,\kappa,\lambda,\rho)$-Strategy: Aging of Individuals
5 Application-Oriented Extensions
5.1 Noisy Objective Functions
5.2 Robust Design
5.3 Dynamic Environments
5.4 Multiple Criteria Decision Making
5.5 Constraint Handling
6 Parallel Evolution Strategies
6.1 The Master-Slave Approach
6.2 Coarse Grained Parallelism: The Migration Model
6.3 Fine Grained Parallelism: The Diffusion Model
6.4 A Hybrid Approach
7 Applications
7.1 Artificial Intelligence
7.2 Biotechnology
7.3 Technical Design Applications
7.4 Chemical Engineering
7.5 Telecommunications
7.6 Dynamic Processes, Modeling, Simulation
7.7 Medicine
7.8 Microelectronics
7.9 Military
7.10 Physics
7.11 Pattern Recognition
7.12 Production Planning
7.13 Robotics
7.14 Supply- and Disposal Systems
7.15 Miscellaneous
8 Perspectives
References


1 A Brief History

Evolution Strategies are a joint development of Bienert, Rechenberg and Schwefel, who did preliminary work in this area in the 1960s at the Technical University of Berlin (TUB) in Germany. The first applications were experimental and dealt with hydrodynamical problems such as shape optimization of a bent pipe [119], drag minimization of a joint plate [164], and structure optimization of a two-phase flashing nozzle [210]¹. Because such optimization problems could not be described and solved analytically or by traditional methods, a simple algorithmic method based on random changes of experimental setups was developed. In these experiments, adjustments were possible in discrete steps only, in the first two cases (pipe and plate) by changing certain joint positions and in the latter case (nozzle) by exchanging, adding or deleting nozzle segments. Following observations from nature that smaller mutations occur more often than larger ones, the discrete changes were sampled from a binomial distribution with prefixed variance. The basic working mechanism of the experiments was to create a mutation, adjust the joints or nozzle segments accordingly, perform the experiment and measure the quality criterion of the adjusted construction. If the new construction happened to be better than its predecessor, it served as the basis for the next trial. Otherwise, it was discarded and the predecessor was retained. No information about the amount of improvement or deterioration was necessary. This experimental strategy led to unexpectedly good results both for the bent pipe and the nozzle.

Schwefel was the first to simulate different versions of the strategy on the first available computer at TUB, a Zuse Z23 [200], later followed by several others who applied the simple Evolution Strategy to numerical optimization problems. Owing to the theoretical results of Schwefel's diploma thesis, the discrete mutation mechanism was replaced by normally distributed mutations with expectation zero and given variance [200]. The resulting two-membered ES works by creating one $n$-dimensional real-valued vector of object variables from its parent by applying mutation with identical standard deviations to each object variable. The resulting individual is evaluated and compared to its parent, and the better of both individuals survives to become the parent of the next generation, while the other one is discarded. This simple selection mechanism is fully characterized by the term (1+1)-selection.

For this algorithm, Rechenberg developed a convergence rate theory for $n \gg 1$ for two characteristic model functions, and he proposed a theoretically confirmed rule for changing the standard deviation of mutations (the 1/5-success rule) [166].
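The two-membered scheme described above can be sketched in a few lines. This is only an illustrative sketch, not the historical implementation: the function name `one_plus_one_es`, the fixed step size `sigma`, the generation budget and the sphere test function are assumptions made for the example, and the 1/5-success rule is deliberately left out because its constants are not given here.

```python
import random


def one_plus_one_es(f, x0, sigma, generations, rng):
    """Minimize f with a (1+1)-ES using one fixed standard deviation (assumed)."""
    parent = list(x0)
    parent_value = f(parent)
    for _ in range(generations):
        # Mutation: the same standard deviation is applied to every object variable.
        child = [xi + sigma * rng.gauss(0.0, 1.0) for xi in parent]
        child_value = f(child)
        # (1+1)-selection: the better of parent and offspring survives.
        if child_value < parent_value:
            parent, parent_value = child, child_value
    return parent, parent_value


def sphere(x):
    """Illustrative objective function (sum of squares)."""
    return sum(xi * xi for xi in x)


rng = random.Random(1)
start = [1.0] * 5
best, best_value = one_plus_one_es(sphere, start, sigma=0.1, generations=500, rng=rng)
```

Because the parent is only ever replaced by a strictly better offspring, the returned value can never be worse than the value of the starting point.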

Obviously, the (1+1)-ES did not incorporate the principle of a population. A first multi-membered Evolution Strategy or $(\mu+1)$-ES having $\mu > 1$ was also designed by Rechenberg to introduce a population concept. In a $(\mu+1)$-ES, $\mu$ parent individuals recombine to form one offspring, which after being mutated eventually replaces the worst parent individual if it is better (extinction of the worst). Mutation and adjustment of the standard deviation were realized as in a (1+1)-ES, and a recombination mechanism as explained in section 2.4 was used. This strategy, discussed in more detail in [12], was never widely used but provided the basis to facilitate the transition to the $(\mu+\lambda)$-ES and $(\mu,\lambda)$-ES as introduced by Schwefel² [201, 202, 203].

¹ This experiment is one of the first known examples of using operators like gene deletion and gene duplication, i.e., the number of segments the nozzle consisted of was allowed to vary during optimization.

² The material presented here is based on [203] and a number of research articles, but in the meantime an updated and extended edition of Schwefel's book was published (i.e., [207]).


Again the notation characterizes the selection mechanism, in the first case indicating that the best $\mu$ individuals out of the union of parents and offspring survive, while in the latter case only the best $\mu$ offspring individuals form the next parent generation (consequently, $\lambda > \mu$ is necessary). Currently, the $(\mu,\lambda)$-strategy characterizes the state of the art in Evolution Strategy research and is therefore the strategy of main interest in the following. As an introductory remark, it should be noted that the major quality of this strategy is seen in its ability to incorporate the most important parameters of the strategy (standard deviations and correlation coefficients of normally distributed mutations) into the search process, such that optimization takes place not only on object variables, but also on strategy parameters according to the actual local topology of the objective function. This capability is termed self-adaptation by Schwefel [204] and will be a major point of interest in discussing the Evolution Strategy.

2 The Algorithm

2.1 Working Principle

In general, evolutionary algorithms mimic the process of natural evolution, the driving process for the emergence of complex and well adapted organic structures, by applying variation and selection operators to a set of candidate solutions for a given optimization problem. The following structure of a general evolutionary algorithm reflects all essential components of an evolution strategy as well (see e.g. [10]):

Algorithm 1:

    t := 0
    initialize P(t)
    evaluate P(t)
    while not terminate do
        P'(t) := variation(P(t));
        evaluate(P'(t));
        P(t+1) := select(P'(t) ∪ Q)
        t := t + 1
    od

In case of a $(\mu,\lambda)$-evolution strategy, the following statements regarding the components of algorithm 1 can be made:

• $P(t)$ denotes a population (multiset) of individuals (candidate solutions to the given problem) at generation (iteration) $t$ of the algorithm.

• The initialization at $t = 0$ can be done randomly, or with known starting points obtained by any method.

• The evaluation of a population involves calculation of its members' quality according to the given objective function (quality criterion).

updated and extended edition of Schwefel’s book was published (i.e., [207]).


• The variation operators include the exchange of partial information between solutions (recombination) and its subsequent modification by adding normally distributed variations (mutation) of adaptable step sizes. These step sizes are themselves optimized during the search according to a process called self-adaptation.

• By means of recombination and mutation, an offspring population $P'(t)$ of $\lambda$ candidate solutions is generated.

• The selection operator chooses the $\mu$ best solutions from $P'(t)$ (i.e., $Q = \emptyset$) as starting points for the next iteration of the loop. Alternatively, a $(\mu+\lambda)$-evolution strategy would select the $\mu$ best solutions from the union of $P'(t)$ and $P(t)$ (i.e., $Q = P(t)$).

• The algorithm terminates if no more improvements are achieved over a number of subsequent iterations or if a given amount of time is exceeded.

• The algorithm returns the best candidate solution ever found during its execution.
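The loop of algorithm 1 and the statements above can be put together in a short skeleton. This is a sketch under stated assumptions rather than a reference implementation: the name `evolve`, the representation of individuals as `(value, x)` pairs, the fixed generation budget used as termination criterion, and the simple stand-in operators in the demonstration (fixed-size Gaussian noise, no self-adaptation) are all hypothetical choices made for illustration.

```python
import random


def evolve(initialize, evaluate, variation, select, generations, plus=False):
    """Run the loop of Algorithm 1; individuals are kept as (value, x) pairs."""
    parents = [(evaluate(x), x) for x in initialize()]          # initialize and evaluate P(0)
    best = min(parents, key=lambda pair: pair[0])
    for _ in range(generations):                                # stand-in for "while not terminate"
        children = variation([x for _, x in parents])           # P'(t) := variation(P(t))
        offspring = [(evaluate(x), x) for x in children]        # evaluate(P'(t))
        pool = offspring + (parents if plus else [])            # Q = P(t) for plus, empty for comma
        parents = select(pool)                                  # P(t+1)
        best = min([best] + parents, key=lambda pair: pair[0])  # best solution ever found
    return best


# Demonstration with hypothetical operators on a sum-of-squares objective.
MU, LAMBDA, N = 3, 12, 4
rng = random.Random(0)


def initialize():
    return [[rng.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(MU)]


def sphere(x):
    return sum(xi * xi for xi in x)


def variation(parents):
    # Stand-in operator: copy a random parent and add fixed-size Gaussian noise.
    return [[xi + 0.1 * rng.gauss(0.0, 1.0) for xi in rng.choice(parents)]
            for _ in range(LAMBDA)]


def select(pool):
    return sorted(pool, key=lambda pair: pair[0])[:MU]


best_value, best_x = evolve(initialize, sphere, variation, select, generations=50)
```

Keeping `best` outside the population reflects the last statement of the list: with comma selection the best individual of a generation may be lost, so the best solution ever found has to be tracked separately.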

In the following, these basic components of an evolution strategy are explained in some

more detail. For extensive information about evolution strategies, refer to [5, 169, 207].

Using a more formal notation following the outline given in [209, 208], one iteration of the strategy, that is, a step from a population $P^{(T)}$ towards the next reproduction cycle with $P^{(T+1)}$, can be modeled as follows:

$$P^{(T+1)} := \mathrm{opt}_{ES}(P^{(T)}) \tag{1}$$

where $\mathrm{opt}_{ES} : I^{\mu} \to I^{\mu}$ is defined by

$$\mathrm{opt}_{ES} := \mathrm{sel} \circ (\mathrm{mut} \circ \mathrm{rec})^{\lambda} \tag{2}$$

operating on an input population $P^{(T)}$ according to

$$\mathrm{opt}_{ES}(P^{(T)}) = \mathrm{sel}\Big(P^{(T)} \sqcup \big(\sqcup_{i=1}^{\lambda} \{\mathrm{mut}(\mathrm{rec}(P^{(T)}))\}\big)\Big) \tag{3}$$

(here, $\sqcup$ denotes the union operation on multisets). Equation (3) clarifies that the population at generation $T+1$ is obtained from $P^{(T)}$ by first applying a $\lambda$-fold repetition of recombination and mutation, which results in an intermediate population $P'$ of size $\lambda$, and then applying the selection operator to the union of $P^{(T)}$ and $P'$. Recall that the recombination operator generates only one individual per application, which can then be mutated directly.

In the following, both the formal as well as the informal way of describing the algorithmic

components will be used as it seems appropriate.

2.2 The Structure of Individuals

For a given optimization problem

$$f : M \subseteq \mathbb{R}^n \to \mathbb{R}, \qquad f(\vec{x}) \to \min$$

an individual of the evolution strategy contains the candidate solution $\vec{x} \in \mathbb{R}^n$ as one part of its representation. Furthermore, there exists a variable amount (depending on the type of strategy used) of additional information, so-called strategy parameters, in the representation of individuals. These strategy parameters essentially encode the $n$-dimensional normal distribution which is to be used for the variation of the solution.

More formally, an individual $\vec{a} = (\vec{x}, \vec{\sigma}, \vec{\alpha})$ consists of up to three components $\vec{x} \in \mathbb{R}^n$ (the solution), $\vec{\sigma} \in \mathbb{R}^{n_\sigma}$ (a set of standard deviations of the normal distribution), and $\vec{\alpha} \in [-\pi, \pi]^{n_\alpha}$ (a set of rotation angles representing the covariances of the $n$-dimensional normal distribution), where $n_\sigma \in \{1, \ldots, n\}$ and $n_\alpha \in \{0, (2n - n_\sigma)(n_\sigma - 1)/2\}$. The exact meaning of these components is described in more detail in section 2.3.

2.3 Mutation

The mutation in evolution strategies works by adding a normally distributed random vector $\vec{z} \sim N(\vec{0}, C)$ with expectation vector $\vec{0}$ and covariance matrix $C^{-1}$, where the covariance matrix is described by the mutated strategy parameters of the individual. Depending on the amount of strategy parameters incorporated into the representation of an individual, the following main variants of mutation and self-adaptation can be distinguished:

• $n_\sigma = 1$, $n_\alpha = 0$: The standard deviation for all object variables is identical ($\sigma$), and all object variables are mutated by adding normally distributed random numbers with

$$\sigma' = \sigma \cdot \exp(\tau_0 \cdot N(0,1)) \tag{4}$$

$$x_i' = x_i + \sigma' \cdot N_i(0,1) \tag{5}$$

where $\tau_0 \propto (\sqrt{n})^{-1}$. Here, $N(0,1)$ denotes a value sampled from a normally distributed random variable with expectation zero and variance one. The notation $N_i(0,1)$ indicates that the random variable is sampled anew for each setting of the index $i$.

• $n_\sigma = n$, $n_\alpha = 0$: All object variables have their own, individual standard deviation $\sigma_i$, which determines the corresponding modification according to

$$\sigma_i' = \sigma_i \cdot \exp(\tau' \cdot N(0,1) + \tau \cdot N_i(0,1)) \tag{6}$$

$$x_i' = x_i + \sigma_i' \cdot N_i(0,1) \tag{7}$$

where $\tau' \propto (\sqrt{2n})^{-1}$ and $\tau \propto (\sqrt{2\sqrt{n}})^{-1}$.

• $n_\sigma = n$, $n_\alpha = n(n-1)/2$: The vectors $\vec{\sigma}$ and $\vec{\alpha}$ represent the complete covariance matrix of the $n$-dimensional normal distribution, where the covariances are given by rotation angles $\alpha_j$ describing the coordinate rotations necessary to transform an uncorrelated mutation vector into a correlated one. The details of this mechanism can be found in [5] (pp. 68–71) or [180]. The mutation is performed according to

$$\sigma_i' = \sigma_i \cdot \exp(\tau' \cdot N(0,1) + \tau \cdot N_i(0,1)) \tag{8}$$

$$\alpha_j' = \alpha_j + \beta \cdot N_j(0,1) \tag{9}$$

$$\vec{x}' = \vec{x} + N(\vec{0}, C(\vec{\sigma}', \vec{\alpha}')) \tag{10}$$

where $N(\vec{0}, C(\vec{\sigma}', \vec{\alpha}'))$ denotes the correlated mutation vector and $\beta \approx 0.0873$.


The amount of information included into the individuals by means of the self-adaptation principle increases from the simple case of one standard deviation up to the order of $n^2$ additional parameters in case of correlated mutations, which reflects an enormous degree of freedom for the internal models of the individuals. This growing degree of freedom often enhances the global search capabilities of the algorithm at the cost of increased computation time, and it also reflects a shift from the precise adaptation of a few strategy parameters (as in case of $n_\sigma = 1$) to the exploitation of a large diversity of strategy parameters.

One of the main design parameters to be fixed for the practical application of the evolution strategy concerns the choice of $n_\sigma$ and $n_\alpha$, i.e., the number of self-adaptable strategy parameters required for the problem.
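The two uncorrelated variants of the mutation operator (equations (4)–(7)) can be sketched as follows. The function name `mutate` and the list-based representation are hypothetical, and the learning rates are only specified up to proportionality in the text, so the sketch assumes a proportionality constant of one. Correlated mutations (equations (8)–(10)) are not covered.

```python
import math
import random


def mutate(x, sigmas, rng):
    """Return (x', sigmas') for n_sigma = 1 or n_sigma = n, with n_alpha = 0.

    Step sizes are mutated first and the mutated values are then used to
    modify the object variables.
    """
    n = len(x)
    if len(sigmas) == 1:
        tau0 = 1.0 / math.sqrt(n)                     # assumed constant of proportionality: 1
        new_sigmas = [sigmas[0] * math.exp(tau0 * rng.gauss(0.0, 1.0))]   # eq. (4)
        new_x = [xi + new_sigmas[0] * rng.gauss(0.0, 1.0) for xi in x]    # eq. (5)
    else:
        tau_prime = 1.0 / math.sqrt(2.0 * n)          # assumed constant of proportionality: 1
        tau = 1.0 / math.sqrt(2.0 * math.sqrt(n))
        common = tau_prime * rng.gauss(0.0, 1.0)      # N(0,1): one draw shared by all i
        new_sigmas = [s * math.exp(common + tau * rng.gauss(0.0, 1.0))    # eq. (6)
                      for s in sigmas]
        new_x = [xi + s * rng.gauss(0.0, 1.0)                             # eq. (7)
                 for xi, s in zip(x, new_sigmas)]
    return new_x, new_sigmas


rng = random.Random(3)
x_single, sigma_single = mutate([0.0] * 10, [0.3], rng)          # n_sigma = 1
x_multi, sigma_multi = mutate([0.0] * 10, [0.3] * 10, rng)       # n_sigma = n
```

The shared draw `common` is what distinguishes $N(0,1)$ from $N_i(0,1)$ in equation (6): it rescales all step sizes of an individual together, while the per-coordinate draws change their relative proportions.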

2.4 Recombination

In evolution strategies, recombination is incorporated into the main loop of the algorithm as the first variation operator and generates a new intermediate population of $\lambda$ individuals by $\lambda$-fold application to the parent population, creating one individual per application from $\rho$ ($1 \leq \rho \leq \mu$) individuals. Normally, $\rho = 2$ or $\rho = \mu$ (so-called global recombination) are chosen (but see also section 4.4 for a generalization). The recombination types for object variables and strategy parameters in evolution strategies often differ from each other, and typical examples are discrete recombination (random choices of single variables from parents, comparable to uniform crossover in genetic algorithms) and intermediary recombination (arithmetic averaging). A typical setting of the recombination consists in using discrete recombination for object variables and global intermediary recombination for strategy parameters. For further details on these operators, see [5].

The recombination operator also needs to be specified for a $(\mu,\lambda)$-evolution strategy when $\mu > 1$ is chosen.
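The typical setting mentioned above (discrete recombination of object variables, global intermediary recombination of strategy parameters) might look as follows. The function names and the representation of an individual as a pair `(x, sigmas)` are assumptions for illustration; in particular, "global intermediary" is read here as the arithmetic average over all $\mu$ parents, which is one plausible interpretation and not necessarily the exact operator of [5].

```python
import random


def discrete(vectors, rng):
    """Discrete recombination: each component is copied from a randomly chosen parent."""
    n = len(vectors[0])
    return [rng.choice(vectors)[i] for i in range(n)]


def intermediary(vectors):
    """Intermediary recombination: component-wise arithmetic average of the parents."""
    n = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(n)]


def recombine(population, rho, rng):
    """Create one offspring (x, sigmas) from a population of (x, sigmas) pairs."""
    mates = rng.sample(population, rho)                       # rho parents for the object variables
    x = discrete([m[0] for m in mates], rng)
    sigmas = intermediary([ind[1] for ind in population])     # global: all mu parents (assumed)
    return x, sigmas


rng = random.Random(7)
population = [([1.0, 2.0], [0.1, 0.3]),
              ([3.0, 4.0], [0.3, 0.5]),
              ([5.0, 6.0], [0.5, 0.7])]
child_x, child_sigmas = recombine(population, rho=2, rng=rng)
```

Calling `recombine` $\lambda$ times on the same parent population yields the intermediate population that is subsequently mutated.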

2.5 Selection

Essentially, the evolution strategy offers two different variants for selecting candidate solutions for the next iteration of the main loop of the algorithm: $(\mu,\lambda)$-selection and $(\mu+\lambda)$-selection.

The notation $(\mu,\lambda)$ indicates that $\mu$ parents create $\lambda > \mu$ offspring by means of recombination and mutation, and the best $\mu$ offspring individuals are deterministically selected to replace the parents (in this case, $Q = \emptyset$ in algorithm 1). Notice that this mechanism allows the best member of the population at generation $t+1$ to perform worse than the best individual at generation $t$, i.e., the method is not elitist, thus allowing the strategy to accept temporary deteriorations that might help to leave the region of attraction of a local optimum and reach a better optimum. Moreover, in combination with the self-adaptation of strategy parameters, $(\mu,\lambda)$-selection has demonstrated clear advantages over its competitor, the $(\mu+\lambda)$ method.

In contrast, the $(\mu+\lambda)$-strategy selects the $\mu$ survivors from the union of parents and offspring, such that a monotonic course of evolution is guaranteed ($Q = P(t)$ in algorithm 1). For reasons related to the self-adaptation of strategy parameters, the $(\mu,\lambda)$-evolution strategy is typically preferred.
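Both selection variants are deterministic truncation of a sorted pool and differ only in what the pool contains. A minimal sketch for a minimization problem, assuming individuals are stored as `(value, label)` pairs (the names and the representation are hypothetical):

```python
def comma_selection(offspring, mu):
    """(mu, lambda)-selection: the best mu offspring replace the parents."""
    return sorted(offspring, key=lambda ind: ind[0])[:mu]


def plus_selection(parents, offspring, mu):
    """(mu + lambda)-selection: the best mu of parents and offspring survive."""
    return sorted(parents + offspring, key=lambda ind: ind[0])[:mu]


parents = [(1.0, "a"), (2.0, "b")]
offspring = [(3.0, "c"), (1.5, "d"), (4.0, "e")]

next_comma = comma_selection(offspring, mu=2)
next_plus = plus_selection(parents, offspring, mu=2)
```

In this small example `next_comma` contains the individuals with values 1.5 and 3.0, so the best value deteriorates from 1.0 to 1.5, which illustrates the non-elitist behavior; `next_plus` keeps the parent with value 1.0 alongside the offspring with value 1.5.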


2.6 Termination Criterion

There are several options for the choice of the termination criterion, including some absolute or relative measure of the population diversity (see e.g. [5], pp. 80–81), a predefined number of iterations of the main loop of the algorithm, or a predefined amount of CPU time or real time for execution of the algorithm.

3 Self-Adaptation

The settings for the learning rates $\tau$, $\tau'$ and $\tau_0$ are recommended by Schwefel as reasonable heuristic settings (see [202], pp. 167–168), but one should keep in mind that, depending on the particular topological characteristics of the objective function, the optimal setting of these parameters might differ from the values proposed. For $n_\sigma = 1$, however, [26] has recently shown theoretically that, for the sphere model

$$f(\vec{x}) = \sum_{i=1}^{n} (x_i - x_i^*)^2 \tag{11}$$

the setting $\tau_0 \propto 1/\sqrt{n}$ is the optimal choice, maximizing the convergence velocity of the evolution strategy. Moreover, for a $(1,\lambda)$-evolution strategy Beyer derived the result that $\tau_0 \approx c_{1,\lambda}/\sqrt{n}$ (for $\lambda \geq 10$), where $c_{1,\lambda}$ denotes the progress coefficient of the $(1,\lambda)$-strategy.

For an empirical investigation of the self-adaptation mechanism defined by the mutation operator variants (4)–(8), [204, 205, 206] used the following three objective functions, which are specifically tailored to the number of learnable strategy parameters in these cases:

1. Function

$$f_1(\vec{x}) = \sum_{i=1}^{n} x_i^2 \tag{12}$$

requires learning of one common standard deviation $\sigma$, i.e., $n_\sigma = 1$.

2. Function

$$f_2(\vec{x}) = \sum_{i=1}^{n} i \cdot x_i^2 \tag{13}$$

requires learning of a suitable scaling of the variables, i.e., $n_\sigma = n$.

3. Function

$$f_3(\vec{x}) = \sum_{i=1}^{n} \left( \sum_{j=1}^{i} x_j \right)^2 \tag{14}$$

requires learning of a positive definite metric, i.e., individual $\sigma_i$ and $n_\alpha = n(n-1)/2$ different covariances.
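For reference, the three objective functions (12)–(14) translate directly into code. The function names `f1`, `f2`, `f3` follow the text; the choice of plain Python lists as input is an assumption of this sketch.

```python
def f1(x):
    """Sphere model, equation (12)."""
    return sum(xi * xi for xi in x)


def f2(x):
    """Differently scaled variables, equation (13); the index i starts at 1."""
    return sum(i * xi * xi for i, xi in enumerate(x, start=1))


def f3(x):
    """Sum of squared partial sums, equation (14)."""
    total = 0.0
    partial = 0.0
    for xj in x:
        partial += xj              # running inner sum x_1 + ... + x_i
        total += partial * partial
    return total


point = [1.0, 2.0, 3.0]
values = (f1(point), f2(point), f3(point))
```

For the point (1, 2, 3), the partial sums in `f3` are 1, 3 and 6, which makes the coupling between variables visible: changing $x_1$ affects every term of the outer sum.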

As a first experiment, Schwefel compared the convergence velocity of a $(1,10)$- and a $(1+10)$-evolution strategy with $n_\sigma = 1$ on the sphere model $f_1$ with $n = 30$. The results of a comparable experiment performed for this study (averaged over ten independent runs, with the standard deviations initialized with a value of 0.3) are shown in figure 1, where the convergence velocity or progress is measured by $\log\sqrt{f_{\min}(0)/f_{\min}(g)}$, with $f_{\min}(g)$ denoting the objective function value in generation $g$. It is somewhat counterintuitive to observe that the non-elitist $(1,10)$-strategy, where all offspring individuals might be worse than the single parent, performs better than the elitist $(1+10)$-strategy. This can be explained, however, by taking into account that the self-adaptation of standard deviations might generate an individual with a good objective function value but an inappropriate value of $\sigma$ for the next generation. In case of a plus-strategy, this inappropriate standard deviation might survive for a number of generations, thus hindering the combined process of search and adaptation. The resulting periods of stagnation can be prevented by allowing the strategy to forget the good search point, together with its inappropriate step size.

From this experiment, Schwefel concluded that the non-elitist $(\mu,\lambda)$-selection mechanism is an important condition for a successful self-adaptation of strategy parameters. Recent experimental findings by Gehlhaar and Fogel [56] on more complicated objective functions than the sphere model give some evidence, however, that the elitist strategy performs as well as or even better than the $(\mu,\lambda)$-strategy in many practical cases.

Figure 1: Comparison of the convergence velocity of a $(1,10)$-strategy and a $(1+10)$-strategy in case of the sphere model $f_1$ with $n = 30$ and $n_\sigma = 1$.

For a further illustration of the self-adaptation principle in case of the sphere model $f_1$, we use a time-varying version where the optimum location $\vec{x}^* = (x_1^*, \ldots, x_n^*)$ is changed every 150 generations. Ten independent experiments for $n = 30$ and 1000 generations per experiment are performed with a (15,100)-evolution strategy (without recombination). The average best objective function value (solid curve) and the minimum, average, and maximum standard deviations $\sigma_{\min}$, $\sigma_{\mathrm{avg}}$, and $\sigma_{\max}$ are reported in figure 2. The curve of the objective function value clearly illustrates the linear convergence of the algorithm during the first search interval of 150 generations. After shifting the optimum location at generation 150, the search stagnates for a while at the bad new position before linear convergence is observed again.

The behavior of the standard deviations, which are also plotted in figure 2, clarifies the reason for the periods of stagnation of the objective function values: Self-adaptation of standard deviations works both by decreasing them during the periods of linear convergence and by increasing them during the periods of stagnation, back to a magnitude such that they have an impact on the objective function value. This process of standard deviation increase, which occurs at the beginning of each interval, needs some time which does not yield any progress with respect to the objective function value. According to [25], the number of generations needed for this adaptation is inversely proportional to $\tau_0^2$ (that is, proportional to $n$) in case of a $(1,\lambda)$-evolution strategy.

Figure 2: Best objective function value and minimum, average, and maximum standard deviation in the population plotted over the generation number for the time-varying sphere model. The results were obtained by using a (15,100)-evolution strategy with $n_\sigma = 1$, $n = 30$, without recombination.

Figure 3: Convergence velocity on $f_2$ for a $(\mu,100)$-strategy with $\mu \in \{1, \ldots, 30\}$ for the self-adaptive evolution strategy (dashed curve) and the strategy using optimum prefixed values of the standard deviations $\sigma_i$.

Figure 4: Comparison of the convergence velocity of a (15,100)-strategy with correlated mutations (solid curve) and with self-adaptation of standard deviations only (dashed curve) in case of the function $f_3$ with $n = n_\sigma = 10$, $n_\alpha = 45$.

In case of the objective function $f_2$, each variable $x_i$ is differently scaled by a factor $\sqrt{i}$, such that self-adaptation requires learning the scaling of $n$ different $\sigma_i$. The optimal settings of standard deviations $\sigma_i \propto 1/\sqrt{i}$ are also known in advance for this function, such that self-adaptation can be compared to an evolution strategy using optimally adjusted $\sigma_i$ for mutation. The result of this comparison is shown in figure 3, where the convergence velocity is plotted for $(\mu,100)$-evolution strategies as a function of $\mu$, the number of parents, both for the self-adaptive strategy and the strategy using the optimal setting of $\sigma_i$.

It is not surprising to see that, for the strategy using optimal standard deviations $\sigma_i$, the convergence rate is maximized for $\mu = 1$, because this setting exploits the perfect knowledge in an optimal sense. In case of the self-adaptive strategy, however, a clear maximum of the progress rate is reached for a value of $\mu = 12$, and both larger and smaller values of $\mu$ cause a strong loss of convergence speed. The collective performance of about 12 imperfect parents, achieved by means of self-adaptation, almost equals the performance of the perfect (1,100)-strategy and outperforms the collection of 12 perfect individuals by far. This experiment indicates that self-adaptation is a mechanism that requires the existence of a knowledge diversity (or diversity of internal models), i.e., a number of parents larger than one, and benefits from the phenomenon of collective (rather than individual) intelligence.

Concerning the objective function $f_3$, figure 4 shows a comparison of the progress for a (15,100)-evolution strategy with $n_\sigma = n = 10$, $n_\alpha = 0$ (that is, no correlated mutations) and $n_\alpha = n(n-1)/2 = 45$ (that is, full correlations). In both cases, intermediary recombination of object variables, global intermediary recombination of standard deviations, and no recombination of the rotation angles are chosen. The results demonstrate that, by introducing the covariances, it is possible to increase the effectiveness of the collective learning process in case of arbitrarily rotated coordinate systems. Recently, [180] has shown that an approximation of the Hessian matrix could be computed by correlated mutations with an upper bound of $\mu + \lambda = (n^2 + 3n + 4)/2$ on the population size, but the typical settings ($\mu = 15$, $\lambda = 100$) are often not sufficient to achieve this (an experimental investigation of the scaling behavior of correlated mutations with increasing population sizes and problem dimension has not yet been performed).

The choice of a logarithmic normal distribution for the modification of the standard deviations $\sigma_i$ in connection with a multiplicative scheme in equations (4), (6) and (8) is motivated by the following heuristic arguments (see [202], p. 168):

1. A multiplicative process preserves positive values.

2. The median should equal one to guarantee that, on average, a multiplication by a certain

value occurs with the same probability as a multiplication by the reciprocal value (i.e.,

the process would be neutral under absence of selection).

3. Small modifications should occur more often than large ones.
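The three arguments above can be checked empirically by sampling the multiplicative factor $\exp(\tau \cdot N(0,1))$. The learning rate `tau = 0.5`, the sample size, the seed and the thresholds 2 and 1.5 are arbitrary values assumed for this illustration only.

```python
import math
import random
import statistics

rng = random.Random(42)
tau = 0.5                                   # assumed learning rate for the illustration
factors = [math.exp(tau * rng.gauss(0.0, 1.0)) for _ in range(20001)]

# 1. A multiplicative log-normal factor is always positive.
all_positive = all(f > 0.0 for f in factors)

# 2. The median of the factor is (approximately) one, and multiplying by a
#    value c is about as likely as multiplying by its reciprocal 1/c.
median = statistics.median(factors)
share_above_2 = sum(f > 2.0 for f in factors) / len(factors)
share_below_half = sum(f < 0.5 for f in factors) / len(factors)

# 3. Small modifications (factor within [1/1.5, 1.5]) occur more often than larger ones.
share_small = sum(1.0 / 1.5 <= f <= 1.5 for f in factors) / len(factors)
share_large = 1.0 - share_small
```

Up to sampling noise, `median` is close to one and `share_above_2` is close to `share_below_half`, which is the neutrality property of argument 2 in the absence of selection.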

The effectiveness of this multiplicative logarithmic normal modification is presently also acknowledged in evolutionary programming, since extensive empirical investigations indicate some advantage of this scheme over the original additive self-adaptation mechanism used in evolutionary programming [185, 184, 186], where

$$\sigma_i' = \sigma_i \cdot (1 + \alpha \cdot N(0,1)) \tag{15}$$

(with a setting of $\alpha \approx 0.2$ [186]). Recent investigations indicate, however, that this becomes reversed when noisy objective functions are considered, where the additive mechanism seems to outperform multiplicative modifications [4].

The study by Gehlhaar and Fogel [56] also indicates that the order of the modifications of $x_i$ and $\sigma_i$ has a strong impact on the effectiveness of self-adaptation: It is important to mutate the standard deviations first and to use the mutated standard deviations for the modification of object variables. As the authors point out in that study, the reversed mechanism might suffer from generating offspring that have useful object variable vectors but bad strategy parameter vectors, because these have not been used to determine the position of the offspring itself.

Concerning the sphere model $f_1$ and a (1,$\lambda$)-strategy, Beyer has recently indicated that equation (15) is obtained from equation (6) by Taylor expansion breaking off after the linear term, such that both mutation mechanisms should behave identically for small settings of the learning rates $\tau_0$ and $\alpha$, when $\tau_0 = \alpha$ [25]. This was recently confirmed also with some experiments for the time-varying sphere model [15].
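The Taylor-expansion argument can be checked numerically. The following Python sketch is only an illustration: it assumes that equation (6), which is not reproduced in this section, has the lognormal form $\sigma' = \sigma \exp(\tau_0 N(0,1))$, and the function names are hypothetical. It compares that rule with the additive rule of equation (15) for equal learning rates.

```python
import math
import random


def lognormal_update(sigma, tau0, z):
    """Multiplicative rule (assumed form of equation (6)): sigma * exp(tau0 * z)."""
    return sigma * math.exp(tau0 * z)


def additive_update(sigma, alpha, z):
    """Rule of equation (15): sigma * (1 + alpha * z)."""
    return sigma * (1.0 + alpha * z)


def max_relative_gap(rate, samples=1000, seed=0):
    """Largest relative difference between both rules over N(0,1) samples."""
    rng = random.Random(seed)
    gap = 0.0
    for _ in range(samples):
        z = rng.gauss(0.0, 1.0)
        a = lognormal_update(1.0, rate, z)
        b = additive_update(1.0, rate, z)
        gap = max(gap, abs(a - b) / a)
    return gap


if __name__ == "__main__":
    # The gap shrinks roughly quadratically as the learning rate decreases.
    for rate in (0.01, 0.05, 0.2):
        print(rate, max_relative_gap(rate))
```

The lognormal factor for $z$ and the factor for $-z$ multiply to one, which is the neutrality property named in heuristic argument 2 above; the additive rule has this property only to first order.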


4 Variations

4.1 Mutative Step-Size Control

For a (1,$\lambda$)-strategy and $n_\sigma = 1$, the self-adaptation of strategy parameters can also be facilitated by using the so-called mutational step size control by Rechenberg, which modifies the standard deviations according to the following rule ([169], p. 47):

$$\sigma' = \begin{cases} \sigma \cdot \alpha, & \text{if } u \sim U(0,1) \le 1/2 \\ \sigma / \alpha, & \text{if } u \sim U(0,1) > 1/2 \end{cases} \qquad (16)$$

A value of $\alpha = 1.3$ of the learning rate is proposed by Rechenberg.

As shown in [25], this self-adaptation rule also provides a reasonable choice with a con-

vergence velocity comparable to that achieved by equation 4 for the convex case. This result

confirms that the self-adaptation principle works for a variety of different probability density

functions for the modification of step sizes, i.e., it is a very robust technique.
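As an illustration of rule (16), the following Python sketch applies it inside a small (1,$\lambda$)-strategy on the sphere model. The helper names, the starting point, the population size and the number of generations are assumptions chosen for the example, not settings taken from [169] or [25].

```python
import random


def mutate_step_size(sigma, alpha=1.3, rng=random):
    """Rule (16): multiply or divide sigma by alpha with probability 1/2 each."""
    u = rng.random()
    return sigma * alpha if u <= 0.5 else sigma / alpha


def sphere(x):
    """Sphere model: sum of squared object variables."""
    return sum(v * v for v in x)


def run_one_comma_lambda(n=5, lam=10, generations=200, alpha=1.3, seed=1):
    """Hypothetical (1,lambda)-ES on the sphere model using rule (16)."""
    rng = random.Random(seed)
    x, sigma = [5.0] * n, 1.0
    for _ in range(generations):
        offspring = []
        for _ in range(lam):
            s = mutate_step_size(sigma, alpha, rng)           # mutate step size first
            y = [xi + s * rng.gauss(0.0, 1.0) for xi in x]    # then the object variables
            offspring.append((sphere(y), y, s))
        # comma selection: the best offspring replaces the parent
        _, x, sigma = min(offspring, key=lambda item: item[0])
    return sphere(x)


if __name__ == "__main__":
    print(run_one_comma_lambda())
```

Note that each offspring inherits the step size that was actually used to create it, in line with the ordering argument attributed to Gehlhaar and Fogel earlier in the text.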

4.2 Derandomized Step-Size Adaptation

In contrast to the techniques discussed so far, the derandomized mutational step size control proposed in [146] accumulates information about the selected individual's mutation vector $\vec{z}$ over the course of evolution by adding up the successful mutations. The authors claim that the method enables a reliable adaptation of individual step sizes (i.e., $n$ different standard deviations $\sigma_i$) even in small populations, namely, in (1,$\lambda$)-strategies with $\lambda = 10$ in the experiments reported. The proposed method utilizes a vector $\vec{z}^{\,g}$ of accumulated mutations as well as individual step sizes $\sigma_i$ and a global step size $\sigma$ according to [146]:

$$\vec{z}^{\,g} = (1 - c) \cdot \vec{z}^{\,g-1} + c \cdot \vec{z}, \qquad \vec{z}^{\,0} = \vec{0} \qquad (17)$$

$$\sigma' = \sigma \cdot \left( \exp\left( \frac{|\vec{z}^{\,g}|}{\sqrt{n} \sqrt{\frac{c}{2-c}}} - 1 + \frac{1}{5n} \right) \right)^{\beta} \qquad (18)$$

$$\sigma_i' = \sigma_i \cdot \left( \frac{|z_i^g|}{\sqrt{\frac{c}{2-c}}} + 0.35 \right)^{\beta'} \qquad (19)$$

$$x_i' = x_i + \sigma' \cdot \sigma_i' \cdot N_i(0,1) \qquad (20)$$

Essentially, equation (17) captures the history of successful mutations by a weighted sum of the mutations selected in preceding generations (i.e., $\vec{z}^{\,g-1}$) and the mutation vector $\vec{z}$ of the selected parent individual (notice that the method applies to (1,$\lambda$)-strategies, i.e., $\vec{z}$ is the mutation vector of the single best offspring individual produced in generation $g - 1$). The vector $\vec{z}^{\,g}$ is then used to update both a global step size $\sigma$ and individual step sizes $\sigma_i$ according to equations (18) and (19), where $|\vec{z}^{\,g}|$ in equation (18) denotes the absolute value of $\vec{z}^{\,g}$, while $|z_i^g|$ in equation (19) indicates the absolute value of its $i$-th component.

Equation (20) then denotes the generation of offspring individuals from the single parent (with components $x_i$) in a way similar to equation (6), but now using $\sigma'$ and $\sigma_i'$. Concerning the choice of the new learning rates $c$, $\beta$, and $\beta'$, both theoretical and empirical arguments are given in [146] for the settings $c = 1/\sqrt{n}$, $\beta = 1/\sqrt{n}$, $\beta' = 1/n$.

The experimental results presented in [146] demonstrate a clear convergence velocity im-

provement of the derandomized mutational step size control when compared to an (8,50)-

evolution strategy using the update rule given in equation (6), but the investigations focus on

unimodal objective functions.
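A minimal Python sketch of one update according to equations (17) to (19) follows. It is an illustration under stated assumptions rather than the implementation of [146]: the function and parameter names are hypothetical, `beta` stands for the exponent of equation (18), `beta_ind` for the exponent of equation (19), and the absolute value of the accumulated vector is taken to be its Euclidean length.

```python
import math


def derandomized_update(sigma, sigmas, z_acc, z_sel, c, beta, beta_ind):
    """One step-size update; returns the new accumulated vector and step sizes."""
    n = len(z_sel)
    # (17): weighted sum of the accumulated history and the selected mutation vector
    z_new = [(1.0 - c) * za + c * zs for za, zs in zip(z_acc, z_sel)]
    damp = math.sqrt(c / (2.0 - c))
    length = math.sqrt(sum(v * v for v in z_new))
    # (18): global step size
    sigma_new = sigma * math.exp(
        length / (math.sqrt(n) * damp) - 1.0 + 1.0 / (5.0 * n)
    ) ** beta
    # (19): individual step sizes
    sigmas_new = [
        s * (abs(v) / damp + 0.35) ** beta_ind for s, v in zip(sigmas, z_new)
    ]
    return z_new, sigma_new, sigmas_new


def default_rates(n):
    """Settings quoted in the text: 1/sqrt(n), 1/sqrt(n), 1/n."""
    return 1.0 / math.sqrt(n), 1.0 / math.sqrt(n), 1.0 / n


if __name__ == "__main__":
    c, beta, beta_ind = default_rates(4)
    print(derandomized_update(1.0, [1.0] * 4, [0.0] * 4, [3.0] * 4, c, beta, beta_ind))
```

The update is deterministic once the selected mutation vector is known, which is the reason the text places the method between adaptive and self-adaptive control: a long accumulated vector enlarges the step sizes, a short one shrinks them.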

The general idea of utilizing information from past generations as well is very convincing and should motivate further research on the derandomized self-adaptation scheme. It should be noted, however, that the method has to be classified at the border between adaptive and self-adaptive control methods, because equations (18) and (19) do not define a mutational variation of step sizes involving a random variation in the sense of those defined previously. Randomness is introduced only by means of the vector $\vec{z}^{\,g}$, which takes the mutation vector of the parent individual into account, not an actually generated random variation.

4.3 Hierarchical Evolution Strategies

This kind of evolution strategy abstracts from the individual and takes genetic operators even on the level of populations into account. It was introduced by Rechenberg [169] and denoted as

$$[\mu'/\rho' \,\overset{+}{,}\, \lambda' (\mu/\rho \,\overset{+}{,}\, \lambda)^{\gamma}]^{\gamma'}\text{-ES}.$$

Here the inner brackets denote a normal $(\mu/\rho \,\overset{+}{,}\, \lambda)$-ES (the notation $/\rho$ indicates a $\rho$-ary recombination operator) which runs $\lambda'$ times for $\gamma$ generations, each. After that one has $\lambda'$ populations, and $\mu'$ populations are selected for the next generation on the population level. These $\mu'$ populations run through a recombination and mutation cycle ($\mu'/\rho'$) on the level of populations to generate $\lambda'$ new populations and then run the inner $(\mu/\rho \,\overset{+}{,}\, \lambda)$-ES again for $\gamma$ cycles. This reproduction cycle on the population level is done $\gamma'$ times.

The problem to arise is the recombination and mutation on the level of populations. Recom-

bination of populations can be done by simply taking single individuals from all 0 populations

into the succeeding population. Mutation can then be invoked by mutating each of the single individuals or by moving the centres of gravity of the populations [169]. The latter one of course

needs more computational effort.

One can recognize that there are two levels of hierarchy in the approach shown here:

1. The level of individuals, and

2. the level of populations.

The concept however can be applied to more than one level and the nesting can increase to

higher levels like sorts and families in natural evolution [77].

The benefit of these hierarchical or nested evolution strategies is the isolation of populations.

These populations can run in parallel and explore different parts of the search space. Because this is done several times it leads to a better exploration of the search space. Rechenberg indicates that this kind of strategy is qualified for multimodal optimization [169].

This ES can also be used for multicriteria optimization (see also section 5.4) because the objectives to select for can be different on every step of the hierarchy. This only works with independent objectives, however, because, e.g., the objective selected for on the level of populations is not working on the level of individuals. In the case of contradicting objectives, this will destroy all good information regarding one objective.

A detailed description of the implementation is given in [169] but one should have in mind

that this approach again increases the number of parameters for an evolution strategy. This does

not only need more effort in programming but also requires knowledge and experience in the

tuning of the parameters to achieve good results.

4.4 The ($\mu$, $\kappa$, $\lambda$, $\rho$)-Strategy: Aging of Individuals

In the $(\mu + \lambda)$-ES the $\lambda$ offspring and their $\mu$ parents are united, before according to a given criterion, the $\mu$ fittest individuals are selected from this set of size $\mu + \lambda$. Both $\mu$ and $\lambda$ can be as small as 1 in this case, in principle. Indeed, the first experiments were all performed on the basis of a $(1 + 1)$-ES. In the $(\mu, \lambda)$-ES, with $\lambda > \mu \ge 1$, the $\mu$ new parents are selected from the $\lambda$ offspring only, no matter whether they surpass their parents or not. The latter version is in danger to diverge (especially in connection with self-adapting variances – see below) if the so far best position is not stored externally or even preserved within the generation cycle (so-called elitist strategy). So far, only empirical results have shown that the comma version has to be preferred when internal strategy parameters have to be learned on-line collectively. For that to work, $\mu > 1$ and intermediary recombination of the mutation variances seem to be essential preconditions. It is not true that ESs consider recombination as a subsidiary operator.

The $(\mu, \lambda)$-ES implies that each parent can have children only once (duration of life: one generation = one reproduction cycle), whereas in the plus version individuals may live eternally – if no child achieves a better or at least the same quality. The new $(\mu, \kappa, \lambda, \rho)$-ES as defined in [209, 208] introduces a maximal life span of $\kappa \ge 1$ reproduction cycles (iterations). Now, both original strategies are special cases of the more general strategy, with $\kappa = 1$ resembling the comma- and with $\kappa = \infty$ resembling the plus-strategy, respectively. Thus, the advantages and disadvantages of both extremal cases can be scaled arbitrarily. Other new options include:

• Free number of parents involved in reproduction (not only 1, 2, or all).

• Tournament selection as alternative to the standard $(\mu, \lambda)$-selection.

• Free probabilities of applying recombination and mutation.

• Further recombination types including crossover.

In a ($\mu$, $\kappa$, $\lambda$, $\rho$)-ES, the representation of individuals is extended by a non-negative integer value $\kappa_a \in \mathbb{N}_0$, the remaining life span of the individual in iterations (reproduction cycles). Whenever a new individual is created by mutation and recombination, its remaining life span is initialized to $\kappa_a = \kappa$. The remaining life span is decremented by the selection operator for all individuals which survive selection.

The remaining life span is then used to modify the traditional deterministic ES selection operator $\mathrm{sel}$, which can be defined formally as:

$$\mathrm{sel} : I^{\mu+\lambda} \to I^{\mu} \qquad (21)$$

Let $P^{(T)}$ denote some parent population in reproduction cycle $T$, $\tilde{P}^{(T)}$ their offspring produced by recombination and mutation, and $Q^{(T)} = P^{(T)} \sqcup \tilde{P}^{(T)} \in I^{\mu+\lambda}$ where the operator $\sqcup$ denotes the union operation on multisets. Then

$$P^{(T+1)} := \mathrm{sel}(Q^{(T)}) \qquad (22)$$

The next reproduction cycle contains the $\mu$ best individuals still having a positive remaining duration of life, i.e., the following relation is valid:

$$\forall \vec{a} \in P^{(T+1)} : \; \kappa_a > 0 \;\wedge\; \nexists\, \vec{b} \in Q^{(T)} \setminus P^{(T+1)} : \; \vec{b} \overset{\kappa}{>} \vec{a} \qquad (23)$$

where the relation $\overset{\kappa}{>}$ (read: better than) introduces a maximum duration of life, $\kappa$, that defines an individual to be better than another one if its remaining duration of life $\kappa_k$ is still positive and its fitness (measured by the objective function) is better.

The definition of the $\overset{\kappa}{>}$-relation is given by:

$$\vec{a}_k \overset{\kappa}{>} \tilde{\vec{a}}_\ell \;:\Leftrightarrow\; \kappa_k > 0 \;\wedge\; f(\vec{x}_k) \le f(\tilde{\vec{x}}_\ell) \qquad (24)$$

At the end of the selection process, the remaining maximum life durations have to be decremented by one for each survivor:

$$\kappa_k^{(T+1)} := \tilde{\kappa}_k^{(T)} - 1 \qquad \forall k \in \{1, \ldots, \mu\} \qquad (25)$$

It should be noted again that, according to the definition (24) of the “better than” relation, a setting of $\kappa = 1$ results in discarding the parents regardless of their quality (i.e., the ($\mu$,$\lambda$)-selection as in traditional evolution strategies) while $\kappa = \infty$ guarantees parents to be discarded only if they are outperformed by offspring individuals (i.e., the ($\mu$+$\lambda$)-selection as in traditional evolution strategies).
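The two limiting cases can be illustrated with a short Python sketch of selection with aging. It is a simplified reading of equations (22) to (25), assuming minimization; the `Individual` class and the `select` function are hypothetical names, and an infinite life span is modeled with `math.inf`.

```python
import math
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Individual:
    fitness: float   # objective function value (minimization assumed)
    life: float      # remaining life span; math.inf models an unlimited life span


def select(parents, offspring, mu):
    """Keep the mu best individuals that still have a positive remaining life span."""
    pool = list(parents) + list(offspring)
    alive = [ind for ind in pool if ind.life > 0]
    alive.sort(key=lambda ind: ind.fitness)
    # every survivor loses one unit of remaining life span
    return [replace(ind, life=ind.life - 1) for ind in alive[:mu]]


if __name__ == "__main__":
    # Maximum life span 1: a parent with exhausted life span is dropped despite its fitness.
    print(select([Individual(0.1, 0)], [Individual(1.0, 1), Individual(2.0, 1)], 1))
    # Unlimited life span: the better parent survives, as in plus selection.
    inf = math.inf
    print(select([Individual(0.1, inf)], [Individual(1.0, inf), Individual(2.0, inf)], 1))
```

With a maximum life span of 1, every survivor reaches a remaining life span of zero after one selection step and is therefore excluded in the next cycle, which reproduces comma selection.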

As an alternative to this variant of selection, the tournament selection is well suited for parallelization of the selection process. This method selects $\mu$ times the best individual from a random subset $B_k$ of size $|B_k| = \xi$, $2 \le \xi \le \mu + \lambda$ $\forall k \in \{1, \ldots, \mu\}$, and transfers it to the next reproduction cycle (note that there may appear duplicates!). The best individual within each subset $B_k$ is selected according to the $\overset{\kappa}{>}$ relation which was introduced in (24). A formal definition of this tournament selection follows: Let

$$B_k \subseteq Q^{(T)} \qquad \forall k \in \{1, \ldots, \mu\} \qquad (26)$$

be random subsets of $Q^{(T)}$, each of size $|B_k| = \xi$. For each $k \in \{1, \ldots, \mu\}$ choose $\vec{a}_k \in B_k$ such that

$$\forall \vec{b} \in B_k : \; \vec{a}_k \overset{\kappa}{>} \vec{b} \qquad (27)$$

Finally,

$$P^{(T+1)} := \bigsqcup_{k=1}^{\mu} \{ \vec{a}_k^{(T+1)} \} \qquad (28)$$

As an extension to the traditional recombination operator, the generalized recombination operator $\mathrm{rec} : I^{\mu} \to I$ is defined as follows:

$$\mathrm{rec} := \mathrm{re} \circ \mathrm{co} \qquad (29)$$

where $\mathrm{co} : I^{\mu} \to I^{\rho}$ chooses $1 \le \rho \le \mu$ parent vectors from $I^{\mu}$ with uniform probability, and $\mathrm{re} : I^{\rho} \to I$ creates one offspring vector by mixing characters from $\rho$ parents.

Let $A \subseteq P^{(T)}$ of size $|A| = \rho$ be a subset of arbitrary parents chosen by the operator $\mathrm{co}$, and let $\hat{\vec{a}} \in I$ be the offspring to be generated. If $A = \{\vec{a}_1, \vec{a}_2\}$ holds, $\vec{a}_1$ and $\vec{a}_2$ being two out of $\mu$ parents, recombination is called bisexual. If $A = \{\vec{a}_1, \ldots, \vec{a}_\rho\}$ and $\rho > 2$, recombination is called multisexual. While recombination in evolution strategies was originally proposed for the two cases of $\rho = 2$ and $\rho = \mu$ (global recombination), and was restricted to $\rho = 2$ in genetic algorithms, Eiben generalized the idea for an arbitrary number of parents $\rho \ge 2$ involved in the creation of either one (e.g., in case of scanning crossover) or $\rho$ (e.g., in case of diagonal crossover) offspring individuals [39, 41, 40]. This generalization is adapted here for extending discrete and intermediary recombination in evolution strategies to an arbitrary number of parents, but still generating one offspring only per application of the recombination operator. First experimental results in parameter optimization indicate that the optimum value of $\rho$ is problem-dependent, but in many cases $\rho = \mu$ is the most efficient setting for recombination of the object variables [38].

In contrast to traditional evolution strategies which always apply recombination for the creation of offspring, we also propose here to introduce recombination probabilities $\vec{p}_r \in [0, 1]^3$ as a further generalization of the algorithm. A recombination probability $p_r$ for one of the three components of individuals that might undergo recombination is algorithmically realized by sampling a uniform random variable $u \sim U(0, 1)$ and applying no recombination, if $u > p_r$, or the corresponding recombination operator, if $u \le p_r$.

Finally, an offspring individual created by recombination is equipped with a remaining life time of $\kappa$.
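The generalized operator can be sketched in Python for one component (for example, the object variables). This is a sketch under assumptions: the function name is hypothetical, the parents are chosen without replacement, and, since the text does not say what the offspring looks like when no recombination is applied, the sketch simply copies one randomly chosen parent in that case.

```python
import random


def recombine(parents, rho, p_r, mode="intermediary", rng=random):
    """Create one offspring vector from rho of the given parent vectors."""
    if rng.random() > p_r:
        # no recombination (assumption: copy a single random parent)
        return list(rng.choice(parents))
    chosen = rng.sample(parents, rho)   # co: pick rho parents uniformly
    n = len(chosen[0])
    if mode == "discrete":
        # re (discrete): each component is taken from a randomly chosen parent
        return [rng.choice(chosen)[i] for i in range(n)]
    # re (intermediary): each component is the mean over the chosen parents
    return [sum(p[i] for p in chosen) / rho for i in range(n)]


if __name__ == "__main__":
    pop = [[0.0, 0.0], [2.0, 4.0], [4.0, 8.0]]
    print(recombine(pop, rho=3, p_r=1.0))              # global intermediary
    print(recombine(pop, rho=2, p_r=1.0, mode="discrete"))
```

Setting `rho` to the number of parents gives global recombination, and `rho=2` gives the bisexual case described above.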

5 Application-Oriented Extensions

5.1 Noisy Objective Functions

Originally designed for experimental optimization [166, 203], Evolution Strategies are claimed to be of general applicability as well as robust in the presence of noise. Whereas the universality of these algorithms was validated through many applications [13], little is known about their robustness in case of perturbations. But the ability to deal with noisy functions is a prerequisite not only for experimental optimization, e.g. because of limited precision of observations, but also in the context of numerical optimization, as in the field of computer simulation.

Despite their simple structure, Evolution Strategies show a complex dynamic behavior. Theoretical investigations up to now were successful only for simplified strategy variants and convex objective functions like the sphere model $f_1(\vec{x}) = \sum_{i=1}^{n} x_i^2$.

Here we cite a result from Beyer [24], which describes the dynamics of the (1,$\lambda$)-ES on the noisy objective function $f_1(\vec{x}) + N(0, \sigma_\epsilon)$:

$$\frac{\Delta R}{\Delta g} = \sigma^2 \left( \frac{n}{2R} - \frac{2 R c_{1,\lambda}}{\sqrt{\sigma_\epsilon^2 + (2 R \sigma)^2}} \right) \qquad (30)$$

$R$ and $g$ denote the remaining distance to the true optimum point ($\vec{0}$) and the current generation number, respectively. The standard deviations for the mutation and the perturbation are given by $\sigma$ and $\sigma_\epsilon$. The model is of dimensionality $n$ and $c_{1,\lambda}$ denotes the so called progress coefficient, which is a slowly increasing function in $\lambda$ [24]:

$$c_{1,\lambda} \approx \sqrt{2 \ln \lambda} \qquad (31)$$

Expressions (30) and (31) hold for large $n$ and $\lambda$-values, respectively.

$\lambda$          2       3       5       10      50      100
$c_{1,\lambda}$    0.5642  0.8463  1.1630  1.539   2.249   2.508

Table 1: Some values for $c_{1,\lambda}$

Table 1 lists some values of $c_{1,\lambda}$ which are analytically derived for $\lambda \le 5$ and numerically approximated for $\lambda > 5$ from Scheel [187].

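We will make use of equation (30) to investigate the steady state, i.e. $R_\infty$: $\Delta R / \Delta g \to 0$. Assuming $\lim_{g \to \infty} \sigma = 0$ we get

$$R_\infty = \frac{1}{2} \sqrt{\frac{n \sigma_\epsilon}{c_{1,\lambda}}} \quad \text{and} \qquad (32)$$

$$f_1(R_\infty) = \frac{n \sigma_\epsilon}{4 c_{1,\lambda}} \qquad (33)$$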

Equation (33) can be used to validate experimental results for the sphere model.
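As a worked check, the steady-state expressions (32) and (33) can be evaluated for the experimental setting used below ($n = 30$, a (1,100)-ES) with the progress coefficient for 100 offspring from Table 1. The following Python sketch uses hypothetical function names; the resulting values agree with the theory column of Table 2.

```python
import math

# Progress coefficients as listed in Table 1 (key: number of offspring).
C1 = {2: 0.5642, 3: 0.8463, 5: 1.1630, 10: 1.539, 50: 2.249, 100: 2.508}


def steady_state_distance(n, noise_std, c1):
    """Equation (32): residual distance to the optimum in the steady state."""
    return 0.5 * math.sqrt(n * noise_std / c1)


def steady_state_fitness(n, noise_std, c1):
    """Equation (33): sphere model value at the steady-state distance."""
    return n * noise_std / (4.0 * c1)


if __name__ == "__main__":
    for noise_std in (1.0, 0.5, 0.1, 0.05, 0.01, 0.005, 0.001):
        print(noise_std, round(steady_state_fitness(30, noise_std, C1[100]), 3))
```

Since the steady-state objective value is linear in the noise level, halving the perturbation halves the attainable precision.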

For the experiments, standard deviations $\sigma_\epsilon \in \{0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0\}$ are utilized to perturb the function values and the evolution strategies' behavior is compared to the unperturbed case ($\sigma_\epsilon = 0$). The experiments are performed by running a (1,100)-ES as well as a (15,100)-ES with $n_\sigma = 1$ for the convergence velocity test. Each experiment is repeated for a total of $N = 100$ independent runs in order to obtain statistically significant results. The standard method assesses the quality of an optimization run by concentrating on the individual of best (in our case, minimal) objective function value; this is not reasonable in case of perturbed evaluations because the populations' extreme values represent outliers. Instead, the evaluations are based on the average objective function value of the offspring population, which provides a more robust measure of the true (unperturbed) quality of the individuals.

The experiments are performed on the sphere model $f_1$ with $n = 30$. The initial population consists of object variables chosen uniformly at random from the interval $[-30, 30]$. All initial standard deviations are set to a value of 25.0, and $n_\sigma = 1$ is used for all runs. Each of the $N = 100$ runs is terminated after 20000 function evaluations (200 generations), and the objective function data of all runs is averaged to obtain a result of statistical significance. (Indeed, the data from 100 runs passes a Kolmogorov-Smirnov test for the hypothesis of normally distributed data for a significance level of 0.01 and a confidence interval of 1% around the average.)

Figure 5 shows the behavior of a (1,100)-ES for the set of different perturbation magnitudes

as well as the unperturbed case. The average objective function value is plotted against the

number of generations.


Figure 5: Courses of evolution for (1,100)-ES on the sphere model and standard deviations $\sigma_\epsilon \in \{0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0\}$ for the perturbation.

The courses of evolution clearly demonstrate the capability of an evolution strategy to proceed as fast as in the unperturbed case as long as the magnitude of $\sigma_\epsilon$ is small in comparison to $f$. If $f$ decreases beyond a certain level the selection is based on the perturbation only and the search process becomes a random walk, thus limiting the convergence precision.

Table 2 shows a remarkable accordance between theoretical and experimental results comparing the (1,100)-ES steady states. The difference of a factor of approximately 1.3 can be explained through the fact that equation (33) is valid for $n \to \infty$ and $\sigma \to 0$ only.

Increasing the parent population size to a more practical value of 15, we observe a similar behavior (figure 6). A closer look not only shows a moderate speed up due to the influence of recombination, but also a much better localisation of the optimum point in the steady state by approximately a factor of 4. This effect is caused by the reduction of selection pressure which prohibits the outliers to take over the whole population. A first analysis lets us assume an optimal parameter value for $\mu$ between 10 and 15 in this configuration.

5.2 Robust Design

Robustness is an important requirement for almost all kinds of products, i.e. they should keep a good performance under varying conditions (temperature or humidity). Furthermore, the impact of wear, as well as manufacturing tolerances, should be limited as much as possible. Consequently, the production process itself as well as the environmental influences after the product is put to use have to be regarded during the product design. We have shown for multilayer optical coatings (MOCs) how robust designs can be achieved by using evolutionary algorithms. MOCs are used to guarantee specific transmission and/or reflection characteristics of optical devices. The objective of MOC designs is to find sequences of layers of particular materials with specific thicknesses showing the desired characteristics as closely as possible. The MOC design problem is not analytically solvable.

$\sigma_\epsilon$    (1,100)-ES    (1,100)-ES     (15,100)-ES
                     theory        observation    observation
1.0                  2.990         3.8            0.975
0.5                  1.495         1.9            0.469
0.1                  0.299         0.4            0.091
0.05                 0.150         0.2            0.047
0.01                 0.030         0.038          0.009
0.005                0.015         0.02           0.005
0.001                0.003         0.004          0.001

Table 2: $f(R_\infty)$ for the (1,100)-ES theory, (1,100)-ES experiment and the (15,100)-ES experiment.

Let $\vec{x} = (x_1, \ldots, x_n)$ be a vector of parameters of a given design problem, e.g., the refraction indices and thickness of the optical layers. Given a function $f(\vec{x})$ describing the merit of a design feature, e.g. the color perception of the reflected light, and $\tau$ being a target value for $f(\vec{x})$, then if disturbances are neglected the task is to find such an $\vec{x}$ that the difference between $f(\vec{x})$ and $\tau$ is minimized.

On the other hand the usability of two products although manufactured under almost iden-

tical conditions might differ significantly, due to external conditions such as temperature and

humidity, or internal factors such as wear as well as manufacturing tolerances. Some of these

factors are not controllable at all. Others can only be reduced with unjustifiable effort. Thus

they are regarded as disturbances, and it is desired to reduce their influence as much as possible.

Here we focused on manufacturing tolerances, but the approach could easily be extended.

The disturbances are represented by a vector of random numbers $\vec{\delta} = (\delta_1, \ldots, \delta_n)$. If the probability distributions of the $\delta_i$ are known as well as their influence on $f$ we might rewrite $f(\vec{x})$ as $\tilde{f}(\vec{x}, \vec{\delta})$. In our example the disturbances are assumed to be normally distributed with zero mean and will have an additive influence on the parameter values. Thus, we define

$$\tilde{f}(\vec{x}, \vec{\delta}) = f(x_1 + \delta_1, \ldots, x_n + \delta_n) \qquad (34)$$

The task is now to minimize the deviations of $\tilde{f}(\vec{x}, \vec{\delta})$ from $\tau$.

This leads to the question of how to assess these deviations. The traditional approach regards all products with $|\tilde{f}(\vec{x}, \vec{\delta}) - \tau| \le \epsilon$ as equally good for some predefined $\epsilon$ and all others as off-cuts. But this approach is somewhat unrealistic, since if such products are assembled to larger units such as devices on electronic boards malfunctions might occur due to aggregations of deviations of single elements.

The method of parameter design after Taguchi [218, 93, 179] takes these effects into account by considering every deviation from the objective $\tau$ as a loss. In practical applications quadratic loss functions of the form

$$(\tilde{f}(\vec{x}, \vec{\delta}) - \tau)^2 \qquad (35)$$

have proven to be well suited if no better alternative is known. The expected loss then becomes

$$L = k \cdot E((\tilde{f}(\vec{x}, \vec{\delta}) - \tau)^2) \qquad (36)$$

where $k$ is some constant and $E$ denotes the expectation value of the quadratic deviation.

Figure 6: Courses of evolution for (15,100)-ES on the sphere model and standard deviations $\sigma_\epsilon \in \{0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.5, 1.0\}$ for the perturbation.

In our work we follow the approach of Greiner [61, 62] who defines the objective function as

$$E_{(\tau - \tilde{f})^2}(\vec{x}) = k \int (\tau - \tilde{f}(\vec{x}, \vec{\delta}))^2 \, P(\vec{\delta}) \, d\vec{\delta} \qquad (37)$$

where $P(\vec{\delta})$ denotes the joint probability distribution of the disturbances. Since in most applications the expectation value $E$ cannot be calculated analytically it must be approximated. Here we use

$$\frac{1}{t} \sum_{i=1}^{t} (\tau - \tilde{f}(\vec{x}, \vec{\delta}_i))^2 \qquad (38)$$

as an estimate, where $\vec{\delta}_i$, $i = 1, \ldots, t$, are vectors of normally distributed random numbers with mean zero and standard deviation $\sigma$. The estimation error scales proportional to $1/\sqrt{t}$, and since in most applications the possible number of evaluations is very limited this approach yields a stochastic optimization problem. As evolutionary algorithms have proven their robustness in case of noisy objective functions [46, 24, 9, 64] they are promising candidates here.
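The sampling estimate (38) can be sketched in a few lines of Python. The function name, the toy merit function (a plain sum) and the sample sizes are assumptions made for illustration; the disturbances are drawn independently per parameter and added to the parameter values as in equation (34).

```python
import random


def estimated_loss(f, x, target, noise_std, t, rng=random):
    """Estimate (38): mean squared deviation from the target over t disturbed designs."""
    total = 0.0
    for _ in range(t):
        # additive, zero-mean, normally distributed disturbance of each parameter
        disturbed = [xi + rng.gauss(0.0, noise_std) for xi in x]
        total += (target - f(disturbed)) ** 2
    return total / t


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy merit function: the sum of the parameters, target equal to the nominal value.
    for t in (10, 100, 10000):
        print(t, estimated_loss(sum, [1.0, 2.0], 3.0, 0.1, t, rng))
```

For this toy function the exact expected squared deviation is the number of parameters times the disturbance variance (here 0.02), so the printed values show how the estimate settles as the sample size grows.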

In order to clarify the relationship between the original merit function $f$ and the expected loss $L$ we investigated a rectangular function. We could show that optimal points of $L$ do not necessarily correspond to optimal points of $|f(\vec{x}) - \tau|$. As already mentioned, we considered as a practical example the design of multilayer optical coatings most frequently used for optical filters. During the production process the layer thickness cannot be controlled with arbitrary precision. Additionally, the refraction indices vary slightly due to pollution of the optical materials. Thus, we might observe significant variances in the quality of single filters.

Basically, we applied two modified evolution strategies (ES): an extended (25 + 50)-ES for mixed-integer optimization after [14] and a parallel diffusion model after [199], where the individuals are located on a regular grid. We used 15 subpopulations with a size of 20x25, a neighborhood size of 7x7 and an isolation time of 30 generations. The MOC designs found by the evolutionary algorithms are substantially more robust to parameter variations than a reference design and therefore perform much better in the average case, although for the undisturbed case the reference design is significantly better. This observation was expected, since sensitivity analysis shows that many local optima are not robust under parameter variations. For more details see [230].

5.3 Dynamic Environments

The principle of self-adaptation promises to be useful not only in case of static optimization problems, but also for dynamic optimization problems where the objective function changes

over the course of optimization. The dynamic environment requires the evolutionary algorithm

to maintain sufficient diversity for continuous adaptation to the changes of the landscape, which

should be possible by means of self-adaptation of strategy parameters. Recently, it was demon-

strated that indeed the self-adaptation principle in evolution strategies provides an effective way

of tracking moving optima in case of dynamic objective functions [6].

In the general case of a dynamic environment, the goal is not only to acquire an optimal

solution but also to track its progression through the search space as closely as possible. In

contrast to the static optimization problem $f(\vec{x}) \to \min$ ($\vec{x} \in M$), the dynamic optimization problem

$$f(\vec{x}, t) \to \min, \qquad \vec{x} \in M, \; t \in T$$

depends on an additional parameter $t \in T$ (the time) as well, i.e., the objective function changes with $t$. Generally, this implies that, for $t_i \ne t_j$, $f(\vec{x}, t_i) \ne f(\vec{x}, t_j)$, i.e., the objective function might be different after each function evaluation, in contrast to a simplified form of dynamic behavior where the objective function remains constant within specific time intervals $[t_k, t_k + \Delta t_k]$, such that

$$t_i, t_j \in [t_k, t_k + \Delta t_k] \;\Rightarrow\; f(\vec{x}, t_i) = f(\vec{x}, t_j)$$

For the investigations reported in [6], it was assumed that the dynamics of the objective function and the dynamics of the evolutionary algorithm are synchronized by identifying $t$ with the generation index of the algorithm and by keeping $f$ constant within one generation, such that $\Delta t_k \ge 1$ and $t_i, t_j, t_k \in \{0, 1, 2, \ldots, t_{\max}\}$. Moreover, $\Delta t_k =: \Delta g$ is also assumed to be constant, such that the objective function changes every $\Delta g$ generations after completing the evaluation of the whole population in case of a generational evolutionary algorithm such as the evolution strategy.


Figure 7: Evolution strategy results for the linear dynamics with update frequency $\Delta g = 1$ (left), $\Delta g = 5$ (middle), $\Delta g = 10$ (right).

Three dynamical environments derived from the sphere model

$$f(\vec{x}) = \sum_{i=1}^{n} x_i^2 \qquad (39)$$

are used for the experiments. The dynamical environments are generated by translating the base function along a linear trajectory according to

$$f(\vec{x}, t) = \sum_{i=1}^{n} (x_i + \delta_i(t))^2 \qquad (40)$$

where $t \in \mathbb{N}_0$ denotes the time counter (equivalent to the generation number in an evolutionary algorithm).

The trajectory is defined by setting $\delta_i(0) = 0 \; \forall i \in \{1, \ldots, n\}$, and

$$\delta_i(t+1) = \begin{cases} \delta_i(t) + s, & (t+1) \bmod \Delta g = 0 \\ \delta_i(t), & \text{else} \end{cases} \qquad (41)$$
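The linear dynamics of equations (40) and (41) can be written as a short Python sketch. The function names are hypothetical, and the same offset is applied to every coordinate because the recursion (41) starts from zero for all coordinates and adds the same severity each time.

```python
def offset_after(t, severity, update_interval):
    """Offset at time t, obtained by iterating the recursion (41) from zero."""
    offset = 0.0
    for step in range(1, t + 1):
        if step % update_interval == 0:
            offset += severity   # the optimum moves only at every update_interval-th step
    return offset


def dynamic_sphere(x, t, severity, update_interval):
    """Equation (40): sphere model translated by the current offset."""
    d = offset_after(t, severity, update_interval)
    return sum((xi + d) ** 2 for xi in x)


if __name__ == "__main__":
    # With severity 0.5 and an update every 5 generations the offset is 1.0 at t = 10,
    # so the point (-1, -1) is the optimum at that time.
    print(offset_after(10, 0.5, 5), dynamic_sphere([-1.0, -1.0], 10, 0.5, 5))
```

In closed form the offset equals the severity times the number of completed update intervals, which is what the loop accumulates.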

The algorithm used here is a standard (15,100)-evolution strategy with local discrete recombination on the object variables $x_i$ and global intermediary recombination on the strategy parameters $\sigma_i$. 100 offspring individuals are generated per generation, $n_\sigma = n$ variances are used for self-adaptation (although it is well known that one variance is optimal for the sphere model), all object variables are uniformly initialized within the range $[-50, 50]$, and 50 independent runs are performed over 500 generations, each. The experiments for the linear dynamics, with update frequencies $\Delta g \in \{1, 5, 10\}$ and severity $s \in \{0.01, 0.1, 0.5\}$ are shown in figure 7. In this figure, the left, middle, and right subfigure correspond with an update frequency of 1, 5, and 10 generations, respectively, and each of the subfigures contains the three curves for the different levels of the severity parameter.

All results reported here give a clear impression that the self-adaptation of variances as utilized in a ($\mu$,$\lambda$)-evolution strategy is an effective method for tracking dynamic environments. In all cases, the optimization proceeds with a linear rate of convergence as predicted by the theory of evolution strategy behavior on the sphere model, until the objective function value reaches an order of magnitude corresponding to the squared value of the severity parameter $s$. With an update frequency of $\Delta g = 1$, the algorithm constantly follows the dynamic environment without any deteriorations. With larger update frequencies $\Delta g \in \{5, 10\}$, the objective function values oscillate with a frequency of $\Delta g$ generations between the objective function value achieved by a continuous update at every generation (left figures) and the further improvement that can be achieved by holding the environment constant for $\Delta g$ generations. This results in a larger amplitude of the oscillation when $\Delta g$ increases.

The direct conclusion from the three sets of experiments reported here is that the lognormal self-adaptation rule as used in ($\mu$,$\lambda$)-evolution strategies is perfectly able to track the dynamic optima.

5.4 Multiple Criteria Decision Making

It has become increasingly obvious that the optimization under a single scalar–valued criterion

— often a monetary one — fails to reflect the variety of aspects in a world getting more and more

complex. Often, there are several conflicting optimization criteria (e.g., costs vs. reliability),

such that the objective function is characterized best by a multiple-criteria approach with $k > 1$ objectives, i.e.:

$$\vec{f} : M \to \mathbb{R}^k, \qquad \vec{f}(\vec{x}) = (f_1(\vec{x}), \ldots, f_k(\vec{x})) \qquad (42)$$

Under such circumstances, the goal of the search is to identify solutions which can not be

improved in any combination of the objectives without degradation in the remaining, i.e., a

solution ~x i

is called Pareto-optimal (nondominated): ,

6 9 ~x

j

:

~

f ( ~x

j

)

P



Pareto-based approaches, using a population ranking according to Pareto dominance.

While all of these approaches can be used in combination with an evolution strategy, we

focus here on a study which falls in the second of the above mentioned categories and uti-

lizes the concept of polyploidy to deal with different objectives. More precisely, the following

modifications to a ( , )-evolution strategy are made [112, 113]:

Since the environment now consists of k objectives the selection step is provided with a

fixed user–definable vector that determines the probability of each objective to become

the sorting criterion in the k iterations of the selection loop. Alternatively, this vector may

be allowed to change randomly over time.

Furthermore, the extension of an individual’s genes by recessive information turned out to be necessary in order to maintain the population’s capability of coping with a changing environment. The recessive genes enable a fast reaction after a sudden variation of the probability vector. One can also observe this behaviour in nature: the younger the environment, the higher the proportion of polyploid organisms.
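The selection loop with a probability vector can be illustrated with a minimal sketch. It assumes minimization and that each of the k passes contributes a roughly equal share of the parents; the report does not state how many individuals are taken per pass. The names `select_parents`, `objectives`, `probs`, and `rng` are hypothetical, and the polyploid (recessive) part of the individuals is left out.

```python
import random

def select_parents(offspring, objectives, probs, mu, rng=random):
    """Select mu parents in k passes, one randomly drawn sorting criterion per pass."""
    if len(offspring) < mu:
        raise ValueError("need at least mu offspring")
    k = len(objectives)
    pool = list(offspring)
    parents = []
    for step in range(k):
        # Draw the objective that serves as sorting criterion in this pass;
        # probs[i] is the probability of objectives[i] being chosen.
        criterion = rng.choices(objectives, weights=probs, k=1)[0]
        pool.sort(key=criterion)  # minimization: best individuals first
        # Assumption: each pass contributes an (almost) equal share of the parents.
        quota = (mu - len(parents)) // (k - step)
        parents.extend(pool[:quota])
        del pool[:quota]
    return parents
```

With a probability vector such as `[1.0, 0.0]` the sketch degenerates to ordinary single-objective truncation selection, which is a convenient sanity check.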

Using these principles, the algorithm is able to generate solutions covering the Pareto front,

such that the user is provided with an idea of the tradeoffs between the objectives. It should be noted that efficient solutions in one generation may become dominated by individuals emerging in a later generation. This explains the non–efficient points in figure 8 (left) for the two objectives

$$f_1(\vec{x}) = \sum_{i=1}^{n} \left( -10 \exp\left( -0.2 \sqrt{x_i^2 + x_{i+1}^2} \right) \right) \tag{44}$$

$$f_2(\vec{x}) = \sum_{i=1}^{n} \left( |x_i|^{0.8} + 5 \sin(x_i)^3 \right) \tag{45}$$
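For experimentation, the two objectives (44) and (45) might be coded as follows. This is a sketch under two assumptions: the term of (44) needs $x_{i+1}$, so the code sums over the n-1 neighbouring pairs of the vector, and the power 0.8 in (45) is applied to the absolute value of $x_i$ so that it is defined for negative components.

```python
import math

def f1(x):
    """Objective (44): summed over neighbouring pairs (x_i, x_{i+1})."""
    # Assumption: the sum stops at n-1, because x_{n+1} does not exist.
    return sum(-10.0 * math.exp(-0.2 * math.sqrt(a * a + b * b))
               for a, b in zip(x, x[1:]))

def f2(x):
    """Objective (45): summed over all components."""
    # Assumption: |x_i|^0.8, so that negative components are allowed.
    return sum(abs(v) ** 0.8 + 5.0 * math.sin(v) ** 3 for v in x)
```

At the origin of a three-dimensional search space, each of the two pairs contributes -10 to `f1`, while `f2` vanishes.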

For efficiency reasons the ‘parents’ of the next generation are stored provisionally in an array

that is cleaned out if there is not enough space left for further individuals. If this operation does not result in enough free space, solutions ‘close’ to one another are deleted. As an important side effect, the elements of the Pareto set are forced apart, thus allowing a good survey with only

a finite number of solutions. Figure 8 (right) displays the situation after tidying up.
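The tidying step can be sketched as follows. The sketch assumes that "cleaning out" means removing dominated entries, that closeness is measured as Euclidean distance in objective space, and that the distance threshold is doubled until the archive fits; none of these details is given in the text, and the names `dominates`, `tidy_archive`, `capacity`, and `min_dist` are hypothetical.

```python
import math

def dominates(a, b):
    """True if a is no worse than b in every objective and better in one (minimization)."""
    return (all(u <= v for u, v in zip(a, b))
            and any(u < v for u, v in zip(a, b)))

def tidy_archive(archive, capacity, min_dist):
    """Reduce a list of objective vectors to at most `capacity` entries."""
    if capacity < 1 or min_dist <= 0.0:
        raise ValueError("capacity must be >= 1 and min_dist must be > 0")
    # Step 1: clean out every point that is dominated by another archive member.
    kept = [p for p in archive if not any(dominates(q, p) for q in archive)]
    # Step 2: if there is still not enough room, delete points that lie close
    # to an already kept point, widening the threshold until the archive fits.
    while len(kept) > capacity:
        thinned = []
        for p in kept:
            if all(math.dist(p, q) >= min_dist for q in thinned):
                thinned.append(p)
        kept = thinned
        min_dist *= 2.0
    return kept
```

Because near neighbours are removed first, the surviving points end up spread along the front, which matches the side effect described above.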

When working with diploid individuals the inclusion of the recessive genes in the selection

step turns out to be vital. Otherwise, undisturbed by the outside world they lead such a life

of their own that an individual whose dominant genes have been freshened up with recessive

material has no chance of surviving the next selection step. The best results were achieved with a probability of about 1/3 for exchanging dominant and recessive genes. This value also serves as a factor when putting together the overall fitness vector. Only in this way can the additional recessive material serve as a stock of variants. From further test runs one can also conclude that diploid or, in general, polyploid individuals are not worth the additional computing time in a static environment consisting of only one objective function.

Since the algorithm tries to cover the Pareto set as well as possible, a probability distribution forcing certain minimum changes during the mutation step ought to yield better results. Indeed,

the (symmetric) Weibull distribution turned out to be better than the Gaussian distribution.


Figure 8: Graphical visualization of the output of the algorithm.

The stochastic approach towards vector optimization problems via evolution strategies leads

to one major advantage: In contrast to other methods no subjective decisions are required during

the course of the iterations. Instead of narrowing the control variables space or the objective

space by deciding about the future direction of the search from an ‘information vacuum’ the

decision maker can collect as much information as needed before making a choice which of the

alternatives should be realized. Moreover, using a population while looking for a set of efficient

solutions seems to be more appropriate than just trying to improve one ‘current best’ solution.

One might exploit the algorithm’s capability of self–adapting its parameters even further:

The exchange rate between dominant and recessive genetic material can be adjusted on–line

thus providing the user with a measure of convergence. The self–adaptation property largely depends on a selection scheme that forces the algorithm to ‘forget’ the good solutions (‘parents’) of one generation. When accepting a possible recession from one generation to the next on the phenotype level, individuals with a better ‘model’ of their environment, i.e. better step sizes $\sigma_i$, are likely to emerge in later generations. This kind of selection seems to be lavish at first sight, but it favours better adapted settings, thus speeding up the search in the long run.

5.5 Constraint Handling

In practical application problems, the feasible region F usually is only a subspace of the whole

search space S, and it is defined by a set of m additional constraints:

$$g_j(\vec{x}) \geq 0 \quad \text{for } j = 1, \ldots, q \tag{46}$$

$$h_j(\vec{x}) = 0 \quad \text{for } j = q+1, \ldots, m. \tag{47}$$

During the optimum seeking process of ESs, inequality constraints so far have been handled

as barriers, i.e., offspring that violate at least one of the restrictions are lethal mutations. Before

the selection operator can be activated, exactly $\lambda$ non-lethal offspring must have been generated.
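Treating constraint violations as lethal mutations amounts to a rejection loop around the variation operator. The sketch below assumes inequality constraints of the form g(x) >= 0 and a user-supplied `mutate(parent, rng)` function; both names, as well as the `max_tries` safeguard, are hypothetical and not part of the report. Recombination is omitted for brevity.

```python
import random

def is_feasible(x, inequality_constraints):
    """An individual is feasible if no inequality constraint g(x) >= 0 is violated."""
    return all(g(x) >= 0.0 for g in inequality_constraints)

def generate_offspring(parents, mutate, inequality_constraints, lam,
                       rng=random, max_tries=100000):
    """Create exactly lam non-lethal offspring; infeasible ones are discarded."""
    offspring = []
    tries = 0
    while len(offspring) < lam:
        if tries >= max_tries:
            raise RuntimeError("could not generate enough feasible offspring")
        tries += 1
        child = mutate(rng.choice(parents), rng)
        if is_feasible(child, inequality_constraints):
            offspring.append(child)  # survives as a candidate for selection
        # otherwise the child is a lethal mutation and is simply dropped
    return offspring
```

The loop only terminates quickly if the parents lie inside the feasible region and the step sizes are not too large, which is why a feasible start position matters.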


In case of a non-feasible start position $\vec{x}^{(0)}$, a feasible solution must be found first. This can be achieved by means of an auxiliary objective function

$$\tilde{f}(\vec{x}) = \sum_{j=1}^{m} g_j(\vec{x})\, \delta_j(\vec{x}) \tag{48}$$

with $\delta_j(\vec{x}) = -1$ if $g_j(\vec{x}) < 0$ and $\delta_j(\vec{x}) = 0$ otherwise.
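A small sketch of the auxiliary objective (48) follows. It assumes that a constraint is satisfied when $g_j(\vec{x}) \geq 0$ and that the weight is -1 for a violated constraint and 0 otherwise, so that the function sums the magnitudes of all violations and reaches zero exactly on the feasible region. The function name `auxiliary_objective` is hypothetical.

```python
def auxiliary_objective(x, inequality_constraints):
    """Sum of constraint violations; zero if and only if x is feasible."""
    total = 0.0
    for g in inequality_constraints:
        value = g(x)
        # Assumed weight: -1 for a violated constraint (g < 0), 0 otherwise.
        delta = -1.0 if value < 0.0 else 0.0
        total += value * delta
    return total
```

Minimizing this function with the evolution strategy itself drives a non-feasible start point towards the feasible region, after which the original objective can take over.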

This kind of handling bounds can be used with all optimum seeking methods, provided that

they are started within the feasible region. Some may have trouble with the sine-term due to the

periodicity introduced, however.

6 Parallel Evolution Strategies

Because all individuals of a population act simultaneously in nature, one can speak of an inherent parallelism in evolution. Although this was already known when the principles of evolutionary algorithms were designed, no one could at that time imagine the power of the parallel computers which are now available. Consequently, evolutionary algorithms have usually been implemented sequentially.

Nowadays we are used to parallel computers, and in recent years many suggestions for parallelizing evolutionary algorithms have been made. The goals of parallelism are simple:


Speed: Get the same results as a sequential algorithm in less time.

Robustness: Get more robust results regarding errors or noisy information.

Quality: Get better results in the same time as a sequential algorithm.

There are at least two different approaches to parallel evolutionary algorithms [57, 5], which are described here along with a mixed-model approach that tries to combine the best of both models. Before that, a very simple but effective way to use parallel hardware is presented, which does not fit the models presented afterwards.

6.1 The Master-Slave Approach

This approach is very effective if the calculation of the fitness function is time-intensive, e.g. when optimizing simulation models where the simulation software runs for a long time, as in [7].

In this case the evolutionary algorithm can be divided into a master-process, where the

individuals are generated and the genetic operators are applied, and a number of slave-processes,

where the fitness function is evaluated.

Now the different processes can run on different machines and the fitness calculation for a whole population can be done in parallel. A special kind of steady-state selection [228] with a $(\mu+1)$-selection scheme was presented in [7, 11, 97] which nearly avoids any idle times on the processors, because every time a fitness value is calculated a new individual is sent to the idle processor without waiting for any other results from the slaves.
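The asynchronous master-slave scheme can be sketched with Python's standard `concurrent.futures` module. This is an illustration under several assumptions, not the implementation of [7, 11, 97]: worker threads stand in for slave processes on other machines, mutation uses a fixed step size `sigma` instead of self-adaptation, and the names `steady_state_es` and `sphere` are hypothetical.

```python
import random
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait

def sphere(x):
    """Cheap stand-in for an expensive simulation-based fitness function."""
    return sum(v * v for v in x)

def steady_state_es(fitness, mu=5, workers=4, evaluations=200, n=3,
                    sigma=0.3, seed=1):
    rng = random.Random(seed)

    def new_individual(population):
        # Random start points until mu evaluated parents exist, then mutation.
        if len(population) < mu:
            return [rng.uniform(-5.0, 5.0) for _ in range(n)]
        parent = rng.choice(population)[1]
        return [v + rng.gauss(0.0, sigma) for v in parent]

    population = []  # list of (fitness value, individual), best first
    pending = {}     # future -> individual currently being evaluated
    submitted = 0
    with ThreadPoolExecutor(max_workers=workers) as pool:
        while submitted < min(workers, evaluations):  # keep every slave busy
            x = new_individual(population)
            pending[pool.submit(fitness, x)] = x
            submitted += 1
        while pending:
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                x = pending.pop(future)
                population.append((future.result(), x))
                population.sort(key=lambda entry: entry[0])
                del population[mu:]  # (mu+1)-selection: discard the worst
                if submitted < evaluations:
                    # Hand new work to the idle slave at once, without
                    # waiting for the other slaves to finish.
                    y = new_individual(population)
                    pending[pool.submit(fitness, y)] = y
                    submitted += 1
    return population
```

For a genuinely CPU-bound fitness function one would replace the thread pool by separate processes or remote machines; the control flow of the master stays the same.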

6.2 Coarse Grained Parallelism: The Migration Model

In the migration model a population is divided into a number of subpopulations, so-called demes

[5]. These subpopulations are still panmictic but exchange genetic information by the migration

of individuals. Two concepts are known [57, 215, 5]:

1. In the Island Model there is a random exchange of information between the subpopulations, and

2. in the Stepping Stone Model this exchange is limited to migration paths which connect the subpopulations that are placed in a topology (e.g. a ring or a torus).

These algorithms can be scaled to a balanced usage of processing and communication resources

by tuning the local population size and the migration frequencies.

Different ways of choosing the individuals that leave the local population are known. Choosing one at random seems to be a good compromise between the danger of premature stagnation when choosing the best individual and the small chance of survival in the new subpopulation when choosing the worst one to leave.
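One migration step in a stepping stone model with a ring topology might look like the sketch below. Each deme sends a copy of a randomly chosen individual to its successor in the ring. How the immigrant is inserted is an assumption made only for this illustration: it replaces the worst resident of the target deme. The function name `migrate_ring` is hypothetical, and minimization is assumed.

```python
import random

def migrate_ring(demes, fitness, rng=random):
    """Perform one synchronous migration step along a ring of demes (in place)."""
    # Choose all emigrants first, so that every deme migrates from its
    # state before any immigrant has arrived.
    emigrants = [rng.choice(deme) for deme in demes]
    for i, migrant in enumerate(emigrants):
        target = demes[(i + 1) % len(demes)]  # successor in the ring
        # Assumption: the immigrant replaces the worst resident (minimization).
        worst = max(range(len(target)), key=lambda j: fitness(target[j]))
        target[worst] = list(migrant)  # a copy; the original stays at home
```

Calling this function every few generations and varying that interval is one way to tune the balance between processing and communication mentioned above.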

Another problem is the way to insert immigrants into the new population. A solution which

c