Not all parents are equal for MO-CMA-ES


Description

The steady-state variants of the Multi-Objective Covariance Matrix Adaptation Evolution Strategy (SS-MO-CMA-ES) generate one offspring from a uniformly selected parent. Several other parent selection operators for SS-MO-CMA-ES are investigated in this paper. These operators involve the definition of multi-objective rewards, estimating the expectation of the offspring's survival and its hypervolume contribution. Two selection modes, respectively based on tournaments and inspired by the Multi-Armed Bandit framework, are used on top of these rewards. Extensive experimental validation comparatively demonstrates the merits of these new selection operators on unimodal MO problems.

Transcript of Not all parents are equal for MO-CMA-ES

Page 1: Not all parents are equal for MO-CMA-ES


Not all parents are equal for MO-CMA-ES

Ilya Loshchilov¹,², Marc Schoenauer¹,², Michèle Sebag²,¹

¹ TAO Project-team, INRIA Saclay - Île-de-France

² Laboratoire de Recherche en Informatique (UMR CNRS 8623), Université Paris-Sud, 91128 Orsay Cedex, France

Kindly presented by Jan Salmen, Universität Bochum, with apologies from the authors

Evolutionary Multi-Criterion Optimization - EMO 2011


Page 2: Not all parents are equal for MO-CMA-ES


Content

1. Motivation
   - Why are some parents better than others?
   - Why the Multi-Objective Covariance Matrix Adaptation Evolution Strategy?

2. Parent Selection Operators
   - Tournament Selection (quality-based)
   - Multi-armed Bandits
   - Reward-based Parent Selection

3. Experimental validation
   - Results
   - Summary


Page 3: Not all parents are equal for MO-CMA-ES


Why are some parents better than others?

[Figure: two objective-space sketches (Objective 1 vs. Objective 2) illustrating the two observations below]

Some segments of the Pareto front are hard to find.

Some parents improve the fitness faster than others.


Page 4: Not all parents are equal for MO-CMA-ES


Why MO-CMA-ES?

µ MO-(1+1)-CMA-ES ≡ µ × (1+1)-CMA-ES + global Pareto selection

CMA-ES excellent on single-objective problems (e.g., BBOB)

In µ MO-(1+1)-CMA-ES, each individual is a (1+1)-CMA-ES¹

[Figure: two objective-space sketches (Objective 1 vs. Objective 2) showing two typical adaptations for MO-CMA-ES]

¹ C. Igel, N. Hansen, and S. Roth (2005). "The Multi-objective Variable Metric Evolution Strategy"


Page 5: Not all parents are equal for MO-CMA-ES


Evolution Loop for Steady-state MO-CMA-ES

Parent Selection → Variation → Evaluation → Environmental Selection → Update / Adaptation → (back to Parent Selection)
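As a rough illustration, this loop can be sketched in Python; a minimal skeleton, not the authors' implementation, where all five callables are hypothetical placeholders supplied by the user:

```python
def steady_state_loop(population, budget, select_parent, mutate,
                      evaluate, environmental_selection, update_strategy):
    """Minimal sketch of the steady-state loop: one offspring per
    iteration. The five callables correspond to the five boxes above
    and are hypothetical placeholders supplied by the caller."""
    evaluations = 0
    while evaluations < budget:
        parent = select_parent(population)             # parent selection
        offspring = mutate(parent)                     # variation
        fitness = evaluate(offspring)                  # evaluation
        evaluations += 1
        population = environmental_selection(population, offspring, fitness)
        update_strategy(parent, offspring, fitness)    # update / adaptation
    return population
```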


Page 6: Not all parents are equal for MO-CMA-ES


Tournament Selection (quality-based)

A total preorder relation ≺_X is defined on any finite subset X:

x ≺_X y ⇔ PRank(x, X) < PRank(y, X)   (lower Pareto rank)

or PRank(x, X) = PRank(y, X) and ∆H(x, X) > ∆H(y, X)   (same Pareto rank, higher hypervolume contribution)

Tournament (µ +_t 1) selection for MOO

Input: tournament size t ∈ ℕ; population X of µ individuals

Procedure: uniformly select t individuals from X

Output: the best of the t individuals according to ≺_X

With t = 1: standard steady-state MO-CMA-ES with random selection ᵃ

ᵃ C. Igel, T. Suttorp, and N. Hansen (EMO 2007). "Steady-state Selection and Efficient Covariance Matrix Update in the Multi-objective CMA-ES"
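A minimal Python sketch of this operator, assuming the Pareto ranks and hypervolume contributions ∆H have already been computed for the current population (the dictionaries pareto_rank and dh are hypothetical inputs):

```python
import random

def precedes(x, y, pareto_rank, dh):
    """Total preorder x ≺_X y: lower Pareto rank first, ties broken
    by a higher hypervolume contribution ∆H."""
    if pareto_rank[x] != pareto_rank[y]:
        return pareto_rank[x] < pareto_rank[y]
    return dh[x] > dh[y]

def tournament_select(population, t, pareto_rank, dh):
    """(µ +_t 1) parent selection: draw t individuals uniformly at
    random and return the best one according to ≺_X. With t = 1 this
    reduces to the uniform selection of the standard steady-state
    MO-CMA-ES (Igel et al., EMO 2007)."""
    candidates = random.sample(population, t)
    best = candidates[0]
    for x in candidates[1:]:
        if precedes(x, best, pareto_rank, dh):
            best = x
    return best
```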


Page 7: Not all parents are equal for MO-CMA-ES


Multi-armed Bandits

Original Multi-armed Bandits (MAB) problem

A gambler plays arm (machine) j at time t

and wins reward: r_{j,t} = 1 with probability p, and 0 with probability (1 − p)

Goal: maximize the sum of rewards

Upper Confidence Bound (UCB) [Auer, 2002]

Initialization: play each arm once.
Loop: play the arm j that maximizes

r̄_{j,t} + C · √( 2 ln(Σ_k n_{k,t}) / n_{j,t} ),

where r̄_{j,t} is the average reward of arm j and n_{j,t} is the number of plays of arm j.
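For concreteness, a small Python sketch of the UCB arm choice (not part of the slides; it assumes every arm has been played at least once, as the initialization step guarantees):

```python
import math

def ucb_choose(avg_reward, n_plays, C=1.0):
    """Return the index j maximizing r̄_j + C * sqrt(2 ln(Σ_k n_k) / n_j).
    avg_reward[j] and n_plays[j] hold r̄_{j,t} and n_{j,t}; every
    n_plays[j] must be >= 1 (each arm was played once at the start)."""
    total_plays = sum(n_plays)
    scores = [r + C * math.sqrt(2.0 * math.log(total_plays) / n)
              for r, n in zip(avg_reward, n_plays)]
    return max(range(len(scores)), key=scores.__getitem__)
```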


Page 8: Not all parents are equal for MO-CMA-ES


Reward-based Parent Selection

MAB for MOO

r̄_{j,t} is the average reward over a time window of size w

n_{i,t} = 0 for a new offspring, or for an individual last selected w steps ago

Select parent i as follows:

- if some individual i has n_{i,t} = 0, select it;
- otherwise, select i = argmax_j { r̄_{j,t} + C · √( 2 ln(Σ_k n_{k,t}) / n_{j,t} ) }.

At the moment, C = 0 (exploration iff n_{i,t} = 0).
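A minimal Python sketch of this rule with C = 0, assuming the windowed average rewards and play counters are maintained elsewhere (both lists are hypothetical inputs):

```python
def mab_select_parent(avg_reward, n_plays):
    """With C = 0, the UCB exploration term vanishes: unplayed
    individuals (n_{i,t} = 0) are selected first; otherwise the
    individual with the best windowed average reward is chosen."""
    fresh = [i for i, n in enumerate(n_plays) if n == 0]
    if fresh:
        return fresh[0]
    return max(range(len(avg_reward)), key=avg_reward.__getitem__)
```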

[Figure: objective-space illustration of one iteration: 1. ParentSelection(); 2. mutation; 3. success test; 4. update; 5. ComputeRewards()]


Page 9: Not all parents are equal for MO-CMA-ES


Defining Rewards I: (µ + 1_succ), (µ + 1_rank)

Parent a from the population Q^(g) generates offspring a′. Both the offspring and the parent receive reward r:

(µ + 1_succ)

If a′ becomes a member of the new population Q^(g+1):

r = 1 if a′ ∈ Q^(g+1), and 0 otherwise

(µ + 1_rank)

A smoother reward is defined by the rank of a′ in Q^(g+1):

r = 1 − rank(a′)/µ if a′ ∈ Q^(g+1), and 0 otherwise
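A Python sketch of these two rewards; it assumes new_population is the list Q^(g+1) sorted best-first by the ≺ preorder, so that a survivor's list index serves as rank(a′) (an assumption, since the slide does not spell the ordering out):

```python
def reward_succ(offspring, new_population):
    """(µ + 1_succ): binary reward, 1 iff the offspring survives
    environmental selection into Q^(g+1)."""
    return 1.0 if offspring in new_population else 0.0

def reward_rank(offspring, new_population, mu):
    """(µ + 1_rank): smoother reward 1 - rank(a')/µ for a survivor,
    with rank(a') taken as the index of a' in Q^(g+1) sorted
    best-first (assumption)."""
    if offspring not in new_population:
        return 0.0
    return 1.0 - new_population.index(offspring) / mu
```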


Page 10: Not all parents are equal for MO-CMA-ES


Defining Rewards II: (µ + 1_∆H1), (µ + 1_∆Hi)

(µ + 1_∆H1)

Set the reward to the increase of the total hypervolume contribution from generation g to g + 1:

r = 0 if the offspring is dominated,
r = Σ_{a ∈ Q^(g+1)} ∆H(a, Q^(g+1)) − Σ_{a ∈ Q^(g)} ∆H(a, Q^(g)) otherwise

(µ + 1_∆Hi)

A relaxation of the above reward, involving a rank-based penalization:

r = (1 / 2^(k−1)) · [ Σ_{a ∈ ndom_k(Q^(g+1))} ∆H(a, ndom_k(Q^(g+1))) − Σ_{a ∈ ndom_k(Q^(g))} ∆H(a, ndom_k(Q^(g))) ]

where k denotes the Pareto rank of the current offspring, and ndom_k(Q^(g)) is the k-th non-dominated front of Q^(g).
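To make the ∆H terms concrete, here is a Python sketch for the bi-objective (minimization) case; hv_contributions_2d and reward_dh1 are hypothetical helpers illustrating (µ + 1_∆H1) only, not the authors' code:

```python
def hv_contributions_2d(front, ref):
    """Hypervolume contributions ∆H(a, front) of each point of a
    mutually non-dominated 2-D minimization front (list of (f1, f2)
    tuples), w.r.t. the reference point ref."""
    pts = sorted(front)            # f1 ascending, hence f2 descending
    contrib = {}
    for i, p in enumerate(pts):
        right = pts[i + 1][0] if i + 1 < len(pts) else ref[0]
        upper = pts[i - 1][1] if i > 0 else ref[1]
        contrib[p] = (right - p[0]) * (upper - p[1])
    return contrib

def reward_dh1(old_front, new_front, offspring_dominated, ref):
    """(µ + 1_∆H1): increase of the total hypervolume contribution
    from generation g to g + 1; 0 if the offspring is dominated."""
    if offspring_dominated:
        return 0.0
    return (sum(hv_contributions_2d(new_front, ref).values())
            - sum(hv_contributions_2d(old_front, ref).values()))
```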


Page 11: Not all parents are equal for MO-CMA-ES


Experimental validation

Algorithms

The steady-state MO-CMA-ES with modified parent selection:
- 2 tournament-based: (µ +_2 1) and (µ +_10 1);
- 4 MAB-based: (µ + 1_succ), (µ + 1_rank), (µ + 1_∆H1) and (µ + 1_∆Hi).

The baseline MO-CMA-ES:
- steady-state (µ + 1)-MO-CMA and (µ_≺ + 1)-MO-CMA, and generational (µ + µ)-MO-CMA.

Default parameters (µ = 100), 200,000 evaluations, 31 runs.

Benchmark Problems:

- sZDT1:3-6 — with the true Pareto front shifted in decision space: x′_i ← |x_i − 0.5| for 2 ≤ i ≤ n (sketched below)

- IHR1:3-6 — rotated variants of the original ZDT problems

- LZ09-1:5 — with complicated Pareto fronts in decision space ²

² H. Li and Q. Zhang. "Multiobjective Optimization Problems With Complicated Pareto Sets, MOEA/D and NSGA-II." IEEE TEC 2009
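The sZDT decision-space shift is simple enough to sketch in Python (a hypothetical helper, assuming the 1-based indexing of the slide):

```python
def shift_decision_space(x):
    """sZDT shift: x'_i <- |x_i - 0.5| for 2 <= i <= n; the first
    coordinate (i = 1 in the slide's 1-based indexing) is unchanged."""
    return [x[0]] + [abs(xi - 0.5) for xi in x[1:]]
```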


Page 12: Not all parents are equal for MO-CMA-ES


Results: sZDT1, IHR1, LZ09-3 and LZ09-4

[Figure: four convergence plots of ∆H_target (log scale) vs. number of function evaluations (up to 10⁵) on sZDT1, IHR1, LZ09-3 and LZ09-4, comparing (µ + 1), (µ_≺ + 1), (µ +_10 1), (µ + 1_∆Hi), (µ + 1_succ) and (µ + 1_rank)]


Page 13: Not all parents are equal for MO-CMA-ES


Results: (µ + 1), (µ +_10 1), (µ + 1_rank) on LZ problems

[Figure: decision-space plots (x1, x2, x3) of the populations obtained by (µ + 1), (µ +_10 1) and (µ + 1_rank) on the LZ09-1 through LZ09-5 problems]


Page 14: Not all parents are equal for MO-CMA-ES


Loss of diversity

[Figure: objective-space plots for sZDT2 (left) and IHR3 (right), showing all populations of MO-CMA-ES, the true Pareto front, and the Pareto front reached by MO-CMA-ES]

Typical behavior of (µ + 1_succ)-MO-CMA on the sZDT2 (left) and IHR3 (right) problems after 5,000 fitness function evaluations.


Page 15: Not all parents are equal for MO-CMA-ES


Summary

Speed-up by a factor of 2-5 with the MAB-based schemes (µ + 1_rank) and (µ + 1_∆Hi).

Loss of diversity, especially on multi-modal problems, with the too-greedy schemes (µ + 1_succ) and (µ +_t 1).

Perspectives

Find "appropriate" C for exploration term r̄j,t + C√

2 ln∑

k nk,t

nj,t.

Allocate some evaluation budget to dominated arms (individuals generated in the past) to preserve diversity.

Integrate the reward mechanism with the update rules of (1+1)-CMA-ES, e.g. the success rate and the average reward in (µ + 1_rank).

Experiments on many-objective problems.


Page 16: Not all parents are equal for MO-CMA-ES


Thank you for your attention!

Please send your questions to {Ilya.Loshchilov, Marc.Schoenauer, Michele.Sebag}@inria.fr
