Not all parents are equal for MO-CMA-ES
Motivation / Parent Selection Operators / Experimental validation
Not all parents are equal for MO-CMA-ES
Ilya Loshchilov1,2, Marc Schoenauer1,2, Michèle Sebag2,1
1TAO Project-team, INRIA Saclay - Île-de-France
2 Laboratoire de Recherche en Informatique (UMR CNRS 8623), Université Paris-Sud, 91128 Orsay Cedex, France
Kindly presented by Jan Salmen, Universität Bochum, with apologies from the authors
Evolutionary Multi-Criterion Optimization - EMO 2011
Ilya Loshchilov, Marc Schoenauer, Michèle Sebag Reward-based selection for MOO 1/ 16
Content
1. Motivation
   - Why are some parents better than others?
   - Multi-Objective Covariance Matrix Adaptation Evolution Strategy?
2. Parent Selection Operators
   - Tournament Selection (quality-based)
   - Multi-armed Bandits
   - Reward-based Parent Selection
3. Experimental validation
   - Results
   - Summary
Why are some parents better than others?
[Figure: two sketches in objective space (Objective 1 vs. Objective 2)]
Some segments of the Pareto front are hard to find
Some parents improve the fitness faster than others
Why MO-CMA-ES ?
µMO-(1+1)-CMA-ES ≡ µ × (1+1)-CMA-ES + global Pareto selection
CMA-ES excellent on single-objective problems (e.g., BBOB)
In µMO-(1+1)-CMA-ES, each individual is a (1+1)-CMA-ES1
[Figure: two sketches in objective space (Objective 1 vs. Objective 2)]
Two typical adaptations for MO-CMA-ES
1C. Igel, N. Hansen, and S. Roth (2005). "The Multi-objective Variable Metric Evolution Strategy"
Evolution Loop for Steady-state MO-CMA-ES
1. Parent Selection
2. Variation
3. Evaluation
4. Environmental selection
5. Update / Adaptation
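The five steps above can be sketched as a toy Python loop. This is a scalar, single-objective caricature under stated assumptions: the real algorithm mutates via (1+1)-CMA-ES and uses Pareto-based environmental selection, not a sort by one fitness value; `steady_state_step` and its helpers are hypothetical names.

```python
import random

def steady_state_step(population, evaluate, mu=5):
    """Toy sketch of one steady-state iteration (hypothetical helpers;
    the real MO-CMA-ES mutates via (1+1)-CMA-ES and uses Pareto-based
    environmental selection rather than a scalar sort)."""
    parent = random.choice(population)            # 1. parent selection
    offspring = parent + random.gauss(0.0, 0.1)   # 2. variation (toy mutation)
    evaluate(offspring)                           # 3. evaluation
    population.append(offspring)                  # 4. environmental selection:
    population.sort(key=evaluate)                 #    keep the best mu
    del population[mu:]
    # 5. update / adaptation of step sizes would go here (omitted)
    return population

random.seed(1)
pop = [random.uniform(-1.0, 1.0) for _ in range(5)]
for _ in range(20):
    steady_state_step(pop, evaluate=lambda x: x * x, mu=5)
```

The paper's contribution replaces step 1, the uniform `random.choice`, with quality- or reward-based selection.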
Tournament Selection (quality-based)
A total preorder relation ≺X is defined on any finite subset X:
x ≺X y ⇔ PRank(x, X) < PRank(y, X)   // lower Pareto rank
or PRank(x, X) = PRank(y, X) and ∆H(x, X) > ∆H(y, X)   // same Pareto rank and higher hypervolume contribution
Tournament (µ +t 1) selection for MOO
Input: tournament size t ∈ ℕ; population X of µ individuals
Procedure: uniformly select t individuals from X
Output: the best individual among the t according to ≺X
With t = 1: standard steady-state MO-CMA-ES with random selection [a]
[a] C. Igel, T. Suttorp, and N. Hansen (EMO 2007). "Steady-state Selection and Efficient Covariance Matrix Update in the Multi-objective CMA-ES"
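A minimal sketch of this tournament, assuming the Pareto ranks and hypervolume contributions have already been computed for the population (both maps are toy inputs here, not part of the authors' code):

```python
import random

def tournament_select(prank, delta_h, t, rng=random):
    """Sketch of (mu +t 1) tournament selection. `prank` maps each
    individual's index to its Pareto rank, `delta_h` to its hypervolume
    contribution; both are assumed precomputed on the population X.
    Lower rank wins; ties are broken by larger Delta_H (the preorder <_X)."""
    candidates = rng.sample(range(len(prank)), t)  # uniform draw of t indices
    return min(candidates, key=lambda i: (prank[i], -delta_h[i]))

# toy population of 4 individuals; with t = 4 every individual competes,
# so the overall best under <_X must win
prank = [1, 0, 0, 2]
dh = [0.1, 0.3, 0.5, 0.2]
best = tournament_select(prank, dh, t=4)
```

With t = 1 the `min` is over a single uniformly drawn index, which recovers the random selection of the baseline steady-state MO-CMA-ES.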
Multi-armed Bandits
Original Multi-armed Bandit (MAB) problem
A gambler plays arm (machine) j at time t and wins reward:
r_j,t = 1 with prob. p, 0 with prob. (1 − p)
Goal: maximize the sum of rewards
Upper Confidence Bound (UCB) [Auer, 2002]
Initialization: play each arm once
Loop: play the arm j that maximizes:
r̄_j,t + C √( 2 ln Σ_k n_k,t / n_j,t )
where r̄_j,t is the average reward of arm j, and n_j,t is the number of plays of arm j
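A self-contained sketch of this UCB rule on a two-armed Bernoulli bandit (the arm probabilities and the helper name `ucb_play` are illustrative assumptions, not from the slide):

```python
import math
import random

def ucb_play(avg_reward, n_plays, C=1.0):
    """UCB arm choice as on the slide: argmax_j  r_bar_j + C*sqrt(2 ln N / n_j),
    with N the total number of plays. Unplayed arms are chosen first
    (the 'play each arm once' initialization)."""
    for j, n in enumerate(n_plays):
        if n == 0:
            return j
    total = sum(n_plays)
    scores = [avg_reward[j] + C * math.sqrt(2.0 * math.log(total) / n_plays[j])
              for j in range(len(n_plays))]
    return max(range(len(scores)), key=scores.__getitem__)

# toy run: arm 1 pays 1 with prob. 0.8, arm 0 with prob. 0.2
random.seed(0)
p = [0.2, 0.8]
avg, n = [0.0, 0.0], [0, 0]
for _ in range(200):
    j = ucb_play(avg, n)
    r = 1.0 if random.random() < p[j] else 0.0
    avg[j] = (avg[j] * n[j] + r) / (n[j] + 1)   # incremental mean update
    n[j] += 1
```

Over the run, the exploration bonus shrinks for the frequently played arm, so the gambler concentrates plays on the better arm while still occasionally revisiting the worse one.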
Reward-based Parent Selection
MAB for MOO
r̄_j,t is the average reward over a time window of size w
n_i,t = 0 for a new offspring, or for an individual last selected w steps ago
Select parent i:
- any i with n_i,t = 0, if one exists;
- otherwise i = Argmax_j { r̄_j,t + C √( 2 ln Σ_k n_k,t / n_j,t ) }
At the moment, C = 0 (exploration iff n_i,t = 0)
[Figure: one steady-state iteration in objective space: 1. ParentSelection(), 2. mutation of the selected parent, 3. success test, 4. update, 5. ComputeRewards()]
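With C = 0, the rule above reduces to: pick an individual that has not been selected within the window if one exists, otherwise the one with the best average windowed reward. A minimal sketch, with list indices standing in for individuals (an assumption of this sketch):

```python
def select_parent(avg_reward, n_plays):
    """Sketch of the selection rule above with C = 0: any individual not
    yet selected in the current time window (n_i,t = 0) goes first;
    otherwise the parent with the highest average windowed reward wins."""
    fresh = [i for i, n in enumerate(n_plays) if n == 0]
    if fresh:
        return fresh[0]   # exploration happens only via unplayed arms
    return max(range(len(avg_reward)), key=avg_reward.__getitem__)

parent = select_parent([0.5, 0.9, 0.1], [3, 2, 0])   # index 2 is unplayed
```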
Defining Rewards I: (µ+ 1succ), (µ+ 1rank)
Parent a from the population Q(g) generates offspring a′. Both the offspring and the parent receive reward r:
(µ+ 1succ)
If a′ becomes a member of the new population Q(g+1):
r = 1 if a′ ∈ Q(g+1), and 0 otherwise
(µ+ 1rank)
A smoother reward is defined by the rank of a′ in Q(g+1):
r = 1 − rank(a′)/µ if a′ ∈ Q(g+1), and 0 otherwise
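Both rewards can be sketched directly; here rank is taken as the 1-based position of a′ in Q(g+1) ordered best-first, which is an assumption of this sketch, not a detail fixed by the slide:

```python
def reward_succ(offspring, new_population):
    """(mu + 1_succ): reward 1 iff the offspring made it into Q^(g+1)."""
    return 1.0 if offspring in new_population else 0.0

def reward_rank(offspring, new_population):
    """(mu + 1_rank): smoother reward 1 - rank(a')/mu, with rank taken as
    the 1-based position of a' in Q^(g+1) sorted best-first (an assumption
    of this sketch)."""
    if offspring not in new_population:
        return 0.0
    mu = len(new_population)
    return 1.0 - (new_population.index(offspring) + 1) / mu

q_next = ["a", "b", "c", "d"]   # toy population of mu = 4, best first
r = reward_rank("a", q_next)    # best survivor: r = 1 - 1/4
```

Note how the rank-based reward grades survivors from 1 − 1/µ (best) down to 0 (worst), whereas the success-based reward treats all survivors alike.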
Defining Rewards II: (µ+ 1∆H1), (µ+ 1∆Hi)
(µ+ 1∆H1)
Set the reward to the increase of the total hypervolume contribution from generation g to g + 1:
r = 0 if the offspring is dominated;
r = Σ_{a ∈ Q(g+1)} ∆H(a, Q(g+1)) − Σ_{a ∈ Q(g)} ∆H(a, Q(g)) otherwise
(µ+ 1∆Hi)
A relaxation of the above reward, involving a rank-based penalization:
r = (1 / 2^(k−1)) [ Σ_{a ∈ ndomk(Q(g+1))} ∆H(a, ndomk(Q(g+1))) − Σ_{a ∈ ndomk(Q(g))} ∆H(a, ndomk(Q(g))) ]
where k denotes the Pareto rank of the current offspring, and ndomk(Q(g)) is the k-th non-dominated front of Q(g).
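For two objectives the contributions ∆H(a, X) have a closed form; a sketch of the (µ+ 1∆H1) reward under the assumptions that the fronts are mutually non-dominated minimization points and the reference point dominates none of them (function names are illustrative):

```python
def hv_contributions(front, ref):
    """Exclusive hypervolume contributions Delta_H(a, X) for a bi-objective
    minimization front: a list of mutually non-dominated (f1, f2) points,
    with reference point ref (an assumption of this sketch)."""
    pts = sorted(front)                 # ascending f1 implies descending f2
    contrib = {}
    for i, (x, y) in enumerate(pts):
        right = pts[i + 1][0] if i + 1 < len(pts) else ref[0]
        upper = pts[i - 1][1] if i > 0 else ref[1]
        contrib[(x, y)] = (right - x) * (upper - y)
    return contrib

def reward_dH1(old_front, new_front, ref):
    """(mu + 1_dH1) sketch for a non-dominated offspring: change of the
    summed contributions between generations (the r = 0 case for a
    dominated offspring is left to the caller)."""
    return (sum(hv_contributions(new_front, ref).values())
            - sum(hv_contributions(old_front, ref).values()))

front = [(1.0, 3.0), (2.0, 2.0), (3.0, 1.0)]
c = hv_contributions(front, ref=(4.0, 4.0))
```

Note the summed exclusive contributions can decrease when a point is inserted between neighbors, so this reward can be negative even for a successful offspring.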
Experimental validation
Algorithms
The steady-state MO-CMA-ES with modified parent selection:
- 2 tournament-based: (µ +2 1) and (µ +10 1);
- 4 MAB-based: (µ+ 1succ), (µ+ 1rank), (µ+ 1∆H1) and (µ+ 1∆Hi).
The baseline MO-CMA-ES:
- steady-state (µ+ 1)-MO-CMA and (µ≺ + 1)-MO-CMA;
- generational (µ+ µ)-MO-CMA.
Default parameters (µ = 100), 200,000 evaluations, 31 runs.
Benchmark Problems:
sZDT1:3-6: ZDT problems with the true Pareto front shifted in decision space: x′_i ← |x_i − 0.5| for 2 ≤ i ≤ n
IHR1:3-6: rotated variants of the original ZDT problems
LZ09-1:5: problems with a complicated Pareto set in decision space [2]
[2] H. Li and Q. Zhang. "Multiobjective Optimization Problems With Complicated Pareto Sets, MOEA/D and NSGA-II." IEEE TEC 2009
Results: sZDT1, IHR1, LZ09-3 and LZ09-4
[Figure: convergence plots on sZDT1, IHR1, LZ09-3 and LZ09-4: ∆H to target (log scale) vs. number of function evaluations (up to 10^5), for (µ+ 1), (µ≺ + 1), (µ +10 1), (µ+ 1∆Hi), (µ+ 1succ) and (µ+ 1rank)]
Results: (µ+ 1), (µ +10 1), (µ+ 1rank) on LZ problems
[Figure: decision-space plots (x1, x2, x3) of the final populations on LZ09-1 through LZ09-5, one column per algorithm: (µ+ 1), (µ +10 1) and (µ+ 1rank)]
Loss of diversity
[Figure: objective-space plots (Objective 1 vs. Objective 2) for sZDT2 (left) and IHR3 (right); legend: all populations of MO-CMA-ES, true Pareto front, Pareto front of MO-CMA-ES]
Typical behavior of (µ+ 1succ)-MO-CMA on the sZDT2 (left) and IHR3 (right) problems after 5,000 fitness function evaluations.
Summary
Speed-up (factor 2-5) with the MAB schemes (µ+ 1rank) and (µ+ 1∆Hi).
Loss of diversity, especially on multi-modal problems, with the too-greedy schemes (µ+ 1succ) and (µ +t 1).
Perspectives
Find an "appropriate" C for the exploration term r̄_j,t + C √( 2 ln Σ_k n_k,t / n_j,t ).
Allocate some budget of evaluations to dominated arms (individuals) generated in the past, to preserve diversity.
Integrate the reward mechanism and the update rules of (1+1)-CMA-ES, e.g. success rate and average reward in (µ+ 1rank).
Experiments on many-objective problems.
Thank you for your attention!
Please send your questions to {Ilya.Loshchilov,Marc.Schoenauer,Michele.Sebag}@inria.fr