Bias and Variance in Continuous EDA: massively parallel continuous optimization
Bias and variance in continuous EDA
F. Teytaud, O. Teytaud
EA, Strasbourg 2009
Tao, Inria Saclay Ile-De-France, LRI (Université Paris Sud, France), UMR CNRS 8623, I&A team, Digiteo, Pascal Network of Excellence
Outline
Introduction
Main step-size adaptation rules
State of the art
Experimental results
Conclusions
Evolutionary algorithms are parallel
Straightforward parallelization: if the population size is λ, the speed-up is linear up to λ processors.
But are there algorithms which really benefit from large λ?
Goal of this paper
Restrict our attention to continuous domains;
Restrict our attention to unconstrained problems;
Restrict our attention to convergence rate (monomodal problems);
Restrict our attention to no covariance (but all algorithms can be generalized to cov.)
Analyze the speed-up as a function of λ, assuming at least λ processors.
Outline
Introduction
Main step-size adaptation rules
State of the art
Experimental results
Conclusions
The main rules for step-size adaptation
while I have time {
    generate λ points x_1, ..., x_λ distributed as N(x, σ)
    evaluate the fitness at x_1, ..., x_λ
    update x, update σ
}
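As a hedged Python sketch of that loop (names like es_loop and update_sigma are mine, not from the talk; the σ update is the pluggable part the next slides compare):

```python
import numpy as np

def es_loop(f, x, sigma, lam, mu, update_sigma, budget=100):
    """Generic loop from the slide: sample, evaluate, update x and sigma."""
    for _ in range(budget):
        # Generate lambda points distributed as N(x, sigma^2 * I).
        pop = x + sigma * np.random.randn(lam, x.size)
        # Evaluate the fitness (all lambda evaluations can run in parallel).
        fitness = np.array([f(p) for p in pop])
        # Keep the mu best points.
        best = pop[np.argsort(fitness)[:mu]]
        x_new = best.mean(axis=0)                    # update x (mean of the mu best)
        sigma = update_sigma(sigma, x, x_new, best)  # update sigma (CSA / SA / EMNA)
        x = x_new
    return x, sigma
```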
Main trouble: choosing σ.
- Cumulative step-size adaptation (CSA)
- Mutative self-adaptation (SA)
- Estimation of Multivariate Normal Algorithm (EMNA)
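Heavily simplified caricatures of the three rules in Python (my sketch, not the talk's exact update equations; real CSA uses a tuned evolution path and real SA keeps one σ per individual):

```python
import numpy as np

def emna_sigma(sigma, x_old, x_new, best):
    # EMNA: re-estimate sigma as the empirical spread of the selected points.
    return np.sqrt(((best - x_new) ** 2).mean())

def sa_offspring_sigmas(sigma, lam, dim):
    # Mutative self-adaptation: each offspring first mutates its own sigma
    # by a log-normal factor; the sigmas of selected offspring survive.
    tau = 1.0 / np.sqrt(2.0 * dim)
    return sigma * np.exp(tau * np.random.randn(lam))

def csa_sigma(sigma, path, x_old, x_new, c=0.2, dim=10):
    # CSA (caricature): accumulate the recent mean displacements; grow sigma
    # when the path is longer than under random selection, shrink otherwise.
    path = (1 - c) * path + np.sqrt(c * (2 - c)) * (x_new - x_old) / sigma
    sigma *= np.exp(c * (np.linalg.norm(path) / np.sqrt(dim) - 1.0))
    return sigma, path
```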
Main algorithms
We now have a simple, proven trick against premature convergence (see our GECCO paper).
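The slide does not spell the trick out; purely as an illustration of the failure mode (not necessarily the GECCO paper's exact fix): plain EMNA measures the spread of the selected points around their own new mean, which is biased low and can make σ collapse. One standard guard is to measure the spread around the previous mean instead:

```python
import numpy as np

def emna_sigma_guarded(x_old, best):
    # Spread of the selected points around the PREVIOUS mean:
    # the spread around the new mean systematically underestimates the
    # variance and drives plain EMNA into premature convergence.
    return np.sqrt(((best - x_old) ** 2).mean())
```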
Outline
Introduction
Main step-size adaptation rules
State of the art
Experimental results
Conclusions
Results from Beyer and Sendhoff
Cumulative step-size adaptation <== not very good for large λ
Mutative self-adaptation <== much better (+ covariance possible)
Estimation of Multivariate Normal Algorithm <== ??? (the question studied here)
Outline
Introduction
Main step-size adaptation rules
State of the art
Experimental results
Conclusions
First, we confirm results from Beyer and Sendhoff
(sphere function; see Beyer and Sendhoff for more)
EMNA or SA on the sphere [figure annotations]:
- two, three, 2.5, and 2.5 times as fast as CSA;
- 80%, 30%, 13%, and 13% faster than SA.
Anisotropic version: one step-size per axis
On the Cigar and Schwefel functions, we recover results similar to those on the sphere.
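A possible shape for the per-axis variant, assuming an EMNA-style estimate (my sketch, not the talk's code):

```python
import numpy as np

def emna_sigma_per_axis(x_new, best):
    # Anisotropic EMNA: one step size per coordinate, the empirical
    # standard deviation of the selected points along each axis.
    return np.sqrt(((best - x_new) ** 2).mean(axis=0))  # shape: (dim,)
```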
Outline
Introduction
Main step-size adaptation rules
State of the art
Experimental results
Conclusions
Conclusions, 1/2: EMNA is great for parallelism
Continuous spaces, unconstrained optimization, from the point of view of convergence rate.
Simple algorithm, similar to EMNA (trick against premature convergence); simpler than SA or CSA.
Parallel performance: EMNA > SA >> CSA.
Straightforward covariance-based version
Parameter-free (but: should μ be smaller when λ is large?)
Straightforward fault-tolerance (important for grids/clouds!)
Conclusions, 2/2: fundamental issues
As known since Beyer, 2001,
(1, λ) is far less parallel than (μ/μ, λ).
The higher the dimension, the better the speed-up (consistent with Fournier et al., PPSN08).
The number of usefully exploited processors can grow linearly with the dimension.
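For intuition, the standard normalized progress rates on the sphere from Beyer's theory (dimension N → ∞; quoted as background, not from the slides):

```latex
% (1,lambda): the best of lambda samples; progress grows only like ln(lambda).
\varphi^{*}_{(1,\lambda)}
  = \max_{\sigma^{*}} \left( c_{1,\lambda}\,\sigma^{*} - \frac{(\sigma^{*})^{2}}{2} \right)
  = \frac{c_{1,\lambda}^{2}}{2},
\qquad c_{1,\lambda} \sim \sqrt{2 \ln \lambda}.

% (mu/mu,lambda): averaging the mu best; progress grows linearly in mu,
% as long as mu stays of the order of the dimension N.
\varphi^{*}_{(\mu/\mu,\lambda)}
  = \max_{\sigma^{*}} \left( c_{\mu/\mu,\lambda}\,\sigma^{*} - \frac{(\sigma^{*})^{2}}{2\mu} \right)
  = \frac{\mu\, c_{\mu/\mu,\lambda}^{2}}{2}.
```

This matches the slide: recombination gives nearly linear speed-up in μ, with the number of usefully busy processors growing linearly with the dimension.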