Noisy Optimization Convergence Rates (GECCO 2013)


Log-log Convergence for Noisy Optimization
S. Astete-Morales, J. Liu, O. Teytaud

[email protected], INRIA-CNRS-LRI, Univ. Paris-Sud, 91190 Gif-sur-Yvette, France

Abstract

We consider: noisy optimization problems, without the assumption that the variance vanishes in the neighborhood of the optimum.
We show mathematically: an exponential number of resamplings, and a number of resamplings polynomial in the inverse step-size, both lead to a log-log convergence rate.
We show empirically: the log-log convergence rate is also obtained with a polynomial resampling setting.

Compared to the state of the art, our results provide:

i) Proofs of log-log convergence for evolution strategies (which were not covered by existing results) in the case of objective functions with quadratic expectations and noise with constant variance.

ii) Log-log rates also for objective functions with expectation E f(x) = ||x − x*||^p.

iii) Experiments with different parametrizations.

Notation
d: dimension
n: iteration index
x_n: parent at iteration n
r_n: number of evaluations at iteration n for each individual
e_n: number of evaluations at iteration n
p: power in the fitness function
σ_n: step-size
N: Gaussian random variable

Algorithm: (µ, σ)-ES
Input: initial individual and initial step-size
Initialize parameters; n ← 1
while (true) do
    Generate λ individuals independently, each i_j = x_n + σ_n N
    Evaluate each of them r_n times and average the fitness values
    Select the µ best individuals
    Update the parent x_n and the step-size σ_n
    n ← n + 1
end while
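As an illustration of this loop, here is a minimal Python sketch (not the authors' implementation; the function names, the mutative step-size adaptation and the default parameter values are assumptions made for the example). Each offspring is evaluated r_n times and selection uses the averaged fitness; noisy_fitness uses E f(x) = ||x||^p with constant-variance Gaussian noise, as in the experiments below.

import numpy as np

def noisy_fitness(x, p=2, rng=None):
    # E f(x) = ||x||^p, plus additive Gaussian noise of constant variance
    rng = rng if rng is not None else np.random.default_rng()
    return np.linalg.norm(x) ** p + rng.normal()

def resampled_es(x0, sigma0, lam=10, mu=3, iterations=100,
                 resamplings=lambda n, sigma: 1, seed=0):
    # (mu, lambda)-ES in which each offspring is evaluated r_n times per
    # iteration and the averaged fitness values are used for selection.
    rng = np.random.default_rng(seed)
    x, sigma = np.array(x0, dtype=float), float(sigma0)
    tau = 1.0 / np.sqrt(2 * len(x))        # illustrative self-adaptation rate
    evaluations, history = 0, []
    for n in range(1, iterations + 1):
        r_n = max(1, int(resamplings(n, sigma)))
        sigmas = sigma * np.exp(tau * rng.standard_normal(lam))    # mutated step-sizes (illustrative rule)
        offspring = [x + s * rng.standard_normal(x.shape) for s in sigmas]
        avg_fit = [np.mean([noisy_fitness(c, rng=rng) for _ in range(r_n)])
                   for c in offspring]
        evaluations += lam * r_n
        best = np.argsort(avg_fit)[:mu]                            # select the mu best
        x = np.mean([offspring[i] for i in best], axis=0)          # recombine into the new parent
        sigma = float(np.mean(sigmas[best]))                       # update the step-size
        history.append((evaluations, np.linalg.norm(x)))
    return x, history

The resamplings argument encodes the rule r_n = resamplings(n, σ_n); the schedules analysed below plug in there.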

Theoretical Analysis

Focus: simple re-evaluation rules, i.e., choosing the number of resamplings.

• Preliminary: noise-free case [Auger, A.]. Some ES verify:

  log(||x_n||) / n < C < 0

• Non-adaptive, with scale invariance. We prove: if r_n = ⌈K ζ^n⌉ then

  log(||x_n||) / n < C′ < 0

• Adaptive, without scale invariance. We prove: if r_n = ⌈Y σ_n^(−η)⌉ then

  log(||x_n||) / n < C″ < 0

Different settings and ad hoc resampling ⇒ same property!
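With this convention, the two rules above can be written as resampling schedules for the sketch of the algorithm; a minimal sketch, where K, ζ, Y and η are the constants appearing in the statements:

import math

def exponential_resamplings(K, zeta):
    # Non-adaptive rule: r_n = ceil(K * zeta^n), exponential in the iteration index
    return lambda n, sigma: math.ceil(K * zeta ** n)

def stepsize_resamplings(Y, eta):
    # Adaptive rule: r_n = ceil(Y * sigma_n^(-eta)), polynomial in the inverse step-size
    return lambda n, sigma: math.ceil(Y * sigma ** (-eta))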

Experimental Results

• Polynomial number of resamplings:

Setting: no scale invariance; fitness(x) = ||x||^p + N with p = 2. We experiment with r_n = ⌈L n^ζ⌉.

Results:
ζ = 0 ⇒ poor results.
ζ = 1, 2 or 3 ⇒ slopes close to −1/(2p).
Better for ζ = 2 or ζ = 3 than for ζ = 1.

Conjecture: the asymptotic regime is −1/(2p), reached later for large ζ.
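One possible way to estimate such slopes from a run, assuming the resampled_es sketch above (the schedule r_n = ⌈L n^ζ⌉ follows the experimental setting; dimension, constants and seed are illustrative):

import math
import numpy as np

def polynomial_resamplings(L, zeta):
    # Experimental rule: r_n = ceil(L * n^zeta), polynomial in the iteration index
    return lambda n, sigma: math.ceil(L * n ** zeta)

# One run in dimension 2 with zeta = 2, then a least-squares fit of
# log(||x_n||) against log(e_n); the slope is compared with -1/(2p).
_, history = resampled_es(x0=np.ones(2), sigma0=1.0, lam=10, mu=3, iterations=200,
                          resamplings=polynomial_resamplings(2, 2), seed=1)
log_evals = np.log([e for e, _ in history])
log_dist = np.log([dist for _, dist in history])
slope = np.polyfit(log_evals, log_dist, 1)[0]
print(f"estimated slope: {slope:.3f}   (conjectured asymptotic slope: {-1 / (2 * 2):.3f})")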

[Figure: log(||y||) vs. log(Evaluations), two panels with K = 2, ζ = 2, p = 2; left panel d = 2, slope = −0.3267; right panel d = 4, slope = −0.2829.]

Conclusion

• Log-log convergence:

Proof of convergence, without scale invariance, for real algorithms in the noisy case:

  log(||x_n||) / log(e_n) → C

• Room for improvement:
  - constants
  - less noise ⇒ better rates
  - algorithms with surrogate models

References
[1] Mohamed Jebalia, Anne Auger, Nikolaus Hansen: Log-linear convergence and divergence of the scale-invariant (1+1)-ES in noisy environments. Springer (2010).
[2] Rémi Coulom, Philippe Rolet, Nataliya Sokolovska, Olivier Teytaud: Handling Expensive Optimization with Large Noise. Foundations of Genetic Algorithms (FOGA 2011).
[3] Olivier Teytaud, Jérémie Decock: Noisy Optimization Complexity. Foundations of Genetic Algorithms XII (FOGA 2013).