
Building Multivariate Density Functions based on Promising Direction Vectors

Ignacio Segovia Domínguez, Center for Research in Mathematics, Guanajuato, México. Email: [email protected]

Arturo Hernández Aguirre, Center for Research in Mathematics, Guanajuato, México. Email: [email protected]

Abstract—In this paper we introduce a method to build a large variety of multivariate density functions based on univariate distributions and promising direction vectors. The stochastic model constructed in our proposal simulates random vectors toward directions with a high probability of improving the population. We also provide two algorithms that use these ideas for the global optimization problem. The first is a Hybrid Estimation of Distribution Algorithm and the second is the Adaptive Basis Evolution Strategy. Both algorithms are tested and show good performance on a set of benchmark problems, even outperforming popular competitive algorithms. To the best of our knowledge, the central idea described here does not appear in the previous literature on global optimization.

I. INTRODUCTION

A statistical model describes the relationship among random variables in a mathematical framework [1]. Many problems in computer science imply the adjustment of parametric models to achieve required properties. In this context, parameter inference plays a critical role in the model selection problem. Researchers are usually concerned with accurate parameter estimation, but in many cases the model's original parameters are not enough to fit the dataset properly. The selected parametric model may not even be sufficiently rich to capture the desired properties.

The global optimization problem has been widely studied using different approaches. In recent decades stochastic algorithms have outperformed previous proposals [2]. This has led to the search for new and more complex statistical models [3][4][5][6]. Many efforts have been made to build multivariate density functions capable of simulating random vectors in promising directions of the search space [7][8][9]. The majority of these focus on the Multivariate Normal Distribution, thereby limiting the possible samples.

Our approach merges multivariate random distributions and linear transformations; we are especially interested in the Change of Basis of Multivariate Random Vectors. Although this approach increases the number of parameters, a statistical model richer than the original can be built. In addition, our proposal builds a large variety of multivariate densities whose variable dependencies follow promising directions in the landscape.

The organization of the paper is as follows. Section II introduces the multivariate distributions based on linear transformations of univariate distributions. Section III develops a change of basis matrix from promising search directions. In Section IV we present a way to apply the previous ideas to improve the performance of Estimation of Distribution Algorithms. Section V provides a way to use these ideas in an evolution strategy algorithm. Section VI is devoted to the computational results on a set of test problems, where both algorithms consistently show good performance. Finally, Section VII provides some concluding remarks.

II. CHANGE OF BASIS OF MULTIVARIATE RANDOM VARIABLES

In many applications we are interested in some subspace of R^n defined as the set of all linear combinations of some given set of vectors. Often it is more convenient to write elements of this subspace as linear combinations of vectors from a set containing as few vectors as possible [10]. This notion leads naturally to the following definition.

Definition 2.1: If a subspace W of a vector space V consists of the set of all linear combinations of a finite set of vectors u_1, ..., u_s from V, then that set is said to span the space W. We also say that the set generates the space W.

A common example is the canonical basis {e_1, ..., e_s}, which spans the vector space R^s. However, definition 2.1 also allows linearly dependent sets of vectors, even sets whose number of vectors exceeds the dimension s. We are interested in spanning sets containing as few vectors as possible, so the following theorem helps with this matter.

Theorem 2.1: If u_1, ..., u_s are not all zero and span a vector space, we can always select from them a linearly independent set that spans the same space.

Linearly independent spanning sets will turn out to be the smallest possible spanning sets that we desire for efficient representation of vector spaces. Therefore we give them a special name [10].

Definition 2.2: A basis for a vector space is a linearly independent set of vectors that spans the space.

As a consequence of the above, the expression of any vector in a vector space in terms of the vectors of a basis for that space is unique [11]. Moreover, the dimension of the space is given by the number of vectors in a basis. Thus any vector can be converted to the coordinates of another basis by a linear transformation. First we define a useful matrix.

Definition 2.3: A change of basis matrix B, from a basis β to the canonical basis, is the matrix whose columns are the linearly independent basis vectors {b_1, ..., b_D}, where D is the dimension of the space.


The change of coordinates of any vector ~Z from basis β to the canonical basis, and vice versa, is determined by the change of basis matrix B. The column vector [~Z]_β in basis β can be expressed in canonical Euclidean coordinates as

    [~Z]_e = B [~Z]_β    (1)

and vice versa,

    [~Z]_β = B^{−1} [~Z]_e    (2)

Notice that the change of coordinates computed in this way is straightforward. However, do not overlook that we are concerned with the change of basis of random vectors.

Moreover, we construct the multivariate distributions from linear transformations of univariate distributions. Let f = (f_1(·), ..., f_D(·))′ denote a vector of univariate distributions and [~θ_i]_β the parameter vector of the univariate distribution f_i(·), in coordinates with respect to basis β. Thus the probability density function of the multivariate distribution with independent components with respect to basis β is given by

    f([~Y]_β; [θ]_β) = ∏_{i=1}^{D} f_i([Y_i]_β; [~θ_i]_β)    (3)

which has no dependent variables with respect to the basis-β coordinates.

The multivariate distribution proposed above allows easy simulation of random vectors: it only requires random draws from the independent distributions f_i(·) with parameter vectors [~θ_i]_β. Although there is no dependency among the random variables [Y_i]_β, dependencies may arise from a change of basis. We are interested in random vectors with respect to the canonical basis, therefore random vectors from density (3) can be simulated by

    [~Y]_β ∼ f([~Y]_β; [θ]_β)    (4)
    [~Y]_e = B [~Y]_β    (5)

where the matrix B is the change of basis matrix from basis β to the canonical basis. The change of basis in (5) produces interactions between variables that were not previously present. For instance, consider the 2D examples in Table I and Fig. 1. Although there is no dependency among the variables with respect to basis β, a wide and complex set of interactions arises with respect to the canonical basis.

Notice that the main directions of the distributions in canonical space are produced by the basis vectors in B. A statistical density built in this way therefore allows us to model promising search directions. This feature, together with the ease of creating new and complex density functions, makes this a promising approach.

TABLE I. DESCRIPTION OF EXAMPLES

Example  B                   f1(·)                  f2(·)
E1       [1 1; 0.5 2]        LN([1]_β, [1]_β)       LN([1]_β, [1]_β)
E2       [2 −2; 1 −0.2]      N([0]_β, [1]_β)        N([0]_β, [1]_β)
E3       [1 −0.5; −2 0.5]    N([0]_β, [0.25]_β)     Exp([1.5]_β)
E4       [1 0.5; 1 −0.5]     Beta([2]_β, [2]_β)     LN([0]_β, [1]_β)
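As an illustration, the following minimal sketch (not from the paper; written here with numpy) simulates example E2 of Table I via eqs. (4)-(5): independent draws in basis β become correlated samples in the canonical basis.

```python
import numpy as np

# Sample eq. (4) in basis beta and map to canonical coordinates, eq. (5),
# using example E2 of Table I: two standard normal components and the
# change of basis matrix B below (columns are the basis vectors).
rng = np.random.default_rng(0)
B = np.array([[2.0, -2.0],
              [1.0, -0.2]])

Y_beta = rng.standard_normal((2, 1000))   # eq. (4): [Y]_beta ~ f(.; [theta]_beta)
Y_e = B @ Y_beta                          # eq. (5): [Y]_e = B [Y]_beta

# Independent in basis beta, correlated in the canonical basis.
print(np.corrcoef(Y_e))
```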

III. BUILDING A CHANGE OF BASIS MATRIX

In many applications a statistical model is chosen to describe a selected property or phenomenon. In this case a proper parameter selection is needed. Sometimes the researcher has a dataset on which to perform a parameter inference method; in other situations the lack of data forces the researcher to obtain parameter information from other sources.

Our construction of the multivariate distribution described in section II enables complex modelling. However, notice that the number of required parameters grows quickly with increasing dimension, due to the presence of the change of basis matrix B. This is because two sets of parameters are required: the usual parameters of the univariate distributions and the D × D entries of the matrix B. In this section we propose a way to compute the latter.

Since our goal is modelling promising directions, a natural step is to build the matrix B from the population and its fitness values. The individuals are just finite samples from the continuous search domain, which gives only restricted knowledge about the objective problem F. Nevertheless, if these samples differ, it is possible to find promising search directions.

Let us analyze the simplest (non-trivial) case: how much information about promising directions does a population of size two provide? To answer this question, see Fig. 2.

This example shows a promising search direction for a sample ~X_i in a minimization problem, based on its own fitness value and the knowledge of another sample. The main idea is to move toward the other individual if its fitness value is lower, or in the opposite direction otherwise. This is summarized in definition 3.1.

Definition 3.1: The simplest direction improvement vector ~v_ik is the vector along which the sample/individual ~X_i is expected to improve its fitness value, given the knowledge of another sample ~X_k. For minimization problems it is given by

    ~v_ik = sign(F(~X_i) − F(~X_k)) · (~X_k − ~X_i)    (6)

from which it follows that ~v_ik = ~v_ki.
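A minimal sketch of eq. (6), assuming a generic objective F (a sphere here, purely for illustration):

```python
import numpy as np

def F(x):
    # Illustrative objective only; any fitness function works.
    return float(np.sum(x**2))

def direction_improvement(x_i, x_k):
    # Eq. (6): points toward X_k when X_k is better, away from it otherwise.
    return np.sign(F(x_i) - F(x_k)) * (x_k - x_i)

x_i = np.array([2.0, 1.0])
x_k = np.array([1.0, 1.0])                    # lower fitness, i.e. better
print(direction_improvement(x_i, x_k))        # [-1.  0.]: toward x_k
print(direction_improvement(x_k, x_i))        # [-1.  0.]: v_ik = v_ki
```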

As a consequence, each pair of individuals with different fitness values has its own simplest direction improvement vector. In addition, if there are at least two different samples with different fitness values, promising search directions can be found. In the literature there are proposals for computing gradient estimates based on similar versions of this definition [12], [13].

Fig. 1. Contour plots of the probability density functions with respect to the canonical basis. The vectors correspond to the columns of matrix B; the parameters used are those of Table I. (a) Example E1. (b) Example E2. (c) Example E3. (d) Example E4.

Fig. 2. Building the simplest direction improvement vector from the knowledge collected by two individuals. (a) F(~X_i) = 2.6 and F(~X_k) = 1.8: ~v_ik points from ~X_i toward ~X_k. (b) F(~X_i) = 2.6 and F(~X_k) = 3.1: ~v_ik points from ~X_i away from ~X_k.

Notice that a population of N individuals produces

    NC2 = N! / (2!(N − 2)!) = N(N − 1)/2    (7)

different simplest direction improvement vectors. In addition, each individual has N − 1 simplest direction improvement vectors when compared against the rest of the population.

The change of basis matrix B is composed of D basis vectors. Since we are interested in modelling promising directions, a reasonable idea is to link the matrix B to the simplest direction improvement vectors; see definition 3.2.

Definition 3.2: The direction improvement change of basis matrix B_i consists of D different column vectors, each of which is a simplest direction improvement vector. Hence,

    B_i = [~v_ik_1  ~v_ik_2  ⋯  ~v_ik_{D−1}  ~v_ik_D]    (8)

where the subindexes satisfy k_1 ≠ k_2 ≠ ⋯ ≠ k_{D−1} ≠ k_D.

Then, the matrix B_i is built from the simplest direction improvement vectors obtained by comparing the individual ~X_i with D other individuals. Thus, there is a direction improvement change of basis matrix B_i for each individual ~X_i.

Our proposal encodes information about promising search directions in the change of basis matrix, so that the simulations of the multivariate distribution described in section II produce random vectors inheriting this knowledge about directions. Now the question is: how should the D individuals other than ~X_i be selected to compute the vectors ~v_ik_j? A wide study is still needed to answer this question, because many properties should be analyzed, such as proximity and angular distance, among others. In the applications described below we select the D nearest neighbors of individual ~X_i.
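A hedged sketch of definition 3.2 with the nearest-neighbor selection just described; build_Bi is an illustrative name, and the sphere fitness is only an example:

```python
import numpy as np

def build_Bi(i, population, fitness):
    # Direction improvement change of basis matrix B_i (definition 3.2):
    # one column v_ik, eq. (6), per nearest neighbor of X_i. Assumes the
    # population has more than D individuals and no exact fitness ties.
    D = population.shape[1]
    x_i = population[i]
    dists = np.linalg.norm(population - x_i, axis=1)
    neighbors = np.argsort(dists)[1:D + 1]    # D nearest, excluding X_i itself
    cols = [np.sign(fitness[i] - fitness[k]) * (population[k] - x_i)
            for k in neighbors]
    return np.column_stack(cols)              # D x D matrix

rng = np.random.default_rng(1)
pop = rng.uniform(-5, 5, size=(10, 2))        # 10 individuals in 2D
fit = np.sum(pop**2, axis=1)                  # sphere fitness (illustration)
print(build_Bi(0, pop, fit))
```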

IV. HYBRIDIZATION OF ESTIMATION OF DISTRIBUTION ALGORITHMS BY LOCAL MUTATION DISTRIBUTIONS

Estimation of Distribution Algorithms (EDAs) are stochastic optimization methods that use probability density functions to model the uncertainty about the optimum location [2]. This approach has shown interesting properties and good performance on many benchmark problems [7][14], because the stochastic model guides the global search by sampling promising regions of the search space. However, it does not explicitly build search directions that improve the fitness values. In this regard, many efforts have been made to add direction search to EDAs [15][16][17][18]. In this section we propose a way to apply the change of basis of multivariate random variables developed above to improve EDA performance.

Global and local search are both important issues in stochastic optimization algorithms, so a balance between them should exist. We implement a simple hybrid EDA that maintains both global and local search. The main idea is to simulate a portion of the individuals from a global stochastic model and the rest from a few local models; see Fig. 3.

1: Pob_0 ← U(Domain)                        ▷ First population
2: N_local ← Number of local mutations
3: t ← 0
4: while (Stop condition is not reached) do
5:    Compute fitness values of Pob_t
6:    S_global ← Samples from global model
7:    for i = 1 : N_local do                ▷ Local mutations
8:       C_{1:D} ← D nearest neighbors to individual ~X_i
9:       Build B_i by definition 3.2
10:      [~Y]_β ∼ f([~Y]_β; [θ]_β)
11:      [~Y]_e = B_i [~Y]_β
12:      S_local^(i) ← ~X_i + [~Y]_e
13:   end for
14:   Pob_{t+1} ← Selection(Pob_t, S_global, S_local)
15:   t ← t + 1
16: end while

Fig. 3. The Hybrid EDA pseudocode.

The Hybrid EDA pseudocode describes our local mutation procedure. The population thus encompasses knowledge from the previous population, the global model and the local models. In the experimental section we present results discussing the effectiveness of local mutations in this hybrid EDA. Although this hybrid EDA version does not depend on the global model selected, to test it we chose a successful global model known as the Gaussian Poly-Tree [19].
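A hedged sketch of one generation of this hybrid scheme. The helpers sample_global (standing in for the Gaussian Poly-Tree sampler [19]) and build_Bi (definition 3.2, as sketched in section III) are assumed callables rather than the paper's code, and the univariate model is a standard normal for simplicity.

```python
import numpy as np

def hybrid_generation(pop, F, n_local, sample_global, build_Bi, rng):
    # Lines 5-15 of Fig. 3: global samples plus local mutations, followed
    # by a (mu + lambda)-style selection over parents and offspring.
    D = pop.shape[1]
    fit = np.array([F(x) for x in pop])                   # line 5
    s_global = sample_global(len(pop) - n_local)          # line 6
    s_local = np.array([pop[i] + build_Bi(i, pop, fit) @ rng.standard_normal(D)
                        for i in range(n_local)])         # lines 7-13
    cand = np.vstack([pop, s_global, s_local])
    keep = np.argsort([F(x) for x in cand])[:len(pop)]    # line 14
    return cand[keep]
```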

V. ADAPTIVE BASIS EVOLUTION STRATEGY

Evolution Strategies (ES) are one of the main branches of evolutionary computation [20][21]. They maintain a set of µ individuals and update the population with λ samples from a stochastic model, among other heuristics. There are many versions of ES algorithms, though two have been especially successful: CMA-ES and Natural Evolution Strategies [8][9].

The stochastic model is an important issue for ES algorithms, and many proposals have been studied [22]. We developed an ES algorithm around the stochastic model proposed in sections II and III, since any multivariate distribution can be used. Two issues will be studied: an adaptive basis β and the parameters of the univariate distributions.

In the previous hybrid EDA algorithm we used many change of basis matrices, one for each local mutation. Since we work with |µ| = 1 parent, just one matrix B is needed. A set of D × D updates would be necessary to build the matrix B at each iteration t, so the number of parameters grows rapidly. To avoid this we build a simplified basis β.

Suppose a normalized vector ~b_0 defines the main direction of the column basis vectors of B. In addition, the angle ψ denotes the angle between ~b_0 and each vector ~b_i, and the length δ denotes the real length of the vectors in the search space; all ~b_i are normalized vectors. Notice that this reduces the number of parameters to D + 2.

1: aux_1 ← ∑_{i=1}^{D} ~e_i
2: ~o_0 ← aux_1 / ||aux_1||
3: aux_2 ← ~e_i − (~e_i · ~o_0) · ~o_0
4: ~w_i ← aux_2 / ||aux_2||
5: R_{2×2} ← [cos ψ − 1   −sin ψ
              sin ψ    cos ψ − 1]
6: ~o_i ← ~o_0 + [~o_0 ~w_i] R_{2×2} [~o_0 · ~o_0
                                      ~w_i · ~o_0]
7: aux_3 ← ~b_0 − (~b_0 · ~o_0) · ~o_0
8: ~q_0 ← aux_3 / ||aux_3||
9: A ← [~o_0  ~q_0  null(~o_0, ~b_0)]
10: φ ← ~o_0 · ~b_0
11: R_{D×D} ← [cos φ  −sin φ  0  ⋯  0
               sin φ   cos φ  0  ⋯  0
               0       0      1  ⋯  0
               ⋮                 ⋱
               0       ⋯         0  1]
12: M ← A R_{D×D} A^{−1}
13: ~b_i ← M ~o_i

Fig. 4. Building the normalized basis vectors ~b_i.

How can a change of basis matrix B be built from D + 2 parameters? There are many possible constructions; here we propose a method based on linear algebra. The algorithm described in Fig. 4 computes the normalized basis vectors ~b_i starting from the canonical basis vectors ~e_i. A null procedure is also required: it returns an orthonormal basis for the null space of its arguments, obtained from the singular value decomposition. The main idea is to build the change of basis vectors in canonical space and rotate them toward our main vector by the matrix M. Hence, the change of basis matrix is given by

    B = δ · [~b_1  ~b_2  ⋯  ~b_{D−1}  ~b_D]    (9)
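A hedged numpy interpretation of Fig. 4 and eq. (9), not the authors' code. It assumes D ≥ 2 and a normalized ~b_0 not parallel to ~o_0, reads step 10's φ ← ~o_0 · ~b_0 as the angle between the two vectors, and uses build_basis_vectors as an illustrative name.

```python
import numpy as np
from scipy.linalg import null_space

def build_basis_vectors(b0, psi, D):
    o0 = np.ones(D) / np.sqrt(D)                  # steps 1-2: mean direction
    # Steps 7-9: orthonormal frame whose first two columns span {o0, b0}.
    aux3 = b0 - (b0 @ o0) * o0
    q0 = aux3 / np.linalg.norm(aux3)
    A = np.column_stack([o0, q0, null_space(np.vstack([o0, q0]))])
    # Steps 10-12: rotation by phi in the (o0, q0) plane, taking o0 onto b0.
    phi = np.arccos(np.clip(o0 @ b0, -1.0, 1.0))
    R = np.eye(D)
    R[:2, :2] = [[np.cos(phi), -np.sin(phi)],
                 [np.sin(phi),  np.cos(phi)]]
    M = A @ R @ A.T                               # A orthonormal: inv(A) = A.T
    e = np.eye(D)
    bis = []
    for i in range(D):
        aux2 = e[i] - (e[i] @ o0) * o0            # steps 3-4
        wi = aux2 / np.linalg.norm(aux2)
        oi = np.cos(psi) * o0 + np.sin(psi) * wi  # steps 5-6, simplified
        bis.append(M @ oi)                        # step 13
    return np.column_stack(bis)

# Eq. (9): scale the normalized columns by delta.
D, psi, delta = 5, 0.3, 0.5
b0 = np.array([2.0, 1.0, 1.0, 1.0, 1.0]); b0 /= np.linalg.norm(b0)
B = delta * build_basis_vectors(b0, psi, D)
print(np.round(B, 3))
```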

Since we use |µ| = 1, the single parent will be called ~X_0. The evolution strategy updates the parent in each generation from a subset of λ samples, ~X_1, ~X_2, ..., ~X_N. Each of them collects different information about the objective function, and we use this fact to compute weighted updates. The set of weights is computed as

    α_i = exp(β(F(~X_i) − F(~X_0))) / ∑_{j=1}^{N} exp(β(F(~X_j) − F(~X_0)))    (10)

where

    β = −log(N) / |F_worst − F(~X_0)|    (11)
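A minimal sketch of the weight computation, reading F_worst as the largest fitness among the λ samples (an assumption consistent with minimization):

```python
import numpy as np

def sample_weights(f_samples, f_parent):
    # Eqs. (10)-(11): softmax-like weights that favor samples with better
    # (lower) fitness than the parent X_0. Assumes f_samples is not
    # uniformly equal to f_parent.
    f = np.asarray(f_samples, dtype=float)
    beta = -np.log(len(f)) / np.abs(f.max() - f_parent)   # eq. (11)
    w = np.exp(beta * (f - f_parent))                     # eq. (10), unnormalized
    return w / w.sum()

print(sample_weights([1.8, 2.2, 3.1], f_parent=2.6))      # best sample dominates
```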

How much do we believe in the knowledge collected by the λ samples? The new information could guide the search toward a local optimum, so we should be cautious. A parameter ρ is added to avoid getting trapped in a local optimum. The evolution strategy proposed is summarized in Fig. 5.

1: ~X_0^(0) ← U(Domain)                     ▷ First sample
2: ρ ← Adaptability value
3: N ← Number of samples
4: B^(0) ← Random orthonormal matrix
5: ~b_0^(0) ← Main vector of B^(0)
6: t ← 0
7: while (Stop condition is not reached) do
8:    for i = 1 : N do                      ▷ Local mutations
9:       [~Y]_β ∼ f([~Y]_β; [θ]_β)
10:      [~Y]_e = B^(t) [~Y]_β
11:      λ_i^(t) ← ~X_0^(t) + [~Y]_e
12:   end for
13:   Compute fitness values of λ^(t)
14:   ~X_0^(t+1) ← Selection(~X_0^(t), λ^(t))
15:   if (~X_0^(t+1) == ~X_0^(t)) then      ▷ There is no update
16:      t_last ← Last generation where ~X_0 was updated
17:      λ^(t) ← [λ^(t) λ^(t−1) ⋯ λ^(t_last+1) λ^(t_last)]
18:   end if
19:   Compute α_i by (10)                   ▷ To update B
20:   ~b_0^(new) ← ∑_{k=1}^{N} α_k · ~v_0k
21:   δ^(new) ← ||~b_0^(new)||
22:   ψ^(new) ← ∑_{k=1}^{N} α_k · arccos(~b_0^(t) · (~v_0k/||~v_0k||))
23:   δ^(t+1) ← (1 − ρ) δ^(t) + ρ δ^(new)
24:   aux_1 ← (1 − ρ) ~b_0^(t) + ρ ~b_0^(new)
25:   ~b_0^(t+1) ← aux_1 / ||aux_1||
26:   ψ^(t+1) ← (1 − ρ) ψ^(t) + ρ ψ^(new)
27:   Compute vectors ~b_i^(t+1) by the algorithm in Fig. 4
28:   Build B^(t+1) by (9)
29:   t ← t + 1
30: end while

Fig. 5. The Adaptive Basis Evolution Strategy pseudocode.

Our proposal adjusts the change of basis matrix B in every generation. Our selection procedure switches between two popular schemes: the (µ + λ) scheme is used in every generation, except when a fixed number of generations without updates to ~X_0 is reached; in that generation a (µ, λ) scheme is performed. Moreover, the set λ^(t) includes the samples from the last update of ~X_0 up to the current generation. It is important to emphasize that, to the best of our knowledge, there is no similar basis-based evolution strategy in the literature. In the experiments section we empirically analyze the convergence of this proposal.

VI. EXPERIMENTS

The Change of Basis of Multivariate Random Variables described in this paper is a new proposal for merging multivariate distributions and promising directions. In sections IV and V we developed two possible ways to use it: first, a hybrid scheme was proposed to enhance the performance of Estimation of Distribution Algorithms, and second, a new adaptive basis-based evolution strategy was developed. Both algorithms are tested and the results are presented below.

A. Experiment on The Gaussian Poly-Tree EDA

The global distribution used here is the Gaussian Poly-Tree (GPT), and a comparison between the non-hybrid and the hybrid algorithms was executed. The non-hybrid EDA simulates ⌈0.5N⌉ individuals from the global distribution model. In the hybrid EDA versions, for a population of size N, 12.5% of N are mutated individuals and 37.5% are generated from the global distribution model. For both versions we use the whole population to build the global model, and the next population is selected with the usual µ + λ approach. Table II shows the parameters of each univariate model; notice that the same univariate density is used for each basis vector.

TABLE II. UNIVARIATE DISTRIBUTIONS AND THEIR PARAMETERS WITH RESPECT TO BASIS β

Model   f_i([Y_i]_β; [~θ_i]_β)
M1      N([0]_β, [0.6084]_β)
M2      U([−1.25]_β, [1.25]_β)
M3      LN([−0.657]_β, [0.6084]_β)
M4      U([0]_β, [1.25]_β)

Each algorithm is tested on a total of 12 benchmark problems, described in the Appendix. Independent runs are performed with the following settings in all experiments:
Number of runs: 30.
Initialization: asymmetric initialization for all problems, using the search space shown in Table VIII.
Population size: for a problem in D dimensions the population size is [20 × (1 + D^0.7)].
Stopping conditions: the maximum number of fitness function evaluations (3 × 10^5) is reached; or the target error is smaller than 1 × 10^−6; or no improvement larger than 1 × 10^−13 is detected for 30 generations while the mean of the D standard deviations, one per dimension, is less than 1 × 10^−13.

TABLE III. SUCCESS RATE ON 10 DIMENSION PROBLEMS

F    GPT     GPT+M1  GPT+M2  GPT+M3  GPT+M4
F1   100.00  100.00  100.00  100.00  100.00
F2   0.00    0.00    0.00    0.00    0.00
F3   0.00    0.00    0.00    0.00    0.00
F4   50.00   100.00  100.00  100.00  100.00
F5   0.00    100.00  100.00  100.00  100.00
F6   30.00   100.00  100.00  100.00  100.00
F7   0.00    100.00  100.00  100.00  100.00
F8   100.00  100.00  100.00  100.00  100.00
F9   96.67   100.00  96.67   93.33   96.67
F10  0.00    0.00    0.00    0.00    0.00
F11  80.00   100.00  100.00  100.00  100.00
F12  0.00    0.00    0.00    0.00    0.00

Comments: the success rates on the 10-dimension problems (Table III) show a significant improvement in reaching the optimum value when a change of basis model is added. The success rates on the Different Powers, Schwefel 1.2, Zakharov, Trid and Levy 8 problems improved to 100%, and the most successful local model was the Gaussian distribution.


TABLE IV. SUCCESS RATE ON 20 DIMENSION PROBLEMS

F    GPT     GPT+M1  GPT+M2  GPT+M3  GPT+M4
F1   100.00  100.00  100.00  100.00  100.00
F2   0.00    0.00    0.00    0.00    0.00
F3   0.00    0.00    0.00    0.00    0.00
F4   0.00    100.00  100.00  100.00  100.00
F5   0.00    96.67   100.00  86.67   100.00
F6   0.00    100.00  100.00  100.00  100.00
F7   0.00    0.00    0.00    0.00    0.00
F8   100.00  100.00  100.00  100.00  100.00
F9   90.00   100.00  100.00  100.00  100.00
F10  0.00    0.00    0.00    0.00    0.00
F11  96.67   100.00  100.00  100.00  100.00
F12  0.00    0.00    0.00    0.00    0.00

TABLE V. AVERAGE OF THE MINIMUM VALUES OBTAINED AND THEIR STANDARD DEVIATIONS ON 20 DIMENSION PROBLEMS

F    GPT                 GPT+M1              GPT+M2              GPT+M3              GPT+M4
F1   8.47e-7 (1.10e-7)   8.51e-7 (1.16e-7)   8.77e-7 (9.20e-8)   8.50e-7 (9.61e-8)   8.73e-7 (1.02e-7)
F2   3.12e+2 (1.51e+2)   2.16e+2 (7.60e+1)   2.61e+2 (1.10e+2)   2.99e+2 (1.43e+2)   2.33e+2 (8.13e+1)
F3   1.14e+2 (1.97e+2)   7.12e+1 (1.42e+1)   6.98e+1 (1.68e+1)   6.64e+1 (1.63e+1)   6.57e+1 (1.55e+1)
F4   2.31e-1 (6.99e-1)   9.21e-7 (1.06e-7)   8.64e-7 (1.67e-7)   8.03e-7 (1.85e-7)   7.54e-7 (1.95e-7)
F5   5.33e-1 (4.09e-1)   1.00e-6 (8.05e-8)   9.89e-7 (1.28e-8)   1.07e-6 (2.82e-7)   9.81e-7 (2.69e-8)
F6   8.96e-2 (2.79e-1)   9.32e-7 (6.04e-8)   9.50e-7 (5.15e-8)   9.42e-7 (5.40e-8)   9.58e-7 (3.28e-8)
F7   -7.82e+2 (2.73e+2)  -1.43e+3 (3.06e+1)  -1.44e+3 (1.84e+1)  -1.42e+3 (3.14e+1)  -1.45e+3 (2.22e+1)
F8   9.41e-7 (6.92e-8)   9.13e-7 (7.26e-8)   9.10e-7 (9.24e-8)   9.14e-7 (6.38e-8)   9.14e-7 (6.54e-8)
F9   8.05e-6 (3.84e-5)   9.01e-7 (1.10e-7)   9.31e-7 (8.82e-8)   9.35e-7 (6.53e-8)   9.01e-7 (1.10e-7)
F10  7.29e+1 (5.93e+0)   7.66e+1 (5.44e+0)   7.58e+1 (6.18e+0)   7.74e+1 (5.21e+0)   7.44e+1 (8.04e+0)
F11  7.35e-6 (3.55e-5)   8.57e-7 (1.18e-7)   8.66e-7 (1.12e-7)   8.83e-7 (7.53e-8)   8.39e-7 (1.76e-7)
F12  1.82e+1 (2.92e-1)   1.60e+1 (1.82e-1)   1.60e+1 (2.19e-1)   1.60e+1 (1.98e-1)   1.59e+1 (1.71e-1)

A similar behavior occurred in the success rates on the 20-dimension problems (Table IV), where the success rates on Different Powers, Schwefel 1.2, Zakharov and Levy 8 improved dramatically, but the most successful local models were the log-normal and the asymmetric uniform densities. Although the 20-dimension Ellipsoid, Cigar Tablet, Trid and Rosenbrock problems did not improve in success rate, Table V shows a marked improvement in the mean fitness values reached. In addition, Table VI shows fewer evaluations for the non-hybrid EDA than for the hybrid versions, which indicates premature convergence of the non-hybrid EDA. We therefore conclude that the local mutation method enhances EDA performance, at least on this set of benchmark functions.

B. Experiment on Adaptive Basis Evolution Strategy

The Adaptive Basis Evolution Strategy described in section V requires a few parameters to be set. In this experiment we choose a number of samples N = 1 + round(1.5 log(D)), where D is the problem dimension. The adaptability factor is fixed at ρ = 1/5, and the number of generations until ~X_0 is considered not updated is round(10 log(D)).


TABLE VI. AVERAGE NUMBER OF EVALUATIONS OBTAINED AND THEIR STANDARD DEVIATIONS ON 20 DIMENSION PROBLEMS

F    GPT                 GPT+M1              GPT+M2              GPT+M3              GPT+M4
F1   1.84e+4 (2.26e+2)   2.31e+4 (2.93e+2)   2.31e+4 (2.38e+2)   2.31e+4 (2.34e+2)   2.31e+4 (2.04e+2)
F2   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)
F3   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)
F4   2.20e+5 (3.14e+4)   1.44e+5 (5.10e+4)   1.15e+5 (3.87e+4)   1.03e+5 (3.56e+4)   9.43e+4 (2.90e+4)
F5   2.16e+5 (1.03e+4)   2.60e+5 (2.83e+4)   2.64e+5 (2.09e+4)   2.66e+5 (2.67e+4)   2.56e+5 (2.33e+4)
F6   1.82e+5 (1.10e+4)   2.04e+5 (1.86e+4)   1.86e+5 (2.81e+4)   1.97e+5 (2.83e+4)   1.87e+5 (2.53e+4)
F7   3.00e+5 (2.40e+3)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)
F8   2.23e+4 (2.62e+2)   2.82e+4 (2.91e+2)   2.82e+4 (2.78e+2)   2.80e+4 (3.94e+2)   2.81e+4 (2.26e+2)
F9   2.88e+4 (2.25e+4)   3.21e+4 (7.56e+3)   3.28e+4 (7.75e+3)   3.09e+4 (6.42e+3)   3.04e+4 (5.68e+3)
F10  3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)
F11  1.58e+4 (8.76e+3)   1.98e+4 (5.38e+3)   1.87e+4 (3.32e+3)   1.95e+4 (5.47e+3)   1.93e+4 (4.83e+3)
F12  1.17e+5 (1.84e+4)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)   3.00e+5 (0.00e+0)

The set of parameters [~θ]_β for each univariate model is computed according to the problem dimension, such that F([1]_β, [θ]_β) = 0.25 for asymmetric multivariate distributions and F([1]_β, [θ]_β) − F([−1]_β, [θ]_β) = 0.25 for symmetric multivariate distributions, where 1 is a D × 1 column vector of ones. Thus we ensure simulations near ~X_0.

Four versions of our algorithm were tested: ABES_U with asymmetric uniform distributions, ABES_N with normal distributions centered at zero, ABES_LN with log-normal distributions, and ABES_Exp with exponential distributions. A comparison with the CMA-ES and xNES algorithms was performed [23]. The experiment consists of 100 independent runs of each algorithm, with a success rate of 100% for all algorithms. In addition, the number of simulations for the xNES algorithm is ⌈6 + 6D^0.6⌉ to ensure a maximum success rate.

Comments: all algorithms have a success rate of 100%; however, some are faster than others (Fig. 6 and Fig. 7). The xNES algorithm has the worst performance, even in terms of the best number of evaluations. Our proposal shows faster convergence to the global optimum, and some of the multivariate distributions outperform the CMA-ES algorithm even in terms of the worst number of evaluations. More research is needed to analyze the benefits of basis-based evolution strategies.

VII. CONCLUSION

In this paper we merged change of basis theory and multivariate random variables to build more complex models, and we applied this to the global optimization problem. The capabilities of this approach suggest a wide range of possible applications. Our proposal builds a multivariate distribution which encodes promising search directions. We applied the Change of Basis of Multivariate Random Variables to two popular branches of evolutionary algorithms: Estimation of Distribution Algorithms (EDAs) and Evolution Strategies (ES). To test the first we constructed a simple hybrid EDA, in which the local mutations simulate promising search directions and thereby enhance EDA performance. For the second, an Adaptive Basis Evolution Strategy algorithm was proposed and tested. Its capability to create samples in promising directions yields superior performance, even against popular algorithms like CMA-ES and xNES, at least on a subset of benchmark functions. In addition, the Adaptive Basis Evolution Strategy developed above is a generalized basis-based evolution strategy which, to the best of our knowledge, does not exist in the literature.

Fig. 6. Convergence of the Adaptive Basis Evolution Strategy vs. CMA-ES and xNES (best number of evaluations). (a) Sphere. (b) Schwefel 1.2. (c) Zakharov.

Fig. 7. Convergence of the Adaptive Basis Evolution Strategy vs. CMA-ES and xNES (worst number of evaluations). (a) Sphere. (b) Schwefel 1.2. (c) Zakharov.


REFERENCES

[1] N. Balakrishnan and V. Nevzorov, A Primer on Statistical Distributions. Wiley, 2004. [Online]. Available: http://books.google.com.mx/books?id=JIfk5kBdLGIC

[2] M. Pelikan, K. Sastry, and E. Cantú-Paz, Eds., Scalable Optimization via Probabilistic Modeling: From Algorithms to Applications, ser. Studies in Computational Intelligence. Springer, 2006, vol. 33.

[3] R. Salinas-Gutiérrez, A. Hernández-Aguirre, and E. R. Villa-Diharce, "Estimation of distribution algorithms based on copula functions," in Proceedings of the 13th Annual Conference Companion on Genetic and Evolutionary Computation, GECCO '11. New York, NY, USA: ACM, 2011, pp. 795-798.

[4] R. E. Neapolitan, Learning Bayesian Networks. Prentice Hall Series in Artificial Intelligence, 2004.

[5] C. K. Chow and C. N. Liu, "Approximating discrete probability distributions with dependence trees," IEEE Transactions on Information Theory, vol. IT-14, no. 3, pp. 462-467, May 1968.

[6] R. Shachter and C. Kenley, "Gaussian influence diagrams," Management Science, vol. 35, no. 5, pp. 527-550, May 1989.

[7] J. Lozano, P. Larrañaga, I. Inza, and E. Bengoetxea, Towards a New Evolutionary Computation: Advances on Estimation of Distribution Algorithms, ser. Studies in Fuzziness and Soft Computing. Springer, 2006.

[8] A. Auger and N. Hansen, "A restart CMA evolution strategy with increasing population size," in Evolutionary Computation, CEC '05, The 2005 IEEE Congress on, vol. 2, Sept. 2005, pp. 1769-1776.

[9] D. Wierstra, T. Schaul, J. Peters, and J. Schmidhuber, "Natural evolution strategies," in Proceedings of the Congress on Evolutionary Computation, CEC '08, Hong Kong. IEEE Press, 2008.

[10] B. Noble and J. Daniel, Applied Linear Algebra. Prentice Hall, 1977.

[11] G. Strang, Introduction to Linear Algebra. Wellesley-Cambridge Press, 2003.

[12] R. Salomon, "Evolutionary algorithms and gradient search: similarities and differences," IEEE Transactions on Evolutionary Computation, vol. 2, no. 2, pp. 45-55, Jul. 1998.

[13] A. Lara, O. Schütze, and C. Coello Coello, "On gradient-based local search to hybridize multi-objective evolutionary algorithms," in EVOLVE - A Bridge between Probability, Set Oriented Numerics and Evolutionary Computation, ser. Studies in Computational Intelligence, E. Tantar, A.-A. Tantar, P. Bouvry, P. Del Moral, P. Legrand, C. A. Coello Coello, and O. Schütze, Eds. Springer Berlin Heidelberg, 2013, vol. 447, pp. 305-332.

[14] I. Segovia-Domínguez, A. Hernández-Aguirre, and E. Villa-Diharce, "Global optimization with the Gaussian polytree EDA," in Proceedings of the 10th International Conference on Artificial Intelligence: Advances in Soft Computing, MICAI '11, Part II. Berlin, Heidelberg: Springer-Verlag, 2011, pp. 165-176.

[15] C. W. Ahn and H.-T. Kim, "Estimation of particle swarm distribution algorithms: bringing together the strengths of PSO and EDAs," in Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, GECCO '09. New York, NY, USA: ACM, 2009, pp. 1817-1818.

[16] R. Kulkarni and G. Venayagamoorthy, "An estimation of distribution improved particle swarm optimization algorithm," in Intelligent Sensors, Sensor Networks and Information, ISSNIP 2007, 3rd International Conference on, Dec. 2007, pp. 539-544.

[17] P. A. N. Bosman and E. D. de Jong, "Combining gradient techniques for numerical multi-objective evolutionary optimization," in Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation, GECCO '06. New York, NY, USA: ACM, 2006, pp. 627-634.

[18] J. Hewlett, B. Wilamowski, and G. Dundar, "Merge of evolutionary computation with gradient based method for optimization problems," in Industrial Electronics, ISIE 2007, IEEE International Symposium on, June 2007, pp. 3304-3309.

[19] I. Segovia-Domínguez, A. Hernández-Aguirre, and E. R. Villa-Diharce, "The Gaussian polytree EDA with copula functions and mutations," in EVOLVE - A Bridge between Probability, Set Oriented Numerics and Evolutionary Computation, ser. Studies in Computational Intelligence, E. Tantar, A.-A. Tantar, P. Bouvry, P. Del Moral, P. Legrand, C. A. Coello Coello, and O. Schütze, Eds. Springer Berlin Heidelberg, 2013, vol. 447, pp. 123-153.

[20] H. Lichtfuss, "Evolution eines Rohrkrümmers," TU Berlin, 1965.

[21] I. Rechenberg, "Cybernetic solution path of an experimental problem," Royal Aircraft Establishment, Tech. Rep., 1965.

[22] H.-G. Beyer, The Theory of Evolution Strategies. New York, NY, USA: Springer-Verlag, 2001.

[23] T. Glasmachers, T. Schaul, S. Yi, D. Wierstra, and J. Schmidhuber, "Exponential natural evolution strategies," in Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation, GECCO '10. New York, NY, USA: ACM, 2010, pp. 393-400.

APPENDIX

One set of problems is used in the experiments: the 12 problems are shown in Table VII. Relevant information about the functions, such as the optimum vector, the value of the function at that vector, and the search domain, is provided in Table VIII.

TABLE VII. TEST PROBLEMS.

Name              Alias  Definition
Sphere            F1     ∑_{i=1}^{d} x_i²
Ellipsoid         F2     ∑_{i=1}^{d} 10^{6(i−1)/(d−1)} x_i²
Cigar Tablet      F3     x_1² + ∑_{i=2}^{d−1} 10⁴ x_i² + 10⁸ x_d²
Different Powers  F4     ∑_{i=1}^{d} |x_i|^{2 + 10(i−1)/(d−1)}
Schwefel 1.2      F5     ∑_{i=1}^{d} (∑_{j=1}^{i} x_j)²
Zakharov          F6     ∑_{i=1}^{d} x_i² + (∑_{i=1}^{d} 0.5 i x_i)² + (∑_{i=1}^{d} 0.5 i x_i)⁴
Trid              F7     ∑_{i=1}^{d} (x_i − 1)² − ∑_{i=2}^{d} x_i x_{i−1}
Ackley            F8     −20 exp(−0.2 √((1/d) ∑_{i=1}^{d} x_i²)) − exp((1/d) ∑_{i=1}^{d} cos(2π x_i)) + 20 + e
Griewangk         F9     ∑_{i=1}^{d} x_i²/4000 − ∏_{i=1}^{d} cos(x_i/√i) + 1
Rastrigin         F10    10d + ∑_{i=1}^{d} [x_i² − 10 cos(2π x_i)]
Levy 8            F11    sin²(π y_1) + ∑_{i=1}^{d−1} (y_i − 1)² [1 + 10 sin²(π y_{i+1})] + (y_d − 1)², with y_i = 1 + (x_i + 1)/4
Rosenbrock        F12    ∑_{i=1}^{d−1} [(1 − x_i)² + 100 (x_{i+1} − x_i²)²]

TABLE VIII. SEARCH DOMAIN, GLOBAL MINIMUM AND PROPERTIES OF TEST PROBLEMS.

Alias  Modes       Global Minimum                             Domain
F1     Unimodal    f(x*) = 0 : x_i = 0                        x_i ∈ [−600, 300]^d
F2     Unimodal    f(x*) = 0 : x_i = 0                        x_i ∈ [−10, 5]^d
F3     Unimodal    f(x*) = 0 : x_i = 0                        x_i ∈ [−10, 5]^d
F4     Unimodal    f(x*) = 0 : x_i = 0                        x_i ∈ [−10, 5]^d
F5     Unimodal    f(x*) = 0 : x_i = 0                        x_i ∈ [−10, 5]^d
F6     Unimodal    f(x*) = 0 : x_i = 0                        x_i ∈ [−10, 5]^d
F7     Unimodal    f(x*) = −d(d+4)(d−1)/6 : x_i = i(d+1−i)    x_i ∈ [−d², d²]^d
F8     Multimodal  f(x*) = 0 : x_i = 0                        x_i ∈ [−10, 5]^d
F9     Multimodal  f(x*) = 0 : x_i = 0                        x_i ∈ [−600, 300]^d
F10    Multimodal  f(x*) = 0 : x_i = 0                        x_i ∈ [−10, 5]^d
F11    Multimodal  f(x*) = 0 : x_i = −1                       x_i ∈ [−10, 5]^d
F12    Multimodal  f(x*) = 0 : x_i = 1                        x_i ∈ [−10, 5]^d
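For reference, minimal implementations of two entries of Table VII (written from the formulas above, not taken from the authors' code), which reproduce the tabulated global minima:

```python
import numpy as np

def sphere(x):        # F1
    return float(np.sum(x**2))

def rosenbrock(x):    # F12
    return float(np.sum((1 - x[:-1])**2 + 100.0 * (x[1:] - x[:-1]**2)**2))

print(sphere(np.zeros(10)))       # 0.0 at x_i = 0
print(rosenbrock(np.ones(10)))    # 0.0 at x_i = 1
```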
