

Cloud Scale Distributed Evolutionary Strategies for High Dimensional Problems

Dennis Wilson, Kalyan Veeramachaneni, and Una May O’Reilly

Massachusetts Institute of Technology, USA
{dennisw,kalyan,unamay}@csail.mit.edu

Abstract. We develop and evaluate a cloud scale distributed covariance matrix adaptation based evolutionary strategy for problems with dimensions as high as 400. We adopt an island based distribution model and rely on a peer-to-peer communication protocol. We identify a variety of parameters in a distributed island model that could be randomized, leading to a new dynamic migration protocol that can prove advantageous when computing on the cloud. Our approach enables efficient and high quality distributed sampling while mitigating the latencies and failure risks associated with running on a cloud. We evaluate performance on a real world problem from the domain of wind energy: wind farm turbine layout optimization.

1 Introduction

Our goal is to design Estimation of Distribution Algorithms, or EDAs, for high dimensional problems via large scale cloud computing resources: hundreds or even thousands of cores made available via virtualization of commodity hardware. Cloud-scaling helps us achieve the much needed higher sampling rates for high dimensional problems. While the cloud provides us access to a large number of resources on demand, resource sharing through virtualization implies that some nodes could be slower than others. Hence, algorithms designed around synchronous computation/communication and shared, distributed memory architectures could incur latencies. We are investigating whether a certain type of distribution model for EDAs is more amenable to these on-demand resources and what distribution protocol and software infrastructure we should build to support them.

An example model is a master-slave fitness distributed model with a very large sample population. However, in this model a bottleneck arises when the distribution is re-estimated, since the entire population needs to be evaluated before a sub-sample is selected to re-estimate the distribution. Recognizing this, as well as the success of island based models in classical evolutionary algorithms, we are exploring an island based model where multiple instances of an EDA, one per island, optimize locally while periodically communicating progress information to neighbors.

The asynchronous execution and communication between islands allows for desired higher sampling rates, but requires a distribution methodology. We map

A.I. Esparcia-Alcázar et al. (Eds.): EvoApplications 2013, LNCS 7835, pp. 519–528, 2013. © Springer-Verlag Berlin Heidelberg 2013


each island to an independent node (with either single or multiple cores) and build a socket level communication layer across the network of nodes. In this submission we present a cloud EDA algorithm we have named CASINO: Cloud Assets for StochastIc Numerical Optimization. CASINO is a communication framework for EDA development that is used here to distribute the Covariance Matrix Adaptation based Evolutionary Strategy, CMA-ES [1].

In designing a distributed CMA-ES, we focus on the communication protocol for the migration of progress information. We examine whether randomized migration protocols are effective when compared to static and centralized protocols, as well as the type of information exchanged by the independent EDAs. Conventionally, island models communicate current best solutions. In this paper we experiment with CMA-ES passing either the best solutions or the island's covariance matrix.

Our evaluation perspective is practical; we focus on a real world wind farm turbine layout optimization problem. We proceed as follows: Section 2 briefly considers related work. In Section 3 we briefly review the CMA-ES algorithm. Section 4 describes CASINO. In Section 5 we describe different randomized protocols for sending information between islands. Section 6 describes the layout optimization problem we use CASINO for. Section 7 compares different strategies based on their performance on our exemplar problem. Section 8 concludes with future work.

2 Related Work

There is a large body of literature on methods of distributing Evolutionary Algorithms, EAs, for which [2–5] serve well as overviews. In some circumstances, simple parallelization models such as independent, parallel runs or master-slave fitness evaluation suffice. There are examples of continuously valued, distributed EAs for numerical optimization which commonly use Particle Swarm Optimization, Evolutionary Strategies, or Genetic Algorithms [6–11]. A few of these approaches focused on adapting MapReduce to scale algorithms to compute resources on the cloud [10].

A closely related work appears in [7], where the authors used a Message Passing Interface, or MPI, over a distributed file system. They developed this system for compute grids and achieved efficiencies via MPI over distributed, shared, and hybrid memory systems. In our current work we do not use or require such a file system, since we believe it can cause latencies and is not particularly necessary for an island based model with asynchronous and infrequent communications. Additionally, on the cloud we cannot assume that the multiple cores of our virtual machines reside on the same physical machines, an assumption made by algorithms based on shared memory systems.

3 Distributed CMA-ES Strategy

CMA-ES self-adapts the covariance matrix of a multivariate normal distribution. This normal distribution is then sampled to draw the variables of a candidate


solution in the multidimensional search space. The covariance matrix guides the search by adaptively biasing sampling toward historically profitable correlations between the variables. This makes the evolutionary search powerful.

Consider a representation x_k for the kth solution to the optimization problem that attempts to minimize the objective function f(x). In each iteration t, the algorithm samples λ solutions from a multivariate normal distribution given by

x_k^{(t+1)} \sim \mathcal{N}\!\left(m^{(t)}, \sigma^{2(t)} C^{(t)}\right) \;\; \forall k. \quad (1)

After evaluating these solutions against the fitness function, a subset of μ solutions is selected for updating the mean and covariance of the multivariate distribution. The mean is updated by

m^{(t+1)} = \sum_{i=1}^{\mu} w_i \, x_i^{(t+1)}, \quad \text{such that} \quad \sum_{i=1}^{\mu} w_i = 1 \;\text{and}\; w_i > 0 \quad (2)

The covariance matrix could be simply updated by:

C_{\mu}^{(t+1)} = \sum_{i=1}^{\mu} w_i \left( \frac{x_i^{(t+1)} - m^{(t)}}{\sigma^{(t)}} \right) \left( \frac{x_i^{(t+1)} - m^{(t)}}{\sigma^{(t)}} \right)^{T} \quad (3)

The CMA-ES algorithm also incorporates additional information based on the trajectory of the mean and the covariance matrix as iterations progress. For further information we refer readers to [1].
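The sampling and update loop of Eqs. 1–3 can be sketched with NumPy. This is a deliberately simplified illustration, not the full CMA-ES of [1]: step-size adaptation and evolution paths are omitted, and the weight scheme and function names are our own choices.

```python
import numpy as np

def cma_es_step(f, m, sigma, C, lam=20, mu=10, rng=None):
    """One simplified CMA-ES iteration: sample lambda candidates (Eq. 1),
    select the mu best, update the mean (Eq. 2) and form the rank-mu
    covariance estimate (Eq. 3). Step-size control and evolution paths
    of the full algorithm are omitted."""
    rng = rng or np.random.default_rng()
    # Eq. 1: sample lambda candidates from N(m, sigma^2 C)
    X = rng.multivariate_normal(m, sigma**2 * C, size=lam)
    fitness = np.array([f(x) for x in X])
    best = X[np.argsort(fitness)[:mu]]              # minimization
    # log-decreasing weights, normalized so they sum to 1 and w_i > 0
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()
    m_new = w @ best                                # Eq. 2
    # Eq. 3: rank-mu covariance estimate from the selected, scaled steps
    Y = (best - m) / sigma
    C_new = sum(wi * np.outer(y, y) for wi, y in zip(w, Y))
    return m_new, C_new, best

# usage: descend on a 5-dimensional sphere function
m, C = np.full(5, 2.0), np.eye(5)
for _ in range(50):
    m, C, _ = cma_es_step(lambda x: float(x @ x), m, 0.3, C)
```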

4 CASINO Setup

One of the main goals in CASINO's design was a simple communication protocol that is easy to set up. The network layout and migration protocols can be easily manipulated by changing neighbor selection methods, the information passed, and the communication frequency without disturbing the underlying architecture. The Instrument Control Toolbox in MATLAB is used to facilitate information passing over TCP/IP, and system-level ssh calls are used for notification. CASINO's setup has the following configuration steps:

Step 1: Infrastructure: A central server node requests a batch of nodes fromthe cloud. The server collects, and then broadcasts, the nodes’ IP addresses aslist R. A node i establishes a connection with another node j from its IP list Ri

by creating an exclusive mailbox with a unique id. The mailbox configuration isdetermined by the user; while we have explored different migration topologies,every node has the capacity to engage in various topologies given Ri. Each nodealso creates a notification mailbox which a sender uses to inform it that a messagehas arrived at the sender’s exclusive mailbox.
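The paper implements these mailboxes with MATLAB's Instrument Control Toolbox over TCP/IP. Purely to illustrate the idea, a minimal Python socket sketch might look as follows; the names `open_mailbox` and `send_to_mailbox` are our own and not part of CASINO.

```python
import socket
import threading

def open_mailbox(port, handler):
    """Listen on a dedicated port (an 'exclusive mailbox') and hand each
    complete incoming message to handler. Illustrative stand-in for the
    MATLAB TCP/IP mailboxes described in Step 1."""
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", port))      # port 0 lets the OS pick a free port
    srv.listen()

    def loop():
        while True:
            conn, _ = srv.accept()
            with conn:
                # read until the sender closes the connection
                data = b"".join(iter(lambda: conn.recv(4096), b""))
                handler(data)

    threading.Thread(target=loop, daemon=True).start()
    return srv

def send_to_mailbox(ip, port, payload: bytes):
    """Deliver one message to a peer's mailbox."""
    with socket.create_connection((ip, port)) as s:
        s.sendall(payload)
```

A separate, smaller notification mailbox per node (as in Step 1) would be opened the same way on its own port.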

Step 2: Information Communicated: Currently, we allow each CMA-ES node to send either a subset of its individuals or its covariance matrix. In the


Fig. 1. An example of a 5-node network. The server node (not shown) is connected to port 5000 on each worker node. These nodes are fully connected to each other using the ports numbered higher than 5000. They also all use port 22 as a notification mailbox.

case of the former, the best μ individuals are chosen, though the entire set could be communicated. The fitness of the individuals is sent as well to avoid reevaluation. Similarly, covariance matrices are reduced to the upper triangle of the matrix before passing, and the best individual's fitness is included to combat redundancy. We are also interested in communicating parts of a larger, centralized covariance matrix, but have not yet explored this option.
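Sending only the upper triangle exploits the symmetry of the covariance matrix, halving the message size. A sketch of the packing and unpacking (our own illustrative helper names, not CASINO's API):

```python
import numpy as np

def pack_upper(C):
    """Pack a symmetric covariance matrix into its upper triangle,
    n(n+1)/2 values, before sending, as in Step 2."""
    n = C.shape[0]
    iu = np.triu_indices(n)
    return n, C[iu]

def unpack_upper(n, tri):
    """Rebuild the full symmetric matrix on the receiving side."""
    C = np.zeros((n, n))
    iu = np.triu_indices(n)
    C[iu] = tri
    C.T[iu] = tri      # mirror into the lower triangle
    return C
```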

Step 3: Migration Protocol: The details of the migration protocols are fully discussed in Section 5. Our ability to experiment with different migration protocols comes from the infrastructure configured in Step 1, as well as the nature of the CMA-ES algorithm, where received information can be integrated in any of its iterations. We experiment with a variety of pre-determined migration protocols and a set of randomized strategies.

Step 4: Message Information Integration: During a run, a node integrates the information it receives from its neighbors. When the information unit is its neighbor's best μ solutions, it simply merges these into its population before selecting the best μ, from which the next covariance matrix is estimated and subsequently updated.
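The merge-and-select step can be sketched as follows; since migrants arrive with their fitnesses (Step 2), no reevaluation is needed. The function name is illustrative.

```python
import numpy as np

def merge_and_select(pop, fit, migrants, migrant_fit, mu):
    """Step 4: merge a neighbor's best individuals, with their
    already-computed fitnesses, into the local population, then keep
    the best mu for the next covariance matrix estimate."""
    all_x = np.vstack([pop, migrants])
    all_f = np.concatenate([fit, migrant_fit])
    order = np.argsort(all_f)[:mu]      # minimization: lowest fitness wins
    return all_x[order], all_f[order]
```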

When the unit is the entire covariance matrix, we modify the CMA-ES algorithm to include the covariance matrix from the neighbor:

\alpha (1 - c_{\mathrm{cov}}) C^{(t)} + (1 - \alpha)(1 - c_{\mathrm{cov}}) C_n^{(t)} \quad (4)

where α is the relative weight. We call this the neighbor-update. When covariance matrices arrive from multiple nodes, we rank the covariance matrices based on their associated fitness, that of the best population on their island, and choose the best one to integrate.
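The neighbor-update of Eq. 4, together with the ranking of incoming matrices by their island's best fitness, can be sketched as below. The default values of `alpha` and `c_cov` here are illustrative, not taken from the paper.

```python
import numpy as np

def neighbor_update(C, neighbor_Cs, neighbor_best_fits, alpha=0.5, c_cov=0.1):
    """Eq. 4 neighbor-update: blend the island's own covariance with the
    single best-ranked neighbor covariance, where neighbors are ranked by
    the fitness of the best individual on their island (minimization)."""
    best = int(np.argmin(neighbor_best_fits))
    Cn = neighbor_Cs[best]
    return alpha * (1 - c_cov) * C + (1 - alpha) * (1 - c_cov) * Cn
```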

5 Randomized Migration Protocols

Next we attempt to overlay different migration protocols on our infrastructure. A migration protocol is defined by three aspects: topology chosen, topology


parameters, and type of information. Below we describe the different ways one can make these choices.

Topology Selection: With regard to topology, one can overlay a fixed topology such as Ring, Broadcast, or even No communication, which we call a static topology. An alternative is to choose a topology at random every generation; this is possible because each node in our network has the IP addresses of all the other nodes. This we call a dynamic strategy.

Parameter Selection: Two parameters are specified in a migration protocol: q, the number of neighbors, and γ, the migration frequency. The selection of these parameters can significantly influence network use and possibly latencies. These parameters are chosen randomly according to a probability distribution; in our case γ ∼ U[5, 10] and q ∼ N(k, k/2) where k ∈ [2, n/2]. Our current framework allows the user to choose these parameters in the following ways: the choice can be made centrally, with every island passed the same parameters at initialization (homogeneous), or each island can choose its own parameters based on the distribution (heterogeneous). Additionally, we allow for further protocols by introducing randomization during run time (RR). Each node can change its parameter q during run time and decide whether or not to send by flipping a biased coin.

A user of our framework can select any configuration of topology and parameters. We experimented with a few, and the table below presents the names we use for these protocols and the choices made for topology and parameters in each of them.

Table 1. Different protocols tested. HO implies homogeneous, HE implies heterogeneous, and RR implies randomized during run time

Protocol                    Topology   q    γ
Static                      Static     HO   HO
Static-Random Frequency     Static     HO   HE
Dynamic                     Dynamic    HE   HE
Dynamic-Random Frequency    Dynamic    RR   HE
Dynamic-REG                 Dynamic    RR   RR

6 An Exemplar High Dimensional Problem

To analyze the performance of the algorithm under a variety of choices, we selected a real world high dimensional problem. We chose a wind energy layout problem that has been studied by a number of researchers, who have applied a centralized version (with multithreading for parallelizing fitness evaluation) of the CMA-ES algorithm. The goal is to identify a turbine layout, given by the x, y coordinates of the turbines, that maximizes the energy capture from a given farm:

\arg\max_{(X,Y)} \; \eta(X, Y, v, \beta(v)) \quad (5)


where v is the wind speed and the function β(v), known as a power curve, gives the power generated by a specific turbine at a given wind speed. Wind speed v, however, is a random variable with a Weibull distribution, p_v(v, c, k), which is estimated from wind resource data. This distribution also changes as a function of direction θ, which varies from 0° to 360°, yielding a probability density function for each θ given by p_{θv}(v, c, k). Additionally, wind flows from a certain direction with some probability P(θ). These different pieces of information are inputs to the algorithm. Due to the random nature of wind velocity, the objective function evaluates the expected value of the energy capture for a given wind resource and turbine positions. For a single turbine, this value can be calculated using

E_i[\eta] = \sum_{\theta} P(\theta) \sum_{v} p_{\theta v}(v, c_i, k_i, x_i, y_i, X, Y) \, \beta_i(v). \quad (6)

Equation 6 evaluates the overall average energy over all wind speeds for a given wind direction, and then averages this energy over all the wind directions. c_i, k_i are turbine specific resource parameters derived for the ith turbine after wake calculations. For more details, refer to [12].
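A discretized version of Eq. 6 can be sketched as below. The grid-based quadrature, the callable interfaces for the density and power curve, and the function names are our own illustrative choices; in the actual problem, `weibull_pdf` would encode the wake-adjusted p_{θv}(v, c_i, k_i, x_i, y_i, X, Y) and `power_curve` the turbine's β_i(v).

```python
import numpy as np

def expected_energy(thetas, p_theta, speeds, weibull_pdf, power_curve):
    """Eq. 6 for one turbine: average power over a discretized wind rose.
    weibull_pdf(theta, v) is the direction-conditional speed density and
    power_curve(v) is beta(v); speeds is a uniform grid of wind speeds."""
    dv = speeds[1] - speeds[0]
    total = 0.0
    for theta, p_th in zip(thetas, p_theta):
        # inner sum over speeds for this direction, weighted by P(theta)
        total += p_th * sum(weibull_pdf(theta, v) * power_curve(v) * dv
                            for v in speeds)
    return total
```

As a sanity check, with a normalized speed density and a flat power curve the expected energy reduces to the constant power value.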

The goal of the optimization problem is to maximize Equation 6. This problem has analysis value because of its high dimensionality, non-linear variable relationships, and expensive fitness evaluation.

7 Experiments and Analysis

We use a 200 turbine problem with a 400 dimensional search space for evaluation. All experiments use 100 cloud nodes with one island on each. The CMA-ES parameters of each island are μ = 10 and λ = 20. All experiments are run 20 times and the results presented are averages. Each run takes approximately 18 minutes. The fitness evaluation for a 200 turbine problem takes 5 seconds.

Fig. 2. Performance (power output vs. generation) of different static topologies vs randomized topologies: No Communication, Broadcast, Circular, Dynamic, and Static-Random Frequency


Are Dynamic Topologies Harmful? We first compare Static-Random Frequency and Dynamic, which have random migration rates and random topologies, to Static with Ring and Broadcast topologies. For a fair comparison, Static-Random Frequency and Ring both have q = 2, implying they exchange the same amount of information. In Dynamic, the number of neighbors is drawn randomly, q ∼ N(k, k/2), at the beginning of the run, where k = 2. We also include a No communication experiment.

Per Figure 2, as expected, No communication fares worst. Broadcast was as poor as, or statistically the same as, No communication, likely because too much information was exchanged. The randomized strategies work better than Ring, suggesting that a random topology is at least not harmful and might drive advantageous population diversity.

Fig. 3. Comparing centralized CMA-ES with 100 times the population to a distributed CMA-ES with smaller populations and different information sharing protocols: (a) power output vs. generation; (b) power output vs. time [s]. The time scale is also shown as a performance comparison; the centralized CMA-ES runtime was of a higher order of magnitude and is not shown.

Are Islands Harmful? As another check, we compare the island model to a CMA-ES where all samples are centralized (μ = 1000 and λ = 2000) and each sample directly affects the covariance matrix update. We observe that the centralized CMA-ES prematurely converges quickly (within 25 generations). Initially, it outperforms the distributed protocols, but within 100 generations, all of them surpass it. The initial benefit of centralized CMA-ES is its complete communication and instantaneous integration of results. In the island model, full integration hangs on result migration.

This clearly shows that distributed sampling is not harmful and could, indeed, be advantageous. We also compare the time taken for the different protocols to finish. Figure 3(b) shows the progression of the different approaches in terms of fitness as time progresses. All values are averaged over 20 trials.

What Is the Best Number of Neighbors? We now evaluate the sensitivity of the distributed strategies and randomized protocols to q, the number of


Fig. 4. Comparison of q (power output vs. generation) for distributed protocols: (a) Static, (b) Static-Random Frequency, (c) Dynamic, (d) Dynamic-Random Frequency, (e) Dynamic-REG

neighbors or, for the dynamic protocols, the mean of the distribution from which the number of neighbors was drawn. Figure 4 shows each protocol with q = 2, 4, 8, 16, 32, 50. The Static protocol is best with a two-neighbor topology. The best q for the other randomized protocols is not statistically discernible, though in the case of Dynamic, q = 4 appears to best the others by a slim margin. Across the protocols, this makes it infeasible to state whether the "best" q remains the same or differs.

Static-Random Frequency performs the best overall, but was also the most costly in computation time, taking 23 minutes per run on average. This is caused by the processing time of incoming and outgoing populations. While each node in this protocol sends its populations to the number of neighbors indicated, 2 for the best run, a single node may be a receiver more frequently. These overburdened nodes increase the overall optimization time.


Fig. 5. Performance comparison of communicating best solutions vs the covariance matrix for a 50 turbine optimization problem using the Static-Random Frequency protocol

Static-Random Frequency and Dynamic both showed increased variability between values of q. Multiple runs of Dynamic-Random Frequency and Dynamic-REG outperformed the others, but not on average. This is the impact of increased randomization; a randomized dynamic network may end up having little communication, or communication may be isolated to a small section of the population.

Information Unit: Individuals vs Covariance Matrix. Time permits us only to briefly study the difference between communicating the μ best individuals as units of information and exchanging the covariance matrix; see Figure 5. Note there will be a tradeoff point between μ and dimensionality where the message sizes of the two will cross. A covariance matrix for an n-dimensional problem is of size n²/2. For a 200 dimension problem this message size and update time is high, so we compared the two information units on a 50 dimension problem. The best-individual information outperformed the covariance matrix approach initially, but by 100 generations both experiments achieve the same mean best fitness. This is likely due to our decision to integrate only the best covariance matrix received from neighbors each iteration, which slows the sharing of crucial information.

8 Conclusions and Future Work

In this paper we presented our island based CMA-ES algorithm capable of running on the cloud. We identified a variety of parameters for an island based model which can be randomized in order to overcome the latencies introduced by the cloud due to its virtualization layer and resource sharing. We investigated the performance of these strategies on a real world problem by running on our private cloud. We also investigated whether or not passing the covariance matrix helps the distributed model in terms of performance. Passing the covariance matrix is extremely expensive, and it is not clear from the brief study we performed whether it is beneficial. We did observe that the pattern of convergence is different. As part of future work we would like to investigate this further.

This paper's investigation of dynamic and randomized migration for cloud based black box optimization is likely to extrapolate to similar island model versions of EDAs. This is because EDAs do not each individually


make special considerations related to topology; e.g., the source and destination of communicated information in CMA-ES or other such algorithms is not specific to the algorithm. For our investigation with respect to what information to exchange: when current best solutions are migrated, results should extrapolate to other algorithms. However, because the covariance matrix is not common to all approaches, those findings are restricted to CMA-ES.

Acknowledgements. Dennis Wilson acknowledges the support of the MIT Energy Initiative. Una-May and Kalyan acknowledge the support of the GE Global Research center. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of General Electric Company or MITEI.

References

1. Hansen, N.: The CMA evolution strategy: a comparing review. In: Lozano, J.A., Larrañaga, P., Inza, I., Bengoetxea, E. (eds.) Towards a New Evolutionary Computation. Advances in Estimation of Distribution Algorithms, pp. 75–102. Springer (2006)

2. Alba, E.: Parallel Metaheuristics: A New Class of Algorithms, vol. 47. Wiley-Interscience (2005)

3. Tomassini, M.: Spatially Structured Evolutionary Algorithms. Springer (2005)

4. Nedjah, N., Alba, E., de Macedo Mourelle, L.: Parallel Evolutionary Computations. Springer (2006)

5. Cantú-Paz, E.: Efficient and Accurate Parallel Genetic Algorithms. Springer, Netherlands (2000)

6. Zhu, W.: Nonlinear optimization with a massively parallel evolution strategy pattern search algorithm on graphics hardware. Applied Soft Computing 11(2), 1770 (2011)

7. Müller, C.L., Baumgartner, B., Ofenbeck, G., Schrader, B., Sbalzarini, I.: pCMALib: a parallel Fortran 90 library for the evolution strategy with covariance matrix adaptation. In: Proceedings of the 11th Annual Conference on Genetic and Evolutionary Computation, pp. 1411–1418. ACM (2009)

8. Rubio-Largo, Á., González-Álvarez, D.L., Vega-Rodríguez, M.A., Almeida-Luz, S.M., Gómez-Pulido, J.A., Sánchez-Pérez, J.M.: A Parallel Cooperative Evolutionary Strategy for Solving the Reporting Cells Problem. In: Corchado, E., Novais, P., Analide, C., Sedano, J. (eds.) SOCO 2010. AISC, vol. 73, pp. 71–78. Springer, Heidelberg (2010)

9. Rudolph, G.: Global Optimization by Means of Distributed Evolution Strategies. In: Schwefel, H.-P., Männer, R. (eds.) PPSN 1990. LNCS, vol. 496, pp. 209–213. Springer, Heidelberg (1991)

10. Gunarathne, T., Wu, T.L., Qiu, J., Fox, G.: MapReduce in the clouds for science. In: 2010 IEEE Second International Conference on Cloud Computing Technology and Science (CloudCom), November 30-December 3, pp. 565–572 (2010)

11. Verma, A., Llorà, X., Goldberg, D., Campbell, R.: Scaling genetic algorithms using MapReduce. In: Ninth International Conference on Intelligent Systems Design and Applications, ISDA 2009, November 30-December 2, pp. 13–18 (2009)

12. Kusiak, A., Song, Z.: Design of wind farm layout for maximum wind energy capture. Renewable Energy 35(3), 685–694 (2010)