Data-Intensive Computing for Competent Genetic Xavier Llor ... · Data-Intensive Computing for...

of 17/17
Data-Intensive Computing for Competent Genetic Algorithms: A Pilot Study using Meandre Xavier Llor` a IlliGAL Report No. 2009001 February, 2009 Illinois Genetic Algorithms Laboratory University of Illinois at Urbana-Champaign 117 Transportation Building 104 S. Mathews Avenue Urbana, IL 61801 Office: (217) 333-2346 Fax: (217) 244-5705
  • date post

    07-Jun-2020
  • Category

    Documents

  • view

    7
  • download

    0

Embed Size (px)

Transcript of Data-Intensive Computing for Competent Genetic Xavier Llor ... · Data-Intensive Computing for...

  • Data-Intensive Computing for Competent GeneticAlgorithms: A Pilot Study using Meandre

    Xavier LloràIlliGAL Report No. 2009001

    February, 2009

    Illinois Genetic Algorithms LaboratoryUniversity of Illinois at Urbana-Champaign

    117 Transportation Building104 S. Mathews Avenue

    Urbana, IL 61801Office: (217) 333-2346Fax: (217) 244-5705

  • Data-Intensive Computing for Competent Genetic Algorithms:

    A Pilot Study using Meandre

    Xavier Llorà

    National Center for Supercomputing ApplicationsUniversity of Illinois at Urbana-Champaign

    1205 W. Clark Street, Urbana, IL [email protected]

    February 5, 2009

    Abstract

    Data-intensive computing has positioned itself as a valuable programming paradigm to effi-ciently approach problems requiring processing very large volumes of data. This paper presentsa pilot study about how to apply the data-intensive computing paradigm to evolutionary com-putation algorithms. Two representative cases—selectorecombinative genetic algorithms andestimation of distribution algorithms—are presented, analyzed, discussed. This study showsthat equivalent data-intensive computing evolutionary computation algorithms can be easilydeveloped, providing robust and scalable algorithms for the multicore-computing era. Exper-imental results show how such algorithms scale with the number of available cores withoutfurther modification.

    1 Introduction

    Data-intensive flow frameworks can be traced back to mid 90’s with the appearance of frameworkssuch as D2K (Welge et al., 2003), and later simplified and popularized by Google’s MapReducemodel (Dean and Ghemawat, 2004) and Yahoo!’s open source Hadoop project1. The underlyingcommon characteristic of all these frameworks is their data-flow oriented processing. Availabilityof data drives, not only the flow of execution, but also the parallel nature of such processing.The growth of the internet has pushed researchers from all disciplines to deal with volumes ofinformation where the only viable path is to utilize data-intensive frameworks (Uysal et al., 1998;Beynon et al., 2000; Foster, 2003; Mattmann et al., 2006). Despite large bodies of research onparallelizing evolutionary computation algorithms are available (Cantú-Paz, 2000; Alba, 2007),there has been little work done in exploring the usage of data-intensive computing.

    The inherent parallel nature of evolutionary algorithms makes them optimal candidates forparallelization (Cantú-Paz, 2000). Moreover, as we will layout in this paper, evolutionary algorithmsand their inherent need to deal with large volumes of data—regardless if it takes the form ofpopulations of individuals or samples out of a probabilistic distribution—can greatly benefit froma data-intensive computing modeling. Two key benefits of such a modeling are: (1) favoringencapsulation, reutilization, and sharing via Lego-like component modeling, and (2) massive parallel

    1http://hadoop.apache.org/

    1

    http://hadoop.apache.org/

  • data-driven execution. The first one targets improving software engineering best practices and adetailed discussion is beyond the scope of this paper and can be find elsewhere (Llorà et al., 2008).In this paper we will focus on the massive parallel data-driven execution that allows users toautomatically benefit from the advances of the current multicore era—which has open the door topetascale computing—without having to modify the underlying algorithm.

    To illustrate these points two representative cases will be analyzed. A simple selectorecombi-native genetic algorithm (Goldberg, 1989, 2002) will be used to illustrate how to it can be modelusing the data-intensive computing approach. We will review (1) some of the basic steps of thetransformation process required to achieve its data-intensive computing counterpart, (2) how todesign components that can maximally benefit from a data-driven execution, and (3) some counterintuitive results. The second example will address the parallelization of the model building of esti-mation of distribution algorithms. We will show how a data-driven implementation of the extendedcompact classifier system (eCGA) (Harik et al., ress) produces, de facto, a parallelized implementa-tion of the costly model building stage. Experiments will show that speedups linear to the numberof cores or processors are possible without any further modification. Both examples were imple-mented using Meandre2(Llorà et al., 2008), a semantic-driven data-intensive computing frameworkfor high performance computing.

    The rest of this paper is organized as follows. Section 2 briefly reviews basic data-intensiveconcepts used throughout the paper. This section also presents key features of Meandre that willbe used in the experimental section. Section 3 presents a first exercise of developing a data-intensivecomputing version of a simple selectorecombinative genetic algorithm. This section also presentsexecution profiles obtained and some counter intuitive results. Then, in section 4 we present theresults obtained after the conversion of the eCGA model building algorithm showing that speedupslinear to the number of cores or processors are possible without any further modification. Finally,section 5 presents some conclusions and future work.

    2 Meandering into Data-Intensive Computing

    Meandre (Llorà et al., 2008) is a semantic-enabled web-driven, dataflow execution environment. Itprovides the machinery for assembling and executing data flows. Flows are software applicationscomposed by components that process data. Each flow represents as a directed graph of executablecomponents—nodes—linked through their input and output ports. Based on the inputs, prop-erties, and its internal state, an executable component may produce output data. Meandre alsoprovides publishing capabilities per flow and executable component, enabling users to assemble arepository of components based on the reuse and share. Users can readily leverage other researchand development efforts by querying and integrating component descriptions that have been pub-lished previously at other shareable repository locations transparently. It is important to mentionhere, that theses are self contained elements—other approaches like Chimera still rely on externalinformation Foster (2003). Meandre builds on three main concepts: (1) dataflow-driven execution,(2) semantic-web metadata manipulation, and (3) metadata publishing. A detailed description ofthe Meandre data-intensive computing architecture is beyond the scope of this paper and can befound elsewhere (Llorà et al., 2008).

    2Catalan spelling of the word meander.

    2

  • 



Component


    P


    Inputs
 Outputs


    Behavior
descriptor
 Implementa7on


    (a) A component is described by several input andoutput ports where data flows through. Also, eachcomponent have a set of properties which govern itsbehavior in the presence of data.

    Read


    P
 Merge


    P


    Convert


    P


    Show


    P


    Get


    P


    Dataflow execution

    (b) A flow is a directed graph where multiple compo-nents are connected together via input/output ports.A flow represents a complex task to solve.

    Figure 1: A data-intensive flow is characterized by the components it uses (basic process units) andtheir interconnection (a direct graph). Grouping several components together describes a complextask. It also emphasize rapid development by component reutilization.

    2.1 Data-flow execution engines

    Conventional programs perform their computational tasks by executing a sequence of instructions.One after another, each code instruction is fetched and executed. Any data manipulation is per-formed by these basic units of execution. In a broad sense, this approach can be termed “code-drivenexecution.” Any computation task is regarded as a sequence of code instructions that ultimatelymanipulates data. However, data-driven execution (or dataflow execution) revolves around the ideaof applying transformational operations to a flow or stream of data. In a data-driven model, dataavailability determines the sequence of code instructions to execute.

    An analogy of the dataflow execution model is the black box operand approach. That is,any operand (operator) may have zero or more data inputs. It may also produce zero or moredata through its data outputs. The operand behavior may be controlled by properties (behaviorcontrols). Each operand performs its operations based on the availability of its inputs. For instance,an operand may require that data is available in all its inputs to perform its operations. Othersmay only need some, or none. A simple example of a black box operand could be the arithmetic‘+’ operand. This operand can be modeled as follows:

    1. It requires two inputs.

    2. When two inputs are available, it performs the addition.

    3. It then pushes the result as an output.

    Such a simple operand may have two possible implementations. The first one defines a exe-cutable component (Meandre terminology for a black box operator) with two inputs. When datais present on both inputs, then the operator is executed—fired. The operator produces one piece ofdata to output, which may become the input of another operator. Another possible implementationis to create a component with a single input that adds together two consecutive data pieces received.

    3

  • The component requires an internal variable (or state) which stores the first data piece of a pair.When the second data piece arrives, it would be added to the first and an output is produced. Theinternal variable would then be cleared so that the component will know that the next data piecereceived is the first of a new pair. As we will see later in this paper, both implementations havemerit, but in certain conditions we will choose one over the other based on clearness and efficiencycriteria.

    Meandre uses the following terminology:

    1. Executable component: A basic unit of processing.

    2. Input port: Input data required by a component.

    3. Firing policy: The policy of component execution (e.g. when all/any input ports containdata).

    4. Output port: Outputs data produced by component execution.

    5. Properties: Component read-only variables used to modify component behavior.

    6. Internal state: The collection of data structures designed to manage data between componentfirings.

    Figure 1 presents a schema of the component and flow anatomy. Components with inputand output ports can be interconnected to describe a complex task, commonly referred as flow.Dataflow execution engines provide a scheduler that determines the firing (execution) sequence ofcomponents3.

    2.2 Components

    Meandre components serve as the basic building block of any computational task. There are twokinds of Meandre components: (1) executable components and (2) flow components. Regardless oftype, all Meandre components are described using metadata expressed in RDF. Executable com-ponents also require an executable implementation form that can be understood by the Meandreexecution engine4. Meandre’s metadata relies on three ontologies: (1) the RDF ontology (Beck-ett, 2004; Brickley and Guha, 2004) serves as a base for defining Meandre components; (2) theDublin Core elements ontology (Weibel et al., 2008) provides basic publishing and descriptive ca-pabilities in the description of Meandre components; and (3) the Meandre ontology describes aset of relationships that model valid components, as understood by the Meandre execution enginearchitecture—refer to (Llorà et al., 2008) for a more detailed description.

    2.3 Programming Paradigm

    The programming paradigm creates complex tasks by linking together a bunch of specializedcomponents—see Figure 1. Meandre’s publishing mechanism allows components develop by thirdparties to be assembled in a new flow. There are two ways to develop flows on Meandre: (1) usingvisual programming tools, or (2) using Meandre’s ZigZag scripting language—see (Llorà et al.,2008). For simplicity purposes, throughout the rest of this paper flows will be presented as ZigZagscripts.

    3Meandre uses a decentralized scheduling policy designed to maximize the use of multicore architectures. Meandrealso allows works with processes that require directed cyclic graphs—extending beyond the traditional MapReducedirected acyclic graphs.

    4Java, Python, and Lisp are the current languages supported by Meandre to implement a component

    4

  • 3 Selectorecombinative Genetic Algorithms

    Selectorecombinative genetic algorithms (Goldberg, 1989, 2002) mainly rely on the use of selectionand recombination. We chose to start with them because they present a minimal set of operatorsthat will help us to illustrate the main elements to take into account when. As we will see, theaddition of mutation operators will be trivial after the setting up the proper data-intensive flow.The rest of this section will present a quick description of the algorithm we transform and implementit using Meandre, a discussion of some of the elements that need to be taken into account, andfinally review the execution profile of the final implementation.

    3.1 The original algorithm

    The basic algorithm we will target to implement as a data-intensive flow can be summarized asfollows.

    1. Initialize the population with random individuals.

    2. Evaluate the fitness value of the individuals.

    3. Select good solutions by using s-wise tournament selection without replacement (Goldberget al., 1989).

    4. Create new individuals by recombining the selected population using uniform crossover5(Sywerda,1989).

    5. Evaluate the fitness value of all offspring.

    6. Repeat steps 3–5 until some convergence criteria are met.

    3.2 Population vs. individuals

    The first step in designing a data-intensive flow implementation of the algorithm presented in theprevious section is to identify what data will be processed. This decision is similar to the partitionstep of the methodology proposed by Foster (Foster, 1995), to design general purpose parallelprograms, where data is maximally partitioned to maximize parallelization. The two possibleoptions here are to deal with populations or individuals. E2K (Llorà, 2006)—a data-flow extensionfor D2K (Welge et al., 2003)—chose to use populations. Such a decision only allows parallelizingthe concurrent evolution of distinct populations. In this paper we will choose the second option.Our data-flow implementation is going to be built around processing individuals. In other words, apopulation will be a stream of individuals—a stream initiator and a stream terminator will encloseeach stream. Making the decision of process streams of individuals will allow creating componentsthat perform the genetic manipulation as the stream goes by. This approach may be regarded asan analogy of pipeline segmentation on central processing units.

    3.3 Components

    Inspecting the algorithm presented in section 3.1, we need to create components that implementeach of the operation required: initialization, evaluation, selection, and recombination. Initializationand evaluation are straight forward; the first one creates the stream of individuals that formsa population, and the second one updates the fitness of the individuals as they are streamed

    5For this particular exercise we have assume a crossover probability pχ=1.0.

    5

  • Figure 2: Meandre’s flow for the proposed selectorecombinative genetic algorithm flow.

    by. The recombination components—as introduced in section 2—will require and internal state.As individuals stream by, it requires two individuals in order to perform the uniform crossoveroperation.

    The selection component requires a bit more thinking. One may think that a similar imple-mentation to the one used for the recombination component may work approximately accurateenough. However, such an implementation would be equivalent to implement a spatial based selec-tion method instead of a tournament one. Spatially constraint selection methods have been shownto elongate the takeover time, and thus reduce the selection pressure when compared to targetedtournament selection without replacement (De Jong and Sarma, 1995; Sarma and De Jong, 1997,1998; Llorà, 2002; Giacobini et al., 2005). Also, following on the temptation of accumulate allthe individuals and then recreate the stream as we conduct tournaments against the accumulatedpopulation also seems to prone to introduce a large sequential bottleneck and thus leaving theexecution profile prone to the Amdahl’s law (Amdahl, 1967). The answer is simpler. Create allthe required tournaments when you get the stream initiator. Then as individuals are streamed inperform the possible tournaments and start streaming the new selected population. Thus, we willguarantee that as individuals are still streamed in, we are already streaming out of the componenta newly selected population, minimizing the serialization time required.

    3.4 The data-intensive flow

    Figure 2 presents the components discussed above and how they get assembled to for the final data-intensive flow. The sbp (stream binary population) streams a population of individuals to start theflow. Individuals get streamed into the soed (single open entry door) component. This is a specialcomponent that its only goal is to make sure that all the individuals on the initial populationare streamed into the evaluation eps component before the next streamed population arrives.The goal of this component is to avoid having individuals from the next population mixed with theprevious population ones. It is important to note here that individuals my still be streamed in fromthe initialization when new individuals are already being streamed out the recombination ucbpscomponent, and hence, the need to avoid such a problem. Evaluated individuals are streamed intothe tournament selection twrops component, and the selected ones into the recombination ucbpscomponent. New offspring will be sent for evaluation passing through the soed safe gate. Table1 presents the ZigZag script implementing this flow. A detailed description on how to implementMeandre components and write ZigZag scripts can be found elsewhere—see (Llorà et al., 2008) and

    6

  • Table 1: ZigZag script implementing a selectorecombinative genetic algorithm using Meandre com-ponents.

    ## Main imports## Omitted for simpliciticy#

    ## Instances#sbp,soed,eps = SBP(), SOED(),E PS()noit,twrops,ucbps = NOIT(), TWROPS(), UCBPS()print = PRINT()

    ## The flow#@new_pop, @cross_pop = sbp(), ucbps()@pop_ed = soed(

    initial_stream:new_pop.population;stream:cross_pop.population

    )@eval_pop = eps(population:pop_ed.stream)@sel_pop = twrops(population:eval_pop.population)@cnt = noit(population:sel_pop.population)ucbps(population:cnt.population)print(population:cnt.final_population)

    7

  • 0


    5000


    10000


    15000


    20000


    25000


    /sbp

    /soe

    d


    /eps


    /twrops


    /noit


    /ucbps


    (a) Original

    0


    5000


    10000


    15000


    20000


    25000


    /sbp


    /soed


    /eps/mapper/popula2on/4


    /eps/paralell/0


    /eps/paralell/1


    /eps/paralell/2


    /eps/paralell/3


    /eps/reduce/popula2on/4


    /twrops


    /ucbps/mapper/popula2on/4


    /ucbps/paralell/0


    /ucbps/paralell/1


    /ucbps/paralell/2


    /ucbps/paralell/3


    /ucbps/reduce/popula2on/4


    /noit


    (b) Parallelized

    Figure 3: Execution profile of the original data-instensive flow implementing a selectorecombinativegenetic algorithm and its automatically parallelized version of epb and ucbps components (paral-lelization degree equal to the number of available cores, 4). Times are in milliseconds and areaverages of twenty runs.

    .

    http://seasr.org/meandre.

    3.5 Experiments

    We run some experiments on this to illustrate the benefits of data-intensive computing model.Unless noted otherwise, the experiments were run on an Intel 2.8GHz Quad Core equipped with4Gb of RAM, running Ubuntu Linux 8.0.4, and Sun JVM 1.5.0 15. The problem we solved was arelatively small OneMax (Goldberg, 1989, 2002), for 5, 000 bits and a population size of 10, 000.The goal of the exercise was to reveal the execution profile of the converted algorithm. Figure3(a) presents the average time spent by each component. Times are averages of 20 runs. The firstthing to point out is that evaluation is not the most expensive part of the execution. OneMax isso simple, that the cost of selection and crossover dominates the execution.

    Such counter intuitive profile would be a problem is we took a traditional parallelizationroute based on master/slave configurations delegating evaluations (Cantú-Paz, 2000)—which worksits best on the presence on costly evaluation functions. Thanks to choosing a data-intensivecomputing—and Meandre’s ability to automatically parallelize components6—we can also auto-matically parallelize the costly part of the execution: the uniform crossover7. Also, we can, at thesame time parallelize the evaluation, which in this situation may have little effect. However, thekey property to highlight is that either in this cases, or in the case of having a costly evaluationfunction, the underlying data-intensive flow algorithm does not need to be change, and componentparallelization will help, for a given problem, parallelize the costly parts of the execution profile—see Figure 3(b). Next section will show the scalability benefits that this data-intensive approachcan help unleash.

    6Multiple instance of a component can be spawned to process in parallel incoming individuals.7Same considerations would apply in case of having a mutation component.

    8

    http://seasr.org/meandre

  • 4 The Extended Compact Genetic Algorithm

    Probabilistic model-building genetic algorithms (PMBGAs) (Pelikan et al., 2000, 2002)—also knowas estimation of distribution algorithms (EDAs) (Larrañaga and Lozano, 2002)—rely on learninga probability distribution from a population which will be sampled to create the new population.Usually, the learned distribution also provides useful insights about the problem structure. Despitethe flavor we choose, learning the probability distribution (or model building)—ignoring the con-tribution of fitness evaluation after the discussion we had in the previous section—is the costliestpiece, governing the execution time. The rest of this section will present a quick description of asimple PMBGA/EDA, the extended compact genetic algorithm (Harik et al., ress). We will alsofocus how we transform and implement the model-building algorithm of eCGA using Meandre.Finally, we will review the scalability of the final implementation on multicore and multiprocessorNUMA (non uniform memory access) architectures.

    4.1 An eCGA overview

    The extended compact genetic algorithm (eCGA) (Harik et al., ress), is based on a key idea thatthe choice of a good probability distribution is equivalent to linkage learning. The measure ofa good distribution is quantified based on minimum description length(MDL) models. The keyconcept behind MDL models is that given all things are equal, simpler distributions are betterthan the complex ones. The MDL restriction penalizes both inaccurate and complex models,thereby leading to an optimal probability distribution. The probability distribution used in eCGAis a class of probability models known as marginal product models (MPMs). MPMs are formedas a product of marginal distributions on a partition of the genes. MPMs also facilitate a directlinkage map with each partition separating tightly linked genes.

    The eCGA can be algorithmically outlined as follows:

    1. Initialize the population with random individuals.

    2. Evaluate the fitness value of the individuals.

    3. Select good solutions by using s-wise tournament selection without replacement (Goldberget al., 1989).

    4. Build the probabilistic model: In χeCGA, both the structure of the model as well as theparameters of the models are searched. A greedy search is used to search for the model ofthe selected individuals in the population.

    5. Create new individuals by sampling the probabilistic model.

    6. Evaluate the fitness value of all offspring.

    7. Repeat steps 3–6 until some convergence criteria are met.

    Two things need further explanation: (1) the identification of MPM using MDL, and (2) thecreation of a new population based on MPM.

    The identification of MPM in every generation is formulated as a constrained optimizationproblem,

    Minimize Cm + Cp (1)Subject to

    χki ≤ n ∀i ∈ [1,m] (2)

    9

  • where χ is the alphabet cardinality—χ = 2 for the binary strings—Cm is the model complexitywhich represents the cost of a complex model and is given by

    Cm = logχ(n+ 1)m∑i=1

    (χki − 1

    )(3)

    and Cp is the compressed population complexity which represents the cost of using a simple modelas against a complex one and is evaluated as

    Cp =m∑i=1

    χki∑j=1

    Nij logχ

    (n

    Nij

    )(4)

    where m in the equations represent the number of BBs, ki is the length of BB i ∈ [1,m], and Nij isthe number of chromosomes in the current population possessing bit-sequence j ∈ [1, χki ]8 for BBi. The constraint (Equation 2) arises due to finite population size.

    The greedy search heuristic used in χ-eCGA starts with a simplest model assuming all thevariables to be independent and sequentially merges subsets until the MDL metric no longer im-proves. Once the model is built and the marginal probabilities are computed, a new populationis generated based on the optimal MPM as follows, population of size n(1 − pc) where pc is thecrossover probability, is filled by the best individuals in the current population. The rest n · pcindividuals are generated by randomly choosing subsets from the current individuals according tothe probabilities of the subsets as calculated in the model.

    One of the critical parameters that determines the success of eCGA is the population size.Analytical models have been developed for predicting the population-sizing and the scalability ofeCGA (Sastry and Goldberg, 2004). The models predict that the population size required to solvea problem with m building blocks of size k with a failure rate of α = 1/m is given by

    n ∝ χk(σ2BBd2

    )m logm, (5)

    where n is the population size, χ is the alphabet cardinality (here, χ = 3), k is the building blocksize, σ

    2BBd2

    is the noise-to-signal ratio (Goldberg et al., 1992), and m is the number of building blocks.For the experiments presented in this paper we used k = |a|+1 (where |a| is the number of addressinputs), σ

    2BBd2

    =1.5, and m = `|I| (where ` is the rule size).

    4.2 Costly model building

    As we did with the selectorecombinative genetic algorithm, and loosely following Foster’s method-ology (Foster, 1995), we will try to identify what data is going to drive our execution. In thisparticular case, the relevant pieces of information used for eCGA’s model building are the genepartitions used to compute the MPM. The greedy model-building algorithm requires exploring alarge number of possible partition merges while building the model—being O(`3) the worst casescenario. Thus, this would suggest that the partitions of genes should be the basic piece of data tostream. At each step of the model building process, a stream of partitions will need to be evaluatedto compute its combined complexity score. The evaluation of each partition is also independent ofeach other, further simplifying the overall design.

    8Note that a BB of length k has χk possible sequences where the first sequence denotes be 00· · · 0 and the lastsequence (χ− 1)(χ− 1) · · · (χ− 1)

    10

  • Figure 4: Meandre’s flow for eCGA.

    0
5000
10000
15000
20000
25000
30000
35000
40000
45000


    /init_ecga


    /update_par55ons


    /greedy_ecga_mb


    /print_model


    (a) Original

    0
5000
10000
15000
20000
25000
30000
35000
40000
45000


    /init_ecga


    /update_par55ons/mapper/

    ecp/4


    /update_par55ons/mapper/

    epi/4


    /update_par55ons/mapper/

    epj/4


    /update_par55ons/paralell/0


    /update_par55ons/paralell/1


    /update_par55ons/paralell/2


    /update_par55ons/paralell/3


    /update_par55ons/reduce/

    epc/4


    /greedy_ecga_mb


    /print_model


    (b) Parallelized

    Figure 5: Execution profile of the original data-instensive flow implementing eCGA and its auto-matically parallelized version of its update partitions component (parallelization degree equal tothe number of available cores, 4). Times are in milliseconds and are averages of twenty runs.

    4.3 Components and the data-instensive flow

    Figure 4 presents the four components we will use to implement a data-intensive version of eCGA.The init ecga component creates a new population, evaluates the individuals (using and MKdeceptive trap (Goldberg, 2002) where k = 4 and d = 0.25), pushes the population and startsstreaming the initial set of gene partitions that require evaluation. Then, the update partitioncomponent computes the combine complexity of the partition and streams that information to thegreedy ecga mb component. This component implements the greedy algorithm that receives theevaluated partitions and decides which ones to merge. In case that a partition merge is possible, thenew possible partitions to be evaluated are streamed into the update partition component. If nomerger is possible, the greedy ecga mb component pushes the final MPM model to the print modelcomponent. Table 2 present the ZigZag script implementing the data-intensive computing versionof eCGA.

    4.4 Experiments

    We ran three different experiments. First we measure the executions profile of the implementdata-intensive eCGA. Figure 5(a) presents the average time spend in each component over 20runs—` = 256 and n = 100, 000. All the runs lead to learning the optimal MPM model. Theinteresting element to highlight on Figure 5(a) is the already known fact that most of the executiontime of the model building process was spend evaluation the gene partitions. Also, the initialization

    11

  • Table 2: ZigZag script implementing the extended compact genetic algorithm using Meandre com-ponents.

    ## Main imports## Omitted for simpliciticy#

    ## Instances#init_ecga = INIT_ECGA()greedy_ecga_mb = GREEDY_ECGA_MB()update_partitions = UPDATE_ECGA_PARTITIONS()print_model = PRINT_MODEL()print_pop = PRINT_POP_MATRIX()

    ## The flow#@init_ecga = init_ecga()@update_part = update_partitions(

    ecga_partition_cache:init_ecga.ecga_partition_cache;

    ecga_partition_to_update_i:init_ecga.ecga_partition_to_update_i;

    ecga_partition_to_update_j:init_ecga.ecga_partition_to_update_j

    )@greedy_mb = greedy_ecga_mb(

    ecga_partition_cache:update_part.ecga_partition_cache

    )update_partitions(

    ecga_partition_cache:greedy_mb.ecga_partition_cache;

    ecga_partition_to_update_i:greedy_mb.ecga_partition_to_update_i;

    ecga_partition_to_update_j:greedy_mb.ecga_partition_to_update_j

    )print_model(ecga_model:greedy_mb.ecga_model)

    12

  • 0


    1


    2


    3


    4


    5


    6


    PW‐1
 PW‐2
 PW‐3
 PW‐4


    Figure 6: eCGA speedup compared to the original non data-intensive computing implementation.Figure show the speedup as a function of the number of update partitions parallelized compo-nents available.

    is being negatively affected by the partition updates, since it is being held back since it producespartitions much faster than they can be evaluated. This fact can be observed on Figure 5(b). Afterproviding four parallelized partition evaluation components, not only the overall wall clock timedrop, but also the initialization time too, since now, the update input queues can keep up with theinitial set of partitions generated.

    We repeated these experiments providing {1,2,3, and 4} parallelized update partition com-ponents and measured the overall speedup against the traditional eCGA model building implemen-tation. Figure 6 presents the speedup results graphically. The first element to highlight is that,only using one update partition component instance we obtained a speedup bigger than one.This is the result of few improvements on the implementation of the component, which allowed toremove extra layers of unnecessary function calls present in the original code. Also, the fact thatpartitions results are streamed into the greedy model builder adds and extra improvement—similarto pipeline segmentation as discussed earlier and the availability of idle cores. The final speedupgraph shows a clear linear increase in performance as more cores are efficiently used. Finally, werun a the same flow on an SGI Altix machine with a multiprocessor NUMA architecture at theNational Center for Supercomputing Applications (NCSA)9 and requested 16 and 32 nodes andaveraged the results of 20 runs, providing linear speedup of 14.01 and 27.96, slightly worst than onmulticore machines due to the NUMA interconnection overhead.

    5 Conclusion

    In this paper we have shown that implementing evolutionary computation algorithms using adata-intensive computing paradigm is possible. We have presented step-by-step transformationsfor two illustrative cases—selectorecombinative genetic algorithms and estimation of distributionalgorithms—and reviewed some few best practices during the process. Moreover, our results havealso shown the inherent benefits of the underlying usage of data-flow execution and how, when

    9http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/SGIAltix/TechSummary/

    13

    http://www.ncsa.uiuc.edu/UserInfo/Resources/Hardware/SGIAltix/TechSummary/

  • properly engineered, these algorithms can directly benefit from the current race on increasing thenumber of cores per chips. We have shown that linear speedups are possible without any changeto the underlying algorithms based on data-intensive computing. We have also shown that suchresults hold for multicore architectures, but also for multiprocessor NUMA architectures.

    Our current future work is focused on analyzing other evolutionary computation algorithms,and what challenges they may face when developing their data-intensive computing counterparts.We are also currently working on evaluating the performance of the algorithms presented in thispaper on distributed-memory supercomputers and how to tune the data-intensive execution enginein those environments. majority rule.

    6 Acknowledgments

    I would like to thank The Andrew W. Mellon Foundation and the National Center for Supercom-puting Applications for their funding and support to the SEASR project that has made possiblethe development of the Meandre infrastructure. Also, my special thanks to the Illinois GeneticAlgorithms Lab, and his director Prof. David E. Goldberg, for the valuable advice, guidance, andsupport throughout these years. I would also like to thank Kumara Sastry, for introducing me tothe world of eCGA. Finally, special thanks to the National Center for Supercomputer Applicationsand R-Systems Inc. for granting me access to the computational resources that have made thispaper possible.

    References

    Alba, E., editor (2007). Parallel Metaheuristics. Wiley.

    Amdahl, G. (1967). Validity of the single processor approach to achieving large-scale computingcapabilities. In AFIPS Conference Proceedings, pages 483–485.

    Beckett, D. (2004). RDF/XM Syntax Specification (Revised). W3C Recommendation 10 February2004, The World Wide Web Consortium.

    Beynon, M. D., Kurc, T., Sussman, A., and Saltz, J. (2000). Design of a framework for data-intensive wide-area applications. In HCW ’00: Proceedings of the 9th Heterogeneous ComputingWorkshop, page 116, Washington, DC, USA. IEEE Computer Society.

    Brickley, D. and Guha, R. (2004). RDF Vocabulary Description Language 1.0: RDF Schema. W3CRecommendation 10 February 2004, The World Wide Web Consortium.

    Cantú-Paz, E. (2000). Efficient and Accurate Parallel Genetic Algorithms. Springer.

    De Jong, K. and Sarma, J. (1995). On decentralizing selection algorithms. In In Proceedings of theSixth International Conference on Genetic Algorithms, pages 17–23. Morgan Kaufmann.

    Dean, J. and Ghemawat, S. (2004). MapReduce: Simplified Data Processing on Large Clusters. InOSDI’04: Sixth Symposium on Operating System Design and Implementation.

    Foster, I. (1995). Designing and Building Parallel Programs: Concepts and Tools for ParallelSoftware Engineering. Addison Wesley.

    14

  • Foster, I. (2003). The virtual data grid: A new model and architecture for data-intensive collabora-tion. In in the 15th International Conference on Scientific and Statistical Database Management,pages 11–.

    Giacobini, M., Tomassini, M., and Tettamanzi, A. (2005). Takeover time curves in random andsmall-world structured populations. In GECCO ’05: Proceedings of the 2005 conference onGenetic and evolutionary computation, pages 1333–1340, New York, NY, USA. ACM.

    Goldberg, D. E. (1989). Genetic algorithms in search, optimization, and machine learning. Addison-Wesley, Reading, MA.

    Goldberg, D. E. (2002). The Design of Innovation: Lessons from and for Competent GeneticAlgorithms. Kluwer Academic Publishers, Norwell, MA.

    Goldberg, D. E., Deb, K., and Clark, J. H. (1992). Genetic algorithms, noise, and the sizing ofpopulations. Complex Systems, 6:333–362. (Also IlliGAL Report No. 91010).

    Goldberg, D. E., Korb, B., and Deb, K. (1989). Messy genetic algorithms: Motivation, analysis,and first results. Complex Systems, 3(5):493–530.

    Harik, G. R., Lobo, F. G., and Sastry, K. (in press). Linkage learning via probabilistic modelingin the ECGA. In Pelikan, M., Sastry, K., and Cantú-Paz, E., editors, Scalable Optimizationvia Probabilistic Modeling: From Algorithms to Applications, chapter 3. Springer, Berlin. (AlsoIlliGAL Report No. 99010).

    Larrañaga, P. and Lozano, J. A., editors (2002). Estimation of Distribution Algorithms. KluwerAcademic Publishers, Boston, MA.

    Llorà, X. (2006). E2K: evolution to knowledge. SIGEVOlution, 1(3):10–17.

    Llorà, X. (February, 2002). Genetic Based Machine Learning using Fine-grained Parallelism forData Mining. PhD thesis, Enginyeria i Arquitectura La Salle. Ramon Llull University, Barcelona.

    Llorà, X., Ács, B., Auvil, L., Capitanu, B., Welge, M., and Goldberg, D. E. (2008). Meandre:Semantic-driven data-intensive flows in the clouds. In Proceedings of the 4th IEEE InternationalConference on e-Science, pages 238–245. IEEE press.

    Mattmann, C. A., Crichton, D. J., Medvidovic, N., and Hughes, S. (2006). A software architecture-based framework for highly distributed and data intensive scientific applications. In ICSE ’06:Proceedings of the 28th international conference on Software engineering, pages 721–730, NewYork, NY, USA. ACM.

    Pelikan, M., Goldberg, D. E., and Cantú-Paz, E. (2000). Linkage learning, estimation distribution,and Bayesian networks. Evolutionary Computation, 8(3):314–341. (Also IlliGAL Report No.98013).

    Pelikan, M., Lobo, F., and Goldberg, D. E. (2002). A survey of optimization by building and usingprobabilistic models. Computational Optimization and Applications, 21:5–20. (Also IlliGALReport No. 99018).

    Sarma, J. and De Jong, K. (1997). An analysis of local selection algorithms in a spatially struc-tured evolutionary algorithm. In Proceedings of the Seventh International Conference on GeneticAlgorithms, pages 181–186. Morgan Kaufmann.

    15

  • Sarma, J. and De Jong, K. (1998). Selection pressure and performance in spatially distributedevolutionary algorithms. In In Proceedings of the World Congress on Computatinal Intelligence,pages 553–557. IEEE Press.

    Sastry, K. and Goldberg, D. E. (2004). Designing competent mutation operators via probabilisticmodel building of neighborhoods. Proceedings of the Genetic and Evolutionary ComputationConference, 2:114–125. Also IlliGAL Report No. 2004006.

    Sywerda, G. (1989). Uniform crossover in genetic algorithms. In Proceedings of the third interna-tional conference on Genetic algorithms, pages 2–9, San Francisco, CA, USA. Morgan KaufmannPublishers Inc.

    Uysal, M., Kurc, T. M., Sussman, A., and Saltz, J. (1998). A performance prediction frameworkfor data intensive applications on large scale parallel machines. In In Proceedings of the FourthWorkshop on Languages, Compilers and Run-time Systems for Scalable Computers, number 1511in Lecture Notes in Computer Science, pages 243–258. Springer-Verlag.

    Weibel, S., Kunze, J., Lagoze, C., and Wolf, M. (2008). Dublin Core Metadata for ResourceDiscovery. Technical Report RFC2413, The Dublin Core Metadata Initiative.

    Welge, M., Auvil, L., Shirk, A., Bushell, C., Bajcsy, P., Cai, D., Redman, T., Clutter, D., Aydt,R., and Tcheng, D. (2003). Data to Knowledge (D2K). Technical report, Technical ReportAutomated Learning Group, National Center for Supercomputing Applications, University ofIllinois at Urbana-Champaign.

    16

    IntroductionMeandering into Data-Intensive ComputingData-flow execution enginesComponentsProgramming Paradigm

    Selectorecombinative Genetic AlgorithmsThe original algorithmPopulation vs. individualsComponentsThe data-intensive flowExperiments

    The Extended Compact Genetic AlgorithmAn eCGA overviewCostly model buildingComponents and the data-instensive flowExperiments

    ConclusionAcknowledgments