
EPCC Technical Report EPCC-TR-91-21

Genetic Set Recombination and its Application to

Neural Network Topology Optimisation

Nicholas J. Radcliffe

Edinburgh Parallel Computing Centre, University of Edinburgh, King’s Buildings, EH9 3JZ, Scotland

Abstract

Forma analysis is applied to the task of optimising the connectivity of a feed-forward neural network with a single layer of hidden units. This problem is reformulated as a multiset optimisation problem and techniques are developed to allow principled genetic search over fixed- and variable-size sets and multisets. These techniques require a further generalisation of the notion of gene, which is presented. The result is a non-redundant representation of the neural network topology optimisation problem together with recombination operators which have carefully designed and well-understood properties. The techniques developed have relevance to the application of genetic algorithms to constrained optimisation problems.

1 Introduction

While genetic algorithms have been applied to a number of problems in neural networks, there are severe difficulties with this endeavour. This is significant for two reasons. First, many of the problems in neural networks are important in their own right and do not presently have any wholly satisfactory means of resolution. A good example of this is the choice of network topology. Secondly, the failure modes of the genetic algorithm seen in neural network applications are common to a broader class of problems, and their study can yield more general insights.

This paper is a study in the application of forma analysis (Radcliffe [22, 23]) to this and related problems. It begins in section 2 with a brief review and a discussion of the difficulties with previous genetic approaches to problems in neural networks. This is followed, in section 3, by a short review of schema and forma analysis and a discussion of the “permutation problem” for neural networks. (Although the paper is intended to be self-contained, the reader may find it easier to follow after reading Radcliffe [23].)

The core of the paper, sections 4, 5 and 6, is a study of the application of forma analysis to optimisation problems for which the solution is a set or multiset. Both fixed- and variable-size sets and multisets are considered. Section 4 reviews naïve approaches to these problems, section 5 uses forma analysis to gain further insights and to develop a more satisfactory formulation of the problem, and section 6 is a study of the general phenomenon of “non-separability” of formae. The reason for studying these problems is that in section 7 neural network topology optimisation problems are reformulated as multiset optimisation problems, and the theory developed in the preceding sections becomes directly applicable. Section 7 also includes a discussion of sub-parameter-level recombination with particular reference to hidden nodes.

The paper closes with a summary and discussion of the results presented, and draws out some of the wider implications for genetic search in other domains.

2 Genetic Approaches to Neural Networks

Genetic algorithms are increasingly being applied to problems in neural networks. Rudnick [24] and Weiss [34] produced excellent bibliographies for this field in 1990. A number of approaches can be distinguished, all of which have had limited success, and most of which have concentrated on Rumelhart-type feed-forward networks. The two primary areas of activity have been:

1. Topology Optimisation. The genetic algorithm is used to select a topology (pattern of connectivity) for the network, which is then trained using some fixed training scheme, most commonly back-propagation of errors (Rumelhart [25]). This approach is inherently computationally demanding because the complete conventional training phase (itself computationally intensive) is required simply to evaluate the fitness of a chromosome (network topology). The approach remains reasonably attractive despite this because of the paucity of principled alternative methods for selecting the network topology. Representative studies in this class include those of Miller, Todd & Hegde [15], Harp, Samad & Guha [11, 10], Whitley, Starkweather & Bogart [39], Mühlenbein [18] and Hancock [9]. This class of problems is addressed in section 7 of the paper.

2. Genetic Training Algorithms. Selecting weights for a neural network is itself an optimisation problem,¹ and a genetic algorithm can naïvely be applied to it, using an inverse error as the measure of utility (fitness). Whitley and his co-workers [35, 39, 38] have done much work in this area, and the study by Montana and Davis [16] is especially ingenious and noteworthy.

Hybrid approaches have also been discussed (Radcliffe [21]), and there have been studies in which genetic algorithms have been used to tune the parameters of other training schemes, including initial weight configurations (Belew, McInerney & Schraudolph [1]).

All of these approaches have associated problems, which have been discussed by Montana & Davis [16], Radcliffe [20, 21], Belew et al. [1] and Whitley et al. [37]. Principal among these, and appearing in many different guises, is a permutational redundancy associated with the arbitrariness of labels of topologically equivalent hidden nodes. Specifically, to take an extreme case, in a fully-connected, feed-forward, layered network with a single hidden layer comprising $N_h$ units (figure 1), there is a potential redundancy of approximately $N_h!$ associated with the indistinguishability of networks with relabelled hidden units.

¹ albeit one which often has a rather poorly-defined objective function

Figure 1: A fully-connected 5–3–4 layered neural network (input nodes i1 to i5, hidden nodes h1 to h3, output nodes o1 to o4). Notice that the hidden node labels h1, h2, h3 are arbitrary (provided that the hidden nodes are identical), giving, in this case, a 3! redundancy in labelling.

If the genetic representation (whether it be topologies or weights that form the search space) distinguishes between networks which differ only by the labelling of hidden units, the search space is enormously enlarged. While optima usually become more numerous by a comparable factor, the global nature of genetic search tends to make navigation through the enlarged search space very difficult (Radcliffe [20, 21], Belew et al. [1], Whitley et al. [37] and section 3.2). Genetic algorithms are sensitive to the potential for redundant representations in a way that most other search schemes (for example, gradient techniques and stochastic processes like simulated annealing) are not. There are two (related) reasons for this:

1. local techniques tend to make ‘smaller’ moves in the search space than those possible under genetic recombination (“crossover”);

2. most techniques maintain only a single solution rather than a population of solutions.

The relationship between these two points should be clear: the danger is exemplified by the case where two equivalent networks (identical up to a re-labelling of hidden units) can be recombined to produce a child which is not equivalent to them. This is a phenomenon which is not seen in “conventional” genetic search (for example, simple parameter optimisation), and it has been strongly argued elsewhere that the problem is highly detrimental to the effectiveness of the search process, both in the specific context of neural networks (Radcliffe [20, 21], Belew et al. [1]) and more generally (Radcliffe [22, 23]).
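The danger described above can be made concrete with a small sketch. The representation used here is an assumption made purely for illustration (a chromosome is a tuple holding one connection string per labelled hidden unit, and `canonical` and `one_point_crossover` are hypothetical helper names); it is not the encoding of any of the studies cited.

```python
from itertools import permutations

def canonical(chromosome):
    """Label-free view of a network: its hidden nodes in sorted order."""
    return tuple(sorted(chromosome))

def one_point_crossover(mum, dad, point):
    """Conventional one-point crossover, cutting between hidden units."""
    return mum[:point] + dad[point:]

# Each string lists one hidden unit's connections to 5 inputs and 4 outputs.
net = ("110000011", "001100100", "000011000")
relabelled = (net[1], net[2], net[0])   # the same network, hidden units relabelled

relabellings = set(permutations(net))   # every labelling of this one network
child = one_point_crossover(net, relabelled, 1)

print(len(relabellings))                        # number of equivalent chromosomes
print(canonical(net) == canonical(relabelled))  # the parents are the same network
print(canonical(child) == canonical(net))       # the child is a different network
```

In this sketch the child duplicates one hidden unit and loses another, even though both parents encode the same topology.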

3 Schema and Forma Analysis

3.1 Formulation and Principles

In order to understand the motivations for the ideas put forward in this paper it is necessary briefly to review some of the theory of genetic algorithms.

Holland’s ground-breaking formulation and analysis of genetic algorithms introduced the theoretical framework of schema analysis, and the well-known (if often poorly-expressed) Schema Theorem (Holland [12]). This formulation applies primarily to k-ary² string chromosomal representations for which each locus (site) on the chromosome has a well-defined meaning.³

In 1985 Goldberg & Lingle [8] extended Holland’s work to cover permutation-based problems (such as the well-known Travelling Sales-rep Problem) through the introduction of o-schemata (see also Goldberg [5]). More recently, Vose & Liepins [32, 33] and Radcliffe [22, 23] have independently further generalised Holland’s results to take in very much more general objects, which Vose calls predicates and Radcliffe terms formae. This paper uses and builds upon Radcliffe’s formulation.

A schema may be viewed as a set of chromosomes which share some specified subset of their genes. Holland introduced a “don’t care” symbol to aid the description of schemata, so that the schema 1□0 is the set of all chromosomes which have a one at their first locus and a zero at their third locus. A forma, similarly, may be viewed as a set of chromosomes which are related by some (any) specific characteristic: this need not be the sharing of gene values.⁴ It is convenient to regard both schemata and formae as equivalence classes of solutions under given (often implicit) equivalence relations over the representation space C (the set of chromosomes).
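A minimal sketch of these two views follows. The don't-care symbol is written □ here, which is an assumption about notation, and the helper names and the "number of ones" relation are hypothetical illustrations rather than formae proposed in the paper.

```python
from itertools import product

DONT_CARE = "□"  # assumed symbol for "don't care"

def in_schema(chromosome, schema):
    """True if the chromosome agrees with the schema at every specified locus."""
    return len(chromosome) == len(schema) and all(
        s == DONT_CARE or s == c for c, s in zip(chromosome, schema)
    )

def schema_members(schema, alphabet="01"):
    """Enumerate the chromosomes (as strings) that belong to a schema."""
    return ["".join(c) for c in product(alphabet, repeat=len(schema))
            if in_schema(c, schema)]

def equivalence_classes(chromosomes, key):
    """Group chromosomes into the classes induced by an equivalence relation,
    where two chromosomes are equivalent when key() gives the same value."""
    classes = {}
    for c in chromosomes:
        classes.setdefault(key(c), []).append(c)
    return classes

all_strings = ["".join(c) for c in product("01", repeat=3)]
print(schema_members("1□0"))
# A relation that does not amount to sharing gene values: "same number of ones".
print(equivalence_classes(all_strings, key=lambda c: c.count("1")))
```

The second call shows classes that cannot be written as schemata, which is the sense in which a forma need not correspond to shared gene values.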

One of the central tenets of forma analysis is that formae should be chosen which group together chromosomes coding solutions which might plausibly have similar performance. Having chosen such formae, genetic operators are constructed with a view to manipulating solutions⁵ in meaningful ways. Specifically, the aim is to build recombination operators which respect forma membership and properly assort formae (Radcliffe [22, 23]). These terms are explained and illustrated in figures 2 and 3 respectively, and are defined more rigorously in section 6.1.

3.2 Application to Neural Networks

In addition to increasing the size of the search space, the permutation problem described in section 2 makes respect and proper assortment rather hard to ensure. These differing aspects of the permutation problem will be referred to as the numerical permutation problem and the navigational permutation problem respectively, and will now be considered in turn.

² base k, e.g. k = 2 gives binary, k = 8 octal etc.

³ Of course, the theorem applies to any string-based representation given suitable coefficients quantifying the disruptive effects of the genetic operators, but the observed schema averages on which the theorem crucially depends will have usefully low variance only (loosely) when the loci have well-defined meanings. This is discussed in detail in Radcliffe [21].

⁴ At least, not as genes are conventionally understood.

⁵ strictly, their chromosomal representatives

Figure 2: A recombination operator is said to respect a set of formae if, given any pair of chromosomes $\eta$ and $\zeta$, all of their children under recombination are members of all the formae to which both parents belong. The similarity set of $\eta$ and $\zeta$, written $\eta \oplus \zeta$, is the smallest forma which contains them both. This can be constructed as the intersection of all formae containing both parents. Respect amounts to the requirement that each child produced by recombination lies in the similarity set of its parents.

Figure 3: A key function of recombination is that it mixes features from parents so that children may exhibit traits inherited from each. Given two parents, $\eta$ and $\eta'$, which belong to formae $\xi$ and $\xi'$ respectively, it may be possible to generate children which are members of both $\xi$ and $\xi'$ (i.e. children which exhibit both the traits captured by the two formae). A recombination operator which will, with non-zero probability, produce a child in the intersection of arbitrary formae $\xi$ and $\xi'$ (given parents $\eta \in \xi$ and $\eta' \in \xi'$, and assuming that $\xi \cap \xi' \neq \emptyset$) is said properly to assort the formae under consideration. It should be noted that the requirements of respect (figure 2) and proper assortment are not always compatible, though for many sets of formae (including schemata) they are.

Throughout the paper a distinction will be made between a “true” search space, S, consisting of the actual structures under consideration (in the present case network topologies) and a representation space (or space of chromosomes) C. Assume that networks with $N_i$ input nodes, $N_o$ output nodes and up to $N_h$ hidden nodes are considered. Then the number of hidden node types is $2^{N_i+N_o}$, because the connection to each external node may be present or absent.

The number of network topologies (the size of S) is given by

$$|S| \approx \frac{(2^{N_i+N_o})^{N_h}}{N_h!}. \qquad (1)$$

(The approximation in this expression is that all the hidden nodes have different connectivities, justifying the $N_h!$ in the denominator. This approximation is good when the number of hidden node types vastly exceeds the maximum number of hidden units, which is almost always the case.)
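Equation (1) is easy to evaluate numerically. The following sketch (function names are hypothetical) computes the numerator of equation (1), which counts labelled bit-string encodings, and the approximate number of distinct topologies, for two small network sizes chosen for illustration.

```python
from math import factorial

def representation_size(n_in, n_out, n_hidden):
    """Numerator of equation (1): one bit per (external node, labelled hidden node) pair."""
    return (2 ** (n_in + n_out)) ** n_hidden

def search_space_size(n_in, n_out, n_hidden):
    """Approximate |S| from equation (1): the numerator divided by N_h! relabellings."""
    return representation_size(n_in, n_out, n_hidden) // factorial(n_hidden)

for n in (10, 11):
    labelled = representation_size(n, n, n)
    topologies = search_space_size(n, n, n)
    print(f"{n}-{n}-{n}: labelled ~ {labelled:.1e}, "
          f"|S| ~ {topologies:.1e}, redundancy ~ {factorial(n):.1e}")
```

Exact integer arithmetic is used throughout, so the only approximation is the one already present in equation (1).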

For example, if there are ten input nodes, ten output nodes and a maximum of ten hidden nodes,

$$|S| \approx \frac{(2^{20})^{10}}{10!} \approx \frac{1.6 \times 10^{60}}{3.6 \times 10^{6}} \approx 4 \times 10^{53}. \qquad (2)$$

The (naïve) representation space C consists of chromosomes which use one bit to mark the presence or absence of a connection between each external node and each labelled hidden node. The size of C is given by the numerator alone,

$$|C| = (2^{20})^{10} \approx 1.6 \times 10^{60}. \qquad (3)$$

While redundancy which expands the size of the search space by a factor of more than a million for even a problem of modest size may at first seem daunting, the reader may be tempted to reflect that the difference between overall sizes $10^{60}$ and $10^{54}$ seems rather less significant. This feeling may be reinforced by observing that as the size of the network increases, the rate of growth of the size of the search space (characterised by the numerator, $2^{(N_i+N_o)N_h}$) outstrips the rate of growth of the permutation problem (for likely values of node numbers $N_i$, $N_o$ and $N_h$), which grows only as $N_h!$. Thus, for example, increasing the size of the problem from a 10–10–10 network to an 11–11–11 network increases the size of S from $10^{54}$ to $10^{65}$.

Figure 4: A hidden node h, which can be described by the set of external nodes to which it is connected, {i1, i3, i4, o1, o3}. This can also, of course, be conveniently described by the binary string 101101010, where a one indicates the presence of a connection and a zero the absence, and where the nodes are laid out in order with the input nodes (i1 to i5) preceding the output nodes (o1 to o4).

Complacency, however, would be misplaced. For suppose that some formae were constructed in the true search space S but that the representation used were redundant in the way described (i.e. larger by a factor of around $N_h!$). Ensuring respect and assortment of these formae would be essentially impossible.

To see this, imagine that there are two beneficial sets of hidden nodes, where a hidden node is considered complete with its set of external connections (figure 4). Assume that one chromosome $\eta$ represents a network which contains the first “good” set of hidden units, and that another chromosome $\zeta$ codes another network containing the second beneficial set of units. Thinking of these sets of units as “building blocks”, the aim would be to bring the two together in a single chromosome by recombination. If, however, the hidden unit labels of the first beneficial set of nodes on $\eta$ overlap with the labels of the second set on $\zeta$, recombination will be unable to bring these two sets together, no matter how often it is applied⁶ (figure 5). At one level, this can be viewed simply as an example of the numerical permutation problem, for “all” that the genetic algorithm needs to do is to construct a similar chromosome $\zeta'$ which is like $\zeta$ but has node labels for the second “good” set of hidden units which do not overlap with those of the first set on $\eta$; in practice, the difficulty is worse.

A good way to see this is to consider an increasingly common way of implementing genetic algorithms, in which an isolated sub-population model is used in contrast to the traditional panmictic population. The isolated sub-population model consists of a number of genetic algorithms each tackling the same search problem with separate populations. There is usually occasional migration of solutions between sub-populations. This approach is important both because of its convenience for parallel implementations and because the maintenance of isolated sub-populations is helpful in delaying loss of genetic diversity, encouraging “speciation” and reducing the number of evaluations required to find an acceptable solution (Norman [19], Mühlenbein [17] etc.).

⁶ That is, the problem is not merely one of proper assortment, for the formae cannot be weakly assorted either (Radcliffe [23]).

Figure 5: The hidden layers from two networks, N1 and N2 (each with hidden nodes labelled h1 to h5), the first of which can be trained to recognise the first half of a training set reliably, and the second of which can reliably be trained to classify the second half. If the label of the hidden node h3 on the second network were changed to h2, recombination would be possible without disturbing either subnetwork, whereas this is impossible with the fixed but arbitrary hidden node labels shown.

Consider the relatively likely situation in which $\eta$ is drawn from a population on one processor which has largely converged on network topologies which include the first beneficial set of hidden units, and $\zeta$ is drawn from a different population in which the second group of units is common. The clash of node labellings will ensure that no matter how often chromosomes from the two populations are brought together, they will be unable to be recombined to produce a solution which contains both the beneficial sets of units. This is the navigational permutation problem. This difficulty could, of course, arise in a similar way in the panmictic population model if the population contained two “species”, each possessing one of the two beneficial sets of nodes, provided that the fitnesses of the two species were such that neither would be quickly destroyed by the population dynamics.

The foregoing discussion should convince the reader that the permutation problem is a serious impediment to genetic search. It is not isolated to neural network problems, though this is the present focus.⁷ A major aim of this paper is to suggest a possible representation for neural network topology optimisation which avoids permutational redundancy and allows formae defined in a way independent of hidden unit labels (that is, formae which are well-defined in S, as well as C) to be respected and properly assorted. This formulation builds on the idea of regarding hidden units complete with their external connections as basic entities, and views a network topology as a collection of such hidden nodes. This is the reason for the concentration on set and multiset optimisation problems in the following sections.

4 Sets, Multisets and Formae

4.1 Preliminaries

In order to approach the problem of optimising the topology of a neural network with a genetic algorithm, it is useful first to consider set and multiset optimisation problems, which will form the solution framework. Recall that the distinction between a set and a multiset is that duplication of elements is not significant in sets, so that

$$\{a, a, b\} \equiv \{a, b\} \qquad (4)$$

whereas in multisets an element may appear more than once:

$$\{|a, a, b|\} \neq \{|a, b|\}. \qquad (5)$$

(The notation $\{|\cdots|\}$ is used to indicate a multiset.) The difference is significant in this context.
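The distinction in equations (4) and (5) can be checked directly in Python, taking the built-in `set` as a model of a set and `collections.Counter` as a model of a multiset (a modelling choice made here for illustration).

```python
from collections import Counter

# Sets: duplication of elements is not significant (equation 4).
as_set_1, as_set_2 = {"a", "a", "b"}, {"a", "b"}

# Multisets: the number of copies of each element matters (equation 5).
as_multiset_1, as_multiset_2 = Counter(["a", "a", "b"]), Counter(["a", "b"])

print(as_set_1 == as_set_2)            # the two sets are identical
print(as_multiset_1 == as_multiset_2)  # the two multisets are not
print(as_multiset_1["a"])              # multiplicity of "a" in the first multiset
```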

A number of different set and multiset optimisation problems may be distinguished. In general there will be a “universal set”, E, from which elements are drawn. The aim is to construct a set or multiset consisting of elements drawn from this universal set so as to optimise some property of the resulting set or multiset. Examples could include

1. finding locations for bottle banks so as to maximise recycling in some area;

2. selecting members of a committee to make an environmental impact assessment;

3. choosing connections in a neural network to minimise its average learning time to some acceptable error;

4. choosing connections in a neural network to maximise its generalisation capability.

⁷ An even more pernicious form of the permutation problem is seen when graphs are being optimised for some property, for in this case there is generally a permutational redundancy of order n!, where n is the number of nodes in the graph, and there is not normally an analogue of the fixed “external” units.

The first could be a set or multiset optimisation problem, according to whether multiple bottle banks were to be allowed at a single location or not. It is likely, for practical purposes, that the number of bottle banks would be fixed (and that increasing this number would increase the potential for recycling) so that the size of the solution set would be known at the outset.

The second case is certainly a set rather than a multiset problem, since no human can appear more than once on a committee, and though the size may be known beforehand (perhaps because of budgetary constraints) it could be that it formed part of the optimisation. (The more people on a committee, in general, the longer decisions take to agree, and there are people whom it may be desirable not to have present or who could contribute nothing useful anyway.)

In the case of the two neural network problems, the ideal number of connections may artificially be fixed beforehand, but in general will not be known and will itself be subject to optimisation. (It should be remembered that while, in principle, the presence of a connection should never be a problem since its strength (weight) can always be set to zero, in practice a given learning scheme may well be hindered by the presence of a connection.) The trade-off is particularly acute in the case where the goal is to maximise the generalisation capability of the network. In this case, too few connections can prevent acceptable learning, while too many will tend to hinder generalisation through the phenomenon of over-learning.⁸

There may also be other complications, such as constraints on the sets (there must be at least three bottle banks in the Prime Minister’s constituency, the committee should not include arch-rivals etc.), but these will not be considered in this paper. Thus four classes of set optimisation problems will be considered: fixed-size sets, variable-size sets, fixed-size multisets and variable-size multisets.

4.2 Fixed-Size Multisets

In the case of a fixed-size multiset, the naïve approach is to use a k-ary representation, where k = |E|, the size of the universal set, to allow each locus on a conventional linear chromosome to take any allele from E, and to proceed as normal with conventional genetic operators. The problems with this are obvious and profound.

⁸ In fact there is evidence that for learning schemes like back-propagation it may be desirable to train with a net of relatively high connectivity and then to prune nodes with highly correlated firing patterns (Sietsma & Dow [28], Burkitt [3]). This complication will not be considered in this paper.

1. There is a huge redundancy in the representation, i.e. the representation space C is much larger than the real search space S because of the different orders in which the members of the multiset may be written. While the number of optima is also, in general, increased dramatically (though not necessarily by the same factor⁹), navigation through this larger search space may be very difficult.

2. More specifically, respect of “meaningful” formae will be difficult to ensure. “Meaningful” formae must group chromosomes only according to properties of the solutions they encode: they should not distinguish between two different representations of a single solution. Thus, formae for sets or multisets should correspond to well-defined sets of solutions (sets or multisets) in S, and not be defined only in the representation space C. To see that conventional (one-point) crossover cannot respect any set of formae thus defined in the context of multisets, simply note that a trivial consequence of respect is that crossing a solution with itself should result in the same solution; applying conventional crossover to chromosomes ab and ba could result in aa or bb.

3. If E is large, this representation has a high cardinality, which is traditionally not favoured. This aspect is discussed in section 7.1.
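The first two problems can be illustrated with a short sketch, assuming chromosomes are Python strings over the universal set and using hypothetical helper names. It reproduces the ab and ba example from the second point and counts how many orderings represent a given multiset.

```python
from collections import Counter
from math import factorial, prod

def one_point_crossover(mum, dad, point):
    """Conventional one-point crossover on linear chromosomes."""
    return mum[:point] + dad[point:]

def as_multiset(chromosome):
    """The solution a chromosome encodes: its alleles, ignoring order."""
    return Counter(chromosome)

def orderings(chromosome):
    """Number of distinct chromosomes that encode the same multiset."""
    counts = as_multiset(chromosome).values()
    return factorial(sum(counts)) // prod(factorial(c) for c in counts)

mum, dad = "ab", "ba"
children = {one_point_crossover(mum, dad, 1), one_point_crossover(dad, mum, 1)}

print(as_multiset(mum) == as_multiset(dad))  # both parents encode the same multiset
print(sorted(children))                      # neither child encodes that multiset
print(orderings("aaaa"), orderings("abcd"))  # redundancy depends on the solution
```

The last line shows why the number of optima need not grow by the same factor as the representation space: the redundancy varies with the multiplicities in the solution.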

Whitley [36] has taken essentially this approach to searching for a winning hand in a simplified form of poker, but he additionally used an operator which reversed the sequences of arbitrary portions of the chromosome.¹⁰ This does not solve the fundamental problems (1 and 2), though it did enable him to find his chosen global optimum very easily. This was an artifact arising from the fact that the optimum happened to be five aces, a pattern which is easy to produce using this form of “inversion”. This is discussed in detail in chapter 6 of Radcliffe [21].

4.3 Variable-Size Sets and Multisets

Variable-size sets and multisets can be dealt with in a more traditional fashion with fewer problems. In this case each locus on the chromosome can correspond to an element of the universal set E from which elements are to be drawn. The gene values (alleles) can then indicate the number of copies of the element in question to be included in the multiset. In this case, binary genes correspond to set optimisation, and higher cardinality representations correspond to multisets.

⁹ For example, optimising over a multiset of size n, if the solution consists of n copies of a single element, there is only one representation of the optimum, but there are n! different representations of solutions in which every element is different.

¹⁰ It is important to note that this is not inversion in the traditional sense introduced by Holland [12].

This approach is both simple and traditional, and the representation scheme described contains no redundancy. It does, however, perhaps require the problem to be viewed in a slightly unconventional way. Thus in the four problems listed above, the positions on the chromosome would represent different possible locations for the bottle banks, the different possible people on the committee, and the connections in the network. In some cases this will lead to very long chromosomes, though this is not necessarily problematic.
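A minimal sketch of this representation follows, assuming a small hypothetical universal set and hypothetical `encode` and `decode` helpers. Each locus corresponds to one element of the universal set and each allele is the number of copies of that element.

```python
from collections import Counter

UNIVERSE = ("a", "b", "c", "d")   # hypothetical universal set E

def encode(multiset):
    """Chromosome with one locus per element of E; alleles are copy counts."""
    return tuple(multiset[e] for e in UNIVERSE)

def decode(chromosome):
    """Recover the multiset that a count chromosome represents."""
    return Counter({e: n for e, n in zip(UNIVERSE, chromosome) if n > 0})

def one_point_crossover(mum, dad, point):
    """Conventional one-point crossover on count chromosomes."""
    return mum[:point] + dad[point:]

mum = encode(Counter(["a", "a", "b"]))
same = encode(Counter(["b", "a", "a"]))   # same multiset, written in another order
dad = encode(Counter(["c", "d"]))
child = one_point_crossover(mum, dad, 2)

print(mum == same)      # one chromosome per multiset: no redundancy
print(child)            # always a valid chromosome
print(decode(child))    # but the size of the solution may change
```

Every chromosome decodes to exactly one multiset, so ordinary crossover cannot create the ordering problems of the fixed-size scheme, at the cost of not controlling the size of the result.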

Roughly this approach to neural network topology optimisation was taken by Miller, Todd & Hegde [15]. In this case, however, there is a further complication already mentioned, which is the equivalence of different hidden nodes under re-labelling. Thus while Miller et al. directly manipulated the binary connection matrices for the neural network, a connection $c_{ij} \in \{0, 1\}$ between nodes i and j contributes to redundancy in the representation if either i or j is a hidden node. This problem will be reconsidered in section 7.

4.4 Fixed-Size Sets

Fixed-size sets present more of a problem for traditional schemes. The (wholly inadequate) approach described for dealing with fixed-size multisets could be used if the recombination operator were altered to ensure that multiple copies of elements were never generated. This could be fairly easily achieved if the chromosomes were sorted, though the resulting recombination operator would have to be carefully designed to ensure that it was unbiased.

Similarly, the approach described for variable-sized sets and multisets could be adopted for fixed-size sets, but with the additional constraint that the sum of the genes should be the number of elements in the set. This could be ensured in a number of slightly unprincipled ways, including random ‘helpful’ mutations after crossover.

The approach described in the next section obviates the need for such manipulation, and can be extended to deal with the other classes of set and multiset optimisation problems discussed.

5 Set and Multiset Recombination

5.1 Random Respectful Recombination

Radcliffe [23] introduced a class of random, respectful recombination ($R^3$) operators, which are defined with respect to specific sets of formae and which are guaranteed both to respect and properly to assort those formae whenever these two conditions are compatible.¹¹ The $R^3$ operator simply makes a uniform, random choice over the members of the similarity set (figure 2) of the two parents undergoing recombination.

It is easy to see that $R^3$ fulfils these claims. It plainly respects the formae since respect amounts precisely to the condition that all children be members of the similarity set of their parents (figure 2). The requirement that respect and proper assortment be compatible is therefore precisely the requirement that the solutions which proper assortment requires be capable of generation by the recombination operator lie in the similarity set of the two parents. Thus any operator which chooses every element from the similarity set with some non-zero probability, and never generates any other, must respect and properly assort the formae, provided that this is possible.¹²

In the light of this, the essential requirement is to construct suitable formae for set and multiset optimisation problems. The four classes of set and multiset problems identified in section 4.1 will now be discussed in turn, suitable formae for them will be suggested and the $R^3$ operator and others will be constructed.

5.2 Fixed-Size Sets (Attempt I)

The most obvious definition of formae for set problems (whether fixed-size or otherwise) is that they specify elements which the solution must contain. This is very convenient, but requires rather careful notation to avoid subtleties.

Let the universal set (from which all elements are drawn) be $\mathcal{E}$. Then, assuming that the set size is fixed to be $N$, the search space $\mathcal{S}$ is a subset of the power set¹³ $\mathcal{P}(\mathcal{E})$. Specifically,
$$\mathcal{S} = \{\, \eta \subseteq \mathcal{E} \mid |\eta| = N \,\}. \tag{6}$$

¹¹ A set of formae which can be simultaneously respected and properly assorted is said to be separable, and a recombination operator which achieves this is said to separate the formae.

¹² This is not the same as saying that every operator which separates a set of formae must be capable of generating every solution in the similarity set of the two parents; the condition is sufficient but not necessary.

¹³ The power set of a set $A$ is the set of all subsets of $A$.

Figure 6: The universal set $\mathcal{E}$ (top) contains, in this example, three elements. The power set $\mathcal{P}(\mathcal{E})$ of $\mathcal{E}$ is the set of all subsets of $\mathcal{E}$: each subset is also drawn in the top figure. The search space $\mathcal{S}$ is identical to the power set, and is illustrated in the lower figure. The forma $\xi$ has the description set $\langle\xi\rangle = \{a\}$, as shown. This forma consists of all the sets in $\mathcal{S}$ which contain $\langle\xi\rangle$, as is shown in the bottom figure.

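A forma $\xi$ is then a subset of $\mathcal{S}$, and the set $\Xi$ of all formae is subject to
$$\Xi \subseteq \mathcal{P}(\mathcal{S}). \tag{7}$$
More precisely, $\Xi$ will be defined by
$$\Xi = \{\, \xi \subseteq \mathcal{S} \mid \exists\, \mathcal{E}_\xi \subseteq \mathcal{E} : (\eta \in \xi \iff \mathcal{E}_\xi \subseteq \eta) \,\}. \tag{8}$$
This says that every forma $\xi$ has an associated description set $\mathcal{E}_\xi$. This description set consists of the members of $\mathcal{E}$ which a solution must contain in order to be an instance of the forma $\xi$. It will be convenient to use the notation $\langle\xi\rangle$ for the description set $\mathcal{E}_\xi$. Figure 6 illustrates the general idea.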

An example of a low precision¹⁴ forma $\xi$ with the description set $\langle\xi\rangle = \{a\}$ is
$$\xi = \{\, \eta \in \mathcal{S} \mid \eta \supseteq \{a\} \,\}. \tag{9}$$

¹⁴ The precision of a forma is similar to the order of a schema, and is defined in Radcliffe [22, 23]. Essentially, high precision formae are small (contain few members) and low precision formae are large (contain many members). The precision of a schema is $2^o$, where $o$ is the order of the schema.

Notice that the description set of the intersection of two formae is the union of their description sets,
$$\langle \xi \cap \xi' \rangle = \langle\xi\rangle \cup \langle\xi'\rangle. \tag{10}$$
Recall that the similarity set of two solution sets is the intersection of all formae which contain them. It should be clear that this, the smallest forma containing solution sets $\eta$ and $\zeta$, is the forma whose description set is their intersection:
$$\langle \eta \oplus \zeta \rangle = \eta \cap \zeta. \tag{11}$$
Recall also that, quite generally, the $R^3$ operator produces a child by randomly selecting a member from the similarity set of the two parents. Thus, in this case, given two parents $\eta$ and $\zeta$, $R^3$ chooses a random set which contains their intersection. For example, if
$$\mathcal{E} = \{a, b, c, d, e, f\} \tag{12}$$
and $N = 3$, the similarity set of $\{a, b, c\}$ and $\{a, d, e\}$ is described by
$$\langle \{a, b, c\} \oplus \{a, d, e\} \rangle = \{a, b, c\} \cap \{a, d, e\} = \{a\} \tag{13}$$
so that
$$\{a, b, c\} \oplus \{a, d, e\} = \bigl\{ \{a, b, c\}, \{a, b, d\}, \{a, b, e\}, \{a, b, f\}, \{a, c, d\}, \{a, c, e\}, \{a, c, f\}, \{a, d, e\}, \{a, d, f\}, \{a, e, f\} \bigr\}. \tag{14}$$
Thus $R^3$ applied to $\{a, b, c\}$ and $\{a, d, e\}$ picks one of these ten sets, each with probability one tenth.
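The example can be checked by direct enumeration. The following is a minimal sketch, assuming that solutions are held as Python sets and that both parents already have the fixed size; the names `similarity_set` and `r3` are illustrative only.

```python
import random
from itertools import combinations

def similarity_set(parent1, parent2, universe, size):
    """All subsets of `universe` with `size` elements that contain parent1 & parent2."""
    common = parent1 & parent2
    free = sorted(universe - common)
    return [frozenset(common) | frozenset(extra)
            for extra in combinations(free, size - len(common))]

def r3(parent1, parent2, universe, size, rng=random):
    """Random respectful recombination: a uniform choice from the similarity set."""
    return rng.choice(similarity_set(parent1, parent2, universe, size))

universe = set("abcdef")
members = similarity_set(set("abc"), set("ade"), universe, 3)
```

Here `members` holds the ten sets listed above, including sets such as `{a, b, f}` whose element `f` is held by neither parent.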

This operator may seem a little odd, in that it can produce a solution set containing an element which neither of the parent sets contains: this is addressed in section 5.4. Of more immediate concern is the observation that $R^3$ fails properly to assort the formae as specified. To see this, simply observe that $\{a, b, c\}$ is a member of the forma described by $\langle\xi\rangle = \{b, c\}$, and $\{a, d, e\}$ is a member of the forma described by $\langle\xi'\rangle = \{d\}$, but that $R^3$ cannot produce a member of the intersection of these formae $\xi \cap \xi'$, because $\langle \xi \cap \xi' \rangle = \{b, c, d\}$, and $R^3$ will always pick a member of the similarity set given in equation 14. With $N = 3$ the only member of $\xi \cap \xi'$ is $\{b, c, d\}$ itself, which does not contain $a$ and so does not appear in equation 14. This is not a failing of the operator, but rather reflects the fact that the formae $\Xi$ are not separable, i.e. they cannot simultaneously be respected and properly assorted. The general problem of non-separable formae is discussed in section 6.

5.3 Variable-Size Sets

The non-separability of the formae encountered in the consideration of fixed-size sets in the previous section can be seen to be the direct result of the restriction to fixed size. All of the definitions of the previous section carry over to the case of variable-size sets with the exception of the definition of the search space (equation 6), which is replaced by
$$\mathcal{S} = \mathcal{P}(\mathcal{E}). \tag{15}$$
This changes the similarity sets and consequently the random, respectful recombination operator $R^3$. Considering the “same” example as before, the description set for $\{a, b, c\} \oplus \{a, d, e\}$ is unchanged (equation 13),
$$\langle \{a, b, c\} \oplus \{a, d, e\} \rangle = \{a, b, c\} \cap \{a, d, e\} = \{a\}.$$
The similarity set itself is, however, quite different,
$$\{a, b, c\} \oplus \{a, d, e\} = \{\, \eta \subseteq \mathcal{E} \mid a \in \eta \,\}, \tag{16}$$
where there is now no restriction on the size of $\eta$. To verify that proper assortment is satisfied if a random member of this similarity set is selected, simply note that the union of the two parents is always a member of the similarity set and that the intersection of any pair of formae, one containing each parent, contains this union.

Thus variable-size sets are simpler than their fixed-size counterparts and have separable formae, ensuring that the $R^3$ operator both respects and properly assorts them. In common with the fixed-size case, however, children may be produced which contain elements which belong to neither of the parents. This is addressed next.
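For variable-size sets the similarity set never needs to be enumerated. In the sketch below, which assumes solutions are Python sets and uses the illustrative name `r3_variable_set`, every element outside the parents' intersection is included independently with probability one half; this makes each of the $2^k$ members of the similarity set equally likely, where $k$ is the number of such elements.

```python
import random

def r3_variable_set(parent1, parent2, universe, rng=random):
    """Uniform choice among all subsets of `universe` that contain parent1 & parent2."""
    common = parent1 & parent2
    child = set(common)
    # Each remaining element of the universal set is a free, fair coin toss.
    for x in sorted(universe - common):
        if rng.random() < 0.5:
            child.add(x)
    return child
```

With the parents `{a, b, c}` and `{a, d, e}` over a six-element universal set there are 32 possible children, all containing `a`; the union of the parents is among them, and so are sets containing `f`.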

5.4 Gene Transmission and Basic Formae

In response to situations like the ones above, in which $R^3$ succeeds in respecting formae, and (if they are separable) in properly assorting them, but generates solutions which bear rather less relation to their parents than might be deemed desirable, the concept of a complete orthogonal basis and a formal concept of gene were introduced in Radcliffe [23]. These ideas will now be re-examined in the context of the examples above.

In their original conception, formae were introduced as equivalence classes induced by arbitrary equivalence relations over the search space¹⁵ $\mathcal{S}$ (Radcliffe [22, 23]). Thus, the idea was to try to choose equivalence relations which would group solutions into equivalence classes which might reasonably be expected to contain solutions with correlated (similar) performance. In this way, the formae arose as secondary objects, induced by equivalence relations. It is important to note that more than one equivalence relation is used at a time in this analysis, which is slightly unusual and constitutes a potential source of confusion.

¹⁵ In practice formae tend to be defined over the representation space $\mathcal{C}$ of chromosomes: the distinction, though important, is not of great relevance to this discussion.

It should be clear that regarding formae as equivalence classes of equivalence relations is not a restriction on their generality, since a forma representing an arbitrary subset $\xi$ of the search space $\mathcal{S}$ can be induced by constructing an equivalence relation $\sim$ according to the rule
$$\eta \sim \zeta \iff (\eta, \zeta \in \xi \ \text{or}\ \eta, \zeta \notin \xi). \tag{17}$$
Traditional schemata over strings of length $n$ can be viewed as equivalence classes of equivalence relations described by members of
$$\Psi = \{\blacksquare, \square\}^n. \tag{18}$$
Here $\square$ is the traditional “don’t care” character introduced by Holland [12], and $\blacksquare$ is a “care” character.¹⁶ An equivalence relation from this set then relates those chromosomes which agree (have a common allele value) at every position in which the description of the equivalence relation has the “care” character $\blacksquare$; positions in which the equivalence relation has $\square$ are not considered. Thus the equivalence relation $\blacksquare\square\blacksquare\square$, defined for a binary representation, has four equivalence classes (formae), $0\square0\square$, $0\square1\square$, $1\square0\square$ and $1\square1\square$, which are, of course, ordinary schemata.

Notice that the similarity set of two chromosomes defined with respect to schemata is the schema which has the “don’t care” symbol $\square$ at every position at which the two chromosomes disagree, and their common value at each remaining locus, so that, for example,
$$1010 \oplus 1001 = 10\square\square. \tag{19}$$
It can be seen, therefore, that the $R^3$ operator defined with respect to schemata, in the case of binary chromosomes, reduces to precisely the familiar uniform crossover operator¹⁷ (e.g. Syswerda [30]). In the case of $k$-ary genes with $k > 2$, however, $R^3$ makes a random choice over the whole allele set for genes at which the two parents disagree: this may not be desirable. The introduction of the notion of a complete orthogonal basis for a set of equivalence relations allows this possibility to be disbarred.
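The schema case is easy to illustrate. The sketch below is an assumption-laden illustration rather than code from the report: chromosomes are Python strings, the “don’t care” symbol is written `*`, and the names `similarity_schema` and `r3_strings` are hypothetical.

```python
import random

DONT_CARE = "*"  # stands in for the "don't care" character in this sketch

def similarity_schema(parent1, parent2):
    """Schema with the common allele where the parents agree and '*' elsewhere."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    return "".join(a if a == b else DONT_CARE for a, b in zip(parent1, parent2))

def r3_strings(parent1, parent2, alleles="01", rng=random):
    """Uniform random member of the similarity schema of the two parents."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    return "".join(a if a == b else rng.choice(alleles)
                   for a, b in zip(parent1, parent2))
```

For binary strings the two alleles available at a disagreeing locus are exactly the parents' alleles, so `r3_strings` behaves as uniform crossover. With `alleles="012"` a child of `0120` and `0220` may carry allele `0` at the second locus, which neither parent holds there.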

The concept of a basis is rather simple, and is motivated by familiar definitions from the algebra of linear spaces. The idea is that the equivalence relations with only a single defining position (i.e. only one $\blacksquare$ symbol) can be used to generate all the higher-order equivalence relations. The definitions that follow will allow the set
$$E = \{\blacksquare\square\square\square,\ \square\blacksquare\square\square,\ \square\square\blacksquare\square,\ \square\square\square\blacksquare\} \tag{20}$$
to be interpreted as a basis for the equivalence relations which induce schemata (equation 18) with $n = 4$. Intersection of equivalence relations will be defined in a natural way (see figure 7) so that, for example:
$$\blacksquare\square\square\square \cap \square\blacksquare\square\square = \blacksquare\blacksquare\square\square. \tag{21}$$
The basic equivalence relations in $E$ can then be identified as genes, and the basic formae (equivalence classes) as alleles. Having made these identifications, it is possible to insist that in general, as in the familiar case, each of a child’s genes be inherited from one or other parent. That is, the child should be in the same basic forma as one of its parents for each of the basic equivalence relations in $E$. This principle is called strict transmission of genes (Radcliffe [23]). If $R^3$ is modified to obey this principle, yielding the inheritance crossover operator, then uniform crossover is recovered for $k$-ary string representations with $k > 2$. It is important to appreciate that the purpose of this rather laborious construction of a simple operator is that the construction is valid for any set of formae induced by a set of equivalence relations for which a complete orthogonal basis can be found.

Figure 7: The set of formae induced by the equivalence relations $\psi$, $\psi'$ and $\psi \cap \psi'$. The formae induced by $\psi \cap \psi'$ are intersections of those induced by $\psi$ and $\psi'$.

¹⁶ Similar notation can be used to describe Walsh partitions in the analysis of deception. See Goldberg [6].

¹⁷ Strictly, uniform crossover is parameterised by the probability of drawing each gene from the “first” parent: in this paper this probability is always assumed to be 0.5 unless an explicit statement is made.
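For string representations, strict transmission of genes can be sketched as follows. The code assumes chromosomes are equal-length Python sequences with one gene per position; the names `inheritance_crossover` and `strictly_transmits` are illustrative and not taken from the report.

```python
import random

def inheritance_crossover(parent1, parent2, rng=random):
    """Take each gene from one parent or the other with equal probability."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same length")
    return [rng.choice((a, b)) for a, b in zip(parent1, parent2)]

def strictly_transmits(child, parent1, parent2):
    """True if every allele of the child is held by a parent at the same locus."""
    return all(c in (a, b) for c, a, b in zip(child, parent1, parent2))
```

For the 3-ary parents `[0, 1, 2, 0]` and `[0, 2, 2, 1]`, every child produced this way passes `strictly_transmits`, whereas the string `[0, 0, 2, 0]`, which lies in the parents' similarity set, does not.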

A more rigorous formulation of complete orthogonal basis than the foregoing is now presented, based closely on Radcliffe [23].

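First, intersection is defined for equivalence relations. For these purposes an equivalence relation $\sim$ is conveniently described by a binary function
$$\psi : \mathcal{C} \times \mathcal{C} \longrightarrow \{0, 1\} \tag{22}$$
which returns 1 if its arguments are equivalent and 0 if they are not:
$$\psi(\eta, \zeta) = \begin{cases} 1, & \text{if } \eta \sim \zeta, \\ 0, & \text{otherwise.} \end{cases} \tag{23}$$
The intersection of two equivalence relations $\psi, \psi' \in \Psi$ can then be defined by
$$(\psi \cap \psi')(\eta, \zeta) = \begin{cases} 1, & \text{if } \psi(\eta, \zeta) = \psi'(\eta, \zeta) = 1, \\ 0, & \text{otherwise.} \end{cases} \tag{24}$$
Given this, a subset $E \subseteq \Psi$ will be said to constitute a complete orthogonal basis for $\Psi$ provided that

• (Completeness) All relations $\psi \in \Psi$ can be constructed as the intersection of some subset of the basic relations:
$$\forall \psi \in \Psi\ \ \exists\, E_\psi \subseteq E : \bigcap E_\psi = \psi. \tag{25}$$

• (Orthogonality) Every forma $\xi$ induced by every basic relation $\psi \in E$ is compatible with every forma $\xi'$ induced by every other basic relation $\psi' \in E$:
$$\forall \psi, \psi' \in E\ (\psi \neq \psi')\ \ \forall \xi \in [\psi]\ \ \forall \xi' \in [\psi'] : \xi \cap \xi' \neq \emptyset, \tag{26}$$
where $[\psi]$ is the set of equivalence classes (formae) induced by $\psi$.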

The relationship between these definitions and their counterparts in linear algebra should be clear. The notion of completeness is essentially identical, and expresses the fact that the basic equivalence relations span the space of equivalence relations under consideration, while orthogonality ensures that alleles (membership of basic formae) can be freely mixed. It will become apparent in later sections that the definition of orthogonality can be relaxed to some degree; this will be necessary in order for a suitable basis to be found for some classes of multiset formae.

5.5 Fixed-Size Sets (Attempt II)

Having defined genes in terms of a complete orthogonal basis for some equivalence relations the task is now to find equivalence relations which induce the set formae described in section 5.2 and to find a complete orthogonal basis for them.

Recall that these formae were characterised by a set of elements which a solution must contain in order to be an instance of the forma in question. Thus, a simple forma $\xi$ is described by
$$\langle\xi\rangle = \{a\}. \tag{27}$$
Clearly various equivalence relations could be constructed which have $\xi$ as one of their equivalence classes. One such can be generated simply by using the trivial rule expressed by equation 17 as follows:
$$\psi_{\{a\}}(\eta, \zeta) = \begin{cases} 1, & \text{if } (a \in \eta \cap \zeta \ \text{or}\ a \notin \eta \cup \zeta), \\ 0, & \text{otherwise.} \end{cases} \tag{28}$$
This equivalence relation induces two equivalence classes, one comprising the solutions containing the element $a$ and another comprising those which do not. Thus, a second equivalence class, which had not originally been specified, has also been induced by $\psi_{\{a\}}$.

There is clearly an equivalence relation $\psi_{\{x\}}$ of the form described by equation 28 for each $x \in \mathcal{E}$. Moreover, these are intuitively natural candidates for a basis for a set of equivalence relations $\Psi$ which might generate all the formae of the type described. As will now be demonstrated, if the rule for intersection of equivalence relations described by equation 24 is followed, the set
$$E = \{\, \psi_{\{x\}} \mid x \in \mathcal{E} \,\} \tag{29}$$
does indeed form a complete orthogonal basis for a set of equivalence relations $\Psi$ which induce all the formae in $\Xi$ as defined in equation 8, together with others.

To see this, consider the intersection of $\psi_{\{a\}}$ and $\psi_{\{b\}}$, which will be denoted $\psi_{\{a,b\}}$. According to the definition of intersection for equivalence relations (equation 24)
$$(\psi_{\{a\}} \cap \psi_{\{b\}})(\eta, \zeta) = \begin{cases} 1, & \text{if } \psi_{\{a\}}(\eta, \zeta) = \psi_{\{b\}}(\eta, \zeta) = 1, \\ 0, & \text{otherwise.} \end{cases} \tag{30}$$
This equivalence relation induces four equivalence classes, which might conveniently be written
$$\begin{aligned} \xi_{ab} &= \{\, \eta \in \mathcal{C} \mid a \in \eta,\ b \in \eta \,\}, \\ \xi_{a\bar{b}} &= \{\, \eta \in \mathcal{C} \mid a \in \eta,\ b \notin \eta \,\}, \\ \xi_{\bar{a}b} &= \{\, \eta \in \mathcal{C} \mid a \notin \eta,\ b \in \eta \,\}, \\ \xi_{\bar{a}\bar{b}} &= \{\, \eta \in \mathcal{C} \mid a \notin \eta,\ b \notin \eta \,\}. \end{aligned} \tag{31}$$
The generalisation of this is rather obvious. A general equivalence relation, $\psi \in \Psi$, has a description set, conveniently written $\langle\psi\rangle$, which is a subset of the universal set $\mathcal{E}$. Members of the search space (themselves subsets of $\mathcal{E}$) are then equivalent under $\psi$ precisely if they contain the same subset of the members of the description set $\langle\psi\rangle$. Formally,
$$\psi(\eta, \zeta) = \begin{cases} 1, & \text{if } \langle\psi\rangle \cap \eta = \langle\psi\rangle \cap \zeta, \\ 0, & \text{otherwise.} \end{cases} \tag{32}$$

It is clear that $E$ (defined in equation 29) does indeed form a basis for the equivalence relations.¹⁸ A forma $\xi$ induced by an equivalence relation $\psi \in \Psi$ is then characterised by a partition of the description set $\langle\psi\rangle$. It then becomes convenient to describe a forma by a 2-tuple
$$\langle\xi\rangle = (\xi^+, \xi^-) \tag{33}$$
where
$$\xi^+ \cap \xi^- = \emptyset \tag{34}$$
and
$$\xi^+ \cup \xi^- = \langle\psi\rangle \tag{35}$$
with the interpretation
$$\eta \in \xi \iff \bigl( \eta \cap \xi^+ = \xi^+ \ \text{and}\ \eta \cap \xi^- = \emptyset \bigr). \tag{36}$$
Having made these identifications, it is possible to define the similarity set of two chromosomes with respect to the formae $\Xi$ induced by $\Psi$. This will allow the random respectful recombination operator $R^3$ to be constructed. Using the notation for the description sets of formae just introduced, this gives
$$\langle \eta \oplus \zeta \rangle = \bigl( \eta \cap \zeta,\ \mathcal{E} - (\eta \cup \zeta) \bigr), \tag{37}$$
where the minus sign denotes set subtraction. The $R^3$ operator makes a random (uniform) selection from this similarity set. Returning to the example used earlier (equation 12, with $N = 3$),
$$\langle \{a, b, c\} \oplus \{a, d, e\} \rangle = \bigl( \{a, b, c\} \cap \{a, d, e\},\ \mathcal{E} - (\{a, b, c\} \cup \{a, d, e\}) \bigr) = \bigl( \{a\}, \{f\} \bigr). \tag{38}$$
This describes the forma containing those sets which contain $a$ and exclude $f$. With $N = 3$, this gives
$$\{a, b, c\} \oplus \{a, d, e\} = \bigl\{ \{a, b, c\}, \{a, b, d\}, \{a, b, e\}, \{a, c, d\}, \{a, c, e\}, \{a, d, e\} \bigr\}. \tag{39}$$
Thus, $R^3$ for these formae can be understood as an operator which

1. copies all the elements which are common to the two parents into the child;

2. fills the remaining places in the child with a random selection of the unused elements from the two parents.

So a child $\chi$ of $\eta$ and $\zeta$ has the natural properties
$$\eta \cap \zeta \subseteq \chi \subseteq \eta \cup \zeta. \tag{40}$$
It is clear, therefore, that in this case $R^3$ strictly transmits genes where a gene corresponds to an element of $\mathcal{E}$ and an allele to the presence or absence of that element (equation 28). Notice, however, that the counter-example used at the end of section 5.2 remains valid, so that the formae are not separable, with the consequence that $R^3$ cannot properly assort them. This again arises directly from the restriction to fixed-size sets.

¹⁸ Technically, there is a problem given the definition of orthogonality, when equivalence relations with description sets of size greater than or equal to the fixed size of solution sets are considered, but this is a very minor consideration.

An alternative way of viewing this operator is to imagine a conventional linear chromosome in which every position represents an element from the universal set, and to imagine an operator like uniform crossover, but constrained so that the total number of 1’s on the child is constant and equal to $N$, the fixed size of the set.
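The two-step description of the operator translates directly into code. This is a minimal sketch, assuming both parents are Python sets of the same fixed size with sortable elements; the name `r3_fixed_set` is illustrative.

```python
import random

def r3_fixed_set(parent1, parent2, rng=random):
    """Keep the common elements, then fill up from the parents' unused elements."""
    if len(parent1) != len(parent2):
        raise ValueError("parents must have the same fixed size")
    common = parent1 & parent2
    # Elements held by exactly one parent are the only candidates for the free places.
    unused = sorted((parent1 | parent2) - common)
    filler = rng.sample(unused, len(parent1) - len(common))
    return frozenset(common) | frozenset(filler)
```

Since `rng.sample` draws every subset of the required size with equal probability, each member of the similarity set is equally likely. For the parents `{a, b, c}` and `{a, d, e}` the child always contains `a`, never contains `f`, and has exactly three elements.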

5.6 Fixed-Size Multisets

The extension of the previous case from sets to multisets is in essence simple, but involves one complication. The basic idea will be that rather than specify whether or not an element is a member of the multiset under consideration, a forma will specify the multiplicities of some elements. Formally, let $\mathcal{P}_m(\mathcal{E})$ be the multipower set of $\mathcal{E}$, that is, the set of all multisets whose elements are drawn from $\mathcal{E}$. Then the multiplicity function
$$m : \mathcal{E} \times \mathcal{P}_m(\mathcal{E}) \longrightarrow \mathbb{Z}^+ \cup \{0\} \tag{41}$$
is defined so that $m(x, \eta)$ is the number of copies of $x$ in the multiset $\eta$.

A forma for multisets could either specify exact multiplicities for elements or could give bounds on their multiplicities. Since the former is a special case of the latter, where the bounds are maximally tight, the more general case will be examined.

A forma is now conveniently described by a set of 3-tuples of the form $(x, N^{\downarrow}_x, N^{\uparrow}_x)$ each of which is understood to specify that the multiplicity $m(x, \eta)$ of the element $x$ in the multiset $\eta$ lies in the inclusive range $N^{\downarrow}_x$ to $N^{\uparrow}_x$. For example, a forma $\xi$ with the description set
$$\langle\xi\rangle = \{(a, 0, 0), (b, 1, 3)\} \tag{42}$$
contains all those multisets over $\mathcal{E}$ of size $N$ which contain no copies of $a$ and contain between 1 and 3 copies of $b$ (figure 8).

Figure 8: A visualisation of the forma $\xi$ with description set $\langle\xi\rangle = \{(a, 0, 0), (b, 1, 3)\}$ (elements $a$ to $f$ against multiplicities 0 to 5). The full ranges for elements $c$ to $f$ correspond to the “don’t care” character familiar from conventional schemata.

As usual, there are a number of sets of equivalence relations which could be constructed to generate these formae, and again an obvious starting point is equivalence relations based on the lowest-order formae. Thus the equivalence relation $\psi$ which induces the forma described by $\langle\xi\rangle = \{(x, N^{\downarrow}_x, N^{\uparrow}_x)\}$ would have the same description set
$$\langle\psi\rangle = \{(x, N^{\downarrow}_x, N^{\uparrow}_x)\} \tag{43}$$
and would be defined by
$$\psi(\eta, \zeta) = \begin{cases} 1, & \text{if } m(x, \eta), m(x, \zeta) < N^{\downarrow}_x \\ & \text{or } m(x, \eta), m(x, \zeta) \in [N^{\downarrow}_x, N^{\uparrow}_x] \\ & \text{or } m(x, \eta), m(x, \zeta) > N^{\uparrow}_x, \\ 0, & \text{otherwise,} \end{cases} \tag{44}$$
where
$$[N^{\downarrow}_x, N^{\uparrow}_x] = \{\, n \in \mathbb{Z} \mid N^{\downarrow}_x \leq n \leq N^{\uparrow}_x \,\}. \tag{45}$$
As was the intention, formae can now specify a range of multiplicities for any element; a single equivalence relation can in fact be seen to suffice to define up to three ranges simultaneously. The problem of finding a basis for these equivalence relations will now be considered.

The natural candidates to form a basis are the equivalence relations which divide the range of multiplicities for a single element into a lower portion and an upper portion, as shown in figure 9,
$$E = \{\, \psi \in \Psi \mid \langle\psi\rangle = \{(x, 0, N^{\uparrow}_x)\},\ x \in \mathcal{E},\ N^{\uparrow}_x \in [0, N^{\star}] \,\} \tag{46}$$
where $N^{\star}$ is the maximum allowed multiplicity for an element. These equivalence relations can easily be seen to be complete, for any “first order” equivalence relation¹⁹ with a description set $\{(x, N^{\downarrow}_x, N^{\uparrow}_x)\}$ can be constructed as an intersection of the relations with description sets $\{(x, 0, N^{\uparrow}_x)\}$ and $\{(x, N^{\downarrow}_x, N^{\star})\}$ (figure 10). Higher order equivalence relations can then be constructed trivially by intersection. It is equally easy, however, to see that the relations in $E$ do not satisfy the condition of orthogonality specified in equation 26. To verify this, simply note that if a set is a member of the forma with description set $\{(x, 0, 2)\}$ it cannot also be a member of the forma with the description set $\{(x, 4, N^{\star})\}$, as would be required if $E$ were orthogonal (equation 26).

Figure 9: The set of equivalence relations with description sets of the form $\{(x, 0, N)\}$ divide the range $[0, N^{\star}]$ into a lower and an upper portion as shown: the equivalence relations may be thought of as simple dividing lines at integer-plus-half values.

Figure 10: Any equivalence relation defined on a single element can be constructed as an intersection of the elements in $E$ as defined in equation 46. Here $\psi$ has the description set $\langle\psi\rangle = \{(x, 2, 3)\}$ and is constructed as the intersection of $\psi_1$ with description set $\langle\psi_1\rangle = \{(x, 0, 1)\}$ and $\psi_3$ with description set $\langle\psi_3\rangle = \{(x, 0, 3)\}$.

¹⁹ Once genes have been defined, order can be defined for formae in a way similar to the definition for schemata.
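The construction in figure 10 and the failure of orthogonality can both be checked on the multiplicities of a single element. The sketch below is an illustration under stated assumptions: a basic relation with description set $\{(x, 0, T)\}$ is represented only by its threshold $T$, intersecting basic relations corresponds to pooling their thresholds, the maximum multiplicity is taken to be 5, and the function name `classes` is hypothetical.

```python
def classes(thresholds, max_multiplicity):
    """Partition 0..max_multiplicity by dividing lines just above each threshold."""
    cuts = sorted(set(thresholds))
    groups = {}
    for n in range(max_multiplicity + 1):
        # Two multiplicities are equivalent iff they lie on the same side of every cut.
        label = sum(n > t for t in cuts)
        groups.setdefault(label, []).append(n)
    return [groups[k] for k in sorted(groups)]

psi_1 = classes([1], 5)     # dividing line between 1 and 2
psi_3 = classes([3], 5)     # dividing line between 3 and 4
both = classes([1, 3], 5)   # intersection of the two basic relations
```

The middle class of `both` is the range 2 to 3, matching the forma with description set $\{(x, 2, 3)\}$. The lower class of `classes([2], 5)` and the upper class of `classes([3], 5)` have no multiplicity in common, which is the violation of orthogonality noted in the text.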

Rather than abandon this potential basis, it is instructive to return to the analogy with linear algebra which led to the original formulation of the conditions on a basis, namely completeness and orthogonality. In linear algebra there is a weaker notion than orthogonality known as linear independence: a set of vectors is said to be linearly independent if no one of them can be expressed as a linear combination of the others. Following the analogy, a set of equivalence relations will be said to be independent if no one of them can be constructed as an intersection of some of the others. The set $E$ defined in equation 46 satisfies this weaker condition.²⁰

The purpose of introducing the notion of a complete orthogonal basis for a set of equivalence relations was to generalise the notion of a gene and allow a principle of strict gene transmission to be extended to more general formae. It will be demonstrated below that the weaker notion of a complete non-orthogonal basis suffices for the definition of genes, and thus is adequate for the original purpose. The notion of independence is formalised as follows:

• (Independence) A set $E$ of equivalence relations will be said to be independent if no one of the relations $\psi \in E$ can be expressed as the intersection of some subset of the others, i.e.
$$\nexists\, \psi \in E\ \ \exists\, E_\psi \subseteq E \setminus \{\psi\} : \bigcap E_\psi = \psi. \tag{47}$$

Using the same definition of genes and alleles for non-orthogonal bases as for orthogonal bases (i.e. genes are the basic equivalence relations and alleles are the basic equivalence classes) it is now possible to construct the inheritance crossover operator induced by the basis $E$ for $\Psi$, described by equation 46.

The inheritance crossover operation can be defined in a way similar to random respectful recombination, the difference being that instead of selecting from the similarity set of the two parent chromosomes it selects from the subset of chromosomes in the similarity set which have every gene in common with one or other parent. This subset, which for parents $\eta$ and $\zeta$ is written $\eta \otimes \zeta$, is called their inheritance set, and defined by
$$\eta \otimes \zeta = \{\, \chi \in \eta \oplus \zeta \mid \forall \psi \in E : \chi \in [\eta]_\psi \cup [\zeta]_\psi \,\} \tag{48}$$
where $[\eta]_\psi$ is the equivalence class induced by $\psi$ to which $\eta$ belongs. The inheritance crossover operator picks each element in the inheritance set of the parents with equal probability, and both strictly transmits genes and properly assorts formae provided that these conditions are compatible. (The proof of this is identical in form to the proof that $R^3$ respects and properly assorts a set of formae, section 5.1.)

²⁰ A rather minor point which should nevertheless be made in passing is that the formae now being considered violate the closure discussed in Radcliffe [22, 23]: this turns out to be unimportant.


The formalism developed above can now be applied to the problem of recombining fixed-size multisets. The similarity set of two chromosomes (now multisets) is the forma with the description set
$$\langle \eta \oplus \zeta \rangle = \bigl\{\, (x, N^{\downarrow}_x, N^{\uparrow}_x) \bigm| N^{\downarrow}_x = \min\bigl(m(x, \eta), m(x, \zeta)\bigr),\ N^{\uparrow}_x = \max\bigl(m(x, \eta), m(x, \zeta)\bigr) \,\bigr\}. \tag{49}$$
This similarity set contains all those multisets of the given fixed size $N$ which have at least as many copies of each element as the parent with fewer copies, and no more than the number held by the parent with more. For example, if the chosen fixed size for the multisets is five, and the universal set $\mathcal{E}$ is given by equation 12, then given
$$\eta = \{\!|a, a, a, b, c|\!\} \tag{50}$$
and
$$\zeta = \{\!|a, b, b, c, d|\!\} \tag{51}$$
the similarity set is described by
$$\langle \eta \oplus \zeta \rangle = \{(a, 1, 3), (b, 1, 2), (c, 1, 1), (d, 0, 1), (e, 0, 0), (f, 0, 0)\}. \tag{52}$$
The similarity set itself contains those multisets containing $\{\!|a, b, c|\!\}$ together with exactly two elements from $\{\!|a, a, b, d|\!\}$. The inheritance set of any $\eta$ and $\zeta$ is, for these equivalence relations, identical to the similarity set. To see this, consider any basic equivalence relation $\psi$ with the description set
$$\langle\psi\rangle = \{(x, 0, N)\}. \tag{53}$$
This has two equivalence classes, described by
$$\langle\xi_1\rangle = \{(x, 0, N)\}, \qquad \langle\xi_2\rangle = \{(x, N + 1, N^{\star})\}. \tag{54}$$
If both parents belong to the same basic forma, then their similarity set is clearly a subset of this forma. If, however, they belong to different basic formae, then since there are only two of these, the requirement that their similarity set lie in their union is no restriction at all. Thus inheritance sets for these equivalence relations are indeed identical to similarity sets²¹ and so it can be seen that strict gene transmission is in this case no stronger a requirement than respect.
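The worked example can be reproduced by brute-force enumeration. The following sketch assumes multisets are given as Python lists over a small universal set; the names `similarity_bounds` and `similarity_set` are illustrative, and each member of the similarity set is returned as a tuple of multiplicities in the order of `universe`.

```python
from collections import Counter
from itertools import product

def similarity_bounds(parent1, parent2, universe):
    """Lower and upper multiplicity bounds for each element (the description set)."""
    c1, c2 = Counter(parent1), Counter(parent2)
    return {x: (min(c1[x], c2[x]), max(c1[x], c2[x])) for x in universe}

def similarity_set(parent1, parent2, universe, size):
    """All multiplicity vectors within the bounds whose total equals the fixed size."""
    bounds = similarity_bounds(parent1, parent2, universe)
    ranges = [range(bounds[x][0], bounds[x][1] + 1) for x in universe]
    return [counts for counts in product(*ranges) if sum(counts) == size]

universe = list("abcdef")
eta, zeta = list("aaabc"), list("abbcd")
bounds = similarity_bounds(eta, zeta, universe)
members = similarity_set(eta, zeta, universe, 5)
```

For these parents `bounds` reproduces the description set of the example, and `members` contains four multisets, the two parents among them. A uniform choice from `members` is then the recombination operator for this case.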

5.7 Variable-Size Multisets

Variable-size multisets can be dealt with simply by relaxing the constraint of fixed size as discussed in the previous section. The formae then arrived at are separable and the inheritance crossover operator (which is in this case identical to the $R^3$ operator) not only properly assorts and respects the formae, but also strictly transmits genes as a direct consequence of the identity of the similarity and inheritance sets.

²¹ This is not, of course, true in general.

In summary, $R^3$ for variable-size multisets simply inserts a number of copies of each element from the universal set which is bounded by the number of copies in the two parents, and in doing so strictly transmits genes and properly assorts the formae induced by the equivalence relations generated by the complete spanning basis of equation 46.
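This summary can be sketched as follows, assuming multisets are given as Python iterables and the child is returned as a `Counter`; the name `r3_variable_multiset` is illustrative. Because the size is unconstrained, the multiplicity of each element can be drawn independently, and drawing each one uniformly within its bounds makes every member of the similarity set equally likely.

```python
import random
from collections import Counter

def r3_variable_multiset(parent1, parent2, universe, rng=random):
    """Draw each element's multiplicity uniformly between the two parents' counts."""
    c1, c2 = Counter(parent1), Counter(parent2)
    child = Counter()
    for x in universe:
        lo, hi = sorted((c1[x], c2[x]))
        n = rng.randint(lo, hi)  # inclusive at both ends
        if n:
            child[x] = n
    return child
```

For the parents with elements `a, a, a, b, c` and `a, b, b, c, d`, a child holds between one and three copies of `a`, one or two of `b`, exactly one `c`, at most one `d`, and no `e` or `f`; its total size ranges from three to seven.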

6 Non-Separability of Formae

Before going on to apply the results of section 5 to the problem of optimising neural network topologies, it is appropriate to consider the general problem of non-separability of formae, which arises from some of the formae, bases and recombination operators constructed thus far.

6.1 Background on Formae

There were a number of motivations for the forma analysis developed by Radcliffe [22, 21, 23], the most important of which can be summarised as follows:

1. Nature of Representation.Holland’s schemata are defined for fixed-lengthstrings for which each locus has a well-definedallele set, with the implicit assumption that alldistributions of alleles over loci represent validsolutions. For many problems, including thosefrom graph theory, set theory, constrained opti-misation and neural networks, no useful codingof this form is known. In any case, it is frequentlyvery muchmore convenient to store andmanipu-late structures in a non-string form, as is the case,for example, with Koza’s evolution of Lisp pro-grams (Koza [13, 14]). A generalisation was thusrequired.

2. Genotype-Phenotype Mapping.Schemata are defined in the representation spaceand thus group together genotypes (chromo-somes). Where the genotype and phenotypespaces are isomorphic (that is, every chromosomecorresponds to exactly one solution in the realsearch space S and conversely every solution inS is represented by exactly one chromosome in C)this is probably acceptable, but where the map-ping ismore complex than this itmay bedesirableto define formae in the true search space S.


3. Redundancy. A particular example of the inadequacy of schemata arises when the coding introduces redundancy, as tends to be the case in graph and set optimisation problems (see section 2).

4. Generality. The groupings of chromosomes which can be expressed by schemata are variegated and have been shown to be sufficiently general to be useful in many problems. Nevertheless, there are many other cases where it is desirable to be able to use other partitionings of the search space. Forma analysis allows this.

5. Intrinsic Parallelism. The counting argument which is sometimes used to claim that binary representations are more powerful than those of higher cardinality applies only if attention is restricted to traditional schemata. If more general formae are considered, the argument no longer holds. The degree of intrinsic parallelism which can be inferred is defined entirely by the selection of formae (Radcliffe [22, 21]).

Having shown that the “schema theorem” applies to general formae in exactly the same way as to schemata given suitable expressions for the disruption coefficients (Radcliffe [22], Vose [32]), the question became how to manipulate formae sensibly. Guidance was taken from studies of the traditional crossover operators for conventional linear chromosomes.

The three characteristics of recombination operators which were suggested to be desirable are as follows:

• Respect. The formulation of the principle of respect was motivated by a desire to ensure that in cases where the parents share some attribute, children are guaranteed to inherit that attribute also. (This is qualified only by mutation, which is traditionally understood to serve the important but secondary role of ensuring that the entire search space remains accessible (Holland [12]), though see also Schaffer and Eshelman [27] and references therein.) Thus respect requires that whenever two parents are both members of some forma, all their offspring be members of that forma also. This principle has been independently formulated by Vose [32, 33].

• Gene Transmission. Respect alone is not enough to ensure that every gene possessed by a child is taken from one or other parent. The introduction of the notion of a complete orthogonal basis for a set of equivalence relations which induce the chosen formae allowed a general notion of gene to be formalised (Radcliffe [23]) and thus allowed a principle of (strict) gene transmission to be formulated. The introduction of non-orthogonal bases in this paper allows further application of the principle. It should be noted that gene transmission implies respect.

• Assortment. The notion of assortment can be viewed as an extension and formalisation of the “building-block hypothesis” (Goldberg [5]) which expresses some of the most fundamental beliefs about the way in which genetic search proceeds. The key idea is that by recombining two solutions it is sometimes possible to piece together a solution which combines properties of the two parents. A recombination operator is said properly to assort a set of formae if it is the case that whenever one parent $\eta$ is a member of one forma $\xi$, and another parent $\eta'$ is a member of another forma $\xi'$, then provided that the intersection of the two formae is non-empty (that is, provided that some chromosome exists which is a member of both formae) it is possible that the recombination will produce a child $\chi$ which is a member of both formae, i.e. $\chi \in \xi \cap \xi'$. It is interesting to note that while traditional one-point crossover (or indeed, n-point) does not properly assort schemata in the absence of an inversion operator, when inversion is present, it does (footnote 22). Uniform crossover, on the other hand, does properly assort schemata. Moreover, while one-point crossover does not properly assort schemata in the absence of inversion, it does weakly assort them in the sense that, given a finite number of generations and applications, it does have the ability to assemble a chromosome in the intersection of any two compatible schemata, given suitable parents.

The problem faced in the case of fixed-size sets and multisets with the formae discussed in section 5 is that the requirements of respect and assortment are incompatible, so that the formae are said to be non-separable. This is illustrated in figure 11.

6.2 Examples of Non-Separability

Examples of non-separable formae have already been seen in sections 5.2, 5.5 and 5.6, and the previous difficulties in even respecting reasonable formae for problems in neural networks have also been mentioned in section 2. The travelling sales-rep problem (TSP) provides further interesting examples of non-separability. Whitley [40] and Radcliffe [23] have both argued that in tackling the TSP it is essential to focus attention on edges rather than nodes.

Footnote 22: This assumes that linkage is taken to be unspecified.

Figure 11: The top figure shows non-separable formae. Notice that the intersection of $\xi$ and $\xi'$, while non-empty, does not lie within the similarity set of chromosomes $\eta \in \xi$ and $\eta' \in \xi'$. The bottom figure shows separable formae.

There are two obvious sets of formae which might be constructed for the TSP which are based on edges, the difference being whether the edges are considered to be directed or undirected. In either case, a forma is characterised by a set of edges which a tour must contain in order to be an instance of the forma. Radcliffe [23] has shown that non-directed edge formae are not separable using the example in figure 12. The example in figure 13, which is due to Vose [31], suffices to show that directed edge formae are also non-separable. It should be noted, however, that this second example relies on the introduction of a cycle, which is only permitted if the cycle forms the entire tour. In this sense, the problem with directed edges is perhaps less severe than with non-directed edges.

Whitley has constructed a genetic edge recombination operator with the specific aim of ensuring high transmission of edges from parents to children (footnote 23). The first version of this operator (Whitley et al. [40]) did not ensure respect since it allowed children to be constructed which did not possess an edge common to both parents. It also failed to ensure proper assortment, by concentrating exclusively on high transmission rates for edges. A second version of the operator was constructed specifically to ensure respect (footnote 24) (Whitley et al. [41]) and in so doing guaranteed that proper assortment was violated.

Footnote 23: It should be noted that transmission of edges is subtly different from transmitting edge-formae (see section 6.4).

Figure 12: The tour fragment on the left is a member of the forma described by $\{\overleftrightarrow{23}\}$, which contains all tours which include the 2–3 or the 3–2 edge. Similarly, the tour fragment on the right is a member of the forma described by $\{\overleftrightarrow{24}\}$. Since both tours are also members of the forma described by $\{\overleftrightarrow{12}\}$, however, if these formae are to be respected all of their children must contain the 1–2 edge, thus preventing a child being produced which is a member of the forma described by $\{\overleftrightarrow{23}, \overleftrightarrow{24}\}$, the intersection of those described by $\{\overleftrightarrow{23}\}$ and $\{\overleftrightarrow{24}\}$. Thus these formae are non-separable.

Figure 13: $\eta$ is a member of the directed edge forma described by $\langle\xi\rangle = \{\overrightarrow{12}, \overrightarrow{34}\}$, containing those tours which include the directed edges $\overrightarrow{12}$ and $\overrightarrow{34}$. Similarly, $\eta'$ is a member of the forma described by $\langle\xi'\rangle = \{\overrightarrow{41}\}$. Since, however, both $\eta$ and $\eta'$ are members of the directed edge forma described by $\{\overrightarrow{23}\}$, respect requires that all their children contain the $\overrightarrow{23}$ edge, thus preventing the construction of a member of $\xi \cap \xi'$, which has the description set $\{\overrightarrow{12}, \overrightarrow{34}, \overrightarrow{41}\}$.

The important point to note here is that if it is accepted that edges (whether directed or otherwise) hold the key to the TSP, the problem of non-separability arises immediately.
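The conflict in the undirected example of figure 12 can be checked mechanically. The sketch below rests on one standard fact about tours, namely that every city has exactly two incident edges, so any edge set that gives a city three edges cannot be part of a tour. The function name `degree_feasible` is hypothetical, and the test is only a necessary condition for an edge set to be extendable to a tour.

```python
from collections import Counter

def degree_feasible(undirected_edges):
    """Necessary condition for a tour: no city has more than two incident edges."""
    degree = Counter()
    for u, v in undirected_edges:
        degree[u] += 1
        degree[v] += 1
    return all(d <= 2 for d in degree.values())

common = {(1, 2)}          # edge shared by both parents, which respect must keep
target = {(2, 3), (2, 4)}  # description set of the intersection of the two formae

print(degree_feasible(target))           # True
print(degree_feasible(common | target))  # False
```

The intersection forma is non-empty on its own, but adding the common 1–2 edge gives city 2 three edges, so no respectful child can lie in it.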

6.3 Exploitation and Exploration

In cases such as fixed-size set recombination and the travelling sales-rep problem it is clear that some accommodation between respect and assortment will be required if the formae suggested are to be manipulated effectively. The conflict can be viewed as an unusually sharp form of the familiar trade-off between exploitation of information already gathered (encapsulated by respect and gene transmission) and adequate exploration of the search space (encapsulated by assortment), as discussed by Holland [12].

The counter-examples used to show that the fixed-size set and multiset formae are not separable, and to show the same for both directed and non-directed edge formae for the TSP, share rather similar characteristics, so that focusing on a single example will have relevance to them all. For simplicity, the example for fixed-size set formae used at the end of section 5.2 will be revisited. Recall that the problem is that the sets

$$\eta = \{a, b, c\} \tag{55}$$

and

$$\eta' = \{a, d, e\} \tag{56}$$

of size three are respective members of the formae $\xi$ and $\xi'$ described by

$$\langle\xi\rangle = \{b, c\} \tag{57}$$

and

$$\langle\xi'\rangle = \{d\}, \tag{58}$$

but that the sole member of the intersection $\xi \cap \xi'$ is

$$\chi = \{b, c, d\}, \tag{59}$$

which does not lie in the similarity set $\eta \oplus \eta'$ described by

$$\langle\eta \oplus \eta'\rangle = \{a\}. \tag{60}$$

The effect of giving primacy to respect (as do all R3 operators) is to make impossible the construction of the solution $\chi$ from these parents. This is extremely worrying, not least because experience shows that premature convergence is a common problem with genetic algorithms. Thus if, in the present example, the presence of $a$ in a solution is generally beneficial in the early stages of genetic search, but $\chi$, which does not contain $a$, is the optimum, it is quite possible that $a$ will become represented in every chromosome in the population early on, so that the presence of solutions such as $\eta$ and $\eta'$ in a later population would not allow the optimum to be constructed even though all of the necessary “building blocks” would appear to be present. Genetic search would be entirely dependent in this circumstance on a mutation which eradicated the element $a$ from a solution, a situation which though not irretrievable seems better avoided. This prospect, which could equally easily manifest itself in more realistic, larger-scale examples, is sufficiently worrying to suggest that assortment should be given precedence over respect, the lack of which would seem likely to do no more than delay progress towards an optimum, rather than imposing mutation-dependent barriers. Similar comments apply equally to the edge formae discussed above.

Footnote 24: though it was not discussed in these terms.
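The example of equations 55–60 is small enough to enumerate. The sketch below lists every child that a respectful fixed-size set operator can produce from the two parents (all common elements, plus a fill drawn from the remaining parental elements) and confirms that the set of equation 59 is not among them. The function and variable names are hypothetical.

```python
from itertools import combinations

def r3_children_fixed_set(parent_a, parent_b):
    """All children of a respectful fixed-size set crossover of two equal-size sets."""
    size = len(parent_a)
    common = parent_a & parent_b
    rest = (parent_a | parent_b) - common
    return {frozenset(common | set(fill))
            for fill in combinations(sorted(rest), size - len(common))}

eta = {"a", "b", "c"}
eta_prime = {"a", "d", "e"}
children = r3_children_fixed_set(eta, eta_prime)
target = frozenset({"b", "c", "d"})

print(len(children))                     # 6
print(all("a" in c for c in children))   # True
print(target in children)                # False
```

All six possible children contain the element a, so the target can only be reached through mutation.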

6.4 Assortment

While the preceding discussion has suggested that assortment should take priority over respect when there is a conflict, this does not mean that respect should be altogether discarded in such situations. A parameterised operator will now be introduced which allows the priority given to the two considerations to be varied, so that with the parameter set at one end of the scale respect is complete (and proper assortment is violated) and as the parameter is adjusted ever less regard is paid to the requirements of respect.

Consider the R3 operator for fixed-size sets. This has been described as having two stages: the first constructs a partial child which contains only the intersection of the two parents; the remaining spaces in the child are chosen at random from the remaining elements which the parents contain. An alternative approach, which would ensure proper assortment but drastically violate respect, would involve discarding the first stage and simply picking elements from the union of the two parents at random. This approach would attach no weight at all to the fact that some elements were present in both parents and in this sense would have entirely disregarded respect.

These two extremes can be interpolated between by attaching a weight to elements of the union, with elements of the intersection being accorded a higher weight than those present in only one parent. The probability of picking an element could then be made proportional to its weight. It would seem reasonable to set the weight of elements in the intersection to at least twice that of other elements since these elements were present twice, once in each parent. In the case of the example used above (equations 55–60) this would lead to the weights shown in table 1. Clearly higher weights than two could be used to ensure a greater degree of respect, but the higher the weight is made, the less assortment will be performed. The generalisation to multisets is given in section 9.1.3.

| element     | a   | b   | c   | d   | e   | f |
|-------------|-----|-----|-----|-----|-----|---|
| weight      | 2   | 1   | 1   | 1   | 1   | 0 |
| probability | 1/3 | 1/6 | 1/6 | 1/6 | 1/6 | 0 |

Table 1: Weights for a parameterised assorting recombination operator of weight 2, given parent sets $\eta = \{a, b, c\}$ and $\eta' = \{a, d, e\}$. The last line shows the probability with which each element is picked for a place in the child.
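The weighting scheme can be sketched as follows. This is an illustrative reading rather than a definitive implementation: the function names and the `shared_weight` parameter are hypothetical, elements in neither parent are simply left out (equivalent to weight 0), and places are assumed to be filled by weighted draws without replacement so that the child remains a set.

```python
import random
from fractions import Fraction

def element_weights(parent_a, parent_b, shared_weight=2):
    """Weight shared_weight for elements in both parents, 1 for elements in only one."""
    common = parent_a & parent_b
    return {x: (shared_weight if x in common else 1) for x in parent_a | parent_b}

def pick_probabilities(weights):
    """Probability of each element being picked at a single draw."""
    total = sum(weights.values())
    return {x: Fraction(w, total) for x, w in weights.items()}

def weighted_assorting_child(parent_a, parent_b, shared_weight=2, rng=random):
    """Fill a child of the parents' size by weighted draws without replacement."""
    weights = element_weights(parent_a, parent_b, shared_weight)
    child = set()
    while len(child) < len(parent_a):
        candidates = sorted(x for x in weights if x not in child)
        chosen = rng.choices(candidates, weights=[weights[x] for x in candidates])[0]
        child.add(chosen)
    return child
```

With the parents of the example, `pick_probabilities(element_weights({"a", "b", "c"}, {"a", "d", "e"}))` gives 1/3 for a and 1/6 for each of b, c, d and e, and repeated calls to `weighted_assorting_child` can produce the child {b, c, d} that the respectful operator cannot.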

Constructing similarly parameterised operators for the TSP which could be computed efficiently is more difficult, though a paper specification for them is possible.

7 Neural Network Topologies

The professed purpose of developing the machinery of set and multiset formae was to aid the application of genetic search to problems in neural networks, though the set and multiset optimisation problems are of independent interest. The formulation of network topology optimisation given in section 2 was chosen to bring out the multiset-like nature of the problem.

Recall that if attention is restricted to feed-forward networks with a single layer of hidden units then a network topology can be described as a multiset of hidden units each of which is specified by its set of external connections (figure 4). This is an entirely non-redundant representation to which the results of the forma analysis of earlier sections can immediately be applied. In principle, this should allow genetic search to proceed efficiently.

There is, however, an obvious complication. Suppose that a modest network with ten input nodes and ten output nodes is to be considered, and that up to ten hidden units will be employed (the example used in section 2). In this case the size of the search space is about $4 \times 10^{53}$ (equation 2). The problem is not the size of this search space (which is fairly modest by the standards of genetic algorithms) but the fact that each chromosome will contain at most ten hidden units, while the number of hidden unit types is $2^{20} \approx 10^6$. Thus, even if a population of 10,000 were employed, it could hold at most $10^5$ hidden units, so that less than ten percent of the available node types could be included in the population. If the node types are considered as atomic and not available for recombination it will be extremely difficult for the genetic algorithm to make progress.

Of course, this situation is not unfamiliar in genetic search, for in the classic case of parameter optimisation exactly the same predicament arises. If, for example, ten parameters are to be optimised, each coded using twenty bits, the similarity should immediately be clear (though in this case the search space would be larger, because each of the parameters would normally have a meaning, so that this is not a set optimisation problem). The solution usually employed is to allow recombination to take place at the sub-parameter level either through employing binary or other low-cardinality encodings or by using recombination operators which make use of knowledge of the high-level meaning of parameters. (For examples of the latter see Davis [4] and the discussions of range formae in Radcliffe [22, 23]; for a sceptical view see Goldberg [7].)

By exploiting this analogy it will be possible both to construct a sensible genetic approach to network topology optimisation and to shed a little more light on traditional parameter optimisation.

7.1 High Cardinality Representations and Gene Recombination

Consider the classic parameter optimisation problem where the search space comprises vectors

$$v = (v_1, v_2, \ldots, v_n) \tag{61}$$

so that

$$\mathcal{S} = I_1 \times I_2 \times \cdots \times I_n \subset \mathbb{R}^n \tag{62}$$

with

$$I_i = [a_i, b_i] \tag{63}$$

and the intervals $I_i$ are understood to be discrete approximations to their continuous counterparts. Using $k$ bits per parameter, the typical chromosomal representation would employ

$$\mathcal{C} = \mathbb{B}^{kn} \tag{64}$$

where

$$\mathbb{B} = \{0, 1\}. \tag{65}$$


A chromosome $\eta \in \mathcal{C}$ would be given by

$$\eta = (\eta_{11}, \eta_{12}, \ldots, \eta_{1k},\; \eta_{21}, \eta_{22}, \ldots, \eta_{2k},\; \ldots,\; \eta_{n1}, \eta_{n2}, \ldots, \eta_{nk}), \tag{66}$$

where $v_i$ is represented by $(\eta_{i1}, \eta_{i2}, \ldots, \eta_{ik})$. If schemata are defined in the space $\mathcal{C}$ of chromosomes then uniform crossover respects and properly assorts the schemata

$$\Xi_{\mathcal{C}} = \{0, 1, \Box\}^{nk} \tag{67}$$

and strictly transmits genes. On the other hand, if schemata are defined in parameter space $\mathcal{S}$, as defined by equation 62, so that

$$\Xi_{\mathcal{S}} = \prod_{i=1}^{n} \left( I_i \cup \{\Box\} \right) \tag{68}$$

then uniform crossover in chromosome space $\mathcal{C}$ neither respects the schemata in $\Xi_{\mathcal{S}}$ nor transmits parameters, though it does assort them.

Thus transmission of genes and respect can in these cases be seen to be sub-parameter level concepts. A number of observations follow:

1. Using one-point crossover in the absence of inversion, most parameters will be transmitted whole: a maximum of one will be crossed.

2. Using one-point crossover in the presence of inversion, any number of parameters may be crossed.

3. Similar comments apply when using two-point crossover (except that up to two parameters may then be crossed), and this is true even using Booker’s reduced surrogate form (Booker [2]).

4. With uniform crossover, any number of parameters may be recombined.

5. Although the parameter-space schemata $\Xi_{\mathcal{S}}$ are properly assorted by uniform crossover applied in the space of binary chromosomes $\mathcal{C}$, the probability of generating a child in the intersection $\xi \cap \xi'$ given

$$\eta \in \xi \in \Xi_{\mathcal{S}} \tag{69}$$

and

$$\eta' \in \xi' \in \Xi_{\mathcal{S}} \tag{70}$$

is much lower than if uniform crossover is applied in the parameter space (footnote 25). For example, in $\mathcal{S}$

$$(7, 8) \oplus (5, 3) = \{(7, 3), (7, 8), (5, 8), (5, 3)\}, \tag{71}$$

having four elements, whereas in $\mathcal{C}$

$$(0111, 1000) \oplus (0101, 0011) = (01\Box1,\; \Box0\Box\Box), \tag{72}$$

where the right hand side is understood to be a set with sixteen elements.

Footnote 25: This is true if 1-point crossover with inversion is applied in $\mathcal{C}$ also, though in this case the linkages should in principle make useful crosses more likely.
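The counts in equations 71 and 72 can be reproduced by enumerating every child that uniform crossover is able to generate from the two parents, once at the parameter level and once at the bit level. The sketch assumes the plain four-bit binary coding used in equation 72; the function name `uniform_crossover_children` is hypothetical.

```python
from itertools import product

def uniform_crossover_children(parent_a, parent_b):
    """Every child obtainable by taking each position from one parent or the other."""
    return set(product(*[sorted({x, y}) for x, y in zip(parent_a, parent_b)]))

# Parameter level: each position holds a whole parameter.
in_S = uniform_crossover_children((7, 8), (5, 3))

# Bit level: 7 = 0111, 8 = 1000 and 5 = 0101, 3 = 0011, concatenated per parent.
in_C = uniform_crossover_children("01111000", "01010011")

print(len(in_S))  # 4
print(len(in_C))  # 16
```

Only four of the eight bit positions differ between the parents, so the bit-level cross spreads its children over sixteen chromosomes, of which just four correspond to the parameter-level recombinations.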

It is useful, especially in the current context, to think of uniform crossover as applied at the bit-level as having two components, the first of which randomly selects each parameter from one or other parent, while the second crosses these parameters. (Of course, the fact that each component is subsequently crossed means that the initial choice is irrelevant.) Similarly, one-point crossover in $\mathcal{C}$ (without inversion) can be thought of as first randomly crossing the whole parameter string in $\mathcal{S}$, and then crossing the parameter at the interface at the bit level. The two approaches could be combined.

This way of thinking about conventional operators suggests a solution to the problem of the extremely high cardinality of the neural network topology representation suggested above.

7.2 Crossing Hidden Nodes

The proposed way of tackling neural network topology optimisation makes use of the previously discussed breakdown of recombination into two stages. Most of the nodes in the child (say all but one) will be generated using the multiset recombination operators introduced in section 5. The final node or nodes will be produced by directly crossing nodes from the two parents. This turns out to be an extremely simple matter because the hidden nodes have already been conveniently described as binary strings (figure 4). Moreover, this binary representation of a hidden unit is highly meaningful and completely non-redundant: a one indicates the presence of a connection to an external node and a zero indicates its absence. Thus uniform crossover is the natural operator to use to perform the sub-node level cross, since it transmits genes and properly assorts the schemata which are the natural formae for describing hidden units.
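A rough sketch of this two-stage scheme is given below. It is illustrative only: each network is assumed to be a list of hidden nodes written as equal-length bit strings, both parents are assumed to have the same number of hidden nodes, and the whole-node stage uses a simple stand-in (a random draw from a pool bounded by the larger parental multiplicity of each node type) rather than the multiset operators of section 5. All names are hypothetical.

```python
import random
from collections import Counter

def cross_nodes(node_a, node_b, rng=random):
    """Uniform crossover of two hidden nodes given as equal-length bit strings."""
    return "".join(rng.choice(pair) for pair in zip(node_a, node_b))

def recombine_network(parent_a, parent_b, rng=random):
    """Child with len(parent_a) hidden nodes: all but one inherited whole, one crossed."""
    # Stand-in for the multiset stage: draw whole nodes from a pool holding each
    # node type up to its larger multiplicity in the two parents.
    pool = list((Counter(parent_a) | Counter(parent_b)).elements())
    child = rng.sample(pool, len(parent_a) - 1)
    # Sub-node stage: cross one node chosen from each parent at the bit level.
    child.append(cross_nodes(rng.choice(parent_a), rng.choice(parent_b), rng))
    return sorted(child)
```

The crossed node keeps every connection (or absence of one) on which its two source nodes agree, while the remaining nodes are copied intact from the parents.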

7.3 Direct Input-Output Connections

The formulation thus far has concentrated on strictly layered networks so that direct connections from the input nodes to the output nodes have not been considered. Each such connection can, however, be described perfectly and without redundancy by a binary gene, and thus if such connections are to form part of the optimisation they may simply be added to the chromosome. Such portions of the chromosome can be recombined using uniform or some other standard crossover operator without further complication.

8 Linkage and Forma Disruption

The discussion of recombination operators in the previous sections has not considered two much-discussed and intricately related aspects of recombination: linkage between the components of a solution and disruption rates for formae. These will now briefly be reviewed.

When Holland introduced genetic algorithms with his seminal 1975 book (Holland [12]), he listed three generic operators: crossover, mutation and inversion. While there have been experiments with inversion, which are summarised in Goldberg [5], and while there is still a belief among some workers that as more complex problems are tackled the inclusion of inversion will be found to be more helpful, inversion operators are rarely now used in practice. The issue of linkage is, however, relevant not only to traditional linear chromosomes. In the case of the set and multiset operations which form one focus of this paper it would be easy to add linkage information in the usual way, to manipulate this information through inversion or other operators, and to use it to modify the probability of groups of genes being transferred to children en masse.

Similarly, the uniform crossover operator with parameter half (footnote 26) is widely criticised for being unduly disruptive both of short schemata and of longer ones, the latter effect arising because uniform crossover with parameter half is biased towards taking half the genetic material for a child from each parent (Schaffer et al. [26], Syswerda [30], Radcliffe [21]). It has been pointed out, however, by Spears and De Jong [29] that by using parameters other than half with uniform crossover its degree of disruptiveness can be completely controlled. Whether or not uniform crossover appears more attractive is therefore primarily a function of whether there is any a priori reason to believe that the arrangement of the genes on the chromosome groups together bits which should be tightly coupled. (Where parameters are laid out on continuous segments of the chromosome, there is a strong argument that this is so.)
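Parameterised uniform crossover is simple to state in code. In the sketch below the name `uniform_crossover` and the argument `p` are illustrative choices; `p` is taken to be the probability of copying each gene from the first parent, so `p = 0.5` corresponds to the usual "parameter half" operator.

```python
import random

def uniform_crossover(parent_a, parent_b, p=0.5, rng=random):
    """Take each gene from parent_a with probability p, otherwise from parent_b."""
    return [a if rng.random() < p else b for a, b in zip(parent_a, parent_b)]
```

Values of `p` close to 0 or 1 copy long runs from a single parent and so disrupt few formae, while `p = 0.5` mixes the parents most thoroughly.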

Just as uniform crossover can be parameterised to control the amount of genetic material it tends to take from each parent, the weights of the elements used to determine the child in the manner described in section 6.4 can be used to control the degree of forma disruption exhibited by the recombination operators discussed in this paper.

Footnote 26: i.e. equal probabilities of taking each gene from either parent.

9 Discussion and Summary

This paper contains a number of disparate threads, principal among which are the application of genetic algorithms to problems in neural networks and the application of forma analysis to the construction of suitable operators and representations for (multi)set recombination. These considerations have required a number of sub-discussions, including the introduction of non-orthogonal bases for equivalence relations and a consideration of sub-parameter level recombination with particular reference to crossing hidden nodes in neural network topologies. Each of these subjects is now set in context and, where appropriate, summarised.

9.1 Set and Multiset Recombination

Set recombination can be regarded as a special case of multiset recombination in which the maximum number of copies of any element is one. A distinction must, however, be made between fixed- and variable-size multisets.

9.1.1 Formae and Equivalence Relations

Generic formae for multisets specify ranges of values for the multiplicities of elements drawn from a universal set $\mathcal{E}$. These formae are induced by equivalence relations over the search space of which a low precision example is given by equation 44:

$$\psi(\eta, \zeta) = \begin{cases} 1, & \text{if } m(x, \eta), m(x, \zeta) < N_x^{\downarrow} \\ & \text{or } m(x, \eta), m(x, \zeta) \in [N_x^{\downarrow}, N_x^{\uparrow}] \\ & \text{or } m(x, \eta), m(x, \zeta) > N_x^{\uparrow}, \\ 0, & \text{otherwise.} \end{cases}$$

This equivalence relation is described by the description set $\{(x, N_x^{\downarrow}, N_x^{\uparrow})\}$. A basis for these equivalence relations is given by equation 46:

$$E = \left\{ \psi \in \Psi \;\middle|\; \langle\psi\rangle = \{(x, 0, N_x^{\uparrow})\},\; x \in \mathcal{E},\; N_x^{\uparrow} \in [0, N^{\star}] \right\}.$$

9.1.2 Variable-Size (Multi)set Recombination

The inheritance/R3 operator for these formae generates a child $\chi$, given parents $\eta, \zeta \in \mathcal{C}$ [$\equiv P_m(\mathcal{E})$] as follows.

• For each element $x \in \mathcal{E}$ set $m(x, \chi)$ so that

$$\min\bigl(m(x, \eta), m(x, \zeta)\bigr) \le m(x, \chi) \le \max\bigl(m(x, \eta), m(x, \zeta)\bigr), \tag{73}$$

where $m(x, \eta)$ is the number of copies of $x$ in $\eta$, and in the case of sets (as opposed to multisets) $m(x, \eta) \in \{0, 1\}$. In each case a uniform random choice is made between the maximum and minimum allowed values for the multiplicity.

This operator respects and properly assorts the formae induced by the equivalence relations in $\Psi$, and transmits the genes identified with the basic equivalence relations in $E$. The operator can be biased towards the multiplicity of one or other parent without violating these properties. Notice that

$$\eta \cap \zeta \subseteq \chi \subseteq \eta \cup \zeta. \tag{74}$$

9.1.3 Fixed-Size (Multi)set Recombination

The corresponding inheritance crossover operator for multisets of fixed size $N$ generates a child $\chi$ from parents $\eta$ and $\zeta$ as follows:

1. Let

$$n_x^{\downarrow}(\eta, \zeta) = \min\bigl(m(x, \eta), m(x, \zeta)\bigr) \tag{75}$$

and

$$n_x^{\uparrow}(\eta, \zeta) = \max\bigl(m(x, \eta), m(x, \zeta)\bigr). \tag{76}$$

Then for each $x \in \mathcal{E}$ set the initial values for the multiplicity to be the minimum of that of the two parents:

$$\forall x \in \mathcal{E}: \; m(x, \chi_i) = n_x^{\downarrow}(\eta, \zeta), \tag{77}$$

where the subscript $i$ on $\chi_i$ indicates that this is the initial child only.

2. Fill the remaining

$$N - \sum_{x \in \mathcal{E}} n_x^{\downarrow}(\eta, \zeta)$$

places in the child $\chi$ by increasing the multiplicity of elements at random subject to the following conditions on the final child $\chi$,

$$\forall x \in \mathcal{E}: \; m(x, \chi) \le n_x^{\uparrow}(\eta, \zeta), \tag{78}$$

and

$$|\chi| = N. \tag{79}$$
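The two steps above can be sketched with `collections.Counter`, whose `&` and `|` operators give elementwise minimum and maximum multiplicities. The function name is hypothetical, both parents are assumed to have the same size, and the random fill is assumed to be a uniform draw without replacement from the copies still permitted by equation 78, since the text leaves the exact distribution open.

```python
import random
from collections import Counter

def inheritance_crossover_fixed(parent_a, parent_b, rng=random):
    """Fixed-size multiset crossover following equations 75-79."""
    size = sum(parent_a.values())
    child = parent_a & parent_b  # minimum multiplicities (equation 77)
    # Copies that may still be added without exceeding the maximum (equation 78).
    spare = list(((parent_a | parent_b) - child).elements())
    # Top the child up to the required size (equation 79).
    child.update(rng.sample(spare, size - sum(child.values())))
    return child
```

The pool of spare copies is always large enough, because the maximum multiplicities sum to at least the size of either parent.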

This operator respects formae and strictly transmits genes but does not assort them because the formae are non-separable. For this reason, the following alternative assorting recombination operator is to be preferred:

1. Assign to each element $x$ in $\mathcal{E}$ a weight

$$w_x = m(x, \eta) + m(x, \zeta). \tag{80}$$

2. Fill each available place in the child in turn, picking each element $x$ with probability

$$p(x) = w_x \Big/ \sum_{x' \in \mathcal{E}} w_{x'} \tag{81}$$

subject to the constraints on the final child $\chi$ given by equations 78 and 79.
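A sketch of this assorting operator follows. The function name is hypothetical, parents are assumed to be equal-size `Counter` multisets, and the constraint of equation 78 is assumed to be enforced by excluding elements that have reached their maximum multiplicity and renormalising the weights over the rest, which is equivalent to redrawing whenever an ineligible element is picked.

```python
import random
from collections import Counter

def assorting_crossover_fixed(parent_a, parent_b, rng=random):
    """Weighted fill of a fixed-size multiset child (equations 80 and 81)."""
    size = sum(parent_a.values())
    weights = parent_a + parent_b  # w_x = m(x, a) + m(x, b)  (equation 80)
    limit = parent_a | parent_b    # upper bound on each multiplicity (equation 78)
    child = Counter()
    for _ in range(size):          # one draw per place, so the size is fixed (equation 79)
        eligible = sorted(x for x in weights if child[x] < limit[x])
        pick = rng.choices(eligible, weights=[weights[x] for x in eligible])[0]
        child[pick] += 1
    return child
```

For set-valued parents {a, b, c} and {a, d, e} the weights reduce to those of table 1, and the child {b, c, d} becomes reachable because common elements are favoured rather than forced.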

9.1.4 Mutation

In the case of multiset (as opposed to set) recombination, there is a bias in the operator towards the middle of the range of allowed multiplicities for elements. This should be redressed by the use of end-point mutation operators as discussed in Radcliffe [22, 23]. These, with low probability, set the multiplicity of some element in a child to 0 or $N^{\star}$ (the maximum allowed value), making any other necessary adjustments if multisets are of fixed size.
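For the variable-size case, end-point mutation might be sketched as below. The function name, the `rate` parameter and the uniform choice of the element to mutate are assumptions made for illustration; the adjustments needed to preserve the size of fixed-size multisets are not specified in the text and are not attempted here.

```python
import random
from collections import Counter

def end_point_mutation(child, universe, max_copies, rate=0.01, rng=random):
    """With probability rate, set one element's multiplicity to 0 or max_copies."""
    mutant = Counter(child)
    if rng.random() < rate:
        x = rng.choice(sorted(universe))
        mutant[x] = rng.choice((0, max_copies))
        if mutant[x] == 0:
            del mutant[x]  # keep the Counter free of zero entries
    return mutant
```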

9.2 Neural Networks

By framing three-layer neural network topology optimisation as a multiset problem over hidden nodes, each complete with external connections, the following have been achieved:

• a non-redundant representation;
• the multiset recombination operator discussed above may be applied;
• suitable formae exist which can be separated.

Because, however, of the very large size of the universal set $\mathcal{E}$, consisting of all possible hidden unit types ($2^n$ for networks with $n$ external (input/output) nodes), recombination of hidden units is also advisable. If the hidden nodes are described by binary strings, with each bit representing an external connection, conventional crossover operators (uniform, n-point etc.) can be used to perform this cross in the controlled ways advocated in section 7.2.

The efficacy of this approach remains to be verified experimentally. An integrated scheme for optimising connectivity and weights also requires further work and might be regarded as the “Holy Grail” in this area.

9.3 Non-Orthogonal Bases

The introduction of non-orthogonal bases is a significant innovation for forma analysis and allows a deeper study of problems for which schemata are not appropriate and for which an orthogonal basis for equivalence relations inducing appropriate formae is unavailable.

Given a complete orthogonal basis $E$ for a set of equivalence relations $\Psi$ which induce the chosen formae, a chromosomal representation can be constructed by allocating one locus to each basic equivalence relation in $E$:

$$e_1 \quad e_2 \quad e_3 \quad e_4 \quad e_5 \quad e_6 \quad e_7 \quad e_8 \quad \cdots \quad e_n$$

where $|E| = n$. The basic equivalence classes (basic formae) then serve as the alleles. Uniform crossover strictly transmits these genes and properly assorts the formae. The critical point here is that with an orthogonal basis, any chromosome of this form represents some legal solution in $\mathcal{S}$ uniquely and every solution in $\mathcal{S}$ has one such chromosomal representative.

In the case of a non-orthogonal basis, not all chromosomes of this type are legal: constraints exist on the permitted combinations of gene values (alleles). The work in section 5.6 shows, however, that while traditional crossover operators (uniform, n-point etc.) cannot manipulate these formae and solutions effectively, the inheritance crossover operator will respect the formae and transmit genes while properly assorting whenever these conditions are compatible. In cases where gene transmission prevents proper assortment, operators such as the assorting crossover of section 6.4 can provide reasonable alternatives.

It is hoped that the examples in this paper convincingly demonstrate the flexibility of forma analysis, and the insights which it can provide.

Acknowledgements

I would like to thank Peter Hancock for a conversation which revived my interest in and consideration of the application of genetic algorithms to problems in neural networks, and thus led indirectly to the work in this paper. I would also like to thank Mike Norman for continuing occasional discussions of genetic algorithms, which form the backdrop to the thinking in this paper.

References

[1] Richard K. Belew, John McInerny, and Nicol N.Schraudolph. Evolving Networks: Using theGenetic Algorithm with ConnectionistLearning. Technical Report CS90–174, UCSD(La Jolla), 1990.

[2] Lashon Booker. Improving Search in GeneticAlgorithms. In Lawrence Davis, editor, GeneticAlgorithms and Simulated Annealing. Pitman(London), 1987.

[3] Anthony N. Burkitt. Optimisation of theArchitecture of Feed-forward Neural Networkswith Hidden Layers by Unit Elimination.Complex Systems, 5(4):371–380, 1991.

[4] Lawrence Davis. Handbook of Genetic Algorithms. Van Nostrand Reinhold (New York), 1991.

[5] David E. Goldberg. Genetic Algorithms in Search, Optimization & Machine Learning. Addison-Wesley (Reading, Mass), 1989.

[6] David E. Goldberg. Genetic Algorithms and Walsh Functions: Part I, A Gentle Introduction. Complex Systems, 3:129–152, 1990.

[7] David E. Goldberg. Real-coded Genetic Algorithms, Virtual Alphabets, and Blocking. Technical Report IlliGAL Report No. 90001, Department of General Engineering, University of Illinois at Urbana-Champaign, 1990.

[8] David E. Goldberg and Robert Lingle Jr. Alleles, Loci and the Traveling Salesman Problem. In Proceedings of an International Conference on Genetic Algorithms. Lawrence Erlbaum Associates (Hillsdale), 1985.

[9] Peter J. B. Hancock. GANNET: Design of a Neural Net for Face Recognition by a Genetic Algorithm. Technical report, University of Stirling, 1990.

[10] Steven Alex Harp, Tariq Samad, and Aloke Guha. The Genetic Synthesis of Neural Networks. Technical Report CSDD–89–I4852–2, Honeywell, 1989.

[11] Steven Alex Harp, Tariq Samad, and Aloke Guha. Towards the Genetic Synthesis of Neural Networks. In Proceedings of the Third International Conference on Genetic Algorithms. Morgan Kaufmann (San Mateo), 1989.

[12] John H. Holland. Adaptation in Natural and Artificial Systems. University of Michigan Press (Ann Arbor), 1975.

[13] John R. Koza. Genetic Programming: A Paradigm for Genetically Breeding Populations of Computer Programs to Solve Problems. Technical Report STAN-CS-90-1314, Stanford University, 1990.

[14] John R. Koza. Evolving a Computer to Generate Random Numbers Using the Genetic Programming Paradigm. In Proceedings of the Fourth International Conference on Genetic Algorithms, pages 37–44. Morgan Kaufmann (San Mateo), 1991.

[15] Geoffrey F. Miller, Peter M. Todd, and Shailesh U. Hegde. Designing Neural Networks using Genetic Algorithms. In Proceedings of the Third International Conference on Genetic Algorithms. Morgan Kaufmann (San Mateo), 1989.

[16] David J. Montana and Lawrence Davis. Training Feedforward Neural Networks Using Genetic Algorithms. In Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, pages 762–767, 1989.

[17] H. Muhlenbein. Parallel Genetic Algorithms, Population Genetics and Combinatorial Optimization. In Proceedings of the Third International Conference on Genetic Algorithms. Morgan Kaufmann (San Mateo), 1989.

[18] H. Muhlenbein and J. Kindermann. The Dynamics of Evolution and Learning - Towards Genetic Neural Networks. In P. Pfiefer, Z. Schreter, F. Fogelman-Soulie, and L. Steels, editors, Connectionism in Perspective. North-Holland, 1989.

[19] Michael Norman. A Genetic Approach to Topology Optimisation for Multiprocessor Architectures. Technical report, University of Edinburgh, 1988.

[20] Nicholas J. Radcliffe. The Permutation Problem. (Unpublished manuscript), 1988.

[21] Nicholas J. Radcliffe. Genetic Neural Networks on MIMD Computers. PhD thesis, University of Edinburgh, 1990.

[22] Nicholas J. Radcliffe. Equivalence Class Analysis of Genetic Algorithms. Complex Systems, 5(2):183–205, 1991.

[23] Nicholas J. Radcliffe. Forma Analysis and Random Respectful Recombination. In Proceedings of the Fourth International Conference on Genetic Algorithms, pages 222–229. Morgan Kaufmann (San Mateo), 1991.

[24] Mike Rudnick. A Bibliography of the Intersection of Genetic Search and Neural Networks. Technical Report CS/E 90-001, Oregon Graduate Centre, 1990.

[25] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning Representations by Back-Propagating Errors. Nature, 323, 1986.

[26] J. David Schaffer, Richard A. Caruana, Larry J. Eshelman, and Rajarshi Das. A Study of the Control Parameters Affecting Online Performance of Genetic Algorithms for Function Optimisation. In Proceedings of the Third International Conference on Genetic Algorithms. Morgan Kaufmann (San Mateo), 1989.

[27] J. David Schaffer and Larry J. Eshelman. On Crossover as an Evolutionarily Viable Strategy. In Proceedings of the Fourth International Conference on Genetic Algorithms, pages 61–68. Morgan Kaufmann (San Mateo), 1991.

[28] J. Sietsma and R. J. F. Dow. Neural Net Pruning - Why and How. In Proceedings of the IEEE Conference on Neural Networks, volume II, 1988.

[29] William M. Spears and Kenneth A. De Jong. On the Virtues of Parameterised Uniform Crossover. In Proceedings of the Fourth International Conference on Genetic Algorithms, pages 230–236. Morgan Kaufmann (San Mateo), 1991.

[30] Gilbert Syswerda. Uniform Crossover in Genetic Algorithms. In Proceedings of the Third International Conference on Genetic Algorithms. Morgan Kaufmann (San Mateo), 1989.

[31] Michael D. Vose. Personal Communication, 1991.

[32] Michael D. Vose. Generalizing the Notion of Schema in Genetic Algorithms. Artificial Intelligence, (in press).

[33] Michael D. Vose and Gunar E. Liepins. Schema Disruption. In Proceedings of the Fourth International Conference on Genetic Algorithms, pages 237–243. Morgan Kaufmann (San Mateo), 1991.

[34] Gerhard Weiss. Combining Neural and Evolutionary Learning: Aspects and Approaches. Technical Report FKI–132–90, Forschungsberichte Kunstliche Intelligenz, 1990.

[35] Darrell Whitley. Applying Genetic Algorithms to Neural Network Problems: A Preliminary Report. (Unpublished manuscript).

[36] Darrell Whitley. Using Reproductive Evaluation to Improve Genetic Search and Heuristic Discovery. In Proceedings of the Second International Conference on Genetic Algorithms. Lawrence Erlbaum Associates (Hillsdale), 1987.

[37] Darrell Whitley, Stephen Dominic, and Rajarshi Das. Genetic Reinforcement Learning With Multilayer Neural Networks. In Proceedings of the Fourth International Conference on Genetic Algorithms. Morgan Kaufmann (San Mateo), 1991.

[38] Darrell Whitley and Thomas Hanson. The Genitor Algorithm: Using Genetic Algorithms to Optimize Neural Networks. Technical Report CS-89-107, Colorado State University, 1989.

[39] Darrell Whitley, Timothy Starkweather, and Christopher Bogart. Genetic Algorithms and Neural Networks: Optimizing Connections and Connectivity. Technical Report CS-89-117, Colorado State University, 1989.

[40] Darrell Whitley, Timothy Starkweather, and D’Ann Fuquay. Scheduling Problems and Traveling Salesmen: The Genetic Edge Recombination Operator. In Proceedings of the Third International Conference on Genetic Algorithms. Morgan Kaufmann (San Mateo), 1989.

[41] Darrell Whitley, Timothy Starkweather, and Danial Shaner. The Traveling Salesmen and Sequence Scheduling: Quality Solutions Using Genetic Edge Recombination. In Lawrence Davis, editor, Handbook of Genetic Algorithms. Van Nostrand Reinhold (New York), 1991.
