CS 478 - Machine Learning Genetic Algorithms (II).

Transcript of CS 478 - Machine Learning Genetic Algorithms (II).

Page 1: CS 478 - Machine Learning Genetic Algorithms (II).

CS 478 - Machine Learning

Genetic Algorithms (II)

Page 2: CS 478 - Machine Learning Genetic Algorithms (II).

Fall 2004 CS 478 - Machine Learning 2

Schema (I)

A schema H is a string from the extended alphabet {0, 1, *}, where * stands for “don't-care” (i.e., a wild card).

A schema represents or matches a number of strings:

Schema    Representatives
*1*       010, 011, 110, 111
10*       100, 101
00*11     00011, 00111

There are 3^L schemata over strings of length L.
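As an illustrative sketch (not part of the slides), the matching relation and the representatives of a schema can be computed directly in Python:

```python
from itertools import product

def matches(schema: str, s: str) -> bool:
    """True if string s is a representative of the schema."""
    return all(h == '*' or h == c for h, c in zip(schema, s))

def representatives(schema: str) -> list[str]:
    """Enumerate all binary strings of length L matched by the schema."""
    L = len(schema)
    return [''.join(bits) for bits in product('01', repeat=L)
            if matches(schema, ''.join(bits))]

print(representatives('*1*'))    # ['010', '011', '110', '111']
print(representatives('10*'))    # ['100', '101']
print(representatives('00*11'))  # ['00011', '00111']
```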

Page 3: CS 478 - Machine Learning Genetic Algorithms (II).

Schema (II)

Since each position in a string may take on either its actual value or a *, each binary string in a GA population contains, or is a representative of, 2^L schemata.

Hence, a population with n members contains between 2^L and min(n·2^L, 3^L) schemata, depending on population diversity. (The upper bound is not strictly n·2^L, as there are a maximum of 3^L schemata.)

Geometrically, strings of length L can be viewed as points in a discrete L-dimensional space (i.e., the vertices of hypercubes). Then, schemata can be viewed as hyperplanes (i.e., hyper-edges and hyper-faces of hypercubes).
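A quick way to check these bounds is to enumerate, for a small hypothetical population, every schema that some member represents (an illustrative sketch, not from the slides):

```python
from itertools import product

def schemata_of(s: str) -> set[str]:
    """The 2^L schemata a binary string represents: each position
    keeps its bit or is generalized to '*'."""
    return {''.join(t) for t in product(*((c, '*') for c in s))}

# A small hypothetical population (n = 3, L = 4) to check the bounds:
pop = ['0000', '1111', '0101']
all_schemata = set().union(*(schemata_of(s) for s in pop))
count = len(all_schemata)
n, L = len(pop), 4
assert 2 ** L <= count <= min(n * 2 ** L, 3 ** L)
print(count)   # 40 here: above 2^4 = 16, below min(3*2^4, 3^4) = 48
```

The shortfall from n·2^L comes from overlap: distinct strings share every schema whose fixed positions agree in both.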

Page 4: CS 478 - Machine Learning Genetic Algorithms (II).

Schema Order

The order of a schema H is the number of non-* symbols in H.

It is denoted by o(H):

Schema    Order
1*1*01    4
0*        1
*0**1     2

A schema of order o over strings of length L represents 2^(L-o) strings.
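Illustratively, the order and the 2^(L-o) count can be computed in a couple of lines of Python (a sketch, not course code):

```python
def order(schema: str) -> int:
    """o(H): the number of fixed (non-*) symbols in the schema."""
    return sum(c != '*' for c in schema)

for H in ('1*1*01', '0*', '*0**1'):
    o, L = order(H), len(H)
    print(f"{H}: o(H) = {o}, represents 2^({L}-{o}) = {2 ** (L - o)} strings")
```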

Page 5: CS 478 - Machine Learning Genetic Algorithms (II).

Schema Defining Length

The defining length of a schema H is the distance between the first and last non-* symbols in H.

It is denoted by δ(H):

Schema    Defining Length
1*1*01    5
*1*1      2
*0***     0
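Illustratively, δ(H) is just the distance between the outermost fixed positions (a sketch, not course code):

```python
def defining_length(schema: str) -> int:
    """delta(H): distance between the first and last non-* positions."""
    fixed = [i for i, c in enumerate(schema) if c != '*']
    return fixed[-1] - fixed[0] if fixed else 0

assert defining_length('1*1*01') == 5   # fixed positions 0 and 5
assert defining_length('*1*1') == 2     # fixed positions 1 and 3
assert defining_length('*0***') == 0    # a single fixed position
```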

Page 6: CS 478 - Machine Learning Genetic Algorithms (II).

Intuitive Approach

Schemata encode useful/promising characteristics found in the population.

What do selection, crossover and mutation do to schemata?

Since more highly fit strings have higher probability of selection, on average an ever-increasing number of samples is given to the observed best schemata.

Crossover cuts strings at arbitrary sites and swaps. Crossover leaves a schema unscathed if it does not cut the schema, but it may disrupt a schema when it does. For example, 1***0 is more likely to be disrupted than **11* is. In general, schemata of short defining length are unaltered by crossover.

Mutation at normal, low rates does not disrupt a particular schema very frequently.
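The disruption claim can be checked by simulation: under one-point crossover, a random cut site falls inside a schema's defining region with probability δ(H)/(L-1), so 1***0 (δ = 4, the maximum for L = 5) is always vulnerable while **11* (δ = 1) is cut only a quarter of the time. A small Monte Carlo sketch (illustrative, not from the slides):

```python
import random

def cut_hits(schema: str, trials: int = 100_000) -> float:
    """Fraction of random one-point crossover cut sites that fall inside
    the schema's defining region and can therefore disrupt it."""
    L = len(schema)
    fixed = [i for i, c in enumerate(schema) if c != '*']
    hits = sum(fixed[0] < random.randrange(1, L) <= fixed[-1]
               for _ in range(trials))
    return hits / trials

print(cut_hits('1***0'))   # close to 1.0  = delta/(L-1) = 4/4
print(cut_hits('**11*'))   # close to 0.25 = delta/(L-1) = 1/4
```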

Page 7: CS 478 - Machine Learning Genetic Algorithms (II).

Intuitive Conclusion

Highly-fit, short-defining-length schemata (called building blocks) are propagated from generation to generation by giving exponentially increasing samples to the observed best.

…And all this takes place in parallel, with no memory other than the population. This parallelism has been termed implicit, as n strings of length L actually allow min(n·2^L, 3^L) schemata to be processed.

Page 8: CS 478 - Machine Learning Genetic Algorithms (II).

Formal Account

See the PDF document containing a formal account of the effect of selection, crossover and mutation, culminating in the Schema Theorem.
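For reference, the Schema Theorem that this formal account culminates in is usually stated in the GA literature (e.g., following Holland and Goldberg) roughly as:

```latex
E[m(H, t+1)] \;\geq\; m(H, t)\,\frac{f(H)}{\bar{f}}
\left[\, 1 - p_c\,\frac{\delta(H)}{L-1} - o(H)\,p_m \,\right]
```

where m(H, t) is the number of representatives of schema H at generation t, f(H) is the average fitness of those representatives, f̄ is the average fitness of the population, and p_c and p_m are the crossover and mutation probabilities. Short, low-order, above-average schemata thus receive exponentially increasing numbers of trials.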

Page 9: CS 478 - Machine Learning Genetic Algorithms (II).

Prototypical Steady-state GA

P ← p randomly generated hypotheses
For each h in P, compute fitness(h)
While max_h fitness(h) < threshold (*)

    Ps ← Select r.p individuals from P (e.g., FPS, RS, tournament)

    Apply crossover to random pairs in Ps and add all offspring to Po

    Select m% of the individuals in Po with uniform probability and apply mutation (i.e., flip one of their bits at random)

    Pw ← r.p weakest individuals in P
    P ← P – Pw + Po

    For each h in P, compute fitness(h)
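The loop above can be sketched in Python. This is an illustrative implementation under stated assumptions (binary tournament selection, one-point crossover, OneMax as a stand-in fitness), not the course's reference code:

```python
import random

def steady_state_ga(fitness, L, p=50, r=0.6, m=5, threshold=1.0, max_gens=500):
    """Minimal sketch of the steady-state GA; names follow the slide:
    p = population size, r.p = number selected and replaced, m = mutation %."""
    P = [[random.randint(0, 1) for _ in range(L)] for _ in range(p)]
    for _ in range(max_gens):
        if max(fitness(h) for h in P) >= threshold:
            break
        k = (int(r * p) // 2) * 2            # even count so offspring come in pairs
        # Ps <- select r.p individuals from P (binary tournament here)
        Ps = [max(random.sample(P, 2), key=fitness) for _ in range(k)]
        # Apply one-point crossover to random pairs in Ps; collect offspring in Po
        random.shuffle(Ps)
        Po = []
        for a, b in zip(Ps[::2], Ps[1::2]):
            cut = random.randrange(1, L)
            Po += [a[:cut] + b[cut:], b[:cut] + a[cut:]]
        # Mutate m% of Po, chosen uniformly: flip one random bit each
        for h in random.sample(Po, max(1, m * len(Po) // 100)):
            h[random.randrange(L)] ^= 1
        # Pw <- the r.p weakest in P;  P <- P - Pw + Po
        P.sort(key=fitness)                  # ascending: weakest first
        P = P[len(Po):] + Po
    return max(P, key=fitness)

# Usage on OneMax (maximize the fraction of 1s), a standard toy fitness:
best = steady_state_ga(lambda h: sum(h) / len(h), L=20)
print(sum(best))                             # count of 1s in the best individual
```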

Page 10: CS 478 - Machine Learning Genetic Algorithms (II).

Influence of Learning

Baldwinian evolution: learned behaviour causes changes only to the fitness landscape.

Lamarckian evolution: learned behaviour also causes changes to the parents' genotypes.

Example: … calculating fitness involves two steps, namely k-means clustering and NAP classification. The effect of k-means clustering is to refine the starting positions of the centroids to more “representative” final positions. At the individual's level, this may be viewed as a form of learning, since NAP classification based on the final centroids' positions is most likely to yield better results than NAP classification based on their starting positions. Hence, through k-means clustering, an individual improves its performance. As fitness is computed after learning, GA-RBF makes implicit use of the Baldwin effect. (Here, we view the result of k-means clustering, namely the improved positions of the centroids, as the learned “traits”.) A straightforward way of implementing Lamarckian evolution consists of coding the new centroids' positions back onto the chromosomes of the individuals of the current generation, prior to genetic recombination.
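The distinction can be sketched in a few lines of Python; `learn` below is a hypothetical stand-in for the k-means refinement step, not the GA-RBF code:

```python
import random

def learn(params):
    """Hypothetical local-improvement step (stands in for k-means refinement):
    deterministically nudges each parameter toward the optimum at 1.0."""
    return [x + 0.1 * (1.0 - x) for x in params]

def fitness(params):                         # higher is better, optimum at all-1.0
    return -sum((x - 1.0) ** 2 for x in params)

genotype = [random.random() for _ in range(5)]

# Baldwinian: evaluate fitness AFTER learning, but leave the genotype unchanged
baldwin_fitness = fitness(learn(genotype))

# Lamarckian: additionally code the learned traits back onto the chromosome
lamarck_genotype = learn(genotype)           # write-back before recombination
lamarck_fitness = fitness(lamarck_genotype)

# The evaluated fitness is identical; the difference is what gets inherited:
assert baldwin_fitness == lamarck_fitness
assert lamarck_genotype != genotype          # Lamarck: improved traits are heritable
```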

Page 11: CS 478 - Machine Learning Genetic Algorithms (II).

Conclusion

Genetic algorithms are used primarily for:

- Optimization problems (e.g., TSP)
- Hybrid systems (e.g., NN evolution)
- Artificial life
- Learning in classifier systems