Gary D. Boetticher Boetticher@uhcl Univ. of Houston - Clear Lake, Houston, TX, USA
The Assessment and Application of Lineage Information in Genetic Programs for Producing Better Models
Gary D. Boetticher, [email protected], Univ. of Houston - Clear Lake, Houston, TX, USA
http://nas.cl.uh.edu/boetticher/publications.html The 2006 IEEE International Conference on Information Reuse and Integration
Kim Kaminsky, [email protected], Univ. of Houston - Clear Lake, Houston, TX, USA
About the Author: Gary D. Boetticher
Ph.D. in Machine Learning and Software Engineering
  A neural network-based software reuse economic model
Executive member of IEEE Reuse Standard Committees (1990s)
Commercial consultant:
  U.S. Olympic Committee, LDDS Worldcom, Mellon Mortgage, …
Currently: Associate Professor
  Department of Comp. Science/Software Engineering
  University of Houston - Clear Lake, Houston, TX, USA
  [email protected]
Research interests: Data mining, ML, Computational Bioinformatics, and Software metrics
Motivating Questions
Does chromosome lineage information within a Genetic Program (GP) provide any insight into the effectiveness of solving problems?
If so, how could these insights be utilized to make better breeding decisions?
Genetic Program Overview
X, Y, and Z → RESULT?

X  Y  Z  RESULT
2  4  5  30
5  3  2  16
:  :  :  :
1  3  6  24

1) Create a population of equations

Eq#   Equation
1     X+Y
2     (Z-X)*Y+X
:     :
1000  (X*X)-Z

2) Determine the fitness for each (1 / Stand. Error)

Eq#   Fitness
1     87
2     84
:     :
1000  57

3) Breed Equations

Parents:    X + Y        (Z-X) * Y+X
Offspring:  (Z-X) + Y    X * Y+X

4) Generate new populations and breed until a solution is found
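Step 2 can be sketched in a few lines of Python. This is only an illustration: the slide says fitness is 1 / standard error but does not define the standard error, so the sketch assumes the root-mean-square of the residuals. It uses only the three data rows visible on the slide, and the names DATA, POPULATION, standard_error, and fitness are hypothetical.

```python
import math

# The three rows visible on the slide: (X, Y, Z, RESULT).
DATA = [(2, 4, 5, 30), (5, 3, 2, 16), (1, 3, 6, 24)]

# Three of the candidate equations listed in step 1.
POPULATION = {
    "X+Y": lambda x, y, z: x + y,
    "(Z-X)*Y+X": lambda x, y, z: (z - x) * y + x,
    "(X*X)-Z": lambda x, y, z: x * x - z,
}


def standard_error(equation, rows):
    # Assumption: standard error = root-mean-square of the residuals.
    squared = sum((equation(x, y, z) - result) ** 2 for x, y, z, result in rows)
    return math.sqrt(squared / len(rows))


def fitness(equation, rows):
    # Fitness is the reciprocal of the standard error; a perfect fit
    # (zero error) is treated as infinitely fit.
    error = standard_error(equation, rows)
    return math.inf if error == 0 else 1.0 / error


# Rank the candidates from most fit to least fit.
ranked = sorted(POPULATION, key=lambda name: fitness(POPULATION[name], DATA), reverse=True)
for name in ranked:
    print(f"{name:12s} fitness = {fitness(POPULATION[name], DATA):.4f}")
```

The absolute values differ from the 87, 84, ... shown on the slide because the slide's data set and scaling are not given. Only the ranking idea carries over.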
Genetic Program Overview

Generation N
Equation             Fitness
(X+Y)                87
(X - Z) * (Y * Y)    86
ZY                   75
:                    :
Y                    22
Y - X                18

Generation N+1
Equation             Fitness
(X - Z)
(X + Y) * (Y * Y)
Z + Y
:
X
Y + Y
Why discard legacy information?
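One way to keep that information is to have every offspring remember where its parents ranked. The sketch below is a hypothetical illustration, not the authors' implementation. Equations are nested tuples, the crossover points are fixed by hand (a real GP would pick them at random), and the Chromosome class and its parents field are assumptions. The example pair is taken from the Generation N listing, and the offspring match the first two entries of the Generation N+1 listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chromosome:
    tree: object          # nested tuple such as ("+", "X", "Y"), or a variable name
    parents: tuple = ()   # fitness ranks of the two parents (hypothetical lineage record)


def get_subtree(tree, path):
    # Follow a sequence of tuple indexes down to a subtree.
    for index in path:
        tree = tree[index]
    return tree


def replace_subtree(tree, path, new):
    # Return a copy of tree with the subtree at path swapped for new.
    if not path:
        return new
    head, rest = path[0], path[1:]
    return tree[:head] + (replace_subtree(tree[head], rest, new),) + tree[head + 1:]


def crossover(a, rank_a, path_a, b, rank_b, path_b):
    # Swap one subtree between the parents and record both parent ranks.
    sub_a = get_subtree(a.tree, path_a)
    sub_b = get_subtree(b.tree, path_b)
    lineage = (rank_a, rank_b)
    return (
        Chromosome(replace_subtree(a.tree, path_a, sub_b), lineage),
        Chromosome(replace_subtree(b.tree, path_b, sub_a), lineage),
    )


def show(tree):
    # Render a tree as an infix string.
    if isinstance(tree, str):
        return tree
    op, left, right = tree
    return f"({show(left)} {op} {show(right)})"


# The two fittest equations of Generation N (ranks 1 and 2).
p1 = Chromosome(("+", "X", "Y"))
p2 = Chromosome(("*", ("-", "X", "Z"), ("*", "Y", "Y")))

# Swap all of p1 with the left operand of p2.
c1, c2 = crossover(p1, 1, (), p2, 2, (1,))
print(show(c1.tree), c1.parents)
print(show(c2.tree), c2.parents)
```

With the parent ranks stored on each offspring, a later generation can be analyzed by ancestry instead of by fitness alone.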
Goal: Examine fitness patterns over time
Generation 1, Generation 2, Generation 3 (the same ranked listing is shown for each):

Equation                      Fitness
(X+Y)                         87
(X - Z) * (Y * Y)             86
ZY                            85
(X - Z) * (Y * Y)             84
Y                             79
Y - X                         75
Z + Y                         75
(X - Z) * (Y * Y)             75
Y                             73
Y - X                         71
(X - Z) * (Y * Y) + W + W     68
Y - X                         67
ZY                            66
(X - Z) * (Y * Y)             66
Y                             65
Y - X                         65
(X - Z) * (Y * Y) + W + W     64
Y - X                         64
Z - Y                         62
(X - Z) * (Y * Y)             59
Y                             58
Y - X                         55
(X - Z) * (Y * Y) + W + W     44

Localized?
Volatile?
Proof of Concept Experiments - 1
5 experiments using synthetic equations:
Z = W + X + Y
Z = 2 * X + Y – W
Z = X / Y
Z = X^3
Z = W^2 + W * X - Y
Data slightly perturbed to prevent premature convergence
Genetic Program
1000 Chromosomes (Equations)
50 Generations
Breeding based on fitness rank
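A minimal sketch of how such slightly perturbed synthetic data might be generated. The slide gives neither the number of rows, the input ranges, nor the size of the perturbation, so the 100 rows, the inputs drawn from [1, 10], and the 1% multiplicative noise below are all assumptions. The sketch also reads the fourth and fifth equations as X cubed and W squared, and the names EQUATIONS and make_dataset are hypothetical.

```python
import random

# The five synthetic target equations, as functions of W, X, and Y.
EQUATIONS = {
    "Z = W + X + Y": lambda w, x, y: w + x + y,
    "Z = 2 * X + Y - W": lambda w, x, y: 2 * x + y - w,
    "Z = X / Y": lambda w, x, y: x / y,
    "Z = X^3": lambda w, x, y: x ** 3,
    "Z = W^2 + W * X - Y": lambda w, x, y: w ** 2 + w * x - y,
}


def make_dataset(equation, rows=100, noise=0.01, seed=0):
    # Assumed settings: inputs drawn uniformly from [1, 10] (which also
    # avoids division by zero) and Z scaled by a factor within +/- noise.
    rng = random.Random(seed)
    data = []
    for _ in range(rows):
        w, x, y = (rng.uniform(1.0, 10.0) for _ in range(3))
        z = equation(w, x, y) * (1.0 + rng.uniform(-noise, noise))
        data.append((w, x, y, z))
    return data


for name, equation in EQUATIONS.items():
    w, x, y, z = make_dataset(equation)[0]
    print(f"{name}: W={w:.2f} X={x:.2f} Y={y:.2f} Z={z:.2f}")
```

Because of the perturbation, no candidate equation fits the data exactly, which is what keeps the run from converging prematurely.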
Proof of Concept Experiments - 2
For the 1000 Chromosomes:
Divide into 5 groups of 200 (by fitness)
Focus on the best, middle, and worst groups
See where each group’s offspring occur in the next generation
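The bookkeeping behind this analysis can be sketched as a 5 x 5 count table: rows are the fitness group of a parent, columns are the fitness group in which its offspring lands in the next generation. This is an illustrative sketch under assumptions. Ranks are taken to be 0-based with 0 as the fittest, each offspring contributes one (parent_rank, child_rank) pair per parent, and the function names are hypothetical.

```python
GROUPS = 5


def group_of(rank, population_size, groups=GROUPS):
    # Map a 0-based fitness rank (0 = fittest) to a group index (0 = best group).
    return rank * groups // population_size


def offspring_placement(pairs, population_size, groups=GROUPS):
    # pairs: iterable of (parent_rank, child_rank), where child_rank is the
    # offspring's fitness rank within the next generation.
    table = [[0] * groups for _ in range(groups)]
    for parent_rank, child_rank in pairs:
        row = group_of(parent_rank, population_size, groups)
        col = group_of(child_rank, population_size, groups)
        table[row][col] += 1
    return table


# Tiny hand-made example for a population of 1000.
pairs = [(0, 10), (5, 450), (500, 100), (999, 900)]
table = offspring_placement(pairs, population_size=1000)
for label, row in (("Best", 0), ("Middle", 2), ("Worst", 4)):
    print(f"{label:6s} {table[row]}")
```

Rows 0, 2, and 4 of the table correspond to the best, middle, and worst groups that the experiments focus on.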
Results for Z = W + X + Y
[Figure panels: Best, Middle, Worst]
Results for Z = 2 * X + Y – W
[Figure panels: Best, Middle, Worst]
Results for Z = X / Y
[Figure panels: Best, Middle, Worst]
Results for Z = X^3
[Figure panels: Best, Middle, Worst]
Results for Z = W^2 + W * X - Y
[Figure panels: Best, Middle, Worst]
Applied Experiments
Best class produces best offspring. Now what?
Compare 2 Genetic Programs (GPs)
1) Use a vanilla-based GP
2) Use a GP that breeds only the top 20% of a population and replicates 5 times.
Genetic Program
1000 Chromosomes (Equations)
50 Generations
20 Trials
Equations to model
Z = Sin(W) + Sin(X) + Sin(Y)
Z = log10(W^X) + (Y * Z)
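The second GP's selection rule can be sketched as follows. The slides do not spell out what "replicates 5 times" means. The sketch assumes each chromosome in the top 20% is copied into the mating pool 5 times, so a population of 1000 yields a pool of 1000, and that parents are then paired at random with each pair producing two offspring. The function names are hypothetical.

```python
import random


def lineage_mating_pool(ranked_population, top_percent=20, copies=5):
    # ranked_population must be sorted from most fit to least fit.
    # Keep only the top slice and replicate each survivor `copies` times.
    cutoff = len(ranked_population) * top_percent // 100
    elite = ranked_population[:cutoff]
    return [chromosome for chromosome in elite for _ in range(copies)]


def pair_up(pool, seed=0):
    # Assumption: random pairing; each pair would then be bred by crossover.
    rng = random.Random(seed)
    shuffled = pool[:]
    rng.shuffle(shuffled)
    return list(zip(shuffled[0::2], shuffled[1::2]))


# Stand-in population: integers 0..999, where the value is the fitness rank.
population = list(range(1000))
pool = lineage_mating_pool(population)
pairs = pair_up(pool)
print(len(pool), len(pairs), max(pool))
```

Under these assumptions the population size stays at 1000 from one generation to the next, while every offspring descends from the best class only.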
Results for Z = Sin(W) + Sin(X) + Sin(Y)
                                      Vanilla-Based GP   Lineage-Based GP
Average Fitness                       591.8              740.9
Average r^2                           0.8734             0.9315
Ave. Generations needed to complete   29.1               28.5
Results for Z = log10(W^X) + (Y * Z)

                                      Vanilla-Based GP   Lineage-Based GP
Average Fitness                       210.9              346.5
Average r^2                           0.7244             0.8069
Ave. Generations needed to complete   50.0               48.6
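To put the two result tables side by side, the relative change from the vanilla-based GP to the lineage-based GP can be computed directly from the reported averages. The snippet below is only arithmetic on the numbers in the tables. The labels "sin model" and "log10 model" and the function name relative_change are shorthand introduced here.

```python
# (vanilla-based GP, lineage-based GP) averages as reported in the two tables.
RESULTS = {
    "sin model": {
        "Average Fitness": (591.8, 740.9),
        "Average r^2": (0.8734, 0.9315),
        "Ave. Generations": (29.1, 28.5),
    },
    "log10 model": {
        "Average Fitness": (210.9, 346.5),
        "Average r^2": (0.7244, 0.8069),
        "Ave. Generations": (50.0, 48.6),
    },
}


def relative_change(vanilla, lineage):
    # Percentage change of the lineage-based value relative to the vanilla value.
    return (lineage - vanilla) / vanilla * 100.0


for model, metrics in RESULTS.items():
    for metric, (vanilla, lineage) in metrics.items():
        print(f"{model:12s} {metric:18s} {relative_change(vanilla, lineage):+.1f}%")
```

For fitness and r^2 a positive change favors the lineage-based GP. For generations a negative change does. The vanilla average of 50.0 generations on the log10 model equals the 50-generation limit listed for the runs.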
Conclusions
Proof of concept experiments demonstrate the viability of considering lineage in GPs
Applied experiments show that lineage-based GP modeling produces better results (higher average fitness and r^2) in slightly fewer generations