Class Project 510
Team Members
John A. Watne
Jordan D. Howe
Ian R. Erlanson
Geoffrey A. Reglos
Sengdara Phetsomphou
Project Overview
I. Problem Description
II. Requirements Analysis
III. Technology
IV. Settings and System Design
V. Algorithm
VI. Graphical User Interface (GUI)
VII. Lessons Learned
VIII. Future Enhancement
Problem Description
• In this project, we attempt to design a Genetic Programming system that will produce a mathematical equation equivalent to a pre-defined target, y = (x² + 1) / 2, derived from training data consisting of several values of x and the resulting values of y.
• Analogous to DNA evolution, the program exhibits evolutionary characteristics such as crossover and mutation.
• Key components of the system are the fitness and selection functions, which decide whether a generated solution meets the minimum requirements.
• We expect that each subsequent generation of solutions will be “better” – that is, will better reproduce the training data – than the previous generation, thus eventually resulting in a correct mathematical equation.
Requirements Analysis
– Given training data, consisting of a set of ten positive x values and the matching y values, the genetic programming system will generate a function that closely matches the pre-defined mathematical function, y = (x² +1)/2.
– The resulting function must be generated within the allotted fifteen minutes.
– The expected output of the system will consist of:
• Mathematical function: y = (x² + 1)/2
• Total elapsed time
• Any pertinent information related to the resulting function, such as the number of generations evolved, function, fitness value, etc.
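The training data in the first requirement can be pictured with a small sketch. The slides do not list the ten x values, so the choice of x = 1 through 10 here is an assumption, and the class and method names are hypothetical.

```java
/** Hypothetical sketch: builds (x, y) training pairs for y = (x^2 + 1) / 2. */
final class TrainingDataSketch {
    /** The pre-defined target function from the requirements. */
    static double target(double x) {
        return (x * x + 1.0) / 2.0;
    }

    /** Returns rows of {x, y} for x = 1..count; these x values are an assumption. */
    static double[][] build(int count) {
        double[][] rows = new double[count][2];
        for (int i = 0; i < count; i++) {
            double x = i + 1;
            rows[i][0] = x;
            rows[i][1] = target(x);
        }
        return rows;
    }

    public static void main(String[] args) {
        for (double[] row : build(10)) {
            System.out.println("x = " + row[0] + ", y = " + row[1]);
        }
    }
}
```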
Requirements Analysis - Continued
• If the genetic programming system fails to produce a function within an acceptable tolerance level in the fifteen minute time frame, then terminate execution
• Output the best function along with its associated fitness value upon termination of the Genetic Programming generation and testing loop, whether due to:
– finding a solution within the desired tolerance, OR
– the allocated time expiring
• The system must be able to accept a change in requirements a week before the due date
• The genetic programming system must run on PCs available in the classroom.
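The two exit conditions above (tolerance reached, or time expired) can be sketched as a single loop that always reports the best fitness seen. This is a minimal illustration, not the team's code: `TimedLoopSketch`, `Outcome`, and the `evolveStep` stand-in for "evolve one generation" are all hypothetical, and the cutoff is given in milliseconds for convenience.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch of the generate-and-test loop's two exit conditions. */
final class TimedLoopSketch {
    /** What is reported on termination. */
    static final class Outcome {
        final double bestFitness;
        final int generations;
        final boolean withinTolerance;

        Outcome(double bestFitness, int generations, boolean withinTolerance) {
            this.bestFitness = bestFitness;
            this.generations = generations;
            this.withinTolerance = withinTolerance;
        }
    }

    /**
     * Runs until the best fitness is within tolerance or cutoffMillis has elapsed.
     * evolveStep stands in for evolving one generation: it maps the current best
     * fitness to the best fitness found in the next generation.
     */
    static Outcome run(double initialFitness, DoubleUnaryOperator evolveStep,
                       double tolerance, long cutoffMillis) {
        long start = System.currentTimeMillis();
        double best = initialFitness;
        int generations = 0;
        while (best > tolerance && System.currentTimeMillis() - start < cutoffMillis) {
            double candidate = evolveStep.applyAsDouble(best);
            if (candidate < best) {
                best = candidate; // lower sum of squared errors is better
            }
            generations++;
        }
        // The best result is returned whichever condition ended the loop.
        return new Outcome(best, generations, best <= tolerance);
    }
}
```

For the fifteen minute limit, the cutoff would be `15L * 60L * 1000L` milliseconds.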
Requirements Analysis – cont.
Finite State Machine
Requirements Analysis – cont.
Unified Modeling Language

Timer
-startTime : long
-currTime : long
-elapsedTime : long
-cutOffTime : long
+setCutoffTime(minutes : long) : void
+minutesElapsed() : long
+start() : void
+timeExpired() : boolean

GPTester
-tolerance : double
-TheTimer : Timer
+readTrainingData() : TrainingData[]
+withinTolerance(inp : gpNode) : boolean
+printGenerationResults() : String

GPRandomNumerGenerator
+initialize() : double
+getNumber() : Double

TrainingData
-x : double
-y : double
+setX() : void
+setY(inp : double, double) : void
+getX() : double
+getY() : double

GPGeneration
-nodeSet[] : GPNode
-totalFit : double
-bestNode : GPNode
-bestFit : double
-numberInGeneration : int
-averageFit : double
-crossoverProbability : double
-mutateProbability : double
-newEntrantProbability : double
-maxNumberInGeneration : double
+addNodeToGeneration() : void
+chooseNode(inp : GPNode, GPNode) : GPNode[]
+doCrossover(inp : GPNode, GPNode) : GPNode[]
+setMaxNumberInGeneration(inp : int) : void
+setTotalFit() : double
+getBestNode() : GPNode
+getTotalNode() : GPNode
+getAverageFit() : double
+setProbabilities(inp : double, double, double) : void

GPNode
-leftOperand : GPNode
-rightOperand : GPNode
-label : char
-level : int
-nodeType : int
-parent : GPNode
+getLevel() : int
+toString() : String
+stringToCharStack() : stack
+evaluate(inp : double) : double
+getPrecedence(inp : char) : int
+doMutate() : void
+clone() : GPNode
+getFit(inp : TrainingData[]) : double
Requirements Analysis – cont.
Data Flow Diagram
Technology
Programming Language
• Sun Java 1.4
Development Environments
• NetBeans
• Eclipse
• EditPlus
• DOS Prompt
Why Java?
• A number of programming languages were available for this project, such as C or C++.
• Java was chosen for a number of reasons:
– When we evaluated the technical skills of each team member, Java was the language with the greatest familiarity in the group
– Java is free to download and use
• The construction of the GP Programs from individual nodes lends itself to an object-oriented methodology, and Java is an object-oriented programming language.
• Ease of implementation was another consideration since we are not familiar with the classroom where the presentation will take place.
Settings & System Design
– Using an object-oriented system design that reflects the UML shown in the Requirements Analysis section, each class will be implemented by a separate java .class file.
– All .class files needed by the genetic programming system will be stored in the same directory on the PC on which the program is run.
– For the initial version of the program:
• All inputs will be hard coded within the Java source code
• The output will be written to the standard output when executed from a command prompt.
Settings & System Design – cont.
– Random Number Generator
• Java class using system time as a seed
– Function and Terminal Set
• Numbers 1 through 9
• Operators: +, -, *, /
– Data Structures Used
• Binary Tree
– Creation of generated functions
– Maximum Depth = 5
• Stack
– Evaluation using postfix traversal
– Determining crossover point
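A minimal sketch of how these pieces could fit together: a binary expression tree grown at random up to the maximum depth, rendered in postfix order, and evaluated with a stack. It is an illustration under assumptions rather than the team's `GPNode`: the class name `ExprNode` is hypothetical, the variable `x` is assumed to be a terminal alongside the digits 1 through 9, and the 50% chance of stopping early at a terminal is an arbitrary choice.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

/** Hypothetical sketch of a GP program stored as a binary expression tree. */
final class ExprNode {
    final char label;      // operator, digit '1'..'9', or 'x' (the variable is an assumption)
    final ExprNode left;   // null for terminals
    final ExprNode right;  // null for terminals

    ExprNode(char label, ExprNode left, ExprNode right) {
        this.label = label;
        this.left = left;
        this.right = right;
    }

    static ExprNode leaf(char label) {
        return new ExprNode(label, null, null);
    }

    /** Grows a random tree with at most maxDepth levels. */
    static ExprNode random(Random rng, int maxDepth) {
        // At the depth limit a terminal is forced; otherwise one is chosen half the time.
        if (maxDepth <= 1 || rng.nextBoolean()) {
            int pick = rng.nextInt(10); // 0 selects x, 1..9 select that digit
            return leaf(pick == 0 ? 'x' : (char) ('0' + pick));
        }
        char op = "+-*/".charAt(rng.nextInt(4));
        return new ExprNode(op, random(rng, maxDepth - 1), random(rng, maxDepth - 1));
    }

    int depth() {
        return left == null ? 1 : 1 + Math.max(left.depth(), right.depth());
    }

    /** Post-order traversal: left subtree, right subtree, then this node. */
    String toPostfix() {
        return left == null ? String.valueOf(label)
                            : left.toPostfix() + right.toPostfix() + label;
    }

    /** Evaluates a postfix string with a stack; dividing by zero yields Infinity or NaN. */
    static double evalPostfix(String postfix, double x) {
        Deque<Double> stack = new ArrayDeque<Double>();
        for (char c : postfix.toCharArray()) {
            if (c == 'x') {
                stack.push(x);
            } else if (Character.isDigit(c)) {
                stack.push((double) (c - '0'));
            } else {
                double b = stack.pop(); // right operand was pushed last
                double a = stack.pop();
                switch (c) {
                    case '+': stack.push(a + b); break;
                    case '-': stack.push(a - b); break;
                    case '*': stack.push(a * b); break;
                    default:  stack.push(a / b); break; // '/'
                }
            }
        }
        return stack.pop();
    }
}
```

A time-seeded tree matching the settings above would be `ExprNode.random(new Random(System.currentTimeMillis()), 5)`. Because every token is a single character, the postfix string needs no separators.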
Settings & System Design – cont.
– Programs per Generation
• 50 programs per generation
– Genetic Operator Probabilities
• Crossover = 80%
• Mutation = 10%
• Reproduction (Cloning) = 15%
• New Entrant = 5%
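One way to turn operator percentages into a choice is a cumulative-weight lookup. The listed values add up to 110%, and the slides do not say how they are combined, so this sketch simply treats them as relative weights; that normalization, and the names `OperatorPickSketch` and `pick`, are assumptions.

```java
/** Hypothetical sketch: picks a genetic operator from relative weights. */
final class OperatorPickSketch {
    static final String[] NAMES = {"crossover", "mutation", "reproduction", "new entrant"};
    // Percentages as listed on the slide; treated here as relative weights (assumption).
    static final double[] WEIGHTS = {80.0, 10.0, 15.0, 5.0};

    /** Maps a uniform draw u in [0, 1) to an index into NAMES. */
    static int pick(double u) {
        double total = 0.0;
        for (double w : WEIGHTS) {
            total += w;
        }
        double threshold = u * total;
        double running = 0.0;
        for (int i = 0; i < WEIGHTS.length; i++) {
            running += WEIGHTS[i];
            if (threshold < running) {
                return i;
            }
        }
        return WEIGHTS.length - 1; // guards against rounding at the upper edge
    }
}
```

In use, `u` would come from the random number generator, for example `OperatorPickSketch.pick(rng.nextDouble())` with a `java.util.Random` named `rng`.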
Settings & System Design – cont.
– Divide by Zero
• Dead on Arrival (DOA) indicator
• If TRUE, the function will not be included for consideration into the next generation
Algorithms
by John A. Watne
Algorithms
• Fitness and Selection
– Fitness: sum of squared errors; targeted fitness value = zero.
– Selection probability: p(i) = (1 / (n - 1)) * [1 - (Fit(i) / Sum of Fit(j) over all j)] for n > 1; 100% otherwise
– Any GP programs with division by zero errors for any x value in the training data are determined to be "Dead On Arrival", and are not allowed to reproduce or count toward the total and average fitness values for the generation.
• Method of Tree Traversal
– We implemented a post-order method for tree traversal.
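The fitness and selection rules above can be sketched as follows. Several details are assumptions made for illustration: a program is represented as a `DoubleUnaryOperator`, a non-finite result is taken as the sign of a division by zero, an infinite fitness is used as the Dead On Arrival marker, and the all-zero-fitness case is split evenly. The names `FitnessSketch`, `sumSquaredErrors`, and `selectionProbabilities` are hypothetical.

```java
import java.util.function.DoubleUnaryOperator;

/** Hypothetical sketch of the fitness and selection-probability rules. */
final class FitnessSketch {
    /** Sum of squared errors; POSITIVE_INFINITY marks a Dead On Arrival program (assumption). */
    static double sumSquaredErrors(DoubleUnaryOperator program, double[] xs, double[] ys) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double predicted = program.applyAsDouble(xs[i]);
            if (Double.isNaN(predicted) || Double.isInfinite(predicted)) {
                return Double.POSITIVE_INFINITY; // division by zero somewhere in the program
            }
            double error = predicted - ys[i];
            sum += error * error;
        }
        return sum;
    }

    /** p(i) = (1 / (n - 1)) * (1 - Fit(i) / total) over the n live programs; DOA programs get 0. */
    static double[] selectionProbabilities(double[] fits) {
        int live = 0;
        double total = 0.0;
        for (double f : fits) {
            if (!Double.isInfinite(f)) { // DOA programs do not count toward n or the total
                live++;
                total += f;
            }
        }
        double[] p = new double[fits.length];
        for (int i = 0; i < fits.length; i++) {
            if (Double.isInfinite(fits[i])) {
                p[i] = 0.0;
            } else if (live == 1) {
                p[i] = 1.0;          // the "100% otherwise" case
            } else if (total == 0.0) {
                p[i] = 1.0 / live;   // every live program is already perfect (assumption)
            } else {
                p[i] = (1.0 / (live - 1)) * (1.0 - fits[i] / total);
            }
        }
        return p;
    }
}
```

With fitness values 1 and 3 plus one DOA program, the probabilities come out as 0.75, 0.25, and 0: the lower error gets the larger share, and the live probabilities sum to 1.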
Algorithms - continued
• Sorting
– After a new generation of GP programs has been created and each one evaluated, they could be sorted in ascending order of fitness.
– This would ease the selection of valid functions into the subsequent generation, because the best candidate solutions would be toward the front of the array.
– We chose not to use any sorting in any part of the GP Project, for a number of reasons.
– One reason is that we were concerned about the fifteen minute time limit.
– Also, we chose to simplify the design to meet the project deadline. We were also attempting to implement a GUI, and we were concerned that sorting would consume much-needed processing time from the CPU.
– We have considered adding sorting by fitness value as a future enhancement.
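For reference, the sorting that was considered is a short call in Java 8 or later. The sketch below is an assumption about how it might look, not part of the project: `Scored` is a hypothetical pairing of a program's text with its fitness, and `Comparator.comparingDouble` postdates the Java 1.4 used at the time.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical sketch: sorts a generation so the lowest (best) fitness comes first. */
final class SortSketch {
    /** A program's text paired with its fitness value. */
    static final class Scored {
        final String program;
        final double fitness;

        Scored(String program, double fitness) {
            this.program = program;
            this.fitness = fitness;
        }
    }

    /** Ascending order of sum of squared errors, sorted in place. */
    static void sortAscending(Scored[] generation) {
        Arrays.sort(generation, Comparator.comparingDouble((Scored s) -> s.fitness));
    }
}
```

Sorting 50 programs per generation is an O(n log n) step on a very small n.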
Algorithms – continued
Key Correction to Algorithm:
• Issue: When reviewing the graph of best fit and average fit of each succeeding generation, the values were swinging up and down, rather than being monotonically non-increasing (that is, never increasing; always decreasing or remaining level).
• Resolution: Rather than just cloning randomly selected individuals from the prior generation, make sure that the best program from the prior generation survives unchanged as the first program added to the new generation. This guarantees that the best fit for a program in the new generation can be no worse than the best fit from its previous (parent) generation.
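The effect of this fix can be illustrated with a small sketch in which each program is reduced to its fitness value. The names are hypothetical, and the random perturbation is only a stand-in for crossover and mutation; the point is that copying the prior best into slot 0 keeps the best fitness from ever getting worse.

```java
import java.util.Random;

/** Hypothetical sketch: the prior best survives unchanged as the first entry. */
final class ElitismSketch {
    /** Lowest (best) fitness in a generation. */
    static double best(double[] fits) {
        double b = fits[0];
        for (double f : fits) {
            b = Math.min(b, f);
        }
        return b;
    }

    /** Builds the next generation's fitness values from the previous ones. */
    static double[] nextGeneration(double[] prev, Random rng) {
        double[] next = new double[prev.length];
        next[0] = best(prev); // elite: copied over without any change
        for (int i = 1; i < next.length; i++) {
            double parent = prev[rng.nextInt(prev.length)];
            // Stand-in for crossover/mutation: offspring may be better or worse.
            next[i] = Math.abs(parent + rng.nextGaussian());
        }
        return next;
    }
}
```

Without the `next[0]` line, a generation made only of perturbed offspring could lose its best member, which matches the up-and-down swings described in the issue.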
Best Fit of GP Program by Generation - continued
Before Fix:
Best Fit GP Program by Generation
After Fix:
Graphical User Interface
by Ian R. Erlanson
Output Screen
Lessons Learned and Future Enhancements
by Geoffrey A. Reglos
Lessons Learned
• I got good practice at reading and working with other people’s code and writing code that conformed to project specifications.
• I learned an essential step in developing a computer program: John and others started with a simple solution and then sought to understand that solution’s performance characteristics, which helped me see how to develop the computational procedure for solving a problem.
Lessons Learned - continued
• I underestimated the work involved with documentation. Thus, I learned about the need for the documenter to work more closely with the developer to understand the details of the program(s).
• I learned to work with a group of people on a short-term project. We were able to work within each individual’s strengths and weaknesses to complete the project in a timely manner. The important characteristics of working with this group were communication and some degree of trust.
Lessons Learned - continued
• I learned more about the use of probability of survival, so common to actuarial work, applied to the creation of new software by software.
Future Enhancements
• Implement sorting in ascending order for the functions in a generation. This will ensure that the function with the best fitness value is at the top.
• Implement more flexibility of the input of training data. Currently, the training data is hardcoded. We would like to have a GUI which will offer the user a number of choices in how to accept training data in different formats. This would also involve adding more logic to parse and format the data into an acceptable form for use by the GP program.
• Use Ant to simplify the task of managing the build of the project.
Q & A