Class Project 510: Team Members John A. Watne, Jordan D. Howe, Ian R. Erlanson, Geoffrey A. Reglos, Sengdara Phetsomphou


Class Project 510

Team Members

John A. Watne

Jordan D. Howe

Ian R. Erlanson

Geoffrey A. Reglos

Sengdara Phetsomphou


Project Overview

I. Problem Description
II. Requirements Analysis
III. Technology
IV. Settings and System Design
V. Algorithm
VI. Graphical User Interface (GUI)
VII. Lessons Learned
VIII. Future Enhancement


Problem Description

• In this project, we are attempting to design a Genetic Programming system that will produce a mathematical expression equivalent to the pre-defined equation y = (x² + 1) / 2, derived from training data consisting of several values of x and the resulting values of y.

• Analogous to DNA evolution, the program will exhibit characteristics such as crossover and mutation.

• Key components of the system are the fitness and selection functions, which decide whether a generated solution meets the minimum requirements.

• We expect that each subsequent generation of solutions will be “better” – that is, will better reproduce the training data – than the previous generation, thus eventually resulting in a correct mathematical equation.


Requirements Analysis

– Given training data, consisting of a set of ten positive x values and the matching y values, the genetic programming system will generate a function that closely matches the pre-defined mathematical function, y = (x² +1)/2.

– The resulting function must be generated within the allotted fifteen minutes.

– The expected output of the system will consist of:
• Mathematical function: y = (x² + 1)/2
• Total elapsed time
• Any pertinent information related to the resulting function, such as the number of generations evolved, function, fitness value, etc.
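The training data described above can be sketched in a few lines of Java. This is only an illustration: the class names `TrainingPoint` and `TrainingSet` are hypothetical, and the sample points x = 1 through 10 are an assumption, since the slides state only that ten positive x values are used.

```java
/** One (x, y) pair of training data. Hypothetical name, not the project's class. */
final class TrainingPoint {
    final double x;
    final double y;

    TrainingPoint(double x, double y) {
        this.x = x;
        this.y = y;
    }
}

final class TrainingSet {
    /** Target function from the slides: y = (x^2 + 1) / 2. */
    static double target(double x) {
        return (x * x + 1.0) / 2.0;
    }

    /** Builds points at x = 1, 2, ..., count (assumed sample points). */
    static TrainingPoint[] build(int count) {
        TrainingPoint[] points = new TrainingPoint[count];
        for (int i = 0; i < count; i++) {
            double x = i + 1;
            points[i] = new TrainingPoint(x, target(x));
        }
        return points;
    }
}
```

Calling `TrainingSet.build(10)` would give ten positive x values with their matching y values, which is the shape of input the requirements describe.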


Requirements Analysis - Continued

• If the genetic programming system fails to produce a function within an acceptable tolerance level in the fifteen-minute time frame, then terminate execution.

• Output the best function along with its associated fitness value upon termination of the Genetic Programming generation and testing loop, whether due to:
– finding a solution within the desired tolerance, OR
– the allocated time expiring

• The system must be able to accept a change in requirements a week before the due date

• The genetic programming system must run on PCs available in the classroom.
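The two termination conditions above (tolerance reached, or time expired) suggest a loop of roughly the following shape. This is a minimal sketch, not the project's code: `RunResult`, `GenerationLoop`, and the `evolveOneGeneration` callback are hypothetical names, and the callback is assumed to return the best fitness of the generation it just produced, with lower values being better.

```java
import java.util.function.DoubleSupplier;

/** Outcome of a run: whether tolerance was met, generations used, best fitness seen. */
final class RunResult {
    final boolean withinTolerance;
    final int generations;
    final double bestFitness;

    RunResult(boolean withinTolerance, int generations, double bestFitness) {
        this.withinTolerance = withinTolerance;
        this.generations = generations;
        this.bestFitness = bestFitness;
    }
}

final class GenerationLoop {
    /**
     * Evolves generations until the best fitness is within tolerance
     * or the cutoff (in milliseconds) has passed, whichever comes first.
     */
    static RunResult run(DoubleSupplier evolveOneGeneration,
                         double tolerance, long cutoffMillis) {
        long start = System.currentTimeMillis();
        double best = Double.POSITIVE_INFINITY;
        int generations = 0;
        while (true) {
            best = Math.min(best, evolveOneGeneration.getAsDouble());
            generations++;
            if (best <= tolerance) {
                return new RunResult(true, generations, best);
            }
            if (System.currentTimeMillis() - start >= cutoffMillis) {
                return new RunResult(false, generations, best);
            }
        }
    }
}
```

Either way the caller receives the best fitness found, so the "output the best function on termination" requirement can be met from a single return path.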


Requirements Analysis – cont.

Finite State Machine


Requirements Analysis – cont.
Unified Modeling Language

Timer
-startTime : long
-currTime : long
-elapsedTime : long
-cutOffTime : long
+setCutoffTime(minutes : long) : void
+minutesElapsed() : long
+start() : void
+timeExpired() : boolean

GPTester
-tolerance : double
-TheTimer : Timer
+readTrainingData() : TrainingData[]
+withinTolerance(inp : gpNode) : boolean
+printGenerationResults() : String

GPRandomNumerGenerator
+initialize() : double
+getNumber() : Double

TrainingData
-x : double
-y : double
+setX() : void
+setY(inp : double, double) : void
+getX() : double
+getY() : double

GPGeneration
-nodeSet[] : GPNode
-totalFit : double
-bestNode : GPNode
-bestFit : double
-numberInGeneration : int
-averageFit : double
-crossoverProbability : double
-mutateProbability : double
-newEntrantProbability : double
-maxNumberInGeneration : double
+addNodeToGeneration() : void
+chooseNode(inp : GPNode, GPNode) : GPNode[]
+doCrossover(inp : GPNode, GPNode) : GPNode[]
+setMaxNumberInGeneration(inp : int) : void
+setTotalFit() : double
+getBestNode() : GPNode
+getTotalNode() : GPNode
+getAverageFit() : double
+setProbabilities(inp : double, double, double) : void

GPNode
-leftOperand : GPNode
-rightOperand : GPNode
-label : char
-level : int
-nodeType : int
-parent : GPNode
+getLevel() : int
+toString() : String
+stringToCharStack() : stack
+evaluate(inp : double) : double
+getPrecedence(inp : char) : int
+doMutate() : void
+clone() : GPNode
+getFit(inp : TrainingData[]) : double
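As an illustration of how the Timer class in the diagram might look in code, here is a minimal sketch using the four operations it lists. The class is named `CutoffTimer` to make clear it is not the project's source; the injectable clock is an assumption added so the behavior can be checked without waiting fifteen real minutes, and the sketch uses present-day Java rather than the 1.4 dialect.

```java
import java.util.function.LongSupplier;

/** Sketch of the diagram's Timer: tracks elapsed minutes against a cutoff. */
final class CutoffTimer {
    private final LongSupplier clockMillis; // injectable clock (assumption, for testing)
    private long startTime;                 // milliseconds at start()
    private long cutOffTime;                // cutoff in minutes, as in the diagram

    CutoffTimer(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    /** Default constructor uses the system clock. */
    CutoffTimer() {
        this(System::currentTimeMillis);
    }

    void setCutoffTime(long minutes) {
        cutOffTime = minutes;
    }

    void start() {
        startTime = clockMillis.getAsLong();
    }

    /** Whole minutes elapsed since start(). */
    long minutesElapsed() {
        return (clockMillis.getAsLong() - startTime) / 60_000L;
    }

    boolean timeExpired() {
        return minutesElapsed() >= cutOffTime;
    }
}
```

With `setCutoffTime(15)` followed by `start()`, `timeExpired()` would begin returning true once fifteen whole minutes have passed.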


Requirements Analysis – cont.
Data Flow Diagram


Technology

Programming Language
• Sun Java 1.4

Development Environments
• NetBeans
• Eclipse
• EditPlus
• DOS Prompt


Why Java?

• A number of programming languages were available for our use in this project, such as C or C++.

• Java was chosen as the programming language for a number of reasons:

– When we evaluated the technical skills of each team member, Java was the language with the greatest familiarity in the group

– Java is free to download and use

• The construction of the GP Programs from individual nodes lends itself to an object-oriented methodology, and Java is an object-oriented programming language.

• Ease of implementation was another consideration since we are not familiar with the classroom where the presentation will take place.


Settings & System Design

– Using an object-oriented system design that reflects the UML shown in the Requirements Analysis section, each class will be implemented by a separate Java .class file.

– All .class files needed by the genetic programming system will be stored in the same directory on the PC on which the program is run.

– For the initial version of the program:
• All inputs will be hard coded within the Java source code
• The output will be written to the standard output when executed from a command prompt.


Settings & System Design – cont.

– Random Number Generator
• Java class using system time as a seed

– Function and Terminal Set
• Numbers 1 through 9
• Operators: +, -, *, /

– Data Structures Used
• Binary Tree
  – Creation of generated functions
  – Maximum Depth = 5
• Stack
  – Evaluation using postfix traversal
  – Determining crossover point
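The pairing of a binary tree with a stack can be illustrated as follows: a post-order walk of the tree yields a postfix string, and an operand stack evaluates that string. This is a sketch under assumptions, not the project's GPNode code. `ExprNode` and `PostfixEvaluator` are hypothetical names, and treating `x` as a terminal alongside the digits 1 through 9 is an assumption (the slide lists only the numbers, but the target function depends on x).

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Binary expression tree node: leaves hold 'x' or a digit, inner nodes an operator. */
final class ExprNode {
    final char label;
    final ExprNode left;
    final ExprNode right;

    ExprNode(char label) {
        this(label, null, null);
    }

    ExprNode(char label, ExprNode left, ExprNode right) {
        this.label = label;
        this.left = left;
        this.right = right;
    }

    boolean isLeaf() {
        return left == null;
    }

    /** Number of levels in the subtree; a single leaf has depth 1. */
    int depth() {
        return isLeaf() ? 1 : 1 + Math.max(left.depth(), right.depth());
    }

    /** Post-order walk: left subtree, right subtree, then this node's label. */
    String toPostfix() {
        return isLeaf() ? String.valueOf(label)
                        : left.toPostfix() + " " + right.toPostfix() + " " + label;
    }
}

final class PostfixEvaluator {
    /** Evaluates a space-separated postfix string using an operand stack. */
    static double evaluate(String postfix, double x) {
        Deque<Double> stack = new ArrayDeque<>();
        for (String token : postfix.trim().split("\\s+")) {
            char c = token.charAt(0);
            if (c == 'x') {
                stack.push(x);
            } else if (Character.isDigit(c)) {
                stack.push((double) (c - '0'));
            } else {
                double right = stack.pop(); // right operand was pushed last
                double left = stack.pop();
                stack.push(apply(c, left, right));
            }
        }
        return stack.pop();
    }

    private static double apply(char op, double a, double b) {
        switch (op) {
            case '+': return a + b;
            case '-': return a - b;
            case '*': return a * b;
            case '/': return a / b; // double division by zero gives Infinity or NaN, not an exception
            default: throw new IllegalArgumentException("Unknown operator: " + op);
        }
    }
}
```

For the target function, the tree for ((x * x) + 1) / 2 has four levels, which fits within the maximum depth of 5, and its postfix form is `x x * 1 + 2 /`.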


Settings & System Design – cont.

– Programs per Generation
• 50 programs per generation

– Genetic Operator Probabilities
• Crossover = 80%
• Mutation = 10%
• Reproduction (Cloning) = 15%
• New Entrant = 5%


Settings & System Design – cont.

– Divide by Zero
• Dead on Arrival (DOA) indicator
• If TRUE, the function will not be included for consideration into the next generation


Algorithms
by John A. Watne


Algorithms

• Fitness and Selection
– Fitness: sum of squared errors; targeted fitness value = zero.
– Selection probability: p(i) = (1 / (n - 1)) * [1 - (Fit(i) / Sum of Fit(j) over all j)] for n > 1; 100% otherwise.
– Any GP programs with division by zero errors for any x value in the training data are determined to be "Dead On Arrival", and are not allowed to reproduce or count toward the total and average fitness values for the generation.

• Method of Tree Traversal
– We implemented a post-order method for tree traversal.
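The fitness and selection rules above can be sketched as follows. This is an illustration only: `FitnessSketch` is a hypothetical name, a candidate program is modeled as a plain `DoubleUnaryOperator`, and marking a Dead On Arrival program with NaN is an assumption (the slides describe a DOA indicator but not how it is stored). The even split when every fitness is zero is also an assumption, since the formula is undefined in that case.

```java
import java.util.function.DoubleUnaryOperator;

final class FitnessSketch {
    /**
     * Sum of squared errors over the training data. Returns NaN, used here
     * as the DOA marker, if the candidate yields a non-finite value for any
     * x (for example after a division by zero).
     */
    static double sumSquaredError(DoubleUnaryOperator candidate,
                                  double[] xs, double[] ys) {
        double sum = 0.0;
        for (int i = 0; i < xs.length; i++) {
            double out = candidate.applyAsDouble(xs[i]);
            if (Double.isNaN(out) || Double.isInfinite(out)) {
                return Double.NaN;
            }
            double err = out - ys[i];
            sum += err * err;
        }
        return sum;
    }

    /**
     * p(i) = (1 / (n - 1)) * (1 - fit(i) / total) for n > 1, and 1.0 when n == 1,
     * where n and total count only programs that are not DOA.
     * DOA programs get probability 0.
     */
    static double[] selectionProbabilities(double[] fits) {
        int n = 0;
        double total = 0.0;
        for (double f : fits) {
            if (!Double.isNaN(f)) {
                n++;
                total += f;
            }
        }
        double[] p = new double[fits.length];
        for (int i = 0; i < fits.length; i++) {
            if (Double.isNaN(fits[i])) {
                p[i] = 0.0;
            } else if (n == 1) {
                p[i] = 1.0;
            } else if (total == 0.0) {
                p[i] = 1.0 / n; // every program is perfect: split evenly (assumption)
            } else {
                p[i] = (1.0 / (n - 1)) * (1.0 - fits[i] / total);
            }
        }
        return p;
    }
}
```

Because lower fitness is better, a program's probability grows as its share of the total error shrinks, and the probabilities of the surviving programs sum to 1.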


Algorithms - continued

• Sorting
– After a new generation of GP programs has been created and each one evaluated, they could be sorted in ascending order of fitness.

– This would ease the selection of valid functions into the subsequent generation, because the best candidate solutions would be toward the front of the array. However, we chose not to use sorting in any part of the GP Project, for a number of reasons.

– One reason is that we were concerned about the fifteen-minute time limit.

– Also, we chose to simplify the design to meet the project deadline. We are also attempting to implement a GUI, and we were concerned that sorting would consume much-needed CPU processing time.

– We have considered adding sorting by fitness value as a future enhancement.


Algorithms – continued

Key Correction to Algorithm:

• Issue: When reviewing the graph of best fit and average fit for each succeeding generation, the values were swinging up and down, rather than being monotonically non-increasing (that is, never increasing; always decreasing or remaining level).

• Resolution: Rather than just cloning randomly selected individuals from the prior generation, make sure that the best program from the prior generation survives unchanged as the first program added to the new generation. This guarantees that the best fit in the new generation can be no worse than the best fit from its previous (parent) generation.
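The correction described above is commonly called elitism. A minimal sketch of it follows; `ElitistStep` and the `breeder` callback are hypothetical names, the breeder stands in for whatever crossover, mutation, cloning, or new-entrant step produces the remaining programs, and NaN is assumed to mark Dead On Arrival programs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

final class ElitistStep {
    /** Index of the lowest (best) fitness, skipping NaN (DOA) entries; -1 if none. */
    static int indexOfBest(double[] fits) {
        int best = -1;
        for (int i = 0; i < fits.length; i++) {
            if (Double.isNaN(fits[i])) {
                continue;
            }
            if (best < 0 || fits[i] < fits[best]) {
                best = i;
            }
        }
        return best;
    }

    /**
     * Builds the next generation: the best program of the current generation
     * is added first, unchanged, and the breeder fills the remaining slots.
     * With mutable programs, a copy of the best one should be added instead.
     */
    static <T> List<T> nextGeneration(List<T> current, double[] fits,
                                      Supplier<T> breeder, int size) {
        List<T> next = new ArrayList<>(size);
        int best = indexOfBest(fits);
        if (best >= 0 && size > 0) {
            next.add(current.get(best));
        }
        while (next.size() < size) {
            next.add(breeder.get());
        }
        return next;
    }
}
```

Since the previous best is always present in the new generation, the new generation's best fitness cannot be worse than its parent's, which is the non-increasing behavior the fix was meant to restore.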


Best Fit of GP Program by Generation - continued

Before Fix:


Best Fit GP Program by Generation

After Fix:


Graphical User Interface
by Ian R. Erlanson


Output Screen


Lessons Learned and Future Enhancements
by Geoffrey A. Reglos


Lessons Learned

• I got good practice at reading and working with other people’s code and writing code that conformed to project specifications.

• I learned an essential step in developing a computer program from watching John and others: start with a simple solution, then seek to understand that solution’s performance characteristics. This helped me see how to develop a computational procedure for solving a problem.


Lessons Learned - continued

• I underestimated the work involved with documentation. Thus, I learned about the need for the documenter to work more closely with the developer to understand the details of the program(s).

• I learned to work with a group of people on a short-term project. We were able to work within each individual’s strengths and weaknesses to complete the project successfully and on time. The important characteristics of working with this group were communication and a degree of trust.


Lessons Learned - continued

• I learned more about the use of probability of survival, so common to actuarial work, applied to the creation of new software by software.


Future Enhancements

• Implement sorting in ascending order for the functions in a generation. This will ensure that the function with the best fitness value is at the top.

• Implement more flexibility of the input of training data. Currently, the training data is hardcoded. We would like to have a GUI which will offer the user a number of choices in how to accept training data in different formats. This would also involve adding more logic to parse and format the data into an acceptable form for use by the GP program.

• Use Ant to simplify the task of managing the build of the project.
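As a rough idea of what the sorting enhancement might involve, the sketch below sorts a generation in ascending order of fitness with the standard library. `ScoredProgram` and `FitnessSort` are hypothetical names, and storing a Dead On Arrival program's fitness as NaN is an assumption; with that choice, `Double.compare` orders NaN after every other value, so such programs end up at the back of the array.

```java
import java.util.Arrays;
import java.util.Comparator;

/** A program paired with its fitness. Hypothetical stand-in for the project's node class. */
final class ScoredProgram {
    final String expression;
    final double fitness;

    ScoredProgram(String expression, double fitness) {
        this.expression = expression;
        this.fitness = fitness;
    }
}

final class FitnessSort {
    /** Sorts in place so the lowest (best) fitness comes first. */
    static void sortAscending(ScoredProgram[] generation) {
        Arrays.sort(generation,
                Comparator.comparingDouble((ScoredProgram p) -> p.fitness));
    }
}
```

For the 50 programs per generation used in this project, a sort like this is a small amount of work per generation, though its effect on the time limit would need to be measured rather than assumed.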


Q & A