Slides from my oral defense


Transcript of Slides from my oral defense

Page 1: Coevolutionary Automated Software Correction: A Proof of Concept

Master’s Oral Defense
September 8, 2008
Josh Wilkerson

Committee:
Dr. Daniel Tauritz – Chair
Dr. Bruce McMillin
Dr. Thomas Weigert

Page 2: Motivation

In 2002, the National Institute of Standards and Technology (NIST) stated [9]:

– Software errors cost the U.S. economy $59.5 billion a year

– Approximately 0.6% of gross domestic product

– 30% of these costs could be removed by earlier, more effective software defect detection and an improved testing infrastructure

Page 3: Problem Statement

Software debugging:
– Test the software
– Locate the identified errors
– Correct the errors

A time-consuming yet critical process

Many publications on automating the testing process

None that fully automate both the testing and correction phases

Page 4: The System Envisioned

Page 5: Most Related Work

Paolo Tonella [14] and Stefan Wappler [6,15,16]

– Unit testing of object-oriented software

– Used evolutionary methods

– Focused only on testing; did nothing with correction

Timo Mantere [7,8]

– Two-population testing system using genetic algorithms

– Optimized program parameters through evolution

– Found that the more control the EA has over the program, the better the results

Page 6: Technical Background

Christopher Rosin [10,11] and John Cartlidge [1]

– Extensive analysis of coevolution

– Outline many potential problems that can occur during coevolution

Koza [2,3,4,5]

– Popularized genetic programming in the 1990s

– Father of modern genetic programming

Page 7: CASC Evolutionary Model

Page 8: CASC Evolutionary Model

Page 9: Parsing in the CASC System

The program population is based on the program to be corrected (seed program)

Page 10: Parsing in the CASC System: Step 1

The ANTLR system is used to create parsing tools (only done once for each language)

The parser is generated from a provided grammar (C++ in this case)

The resulting parser is dependent on the ANTLR libraries

Page 11: Parsing in the CASC System: Step 2

The system reads in the source code for the program to correct

The code to evolve is extracted in preprocessing

Page 12: Parsing in the CASC System: Step 3

The preprocessed source code to evolve is provided to the parsing tools

Page 13: Parsing in the CASC System: Step 4

The parsing tools produce the Abstract Syntax Tree (AST) for the evolvable code

The AST produced is heavily dependent on the ANTLR libraries

These dependencies incur unnecessary computational cost

Page 14: Parsing in the CASC System: Step 5

The ANTLR AST is provided to the CASC AST translator

The AST translator removes the ANTLR dependencies from the AST

The result is a lightweight version of the AST
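
To illustrate, a lightweight AST node might be as simple as the following C++ sketch (my reconstruction; the actual CASC node layout is not shown on the slides):

    #include <string>
    #include <vector>

    // Hypothetical lightweight AST node: a token type, the token text,
    // and the child subtrees. No references back into the ANTLR runtime.
    struct AstNode {
        int type;                       // grammar token type
        std::string text;               // token text, e.g. "m" or "+"
        std::vector<AstNode> children;  // owned subtrees
    };

Holding the children by value keeps the tree self-contained, which is what removes the per-node dependence on the ANTLR libraries.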

Page 15: Parsing in the CASC System: Step 6

The lightweight AST is provided to the CASC coevolutionary system

Copies of the AST are randomly modified (the initial variation phase)

Page 16: CASC Evolutionary Model

Page 17: CASC Evolutionary Model

Page 18: CASC Evolutionary Model

Page 19: CASC Evolutionary Model

Reproduction
– Parents selected using tournament selection
– Uniform crossover with bias
– For programs, the child subtrees of the root were the units used in crossover

Mutation
– Each offspring has a chance to mutate
– Only specific nodes are considered for program mutation
– Genes selected for mutation are perturbed based on a Gaussian distribution (see the sketch below)
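
As an illustration, Gaussian mutation of a numeric gene could look like the following sketch (identifiers and the scaling are assumptions; the slides do not give the operator's exact form):

    #include <cmath>
    #include <random>

    // Perturb a numeric gene with Gaussian noise. 'proportion' plays the
    // role of the "mutative proportion" parameter from the experimental
    // setup: larger values produce larger perturbations (assumed scaling).
    double MutateGene(double value, double proportion, std::mt19937& rng) {
        std::normal_distribution<double> noise(
            0.0, proportion * (std::fabs(value) + 1.0));
        return value + noise(rng);
    }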

Page 20: CASC Evolutionary Model

Page 21: CASC Evolutionary Model

Page 22: CASC Evolutionary Model: Fitness Evaluation

For each individual:
– Randomly select a set of (unique) opponents
– Check a hash table to retrieve the results of repeat pairings (see the sketch below)
– Execute the program with the test case as input for each new pairing
– Apply the fitness function to the program output and store the fitness for the trial
– Set the individual’s fitness to the average fitness across all trials

Program compilation is performed as needed

Program errors/time-outs result in arbitrarily low fitness

This is done in parallel, using the NIC-Cluster and MPI

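A minimal sketch of the repeat-pairing cache (the hash table above); the key layout and the helper function are assumptions, not the actual CASC code:

    #include <cstdint>
    #include <unordered_map>

    // Hypothetical helper, provided elsewhere: compile/run the program on
    // the test case and apply the fitness function to its output.
    double RunProgramOnTest(std::uint32_t programId, std::uint32_t testId);

    // Cache of (program, test case) pairing results, so repeat pairings
    // are looked up instead of re-executed.
    std::unordered_map<std::uint64_t, double> pairingCache;

    double EvaluatePairing(std::uint32_t programId, std::uint32_t testId) {
        const std::uint64_t key =
            (static_cast<std::uint64_t>(programId) << 32) | testId;
        auto it = pairingCache.find(key);
        if (it != pairingCache.end()) return it->second;  // repeat pairing
        const double fitness = RunProgramOnTest(programId, testId);
        pairingCache.emplace(key, fitness);
        return fitness;
    }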

Page 23: CASC Evolutionary Model

Page 24: CASC Evolutionary Model

Page 25: CASC Evolutionary Model

Page 26: Experimental Setup

Proof of concept

Correction of an insertion sort implementation

Test case: unsorted data array

Page 27: Experimental Setup

Fitness function scoring method:

For each element x in the output data array:
– For each element a before x in the array: decrement the score if x < a, increment it otherwise
– For each element b after x in the array: decrement the score if x > b, increment it otherwise

The score is normalized to fall between 0 and 1

A score of -1 is assigned to programs with errors/time-outs
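
As an illustration, the scoring method could be implemented as follows (my reconstruction of the slide's description; the normalization is one plausible reading of "normalized to fall between 0 and 1"):

    #include <vector>

    // Score how close 'data' is to being sorted, per the method above.
    double SortednessFitness(const std::vector<int>& data) {
        const int n = static_cast<int>(data.size());
        if (n < 2) return 1.0;  // trivially sorted
        int score = 0;
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < i; ++j)      // elements before data[i]
                score += (data[i] < data[j]) ? -1 : 1;
            for (int j = i + 1; j < n; ++j)  // elements after data[i]
                score += (data[i] > data[j]) ? -1 : 1;
        }
        // The raw score ranges from -n(n-1) (reverse sorted) to +n(n-1)
        // (fully sorted); map it linearly onto [0, 1].
        const double maxScore = static_cast<double>(n) * (n - 1);
        return (score + maxScore) / (2.0 * maxScore);
    }

Under this normalization, a fully sorted array scores 1.0 and a reverse-sorted array scores 0.0.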

Page 28: Experimental Setup

Four seed programs used

– Each has one common error and one unique error (of varying severity)

Four different configurations used

– Mutation Rate: the likelihood of an offspring being mutated

– Mutative Proportion: the amount of change a mutation incurs

                     Config 0   Config 1   Config 2   Config 3
Mutation Rate        Moderate   High       Moderate   High
Mutative Proportion  Moderate   Moderate   High       High

Page 29: Results

A total of 16 experiments per full run (4 seed programs × 4 configurations)

Due to the high computational complexity and limited resources, five full runs were completed, totaling 80 experiments

Page 30: Summary of Results

Seed Program : Config.     Best (Std. Dev.)   Average (Std. Dev.)
A : Base                   0.526 (0.262)       0.163 (0.157)
A : Enhanced Rate          0.557 (0.283)       0.170 (0.166)
A : Enhanced Proportion    0.537 (0.226)       0.196 (0.133)
A : Enhanced Both          0.559 (0.255)       0.175 (0.153)
B : Base                   0.965 (0.353)       0.275 (0.374)
B : Enhanced Rate          0.975 (0.357)       0.276 (0.370)
B : Enhanced Proportion    0.950 (0.432)       0.587 (0.458)
B : Enhanced Both          0.959 (0.434)       0.415 (0.463)
C : Base                   0.707 (0.224)       0.372 (0.196)
C : Enhanced Rate          0.717 (0.224)       0.366 (0.179)
C : Enhanced Proportion    0.716 (0.217)       0.369 (0.172)
C : Enhanced Both          0.717 (0.224)       0.377 (0.181)
D : Base                   1.0 (0.282)        -0.484 (0.535)
D : Enhanced Rate          1.0 (0.948)        -0.568 (0.572)
D : Enhanced Proportion    1.0 (0.946)        -0.554 (0.587)
D : Enhanced Both          1.0 (0.946)        -0.601 (0.604)

Run three of both the program A and program B experiments found a solution in the initial population (these runs were omitted from the table)

20% of the experiments (16 of 80) reported success

Page 31: Summary of Results

(Results table repeated from Page 30.)

75% of the experiments reported a best fitness above 0.7

Page 32: Summary of Results

(Results table repeated from Page 30.)

There was a high amount of variation in the experiment endpoints

Large number of possible solutions for each seed program

Page 33: Summary of Results

(Results table repeated from Page 30.)

The seed program D experiments were the toughest for the system

The seeded error resulted in either a 0 or -1 fitness

Experiments were either hit or miss

Page 34: Discussion of False Positives

A number of the programs returned by successful experiments still contained an error

For example, this is the evolvable section from a solution:

    for(m=0; m-1 < SIZE-1; m=m+1)
    {
        for(n=m+1; n>0 && data[n] < data[n-1]; n=n-1)
            Swap(data[n], data[n-1]);
    }

When m is SIZE-1, n is initialized to SIZE (an invalid array index)

Tough to catch
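
For reference, one possible correction (my fix, not taken from the slides) tightens the outer-loop bound so that n = m+1 always stays in bounds:

    // Outer loop now stops at m = SIZE-2, so n = m+1 never exceeds SIZE-1.
    for(m=0; m < SIZE-1; m=m+1)
    {
        for(n=m+1; n>0 && data[n] < data[n-1]; n=n-1)
            Swap(data[n], data[n-1]);
    }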

Page 35: Conclusion

The goal: demonstrate a proof-of-concept coevolutionary system for integrated automated software testing and correction

A prototype Coevolutionary Automated Software Correction system was introduced

80 experiments were conducted

16 successes, with 75% of best-of-experiment fitnesses exceeding 0.7 (out of 1.0)

These experiments indicate the validity of the CASC system concept

Further work is required to determine scalability

An article on this work has been submitted to IEEE TSE

Page 36: Work in Progress and Future Work

Evolve the complete parse tree
– Preliminary results using a GP evolutionary model are favorable

Cut down on run-times

– Add symmetric multiprocessing (server-client) functionality

– More efficient compilation

– Acquire additional computing resources (e.g., the NSF TeraGrid)

Investigate the potential benefits of co-optimization [12,13]

Page 37: Work in Progress and Future Work

Implement adaptive parameter control

Investigate options for detecting errors like false positives

Perform a parameter sensitivity analysis

Page 38: References

[1] J. P. Cartlidge. Rules of Engagement: Competitive Coevolutionary Dynamics in Computational Systems. PhD thesis, University of Leeds, 2004.

[2] J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, 1992.

[3] J. R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, MA, 1994.

[4] J. R. Koza. Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann, 1999.

[5] J. R. Koza. Genetic Programming IV: Routine Human-Competitive Machine Intelligence. Kluwer Academic Publishers, 2003.

[6] F. Lammermann and S. Wappler. Benefits of software measures for evolutionary white-box testing. In Proceedings of GECCO 2005 - the Genetic and Evolutionary Computation Conference, pages 1083–1084, Washington DC, 2005. ACM, ACM Press.

Page 39: References

[7] T. Mantere and J. T. Alander. Developing and testing structural light vision software by co-evolutionary genetic algorithm. In QSSE 2002, the Proceedings of the Second ASERC Workshop on Quantitative and Soft Computing based Software Engineering, pages 31–37. Alberta Software Engineering Research Consortium (ASERC) and the Department of Electrical and Computer Engineering, University of Alberta, Feb. 2002.

[8] T. Mantere and J. T. Alander. Testing digital halftoning software by generating test images and filters co-evolutionarily. In Proceedings of SPIE Vol. 5267 Intelligent Robots and Computer Vision XXI: Algorithms, Techniques, and Active Vision, pages 257–258. SPIE, Oct. 2003.

[9] M. Newman. Software Errors Cost U.S. Economy $59.5 Billion Annually. NIST News Release, June 2002.

[10] C. D. Rosin and R. K. Belew. Methods for competitive coevolution: Finding opponents worth beating. In L. Eshelman, editor, Proceedings of the Sixth International Conference on Genetic Algorithms, pages 373–380, San Francisco, CA, 1995. Morgan Kaufmann.

[11] C. D. Rosin and R. K. Belew. New methods for competitive coevolution. Evolutionary Computation, 5(1):1–29, 1997.

Page 40: References

[12] T. Service. Co-optimization: A generalization of coevolution. Master's thesis, Missouri University of Science and Technology, 2008.

[13] T. Service and D. Tauritz. Co-optimization algorithms. In Proceedings of GECCO 2008 - the Genetic and Evolutionary Computation Conference, pages 387–388, 2008.

[14] P. Tonella. Evolutionary testing of classes. In Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 119–128, Boston, Massachusetts, 2004. ACM Press.

[15] S. Wappler and F. Lammermann. Using evolutionary algorithms for the unit testing of object-oriented software. In Proceedings of GECCO 2005 - the Genetic and Evolutionary Computation Conference, pages 1053–1060, Washington DC, 2005. ACM, ACM Press.

[16] S. Wappler and J. Wegener. Evolutionary unit testing of object-oriented software using strongly-typed genetic programming. In Proceedings of GECCO 2006 - the Genetic and Evolutionary Computation Conference, pages 1925–1932, Seattle, Washington, 2006. ACM, ACM Press.

Page 41: Questions?

Page 42: Koza’s GP Evolutionary Model


Page 43: Diversity in New Experiments

[Chart: Program Population Diversities Under New Evolutionary Model – population standard deviation (y-axis, 0 to 0.8) vs. generation (x-axis, 0 to 50) for Exp 1, Exp 2, and Exp 3]