Slides from my oral defense


Transcript of Slides from my oral defense

Page 1: Coevolutionary Automated Software Correction: A Proof of Concept

Master’s Oral Defense
September 8, 2008
Josh Wilkerson

Committee:
Dr. Daniel Tauritz – Chair
Dr. Bruce McMillin
Dr. Thomas Weigert

Page 2: Motivation

In 2002, the National Institute of Standards and Technology (NIST) stated [9]:

– Software errors cost the U.S. economy $59.5 billion a year

– Approximately 0.6% of gross domestic product

– 30% of these costs could be removed by earlier, more effective software defect detection and an improved testing infrastructure

Page 3: Problem Statement

Software debugging:
– Test the software
– Locate the identified errors
– Correct the errors

A time-consuming yet critical process

Many publications on automating the testing process

None that fully automate both the testing and correction phases

Page 4: The System Envisioned

Page 5: Most Related Work

Paolo Tonella [14] and Stefan Wappler [6,15,16]

– Unit testing of object-oriented software

– Used evolutionary methods

– Focused only on testing; did nothing with correction

Timo Mantere [7,8]

– Two-population testing system using genetic algorithms

– Optimized program parameters through evolution

– Found that the more control the EA has over the program, the better the results

Page 6: Technical Background

Christopher Rosin [10,11] and John Cartlidge [1]

– Extensive analysis of coevolution

– Outline many potential problems that can occur during coevolution

Koza [2,3,4,5]

– Popularized genetic programming in the 1990s

– Father of modern genetic programming

Page 7: CASC Evolutionary Model

Page 8: CASC Evolutionary Model

Page 9: Parsing in the CASC System

The program population is based on the program to be corrected (seed program)

Page 10: Parsing in the CASC System: Step 1

The ANTLR system is used to create parsing tools (only done once for each language)

The parser is generated from a provided grammar (C++ in this case)

The resulting parser is dependent on the ANTLR libraries

Page 11: Parsing in the CASC System: Step 2

The system reads in the source code for the program to correct

The code to evolve is extracted in preprocessing

Page 12: Parsing in the CASC System: Step 3

The preprocessed source code to evolve is provided to the parsing tools

Page 13: Parsing in the CASC System: Step 4

The parsing tools produce the Abstract Syntax Tree (AST) for the evolvable code

The AST produced is heavily dependent on the ANTLR libraries

These dependencies incur unnecessary computational cost

Page 14: Parsing in the CASC System: Step 5

The ANTLR AST is provided to the CASC AST translator

The AST translator removes the ANTLR dependencies from the AST

The result is a lightweight version of the AST
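
To illustrate, a lightweight AST node might be as simple as the following C++ sketch (my reconstruction; the actual CASC node layout is not shown on the slides):

    #include <string>
    #include <vector>

    // Hypothetical lightweight AST node: a token type, the token text,
    // and the child subtrees. No references back into the ANTLR runtime.
    struct AstNode {
        int type;                       // grammar token type
        std::string text;               // token text, e.g. "m" or "+"
        std::vector<AstNode> children;  // owned subtrees
    };

Holding the children by value keeps the tree self-contained, which is what removes the per-node dependence on the ANTLR libraries.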

Page 15: Parsing in the CASC System: Step 6

The lightweight AST is provided to the CASC coevolutionary system

Copies of the AST are randomly modified (the initial variation phase)

Page 16: CASC Evolutionary Model

Page 17: CASC Evolutionary Model

Page 18: CASC Evolutionary Model

Page 19: CASC Evolutionary Model

Reproduction
– Parents selected using tournament selection
– Uniform crossover with bias
– For programs, the child subtrees of the root were the units used in crossover

Mutation
– Each offspring has a chance to mutate
– Only specific nodes are considered for program mutation
– Genes selected for mutation are perturbed based on a Gaussian distribution (see the sketch below)
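
As an illustration, Gaussian mutation of a numeric gene could look like the following sketch (identifiers and the scaling are assumptions; the slides do not give the operator's exact form):

    #include <cmath>
    #include <random>

    // Perturb a numeric gene with Gaussian noise. 'proportion' plays the
    // role of the "mutative proportion" parameter from the experimental
    // setup: larger values produce larger perturbations (assumed scaling).
    double MutateGene(double value, double proportion, std::mt19937& rng) {
        std::normal_distribution<double> noise(
            0.0, proportion * (std::fabs(value) + 1.0));
        return value + noise(rng);
    }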

Page 20: CASC Evolutionary Model

Page 21: CASC Evolutionary Model

Page 22: CASC Evolutionary Model: Fitness Evaluation

For each individual:
– Randomly select a set of (unique) opponents
– Check a hash table to retrieve the results of repeat pairings (see the sketch below)
– Execute the program with the test case as input for each new pairing
– Apply the fitness function to the program output and store the fitness for the trial
– Set the individual’s fitness to the average fitness across all trials

Program compilation is performed as needed

Program errors/time-outs result in arbitrarily low fitness

This is done in parallel, using the NIC-Cluster and MPI

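A minimal sketch of the repeat-pairing cache (the hash table above); the key layout and the helper function are assumptions, not the actual CASC code:

    #include <cstdint>
    #include <unordered_map>

    // Hypothetical helper, provided elsewhere: compile/run the program on
    // the test case and apply the fitness function to its output.
    double RunProgramOnTest(std::uint32_t programId, std::uint32_t testId);

    // Cache of (program, test case) pairing results, so repeat pairings
    // are looked up instead of re-executed.
    std::unordered_map<std::uint64_t, double> pairingCache;

    double EvaluatePairing(std::uint32_t programId, std::uint32_t testId) {
        const std::uint64_t key =
            (static_cast<std::uint64_t>(programId) << 32) | testId;
        auto it = pairingCache.find(key);
        if (it != pairingCache.end()) return it->second;  // repeat pairing
        const double fitness = RunProgramOnTest(programId, testId);
        pairingCache.emplace(key, fitness);
        return fitness;
    }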

Page 23: CASC Evolutionary Model

Page 24: CASC Evolutionary Model

Page 25: CASC Evolutionary Model

Page 26: Experimental Setup

Proof of concept

Correction of an insertion sort implementation

Test case: unsorted data array

Page 27: Experimental Setup

Fitness function scoring method:

For each element x in the output data array:
– For each element a before x in the array: decrement the score if x < a, increment it otherwise
– For each element b after x in the array: decrement the score if x > b, increment it otherwise

The score is normalized to fall between 0 and 1

A score of -1 is assigned to programs with errors/time-outs
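
As an illustration, the scoring method could be implemented as follows (my reconstruction of the slide's description; the normalization is one plausible reading of "normalized to fall between 0 and 1"):

    #include <vector>

    // Score how close 'data' is to being sorted, per the method above.
    double SortednessFitness(const std::vector<int>& data) {
        const int n = static_cast<int>(data.size());
        if (n < 2) return 1.0;  // trivially sorted
        int score = 0;
        for (int i = 0; i < n; ++i) {
            for (int j = 0; j < i; ++j)      // elements before data[i]
                score += (data[i] < data[j]) ? -1 : 1;
            for (int j = i + 1; j < n; ++j)  // elements after data[i]
                score += (data[i] > data[j]) ? -1 : 1;
        }
        // The raw score ranges from -n(n-1) (reverse sorted) to +n(n-1)
        // (fully sorted); map it linearly onto [0, 1].
        const double maxScore = static_cast<double>(n) * (n - 1);
        return (score + maxScore) / (2.0 * maxScore);
    }

Under this normalization, a fully sorted array scores 1.0 and a reverse-sorted array scores 0.0.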

Page 28: Experimental Setup

Four seed programs used

– Each has one common error and one unique error (of varying severity)

Four different configurations used

– Mutation Rate: the likelihood of an offspring being mutated

– Mutative Proportion: the amount of change a mutation incurs

                     Config 0   Config 1   Config 2   Config 3
Mutation Rate        Moderate   High       Moderate   High
Mutative Proportion  Moderate   Moderate   High       High

Page 29: Results

A total of 16 experiments per full run (4 seed programs × 4 configurations)

Due to the high computational complexity and limited resources, five full runs were completed, totaling 80 experiments

Page 30: Summary of Results

Seed Program : Config.     Best (Std. Dev.)   Average (Std. Dev.)
A : Base                   0.526 (0.262)       0.163 (0.157)
A : Enhanced Rate          0.557 (0.283)       0.170 (0.166)
A : Enhanced Proportion    0.537 (0.226)       0.196 (0.133)
A : Enhanced Both          0.559 (0.255)       0.175 (0.153)
B : Base                   0.965 (0.353)       0.275 (0.374)
B : Enhanced Rate          0.975 (0.357)       0.276 (0.370)
B : Enhanced Proportion    0.950 (0.432)       0.587 (0.458)
B : Enhanced Both          0.959 (0.434)       0.415 (0.463)
C : Base                   0.707 (0.224)       0.372 (0.196)
C : Enhanced Rate          0.717 (0.224)       0.366 (0.179)
C : Enhanced Proportion    0.716 (0.217)       0.369 (0.172)
C : Enhanced Both          0.717 (0.224)       0.377 (0.181)
D : Base                   1.0 (0.282)        -0.484 (0.535)
D : Enhanced Rate          1.0 (0.948)        -0.568 (0.572)
D : Enhanced Proportion    1.0 (0.946)        -0.554 (0.587)
D : Enhanced Both          1.0 (0.946)        -0.601 (0.604)

Run three of both the program A and program B experiments found a solution in the initial population (these runs were omitted from the table)

20% of the experiments (16 of 80) reported success

Page 31: Summary of Results

(Results table repeated from Page 30.)

75% of the experiments reported a best fitness above 0.7

Page 32: Summary of Results

(Results table repeated from Page 30.)

There was a high amount of variation in the experiment endpoints

Large number of possible solutions for each seed program

Page 33: Summary of Results

(Results table repeated from Page 30.)

The seed program D experiments were the toughest for the system

The seeded error resulted in either a 0 or -1 fitness

Experiments were either hit or miss

Page 34: Discussion of False Positives

A number of the programs returned by successful experiments still contained an error

For example, this is the evolvable section from a solution:

    for(m=0; m-1 < SIZE-1; m=m+1)
    {
        for(n=m+1; n>0 && data[n] < data[n-1]; n=n-1)
            Swap(data[n], data[n-1]);
    }

When m is SIZE-1, n is initialized to SIZE (an invalid array index)

Tough to catch
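
For reference, one possible correction (my fix, not taken from the slides) tightens the outer-loop bound so that n = m+1 always stays in bounds:

    // Outer loop now stops at m = SIZE-2, so n = m+1 never exceeds SIZE-1.
    for(m=0; m < SIZE-1; m=m+1)
    {
        for(n=m+1; n>0 && data[n] < data[n-1]; n=n-1)
            Swap(data[n], data[n-1]);
    }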

Page 35: Conclusion

The goal: demonstrate a proof-of-concept coevolutionary system for integrated automated software testing and correction

A prototype Coevolutionary Automated Software Correction system was introduced

80 experiments were conducted

16 successes, with 75% of best-of-experiment fitnesses exceeding 0.7 (out of 1.0)

These experiments indicate the validity of the CASC system concept

Further work is required to determine scalability

An article on this work has been submitted to IEEE TSE

Page 36: Work in Progress and Future Work

Evolve the complete parse tree
– Preliminary results using a GP evolutionary model are favorable

Cut down on run-times

– Add symmetric multiprocessing (server-client) functionality

– More efficient compilation

– Acquire additional computing resources (e.g., the NSF TeraGrid)

Investigate the potential benefits of co-optimization [12,13]

Page 37: Work in Progress and Future Work

Implement adaptive parameter control

Investigate options for detecting errors like false positives

Perform a parameter sensitivity analysis

Page 38: References

[1] J. P. Cartlidge. Rules of Engagement: Competitive Coevolutionary Dynamics in Computational Systems. PhD thesis, University of Leeds, 2004.

[2] J. R. Koza. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, 1992.

[3] J. R. Koza. Genetic Programming II: Automatic Discovery of Reusable Programs. MIT Press, Cambridge, MA, 1994.

[4] J. R. Koza. Genetic Programming III: Darwinian Invention and Problem Solving. Morgan Kaufmann, 1999.

[5] J. R. Koza. Genetic Programming IV: Routine Human-Competitive Machine Intelligence. Kluwer Academic Publishers, 2003.

[6] F. Lammermann and S. Wappler. Benefits of software measures for evolutionary white-box testing. In Proceedings of GECCO 2005 - the Genetic and Evolutionary Computation Conference, pages 1083–1084, Washington DC, 2005. ACM, ACM Press.

Page 39: References

[7] T. Mantere and J. T. Alander. Developing and testing structural light vision software by co-evolutionary genetic algorithm. In QSSE 2002, the Proceedings of the Second ASERC Workshop on Quantitative and Soft Computing based Software Engineering, pages 31–37. Alberta Software Engineering Research Consortium (ASERC) and the Department of Electrical and Computer Engineering, University of Alberta, Feb. 2002.

[8] T. Mantere and J. T. Alander. Testing digital halftoning software by generating test images and filters co-evolutionarily. In Proceedings of SPIE Vol. 5267 Intelligent Robots and Computer Vision XXI: Algorithms, Techniques, and Active Vision, pages 257–258. SPIE, Oct. 2003.

[9] M. Newman. Software Errors Cost U.S. Economy $59.5 Billion Annually. NIST News Release, June 2002.

[10] C. D. Rosin and R. K. Belew. Methods for competitive coevolution: Finding opponents worth beating. In L. Eshelman, editor, Proceedings of the Sixth International Conference on Genetic Algorithms, pages 373–380, San Francisco, CA, 1995. Morgan Kaufmann.

[11] C. D. Rosin and R. K. Belew. New methods for competitive coevolution. Evolutionary Computation, 5(1):1–29, 1997.

Page 40: References

[12] T. Service. Co-optimization: A generalization of coevolution. Master's thesis, Missouri University of Science and Technology, 2008.

[13] T. Service and D. Tauritz. Co-optimization algorithms. In Proceedings of GECCO 2008 - the Genetic and Evolutionary Computation Conference, pages 387–388, 2008.

[14] P. Tonella. Evolutionary testing of classes. In Proceedings of the 2004 ACM SIGSOFT International Symposium on Software Testing and Analysis, pages 119–128, Boston, Massachusetts, 2004. ACM Press.

[15] S. Wappler and F. Lammermann. Using evolutionary algorithms for the unit testing of object-oriented software. In Proceedings of GECCO 2005 - the Genetic and Evolutionary Computation Conference, pages 1053–1060, Washington DC, 2005. ACM, ACM Press.

[16] S. Wappler and J. Wegener. Evolutionary unit testing of object-oriented software using strongly-typed genetic programming. In Proceedings of GECCO 2006 - the Genetic and Evolutionary Computation Conference, pages 1925–1932, Seattle, Washington, 2006. ACM, ACM Press.

Page 41: Questions?

Page 42: Koza’s GP Evolutionary Model


Page 43: Diversity in New Experiments

[Chart: Program Population Diversities Under New Evolutionary Model – population standard deviation (y-axis, 0 to 0.8) vs. generation (x-axis, 0 to 50) for Exp 1, Exp 2, and Exp 3]