Automatically Finding Patches Using Genetic Programming published in ICSE’09 Westley Weimer, Thanh Vu Nguyen, Claire Le Goues, and Stephanie Forrest Presenter: Jihun Park SELab 2014.01.10 LAB Seminar

Page 1: Automatically Finding Patches Using Genetic Programmingse.kaist.ac.kr/wp-content/uploads/2014/01/Automatically... · 2014. 1. 15. · Automatically Finding Patches Using Genetic Programming

Automatically Finding Patches Using Genetic Programming

published in ICSE’09

Westley Weimer, Thanh Vu Nguyen, Claire Le Goues, and Stephanie Forrest

Presenter: Jihun Park

SELab

2014.01.10

LAB Seminar

Introduction

• Software bug

– Maintenance takes the majority of total software development cost.

– Fixing bugs is an inevitable, difficult, and tedious manual process.

• Existing automatic program repair

– Complicated input: difficult formal specifications, program annotations, special coding practices

– Drawbacks: harmful repairs, restricted properties, only narrowing the fault down to a few lines

GOAL

Suggesting an automatic patch generation technique using Genetic Programming

• With simple inputs,

• Not sacrificing required functionality,

• Generating a concrete patch

Outline

• Introduction

• Motivating Example

• Background

• Approach Overview

• Genetic Programming (GP) for Program Repair

• Experiments

• Conclusion

• Discussion

Motivating Example

 1 /* requires: a >= 0, b >= 0 */
 2 void gcd(int a, int b) {
 3   if (a == 0) {
 4     printf("%d", b);
 5   }
 6   while (b != 0)
 7     if (a > b)
 8       a = a - b;
 9     else
10       b = b - a;
11   printf("%d", a);
12   exit(0);
13 }

With positive and negative test cases:

– Positive test case: gcd(1071, 1029) = 21

– Negative test case: gcd(0, 55) => infinite loop

• Locate suspicious code.

• Insert, remove, or replace statements, reusing existing ones, to fix the negative test case.

Candidate patch (statements inserted at lines 5 and 6):

 1 /* requires: a >= 0, b >= 0 */
 2 void gcd(int a, int b) {
 3   if (a == 0) {
 4     printf("%d", b);
 5     exit(0);
 6     a = a - b;
 7   }
 8   while (b != 0)
 9     if (a > b)
10       a = a - b;
11     else
12       b = b - a;
13   printf("%d", a);
14   exit(0);
15 }

Test results with the candidate patch:

– Positive test case: gcd(1071, 1029) = 21

– Negative test case: gcd(0, 55) = 55 (previously an infinite loop)

• If we find a fix, minimize it by deleting extra statements.

Minimized patch (the extra statement a = a - b is deleted):

 1 /* requires: a >= 0, b >= 0 */
 2 void gcd(int a, int b) {
 3   if (a == 0) {
 4     printf("%d", b);
 5     exit(0);
 6   }
 7   while (b != 0)
 8     if (a > b)
 9       a = a - b;
10     else
11       b = b - a;
12   printf("%d", a);
13   exit(0);
14 }

Test results with the minimized patch:

– Positive test case: gcd(1071, 1029) = 21

– Negative test case: gcd(0, 55) = 55

Background

• Genetic programming

– Applying a Genetic Algorithm (GA) to a computer program

• Representation: represent a computer program as an individual (chromosome), e.g. Program → 0 1 0 1 1 0 0 1.

• Crossover: cross over two individuals to make a new child.

• Mutation: change individuals with a mutation operator.

• Selection: select the next generation by assessing individuals with a fitness function; individuals that work better than others are selected.

[Figure: bit-string individuals illustrating representation, crossover, mutation, and selection]

Approach Overview

• Representation: represent a computer program as an individual (chromosome).

Program → {stmt1,w1} {stmt2,w2} …

• Crossover: cross over two individuals to make a new child.

{stmt3,w1} {stmt7,w2} … and {stmt6,w1} {stmt2,w2} … → {stmt3,w1} {stmt2,w2} … and {stmt6,w1} {stmt7,w2} …

• Mutation: change individuals with a mutation operator.

{stmt1,w1} {stmt2,w2} … → {stmt1,w1} {stmt5,w2} …

• Selection: select the next generation by assessing individuals with a fitness function; individuals that pass many test cases are selected.

• Minimization: minimize the final solution by removing extra statements.

Program Representation

• Represent program statements as AST nodes (using CIL*).

[Figure: original AST of gcd. A statement sequence (List<Stmt>) whose first child is an if node with the comparison a == 0 (operator ==, operands a and 0) and the method invocation printf("%d", b), followed by a while node.]

*: “Cil: An infrastructure for C program analysis and transformation”, G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer, ICCC’02

High-level AST (statement level):

Statement sequence
  if (a == 0)
    printf("%d", b);
  while (b != 0)
    if (a > b)
      a = a - b;
    else
      b = b - a;
  printf("%d", a);
  exit(0);

• Pair each statement with a weight.

Node representation (each statement is identified by its line number in the original gcd listing):

[3] if (a == 0)
  [4] printf("%d", b);
[6] while (b != 0)
  [7] if (a > b)
    [8] a = a - b;
    [10] b = b - a;
[11] printf("%d", a);
[12] exit(0);

Chromosome representation: a list of $\langle stmt_i, prob_i \rangle$ pairs

{{[3]},0.1} {{[4]},1.0} {{[6]},0.1} {{[7]},0.1} {{[8]},0} {{[10]},0.1} {{[11]},0} {{[12]},0}

Weighting policy

– Visited only by negative TCs: 1.0

– Visited only by positive TCs: 0.0

– Visited by both kinds of TCs: 0.1
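The weighting policy above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the function name `statement_weights` is hypothetical, and the two coverage sets are hand-traced from the gcd listing for the two test cases on the slides.

```python
def statement_weights(statements, neg_visited, pos_visited):
    """Assign a mutation weight to each statement ID from test coverage.

    Policy from the slide: 1.0 if visited only by negative test cases,
    0.0 if visited only by positive test cases, 0.1 if visited by both.
    Statements visited by no test case get 0.0 here (an assumption).
    """
    weights = {}
    for stmt in statements:
        in_neg = stmt in neg_visited
        in_pos = stmt in pos_visited
        if in_neg and in_pos:
            weights[stmt] = 0.1
        elif in_neg:
            weights[stmt] = 1.0
        else:
            weights[stmt] = 0.0
    return weights


# Statement IDs are the line numbers of the original gcd listing.
statements = [3, 4, 6, 7, 8, 10, 11, 12]
neg_visited = {3, 4, 6, 7, 10}          # gcd(0, 55): hand-traced, loops forever
pos_visited = {3, 6, 7, 8, 10, 11, 12}  # gcd(1071, 1029): hand-traced

weights = statement_weights(statements, neg_visited, pos_visited)
chromosome = [(stmt, weights[stmt]) for stmt in statements]
print(chromosome)
```

With these coverage sets, only statement [4] receives weight 1.0, which matches the chromosome shown on the slide.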

Mutation

• Mutation algorithm

– For a statement $stmt_i$, mutation is applied if the following condition is satisfied:

$rand(0,1) \le prob_i \land rand(0,1) \le W_{mut}$

• Three mutation operators

– Insert: insert a statement $stmt_j$ after $stmt_i$.

$Path[i] \leftarrow \langle \{stmt_i; stmt_j\}, prob_i \rangle$

– Swap: insert a statement $stmt_j$ in place of $stmt_i$.

$Path[i] \leftarrow \langle stmt_j, prob_i \rangle$

– Delete: delete $stmt_i$.

$Path[i] \leftarrow \langle \{\}, prob_i \rangle$

* NOTE: $prob_i$ is not changed.
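The three operators can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the authors' code: the name `mutate`, the list-of-pairs layout, and the uniform choice among the three operators are all hypothetical. It also uses a strict `<` rather than the slide's `≤`, so that a statement with weight 0.0 is never mutated (Python's `random()` returns values in [0, 1)).

```python
import random


def mutate(path, w_mut, rng=random):
    """Return a mutated copy of a weighted path.

    `path` is a list of (stmts, prob) pairs, where `stmts` is the list of
    statements currently occupying that position. A position is mutated
    only if two independent draws fall below prob and w_mut respectively.
    The weight prob is never changed.
    """
    # Donor statements come only from the program itself.
    pool = [s for stmts, _ in path for s in stmts]
    result = []
    for stmts, prob in path:
        new_stmts = list(stmts)
        if pool and rng.random() < prob and rng.random() < w_mut:
            op = rng.choice(["insert", "swap", "delete"])
            donor = rng.choice(pool)
            if op == "insert":
                new_stmts = new_stmts + [donor]  # {stmt_i; stmt_j}
            elif op == "swap":
                new_stmts = [donor]              # stmt_j in place of stmt_i
            else:
                new_stmts = []                   # {}
        result.append((new_stmts, prob))
    return result
```

Because the donor is always drawn from the existing statements, this sketch shares the assumption listed later under Cons: a repair must be constructible from code already in the program.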

Crossover

• Randomly select a cutoff point, then combine the parts from both parents.

Parents:

{stmt1,w1} {stmt2,w2} {stmt3,w3} {stmt4,w4} {stmt5,w5} {stmt6,w6}

{stmt1’,w1} {stmt2’,w2} {stmt3’,w3} {stmt4’,w4} {stmt5’,w5} {stmt6’,w6}

Randomly selected cutoff point: between the first and second statements

Children (next generation):

{stmt1,w1} {stmt2’,w2} {stmt3’,w3} {stmt4’,w4} {stmt5’,w5} {stmt6’,w6}

{stmt1’,w1} {stmt2,w2} {stmt3,w3} {stmt4,w4} {stmt5,w5} {stmt6,w6}

Population sizes: |Parent generation| = pop_size/2; |Parents + Children| = pop_size; after the selection process, |Next generation| = pop_size/2.
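One-point crossover as drawn above can be sketched like this. The function name `crossover` and the optional `cutoff` argument (included so that the slide's example can be reproduced deterministically) are assumptions made for illustration.

```python
import random


def crossover(parent_a, parent_b, cutoff=None):
    """One-point crossover on two equal-length chromosomes.

    Each chromosome is a list of (stmt, weight) pairs. Each child takes
    the prefix of one parent and the suffix of the other. If no cutoff is
    given, one is chosen uniformly at random; a cutoff of 0 or len(parent)
    simply yields copies of the parents.
    """
    if len(parent_a) != len(parent_b):
        raise ValueError("parents must have the same length")
    if cutoff is None:
        cutoff = random.randint(0, len(parent_a))  # both bounds inclusive
    child_c = parent_a[:cutoff] + parent_b[cutoff:]
    child_d = parent_b[:cutoff] + parent_a[cutoff:]
    return child_c, child_d


# Reproduce the slide's example: the cutoff falls after the first statement.
a = [("stmt%d" % i, "w%d" % i) for i in range(1, 7)]
b = [("stmt%d'" % i, "w%d" % i) for i in range(1, 7)]
c, d = crossover(a, b, cutoff=1)
print(c[0], c[1])
print(d[0], d[1])
```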

Selection

• The fitness function is used by the selection process to assess each chromosome.

• The fitness function encodes software requirements.

– The positive test cases: necessary functionality that cannot be sacrificed.

– The negative test cases: the fault to be repaired.

– A chromosome that cannot be compiled: zero fitness score.

$fitness(P) = W_{PosT} \times |\{t \in PosT \mid P \text{ passes } t\}| + W_{NegT} \times |\{t \in NegT \mid P \text{ passes } t\}|$

$W_{PosT}$: weight for positive test cases

$W_{NegT}$: weight for negative test cases
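The fitness formula translates directly into code. In this sketch the `passes` predicate and the `compiles` flag are hypothetical stand-ins for actually building and running a program variant, and the default weights are the values given on the experimental setup slide ($W_{PosT} = 1$, $W_{NegT} = 10$).

```python
def fitness(passes, pos_tests, neg_tests, w_pos=1, w_neg=10, compiles=True):
    """Weighted count of the test cases that a program variant passes.

    `passes(test)` is a caller-supplied predicate (hypothetical here) that
    runs the variant on one test case and reports success. A variant that
    does not compile scores zero.
    """
    if not compiles:
        return 0
    pos_passed = sum(1 for t in pos_tests if passes(t))
    neg_passed = sum(1 for t in neg_tests if passes(t))
    return w_pos * pos_passed + w_neg * neg_passed


pos_tests = ["p1", "p2", "p3"]
neg_tests = ["n1"]

# A variant that passes two positive tests but still fails the negative one.
print(fitness(lambda t: t in {"p1", "p2"}, pos_tests, neg_tests))
# A variant that passes everything.
print(fitness(lambda t: True, pos_tests, neg_tests))
```

With three positive tests and one negative test, a variant passing everything scores 1 × 3 + 10 × 1 = 13, so fixing the fault outweighs any single positive test.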

Selection (cont’d)

• The selection process determines the next generation.

• Stochastic Universal Sampling (SUS) is used.

– The probability of selection is proportional to relative fitness in the population.

[Figure: a Stochastic Universal Sampling (SUS) example. Individuals A to G occupy segments of the line from 0 to F, where F is the total fitness. Selection pointers are spaced F/N apart, beginning at a start point in [0, F/N).]

Termination criterion: a chromosome passes all test cases.
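A minimal sketch of SUS follows, assuming non-negative fitness values and a positive total. The function name and signature are illustrative, not taken from the paper.

```python
import random


def stochastic_universal_sampling(population, fitnesses, n, rng=random):
    """Select n individuals with probability proportional to fitness.

    A single random start point in [0, F/n) is drawn, then n pointers
    spaced F/n apart are laid over the cumulative fitness line [0, F].
    """
    total = sum(fitnesses)
    if n <= 0 or total <= 0:
        raise ValueError("need n > 0 and a positive total fitness")
    step = total / n
    start = rng.random() * step  # random() is in [0, 1), so start < step
    selected = []
    index = 0
    cumulative = fitnesses[0]  # right edge of the current individual's segment
    for k in range(n):
        pointer = start + k * step
        # Advance to the individual whose segment contains the pointer.
        while pointer >= cumulative and index < len(population) - 1:
            index += 1
            cumulative += fitnesses[index]
        selected.append(population[index])
    return selected


picked = stochastic_universal_sampling(list("ABCDEFG"), [1, 2, 3, 4, 5, 6, 7], 3)
print(picked)
```

Unlike repeated roulette-wheel draws, the pointers are evenly spaced, so an individual with fitness share p is selected either floor(n·p) or ceil(n·p) times.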

Repair Minimization

• Using a tree differencing algorithm and delta debugging, minimize the final result.

– Delta debugging: finding the minimum difference between two test cases, one of which fails while the other passes.

[Figure: the original program compared with the final result that passes all TCs. The difference is a list of changes (Removed, Added, Added, …, marked X, O, O); find the minimal subset of the difference.]
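The idea of shrinking the difference can be illustrated with a simple greedy reduction. Note that this is a simplified one-change-at-a-time variant, not the full delta debugging algorithm; the name `minimize_patch` and the `passes_all_tests` oracle are hypothetical, and the oracle is assumed to accept the full change list.

```python
def minimize_patch(changes, passes_all_tests):
    """Greedily drop changes that are not needed for the repair.

    `changes` is the list of edits between the original program and the
    repaired variant (for example, the output of a tree diff).
    `passes_all_tests(subset)` is a caller-supplied oracle that applies
    only `subset` to the original program and runs every test case.
    The result is 1-minimal: removing any single remaining change makes
    the oracle fail.
    """
    kept = list(changes)
    shrunk = True
    while shrunk:  # repeat until a full pass removes nothing
        shrunk = False
        i = 0
        while i < len(kept):
            candidate = kept[:i] + kept[i + 1:]
            if passes_all_tests(candidate):
                kept = candidate  # change i was unnecessary
                shrunk = True
            else:
                i += 1
    return kept


# The gcd example: only the inserted exit(0) is needed for the repair.
changes = ["insert exit(0) after line 4", "insert a = a - b after line 4"]
minimal = minimize_patch(changes, lambda subset: "insert exit(0) after line 4" in subset)
print(minimal)
```

Each oracle call stands for a compile-and-test run, so the number of calls matters in practice; delta debugging reduces that cost by trying to discard larger groups of changes first.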

Experimental Setup

• Goals of the experiment

1. Evaluate performance and scalability

2. Measure run-time cost

3. Evaluate the success rate

4. Understand how test cases affect repair quality

• Test cases

– One fault-revealing (negative) test case

– A small number (2-6) of positive test cases

• Non-crashing fuzz inputs (randomly generated)

• Manually created simple positive test cases

Experimental Setup (Cont’d)

• Parameters

– pop_size: 40

– maximum of ten generations

– $W_{PosT} = 1$ and $W_{NegT} = 10$

• 10 subject programs

Experimental Results

• 54% of the time is spent executing test cases, and 30% is spent compiling program variants.

• On average, 5.5 insertions, deletions, and swaps are applied to a variant between generations.

• The average initial repair was evolved using 3.5 crossovers and 1.8 mutations over 6.0 generations.

• All of the repairs (1) compile, (2) fix the defect, and (3) avoid compromising required functionality in the positive test cases provided.

Related work

• W. Weimer et al.: automatic repair with specifications.

– Requires formal specifications, which are rarely available.

– Repairs can sacrifice other required functionality.

– Only repairs single-threaded violations of temporal safety properties.

• T. Ball et al., S. Chaki et al., and A. Groce et al.: trace localization, minimization, and explanation.

– Narrow down a large counterexample backtrace to a few lines.

– Only deal with faults found by static analysis.

• Arcuri: repairing software bugs automatically using GP

– Needs a formal specification as an oracle.

– No evaluation on real bugs and real software.

Related work

• A Systematic Study of Automated Program Repair: Fixing 55 out of 105 Bugs for $8 Each, Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer (ICSE’12)

– Uses a similar technique based on Genetic Programming.

– Runs on the Amazon cloud service.

Conclusion

• Presenting a fully automated technique for repairing bugs using Genetic Programming (GP).

• Suggesting a novel program representation and genetic operators for GP.

• Suggesting a patch minimization approach using delta debugging and a tree differencing algorithm.

Discussion

• Pros

– Positive test cases are much easier to obtain than formal specifications or code annotations.

– The weighted representation makes the approach feasible, particularly with respect to scalability.

• Cons

– Assumes that the defect is reproducible and the negative test case is deterministic.

– Assumes that the path taken by the negative TC differs from the path taken by the positive TCs.

– Assumes that the repair can be constructed from statements already extant in the program.

Thank you for listening

• Stmt ::= Instr(instr list)
         | Return(exp option)
         | Goto(stmt)
         | Break
         | Continue
         | If(exp, stmt list, stmt list)
         | Switch(exp, stmt list, stmt list)
         | Loop(stmt list)

• A delta debugging example