Automatically Finding Patches Using Genetic Programming
Published in ICSE’09
Westley Weimer, Thanh Vu Nguyen, Claire Le Goues, and Stephanie Forrest
Presenter: Jihun Park
SELab
2014.01.10
LAB Seminar
Introduction
• Software bug
– Maintenance takes the majority of total software development cost.
– Fixing bugs is an inevitable, difficult, and tedious manual process.
• Existing automatic program repair
– Complicated input: difficult formal specifications, program annotations, special coding practices
– Drawbacks: harmful repairs, restricted properties, only narrowing the fault down to a few lines
• Goal: suggest an automatic patch generation technique using Genetic Programming
– With simple inputs,
– Not sacrificing required functionality,
– Generating a concrete patch.
Outline
• Introduction
• Motivating Example
• Background
• Approach Overview
• Genetic Programming (GP) for Program Repair
• Experiments
• Conclusion
• Discussion
Motivating Example
  /* requires: a >= 0, b >= 0 */
  void gcd(int a, int b) {
    if (a == 0) {
      printf("%d", b);
    }
    while (b != 0)
      if (a > b)
        a = a - b;
      else
        b = b - a;
    printf("%d", a);
    exit(0);
  }
With positive and negative test cases:
– Positive test case: gcd(1071, 1029) = 21
– Negative test case: gcd(0, 55) => infinite loop
Motivating Example (cont’d)
• Locate suspicious code.
• Insert/remove/replace statements with existing ones to fix the negative test case.
Candidate repair (exit(0) and a = a - b inserted after the first printf):
  /* requires: a >= 0, b >= 0 */
  void gcd(int a, int b) {
    if (a == 0) {
      printf("%d", b);
      exit(0);
      a = a - b;
    }
    while (b != 0)
      if (a > b)
        a = a - b;
      else
        b = b - a;
    printf("%d", a);
    exit(0);
  }
– Positive test case: gcd(1071, 1029) = 21
– Negative test case: gcd(0, 55) = 55 (previously an infinite loop)
Motivating Example (cont’d)
• If we find a fix, minimize it by deleting extra statements.
Minimized repair (only the inserted exit(0) remains):
  /* requires: a >= 0, b >= 0 */
  void gcd(int a, int b) {
    if (a == 0) {
      printf("%d", b);
      exit(0);
    }
    while (b != 0)
      if (a > b)
        a = a - b;
      else
        b = b - a;
    printf("%d", a);
    exit(0);
  }
– Positive test case: gcd(1071, 1029) = 21
– Negative test case: gcd(0, 55) = 55 (previously an infinite loop)
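The positive/negative test-case distinction can be checked mechanically. The following is a minimal Python sketch, not part of the original slides: `gcd_original` and `gcd_repaired` are hypothetical Python ports of the C listings that return the last printed value instead of calling printf/exit, and the `max_steps` budget is an assumed stand-in for a test-harness timeout that detects the infinite loop.

```python
def _subtract_loop(a, b, max_steps):
    """Shared while-loop of both listings; returns None if the step budget runs out."""
    steps = 0
    while b != 0:
        steps += 1
        if steps > max_steps:
            return None          # assumed timeout: treat as an infinite loop
        if a > b:
            a = a - b
        else:
            b = b - a
    return a                     # printf("%d", a); exit(0);


def gcd_original(a, b, max_steps=10_000):
    """Port of the buggy listing: prints b when a == 0 but then falls into the loop."""
    # The C code would printf b here when a == 0; without exit(0) execution continues.
    return _subtract_loop(a, b, max_steps)


def gcd_repaired(a, b, max_steps=10_000):
    """Port of the minimized repair: exit right after printing b when a == 0."""
    if a == 0:
        return b                 # printf("%d", b); exit(0);
    return _subtract_loop(a, b, max_steps)


def passes(program, args, expected):
    """A test case passes when the program terminates with the expected output."""
    return program(*args) == expected


positive = ((1071, 1029), 21)
negative = ((0, 55), 55)
print(passes(gcd_original, *positive), passes(gcd_original, *negative))
print(passes(gcd_repaired, *positive), passes(gcd_repaired, *negative))
```

The original passes only the positive test case, while the repaired version passes both, which is the termination criterion the technique searches for.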
Background
• Genetic programming
– Applying a Genetic Algorithm (GA) to a computer program
• Representation: represent a computer program as an individual (chromosome), e.g. Program → 0 1 0 1 1 0 0 1.
• Crossover: cross over two individuals to make a new child.
• Mutation: change individuals with a mutation operator.
• Selection: select the next generation by assessing individuals with a fitness function, favoring individuals that work better than others.
[Figure: bit-string chromosomes illustrating crossover, mutation, and selection]
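The four GA steps above can be sketched on bit strings. This is a toy illustration, not the presented technique: the fitness function (count of 1 bits), the truncation selection, and every parameter value are assumptions chosen only to keep the example short.

```python
import random


def fitness(bits):
    """Toy fitness: number of 1 bits (an assumed objective for illustration)."""
    return sum(bits)


def crossover(p1, p2, rng):
    """One-point crossover: head of p1 joined with tail of p2."""
    cut = rng.randrange(1, len(p1))      # cut point strictly inside the string
    return p1[:cut] + p2[cut:]


def mutate(bits, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [b ^ 1 if rng.random() < rate else b for b in bits]


def evolve(pop_size=8, length=8, generations=20, rate=0.05, seed=0):
    rng = random.Random(seed)
    # Representation: each individual is a list of bits.
    pop = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]
    for _ in range(generations):
        # Selection: keep the better half unchanged (simple truncation selection).
        parents = sorted(pop, key=fitness, reverse=True)[: pop_size // 2]
        children = []
        while len(parents) + len(children) < pop_size:
            child = crossover(rng.choice(parents), rng.choice(parents), rng)
            children.append(mutate(child, rate, rng))
        pop = parents + children
        history.append(max(map(fitness, pop)))
    return max(pop, key=fitness), history


best, history = evolve()
print(best, history[-1])
```

Because the selected parents are carried over unchanged, the best fitness per generation never decreases in this sketch.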
Approach Overview
• Representation: represent a computer program as an individual (chromosome): Program → {stmt1,w1} {stmt2,w2} …
• Crossover: cross over two individuals to make a new child.
• Mutation: change individuals with a mutation operator.
• Selection: select the next generation by assessing individuals with a fitness function, favoring individuals that pass many test cases.
• Minimization: minimize the final solution by removing extra statements.
Program Representation
• Represent program statements as AST nodes (using CIL*).
[Figure: Original AST of the gcd program. A statement sequence whose first child is an if node, with a Compareop == over a and 0 and a List<Stmt> containing the method invocation printf("%d", b), followed by a while node.]
*: “Cil: An infrastructure for C program analysis and transformation”, G. C. Necula, S. McPeak, S. P. Rahul, and W. Weimer, International Conference on Compiler Construction (CC’02)
Program Representation (cont’d)
High-level AST
  Statement sequence
    if (a == 0)
      printf("%d", b)
    while (b != 0)
      if (a > b)
        a = a - b;
        b = b - a;
    printf("%d", a)
    exit(0);
Program Representation (cont’d)
• Pair each statement with a weight.
Node representation (each node is labeled with its source line number):
   1 /* requires: a >= 0, b >= 0 */
   2 void gcd(int a, int b) {
   3   if (a == 0) {
   4     printf("%d", b);
   5   }
   6   while (b != 0)
   7     if (a > b)
   8       a = a - b;
   9     else
  10       b = b - a;
  11   printf("%d", a);
  12   exit(0);
  13 }
  [3] if (a == 0)
    [4] printf("%d", b)
  [6] while (b != 0)
    [7] if (a > b)
      [8] a = a - b;
      [10] b = b - a;
  [11] printf("%d", a)
  [12] exit(0);
Chromosome representation: a list of $\langle stmt_i, prob_i \rangle$
  {{[3]},0.1} {{[4]},1.0} {{[6]},0.1} {{[7]},0.1} {{[8]},0} {{[10]},0.1} {{[11]},0} {{[12]},0}
Weighting policy
– Visited only by negative TCs: 1.0
– Visited only by positive TCs: 0.0
– Visited by both: 0.1
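The weighting policy can be written as a small function over coverage sets. This is a hedged sketch: `statement_weights` is a hypothetical name, the coverage sets below were traced by hand from the gcd listing for gcd(0, 55) and gcd(1071, 1029), and giving weight 0.0 to statements visited by no test case at all is an assumption the slide does not state.

```python
def statement_weights(statements, neg_coverage, pos_coverage):
    """Map each statement id to its mutation weight under the slide's policy."""
    weights = {}
    for s in statements:
        in_neg = s in neg_coverage
        in_pos = s in pos_coverage
        if in_neg and not in_pos:
            weights[s] = 1.0     # visited only by negative test cases
        elif in_neg and in_pos:
            weights[s] = 0.1     # visited by both kinds of test cases
        else:
            weights[s] = 0.0     # only positive (or, by assumption, never visited)
    return weights


statements = [3, 4, 6, 7, 8, 10, 11, 12]
neg_coverage = {3, 4, 6, 7, 10}            # gcd(0, 55): loops forever in lines 6, 7, 10
pos_coverage = {3, 6, 7, 8, 10, 11, 12}    # gcd(1071, 1029)
print(statement_weights(statements, neg_coverage, pos_coverage))
```

The result reproduces the chromosome weights shown on the slide, with line 4 as the only statement of weight 1.0.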
Mutation
• Mutation algorithm
– For a statement $stmt_i$, mutation is applied if the following condition is satisfied:
  $rand(0,1) \le prob_i \;\wedge\; rand(0,1) \le W_{mut}$
• Three mutation operators
– Insert: insert statement $stmt_j$ after $stmt_i$: $Path[i] \leftarrow \langle \{stmt_i; stmt_j\}, prob_i \rangle$
– Swap: put statement $stmt_j$ in place of $stmt_i$: $Path[i] \leftarrow \langle stmt_j, prob_i \rangle$
– Delete: delete $stmt_i$: $Path[i] \leftarrow \langle \{\}, prob_i \rangle$
* NOTE: $prob_i$ is not changed.
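A minimal sketch of the mutation step, under stated assumptions: a chromosome is modeled as a list of (statement list, prob) pairs with string labels instead of AST nodes, the operator and the donor statement stmt_j are chosen uniformly at random, and the value 0.06 for W_mut in the usage line is an arbitrary illustration, not a number from the slides.

```python
import random


def mutate(path, w_mut, rng, pool=None):
    """Return a mutated copy of `path`, a list of (stmts, prob) pairs.

    `pool` holds the statements that may be reused as stmt_j; by default it is
    every statement already in the program (no new code is invented).
    """
    if pool is None:
        pool = [s for stmts, _ in path for s in stmts]
    child = []
    for stmts, prob in path:
        # Mutate only if both random draws succeed, as in the slide's condition.
        if pool and rng.random() <= prob and rng.random() <= w_mut:
            op = rng.choice(["insert", "swap", "delete"])
            stmt_j = rng.choice(pool)
            if op == "insert":
                stmts = stmts + [stmt_j]   # {stmt_i; stmt_j}
            elif op == "swap":
                stmts = [stmt_j]           # stmt_j in place of stmt_i
            else:
                stmts = []                 # delete stmt_i
        child.append((stmts, prob))        # prob_i is never changed
    return child


path = [(["if(a==0)"], 0.1), (["printf(b)"], 1.0), (["exit(0)"], 0.0)]
print(mutate(path, 0.06, random.Random(0)))
```

Keeping the weight attached to the position rather than to the statement means later generations keep focusing on the same suspicious locations.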
Crossover
• Randomly select a cutoff point, then combine the parts from both parents.
Parents (| marks the randomly selected cutoff point):
  {stmt1,w1} | {stmt2,w2} {stmt3,w3} {stmt4,w4} {stmt5,w5} {stmt6,w6}
  {stmt1’,w1} | {stmt2’,w2} {stmt3’,w3} {stmt4’,w4} {stmt5’,w5} {stmt6’,w6}
Children (next generation):
  {stmt1,w1} {stmt2’,w2} {stmt3’,w3} {stmt4’,w4} {stmt5’,w5} {stmt6’,w6}
  {stmt1’,w1} {stmt2,w2} {stmt3,w3} {stmt4,w4} {stmt5,w5} {stmt6,w6}
• Population sizes: |Parent generation| = pop_size/2, |Parents + Children| = pop_size, and the selection process then yields |Next generation| = pop_size/2.
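One-point crossover on the statement list can be sketched as follows. The function names are hypothetical, and chromosomes are modeled as plain Python lists of (statement, weight) pairs; the cutoff is passed in explicitly so the example reproduces the slide's picture, with a separate helper for drawing it at random.

```python
import random


def random_cut(length, rng):
    """Pick a cutoff strictly inside the chromosome (assumes length >= 2)."""
    return rng.randrange(1, length)


def crossover(parent1, parent2, cut):
    """Swap the tails of two equally long chromosomes at position `cut`."""
    child1 = parent1[:cut] + parent2[cut:]
    child2 = parent2[:cut] + parent1[cut:]
    return child1, child2


p1 = [(f"stmt{i}", f"w{i}") for i in range(1, 7)]
p2 = [(f"stmt{i}'", f"w{i}") for i in range(1, 7)]
c1, c2 = crossover(p1, p2, cut=1)      # the cutoff drawn on the slide
print(c1)
print(c2)
print(random_cut(len(p1), random.Random(0)))
```

Because both parents have the same length and the weights are positional, each child keeps the weight sequence w1 … w6 intact.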
Selection
• The fitness function is used by the selection process to assess each chromosome.
• The fitness function encodes software requirements.
– The positive test cases: necessary functionality that cannot be sacrificed.
– The negative test cases: the fault to be repaired.
– A chromosome that cannot be compiled: zero fitness score.
$fitness(P) = W_{PosT} \times |\{t \in PosT \mid P \text{ passes } t\}| + W_{NegT} \times |\{t \in NegT \mid P \text{ passes } t\}|$
$W_{PosT}$: weight for positive test cases
$W_{NegT}$: weight for negative test cases
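The fitness formula translates directly into code. In this sketch, `passes` is a hypothetical callable that runs one test case against a variant and reports success, `compiles` is an assumed flag supplied by the caller, and the default weights 1 and 10 are the values listed under the experimental parameters.

```python
def fitness(passes, pos_tests, neg_tests, w_pos=1, w_neg=10, compiles=True):
    """Weighted count of passed positive and negative test cases."""
    if not compiles:
        return 0                                   # uncompilable variant: zero fitness
    passed_pos = sum(1 for t in pos_tests if passes(t))
    passed_neg = sum(1 for t in neg_tests if passes(t))
    return w_pos * passed_pos + w_neg * passed_neg


# Toy oracle: the variant passes exactly the tests named in this set.
passing = {"p1", "p2", "n1"}
score = fitness(lambda t: t in passing, ["p1", "p2", "p3"], ["n1"])
print(score)   # 1 * 2 + 10 * 1 = 12
```

With these weights a variant that fixes the negative test case outranks one that merely keeps more positive test cases passing, as long as there are fewer than ten positive test cases.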
Selection (cont’d)
• The selection process determines the next generation.
• Stochastic Universal Sampling (SUS) is used.
– The probability of selection is proportional to relative fitness in the population.
A Stochastic Universal Sampling (SUS) example: individuals A to G occupy segments of a line from 0 to F (total fitness = F), each segment as long as that individual's fitness. N pointers spaced F/N apart, with the start point ∈ [0, F/N), pick the selected individuals.
Termination criterion: a chromosome passes all test cases.
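SUS can be sketched in a few lines. This is an illustrative implementation under assumptions: `sus_select` is a hypothetical name, fitness values are non-negative numbers with a positive total, and the optional `start` argument exists only so the example is reproducible.

```python
import random


def sus_select(population, fitnesses, n, start=None, rng=random):
    """Select n individuals with equally spaced pointers over the fitness line."""
    total = sum(fitnesses)
    if total <= 0:
        raise ValueError("total fitness must be positive")
    step = total / n                         # distance F/N between pointers
    if start is None:
        start = rng.random() * step          # start point in [0, F/N)
    selected = []
    idx = 0
    cumulative = fitnesses[0]                # right edge of the current segment
    for k in range(n):
        pointer = start + k * step
        # Advance to the segment containing the pointer (guard against rounding).
        while pointer >= cumulative and idx < len(population) - 1:
            idx += 1
            cumulative += fitnesses[idx]
        selected.append(population[idx])
    return selected


print(sus_select(list("ABCD"), [4, 2, 1, 1], n=4, start=0.5))
# pointers 0.5, 2.5, 4.5, 6.5 fall in A, A, B, C
```

A single random draw positions all pointers, so an individual holding a fraction p of the total fitness is picked either floor(p·N) or ceil(p·N) times, which gives less variance than N independent roulette-wheel spins.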
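Repair Minimization
• Using a tree differencing algorithm and delta debugging, minimize the final result.
– Tree differencing: compare the original program with the final result that passes all TCs to obtain the removed and added statements.
– Delta debugging: finding the minimum difference between two test cases where one fails and the other passes.
– Find the minimal subset of the differences!
[Figure: the differences between the original program and the final result, marked Removed, Added, Added and labeled X, O, O]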
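The minimization idea can be sketched with a greedy one-at-a-time reduction. Note the swap: this is a simplified stand-in, not the full delta debugging (ddmin) algorithm, and it works on an abstract list of changes rather than on a real tree diff. `still_repairs` is a hypothetical oracle that applies a subset of changes to the original program and reports whether all test cases pass.

```python
def minimize(changes, still_repairs):
    """Drop changes one at a time while the remaining subset still repairs the bug.

    The result is 1-minimal: removing any single remaining change breaks the repair.
    """
    kept = list(changes)
    changed = True
    while changed:                       # repeat until a full pass removes nothing
        changed = False
        i = 0
        while i < len(kept):
            candidate = kept[:i] + kept[i + 1:]
            if still_repairs(candidate):
                kept, changed = candidate, True   # change i was unnecessary
            else:
                i += 1                            # change i is needed; keep it
    return kept


# The two insertions from the gcd candidate repair; only exit(0) matters.
changes = ["insert exit(0) after printf(b)", "insert a = a - b after exit(0)"]
oracle = lambda subset: "insert exit(0) after printf(b)" in subset
print(minimize(changes, oracle))
```

Each oracle call corresponds to compiling and testing a variant, so the number of calls is what makes minimization expensive on larger diffs.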
Experimental Setup
• Goals of the experiment
1. Evaluate performance and scalability
2. Measure run-time cost
3. Evaluate the success rate
4. Understand how test cases affect repair quality
• Test cases
– One fault (negative) test case
– A small number (2-6) of positive test cases
  • Non-crashing fuzz inputs (randomly generated)
  • Manually created simple positive test cases
Experimental Setup (Cont’d)
• Parameters
– pop_size: 40
– maximum of ten generations
– $W_{PosT} = 1$ and $W_{NegT} = 10$
• 10 subject programs
Experimental Results
• 54% of the time is spent executing test cases, and 30% is spent compiling program variants.
• 5.5 insertions, deletions, and swaps are applied to a variant between generations.
• The average initial repair was evolved using 3.5 crossovers and 1.8 mutations over 6.0 generations.
• All of the repairs (1) compile, (2) fix the defect, and (3) avoid compromising required functionality in the positive test cases provided.
Related work
• W. Weimer et al.: automatic repair with specifications.
– Requires formal specifications, which are rarely available.
– Repairs can sacrifice other required functionality.
– Only repairs single-threaded violations of temporal safety.
• T. Ball et al., S. Chaki et al., and A. Groce et al.: trace localization, minimization, and explanation.
– Narrow down a large counterexample backtrace to a few lines.
– Only deal with faults found by static analysis.
• Arcuri: repairing software bugs automatically using GP.
– Needs a formal specification as an oracle.
– No evaluation on real bugs and real software.
Related work (cont’d)
• A Systematic Study of Automated Program Repair: Fixing 55 out of 105 Bugs for $8 Each, Claire Le Goues, Michael Dewey-Vogt, Stephanie Forrest, and Westley Weimer (ICSE’12)
– Uses a similar technique based on Genetic Programming,
– With the Amazon cloud service.
Conclusion
• Presenting a fully automated technique for repairing bugs using Genetic Programming (GP).
• Suggesting a novel program representation and genetic operators for GP.
• Suggesting a patch minimization approach using delta debugging and a tree differencing algorithm.
Discussion
• Pros
– Positive test cases are much easier to obtain than formal specifications or code annotations.
– The weighted representation focuses the search on statements visited by the negative test case, which makes the approach scalable.
• Cons
– Assumes that the defect is reproducible and the negative test case is deterministic.
– Assumes that the path taken by the negative TC differs from the path taken by the positive TCs.
– Assumes that the repair can be constructed from statements already present in the program.
Thank you for listening
• Stmt ::= Instr(instr list) | Return(exp option)
| Goto(stmt) | Break
| Continue | If(exp, stmt list, stmt list)
| Switch(exp, stmt list, stmt list)
| Loop(stmt list)
• A delta debugging example