Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum...

Localizing Failure-Inducing Program Edits Based on Spectrum Information. Lingming Zhang, Miryung Kim, Sarfraz Khurshid, The University of Texas at Austin. ICSM 2011, September 27th, 2011.

description

Paper: Localizing Failure-Inducing Program Edits Based on Spectrum Information.
Authors: Lingming Zhang, Miryung Kim, Sarfraz Khurshid.
Session: Research Track Session 1: Faults and Regression Testing

Transcript of Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum...

Page 1: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Localizing Failure-Inducing Program Edits Based on Spectrum Information

Lingming Zhang, Miryung Kim, Sarfraz Khurshid
The University of Texas at Austin

ICSM2011, September 27th 2011

Page 2: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Overview

Change-impact analysis is effective at finding suspicious edits but lacks precise ranking.

Spectrum-based fault localization is effective at ranking but does not scale well.

Our insight: combine change-impact analysis and spectrum-based fault localization.
• Identify suspicious edits using extended call graphs.
• Rank suspicious edits using dynamic program spectrum information.

L. Zhang: Localizing failure-inducing program edits based on spectrum information 2

Page 3: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Summary of our results

FaultTracer localizes failure-inducing edits with high precision:

• Identifying suspicious edits: outperforms Chianti by 19.37%.
• Ranking all suspicious edits: ranks real regression faults within the top 3 edits for 14 of the 22 studied real-world failures.
• Ranking method-level suspicious edits: outperforms the existing heuristic by 56.25%.

Page 4: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Outline

• FaultTracer Approach
• Empirical Evaluation
• Related Work
• Conclusions

Page 5: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Example

Program P:

public class A {
  public static int f1=0;
  public static int f2=0;
  ...
}
class B {
  int f1=0; int f2=0; int f3=0;
  ...
}
class C extends B {
  int f4=1;
  public int foo(){ if(f1>=0) return f1; else return f4; }
  ...
}

Program P evolves into Program P’:

public class A {
  public static int f1=1;
  public static int f2=1;
  ...
}
class B {
  int f1=0; int f2=1; int f3=1;
  public int foo(){return f1;}
  ...
}
class C extends B {
  public int f1=3;
  public void bar(int f) {f3=f+f1;}
  ...
}

Regression test suite T:

public void test1() { A.bar(1); }
public void test2() { ... }
public void test3() { ... }
public void test4() {
  C c = new C();
  int f = c.foo();
}
public void test5() { ... }

Running T on P, then re-testing on P’ after the evolution, exposes a bug.
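The AF(C.f1) edit in this example matters because Java binds an unqualified field name to the nearest declaration in the class hierarchy. Once C declares its own f1, the read of f1 inside C.foo() no longer refers to B.f1. The following sketch shows that field look-up change on its own. It is a hypothetical rendition: the Before*/After* class names are invented so that both versions fit in one file, and it assumes that the body of C.foo() does not change in P’.

```java
// Hypothetical illustration of a field look-up change (LCf) caused by AF(C.f1).
final class FieldLookupDemo {

    // Version P: C declares no f1, so the read of f1 in foo() binds to B.f1.
    static class BeforeB { int f1 = 0; }

    static class BeforeC extends BeforeB {
        int f4 = 1;
        public int foo() { if (f1 >= 0) return f1; else return f4; }
    }

    // Version P': C adds its own f1, which hides B.f1 inside C.
    static class AfterB { int f1 = 0; }

    static class AfterC extends AfterB {
        public int f1 = 3;  // the added field, AF(C.f1)
        int f4 = 1;
        public int foo() { if (f1 >= 0) return f1; else return f4; }  // body assumed unchanged
    }

    public static void main(String[] args) {
        System.out.println(new BeforeC().foo());  // reads B.f1
        System.out.println(new AfterC().foo());   // reads C.f1
    }
}
```

The source text of foo() is identical in both versions. Only the declaration that f1 resolves to changes, and that is why such edits can be missed without field-level information.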

Page 6: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

FaultTracer overview

[Figure: FaultTracer pipeline over program versions P and P’ and test suite T: ① detecting changes and dependences between P and P’; ② selecting tests (yielding T’) based on Extended Call Graph analysis; ③ identifying suspicious edits based on Extended Call Graph analysis; ④ ranking suspicious edits based on program spectrum information.]

Page 7: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Extended Call Graph representation

public void test1() { A.bar(1); }
public void test4() {
  C c = new C();
  int f = c.foo();
}

[Figure: call graphs of test1 and test4. The traditional call graph used by Chianti contains method nodes (A.bar(), A.Clinit(), C.foo(), C.C(), B.B()) and the dispatch edge label <C,C.foo()>. The extended call graph used by FaultTracer also records field-access edges labeled <SFW,A.f2> and <FR,C.f1>, together with the field nodes A.f2 and B.f1.]
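A per-test extended call graph can be represented as a list of labeled edges. This sketch is an assumption, not FaultTracer's implementation. It reads the label <C,C.foo()> as the receiver type plus the dispatched method, and it treats FR and SFW as "field read" and "static field write". The class name ExtendedCallGraph and its methods are hypothetical, and the test4 edges in main are only illustrative.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical per-test extended call graph: call edges plus labeled field-access edges.
final class ExtendedCallGraph {
    record Edge(String from, String to, String label) {}

    private final List<Edge> edges = new ArrayList<>();

    // Plain call edge, e.g. a constructor invoking its superclass constructor.
    ExtendedCallGraph call(String caller, String callee) {
        edges.add(new Edge(caller, callee, ""));
        return this;
    }

    // Virtual call labeled with the receiver's run-time type and the resolved target.
    ExtendedCallGraph virtualCall(String caller, String receiverType, String target) {
        edges.add(new Edge(caller, target, "<" + receiverType + "," + target + ">"));
        return this;
    }

    // Field-access edge labeled with an access kind (assumed: "FR", "SFW", ...) and the field.
    ExtendedCallGraph fieldAccess(String method, String kind, String field) {
        edges.add(new Edge(method, field, "<" + kind + "," + field + ">"));
        return this;
    }

    // All method/field nodes plus non-empty labels; later steps match changes against these.
    Set<String> elements() {
        Set<String> out = new LinkedHashSet<>();
        for (Edge e : edges) {
            out.add(e.from());
            out.add(e.to());
            if (!e.label().isEmpty()) out.add(e.label());
        }
        return out;
    }

    public static void main(String[] args) {
        // test4: new C() (which runs B()), then the virtual call c.foo(), which reads f1.
        ExtendedCallGraph test4 = new ExtendedCallGraph()
                .call("test4", "C.C()")
                .call("C.C()", "B.B()")
                .virtualCall("test4", "C", "C.foo()")
                .fieldAccess("C.foo()", "FR", "C.f1");
        System.out.println(test4.elements());
    }
}
```

Keeping labels as first-class elements lets a field or dispatch change be matched even when the method nodes themselves are unchanged.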

Page 8: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Step 1. Detecting atomic changes and dependences

Atomic change types:

Change type | Description
CM | Change method
AM | Add method
DM | Delete method
AF | Add field
DF | Delete field
CFI | Change instance field
CSFI | Change static field
LCm | Method look-up change
LCf | Field look-up change

[Figure: change dependences inference rules]
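The atomic change types above can be encoded as a small data model. The following sketch is hypothetical: the names ChangeType and AtomicChange do not come from the slides. It mirrors the table's abbreviations and parses the TYPE(element) notation used in Step 4, e.g. CSFI(A.f1).

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical encoding of the atomic change types in the table above.
enum ChangeType {
    CM("Change method"), AM("Add method"), DM("Delete method"),
    AF("Add field"), DF("Delete field"),
    CFI("Change instance field"), CSFI("Change static field"),
    LCm("Method look-up change"), LCf("Field look-up change");

    final String description;

    ChangeType(String description) { this.description = description; }

    // Look-up changes are caused by other edits (Step 3 maps them back to those edits).
    boolean isLookUp() { return this == LCm || this == LCf; }
}

// One atomic change on a program element, written as on the slides, e.g. "CM(B.foo)".
record AtomicChange(ChangeType type, String element) {
    private static final Pattern NOTATION = Pattern.compile("(\\w+)\\((.+)\\)");

    static AtomicChange parse(String text) {
        Matcher m = NOTATION.matcher(text.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("Unrecognized change: " + text);
        }
        // valueOf throws IllegalArgumentException for an unknown abbreviation.
        return new AtomicChange(ChangeType.valueOf(m.group(1)), m.group(2));
    }

    @Override
    public String toString() { return type + "(" + element + ")"; }
}
```

For example, AtomicChange.parse("AM(C.bar)") yields an AM change on C.bar, and its type's description is "Add method".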

Page 9: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Step 2. Test selection based on Extended Call Graph (ECG) analysis

FaultTracer directly matches all changes against the tests' ECGs built before the edits to select the influenced tests.
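This selection rule can be sketched as a set-intersection check. The model is hypothetical: TestSelection.select and the flattened-ECG map are illustrative and not FaultTracer's API. Each test's ECG on the original program is reduced to the set of methods, fields, and edge labels it contains. A test is selected when any changed element appears in that set.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of Step 2: select tests whose pre-edit ECG touches a changed element.
final class TestSelection {

    static List<String> select(Map<String, Set<String>> ecgBeforeEdits, Set<String> changedElements) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Set<String>> entry : ecgBeforeEdits.entrySet()) {
            // disjoint() is false as soon as one changed element is on this test's ECG.
            if (!Collections.disjoint(entry.getValue(), changedElements)) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> ecgs = new LinkedHashMap<>();  // keeps test order stable
        ecgs.put("test1", Set.of("test1", "A.bar()", "A.Clinit()", "A.f2", "<SFW,A.f2>"));
        ecgs.put("test4", Set.of("test4", "C.C()", "B.B()", "C.foo()", "B.f1"));
        // A CSFI change on A.f2 (0 to 1 in the example) reaches only test1 in this toy input.
        System.out.println(select(ecgs, Set.of("A.f2")));
    }
}
```

Look-up changes could be matched in the same way against labeled edges such as <C,C.foo()>, assuming they are keyed by the label they affect.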

Page 10: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Step 3. Suspicious edit identification based on Extended Call Graph analysis

FaultTracer directly selects the non-look-up changes that appear on test ECGs after the edits as suspicious edits.

FaultTracer also selects the method or field edits that caused look-up changes on test ECGs as suspicious edits.
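The two rules can be sketched for one failing test whose ECG on P’ has been flattened to a set of elements. Everything here is hypothetical. The edit-to-element maps and the causedBy map are assumed inputs; causedBy records which edits caused each look-up change, presumably derived from the change dependences of Step 1. The look-up change name LCf(C.f1) is invented for the example.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of Step 3 for one failing test.
final class SuspiciousEdits {

    static Set<String> identify(Set<String> ecgAfterEdits,
                                Map<String, String> nonLookUpEdits,     // edit -> element it touches
                                Map<String, String> lookUpChanges,      // look-up change -> affected ECG label
                                Map<String, List<String>> causedBy) {   // look-up change -> causing edits
        Set<String> suspicious = new LinkedHashSet<>();
        // Rule 1: a non-look-up change is suspicious if its element is on the post-edit ECG.
        nonLookUpEdits.forEach((edit, element) -> {
            if (ecgAfterEdits.contains(element)) suspicious.add(edit);
        });
        // Rule 2: a look-up change on the ECG implicates the method/field edits that caused it.
        lookUpChanges.forEach((change, label) -> {
            if (ecgAfterEdits.contains(label)) {
                suspicious.addAll(causedBy.getOrDefault(change, List.of()));
            }
        });
        return suspicious;
    }

    public static void main(String[] args) {
        Set<String> ecg = Set.of("test4", "C.C()", "B.B()", "C.foo()", "<C,C.foo()>", "<FR,C.f1>");
        Map<String, String> edits = new LinkedHashMap<>();
        edits.put("CM(B.foo)", "B.foo()");
        edits.put("AM(C.bar)", "C.bar()");
        Map<String, String> lookUps = Map.of("LCf(C.f1)", "<FR,C.f1>");
        Map<String, List<String>> causes = Map.of("LCf(C.f1)", List.of("AF(C.f1)"));
        System.out.println(identify(ecg, edits, lookUps, causes));
    }
}
```

In this toy input, AF(C.f1) is flagged only through the field look-up it causes. Neither CM(B.foo) nor AM(C.bar) appears on the ECG.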

Page 11: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Step 4. Spectrum-based fault localization for program edits

Correlation between suspicious edits and tests:

Edits | test2 | test3 | test4 | test5
CSFI(A.f1)
CM(B.foo)
AF(C.f1)
AM(C.bar)
Output | Pass | Pass | Pass | Fail

Suspiciousness score computation:

Edits | Tarantula | SBI | Jaccard | Ochiai | Tie-break
CSFI(A.f1) | 0.00 | 0.00 | 0.00 | 0.00 | -
CM(B.foo) | 0.75 | 0.50 | 0.50 | 0.71 | 1
AF(C.f1) | 0.75 | 0.50 | 0.50 | 0.71 | 0
AM(C.bar) | 1.00 | 1.00 | 1.00 | 1.00 | -
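The scores in the table can be reproduced with the standard definitions of the four techniques, applied to edits rather than statements. In this sketch, failed and passed count the failing and passing tests that cover an edit, and the totals come from the Output row: three passing tests and one failing test. The per-edit coverage counts are an assumption chosen to be consistent with the scores. CM(B.foo) and AF(C.f1) are each covered by one passing test and the failing test, AM(C.bar) only by the failing test, and CSFI(A.f1) only by passing tests (two, hypothetically). Tie-breaking is not modeled.

```java
import java.util.Locale;

// Sketch of the four spectrum-based formulas applied to program edits.
// failed/passed: failing/passing tests covering the edit; totals: over the selected tests.
final class Suspiciousness {

    static double tarantula(int failed, int passed, int totalFailed, int totalPassed) {
        double f = totalFailed == 0 ? 0.0 : (double) failed / totalFailed;  // failing-coverage ratio
        double p = totalPassed == 0 ? 0.0 : (double) passed / totalPassed;  // passing-coverage ratio
        return f + p == 0 ? 0.0 : f / (f + p);                              // 0 for uncovered edits
    }

    static double sbi(int failed, int passed) {
        return failed + passed == 0 ? 0.0 : (double) failed / (failed + passed);
    }

    static double jaccard(int failed, int passed, int totalFailed) {
        int denominator = totalFailed + passed;
        return denominator == 0 ? 0.0 : (double) failed / denominator;
    }

    static double ochiai(int failed, int passed, int totalFailed) {
        double denominator = Math.sqrt((double) totalFailed * (failed + passed));
        return denominator == 0 ? 0.0 : failed / denominator;
    }

    public static void main(String[] args) {
        int totalFailed = 1, totalPassed = 3;                 // test5 fails; test2-test4 pass
        String[] edits = {"CSFI(A.f1)", "CM(B.foo)", "AF(C.f1)", "AM(C.bar)"};
        int[][] coverage = {{0, 2}, {1, 1}, {1, 1}, {1, 0}};  // {failed, passed}, assumed
        for (int i = 0; i < edits.length; i++) {
            int f = coverage[i][0], p = coverage[i][1];
            System.out.printf(Locale.ROOT, "%-11s %.2f %.2f %.2f %.2f%n", edits[i],
                    tarantula(f, p, totalFailed, totalPassed), sbi(f, p),
                    jaccard(f, p, totalFailed), ochiai(f, p, totalFailed));
        }
    }
}
```

For CM(B.foo), for instance, Tarantula gives 1 / (1 + 1/3) = 0.75 and Ochiai gives 1 / √2 ≈ 0.71, matching the table.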

Page 12: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Outline

• FaultTracer Approach
• Empirical Evaluation
• Related Work
• Conclusions

Page 13: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Research Questions

RQ1: How does FaultTracer compare to Chianti in identifying suspicious edits?

RQ2: How effective is FaultTracer in ranking suspicious edits?

Page 14: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Subjects: overview

Subjects from the Software-artifact Infrastructure Repository (SIR).

Project | Versions | Program size (KLoC) | Number of tests
Jtopas | 0.0-3.0 | 1.83 ~ 5.36 | 95-209
Xml-Security | 0.0-3.0 | 17.44 ~ 18.99 | 84-106
JMeter | 0.0-5.0 | 31.01 ~ 41.05 | 70-97
Ant | 0.0-8.0 | 17.20 ~ 80.44 | 112-878

Page 15: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Subjects: change statistics

[Figure: number of changes of each atomic change type (AM, DM, CM, AF, DF, CFI, CSFI, LCm, LCf) for each version pair, from Jtopas0.0-1.0 to Ant7.0-8.0; x-axis from 0 to 7000 changes.]

Page 16: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

RQ1: How does FaultTracer compare to Chianti in identifying suspicious edits?

FaultTracer achieves a 19.37% improvement in the precision of identifying suspicious edits.

[Figure: bar chart comparing Chianti and FaultTracer for each version pair, from Jtopas0.0-1.0 to Ant7.0-8.0; y-axis from 0 to 160.]

Page 17: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

RQ2: How effective is FaultTracer in ranking suspicious edits?

Ranks all types of edits:

• Average performance:

 | Tarantula | SBI | Jaccard | Ochiai | Suspicious edit num. | Edit number
Average | 8.50 | 8.50 | 10.83 | 14.66 | 68.83 | 3932
Percentage to edit number | 0.22% | 0.22% | 0.28% | 0.37% | 1.75% | --

• Example (Ant5.0-6.0):

Test | Tarantula | SBI | Jaccard | Ochiai | Suspicious edit num. | Edit number
ant.taskdefs.optional.EchoPropertiesTest.testEchoToBadFile | 1 | 1 | 1 | 10 | 182 | 5019
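The "Percentage to edit number" row appears to be each average divided by the average total number of edits (3932). The short arithmetic check below tests that reading against the reported values. The class and method names are illustrative.

```java
import java.util.Locale;

// Worked check of the "Percentage to edit number" row (interpretation assumed):
// percentage = 100 * average / average total number of edits.
final class RankPercentages {

    static double percent(double average, double totalEdits) {
        return 100.0 * average / totalEdits;
    }

    public static void main(String[] args) {
        String[] columns = {"Tarantula", "SBI", "Jaccard", "Ochiai", "Suspicious edit num."};
        double[] averages = {8.50, 8.50, 10.83, 14.66, 68.83};  // "Average" row
        double totalEdits = 3932;                              // average edit number
        for (int i = 0; i < columns.length; i++) {
            System.out.printf(Locale.ROOT, "%-21s %.2f%%%n", columns[i], percent(averages[i], totalEdits));
        }
    }
}
```

Rounded to two decimals, this gives 0.22%, 0.22%, 0.28%, 0.37%, and 1.75%, which is consistent with the row.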

Page 18: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

RQ2: How effective is FaultTracer in ranking suspicious edits?

Ranks method edits (FaultTracer vs. heuristic):

• Achieves a 56.25% improvement in the precision of localizing method-level failure-inducing edits.

Page 19: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Limitations

Does not currently filter out refactorings (e.g., using RefFinder [Prete+2010]).

Uses only four spectrum-based fault localization techniques.

The experimental evaluation is limited by the small number of real regression faults.

Page 20: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Related work

Change-impact analysis
• Chianti [Ren+2004]
• Crisp [Chesley+2005]
• Heuristic ranking [Ren+2007]

Fault localization
• Spectrum-based
  • E.g., Tarantula [Jones+2002], SBI [Liblit+2005], Jaccard [Abreu+2007], Ochiai [Abreu+2007].
• Delta debugging [Zeller1999]
• Model-based
  • E.g., Bayesian diagnosis [Kleer+1987]

Page 21: Faults and Regression testing - Localizing Failure-Inducing Program Edits Based on Spectrum Information

Conclusion

FaultTracer combines change-impact analysis with dynamic spectra.

FaultTracer improves change-impact analysis with extended call graph analysis.

Experimental evaluation shows that FaultTracer:
• Performs 19.37% better than Chianti in determining affecting changes.
• Localizes failure-inducing edits within the top 3 edits for 14 of the 22 regression failures.
• Performs 56.25% better than the previous heuristic in localizing failure-inducing program edits.

zhanglm10@gmail.com