TMPA-2017: Defect Report Classification in Accordance with Areas of Testing


Page 1: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Defect report classification in accordance with areas of testing

Anna Gromova, Exactpro

Open Access Quality Assurance & Related Software Development for Financial Markets

Tel: +7 495 640 2460, +1 415 830 38 49

www.exactpro.com

Page 2: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Defect Management

Areas of research in defect management:

• automatic defect fixing

• automatic defect detection

• metrics and predictions of defect reports

• quality of defect reports

• triaging defect reports

Page 3: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Metrics of testing

Examples of metrics:

• time to fix / time to resolve

• which defects get reopened

• which defects get fixed

• which defects get rejected

Page 4: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Area of testing: Component/s and Summary

Page 5: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Contribution

● Manual classification of 2,795 defect reports extracted from the bug tracking system.

● Answers to the following questions, based on that classification and natural language processing:

1. Does feature selection improve defect classification?

2. Which combinations of classifiers and feature selection methods give the best results?

Page 6: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Classification: related work

Text categorization makes it possible to solve the following tasks:

● classifying defects by different features, such as the type of issue, security or the configuration aspect;

● predicting the developer who should be assigned to fix the bug;

● predicting the category of the software component that is connected to the defect, etc.

Page 7: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Techniques: preprocessing

● Natural language processing:

❖ Tokenization

❖ Removal of stop-words

❖ Stemming

● Bag of words (TF-IDF)

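$$\mathrm{TF}(t,d) = \frac{\mathrm{freq}(t,d)}{\max_{w \in d} \mathrm{freq}(w,d)}$$

$$\mathrm{IDF}(t,D) = \log_2 \frac{|D|}{|\{d \in D : t \in d\}|}$$

$$\mathrm{TFIDF}(t,d,D) = \mathrm{TF}(t,d) \times \mathrm{IDF}(t,D)$$

$\mathrm{freq}(t,d)$: term frequency, i.e. the number of times that term $t$ occurs in document $d$;

$\max_{w \in d} \mathrm{freq}(w,d)$: the maximum frequency of any term in document $d$;

$|\{d \in D : t \in d\}|$: the number of documents containing $t$;

$|D|$: the total number of documents in the corpus.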
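The preprocessing and weighting steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stop-word list and the sample summaries are made up, stemming is left out to keep the sketch dependency-free, and the function names are hypothetical rather than taken from the talk.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "an", "is", "in", "on", "of", "to", "and", "when"}


def tokenize(text):
    """Lowercase the text, split it into alphanumeric tokens, drop stop-words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def tf(term, doc_tokens):
    """freq(t, d) divided by the frequency of the most common term in d."""
    counts = Counter(doc_tokens)
    if not counts:
        return 0.0
    return counts[term] / max(counts.values())


def idf(term, corpus_tokens):
    """log2(|D| / number of documents containing t); 0.0 if t is unseen."""
    containing = sum(1 for doc in corpus_tokens if term in doc)
    if containing == 0:
        return 0.0
    return math.log2(len(corpus_tokens) / containing)


def tfidf(term, doc_tokens, corpus_tokens):
    """Product of the two weights defined above."""
    return tf(term, doc_tokens) * idf(term, corpus_tokens)


# Made-up defect summaries standing in for real reports.
summaries = [
    "Gateway rejects order when the gateway is restarted",
    "Order book is empty in the market data feed",
    "Gateway login fails",
    "Market data feed is delayed",
]
corpus = [tokenize(s) for s in summaries]

for term in ("gateway", "order", "restarted"):
    print(term, tfidf(term, corpus[0], corpus))
```

Returning 0.0 for a term that appears in no document is a choice made here to avoid a division by zero; the slide does not say how that case is handled.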

Page 8: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Techniques: feature selection
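Information gain is one of the feature selection methods evaluated in this talk. A minimal sketch of how it can rank terms for a single area is given below, assuming binary term presence and a binary "belongs to the area" label. The sample documents, labels and function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(term, docs, labels):
    """H(class) minus H(class | term present or absent)."""
    with_term = [y for d, y in zip(docs, labels) if term in d]
    without_term = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    conditional = ((len(with_term) / n) * entropy(with_term)
                   + (len(without_term) / n) * entropy(without_term))
    return entropy(labels) - conditional


def select_top_terms(docs, labels, k):
    """Keep the k terms with the highest information gain."""
    vocabulary = sorted({t for d in docs for t in d})
    ranked = sorted(vocabulary,
                    key=lambda t: information_gain(t, docs, labels),
                    reverse=True)
    return ranked[:k]


# Made-up token sets; True means the report belongs to the area.
docs = [
    {"gateway", "login", "fails"},
    {"gateway", "rejects", "order"},
    {"report", "layout", "broken"},
    {"report", "export", "fails"},
]
labels = [True, True, False, False]

print(select_top_terms(docs, labels, 2))
```

In this toy data "fails" occurs equally often in both classes, so its information gain is zero and it is filtered out, while "gateway" and "report" separate the classes perfectly.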

Page 9: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Techniques

Classifiers:

● Logistic regression

● SVM

● Decision tree

● Random forest

● Naive Bayes

● Bayes Net
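To make one of the listed classifiers concrete, here is a small multinomial Naive Bayes with add-one smoothing, written from scratch for a single binary "area" label. It is a sketch under assumptions, not the implementation used in the study: the class name, the training data and the smoothing choice are all illustrative.

```python
import math
from collections import Counter


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        self.vocabulary = {t for d in docs for t in d}
        self.priors = {c: labels.count(c) / len(labels) for c in self.classes}
        self.term_counts = {c: Counter() for c in self.classes}
        for tokens, label in zip(docs, labels):
            self.term_counts[label].update(tokens)
        return self

    def predict(self, tokens):
        best_class, best_score = None, -math.inf
        vocab_size = len(self.vocabulary)
        for c in self.classes:
            total = sum(self.term_counts[c].values())
            score = math.log(self.priors[c])
            for t in tokens:
                if t in self.vocabulary:  # ignore terms never seen in training
                    count = self.term_counts[c][t]
                    score += math.log((count + 1) / (total + vocab_size))
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Made-up tokenized summaries; True means the report belongs to the area.
train_docs = [
    ["gateway", "login", "fails"],
    ["gateway", "rejects", "order"],
    ["report", "layout", "broken"],
    ["report", "export", "fails"],
]
train_labels = [True, True, False, False]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict(["gateway", "timeout"]))
print(model.predict(["report", "export"]))
```

Training one such binary model per area mirrors the "classifier for each area" setup described in the conclusions.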

Page 10: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Objects

Page 11: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Example

[Diagram labels: CR (T1: Property1 = true; T2: Property1 = true); Market Structure Document (Ti: Property1 = false); Current situation: Market Structure (T1: Property1 = true), Gateway (T1: Property1 = NULL)]

Page 12: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Approach

Page 13: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Results: metrics
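The results in this talk are reported as F-measure. A minimal sketch of how it can be computed for one area follows, assuming the balanced variant (F1, the harmonic mean of precision and recall) and assuming a score of 0 when a denominator is empty. The labels below are made up.

```python
def precision_recall_f1(actual, predicted):
    """Precision, recall and F1 for boolean labels (True = in the area)."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)        # true positives
    fp = sum(1 for a, p in pairs if not a and p)    # false positives
    fn = sum(1 for a, p in pairs if a and not p)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


actual = [True, True, True, False, False, False]
predicted = [True, True, False, True, False, False]
print(precision_recall_f1(actual, predicted))

# A classifier that never predicts the positive class scores 0
# under the convention assumed here.
print(precision_recall_f1(actual, [False] * 6))
```

The second call shows one way an F-measure of exactly 0 can arise: no report is ever assigned to the area.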

Page 14: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Results: hold out

All values are F-measures. FS is the feature selection method: No (none), IG (information gain), Cons (consistency-based), Cfs (correlation-based), SSF (simplified silhouette filter). Classifiers: LogReg (logistic regression), SVM, J48 (decision tree), RandFor (random forest), Bnet (Bayes net), Bayes (Naive Bayes).

Classifier  FS    Area 1  Area 2  Area 3  Area 4  Area 5  Area 6  Area 7  Area 8
LogReg      No    0.745   0.404   0.758   0.905   0.8     0.892   0.964   0.877
SVM         No    0.741   0       0.389   0.852   0.389   0.723   0.914   0.864
J48         No    0.898   0.832   0.739   0.953   0.931   0.955   0.991   0.952
RandFor     No    0.771   0.628   0.667   0.928   0.867   0.874   0.935   0.968
Bnet        No    0.716   0.864   0.764   0.912   0.92    0.862   0.982   0.917
Bayes       No    0.68    0.628   0.647   0.847   0.779   0.777   0.956   0.867
LogReg      IG    0.907   0.811   0.764   0.883   0.88    0.922   0.894   0.916
SVM         IG    0.948   0.862   0.836   0.924   0.938   0.95    0.991   0.938
J48         IG    0.822   0.867   0.739   0.943   0.931   0.955   0.991   0.973
RandFor     IG    0.959   0.887   0.897   0.938   0.948   0.936   0.991   0.98
Bnet        IG    0.716   0.864   0.764   0.912   0.92    0.862   0.982   0.917
Bayes       IG    0.701   0.633   0.688   0.846   0.815   0.784   0.956   0.861
LogReg      Cons  0.909   0.86    0.915   0.952   0.938   0.964   0.991   0.973
SVM         Cons  0.95    0.87    0.885   0.953   0.938   0.964   0.991   0.976
J48         Cons  0.804   0.829   0.739   0.921   0.931   0.955   0.991   0.902
RandFor     Cons  0.939   0.877   0.9     0.95    0.945   0.964   0.991   0.991
Bnet        Cons  0.86    0.862   0.792   0.941   0.939   0.964   0.991   0.962
Bayes       Cons  0.816   0.752   0.733   0.892   0.935   0.955   0.991   0.929
LogReg      Cfs   0.88    0.811   0.83    0.921   0.93    0.915   0.991   0.912
SVM         Cfs   0.941   0.862   0.836   0.915   0.938   0.936   0.957   0.91
J48         Cfs   0.821   0.821   0.739   0.916   0.931   0.931   0.991   0.838
RandFor     Cfs   0.941   0.842   0.815   0.93    0.938   0.936   0.991   0.918
Bnet        Cfs   0.782   0.862   0.815   0.926   0.945   0.847   0.982   0.903
Bayes       Cfs   0.714   0.782   0.881   0.914   0.925   0.8     0.991   0.889
LogReg      SSF   0.923   0.87    0.836   0.916   0.938   0.955   0.991   0.962
SVM         SSF   0.923   0.87    0.836   0.916   0.938   0.955   0.991   0.962
J48         SSF   0.821   0.829   0.739   0.916   0.931   0.955   0.991   0.894
RandFor     SSF   0.923   0.87    0.836   0.916   0.938   0.955   0.991   0.962
Bnet        SSF   0.86    0.862   0.836   0.916   0.938   0.955   0.991   0.962
Bayes       SSF   0.923   0.87    0.836   0.916   0.938   0.955   0.991   0.928

In the original slide, the minimum F-measure values are shown in red and the maximum values in green.
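One simple way to compare rows of the hold-out table is to average the F-measure over the eight areas. The sketch below does this for three rows copied from the table. The averaging itself is an assumption made for illustration: the slides do not state how the best combinations were chosen.

```python
# Three rows copied from the hold-out table, keyed by (classifier, feature selection).
hold_out_rows = {
    ("SVM", "No"): [0.741, 0, 0.389, 0.852, 0.389, 0.723, 0.914, 0.864],
    ("SVM", "Cons"): [0.95, 0.87, 0.885, 0.953, 0.938, 0.964, 0.991, 0.976],
    ("RandFor", "IG"): [0.959, 0.887, 0.897, 0.938, 0.948, 0.936, 0.991, 0.98],
}


def mean_f_measure(values):
    """Unweighted mean of the per-area F-measures."""
    return sum(values) / len(values)


# Rank the selected rows from highest to lowest mean F-measure.
ranking = sorted(hold_out_rows,
                 key=lambda key: mean_f_measure(hold_out_rows[key]),
                 reverse=True)

for classifier, fs in ranking:
    print(classifier, fs, round(mean_f_measure(hold_out_rows[(classifier, fs)]), 3))
```

An unweighted mean treats every area as equally important; weighting by the number of reports per area would be another reasonable choice, but those counts are not given in the slides.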

Page 15: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Results: hold out

Page 16: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Results: cross-validation

All values are F-measures. FS is the feature selection method: No (none), IG (information gain), Cons (consistency-based), Cfs (correlation-based), SSF (simplified silhouette filter). Classifiers: LogReg (logistic regression), SVM, J48 (decision tree), RandFor (random forest), Bnet (Bayes net), Bayes (Naive Bayes).

Classifier  FS    Area 1  Area 2  Area 3  Area 4  Area 5  Area 6  Area 7  Area 8
LogReg      No    0.724   0.654   0.464   0.837   0.618   0.875   0.967   0.915
SVM         No    0.748   0.052   0.726   0.873   0.563   0.86    0.949   0.877
J48         No    0.925   0.821   0.743   0.925   0.927   0.963   0.991   0.957
RandFor     No    0.813   0.687   0.721   0.93    0.875   0.941   0.975   0.948
Bnet        No    0.717   0.856   0.691   0.913   0.89    0.911   0.982   0.911
Bayes       No    0.718   0.7     0.654   0.853   0.789   0.814   0.969   0.841
LogReg      IG    0.856   0.785   0.789   0.881   0.882   0.852   0.991   0.879
SVM         IG    0.948   0.854   0.825   0.933   0.954   0.971   0.991   0.943
J48         IG    0.931   0.868   0.752   0.947   0.944   0.969   0.991   0.957
RandFor     IG    0.954   0.859   0.918   0.939   0.943   0.964   0.985   0.974
Bnet        IG    0.717   0.856   0.691   0.913   0.818   0.911   0.982   0.911
Bayes       IG    0.718   0.776   0.631   0.849   0.89    0.827   0.973   0.844
LogReg      Cons  0.934   0.833   0.914   0.948   0.948   0.974   0.991   0.969
SVM         Cons  0.946   0.844   0.914   0.954   0.954   0.976   0.991   0.965
J48         Cons  0.931   0.809   0.789   0.923   0.934   0.968   0.991   0.952
RandFor     Cons  0.942   0.837   0.92    0.95    0.951   0.975   0.991   0.975
Bnet        Cons  0.818   0.855   0.757   0.946   0.93    0.975   0.991   0.964
Bayes       Cons  0.811   0.773   0.78    0.882   0.891   0.937   0.991   0.935
LogReg      Cfs   0.921   0.831   0.872   0.931   0.939   0.951   0.982   0.915
SVM         Cfs   0.941   0.844   0.841   0.937   0.952   0.962   0.982   0.92
J48         Cfs   0.933   0.791   0.748   0.917   0.933   0.963   0.991   0.905
RandFor     Cfs   0.929   0.858   0.88    0.938   0.949   0.958   0.988   0.922
Bnet        Cfs   0.797   0.856   0.815   0.931   0.93    0.935   0.988   0.903
Bayes       Cfs   0.739   0.78    0.865   0.909   0.912   0.879   0.988   0.849
LogReg      SSF   0.924   0.856   0.836   0.916   0.942   0.968   0.991   0.96
SVM         SSF   0.924   0.849   0.836   0.917   0.941   0.968   0.991   0.96
J48         SSF   0.927   0.794   0.748   0.917   0.933   0.968   0.991   0.942
RandFor     SSF   0.924   0.849   0.841   0.916   0.942   0.968   0.991   0.958
Bnet        SSF   0.866   0.856   0.823   0.915   0.942   0.968   0.991   0.958
Bayes       SSF   0.924   0.85    0.841   0.916   0.938   0.968   0.991   0.957

In the original slide, the minimum F-measure values are shown in red and the maximum values in green.

Page 17: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Results: cross-validation

Page 18: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Results: hold-out vs cross-validation
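The two ways of dividing the data set differ only in how the labelled reports are split into training and test parts. A minimal sketch of both is given below. The test fraction, the number of folds and the fixed seed are assumptions: the slides do not state the values that were used.

```python
import random


def hold_out_split(items, test_fraction=0.3, seed=0):
    """Shuffle once, then cut the data into one training and one test part."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def k_fold_splits(items, k=10, seed=0):
    """Yield k (train, test) pairs; every item is tested exactly once."""
    shuffled = items[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, folds[i]


# Integers stand in for defect report identifiers.
reports = list(range(20))

train, test = hold_out_split(reports, test_fraction=0.3)
print(len(train), len(test))

for fold_train, fold_test in k_fold_splits(reports, k=5):
    print(len(fold_train), len(fold_test))
```

With hold-out, a score depends on a single split. With cross-validation, the per-fold scores are usually averaged, which makes the estimate less sensitive to one particular split.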

Page 19: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Conclusions

1. Manual classification of 2,795 defect reports extracted from the bug tracking system according to the area of testing.

2. Building classifiers for each area using different machine learning and natural language processing techniques. Methods of feature selection: information gain, the consistency-based and correlation-based methods, and the simplified silhouette filter. Methods of classification: logistic regression, support vector machines, decision tree, random forest, Bayes net and Naive Bayes.

❖ Feature selection is an integral part of a successful classification process.

❖ The following combinations of classifiers and feature selection methods give the best results under both ways of dividing the data set (hold-out and cross-validation):

- random forest and information gain;

- random forest and the consistency-based method;

- support vector machines and information gain;

- support vector machines and the consistency-based method.

Page 20: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Future work

● Clustering of defect reports

● Prediction of the "which defects get reopened" metric

Page 21: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Thank you!

Page 22: TMPA-2017: Defect Report Classification in Accordance with Areas of Testing

Related work

● Antoniol G., Ayari K., Di Penta M., Khomh F., Guéhéneuc Y.-G.: Is it a bug or an enhancement?: A text-based approach to classify change requests. In Proc. 2008 Conf. Center for Adv. Studies Collaborative Res.: Meeting Minds, 2008, ser. CASCON 08, Article No. 23. New York, NY, USA: ACM, 304–318

● Xia X., Lo D., Qiu W., Wang B., Zhou B.: Automated Configuration Bug Report Prediction Using Text Mining. In 2014 IEEE 38th Annual Computer Software and Applications Conference, 2014, 107–116

● Gegick M., Rotella P., Xie T.: Identifying security bug reports via text mining: An industrial case study. In Proc. 7th IEEE Working Conf. Mining Software Repositories (MSR), May 2010, IEEE Computer Society, 11–20

● Zhou Y., Tong Y., Gu R., Gall H.C.: Combining Text Mining and Data Mining for Bug Report Classification. In Proc. of 30th International Conference on Software Maintenance and Evolution (ICSM/ICSME), IEEE, 2014, 311–320

● Somasundaram K., Murphy G.C.: Automatic categorization of bug reports using latent Dirichlet allocation. In Proc. of the 5th India Software Engineering Conference, ISEC'12, New York, 2012, ACM, 125–130

● Cubranic D., Murphy G.C.: Automatic bug triage using text categorization. In Proc. 16th Int. Conf. Software Eng. Knowledge Eng., KSI Press, 2004, 92–97

● Sureka A., Indukuri K.V.: Linguistic analysis of bug report titles with respect to the dimension of bug importance. In Proceedings of the Third Annual ACM Bangalore Conference, Article No. 9, ACM, 2010, 1–6