Studying Software Quality Using Topic Models
![Page 1: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/1.jpg)
Studying Software Quality Using Topic Models
Tse-Hsun (Peter) Chen
![Page 2: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/2.jpg)
Related Publications
Explaining Software Defects Using Topic Models, Tse-Hsun Chen, Stephen W. Thomas, Meiyappan Nagappan, Ahmed E. Hassan, 9th Working Conference on Mining Software Repositories (MSR). Zurich, Switzerland. June 2-3, 2012 (acceptance rate: 18/64 (28%))
Studying the Effect of Testing on Code Quality using Topic Models, Tse-Hsun Chen, Stephen W. Thomas, Hadi Hemmati, Meiyappan Nagappan, Ahmed E. Hassan, under review for the Journal of Empirical Software Engineering. Springer Press (Impact Factor 1.854).
An Empirical Study of Concerns and Their Ability to Explain Defects in Large Software Systems, Tse-Hsun Chen, Stephen W. Thomas, Meiyappan Nagappan, Ahmed E. Hassan, to be submitted for IEEE Transactions on Software Engineering (Impact Factor 1.98).
![Page 3: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/3.jpg)
Thesis Statement
Topics, which are approximations of software concerns, can be used to study software quality by better explaining the quality of code and helping allocate software quality assurance efforts effectively.
![Page 4: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/4.jpg)
int readFile(String filePath){
    // reading file
    fp = readFile(filePath)
    if fp == NULL
        return -1
    else
        return fp
}
int manageMemory(int index){
    if mem[index] is not NULL{
        // find free memory
        freeInd = findFreeMemoryLoc()
        goto(freeInd)
    }
}
More Risky Concern
Can we use concerns to study software quality?
![Page 5: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/5.jpg)
Capturing Concerns Using Topic Models
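Preprocessed manageMemory: manage memory index mem free ind find free memory loc
Preprocessed readFile: read file file path fp file path fp
Topic Models (LDA)
Topic 1: read, file, path, fp
Topic 2: manage, memory, mem, free
Topic 3: index, ind, find, loc
Topic membership (Topic 1 / Topic 2 / Topic 3):
readFile: 60 % / 0 % / 40 %
manageMemory: 0 % / 55 % / 45 %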
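Before a topic model such as LDA can be applied, source code has to be reduced to bags of words like the two preprocessed lines on this slide. Below is a minimal Python sketch of that step. It assumes identifiers are split on camelCase boundaries and that a small stop list drops language keywords. The `STOP_WORDS` set and both function names are illustrative and do not come from the slides, so the output may differ from the slide's word lists in detail.

```python
import re

# Hypothetical stop list: keywords and type names dropped before modeling.
STOP_WORDS = {"int", "string", "if", "else", "return", "null", "is", "not", "goto"}


def split_identifier(identifier):
    """Split a camelCase or snake_case identifier into lowercase words."""
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", identifier)
    return [part.lower() for part in parts]


def preprocess(source):
    """Turn source text into a list of lowercase words for a topic model."""
    words = []
    for identifier in re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source):
        for word in split_identifier(identifier):
            if word not in STOP_WORDS:
                words.append(word)
    return words


tokens = preprocess("freeInd = findFreeMemoryLoc()")
```

The resulting word lists, one per file or method, are what a topic model would consume as documents.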
![Page 6: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/6.jpg)
Studying code quality using topics
Studying code coverage using topics
Code
Things to test
![Page 8: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/8.jpg)
Studying Code Quality Using Topics
How defect-prone are topics?
Can topics help explain software defects?
![Page 9: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/9.jpg)
Are Topics Equally Defect-prone?
If they are, then we CANNOT use topics to study code quality
[MSR 2012]
![Page 10: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/10.jpg)
Measuring Topic Defect-proneness
[Figure: files F1, F2, F3 linked to topics T1, T2, T3, T4]
[MSR 2012]
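One way to read the file-to-topic figure is that each file spreads its defects over the topics it contains. The sketch below assumes that a topic's defect density is the membership-weighted sum of the defect densities of its files. That weighting, the function name, and the sample numbers are assumptions for illustration, not values or definitions taken from [MSR 2012].

```python
def topic_defect_density(memberships, defects, loc):
    """Return one defect density per topic.

    memberships[f] is the list of topic shares of file f,
    defects[f] is its defect count, and loc[f] its lines of code.
    """
    n_topics = len(next(iter(memberships.values())))
    density = [0.0] * n_topics
    for name, shares in memberships.items():
        file_density = defects[name] / loc[name]
        for topic, share in enumerate(shares):
            # A file contributes to a topic in proportion to its membership.
            density[topic] += share * file_density
    return density


# Hypothetical data: three files, four topics.
memberships = {
    "F1": [0.5, 0.5, 0.0, 0.0],
    "F2": [0.0, 0.5, 0.5, 0.0],
    "F3": [0.0, 0.0, 0.0, 1.0],
}
defects = {"F1": 4, "F2": 0, "F3": 1}
loc = {"F1": 100, "F2": 200, "F3": 50}
densities = topic_defect_density(memberships, defects, loc)
```

Under this assumption, topics that appear mostly in defect-free files end up with a density near zero, which is what makes a comparison across topics possible.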
![Page 12: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/12.jpg)
Few Topics are Defect-prone
[Figure: Topic Defect Density per topic; labeled topics: "Jface, Comparison check" and "Task, Eclipse, Task ui, Repository"]
[MSR 2012]
![Page 13: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/13.jpg)
Explaining Defects
Static: Lines of Code
Historical: Pre-release Defects, Code Churn
Topics: Topic Metrics
[MSR 2012]
![Page 14: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/14.jpg)
Explainability of Metrics
Static metrics: Deviance Explained (D1)
Static + Topics metrics: Deviance Explained (D2)
Improvement in Explainability = D2 - D1
[MSR 2012]
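The slide compares two models by the deviance each one explains. As a rough sketch, assuming a binary outcome (file defective or not) and models that output probabilities, deviance explained can be computed as one minus the ratio of the model's deviance to that of an intercept-only model. The slide does not state the model family, so the binomial deviance, the function names, and the sample probabilities here are all assumptions.

```python
import math


def binomial_deviance(labels, probs):
    """Deviance of predicted probabilities against 0/1 labels."""
    return -2.0 * sum(
        math.log(p) if y == 1 else math.log(1.0 - p)
        for y, p in zip(labels, probs)
    )


def deviance_explained(labels, probs):
    """Share of the null deviance removed by the model (0 = no better than the base rate)."""
    base_rate = sum(labels) / len(labels)
    null_deviance = binomial_deviance(labels, [base_rate] * len(labels))
    return 1.0 - binomial_deviance(labels, probs) / null_deviance


# Hypothetical predictions for four files (1 = defective).
labels = [1, 0, 1, 0]
static_probs = [0.5, 0.5, 0.5, 0.5]          # base model: static metrics only
static_topic_probs = [0.8, 0.2, 0.8, 0.2]    # base model plus topic metrics

d1 = deviance_explained(labels, static_probs)
d2 = deviance_explained(labels, static_topic_probs)
improvement = d2 - d1
```

In practice the probabilities would come from fitted regression models; the sketch only shows how D1, D2, and their difference relate.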
![Page 15: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/15.jpg)
Using Topics to Explain Defects
Number of Topics
[Figure: files F1, F2, F3 linked to topics T1, T2, T3, T4]
[MSR 2012]
![Page 17: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/17.jpg)
Using Topics to Explain Defects
Number of Defect-prone Topics
[Figure: files F1, F2, F3 linked to topics T1, T2, T3, T4]
[MSR 2012]
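Both topic metrics can be read straight off a file's topic memberships. The sketch below assumes a topic counts as present in a file when its share reaches a small cutoff. The 1 % default, the function names, and the sample file are illustrative assumptions rather than settings confirmed by the slides.

```python
def number_of_topics(shares, threshold=0.01):
    """Count topics whose share in the file is at least `threshold`."""
    return sum(1 for share in shares if share >= threshold)


def number_of_defect_prone_topics(shares, defect_prone, threshold=0.01):
    """Count topics present in the file that are also in `defect_prone`."""
    return sum(
        1
        for topic, share in enumerate(shares)
        if share >= threshold and topic in defect_prone
    )


# Hypothetical file with four topics; topics 1 and 3 are treated as defect-prone.
shares = [0.60, 0.30, 0.00, 0.10]
nt = number_of_topics(shares)
ndt = number_of_defect_prone_topics(shares, defect_prone={1, 3})
```

Each file then contributes one value per metric, which can be added to a base set of metrics when modeling defects.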
![Page 18: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/18.jpg)
More Topics, More Defects in File
[Figure: two bar charts of Avg. % Improvement in D2, one for Number of Topics and one for Number of Defect-prone Topics; bar labels as extracted: 30 %, 48 %; 21 %, 21 %, 49 %, 0 %, 7 %, 6 %]
[MSR 2012]
![Page 19: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/19.jpg)
Compare with Other Cohesion/Coupling Metrics
Which is better: # of topics or other topic-based metrics?
# of topics?
[TSE 201X]
![Page 20: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/20.jpg)
# Topics Outperforms Others
[Figure: bar chart of Avg. % Improvement in D2 over base, comparing # topics (our metric) with state-of-the-art metrics; bar labels as extracted: 39 %, 3 %, 3 %, 20 %]
[TSE 201X]
![Page 21: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/21.jpg)
Studying code quality using topics
Studying code coverage using topics
Code
Things to test
![Page 23: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/23.jpg)
We found that only a few topics are defect-prone…
Can we allocate MORE testing resources to less-tested but defect-prone topics?
![Page 24: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/24.jpg)
Studying Code Coverage Using Topics
What is the relationship between code coverage and quality?
Can we predict topics that are low in unit testing and high in defect-proneness?
![Page 25: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/25.jpg)
Measuring Topic Testedness
[Figure: file F1 linked to topics T1 and T2]
[EMSE 201X]
[EMSE 201X]
Topic Testedness: how much a topic is tested
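The slide defines topic testedness only informally. One possible way to quantify it, offered here purely as an assumption, is a membership-weighted average of per-file test coverage. This may differ from the definition used in [EMSE 201X]; the function name, the coverage ratios, and the memberships are hypothetical.

```python
def topic_testedness(memberships, coverage):
    """Membership-weighted average coverage per topic (assumed definition).

    memberships[f] is the list of topic shares of file f,
    coverage[f] is the fraction of file f exercised by unit tests.
    """
    n_topics = len(next(iter(memberships.values())))
    weighted = [0.0] * n_topics
    total = [0.0] * n_topics
    for name, shares in memberships.items():
        for topic, share in enumerate(shares):
            weighted[topic] += share * coverage[name]
            total[topic] += share
    # Topics that appear in no file get a testedness of 0.0.
    return [w / t if t > 0 else 0.0 for w, t in zip(weighted, total)]


# Hypothetical data: two files, two topics.
memberships = {"F1": [0.5, 0.5], "F2": [0.0, 1.0]}
coverage = {"F1": 0.8, "F2": 0.2}
testedness = topic_testedness(memberships, coverage)
```

With a per-topic testedness and a per-topic defect density in hand, topics can be compared on both axes at once.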
![Page 26: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/26.jpg)
More Unit Tested, Less Defect Prone
[EMSE 201X]
![Page 27: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/27.jpg)
Predict LTHD Topics Accurately
[Figure: bar chart of Avg. F-Measure; bar labels as extracted: 0.8, 0.76, 0.68]
[EMSE 201X]
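The F-measure reported on this slide combines precision and recall into a single score. A minimal sketch of the balanced form (the harmonic mean of the two) follows. Whether [EMSE 201X] uses exactly this balanced variant is an assumption, and the topic labels in the example are made up.

```python
def f_measure(predicted, actual):
    """Balanced F-measure of a predicted set against the true set."""
    true_positives = len(predicted & actual)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(actual)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: topics predicted as LTHD versus topics that really are.
predicted = {"T1", "T3", "T4"}
actual = {"T1", "T4", "T7", "T9"}
score = f_measure(predicted, actual)
```

A score close to 1 means the predicted set both contains few false alarms and misses few of the true topics.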
![Page 28: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/28.jpg)
Can We Improve Existing Approaches?
Testers usually test at the concern level… but existing approaches do not support this
Can we HELP existing test allocation approaches?
[EMSE 201X]
![Page 29: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/29.jpg)
Low Overlap With Existing Approach – Prediction Model
Our Approach: top N buggy files that may need more testing
Prediction-based Approach: top N buggy files found
On average, only 5.3% of the files overlap
[EMSE 201X]
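The overlap figure compares two top-N file lists. The sketch below shows one way such an overlap could be computed, assuming each approach assigns every file a score and both lists have the same length N. The file names, scores, and function names are hypothetical.

```python
def top_n(scores, n):
    """Return the n highest-scoring file names (ties broken by name)."""
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:n]


def overlap_percent(ours, theirs):
    """Percentage of files shared by two top-N lists of equal length."""
    return 100.0 * len(set(ours) & set(theirs)) / len(ours)


# Hypothetical scores from the two approaches for the same four files.
topic_scores = {"A.java": 0.9, "B.java": 0.7, "C.java": 0.4, "D.java": 0.1}
prediction_scores = {"A.java": 0.95, "B.java": 0.1, "C.java": 0.8, "D.java": 0.3}

ours = top_n(topic_scores, 2)
theirs = top_n(prediction_scores, 2)
overlap = overlap_percent(ours, theirs)
```

A low overlap means the two approaches point testers at largely different files, so one can complement the other.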
![Page 30: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/30.jpg)
File Defect Density
File Defect Density = Number of Bugs / Lines of Code
A measure for estimating the effort needed to find bugs
![Page 31: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/31.jpg)
Files We Found Have Higher Defect Density
[Figure: bar chart of Avg. % Defect Density Improvements; bar labels as extracted: 64 %, 242 %, 30 %]
[EMSE 201X]
![Page 32: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/32.jpg)
Studying code quality using topics
Studying code coverage using topics
Code
Things to test
![Page 33: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/33.jpg)
Thesis Statement
Topics, which are approximations of software concerns, can be used to study software quality by better explaining the quality of code and helping allocate software quality assurance efforts effectively.
![Page 34: Studying Software Quality Using Topic Models](https://reader035.fdocuments.us/reader035/viewer/2022081604/5878d7ba1a28ab917a8b66d1/html5/thumbnails/34.jpg)
Code
Study Code Quality using Topics
Relationship between defects and topics
Use topics to explain defects
Study Code Coverage using Topics
Relationship between topic testedness and defects
Predict low unit tested and defect-prone topics
Things to test