Comment Analysis approach for Program Comprehension (Software Engineering Workshop - Crete)

How comments can be used to locate problem domain concepts and then identify the related source code.

Page 1: Comment Analysis approach for Program Comprehension (Software Engineering Workshop - Crete)

A Comment Analysis Approach for Program Comprehension

José L. Freitas, Daniela da Cruz, Pedro R. Henriques

Universidade do Minho, Portugal

Software Engineering Workshop, Crete, Oct. 12-13, 2012

Freitas, Cruz, Henriques Comment Analysis for Program Comprehension

Context

Program Comprehension (PC) is a vital task of Software Maintenance.

In Software Maintenance, 50% of the time is spent on comprehending the system.

Several source code analysis approaches have been applied to develop PC tools: program slicing, control-flow analysis, data-flow analysis, etc.

Motivation

Most PC tools are based on the extraction of structural information.
Example: function Y is used by function X n times.

However, they do not extract the meaning of a program or the Problem Domain concepts related to it.
Example: function Y calculates the amount of credit of a banking account.

Motivation

Comments, alongside identifiers, can be the biggest source of semantic information in code.

    /* This function receives the id number of a banking account
       and returns the available amount of credit */
    int credit(int id) { ... }

Why not use comments to search for the Problem Domain concepts needed to understand a program?

Bad and Good Comments

When is a comment bad or good?

Apart from the existing controversy around this subject, a comment starts being bad when it is inconsistent with the code it describes, because it misleads whoever reads it.

Brooks states that comments help comprehension if they provide Problem and Program Domain information and the means to establish bridges between those two domains.

Goal

Create a Program Comprehension tool that explores comments to search for Problem Domain concepts: Darius.¹

¹ Named after King Darius I of Persia, the first known man to build a bridge between Europe and Asia, across the Bosphorus strait.

Outline

1 Darius - Comment Evaluator
   Preliminary study

2 Darius - Concept Locator
   Experiment

3 Conclusion

Darius - Comment Evaluator

The first version of Darius analyzes:

Comment Quantity: number of comments, percentage of comments, etc.

Comment Content: use of Problem Domain and Program Domain terms.

Darius - Preliminary study

Comment Evaluator Modules

Darius GUI (1)

Darius - Comment Extractor module

Comment Extractor

Darius extracts three types of comments:

1 Inline Comments, IC for short: // ...

2 Block Comments, BC for short: /* ... */

3 JavaDoc Comments, JD for short: /** ... */
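A minimal sketch of how such an extractor could classify the three comment types, assuming plain Java source text as input. The class `CommentExtractor`, the `Kind` enum and the `Comment` record are hypothetical names for illustration, not Darius's actual code. The scanner skips string and character literals so that `//` inside a string is not mistaken for a comment, but it does not handle Java text blocks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not Darius's code): extracts comments from Java source
// and classifies them as inline (IC), block (BC) or JavaDoc (JD).
class CommentExtractor {

    enum Kind { IC, BC, JD }

    // One extracted comment with the 1-based line on which it starts.
    record Comment(Kind kind, String text, int startLine) {}

    static List<Comment> extract(String src) {
        List<Comment> out = new ArrayList<>();
        int i = 0, line = 1, n = src.length();
        while (i < n) {
            char c = src.charAt(i);
            if (c == '"' || c == '\'') {
                // Skip string/char literals so "//" inside them is not taken as a comment.
                // Assumption: the source contains no text blocks.
                i++;
                while (i < n && src.charAt(i) != c) {
                    if (src.charAt(i) == '\\') i++; // skip the escaped character
                    i++;
                }
                i++; // step past the closing quote
            } else if (src.startsWith("//", i)) {
                int end = src.indexOf('\n', i);
                if (end < 0) end = n;
                out.add(new Comment(Kind.IC, src.substring(i, end), line));
                i = end; // the newline itself is counted by the default branch
            } else if (src.startsWith("/*", i)) {
                int close = src.indexOf("*/", i + 2);
                int end = (close < 0) ? n : close + 2;
                String text = src.substring(i, end);
                // A comment opening with slash-star-star is JavaDoc, except the empty one.
                Kind kind = (text.startsWith("/**") && !text.equals("/**/")) ? Kind.JD : Kind.BC;
                out.add(new Comment(kind, text, line));
                line += (int) text.chars().filter(ch -> ch == '\n').count();
                i = end;
            } else {
                if (c == '\n') line++;
                i++;
            }
        }
        return out;
    }
}
```

Recording the start line is what later makes it possible to look at the line that follows each comment.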

Darius - Comment Extractor module

To discover which type of source code entity a comment is associated with, Darius also extracts the line immediately following the comment. Darius associates comments with:

1 classes

2 interfaces

3 methods

4 conditionals (if)

5 loops (while and for)

6 switches
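The association step can be sketched as a classifier applied to the line that follows each comment. This is an assumption-laden sketch: the regular expressions are hypothetical heuristics written for illustration (they ignore, for example, annotations on their own line and signatures split across lines), and the names `EntityClassifier` and `Entity` are not taken from Darius.

```java
import java.util.regex.Pattern;

// Hypothetical heuristics (not Darius's code): guess which kind of source code
// entity a comment documents by looking at the line that follows it.
class EntityClassifier {

    enum Entity { IF, FOR, WHILE, SWITCH, INTERFACE, CLASS, METHOD, OTHER }

    // Control statements, optionally preceded by "}" or "else" on the same line.
    private static final Pattern IF = Pattern.compile("(?:\\}\\s*)?(?:else\\s+)?if\\s*\\(.*");
    private static final Pattern FOR = Pattern.compile("for\\s*\\(.*");
    private static final Pattern WHILE = Pattern.compile("(?:\\}\\s*)?while\\s*\\(.*");
    private static final Pattern SWITCH = Pattern.compile("switch\\s*\\(.*");
    // Type declarations: optional modifiers followed by the keyword and a name.
    private static final Pattern INTERFACE = Pattern.compile("(?:\\w+\\s+)*interface\\s+\\w+.*");
    private static final Pattern CLASS = Pattern.compile("(?:\\w+\\s+)*class\\s+\\w+.*");
    // Method header: modifiers/return type, name, parameter list, optional throws
    // clause and "{", with no trailing ";" (a trailing ";" would indicate a call).
    private static final Pattern METHOD = Pattern.compile(
        "(?:[\\w<>\\[\\],.?]+\\s+)+\\w+\\s*\\([^;]*\\)\\s*(?:throws\\s+[\\w.,\\s]+)?\\{?\\s*");

    static Entity classify(String nextLine) {
        String s = nextLine.trim();
        // Control statements are tested first so "if (...)" is never read as a method.
        if (IF.matcher(s).matches()) return Entity.IF;
        if (FOR.matcher(s).matches()) return Entity.FOR;
        if (WHILE.matcher(s).matches()) return Entity.WHILE;
        if (SWITCH.matcher(s).matches()) return Entity.SWITCH;
        if (INTERFACE.matcher(s).matches()) return Entity.INTERFACE;
        if (CLASS.matcher(s).matches()) return Entity.CLASS;
        if (METHOD.matcher(s).matches()) return Entity.METHOD;
        return Entity.OTHER;
    }
}
```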

Darius - Statistics Calculator module

Statistics Calculator module

Number of comments of a project (global, per type of comment and per line of source code);

Average number of comment lines per line of code;

Average number of lines of a non-inline comment;

Average number of each type of source code entity that is commented;

Most used type of comment (global and per source code entity).
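Assuming each extracted comment has already been reduced to its type, its length in lines and the kind of entity it precedes, the statistics listed above become simple aggregations. The `CommentStats` class and its `Comment` record are hypothetical; how Darius actually counts lines of code is not stated in the slides, so `linesOfCode` is taken as an input.

```java
import java.util.EnumMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical sketch of the statistics step (not Darius's code).
class CommentStats {

    enum Kind { IC, BC, JD }

    // One comment: its type, how many lines it spans, and the entity it precedes.
    record Comment(Kind kind, int lines, String entity) {}

    // Number of comments per type of comment.
    static Map<Kind, Integer> countByKind(List<Comment> cs) {
        Map<Kind, Integer> counts = new EnumMap<>(Kind.class);
        for (Comment c : cs) counts.merge(c.kind(), 1, Integer::sum);
        return counts;
    }

    // Comment lines per line of code (linesOfCode is supplied by the caller).
    static double commentLinesPerLoc(List<Comment> cs, int linesOfCode) {
        int commentLines = cs.stream().mapToInt(Comment::lines).sum();
        return (double) commentLines / linesOfCode;
    }

    // Average length, in lines, of block and JavaDoc (non-inline) comments.
    static double avgNonInlineLength(List<Comment> cs) {
        return cs.stream().filter(c -> c.kind() != Kind.IC)
                 .mapToInt(Comment::lines).average().orElse(0.0);
    }

    // Most used comment type for one kind of entity; empty when no such entity is commented.
    static Optional<Kind> mostUsedFor(List<Comment> cs, String entity) {
        Map<Kind, Integer> counts = new EnumMap<>(Kind.class);
        for (Comment c : cs) {
            if (c.entity().equals(entity)) counts.merge(c.kind(), 1, Integer::sum);
        }
        return counts.entrySet().stream()
                     .max(Map.Entry.comparingByValue())
                     .map(Map.Entry::getKey);
    }
}
```

Returning an empty `Optional` for entities that are never commented is one way to produce the "NA" cells that appear in the per-entity results.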

Darius - Words Analyzer module

Words Analyzer module

Given a list of words extracted from the ontology of the Problem Domain, Darius computes:

Percentage and frequency of words in the list found in comments;

Frequency of each type of comment that contains words from the list;

Frequency of each type of commented source code entity that contains words from the list.
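A minimal sketch of the first two computations, assuming comments are split into lower-case alphabetic tokens and compared with a list of single-word domain terms by exact match (no stemming, so "chapters" does not match "chapter"). `WordsAnalyzer` and its methods are hypothetical names, not Darius's API.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the words analysis (not Darius's code).
class WordsAnalyzer {

    enum Kind { IC, BC, JD }

    record Comment(Kind kind, String text) {}

    // Lower-case alphabetic tokens; comment markers and punctuation are dropped.
    static Set<String> tokens(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // Percentage of the domain words that occur in at least one comment.
    static double percentFound(List<String> domainWords, List<Comment> comments) {
        Set<String> seen = new HashSet<>();
        for (Comment c : comments) seen.addAll(tokens(c.text()));
        long found = domainWords.stream()
                                .map(w -> w.toLowerCase(Locale.ROOT))
                                .filter(seen::contains)
                                .count();
        return 100.0 * found / domainWords.size();
    }

    // Number of comments of each type that contain at least one domain word.
    static Map<Kind, Integer> commentsWithDomainWords(List<String> domainWords, List<Comment> comments) {
        Set<String> dictionary = new HashSet<>();
        for (String w : domainWords) dictionary.add(w.toLowerCase(Locale.ROOT));
        Map<Kind, Integer> counts = new EnumMap<>(Kind.class);
        for (Comment c : comments) {
            if (!Collections.disjoint(tokens(c.text()), dictionary)) counts.merge(c.kind(), 1, Integer::sum);
        }
        return counts;
    }
}
```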

Darius - Words Analyzer module

Darius - Preliminary study

To perform a preliminary study, 10 open-source software projects written in Java were selected. Open-source projects were chosen for two reasons:

1 The source code is freely available;

2 Open-source software projects are widely used by the community, which changes and manipulates the source code over and over again.

These kinds of projects tend to be constantly updated and therefore involve comprehension tasks. Commenting can be a proper way of helping with these tasks.

Darius - Preliminary study

Project      | Description                | Files | LoC     | Classes
iText        | PDF Library                | 480   | 145666  | 403
ganttproject | Project Management Library | 530   | 68945   | 394
gwt-dev      | Google's Web Toolkit       | 987   | 192738  | 803
jEdit        | Text Editor                | 531   | 176006  | 404
vuze         | Peer-to-peer client        | 3284  | 785935  | 2463
junit        | Tests Framework            | 154   | 10926   | 130
jfreechart   | Chart Library              | 989   | 313231  | 876
antlr        | Grammar Framework          | 221   | 85867   | 212
jexcelapi    | Excel Library              | 438   | 93876   | 166
robocode     | Programming Game of Robots | 571   | 81519   | 485
Total        |                            | 8185  | 1954709 | 6336

Table: Description and size of each selected project (LoC: lines of code)

Darius - Preliminary study results

Comment Quantity: 6/10 test programs ≥ 19% comments

Darius - Preliminary study results

Type of Comments

Project      | #CM    | CM/LOSC | #IC   | #BC   | #JD
iText        | 13343  | 0.24    | 4930  | 3777  | 4636
ganttproject | 4468   | 0.11    | 2925  | 814   | 729
gwt-dev      | 12969  | 0.16    | 7219  | 866   | 4884
jEdit        | 18986  | 0.21    | 806   | 14421 | 3759
vuze         | 27723  | 0.08    | 18245 | 2319  | 7159
junit        | 519    | 0.21    | 2     | 77    | 440
jfreechart   | 22516  | 0.27    | 6592  | 2530  | 13394
antlr        | 5292   | 0.14    | 3903  | 1380  | 9
jexcelapi    | 8354   | 0.26    | 2213  | 775   | 5366
robocode     | 5071   | 0.19    | 3108  | 102   | 1861
Total        | 119241 | 0.16    | 63633 | 13371 | 42237

Table: Comment frequency in the projects (CM: comments; LOSC: lines of source code; IC: inline, BC: block, JD: JavaDoc comments)

Darius - Preliminary study results

Project      | If | For | While | Switch | Class | Interface | Method
iText        | 5  | 7   | 7     | 7      | 89    | 90        | 76
ganttproject | 5  | 5   | 3     | 8      | 57    | 41        | 18
gwt-dev      | 9  | 10  | 7     | 5      | 96    | 97        | 19
jEdit        | 9  | 8   | 4     | 2      | 86    | 79        | 61
vuze         | 6  | 6   | 5     | 7      | 45    | 46        | 24
junit        | 1  | 0   | 0     | 0      | 25    | 71        | 37
jfreechart   | 6  | 10  | 2     | 18     | 100   | 100       | 100
antlr        | 11 | 16  | 5     | 4      | 61    | 56        | 22
jexcelapi    | 14 | 18  | 12    | 0      | 99    | 100       | 88
robocode     | 7  | 11  | 12    | 3      | 76    | 94        | 20
Total        | 7  | 8   | 6     | 5      | 69    | 60        | 45

Table: Percentage (%) of source code entities commented

Darius - Preliminary study results

Project      | If | For | While | Switch | Class | Interface | Method
iText        | IC | IC  | IC    | IC     | JD    | JD        | JD
ganttproject | IC | IC  | IC    | IC     | JD    | JD        | JD
gwt-dev      | IC | IC  | IC    | IC     | JD    | JD        | JD
jEdit        | IC | IC  | IC    | IC     | JD    | JD        | JD
vuze         | IC | IC  | IC    | IC     | JD    | JD        | JD
junit        | IC | NA  | NA    | NA     | JD    | JD        | JD
jfreechart   | IC | IC  | IC    | IC     | JD    | JD        | JD
antlr        | IC | IC  | IC    | IC     | BC    | BC        | BC
jexcelapi    | IC | IC  | BC    | NA     | JD    | JD        | JD
robocode     | IC | IC  | IC    | IC     | JD    | JD        | JD
Total        | IC | IC  | IC    | IC     | JD    | JD        | JD

Table: Most used type of comment per type of source code entity (NA: no commented entity of that type)

Darius - Preliminary study results

Comment Content: 10/10 test programs ≥ 23% Problem and Program Domain terms

Darius - Preliminary study results

Goal: explore the content of comments by checking whether they contain Problem and Program Domain information. The information needed to run these tests is:

a list of problem domain terms for each of the software projects;

a (single) list of program domain terms.

Darius - Preliminary study results

Project      | Problem Domain | Program Domain
iText        | 92.31          | 86.76
ganttproject | 84.31          | 75.0
gwt-dev      | 56.34          | 86.76
jEdit        | 89.74          | 86.76
vuze         | 92.11          | 88.24
junit        | 81.82          | 67.65
jfreechart   | 86.36          | 89.71
antlr        | 88.24          | 83.82
jexcelapi    | 79.31          | 85.29
robocode     | 88.89          | 83.82
Total        | 82.21          | 83.38

Table: Percentage (%) of domain words found

Darius - Preliminary study results

Project      | IC Prob. | IC Prog. | BC Prob. | BC Prog. | JD Prob. | JD Prog.
iText        | 13.05    | 13.99    | 6.89     | 14.1     | 14.7     | 17.36
ganttproject | 13.76    | 13.52    | 14.78    | 11.58    | 14.67    | 14.2
gwt-dev      | 0.96     | 19.7     | 2.03     | 18.31    | 2.62     | 22.16
jEdit        | 5.1      | 17.15    | 6.44     | 24.76    | 9.28     | 16.69
vuze         | 4.6      | 18.02    | 5.14     | 11.38    | 4.29     | 18.89
junit        | 0        | 20.0     | 17.14    | 16.57    | 22.66    | 25.77
jfreechart   | 20.7     | 20.73    | 16.74    | 12.45    | 15.58    | 21.41
antlr        | 13.85    | 13.81    | 13.95    | 10.7     | 2.13     | 11.35
jexcelapi    | 10.38    | 16.16    | 17.08    | 12.97    | 24.97    | 17.01
robocode     | 17.0     | 14.06    | 16.52    | 12.6     | 25.13    | 12.5
Total        | 9.97     | 16.27    | 8.33     | 14.58    | 13.13    | 19.13

Table: Frequency (%) of words of each domain per type of comment (Prob.: Problem Domain; Prog.: Program Domain)

Darius - Preliminary study conclusion

Higher-level source code entities tend to have comments oriented toward Problem Domain information, while comments on lower-level entities tend to include more Program Domain information.

Darius - Problem Concept Location

Goal: search for Problem Domain concepts in order to find how these concepts map onto the source code.
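To make the goal concrete, one naive way to map a query onto source code is to score each comment by how many query terms it contains and report the source line that follows it (the entity the comment was associated with during extraction). This is a hypothetical baseline for illustration only: the stop-word list, the `ConceptLocator` name and the term-overlap scoring are assumptions, while the techniques the slides name for Darius are LSA and VSM.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical baseline (not Darius's method): rank commented entities by how
// many query terms their comment contains.
class ConceptLocator {

    // A comment paired with the source line that follows it.
    record Entry(String comment, String codeLine) {}

    private record Hit(Entry entry, int score) {}

    // Assumed, deliberately tiny stop-word list.
    private static final Set<String> STOP =
        Set.of("a", "an", "the", "is", "how", "to", "of", "and", "into");

    static Set<String> terms(String text) {
        Set<String> out = new HashSet<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty() && !STOP.contains(t)) out.add(t);
        }
        return out;
    }

    // Code lines whose comments share at least one term with the query, best match first.
    static List<String> locate(String query, List<Entry> entries) {
        Set<String> q = terms(query);
        List<Hit> hits = new ArrayList<>();
        for (Entry e : entries) {
            Set<String> shared = terms(e.comment());
            shared.retainAll(q);
            if (!shared.isEmpty()) hits.add(new Hit(e, shared.size()));
        }
        // List.sort is stable, so equally scored entries keep their input order.
        hits.sort(Comparator.comparingInt(Hit::score).reversed());
        List<String> result = new ArrayList<>();
        for (Hit h : hits) result.add(h.entry().codeLine());
        return result;
    }
}
```

Exact term overlap misses synonyms and inflections ("created" vs "creates"), which is the kind of gap that LSA and weighted VSM retrieval aim to narrow.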

Darius GUI (2)

Darius - Employed Techniques

Latent Semantic Analysis (LSA)
A natural language processing technique for analyzing the relationships between a set of documents and the terms they contain, by producing a set of concepts related to the documents and terms. LSA assumes that words that are close in meaning will occur close together in text. It constructs a matrix containing word counts per paragraph (rows represent unique words and columns represent each paragraph) and then reduces this matrix with singular value decomposition (SVD).
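The first step described above, building the term-by-paragraph count matrix, can be sketched as follows. This is an illustrative assumption, not Darius's code: tokens are lower-case alphabetic runs, rows are sorted alphabetically, and the SVD step that full LSA applies next would be delegated to a linear algebra library and is not shown.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeSet;

// Hypothetical sketch of LSA's first step: the term-by-paragraph count matrix.
// Full LSA would then apply a truncated SVD to `counts` (not shown here).
class TermParagraphMatrix {

    final List<String> terms; // row labels: unique words, sorted alphabetically
    final int[][] counts;     // counts[row][col] = occurrences of terms.get(row) in paragraph col

    TermParagraphMatrix(List<String> paragraphs) {
        List<List<String>> tokenized = new ArrayList<>();
        TreeSet<String> vocabulary = new TreeSet<>();
        for (String p : paragraphs) {
            List<String> tokens = new ArrayList<>();
            for (String t : p.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
                if (!t.isEmpty()) tokens.add(t);
            }
            tokenized.add(tokens);
            vocabulary.addAll(tokens);
        }
        terms = new ArrayList<>(vocabulary);
        Map<String, Integer> row = new HashMap<>();
        for (int i = 0; i < terms.size(); i++) row.put(terms.get(i), i);

        // One row per unique word, one column per paragraph.
        counts = new int[terms.size()][paragraphs.size()];
        for (int col = 0; col < tokenized.size(); col++) {
            for (String t : tokenized.get(col)) counts[row.get(t)][col]++;
        }
    }
}
```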

Darius - Employed Techniques

Vector Space Model (VSM)
An algebraic model for representing text documents as vectors. Each dimension corresponds to a separate term; if a term occurs in the document, its value in the vector is non-zero. A weight is used to evaluate how important a word is to a document in a collection: the importance increases proportionally to the number of times the word appears in the document (in the common tf-idf scheme, it is also offset by how many documents of the collection contain the word).
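A small sketch of VSM retrieval with tf-idf weights, tf(t) · log(N / df(t)), and cosine similarity, assuming each comment is treated as one document. The weighting scheme and the `VectorSpaceModel` name are assumptions for illustration; the slides do not state which weighting Darius uses.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical sketch: VSM retrieval with tf-idf weights and cosine similarity.
class VectorSpaceModel {

    static List<String> tokens(String text) {
        List<String> out = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^a-z]+")) {
            if (!t.isEmpty()) out.add(t);
        }
        return out;
    }

    // idf(t) = log(N / df(t)), where df(t) is the number of documents containing t.
    static Map<String, Double> idf(List<String> docs) {
        Map<String, Integer> df = new HashMap<>();
        for (String d : docs) {
            for (String t : new HashSet<>(tokens(d))) df.merge(t, 1, Integer::sum);
        }
        Map<String, Double> result = new HashMap<>();
        df.forEach((t, k) -> result.put(t, Math.log((double) docs.size() / k)));
        return result;
    }

    // Adds idf(t) once per occurrence, giving weight tf(t) * idf(t); unknown terms are ignored.
    static Map<String, Double> vector(String text, Map<String, Double> idfs) {
        Map<String, Double> v = new HashMap<>();
        for (String t : tokens(text)) {
            Double w = idfs.get(t);
            if (w != null) v.merge(t, w, Double::sum);
        }
        return v;
    }

    static double cosine(Map<String, Double> a, Map<String, Double> b) {
        double dot = 0, normA = 0, normB = 0;
        for (Map.Entry<String, Double> e : a.entrySet()) {
            dot += e.getValue() * b.getOrDefault(e.getKey(), 0.0);
            normA += e.getValue() * e.getValue();
        }
        for (double x : b.values()) normB += x * x;
        return (normA == 0 || normB == 0) ? 0 : dot / Math.sqrt(normA * normB);
    }

    // Indices of the documents, most similar to the query first.
    static List<Integer> rank(String query, List<String> docs) {
        Map<String, Double> idfs = idf(docs);
        Map<String, Double> q = vector(query, idfs);
        double[] sim = new double[docs.size()];
        List<Integer> order = new ArrayList<>();
        for (int i = 0; i < docs.size(); i++) {
            sim[i] = cosine(q, vector(docs.get(i), idfs));
            order.add(i);
        }
        order.sort(Comparator.comparingDouble((Integer i) -> sim[i]).reversed());
        return order;
    }
}
```

A term that occurs in every document gets idf 0 and therefore no influence on the ranking, which is the "offset" mentioned above.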

Darius - Experiment with iText

The subject chosen for this test is iText. iText contains a sufficient number of comments, and their contents carry enough Problem and Program Domain information, so this program can be explored for PC purposes through its comments.

Darius - Experiment with iText

Darius - Experiment with iText

How is a PDF document created?

Darius - Experiment with iText

How to write a PDF document into an output stream?

Darius - Experiment with iText

How to add a title to a PDF?

Darius - Experiment with iText

By going deeper in the searches and using the information discovered in each executed query, the programmer can build incremental knowledge of the software.

Using the information present in comments, the programmer should be able to figure out how every concept is implemented in the source code and how the concepts relate to one another.

Conclusion

Do real-world programs actually contain enough meaningful comments to justify the analysis effort and the proposed approach?

With simple but effective queries, locating concepts through comment information is faster than the complex task of reading the program's entire source code.

Darius shows the potential value that comments hold for program comprehension.

As future work:

Questionnaires will be made to understand how programmers would deal with Darius.

Develop Darius as a plugin for an IDE (e.g. Eclipse).
