Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection...

25
Documenting the feature implementations mined from the OO source code of a collection of software variants Rafat AL-MSIE’DEEN, Abdelhak-Djamel SERIAI, Marianne HUCHARD, Christelle URTADO and Sylvain VAUTTIER LIRMM / CNRS and Montpellier 2 University - Montpellier - France LGI2P / Ecole des Mines d’Alès - Nîmes - France

description

Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Transcript of Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection...

Page 1: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Documenting the feature implementationsmined from the OO source code

of a collection of software variants

Rafat AL-MSIE’DEEN, Abdelhak-Djamel SERIAI, Marianne HUCHARD,Christelle URTADO and Sylvain VAUTTIER

LIRMM / CNRS and Montpellier 2 University - Montpellier - FranceLGI2P / Ecole des Mines d’Alès - Nîmes - France

Page 2: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Outline

Context of software product line (reverse) engineeringThe big picture of our proposal for software product line reverseengineeringOverview of the feature documentation processUsed techniques (in a nutshell)Step by step feature documentation process on an exampleConclusion and perspectives

Page 3: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Undisciplined development of software product variants

Software product variants are often developed in an undisciplinedmanner.

ad-hoc reuse: copying and modifying previous software code (clone and own).

Results is a set of software products that:Implicitly share some (but not all) code,Are hard to understand, maintain, evolve.

Developing new variants still requires efforts.

Context

Page 4: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Benefits of software product line engineering

Expliciting shared (common) features and specific (variable) ones is aplus.

There is an abstract model of the developed productsincreases understandability

Products are easier to maintain and evolvesingle point of maintenance

Future products are easier to developdisciplined reuse

Software product line engineeringDomain engineering

A repository for reusable software feature implementationsA model of valid software products

– FODA feature modelApplication engineering

A software application development process based on feature selection

Context

Page 5: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Software product line engineering

My product line

Circle

Square

The triangles

The rectangles

Rectangle

Right triangleEquilateral triangle

Parallelogram

Feature implementation repository

FODA feature model

mandatoryoptionalalternative (xor)or

Domain engineering

Software product variants’ code(applications)

Application engineering

Context

Page 6: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Software product line (reverse) engineering

My product line

Circle

Square

The rectangles

Rectangle

Equilateral triangle

Parallelogram

Feature implementation repository

FODA feature model

mandatoryoptional

alternative (xor)or

Domain engineering

Software product variants’ code(applications)

Application engineering

The triangles

Right triangle

New product derivation

Software product linereverse engineering

FODA: Feature-oriented domain analysis

Context

Page 7: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALESBig picture

Software product line reverse engineering process

Automatically builds a « canonical » feature model from:the object-orientes source code of software variants,the software variant use-cases, if available.

Assumes:software code is object-oriented,software code statically reflects the features it implements,

no pre-compiling, macros, parameterizable code, etc.software code respects best practices relatively to naming (names for sourcecode elements are relevant),a given feature is always implemented with identical code,feature implementations are disjoint,features are functional.

Page 8: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Software product line reverse engineering process

Software product variants’ code

STEP 1Feature mining

Common features’ implementations

Variable features’ implementations

Circle

Square

Rectangle

Equilateral triangle

Parallelogram

Right triangle

STEP 2Feature

documentation

My product line

Circle

Square

XOR

Rectangle

Equilateral triangle

Parallelogram

OR

Right triangle

« Canonical » FODA feature modelwith constraints

Named and documentedcommon and variable features

STEP 3Feature model building

use-case-1

use-case-2

Use cases and their descriptions

Big picture

Page 9: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Feature documentation

Feature documentationautomatically assigning a meaningful name to previously mined featureimplementationscan be based either on:

source code analysis (using the most frequent words extracted from the feature’sOO source code elements, i.e. identifiers),use-case names, when available (assuming a functional feature to use-casecorrespondance).

Automatically assigning a use-case name to a feature implementationamounts to automatically identify the use-case that corresponds to afeature implementation.

Overview

Page 10: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Feature documentation process

A three step process:eliminate unsignificant use-cases to search from (to reduce search space fornext step),

making small groups of use-cases and their corresponding features: hybridblocks

among each hybrid block, compute textual similarity between the features’code and the use-casesuse these similarities to build the input of a clustering technique thatassociates a use-case (therefore its name) to each feature implementation.

Using three techniques:Relational Concept Analysis (RCA) for step 1,Latent semantic indexing (LSI) for step 2,Formal concept analysis (FCA) for step 3.

Overview

Page 11: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

FCA and RCA in a nutshell

Techniques

Formal Concept Analysis Concept Lattices [Barbut & Monjardet 1970 ; Ganter & Wille 1999]

Extract abstractions from a set of objects described by attibutesMax. set of objects (extent) that share a max. set of attributes (intent)

x

a4 a5a3a2a1

xxxo4xxo3

xxo2

xo1

Binary context

({o4},{a1,a3,a5})({o3},{a1,a3,a4})

({o1,o2,o3,o4},{})

({o3,o4},{a1,a3})

({},{a1,a2,a3,a4,fa})

({o2,o3,o4},{a3})({o1,o3,o4},{a1})

Concept lattice

({o2},{a2,a3})

Page 12: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALESTechniques

FCA and RCA in a nutshell

Technique to simplify the readingRepresent only attibutes (resp. objects) when they first appear from top down(resp. botttom up): simplified intent (resp. extent).

Technique to tame complexityconsider a sub-order by removing concepts that have an empty simplifiedintent and empty simplified extent: The lattice becomes an AOC-poset.

Relational Concept AnalysisAn iterative version of RCA in which objects are described by attributes andrelations.Generates a relational lattice family

Page 13: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

LSI in a nutshell

LSI is an information retrieval technique that compute lexical similaritybetween documents.It is based on the occurrence of terms in documents.

The number of terms (k) is a parameter of this method.The Term Frequency - Inverse Document Frequency (TF-IDF) weightingscheme is applied.Cosine similarity is computed.Before their analysis, texts are pre-processed:

Stop words, articles, punctuation marks, numbers are filtered out.All text is lower cased.Text is stemmed (using WordNet API).

Techniques

Page 14: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

The use-case diagrams of the 2nd and 4th MTG software variants

Mobile tourist guide (MTG) software variants

Step by step

The previously mined feature implementationspresence in each MTG software variant

(part of the relational context family)

Page 15: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Relational Context Family (RCF)

Step by step

Use case presence in each MTGsoftware variant

(part of the relational context family) Use case and feature implementation co-occurrence

(part of the relational context family)

Page 16: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Concept Lattice Family (CLF)

s

Step by step

Page 17: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Generating the hybrid blocks from the CLF

Step by step

Page 18: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Documents: Feature implementations

Hybrid Block_1publicviewMap(){int a = 0;while (a > 5){if (a != 20) {} else {a = 30;}}}

View Map

Queries: Use-cases and their descriptions

Constructing a raw corpus from each hybrid block

Step by step

Page 19: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Measuring an hybrid block similarity based on LSI

Step by step

Page 20: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Measuring an hybrid block similarity based on LSI

Step by step

Page 21: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Clustering use-cases described by features

Step by step

Page 22: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Conclusion

Software product variants’ code

STEP 1Feature mining

Common features’ implementations

Variable features’ implementations

Circle

Square

Rectangle

Equilateral triangle

Parallelogram

Right triangle

STEP 2Feature

documentation

My product line

Circle

Square

XOR

Rectangle

Equilateral triangle

Parallelogram

OR

Right triangle

« Canonical » FODA feature modelwith constraints

STEP 3Feature model building

use-case-1

use-case-2

Use cases and their descriptions

Conclusion

Page 23: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Conclusion

Software product variants’ code

STEP 1Feature mining

Common features’ implementations

Variable features’ implementations

Circle

Square

Rectangle

Equilateral triangle

Parallelogram

Right triangle

STEP 2Feature

documentation

My product line

Circle

Square

XOR

Rectangle

Equilateral triangle

Parallelogram

OR

Right triangle

« Canonical » FODA feature modelwith constraints

STEP 3Feature model building

use-case-1

use-case-2

Use cases and their descriptions

Use

1) RCA to reduce search space,

2) LSI to compute similarity between use cases and feature implementations

3) and, FCA to build the feature to use case correspondance (clustering).

Conclusion

Page 24: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Christelle URTADO - LGI2P / ECOLE DES MINES D’ALES

Conclusion

Related contributionsFeature mining (step 1) @ last year’s SEKETool implementationExperiments on 3 real use casesFeature model building (and constraint identification)

PerspectivesExtend the scope

Consider feature evolutionAdd dynamic code analysis

Improve the techniquesSearch based algorithms as an alternative clustering techniqueAutomatically identify junctions between feature implementationsBuild a « more » hierarchical feature model

Experiment at a wider scale.

Conclusion

Page 25: Documenting the Mined Feature Implementations from the Object-oriented Source Code of a Collection of Software Product Variants

Documenting the feature implementationsmined from the OO source code

of a collection of software variants

Rafat AL-MSIE’DEEN, Abdelhak-Djamel SERIAI, Marianne HUCHARD,Christelle URTADO and Sylvain VAUTTIER

Contact info:[email protected]

http://www.lgi2p.mines-ales.fr/~urtado