Aspect Mining: an emerging research domain
Kim Mens (http://www.info.ucl.ac.be/~km)
Special thanks to Andy Kellens, Paolo Tonella & Kris Gybels
Overview
‣ What is aspect mining and why do we need it?
‣ Different kinds of mining approaches:
1. Early aspects
2. Advanced Browsers
3. Source-code mining
‣ Overview of automated approaches
‣ Comparison of automated code mining techniques
‣ Conclusion: limitations of aspect mining
Aspects in existing systems?
‣ Aspects offer
- Better modularisation and “separation of concerns”
- Improved “ilities”
‣ Also useful for already existing systems
- But how to migrate a legacy system to AOSD?
Migrating a System
[Figure: migrating a legacy system to AOSD proceeds in two steps: Aspect Mining to identify the crosscutting concerns, followed by Aspect Refactoring to turn them into actual aspects.]
Why do we need aspect mining?
‣ Legacy systems
- large, complex systems
- not always clearly documented
- need to find the crosscutting concerns (what?)
- need to find the extent of the crosscutting concerns (where?)
- program understanding/documentation
Overview
‣ What is aspect mining and why do we need it?
‣ Different kinds of mining approaches:
1. Early aspects
2. Advanced Browsers
3. Source-code mining
‣ Overview of automated approaches
‣ Comparison of automated code mining techniques
‣ Conclusion: limitations of aspect mining
Different Kinds of Mining Approaches
1. “Early aspect” discovery techniques
2. Advanced special-purpose browsers
3. (Semi-)automated code-level mining techniques
Early Aspects
‣ Try to find the aspects in artifacts of the early phases of the software life-cycle:
- requirements
- design
‣ Less useful when trying to identify aspects in legacy code
Different Kinds of Mining Approaches
1. “Early aspect” discovery techniques
2. Advanced special-purpose browsers
3. (Semi-)automated code-level mining techniques
Special-Purpose Code Browsers
‣ Idea: help a developer in exploring a concern
‣ By browsing / navigating the source code
- User provides interesting “seed” in the code
- Tool suggests interesting related points to consider
‣ Iteratively, build up a model of the crosscutting concerns
‣ Examples:
- FEAT, Aspect Browser, (Extended) Aspect Mining Tool, Prism, JQuery, Soul
FEAT
‣ Create a “Concern Graph”: a representation of the concern that maps back to the source code
‣ Consists of a set of program entities
‣ Start out with a number of program entities (e.g. “all callers of the method print()”)
‣ Iteratively explore the relations and refine the concern graph (see the sketch below)
- fan-in (callers, ...)
- fan-out (callees, ...)
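As a rough sketch of this exploration loop (not FEAT's actual implementation; the call-graph dictionaries "callers" and "callees" are hypothetical stand-ins for the program database), the expansion step could look like this in Python:

def expand_concern(seeds, callers, callees, steps=1):
    # Grow a concern (a set of methods) from seed entities by following
    # fan-in (callers) and fan-out (callees) relations, one exploration
    # step at a time, as a FEAT user would do interactively.
    concern = set(seeds)
    for _ in range(steps):
        frontier = set()
        for method in concern:
            frontier |= callers.get(method, set())  # fan-in
            frontier |= callees.get(method, set())  # fan-out
        concern |= frontier
    return concern

# e.g. expand_concern({"print"}, callers, callees) gathers all callers
# of print() plus the methods that print() itself calls.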
Special-Purpose Code Browsers
‣ Disadvantages:
- Require quite some manual effort (does it scale?)
➡ Try to automate this process
- Requires some preliminary knowledge about the system (need for seeds)
Different Kinds of Mining Approaches
1. “Early aspect” discovery techniques
2. Advanced special-purpose browsers
3. (Semi-)automated code-level mining techniques
Automated Approaches
‣ Idea: semi-automatically find possible aspect candidates in the source code of a system
‣ No magic:
- Manual filtering of the results
- False positives/negatives
- May require preliminary knowledge
‣ Possibility to use results as seed for browsers
Automated Approaches
‣ Hidden assumption of all techniques:
- look for symptoms of scattering and tangling
- either static (code) or dynamic (run-time)
‣ Discover these symptoms using a variety of data analysis techniques
- inspired by data mining, code analysis, ...
- but specifically (re)designed for aspect mining
Overview
‣ What is aspect mining and why do we need it?
‣ Different kinds of mining approaches:
1. Early aspects
2. Advanced Browsers
3. Source-code mining
‣ Overview of automated approaches
‣ Comparison of automated code mining techniques
‣ Conclusion: limitations of aspect mining
Some Automated Approaches
‣ Pattern matching
- Analysing recurring patterns of execution traces
- Clone detection (token/PDG/AST-based)
‣ Formal concept analysis
- Execution traces / identifiers
‣ Natural language processing on code
‣ AOP idioms
‣ Fan-in analysis / Detecting unique methods
‣ Cluster analysis
- method names / method invocations
‣ ... and many more ...
Recurring Patterns [Breu & Krinke]
‣ Assumption: if certain method calls always happen before/after the same call, they might be part of an aspect
‣ “Reverse engineering” of a manual implementation of before/after advice
‣ Technique: find pairs of methods which occur right before/after each other and check whether this pattern recurs in the code
Recurring Patterns [Breu & Krinke]
‣ Inherently dynamic (analysis of execution traces), but:
- static: use control-flow graphs to calculate calling relations
- hybrid: augment dynamic information with static type information to improve results
‣ The crosscutting concern needs to be implemented rigorously (calls always in the same order)
Automated aspect mining techniques

Complementary to dedicated browsers, there exist a number of techniques which have as goal to automate the process of mining aspects and which propose to their user one or more aspect candidates. To this end, these techniques reason about the source code of the system or about data that is acquired by executing or manipulating the code. All techniques seem to have at least in common that they search for symptoms of cross-cutting concerns, using either techniques from data mining and data analysis like formal concept analysis and cluster analysis, or more classic code analysis techniques like program slicing, software metrics and heuristics, clone detection and pattern matching techniques, dynamic analysis, and so on.
In this survey we focus only on this second category of techniques, which semi-automatically assist a developer in the activity of mining the cross-cutting concerns in an existing system. In the remainder of this section we take a closer look at the different automated code-level aspect mining approaches that have been proposed over the last few years.
3.1 Analysing recurring patterns of execution traces
Breu and Krinke propose an aspect mining technique named DynAMiT (Dynamic Aspect Mining Tool) [13], which analyses program traces reflecting the run-time behaviour of a system in search of recurring execution patterns. To do so, they introduce the notion of execution relations between method invocations. Consider the following example of an event trace, where the capitals represent method names:

B() {
  C() {
    G() {}
    H() {}
  }
}
A() {}

Breu and Krinke distinguish between four different execution relations: outside-before (e.g., B is called before A), outside-after (e.g., A is called after B), inside-first (e.g., G is the first call in C) and inside-last (e.g., H is the last call in C). Using these execution relations, their mining algorithm discovers aspect candidates based on recurring patterns of method invocations. If an execution relation occurs more than once, and recurs uniformly (for instance, every invocation of method B is followed by an invocation of method A), it is considered to be an aspect candidate. To ensure that the aspect candidates are sufficiently cross-cutting, there is an extra requirement that the recurring relations should appear in different ‘calling contexts’. Although this approach is inherently dynamic, the authors have repeated the experiment using control-flow graphs [14], a static technique, to calculate the execution relations. Breu also reports on a hybrid approach [15] where the dynamic information is complemented with static type information in order to remove ambiguities and improve on the results of the technique.
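The core of this detection can be sketched as follows (a simplified illustration, not the DynAMiT implementation; the nested-list encoding of the trace is an assumption, outside-after is omitted as the mirror image of outside-before, and the uniformity check is only hinted at):

# Each call is a (name, children) pair; the event trace above becomes:
trace = [("B", [("C", [("G", []), ("H", [])])]), ("A", [])]

def execution_relations(calls, context="<root>", rels=None):
    # Map each execution relation to the set of calling contexts in
    # which it was observed.
    if rels is None:
        rels = {}
    names = [name for name, _ in calls]
    for left, right in zip(names, names[1:]):
        rels.setdefault(("outside-before", left, right), set()).add(context)
    if names:
        rels.setdefault(("inside-first", names[0]), set()).add(context)
        rels.setdefault(("inside-last", names[-1]), set()).add(context)
    for name, children in calls:
        execution_relations(children, context=name, rels=rels)
    return rels

# Relations recurring in more than one calling context are candidate
# aspects; the real algorithm additionally checks that they recur
# uniformly (e.g. that EVERY call to B is followed by a call to A).
candidates = [rel for rel, contexts in execution_relations(trace).items()
              if len(contexts) > 1]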
Clone Detection [Bruntink, Shepherd]
‣ Assumption: crosscutting concerns result in code duplication
‣ Certain CCCs are implemented using a recurring pattern or idiom
‣ Technique: find classes of related code (see the sketch below)
- exactly the same code
- the same structure, but different literals
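A toy token-based detector along these lines might look as follows (purely illustrative; real tools such as CCFinder tokenize properly, keep keywords and operators distinct, and use more efficient matching than fixed windows):

import re
from collections import defaultdict

def normalize(line):
    # Replace string/number literals and identifiers by placeholders so
    # that fragments with the same structure but different names or
    # literals compare equal. (This deliberately over-normalizes:
    # keywords become placeholders too.)
    line = re.sub(r'"[^"]*"|\b\d+\b', '0', line)
    return re.sub(r'\b[A-Za-z_]\w*\b', 'ID', line).strip()

def clone_classes(lines, window=5):
    # Group all window-sized fragments with an identical normalized
    # form; groups with at least two members are clone classes.
    classes = defaultdict(list)
    for i in range(len(lines) - window + 1):
        key = tuple(normalize(l) for l in lines[i:i + window])
        classes[key].append(i)
    return [locations for locations in classes.values() if len(locations) > 1]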
Clone Detection [Bruntink, Shepherd]
‣ source code analysis
‣ applicable if the CCCs are implemented as a pattern
In order to perform this evaluation, we use grok to process the sets of clone classes of both clone detectors separately. For each of the concerns we consider, we try to find an ordered selection of clone classes that does a good job at ‘covering’ the region of code defined by the concern in question. A source code line of a concern is covered by a clone class if it is included in one of the clones (code fragments) of the clone class.
For each concern, we then proceed as follows: for all of the clone classes in the set, we calculate which concern lines are covered by each clone class. The clone class that covers the most lines of the concern is selected, and the concern lines that are covered will no longer be considered during the remainder of the algorithm. Subsequently, the algorithm will select the clone class that covers the most of the remaining concern lines, and so on until no more concern lines are covered by any clone class. If it occurs that multiple clone classes cover an equal number of concern lines, we select the clone class that contains the least number of non-concern lines. Similar to lines belonging to a concern, non-concern lines are also considered at most once.
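This greedy selection follows directly from the description above. A sketch, representing each clone class simply as the set of source lines it covers (an assumption), with the "at most once" counting of non-concern lines simplified to a tie-break:

def select_clone_classes(concern_lines, clone_classes):
    # Repeatedly pick the clone class covering the most not-yet-covered
    # concern lines, breaking ties by the fewest non-concern lines, and
    # stop when no class covers any remaining concern line.
    concern = set(concern_lines)
    remaining = set(concern_lines)
    candidates = [set(c) for c in clone_classes]
    selection = []
    while candidates:
        best = max(candidates,
                   key=lambda c: (len(c & remaining), -len(c - concern)))
        if not best & remaining:
            break
        selection.append(best)
        remaining -= best
        candidates.remove(best)
    return selection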
6. Obtained Results
Our primary goal is finding the code belonging to a certain concern. Therefore, in our algorithm to select the clone classes (see Section 5), we favor coverage and sacrifice precision (defined below). Arguably, other goals require different criteria to rank the clone classes. For example, in order to identify opportunities for (automatic) refactoring, precision would be the primary issue. We plan to explore these possibilities in the future.
In order to evaluate to what extent the clone detectors meet our goal, we investigate the level of concern coverage met by the clone classes. Concern coverage is the fraction of a concern’s source code lines that are covered by the first n selected clone classes. Using the selection algorithm described in Section 5 we obtain the results displayed in Figure 2(a) and Figure 2(b) for Bauhaus’ ccdiml and CCFinder, respectively.
Additionally, we evaluate the precision obtained by the first n selected clone classes. Precision is defined as follows:

precision(n) = concernLines(n) / totalLines(n),

where n indicates the first n selected clone classes, concernLines equals the number of concern code lines covered by the first n selected clone classes, and likewise totalLines equals the total number of lines covered by the first n selected clone classes. Figure 2(c) and Figure 2(d) show the precision obtained by the first n selected clone classes for Bauhaus’ ccdiml and CCFinder, respectively.
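Instantiated directly from these definitions (with selection holding the first n line-number sets, for instance as returned by a routine like select_clone_classes above):

def coverage_and_precision(selection, concern_lines):
    # Concern coverage = fraction of concern lines covered by the first
    # n clone classes; precision = concernLines(n) / totalLines(n).
    covered = set().union(*selection) if selection else set()
    concern = set(concern_lines)
    coverage = len(covered & concern) / len(concern)
    precision = len(covered & concern) / len(covered) if covered else 0.0
    return coverage, precision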
Observe that as the number of clone classes considered increases, the coverage displays a monotonic growth, whereas the precision tends to decrease. The highest coverage is less than 100% in all cases: the remaining percentage corresponds to concern code that is coded in such a unique way that it does not occur in any clone class. For example, Figure 2(a) and Figure 2(b) show that 5% of the memory error handling code is not part of any clone class.
We are primarily interested in achieving sufficient coverage without losing too much precision. Therefore, we will focus on the number of clone classes needed to cover most of a concern, where we will consider 80% to be a sufficient coverage level.
6.1. Memory Error Handling
Using 9 clone classes is enough to sufficiently cover the memory error handling concern for both Bauhaus’ ccdiml and CCFinder, resulting in 69% and 52% precision, respectively.
We observe that CCFinder yields a clone class that already covers 45% of the concern code. This particular clone class contains 96 clones which are 6 lines in length. Figure 3 shows an example clone from this class. While the lines marked with ‘M’ belong to the memory handling concern, only the lines marked with ‘C’ are included in the clones. CCFinder allows clones to start and end with little regard to syntactic units. In contrast, Bauhaus’ ccdiml does not allow this, due to its AST-based clone detection algorithm.
M C   if (r != OK)
M C   {
M C       ERXA_LOG(r, 0, ("PLXAmem_malloc failure."));
M C
M C       ERXA_LOG(VSXA_MEMORY_ERR, r,
M C           ("%s: failed to allocated %d bytes.",
M              func_name, toread));
M
M         r = VSXA_MEMORY_ERR;
M     }

Figure 3. CCFinder clone covering memory error handling.
Furthermore this clone class does not cover memory error handling code exclusively. In Figure 2(d), note that the precision obtained for the first clone class is roughly 82%. Through inspection of the code we found that some of the clones do not cover memory error handling code at all, but code that is similar at the syntactical level, yet semantically different.
6.2. Parameter Checking
Our results show that the parameter checking concern is found very well by both clone detectors: using 7 clone classes of Bauhaus’ ccdiml is sufficient to cover 80% of the concern, while for CCFinder we can suffice with 4 clone classes.
Formal Concept Analysis
• Starts from
- a set of elements
- a set of properties of those elements
• Determines concepts: maximal groups of elements and properties
- group: every element of the concept has those properties, and every property of the concept holds for those elements
- maximal: no other element (outside the concept) has those same properties, and no other property (outside the concept) is shared by all elements
            object-oriented   functional   logic   static typing   dynamic typing
C++               X               -          -          X               -
Java              X               -          -          X               -
Smalltalk         X               -          -          -               X
Scheme            -               X          -          -               X
Prolog            -               -          X          -               X
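A naive concept computation over this table illustrates the idea (exponential in the number of elements, for illustration only; real FCA implementations use much better algorithms):

from itertools import combinations

# The context from the table above: elements and their properties.
context = {
    "C++":       {"object-oriented", "static typing"},
    "Java":      {"object-oriented", "static typing"},
    "Smalltalk": {"object-oriented", "dynamic typing"},
    "Scheme":    {"functional", "dynamic typing"},
    "Prolog":    {"logic", "dynamic typing"},
}

def concepts(context):
    # For every group of elements, take the properties they all share,
    # then extend the group to ALL elements having those shared
    # properties; the resulting (extent, intent) pairs are the concepts.
    elements = list(context)
    found = set()
    for r in range(1, len(elements) + 1):
        for group in combinations(elements, r):
            shared = set.intersection(*(context[e] for e in group))
            extent = frozenset(e for e in elements if shared <= context[e])
            found.add((extent, frozenset(shared)))
    return found

# One of the concepts found: ({'C++', 'Java'},
#                             {'object-oriented', 'static typing'})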
Identifier Analysis (Mens & Tourwé)
‣ Assumption: methods belonging to the same concern are implemented using similar names
‣ E.g. methods implementing the Undo concern will have “undo” in their signature
‣ Technique: use Formal Concept Analysis with the methods as objects and the different substrings of their names as properties
Identifier Analysis

                                             figure  drawing  request  remove  update  change  event  …
drawingRequestUpdate(DrawingChangeEvent e)     -       X        X        -       X       -       -    …
figureRequestRemove(FigureChangeEvent e)       X       -        X        X       -       -       -    …
figureRequestUpdate(FigureChangeEvent e)       X       -        X        -       X       -       -    …
figureRequestRemove(FigureChangeEvent e)       X       -        X        X       -       -       -    …
figureRequestUpdate(FigureChangeEvent e)       X       -        X        -       X       -       -    …
…                                              …       …        X        …       …       …       …    …
‣ Lots of concepts: filter irrelevant ones
‣ Group “similar” concepts
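Building the FCA context for identifier analysis can be sketched as follows (the signatures come from the table above; the splitting heuristic is an assumption). The resulting substrings serve as the properties for a concept-analysis routine like concepts() above:

import re

def identifier_parts(signature):
    # Split a method name into its lower-cased identifier substrings,
    # e.g. 'figureRequestUpdate(...)' -> {'figure', 'request', 'update'}.
    name = signature.split("(")[0]
    return {word.lower() for word in re.findall(r"[A-Z]?[a-z]+", name)}

methods = [
    "drawingRequestUpdate(DrawingChangeEvent e)",
    "figureRequestRemove(FigureChangeEvent e)",
    "figureRequestUpdate(FigureChangeEvent e)",
]
# FCA context: methods as objects, name substrings as properties.
fca_context = {m: identifier_parts(m) for m in methods}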
Execution Traces (Ceccato & Tonella)
‣ Assumption: use-case scenarios are a good indicator for crosscutting concerns
‣ Technique: analyse execution traces using FCA, with the use cases as objects and the methods called from within each use case as attributes
‣ A concept specific to one use case is a possible aspect if (see the sketch below):
- its methods belong to more than one class (scattering)
- methods of the same class occur in multiple use cases (tangling)
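The two criteria could be expressed as follows (a sketch only; the data structures are hypothetical stand-ins for the FCA output and the trace data):

def aspect_candidates(concepts, calls, declaring_class):
    # concepts: (use_cases, methods) pairs produced by FCA;
    # calls: use case -> set of methods it executes;
    # declaring_class: method -> class that implements it.
    candidates = []
    for use_cases, methods in concepts:
        if len(use_cases) != 1:
            continue  # keep only concepts specific to a single use case
        classes = {declaring_class[m] for m in methods}
        scattering = len(classes) > 1
        # tangling: methods of the same classes also run in other use cases
        tangling = any(declaring_class[m] in classes
                       for uc, ms in calls.items() if uc not in use_cases
                       for m in ms)
        if scattering and tangling:
            candidates.append((use_cases, methods))
    return candidates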
Fan-in Analysis (Marin)
‣ Assumption: The fan-in metric is a good indicator of scattering
‣ “Methods which get called often from different contexts are possible crosscutting concerns”
‣ Technique: calculate the fan-in of each method, filter auxiliary methods/accessors and sort the resulting methods
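A minimal version of the computation (the call-graph dictionary and the threshold are assumptions; the actual technique applies further filters before manual inspection):

from collections import Counter

def fan_in_candidates(call_graph, threshold=10):
    # call_graph maps each method to the set of methods it calls; the
    # fan-in of a method is its number of distinct callers. Methods at
    # or above the threshold are reported, highest fan-in first.
    fan_in = Counter()
    for caller, callees in call_graph.items():
        for callee in set(callees):
            fan_in[callee] += 1
    return sorted(((m, n) for m, n in fan_in.items() if n >= threshold),
                  key=lambda pair: -pair[1])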
Unique Methods (Gybels & Kellens)
‣ Assumption: Some CCCs are implemented by calling a single entity from all over the code
‣ E.g. Logging
‣ Technique: devise a metric (unique method) which finds such crosscutting concerns
Unique Methods (Gybels & Kellens)
[Figure: a Logging class whose log() method is called from ClassA, ClassB and ClassC]
Unique method: a method without a return value which implements a message implemented by no other method.
Q: Why no return type? A: Because a returned value would be used in the base computation.
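The metric itself is easy to sketch (the image dictionary, mapping each class to its methods and a returns-value flag, is a hypothetical stand-in for reflective access to a Smalltalk image):

def unique_methods(image):
    # A selector is a unique method if exactly one method in the whole
    # image implements it and that method returns no value.
    implementors = {}
    for cls, methods in image.items():
        for selector, returns_value in methods.items():
            implementors.setdefault(selector, []).append(returns_value)
    return [selector for selector, impls in implementors.items()
            if len(impls) == 1 and not impls[0]]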
Unique Methods (Gybels & Kellens)

Class                       Selector                           Calls
Parcel                      #markAsDirty                       23
ParagraphEditor             #resetTypeIn                       19
UIPainter                   #broadCastPendingSelectionChange   18
ControllerCodeRegenerator   #pushPC                            15
AbstractChangeList          #updateSelection:                  15
PundleModel                 #updateAfterDo:                    10
???????????????????????     #beGarbageCollectable              ??
Cluster Analysis
‣ Input:
- a set of objects
- a distance measure between those objects
‣ Output:
- groups of objects which are close (according to the distance function)
Aspect Mining Using Cluster Analysis (He, Shepherd)
‣ Define a distance function between two methods (see the sketch below)
‣ Call clustering:
- distance as a function of how often two methods occur together in the same method body (cf. recurring execution patterns)
‣ Method clustering:
- distance as a function of the commonalities in the methods’ names (cf. identifier analysis)
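Both variants reduce to clustering with a suitable distance function. A sketch of the method-name variant, with a tiny single-linkage clustering loop (the distance definition and the cutoff are assumptions):

import re

def parts(method):
    # Lower-cased identifier substrings of a method name.
    return {w.lower() for w in re.findall(r"[A-Z]?[a-z]+", method.split("(")[0])}

def name_distance(m1, m2):
    # Jaccard-style distance on shared name substrings (method
    # clustering); call clustering would instead compare how often the
    # two methods are invoked together.
    p1, p2 = parts(m1), parts(m2)
    return 1 - len(p1 & p2) / len(p1 | p2) if (p1 | p2) else 1.0

def agglomerate(methods, max_distance=0.5):
    # Repeatedly merge the closest pair of clusters (single linkage)
    # until the smallest inter-cluster distance exceeds the cutoff.
    clusters = [[m] for m in methods]
    while len(clusters) > 1:
        pairs = [((a, b), min(name_distance(x, y)
                              for x in clusters[a] for y in clusters[b]))
                 for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        (a, b), d = min(pairs, key=lambda p: p[1])
        if d >= max_distance:
            break
        clusters[a].extend(clusters.pop(b))
    return clusters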
Overview
‣ What is aspect mining and why do we need it?
‣ Different kinds of mining approaches:
1. Early aspects
2. Advanced Browsers
3. Source-code mining
‣ Overview of automated approaches
‣ Comparison of automated code mining techniques
‣ Conclusion: limitations of aspect mining
Comparison

5 Assessment

Based on the criteria introduced in Section 4, we compare the different aspect mining techniques summarised in Section 3. To save space, we abbreviate the names of the techniques, as shown in Table 1.

Abbreviated name                    Short description of the technique                                  Section
Execution patterns                  Analysing recurring patterns of execution traces                    3.1
Dynamic analysis                    Formal concept analysis of execution traces                         3.2
Identifier analysis                 Formal concept analysis of identifiers                              3.2
Language clues                      Natural language processing on source code                          3.3
Unique methods                      Detecting unique methods                                            3.4
Method clustering                   Hierarchical clustering of similar method names                     3.5
Call clustering                     Clustering based on method invocations                              3.5
Fan-in analysis                     Fan-in analysis                                                     3.6
Clone detection (PDG-based)         Detecting aspects using PDG-based clone detection                   3.7
Clone detection (AST/token-based)   Detecting aspects using AST-based and token-based clone detection   3.7

Table 1. List of techniques that were compared

For each of the studied techniques, Table 2 shows the kind of data (static or dynamic) analysed by that technique, as well as the kind of analysis performed (token-based or structural/behavioural).

                          Kind of input data      Kind of analysis
                          static    dynamic       token-based    structural/behavioural
Execution patterns          X         X                -                 X
Dynamic analysis            -         X                -                 X
Identifier analysis         X         -                X                 -
Language clues              X         -                X                 -
Unique methods              X         -                -                 X
Method clustering           X         -                X                 -
Call clustering             X         -                -                 X
Fan-in analysis             X         -                -                 X
Clone detection (PDG)       X         -                -                 X
Clone detection (token)     X         -                X                 -
Clone detection (AST)       X         -                -                 X

Table 2. Kind of input data and kind of analysis of each technique

Most techniques work on statically available data; only a few are dynamic. ‘Dynamic analysis’ reasons about execution traces and thus requires executability of the code under analysis. Only ‘Execution patterns’ works with both kinds of input, since both a static version which uses control-flow graphs and a dynamic version which uses execution traces exist. As for the kind of reasoning, four techniques perform a token-based analysis of the input data, while the seven others reason about the input at a structural or behavioural level. ‘Identifier analysis’ and ‘Method clustering’ reason only about the names of the methods in a system. The ‘Language clues’ approach is token-based because it reasons about individual words which appear in the program’s source code. The four token-based techniques all rely on the assumption that cross-cutting concerns are often implemented by the rigorous use of naming conventions.
Comparison

                      Granularity                    Symptoms
                      method    code fragment        scattering    tangling
Execution patterns      X            -                   X            -
Dynamic analysis        X            -                   X            X
Identifier analysis     X            -                   X            -
Language clues          X            -                   X            -
Unique methods          X            -                   X            -
Method clustering       X            -                   X            -
Call clustering         X            -                   X            -
Fan-in analysis         X            -                   X            -
Clone detection         -            X                   X            -

Table 3. Granularity of and symptoms looked for by each technique

Table 3 summarises the finest level of granularity (methods or code fragments) of the different techniques, and whether they look for symptoms of scattering and/or tangling. With a few exceptions, the typical granularity of the techniques surveyed is at method level; few go below it. Therefore, most techniques output several sets of methods, each representing a potential aspect seed. Only the three ‘Clone detection’ techniques detect aspect code at the level of code fragments and can therefore provide more fine-grained feedback on the code that needs to be put into the advice of the refactored aspect. All techniques use scattering as the basic indicator of the presence of a cross-cutting concern; few consider tangling. Only ‘Dynamic analysis’ takes both scattering and tangling into account, by requiring that the methods which occur in a single use-case scenario are implemented in multiple classes (scattering), but also that these methods occur in multiple use cases and thus are tangled with other concerns of the system.

Technique                     Largest case       Size of case                    Empirically validated
Execution patterns            Graffiti           3100 methods / 82 KLOC          -
Dynamic analysis              JHotDraw           2800 methods / 18 KLOC          -
Identifier analysis           JHotDraw           2800 methods / 18 KLOC          -
Language clues                PetStore           10 KLOC                         -
Unique methods                Smalltalk image    3400 classes / 66000 methods    -
Method clustering             JHotDraw           2800 methods / 18 KLOC          -
Call clustering               Banking example    12 methods                      -
Fan-in analysis               JHotDraw           2800 methods / 18 KLOC          -
                              TomCat 5.5 API     172 KLOC                        -
Clone detection (PDG)         TomCat             38 KLOC                         -
Clone detection (AST/token)   ASML C-code        20 KLOC                         X

Table 4. An assessment of the validation of each technique

To provide more insight into the validation of the techniques, Table 4 mentions the largest case on which each technique has been validated, together with the size of that case, and whether the results have been evaluated quantitatively (for example, how many known aspects were actually reported, how many false positives and negatives were reported, and so on). While the size of the largest analysed system is significant for most of the studied techniques (only ‘Call clustering’ was applied to a toy example), empirical validation of the results was almost always neglected; this raises the question whether a common benchmark is needed.
Comparison

It is also worth noting that 4 out of 9 techniques have been validated on the same case: JHotDraw.

One important criterion to help select an appropriate technique to mine a given system for aspects is what implicit or explicit assumptions that technique makes about how the crosscutting concerns are implemented; the techniques make quite different assumptions about the source code. Table 5 summarises these assumptions in terms of preconditions that a system has to satisfy in order to find suitable aspect candidates with a given technique.

Technique             Preconditions on crosscutting concerns in the analysed program
Execution patterns    Order of calls in the context of the crosscutting concern is always the same.
Dynamic analysis      At least one use case exists that exposes the crosscutting concern and another one that does not.
Identifier analysis   Names of methods implementing the concern are alike.
Language clues        Context of the concern contains keywords which are synonyms for the crosscutting concern.
Unique methods        Concern is implemented by exactly one method.
Method clustering     Names of methods implementing the concern are alike.
Call clustering       Concern is implemented by calls to the same methods from different modules.
Fan-in analysis       Concern is implemented in a separate method which is called a high number of times, or many methods implementing the concern call the same method.
Clone detection       Concern is implemented by reusing a certain code fragment.

Table 5. What conditions does the implementation of the concerns have to satisfy in order for a technique to find viable aspect candidates?

‘Identifier analysis’, ‘Method clustering’, ‘Language clues’ and ‘Token-based clone detection’ all rely on the assumption that developers rigorously made use of naming conventions when implementing the crosscutting concerns. ‘Execution patterns’ and ‘Call clustering’ assume that methods which often get called together from within different contexts are candidate aspects. The fan-in technique assumes that crosscutting concerns are implemented by methods which are called many times (large footprint), or by methods calling such methods.

Technique             User involvement
Execution patterns    Inspection of the resulting “recurring patterns”.
Dynamic analysis      Selection of use cases and manual interpretation of results.
Identifier analysis   Browsing of mined aspects using IDE integration.
Language clues        Manual interpretation of resulting lexical chains.
Unique methods        Inspection of the unique methods; eased by sorting on importance.
Method clustering     Browsing of mined aspects using IDE integration.
Call clustering       Manual inspection of resulting clusters.
Fan-in analysis       Selection of candidates from a list of methods, sorted on highest fan-in.
Clone detection       Browsing and manual interpretation of the discovered clones.

Table 6. Which kind of user involvement do the different techniques require?

Table 6 summarises the kind of involvement that is required from the user. Manual effort is still required: none of the existing techniques works fully automatically, and all techniques require that their users browse through the resulting aspect candidates in order to find suitable aspects. Some require that the users …
Taxonomy
[Figure: a taxonomy positioning the techniques along two axes: kind of input data (dynamic: Dynamic Analysis and Execution Patterns; static: the others) and kind of analysis (token-based: Identifier Analysis, Method Clustering, Language Clues and token-based Clone Detection; structural/behavioural: the others). The techniques are grouped into families (clustering, concept analysis, and clone detection with its token-, AST- and PDG-based variants), and granularity separates method-level techniques from those working on code fragments.]
Overview
‣ What is aspect mining and why do we need it?
‣ Different kinds of mining approaches:
1. Early aspects
2. Advanced Browsers
3. Source-code mining
‣ Overview of automated approaches
‣ Comparison of automated code mining techniques
‣ Conclusion: limitations of aspect mining
Summary (1)
‣ Aspect Mining
- Identification of crosscutting concerns in legacy systems
- Aid developers when migrating an existing application to aspects
- Browsing approaches and automated techniques
Summary (2)
‣ Different techniques are complementary
- Different input (dynamic vs. static); granularity
- Rely on different assumptions/symptoms
- Make use of different analysis techniques
• cluster analysis, FCA, heuristics, ...
‣ When mining a system, apply (a combination of) different techniques
Possible research directions
‣ Quantitative comparison of techniques:
- Which aspects can be detected by which technique?
- Common benchmark/analysis framework
‣ Cross-fertilization with other research domains
- slicing, metrics, ...
Limitations
‣ There is no “silver bullet”
- Still a lot of manual intervention required
- False positives / false negatives still remain
- Are we really mining for aspects?
• or is it “joinpoint mining”?
Limitations
‣ Joinpoint mining?
- how to obtain the pointcut definition?
- how to obtain the advice?
‣ Scalability
- user involvement
‣ Validation
‣ Granularity
More reading...
‣ A. Kellens, K. Mens & P. Tonella. A Survey of Automated Code-Level Aspect Mining Techniques. Submitted to Transactions on Aspect-Oriented Software Development, Special Issue on Software Evolution, 2006.
‣ M. Ceccato, M. Marin, K. Mens, L. Moonen, P. Tonella & T. Tourwé. Applying and Combining Three Different Aspect Mining Techniques. Software Quality Journal, 14(3): 209–231. Springer, September 2006.