OORPT: Object-Oriented Reengineering Patterns and Techniques

7. Problem Detection

Prof. O. Nierstrasz

© Stéphane Ducasse, Serge Demeyer, Oscar Nierstrasz

OORPT — Problem Detection

Roadmap

> Metrics
> Object-Oriented Metrics in Practice
> Duplicated Code


Roadmap

> Metrics
— Software quality
— Analyzing trends
> Object-Oriented Metrics in Practice
> Duplicated Code


Why Metrics in OO Reengineering (ii)?

> Assessing Software Quality
— Which components have poor quality? (Hence could be reengineered)
— Which components have good quality? (Hence should be reverse engineered)
Metrics as a reengineering tool!

> Controlling the Reengineering Process
— Trend analysis: which components changed?
— Which refactorings have been applied?
Metrics as a reverse engineering tool!


ISO 9126 Quantitative Quality Model

[Diagram: tree rooted at Software Quality, refined level by level]

ISO 9126 Factor: Functionality, Reliability, Efficiency, Usability, Maintainability, Portability

Characteristic: Error tolerance, Accuracy, Simplicity, Modularity, Consistency

Metric:
defect density = #defects / size
correction impact = #components changed
correction time

Leaves are simple metrics, measuring basic attributes


Product & Process Attributes

Product Attribute
Definition: measure aspects of artifacts delivered to the customer
Example: number of system defects perceived, time to learn the system

Process Attribute
Definition: measure aspects of the process which produces a product
Example: time to correct defect, number of components changed per correction


External & Internal Attributes

External Attribute
Definition: measures how the product/process behaves in its environment
Example: mean time between failures, #components changed

Internal Attribute
Definition: measured purely in terms of the product, separate from its behaviour in context
Example: class coupling and cohesion, method size


External vs. Internal Product Attributes

External

Advantage:
> close relationship with quality factors

Disadvantages:
> measure only after the product is used or process took place
> data collection is difficult; often involves human intervention/interpretation
> relating external effect to internal cause is difficult

Internal

Disadvantage:
> relationship with quality factors is not empirically validated

Advantages:
> can be measured at any time
> data collection is quite easy and can be automated
> direct relationship between measured attribute and cause


Metrics and Measurements

> Weyuker [1988] defined nine properties that a software metric should satisfy.
— Read Fenton & Pfleeger for critiques.

> For OO only 6 properties are really interesting [Chidamber 94, Fenton & Pfleeger]
— Noncoarseness
– Given a class P and a metric m, another class Q can always be found such that m(P) ≠ m(Q)
– Not every class has the same value for a metric
— Nonuniqueness
– There can exist distinct classes P and Q such that m(P) = m(Q)
– Two classes can have the same metric value
— Monotonicity
– m(P) ≤ m(P+Q) and m(Q) ≤ m(P+Q), where P+Q is the “combination” of the classes P and Q


Metrics and Measurements (ii)

— Design Details are Important
– The specifics of a class must influence the metric value. Even if two classes perform the same actions, their details should have an impact on the metric value.
— Nonequivalence of Interaction
– m(P) = m(Q) does not imply m(P+R) = m(Q+R), where R is an interaction with the class
— Interaction Increases Complexity
– m(P) + m(Q) < m(P+Q)
– When two classes are combined, the interaction between the two can increase the metric value

> Conclusion: Not every measurement is a metric.
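To make the properties more tangible, here is a minimal sketch that tests some of them on toy data. It rests on strong assumptions that are not in the slides: a class is modeled as a set of method names, the metric is the number of methods, and the "combination" P+Q is taken to be the set union. All function names are hypothetical.

```python
from itertools import combinations

def nom(cls):
    """Toy metric: number of methods (a class is a set of method names)."""
    return len(cls)

def combine(p, q):
    """Assumed meaning of 'P+Q' for this sketch: union of the method sets."""
    return p | q

def is_noncoarse(metric, classes):
    # Not every class has the same metric value.
    return len({metric(c) for c in classes}) > 1

def is_nonunique(metric, classes):
    # At least two distinct classes share a metric value.
    return any(metric(p) == metric(q) for p, q in combinations(classes, 2))

def is_monotonic(metric, classes):
    # m(P) <= m(P+Q) and m(Q) <= m(P+Q) for every pair in the sample.
    return all(
        metric(p) <= metric(combine(p, q)) and metric(q) <= metric(combine(p, q))
        for p, q in combinations(classes, 2)
    )

def interaction_can_increase(metric, classes):
    # m(P) + m(Q) < m(P+Q) for at least one pair in the sample.
    return any(
        metric(p) + metric(q) < metric(combine(p, q))
        for p, q in combinations(classes, 2)
    )

classes = [
    frozenset({"open", "close"}),
    frozenset({"read", "write"}),
    frozenset({"open", "read", "seek"}),
]
```

On this sample the method count is noncoarse, nonunique and monotonic, but it can never satisfy "interaction increases complexity", since a union is never larger than the sum of its parts. A check on a finite sample can only refute a property, not prove it.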


Selecting Metrics

> Fast
— Scalable: you can’t afford O(n²) when n ≥ 1 million LOC

> Precise
— (e.g. #methods: do you count all methods, only public ones, also inherited ones?)
— Reliable: you want to compare apples with apples

> Code-based
— Scalable: you want to collect metrics several times
— Reliable: you want to avoid human interpretation

> Simple
— Complex metrics are hard to interpret


Assessing Maintainability

> Size of the system, system entities
— Class size, method size, inheritance
— The intuition: large entities impede maintainability

> Cohesion of the entities
— Class internals
— The intuition: changes should be local

> Coupling between entities
— Within inheritance: coupling between class-subclass
— Outside of inheritance
— The intuition: strong coupling impedes locality of changes


Sample Size and Inheritance Metrics

[Diagram: metamodel with entities Class, Method, Attribute and relations Inherit, BelongTo, Invoke, Access]

Inheritance Metrics
hierarchy nesting level (HNL)
# immediate children (NOC)
# inherited methods, unmodified (NMI)
# overridden methods (NMO)

Class Size Metrics
# methods (NOM)
# instance attributes (NIA, NCA)
# sum of method size (WMC)

Method Size Metrics
# invocations (NOI)
# statements (NOS)
# lines of code (LOC)


Sample Class Size

> (NIV) [Lore94] Number of Instance Variables
> (NCV) [Lore94] Number of Class Variables (static)
> (NOM) [Lore94] Number of Methods (public, private, protected)
> (LOC) Lines of Code
> (NSC) Number of Semicolons, [Li93] Number of Statements
> (WMC) [Chid94] Weighted Method Count
— WMC = ∑ c_i
— where c_i is the complexity of method i (number of exits or McCabe Cyclomatic Complexity Metric)
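As a small worked example of the WMC sum, the sketch below uses McCabe's cyclomatic complexity as the weight, computed as the number of decision points plus one (which holds for a single-entry, single-exit method). The class and its decision counts are made-up illustration data.

```python
def cyclomatic_complexity(decision_points):
    """McCabe complexity of a structured method: decision points + 1."""
    return decision_points + 1

def wmc(decisions_per_method):
    """Weighted Method Count: sum of the complexities of all methods of a class."""
    return sum(cyclomatic_complexity(d) for d in decisions_per_method.values())

# Hypothetical class: method name -> number of decision points (if, while, case, ...)
account = {"deposit": 1, "withdraw": 2, "balance": 0}
account_wmc = wmc(account)  # (1 + 1) + (2 + 1) + (0 + 1) = 6
```

With every weight fixed to 1, WMC degenerates to NOM, which is one reason the choice of weight has to be stated whenever WMC values are compared.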


Hierarchy Layout

> (HNL) [Chid94] Hierarchy Nesting Level, (DIT) [Li93] Depth of Inheritance Tree
> HNL, DIT = max hierarchy level
> (NOC) [Chid94] Number of Children
> (WNOC) Total Number of Children
> (NMO, NMA, NMI, NME) [Lore94] Number of Methods Overridden, Added, Inherited, Extended (super call)
> (SIX) [Lore94]
— SIX(C) = NMO * HNL / NOM
— Weighted percentage of overridden methods
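The SIX formula is easy to evaluate by hand; the following sketch only spells out the arithmetic. The input values are invented, and returning 0 for a class without methods is an assumption made here to avoid a division by zero.

```python
def six(nmo, hnl, nom):
    """SIX(C) = NMO * HNL / NOM for one class."""
    if nom == 0:
        return 0.0  # assumption: a class without methods specializes nothing
    return nmo * hnl / nom

# Hypothetical class: 3 overridden methods, nesting level 2, 12 methods in total
example_six = six(nmo=3, hnl=2, nom=12)  # 3 * 2 / 12 = 0.5
```

The same number of overridden methods weighs more the deeper the class sits in the hierarchy, which is what "weighted percentage" refers to.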


Method Size

> (MSG) Number of Message Sends
> (LOC) Lines of Code
> (MCX) Method Complexity
— Total complexity / total number of methods
— Weights: API call = 5, assignment = 0.5, arithmetic op = 2, message with params = 3, ...


Sample Metrics: Class Cohesion

> (LCOM) Lack of Cohesion in Methods
— [Chidamber 94] for definition
— [Hitz 95] for critique

Ii = set of instance variables used by method Mi
let P = { (Ii, Ij) | Ii ∩ Ij = ∅ }
    Q = { (Ii, Ij) | Ii ∩ Ij ≠ ∅ }
if all the sets are empty, P is empty
LCOM = |P| - |Q| if |P| > |Q|
       0 otherwise

> Tight Class Cohesion (TCC)
> Loose Class Cohesion (LCC)
— [Bieman 95] for definition
— Measure method cohesion across invocations
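The LCOM definition above translates almost literally into code. This is a minimal sketch: the input format (method name mapped to the set of instance variables it uses) and the example class are assumptions for illustration.

```python
from itertools import combinations

def lcom(usage):
    """LCOM = |P| - |Q| if |P| > |Q|, else 0.

    usage maps each method name to the set of instance variables it uses.
    P counts method pairs sharing no variable, Q counts pairs sharing at least one.
    """
    sets = [frozenset(v) for v in usage.values()]
    if not any(sets):
        return 0  # all sets are empty, so P is empty
    p = q = 0
    for a, b in combinations(sets, 2):
        if a & b:
            q += 1
        else:
            p += 1
    return p - q if p > q else 0

# Hypothetical class mixing two unrelated responsibilities
shape = {
    "area": {"width", "height"},
    "perimeter": {"width", "height"},
    "name": {"label"},
    "rename": {"label"},
}
shape_lcom = lcom(shape)  # 6 pairs: 2 share variables, 4 do not, so 4 - 2 = 2
```

Because the result is clamped at 0, very different classes all score 0, which is one of the points raised in the [Hitz 95] critique.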


Sample Metrics: Class Coupling (i)

> Coupling Between Objects (CBO)
— [Chidamber 94a] for definition
— [Hitz 95a] for a discussion
— Number of other classes to which it is coupled

> Data Abstraction Coupling (DAC)
— [Li 93] for definition
— Number of ADTs defined in a class

> Change Dependency Between Classes (CDBC)
— [Hitz 96a] for definition
— Impact of changes from a server class (SC) to a client class (CC)


Sample Metrics: Class Coupling (ii)

> Locality of Data (LD)
— [Hitz 96] for definition

LD = ∑ |Li| / ∑ |Ti|
Li = non-public instance variables
     + inherited protected variables of the superclass
     + static variables of the class
Ti = all variables used in Mi, except non-static local variables
Mi = methods without accessors
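The LD ratio can be illustrated as follows. The sketch assumes that the sets Li and Ti have already been extracted for each non-accessor method Mi; the example data and the value returned when no variable is used at all are assumptions.

```python
def locality_of_data(methods):
    """LD = sum(|Li|) / sum(|Ti|) over all non-accessor methods Mi.

    methods is a list of (Li, Ti) pairs, where Li holds the class-local
    variables a method uses and Ti holds all variables it uses.
    """
    total_local = sum(len(local) for local, _ in methods)
    total_used = sum(len(used) for _, used in methods)
    if total_used == 0:
        return 1.0  # assumption: using no data at all counts as fully local
    return total_local / total_used

# Hypothetical class with two non-accessor methods
methods = [
    ({"balance"}, {"balance", "rate"}),          # "rate" belongs to another class
    ({"balance", "owner"}, {"balance", "owner"}),
]
ld = locality_of_data(methods)  # (1 + 2) / (2 + 2) = 0.75
```

A value close to 1 suggests that the methods mostly work on data the class owns itself.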


The Trouble with Coupling and Cohesion

> Coupling and Cohesion are intuitive notions
— Cf. “computability”
— E.g., is a library of mathematical functions “cohesive”?
— E.g., is a package of classes that subclass framework classes cohesive? Is it strongly coupled to the framework package?


Conclusion: Metrics for Quality Assessment

> Can internal product metrics reveal which components have good/poor quality?

> Yes, but...
— Not reliable
– false positives: “bad” measurements, yet good quality
– false negatives: “good” measurements, yet poor quality
— Heavyweight approach
– Requires team to develop (customize?) a quantitative quality model
– Requires definition of thresholds (trial and error)
— Difficult to interpret
– Requires complex combinations of simple metrics

> However...
— Cheap once you have the quality model and the thresholds
— Good focus (± 20% of components are selected for further inspection)

> Note: focus on the most complex components first!


Roadmap

> Metrics
> Object-Oriented Metrics in Practice
— Detection strategies, filters and composition
— Sample detection strategies: God Class …

> Duplicated Code

Michele Lanza and Radu Marinescu, Object-Oriented Metrics in Practice, Springer-Verlag, 2006



Detection strategy

> A detection strategy is a metrics-based predicate to identify candidate software artifacts that conform to (or violate) a particular design rule


Filters and composition

> A data filter is a predicate used to focus attention on a subset of interest of a larger data set
— Statistical filters
– E.g., top and bottom 25% are considered outliers
— Other relative thresholds
– I.e., other percentages to identify outliers (e.g., top 10%)
— Absolute thresholds
– I.e., fixed criteria, independent of the data set

> A useful detection strategy can often be expressed as a composition of data filters
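Filters and their composition can be sketched as plain predicates. Everything concrete below is hypothetical: the helper names, the metric values, and the thresholds (top 25% by WMC combined with more than 50 methods) are chosen only to show the mechanics, and are not a detection strategy taken from the slides.

```python
def top_fraction(values, fraction):
    """Relative filter: accepts the names ranked in the top `fraction` by value."""
    ranked = sorted(values, key=values.get, reverse=True)
    keep = set(ranked[: max(1, round(len(ranked) * fraction))])
    return lambda name: name in keep

def higher_than(values, threshold):
    """Absolute filter: accepts names whose value exceeds a fixed threshold."""
    return lambda name: values[name] > threshold

def all_of(*filters):
    """Composition: a name passes only if every filter accepts it."""
    return lambda name: all(f(name) for f in filters)

# Made-up measurements for four classes
wmc = {"Facade": 120, "Parser": 45, "Point": 4, "Util": 60}
nom = {"Facade": 90, "Parser": 20, "Point": 6, "Util": 35}

strategy = all_of(top_fraction(wmc, 0.25), higher_than(nom, 50))
suspects = [name for name in wmc if strategy(name)]  # ["Facade"]
```

The result is a list of candidates for inspection, not a verdict: the thresholds still have to be calibrated for the system at hand.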


God Class

> A God Class centralizes intelligence in the system
— Impacts understandability
— Increases system fragility


ModelFacade (ArgoUML)

> 453 methods
> 114 attributes
> over 3500 LOC
> all methods and all attributes are static


Feature Envy

> Methods that are more interested in data of other classes than their own [Fowler et al. 99]


ClassDiagramLayouter


Data Class

> A Data Class provides data to other classes but little or no functionality of its own


Data Class (2)


Property


Shotgun Surgery

> A change in an operation implies many (small) changes to a lot of different operations and classes


Project


Roadmap

> Metrics
> Object-Oriented Metrics in Practice
> Duplicated Code
— Detection techniques
— Visualizing duplicated code


Code is Copied

Small Example from the Mozilla Distribution (Milestone 9)Extract from /dom/src/base/nsLocation.cpp

NS_IMETHODIMP
LocationImpl::GetPathname(nsString
{
  nsAutoString href;
  nsIURI *url;
  nsresult result = NS_OK;

  result = GetHref(href);
  if (NS_OK == result) {
#ifndef NECKO
    result = NS_NewURL(&url, href);
#else
    result = NS_NewURI(&url, href);
#endif // NECKO
    if (NS_OK == result) {
#ifdef NECKO
      char* file;
      result = url->GetPath(&file);
#else
      const char* file;
      result = url->GetFile(&file);
#endif
      if (result == NS_OK) {
        aPathname.SetString(file);
#ifdef NECKO
        nsCRT::free(file);
#endif
      }
      NS_IF_RELEASE(url);
    }
  }

  return result;
}

NS_IMETHODIMP
LocationImpl::SetPathname(const nsString
{
  nsAutoString href;
  nsIURI *url;
  nsresult result = NS_OK;

  result = GetHref(href);
  if (NS_OK == result) {
#ifndef NECKO
    result = NS_NewURL(&url, href);
#else
    result = NS_NewURI(&url, href);
#endif // NECKO
    if (NS_OK == result) {
      char *buf = aPathname.ToNewCString();
#ifdef NECKO
      url->SetPath(buf);
#else
      url->SetFile(buf);
#endif
      SetURL(url);
      delete[] buf;
      NS_RELEASE(url);
    }
  }

  return result;
}

NS_IMETHODIMP
LocationImpl::GetPort(nsString& aPort)
{
  nsAutoString href;
  nsIURI *url;
  nsresult result = NS_OK;

  result = GetHref(href);
  if (NS_OK == result) {
#ifndef NECKO
    result = NS_NewURL(&url, href);
#else
    result = NS_NewURI(&url, href);
#endif // NECKO
    if (NS_OK == result) {
      aPort.SetLength(0);
#ifdef NECKO
      PRInt32 port;
      (void)url->GetPort(&port);
#else
      PRUint32 port;
      (void)url->GetHostPort(&port);
#endif
      if (-1 != port) {
        aPort.Append(port, 10);
      }
      NS_RELEASE(url);
    }
  }

  return result;
}


How Much Code is Duplicated?

Usual estimates: 8 to 12% in normal industrial code
15 to 25% is already a lot!

Case Study        LOC        Duplication without comments   Duplication with comments
gcc               460’000    8.7%                           5.6%
Database Server   245’000    36.4%                          23.3%
Payroll           40’000     59.3%                          25.4%
Message Board     6’500      29.4%                          17.4%


What is Duplicated Code?

> Duplicated Code = Source code segments that are found in different places of a system.
— in different files
— in the same file but in different functions
— in the same function

> The segments must contain some logic or structure that can be abstracted, i.e.,

...computeIt(a,b,c,d);...
...computeIt(w,x,y,z);...
is not considered duplicated code.

...getIt(hash(tail(z)));...
...getIt(hash(tail(a)));...
could be abstracted to a new function

> Copied artifacts range from expressions, to functions, to data structures, and to entire subsystems.


Copied Code Problems

> General negative effect
— Code bloat

> Negative effects on Software Maintenance
— Copied defects
— Changes take double, triple, quadruple, ... work
— Dead code
— Adds to the cognitive load of future maintainers

> Copying as additional source of defects
— Errors in the systematic renaming produce unintended aliasing

> Metaphorically speaking:
— Software Aging, “hardening of the arteries”
— “Software Entropy” increases: even small design changes become very difficult to effect


Code Duplication Detection

Nontrivial problem:
• No a priori knowledge about which code has been copied
• How to find all clone pairs among all possible pairs of segments?

[Diagram: Lexical Equivalence, Syntactical Equivalence, Semantic Equivalence]


General Schema of Detection Process

Source Code -> (Transformation) -> Transformed Code -> (Comparison) -> Duplication Data

Author            Level         Transformed Code        Comparison Technique
Johnson 94        Lexical       Substrings              String-Matching
Ducasse 99        Lexical       Normalized Strings      String-Matching
Baker 95          Syntactical   Parameterized Strings   String-Matching
Mayrand 96        Syntactical   Metric Tuples           Discrete comparison
Kontogiannis 97   Syntactical   Metric Tuples           Euclidean distance
Baxter 98         Syntactical   AST                     Tree-Matching


Recall and Precision
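Only the title of this slide survives in the transcript. As a reminder of the standard definitions: precision is the fraction of reported clones that are real clones, and recall is the fraction of real clones that were reported. The sketch below computes both for a set of clone candidates; the clone identifiers are invented, and returning 1.0 for an empty set is a convention chosen here.

```python
def precision_recall(reported, actual):
    """Return (precision, recall) for reported clones against the real ones."""
    reported, actual = set(reported), set(actual)
    true_positives = len(reported & actual)
    precision = true_positives / len(reported) if reported else 1.0
    recall = true_positives / len(actual) if actual else 1.0
    return precision, recall

# Hypothetical clone pairs, identified by made-up labels
reported = {"a.c:10~b.c:40", "a.c:55~a.c:90", "c.c:5~d.c:5"}
actual = {"a.c:10~b.c:40", "a.c:55~a.c:90", "e.c:1~f.c:1", "e.c:9~f.c:30"}

precision, recall = precision_recall(reported, actual)
# 2 of the 3 reported clones are real; 2 of the 4 real clones were found
```

In practice the set of real clones is rarely known in advance, so recall can usually only be estimated on a manually inspected sample.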


Simple Detection Approach (i)

> Assumption:
– Code segments are just copied and changed at a few places

> Noise elimination transformation
– remove white space, comments
– remove lines that contain uninteresting code elements (e.g., just ‘else’ or ‘}’)

Before the transformation:

…
//assign same fastid as container
fastid = NULL;
const char* fidptr = get_fastid();
if(fidptr != NULL) {
  int l = strlen(fidptr);
  fastid = new char[ l + 1 ];

After the transformation:

…
fastid=NULL;
constchar*fidptr=get_fastid();
if(fidptr!=NULL)
intl=strlen(fidptr)
fastid = newchar[l+]
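The noise elimination step on this slide can be sketched in a few lines. This is an illustrative Python version, not the authors' script: the set of unwanted lines is an assumption, and comment markers inside string literals are not handled.

```python
import re

# Assumed set of "uninteresting" lines, checked after whitespace removal
UNWANTED = {"else", "{", "}", ";", "return", "return;"}

def normalize(lines):
    """Yield (line_number, normalized_text) for every interesting line."""
    in_comment = False
    for number, line in enumerate(lines, start=1):
        code = ""
        # Strip /* ... */ comments, which may span several lines.
        while True:
            if in_comment:
                end = line.find("*/")
                if end < 0:
                    line = ""
                    break
                line = line[end + 2:]
                in_comment = False
            else:
                start = line.find("/*")
                if start < 0:
                    break
                code += line[:start]
                line = line[start + 2:]
                in_comment = True
        code += line
        code = re.sub(r"//.*$", "", code)  # strip // comments
        code = re.sub(r"\s+", "", code)    # strip all white space
        if code and code not in UNWANTED:
            yield number, code

source = [
    "//assign same fastid as container",
    "fastid = NULL;",
    "const char* fidptr = get_fastid();",
    "if(fidptr != NULL) {",
    "  int l = strlen(fidptr); /* length */",
    "}",
]
normalized = list(normalize(source))
```

The original line numbers are kept next to the normalized text so that any match found later can be reported in terms of the untransformed source.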


Simple Detection Approach (ii)

> Code Comparison Step
— Line based comparison (Assumption: Layout did not change during copying)
— Compare each line with each other line.
— Reduce search space by hashing:
– Preprocessing: Compute the hash value for each line
– Actual Comparison: Compare all lines in the same hash bucket

> Evaluation of the Approach
— Advantages: Simple, language independent
— Disadvantages: Difficult interpretation
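The hashing idea can be illustrated with a dictionary, which hashes each key and only compares keys that land in the same bucket. This is a minimal sketch assuming the lines were already normalized; the file names and contents are made up.

```python
from collections import defaultdict

def find_duplicate_lines(files):
    """Map each line occurring more than once to its (file, line number) locations.

    files maps a file name to its list of already normalized lines.
    """
    buckets = defaultdict(list)
    for name, lines in files.items():
        for number, text in enumerate(lines, start=1):
            buckets[text].append((name, number))
    return {text: locs for text, locs in buckets.items() if len(locs) > 1}

files = {
    "a.c": ["x=1;", "foo(x);", "y=2;"],
    "b.c": ["foo(x);", "z=3;", "x=1;"],
}
duplicates = find_duplicate_lines(files)
```

This avoids comparing every line with every other line, but it reports isolated matching lines, which is why the raw output is hard to interpret without grouping matches into longer sequences.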


A Perl script for C++ (i)

$equivalenceClassMinimalSize = 1;
$slidingWindowSize = 5;
$removeKeywords = 0;
@keywords = qw(if then else);
$keywordsRegExp = join '|', @keywords;

@unwantedLines = qw( else return return; { } ; );
push @unwantedLines, @keywords;

while (<>) {
  chomp;
  $totalLines++;

  # remove comments of type /* */
  my $codeOnly = '';
  while(($inComment && m|\*/|) || (!$inComment && m|/\*|)) {
    unless($inComment) { $codeOnly .= $` }
    $inComment = !$inComment;
    $_ = $';
  }
  $codeOnly .= $_ unless $inComment;
  $_ = $codeOnly;

  s|//.*$||;                                  # remove comments of type //
  s/\s+//g;                                   # remove white space
  s/$keywordsRegExp//og if $removeKeywords;   # remove keywords


A Perl script for C++ (ii)

  $codeLines++;
  push @currentLines, $_;
  push @currentLineNos, $.;
  if($slidingWindowSize < @currentLines) {
    shift @currentLines;
    shift @currentLineNos;
  }
  #print STDERR "Line $totalLines >$_<\n";
  my $lineToBeCompared = join '', @currentLines;
  my $lineNumbersCompared = "<$ARGV>"; # append the name of the file
  $lineNumbersCompared .= join '/', @currentLineNos;
  #print STDERR "$lineNumbersCompared\n";
  if($bucketRef = $eqLines{$lineToBeCompared}) {
    push @$bucketRef, $lineNumbersCompared;
  } else {
    $eqLines{$lineToBeCompared} = [ $lineNumbersCompared ];
  }
  if(eof) { close ARGV } # Reset linenumber-count for next file

• Handles multiple files
• Removes comments and white spaces
• Controls noise (if, {,)
• Granularity (number of lines)
• Possible to remove keywords


Output Sample

Lines:
create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
create_property(pd,pnElttype,stReference,true,*iEltType);
create_property(pd,pnMinelt,stInteger,true,*iMinelt);
create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
create_property(pd,pnOwnership,stBool,true,*iOwnership);
Locations:
</face/typesystem/SCTypesystem.C>6178/6179/6180/6181/6182
</face/typesystem/SCTypesystem.C>6198/6199/6200/6201/6202

Lines:
create_property(pd,pnSupertype,stReference,true,*iSupertype);
create_property(pd,pnImplObjects,stReference,false,*iImplObjects);
create_property(pd,pnElttype,stReference,true,*iEltType);
create_property(pd,pMinelt,stInteger,true,*iMinelt);
create_property(pd,pnMaxelt,stInteger,true,*iMaxelt);
Locations:
</face/typesystem/SCTypesystem.C>6177/6178
</face/typesystem/SCTypesystem.C>6229/6230

Lines = duplicated lines
Locations = file names and line numbers


Enhanced Simple Detection Approach

> Code Comparison Step
— As before, but now
– Collect consecutive matching lines into match sequences
– Allow holes in the match sequence

> Evaluation of the Approach
— Advantages
– Identifies more real duplication, language independent
— Disadvantages
– Less simple
– Misses copies with (small) changes on every line
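Collecting consecutive matching lines into match sequences can be sketched as below. This naive version compares two lists of normalized lines in quadratic time and does not yet allow holes; the function name, the zero-based indices, and the sample data are assumptions for illustration.

```python
def match_sequences(a, b, min_length=2):
    """Return (start_a, start_b, length) for maximal runs of equal consecutive lines.

    Indices are zero-based positions in the two line lists.
    """
    runs = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Start a run only where the previous pair of lines does not match,
            # so that every run is reported once, at its first line.
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length >= min_length:
                runs.append((i, j, length))
    return runs

a = ["p", "q", "r", "s", "t"]
b = ["x", "q", "r", "s", "y", "t"]
runs = match_sequences(a, b)  # the isolated match on "t" is too short to report
```

Each reported run corresponds to a diagonal segment in a dotplot of the two files; allowing holes then amounts to joining runs that lie close together on the same diagonal.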


Abstraction

— Abstracting selected syntactic elements can increase recall, at the possible cost of precision
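One way to picture this trade-off is to replace every identifier and numeric literal by a placeholder before comparing lines. The sketch below is an assumption-laden illustration: the keyword list is a small made-up subset, and the regular expression is far simpler than a real lexer.

```python
import re

# Small, assumed subset of keywords that are kept as they are
KEYWORDS = {"if", "else", "while", "for", "return", "int", "char", "const"}

def abstract_line(line):
    """Replace identifiers by 'id' and numeric literals by 'num'."""
    def replace(match):
        word = match.group(0)
        if word in KEYWORDS:
            return word
        return "num" if word[0].isdigit() else "id"
    return re.sub(r"[A-Za-z_]\w*|\d+", replace, line)

# Higher recall: a copy with renamed variables now matches its original
renamed_copy_matches = abstract_line("total=price*3;") == abstract_line("sum=cost*7;")

# Lower precision: two unrelated calls with four arguments match as well
unrelated_calls_match = (
    abstract_line("computeIt(a,b,c,d);") == abstract_line("printAll(w,x,y,z);")
)
```

Both comparisons succeed, which shows the gain in recall and the loss in precision at the same time.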


Metrics-based detection strategy

> Duplication is significant if:
— It is the largest possible duplication chain uniting all exact clones that are close enough to each other.
— The duplication is large enough.


Automated detection in practice

> Wettel [MSc thesis, 2004] uses three thresholds:
— Minimum clone length: the minimum number of lines present in a clone (e.g., 7)
— Maximum line bias: the maximum number of lines in between two exact chunks (e.g., 2)
— Minimum chunk size: the minimum number of lines of an exact chunk (e.g., 3)
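The three thresholds can be combined roughly as follows. This is a simplified sketch and not the algorithm from the thesis: exact chunks are given as (start_a, start_b, length) triples, they are chained greedily in order of their position, and the clone length is taken as the shorter of the two spans. Those choices, like the sample data, are assumptions.

```python
def chain_clones(chunks, min_clone_length=7, max_line_bias=2, min_chunk_size=3):
    """Chain nearby exact chunks and keep the chains that are long enough.

    Returns (start_a, start_b, end_a, end_b) tuples with exclusive end positions.
    """
    usable = sorted(c for c in chunks if c[2] >= min_chunk_size)
    chains = []
    for start_a, start_b, length in usable:
        if chains:
            first_a, first_b, end_a, end_b = chains[-1]
            gap_a, gap_b = start_a - end_a, start_b - end_b
            # Extend the current chain if the hole is small enough in both files.
            if 0 <= gap_a <= max_line_bias and 0 <= gap_b <= max_line_bias:
                chains[-1] = (first_a, first_b, start_a + length, start_b + length)
                continue
        chains.append((start_a, start_b, start_a + length, start_b + length))
    return [c for c in chains if min(c[2] - c[0], c[3] - c[1]) >= min_clone_length]

# Made-up exact chunks between two files
chunks = [(10, 50, 4), (15, 55, 3), (30, 80, 2), (40, 90, 5)]
clones = chain_clones(chunks)
```

Here the first two chunks are joined across a one-line hole into a clone of 8 lines, the chunk of 2 lines is discarded as too small, and the isolated chunk of 5 lines is too short to count as a clone.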

Mihai Balint, Tudor Gîrba and Radu Marinescu, “How Developers Copy,” ICPC 2006


Visualization of Duplicated Code

> Visualization provides insights into the duplication situation
— A simple version can be implemented in three days
— Scalability issue

> Dotplots
— Technique from DNA Analysis
— Code is put on vertical as well as horizontal axis
— A match between two elements is a dot in the matrix

[Figure: dotplot patterns for Exact Copies, Copies with Variations, Inserts/Deletes, Repetitive Code Elements]
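A text-only dotplot takes just a few lines. This sketch puts one sequence of code elements on each axis and marks a match with `*`; the input is a toy sequence in which the elements a, b, c are repeated once.

```python
def dotplot(a, b):
    """Return one string per element of a, with '*' where it equals an element of b."""
    return ["".join("*" if x == y else "." for y in b) for x in a]

elements = list("abcabc")
rows = dotplot(elements, elements)
plot = "\n".join(rows)
# *..*..
# .*..*.
# ..*..*
# *..*..
# .*..*.
# ..*..*
```

Comparing a sequence with itself always yields the main diagonal; the two shorter diagonals next to it are the visual signature of an exact copy.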


Visualization of Copied Code Sequences

[Figure: dotplot of File A against File B]

Detected Problem
File A contains two copies of a piece of code
File B contains another copy of this code

Possible Solution
Extract Method

All examples are made using Duploc from an industrial case study (1 Mio LOC C++ System)


Visualization of Repetitive Structures

Detected Problem
4 object factory clones: a switch statement over a type variable is used to call individual construction code

Possible Solution
Strategy Method


Visualization of Cloned Classes

[Figure: dotplot of Class A against Class B]

Detected Problem
Class A is an edited copy of class B. Editing & Insertion

Possible Solution
Subclassing …


Visualization of Clone Families

[Figure: dotplots, Overview and Detail]

20 classes implementing lists for different data types


Conclusion

> Duplicated code is a real problem
— makes a system progressively harder to change

> Detecting duplicated code is a hard problem
— some simple techniques can help
— tool support is needed

> Visualization of code duplication is useful
— basic tool support is easy to build (e.g., 3 days with rapid-prototyping)

> Curing duplicated code is an active research area


License

> http://creativecommons.org/licenses/by-sa/2.5/

Attribution-ShareAlike 2.5

You are free:
• to copy, distribute, display, and perform the work
• to make derivative works
• to make commercial use of the work

Under the following conditions:

Attribution. You must attribute the work in the manner specified by the author or licensor.

Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.

• For any reuse or distribution, you must make clear to others the license terms of this work.
• Any of these conditions can be waived if you get permission from the copyright holder.

Your fair use and other rights are in no way affected by the above.
