2001 IEEE International Conference on Software Maintenance (ICSM'01).

35
1 Comparing the Decompositions Produced by Software Clustering Algorithms using Similarity Measurements 2001 IEEE International Conference on Software Maintenance (ICSM'01). Brian S. Mitchell & Spiros Mancoridis Math & Computer Science, Drexel University

description

Comparing the Decompositions Produced by Software Clustering Algorithms using Similarity Measurements. 2001 IEEE International Conference on Software Maintenance (ICSM'01). Brian S. Mitchell & Spiros Mancoridis Math & Computer Science, Drexel University. Motivation. - PowerPoint PPT Presentation

Transcript of 2001 IEEE International Conference on Software Maintenance (ICSM'01).

Page 1: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

1

Comparing the Decompositions Produced by Software Clustering

Algorithms using Similarity Measurements

2001 IEEE International Conference on Software Maintenance (ICSM'01).

Brian S. Mitchell & Spiros MancoridisMath & Computer Science, Drexel University

Page 2: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

2Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Motivation

Using module dependencies when determining the similarity between two decompositionsis a good idea…

Page 3: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

3Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Clustering the Structure of a System (1)Given the structure of a system…

Page 4: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

4Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Clustering the Structure of a System (2)

The goal is to partition the system structure graphinto clusters…

The clusters shouldrepresent the

subsystems

Page 5: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

5Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Clustering the Structure of a System (3)

But how do we know that the clustering result is good?

Page 6: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

6Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Ways to Evaluate Software Clustering Results…

Assess it against a mental modelAssess it against a benchmark standardTechniques: Subjective Opinions Similarity Measurements

Given a software clustering result, we can:

Page 7: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

7Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Example: How “Similar” are these Decompositions?

M1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

Green Edges:Similarity still the same…Red Edges:Not as similar…Conclusions:Once we add the red edges the similarity between PA and PB decreases

PA

PB

Blue Edges:Similarity still the same…

Page 8: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

8Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Observations

Edges are important for determining the similarity between decompositionsExisting measurements don’t consider edges: Precision / Recall (similarity) MoJo (distance)

Our idea: Use the edges to determine similarity

Page 9: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

9Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Research ObjectivesCreate new similarity measurements that use dependencies (edges) EdgeSim (similarity) MeCl (distance)

Evaluate the new similarity measurements against MoJo & Precision/RecallUse similarity measurements to support evaluation of software clustering results (see our WCRE’01 paper)

Page 10: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

10Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Example: How “Similar” are these Decompositions?

M1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

Add Green Edges:PR, MoJo, MeCl & EdgeSim unchanged.

Add Red Edges:PR, MoJo unchanged. EdgeSim, MeCl reduced.

PA

PB

Add Blue Edges:PR, MoJo, MeCl & EdgeSim unchanged.

Page 11: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

11Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Definitions

M1

M2

M3

M4

Internal/Intra-Edge: Edge within a cluster

External/Inter-Edge: Edge between two clusters

Page 12: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

12Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

EdgeSim Example

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c f g

hd e

i j

k l

MDG

Page 13: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

13Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

EdgeSim Example

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c f g

hd e

i j

k l

MDG

Step 1:Find Common

Inter- and Intra-Edges

Page 14: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

14Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

EdgeSim Example

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c f g

hd e

i j

k l

MDG

CommonEdge Weight

Total EdgeWeight

=10

19= 53%

Page 15: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

15Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c f g

hd e

i j

k l

MDG

Page 16: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

16Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(AB)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

A1 A3A2

B1 B2

Page 17: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

17Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A1 B1)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

c

A1,1

A1 A3A2

B1 B2

U

Page 18: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

18Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A2 B1)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

A1,1 A2,1

A1 A3A2

B1 B2

U

Page 19: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

19Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A1 B2)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e

A1,1 A2,1

A1,2

A1 A3A2

B1 B2

U

Page 20: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

20Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A2 B2)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

A1,1 A2,1

A1,2 A2,2

A1 A3A2

B1 B2

U

Page 21: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

21Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(A3 B2)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

A1,1 A2,1

A1,2 A2,2

A3,2

A1 A3A2

B1 B2

U

Page 22: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

22Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

B1

B2

MeCl Example(AB)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

A1,1 A2,1

A1,2 A2,2

A3,2

A1 A3A2

B1 B2

Page 23: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

23Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example (AB)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

A1,1 A2,1

A1,2 A2,2

A3,2

A1 A3A2

B1 B2

B1

B2

Newly Introduced Inter-Edges

Page 24: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

24Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example(BA)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

B1,1 B1,2

B2,1 B2,2

B2,3

A1 A3A2

B1 B2

Page 25: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

25Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

A1 A2

A3

MeCl Example(BA)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

B1,1 B1,2

B2,1 B2,2

B2,3

A1 A3A2

B1 B2

Page 26: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

26Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Example (BA)

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

a b

cf g

d e h

k l

i j

B1,1 B1,2

B2,1 B2,2

B2,3

A1 A3A2

B1 B2

A1 A2

A3

Newly Introduced Inter-Edges

Page 27: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

27Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

MeCl Calculation

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

A1 A3A2

B1 B2

Inter-Edges Introduced

MeCl(AB):({b,e},{e,c},{g,h},{f,h})

MeCl(BA):({e,i},{h,j},{b,f},{c,f},{h,e})

MeCl= 1 -maxW(MAB, MBA)Total Edge Weight

MeCl = 1 -5

19 = 73.7%

Page 28: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

28Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Similarity Measurement RecapM1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

A1:

B1:

M1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

A2:

B2:

P1 P2

MoJo(P1) = MoJo(P2) = 87.5%

PR(P1) = PR(P2) = P:84.6%, R:68.7%, AVGPR=76.7%

Conclusion… P1 is equally similar to P2

Page 29: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

29Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Similarity Measurement RecapM1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

A1:

B1:

M1

M5

M2

M6

M3

M4

M7

M8

M1

M5

M2

M6

M3

M4

M7

M8

A2:

B2:

P1 P2

EdgeSim(P1)=77.8% EdgeSim(P2)=58.3%

MeCl(P1)=88.9% MeCl(P2)=66.7%

Conclusion… P1 is more similar than P2

Page 30: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

30Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Summary:EdgeSim & MeCl

EdgeSim: Rewards clustering algorithms for

preserving the edge types Penalizes clustering algorithms for

changing the edge types

MeCl: Rewards the clustering algorithm for

creating cohesive “subclusters”

Page 31: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

31Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Special Modules

i j

k la b

c

d e

f g

h

c

f

g

a

b

e

h

k

l

d

i

j

PA

PB

A1 A3A2

B1 B2

Omnipresent Modules:“Strong” Connection to other ModulesLibrary Modules:Always used by other modules, never use other modules

Isomorphic Modules:Modules equally connected to other subsystems

Page 32: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

32Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Special Modules

i

k la b

c f g

h

c

f

g

a

b

h

k

l

i

PA

PB

A1 A3A2

B1 B2

k

k

Omnipresent Modules:RemovedLibrary Modules:Removed

Isomorphic Modules:Replicated

Special Treatment of Special Treatment of Special Modules helps to Special Modules helps to determine the Similaritydetermine the Similarity

d

d

Page 33: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

33Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Case Study Overview

Clustering Algorithms

Similarity Evaluation Tool

Similarity Analysis

Precision/Recall

Source Codevoid main(){ printf(“hello”);}

Clustered Result

M1

M2

M3

M5M4

M6

M7 M8

MoJo

EdgeSim

MeClAverage, Variance, etc. based on 100 clustering runs…(4950 Evaluations)

Page 34: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

34Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Case Study ObservationsAll similarity measurements exhibit consistent behavior for the systems studied

Removal of “special” modules improved all similarity measurements Treating isomorphic modules specially only improved similarity slightlyEdgeSim and MeCl produced higher and less variable similarity values then Precision/Recall and MoJo

For all systems examined:If MeCl(SA) < MeCl(SB) then MoJo(SA) < MoJo(SB), PR(SA) < PR(SB), and EdgeSim(SA) < EdgeSim(SB)

Page 35: 2001 IEEE International Conference on Software Maintenance  (ICSM'01).

35Drexel University Software Engineering Research Group (SERG)http://serg.mcs.drexel.edu

Questions

Special Thanks To: AT&T Research Sun Microsystems DARPA NSF US Army