
An Empirical Investigation into Scenario Level Software

Evolution using Calling Context Trees

A DISSERTATION

SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR

THE AWARD OF THE DEGREE

OF

MASTER OF TECHNOLOGY

IN

COMPUTER SCIENCE AND ENGINEERING

SUBMITTED BY

Sarishty Gupta

Roll No. 14203008

UNDER THE SUPERVISION OF

Dr. Paramvir Singh

Assistant Professor

DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING

DR. B. R. AMBEDKAR NATIONAL INSTITUTE OF TECHNOLOGY

JALANDHAR – 144011, PUNJAB (INDIA)

JULY, 2016


DR. B. R. AMBEDKAR NATIONAL INSTITUTE OF TECHNOLOGY, JALANDHAR

CANDIDATE’S DECLARATION

I hereby certify that the work, which is being presented in the dissertation, entitled “An

Empirical Investigation into Scenario Level Software Evolution using Calling

Context Trees” by “Sarishty Gupta” in partial fulfillment of the requirements for the award of the degree of M.Tech. (Computer Science and Engineering) submitted to the Department

of Computer Science and Engineering of Dr. B R Ambedkar National Institute of

Technology, Jalandhar, is an authentic record of my own work carried out during a period

from August, 2015 to July, 2016 under the supervision of Dr. Paramvir Singh, Assistant

Professor. The matter presented in this dissertation has not been submitted by me in any

other University/Institute for the award of any degree.

Sarishty Gupta

Roll No. 14203008

This is to certify that the above statement made by the candidate is correct and true to the

best of my knowledge.

Dr. Paramvir Singh (Supervisor)

Assistant Professor

Department of Computer Science & Engineering

Dr. B. R. Ambedkar NIT, Jalandhar

The M.Tech (Dissertation) Viva-Voce examination of Sarishty Gupta, Roll No.

14203008, has been held on ____________ and accepted.

External Examiner Supervisor Head of Department


ACKNOWLEDGEMENTS

First and foremost, I would like to express my gratitude to my supervisor Dr. Paramvir Singh, Assistant Professor, for the useful comments, remarks and engagement throughout the learning process of this master's thesis. I cannot thank him enough for his tremendous support and help. He motivated and encouraged me throughout this work. Without his encouragement and guidance this project would not have materialized. I consider myself extremely fortunate to have had the chance to work under his supervision. In spite of his busy schedule, he was always approachable and took time out to guide me and give appropriate advice.

I also wish to thank wholeheartedly all the faculty members of the Department of Computer Science and Engineering, and especially Mr. Amit Dogra, for the invaluable knowledge they have imparted to me and for teaching its principles in the most exciting and enjoyable way. I also extend my thanks to the technical and administrative staff of the department for maintaining an excellent working facility.

I would like to thank my family for their continuous support and blessings throughout the entire process, both for keeping me harmonious and for helping me put the pieces together. I would also like to extend thanks to my friends for the useful discussions, constant support and encouragement during the whole period of this work.

I would like to thank almighty GOD for giving me enough strength and lifting me up through this phase of life.

Sarishty Gupta


ABSTRACT

Software evolution and maintenance play a crucial role in the software development life cycle. Software evolution is the process of change from the conception to the decommissioning of a software system. Software comprehension is needed for a better understanding of software functionality, and software metrics are one of the preferred ways to understand and control software systems. Software evolution can be better understood if analyzed at the user scenario level, and understanding the behavior of the participating classes across a set of software versions can be helpful in scenario comprehension. Scenario level analysis defines the behavior of a software system from a user-centric perspective.

Dynamic analysis techniques help in analyzing the run time behavior of programs.

The Calling Context Tree (CCT) provides complete information about the dynamic behavior of programs. CCTs are used in a wide range of software development processes and large applications, such as testing, debugging and error reporting, performance analysis, program analysis, security enforcement, and event logging. CCTs have never before been used to study scenario level software evolution.

This work empirically investigates whether CCT based metrics, such as the number of nodes and height, provide new insights into comprehending the evolution of scenarios. A set of four static, three dynamic, and four CCT metrics is analyzed to comprehend the evolution of eight scenarios across four open source Java applications. Correlation analysis and principal component analysis are used to analyze the relationships among the selected set of metrics. The results reveal that two out of four CCT metrics correlate highly with the selected static and dynamic metrics, and that the height of the CCT remains constant across multiple versions of the sample applications for each scenario. Therefore, CCT metrics provide useful information for scenario level evolution analysis.


CONTENTS

CERTIFICATE
ACKNOWLEDGEMENTS
ABSTRACT
LIST OF FIGURES
LIST OF TABLES
LIST OF ABBREVIATIONS

CHAPTER 1  INTRODUCTION
1.1 Software Evolution
1.1.1 Software Quality Attributes
1.1.2 Evolution Challenges
1.2 Software Comprehension
1.2.1 Purpose of Software Comprehension
1.2.2 Comprehension Challenges
1.3 User Scenarios
1.4 Software Design Quality Metrics
1.4.1 Static Metrics
1.4.2 Dynamic Metrics
1.5 Calling Context Profiling
1.5.1 Calling Context Tree
1.6 Motivation
1.7 Research Objectives
1.8 Thesis Outline

CHAPTER 2  LITERATURE REVIEW
2.1 Software Evolution
2.1.1 Lehman's laws of software evolution
2.1.2 Empirical studies on software evolution
2.1.3 Software metrics and evolution
2.2 Scenario based Work
2.3 Calling Context Trees
2.4 Chapter Summary

CHAPTER 3  EXPERIMENTAL DESIGN AND METHODOLOGY
3.1 Sample Applications
3.2 Selected Metrics
3.2.1 CCT Metrics
3.2.2 Dynamic Metrics
3.2.3 Static Metrics
3.3 Tools Used
3.4 Data Analysis Techniques
3.4.1 Correlation Analysis
3.4.2 Principal Component Analysis
3.5 Methodology
3.6 Chapter Summary

CHAPTER 4  RESULTS AND ANALYSIS
4.1 Evolution of CCT metrics
4.1.1 Evolution of NON
4.1.2 Evolution of NLN
4.1.3 Evolution of AH
4.1.4 Evolution of Height
4.2 Descriptive Statistics
4.3 Correlation Analysis
4.4 Principal Component Analysis
4.5 Result Summary
4.6 Chapter Summary

CHAPTER 5  CONCLUSIONS AND FUTURE WORK
5.1 Conclusions
5.2 Future Work

REFERENCES
APPENDIX A
APPENDIX B
APPENDIX C
APPENDIX D

LIST OF FIGURES

Figure 1.1 Representation of software evolution process
Figure 1.2 Example Java code
Figure 1.3 CCT representation
Figure 3.1 Use case scenario for DrawSWF, JHotDraw, Art of Illusion (Scenario 1)
Figure 3.2 Use case scenario for DrawSWF, JHotDraw, Art of Illusion (Scenario 2)
Figure 3.3 Use case scenario for Sunflow (Scenario 1)
Figure 3.4 Use case scenario for Sunflow (Scenario 2)
Figure 3.5 CCT generated by JP2 in the form of an XML file
Figure 3.6 FindBugs tool showing bug patterns
Figure 3.7 EclEmma tool showing the classes covered
Figure 3.8 CodePro AnalytiX showing the LOC metric
Figure 3.9 STAN tool showing the CBO metric
Figure 3.10 Methodology flow
Figure 4.1 Evolution of NON (Scenario 1)
Figure 4.2 Evolution of NLN (Scenario 1)
Figure 4.3 Evolution of AH (Scenario 1)
Figure 4.4 Evolution of Height (Scenario 1)

LIST OF TABLES

Table 2.1 Laws of software evolution
Table 3.1 Characteristics of Sample Applications
Table 4.1 CCT metrics for DrawSWF
Table 4.2 CCT metrics for Sunflow
Table 4.3 CCT metrics for JHotDraw
Table 4.4 CCT metrics for Art of Illusion
Table 4.5 Descriptive statistics for DrawSWF
Table 4.6 Descriptive statistics for Sunflow
Table 4.7 Descriptive statistics for JHotDraw
Table 4.8 Descriptive statistics for Art of Illusion
Table 4.9 Correlation matrix for DrawSWF
Table 4.10 Correlation matrix for Sunflow
Table 4.11 Correlation matrix for JHotDraw
Table 4.12 Correlation matrix for Art of Illusion
Table 4.13 PCA results for DrawSWF
Table 4.14 PCA results for Sunflow
Table 4.15 PCA results for JHotDraw
Table 4.16 PCA results for Art of Illusion


LIST OF ABBREVIATIONS

CCT Calling Context Tree

NON Number Of Nodes

NLN Number of Leaf Nodes

AH Average Height

IC Import Coupling

EC Export Coupling

NPC Number of Participating Classes

LOC Lines Of Code

CC Cyclomatic Complexity

CBO Coupling Between Object classes

NOB Number Of Bugs

PCA Principal Component Analysis


CHAPTER 1

INTRODUCTION

Software maintenance and evolution is a continuous process in the software development life

cycle to repair existing faults, eradicate deficiencies, enhance platform compatibility,

overcome complexity and increase user satisfaction. Successful software requires continuous

change that is triggered by evolving requirements, technologies, and stakeholder knowledge.

The imminent changes in client requirements render software evolution inevitable. Software

evolution is defined as “the dynamic behavior of software systems as they are maintained and

enhanced over their lifetimes” [1]. Software evolution has gained importance and attracted the attention of software developers over the years.

1.1 Software Evolution

Lehman and Belady [1] initially applied the term evolution to software in the 1970s, and since then most researchers have utilized this term to refer to the broader view of change in

software systems. Software evolution is concerned with the sequence of changes to a software

system over its lifetime; it incorporates both development and maintenance. Indeed, software

systems need to continuously evolve during their life cycle for different reasons: adding new

features to fulfill user requirements, changing business needs, presenting novel technologies,

correcting faults, enhancing quality, etc. [4, 5]. Software that cannot accommodate these changes in its environment faces an early demise. Although the dimension and extent of change cannot be fully predicted, introducing evolvability and changeability from the very beginning (in the software architecture and design process) improves the quality of the software and reduces maintenance cost.

Software evolution plays a major role in the overall lifecycle of a software system. As

software evolves, the changes made to the software must be deliberately managed. Software

evolution is based on the delivery of multiple releases. Each release introduces new functionality that is not present in earlier versions and provides valuable information to developers and users. As the working environment of software is not stable, the real success of a system rests not on the success of one or two releases, but on its ability to evolve gracefully as requirements change.

Software evolution is important since organizations have invested huge amounts of money in

their software and are now totally reliant on these systems. These systems are critical business assets, and organizations need to invest resources in system change to keep up the value of these assets. Consequently, most large organizations spend more on maintaining existing systems than on developing new ones. On the basis of an informal industry poll, Erlikh [6] observed that 85–90% of organizational software costs are evolution costs. Other surveys suggest that about two-thirds of software costs are evolution costs. Certainly, the cost of software change is a substantial part of the IT budget of all companies.

Figure 1.1: Representation of software evolution process

1.1.1 Software Quality Attributes

Quality attributes are used to characterize the anticipated quality level, and they often define the non-functional aspects of a software system. The overall quality of a system depends upon the interplay of different attributes. Different classifications of quality attributes are discussed in the literature. ISO/IEC 9126 defines six software quality attributes: functionality, reliability, efficiency, usability, maintainability and portability, while McCall described the following attributes and named them quality factors [68].

Correctness: The degree to which the software system meets its intended goals.

Reliability: The degree to which the software system can be expected to meet its intended goals.

Efficiency: The degree to which the software system uses the available resources efficiently to meet its intended goals.

Integrity: The extent to which the software system is secure in terms of authorization.

Usability: The degree to which the software system is easy to use, interact with and operate.

Maintainability: How easy it is to locate and fix an error in the system.

Testability: The degree of ease in ensuring the correctness of the software system.

Flexibility: How easy it is to modify or evolve the software system.

Portability: How easy it is to shift the software system from one hardware/software platform to another.

Reusability: The extent to which the software system, or components of it, can be used in other applications.

Interoperability: The capability of the software system to work with other systems without unusual effort.

Software quality attributes can also be classified as external or internal.

External quality attributes: Attributes that can be evaluated by executing the software. External attributes are visible to the users of the system; poor reliability, for instance, is visible to the user when the software system does not perform as expected.

Internal quality attributes: Attributes that are evaluated by inspecting the internal features of the software. Internal attributes concern the developers of the system; for example, during the operation of a system, users will not notice whether its components are loosely coupled.

1.1.2 Evolution Challenges

A number of evolution challenges are discussed in the literature by different researchers; a brief summary is given by Tom Mens [69] as follows:

Upholding and improving software quality: With the passage of time, the quality of software decreases as it is adapted to new requirements, which makes the structure of the system more complex. This phenomenon was named 'software aging' by David Parnas [70]. According to Manny Lehman [13], if software is to be evolved, there should be strategies, tools and techniques that maintain or improve the quality of the system.

Need for a common evolution platform: Addressing the previous challenge for long-lived, large software systems demands highly capable tools and techniques. Building such tools is not a task that can easily be performed by a single individual or research group, so there is a need for a common platform on which research groups and individuals from all over the world can collaborate to build tools and techniques.

Requirement of tools for higher-order artifacts: The majority of evolution-supporting tools are available for low-level software artifacts such as source code. Tool support for higher-level artifacts, i.e. software requirement specifications, analyses, designs and architectures, is another challenge.

Co-evolution between higher and lower artifacts: As there is a strong relationship between higher and lower artifacts, evolution of any one artifact has an impact on the others. This gives rise to another problem, the need for co-evolution between lower and higher artifacts, so that they can be evolved in parallel.

Formal support for evolution: Formal methods are methods and techniques that use mathematical rigor for software specification, verification, validation and development activities. As user requirements continuously change, formal methods are required that can handle evolving user demands.

Evolution-oriented languages: The research and development of evolution-oriented languages, which treat the concept of change as an essential fact of life, is also a challenge. This is similar to object-oriented languages treating reusability as a first-class feature. Tom Mens et al. [69] observed that it is easier to integrate evolutionary aspects in dynamically typed languages than in statically typed languages. The classical modular development approach is also helpful in improving the evolvability of a program, but it has a limitation: the inability to add or replace a method in a class that is not defined in that particular module. Object-oriented languages like C++ and Java support this feature, but only to a limited extent.

Multi-language support: The number of formal languages for different kinds of software engineering activities, such as programming languages, modeling languages and specification languages, is increasing day by day. The development of a set of standards to improve the interoperability and coordination among them is also a challenge for software evolution.

Evolution as a fundamental part of the software life cycle: Conventional software life cycle models place little focus on the notion of evolution. The development of change-integrated life cycle models should be considered to improve evolvability.

Collection and manipulation of evolution records: Locating evolution records, i.e. bug reports, change requests, source code, configuration information, versioning repositories, error logs, documentation, etc., and understanding how evolution is recorded can help improve evolution. Providing the tools and techniques required to manipulate such huge amounts of data in good time is another evolution challenge.

Generalized laws of software evolution: Michael W. Godfrey [48] raised another issue: generalizing the laws of software evolution for open source as well as proprietary software. Experimenting on proprietary software is difficult because the owners of such software usually do not want their information to be exposed through their software. It is therefore another challenge to perform experiments on non-proprietary software and develop generalized laws that hold for both kinds of software.

1.2 Software Comprehension

Software comprehension is an elementary and expensive activity in software maintenance. A large part of software maintenance is software comprehension, which consumes a massive amount of time and effort [71]. It is essential to have a good understanding of the software

system as a whole to effectively make required changes. For better software comprehension, it

is essential to have detailed information about certain software aspects, such as problem

domain, execution effect, cause-effect relationship, product-environment relationship and

decision support features.

1.2.1 Purpose of Software Comprehension

The basic purpose of software comprehension is to gain a quick understanding of the system so that the requested change can be implemented in a way that neither disturbs the architecture of the software system nor hinders its future evolution [72]. When performing maintenance of a software system, it is not possible to make changes without a complete understanding of the system and the interactions within it.

When developers deal with a large and a rather complex software system, it is not easy to

make changes without having a complete understanding of the interactions and the relations

that exist between the different system components. Therefore, there arises an urgent need for

developers to comprehend the software. In general, the purpose of comprehension depends on the task of interest; that is, there must be some cause that drives the development team to comprehend the software system. For example, a developer may try to localize a bug or feature, or assess possible changes to an API. Most frequently, a specific concept or particular feature is inspected in the software, and this concept or feature is most often related to a user change request [56].

Source code contains a lot of information that is either peripheral or hidden by other

components. One useful approach that developers have suggested is to facilitate program

understanding and program maintenance by extracting and clearly representing the

information that is most important in source code. Software comprehension tools help engineers capture the benefit of newly added code. They are necessary because economic demands require a maintenance engineer to rapidly and successfully develop an understanding of the parts of the source code that are relevant to a maintenance request.

1.2.2 Comprehension Challenges

Tricky code (misplaced cleverness): It is often observed that programmers write tricky code which is very hard for another person to understand, and which later causes problems in comprehension. There can be many motivations behind writing tricky code, such as showing off intelligence or smartness, job security, etc. [73].

Different programming styles: Programming styles vary from company to company and from programmer to programmer. The growing dependency on software systems has made them larger in size, and it is often not possible for one programmer or a small group of programmers to write a large program. Due to the higher development cost of local service providers, software development companies also use offshore software development, where programmers participate globally. A large software program is therefore co-written by a group of programmers with different programming skills and experience, and such programs are very difficult to understand.

Poor naming conventions: Although program comprehension is not limited to the naming of identifiers, their importance cannot be denied. The use of meaningless, inconsistent and poor naming conventions can be a heavy burden and a source of poor comprehension. At the same time, meaningful, descriptive and self-documenting names alone are insufficient; the main concern is that a name should define the underlying concept clearly.

Program representation: Complex control flow and unordered blocks of code make comprehension harder, and unobvious or unclear dependencies make it more difficult for programmers to understand code. Grouping related statements and modules can smooth the process of comprehension. A good visual layout makes the code more understandable and modifiable in the absence of its author; layout devices such as whitespace, blank lines, alignment and indentation can be used.

Insufficient comments: It is well known that a lack of comments, as well as outdated comments, are major causes of increased maintenance cost and reduced comprehension. There are some problems with commenting code that are hard to cope with; it is often seen that the ratio of comments to source code does not remain stable as the source code evolves.

Dangerously deep nesting / deep inheritance tree (DIT): Excessive use of deep nesting is one of the major causes of unmanageable code comprehension. According to [74], only a few people are able to understand nesting that goes more than three levels deep. Similarly, the depth of the inheritance tree can affect understandability: if classes in a program are deeply inherited, they have more methods to inherit, which gives rise to complexity and makes the code harder to understand.

Concept location: A change request may consist of a problem statement or an innovative idea from the user. After understanding the change request, the next phase is to locate the specific part of the code that is responsible for the problem, or where the change should be implemented to get the required result. Identifying that particular part of the code is known as concept (or concern) location. Concept location is another challenging task in software comprehension because, for a large and complex program, it is neither feasible nor cost-effective to read all the code from scratch.

Code duplication: Code duplication is another common problem that severely complicates software comprehension. Writing code from scratch is undoubtedly harder than duplicating existing code, but software containing duplicated code is generally considered of poor quality. There can be several reasons for code duplication, for example, programmers find it easier and more reliable to reuse a pre-tested code fragment.

Identification of dead code: Legacy systems that have been maintained and evolved by different programmers over time usually contain a noteworthy amount of dead code. The dead code may contain methods and classes that are no longer used, but identifying such classes and methods is not an easy task. Moreover, when a legacy system undergoes maintenance, dead code becomes one of the major problems in understanding the code.

1.3 User Scenarios

Scenarios have gained attention in both research and practice as a way of grounding software-engineering projects in the users' work. Gough et al. [7] studied the use of scenarios and concluded that scenarios clearly succeeded in providing a means by which customer representatives, requirements engineers, and other stakeholders in the development process could work with system requirements. A scenario can be defined as a description of a possible set of events that might reasonably take place [8]. Software engineers look at scenarios as an effective means to discover user needs, to better integrate systems into work processes, and to systematically explore system behavior under both normal and exceptional situations. Scenarios are helpful because they give us separation from the present, widen our view of the future, and permit the construction of alternative futures. The principal purpose of developing scenarios is to stimulate thinking about possible events, the assumptions relating to these events, possible risks, opportunities, and strategies [2, 3].

Use case analysis is used to show the interactions between systems and users in a

particular environment. The first step in use case analysis is to determine the types of users or

other systems that will use the facilities of the system. These are called actors. An actor is a

role that a user or some other system plays when interacting with the system. The second step

in use case analysis is to determine the tasks that each actor can perform. Each task is called a

use case because it represents one particular way the system will be used. A use case is a

typical sequence of actions that an actor performs in order to complete a given task. For example, consider a use case scenario for leaving an automated car park (parking lot):

Actors: Car drivers

Use case: Exit car park, paying cash

Goals: To leave the parking lot after having paid the amount due.

Jarke et al. [2] spanned three areas, i.e. strategic management, human-computer interaction and software engineering, in their survey of how scenarios help in analysis and design. The characterizing property of a scenario is that it provides a detailed description of the activity that the user engages in when performing a specific task [9]. While scenarios can be used for various purposes throughout the software-engineering life cycle, they have received most attention as a design artifact for grounding conceptual design in the users' work. Scenarios preserve a real-world feel of the contents and flow of the users' work. This shows that scenarios succeed in describing the users' tasks and are useful to all the software engineers across their different responsibilities within the project [3].

1.4 Software Design Quality Metrics

Software metrics play a crucial part in understanding and controlling the overall software

engineering process. Software metrics help us to make significant evaluations for software

products and guide us in taking managerial and technical decisions like cost estimation,

quality assurance testing, budget planning, software debugging, software performance

optimization, and optimal personnel task assignments. Software metrics can, especially, be

used to analyze the evolution of software systems [10]. Metrics have, in fact, various characteristics that make them suitable for supporting evolution analysis. A large number of metrics have been proposed

for measuring various attributes of software systems such as size, complexity, cohesion and

coupling.

A large amount of research work has been done on structural measures for software maintainability and evolvability. Chidamber and Kemerer's [20] CK metrics suite is very helpful for structural measurement, but its limitation is that it operates only at the class level and not at the level of the complete software system. The CK suite uses the following class-level measures to assess the evolvability of a class:

Weighted methods per class (WMC)

Coupling between object classes (CBO)

Number of children (NOC)

Depth of inheritance tree (DIT)

Lack of cohesion in methods (LCOM)

Response for a class (RFC)

1.4.1 Static Metrics

Static metrics are obtainable at the early phases of the software development life cycle and deal with the structural features of software. These metrics are easy to gather. Static complexity metrics estimate the amount of effort needed to develop and maintain the code. Static metrics concentrate on static properties of the software, and a number of them have been proposed in the literature for measuring cohesion [20, 33-36], coupling [20, 37-39], and other attributes of object-oriented software using the source code or design, which are static in nature. Static metrics are able to evaluate various aspects of the complexity of the design or source code of a software system, but are unable to accurately predict the dynamic behavior of applications.

1.4.2 Dynamic Metrics

Dynamic metrics capture the dynamic behaviour of the software system and are computed on

the basis of data collected during actual execution of the system. A major benefit of dynamic metrics is their ability to measure more accurately the internal attributes of software, such as coupling and complexity, which have a direct impact on quality factors such as reliability, reusability, testability, maintainability, error rates and performance. Dynamic metrics can also deal with all object-oriented features and with dead code. Several dynamic metrics have been proposed for the measurement of coupling [40-43], cohesion [44, 45], and complexity [46, 47].
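
To make this concrete, the following minimal Java sketch (illustrative only; the class and method names are hypothetical and not the thesis's actual collection process) shows how one dynamic metric used later in this work, the Number of Participating Classes (NPC) for a scenario, could be derived from a recorded execution trace of "Class.method" events:

import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: derive NPC for one scenario from a list of "Class.method" trace events.
class DynamicMetricsSketch {
    static int numberOfParticipatingClasses(List<String> traceEvents) {
        Set<String> classes = new LinkedHashSet<>();
        for (String event : traceEvents) {
            int dot = event.lastIndexOf('.');
            // Keep only the class part of each event; events without a dot are kept as-is.
            classes.add(dot > 0 ? event.substring(0, dot) : event);
        }
        return classes.size();
    }
}

For instance, a trace containing "Editor.open", "Editor.save" and "Canvas.draw" would give an NPC of 2.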

1.5 Calling Context Profiling

Profiling analyses programs at runtime to gather performance-related information like method

call counts and execution time. Profiling data can be obtained with two different methods: instrumentation and sampling. With instrumentation, the program code is modified

to enable the collection of profiling data, for example by adding code at method calls and

returns. These modifications are not necessarily at source code level, but can be applied at

every level from source code to machine code. In the case of Java, the most useful level is at

bytecode, because the source code is not always available and bytecode can be instrumented

easily during class loading. Because the program code itself is modified, the profile of the analysed program can differ considerably from that of the original program; this is mainly due to changes in possible compiler optimizations. In contrast to instrumentation, which can see each and every method call, sampling is a statistical approach in which the program is interrupted and its state is analysed at certain intervals. Therefore, it cannot determine exact method call counts, but

only a ratio of how often a certain method was seen executing.
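
As a rough, hedged sketch of what instrumentation adds at method entries and returns (the profiler class and hook names below are hypothetical, not the API of any particular tool), an instrumented method might update shared counters like this:

import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical per-method counters that injected entry/exit hooks would update.
final class MethodProfiler {
    static final ConcurrentHashMap<String, LongAdder> CALLS = new ConcurrentHashMap<>();
    static final ConcurrentHashMap<String, LongAdder> NANOS = new ConcurrentHashMap<>();

    static void enter(String method) {
        CALLS.computeIfAbsent(method, k -> new LongAdder()).increment();
    }

    static void exit(String method, long startNanos) {
        NANOS.computeIfAbsent(method, k -> new LongAdder()).add(System.nanoTime() - startNanos);
    }
}

class InstrumentedExample {
    void doWork() {
        long start = System.nanoTime();
        MethodProfiler.enter("InstrumentedExample.doWork");   // injected at method entry
        try {
            // ... original method body ...
        } finally {
            MethodProfiler.exit("InstrumentedExample.doWork", start);  // injected at method return
        }
    }
}

A real bytecode instrumenter would weave in equivalent hooks during class loading rather than editing the source by hand.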

Often it is not enough to know only the top executing methods, because the problem is in a

caller method and not in the executing method itself. Therefore, a complete calling context

needs to be recorded for each call or sample, depending on which method is used. A calling

context is a stack trace, which consists of the top executing method and all caller methods

down to the program’s main method.
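
In Java, a single calling context can be observed directly as a stack trace; the small sketch below (illustrative only, not the thesis's tooling) prints the chain of callers from the currently executing method down to main:

// Minimal sketch: capture the current calling context as a stack trace.
public class CallingContextDemo {
    static void leaf() {
        // Frame 0 is Thread.getStackTrace itself; the remaining frames are leaf <- middle <- main.
        for (StackTraceElement frame : Thread.currentThread().getStackTrace()) {
            System.out.println(frame.getClassName() + "." + frame.getMethodName());
        }
    }

    static void middle() { leaf(); }

    public static void main(String[] args) { middle(); }
}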


Calling context profiling is a technique for analyzing dynamic inter-procedural control flow

of applications and provides dynamic metrics for each calling context. It is particularly

important for understanding and optimizing object-oriented software, where polymorphism

and dynamic binding hinder static analyses. Calling context profiling collects statistics

separately for each calling context, such as the number of method invocations or the CPU

time spent in a calling context.

1.5.1 Calling Context Tree

The calling context tree (CCT) [11] is a data structure that represents all distinct calling

contexts of a program. Each node in a CCT corresponds to a method call, and the path from a node to the root represents a calling context. The parent of a CCT node corresponds to the caller's context, while its child nodes represent the callee methods. The CCT maintains dynamic metrics, such as the number of method invocations and CPU time, for each calling context and provides complete information on dynamic program behavior. Figure 1.2 presents example Java code and Figure 1.3 shows the corresponding CCT representation.

// Example code for Figure 1.2: each caller/callee pair below appears as an edge in the CCT of Figure 1.3.
static int count;

void M() {
    for (int i = 0; i < 2; i++)
        A();
    for (int i = 0; i < 2; i++)
        B();
}

void A() {
    B();
    C();
}

void B() {
    count++;
    if (count >= 3)
        D();
    else
        return;
}

void C() {
    return;
}

void D() {
    return;
}

Figure 1.2: Example Java code

Figure 1.3: CCT representation
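
To make the data structure concrete, here is a minimal, hedged Java sketch of a CCT node (the names are illustrative and not taken from JP2 or the thesis): each node records the method it represents, a call count for that context, and one child per distinct callee invoked from this context. The height metric analysed later in this work corresponds to the depth of such a tree.

import java.util.LinkedHashMap;
import java.util.Map;

// Minimal CCT node: the path from any node up to the root is one calling context.
class CctNode {
    final String method;                                          // method this node represents
    long calls;                                                   // invocations observed in this context
    final Map<String, CctNode> children = new LinkedHashMap<>();  // one child per distinct callee

    CctNode(String method) { this.method = method; }

    // Called on method entry: reuse the child for this callee if the context was seen before.
    CctNode enterCallee(String callee) {
        CctNode child = children.computeIfAbsent(callee, CctNode::new);
        child.calls++;
        return child;
    }

    // Height of the subtree rooted at this node (one of the CCT metrics studied in this thesis).
    int height() {
        int max = 0;
        for (CctNode c : children.values()) max = Math.max(max, c.height());
        return 1 + max;
    }
}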

1.6 Motivation

The following points present the motivation behind this work:

During evolution, the programmers must comprehend the existing program to be able

to add new functionalities or new properties to it. Software comprehension is a set of

activities performed in the maintenance process to get a better understanding of software functionality, architecture and the overall impact of executing a change [71]. Inaccurate

and incomplete understanding of a software system is likely to severely degrade its

performance and reliability. Therefore, it is necessary to perform good program

comprehension to provide effective software maintenance and enable successful

evolution of software systems. It is important to comprehend the details of any

complex artifact to be able to maintain it.

Software systems are becoming more complex and difficult to comprehend. Very

large software systems contain several million lines of source code and voluminous documentation. The details of such a large system cannot be comprehended easily [72], so software evolution can instead be analyzed at the scenario level. Scenarios are used to define the functionality and behavior of a software system from a user-centric perspective. It is easier to comprehend software systems at the scenario level because of their smaller size.

Developers face difficulties in understanding the behaviour of software systems, and static analysis alone has become less effective. Consequently, dynamic analysis techniques are used to analyze the run-time behavior of programs, and profiling is a valuable technique here. Calling contexts provide worthwhile information for program understanding, runtime optimization, and performance analysis, and calling context profiling helps in analyzing the dynamic interprocedural control flow of applications.

1.7 Research Objectives

In this dissertation work, the following objectives are addressed:

Objective 1:

To design and implement a scenario level metric collection process for aggregating calling

context tree and dynamic metrics.

Objective 2:

To perform a correlation analysis among a set of scenario level static, dynamic and CCT

metrics across multiple versions of selected sample applications.

1.8 Thesis Outline

The rest of this thesis is structured as follows:

Chapter 2 provides the literature survey of work related to this thesis. It presents the

previous work related to software evolution, static and dynamic metrics, and calling context trees.

Chapter 3 describes the experimental study design, including sample applications,

selected metrics, tools used, data analysis techniques and methodology followed.

Chapter 4 presents the analysis and results. This chapter also discusses the observed

values and their analysis.

Chapter 5 concludes the work with some useful future recommendations.


CHAPTER 2

LITERATURE REVIEW

This chapter describes the literature related to software evolution, scenarios and calling context trees. Section 2.1 discusses the studies related to software evolution, including Lehman's laws,

empirical studies of software evolution and software metrics. Section 2.2 presents the scenario

related studies. Section 2.3 describes the studies based on calling context trees and their

applications.

2.1 Software Evolution

Historically, software evolution showed up as an unexpected and unplanned phenomenon that

was observed in the original case study [1]. Since that time, it has gained importance and moved into the center of attention of software engineers. There are several studies that map

the field of software evolution.

2.1.1 Lehman’s laws of software evolution

Belady and Lehman [1, 12] first observed different results on the continuing changes in the

software system and performed an empirical study on IBM’s OS/360 and other large scale

systems. Based on these studies, Lehman et al. [13-15] proposed the laws of software

evolution. Eight laws of software evolution were refined and formulated to model the

dynamic behavior of software systems. These laws are Continuing Change, Increasing

Complexity, Self Regulation, Conservation of Organizational Stability, Conservation of

Familiarity, Continuing Growth, Declining Quality, and Feedback System. Table 2.1

summarizes these eight laws.

The laws of software evolution were initially formulated from observations of how large software systems were developed and maintained in industrial environments using conventional management techniques and procedures [14]. Attention was mainly directed to the phenomena related to development, continual adaptation, client satisfaction, global activity rate and feedback. Laws I, II, VI, VII, and VIII have immediate appeal and offer strong intuition into the nature of evolving software systems. Of these, Law VIII is the most subtle and complex, and perhaps deserves further elaboration and study; for example, it explicitly recognizes the role of user feedback in providing impetus for change. Laws III, IV, and V propose hypotheses that

are more easily testable by empirical study [15], and perhaps warrant periodic re-evaluation

and re-examination as the nature of software and software development also changes.


Table 2.1: Laws of software evolution

1. Continuing Change: An E-type system must be continually adapted, else it becomes progressively less satisfactory.
2. Increasing Complexity: As an E-type system evolves, its complexity increases unless work is done to maintain or reduce it.
3. Self Regulation: Global E-type system evolution processes are self-regulating.
4. Conservation of Organizational Stability: The average effective global activity rate in an evolving E-type system is invariant over the system's lifetime.
5. Conservation of Familiarity: During the active life of an E-type system, the content of successive releases is statistically invariant.
6. Continuing Growth: The functional content of E-type systems must be continually increased to maintain user satisfaction over their lifetime.
7. Declining Quality: The quality of E-type systems will appear to be declining unless they are rigorously maintained and adapted to operational environment changes.
8. Feedback System: E-type evolution processes constitute multi-level, multi-loop, multi-agent feedback systems and must be treated as such to be successfully modified or improved.

2.1.2 Empirical studies on software evolution

Kemerer and Slaughter [16] conducted longitudinal empirical research on software evolution

to focus on how software cost and effort change over time. The investigation focused on the

types of changes, costs and effort to evolve the software. It then evaluated the existing

software evolution laws. The authors discussed the analysis methods used in the study, such

as examining the time series and sequence data analysis, as they believed that at this stage

clarifying research methods is more essential than obtaining a large volume of results.

Bennett and Rajlich [49] described a landscape for research in software maintenance and

evolution in order to improve the speed and accuracy of change while reducing costs, identifying key problems, promising solution strategies and topics of importance. They presented a model of the complete software life cycle, called the staged model. They also summarized a long-term, radical view of software evolution based on a service model rather than a product model, designed to meet the expected needs of emergent organizations.

Godfrey and German [48] discussed the concept of software evolution from several

perspectives. They described insights about software evolution arising from Lehman's laws of

software evolution. They compared software evolution to other kinds of evolution, from

science and social sciences field, and examined the forces that shape change. Finally, they

discussed the changing nature of software in general as it relates to evolution, and proposed

open challenges and future directions for software evolution research.

Cook et al. [75] presented an approach to understand software evolution that is based around

the quantifiable concept of evolvability. The concept brings together the study of software product quality, the software evolution process, and their relationships with the organisational

environment. They assessed the opportunities for analysing and measuring evolvability at pre-

design, architectural, and source code stages in the software development.

Ambros et al. [50] presented several analysis and visualization techniques to understand software evolution by exploiting the rich sources of artifacts that are available. Based on the data models that need to be developed to cover sources such as modification and bug reports, they described how to use a Release History Database for evolution analysis. They presented

approaches to analyze developer effort for particular software entities. Further, change

coupling analyses that can reveal hidden change dependencies among software entities were

explained. They showed how to investigate architectural shortcomings over many releases

and to identify trends in the evolution.

Xie et al. [24] conducted an empirical study on the evolution of seven open source systems.

The study investigated the Lehman’s evolution laws. The authors found that various branches

of open source systems evolve in parallel. They utilized source code metrics as well as project

and defect information and analyzed growth rate of software, maintenance branches and

software changes distribution. They also discovered similarities in the evolution patterns of

the studied programs.

Novais et al. [26] performed a systematic mapping study to structure the research in software

evolution and visualization. The study investigated Software Evolution Visualization (SEV) approaches, collected evidence about how SEV research is structured, incorporated current evidence on the goals of the proposed approaches, and identified key challenges for its use in practice. The authors observed that, as the software evolution area is complex, there is a need for multi-metric, multi-perspective, multi-strategy SEV to address the various existing software engineering goals.

Rajlich [51] discussed evolutionary software development and software change, which is the fundamental software evolution task. The work explained processes of evolutionary software development, including agile, iterative, open source, inner source, and other processes, and advocated investigation of the individual practices within these processes. The author discussed the basic task of software evolution, software change, and presented the phased model of software change (PMSC) together with directions for how it can be enhanced. It listed research methodologies that have been used in the investigation of software evolution and, besides empirical work, proposed paying attention to model creation and reasoning, so that the process of knowledge discovery becomes more complete.

2.1.3 Software metrics and evolution

Mens et al. [17] provided an overview of the approaches that use metrics to analyze,

understand, predict and control software evolution. The authors classified the analysis into

two categories: predictive analysis (i.e. use of metrics before evolution) and retrospective

analysis (i.e. use of metrics after the evolution). To support retrospective analysis, metrics can

be utilized to comprehend the quality evolution of a software system by considering the

successive releases. Specifically, metrics can be used to measure whether the quality of a

software has improved or degraded between two releases.

Lee et al. [18] presented an analysis of open source software evolution with software metrics.

The authors found that size, cohesion and coupling metrics can be used to assess software quality and understand the evolution behavior during the software evolution process.

Software metrics were derived from several releases of the open source software system

studied.

Jermakovics et al. [19] proposed an approach to visually identify software evolution patterns

related to requirements. A combined visualization demonstrating the evolution of a software

system with the implementation of its requirements was suggested. The authors discussed that such a view can help project managers keep the evolution process of a software system under control. In their work they utilized complexity, cohesion and coupling metrics as defined by Chidamber and Kemerer [20]. Mens et al. [21] presented a metrics-based study of the evolution of Eclipse. They considered seven major releases and investigated whether three of Lehman's laws of software evolution were supported by the collected data. They concentrated on continuing growth, increasing complexity and continuing change.

Yacoub et al. [52] addressed the problem of measuring the quality of object-oriented designs

using dynamic metrics. They presented a set of dynamic metrics to measure the quality of

designs at an early development phase. The suite consisted of metrics for dynamic complexity

and object coupling based on execution scenarios. They applied the dynamic metrics to assess

the quality of a pacemaker application. They envisaged that dynamic metrics are as important

as static metrics for evaluating design quality and identified the need for empirical studies to

assess the correlation between dynamic metrics and faulty modules.

Briand et al. [22] performed a comprehensive empirical investigation of all the measures in

object-oriented (OO) design. They explored the relationship between existing OO design

measures and the software quality. Zimmerman et al. [23] applied data mining to version

histories to determine changes in a software system. They presented a tool, ROSE, to guide programmers about related changes; ROSE predicted the changes for a given project on the basis of its previous versions.

Murgia et al. [27] explored software quality evolution in open source systems using agile

practices. The authors used a set of object oriented metrics to study software evolution and its

relationship with bug distribution. According to the obtained results, they concluded that no single metric is able to explain the bug distribution during the evolution of the analyzed systems.

Drouin et al. [25] empirically analyzed the quality evolution of software systems using

metrics. They analyzed the historical data gathered from successive versions of three open

source systems. They investigated the evolution of the Quality Assurance Indicator (Qi) metric along the evolution of the studied software systems. They examined three issues: the evolution of the Qi metric and the internal quality of the studied software systems, the evolution of different size attributes, and the quality of the removed classes versus the quality of the added ones.

2.2 Scenario based Work

Breitman and Cesar [8] reported a study on a framework for scenario evolution. They

performed a laboratory case study to analyze the evolution of scenarios in software engineering from the requirement elicitation phase to coding. They observed around a hundred and fifty scenarios to discover relationships and operations that occur during evolution, and proposed a scenario evolution framework which provides a classification of the relationships and operations that can be observed in scenario evolution.

Bui and Carroll [2] have performed a review of scenario management from three disciplines –

strategic management, human-computer interaction and software system engineering. They

have proposed a new interdisciplinary scenario management framework. They have

based their findings on various brainstorming techniques. Their study

facilitates the use of scenario approaches and makes the study of scenarios more interesting

and effective.

Breitman and Cesar [28] have also performed an extensive research on the evolution of

scenarios. They have analyzed twelve case studies of around two hundred scenarios

containing about eight hundred episodes. They captured data on scenario evolution to

confirm existing results and to gather requirements for a scenario evolution support environment.

They have formulated their results in a three tier framework consisting of process, product

and instance levels.

Zhang et al. [32] discussed that scenarios describe concrete behaviors of software and play an

important role in software development and specifically in requirements engineering. They

presented an integration method to detect the inconsistency between scenarios from different

viewpoints and to provide support for scenario evolution by creating new scenarios from an

integrated scenario.

Hertzum [3] discussed that the research conducted on scenario-based

design (SBD) does not include many studies of how scenarios are practically used by

software engineers in real-world projects. Hence, it is important that such research

compares and evaluates current state-of-the-art SBD approaches. The study evaluates

the use of scenarios during the actual conceptual design of a large information system and

compares the role of scenarios with three other design artefacts: requirements specification,

user interface prototype and business model.

Breitman et al. [29] highlighted the fact that though scenarios are very helpful in identifying

and communicating requirements of Computer Based Systems (CBSs), they do not appear

to be applicable to the remaining CBS development process. They proposed that making


scenarios more applicable to the entire software development lifecycle requires integrating

scenarios with various other representations that are used during CBS development, and that such

integration can be accomplished through tracing technology. They have prototyped

automated support for full life cycle scenario management and tested their prototype on some

non-trivial systems.

Salah et al. [30] have proposed an approach using dynamic analysis to extract views of a

software system at three different levels – use case views, module interaction views and class

interaction views. These views can be used by maintainers to locate system features that need

to be modified. They have evaluated their approach on the Mozilla web browser system. They

created the different views through use-case-driven dynamic analysis of the case study. The

software views and automated tools presented in their paper have the ability to

collect dynamic data and present it in the form of a set of views.

Alspaugh and Anton [31] have discussed six aspects of scenarios with inherent structure,

their automated support, and the results of using such support. Their study suggests that use

of automated support frees analysts and allows them to focus on tasks which demand human

intelligence, hence resulting in better quality scenarios leading to better system requirements.

They conclude that such scenarios more effectively explore all possibilities, more correctly

express the understanding of stakeholders and analysts, and better highlight requirements

issues so that these issues are resolved in the requirements elicitation phase itself rather than being left for

software developers to sort out.

Eitan et al. [54] addressed the problem of comprehending cause and effect relationships

between relatively independent behavior components of a single application. They focused on

the model of behavioral, scenario-based programming implemented with the Java package BPJ.

They proposed browsing, filtering, and grouping mechanisms for comprehending traces. They

presented a tool which helps the user easily follow the decisions of the collective

execution mechanism.

2.3 Calling Context Trees (CCTs)

Ammons et al. [11] proposed a run-time data structure, called the calling context tree (CCT), to

label an arbitrary metric or set of metrics with its dynamic calling context. The CCT captures

a program’s calling behavior more precisely than a call graph, but its size is bounded, unlike a

complete dynamic call tree. Context sensitive profiling provides a calling context for flow

sensitive (or other) procedural-level profiles using CCT. A CCT can accurately record a


metric along different call paths thereby solving the “gprof problem”. Another important

property of the CCT is that the sets of paths in the dynamic call tree (DCT) and the CCT are identical. The out-degree of

a vertex in the CCT is bounded by the number of unique procedures that may be called by the

associated procedure. The breadth of the CCT is bounded by the number of procedures in a

program. In the absence of recursion, the depth of a CCT also is bounded by the number of

procedures in a program. However, with recursion, the depth of a CCT may be unbounded. To

bound the depth, they defined the vertex equivalence as follows:

Vertices v and w in a DCT are equivalent if:

(i) v and w represent the same procedure, and

(ii) the tree parent of v is equivalent to the tree parent of w, or v = w, or there is a vertex

u such that u represents the same procedure as v and w and u is an ancestor of both v

and w.

Moret et al. [53] introduced Complete Calling Context Profiling (CCCP), an approach that

reconciles completeness and accuracy of the created CCTs, portability, and reduced overhead.

CCCP uses a generic bytecode instrumentation framework ensuring comprehensive bytecode

coverage, including the methods of the standard Java class library. They used the resulting CCTs

for a detailed analysis of the dynamic behavior of Java systems and presented a thorough

analysis of the origin of runtime overheads.

Bond et al. [79] presented Breadcrumbs, an efficient technique for recording and reporting

dynamic calling contexts. They built it on an existing technique for computing a compact

encoding of each calling context that client analyses can use in place of a program location.

They combined the static call graph with limited dynamic information collected at cold call

sites, and used a backwards heuristic search to find potential contexts that match the calling

context value. They used the technique to add context sensitivity to two dynamic analyses: a

race detector and an analysis to identify the origins of null pointer exceptions.

Sarimbekov et al. [55] presented the design and implementation of JP2, a tool that profiles

both the inter- and intra-procedural control flow of workloads on standard JVMs. The calling

context profiles produced by JP2 preserve callsite information and execution statistics at the

level of individual basic blocks of code. JP2 is complemented with scripts that compute

different dynamic bytecode metrics from the profiles. They used JP2 for cross-profiling for an

embedded Java processor.


Serrano and Zhuang [76] presented an approach for building calling context information

useful for program understanding, performance analysis and optimizations. The approach

exploited a lightweight profiling mechanism providing partial call traces. They proposed three

steps to merge partial call traces into a smaller number of partial calling context trees and

aimed to minimize errors such that the final partial contexts represent the actual

components of the real calling context tree with a very high probability.

Huang and Bond [77] proposed a novel approach for maintaining the calling context at run

time and providing context sensitivity to dynamic analyses. They demonstrated that a CCU

(calling context uptree) based approach outperforms a CCT-based approach when adding context sensitivity to bug

detection analyses, offering an appealing direction for future work on context-sensitive

dynamic analysis. They implemented the CCU-based approach in a high-performance Java virtual

machine and integrated it with a staleness-based memory leak detector and a happens-before

data race detector, so that they can report context-sensitive program locations that cause bugs.

Sarvari et al. [78] presented an efficient and scalable technique to extract design level

dynamic metrics from Calling Context Tree (CCT) using cloud based MapReduce paradigm.

They used CCT profiles having node count up to 40 million to extract a number of dynamic

coupling metrics. They analyzed performance characteristics like speed-up and scale-up to

strengthen the applicability of the parallel computation approach.

2.4 Chapter Summary

This chapter presented the literature survey regarding software evolution, usage of scenarios

and calling context trees. It discussed in detail Lehman's laws of software evolution, various

empirical studies related to software evolution and studies related to software metrics and

evolution.


CHAPTER 3

EXPERIMENTAL DESIGN AND METHODOLOGY

This chapter explains the experimental design and methodology followed for the study.

Section 3.1 discusses the sample applications under study. Section 3.2 presents the selected set of

metrics. Section 3.3 describes the tools used for computation of metrics. Section 3.4

discusses the data analysis techniques used. Section 3.5 presents the experimental

methodology adopted for the study.

3.1 Sample Applications

The sample applications are selected on the basis of different sizes (in KLOC) and number of

classes. We selected GUI (Graphical User Interface) based applications. Multiple versions of

sample applications are considered in order to study the software evolution in

detail. For our study, we selected four open source Java software systems, namely

DrawSWF, JHotDraw, Sunflow and Art of Illusion.

DrawSWF: DrawSWF [57] is a simple drawing application which generates an animated

Flash file. DrawSWF is a completely customizable application depending on the user's

preferences. Five versions of DrawSWF are considered for analysis.

JHotDraw: JHotDraw [58] is a two-dimensional GUI framework for structured

drawing. It can be used to create many different editors from a simple doodle program

up to full-fledged diagram editors and vector animation tools. Its design relies on some

well-known design patterns. Six versions of JHotDraw are selected for this study.

Sunflow: Sunflow [59] is a rendering system for photo-realistic image synthesis. It is

written in Java and built around a flexible ray tracing core and an extensible object-

oriented design. It has a simple API for procedural scene creation. Five versions of

Sunflow are considered for experimental analysis.

Art of Illusion: Art of Illusion [60] is a 3D modeling, rendering, and animation

studio. It is capable of modeling and rendering both photorealistic and non-

photorealistic images and animations. Five versions of Art of Illusion are selected for

analysis.

Table 3.1 provides the characteristics of the selected sample applications such as total number

of classes present in the software and Lines of Code (LOC) of the source code.


Table 3.1 Characteristics of Sample Applications

Application Version Number of classes LOC

DrawSWF

1.2.5 177 17090

1.2.6 180 17468

1.2.7 205 18989

1.2.8 285 27686

1.2.9 285 27674

JHotDraw

7.1 465 54129

7.2 623 72051

7.3 639 73415

7.4.1 641 73308

7.5.1 671 79669

7.6 674 80542

Sunflow

6.1 126 13029

6.2 127 13096

6.3 131 13336

7.1 187 21835

7.2 185 21970

Art of Illusion

2.9.1 467 88719

2.9.2 469 89012

3.0 479 91572

3.0.1 479 91733

3.0.2 481 92222

Two composite scenarios of each application are considered in order to analyze the scenario-

centric features of software evolution. For the applications DrawSWF, JHotDraw and Art of

Illusion, the first scenario (Scenario 1) is creating a figure in a file and saving that file,

while the second scenario (Scenario 2) is opening an existing file, editing it, and

saving it again. For the application Sunflow, scenario 1 is opening a scene file, building

it and then rendering it, while scenario 2 is performing interactive photorealistic rendering (IPR)


of a scene file. Figures 3.1 and 3.2 show the use case scenario for DrawSWF, JHotDraw, Art

of Illusion for scenarios 1 and 2. Figures 3.3 and 3.4 show the use case scenario for the Sunflow

application for scenarios 1 and 2.

Figure 3.1: Use case scenario for DrawSWF, JHotDraw, Art of Illusion (Scenario 1)

Figure 3.2: Use case scenario for DrawSWF, JHotDraw, Art of Illusion (Scenario 2)

Figure 3.3: Use case scenario for Sunflow (Scenario 1)


Figure 3.4: Use case scenario for Sunflow (Scenario 2)

3.2 Selected Metrics

For each released version of the sample application, we computed a set of metrics. The

selected metrics include CCT metrics, static metrics, and dynamic metrics. The selected set of

metrics is computed for both scenarios of each sample application.

3.2.1 CCT Metrics

Height: The height of a CCT is the height of the method at the root node of the CCT.

The height of a method is the maximum height of any sub-tree within the CCT with an

instance of the method as its root [61]. Height h(M) of any method M is:

h(M) = max_{c ∈ child(M)} (1 + h(c))

Average Height (AH): The average height of a CCT is defined as the average of

heights of the branches starting from the root node. Average height H is computed as:

H = (1 / out(r)) · Σ_{i=1}^{n} h(i)

Here ‘r’ is the root node, ‘n’ is the number of branches of root node and ‘out’ denotes

the outdegree.

Number of Nodes (NON): The total number of nodes in a CCT is the number of

nodes in all the subtrees of the root, plus one for the root itself.

Number of Leaf Nodes (NLN): The number of leaf nodes is the count of leaf

methods in the CCT. A leaf method is one which does not call any other method.
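
A minimal sketch of how these four CCT metrics can be computed by a recursive traversal is given below. The CCTNode class and its fields are illustrative assumptions rather than the JP2 data model, and the average height follows one reading of the definition above (the mean of the branch heights below the root).

    import java.util.ArrayList;
    import java.util.List;

    // Illustrative in-memory CCT node; not the JP2 data model.
    class CCTNode {
        String method;                                  // fully qualified method name
        List<CCTNode> children = new ArrayList<>();     // callees in this calling context
    }

    class CCTMetrics {
        // Height h(M) = max over children c of (1 + h(c)); 0 for a leaf method.
        static int height(CCTNode m) {
            int h = 0;
            for (CCTNode c : m.children) {
                h = Math.max(h, 1 + height(c));
            }
            return h;
        }

        // Average Height (AH): mean height of the branches leaving the root.
        static double averageHeight(CCTNode root) {
            if (root.children.isEmpty()) return 0.0;
            double sum = 0.0;
            for (CCTNode c : root.children) {
                sum += height(c);
            }
            return sum / root.children.size();          // out-degree of the root
        }

        // Number of Nodes (NON): one for this node plus the nodes in all subtrees.
        static int numberOfNodes(CCTNode m) {
            int n = 1;
            for (CCTNode c : m.children) n += numberOfNodes(c);
            return n;
        }

        // Number of Leaf Nodes (NLN): methods that call no other method.
        static int numberOfLeafNodes(CCTNode m) {
            if (m.children.isEmpty()) return 1;
            int n = 0;
            for (CCTNode c : m.children) n += numberOfLeafNodes(c);
            return n;
        }
    }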


3.2.2 Dynamic Metrics

Import Coupling (IC): This measure counts the number of distinct server classes

used by all methods of all objects of a class [80]. This metric quantifies the extent to

which messages are sent from a class to other classes and objects in a system at

runtime.

Export Coupling (EC): This measure counts the number of distinct client classes that

use the objects of a given class [80]. This metric quantifies the extent to

which classes receive messages from other objects and classes in the system at

runtime.

Number of Participating Classes (NPC): This measure counts the number of classes

that participate in a scenario.
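
The sketch below shows how class-level IC and EC can be counted once runtime caller/callee class pairs are available, for example derived from the edges of a CCT. The CallEdge class and the way such pairs are obtained are assumptions for illustration only; NPC for a scenario is then simply the number of distinct classes appearing in any pair.

    import java.util.HashMap;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // A runtime class-to-class call observation, e.g. derived from a CCT edge.
    class CallEdge {
        final String callerClass;
        final String calleeClass;
        CallEdge(String callerClass, String calleeClass) {
            this.callerClass = callerClass;
            this.calleeClass = calleeClass;
        }
    }

    class DynamicCoupling {
        // IC(c) = number of distinct server classes used by class c at runtime.
        static Map<String, Set<String>> importCoupling(List<CallEdge> edges) {
            Map<String, Set<String>> ic = new HashMap<>();
            for (CallEdge e : edges) {
                if (!e.callerClass.equals(e.calleeClass)) {
                    ic.computeIfAbsent(e.callerClass, k -> new HashSet<>()).add(e.calleeClass);
                }
            }
            return ic;                                   // ic.get(c).size() is IC for class c
        }

        // EC(c) = number of distinct client classes that use class c at runtime.
        static Map<String, Set<String>> exportCoupling(List<CallEdge> edges) {
            Map<String, Set<String>> ec = new HashMap<>();
            for (CallEdge e : edges) {
                if (!e.callerClass.equals(e.calleeClass)) {
                    ec.computeIfAbsent(e.calleeClass, k -> new HashSet<>()).add(e.callerClass);
                }
            }
            return ec;                                   // ec.get(c).size() is EC for class c
        }
    }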

3.2.3 Static Metrics

Lines of Code (LOC): This measure is a count of the number of lines that contain

characters other than white space and comments.

Cyclomatic Complexity (CC): The cyclomatic complexity of a single method is a

measure of the number of distinct paths of execution within the method [81]. It is

measured by adding one path for the method itself to each of the paths created by

conditional statements and operators (a small illustration follows this list of metrics). We have

calculated the sum of the cyclomatic complexity of each of the methods defined in the target elements.

Coupling Between Object classes (CBO): This measure is a count of the number of

classes to which a given class is coupled [20]. It is used to measure the extent of coupling

occurring between a pair of objects or classes in a software system.

Number of Bugs (NOB): This measure counts the bug patterns in a program. Bug

patterns are possible errors in the code. It can be calculated for each class or for the whole

software system.
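
As a small illustration of the counting rule for CC described above, the hypothetical method below has one path for the method itself plus one for each decision point.

    // Hypothetical method used only to illustrate the counting rule:
    // CC = 1 (method) + 1 (for) + 1 (if) + 1 (&&) = 4.
    static int countLargeEvenValues(int[] values, int threshold) {
        int count = 0;
        for (int v : values) {                       // +1 decision point
            if (v > threshold && v % 2 == 0) {       // +1 for the if, +1 for &&
                count++;
            }
        }
        return count;
    }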

3.3 Tools Used

This section presents the various tools used in the experimental methodology. It describes the

tools used to generate the CCT and to calculate the set of software metrics.

JP2: JP2 [62] is an open source tool which is based on the JVM. It is a calling context

profiler which collects accurate and complete profiles. JP2 collects profiles that are not

only complete but also call-site aware. That is, JP2 is able to distinguish between

multiple call sites within a single method. JP2 uses selective profiling of the dynamic


extent of chosen methods and supports profiling of native method invocations. JP2

collects various static and dynamic metrics and then associates the metrics with a

corresponding node in the CCT. JP2 generates output in textual and XML formats. The XML

output format can be used to compute metrics of interest.

FindBugs: FindBugs [63] is an open source static analysis tool that detects the bug

patterns in the Java bytecode. It is designed to evaluate what kinds of defects can be

effectively detected with relatively simple techniques. It uses data flow analysis for

checking bugs. FindBugs has a plugin architecture, in which detectors can be defined,

each of which may report several different bug patterns. Rather than using a pattern

language for describing bugs, FindBugs detectors are simply written in Java, using a

variety of techniques. FindBugs also includes some more sophisticated analysis

techniques devised to help effectively identify certain issues, such as dereferencing of

null pointers. We have used the Eclipse plug-in for FindBugs.

Figure 3.5: CCT generated by JP2 in the form of XML file


Figure 3.6: FindBugs tool showing bug patterns

EclEmma: EclEmma [64] is a tool for Java code coverage and is openly available for

Eclipse. It brings code coverage analysis directly into the Eclipse workbench. When a

Java application is run in coverage mode, EclEmma gathers coverage data and

displays different coverage metrics. It automatically computes coverage statistics

when the application terminates. EclEmma does not require modifying your projects or

performing any other setup. The coverage view in EclEmma lists coverage summaries

for the Java projects and allows drilling down to method level. The result of a coverage session is

directly visible in Java source editors. The coverage report can also be exported in XML,

HTML or CSV format.


Figure 3.7: EclEmma tool showing the classes covered

CodePro AnalytiX: CodePro AnalytiX [65] is a set of software analysis tools for

Eclipse developers who are concerned about improving software quality and reducing

development costs and schedules. The Java software audit features of the tool make it an

indispensable assistant to the developer in reducing errors as the code is being

developed and keeping coding practices in line with organizational guidelines. The

ability to make corrections to the code immediately can dramatically reduce

development costs and improve the speed of finished product delivery. It can be used

in these areas: code audit, metrics, test generation, code dependencies analysis. This

tool computes static code metrics such as cyclomatic complexity, abstractedness,

efferent coupling, weighted methods, etc. The CodePro AnalytiX tool is used to compute

the static metrics LOC and CC in this study.


Figure 3.8: CodePro AnalytiX showing LOC metric

STAN: STAN [66] is a structure analysis tool for Java. It covers various aspects of

software quality and provides a set of metrics for measuring quality. STAN supports a

set of carefully selected metrics, suitable to cover the most important aspects of

structural quality. This tool is used to calculate CBO for each sample application.

STAN supports:

a. Several counting metrics

b. Average Component Dependency, Fat and Tangled

c. Metrics by Robert C. Martin [82]

d. Metrics by Chidamber & Kemerer [20]


Figure 3.9: STAN tool showing CBO metric

3.4 Data Analysis Techniques

In this section, we provide a description of the data analysis techniques used to analyze the

metrics data collected for all sample applications.

3.4.1 Correlation Analysis

We analyzed the correlation between the selected set of metrics along the evolution of

multiple versions of software systems. Correlation analysis is a measure of the degree of

relationship between two variables. In our study, we used Pearson’s correlation coefficient.

The value of correlation coefficient lies between -1 and +1. A correlation value near to zero

shows that there is no linear relationship between variables. A correlation of +1 or -1 shows

that there is a perfect correlation (positive or negative) between variables. If the values of two

variables increase (or decrease) together, then there is a positive correlation. If the value of one

variable increases while the value of the other decreases, then there is a negative correlation. To

perform the analysis we used the Analyse-it [67] tool in Excel.


Pearson Correlation: Pearson correlation is used to measure the degree of the relationship

between linearly related variables. Its value lies between -1 and +1. The closer the value is to 1

or -1, the stronger the linear correlation. The following equation is used to calculate the

Pearson r correlation:

r = (n·Σxy − Σx·Σy) / sqrt( (n·Σx² − (Σx)²) · (n·Σy² − (Σy)²) )

where,

r: Pearson correlation coefficient

x: Sample value in first data set

y: Sample value in second data set

n: Number of values in a particular data set

The following categories indicate the way to interpret the calculated r value:

0.0 to 0.2: Very weak correlation

0.2 to 0.4: Weak, low correlation

0.4 to 0.6: Moderate correlation

0.7 to 0.9: Strong, high correlation

0.9 to 1.0: Very strong correlation
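
As a minimal sketch, the formula above can be implemented directly as follows; this is for illustration only and is not the Analyse-it tool actually used in the study.

    // Pearson correlation coefficient for two equally long metric series.
    static double pearson(double[] x, double[] y) {
        int n = x.length;                            // both series must contain n values
        double sumX = 0, sumY = 0, sumXY = 0, sumX2 = 0, sumY2 = 0;
        for (int i = 0; i < n; i++) {
            sumX += x[i];
            sumY += y[i];
            sumXY += x[i] * y[i];
            sumX2 += x[i] * x[i];
            sumY2 += y[i] * y[i];
        }
        double numerator = n * sumXY - sumX * sumY;
        double denominator = Math.sqrt((n * sumX2 - sumX * sumX) * (n * sumY2 - sumY * sumY));
        return numerator / denominator;
    }

Applied to two metric series collected across the versions of an application, the returned value falls in [-1, +1] and is interpreted with the categories listed above.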

3.4.2 Principal Component Analysis (PCA)

PCA is used to analyze the structure in a data set and to determine the uncorrelated components

that capture it. It is used to reduce the number of dimensions in the data. A small number of

uncorrelated components are much simpler to comprehend and more useful in analysis than a large

number of correlated components. Principal components (PCs) are linear combinations of the

standardized independent variables. PCs are calculated as follows:

The first PC is the linear combination of all standardized variables which explains a

maximum amount of variance in the data set.

The second and subsequent PCs are linear combinations of all standardized variables,

where each new PC is orthogonal to all previously calculated PCs and captures a

maximum variance under these conditions.

PCA is a way to picture the structure of the data as completely as possible by using as few

variables as possible. Each PC is calculated by taking a linear combination of an eigenvector

of the correlation matrix (or covariance matrix or sum of squares and cross products matrix)


with the variables. The eigenvalues represent the variance of each component. The eigenvalue

for a given factor measures the variance in all the variables which is accounted for by that

factor. The ratio of eigenvalues is the ratio of explanatory importance of the factors with

respect to the variables. If a factor has a low eigenvalue, then it is contributing little to the

explanation of variances in the variables and may be ignored as redundant with more

important factors. PCA is a dimensionality reduction or data compression method. The goal is

dimension reduction and there is no guarantee that the dimensions are interpretable. PCA can also be used to select

a subset of variables from a larger set, based on which original variables have the highest

correlations with the principal components.
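
The sketch below illustrates the procedure described above: compute the correlation matrix of the metric data, retain components whose eigenvalue exceeds 1.0, and report loadings above 0.5. It uses the Apache Commons Math library purely for illustration (this is not the tool chain used in the study), and takes the loading of a metric on a component as the eigenvector entry scaled by the square root of the eigenvalue.

    import org.apache.commons.math3.linear.EigenDecomposition;
    import org.apache.commons.math3.linear.RealMatrix;
    import org.apache.commons.math3.stat.correlation.PearsonsCorrelation;

    class PcaSketch {
        // data: one row per version, one column per metric (NON, NLN, AH, ...).
        static void run(double[][] data, String[] metricNames) {
            RealMatrix corr = new PearsonsCorrelation(data).getCorrelationMatrix();
            EigenDecomposition eig = new EigenDecomposition(corr);
            double[] eigenvalues = eig.getRealEigenvalues();

            for (int pc = 0; pc < eigenvalues.length; pc++) {
                if (eigenvalues[pc] <= 1.0) continue;          // typical retention threshold
                System.out.println("Component with eigenvalue " + eigenvalues[pc]);
                double[] vector = eig.getEigenvector(pc).toArray();
                for (int m = 0; m < metricNames.length; m++) {
                    double loading = vector[m] * Math.sqrt(eigenvalues[pc]);
                    if (Math.abs(loading) > 0.5) {             // loadings used for interpretation
                        System.out.printf("  %s: %.3f%n", metricNames[m], loading);
                    }
                }
            }
        }
    }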

3.5 Methodology

Figure 3.10 shows the overview of the methodology followed. First of all, multiple versions

of applications are required to analyze the software evolution. So, different versions of

applications are downloaded from the SourceForge repository. The executable jar file is

created for each sample application. Then the executable jar file of each application is loaded into the

JP2 profiler. JP2 dumps the collected profile to disk in XML format. The XML files are

generated for the two selected scenarios of each application. After that, a DOM parser is used to

parse the XML files and examine the structure and contents of each XML file. The Java DOM

parser is used to calculate the CCT metrics at system level. The dynamic behavior of applications

is analyzed using CCT metrics. The evolution of CCT metrics is studied across the various

versions of each sample application for both the scenarios.
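
A minimal sketch of the DOM-based counting step is given below. The element name "method" is a placeholder, since the exact JP2 XML schema is not reproduced here; the full analysis additionally extracts heights and coupling information from the same profiles.

    import java.io.File;
    import javax.xml.parsers.DocumentBuilder;
    import javax.xml.parsers.DocumentBuilderFactory;
    import org.w3c.dom.Document;
    import org.w3c.dom.Element;
    import org.w3c.dom.NodeList;

    public class CctXmlCounter {
        public static void main(String[] args) throws Exception {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new File(args[0]));       // path to the dumped XML profile

            NodeList calls = doc.getElementsByTagName("method");   // every CCT node (placeholder tag)
            int non = calls.getLength();
            int nln = 0;
            for (int i = 0; i < non; i++) {
                Element m = (Element) calls.item(i);
                // A leaf method contains no nested call elements.
                if (m.getElementsByTagName("method").getLength() == 0) {
                    nln++;
                }
            }
            System.out.println("NON = " + non + ", NLN = " + nln);
        }
    }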

Next, various metrics are computed using the respective tools. The EclEmma tool is used to find the

NPC for each scenario of the sample applications. Dynamic import and export coupling

metrics are calculated for each participating class using the Java DOM parser. Then the NOB in the

participating classes is found using the FindBugs tool. After that, LOC and CC are computed

using the CodePro AnalytiX tool and CBO is calculated using the STAN tool. The computed set of

metrics is studied for further analysis.

Finally, to determine the relationships among the selected set of metrics, various statistical

techniques are applied. Correlation analysis is performed to investigate how strongly the

metrics are related. To analyze the covariance structure of the collected data, PCA is

applied.


Figure 3.10 Methodology Flow

3.6 Chapter Summary

This chapter discussed the characteristics of the sample applications. It further described the

selected set of metrics (i.e. CCT metrics, static metrics and dynamic metrics) and tools used in

the experimental design. Then data analysis techniques used are explained. Finally, the overall

methodology followed is discussed.


CHAPTER 4

RESULTS AND ANALYSIS

This chapter presents the detailed discussion on the results obtained from the experimental

methodology followed. Section 4.1 discusses the evolution of CCT metrics. Section 4.2

explains the descriptive statistics. Section 4.3 describes the correlation analysis to find the

relationship among the selected set of metrics. Principal component analysis is discussed in

section 4.4 followed by the summarization of results.

4.1 Evolution of CCT metrics

The analysis of data collected allowed us to discover the trend of CCT metrics along the

evolution of each application. Tables 4.1-4.4 show the computed CCT metric values along the

multiple versions of DrawSWF, Sunflow, JHotDraw and Art of Illusion for scenario 1. Figures

4.1-4.4 present the evolution of CCT metrics for DrawSWF, Sunflow, JHotDraw, Art of

Illusion for scenario 1.

Data regarding scenario 2 of all the applications are shown in Appendix A. Tables A.1-A.4

show the computed CCT metric values of DrawSWF, Sunflow, JHotDraw, Art of Illusion.

Figures A.1-A.4 show the evolution of CCT metrics for the respective four applications.

4.1.1 Evolution of NON

It is evident from Figure 4.1 for scenario 1 that the value of NON generally increases with each new

version for the DrawSWF and Sunflow applications. For JHotDraw and Art of Illusion the value

of NON first increases till version 3 and then decreases. In case of scenario 2, the value of

NON increases with each new version for the DrawSWF and Sunflow applications. For

JHotDraw the value of NON first increases till version 3 then decreases. For Art of Illusion

the value of NON decreases with each new version.

4.1.2 Evolution of NLN

It is evident from Figure 4.2 for scenario 1 that the value of NLN generally increases with each new

version for the DrawSWF and Sunflow applications. For JHotDraw and Art of Illusion the value

of NLN first increases till version 3 and then decreases. In case of scenario 2, the value

of NLN increases with each new version for the DrawSWF and Sunflow applications. For

JHotDraw the value of NLN first increases till version 3 then decreases. For Art of Illusion

the value of NLN decreases with each new version.


4.1.3 Evolution of AH

In case of scenario 1, the value of AH for DrawSWF first increases then decreases. For

Sunflow the value of AH slightly increases for subsequent versions. For JHotDraw the value

of AH increases till version 3, decreases for version 4, and then increases again. For Art of

Illusion the value of AH decreases with each new version (Table 4.4). In case

of scenario 2, the same trend for AH is followed for each sample application as in

scenario 1.

4.1.4 Evolution of Height

For scenario 1 it is evident from Figure 4.4 that the height of the CCT remains almost constant for

each application. For scenario 2 also the height of the CCT remains almost constant for each

application as shown in Figure A.4 in Appendix A.

Table 4.1 CCT metrics for DrawSWF

Version NON NLN AH Height

1.2.5 9877 5212 5.07 22

1.2.6 9899 5226 5.22 22

1.2.7 10264 5414 5.44 22

1.2.8 11155 5898 5.21 22

1.2.9 11171 5906 5.21 22

Table 4.2 CCT metrics for Sunflow

Version NON NLN AH Height

6.1 36483 18536 17.5 96

6.2 37430 19012 17.67 98

6.3 36805 18696 17.67 98

7.1 55729 28222 18.33 96

7.2 56342 28540 18.33 96


Table 4.3 CCT metrics for JHotDraw

Version NON NLN AH Height

7.1 38491 20252 10.02 42

7.2 58136 30108 10.79 44

7.3 58421 30272 11.25 42

7.4.1 38040 19724 9.83 42

7.5.1 38365 19880 10.15 42

7.6 38284 19822 10.28 42

Table 4.4 CCT metrics for Art of Illusion

Version NON NLN AH Height

2.9.1 56130 28830 13.18 46

2.9.2 56018 28778 12.78 46

3.0 56734 29152 12.65 46

3.0.1 55230 28396 12.29 46

3.0.2 55254 28408 11.98 46

Figure 4.1: Evolution of NON (Scenario 1)


Figure 4.2: Evolution of NLN (Scenario 1)

Figure 4.3: Evolution of AH (Scenario 1)

Figure 4.4: Evolution of Height (Scenario 1)


4.2 Descriptive Statistics

Descriptive statistics for each metric are calculated across multiple versions of each sample

application. Tables 4.5-4.8 show the descriptive statistics for each metric of the DrawSWF,

Sunflow, JHotDraw, and Art of Illusion applications for scenario 1. Columns "Mean", "SD",

"Min", "Median", "Max" represent the mean value, standard deviation, minimum value,

median, and maximum value for each measure respectively. These statistics are used to select

metrics that show enough variance to help in further analysis, as low-variance measures

cannot differentiate classes very well and therefore are likely to be less useful. Analyzing and

presenting the distribution of measures is useful for explaining the results of the

subsequent analysis.

The descriptive statistics for each metric of DrawSWF, Sunflow, JHotDraw, Art of Illusion

applications for scenario 2 are shown in Appendix B. It is observed that NON and NLN have

high variance whereas AH has low variance.
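
These columns can be reproduced with a short routine such as the sketch below, which uses the sample standard deviation (n-1 in the denominator); applied to the DrawSWF NON series of Table 4.1, it gives the mean of 10473.2 and SD of 648.2 reported in Table 4.5.

    import java.util.Arrays;

    class Descriptives {
        // Prints mean, SD, min, median and max for one metric series
        // (the metric values across the versions of one application).
        static void describe(String name, double[] values) {
            double[] v = values.clone();
            Arrays.sort(v);
            int n = v.length;

            double mean = Arrays.stream(v).average().orElse(Double.NaN);
            double ss = 0;
            for (double x : v) ss += (x - mean) * (x - mean);
            double sd = Math.sqrt(ss / (n - 1));                  // sample standard deviation
            double median = (n % 2 == 1) ? v[n / 2] : (v[n / 2 - 1] + v[n / 2]) / 2.0;

            System.out.printf("%s: mean=%.1f sd=%.1f min=%.1f median=%.1f max=%.1f%n",
                    name, mean, sd, v[0], median, v[n - 1]);
        }
    }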

Table 4.5 Descriptive Statistics for DrawSWF (Scenario 1)

Variable Mean SD Min Median Max

NON 10473.2 648.2 9877 10264.0 11171

NLN 5531.2 347.8 5212 5414.0 5906

AH 5.230 0.133 5.07 5.210 5.44

Avg IC 1.550 0.169 1.37 1.620 1.73

Avg EC 1.238 0.123 1.11 1.270 1.38

LOC 4942.2 1215.9 3698 4929.0 6218

CC 748.2 164.6 590 723.0 919

NOB 8.8 5.5 3 10.0 14

NPC 69.0 14.5 54 71.0 83

Avg CBO 3.486 0.099 3.37 3.500 3.59


Table 4.6 Descriptive Statistics for Sunflow (Scenario 1)

Variable Mean SD Min Median Max

NON 44557.8 10485.4 36483 37430.0 56342

NLN 22601.2 5280.2 18536 19012.0 28540

AH 17.900 0.399 17.50 17.670 18.33

Avg IC 2.762 0.200 2.60 2.650 2.98

Avg EC 2.680 0.133 2.53 2.630 2.83

LOC 8303.6 2010.0 6740 6969.0 10639

CC 1630.0 461.7 1271 1320.0 2173

NOB 4.8 1.1 4 4.0 6

NPC 53.8 14.8 43 43.0 70

Avg CBO 7.002 0.706 6.47 6.500 7.82

Table 4.7 Descriptive Statistics for JHotDraw (Scenario 1)

Variable Mean SD Min Median Max

NON 44956.2 10320.9 38040 38428.0 58421

NLN 23343.0 5306.9 19724 20066.0 30272

AH 10.3862 0.5319 9.833 10.2130 11.250

Avg IC 3.190 0.183 2.96 3.155 3.49

Avg EC 2.968 0.160 2.75 2.970 3.22

LOC 19148.8 2180.0 15339 19787.5 20902

CC 3810.2 315.7 3221 3932.0 4071

NOB 28.0 7.1 20 26.5 37

NPC 115.2 8.4 100 117.0 125

Avg CBO 5.173 0.514 4.56 5.090 6.11


Table 4.8 Descriptive Statistics for Art of Illusion (Scenario 1)

Variable Mean SD Min Median Max

NON 55873.2 637.4 55230 56018.0 56734

NLN 28712.8 317.9 28396 28778.0 29152

AH 12.576 0.461 11.98 12.650 13.18

Avg IC 2.912 0.073 2.82 2.910 3.01

Avg EC 2.200 0.041 2.16 2.190 2.27

LOC 32234.8 317.8 31726 32322.0 32528

CC 6244.8 38.1 6182 6248.0 6281

NOB 53.0 1.4 52 52.0 55

NPC 141.8 0.4 141 142.0 142

Avg CBO 12.478 0.221 12.22 12.490 12.81

4.3 Correlation Analysis

In this section, we analyze the correlation between the selected set of metrics along the

evolution of multiple versions of the sample applications. Tables 4.9-4.12 show the correlation

matrices for DrawSWF, Sunflow, JHotDraw, Art of Illusion for scenario 1, while the correlation

matrices for scenario 2 are shown in Appendix C.

For DrawSWF, NON and NLN are highly correlated with IC, EC, LOC, CC, NOB, NPC and

CBO. AH has low correlation with each measure. NOB has high correlation with NON, NLN,

IC, EC, LOC, CC, NPC and CBO. Also, NPC is highly correlated with NON, NLN, IC, EC,

LOC, CC, NOB and CBO.

For Sunflow, NON and NLN are highly correlated with AH, IC, EC, LOC, CC, NOB,

NPC and CBO. AH also has high correlation with each measure. NOB has high correlation

with NON, NLN, AH, IC, EC, LOC, CC, NPC and CBO. Also, NPC is highly correlated with

NON, NLN, AH, IC, EC, LOC, CC, NOB and CBO.

For JHotDraw, NON and NLN are highly correlated with AH, IC, EC, NOB and

CBO. AH also has high correlation with NON, NLN, IC, EC, NOB and CBO. NOB has high

correlation with NON, NLN, AH, IC, EC and CBO. Also, NPC is highly correlated with LOC

and CC.


For Art of Illusion, NON and NLN are highly correlated with IC, EC and CBO. AH

also has moderate correlation with NON, NLN, IC, NOB and CBO. NOB has high correlation

with AH only. NPC is highly correlated with LOC and CC.

In case of scenario 2, the results of the correlation matrices for each application are similar, as shown in

Tables C.1-C.4 in Appendix C.

Specifically, for each application the results can be summarized as:

NON and NLN are strongly correlated.

NON and NLN have high correlation with IC, EC, and CBO.

NOB has high correlation with NON, NLN, IC, EC and CBO.

NPC has high correlation with LOC and CC.

AH doesn’t have correlation with any metric.

Table 4.9 Correlation matrix for DrawSWF (Scenario 1)

r NON NLN AH Avg IC Avg EC LOC CC NOB NPC Avg CBO

NON - 1.000 0.086 0.905 0.933 0.983 0.995 0.954 0.967 0.960

NLN 1.000 - 0.079 0.902 0.931 0.982 0.995 0.952 0.965 0.958

AH 0.086 0.079 - 0.440 0.367 0.232 0.169 0.350 0.312 0.355

IC 0.905 0.902 0.440 - 0.996 0.955 0.939 0.983 0.977 0.979

EC 0.933 0.931 0.367 0.996 - 0.967 0.959 0.985 0.983 0.987

LOC 0.983 0.982 0.232 0.955 0.967 - 0.996 0.991 0.996 0.983

CC 0.995 0.995 0.169 0.939 0.959 0.996 - 0.979 0.987 0.978

NOB 0.954 0.952 0.350 0.983 0.985 0.991 0.979 - 0.999 0.991

NPC 0.967 0.965 0.312 0.977 0.983 0.996 0.987 0.999 - 0.992

Avg CBO 0.960 0.958 0.355 0.979 0.987 0.983 0.978 0.991 0.992 -


Table 4.10 Correlation matrix for Sunflow (Scenario 1)

r NON NLN AH Avg IC Avg EC LOC CC NOB NPC Avg CBO

NON - 1.000 0.988 0.992 0.967 0.998 0.998 0.999 0.999 0.999

NLN 1.000 - 0.988 0.992 0.967 0.998 0.998 0.999 0.999 0.999

AH 0.988 0.988 - 0.962 0.991 0.988 0.988 0.985 0.985 0.983

Avg IC 0.992 0.992 0.962 - 0.929 0.990 0.989 0.995 0.995 0.994

Avg EC 0.967 0.967 0.991 0.929 - 0.962 0.961 0.961 0.961 0.958

LOC 0.998 0.998 0.988 0.990 0.962 - 1.000 0.998 0.998 0.999

CC 0.998 0.998 0.988 0.989 0.961 1.000 - 0.998 0.998 0.999

NOB 0.999 0.999 0.985 0.995 0.961 0.998 0.998 - 1.000 0.999

NPC 0.999 0.999 0.985 0.995 0.961 0.998 0.998 1.000 - 0.999

Avg CBO 0.999 0.999 0.983 0.994 0.958 0.999 0.999 0.999 0.999 -

Table 4.11 Correlation matrix for JHotDraw (Scenario 1)

r NON NLN AH Avg IC Avg EC LOC CC NOB NPC Avg CBO

NON - 1.000 0.925 0.857 0.791 -0.217 -0.056 0.929 -0.072 0.758

NLN 1.000 - 0.924 0.863 0.798 -0.239 -0.078 0.936 -0.093 0.763

AH 0.925 0.924 - 0.938 0.879 -0.108 0.027 0.885 -0.057 0.906

Avg IC 0.857 0.863 0.938 - 0.988 -0.378 -0.255 0.941 -0.337 0.975

Avg EC 0.791 0.798 0.879 0.988 - -0.435 -0.327 0.925 -0.414 0.967

LOC -0.217 -0.239 -0.108 -0.378 -0.435 - 0.976 -0.481 0.937 -0.300

CC -0.056 -0.078 0.027 -0.255 -0.327 0.976 - -0.351 0.979 -0.180

NOB 0.929 0.936 0.885 0.941 0.925 -0.481 -0.351 - -0.390 0.845

NPC -0.072 -0.093 -0.057 -0.337 -0.414 0.937 0.979 -0.390 - -0.276

Avg CBO 0.758 0.763 0.906 0.975 0.967 -0.300 -0.180 0.845 -0.276 -


Table 4.12 Correlation Matrix for Art of Illusion (Scenario 1)

r NON NLN AH Avg IC Avg EC LOC CC NOB NPC Avg CBO

NON - 1.000 0.682 0.963 0.839 -0.095 -0.522 0.294 -0.225 0.965

NLN 1.000 - 0.664 0.963 0.852 -0.070 -0.502 0.268 -0.206 0.970

AH 0.682 0.664 - 0.604 0.191 -0.774 -0.938 0.852 -0.733 0.498

Avg IC 0.963 0.963 0.604 - 0.881 -0.079 -0.515 0.267 -0.291 0.981

Avg EC 0.839 0.852 0.191 0.881 - 0.398 -0.070 -0.214 0.136 0.942

LOC -0.095 -0.070 -0.774 -0.079 0.398 - 0.879 -0.941 0.895 0.091

CC -0.522 -0.502 -0.938 -0.515 -0.070 0.879 - -0.873 0.922 -0.362

NOB 0.294 0.268 0.852 0.267 -0.214 -0.941 -0.873 - -0.791 0.120

NPC -0.225 -0.206 -0.733 -0.291 0.136 0.895 0.922 -0.791 - -0.106

Avg CBO 0.965 0.970 0.498 0.981 0.942 0.091 -0.362 0.120 -0.106 -

4.4 Principal Component Analysis (PCA)

In this section the results for PCA are shown. The PCA method is applied to 10 metrics. Tables

4.13-4.16 show the PCA results for DrawSWF, Sunflow, JHotDraw, Art of Illusion for

scenario 1. The PCA results for scenario 2 are presented in Appendix D. The number of

principal components is decided based on the amount of variance explained by each

component. A typical threshold is to retain principal components with an eigenvalue (variance) greater

than 1.0. Using this criterion, it is noticed that the metrics mainly capture two orthogonal

dimensions in the sample space. In other words, for each sample application two principal

components are considered. Loadings above 0.5 are considered when interpreting the principal

components.

For DrawSWF, in both scenarios PC1 measures NON, NLN, IC, EC and CBO, and PC2

measures AH. For Sunflow, PC1 measures NON, NLN, IC, EC and CBO for both scenarios,

but PC2 does not measure AH in either scenario.


For JHotDraw, PC1 measures NON, NLN, IC, EC and CBO for both scenarios, and PC2

measures AH for both scenarios. For Art of Illusion, PC1 measures NON, NLN, IC, EC and

CBO for both scenarios. PC2 measures AH in case of scenario 2 but not in case of scenario 1.

A significant amount of variance is captured by the NON and NLN metrics that is not

accounted for by AH, as AH is weakly or moderately correlated with the other metrics. By

analyzing the definitions of metrics that exhibit high loadings in PC1 and PC2 for each

application, the PCA results can be summarized as:

PC1 measures NON, NLN, IC, EC and CBO. This shows that all these measures are

related to each other. As NON and NLN belong to the same principal component, and the

correlation matrices also show that these two are strongly correlated, it may be

sufficient to evaluate only one of them.

PC2 measures AH. This shows that the average height of the CCT captures additional

information about software evolution.

Table 4.13 PCA results for DrawSWF (Scenario 1)

Component PC1 PC2

NON 0.972 0.222

NLN 0.970 0.229

Avg Height 0.301 -0.951

Avg IC 0.977 -0.164

Avg EC 0.987 -0.083

LOC 0.994 0.074

NOB 0.996 -0.053

CC 0.989 0.139

NPC 0.998 -0.012

Avg CBO 0.996 -0.056


Table 4.14 PCA results for Sunflow (Scenario 1)

Component PC1 PC2

NON 1.000 0.005

NLN 1.000 0.006

Avg Height 0.989 -0.146

IC 0.999 0.043

EC 1.000 -0.003

LOC 0.999 0.001

CC 0.999 0.002

NOB 0.999 0.028

NPC 0.999 0.028

Avg CBO 0.999 0.035

Table 4.15 PCA results for JHotDraw (Scenario 1)

Component PC1 PC2

NON 0.897 -0.311

NLN 0.905 -0.290

Avg Height 0.012 -0.882

Avg IC 0.981 -0.094

Avg EC 0.965 -0.004

LOC -0.489 -0.851

CC -0.361 -0.930

NOB 0.983 -0.013

NPC -0.417 -0.895

Avg CBO 0.919 -0.138


Table 4.16 PCA results for Art of Illusion (Scenario 1)

Component PC1 PC2

NON -0.899 -0.419

NLN -0.889 -0.441

Avg Height -0.898 0.370

Avg IC -0.890 -0.432

Avg EC -0.580 -0.801

LOC 0.506 -0.862

CC 0.836 -0.533

NOB -0.642 0.705

NPC 0.616 -0.685

Avg CBO -0.810 -0.582

4.5 Result Summary

The results can be summarized as follows:

There is a relationship among the selected set of metrics (i.e. static, dynamic and CCT

metrics). It can be observed from Tables 4.9-4.12 that NON and NLN are highly

correlated with IC, EC, and CBO.

It can be observed from Tables 4.13-4.16 that AH is measured by a different component.

Also, AH is generally not strongly correlated with the other metrics. So, AH can capture additional

information about software evolution.

CCT metrics are helpful in providing scenario level evolution information. It can be

observed from Figure 4.4 that the height of the CCT remains almost constant across

multiple versions of sample applications per scenario.

It is also observed that when a number of classes are added in a version of an

application, coupling increases with the increase in participating classes, which in turn

increases the number of nodes in the CCT.


4.6 Chapter Summary

This chapter presented the results and analysis obtained from experimental methodology.

Evolution of CCT metrics is discussed for each application. Then descriptive statistics are

explained. Correlation analysis is performed to find the relationship among the selected set of

metrics. Finally, principal component analysis is performed, followed by a summarization of the results.


CHAPTER 5

CONCLUSIONS AND FUTURE WORK

Software evolution has gained enormous attention of both developers and researchers.

Software systems necessarily have to change after deployment in order to stay useful.

Software evolution is concerned with the sequence of changes to a software system over its

lifetime. Software comprehension is needed for better understanding of software

functionality. But comprehending large software systems is difficult due to large size and high

complexity. It is easier to comprehend software systems at scenario level because of the smaller

size involved. Also, scenario level analysis describes the behavior of a software system from a user-centered

perspective. Dynamic analysis techniques are used to analyze run time behavior of programs.

CCT provides complete information about the dynamic behavior of programs. It maintains

dynamic metrics for each calling context.

An empirical study to investigate the scenario-centric features of software evolution at

runtime is presented in this work. The study is performed on four open source Java projects,

namely DrawSWF, JHotDraw, Sunflow, and Art of Illusion. A set of scenario level metrics, i.e.

static, dynamic and CCT metrics, is extracted. The evolution of CCT metrics is analyzed

across multiple releases of subject applications.

5.1 Conclusions

Based on the results drawn in the previous chapter, we conclude the following:

Correlation analysis shows that there is a relationship among the selected set of

metrics. As NON and NLN increase, IC, EC and CBO also increase. NOB

increases with an increase in NON, NLN, IC, EC and CBO. This trend is followed in each

application.

As classes are added in a version of an application, the NON and NLN of the

CCT increase, which further increases the dynamic coupling in the system.

The AH of the CCT can capture some extra information about scenario level software evolution.

Also, the height of the CCT remains almost constant for each scenario of a sample application.

The results show that CCT metrics provide some additional insights into software

evolution.


5.2 Future Work

The challenges faced during implementation of this work provide scope for the future work.

These challenges are:

The study can be extended by exploring more CCT metrics. Also, it would be

interesting to analyze the relationship between CCT metrics and other object oriented

quality metrics and how different measures can provide better support to software

developers.

The study considers two scenarios for each sample application. So, different and a larger

number of scenarios can be considered in the future.

The experimental study includes small and medium-sized projects, hence it can be

replicated on large-sized systems in the future. The study is performed on a limited set of

sample applications. Thus, a large number of applications can be considered to

validate the study further.

The study can be replicated on industrial systems so that more generalized results can

be obtained.


REFERENCES

[1] Belady, L.A. and Lehman, M.M., 1976. A Model of Large Program Development,

IBM Systems J., 15(1), pp. 225-252.

[2] Jarke, M., Bui, X.T. and Carroll, J.M., 1998. Scenario management: An

interdisciplinary approach. Requirements Engineering, 3(3-4), pp.155-173.

[3] Hertzum, M., 2003. Making use of scenarios: a field study of conceptual

design. International Journal of Human-Computer Studies, 58(2), pp. 215-239.

[4] Sommerville, I., 2010. Software Engineering, 9th Edition, Addison Wesley.

[5] Xie, G., Chen, J. and Neamtiu, I., 2009. Towards a better understanding of software

evolution: An empirical study on open source software. In ICSM, pp. 51-60.

[6] Erlikh, L., 2000. Leveraging legacy system dollars for e-business. IT professional,

pp.17-23.

[7] Gough, P.A., Fodemski, F.T., Higgins, S.A. and Ray, S.J., 1995. Scenarios-an

industrial case study and hypermedia enhancements. In Requirements Engineering,

Proceedings of the Second IEEE International Symposium, pp. 10-17.

[8] Breitman, K.K., 1998. A framework for scenario evolution. In Requirements

Engineering, Proceedings of the Third International Conference of IEEE, pp. 214-

221.

[9] Carroll, J.M., 1997. Scenario-based design. Handbook of human-computer

interaction 2 .

[10] Fenton, N. E. and Pfleeger, S. L., 1997. Software Metrics : A Rigorous & Practical

Approach, 2nd Edition, PWS Publishing Company.

[11] Ammons, G., Ball, T. and Larus, J.R., 1997. Exploiting hardware performance

counters with flow and context sensitive profiling. ACM Sigplan Notices, 32(5), pp.

85-96.

[12] Lehman, M.M. and Belady, L.A., 1985. Program Evolution: Processes of Software

Change. Academic Press: New York NY.

[13] Lehman, M. M., 1980. On Understanding Laws, Evolution, and Conservation in the

Large-Program Life Cycle, Journal of Systems and Software, 1(3), pp. 213-221.

[14] Lehman, M. M., 1997. Laws of Software Evolution Revisited, Position Paper,

EWSPT96, Oct. 1996, LNCS 1149, Springer Verlag, pp 108-124.


[15] Lehman, M. M., Ramil, J. F., Wernick, P. D., Perry, P. E. and Turski, W. M., 1997.

Metrics and Laws of Software Evolution–The Nineties View, Proceedings of the 4th

International Software Metrics Symposium, pp. 20-32.

[16] Kemerer, C.F. and Slaughter, S., 1999. An empirical approach to studying software

evolution. IEEE Transactions on Software Engineering, 25(4), pp. 493-509.

[17] Mens, T. and Demeyer, S., 2001. Future trends in software evolution metrics.

In Proceedings of the 4th international workshop on Principles of software

evolution ACM. pp. 83-86.

[18] Lee, Y., Yang, J. and Chang, K.H., 2007. Metrics and evolution in open source

software. In Seventh International Conference on Quality Software IEEE, pp. 191-

197.

[19] Jermakovics, A., Scotto, M. and Succi, G., 2007. Visual identification of software

evolution patterns. In Ninth international workshop on Principles of software

evolution: in conjunction with the 6th ESEC/FSE joint meeting, ACM. pp. 27-30.

[20] Chidamber, S. R. and Kemerer, C. F., 1994. A Metric Suite for Object-Oriented

Design, IEEE Transactions on Software Engineering, 20(6), pp. 476-493.

[21] Mens, T., Fernández-Ramil, J. and Degrandsart, S., 2008. The evolution of Eclipse.

In Software Maintenance. ICSM 2008. IEEE International Conference, pp. 386-395.

[22] Briand, L.C., Wüst, J., Daly, J.W. and Porter, D.V., 2000. Exploring the relationships

between design measures and software quality in object-oriented systems. Journal of

systems and software, 51(3), pp. 245-273.

[23] Zimmermann, T., Zeller, A., Weissgerber, P. and Diehl, S., 2005. Mining version

histories to guide software changes. IEEE Transactions on Software

Engineering, 31(6), pp. 429-445.

[24] Xie, G., Chen, J. and Neamtiu, I., 2009. Towards a better understanding of software

evolution: An empirical study on open source software. In ICSM, Vol. 9, pp. 51-60.

[25] Drouin, N., Badri, M. and Touré, F., 2013. Analyzing software quality evolution using

metrics: an empirical study on open source software. Journal of Software, 8(10), pp.

2462-2473.

[26] Novais, R.L., Torres, A., Mendes, T.S., Mendonça, M. and Zazworka, N., 2013.

Software evolution visualization: A systematic mapping study. Information and

Software Technology, 55(11), pp.1860-1883.


[27] Murgia, A., Concas, G., Tonelli, R. and Turnu, I., 2009. Empirical study of software quality evolution in open source projects using agile practices. In Proceedings of the 1st International Symposium on Emerging Trends in Software Metrics, p. 11.
[28] Breitman, K.K. and do Prado Leite, J.C.S., 2000. Scenario evolution: a closer view on relationships. In Proceedings of the 4th International Conference on Requirements Engineering, IEEE, pp. 95-105.
[29] Breitman, K.K., do Prado Leite, J.C.S. and Berry, D.M., 2005. Supporting scenario evolution. Requirements Engineering, 10(2), pp. 112-131.
[30] Salah, M., Mancoridis, S., Antoniol, G. and Di Penta, M., 2006. Scenario-Driven Dynamic Analysis for Comprehending Large Software Systems. In CSMR 2006, pp. 71-80.
[31] Alspaugh, T.A. and Antón, A.I., 2008. Scenario support for effective requirements. Information and Software Technology, 50(3), pp. 198-220.
[32] Hui, Z.H. and Ohnishi, A., 2003. Integration and evolution method of scenarios from different viewpoints. In Proceedings of the Sixth International Workshop on Principles of Software Evolution, IEEE, pp. 183-188.
[33] Briand, L.C., Daly, J.W. and Wüst, J.K., 1999. A unified framework for coupling measurement in object-oriented systems. IEEE Transactions on Software Engineering, 25(1), pp. 91-121.
[34] Lee, Y.S., Liang, B.S., Wu, S.F. and Wang, F.J., 1995. Measuring the coupling and cohesion of an object-oriented program based on information flow. In Proceedings of the International Conference on Software Quality, Maribor, Slovenia, pp. 81-90.
[35] Eder, J., Kappel, G. and Schrefl, M., 1993. Coupling and cohesion in object-oriented systems. Technical Report, Department of Information Systems, University of Linz, Austria.
[36] Alexander, R.T. and Offutt, A.J., 2004. Coupling-based testing of O-O programs. Journal of Universal Computer Science, 10(4), pp. 391-427.
[37] Chae, H.S., Kwon, Y.R. and Bae, D.H., 2000. A cohesion measure for object-oriented classes. Software: Practice and Experience, 30(12), pp. 1405-1431.
[38] Zhou, Y., Wen, L., Wang, J., Chen, Y., Lu, H. and Xu, B., 2003. DRC: A dependence relationships based cohesion measure for classes. In Proceedings of the Tenth Asia-Pacific Software Engineering Conference (APSEC 2003), Chiang Mai, Thailand, pp. 215-233.


[39] Wang, J., Zhou, Y., Wen, L., Chen, Y., Lu, H. and Xu, B., 2005. DMC: A more precise cohesion measure for classes. Information and Software Technology, 47(3), pp. 167-180.
[40] Arisholm, E., 2002. Dynamic coupling measures for object-oriented software. In Proceedings of the Eighth IEEE Symposium on Software Metrics, Ottawa, Canada, pp. 33-42.
[41] Mitchell, Á. and Power, J.F., 2004. An empirical investigation into the dimensions of run-time coupling in Java programs. In Proceedings of the 3rd Conference on the Principles and Practice of Programming in Java, Las Vegas, USA, pp. 9-14.
[42] Yacoub, S.M., Ammar, H.H. and Robinson, T., 1999. Dynamic metrics for object-oriented designs. In Proceedings of the 5th International Software Metrics Symposium, Boca Raton, USA, pp. 50-61.
[43] Hassoun, Y., Counsell, S. and Johnson, R., 2005. Dynamic coupling metric: proof of concept. IEE Proceedings – Software, 152(6), pp. 273-279.
[44] Gupta, N. and Rao, P., 2001. Program execution based module cohesion measurement. In Proceedings of the 16th International Conference on Automated Software Engineering, San Diego, USA, pp. 144-153.
[45] Mitchell, A. and Power, J.F. Run-time cohesion metrics: An empirical investigation. In Proceedings of the International Conference on Software Engineering Research and Practice, Las Vegas, USA, pp. 532-537.
[46] Munson, J.C. and Khoshgoftaar, T.M., 1992. Measuring dynamic program complexity. IEEE Software, 9(6), pp. 48-55.
[47] Khoshgoftaar, T.M., Munson, J.C. and Lanning, D.L., 1993. Dynamic system complexity. In Proceedings of the Software Metrics Symposium, Baltimore, USA, pp. 129-140.
[48] Godfrey, M.W. and German, D.M., 2008. The past, present, and future of software evolution. In Frontiers of Software Maintenance, IEEE, pp. 129-138.
[49] Bennett, K.H. and Rajlich, V.T., 2000. Software maintenance and evolution: a roadmap. In Proceedings of the Conference on the Future of Software Engineering, ACM, pp. 73-87.
[50] D'Ambros, M., Gall, H., Lanza, M. and Pinzger, M., 2008. Analysing software repositories to understand software evolution. In Software Evolution, Springer Berlin Heidelberg, pp. 37-67.
[51] Rajlich, V., 2014. Software evolution and maintenance. In Proceedings of the Future of Software Engineering (FOSE 2014), ACM, pp. 133-144.

[52] Yacoub, S.M., Ammar, H.H. and Robinson, T., 1999. Dynamic metrics for object-oriented designs. In Proceedings of the Sixth International Software Metrics Symposium, pp. 50-61.
[53] Moret, P., Binder, W. and Villazon, A., 2009. CCCP: Complete calling context profiling in virtual execution environments. In Proceedings of the ACM SIGPLAN Workshop on Partial Evaluation and Program Manipulation, pp. 151-160.
[54] Eitan, N., Gordon, M., Harel, D., Marron, A. and Weiss, G., 2011. On visualization and comprehension of scenario-based programs. In IEEE 19th International Conference on Program Comprehension (ICPC), pp. 189-192.
[55] Sarimbekov, A., Sewe, A., Binder, W., Moret, P., Schoeberl, M. and Mezini, M., 2011. Portable and accurate collection of calling-context-sensitive bytecode metrics for the Java virtual machine. In Proceedings of the 9th International Conference on Principles and Practice of Programming in Java, ACM, pp. 11-20.
[56] Kagdi, H., Collard, M.L. and Maletic, J.I., 2007. A survey and taxonomy of approaches for mining software repositories in the context of software evolution. Journal of Software Maintenance and Evolution: Research and Practice, 19(2), pp. 77-131.
[57] http://drawswf.sourceforge.net/
[58] http://www.jhotdraw.org/
[59] http://sunflow.sourceforge.net/
[60] http://www.artofillusion.org/
[61] Maplesden, D., Tempero, E., Hosking, J. and Grundy, J.C., 2015. Subsuming methods: Finding new optimisation opportunities in object-oriented software. In Proceedings of the 6th ACM/SPEC International Conference on Performance Engineering, pp. 175-186.
[62] Sarimbekov, A., Sewe, A., Binder, W., Moret, P. and Mezini, M., 2014. JP2: Call-site aware calling context profiling for the Java Virtual Machine. Science of Computer Programming, pp. 146-157.
[63] Hovemeyer, D. and Pugh, W., 2004. Finding bugs is easy. ACM SIGPLAN Notices, 39(12), pp. 92-106.


[64] http://www.eclemma.org/
[65] https://marketplace.eclipse.org/content/codepro-analytix
[66] http://stan4j.com/
[67] http://analyse-it.com/
[68] McCall, J.A., 1994. Quality factors. Encyclopedia of Software Engineering.
[69] Mens, T., Wermelinger, M., Ducasse, S., Demeyer, S., Hirschfeld, R. and Jazayeri, M., 2005. Challenges in software evolution. In Eighth International Workshop on Principles of Software Evolution, pp. 13-22.
[70] Parnas, D.L., 1994. Software aging. In Proceedings of the 16th International Conference on Software Engineering, pp. 279-287.
[71] Austin, M.A. and Samadzadeh, M.H., 2005. Software comprehension/maintenance: An introductory course. In 18th International Conference on Systems Engineering, pp. 414-419.
[72] Grubb, P. and Takang, A.A., 2003. Software Maintenance: Concepts and Practice, 2nd Edition. World Scientific Publishing.
[73] Davis, A.M., 1995. 201 Principles of Software Development. McGraw-Hill, Inc.
[74] McConnell, S., 1993. Code Complete: A Practical Handbook of Software Construction. Microsoft Press, Redmond, Wash.
[75] Cook, S., Ji, H. and Harrison, R., 2000. Software evolution and software evolvability. University of Reading, UK, pp. 1-12.
[76] Serrano, M. and Zhuang, X., 2009. Building approximate calling context from partial call traces. In Proceedings of the 7th Annual IEEE/ACM International Symposium on Code Generation and Optimization, pp. 221-230.
[77] Huang, J. and Bond, M.D., 2013. Efficient context sensitivity for dynamic analyses via calling context uptrees and customized memory management. In ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA), pp. 53-72.
[78] Sarvari, S., Singh, P. and Sikka, G., 2015. Efficient and Scalable Collection of Dynamic Metrics using MapReduce. In Asia-Pacific Software Engineering Conference (APSEC), pp. 127-134.
[79] Bond, M.D., Baker, G.Z. and Guyer, S.Z., 2010. Breadcrumbs: Efficient context sensitivity for dynamic bug detection analyses. In ACM SIGPLAN Notices, Vol. 45, No. 6, pp. 13-24.

[80] Arisholm, E., Briand, L.C. and Foyen, A., 2004. Dynamic coupling measurement for object-oriented software. IEEE Transactions on Software Engineering, 30(8), pp. 491-506.
[81] Gill, G.K. and Kemerer, C.F., 1991. Cyclomatic complexity density and software maintenance productivity. IEEE Transactions on Software Engineering, 17(12), pp. 1284-1288.
[82] Martin, R., 1994. OO design quality metrics: An analysis of dependencies, pp. 151-170.


APPENDIX A

This Appendix presents the CCT metrics collected for each sample application under Scenario 2, together with their evolution across versions (Figures A.1 to A.4).
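
The tree-level quantities reported below can, in principle, be recovered from any calling context tree by a single traversal. The Python sketch that follows is only illustrative: the CCTNode class is hypothetical, and it assumes that NON counts all CCT nodes, NLN counts leaf nodes, Height is the maximum node depth, and AH is the average node depth with the root at depth 1; the authoritative metric definitions are those used in the main text.

    # Illustrative sketch only: a hypothetical CCT node and metric computation.
    # Assumed interpretations: NON = total nodes, NLN = leaf nodes,
    # Height = maximum node depth, AH = average node depth (root at depth 1).

    class CCTNode:
        def __init__(self, method, children=None):
            self.method = method
            self.children = children or []

    def cct_metrics(root):
        non = nln = 0          # node and leaf-node counters
        depth_sum = 0          # accumulates depths for the AH average
        height = 0
        stack = [(root, 1)]    # iterative depth-first traversal: (node, depth)
        while stack:
            node, depth = stack.pop()
            non += 1
            depth_sum += depth
            height = max(height, depth)
            if not node.children:
                nln += 1
            stack.extend((child, depth + 1) for child in node.children)
        return {"NON": non, "NLN": nln, "Height": height, "AH": depth_sum / non}

    # Example: main -> {init, run -> {render}}
    tree = CCTNode("main", [CCTNode("init"), CCTNode("run", [CCTNode("render")])])
    print(cct_metrics(tree))   # {'NON': 4, 'NLN': 2, 'Height': 3, 'AH': 2.0}
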

Table A.1 CCT metrics for DrawSWF (Scenario 2)

Version    NON      NLN      AH      Height
1.2.5      13029    6888     5.93    46
1.2.6      13033    6890     5.96    46
1.2.7      13828    7308     5.66    46
1.2.8      14389    7612     5.89    46
1.2.9      14389    7612     5.89    46

Table A.2 CCT metrics for Sunflow (Scenario 2)

Version    NON      NLN      AH      Height
6.1        18095    9300     16.83   96
6.2        18178    9344     17      98
6.3        18225    9364     17      98
7.1        25322    12944    17.67   96
7.2        25475    13032    17.67   96

Table A.3 CCT metrics for JHotDraw (Scenario 2)

Version    NON      NLN      AH      Height
7.1        42959    22506    10.71   42
7.2        61549    31832    12.77   44
7.3        63368    32776    11.71   42
7.4.1      42247    21858    10.58   42
7.5.1      42151    21798    11.84   42
7.6        41548    21472    11.98   42


Table A.4 CCT metrics for Art of Illusion (Scenario 2)

Version    NON      NLN      AH      Height
2.9.1      59027    30456    13.13   46
2.9.2      57487    29556    12.62   46
3.0        55977    28780    14.07   46
3.0.1      55999    28794    12.45   46
3.0.2      56024    28804    14.07   46

Figure A.1: Evolution of NON (Scenario 2)
[Line chart: NON (y-axis, 0-70000) plotted against version index (x-axis, 1-6) for DrawSWF, Sunflow, JHotDraw and Art of Illusion]

Figure A.2: Evolution of NLN (Scenario 2)
[Line chart: NLN (y-axis, 0-35000) plotted against version index (x-axis, 1-6) for DrawSWF, Sunflow, JHotDraw and Art of Illusion]


Figure A.3: Evolution of AH (Scenario 2)
[Line chart: AH (y-axis, 0-20) plotted against version index (x-axis, 1-6) for DrawSWF, Sunflow, JHotDraw and Art of Illusion]

Figure A.4: Evolution of Height (Scenario 2)
[Line chart: Height (y-axis, 0-120) plotted against version index (x-axis, 1-6) for DrawSWF, Sunflow, JHotDraw and Art of Illusion]


APPENDIX B

This Appendix presents the descriptive statistics of the selected metrics for each sample application under Scenario 2.
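
The statistics reported in Tables B.1 to B.4 (Mean, SD, Min, Median, Max) can be recomputed directly from the per-version metric values. The short Python/pandas sketch below is not the analysis script used in the study; it simply reproduces the NON and NLN rows of Table B.1 from the DrawSWF values listed in Table A.1.

    import pandas as pd

    # Per-version metric values for DrawSWF (Scenario 2), taken from Table A.1.
    data = {
        "NON": [13029, 13033, 13828, 14389, 14389],
        "NLN": [6888, 6890, 7308, 7612, 7612],
    }
    df = pd.DataFrame(data, index=["1.2.5", "1.2.6", "1.2.7", "1.2.8", "1.2.9"])

    # Mean, SD, Min, Median, Max per metric; matches the NON and NLN rows of Table B.1.
    stats = df.agg(["mean", "std", "min", "median", "max"]).T.round(1)
    print(stats)
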

Table B.1 Descriptive statistics for DrawSWF (Scenario 2)

Variable   Mean      SD       Min     Median    Max
NON        13733.6   681.0    13029   13828.0   14389
NLN        7262.0    362.4    6888    7308.0    7612
AH         5.866     0.119    5.66    5.890     5.96
Avg IC     1.660     0.115    1.53    1.730     1.76
Avg EC     1.328     0.077    1.24    1.370     1.40
LOC        5166.2    1216.0   3925    5141.0    6445
CC         779.0     165.6    620     753.0     951
NOB        8.8       5.5      3       10.0      14
NPC        70.0      14.5     55      72.0      84
Avg CBO    3.482     0.079    3.39    3.500     3.59

Table B.2 Descriptive statistics for Sunflow (Scenario 2)

Variable   Mean      SD       Min     Median    Max
NON        21059.0   3962.0   18095   18225.0   25475
NLN        10796.8   2000.7   9300    9364.0    13032
AH         17.234    0.404    16.83   17.000    17.67
Avg IC     2.706     0.333    2.46    2.480     3.15
Avg EC     2.582     0.181    2.42    2.490     2.82
LOC        8284.0    1981.9   6740    6969.0    10542
CC         1609.8    450.1    1259    1308.0    2129
NOB        4.8       1.1      4       4.0       6
NPC        52.6      12.7     43      44.0      67
Avg CBO    6.836     0.767    6.26    6.300     7.80


Table B.3 Descriptive statistics for JHotDraw (Scenario 2)

Variable   Mean      SD        Min     Median    Max
NON        48970.3   10473.3   41548   42603.0   63368
NLN        22103.7   11010.8   2178    22182.0   32776
AH         11.598    0.826     10.58   11.775    12.77
Avg IC     3.057     0.046     3.00    3.045     3.13
Avg EC     2.910     0.056     2.83    2.905     2.99
LOC        21773.8   2180.0    17964   22412.5   23527
CC         4323.2    315.7     3734    4445.0    4584
NOB        33.0      7.1       25      31.5      42
NPC        129.2     8.4       114     131.0     139
Avg CBO    5.247     0.539     4.74    4.995     6.02

Table B.4 Descriptive statistics for Art of Illusion (Scenario 2)

Variable   Mean      SD       Min     Median    Max
NON        56938.8   1422.2   55977   56024.0   59207
NLN        29278.0   736.9    28780   28804.0   30456
AH         13.268    0.774    12.45   13.130    14.07
Avg IC     2.910     0.032    2.88    2.900     2.96
LOC        32745.8   317.8    32237   32833.0   33039
Avg EC     2.250     0.016    2.23    2.250     2.27
CC         6316.8    38.1     6254    6320.0    6353
NOB        53.0      1.4      52      52.0      55
NPC        142.8     0.4      142     143.0     143
Avg CBO    12.410    0.097    12.31   12.390    12.53


APPENDIX C

This Appendix presents the correlation matrices for the selected set of metrics for each sample application under Scenario 2.
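
Each cell in Tables C.1 to C.4 is a pairwise Pearson correlation coefficient (r) computed over the versions of a system. The Python sketch below is not the statistical tool used for the analysis in this thesis; it is only a minimal illustration of how such a matrix can be derived from a per-version metrics table, using the DrawSWF NON, NLN and AH series from Table A.1. It reproduces the corresponding sub-block of Table C.1 (for example, r(NON, AH) is approximately -0.306).

    import pandas as pd

    # Per-version metric values for DrawSWF (Scenario 2), taken from Table A.1.
    df = pd.DataFrame({
        "NON": [13029, 13033, 13828, 14389, 14389],
        "NLN": [6888, 6890, 7308, 7612, 7612],
        "AH":  [5.93, 5.96, 5.66, 5.89, 5.89],
    })

    # Pairwise Pearson correlation coefficients, rounded to three decimals
    # to match the presentation of Tables C.1-C.4.
    r = df.corr(method="pearson").round(3)
    print(r)
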

Table C.1 Correlation matrix for DrawSWF (Scenario 2)

r         NON     NLN     AH      Avg IC  Avg EC  LOC     CC      NOB     NPC     Avg CBO
NON       -       1.000   -0.306  0.943   0.940   0.995   0.986   0.999   1.000   0.897
NLN       1.000   -       -0.299  0.941   0.938   0.996   0.987   0.999   1.000   0.896
AH        -0.306  -0.299  -       -0.587  -0.575  -0.222  -0.145  -0.347  -0.305  -0.384
Avg IC    0.943   0.941   -0.587  -       0.999   0.907   0.878   0.956   0.943   0.928
Avg EC    0.940   0.938   -0.575  0.999   -       0.903   0.877   0.953   0.940   0.944
LOC       0.995   0.996   -0.222  0.907   0.903   -       0.996   0.990   0.995   0.865
CC        0.986   0.987   -0.145  0.878   0.877   0.996   -       0.978   0.986   0.864
NOB       0.999   0.999   -0.347  0.956   0.953   0.990   0.978   -       0.999   0.901
NPC       1.000   1.000   -0.305  0.943   0.940   0.995   0.986   0.999   -       0.897
Avg CBO   0.897   0.896   -0.384  0.928   0.944   0.865   0.864   0.901   0.897   -

Table C.2 Correlation matrix for Sunflow (Scenario 2)

r         NON     NLN     AH      Avg IC  Avg EC  LOC     CC      NOB     NPC     Avg CBO
NON       -       1.000   0.987   0.981   0.979   0.999   0.999   1.000   0.999   0.994
NLN       1.000   -       0.987   0.980   0.979   0.999   0.999   1.000   0.999   0.994
AH        0.987   0.987   -       0.971   0.980   0.989   0.989   0.985   0.987   0.981
Avg IC    0.981   0.980   0.971   -       0.926   0.976   0.974   0.983   0.978   0.955
Avg EC    0.979   0.979   0.980   0.926   -       0.985   0.986   0.975   0.978   0.990
LOC       0.999   0.999   0.989   0.976   0.985   -       1.000   0.999   0.998   0.995
CC        0.999   0.999   0.989   0.974   0.986   1.000   -       0.998   0.998   0.996
NOB       1.000   1.000   0.985   0.983   0.975   0.999   0.998   -       0.999   0.993
NPC       0.999   0.999   0.987   0.978   0.978   0.998   0.998   0.999   -       0.996
Avg CBO   0.994   0.994   0.981   0.955   0.990   0.995   0.996   0.993   0.996   -


Table C.3 Correlation matrix for JHotDraw (Scenario 2)

r         NON     NLN     AH      Avg IC  Avg EC  LOC     CC      NOB     NPC     Avg CBO
NON       -       0.721   0.556   0.919   0.859   -0.238  -0.071  0.935   -0.085  0.987
NLN       0.721   -       0.163   0.657   0.614   -0.407  -0.260  0.610   -0.213  0.713
AH        0.556   0.163   -       0.220   0.127   0.200   0.195   0.527   0.087   0.441
Avg IC    0.919   0.657   0.220   -       0.983   -0.383  -0.192  0.896   -0.173  0.967
Avg EC    0.859   0.614   0.127   0.983   -       -0.517  -0.335  0.884   -0.300  0.930
LOC       -0.238  -0.407  0.200   -0.383  -0.517  -       0.976   -0.481  0.937   -0.346
CC        -0.071  -0.260  0.195   -0.192  -0.335  0.976   -       -0.351  0.979   -0.171
NOB       0.935   0.610   0.527   0.896   0.884   -0.481  -0.351  -       -0.390  0.954
NPC       -0.085  -0.213  0.087   -0.173  -0.300  0.937   0.979   -0.390  -       -0.172
Avg CBO   0.987   0.713   0.441   0.967   0.930   -0.346  -0.171  0.954   -0.172  -

Table C.4 Correlation matrix for Art of Illusion (Scenario 2)

r         NON     NLN     AH      Avg IC  LOC     Avg EC  CC      NOB     NPC     Avg CBO
NON       -       1.000   -0.319  0.977   -0.972  0.884   -0.925  0.982   -0.892  0.917
NLN       1.000   -       -0.319  0.977   -0.972  0.883   -0.926  0.981   -0.894  0.915
AH        -0.319  -0.319  -       -0.297  0.465   -0.349  0.179   -0.391  0.100   -0.331
Avg IC    0.977   0.977   -0.297  -       -0.968  0.950   -0.843  0.950   -0.884  0.946
LOC       -0.972  -0.972  0.465   -0.968  -       -0.883  0.879   -0.941  0.895   -0.878
Avg EC    0.884   0.883   -0.349  0.950   -0.883  -       -0.643  0.894   -0.707  0.978
CC        -0.925  -0.926  0.179   -0.843  0.879   -0.643  -       -0.873  0.922   -0.705
NOB       0.982   0.981   -0.391  0.950   -0.941  0.894   -0.873  -       -0.791  0.948
NPC       -0.892  -0.894  0.100   -0.884  0.895   -0.707  0.922   -0.791  -       -0.692
Avg CBO   0.917   0.915   -0.331  0.946   -0.878  0.978   -0.705  0.948   -0.692  -


APPENDIX D

This Appendix presents the principal component analysis (PCA) results for each sample application under Scenario 2.
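
Tables D.1 to D.4 list, for each metric, its loading on the first two principal components (PC1, PC2). The Python sketch below is not the tool used to produce these tables; it only illustrates, under common assumptions (metrics standardized before extraction, loadings taken as the component vectors scaled by the square root of their explained variance), how such loadings can be obtained with scikit-learn from a per-version metrics table. Signs and magnitudes may therefore differ from the reported values.

    import numpy as np
    import pandas as pd
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    # Per-version metric values for DrawSWF (Scenario 2), taken from Table A.1.
    df = pd.DataFrame({
        "NON": [13029, 13033, 13828, 14389, 14389],
        "NLN": [6888, 6890, 7308, 7612, 7612],
        "AH":  [5.93, 5.96, 5.66, 5.89, 5.89],
    })

    # Standardize the metrics, then extract the first two principal components.
    X = StandardScaler().fit_transform(df.values)
    pca = PCA(n_components=2).fit(X)

    # Component loadings: weight of each metric on PC1 and PC2.
    loadings = pca.components_.T * np.sqrt(pca.explained_variance_)
    print(pd.DataFrame(loadings, index=df.columns, columns=["PC1", "PC2"]).round(3))
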

Table D.1 PCA results for DrawSWF (Scenario 2)

Component   PC1      PC2
NON         0.992    0.106
NLN         0.992    0.113
AH          -0.399   0.911
Avg IC      0.975    -0.220
Avg EC      0.975    -0.212
CC          0.962    0.268
LOC         0.976    0.195
NPC         0.992    0.107
NOB         0.996    0.063
Avg CBO     0.932    -0.054

Table D.2 PCA results for Sunflow (Scenario 2)

Component   PC1      PC2
NON         0.999    0.019
NLN         0.999    0.017
AH          0.991    -0.007
Avg IC      0.978    0.207
Avg EC      0.983    -0.176
LOC         1.000    -0.009
CC          1.000    -0.017
NPC         0.999    0.011
NOB         0.999    0.035
Avg CBO     0.995    -0.080


Table D.3 PCA results for JHotDraw (Scenario 2)

Component   PC1      PC2
NON         0.913    -0.404
NLN         0.746    -0.070
AH          0.335    -0.524
Avg IC      0.927    -0.222
Avg EC      0.934    -0.063
LOC         -0.609   -0.790
CC          -0.458   -0.877
NOB         0.964    -0.127
NPC         -0.454   -0.844
Avg CBO     0.953    -0.296

Table D.4 PCA results for Art of Illusion (Scenario 2)

Component   PC1      PC2
NON         -0.996   0.054
NLN         -0.996   0.056
AH          0.362    0.893
Avg IC      -0.989   0.040
Avg EC      -0.915   -0.124
CC          0.893    -0.256
LOC         0.984    0.084
NPC         0.884    -0.327
NOB         -0.978   -0.060
Avg CBO     -0.932   -0.096