
Cluster Comput (2017) 20:2267–2281. DOI 10.1007/s10586-017-0892-6

A parallel framework for software defect detection and metric selection on cloud computing

Md Mohsin Ali 1 · Shamsul Huda 2 · Jemal Abawajy 2 · Sultan Alyahya 3 · Hmood Al-Dossari 3 · John Yearwood 2

Received: 23 October 2016 / Revised: 13 March 2017 / Accepted: 27 April 2017 / Published online: 24 May 2017. © Springer Science+Business Media New York 2017

Abstract  With the continued growth of the Internet of Things (IoT) and its convergence with the cloud, numerous interoperable software products are being developed for the cloud. Therefore, there is a growing demand to maintain a better quality of software in the cloud for improved service. This is all the more crucial as the cloud environment is moving rapidly towards a hybrid model, a combination of the public and private cloud models. Considering the high volume of software available as a service (SaaS) in the cloud, identifying non-standard software and measuring its quality in the SaaS setting is an urgent issue. Manual testing and determination of software quality is very expensive and, to a large extent, impossible to accomplish. An automated software defect detection model that can measure the relative quality of software and identify its faulty components can significantly reduce the software development effort and improve the cloud service. In this paper, we propose a software defect detection model that can be used to identify faulty components in big software metric data. The novelty of our proposed approach is that it identifies significant metrics using a combination of different filter and wrapper techniques. An important contribution of the proposed approach is a parallel framework for a hybrid software defect predictor, designed and evaluated to deal with big software metric data in a computationally efficient way for the cloud environment. Two different hybrids have been developed using the Fisher and Maximum Relevance (MR) filters with an Artificial Neural Network (ANN) based wrapper in the parallel framework. The evaluations are performed with real defect-prone software datasets for all parallel versions. Experimental results show that the proposed parallel hybrid framework achieves a significant computational speedup on a computer cluster, with higher defect prediction accuracy and a smaller number of software metrics compared to the independent filter or wrapper approaches.

Corresponding author: Shamsul Huda, [email protected]

Md Mohsin Ali, [email protected]
Jemal Abawajy, [email protected]
Sultan Alyahya, [email protected]
Hmood Al-Dossari, [email protected]
John Yearwood, [email protected]

1 The Australian National University, Canberra, Australia
2 Deakin University, Melbourne, Australia
3 King Saud University, Riyadh, Saudi Arabia

1 Introduction

Due to the rapid development of cloud computing, the size and complexity of cloud-based software products are continually increasing. With the advent of the Internet of Things (IoT) and its convergence with the cloud, the functionalities and requirements of cloud-based software products are also increasing. This poses more challenges for cloud-based business organizations in developing high-quality software products. Thus, determining the quality of a software product and maintaining that quality are very important and challenging due to the exponential growth of overall complexity. Considering the importance of tackling this challenge, software industries spend around a quarter of their budget on quality assurance and testing [4].


The basic way of delivering a bug-free, quality software product is to locate and correct the defects. For large and complex software products in the cloud environment there is a significant number of test cases, and in this scenario testing is too tedious and costly. Detecting and correcting faults in a released product is reported to be 100 times more expensive than repairing them during the development stage [31]. However, delayed detection propagates the effect of faults to the subsequent stages and complicates the overall scenario. Similarly, early detection is challenging due to the absence of the project's own failure data [39].

Generally, internal and external metrics are used as a quality measure. Internal metrics are based on the software code, and external metrics are measured from the behavioral characteristics of the software [4]. Software developers need to monitor and track internal attributes as early as possible throughout the development process. A change in the monitored attributes from one stage to another indicates a potential design problem. Monitoring, measuring, analyzing, and tracking all the metrics at every stage of a highly evolving software product is too challenging and costly for a developer seeking to maintain its quality.

Thus, automated software fault detection is an important activity for significantly reducing the overall cost of a highly evolving software product in the cloud environment. Currently, three types of automated fault detection research tracks are being pursued: (a) estimating the number of remaining defects in the software product (e.g. [10,13,30]), (b) discovering defect associations (e.g. [36]), and (c) classifying software components as defective or non-defective (e.g. [17,26,27,29]). The work in this article lies in the third category.

The current trend for the classification problem relies on standard machine learning (ML) approaches such as naïve Bayes (NB) [7], support vector machines (SVM) [11], decision trees [32], and artificial neural networks (ANN) [24]. The key idea of these techniques is to use the internal code metrics of defect data from similar software projects [5,25,28] to train the classifier, so that it can classify the targeted software modules as defective or non-defective based on this learning.

Not all software code metrics carry important characteristics. Monitoring and controlling all of these metrics at every release is too complex and tedious for software developers. Moreover, training the classifier with many irrelevant metrics unnecessarily extends the training time, complicates the classifier model, and may introduce the over-fitting problem [6,23]. As a result, selecting the key software defect metrics is very important for the classifier as part of delivering a high-quality software product. This also makes it easier for developers to monitor only the important metrics at every release.

Accurate classification using the selected key metrics as inputs to a classifier requires a proper configuration of the problem and a proper validation technique. Improper validation may cause misleading results, which could eventually propagate to subsequent publications. Improper validation may arise for different reasons, such as: (a) using imbalanced datasets in the training stage, (b) not applying various types of datasets for validation, and (c) skipping the "false negative" check (identifying a defective component as non-defective), which has a more serious negative impact on classification.

Prior work of the authors [20] combines both the filter [34] and wrapper [3] approaches in a hybrid classification model that takes advantage of both. The evaluation was carried out on multivariate process monitoring and detection of sources of out-of-control signals in manufacturing systems [20]. However, that prior work [20] was limited to a single filter and a single search strategy.

Furthermore, the defect predictors employed for complex software systems take too much time to finish their processing. For the cloud, where the size of software products and datasets is trending upwards towards very big data, the processing time of the predictor is a key issue. One way to resolve this issue is to harness the full power of modern computer architectures through a parallel prediction system. To scale the system further, multiple computer systems connected through a high-speed network could be employed to execute the predictor in parallel. However, to achieve all of these parallel computation benefits, the predictor must be designed accordingly.

In this article, we propose a parallel hybrid wrapper-filter approach to find the key software metrics for efficiently classifying a software module as either a defective or a non-defective component. The key contributions of this article are as follows:

– A novel parallel framework for identifying key metrics using a combination of different filter and wrapper methods has been proposed.

– A combination of multiple filters with a wrapper for in-depth analysis has been proposed.

– An extensive analysis of the proposed approaches with various types of real defect-prone datasets for robust evaluation has been accomplished.

The organization of this article is as follows: Sect. 1 presents the introduction. Background information and related work are discussed in Sect. 2. The proposed methodologies of the parallel hybrid models are discussed in Sect. 3. Implementation details, with an experimental evaluation of parallel execution performance and important feature selection accuracy for defect prediction, are presented in Sect. 4. Finally, Sect. 5 concludes the article.


2 Background and related work

Some background information and existing work closely related to this work are presented in this section.

2.1 Software metrics and metric selection

Software metrics are tools in software engineering that a software engineer uses to understand different aspects of the code base and the overall progress of the software project. While metrics are generally used for quality assurance, performance, debugging, management, and estimating costs, they are becoming popular for finding defects in pre- or post-release code, predicting defective code, and predicting project success and risks.

There are different types of metrics available. They can be classified into requirements metrics, product metrics, or process metrics. They can also be classified based on code, programmer productivity, design, testing, maintenance, and management. Some of these are measurable and others are not. While some metrics can be measured directly, others are measured analytically. For example, in object-oriented (OO) software, design-coupling-based directly measurable metrics such as coupling between objects (CBO), response for a class (RFC), message passing coupling (MPC), and information-flow-based coupling (ICP) play an important role in defect prediction [5]. Other important measurable software metrics are the lines of code (LOC) metric, the McCabe cyclomatic complexity (MCC) metric, the McCabe essential complexity (MEC) metric, the McCabe module design complexity (MMDC) metric, and the Halstead metrics.

The selection of significant software metrics for building a simple but high-quality software defect prediction model is primarily based on search-based techniques. An exhaustive search (a total of 2^n candidate subsets for n-dimensional metrics) over a large number of metrics is infeasible with limited project resources. An exponential-to-linear reduction of the search space is usually achieved by applying some kind of ranking to the features. Forward selection (FS) [37], backward elimination (BE) [37], or a combination of the two is then applied to the ranked features to select the significant subset of features.
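As a concrete illustration of this reduction (using the 37-metric PC3 and PC4 datasets evaluated later in Sect. 4), an exhaustive search would have to examine

2^{37} = 137{,}438{,}953{,}472 \approx 1.4 \times 10^{11}

candidate subsets, whereas a single ranked backward-elimination pass evaluates only 37 nested subsets, one per elimination step.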

2.2 Filters, wrappers and related classification costs

Filter [34] and wrapper [3] methods have proved to be useful for the selection of key metrics. The filter method analyzes the intrinsic properties of the metrics, rather than relying on evaluation by the prediction model, to score a selected subset of metrics and determine the best subset. This is reported to be computationally faster, as it does not need to run the prediction algorithm to score the subset of metrics; however, neglecting the potential interaction among the elements of the subset may lead to the selection of redundant metrics, which may result in poor prediction performance. The wrapper method, on the other hand, applies the prediction algorithm 2^n times to select the best-performing metric subsets from the n-dimensional metrics, which makes it computationally expensive. Although the wrapper provides good prediction performance due to its interaction with the subset elements, it carries a potential risk of over-fitting [6,23,38].

Two outcomes are possible in the binary classifier model presented in [6,23]: the classifier either correctly labels a module, or it fails to do so. Generally, classifying a faulty module as non-faulty is more costly than the opposite (i.e. classifying a non-faulty module as faulty). In this article, we consider the same cost for all misclassifications, rather than adopting this general cost model. Thus the prediction accuracy of the proposed approaches is computed as the ratio of the number of correctly predicted modules to the total number of modules.

A comparison of a greedy forward selection based two-variant ensemble learning classifier with a classifier based on correlation-based forward selection is presented in [25]. It is observed that the ensemble learning classifier outperforms the correlation-based classifier on imbalanced datasets containing redundant information. Key feature selection by the quantum particle swarm optimization (QPSO) technique, with the selected features applied to an artificial neural network (ANN) based classifier to detect software fault-proneness, is presented in [22].

A threshold-moving technique is applied in the final stage of the adaptive boosting algorithm (AdaBoost [15,16]) to reduce the effect of misclassification in [41]. The use of software defect patterns by association rule mining techniques, and the derivation of action patterns for predicting defects in software modules, is presented in [9]. Changes made in software components, used as input to a non-ML based model for predicting defects in software components, are presented in [18]. A number of fuzzy sets and types of membership functions used as expert knowledge in an adaptive neuro-fuzzy inference system (ANFIS) model to predict software defects are presented in [14].

A study of the influence of dataset size and metric set on the performance of software fault prediction shows that the random forest (RF) and naïve Bayes (NB) techniques are the best-performing software fault predictors for large and small datasets, respectively, for method-level metrics (irrespective of the presence of the derived Halstead metrics) [8]. On the other hand, fault prediction using class-level metrics, whether or not correlation-based metric selection is applied, shows that NB is the best fault predictor. While these techniques are evaluated based on the types and sizes of the metrics, they are unable to draw any conclusion about the quality of the metrics for defect prediction, which is our focus.


Table 1 Comparison of existing work

Approach name | Significant contribution | Limitations | References
Filter | Computationally faster | Does not evaluate the subset | [34]
Wrapper | Good prediction performance | A potential risk of over-fitting | [6,23,41]
Greedy forward selection | Uses swarm optimization | Makes search space wider compared to filter | [25]
Threshold-moving | Improved performance | No feature selection | [15,16]
ANFIS | Fuzzy membership function used | Only classification | [14]
Random forest | Uses method level metrics | No significant improvement in approach and no feature selection | [8]
Naive Bayes | Uses method level metrics | No significant improvement in approach and no feature selection | [8]
Defect-proneness | A feature selection framework using an existing classifier | Two separate steps for feature selection and classification, and a search strategy similar to the filter approach | [35]
Proposed approach | Hybrid of wrapper and filter and their parallel framework; combines advantages of both filter and wrapper in a computationally efficient way | Can be investigated with other wrappers | Current work

It is shown in [40] that software design metrics alone can be used at the very early stages of the software development cycle to classify a software module as either defective or non-defective. However, the presented outcome is not generally applicable, as the evaluation was carried out on only a single dataset.

A systematic literature review consisting of 64 primary studies is presented in [28] to analyze the effectiveness of machine learning (ML) techniques as software fault predictors. The author identified that ML models perform better than traditional statistical models as defect predictors for software modules. The study identified that (a) the most widely used ML techniques for the prediction of software faults are C4.5, naïve Bayes (NB), multilayer perceptron (MLP), support vector machines (SVM), and random forest (RF); (b) the most commonly used feature selection technique is correlation-based feature selection; (c) the most frequently used features are procedural metrics; (d) the most useful object-oriented features are CBO, RFC, and LOC; (e) the most frequently used data sets are the NASA data sets; and (f) the most widely used performance measures are accuracy, precision, recall, area under the receiver operating characteristic curve (AUC), and F-measure.

A systematic literature review of 106 primary studies is carried out in [33] to determine the popularity of software metrics as defect predictors. The study identified that 49% of the metrics used are object-oriented (OO) (Chidamber and Kemerer's (CK) metrics are among the most common), 27% are traditional source code metrics, and 24% are process metrics. While academic researchers prefer OO metrics, industry researchers choose process metrics. Table 1 presents a comparison of the advantages and disadvantages of the existing approaches in detail.

The selection of important reliability metrics through expert opinion, and the application of these metrics to a fuzzy-logic-based defect predictor to predict software defects at each phase of the software development life cycle (SDLC), is presented in [39]. Involving an expert in the system may introduce bias into the evaluation and may lead to the selection of unrelated key metrics from different projects. The elimination of the strong requirement for a quality expert by means of a semi-supervised hybrid self-organizing map (HySOM) model, a combination of a self-organizing map (SOM) and an artificial neural network (ANN), is presented in [2].

A cost-sensitive software fault prediction model which assigns different cost factors to misclassifications is presented in [21]. The advantage of this modeling is that the project manager can choose an appropriate misclassification cost based on the sensitivity of the project. For a highly sensitive (high-risk) project, the misclassification cost is very important, but it matters less for a low-risk project. Analyzing the impact of eleven different misclassification costs on the proposed model, the authors conclude that minimizing the overall misclassification cost, rather than the number of misclassified modules, is expected to be an important property of a better classifier.

A general framework for solving the bias problem in software defect prediction is presented in [35]. In this framework, different learning techniques are analyzed to choose the best subset of features from the historical training data, which is then used to train the predictor with the training data. Finally, this predictor is used to predict the faults in software with the new testing data. Although this framework is stable, provides more accurate results, and solves the baseline bias of previous approaches, one of its disadvantages is that feature selection and defect prediction are performed in two separate steps rather than in a single step. Combining these two steps into one may reduce the computational complexity.

Prior work of the authors on multivariate process monitoring and detection of sources of out-of-control signals in manufacturing systems, based on a hybrid of wrapper and filter approaches, is presented in [20]. The major disadvantages of these approaches are that multiple filters and wrappers were not considered and the algorithm was serial.

3 Methodology

3.1 Problem statement

Let us assume that we have a set of m training software modules {X : x_j | j = 1, 2, 3, ..., m} and the corresponding set of m fault/no-fault classes {C : c_l | l = 1, 2, 3, ..., m}. It is also assumed that we have a set of n software metrics {Q : q_i | i = 1, 2, 3, ..., n}. The problem of software defect prediction consists of using the metrics in Q to classify the modules in X by a wrapper, labeling them as either faulty or non-faulty. Suppose that {T : t_l | l = 1, 2, 3, ..., m} is the set of labels produced by the wrapper. If t_l = c_l for all l ∈ {1, 2, 3, ..., m}, then the wrapper correctly classifies all the modules. Otherwise, some of the modules are misclassified. The problem of determining the set of significant software metrics can be defined mathematically as

G(Q, X) = \arg\max_{S_k \subset Q} \mathrm{Perf}(S_k \mid X) .   (1)

Here Perf(S_k | X) is a function representing the capability of software fault identification by a set of metrics S_k. It can be mathematically defined as

\mathrm{Perf}(S_k \mid X) = \frac{\sum_{i=1}^{m} o_i}{m} ,   (2)

where

o_i = \begin{cases} 1 & \text{if } \hat{c}_i = c_i \\ 0 & \text{otherwise.} \end{cases}

Here \hat{c}_i is the fault type predicted by the decision function D of the wrapper and c_i is the actual fault type. The goal of any wrapper is to determine a decision function D that maximizes the performance function Perf(S_k | X) for a subset of metrics.
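In other words, Perf(S_k | X) is simply the fraction of correctly classified modules. A minimal sketch of this computation (illustrative variable names, not the authors' code; predicted and actual are m-by-1 label vectors holding the wrapper's predictions and the ground truth):

```matlab
% Eq. (2): fraction of software modules whose predicted label matches the actual label
perf = sum(predicted == actual) / numel(actual);
```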

One of the key challenges is to find a wrapper which can maximize the performance function and, at the same time, generate information about the significant metrics. Since there are a total of 2^n possible subsets S_k ⊂ Q for an n-dimensional metric set Q, generating this huge number of subsets for G(Q, X) in Eq. (1) is also a computationally expensive task. Reducing this exponential search space problem to one of lower computational complexity is still an open problem.

3.2 Hybrid framework

The selection of key metric subsets by a wrapper requires executing it 2^n times for an n-dimensional metric set. Search strategies such as backward elimination (BE), forward selection (FS), or a combination of the two can reduce the search space from 2^n to O(n) when the heuristics of a wrapper are applied to score the metrics in a subset S_k. Although the prediction performance of a wrapper is good, due to its interaction with the metrics in S_k, it carries a potential risk of over-fitting [6,23]. The filter, on the other hand, is computationally faster, as it uses the intrinsic properties of the metrics in S_k, rather than actually applying a prediction algorithm, to score the metrics in S_k. However, it has the disadvantage of selecting redundant features because there is no interaction among the metrics in S_k.

Our proposed hybrid framework takes the benefit of both the filter and the wrapper approaches, rather than improving them individually. It uses both approaches to score each metric in S_k and applies the combined score to select the significant metrics. The detailed hybrid algorithm (serial) is presented in Algorithm 1. In this algorithm, a single wrapper called Artificial Neural Network based Input Gain Measurement Approximation (ANNIGMA) [19] and one of two filters, (a) Maximum Relevance (MR) and (b) FISHER based filter scoring, are applied. As a search strategy, it applies BE. However, any filter, wrapper, and search strategy can in general be applied in this algorithm.

A detailed description of the MR and FISHER filters and of the two hybrid models which apply these filters, called MR-ANNIGMA and FISHER-ANNIGMA, used in the algorithm is as follows.

3.2.1 MR filter

MR is a good heuristic that is able to select salient features in the data mining area. It uses mutual information, which statistically summarizes the degree of relevance between the metrics in S_k and the class variable c_l. Usually, significant metrics provide more relevant information about the class variable than insignificant metrics. Thus, MR can be applied, by means of statistical heuristics, to select the significant metrics that are most relevant to the class variable.


The normalized MR score of a metric q_i ∈ S_k for class label c_l ∈ C is defined as follows:

\mathrm{Relevance}(q_i) = \frac{I(q_i; c_l)}{\max_{q_i \in S_k} I(q_i; c_l)} ,   (3)

where I(q_i; c_l) is the mutual information between q_i and c_l, which is defined as

I(q_i; c_l) = H(q_i) - H(q_i \mid c_l) .   (4)

Here H(q_i) is the entropy of q_i with probability density function p(q_i), where q_i takes discrete values from the set of values in S_k, and is defined as

H(q_i) = -\sum_{q_i \in S_k} p(q_i) \log p(q_i) .   (5)

H(q_i | c_l), on the other hand, is the conditional entropy between q_i and c_l with joint probability density function p(q_i, c_l), and is defined as

H(q_i \mid c_l) = -\sum_{q_i \in S_k} \sum_{c_l \in C} p(q_i, c_l) \log p(q_i \mid c_l) ,   (6)

where c_l takes discrete values from the set of values in C.
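A minimal sketch of how the normalized MR score of Eqs. (3)–(6) can be computed is given below (illustrative code, not the authors' implementation; it assumes the metric values in Q have already been discretized, with Q an m-by-n matrix and c the m-by-1 class labels):

```matlab
function rel = mr_relevance(Q, c)
% Normalized MR score of Eq. (3) for every metric (column) of Q.
    n = size(Q, 2);
    mi = zeros(1, n);
    for i = 1:n
        mi(i) = mutual_info(Q(:, i), c);          % I(q_i; c_l) of Eq. (4)
    end
    rel = mi ./ max(mi);                          % normalization of Eq. (3)
end

function I = mutual_info(q, c)
    I = entropy_disc(q) - cond_entropy(q, c);     % Eq. (4): H(q) - H(q | c)
end

function H = entropy_disc(q)
    [~, ~, idx] = unique(q);
    p = accumarray(idx, 1) / numel(q);            % empirical p(q) of Eq. (5)
    H = -sum(p .* log2(p));
end

function H = cond_entropy(q, c)
% H(q | c) = sum_k p(c_k) * H(q | c = c_k), the conditional entropy of Eq. (6)
    H = 0;
    cls = unique(c);
    for k = 1:numel(cls)
        idx = (c == cls(k));
        H = H + mean(idx) * entropy_disc(q(idx));
    end
end
```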

3.2.2 FISHER filter

With the FISHER filter [12], features are scored so that, in the data domain of the selected features, the distance between data points in different classes is as large as possible while the distance between points in the same class is as small as possible. Let us assume that there are d (≤ m) distinct classes c_k ∈ C with sample sizes s_k, k = 1, 2, ..., d. We further assume that \mu_k^l and \sigma_k^l are the mean and standard deviation of the kth class for the lth feature, and \mu^l and \sigma^l are the mean and standard deviation of X for the lth feature. Then the FISHER score of the lth feature q_l is calculated as follows:

\mathrm{FISHER}(q_l) = \frac{\sum_{k=1}^{d} s_k (\mu_k^l - \mu^l)^2}{(\sigma^l)^2} ,   (7)

where (\sigma^l)^2 = \sum_{k=1}^{d} s_k (\sigma_k^l)^2.
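The FISHER score of Eq. (7) can be computed column-wise over the metric matrix; a minimal illustrative sketch (not the authors' code), assuming X is the m-by-n metric matrix and c the m-by-1 class labels:

```matlab
function f = fisher_score(X, c)
% FISHER score of Eq. (7) for every metric (column) of X.
    mu_all = mean(X, 1);                           % overall mean per metric
    num = zeros(1, size(X, 2));
    den = zeros(1, size(X, 2));
    cls = unique(c);
    for k = 1:numel(cls)
        idx = (c == cls(k));
        sk = sum(idx);                             % class sample size s_k
        mu_k = mean(X(idx, :), 1);                 % class mean per metric
        sd_k = std(X(idx, :), 0, 1);               % class std per metric
        num = num + sk * (mu_k - mu_all).^2;       % numerator of Eq. (7)
        den = den + sk * sd_k.^2;                  % (sigma^l)^2 of Eq. (7)
    end
    f = num ./ den;
end
```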

3.2.3 Hybrid MR-ANNIGMA and FISHER-ANNIGMA models

Fig. 1 A single hidden layer multilayer perceptron (MLP) neural network

Suppose a two-layer neural network (NN), shown in Fig. 1, has inputs {Q : q_i | i = 1, 2, 3, ..., n} on the input layer i, the hidden layer is r, the network weights on the two layers are W_ir and W_rs, and F is a logistic activation function. The decision function on output layer s of this NN is defined as follows:

D_s = \sum_{r} F\left( \sum_{i} q_i \times W_{ir} \right) \times W_{rs} ,   (8)

where F(q) = 1/(1 + exp(−q)). Based on this decision function D_s, the ANNIGMA [19] score can be defined as

\mathrm{ANNIGMA}(q_{is}) = \frac{G_{is}}{\max_{n} G_{ns}} ,   (9)

where G_{is} = \sum_{r} |W_{ir} \times W_{rs}| is the local gain for the ith input to the sth output. Since cross-validation is applied to the wrapper, the average ANNIGMA score for the ith metric q_i and the sth output node over n-fold cross-validation is calculated as follows:

\mathrm{ANNIGMA}(q_{is})_{avg} = \frac{1}{n} \sum_{i=1}^{n} \mathrm{ANNIGMA}(q_{is})_i ,   (10)

where ANNIGMA(q_{is})_i, i ∈ [1, 2, 3, ..., n], is the ANNIGMA(q_{is}) of the ith fold. Then the ANNIGMA score for the ith metric is

\mathrm{ANNIGMA}(q_i) = \frac{1}{K} \sum_{s=1}^{K} \mathrm{ANNIGMA}(q_{is})_{avg} ,   (11)

where K is the total number of output nodes of the network shown in Fig. 1.

Finally, the combined score of a metric in the MR-ANNIGMA is computed by adding Eqs. (3) and (11) as follows:

\mathrm{MR\text{-}ANNIGMA}(q_i) = \mathrm{Relevance}(q_i) + \mathrm{ANNIGMA}(q_i) .   (12)
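Given the trained weight matrices of the MLP in Fig. 1, the per-output ANNIGMA score of Eq. (9) reduces to a product of absolute weight matrices followed by a normalization; a minimal illustrative sketch (not the authors' code):

```matlab
function a = annigma_score(Wir, Wrs)
% Wir: n_inputs-by-n_hidden weights, Wrs: n_hidden-by-n_outputs weights.
% Returns the ANNIGMA score of Eq. (9): one row per input metric, one column
% per output node.
    G = abs(Wir) * abs(Wrs);                       % local gains G_is = sum_r |W_ir * W_rs|
    a = bsxfun(@rdivide, G, max(G, [], 1));        % normalize by the maximum gain per output
end
```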


Algorithm 1: Hybrid approach for selection of metrics and identifying the faults

Input: A set of n software metrics {Q : q_i | i = 1, 2, 3, ..., n} used to classify a set of m software modules {X : x_j | j = 1, 2, 3, ..., m} as faulty or non-faulty.
Output: A significant subset of metrics S_BEST.
1  Set the current set of metrics S_current ← Q   // whole set of n metrics
2  Set S ← NULL   // initial collection of metric subsets
3  Select either the MR or the FISHER filter
4  for i = 1 to n − 1 do   // backward elimination step
5      Compute the selected MR score using Eq. (3) or the FISHER score using Eq. (7)
6      for fold = 1 to n do   // cross-validation step
7          Train the ANN with software metric set S_current
8          Compute the ANNIGMA scores of all metrics in S_current by Eq. (9)
9          Compute the accuracy
10     Compute the average accuracy over all folds for S_current
11     Compute the average ANNIGMA score of the sth output for S_current by Eq. (10)
12     Compute the average ANNIGMA score for S_current by Eq. (11)
13     Compute the combined score for every metric in S_current for the selected MR-ANNIGMA by Eq. (12) or FISHER-ANNIGMA by Eq. (13)
14     Rank the metrics in S_current by combined score in descending order
15     S ← S ∪ S_current
16     Update the metric set S_current by removing the metric with the lowest score:
17     S_current ← S_current − (metric with lowest score)
18 S_BEST ← the subset in S with the highest accuracy
19 return S_BEST

Similarly, the combined score of a metric in the FISHER-ANNIGMA is computed by adding Eqs. (7) and (11) as follows:

\mathrm{FISHER\text{-}ANNIGMA}(q_i) = \mathrm{FISHER}(q_i) + \mathrm{ANNIGMA}(q_i) .   (13)
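Putting the pieces together, one backward-elimination step of Algorithm 1 combines the filter score with the ANNIGMA score and removes the lowest-ranked metric. A minimal illustrative sketch using the helper sketches above (not the authors' code; Wir and Wrs are assumed to come from training the ANN on the current subset and to have been averaged over the cross-validation folds, and Scurrent holds the column indices of the current subset):

```matlab
filterScore  = fisher_score(X(:, Scurrent), C);    % or mr_relevance(...), Lines 5 of Algorithm 1
wrapperScore = annigma_score(Wir, Wrs);            % Eq. (9), per input and output node
combined     = filterScore + mean(wrapperScore, 2)'; % Eqs. (11) and (12)/(13)
[~, worst]   = min(combined);                       % lowest-ranked metric
Scurrent(worst) = [];                               % eliminate it (Lines 16-17 of Algorithm 1)
```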

3.3 Proposed parallel hybrid framework

We propose two new parallel classifier models. One is a shared-memory parallel classifier, for which the classification accuracies of the serial and parallel versions are identical. The other is a hybrid distributed-/shared-memory parallel classifier, in which the classification accuracy may deteriorate relative to the serial version.

3.3.1 Shared-memory parallel model

Analysis of the hybrid framework (serial) reveals that each of the cross-validation steps (Lines 6–9 of Algorithm 1) can be computed independently on the same dataset. For each step, the only difference is how the training and testing parts of the same dataset are defined. This computational property allows us to design a parallel cross-validation model on a shared-memory architecture. Under this model, a unique set of training and testing data is forked to each worker, mapped to a dedicated core of a multicore node, so that the folds are computed in parallel as shown in Fig. 2. When all the workers finish their computation, they are joined together. This fork-join shared-memory parallelism is repeated for each backward elimination (BE) step.

Fig. 2 Shared-memory parallel model. Four folds of a BE step are mapped onto four different cores of a node to compute them in parallel

Algorithm 2: Hybrid parallel approach for identifying the faults

Input: A set of n software metrics {Q : q_i | i = 1, 2, 3, ..., n} used to classify a set of m software modules {X : x_j | j = 1, 2, 3, ..., m} as faulty or non-faulty.
Output: Subsets of metrics and accuracies.
1  Select either the MR or the FISHER filter
2  Compute the combined score for every metric in Q for the selected MR-ANNIGMA by Eq. (12) or FISHER-ANNIGMA by Eq. (13)
3  Rank the metrics in Q by combined score in descending order
4  From the sorted metrics in Q, create a set of n subsets of metrics {S : s_i | i = 1, 2, 3, ..., n} where s_i = {q_j | j = 1, 2, 3, ..., n + 1 − i}
5  for i = 1 to n do in parallel   // backward elimination step
6      for fold = 1 to n do in parallel   // cross-validation step
7          Train the ANN with software metric set s_i
8          Compute the accuracy for s_i
9      Compute the average accuracy over all folds for s_i
10 return S and the corresponding accuracies

The algorithm of the shared-memory parallel model is the same as Algorithm 1, except that the serial loop at Line 6 is replaced by a parallel loop.
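As an illustration of this fork-join pattern, the cross-validation loop can be expressed with the Parallel Computing Toolbox as follows (a minimal sketch, not the authors' code; X is the m-by-n metric matrix, C the m-by-1 class vector, Scurrent the indices of the current metric subset, and trainAndTestANN is a hypothetical helper that trains the MLP of Fig. 1 on the training part and returns the accuracy on the test part):

```matlab
parpool('local', nFolds);                        % one Matlab worker per fold/core
cvp = cvpartition(size(X, 1), 'KFold', nFolds);  % fold definitions
acc = zeros(nFolds, 1);
parfor fold = 1:nFolds                           % Lines 6-9 of Algorithm 1, in parallel
    tr = training(cvp, fold);                    % logical mask of training rows
    te = test(cvp, fold);                        % logical mask of test rows
    acc(fold) = trainAndTestANN(X(tr, Scurrent), C(tr), X(te, Scurrent), C(te));
end
avgAcc = mean(acc);                              % Line 10 of Algorithm 1
```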


Fig. 3 Hybrid parallel model. Four subsets of features are distributed across four different nodes of a cluster to compute them in parallel, achieving distributed-memory parallelism. Each node again computes its cross-validation steps in parallel, achieving shared-memory parallelism. A hybrid parallelism is achieved by combining these two levels of parallelism

For the maximum utilization of a node, the number of folds must be greater than or equal to the core count of the node. Otherwise, some of the cores will be unused. If T_1 and T_n are the execution times of cross-validation on a single core and on n active cores of a node, respectively, then the upper bound on the speedup achieved with this level of parallelism is S_n = T_1/T_n.

3.3.2 Hybrid distributed-/shared-memory parallel model

It is observed from the hybrid framework (serial) that the backward elimination (BE) steps (Lines 4–17 of Algorithm 1) can be parallelized if we sacrifice some classification accuracy. Classification accuracy depends on how the feature score of each subset is calculated. In serial BE, the score of a feature is not constant throughout the computation; it depends on the subset size. Thus, when the lowest-scoring feature is eliminated from the subset in each BE step, re-scoring of the features for the newly formed (reduced) subset is required. This re-scoring controls the classification accuracy in serial BE, which is not possible in parallel BE. Based on the initial scoring, different subsets are created in the parallel version: one containing all features, another excluding only the lowest-scoring feature, the next one excluding the two lowest-scoring features, and so on, until the subset size is reduced to one. Each of these unique subsets is then computed in parallel. Since the cross-validation executed for each BE step achieves shared-memory parallelism, these unique subsets are distributed across multiple nodes to achieve another level of distributed-memory parallelism, as shown in Fig. 3. When the number of folds and the number of cores of a node are the same, a dedicated node is assigned to compute each unique subset. Otherwise, multiple nodes are utilized for computing a parallel BE step.

The algorithm of the hybrid distributed-/shared-memory parallel model is presented in Algorithm 2.

The maximum speedup achieved with distributed-memory parallelism alone is n_f, where n_f is the total number of features in a dataset. Thus, the upper bound on the overall speedup achieved from the hybrid (both distributed- and shared-memory) parallelism is n_f × S_n.
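As a worked illustration using the KC2 configuration of Sect. 4 (n_f = 21 features, 16-fold cross-validation on 16-core nodes): taking the ideal shared-memory speedup S_n = 16, the bound is n_f × S_n = 21 × 16 = 336, which matches the 336 cores employed in Table 2; using the measured per-node speedup of 13.45 instead gives a bound of 21 × 13.45 ≈ 282. The measured hybrid speedup of 220.79 in Table 2 stays below both bounds, as expected.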

4 Implementation and evaluation

This section presents an overview of our implementation, the experimental setup, and the evaluation of the proposed techniques.

4.1 Implementation overview and experimental setup

We used Matlab R2014b for our implementation. Both the Matlab Distributed Computing Server (MDCS) and the Parallel Computing Toolbox (PCT) features are used in the implementation. A Generic profile is configured to set up the execution environment. To achieve shared-memory parallelism, we used parpool() to create a pool of Matlab workers and parfor to execute iterations in parallel on the workers. Each worker is configured to run exclusively on a single core. To distribute parallel tasks onto multiple nodes, we used createCommunicatingJob() of type pool, createTask(), and submit().
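A minimal sketch of how these primitives can be combined for the distributed level of Algorithm 2 is shown below (illustrative only, not the authors' code; the profile name passed to parcluster and the helper evaluateSubset, which runs Lines 6–9 of Algorithm 2 with parfor on the workers of its job, are assumptions):

```matlab
c = parcluster('Generic');                        % the configured Generic profile (assumed name)
jobs = cell(numel(S), 1);
for i = 1:numel(S)                                % one communicating job per metric subset S{i}
    jobs{i} = createCommunicatingJob(c, 'Type', 'pool');
    createTask(jobs{i}, @evaluateSubset, 1, {X, C, S{i}, nFolds});
    submit(jobs{i});                              % non-blocking: jobs run concurrently on the cluster
end
accuracies = zeros(numel(S), 1);
for i = 1:numel(S)
    wait(jobs{i});                                % block until job i finishes
    out = fetchOutputs(jobs{i});                  % {average accuracy for S{i}}
    accuracies(i) = out{1};
end
```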


Table 2 Execution performance of the FISHER-ANNIGMA parallel application for the KC2 dataset containing 21 features. Each Matlab worker is mapped to a unique core

Number of cores | Folds | Model | Serial time in sec (T1) | Parallel time in sec (Tn) | Speedup (T1/Tn)
5 | 5 | Shared-memory | 2001.52 | 439.64 | 4.55
10 | 10 | Shared-memory | 4098.26 | 472.80 | 8.67
16 | 16 | Shared-memory | 6586.13 | 489.70 | 13.45
336 | 16 | Hybrid | 6586.13 | 29.83 | 220.79

Table 3 Accuracies for ANNIGMA, FISHER, and FISHER-ANNIGMA for KC2 dataset.

Subset size ANNIGMA FISHER FISHER-ANNIGMA

21 81.648 81.954 81.149

20 82.452 82.222 81.916

19 81.418 81.226 81.571

18 82.261 82.49 81.877

17 81.992 83.985 82.146

16 81.992 83.831 82.452

15 83.103 83.18 81.801

14 81.877 83.18 82.567

13 81.916 81.571 83.027

12 83.333 83.18 82.261

11 82.874 83.908 83.257

10 83.946 84.253 82.107

9 84.138 83.793 83.448

8 84.138 83.487 83.793

7 84.023 83.372 83.755

6 84.444 83.333 83.18

5 84.023 84.215 83.64

4 83.257 84.215 83.716

3 83.946 82.682 84.483

2 83.448 82.95 82.835

1 84.598 76.015 83.333

Experiments are conducted on the Raijin cluster managed by the National Computational Infrastructure (NCI) with the support of the Australian Government. This cluster has a total of 3,592 compute nodes, each with 16-core (dual 8-core) Intel Xeon (Sandy Bridge, 2.6 GHz) processors, connected through an Infiniband FDR interconnect, with a total of approximately 160 TBytes of main memory and approximately 10 PBytes of usable fast filesystem [1].

4.2 Execution performance

The execution performance of the parallel application is presented in Table 2.

Table 4 Accuracies for ANNIGMA, FISHER, and FISHER-ANNIGMA for PC2 dataset.

Subset size ANNIGMA FISHER FISHER-ANNIGMA

36 97.02 96.725 97.074

35 96.295 96.779 96.51

34 96.94 97.181 97.101

33 97.101 96.913 96.752

32 96.295 96.537 96.644

31 96.591 96.752 96.725

30 96.993 97.208 96.698

29 97.262 97.315 96.725

28 97.128 96.993 97.315

27 96.993 97.369 97.289

26 96.966 96.913 97.181

25 97.369 97.101 96.859

24 96.886 97.074 97.315

23 97.262 96.617 96.671

22 97.128 97.423 97.503

21 97.101 97.396 97.423

20 96.886 97.208 97.074

19 97.208 96.886 97.477

18 97.53 97.315 97.101

17 97.396 97.181 97.289

16 97.074 97.128 97.181

15 97.396 97.101 97.315

14 97.342 97.262 97.315

13 96.779 97.396 97.02

12 97.557 97.342 97.262

11 97.45 97.02 97.423

10 97.396 97.53 97.289

9 97.584 97.396 97.638

8 97.638 97.664 97.664

7 97.664 97.315 97.45

6 97.611 97.772 97.396

5 97.718 97.826 97.53

4 97.664 97.852 97.342

3 97.53 97.852 97.718

2 97.718 97.852 97.745

1 97.852 97.852 97.852


Table 5 Accuracies for ANNIGMA, FISHER, and FISHER-ANNIGMA for KC1 dataset.

Subset size ANNIGMA FISHER FISHER-ANNIGMA

21 84.846 84.912 84.798

20 84.912 84.808 84.922

19 84.931 84.979 85.017

18 84.855 84.675 84.742

17 84.912 84.666 84.884

16 84.912 84.742 84.95

15 84.628 84.836 84.96

14 84.903 84.742 84.817

13 84.969 84.742 84.836

12 84.798 84.704 84.704

11 84.893 84.742 84.685

10 84.808 84.704 84.685

9 84.751 84.523 84.685

8 84.637 84.561 84.609

7 84.647 84.59 84.58

6 84.59 84.59 84.571

5 84.523 84.523 84.533

4 84.486 84.542 84.542

3 84.542 84.542 84.542

2 84.514 84.542 84.542

1 84.542 84.542 84.542

It is observed from the table that on 5, 10, and 16 cores the parallel application achieves a nearly linear speedup on a multicore node (4.55, 8.67, and 13.45, respectively). The speedup achieved on 336 cores distributed across 21 nodes is also significant (220.79).

4.3 Important feature selection and accuracy

The evaluations of the independent filters and wrapper, and of the hybrid approaches, are carried out on the KC1, KC2, JM1, PC1, PC2, PC3, and PC4 datasets. The accuracies of the binary classifiers are considered in the evaluation process.

Tables 3, 4, 5, 6, 7, 8, and 9 show the accuracies of the two independent filters FISHER and MR, the wrapper ANNIGMA, and the two hybrid approaches FISHER-ANNIGMA and MR-ANNIGMA. Each row of these tables shows the accuracies of the different approaches for a BE step with a subset of features. Each of the accuracies presented here is an average of three trials.

It is observed from Table 9 that the independent filters, the wrapper, and the hybrid approaches all show more than 90% accuracy for the 37 features of the PC4 dataset. This trend continues down to 8 features, with some fluctuations in accuracy. The accuracies drop to around 89% for subset sizes 7–5.

Table 6 Accuracies for ANNIGMA, FISHER, and FISHER-ANNIGMA for PC1 dataset.

Subset size ANNIGMA FISHER FISHER-ANNIGMA

21 93.201 92.84 92.985

20 91.957 92.48 92.39

19 92.444 92.552 92.552

18 92.444 92.372 92.714

17 92.606 93.057 92.786

16 92.624 92.967 92.786

15 92.588 92.768 92.678

14 92.588 92.804 92.678

13 92.642 92.714 92.732

12 93.021 93.129 93.093

11 92.876 93.165 92.804

10 92.678 93.111 92.894

9 92.967 93.237 93.147

8 93.183 93.309 93.309

7 93.327 93.544 93.381

6 93.309 93.417 93.291

5 93.255 93.472 93.345

4 93.165 93.057 93.345

3 93.309 93.345 93.219

2 93.003 93.093 92.967

1 93.039 92.949 93.021

Table 7 Accuracies for ANNIGMA, FISHER, MR, FISHER-ANNIGMA, and MR-ANNIGMA for JM1 dataset.

Subset size ANNIGMA FISHER MR FISHER-ANNIGMA MR-ANNIGMA

21 81.268 81.251 81.323 81.31 81.29

20 81.356 81.303 81.391 81.323 81.396

19 81.323 81.317 81.165 81.281 81.282

18 81.317 81.198 81.253 81.321 81.299

17 81.435 81.365 81.336 81.321 81.338

16 81.398 81.417 81.373 81.259 81.409

15 81.47 81.275 81.317 81.385 81.27

14 81.354 81.354 81.378 81.415 81.376

13 81.365 81.433 81.334 81.463 81.431

12 81.338 81.316 81.282 81.374 81.391

11 81.328 81.358 81.294 81.352 81.424

10 81.417 81.4 81.268 81.444 81.286

9 81.363 81.51 81.297 81.308 81.321

8 81.354 81.444 81.284 81.327 81.294

7 81.268 81.393 81.303 81.358 81.29

6 81.282 81.402 81.251 81.229 80.976

5 81.26 81.38 80.974 81.255 81.007

4 81.297 81.191 80.917 81.393 80.987

3 81.218 81.244 80.977 81.299 80.983

2 80.845 80.889 80.867 81.224 80.948

1 80.858 81.007 80.847 80.853 80.869


Table 8 Accuracies for ANNIGMA, FISHER, MR, FISHER-ANNIGMA, and MR-ANNIGMA for PC3 dataset.

Subset size ANNIGMA FISHER MR FISHER-ANNIGMA MR-ANNIGMA

37 86.555 86.481 86.667 86.592 86.481

36 86.388 86.332 86.24 86.444 85.794

35 86.258 86.24 85.85 86.797 86.258

34 85.868 85.311 86.054 86.258 86.648

33 86.407 86.37 86.388 86.351 86.815

32 86.611 86.778 86.815 86.648 86.834

31 87.205 86.76 86.871 87.261 87.502

30 87.428 86.927 86.89 86.797 87.707

29 86.704 86.518 87.224 86.815 87.205

28 87.001 86.76 86.555 86.871 87.428

27 87.279 86.537 86.834 86.555 87.094

26 87.057 87.187 87.038 86.63 87.279

25 87.837 86.982 86.964 87.038 87.428

24 87.985 86.89 86.704 87.094 87.112

23 87.762 87.001 86.964 87.001 86.704

22 87.595 86.722 86.852 87.057 86.89

21 87.298 86.444 87.261 87.131 87.762

20 87.725 86.165 87.317 86.945 87.131

19 87.187 86.685 87.242 86.815 87.317

18 87.725 86.388 87.298 87.409 87.131

17 87.577 87.131 86.945 87.187 87.484

16 86.815 86.927 87.261 87.205 87.688

15 87.391 86.592 86.871 86.927 87.558

14 87.818 86.76 87.279 87.001 87.279

13 87.372 86.722 87.242 87.391 87.372

12 87.781 86.611 87.205 87.539 87.688

11 87.558 87.112 87.911 87.669 87.539

10 87.688 87.112 87.484 87.261 87.781

9 87.725 87.502 86.927 87.651 88.227

8 87.837 87.112 86.704 87.502 87.539

7 87.317 87.224 87.149 87.409 87.279

6 87.075 87.614 87.539 87.279 87.465

5 87.502 87.632 87.317 87.521 87.224

4 87.521 87.502 87.205 87.707 87.298

3 87.558 86.76 87.447 87.391 87.409

2 87.558 87.057 87.558 87.205 87.558

1 87.558 87.558 87.558 87.558 87.558

While the accuracies remain around 89% for subset sizes 4–3 for ANNIGMA and MR-ANNIGMA, they decrease to 87% for the other approaches. MR-ANNIGMA achieves an accuracy of 89% for the single- and double-feature subsets, whereas the other approaches show an accuracy of 87%. So, MR-ANNIGMA performs better than the other approaches at selecting the most important features (single and double features) for the PC4 dataset.

It is observed from Tables 3, 4, 5, 6, 7, and 8 that the hybrid approaches do not perform better than the independent filters and wrapper for the KC1, KC2, JM1, PC1, PC2, and PC3 datasets, respectively, due to the class imbalance problem in these datasets.

Table 9 Accuracies for ANNIGMA, FISHER, MR, FISHER-ANNIGMA, and MR-ANNIGMA for PC4 dataset.

Subset size ANNIGMA FISHER MR FISHER-ANNIGMA MR-ANNIGMA

37 90.631 90.357 90.48 90.302 90.658

36 91.317 90.357 90.508 90.617 90.878

35 91.001 90 90.658 91.331 90.604

34 90.59 90.302 90.713 90.617 90.754

33 90.645 90.165 90.645 90.754 90.645

32 91.166 90.165 90.672 90.782 90.796

31 90.631 90.466 90.439 90.947 90.604

30 90.617 90.494 91.207 90.658 90.508

29 90.233 90.096 90.082 90.165 90.7

28 90.741 89.835 90.727 90.494 90.864

27 90.508 90.11 90.178 90.645 90.549

26 90.988 89.835 90.892 90.713 90.604

25 90.892 90.027 90.343 90.672 90.549

24 91.029 90.37 90.178 90.823 90.604

23 90.617 89.822 90.37 90.974 90.576

22 90.562 89.849 90.247 90.165 90.439

21 90.974 89.835 90.384 90.7 90.48

20 90.974 89.863 90.466 90.494 90.384

19 90.658 89.726 89.904 90.672 90.562

18 90.672 90.178 89.616 90.59 89.781

17 91.166 89.945 89.959 90.672 89.753

16 90.809 90.096 89.931 91.166 89.822

15 91.056 89.931 89.973 90.96 90.041

14 91.317 90.384 90.302 90.535 89.506

13 91.152 89.986 89.822 90.288 89.945

12 90.137 90.055 89.547 90.316 90.261

11 90.329 90.151 90.398 90.878 90.069

10 90.288 90.343 90.274 90.37 89.671

9 89.835 90.11 90.192 90.261 90.288

8 89.931 89.945 90.178 89.835 89.451

7 89.259 90.302 89.739 89.918 89.575

6 89.081 89.383 89.726 89.726 89.314

5 89.204 89.767 89.726 89.575 89.479

4 89.534 88.381 87.874 89.273 89.41

3 89.671 87.682 87.915 87.49 89.41

2 87.764 87.791 87.819 87.49 89.588

1 87.791 87.723 87.888 87.75 89.232

Figure 4 presents the scores of the metrics at different iterations for the subsets of the different approaches. Each iteration generates a subset of metrics based on the ranking score of each approach; these are then ranked, as shown in Fig. 4. The horizontal axis presents the metrics and the vertical axis presents the scores of the metrics. A demonstration of the BE iterations, including the computed score of each individual metric in each iteration of the hybrid model (FISHER-ANNIGMA), is shown in Fig. 4 for the KC2 dataset.


Fig. 4 Hybrid score in BE iterations for ANNIGMA, FISHER, and FISHER-ANNIGMA with different subsets for the KC2 dataset

It is observed that the hybrid model starts with 21 features (Fig. 4a), and the 5th feature achieves the lowest score. Therefore, the hybrid eliminates the 5th feature in its next BE iteration and recalculates the scores. These BE iterations are then shown for the subsets of 13 to 8 features in Fig. 4b–g, respectively.

5 Conclusions

IoT and cloud computing have been increasingly and successfully used to improve the various services provided by government organizations, different industries, and business sectors.


Therefore, high quality of the software products for the cloud is essential and significantly contributes to the interoperability and integration of various services within different organizations. Due to the evolving and highly scalable nature of cloud computing, the complexity and size of the related software products are growing exponentially, which makes it very challenging to ensure good quality. In this regard, the development of an automated defect detection and quality assurance model is an urgent step in the software development life cycle (SDLC) stages for monitoring and controlling a large number of software metrics in the cloud computing environment. In order to keep the detection model simple but computationally efficient, the selection of significant software metrics from a big software metric dataset is a subsequent research question.

In this article, we have demonstrated that a combination of filter and wrapper approaches is a better heuristic for selecting significant software metrics with increased accuracy, compared to the independent filters and wrapper. We have demonstrated the ANNIGMA wrapper and two hybrid versions with the FISHER and MR filters, and evaluated them with the KC1, KC2, JM1, PC1, PC2, PC3, and PC4 datasets. It is observed that for the PC4 dataset the MR-ANNIGMA approach outperformed the other approaches in selecting only the significant features with increased accuracy. MR-ANNIGMA shows an accuracy of 89% compared to 87% for the others while selecting two or one significant features. We have also demonstrated that the proposed parallel hybrid model achieves a significant computational speedup compared to its serial version. An evaluation with the KC2 dataset on a 16-core multicore node demonstrates that with 5, 10, and 16 folds, the parallel model achieves speedups of 4.55, 8.67, and 13.45, respectively, compared to its serial version. We further demonstrated that the other parallel hybrid model gains a speedup of 220.79 on 336 cores of a cluster compared with its serial counterpart. The current analysis is limited to procedural metrics. Future work includes extending the proposed approach to object-oriented metrics. Moreover, more filters and wrappers could be integrated together to analyze the combined performance.

Acknowledgements The authors would like to extend their sincere appreciation to the Deanship of Scientific Research at King Saud University for its participation in funding this research group (RGP-1436-039).

References

1. NCI: National computational infrastructure. http://nci.org.au/raijin/

2. Abaei, G., Selamat, A., Fujita, H.: An empirical study based onsemi-supervised hybrid self-organizing map for software fault pre-diction. Knowl.-Based Syst. 74, 28–39 (2015)

3. Aparisi, F., Sanz, J.: Interpreting the out-of-control signals of mul-tivariate control charts employing neural networks. Int. J. Comput.Electr. Autom. Control Inf. Eng. 4(1), 24–28 (2010)

4. Arar, O.F., Ayan, K.: Software defect prediction using cost-sensitive neural network. Appl. Soft Comput. 33(C), 263–277(2015)

5. Asad, A.A., Alsmadi, I.: Evaluating the impact of software metricson defects prediction, part 2. Comput. Sci. J. Mold. 22(1), 127–144(2014)

6. Balagani, K.S., Phoha, V.V.: On the feature selection criterionbased on an approximation of multidimensional mutual informa-tion. IEEE Trans. Pattern Anal. Mach. Intell. 32(7), 1342–1343(2010)

7. Bayes, T.: An essay towards solving a problem in the doctrine ofchances. Philos. Trans. R. Soc. Lond. 53, 370–418 (1763)

8. Catal, C., Diri, B.: Investigating the effect of dataset size, metricssets, and feature selection techniques on software fault predictionproblem. Inf. Sci. 179(8), 1040–1058 (2009)

9. Chang, C.P., Chu, C.P., Yeh, Y.F.: Integrating in-process softwaredefect predictionwith associationmining to discover defect pattern.Inf. Softw. Technol. 51(2), 375–384 (2009)

10. Compton, B.T.,Withrow,C.: Prediction and control of ada softwaredefects. J. Syst. Softw. 12(3), 199–207 (1990)

11. Cristianini,N., Shawe-Taylor, J.:An Introduction toSupportVectorMachines: AndOther Kernel-based LearningMethods. CambridgeUniversity Press, New York, NY (2000)

12. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification. Wiley,New York (2001)

13. Ebrahimi, N.B.: On the statistical analysis of the number of errorsremaining in a software design document after inspection. IEEETrans. Softw. Eng. 23(8), 529–532 (1997)

14. Erturk, E., Sezer, E.A.: A comparison of some soft computingmethods for software fault prediction. Expert Syst. Appl. 42(4),1872–1879 (2015)

15. Freund, Y.: Boosting a weak learning algorithm by majority. Inf.Comput. 121(2), 256–285 (1995)

16. Freund, Y., Schapire, R.E.: A decision-theoretic generalization ofon-line learning and an application to boosting. J. Comput. Syst.Sci. 55(1), 119–139 (1997)

17. Guo, L., Ma, Y., Cukic, B., Singh, H.: Robust prediction offault-proneness by random forests. In: Proceedings of the 15thInternational Symposium on Software Reliability Engineering(ISSRE 2004). pp. 417–428 (2004)

18. Hassan, A.E.: Predicting faults using the complexity of codechanges. In: Proceedings of the 31st International Conference onSoftware Engineering. pp. 78–88. IEEE Computer Society (2009)

19. Hsu, C.N., Huang, H.J., Schuschel, D.: The ANNIGMA-wrapper approach to fast feature selection for neural nets. IEEE Trans. Syst. Man Cybern. B 32(2), 207–212 (2002)

20. Huda, S., Abdollahian, M., Mammadov, M., Yearwood, J., Ahmed, S., Sultan, I.: A hybrid wrapper-filter approach to detect the source(s) of out-of-control signals in multivariate manufacturing process. Eur. J. Oper. Res. 237(3), 857–870 (2014)

21. Jiang, Y., Cukic, B.: Misclassification cost-sensitive fault prediction models. In: Proceedings of the 5th International Conference on Predictor Models in Software Engineering, PROMISE '09, pp. 20:1–20:10 (2009)

22. Jin, C., Jin, S.W.: Prediction approach of software fault-proneness based on hybrid artificial neural network and quantum particle swarm optimization. Appl. Soft Comput. 35, 717–725 (2015)

23. Kohavi, R., John, G.H.: Wrappers for feature subset selection. Artif. Intell. 97(1–2), 273–324 (1997)

24. Kröse, B., Smagt, P.V.D.: An Introduction to Neural Networks. The University of Amsterdam, Amsterdam (1993)

25. Laradji, I.H., Alshayeb, M., Ghouti, L.: Software defect prediction using ensemble learning on selected features. Inf. Softw. Technol. 58, 388–402 (2015)

26. Lessmann, S., Baesens, B., Mues, C., Pietsch, S.: Benchmarking classification models for software defect prediction: a proposed framework and novel findings. IEEE Trans. Softw. Eng. 34(4), 485–496 (2008)

27. Li, Z., Reformat, M.: A practical method for the software fault-prediction. In: Proceedings of the IEEE International Conference on Information Reuse and Integration (IRI 2007), pp. 659–666 (2007)

28. Malhotra, R.: A systematic review of machine learning techniques for software fault prediction. Appl. Soft Comput. 27, 504–518 (2015)

29. Menzies, T., Greenwald, J., Frank, A.: Data mining static code attributes to learn defect predictors. IEEE Trans. Softw. Eng. 33(1), 2–13 (2007)

30. Munson, J.C., Khoshgoftaar, T.M.: Regression modelling of software quality: empirical investigation. Inf. Softw. Technol. 32(2), 106–114 (1990)

31. Pelayo, L., Dick, S.: Applying novel resampling strategies to software defect prediction. In: Proceedings of the 2007 Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS 2007), pp. 69–72 (2007)

32. Quinlan, J.R.: Induction of decision trees. Mach. Learn. 1(1), 81–106 (1986)

33. Radjenovic, D., Hericko, M., Torkar, R., Živkovic, A.: Software fault prediction metrics: a systematic literature review. Inf. Softw. Technol. 55(8), 1397–1418 (2013)

34. Rodger, J.A.: Toward reducing failure risk in an integrated vehicle health maintenance system. Expert Syst. Appl. 39(10), 9821–9836 (2012)

35. Song, Q., Jia, Z., Shepperd, M., Ying, S., Liu, J.: A general software defect-proneness prediction framework. IEEE Trans. Softw. Eng. 37(3), 356–370 (2011)

36. Song, Q., Shepperd, M., Cartwright, M., Mair, C.: Software defect association mining and defect correction effort prediction. IEEE Trans. Softw. Eng. 32(2), 69–82 (2006)

37. Sutter, J.M., Kalivas, J.H.: Comparison of forward selection, backward elimination, and generalized simulated annealing for variable selection. Microchem. J. 47(1), 60–66 (1993)

38. Wang, H., Khoshgoftaar, T.M., Van Hulse, J., Gao, K.: Metric selection for software defect prediction. Int. J. Softw. Eng. Knowl. Eng. 21(2), 237–257 (2011)

39. Yadav, H.B., Yadav, D.K.: A fuzzy logic based approach for phase-wise software defects prediction using software metrics. Inf. Softw. Technol. 63, 44–57 (2015)

40. Zhao, M., Wohlin, C., Ohlsson, N., Xie, M.: A comparison between software design and code metrics for the prediction of software fault content. Inf. Softw. Technol. 40(14), 801–809 (1998)

41. Zheng, J.: Cost-sensitive boosting neural networks for software defect prediction. Expert Syst. Appl. 37(6), 4537–4543 (2010)

Md Mohsin Ali completed a PhD at the Research School of Computer Science at The Australian National University (ANU) in December 2016. He completed his Masters in Canada and his undergraduate degree in Bangladesh, both majoring in Computer Science. Since August 2016, he has been with the National Computational Infrastructure (NCI) at ANU as a Staff Scientist. Previously, he was a Lecturer at the Computer Science and Engineering (CSE) Department at Khulna University of Engineering and Technology (KUET), Bangladesh. He has published more than 25 papers in well reputed conferences and journals. His research interests include resilience for high-performance computing, large-scale simulations on supercomputers, data mining and machine learning, mobile computing, and computer networks.

Shamsul Huda is a Lecturer in the School of Information Technology, Deakin University, Australia. He has published more than 50 journal and conference papers in well reputed journals, including IEEE Transactions. His main research areas are computational intelligence, information security, optimization approaches to data mining, and health informatics. Before joining the University of Ballarat, he worked as an Assistant Professor in the Computer Science and Engineering (CSE) Department at Khulna University of Engineering and Technology (KUET), Bangladesh.

Jemal Abawajy is a full professor at the School of Information Technology, Faculty of Science, Engineering and Built Environment, Deakin University, Australia. He is currently the Director of the Parallel and Distributed Computing Laboratory. He is a Senior Member of the IEEE Computer Society, the IEEE Technical Committee on Scalable Computing (TCSC), the IEEE Technical Committee on Dependable Computing and Fault Tolerance, and the IEEE Communication Society. He has served on the editorial boards of numerous international journals and is currently serving as an associate editor of the International Journal of Big Data Intelligence and the International Journal of Parallel, Emergent and Distributed Systems. He has also guest edited many special issues. He is the author/co-author of five books and more than 250 papers in conferences, book chapters, and journals such as IEEE Transactions on Computers and IEEE Transactions on Fuzzy Systems. He has also edited 10 conference volumes.

Sultan Alyahya received his PhD degree in Computer Science from Cardiff University, UK, in 2013. He also received his MSc degree in Information Systems Engineering from the same university in 2007. His BSc degree was obtained with honors in Information Systems from King Saud University. He is currently an Assistant Professor at the College of Computer and Information Sciences, King Saud University. His main research interests are in the fields of software project management, agile development, and computer supported cooperative work (CSCW).

Hmood Al-Dossari is an Assistant Professor in the College of Computer and Information Sciences at King Saud University. He holds an MS and a PhD in Computer Science from King Saud University and Cardiff University, respectively. His research interests include quality of service assessment, social mining, human behavior modeling, and reputation and trust management in cloud computing. He has around twelve publications in international conferences and journals. He has attended various conferences and presented many seminars.

John Yearwood is the Head of the School of Information Technology, Deakin University, Australia. His main research areas are machine learning, optimization, and information security. He has published two books and over 200 refereed journal articles, book chapters, and conference articles. Professor Yearwood was the Editor-in-Chief of the Journal of Research and Practice in Information Technology and is a reviewer for many journals.
