Research Article ABC and IFC: Modules Detection Method for ...

Research ArticleABC and IFC Modules Detection Method for PPI Network

Xiujuan Lei12 Fang-Xiang Wu3 Jianfang Tian1 and Jie Zhao1

1 School of Computer Science Shaanxi Normal University Xirsquoan Shaanxi 710062 China2 School of Electronics Engineering and Computer Science Peking University (Visiting Scholar) Beijing 100871 China3Division of Biomedical Engineering University of Saskatchewan Saskatoon SK Canada S7N 5A9

Correspondence should be addressed to Fang-Xiang Wu faw341mailusaskca

Received 8 January 2014 Accepted 9 May 2014 Published 2 June 2014

Academic Editor Jianxin Wang

Copyright copy 2014 Xiujuan Lei et al This is an open access article distributed under the Creative Commons Attribution Licensewhich permits unrestricted use distribution and reproduction in any medium provided the original work is properly cited

Many clustering algorithms are unable to solve the clustering problem of protein-protein interaction (PPI) networks effectively Anovel clustering model which combines the optimization mechanism of artificial bee colony (ABC) with the fuzzy membershipmatrix is proposed in this paper The proposed ABC-IFC clustering model contains two parts searching for the optimum clustercenters using ABC mechanism and forming clusters using intuitionistic fuzzy clustering (IFC) method Firstly the cluster centersare set randomly and the initial clustering results are obtained by using fuzzy membership matrix Then the cluster centers areupdated through different functions of bees in ABC algorithm then the clustering result is obtained through IFC method basedon the new optimized cluster center To illustrate its performance the ABC-IFC method is compared with the traditional fuzzyC-means clustering and IFCmethodThe experimental results onMIPS dataset show that the proposed ABC-IFCmethod not onlygets improved in terms of several commonly used evaluation criteria such as precision recall and P value but also obtains a betterclustering result

1 Introduction

With the completion of human genome project researcheson protein-protein interaction (PPI) networks [1] have beena hot topic in the life science area which can not only provideclues to explore biological functions to deeply understandthe essence of life activities but also give important infor-mation to understand mechanism of diseases Currently anumber of methods have been proposed in order to detectprotein functional modules and predict protein functionsfrom PPI networks such as Markov random field methodand spectral clustering method [2] However PPI networksare complex because of enormous data volume FurthermorePPI networks usually consist of a large number of high-density protein nodes and some sparse connection nodeswhich causes the networks to show the small-world and scale-free features Hence clustering is the primary tool for datamining in PPI networks

Clustering methods are typical tools in data miningRecently many clustering methods emerged [3] such astraditional partitioning methods [4] which can only find theglobular cluster and meanwhile the cluster number should

be known in advance hierarchical methods [5ndash8] which arecomputationally expensive and are difficult in determiningthe merging threshold density-based methods [9ndash13] whichcannot categorize the network that has a large number ofsparse nodes MCL-based methods [14] which are suitableto hard clustering spectral clustering methods [15] in whichit is difficult to determine the neighborhood matrix fuzzyclustering methods [16] which are sensitive to initial valueand are easy to fall into localminimum and so on Functionalflow clustering algorithm [17 18] is a new clustering methodfor PPI networks which regards each node as a ldquoreservoirrdquoand passes on the flow to the next node by the connectingedge The algorithm is easy to be implemented and suitablefor PPI networks But the precision and recall values are notgood enough Thus it can be seen that these methods havedifferent degrees of drawbacks in mining PPI networks dueto the unique characteristics of PPI networks We have donesome researches on PPI networks clustering issues basedon swarm intelligent optimization methods [19ndash21] but theperformance of the algorithms still needs to be improved

Some classical clustering algorithms split samples intosome disjoint clusters that is one sample belongs to only one

Hindawi Publishing CorporationBioMed Research InternationalVolume 2014 Article ID 968173 11 pageshttpdxdoiorg1011552014968173

2 BioMed Research International

cluster absolutely Results of these methods are not ideal forPPI networks in that there are some overlapping functionalmodules in PPI networks In fact proteins in PPI networksdo not belong to a single cluster According to the definitionof fuzzy partition Dunn [22] has extended hard clusteringto fuzzy clustering which allows one protein to belong tomultiple clusters This expresses the fuzzy concept of ldquoitbelongs to both this cluster and that clusterrdquo Therefore fuzzyclustering algorithm ismore suitable for clustering proteins inPPI networks Fuzzy c-means (FCM) clustering method [23]optimal fuzzy clustering algorithm [24] and intuitionisticfuzzy clustering (IFC) [25] algorithm are commonly usedTraditional fuzzy clustering algorithms adopt distance as themeasure of sample similarity criteria As most nodes in PPInetworks are unreachable it is difficult to apply them to PPInetworks effectively

In this paper a new clustering model is proposed bycombining artificial bee colony (ABC) optimization andintuitionistic fuzzy membership Specifically the proposedmethods employ fuzzy membership matrix for partitioninginitial clusters first Then a new clustering objective functionis presented by taking characteristics of PPI networks thesimilarity among clusters and the density and the weightsof interaction nodes within clusters into consideration Thenew method also takes advantage of ABC algorithm tooptimize the values of the objective function to gain thecluster results In addition the new algorithm makes useof ABC algorithm to optimize cluster centers automaticallyto overcome defects of sensitivity to cluster centers throughfuzzy c-means clustering and intuitionistic fuzzy clusteringalgorithm Computational experiments show that the newmodel and algorithm perform better than existing algo-rithms

The rest of this paper is organized as follows In Section 2basic concepts and principles are introduced firstly secondlythe proposed ABC fuzzy clustering model is discussed andthen the flow chart and the computational step are listedalong with the time complexity analysis of the algorithmPerformance and evaluation of the ABC-IFC algorithm isshown by comparing with FCM and IFC in Section 3Section 4 concludes this study

2 Materials and Methods

21 Two Properties of PPI Networks Watts and Strogatz[26] presented small-world property of complex networksstating that most nodes are not neighbors of one anotherwhile they can be reached from any of other nodes by asmall number of hops or steps A small-world network hassmaller average distance than random networks and largerclustering coefficient than regular networks There are somesubnetworks in which almost any two nodes are connectedbecause of the effect of the large clustering coefficient On theother hand any two nodes are connectedmostly through oneshort path

Regarding each network as an undirected graph the 119890degree of node V is called 119896 if node V has 119896 adjacent node119875(119896) represents the frequency of nodes with the degree of 119896

if119875(119896) follows the power-lawdistribution such as119875(119896) prop 119896minus120574with 120574 isin [2 3] the graph is considered scale-free Withscale-free property [27] only the degree of a small numberof nodes is large while that of most nodes is small Statisticalphysicists used to call the phenomenon following power-lawdistribution as scale-free phenomenon

22 Fuzzy C-Means Clustering Fuzzy C-means clustering(FCM) algorithm is described by Ruspini [28] It is definedas follows 119883 = 119909

1

1199092

119909119899

is cluster sample 119899 is thenumber of samples 119888 is the number of clusters 1 lt 119888 lt 119899119889119894119895

= 119909119894

minus V119895

is Euclidean distance between the sample 119909119894

and the cluster center V119895

119881 = [V1

V2

V119888

] is cluster centermatrix 119880 = [119906

119894119895

] is 119899 lowast 119888 membership matrix and 119906119894119895

isthe degree of membership of 119909

119894

that belongs to the cluster 119895Update the degree of membership 119906

119894119895

and the cluster centerV119895

by

119906119894119895

=

1

sum119888

119905=1

(

10038171003817100381710038171003817119909119894

minus V119895

10038171003817100381710038171003817

2

1003817100381710038171003817119909119894

minus V119905

1003817100381710038171003817

2

)

2(119898minus1)

V119895

=

sum119899

119894=1

119906119898

119894119895

119909119894

sum119899

119894=1

119906119898

119894119895

(1)

where119898 = 2 The objective function is defined as

119869 =

119899

sum

119894=1

119888

sum

119895=1

119906119898

119894119895

10038171003817100381710038171003817119909119894

minus V119895

10038171003817100381710038171003817

2

(2)

119869will obtain the extremumwhen the matrix119880 is normalized

23 Intuitionistic Fuzzy Clustering (IFC)

231 Intuitionistic Fuzzy SetTheory Let119883 be a given domainan intuitionistic fuzzy set 119877 on119883 is given by

119877 = ⟨119909 120583119877

(119909) ]119877

(119909)⟩ | 119909 isin 119883 (3)

where 120583119877

(119909) 119883 rarr [0 1] and ]119877

(119909) 119883 rarr [0 1]

denote degree of membership and nonmembership of 119877respectively For 119909 isin 119883 in 119877 it is true that 0 le 120583

119877

(119909) +

]119877

(119909) le 1 The general intuitionistic fuzzy set is denotedby 119877 = ⟨119909 120583

119877

(119909) ]119877

(119909)⟩ The general intuitionistic fuzzy setis also marked as 119877 = ⟨119909 120583

119877

(119909) 1 minus 120583119877

(119909)⟩ | 119909 isin 119883120587119877

(119909) = 1 minus 119906119877

(119909) minus ]119877

(119909) denotes intuitionistic index of 119909 in119877 which is a measurement of uncertain degree

232 Intuitionistic Fuzzy Clustering Algorithm Let 119883 =

1199091

1199092

119909119899

be a set of samples to be clustered 119875 =

1199011

1199012

119901119888

is a set of 119888 clustering prototypes where 119888 isthe number of clusters The intuitionistic fuzzy set is givenby 119877 = ⟨(119909

119894

119901119895

) 120583 ]⟩ | 119909119894

isin 119883 119901119895

isin 119875 (119909119894

119901119895

) isthe degree of membership of the sample 119894 in the cluster 119895119881 = [V

1

V2

V119888

] is a group of cluster centers Clusteringprototype model (119901

119894119895

) degree of hesitation (120587119894119895

) degree of

BioMed Research International 3

membership (119906119894119895

) and degree of nonmembership (V119894119895

) can becomputed as follows

119901119894119895

= 120572 (1 minus 119890(119889(119909119894 119901119895)

2)4

) (4)

119890120582119896

=

1

sum119888

119894=1

(1 minus 120587119896119894

119890119889

(119909119896V119894 )2

+ 119890120582119896

)

(5)

120587119894119895

=

119908119890

(119894 V119895

) if 119894 = V119895

1 otherwise(6)

119906119894119895

=

(1 minus 120587119894119895

)

(119890119889(119909119894 119901119895)

2

+ 1)

(7)

V119894119895

= 1 minus 120587119894119895

minus 119906119894119895

(8)

where 120572 = 04 119890120582119896

in (5) is solved according to the fastsecant method In intuitionistic fuzzy clustering algorithmthe degree of hesitation is always set in advance In this paperthe degree of hesitation 120587 is determined according to the PPInetwork nodes interaction weight 119908

119890

Clustering analysis can be converted to optimization

problem which aims at minimizing the objective function(clustering criterion function) 119869

119869 =

119899

sum

119894=1

119888

sum

119895=1

119906119894119895

119889 (119909119894

119901119895

) (9)

119889(119909119894

119901119895

) = (119909119894

minus 119901119895

)(119909119894

minus 119901119895

)119879 is the square of Euclidean

distance between the sample 119894 and clustering prototype119895 andmust satisfy equation 119906

119894119895

+ V119894119895

+ 120587119894119895

= 1

24 The ABC Algorithm Artificial bee colony algorithm isan intelligent optimization algorithm proposed by Karabogaet al [29ndash31] to solve the multivariate function optimiza-tion problems based on the intelligent behavior of honeybee swarm Srinivasa Rao et al [32] used it to solve net-work reconfiguration problem in distributed systems Otherresearchers proposed chaotic artificial bee colony [33] anddiscrete artificial bee colony [34] algorithms The artificialbee colony has become a research hotspot due to its simpleidea fast convergence rate and superior performance Ourteam had adopted it to optimize the threshold of clusteringproblem and achieved good results [35]

The ABC algorithm is a new metaheuristic bionic algo-rithm where each bee can be viewed as an agent and swarmintelligence is achieved through the cooperation betweenthe individuals The ABC algorithm includes two differentmechanisms consisting of foraging behavior and propagatingbehavior The ABC algorithm based on propagating mech-anism was inspired by marriage behavior of bee colony[36] the queen maintains good genes to make colony moreadaptive to the environment While the ABC algorithmbased on foragingmechanism finds optimal solution throughcollaboration between the various types of bees and roleconversion mechanism the ABC algorithm provides a newidea for the heuristic algorithm research and becomes one

of the important research directions of solving complexoptimization problems

The ABC algorithm based on foraging behavior typicallycontains four groups nectar source employed bees onlookerbees and scouts bees A nectar source corresponds to anemployed bee the position of a nectar source representsa possible solution of optimization problem and its valuedepends on many factors such as the degree of proximitybetween the nectar source and honeycomb or the degreeof nectar source concentration and so on The fitness isusually used to describe the nectar source features Theemployed bees associate with certain nectar source carryinga lot of nectar source information that is related to incomelevel The onlooker bees search and update new nectarsource near honeycomb if there are no nectar source forupdate the scouts bees will search new nectar source inthe global scope Nectar source search follows two stepsfirstly the employed bees find nectar source and recordthe nectar source location and nectar quantity Then theonlooker bees use the information the employed bees pro-vided to decide which nectar source to go to or scoutsbees go on global searching to explore the new nectarsource

Based on foraging behavior the ABC algorithm finds theglobal optimal value by each individual beesrsquo local searchwhich has a faster convergence speed higher precision andfew parameters Therefore the ABC algorithm is used in thispaper to optimize the IFC algorithm The nectar sourcesstand for a set of clustering center of IFC algorithm thecluster centers are optimized and updated by employed beesonlookers bees and scouts bees It overcomes the sensitivityto clustering center of the IFC algorithmwhile improving theeffect of clustering

25 ABC Fuzzy Clustering Model

251 The Solution Space The small-world and scale-freeproperties of PPI networks make few nodes have a largedegree these nodesmay have an important impact on proteinfunctions Most of the other nodes own a relatively smalldegree and even the degree of some nodes is zero whichare named isolated nodes In PPI networks a protein maypossess diverse functions as a result it is inadvisable tohold the idea that all the protein nodes in a cluster areregarded as having an identical function Therefore a lot ofclustering methods without considering the characteristicsperform unsatisfactory toward PPI networks IFC clusteringalgorithm fits well with PPI networks of a protein belongingto several functional modules As mentioned above ABCalgorithm performs admirably we establish a correspondingrelationship between clustering and the optimization mech-anism of bee colony (Table 1) Accordingly a model basedon the combination of ABC optimization mechanism andintuitionistic fuzzy method is put forward in this paperWe name the model ABC intuitionistic fuzzy clusteringmode (ABC-IFC for short) The method takes advantage ofABC algorithm to determine the optimal cluster centers andovercomes the weakness of sensitivity to cluster centers by


Table 1 Corresponding relationship between clustering and mech-anism of bee colony optimization

Foraging behavior of honeybees Clustering

Position of nectar sources Cluster centers

Amount of nectar sources Value of objectivefunction

Responsibilities of onlookerbees and scout bees

Searching foroptimizing cluster

centersHighest nectar amount ofnectar sources Best cluster centers

fuzzy c-means clustering and intuitionistic fuzzy clusteringalgorithm Utilizing intuitionistic fuzzy algorithm to clusterPPI networks is expected to improve the clustering effectbecause the idea is in line with the characteristics of PPInetworks

252 Principle of ABC Fuzzy Clustering Model In the IFCclustering algorithm a group of cluster centers are givenrandomly at the very beginning afterwards the member-ship degree matrix is calculated and partitions are madeaccording to it Meanwhile the value of criteria is calcu-lated until the algorithm satisfies the stopping conditionand obtains the final clustering results A set of initialcluster centers is generated randomly in IFC clusteringalgorithm in which there are no rules to follow More-over the clustering criteria function is based on distancewhich is unreasonable to deal with PPI networks Henceit is inefficient to cluster PPI networks by IFC clusteringalgorithm

Since some nodes in PPI networks are unreachabledistance is not the suitable measure for clustering As weknow there are interactions among nodes in PPI networksWe can compute the similarities among protein modulesdensity within modules and average interactions using theinteractions among the nodes Accordingly we redesign thecriteria of IFC and evaluate the results in light of intermodulesimilarity innermodule density and similarity

The newmodel ABC-IFC is proposed to seek the optimalcluster centers in this paper by combining IFC clusteringalgorithm and ABC algorithm The ABC algorithm consistsof three kinds of bees employed bees onlooker bees andscout bees In the ABC-IFC model a nectar source standsfor a group of cluster centers the number of nectar sourcesis the number of clusters Onlooker bees are responsiblefor exploiting new sources adjacent to employed bees andthen updating the cluster centers If onlooker bees failto find the nectar source scout bees will update clustercenters and search the whole area to revise the clustercenters Finally ABC algorithm will obtain the optimalcluster centers IFC clustering algorithm will compute thefuzzymembership degreematrix based on the optimal clustercenters and meanwhile clusters are divided according to themembership degree matrix to obtain the clustering results inthe end

253 Objective Function Traditional clustering algorithmsusually adopt distance to define the objective function more-over in weighted networks the reciprocal of the weights isregarded as the measure of distance between two nodes Theshortest path distance between two nodes is usually regardedas the distance when the two nodes are not connectedHowever some nodes in PPI networks are unreachable andthe reciprocal of weights becomes infinite when there is nointeraction between two protein nodes so it is unreasonableto use the shortest path distance as the distance betweentwo nodes Thus a new objective function is designed toevaluate the cluster results Our objective function includestwo parts the first part is the similarity between interclustersthe second part is the reciprocal of the average of summationof inner clusterrsquos density and the weights between nodeswithin a clusterThe less the first part and the second part thebetter the performance will be Thus the objective function isdefined as

min 119891V119886119897

=

1

119888 lowast 119888

119888

sum

119894=1

119888

sum

119895=1

sim (119868119894

119869119895

) +

1

DEWif DEW = 0

1

119888 lowast 119888

119888

sum

119894=1

119888

sum

119895=1

sim (119868119894

119869119895

) otherwise

(10)

where sim(119868119894

119869119895

) = (sum119909isin119868119894119910isin119869119895

119888(119909 119910))max(|119868119894

| |119869119895

|) standsfor the similarity between two clusters

119888 (119909 119910) =

1 if 119909 = 119910119908119890

(119909 119910) if 119909 = 119910 ⟨119909 119910⟩ isin 1198640 otherwise

den (119894) = 2119890

119899119894

lowast (119899119894

minus 1)

dw (119894) =sum (119908

119890

)

119899119894

lowast (119899119894

minus 1)

DEW = 1119888

119888

sum

119894=1

(den (119894) + dw (119894))

(11)

119899119894

represents the number of protein nodes in cluster 119894 119890represents the number of interactions among the clusterand others in PPI networks and 119908

119890

represents the weightsbetween nodes within a cluster

The first part of (10) is similarity between two clusterswhich is used to evaluate the similarity of two clusters thesmaller the similarity the better the cluster effects will be andvice versa The second part is the reciprocal of the average ofsummation of inner clusterrsquos density and the weights betweennodes within a cluster which reflects the strength of theinteraction within a cluster If there is no interaction betweentwo nodes the value of denominator of the second part mayequal zero that is the second part tends to be infinitelysmall and thus the first part becomes the criteria for clusterresults Otherwise when the value of the second part is notzero then the combination of similarity between two clusters


the density and weights within a cluster can evaluate thecluster effect more comprehensively

26 ABC-Based Fuzzy Clustering Algorithm

261 Algorithm Description

Step 1 Initialize the iterations 119894119905119890119903 = 1 maximum iterationmaxiter and clustering number 119888 and randomly select clus-tering center V 119897119894119898119894119905 = 0 and 119898119886119909 119897119894119898119894119905 Set 119892119887119890119904119905119881 = 119894119899119891initially gbestV represents a group of optimal cluster centersgbest cluster is the clustering result of the optimal clustercenters which is set randomly at the beginning

Step 2 Calculate the degree of membership matrix 119880 using(5)ndash(7) cluster is formed according to themembership degreematrix and evaluate the fitness value 119891V119886119897(119888119897119906119904119905119890119903) based on(10)

Step 3 If 119891V119886119897 (119888119897119906119904119905119890119903) lt 119891V119886119897 (119892119887119890119904119905 119888119897119906119904119905119890119903) 119892119887119890119904119905119881 = 119881and 119892119887119890119904119905 119888119897119906119904119905119890119903 = 119888119897119906119904119905119890119903

Step 4 Onlookers bees search new nectar source nearemployed bees if the search is successful go to Step 6

Step 5 If the search is a failure and 119897119894119898119894119905 gt 119898119886119909119897119894119898119894119905 scoutsbees start global searching set 119897119894119898119894119905 = 0

Step 6 Update the clustering center 119881

Step 7 Set 119894119905119890119903 = 119894119905119890119903 + 1

Step 8 If 119894119905119890119903 lt= 119898119886119909119894119905119890119903 go to Step 2 otherwise out put thecluster result 119892119887119890119904119905 119888119897119906119904119905119890119903

262 Time Complexity of Algorithm In time complexityanalysis asymptotic method is usually used to express theorder of magnitude of program execution steps to estimatethe performance of the algorithm In ABC algorithm if 119888represents cluster numbers 119899 is protein node maximumiterations is maxiter the population size is 119873 which is twicethe value of 119888 the node number of a class is 119899119906119898 and the timecomplexity of algorithm is as follows

(1) the time complexity of initializing the clusteringcenter is 119874(119899)

(2) the time complexity of calculating the degree ofmembership matrix is 119874(119899 lowast 119888)

(3) the time complexity of updating the clustering centeris 119874(119898119886119909119894119905119890119903 lowast 119873 lowast 119899119906119898)

(4) the time complexity of updating global optimal valueis 119874(119873)

(5) the time complexity of cluster division is 119874(119899 lowast 119888)(6) the time complexity of judging stopping condition is119874(119898119886119909119894119905119890119903)

From the above analysis we could find that the timecomplexity of (2) is dominating the algorithm spends much

time on the clustering center updating which increases thetime complexity

3 Results and Discussions

31 Dataset and Criteria In the experiments the datasetof PPI networks comes from MIPS database [37] whichconsists of two sets of data one is the experimental datawhich contains 1376 protein nodes and the 6880 interactiveprotein-pairs which is considered the training database theother describes the result that the proteins belong to identicalfunctional module which is regarded as the standard dataset[38] containing 89 clusters

Precision recall 119875 value and their harmonic mean cometo be the metric for evaluating the clustering in this paperPrecision [39] is the ratio of the maximal number of commonnodes between experimental results and standard dataset tothe number of nodes in experimental results Recall [39] isthe ratio of the maximal number of common nodes betweenexperimental results and standard dataset to the number ofnodes in the standard dataset The equations are as follows

119901119903119890119888119894119904119894119900119899 (119862 | 119865) =

MMS (119862 119865)|119862|

119903119890119888119886119897119897 (119862 | 119865) =

MMS (119862 119865)|119865|

(12)

where 119862 stands for the obtained cluster results and 119865represents the standard dataset of MIPS |119862| is the numberof nodes in the set 119862 |119865| stands for the number of nodes inthe standard dataset MMS(119862 119865) denotes the number of themaximum matching nodes between experimental result andstandard dataset

A protein may possess various kinds of functions in PPInetworks so it is not advisable to consider that all the proteinnodes in a cluster are regarded as having identical functionThus the definition of 119875 value [40] comes to evaluate thereasonableness of this assignment Suppose that the numberof a cluster in the experiment is 119898 the number of proteinnodes which possess identical function is 119889 the number ofproteins in the standard database is119872 and 119863is the numberof proteins which have the same function as each other The119875 value is expressed as follows

119875 =

119898

sum

119895=119889

(119863

119895

) (119872minus119863

119898minus119895

)

(119872

119898

)

(13)

According to (13) the lower the 119875 value is the more con-fident the protein nodes comes from one cluster module andpossess identical function which provides instructive infor-mation for researchers to analyze the function of unknownproteins

In general large modules own high recall value becausea large module 119862 contains a lot of nodes in the set of 119865and extremely all nodes are gathered in the same clusterin this case the recall value tends to the highest On thecontrary small modules possess high precision because smallmodules have the same properties Extremely each node can


be a module and these modules have the highest precisionHence f-measure comes to be the metric for clustering in thispaper which is shown in (14) Consider

119891-119898119890119886119904119906119903119890 =2 sdot 119901119903119890119888119894119904119894119900119899 sdot 119903119890119888119886119897119897

119901119903119890119888119894119904119894119900119899 + 119903119890119888119886119897119897

(14)

32 Algorithm Parameter Analysis In ABC algorithm thereare two important parameters one is the prob which is theprobability that the onlooker bees select the nectar source theother is the limit which is the threshold indicating whetherthe scout bees will go on global searching

Onlooker bees search the nectar source according to probA random number rand arranged from 0 to 1 is generatedfirstly if 119903119886119899119889 gt 119901119903119900119887 the onlooker bee chooses the nodewith the largest amount of information among the adjacentnodes of the employed bee to be the new cluster centersotherwise the node which has the second largest informationis chosen to be the new cluster centers Also limit is anotherimportant parameter in ABC algorithm While there is noimprovement of the cluster centers after limit times of loopsthe cluster centers will be abandoned and the scout bees willsearch for a new solution of cluster centers

Figure 1 shows the influence of parameter prob on theclustering results prob is the parameter that onlooker beesselect the nectar source according to the roulette wheelselection strategy If the parameter is set too small thepossibility of onlooker bees searching local optimal clusteringcenter is big and meanwhile the algorithm is easy to fall intolocal optimal On the other hand if it is too large it couldensure the algorithmrsquos diversity but onlooker bees could justfind local suboptimal solution FromFigure 1 we find that thecluster effect is best when 119901119903119900119887 = 04

Figure 2 shows the influence of parameter limit on theclustering results limit is an important parameter in ABCalgorithm which is used to determine whether the scout beessearch the global area and update clustering center FromFigure 2 we can see that if limit is too small the scoutbees will constantly search new cluster centers to replacethe current cluster centers the algorithm will discard theoptimal cluster centers If limit is too large the frequency ofthe scout bees searching the new solution will decrease andcause the algorithm to fall into local optimal and this willseriously affect the cluster results Figure 2 show that when119897119894119898119894119905 = 5 the algorithm reaches the best cluster results andthen when limit gradually increases the cluster effect willgradually become poor

33 Comparison of Performance among FCM IFC and ABC-IFC Algorithms FCM clustering algorithm is very sensitiveto the cluster centers the performance of cluster results ispoor IFC algorithm is proposed based on fuzzy clusteringwhich performs better than FCM clustering but the clustereffect is still not well enough ABC-IFC algorithm overcomesthe above drawbacks Comparisons of precision recall 119875value and f-measure of FCM algorithm ICM algorithm andABC-IFC algorithm are shown in Figures 3 to 6 respectively

Figure 3 shows the comparison of precision of threealgorithms In Figure 3 it can be seen that ABC-IFC obtains

0 02 04 06 08 1075

08

085

09

095

1

F-m

easu

re

The influence of parameter probability

f-measure

Probability

Figure 1 The influence of parameter 119901119903119900119887

075

08

085

09

095

1

F-m

easu

re

0 5 10 15Limit

The influence of parameter limit

f-measure

Figure 2 The influence of parameter limit

the highest precision among three algorithms The valuesof precision fluctuate largely when the number of clustersarranges from 20 to 70 and become steady and reach thebest when the number of clusters is between 80 and 120 Theprecision values of IFC algorithm are between FCMandABC-IFC algorithms the highest precision is achieved when thecluster number is 110 Although precision tends to increasewith the augment of cluster numbers in FCM algorithm theprecision of FCM is the lowest among three algorithms


50 100 1500

01

02

03

04

05

06

07

08

Cluster number

Prec

ision

Comparison of three algorithms in precision value

FCMIFCABC-IFC

Figure 3 Comparison of three algorithms in precision value

50 100 150Cluster number

FCMIFCABC-IFC

0

02

04

06

08

1

Reca

ll

Comparison of three algorithms in recall value

Figure 4 Comparison of three algorithms in recall value

Figure 4 represents the comparison of recall value inthree algorithms Even if the values of ABC-IFC algorithmfluctuate strongly they are the highest compared with theother algorithms The values of FCM algorithm are higherthan IFC but lower than ABC-IFC From Figure 4 when thenumber of clusters is 110 the value tends to be the lowestRecall of IFC is the worst among three algorithms


FCMIFCABC-IFC

0

01

02

03

04

05

06

07

P v

alue

Comparison of three algorithms in P value

Figure 5 Comparison of three algorithms in 119875value


FCMIFCABC-IFC

0

02

04

06

08

1

F-m

easu

re

Comparison of three algorithms in f-measure value

Figure 6 Comparison of three algorithms in f-measure value

Figure 5 describes the comparison of 119875 value amongthree algorithms 119875 value of ABC-IFC decreases with thecluster number increasing at the beginning 119875 value increasesgradually after the cluster number increases to 80 What ismore 119875 value becomes lower and holds steady when thenumber of clusters arranges from 80 to 120 119875 value of IFC isbetween ABC-IFC and FCMThe highest value and strongestfluctuation of 119875 value appears in FCM


Table 2 The comparison of three algorithms on different cluster numbers

Algorithm Cluster number Precision Recall 119875value 119865-measureFCM 10 00339 05814 02256 00641IFC 10 03095 00606 03231 01014ABC-IFC 10 06444 08373 02226 07283FCM 20 00409 05269 04166 00759IFC 20 02924 01074 03383 01571ABC-IFC 20 05741 06156 01964 05941FCM 30 00499 05080 03753 00909IFC 30 03525 00883 03555 01412ABC-IFC 30 05647 06789 01667 06166FCM 40 00581 05006 03322 01041IFC 40 03288 01288 03253 01851ABC-IFC 40 05559 05691 01698 05624FCM 50 00624 04905 03405 01107IFC 50 03010 01172 02907 01687ABC-IFC 50 05933 06452 01094 06182FCM 60 00675 04960 05303 01188IFC 60 02628 02236 02893 02416ABC-IFC 60 05254 07253 00510 06094FCM 70 00765 04920 04147 01324IFC 70 02671 01343 02456 01787ABC-IFC 70 05984 06864 00720 06394FCM 80 00831 04864 03930 01419IFC 80 03057 01777 02171 02248ABC-IFC 80 05665 08331 00954 06744FCM 90 00894 04861 04670 01510IFC 90 02900 02068 02246 02414ABC-IFC 90 06034 07331 00561 06620FCM 100 00938 04829 04597 01571IFC 100 03136 01899 02279 02366ABC-IFC 100 06081 06416 00508 06244FCM 110 01015 03337 03986 01557IFC 110 03666 01956 02242 02551ABC-IFC 110 06032 06785 00656 06386FCM 120 01073 04801 03685 01754IFC 120 03193 01840 01903 02335ABC-IFC 120 05873 06637 00479 06232FCM 130 01124 04819 02595 01823IFC 130 02813 01849 01577 02231ABC-IFC 130 06118 06734 00825 06411FCM 140 01192 04763 04354 01907IFC 140 02818 02373 02576 02576ABC-IFC 140 05622 07579 00750 06455FCM 150 01303 04793 04354 02049IFC 150 02759 01886 02456 02240ABC-IFC 150 07491 07537 00830 07514

Comparison of f-measure among three algorithms isshown in Figure 6 f-measure of ABC-IFC is the highestamong three algorithms and fluctuates slightly Cluster resultsreach the best when the number of clusters changes from 80

to 90 f-measure of IFC is lower than ABC-IFC in whichf-measure tends to increase when the number of clustersincreases but the improvement is not obvious f-measureof FCM increases with the clusters number increasing


Table 3 The proteins classified correctly and wrongly in certain cluster

Ordinalcluster The proteins classified correctly The proteins classified wrongly

1 YJL154c YJL053w YHR012w mdash

2 YLR382cYKR052c

YBR120cYDR194c

YPR134wYIR021w

YOR334wQ0120

YHR005c-aYDL044c

YMR023cYGR222w

YJL133w

3

YDR167wYPL254wYLR055cYHR099w

YDR392wYCL010cYOL148cYDR145w

YGR252wYDR176wYGL066wYBR198c

YGL112cYDR448wYBR081cY MR236w

mdash

4YNL151cYPR110cYOR116c

YOR224cYHR143w-aYKL144c

YOR210wYOR207c

YPR190cYBR154c YNL113w YPR187w YNR003c

5 YPL218wYAL007c

YIL109cYAR002c-a

YPL085wYAL042w

YML012w YGL200cYLR208w

YPR181c YDL195w

6

YOR224cYOR210wYBR154cYPR187w

YDL140cYIL021wYLR418c

YOL005cYDR404cYGL070cYOR151c

YJL140wYHR143w mdash

7 YDR167wYPL254w

YDR392wYCL010c YGR252w YGL112c

YDR176wYOL148cYHR099wYMR236w

YDR448wYGL066wYDR145w

YLR055cYBR081cYBR198c

8 YHR069cYDL111c

YDR280wYGR095c

YOL021cYCR035c

YGR195w mdash

but the values are far behind ABC-IFC The cluster per-formance of IFC is better than FCM However ABC-IFCdemonstrates excellent performance in terms of precisionrecall 119875 value and f-measure

34 Cluster Results Analysis Because it has a great impacton the algorithms the number of cluster must be initializedfirst in ABC-IFC IFC and FCM algorithms There is norule to follow on determination of the number of clustersThus the number of clusters tested was arranged from 1020 and 30 to 140 and 150 Table 2 shows the comparisonof clustering results with different cluster numbers In FCMalgorithm running time is very short When the clusternumber increases recall decreases the precision and f-measure increase Although the running time is short thevalues of both precision and f-measure are too small so thatFCM algorithm is not feasible IFC algorithm performs betterthan FCM in terms of precision the recall value is lowerthan in FCM and the running time is not quite long Butprecision recall 119875 value and f-measure are not well enoughThe running time of ABC-IFC algorithm amplifies with theincrease of cluster numbers Precision recall and 119875 value arebetter than the FCM and IFC algorithms f-measure reachesthe highest and ABC-IFC performs steady with variouscluster numbers

Due to space limitations Table 3 only listed the correctlyand wrongly classified proteins in 8 clusters It can be seenfrom Table 3 that the four clusters marked 1 3 6 and 8are completely correct clusters marked 2 4 5 and 7 arepartly correct Between the correctly and wrongly classifiedproteins we can see that the proteins having the different

functions provide a foundation for the research on the con-nection between different protein functional modules Thecorrectly classified proteins have the same function whichprovides analysis basis for us to identify protein functionalmodules and predict protein function accurately

4 Conclusions

In this paper the fuzzy clustering algorithm has been usedin PPI network clustering Because there are some nodesunreachable in PPI network we have redesigned the cluster-ing objective function and proposed a novel clusteringmodelcombined with the optimization mechanism of artificial beecolony with the intuitionistic fuzzy clustering The computa-tional results on PPI dataset have shown that the algorithmcould not only overcome the drawbacks of sensitivity toclustering center but also have the highest accuracy andrecall rate the lowest 119875 value and the best f-measure amongthe competing algorithms Meanwhile the algorithm has agood effect on PPI network functional modules and alsohas great potential to solve other small-world and scale-freecharacterized complex network problems

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper


Acknowledgments

This paper is supported by the National Natural ScienceFoundation of China (nos 61100164 and 61173190) ScientificResearch Start-up Foundation for Returned Scholars Min-istry of Education of China ([2012]1707) and the Funda-mental Research Funds for the Central Universities ShaanxiNormal University (nos GK201402035 and GK201302025)

References

[1] P R Graves andTA J Haystead ldquoMolecular biologistrsquos guide toproteomicsrdquo Microbiology and Molecular Biology Reviews vol66 no 1 pp 39ndash63 2002

[2] B Yang D-Y Liu J Liu D Jin and H-B Ma ldquoComplexnetwork clustering algorithmsrdquo Journal of Software vol 20 no1 pp 54ndash66 2009

[3] J Wang M Li Y Deng and Y Pan ldquoRecent advances inclustering methods for protein interaction networksrdquo BMCGenomics vol 11 supplement 3 article S10 2010

[4] J A Hartigan and M A Wong ldquoAlgorithm AS 136 a K-Meansclustering algorithmrdquo Journal of the Royal Statistical Society Cvol 28 no 1 pp 100ndash108 1979

[5] F Luo Y Yang C-F Chen R Chang J Zhou and R HScheuermann ldquoModular organization of protein interactionnetworksrdquo Bioinformatics vol 23 no 2 pp 207ndash214 2007

[6] M Li J Wang and J Chen ldquoA fast agglomerate algorithm formining functional modules in protein interaction networksrdquoin Proceedings of the International Conference on BioMedicalEngineering and Informatics vol 1 pp 3ndash7 May 2008

[7] M Li J Wang J Chen and Y Pan ldquoHierarchical organizationof functional modules in weighted protein interaction networksusing clustering coefficientrdquo in Bioinformatics Research andApplications vol 5542 of Lecture Notes in Computer Science pp75ndash86 Springer Berlin Germany 2009

[8] JWangM Li J Chen andY Pan ldquoA fast hierarchical clusteringalgorithm for functional modules discovery in protein inter-action networksrdquo IEEEACM Transactions on ComputationalBiology and Bioinformatics vol 8 no 3 pp 607ndash620 2011

[9] G Palla I Derenyi I Farkas and T Vicsek ldquoUncoveringthe overlapping community structure of complex networks innature and societyrdquoNature vol 435 no 7043 pp 814ndash818 2005

[10] B Adamcsek G Palla I J Farkas I Derenyi and T VicsekldquoCFinder locating cliques and overlapping modules in biologi-cal networksrdquo Bioinformatics vol 22 no 8 pp 1021ndash1023 2006

[11] M Li J Chen J Wang B Hu and G Chen ldquoModifying theDPClus algorithm for identifying protein complexes based onnew topological structuresrdquo BMC Bioinformatics vol 9 article398 2008

[12] J Wang B Liu M Li and Y Pan ldquoIdentifying protein com-plexes from interaction networks based on clique percolationand distance restrictionrdquo BMC Genomics vol 11 supplement 2article S10 2010

[13] J Ren J Wang M Li and L Wang ldquoIdentifying proteincomplexes based on density and modularity in protein-proteininteraction networkrdquo BMC Systems Biology vol 7 article S122013

[14] S Srihari K Ning and HW Leong ldquoMCL-CAw a refinementof MCL for detecting yeast complexes from weighted PPInetworks by incorporating core-attachment structurerdquo BMCBioinformatics vol 11 article 504 2010

[15] D Bu Y Zhao L Cai et al ldquoTopological structure analysisof the protein-protein interaction network in budding yeastrdquoNucleic Acids Research vol 31 no 9 pp 2443ndash2450 2003

[16] P G Sun L Gao and S S Han ldquoIdentification of overlappingand non-overlapping community structure by fuzzy clusteringin complex networksrdquo Information Sciences vol 181 no 6 pp1060ndash1071 2011

[17] E Nabieva K Jim A Agarwal B Chazelle and M SinghldquoWhole-proteome prediction of protein function via graph-theoretic analysis of interaction mapsrdquo Bioinformatics vol 21no 1 pp i302ndashi310 2005

[18] Y-R Cho W Hwang and A Zhang ldquoOptimizing flow-basedmodularization by iterative centroid search in protein inter-action networksrdquo in Proceedings of the 7th IEEE InternationalConference on Bioinformatics and Bioengineering (BIBE rsquo07) pp342ndash349 January 2007

[19] X Lei J Tian L Ge and A Zhang ldquoThe clustering model andalgorithm of PPI network based on propagating mechanism ofartificial bee colonyrdquo Information Sciences vol 247 pp 21ndash392013

[20] X Lei X Huang L Shi and A Zhang ldquoClustering PPI databased on Improved functional-flow model through Quantum-behaved PSOrdquo International Journal of Data Mining and Bioin-formatics vol 6 no 1 pp 42ndash60 2012

[21] X Lei S Wu L Ge and A Zhang ldquoClustering and overlappingmodules detection in PPI network based on IBFOrdquo Proteomicsvol 13 no 2 pp 278ndash290 2013

[22] J C Dunn ldquoA graph theoretic analysis of pattern classificationvia Tamurarsquos fuzzy relationrdquo IEEETransactions on SystemsManand Cybernetics vol 4 no 3 pp 310ndash313 2000

[23] C Bertoluzza M Solci and M L Capodieci ldquoMeasure of afuzzy set The 120572-cut approach in the finite caserdquo Fuzzy Sets andSystems vol 123 no 1 pp 93ndash102 2001

[24] J Ma and L Shao ldquoOptimal algorithm for fuzzy classificationproblemrdquo Journal of Software vol 12 no 4 pp 578ndash581 2001

[25] X Yuan H Li and K Sun ldquoThe cut sets decompositiontheorems and representation theorems on intuitionistic fuzzysets and interval valued fuzzy setsrdquo Science China InformationSciences vol 54 no 1 pp 91ndash110 2011

[26] D J Watts and S H Strogatz ldquoCollective dynamics of ldquosmall-worldrdquo networksrdquoNature vol 393 no 6684 pp 440ndash442 1998

[27] A-L Barabasi and R Albert ldquoEmergence of scaling in randomnetworksrdquo Science vol 286 no 5439 pp 509ndash512 1999

[28] E H Ruspini ldquoA new approach to clusteringrdquo Information andControl vol 15 no 1 pp 22ndash32 1969

[29] D Karaboga and B Basturk ldquoA powerful and efficient algo-rithm for numerical function optimization artificial bee colony(ABC) algorithmrdquo Journal of Global Optimization vol 39 no 3pp 459ndash471 2007

[30] D Karaboga and B Basturk ldquoOn the performance of artificialbee colony (ABC) algorithmrdquo Applied Soft Computing vol 8no 1 pp 687ndash697 2008

[31] D Karaboga and B Akay ldquoA comparative study of artificial beecolony algorithmrdquo Applied Mathematics and Computation vol214 no 1 pp 108ndash132 2009

[32] R Srinivasa Rao S V L Narasimham and M RamalingarajuldquoOptimization of distribution network configuration for lossreduction using artificial bee colony algorithmrdquo InternationalJournal of Electrical Power and Energy Systems Engineering vol1 no 2 Article ID 7092715 2008


[33] Y D Zhang L N Wu S H Wang and Y K Huo ldquoChaoticartificial bee colony used for cluster analysisrdquo Communicationin Computer and Information Science vol 134 pp 205ndash211 2011

[34] M F Tasgetiren Q-K Pan P N Suganthan andAH-L ChenldquoA discrete artificial bee colony algorithm for the total flowtimeminimization in permutation flow shopsrdquo Information Sciencesvol 181 no 16 pp 3459ndash3475 2011

[35] X Lei X Huang and A Zhang ldquoImproved artificial beecolony algorithm and its application in data clusteringrdquo inProceedings of the 5th IEEE International Conference on Bio-Inspired Computing Theories and Applications (BIC-TA rsquo10) pp514ndash521 Changsha China September 2010

[36] O B Hadded and A M B O Afshar ldquoMBO (MarriageBees Optimization) a new heuristic approach in hydrosystemsdesign and operationrdquo in Proceedings of the 1st InternationalConference onManaging Rivers in the 21st Century pp 499ndash5042004

[37] U Guldener MMunsterkotter G Kastenmuller et al ldquoCYGDthe comprehensive yeast genome databaserdquo Nucleic AcidsResearch vol 33 pp D364ndashD368 2005

[38] H W Mewes D Frishman K F Mayer et al ldquoMIPS analysisand annotation of proteins from whole genomes in 2005rdquoNucleic Acids Research vol 34 pp D169ndashD172 2006

[39] A D Zhang Protein Interaction Networks Cambridge Univer-sity Press New York NY USA 2009

[40] D Ucar S Asur and U Catalyurek ldquoImproving functionalmodularity in protein-protein interactions graphs using hub-induced subgraphsrdquo in Knowledge Discovery in DatabasesPKDD 2006 vol 4213 of Lecture Notes in Computer Science pp371ndash382 Springer Berlin Germany 2006

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Anatomy Research International

PeptidesInternational Journal of


Hindawi Publishing Corporation httpwwwhindawicom

International Journal of

Volume 2014

Zoology


Molecular Biology International

GenomicsInternational Journal of


The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014


BioinformaticsAdvances in

Marine BiologyJournal of



Signal TransductionJournal of


BioMed Research International

Evolutionary BiologyInternational Journal of



Biochemistry Research International

ArchaeaHindawi Publishing Corporationhttpwwwhindawicom Volume 2014


Genetics Research International


Advances in

Virolog y

Hindawi Publishing Corporationhttpwwwhindawicom

Nucleic AcidsJournal of

Volume 2014

Stem CellsInternational



Enzyme Research



Microbiology


cluster absolutely Results of these methods are not ideal forPPI networks in that there are some overlapping functionalmodules in PPI networks In fact proteins in PPI networksdo not belong to a single cluster According to the definitionof fuzzy partition Dunn [22] has extended hard clusteringto fuzzy clustering which allows one protein to belong tomultiple clusters This expresses the fuzzy concept of ldquoitbelongs to both this cluster and that clusterrdquo Therefore fuzzyclustering algorithm ismore suitable for clustering proteins inPPI networks Fuzzy c-means (FCM) clustering method [23]optimal fuzzy clustering algorithm [24] and intuitionisticfuzzy clustering (IFC) [25] algorithm are commonly usedTraditional fuzzy clustering algorithms adopt distance as themeasure of sample similarity criteria As most nodes in PPInetworks are unreachable it is difficult to apply them to PPInetworks effectively

In this paper a new clustering model is proposed bycombining artificial bee colony (ABC) optimization andintuitionistic fuzzy membership Specifically the proposedmethods employ fuzzy membership matrix for partitioninginitial clusters first Then a new clustering objective functionis presented by taking characteristics of PPI networks thesimilarity among clusters and the density and the weightsof interaction nodes within clusters into consideration Thenew method also takes advantage of ABC algorithm tooptimize the values of the objective function to gain thecluster results In addition the new algorithm makes useof ABC algorithm to optimize cluster centers automaticallyto overcome defects of sensitivity to cluster centers throughfuzzy c-means clustering and intuitionistic fuzzy clusteringalgorithm Computational experiments show that the newmodel and algorithm perform better than existing algo-rithms

The rest of this paper is organized as follows In Section 2basic concepts and principles are introduced firstly secondlythe proposed ABC fuzzy clustering model is discussed andthen the flow chart and the computational step are listedalong with the time complexity analysis of the algorithmPerformance and evaluation of the ABC-IFC algorithm isshown by comparing with FCM and IFC in Section 3Section 4 concludes this study

2 Materials and Methods

21 Two Properties of PPI Networks Watts and Strogatz[26] presented small-world property of complex networksstating that most nodes are not neighbors of one anotherwhile they can be reached from any of other nodes by asmall number of hops or steps A small-world network hassmaller average distance than random networks and largerclustering coefficient than regular networks There are somesubnetworks in which almost any two nodes are connectedbecause of the effect of the large clustering coefficient On theother hand any two nodes are connectedmostly through oneshort path

Regarding each network as an undirected graph the 119890degree of node V is called 119896 if node V has 119896 adjacent node119875(119896) represents the frequency of nodes with the degree of 119896

if119875(119896) follows the power-lawdistribution such as119875(119896) prop 119896minus120574with 120574 isin [2 3] the graph is considered scale-free Withscale-free property [27] only the degree of a small numberof nodes is large while that of most nodes is small Statisticalphysicists used to call the phenomenon following power-lawdistribution as scale-free phenomenon

22 Fuzzy C-Means Clustering Fuzzy C-means clustering(FCM) algorithm is described by Ruspini [28] It is definedas follows 119883 = 119909

1

1199092

119909119899

is cluster sample 119899 is thenumber of samples 119888 is the number of clusters 1 lt 119888 lt 119899119889119894119895

= 119909119894

minus V119895

is Euclidean distance between the sample 119909119894

and the cluster center V119895

119881 = [V1

V2

V119888

] is cluster centermatrix 119880 = [119906

119894119895

] is 119899 lowast 119888 membership matrix and 119906119894119895

isthe degree of membership of 119909

119894

that belongs to the cluster 119895Update the degree of membership 119906

119894119895

and the cluster centerV119895

by

119906119894119895

=

1

sum119888

119905=1

(

10038171003817100381710038171003817119909119894

minus V119895

10038171003817100381710038171003817

2

1003817100381710038171003817119909119894

minus V119905

1003817100381710038171003817

2

)

2(119898minus1)

V119895

=

sum119899

119894=1

119906119898

119894119895

119909119894

sum119899

119894=1

119906119898

119894119895

(1)

where119898 = 2 The objective function is defined as

119869 =

119899

sum

119894=1

119888

sum

119895=1

119906119898

119894119895

10038171003817100381710038171003817119909119894

minus V119895

10038171003817100381710038171003817

2

(2)

119869will obtain the extremumwhen the matrix119880 is normalized

23 Intuitionistic Fuzzy Clustering (IFC)

231 Intuitionistic Fuzzy SetTheory Let119883 be a given domainan intuitionistic fuzzy set 119877 on119883 is given by

119877 = ⟨119909 120583119877

(119909) ]119877

(119909)⟩ | 119909 isin 119883 (3)

where 120583119877

(119909) 119883 rarr [0 1] and ]119877

(119909) 119883 rarr [0 1]

denote degree of membership and nonmembership of 119877respectively For 119909 isin 119883 in 119877 it is true that 0 le 120583

119877

(119909) +

]119877

(119909) le 1 The general intuitionistic fuzzy set is denotedby 119877 = ⟨119909 120583

119877

(119909) ]119877

(119909)⟩ The general intuitionistic fuzzy setis also marked as 119877 = ⟨119909 120583

119877

(119909) 1 minus 120583119877

(119909)⟩ | 119909 isin 119883120587119877

(119909) = 1 minus 119906119877

(119909) minus ]119877

(119909) denotes intuitionistic index of 119909 in119877 which is a measurement of uncertain degree

232 Intuitionistic Fuzzy Clustering Algorithm Let 119883 =

1199091

1199092

119909119899

be a set of samples to be clustered 119875 =

1199011

1199012

119901119888

is a set of 119888 clustering prototypes where 119888 isthe number of clusters The intuitionistic fuzzy set is givenby 119877 = ⟨(119909

119894

119901119895

) 120583 ]⟩ | 119909119894

isin 119883 119901119895

isin 119875 (119909119894

119901119895

) isthe degree of membership of the sample 119894 in the cluster 119895119881 = [V

1

V2

V119888

] is a group of cluster centers Clusteringprototype model (119901

119894119895

) degree of hesitation (120587119894119895

) degree of


membership (119906119894119895



119901119894119895

= 120572 (1 minus 119890(119889(119909119894 119901119895)

2)4

) (4)

119890120582119896

=

1

sum119888

119894=1

(1 minus 120587119896119894

119890119889

(119909119896V119894 )2

+ 119890120582119896

)

(5)

120587119894119895

=

119908119890

(119894 V119895

) if 119894 = V119895

1 otherwise(6)

119906119894119895

=

(1 minus 120587119894119895

)

(119890119889(119909119894 119901119895)

2

+ 1)

(7)

V119894119895

= 1 minus 120587119894119895

minus 119906119894119895

(8)

where 120572 = 04 119890120582119896


119890



119869 =

119899

sum

119894=1

119888

sum

119895=1

119906119894119895

119889 (119909119894

119901119895

) (9)

119889(119909119894

119901119895

) = (119909119894

minus 119901119895

)(119909119894

minus 119901119895



119894119895

+ V119894119895

+ 120587119894119895

= 1





















min 119891V119886119897

=

1

119888 lowast 119888

119888

sum

119894=1

119888

sum

119895=1

sim (119868119894

119869119895

) +

1

DEWif DEW = 0

1

119888 lowast 119888

119888

sum

119894=1

119888

sum

119895=1

sim (119868119894

119869119895

) otherwise

(10)

where sim(119868119894

119869119895

) = (sum119909isin119868119894119910isin119869119895

119888(119909 119910))max(|119868119894

| |119869119895


119888 (119909 119910) =

1 if 119909 = 119910119908119890


den (119894) = 2119890

119899119894

lowast (119899119894

minus 1)

dw (119894) =sum (119908

119890

)

119899119894

lowast (119899119894

minus 1)

DEW = 1119888

119888

sum

119894=1

(den (119894) + dw (119894))

(11)

119899119894


119890









Step 3 If 119891V119886119897 (119888119897119906119904119905119890119903) lt 119891V119886119897 (119892119887119890119904119905 119888119897119906119904119905119890119903) 119892119887119890119904119905119881 = 119881and 119892119887119890119904119905 119888119897119906119904119905119890119903 = 119888119897119906119904119905119890119903




Step 7 Set 119894119905119890119903 = 119894119905119890119903 + 1













119901119903119890119888119894119904119894119900119899 (119862 | 119865) =

MMS (119862 119865)|119862|

119903119890119888119886119897119897 (119862 | 119865) =

MMS (119862 119865)|119865|

(12)



119875 =

119898

sum

119895=119889

(119863

119895

) (119872minus119863

119898minus119895

)

(119872

119898

)

(13)





119891-119898119890119886119904119906119903119890 =2 sdot 119901119903119890119888119894119904119894119900119899 sdot 119903119890119888119886119897119897

119901119903119890119888119894119904119894119900119899 + 119903119890119888119886119897119897

(14)







0 02 04 06 08 1075

08

085

09

095

1

F-m

easu

re


f-measure

Probability


075

08

085

09

095

1

F-m

easu

re

0 5 10 15Limit


f-measure




50 100 1500

01

02

03

04

05

06

07

08

Cluster number

Prec

ision


FCMIFCABC-IFC



FCMIFCABC-IFC

0

02

04

06

08

1

Reca

ll





FCMIFCABC-IFC

0

01

02

03

04

05

06

07

P v

alue




FCMIFCABC-IFC

0

02

04

06

08

1

F-m

easu

re













2 YLR382cYKR052c

YBR120cYDR194c

YPR134wYIR021w

YOR334wQ0120

YHR005c-aYDL044c

YMR023cYGR222w

YJL133w

3





mdash



YOR210wYOR207c


5 YPL218wYAL007c

YIL109cYAR002c-a

YPL085wYAL042w


YPR181c YDL195w

6





7 YDR167wYPL254w





8 YHR069cYDL111c

YDR280wYGR095c

YOL021cYCR035c

YGR195w mdash





4 Conclusions





Acknowledgments


References

















































Volume 2014

Zoology






















Advances in

Virolog y



Volume 2014




Enzyme Research



Microbiology


membership (119906119894119895



119901119894119895

= 120572 (1 minus 119890(119889(119909119894 119901119895)

2)4

) (4)

119890120582119896

=

1

sum119888

119894=1

(1 minus 120587119896119894

119890119889

(119909119896V119894 )2

+ 119890120582119896

)

(5)

120587119894119895

=

119908119890

(119894 V119895

) if 119894 = V119895

1 otherwise(6)

119906119894119895

=

(1 minus 120587119894119895

)

(119890119889(119909119894 119901119895)

2

+ 1)

(7)

V119894119895

= 1 minus 120587119894119895

minus 119906119894119895

(8)

where 120572 = 04 119890120582119896


119890



119869 =

119899

sum

119894=1

119888

sum

119895=1

119906119894119895

119889 (119909119894

119901119895

) (9)

119889(119909119894

119901119895

) = (119909119894

minus 119901119895

)(119909119894

minus 119901119895



119894119895

+ V119894119895

+ 120587119894119895

= 1





















min 119891V119886119897

=

1

119888 lowast 119888

119888

sum

119894=1

119888

sum

119895=1

sim (119868119894

119869119895

) +

1

DEWif DEW = 0

1

119888 lowast 119888

119888

sum

119894=1

119888

sum

119895=1

sim (119868119894

119869119895

) otherwise

(10)

where sim(119868119894

119869119895

) = (sum119909isin119868119894119910isin119869119895

119888(119909 119910))max(|119868119894

| |119869119895


119888 (119909 119910) =

1 if 119909 = 119910119908119890


den (119894) = 2119890

119899119894

lowast (119899119894

minus 1)

dw (119894) =sum (119908

119890

)

119899119894

lowast (119899119894

minus 1)

DEW = 1119888

119888

sum

119894=1

(den (119894) + dw (119894))

(11)

119899119894


119890









Step 3 If 119891V119886119897 (119888119897119906119904119905119890119903) lt 119891V119886119897 (119892119887119890119904119905 119888119897119906119904119905119890119903) 119892119887119890119904119905119881 = 119881and 119892119887119890119904119905 119888119897119906119904119905119890119903 = 119888119897119906119904119905119890119903




Step 7 Set 119894119905119890119903 = 119894119905119890119903 + 1













119901119903119890119888119894119904119894119900119899 (119862 | 119865) =

MMS (119862 119865)|119862|

119903119890119888119886119897119897 (119862 | 119865) =

MMS (119862 119865)|119865|

(12)



119875 =

119898

sum

119895=119889

(119863

119895

) (119872minus119863

119898minus119895

)

(119872

119898

)

(13)





119891-119898119890119886119904119906119903119890 =2 sdot 119901119903119890119888119894119904119894119900119899 sdot 119903119890119888119886119897119897

119901119903119890119888119894119904119894119900119899 + 119903119890119888119886119897119897

(14)







0 02 04 06 08 1075

08

085

09

095

1

F-m

easu

re


f-measure

Probability


075

08

085

09

095

1

F-m

easu

re

0 5 10 15Limit


f-measure




50 100 1500

01

02

03

04

05

06

07

08

Cluster number

Prec

ision


FCMIFCABC-IFC



FCMIFCABC-IFC

0

02

04

06

08

1

Reca

ll





FCMIFCABC-IFC

0

01

02

03

04

05

06

07

P v

alue




FCMIFCABC-IFC

0

02

04

06

08

1

F-m

easu

re













2 YLR382cYKR052c

YBR120cYDR194c

YPR134wYIR021w

YOR334wQ0120

YHR005c-aYDL044c

YMR023cYGR222w

YJL133w

3





mdash



YOR210wYOR207c


5 YPL218wYAL007c

YIL109cYAR002c-a

YPL085wYAL042w


YPR181c YDL195w

6





7 YDR167wYPL254w





8 YHR069cYDL111c

YDR280wYGR095c

YOL021cYCR035c

YGR195w mdash





4 Conclusions





Acknowledgments


References

















































Volume 2014

Zoology






















Advances in

Virolog y



Volume 2014




Enzyme Research



Microbiology














min 119891V119886119897

=

1

119888 lowast 119888

119888

sum

119894=1

119888

sum

119895=1

sim (119868119894

119869119895

) +

1

DEWif DEW = 0

1

119888 lowast 119888

119888

sum

119894=1

119888

sum

119895=1

sim (119868119894

119869119895

) otherwise

(10)

where sim(119868119894

119869119895

) = (sum119909isin119868119894119910isin119869119895

119888(119909 119910))max(|119868119894

| |119869119895


119888 (119909 119910) =

1 if 119909 = 119910119908119890


den (119894) = 2119890

119899119894

lowast (119899119894

minus 1)

dw (119894) =sum (119908

119890

)

119899119894

lowast (119899119894

minus 1)

DEW = 1119888

119888

sum

119894=1

(den (119894) + dw (119894))

(11)

119899119894


119890









Step 3 If 119891V119886119897 (119888119897119906119904119905119890119903) lt 119891V119886119897 (119892119887119890119904119905 119888119897119906119904119905119890119903) 119892119887119890119904119905119881 = 119881and 119892119887119890119904119905 119888119897119906119904119905119890119903 = 119888119897119906119904119905119890119903




Step 7 Set 119894119905119890119903 = 119894119905119890119903 + 1













119901119903119890119888119894119904119894119900119899 (119862 | 119865) =

MMS (119862 119865)|119862|

119903119890119888119886119897119897 (119862 | 119865) =

MMS (119862 119865)|119865|

(12)



119875 =

119898

sum

119895=119889

(119863

119895

) (119872minus119863

119898minus119895

)

(119872

119898

)

(13)





119891-119898119890119886119904119906119903119890 =2 sdot 119901119903119890119888119894119904119894119900119899 sdot 119903119890119888119886119897119897

119901119903119890119888119894119904119894119900119899 + 119903119890119888119886119897119897

(14)







0 02 04 06 08 1075

08

085

09

095

1

F-m

easu

re


f-measure

Probability


075

08

085

09

095

1

F-m

easu

re

0 5 10 15Limit


f-measure




50 100 1500

01

02

03

04

05

06

07

08

Cluster number

Prec

ision


FCMIFCABC-IFC



FCMIFCABC-IFC

0

02

04

06

08

1

Reca

ll





FCMIFCABC-IFC

0

01

02

03

04

05

06

07

P v

alue




FCMIFCABC-IFC

0

02

04

06

08

1

F-m

easu

re













2 YLR382cYKR052c

YBR120cYDR194c

YPR134wYIR021w

YOR334wQ0120

YHR005c-aYDL044c

YMR023cYGR222w

YJL133w

3





mdash



YOR210wYOR207c


5 YPL218wYAL007c

YIL109cYAR002c-a

YPL085wYAL042w


YPR181c YDL195w

6





7 YDR167wYPL254w





8 YHR069cYDL111c

YDR280wYGR095c

YOL021cYCR035c

YGR195w mdash





4 Conclusions





Acknowledgments


References

















































Volume 2014

Zoology






















Advances in

Virolog y



Volume 2014




Enzyme Research



Microbiology







Step 3 If 119891V119886119897 (119888119897119906119904119905119890119903) lt 119891V119886119897 (119892119887119890119904119905 119888119897119906119904119905119890119903) 119892119887119890119904119905119881 = 119881and 119892119887119890119904119905 119888119897119906119904119905119890119903 = 119888119897119906119904119905119890119903




Step 7 Set 119894119905119890119903 = 119894119905119890119903 + 1













119901119903119890119888119894119904119894119900119899 (119862 | 119865) =

MMS (119862 119865)|119862|

119903119890119888119886119897119897 (119862 | 119865) =

MMS (119862 119865)|119865|

(12)



119875 =

119898

sum

119895=119889

(119863

119895

) (119872minus119863

119898minus119895

)

(119872

119898

)

(13)





119891-119898119890119886119904119906119903119890 =2 sdot 119901119903119890119888119894119904119894119900119899 sdot 119903119890119888119886119897119897

119901119903119890119888119894119904119894119900119899 + 119903119890119888119886119897119897

(14)







0 02 04 06 08 1075

08

085

09

095

1

F-m

easu

re


f-measure

Probability


075

08

085

09

095

1

F-m

easu

re

0 5 10 15Limit


f-measure




50 100 1500

01

02

03

04

05

06

07

08

Cluster number

Prec

ision


FCMIFCABC-IFC



FCMIFCABC-IFC

0

02

04

06

08

1

Reca

ll





FCMIFCABC-IFC

0

01

02

03

04

05

06

07

P v

alue




FCMIFCABC-IFC

0

02

04

06

08

1

F-m

easu

re













2 YLR382cYKR052c

YBR120cYDR194c

YPR134wYIR021w

YOR334wQ0120

YHR005c-aYDL044c

YMR023cYGR222w

YJL133w

3





mdash



YOR210wYOR207c


5 YPL218wYAL007c

YIL109cYAR002c-a

YPL085wYAL042w


YPR181c YDL195w

6





7 YDR167wYPL254w





8 YHR069cYDL111c

YDR280wYGR095c

YOL021cYCR035c

YGR195w mdash





4 Conclusions





Acknowledgments


References

















































Volume 2014

Zoology






















Advances in

Virolog y



Volume 2014




Enzyme Research



Microbiology



119891-119898119890119886119904119906119903119890 =2 sdot 119901119903119890119888119894119904119894119900119899 sdot 119903119890119888119886119897119897

119901119903119890119888119894119904119894119900119899 + 119903119890119888119886119897119897

(14)







0 02 04 06 08 1075

08

085

09

095

1

F-m

easu

re


f-measure

Probability


075

08

085

09

095

1

F-m

easu

re

0 5 10 15Limit


f-measure




50 100 1500

01

02

03

04

05

06

07

08

Cluster number

Prec

ision


FCMIFCABC-IFC



FCMIFCABC-IFC

0

02

04

06

08

1

Reca

ll





FCMIFCABC-IFC

0

01

02

03

04

05

06

07

P v

alue




FCMIFCABC-IFC

0

02

04

06

08

1

F-m

easu

re













2 YLR382cYKR052c

YBR120cYDR194c

YPR134wYIR021w

YOR334wQ0120

YHR005c-aYDL044c

YMR023cYGR222w

YJL133w

3





mdash



YOR210wYOR207c


5 YPL218wYAL007c

YIL109cYAR002c-a

YPL085wYAL042w


YPR181c YDL195w

6





7 YDR167wYPL254w





8 YHR069cYDL111c

YDR280wYGR095c

YOL021cYCR035c

YGR195w mdash





4 Conclusions





Acknowledgments


References

















































Volume 2014

Zoology






















Advances in

Virolog y



Volume 2014




Enzyme Research



Microbiology


50 100 1500

01

02

03

04

05

06

07

08

Cluster number

Prec

ision


FCMIFCABC-IFC



FCMIFCABC-IFC

0

02

04

06

08

1

Reca

ll





FCMIFCABC-IFC

0

01

02

03

04

05

06

07

P v

alue




FCMIFCABC-IFC

0

02

04

06

08

1

F-m

easu

re













2 YLR382cYKR052c

YBR120cYDR194c

YPR134wYIR021w

YOR334wQ0120

YHR005c-aYDL044c

YMR023cYGR222w

YJL133w

3





mdash



YOR210wYOR207c


5 YPL218wYAL007c

YIL109cYAR002c-a

YPL085wYAL042w


YPR181c YDL195w

6





7 YDR167wYPL254w





8 YHR069cYDL111c

YDR280wYGR095c

YOL021cYCR035c

YGR195w mdash





4 Conclusions





Acknowledgments


References

















































Volume 2014

Zoology






















Advances in

Virolog y



Volume 2014




Enzyme Research



Microbiology










2 YLR382cYKR052c

YBR120cYDR194c

YPR134wYIR021w

YOR334wQ0120

YHR005c-aYDL044c

YMR023cYGR222w

YJL133w

3





mdash



YOR210wYOR207c


5 YPL218wYAL007c

YIL109cYAR002c-a

YPL085wYAL042w


YPR181c YDL195w

6





7 YDR167wYPL254w





8 YHR069cYDL111c

YDR280wYGR095c

YOL021cYCR035c

YGR195w mdash





4 Conclusions





Acknowledgments


References

















































Volume 2014

Zoology






















Advances in

Virolog y



Volume 2014




Enzyme Research



Microbiology


Acknowledgments


References

















































Volume 2014

Zoology






















Advances in

Virolog y



Volume 2014




Enzyme Research



Microbiology

















Volume 2014

Zoology






















Advances in

Virolog y



Volume 2014




Enzyme Research



Microbiology

Research Article ABC and IFC: Modules Detection Method for ...

Documents

Transcript of Research Article ABC and IFC: Modules Detection Method for ...