Research Article
Neighbor Similarity Based Agglomerative Method for Community Detection in Networks
Jianjun Cheng,1 Xing Su,1 Haijuan Yang,1,2 Longjie Li,1 Jingming Zhang,1 Shiyan Zhao,1 and Xiaoyun Chen1
1 School of Information Science and Engineering, Lanzhou University, China
2 Department of Electronic Information Engineering, Lanzhou Vocational Technical College, China
Correspondence should be addressed to Jianjun Cheng; chengjianjun@lzu.edu.cn
Received 27 December 2018; Revised 15 March 2019; Accepted 11 April 2019; Published 2 May 2019
Academic Editor: Guang Li
Copyright © 2019 Jianjun Cheng et al. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Community structures can reveal the organization and functional properties of complex networks; hence, detecting communities in networks is of great importance. With the surge of large networks in recent years, the efficiency of community detection has become critical, and many local methods have therefore emerged. In this paper, we propose a node similarity based community detection method, which is also a local one and consists of two phases. In the first phase, we take the node with the largest degree in the network as the exemplar of the first community and insert its most similar neighbor into that community as well. Then the node with the largest degree among the remaining nodes is selected: if its most similar neighbor has not been classified into any community yet, we create a new community for the selected node and its most similar neighbor; otherwise, if its most similar neighbor has already been classified into a community, we insert the selected node into the community to which its most similar neighbor belongs. This procedure is repeated until every node in the network is assigned to a community, at which point we obtain a series of preliminary communities. However, some of them might be too small or too sparse: the edges connecting them to the outside might outnumber the edges inside them. Keeping them as the final communities would lead to a low-quality community structure. Therefore, in the second phase, we merge some of them in an efficient way to improve the quality of the resulting community structure. To test the performance of the proposed method, extensive experiments are performed on both artificial networks and real-world networks. The results show that the proposed method detects high-quality community structures from networks steadily and efficiently and outperforms the comparison algorithms significantly.
1. Introduction
Many real-world systems can be abstracted as complex networks, in which nodes represent entities in the systems and edges correspond to interactions between the entities. One of the most significant characteristics observed in these complex networks is the "community structure," which means that nodes in the network can be divided naturally into groups: nodes in the same group are densely connected, connections across different groups are relatively sparse, and each of the node groups is a so-called "community."
The communities are always related to functional modules of networks. For instance, communities can be groups of web pages in WWW networks [1] or scientific papers in citation networks [2] sharing the same topics, books with the same political orientations copurchased from the online bookseller Amazon.com [3], or pathways or complexes in metabolic networks or protein-protein interaction networks [4, 5]. In social networks, communities often correspond to real social groupings having the same interests or professional occupations, e.g., scientist groups classified according to the scientists' specialties in coauthorship collaboration networks [6, 7], jazz musician groups divided according to location and race [8], or affiliations of gang members in the policing area of Hollenbeck, Los Angeles [9]. Besides, some research has indicated that networks can present quite different properties when considered at the community level rather than from the perspective of the entire network or the individual node [10].
Hindawi, Complexity, Volume 2019, Article ID 8292485, 16 pages, https://doi.org/10.1155/2019/8292485
Therefore, analyzing the community structures in networks can facilitate the recognition of the characteristics of networks and further support predictions about the functional properties of the corresponding systems. That is to say, community detection provides an effective means of studying the functional properties of networks by examining their structural characteristics, which is valuable in practical applications. Consequently, a multitude of methods [11, 12] have been proposed for detecting communities in complex networks; we review some related literature in Section 2.
In this paper, we also propose a community detection method, which is based on node similarity and consists of two phases. The first phase repeatedly selects the node with the largest degree among the remaining nodes of the network and, depending on the community affiliation of its most similar neighbor, either takes it as the exemplar of a new community or inserts it into the community to which that neighbor belongs. At the end of this phase, we get a series of communities. However, they are only preliminary communities: some of them might be too small or too sparse, and the edges connecting them to the outside might far outnumber the edges inside them. Accepting them as the final communities would lead to a low-quality community structure. Therefore, the second phase merges some of the preliminary communities to improve the quality of the resulting community structure.
The main contributions of this work can be summarized as follows:
(i) We propose a node similarity based local algorithm, abbreviated as NSA, for community detection, which is a two-phase method. The first phase obtains the preliminary communities, and the second phase merges some of the preliminary communities to improve the quality of the resulting community structure.
(ii) We propose an index, the community metric, to measure the sparsity or smallness of a community. In the second phase, we use the index as a criterion to determine which preliminary communities need to be merged.
(iii) Extensive experiments on artificial networks and real-world networks are carried out to test the performance of the proposed method. The experimental results show that the proposed method is consistently promising in both detection performance and time complexity and outperforms its competitors.
The remainder of this paper is organized as follows. Section 2 reviews some literature on community detection. The details of the proposed algorithm are elaborated in Section 3. The experimental results and analysis on both artificial networks and real-world networks are presented in Section 4. In Section 5, we discuss how to set the optimal value for a parameter introduced in our proposed method, and the paper ends with a conclusion in Section 6.
2. Related Work
A great number of community detection methods have been proposed in the last decade; these methods try to explore communities in networks from various perspectives. The graph theory-based methods treat community detection as the traditional task of graph partitioning and divide the network into subnetworks. Kernighan-Lin [13] is a representative method of this kind, which first partitions the network into two arbitrary subnetworks and then repeatedly swaps some nodes between the two subnetworks to maximize a predefined gain function.
The hierarchical clustering methods reveal multilevel community structures in divisive, agglomerative, or hybrid ways. For example, the GN algorithm [6, 7] detects communities by repeatedly removing the edge with the largest betweenness from the network; its output is a dendrogram representing the nested hierarchy of possible community structures of the network, and the level corresponding to the largest value of a measure, modularity [7], is taken as the final result. The FastQ algorithm [23, 24] first takes each node in the network as a community and then repeatedly merges two of them into one. Its output is also a dendrogram depicting the merge procedure of possible community hierarchies. Zarandi et al. [25] randomly removed some edges with low similarity to obtain some disconnected components as the primary communities, and then some of them are merged to get the resulting community structure.
The modularity optimization-based algorithms detect community structures from networks by utilizing the physical meaning of modularity (the higher the value of modularity, the better the community structure) and taking modularity as the objective to optimize. For instance, in order to maximize the modularity of the community structure, FastQ [23, 24] joins, in each iteration, the pair of communities whose merge leads to the largest modularity increment. The Louvain algorithm [26] uses a node-moving strategy to extract the community structure with optimized modularity from the network. It also begins with an initial partition in which each node is a community; then, for each node, the algorithm evaluates the modularity gain of moving it into the community to which each of its neighbors belongs and moves that node into the community with the largest positive modularity gain. The SLM (short for Smart Local Moving) algorithm [27] searches for possibilities of increasing modularity with respect to both splitting communities and moving sets of nodes from one community to another.
LPA (Label Propagation Algorithm) [28] utilizes an information propagation mechanism to detect communities from networks. Every node in the network is initialized with a unique label, and all nodes in the network are first arranged in a random order; then each node, in that order, updates its label to the one occurring most frequently among its neighbors. This label update procedure ends when every node in the network has a label that is the majority one among its neighbors, and nodes with the same label form a community. Owing to its simplicity and high efficiency, several variants have been derived from LPA. Barber et al. [29] proposed a series of algorithms that propagate labels under some constraints; LPAm is the most famous one, which tries to maximize the modularity during the label propagation procedure. Chin et al. [30] first identified the main communities using the number of mutual neighboring nodes; then they attached some independent constraints to the basic LPA and used the constrained LPA to add the remaining nodes into communities; finally, they used a node-moving strategy like the one employed in Louvain to refine the quality of the resulting community structure. Ding et al. [31] developed a modified version of LPA, which exploits the idea of density peak clustering [32] and the Chebyshev inequality to choose community centers from the network and then propagates the labels of the selected centers to the whole network with the proposed multistrategy of label propagation.
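The basic label propagation loop described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the implementation of [28]: the function name `label_propagation`, the `seed` and `max_iter` parameters, the dictionary-of-sets graph representation, and the random tie-breaking among equally frequent labels are all choices made here for the example.

```python
import random
from collections import Counter


def label_propagation(adj, seed=0, max_iter=100):
    """Basic LPA sketch on an undirected graph given as {node: set(neighbors)}."""
    rng = random.Random(seed)
    labels = {v: v for v in adj}      # every node starts with a unique label
    nodes = list(adj)
    for _ in range(max_iter):         # max_iter is a safety cap (assumption)
        rng.shuffle(nodes)            # visit nodes in a random order
        changed = False
        for v in nodes:
            if not adj[v]:
                continue              # isolated nodes keep their own label
            counts = Counter(labels[u] for u in adj[v])
            best = max(counts.values())
            if counts[labels[v]] == best:
                continue              # current label is already a majority label
            # Assumption: ties among majority labels are broken at random.
            labels[v] = rng.choice(sorted(l for l, c in counts.items() if c == best))
            changed = True
        if not changed:
            break                     # every node holds a majority label
    return labels


# Toy graph (hypothetical): two disconnected triangles.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5, 6}, 5: {4, 6}, 6: {4, 5}}
print(label_propagation(adj))
```

On this toy graph each triangle ends up sharing one label, so the two components are reported as two communities; which of the original labels survives depends on the random order.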
Density-based methods define and utilize the concept of density in networks, for nodes or communities, to uncover community structures. SCAN [33] borrows the idea of the classical density-based clustering algorithm DBSCAN [34] to reveal communities, hubs, and outliers in networks. SCAN++ [35] is a derivative of SCAN; it reduces time consumption by introducing a new data structure and reducing the number of density evaluations in the detection procedure. IsoFdp [36] maps the network nodes as data points into a low-dimensional manifold and then exploits the density peak clustering algorithm [32] to extract the final community structure. The LCCD algorithm [37] also follows the approach proposed in the density peak clustering algorithm [32] to locate the structural centers of networks and then expands communities from the identified centers to the borders using a local search procedure.
Network dynamic-based methods explore community structures by simulating dynamic processes in networks. Random walk is a typical dynamic process carried out in networks; random walk-based methods utilize the tendency of the walker to be trapped in a community during a short walk, rather than walking across the community border into another community, to detect communities in networks. WalkTrap [38] makes use of random walks to calculate the probability of going from one node to another during a short walk and then calculates a distance to measure node similarities and community similarities. The PPC algorithm [39] initially considers the network as a single community and recursively partitions each community, utilizing node similarities computed with random walks, until further partitioning cannot achieve a better value of modularity. RWA [40] employs random walks to calculate the probability of a node belonging to a community, and each community is expanded by repeatedly attracting the node that is most likely to belong to that community. Besides, Attractor [41] utilizes distance dynamics to explore communities in networks: node interactions might change the distances among nodes, and the distance changes in turn affect the interactions. Members of the same community gradually move together under such interplay, and nodes in different communities steadily keep far away from each other. BiAttractor [42] extends the concept of distance dynamics and the idea of Attractor to bipartite networks and is used to detect two-mode communities of bipartite networks.
Spectral methods use the eigenspectra of various network-associated matrices to extract communities. For example, Amini et al. [43] found the initial node partitions using the spectral clustering method based on the normalized Laplacian matrix derived from a regularized adjacency matrix; those partitions were used for fitting a stochastic block model by a pseudolikelihood algorithm to detect the resulting community structure. Siemon C. de Lange et al. [44] identified an integrative community structure in the macroscopic anatomical neural networks of the macaque and cat and the microscopic network of C. elegans by examining the spectra of their normalized Laplacian matrices. Krzakala et al. [45] produced a class of spectral algorithms to detect communities based on the nonbacktracking matrix, which depicts a nonbacktracking walk on the directed edges of the network. Shi et al. [46] proposed a spectral community detection method, LLSA, which employs the Lanczos method to obtain the approximated eigenvector of the transition matrix with the largest eigenvalue; the elements of this eigenvector approximately indicate the affiliation probability of the corresponding nodes to the communities.
Most of the methods mentioned above are global ones: they often depend on some global information, such as the number of communities or information about eigenvalues or eigenvectors, as prior knowledge, but such information is hard to acquire as the networks involved get larger and larger. Moreover, most of them are computationally demanding, leading to high time complexity. These limitations prevent them from being applied to large-scale applications. To overcome the deficiencies of the global algorithms, many local methods have been proposed, including some of the aforementioned methods. For example, LPA and most of its variations determine which label should be adopted by a node according to its neighborhood only. LCCD takes into account both the local density of nodes and the relative distance between nodes to locate the local structural centers and expands communities from the structural centers with a local search procedure. LLSA applies fast heat kernel diffusion to sample a small subnetwork including almost all members of a community, and the eigenvector whose elements suggest the community memberships of nodes is obtained by performing the Lanczos method on the sampled subnetwork.
Besides, the ComSim algorithm [47] identifies cores of communities from bipartite networks by seeking cycles, which are node chains formed by following outgoing links and reaching a node already visited, and then allocates the remaining nodes to the communities that maximize the similarity between the node and the community. In the BLI algorithm [48], local clustering information and local structural similarity are employed to establish the primary community structure; then some small-scale communities whose sizes are smaller than a given threshold $\lambda$ are absorbed by some larger ones. kSIM [49] is also a local method that works in a bottom-up way. At the beginning, each node is taken as a community; then the preliminary communities are formed by identifying, for each node, the neighbor community to which the one of its $k$ most similar neighbors with the lowest degree belongs and assigning the node to that community. In this procedure, the common neighbor index is employed as the similarity measure for each pair of nodes.
Input: $G(V, E)$: the network; $\delta$: the community metric threshold
Output: $\mathit{CS}$: the detected community structure
  /* form the preliminary community structure $\mathit{CS}_{\mathit{pre}}$ */
1 $\mathit{CS}_{\mathit{pre}} \leftarrow$ FPC($G$)
  /* merge small or sparse communities in $\mathit{CS}_{\mathit{pre}}$ */
2 $\mathit{CS} \leftarrow$ PCM($\mathit{CS}_{\mathit{pre}}$, $\delta$)
3 return $\mathit{CS}$
Algorithm 1: The framework of our proposed method NSA.
Compared to the global methods, these local methods show good performance in large-scale networks. Inspired by this, we also propose a local method to extract communities from networks. The proposed method is based on node similarity and is termed NSA (Node Similarity based Algorithm) for short. It comprises two phases: the first phase aims at constructing the preliminary community structure, and the second phase tries to improve the quality of the final result by merging some small or sparse communities. To do so, we also propose a measure, the community metric, to evaluate the sparsity or smallness of communities. The details of the proposed method are elaborated in the next section.
3. The Proposed Method
3.1. The Framework of the Proposed Method. The framework of the proposed method is outlined by the pseudocode listed in Algorithm 1.
As mentioned previously, the proposed method consists of two phases. The function calls FPC() and PCM() implement the two phases, respectively. The former establishes the preliminary community structure based on a node selection strategy and the node similarity; the latter merges some small or sparse communities to improve the quality of the resulting community structure. The inputs of this algorithm are the network and a threshold $\delta$. The network considered in this paper is an undirected and unweighted graph, represented as $G(V, E)$ as in Algorithm 1, where $V$ and $E$ are the node set and edge set, respectively, and $|V| = n$ and $|E| = m$ are the numbers of nodes and edges in the network, respectively. The threshold $\delta$ is used in the second phase of the proposed method to identify communities to be merged: a community whose community metric is smaller than $\delta$ should be merged into another one. The output of this algorithm is the detected community structure.
The next two subsections describe the two procedures in detail.
3.2. Formation of the Preliminary Community Structure. The function FPC() implements the first phase of the proposed method, whose purpose is to construct the preliminary community structure from the network. We first pick out the node with the largest degree from the network, take it as the exemplar of the first community, and insert its most similar neighbor into the community as well (if more than one node has the largest degree in the network, we arbitrarily select one of them as the exemplar, and if the exemplar has more than one most similar neighbor, the one with the smallest degree is selected). Afterwards, the next largest-degree node among the remaining nodes of the network is selected: if its most similar neighbor has not been classified into any community yet, we create a new community for it and its most similar neighbor; otherwise, if its most similar neighbor has been assigned to a certain community (e.g., the one denoted as $C_k$), we insert the selected node into that community (i.e., $C_k$) as well. We repeat this process until every node is classified into a community. In this procedure, densely connected nodes can quickly gather around the exemplars to form communities. At the end of this procedure, we get a series of communities, which constitute the preliminary community structure of the network. The pseudocode describing the entire procedure is listed in Algorithm 2.
In this algorithm, the degree of node $u$ is the number of $u$'s neighbors and is denoted as $d_u$, i.e.,
\[ d_u = |\Gamma(u)|, \tag{1} \]
where
\[ \Gamma(u) = \{ v \mid (u, v) \in E,\ v \in V \} \tag{2} \]
is the set of neighbors of node $u$. $\mathit{sim}(u, v)$ stands for the similarity between nodes $u$ and $v$. There are abundant ways to calculate the similarity between nodes in a network, and any one of them can be employed in principle. However, for the sake of efficiency, we calculate it as in the following equation, which involves only the neighborhoods of nodes $u$ and $v$ themselves:
\[ \mathit{sim}(u, v) = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}. \tag{3} \]
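As a minimal sketch of equation (3), the following Python snippet computes the similarity from adjacency sets. The helper names `build_adjacency` and `sim`, the toy edge list, and the convention of returning 0 when both neighborhoods are empty are illustrative assumptions, not part of the original method.

```python
def build_adjacency(edges):
    """Build {node: set(neighbors)} from a list of undirected edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def sim(adj, u, v):
    """Equation (3): size of the shared neighborhood over the joint neighborhood."""
    union = adj[u] | adj[v]
    if not union:
        return 0.0  # assumption: two isolated nodes have similarity 0
    return len(adj[u] & adj[v]) / len(union)


# Toy graph (hypothetical): a triangle 1-2-3 with a pendant node 4 attached to 3.
adj = build_adjacency([(1, 2), (1, 3), (2, 3), (3, 4)])
for pair in [(1, 2), (1, 3), (3, 4)]:
    print(pair, sim(adj, *pair))
```

Because $\Gamma(u)$ is the open neighborhood, two adjacent nodes each appear in the union but not in the intersection, so nodes 3 and 4 in the toy graph, which share no neighbor, get similarity 0 even though they are connected.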
The variables $U$ and $\mathit{CS}_{\mathit{pre}}$ are used to record the unclassified nodes and the preliminary community structure; they are initialized to the original node set $V$ of network $G$ and the empty set $\emptyset$, respectively, in step 1. Steps 2 and 3 select the node with the largest degree from the remainder of the network and its most similar neighbor and denote them as $v$ and $w$, respectively. Step 4 determines whether $w$ has been assigned to a community or not. If it has not been classified into any community yet, steps 5 and 6 create a new community for nodes $v$ and $w$ and insert the newly created community into $\mathit{CS}_{\mathit{pre}}$; then step 7 removes nodes $v$ and $w$ from $U$, as they have just been classified into the new community. If node $w$ has already been assigned to a community, step 9 finds the community $C_k$ to which $w$, the most similar neighbor of node $v$, belongs, and step 10 inserts node $v$ into community $C_k$. Since node $v$ has been assigned to community $C_k$, step 11 removes it from $U$. Step 12 repeats the operations in steps 2 through 11 until $U = \emptyset$, meaning that all the nodes in the network have been visited. At that time, the preliminary community structure is held in $\mathit{CS}_{\mathit{pre}}$ and is returned as the output of this algorithm in step 13.
Input: $G(V, E)$: the network
Output: $\mathit{CS}_{\mathit{pre}} = \{C_1, C_2, \cdots, C_k\}$: the identified preliminary community structure
1 Initialize variables $U$ and $\mathit{CS}_{\mathit{pre}}$, which are used to record the unclassified nodes and the preliminary community structure: $U \leftarrow V$, $\mathit{CS}_{\mathit{pre}} \leftarrow \emptyset$
2 Select the node with the largest degree, denote it as $v$: $v \leftarrow \arg\max_u \{ d_u \mid u \in U \}$
3 Get the most similar neighbor of $v$, denote it as $w$: $w \leftarrow \arg\max_u \{ \mathit{sim}(v, u) \mid u \in \Gamma(v) \}$
4 if $w$ has not been assigned to any community then
5     Create a new community for nodes $v$ and $w$: $K \leftarrow |\mathit{CS}_{\mathit{pre}}|$, $C_{K+1} \leftarrow \{v, w\}$
6     Insert the created community into the community structure: $\mathit{CS}_{\mathit{pre}} \leftarrow \mathit{CS}_{\mathit{pre}} \cup \{C_{K+1}\}$
7     Remove nodes $v$ and $w$ from $U$, as they are classified: $U \leftarrow U - \{v, w\}$
8 else
9     Find the community to which $w$ belongs, denote it as $C_k$: $k \leftarrow$ locate($\mathit{CS}_{\mathit{pre}}$, $w$)
10    Insert node $v$ into $C_k$: $C_k \leftarrow C_k \cup \{v\}$
11    Remove node $v$ from $U$, as it is classified: $U \leftarrow U - \{v\}$
12 Repeat steps 2 through 11 until $U = \emptyset$
13 return $\mathit{CS}_{\mathit{pre}}$
Algorithm 2: FPC($G$): forming the preliminary community structure.
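The steps of Algorithm 2 can be sketched in Python as follows. This is a hedged illustration rather than the authors' implementation: the function name `fpc`, the dictionary-of-sets input, breaking degree ties by node id (the paper allows an arbitrary choice), and turning isolated nodes into singleton communities are assumptions made for this example. The tie-break on the smallest degree among equally similar neighbors follows the description in Section 3.2.

```python
def fpc(adj):
    """Sketch of Algorithm 2 for an undirected graph given as {node: set(neighbors)}."""

    def sim(u, v):
        # Equation (3): shared neighbors over all neighbors of u and v.
        union = adj[u] | adj[v]
        return len(adj[u] & adj[v]) / len(union) if union else 0.0

    communities = []     # list of node sets (CS_pre)
    community_of = {}    # node -> index into communities

    # Degrees are fixed, so visiting nodes in decreasing-degree order and
    # skipping classified ones selects the largest-degree unclassified node.
    # Assumption: remaining degree ties are broken by node id.
    for v in sorted(adj, key=lambda node: (-len(adj[node]), node)):
        if v in community_of:
            continue
        if not adj[v]:
            # Assumption: an isolated node becomes a singleton community.
            community_of[v] = len(communities)
            communities.append({v})
            continue
        # Most similar neighbor; ties go to the smallest degree, then node id.
        w = min(adj[v], key=lambda node: (-sim(v, node), len(adj[node]), node))
        if w not in community_of:
            # Steps 5-7: v and w found a new community.
            community_of[v] = community_of[w] = len(communities)
            communities.append({v, w})
        else:
            # Steps 9-11: v joins the community of w.
            community_of[v] = community_of[w]
            communities[community_of[w]].add(v)
    return communities


# Toy graph (hypothetical): two triangles joined by the edge (3, 4).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(fpc(adj))
```

The sketch sorts the nodes once instead of maintaining a heap; this keeps the example short and gives the same visiting order because node degrees never change during the first phase.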
To make the procedure clearer, we take Zachary's karate club network [14] as an example. This is a network with 34 nodes and 78 edges, as shown in Figure 1(a), in which the node with the largest degree is node '34' and its most similar neighbor is node '33'. Therefore, node '34' is taken as the exemplar of the first community, and node '33' is also inserted into this community. Then the node with the largest degree among the remaining nodes is node '1', and its most similar neighbor is node '2'. Since node '2' has not been assigned to a community yet, we create a new community, take node '1' as its exemplar, and insert node '2' into the new community as well. The same thing happens to node pairs ('3', '4'), ('32', '29'), and ('9', '31') sequentially. Then the next largest-degree node is '14'; its most similar neighbor, node '4', is already in the third community; therefore, we insert node '14' into the third community. All of the other nodes are processed in the same way: in the subsequent operations, node pairs ('24', '30'), ('6', '7'), ('5', '11'), and ('25', '26') form new communities, and all of the remaining nodes are inserted into the communities to which their most similar neighbors belong. At the end of the process, we obtain the preliminary community structure shown in Figure 1(b), in which each node connects to its most similar neighbor with a directed edge.
3.3. Merge of Small or Sparse Communities. At the end of the first phase of our proposed method, we obtain the preliminary community structure. However, some communities are either too small or too sparse to make sense, such as the preliminary communities {'5', '11'}, {'9', '31'}, {'32', '29'}, {'25', '26', '28'}, {'24', '30', '27'}, and {'6', '7', '17'} in Figure 1(b). Each of them contains only a few nodes and very few inside edges: the number of edges inside each of them is much smaller than the number of edges connecting it to the outside, which violates the characteristic that connections inside a community are much denser than those across different communities. Keeping them in the final community structure would lead to low quality. Therefore, in the second phase, we merge some of the preliminary communities to acquire the final result, which is carried out by the function call PCM() in Algorithm 1.
To this end, two problems need to be solved in PCM(). The first one is to identify which communities are small or sparse enough that they need to be merged into other ones; the second one is to select the communities into which each of the small or sparse communities should be merged.
Figure 1: The procedure of FPC() on the karate club network: (a) the original network; (b) the preliminary community structure, in which each node connects to its most similar neighbor with a directed edge.
For the first problem, we propose an index, the community metric, which takes into account two factors, community size and community sparsity, to find the preliminary communities that need to be merged. Here we formalize the relevant concepts and the index as Definitions 1 through 3.
Definition 1 (community sparsity). The sparsity of community $C_i$ is defined as follows:
\[ \alpha_i = \frac{|E_i^{\mathit{in}}|}{|E_i^{\mathit{out}}|}, \tag{4} \]
where $E_i^{\mathit{in}}$ is the set of edges within community $C_i$ and $E_i^{\mathit{out}}$ is the set of edges connecting nodes in community $C_i$ with other communities.
That is to say, the sparsity of community $C_i$ is defined as the ratio of the number of inner edges of $C_i$ to the number of outer edges of $C_i$. Obviously, the more edges exist within community $C_i$, the larger the value of $\alpha_i$ will be, and vice versa.
Definition 2 (community scale). The scale of community $C_i$ is formalized as follows:
\[ \beta_i = \frac{|V_i|}{|V|}, \tag{5} \]
where $V_i$ is the set of nodes in community $C_i$.
Obviously, the scale of community $C_i$ is defined as the ratio of the number of nodes in $C_i$ to the total number of nodes in the network. The more nodes there are in community $C_i$, the larger the ratio will be, and vice versa.
Definition 3 (community metric). The community metric is a combination of both the community sparsity and the community scale, which is defined for community $C_i$ as follows:
\[ \gamma_i = \alpha_i \times \beta_i. \tag{6} \]
On the basis of these definitions, the first problem can be solved by setting a community metric threshold $\delta$. That is to say, if $\gamma_i < \delta$, community $C_i$ needs to be merged into another community.
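Definitions 1 through 3 translate directly into a few lines of Python. The sketch below is illustrative only: the function name `community_metric`, the dictionary-of-sets graph, and the convention of returning infinity for a community without outgoing edges (a case the definitions leave open) are assumptions made here.

```python
def community_metric(adj, community):
    """Return (alpha, beta, gamma) from Definitions 1-3 for a set of nodes."""
    members = set(community)
    inner_endpoints = 0   # every inner edge is seen once from each endpoint
    outer = 0             # edges leaving the community
    for u in members:
        for v in adj[u]:
            if v in members:
                inner_endpoints += 1
            else:
                outer += 1
    inner = inner_endpoints // 2
    # Assumption: with no outgoing edges the ratio in (4) is taken as infinity.
    alpha = inner / outer if outer else float("inf")
    beta = len(members) / len(adj)          # equation (5)
    return alpha, beta, alpha * beta        # equation (6)


# Toy graph (hypothetical): two triangles joined by the edge (3, 4).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
for c in ({1, 2, 3}, {1, 2}):
    print(sorted(c), community_metric(adj, c))
```

In the toy graph the full triangle {1, 2, 3} has three inner edges and one outer edge, whereas the pair {1, 2} has one inner edge and two outer edges, so the pair receives a much smaller metric and would be the first candidate for merging.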
For the second problem, we consider a strategy consistent with the construction of the preliminary communities. The preliminary communities are formed mainly on the basis of node similarity in the first phase; therefore, we also use similarity as the criterion for merging communities, i.e., each of the small or sparse communities is merged into its most similar adjacent community. Here the similarity between two communities $C_i$ and $C_j$ is calculated as follows:
\[ \mathit{Sim}(C_i, C_j) = \frac{\sum_{u \in C_i,\, v \in C_j} \mathit{sim}(u, v)}{|C_j|}, \tag{7} \]
where $\mathit{sim}(u, v)$ is the similarity between nodes $u \in C_i$ and $v \in C_j$, which is calculated using (3). In the function PCM() implementing the merge procedure, $C_i$ is a community that needs to be merged, and $C_j$ is one of its adjacent communities. The numerator on the right-hand side of (7) is the sum of the similarities between nodes in communities $C_i$ and $C_j$. Dividing by the denominator $|C_j|$ limits the priority given to larger communities, which prevents the formation of giant communities.
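Equation (7) can be illustrated with a short Python sketch. The function name `community_similarity` and the toy graph are assumptions for this example; the sum is taken over all node pairs $(u, v)$ with $u \in C_i$ and $v \in C_j$, which is how the subscript of (7) is read here.

```python
def community_similarity(adj, ci, cj):
    """Equation (7): summed node similarities between ci and cj, divided by |cj|."""

    def sim(u, v):
        # Equation (3) on adjacency sets.
        union = adj[u] | adj[v]
        return len(adj[u] & adj[v]) / len(union) if union else 0.0

    return sum(sim(u, v) for u in ci for v in cj) / len(cj)


# Toy graph (hypothetical): two triangles joined by the edge (3, 4).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(community_similarity(adj, {3}, {1, 2}))
print(community_similarity(adj, {3}, {4, 5, 6}))
```

Note that the measure is not symmetric: swapping the two arguments changes the denominator, so the community to be merged should be passed first and the candidate target second.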
The logic of the entire second phase is listed in Algorithm 3; the operations are almost self-explanatory. The variable $\mathit{CS}$ is used to record the final community structure; it is initialized as the preliminary community structure $\mathit{CS}_{\mathit{pre}}$ in step 1. Step 2 calculates the community metric for each of the preliminary communities. Steps 3 and 4 select the community with the smallest community metric and its most similar community, step 5 merges them to yield a new community, and step 6 calculates the community metric for that new community. Step 7 replaces the two communities $C_t$ and $C_j$ with the new community in $\mathit{CS}$ to reflect the effect of the merge operation. Step 8 repeats the operations in steps 3 through 7 until the minimal community metric, that of the selected community, is larger than the given threshold $\delta$, meaning that all the remaining communities are satisfactory; the merge procedure is then terminated, and the resulting community structure in $\mathit{CS}$ is returned in step 9.
Input: $CS_{pre}$, the preliminary community structure; $\delta$, the community-metric threshold
Output: $CS$, the final community structure
1. Initialize $CS$, which is used to record the community structure:
   $CS \leftarrow CS_{pre}$
2. Calculate the community metric for each of the preliminary communities:
   foreach $C_i \in CS$ do
     $\gamma_i \leftarrow \alpha_i \times \beta_i$
3. Select the community with the minimal community metric; denote its index as $t$:
   $t \leftarrow \arg\min_i \{\gamma_i \mid i = 1, 2, \cdots, |CS|\}$
4. Identify the community most similar to $C_t$; denote its index as $j$:
   $j \leftarrow \arg\max_i \{Sim(C_t, C_i) \mid i = 1, 2, \cdots, |CS|,\ i \neq t\}$
5. Merge communities $C_t$ and $C_j$ to form a new community:
   $k \leftarrow |CS|$; $C_{k+1} \leftarrow C_t \cup C_j$
6. Calculate the community metric for the new community:
   $\gamma_{k+1} \leftarrow \alpha_{k+1} \times \beta_{k+1}$
7. Replace the two communities $C_t$ and $C_j$ with the new community to reflect the merging effect:
   $CS = (CS - \{C_t, C_j\}) \cup \{C_{k+1}\}$
8. Repeat steps 3 through 7 until $\gamma_t > \delta$
9. return $CS$
Algorithm 3: PCM($CS_{pre}$, $\delta$): merge small or sparse communities.
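The merge loop of Algorithm 3 can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: the community metric $\gamma = \alpha \times \beta$ is defined outside this section, so it is supplied as a hypothetical callable `metric`; the node similarity of (3) is likewise supplied as `sim`. The sketch recomputes all metrics in every round instead of caching them, and it tests the threshold before merging, so no merge happens when every community already exceeds $\delta$.

```python
from typing import Callable, Hashable, Iterable, List, Set

Node = Hashable


def pcm(
    cs_pre: Iterable[Iterable[Node]],
    delta: float,
    metric: Callable[[Set[Node]], float],
    sim: Callable[[Node, Node], float],
) -> List[Set[Node]]:
    """Merge small or sparse communities until the smallest metric exceeds delta."""
    cs = [set(c) for c in cs_pre]                      # step 1
    while len(cs) > 1:
        gammas = [metric(c) for c in cs]               # steps 2 and 6 (recomputed)
        t = min(range(len(cs)), key=gammas.__getitem__)  # step 3
        if gammas[t] > delta:                          # stop test of step 8
            break

        def sim_to(i: int) -> float:
            # Eq. (7): Sim(C_t, C_i), normalised by |C_i|
            return sum(sim(u, v) for u in cs[t] for v in cs[i]) / len(cs[i])

        j = max((i for i in range(len(cs)) if i != t), key=sim_to)  # step 4
        merged = cs[t] | cs[j]                         # step 5
        # step 7: drop C_t and C_j, append the merged community
        cs = [c for i, c in enumerate(cs) if i not in (t, j)] + [merged]
    return cs
```

With a toy metric such as `len(c) / 10` and $\delta = 0.15$, a singleton community is absorbed by whichever community it is most similar to, and the loop then stops because every remaining metric exceeds the threshold.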
3.4. Time Complexity. The proposed algorithm comprises two phases. The first phase forms the preliminary communities. The main time consumption in this phase lies in selecting the node with the largest degree (step 2 in Algorithm 2) and its most similar neighbor (step 3 in Algorithm 2); the former can be accomplished in $O(\log n)$ per iteration using a max-heap data structure, and the latter can be done in $O(\log\langle d\rangle)$ with a max-heap, where $\langle d\rangle$ is the average degree of nodes in the network. Since $\langle d\rangle \ll n$, the time consumption of the first phase is $O(n \log n)$.
The second phase improves the quality of the resulting community structure by merging some of the small or sparse communities. Most of the time is spent on determining, in each iteration, the community that needs to be merged and its most similar adjacent community. Assuming there are $K$ communities in the preliminary community structure, the former operation can be implemented in $O(\log K)$, and the latter can also be carried out in $O(\log K)$ time in the worst case. Hence, the second phase can be implemented in $O(K \log K)$ time.
Since $K \ll n$, we have $K \log K \ll n \log n$, so the first phase dominates the running time. Therefore, the proposed method detects communities from networks with relatively high efficiency, namely $O(n \log n)$ time complexity.
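To make the heap-based selection concrete, the following Python sketch builds the preliminary communities with `heapq`. It follows the first phase as summarized in the abstract rather than Algorithm 2 line by line, and it rests on several assumptions: the Jaccard index of closed neighborhoods is only a stand-in for the node similarity of (3), the most similar neighbor is found by a linear scan instead of a per-node heap, and the names `jaccard_closed` and `preliminary_communities` are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Set


def jaccard_closed(adj: Dict[int, Set[int]]) -> Callable[[int, int], float]:
    """Stand-in similarity (NOT Eq. (3)): Jaccard index of closed neighborhoods."""
    def sim(u: int, v: int) -> float:
        nu, nv = adj[u] | {u}, adj[v] | {v}
        return len(nu & nv) / len(nu | nv)
    return sim


def preliminary_communities(
    adj: Dict[int, Set[int]], sim: Callable[[int, int], float]
) -> List[Set[int]]:
    # heapq is a min-heap, so degrees are negated to pop the largest first.
    heap = [(-len(nbrs), node) for node, nbrs in adj.items()]
    heapq.heapify(heap)
    label: Dict[int, int] = {}
    communities: List[Set[int]] = []
    while heap:
        _, u = heapq.heappop(heap)          # O(log n) per extraction
        if u in label:
            continue                        # already placed together with a neighbor
        if not adj[u]:                      # isolated node: its own community
            label[u] = len(communities)
            communities.append({u})
            continue
        # most similar neighbor; sorted() makes tie-breaking deterministic
        v = max(sorted(adj[u]), key=lambda w: sim(u, w))
        if v in label:                      # join the neighbor's community
            label[u] = label[v]
            communities[label[v]].add(u)
        else:                               # start a new community {u, v}
            label[u] = label[v] = len(communities)
            communities.append({u, v})
    return communities
```

Each node is pushed once and popped once, so the heap operations contribute $O(n \log n)$ in total, which matches the bound stated for the first phase.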
4. Experimental Results and Discussion
4.1. Network Datasets and Comparison System. To test the performance of our proposed method, we conducted extensive experiments on several groups of artificial networks and several real-world networks. The artificial networks are synthesized using the LFR benchmark network generator [50], whose parameters control the characteristics of the generated networks. Here we consider the influence of both the network scale and the community size; therefore, four types of networks are generated, namely small networks with small communities and with big communities, and larger networks with small communities and with big communities. The small and larger networks contain 1000 and 5000 nodes, respectively. The small communities contain at least 10 and at most 50 nodes; the big communities contain at least 20 and at most 100 nodes. The generated networks with small communities and big communities are marked with the suffixes 's' and 'b', respectively. The exponents of the power-law distributions that the node degree and community size follow are the default values, −2 and −1, respectively. The parameters used to synthesize the four groups of artificial networks are listed in Table 1.
We also performed experiments on 13 real-world networks, whose sizes span from tens to hundreds of thousands of nodes; information about them is listed in Table 2. These real-world networks can be divided into two categories: the first category includes the first four networks, whose ground-truth communities are known a priori; the second contains the other nine networks, which have no publicly acknowledged ground-truth community structures.
On these networks, we ran our proposed method to detect community structures and compared the results with those of 5 popular community detection algorithms, namely FastQ [24], WalkTrap [38], LPA [28], Attractor [41], and IsoFdp [36], which have already been introduced in Section 2. Since LPA is a nondeterministic algorithm, we ran it on each network 10 times and took the average of the evaluation metrics as its resulting metric value for that network. For our proposed method, NSA, we empirically set $\delta = 0.13$ for the dolphin social network and $\delta = 0.1$ for the other networks in the experiments. The details of how to set the optimal value of $\delta$ are discussed in Section 5.
4.2. Evaluation Metrics. Two indexes, namely NMI (Normalized Mutual Information) [51] and modularity [7], are
Table 1: The parameters used to generate the LFR networks. In the header row, $n$ is the number of nodes contained in the network; $\langle d\rangle$ and $d_{max}$ are the average degree and the maximum degree, respectively; $\exp_d$ and $\exp_{com}$ are the exponents of the power-law distributions that the node degree and community size follow; $\min(|C_i|)$ and $\max(|C_i|)$ are the minimal and maximal numbers of nodes contained in a community, respectively.

| Network | $n$ | $\langle d\rangle$ | $d_{max}$ | $\exp_d$ | $\exp_{com}$ | $\min(|C_i|)$ | $\max(|C_i|)$ |
|---|---|---|---|---|---|---|---|
| LFR1000s | 1000 | 20 | 50 | -2 | -1 | 10 | 50 |
| LFR1000b | 1000 | 20 | 50 | -2 | -1 | 20 | 100 |
| LFR5000s | 5000 | 20 | 50 | -2 | -1 | 10 | 50 |
| LFR5000b | 5000 | 20 | 50 | -2 | -1 | 20 | 100 |
Table 2: The information about the real-world networks. $n$ and $m$ are the numbers of nodes and edges in the network, respectively.

| Network | $n$ | $m$ |
|---|---|---|
| Karate club [14] | 34 | 78 |
| Dolphin social network [15] | 62 | 159 |
| Risk map [16] | 42 | 83 |
| Scientists collaboration network [6] | 118 | 197 |
| Lesmis [17] | 77 | 254 |
| Polbooks [3] | 105 | 441 |
| ColiNeta [18] | 423 | 519 |
| NetScience [10] | 1589 | 2742 |
| Email [19] | 1133 | 5451 |
| YeastL [20] | 2361 | 7182 |
| PGP [21] | 10680 | 24316 |
| DBLP [22] | 317080 | 1049866 |
| Amazon [22] | 334863 | 925872 |
adopted as the metrics to evaluate the quality of the detected community structure in this paper. The NMI between the ground-truth community structure $P = \{P_1, P_2, \ldots, P_K\}$ and the extracted one $P' = \{P'_1, P'_2, \ldots, P'_{K'}\}$ is calculated as follows:
$$\mathrm{NMI}(P, P') = \frac{-2 \sum_{i=1}^{|P|} \sum_{j=1}^{|P'|} n_{ij} \log\left(\dfrac{n_{ij} \cdot n}{n_i^{P} \cdot n_j^{P'}}\right)}{\sum_{i=1}^{|P|} n_i^{P} \log\left(\dfrac{n_i^{P}}{n}\right) + \sum_{j=1}^{|P'|} n_j^{P'} \log\left(\dfrac{n_j^{P'}}{n}\right)} \tag{8}$$
where $n_i^{P} = |P_i|$, $n_j^{P'} = |P'_j|$, $n_{ij} = |P_i \cap P'_j|$, and $n$ is the number of nodes in the network. The NMI is an information-theoretic metric that measures how closely the detected community structure agrees with the ground truth. Therefore, it can only be used to evaluate the quality of the detected community structure on networks whose ground-truth community structure is already known. Its value lies in the range $[0, 1]$; larger is better.
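Equation (8) translates almost directly into code. The sketch below is an illustration only: it assumes both partitions are given as dictionaries mapping each node to a community label over the same node set, and it returns 1.0 by convention when both partitions consist of a single community (the denominator of (8) is zero in that case). The function name `nmi` is a hypothetical choice.

```python
import math
from collections import Counter
from typing import Dict, Hashable


def nmi(truth: Dict[Hashable, Hashable], found: Dict[Hashable, Hashable]) -> float:
    """Normalized mutual information of Eq. (8) for two partitions of the same nodes."""
    nodes = list(truth)
    n = len(nodes)
    n_p = Counter(truth[x] for x in nodes)               # n_i^P  = |P_i|
    n_q = Counter(found[x] for x in nodes)               # n_j^P' = |P'_j|
    n_pq = Counter((truth[x], found[x]) for x in nodes)  # n_ij   = |P_i ∩ P'_j|
    # Only non-empty intersections are stored, so log(0) never occurs.
    numerator = -2.0 * sum(
        c * math.log(c * n / (n_p[i] * n_q[j])) for (i, j), c in n_pq.items()
    )
    denominator = sum(c * math.log(c / n) for c in n_p.values()) + sum(
        c * math.log(c / n) for c in n_q.values()
    )
    if denominator == 0:   # both partitions are one single community
        return 1.0
    return numerator / denominator
```

Identical partitions give a value of 1 regardless of how the labels are named, and two statistically independent partitions give 0.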
Another metric widely used to evaluate the performance of community detection methods is modularity [7], which is defined as follows:
$$Q = \sum_i \left(e_{ii} - a_i^2\right) \tag{9}$$
where $e_{ii}$ is a diagonal element of a $K \times K$ matrix $e$ whose element $e_{ij}$ is the fraction of all edges in the network that connect nodes in community $C_i$ to nodes in community $C_j$, $K$ is the number of communities in the community structure, and $a_i$ is the fraction of edges associated with nodes in community $C_i$.
The first term, $\sum_i e_{ii}$, on the right side of (9) is the fraction of edges within communities; the second term, $\sum_i a_i^2$, is the expected value of the same fraction in a random graph that has the same nodes and degree distribution as the original network but whose edges are placed randomly. The smaller the difference between the two terms, the more the network resembles a random graph and the weaker the community structure is. Conversely, the larger the difference, the further the network departs from the random graph and the stronger the community structure is. That is, modularity measures the quality of a community structure by how far the detected result deviates from a random network; its effective value falls in $[0, 1]$, and higher is better.
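A small Python sketch of (9) for an undirected, unweighted edge list is given below. It assumes that $a_i$ is computed as the fraction of edge ends attached to community $C_i$, i.e., the total degree of the community divided by $2m$; the function name `modularity` and the input layout are hypothetical choices for illustration.

```python
from typing import Dict, Hashable, Iterable, Tuple


def modularity(
    edges: Iterable[Tuple[Hashable, Hashable]],
    community_of: Dict[Hashable, Hashable],
) -> float:
    """Q = sum_i (e_ii - a_i^2) for an undirected, unweighted graph."""
    edges = list(edges)
    m = len(edges)
    intra: Dict[Hashable, int] = {}      # number of edges inside each community
    degree: Dict[Hashable, int] = {}     # number of edge ends in each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            intra[cu] = intra.get(cu, 0) + 1
    # e_ii = intra / m, a_i = degree / (2m)
    return sum(intra.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())
```

For two triangles joined by a single edge and split into the two triangles, $e_{ii} = 3/7$ and $a_i = 1/2$ for each community, which gives $Q = 2(3/7 - 1/4) = 5/14$; placing all nodes in one community gives $Q = 0$.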
4.3. Synthetic Networks. We carried out experiments on four groups of artificial networks to test the performance of the proposed method. As mentioned above, all four types of artificial networks are synthesized using the LFR benchmark generator software [50]. Besides the parameters listed in Table 1, another critical parameter of this software is the mixing parameter $\mu$, which regulates, for each node, the ratio of its edges that connect to nodes in other communities. The smaller the value of $\mu$, the clearer the community structure. Evidently, $\mu = 0.5$ is a transition point above which communities in networks tend to be obscure.
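To illustrate what $\mu$ measures, the sketch below computes, for every node of a given graph, the fraction of its edges that lead to other communities. It is not part of the LFR generator; the adjacency-dictionary input and the name `mixing_fractions` are assumptions made for this example.

```python
from typing import Dict, Hashable, Set


def mixing_fractions(
    adj: Dict[Hashable, Set[Hashable]],
    community_of: Dict[Hashable, Hashable],
) -> Dict[Hashable, float]:
    """Per-node share of edges that connect to nodes in other communities."""
    fractions = {}
    for u, nbrs in adj.items():
        if not nbrs:
            continue  # isolated nodes have no edges to classify
        external = sum(1 for v in nbrs if community_of[v] != community_of[u])
        fractions[u] = external / len(nbrs)
    return fractions
```

In an LFR network generated with mixing parameter $\mu$, these per-node fractions are expected to be close to $\mu$, so values near or above 0.5 mean that a node has about as many edges leaving its community as staying inside it.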
[Figure 2 plots: panels (a) and (b); vertical axis NMI (0.0 to 1.0), horizontal axis from 0.1 to 0.8; curves for FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposal (NSA).]
Figure 2: Comparison of different community-detection algorithms on LFR benchmark networks containing 1000 nodes. (a) The results detected from small networks with small-sized communities. (b) The results identified from small networks with big-sized communities.
In our experiments, we varied the value of $\mu$ from 0.1 to 0.8 in increments of 0.1 for each group of LFR networks. To reduce the effect of chance, we generated 10 networks for each value of $\mu$ while keeping the other parameters fixed. Since the community structures are already embedded in these synthetic networks, we use NMI as the metric to evaluate the performance of our proposed method and the comparison algorithms. We took these networks as input one by one, ran our proposed method and the comparison algorithms to detect communities, and used the average NMI as the resulting metric. The results detected by our proposal and the comparison algorithms on the small networks with small-sized and big-sized communities are illustrated in Figures 2(a) and 2(b), respectively; the results on the larger networks with small-sized and big-sized communities are presented in Figures 3(a) and 3(b), respectively.
In Figures 2(a) and 2(b), FastQ tends to introduce mistakes in the results regardless of whether the communities in the networks are well separated or obscure. As mentioned previously, FastQ is a typical modularity-optimization-based algorithm; it aims only at acquiring results with larger modularity rather than high accuracy. In our experiments, none of the results uncovered by it is satisfactory. Even in the networks with $\mu = 0.1$, it failed to identify the exact communities; furthermore, its performance is the worst among the comparison algorithms for $\mu \leq 0.5$. For $\mu > 0.5$, the quality of its results is better only than that of LPA. LPA performed as well as the other comparison algorithms on the networks with $\mu < 0.5$, but its performance dropped dramatically for $\mu \geq 0.5$, and it could not detect effective communities at all for $\mu > 0.6$. This might be due to its label-update mechanism: when the community boundaries become obscure, nodes tend to adopt incorrect labels, which leads to trivial results in which even all nodes are labeled as members of one giant community. The proposed method, NSA, acquired NMI = 1 on all networks with $\mu < 0.5$, meaning that the detected partitions perfectly match the ground-truth community structures of these networks. For $\mu = 0.5$, NSA obtained results as good as those of WalkTrap, Attractor, and IsoFdp. For $\mu > 0.5$, the quality of the detected community structures declined for all three of those algorithms and for the proposed method. For $0.5 < \mu \leq 0.6$, the quality of our proposal is better than that of Attractor on networks with larger communities, and for $\mu \geq 0.7$ the performance of our proposed method is the best.
In Figures 3(a) and 3(b), we obtained results similar to those in Figure 2 overall, but they differ in some respects. In Figure 3(a), our proposed method performed the best on almost all networks. For $0.5 < \mu < 0.7$ in Figure 2, the NMI of the results extracted by our proposed method is lower than those of WalkTrap and IsoFdp; in Figure 3, however, the proposed method performed better than IsoFdp for $\mu > 0.5$. These results suggest that the performance of the comparison algorithms is not stable across different networks, whereas our proposed method can steadily extract high-quality community structures from networks with different characteristics. This is also evident from the fact that the curves of the proposed method in these figures decline more slowly than the others. Moreover, comparing the proposal's own curves across these figures, we conclude that our proposed method tends to perform better on larger networks with small communities; therefore, it overcomes the resolution-limit problem to some extent.
4.4. Real-World Networks. We also carried out experiments on 13 real-world networks to further test the effectiveness and efficiency of our proposed method. As mentioned in Section 4.1, these networks fall into two categories: ones with
[Figure 3 plots: panels (a) and (b); vertical axis NMI (0.0 to 1.0), horizontal axis from 0.1 to 0.8; curves for FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposal (NSA).]
Figure 3: Comparison of different community detection algorithms on LFR benchmark networks containing 5000 nodes. (a) The results extracted from the larger networks with small-sized communities. (b) The results revealed from the larger networks with big-sized communities.
[Figure 4 drawings: panels (a) and (b), each showing the karate club network with nodes labeled 1 to 34.]
Figure 4: The karate club network. (a) The ground-truth community structure. (b) The community structure detected by our proposed method NSA. (The nodes in different communities are plotted in different colors and shapes; this illustration style is also applied in the subsequent figures.)
the ground-truth community structure known a priori, and the other ones without publicly acknowledged ground truth.
Networks with Ground-Truth Community Structure. This category includes the first 4 networks listed in Table 2. Since their ground-truth community structure is already known, we measure the quality of the community structures identified by the proposed method and the comparison algorithms in terms of both NMI and modularity. The values of the two metrics obtained by the proposed method and the comparison algorithms are recorded in Table 3. These networks are relatively small, which makes it easy to visualize the detected results. Below, we analyze the results extracted by the proposed method from these networks individually.
The Karate Club Network. This network depicts the friendships among members of a karate club; it contains 34 nodes and 78 edges. It was compiled by Wayne W. Zachary, who observed the karate club for 3 years. During the period of Zachary's study, the club split into two factions because of a dispute between the administrator and the instructor. Corresponding to the two factions, the partition into two communities is generally taken as the ground truth, which is shown in Figure 4(a). The result detected by our proposed method is presented in Figure 4(b).
From Figure 4, we can see that our proposed method detected 3 rather than 2 communities in the network. The detected result seems to deviate from the ground truth in some ways, but it coincides with the conclusion
[Figure 5 drawings: panels (a) and (b), each showing the dolphin social network with nodes labeled by dolphin names.]
Figure 5: The dolphin social network. (a) The ground-truth community structure. (b) The community structure identified by our proposed method NSA.
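Table 3: The experimental results on networks with ground-truth community structures. The largest values of the two metrics are typed in bold.

| Network | Metric | FastQ | WalkTrap | LPA | Attractor | IsoFdp | NSA |
|---|---|---|---|---|---|---|---|
| Karate | $Q$ | 0.381 | 0.353 | 0.355 | 0.371 | 0.371 | **0.402** |
| Karate | NMI | 0.693 | 0.504 | 0.62 | 0.924 | **1.00** | 0.699 |
| Dolphin | $Q$ | 0.492 | 0.489 | 0.464 | 0.45 | 0.505 | **0.513** |
| Dolphin | NMI | 0.719 | 0.632 | 0.719 | 0.69 | 0.744 | **0.887** |
| Risk map | $Q$ | **0.625** | 0.624 | 0.59 | 0.598 | 0.519 | 0.624 |
| Risk map | NMI | **0.894** | 0.848 | 0.821 | 0.839 | 0.714 | 0.848 |
| Scientists | $Q$ | **0.749** | 0.733 | 0.64 | 0.694 | 0.668 | 0.744 |
| Scientists | NMI | 0.867 | 0.818 | 0.743 | 0.835 | 0.823 | **0.878** |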
found in the experiments on synthetic networks, namely that our proposed method tends to find small communities in networks and thus overcomes the resolution-limit problem. Moreover, from the perspective of the metrics, the modularity of the detected result is the largest among those of the comparison algorithms. Although our proposed method is not based on modularity optimization, it tends to acquire the community structure with as large a modularity as possible: if its modularity is not the largest, it is the second largest, with a small gap to the largest. These findings are also borne out on the following networks.
Lusseau's Dolphin Social Network. This network describes the interactions of a group of dolphins living in Doubtful Sound, New Zealand. It consists of 62 nodes and 159 edges, which represent individual dolphins and the observed co-occurrences of pairs of dolphins, respectively. This network is generally partitioned into 4 groups as the ground-truth community structure, which is exhibited in Figure 5(a). Figure 5(b) shows the community structure uncovered by our proposed method.
In Figure 5, our proposed method detected communities in this network with a high degree of success: it also identified 4 communities, the vast majority of nodes are classified into the correct communities, and the result closely approaches the ground-truth community structure. Quantitatively, both the NMI and the modularity of the result detected by the proposed method on this network are the largest among those of the comparison algorithms, which means that the community structure identified by the proposed method is better than those of the comparison algorithms.
Risk Map Network. This network is the world political map used in the popular game Risk (https://en.wikipedia.org/wiki/Risk_(game)), in which 42 countries or territories on 6 continents are involved. Accordingly, the 42 nodes and the 83 edges connecting adjacent countries or territories are organized into 6 communities as the ground truth, which is illustrated in Figure 6(a). Feeding this network into the proposed method, we obtained the community structure shown in Figure 6(b).
Comparing the detected result with the ground-truth community structure, the community containing nodes '18' and '23' in the ground truth is split into two small communities in Figure 6(b), owing to the tendency of the proposed method to find small communities. In addition, nodes '26', '33', and '34' are classified into communities that differ from the ground truth in the detected result. However, nodes '12', '16', '26', '33', and '34' are special ones in this network: the outer edges associated with them are no fewer than, or
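Table 4: The experimental results of modularity on networks without ground-truth community structures. The largest value for each network is typed in bold.

| Network | FastQ | WalkTrap | LPA | Attractor | IsoFdp | NSA |
|---|---|---|---|---|---|---|
| Lesmis | 0.499 | 0.519 | 0.515 | 0.498 | 0.491 | **0.54** |
| Polbooks | 0.502 | 0.507 | 0.508 | 0.501 | 0.518 | **0.524** |
| ColiNeta | **0.779** | 0.746 | 0.693 | 0.718 | - | 0.761 |
| Email | 0.499 | 0.531 | 0.379 | 0.464 | 0.531 | **0.544** |
| NetScience | 0.955 | 0.956 | 0.896 | 0.937 | - | **0.957** |
| YeastL | 0.573 | 0.529 | 0.372 | 0.511 | - | **0.574** |
| PGP | 0.85 | 0.789 | 0.765 | 0.768 | 0.726 | **0.867** |
| DBLP | 0.735 | - | 0.652 | 0.637 | - | **0.782** |
| Amazon | 0.869 | - | 0.743 | 0.741 | - | **0.898** |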
[Figure 6 drawings: panels (a) and (b), each showing the Risk map network with nodes labeled 1 to 42.]
Figure 6: Risk map network. (a) The ground-truth community structure. (b) The community structure uncovered by our proposed method NSA.
even more than those within the communities to which these nodes belong. Therefore, if we ignore what these nodes actually represent and consider only the topology qualitatively, the community structure extracted by our proposed method is more rational than the ground truth: more of the edges associated with these three nodes are located within their communities than in the ground truth, so these three nodes are connected more tightly to nodes within the same community in Figure 6(b). Quantitatively, both metric values of our proposed method are second only to those of FastQ and are the same as those of WalkTrap. These results also confirm that our proposed method provides an acceptable solution to the problem of community detection.
Scientists Collaboration Network. This is the largest connected component of a network delineating the coauthorship relations among scientists working at the Santa Fe Institute, New Mexico. Nodes in this network represent scientists, and an edge connects two scientists who have collaborated on at least one paper. There are 118 nodes and 197 edges in total in this network. The nodes can be divided into 6 groups as the ground-truth communities according to the specialties of the scientists, as presented in Figure 7(a). Taking this network as the input to the proposed method, we obtained the community structure illustrated in Figure 7(b).
The proposed method revealed 8 communities in this network; that is, two additional communities are detected in Figure 7(b). These two communities are relatively independent components; especially in the community containing node '1', there are many more inner edges than outer edges. That is, nodes in these two communities are connected more tightly to one another than to the remainder of the network. Therefore, isolating them from the network and taking them as independent communities is also reasonable. From the perspective of the metrics, the NMI obtained by the proposed method is the largest, which suggests that the result detected by our proposal is the one closest to the ground-truth community structure; the modularity of the proposed method is not the largest, but it is second only to that of FastQ. These results also show that our proposed method can extract high-quality community structures from networks.
Networks without Ground-Truth Community Structure. This category contains the last 9 real-world networks listed in Table 2. For the experiments carried out on this category of networks, we evaluate the quality of the extracted community structures using modularity only, owing to the absence of ground-truth community structures. The modularity values obtained by the proposed method and the comparison algorithms are recorded in Table 4. To illustrate them
[Figure 7 drawings: panels (a) and (b), each showing the scientists collaboration network with nodes labeled 1 to 118.]
Figure 7: The collaboration network of scientists working at the Santa Fe Institute. (a) The ground-truth community structure. (b) The community structure detected by our proposed NSA algorithm.
[Figure 8 bar chart: horizontal axis Networks (Lesmis, Polbooks, ColiNeta, Email, NetScience, YeastL, PGP, DBLP, Amazon); vertical axis Modularity (Q), 0.0 to 1.0; bars for FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposal (NSA).]
Figure 8: The bar chart of the modularity obtained by the comparison algorithms and the proposed method NSA.
intuitively, we also plotted them in a bar chart, which is presented in Figure 8.
On these networks, our proposed method achieved the largest modularity on 8 of them. On the remaining network, ColiNeta, it still obtained the second largest modularity. Although FastQ is based on the modularity optimization strategy, it acquired the largest modularity only on the ColiNeta network. WalkTrap is an approach based on random walks, so its time complexity is relatively high; it could not produce effective results on the Amazon and DBLP networks because of their large scale. LPA and Attractor can extract community structures from all of these networks, but the quality of the detected results is not satisfactory. IsoFdp can only be applied to connected networks and cannot run on the ColiNeta, NetScience, and YeastL networks, as these three networks are disconnected. It could not detect the community structure of the Amazon and DBLP networks effectively either, because of their large scale. These comparison results show that our proposed method can steadily, effectively, and efficiently provide promising solutions to the problem of community detection in networks from a wide range of applications and that it outperforms the comparison algorithms significantly.
[Figure 9 plots: modularity $Q$ versus $\delta$ (0.00 to 0.30) in four panels: (a) the karate club network, (b) the dolphin social network, (c) the risk map network, (d) the scientists collaboration network.]
Figure 9: The setting of parameter $\delta$.
5. Parameter Setting
In the second phase of the proposed method, we introduce a threshold $\delta$ on the community metric to identify the preliminary communities that need to be merged. As mentioned above, we calculate the community metric $\gamma_i = \alpha_i \times \beta_i$ for every preliminary community $C_i$ in the merge procedure; if the value of $\gamma_i$ is below the threshold $\delta$, the corresponding community $C_i$ is identified as one that needs to be merged.
Therefore, $\delta$ works as a parameter of our proposed method, and its setting can influence the quality of the resulting community structure. Qualitatively, the larger or the sparser the network is, the smaller the threshold $\delta$ should be, in accordance with the definitions of community sparsity ($\alpha_i$), community scale ($\beta_i$), and community metric ($\gamma_i$). To determine the optimal value of $\delta$, we conducted a group of experiments to explore the relationship between the value of $\delta$ and the quality of the resulting community structure on the first four networks listed in Table 2, namely the karate club network, the dolphin social network, the map of the game Risk, and the scientists collaboration network. The quality of the resulting community structure is measured in terms of modularity $Q$. We varied the value of $\delta$ from 0 to 1.0 in steps of 0.005; for each value of $\delta$, we ran our proposed method on these networks and observed how the modularity changes as $\delta$ varies.
The observed results are illustrated in Figure 9, in which we plot only the portion $\delta \in [0, 0.3]$ because the largest modularities are obtained for $\delta \leq 0.3$ on all four networks. Our proposed method attains the largest modularity at $\delta = 0.13$ on the dolphin social network and at $\delta = 0.1$ on the other three networks. Therefore, we adopted the corresponding values for those four networks and empirically set $\delta = 0.1$ for the other networks in the experiments. In Figure 9, the largest modularity is obtained around $\delta = 0.1$, and the interval $[0.05, 0.2]$ covers the optimal value of $\delta$. Therefore, we empirically suggest that $\delta$ be adjusted adaptively around 0.1 within the range $[0.05, 0.2]$ according to the size and the sparsity of the networks involved in real-world applications.
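The threshold search described in this section amounts to a one-dimensional grid search, which can be sketched as follows. The callables `detect` (run the method with a given $\delta$ and return a community structure) and `quality` (score a community structure, e.g., by modularity) are hypothetical placeholders, and the function name `sweep_delta` is likewise an assumption; the default grid mirrors the range and step quoted above.

```python
from typing import Any, Callable, Tuple


def sweep_delta(
    detect: Callable[[float], Any],
    quality: Callable[[Any], float],
    start: float = 0.0,
    stop: float = 1.0,
    step: float = 0.005,
) -> Tuple[float, float]:
    """Return (best_delta, best_quality) over an evenly spaced grid of thresholds."""
    n_steps = round((stop - start) / step)
    best_delta, best_q = start, float("-inf")
    for k in range(n_steps + 1):
        # Build each grid point from the index to avoid accumulating float error.
        delta = round(start + k * step, 10)
        q = quality(detect(delta))
        if q > best_q:                      # keep the first maximiser on ties
            best_delta, best_q = delta, q
    return best_delta, best_q
```

In practice `detect` would wrap the full two-phase method on a fixed network; restricting `start` and `stop` to the suggested interval $[0.05, 0.2]$ shortens the search considerably.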
6. Conclusion
In this paper, we presented a novel method for detecting communities in networks. It is a local method based on node similarity, which avoids the high time consumption of global methods. First, we construct the preliminary community structure by repeatedly selecting the node with the largest degree and, depending on the community assignment of its most similar neighbor, either taking it as the exemplar of a new community or inserting it into the community to which that neighbor belongs: if its most similar neighbor has not been assigned to any community yet, we create a new community for it and its most similar neighbor; if its most similar neighbor has already been assigned to a community, we insert it into that community as well. At the end of this process, we obtain a series of preliminary communities. However, some of them might be too small or too sparse, leading to a low-quality result. Therefore, we merge some of the preliminary communities to acquire the final community structure. To do so, we also proposed indexes that take both the size and the sparsity of communities into account to determine which communities should be merged.
To test the performance of the proposed method, we performed extensive experiments on four groups of synthetic networks and 13 real-world networks and compared the detected community structures with the results extracted by the comparison algorithms in terms of NMI and modularity. The comparison results demonstrate that our proposed method can extract high-quality community structures from networks abstracted from various applications and that nodes in the extracted communities are connected more tightly. The proposed method overcomes the resolution-limit problem to some extent and outperforms the competitors.
Data Availability
We conducted experiments on artificial networks and real-world datasets. The artificial networks are synthesized using the LFR benchmark network generator, which is freely available at https://sites.google.com/site/santofortunato. The parameters used to synthesize the artificial networks are listed in Table 1. The real-world data supporting this study are from previously reported studies, which are cited in Table 2. Most of the real-world datasets can also be downloaded from http://www-personal.umich.edu/~mejn/netdata and https://snap.stanford.edu/data/index.html. The ColiNeta dataset was provided by Jeong et al. [18]. We constructed the Risk Map network manually according to the literature [16].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant ID 61602225).
References
[1] J. Kleinberg and S. Lawrence, "Network analysis: The structure of the web," Science, vol. 294, no. 5548, pp. 1849–1850, 2001.
[2] P. Chen and S. Redner, "Community structure of the physical review citation network," Journal of Informetrics, vol. 4, no. 3, pp. 278–290, 2010.
[3] M. E. J. Newman, "Modularity and community structure in networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 23, pp. 8577–8582, 2006.
[4] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabasi, "Hierarchical organization of modularity in metabolic networks," Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[5] R. Guimera and L. A. N. Amaral, "Functional cartography of complex metabolic networks," Nature, vol. 433, no. 7028, pp. 895–900, 2005.
[6] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.
[7] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 2, Article ID 026113, 2004.
[8] P. M. Gleiser and L. Danon, "Community structure in jazz," Advances in Complex Systems (ACS), vol. 6, no. 4, pp. 565–573, 2003.
[9] Y. van Gennip, B. Hunter, R. Ahn et al., "Community detection using spectral clustering on sparse geosocial data," SIAM Journal on Applied Mathematics, vol. 73, no. 1, pp. 67–83, 2013.
[10] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[11] S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
[12] S. Fortunato and D. Hric, "Community detection in networks: a user guide," Physics Reports, vol. 659, pp. 1–44, 2016.
[13] B. W. Kernighan and S. Lin, "An efficient heuristic procedure for partitioning graphs," Bell Labs Technical Journal, vol. 49, no. 1, pp. 291–307, 1970.
[14] W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[15] D. Lusseau, "The emergent properties of a dolphin social network," in Proceedings of the Royal Society of London B: Biological Sciences, vol. 270, supplement 2, pp. S186–S188, 2003.
[16] K. Steinhaeuser and N. V. Chawla, "Identifying and evaluating community structure in complex networks," Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[17] M. E. J. Newman, "The structure and function of complex networks," SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
[18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabasi, "The large-scale organization of metabolic networks," Nature, vol. 407, no. 6804, pp. 651–654, 2000.
[19] R. Guimera, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, "Self-similar community structure in a network of human interactions," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[20] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network motifs: simple building blocks of complex networks," Science, vol. 298, no. 5594, pp. 824–827, 2002.
[21] M. Boguna, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, "Models of social networks based on social distance attachment," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 5, Article ID 056122, 2004.
[22] J. Yang and J. Leskovec, "Defining and evaluating network communities based on ground-truth," Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.
[23] M. E. J. Newman, "Fast algorithm for detecting community structure in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 6, Article ID 066133, 2004.
[24] A. Clauset, M. E. J. Newman, and C. Moore, "Finding community structure in very large networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 6, Article ID 066111, 2004.
[25] F. Dabaghi Zarandi and M. Kuchaki Rafsanjani, "Community detection in complex networks using structural similarity," Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 882–891, 2018.
[26] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, Article ID P10008, 2008.
[27] L. Waltman and N. J. Van Eck, "A smart local moving algorithm for large-scale modularity-based community detection," The European Physical Journal B, vol. 86, no. 11, article 471, pp. 1–14, 2013.
[28] U. N. Raghavan, R. Albert, and S. Kumara, "Near linear time algorithm to detect community structures in large-scale networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 76, no. 3, Article ID 036106, 2007.
[29] M. J. Barber and J. W. Clark, "Detecting network communities by propagating labels under constraints," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 2, Article ID 026129, 2009.
[30] J. Hou Chin and K. Ratnavelu, "A semi-synchronous label propagation algorithm with constraints for community detection in complex networks," Scientific Reports, vol. 7, Article ID 45836, 2017.
[31] J. Ding, X. He, J. Yuan, Y. Chen, and B. Jiang, "Community detection by propagating the label of center," Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 675–686, 2018.
[32] A. Laio and A. Rodriguez, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[33] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger, "SCAN: A structural clustering algorithm for networks," in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), pp. 824–833, ACM, New York, NY, USA, August 2007.
[34] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD'96), pp. 226–231, AAAI Press, 1996.
[35] H. Shiokawa, Y. Fujiwara, and M. Onizuka, "Scan++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs," in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1178–1189, Republic of Korea, September 2006.
[36] T. You, H.-M. Cheng, Y.-Z. Ning, B.-C. Shia, and Z.-Y. Zhang, "Community detection in complex networks using density-based clustering algorithm and manifold learning," Physica A: Statistical Mechanics and its Applications, vol. 464, pp. 221–230, 2016.
[37] X. Wang, G. Liu, J. Li, and J. P. Nees, "Locating structural centers: A density-based clustering method for community detection," PLoS ONE, vol. 12, no. 1, Article ID e0169355, 2017.
[38] P. Pons and M. Latapy, "Computing communities in large networks using random walks," in International symposium on computer and information sciences, pp. 284–293, 2005.
[39] S. A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, and M. A. Tavallaie, "Personalized PageRank clustering: a graph clustering algorithm based on random walks," Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772–5785, 2013.
[40] Y. Su, B. Wang, and X. Zhang, "A seed-expanding method based on random walks for community detection in networks with ambiguous community structures," Scientific Reports, vol. 7, Article ID 41830, 2017.
[41] J. Shao, Z. Han, Q. Yang, and T. Zhou, "Community detection based on distance dynamics," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1075–1084, ACM, Australia, August 2015.
[42] H.-L. Sun, E. Ch'ng, X. Yong, J. M. Garibaldi, S. See, and D.-B. Chen, "A fast community detection method in bipartite networks by distance dynamics," Physica A: Statistical Mechanics and its Applications, vol. 496, pp. 108–120, 2018.
[43] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, "Pseudo-likelihood methods for community detection in large sparse networks," The Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[44] S. C. de Lange, M. A. de Reus, and M. P. van den Heuvel, "The laplacian spectrum of neural networks," Frontiers in Computational Neuroscience, vol. 7, no. 189, 2014.
[45] F. Krzakala, C. Moore, E. Mossel et al., "Spectral redemption in clustering sparse networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 52, pp. 20935–20940, 2013.
[46] P. Shi, K. He, D. Bindel, and J. E. Hopcroft, "Local Lanczos Spectral Approximation for Community Detection," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 10534 of Lecture Notes in Computer Science, pp. 651–667, Springer International Publishing, 2017.
[47] R. Tackx, F. Tarissan, and J. Guillaume, "ComSim: a bipartite community detection algorithm using cycle and node's similarity," in International Workshop on Complex Networks and their Applications, vol. 689 of Studies in Computational Intelligence, pp. 278–289, Springer International Publishing, 2017.
[48] T. Wang, L. Yin, and X. Wang, "A community detection method based on local similarity and degree clustering information," Physica A: Statistical Mechanics and its Applications, vol. 490, pp. 1344–1354, 2018.
[49] K. R. Zalik, "Maximal neighbor similarity reveals real communities in networks," Scientific Reports, vol. 5, Article ID 18374, 2015.
[50] A. Lancichinetti, S. Fortunato, and F. Radicchi, "Benchmark graphs for testing community detection algorithms," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 78, no. 4, Article ID 046110, 2008.
[51] L. Ana and A. Jain, "Robust data clustering," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-128–II-133, Madison, WI, USA, 2003.
Therefore, analyzing the community structures in networks can facilitate the recognition of network characteristics and support further predictions about the functional properties of the corresponding systems. That is to say, community detection provides an effective means of studying the functional properties of networks by examining their structural characteristics, which is valuable in practical applications. Consequently, a multitude of methods [11, 12] have been proposed for detecting communities in complex networks; we review some related literature in Section 2.
In this paper, we also propose a community detection method, which is based on node similarity and consists of two phases. The first phase repeatedly selects the node with the largest degree in the remainder of the network and, according to the community affiliation of its most similar neighbor, either takes it as the exemplar of a new community or inserts it into the community to which that neighbor belongs. At the end of this phase, we get a series of communities. However, they are only preliminary communities: some of them might be too small or too sparse, and the edges connecting them to the outside might far outnumber the edges inside them. Accepting them as the final ones would lead to a low-quality community structure. Therefore, the second phase merges some of the preliminary communities to improve the quality of the resulting community structure.
The main contributions of this work can be summarized as follows:
(i) We propose a node similarity based local algorithm, abbreviated as NSA, for community detection. It is a two-phase method: the first phase obtains the preliminary communities, and the second phase merges some of them to improve the quality of the resulting community structure.
(ii) We propose an index, the community metric, to measure the sparsity or smallness of a community. In the second phase, we use this index as the criterion to determine which preliminary communities need to be merged.
(iii) Extensive experiments on artificial networks and real-world networks are carried out to test the performance of the proposed method. The experimental results show that the performance and the time complexity of the proposed method are consistently promising and that the method outperforms its competitors.
The remainder of this paper is organized as follows. Section 2 reviews some literature on community detection. The details of the proposed algorithm are elaborated in Section 3. The experimental results and analysis on both artificial and real-world networks are presented in Section 4. In Section 5, we discuss how to set the optimal value for a parameter introduced in our proposed method, and the paper ends with a conclusion in Section 6.
2. Related Work
A great many community detection methods have been proposed in the last decade; these methods try to explore communities in networks from various perspectives. The graph theory-based methods treat community detection as the traditional task of graph partitioning and divide the network into subnetworks. Kernighan-Lin [13] is a representative method of this kind: it first partitions the network into two arbitrary subnetworks and then repeatedly swaps nodes between the two subnetworks to maximize a predefined gain function.
The hierarchical clustering methods reveal multilevel community structures in divisive, agglomerative, or hybrid ways. For example, the GN algorithm [6, 7] detects communities by repeatedly removing the edge with the largest betweenness from the network. Its output is a dendrogram representing the nested hierarchy of possible community structures of the network, and the level corresponding to the largest value of a measure called modularity [7] is taken as the final result. The FastQ algorithm [23, 24] first takes each node in the network as a community and then repeatedly merges two communities into one. Its output is also a dendrogram, depicting the merge procedure of possible community hierarchies. Zarandi et al. [25] randomly removed some edges with low similarity to obtain disconnected components as the primary communities and then merged some of them to get the resulting community structure.
The modularity optimization-based algorithms detect community structures by exploiting the physical meaning of modularity (the higher the value of modularity, the better the community structure) and taking modularity as the objective to optimize. For instance, to maximize the modularity of the community structure, FastQ [23, 24] joins, in each iteration, the pair of communities whose merge leads to the largest modularity increment. The Louvain algorithm [26] uses a node-moving strategy to extract a community structure with optimized modularity. It also begins with an initial partition in which each node is a community; then, for each node, the algorithm evaluates the modularity gain of moving it into the community of each of its neighbors and moves the node into the community with the largest positive modularity gain. The SLM (Smart Local Moving) algorithm [27] searches for possibilities of increasing modularity by both splitting communities and moving sets of nodes from one community to another.
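The objective these algorithms optimize can be made concrete with a small sketch. It assumes the standard Newman-Girvan modularity from [7], $Q = \sum_c \left[ l_c/m - (d_c/2m)^2 \right]$, where $l_c$ is the number of edges inside community $c$, $d_c$ is the sum of the degrees of its nodes, and $m$ is the number of edges. The function name, the adjacency-set representation, and the toy graph are illustrative assumptions, not part of the paper.

```python
from typing import Dict, Hashable, Iterable, Set

Graph = Dict[Hashable, Set[Hashable]]  # adjacency sets; undirected, no self-loops


def modularity(adj: Graph, communities: Iterable[Iterable[Hashable]]) -> float:
    """Q = sum over communities of (l_c / m) - (d_c / (2 m)) ** 2."""
    m = sum(len(nbrs) for nbrs in adj.values()) / 2  # each edge is stored twice
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    q = 0.0
    for community in communities:
        members = set(community)
        # l_c: edges with both endpoints inside the community
        l_c = sum(1 for u in members for v in adj[u] if v in members) / 2
        # d_c: sum of the degrees of the community's nodes
        d_c = sum(len(adj[u]) for u in members)
        q += l_c / m - (d_c / (2 * m)) ** 2
    return q


# Hypothetical toy graph: two triangles joined by the single edge 3-4.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
split = modularity(toy, [{1, 2, 3}, {4, 5, 6}])  # one community per triangle
lumped = modularity(toy, [{1, 2, 3, 4, 5, 6}])   # everything in one community
```

On this toy graph the two-triangle partition scores higher than the single-community partition, which is the kind of comparison a merge or node move is judged by in the methods above.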
LPA (Label Propagation Algorithm) [28] uses an information propagation mechanism to detect communities in networks. Every node in the network is initialized with a unique label, and all nodes are first arranged in a random order; then each node, in that order, updates its label to the one that occurs most frequently among its neighbors. The label update procedure ends when every node has a label that is the majority label among its neighbors, and nodes with the same label form a community. Owing to its simplicity and high efficiency, several variants have been derived from LPA. Barber et al. [29] proposed a series of algorithms that propagate labels under some constraints; LPAm is the most famous one, which tries to maximize the modularity during the label propagation procedure. Chin et al. [30] first identified the main communities using the number of mutual neighboring nodes; then they attached some independent constraints to the basic LPA and used the constrained LPA to add the remaining nodes into communities; finally, they used a node-moving strategy like the one employed in Louvain to refine the quality of the resulting community structure. Ding et al. [31] presented a modified version of LPA, which exploits the idea of density peak clustering [32] and the Chebyshev inequality to choose community centers from the network and then propagates the labels of the selected centers to the whole network with the proposed multistrategy of label propagation.
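The basic LPA update rule described at the start of this paragraph can be sketched as follows. This is a minimal illustration rather than the implementation of [28]: the function name, the seed and sweep-limit parameters, the reshuffling of the node order on every sweep, and the rule that a node keeps its label when that label is already among the most frequent ones are all assumptions made for the sketch.

```python
import random
from collections import Counter
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # adjacency sets of an undirected graph


def label_propagation(adj: Graph, seed: int = 0, max_sweeps: int = 100) -> List[Set[Hashable]]:
    rng = random.Random(seed)
    labels = {u: u for u in adj}  # every node starts with its own unique label
    nodes = list(adj)
    for _ in range(max_sweeps):
        rng.shuffle(nodes)  # random visiting order (reshuffled per sweep here)
        changed = False
        for u in nodes:
            if not adj[u]:
                continue  # an isolated node keeps its own label
            counts = Counter(labels[v] for v in adj[u])
            best = max(counts.values())
            candidates = [lab for lab, c in counts.items() if c == best]
            # Keep the current label if it is already a majority label;
            # otherwise adopt one of the most frequent labels at random.
            if labels[u] not in candidates:
                labels[u] = rng.choice(candidates)
                changed = True
        if not changed:  # every node holds a majority label among its neighbors
            break
    groups: Dict[Hashable, Set[Hashable]] = {}
    for u, lab in labels.items():
        groups.setdefault(lab, set()).add(u)
    return list(groups.values())


# Hypothetical toy graph: two disconnected triangles.
two_triangles = {1: {2, 3}, 2: {1, 3}, 3: {1, 2}, 4: {5, 6}, 5: {4, 6}, 6: {4, 5}}
found = label_propagation(two_triangles)
```

Each update looks only at a node's direct neighbors, which is why LPA is counted among the local methods later in this section.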
Density-based methods define and utilize the concept of density for nodes or communities in networks to uncover community structures. SCAN [33] borrows the idea of the classical density-based clustering algorithm DBSCAN [34] to reveal communities, hubs, and outliers in networks. SCAN++ [35] is a derivative of SCAN; it reduces time consumption by introducing a new data structure and reducing the number of density evaluations in the detection procedure. IsoFdp [36] maps the network nodes as data points into a low-dimensional manifold and then exploits the density peak clustering algorithm [32] to extract the final community structure. The LCCD algorithm [37] also follows the approach of the density peak clustering algorithm [32] to locate the structural centers of networks and then expands communities from the identified centers to the borders using a local search procedure.
Network dynamic-based methods explore community structures by simulating dynamic processes in networks. The random walk is a typical dynamic process carried out in networks; random walk-based methods exploit the tendency of a walker to be trapped inside a community during a short walk, rather than crossing the community border into another community, to detect communities. WalkTrap [38] uses random walks to calculate the probability of going from one node to another during a short walk and then calculates a distance to measure the similarities between nodes and between communities. The PPC algorithm [39] initially considers the network as a single community and recursively partitions each community using node similarities computed with random walks, until further partitioning cannot yield a better value of modularity. RWA [40] employs random walks to calculate the probability of a node belonging to a community, and each community is expanded by repeatedly attracting the node that is most likely to belong to it. Besides, Attractor [41] utilizes distance dynamics to explore communities in networks: node interactions might change the distances among nodes, and the distance changes in turn affect the interactions. Under such interplay, members of the same community gradually move together, while nodes in different communities keep far away from each other. BiAttractor [42] extends the concept of distance dynamics and the idea of Attractor to bipartite networks, where it is used to detect two-mode communities.
Spectral methods use the eigenspectra of various network-associated matrices to extract communities. For example, Amini et al. [43] found the initial node partitions using spectral clustering based on the normalized Laplacian matrix derived from a regularized adjacency matrix; those partitions were then used to fit a stochastic block model by a pseudolikelihood algorithm to detect the resulting community structure. De Lange et al. [44] identified an integrative community structure in the macroscopic anatomical neural networks of the macaque and cat and in the microscopic network of C. elegans by examining the spectra of their normalized Laplacian matrices. Krzakala et al. [45] produced a class of spectral algorithms that detect communities based on the nonbacktracking matrix, which describes a nonbacktracking walk on the directed edges of the network. Shi et al. [46] proposed a spectral community detection method, LLSA, which employs the Lanczos method to obtain an approximation of the eigenvector of the transition matrix with the largest eigenvalue; the elements of this eigenvector approximately indicate the affiliation probability of the corresponding nodes to the communities.
Most of the methods mentioned above are global ones: they often depend on global information as prior knowledge, such as the number of communities or information about eigenvalues or eigenvectors, which is hard to acquire as the networks involved grow larger and larger. Moreover, most of them are computationally demanding, leading to high time complexity. These limitations prevent them from being applied to large-scale applications. To overcome the deficiencies of the global algorithms, many local methods have been proposed, including some of the aforementioned ones. For example, LPA and most of its variants determine which label a node should adopt according to its neighborhood only. LCCD takes into account both the local density of nodes and the relative distance between nodes to locate the local structural centers and expands communities from them with a local search procedure. LLSA applies a fast heat kernel diffusion to sample a small subnetwork that includes almost all members of a community, and the eigenvector whose elements indicate the community memberships of nodes is obtained by performing the Lanczos method on the sampled subnetwork.
Besides, the ComSim algorithm [47] identifies cores of communities in bipartite networks by seeking cycles, which are node chains formed by following outgoing links until reaching a node already visited, and then allocates each remaining node to the community that maximizes the similarity between the node and the community. In the BLI algorithm [48], local clustering information and local structural similarity are employed to establish the primary community structure; then, small-scale communities whose sizes are smaller than a given threshold $\lambda$ are absorbed by larger ones. kSIM [49] is also a local method that works in a bottom-up way. At the beginning, each node is taken as a community; then, the preliminary communities are formed by identifying, for each node, the neighbor community to which the one of its $k$ most similar neighbors with the lowest degree belongs, and assigning the node to that community. In this procedure, the common neighbor index is employed as the similarity measure for each pair of nodes.

Input: $G(V, E)$, the network; $\delta$, the community metric threshold
Output: $CS$, the detected community structure
   /* form the preliminary community structure $CS_{pre}$ */
1: $CS_{pre} \leftarrow$ FPC($G$)
   /* merge small or sparse communities in $CS_{pre}$ */
2: $CS \leftarrow$ PCM($CS_{pre}$, $\delta$)
3: return $CS$

Algorithm 1: The framework of our proposed method NSA.
Compared to the global methods, these local methods show good performance on large-scale networks. Inspired by this, we also propose a local method to extract communities from networks. The proposed method is based on node similarity and is termed NSA (Node Similarity based Algorithm) for short. It comprises two phases: the first phase constructs the preliminary community structure, and the second phase improves the quality of the final result by merging some small or sparse communities. To do so, we also propose a measure, the community metric, to evaluate the sparsity or smallness of communities. The details of the proposed method are elaborated in the next section.
3. The Proposed Method
3.1. The Framework of the Proposed Method. The framework of the proposed method is outlined by the pseudocode listed in Algorithm 1.
As mentioned previously, the proposed method consists of two phases. The function calls FPC() and PCM() implement the two phases, respectively. The former establishes the preliminary community structure based on a node selection strategy and the node similarity; the latter merges some small or sparse communities to improve the quality of the resulting community structure. The inputs of this algorithm are the network and a threshold $\delta$. The networks considered in this paper are undirected and unweighted graphs, represented as $G(V, E)$ as in Algorithm 1, where $V$ and $E$ are the node set and the edge set, respectively, and $|V| = n$ and $|E| = m$ are the numbers of nodes and edges in the network. The threshold $\delta$ is used in the second phase of the proposed method to identify the communities to be merged: a community whose community metric is smaller than $\delta$ should be merged into another one. The output of this algorithm is the detected community structure.
The next two subsections describe the two procedures in detail.
3.2. Formation of the Preliminary Community Structure. The function FPC() implements the first phase of the proposed method, whose purpose is to construct the preliminary community structure of the network. We first pick out the node with the largest degree in the network, take it as the exemplar of the first community, and insert its most similar neighbor into the community as well. (If more than one node has the largest degree, we arbitrarily select one of them as the exemplar; if the exemplar has more than one most similar neighbor, the one with the smallest degree is selected.) Afterwards, the node with the largest degree in the remainder of the network is selected. If its most similar neighbor has not been classified into any community yet, we create a new community for it and its most similar neighbor. Otherwise, if its most similar neighbor has been assigned to a certain community (e.g., the one denoted as $C_k$), we insert the selected node into that community (i.e., $C_k$) as well. We repeat this process until every node is classified into a community. In this procedure, densely connected nodes can quickly gather around the exemplars to form communities. At the end of this procedure, we get a series of communities, which constitute the preliminary community structure of the network. The pseudocode describing the entire procedure is listed in Algorithm 2.
In this algorithm, the degree of node $u$ is the number of $u$'s neighbors and is denoted as $d_u$, i.e.,

$$d_u = |\Gamma(u)|, \tag{1}$$

where

$$\Gamma(u) = \{v \mid (u, v) \in E,\ v \in V\} \tag{2}$$

is the set of neighbors of node $u$. $sim(u, v)$ stands for the similarity between nodes $u$ and $v$. There are abundant ways to calculate the similarity between nodes in a network, and any of them can be employed in principle. However, for the sake of efficiency, we calculate it with the following equation, which involves only the neighborhoods of nodes $u$ and $v$ themselves:

$$sim(u, v) = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}. \tag{3}$$
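Equations (1) and (3) translate directly into a few lines of code. The sketch below assumes the network is stored as a dictionary of adjacency sets; the names `Graph`, `degree`, and `sim`, the zero returned for two isolated nodes, and the toy graph are illustrative choices, not taken from the paper.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # adjacency sets of an undirected, unweighted graph


def degree(adj: Graph, u: Hashable) -> int:
    """Equation (1): d_u = |Gamma(u)|."""
    return len(adj[u])


def sim(adj: Graph, u: Hashable, v: Hashable) -> float:
    """Equation (3): shared neighbors divided by the union of both neighborhoods."""
    union = adj[u] | adj[v]
    if not union:  # both nodes isolated; returning 0.0 is our convention, not the paper's
        return 0.0
    return len(adj[u] & adj[v]) / len(union)


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
```

Here `sim(toy, "a", "b")` is 1/3, since the two nodes share the neighbor c out of the three nodes in the union of their neighborhoods, while `sim(toy, "c", "d")` is 0. Because $\Gamma(u)$ does not contain $u$ itself, a pendant node has zero similarity to its only neighbor under Equation (3) as written.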
The variables $U$ and $CS_{pre}$ are used to record the unclassified nodes and the preliminary community structure; they are initialized to the original node set $V$ of network $G$ and the empty set $\emptyset$, respectively, in step 1. Steps 2 and 3 select the node with the largest degree from the remainder of the network and its most similar neighbor and denote them as $v$ and $w$, respectively. Step 4 determines whether $w$ has been assigned to a community. If it has not been classified into any community yet, steps 5 and 6 create a new community for nodes $v$ and $w$ and insert the newly created community into $CS_{pre}$; then step 7 removes nodes $v$ and $w$ from $U$, as they have just been classified into the new community. If node $w$ has already been assigned to a community, step 9 finds the community $C_k$ to which $w$ belongs, and step 10 inserts node $v$ into $C_k$. Since node $v$ has now been assigned to community $C_k$, step 11 removes it from $U$. Step 12 repeats the operations in steps 2 through 11 until $U = \emptyset$, meaning that all the nodes in the network have been visited. At that time, the preliminary community structure is stored in $CS_{pre}$ and is returned as the output of the algorithm in step 13.

Input: $G(V, E)$, the network
Output: $CS_{pre} = \{C_1, C_2, \cdots, C_k\}$, the identified preliminary community structure
1: Initialize variables $U$ and $CS_{pre}$, which are used to record the unclassified nodes and the preliminary community structure
      $U \leftarrow V$; $CS_{pre} \leftarrow \emptyset$
2: Select the node with the largest degree, denote it as $v$
      $v \leftarrow \arg\max_u \{d_u \mid u \in U\}$
3: Get the most similar neighbor of $v$, denote it as $w$
      $w \leftarrow \arg\max_u \{sim(v, u) \mid u \in \Gamma(v)\}$
4: if $w$ has not been assigned to any community then
5:    Create a new community for nodes $v$ and $w$
         $K \leftarrow |CS_{pre}|$; $C_{K+1} \leftarrow \{v, w\}$
6:    Insert the created community into the community structure
         $CS_{pre} \leftarrow CS_{pre} \cup \{C_{K+1}\}$
7:    Remove nodes $v$ and $w$ from $U$, as they are classified
         $U \leftarrow U - \{v, w\}$
8: else
9:    Find the community to which $w$ belongs, denote it as $C_k$
         $k \leftarrow$ locate($CS_{pre}$, $w$)
10:   Insert node $v$ into $C_k$
         $C_k \leftarrow C_k \cup \{v\}$
11:   Remove node $v$ from $U$, as it is classified
         $U \leftarrow U - \{v\}$
12: Repeat steps 2 through 11 until $U = \emptyset$
13: return $CS_{pre}$

Algorithm 2: FPC($G$), forming the preliminary community structure.
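The steps of Algorithm 2 can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation: the adjacency-set representation, the names `fpc` and `member_of`, the treatment of an isolated node as a singleton community, and the final string-order tie-break are assumptions. Visiting nodes once in descending degree order and skipping those already classified is used here as an equivalent of repeatedly taking the largest-degree node of $U$, since degrees do not change during the procedure.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]  # adjacency sets of an undirected, unweighted graph


def sim(adj: Graph, u: Hashable, v: Hashable) -> float:
    """Equation (3): neighborhood overlap of u and v."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def fpc(adj: Graph) -> List[Set[Hashable]]:
    communities: List[Set[Hashable]] = []  # CS_pre
    member_of: Dict[Hashable, int] = {}    # node -> index of its community
    # Step 2, repeated: largest degree first (degree ties keep input order).
    for v in sorted(adj, key=lambda u: -len(adj[u])):
        if v in member_of:  # already removed from U in an earlier iteration
            continue
        if not adj[v]:      # assumption: an isolated node forms its own community
            member_of[v] = len(communities)
            communities.append({v})
            continue
        # Step 3: most similar neighbor; ties go to the smaller degree,
        # then to string order (the last tie-break is our assumption).
        w = max(sorted(adj[v], key=str),
                key=lambda u: (sim(adj, v, u), -len(adj[u])))
        if w not in member_of:  # steps 5-7: new community {v, w}
            member_of[v] = member_of[w] = len(communities)
            communities.append({v, w})
        else:                   # steps 9-11: join w's community
            k = member_of[w]
            communities[k].add(v)
            member_of[v] = k
    return communities


# Hypothetical toy graph: two triangles joined by the single edge 3-4.
toy = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
preliminary = fpc(toy)
```

On this toy graph, nodes 3 and 4 are selected first and each opens a community with a neighbor from its own triangle; the remaining nodes then join the community of their most similar neighbor, yielding one community per triangle.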
To make it clearer, we take Zachary's karate club network [14] as an example to illustrate the procedure intuitively. This network has 34 nodes and 78 edges, as shown in Figure 1(a). The node with the largest degree is node '34', and its most similar neighbor is node '33'. Therefore, node '34' is taken as the exemplar of the first community, and node '33' is also inserted into this community. The node with the largest degree among the remaining nodes is node '1', whose most similar neighbor is node '2'. Since node '2' has not been assigned to a community yet, we create a new community, take node '1' as its exemplar, and insert node '2' into the new community as well. The same thing happens to node pairs ('3', '4'), ('32', '29'), and ('9', '31') in sequence. The next largest-degree node is '14'; its most similar neighbor, node '4', is already in the third community, so we insert node '14' into the third community. All of the other nodes are processed in the same way: in the subsequent operations, node pairs ('24', '30'), ('6', '7'), ('5', '11'), and ('25', '26') form new communities, and all of the remaining nodes are inserted into the communities to which their most similar neighbors belong. At the end of the process, we obtain the preliminary community structure shown in Figure 1(b), in which each node connects to its most similar neighbor with a directed edge.
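The first phase described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the node similarity of Eq. (3) is not reproduced here, so a closed-neighborhood Jaccard index (`jaccard`) stands in for it as an assumption, and the tie-breaking by node id and the handling of isolated nodes are likewise assumptions.

```python
def jaccard(adj, u, v):
    """Stand-in similarity (closed-neighborhood Jaccard); NOT the paper's Eq. (3)."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def fpc(adj, sim=jaccard):
    """Form preliminary communities; adj maps each node to the set of its neighbors."""
    unvisited = set(adj)
    label = {}         # node -> index of its community
    communities = []   # list of node sets
    while unvisited:
        # Pick the unvisited node with the largest degree (ties broken by node id).
        v = max(unvisited, key=lambda x: (len(adj[x]), x))
        unvisited.discard(v)
        if not adj[v]:
            # Assumption: an isolated node forms a community of its own.
            label[v] = len(communities)
            communities.append({v})
            continue
        # Its most similar neighbor (ties broken by node id).
        w = max(adj[v], key=lambda x: (sim(adj, v, x), x))
        if w in label:
            # w already belongs to a community: v joins it.
            label[v] = label[w]
            communities[label[w]].add(v)
        else:
            # Otherwise v and w found a new community together.
            label[v] = label[w] = len(communities)
            communities.append({v, w})
            unvisited.discard(w)
    return communities


# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

preliminary = fpc(adj)
```

On this toy graph the sketch recovers the two triangles; on real networks the result depends on which similarity index is plugged in.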
3.3. Merge of Small or Sparse Communities. At the end of the first phase of our proposed method, we obtain the preliminary community structure. However, some communities are either too small or too sparse to make sense, such as the preliminary communities {'5', '11'}, {'9', '31'}, {'32', '29'}, {'25', '26', '28'}, {'24', '30', '27'}, and {'6', '7', '17'} in Figure 1(b). Each of them contains only a few nodes and very few inside edges: the number of edges inside each of them is much smaller than the number of edges connecting it to the outside, which violates the characteristic that connections inside a community are much denser than those across different communities. Keeping them in the final community structure would lead to low quality. Therefore, in the second phase we merge some of the preliminary communities to acquire the final result, which is carried out by the function call PCM() in Algorithm 1.
To this end, two problems need to be solved in PCM(). The first is to identify which communities are small or sparse enough that they need to be merged into other ones; the second is to select the community into which each of the small or sparse communities should be merged.
Figure 1: The procedure of FPC() on the karate club network. (a) The original network. (b) The preliminary community structure, in which each node connects to its most similar neighbor with a directed edge.
For the first problem, we propose an index, the community metric, which takes into account two factors, community size and community sparsity, to find the preliminary communities that need to be merged. Here we formalize the relevant concepts and the index in Definitions 1 through 3.
Definition 1 (community sparsity). The sparsity of community $C_i$ is defined as follows:
\[ \alpha_i = \frac{|E_i^{in}|}{|E_i^{out}|}, \tag{4} \]
where $E_i^{in}$ is the set of edges within community $C_i$ and $E_i^{out}$ is the set of edges connecting nodes in community $C_i$ with other communities.
That is to say, the sparsity of community $C_i$ is defined as the ratio of the number of inner edges of $C_i$ to the number of outer edges of $C_i$. Obviously, the more edges exist within community $C_i$, the larger the value of $\alpha_i$ will be, and vice versa; a small $\alpha_i$ therefore indicates a sparse community.
Definition 2 (community scale). The scale of community $C_i$ is formalized as follows:
\[ \beta_i = \frac{|V_i|}{|V|}, \tag{5} \]
where $V_i$ is the set of nodes in community $C_i$ and $V$ is the set of nodes in the network.
That is, the scale of community $C_i$ is the ratio of the number of nodes in $C_i$ to the total number of nodes in the network. The more nodes there are in community $C_i$, the larger the ratio will be, and vice versa.
Definition 3 (community metric). The community metric is a combination of the community sparsity and the community scale, which is defined for community $C_i$ as follows:
\[ \gamma_i = \alpha_i \times \beta_i. \tag{6} \]
On the basis of these definitions, the first problem can be solved by setting a community metric threshold $\delta$. That is to say, if $\gamma_i < \delta$, community $C_i$ needs to be merged into another community.
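Definitions 1 through 3 translate directly into code. The following minimal sketch computes $\alpha_i$, $\beta_i$, and $\gamma_i$ for one community of an undirected graph stored as a node-to-neighbor-set dictionary. The function name `community_metric` and the toy graph are hypothetical, and treating a community with no outgoing edges as infinitely dense is an assumption, since Eq. (4) is undefined in that case.

```python
def community_metric(adj, community):
    """Return (alpha, beta, gamma) of one community, following Eqs. (4)-(6)."""
    members = set(community)
    inner_ends = 0   # every inner edge is counted once from each endpoint
    outer = 0        # edges leaving the community
    for u in members:
        for v in adj[u]:
            if v in members:
                inner_ends += 1
            else:
                outer += 1
    inner = inner_ends // 2
    # Assumption: with no outgoing edges the ratio is taken to be infinite.
    alpha = inner / outer if outer else float("inf")
    beta = len(members) / len(adj)
    return alpha, beta, alpha * beta


# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

alpha_big, beta_big, gamma_big = community_metric(adj, {0, 1, 2})
alpha_small, beta_small, gamma_small = community_metric(adj, {4, 5})
```

Here the triangle {0, 1, 2} has 3 inner edges and 1 outer edge, whereas the pair {4, 5} has 1 inner edge and 2 outer edges, so the pair receives the much smaller $\gamma$ and would be the first candidate for merging.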
For the second problem, we adopt a strategy consistent with the construction of the preliminary communities. The preliminary communities are formed mainly on the basis of node similarity in the first phase; therefore, we also use similarity as the criterion for merging communities, i.e., each of the small or sparse communities is merged into its most similar adjacent community. Here, the similarity between two communities $C_i$ and $C_j$ is calculated as follows:
\[ Sim(C_i, C_j) = \frac{\sum_{u \in C_i,\, v \in C_j} sim(u, v)}{|C_j|}, \tag{7} \]
where $sim(u, v)$ is the similarity between nodes $u \in C_i$ and $v \in C_j$, which is calculated using (3). In function PCM(), which implements the merge procedure, $C_i$ is a community that needs to be merged and $C_j$ is one of its adjacent communities. The numerator on the right-hand side of (7) is the sum of the similarities between nodes in $C_i$ and nodes in $C_j$. Dividing by $|C_j|$ lowers the priority of larger communities, which prevents giant communities from forming.
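The effect of the normalization in (7) is easy to see in a small sketch. The function below takes the node similarity as a parameter, because Eq. (3) is not reproduced here; the lookup table `toy_sim` is a made-up stand-in used only for illustration.

```python
def community_similarity(ci, cj, sim):
    """Eq. (7): summed pairwise node similarity, normalized by the size of cj."""
    total = sum(sim(u, v) for u in ci for v in cj)
    return total / len(cj)


# Hypothetical pairwise similarities; unlisted pairs have similarity 0.
toy_table = {("a", "x"): 0.5, ("a", "y"): 0.5, ("a", "p"): 0.6}


def toy_sim(u, v):
    return toy_table.get((u, v), 0.0)


small_target = community_similarity({"a"}, {"x", "y"}, toy_sim)
large_target = community_similarity({"a"}, {"p", "q", "r", "s"}, toy_sim)
```

Although node "a" has its single strongest tie to the larger community, the total similarity is divided by that community's size, so the smaller community wins. This is the intended bias against growing giant communities.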
The logic of the entire second phase is listed in Algorithm 3; the operations are almost self-explanatory. The variable $CS$ records the final community structure; it is initialized to the preliminary community structure $CS_{pre}$ in step 1. Step 2 calculates the community metric for each of the preliminary communities; steps 3 and 4 select the community with the smallest community metric and its most similar community; step 5 merges them to yield a new community; and step 6 calculates the community metric for that new community. Step 7 replaces the two communities $C_t$ and $C_j$ with the new community in $CS$ to reflect the effect of the merge operation. Step 8 repeats the operations in steps 3 through 7 until the minimal community metric is larger than the given threshold $\delta$, meaning that all the remaining communities are satisfactory; the merge procedure then terminates, and the resulting community structure in $CS$ is returned in step 9.
Input: $CS_{pre}$, the preliminary community structure; $\delta$, the community-metric threshold
Output: $CS$, the final community structure
1  Initialize $CS$, which is used to record the community structure:
     $CS \leftarrow CS_{pre}$
2  Calculate the community metric for each of the preliminary communities:
     foreach $C_i \in CS$ do
       $\gamma_i \leftarrow \alpha_i \times \beta_i$
3  Select the community with the minimal community metric; denote its index as $t$:
     $t \leftarrow \arg\min_i \{\gamma_i \mid i = 1, 2, \ldots, |CS|\}$
4  Identify the community most similar to $C_t$; denote its index as $j$:
     $j \leftarrow \arg\max_i \{Sim(C_t, C_i) \mid i = 1, 2, \ldots, |CS|,\ i \neq t\}$
5  Merge communities $C_t$ and $C_j$ to form a new community:
     $k \leftarrow |CS|$; $C_{k+1} \leftarrow C_t \cup C_j$
6  Calculate the community metric for the new community:
     $\gamma_{k+1} \leftarrow \alpha_{k+1} \times \beta_{k+1}$
7  Replace the two communities $C_t$ and $C_j$ with the new community to reflect the merging effect:
     $CS = CS - \{C_t, C_j\} \cup \{C_{k+1}\}$
8  Repeat steps 3 through 7 until $\gamma_t > \delta$
9  return $CS$
Algorithm 3: PCM($CS_{pre}$, $\delta$): merge small or sparse communities.
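The merge loop of Algorithm 3 can be sketched as follows. This is a simplified sketch under several stated assumptions rather than the authors' code: the node similarity is passed in as `sim` (a closed-neighborhood Jaccard index stands in for Eq. (3) in the example), the metric is recomputed from scratch instead of being cached, the threshold is checked before each merge, and only adjacent communities are considered as merge targets, as the text prescribes.

```python
def pcm(adj, preliminary, delta, sim):
    """Merge communities whose metric gamma is below delta (sketch of Algorithm 3)."""
    cs = [set(c) for c in preliminary]
    n = len(adj)

    def gamma(c):  # Eqs. (4)-(6)
        inner = sum(1 for u in c for v in adj[u] if v in c) // 2
        outer = sum(1 for u in c for v in adj[u] if v not in c)
        alpha = inner / outer if outer else float("inf")  # assumption when outer == 0
        return alpha * len(c) / n

    def similarity(ci, cj):  # Eq. (7)
        return sum(sim(u, v) for u in ci for v in cj) / len(cj)

    def adjacent(ci, cj):
        return any(v in cj for u in ci for v in adj[u])

    while len(cs) > 1:
        t = min(range(len(cs)), key=lambda i: gamma(cs[i]))
        if gamma(cs[t]) >= delta:
            break  # every remaining community is satisfactory
        # gamma < delta implies at least one outgoing edge, so this list is non-empty.
        targets = [i for i in range(len(cs)) if i != t and adjacent(cs[t], cs[i])]
        j = max(targets, key=lambda i: similarity(cs[t], cs[i]))
        merged = cs[t] | cs[j]
        cs = [c for i, c in enumerate(cs) if i not in (t, j)] + [merged]
    return cs


# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)


def jaccard(u, v):
    """Stand-in node similarity; NOT the paper's Eq. (3)."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


final = pcm(adj, [{0, 1, 2}, {3}, {4, 5}], delta=0.1, sim=jaccard)
```

In the example, the singleton {3} has $\gamma = 0$ and is merged into {4, 5}, its most similar adjacent community; afterwards both remaining communities exceed the threshold and the loop stops.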
3.4. Time Complexity. The proposed algorithm comprises two phases. The first one forms the preliminary communities. The main time consumption in this phase lies in selecting the node with the largest degree (step 2 in Algorithm 2) and its most similar neighbor (step 3 in Algorithm 2); the former can be accomplished in $O(\log n)$ in each iteration using a max-heap data structure, and the latter can be done in $O(\log \langle d \rangle)$ with a max-heap, where $\langle d \rangle$ is the average degree of nodes in the network. Since $\langle d \rangle \ll n$, the time consumption of the first phase is $O(n \log n)$.
The second phase improves the quality of the resulting community structure by merging some of the small or sparse communities. Most of the time is spent on determining the community that needs to be merged and its most similar adjacent community in each iteration. Assuming there are $K$ communities in the preliminary community structure, the former operation can be implemented in $O(\log K)$, and the latter can also be carried out in $O(\log K)$ time in the worst case. Hence, the second phase can be implemented in $O(K \log K)$ time.
Since $K \ll n$, the $O(K \log K)$ cost of the second phase is dominated by the $O(n \log n)$ cost of the first. Therefore, the proposed method can detect communities from networks with relatively high efficiency, at $O(n \log n)$ time complexity.
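As an illustration of the max-heap selection mentioned above, the sketch below uses Python's `heapq` module. Because `heapq` implements a min-heap, degrees are stored negated; skipping already assigned nodes when they are popped (lazy deletion) is an assumption of this sketch, not something the paper specifies.

```python
import heapq


def build_degree_heap(adj):
    """Heap of (-degree, node) pairs; negation turns the min-heap into a max-heap."""
    heap = [(-len(neighbors), node) for node, neighbors in adj.items()]
    heapq.heapify(heap)  # linear-time construction
    return heap


def pop_largest_unassigned(heap, assigned):
    """Pop nodes in decreasing degree order, skipping nodes already in a community."""
    while heap:
        _, node = heapq.heappop(heap)  # O(log n) per pop
        if node not in assigned:
            return node
    return None


# Hypothetical toy graph.
adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
heap = build_degree_heap(adj)
first = pop_largest_unassigned(heap, assigned=set())
second = pop_largest_unassigned(heap, assigned={"b"})
```

Each node is pushed and popped at most once, so the selection step costs $O(n \log n)$ over the whole first phase.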
4. Experimental Results and Discussion
4.1. Network Datasets and Comparison System. To test the performance of our proposed method, we have conducted extensive experiments on several groups of artificial networks and on a number of real-world networks. The artificial networks are synthesized using the LFR benchmark network generator [50], which takes several parameters to control the characteristics of the generated networks. Here we consider the influence of both the network scale and the community size; therefore, four types of networks are generated, namely, small networks with small communities and with big communities, and larger networks with small communities and with big communities. The small networks and the larger networks contain 1000 and 5000 nodes, respectively; a small community contains at least 10 and at most 50 nodes, and the minimum and maximum numbers of nodes in the big communities are 20 and 100, respectively. The generated networks with small communities and with big communities are marked with the suffixes 's' and 'b', respectively. The exponents of the power-law distributions that node degree and community size follow are the default values, $-2$ and $-1$, respectively. The parameters used to synthesize the four groups of artificial networks are listed in Table 1.
We also performed experiments on 13 real-world networks, whose sizes range from tens to hundreds of thousands of nodes; information about them is listed in Table 2. These real-world networks can be divided into two categories: the first category includes the first four networks, whose ground-truth communities are known a priori; the second contains the other nine networks, which have no publicly acknowledged ground-truth community structures.
On these networks, we ran our proposed method to detect community structures and compared the results with those of 5 popular community detection algorithms, namely, FastQ [24], WalkTrap [38], LPA [28], Attractor [41], and IsoFdp [36], which have already been introduced in Section 2. Since LPA is a nondeterministic algorithm, we ran it on each network 10 times and took the average of each evaluation metric as its result on that network. For our proposed method, NSA, we empirically set $\delta = 0.13$ for the dolphin social network and $\delta = 0.1$ for the other networks in the experiments. The details of how to set the optimal value of $\delta$ are discussed in Section 5.
Table 1: The parameters used to generate the LFR networks. In the header row, $n$ is the number of nodes in the network; $\langle d \rangle$ and $d_{max}$ are the average degree and the maximum degree, respectively; $exp_d$ and $exp_{com}$ are the exponents of the power-law distributions that node degree and community size follow; $\min(C_i)$ and $\max(C_i)$ are the minimal and maximal numbers of nodes contained in a community, respectively.

| Network  | $n$  | $\langle d \rangle$ | $d_{max}$ | $exp_d$ | $exp_{com}$ | $\min(C_i)$ | $\max(C_i)$ |
|----------|------|---------------------|-----------|---------|-------------|-------------|-------------|
| LFR1000s | 1000 | 20                  | 50        | -2      | -1          | 10          | 50          |
| LFR1000b | 1000 | 20                  | 50        | -2      | -1          | 20          | 100         |
| LFR5000s | 5000 | 20                  | 50        | -2      | -1          | 10          | 50          |
| LFR5000b | 5000 | 20                  | 50        | -2      | -1          | 20          | 100         |

Table 2: The information about the real-world networks. $n$ and $m$ are the numbers of nodes and edges in the network, respectively.

| Network                              | $n$    | $m$     |
|--------------------------------------|--------|---------|
| Karate club [14]                     | 34     | 78      |
| Dolphin social network [15]          | 62     | 159     |
| Risk map [16]                        | 42     | 83      |
| Scientists collaboration network [6] | 118    | 197     |
| Lesmis [17]                          | 77     | 254     |
| Polbooks [3]                         | 105    | 441     |
| ColiNeta [18]                        | 423    | 519     |
| NetScience [10]                      | 1589   | 2742    |
| Email [19]                           | 1133   | 5451    |
| YeastL [20]                          | 2361   | 7182    |
| PGP [21]                             | 10680  | 24316   |
| DBLP [22]                            | 317080 | 1049866 |
| Amazon [22]                          | 334863 | 925872  |

4.2. Evaluation Metrics. Two indexes, namely, NMI (Normalized Mutual Information) [51] and modularity [7], are adopted as the measures for evaluating the quality of the detected community structure in this paper. The NMI between the ground-truth community structure $P = \{P_1, P_2, \ldots, P_K\}$ and the extracted one $P' = \{P'_1, P'_2, \ldots, P'_{K'}\}$ is calculated as follows:
\[ \mathrm{NMI}(P, P') = \frac{-2 \sum_{i=1}^{|P|} \sum_{j=1}^{|P'|} n_{ij} \log \left( \frac{n_{ij} \cdot n}{n_i^{P} \cdot n_j^{P'}} \right)}{\sum_{i=1}^{|P|} n_i^{P} \log \left( \frac{n_i^{P}}{n} \right) + \sum_{j=1}^{|P'|} n_j^{P'} \log \left( \frac{n_j^{P'}}{n} \right)}, \tag{8} \]
where $n_i^{P} = |P_i|$, $n_j^{P'} = |P'_j|$, $n_{ij} = |P_i \cap P'_j|$, and $n$ is the number of nodes in the network.
The NMI is an information-theory based metric that measures how much the detected community structure agrees with the ground truth. Therefore, it can only be used to evaluate the quality of the detected community structure on networks whose ground-truth community structure is already known. Its value lies in the range $[0, 1]$; larger is better.
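Equation (8) can be evaluated directly from two label assignments. The sketch below is a plain-Python illustration with hypothetical names; returning 1.0 when both partitions consist of a single community (where the denominator of (8) vanishes) is a convention assumed here, not taken from the paper.

```python
from collections import Counter
from math import log


def nmi(truth, found):
    """NMI of Eq. (8); truth and found map each node to a community label."""
    nodes = list(truth)
    n = len(nodes)
    n_p = Counter(truth[x] for x in nodes)                # sizes |P_i|
    n_q = Counter(found[x] for x in nodes)                # sizes |P'_j|
    n_pq = Counter((truth[x], found[x]) for x in nodes)   # overlaps |P_i ∩ P'_j|
    numerator = -2.0 * sum(
        c * log(c * n / (n_p[i] * n_q[j])) for (i, j), c in n_pq.items()
    )
    denominator = sum(c * log(c / n) for c in n_p.values()) + sum(
        c * log(c / n) for c in n_q.values()
    )
    if denominator == 0:
        return 1.0  # assumed convention: both partitions are one single community
    return numerator / denominator


truth = {0: "a", 1: "a", 2: "b", 3: "b"}
same = {0: "x", 1: "x", 2: "y", 3: "y"}    # identical up to renaming of labels
cross = {0: "x", 1: "y", 2: "x", 3: "y"}   # statistically independent of truth

nmi_same = nmi(truth, same)
nmi_cross = nmi(truth, cross)
```

Only the overlaps that actually occur contribute to the numerator, which avoids evaluating $\log 0$ for empty intersections.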
Another metric widely used to evaluate the performance of community detection methods is modularity [7], which is defined as follows:
\[ Q = \sum_i \left( e_{ii} - a_i^2 \right), \tag{9} \]
where $e_{ii}$ is the diagonal element of a $K \times K$ matrix $e$ whose element $e_{ij}$ is the fraction of all edges in the network that connect nodes in community $C_i$ to nodes in community $C_j$, $K$ is the number of communities in the community structure, and $a_i$ is the fraction of edges associated with nodes in community $C_i$.
The first term on the right-hand side of (9), $\sum_i e_{ii}$, is the fraction of edges within communities; the second term, $\sum_i a_i^2$, is the expected value of the same fraction in a random graph in which the nodes and the degree distribution are the same as in the original network but the edges are placed between nodes at random. The smaller the difference between the two terms, the more closely the network resembles a random graph and the weaker the community structure is. Conversely, the larger the difference, the further the network departs from the random graph and the stronger the community structure is. That is to say, modularity measures the quality of the community structure in terms of how far the detected result deviates from a random network; its effective value falls in $[0, 1]$, and higher is better.
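For an undirected, unweighted graph, (9) can be computed from edge-end counts alone. The following sketch assumes a node-to-neighbor-set dictionary and a partition given as a list of node sets; the function name and the toy graph are hypothetical.

```python
def modularity(adj, communities):
    """Modularity Q of Eq. (9) for an undirected graph without self-loops."""
    edge_ends = sum(len(neighbors) for neighbors in adj.values())  # 2 * |E|
    q = 0.0
    for community in communities:
        members = set(community)
        # Edge ends that stay inside the community: twice the number of inner edges.
        inner_ends = sum(1 for u in members for v in adj[u] if v in members)
        degree_sum = sum(len(adj[u]) for u in members)
        e_ii = inner_ends / edge_ends   # fraction of edges inside the community
        a_i = degree_sum / edge_ends    # fraction of edge ends attached to it
        q += e_ii - a_i ** 2
    return q


# Toy graph (hypothetical): two triangles joined by the edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
adj = {}
for a, b in edges:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

q_split = modularity(adj, [{0, 1, 2}, {3, 4, 5}])
q_single = modularity(adj, [{0, 1, 2, 3, 4, 5}])
```

Splitting the toy graph into its two triangles gives $Q = 2 \times (6/14 - 1/4) = 5/14$, whereas putting all nodes into one community gives $Q = 0$, in line with the interpretation above.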
4.3. Synthetic Networks. We carried out experiments on four groups of artificial networks to test the performance of the proposed method. As mentioned above, all four types of artificial networks are synthesized using the LFR benchmark generator software [50]. Besides the parameters listed in Table 1, another critical parameter of this software is the mixing parameter $\mu$, which regulates, for each node, the fraction of its edges that connect to nodes in other communities. The smaller the value of $\mu$, the clearer the community structure will be. Obviously, $\mu = 0.5$ is a transition point above which communities in networks tend to be obscure.
Figure 2: Comparison of different community-detection algorithms on LFR benchmark networks containing 1000 nodes. (a) The results detected from small networks with small-sized communities. (b) The results identified from small networks with big-sized communities. (Each panel plots NMI, from 0.0 to 1.0, against values from 0.1 to 0.8 on the horizontal axis for FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposed NSA.)
In our experiments, we varied the value of $\mu$ from 0.1 to 0.8 in increments of 0.1 for each group of LFR networks. To reduce the effect of randomness, we generated 10 networks for each value of $\mu$ while keeping the other parameters unchanged. Since the community structures are already embedded in these synthetic networks, we use NMI as the metric to evaluate the performance of our proposed method and the comparison algorithms. We took these networks as input one by one, ran our proposed method and the comparison algorithms to detect communities, and used the average NMI as the resulting metric. The results detected by our proposal and the comparison algorithms from the small networks with small-sized communities and with big-sized communities are illustrated in Figures 2(a) and 2(b), respectively; the results from the larger networks with small-sized communities and with big-sized communities are presented in Figures 3(a) and 3(b), respectively.
In Figures 2(a) and 2(b), FastQ tends to introduce mistakes into its results, regardless of whether the communities in the networks are well separated or obscure. As mentioned previously, FastQ is a typical modularity-optimization based algorithm; it aims only at acquiring results with larger modularity rather than high accuracy. In our experiments, none of the results uncovered by it are satisfactory. Even in the networks with $\mu = 0.1$, it still failed to identify the exact communities; furthermore, its performance is the worst among the comparison algorithms for $\mu \leq 0.5$. For $\mu > 0.5$, the quality of its results is better only than that of LPA. LPA performed as well as the other comparison algorithms in the networks with $\mu < 0.5$, but its performance dropped dramatically for $\mu \geq 0.5$, and it could not detect any effective communities from the networks with $\mu > 0.6$. This might be due to its label-update mechanism: when the community boundaries become obscure, nodes tend to adopt incorrect labels when updating their own, which leads to trivial results, even to all nodes being labeled as members of one giant community. The proposed method NSA acquired NMI = 1 on all networks for $\mu < 0.5$, meaning that the detected partitions match the ground-truth community structures of these networks perfectly. For $\mu = 0.5$, NSA also obtained results as good as those of WalkTrap, Attractor, and IsoFdp. For $\mu > 0.5$, the quality of the detected community structures declines for all those three algorithms and for the proposed method. For $0.5 < \mu \leq 0.6$, the quality of our proposal is better than that of Attractor in the networks with larger communities, and for $\mu \geq 0.7$, the performance of our proposed method is the best.
In Figures 3(a) and 3(b), we obtained results similar overall to those in Figure 2, but they still differ in some respects. In Figure 3(a), our proposed method performed the best on almost all networks. For $0.5 < \mu < 0.7$ in Figure 2, the NMI of the results extracted by our proposed method is lower than those of WalkTrap and IsoFdp; in Figure 3, however, the proposed method performed better than IsoFdp for $\mu > 0.5$. These results suggest that the performance of the comparison algorithms is not stable across different networks, whereas our proposed method can steadily extract high-quality community structures from networks with different characteristics. This is also evident from the fact that all the curves of the proposed method in these figures decline more slowly than the others. Moreover, by comparing the proposal's own curves across these figures, we can conclude that our proposed method tends to perform better on larger networks with small communities; therefore, it overcomes the resolution limit problem to some extent.
Figure 3: Comparison of different community detection algorithms on LFR benchmark networks containing 5000 nodes. (a) The results extracted from the larger networks with small-sized communities. (b) The results revealed from the larger networks with big-sized communities. (Each panel plots NMI, from 0.0 to 1.0, against values from 0.1 to 0.8 on the horizontal axis for FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposed NSA.)

4.4. Real-World Networks. We also carried out experiments on 13 real-world networks to further test the effectiveness and efficiency of our proposed method. As mentioned in Section 4.1, these networks fall into two categories: those whose ground-truth community structure is known a priori, and those without a publicly acknowledged ground truth.

Figure 4: The karate club network. (a) The ground-truth community structure. (b) The community structure detected by our proposed method NSA. (The nodes in different communities are plotted in different colors and shapes; this illustration style is also applied in the subsequent figures.)
Networks with Ground-Truth Community Structure. This category includes the first 4 networks listed in Table 2. Since their ground-truth community structure is already known, we measure the quality of the community structures identified by the proposed method and the comparison algorithms in terms of both NMI and modularity. The values of the two metrics obtained by the proposed method and the comparison algorithms are recorded in Table 3. These networks are relatively small, which makes it easy to visualize the detected results. Below we analyze the results extracted by the proposed method from these networks individually.
The Karate Club Network. This network depicts the friendships among the members of a karate club; it contains 34 nodes and 78 edges. The network was compiled by Wayne W. Zachary, who observed the karate club for 3 years. During the period of Zachary's study, the club split into two factions because of a dispute between the administrator and the instructor. Corresponding to the two factions, the partition into two communities is always taken as the ground truth of this network, which is shown in Figure 4(a). The result detected by our proposed method is presented in Figure 4(b).
From Figure 4, we can see that our proposed method detected 3 rather than 2 communities from the network. The detected result seems to deviate from the ground truth in some ways, but it coincides with the conclusion drawn from the experiments on synthetic networks that our proposed method tends to find small communities in networks, which overcomes the resolution limit problem. Moreover, from the perspective of the measure metrics, the modularity of the detected result is the largest among those of all the algorithms compared. Although our proposed method is not based on the strategy of optimizing modularity, it tends to acquire a community structure with as large a modularity as possible: where its modularity is not the largest, it is the second largest, within a small margin of the largest. These findings are also borne out on the following networks.

Table 3: The experimental results on networks with ground-truth community structures. The largest value of each metric on each network (typed in bold in the original) is marked with an asterisk.

| Network    | Metric | FastQ  | WalkTrap | LPA   | Attractor | IsoFdp | NSA    |
|------------|--------|--------|----------|-------|-----------|--------|--------|
| Karate     | $Q$    | 0.381  | 0.353    | 0.355 | 0.371     | 0.371  | 0.402* |
|            | NMI    | 0.693  | 0.504    | 0.62  | 0.924     | 1.00*  | 0.699  |
| Dolphin    | $Q$    | 0.492  | 0.489    | 0.464 | 0.45      | 0.505  | 0.513* |
|            | NMI    | 0.719  | 0.632    | 0.719 | 0.69      | 0.744  | 0.887* |
| Risk map   | $Q$    | 0.625* | 0.624    | 0.59  | 0.598     | 0.519  | 0.624  |
|            | NMI    | 0.894* | 0.848    | 0.821 | 0.839     | 0.714  | 0.848  |
| Scientists | $Q$    | 0.749* | 0.733    | 0.64  | 0.694     | 0.668  | 0.744  |
|            | NMI    | 0.867  | 0.818    | 0.743 | 0.835     | 0.823  | 0.878* |

Figure 5: The dolphin social network. (a) The ground-truth community structure. (b) The community structure identified by our proposed method NSA.
Lusseau's Dolphin Social Network. This network describes the interactions of a group of dolphins living in Doubtful Sound, New Zealand. It consists of 62 nodes and 159 edges, which represent individual dolphins and the observed co-occurrences of pairs of dolphins, respectively. This network is generally partitioned into 4 groups as the ground-truth community structure, which is exhibited in Figure 5(a). Figure 5(b) shows the community structure uncovered by our proposed method.
In Figure 5, our proposed method detected communities from this network with a high degree of success: it also identified 4 communities, the vast majority of nodes are classified into the correct communities, and the result comes very close to the ground-truth community structure. Quantitatively, both the NMI and the modularity of the result detected by the proposed method from this network are the largest among those of all the algorithms compared, which means that the community structure identified by the proposed method is clearly better than those of the comparison algorithms.
Risk Map Network. This network is the world political map used in the popular game Risk (https://en.wikipedia.org/wiki/Risk_(game)), in which 42 countries or territories on 6 continents are involved. Accordingly, the 42 nodes and the 83 edges connecting adjacent countries or territories are organized into 6 communities as the ground truth, which is illustrated in Figure 6(a). Feeding this network into the proposed method, we obtained the community structure shown in Figure 6(b).
Figure 6: Risk map network. (a) The ground-truth community structure. (b) The community structure uncovered by our proposed method NSA.

Comparing the detected result with the ground-truth community structure, the community containing nodes '18' and '23' in the ground truth is split into two small communities in Figure 6(b), owing to the tendency of the proposed method to find small communities. Besides this, nodes '26', '33', and '34' are assigned to communities other than their ground-truth ones. However, nodes '12', '16', '26', '33', and '34' are special in this network: the outer edges associated with each of them are no fewer than, or even more than, those within the community to which the node belongs. Therefore, if we ignore what these nodes actually represent and consider the topology only, the community structure extracted by our proposed method is more rational than the ground truth: more of the edges associated with the three nodes '26', '33', and '34' lie within their community than in the ground truth, so these nodes are connected more tightly to nodes within the same community in Figure 6(b). Quantitatively, the values of both measure metrics of our proposed method are second only to those of FastQ and equal to those of WalkTrap. These results also confirm that our proposed method provides an acceptable solution to the problem of community detection.

Table 4: The experimental results of modularity on networks without ground-truth community structures. The largest value on each network (typed in bold in the original) is marked with an asterisk; a dash means that no result was obtained.

| Network    | FastQ  | WalkTrap | LPA   | Attractor | IsoFdp | NSA    |
|------------|--------|----------|-------|-----------|--------|--------|
| Lesmis     | 0.499  | 0.519    | 0.515 | 0.498     | 0.491  | 0.54*  |
| Polbooks   | 0.502  | 0.507    | 0.508 | 0.501     | 0.518  | 0.524* |
| ColiNeta   | 0.779* | 0.746    | 0.693 | 0.718     | -      | 0.761  |
| Email      | 0.499  | 0.531    | 0.379 | 0.464     | 0.531  | 0.544* |
| NetScience | 0.955  | 0.956    | 0.896 | 0.937     | -      | 0.957* |
| YeastL     | 0.573  | 0.529    | 0.372 | 0.511     | -      | 0.574* |
| PGP        | 0.85   | 0.789    | 0.765 | 0.768     | 0.726  | 0.867* |
| DBLP       | 0.735  | -        | 0.652 | 0.637     | -      | 0.782* |
| Amazon     | 0.869  | -        | 0.743 | 0.741     | -      | 0.898* |
Scientists Collaboration Network. This is the largest connected component of a network delineating the coauthor relationships among scientists working at the Santa Fe Institute, New Mexico. Nodes in this network represent scientists, and an edge indicates that two scientists have collaborated on at least one paper. There are 118 nodes and 197 edges in total in this network. The nodes can be divided into 6 groups as the ground-truth communities according to the specialties of the scientists, as presented in Figure 7(a). Taking this network as the input to the proposed method, we obtained the community structure illustrated in Figure 7(b).
The proposed method revealed 8 communities from this network; that is, two additional communities are detected in Figure 7(b). These two communities are relatively independent components; especially in the community containing node '1', there are many more inner edges than outer edges. That is to say, nodes in these two communities are connected more tightly to one another than to the remainder of the network. Therefore, isolating them from the network and taking them as independent communities is also reasonable. From the perspective of the measure metrics, the NMI obtained by the proposed method is the largest, which suggests that the result detected by our proposal is the one closest to the ground-truth community structure; the modularity of the proposed method is not the largest, but it is second only to that of FastQ. These results also show that our proposed method can extract high-quality community structures from networks.
Figure 7: The collaboration network of scientists working at the Santa Fe Institute. (a) The ground-truth community structure. (b) The community structure detected by our proposed NSA algorithm.

Networks without Ground-Truth Community Structure. This category contains the last 9 real-world networks listed in Table 2. For the experiments carried out on this category of networks, we evaluate the quality of the extracted community structures using modularity only, owing to the absence of ground-truth community structures. The modularity values obtained by the proposed method and the comparison algorithms are recorded in Table 4. To illustrate them intuitively, we also plotted them in a bar chart, which is presented in Figure 8.

Figure 8: The bar chart of the modularity obtained by comparison algorithms and the proposed method NSA. (Horizontal axis: the networks Lesmis, Polbooks, ColiNeta, Email, NetScience, YeastL, PGP, DBLP, and Amazon; vertical axis: modularity $Q$, from 0.0 to 1.0; one bar per algorithm: FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposed NSA.)
On 8 of these networks, our proposed method achieved the largest modularity. On the only remaining network, ColiNeta, it still obtained the second largest modularity. Although FastQ is based on the modularity optimization strategy, it acquired the largest modularity only on ColiNeta. WalkTrap is an approach based on random walks, so its time complexity is relatively high; it could not produce effective results on Amazon and DBLP because of the large scale of these two networks. LPA and Attractor can extract community structures from all those networks, but the quality of the detected results is not satisfactory. IsoFdp can only be applied to connected networks and cannot run on ColiNeta, NetScience, and YeastL, as these three networks are disconnected; it cannot detect the community structure from Amazon and DBLP effectively either, because of their large scale. These comparison results show that our proposed method can steadily, effectively, and efficiently provide promising solutions to the problem of community detection in networks from a wide range of applications, and that it outperforms the comparison algorithms significantly.
Figure 9: The setting of parameter $\delta$. Each panel plots the modularity $Q$ against $\delta$ over the range 0.00 to 0.30: (a) the karate club network, (b) the dolphin social network, (c) the risk map network, and (d) the scientists collaboration network.
5. Parameter Setting
In the second phase of the proposed method, we introduce a threshold $\delta$ on the community metric to identify the preliminary communities that need to be merged. As mentioned earlier, in the merge procedure we calculate the community metric $\gamma_i = \alpha_i \times \beta_i$ for every preliminary community $C_i$; if the value of $\gamma_i$ is below the threshold $\delta$, the corresponding community $C_i$ is identified as one that needs to be merged.
Therefore, $\delta$ works as a parameter of our proposed method, and its setting influences the quality of the resulting community structure. Qualitatively, in accordance with the definitions of community sparsity ($\alpha_i$), community scale ($\beta_i$), and community metric ($\gamma_i$), the larger or sparser the network is, the smaller the threshold $\delta$ should be. To determine the optimal value of $\delta$, we conducted a group of experiments exploring the relationship between the value of $\delta$ and the quality of the resulting community structure on the first four networks listed in Table 2, namely, the karate club network, the dolphin social network, the map of the game Risk, and the scientists collaboration network. The quality of the resulting community structure is measured in terms of modularity $Q$. We vary the value of $\delta$ from 0 to 1.0 in increments of 0.005; for each value of $\delta$, we run our proposed method on these networks and observe how the modularity changes as $\delta$ varies.
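The sweep over $\delta$ can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes the standard Newman-Girvan modularity for undirected, unweighted graphs, represents a network as a dict mapping each node to its set of neighbors, and takes the detection routine as a hypothetical callable `detect(adj, delta)`. The names `modularity` and `sweep_delta` are illustrative only.

```python
def modularity(adj, communities):
    """Newman-Girvan modularity Q of a partition (assumed definition).

    adj: dict mapping node -> set of neighbors (undirected, unweighted).
    communities: iterable of node sets that together partition the nodes.
    """
    two_m = sum(len(nbrs) for nbrs in adj.values())  # 2m: every edge is counted twice
    if two_m == 0:
        return 0.0
    q = 0.0
    for community in communities:
        degree_sum = sum(len(adj[u]) for u in community)
        # Each inside edge is seen from both endpoints, so this equals 2 * (inside edges).
        inside_twice = sum(1 for u in community for v in adj[u] if v in community)
        q += inside_twice / two_m - (degree_sum / two_m) ** 2
    return q


def sweep_delta(adj, detect, start=0.0, stop=1.0, step=0.005):
    """Evaluate Q for delta = start, start + step, ..., stop.

    detect is a hypothetical callable (adj, delta) -> list of node sets.
    Returns (best_delta, best_q, results); ties keep the smallest delta.
    """
    n_steps = int(round((stop - start) / step))
    results = []
    for i in range(n_steps + 1):
        delta = round(start + i * step, 10)  # avoid accumulating floating-point error
        results.append((delta, modularity(adj, detect(adj, delta))))
    best_delta, best_q = max(results, key=lambda r: r[1])
    return best_delta, best_q, results
```

Passing the detection routine as a callable keeps the sweep independent of how the two phases are implemented, so the same loop can be reused with any method that returns a partition.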
The observed results are illustrated in Figure 9, in which we plotted only the portion $\delta \in [0, 0.3]$ because the largest modularities are obtained for $\delta \leq 0.3$ on all four networks. Our proposed method reaches the largest modularity at $\delta = 0.13$ on the dolphin social network and at $\delta = 0.1$ on the other three networks. Therefore, we adopted the corresponding value for each of those four networks and empirically set $\delta = 0.1$ for the other networks in our experiments. In Figure 9, the largest modularity is obtained around $\delta = 0.1$, and the interval $[0.05, 0.2]$ covers the optimal value of $\delta$. Therefore, we empirically suggest adjusting $\delta$ adaptively around 0.1 within the range $[0.05, 0.2]$, according to the size and sparsity of the networks involved in real-world applications.
6. Conclusion
In this paper, we presented a novel method to detect communities in networks. It is a local method based on node similarity, and it avoids the high time consumption of global methods. First, we construct the preliminary community structure by repeatedly selecting the node with the largest degree and, on the basis of its most similar neighbor's community assignment, either taking it as the exemplar of a new community or inserting it into an existing one: if its most similar neighbor has not been assigned to any community yet, we create a new community for the node and its most similar neighbor; if its most similar neighbor has already been assigned to a community, we insert the node into that community as well. At the end of this process, we obtain a series of preliminary communities. However, some of them might be too small or too sparse, leading to a low-quality result. Therefore, we merge some of the preliminary communities to acquire the final community structure. To do so, we also proposed some indexes, which take both the size and the sparsity of communities into account, to determine which communities should be merged.
To test the performance of the proposed method, we performed extensive experiments on four groups of synthetic networks and 13 real-world networks and compared the detected community structures with the results extracted by the comparison algorithms in terms of NMI and modularity. The comparison results demonstrate that our proposed method can extract high-quality community structures from networks abstracted from various applications and that nodes in the extracted communities are connected more tightly. The proposed method overcomes the resolution limit problem to some extent and outperforms the competitors.
Data Availability
We have conducted experiments on some artificial networks and some real-world datasets. The artificial networks are synthesized using the LFR benchmark network generator, which is freely available at https://sites.google.com/site/santofortunato. The parameters used to synthesize the artificial networks are listed in Table 1. The real-world data supporting this study are from previously reported studies, which are cited in Table 2. Most of the real-world datasets can also be downloaded from http://www-personal.umich.edu/~mejn/netdata and https://snap.stanford.edu/data/index.html. The ColiNeta dataset was provided by Jeong et al. [18]. We constructed the Risk Map network manually according to the literature [16].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant ID 61602225).
References
[1] J. Kleinberg and S. Lawrence, "Network analysis: The structure of the web," Science, vol. 294, no. 5548, pp. 1849–1850, 2001.
[2] P. Chen and S. Redner, "Community structure of the physical review citation network," Journal of Informetrics, vol. 4, no. 3, pp. 278–290, 2010.
[3] M. E. J. Newman, "Modularity and community structure in networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 23, pp. 8577–8582, 2006.
[4] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabási, "Hierarchical organization of modularity in metabolic networks," Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[5] R. Guimerà and L. A. N. Amaral, "Functional cartography of complex metabolic networks," Nature, vol. 433, no. 7028, pp. 895–900, 2005.
[6] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.
[7] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 2, Article ID 026113, 2004.
[8] P. M. Gleiser and L. Danon, "Community structure in jazz," Advances in Complex Systems (ACS), vol. 6, no. 4, pp. 565–573, 2003.
[9] Y. van Gennip, B. Hunter, R. Ahn et al., "Community detection using spectral clustering on sparse geosocial data," SIAM Journal on Applied Mathematics, vol. 73, no. 1, pp. 67–83, 2013.
[10] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[11] S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
[12] S. Fortunato and D. Hric, "Community detection in networks: a user guide," Physics Reports, vol. 659, pp. 1–44, 2016.
[13] B. W. Kernighan and S. Lin, "An efficient heuristic procedure for partitioning graphs," Bell Labs Technical Journal, vol. 49, no. 1, pp. 291–307, 1970.
[14] W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[15] D. Lusseau, "The emergent properties of a dolphin social network," in Proceedings of the Royal Society of London B: Biological Sciences, vol. 270, supplement 2, pp. S186–S188, 2003.
[16] K. Steinhaeuser and N. V. Chawla, "Identifying and evaluating community structure in complex networks," Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[17] M. E. J. Newman, "The structure and function of complex networks," SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
[18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási, "The large-scale organization of metabolic networks," Nature, vol. 407, no. 6804, pp. 651–654, 2000.
[19] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, "Self-similar community structure in a network of human interactions," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[20] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network motifs: simple building blocks of complex networks," Science, vol. 298, no. 5594, pp. 824–827, 2002.
[21] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, "Models of social networks based on social distance attachment," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 5, Article ID 056122, 2004.
[22] J. Yang and J. Leskovec, "Defining and evaluating network communities based on ground-truth," Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.
[23] M. E. J. Newman, "Fast algorithm for detecting community structure in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 6, Article ID 066133, 2004.
[24] A. Clauset, M. E. J. Newman, and C. Moore, "Finding community structure in very large networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 6, Article ID 066111, 2004.
[25] F. Dabaghi Zarandi and M. Kuchaki Rafsanjani, "Community detection in complex networks using structural similarity," Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 882–891, 2018.
[26] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, Article ID P10008, 2008.
[27] L. Waltman and N. J. Van Eck, "A smart local moving algorithm for large-scale modularity-based community detection," The European Physical Journal B, vol. 86, no. 11, article 471, pp. 1–14, 2013.
[28] U. N. Raghavan, R. Albert, and S. Kumara, "Near linear time algorithm to detect community structures in large-scale networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 76, no. 3, Article ID 036106, 2007.
[29] M. J. Barber and J. W. Clark, "Detecting network communities by propagating labels under constraints," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 2, Article ID 026129, 2009.
[30] J. Hou Chin and K. Ratnavelu, "A semi-synchronous label propagation algorithm with constraints for community detection in complex networks," Scientific Reports, vol. 7, Article ID 45836, 2017.
[31] J. Ding, X. He, J. Yuan, Y. Chen, and B. Jiang, "Community detection by propagating the label of center," Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 675–686, 2018.
[32] A. Laio and A. Rodriguez, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[33] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger, "SCAN: A structural clustering algorithm for networks," in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), pp. 824–833, ACM, New York, NY, USA, August 2007.
[34] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD '96), pp. 226–231, AAAI Press, 1996.
[35] H. Shiokawa, Y. Fujiwara, and M. Onizuka, "Scan++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs," in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1178–1189, Republic of Korea, September 2006.
[36] T. You, H.-M. Cheng, Y.-Z. Ning, B.-C. Shia, and Z.-Y. Zhang, "Community detection in complex networks using density-based clustering algorithm and manifold learning," Physica A: Statistical Mechanics and its Applications, vol. 464, pp. 221–230, 2016.
[37] X. Wang, G. Liu, J. Li, and J. P. Nees, "Locating structural centers: A density-based clustering method for community detection," PLoS ONE, vol. 12, no. 1, Article ID e0169355, 2017.
[38] P. Pons and M. Latapy, "Computing communities in large networks using random walks," in International Symposium on Computer and Information Sciences, pp. 284–293, 2005.
[39] S. A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, and M. A. Tavallaie, "Personalized PageRank clustering: a graph clustering algorithm based on random walks," Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772–5785, 2013.
[40] Y. Su, B. Wang, and X. Zhang, "A seed-expanding method based on random walks for community detection in networks with ambiguous community structures," Scientific Reports, vol. 7, Article ID 41830, 2017.
[41] J. Shao, Z. Han, Q. Yang, and T. Zhou, "Community detection based on distance dynamics," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1075–1084, ACM, Australia, August 2015.
[42] H.-L. Sun, E. Ch'ng, X. Yong, J. M. Garibaldi, S. See, and D.-B. Chen, "A fast community detection method in bipartite networks by distance dynamics," Physica A: Statistical Mechanics and its Applications, vol. 496, pp. 108–120, 2018.
[43] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, "Pseudo-likelihood methods for community detection in large sparse networks," The Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[44] S. C. de Lange, M. A. de Reus, and M. P. van den Heuvel, "The Laplacian spectrum of neural networks," Frontiers in Computational Neuroscience, vol. 7, no. 189, 2014.
[45] F. Krzakala, C. Moore, E. Mossel et al., "Spectral redemption in clustering sparse networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 52, pp. 20935–20940, 2013.
[46] P. Shi, K. He, D. Bindel, and J. E. Hopcroft, "Local Lanczos Spectral Approximation for Community Detection," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 10534 of Lecture Notes in Computer Science, pp. 651–667, Springer International Publishing, 2017.
[47] R. Tackx, F. Tarissan, and J. Guillaume, "ComSim: a bipartite community detection algorithm using cycle and node's similarity," in International Workshop on Complex Networks and their Applications, vol. 689 of Studies in Computational Intelligence, pp. 278–289, Springer International Publishing, 2017.
[48] T. Wang, L. Yin, and X. Wang, "A community detection method based on local similarity and degree clustering information," Physica A: Statistical Mechanics and its Applications, vol. 490, pp. 1344–1354, 2018.
[49] K. R. Žalik, "Maximal neighbor similarity reveals real communities in networks," Scientific Reports, vol. 5, Article ID 18374, 2015.
[50] A. Lancichinetti, S. Fortunato, and F. Radicchi, "Benchmark graphs for testing community detection algorithms," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 78, no. 4, Article ID 046110, 2008.
[51] L. Ana and A. Jain, "Robust data clustering," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-128–II-133, Madison, WI, USA, 2003.
been derived from LPA. Barber et al. [29] proposed a series of algorithms that propagate labels under some constraints; LPAm is the most famous one, which tries to maximize the modularity during the label propagation procedure. Chin et al. [30] first identified the main communities using the number of mutual neighboring nodes; then they attached some independent constraints to the basic LPA and used the constrained LPA to add the remaining nodes to communities; finally, they used a node-moving strategy like the one employed in Louvain to refine the quality of the resulting community structure. Ding et al. [31] devised a modified version of LPA, which exploits the idea of density peak clustering [32] and the Chebyshev inequality to choose community centers from the network and then propagates the labels of the selected centers to the whole network with the proposed multistrategy label propagation.
Density-based methods define and utilize the concept of density for nodes or communities in networks to uncover community structures. SCAN [33] borrows the idea of the classical density-based clustering algorithm DBSCAN [34] to reveal communities, hubs, and outliers in networks. SCAN++ [35] is a derivative of SCAN; it reduces time consumption by introducing a new data structure and reducing the number of density evaluations in the detection procedure. IsoFdp [36] maps the network nodes as data points into a low-dimensional manifold and then exploits the density peak clustering algorithm [32] to extract the final community structure. The LCCD algorithm [37] also follows the approach of the density peak clustering algorithm [32] to locate the structural centers of networks and then expands communities from the identified centers to the borders using a local search procedure.
Network dynamic-based methods explore community structures by simulating dynamic processes in networks. The random walk is a typical dynamic process carried out in networks; random walk-based methods detect communities by exploiting the tendency of a walker to be trapped inside a community during a short walk rather than to cross the community border into another community. WalkTrap [38] uses random walks to calculate the probability of going from one node to another during a short walk and then calculates a distance to measure the similarities between nodes and between communities. The PPC algorithm [39] initially considers the network as a single community and recursively partitions each community, using node similarities computed from random walks, until further partitioning cannot yield a better modularity. RWA [40] employs random walks to calculate the probability of a node belonging to a community, and each community is expanded by repeatedly attracting the node that is most likely to belong to it. Besides these, Attractor [41] utilizes distance dynamics to explore communities in networks: node interactions change the distances among nodes, and the distance changes in turn affect the interactions. Under such interplay, members of the same community gradually move together, and nodes in different communities steadily keep far away from each other. BiAttractor [42] extends the concept of distance dynamics and the idea of Attractor to bipartite networks and is used to detect two-mode communities in them.
Spectral methods use the eigenspectra of various network-associated matrices to extract communities. For example, Amini et al. [43] found the initial node partitions using a spectral clustering method based on the normalized Laplacian matrix derived from a regularized adjacency matrix; those partitions were then used to fit a stochastic block model with a pseudolikelihood algorithm to detect the resulting community structure. De Lange et al. [44] identified an integrative community structure in the macroscopic anatomical neural networks of the macaque and the cat and in the microscopic network of C. elegans by examining the spectra of their normalized Laplacian matrices. Krzakala et al. [45] produced a class of spectral algorithms that detect communities based on the nonbacktracking matrix, which describes a nonbacktracking walk on the directed edges of the network. Shi et al. [46] proposed a spectral community detection method, LLSA, which employs the Lanczos method to approximate the eigenvector of the transition matrix with the largest eigenvalue; the elements of this eigenvector approximately indicate the probability that the corresponding nodes are affiliated with the communities.
Most of the methods mentioned above are global ones: they often depend on global information, such as the number of communities or information about eigenvalues or eigenvectors, as prior knowledge, and this information is hard to acquire as the networks involved grow larger and larger. Moreover, most of them are computationally demanding, leading to high time complexity. These limitations prevent them from being applied to large-scale applications. To overcome the deficiencies of the global algorithms, many local methods have been proposed, including some of the aforementioned methods. For example, LPA and most of its variations determine which label a node should adopt according to its neighborhood only. LCCD takes into account both the local density of nodes and the relative distance between nodes to locate the local structural centers and expands communities from the structural centers with a local search procedure. LLSA applies fast heat kernel diffusion to sample a small subnetwork that includes almost all members of a community, and the eigenvector whose elements indicate the community memberships of nodes is obtained by performing the Lanczos method on the sampled subnetwork.
Besides these, the ComSim algorithm [47] identifies cores of communities in bipartite networks by seeking cycles, which are node chains formed by following outgoing links until reaching a node already visited, and then allocates the remaining nodes to the communities that maximize the similarity between the node and the community. In the BLI algorithm [48], local clustering information and local structural similarity are employed to establish the primary community structure; then small-scale communities whose sizes are smaller than a given threshold $\lambda$ are absorbed by larger ones. kSIM [49] is also a local method that works in a bottom-up way. At the beginning, each node is taken as a community; then the preliminary communities are formed by identifying, for each node, the neighbor community to which the lowest-degree one of its $k$ most similar neighbors belongs and assigning the node to that community. In this procedure, the common neighbor index is employed as the similarity measure for each pair of nodes.

Compared with the global ones, these local methods perform well on large-scale networks. Inspired by this, we also propose a local method to extract communities from networks. The proposed method is based on node similarity and is termed NSA (Node Similarity based Algorithm) for short. It comprises two phases: the first phase aims at constructing the preliminary community structure; the second phase tries to improve the quality of the final result by merging some small or sparse communities. To do so, we also propose a measure, the community metric, to evaluate the sparsity or smallness of communities. The details of the proposed method are elaborated in the next section.

3. The Proposed Method

3.1. The Framework of the Proposed Method. The framework of the proposed method is outlined by the pseudocode listed in Algorithm 1.

Input: $G(V, E)$, the network; $\delta$, the community metric threshold
Output: $CS$, the detected community structure
    /* form the preliminary community structure $CS_{pre}$ */
    $CS_{pre} \leftarrow \mathrm{FPC}(G)$
    /* merge small or sparse communities in $CS_{pre}$ */
    $CS \leftarrow \mathrm{PCM}(CS_{pre}, \delta)$
    return $CS$
Algorithm 1: The framework of our proposed method NSA.

As mentioned previously, the proposed method consists of two phases, implemented by the function calls FPC() and PCM(), respectively. The former establishes the preliminary community structure based on a node selection strategy and node similarity; the latter merges some small or sparse communities to improve the quality of the resulting community structure. The inputs of this algorithm are the network and a threshold $\delta$. The networks considered in this paper are undirected and unweighted graphs, represented as $G(V, E)$ as in Algorithm 1, where $V$ and $E$ are the node set and the edge set, respectively, and $|V| = n$ and $|E| = m$ are the numbers of nodes and edges in the network. The threshold $\delta$ is used in the second phase of the proposed method to identify communities to be merged: a community whose community metric is smaller than $\delta$ should be merged into another one. The output of this algorithm is the detected community structure.
The next two subsections describe the two procedures in detail.
3.2. Formation of the Preliminary Community Structure. The function FPC() implements the first phase of the proposed method, whose purpose is to construct the preliminary community structure from the network. We first pick out the node with the largest degree in the network, take it as the exemplar of the first community, and insert its most similar neighbor into the community as well. (If more than one node has the largest degree, we arbitrarily select one of them as the exemplar; if the exemplar has more than one most similar neighbor, the one with the smallest degree is selected.) Afterwards, the next largest-degree node in the remainder of the network is selected. If its most similar neighbor has not been classified into any community yet, we create a new community for it and its most similar neighbor. Otherwise, if its most similar neighbor has been assigned to a certain community (e.g., the one denoted as $C_k$), we insert the selected node into that community (i.e., $C_k$) as well. We repeat this process until every node is classified into a community. In this procedure, densely connected nodes can quickly gather around the exemplars to form communities. At the end of this procedure, we get a series of communities, which constitute the preliminary community structure of the network. The pseudocode describing the entire procedure is listed in Algorithm 2.
In this algorithm, the degree of node $u$ is the number of $u$'s neighbors and is denoted as $d_u$, i.e.,

$$d_u = |\Gamma(u)|, \tag{1}$$

where

$$\Gamma(u) = \{v \mid (u, v) \in E,\ v \in V\} \tag{2}$$

is the set of neighbors of node $u$. $\mathrm{sim}(u, v)$ stands for the similarity between nodes $u$ and $v$. There are abundant ways to calculate the similarity between nodes in a network, and any of them can be employed in principle. However, for the sake of efficiency, we calculate it with the following equation, which involves only the neighborhoods of nodes $u$ and $v$ themselves:

$$\mathrm{sim}(u, v) = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}. \tag{3}$$
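Equations (1) through (3) translate directly into set operations. The snippet below is a minimal sketch, assuming the network is stored as a dict that maps each node to the set of its neighbors; the function names `degree` and `sim` and the toy graph are illustrative, and returning 0 for two isolated nodes is an assumption, since the text does not cover that case.

```python
def degree(adj, u):
    """d_u = |Gamma(u)|, Equation (1)."""
    return len(adj[u])


def sim(adj, u, v):
    """Neighborhood overlap of u and v, Equation (3)."""
    union = adj[u] | adj[v]
    if not union:
        return 0.0  # assumption: two isolated nodes are given similarity 0
    return len(adj[u] & adj[v]) / len(union)


# Toy graph (hypothetical): a triangle 1-2-3 with a pendant node 4 attached to 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(degree(adj, 3))   # 3
print(sim(adj, 1, 2))   # one common neighbor out of three in the union: 1/3
print(sim(adj, 3, 4))   # no common neighbor: 0.0
```

Because $\Gamma(u)$ excludes $u$ itself, a node of degree 1 shares no neighbor with its only neighbor, so its similarity under Equation (3) is 0, as the last line illustrates.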
Input: $G(V, E)$, the network
Output: $CS_{pre} = \{C_1, C_2, \cdots, C_k\}$, the identified preliminary community structure
1   Initialize variables $U$ and $CS_{pre}$, which record the unclassified nodes and the preliminary community structure: $U \leftarrow V$; $CS_{pre} \leftarrow \emptyset$
2   Select the node with the largest degree, denote it as $v$: $v \leftarrow \arg\max_u \{d_u \mid u \in U\}$
3   Get the most similar neighbor of $v$, denote it as $w$: $w \leftarrow \arg\max_u \{\mathrm{sim}(v, u) \mid u \in \Gamma(v)\}$
4   if $w$ has not been assigned to any community then
5       Create a new community for nodes $v$ and $w$: $K \leftarrow |CS_{pre}|$; $C_{K+1} \leftarrow \{v, w\}$
6       Insert the created community into the community structure: $CS_{pre} \leftarrow CS_{pre} \cup \{C_{K+1}\}$
7       Remove nodes $v$ and $w$ from $U$ as they are classified: $U \leftarrow U - \{v, w\}$
8   else
9       Find the community to which $w$ belongs, denote it as $C_k$: $k \leftarrow \mathrm{locate}(CS_{pre}, w)$
10      Insert node $v$ into $C_k$: $C_k \leftarrow C_k \cup \{v\}$
11      Remove node $v$ from $U$ as it is classified: $U \leftarrow U - \{v\}$
12  Repeat steps 2 through 11 until $U = \emptyset$
13  return $CS_{pre}$
Algorithm 2: FPC($G$), forming the preliminary community structure.

The variables $U$ and $CS_{pre}$ record the unclassified nodes and the preliminary community structure; they are naturally initialized to the original node set $V$ of network $G$ and the empty set $\emptyset$ in step 1. Steps 2 and 3 select the node with the largest degree from the remainder of the network and its most similar neighbor and denote them as $v$ and $w$, respectively. Step 4 determines whether $w$ has been assigned to a community. If it has not been classified into any community yet, steps 5 and 6 create a new community for nodes $v$ and $w$ and insert the newly created community into $CS_{pre}$; then step 7 removes nodes $v$ and $w$ from $U$, as they have just been classified into the new community. If node $w$ has already been assigned to a community, step 9 finds the community $C_k$ to which $w$ belongs, and step 10 inserts node $v$ into community $C_k$. Since node $v$ has now been assigned to community $C_k$, step 11 removes it from $U$. Step 12 repeats the operations in steps 2 through 11 until $U = \emptyset$, meaning that all the nodes in the network have been visited. At that time, the preliminary community structure is held in $CS_{pre}$ and is returned as the output of the algorithm in step 13.
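The steps of Algorithm 2 can be sketched in Python as below. This is an illustrative reading of the pseudocode rather than the authors' code, under several assumptions that the text leaves open: the network is a dict mapping each node to its neighbor set with no self-loops, node labels are sortable, ties for the largest degree are broken by the smallest label (the text allows an arbitrary choice), and an isolated node becomes a singleton community. The name `fpc` and the toy graph are hypothetical.

```python
def fpc(adj):
    """Form preliminary communities; returns a list of node sets (CS_pre)."""

    def sim(u, v):
        # Neighborhood-overlap similarity of Equation (3).
        union = adj[u] | adj[v]
        return len(adj[u] & adj[v]) / len(union) if union else 0.0

    unclassified = set(adj)   # U
    community_of = {}         # node -> index of its community in `communities`
    communities = []          # CS_pre

    while unclassified:                                   # step 12
        # Step 2: largest-degree unclassified node (smallest label wins ties).
        v = max(sorted(unclassified), key=lambda u: len(adj[u]))
        if not adj[v]:
            # Assumption: a node without neighbors forms its own community.
            community_of[v] = len(communities)
            communities.append({v})
            unclassified.discard(v)
            continue
        # Step 3: most similar neighbor; ties go to the smallest degree.
        w = max(sorted(adj[v]), key=lambda u: (sim(v, u), -len(adj[u])))
        if w not in community_of:                         # step 4
            community_of[v] = community_of[w] = len(communities)
            communities.append({v, w})                    # steps 5 and 6
            unclassified -= {v, w}                        # step 7
        else:
            k = community_of[w]                           # step 9
            communities[k].add(v)                         # step 10
            community_of[v] = k
            unclassified.discard(v)                       # step 11
    return communities                                    # step 13


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(fpc(adj))
```

Re-sorting the unclassified nodes in every iteration keeps the sketch short and deterministic; an implementation aimed at large networks would more likely sort the nodes by degree once or use a priority queue.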
To make this clearer, we take Zachary's karate club network [14] as an example to illustrate the procedure. This network has 34 nodes and 78 edges, as shown in Figure 1(a). The node with the largest degree is node '34', and its most similar neighbor is node '33'. Therefore node '34' is taken as the exemplar of the first community, and node '33' is inserted into this community as well. The node with the largest degree among the remaining nodes is node '1', whose most similar neighbor is node '2'. Since node '2' has not been assigned to a community yet, we create a new community, take node '1' as its exemplar, and insert node '2' into it as well. The same happens to node pairs ('3', '4'), ('32', '29'), and ('9', '31') in turn. The next largest-degree node is '14'; its most similar neighbor, node '4', is already in the third community, so we insert node '14' into the third community. All the other nodes are processed in the same way: in the subsequent operations, node pairs ('24', '30'), ('6', '7'), ('5', '11'), and ('25', '26') form new communities, and each of the remaining nodes is inserted into the community to which its most similar neighbor belongs. At the end of the process we obtain the preliminary community structure shown in Figure 1(b), in which each node is connected to its most similar neighbor by a directed edge.
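The first phase described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the graph is a dict mapping each node to the set of its neighbors, the node similarity is passed in as `sim`, and the default `jaccard` function is only a stand-in for the paper's similarity (3). The tie-breaking rule (smallest node id) and the handling of isolated nodes are also assumptions, and the linear scans with `min` are used for brevity instead of the heaps discussed in the complexity analysis.

```python
def jaccard(adj, u, v):
    """Stand-in similarity (assumption, not the paper's Eq. (3)):
    Jaccard index of the closed neighborhoods of u and v."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def fpc(adj, sim=jaccard):
    """Form the preliminary communities; returns a list of node sets."""
    unassigned = set(adj)   # the set U of nodes not yet classified
    label = {}              # node -> index of its community
    communities = []        # the preliminary structure CS_pre
    while unassigned:
        # Unassigned node with the largest degree (ties: smallest id).
        v = min(unassigned, key=lambda x: (-len(adj[x]), x))
        if not adj[v]:
            # Isolated node: give it its own community (assumption).
            communities.append({v})
            label[v] = len(communities) - 1
            unassigned.discard(v)
            continue
        # Most similar neighbor of v (ties: smallest id).
        w = min(adj[v], key=lambda x: (-sim(adj, v, x), x))
        if w in label:
            # w already belongs to a community: v joins it.
            communities[label[w]].add(v)
            label[v] = label[w]
            unassigned.discard(v)
        else:
            # Neither node is classified: they found a new community.
            communities.append({v, w})
            label[v] = label[w] = len(communities) - 1
            unassigned -= {v, w}
    return communities
```

On a toy graph made of two triangles joined by a single edge, this sketch places each triangle in its own preliminary community.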
3.3. Merge of Small or Sparse Communities. At the end of the first phase of our proposed method, we obtain the preliminary community structure. However, some communities are either too small or too sparse to be meaningful, such as the preliminary communities {'5', '11'}, {'9', '31'}, {'32', '29'}, {'25', '26', '28'}, {'24', '30', '27'}, and {'6', '7', '17'} in Figure 1(b). Each of them contains only a few nodes and very few inner edges, and the number of edges inside each of them is much smaller than the number of edges connecting it to the outside. This violates the defining characteristic of a community, namely that connections inside it are much denser than those across different communities. Keeping such communities in the final community structure would lower its quality. Therefore, in the second phase we merge some of the preliminary communities to acquire the final result; this is carried out by the function call PCM() in Algorithm 1.
To this end, two problems need to be solved in PCM(). The first is to identify which communities are small or sparse enough that they need to be merged into other ones; the second is to select the community into which each small or sparse community should be merged.
Figure 1: The procedure of FPC() on the karate club network. (a) The original network. (b) The preliminary community structure, in which each node is connected to its most similar neighbor by a directed edge.
For the first problem, we propose an index, the community metric, which takes into account two factors, community size and community sparsity, to find the preliminary communities that need to be merged. We formalize the relevant concepts and the index in Definitions 1 through 3.
Definition 1 (community sparsity). The sparsity of community $C_i$ is defined as follows:

$$\alpha_i = \frac{|E_i^{in}|}{|E_i^{out}|} \tag{4}$$

where $E_i^{in}$ is the set of edges within community $C_i$ and $E_i^{out}$ is the set of edges connecting nodes in community $C_i$ with other communities.

That is to say, the sparsity of community $C_i$ is the ratio between the number of inner edges of $C_i$ and the number of outer edges of $C_i$. The more edges exist within community $C_i$ relative to those leaving it, the larger the value of $\alpha_i$ will be, and vice versa.
Definition 2 (community scale). The scale of community $C_i$ is formalized as follows:

$$\beta_i = \frac{|V_i|}{|V|} \tag{5}$$

where $V_i$ is the set of nodes in community $C_i$ and $V$ is the set of all nodes in the network.

That is, the scale of community $C_i$ is the ratio of the number of nodes in $C_i$ to the total number of nodes in the network. The more nodes there are in community $C_i$, the larger the ratio will be, and vice versa.
Definition 3 (community metric). The community metric combines the community sparsity and the community scale. For community $C_i$ it is defined as follows:

$$\gamma_i = \alpha_i \times \beta_i \tag{6}$$

On the basis of these definitions, the first problem can be solved by setting a community-metric threshold $\delta$: if $\gamma_i < \delta$, community $C_i$ needs to be merged into another community.
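As an illustration of Definitions 1 through 3, the following sketch computes $\alpha_i$, $\beta_i$, and $\gamma_i$ for one community. It assumes an undirected, unweighted graph stored as a dict mapping each node to the set of its neighbors; the function name and the choice to treat a community with no outer edges as infinitely dense (so that it is never selected for merging) are assumptions made for this example.

```python
def community_metric(adj, community):
    """Return (alpha, beta, gamma) of Definitions 1-3 for one community."""
    members = set(community)
    inner_ends = outer = 0
    for u in members:
        for v in adj[u]:
            if v in members:
                inner_ends += 1   # every inner edge is seen from both ends
            else:
                outer += 1        # edge leaving the community
    inner = inner_ends // 2
    # Assumption: no outer edges means the ratio is treated as infinite.
    alpha = inner / outer if outer else float("inf")
    beta = len(members) / len(adj)
    return alpha, beta, alpha * beta
```

For a triangle attached to the rest of a six-node graph by one edge, this gives $\alpha = 3$, $\beta = 0.5$, and $\gamma = 1.5$, well above a threshold such as $\delta = 0.1$; a pair of nodes with one inner edge and two outer edges in the same graph gives $\gamma = 0.5 \times 1/3 \approx 0.17$.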
For the second problem, we adopt a strategy that conforms to the construction of the preliminary communities. Since the preliminary communities are formed mainly on the basis of node similarity in the first phase, we also use similarity as the criterion for merging communities; i.e., each small or sparse community is merged into its most similar adjacent community. The similarity between two communities $C_i$ and $C_j$ is calculated as follows:

$$Sim(C_i, C_j) = \frac{\sum_{u \in C_i,\, v \in C_j} sim(u, v)}{|C_j|} \tag{7}$$

where $sim(u, v)$ is the similarity between nodes $u \in C_i$ and $v \in C_j$, calculated using (3). In the function PCM(), which implements the merge procedure, $C_i$ is a community that needs to be merged and $C_j$ is one of its adjacent communities. The numerator on the right-hand side of (7) is the sum of the similarities between nodes in communities $C_i$ and $C_j$. Dividing by $|C_j|$ counteracts the advantage that larger communities would otherwise have, which prevents giant communities from forming.
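A direct transcription of (7) might look like the sketch below. The node similarity is supplied by the caller as a two-argument function `sim`, standing in for the paper's (3); the function name is an assumption made for this example.

```python
def community_similarity(c_i, c_j, sim):
    """Eq. (7): summed pairwise node similarity, divided by |C_j|."""
    total = sum(sim(u, v) for u in c_i for v in c_j)
    return total / len(c_j)
```

The division by $|C_j|$ matters when two candidate communities have the same summed similarity to $C_i$: the smaller candidate then receives the higher score, so merges are not drawn systematically toward the largest community.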
The logic of the entire second phase is listed in Algorithm 3; the operations are almost self-explanatory. The variable $CS$ records the final community structure and is initialized as the preliminary community structure $CS_{pre}$ in step 1. Step 2 calculates the community metric for each of the preliminary communities. Steps 3 and 4 select the community $C_t$ with the smallest community metric and its most similar community $C_j$; step 5 merges them to yield a new community, and step 6 calculates the community metric for that new community. Step 7 replaces the two communities $C_t$ and $C_j$ in $CS$ with the new community to reflect the effect of the merge. Step 8 repeats steps 3 through 7 until the community metric of the selected community, i.e., the minimal one, is larger than the given threshold $\delta$, meaning that all the remaining communities are satisfactory. The merge procedure then terminates, and the resulting community structure in $CS$ is returned in step 9.
Input: $CS_{pre}$, the preliminary community structure; $\delta$, the community-metric threshold
Output: $CS$, the final community structure
1. Initialize $CS$, which is used to record the community structure:
   $CS \leftarrow CS_{pre}$
2. Calculate the community metric for each of the preliminary communities:
   foreach $C_i \in CS$ do $\gamma_i \leftarrow \alpha_i \times \beta_i$
3. Select the community with the minimal community metric; denote its index as $t$:
   $t \leftarrow \arg\min_i \{\gamma_i \mid i = 1, 2, \cdots, |CS|\}$
4. Identify the community most similar to $C_t$; denote its index as $j$:
   $j \leftarrow \arg\max_i \{Sim(C_t, C_i) \mid i = 1, 2, \cdots, |CS|,\ i \neq t\}$
5. Merge communities $C_t$ and $C_j$ to form a new community:
   $k \leftarrow |CS|$; $C_{k+1} \leftarrow C_t \cup C_j$
6. Calculate the community metric for the new community:
   $\gamma_{k+1} \leftarrow \alpha_{k+1} \times \beta_{k+1}$
7. Replace the two communities $C_t$ and $C_j$ with the new community to reflect the merging effect:
   $CS \leftarrow (CS - \{C_t, C_j\}) \cup \{C_{k+1}\}$
8. Repeat steps 3 through 7 until $\gamma_t > \delta$
9. return $CS$

Algorithm 3: PCM($CS_{pre}$, $\delta$): merge small or sparse communities.
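The merge loop of Algorithm 3 can be sketched in Python as follows. This is an illustrative reading rather than the authors' code: the community metric and the community similarity are passed in as the callables `metric` and `similarity` (hypothetical parameter names), the threshold is tested before each merge rather than after it, the loop also stops when a single community remains, and all metrics are recomputed in every iteration for simplicity instead of only for the newly merged community.

```python
def pcm(cs_pre, delta, metric, similarity):
    """Merge small or sparse communities until the minimal metric exceeds delta."""
    cs = [set(c) for c in cs_pre]                          # step 1
    while len(cs) > 1:                                     # guard (assumption)
        gammas = [metric(c) for c in cs]                   # steps 2 and 6
        t = min(range(len(cs)), key=gammas.__getitem__)    # step 3
        if gammas[t] > delta:                              # step 8
            break
        j = max((i for i in range(len(cs)) if i != t),
                key=lambda i: similarity(cs[t], cs[i]))    # step 4
        merged = cs[t] | cs[j]                             # step 5
        cs = [c for i, c in enumerate(cs) if i not in (t, j)]
        cs.append(merged)                                  # step 7
    return cs
```

Passing the two measures as functions keeps the sketch independent of how the graph is stored; in practice `metric` would wrap Definition 3 and `similarity` would wrap (7).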
3.4. Time Complexity. The proposed algorithm comprises two phases. The first forms the preliminary communities. The main time consumption in this phase lies in selecting the node with the largest degree (step 2 in Algorithm 2) and its most similar neighbor (step 3 in Algorithm 2). The former can be accomplished in $O(\log n)$ in each iteration using a max-heap data structure, and the latter can be done in $O(\log \langle d \rangle)$ with a max-heap, where $\langle d \rangle$ is the average degree of nodes in the network. Since $\langle d \rangle \ll n$, the time consumption of the first phase is $O(n \log n)$.
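The heap-based selection of the largest-degree node could be realized along the following lines. This is a sketch under assumptions: it uses Python's `heapq`, which is a min-heap, so degrees are negated, and it uses lazy deletion to skip nodes that have already been assigned to a community as some other node's most similar neighbor. The class name `DegreeHeap` and its methods are illustrative and do not come from the paper.

```python
import heapq


class DegreeHeap:
    """Max-heap over node degrees with lazy deletion."""

    def __init__(self, degree):
        # heapq is a min-heap, so store negated degrees.
        self._heap = [(-d, v) for v, d in degree.items()]
        heapq.heapify(self._heap)          # O(n) construction
        self._removed = set()

    def remove(self, v):
        """Mark v as assigned; it is skipped when it reaches the top."""
        self._removed.add(v)

    def pop_max(self):
        """Return the remaining node with the largest degree, or None."""
        while self._heap:
            _, v = heapq.heappop(self._heap)   # O(log n) per pop
            if v not in self._removed:
                return v
        return None
```

Each node is pushed once and popped at most once, so the selections over the whole first phase cost $O(n \log n)$ in total, in line with the bound stated above.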
The second phase improves the quality of the resulting community structure by merging some of the small or sparse communities. Most of the time is spent on determining, in each iteration, the community that needs to be merged and its most similar adjacent community. Assuming there are $K$ communities in the preliminary community structure, the former operation can be implemented in $O(\log K)$, and the latter can also be carried out in $O(\log K)$ time in the worst case. Hence the second phase can be implemented in $O(K \log K)$ time.
Since $K \ll n$, we have $\log K \ll \log n$. Therefore the proposed method can detect communities from networks with relatively high efficiency, with an overall time complexity of $O(n \log n)$.
4. Experimental Results and Discussion
4.1. Network Datasets and Comparison System. To test the performance of our proposed method, we conducted extensive experiments on several groups of artificial networks and on real-world networks. The artificial networks are synthesized using the LFR benchmark network generator [50], which takes a set of parameters controlling the characteristics of the generated networks. Here we consider the influence of both network scale and community size, so four types of networks are generated: small networks with small communities, small networks with big communities, larger networks with small communities, and larger networks with big communities. The small and larger networks contain 1000 and 5000 nodes, respectively. The small communities contain at least 10 and at most 50 nodes; the big communities contain at least 20 and at most 100 nodes. The generated networks with small communities and big communities are marked with the suffixes 's' and 'b', respectively. The exponents of the power-law distributions that node degree and community size follow are the default values, $-2$ and $-1$, respectively. The parameters used to synthesize the four groups of artificial networks are listed in Table 1.
We also performed experiments on 13 real-world networks, whose sizes range from tens to hundreds of thousands of nodes; information about them is listed in Table 2. These real-world networks can be divided into two categories. The first category includes the first four networks, whose ground-truth communities are known a priori; the second contains the other nine networks, which have no publicly acknowledged ground-truth community structures.
On these networks we ran our proposed method to detect community structures and compared the results with those of 5 popular community detection algorithms, namely FastQ [24], WalkTrap [38], LPA [28], Attractor [41], and IsoFdp [36], which have already been introduced in Section 2. Since LPA is a nondeterministic algorithm, we ran it on each network 10 times and took the average of the evaluation metrics as its metric value for that network. For our proposed method, NSA, we empirically set $\delta = 0.13$ for the dolphin social network and $\delta = 0.1$ for the other networks in the experiments. How to set the optimal value of $\delta$ is discussed in Section 5.
Table 1: The parameters used to generate the LFR networks. In the header row, $n$ is the number of nodes in the network; $\langle d \rangle$ and $d_{max}$ are the average degree and the maximum degree, respectively; $exp_d$ and $exp_{com}$ are the exponents of the power-law distributions that node degree and community size follow; $\min(C_i)$ and $\max(C_i)$ are the minimal and maximal number of nodes contained in a community, respectively.

| Network | $n$ | $\langle d \rangle$ | $d_{max}$ | $exp_d$ | $exp_{com}$ | $\min(C_i)$ | $\max(C_i)$ |
|---|---|---|---|---|---|---|---|
| LFR1000s | 1000 | 20 | 50 | -2 | -1 | 10 | 50 |
| LFR1000b | 1000 | 20 | 50 | -2 | -1 | 20 | 100 |
| LFR5000s | 5000 | 20 | 50 | -2 | -1 | 10 | 50 |
| LFR5000b | 5000 | 20 | 50 | -2 | -1 | 20 | 100 |

Table 2: Information about the real-world networks. $n$ and $m$ are the number of nodes and edges in the network, respectively.

| Network | $n$ | $m$ |
|---|---|---|
| Karate club [14] | 34 | 78 |
| Dolphin social network [15] | 62 | 159 |
| Risk map [16] | 42 | 83 |
| Scientists collaboration network [6] | 118 | 197 |
| Lesmis [17] | 77 | 254 |
| Polbooks [3] | 105 | 441 |
| ColiNeta [18] | 423 | 519 |
| NetScience [10] | 1589 | 2742 |
| Email [19] | 1133 | 5451 |
| YeastL [20] | 2361 | 7182 |
| PGP [21] | 10680 | 24316 |
| DBLP [22] | 317080 | 1049866 |
| Amazon [22] | 334863 | 925872 |

4.2. Evaluation Metrics. Two indexes, namely NMI (Normalized Mutual Information) [51] and modularity [7], are adopted as the metrics to evaluate the quality of the detected community structure in this paper. The NMI between the ground-truth community structure $P = \{P_1, P_2, \ldots, P_K\}$ and the extracted one $P' = \{P'_1, P'_2, \ldots, P'_{K'}\}$ is calculated as follows:
$$\mathrm{NMI}(P, P') = \frac{-2 \sum_{i=1}^{|P|} \sum_{j=1}^{|P'|} n_{ij} \log\left(\frac{n_{ij} \cdot n}{n_{P_i} \cdot n_{P'_j}}\right)}{\sum_{i=1}^{|P|} n_{P_i} \log\left(\frac{n_{P_i}}{n}\right) + \sum_{j=1}^{|P'|} n_{P'_j} \log\left(\frac{n_{P'_j}}{n}\right)} \tag{8}$$

where $n_{P_i} = |P_i|$, $n_{P'_j} = |P'_j|$, $n_{ij} = |P_i \cap P'_j|$, and $n$ is the number of nodes in the network. The NMI is an information-theoretic metric that measures how much the detected community structure agrees with the ground truth. Therefore it can only be used to evaluate the quality of the detected community structure on networks whose ground-truth community structure is already known. Its value lies in the range $[0, 1]$; larger is better.
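For concreteness, (8) can be evaluated with a few lines of Python. The sketch below assumes that both partitions are given as lists of node sets covering the same nodes. Terms with $n_{ij} = 0$ are skipped, following the usual convention $0 \log 0 = 0$, and returning 1 when both partitions consist of a single community (where (8) is $0/0$) is a convention chosen for this example.

```python
from math import log


def nmi(truth, detected):
    """Eq. (8) for two partitions given as lists of node sets."""
    n = sum(len(p) for p in truth)
    numerator = 0.0
    for p in truth:
        for q in detected:
            n_ij = len(p & q)
            if n_ij:                      # 0 * log(0) is taken as 0
                numerator += n_ij * log(n_ij * n / (len(p) * len(q)))
    denominator = (sum(len(p) * log(len(p) / n) for p in truth)
                   + sum(len(q) * log(len(q) / n) for q in detected))
    if denominator == 0:                  # both partitions are trivial
        return 1.0
    return -2.0 * numerator / denominator
```

Two identical partitions give an NMI of 1, while two partitions that are statistically independent of each other, such as {{0, 1}, {2, 3}} and {{0, 2}, {1, 3}}, give 0.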
Another metric widely used to evaluate the performance of community detection methods is modularity [7], which is defined as follows:

$$Q = \sum_i \left(e_{ii} - a_i^2\right) \tag{9}$$

where $e_{ii}$ is the diagonal element of a $K \times K$ matrix $e$ whose element $e_{ij}$ is the fraction of the edges in the network that connect nodes in community $C_i$ with nodes in community $C_j$, $K$ is the number of communities in the community structure, and $a_i$ is the fraction of edges associated with nodes in community $C_i$.

The first term on the right-hand side of (9), $\sum_i e_{ii}$, is the fraction of edges within communities; the second term, $\sum_i a_i^2$, is the expected value of the same fraction in a random graph that has the same nodes and degree distribution as the original network but whose edges are placed between nodes at random. The smaller the difference between the two terms, the closer the network is to a random graph and the weaker the community structure. Conversely, the larger the difference, the further the network departs from the random graph and the stronger the community structure. That is to say, modularity measures the quality of the community structure in terms of how far the detected result deviates from a random network; its effective value falls in $[0, 1]$, and higher is better.
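A minimal sketch of (9) for an undirected, unweighted graph is given below. It assumes the graph is a dict mapping each node to the set of its neighbors and adopts the common convention that $a_i$ is the share of edge ends (degree sum) falling in $C_i$, i.e., the degree sum of $C_i$ divided by $2m$; the function name is an assumption made for this example.

```python
def modularity(adj, communities):
    """Eq. (9): Q = sum_i (e_ii - a_i^2) for an undirected graph."""
    two_m = sum(len(nbrs) for nbrs in adj.values())   # twice the edge count
    q = 0.0
    for c in communities:
        members = set(c)
        # Edge ends whose both endpoints lie in the community (2 per edge).
        inner_ends = sum(1 for u in members for v in adj[u] if v in members)
        degree_sum = sum(len(adj[u]) for u in members)
        e_ii = inner_ends / two_m          # fraction of edges inside C_i
        a_i = degree_sum / two_m           # fraction of edge ends in C_i
        q += e_ii - a_i ** 2
    return q
```

For two triangles joined by one edge (7 edges in total), splitting the graph into the two triangles gives $Q = 2 \times (6/14 - (7/14)^2) = 5/14 \approx 0.357$, whereas putting all nodes into a single community gives $Q = 0$.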
4.3. Synthetic Networks. We carried out experiments on four groups of artificial networks to test the performance of the proposed method. As mentioned above, all four types of artificial networks are synthesized using the LFR benchmark generator software [50]. Besides the parameters listed in Table 1, another critical parameter of this software is the mixing parameter $\mu$, which sets, for each node, the fraction of its edges that connect to nodes in other communities. The smaller the value of $\mu$, the clearer the community structure will be. In particular, $\mu = 0.5$ is a transition point above which the communities in a network tend to be obscure.
Figure 2: Comparison of different community-detection algorithms (FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposed NSA) on LFR benchmark networks containing 1000 nodes, plotted as NMI against the mixing parameter $\mu$ from 0.1 to 0.8. (a) The results detected from small networks with small-sized communities. (b) The results identified from small networks with big-sized communities.
In our experiments we varied the value of $\mu$ from 0.1 to 0.8 in increments of 0.1 for each group of LFR networks. To reduce the effect of chance, we generated 10 networks for each value of $\mu$ while keeping the other parameters unchanged. Since the community structures are already embedded in these synthetic networks, we use NMI as the metric to evaluate the performance of our proposed method and the comparison algorithms. We fed these networks one by one to our proposed method and to the comparison algorithms to detect communities, and we report the average NMI as the resulting metric. The results on the small networks with small-sized and big-sized communities are illustrated in Figures 2(a) and 2(b), respectively; the results on the larger networks with small-sized and big-sized communities are presented in Figures 3(a) and 3(b), respectively.
In Figures 2(a) and 2(b), FastQ tends to introduce mistakes into the results regardless of whether the communities in the networks are well separated or obscure. As mentioned previously, FastQ is a typical modularity-optimization-based algorithm; it aims only at acquiring results with larger modularity rather than high accuracy. In our experiments none of the results uncovered by it is satisfactory. Even on the networks with $\mu = 0.1$ it failed to identify the exact communities, and its performance is the worst among the comparison algorithms for $\mu \leq 0.5$. For $\mu > 0.5$ the quality of its results is better only than that of LPA. LPA performed as well as the other comparison algorithms on the networks with $\mu < 0.5$, but its performance dropped dramatically for $\mu \geq 0.5$, and it could not detect effective communities at all for $\mu > 0.6$. This might be due to its label-update mechanism: when the community boundaries become obscure, nodes tend to adopt incorrect labels, which leads to trivial results in which all nodes may even be labeled as members of one giant community. The proposed method NSA acquired NMI = 1 on all networks with $\mu < 0.5$, meaning that the detected partitions perfectly match the ground-truth community structures of these networks. For $\mu = 0.5$ NSA obtained results as good as those of WalkTrap, Attractor, and IsoFdp. For $\mu > 0.5$ the quality of the detected community structures slips for all three of those algorithms and for the proposed method. For $0.5 < \mu \leq 0.6$ the quality of our proposal is better than that of Attractor on networks with larger communities, and for $\mu \geq 0.7$ the performance of our proposed method is the best.
In Figures 3(a) and 3(b) we obtained results similar overall to those in Figure 2, but they differ in some respects. In Figure 3(a) our proposed method performed the best on almost all networks. For $0.5 < \mu < 0.7$ in Figure 2, the NMI of the results extracted by our proposed method is lower than those of WalkTrap and IsoFdp; in Figure 3, however, the proposed method performed better than IsoFdp for $\mu > 0.5$. These results suggest that the performance of the comparison algorithms is not stable across different networks, whereas our proposed method can steadily extract high-quality community structures from networks with different characteristics. This is also evident from the fact that all the curves of the proposed method in these figures decline more slowly than the others. Moreover, comparing the proposal's own curves across these figures, we can conclude that our proposed method tends to perform better on larger networks with small communities; it therefore overcomes the resolution-limit problem to some extent.
Figure 3: Comparison of different community detection algorithms (FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposed NSA) on LFR benchmark networks containing 5000 nodes, plotted as NMI against the mixing parameter $\mu$ from 0.1 to 0.8. (a) The results extracted from the larger networks with small-sized communities. (b) The results revealed from the larger networks with big-sized communities.

4.4. Real-World Networks. We also carried out experiments on 13 real-world networks to further test the effectiveness and efficiency of our proposed method. As mentioned in Section 4.1, these networks fall into two categories: those whose ground-truth community structure is known a priori, and those without a publicly acknowledged ground truth.

Figure 4: The karate club network. (a) The ground-truth community structure. (b) The community structure detected by our proposed method NSA. (The nodes in different communities are plotted in different colors and shapes; this illustration style is also applied in the subsequent figures.)
Networks with Ground-Truth Community Structure. This category includes the first 4 networks listed in Table 2. Since their ground-truth community structure is already known, we measure the quality of the community structures identified by the proposed method and the comparison algorithms in terms of both NMI and modularity. The values of the two metrics obtained by the proposed method and the comparison algorithms are recorded in Table 3. These networks are relatively small, which makes it easy to visualize the detected results. Below we analyze the results extracted by the proposed method from each of these networks.
The Karate Club Network. This network depicts the friendships among members of a karate club; it contains 34 nodes and 78 edges. It was compiled by Wayne W. Zachary, who observed the karate club for 3 years. During the period of Zachary's study, the club split into two factions because of a dispute between the administrator and the instructor. Corresponding to the two factions, the partition into two communities is usually taken as the ground truth of this network, which is shown in Figure 4(a). The result detected by our proposed method is presented in Figure 4(b).
From Figure 4 we can see that our proposed method detected 3 rather than 2 communities in the network. The detected result thus deviates from the ground truth in some ways, but it coincides with the conclusion drawn from the experiments on synthetic networks: our proposed method tends to find small communities, which helps it overcome the resolution-limit problem. Moreover, in terms of the measure metrics, the modularity of the detected result is the largest among all the algorithms compared. Although our proposed method is not based on modularity optimization, it tends to acquire a community structure with as large a modularity as possible; where its modularity is not the largest, it is the second largest, with only a small gap to the largest. These findings are also borne out on the networks that follow.

Figure 5: The dolphin social network. (a) The ground-truth community structure. (b) The community structure identified by our proposed method NSA.

Table 3: The experimental results on networks with ground-truth community structures. The largest values of the two measure metrics are typed in bold.

| Network | Metric | FastQ | WalkTrap | LPA | Attractor | IsoFdp | NSA |
|---|---|---|---|---|---|---|---|
| Karate | $Q$ | 0.381 | 0.353 | 0.355 | 0.371 | 0.371 | **0.402** |
| Karate | NMI | 0.693 | 0.504 | 0.62 | 0.924 | **1.00** | 0.699 |
| Dolphin | $Q$ | 0.492 | 0.489 | 0.464 | 0.45 | 0.505 | **0.513** |
| Dolphin | NMI | 0.719 | 0.632 | 0.719 | 0.69 | 0.744 | **0.887** |
| Risk map | $Q$ | **0.625** | 0.624 | 0.59 | 0.598 | 0.519 | 0.624 |
| Risk map | NMI | **0.894** | 0.848 | 0.821 | 0.839 | 0.714 | 0.848 |
| Scientists | $Q$ | **0.749** | 0.733 | 0.64 | 0.694 | 0.668 | 0.744 |
| Scientists | NMI | 0.867 | 0.818 | 0.743 | 0.835 | 0.823 | **0.878** |
Lusseau's Dolphin Social Network. This network describes the interactions of a group of dolphins living in Doubtful Sound, New Zealand. It consists of 62 nodes and 159 edges, which represent individual dolphins and the observed co-occurrences of pairs of dolphins, respectively. This network is generally partitioned into 4 groups as the ground-truth community structure, which is shown in Figure 5(a). Figure 5(b) shows the community structure uncovered by our proposed method.
In Figure 5 our proposed method detected the communities of this network with a high degree of success: it also identified 4 communities, the vast majority of nodes are classified into the correct communities, and the result comes close to the ground-truth community structure. Quantitatively, both the NMI and the modularity of the result detected by the proposed method on this network are the largest among all the algorithms compared, which means that the community structure identified by the proposed method is better than those of the comparison algorithms.
Risk Map Network. This network is the world political map used in the popular game Risk (https://en.wikipedia.org/wiki/Risk_(game)), which involves 42 countries or territories on 6 continents. Accordingly, its 42 nodes and the 83 edges connecting adjacent countries or territories are organized into 6 communities as the ground truth, which is illustrated in Figure 6(a). Feeding this network into the proposed method, we obtained the community structure shown in Figure 6(b).
Comparing the detected result with the ground-truth community structure, the community containing nodes '18' and '23' in the ground truth is split into two small communities in Figure 6(b), owing to the tendency of the proposed method to find small communities. In addition, nodes '26', '33', and '34' are classified into the wrong communities in the detected result. However, nodes '12', '16', '26', '33', and '34' are special in this network: the outer edges associated with them are no fewer, or even more, than the edges within the communities to which they belong. Therefore, if we ignore what these nodes actually represent and judge qualitatively on topology alone, the community structure extracted by our proposed method is more rational than the ground truth: more of the edges associated with these three nodes lie within their community than in the ground truth, so these three nodes are connected more tightly to the nodes of their own community in Figure 6(b). Quantitatively, the values of both measure metrics for our proposed method are second only to those of FastQ and equal to those of WalkTrap. These results also confirm that our proposed method provides an acceptable solution to the problem of community detection.

Table 4: The experimental results of modularity on networks. The largest modularity value on each network is typed in bold.

| Network | FastQ | WalkTrap | LPA | Attractor | IsoFdp | NSA |
|---|---|---|---|---|---|---|
| Lesmis | 0.499 | 0.519 | 0.515 | 0.498 | 0.491 | **0.54** |
| Polbooks | 0.502 | 0.507 | 0.508 | 0.501 | 0.518 | **0.524** |
| ColiNeta | **0.779** | 0.746 | 0.693 | 0.718 | - | 0.761 |
| Email | 0.499 | 0.531 | 0.379 | 0.464 | 0.531 | **0.544** |
| NetScience | 0.955 | 0.956 | 0.896 | 0.937 | - | **0.957** |
| YeastL | 0.573 | 0.529 | 0.372 | 0.511 | - | **0.574** |
| PGP | 0.85 | 0.789 | 0.765 | 0.768 | 0.726 | **0.867** |
| DBLP | 0.735 | - | 0.652 | 0.637 | - | **0.782** |
| Amazon | 0.869 | - | 0.743 | 0.741 | - | **0.898** |

Figure 6: Risk map network. (a) The ground-truth community structure. (b) The community structure uncovered by our proposed method NSA.
Scientists Collaboration Network. This is the largest connected component of a network delineating the coauthor relationships among scientists working at the Santa Fe Institute, New Mexico. Nodes in this network represent scientists, and an edge connects two scientists who have collaborated on at least one paper. There are 118 nodes and 197 edges in total in this network. The nodes can be divided into 6 groups as the ground-truth communities according to the specialties of the scientists, as presented in Figure 7(a). Taking this network as the input to the proposed method, we obtained the community structure illustrated in Figure 7(b).
The proposed method revealed 8 communities in this network, i.e., two additional communities are detected in Figure 7(b). These two communities are relatively independent components; this holds especially for the community containing node '1', which has many more inner edges than outer edges. That is to say, the nodes in these two communities are connected more tightly to one another than to the remainder of the network, so isolating them and taking them as independent communities is also reasonable. In terms of the measure metrics, the NMI obtained by the proposed method is the largest, which suggests that the result detected by our proposal is the one closest to the ground-truth community structure; the modularity of the proposed method is not the largest, but it is second only to that of FastQ. These results also show that our proposed method can extract high-quality community structures from networks.
Networks without Ground-Truth Community Structure. This category contains the last 9 real-world networks listed in Table 2. For the experiments on these networks, we evaluate the quality of the extracted community structures using modularity only, owing to the absence of ground-truth community structures. The modularity values obtained by the proposed method and the comparison algorithms are recorded in Table 4. To illustrate them intuitively, we also plot them as a bar chart in Figure 8.

Figure 7: The collaboration network of scientists working at the Santa Fe Institute. (a) The ground-truth community structure. (b) The community structure detected by our proposed NSA algorithm.

Figure 8: The bar chart of the modularity $Q$ (vertical axis, 0.0 to 1.0) obtained by the comparison algorithms (FastQ, WalkTrap, LPA, Attractor, IsoFdp) and the proposed method NSA on the networks Lesmis, Polbooks, ColiNeta, Email, NetScience, YeastL, PGP, DBLP, and Amazon.
On 8 of these networks our proposed method achieved the largest modularity. On the remaining one, ColiNeta, it still obtained the second largest modularity. FastQ is based on a modularity-optimization strategy, yet it acquired the largest modularity on ColiNeta only. WalkTrap is based on random walks, so its time complexity is relatively high; it could not produce effective results on the Amazon and DBLP networks because of their large scale. LPA and Attractor can extract community structures from all of these networks, but the quality of the detected results is not satisfactory. IsoFdp can only be applied to connected networks and cannot run on ColiNeta, NetScience, and YeastL, as these three networks are disconnected; it cannot detect the community structure of Amazon and DBLP effectively either, because of their large scale. These comparisons show that our proposed method can steadily, effectively, and efficiently provide promising solutions to the problem of community detection in networks from a wide range of applications, and that it significantly outperforms the comparison algorithms.
Figure 9: The setting of parameter $\delta$. Each panel plots the modularity $Q$ against $\delta \in [0, 0.3]$: (a) the karate club network; (b) the dolphin social network; (c) the risk map network; (d) the scientists collaboration network.
5 Parameter Setting
In the second phase of the proposed method, we introduce a threshold $\delta$ on the community metric to identify the preliminary communities that need to be merged. As mentioned above, we calculate the community metric $\gamma_i = \alpha_i \times \beta_i$ for every preliminary community $C_i$ in the merge procedure; if the value of $\gamma_i$ is below the threshold $\delta$, the community $C_i$ is identified as one that needs to be merged.
Therefore, $\delta$ works as a parameter of our proposed method, and its setting influences the quality of the resulting community structure. Qualitatively, the larger or the sparser the network is, the smaller the threshold $\delta$ should be, in accordance with the definitions of community sparsity ($\alpha_i$), community scale ($\beta_i$), and community metric ($\gamma_i$). To determine the optimal value of $\delta$, we conduct a group of experiments that explore the relationship between the value of $\delta$ and the quality of the resulting community structure on the first four networks listed in Table 2, namely, the karate club network, the dolphin social network, the map of the game Risk, and the scientists collaboration network. The quality of the resulting community structure is measured in terms of modularity $Q$. We vary $\delta$ from 0 to 1.0 in increments of 0.005; for each value of $\delta$, we run our proposed method on these networks and observe how the modularity changes as $\delta$ varies.
The observed results are illustrated in Figure 9, in which we plot only the portion $\delta \in [0, 0.3]$, because the largest modularities are obtained for $\delta \leq 0.3$ on all four networks. Our proposed method attains the largest modularity when $\delta = 0.13$ on the dolphin social network and when $\delta = 0.1$ on the other three networks. Therefore, we adopt the corresponding value for each of those four networks and empirically set $\delta = 0.1$ for the other networks in the experiments. In Figure 9, the largest modularity is obtained around $\delta = 0.1$, and the interval $[0.05, 0.2]$ covers the optimal value of $\delta$. Therefore, we empirically suggest that $\delta$ be adjusted around 0.1 within the range $[0.05, 0.2]$, according to the size and the sparsity of the networks involved in real-world applications.
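The parameter sweep described in this section can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: `detect` is a hypothetical callable standing in for an NSA implementation that takes an adjacency mapping and a value of $\delta$ and returns a list of node sets, and `modularity` is the standard modularity for undirected, unweighted graphs with at least one edge.

```python
def modularity(adj, communities):
    """Modularity Q of a partition; adj maps each node to its neighbor set."""
    two_m = sum(len(nbrs) for nbrs in adj.values())  # every edge counted twice
    q = 0.0
    for c in communities:
        # Each inner edge is seen from both endpoints, so this equals 2 * l_c.
        inner_twice = sum(1 for u in c for v in adj[u] if v in c)
        degree_sum = sum(len(adj[u]) for u in c)
        q += inner_twice / two_m - (degree_sum / two_m) ** 2
    return q


def sweep_delta(adj, detect, step=0.005, upper=1.0):
    """Try delta = 0, step, 2*step, ..., upper and keep the best modularity.

    `detect(adj, delta)` is a hypothetical stand-in for the NSA method.
    Ties are resolved in favor of the smallest delta.
    """
    best_delta, best_q = None, float("-inf")
    n_steps = round(upper / step)
    for i in range(n_steps + 1):
        delta = i * step
        q = modularity(adj, detect(adj, delta))
        if q > best_q:
            best_delta, best_q = delta, q
    return best_delta, best_q
```

Computing each value of $\delta$ as `i * step` rather than by repeated addition avoids accumulating floating-point error over the 200 steps.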
6 Conclusion
In this paper, we presented a novel method to detect communities in networks. It is a local method based on node similarity, and it overcomes the high time consumption of global methods. First, we construct the preliminary community structure by repeatedly selecting the node with the largest degree and deciding, on the basis of the community assignment of its most similar neighbor, whether to take it as the exemplar of a new community or to insert it into an existing one: if its most similar neighbor has not been assigned to any community yet, we create a new community for it and its most similar neighbor; if its most similar neighbor has been assigned to a certain community, we insert it into that community as well. At the end of this process, we obtain a series of preliminary communities. However, some of them might be too small or too sparse, leading to a low-quality result. Therefore, we merge some of the preliminary communities to acquire the final community structure. To do so, we also proposed some indexes that take both the size and the sparsity of communities into account to determine which communities should be merged.
To test the performance of the proposed method, we performed extensive experiments on four groups of synthetic networks and 13 real-world networks, and compared the detected community structures with the results extracted by the comparison algorithms in terms of NMI and modularity. The comparison results demonstrate that our proposed method can extract high-quality community structures from networks abstracted from various applications, and that the nodes in the extracted communities are connected more tightly. The proposed method overcomes the resolution limit problem to some extent and outperforms the competitors.
Data Availability
We have conducted experiments on some artificial networks and some real-world datasets. The artificial networks are synthesized using the LFR benchmark network generator, which is freely available at https://sites.google.com/site/santofortunato. The parameters used to synthesize the artificial networks are listed in Table 1. The real-world data supporting this study are from previously reported studies, which are cited in Table 2. Most of the real-world datasets can also be downloaded from http://www-personal.umich.edu/~mejn/netdata and https://snap.stanford.edu/data/index.html. The ColiNeta dataset was provided by Jeong et al. [18]. We constructed the Risk Map network manually according to the literature [16].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant ID 61602225).
References
[1] J. Kleinberg and S. Lawrence, "Network analysis: The structure of the web," Science, vol. 294, no. 5548, pp. 1849–1850, 2001.
[2] P. Chen and S. Redner, "Community structure of the physical review citation network," Journal of Informetrics, vol. 4, no. 3, pp. 278–290, 2010.
[3] M. E. J. Newman, "Modularity and community structure in networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 23, pp. 8577–8582, 2006.
[4] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabási, "Hierarchical organization of modularity in metabolic networks," Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[5] R. Guimerà and L. A. N. Amaral, "Functional cartography of complex metabolic networks," Nature, vol. 433, no. 7028, pp. 895–900, 2005.
[6] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.
[7] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 2, Article ID 026113, 2004.
[8] P. M. Gleiser and L. Danon, "Community structure in jazz," Advances in Complex Systems (ACS), vol. 6, no. 4, pp. 565–573, 2003.
[9] Y. van Gennip, B. Hunter, R. Ahn et al., "Community detection using spectral clustering on sparse geosocial data," SIAM Journal on Applied Mathematics, vol. 73, no. 1, pp. 67–83, 2013.
[10] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[11] S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
[12] S. Fortunato and D. Hric, "Community detection in networks: a user guide," Physics Reports, vol. 659, pp. 1–44, 2016.
[13] B. W. Kernighan and S. Lin, "An efficient heuristic procedure for partitioning graphs," Bell Labs Technical Journal, vol. 49, no. 1, pp. 291–307, 1970.
[14] W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[15] D. Lusseau, "The emergent properties of a dolphin social network," in Proceedings of the Royal Society of London B: Biological Sciences, vol. 270, supplement 2, pp. S186–S188, 2003.
[16] K. Steinhaeuser and N. V. Chawla, "Identifying and evaluating community structure in complex networks," Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[17] M. E. J. Newman, "The structure and function of complex networks," SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
[18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási, "The large-scale organization of metabolic networks," Nature, vol. 407, no. 6804, pp. 651–654, 2000.
[19] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, "Self-similar community structure in a network of human interactions," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[20] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network motifs: simple building blocks of complex networks," Science, vol. 298, no. 5594, pp. 824–827, 2002.
[21] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, "Models of social networks based on social distance attachment," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 5, Article ID 056122, 2004.
[22] J. Yang and J. Leskovec, "Defining and evaluating network communities based on ground-truth," Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.
[23] M. E. J. Newman, "Fast algorithm for detecting community structure in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 6, Article ID 066133, 2004.
[24] A. Clauset, M. E. J. Newman, and C. Moore, "Finding community structure in very large networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 6, Article ID 066111, 2004.
[25] F. Dabaghi Zarandi and M. Kuchaki Rafsanjani, "Community detection in complex networks using structural similarity," Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 882–891, 2018.
[26] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, Article ID P10008, 2008.
[27] L. Waltman and N. J. Van Eck, "A smart local moving algorithm for large-scale modularity-based community detection," The European Physical Journal B, vol. 86, no. 11, article 471, pp. 1–14, 2013.
[28] U. N. Raghavan, R. Albert, and S. Kumara, "Near linear time algorithm to detect community structures in large-scale networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 76, no. 3, Article ID 036106, 2007.
[29] M. J. Barber and J. W. Clark, "Detecting network communities by propagating labels under constraints," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 2, Article ID 026129, 2009.
[30] J. Hou Chin and K. Ratnavelu, "A semi-synchronous label propagation algorithm with constraints for community detection in complex networks," Scientific Reports, vol. 7, Article ID 45836, 2017.
[31] J. Ding, X. He, J. Yuan, Y. Chen, and B. Jiang, "Community detection by propagating the label of center," Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 675–686, 2018.
[32] A. Laio and A. Rodriguez, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[33] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger, "SCAN: A structural clustering algorithm for networks," in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), pp. 824–833, ACM, New York, NY, USA, August 2007.
[34] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD'96), pp. 226–231, AAAI Press, 1996.
[35] H. Shiokawa, Y. Fujiwara, and M. Onizuka, "Scan++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs," in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1178–1189, Republic of Korea, September 2006.
[36] T. You, H.-M. Cheng, Y.-Z. Ning, B.-C. Shia, and Z.-Y. Zhang, "Community detection in complex networks using density-based clustering algorithm and manifold learning," Physica A: Statistical Mechanics and its Applications, vol. 464, pp. 221–230, 2016.
[37] X. Wang, G. Liu, J. Li, and J. P. Nees, "Locating structural centers: A density-based clustering method for community detection," PLoS ONE, vol. 12, no. 1, Article ID e0169355, 2017.
[38] P. Pons and M. Latapy, "Computing communities in large networks using random walks," in International Symposium on Computer and Information Sciences, pp. 284–293, 2005.
[39] S. A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, and M. A. Tavallaie, "Personalized PageRank clustering: a graph clustering algorithm based on random walks," Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772–5785, 2013.
[40] Y. Su, B. Wang, and X. Zhang, "A seed-expanding method based on random walks for community detection in networks with ambiguous community structures," Scientific Reports, vol. 7, Article ID 41830, 2017.
[41] J. Shao, Z. Han, Q. Yang, and T. Zhou, "Community detection based on distance dynamics," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1075–1084, ACM, Australia, August 2015.
[42] H.-L. Sun, E. Ch'ng, X. Yong, J. M. Garibaldi, S. See, and D.-B. Chen, "A fast community detection method in bipartite networks by distance dynamics," Physica A: Statistical Mechanics and its Applications, vol. 496, pp. 108–120, 2018.
[43] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, "Pseudo-likelihood methods for community detection in large sparse networks," The Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[44] S. C. de Lange, M. A. de Reus, and M. P. van den Heuvel, "The laplacian spectrum of neural networks," Frontiers in Computational Neuroscience, vol. 7, no. 189, 2014.
[45] F. Krzakala, C. Moore, E. Mossel et al., "Spectral redemption in clustering sparse networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 52, pp. 20935–20940, 2013.
[46] P. Shi, K. He, D. Bindel, and J. E. Hopcroft, "Local Lanczos Spectral Approximation for Community Detection," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 10534 of Lecture Notes in Computer Science, pp. 651–667, Springer International Publishing, 2017.
[47] R. Tackx, F. Tarissan, and J. Guillaume, "ComSim: a bipartite community detection algorithm using cycle and node's similarity," in International Workshop on Complex Networks and their Applications, vol. 689 of Studies in Computational Intelligence, pp. 278–289, Springer International Publishing, 2017.
[48] T. Wang, L. Yin, and X. Wang, "A community detection method based on local similarity and degree clustering information," Physica A: Statistical Mechanics and its Applications, vol. 490, pp. 1344–1354, 2018.
[49] K. R. Zalik, "Maximal neighbor similarity reveals real communities in networks," Scientific Reports, vol. 5, Article ID 18374, 2015.
[50] A. Lancichinetti, S. Fortunato, and F. Radicchi, "Benchmark graphs for testing community detection algorithms," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 78, no. 4, Article ID 046110, 2008.
[51] L. Ana and A. Jain, "Robust data clustering," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-128–II-133, Madison, WI, USA, 2003.
Input: $G(V, E)$, the network; $\delta$, the community metric threshold
Output: $CS$, the detected community structure
/* form the preliminary community structure $CS_{pre}$ */
1: $CS_{pre} \leftarrow$ FPC($G$)
/* merge small or sparse communities in $CS_{pre}$ */
2: $CS \leftarrow$ PCM($CS_{pre}$, $\delta$)
3: return $CS$
Algorithm 1: The framework of our proposed method NSA.
which one of its $k$ most similar neighbors with the lowest degree belongs, and assigning the node to that community. In this procedure, the common neighbor index is employed as the similarity measure for each pair of nodes.
Compared to global methods, these local methods perform well on large-scale networks. Inspired by this, we also propose a local method to extract communities from networks. The proposed method is based on node similarity and is termed NSA (Node Similarity based Algorithm) for short. It comprises two phases: the first phase constructs the preliminary community structure; the second phase improves the quality of the final result by merging some small or sparse communities. To do so, we also propose a measure, the community metric, to evaluate the sparsity or smallness of communities. The details of the proposed method are elaborated in the next section.
3 The Proposed Method
3.1. The Framework of the Proposed Method. The framework of the proposed method is outlined by the pseudocode listed in Algorithm 1.
As mentioned previously, the proposed method consists of two phases, implemented by the function calls FPC() and PCM(), respectively. The former establishes the preliminary community structure based on a node selection strategy and node similarity; the latter merges some small or sparse communities to improve the quality of the resulting community structure. The inputs of this algorithm are the network and a threshold $\delta$. The networks considered in this paper are undirected and unweighted graphs, represented as $G(V, E)$ as in Algorithm 1, where $V$ and $E$ are the node set and the edge set, respectively, and $|V| = n$ and $|E| = m$ are the numbers of nodes and edges in the network. The threshold $\delta$ is used in the second phase of the proposed method to identify the communities to be merged: a community whose community metric is smaller than $\delta$ should be merged into another one. The output of this algorithm is the detected community structure.
The next two subsections describe the two procedures in detail.
3.2. Formation of the Preliminary Community Structure. The function FPC() implements the first phase of the proposed method, whose purpose is to construct the preliminary community structure from the network. We first pick out the node with the largest degree in the network, take it as the exemplar of the first community, and insert its most similar neighbor into the community as well. (If more than one node has the largest degree, we arbitrarily select one of them as the exemplar; if the exemplar has more than one most similar neighbor, the one with the smallest degree is selected.) Afterwards, the node with the largest degree among the remaining nodes is selected. If its most similar neighbor has not been classified into any community yet, we create a new community for it and its most similar neighbor. Otherwise, if its most similar neighbor has been assigned to a certain community (say, $C_k$), we insert the selected node into that community as well. We repeat this process until every node is classified into a community. In this procedure, densely connected nodes quickly gather around the exemplars to form communities. At the end of this procedure, we get a series of communities, which constitute the preliminary community structure of the network. The pseudocode describing the entire procedure is listed in Algorithm 2.
In this algorithm, the degree of node $u$ is the number of $u$'s neighbors and is denoted as $d_u$, i.e.,

$$d_u = |\Gamma(u)|, \tag{1}$$

where

$$\Gamma(u) = \{v \mid (u, v) \in E,\ v \in V\} \tag{2}$$

is the set of neighbors of node $u$. $sim(u, v)$ stands for the similarity between nodes $u$ and $v$. There are many ways to calculate the similarity between nodes in a network, and any of them can be employed in principle. For efficiency, however, we calculate it as in the following equation, which involves only the neighborhoods of nodes $u$ and $v$ themselves:

$$sim(u, v) = \frac{|\Gamma(u) \cap \Gamma(v)|}{|\Gamma(u) \cup \Gamma(v)|}. \tag{3}$$
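Equation (3) is the Jaccard index of the two neighbor sets. A minimal Python sketch follows, assuming the graph is stored as a dictionary `adj` that maps each node to the set of its neighbors; this representation and the function name are illustrative choices, not part of the original method description.

```python
def jaccard_similarity(adj, u, v):
    """Equation (3): |N(u) & N(v)| / |N(u) | N(v)| for neighbor sets N."""
    union = adj[u] | adj[v]
    if not union:
        # Assumption: two isolated nodes are given similarity 0.
        return 0.0
    return len(adj[u] & adj[v]) / len(union)


# Toy graph: a triangle 1-2-3 with a pendant node 4 attached to node 3.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}}
print(jaccard_similarity(adj, 1, 2))  # one common neighbor out of three
```

Note that for two adjacent nodes, each appears in the other's neighbor set but in neither intersection, so a pair such as nodes 3 and 4 above has similarity 0 despite being linked.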
The variables $U$ and $CS_{pre}$ are used to record the unclassified nodes and the preliminary community structure; they are initialized to the original node set $V$ of network $G$ and the empty set $\emptyset$, respectively, in step 1. Steps 2 and 3 select the node with the largest degree from the remainder of the network and its most similar neighbor, and denote them as $v$ and $w$, respectively. Step 4 determines whether $w$ has been assigned to a community or not. If it has not been classified into any community yet, steps 5 and 6 create a new community for nodes $v$ and $w$ and insert the newly created community into $CS_{pre}$; then step 7 removes nodes $v$ and $w$ from $U$, as they have just been classified into the new community. If node $w$ has already been assigned to a community, step 9 finds the community $C_k$ to which $w$, the most similar neighbor of node $v$, belongs, and step 10 inserts node $v$ into community $C_k$. Since node $v$ has been assigned to community $C_k$, step 11 removes it from $U$. Step 12 repeats the operations in steps 2 through 11 until $U = \emptyset$, meaning that all the nodes in the network have been visited. At that time, the preliminary community structure is held in $CS_{pre}$ and is returned as the output of this algorithm in step 13.

Input: $G(V, E)$, the network
Output: $CS_{pre} = \{C_1, C_2, \cdots, C_k\}$, the identified preliminary community structure
1: Initialize variables $U$ and $CS_{pre}$, which are used to record the unclassified nodes and the preliminary community structure:
   $U \leftarrow V$; $CS_{pre} \leftarrow \emptyset$
2: Select the node with the largest degree; denote it as $v$:
   $v \leftarrow \arg\max_u \{d_u \mid u \in U\}$
3: Get the most similar neighbor of $v$; denote it as $w$:
   $w \leftarrow \arg\max_u \{sim(v, u) \mid u \in \Gamma(v)\}$
4: if $w$ has not been assigned to any community then
5:    Create a new community for nodes $v$ and $w$:
      $K \leftarrow |CS_{pre}|$; $C_{K+1} \leftarrow \{v, w\}$
6:    Insert the created community into the community structure:
      $CS_{pre} \leftarrow CS_{pre} \cup \{C_{K+1}\}$
7:    Remove nodes $v$ and $w$ from $U$, as they are classified:
      $U \leftarrow U - \{v, w\}$
8: else
9:    Find the community to which $w$ belongs; denote it as $C_k$:
      $k \leftarrow$ locate($CS_{pre}$, $w$)
10:   Insert node $v$ into $C_k$:
      $C_k \leftarrow C_k \cup \{v\}$
11:   Remove node $v$ from $U$, as it is classified:
      $U \leftarrow U - \{v\}$
12: Repeat steps 2 through 11 until $U = \emptyset$
13: return $CS_{pre}$
Algorithm 2: FPC($G$), forming the preliminary community structure.
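The steps of Algorithm 2 can be sketched in Python as follows. This is an illustrative reading of the pseudocode, not the authors' implementation. It assumes a simple graph (no self-loops) stored as a dictionary `adj` from node to neighbor set, breaks degree ties by the smallest node identifier (the text allows any choice), and places an isolated node in a singleton community, a case the pseudocode does not cover. A linear scan replaces the max-heap for brevity.

```python
def jaccard(adj, u, v):
    """Node similarity of equation (3)."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def fpc(adj):
    """Form the preliminary communities; returns a list of node sets."""
    unclassified = set(adj)   # U in Algorithm 2
    label = {}                # node -> index of its community
    communities = []          # CS_pre in Algorithm 2
    while unclassified:
        # Step 2: largest degree first; ties go to the smallest identifier.
        v = min(unclassified, key=lambda u: (-len(adj[u]), u))
        if not adj[v]:
            # Assumption: an isolated node forms its own community.
            label[v] = len(communities)
            communities.append({v})
            unclassified.discard(v)
            continue
        # Step 3: most similar neighbor; ties go to the smallest degree.
        w = min(adj[v], key=lambda u: (-jaccard(adj, v, u), len(adj[u]), u))
        if w not in label:
            # Steps 5-7: open a new community for v and w.
            label[v] = label[w] = len(communities)
            communities.append({v, w})
            unclassified -= {v, w}
        else:
            # Steps 9-11: v joins the community of w.
            label[v] = label[w]
            communities[label[w]].add(v)
            unclassified.discard(v)
    return communities


# Two triangles, 1-2-3 and 4-5-6, joined by the edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(fpc(adj))
```

On this toy graph, nodes 3 and 4 are selected first and each opens a community with a neighbor from its own triangle; nodes 2 and 6 then join the community of their most similar neighbor.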
To make this clearer, we take Zachary's karate club network [14] as an example to illustrate the procedure. This network has 34 nodes and 78 edges, as shown in Figure 1(a). The node with the largest degree is node '34', and its most similar neighbor is node '33'. Therefore, node '34' is taken as the exemplar of the first community, and node '33' is also inserted into this community. Next, the node with the largest degree among the remaining nodes is node '1', whose most similar neighbor is node '2'. Since node '2' has not been assigned to a community yet, we create a new community, take node '1' as its exemplar, and insert node '2' into the new community as well. The same happens to the node pairs ('3', '4'), ('32', '29'), and ('9', '31') in sequence. The next largest-degree node is '14'; its most similar neighbor, node '4', is already in the third community, so we insert node '14' into the third community. All the other nodes are processed in the same way. In the subsequent operations, the node pairs ('24', '30'), ('6', '7'), ('5', '11'), and ('25', '26') form new communities, and all of the remaining nodes are inserted into the communities to which their most similar neighbors belong. At the end of the process, we obtain the preliminary community structure shown in Figure 1(b), in which each node is connected to its most similar neighbor by a directed edge.
3.3. Merge of Small or Sparse Communities. At the end of the first phase of our proposed method, we obtain the preliminary community structure. However, some communities are either too small or too sparse to be meaningful, such as the preliminary communities {'5', '11'}, {'9', '31'}, {'32', '29'}, {'25', '26', '28'}, {'24', '30', '27'}, and {'6', '7', '17'} in Figure 1(b). Each of them contains only a few nodes and very few internal edges; the number of edges inside each of them is much smaller than the number of edges connecting it to the outside, which violates the characteristic that connections inside a community are much denser than those across different communities. Keeping them in the final community structure would lead to a low-quality result. Therefore, in the second phase we merge some of the preliminary communities to acquire the final result; this is carried out by the function call PCM() in Algorithm 1.

To this end, two problems need to be solved in PCM(). The first is to identify which communities are small or sparse enough that they need to be merged into other ones; the second is to select the community into which each of the small or sparse communities should be merged.
Figure 1: The procedure of FPC() on the karate club network (panels (a) and (b)).
For the first problem, we propose an index, the community metric, which takes two factors into account, community size and community sparsity, to find the preliminary communities that need to be merged. We formalize the relevant concepts and the index in Definitions 1 through 3.
Definition 1 (community sparsity). The sparsity of community $C_i$ is defined as follows:

$$\alpha_i = \frac{|E_i^{in}|}{|E_i^{out}|}, \tag{4}$$

where $E_i^{in}$ is the set of edges within community $C_i$, and $E_i^{out}$ is the set of edges connecting nodes in community $C_i$ with other communities.

That is to say, the sparsity of community $C_i$ is defined as the ratio of the number of inner edges of $C_i$ to the number of outer edges of $C_i$. The more edges there are within community $C_i$, the larger the value of $\alpha_i$ will be, and vice versa.
Definition 2 (community scale). The scale of community $C_i$ is formalized as follows:

$$\beta_i = \frac{|V_i|}{|V|}, \tag{5}$$

where $V_i$ is the set of nodes in community $C_i$.

That is, the scale of community $C_i$ is defined as the ratio of the number of nodes in $C_i$ to the total number of nodes in the network. The more nodes there are in community $C_i$, the larger the ratio will be, and vice versa.
Definition 3 (community metric). The community metric is a combination of the community sparsity and the community scale, which is defined for community $C_i$ as follows:

$$\gamma_i = \alpha_i \times \beta_i. \tag{6}$$

On the basis of these definitions, the first problem can be solved by setting a community metric threshold $\delta$. That is to say, if $\gamma_i < \delta$, community $C_i$ needs to be merged into another community.
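Definitions 1 through 3 can be sketched in Python as follows. The dictionary `adj` (node to neighbor set) and the function name are illustrative. Equation (4) is undefined when a community has no outgoing edges; returning infinity in that case is an assumption made here, chosen so that such a community is never flagged for merging.

```python
import math


def community_metric(adj, community):
    """Return (alpha, beta, gamma) of one community, per equations (4)-(6)."""
    members = set(community)
    inner_twice = 0  # every inner edge is seen once from each endpoint
    outer = 0        # edges leaving the community
    for u in members:
        for v in adj[u]:
            if v in members:
                inner_twice += 1
            else:
                outer += 1
    inner = inner_twice // 2
    # Assumption: no outgoing edges -> alpha is taken as infinity.
    alpha = inner / outer if outer else math.inf
    beta = len(members) / len(adj)
    return alpha, beta, alpha * beta


# Two triangles, 1-2-3 and 4-5-6, joined by the edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(community_metric(adj, {1, 2, 3}))  # 3 inner edges, 1 outer edge, 3 of 6 nodes
print(community_metric(adj, {4, 5}))     # 1 inner edge, 3 outer edges, 2 of 6 nodes
```

With a threshold such as $\delta = 0.1$, the full triangle (metric 1.5) would be kept, whereas a community whose metric falls below $\delta$ would be a merge candidate.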
For the second problem, we adopt a strategy consistent with the construction of the preliminary communities. Since the preliminary communities are formed mainly on the basis of node similarity in the first phase, we also use similarity as the criterion for merging communities, i.e., each of the small or sparse communities is merged into its most similar adjacent community. Here, the similarity between two communities $C_i$ and $C_j$ is calculated as follows:

$$Sim(C_i, C_j) = \frac{\sum_{u \in C_i,\, v \in C_j} sim(u, v)}{|C_j|}, \tag{7}$$

where $sim(u, v)$ is the similarity between nodes $u \in C_i$ and $v \in C_j$, calculated using (3). In the function PCM(), which implements the merge procedure, $C_i$ is a community that needs to be merged and $C_j$ is one of its adjacent communities. The numerator on the right-hand side of (7) is the sum of the similarities between the nodes in $C_i$ and those in $C_j$. Dividing by $|C_j|$ tempers the advantage that larger communities would otherwise have, which prevents giant communities from forming.
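Equation (7) can be sketched in Python as follows, again assuming an adjacency dictionary `adj` and illustrative function names. The example makes one property of (7) visible: because only the size of the second community appears in the denominator, the measure is not symmetric in its two arguments.

```python
def jaccard(adj, u, v):
    """Node similarity of equation (3)."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def community_similarity(adj, ci, cj):
    """Equation (7): summed pairwise node similarity, divided by |C_j|."""
    total = sum(jaccard(adj, u, v) for u in ci for v in cj)
    return total / len(cj)


# Two triangles, 1-2-3 and 4-5-6, joined by the edge 3-4.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(community_similarity(adj, {3}, {1, 2}))  # sum 0.5 divided by 2
print(community_similarity(adj, {1, 2}, {3}))  # sum 0.5 divided by 1
```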
The logic of the entire second phase is listed in Algorithm 3; the operations are almost self-explanatory. The variable $CS$ is used to record the final community structure; it is initialized as the preliminary community structure $CS_{pre}$ in step 1. Step 2 calculates the community metric for each of the preliminary communities. Steps 3 and 4 select the community with the smallest community metric and its most similar community, step 5 merges them to yield a new community, and step 6 calculates the community metric for that new community. Step 7 replaces the two communities $C_t$ and $C_j$ with the new community in $CS$ to reflect the effect of the merge operation. Step 8 repeats the operations in steps 3 through 7 until the community metric of the selected community, which is the minimal one, is larger than the given threshold $\delta$, meaning that all the remaining communities are satisfactory. The merge procedure then terminates, and the resulting community structure in $CS$ is returned in step 9.
Input: $CS_{pre}$, the preliminary community structure; $\delta$, the community-metric threshold.
Output: $CS$, the final community structure.
1. Initialize $CS$, which is used to record the community structure:
   $CS \leftarrow CS_{pre}$
2. Calculate the community metric for each of the preliminary communities:
   foreach $C_i \in CS$ do $\gamma_i \leftarrow \alpha_i \times \beta_i$
3. Select the community with the minimal community metric and denote its index as $t$:
   $t \leftarrow \arg\min_i \{\gamma_i \mid i = 1, 2, \cdots, |CS|\}$
4. Identify the community most similar to $C_t$ and denote its index as $j$:
   $j \leftarrow \arg\max_i \{Sim(C_t, C_i) \mid i = 1, 2, \cdots, |CS|,\ i \neq t\}$
5. Merge communities $C_t$ and $C_j$ to form a new community:
   $k \leftarrow |CS|$; $C_{k+1} \leftarrow C_t \cup C_j$
6. Calculate the community metric for the new community:
   $\gamma_{k+1} \leftarrow \alpha_{k+1} \times \beta_{k+1}$
7. Replace the two communities $C_t$ and $C_j$ with the new community to reflect the merging effect:
   $CS \leftarrow CS - \{C_t, C_j\} \cup \{C_{k+1}\}$
8. Repeat steps 3 through 7 until $\gamma_t > \delta$.
9. return $CS$
Algorithm 3: PCM($CS_{pre}$, $\delta$): merge small or sparse communities.
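The steps of Algorithm 3 can be sketched in Python as follows. This is a minimal sketch rather than the authors' implementation: the community metric $\gamma = \alpha \times \beta$ and the community similarity of (7) are passed in as the hypothetical callables `metric` and `similarity`, because their ingredients are defined in earlier sections, and the selections in steps 3 and 4 use plain linear scans instead of the heap-based bookkeeping assumed in the complexity analysis.

```python
def pcm(cs_pre, delta, metric, similarity):
    """Merge small or sparse communities, following the steps of Algorithm 3.

    cs_pre     -- iterable of node collections (preliminary communities)
    delta      -- community-metric threshold
    metric     -- callable: community -> gamma (assumed to implement alpha * beta)
    similarity -- callable: (C_t, C_i) -> Sim(C_t, C_i), e.g. Eq. (7)
    """
    cs = [frozenset(c) for c in cs_pre]            # step 1
    gamma = [metric(c) for c in cs]                # step 2
    while len(cs) > 1:
        # Step 3: community with the minimal community metric.
        t = min(range(len(cs)), key=gamma.__getitem__)
        if gamma[t] > delta:                       # step 8: stopping rule
            break
        # Step 4: most similar other community.
        j = max((i for i in range(len(cs)) if i != t),
                key=lambda i: similarity(cs[t], cs[i]))
        merged = cs[t] | cs[j]                     # step 5
        # Step 7: drop C_t and C_j (higher index first), then add the union.
        for idx in sorted((t, j), reverse=True):
            del cs[idx]
            del gamma[idx]
        cs.append(merged)
        gamma.append(metric(merged))               # step 6
    return cs                                      # step 9
```

The extra `len(cs) > 1` guard is an addition of this sketch; it only prevents the loop from searching for a merge partner when a single community remains.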
3.4. Time Complexity. The proposed algorithm comprises two phases. The first forms the preliminary communities. The main time consumption in this phase lies in selecting the node with the largest degree (step 2 in Algorithm 2) and its most similar neighbor (step 3 in Algorithm 2). The former can be accomplished in $O(\log n)$ per iteration using a max-heap data structure; the latter can be done in $O(\log \langle d \rangle)$ with a max-heap, where $\langle d \rangle$ is the average degree of nodes in the network. Since $\langle d \rangle \ll n$, the time consumption of the first phase is $O(n \log n)$.
The second phase improves the quality of the resulting community structure by merging some of the small or sparse communities. Most of the time is spent on determining, in each iteration, the community that needs to be merged and its most similar adjacent community. Assuming there are $K$ communities in the preliminary community structure, the former operation can be implemented in $O(\log K)$, and the latter can also be carried out in $O(\log K)$ time in the worst case. Hence, the second phase can be implemented in $O(K \log K)$ time.
Since $K \ll n$, we have $\log K \ll \log n$. Therefore, the overall cost is dominated by the first phase, and the proposed method can detect communities from networks with a relatively high efficiency of $O(n \log n)$ time complexity.
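The heap-based selection behind the $O(n \log n)$ bound of the first phase can be illustrated with Python's standard `heapq` module. The sketch below is an assumption-laden illustration, not the authors' code: `heapq` is a min-heap, so degrees are negated to emulate a max-heap, and the hypothetical helper `nodes_by_descending_degree` only shows the selection order, not the community assignment itself.

```python
import heapq


def nodes_by_descending_degree(adj):
    """Yield (node, degree) pairs, largest degree first.

    adj maps each node to the set of its neighbors. Building the heap is
    O(n); each pop is O(log n), so visiting all nodes costs O(n log n).
    """
    heap = [(-len(neighbors), node) for node, neighbors in adj.items()]
    heapq.heapify(heap)
    while heap:
        neg_degree, node = heapq.heappop(heap)
        yield node, -neg_degree


if __name__ == "__main__":
    adj = {"a": {"b", "c", "d"}, "b": {"a", "c"}, "c": {"a", "b"}, "d": {"a"}}
    for node, degree in nodes_by_descending_degree(adj):
        print(node, degree)
```

Ties in degree are broken by the node label here, simply because tuples are compared element by element; the paper does not state a tie-breaking rule in this section.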
4. Experimental Results and Discussion
4.1. Network Datasets and Comparison System. To test the performance of our proposed method, we conducted extensive experiments on both groups of artificial networks and real-world networks. The artificial networks are synthesized using the LFR benchmark network generator [50], which uses several parameters to control the characteristics of the generated networks. Here, we consider the influence of both network scale and community size; therefore, four types of networks are generated: small networks with small communities, small networks with big communities, larger networks with small communities, and larger networks with big communities. The small and larger networks contain 1000 and 5000 nodes, respectively. A small community contains at least 10 and at most 50 nodes; the minimum and maximum numbers of nodes in a big community are 20 and 100, respectively. The generated networks with small and big communities are marked with the suffixes 's' and 'b', respectively. The exponents of the power-law distributions that the node degree and community size follow are the default values, -2 and -1, respectively. The parameters used to synthesize the four groups of artificial networks are listed in Table 1.
We also performed experiments on 13 real-world networks, whose sizes range from tens to hundreds of thousands of nodes; information about them is listed in Table 2. These real-world networks can be divided into two categories: the first category includes the first four networks, whose ground-truth communities are known a priori; the second contains the other nine networks, which have no publicly acknowledged ground-truth community structures.
On these networks, we ran our proposed method to detect community structures and compared the results with those of 5 popular community detection algorithms, namely, FastQ [24], WalkTrap [38], LPA [28], Attractor [41], and IsoFdp [36], which have already been introduced in Section 2. Since LPA is a nondeterministic algorithm, we ran it on each network 10 times and took the average of the evaluation metrics as its resulting metric value for that network. For our proposed method, NSA, we empirically set $\delta = 0.13$ for the dolphin social network and $\delta = 0.1$ for the other networks in the experiments. How to set the optimal value of $\delta$ is discussed in Section 5.
4.2. Evaluation Metrics. Two indexes, namely, NMI (Normalized Mutual Information) [51] and modularity [7], are
Table 1: The parameters used to generate the LFR networks. In the header row, $n$ is the number of nodes in the network; $\langle d \rangle$ and $d_{max}$ are the average degree and the maximum degree, respectively; $exp_d$ and $exp_{com}$ are the exponents of the power-law distributions that the node degree and community size follow; $\min(C_i)$ and $\max(C_i)$ are the minimal and maximal numbers of nodes contained in a community, respectively.

| Network | $n$ | $\langle d \rangle$ | $d_{max}$ | $exp_d$ | $exp_{com}$ | $\min(C_i)$ | $\max(C_i)$ |
|---|---|---|---|---|---|---|---|
| LFR1000s | 1000 | 20 | 50 | -2 | -1 | 10 | 50 |
| LFR1000b | 1000 | 20 | 50 | -2 | -1 | 20 | 100 |
| LFR5000s | 5000 | 20 | 50 | -2 | -1 | 10 | 50 |
| LFR5000b | 5000 | 20 | 50 | -2 | -1 | 20 | 100 |
Table 2: Information about the real-world networks. $n$ and $m$ are the numbers of nodes and edges in the network, respectively.

| Network | $n$ | $m$ |
|---|---|---|
| Karate club [14] | 34 | 78 |
| Dolphin social network [15] | 62 | 159 |
| Risk map [16] | 42 | 83 |
| Scientists collaboration network [6] | 118 | 197 |
| Lesmis [17] | 77 | 254 |
| Polbooks [3] | 105 | 441 |
| ColiNeta [18] | 423 | 519 |
| NetScience [10] | 1589 | 2742 |
| Email [19] | 1133 | 5451 |
| YeastL [20] | 2361 | 7182 |
| PGP [21] | 10680 | 24316 |
| DBLP [22] | 317080 | 1049866 |
| Amazon [22] | 334863 | 925872 |
adopted as the metrics to evaluate the quality of the detected community structure in this paper. The NMI between the ground-truth community structure $P = \{P_1, P_2, \cdots, P_K\}$ and the extracted one $P' = \{P'_1, P'_2, \cdots, P'_{K'}\}$ is calculated as follows:

$$NMI(P, P') = \frac{-2 \sum_{i=1}^{|P|} \sum_{j=1}^{|P'|} n_{ij} \log\left(\frac{n_{ij} \cdot n}{n_{P_i} \cdot n_{P'_j}}\right)}{\sum_{i=1}^{|P|} n_{P_i} \log\left(\frac{n_{P_i}}{n}\right) + \sum_{j=1}^{|P'|} n_{P'_j} \log\left(\frac{n_{P'_j}}{n}\right)} \tag{8}$$
where $n_{P_i} = |P_i|$, $n_{P'_j} = |P'_j|$, $n_{ij} = |P_i \cap P'_j|$, and $n$ is the number of nodes in the network. The NMI is an information-theoretic metric that measures how much the detected community structure agrees with the ground truth. Therefore, it can only be used to evaluate the quality of the detected community structure on networks whose ground-truth community structure is already known. Its value lies in the range [0, 1]; larger is better.
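A direct transcription of (8) into Python may make the roles of $n_{ij}$, $n_{P_i}$, and $n_{P'_j}$ concrete. The sketch assumes that both partitions are given as dictionaries mapping each node to a community label and that they cover the same nodes; the function name `nmi` and the convention of returning 1.0 when both partitions consist of a single community (where the denominator of (8) vanishes) are choices of this sketch, not of the paper.

```python
from collections import Counter
from math import log


def nmi(truth, found):
    """Normalized mutual information of two partitions, following Eq. (8).

    truth, found -- dicts mapping every node to its community label.
    """
    n = len(truth)
    n_p = Counter(truth.values())                        # n_{P_i}
    n_q = Counter(found.values())                        # n_{P'_j}
    n_ij = Counter((truth[v], found[v]) for v in truth)  # |P_i ∩ P'_j|
    numerator = -2.0 * sum(
        c * log(c * n / (n_p[i] * n_q[j])) for (i, j), c in n_ij.items()
    )
    denominator = sum(c * log(c / n) for c in n_p.values()) + sum(
        c * log(c / n) for c in n_q.values()
    )
    if denominator == 0.0:
        return 1.0  # both partitions are a single community (sketch convention)
    return numerator / denominator
```

Only the nonzero overlaps $n_{ij}$ are visited, which matches the usual convention that terms with $n_{ij} = 0$ contribute nothing to the sum.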
Another metric widely used to evaluate the performance of community detection methods is modularity [7], which is defined as follows:

$$Q = \sum_i \left(e_{ii} - a_i^2\right) \tag{9}$$
where $e_{ii}$ is a diagonal element of a $K \times K$ matrix $e$ whose element $e_{ij}$ is the fraction of all edges in the network that connect nodes in community $C_i$ with nodes in community $C_j$, $K$ is the number of communities in the community structure, and $a_i$ is the fraction of edges associated with nodes in community $C_i$.
The first term, $\sum_i e_{ii}$, on the right-hand side of (9) is the fraction of edges within communities; the second term, $\sum_i a_i^2$, is the expected value of the same fraction in a random graph in which the nodes and the degree distribution are the same as in the original network but the edges are placed between nodes at random. The smaller the difference between the two terms, the closer the network is to a random graph and the weaker the community structure. Conversely, the larger the difference, the further the network departs from the random graph and the stronger the community structure. That is to say, modularity measures the quality of the community structure in terms of how far the detected result deviates from a random network; its effective value falls in [0, 1], and higher is better.
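As a small illustration of (9), the following Python sketch computes $Q$ for an undirected, unweighted graph. It assumes that each edge is listed exactly once as a pair of nodes and that `membership` maps every node to a community label; with $m$ edges, $e_{ii}$ is taken as the number of intra-community edges divided by $m$, and $a_i$ as the sum of degrees in community $C_i$ divided by $2m$.

```python
from collections import Counter


def modularity(edges, membership):
    """Modularity Q = sum_i (e_ii - a_i^2) of Eq. (9).

    edges      -- iterable of (u, v) pairs, each undirected edge listed once
    membership -- dict mapping every node to its community label
    """
    edges = list(edges)
    m = len(edges)
    inside = Counter()      # number of edges with both ends in community i
    degree_sum = Counter()  # sum of node degrees in community i
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            inside[cu] += 1
    return sum(
        inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


if __name__ == "__main__":
    # Two triangles joined by a single edge, one community per triangle.
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    membership = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
    print(modularity(edges, membership))
```

For the two-triangle example, $\sum_i e_{ii} = 6/7$ and $\sum_i a_i^2 = 1/2$, so $Q = 5/14$; placing all nodes in one community gives $Q = 0$.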
4.3. Synthetic Networks. We carried out experiments on four groups of artificial networks to test the performance of the proposed method. As mentioned above, all four types of artificial networks are synthesized using the LFR benchmark generator software [50]. Besides the parameters listed in Table 1, another critical parameter for this software is the mixing parameter $\mu$, which regulates, for each node, the ratio of edges connected to nodes in other communities. The smaller the value of $\mu$, the clearer the community structure. In particular, $\mu = 0.5$ is a transition point above which communities in networks tend to be obscure.
[Figure 2 plots: two line charts, panels (a) and (b). Horizontal axis ticks from 0.1 to 0.8; vertical axis: NMI, from 0.0 to 1.0. Legend: FastQ, WalkTrap, LPA, Attractor, IsoFdp, proposal (NSA).]
Figure 2: Comparison of different community-detection algorithms on LFR benchmark networks containing 1000 nodes. (a) The results detected from small networks with small-sized communities. (b) The results identified from small networks with big-sized communities.
In our experiments, we varied the value of $\mu$ from 0.1 to 0.8 in increments of 0.1 for each group of LFR networks. To reduce the effect of randomness, we generated 10 networks for each value of $\mu$ while keeping the other parameters unchanged. Since the community structures are already embedded in these synthetic networks, we use NMI as the metric to evaluate the performance of our proposed method and the comparison algorithms. We took these networks as input one by one, ran our proposed method and the comparison algorithms to detect communities, and used the average NMI as the resulting metric. The results detected by our proposal and the comparison algorithms from the small networks with small-sized and big-sized communities are illustrated in Figures 2(a) and 2(b), respectively; the results from the larger networks with small-sized and big-sized communities are presented in Figures 3(a) and 3(b), respectively.
In Figures 2(a) and 2(b), FastQ tends to introduce mistakes into the results regardless of whether the communities in the networks are well separated or obscure. As mentioned previously, FastQ is a typical modularity-optimization-based algorithm; it aims only at acquiring results with larger modularity rather than high accuracy. In our experiments, none of the results uncovered by it is satisfactory. Even in the networks with $\mu = 0.1$, it still failed to identify the exact communities; furthermore, its performance is the worst among the comparison algorithms for $\mu \leq 0.5$. For $\mu > 0.5$, the quality of its results is better only than that of LPA. LPA performed as well as the other comparison algorithms on networks with $\mu < 0.5$, but its performance dropped dramatically for $\mu \geq 0.5$; it could not even detect effective communities from networks with $\mu > 0.6$. This might be due to its label-update mechanism: when the community boundaries become obscure, nodes tend to accept incorrect labels when updating their own, which leads to trivial results in which even all nodes are labeled as members of one giant community. The proposed method NSA acquired NMI = 1 on all networks with $\mu < 0.5$, meaning that the detected partitions perfectly match the ground-truth community structures in these networks. For $\mu = 0.5$, NSA also obtained results as good as those of WalkTrap, Attractor, and IsoFdp. For $\mu > 0.5$, the quality of the detected community structures slipped for all three of those algorithms and for the proposed method. For $0.5 < \mu \leq 0.6$, the quality of our proposal is better than that of Attractor in networks with larger communities, and for $\mu \geq 0.7$, the performance of our proposed method is the best.
In Figures 3(a) and 3(b), we obtained results broadly similar to those in Figure 2, but they still differ in some ways. In Figure 3(a), our proposed method performed the best on almost all networks. For $0.5 < \mu < 0.7$ in Figure 2, the NMI of the results extracted by our proposed method is lower than those of WalkTrap and IsoFdp; in Figure 3, however, the proposed method performed better than IsoFdp for $\mu > 0.5$. These results suggest that the performance of the comparison algorithms is not stable across different networks, whereas our proposed method can steadily extract high-quality community structures from networks with different characteristics. This is also evident from the fact that all the curves of the proposed method in these figures decline more slowly than the others. Moreover, by comparing the proposal's own curves across these figures, we can conclude that our proposed method tends to perform better on larger networks with small communities; therefore, it overcomes the problem of the resolution limit to some extent.
4.4. Real-World Networks. We also carried out experiments on 13 real-world networks to further test the effectiveness and efficiency of our proposed method. As mentioned in Section 4.1, these networks fall into two categories: ones with
[Figure 3 plots: two line charts, panels (a) and (b). Horizontal axis ticks from 0.1 to 0.8; vertical axis: NMI, from 0.0 to 1.0. Legend: FastQ, WalkTrap, LPA, Attractor, IsoFdp, proposal (NSA).]
Figure 3: Comparison of different community detection algorithms on LFR benchmark networks containing 5000 nodes. (a) The results extracted from the larger networks with small-sized communities. (b) The results revealed from the larger networks with big-sized communities.
[Figure 4 drawings: two node-link diagrams of the 34-node karate club network, panels (a) and (b).]
Figure 4: The karate club network. (a) The ground-truth community structure. (b) The community structure detected by our proposed method NSA. (The nodes in different communities are plotted in different colors and shapes; this illustration style is also applied in the subsequent figures.)
the ground-truth community structure known a priori, and the others without a publicly acknowledged ground truth.
Networks with Ground-Truth Community Structure. This category includes the first 4 networks listed in Table 2. Since their ground-truth community structures are already known, we measure the quality of the community structures identified by the proposed method and the comparison algorithms in terms of both NMI and modularity. The values of the two metrics obtained by the proposed method and the comparison algorithms are recorded in Table 3. These networks are relatively small, which makes it easy to visualize the detected results. Below, we analyze the results extracted by the proposed method from these networks individually.
The Karate Club Network. This network depicts the friendships among members of a karate club; it contains 34 nodes and 78 edges. The network was compiled by Wayne W. Zachary, who observed the karate club for 3 years. During the period of Zachary's study, the club split into two factions because of a dispute between the administrator and the instructor. Corresponding to the two factions, a partition into two communities is usually taken as the ground truth for this network, as shown in Figure 4(a). The result detected by our proposed method is presented in Figure 4(b).
From Figure 4, we can see that our proposed method detected 3 rather than 2 communities from the network. It seems that the detected result deviates from the ground truth in some ways, but this result coincides with the conclusion
[Figure 5 drawings: two node-link diagrams of the 62-node dolphin social network, panels (a) and (b), with nodes labeled by dolphin names.]
Figure 5: The dolphin social network. (a) The ground-truth community structure. (b) The community structure identified by our proposed method NSA.
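Table 3: The experimental results on networks with ground-truth community structures. The largest values of the two metrics are typeset in bold.

| Network | Metric | FastQ | WalkTrap | LPA | Attractor | IsoFdp | NSA |
|---|---|---|---|---|---|---|---|
| Karate | $Q$ | 0.381 | 0.353 | 0.355 | 0.371 | 0.371 | **0.402** |
| Karate | NMI | 0.693 | 0.504 | 0.62 | 0.924 | **1.00** | 0.699 |
| Dolphin | $Q$ | 0.492 | 0.489 | 0.464 | 0.45 | 0.505 | **0.513** |
| Dolphin | NMI | 0.719 | 0.632 | 0.719 | 0.69 | 0.744 | **0.887** |
| Risk map | $Q$ | **0.625** | 0.624 | 0.59 | 0.598 | 0.519 | 0.624 |
| Risk map | NMI | **0.894** | 0.848 | 0.821 | 0.839 | 0.714 | 0.848 |
| Scientists | $Q$ | **0.749** | 0.733 | 0.64 | 0.694 | 0.668 | 0.744 |
| Scientists | NMI | 0.867 | 0.818 | 0.743 | 0.835 | 0.823 | **0.878** |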
found in the experiments on synthetic networks, namely, that our proposed method tends to find small communities in networks and thereby overcomes the problem of the resolution limit. Moreover, in terms of the evaluation metrics, the modularity of the detected result is the largest among those of all the algorithms compared. Although our proposed method is not based on the strategy of optimizing modularity, it tends to acquire a community structure with as large a modularity as possible. When its modularity is not the largest, it is the second largest, with a small gap to the largest. These findings are also borne out by the following networks.
Lusseau's Dolphin Social Network. This network describes the interactions of a group of dolphins living in Doubtful Sound, New Zealand. It consists of 62 nodes and 159 edges, which represent individual dolphins and the observed co-occurrences of pairs of dolphins, respectively. This network is generally partitioned into 4 groups as the ground-truth community structure, as exhibited in Figure 5(a). Figure 5(b) shows the community structure uncovered by our proposed method.
In Figure 5, our proposed method detected communities in this network with a high degree of success: it also identified 4 communities, the vast majority of nodes are classified into the correct communities, and the result closely approaches the ground-truth community structure. Quantitatively, both the NMI and the modularity of the result detected by the proposed method on this network are the largest among those of all the algorithms compared, which means that the community structure identified by the proposed method is better than those of the comparison algorithms.
Risk Map Network. This network is the world political map used in the popular game Risk (https://en.wikipedia.org/wiki/Risk_(game)), in which 42 countries or territories on 6 continents are involved. Accordingly, the 42 nodes and the 83 edges connecting adjacent countries or territories are organized into 6 communities as the ground truth, which is illustrated in Figure 6(a). Feeding this network into the proposed method, we obtained the community structure shown in Figure 6(b).
Comparing the detected result with the ground-truth community structure, the community containing nodes '18' and '23' in the ground truth is split into two small communities in Figure 6(b), owing to the tendency of the proposed method to find small communities. Besides this, nodes '26', '33', and '34' are misclassified in the detected result. However, nodes '12', '16', '26', '33', and '34' are special ones in this network: the outer edges associated with them are no fewer than, or
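Table 4: The experimental results of modularity on networks without ground-truth community structures. The largest value for each network is typeset in bold; "-" means that no result was obtained.

| Network | FastQ | WalkTrap | LPA | Attractor | IsoFdp | NSA |
|---|---|---|---|---|---|---|
| Lesmis | 0.499 | 0.519 | 0.515 | 0.498 | 0.491 | **0.54** |
| Polbooks | 0.502 | 0.507 | 0.508 | 0.501 | 0.518 | **0.524** |
| ColiNeta | **0.779** | 0.746 | 0.693 | 0.718 | - | 0.761 |
| Email | 0.499 | 0.531 | 0.379 | 0.464 | 0.531 | **0.544** |
| NetScience | 0.955 | 0.956 | 0.896 | 0.937 | - | **0.957** |
| YeastL | 0.573 | 0.529 | 0.372 | 0.511 | - | **0.574** |
| PGP | 0.85 | 0.789 | 0.765 | 0.768 | 0.726 | **0.867** |
| DBLP | 0.735 | - | 0.652 | 0.637 | - | **0.782** |
| Amazon | 0.869 | - | 0.743 | 0.741 | - | **0.898** |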
[Figure 6 drawings: two node-link diagrams of the 42-node Risk map network, panels (a) and (b).]
Figure 6: Risk map network. (a) The ground-truth community structure. (b) The community structure uncovered by our proposed method NSA.
even more than those within the communities to which these nodes belong. Therefore, if we ignore what these nodes actually represent and consider only the topology, the community structure extracted by our proposed method is more rational than the ground truth: more of the edges associated with these three nodes lie within their community than in the ground truth, so these three nodes are more tightly connected to nodes within the same community in Figure 6(b). Quantitatively, the values of both metrics for our proposed method are second only to those of FastQ and are the same as those of WalkTrap. These results also confirm that our proposed method provides an acceptable solution to the problem of community detection.
Scientists Collaboration Network. This is the largest connected component of a network delineating the coauthorship relationships among scientists working at the Santa Fe Institute, New Mexico. Nodes in this network represent scientists; an edge connects two scientists who have collaborated on at least one paper. There are 118 nodes and 197 edges in total in this network. The nodes can be divided into 6 groups as the ground-truth communities according to the specialties of the scientists, as presented in Figure 7(a). Taking this network as the input to the proposed method, we obtained the community structure illustrated in Figure 7(b).
The proposed method revealed 8 communities in this network; two additional communities are detected in Figure 7(b). These two communities are relatively independent components; especially for the community containing node '1', there are many more inner edges than outer edges. That is to say, nodes in these two communities are connected more tightly to one another than to the remainder of the network. Therefore, isolating them from the network and taking them as independent communities is also reasonable. In terms of the evaluation metrics, the NMI obtained by the proposed method is the largest, which suggests that the result detected by our proposal is the one closest to the ground-truth community structure; although the modularity of the proposed method is not the largest, it is second only to that of FastQ. These results also show that our proposed method can extract high-quality community structures from networks.
Networks without Ground-Truth Community Structure. This category contains the last 9 real-world networks listed in Table 2. For the experiments carried out on this category of networks, we evaluate the quality of the extracted community structures using modularity only, owing to the absence of ground-truth community structures. The modularity values obtained by the proposed method and the comparison algorithms are recorded in Table 4. To illustrate them
[Figure 7 drawings: two node-link diagrams of the 118-node scientists collaboration network, panels (a) and (b).]
Figure 7: The collaboration network of scientists working at the Santa Fe Institute. (a) The ground-truth community structure. (b) The community structure detected by our proposed NSA algorithm.
[Figure 8 plot: grouped bar chart. Horizontal axis: Networks (Lesmis, Polbooks, ColiNeta, Email, NetScience, YeastL, PGP, DBLP, Amazon); vertical axis: Modularity (Q), from 0.0 to 1.0. Legend: FastQ, WalkTrap, LPA, Attractor, IsoFdp, proposal (NSA).]
Figure 8: The bar chart of the modularity obtained by the comparison algorithms and the proposed method NSA.
intuitively, we also plotted them in a bar chart, which is presented in Figure 8.
On these networks, our proposed method achieved the largest modularity on 8 of them. On the only remaining network, ColiNeta, it still obtained the second largest modularity. FastQ, although it is based on the modularity-optimization strategy, acquired the largest modularity on the ColiNeta network only. WalkTrap is an approach based on random walks, so its time complexity is relatively high; it could not produce effective results on the Amazon and DBLP networks because of their large scale. LPA and Attractor can extract community structures from all of these networks, but the quality of the detected results is not satisfactory. IsoFdp can only be applied to connected networks, so it cannot run on the ColiNeta, NetScience, and YeastL networks, which are disconnected. It could not effectively detect the community structure of the Amazon and DBLP networks either, because of their large scale. These comparison results show that our proposed method can steadily, effectively, and efficiently provide promising solutions to the problem of community detection in networks from a wide range of applications, and that it outperforms the comparison algorithms significantly.
[Figure 9 plots: four line charts of modularity $Q$ (vertical axis) against the threshold, with horizontal axis ticks from 0.00 to 0.30. Panels: (a) the karate club network; (b) the dolphin social network; (c) the risk map network; (d) the scientists collaboration network.]
Figure 9: The setting of parameter $\delta$.
5. Parameter Setting
In the second phase of the proposed method, we introduce a threshold $\delta$ on the community metric to identify the preliminary communities that need to be merged. As mentioned above, we calculate the community metric $\gamma_i = \alpha_i \times \beta_i$ for every preliminary community $C_i$ in the merge procedure; if the value of $\gamma_i$ is below the threshold $\delta$, the corresponding community $C_i$ is identified as one that needs to be merged.
Therefore, $\delta$ works as a parameter of our proposed method, and its setting can influence the quality of the resulting community structure. Qualitatively, the larger or the sparser the network is, the smaller the threshold $\delta$ should be, in accordance with the definitions of community sparsity ($\alpha_i$), community scale ($\beta_i$), and community metric ($\gamma_i$). To determine the optimal value of $\delta$, we conducted a group of experiments to explore the relationship between the value of $\delta$ and the quality of the resulting community structure on the first four networks listed in Table 2, namely, the karate club network, the dolphin social network, the map of the game Risk, and the scientists collaboration network. The quality of the resulting community structure is measured in terms of modularity $Q$. We vary the value of $\delta$ from 0 to 1.0 in steps of 0.005; for each value of $\delta$, we run our proposed method on these networks and observe how the modularity changes as $\delta$ varies.
The observed results are illustrated in Figure 9, in which we plotted only the portion with $\delta \in [0, 0.3]$ because the largest modularities are obtained for $\delta \leq 0.3$ on all four networks. Our proposed method attains the largest modularity when $\delta = 0.13$ on the dolphin social network and $\delta = 0.1$ on the other three networks. Therefore, we adopt the corresponding value for each of those four networks and empirically set $\delta = 0.1$ for the other networks in the experiments. In Figure 9, the largest modularity is obtained around $\delta = 0.1$, and the interval [0.05, 0.2] covers the optimal value of $\delta$. Therefore, we empirically suggest that, in real-world applications, $\delta$ be adjusted adaptively around 0.1 within the range [0.05, 0.2] according to the size and sparsity of the networks involved.
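The threshold sweep described in this section amounts to a simple grid search, which can be sketched as follows. Everything named here is hypothetical: `detect` stands for a run of the method with a given $\delta$, `quality` for a modularity computation such as (9), and the default grid (0 to 1.0 in steps of 0.005) reflects our reading of the sweep described in the text.

```python
def sweep_delta(detect, quality, start=0.0, stop=1.0, step=0.005):
    """Grid search over the threshold delta.

    detect  -- callable: delta -> community structure (hypothetical wrapper
               around the two-phase method)
    quality -- callable: community structure -> score, e.g. modularity Q
    Returns the first delta on the grid with the highest score, and that score.
    """
    best_delta, best_score = None, float("-inf")
    n_steps = round((stop - start) / step)
    for k in range(n_steps + 1):
        delta = start + k * step   # avoids accumulating floating-point error
        score = quality(detect(delta))
        if score > best_score:
            best_delta, best_score = delta, score
    return best_delta, best_score
```

Restricting the grid to the suggested interval, for example `sweep_delta(detect, quality, start=0.05, stop=0.2)`, would reduce the number of runs when the method is tuned on a new network.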
6. Conclusion
In this paper, we presented a novel method to detect communities in networks. It is a local method based on node similarity, and it overcomes the high time consumption of global methods. First, we construct the preliminary community structure by repeatedly selecting the node with the largest degree and, depending on the community assignment of its most similar neighbor, either taking it as the exemplar of a new community or inserting it into the community to which that neighbor belongs; i.e., if its most similar neighbor has not been assigned to any community yet, we create a new community for it and its most similar neighbor; if its most similar neighbor has been assigned to a certain community, we insert it into
that community as well. At the end of this process, we obtain a series of preliminary communities. However, some of them might be too small or too sparse, leading to a low-quality result. Therefore, we merge some of the preliminary communities to acquire the final community structure. To do so, we also proposed indexes that take both the size and the sparsity of communities into account to determine which communities should be merged.
To test the performance of the proposed method, we performed extensive experiments on four groups of synthetic networks and 13 real-world networks and compared the detected community structures with the results extracted by the comparison algorithms in terms of NMI and modularity. The comparison results demonstrate that our proposed method can extract high-quality community structures from networks abstracted from various applications and that nodes in the extracted communities are connected more tightly. The proposed method overcomes the problem of the resolution limit to some extent and outperforms the competitors.
Data Availability
We have conducted experiments on artificial networks and real-world datasets. The artificial networks are synthesized using the LFR benchmark network generator, which is freely available at https://sites.google.com/site/santofortunato. The parameters used to synthesize the artificial networks are listed in Table 1. The real-world data supporting this study are from previously reported studies, which are cited in Table 2. Most of the real-world datasets can also be downloaded from http://www-personal.umich.edu/~mejn/netdata and https://snap.stanford.edu/data/index.html. The ColiNeta dataset was provided by Jeong et al. [18]. We constructed the Risk Map network manually according to the literature [16].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant ID 61602225).
References
[1] J. Kleinberg and S. Lawrence, “Network analysis: The structure of the web,” Science, vol. 294, no. 5548, pp. 1849–1850, 2001.
[2] P. Chen and S. Redner, “Community structure of the physical review citation network,” Journal of Informetrics, vol. 4, no. 3, pp. 278–290, 2010.
[3] M. E. J. Newman, “Modularity and community structure in networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 23, pp. 8577–8582, 2006.
[4] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabasi, “Hierarchical organization of modularity in metabolic networks,” Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[5] R. Guimera and L. A. N. Amaral, “Functional cartography of complex metabolic networks,” Nature, vol. 433, no. 7028, pp. 895–900, 2005.
[6] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.
[7] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 2, Article ID 026113, 2004.
[8] P. M. Gleiser and L. Danon, “Community structure in jazz,” Advances in Complex Systems (ACS), vol. 6, no. 4, pp. 565–573, 2003.
[9] Y. van Gennip, B. Hunter, R. Ahn et al., “Community detection using spectral clustering on sparse geosocial data,” SIAM Journal on Applied Mathematics, vol. 73, no. 1, pp. 67–83, 2013.
[10] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[11] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
[12] S. Fortunato and D. Hric, “Community detection in networks: a user guide,” Physics Reports, vol. 659, pp. 1–44, 2016.
[13] B. W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs,” Bell Labs Technical Journal, vol. 49, no. 1, pp. 291–307, 1970.
[14] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[15] D. Lusseau, “The emergent properties of a dolphin social network,” in Proceedings of the Royal Society of London B: Biological Sciences, vol. 270, supplement 2, pp. S186–S188, 2003.
[16] K. Steinhaeuser and N. V. Chawla, “Identifying and evaluating community structure in complex networks,” Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[17] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
[18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabasi, “The large-scale organization of metabolic networks,” Nature, vol. 407, no. 6804, pp. 651–654, 2000.
[19] R. Guimera, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, “Self-similar community structure in a network of human interactions,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[20] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: simple building blocks of complex networks,” Science, vol. 298, no. 5594, pp. 824–827, 2002.
[21] M. Boguna, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, “Models of social networks based on social distance attachment,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 5, Article ID 056122, 2004.
[22] J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.
[23] M. E. J. Newman, “Fast algorithm for detecting community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 6, Article ID 066133, 2004.
[24] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 6, Article ID 066111, 2004.
[25] F. Dabaghi Zarandi and M. Kuchaki Rafsanjani, “Community detection in complex networks using structural similarity,” Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 882–891, 2018.
[26] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, Article ID P10008, 2008.
[27] L. Waltman and N. J. Van Eck, “A smart local moving algorithm for large-scale modularity-based community detection,” The European Physical Journal B, vol. 86, no. 11, article 471, pp. 1–14, 2013.
[28] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 76, no. 3, Article ID 036106, 2007.
[29] M. J. Barber and J. W. Clark, “Detecting network communities by propagating labels under constraints,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 2, Article ID 026129, 2009.
[30] J. Hou Chin and K. Ratnavelu, “A semi-synchronous label propagation algorithm with constraints for community detection in complex networks,” Scientific Reports, vol. 7, Article ID 45836, 2017.
[31] J. Ding, X. He, J. Yuan, Y. Chen, and B. Jiang, “Community detection by propagating the label of center,” Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 675–686, 2018.
[32] A. Laio and A. Rodriguez, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[33] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger, “SCAN: A structural clustering algorithm for networks,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’07), pp. 824–833, ACM, New York, NY, USA, August 2007.
[34] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD’96), pp. 226–231, AAAI Press, 1996.
[35] H. Shiokawa, Y. Fujiwara, and M. Onizuka, “Scan++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs,” in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1178–1189, Republic of Korea, September 2006.
[36] T. You, H.-M. Cheng, Y.-Z. Ning, B.-C. Shia, and Z.-Y. Zhang, “Community detection in complex networks using density-based clustering algorithm and manifold learning,” Physica A: Statistical Mechanics and its Applications, vol. 464, pp. 221–230, 2016.
[37] X. Wang, G. Liu, J. Li, and J. P. Nees, “Locating structural centers: A density-based clustering method for community detection,” PLoS ONE, vol. 12, no. 1, Article ID e0169355, 2017.
[38] P. Pons and M. Latapy, “Computing communities in large networks using random walks,” in International Symposium on Computer and Information Sciences, pp. 284–293, 2005.
[39] S. A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, and M. A. Tavallaie, “Personalized PageRank clustering: a graph clustering algorithm based on random walks,” Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772–5785, 2013.
[40] Y. Su, B. Wang, and X. Zhang, “A seed-expanding method based on random walks for community detection in networks with ambiguous community structures,” Scientific Reports, vol. 7, Article ID 41830, 2017.
[41] J. Shao, Z. Han, Q. Yang, and T. Zhou, “Community detection based on distance dynamics,” in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1075–1084, ACM, Australia, August 2015.
[42] H.-L. Sun, E. Ch’ng, X. Yong, J. M. Garibaldi, S. See, and D.-B. Chen, “A fast community detection method in bipartite networks by distance dynamics,” Physica A: Statistical Mechanics and its Applications, vol. 496, pp. 108–120, 2018.
[43] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, “Pseudo-likelihood methods for community detection in large sparse networks,” The Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[44] S. C. de Lange, M. A. de Reus, and M. P. van den Heuvel, “The Laplacian spectrum of neural networks,” Frontiers in Computational Neuroscience, vol. 7, no. 189, 2014.
[45] F. Krzakala, C. Moore, E. Mossel et al., “Spectral redemption in clustering sparse networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 52, pp. 20935–20940, 2013.
[46] P. Shi, K. He, D. Bindel, and J. E. Hopcroft, “Local Lanczos spectral approximation for community detection,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 10534 of Lecture Notes in Computer Science, pp. 651–667, Springer International Publishing, 2017.
[47] R. Tackx, F. Tarissan, and J. Guillaume, “ComSim: a bipartite community detection algorithm using cycle and node’s similarity,” in International Workshop on Complex Networks and their Applications, vol. 689 of Studies in Computational Intelligence, pp. 278–289, Springer International Publishing, 2017.
[48] T. Wang, L. Yin, and X. Wang, “A community detection method based on local similarity and degree clustering information,” Physica A: Statistical Mechanics and its Applications, vol. 490, pp. 1344–1354, 2018.
[49] K. R. Zalik, “Maximal neighbor similarity reveals real communities in networks,” Scientific Reports, vol. 5, Article ID 18374, 2015.
[50] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 78, no. 4, Article ID 046110, 2008.
[51] L. Ana and A. Jain, “Robust data clustering,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-128–II-133, Madison, WI, USA, 2003.
Input: $G(V, E)$, the network
Output: $CS_{pre} = \{C_1, C_2, \cdots, C_k\}$, the identified preliminary community structure
1: Initialize variables $U$ and $CS_{pre}$, which record the unclassified nodes and the preliminary community structure:
      $U \leftarrow V$; $CS_{pre} \leftarrow \emptyset$
2: Select the node with the largest degree; denote it as $v$:
      $v \leftarrow \arg\max_u \{d_u \mid u \in U\}$
3: Get the most similar neighbor of $v$; denote it as $w$:
      $w \leftarrow \arg\max_u \{sim(v, u) \mid u \in \Gamma(v)\}$
4: if $w$ has not been assigned to any community then
5:    Create a new community for nodes $v$ and $w$:
         $K \leftarrow |CS_{pre}|$; $C_{K+1} \leftarrow \{v, w\}$
6:    Insert the created community into the community structure:
         $CS_{pre} \leftarrow CS_{pre} \cup \{C_{K+1}\}$
7:    Remove nodes $v$ and $w$ from $U$, as they are classified:
         $U \leftarrow U - \{v, w\}$
8: else
9:    Find the community to which $w$ belongs; denote it as $C_k$:
         $k \leftarrow locate(CS_{pre}, w)$
10:   Insert node $v$ into $C_k$:
         $C_k \leftarrow C_k \cup \{v\}$
11:   Remove node $v$ from $U$, as it is classified:
         $U \leftarrow U - \{v\}$
12: Repeat steps 2 through 11 until $U = \emptyset$
13: return $CS_{pre}$

Algorithm 2: FPC($G$): forming the preliminary community structure.
to a community or not. If it has not been classified into any community yet, steps 5 and 6 create a new community for nodes $v$ and $w$ and insert the newly created community into $CS_{pre}$; then step 7 removes nodes $v$ and $w$ from $U$, as they have just been classified into the new community. If node $w$ has already been assigned to a community, step 9 finds the community $C_k$ to which $w$, the most similar neighbor of $v$, belongs, and step 10 inserts node $v$ into $C_k$. Since node $v$ has now been assigned to $C_k$, step 11 removes it from $U$. Step 12 repeats steps 2 through 11 until $U = \emptyset$, meaning that all the nodes in the network have been visited. At that time, the preliminary community structure is held in $CS_{pre}$ and is returned as the output of the algorithm in step 13.
To make it clearer, we take Zachary’s karate club network [14] as an example to illustrate the procedure intuitively. This is a network with 34 nodes and 78 edges, as shown in Figure 1(a), in which the node with the largest degree is node ‘34’ and its most similar neighbor is node ‘33’. Therefore, node ‘34’ is taken as the exemplar of the first community, and node ‘33’ is also inserted into this community. Then, the node with the largest degree among the remaining nodes is node ‘1’; its most similar neighbor is node ‘2’. Since node ‘2’ has not been assigned to a community yet, we create a new community, take node ‘1’ as its exemplar, and insert node ‘2’ into the new community as well. The same thing happens to node pairs (‘3’, ‘4’), (‘32’, ‘29’), and (‘9’, ‘31’) sequentially. The next largest-degree node is ‘14’; its most similar neighbor, node ‘4’, is already in the third community; therefore, we insert node ‘14’ into the third community. All of the other nodes are processed in the same way: in the subsequent operations, node pairs (‘24’, ‘30’), (‘6’, ‘7’), (‘5’, ‘11’), and (‘25’, ‘26’) form new communities, and all of the remaining nodes are inserted into the communities to which their most similar neighbors belong. At the end of the process, we obtain the preliminary community structure shown in Figure 1(b), in which each node is connected to its most similar neighbor by a directed edge.
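The first phase can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the node similarity of (3) is not restated here, so a Jaccard index over closed neighborhoods is used as a stand-in, and the tie-breaking rule and the treatment of isolated nodes are likewise assumptions. The names `fpc`, `jaccard`, and `g` are hypothetical.

```python
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def jaccard(graph: Graph, u, v) -> float:
    """Stand-in similarity (assumption): Jaccard index of closed neighborhoods."""
    nu, nv = graph[u] | {u}, graph[v] | {v}
    return len(nu & nv) / len(nu | nv)


def fpc(graph: Graph, sim=jaccard) -> List[Set]:
    """Form preliminary communities following the steps of Algorithm 2."""
    label = {}                      # node -> index of its community
    communities: List[Set] = []
    # Degrees never change, so one stable sort yields the selection order of step 2.
    for v in sorted(graph, key=lambda u: -len(graph[u])):
        if v in label:              # already classified as some node's partner
            continue
        if not graph[v]:            # isolated node: singleton community (assumption)
            label[v] = len(communities)
            communities.append({v})
            continue
        # Step 3: most similar neighbor; ties go to the smallest ID (assumption).
        w = max(sorted(graph[v]), key=lambda u: sim(graph, v, u))
        if w not in label:          # steps 5-7: open a new community {v, w}
            label[v] = label[w] = len(communities)
            communities.append({v, w})
        else:                       # steps 9-11: join w's community
            label[v] = label[w]
            communities[label[w]].add(v)
    return communities


# Two triangles joined by the edge 2-3 (a toy graph, not the karate club network).
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(fpc(g))
```

On this toy graph the two triangles are recovered as the two preliminary communities. Sorting once replaces the repeated arg-max of step 2, which is valid because node degrees do not change during the phase.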
3.3. Merge of Small or Sparse Communities. At the end of the first phase of our proposed method, we obtain the preliminary community structure. However, some communities are either too small or too sparse to make sense, such as the preliminary communities {‘5’, ‘11’}, {‘9’, ‘31’}, {‘32’, ‘29’}, {‘25’, ‘26’, ‘28’}, {‘24’, ‘30’, ‘27’}, and {‘6’, ‘7’, ‘17’} in Figure 1(b): each of them contains only a few nodes, and its inside edges are very sparse, i.e., the number of edges inside it is much smaller than the number of edges connecting it to the outside. This violates the characteristic that connections inside a community are much denser than those across different communities. Keeping them in the final community structure would lead to a low-quality result. Therefore, in the second phase we merge some of the preliminary communities to acquire the final result, which is carried out by the function call PCM() in Algorithm 1.
To this end, two problems need to be solved in PCM(). The first is to identify which communities are small or sparse enough that they need to be merged into other ones; the second is to select the community into which each of the small or sparse communities should be merged.
Figure 1: The procedure of FPC() on the karate club network. (a) The karate club network. (b) The preliminary community structure.
For the first problem, we propose an index, the community metric, which takes into account two factors, community size and community sparsity, to find the preliminary communities that need to be merged. We formalize the relevant concepts and the index in Definitions 1 through 3.

Definition 1 (community sparsity). The sparsity of community $C_i$ is defined as follows:

$$\alpha_i = \frac{|E_i^{in}|}{|E_i^{out}|}, \tag{4}$$

where $E_i^{in}$ is the set of edges within community $C_i$ and $E_i^{out}$ is the set of edges connecting nodes in community $C_i$ with other communities.

That is to say, the sparsity of community $C_i$ is the ratio of the number of inner edges of $C_i$ to the number of its outer edges. The more edges exist within community $C_i$ relative to its outer edges, the larger the value of $\alpha_i$, and vice versa.

Definition 2 (community scale). The scale of community $C_i$ is formalized as follows:

$$\beta_i = \frac{|V_i|}{|V|}, \tag{5}$$

where $V_i$ is the set of nodes in community $C_i$.

That is, the scale of community $C_i$ is the ratio of the number of nodes in $C_i$ to the total number of nodes in the network. The more nodes there are in community $C_i$, the larger the ratio, and vice versa.

Definition 3 (community metric). The community metric is a combination of the community sparsity and the community scale, which is defined for community $C_i$ as follows:

$$\gamma_i = \alpha_i \cdot \beta_i. \tag{6}$$

On the basis of these definitions, the first problem can be solved by setting a community metric threshold $\delta$: if $\gamma_i < \delta$, community $C_i$ needs to be merged into another community.
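Definitions 1 through 3 translate directly into code. The following Python sketch computes $\alpha_i$, $\beta_i$, and $\gamma_i$ for one community of an undirected graph stored as a dictionary of neighbor sets. The function names and the toy graph are hypothetical, and treating a community without outer edges as infinitely dense is an assumption, since (4) is undefined in that case.

```python
import math


def community_metric(graph, community):
    """Return (alpha, beta, gamma) of Definitions 1-3 for one community.

    graph maps each node to the set of its neighbors (undirected).
    """
    inner_ends = outer = 0
    for u in community:
        for v in graph[u]:
            if v in community:
                inner_ends += 1     # every inner edge is counted from both ends
            else:
                outer += 1
    inner = inner_ends // 2
    # Assumption: a community with no outer edges is treated as infinitely dense.
    alpha = inner / outer if outer else math.inf
    beta = len(community) / len(graph)
    return alpha, beta, alpha * beta


def needs_merge(graph, community, delta):
    """True when the community metric falls below the threshold delta."""
    return community_metric(graph, community)[2] < delta


# Two triangles joined by the edge 2-3 (toy graph).
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(community_metric(g, {0, 1, 2}))   # dense triangle with one outer edge
print(community_metric(g, {3, 4}))      # small community dominated by outer edges
```

For the triangle, three inner edges against one outer edge give $\alpha = 3$ and $\gamma = 1.5$; the two-node community has one inner edge against three outer edges, so its metric is far smaller and it is a natural merge candidate.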
For the second problem, we adopt a strategy consistent with the construction of the preliminary communities. Since the preliminary communities are formed mainly on the basis of node similarity in the first phase, we also use similarity as the criterion for merging communities; i.e., each of the small or sparse communities is merged into its most similar adjacent community. The similarity between two communities $C_i$ and $C_j$ is calculated as follows:

$$Sim(C_i, C_j) = \frac{\sum_{u \in C_i,\, v \in C_j} sim(u, v)}{|C_j|}, \tag{7}$$

where $sim(u, v)$ is the similarity between nodes $u \in C_i$ and $v \in C_j$, which is calculated using (3). In function PCM(), which implements the merge procedure, $C_i$ is a community that needs to be merged and $C_j$ is one of its adjacent communities. The numerator on the right-hand side of (7) is the sum of the similarities between nodes in communities $C_i$ and $C_j$. Dividing by $|C_j|$ reduces the priority of larger communities, which prevents the formation of giant communities.
The logic of the entire second phase is listed in Algorithm 3; the operations are almost self-explanatory. The variable $CS$ records the final community structure; it is initialized as the preliminary community structure $CS_{pre}$ in step 1. Step 2 calculates the community metric for each of the preliminary communities; steps 3 and 4 select the community with the smallest community metric and its most similar community; step 5 merges them to yield a new community; and step 6 calculates the community metric for that new community. Step 7 replaces the two communities $C_t$ and $C_j$ with the new community in $CS$ to reflect the effect of the merge operation. Step 8 repeats steps 3 through 7 until the community metric of the selected community, i.e., the minimal one, is larger than the given threshold $\delta$, meaning that all the remaining communities are satisfactory; the merge procedure is then terminated, and the resulting community structure in $CS$ is returned in step 9.
Input: $CS_{pre}$, the preliminary community structure; $\delta$, the community-metric threshold
Output: $CS$, the final community structure
1: Initialize $CS$, which is used to record the community structure:
      $CS \leftarrow CS_{pre}$
2: Calculate the community metric for each of the preliminary communities:
      foreach $C_i \in CS$ do $\gamma_i \leftarrow \alpha_i \times \beta_i$
3: Select the community with the minimal community metric; denote its index as $t$:
      $t \leftarrow \arg\min_i \{\gamma_i \mid i = 1, 2, \cdots, |CS|\}$
4: Identify the community most similar to $C_t$; denote its index as $j$:
      $j \leftarrow \arg\max_i \{Sim(C_t, C_i) \mid i = 1, 2, \cdots, |CS|,\ i \neq t\}$
5: Merge communities $C_t$ and $C_j$ to form a new community:
      $k \leftarrow |CS|$; $C_{k+1} \leftarrow C_t \cup C_j$
6: Calculate the community metric for the new community:
      $\gamma_{k+1} \leftarrow \alpha_{k+1} \times \beta_{k+1}$
7: Replace the two communities $C_t$ and $C_j$ with the new community to reflect the merging effect:
      $CS \leftarrow (CS - \{C_t, C_j\}) \cup \{C_{k+1}\}$
8: Repeat steps 3 through 7 until $\gamma_t > \delta$
9: return $CS$

Algorithm 3: PCM($CS_{pre}$, $\delta$): merge small or sparse communities.
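A compact Python sketch of the merge phase is given below. It is one possible reading of Algorithm 3, not the authors' code: the threshold is checked before each merge (so no merge happens when every community already exceeds $\delta$), candidates are restricted to adjacent communities as the text describes, and the node similarity of (3) is replaced by a Jaccard index over closed neighborhoods as an assumption. All function names are hypothetical, and the metrics are recomputed in every round for brevity rather than maintained in a heap.

```python
import math


def node_sim(graph, u, v):
    """Stand-in for the node similarity of (3) (assumption): Jaccard index."""
    nu, nv = graph[u] | {u}, graph[v] | {v}
    return len(nu & nv) / len(nu | nv)


def gamma(graph, c):
    """Community metric of Definition 3: (|E_in| / |E_out|) * (|V_i| / |V|)."""
    inner = sum(1 for u in c for v in graph[u] if v in c) // 2
    outer = sum(1 for u in c for v in graph[u] if v not in c)
    alpha = inner / outer if outer else math.inf   # no outer edges: assumption
    return alpha * len(c) / len(graph)


def community_sim(graph, ci, cj):
    """Eq. (7): summed node similarity between ci and cj, divided by |cj|."""
    return sum(node_sim(graph, u, v) for u in ci for v in cj) / len(cj)


def adjacent(graph, ci, cj):
    """True if at least one edge runs between the two communities."""
    return any(v in cj for u in ci for v in graph[u])


def pcm(graph, preliminary, delta):
    """Merge small or sparse communities until the minimal metric exceeds delta."""
    cs = [set(c) for c in preliminary]
    while len(cs) > 1:
        t = min(range(len(cs)), key=lambda i: gamma(graph, cs[i]))
        if gamma(graph, cs[t]) > delta:
            break                   # every remaining community is acceptable
        cand = [j for j in range(len(cs))
                if j != t and adjacent(graph, cs[t], cs[j])]
        if not cand:
            break                   # defensive guard (assumption): nothing to merge into
        j = max(cand, key=lambda i: community_sim(graph, cs[t], cs[i]))
        merged = cs[t] | cs[j]
        cs = [c for i, c in enumerate(cs) if i not in (t, j)] + [merged]
    return cs


# Two triangles joined by the edge 2-3; the second triangle starts out split.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(pcm(g, [{0, 1, 2}, {3, 4}, {5}], delta=0.5))
```

With $\delta = 0.5$ the singleton {5} is absorbed into {3, 4}, after which both communities have a metric of 1.5 and the loop stops. A much larger threshold would keep merging until a single community remains, which illustrates why $\delta$ has to be chosen with care.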
3.4. Time Complexity. The proposed algorithm comprises two phases. The first forms the preliminary communities. The main time consumption in this phase is the selection of the node with the largest degree (step 2 in Algorithm 2) and of its most similar neighbor (step 3 in Algorithm 2); the former can be accomplished in $O(\log n)$ in each iteration using a max-heap data structure, and the latter can be done in $O(\log \langle d \rangle)$ with a max-heap, where $\langle d \rangle$ is the average degree of nodes in the network. Since $\langle d \rangle \ll n$, the time consumption of the first phase is $O(n \log n)$.
The second phase improves the quality of the resulting community structure by merging some of the small or sparse communities. The major time is spent on determining, in each iteration, the community that needs to be merged and its most similar adjacent community. Assuming there are $K$ communities in the preliminary community structure, the former operation can be implemented in $O(\log K)$, and the latter can also be carried out in $O(\log K)$ time in the worst case. Hence, the second phase can be implemented in $O(K \log K)$ time.
Since $K \ll n$, the $O(K \log K)$ cost of the second phase is dominated by the $O(n \log n)$ cost of the first. Therefore, the proposed method detects communities with relatively high efficiency, with an overall time complexity of $O(n \log n)$.
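The $O(\log n)$ selection of the largest-degree node can be illustrated with Python's `heapq` module. The sketch below is an assumption about how such a heap might be used, not a description of the authors' implementation: `heapq` is a min-heap, so degrees are negated, and nodes that were classified in the meantime are discarded lazily when they reach the top. The helper name and the toy degrees are hypothetical.

```python
import heapq


def pop_largest_unclassified(heap, classified):
    """Pop entries until an unclassified node appears; return None if exhausted."""
    while heap:
        _neg_degree, node = heapq.heappop(heap)   # O(log n) per pop
        if node not in classified:
            return node
    return None


degrees = {"a": 3, "b": 1, "c": 2, "d": 2}
heap = [(-d, u) for u, d in degrees.items()]      # negate degrees for a max-heap
heapq.heapify(heap)                               # O(n) construction

classified = {"c"}                                # e.g. "c" was inserted as a partner
first = pop_largest_unclassified(heap, classified)
second = pop_largest_unclassified(heap, classified)
print(first, second)
```

Each node is pushed once and popped at most once, so the selections over the whole first phase cost $O(n \log n)$ in total, in line with the bound stated above.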
4. Experimental Results and Discussion
4.1. Network Datasets and Comparison System. To test the performance of our proposed method, we conducted extensive experiments on several groups of artificial networks and on some real-world networks. The artificial networks are synthesized using the LFR benchmark network generator [50], which takes parameters that control the characteristics of the generated networks. Here we consider the influence of both the network scale and the community size; therefore, four types of networks are generated, namely, small networks with small communities and with big communities, and larger networks with small communities and with big communities. The small networks and the larger networks contain 1000 and 5000 nodes, respectively. A small community contains at least 10 and at most 50 nodes; the minimum and maximum numbers of nodes in the big communities are 20 and 100, respectively. The generated networks with small communities and big communities are marked with the suffixes ‘s’ and ‘b’, respectively. The exponents of the power-law distributions that node degree and community size follow are the default values, −2 and −1, respectively. The parameters used to synthesize the four groups of artificial networks are listed in Table 1.
We also performed experiments on 13 real-world networks, whose sizes range from tens to hundreds of thousands of nodes; information about them is listed in Table 2. These real-world networks can be divided into two categories: the first category includes the first four networks, whose ground-truth communities are known a priori; the second contains the other nine networks, which have no publicly acknowledged ground-truth community structures.
On these networks, we ran our proposed method to detect community structures and compared the results with those of five popular community detection algorithms, namely, FastQ [24], WalkTrap [38], LPA [28], Attractor [41], and IsoFdp [36], which have already been introduced in Section 2. Since LPA is a nondeterministic algorithm, we ran it on each network 10 times and took the average of each evaluation metric as its result on that network. For our proposed method, NSA, we empirically set $\delta = 0.13$ for the dolphin social network and $\delta = 0.1$ for the other networks in the experiments. The details of how to set the optimal value of $\delta$ will be discussed in Section 5.
Table 1: The parameters used to generate the LFR networks. In the header row, $n$ is the number of nodes contained in the network; $\langle d \rangle$ and $d_{max}$ are the average degree and the maximum degree, respectively; $exp_d$ and $exp_{com}$ are the exponents of the power-law distributions that node degree and community size follow; $\min(C_i)$ and $\max(C_i)$ are the minimal and maximal numbers of nodes contained in every community, respectively.

| Network | $n$ | $\langle d \rangle$ | $d_{max}$ | $exp_d$ | $exp_{com}$ | $\min(C_i)$ | $\max(C_i)$ |
|---|---|---|---|---|---|---|---|
| LFR1000s | 1000 | 20 | 50 | −2 | −1 | 10 | 50 |
| LFR1000b | 1000 | 20 | 50 | −2 | −1 | 20 | 100 |
| LFR5000s | 5000 | 20 | 50 | −2 | −1 | 10 | 50 |
| LFR5000b | 5000 | 20 | 50 | −2 | −1 | 20 | 100 |

Table 2: The information about the real-world networks. $n$ and $m$ are the numbers of nodes and edges in the network, respectively.

| Network | $n$ | $m$ |
|---|---|---|
| Karate club [14] | 34 | 78 |
| Dolphin social network [15] | 62 | 159 |
| Risk map [16] | 42 | 83 |
| Scientists collaboration network [6] | 118 | 197 |
| Lesmis [17] | 77 | 254 |
| Polbooks [3] | 105 | 441 |
| ColiNeta [18] | 423 | 519 |
| NetScience [10] | 1589 | 2742 |
| Email [19] | 1133 | 5451 |
| YeastL [20] | 2361 | 7182 |
| PGP [21] | 10680 | 24316 |
| DBLP [22] | 317080 | 1049866 |
| Amazon [22] | 334863 | 925872 |

4.2. Evaluation Metrics. Two indexes, namely, NMI (Normalized Mutual Information) [51] and modularity [7], are adopted as the metrics to evaluate the quality of the detected community structure in this paper. The NMI between the ground-truth community structure $P = \{P_1, P_2, \cdots, P_K\}$ and the extracted one $P' = \{P'_1, P'_2, \cdots, P'_{K'}\}$ is calculated as follows:
$$\mathrm{NMI}(P, P') = \frac{-2 \sum_{i=1}^{|P|} \sum_{j=1}^{|P'|} n_{ij} \log\left(\frac{n_{ij} \cdot n}{n_i^{P} \cdot n_j^{P'}}\right)}{\sum_{i=1}^{|P|} n_i^{P} \log\left(\frac{n_i^{P}}{n}\right) + \sum_{j=1}^{|P'|} n_j^{P'} \log\left(\frac{n_j^{P'}}{n}\right)}, \tag{8}$$

where $n_i^{P} = |P_i|$, $n_j^{P'} = |P'_j|$, and $n_{ij} = |P_i \cap P'_j|$, respectively, and $n$ is the number of nodes in the network.

The NMI is an information-theory-based metric that measures how much the detected community structure agrees with the ground truth. Therefore, it can only be used to evaluate the quality of the detected community structure on networks whose ground-truth community structure is already known. Its value is in the range [0, 1]; larger is better.
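As a sanity check on (8), the following Python sketch evaluates the NMI from two label assignments. It is a minimal illustration under stated assumptions: each partition is given as a hypothetical dictionary mapping node to community label, terms with $n_{ij} = 0$ are skipped (they contribute nothing to the sum), and the degenerate case in which both partitions consist of a single community is assigned the value 1 by convention.

```python
import math
from collections import Counter


def nmi(truth, detected):
    """NMI of Eq. (8); each argument maps node -> community label."""
    nodes = list(truth)
    n = len(nodes)
    n_p = Counter(truth[u] for u in nodes)                    # sizes |P_i|
    n_q = Counter(detected[u] for u in nodes)                 # sizes |P'_j|
    n_pq = Counter((truth[u], detected[u]) for u in nodes)    # overlaps |P_i ∩ P'_j|
    numerator = -2.0 * sum(
        nij * math.log(nij * n / (n_p[i] * n_q[j]))
        for (i, j), nij in n_pq.items()
    )
    denominator = sum(c * math.log(c / n) for c in n_p.values()) + sum(
        c * math.log(c / n) for c in n_q.values()
    )
    if denominator == 0:
        return 1.0      # assumption: both partitions are one single community
    return numerator / denominator


truth = {0: "A", 1: "A", 2: "B", 3: "B"}
relabeled = {0: "x", 1: "x", 2: "y", 3: "y"}      # same partition, other labels
unrelated = {0: "x", 1: "y", 2: "x", 3: "y"}      # statistically independent split
print(nmi(truth, relabeled), nmi(truth, unrelated))
```

The relabeled partition scores 1 because NMI ignores label names, while the independent split scores 0. The base of the logarithm cancels between numerator and denominator, so the natural logarithm is used here.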
Another metric widely used to evaluate the performance of community detection methods is modularity [7], which is defined as follows:

$$Q = \sum_i \left(e_{ii} - a_i^2\right), \tag{9}$$

where $e_{ii}$ is the diagonal element of a $K \times K$ matrix $e$ whose element $e_{ij}$ is the fraction of all edges in the network that connect nodes in community $C_i$ to nodes in community $C_j$, $K$ is the number of communities in the community structure, and $a_i$ is the fraction of edges associated with nodes in community $C_i$.
The first term on the right-hand side of (9), $\sum_i e_{ii}$, is the fraction of edges within communities; the second term, $\sum_i a_i^2$, is the expected value of the same fraction in a random graph that has the same nodes and degree distribution as the original network but in which edges are placed between nodes at random. The smaller the difference between the two terms, the closer the network is to a random graph and the weaker the community structure. Conversely, the larger the difference, the further the network departs from the random graph and the stronger the community structure. That is to say, modularity measures the quality of a community structure by how far the detected result deviates from a random network; its effective value falls in [0, 1], and higher is better.
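The computation in (9) can be sketched as follows for an undirected, unweighted graph stored as a dictionary of neighbor sets. The sketch assumes the usual reading of $a_i$ as the fraction of edge ends attached to nodes of $C_i$ (the sum of their degrees divided by twice the number of edges); the function name and the toy graph are hypothetical.

```python
import math


def modularity(graph, communities):
    """Q of Eq. (9) for an undirected graph given as node -> set of neighbors."""
    two_m = sum(len(nbrs) for nbrs in graph.values())     # twice the edge count
    q = 0.0
    for c in communities:
        # Each inner edge is seen from both of its ends, matching the 2m scale.
        inner_ends = sum(1 for u in c for v in graph[u] if v in c)
        degree_sum = sum(len(graph[u]) for u in c)
        e_ii = inner_ends / two_m       # fraction of edges inside the community
        a_i = degree_sum / two_m        # fraction of edge ends attached to it
        q += e_ii - a_i ** 2
    return q


# Two triangles joined by the edge 2-3: 7 edges in total.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(modularity(g, [{0, 1, 2}, {3, 4, 5}]))      # the natural split
print(modularity(g, [set(g)]))                    # everything in one community
```

For the natural split, each triangle has $e_{ii} = 3/7$ and $a_i = 1/2$, so $Q = 2(3/7 - 1/4) = 5/14$; placing all nodes in one community gives $Q = 0$, reflecting the absence of any detected structure.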
4.3. Synthetic Networks. We carried out experiments on four groups of artificial networks to test the performance of the proposed method. As mentioned above, all four types of artificial networks are synthesized using the LFR benchmark generator software [50]. Besides the parameters listed in Table 1, another critical parameter of this software is the mixing parameter $\mu$, which regulates, for each node, the fraction of its edges that connect to nodes in other communities. The smaller the value of $\mu$, the clearer the community structure. $\mu = 0.5$ is a transition point, above which communities in networks tend to be obscure.
Figure 2: Comparison of different community-detection algorithms (FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposal, NSA) on LFR benchmark networks containing 1000 nodes, measured by NMI. (a) The results detected from small networks with small-sized communities. (b) The results identified from small networks with big-sized communities.
In our experiments, we varied the value of $\mu$ from 0.1 to 0.8 in increments of 0.1 for each group of LFR networks. To reduce the effect of randomness, we generated 10 networks for each value of $\mu$ while keeping the same setting for the other parameters. Since the community structures are already embedded in these synthetic networks, we use NMI as the metric to evaluate the performance of our proposed method and the comparison algorithms. We took these networks as input one by one, ran our proposed method and the comparison algorithms to detect communities, and used the average NMI as the resulting metric. The results detected by our proposal and the comparison algorithms from the small networks with small-sized and big-sized communities are illustrated in Figures 2(a) and 2(b), respectively; the results for the larger networks with small-sized and big-sized communities are presented in Figures 3(a) and 3(b), respectively.
In Figures 2(a) and 2(b), FastQ tends to introduce mistakes into the results whether the communities in the networks are well separated or obscure. As mentioned previously, FastQ is a typical modularity-optimization algorithm: it aims only at results with larger modularity rather than high accuracy. In our experiments, none of the results it uncovered is satisfactory. Even on the networks with $\mu = 0.1$, it failed to identify the exact communities; furthermore, its performance is the worst among the comparison algorithms for $\mu \le 0.5$, and for $\mu > 0.5$ its results are better only than those of LPA. LPA performed as well as the other comparison algorithms for $\mu < 0.5$, but its performance dropped dramatically for $\mu \ge 0.5$, and it could not detect effective communities at all for $\mu > 0.6$. This might be due to its label-update mechanism: when the community boundaries become obscure, nodes tend to adopt incorrect labels, which leads to trivial results in which even all nodes are labeled as members of one giant community. The proposed method, NSA, achieved NMI = 1 on all networks for $\mu < 0.5$, meaning that the detected partitions match the ground-truth community structures perfectly. For $\mu = 0.5$, NSA obtained results as good as those of WalkTrap, Attractor, and IsoFdp. For $\mu > 0.5$, the quality of the detected community structures declines for those three algorithms and for the proposed method alike. For $0.5 < \mu \le 0.6$, our proposal is better than Attractor on the networks with larger communities, and for $\mu \ge 0.7$ the proposed method performs the best.
In Figures 3(a) and 3(b), the results are similar overall to those in Figure 2, but they differ in some respects. In Figure 3(a), our proposed method performed the best on almost all networks. For $0.5 < \mu < 0.7$ in Figure 2, the NMI of the results extracted by our proposed method is lower than that of WalkTrap and IsoFdp; in Figure 3, however, the proposed method performed better than IsoFdp for $\mu > 0.5$. These results suggest that the performance of the comparison algorithms is not stable across different networks, whereas our proposed method steadily extracts high-quality community structures from networks with different characteristics. This is also reflected in the fact that the curves of the proposed method in these figures decline more slowly than the others. Moreover, comparing the proposed method's own curves across these figures, we conclude that it tends to perform better on larger networks with small communities; therefore, it overcomes the resolution limit problem to some extent.
4.4. Real-World Networks. We also carried out experiments on 13 real-world networks to further test the effectiveness and efficiency of our proposed method. As mentioned in Section 4.1, these networks fall into two categories: those whose ground-truth community structure is known a priori and those without a publicly acknowledged ground truth.

Figure 3: Comparison of different community-detection algorithms (FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposed NSA) on LFR benchmark networks containing 5000 nodes; NMI is plotted against $\mu$ from 0.1 to 0.8. (a) The results extracted from the larger networks with small-sized communities. (b) The results revealed from the larger networks with big-sized communities.

Figure 4: The karate club network. (a) The ground-truth community structure. (b) The community structure detected by our proposed method NSA. (The nodes in different communities are plotted in different colors and shapes; this illustration style is also applied in the subsequent figures.)
Networks with Ground-Truth Community Structure. This category includes the first 4 networks listed in Table 2. Since their ground-truth community structure is already known, we measure the quality of the community structures identified by the proposed method and the comparison algorithms in terms of both NMI and modularity. The values of the two metrics obtained by the proposed method and the comparison algorithms are recorded in Table 3. These networks are relatively small, which makes it easy to visualize the detected results. Below, we analyze the results extracted by the proposed method from these networks individually.
The Karate Club Network. This network depicts the friendships among the members of a karate club; it contains 34 nodes and 78 edges. The network was compiled by Wayne W. Zachary, who observed the karate club for 3 years. During the period of Zachary's study, the club split into two factions because of a dispute between the administrator and the instructor. Corresponding to the two factions, the partition into two communities is usually taken as the ground truth, which is shown in Figure 4(a). The result detected by our proposed method is presented in Figure 4(b).
From Figure 4, we can see that our proposed method detected 3 rather than 2 communities in the network. The detected result seems to deviate from the ground truth in some ways, but it coincides with the conclusion drawn from the experiments on synthetic networks that our proposed method tends to find small communities and thereby overcomes the resolution limit problem. Moreover, in terms of the metrics, the modularity of the detected result is the largest among all the algorithms compared. Although our proposed method is not based on modularity optimization, it tends to acquire a community structure with modularity as large as possible: when its modularity is not the largest, it is the second largest, with a small gap to the largest. These findings also hold on the networks discussed next.

Figure 5: The dolphin social network. (a) The ground-truth community structure. (b) The community structure identified by our proposed method NSA.

Table 3: The experimental results on networks with ground-truth community structures. The largest value in each row is marked with an asterisk.

Network     Metric  FastQ   WalkTrap  LPA    Attractor  IsoFdp  NSA
Karate      Q       0.381   0.353     0.355  0.371      0.371   0.402*
Karate      NMI     0.693   0.504     0.62   0.924      1.00*   0.699
Dolphin     Q       0.492   0.489     0.464  0.45       0.505   0.513*
Dolphin     NMI     0.719   0.632     0.719  0.69       0.744   0.887*
Risk map    Q       0.625*  0.624     0.59   0.598      0.519   0.624
Risk map    NMI     0.894*  0.848     0.821  0.839      0.714   0.848
Scientists  Q       0.749*  0.733     0.64   0.694      0.668   0.744
Scientists  NMI     0.867   0.818     0.743  0.835      0.823   0.878*
Lusseau's Dolphin Social Network. This network describes the interactions of a group of dolphins living in Doubtful Sound, New Zealand. It consists of 62 nodes and 159 edges, which represent individual dolphins and the observed co-occurrences of pairs of dolphins, respectively. The network is generally partitioned into 4 groups as the ground-truth community structure, which is exhibited in Figure 5(a). Figure 5(b) shows the community structure uncovered by our proposed method.
In Figure 5, our proposed method detected the communities of this network with a high degree of success: it also identified 4 communities, the vast majority of nodes are classified into the correct communities, and the result is close to the ground-truth community structure. Quantitatively, both the NMI and the modularity of the result detected by the proposed method on this network are the largest among all the algorithms compared, which means that the community structure identified by the proposed method is clearly better than those of the comparison algorithms.
Risk Map Network. This network is the world political map used in the popular game Risk (https://en.wikipedia.org/wiki/Risk_(game)), which involves 42 countries or territories on 6 continents. Accordingly, the 42 nodes and the 83 edges connecting adjacent countries or territories are organized into 6 communities as the ground truth, which is illustrated in Figure 6(a). Feeding this network into the proposed method, we obtained the community structure shown in Figure 6(b).
Comparing the detected result with the ground-truth community structure, the community containing nodes ‘18’ and ‘23’ in the ground truth is split into two small communities in Figure 6(b), owing to the tendency of the proposed method to find small communities. Besides this, nodes ‘26’, ‘33’, and ‘34’ are assigned to communities other than their ground-truth ones in the detected result. However, nodes ‘12’, ‘16’, ‘26’, ‘33’, and ‘34’ are special in this network: the outer edges associated with each of them are no fewer than, and sometimes more than, the edges within the community to which the node belongs. Therefore, if we ignore what these nodes actually represent and consider the topology only, the community structure extracted by our proposed method is more rational than the ground truth: more of the edges associated with the three reassigned nodes lie within their communities than in the ground truth, so these three nodes are connected more tightly to nodes in the same community in Figure 6(b). Quantitatively, the values of both metrics obtained by our proposed method are second only to those of FastQ and equal to those of WalkTrap. These results also confirm that our proposed method provides an acceptable solution to the problem of community detection.

Table 4: The experimental results of modularity on networks without ground-truth community structures. The largest value for each network is marked with an asterisk; a dash means that no result was obtained.

Network     FastQ   WalkTrap  LPA    Attractor  IsoFdp  NSA
Lesmis      0.499   0.519     0.515  0.498      0.491   0.54*
Polbooks    0.502   0.507     0.508  0.501      0.518   0.524*
ColiNeta    0.779*  0.746     0.693  0.718      -       0.761
Email       0.499   0.531     0.379  0.464      0.531   0.544*
NetScience  0.955   0.956     0.896  0.937      -       0.957*
YeastL      0.573   0.529     0.372  0.511      -       0.574*
PGP         0.85    0.789     0.765  0.768      0.726   0.867*
DBLP        0.735   -         0.652  0.637      -       0.782*
Amazon      0.869   -         0.743  0.741      -       0.898*

Figure 6: Risk map network. (a) The ground-truth community structure. (b) The community structure uncovered by our proposed method NSA.
Scientists Collaboration Network. This is the largest connected component of a network describing the coauthorship relations among scientists working at the Santa Fe Institute, New Mexico. Nodes in this network represent scientists, and an edge connects two scientists who have collaborated on at least one paper. There are 118 nodes and 197 edges in total in this network. According to the specialties of the scientists, the nodes can be divided into 6 groups as the ground-truth communities, as presented in Figure 7(a). Taking this network as the input to the proposed method, we obtained the community structure illustrated in Figure 7(b).
The proposed method revealed 8 communities in this network; that is, two additional communities are detected in Figure 7(b). These two communities are relatively independent components; in particular, the community containing node ‘1’ has many more inner edges than outer edges. That is to say, the nodes in these two communities are connected more tightly to one another than to the remainder of the network. Therefore, isolating them and treating them as independent communities is also reasonable. In terms of the metrics, the NMI obtained by the proposed method is the largest, which suggests that the result detected by our proposal is the closest to the ground-truth community structure; its modularity is not the largest, but it is second only to that of FastQ. These results also confirm that our proposed method can extract high-quality community structures from networks.
Networks without Ground-Truth Community Structure. This category contains the last 9 real-world networks listed in Table 2. For the experiments on this category of networks, we evaluate the quality of the extracted community structures using modularity only, because ground-truth community structures are absent. The modularity values obtained by the proposed method and the comparison algorithms are recorded in Table 4. To illustrate them intuitively, we also plotted them in a bar chart, which is presented in Figure 8.

Figure 7: The collaboration network of scientists working at the Santa Fe Institute. (a) The ground-truth community structure. (b) The community structure detected by our proposed NSA algorithm.

Figure 8: The bar chart of the modularity ($Q$) obtained by the comparison algorithms (FastQ, WalkTrap, LPA, Attractor, IsoFdp) and the proposed method NSA on the networks Lesmis, Polbooks, ColiNeta, Email, NetScience, YeastL, PGP, DBLP, and Amazon.
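For reference, the modularity used as the evaluation metric can be computed as in the following minimal sketch. It assumes the standard Newman-Girvan form for an undirected, unweighted graph, $Q = \sum_c [l_c/m - (d_c/2m)^2]$, where $l_c$ is the number of edges inside community $c$, $d_c$ is the total degree of its nodes, and $m$ is the number of edges; the function name and the toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman-Girvan modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    inner = defaultdict(int)       # edges with both endpoints in the community
    degree_sum = defaultdict(int)  # total degree of the community's nodes
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            inner[community[u]] += 1
    return sum(inner[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Hypothetical toy graph: two triangles joined by a single edge.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
one_group = {node: "A" for node in range(6)}

q_two = modularity(edges, two_groups)
q_one = modularity(edges, one_group)
```

Splitting the toy graph into its two triangles yields a positive $Q$, whereas putting every node into one community yields $Q = 0$, which is why a larger $Q$ is read as a better partition.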
On these networks, our proposed method achieved the largest modularity on 8 of the 9. On the remaining network, ColiNeta, it still obtained the second largest modularity. FastQ is based on the modularity-optimization strategy, yet it acquired the largest modularity on ColiNeta only. WalkTrap is based on random walks, so its time complexity is relatively high; it could not produce effective results on Amazon and DBLP because of the large scale of these two networks. LPA and Attractor can extract community structures from all of these networks, but the quality of the detected results is not satisfactory. IsoFdp can only be applied to connected networks, so it cannot run on ColiNeta, NetScience, and YeastL, which are disconnected; it could not detect the community structure of Amazon and DBLP effectively either, because of their large scale. These comparisons show that our proposed method can steadily, effectively, and efficiently provide promising solutions to the problem of community detection in networks from a wide range of applications and that it outperforms the comparison algorithms significantly.
Figure 9: The setting of parameter $\delta$; modularity $Q$ is plotted against $\delta \in [0, 0.3]$. (a) The karate club network. (b) The dolphin social network. (c) The risk map network. (d) The scientists collaboration network.
5. Parameter Setting
In the second phase of the proposed method, we introduce a threshold $\delta$ on the community metric to identify the preliminary communities that need to be merged. As mentioned earlier, in the merge procedure we calculate the community metric $\gamma_i = \alpha_i \times \beta_i$ for every preliminary community $C_i$; if $\gamma_i$ is below the threshold $\delta$, the community $C_i$ is identified as one that needs to be merged.
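The thresholding step can be sketched as follows, using the definitions of $\alpha_i$ (inner edges over outer edges) and $\beta_i$ (share of the network's nodes). This is only an illustration: the function names are hypothetical, and treating a community with no outer edges as having infinite sparsity is an assumption made here, not something specified in the text.

```python
def community_metrics(edges, community):
    """Return gamma_i = alpha_i * beta_i for every community in the partition."""
    size, inner, outer = {}, {}, {}
    for c in community.values():
        size[c] = size.get(c, 0) + 1
        inner.setdefault(c, 0)
        outer.setdefault(c, 0)
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            inner[cu] += 1
        else:
            outer[cu] += 1
            outer[cv] += 1
    metrics = {}
    for c in size:
        # Assumption: no outer edges means the community is never a merge candidate.
        alpha = inner[c] / outer[c] if outer[c] else float("inf")
        beta = size[c] / len(community)
        metrics[c] = alpha * beta
    return metrics


def merge_candidates(metrics, delta=0.1):
    """Communities whose metric falls below the threshold delta."""
    return sorted(c for c, gamma in metrics.items() if gamma < delta)


# Hypothetical toy graph: two triangles plus node 6, which forms its own community.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 6), (0, 6)]
community = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B", 6: "C"}

metrics = community_metrics(edges, community)
to_merge = merge_candidates(metrics, delta=0.1)
```

In the toy example the singleton community has no inner edges, so its metric is 0 and it is the only one flagged for merging.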
Therefore, $\delta$ is a parameter of our proposed method, and its setting influences the quality of the resulting community structure. Qualitatively, the larger or the sparser the network is, the smaller the threshold $\delta$ should be, in accordance with the definitions of community sparsity ($\alpha_i$), community scale ($\beta_i$), and community metric ($\gamma_i$). To determine the optimal value of $\delta$, we conducted a group of experiments exploring the relationship between the value of $\delta$ and the quality of the resulting community structure on the first four networks listed in Table 2, namely, the karate club network, the dolphin social network, the Risk map network, and the scientists collaboration network. The quality of the resulting community structure is measured in terms of modularity $Q$. We vary $\delta$ from 0 to 1.0 in steps of 0.005; for each value of $\delta$, we run our proposed method on these networks and observe how the modularity changes with $\delta$.
The observed results are illustrated in Figure 9, in which we plot only the range $\delta \in [0, 0.3]$, because the largest modularity is obtained for $\delta \le 0.3$ on all four networks. Our proposed method attains the largest modularity at $\delta = 0.13$ on the dolphin social network and at $\delta = 0.1$ on the other three networks. Therefore, we adopt the corresponding value for each of those four networks and empirically set $\delta = 0.1$ for the other networks in our experiments. In Figure 9, the largest modularity is obtained around $\delta = 0.1$, and the interval $[0.05, 0.2]$ covers the optimal value of $\delta$. We therefore suggest, empirically, that in real-world applications $\delta$ be tuned around 0.1 within the range $[0.05, 0.2]$ according to the size and the sparsity of the network involved.
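The parameter sweep described above amounts to a one-dimensional grid search. The sketch below shows its shape under stated assumptions: `sweep_delta` is a hypothetical helper, and the `score` callable stands in for "run the method with this $\delta$ and return the modularity of the result"; the quadratic used in the usage lines is a made-up placeholder, not measured data.

```python
def sweep_delta(score, start=0.0, stop=1.0, step=0.005):
    """Evaluate score(delta) on a regular grid; return (best_delta, best_score, curve)."""
    n_steps = int(round((stop - start) / step))
    curve = []
    for k in range(n_steps + 1):
        # Compute each grid point from k to avoid accumulating rounding error.
        delta = round(start + k * step, 10)
        curve.append((delta, score(delta)))
    best_delta, best_score = max(curve, key=lambda point: point[1])
    return best_delta, best_score, curve


# Placeholder score with a single peak at delta = 0.1 (purely illustrative).
def placeholder_score(delta):
    return 0.4 - (delta - 0.1) ** 2


best_delta, best_score, curve = sweep_delta(placeholder_score)
```

In practice `placeholder_score` would be replaced by a function that runs the detection method on a given network and evaluates $Q$, and the returned curve corresponds to one panel of Figure 9.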
6. Conclusion
In this paper, we presented a novel method to detect communities in networks. It is a local method based on node similarity, and it avoids the high time consumption of global methods. First, we construct the preliminary community structure by repeatedly selecting the node with the largest degree and placing it according to the community assignment of its most similar neighbor: if the most similar neighbor has not yet been assigned to any community, we create a new community for the selected node and that neighbor; if the most similar neighbor has already been assigned to a community, we insert the selected node into that community as well. At the end of this process, we obtain a series of preliminary communities. However, some of them might be too small or too sparse, leading to a low-quality result. Therefore, we merge some of the preliminary communities to obtain the final community structure. To do so, we also proposed indexes that take both the size and the sparsity of communities into account to determine which communities should be merged.
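The first phase summarized above can be sketched in a few lines. This is a simplified illustration rather than the authors' implementation: the Jaccard index of closed neighborhoods is used as a stand-in for the node similarity defined earlier in the paper, ties are broken by node label, and isolated nodes become singleton communities; all three choices, and the function names, are assumptions.

```python
def jaccard(adj, u, v):
    """Stand-in similarity: Jaccard index of the closed neighborhoods of u and v."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def preliminary_communities(adj, similarity=jaccard):
    """Assign every node to a preliminary community; adj maps node -> set of neighbors."""
    community = {}
    next_id = 0
    # Visit nodes by decreasing degree (ties broken by label, an assumption).
    for u in sorted(adj, key=lambda n: (-len(adj[n]), n)):
        if u in community:
            continue  # already placed together with an earlier selected node
        if not adj[u]:
            community[u] = next_id  # isolated node: singleton community (assumption)
            next_id += 1
            continue
        best = max(sorted(adj[u]), key=lambda v: similarity(adj, u, v))
        if best not in community:
            community[best] = next_id  # new community for u and its neighbor
            next_id += 1
        community[u] = community[best]  # join the neighbor's community
    return community


# Hypothetical toy graph: two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
community = preliminary_communities(adj)
```

On the toy graph each triangle ends up as one preliminary community; the second phase, which merges small or sparse communities, is not shown here.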
To test the performance of the proposed method, we performed extensive experiments on four groups of synthetic networks and 13 real-world networks and compared the detected community structures with the results of the comparison algorithms in terms of NMI and modularity. The comparison demonstrates that our proposed method can extract high-quality community structures from networks drawn from various applications and that the nodes in the extracted communities are tightly connected. The proposed method overcomes the resolution limit problem to some extent and outperforms the competing algorithms.
Data Availability
We have conducted experiments on some artificial networks and some real-world datasets. The artificial networks are synthesized using the LFR benchmark network generator, which is freely available at https://sites.google.com/site/santofortunato. The parameters used to synthesize the artificial networks are listed in Table 1. The real-world data supporting this study are from previously reported studies, which are cited in Table 2. Most of the real-world datasets can also be downloaded from http://www-personal.umich.edu/~mejn/netdata and https://snap.stanford.edu/data/index.html. The ColiNeta dataset was provided by Jeong et al. [18]. We constructed the Risk map network manually according to the literature [16].
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
This work was partially supported by the National NaturalScience Foundation of China (Grant ID 61602225)
References
[1] J. Kleinberg and S. Lawrence, “Network analysis: The structure of the web,” Science, vol. 294, no. 5548, pp. 1849–1850, 2001.
[2] P. Chen and S. Redner, “Community structure of the physical review citation network,” Journal of Informetrics, vol. 4, no. 3, pp. 278–290, 2010.
[3] M. E. J. Newman, “Modularity and community structure in networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 23, pp. 8577–8582, 2006.
[4] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabási, “Hierarchical organization of modularity in metabolic networks,” Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[5] R. Guimerà and L. A. N. Amaral, “Functional cartography of complex metabolic networks,” Nature, vol. 433, no. 7028, pp. 895–900, 2005.
[6] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.
[7] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 2, Article ID 026113, 2004.
[8] P. M. Gleiser and L. Danon, “Community structure in jazz,” Advances in Complex Systems (ACS), vol. 6, no. 4, pp. 565–573, 2003.
[9] Y. van Gennip, B. Hunter, R. Ahn et al., “Community detection using spectral clustering on sparse geosocial data,” SIAM Journal on Applied Mathematics, vol. 73, no. 1, pp. 67–83, 2013.
[10] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[11] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
[12] S. Fortunato and D. Hric, “Community detection in networks: a user guide,” Physics Reports, vol. 659, pp. 1–44, 2016.
[13] B. W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs,” Bell Labs Technical Journal, vol. 49, no. 1, pp. 291–307, 1970.
[14] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[15] D. Lusseau, “The emergent properties of a dolphin social network,” in Proceedings of the Royal Society of London B: Biological Sciences, vol. 270, supplement 2, pp. S186–S188, 2003.
[16] K. Steinhaeuser and N. V. Chawla, “Identifying and evaluating community structure in complex networks,” Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[17] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
[18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási, “The large-scale organization of metabolic networks,” Nature, vol. 407, no. 6804, pp. 651–654, 2000.
[19] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, “Self-similar community structure in a network of human interactions,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[20] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: simple building blocks of complex networks,” Science, vol. 298, no. 5594, pp. 824–827, 2002.
[21] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, “Models of social networks based on social distance attachment,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 5, Article ID 056122, 2004.
[22] J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.
[23] M. E. J. Newman, “Fast algorithm for detecting community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 6, Article ID 066133, 2004.
[24] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 6, Article ID 066111, 2004.
[25] F. Dabaghi Zarandi and M. Kuchaki Rafsanjani, “Community detection in complex networks using structural similarity,” Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 882–891, 2018.
[26] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, Article ID P10008, 2008.
[27] L. Waltman and N. J. Van Eck, “A smart local moving algorithm for large-scale modularity-based community detection,” The European Physical Journal B, vol. 86, no. 11, article 471, pp. 1–14, 2013.
[28] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 76, no. 3, Article ID 036106, 2007.
[29] M. J. Barber and J. W. Clark, “Detecting network communities by propagating labels under constraints,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 2, Article ID 026129, 2009.
[30] J. Hou Chin and K. Ratnavelu, “A semi-synchronous label propagation algorithm with constraints for community detection in complex networks,” Scientific Reports, vol. 7, Article ID 45836, 2017.
[31] J. Ding, X. He, J. Yuan, Y. Chen, and B. Jiang, “Community detection by propagating the label of center,” Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 675–686, 2018.
[32] A. Laio and A. Rodriguez, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[33] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger, “SCAN: A structural clustering algorithm for networks,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’07), pp. 824–833, ACM, New York, NY, USA, August 2007.
[34] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD ’96), pp. 226–231, AAAI Press, 1996.
[35] H. Shiokawa, Y. Fujiwara, and M. Onizuka, “Scan++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs,” in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1178–1189, Republic of Korea, September 2006.
[36] T. You, H.-M. Cheng, Y.-Z. Ning, B.-C. Shia, and Z.-Y. Zhang, “Community detection in complex networks using density-based clustering algorithm and manifold learning,” Physica A: Statistical Mechanics and its Applications, vol. 464, pp. 221–230, 2016.
[37] X. Wang, G. Liu, J. Li, and J. P. Nees, “Locating structural centers: A density-based clustering method for community detection,” PLoS ONE, vol. 12, no. 1, Article ID e0169355, 2017.
[38] P. Pons and M. Latapy, “Computing communities in large networks using random walks,” in International Symposium on Computer and Information Sciences, pp. 284–293, 2005.
[39] S. A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, and M. A. Tavallaie, “Personalized PageRank clustering: a graph clustering algorithm based on random walks,” Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772–5785, 2013.
[40] Y. Su, B. Wang, and X. Zhang, “A seed-expanding method based on random walks for community detection in networks with ambiguous community structures,” Scientific Reports, vol. 7, Article ID 41830, 2017.
[41] J. Shao, Z. Han, Q. Yang, and T. Zhou, “Community detection based on distance dynamics,” in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1075–1084, ACM, Australia, August 2015.
[42] H.-L. Sun, E. Ch’ng, X. Yong, J. M. Garibaldi, S. See, and D.-B. Chen, “A fast community detection method in bipartite networks by distance dynamics,” Physica A: Statistical Mechanics and its Applications, vol. 496, pp. 108–120, 2018.
[43] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, “Pseudo-likelihood methods for community detection in large sparse networks,” The Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[44] S. C. de Lange, M. A. de Reus, and M. P. van den Heuvel, “The laplacian spectrum of neural networks,” Frontiers in Computational Neuroscience, vol. 7, no. 189, 2014.
[45] F. Krzakala, C. Moore, E. Mossel et al., “Spectral redemption in clustering sparse networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 52, pp. 20935–20940, 2013.
[46] P. Shi, K. He, D. Bindel, and J. E. Hopcroft, “Local Lanczos Spectral Approximation for Community Detection,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 10534 of Lecture Notes in Computer Science, pp. 651–667, Springer International Publishing, 2017.
[47] R. Tackx, F. Tarissan, and J. Guillaume, “ComSim: a bipartite community detection algorithm using cycle and node’s similarity,” in International Workshop on Complex Networks and their Applications, vol. 689 of Studies in Computational Intelligence, pp. 278–289, Springer International Publishing, 2017.
[48] T. Wang, L. Yin, and X. Wang, “A community detection method based on local similarity and degree clustering information,” Physica A: Statistical Mechanics and its Applications, vol. 490, pp. 1344–1354, 2018.
[49] K. R. Zalik, “Maximal neighbor similarity reveals real communities in networks,” Scientific Reports, vol. 5, Article ID 18374, 2015.
[50] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 78, no. 4, Article ID 046110, 2008.
[51] L. Ana and A. Jain, “Robust data clustering,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-128–II-133, Madison, WI, USA, 2003.
[Figure 1: two drawings of the karate club network, panels (a) and (b), with nodes labeled 1 to 34.]
Figure 1: The procedure of FPC() on the karate club network.
For the first problem, we propose an index, the community metric, which takes into account two factors, community size and community sparsity, to identify the preliminary communities that need to be merged. Here we formalize the relevant concepts and the index in Definitions 1 through 3.
Definition 1 (community sparsity). The sparsity of community $C_i$ is defined as follows:

$$\alpha_i = \frac{|E_i^{in}|}{|E_i^{out}|}, \tag{4}$$

where $E_i^{in}$ is the set of edges within community $C_i$, and $E_i^{out}$ is the set of edges connecting nodes in community $C_i$ with other communities.
That is to say, the sparsity of community $C_i$ is defined as the ratio between the number of inner edges of $C_i$ and the number of outer edges of $C_i$. Obviously, the more edges exist within community $C_i$, the larger the value of $\alpha_i$ will be, and vice versa.
Definition 2 (community scale). The scale of community $C_i$ is formalized as follows:

$$\beta_i = \frac{|V_i|}{|V|}, \tag{5}$$

where $V_i$ is the set of nodes in community $C_i$.
That is, the scale of community $C_i$ is the ratio of the number of nodes in $C_i$ to the total number of nodes in the network. The more nodes there are in community $C_i$, the larger the ratio will be, and vice versa.
Definition 3 (community metric). The community metric is a combination of both the community sparsity and the community scale, which is defined for community $C_i$ as follows:

$$\gamma_i = \alpha_i \times \beta_i. \tag{6}$$
On the basis of these definitions, the first problem can be solved by setting a community metric threshold $\delta$. That is to say, if $\gamma_i < \delta$, community $C_i$ needs to be merged into another community.
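Definitions 1 through 3 can be illustrated with a minimal Python sketch. The function name `community_metric`, the edge-list representation, and the toy graph are assumptions made for illustration only; the treatment of a community with no outer edges (its sparsity is taken as infinite) is also an assumption, since Eq. (4) leaves that case undefined.

```python
from math import inf

def community_metric(edges, community, n_nodes):
    """Return (alpha, beta, gamma) of one community, following Eqs. (4)-(6)."""
    members = set(community)
    # Inner edges have both endpoints inside the community.
    inner = sum(1 for u, v in edges if u in members and v in members)
    # Outer edges have exactly one endpoint inside the community.
    outer = sum(1 for u, v in edges if (u in members) != (v in members))
    alpha = inner / outer if outer else inf  # assumption: no outer edges -> infinite ratio
    beta = len(members) / n_nodes
    return alpha, beta, alpha * beta

# Hypothetical toy graph: two triangles joined by the single edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
alpha, beta, gamma = community_metric(edges, {1, 2, 3}, n_nodes=6)
print(alpha, beta, gamma)  # 3.0 0.5 1.5
```

With a threshold such as $\delta = 0.1$, this toy community would not be selected for merging, because its $\gamma_i$ is well above the threshold.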
For the second problem, we consider a strategy that conforms to the construction of the preliminary communities. The preliminary communities are formed mainly on the basis of node similarity in the first phase; therefore, we also use similarity as the criterion to merge communities, i.e., each of the small or sparse communities is merged into its most similar adjacent community. Here, the similarity between two communities $C_i$ and $C_j$ is calculated as follows:

$$Sim(C_i, C_j) = \frac{\sum_{u \in C_i, v \in C_j} sim(u, v)}{|C_j|}, \tag{7}$$

where $sim(u, v)$ is the similarity between nodes $u \in C_i$ and $v \in C_j$, which is calculated using (3). In function PCM(), which implements the merge procedure, $C_i$ is a community that needs to be merged and $C_j$ is one of its adjacent communities. The numerator on the right-hand side of (7) is the sum of similarities between nodes in communities $C_i$ and $C_j$. Dividing by $|C_j|$ offsets the advantage that larger communities would otherwise have, which prevents the formation of giant communities.
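As a rough illustration of Eq. (7), the sketch below sums a node-level similarity over all cross-community pairs and divides by the size of the candidate community. The name `community_similarity` and the lookup table `pair_sim` are hypothetical; the node similarity of Eq. (3) is not reproduced here, so `sim` is passed in as an arbitrary callable.

```python
def community_similarity(c_i, c_j, sim):
    """Eq. (7): summed node similarity between C_i and C_j, divided by |C_j|."""
    total = sum(sim(u, v) for u in c_i for v in c_j)
    return total / len(c_j)

# Hypothetical stand-in for sim(u, v) of Eq. (3): a table of pairwise values.
pair_sim = {frozenset((1, 3)): 0.5, frozenset((2, 3)): 0.5, frozenset((2, 4)): 0.5}

def sim(u, v):
    # Pairs that are not listed are treated as having zero similarity.
    return pair_sim.get(frozenset((u, v)), 0.0)

print(community_similarity({1, 2}, {3, 4, 5}, sim))  # 0.5
```

Note that the measure is not symmetric: swapping the arguments changes the denominator, which is how larger candidate communities are penalized.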
The logic of the entire second phase is listed in Algorithm 3; the operations are almost self-explanatory. The variable $CS$ records the final community structure; it is initialized as the preliminary community structure $CS_{pre}$ in step 1. Step 2 calculates the community metric for each of the preliminary communities; steps 3 and 4 select the community with the smallest community metric and its most similar community; step 5 merges them to yield a new community; and step 6 calculates the community metric for that new community. Step 7 replaces the two communities $C_t$ and $C_j$ with the new community in $CS$ to reflect the effect of the merge operation. Step 8 repeats steps 3 through 7 until the community metric of the selected community, which is the minimal one, is larger than the given threshold $\delta$, meaning that all the remaining communities are satisfactory; the merge procedure is then terminated, and the resulting community structure in $CS$ is returned in step 9.
Input: $CS_{pre}$, the preliminary community structure; $\delta$, the community-metric threshold
Output: $CS$, the final community structure
1. Initialize $CS$, which is used to record the community structure:
   $CS \leftarrow CS_{pre}$
2. Calculate the community metric for each of the preliminary communities:
   foreach $C_i \in CS$ do $\gamma_i \leftarrow \alpha_i \times \beta_i$
3. Select the community with the minimal community metric and denote its index as $t$:
   $t \leftarrow \arg\min_i \{\gamma_i \mid i = 1, 2, \cdots, |CS|\}$
4. Identify the community most similar to $C_t$ and denote its index as $j$:
   $j \leftarrow \arg\max_i \{Sim(C_t, C_i) \mid i = 1, 2, \cdots, |CS|, i \neq t\}$
5. Merge communities $C_t$ and $C_j$ to form a new community:
   $k \leftarrow |CS|$, $C_{k+1} \leftarrow C_t \cup C_j$
6. Calculate the community metric for the new community:
   $\gamma_{k+1} \leftarrow \alpha_{k+1} \times \beta_{k+1}$
7. Replace the two communities $C_t$ and $C_j$ with the new community to reflect the merging effect:
   $CS = CS - \{C_t, C_j\} \cup \{C_{k+1}\}$
8. Repeat steps 3 through 7 until $\gamma_t > \delta$
9. return $CS$

Algorithm 3: PCM($CS_{pre}$, $\delta$): merge small or sparse communities.
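The merge loop of Algorithm 3 can be sketched in Python as follows. This is an illustrative reading rather than the authors' implementation: the names `gamma`, `similarity`, and `pcm`, the edge-list input, and the constant stand-in for $sim(u, v)$ are all assumptions. The sketch also checks the threshold before each merge (so nothing is merged when every community already has $\gamma_i \geq \delta$), recomputes all metrics in every round instead of using a heap, and restricts candidates to adjacent communities as described in the text.

```python
from math import inf

def gamma(edges, members, n_nodes):
    """Community metric of Eq. (6): (inner / outer edges) * (|V_i| / |V|)."""
    inner = sum(1 for u, v in edges if u in members and v in members)
    outer = sum(1 for u, v in edges if (u in members) != (v in members))
    alpha = inner / outer if outer else inf  # assumption: no outer edges -> never merged
    return alpha * len(members) / n_nodes

def similarity(c_t, c_i, sim):
    """Community similarity of Eq. (7)."""
    return sum(sim(u, v) for u in c_t for v in c_i) / len(c_i)

def pcm(edges, cs_pre, delta, sim):
    n_nodes = sum(len(c) for c in cs_pre)
    cs = [set(c) for c in cs_pre]                        # step 1
    while len(cs) > 1:
        gammas = [gamma(edges, c, n_nodes) for c in cs]  # steps 2 and 6
        t = min(range(len(cs)), key=gammas.__getitem__)  # step 3
        if gammas[t] >= delta:                           # step 8 (checked up front)
            break
        # Adjacent communities: those reached by an edge leaving C_t.
        neighbours = [i for i in range(len(cs)) if i != t and any(
            (u in cs[t]) != (v in cs[t]) and (u in cs[i] or v in cs[i])
            for u, v in edges)]
        if not neighbours:
            break
        j = max(neighbours, key=lambda i: similarity(cs[t], cs[i], sim))  # step 4
        merged = cs[t] | cs[j]                           # step 5
        cs = [c for i, c in enumerate(cs) if i not in (t, j)] + [merged]  # step 7
    return cs                                            # step 9

# Hypothetical toy graph: two triangles plus a pendant node 7 attached to node 6.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6), (6, 7)]
result = pcm(edges, [{1, 2, 3}, {4, 5, 6}, {7}], delta=0.1, sim=lambda u, v: 1.0)
print(sorted(sorted(c) for c in result))  # [[1, 2, 3], [4, 5, 6, 7]]
```

In the toy run, the singleton community {7} has $\gamma = 0$, so it is absorbed by its only adjacent community; the two remaining communities both exceed the threshold and the loop stops.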
3.4. Time Complexity. The proposed algorithm comprises two phases. The first phase forms the preliminary communities. The main time consumption in this phase lies in selecting the node with the largest degree (step 2 in Algorithm 2) and its most similar neighbor (step 3 in Algorithm 2); the former can be accomplished in $O(\log n)$ in each iteration using a max-heap data structure, and the latter can be done in $O(\log \langle d \rangle)$ with a max-heap, where $\langle d \rangle$ is the average degree of nodes in the network. Since $\langle d \rangle \ll n$, the time consumption of the first phase is $O(n \log n)$.
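The heap-based selection of the largest-degree node can be sketched with Python's standard `heapq` module. Since `heapq` provides a min-heap, degrees are negated to emulate a max-heap; the function name `nodes_by_descending_degree` and the adjacency-dictionary input are assumptions for illustration, and ties are broken here by node identifier, which the paper does not specify.

```python
import heapq

def nodes_by_descending_degree(adjacency):
    """Yield (node, degree) pairs, largest degree first."""
    # Negate degrees so that the smallest heap entry is the largest degree.
    heap = [(-len(neighbours), node) for node, neighbours in adjacency.items()]
    heapq.heapify(heap)                           # O(n) construction
    while heap:
        neg_degree, node = heapq.heappop(heap)    # O(log n) per selection
        yield node, -neg_degree

# Hypothetical toy network given as an adjacency dictionary.
adjacency = {1: [2, 3, 4], 2: [1], 3: [1, 4], 4: [1, 3]}
print(list(nodes_by_descending_degree(adjacency)))
# [(1, 3), (3, 2), (4, 2), (2, 1)]
```

Popping all $n$ nodes costs $O(n \log n)$ in total, which matches the bound stated for the first phase.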
The second phase improves the quality of the resulting community structure by merging some of the small or sparse communities. Most of the time is spent on determining the community that needs to be merged and its most similar adjacent community in each iteration. Assuming that there are $K$ communities in the preliminary community structure, the former operation can be implemented in $O(\log K)$, and the latter can also be carried out in $O(\log K)$ time in the worst case. Hence, the second phase can be implemented with $O(K \log K)$ time consumption.
Since $K \ll n$, the cost of the second phase is dominated by that of the first phase. Therefore, the proposed method can detect communities from networks with relatively high efficiency, with an overall time complexity of $O(n \log n)$.
4. Experimental Results and Discussion
4.1. Network Datasets and Comparison System. To test the performance of our proposed method, we conducted extensive experiments on both groups of artificial networks and real-world networks. The artificial networks are synthesized using the LFR benchmark network generator [50], which takes parameters that control the characteristics of the generated networks. Here we consider the influences of both the network scale and the community size; therefore, four types of networks are generated, namely, small networks with small communities and with big communities, and larger networks with small communities and with big communities. The small and larger networks contain 1000 and 5000 nodes, respectively; the small communities contain between 10 and 50 nodes; and the minimum and maximum numbers of nodes in the big communities are 20 and 100, respectively. The generated networks with small communities and big communities are marked with the suffixes 's' and 'b', respectively. The exponents of the power-law distributions that node degree and community size follow are the default values, −2 and −1, respectively. The parameters used to synthesize the four groups of artificial networks are listed in Table 1.
We also performed experiments on 13 real-world networks, whose sizes span from tens to hundreds of thousands of nodes; the information about them is listed in Table 2. These real-world networks can be divided into two categories: the first category includes the first four networks, whose ground-truth communities are known a priori; the second contains the other nine networks, which have no publicly acknowledged ground-truth community structures.
On these networks, we ran our proposed method to detect community structures and compared the results with those of 5 popular community detection algorithms, namely, Fast$Q$ [24], WalkTrap [38], LPA [28], Attractor [41], and IsoFdp [36], which have already been introduced in Section 2. Since LPA is a nondeterministic algorithm, we ran it on each network 10 times and took the average of the evaluation metrics as its metric value for that network. For our proposed method, NSA, we empirically set $\delta = 0.13$ for the dolphin social network and $\delta = 0.1$ for the other networks in the experiments. The details of how to set the optimal value of $\delta$ are discussed in Section 5.
Table 1: The parameters used to generate the LFR networks. In the header row, $n$ is the number of nodes contained in the network; $\langle d \rangle$ and $d_{max}$ are the average degree and the maximum degree, respectively; $exp_d$ and $exp_{com}$ are the exponents of the power-law distributions that node degree and community size follow; $\min(C_i)$ and $\max(C_i)$ represent the minimal and maximal numbers of nodes contained in a community, respectively.

| Network | $n$ | $\langle d \rangle$ | $d_{max}$ | $exp_d$ | $exp_{com}$ | $\min(C_i)$ | $\max(C_i)$ |
|---|---|---|---|---|---|---|---|
| LFR1000s | 1000 | 20 | 50 | -2 | -1 | 10 | 50 |
| LFR1000b | 1000 | 20 | 50 | -2 | -1 | 20 | 100 |
| LFR5000s | 5000 | 20 | 50 | -2 | -1 | 10 | 50 |
| LFR5000b | 5000 | 20 | 50 | -2 | -1 | 20 | 100 |

Table 2: The information about the real-world networks. $n$ and $m$ are the numbers of nodes and edges in the network, respectively.

| Network | $n$ | $m$ |
|---|---|---|
| Karate club [14] | 34 | 78 |
| Dolphin social network [15] | 62 | 159 |
| Risk map [16] | 42 | 83 |
| Scientists collaboration network [6] | 118 | 197 |
| Lesmis [17] | 77 | 254 |
| Polbooks [3] | 105 | 441 |
| ColiNeta [18] | 423 | 519 |
| NetScience [10] | 1589 | 2742 |
| Email [19] | 1133 | 5451 |
| YeastL [20] | 2361 | 7182 |
| PGP [21] | 10680 | 24316 |
| DBLP [22] | 317080 | 1049866 |
| Amazon [22] | 334863 | 925872 |

4.2. Evaluation Metrics. Two indexes, namely, NMI (Normalized Mutual Information) [51] and modularity [7], are adopted as the metrics to evaluate the quality of the detected community structure in this paper. The NMI between the ground-truth community structure $P = \{P_1, P_2, \ldots, P_K\}$ and the extracted one $P' = \{P'_1, P'_2, \ldots, P'_{K'}\}$ is calculated as follows:
$$NMI(P, P') = \frac{-2 \sum_{i=1}^{|P|} \sum_{j=1}^{|P'|} n_{ij} \log\left(\frac{n_{ij} \cdot n}{n_i^{P} \cdot n_j^{P'}}\right)}{\sum_{i=1}^{|P|} n_i^{P} \log\left(\frac{n_i^{P}}{n}\right) + \sum_{j=1}^{|P'|} n_j^{P'} \log\left(\frac{n_j^{P'}}{n}\right)}, \tag{8}$$

where $n_i^{P} = |P_i|$, $n_j^{P'} = |P'_j|$, and $n_{ij} = |P_i \cap P'_j|$, respectively.

The NMI is an information-theory based metric that measures how much the detected community structure agrees with the ground truth. Therefore, it can only be used to evaluate the quality of the detected community structure on networks whose ground-truth community structure is already known. Its value is in the range $[0, 1]$; larger is better.
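Eq. (8) can be evaluated directly from two label assignments, as in the following sketch. The function name `nmi` and the dictionary-based input (node mapped to community label) are assumptions for illustration; terms with $n_{ij} = 0$ are skipped, and returning 1.0 when both partitions consist of a single community (zero denominator) is a convention chosen here, not something stated in the paper.

```python
from collections import Counter
from math import log

def nmi(truth, detected):
    """NMI of Eq. (8); both arguments map each node to a community label."""
    n = len(truth)
    n_p = Counter(truth.values())                             # sizes |P_i|
    n_q = Counter(detected.values())                          # sizes |P'_j|
    n_pq = Counter((truth[v], detected[v]) for v in truth)    # overlaps |P_i ∩ P'_j|
    numerator = -2.0 * sum(c * log(c * n / (n_p[i] * n_q[j]))
                           for (i, j), c in n_pq.items())
    denominator = (sum(c * log(c / n) for c in n_p.values())
                   + sum(c * log(c / n) for c in n_q.values()))
    # Assumption: two single-community partitions are treated as identical.
    return numerator / denominator if denominator else 1.0

truth = {1: "a", 2: "a", 3: "b", 4: "b"}
same = {1: "x", 2: "x", 3: "y", 4: "y"}         # same grouping, different label names
unrelated = {1: "x", 2: "y", 3: "x", 4: "y"}    # statistically independent grouping
print(nmi(truth, same), nmi(truth, unrelated))
```

The first call should give a value of 1 (perfect agreement up to relabeling) and the second a value of 0, up to floating-point rounding.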
Another metric widely used to evaluate the performance of community detection methods is modularity [7], which is defined as follows:

$$Q = \sum_i \left(e_{ii} - a_i^2\right), \tag{9}$$

where $e_{ii}$ is the diagonal element of a $K \times K$ matrix $e$, whose element $e_{ij}$ is the fraction of edges between nodes in communities $C_i$ and $C_j$ to the total edges in the network; $K$ is the number of communities in the community structure; and $a_i$ is the fraction of edges associated with nodes in community $C_i$.
The first term, $\sum_i e_{ii}$, on the right-hand side of (9) is the fraction of edges within communities; the second term, $\sum_i a_i^2$, is the expected value of the same fraction in a random graph in which the nodes and degree distribution are the same as in the original network but edges are placed between nodes randomly. The smaller the difference between the two terms, the closer the network is to a random graph, and the weaker the community structure is. Conversely, the larger the difference between them, the further the network departs from the random graph, and the stronger the community structure is. That is to say, modularity measures the quality of the community structure from the perspective of how far the detected result deviates from a random network. Its effective value falls in $[0, 1]$; higher is better.
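A small sketch of Eq. (9) for an undirected, unweighted network is given below. The function name `modularity`, the edge-list input, and the node-to-label dictionary are assumptions for illustration; $a_i$ is computed in the usual way as the fraction of edge endpoints attached to community $C_i$.

```python
from collections import Counter

def modularity(edges, membership):
    """Q of Eq. (9); membership maps each node to its community label."""
    m = len(edges)
    inner = Counter()      # number of edges inside each community
    endpoints = Counter()  # number of edge endpoints attached to each community
    for u, v in edges:
        cu, cv = membership[u], membership[v]
        endpoints[cu] += 1
        endpoints[cv] += 1
        if cu == cv:
            inner[cu] += 1
    # e_ii = inner / m and a_i = endpoints / (2m)
    return sum(inner[c] / m - (endpoints[c] / (2 * m)) ** 2 for c in endpoints)

# Hypothetical toy graph: two triangles joined by the single edge (3, 4).
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
membership = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
print(round(modularity(edges, membership), 3))  # 0.357
```

Here each community holds 3 of the 7 edges and half of the edge endpoints, so $Q = 2 \times (3/7 - 0.25) \approx 0.357$.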
4.3. Synthetic Networks. We carried out experiments on four groups of artificial networks to test the performance of the proposed method. As mentioned above, all four types of artificial networks are synthesized using the LFR benchmark generator software [50]. Besides the parameters listed in Table 1, another critical parameter for this software is the mixing parameter $\mu$, which regulates, for each node, the ratio of edges connected to nodes in other communities. The smaller the value of $\mu$, the clearer the community structure will be. Obviously, $\mu = 0.5$ is a transition point above which communities in networks tend to be obscure.
[Figure 2: two line plots, panels (a) and (b), of NMI (vertical axis, 0.0 to 1.0) against values from 0.1 to 0.8 on the horizontal axis, with one curve each for FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposal (NSA).]
Figure 2: Comparison of different community-detection algorithms on LFR benchmark networks containing 1000 nodes. (a) The results detected from small networks with small-sized communities. (b) The results identified from small networks with big-sized communities.
In our experiments, we varied the value of $\mu$ from 0.1 to 0.8 with an increment of 0.1 for each group of LFR networks. To reduce the effect of chance, we generated 10 networks for each value of $\mu$ while keeping the same setting for the other parameters. Since the community structures are already embedded in these synthetic networks, we use NMI as the metric to evaluate the performance of our proposed method and the comparison algorithms. We took these networks as input one by one, ran our proposed method and the comparison algorithms to detect communities, and used the average NMI as the resulting metric. The results detected by our proposal and the comparison algorithms from the small networks with small-sized and big-sized communities are illustrated in Figures 2(a) and 2(b), respectively; the results from the larger networks with small-sized and big-sized communities are presented in Figures 3(a) and 3(b), respectively.
In Figures 2(a) and 2(b), Fast$Q$ tends to introduce mistakes in the results no matter whether the communities in the networks are well separated or obscure. As mentioned previously, Fast$Q$ is a typical modularity-optimization based algorithm; it aims only at acquiring results with larger modularity rather than high accuracy. In our experiments, none of the results uncovered by it is satisfactory. Even in the networks with $\mu = 0.1$, it still failed to identify the exact communities; furthermore, its performance is the worst among the comparison algorithms for $\mu \leq 0.5$. For $\mu > 0.5$, the quality of its results is better only than that of LPA. LPA performed as well as the other comparison algorithms for $\mu < 0.5$, but its performance dropped dramatically for $\mu \geq 0.5$; it could not even detect effective communities for $\mu > 0.6$. This might be due to its label-update mechanism: when the community boundaries become obscure, nodes tend to accept incorrect labels when updating their own, which leads to trivial results in which even all nodes are labeled as members of one giant community. The proposed method, NSA, acquired NMI = 1 on all networks for $\mu < 0.5$, meaning that the detected partitions perfectly match the ground-truth community structures in these networks. For $\mu = 0.5$, NSA also obtained results as good as those of WalkTrap, Attractor, and IsoFdp. For $\mu > 0.5$, the quality of the detected community structures declined for all those three algorithms and for the proposed method. For $0.5 < \mu \leq 0.6$, the quality of our proposal is better than that of Attractor in networks with larger communities, and for $\mu \geq 0.7$, the performance of our proposed method is the best.
In Figures 3(a) and 3(b), we obtained results similar to those in Figure 2 overall, but they differ in some ways. In Figure 3(a), our proposed method performed the best on almost all networks. For $0.5 < \mu < 0.7$ in Figure 2, the NMI of the results extracted by our proposed method is lower than those of WalkTrap and IsoFdp; however, in Figure 3, the proposed method performed better than IsoFdp for $\mu > 0.5$. These results suggest that the performances of the comparison algorithms are not stable across different networks, whereas our proposed method can steadily extract high-quality community structures from networks with different characteristics. This is also shown by the fact that the curves of the proposed method in these figures decline more slowly than the others. Moreover, by comparing the proposal's own curves across these figures, we can conclude that our proposed method tends to perform better on larger networks with small communities; therefore, it overcomes the problem of resolution limit to some extent.
4.4. Real-World Networks. We also carried out experiments on 13 real-world networks to further test the effectiveness and efficiency of our proposed method. As mentioned in Section 4.1, these networks fall into two categories: those whose ground-truth community structure is known a priori and those without publicly acknowledged ground truth.

[Figure 3: two line plots, panels (a) and (b), of NMI (vertical axis, 0.0 to 1.0) against values from 0.1 to 0.8 on the horizontal axis, with one curve each for FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposal (NSA).]
Figure 3: Comparison of different community detection algorithms on LFR benchmark networks containing 5000 nodes. (a) The results extracted from the larger networks with small-sized communities. (b) The results revealed from the larger networks with big-sized communities.

[Figure 4: two drawings of the karate club network, panels (a) and (b), with nodes labeled 1 to 34.]
Figure 4: The karate club network. (a) The ground-truth community structure. (b) The community structure detected by our proposed method, NSA. (The nodes in different communities are plotted in different colors and shapes; this illustration style is also applied in the subsequent figures.)
Networks with Ground-Truth Community Structure. This category includes the first 4 networks listed in Table 2. Since their ground-truth community structure is already known, we measure the quality of the community structures identified by the proposed method and the comparison algorithms in terms of both NMI and modularity. The values of the two metrics obtained by the proposed method and the comparison algorithms are recorded in Table 3. These networks are relatively small, which makes it easy to visualize the detected results. Below, we analyze the results extracted by the proposed method from these networks individually.
The Karate Club Network. This network depicts the friendships among members of a karate club; it contains 34 nodes and 78 edges. The network was compiled by Wayne W. Zachary, who observed the karate club for 3 years. During Zachary's period of study, the club split into two factions because of a dispute that arose between the administrator and the instructor. Corresponding to the two factions, the partition into two communities is generally taken as the ground truth, which is shown in Figure 4(a). The result detected by our proposed method is presented in Figure 4(b).
From Figure 4, we can see that our proposed method detected 3 rather than 2 communities in the network. The detected result seems to deviate from the ground truth in some ways, but it coincides with the conclusion found in the experiments on synthetic networks that our proposed method tends to find small communities in networks, which helps to overcome the problem of resolution limit. Moreover, from the perspective of the measure metrics, the modularity of the detected result is the largest among those of the comparison algorithms. Although our proposed method is not based on the strategy of optimizing modularity, it tends to acquire a community structure with as large a modularity as possible: if its modularity is not the largest, it is the second largest, with a small gap to the largest. These findings are also confirmed on the following networks.

[Figure 5: two drawings of the dolphin social network, panels (a) and (b), with nodes labeled by dolphin names.]
Figure 5: The dolphin social network. (a) The ground-truth community structure. (b) The community structure identified by our proposed method, NSA.

Table 3: The experimental results on networks with ground-truth community structures. The largest values of the two measure metrics are typed in bold.

| Network | Metric | Fast$Q$ | WalkTrap | LPA | Attractor | IsoFdp | NSA |
|---|---|---|---|---|---|---|---|
| Karate | $Q$ | 0.381 | 0.353 | 0.355 | 0.371 | 0.371 | **0.402** |
| Karate | NMI | 0.693 | 0.504 | 0.62 | 0.924 | **1.00** | 0.699 |
| Dolphin | $Q$ | 0.492 | 0.489 | 0.464 | 0.45 | 0.505 | **0.513** |
| Dolphin | NMI | 0.719 | 0.632 | 0.719 | 0.69 | 0.744 | **0.887** |
| Risk map | $Q$ | **0.625** | 0.624 | 0.59 | 0.598 | 0.519 | 0.624 |
| Risk map | NMI | **0.894** | 0.848 | 0.821 | 0.839 | 0.714 | 0.848 |
| Scientists | $Q$ | **0.749** | 0.733 | 0.64 | 0.694 | 0.668 | 0.744 |
| Scientists | NMI | 0.867 | 0.818 | 0.743 | 0.835 | 0.823 | **0.878** |
Lusseau's Dolphin Social Network. This network describes the interactions of a group of dolphins living in Doubtful Sound, New Zealand. It consists of 62 nodes and 159 edges, which represent individual dolphins and the observed co-occurrences of pairs of dolphins, respectively. This network is generally partitioned into 4 groups as the ground-truth community structure, which is exhibited in Figure 5(a). Figure 5(b) shows the community structure uncovered by our proposed method.
In Figure 5, our proposed method detected communities in this network with a high degree of success: it also identified 4 communities, the vast majority of nodes are classified into the correct communities, and the result is very close to the ground-truth community structure. Quantitatively, both the NMI and the modularity of the result detected by the proposed method on this network are the largest among those of the comparison algorithms, which means that the community structure identified by the proposed method is clearly better than those of the comparison algorithms.
Risk Map Network. This network is the world political map used in the popular game Risk (https://en.wikipedia.org/wiki/Risk_(game)), in which 42 countries or territories of 6 continents are involved. Therefore, the 42 nodes and the 83 edges connecting adjacent countries or territories are organized into 6 communities as the ground truth, which is illustrated in Figure 6(a). Feeding this network into the proposed method, we obtained the community structure shown in Figure 6(b).
Comparing the detected result with the ground-truth community structure, the community containing nodes '18' and '23' in the ground truth is split into two small communities in Figure 6(b), owing to the tendency of the proposed method to find small communities. Besides this, nodes '26', '33', and '34' are classified into the wrong communities in the detected result. However, nodes '12', '16', '26', '33', and '34' are special in this network: the outer edges associated with them are no fewer than, or even more than, those within the communities to which they belong. Therefore, if we ignore the actual meaning of these nodes and consider the topology only, the community structure extracted by our proposed method is more rational than the ground truth: more edges associated with the three misclassified nodes are located within their communities than in the ground truth, so these three nodes are connected more tightly to nodes within the same community in Figure 6(b). Quantitatively, both measure metrics of our proposed method are second only to those of Fast$Q$ and equal to those of WalkTrap. These results also confirm that our proposed method provides an acceptable solution to the problem of community detection.

[Figure 6: two drawings of the Risk map network, panels (a) and (b), with nodes labeled 1 to 42.]
Figure 6: Risk map network. (a) The ground-truth community structure. (b) The community structure uncovered by our proposed method, NSA.

Table 4: The experimental results of modularity on networks without ground-truth community structures. The largest value for each network is typed in bold; a dash marks a network on which the algorithm produced no result.

| Network | Fast$Q$ | WalkTrap | LPA | Attractor | IsoFdp | NSA |
|---|---|---|---|---|---|---|
| Lesmis | 0.499 | 0.519 | 0.515 | 0.498 | 0.491 | **0.54** |
| Polbooks | 0.502 | 0.507 | 0.508 | 0.501 | 0.518 | **0.524** |
| ColiNeta | **0.779** | 0.746 | 0.693 | 0.718 | - | 0.761 |
| Email | 0.499 | 0.531 | 0.379 | 0.464 | 0.531 | **0.544** |
| NetScience | 0.955 | 0.956 | 0.896 | 0.937 | - | **0.957** |
| YeastL | 0.573 | 0.529 | 0.372 | 0.511 | - | **0.574** |
| PGP | 0.85 | 0.789 | 0.765 | 0.768 | 0.726 | **0.867** |
| DBLP | 0.735 | - | 0.652 | 0.637 | - | **0.782** |
| Amazon | 0.869 | - | 0.743 | 0.741 | - | **0.898** |
Scientists Collaboration Network. This is the largest connected component of a network delineating the coauthor relationships among scientists working at the Santa Fe Institute, New Mexico. Nodes in this network represent scientists; an edge indicates that two scientists have collaborated on at least one paper. There are 118 nodes and 197 edges in total in this network. The nodes can be divided into 6 groups as the ground-truth communities according to the specialties of the scientists, as presented in Figure 7(a). Taking this network as the input to the proposed method, we obtained the community structure illustrated in Figure 7(b).
The proposed method revealed 8 communities in this network; two additional communities are detected in Figure 7(b). These two communities are relatively independent components; in particular, the community containing node '1' has many more inner edges than outer edges. That is to say, nodes in these two communities are connected more tightly to one another than to the remainder of the network. Therefore, isolating them from the network and taking them as independent communities is also reasonable. From the perspective of the measure metrics, the NMI obtained by the proposed method is the largest, which suggests that the result detected by our proposal is the one closest to the ground-truth community structure; the modularity of the proposed method is not the largest, but it is second only to that of Fast$Q$. These results also show that our proposed method can extract high-quality community structures from networks.
Networks without Ground-Truth Community Structure. This category contains the last 9 real-world networks listed in Table 2. For the experiments carried out on this category of networks, we evaluate the quality of the extracted community structures using modularity only, owing to the absence of ground-truth community structures. The modularity values obtained by the proposed method and the comparison algorithms are recorded in Table 4. To illustrate them intuitively, we also plotted them in a bar chart, which is presented in Figure 8.

[Figure 7: two drawings of the scientists collaboration network, panels (a) and (b), with nodes labeled 1 to 118.]
Figure 7: The collaboration network of scientists working at the Santa Fe Institute. (a) The ground-truth community structure. (b) The community structure detected by our proposed NSA algorithm.

[Figure 8: bar chart of modularity ($Q$, vertical axis, 0.0 to 1.0) for the networks Lesmis, Polbooks, ColiNeta, Email, NetScience, YeastL, PGP, DBLP, and Amazon, with one bar each for FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposal (NSA).]
Figure 8: The bar chart of the modularity obtained by the comparison algorithms and the proposed method, NSA.
Our proposed method achieved the largest modularity on 8 of these networks. On the remaining network, ColiNeta, it still obtained the second largest modularity. Although Fast$Q$ is based on the modularity optimization strategy, it acquired the largest modularity only on ColiNeta. WalkTrap is based on random walks, so its time complexity is relatively high; it could not produce effective results on Amazon and DBLP because of the large scale of these two networks. LPA and Attractor can extract community structures from all of these networks, but the quality of the detected results is not satisfactory. IsoFdp can only be applied to connected networks, so it cannot run on ColiNeta, NetScience, and YeastL, which are disconnected; nor can it detect the community structure of Amazon and DBLP effectively, because of their large scale. These comparison results show that our proposed method can steadily, effectively, and efficiently provide promising solutions to the problem of community detection in networks from a wide range of applications, and that it outperforms the comparison algorithms significantly.
[Figure 9: four plots of modularity $Q$ (vertical axis) against values from 0.00 to 0.30 on the horizontal axis. (a) The karate club network. (b) The dolphin social network. (c) The risk map network. (d) The scientists collaboration network.]
Figure 9: The setting of parameter $\delta$.
5. Parameter Setting
In the second phase of the proposed method, we introduce a threshold $\delta$ on the community metric to identify the preliminary communities that need to be merged. As mentioned earlier, in the merge procedure we calculate the community metric $\gamma_i = \alpha_i \times \beta_i$ for every preliminary community $C_i$; if the value of $\gamma_i$ is below the threshold $\delta$, the corresponding community $C_i$ is identified as one that needs to be merged.
Therefore, $\delta$ works as a parameter of our proposed method, and its setting can influence the quality of the resulting community structure. Qualitatively, the larger or the sparser the network is, the smaller the threshold $\delta$ should be, in accordance with the definitions of community sparsity ($\alpha_i$), community scale ($\beta_i$), and community metric ($\gamma_i$). To determine the optimal value of $\delta$, we conducted a group of experiments to explore the relationship between the value of $\delta$ and the quality of the resulting community structure on the first four networks listed in Table 2, namely, the karate club network, the dolphin social network, the map of the game Risk, and the scientists collaboration network. The quality of the resulting community structure is measured in terms of modularity $Q$. We varied the value of $\delta$ from 0 to 1.0 in increments of 0.005; for each value of $\delta$, we ran our proposed method on these networks and observed how the modularity changes as $\delta$ varies.
The observed results are illustrated in Figure 9, in which we plot only the portion $\delta \in [0, 0.3]$ because the largest modularities are obtained for $\delta \leqslant 0.3$ on all four networks. Our proposed method gets the largest modularity when $\delta = 0.13$ on the dolphin social network and $\delta = 0.1$ on the other three networks. Therefore, we adopt the corresponding value for those four networks and empirically set $\delta = 0.1$ for the other networks in the experiments. In Figure 9, the largest modularity is obtained around $\delta = 0.1$, and the interval $[0.05, 0.2]$ covers the optimal value of $\delta$. Therefore, we empirically suggest that $\delta$ be adjusted adaptively around 0.1 within the range $[0.05, 0.2]$, according to the size and sparsity of the networks involved in real-world applications.
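The threshold sweep described above can be sketched as follows. This is only an illustrative outline, not the authors' code: `detect` and `quality` are hypothetical callables standing in for a run of NSA with a given threshold and for the modularity computation, respectively, and the grid is built from integer multiples of the step to avoid accumulating floating-point error.

```python
def sweep_delta(detect, quality, step=0.005, upper=1.0):
    """Evaluate a quality score for every threshold on a regular grid.

    detect(delta)  -> community structure (hypothetical stand-in for NSA)
    quality(cs)    -> score of that structure (e.g., modularity Q)
    Returns (best_delta, best_score, curve), where curve is a list of
    (delta, score) pairs in increasing order of delta.
    """
    n_steps = round(upper / step)
    best_delta, best_score = None, float("-inf")
    curve = []
    for k in range(n_steps + 1):
        delta = k * step                 # integer multiple of the step
        score = quality(detect(delta))
        curve.append((delta, score))
        if score > best_score:           # keep the first maximum found
            best_delta, best_score = delta, score
    return best_delta, best_score, curve
```

Plotting `curve` restricted to thresholds up to 0.3 would give a figure of the same kind as Figure 9.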
6. Conclusion
In this paper, we presented a novel method to detect communities in networks. It is a local method based on node similarity, and it avoids the high time consumption of global methods. First, we construct the preliminary community structure by repeatedly selecting the node with the largest degree and deciding its placement according to the community assignment of its most similar neighbor: if that neighbor has not been assigned to any community yet, we create a new community for the selected node and the neighbor; if the neighbor has already been assigned to a community, we insert the selected node into that community as well. At the end of this process, we obtain a series of preliminary communities. However, some of them might be too small or too sparse, leading to a low-quality result. Therefore, we merge some of the preliminary communities to acquire the final community structure. To do so, we also proposed some indexes, which take both the size and the sparsity of communities into account, to determine which communities should be merged.
To test the performance of the proposed method, we performed extensive experiments on four groups of synthetic networks and 13 real-world networks and compared the detected community structures with the results extracted by the comparison algorithms in terms of NMI and modularity. The comparison results demonstrate that our proposed method can extract high-quality community structures from networks abstracted from various applications and that nodes in the extracted communities are connected more tightly. The proposed method overcomes the resolution-limit problem to some extent and outperforms the competitors.
Data Availability
We have conducted experiments on some artificial networks and some real-world datasets. The artificial networks are synthesized using the LFR benchmark network generator, which is freely available at https://sites.google.com/site/santofortunato. The parameters used to synthesize the artificial networks are listed in Table 1. The real-world data supporting this study are from previously reported studies, which are cited in Table 2. Most of the real-world datasets can also be downloaded from http://www-personal.umich.edu/~mejn/netdata and https://snap.stanford.edu/data/index.html. The ColiNeta dataset was provided by Jeong et al. [18]. We constructed the Risk Map network manually according to the literature [16].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant ID 61602225).
References
[1] J. Kleinberg and S. Lawrence, "Network analysis: The structure of the web," Science, vol. 294, no. 5548, pp. 1849–1850, 2001.
[2] P. Chen and S. Redner, "Community structure of the physical review citation network," Journal of Informetrics, vol. 4, no. 3, pp. 278–290, 2010.
[3] M. E. J. Newman, "Modularity and community structure in networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 23, pp. 8577–8582, 2006.
[4] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabási, "Hierarchical organization of modularity in metabolic networks," Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[5] R. Guimerà and L. A. N. Amaral, "Functional cartography of complex metabolic networks," Nature, vol. 433, no. 7028, pp. 895–900, 2005.
[6] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.
[7] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 2, Article ID 026113, 2004.
[8] P. M. Gleiser and L. Danon, "Community structure in jazz," Advances in Complex Systems (ACS), vol. 6, no. 4, pp. 565–573, 2003.
[9] Y. van Gennip, B. Hunter, R. Ahn et al., "Community detection using spectral clustering on sparse geosocial data," SIAM Journal on Applied Mathematics, vol. 73, no. 1, pp. 67–83, 2013.
[10] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[11] S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
[12] S. Fortunato and D. Hric, "Community detection in networks: a user guide," Physics Reports, vol. 659, pp. 1–44, 2016.
[13] B. W. Kernighan and S. Lin, "An efficient heuristic procedure for partitioning graphs," Bell Labs Technical Journal, vol. 49, no. 1, pp. 291–307, 1970.
[14] W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[15] D. Lusseau, "The emergent properties of a dolphin social network," in Proceedings of the Royal Society of London B: Biological Sciences, vol. 270, supplement 2, pp. S186–S188, 2003.
[16] K. Steinhaeuser and N. V. Chawla, "Identifying and evaluating community structure in complex networks," Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[17] M. E. J. Newman, "The structure and function of complex networks," SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
[18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási, "The large-scale organization of metabolic networks," Nature, vol. 407, no. 6804, pp. 651–654, 2000.
[19] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, "Self-similar community structure in a network of human interactions," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[20] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network motifs: simple building blocks of complex networks," Science, vol. 298, no. 5594, pp. 824–827, 2002.
[21] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, "Models of social networks based on social distance attachment," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 5, Article ID 056122, 2004.
[22] J. Yang and J. Leskovec, "Defining and evaluating network communities based on ground-truth," Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.
[23] M. E. J. Newman, "Fast algorithm for detecting community structure in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 6, Article ID 066133, 2004.
[24] A. Clauset, M. E. J. Newman, and C. Moore, "Finding community structure in very large networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 6, Article ID 066111, 2004.
[25] F. Dabaghi Zarandi and M. Kuchaki Rafsanjani, "Community detection in complex networks using structural similarity," Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 882–891, 2018.
[26] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, Article ID P10008, 2008.
[27] L. Waltman and N. J. Van Eck, "A smart local moving algorithm for large-scale modularity-based community detection," The European Physical Journal B, vol. 86, no. 11, article 471, pp. 1–14, 2013.
[28] U. N. Raghavan, R. Albert, and S. Kumara, "Near linear time algorithm to detect community structures in large-scale networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 76, no. 3, Article ID 036106, 2007.
[29] M. J. Barber and J. W. Clark, "Detecting network communities by propagating labels under constraints," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 2, Article ID 026129, 2009.
[30] J. Hou Chin and K. Ratnavelu, "A semi-synchronous label propagation algorithm with constraints for community detection in complex networks," Scientific Reports, vol. 7, Article ID 45836, 2017.
[31] J. Ding, X. He, J. Yuan, Y. Chen, and B. Jiang, "Community detection by propagating the label of center," Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 675–686, 2018.
[32] A. Laio and A. Rodriguez, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[33] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger, "SCAN: A structural clustering algorithm for networks," in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), pp. 824–833, ACM, New York, NY, USA, August 2007.
[34] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD '96), pp. 226–231, AAAI Press, 1996.
[35] H. Shiokawa, Y. Fujiwara, and M. Onizuka, "Scan++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs," in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1178–1189, Republic of Korea, September 2006.
[36] T. You, H.-M. Cheng, Y.-Z. Ning, B.-C. Shia, and Z.-Y. Zhang, "Community detection in complex networks using density-based clustering algorithm and manifold learning," Physica A: Statistical Mechanics and its Applications, vol. 464, pp. 221–230, 2016.
[37] X. Wang, G. Liu, J. Li, and J. P. Nees, "Locating structural centers: A density-based clustering method for community detection," PLoS ONE, vol. 12, no. 1, Article ID e0169355, 2017.
[38] P. Pons and M. Latapy, "Computing communities in large networks using random walks," in International Symposium on Computer and Information Sciences, pp. 284–293, 2005.
[39] S. A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, and M. A. Tavallaie, "Personalized PageRank clustering: a graph clustering algorithm based on random walks," Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772–5785, 2013.
[40] Y. Su, B. Wang, and X. Zhang, "A seed-expanding method based on random walks for community detection in networks with ambiguous community structures," Scientific Reports, vol. 7, Article ID 41830, 2017.
[41] J. Shao, Z. Han, Q. Yang, and T. Zhou, "Community detection based on distance dynamics," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1075–1084, ACM, Australia, August 2015.
[42] H.-L. Sun, E. Ch'ng, X. Yong, J. M. Garibaldi, S. See, and D.-B. Chen, "A fast community detection method in bipartite networks by distance dynamics," Physica A: Statistical Mechanics and its Applications, vol. 496, pp. 108–120, 2018.
[43] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, "Pseudo-likelihood methods for community detection in large sparse networks," The Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[44] S. C. de Lange, M. A. de Reus, and M. P. van den Heuvel, "The laplacian spectrum of neural networks," Frontiers in Computational Neuroscience, vol. 7, no. 189, 2014.
[45] F. Krzakala, C. Moore, E. Mossel et al., "Spectral redemption in clustering sparse networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 52, pp. 20935–20940, 2013.
[46] P. Shi, K. He, D. Bindel, and J. E. Hopcroft, "Local Lanczos Spectral Approximation for Community Detection," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 10534 of Lecture Notes in Computer Science, pp. 651–667, Springer International Publishing, 2017.
[47] R. Tackx, F. Tarissan, and J. Guillaume, "ComSim: a bipartite community detection algorithm using cycle and node's similarity," in International Workshop on Complex Networks and their Applications, vol. 689 of Studies in Computational Intelligence, pp. 278–289, Springer International Publishing, 2017.
[48] T. Wang, L. Yin, and X. Wang, "A community detection method based on local similarity and degree clustering information," Physica A: Statistical Mechanics and its Applications, vol. 490, pp. 1344–1354, 2018.
[49] K. R. Žalik, "Maximal neighbor similarity reveals real communities in networks," Scientific Reports, vol. 5, Article ID 18374, 2015.
[50] A. Lancichinetti, S. Fortunato, and F. Radicchi, "Benchmark graphs for testing community detection algorithms," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 78, no. 4, Article ID 046110, 2008.
[51] L. Ana and A. Jain, "Robust data clustering," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-128–II-133, Madison, WI, USA, 2003.
Input: $CS_{pre}$, the preliminary community structure; $\delta$, the community-metric threshold
Output: $CS$, the final community structure
1  Initialize $CS$, which is used to record the community structure:
     $CS \leftarrow CS_{pre}$
2  Calculate the community metric for each of the preliminary communities:
     foreach $C_i \in CS$ do
       $\gamma_i \leftarrow \alpha_i \times \beta_i$
3  Select the community with the minimal community metric; denote its index as $t$:
     $t \leftarrow \arg\min_i \{\gamma_i \mid i = 1, 2, \ldots, |CS|\}$
4  Identify the community most similar to $C_t$; denote its index as $j$:
     $j \leftarrow \arg\max_i \{Sim(C_t, C_i) \mid i = 1, 2, \ldots, |CS|,\ i \neq t\}$
5  Merge communities $C_t$ and $C_j$ to form a new community:
     $k \leftarrow |CS|$; $C_{k+1} \leftarrow C_t \cup C_j$
6  Calculate the community metric for the new community:
     $\gamma_{k+1} \leftarrow \alpha_{k+1} \times \beta_{k+1}$
7  Replace the two communities $C_t$ and $C_j$ with the new community to reflect the merging effect:
     $CS = CS - \{C_t, C_j\} \cup \{C_{k+1}\}$
8  Repeat steps 3 through 7 until $\gamma_t > \delta$
9  return $CS$
Algorithm 3: PCM($CS_{pre}$, $\delta$): merge small or sparse communities.
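The merge loop of Algorithm 3 can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the authors' implementation: `metric` and `sim` are hypothetical callables standing in for the community metric $\gamma = \alpha \times \beta$ and for $Sim(\cdot, \cdot)$, whose definitions appear elsewhere in the paper; the stopping test is checked before each merge rather than after it; and the minimum is found by a linear scan instead of a heap, so the sketch does not attain the $O(\log K)$ per-iteration cost discussed in the text.

```python
def pcm(cs_pre, delta, metric, sim):
    """Merge small or sparse communities (sketch of Algorithm 3).

    cs_pre : iterable of node collections (preliminary communities)
    delta  : community-metric threshold
    metric : callable, community -> gamma value (assumed given)
    sim    : callable, (community, community) -> similarity (assumed given)
    """
    cs = [frozenset(c) for c in cs_pre]          # step 1
    gamma = [metric(c) for c in cs]              # step 2
    while len(cs) > 1:
        # Step 3: community with the minimal community metric.
        t = min(range(len(cs)), key=gamma.__getitem__)
        if gamma[t] > delta:                     # step 8 stopping rule
            break
        # Step 4: most similar other community.
        j = max((i for i in range(len(cs)) if i != t),
                key=lambda i: sim(cs[t], cs[i]))
        merged = cs[t] | cs[j]                   # step 5
        # Step 7: drop C_t and C_j (larger index first), add the union.
        for idx in sorted((t, j), reverse=True):
            del cs[idx]
            del gamma[idx]
        cs.append(merged)
        gamma.append(metric(merged))             # step 6
    return cs                                    # step 9
```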
3.4. Time Complexity. The proposed algorithm is composed of two phases. The first phase forms the preliminary communities. The main time consumption in this phase lies in selecting the node with the largest degree (step 2 in Algorithm 2) and its most similar neighbor (step 3 in Algorithm 2); the former can be accomplished in $O(\log n)$ in each iteration using a max-heap data structure, and the latter can be done in $O(\log \langle d \rangle)$ with a max-heap, where $\langle d \rangle$ is the average degree of nodes in the network. Since $\langle d \rangle \ll n$, the time consumption of the first phase is $O(n \log n)$.
The second phase improves the quality of the resulting community structure by merging some of the small or sparse communities. The major time is spent on determining, in each iteration, the community that needs to be merged and its most similar adjacent community. Assuming there are $K$ communities in the preliminary community structure, the former operation can be implemented in $O(\log K)$, and the latter can also be carried out in $O(\log K)$ time in the worst case. Hence, the second phase can be implemented in $O(K \log K)$ time.
Since $K \ll n$, we have $\log K \ll \log n$. Therefore, the proposed method can detect communities in networks with relatively high efficiency, in $O(n \log n)$ time overall.
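To make the $O(\log n)$ selection step concrete, the following sketch pops nodes in order of decreasing degree from a heap. It assumes the network is given as an adjacency dictionary (a hypothetical input format) and uses Python's `heapq`, which is a min-heap, so degrees are negated to emulate a max-heap. Ties are broken by node label, which is an artifact of this sketch rather than something the paper specifies.

```python
import heapq


def nodes_by_degree(adj):
    """Yield (node, degree) pairs from the largest degree downward.

    adj: dict mapping each node to the set of its neighbors.
    Building the heap costs O(n); each pop costs O(log n), so visiting
    every node costs O(n log n) in total.
    """
    heap = [(-len(neighbors), node) for node, neighbors in adj.items()]
    heapq.heapify(heap)
    while heap:
        negative_degree, node = heapq.heappop(heap)
        yield node, -negative_degree
```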
4. Experimental Results and Discussion
4.1. Network Datasets and Comparison System. To test the performance of our proposed method, we conducted extensive experiments on both several groups of artificial networks and some real-world networks. The artificial networks are synthesized using the LFR benchmark network generator [50], which uses several parameters to control the characteristics of the generated networks. Here we consider the influences of both network scale and community size; therefore, four types of networks are generated, namely, small networks with small communities and with big communities, and larger networks with small communities and with big communities. The small networks and the larger networks contain 1000 and 5000 nodes, respectively. A small community contains at least 10 and at most 50 nodes; the minimum and maximum numbers of nodes in the big communities are 20 and 100, respectively. The generated networks with small communities and big communities are marked with the suffixes 's' and 'b', respectively. The exponents of the power-law distributions that node degree and community size follow are the default values, $-2$ and $-1$, respectively. The parameters used to synthesize the four groups of artificial networks are listed in Table 1.
We also performed experiments on 13 real-world networks, whose sizes span from tens to hundreds of thousands of nodes; the information about them is listed in Table 2. These real-world networks can be divided into two categories: the first category includes the first four networks, whose ground-truth communities are known a priori; the second contains the other nine networks, which have no publicly acknowledged ground-truth community structures.
On these networks, we ran our proposed method to detect community structures and compared the results to those of 5 popular community detection algorithms, namely, FastQ [24], WalkTrap [38], LPA [28], Attractor [41], and IsoFdp [36], which have been introduced in Section 2. Since LPA is a nondeterministic algorithm, we ran it on each network 10 times and took the average of the evaluation metrics as its resulting metric value for that network. For our proposed method, NSA, we empirically set $\delta = 0.13$ for the dolphin social network and $\delta = 0.1$ for the other networks in the experiments. The details of how to set the value of $\delta$ are discussed in Section 5.
Table 1: The parameters used to generate the LFR networks. In the header row, $n$ is the number of nodes contained in the network; $\langle d \rangle$ and $d_{max}$ are the average degree and the maximum degree, respectively; $\exp_d$ and $\exp_{com}$ are the exponents of the power-law distributions that node degree and community size follow; $\min(C_i)$ and $\max(C_i)$ represent the minimal and maximal numbers of nodes contained in every community, respectively.

| Network | $n$ | $\langle d \rangle$ | $d_{max}$ | $\exp_d$ | $\exp_{com}$ | $\min(C_i)$ | $\max(C_i)$ |
|---|---|---|---|---|---|---|---|
| LFR1000s | 1000 | 20 | 50 | -2 | -1 | 10 | 50 |
| LFR1000b | 1000 | 20 | 50 | -2 | -1 | 20 | 100 |
| LFR5000s | 5000 | 20 | 50 | -2 | -1 | 10 | 50 |
| LFR5000b | 5000 | 20 | 50 | -2 | -1 | 20 | 100 |

Table 2: The information about the real-world networks. $n$ and $m$ are the numbers of nodes and edges in the network, respectively.

| Network | $n$ | $m$ |
|---|---|---|
| Karate club [14] | 34 | 78 |
| Dolphin social network [15] | 62 | 159 |
| Risk map [16] | 42 | 83 |
| Scientists collaboration network [6] | 118 | 197 |
| Lesmis [17] | 77 | 254 |
| Polbooks [3] | 105 | 441 |
| ColiNeta [18] | 423 | 519 |
| NetScience [10] | 1589 | 2742 |
| Email [19] | 1133 | 5451 |
| YeastL [20] | 2361 | 7182 |
| PGP [21] | 10680 | 24316 |
| DBLP [22] | 317080 | 1049866 |
| Amazon [22] | 334863 | 925872 |

4.2. Evaluation Metrics. Two indexes, namely, NMI (Normalized Mutual Information) [51] and modularity [7], are adopted as the metrics to evaluate the quality of the detected community structure in this paper. The NMI between the ground-truth community structure $P = \{P_1, P_2, \ldots, P_K\}$ and the extracted one $P' = \{P'_1, P'_2, \ldots, P'_{K'}\}$ is calculated as follows:
$$\mathrm{NMI}(P, P') = \frac{-2 \sum_{i=1}^{|P|} \sum_{j=1}^{|P'|} n_{ij} \log\left(\dfrac{n_{ij} \cdot n}{n_i^{P} \cdot n_j^{P'}}\right)}{\sum_{i=1}^{|P|} n_i^{P} \log\left(\dfrac{n_i^{P}}{n}\right) + \sum_{j=1}^{|P'|} n_j^{P'} \log\left(\dfrac{n_j^{P'}}{n}\right)} \tag{8}$$
where $n_i^{P} = |P_i|$, $n_j^{P'} = |P'_j|$, and $n_{ij} = |P_i \cap P'_j|$, respectively, and $n$ is the number of nodes in the network. The NMI is an information-theoretic metric that measures how much the detected community structure agrees with the ground truth. Therefore, it can only be used to evaluate the quality of the detected community structure on networks whose ground-truth community structure is already known. Its value lies in the range $[0, 1]$; larger is better.
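A direct transcription of (8) into Python might look like the sketch below. It assumes both partitions are supplied as dictionaries mapping every node to a community label (a hypothetical input format) and that the two dictionaries cover the same nodes. Returning 1.0 when both partitions consist of a single community is a convention chosen here to avoid a 0/0 division; the paper does not address that case.

```python
from collections import Counter
from math import log


def nmi(truth, found):
    """Normalized mutual information between two partitions, as in (8).

    truth, found: dicts mapping node -> community label.
    """
    nodes = list(truth)
    n = len(nodes)
    size_p = Counter(truth[v] for v in nodes)               # n_i^P
    size_q = Counter(found[v] for v in nodes)               # n_j^P'
    overlap = Counter((truth[v], found[v]) for v in nodes)  # n_ij > 0 only
    numerator = -2.0 * sum(
        n_ij * log(n_ij * n / (size_p[i] * size_q[j]))
        for (i, j), n_ij in overlap.items()
    )
    denominator = (
        sum(c * log(c / n) for c in size_p.values())
        + sum(c * log(c / n) for c in size_q.values())
    )
    if denominator == 0.0:
        return 1.0  # assumed convention: both partitions are trivial
    return numerator / denominator
```

Pairs with $n_{ij} = 0$ contribute nothing to the numerator, so iterating only over observed label pairs is sufficient; the base of the logarithm cancels between numerator and denominator.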
Another metric widely used to evaluate the performance of community detection methods is modularity [7], which is defined as follows:
$$Q = \sum_i \left(e_{ii} - a_i^2\right) \tag{9}$$
where $e_{ii}$ is the diagonal element of a $K \times K$ matrix $e$ whose element $e_{ij}$ is the fraction of all edges in the network that connect nodes in community $C_i$ to nodes in community $C_j$; $K$ is the number of communities in the community structure; and $a_i$ is the fraction of edges associated with nodes in community $C_i$.
The first term, $\sum_i e_{ii}$, on the right-hand side of (9) is the fraction of edges within communities; the second term, $\sum_i a_i^2$, is the expected value of the same fraction in a random graph that has the same nodes and degree distribution as the original network but whose edges are connected randomly. The smaller the difference between the two terms, the closer the network is to a random graph and the weaker the community structure. Conversely, the larger the difference, the further the network departs from the random graph and the stronger the community structure. That is, modularity measures the quality of a community structure from the perspective of how far the detected result deviates from a random network; its effective value falls in $[0, 1]$, and higher is better.
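As a worked illustration of (9), the sketch below computes $Q$ for an undirected, unweighted graph without self-loops, given as an edge list together with a node-to-community mapping (both hypothetical input formats). Here $a_i$ is taken as the fraction of edge ends attached to community $C_i$, that is, the sum of degrees in $C_i$ divided by $2m$, which is the usual reading of the definition.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q = sum_i (e_ii - a_i^2) of a partition.

    edges     : list of (u, v) pairs, each undirected edge listed once
    community : dict mapping node -> community label
    """
    m = len(edges)
    e_in = defaultdict(float)   # e_ii: fraction of edges inside C_i
    a = defaultdict(float)      # a_i: fraction of edge ends in C_i
    for u, v in edges:
        cu, cv = community[u], community[v]
        if cu == cv:
            e_in[cu] += 1.0 / m
        a[cu] += 0.5 / m
        a[cv] += 0.5 / m
    return sum(e_in[c] - a[c] ** 2 for c in a)
```

For two triangles joined by a single edge and split into the two triangles, $e_{ii} = 3/7$ and $a_i = 1/2$ for each community, so $Q = 6/7 - 1/2 = 5/14$.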
4.3. Synthetic Networks. We carried out experiments on four groups of artificial networks to test the performance of the proposed method. As mentioned above, all four types of artificial networks are synthesized using the LFR benchmark generator [50]. Besides the parameters listed in Table 1, another critical parameter of this generator is the mixing parameter $\mu$, which regulates, for each node, the fraction of its edges that connect to nodes in other communities. The smaller the value of $\mu$, the clearer the community structure. In particular, $\mu = 0.5$ is a transition point above which communities in networks tend to be obscure.
[Figure 2 plots: NMI (y-axis, 0.0 to 1.0) against x-axis values from 0.1 to 0.8 for FastQ, WalkTrap, LPA, Attractor, IsoFdp, and proposal (NSA); panels (a) and (b).]
Figure 2: Comparison of different community-detection algorithms on LFR benchmark networks containing 1000 nodes. (a) The results detected from small networks with small-sized communities. (b) The results identified from small networks with big-sized communities.
In our experiments, we varied the value of $\mu$ from 0.1 to 0.8 in increments of 0.1 for each group of LFR networks. To reduce the effect of randomness, we generated 10 networks for each value of $\mu$ while keeping the other parameters fixed. Since the community structures are already embedded in these synthetic networks, we use NMI as the metric to evaluate the performance of our proposed method and the comparison algorithms. We took these networks as input one by one, ran our proposed method and the comparison algorithms on them, and used the average NMI as the resulting metric. The results detected by our proposal and the comparison algorithms from the small networks with small-sized and big-sized communities are illustrated in Figures 2(a) and 2(b), respectively; the results from the larger networks with small-sized and big-sized communities are presented in Figures 3(a) and 3(b), respectively.
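The averaging protocol just described can be outlined as follows. All three callables are hypothetical placeholders: `make_network(mu, seed)` stands for one call to an LFR generator and returns a network with its planted partition, `detect` stands for any of the compared algorithms, and `score` stands for the NMI computation. The sketch only shows how the per-$\mu$ averages are formed.

```python
from statistics import mean


def average_score_curve(make_network, detect, score, mus, repeats=10):
    """Average a score over several generated networks per mixing value.

    make_network(mu, seed) -> (graph, ground_truth)   (assumed generator)
    detect(graph)          -> detected partition      (assumed algorithm)
    score(truth, found)    -> float, e.g. NMI
    Returns a dict mapping each mu to its mean score.
    """
    curve = {}
    for mu in mus:
        scores = []
        for seed in range(repeats):
            graph, truth = make_network(mu, seed)
            scores.append(score(truth, detect(graph)))
        curve[mu] = mean(scores)
    return curve


# Mixing values 0.1, 0.2, ..., 0.8 as used in the experiments.
MIXING_VALUES = [k / 10 for k in range(1, 9)]
```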
In Figures 2(a) and 2(b), FastQ tends to introduce mistakes into its results regardless of whether the communities in the networks are well separated or obscure. As mentioned previously, FastQ is a typical modularity-optimization-based algorithm; it aims only at acquiring results with larger modularity rather than high accuracy. In our experiments, none of the results uncovered by it is satisfactory. Even in the networks with $\mu = 0.1$, it failed to identify the exact communities; furthermore, its performance is the worst among the comparison algorithms for $\mu \leqslant 0.5$. For $\mu > 0.5$, the quality of its results is better only than that of LPA. LPA performed as well as the other comparison algorithms in those networks for $\mu < 0.5$, but its performance dropped dramatically for $\mu \geqslant 0.5$; it could not even detect effective communities for $\mu > 0.6$. This might be due to its label-update mechanism: when the community boundaries become obscure, nodes tend to adopt incorrect labels, which leads to trivial results in which all nodes may even be labeled as members of one giant community. The proposed method, NSA, acquired NMI = 1 on all networks for $\mu < 0.5$, meaning that the detected partitions match the ground-truth community structures of these networks perfectly. For $\mu = 0.5$, NSA also obtained results as good as those of WalkTrap, Attractor, and IsoFdp. For $\mu > 0.5$, the quality of the detected community structures declined for all those three algorithms and for the proposed method. For $0.5 < \mu \leqslant 0.6$, the quality of our proposal is better than that of Attractor in networks with larger communities, and for $\mu \geqslant 0.7$, the performance of our proposed method is the best.
In Figures 3(a) and 3(b), we obtained results similar to those in Figure 2 overall, but they still differ in some respects. In Figure 3(a), our proposed method performed the best on almost all networks. For $0.5 < \mu < 0.7$ in Figure 2, the NMI of the results extracted by our proposed method is lower than those of WalkTrap and IsoFdp; in Figure 3, however, the proposed method performed better than IsoFdp for $\mu > 0.5$. These results suggest that the performance of the comparison algorithms is not stable across different networks, whereas our proposed method can steadily extract high-quality community structures from networks with different characteristics. This is also evident from the fact that the curves of the proposed method in these figures decline more slowly than the others. Moreover, by comparing the proposal's own curves in these figures, we can conclude that our proposed method tends to perform better on larger networks with small communities; therefore, it overcomes the resolution-limit problem to some extent.
[Figure 3 plots: NMI (y-axis, 0.0 to 1.0) against x-axis values from 0.1 to 0.8 for FastQ, WalkTrap, LPA, Attractor, IsoFdp, and proposal (NSA); panels (a) and (b).]
Figure 3: Comparison of different community detection algorithms on LFR benchmark networks containing 5000 nodes. (a) The results extracted from the larger networks with small-sized communities. (b) The results revealed from the larger networks with big-sized communities.
[Figure 4 drawings: two layouts of the karate club network, panels (a) and (b), with nodes labeled 1 to 34.]
Figure 4: The karate club network. (a) The ground-truth community structure. (b) The community structure detected by our proposed method NSA. (The nodes in different communities are plotted in different colors and shapes; this illustration style is also applied in the subsequent figures.)
4.4. Real-World Networks. We also carried out experiments on 13 real-world networks to further test the effectiveness and efficiency of our proposed method. As mentioned in Section 4.1, these networks fall into two categories: those whose ground-truth community structure is known a priori and those without a publicly acknowledged ground truth.
Networks with Ground-Truth Community Structure. This category includes the first 4 networks listed in Table 2. Since their ground-truth community structure is already known, we measure the quality of the community structures identified by the proposed method and the comparison algorithms in terms of both NMI and modularity. The values of the two metrics obtained by the proposed method and the comparison algorithms are recorded in Table 3. These networks are relatively small, which makes it easy to visualize the detected results. Below, we analyze the results extracted by the proposed method from each of these networks.
The Karate Club Network. This network depicts the friendships among members of a karate club; it contains 34 nodes and 78 edges. It was compiled by Wayne W. Zachary, who observed the karate club for 3 years. During the period of Zachary's study, the club split into two factions because of a dispute between the administrator and the instructor. Corresponding to these two factions, the partition into two communities is commonly taken as the ground truth, which is shown in Figure 4(a). The result detected by our proposed method is presented in Figure 4(b).
From Figure 4, we can see that our proposed method detected 3 rather than 2 communities in the network. The detected result appears to deviate from the ground truth in some ways, but it is consistent with the conclusion
Figure 5: The dolphin social network. (a) The ground-truth community structure. (b) The community structure identified by our proposed method, NSA.
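Table 3: The experimental results on networks with ground-truth community structures. The largest values of the two metrics are typed in bold.

| Network | Metric | FastQ | WalkTrap | LPA | Attractor | IsoFdp | NSA |
|---|---|---|---|---|---|---|---|
| Karate | $Q$ | 0.381 | 0.353 | 0.355 | 0.371 | 0.371 | **0.402** |
| Karate | NMI | 0.693 | 0.504 | 0.62 | 0.924 | **1.00** | 0.699 |
| Dolphin | $Q$ | 0.492 | 0.489 | 0.464 | 0.45 | 0.505 | **0.513** |
| Dolphin | NMI | 0.719 | 0.632 | 0.719 | 0.69 | 0.744 | **0.887** |
| Risk map | $Q$ | **0.625** | 0.624 | 0.59 | 0.598 | 0.519 | 0.624 |
| Risk map | NMI | **0.894** | 0.848 | 0.821 | 0.839 | 0.714 | 0.848 |
| Scientists | $Q$ | **0.749** | 0.733 | 0.64 | 0.694 | 0.668 | 0.744 |
| Scientists | NMI | 0.867 | 0.818 | 0.743 | 0.835 | 0.823 | **0.878** |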
found in the experiments on synthetic networks that our proposed method tends to find small communities in networks and thereby mitigates the resolution limit problem. Moreover, from the perspective of the evaluation metrics, the modularity of the detected result is the largest among all the algorithms compared. Although our proposed method is not based on modularity optimization, it tends to produce community structures with modularity as large as possible: when its modularity is not the largest, it is the second largest, with a small gap to the largest. These findings are also borne out on the following networks.
Lusseau's Dolphin Social Network. This network describes the interactions of a group of dolphins living in Doubtful Sound, New Zealand. It consists of 62 nodes and 159 edges, which represent individual dolphins and the observed co-occurrences of pairs of dolphins, respectively. The network is generally partitioned into 4 groups as the ground-truth community structure, which is shown in Figure 5(a). Figure 5(b) shows the community structure uncovered by our proposed method.
As Figure 5 shows, our proposed method detected communities in this network with a high degree of success: it also identified 4 communities, the vast majority of nodes are classified into the correct communities, and the result closely approaches the ground-truth community structure. Quantitatively, both the NMI and the modularity of the result detected by the proposed method on this network are the largest among all the algorithms compared, which means that the community structure identified by the proposed method is better than those of the comparison algorithms.
Risk Map Network. This network is the world political map used in the popular game Risk (https://en.wikipedia.org/wiki/Risk_(game)), which involves 42 countries or territories on 6 continents. Accordingly, the 42 nodes and the 83 edges connecting adjacent countries or territories are organized into 6 communities as the ground truth, which is illustrated in Figure 6(a). Feeding this network into the proposed method, we obtained the community structure shown in Figure 6(b).
Comparing the detected result with the ground-truth community structure, the community containing nodes '18' and '23' in the ground truth is split into two small communities in Figure 6(b), owing to the tendency of the proposed method to find small communities. In addition, nodes '26', '33', and '34' are assigned to the wrong communities in the detected result. However, nodes '12', '16', '26', '33', and '34' are special ones in this network: the outer edges associated with them are no fewer
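Table 4: The experimental results of modularity on networks without ground-truth community structures. The largest value on each network is typed in bold; a dash means that the algorithm produced no result on that network.

| Network | FastQ | WalkTrap | LPA | Attractor | IsoFdp | NSA |
|---|---|---|---|---|---|---|
| Lesmis | 0.499 | 0.519 | 0.515 | 0.498 | 0.491 | **0.54** |
| Polbooks | 0.502 | 0.507 | 0.508 | 0.501 | 0.518 | **0.524** |
| ColiNeta | **0.779** | 0.746 | 0.693 | 0.718 | - | 0.761 |
| Email | 0.499 | 0.531 | 0.379 | 0.464 | 0.531 | **0.544** |
| NetScience | 0.955 | 0.956 | 0.896 | 0.937 | - | **0.957** |
| YeastL | 0.573 | 0.529 | 0.372 | 0.511 | - | **0.574** |
| PGP | 0.85 | 0.789 | 0.765 | 0.768 | 0.726 | **0.867** |
| DBLP | 0.735 | - | 0.652 | 0.637 | - | **0.782** |
| Amazon | 0.869 | - | 0.743 | 0.741 | - | **0.898** |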
Figure 6: Risk map network. (a) The ground-truth community structure. (b) The community structure uncovered by our proposed method, NSA.
than, or even more than, those within the communities to which these nodes belong. Therefore, if we ignore what these nodes actually represent and consider only the topology, the community structure extracted by our proposed method is more rational than the ground truth: more of the edges associated with these three nodes lie within their community than in the ground truth, so these three nodes are connected more tightly to the nodes within the same community in Figure 6(b). Quantitatively, the values of both metrics obtained by our proposed method are second only to those of FastQ and are the same as those of WalkTrap. These results also confirm that our proposed method provides an acceptable solution to the problem of community detection.
Scientists Collaboration Network. This is the largest connected component of a network describing the coauthorship relations among scientists working at the Santa Fe Institute, New Mexico. Nodes in this network represent scientists, and an edge connects two scientists who have collaborated on at least one paper. There are 118 nodes and 197 edges in total. The nodes can be divided into 6 groups as the ground-truth communities according to the specialties of the scientists, as presented in Figure 7(a). Taking this network as the input to the proposed method, we obtained the community structure illustrated in Figure 7(b).
The proposed method revealed 8 communities in this network; two additional communities are detected in Figure 7(b). These two communities are relatively independent components; for the community containing node '1' in particular, there are many more inner edges than outer edges. That is to say, nodes in these two communities are connected more tightly to one another than to the remainder of the network. Therefore, separating them from the rest of the network and treating them as independent communities is also reasonable. From the perspective of the evaluation metrics, the NMI obtained by the proposed method is the largest, which suggests that the result detected by our proposal is the one closest to the ground-truth community structure; its modularity is not the largest, but it is second only to that of FastQ. These results also show that our proposed method can extract high-quality community structures from networks.
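The comparison of inner and outer edges used in this argument is easy to check on any edge list. The following is a minimal sketch, not the authors' implementation; the function name `inner_outer_edges` and the toy graph (two triangles joined by one edge) are hypothetical and chosen only for illustration.

```python
def inner_outer_edges(edges, members):
    """Count edges inside a community and edges leaving it.

    edges   -- iterable of (u, v) pairs of an undirected graph
    members -- nodes that belong to the community being examined
    """
    members = set(members)
    # An inner edge has both endpoints in the community.
    inner = sum(1 for u, v in edges if u in members and v in members)
    # An outer edge has exactly one endpoint in the community.
    outer = sum(1 for u, v in edges if (u in members) != (v in members))
    return inner, outer


# Toy graph (illustrative only): triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
inner, outer = inner_outer_edges(toy_edges, {0, 1, 2})
print(inner, outer)
```

A community whose inner count clearly exceeds its outer count is, in the sense used above, connected more tightly internally than to the rest of the network.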
Networks without Ground-Truth Community Structure. This category contains the last 9 real-world networks listed in Table 2. For the experiments on this category of networks, we evaluate the quality of the extracted community structures using modularity only, owing to the absence of ground-truth community structures. The modularity values obtained by the proposed method and the comparison algorithms are recorded in Table 4. To illustrate them
Figure 7: The collaboration network of scientists working at the Santa Fe Institute. (a) The ground-truth community structure. (b) The community structure detected by our proposed NSA algorithm.
Figure 8: The bar chart of the modularity ($Q$) obtained by the comparison algorithms (FastQ, WalkTrap, LPA, Attractor, IsoFdp) and the proposed method NSA on the networks Lesmis, Polbooks, ColiNeta, Email, NetScience, YeastL, PGP, DBLP, and Amazon.
intuitively, we also plotted them in a bar chart, which is presented in Figure 8.
On these networks, our proposed method achieved the largest modularity on 8 of them. On the remaining network, ColiNeta, it still obtained the second largest modularity. FastQ is based on modularity optimization, yet it acquired the largest modularity only on ColiNeta. WalkTrap is based on random walks, so its time complexity is relatively high; it could not produce effective results on Amazon and DBLP because of the large scale of these two networks. LPA and Attractor can extract community structures from all of these networks, but the quality of the detected results is not satisfactory. IsoFdp can only be applied to connected networks and cannot run on ColiNeta, NetScience, and YeastL, as these three networks are disconnected; it also could not detect community structures from Amazon and DBLP effectively because of their large scale. These comparisons show that our proposed method can steadily, effectively, and efficiently provide promising solutions to the problem of community detection in networks from a wide range of applications and that it outperforms the comparison algorithms significantly.
Figure 9: The setting of parameter $\delta$ (modularity $Q$ versus $\delta \in [0, 0.3]$). (a) The karate club network. (b) The dolphin social network. (c) The risk map network. (d) The scientists collaboration network.
5. Parameter Setting
In the second phase of the proposed method, we introduce a threshold $\delta$ on the community metric to identify the preliminary communities that need to be merged. As mentioned above, we calculate the community metric $\gamma_i = \alpha_i \times \beta_i$ for every preliminary community $C_i$ in the merge procedure; if the value of $\gamma_i$ is below the threshold $\delta$, the corresponding community $C_i$ is identified as one that needs to be merged.
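The thresholding rule above can be sketched in a few lines. This is only an illustration under stated assumptions: the definitions of $\alpha_i$ and $\beta_i$ are given earlier in the paper and are not reproduced here, so the values below are placeholder numbers, and the function name `communities_to_merge` is hypothetical.

```python
def communities_to_merge(alpha, beta, delta=0.1):
    """Return the ids of preliminary communities whose metric falls below delta.

    alpha -- dict: community id -> community sparsity (alpha_i), assumed precomputed
    beta  -- dict: community id -> community scale (beta_i), assumed precomputed
    delta -- merge threshold; 0.1 is the value the paper suggests empirically
    """
    flagged = []
    for cid in alpha:
        gamma = alpha[cid] * beta[cid]  # community metric gamma_i = alpha_i * beta_i
        if gamma < delta:
            flagged.append(cid)
    return sorted(flagged)


# Placeholder values, not taken from the paper.
alpha = {"C1": 0.9, "C2": 0.2, "C3": 0.5}
beta = {"C1": 0.8, "C2": 0.3, "C3": 0.1}
print(communities_to_merge(alpha, beta))
```

With these placeholder numbers, only the communities whose product $\alpha_i \times \beta_i$ is below 0.1 are flagged; how a flagged community is then merged is described in the merge procedure of the paper and is not covered by this sketch.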
Therefore, $\delta$ works as a parameter of our proposed method, and its setting can influence the quality of the resulting community structure. Qualitatively, the larger or the sparser the network is, the smaller the threshold $\delta$ should be, in accordance with the definitions of community sparsity ($\alpha_i$), community scale ($\beta_i$), and community metric ($\gamma_i$). To determine the optimal value of $\delta$, we conducted a group of experiments to explore the relationship between the value of $\delta$ and the quality of the resulting community structure on the first four networks listed in Table 2, namely, the karate club network, the dolphin social network, the Risk map network, and the scientists collaboration network. The quality of the resulting community structure is measured in terms of modularity $Q$. We vary the value of $\delta$ from 0 to 1.0 in increments of 0.005; for each value of $\delta$, we run our proposed method on these networks and observe how the modularity changes as $\delta$ varies.
The observed results are illustrated in Figure 9, in which we plotted only the portion with $\delta \in [0, 0.3]$, because the largest modularities are obtained for $\delta \leq 0.3$ on all four networks. Our proposed method attains the largest modularity at $\delta = 0.13$ on the dolphin social network and at $\delta = 0.1$ on the other three networks. Therefore, we adopt the corresponding values for those four networks and empirically set $\delta = 0.1$ for the other networks in our experiments. In Figure 9, the largest modularity is obtained around $\delta = 0.1$, and the interval $[0.05, 0.2]$ covers the optimal value of $\delta$. Therefore, we empirically suggest that $\delta$ be adjusted adaptively around 0.1 within the range $[0.05, 0.2]$, according to the size and the sparsity of the networks involved in real-world applications.
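The parameter sweep described in this section amounts to a grid search over $\delta$. The sketch below shows one way to organize it; `quality_of` is a hypothetical callable standing in for "run the method with this $\delta$ and return the modularity", and the toy quality curve is invented purely so that the example runs.

```python
def best_delta(quality_of, upper=1.0, step=0.005):
    """Grid-search delta in [0, upper] and return the value with the highest quality.

    quality_of -- callable: delta -> modularity Q (assumed to wrap a full run of the method)
    """
    steps = round(upper / step)
    # Build the grid from integer multiples so rounding error does not accumulate.
    candidates = [round(k * step, 6) for k in range(steps + 1)]
    # max() keeps the first maximiser, i.e. the smallest delta among ties.
    return max(candidates, key=quality_of)


def toy_quality(delta):
    # Invented stand-in curve that peaks at delta = 0.13; not data from the paper.
    return -(delta - 0.13) ** 2


print(best_delta(toy_quality))
```

In practice each evaluation is a complete run of the method, so restricting the search to the suggested interval $[0.05, 0.2]$ would reduce the cost considerably.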
6. Conclusion
In this paper, we presented a novel method for detecting communities in networks. It is a local method based on node similarity, and it avoids the high time consumption of global methods. First, we construct the preliminary community structure by repeatedly selecting the node with the largest degree and, depending on the community assignment of its most similar neighbor, either taking it as the exemplar of a new community or inserting it into an existing one: if its most similar neighbor has not been assigned to any community yet, we create a new community for the node and its most similar neighbor; if its most similar neighbor has already been assigned to a community, we insert the node into that community as well. At the end of this process, we obtain a series of preliminary communities. However, some of them might be too small or too sparse, leading to a low-quality result. Therefore, we merge some of the preliminary communities to obtain the final community structure. To do so, we also proposed indexes that take both the size and the sparsity of communities into account to determine which communities should be merged.
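The first phase summarized above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the similarity measure used here (Jaccard similarity of closed neighborhoods), the tie-breaking by node order, and the handling of isolated nodes are all assumptions made for the example and may differ from the definitions in the paper.

```python
def preliminary_communities(adjacency):
    """Phase-one sketch: assign every node to a preliminary community.

    adjacency -- dict: node -> set of neighbours (undirected, no self-loops);
                 nodes are assumed to be mutually comparable (e.g. integers).
    Returns a dict: node -> community id.
    """
    def similarity(u, v):
        # Assumption: Jaccard similarity of closed neighbourhoods.
        a = adjacency[u] | {u}
        b = adjacency[v] | {v}
        return len(a & b) / len(a | b)

    label = {}
    next_id = 0
    # Visit nodes by decreasing degree; ties are broken by node order (assumption).
    for node in sorted(adjacency, key=lambda v: (-len(adjacency[v]), v)):
        if node in label:
            continue  # already placed as somebody's most similar neighbour
        if not adjacency[node]:
            label[node] = next_id  # isolated node forms its own community (assumption)
            next_id += 1
            continue
        best = max(sorted(adjacency[node]), key=lambda nb: similarity(node, nb))
        if best in label:
            label[node] = label[best]  # join the neighbour's community
        else:
            label[node] = label[best] = next_id  # start a new community for the pair
            next_id += 1
    return label


# Toy graph (illustrative only): triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
labels = preliminary_communities(toy_graph)
print(labels)
```

On this toy graph the two triangles end up in separate preliminary communities; the second phase (merging small or sparse communities) is not part of this sketch.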
To test the performance of the proposed method, we performed extensive experiments on four groups of synthetic networks and 13 real-world networks and compared the detected community structures with the results extracted by the comparison algorithms in terms of NMI and modularity. The comparison results demonstrate that our proposed method can extract high-quality community structures from networks abstracted from various applications and that nodes in the extracted communities are connected more tightly. The proposed method overcomes the resolution limit problem to some extent and outperforms the competitors.
Data Availability
We have conducted experiments on artificial networks and real-world datasets. The artificial networks were synthesized using the LFR benchmark network generator, which is freely available at https://sites.google.com/site/santofortunato. The parameters used to synthesize the artificial networks are listed in Table 1. The real-world data supporting this study are from previously reported studies, which are cited in Table 2. Most of the real-world datasets can also be downloaded from http://www-personal.umich.edu/~mejn/netdata and https://snap.stanford.edu/data/index.html. The ColiNeta dataset was provided by Jeong et al. [18]. We constructed the Risk map network manually according to the literature [16].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant ID 61602225).
References
[1] J. Kleinberg and S. Lawrence, "Network analysis: The structure of the web," Science, vol. 294, no. 5548, pp. 1849–1850, 2001.
[2] P. Chen and S. Redner, "Community structure of the physical review citation network," Journal of Informetrics, vol. 4, no. 3, pp. 278–290, 2010.
[3] M. E. J. Newman, "Modularity and community structure in networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 23, pp. 8577–8582, 2006.
[4] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabási, "Hierarchical organization of modularity in metabolic networks," Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[5] R. Guimerà and L. A. N. Amaral, "Functional cartography of complex metabolic networks," Nature, vol. 433, no. 7028, pp. 895–900, 2005.
[6] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.
[7] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 2, Article ID 026113, 2004.
[8] P. M. Gleiser and L. Danon, "Community structure in jazz," Advances in Complex Systems (ACS), vol. 6, no. 4, pp. 565–573, 2003.
[9] Y. van Gennip, B. Hunter, R. Ahn et al., "Community detection using spectral clustering on sparse geosocial data," SIAM Journal on Applied Mathematics, vol. 73, no. 1, pp. 67–83, 2013.
[10] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[11] S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
[12] S. Fortunato and D. Hric, "Community detection in networks: a user guide," Physics Reports, vol. 659, pp. 1–44, 2016.
[13] B. W. Kernighan and S. Lin, "An efficient heuristic procedure for partitioning graphs," Bell Labs Technical Journal, vol. 49, no. 1, pp. 291–307, 1970.
[14] W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[15] D. Lusseau, "The emergent properties of a dolphin social network," in Proceedings of the Royal Society of London B: Biological Sciences, vol. 270, supplement 2, pp. S186–S188, 2003.
[16] K. Steinhaeuser and N. V. Chawla, "Identifying and evaluating community structure in complex networks," Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[17] M. E. J. Newman, "The structure and function of complex networks," SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
[18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási, "The large-scale organization of metabolic networks," Nature, vol. 407, no. 6804, pp. 651–654, 2000.
[19] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, "Self-similar community structure in a network of human interactions," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[20] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network motifs: simple building blocks of complex networks," Science, vol. 298, no. 5594, pp. 824–827, 2002.
[21] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, "Models of social networks based on social distance attachment," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 5, Article ID 056122, 2004.
[22] J. Yang and J. Leskovec, "Defining and evaluating network communities based on ground-truth," Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.
[23] M. E. J. Newman, "Fast algorithm for detecting community structure in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 6, Article ID 066133, 2004.
[24] A. Clauset, M. E. J. Newman, and C. Moore, "Finding community structure in very large networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 6, Article ID 066111, 2004.
[25] F. Dabaghi Zarandi and M. Kuchaki Rafsanjani, "Community detection in complex networks using structural similarity," Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 882–891, 2018.
[26] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, Article ID P10008, 2008.
[27] L. Waltman and N. J. Van Eck, "A smart local moving algorithm for large-scale modularity-based community detection," The European Physical Journal B, vol. 86, no. 11, article 471, pp. 1–14, 2013.
[28] U. N. Raghavan, R. Albert, and S. Kumara, "Near linear time algorithm to detect community structures in large-scale networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 76, no. 3, Article ID 036106, 2007.
[29] M. J. Barber and J. W. Clark, "Detecting network communities by propagating labels under constraints," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 2, Article ID 026129, 2009.
[30] J. Hou Chin and K. Ratnavelu, "A semi-synchronous label propagation algorithm with constraints for community detection in complex networks," Scientific Reports, vol. 7, Article ID 45836, 2017.
[31] J. Ding, X. He, J. Yuan, Y. Chen, and B. Jiang, "Community detection by propagating the label of center," Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 675–686, 2018.
[32] A. Laio and A. Rodriguez, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[33] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger, "SCAN: A structural clustering algorithm for networks," in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), pp. 824–833, ACM, New York, NY, USA, August 2007.
[34] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD'96), pp. 226–231, AAAI Press, 1996.
[35] H. Shiokawa, Y. Fujiwara, and M. Onizuka, "Scan++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs," in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1178–1189, Republic of Korea, September 2006.
[36] T. You, H.-M. Cheng, Y.-Z. Ning, B.-C. Shia, and Z.-Y. Zhang, "Community detection in complex networks using density-based clustering algorithm and manifold learning," Physica A: Statistical Mechanics and its Applications, vol. 464, pp. 221–230, 2016.
[37] X. Wang, G. Liu, J. Li, and J. P. Nees, "Locating structural centers: A density-based clustering method for community detection," PLoS ONE, vol. 12, no. 1, Article ID e0169355, 2017.
[38] P. Pons and M. Latapy, "Computing communities in large networks using random walks," in International Symposium on Computer and Information Sciences, pp. 284–293, 2005.
[39] S. A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, and M. A. Tavallaie, "Personalized PageRank clustering: a graph clustering algorithm based on random walks," Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772–5785, 2013.
[40] Y. Su, B. Wang, and X. Zhang, "A seed-expanding method based on random walks for community detection in networks with ambiguous community structures," Scientific Reports, vol. 7, Article ID 41830, 2017.
[41] J. Shao, Z. Han, Q. Yang, and T. Zhou, "Community detection based on distance dynamics," in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1075–1084, ACM, Australia, August 2015.
[42] H.-L. Sun, E. Ch'ng, X. Yong, J. M. Garibaldi, S. See, and D.-B. Chen, "A fast community detection method in bipartite networks by distance dynamics," Physica A: Statistical Mechanics and its Applications, vol. 496, pp. 108–120, 2018.
[43] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, "Pseudo-likelihood methods for community detection in large sparse networks," The Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[44] S. C. de Lange, M. A. de Reus, and M. P. van den Heuvel, "The laplacian spectrum of neural networks," Frontiers in Computational Neuroscience, vol. 7, no. 189, 2014.
[45] F. Krzakala, C. Moore, E. Mossel et al., "Spectral redemption in clustering sparse networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 52, pp. 20935–20940, 2013.
[46] P. Shi, K. He, D. Bindel, and J. E. Hopcroft, "Local Lanczos Spectral Approximation for Community Detection," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 10534 of Lecture Notes in Computer Science, pp. 651–667, Springer International Publishing, 2017.
[47] R. Tackx, F. Tarissan, and J. Guillaume, "ComSim: a bipartite community detection algorithm using cycle and node's similarity," in International Workshop on Complex Networks and their Applications, vol. 689 of Studies in Computational Intelligence, pp. 278–289, Springer International Publishing, 2017.
[48] T. Wang, L. Yin, and X. Wang, "A community detection method based on local similarity and degree clustering information," Physica A: Statistical Mechanics and its Applications, vol. 490, pp. 1344–1354, 2018.
[49] K. R. Zalik, "Maximal neighbor similarity reveals real communities in networks," Scientific Reports, vol. 5, Article ID 18374, 2015.
[50] A. Lancichinetti, S. Fortunato, and F. Radicchi, "Benchmark graphs for testing community detection algorithms," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 78, no. 4, Article ID 046110, 2008.
[51] L. Ana and A. Jain, "Robust data clustering," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-128–II-133, Madison, WI, USA, 2003.
Table 1: The parameters used to generate the LFR networks. In the header row, $n$ is the number of nodes contained in the network; $\langle d \rangle$ and $d_{max}$ are the average degree and the maximum degree, respectively; $\exp_d$ and $\exp_{com}$ are the exponents of the power-law distributions that the node degree and the community size follow; $\min(C_i)$ and $\max(C_i)$ are the minimal and maximal numbers of nodes contained in a community, respectively.

| Network | $n$ | $\langle d \rangle$ | $d_{max}$ | $\exp_d$ | $\exp_{com}$ | $\min(C_i)$ | $\max(C_i)$ |
|---|---|---|---|---|---|---|---|
| LFR1000s | 1000 | 20 | 50 | -2 | -1 | 10 | 50 |
| LFR1000b | 1000 | 20 | 50 | -2 | -1 | 20 | 100 |
| LFR5000s | 5000 | 20 | 50 | -2 | -1 | 10 | 50 |
| LFR5000b | 5000 | 20 | 50 | -2 | -1 | 20 | 100 |
Table 2: Information about the real-world networks. $n$ and $m$ are the numbers of nodes and edges in the network, respectively.

| Network | $n$ | $m$ |
|---|---|---|
| Karate club [14] | 34 | 78 |
| Dolphin social network [15] | 62 | 159 |
| Risk map [16] | 42 | 83 |
| Scientists collaboration network [6] | 118 | 197 |
| Lesmis [17] | 77 | 254 |
| Polbooks [3] | 105 | 441 |
| ColiNeta [18] | 423 | 519 |
| NetScience [10] | 1589 | 2742 |
| Email [19] | 1133 | 5451 |
| YeastL [20] | 2361 | 7182 |
| PGP [21] | 10680 | 24316 |
| DBLP [22] | 317080 | 1049866 |
| Amazon [22] | 334863 | 925872 |
adopted as the metrics to evaluate the quality of the detected community structure in this paper. The NMI between the ground-truth community structure $P = \{P_1, P_2, \ldots, P_K\}$ and the extracted one $P' = \{P'_1, P'_2, \ldots, P'_{K'}\}$ is calculated as follows:

$$\mathrm{NMI}(P, P') = \frac{-2 \sum_{i=1}^{|P|} \sum_{j=1}^{|P'|} n_{ij} \log\left(\dfrac{n_{ij} \cdot n}{n_{P_i} \cdot n_{P'_j}}\right)}{\sum_{i=1}^{|P|} n_{P_i} \log\left(\dfrac{n_{P_i}}{n}\right) + \sum_{j=1}^{|P'|} n_{P'_j} \log\left(\dfrac{n_{P'_j}}{n}\right)} \tag{8}$$

where $n_{P_i} = |P_i|$, $n_{P'_j} = |P'_j|$, and $n_{ij} = |P_i \cap P'_j|$, respectively. The NMI is an information-theoretic metric that measures how much the detected community structure agrees with the ground truth. Therefore, it can only be used to evaluate the quality of the detected community structure on networks whose ground-truth community structure is already known. Its value lies in the range $[0, 1]$; larger is better.
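To make (8) concrete, the following minimal sketch evaluates it directly from two node-to-community assignments. The function name `nmi`, the dictionary-based input format, and the convention of returning 1.0 when both partitions consist of a single community (where the denominator of (8) is zero) are assumptions made for this illustration.

```python
import math
from collections import Counter


def nmi(truth, detected):
    """Normalized mutual information between two partitions, following (8).

    truth, detected -- dicts mapping every node to a community label
    """
    nodes = list(truth)
    n = len(nodes)
    size_p = Counter(truth[v] for v in nodes)        # n_{P_i}
    size_q = Counter(detected[v] for v in nodes)     # n_{P'_j}
    overlap = Counter((truth[v], detected[v]) for v in nodes)  # n_{ij}, nonzero entries only

    numerator = -2.0 * sum(
        n_ij * math.log(n_ij * n / (size_p[i] * size_q[j]))
        for (i, j), n_ij in overlap.items()
    )
    denominator = (
        sum(c * math.log(c / n) for c in size_p.values())
        + sum(c * math.log(c / n) for c in size_q.values())
    )
    if denominator == 0:
        # Both partitions are a single community; treat them as identical (assumption).
        return 1.0
    return numerator / denominator


# Toy partitions of four nodes (illustrative only).
truth = {0: "a", 1: "a", 2: "b", 3: "b"}
independent = {0: "x", 1: "y", 2: "x", 3: "y"}
print(nmi(truth, truth), nmi(truth, independent))
```

Pairs with $n_{ij} = 0$ contribute nothing to the numerator, so the sketch iterates only over the overlaps that actually occur.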
Another metric widely used to evaluate the performance of community detection methods is modularity [7], which is defined as follows:

$$Q = \sum_{i} \left(e_{ii} - a_i^2\right) \tag{9}$$

where $e_{ii}$ is a diagonal element of a $K \times K$ matrix $e$, whose element $e_{ij}$ is the fraction of all edges in the network that connect nodes in community $C_i$ to nodes in community $C_j$; $K$ is the number of communities in the community structure; and $a_i$ is the fraction of edges associated with nodes in community $C_i$.
The first term, $\sum_i e_{ii}$, on the right-hand side of (9) is the fraction of edges within communities; the second term, $\sum_i a_i^2$, is the expected value of the same fraction in a random graph that has the same nodes and degree distribution as the original network but whose edges are placed between nodes at random. The smaller the difference between the two terms, the closer the network is to a random graph and the weaker the community structure. Conversely, the larger the difference, the further the network departs from the random graph and the stronger the community structure. That is to say, modularity measures the quality of a community structure by how far the detected result deviates from a random network. Its effective value falls in $[0, 1]$; higher is better.
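A short sketch may help to see how (9) is evaluated on an edge list. It assumes an undirected, unweighted graph and takes $a_i$ as the fraction of edge ends attached to community $C_i$ (total degree of the community divided by $2m$), which is the usual reading of the definition in [7]; the function name `modularity` and the toy graph are illustrative only.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q = sum_i (e_ii - a_i^2) of a partition of an undirected graph.

    edges     -- list of (u, v) pairs, each undirected edge listed once
    community -- dict: node -> community label
    """
    m = len(edges)
    inner = defaultdict(int)       # number of edges with both ends in community i
    degree_sum = defaultdict(int)  # total degree of the nodes in community i
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            inner[cu] += 1
    # e_ii = inner / m, a_i = degree_sum / (2m)
    return sum(inner[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph (illustrative only): triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(modularity(toy_edges, two_groups))
```

For the toy graph, each triangle has 3 of the 7 edges inside it and carries half of all edge ends, so $Q = 2 \times (3/7 - 1/4) = 5/14$; putting all six nodes into one community gives $Q = 0$.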
4.3. Synthetic Networks. We carried out experiments on four groups of artificial networks to test the performance of the proposed method. As mentioned above, all four types of artificial networks were synthesized using the LFR benchmark generator software [50]. Besides the parameters listed in Table 1, another critical parameter of this software is the mixing parameter $\mu$, which sets, for each node, the fraction of its edges that connect to nodes in other communities. The smaller the value of $\mu$, the clearer the community structure. Clearly, $\mu = 0.5$ is a transition point above which the communities in networks tend to become obscure.
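The quantity that $\mu$ controls can also be measured on a given network and partition. The sketch below computes, for each node, the fraction of its edges that lead to other communities; the function name `mixing_ratio` and the toy graph are hypothetical and serve only to illustrate the definition, not the LFR generator itself.

```python
def mixing_ratio(adjacency, community):
    """Per-node fraction of edges that connect to nodes in other communities.

    adjacency -- dict: node -> set of neighbours (undirected graph)
    community -- dict: node -> community label
    """
    ratios = {}
    for node, neighbours in adjacency.items():
        if not neighbours:
            continue  # the ratio is undefined for isolated nodes
        external = sum(1 for nb in neighbours if community[nb] != community[node])
        ratios[node] = external / len(neighbours)
    return ratios


# Toy graph (illustrative only): triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
toy_graph = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
toy_partition = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
ratios = mixing_ratio(toy_graph, toy_partition)
print(ratios)
```

Here only nodes 2 and 3 have an edge leaving their community, so their ratio is 1/3 and every other node has ratio 0; a ratio above 0.5 would mean that a node has more edges outside its community than inside it.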
Figure 2: Comparison of different community detection algorithms on LFR benchmark networks containing 1000 nodes. (a) The results detected from small networks with small-sized communities. (b) The results identified from small networks with big-sized communities. [Plots omitted: NMI (vertical axis, 0.0 to 1.0) against the mixing parameter $\mu$ (horizontal axis, 0.1 to 0.8) for FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposal (NSA).]
In our experiments, we varied the value of $\mu$ from 0.1 to 0.8 in increments of 0.1 for each group of LFR networks. To reduce the effect of chance, we generated 10 networks for each value of $\mu$ while keeping the other parameters fixed. Since the community structures are already embedded in these synthetic networks, we use NMI as the metric to evaluate the performance of our proposed method and the comparison algorithms. We ran our proposed method and the comparison algorithms on these networks one by one and took the average NMI as the result. The results obtained by our proposal and the comparison algorithms on the small networks with small-sized and big-sized communities are illustrated in Figures 2(a) and 2(b), respectively; the results on the larger networks with small-sized and big-sized communities are presented in Figures 3(a) and 3(b), respectively.
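The evaluation protocol above can be sketched as follows. This is a minimal illustration that assumes NMI is normalized as $2I(X;Y)/(H(X)+H(Y))$, a common convention; the form of the definition used elsewhere in this paper may differ in notation. The names `nmi` and `average_nmi` are hypothetical.

```python
import math
from collections import Counter


def nmi(labels_true, labels_pred):
    """Normalized mutual information, 2*I(X;Y) / (H(X) + H(Y)) (assumed normalization)."""
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("label lists must be non-empty and of equal length")
    count_true = Counter(labels_true)
    count_pred = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    h_true = -sum(c / n * math.log(c / n) for c in count_true.values())
    h_pred = -sum(c / n * math.log(c / n) for c in count_pred.values())
    mutual = sum(
        c / n * math.log(c * n / (count_true[a] * count_pred[b]))
        for (a, b), c in joint.items()
    )
    if h_true + h_pred == 0:
        # Both partitions place every node in a single community.
        return 1.0
    return 2.0 * mutual / (h_true + h_pred)


def average_nmi(runs):
    """Mean NMI over (ground_truth, detected) label pairs, e.g. the 10 networks per mu."""
    scores = [nmi(truth, detected) for truth, detected in runs]
    return sum(scores) / len(scores)
```

Because NMI compares groupings rather than label names, a detected partition that merely renames the ground-truth communities still scores 1.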
In Figures 2(a) and 2(b), FastQ tends to introduce mistakes into the results regardless of whether the communities in the networks are well separated or obscure. As mentioned previously, FastQ is a typical modularity-optimization-based algorithm; it aims only at acquiring results with larger modularity rather than high accuracy. In our experiments, none of the results it uncovered is satisfactory. Even on the networks with $\mu = 0.1$, it failed to identify the exact communities; furthermore, its performance is the worst among the comparison algorithms for $\mu \le 0.5$. For $\mu > 0.5$, the quality of its results is better only than that of LPA. LPA performed as well as the other comparison algorithms on the networks with $\mu < 0.5$, but its performance dropped dramatically for $\mu \ge 0.5$, and it could not detect meaningful communities at all for $\mu > 0.6$. This might be due to its label-update mechanism: when the community boundaries become obscure, nodes tend to adopt incorrect labels, which leads to trivial results in which all nodes may even be labeled as members of one giant community. The proposed method NSA achieved NMI = 1 on all networks with $\mu < 0.5$, meaning that the detected partitions match the ground-truth community structures of these networks perfectly. For $\mu = 0.5$, NSA obtained results as good as those of WalkTrap, Attractor, and IsoFdp. For $\mu > 0.5$, the quality of the detected community structures slipped for those three algorithms and for the proposed method. For $0.5 < \mu \le 0.6$, the quality of our proposal is better than that of Attractor on networks with larger communities, and for $\mu \ge 0.7$ the performance of our proposed method is the best.
In Figures 3(a) and 3(b), the results are similar overall to those in Figure 2, but they differ in some respects. In Figure 3(a), our proposed method performed the best on almost all networks. For $0.5 < \mu < 0.7$ in Figure 2, the NMI of the results extracted by our proposed method is lower than those of WalkTrap and IsoFdp; in Figure 3, however, the proposed method performed better than IsoFdp for $\mu > 0.5$. These results suggest that the performance of the comparison algorithms is not stable across different networks, whereas our proposed method can steadily extract high-quality community structures from networks with different characteristics. This is also evident from the fact that the curves of the proposed method in these figures decline more slowly than the others. Moreover, comparing the proposal's own curves across these figures, we can conclude that our proposed method tends to perform better on larger networks with small communities; it therefore overcomes the resolution limit problem to some extent.
4.4. Real-World Networks. We also carried out experiments on 13 real-world networks to further test the effectiveness and efficiency of our proposed method. As mentioned in Section 4.1, these networks fall into two categories: those whose ground-truth community structure is known a priori and those without a publicly acknowledged ground truth.

Figure 3: Comparison of different community detection algorithms on LFR benchmark networks containing 5000 nodes. (a) The results extracted from the larger networks with small-sized communities. (b) The results revealed from the larger networks with big-sized communities. [Plots omitted: NMI (vertical axis, 0.0 to 1.0) against the mixing parameter $\mu$ (horizontal axis, 0.1 to 0.8) for FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposal (NSA).]

Figure 4: The karate club network. (a) The ground-truth community structure. (b) The community structure detected by our proposed method NSA. (The nodes in different communities are plotted in different colors and shapes; this illustration style is also applied in the subsequent figures.) [Network drawings of the 34 nodes omitted.]
Networks with Ground-Truth Community Structure. This category includes the first 4 networks listed in Table 2. Since their ground-truth community structures are already known, we measure the quality of the community structures identified by the proposed method and the comparison algorithms in terms of both NMI and modularity. The values of the two metrics obtained by the proposed method and the comparison algorithms are recorded in Table 3. These networks are relatively small, which makes it easy to visualize the detected results. Below we analyze the results extracted by the proposed method from each of these networks.
The Karate Club Network. This network depicts the friendships among the members of a karate club; it contains 34 nodes and 78 edges. It was compiled by Wayne W. Zachary, who observed the karate club for 3 years. During Zachary's study, the club split into two factions because of a dispute between the administrator and the instructor. Corresponding to the two factions, a partition into two communities is conventionally taken as the ground truth, which is shown in Figure 4(a). The result detected by our proposed method is presented in Figure 4(b).
From Figure 4, we can see that our proposed method detected 3 rather than 2 communities in this network. The detected result appears to deviate from the ground truth in some ways, but it is consistent with the conclusion drawn from the experiments on synthetic networks, namely, that our proposed method tends to find small communities and thereby overcomes the resolution limit problem. Moreover, in terms of the evaluation metrics, the modularity of the detected result is the largest among those of all compared algorithms. Although our proposed method is not based on modularity optimization, it tends to produce community structures with modularity as large as possible: where its modularity is not the largest, it is the second largest, with a small gap to the largest. These findings are also borne out on the networks that follow.

Figure 5: The dolphin social network. (a) The ground-truth community structure. (b) The community structure identified by our proposed method NSA. [Network drawings of the 62 named dolphins omitted.]

Table 3: The experimental results on networks with ground-truth community structures. The largest value of each metric on each network is typed in bold.

| Network | Metric | FastQ | WalkTrap | LPA | Attractor | IsoFdp | NSA |
|---|---|---|---|---|---|---|---|
| Karate | Q | 0.381 | 0.353 | 0.355 | 0.371 | 0.371 | **0.402** |
| Karate | NMI | 0.693 | 0.504 | 0.62 | 0.924 | **1.00** | 0.699 |
| Dolphin | Q | 0.492 | 0.489 | 0.464 | 0.45 | 0.505 | **0.513** |
| Dolphin | NMI | 0.719 | 0.632 | 0.719 | 0.69 | 0.744 | **0.887** |
| Risk map | Q | **0.625** | 0.624 | 0.59 | 0.598 | 0.519 | 0.624 |
| Risk map | NMI | **0.894** | 0.848 | 0.821 | 0.839 | 0.714 | 0.848 |
| Scientists | Q | **0.749** | 0.733 | 0.64 | 0.694 | 0.668 | 0.744 |
| Scientists | NMI | 0.867 | 0.818 | 0.743 | 0.835 | 0.823 | **0.878** |
Lusseau's Dolphin Social Network. This network describes the interactions of a group of dolphins living in Doubtful Sound, New Zealand. It consists of 62 nodes and 159 edges, which represent individual dolphins and the observed co-occurrences of pairs of dolphins, respectively. This network is generally partitioned into 4 groups as the ground-truth community structure, which is shown in Figure 5(a). Figure 5(b) shows the community structure uncovered by our proposed method.

As Figure 5 shows, our proposed method detected the communities in this network with a high degree of success: it also identified 4 communities, the vast majority of nodes are classified into the correct communities, and the result comes close to the ground-truth community structure. Quantitatively, both the NMI and the modularity of the result detected by the proposed method on this network are the largest among those of all compared algorithms, which means that the community structure identified by the proposed method is clearly better than those of the comparison algorithms.
Risk Map Network. This network is the world political map used in the popular game Risk (https://en.wikipedia.org/wiki/Risk_(game)), in which 42 countries or territories on 6 continents are involved. Accordingly, the 42 nodes and the 83 edges connecting adjacent countries or territories are organized into 6 communities as the ground truth, which is illustrated in Figure 6(a). Feeding this network into the proposed method, we obtained the community structure shown in Figure 6(b).
Figure 6: Risk map network. (a) The ground-truth community structure. (b) The community structure uncovered by our proposed method NSA. [Network drawings of the 42 nodes omitted.]

Table 4: The experimental results of modularity on networks. The largest value on each network is typed in bold; a dash marks a network on which the algorithm produced no effective result.

| Network | FastQ | WalkTrap | LPA | Attractor | IsoFdp | NSA |
|---|---|---|---|---|---|---|
| Lesmis | 0.499 | 0.519 | 0.515 | 0.498 | 0.491 | **0.54** |
| Polbooks | 0.502 | 0.507 | 0.508 | 0.501 | 0.518 | **0.524** |
| ColiNeta | **0.779** | 0.746 | 0.693 | 0.718 | - | 0.761 |
| Email | 0.499 | 0.531 | 0.379 | 0.464 | 0.531 | **0.544** |
| NetScience | 0.955 | 0.956 | 0.896 | 0.937 | - | **0.957** |
| YeastL | 0.573 | 0.529 | 0.372 | 0.511 | - | **0.574** |
| PGP | 0.85 | 0.789 | 0.765 | 0.768 | 0.726 | **0.867** |
| DBLP | 0.735 | - | 0.652 | 0.637 | - | **0.782** |
| Amazon | 0.869 | - | 0.743 | 0.741 | - | **0.898** |

Comparing the detected result with the ground-truth community structure, the community containing nodes '18' and '23' in the ground truth is split into two small communities in Figure 6(b), owing to the tendency of the proposed method to find small communities. Besides this, nodes '26', '33', and '34' are assigned to communities other than their ground-truth ones in the detected result. But nodes '12', '16', '26', '33', and '34' are special ones in this network: the outer edges associated with them are no fewer than, or even more than, those within the communities to which these nodes belong. Therefore, if we ignore what these nodes actually represent and consider only the topology, the community structure extracted by our proposed method is more rational than the ground truth: more edges associated with these three nodes are located within their communities than in the ground truth, so these three nodes are connected more tightly to nodes within the same community in Figure 6(b). Quantitatively, the values of both metrics for our proposed method are second only to those of FastQ and equal to those of WalkTrap. These results also confirm that our proposed method provides an acceptable solution to the problem of community detection.
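The topological argument above compares, for a single node, the number of edges that stay inside its community with the number that leave it. A minimal sketch of that count is given below; the function name and the toy data in the usage note are hypothetical and are not taken from the Risk map network.

```python
def inner_outer_edges(edges, community_of, node):
    """Count the edges of `node` that stay inside its community and those that leave it.

    edges        -- list of (u, v) pairs, one per undirected edge (assumed input format)
    community_of -- dict mapping every node to its community id (assumed input format)
    Returns a pair (inner, outer).
    """
    inner = outer = 0
    for u, v in edges:
        if node not in (u, v):
            continue
        other = v if u == node else u
        if community_of[other] == community_of[node]:
            inner += 1
        else:
            outer += 1
    return inner, outer
```

Running this for the same node under two candidate partitions shows which assignment keeps more of its edges inside its community; for instance, a node with one neighbor in community A and two in community B has (1, 2) when placed in A and (2, 1) when placed in B.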
Scientists Collaboration Network. This is the largest connected component of a network describing the coauthorship relations among scientists working at the Santa Fe Institute, New Mexico. Nodes in this network represent scientists, and an edge connects two scientists who have collaborated on at least one paper. There are 118 nodes and 197 edges in total in this network. The nodes can be divided into 6 groups as the ground-truth communities according to the specialties of the scientists, as presented in Figure 7(a). Taking this network as the input to the proposed method, we obtained the community structure illustrated in Figure 7(b).
The proposed method revealed 8 communities in this network; that is, two additional communities are detected in Figure 7(b). These two communities are relatively independent components; especially for the community containing node '1', there are many more inner edges than outer edges. That is to say, the nodes in these two communities are connected more tightly to one another than to the remainder of the network. Therefore, isolating them from the network and treating them as independent communities is also reasonable. In terms of the evaluation metrics, the NMI obtained by the proposed method is the largest, which suggests that the result detected by our proposal is the closest to the ground-truth community structure; the modularity of the proposed method is not the largest, but it is second only to that of FastQ. These results also show that our proposed method can extract high-quality community structures from networks.
Networks without Ground-Truth Community Structure. This category contains the last 9 real-world networks listed in Table 2. For the experiments carried out on this category of networks, we evaluate the quality of the extracted community structures using modularity only, owing to the absence of ground-truth community structures. The modularity values obtained by the proposed method and the comparison algorithms are recorded in Table 4. To illustrate them intuitively, we also plotted them in a bar chart, which is presented in Figure 8.

Figure 7: The collaboration network of scientists working at the Santa Fe Institute. (a) The ground-truth community structure. (b) The community structure detected by our proposed NSA algorithm. [Network drawings of the 118 nodes omitted.]

Figure 8: The bar chart of the modularity obtained by the comparison algorithms and the proposed method NSA. [Bar chart omitted: modularity Q (vertical axis, 0.0 to 1.0) of FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposal (NSA) on the networks Lesmis, Polbooks, ColiNeta, Email, NetScience, YeastL, PGP, DBLP, and Amazon.]
On these networks, our proposed method achieved the largest modularity on 8 of them. On the remaining one, ColiNeta, it still obtained the second largest modularity. FastQ, although based on a modularity optimization strategy, acquired the largest modularity on ColiNeta only. WalkTrap is an approach based on random walks, so its time complexity is relatively high; it could not produce effective results on the Amazon and DBLP networks because of their large scale. LPA and Attractor can extract community structures from all of those networks, but the quality of the detected results is not satisfactory. IsoFdp can be applied only to connected networks and cannot run on ColiNeta, NetScience, and YeastL, as these three networks are disconnected; it also could not detect community structures in Amazon and DBLP effectively because of their large scale. These comparison results show that our proposed method can steadily, effectively, and efficiently provide promising solutions to the problem of community detection in networks from a wide range of applications, and that it outperforms the comparison algorithms significantly.
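Whether a network is connected, which decides here whether an algorithm restricted to connected networks can be applied, can be checked with a breadth-first search. The sketch below is a generic illustration with hypothetical function names; it only sees nodes that appear in at least one edge, so isolated nodes would need to be handled separately.

```python
from collections import defaultdict, deque


def connected_components(edges):
    """Return the connected components (as sets of nodes), largest first."""
    adjacency = defaultdict(set)
    for u, v in edges:
        adjacency[u].add(v)
        adjacency[v].add(u)
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search from an unvisited node collects one component.
        seen.add(start)
        component = {start}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adjacency[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    component.add(neighbor)
                    queue.append(neighbor)
        components.append(component)
    return sorted(components, key=len, reverse=True)


def is_connected(edges):
    """True if all nodes that appear in `edges` lie in a single component."""
    return len(connected_components(edges)) <= 1
```

The first element of the returned list is the largest connected component, which is one way to obtain a connected input from a disconnected network.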
Figure 9: The setting of parameter $\delta$. (a) The karate club network. (b) The dolphin social network. (c) The risk map network. (d) The scientists collaboration network. [Plots omitted: modularity Q (vertical axis) against $\delta$ (horizontal axis, 0.00 to 0.30).]
5. Parameter Setting
In the second phase of the proposed method, we introduce a threshold $\delta$ on the community metric to identify the preliminary communities that need to be merged. As mentioned above, we calculate the community metric $\gamma_i = \alpha_i \times \beta_i$ for every preliminary community $C_i$ in the merge procedure; if the value of $\gamma_i$ is below the threshold $\delta$, the corresponding community $C_i$ is identified as one that needs to be merged.
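The selection rule just described amounts to a simple filter. In the sketch below, the values of $\alpha_i$ and $\beta_i$ are assumed to have been computed already according to their definitions given earlier in the paper; the function name and the dictionary inputs are hypothetical.

```python
def communities_to_merge(alpha, beta, delta):
    """Return the ids of communities whose metric gamma_i = alpha_i * beta_i is below delta.

    alpha, beta -- dicts mapping community id to alpha_i and beta_i (assumed precomputed)
    delta       -- merge threshold
    """
    return sorted(c for c in alpha if alpha[c] * beta[c] < delta)
```

For example, with $\gamma$ values of 0.25, 0.06, and 0.045 and $\delta = 0.1$, the second and third communities are selected for merging.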
Therefore, $\delta$ works as a parameter of our proposed method, and its setting can influence the quality of the resulting community structure. Qualitatively, the larger or the sparser the network is, the smaller the threshold $\delta$ should be, in accordance with the definitions of community sparsity ($\alpha_i$), community scale ($\beta_i$), and community metric ($\gamma_i$). To determine the optimal value of $\delta$, we conducted a group of experiments to explore the relationship between the value of $\delta$ and the quality of the resulting community structure on the first four networks listed in Table 2, namely, the karate club network, the dolphin social network, the Risk map network, and the scientists collaboration network. The quality of the resulting community structure is measured in terms of modularity $Q$. We varied the value of $\delta$ from 0 to 1.0 in steps of 0.005; for each value of $\delta$, we ran our proposed method on these networks and observed how the modularity changes as $\delta$ varies.
The observed results are illustrated in Figure 9, in which we plot only the portion $\delta \in [0, 0.3]$ because the largest modularities are obtained for $\delta \le 0.3$ on all four networks. Our proposed method attains the largest modularity at $\delta = 0.13$ on the dolphin social network and at $\delta = 0.1$ on the other three networks. Therefore, we adopted the corresponding value for each of those four networks and empirically set $\delta = 0.1$ for the other networks in our experiments. In Figure 9, the largest modularity is obtained around $\delta = 0.1$, and the interval [0.05, 0.2] covers the optimal value of $\delta$. Therefore, we empirically suggest that $\delta$ be adjusted around 0.1 within the range [0.05, 0.2] according to the size and the sparsity of the networks involved in real-world applications.
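The parameter study described in this section is a one-dimensional grid search. The sketch below shows its shape under stated assumptions: `detect` is a hypothetical callable that runs a community detection method with a given $\delta$ and returns a partition, and `score` is a hypothetical callable that returns the modularity of that partition. Neither is the authors' implementation.

```python
def sweep_delta(detect, edges, score, step=0.005, upper=1.0):
    """Try delta = 0, step, 2*step, ..., upper and keep the value with the best score.

    detect -- hypothetical callable (edges, delta) -> partition
    score  -- hypothetical callable (edges, partition) -> quality, e.g. modularity Q
    Returns (best_delta, best_score); ties keep the smallest delta.
    """
    best_delta, best_score = None, float("-inf")
    steps = int(round(upper / step))
    for k in range(steps + 1):
        # Multiply instead of accumulating to avoid floating-point drift.
        delta = k * step
        quality = score(edges, detect(edges, delta))
        if quality > best_score:
            best_delta, best_score = delta, quality
    return best_delta, best_score
```

With the default arguments the grid contains 201 values of $\delta$, matching a sweep from 0 to 1.0 in steps of 0.005.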
6. Conclusion
In this paper, we presented a novel method for detecting communities in networks. It is a local method based on node similarity, and it overcomes the high time consumption of global methods. First, we construct the preliminary community structure by repeatedly selecting the node with the largest degree and, depending on the community assignment of its most similar neighbor, either taking it as the exemplar of a new community or inserting it into the community to which its most similar neighbor belongs; i.e., if its most similar neighbor has not been assigned to any community yet, we create a new community for it and its most similar neighbor, and if its most similar neighbor has already been assigned to a community, we insert it into that community as well. At the end of this process, we obtain a series of preliminary communities. However, some of them might be too small or too sparse, leading to a low-quality result. Therefore, we merge some of the preliminary communities to acquire the final community structure. To do so, we also proposed some indexes that take both the size and the sparsity of communities into account to determine which communities should be merged.
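The first phase summarized above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the Jaccard index on closed neighborhoods is used here as a stand-in similarity measure (the paper defines its own), ties are broken by node id, and nodes are assumed to be sortable and to have at least one edge. The merge phase is not shown.

```python
from collections import defaultdict


def jaccard(adjacency, u, v):
    """Stand-in similarity: Jaccard index of the closed neighborhoods of u and v."""
    a = adjacency[u] | {u}
    b = adjacency[v] | {v}
    return len(a & b) / len(a | b)


def preliminary_communities(edges):
    """Phase-one sketch: visit nodes by decreasing degree and follow the most similar neighbor."""
    adjacency = defaultdict(set)
    for u, v in edges:
        if u != v:
            adjacency[u].add(v)
            adjacency[v].add(u)
    # Largest degree first; ties broken by node id (an assumption).
    order = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    label = {}
    next_id = 0
    for node in order:
        if node in label:
            continue  # already placed as the most similar neighbor of an earlier node
        best = max(sorted(adjacency[node]), key=lambda nb: jaccard(adjacency, node, nb))
        if best in label:
            # The most similar neighbor is assigned: join its community.
            label[node] = label[best]
        else:
            # Otherwise open a new community for the node and that neighbor.
            label[node] = label[best] = next_id
            next_id += 1
    return label
```

On two triangles joined by a single edge, this sketch recovers the two triangles as the preliminary communities.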
To test the performance of the proposed method, we performed extensive experiments on four groups of synthetic networks and 13 real-world networks and compared the detected community structures with the results extracted by the comparison algorithms in terms of NMI and modularity. The comparison results demonstrate that our proposed method can extract high-quality community structures from networks drawn from various applications and that nodes in the extracted communities are connected more tightly. The proposed method overcomes the resolution limit problem to some extent and outperforms its competitors.
Data Availability
We have conducted experiments on some artificial networks and some real-world datasets. The artificial networks are synthesized using the LFR benchmark network generator, which is freely available at https://sites.google.com/site/santofortunato. The parameters used to synthesize the artificial networks are listed in Table 1. The real-world data supporting this study are from previously reported studies, which have been cited in Table 2. Most of the real-world datasets can also be downloaded from http://www-personal.umich.edu/~mejn/netdata and https://snap.stanford.edu/data/index.html. The ColiNeta dataset was provided by Jeong et al. [18]. We constructed the Risk Map network manually according to the literature [16].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant ID 61602225).
References
[1] J. Kleinberg and S. Lawrence, "Network analysis: The structure of the web," Science, vol. 294, no. 5548, pp. 1849–1850, 2001.
[2] P. Chen and S. Redner, "Community structure of the physical review citation network," Journal of Informetrics, vol. 4, no. 3, pp. 278–290, 2010.
[3] M. E. J. Newman, "Modularity and community structure in networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 23, pp. 8577–8582, 2006.
[4] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabási, "Hierarchical organization of modularity in metabolic networks," Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[5] R. Guimerà and L. A. N. Amaral, "Functional cartography of complex metabolic networks," Nature, vol. 433, no. 7028, pp. 895–900, 2005.
[6] M. Girvan and M. E. J. Newman, "Community structure in social and biological networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.
[7] M. E. J. Newman and M. Girvan, "Finding and evaluating community structure in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 2, Article ID 026113, 2004.
[8] P. M. Gleiser and L. Danon, "Community structure in jazz," Advances in Complex Systems (ACS), vol. 6, no. 4, pp. 565–573, 2003.
[9] Y. van Gennip, B. Hunter, R. Ahn et al., "Community detection using spectral clustering on sparse geosocial data," SIAM Journal on Applied Mathematics, vol. 73, no. 1, pp. 67–83, 2013.
[10] M. E. J. Newman, "Finding community structure in networks using the eigenvectors of matrices," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[11] S. Fortunato, "Community detection in graphs," Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
[12] S. Fortunato and D. Hric, "Community detection in networks: a user guide," Physics Reports, vol. 659, pp. 1–44, 2016.
[13] B. W. Kernighan and S. Lin, "An efficient heuristic procedure for partitioning graphs," Bell Labs Technical Journal, vol. 49, no. 1, pp. 291–307, 1970.
[14] W. W. Zachary, "An information flow model for conflict and fission in small groups," Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[15] D. Lusseau, "The emergent properties of a dolphin social network," in Proceedings of the Royal Society of London B: Biological Sciences, vol. 270, supplement 2, pp. S186–S188, 2003.
[16] K. Steinhaeuser and N. V. Chawla, "Identifying and evaluating community structure in complex networks," Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[17] M. E. J. Newman, "The structure and function of complex networks," SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
[18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási, "The large-scale organization of metabolic networks," Nature, vol. 407, no. 6804, pp. 651–654, 2000.
[19] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, "Self-similar community structure in a network of human interactions," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[20] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, "Network motifs: simple building blocks of complex networks," Science, vol. 298, no. 5594, pp. 824–827, 2002.
[21] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, "Models of social networks based on social distance attachment," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 5, Article ID 056122, 2004.
[22] J. Yang and J. Leskovec, "Defining and evaluating network communities based on ground-truth," Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.
[23] M. E. J. Newman, "Fast algorithm for detecting community structure in networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 6, Article ID 066133, 2004.
[24] A. Clauset, M. E. J. Newman, and C. Moore, "Finding community structure in very large networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 6, Article ID 066111, 2004.
[25] F. Dabaghi Zarandi and M. Kuchaki Rafsanjani, "Community detection in complex networks using structural similarity," Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 882–891, 2018.
[26] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, "Fast unfolding of communities in large networks," Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, Article ID P10008, 2008.
[27] L. Waltman and N. J. Van Eck, "A smart local moving algorithm for large-scale modularity-based community detection," The European Physical Journal B, vol. 86, no. 11, article 471, pp. 1–14, 2013.
[28] U. N. Raghavan, R. Albert, and S. Kumara, "Near linear time algorithm to detect community structures in large-scale networks," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 76, no. 3, Article ID 036106, 2007.
[29] M. J. Barber and J. W. Clark, "Detecting network communities by propagating labels under constraints," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 2, Article ID 026129, 2009.
[30] J. Hou Chin and K. Ratnavelu, "A semi-synchronous label propagation algorithm with constraints for community detection in complex networks," Scientific Reports, vol. 7, Article ID 45836, 2017.
[31] J. Ding, X. He, J. Yuan, Y. Chen, and B. Jiang, "Community detection by propagating the label of center," Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 675–686, 2018.
[32] A. Laio and A. Rodriguez, "Clustering by fast search and find of density peaks," Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[33] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger, "SCAN: A structural clustering algorithm for networks," in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), pp. 824–833, ACM, New York, NY, USA, August 2007.
[34] M. Ester, H. P. Kriegel, S. Jorg, and X. Xu, "A density-based algorithm for discovering clusters in large spatial databases with noise," in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD'96), pp. 226–231, AAAI Press, 1996.
[35] H. Shiokawa, Y. Fujiwara, and M. Onizuka, "Scan++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs," in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1178–1189, Republic of Korea, September 2006.
[36] T. You, H.-M. Cheng, Y.-Z. Ning, B.-C. Shia, and Z.-Y. Zhang, "Community detection in complex networks using density-based clustering algorithm and manifold learning," Physica A: Statistical Mechanics and its Applications, vol. 464, pp. 221–230, 2016.
[37] X. Wang, G. Liu, J. Li, and J. P. Nees, "Locating structural centers: A density-based clustering method for community detection," PLoS ONE, vol. 12, no. 1, Article ID e0169355, 2017.
[38] P. Pons and M. Latapy, "Computing communities in large networks using random walks," in International Symposium on Computer and Information Sciences, pp. 284–293, 2005.
[39] S. A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, and M. A. Tavallaie, "Personalized PageRank clustering: a graph clustering algorithm based on random walks," Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772–5785, 2013.
[40] Y. Su, B. Wang, and X. Zhang, "A seed-expanding method based on random walks for community detection in networks with ambiguous community structures," Scientific Reports, vol. 7, Article ID 41830, 2017.
[41] J. Shao, Z. Han, Q. Yang, and T. Zhou, "Community detection based on distance dynamics," in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1075–1084, ACM, Australia, August 2015.
[42] H.-L. Sun, E. Ch'ng, X. Yong, J. M. Garibaldi, S. See, and D.-B. Chen, "A fast community detection method in bipartite networks by distance dynamics," Physica A: Statistical Mechanics and its Applications, vol. 496, pp. 108–120, 2018.
[43] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, "Pseudo-likelihood methods for community detection in large sparse networks," The Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[44] S. C. de Lange, M. A. de Reus, and M. P. van den Heuvel, "The laplacian spectrum of neural networks," Frontiers in Computational Neuroscience, vol. 7, no. 189, 2014.
[45] F. Krzakala, C. Moore, E. Mossel et al., "Spectral redemption in clustering sparse networks," Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 52, pp. 20935–20940, 2013.
[46] P. Shi, K. He, D. Bindel, and J. E. Hopcroft, "Local Lanczos Spectral Approximation for Community Detection," in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 10534 of Lecture Notes in Computer Science, pp. 651–667, Springer International Publishing, 2017.
[47] R. Tackx, F. Tarissan, and J. Guillaume, "ComSim: a bipartite community detection algorithm using cycle and node's similarity," in International Workshop on Complex Networks and their Applications, vol. 689 of Studies in Computational Intelligence, pp. 278–289, Springer International Publishing, 2017.
[48] T. Wang, L. Yin, and X. Wang, "A community detection method based on local similarity and degree clustering information," Physica A: Statistical Mechanics and its Applications, vol. 490, pp. 1344–1354, 2018.
[49] K. R. Zalik, "Maximal neighbor similarity reveals real communities in networks," Scientific Reports, vol. 5, Article ID 18374, 2015.
[50] A. Lancichinetti, S. Fortunato, and F. Radicchi, "Benchmark graphs for testing community detection algorithms," Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 78, no. 4, Article ID 046110, 2008.
[51] L. Ana and A. Jain, "Robust data clustering," in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-128–II-133, Madison, WI, USA, 2003.
Figure 2: Comparison of different community-detection algorithms on LFR benchmark networks containing 1000 nodes. (a) The results detected from small networks with small-sized communities. (b) The results identified from small networks with big-sized communities. (Each panel plots NMI, from 0.0 to 1.0, against values from 0.1 to 0.8 on the horizontal axis for FastQ, WalkTrap, LPA, Attractor, IsoFdp, and the proposed NSA.)
In our experiments, we varied the value of μ from 0.1 to 0.8 in increments of 0.1 for each group of LFR networks. To reduce the effect of chance, we generated 10 networks for each value of μ while keeping the other parameters fixed. Since the community structures are already embedded in these synthetic networks, we use NMI as the metric to evaluate the performance of our proposed method and the comparison algorithms. We ran our proposed method and the comparison algorithms on these networks one by one and used the average NMI as the resulting metric. The results detected by our proposal and the comparison algorithms on the small networks with small-sized and big-sized communities are illustrated in Figures 2(a) and 2(b), respectively; the results on the larger networks with small-sized and big-sized communities are presented in Figures 3(a) and 3(b), respectively.
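The evaluation protocol above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes the common normalization NMI = 2·I(A;B) / (H(A) + H(B)), and the function names `nmi` and `mean_nmi` are hypothetical.

```python
from collections import Counter
from math import log


def nmi(labels_true, labels_pred):
    """NMI of two labelings, assuming the normalization 2*I(A;B) / (H(A) + H(B))."""
    n = len(labels_true)
    if n != len(labels_pred):
        raise ValueError("both labelings must cover the same nodes")
    count_a = Counter(labels_true)
    count_b = Counter(labels_pred)
    count_ab = Counter(zip(labels_true, labels_pred))
    # Shannon entropies of the two partitions
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    if h_a + h_b == 0.0:
        return 1.0  # both labelings put every node into one community
    # Mutual information computed from the confusion counts
    mutual = sum(
        c / n * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    return 2.0 * mutual / (h_a + h_b)


def mean_nmi(runs):
    """Average NMI over (ground_truth, detected) pairs, e.g. the 10 networks per value of mu."""
    return sum(nmi(truth, found) for truth, found in runs) / len(runs)
```

Community labels only need to be hashable; a relabeled but otherwise identical partition still scores 1, which is why NMI suits comparisons against planted communities.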
In Figures 2(a) and 2(b), FastQ tends to introduce mistakes into the results regardless of whether the communities in the networks are well separated or obscure. As mentioned previously, FastQ is a typical modularity-optimization-based algorithm; it aims only at acquiring results with larger modularity rather than high accuracy. In our experiments, none of the results uncovered by it is satisfactory. Even in the networks with μ = 0.1, it failed to identify the exact communities; furthermore, its performance is the worst among the comparison algorithms for μ ≤ 0.5. For μ > 0.5, the quality of its results is better only than that of LPA. LPA performed as well as the other comparison algorithms on the networks with μ < 0.5, but its performance dropped dramatically for μ ≥ 0.5, and it could not detect effective communities at all for μ > 0.6. This might be due to its label-update mechanism: when the community boundaries become obscure, nodes tend to adopt incorrect labels when updating their own, which leads to trivial results in which even all nodes are labeled as members of one giant community. The proposed method NSA acquired NMI = 1 on all networks for μ < 0.5, meaning that the detected partitions match the ground-truth community structures in these networks perfectly. For μ = 0.5, NSA obtained results as good as those of WalkTrap, Attractor, and IsoFdp. For μ > 0.5, the quality of the detected community structures slipped for all those three algorithms and for the proposed method. For 0.5 < μ ≤ 0.6, the quality of our proposal is better than that of Attractor in networks with larger communities, and for μ ≥ 0.7, the performance of our proposed method is the best.
In Figures 3(a) and 3(b), we obtained results similar to those in Figure 2 overall, but they differ in some respects. In Figure 3(a), our proposed method performed the best on almost all networks. For 0.5 < μ < 0.7 in Figure 2, the NMI of the results extracted by our proposed method is lower than those of WalkTrap and IsoFdp; in Figure 3, however, the proposed method performed better than IsoFdp for μ > 0.5. These results suggest that the performance of the comparison algorithms is not stable across different networks, whereas our proposed method can steadily extract high-quality community structures from networks with different characteristics. This is also evident from the fact that all the curves of the proposed method in these figures decline more slowly than the others. Moreover, by comparing the proposal's own curves across these figures, we can conclude that our proposed method tends to perform better on larger networks with small communities; therefore, it overcomes the resolution limit problem to some extent.
4.4. Real-World Networks. We also carried out experiments on 13 real-world networks to further test the effectiveness and efficiency of our proposed method. As mentioned in Section 4.1, these networks fall into two categories: those with the ground-truth community structure known a priori and those without publicly acknowledged ground truth.

Figure 3: Comparison of different community-detection algorithms on LFR benchmark networks containing 5000 nodes. (a) The results extracted from the larger networks with small-sized communities. (b) The results revealed from the larger networks with big-sized communities.

Figure 4: The karate club network. (a) The ground-truth community structure. (b) The community structure detected by our proposed method NSA. (The nodes in different communities are plotted in different colors and shapes; this illustration style is also applied in the subsequent figures.)
Networks with Ground-Truth Community Structure. This category includes the first 4 networks listed in Table 2. Since their ground-truth community structure is already known, we measure the quality of the community structures identified by the proposed method and the comparison algorithms in terms of both NMI and modularity. The values of the two metrics obtained by the proposed method and the comparison algorithms are recorded in Table 3. These networks are relatively small, which makes it easy to visualize the detected results. Below, we analyze the results extracted by the proposed method from these networks individually.
The Karate Club Network. This network depicts the friendships among members of a karate club; it contains 34 nodes and 78 edges. It was compiled by Wayne W. Zachary, who observed the karate club for 3 years. During the period of Zachary's study, the club split into two factions because of a dispute between the administrator and the instructor. Corresponding to the two factions, the partition into two communities is usually taken as the ground truth, which is shown in Figure 4(a). The result detected by our proposed method is presented in Figure 4(b).
From Figure 4, we can see that our proposed method detected 3 rather than 2 communities in the network. The detected result seems to deviate from the ground truth in some ways, but it coincides with the conclusion drawn from the experiments on synthetic networks that our proposed method tends to find small communities in networks, which helps overcome the resolution limit problem. Moreover, from the perspective of the measure metrics, the modularity of the detected result is the largest among those of the comparison algorithms. Although our proposed method is not based on modularity optimization, it tends to acquire a community structure with as large a modularity as possible: when its modularity is not the largest, it is the second largest, with a small gap to the largest. These findings are also confirmed on the following networks.

Figure 5: The dolphin social network. (a) The ground-truth community structure. (b) The community structure identified by our proposed method NSA.

Table 3: The experimental results on networks with ground-truth community structures. The largest values of the two measure metrics are typed in bold.

Network     Metric  FastQ  WalkTrap  LPA    Attractor  IsoFdp  NSA
Karate      Q       0.381  0.353     0.355  0.371      0.371   0.402
            NMI     0.693  0.504     0.62   0.924      1.00    0.699
Dolphin     Q       0.492  0.489     0.464  0.45       0.505   0.513
            NMI     0.719  0.632     0.719  0.69       0.744   0.887
Risk map    Q       0.625  0.624     0.59   0.598      0.519   0.624
            NMI     0.894  0.848     0.821  0.839      0.714   0.848
Scientists  Q       0.749  0.733     0.64   0.694      0.668   0.744
            NMI     0.867  0.818     0.743  0.835      0.823   0.878
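For readers who want to reproduce modularity values such as the Q entries of Table 3, the following is a minimal sketch. It assumes the standard Newman-Girvan definition for undirected, unweighted graphs, Q = Σ_c [l_c/m − (d_c/2m)²], where l_c is the number of edges inside community c, d_c is the total degree of its nodes, and m is the number of edges. The function name and the toy graph are illustrative assumptions, not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    m = len(edges)
    if m == 0:
        raise ValueError("modularity is undefined for a graph without edges")
    inner = defaultdict(int)       # l_c: edges with both endpoints in community c
    degree_sum = defaultdict(int)  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            inner[cu] += 1
    return sum(inner[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)


# Toy graph (hypothetical): two triangles joined by the single edge (2, 3)
toy_edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
two_groups = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
one_group = {node: "a" for node in range(6)}
```

On the toy graph, splitting along the bridge yields Q = 5/14, while placing all nodes in one community yields Q = 0, which matches the intuition that modularity rewards partitions with dense interiors and few crossing edges.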
Lusseau's Dolphin Social Network. This network describes the interactions of a group of dolphins living in Doubtful Sound, New Zealand. It consists of 62 nodes and 159 edges, which represent individual dolphins and observed co-occurrences of pairs of dolphins, respectively. This network is generally partitioned into 4 groups as the ground-truth community structure, which is shown in Figure 5(a). Figure 5(b) shows the community structure uncovered by our proposed method.
In Figure 5, our proposed method detected communities in this network with a high degree of success: it also identified 4 communities, the vast majority of nodes are classified into the correct communities, and the result is close to the ground-truth community structure. Quantitatively, both the NMI and the modularity of the result detected by the proposed method on this network are the largest among those of the comparison algorithms, which means that the community structure identified by the proposed method is clearly better than those of the comparison algorithms.
Risk Map Network. This network is the world political map used in the popular game Risk (https://en.wikipedia.org/wiki/Risk_(game)), in which 42 countries or territories on 6 continents are involved. Therefore, 42 nodes and 83 edges connecting adjacent countries or territories are organized into 6 communities as the ground truth, which is illustrated in Figure 6(a). Feeding this network into the proposed method, we obtained the community structure shown in Figure 6(b).
Table 4: The experimental results of modularity on networks without ground-truth community structures. The largest value for each network is typed in bold.

Network     FastQ  WalkTrap  LPA    Attractor  IsoFdp  NSA
Lesmis      0.499  0.519     0.515  0.498      0.491   0.54
Polbooks    0.502  0.507     0.508  0.501      0.518   0.524
ColiNeta    0.779  0.746     0.693  0.718      -       0.761
Email       0.499  0.531     0.379  0.464      0.531   0.544
NetScience  0.955  0.956     0.896  0.937      -       0.957
YeastL      0.573  0.529     0.372  0.511      -       0.574
PGP         0.85   0.789     0.765  0.768      0.726   0.867
DBLP        0.735  -         0.652  0.637      -       0.782
Amazon      0.869  -         0.743  0.741      -       0.898

Figure 6: Risk map network. (a) The ground-truth community structure. (b) The community structure uncovered by our proposed method NSA.

Comparing the detected result with the ground-truth community structure, the community containing nodes '18' and '23' in the ground truth is split into two small communities in Figure 6(b), owing to the tendency of the proposed method to find small communities. In addition, nodes '26', '33', and '34' are assigned to communities that differ from the ground truth. However, nodes '12', '16', '26', '33', and '34' are special in this network: the outer edges associated with them are no fewer than, or even more than, those within the communities to which these nodes belong. Therefore, if we ignore what these nodes actually represent and consider the topology only, the community structure extracted by our proposed method is more rational than the ground truth: more edges associated with these three nodes lie within their communities than in the ground truth, so these three nodes are connected more tightly to nodes within the same community in Figure 6(b). Quantitatively, both measure metrics of our proposed method are second only to those of FastQ and equal to those of WalkTrap. These results also confirm that our proposed method provides an acceptable solution to the problem of community detection.
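The topological argument above compares, for a given node, the edges that stay inside its community with the edges that leave it. A minimal sketch of that check follows; the helper names and the toy path graph are hypothetical and are not the Risk map data.

```python
def inner_outer_edges(node, adjacency, community_of):
    """Return (inner, outer) edge counts of `node` under the partition `community_of`."""
    own = community_of[node]
    inner = sum(1 for nb in adjacency[node] if community_of[nb] == own)
    return inner, len(adjacency[node]) - inner


def weakly_attached(adjacency, community_of):
    """Nodes whose outer edges are no fewer than their inner edges, in sorted order."""
    result = []
    for node in sorted(adjacency):
        inner, outer = inner_outer_edges(node, adjacency, community_of)
        if outer >= inner:
            result.append(node)
    return result


# Toy example (hypothetical): the path 0-1-2-3 split in the middle
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
split = {0: "A", 1: "A", 2: "B", 3: "B"}
```

Running `weakly_attached` once with the ground-truth labels and once with the detected labels gives a quick way to see which assignment keeps more of a node's edges inside its community.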
Scientists Collaboration Network. This is the largest connected component of a network describing the coauthorship relations among scientists working at the Santa Fe Institute, New Mexico. Nodes in this network represent scientists, and an edge connects two scientists who have collaborated on at least one paper. There are 118 nodes and 197 edges in total in this network. The nodes can be divided into 6 groups as the ground-truth communities according to the specialties of the scientists, as presented in Figure 7(a). Taking this network as the input to the proposed method, we obtained the community structure illustrated in Figure 7(b).
The proposed method revealed 8 communities in this network; two additional communities are detected in Figure 7(b). These two communities are relatively independent components; in particular, the community containing node '1' has many more inner edges than outer edges. That is to say, nodes in these two communities are connected more tightly to one another than to the remainder of the network. Therefore, separating them from the rest of the network and treating them as independent communities is also reasonable. From the perspective of the measure metrics, the NMI obtained by the proposed method is the largest, which suggests that the result detected by our proposal is the closest to the ground-truth community structure; its modularity is not the largest, but it is second only to that of FastQ. These results also confirm that our proposed method can extract high-quality community structures from networks.
Networks without Ground-Truth Community Structure. This category contains the last 9 real-world networks listed in Table 2. For the experiments on this category of networks, we evaluate the quality of the extracted community structures using modularity only, owing to the absence of ground-truth community structures. The modularity values obtained by the proposed method and the comparison algorithms are recorded in Table 4. To illustrate them intuitively, we also plotted them in a bar chart, which is presented in Figure 8.

Figure 7: The collaboration network of scientists working at the Santa Fe Institute. (a) The ground-truth community structure. (b) The community structure detected by our proposed NSA algorithm.

Figure 8: The bar chart of the modularity (Q) obtained by the comparison algorithms and the proposed method NSA on the networks Lesmis, Polbooks, ColiNeta, Email, NetScience, YeastL, PGP, DBLP, and Amazon.
On these networks, our proposed method achieved the largest modularity on 8 of them. On the remaining network, ColiNeta, it obtained the second largest modularity. FastQ is based on the modularity optimization strategy, yet it acquired the largest modularity only on ColiNeta. WalkTrap is based on random walks, so its time complexity is relatively high; it could not produce effective results on Amazon and DBLP because of the large scale of these two networks. LPA and Attractor can extract community structures from all these networks, but the quality of the detected results is not satisfactory. IsoFdp can be applied only to connected networks and cannot run on ColiNeta, NetScience, and YeastL, as these three networks are disconnected. It cannot detect the community structure of Amazon and DBLP effectively either, because of their large scale. These comparison results show that our proposed method can steadily, effectively, and efficiently provide promising solutions to the problem of community detection in networks from a wide range of applications and that it outperforms the comparison algorithms significantly.
Figure 9: The setting of parameter δ. Each panel plots the modularity Q against δ ∈ [0, 0.3]. (a) The karate club network. (b) The dolphin social network. (c) The risk map network. (d) The scientists collaboration network.
5. Parameter Setting
In the second phase of the proposed method, we introduce a threshold δ for the community metric to identify the preliminary communities that need to be merged. As mentioned above, we calculate the community metric γ_i = α_i × β_i for every preliminary community C_i in the merge procedure; if the value of γ_i is below the threshold δ, the corresponding community C_i is identified as one that needs to be merged.
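The threshold test described above reduces to a simple filter once α_i and β_i are known. The sketch below is illustrative only: the definitions of community sparsity α_i and community scale β_i are given earlier in the paper and are not reproduced here, so the values in `example` are hypothetical placeholders, and `communities_to_merge` is a name chosen for this illustration.

```python
def communities_to_merge(metrics, delta=0.1):
    """Return ids of communities whose metric gamma_i = alpha_i * beta_i is below delta.

    metrics: dict mapping a community id to its (alpha_i, beta_i) pair.
    """
    selected = []
    for cid, (alpha, beta) in metrics.items():
        gamma = alpha * beta
        if gamma < delta:
            selected.append(cid)
    return selected


# Hypothetical (alpha_i, beta_i) values for three preliminary communities
example = {"C1": (0.9, 0.8), "C2": (0.3, 0.2), "C3": (0.5, 0.1)}
```

With the default δ = 0.1, only C2 and C3 fall below the threshold; lowering δ shrinks the set of communities flagged for merging, which is why the choice of δ affects the final partition.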
Therefore, δ works as a parameter of our proposed method, and its setting can influence the quality of the resulting community structure. Qualitatively, the larger or the sparser the network is, the smaller the threshold δ should be, in accordance with the definitions of community sparsity (α_i), community scale (β_i), and community metric (γ_i). To determine the optimal value of δ, we conducted a group of experiments to explore the relationship between the value of δ and the quality of the resulting community structure on the first four networks listed in Table 2, namely, the karate club network, the dolphin social network, the map of the game Risk, and the scientists collaboration network. The quality of the resulting community structure is measured in terms of modularity Q. We vary the value of δ from 0 to 1.0 in steps of 0.005; for each value of δ, we run our proposed method on these networks and observe how the modularity changes with δ.
The observed results are illustrated in Figure 9, in which we plotted only the portion δ ∈ [0, 0.3] because the largest modularities are obtained for δ ≤ 0.3 on all four networks. Our proposed method gets the largest modularity when δ = 0.13 on the dolphin social network and δ = 0.1 on the other three networks. Therefore, we adopted the corresponding values for those four networks and empirically set δ = 0.1 for the other networks in the experiments. In Figure 9, the largest modularity is obtained around δ = 0.1, and the interval [0.05, 0.2] covers the optimal value of δ. Therefore, we empirically suggest that δ be adjusted adaptively around 0.1 within the range [0.05, 0.2] according to the size and the sparsity of the networks involved in real-world applications.
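The parameter study amounts to a one-dimensional grid search over δ. A minimal sketch is given below; `quality_of` stands in for "run the method with this δ and return the modularity Q" and is a caller-supplied, hypothetical callback, since the method itself is not reproduced here.

```python
def delta_grid(start=0.0, stop=1.0, step=0.005):
    """Evenly spaced candidate values of delta, inclusive of both ends."""
    count = round((stop - start) / step)
    # Rounding removes floating-point drift such as 0.10000000000000002
    return [round(start + i * step, 10) for i in range(count + 1)]


def best_delta(quality_of, grid):
    """Return (delta, Q) with the largest Q; ties go to the smallest delta."""
    best = None
    for delta in grid:
        q = quality_of(delta)
        if best is None or q > best[1]:
            best = (delta, q)
    return best
```

Restricting the grid to the suggested interval, for example `delta_grid(0.05, 0.2)`, narrows the search to the range the experiments found to contain the optimum.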
6. Conclusion
In this paper, we presented a novel method to detect communities in networks. It is a local method based on node similarity and avoids the high time consumption of global methods. First, we construct the preliminary community structure by repeatedly selecting the node with the largest degree and, depending on the community assignment of its most similar neighbor, either taking it as the exemplar of a new community or inserting it into the community to which its most similar neighbor belongs; i.e., if its most similar neighbor has not been assigned to any community yet, we create a new community for it and its most similar neighbor; if its most similar neighbor has already been assigned to a community, we insert it into that community as well. At the end of this process, we obtain a series of preliminary communities. However, some of them might be too small or too sparse, leading to a low-quality result. Therefore, we merge some of the preliminary communities to acquire the final community structure. To do so, we also proposed some indexes that take both the size and the sparsity of communities into account to determine which communities should be merged.
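The first phase summarized above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the similarity function used here (the Jaccard index of closed neighborhoods) is a stand-in for the node similarity defined earlier in the paper, ties are broken by node id, and an isolated node is placed in a community of its own. All function names are hypothetical.

```python
def jaccard(adjacency, u, v):
    """Stand-in similarity (assumption): Jaccard index of the closed neighborhoods of u and v."""
    nu = adjacency[u] | {u}
    nv = adjacency[v] | {v}
    return len(nu & nv) / len(nu | nv)


def preliminary_communities(adjacency, similarity=jaccard):
    """Phase 1 sketch: assign nodes to communities in order of decreasing degree."""
    community_of = {}
    next_id = 0
    # Largest degree first; ties are broken by node id (assumption)
    order = sorted(adjacency, key=lambda n: (-len(adjacency[n]), n))
    for node in order:
        if node in community_of:
            continue  # already absorbed as some node's most similar neighbor
        if not adjacency[node]:
            community_of[node] = next_id  # isolated node: singleton community (assumption)
            next_id += 1
            continue
        # Most similar neighbor; the smallest id wins among equally similar neighbors
        best = max(sorted(adjacency[node]), key=lambda nb: similarity(adjacency, node, nb))
        if best in community_of:
            community_of[node] = community_of[best]
        else:
            community_of[node] = community_of[best] = next_id
            next_id += 1
    return community_of


# Toy graph (hypothetical): two triangles joined by the edge 2-3
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
```

On the toy graph the sketch recovers the two triangles as preliminary communities; the second phase, which merges communities that are too small or too sparse, is not covered here.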
To test the performance of the proposed method, we performed extensive experiments on four groups of synthetic networks and 13 real-world networks and compared the detected community structures with the results extracted by the comparison algorithms in terms of NMI and modularity. The comparison results demonstrate that our proposed method can extract high-quality community structures from networks abstracted from various applications and that nodes in the extracted communities are connected more tightly. The proposed method overcomes the resolution limit problem to some extent and outperforms the competitors.
Data Availability
We conducted experiments on some artificial networks and some real-world datasets. The artificial networks were synthesized using the LFR benchmark network generator, which is freely available at https://sites.google.com/site/santofortunato. The parameters used to synthesize the artificial networks are listed in Table 1. The real-world data supporting this study are from previously reported studies, which are cited in Table 2. Most of the real-world datasets can also be downloaded from http://www-personal.umich.edu/~mejn/netdata and https://snap.stanford.edu/data/index.html. The ColiNeta dataset was provided by Jeong et al. [18]. We constructed the Risk Map network manually according to the literature [16].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant ID 61602225).
References
[1] J. Kleinberg and S. Lawrence, “Network analysis: The structure of the web,” Science, vol. 294, no. 5548, pp. 1849-1850, 2001.
[2] P. Chen and S. Redner, “Community structure of the physical review citation network,” Journal of Informetrics, vol. 4, no. 3, pp. 278–290, 2010.
[3] M. E. J. Newman, “Modularity and community structure in networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 23, pp. 8577–8582, 2006.
[4] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabasi, “Hierarchical organization of modularity in metabolic networks,” Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[5] R. Guimera and L. A. N. Amaral, “Functional cartography of complex metabolic networks,” Nature, vol. 433, no. 7028, pp. 895–900, 2005.
[6] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.
[7] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 2, Article ID 026113, 2004.
[8] P. M. Gleiser and L. Danon, “Community structure in jazz,” Advances in Complex Systems (ACS), vol. 6, no. 4, pp. 565–573, 2003.
[9] Y. van Gennip, B. Hunter, R. Ahn et al., “Community detection using spectral clustering on sparse geosocial data,” SIAM Journal on Applied Mathematics, vol. 73, no. 1, pp. 67–83, 2013.
[10] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[11] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
[12] S. Fortunato and D. Hric, “Community detection in networks: a user guide,” Physics Reports, vol. 659, pp. 1–44, 2016.
[13] B. W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs,” Bell Labs Technical Journal, vol. 49, no. 1, pp. 291–307, 1970.
[14] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[15] D. Lusseau, “The emergent properties of a dolphin social network,” in Proceedings of the Royal Society of London B: Biological Sciences, vol. 270, supplement 2, pp. S186–S188, 2003.
[16] K. Steinhaeuser and N. V. Chawla, “Identifying and evaluating community structure in complex networks,” Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[17] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
[18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabasi, “The large-scale organization of metabolic networks,” Nature, vol. 407, no. 6804, pp. 651–654, 2000.
[19] R. Guimera, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, “Self-similar community structure in a network of human interactions,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[20] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: simple building blocks of complex networks,” Science, vol. 298, no. 5594, pp. 824–827, 2002.
[21] M. Boguna, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, “Models of social networks based on social distance attachment,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 5, Article ID 056122, 2004.
[22] J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.
[23] M. E. J. Newman, “Fast algorithm for detecting community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 6, Article ID 066133, 2004.
[24] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 6, Article ID 066111, 2004.
[25] F. Dabaghi Zarandi and M. Kuchaki Rafsanjani, “Community detection in complex networks using structural similarity,” Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 882–891, 2018.
[26] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, Article ID P10008, 2008.
[27] L. Waltman and N. J. Van Eck, “A smart local moving algorithm for large-scale modularity-based community detection,” The European Physical Journal B, vol. 86, no. 11, article 471, pp. 1–14, 2013.
[28] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 76, no. 3, Article ID 036106, 2007.
[29] M. J. Barber and J. W. Clark, “Detecting network communities by propagating labels under constraints,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 2, Article ID 026129, 2009.
[30] J. Hou Chin and K. Ratnavelu, “A semi-synchronous label propagation algorithm with constraints for community detection in complex networks,” Scientific Reports, vol. 7, Article ID 45836, 2017.
[31] J. Ding, X. He, J. Yuan, Y. Chen, and B. Jiang, “Community detection by propagating the label of center,” Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 675–686, 2018.
[32] A. Laio and A. Rodriguez, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[33] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger, “SCAN: A structural clustering algorithm for networks,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’07), pp. 824–833, ACM, New York, NY, USA, August 2007.
[34] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD’96), pp. 226–231, AAAI Press, 1996.
[35] H. Shiokawa, Y. Fujiwara, and M. Onizuka, “Scan++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs,” in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1178–1189, Republic of Korea, September 2006.
[36] T. You, H.-M. Cheng, Y.-Z. Ning, B.-C. Shia, and Z.-Y. Zhang, “Community detection in complex networks using density-based clustering algorithm and manifold learning,” Physica A: Statistical Mechanics and its Applications, vol. 464, pp. 221–230, 2016.
[37] X. Wang, G. Liu, J. Li, and J. P. Nees, “Locating structural centers: A density-based clustering method for community detection,” PLoS ONE, vol. 12, no. 1, Article ID e0169355, 2017.
[38] P. Pons and M. Latapy, “Computing communities in large networks using random walks,” in International Symposium on Computer and Information Sciences, pp. 284–293, 2005.
[39] S. A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, and M. A. Tavallaie, “Personalized PageRank clustering: a graph clustering algorithm based on random walks,” Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772–5785, 2013.
[40] Y. Su, B. Wang, and X. Zhang, “A seed-expanding method based on random walks for community detection in networks with ambiguous community structures,” Scientific Reports, vol. 7, Article ID 41830, 2017.
[41] J. Shao, Z. Han, Q. Yang, and T. Zhou, “Community detection based on distance dynamics,” in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1075–1084, ACM, Australia, August 2015.
[42] H.-L. Sun, E. Ch’ng, X. Yong, J. M. Garibaldi, S. See, and D.-B. Chen, “A fast community detection method in bipartite networks by distance dynamics,” Physica A: Statistical Mechanics and its Applications, vol. 496, pp. 108–120, 2018.
[43] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, “Pseudo-likelihood methods for community detection in large sparse networks,” The Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[44] S. C. de Lange, M. A. de Reus, and M. P. van den Heuvel, “The laplacian spectrum of neural networks,” Frontiers in Computational Neuroscience, vol. 7, no. 189, 2014.
[45] F. Krzakala, C. Moore, E. Mossel et al., “Spectral redemption in clustering sparse networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 52, pp. 20935–20940, 2013.
[46] P. Shi, K. He, D. Bindel, and J. E. Hopcroft, “Local Lanczos Spectral Approximation for Community Detection,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 10534 of Lecture Notes in Computer Science, pp. 651–667, Springer International Publishing, 2017.
[47] R. Tackx, F. Tarissan, and J. Guillaume, “ComSim: a bipartite community detection algorithm using cycle and node’s similarity,” in International Workshop on Complex Networks and their Applications, vol. 689 of Studies in Computational Intelligence, pp. 278–289, Springer International Publishing, 2017.
[48] T. Wang, L. Yin, and X. Wang, “A community detection method based on local similarity and degree clustering information,” Physica A: Statistical Mechanics and its Applications, vol. 490, pp. 1344–1354, 2018.
[49] K. R. Zalik, “Maximal neighbor similarity reveals real communities in networks,” Scientific Reports, vol. 5, Article ID 18374, 2015.
[50] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 78, no. 4, Article ID 046110, 2008.
[51] L. Ana and A. Jain, “Robust data clustering,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-128–II-133, Madison, WI, USA, 2003.
[Figure 3 plot: NMI (vertical axis, 0.0 to 1.0) against a horizontal axis with ticks from 0.1 to 0.8; curves for FastQ, Walktrap, LPA, Attractor, IsoFdp, and the proposal (NSA); panels (a) and (b).]
Figure 3: Comparison of different community detection algorithms on LFR benchmark networks containing 5000 nodes. (a) The results extracted from the larger networks with small-sized communities. (b) The results revealed from the larger networks with big-sized communities.
[Figure 4 image: two drawings of the karate club network with nodes labeled 1 to 34; panels (a) and (b).]
Figure 4: The karate club network. (a) The ground-truth community structure. (b) The community structure detected by our proposed method NSA. (The nodes in different communities are plotted in different colors and shapes; this illustration style is also applied in the subsequent figures.)
the ground-truth community structure known a priori, and the other ones without publicly acknowledged ground truth.
Networks with Ground-Truth Community Structure. This category includes the first 4 networks listed in Table 2. Since their ground-truth community structure is already known, we measure the quality of the community structures identified by the proposed method and the comparison algorithms in terms of both NMI and modularity. The values of the two metrics obtained by the proposed method and the comparison algorithms are recorded in Table 3. These networks are relatively small, which makes it easy to visualize the detected results. Below, we analyze the results extracted by the proposed method from each of these networks individually.
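As a minimal sketch of how NMI can be computed for two partitions of the same node set, the following Python function may be helpful. It assumes the common normalization in which twice the mutual information is divided by the sum of the two entropies; the function name `nmi` and the list-of-labels input format are illustrative choices, not part of the proposed method.

```python
from collections import Counter
from math import log


def nmi(labels_a, labels_b):
    """Normalized mutual information between two partitions.

    labels_a[k] and labels_b[k] are the community labels of the k-th node
    in the two partitions (assumed input format, for illustration only).
    """
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("partitions must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    count_ab = Counter(zip(labels_a, labels_b))

    # Mutual information from the joint and marginal label frequencies.
    mutual = sum(
        (c / n) * log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in count_ab.items()
    )
    entropy_a = -sum((c / n) * log(c / n) for c in count_a.values())
    entropy_b = -sum((c / n) * log(c / n) for c in count_b.values())

    if entropy_a + entropy_b == 0:
        # Both partitions place every node in a single community.
        return 1.0
    return 2 * mutual / (entropy_a + entropy_b)
```

Two partitions that differ only in the names of their labels give a value of 1, while two statistically independent partitions give 0.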
The Karate Club Network. This network depicts the friendships among members of a karate club; it contains 34 nodes and 78 edges. It was compiled by Wayne W. Zachary, who observed the karate club for 3 years. During the period of Zachary's study, the club split into two factions because of a dispute that arose between the administrator and the instructor. Corresponding to the two factions, the partition into two communities is conventionally taken as the ground truth, which is shown in Figure 4(a). The result detected by our proposed method is presented in Figure 4(b).
From Figure 4, we can see that our proposed method detected 3 rather than 2 communities from the network. The detected result appears to deviate from the ground truth in some ways, but it coincides with the conclusion
[Figure 5 image: two drawings of the dolphin social network with nodes labeled by dolphin names; panels (a) and (b).]
Figure 5: The dolphin social network. (a) The ground-truth community structure. (b) The community structure identified by our proposed method NSA.
Table 3: The experimental results on networks with ground-truth community structures. The largest value of each metric on each network is marked with an asterisk.
Network     Metric   FastQ    WalkTrap   LPA     Attractor   IsoFdp   NSA
Karate      Q        0.381    0.353      0.355   0.371       0.371    0.402*
            NMI      0.693    0.504      0.62    0.924       1.00*    0.699
Dolphin     Q        0.492    0.489      0.464   0.45        0.505    0.513*
            NMI      0.719    0.632      0.719   0.69        0.744    0.887*
Risk map    Q        0.625*   0.624      0.59    0.598       0.519    0.624
            NMI      0.894*   0.848      0.821   0.839       0.714    0.848
Scientists  Q        0.749*   0.733      0.64    0.694       0.668    0.744
            NMI      0.867    0.818      0.743   0.835       0.823    0.878*
found in the experiments on synthetic networks that our proposed method tends to find small communities in networks, which overcomes the problem of resolution limit. Moreover, from the perspective of the evaluation metrics, the modularity of the detected result is the largest among those of the comparison algorithms. Although our proposed method is not based on modularity optimization, it tends to acquire a community structure with as large a modularity as possible; when its modularity is not the largest, it is the second largest, with a small gap to the largest. These findings are also borne out on the following networks.
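For readers who wish to reproduce the modularity comparison, a minimal sketch of the standard Newman-Girvan modularity for an undirected, unweighted network is given below. It uses the form $Q = \sum_c \left[ l_c / m - (d_c / 2m)^2 \right]$, where $l_c$ is the number of edges inside community $c$, $d_c$ is the total degree of its nodes, and $m$ is the number of edges. The function name and the edge-list and dictionary inputs are assumptions made for illustration.

```python
def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition.

    edges: list of (u, v) pairs of an undirected, unweighted graph.
    community_of: dict mapping every node to its community label.
    (Both input formats are illustrative assumptions.)
    """
    m = len(edges)
    if m == 0:
        raise ValueError("graph has no edges")

    inner = {}   # community -> number of edges with both ends inside it
    degree = {}  # community -> sum of the degrees of its nodes
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            inner[cu] = inner.get(cu, 0) + 1

    return sum(
        inner.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree.items()
    )
```

For two triangles joined by a single edge and split into the two triangles, this gives $Q = 5/14$; placing all nodes in one community gives $Q = 0$.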
Lusseau’s Dolphin Social Network. This network describes the interactions of a group of dolphins living in Doubtful Sound, New Zealand. It consists of 62 nodes and 159 edges, which represent individual dolphins and the observed co-occurrences of pairs of dolphins, respectively. This network is generally partitioned into 4 groups as the ground-truth community structure, which is exhibited in Figure 5(a). Figure 5(b) shows the community structure uncovered by our proposed method.
In Figure 5, our proposed method detected communities from this network with a high degree of success: it also identified 4 communities, the vast majority of nodes are classified into the correct communities, and the result closely approaches the ground-truth community structure. Quantitatively, both the NMI and the modularity of the result detected by the proposed method on this network are the largest among those of the comparison algorithms, which means that the community structure identified by the proposed method is better than those of the comparison algorithms.
Risk Map Network. This network is the world political map used in the popular game Risk (https://en.wikipedia.org/wiki/Risk_(game)), in which 42 countries or territories on 6 continents are involved. Accordingly, 42 nodes and 83 edges connecting adjacent countries or territories are organized into 6 communities as the ground truth, which is illustrated in Figure 6(a). Feeding this network into the proposed method, we obtained the community structure shown in Figure 6(b).
Comparing the detected result with the ground-truth community structure, the community containing nodes ‘18’ and ‘23’ in the ground truth is split into two small communities in Figure 6(b), owing to the tendency of the proposed method to find small communities. Besides this, nodes ‘26’, ‘33’, and ‘34’ are classified into communities different from their ground-truth ones in the detected result. However, nodes ‘12’, ‘16’, ‘26’, ‘33’, and ‘34’ are special ones in this network: the outer edges associated with them are no fewer than,
Table 4: The experimental results of modularity on networks without ground-truth community structures. The largest value on each network is marked with an asterisk; a dash means that no result was obtained.
Network      FastQ    WalkTrap   LPA     Attractor   IsoFdp   NSA
Lesmis       0.499    0.519      0.515   0.498       0.491    0.54*
Polbooks     0.502    0.507      0.508   0.501       0.518    0.524*
ColiNeta     0.779*   0.746      0.693   0.718       -        0.761
Email        0.499    0.531      0.379   0.464       0.531    0.544*
NetScience   0.955    0.956      0.896   0.937       -        0.957*
YeastL       0.573    0.529      0.372   0.511       -        0.574*
PGP          0.85     0.789      0.765   0.768       0.726    0.867*
DBLP         0.735    -          0.652   0.637       -        0.782*
Amazon       0.869    -          0.743   0.741       -        0.898*
[Figure 6 image: two drawings of the Risk map network with nodes labeled 1 to 42; panels (a) and (b).]
Figure 6: Risk map network. (a) The ground-truth community structure. (b) The community structure uncovered by our proposed method NSA.
or even more than, those within the communities to which these nodes belong. Therefore, if we ignore the actual meaning of these nodes and consider the topology only, the community structure extracted by our proposed method is more rational than the ground truth: more edges associated with these three nodes are located within the community than in the ground truth, so these three nodes are connected more tightly to nodes within the same community in Figure 6(b). Quantitatively, the values of both evaluation metrics of our proposed method are second only to those of FastQ and are the same as those of WalkTrap. These results also confirm that our proposed method provides an acceptable solution to the problem of community detection.
Scientists Collaboration Network. This is the largest connected component of a network delineating the coauthorship relationships among scientists working at the Santa Fe Institute, New Mexico. Nodes in this network represent scientists; an edge connects two scientists who have collaborated on at least one paper. There are 118 nodes and 197 edges in total in this network. The nodes can be divided into 6 groups as the ground-truth communities according to the specialties of the scientists, as presented in Figure 7(a). Taking this network as the input to the proposed method, we obtained the community structure illustrated in Figure 7(b).
The proposed method revealed 8 communities from this network; two additional communities are detected in Figure 7(b). These two communities are relatively independent components; especially for the community containing node ‘1’, there are many more inner edges than outer edges. That is to say, nodes in these two communities are connected more tightly to one another than to the remainder of the network. Therefore, isolating them from the network and taking them as independent communities is also reasonable. From the perspective of the evaluation metrics, the NMI obtained by the proposed method is the largest, which suggests that the result detected by our proposal is the one closest to the ground-truth community structure; the modularity of the proposed method is not the largest, but it is second only to that of FastQ. These results also testify that our proposed method can extract high-quality community structures from networks.
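The comparison of inner and outer edges used in this argument can be checked with a few lines of code. The sketch below counts, for a given set of nodes, the edges with both endpoints inside the set (inner) and the edges with exactly one endpoint inside it (outer); the function name and input format are illustrative assumptions.

```python
def inner_outer_edges(edges, community):
    """Count inner and outer edges of a community.

    edges: list of (u, v) pairs of an undirected graph.
    community: iterable of the nodes that form the community.
    Returns (inner, outer).
    """
    members = set(community)
    inner = outer = 0
    for u, v in edges:
        inside = (u in members) + (v in members)
        if inside == 2:
            inner += 1   # both endpoints lie in the community
        elif inside == 1:
            outer += 1   # the edge crosses the community boundary
    return inner, outer
```

A community whose inner count clearly exceeds its outer count is tightly connected internally in the sense used above.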
Networks without Ground-Truth Community Structure. This category contains the last 9 real-world networks listed in Table 2. For the experiments carried out on this category of networks, we evaluate the quality of the extracted community structures using modularity only, owing to the absence of ground-truth community structures. For the proposed method and the comparison algorithms, the obtained values of modularity are recorded in Table 4. To illustrate them
[Figure 7 image: two drawings of the scientists collaboration network with nodes labeled 1 to 118; panels (a) and (b).]
Figure 7: The collaboration network of scientists working at the Santa Fe Institute. (a) The ground-truth community structure. (b) The community structure detected by our proposed NSA algorithm.
[Figure 8 bar chart: modularity (Q), from 0.0 to 1.0, for the networks Lesmis, Polbooks, ColiNeta, Email, NetScience, YeastL, PGP, DBLP, and Amazon; bars for FastQ, Walktrap, LPA, Attractor, IsoFdp, and the proposal (NSA).]
Figure 8: The bar chart of the modularity obtained by comparison algorithms and the proposed method NSA.
intuitively, we also plotted them in a bar chart, which is presented in Figure 8.
On these networks, our proposed method achieved the largest modularity on 8 of them. On the only remaining network, ColiNeta, it still obtained the second largest value of modularity. FastQ is based on the modularity optimization strategy, yet it acquired the largest value of modularity on network ColiNeta only. WalkTrap is an approach based on random walks, so its time complexity is relatively high; it cannot obtain effective results from networks Amazon and DBLP because of the large scale of these two networks. LPA and Attractor can extract community structures from all those networks, but the quality of the detected results is not satisfactory. IsoFdp can only be applied to connected networks and cannot run on networks ColiNeta, NetScience, and YeastL, as these three networks are disconnected. It cannot detect the community structure from networks Amazon and DBLP effectively either, because of their large scale. These comparison results show that our proposed method can steadily, effectively, and efficiently provide promising solutions to the problem of community detection in networks from a wide range of applications, and that it outperforms the comparison algorithms significantly.
[Figure 9 plots: each panel shows the modularity Q (vertical axis) against $\delta$ from 0.00 to 0.30 (horizontal axis). (a) The karate club network. (b) The dolphin social network. (c) The risk map network. (d) The scientists collaboration network.]
Figure 9: The setting of parameter $\delta$.
5 Parameter Setting
In the second phase of the proposed method, we introduce a threshold $\delta$ for the community metric to identify the preliminary communities that need to be merged. As mentioned above, we calculate the community metric $\gamma_i = \alpha_i \times \beta_i$ for every preliminary community $C_i$ in the merge procedure; if the value of $\gamma_i$ is below the threshold $\delta$, the corresponding community $C_i$ is identified as one that needs to be merged.
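The thresholding step can be sketched as follows. The sketch assumes that the community sparsity $\alpha_i$ and community scale $\beta_i$ have already been computed according to their definitions earlier in the paper and are passed in as dictionaries; the function name `communities_to_merge` and the default value of `delta` are illustrative assumptions.

```python
def communities_to_merge(sparsity, scale, delta=0.1):
    """Return the communities whose metric gamma_i = alpha_i * beta_i is below delta.

    sparsity: dict mapping community id -> alpha_i (precomputed, assumed input).
    scale:    dict mapping community id -> beta_i  (precomputed, assumed input).
    delta:    merge threshold.
    """
    flagged = []
    for community in sparsity:
        gamma = sparsity[community] * scale[community]
        if gamma < delta:
            # This preliminary community is too small or too sparse
            # and is therefore a candidate for merging.
            flagged.append(community)
    return flagged
```

How a flagged community is then merged with another one is determined by the merge procedure of the second phase and is not covered by this sketch.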
Therefore, $\delta$ works as a parameter of our proposed method, whose setting can influence the quality of the resulting community structure. Qualitatively, the larger or the sparser the network is, the smaller the threshold $\delta$ should be, in accordance with the definitions of community sparsity ($\alpha_i$), community scale ($\beta_i$), and community metric ($\gamma_i$). To determine the optimal value of $\delta$, we conduct a group of experiments to explore the relationship between the value of $\delta$ and the quality of the resulting community structure on the first four networks listed in Table 2, namely, the karate club network, the dolphin social network, the map of the game Risk, and the scientists collaboration network. The quality of the resulting community structure is measured in terms of modularity $Q$. We vary the value of $\delta$ from 0 to 1.0 in increments of 0.005; for each value of $\delta$, we run our proposed method on these networks and observe how the modularity changes as $\delta$ varies.
The observed results are illustrated in Figure 9, in which we plotted only the portion $\delta \in [0, 0.3]$, because the largest modularities are obtained for $\delta \leq 0.3$ on all four networks. Our proposed method achieves the largest modularity when $\delta = 0.13$ on the dolphin social network and $\delta = 0.1$ on the other three networks. Therefore, we adopt the corresponding value for those four networks and empirically set $\delta = 0.1$ for the other networks in the experiments. In Figure 9, the largest modularity is obtained around $\delta = 0.1$, and the interval $[0.05, 0.2]$ covers the optimal value of $\delta$. Therefore, we empirically suggest that $\delta$ be adjusted adaptively around 0.1 within the range $[0.05, 0.2]$, according to the size and the sparsity of the networks involved in real-world applications.
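The parameter study described above amounts to a one-dimensional grid search. A minimal sketch is given below; `detect` and `score` are hypothetical callables standing in for a run of the community detection method with a given $\delta$ and for the modularity computation, respectively, and are not names from the paper.

```python
def sweep_delta(detect, score, start=0.0, stop=1.0, step=0.005):
    """Grid search over the threshold delta.

    detect: callable delta -> partition (hypothetical stand-in for one run
            of the detection method with the given threshold).
    score:  callable partition -> float (e.g., modularity Q).
    Returns (best_delta, best_score); ties keep the smallest delta.
    """
    steps = round((stop - start) / step)
    best_delta, best_score = None, float("-inf")
    for k in range(steps + 1):
        # Compute delta from the step index to avoid accumulating
        # floating-point error over many additions.
        delta = start + k * step
        q = score(detect(delta))
        if q > best_score:
            best_delta, best_score = delta, q
    return best_delta, best_score
```

With the default arguments the sweep evaluates 201 values of $\delta$, from 0 to 1.0 in increments of 0.005, matching the range used in the experiments.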
6 Conclusion
In this paper, we presented a novel method to detect communities in networks. It is a local method based on node similarity and overcomes the high time consumption of global methods. First, we construct the preliminary community structure by repeatedly selecting the node with the largest degree and, on the basis of its most similar neighbor’s community assignment, either taking it as the exemplar of a new community or inserting it into the community to which its most similar neighbor belongs; i.e., if its most similar neighbor has not been assigned to any community yet, we create a new community for it and its most similar neighbor; if its most similar neighbor has been assigned to a certain community, we insert it into that community as well. At the end of this process, we obtain a series of preliminary communities. However, some of them might be too small or too sparse, leading to a low-quality result. Therefore, we merge some of the preliminary communities to acquire the final community structure. To do so, we also proposed some indexes, which take both the size and the sparsity of communities into account, to determine which communities should be merged.
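The first phase summarized above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the reference implementation: the similarity measure defaults to the Jaccard index of closed neighborhoods, ties are broken by node label, and isolated nodes form singleton communities; all three choices, as well as the function name, are assumptions of this sketch.

```python
def preliminary_communities(adj, similarity=None):
    """Phase-one sketch: build preliminary communities.

    adj: dict mapping each node to the set of its neighbors
         (undirected graph without self-loops; node labels must be sortable).
    similarity: optional callable (u, v) -> float; defaults to the Jaccard
                index of closed neighborhoods (an assumption of this sketch).
    Returns a dict mapping each node to a community id.
    """
    def jaccard(u, v):
        nu, nv = adj[u] | {u}, adj[v] | {v}
        return len(nu & nv) / len(nu | nv)

    sim = similarity or jaccard
    community_of = {}
    next_id = 0

    # Visit nodes by decreasing degree; ties are broken by node label.
    for u in sorted(adj, key=lambda n: (-len(adj[n]), n)):
        if u in community_of:
            continue  # already placed together with an earlier node
        if not adj[u]:
            # Isolated node: singleton community (assumption).
            community_of[u] = next_id
            next_id += 1
            continue
        # Most similar neighbor; among equals the smallest label wins.
        best = max(sorted(adj[u]), key=lambda v: sim(u, v))
        if best in community_of:
            community_of[u] = community_of[best]
        else:
            community_of[u] = community_of[best] = next_id
            next_id += 1
    return community_of
```

The resulting preliminary communities would then be passed to the merge step of the second phase, which this sketch does not cover.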
To test the performance of the proposed method, we performed extensive experiments on four groups of synthetic networks and 13 real-world networks and compared the detected community structures with the results extracted by the comparison algorithms in terms of NMI and modularity. The comparison results demonstrate that our proposed method can extract high-quality community structures from networks abstracted from various applications and that nodes in the extracted communities are connected more tightly. The proposed method overcomes the problem of resolution limit to some extent and outperforms the competitors.
Data Availability
We have conducted experiments on some artificial networks and some real-world datasets. The artificial networks are synthesized using the LFR benchmark network generator, which is freely available at https://sites.google.com/site/santofortunato. The parameters used to synthesize the artificial networks are listed in Table 1. The real-world data supporting this study are from previously reported studies, which are cited in Table 2. Most of the real-world datasets can also be downloaded from http://www-personal.umich.edu/~mejn/netdata and https://snap.stanford.edu/data/index.html. The ColiNeta dataset was provided by Jeong et al. [18]. We constructed the Risk Map network manually according to the literature [16].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant ID 61602225).
References
[1] J. Kleinberg and S. Lawrence, “Network analysis: The structure of the web,” Science, vol. 294, no. 5548, pp. 1849–1850, 2001.
[2] P. Chen and S. Redner, “Community structure of the physical review citation network,” Journal of Informetrics, vol. 4, no. 3, pp. 278–290, 2010.
[3] M. E. J. Newman, “Modularity and community structure in networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 23, pp. 8577–8582, 2006.
[4] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabási, “Hierarchical organization of modularity in metabolic networks,” Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[5] R. Guimerà and L. A. N. Amaral, “Functional cartography of complex metabolic networks,” Nature, vol. 433, no. 7028, pp. 895–900, 2005.
[6] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.
[7] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 2, Article ID 026113, 2004.
[8] P. M. Gleiser and L. Danon, “Community structure in jazz,” Advances in Complex Systems (ACS), vol. 6, no. 4, pp. 565–573, 2003.
[9] Y. van Gennip, B. Hunter, R. Ahn et al., “Community detection using spectral clustering on sparse geosocial data,” SIAM Journal on Applied Mathematics, vol. 73, no. 1, pp. 67–83, 2013.
[10] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[11] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
[12] S. Fortunato and D. Hric, “Community detection in networks: a user guide,” Physics Reports, vol. 659, pp. 1–44, 2016.
[13] B. W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs,” Bell Labs Technical Journal, vol. 49, no. 1, pp. 291–307, 1970.
[14] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[15] D. Lusseau, “The emergent properties of a dolphin social network,” in Proceedings of the Royal Society of London B: Biological Sciences, vol. 270, supplement 2, pp. S186–S188, 2003.
[16] K. Steinhaeuser and N. V. Chawla, “Identifying and evaluating community structure in complex networks,” Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[17] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
[18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási, “The large-scale organization of metabolic networks,” Nature, vol. 407, no. 6804, pp. 651–654, 2000.
[19] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, “Self-similar community structure in a network of human interactions,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[20] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: simple building blocks of complex networks,” Science, vol. 298, no. 5594, pp. 824–827, 2002.
[21] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, “Models of social networks based on social distance attachment,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 5, Article ID 056122, 2004.
[22] J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.
[23] M. E. J. Newman, “Fast algorithm for detecting community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 6, Article ID 066133, 2004.
[24] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 6, Article ID 066111, 2004.
[25] F. Dabaghi Zarandi and M. Kuchaki Rafsanjani, “Community detection in complex networks using structural similarity,” Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 882–891, 2018.
[26] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, Article ID P10008, 2008.
[27] L. Waltman and N. J. Van Eck, “A smart local moving algorithm for large-scale modularity-based community detection,” The European Physical Journal B, vol. 86, no. 11, article 471, pp. 1–14, 2013.
[28] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 76, no. 3, Article ID 036106, 2007.
[29] M. J. Barber and J. W. Clark, “Detecting network communities by propagating labels under constraints,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 2, Article ID 026129, 2009.
[30] J. Hou Chin and K. Ratnavelu, “A semi-synchronous label propagation algorithm with constraints for community detection in complex networks,” Scientific Reports, vol. 7, Article ID 45836, 2017.
[31] J. Ding, X. He, J. Yuan, Y. Chen, and B. Jiang, “Community detection by propagating the label of center,” Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 675–686, 2018.
[32] A. Laio and A. Rodriguez, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[33] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger, “SCAN: A structural clustering algorithm for networks,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’07), pp. 824–833, ACM, New York, NY, USA, August 2007.
[34] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD’96), pp. 226–231, AAAI Press, 1996.
[35] H. Shiokawa, Y. Fujiwara, and M. Onizuka, “Scan++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs,” in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1178–1189, Republic of Korea, September 2006.
[36] T. You, H.-M. Cheng, Y.-Z. Ning, B.-C. Shia, and Z.-Y. Zhang, “Community detection in complex networks using density-based clustering algorithm and manifold learning,” Physica A: Statistical Mechanics and its Applications, vol. 464, pp. 221–230, 2016.
[37] X. Wang, G. Liu, J. Li, and J. P. Nees, “Locating structural centers: A density-based clustering method for community detection,” PLoS ONE, vol. 12, no. 1, Article ID e0169355, 2017.
[38] P. Pons and M. Latapy, “Computing communities in large networks using random walks,” in International symposium on computer and information sciences, pp. 284–293, 2005.
[39] S. A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, and M. A. Tavallaie, “Personalized PageRank clustering: a graph clustering algorithm based on random walks,” Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772–5785, 2013.
[40] Y. Su, B. Wang, and X. Zhang, “A seed-expanding method based on random walks for community detection in networks with ambiguous community structures,” Scientific Reports, vol. 7, Article ID 41830, 2017.
[41] J. Shao, Z. Han, Q. Yang, and T. Zhou, “Community detection based on distance dynamics,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1075–1084, ACM, Australia, August 2015.
[42] H.-L. Sun, E. Ch’ng, X. Yong, J. M. Garibaldi, S. See, and D.-B. Chen, “A fast community detection method in bipartite networks by distance dynamics,” Physica A: Statistical Mechanics and its Applications, vol. 496, pp. 108–120, 2018.
[43] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, “Pseudo-likelihood methods for community detection in large sparse networks,” The Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[44] S. C. de Lange, M. A. de Reus, and M. P. van den Heuvel, “The laplacian spectrum of neural networks,” Frontiers in Computational Neuroscience, vol. 7, no. 189, 2014.
[45] F. Krzakala, C. Moore, E. Mossel et al., “Spectral redemption in clustering sparse networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 52, pp. 20935–20940, 2013.
[46] P. Shi, K. He, D. Bindel, and J. E. Hopcroft, “Local Lanczos Spectral Approximation for Community Detection,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 10534 of Lecture Notes in Computer Science, pp. 651–667, Springer International Publishing, 2017.
[47] R. Tackx, F. Tarissan, and J. Guillaume, “ComSim: a bipartite community detection algorithm using cycle and node’s similarity,” in International Workshop on Complex Networks and their Applications, vol. 689 of Studies in Computational Intelligence, pp. 278–289, Springer International Publishing, 2017.
[48] T. Wang, L. Yin, and X. Wang, “A community detection method based on local similarity and degree clustering information,” Physica A: Statistical Mechanics and its Applications, vol. 490, pp. 1344–1354, 2018.
[49] K. R. Žalik, “Maximal neighbor similarity reveals real communities in networks,” Scientific Reports, vol. 5, Article ID 18374, 2015.
[50] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 78, no. 4, Article ID 046110, 2008.
[51] L. Ana and A. Jain, “Robust data clustering,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-128–II-133, Madison, WI, USA, 2003.
Complexity 11
Figure 5: The dolphin social network. (a) The ground-truth community structure. (b) The community structure identified by our proposed method, NSA.
Table 3: The experimental results on networks with ground-truth community structures. The largest value of each metric on each network is marked with an asterisk.
Network     Metric  FastQ   WalkTrap  LPA    Attractor  IsoFdp  NSA
Karate      Q       0.381   0.353     0.355  0.371      0.371   0.402*
            NMI     0.693   0.504     0.62   0.924      1.00*   0.699
Dolphin     Q       0.492   0.489     0.464  0.45       0.505   0.513*
            NMI     0.719   0.632     0.719  0.69       0.744   0.887*
Risk map    Q       0.625*  0.624     0.59   0.598      0.519   0.624
            NMI     0.894*  0.848     0.821  0.839      0.714   0.848
Scientists  Q       0.749*  0.733     0.64   0.694      0.668   0.744
            NMI     0.867   0.818     0.743  0.835      0.823   0.878*
found in the experiments on synthetic networks that our proposed method tends to find small communities in networks, which helps it overcome the resolution limit problem. Moreover, in terms of the evaluation metrics, the modularity of the detected result is the largest among those of the comparison algorithms. Although our proposed method is not based on modularity optimization, it tends to produce community structures with modularity as large as possible: when its modularity is not the largest, it is the second largest, with a small gap to the largest. These findings are also borne out on the following networks.
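For reference, the modularity used as a quality metric can be computed as in the following minimal sketch. It assumes the standard Newman-Girvan definition for an undirected, unweighted graph, Q = sum over communities c of [l_c/m - (d_c/(2m))^2], where m is the number of edges, l_c the number of edges inside c, and d_c the total degree of the nodes in c. The function name and the edge-list representation are illustrative choices, not taken from the paper.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity of a partition (undirected, unweighted graph).

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    inner = defaultdict(int)   # l_c: edges with both endpoints in community c
    degree = defaultdict(int)  # d_c: sum of the degrees of the nodes in c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            inner[cu] += 1
    return sum(inner[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

Placing all nodes in a single community always yields Q = 0 under this definition, which is a convenient sanity check.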
Lusseau's Dolphin Social Network. This network describes the interactions of a group of dolphins living in Doubtful Sound, New Zealand. It consists of 62 nodes and 159 edges, which represent individual dolphins and the observed co-occurrences of pairs of dolphins, respectively. This network is generally partitioned into 4 groups as the ground-truth community structure, which is exhibited in Figure 5(a). Figure 5(b) shows the community structure uncovered by our proposed method.
As shown in Figure 5, our proposed method detected communities in this network with a high degree of success: it also identified 4 communities, the vast majority of nodes are classified into the correct communities, and the result closely approaches the ground-truth community structure. Quantitatively, both the NMI and the modularity of the result detected by the proposed method on this network are the largest among those of the comparison algorithms, which means that the community structure identified by the proposed method is better than those of the comparison algorithms.
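The NMI comparison between a detected partition and the ground truth can be sketched as follows. This is a minimal illustration that assumes the common normalization NMI = 2 I(A;B) / (H(A) + H(B)); the exact variant used in the paper is not restated in this section, and the function name and label-list inputs are hypothetical.

```python
import math
from collections import Counter


def nmi(labels_true, labels_pred):
    """Normalized mutual information between two labelings of the same nodes.

    Both arguments are sequences of community labels, aligned by node.
    Assumes the normalization 2*I(A;B) / (H(A) + H(B)).
    """
    n = len(labels_true)
    if n == 0 or n != len(labels_pred):
        raise ValueError("labelings must be non-empty and of equal length")
    count_a = Counter(labels_true)
    count_b = Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mutual = sum(
        n_ab / n * math.log(n_ab * n / (count_a[a] * count_b[b]))
        for (a, b), n_ab in joint.items()
    )
    h_a = -sum(c / n * math.log(c / n) for c in count_a.values())
    h_b = -sum(c / n * math.log(c / n) for c in count_b.values())
    if h_a + h_b == 0:
        return 1.0  # both labelings put every node in one community
    return 2 * mutual / (h_a + h_b)
```

Label names do not matter: two partitions that group the nodes identically score 1 even if their labels differ.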
Risk Map Network. This network is the world political map used in the popular game Risk (https://en.wikipedia.org/wiki/Risk_(game)), which involves 42 countries or territories on 6 continents. Accordingly, the 42 nodes, together with 83 edges connecting adjacent countries or territories, are organized into 6 communities as the ground truth, which is illustrated in Figure 6(a). Feeding this network into the proposed method, we obtained the community structure shown in Figure 6(b).
Comparing the detected result with the ground-truth community structure, the community containing nodes '18' and '23' in the ground truth is split into two small communities in Figure 6(b), owing to the tendency of the proposed method to find small communities. In addition, nodes '26', '33', and '34' are assigned to communities different from those in the ground truth. However, nodes '12', '16', '26', '33', and '34' are special in this network: the outer edges associated with them are no fewer, or
Table 4: The experimental results of modularity on networks. The largest value on each network is marked with an asterisk.
Network     FastQ   WalkTrap  LPA    Attractor  IsoFdp  NSA
Lesmis      0.499   0.519     0.515  0.498      0.491   0.54*
Polbooks    0.502   0.507     0.508  0.501      0.518   0.524*
ColiNeta    0.779*  0.746     0.693  0.718      -       0.761
Email       0.499   0.531     0.379  0.464      0.531   0.544*
NetScience  0.955   0.956     0.896  0.937      -       0.957*
YeastL      0.573   0.529     0.372  0.511      -       0.574*
PGP         0.85    0.789     0.765  0.768      0.726   0.867*
DBLP        0.735   -         0.652  0.637      -       0.782*
Amazon      0.869   -         0.743  0.741      -       0.898*
Figure 6: Risk map network. (a) The ground-truth community structure. (b) The community structure uncovered by our proposed method, NSA.
even more, than those within the communities to which these nodes belong. Therefore, if we set aside what these nodes actually represent and judge qualitatively on topology alone, the community structure extracted by our proposed method is more rational than the ground truth: more of the edges associated with these three nodes lie within their communities than in the ground truth, so these three nodes are connected more tightly to nodes within the same community in Figure 6(b). Quantitatively, the values of both metrics for our proposed method are second only to those of FastQ and equal to those of WalkTrap. These results also confirm that our proposed method provides an acceptable solution to the community detection problem.
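The topological argument above compares, for a given node, the edges that stay inside its community with those that leave it. A minimal sketch of that count is given below; the adjacency-dict representation and both function names are illustrative assumptions, not part of the paper.

```python
def inner_outer_degree(adjacency, community_of, node):
    """Return (inner, outer): edges from `node` to its own community and to others.

    adjacency: dict mapping each node to the set of its neighbors.
    community_of: dict mapping each node to its community label.
    """
    own = community_of[node]
    inner = sum(1 for nb in adjacency[node] if community_of[nb] == own)
    return inner, len(adjacency[node]) - inner


def boundary_heavy_nodes(adjacency, community_of):
    """Nodes with at least one edge whose outer edges are no fewer than their inner edges."""
    heavy = []
    for node in adjacency:
        inner, outer = inner_outer_degree(adjacency, community_of, node)
        if adjacency[node] and outer >= inner:
            heavy.append(node)
    return heavy
```

Running `boundary_heavy_nodes` once with the ground-truth labels and once with the detected labels shows which nodes are placed more tightly by either partition.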
Scientists Collaboration Network. This is the largest connected component of a network describing the coauthorship relationships among scientists working at the Santa Fe Institute, New Mexico. Nodes in this network represent scientists, and an edge connects two scientists who have collaborated on at least one paper. There are 118 nodes and 197 edges in total in this network. The nodes can be divided into 6 groups as the ground-truth communities according to the specialties of the scientists, as presented in Figure 7(a). Taking this network as the input to the proposed method, we obtained the community structure illustrated in Figure 7(b).
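Extracting the largest connected component, as done for this network, can be sketched with a breadth-first search. The adjacency-dict input and the function name are assumptions made for illustration; the paper does not describe how the component was obtained.

```python
from collections import deque


def largest_connected_component(adjacency):
    """Return the node set of the largest connected component.

    adjacency: dict mapping each node to the set of its neighbors (undirected).
    """
    seen = set()
    best = set()
    for start in adjacency:
        if start in seen:
            continue
        # Breadth-first search collects every node reachable from `start`.
        component = {start}
        queue = deque([start])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in component:
                    component.add(v)
                    queue.append(v)
        seen |= component
        if len(component) > len(best):
            best = component
    return best
```

The same routine doubles as a connectivity check: a network is connected exactly when the returned set contains all of its nodes.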
The proposed method revealed 8 communities in this network, so two additional communities are detected in Figure 7(b). These two communities are relatively independent components; in particular, the community containing node '1' has many more inner edges than outer edges. That is to say, nodes in these two communities are connected more tightly to one another than to the remainder of the network. Therefore, isolating them from the network and treating them as independent communities is also reasonable. In terms of the evaluation metrics, the NMI obtained by the proposed method is the largest, which suggests that the result detected by our proposal is the one closest to the ground-truth community structure. The modularity of the proposed method is not the largest, but it is second only to that of FastQ. These results also show that our proposed method can extract high-quality community structures from networks.
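The claim that a community has many more inner edges than outer edges can be checked per community with a single pass over the edge list. The sketch below is illustrative only; the function name and input format are assumptions.

```python
from collections import defaultdict


def inner_outer_edges(edges, community_of):
    """Map each community label to (inner_edges, outer_edges).

    edges: iterable of (u, v) pairs, each undirected edge listed once.
    community_of: dict mapping every node to its community label.
    An edge between two communities counts as an outer edge of both.
    """
    inner = defaultdict(int)
    outer = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            inner[cu] += 1
        else:
            outer[cu] += 1
            outer[cv] += 1
    return {c: (inner[c], outer[c]) for c in set(community_of.values())}
```

A community whose inner count clearly exceeds its outer count is a reasonable candidate for being kept as a separate community, in the sense argued above.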
Networks without Ground-Truth Community Structure. This category contains the last 9 real-world networks listed in Table 2. For the experiments carried out on this category of networks, we evaluate the quality of the extracted community structures using modularity only, owing to the absence of ground-truth community structures. The modularity values obtained by the proposed method and the comparison algorithms are recorded in Table 4. To illustrate them
Figure 7: The collaboration network of scientists working at the Santa Fe Institute. (a) The ground-truth community structure. (b) The community structure detected by our proposed NSA algorithm.
Figure 8: The bar chart of the modularity obtained by the comparison algorithms and the proposed method, NSA. Horizontal axis: networks; vertical axis: modularity (Q), from 0.0 to 1.0; legend: FastQ, WalkTrap, LPA, Attractor, IsoFdp, proposal (NSA).
intuitively, we also plotted them in a bar chart, which is presented in Figure 8.
On these networks, our proposed method achieved the largest modularity on 8 of them. On the remaining network, ColiNeta, it still obtained the second largest modularity. FastQ is based on modularity optimization, yet it achieved the largest modularity only on ColiNeta. WalkTrap is based on random walks, so its time complexity is relatively high; it could not produce results on the Amazon and DBLP networks because of their large scale. LPA and Attractor can extract community structures from all of these networks, but the quality of the detected results is not satisfactory. IsoFdp can only be applied to connected networks and cannot run on ColiNeta, NetScience, and YeastL, as these three networks are disconnected; it also could not detect the community structure of Amazon and DBLP effectively because of their large scale. These comparison results show that our proposed method can steadily, effectively, and efficiently provide promising solutions to the community detection problem in networks from a wide range of applications, and that it outperforms the comparison algorithms on 8 of these 9 networks in terms of modularity.
Figure 9: The setting of parameter $\delta$. Each panel plots the modularity $Q$ (vertical axis) against $\delta \in [0, 0.3]$ (horizontal axis): (a) the karate club network; (b) the dolphin social network; (c) the risk map network; (d) the scientists collaboration network.
5. Parameter Setting
In the second phase of the proposed method, we introduce a threshold $\delta$ on the community metric to identify the preliminary communities that need to be merged. As mentioned above, we calculate the community metric $\gamma_i = \alpha_i \times \beta_i$ for every preliminary community $C_i$ in the merge procedure; if the value of $\gamma_i$ is below the threshold $\delta$, the corresponding community $C_i$ is identified as one that needs to be merged.
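The thresholding rule can be sketched as below. The community sparsity $\alpha_i$ and community scale $\beta_i$ are defined earlier in the paper and are treated here as precomputed inputs; the function name, the dict-based inputs, and the default value of the threshold are assumptions made for illustration.

```python
def communities_to_merge(alpha, beta, delta=0.1):
    """Return the labels of preliminary communities whose metric is below delta.

    alpha: dict community label -> community sparsity (precomputed, assumed given).
    beta: dict community label -> community scale (precomputed, assumed given).
    delta: threshold on the community metric gamma = alpha * beta.
    """
    return [c for c in alpha if alpha[c] * beta[c] < delta]
```

For example, with `alpha = {"A": 0.5, "B": 0.2}` and `beta = {"A": 0.5, "B": 0.3}`, only community "B" (metric 0.06) falls below a threshold of 0.1 and would be flagged for merging.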
Therefore, $\delta$ works as a parameter of our proposed method, and its setting influences the quality of the resulting community structure. Qualitatively, the larger or sparser the network is, the smaller the threshold $\delta$ should be, in accordance with the definitions of community sparsity ($\alpha_i$), community scale ($\beta_i$), and community metric ($\gamma_i$). To determine the optimal value of $\delta$, we conducted a group of experiments to explore the relationship between the value of $\delta$ and the quality of the resulting community structure on the first four networks listed in Table 2, namely the karate club network, the dolphin social network, the map of the game Risk, and the scientists collaboration network. The quality of the resulting community structure is measured in terms of modularity $Q$. We vary the value of $\delta$ from 0 to 1.0 in increments of 0.005; for each value of $\delta$, we run our proposed method on these networks and observe how the modularity changes as $\delta$ varies.
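The parameter sweep described above amounts to a simple grid search. In the sketch below, `run_method` and `score` are hypothetical callables standing in for the proposed method and the modularity computation; neither name comes from the paper.

```python
def sweep_delta(run_method, score, step=0.005, upper=1.0):
    """Grid-search the threshold delta over [0, upper] and keep the best score.

    run_method: hypothetical callable, delta -> detected partition.
    score: hypothetical callable, partition -> quality (e.g., modularity Q).
    Returns (best_delta, best_score); ties keep the smallest delta.
    """
    steps = round(upper / step)
    best_delta, best_score = None, float("-inf")
    for k in range(steps + 1):
        delta = k * step  # multiply by an integer index to avoid accumulated drift
        quality = score(run_method(delta))
        if quality > best_score:
            best_delta, best_score = delta, quality
    return best_delta, best_score
```

With the default arguments the loop evaluates 201 values of the threshold, from 0 to 1.0 inclusive.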
The observed results are illustrated in Figure 9, in which we plotted only the portion with $\delta \in [0, 0.3]$ because the largest modularities are obtained for $\delta \leqslant 0.3$ on all four networks. Our proposed method attains the largest modularity when $\delta = 0.13$ on the dolphin social network and $\delta = 0.1$ on the other three networks. Therefore, we adopt the corresponding value for each of those four networks and empirically set $\delta = 0.1$ for the other networks in the experiments. In Figure 9, the largest modularity is obtained around $\delta = 0.1$, and the interval $[0.05, 0.2]$ covers the optimal value of $\delta$. Therefore, we empirically suggest that $\delta$ be adjusted adaptively around 0.1 within the range $[0.05, 0.2]$, according to the size and sparsity of the networks involved in real-world applications.
6. Conclusion
In this paper, we presented a novel method to detect communities in networks. It is a local method based on node similarity and avoids the high time consumption of global methods. First, we construct the preliminary community structure by repeatedly selecting the node with the largest degree and, depending on the community assignment of its most similar neighbor, either taking it as the exemplar of a new community or inserting it into the community to which its most similar neighbor belongs; i.e., if its most similar neighbor has not been assigned to any community yet, we create a new community for it and its most similar neighbor; if its most similar neighbor has already been assigned to a community, we insert it into that community as well. At the end of this process, we obtain a series of preliminary communities. However, some of them might be too small or too sparse, leading to a low-quality result. Therefore, we merge some of the preliminary communities to obtain the final community structure. To do so, we also proposed some indexes, which take both the size and the sparsity of communities into account, to determine which communities should be merged.
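The first phase summarized above can be sketched as follows. This is an illustrative reading of the description, not the authors' implementation: the similarity measure used in the paper is not restated in this section, so the Jaccard index on closed neighborhoods is used here purely as a stand-in assumption, and ties are broken by node order, which is also an assumption.

```python
def jaccard(adjacency, u, v):
    """Jaccard index of the closed neighborhoods of u and v (assumed stand-in measure)."""
    nu = adjacency[u] | {u}
    nv = adjacency[v] | {v}
    return len(nu & nv) / len(nu | nv)


def preliminary_communities(adjacency, similarity=jaccard):
    """Assign every node to a preliminary community (phase one only).

    adjacency: dict mapping each node to the set of its neighbors; nodes must be sortable.
    Returns a dict mapping each node to an integer community label.
    """
    community_of = {}
    next_label = 0
    # Degrees are fixed, so one pass in decreasing-degree order, skipping assigned
    # nodes, is the same as repeatedly picking the unassigned node of largest degree.
    for node in sorted(adjacency, key=lambda n: (-len(adjacency[n]), n)):
        if node in community_of:
            continue  # already placed as another node's most similar neighbor
        neighbors = sorted(adjacency[node])
        if not neighbors:
            community_of[node] = next_label  # isolated node: singleton community
            next_label += 1
            continue
        best = max(neighbors, key=lambda nb: similarity(adjacency, node, nb))
        if best in community_of:
            community_of[node] = community_of[best]  # join the neighbor's community
        else:
            community_of[node] = community_of[best] = next_label  # new community
            next_label += 1
    return community_of
```

On two triangles joined by a single edge, this sketch places each triangle in its own community. The merge phase is not included.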
To test the performance of the proposed method, we performed extensive experiments on four groups of synthetic networks and 13 real-world networks and compared the detected community structures with those extracted by the comparison algorithms in terms of NMI and modularity. The comparison results demonstrate that our proposed method can extract high-quality community structures from networks drawn from various applications, and that nodes in the extracted communities are tightly connected. The proposed method overcomes the resolution limit problem to some extent and outperforms the competitors.
Data Availability
We conducted experiments on artificial networks and real-world datasets. The artificial networks were synthesized using the LFR benchmark network generator, which is freely available at https://sites.google.com/site/santofortunato. The parameters used to synthesize the artificial networks are listed in Table 1. The real-world data supporting this study are from previously reported studies, which are cited in Table 2. Most of the real-world datasets can also be downloaded from http://www-personal.umich.edu/~mejn/netdata and https://snap.stanford.edu/data/index.html. The ColiNeta dataset was provided by Jeong et al. [18]. We constructed the Risk Map network manually according to the literature [16].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant ID: 61602225).
References
[1] J. Kleinberg and S. Lawrence, “Network analysis: The structure of the web,” Science, vol. 294, no. 5548, pp. 1849–1850, 2001.
[2] P. Chen and S. Redner, “Community structure of the physical review citation network,” Journal of Informetrics, vol. 4, no. 3, pp. 278–290, 2010.
[3] M. E. J. Newman, “Modularity and community structure in networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 23, pp. 8577–8582, 2006.
[4] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabasi, “Hierarchical organization of modularity in metabolic networks,” Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[5] R. Guimera and L. A. N. Amaral, “Functional cartography of complex metabolic networks,” Nature, vol. 433, no. 7028, pp. 895–900, 2005.
[6] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.
[7] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 2, Article ID 026113, 2004.
[8] P. M. Gleiser and L. Danon, “Community structure in jazz,” Advances in Complex Systems (ACS), vol. 6, no. 4, pp. 565–573, 2003.
[9] Y. van Gennip, B. Hunter, R. Ahn et al., “Community detection using spectral clustering on sparse geosocial data,” SIAM Journal on Applied Mathematics, vol. 73, no. 1, pp. 67–83, 2013.
[10] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[11] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
[12] S. Fortunato and D. Hric, “Community detection in networks: a user guide,” Physics Reports, vol. 659, pp. 1–44, 2016.
[13] B. W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs,” Bell Labs Technical Journal, vol. 49, no. 1, pp. 291–307, 1970.
[14] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[15] D. Lusseau, “The emergent properties of a dolphin social network,” in Proceedings of the Royal Society of London B: Biological Sciences, vol. 270, supplement 2, pp. S186–S188, 2003.
[16] K. Steinhaeuser and N. V. Chawla, “Identifying and evaluating community structure in complex networks,” Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[17] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
[18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabasi, “The large-scale organization of metabolic networks,” Nature, vol. 407, no. 6804, pp. 651–654, 2000.
[19] R. Guimera, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, “Self-similar community structure in a network of human interactions,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[20] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: simple building blocks of complex networks,” Science, vol. 298, no. 5594, pp. 824–827, 2002.
[21] M. Boguna, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, “Models of social networks based on social distance attachment,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 5, Article ID 056122, 2004.
[22] J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.
[23] M. E. J. Newman, “Fast algorithm for detecting community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 6, Article ID 066133, 2004.
[24] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 6, Article ID 066111, 2004.
[25] F. Dabaghi Zarandi and M. Kuchaki Rafsanjani, “Community detection in complex networks using structural similarity,” Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 882–891, 2018.
[26] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, Article ID P10008, 2008.
[27] L. Waltman and N. J. Van Eck, “A smart local moving algorithm for large-scale modularity-based community detection,” The European Physical Journal B, vol. 86, no. 11, article 471, pp. 1–14, 2013.
[28] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 76, no. 3, Article ID 036106, 2007.
[29] M. J. Barber and J. W. Clark, “Detecting network communities by propagating labels under constraints,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 2, Article ID 026129, 2009.
[30] J. Hou Chin and K. Ratnavelu, “A semi-synchronous label propagation algorithm with constraints for community detection in complex networks,” Scientific Reports, vol. 7, Article ID 45836, 2017.
[31] J. Ding, X. He, J. Yuan, Y. Chen, and B. Jiang, “Community detection by propagating the label of center,” Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 675–686, 2018.
[32] A. Laio and A. Rodriguez, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[33] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger, “SCAN: A structural clustering algorithm for networks,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '07), pp. 824–833, ACM, New York, NY, USA, August 2007.
[34] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD'96), pp. 226–231, AAAI Press, 1996.
[35] H. Shiokawa, Y. Fujiwara, and M. Onizuka, “Scan++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs,” in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1178–1189, Republic of Korea, September 2006.
[36] T. You, H.-M. Cheng, Y.-Z. Ning, B.-C. Shia, and Z.-Y. Zhang, “Community detection in complex networks using density-based clustering algorithm and manifold learning,” Physica A: Statistical Mechanics and its Applications, vol. 464, pp. 221–230, 2016.
[37] X. Wang, G. Liu, J. Li, and J. P. Nees, “Locating structural centers: A density-based clustering method for community detection,” PLoS ONE, vol. 12, no. 1, Article ID e0169355, 2017.
[38] P. Pons and M. Latapy, “Computing communities in large networks using random walks,” in International Symposium on Computer and Information Sciences, pp. 284–293, 2005.
[39] S. A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, and M. A. Tavallaie, “Personalized PageRank clustering: a graph clustering algorithm based on random walks,” Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772–5785, 2013.
[40] Y. Su, B. Wang, and X. Zhang, “A seed-expanding method based on random walks for community detection in networks with ambiguous community structures,” Scientific Reports, vol. 7, Article ID 41830, 2017.
[41] J. Shao, Z. Han, Q. Yang, and T. Zhou, “Community detection based on distance dynamics,” in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1075–1084, ACM, Australia, August 2015.
[42] H.-L. Sun, E. Ch'ng, X. Yong, J. M. Garibaldi, S. See, and D.-B. Chen, “A fast community detection method in bipartite networks by distance dynamics,” Physica A: Statistical Mechanics and its Applications, vol. 496, pp. 108–120, 2018.
[43] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, “Pseudo-likelihood methods for community detection in large sparse networks,” The Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[44] S. C. de Lange, M. A. de Reus, and M. P. van den Heuvel, “The laplacian spectrum of neural networks,” Frontiers in Computational Neuroscience, vol. 7, no. 189, 2014.
[45] F. Krzakala, C. Moore, E. Mossel et al., “Spectral redemption in clustering sparse networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 52, pp. 20935–20940, 2013.
[46] P. Shi, K. He, D. Bindel, and J. E. Hopcroft, “Local Lanczos Spectral Approximation for Community Detection,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 10534 of Lecture Notes in Computer Science, pp. 651–667, Springer International Publishing, 2017.
[47] R. Tackx, F. Tarissan, and J. Guillaume, “ComSim: a bipartite community detection algorithm using cycle and node's similarity,” in International Workshop on Complex Networks and their Applications, vol. 689 of Studies in Computational Intelligence, pp. 278–289, Springer International Publishing, 2017.
[48] T. Wang, L. Yin, and X. Wang, “A community detection method based on local similarity and degree clustering information,” Physica A: Statistical Mechanics and its Applications, vol. 490, pp. 1344–1354, 2018.
[49] K. R. Zalik, “Maximal neighbor similarity reveals real communities in networks,” Scientific Reports, vol. 5, Article ID 18374, 2015.
[50] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 78, no. 4, Article ID 046110, 2008.
[51] L. Ana and A. Jain, “Robust data clustering,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-128–II-133, Madison, WI, USA, 2003.
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
12 Complexity
Table 4 The experimental results of modularity on networks The largest values of the two measure metrics are typed in bold
Network Fast119876 WalkTrap LPA Attractor IsoFdp NSALesmis 0499 0519 0515 0498 0491 054Polbooks 0502 0507 0508 0501 0518 0524ColiNeta 0779 0746 0693 0718 - 0761Email 0499 0531 0379 0464 0531 0544NetScience 0955 0956 0896 0937 - 0957YeastL 0573 0529 0372 0511 - 0574PGP 085 0789 0765 0768 0726 0867DBLP 0735 - 0652 0637 - 0782Amazon 0869 - 0743 0741 - 0898
12
3
4
56
7
8
9
10 11
12
1314
15 16
17
18
1920
21
22
23
24
25
26
27 2829
3031
3433
32
35 36
37 38
3940
4142
(a)
12
3
4
56
7
8
9
10 11
12
1314
15 16
17
18
1920
21
22
23
24
25
26
27 2829
3031
3433
32
3536
37 38
3940
4142
(b)
Figure 6 Risk map network (a) The ground-truth communitystructure (b)The community structure uncovered by our proposedmethod NSA
even more than those within the communities to whichthese nodes belong Therefore if we ignore the meaningof the actual representation of these nodes and considerqualitatively based on the topology only the communitystructure extracted by our proposed method is more rationalthan the ground truth more edges associated with these threenodes are located within the community than in the ground
truth thus more tightly these three nodes are connectedto nodes within the same community in Figure 6(b) Whenconsidering quantitatively both values of the two measuremetrics of our proposed method are second only to those ofFast119876 and are the same with those of WalkTrapThese resultsalso confirm that our proposed method provides us with anacceptable solution to the problem of community detection
Scientists Collaboration Network This is the largest con-nected component of a network delineating the coauthorrelationship among scientists working at the Santa Fe Insti-tute NewMexico Nodes in this network represent scientistsedges stand for the two scientists who have collaborated atleast on one paper There are 118 nodes and 197 edges in totalin this network The nodes can be divided into 6 groups asthe ground-truth communities according to the specialties ofthe scientists which is as presented in Figure 7(a) Taking thisnetwork as the input to the proposedmethodwe obtained thecommunity structure as illustrated in Figure 7(b)
The proposed method revealed 8 communities fromthis network two additional communities are detected inFigure 7(b) These two communities are relatively indepen-dent components especially for the community containingnodes lsquo1rsquo there are much more inner edges than outer edgesThat is to say nodes in these two communities are connectedmore tightly to one another than with the remainder of thenetwork Therefore isolating them from the network andtaking themas independent communities are also reasonableConsidering from the perspective of measure metrics thevalue of NMI obtained by the proposedmethod is the largestwhich suggests that the result detected by our proposal is theonemost approaches the ground-truth community structurethe modularity value of the proposed method is not thelargest though it is also second only to that of Fast119876 Theseresults also testify that our proposed method can extracthigh-quality community structure from networks
Figure 7: The collaboration network of scientists working at the Santa Fe Institute. (a) The ground-truth community structure. (b) The community structure detected by our proposed NSA algorithm.

Networks without Ground-Truth Community Structure. This category contains the last 9 real-world networks listed in Table 2. Because ground-truth community structures are absent for these networks, we evaluate the quality of the extracted community structures using modularity only. The modularity values obtained by the proposed method and the comparison algorithms are recorded in Table 4. To illustrate them intuitively, we also plotted them in a bar chart, which is presented in Figure 8.

Figure 8: The bar chart of the modularity obtained by the comparison algorithms and the proposed method NSA. Horizontal axis: networks (Lesmis, Polbooks, ColiNeta, Email, NetScience, YeastL, PGP, DBLP, Amazon). Vertical axis: modularity (Q), from 0.0 to 1.0. Legend: FastQ, Walktrap, LPA, Attractor, IsoFdp, proposal (NSA).
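For readers who want to reproduce modularity values of this kind, the sketch below evaluates the standard Newman-Girvan modularity of a given partition. It assumes a simple undirected, unweighted graph given as an edge list with each edge listed once; the function name `modularity` and the `community_of` mapping are hypothetical and introduced only for illustration.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Newman-Girvan modularity Q = sum_c [ l_c / m - (d_c / (2m))**2 ].

    edges        : iterable of (u, v) pairs, each undirected edge listed once
    community_of : dict mapping every node to its community id
    l_c is the number of edges inside community c, d_c is the total degree
    of its nodes, and m is the number of edges in the network.
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    inner = defaultdict(int)   # l_c
    degree = defaultdict(int)  # d_c
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            inner[cu] += 1
    return sum(inner[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)
```

For example, two triangles joined by a single bridge edge and split into the two triangles give Q = 5/14 ≈ 0.357, whereas placing all six nodes in one community gives Q = 0.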
On 8 of these 9 networks, our proposed method achieved the largest modularity. On the remaining network, ColiNeta, it still obtained the second largest modularity value. FastQ is based on a modularity optimization strategy, yet it acquired the largest modularity value only on ColiNeta. WalkTrap is an approach based on random walks, so its time complexity is relatively high, and it failed to produce effective results on the Amazon and DBLP networks because of their large scale. LPA and Attractor can extract community structures from all of these networks, but the quality of the detected results is not satisfactory. IsoFdp can only be applied to connected networks, so it cannot run on ColiNeta, NetScience, and YeastL, which are disconnected; it also cannot detect community structures effectively on Amazon and DBLP because of their large scale. These comparison results show that our proposed method can steadily, effectively, and efficiently provide promising solutions to the community detection problem in networks from a wide range of applications, and that it outperforms the comparison algorithms significantly.
Figure 9: The setting of parameter $\delta$. Each panel plots the modularity $Q$ (vertical axis) against $\delta \in [0, 0.3]$ (horizontal axis): (a) the karate club network; (b) the dolphin social network; (c) the risk map network; (d) the scientists collaboration network.
5. Parameter Setting
In the second phase of the proposed method, we introduce a threshold $\delta$ on the community metric to identify the preliminary communities that need to be merged. As mentioned above, in the merge procedure we calculate the community metric $\gamma_i = \alpha_i \times \beta_i$ for every preliminary community $C_i$; if the value of $\gamma_i$ is below the threshold $\delta$, the corresponding community $C_i$ is identified as one that needs to be merged.
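The thresholding rule can be sketched in a few lines. The snippet below assumes that the community sparsity and community scale values have already been computed according to their definitions earlier in the paper and are supplied as dictionaries keyed by community id; the function name `communities_to_merge` and this input format are illustrative assumptions, and the default threshold of 0.1 is the empirical setting reported in this section.

```python
def communities_to_merge(sparsity, scale, delta=0.1):
    """Return the ids of preliminary communities flagged for merging.

    sparsity : dict community id -> alpha_i (precomputed, assumed input)
    scale    : dict community id -> beta_i  (precomputed, assumed input)
    delta    : threshold on the community metric gamma_i = alpha_i * beta_i
    A community is flagged when its metric is strictly below delta.
    """
    flagged = []
    for cid in sparsity:
        gamma = sparsity[cid] * scale[cid]  # community metric gamma_i
        if gamma < delta:
            flagged.append(cid)
    return flagged
```

Only the selection step is shown here; how a flagged community is merged with another community is outside the scope of this sketch.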
Therefore, $\delta$ is a parameter of our proposed method, and its setting can influence the quality of the resulting community structure. Qualitatively, the larger or the sparser the network is, the smaller the threshold $\delta$ should be, in accordance with the definitions of community sparsity ($\alpha_i$), community scale ($\beta_i$), and community metric ($\gamma_i$). To determine the optimal value of $\delta$, we conduct a group of experiments that explore the relationship between the value of $\delta$ and the quality of the resulting community structure on the first four networks listed in Table 2, namely, the karate club network, the dolphin social network, the map of the game Risk, and the scientists collaboration network. The quality of the resulting community structure is measured in terms of modularity $Q$. We vary $\delta$ from 0 to 1.0 in increments of 0.005; for each value of $\delta$, we run our proposed method on these networks and observe how the modularity changes as $\delta$ varies.
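A parameter sweep of this kind can be organized as in the following sketch. The callables `detect` (run the community detection method for a given threshold) and `quality` (score the resulting partition, for instance by modularity) are hypothetical placeholders supplied by the caller; only the grid from 0 to 1.0 in steps of 0.005 comes from the text.

```python
def sweep_delta(detect, quality, start=0.0, stop=1.0, step=0.005):
    """Evaluate quality(detect(delta)) on an evenly spaced grid of delta values.

    detect  : callable delta -> partition (hypothetical placeholder)
    quality : callable partition -> float, e.g. modularity (placeholder)
    Returns (best_delta, best_quality, curve), where curve is the list of
    (delta, quality) pairs in increasing order of delta. On ties, the
    smallest delta that reaches the maximum is reported.
    """
    n_steps = int(round((stop - start) / step))
    curve = []
    for k in range(n_steps + 1):
        # Compute each grid point from its index to avoid accumulating
        # floating-point error across hundreds of additions.
        delta = round(start + k * step, 10)
        curve.append((delta, quality(detect(delta))))
    best_delta, best_quality = max(curve, key=lambda point: point[1])
    return best_delta, best_quality, curve
```

The returned `curve` holds the data needed to plot quality against the threshold, and restricting it to small threshold values gives a zoomed-in view of the region where the maximum occurs.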
The observed results are illustrated in Figure 9, in which we plot only the portion $\delta \in [0, 0.3]$ because the largest modularities are obtained for $\delta \leq 0.3$ on all four networks. Our proposed method attains the largest modularity at $\delta = 0.13$ on the dolphin social network and at $\delta = 0.1$ on the other three networks. Therefore, we adopt the corresponding value for each of those four networks and empirically set $\delta = 0.1$ for the other networks in the experiments. In Figure 9, the largest modularity is obtained around $\delta = 0.1$, and the interval $[0.05, 0.2]$ covers the optimal value of $\delta$. We therefore suggest, on empirical grounds, that $\delta$ be adjusted adaptively around 0.1 within the range $[0.05, 0.2]$, according to the size and the sparsity of the networks involved in real-world applications.
6. Conclusion
In this paper, we presented a novel method to detect communities in networks. It is a local method based on node similarity, and it overcomes the high time consumption of global methods. First, we construct the preliminary community structure by repeatedly selecting the node with the largest degree and, on the basis of the community assignment of its most similar neighbor, either taking it as the exemplar of a new community or inserting it into the community to which its most similar neighbor belongs. That is, if its most similar neighbor has not been assigned to any community yet, we create a new community for it and its most similar neighbor; if its most similar neighbor has already been assigned to a community, we insert it into that community as well. At the end of this process, we obtain a series of preliminary communities. However, some of them might be too small or too sparse, leading to a low-quality result. Therefore, we merge some of the preliminary communities to acquire the final community structure. To do so, we also proposed indexes that take both the size and the sparsity of communities into account to determine which communities should be merged.
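The first phase summarized above can be sketched as follows. This is only an illustration under stated assumptions: the node similarity used here is the Jaccard index of closed neighborhoods, which stands in for the similarity measure defined earlier in the paper; isolated nodes are placed in communities of their own; and ties in degree or similarity are broken arbitrarily. The function name `preliminary_communities` and the adjacency-dictionary input are likewise illustrative.

```python
def preliminary_communities(adjacency):
    """Assign every node to a preliminary community (phase one sketch).

    adjacency : dict node -> set of neighbor nodes (undirected, no self-loops)
    Returns a dict node -> community id.
    """
    def similarity(u, v):
        # ASSUMPTION: Jaccard index of closed neighborhoods, used as a
        # stand-in for the paper's own node similarity definition.
        nu = adjacency[u] | {u}
        nv = adjacency[v] | {v}
        return len(nu & nv) / len(nu | nv)

    community_of = {}
    next_id = 0
    # Visiting nodes in descending degree order and skipping assigned ones
    # is equivalent to repeatedly taking the largest-degree remaining node.
    for node in sorted(adjacency, key=lambda n: -len(adjacency[n])):
        if node in community_of:
            continue
        if not adjacency[node]:
            # ASSUMPTION: an isolated node forms its own community.
            community_of[node] = next_id
            next_id += 1
            continue
        best = max(adjacency[node], key=lambda nb: similarity(node, nb))
        if best in community_of:
            # The most similar neighbor is already classified: join it.
            community_of[node] = community_of[best]
        else:
            # Otherwise start a new community with the node as exemplar.
            community_of[node] = community_of[best] = next_id
            next_id += 1
    return community_of
```

On a graph made of two triangles joined by one bridge edge, this sketch places each triangle in its own preliminary community. The merging phase that follows is not included.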
To test the performance of the proposed method, we performed extensive experiments on four groups of synthetic networks and 13 real-world networks and compared the detected community structures with the results extracted by the comparison algorithms in terms of NMI and modularity. The comparison results demonstrate that our proposed method can extract high-quality community structures from networks abstracted from various applications and that nodes in the extracted communities are connected more tightly. The proposed method overcomes the resolution limit problem to some extent and outperforms the competitors.
Data Availability
We have conducted experiments on some artificial networks and some real-world datasets. The artificial networks are synthesized using the LFR benchmark network generator, which is freely available at https://sites.google.com/site/santofortunato. The parameters used to synthesize the artificial networks are listed in Table 1. The real-world data supporting this study are from previously reported studies, which are cited in Table 2. Most of the real-world datasets can also be downloaded from http://www-personal.umich.edu/~mejn/netdata and https://snap.stanford.edu/data/index.html. The ColiNeta dataset was provided by Jeong et al. [18]. We constructed the Risk Map network manually according to the literature [16].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant ID 61602225).
References
[1] J. Kleinberg and S. Lawrence, “Network analysis: The structure of the web,” Science, vol. 294, no. 5548, pp. 1849–1850, 2001.
[2] P. Chen and S. Redner, “Community structure of the physical review citation network,” Journal of Informetrics, vol. 4, no. 3, pp. 278–290, 2010.
[3] M. E. J. Newman, “Modularity and community structure in networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 23, pp. 8577–8582, 2006.
[4] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabási, “Hierarchical organization of modularity in metabolic networks,” Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[5] R. Guimerà and L. A. N. Amaral, “Functional cartography of complex metabolic networks,” Nature, vol. 433, no. 7028, pp. 895–900, 2005.
[6] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.
[7] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 2, Article ID 026113, 2004.
[8] P. M. Gleiser and L. Danon, “Community structure in jazz,” Advances in Complex Systems (ACS), vol. 6, no. 4, pp. 565–573, 2003.
[9] Y. van Gennip, B. Hunter, R. Ahn et al., “Community detection using spectral clustering on sparse geosocial data,” SIAM Journal on Applied Mathematics, vol. 73, no. 1, pp. 67–83, 2013.
[10] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[11] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
[12] S. Fortunato and D. Hric, “Community detection in networks: a user guide,” Physics Reports, vol. 659, pp. 1–44, 2016.
[13] B. W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs,” Bell Labs Technical Journal, vol. 49, no. 1, pp. 291–307, 1970.
[14] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[15] D. Lusseau, “The emergent properties of a dolphin social network,” in Proceedings of the Royal Society of London B: Biological Sciences, vol. 270, supplement 2, pp. S186–S188, 2003.
[16] K. Steinhaeuser and N. V. Chawla, “Identifying and evaluating community structure in complex networks,” Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[17] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
[18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási, “The large-scale organization of metabolic networks,” Nature, vol. 407, no. 6804, pp. 651–654, 2000.
[19] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, “Self-similar community structure in a network of human interactions,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[20] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: simple building blocks of complex networks,” Science, vol. 298, no. 5594, pp. 824–827, 2002.
[21] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, “Models of social networks based on social distance attachment,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 5, Article ID 056122, 2004.
[22] J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.
[23] M. E. J. Newman, “Fast algorithm for detecting community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 6, Article ID 066133, 2004.
[24] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 6, Article ID 066111, 2004.
[25] F. Dabaghi Zarandi and M. Kuchaki Rafsanjani, “Community detection in complex networks using structural similarity,” Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 882–891, 2018.
[26] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, Article ID P10008, 2008.
[27] L. Waltman and N. J. Van Eck, “A smart local moving algorithm for large-scale modularity-based community detection,” The European Physical Journal B, vol. 86, no. 11, article 471, pp. 1–14, 2013.
[28] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 76, no. 3, Article ID 036106, 2007.
[29] M. J. Barber and J. W. Clark, “Detecting network communities by propagating labels under constraints,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 2, Article ID 026129, 2009.
[30] J. Hou Chin and K. Ratnavelu, “A semi-synchronous label propagation algorithm with constraints for community detection in complex networks,” Scientific Reports, vol. 7, Article ID 45836, 2017.
[31] J. Ding, X. He, J. Yuan, Y. Chen, and B. Jiang, “Community detection by propagating the label of center,” Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 675–686, 2018.
[32] A. Laio and A. Rodriguez, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[33] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger, “SCAN: A structural clustering algorithm for networks,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’07), pp. 824–833, ACM, New York, NY, USA, August 2007.
[34] M. Ester, H.-P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD’96), pp. 226–231, AAAI Press, 1996.
[35] H. Shiokawa, Y. Fujiwara, and M. Onizuka, “Scan++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs,” in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1178–1189, Republic of Korea, September 2006.
[36] T. You, H.-M. Cheng, Y.-Z. Ning, B.-C. Shia, and Z.-Y. Zhang, “Community detection in complex networks using density-based clustering algorithm and manifold learning,” Physica A: Statistical Mechanics and its Applications, vol. 464, pp. 221–230, 2016.
[37] X. Wang, G. Liu, J. Li, and J. P. Nees, “Locating structural centers: A density-based clustering method for community detection,” PLoS ONE, vol. 12, no. 1, Article ID e0169355, 2017.
[38] P. Pons and M. Latapy, “Computing communities in large networks using random walks,” in International Symposium on Computer and Information Sciences, pp. 284–293, 2005.
[39] S. A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, and M. A. Tavallaie, “Personalized PageRank clustering: a graph clustering algorithm based on random walks,” Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772–5785, 2013.
[40] Y. Su, B. Wang, and X. Zhang, “A seed-expanding method based on random walks for community detection in networks with ambiguous community structures,” Scientific Reports, vol. 7, Article ID 41830, 2017.
[41] J. Shao, Z. Han, Q. Yang, and T. Zhou, “Community detection based on distance dynamics,” in Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1075–1084, ACM, Australia, August 2015.
[42] H.-L. Sun, E. Ch’ng, X. Yong, J. M. Garibaldi, S. See, and D.-B. Chen, “A fast community detection method in bipartite networks by distance dynamics,” Physica A: Statistical Mechanics and its Applications, vol. 496, pp. 108–120, 2018.
[43] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, “Pseudo-likelihood methods for community detection in large sparse networks,” The Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[44] S. C. de Lange, M. A. de Reus, and M. P. van den Heuvel, “The Laplacian spectrum of neural networks,” Frontiers in Computational Neuroscience, vol. 7, no. 189, 2014.
[45] F. Krzakala, C. Moore, E. Mossel et al., “Spectral redemption in clustering sparse networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 52, pp. 20935–20940, 2013.
[46] P. Shi, K. He, D. Bindel, and J. E. Hopcroft, “Local Lanczos Spectral Approximation for Community Detection,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 10534 of Lecture Notes in Computer Science, pp. 651–667, Springer International Publishing, 2017.
[47] R. Tackx, F. Tarissan, and J. Guillaume, “ComSim: a bipartite community detection algorithm using cycle and node’s similarity,” in International Workshop on Complex Networks and their Applications, vol. 689 of Studies in Computational Intelligence, pp. 278–289, Springer International Publishing, 2017.
[48] T. Wang, L. Yin, and X. Wang, “A community detection method based on local similarity and degree clustering information,” Physica A: Statistical Mechanics and its Applications, vol. 490, pp. 1344–1354, 2018.
[49] K. R. Žalik, “Maximal neighbor similarity reveals real communities in networks,” Scientific Reports, vol. 5, Article ID 18374, 2015.
[50] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 78, no. 4, Article ID 046110, 2008.
[51] L. Ana and A. Jain, “Robust data clustering,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-128–II-133, Madison, WI, USA, 2003.
[38] P Pons and M Latapy ldquoComputing communities in largenetworks using random walksrdquo in International symposium oncomputer and information sciences pp 284ndash293 2005
[39] S A Tabrizi A Shakery M Asadpour M Abbasi and M ATavallaie ldquoPersonalized PageRank clustering a graph cluster-ing algorithm based on random walksrdquo Physica A Statistical
Mechanics and its Applications vol 392 no 22 pp 5772ndash57852013
[40] Y Su B Wang and X Zhang ldquoA seed-expanding methodbased on random walks for community detection in networkswith ambiguous community structuresrdquo Scientific Reports vol7 Article ID 41830 2017
[41] J Shao Z Han Q Yang and T Zhou ldquoCommunity detectionbased on distance dynamicsrdquo in Proceedings of the 21th ACMSIGKDD International Conference on Knowledge Discovery andData Mining pp 1075ndash1084 ACM Australia August 2015
[42] H-L Sun E Chrsquong X Yong J M Garibaldi S See and D-B Chen ldquoA fast community detection method in bipartite net-works by distance dynamicsrdquo Physica A Statistical Mechanicsand its Applications vol 496 pp 108ndash120 2018
[43] A A Amini A Chen P J Bickel and E Levina ldquoPseudo-likelihood methods for community detection in large sparsenetworksrdquoThe Annals of Statistics vol 41 no 4 pp 2097ndash21222013
[44] S C de Lange M A de Reus and M P van den HeuvelldquoThe laplacian spectrum of neural networksrdquo Frontiers inComputational Neuroscience vol 7 no 189 2014
[45] F Krzakala C Moore E Mossel et al ldquoSpectral redemptionin clustering sparse networksrdquo Proceedings of the NationalAcadamy of Sciences of the United States of America vol 110 no52 pp 20935ndash20940 2013
[46] P Shi K He D Bindel and J E Hopcroft ldquoLocal LanczosSpectral Approximation for Community Detectionrdquo in JointEuropean Conference on Machine Learning and KnowledgeDiscovery in Databases vol 10534 of Lecture Notes in ComputerScience pp 651ndash667 Springer International Publishing 2017
[47] R Tackx F Tarissan and J Guillaume ldquoComSim a bipartitecommunity detection algorithm using cycle and nodersquos similar-ityrdquo in International Workshop on Complex Networks and theirApplications vol 689 of Studies in Computational Intelligencepp 278ndash289 Springer International Publishing 2017
[48] TWang L Yin and XWang ldquoA community detectionmethodbased on local similarity and degree clustering informationrdquoPhysica A Statistical Mechanics and its Applications vol 490pp 1344ndash1354 2018
[49] K R Zalik ldquoMaximal neighbor similarity reveals real commu-nities in networksrdquo Scientific Reports vol 5 Article ID 183742015
[50] A Lancichinetti S Fortunato and F Radicchi ldquoBenchmarkgraphs for testing community detection algorithmsrdquo PhysicalReview E Statistical Nonlinear and Soft Matter Physics vol 78no 4 Article ID 046110 2008
[51] L Ana and A Jain ldquoRobust data clusteringrdquo in Proceedingsof the IEEE Computer Society Conference on Computer Visionand Pattern Recognition vol 2 pp II-128ndashII-133 Madison WIUSA 2003
Figure 9: The setting of parameter $\delta$. (a) The karate club network. (b) The dolphin social network. (c) The risk map network. (d) The scientists collaboration network. Each panel plots the modularity $Q$ (vertical axis) against $\delta \in [0, 0.30]$ (horizontal axis).
5 Parameter Setting
In the second phase of the proposed method, we introduce a threshold $\delta$ for the community metric to identify the preliminary communities that need to be merged. As mentioned above, in the merge procedure we calculate the community metric $\gamma_i = \alpha_i \times \beta_i$ for every preliminary community $C_i$; if the value of $\gamma_i$ is below the threshold $\delta$, the corresponding community $C_i$ is identified as one that needs to be merged.
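The threshold test described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `communities_to_merge` and the dictionary inputs are hypothetical, the values of $\alpha_i$ and $\beta_i$ are assumed to have been computed beforehand from their definitions given earlier in the paper, and the numbers in the usage lines are made up.

```python
from typing import Dict, List


def communities_to_merge(alpha: Dict[int, float],
                         beta: Dict[int, float],
                         delta: float = 0.1) -> List[int]:
    """Return the ids of communities whose metric gamma_i falls below delta.

    alpha[i] and beta[i] are assumed to hold the precomputed community
    sparsity and community scale of community i (hypothetical inputs).
    """
    flagged = []
    for cid in alpha:
        gamma = alpha[cid] * beta[cid]  # community metric gamma_i = alpha_i * beta_i
        if gamma < delta:               # below the threshold: needs merging
            flagged.append(cid)
    return sorted(flagged)


# Illustrative values only; they do not come from the paper.
alpha = {0: 0.9, 1: 0.4, 2: 0.2}
beta = {0: 0.5, 1: 0.2, 2: 0.3}
print(communities_to_merge(alpha, beta, delta=0.1))  # prints [1, 2]
```

How a flagged community is merged with its neighbors is decided by the merge procedure of the second phase and is not covered by this sketch.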
Therefore, $\delta$ acts as a parameter of the proposed method, and its setting influences the quality of the resulting community structure. Qualitatively, the larger or sparser the network is, the smaller the threshold $\delta$ should be, in accordance with the definitions of community sparsity ($\alpha_i$), community scale ($\beta_i$), and community metric ($\gamma_i$). To determine the optimal value of $\delta$, we conduct a group of experiments exploring the relationship between the value of $\delta$ and the quality of the resulting community structure on the first four networks listed in Table 2, namely, the karate club network, the dolphin social network, the map of the game Risk, and the scientists collaboration network. The quality of the resulting community structure is measured in terms of modularity $Q$. We vary the value of $\delta$ from 0 to 1.0 in increments of 0.005; for each value of $\delta$, we run the proposed method on these networks and observe how the modularity changes as $\delta$ varies.
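The sweep over the threshold can be sketched as follows. This is a minimal sketch rather than the authors' implementation: `modularity` computes the standard Newman-Girvan modularity of an undirected, unweighted graph, while `detect` is a hypothetical placeholder for any function that runs the community detection method at a given threshold and returns a partition. The names `sweep_delta`, `Graph`, and `Partition` are likewise illustrative.

```python
from typing import Callable, Dict, List, Set, Tuple

Graph = Dict[int, Set[int]]   # adjacency sets of an undirected graph
Partition = List[Set[int]]    # disjoint node sets covering the graph


def modularity(graph: Graph, partition: Partition) -> float:
    """Newman-Girvan modularity Q of a partition (undirected, unweighted)."""
    two_m = sum(len(nbrs) for nbrs in graph.values())  # twice the edge count
    if two_m == 0:
        return 0.0
    q = 0.0
    for community in partition:
        # Each internal edge is counted twice (once per endpoint), matching two_m.
        internal = sum(1 for u in community for v in graph[u] if v in community)
        degree_sum = sum(len(graph[u]) for u in community)
        q += internal / two_m - (degree_sum / two_m) ** 2
    return q


def sweep_delta(graph: Graph,
                detect: Callable[[Graph, float], Partition],
                start: float = 0.0,
                stop: float = 1.0,
                step: float = 0.005) -> Tuple[float, float]:
    """Try every threshold on the grid and return (best_delta, best_Q)."""
    n_steps = int(round((stop - start) / step))
    best_delta, best_q = start, float("-inf")
    for k in range(n_steps + 1):
        delta = round(start + k * step, 6)  # avoid accumulating float error
        q = modularity(graph, detect(graph, delta))
        if q > best_q:                      # keep the first maximum found
            best_delta, best_q = delta, q
    return best_delta, best_q
```

Indexing the grid by an integer counter instead of repeatedly adding the step keeps the tested thresholds exactly on multiples of the step size, which matters when the best value is later reported to two or three decimals.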
The observed results are illustrated in Figure 9, in which we plot only the portion $\delta \in [0, 0.3]$, because the largest modularities are obtained for $\delta \leq 0.3$ on all four networks. The proposed method attains the largest modularity at $\delta = 0.13$ on the dolphin social network and at $\delta = 0.1$ on the other three networks. Therefore, we adopt the corresponding value for each of those four networks and empirically set $\delta = 0.1$ for the other networks in the experiments. In Figure 9, the largest modularity is obtained around $\delta = 0.1$, and the interval $[0.05, 0.2]$ covers the optimal value of $\delta$. Therefore, we empirically suggest that, in real-world applications, $\delta$ be tuned around 0.1 within the range $[0.05, 0.2]$ according to the size and sparsity of the network involved.
6 Conclusion
In this paper, we presented a novel method for detecting communities in networks. It is a local method based on node similarity, and it overcomes the high time consumption of global methods. First, we construct the preliminary community structure by repeatedly selecting the node with the largest degree and, depending on the community assignment of its most similar neighbor, either taking it as the exemplar of a new community or inserting it into an existing one: if its most similar neighbor has not yet been assigned to any community, we create a new community for the node and its most similar neighbor; if its most similar neighbor has already been assigned to a community, we insert the node into that community as well. At the end of this process, we obtain a series of preliminary communities. However, some of them might be too small or too sparse, leading to a low-quality result. Therefore, we merge some of the preliminary communities to obtain the final community structure. To do so, we also proposed indexes that take both the size and the sparsity of communities into account to determine which communities should be merged.
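The first phase summarized above can be sketched in Python as follows. This is a hedged illustration, not the authors' code: the node similarity used here (Jaccard similarity of closed neighborhoods) is an assumption standing in for the similarity measure defined earlier in the paper, ties are broken by node id (also an assumption), and the names `jaccard` and `preliminary_communities` are hypothetical. The merging of small or sparse communities in the second phase is not shown.

```python
from typing import Dict, Set

Graph = Dict[int, Set[int]]  # adjacency sets of an undirected graph


def jaccard(graph: Graph, u: int, v: int) -> float:
    """Assumed similarity: Jaccard index of the closed neighborhoods of u and v."""
    nu = graph[u] | {u}
    nv = graph[v] | {v}
    return len(nu & nv) / len(nu | nv)


def preliminary_communities(graph: Graph) -> Dict[int, int]:
    """Assign every node a preliminary community label (phase one only)."""
    label: Dict[int, int] = {}
    next_id = 0
    # Visit nodes from largest to smallest degree; ties broken by node id.
    for u in sorted(graph, key=lambda n: (-len(graph[n]), n)):
        if u in label:
            continue  # already placed as some node's most similar neighbor
        if not graph[u]:
            label[u] = next_id  # isolated node forms its own community
            next_id += 1
            continue
        # Most similar neighbor; max() keeps the first of equally similar ones.
        best = max(sorted(graph[u]), key=lambda v: jaccard(graph, u, v))
        if best in label:
            label[u] = label[best]  # join the neighbor's community
        else:
            label[u] = label[best] = next_id  # open a new community for both
            next_id += 1
    return label


# Two triangles joined by the edge 2-3.
g = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
print(preliminary_communities(g))
```

On this toy graph the sketch places nodes 0, 1, 2 in one community and nodes 3, 4, 5 in another, because each bridge endpoint is more similar to a triangle mate than to the node across the bridge.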
To test the performance of the proposed method, we performed extensive experiments on four groups of synthetic networks and 13 real-world networks and compared the detected community structures with those extracted by the comparison algorithms in terms of NMI and modularity. The comparison results demonstrate that the proposed method can extract high-quality community structures from networks abstracted from various applications and that nodes in the extracted communities are connected more tightly. The proposed method overcomes the resolution limit problem to some extent and outperforms the competitors.
Data Availability
We have conducted experiments on some artificial networks and some real-world datasets. The artificial networks are synthesized using the LFR benchmark network generator, which is freely available at https://sites.google.com/site/santofortunato. The parameters used to synthesize the artificial networks are listed in Table 1. The real-world data supporting this study are from previously reported studies, which are cited in Table 2. Most of the real-world datasets can also be downloaded from http://www-personal.umich.edu/~mejn/netdata and https://snap.stanford.edu/data/index.html. The ColiNeta dataset was provided by Jeong et al. [18]. We constructed the Risk Map network manually according to the literature [16].
Conflicts of Interest
The authors declare that they have no conflicts of interest.
Acknowledgments
This work was partially supported by the National Natural Science Foundation of China (Grant ID 61602225).
References
[1] J. Kleinberg and S. Lawrence, “Network analysis: The structure of the web,” Science, vol. 294, no. 5548, pp. 1849–1850, 2001.
[2] P. Chen and S. Redner, “Community structure of the physical review citation network,” Journal of Informetrics, vol. 4, no. 3, pp. 278–290, 2010.
[3] M. E. J. Newman, “Modularity and community structure in networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 103, no. 23, pp. 8577–8582, 2006.
[4] E. Ravasz, A. L. Somera, D. A. Mongru, Z. N. Oltvai, and A. L. Barabási, “Hierarchical organization of modularity in metabolic networks,” Science, vol. 297, no. 5586, pp. 1551–1555, 2002.
[5] R. Guimerà and L. A. N. Amaral, “Functional cartography of complex metabolic networks,” Nature, vol. 433, no. 7028, pp. 895–900, 2005.
[6] M. Girvan and M. E. J. Newman, “Community structure in social and biological networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 99, no. 12, pp. 7821–7826, 2002.
[7] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 2, Article ID 026113, 2004.
[8] P. M. Gleiser and L. Danon, “Community structure in jazz,” Advances in Complex Systems (ACS), vol. 6, no. 4, pp. 565–573, 2003.
[9] Y. van Gennip, B. Hunter, R. Ahn et al., “Community detection using spectral clustering on sparse geosocial data,” SIAM Journal on Applied Mathematics, vol. 73, no. 1, pp. 67–83, 2013.
[10] M. E. J. Newman, “Finding community structure in networks using the eigenvectors of matrices,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 74, no. 3, Article ID 036104, 19 pages, 2006.
[11] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3–5, pp. 75–174, 2010.
[12] S. Fortunato and D. Hric, “Community detection in networks: A user guide,” Physics Reports, vol. 659, pp. 1–44, 2016.
[13] B. W. Kernighan and S. Lin, “An efficient heuristic procedure for partitioning graphs,” Bell Labs Technical Journal, vol. 49, no. 1, pp. 291–307, 1970.
[14] W. W. Zachary, “An information flow model for conflict and fission in small groups,” Journal of Anthropological Research, vol. 33, no. 4, pp. 452–473, 1977.
[15] D. Lusseau, “The emergent properties of a dolphin social network,” in Proceedings of the Royal Society of London B: Biological Sciences, vol. 270, supplement 2, pp. S186–S188, 2003.
[16] K. Steinhaeuser and N. V. Chawla, “Identifying and evaluating community structure in complex networks,” Pattern Recognition Letters, vol. 31, no. 5, pp. 413–421, 2010.
[17] M. E. J. Newman, “The structure and function of complex networks,” SIAM Review, vol. 45, no. 2, pp. 167–256, 2003.
[18] H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási, “The large-scale organization of metabolic networks,” Nature, vol. 407, no. 6804, pp. 651–654, 2000.
[19] R. Guimerà, L. Danon, A. Díaz-Guilera, F. Giralt, and A. Arenas, “Self-similar community structure in a network of human interactions,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 68, no. 6, Article ID 065103, 2003.
[20] R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon, “Network motifs: Simple building blocks of complex networks,” Science, vol. 298, no. 5594, pp. 824–827, 2002.
[21] M. Boguñá, R. Pastor-Satorras, A. Díaz-Guilera, and A. Arenas, “Models of social networks based on social distance attachment,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 5, Article ID 056122, 2004.
[22] J. Yang and J. Leskovec, “Defining and evaluating network communities based on ground-truth,” Knowledge and Information Systems, vol. 42, no. 1, pp. 181–213, 2015.
[23] M. E. J. Newman, “Fast algorithm for detecting community structure in networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 69, no. 6, Article ID 066133, 2004.
[24] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 70, no. 6, Article ID 066111, 2004.
[25] F. Dabaghi Zarandi and M. Kuchaki Rafsanjani, “Community detection in complex networks using structural similarity,” Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 882–891, 2018.
[26] V. D. Blondel, J. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2008, no. 10, Article ID P10008, 2008.
[27] L. Waltman and N. J. Van Eck, “A smart local moving algorithm for large-scale modularity-based community detection,” The European Physical Journal B, vol. 86, no. 11, article 471, pp. 1–14, 2013.
[28] U. N. Raghavan, R. Albert, and S. Kumara, “Near linear time algorithm to detect community structures in large-scale networks,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 76, no. 3, Article ID 036106, 2007.
[29] M. J. Barber and J. W. Clark, “Detecting network communities by propagating labels under constraints,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 80, no. 2, Article ID 026129, 2009.
[30] J. Hou Chin and K. Ratnavelu, “A semi-synchronous label propagation algorithm with constraints for community detection in complex networks,” Scientific Reports, vol. 7, Article ID 45836, 2017.
[31] J. Ding, X. He, J. Yuan, Y. Chen, and B. Jiang, “Community detection by propagating the label of center,” Physica A: Statistical Mechanics and its Applications, vol. 503, pp. 675–686, 2018.
[32] A. Laio and A. Rodriguez, “Clustering by fast search and find of density peaks,” Science, vol. 344, no. 6191, pp. 1492–1496, 2014.
[33] X. Xu, N. Yuruk, Z. Feng, and T. A. J. Schweiger, “SCAN: A structural clustering algorithm for networks,” in Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’07), pp. 824–833, ACM, New York, NY, USA, August 2007.
[34] M. Ester, H. P. Kriegel, J. Sander, and X. Xu, “A density-based algorithm for discovering clusters in large spatial databases with noise,” in Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD’96), pp. 226–231, AAAI Press, 1996.
[35] H. Shiokawa, Y. Fujiwara, and M. Onizuka, “Scan++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs,” in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1178–1189, Republic of Korea, September 2006.
[36] T. You, H.-M. Cheng, Y.-Z. Ning, B.-C. Shia, and Z.-Y. Zhang, “Community detection in complex networks using density-based clustering algorithm and manifold learning,” Physica A: Statistical Mechanics and its Applications, vol. 464, pp. 221–230, 2016.
[37] X. Wang, G. Liu, J. Li, and J. P. Nees, “Locating structural centers: A density-based clustering method for community detection,” PLoS ONE, vol. 12, no. 1, Article ID e0169355, 2017.
[38] P. Pons and M. Latapy, “Computing communities in large networks using random walks,” in International Symposium on Computer and Information Sciences, pp. 284–293, 2005.
[39] S. A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, and M. A. Tavallaie, “Personalized PageRank clustering: A graph clustering algorithm based on random walks,” Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772–5785, 2013.
[40] Y. Su, B. Wang, and X. Zhang, “A seed-expanding method based on random walks for community detection in networks with ambiguous community structures,” Scientific Reports, vol. 7, Article ID 41830, 2017.
[41] J. Shao, Z. Han, Q. Yang, and T. Zhou, “Community detection based on distance dynamics,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1075–1084, ACM, Australia, August 2015.
[42] H.-L. Sun, E. Ch’ng, X. Yong, J. M. Garibaldi, S. See, and D.-B. Chen, “A fast community detection method in bipartite networks by distance dynamics,” Physica A: Statistical Mechanics and its Applications, vol. 496, pp. 108–120, 2018.
[43] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, “Pseudo-likelihood methods for community detection in large sparse networks,” The Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[44] S. C. de Lange, M. A. de Reus, and M. P. van den Heuvel, “The Laplacian spectrum of neural networks,” Frontiers in Computational Neuroscience, vol. 7, no. 189, 2014.
[45] F. Krzakala, C. Moore, E. Mossel et al., “Spectral redemption in clustering sparse networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 52, pp. 20935–20940, 2013.
[46] P. Shi, K. He, D. Bindel, and J. E. Hopcroft, “Local Lanczos Spectral Approximation for Community Detection,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 10534 of Lecture Notes in Computer Science, pp. 651–667, Springer International Publishing, 2017.
[47] R. Tackx, F. Tarissan, and J. Guillaume, “ComSim: A bipartite community detection algorithm using cycle and node’s similarity,” in International Workshop on Complex Networks and their Applications, vol. 689 of Studies in Computational Intelligence, pp. 278–289, Springer International Publishing, 2017.
[48] T. Wang, L. Yin, and X. Wang, “A community detection method based on local similarity and degree clustering information,” Physica A: Statistical Mechanics and its Applications, vol. 490, pp. 1344–1354, 2018.
[49] K. R. Žalik, “Maximal neighbor similarity reveals real communities in networks,” Scientific Reports, vol. 5, Article ID 18374, 2015.
[50] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 78, no. 4, Article ID 046110, 2008.
[51] L. Ana and A. Jain, “Robust data clustering,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-128–II-133, Madison, WI, USA, 2003.
Hindawiwwwhindawicom Volume 2018
MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Mathematical Problems in Engineering
Applied MathematicsJournal of
Hindawiwwwhindawicom Volume 2018
Probability and StatisticsHindawiwwwhindawicom Volume 2018
Journal of
Hindawiwwwhindawicom Volume 2018
Mathematical PhysicsAdvances in
Complex AnalysisJournal of
Hindawiwwwhindawicom Volume 2018
OptimizationJournal of
Hindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom Volume 2018
Engineering Mathematics
International Journal of
Hindawiwwwhindawicom Volume 2018
Operations ResearchAdvances in
Journal of
Hindawiwwwhindawicom Volume 2018
Function SpacesAbstract and Applied AnalysisHindawiwwwhindawicom Volume 2018
International Journal of Mathematics and Mathematical Sciences
Hindawiwwwhindawicom Volume 2018
Hindawi Publishing Corporation httpwwwhindawicom Volume 2013Hindawiwwwhindawicom
The Scientific World Journal
Volume 2018
Hindawiwwwhindawicom Volume 2018Volume 2018
Numerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisNumerical AnalysisAdvances inAdvances in Discrete Dynamics in
Nature and SocietyHindawiwwwhindawicom Volume 2018
Hindawiwwwhindawicom
Dierential EquationsInternational Journal of
Volume 2018
Hindawiwwwhindawicom Volume 2018
Decision SciencesAdvances in
Hindawiwwwhindawicom Volume 2018
AnalysisInternational Journal of
Hindawiwwwhindawicom Volume 2018
Stochastic AnalysisInternational Journal of
Submit your manuscripts atwwwhindawicom
Complexity 15
that community as well At the end of this process weobtain a series of preliminary communities However someof them might be too small or too sparse leading to a low-quality result Therefore we merge some of the preliminarycommunities to acquire the final community structure To doso we also proposed some indexes which take both the sizeand sparsity of communities into account to determine whichcommunities should be merged
To test the performance of the proposed method wehave performed extensive experiments on four groups ofsynthetic networks and 13 real-world networks and comparedthe detected community structures with the results extractedby comparison algorithms in terms of NMI and modular-ity the comparison results demonstrate that our proposedmethod can extract high-quality community structures fromnetworks abstracted from various applications and nodes inthe extracted communities are connected more tightly Theproposed method overcomes the problem of resolution limitto some extent and outperforms the competitors successfully
Data Availability
We have conducted experiments on some artificial net-works and some real-world datasets The artificial networksare synthesized using LFR benchmark network generatorwhich can be freely available at httpssitesgooglecomsitesantofortunato The parameters used to synthesize the arti-ficial networks are listed in Table 1 The real-world datasupporting this study are from previously reported studieswhich have been cited in Table 2 Most of the real-worlddatasets can also be downloaded from httpwww-personalumichedusimmejnnetdata and httpssnapstanfordedudataindexhtml TheColiNeta dataset was provided by Jeonget al [18] We construct the Risk Map network manuallyaccording to the literature [16]
Conflicts of Interest
The authors declare that they have no conflicts of interest
Acknowledgments
This work was partially supported by the National NaturalScience Foundation of China (Grant ID 61602225)
References
[1] J Kleinberg and S Lawrence ldquoNetwork analysis The structureof the webrdquo Science vol 294 no 5548 pp 1849-1850 2001
[2] P Chen and S Redner ldquoCommunity structure of the physicalreview citation networkrdquo Journal of Informetrics vol 4 no 3pp 278ndash290 2010
[3] M E J Newman ldquoModularity and community structure innetworksrdquoProceedings of theNational Acadamy of Sciences of theUnited States of America vol 103 no 23 pp 8577ndash8582 2006
[4] E Ravasz A L Somera D A Mongru Z N Oltvai and A LBarabasi ldquoHierarchical organization ofmodularity inmetabolicnetworksrdquo Science vol 297 no 5586 pp 1551ndash1555 2002
[5] R Guimera and L A N Amaral ldquoFunctional cartography ofcomplex metabolic networksrdquo Nature vol 433 no 7028 pp895ndash900 2005
[6] M Girvan and M E J Newman ldquoCommunity structure insocial and biological networksrdquo Proceedings of the NationalAcadamy of Sciences of the United States of America vol 99 no12 pp 7821ndash7826 2002
[7] M E J Newman andM Girvan ldquoFinding and evaluating com-munity structure in networksrdquo Physical Review E StatisticalNonlinear and Soft Matter Physics vol 69 no 2 Article ID026113 2004
[8] P M Gleiser and L Danon ldquoCommunity structure in jazzrdquoAdvances in Complex Systems (ACS) vol 6 no 4 pp 565ndash5732003
[9] Y van Gennip B Hunter R Ahn et al ldquoCommunity detectionusing spectral clustering on sparse geosocial datardquo SIAM Jour-nal on Applied Mathematics vol 73 no 1 pp 67ndash83 2013
[10] M E J Newman ldquoFinding community structure in networksusing the eigenvectors of matricesrdquo Physical Review E Statisti-cal Nonlinear and Soft Matter Physics vol 74 no 3 Article ID036104 19 pages 2006
[11] S Fortunato ldquoCommunity detection in graphsrdquoPhysics Reportsvol 486 no 3ndash5 pp 75ndash174 2010
[12] S Fortunato and D Hric ldquoCommunity detection in networksa user guiderdquo Physics Reports vol 659 pp 1ndash44 2016
[13] BW Kernighan and S Lin ldquoAn efficient heuristic procedure forpartitioning graphsrdquo Bell Labs Technical Journal vol 49 no 1pp 291ndash307 1970
[14] W W Zachary ldquoAn information flow model for conflict andfission in small groupsrdquo Journal of Anthropological Research vol33 no 4 pp 452ndash473 1977
[15] D Lusseau ldquoThe emergent properties of a dolphin socialnetworkrdquo in Proceedings of the Royal Society of London BBiological Sciences vol 270 supplement 2 pp S186ndashS188 2003
[16] K Steinhaeuser and N V Chawla ldquoIdentifying and evaluatingcommunity structure in complex networksrdquo Pattern Recogni-tion Letters vol 31 no 5 pp 413ndash421 2010
[17] M E J Newman ldquoThe structure and function of complexnetworksrdquo SIAM Review vol 45 no 2 pp 167ndash256 2003
[18] H Jeong B Tombor R Albert Z N Oltval and A-L BarabaslldquoThe large-scale organization of metabolic networksrdquo Naturevol 407 no 6804 pp 651ndash654 2000
[19] RGuimera L DanonADıaz-Guilera F Giralt andAArenasldquoSelf-similar community structure in a network of humaninteractionsrdquo Physical Review E Statistical Nonlinear and SoftMatter Physics vol 68 no 6 Article ID 065103 2003
[20] RMilo S Shen-Orr S ItzkovitzNKashtanDChklovskii andU Alon ldquoNetwork motifs simple building blocks of complexnetworksrdquo Science vol 298 no 5594 pp 824ndash827 2002
[21] M Boguna R Pastor-Satorras A Dıaz-Guilera and A ArenasldquoModels of social networks based on social distance attach-mentrdquo Physical Review E Statistical Nonlinear and Soft MatterPhysics vol 70 no 5 Article ID 056122 2004
[22] J Yang and J Leskovec ldquoDefining and evaluating network com-munities based on ground-truthrdquo Knowledge and InformationSystems vol 42 no 1 pp 181ndash213 2015
[23] M E J Newman ldquoFast algorithm for detecting communitystructure in networksrdquo Physical Review E Statistical Nonlinearand Soft Matter Physics vol 69 no 6 Article ID 066133 2004
[24] A Clauset M E J Newman and C Moore ldquoFinding com-munity structure in very large networksrdquo Physical Review E
16 Complexity
Statistical Nonlinear and Soft Matter Physics vol 70 no 6Article ID 066111 2004
[25] F Dabaghi Zarandi and M Kuchaki Rafsanjani ldquoCommunitydetection in complex networks using structural similarityrdquoPhysica A Statistical Mechanics and its Applications vol 503 pp882ndash891 2018
[26] V D Blondel J Guillaume R Lambiotte and E LefebvreldquoFast unfolding of communities in large networksrdquo Journal ofStatistical Mechanics Theory and Experiment vol 2008 no 10Article ID P10008 2008
[27] L Waltman andN J Van Eck ldquoA smart local moving algorithmfor large-scale modularity-based community detectionrdquo TheEuropean Physical Journal B vol 86 no 11 article 471 pp 1ndash142013
[28] U N Raghavan R Albert and S Kumara ldquoNear lineartime algorithm to detect community structures in large-scalenetworksrdquo Physical Review E Statistical Nonlinear and SoftMatter Physics vol 76 no 3 Article ID 036106 2007
[29] M J Barber and J W Clark ldquoDetecting network communitiesby propagating labels under constraintsrdquo Physical Review EStatistical Nonlinear and Soft Matter Physics vol 80 no 2Article ID 026129 2009
[30] J Hou Chin and K Ratnavelu ldquoA semi-synchronous label prop-agation algorithm with constraints for community detection incomplex networksrdquo Scientific Reports vol 7 Article ID 458362017
[31] J Ding X He J Yuan Y Chen and B Jiang ldquoCommunitydetection by propagating the label of centerrdquoPhysica A Statisti-cal Mechanics and its Applications vol 503 pp 675ndash686 2018
[32] A Laio and A Rodriguez ldquoClustering by fast search and find ofdensity peaksrdquo Science vol 344 no 6191 pp 1492ndash1496 2014
[33] X Xu N Yuruk Z Feng and T A J Schweiger ldquoSCAN Astructural clustering algorithm for networksrdquo in Proceedings ofthe 13th ACM SIGKDD International Conference on KnowledgeDiscovery and DataMining (KDD rsquo07) pp 824ndash833 ACMNewYork NY USA August 2007
[34] M Este H P Kriegel S Jorg and x Xu ldquoA density-basedalgorithm for discovering clusters in large spatial databases withnoiserdquo in Proceedings of the Second International Conference onKnowledge Discovery and Data Mining (KDDrsquo96) pp 226ndash231AAAI Press 1996
[35] H. Shiokawa, Y. Fujiwara, and M. Onizuka, “Scan++: Efficient algorithm for finding clusters, hubs and outliers on large-scale graphs,” in Proceedings of the 3rd Workshop on Spatio-Temporal Database Management, STDBM 2006, Co-located with the 32nd International Conference on Very Large Data Bases, VLDB 2006, pp. 1178–1189, Republic of Korea, September 2006.
[36] T. You, H.-M. Cheng, Y.-Z. Ning, B.-C. Shia, and Z.-Y. Zhang, “Community detection in complex networks using density-based clustering algorithm and manifold learning,” Physica A: Statistical Mechanics and its Applications, vol. 464, pp. 221–230, 2016.
[37] X. Wang, G. Liu, J. Li, and J. P. Nees, “Locating structural centers: A density-based clustering method for community detection,” PLoS ONE, vol. 12, no. 1, Article ID e0169355, 2017.
[38] P. Pons and M. Latapy, “Computing communities in large networks using random walks,” in International Symposium on Computer and Information Sciences, pp. 284–293, 2005.
[39] S. A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, and M. A. Tavallaie, “Personalized PageRank clustering: a graph clustering algorithm based on random walks,” Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772–5785, 2013.
[40] Y. Su, B. Wang, and X. Zhang, “A seed-expanding method based on random walks for community detection in networks with ambiguous community structures,” Scientific Reports, vol. 7, Article ID 41830, 2017.
[41] J. Shao, Z. Han, Q. Yang, and T. Zhou, “Community detection based on distance dynamics,” in Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1075–1084, ACM, Australia, August 2015.
[42] H.-L. Sun, E. Ch’ng, X. Yong, J. M. Garibaldi, S. See, and D.-B. Chen, “A fast community detection method in bipartite networks by distance dynamics,” Physica A: Statistical Mechanics and its Applications, vol. 496, pp. 108–120, 2018.
[43] A. A. Amini, A. Chen, P. J. Bickel, and E. Levina, “Pseudo-likelihood methods for community detection in large sparse networks,” The Annals of Statistics, vol. 41, no. 4, pp. 2097–2122, 2013.
[44] S. C. de Lange, M. A. de Reus, and M. P. van den Heuvel, “The Laplacian spectrum of neural networks,” Frontiers in Computational Neuroscience, vol. 7, no. 189, 2014.
[45] F. Krzakala, C. Moore, E. Mossel et al., “Spectral redemption in clustering sparse networks,” Proceedings of the National Academy of Sciences of the United States of America, vol. 110, no. 52, pp. 20935–20940, 2013.
[46] P. Shi, K. He, D. Bindel, and J. E. Hopcroft, “Local Lanczos Spectral Approximation for Community Detection,” in Joint European Conference on Machine Learning and Knowledge Discovery in Databases, vol. 10534 of Lecture Notes in Computer Science, pp. 651–667, Springer International Publishing, 2017.
[47] R. Tackx, F. Tarissan, and J. Guillaume, “ComSim: a bipartite community detection algorithm using cycle and node’s similarity,” in International Workshop on Complex Networks and their Applications, vol. 689 of Studies in Computational Intelligence, pp. 278–289, Springer International Publishing, 2017.
[48] T. Wang, L. Yin, and X. Wang, “A community detection method based on local similarity and degree clustering information,” Physica A: Statistical Mechanics and its Applications, vol. 490, pp. 1344–1354, 2018.
[49] K. R. Zalik, “Maximal neighbor similarity reveals real communities in networks,” Scientific Reports, vol. 5, Article ID 18374, 2015.
[50] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Physical Review E: Statistical, Nonlinear, and Soft Matter Physics, vol. 78, no. 4, Article ID 046110, 2008.
[51] A. L. N. Fred and A. K. Jain, “Robust data clustering,” in Proceedings of the IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-128–II-133, Madison, WI, USA, 2003.