
Community-based diffusion scheme using Markov chain and spectral clustering for mobile social networks

Jegwang Ryu1 • Jiho Park1 • Junyeop Lee1 • Sung-Bong Yang1

© Springer Science+Business Media, LLC 2017

Abstract With the increase in the number of mobile devices such as tablets and smart watches, mobile social networks (MSNs) provide great opportunities for people to exchange information. As a result, information diffusion has become a critical issue in emerging MSNs. In this paper, we address the problem of finding the top-k influential users who can effectively spread information in a network, which is referred to as the diffusion minimization problem. To minimize the spreading period, the problem can be formulated as a k-center problem, but that problem is NP-hard. We propose a community-based diffusion scheme using Markov chain and spectral clustering (CDMS) that minimizes the spreading time by adopting a community concept based on the geographic regularity of human mobility in MSNs. We use a Markov chain to predict each node's mobility pattern and cluster the predicted patterns using spectral graph theory. Finally, we select the top-k influential nodes, one from each community. Simulations are performed in NS-2 with the home-cell community-based mobility model to evaluate the proposed scheme in MSNs. We demonstrate that CDMS outperforms noncommunity-based algorithms for varying numbers of nodes and ratios of k influential nodes.

Keywords Mobile social networks · Information diffusion · Markov chain · Spectral clustering

1 Introduction

Many companies such as Google, Amazon, and Yahoo exploit social information about the users of social networks for effective marketing [1]. For example, to minimize marketing costs, companies analyze people's social behavioral patterns in online social networks (OSNs) such as Facebook or Twitter, because word-of-mouth advertising, in which customers themselves promote products and events to other people, relies on the customers for most of the promotional effort [2, 3]. In the past decade, many studies have been conducted to find the top-k influential users in a social network. Recently, owing to the evolution of mobile devices and advances in wireless networking, mobile social networks (MSNs) present great opportunities for people to exchange information. MSNs are regarded as delay tolerant networks (DTNs) [3, 4]: unlike cellular infrastructures, they deliver messages with the store-carry-forward technique.

Information diffusion can serve as an effective routing strategy, in which a small initial node set propagates messages to all the other nodes. If a small set of nodes delivers more messages than the other nodes, network traffic decreases. As shown in Fig. 1, a main application in wireless networks is data offloading, in which messages are forwarded between nodes in a mobile network so that the most active nodes (the top-k influential nodes) can offload traffic from cellular networks such as 3G and 4G. Therefore, diffusion from an initial top-k node set I depends on finding the proper active delivery nodes, and our scheme thereby contributes an effective routing strategy for mobile social networks.

Sung-Bong Yang
[email protected]
Jegwang Ryu
[email protected]
Jiho Park
[email protected]
Junyeop Lee
[email protected]

1 Department of Computer Science, Yonsei University, Seoul, Korea

Wireless Netw
DOI 10.1007/s11276-017-1599-6

Several papers address these problems through information dissemination using word-of-mouth techniques in MSNs [5–7]. Recently, Lu et al. [5] proposed a diffusion scheme to solve the diffusion minimization problem in MSNs. Under the diffusion model for MSNs, the problem can be formulated as an asymmetric k-center problem, which is NP-hard; the O(log* n)-approximation algorithm for it has a time complexity of O(n^5) [8]. However, this scheme is not suitable for large complex networks [8]. To solve this problem, we propose CDMS to minimize the diffusion period in MSNs. The basic idea is to exploit the social community structure [9]. Finding the top-k influential nodes within communities yields better performance and lower time complexity than finding them in the entire network, because the nodes in each community are strongly connected. For instance, strongly connected people tend to move regularly between places such as companies and houses [10]. Their movements also form mobility patterns that reveal information such as frequently visited spots. Our proposed scheme for MSNs builds on this regularity of movement. To predict the mobility patterns of nodes in dynamic networks such as MSNs, we employ a Markov chain that describes how a node moves from one spot to another with a certain transition probability. We then compute each node's steady-state vector, which represents the node's probability distribution over all spots in the network area. We construct communities by clustering these vectors and finally select the top-k influential nodes from the communities. We use spectral clustering to divide the nodes' steady-state vectors into communities because spectral clustering is highly effective for community detection [11–14]. We conducted extensive simulations using the network simulator NS-2 [15], and the results show that CDMS minimizes the diffusion period in synthetic networks better than noncommunity-based schemes.

The technical contributions of this paper can be summarized as follows.

• We introduce a new scheme to solve the problem of diffusion minimization in MSNs by exploiting the mobility patterns of nodes in a network area.
• We refine the contact probabilities using the community concept for large-scale MSNs: we employ a Markov chain and spectral clustering to detect community structures over time.
• We conduct extensive simulations using the network simulator NS-2 and compare the results of both nonclustering and clustering schemes.

The rest of this paper is organized as follows. Section 2 explains the related work. In Sect. 3, we state the problem that we aim to solve in this paper and the assumptions of the system model. In Sect. 4, we explain CDMS in detail. In Sect. 5, we present the simulation results. Finally, the conclusion is presented in Sect. 6.

2 Background

2.1 Influence maximization in social networks

In early studies on influence maximization, individual behaviors were assumed to spread through social contact information [16–18], and maximizing the spread in social networks depended on specific network structures such as social behavior. There have been various studies based on user behaviors in the fields of biology, marketing, and data science. In biology, immunization strategies were proposed to protect against diseases such as HIV/AIDS and influenza. Recently, Sun et al. [19] introduced a new metric, connectivity centrality, and an adaptive algorithm based on it, to find influential users in a targeted vaccination setting using a sensor network. In the past decade, influence maximization has been studied in OSNs such as Facebook [20] and Twitter [21] for viral marketing, where the problem is to find a small group of influential individuals. Domingos and Richardson [22] first proposed the problem and solved it using a probabilistic model. Kempe et al. [23] proposed a greedy algorithm, and Wang et al. [24] designed a community-based greedy algorithm. These works use two fundamental stochastic models, the independent cascade (IC) model and the linear threshold (LT) model, in which influence propagates through individual interactions in social networks.

Fig. 1 Information diffusion in many applications


2.2 Information diffusion in MSNs

Recently, with the evolution of wireless network techniques and mobile devices such as smartphones, tablets, and smart watches, many companies have become interested in applications that minimize the cost of marketing in MSNs. Unlike classical models for OSNs such as the IC model or the LT model, diffusion models for MSNs exploit node behaviors such as contact or mobility information. Diffusion models for MSNs face the problem of finding the top-k influential nodes that minimize the diffusion period [5, 6], and several diffusion schemes have been proposed for this purpose [5, 6, 25]. Lu et al. [5] suggested two algorithms, a community-based algorithm and a distributed set-cover algorithm using a probabilistic model, to minimize the diffusion period in MSNs. For mobile cellular data offloading in DTNs, Han et al. [25] suggested a data offloading scheme formulated from research on information diffusion in MSNs. Recently, Chen et al. [6] proposed a diffusion scheme using k-means clustering and the social features of MSNs. In general, noncommunity-based diffusion schemes for MSNs are less effective than schemes based on community information, because schemes that find the top-k influential nodes within each community minimize the diffusion period more effectively than those that find them in the whole network. In this paper, we solve the diffusion minimization problem by using social behaviors such as mobility information in MSNs. CDMS is also a community-based diffusion scheme, but its approach differs from that of the other community-based schemes.

2.3 Markov chain model

The Markov chain was first introduced by Andrey Markov and is widely used to represent statistical regularities in computer science [26]. In MSNs, it is mainly used to predict node behavior [27–29]. The model describes the probability and occupation ratio of a node's movement, and the probability of being in each spot is represented by the steady-state distribution vector. To define this vector, the transition probability matrix P, which contains the probability of each node's movement between spots, is defined as follows:

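$$
P = \begin{bmatrix}
p_{11} & p_{12} & \cdots & p_{1g} \\
p_{21} & p_{22} & \cdots & p_{2g} \\
\vdots & \vdots & \ddots & \vdots \\
p_{g1} & p_{g2} & \cdots & p_{gg}
\end{bmatrix} \qquad (1)
$$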

where $p_{ij}$ is the probability that a node moves from spot i to spot j, and a spot is one of the g sections of the network area. If $\pi(i)$ denotes the probability distribution at the i-th step, the rule governing the node's mobility can be expressed by the following equation:

$$
\pi(t) = \left(P^{\mathsf{T}}\right)^{t} \pi(0) \qquad (2)
$$

where $\pi(0)$ is the initial probability distribution. If the Markov chain is ergodic, there is a unique steady-state distribution π satisfying $\pi^{\mathsf{T}} P = \pi^{\mathsf{T}}$ (equivalently $P^{\mathsf{T}} \pi = \pi$, the fixed point of Eq. (2)), whose entries are nonnegative and sum to 1. The following are some applications of the Markov chain in wireless and ad hoc networks. Soelistijanto et al. [27] proposed a forwarding scheme based on an analysis of the traffic distribution among the nodes in social opportunistic networks, using a Markov model of the steady-state traffic distribution. Lee et al. [28] used a semi-Markov process to predict the distribution of users' future spots. Recently, Yu et al. [29] proposed a Markov-based multihop mobility prediction scheme for MSN applications such as location-based services and mobile crowd sensing.
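To make Eq. (2) and the steady-state condition concrete, here is a minimal Python sketch, assuming NumPy is available. It solves the homogeneous system $(P^{\mathsf T} - I)\pi = 0$ together with the normalization $\sum_i \pi_i = 1$ by least squares, and cross-checks the answer by iterating Eq. (2). The function names and the 3-spot matrix are hypothetical illustrations, not taken from the paper.

```python
import numpy as np

def steady_state(P: np.ndarray) -> np.ndarray:
    """Stationary distribution pi of a row-stochastic matrix P.

    Stacks (P^T - I) pi = 0 with the row sum(pi) = 1 and solves the
    overdetermined system by least squares. Assumes the chain is ergodic,
    so the solution is unique.
    """
    g = P.shape[0]
    A = np.vstack([P.T - np.eye(g), np.ones((1, g))])  # (g + 1) x g system
    b = np.zeros(g + 1)
    b[-1] = 1.0                                         # entries must sum to 1
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return pi

def iterate(P: np.ndarray, pi0: np.ndarray, t: int) -> np.ndarray:
    """Eq. (2): pi(t) = (P^T)^t pi(0), with pi as a column vector."""
    return np.linalg.matrix_power(P.T, t) @ pi0

# Hypothetical 3-spot transition matrix; each row sums to 1.
P = np.array([[0.8, 0.1, 0.1],
              [0.3, 0.6, 0.1],
              [0.2, 0.2, 0.6]])
pi = steady_state(P)
print(np.round(pi, 4))
print(np.round(iterate(P, np.array([1.0, 0.0, 0.0]), 200), 4))
```

For this matrix the exact steady state is (0.56, 0.24, 0.20), and 200 iterations of Eq. (2) starting from (1, 0, 0) agree with it to within floating-point precision.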

2.4 Spectral clustering technique

Clustering is one of the most widely used methods in computer science research areas such as machine learning and pattern recognition [13, 14]. Compared to traditional clustering techniques such as k-means, spectral clustering has many advantages, and it has become one of the most popular clustering techniques for exploratory data analysis. Donath and Hoffman first applied spectral graph theory to graph partitioning [30]. Fiedler proposed solving the graph-partitioning problem using the second-smallest eigenvalue of the Laplacian matrix of a graph [31]. In this paper, we follow the tutorial by von Luxburg [13] to find community structures among nodes. The main tool for solving the graph-partitioning problem in spectral clustering is the graph Laplacian. The algorithm involves three matrices: the similarity matrix S, the graph Laplacian L, and a matrix Y whose columns are the first k eigenvectors, corresponding to the k smallest eigenvalues of L. The similarity $S_{ij}$ between nodes $v_i$ and $v_j$ is defined as

$$
S(v_i, v_j) = \exp\left(-\frac{\lVert v_i - v_j \rVert^2}{2\sigma^2}\right) \qquad (3)
$$

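where σ is a scaling parameter that controls the width of the neighborhoods, i.e., how quickly the similarity decays with the distance between nodes. After constructing the similarity matrix S, the normalized Laplacian $L_{\mathrm{norm}}$ can be expressed as:

$$
L_{\mathrm{norm}} = I - D^{-1/2} S D^{-1/2} \qquad (4)
$$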


where D is the degree matrix of S, i.e., the diagonal matrix with $D_{ii} = \sum_j S_{ij}$. After computing the eigenvalue decomposition of $L_{\mathrm{norm}}$, an n × k matrix Y is constructed whose columns are the eigenvectors corresponding to the k smallest eigenvalues. The final step of spectral clustering is to cluster the rows of Y, for which techniques such as k-means clustering can easily be used. The result represents k subgraphs, each strongly linked internally but weakly linked to the others. From the viewpoint of MSNs, a subgraph is a community structure among nodes that are strongly linked to each other.
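Eqs. (3) and (4) can be checked numerically. The sketch below (Python with NumPy; the four steady-state vectors and σ = 0.3 are hypothetical) builds S from Eq. (3) and $L_{\mathrm{norm}}$ from Eq. (4). It keeps $S_{ii} = 1$, which is what Eq. (3) gives for i = j; some implementations set the diagonal to zero instead.

```python
import numpy as np

def similarity_matrix(V: np.ndarray, sigma: float) -> np.ndarray:
    """Eq. (3): S_ij = exp(-||v_i - v_j||^2 / (2 sigma^2)) for rows v_i of V."""
    sq_dist = np.sum((V[:, None, :] - V[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def normalized_laplacian(S: np.ndarray) -> np.ndarray:
    """Eq. (4): L_norm = I - D^{-1/2} S D^{-1/2}, with D_ii = sum_j S_ij."""
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    return np.eye(S.shape[0]) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]

# Hypothetical steady-state vectors of four nodes over four spots:
# nodes 0 and 1 mostly stay in SP1, nodes 2 and 3 mostly in SP4.
V = np.array([[0.80, 0.05, 0.05, 0.10],
              [0.75, 0.10, 0.05, 0.10],
              [0.05, 0.05, 0.10, 0.80],
              [0.10, 0.05, 0.05, 0.80]])
S = similarity_matrix(V, sigma=0.3)
L = normalized_laplacian(S)
eigvals = np.linalg.eigvalsh(L)   # L is symmetric, eigenvalues in ascending order
print(np.round(eigvals, 4))
```

Because $L_{\mathrm{norm}}$ is symmetric and positive semidefinite, its smallest eigenvalue is 0, with eigenvector $D^{1/2}\mathbf{1}$. With two well-separated groups of nodes, as here, a second eigenvalue is also close to 0 while the others are far from it; this gap is why the eigenvectors of the smallest eigenvalues carry the community structure.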

In this paper, the motivation for using spectral clustering is to handle high-dimensional data. As the number of dimensions grows, traditional clustering techniques become less precise, whereas spectral clustering groups the data based on the eigenvectors of the Laplacian of the similarity graph.

3 Problem statement and system model

3.1 Problem statement

In previous studies, influence maximization was formulated as follows:

$$
\arg\max_{I \subseteq V} \gamma(I), \quad |I| \le k \qquad (5)
$$

where V is the set of all nodes, I is the set of influential nodes, and γ(I) is the expected number of active nodes at the end of the influence maximization process in a social network. Unlike in OSNs, diffusion in MSNs has to consider node behaviors such as contact frequency and mobility patterns when publishing information, because the network topology changes dynamically over time. The challenge is to find a subset I that minimizes the diffusion period in dynamic networks such as MSNs. The diffusion minimization problem for MSNs is described by the following equation:

$$
\arg\min_{I \subseteq V} \varphi(I), \quad |I| \le k \qquad (6)
$$

where φ(I) is the expected diffusion period when diffusion starts from the subset I, and the goal is to find the I that minimizes it. As a result, information diffusion has to consider both the total number of nodes reached from the initial set of k nodes and the amount of diffusion time.
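Eq. (6) can be illustrated on a toy instance. The Python sketch below assumes a simplified, deterministic contact model that is not the paper's: the network is given as a trace of timestamped pairwise contacts, φ(I) is the time at which the last node becomes active, and the minimization is an exhaustive search. The function names and the trace are hypothetical. The search enumerates all C(n, k) seed sets, so it is only practical for tiny networks.

```python
import math
from itertools import combinations

def diffusion_time(contacts, nodes, seeds):
    """Time at which every node is active, given a contact trace.

    contacts: iterable of (t, a, b), meaning nodes a and b meet at time t.
    Seeds start active; an inactive node becomes active when it meets an
    active node. Returns math.inf if some node is never reached.
    """
    active = set(seeds)
    if active >= set(nodes):
        return 0.0
    for t, a, b in sorted(contacts):
        if (a in active) != (b in active):   # exactly one endpoint is active
            active.update((a, b))
            if len(active) == len(nodes):
                return t
    return math.inf

def best_seed_set(contacts, nodes, k):
    """Exhaustive arg min of Eq. (6) over seed sets of size k.

    In this model extra seeds never delay activation, so checking sets of
    size exactly k is enough.
    """
    return min(combinations(nodes, k),
               key=lambda I: diffusion_time(contacts, nodes, I))

nodes = [1, 2, 3, 4, 5]
contacts = [(1, 1, 2), (2, 3, 4), (3, 2, 3), (4, 4, 5), (6, 1, 5)]
for k in (1, 2):
    I = best_seed_set(contacts, nodes, k)
    print(k, I, diffusion_time(contacts, nodes, I))
```

On this trace, the best single seed is node 3 (all nodes active at t = 6), and the best pair is (1, 3) (all nodes active at t = 4).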

3.2 System model

In this section, we introduce our system model and the assumptions used to solve the diffusion minimization problem in MSNs. We use HCMM [32] as the node mobility model because it can model the properties of human mobility and contact frequency. As shown in [32], the model captures spatial and temporal properties of human mobility that arise from social relationships, such as contact and mobility patterns. Each node has its own home community, speed, and location, and it periodically records its location history. In our environment, there are m mobile nodes. The undirected graph of the relationships among nodes is defined as G = (N, E), where N denotes the finite set of nodes and E denotes the finite set of links between nodes based on social behaviors such as contact frequency. We assume that Ni delivers a message to Nj whenever the two nodes come into contact, and that each node has memory space to store delivered messages and its current position.

Each node is in one of two states: active or inactive. Active nodes can deliver a message. Inactive nodes cannot deliver a message; they switch to active when they come into contact with an active node during the diffusion period. The diffusion process consists of a warm-up period and a diffusion period. The warm-up period is the time required to find the top-k influential nodes by exploiting node behavior. The diffusion period is the time required to propagate messages from the top-k influential nodes to the other nodes; when all nodes are active, the diffusion process terminates.

A central server (CS) stores the periodically recorded mobility information of the nodes in the entire network during the warm-up period. The CS also has memory space and hardware to analyze the recorded social information, and it creates a similarity graph G among nodes. In the MSN environment, the CS is also a mobile node, but it acts as a special-purpose node. The CS is deployed before the warm-up period begins. During the warm-up period, it periodically moves through the entire network area to store mobility logs and then creates snapshots. At the end of the warm-up period, the CS detects communities using a clustering technique. After the warm-up period, the CS no longer analyzes information in the MSN, and the nodes start contacting each other, beginning from the k influential nodes, during the diffusion period.

Figure 2 shows an example of the diffusion process between mobile nodes Ni and Nj, both of which move at t0 and t1. Node Ni is one of the top-k influential nodes, Nj is an inactive node, and t0 and t1 are time points in the diffusion period. When Ni and Nj are within communication range R of each other at t1, a contact occurs and Nj becomes active. As shown in Fig. 2, each node is in one of the two states and records the time of its first contact with an active node. The nodes then send the recorded times to the CS, and once all nodes have been switched to the active state, the CS measures the length of the diffusion period. We also consider the case in which nodes are relatively stable, or even static, during the warm-up and diffusion periods. During the warm-up period, such nodes periodically record the same current position and communicate with the CS. During the diffusion period, if there is no movement in the MSN, the number of active nodes does not increase; static inactive nodes are switched to the active state only when they come into contact with a moving active node.

In brief, the following assumptions are made:

• Each node in the MSN has a unique nodeID and the same communication radius. Thus, the nodes can deliver messages to each other and can record information such as their current spot, their state, and the time at which they became active.
• Each node has its own community, such as a home or special spots. The network is composed of communities, denoted as H = {H1, H2, …, Hl}, where l is the number of communities.
• The CS is aware of the global information of all the nodes in the entire network area because it periodically records mobility information from the nodes.
• We do not consider resource consumption (such as memory space, CPU, battery power, and bandwidth).
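The diffusion process of Fig. 2 can be replayed from recorded positions. The following Python sketch is a simplification under stated assumptions: positions are sampled at discrete time points, a contact means a Euclidean distance of at most R, and a message travels at most one hop per time point. The trajectory format and all names are hypothetical. Each node's first activation time is recorded, mirroring the times the nodes report to the CS, and the diffusion period is the latest of them.

```python
import math

def simulate_diffusion(trajectory, seeds, R):
    """Replay position samples and record each node's first activation time.

    trajectory: list of (t, {node_id: (x, y)}) in time order.
    seeds: the top-k influential nodes, active from the start.
    R: common communication range in metres.
    A node becomes active when it is within R of a node that was already
    active at the start of that time point (one hop per time point).
    Returns (first_contact, diffusion_period); the period is math.inf if
    some node is never reached.
    """
    first_contact = {n: 0.0 for n in seeds}
    for t, pos in trajectory:
        active_now = [n for n in pos if n in first_contact]
        for n, p in pos.items():
            if n in first_contact:
                continue
            if any(math.dist(p, pos[a]) <= R for a in active_now):
                first_contact[n] = t
    all_nodes = set().union(*(pos.keys() for _, pos in trajectory))
    if all_nodes <= set(first_contact):
        return first_contact, max(first_contact.values())
    return first_contact, math.inf

trajectory = [
    (0, {"Ni": (0, 0), "Nj": (30, 0), "Nk": (60, 0)}),
    (1, {"Ni": (20, 0), "Nj": (30, 0), "Nk": (60, 0)}),
    (2, {"Ni": (20, 0), "Nj": (50, 0), "Nk": (60, 0)}),
]
first_contact, period = simulate_diffusion(trajectory, ["Ni"], R=10)
print(first_contact, period)
```

In this example, Ni reaches Nj at t = 1 and Nj reaches Nk at t = 2, so the diffusion period is 2. If the trace stops at t = 1, Nk is never contacted and the period is infinite, which corresponds to the static-node case in this section.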

4 Proposed scheme

4.1 Overview

Community-based diffusion schemes minimize the diffusion period effectively by finding the top-k influential nodes within each community instead of in the whole network topology. For this reason, we propose CDMS, a community-detection-based diffusion scheme using Markov chain and spectral clustering. Our scheme consists of three steps. First, we exploit spot information to learn individual behaviors. Second, we represent each node as a vector of its individual behavior using the Markov chain. Finally, the vectors are clustered into communities using a clustering technique, and we determine the most influential node in each community.

4.2 Geographic regularity of node’s movement

The CS records the movement of each node to predict future node behaviors. During the warm-up period, the CS periodically accumulates the current position of each node in snapshots (SNs). The snapshot SNs represents the locations of all nodes in the network area at the s-th time point, where s is the index of the snapshot and t is the interval between consecutive snapshots. By exploiting the SNs, the CS can record the geographic regularity of each node's movement. After the CS collects an SN at every interval t, each SN is partitioned into network sections denoted by SP = {SP1, SP2, …, SPg}, where g is the number of SPs. Figure 3(a), (b), (c), and (d) show the topologies of the entire network area at times t0, t1, t2, and t3. As shown in Fig. 3, N2 moves from SP2 to SP4 between SN0 and SN1, while N1 stays in SP1 in both SN0 and SN1. Note that node N2 moves more actively than node N1, so they have different mobility patterns. In contrast, nodes N4 and N5 have the same mobility pattern because they are always located in the same spots at t0, t1, t2, and t3. In this manner, the CS stores the movements of every node during the warm-up period.
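Partitioning each snapshot into spots amounts to mapping coordinates to grid cells. The Python sketch below assumes a square area divided into equal cells numbered row by row starting from SP1, and uses the 450 m × 450 m area with 9 cells (3 × 3) from Sect. 5.1 as defaults; the numbering scheme and function names are assumptions for illustration. A four-spot layout like the one in Fig. 3 would use cells_per_side=2.

```python
def spot_id(x, y, area=450.0, cells_per_side=3):
    """Map a position in metres to a 1-based spot ID (SP1..SPg), row by row.

    Assumes a square area split into cells_per_side x cells_per_side equal
    cells; points on the far boundary are clamped into the last cell.
    """
    cell = area / cells_per_side
    col = min(int(x // cell), cells_per_side - 1)
    row = min(int(y // cell), cells_per_side - 1)
    return row * cells_per_side + col + 1

def spot_sequences(snapshots):
    """Convert snapshots [{node: (x, y)}, ...] into per-node spotID sequences."""
    sequences = {}
    for snapshot in snapshots:
        for node, (x, y) in snapshot.items():
            sequences.setdefault(node, []).append(spot_id(x, y))
    return sequences

# Two hypothetical snapshots of two nodes.
snapshots = [{"N1": (10, 10), "N2": (200, 20)},
             {"N1": (40, 30), "N2": (200, 300)}]
print(spot_sequences(snapshots))
```

Here N1 stays in SP1 in both snapshots, while N2 moves from SP2 to SP8; these per-node sequences are the input from which the transition matrix is estimated.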

Fig. 2 Diffusion periods at t0 and t1

4.3 Mobility pattern prediction

The goal of our scheme is to predict the mobility pattern of each node using the Markov chain. We assume that the CS uses tuples ⟨nodeID, spotID, time interval⟩ for each node Ni and constructs the transition matrix P during the warm-up period. The CS can store the sequence of spotIDs of every node during the warm-up period. To construct P, the CS calculates the conditional probability of the node's next movement given its current spot. As shown in Fig. 4, there are sixteen cases because the network area is divided into four spots: SP1, SP2, SP3, and SP4. Figure 4 shows the transition matrix P of node Ni between times t0 and t9; for example, the probability that Ni moves from SP1 to SP2 is 0.4. In this way, the CS can construct a 4 × 4 transition matrix P of the form in Eq. (1) for each node.

Fig. 3 Snapshots at t0, t1, t2, and t3

Fig. 4 Transition matrix P of node Ni between t0 and t9

After constructing the transition matrix P, the CS calculates vi, the steady-state vector of Ni, which represents the probability distribution of the node's location, by solving the homogeneous linear system given by the fixed point of Eq. (2) together with the condition that the entries sum to 1. For a concrete example, suppose that the network area is partitioned into four spots (SP1, SP2, SP3, SP4) and that vi and vj are ⟨0.8, 0.05, 0.05, 0.1⟩ and ⟨0.5, 0.05, 0.05, 0.4⟩, respectively. Then Ni has a higher probability of being located in SP1 than in any other spot, whereas Nj mostly moves between SP1 and SP4. Note that vi represents a mobility pattern describing how long a node stays in each spot of the network area.
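Counting transitions in a node's spotID sequence gives a matrix of the form in Eq. (1). The Python sketch below estimates $p_{ij}$ as the fraction of observed moves out of spot i that go to spot j, counting a stay as the move i → i. The sequence is hypothetical (it is not the data behind Fig. 4) and is chosen only so that $p_{12} = 0.4$, as in the example above; the handling of spots that are never left is an assumption.

```python
def transition_matrix(spots, g):
    """Estimate the g x g transition matrix P from consecutive spot IDs (1..g).

    p_ij = (# observed moves i -> j) / (# observed moves out of i).
    A spot with no outgoing observation gets a self-loop of 1 so that every
    row sums to 1 (an assumed convention, not specified in the paper).
    """
    counts = [[0] * g for _ in range(g)]
    for a, b in zip(spots, spots[1:]):
        counts[a - 1][b - 1] += 1
    P = []
    for i, row in enumerate(counts):
        total = sum(row)
        if total == 0:
            P.append([1.0 if j == i else 0.0 for j in range(g)])
        else:
            P.append([c / total for c in row])
    return P

# Hypothetical spot IDs of node Ni in ten snapshots t0..t9.
spots = [1, 2, 1, 1, 3, 1, 2, 4, 1, 1]
for row in transition_matrix(spots, 4):
    print(row)
```

Here five moves leave SP1 and two of them go to SP2, so $p_{12} = 2/5 = 0.4$. The self-loop convention keeps P row-stochastic, but it can make the chain non-ergodic, in which case the steady-state vector is not unique.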

4.4 Applying spectral clustering

To minimize the diffusion period in MSNs, we exploit a community concept using the spectral clustering technique. Algorithm 1 describes the spectral clustering algorithm used in our scheme. Spectral clustering consists of three steps, which we describe in detail. The first step constructs a similarity matrix $S \in \mathbb{R}^{n \times n}$, where $S_{ij} \ge 0$ reflects the relationship between nodes i and j according to Eq. (3). The next step constructs the Laplacian matrix L by Eq. (4). L is the main tool of the spectral clustering algorithm for solving the graph-partitioning problem, through its eigenvalues and eigenvectors. Then, the n × k matrix Y is created, whose columns are the eigenvectors corresponding to the k smallest eigenvalues of L. In the final step, the rows of Y are grouped by k-means clustering into communities C = {C1, C2, C3, …, Ck}, so the number of communities is equal to k.
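Algorithm 1 can be sketched end to end in Python with NumPy. Two choices here are assumptions rather than details from the paper: σ = 0.3, and scaling each row of Y to unit length before k-means, which is the normalized variant of Ng, Jordan, and Weiss described in von Luxburg's tutorial [13]. A small Lloyd's k-means with deterministic farthest-point seeding stands in for a library implementation; all names and the input vectors are hypothetical.

```python
import numpy as np

def kmeans(X, k, iters=100):
    """Lloyd's k-means on the rows of X with deterministic farthest-point seeding."""
    centers = [X[0]]
    for _ in range(1, k):
        # Next seed: the row farthest from all centers chosen so far.
        dist = np.min([np.sum((X - c) ** 2, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        sq = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = np.argmin(sq, axis=1)
        new_centers = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def spectral_communities(V, k, sigma=0.3):
    """Algorithm 1: group steady-state vectors (rows of V) into k communities."""
    V = np.asarray(V, dtype=float)
    sq_dist = np.sum((V[:, None, :] - V[None, :, :]) ** 2, axis=-1)
    S = np.exp(-sq_dist / (2.0 * sigma ** 2))                   # step 1: Eq. (3)
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    L = np.eye(len(V)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]  # Eq. (4)
    _, eigvecs = np.linalg.eigh(L)                              # step 2: ascending order
    Y = eigvecs[:, :k]                                          # step 3: first k eigenvectors
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)            # assumed row normalization
    return kmeans(Y, k)                                         # step 4: k-means on rows of Y

# Hypothetical steady-state vectors of six nodes over four spots.
V = [[0.80, 0.05, 0.05, 0.10], [0.75, 0.10, 0.05, 0.10],
     [0.05, 0.80, 0.10, 0.05], [0.10, 0.75, 0.10, 0.05],
     [0.05, 0.05, 0.10, 0.80], [0.10, 0.05, 0.05, 0.80]]
labels = spectral_communities(V, k=3)
print(labels)
```

On these six vectors, nodes with similar steady-state vectors (0 and 1, 2 and 3, 4 and 5) end up in the same community; the label numbers themselves are arbitrary.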

4.5 Finding seed node set

The CS selects one seed node per community by calculating the Euclidean distance between the community's centroid vector and the steady-state vectors of its member nodes. The distance between $v_c^i$ and $v_j^i$ is defined as

$$
D(x, y) = \frac{\sqrt{\sum_{i=1}^{g} (x_i - y_i)^2}}{\sqrt{g}} \qquad (7)
$$

where x and y are $v_c^i$ and $v_j^i$, respectively; $v_c^i$ is the centroid vector of community $C_i$, and $v_j^i$ is the steady-state vector of the j-th node in $C_i$. Algorithm 2 determines the k influential nodes according to Eq. (7). As shown in Algorithm 2, one seed node is selected in each community $C_i$: the node whose steady-state vector is most similar to the centroid vector $v_c^i$. The CS thus determines the top-k node set I in k iterations. After the warm-up period, the top-k influential nodes switch the inactive nodes to the active state, and the CS is no longer involved. When all the nodes have been contacted by active nodes, the diffusion process terminates, as described in Sect. 3.
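Algorithm 2 and Eq. (7) translate directly into code. In the Python sketch below, the centroid of a community is taken to be the mean of its members' steady-state vectors, which is an assumption (the text only speaks of the centroid vector), and the community labels are assumed to come from a clustering step such as Algorithm 1. Node names, vectors, and labels are hypothetical.

```python
import math

def distance(x, y):
    """Eq. (7): Euclidean distance between x and y divided by sqrt(g)."""
    g = len(x)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y))) / math.sqrt(g)

def select_seeds(vectors, labels):
    """Algorithm 2 sketch: in each community, pick the node closest to its centroid.

    vectors: {node_id: steady-state vector}; labels: {node_id: community}.
    The centroid is assumed to be the mean vector of the community's members.
    Returns the seed set I, one node per community.
    """
    communities = {}
    for node, c in labels.items():
        communities.setdefault(c, []).append(node)
    I = set()
    for members in communities.values():
        g = len(vectors[members[0]])
        centroid = [sum(vectors[n][s] for n in members) / len(members)
                    for s in range(g)]
        seed = min(members, key=lambda n: distance(centroid, vectors[n]))
        I.add(seed)
    return I

vectors = {
    "N1": [0.80, 0.05, 0.05, 0.10], "N2": [0.70, 0.10, 0.10, 0.10],
    "N3": [0.60, 0.20, 0.10, 0.10], "N4": [0.50, 0.05, 0.05, 0.40],
    "N5": [0.10, 0.05, 0.05, 0.80], "N6": [0.20, 0.05, 0.05, 0.70],
}
labels = {"N1": 0, "N2": 0, "N3": 0, "N4": 1, "N5": 1, "N6": 1}
print(sorted(select_seeds(vectors, labels)))
```

Dividing by √g does not change which node is closest to a centroid; it only rescales the distance so that values are comparable across different numbers of spots. In this example, N2 and N6 are selected.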

5 Performance evaluation

5.1 Simulation environment

We use the network simulator NS-2 v2.35 to build the diffusion process of CDMS and analyze its results. All nodes in the network area move according to HCMM [32], which is frequently used in MSN simulations. The number of nodes ranges from 40 to 90. Because k should be small in the real world, the number of top-k influential nodes is at most 20% of the total nodes in each setting. The network size is set to 450 m × 450 m, and the area is divided into 9 grid cells, each of which is an SP. Each node has its own community Hi as a home, and the total number of communities is 4. Communication ranges are set to 1, 5, 10, 15, and 20 m. Node velocity ranges from 2 to 10 m/s, which is appropriate for the movement of both people and vehicles in MSNs. The warm-up period is set to 500 s to collect the movement patterns of the nodes, and the diffusion period is 5000 s (excluding the warm-up period). Each node is initially located in its community cell, as in the HCMM environment. Table 1 shows the parameters of the simulation environment in detail.

Figure 5 shows our simulated network in the HCMM environment, where each home cell represents a community. The nodes frequently move to one of their home cells, issue logs of their current context, and then communicate with the CS (see Sect. 4). The CS builds SNs every 10 s. At the end of the warm-up period, the CS calculates the mobility pattern of each node using a Markov process and extracts a steady-state vector that represents how long the node would stay at each SP. After extracting the vectors for all the nodes, the CS clusters them using the spectral clustering technique. Finally, the top-k influential nodes are selected; in each community, the selected node is more similar to the community centroid than the other nodes are. After the warm-up period, the inactive nodes are activated by the top-k influential nodes. Once all nodes have been contacted by active nodes, the simulation is terminated. We run each scheme 20 times and measure the average diffusion time, excluding the warm-up period.

Algorithm 1. Community detection using spectral clustering
Input: data points v1, v2, …, vm; number of clusters k
Output: k community groups
1 Construct a similarity matrix S and the normalized Laplacian matrix Lnorm
2 Compute the eigenvalue decomposition of Lnorm
3 Form the matrix Y by stacking, as columns, the eigenvectors corresponding to the k smallest eigenvalues of Lnorm
4 Apply k-means clustering to the rows of Y

Algorithm 2. Determining k influential nodes
Input: top-k influential node set I = ∅
Output: seed node set I
for i = 1 … k do
    vci ← centroid vector of community i
    seed ← argmin_j D(vci, vji)
    I ← I ∪ {seed}
end for
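Algorithm 1 can be sketched in NumPy as follows. This is an illustration under stated assumptions rather than the authors' code: the similarity matrix S uses a Gaussian kernel with a hypothetical bandwidth `sigma`, Lnorm is taken to be the symmetric normalized Laplacian I − D^(-1/2) S D^(-1/2), the rows of Y are normalized to unit length as in Ng et al. [14] (Algorithm 1 does not state this step), and a small k-means with deterministic farthest-point initialization stands in for a library implementation. The function name `spectral_communities` is hypothetical, and the sketch assumes every point has at least one non-negligible similarity (nonzero degree).

```python
import numpy as np

def spectral_communities(V, k, sigma=0.1, iters=100):
    V = np.asarray(V, dtype=float)
    # Step 1: Gaussian similarity matrix S (zero diagonal) and Lnorm = I - D^-1/2 S D^-1/2.
    sq = ((V[:, None, :] - V[None, :, :]) ** 2).sum(axis=-1)
    S = np.exp(-sq / (2 * sigma ** 2))
    np.fill_diagonal(S, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(S.sum(axis=1))
    L = np.eye(len(V)) - d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    # Step 2: eigendecomposition; eigh returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(L)
    # Step 3: Y = eigenvectors of the k smallest eigenvalues (assumed row normalization).
    Y = eigvecs[:, :k]
    Y = Y / np.linalg.norm(Y, axis=1, keepdims=True)
    # Step 4: k-means on the rows of Y, seeded by farthest-point initialization.
    centers = [Y[0]]
    for _ in range(1, k):
        dist = np.min([((Y - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Y[np.argmax(dist)])
    centers = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((Y[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1), axis=1)
        new = np.array([Y[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels

# Hypothetical steady-state vectors over 3 SPs: two groups of similar mobility patterns.
V = np.array([[0.80, 0.10, 0.10], [0.75, 0.15, 0.10], [0.85, 0.05, 0.10],
              [0.10, 0.10, 0.80], [0.15, 0.10, 0.75], [0.10, 0.05, 0.85]])
labels = spectral_communities(V, k=2)
print(labels)  # [0 0 0 1 1 1]
```

The k-means step runs on the spectral embedding Y rather than on the raw vectors; CDMK, by contrast, applies k-means directly to the steady-state vectors.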

5.2 Simulation results

CDMS is compared with three schemes: RAND, K-CENTER [8], and a community-based diffusion scheme using Markov chain and k-means clustering (CDMK). In RAND, the top-k influential nodes are selected randomly. The K-CENTER scheme is based on a graph G whose edges represent the contact frequencies among all nodes, as described in Sect. 3; the selection of the top-k influential nodes is formulated as the asymmetric k-center problem, which provides a solution for diffusion minimization under our diffusion model. CDMK is similar to our proposed scheme, because the authors of [6] solved the diffusion minimization problem using social information and a community concept. However, our HCMM experiments do not provide the dynamic social features used in [6]. Therefore, we replaced those social features with the steady-state vectors and implemented the community concept through k-means clustering, as described in [6].

5.2.1 Number of nodes

To evaluate the performance in different network environments, we examined each scheme while varying the number of nodes from 40 to 90. As expected, the diffusion times of all schemes decreased as the number of nodes increased. As shown in Fig. 6, RAND had the worst performance, and K-CENTER performed better than RAND. As the number of nodes increased, the diffusion time of K-CENTER changed little compared with those of the community-based schemes, especially in more dynamic and complex networks. Meanwhile, the performances of the community-based schemes were up to 10% higher than those of the noncommunity-based schemes. This is because the community-based schemes select influential nodes from individual communities rather than from the entire network, as the K-CENTER scheme does. In Fig. 6, when the number of nodes is 40, the community-based schemes perform best because the number of clusters equals the number of home-cell communities: with the default ratio of 10%, k = 4 for 40 nodes, which matches the 4 home-cell communities in Table 1. CDMS has the best performance except in the cases of 40 and 60 nodes, because the spectral clustering technique outperforms traditional clustering algorithms such as the k-means algorithm [13]. Although CDMK performs better than our scheme in the 40- and 60-node cases, the difference in the diffusion times between the two community-based schemes is small.

Table 1 Simulation parameters

Parameter (unit)                          Value (default)
Number of nodes                           40, 50, 60, 70, 80, 90 (80)
Ratio of k influential nodes (%)          5, 7.5, 10, 15, 20 (10)
Size of the network (m)                   450 × 450
Number of home-cell communities           4
Community size (m)                        150 × 150
Node speed (m/s)                          2 to 10
Radius of communication range (m)         1, 5, 10, 15, 20 (10)
Interval for SN (s)                       10
Warm-up period (s)                        500
Total diffusion process (s)               5000

Fig. 5 HCMM cells in network simulation

Fig. 6 Diffusion times with different numbers of nodes

Fig. 7 Diffusion times with different ratios of influential nodes

Fig. 8 Diffusion times with different communication ranges

5.2.2 Ratio of influential nodes

Because the number of influential nodes in the real world should be small, we set k to at most 20% of the total nodes in each setting. Figure 7 shows the average diffusion times for different ratios of influential nodes. When k exceeds 20% of the total nodes, no performance difference is observed. RAND still has the worst performance, while K-CENTER performs better than RAND. The community-based schemes always perform better than the noncommunity-based schemes, and CDMS has the best performance. In addition, as the ratio of k influential nodes increases, the diffusion time of K-CENTER changes little compared with those of the community-based schemes. This means that, in dense environments, K-CENTER chooses influential nodes only in large communities. Since noncommunity-based schemes do not consider a community structure, it is difficult for them to find critical nodes, such as isolated or nonactive nodes, for diffusion. In community-based schemes, by contrast, an influential node is selected in each community, and nodes in the same community have similar mobility patterns, so critical nodes have high contact probabilities with the influential node of their community. Therefore, community-based schemes perform better than noncommunity-based schemes.

Fig. 9 Percentage of active nodes in diffusion period. a Communication range of 1 m. b Communication range of 10 m

5.2.3 Communication ranges

Figure 8 shows the diffusion times for communication ranges of 1, 5, 10, 15, and 20 m. When the communication range is 1 m, no scheme can propagate information to all the nodes, because a few nodes are not reached by the k influential nodes within the diffusion period. Thus, for the 1 m case only, we measure the diffusion time as the time at which the percentage of active nodes reaches a threshold of 95%, which is the maximum percentage of affected nodes. A larger communication range corresponds to a denser MSN. Thus, as the communication range becomes wider, all schemes have shorter diffusion times because the contact probabilities between nodes increase. As shown in Fig. 8, RAND performs worst and K-CENTER outperforms RAND, while the community-based schemes perform better than both RAND and K-CENTER. When the range is 1 m, the performance differences among the schemes are very large because the MSN is sparse. In the denser MSNs, with ranges of 5, 10, 15, and 20 m, the community-based schemes still outperform the other two algorithms, but the differences are relatively small compared with the 1 m case, because sparse MSNs contain many isolated or nonactive nodes. CDMS and CDMK take these nodes into account by constructing community structures from the distributions of the nodes' mobility patterns. Therefore, community-based schemes work well in sparse networks.
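The diffusion-time measurement can be illustrated with a hypothetical helper that replays a time-ordered contact trace recorded after the warm-up period and returns the first time the fraction of active nodes reaches a threshold (1.0 for full coverage, 0.95 for the 1 m case). The contact model, in which an inactive node becomes active when it meets an active node, is a simplified reading of the diffusion process of Sect. 3; the function name and the (time, u, v) input format are assumptions of this sketch.

```python
from typing import Iterable, Optional, Sequence, Set, Tuple

def diffusion_time(num_nodes: int,
                   seeds: Iterable[int],
                   contacts: Sequence[Tuple[float, int, int]],
                   threshold: float = 1.0) -> Optional[float]:
    """Return the first contact time at which the active fraction >= threshold.

    contacts: (time, u, v) tuples sorted by time, measured after the warm-up.
    An inactive node becomes active when it meets an active node.
    Returns None if the threshold is never reached within the trace.
    """
    active: Set[int] = set(seeds)
    if len(active) / num_nodes >= threshold:
        return 0.0
    for t, u, v in contacts:
        if (u in active) != (v in active):   # exactly one endpoint is active
            active.update((u, v))            # the inactive endpoint is activated
            if len(active) / num_nodes >= threshold:
                return t
    return None

contacts = [(10.0, 0, 1), (20.0, 2, 3), (30.0, 1, 2), (40.0, 3, 4)]
print(diffusion_time(5, [0], contacts))                  # None
print(diffusion_time(5, [0], contacts, threshold=0.6))   # 30.0
```

The contact between nodes 2 and 3 at 20 s does not spread anything because neither node is active yet; this ordering effect is why isolated or rarely met nodes dominate the diffusion time in sparse networks.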

5.2.4 Percentages of active nodes in diffusion period

Finally, we compare the number of active nodes for each scheme during the diffusion period. Figure 9 shows the percentage of active nodes over the diffusion period; Fig. 9(a) and (b) correspond to sparse and dense networks, respectively, as determined by each node's communication range. When the communication range is 1 m, a few nodes may not be reached by the k influential nodes within the diffusion period. Thus, we measure the diffusion time when the percentage reaches the threshold value given in Sect. 5.2.3. Figure 9(a) shows the average percentage of active nodes with a communication range of 1 m, where RAND has the worst performance and K-CENTER performs better than RAND. The community-based schemes have shorter diffusion processes than RAND and K-CENTER. CDMS has the best performance for most of the diffusion time, except for the interval between 420 and 520 s. Figure 9(b) shows the average percentage of active nodes with a communication range of 10 m, where K-CENTER has the best performance between 20 and 130 s; however, the K-CENTER scheme has difficulty propagating information to a few disconnected or nonactive nodes. Meanwhile, the community-based schemes still outperform the other two schemes for the reasons given in Sect. 5.2.3. In summary, the community-based diffusion schemes shorten the diffusion process in sparse networks because they find isolated or nonactive nodes and reach them through nodes with the same mobility patterns, thereby propagating information through the entire network.

6 Conclusion

We addressed the problem of finding the top-k influential nodes that propagate information to all nodes in a dynamic network as quickly as possible, referred to as the diffusion minimization problem. In this paper, we studied the diffusion minimization problem in MSNs and proposed CDMS, a novel diffusion scheme in which influential nodes are selected using node behaviors and community detection techniques. Selecting the influential nodes from within communities, instead of from the entire network topology, proved more effective for diffusion minimization. Since the community-based diffusion schemes also account for nonactive and isolated nodes, the simulation results showed that they performed better than the noncommunity-based schemes regardless of the sparseness of the MSNs. In addition, because the spectral clustering technique has many advantages over k-means clustering, CDMS performs better than CDMK. All the methods use a warm-up period and a CS: during the warm-up period, the main resource consumption (such as memory, CPU, and battery) is borne by the central server, and the information diffusion strategy is then executed on the mobile nodes; the resource consumption of the mobile app is related to the sensing frequency, which is not the focus of this paper. As future work, we plan to study various social behaviors among nodes and recent clustering techniques for detecting communities, in order to diffuse information more effectively with the various resources in MSNs.

Acknowledgements This research was supported by the Basic

Science Research Program through the National Research Foundation

of Korea (NRF) funded by the Ministry of Education, Science, and

Technology (2016R1A2B4010142).

References

1. Ma, H., Yang, H., Lyu, M. R., & King, I. (2008). Mining social networks using heat diffusion processes for marketing candidates selection. In Proceedings of the 17th ACM conference on Information and knowledge management (pp. 233–242).
2. Richardson, M., & Domingos, P. (2002). Mining knowledge-sharing sites for viral marketing. In Proceedings of the 8th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 61–70).
3. Nguyen, H. A., & Silvia, G. (2009). Routing in opportunistic networks. International Journal of Ambient Computing and Intelligence, 1(3), 19–38.
4. Conti, M., Giordano, S., May, M., & Passarella, A. (2010). From opportunistic networks to opportunistic computing. IEEE Communications Magazine, 48(9), 126–139.
5. Lu, Z., Wen, Y., & Cao, G. (2014). Information diffusion in mobile social networks: The speed perspective. In Proceedings of IEEE INFOCOM (pp. 1932–1940).
6. Chen, X., & Xiong, K. (2015). Dynamic social feature-based diffusion in mobile social networks. In Proceedings of IEEE/CIC International Conference on Communications in China (ICCC) (pp. 1–6).
7. Myers, S. A., Zhu, C., & Leskovec, J. (2012). Information diffusion and external influence in networks. In Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 33–41).
8. Panigrahy, R., & Vishwanathan, S. (1998). An O(log* n) approximation algorithm for the asymmetric p-center problem. Journal of Algorithms, 27(2), 259–268.
9. Girvan, M., & Newman, M. E. (2002). Community structure in social and biological networks. Proceedings of the National Academy of Sciences, 99(12), 7821–7826.
10. Hsu, W. J., Spyropoulos, T., Psounis, K., & Helmy, A. (2007). Modeling time-variant user mobility in wireless mobile networks. In Proceedings of IEEE INFOCOM (pp. 758–766).
11. van Gennip, Y., Hunter, B., Ahn, R., Elliott, P., Luh, K., Halvorson, M., et al. (2013). Community detection using spectral clustering on sparse geosocial data. SIAM Journal on Applied Mathematics, 73(1), 67–83.
12. Zhang, S., Wang, R. S., & Zhang, X. S. (2007). Identification of overlapping community structure in complex networks using fuzzy c-means clustering. Physica A: Statistical Mechanics and its Applications, 374(1), 483–490.
13. Von Luxburg, U. (2007). A tutorial on spectral clustering. Statistics and Computing, 17(4), 395–416.
14. Ng, A. Y., Jordan, M. I., & Weiss, Y. (2001). On spectral clustering: Analysis and an algorithm. In Proceedings of Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press.
15. Network Simulator-2. (2014). http://www.isi.edu/nsnam/ns/.
16. Christakis, N. A., & Fowler, J. H. (2007). The spread of obesity in a large social network over 32 years. New England Journal of Medicine, 357(4), 370–379.
17. Centola, D., Eguíluz, V. M., & Macy, M. W. (2007). Cascade dynamics of complex propagation. Physica A: Statistical Mechanics and its Applications, 374(1), 449–456.
18. Lambiotte, R., & Panzarasa, P. (2009). Communities, knowledge creation, and information diffusion. Journal of Informetrics, 3(3), 180–190.
19. Sun, X., Lu, Z., Zhang, X., Salathé, M., & Cao, G. (2015). Targeted vaccination based on a wireless sensor system. In Proceedings of Pervasive Computing and Communications Workshops (pp. 215–220).
20. Bakshy, E., Rosenn, I., Marlow, C., & Adamic, L. (2012). The role of social networks in information diffusion. In Proceedings of the 21st international conference on World Wide Web (pp. 519–528).
21. Romero, D. M., Meeder, B., & Kleinberg, J. (2011). Differences in the mechanics of information diffusion across topics: Idioms, political hashtags, and complex contagion on Twitter. In Proceedings of the 20th international conference on World Wide Web (pp. 695–704).
22. Domingos, P., & Richardson, M. (2001). Mining the network value of customers. In Proceedings of the 7th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 57–66).
23. Kempe, D., Kleinberg, J., & Tardos, E. (2003). Maximizing the spread of influence through a social network. In Proceedings of the 9th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 137–146).
24. Wang, Y., Cong, G., Song, G., & Xie, K. (2010). Community-based greedy algorithm for mining top-k influential nodes in mobile social networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 1039–1048).
25. Han, B., Hui, P., Kumar, V. A., Marathe, M. V., Shao, J., & Srinivasan, A. (2012). Mobile data offloading through opportunistic communications and social participation. IEEE Transactions on Mobile Computing, 11(5), 821–834.
26. Markov chain. (2016). https://en.wikipedia.org/wiki/Markov_chain.
27. Soelistijanto, B., & Howarth, M. (2012). Traffic distribution and network capacity analysis in social opportunistic networks. In Proceedings of the 8th IEEE international conference on wireless and mobile computing, networking and communications (WiMob) (pp. 823–830).
28. Lee, J. K., & Hou, J. C. (2006). Modeling steady-state and transient behaviors of user mobility: Formulation, analysis, and application. In Proceedings of the 7th ACM international symposium on mobile ad hoc networking and computing (pp. 85–96).
29. Yu, Z., Yu, Z., & Chen, Y. (2016). Multi-hop mobility prediction. Mobile Networks and Applications, 21(2), 367–374.
30. Donath, W. E., & Hoffman, A. J. (1973). Lower bounds for the partitioning of graphs. IBM Journal of Research and Development, 17(5), 420–425.
31. Fiedler, M. (1973). Algebraic connectivity of graphs. Czechoslovak Mathematical Journal, 23(2), 298–305.
32. Boldrini, C., & Passarella, A. (2010). HCMM: Modelling spatial and temporal properties of human mobility driven by users' social relationships. Computer Communications, 33(9), 1056–1074.


Jegwang Ryu is currently an M.S. candidate in computer science at Yonsei University in Korea. His research interests include mobile social networks, delay tolerant networks, and machine learning.

Jiho Park is currently a Ph.D. candidate in computer science at Yonsei University in Korea. His research interests include mobile social networks, machine learning, deep learning, and social network analysis.

Junyeop Lee is currently a Ph.D. candidate in computer science at Yonsei University in Korea. His research interests include mobile social networks, machine learning, deep learning, and social network analysis.

Sung-Bong Yang received his M.S. and Ph.D. from the Dept. of Computer Science at the University of Oklahoma in 1986 and 1992, respectively. He has been a professor at Yonsei University since 1994. His research interests include graph algorithms, mobile computing, machine learning, and social network analysis.
