Clustering determines the dynamics of complex contagions...

15
Clustering determines the dynamics of complex contagions in multiplex networks Yong Zhuang, 1 Alex Arenas, 2 and Osman Ya˘ gan 1 1 Department of Electrical and Computer Engineering, Carnegie Mellon University, Pittsburgh, PA 15213 USA 2 Departament d’Enginyeria Inform´ atica i Matem´ atiques, Universitat Rovira i Virgili, 43007 Tarragona, Spain (Dated: December 8, 2016) We present the mathematical analysis of generalized complex contagions in a class of clustered multiplex networks. The model is intended to understand spread of influence, or any other spread- ing process implying a threshold dynamics, in setups of interconnected networks with significant clustering. The contagion is assumed to be general enough to account for a content-dependent linear threshold model, where each link type has a different weight (for spreading influence) that may depend on the content (e.g., product, rumor, political view) that is being spread. Using the generating functions formalism, we determine the conditions, probability, and expected size of the emergent global cascades. This analysis provides a generalization of previous approaches and is specially useful in problems related to spreading and percolation. The results present non trivial dependencies between the clustering coefficient of the networks and its average degree. In particular, several phase transitions are shown to occur depending on these descriptors. Generally speaking, our findings reveal that increasing clustering decreases the probability of having global cascades and their size, however this tendency changes with the average degree. There exists a certain average degree from which on clustering favors the probability and size of the contagion. By comparing the dynamics of complex contagions over multiplex networks and their monoplex projections, we demonstrate that ignoring link types and aggregating network layers may lead to inaccurate con- clusions about contagion dynamics, particularly when the correlation of degrees between layers is high. PACS numbers: I. INTRODUCTION The study of dynamical processes on real-world com- plex networks has been an active research area over the past decade. Some of the most widely studied problems include cascading failures in interdependent networks [1– 6], simple contagions (e.g., disease propagation in human and animal populations [7–25], etc.), and complex con- tagions (e.g., diffusion of influence, beliefs, norms, and innovations in social networks [26–29]). Recently, the attention was shifted from single, isolated networks to multiplex and multi-layer networks [17, 18, 29–43]. This shift is primarily driven by the observation that links in a network might be categorized according to the nature of the relationship they represent (e.g., friends, family, office-mates) as well as according to the social network they belong (e.g., Google+ vs. Facebook links), and each link type might play a different role in the dynamical pro- cess. In this work, we focus on the analysis of complex con- tagion processes that take place on multiplex (or, multi- layer) networks. In doing so, we adopt the content- dependent linear threshold model of social contagions proposed by Ya˘ gan and Gligor [29]. Their framework is a generalization of the linear threshold model intro- duced by Watts [28] and is based on individuals adopt- ing a behavior when their perceived proportion of active neighbors exceeds a certain threshold; the key to that modeling framework is that one’s perceived influence de- pends on the types of the relationships they have and the context in which diffusion is being considered. More precisely, each individual in the network can be in one of the two states, active or inactive. Each link type-i is associated with a content-dependent weight c i in [0, ] that encodes the relative importance of this link type in spreading the given content. Then, an inactive node with m i active neighbors and d i - m i inactive neighbors via type-i links turns active only if i c i m i i c i d i τ where τ is the node’s threshold drawn from a distribution P (τ ). It is assumed that nodes update their state syn- chronously and once active, a node stays active forever. Ya˘ gan and Gligor analyzed [29] the content-dependent linear threshold model in multiplex networks and derived the conditions, probability, and expected size of global cascades, i.e., cases where activating a randomly selected node leads to activation of a positive fraction of the pop- ulation in the limit of large system sizes. However, their multiplex network model was formed by combining in- dependent layers of networks (one for each link type), where each layer is generated by the configuration model [44, 45]. Although a good starting point, configuration model is known to generate networks that can not ac- curately capture some important aspects of real-world social networks, most notably the property of high clus- tering [46, 47]. Informally known as the phenomenon that “friends of our friends” are likely to be our friends, clus- tering has been shown to affect the dynamics of various

Transcript of Clustering determines the dynamics of complex contagions...

Page 1: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

Clustering determines the dynamics of complex contagions in multiplex networks

Yong Zhuang,1 Alex Arenas,2 and Osman Yagan1

1Department of Electrical and Computer Engineering,Carnegie Mellon University, Pittsburgh, PA 15213 USA2Departament d’Enginyeria Informatica i Matematiques,

Universitat Rovira i Virgili, 43007 Tarragona, Spain(Dated: December 8, 2016)

We present the mathematical analysis of generalized complex contagions in a class of clusteredmultiplex networks. The model is intended to understand spread of influence, or any other spread-ing process implying a threshold dynamics, in setups of interconnected networks with significantclustering. The contagion is assumed to be general enough to account for a content-dependentlinear threshold model, where each link type has a different weight (for spreading influence) thatmay depend on the content (e.g., product, rumor, political view) that is being spread. Using thegenerating functions formalism, we determine the conditions, probability, and expected size of theemergent global cascades. This analysis provides a generalization of previous approaches and isspecially useful in problems related to spreading and percolation. The results present non trivialdependencies between the clustering coefficient of the networks and its average degree. In particular,several phase transitions are shown to occur depending on these descriptors. Generally speaking,our findings reveal that increasing clustering decreases the probability of having global cascades andtheir size, however this tendency changes with the average degree. There exists a certain averagedegree from which on clustering favors the probability and size of the contagion. By comparingthe dynamics of complex contagions over multiplex networks and their monoplex projections, wedemonstrate that ignoring link types and aggregating network layers may lead to inaccurate con-clusions about contagion dynamics, particularly when the correlation of degrees between layers ishigh.

PACS numbers:

I. INTRODUCTION

The study of dynamical processes on real-world com-plex networks has been an active research area over thepast decade. Some of the most widely studied problemsinclude cascading failures in interdependent networks [1–6], simple contagions (e.g., disease propagation in humanand animal populations [7–25], etc.), and complex con-tagions (e.g., diffusion of influence, beliefs, norms, andinnovations in social networks [26–29]). Recently, theattention was shifted from single, isolated networks tomultiplex and multi-layer networks [17, 18, 29–43]. Thisshift is primarily driven by the observation that links ina network might be categorized according to the natureof the relationship they represent (e.g., friends, family,office-mates) as well as according to the social networkthey belong (e.g., Google+ vs. Facebook links), and eachlink type might play a different role in the dynamical pro-cess.

In this work, we focus on the analysis of complex con-tagion processes that take place on multiplex (or, multi-layer) networks. In doing so, we adopt the content-dependent linear threshold model of social contagionsproposed by Yagan and Gligor [29]. Their frameworkis a generalization of the linear threshold model intro-duced by Watts [28] and is based on individuals adopt-ing a behavior when their perceived proportion of activeneighbors exceeds a certain threshold; the key to thatmodeling framework is that one’s perceived influence de-pends on the types of the relationships they have and

the context in which diffusion is being considered. Moreprecisely, each individual in the network can be in oneof the two states, active or inactive. Each link type-i isassociated with a content-dependent weight ci in [0,∞]that encodes the relative importance of this link type inspreading the given content. Then, an inactive node withmi active neighbors and di − mi inactive neighbors viatype-i links turns active only if∑

i cimi∑i cidi

≥ τ

where τ is the node’s threshold drawn from a distributionP (τ). It is assumed that nodes update their state syn-chronously and once active, a node stays active forever.

Yagan and Gligor analyzed [29] the content-dependentlinear threshold model in multiplex networks and derivedthe conditions, probability, and expected size of globalcascades, i.e., cases where activating a randomly selectednode leads to activation of a positive fraction of the pop-ulation in the limit of large system sizes. However, theirmultiplex network model was formed by combining in-dependent layers of networks (one for each link type),where each layer is generated by the configuration model[44, 45]. Although a good starting point, configurationmodel is known to generate networks that can not ac-curately capture some important aspects of real-worldsocial networks, most notably the property of high clus-tering [46, 47]. Informally known as the phenomenon that“friends of our friends” are likely to be our friends, clus-tering has been shown to affect the dynamics of various

Page 2: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

2

diffusion processes [31, 48–55] significantly.

With this in mind, our main contribution in this pa-per is to provide a thorough analysis of influence spreadprocess in a class of clustered multiplex networks. In par-ticular, we study the content dependent linear thresholdmodel in a multiplex network model where each link type(or, network layer) is formed by the clustered randomnetwork model proposed by Newman [48] and Miller [56].We solve for the critical threshold, probability, and ex-pected size of global cascades and confirm our analyticalresults via extensive simulations. The main observationfrom our results is that clustering has a double-facetedimpact on the probability and expected size of globalcascades. Namely, we show that clustering decreases theprobability and size of cascades when average degree inthe network is small, whereas after a certain value of av-erage degree, clustering is shown to facilitate cascades.

We also compare the dynamics of complex contagionsover multiplex networks and their monoplex projections.There has been recent interest [30] in understandingwhether monoplex projection of a multiplex network (ob-tained by ignoring the colors of edges and aggregatingthe layers) can still capture the essential properties (e.g.,cascade threshold and size) of a diffusion process. Inthe affirmative, this would eliminate the need for consid-ering the full multiplex structure of real-world systemsin tackling similar problems. We show that even in thesimplest case where all link types have the same influenceweight (i.e., c1 = c2 = · · · ), monoplex theory may not beable to capture contagion dynamics accurately, reinforc-ing the need for studying multiplex networks in its correctsetup. We observe that the accuracy of monoplex theoryin capturing cascade dynamics over multiplex networksdepends tightly on the assortativity (i.e., correlation be-tween the degrees of connected pairs) of the network. Forinstance, when assortativity is negligible, monoplex the-ory is seen to predict cascade dynamics very well, whilein highly assortative cases its ability to predict contagionbehavior diminishes significantly.

Finally, we proof the possibility of an unforeseen be-havior in the dynamics of complex contagions in multi-plex networks, i.e., that of observing more than two phasetransitions in the cascade size as the mean degrees in net-work layers increase. It has been reported [18, 28] manytimes that threshold models of complex contagion exhibittwo phase transitions as the average degree increases; asecond-order transition at low degrees marking the for-mation of a giant component of vulnerable nodes and afirst-order transition at high degrees due to increased localstability of nodes. Here, we consider a multiplex whereone layer has degree distribution Poi(λ) while the de-gree in the second layer follows Poi(λ/α) with probabil-ity α and is zero with probability 1− α. In this setting,we observe that in general there exist two intervals of λfor which cascades are possible, amounting to four phasetransitions as opposed to two; also, it is seen that onlythe first transition is second-order while the remainingones are first order. However, depending on the value of

α, these regions may overlap (with overlap starting whenα exceeds a critical value) resulting again with only twophase transitions; see Section VI C for details.

The paper is organized as follows. We give detailsof the models applied in this study and the problemto be considered in Section II. In Sections III and IVwe present the main results of this work, and confirm itthrough extensive computer simulations in Section V. InSection VI, we make a comparison between complex con-tagions in monoplex networks and multiplex networks,and also demonstrate the new phenomenon about thenumber of phase transitions. Finally, Section VII sum-marizes our work and gives future directions.

II. MODEL: STRUCTURE AND DYNAMICS

A. Random Graphs with Clustering

Our goal is to study complex contagion processes insynthetic networks that capture some important aspectsof real-world networks but otherwise are generated ran-domly. It is known [44, 45] that the widely used configu-ration model [44] generates tree-like graphs with numberof cycles approaching to zero as the number of nodes getslarge. However, most social networks exhibit high clus-tering, informally known as the propensity of a “friendof a friend” to be one’s friend. Put differently, real-worldsocial networks are usually not tree-like and instead haveconsiderable number of cycles, particularly of size three;i.e., triangles. With this in mind, Miller [57] and New-man [45] proposed a modification on the configurationmodel to enable generating random graphs with givendegree distributions and tunable clustering.

The model proposed in [45, 57] is often referred to asrandom networks with clustering and is based on thefollowing algorithm. Given a joint degree distribution{pst}∞s,t=0 that gives the probability that a node has ssingle edges and t triangles, each node will be given sstubs labeled as single and 2t stubs labeled as trian-gles with probability pst, for any s, t = 0, 1, . . .. Then,stubs that are labeled as single are joined randomly toform single edges that are not part of a triangle, whereaspairs of triangle stubs from three nodes are randomlymatched to form triangles between the three participat-ing nodes; the total degree of a node is then distributedby pk =

∑s,t:s+2t=k pst. As in the standard configura-

tion model, it can be shown that the number of cyclesformed by single edges goes to zero as n gets large, andso does the number of cycles of length larger than three[44].

We quantify the level of clustering using the widelyrecognized global clustering coefficient [45], defined via

Cglobal =3× (number of triangles in network)

number of connected triples.

Here, “connected triples” means a single vertex con-nected by edges to two others. It was shown in [44] that

Page 3: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

3

Cglobal is positive in the random clustered network model,while it approaches to zero with increasing network sizein the standard configuration model.

B. Multi-layer and Multiplex Network Models

In this paper, we consider a multiplex network wherelinks are classified into different types, or colors; they arereferred to as edge-colored multi-graphs by some authors.For ease of exposition, we consider the case with only twocolors, red and blue, but the discussion can easily be ex-tended to arbitrary number of colors. Let R and B denotethe sub-networks formed by red and blue edges, respec-tively. A possible motivation is that R models the kinshipcontact network among individuals, while the networkB stands for the colleagueship network. Alternatively,we can think of B modeling the physical (e.g., face-to-face) relationships among human beings while R modelsconnections through an online social network (e.g., Face-book).

In line with the second motivation, we assume that net-work B is defined on the vertices N = {1, . . . , n}, whileR contains only a subset of the nodes in N to accountfor the fact that not every individual participates in on-line social networks; e.g., we assume that each vertexin N participates in R independently with probabilityα ∈ (0, 1], meaning that the set of vertices of R consti-tutes α-fraction of the whole population. We illustratein Figure 1 two equivalent representations of this model,first shown as a multi-layer network with overlapping ver-tex sets, and second as a multiplex network.

We generate both R and B from the generalized con-figuration model described in Section II A; i.e., both arerandom networks with clustering. In particular, we let{prst, s, t = 0, 1, . . . } and {pbst, s, t = 0, 1, . . . } denotethe joint distributions for single edges and triangles forR and B, respectively. Then both networks are gener-ated independently according to the algorithm describedin Section II A, and they are denoted respectively byR = R(n;α, prst) and B = B(n; pbst). We define the overallnetwork H over which influence spreads as the disjointunion H = R

∐B and represent it by H(n;α, prst, p

bst).

Here, the disjoint union operation implies that we stilldistinguish R-edges from B-edges in H, meaning that itis a multiplex network.

We denote the colored degree d of a node in H by

d = (drs, 2nrt, dbs, 2nbt) (1)

meaning that it has drs single edges and 2nrt triangleedges in network R, and dbs single edges and 2nbt triangleedges in network B; here nrt and nbt are defined as thenumber of triangles assigned to this node in R and B,respectively. The distribution of this colored degree isdenoted by pd and can be computed directly from prst,pbst and α.

C. Content-dependent Linear Threshold Model forSocial Contagion

The classical linear threshold model by Watts [28] isbased on individuals adopting a behavior when the frac-tion of their active neighbors exceed a certain thresh-old. Namely, an inactive node i with mi active neighborsand di−mi inactive neighbors will become active only ifmi/di exceeds τi drawn from a distribution P (τ). Moreprecisely, nodes update their states synchronously at dis-crete time steps t = 0, 1, . . . , and an inactive node will beactivated at time t if the fraction of its active neighborsexceeds its threshold at time t − 1; once active, a nodecan not be deactivated. A major concern with this modelis that it assumes all links in the network have the sameimportance, irrespective of the context that the spread-ing is being considered. However, in real world contagionprocesses, each link type (e.g., co-workership vs. familyor physical links vs. online social network links) may playa different role in different cascade processes. For exam-ple, in the spread of a new consumer product amongstthe population, a video game would be more likely tobe promoted among high school classmates rather thanamong family members; the situation would be exactlythe opposite in the case of a new cleaning product [58].

To address the aforementioned drawbacks, Yagan andGligor [29] proposed a content-dependent linear thresh-old model for social contagion in multiplex networks. Inthis model, each link type is associated with a content de-pendent parameter ci in [0,∞] that measures the relativebias type-i links have in spreading the content. Then, aninactive node with mi active neighbors and di −mi in-active neighbors through link type-i will turn active if∑

i cimi∑i cidi

≥ τ .

In this work, we will analyze complex contagions inH under the content-dependent threshold model intro-duced in [29]. Consider a node with colored degreed = (drs, 2nrt, dbs, 2nbt) and active-degree

m = (mrs,mrt1,mrt2,mbs,mbt1,mbt2) ,

where mrs (resp. mbs) gives the number of active neigh-bors connected through red single edges (resp. blue singleedges), and mrt1 and mrt2 (resp. mbt1 and mbt2) give thenumber of red (resp. blue) triangles with one and twoactive neighbors, respectively; see Figure 2 for demon-stration of three cases counted as mrs, mrt1, and mrt2,respectively. Next, for a given content to spread over H,let cr and cb denote the weight of red- and blue-edges,respectively, in spreading this content. Without loss ofgenerality, we set c := cr

cb. Then, the probability that an

inactive node with degree d and active-degree m turnsactive is given by

F (m,d) (2)

= P[c (mrs+mrt1+2mrt2)+mbs +mbt1+2mbt2

c (drs + 2nrt) + dbs + 2nbt≥ τ

]

Page 4: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

4

(a) (b)

FIG. 1. Illustration of multi-layer and multiplex network representations of our model. In (a), we see a multilayer network (e.g.,a Physical communication layer and a online social network layer) with overlapping vertex sets; vertical dashed lines representnodes corresponding to the same individual. In (b), we see the equivalent representation of this model by a multiplex network,i.e., an edge-colored multigraph. Edges from online social networks are shown in red and edges from the physical network areshown in blue; although not shown in this particular example, it is possible for a pair of nodes to be connected by both blueand red links, rendering the resulting representation a multi-graph.

(a) (b) (c)

FIG. 2. Illustration of three cases that would be counted as mrs, mrt1, and mrt2, respectively for the number of active nodes.Nodes shown in filled (green) circles are active while those shown in non-filled circles are inactive.

Hereafter, the function F (m,d) will be referred to as theneighborhood response function [52, 59].

D. The Problem

We consider the diffusion of influence over H that isinitiated by a node selected uniformly at random. Ourmain goal is to derive the conditions, probability, and ex-pected size of global cascades, i.e., cases where influencestarts from a single individual and reaches a positive frac-tion of the population in the large n limit. Of particularinterest will be to reveal the effect of clustering coefficientCglobal and content parameter c on these quantities.

III. CONDITION AND PROBABILITY OFGLOBAL CASCADES

In this section, we derive the condition and probabil-ity of global cascades in clustered multiplex networks;expected size of global cascades is handled separately inSection IV. As mentioned in Section II B, we restrict ourattention to networks with only two link types, labeledas blue and red edges, respectively. Distinguishing fur-ther the edges based on whether or not they are part ofa triangle, we obtain four types of edges in our clustered

multiplex network model, labeled as red single edges, redtriangles edges, blue single edges, and blue triangle edges;these are denoted by rs−, rt−, bs−, and bt−, respec-tively.

To analyze the influence diffusion process, we con-sider a branching process [60] that starts by activatinga node selected uniformly at random from among allnodes. Starting with the neighbors of the seed node,we explore and identify all nodes that are reached andactivated, continuing recursively until the branching pro-cess stops. Since the contagion model considered here ismonotone, i.e., nodes that are activated stay active for-ever, the branching process is guaranteed to stop, andthe resulting number of nodes reached gives the cascadesize.

Let H(x) denote the generating function [61] for the“finite number of nodes that are reached and influenced”by the branching process [60] initiated by a node se-lected uniformly at random. We will derive an expressionfor H(x) using hrs(x), hrt(x), hbs(x), and hbt(x), wherehrs(x) (resp. hbs(x)) stands for the generating functionfor the “finite number of nodes reached by following arandomly selected red single (resp. blue single) edge”;hrt(x) and hbt(x) are defined similarly for red triangleand blue triangle edges, respectively. Then, H(x) takesthe form

Page 5: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

5

H(x) = x∑d

pdD(drs, nrt, dws, nwt), (3)

where

D(y, z,m, `) := hrs(x)yhrt(x)zhbs(x)mhbt(x)` (4)

The validity of (3) can be seen as follows. The term xstands for the node that is selected randomly and set ac-tive to initiate the cascade. This node has a degree d =(drs, 2nrt, dbs, 2nbt) with probability pd. The number ofnodes reached by each of its drs (resp. dbs) red singleedges (resp. blue single edges) has a generating functionhrs(x) (resp. hbs(x)). Considering its triangle edges in asimilar manner, we see from the powers property of gener-ating functions [44] that when the initial node has degreed the number of nodes influenced in this process has agenerating function hrs(x)drshrt(x)nrthbs(x)dbshbt(x)nbt .Taking the expectation over the degree d of the initialnode, we get (3).

For (3) to be useful, we shall derive expressions for thegenerating functions hrs(x), hrt(x), hbs(x), and hbt(x).As will become apparent soon, there are no explicit equa-tions defining these functions. Instead, we should seek toderive recursive equations to define each generating func-tion in terms of the others. These steps are taken in thenext sections where we first focus on deriving hrs(x) andhbs(x) (Section III A) followed by derivations of hrt(x)and hbt(x) (Section III B).

Given that random networks with clustering are freeof cycles of size larger than three, it is clear that the ini-tial stages of the branching process will expand largelybecause of vulnerable nodes that can get activated ei-ther by one or two active neighbors. In our formula-tion, the multiplex nature of the network calls for defin-ing the notion of the vulnerability with respect to linktypes as well [29]. Throughout, we say that a node isR-vulnerable (resp. B-vulnerable) if it gets activatedby a single active connection through a red link (resp.blue link). We define ρd,rs and ρd,bs as the probabil-ity that a node is R-vulnerable and B-vulnerable, re-spectively. We also define ρd,rt (resp. ρd,bt) as theprobability that a node gets activated by having twoactive neighbors via red (resp. blue) edges. In otherwords, ρd,rt (resp. ρd,bt) gives the probability that anode gets activated by having a red (resp. blue) trian-gle with both neighbors being active; see Figure 2. Moreprecisely, we set ρd,rs = F [(1, 0, 0, 0, 0, 0) ,d], ρd,rt =F [(0, 0, 1, 0, 0, 0) ,d], ρd,bs = F [(0, 0, 0, 1, 0, 0) ,d], andρd,bt = F [(0, 0, 0, 0, 0, 1) ,d].

A. Influence Propagation via Red Single Edges

We start by deriving recursive equations for hrs(x) andhbs(x) by focusing on the number of nodes reached andinfluenced by following one end of a single edge in R and

B, respectively. In what follows, we only derive hrs(x)since the computation of hbs(x) follows in a very similarmanner. In order to compute hrs(x), consider picking ared single edge uniformly at random (among all red sin-gle edges in H) and assume that it is connected at oneend to an active node. Then, we compute the generatingfunction for the number of nodes influenced by follow-ing the other end of the edge, and obtain the followingexpression for the generating function hrs(x):

hrs(x) = x∑d

drspd〈drs〉

ρd,rsD(drs − 1, nrt, dbs, nbt)

+ x0∑d

drspd〈drs〉

(1− ρd,rs) (5)

where D is as defined at (4).We now explain each term appearing at (5) in turn.

The explicit factor x stands for the initial vertex that isarrived at by following the randomly selected red single-edge. The term drspd

〈drs〉 gives the normalized probability

that the arrived vertex has colored degree d. Since thearrived node is reached by a red link, it needs to be red -vulnerable to be added to the vulnerable component. Ifthe arrived node is indeed red -vulnerable, which happenswith probability ρd,rs, it can activate other nodes via itsremaining drs − 1 red single edges, dbs blue single edges,nrt red triangles, and nbt blue triangles. Because thenumber of vulnerable nodes reached by each of its redsingle edges and triangles (resp. blue single edges andtriangles) are generated in turn by hrs(x) and hrt(x)(resp. hbs(x) and hbt(x)) respectively, we obtain theterm hrs(x)drs−1hrt(x)nrthbs(x)dbshbt(x)nbt by the pow-ers property of generating functions. Averaging over allpossible colored degrees d gives the first term in (5). Thesecond term with the factor x0 accounts for the possibil-ity that the arrived node is not red-vulnerable and thusis not included in the cluster. An analogous expressioncan be obtained for hbs(x) via similar arguments.

B. Influence Propagation via Red Triangles

We now derive hrt(x), i.e., the generating function forthe number of nodes influenced by following a red trian-gle selected at random; similar arguments hold for hbt(x).We consider the situation where nodes u, v, and w forma triangle and the top vertex u is active, while othersare not. We are interested in computing the generatingfunction for the total number of nodes that will be influ-enced by nodes v and w. We will compute the generatingfunction hrt(x) by conditioning on the following events:

• If neither of nodes v and w is R-vulnerable, thenthe number of nodes influenced will be zero. Nodev has degree d with normalized probability nrtpd

〈nrt〉 ,

in which case it is not R-vulnerable with proba-bility 1 − ρd,rs. Similarly, the probability that

Page 6: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

6

node w has degree d′ and not R-vulnerable is

(n′rtpd′〈n′

rt〉)(1 − ρd′,rs). Summing over all possible

cases, we obtain the first term in (7) with x0 (mean-ing that zero nodes will be influenced by followingthe red triangle in this case).

• Consider the case where only one of v and w isinfluenced, leading to a term with the factor x1 in(7). Without loss of generality, consider the casewhere v is activated while w is not. If node v hasdegree d, then it is R-vulnerable with probabilityρd,rs, and can influence other nodes in the usualmanner. Then, the event that node w, with degreed′, will not be activated despite having two activeneighbors (nodes u and v) has probability 1−ρd′,rt.By symmetry and exchangeability of nodes v andw, an equivalent term will be obtained for the casewhere w is activated but v is not. Summing overall possibilities we obtain the second term in (7).

• Finally, we consider the case where both v and wbecome active giving rise to term with factor x2.There are two possible scenarios:

– Both of v and w are activated by u. The prob-ability that v is activated by u is ρd,rs as al-ready discussed during the computation of thefirst term. By symmetry, the probability for

w is the same as for v. Multiplying the twoprobabilities leads to the third term in (7).

– Only one of v and w is made active imme-diately by u while the other is not; e.g., sayv is activated but not w. However, w alsogets activated by the joint influence from uand v. With d and d′ denoting the degreeof v and w, this happens with probability(ρd,rs)(ρd′,rt − ρd′,rs). Here, the second termaccounts for the fact that w gets activated onlyif it has two (or, more) active neighbors. Sum-ming over all possibilities as before, and mul-tiplying by two for the case where v and w arereplaced, we obtain the last term in (7).

C. Deriving the Condition for Global Cascades

The discussion given in Section III A and III B leadsto a set of recursive equations for hrs(x), hrt(x), hbs(x),and hbt(x). Recursions for hrs(x) and hrt(x) are givenin (6) and (7), respectively; the expressions for hbs(x)and hbt(x) are very similar and omitted here for brevity.With these four recursive equations in place, it is possibleto determine the characteristic function H(x) of the fi-nite number of nodes activated in the contagion process.Namely, for a given x, we shall find a fixed-point of theserecursive equations, and then use the resulting values ofhrs(x), hrt(x), hbs(x), and hbt(x) in (3) to get H(x).

hrs(x) = x∑d

drspd〈drs〉

ρd,rsD(drs − 1, nrt, dbs, nbt) + x0∑d

drspd〈drs〉

(1− ρd,rs) (6)

hrt(x) = x0∑d

∑d′

nrtpd〈nrt〉

(1− ρd,rs)n′rtpd′

〈n′rt〉(1− ρd′,rs) (7)

+ 2x∑d

∑d′

nrtpd〈nrt〉

ρd,rsD(drs, nrt − 1, dbs, nbt)n′rtpd′

〈n′rt〉(1− ρd′,rt)

+ x2∑d

∑d′

(nrtpd〈nrt〉

ρd,rsD(drs, nrt − 1, dbs, nbt)

)(n′rtpd′

〈n′rt〉ρd′,rsD(d′rs, n

′rt − 1, d′bs, n

′bt)

)+ 2x2

∑d

∑d′

(nrtpd〈nrt〉

ρd,rsD(drs, nrt − 1, dbs, nbt)

)(n′rtpd′

〈n′rt〉(ρd′,rt − ρd′,rs)D(d′rs, n

′rt − 1, d′bs, n

′bt)

)

By conservation of probability and the definition ofgenerating functions, we know that H(1) = 1 only if finalnumber of activated nodes is finite with probability one.In other words, global cascades that lead to a positivefraction of influenced nodes are possible only if H(1) < 1.This prompts us to seek a fixed point of the recursiveequations when x = 1. For notational convenience, wedefine h1 := hrs(1), h2 := hrt(1), h3 := hbs(1), and

h4 := hbt(1). From (3), this gives

H(1) =∑d

pdhdrs1 hnrt

2 hdbs3 hnbt4 , (8)

while the recursions take the form

hi = gi(h1, h2, h3, h4), i = 1, 2, 3, 4. (9)

Here the functions g1, g2 are easily obtained from (6) and(7), and similarly g3 and g4 can be obtained from the

Page 7: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

7

recursions for hbs(x), and hbt(x). To give an example,we have

g3(h1, h2, h3, h4)

=∑d

dbspd〈dbs〉

(ρd,bsh

drs1 hnrt

2 hdbs−13 hnbt4 + 1− ρd,bs

)

It is clear that the recursions (9) have a trivial fixedpoint h1 = h2 = h3 = h4 = 1 which yields H(1) = 1,meaning that cascades will die out without reaching apositive fraction of the population with high probability.To check the stability of the trivial solution, we linearizethe recursive equations (9) around x = 1, and computethe corresponding Jacobian matrix J via

J(i, j) =∂gi(h1, h2, h3, h4)

∂hj

∣∣∣∣∣h1=h2=h3=h4=1

(10)

for each i, j = 1, 2, 3, 4; the exact expression for the fourby four matrix J is not give here in order to save space.Now, the trivial solution h1 = h2 = h3 = h4 = 1 is lin-early stable if and only if the largest eigenvalue in abso-lute value of J, denoted σ(J), is less than one. Otherwise,if σ(J) > 1, then there exists another fixed point for therecursion with h1, h2, h3, h4 < 1, leading to H(1) < 1. Inthat case, the probability deficit 1 −H(1) > 0 gives theprobability that the contagion process reaches infinitelymany nodes, i.e., a global spreading event takes place.Collecting, we conclude that the condition of global cas-cades is given by σ(J) > 1, while the probability of globalcascade equals Ptrig = 1−H(1).

IV. EXPECTED CASCADE SIZE

Next, we are interested in computing the expected sizeof global cascades when they take place. Put differently,we will analyze the expected fraction of nodes that willeventually become active as we pick a node in the networkuniformly at random and set it active. We follow theapproach used in [29, 52, 62], which has been proven tobe an effective way to analyze expected cascade size innetworks.

First, consider the network H as a tree-structure withan arbitrary node selected as the root. Then, label thelevels of the tree from ` = 0 at the bottom to ` = +∞at the top of the tree. Similar to [29, 52], we assumethat nodes begin updating their states starting from thebottom of the tree and proceeding to the top. In otherwords, we assume that a node at level ` updates its stateonly after all nodes at the lower levels 0, 1, . . . , `−1 finishupdating. We define qrs,` as the probability that a nodeat level ` of a tree, which is connected to its unique parentby a red single edge, is active given that its parent atlevel `+ 1 is inactive. Then, we consider a pair of nodesat level ` that together with their parent at level ` + 1form a red triangle. Given that the parent is inactive, welet qrt1,` (resp. qrt2,`) denote the probability that onlyone (resp. both) of the two child nodes of this triangle isactive. We define qbs,`, qbt1,`, and qbt2,` for blue edges inthe same manner.

According to our model, an active node is never deac-tivated, meaning that qrs,`, qrt1,`, qrt2,`, qbs,`, qbt1,`, qbt2,`are all non-decreasing. Therefore, they will converge toqrs,∞, qrt1,∞, qrt2,∞, qbs,∞, qbt1,∞, qbt2,∞. Then, theexpected cascade size (i.e., the fraction of active indi-viduals) S is given by the probability that the arbitraryselected node at the top of the tree becomes active. Inorder to computer S, we first derive recursive relationsfor qrs,`, qrt2,`, qrt2,`, qbs,`, qbt2,`, qbt2,`. We have

qrs,`+1 =∑d

drspd〈drs〉

drs−1∑i=0

nrt∑j=0

j∑x=0

dbs∑m=0

nbt∑n=0

n∑y=0

Q` [(i, j, x,m, y, n) , (drs − 1, nrt, dbs, nbt)]F [(i, x, j − x,m, y, n− y),d]

qrt1,`+1 = 2∑d,d′

nrtpdnrt

n′rtpd′

〈n′rt〉

drs,d′rs∑

i,i′=0

nrt−1,n′rt−1∑

j,j′=0

j,j′∑x,x′=0

dbs,d′bs∑

m,m′=0

nbt,n′bt∑

n,n′=0

n,n′∑y,y′=0

{Q` [(i, j, x,m, y, n) , (drs, nrt − 1, dbs, nbt)]

×Q`[(i′, j′, x′,m′, y′, n′

), (d′rs, n

′rt − 1, d′bs, n

′bt)]F [(i, x, j − x,m, y, n− y),d]

×(1− F[(i′, x′ + 1, j′ − x′,m′, y′, n′ − y′),d′

])}

qrt2,`+1 =∑d,d′

nrtpdnrt

n′rtpd′

〈n′rt〉

drs,d′rs∑

i,i′=0

nrt−1,n′rt−1∑

j,j′=0

j,j′∑x,x′=0

dbs,d′bs∑

m,m′=0

nbt,n′bt∑

n,n′=0

n,n′∑y,y′=0

{Q` [(i, j, x,m, y, n) , (drs, nrt − 1, dbs, nbt)]

×Q`[(i′, j′, x′,m′, y′, n′

), (d′rs, n

′rt − 1, d′bs, n

′bt)](

F [(i, x, j − x,m, y, n− y),d]F[(i′, x′ + 1, j′ − x′,m′, y′, n′ − y′),d′

]+(F [(i, x+ 1, j − x,m, y, n− y),d]− F [(i, x, j − x,m, y, n− y),d]

)F[(i′, x′, j′ − x′,m′, y′, n′ − y′),d′

])}

Page 8: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

8

where we define

Q` [(i, j, x,m, n, y) , (d1, d2, d3, d4)] =

(d1i

)qirs,`(1− qrs,`)d1−i

(d2j

)(j

x

)qxrt1,`q

j−xrt2,`(1− qrt1,` − qrt2,`)

d2−j

(d3m

)qmbs,`

× (1− qbs,`)d3−m(d4n

)(n

y

)qybt1,`q

n−ybt2,`(1− qbt1,` − qbt2,`)

d4−n (11)

In words, Q` [(i, j, x,m, n, y) , (d1, d2, d3, d4)] gives theprobability that a node at level ` with colored degree(d1, 2d2, d3, 2d4) has

• i (resp. d1 − i) of the d1 neighbors connectedthrough red single edges as active (resp. inactive).Similarly, m (resp. d3−m) of the d3 neighbors con-nected through blue single edges as active (resp. in-active)

• of the d2 red-triangles it participates in, x has oneactive node, j − x has two active nodes, and d2 −j has no active node. Similarly, of the d4 blue-triangles it participates in, y has one active node,n−y has two active nodes, and d4−n has no activenode

Hence, multiplying Q` [(i, j, x,m, n, y) , (d1, d2, d3, d4)]with F [(i, x, j − x,m, y, n− y),d] and summing over allpossibilities for d and i, j, x,m, n, y gives the probabil-ity that the node under consideration turns active. This

confirms the first expression above. Second and thirdterms consider simultaneously a pair of nodes that arepart of a red triangle (where the top, i.e., parent, ver-tex is inactive). Therefore, we first condition on thedegrees of these two nodes being d and d′ respectively,and consider all possibilities concerning the states (ac-tive vs. inactive) of these neighbors. Then for qrt1,`+1,we realize by symmetry that the desired expression istwo times the probability that the node with degree dturns active, and despite having one extra active neigh-bor, the node with degree d′ does not turn active. Thefact that first node turns active is incorporated in the ex-pression (1 − F [(i′, x′ + 1, j′ − x′,m′, y′, n′ − y′),d′]) bythe term x′ + 1. For qrt2,`+1, we proceed similarly andrealize that for both nodes to turn active there are twopossibilities. The node with degree d either turns ac-tive regardless of the state of the node with degree d′ (inwhich case the node with degree d′ will turn active withprobability F [(i′, x′ + 1, j′ − x′,m′, y′, n′ − y′),d′]), or itturns active only after the node with degree d′ does.

With the above recursion in place, we compute thefinal cascade size via

S =∑d

pd

drs∑i=0

nrt∑j=0

j∑x=0

dbs∑m=0

nbt∑n=0

n∑y=0

Q∞ [(i, j, x,m, n, y) , (drs, nrt, dbs, nbt)]F [(i, x, j − x,m, y, n− y),d] (12)

Namely, we first solve for the values of qrs,∞, qrt1,∞,qrt2,∞, qbs,∞, qbt1,∞, qbt2,∞ using the recursive equations,and then substitute them into (12) to obtain the expectedsize of global cascades.

V. NUMERICAL RESULTS

A. Networks with Doubly Poisson Distributions

In our first simulation study, we use doubly Poissondistribution for the number of single edges and trianglesin both networks. Namely, we set

prst = e−λr,1(λr,1)s

s!e−λr,2

(λr,2)t

t!, s, t = 0, 1, . . . ,

pbst = e−λb,1(λb,1)s

s!e−λb,2

(λb,2)t

t!, s, t = 0, 1, . . . ,

where s and t are the number of single edges and trianglesin the corresponding networks, respectively. Thus, λr,1and λr,2 (resp. λb,1 and λb,2) denote the mean number ofsingle edges and triangles, respectively in R (resp. in B).

We consider n = 1 × 105 nodes in the population andα = 0.5 for the size of network R. We let τ = 0.18and c = 0.25 for the threshold and content parameters,respectively. The results are shown in Figure 3 wherethe curves stand for the theoretical results of probabil-ity Ptrig and expected size S of cascades (obtained fromour discussion in Section III and IV), as a function ofλr,1 = λr,2 = λb,1 = λb,2. The markers stand for the em-pirical results for the same quantities, and are obtainedby averaging over 5,000 independent experiments. Wesee a very good agreement between the analytical andexperimental results confirming the validity of our anal-ysis. The slight discrepancy observed in Ptrig is due tothe limited number of experiments, and can be mitigated

Page 9: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

9

Degree Parameter (λr,1 = λr,2 = λb,1 = λb,2)0 0.5 1 1.5 2 2.5 3

FractionalSize

0

0.2

0.4

0.6

0.8

1S - AnlysS - Expt.Ptrigger - AnlysPtrigger - Expt.

(a)

Content Parameter (c)0 1 2 3 4 5 6

FractionalSize

0

0.1

0.2

0.3

0.4

S - Expt.S - Anlys.Ptrigger - Anlys

(b)

FIG. 3. Simulations for doubly Poisson degree distributions.In (a), we set the content parameter c = 0.25, the thresholdas τ = 0.18, and α = 0.5, and vary the degree parameters.In (b), we fix τ = 0.18, λr,1 = λr,2 = λb,1 = λb,2 = 0.3, andα = 0.5 while varying content parameter c.

by increasing the number of realizations.

Next, we change our experimental set-up to demon-strate the effect of content parameter on the probabil-ity and size of cascades. To that end, we fix all net-work parameters and observe the quantities of interestas the content parameter c varies. In particular, we setλr,1 = λr,2 = λb,1 = λb,2 = 0.3 and τ = 0.18. We seethat the probability and expected size of global cascadesvary greatly as c changes. This can be taken as an in-dication that our model can capture the real-world phe-nomenon that over the same population certain contentscan become widespread while others die out quickly. Inthe setting used here, we see that global cascades takeplace when the content parameter c is not too small orlarge. The reason is that with a too small or large contentparameter, the connectivity of the conjoined network isdominated by only one of the two networks. So, if neitherof them has enough connectivity to trigger a global cas-cade by their own, then there will be no global cascades inthe conjoined network. When the c is neither too largenor too small (e.g., close to unity), both networks willcontribute to the connectivity together and it becomespossible to trigger a global cascade. For other values ofλr and λb a completely different situation might occur,e.g., with very small or very large c promoting cascades;e.g., see [29, Fig. 2] for a few such examples.

B. How does Clustering Affect the Cascade Size?

Our next goal is to reveal the impact of clustering onthe influence propagation process in the models consid-ered here. To do so, we should be able to vary the level ofclustering while keeping the first and second moment ofthe total degree distribution fixed, as they are known toaffect the contagion behavior significantly [29]. In orderto satisfy this constraint, we use Poisson distributions forthe number of single edges and triangle in two networkswith parameters given in Table I.

With the setting given in Table I, the clustering co-efficient in R will be fixed for any η ∈ [0, 4], while theclustering of B varies between the two extremes: i) whenη = 4, B will have no single-edges and consist only oftriangles resulting with a clustering coefficient close toone; and ii) with η = 0, there will be no triangles in Band hence its clustering coefficient will be close to zero.Collecting, we see that the clustering coefficient of B andthus of H increases with increasing η in this setting.

Network R Network BDistribution of single-edges Poi(2λ) 2 Poi( 4−η

2λ)

Distribution of triangles Poi(λ) Poi(η2λ)

TABLE I. Parameters of the doubly Poisson distribution.This choice ensures that the mean and variance of the totaldegree distribution (single plus triangle edges) in B are inde-pendent of η, while its clustering varies greatly as η varies in(0, 4). In Figure 3 we set λ = 0.5.

With these in mind, we first demonstrate the impactof clustering on the probability of triggering a global cas-cade. Figure 4(a) shows the probability of triggering aglobal cascade as a function of λ for three different η val-ues. The resulting clustering coefficients are plotted inFigure 4(b) where we clearly see that clustering increaseswith increasing η. The main observation from Figure 4(a)is that increasing the clustering (i.e., increasing η), shiftsthe interval of λ for which global cascades are possibleto the right. This leads to a double-faceted conclusionthat clustering decreases the probability of global cas-cades when average degrees are small, whereas after acertain value of average degree, clustering increases theprobability of cascades.

The double-faceted impact of clustering on cascadeprobability can be explained as follows. It is known[18, 28] that threshold models of complex contagion ex-hibit two phase transitions as the average degree in-creases, a second-order transition at low degrees thatmarks the formation of a giant vulnerable component anda first-order transition at high degrees due to increasedlocal stability of nodes; namely, due to the increased diffi-culty of activating high-degree nodes. Given that cluster-ing is known to decrease the size of giant component [31],we expect that it will be more difficult for a clustered net-work to contain a giant vulnerable cluster. This is whythe lower phase transition in complex contagions appear

Page 10: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

10

Degree Parameter (λ)0 0.5 1 1.5 2

Fractional

Size

0

0.2

0.4

0.6η = 3.99η = 3η = 0.01

(a)

Degree Parameter (λ)0 0.5 1 1.5 2

ClusteringCoeffi

cient

0

0.1

0.2

0.3

0.4η = 0.01η = 3η = 3.99

(b)

Degree Parameter (λ)0 0.5 1 1.5 2

Fractional

Size

0

0.2

0.4

0.6

0.8

1η = 3.99η = 3η = 0.01

(c)

FIG. 4. Illustration of the effect of clustering coefficient on the expected cascade and probability of global cascades. We fixτ = 0.18, c = 0.25, and α = 0.5, then vary the degree parameter λ defined in Table I. We see (a) the probability to trigger aglobal cascade; (b) the global clustering coefficient described in Section II A; and (c) the expected cascade size.

later (i.e., at larger degrees) as clustering increases. Onthe other hand, the cycles of size three (i.e., triangles)that are common in clustered networks can help triggercascades when average degree is higher. For instance,in a tree-like network a single active node can only acti-vate its vulnerable connections. However, in a triangle,an active node may first activate one of its vulnerableconnections, making it possible for the third node to beactivated (which now has two active neighbors) even if itis not vulnerable. This is what pushes the second phasetransition to higher degrees.

Next, we explore the impact of clustering (in the set-ting considered here) on the expected cascade size inFigure 3(c). Here again, we see the double-faceted im-pact of clustering with small average degrees favoring lowclustering, while high degrees favoring high clustering interms of having a larger cascade size. In fact, we see theexistence of a critical average degree (around λ = 0.6 inFigure 3(c)) such that when λ is smaller (resp. larger)than the critical value, expected cascade size decreases(resp. increases) with increasing clustering. This extendsthe observation that Hackett et al. [52] made for single-layer networks to multi-layer networks.

Finally, we consider the impact of clustering on theaverage degree-cascade threshold plane. For each pa-rameter pair (λ, τ), the curves in Figure 5 separate theregion where global cascade can take place (areas insidethe boundaries) from the region where they cannot (areasoutside the boundaries). Once again, we confirm that in-creasing the clustering coefficient shifts the interval wherecascades are possible up (i.e., to higher degrees) for anythreshold τ .

VI. COMPARISON BETWEEN MONOPLEXAND MULTIPLEX NETWORKS

In what follows, we will compare the dynamics of com-plex contagions over a monoplex network with that overa multiplex network. Of particular interest will be to findout whether the projection of a multiplex network into amonoplex network leads to any significant differences inthe dynamics that would warrant the separate analyses

Threshold (τ)0.1 0.15 0.2 0.25

DegreeParameter

(λ)

0

1

2

3

Global Cascades

No Global Cascades

η - 0.01η - 2.00η - 3.99

FIG. 5. We show the cascade regions in the DegreeParameter-Threshold plane when α = 0.5, τ = 0.18, and bothnetworks follow doubly Poisson distributions as described inTable I. Clustering increases as η increases.

of multiplex networks as conducted here.To identify the factors affecting complex contagions,

we consider two different degree distributions to generatethe networks. In Section VI A, we use a setting similar toprevious sections, with the resulting networks having al-most no degree-degree correlations; e.g., assortativity de-fined as the Pearson correlation coefficient between thedegrees of pairs of linked nodes [63]. In Section VI Band VI C, we use a different setting that leads to (tun-able) assortativity for multiplex networks. In order tokeep the focus on the comparison between monoplex andmultiplex networks, we shall consider only non-clusterednetworks in the following discussion.

A. Multiplex Networks with Limited Assortativity

First we consider the limited assortativity case and usethe following degree distribution to assign blue and redstubs to each node:

pbk = e−λbλkbk!, k = 0, . . . , (13)

prk = αe−λrλkrk!

+ (1− α)δk,0, k = 0, . . . .

Page 11: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

11

where δ denotes the Kronecker delta. In other words,each nodes receives Poi(λb) blue edges, and an α-fractionof nodes receive an additional Poi(λr) edges of color red.A multiplex network is generated using the colored config-uration model [64, 65] where stubs that are of the samecolor are matched randomly. The monoplex projectedtheory ignores the color of the edges and matches allstubs randomly with each other. An important questionis whether we lose any significant information about con-tagion dynamics when the monoplex projected theory isused instead of the multiplex theory developed here andin [29]. For convenience, we set λb = λr and use c = 1 asthe content parameter.

In Figure 6(a), we set α = 0.99. We see nearly nodifference between the theoretical cascade sizes obtainedfrom monoplex and multiplex theories, and they bothmatch the simulation results well. However, when α isreduced to 0.1 in Figure 6(b), we clearly see a differ-ence between the two theories and only multiplex theorymatches the simulation results. This shows that even inthe simplest case where both link types have the sameinfluence factor (i.e., c = 1), monoplex theory may beunable to capture certain properties of cascade dynam-ics, reinforcing the need for studying cascades using themultiplex theory.

We now explain why the two cases, α = 0.1 andα = 0.99, lead to different conclusions regarding the ac-curacy of the monoplex theory in capturing contagiondynamics over multiplex networks. One of the key dif-ferences between the two cases is the resulting assorta-tivity. When α = 0.1, only 10% of the nodes has redstubs, each of which can only be connected with otherred stubs in the multiplex network case. Put differently,in this setting a small fraction of the population will havestatistically higher degrees than the rest, and the addi-tional links they have can only connect nodes with highdegrees together. This leads to a positive correlation (i.e.,assortativity) between the degrees of pairs of connectednodes. However, in the monoplex projection, the addi-tional edges can be used to connect any two nodes, re-sulting with very little to no assortativity in the network.Obviously, when α is close to one, almost every node willhave the additional edges and the above phenomenon willnot be observed. Our simulation results confirm this in-tuition as we see that assortativity is negligible (∼ 10−4)in both monoplex and multiplex cases when α = 0.99,while with α = 0.1, assortativity varies (as λr = λb in-creases) from 0.05 to 0.2 in the multiplex case while stillbeing negligible in the monoplex case.

The impact of assortativity on the comparison betweenmonoplex and multiplex theories is investigated furtherin the forthcoming discussion.

B. Multiplex Networks with Assortativity

In this section, we change the setting slightly to gener-ate multiplex networks with high assortativity. To that

Degree Parameter (λr = λb)0 2 4 6 8

Fractional

Size

0

0.2

0.4

0.6

0.8

1

projected theory: α = 0.99multiplex theory: α = 0.99simulations: α = 0.99

(a)

Degree Parameter (λr = λb)0 2 4 6 8

Fractional

Size

0

0.2

0.4

0.6

0.8

1

projected theory: α = 0.1multiplex theory: α = 0.1simulations: α = 0.1

(b)

FIG. 6. Comparison between monoplex networks and mul-tiplex networks with limited assortativity. In (a) and (b),we fix the threshold τ = 0.15, the content parameter c = 1,then vary the degree parameters in (13). For the networksobtained by projected theory and the networks in multiplextheory with α = 0.99, assortativity is negligible. However,when α = 0.1, the assortativity coefficient of the networks inthe multiplex theory become significant; e.g., it can be up to0.21.

end, we use the degree distributions given at (13), butinstead of setting λr = λb, we enforce

αλr = λb (14)

for any α ∈ (0, 1). This setting allows us to tune assorta-tivity without changing the mean degree in the network.In particular, assortativity will increase as α decreases(by virtue of a small fraction of nodes forming a highly-connected cluster) [31]. In addition, this setting allowsus to compare the contagion dynamics in multi-layer net-works when the upper layer is i) small but densely con-nected (small α) versus ii) large but loosely connected(large α); see [31] for relevant results for bond percola-tion processes.

Using the above degree distributions, we generatemonoplex and multiplex networks as in Section VI A andanalyze the complex contagion process. In Figure 7(a),we see that when α = 0.99, which leads to very limitedassortativity, the difference between monoplex and mul-tiplex networks is negligible. This is in parallel with whatwe observed in Section VI A. However, decreasing α to0.1 leads to two interesting observations in Figure 7(b).First, instead of the commonly reported two phase tran-sitions [18, 28], we observe four phase transitions in the

Page 12: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

12

Degree Parameter (αλr = λb)0 2 4 6 8

FractionalSize

0

0.2

0.4

0.6

0.8

1

projected theory: α = 0.99multiplex theory: α = 0.99simulations: α = 0.99

(a)

Degree Parameter (αλr = λb)0 2 4 6 8

FractionalSize

0

0.2

0.4

0.6

0.8

1

projected theory: α = 0.1multiplex theory: α = 0.1simulations: α = 0.1

(b)

FIG. 7. Comparison between monoplex networks and multi-plex networks with assortativity. Similar with the observationin Figure 6, networks in the projected theory and in the mul-tiplex theory with α = 0.99 have negligible assortativity coef-ficients. However, for the networks of multiplex theory withα = 0.1, assortativity coefficient ranges from 0.19 to 0.79. Ingeneral, assortativity increases with increasing λr and λb inthe multiplex theory.

cascade size as αλr = λb increases. Secondly we see a sig-nificant difference between the monoplex projected the-ory and multiplex theory, with only the multiplex theorymatching the simulations well. Once again, this showsthat monoplex theory is unable to capture the cascadedynamics under certain settings.

The emergence of four phase transitions in Figure 7(b),which to the best of our knowledge was not reportedbefore, can be explained as follows 1. When α = 0.1,only ten percent of the nodes have red edges, but thethe mean number of red edges for those nodes equals10λb (see (14)). Therefore, the first couple of phase tran-sitions taking place at very small λb values can be at-tributed mainly to red-edges. First, λb becomes largeenough (e.g., gets to around 0.1) that the sub-network

1 From a practical point of view, observing four phase transitionsrather than two signals a more chaotic system behavior (in termsof contagion dynamics) with respect to changes in the degree pa-rameter λb. In turn, this would make the prediction of cascaderegion more difficult in the cases where system parameters arenot known exactly but estimated; e.g., social network applica-tions.

induced only on the red-edges contains a giant vulnera-ble cluster, giving rise to global cascades; note that atthis point λb is so small that blue edges do not createenough local stability to prevent cascades from happen-ing. However, after λb reaches a certain level (around0.65), the subgraph on red-edges, having average degreeof 10λb, reaches the second phase transition point wherecascades stop due to the increased local connectivity ofnodes. These first two transitions being second- and first-order, respectively, also confirms that they are primarilydue to the red-edges.

As λb increases further we observe an interval wherethere are no global cascades due to either colors of edges;nodes with red and blue-edges are highly stubborn whilenodes with only blue edges are not connected enough totrigger a cascade. This interval is then followed by a re-gion where λb is large enough that the sub-graph on blueedges has a giant vulnerable cluster. However, the emer-gence of a second-order transition in the whole network isprevented here due to some of these nodes turning stub-born as a result of their red-edges. Eventually, however,λb becomes large enough that even with occasional stub-born nodes present, a giant vulnerable cluster emerges.This point is reached much later in monoplex networksas compared to the multiplex networks. This is becausein the former case stubborn nodes (with red edges) areequally likely to be connected with any other node, whilein the latter case they are mostly connected with eachother; thus in the latter case they are less likely to in-hibit the emergence of a giant vulnerable cluster on blueedges.

Finally, the system goes through a fourth transitionwhen λb becomes large enough that even nodes withonly blue edges become highly connected and hence stub-born. We see that this final transition point is reachedmuch later in multiplex networks than monoplex net-works meaning that cascades take place over a broaderrange of λb values in the former case. Again, this can beattributed to the high assortativity seen in multiplex net-works that leads to extremely stubborn nodes (that haveboth blue and red edges) being isolated from those thatare mildly stubborn (that have only blue edges). On theother hand, in monoplex networks, every node is able toconnect with the extremely stubborn nodes, and thus thecritical value of λb at which cascades become impossibledue to high local stability is reached much earlier thanthat in multiplex networks.

C. Two vs. Four Phase Transitions

In Section VI B, we have observed the possibility ofhaving more than two phase transitions in the cascadesize. As discussed there, multiple phase transitions oc-cur mainly due to the setting (14) that, with small α, en-sures a small fraction of nodes having significantly higherconnectivity than the rest, while also being mostly con-nected with each other. Since the existence of more than

Page 13: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

13

Degree Parameter (αλr = λb)0 2 4 6 8

Fractional

Size

0

0.2

0.4

0.6

0.8

1

multiplex theory: α = 0.1multiplex theory: α = 0.166multiplex theory: α = 0.99

FIG. 8. Demonstration of multiple phase transitions.

two transitions has not been reported in previous studies,we are interested in exploring it further. In particular, wenow investigate the impact of α on the number of phasetransitions as well as transition points. Of particular in-terest will be to find the critical α value that separatesthe cases where four phase transitions occur from thosewith only two transitions; e.g., the α value for which thetwo cascade regions overlap. For simplicity we only con-sider multiplex networks in this section.

Figure 8 shows the expected size of global cascadesunder (13)-(14) for three different values of α. We seethat global cascades take place over a single interval ofαλr = λb when α is large (e.g., α = 0.99) while overtwo disjoint intervals when α is small (e.g., α = 0.1).When α is somewhere in between (e.g., case α = 0.166 )it is possible to have the cascade intervals partially over-lap. In such cases, we only see a single interval whereglobal cascades take place. However, an additional tran-sition point appears, manifested by a shift of slope incascade size, marking possibly the overlapping point of(what would be) the two cascade intervals.

Figure 8 allows us to comment also on the impact ofthe size and density of the online social network layer infacilitating influence propagation. With (14) in effect,a small α corresponds to a social network with few butdensely connected individuals, while large α correspondsto a social network with many subscribers, each with fewconnections on average. In all cases, the total numberof edges in the social network is fixed by virtue of (14).We see from Figure 8 that the comparison between thethree cases leads to a multi-faceted picture as the meannumber of links αλr varies. For instance, the large butloosely connected case of α = 0.99 leads to the largestexpected global cascade size over a certain interval, butit has the smallest cascade interval among all three. Theintermediate case of α = 0.166 seems like a stretchedversion (over the x-axis) of the case with α = 0.99. Inparticular, this case leads to the largest interval whereglobal cascades are possible, though the expected cascadesize is smaller than that obtained with α = 0.99 (and alsowith α = 0.1) over certain intervals. Finally, the caseof a small but densely connected extra layer (i.e., with

α = 0.1), falls right under the case with α = 0.166 formost values of αλr, though it gives the largest size of allthree in small intervals where αλr is very small or verylarge.

VII. CONCLUSION AND FUTURE WORK

We studied the diffusion of influence in a class of clus-tered multiplex networks. We solved analytically for thecondition, probability, and expected size of global cas-cades, and confirmed our results via extensive computersimulations. One of our key findings is to show how clus-tering affects the probability and expected size of globalcascades. We also compared several interesting proper-ties of complex contagions on a multiplex network andits monoplex projection. We demonstrate that ignoringlink types and aggregating network layers may lead to in-accurate conclusions about contagion dynamics, particu-larly when assortativity is high. Finally, we show for thefirst time that linear threshold models do not necessarilyexhibit two phase transitions as previously reported. De-pending on assortativity, we show that both in monoplexand multiplex cases (with two link types) it is possible toobserve four phase transitions.

Our analysis and modeling framework subsumes someprevious studies. For instance, by setting hrt(x) =hbt(x) = 1 in the recursive relations, we ensure that Rand B are non-clustered random networks. So, our anal-ysis corresponds to complex contagions in non-clusterednetworks, which was studied in [29]. Similarly, if we lethrs(x) = hrt(x) = 1 in the recursions and set the contentparameter c = 1, then our analysis corresponds to com-plex contagions in clustered monoplex networks, whichwas studied in [52].

Future work may consider in more details the impact ofassortativity or other topological features on the cascadedynamics. It would also be interesting to compare mul-tiplex networks and their monoplex projections in termsof other dynamical processes; e.g., site percolation, trans-port processes, etc. Another interesting direction wouldbe to consider networks that exhibit clustering not onlythrough triangles, but also through larger cliques [66]. Fi-nally, it would be interesting to study multiplex thresholddynamics that implement nonlinear rules as well; e.g., see[32, 67].

ACKNOWLEDGMENT

This research was supported in part by National Sci-ence Foundation through grant CCF #1422165, and inpart by the Department of ECE at Carnegie Mellon Uni-versity. A.A. acknowledge financial support by the Span-ish government through grant FIS2015-38266, ICREAAcademia and James S. McDonnell Foundation.

Page 14: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

14

[1] S. V. Buldyrev, R. Parshani, G. Paul, H. E. Stanley, andS. Havlin, Nature 464, 1025 (2010).

[2] A. Vespignani, Nature 464, 984 (2010).[3] O. Yagan, D. Qian, J. Zhang, and D. Cochran, IEEE

Transactions on Parallel and Distributed Systems 23,1708 (2012), arXiv:1201.2698v2.

[4] C. D. Brummitt, R. M. D’Souza, and E. Leicht, Pro-ceedings of the National Academy of Sciences 109, E680(2012).

[5] F. Radicchi and A. Arenas, Nature Physics 9, 717 (2013).[6] S. D. Reis, Y. Hu, A. Babino, J. S. Andrade Jr, S. Canals,

M. Sigman, and H. A. Makse, Nature Physics 10, 762(2014).

[7] R. Albert, H. Jeong, and A.-L. Barabasi, Nature 406,378 (2000).

[8] R. Cohen, K. Erez, D. Ben-Avraham, and S. Havlin,Physical Review Letters 85, 4626 (2000).

[9] R. Cohen, D. Ben-Avraham, and S. Havlin, PhysicalReview E 66, 036113 (2002).

[10] E. Leicht and R. M. D’Souza, arXiv preprintarXiv:0907.0894 (2009).

[11] J. Nagler, A. Levina, and M. Timme, Nature Physics 7,265 (2011).

[12] J. D. Murray, Mathematical Biology, 3rd ed. (Springer,New York (NY), 2002).

[13] S.-W. Son, P. Grassberger, and M. Paczuski, PhysicalReview Letters 107, 195702 (2011).

[14] G. Baxter, S. Dorogovtsev, A. Goltsev, and J. Mendes,Physical Review Letters 109, 248701 (2012).

[15] M. Dickison, S. Havlin, and H. E. Stanley, Physical Re-view E 85, 066109 (2012).

[16] R. M. Anderson and R. M. May, Infectious Diseases ofHumans (Oxford University Press, Oxford (UK), 1991).

[17] F. Sahneh, C. Scoglio, and P. Van Mieghem, IEEE/ACMTransactions on Networking 21, 1609 (2013).

[18] O. Yagan, D. Qian, J. Zhang, and D. Cochran, IEEEJournal on Selected Areas in Communications 31, 1038(2013).

[19] L. D. Valdez, P. A. Macri, H. E. Stanley, and L. A.Braunstein, Physical Review E 88, 050803 (2013).

[20] K. Zhao and G. Bianconi, Journal of Statistical Mechan-ics: Theory and Experiment 2013, P05005 (2013).

[21] D. Cellai, E. Lopez, J. Zhou, J. P. Gleeson, and G. Bian-coni, Physical Review E 88, 052811 (2013).

[22] N. Azimi-Tafreshi, J. Gomez-Gardenes, and S. Doro-govtsev, Physical Review E 90, 032816 (2014).

[23] G. Bianconi and S. N. Dorogovtsev, Physical Review E89, 062814 (2014).

[24] F. Radicchi, Nature Physics 11, 597 (2015).[25] A. A. Saberi, Physics Reports 578, 1 (2015).[26] T. W. Valente, Social Networks 18, 69 (1996).[27] J. P. Gleeson, Physical Review E 77, 046117 (2008).[28] D. J. Watts, Proceedings of the National Academy of

Sciences 99, 5766 (2002).[29] O. Yagan and V. Gligor, Physical Review E 86, 036103

(2012).[30] A. Hackett, D. Cellai, S. Gomez, A. Arenas, and J. P.

Gleeson, Physical Review X 6, 021002 (2016).[31] Y. Zhuang and O. Yagan, IEEE Transactions on Network

Science and Engineering 3, 211 (2016).

[32] C. D. Brummitt, K.-M. Lee, and K.-I. Goh, PhysicalReview E 85, 045102 (2012).

[33] M. Kivela, A. Arenas, M. Barthelemy, J. P. Gleeson,Y. Moreno, and M. A. Porter, Journal of Complex Net-works 2, 203 (2014).

[34] M. De Domenico, A. Sole-Ribalta, S. Gomez, andA. Arenas, Proceedings of the National Academy of Sci-ences 111, 8351 (2014).

[35] M. De Domenico, A. Sole-Ribalta, E. Cozzo, M. Kivela,Y. Moreno, M. A. Porter, S. Gomez, and A. Arenas,Physical Review X 3, 041022 (2013).

[36] B. Min, S. Do Yi, K.-M. Lee, and K.-I. Goh, PhysicalReview E 89, 042811 (2014).

[37] M. A. Serrano, L. Buzna, and M. Boguna, New Journalof Physics 17, 053033 (2015).

[38] C. Granell, S. Gomez, and A. Arenas, Physical ReviewLetters 111, 128701 (2013).

[39] G. J. Baxter, S. N. Dorogovtsev, J. F. F. Mendes, andD. Cellai, Physical Review E 89, 042801 (2014).

[40] M. De Domenico, C. Granell, M. A. Porter, and A. Are-nas, arXiv preprint arXiv:1604.02021 (2016).

[41] H. Wu, A. Arenas, and S. Gomez, arXiv preprintarXiv:1606.01688 (2016).

[42] S. Boccaletti, G. Bianconi, R. Criado, C. I. Del Ge-nio, J. Gomez-Gardenes, M. Romance, I. Sendina-Nadal,Z. Wang, and M. Zanin, Physics Reports 544, 1 (2014).

[43] D. Cellai and G. Bianconi, Physical Review E 93, 032302(2016).

[44] M. E. Newman, S. H. Strogatz, and D. J. Watts, PhysicalReview E 64, 026118 (2001).

[45] M. E. Newman, SIAM Review 45, 167 (2003).[46] D. J. Watts and S. H. Strogatz, Nature 393, 440 (1998).[47] M. A. Serrano and M. Boguna, Physical Review E 74,

056114 (2006).[48] M. E. J. Newman, Physical Review Letters 103, 058701

(2009).[49] J. P. Gleeson, S. Melnik, and A. Hackett, Physical Re-

view E 81, 066114 (2010).[50] J. Badham and R. Stocker, Theoretical Population Biol-

ogy 77, 71 (2010).[51] A. Hackett, J. P. Gleeson, and S. Melnik, International

Journal of Complex Systems in Science 1, 25 (2011).[52] A. Hackett, S. Melnik, and J. P. Gleeson, Physical Re-

view E 83, 056107 (2011).[53] X. Huang, S. Shao, H. Wang, S. V. Buldyrev, H. E.

Stanley, and S. Havlin, EPL (Europhysics Letters) 101,18002 (2013).

[54] P. Colomer-de Simon and M. Boguna, Physical ReviewX 4, 041020 (2014).

[55] E. Cozzo, M. Kivela, M. De Domenico, A. Sole-Ribalta,A. Arenas, S. Gomez, M. A. Porter, and Y. Moreno,New Journal of Physics 17, 073029 (2015).

[56] J. C. Miller, Physical Review E 80, 020901 (2009).[57] M. Molloy and B. Reed, Random structures & algorithms

6, 161 (1995).[58] S. Tang, J. Yuan, X. Mao, X.-Y. Li, W. Chen, and

G. Dai, in Proceedings of IEEE INFOCOM 2011 (2011)pp. 2291 –2299.

[59] P. Dodds and D. J. Watts, Physical Review Letters 92,218701 (2004).

Page 15: Clustering determines the dynamics of complex contagions ...oyagan/Journals/Influence_Clustered.pdf · ering the full multiplex structure of real-world systems in tackling similar

15

[60] K. B. Athreya and P. E. Ney, Branching processes, Vol.196 (Springer Science & Business Media, 2012).

[61] H. S. Wilf, generatingfunctionology (Elsevier, 2013).[62] J. P. Gleeson and D. J. Cahalane, Physical Review E 75,

056103 (2007).

[63] M. E. Newman, Physical Review Letters 89, 208701(2002).

[64] B. Soderberg, Physical Review E 68, 015102 (2003).[65] B. Soderberg, Physical Review E 66, 066121 (2002).[66] B. Karrer and M. E. Newman, Physical Review E 82,

066118 (2010).[67] K.-M. Lee, C. D. Brummitt, and K.-I. Goh, Physical

Review E 90, 062816 (2014).