arXiv:1901.07933v1 [physics.soc-ph] 23 Jan 2019 ...



Extracting significant signal of news consumption from social networks: the case of Twitter in Italian political elections

Carolina Becatti,1 Guido Caldarelli,1, 2, 3, 4 Renaud Lambiotte,5 and Fabio Saracco1

1 IMT School for Advanced Studies, P.zza S. Francesco 19, 55100 Lucca (Italy)

2 Istituto dei Sistemi Complessi (CNR) UoS Università Sapienza, P.le A. Moro 2, 00185 Rome (Italy)

3 European Centre for Living Technology, Università di Venezia Ca’ Foscari, S. Marco 2940, 30124 Venice (Italy)

4 Catchy srl, Talent Garden Poste Italiane, Via Giuseppe Andreoli 9, 00195 Rome (Italy)

5 Mathematical Institute, University of Oxford, Oxford (UK)

According to the Eurobarometer report about EU media use of May 2018, the number of European citizens who consult on-line social networks for accessing information is considerably increasing. In this work we analyze approximately $10^6$ tweets exchanged during the last Italian elections. By using an entropy-based null model discounting the activity of the users, we first identify potential political alliances within the group of verified accounts: if two verified users are retweeted more than expected by the non-verified ones, they are likely to be related. Then, we derive the users’ affiliation to a coalition measuring the polarization of unverified accounts. Finally, we study the bipartite directed representation of the tweets and retweets network, in which tweets and users are collected on the two layers. Users with the highest out-degree identify the most popular ones, whereas highest out-degree posts are the most “viral”. We identify significant content spreaders by statistically validating the connections that cannot be explained by users’ tweeting activity and posts’ virality by using an entropy-based null model as benchmark. The analysis of the directed network of validated retweets reveals signals of the alliances formed after the elections, highlighting commonalities of interests before the event of the national elections.


I. INTRODUCTION

In recent years, the ongoing computer revolution has made evident that the interconnections between individuals in society form a different (non-regular) lattice on which news and gossip propagate from one person to another [1]. Indeed, the way Europeans approach news and information has changed drastically. According to the Eurobarometer report about EU media use of May 2018 [2], the printed press is consulted every day by 28% of EU citizens, following a decreasing trend, while it is never consulted by approximately 20% of them. On the other hand, daily access to the Internet has increased to 65% of the population (from 45% in 2010), and the increase in the percentage of citizens using on-line social networks every day to access information is even more striking, rising from 18% in 2010 to 42% in 2017. This movement towards new technologies, besides differing from country to country (for instance, in Belgium everyday access to the web concerns 72% of the population, against 53% in Italy), is paradoxically at odds with a general distrust of new media: only 20% of the population declared that they trust the contents available in on-line social networks and 34% trust the Internet in general, while radio (59%) and TV (51%) are considered more reliable.

In order to provide a global understanding of the structure and dynamics of these new media, complex network studies [3, 4] have tackled different aspects of these phenomena. Several studies analyzed the rise of grass-roots democratic protests such as the Arab Spring [5], Occupy Wall Street [6] or the Spanish “Indignados” [7]. Analogously, the dynamics of electoral campaigns in online media has a (relatively) long literature, focusing, from time to time, on the USA [8–14], Australia [15, 16], Norway [17], Spain [18], Italy [19–21], France [22] and the UK [23, 24]. Generally, the shift from mediated to disintermediated news consumption has led to a range of documented phenomena: users tend to focus on information reinforcing their opinion (confirmation bias [25–29]) and to group in clusters of people with similar viewpoints, forming the so-called echo chambers [26–31]. The different dynamics that public debate follows on social-network platforms are also remarkable: the time evolution of viral non-verified contents is more persistent than that of their verified equivalent [27], and “negative” messages spread faster than “positive” ones, even if the latter reach on average a wider audience [32]. Moreover, the analysis of the time evolution of activity on social platforms helps to predict the trend of retweets [33] and the interactions of a single user with her/his neighbours [34], and to detect future developments of information campaigns at an early stage [35] or “astroturf” campaigns [22].

Extracting information from on-line social networks is often complicated by their complex, intertwined organization and their strong heterogeneity. Salient features can be identified as significant deviations from carefully constructed null models. An unbiased entropy-based null model, providing a general framework for the analysis of real-world networks, can be obtained following the information-theoretic derivation of statistical physics [36–38]: starting from an observed network, define an ensemble comprising all possible networks with the same number of nodes [39] but a varying number of links, ranging from the empty network to the fully connected one via all possible link configurations. Then define a probability distribution over the ensemble; the functional form of this distribution can be derived by maximizing the entropy under certain constraints [40], i.e. preserving the ensemble averages of some quantities of interest. In order to estimate the parameters of the probability distribution, we maximize the likelihood of observing the real network [41, 42]. The crucial point of the above construction is the role of the constraints: in order to provide a reliable null model, constraints should represent important properties of the system under analysis. Depending on the application, they may represent the total number of links, as in the Erdős-Rényi null model, the degree sequence [40–42], or other topological properties [43–48].
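In compact form, the construction described above corresponds to the standard constrained entropy maximization written below. The notation is introduced here only for illustration and is not taken from the text: $\mathcal{G}$ is the ensemble, $G^*$ the observed network, $C_k$ the constrained quantities and $\theta_k$ the associated Lagrange multipliers.

```latex
\begin{align}
  S[P] &= -\sum_{G \in \mathcal{G}} P(G) \ln P(G), \\
  \text{subject to}\quad
  &\sum_{G \in \mathcal{G}} P(G) = 1,
  \qquad
  \sum_{G \in \mathcal{G}} P(G)\, C_k(G) = \langle C_k \rangle
  \quad (k = 1, \dots, K), \\
  \Rightarrow\quad
  P(G \mid \vec{\theta}) &=
  \frac{e^{-\sum_k \theta_k C_k(G)}}{Z(\vec{\theta})},
  \qquad
  Z(\vec{\theta}) = \sum_{G' \in \mathcal{G}} e^{-\sum_k \theta_k C_k(G')}, \\
  \vec{\theta}^{*} &= \arg\max_{\vec{\theta}} \ln P(G^{*} \mid \vec{\theta})
  \quad\Longleftrightarrow\quad
  \langle C_k \rangle_{\vec{\theta}^{*}} = C_k(G^{*}) \quad \forall k .
\end{align}
```

The last line states that, for distributions of this exponential form, maximizing the likelihood of the observed network is equivalent to requiring that the ensemble averages of the constraints match their observed values.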

In the present work, we analyze ∼$10^6$ tweets exchanged during the last Italian elections. Using entropy-based null models, we detect non-trivial patterns in the diffusion of viral contents and different diffusion strategies depending on the polarization of the users. We first employ an undirected representation of the network of retweets, distinguishing certified from non-certified users. The former group is mostly composed of celebrities and official accounts, including politicians, newspapers, etc. Then we identify groups of verified users by their interaction with the opposite layer, following the recipe of [49, 50]: if two verified users are retweeted more than expected by non-verified ones, they are likely to be related. We analyse the community organisation of the resulting network and measure the polarization of unverified users: as observed in other studies [25–31, 51, 52], people tend to interact with just a single community, strongly polarizing their opinions, and we confirm this observation for the Italian elections of 2018. Finally, we study a bipartite and directed representation of the users’ tweets and retweets network, in order to identify significant news consumers. In this context, we detect both the most popular and the most significant content spreaders for the different communities. In the former case, we focus only on the users with the highest out-degree; in the latter case, we statistically validate the (directed) connections that cannot be simply explained by the “virality” of the tweets and the tweet/retweet activity of the users. This last validation is an extension of the approach of [49, 50] to directed bipartite networks.

Our analysis uncovers the various strategies for the electoral campaign followed by the different political alliances and highlights different levels of participation in the political debate, providing indications about the role of users in spreading viral content inside each community. Moreover, by analysing the structure of the directed network of statistically validated retweets, we observe not only the alliances presented before the elections, but also a signal of the alliances formed after them. Indeed, our study shows the proximity of the electorates of the governing parties during the electoral debate, despite their belonging to different pre-election alliances.

The paper is structured as follows: in Section II we describe the analyses performed and report the main findings; in Section III we summarize the research questions, the results and the contribution of this work. More detailed information regarding the dataset, the analyses and the methods can be found in Section IV and in the Supplementary Material file.

II. RESULTS

A. Identification of alliances via unverified user behaviours

Using the Twitter API, we downloaded a sample of all tweets posted from January 28 to March 19, 2018. The only requirement of the query was that each post contain at least one of a set of keywords related to the Italian elections. For more details about the nature of the dataset, refer to the Supplementary Material file, Section I.

As a first step, we split the sample of Italian-speaking users into two categories, verified and non-verified users. Each account can request to be verified by the system: by doing so, Twitter guarantees its authenticity. This procedure is generally applied to people considered of public interest. Therefore, we expect famous people, politicians, newspapers, TV channels, radio channels etc. to have verified accounts, and the remaining users to belong to the other set. Then, we build the bipartite network of retweets between verified and non-verified users: an edge between two users indicates that one has retweeted the other’s content at least once during the available time period. At this step we disregard the direction of the edges; therefore, in principle, we do not know who is the author of the post and who is the one sharing the content. However, as shown in Figure 2 of the Supplementary Material file, the actions of tweeting and sharing contents are mostly performed by verified accounts, while non-verified users mainly retweet already published posts.

In order to obtain groups of verified users based on their activity, we project the information contained in this bipartite network onto the layer of verified users. The result of the classical projection methods is a weighted monopartite network: two users are connected if they share at least one common neighbour on the opposite layer, and the edge between them is weighted by the number of non-verified users who have retweeted posts from both. However, this method often generates a dense projection; thus, several procedures have been proposed to establish the significance of the edges in the projected network, discounting different pieces of information. For this application we use as a benchmark an entropy-based null model that discounts the information contained in the degree sequences of both layers [50]; in this way we focus on overlaps that cannot be explained just by the activity of the users. A brief description of this approach can be found in Section IV; refer to the Supplementary Material for the details of the entire procedure.
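The projection and the idea of testing an overlap against a null model can be illustrated with a small Python sketch. It is not the procedure of [50]: it assumes that link probabilities under the null model are already available (the uniform value 0.5 below is a placeholder), that links are independent, and that the number of common neighbours therefore follows a Poisson-binomial distribution. The names `co_retweeters`, `poisson_binomial_pmf`, `p_value` and the toy data are hypothetical, and the multiple-testing correction needed when many pairs are tested is omitted.

```python
from itertools import combinations


def co_retweeters(neighbours):
    """Naive projection: number of common neighbours for every pair of verified users."""
    return {
        (i, j): len(neighbours[i] & neighbours[j])
        for i, j in combinations(sorted(neighbours), 2)
    }


def poisson_binomial_pmf(probs):
    """Distribution of the number of successes among independent Bernoulli trials."""
    pmf = [1.0]
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1.0 - p)   # trial fails: count stays at k
            nxt[k + 1] += mass * p       # trial succeeds: count becomes k + 1
        pmf = nxt
    return pmf


def p_value(observed, probs):
    """Probability of at least `observed` common neighbours under the null model."""
    return sum(poisson_binomial_pmf(probs)[observed:])


# Toy bipartite network (hypothetical): verified user -> non-verified retweeters.
neighbours = {"v1": {"a", "b", "c"}, "v2": {"a", "b", "c"}, "v3": {"c", "d"}}
non_verified = ["a", "b", "c", "d"]

# Placeholder link probabilities; a fitted null model would supply these values.
link_prob = {(v, a): 0.5 for v in neighbours for a in non_verified}

projection = co_retweeters(neighbours)
# Under independence, a shared neighbour `a` occurs with probability p_ia * p_ja.
motif_probs = [link_prob[("v1", a)] * link_prob[("v2", a)] for a in non_verified]
pval = p_value(projection[("v1", "v2")], motif_probs)
```

An edge of the projection would be retained only when its p-value falls below a threshold chosen with an appropriate multiple-hypothesis correction.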

Given the projected and validated network of retweets, we performed a reshuffled community detection procedure, i.e. the Louvain algorithm [53] is run several times with a rearranged ordering of the nodes and the partition with the highest modularity is selected; in this way we overcome the order dependence of the original algorithm [54]. With this procedure we identify ten groups of non-isolated nodes. However, for the following analyses we focus on a subset of four of them, namely those with a remarkable number of nodes (more than a hundred) and a non-trivial interpretation. Two blocks identify quite well the groups of Movimento 5 Stelle (from now on M5S) and right-leaning politicians. For instance, in the former we find the accounts M5S Camera, M5S Senato, M5S Europa and Movimento 5 Stelle, as well as the politicians Danilo Toninelli and Luigi Di Maio. In the latter, instead, we see the accounts Forza Italia, Lega - Salvini Premier, Gruppo FI Camera, Fratelli d’Italia, Noi con Salvini, as well as the users Silvio Berlusconi, Matteo Salvini, Renato Brunetta and Giorgia Meloni. The two remaining communities are more heterogeneous. In one of them, we find a high number of radios, newspapers and newscasts, such as Rai Radio 2, Radio 105, RTL 102.5, Tg Rai, Tg La7, Sky Tg 24, Rai News, la Repubblica, Il Corriere della Sera, Il Post. Since the majority of its nodes are official accounts of news media, we characterize this group as the one collecting news spreaders and information channels. Finally, the last community encompasses some politicians within the left-leaning parties, such as Matteo Renzi and other figures belonging to the Democratic party (including the account of the Partito Democratico itself). We therefore interpret this last community as the left-leaning one.

FIG. 1. Hashtags with a frequency higher than 0.5% of the total number of hashtags used by each group of verified users, excluding “elezioni”, “elezioni2018”, “elezionipolitiche”, “4marzo”, “4marzo2018”, “marzo2018”, “politiche”, “politiche2018”, “elezionipolitiche2018”, “elezioni4marzo2018”. The colour scales have been obtained by rescaling the original frequencies between 0 and 1.
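The reshuffling step of the community detection in this subsection can be sketched in Python as follows. To keep the example short and dependency-free, the Louvain algorithm is replaced by a simplified, order-dependent greedy pass (`local_moving`), which is only a stand-in for Louvain's first phase; the function names, the toy graph and the number of runs are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def modularity(adj, label):
    """Newman modularity of an undirected, unweighted graph for a node -> label map."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    internal = defaultdict(int)  # internal edges, each counted twice
    degree = defaultdict(int)    # total degree per community
    for u, nbrs in adj.items():
        degree[label[u]] += len(nbrs)
        internal[label[u]] += sum(1 for v in nbrs if label[v] == label[u])
    return sum(internal[c] / two_m - (degree[c] / two_m) ** 2 for c in degree)


def local_moving(adj, order):
    """Order-dependent greedy pass: a simplified stand-in for the Louvain algorithm."""
    label = {u: u for u in adj}
    improved = True
    while improved:
        improved = False
        for u in order:
            best_label, best_q = label[u], modularity(adj, label)
            for candidate in sorted({label[v] for v in adj[u]}):
                trial = dict(label)
                trial[u] = candidate
                q = modularity(adj, trial)
                if q > best_q + 1e-12:  # accept only strict improvements
                    best_label, best_q = candidate, q
            if best_label != label[u]:
                label[u] = best_label
                improved = True
    return label


def reshuffled_detection(adj, runs=20, seed=0):
    """Repeat the detection with shuffled node orders; keep the best partition."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    best_label, best_q = None, float("-inf")
    for _ in range(runs):
        order = nodes[:]
        rng.shuffle(order)
        label = local_moving(adj, order)
        q = modularity(adj, label)
        if q > best_q:
            best_label, best_q = label, q
    return best_label, best_q


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
best_label, best_q = reshuffled_detection(adj)
```

Because each accepted move strictly increases modularity, every run terminates, and selecting the maximum over shuffled orders removes the dependence on any single node ordering.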

B. Structure of the communities

Given the division into political alliances, we start by analyzing the topological characteristics of the subgraphs formed by each group. An important insight into users’ behaviour comes from the hashtags used by verified users. Excluding the set of keywords used to extract the data, Figure 1 shows the hashtags used most frequently by the verified users of each community, selecting those with a frequency higher than or equal to 0.5% of the total number of hashtags in the single community. All political (i.e. yellow, red and blue) communities have the name of their own party or coalition as the most mentioned tag; indeed we observe, respectively, the hashtags “m5s”, “pd” and “centrodestra” in first position. The second most used hashtag, in contrast, refers to the main opponent of the political alliance represented by the community. Notably, for both the right- and left-leaning alliances this is the Movimento 5 Stelle, which was indeed the most voted party at the elections. The second most used hashtag in the Movimento 5 Stelle community is instead “renzi”, the leader of the Partito Democratico, the governing party at the time of the elections. The major exponents of the other parties are also mentioned: for instance, “berlusconi”, “salvini” and “dimaio” appear in the left- and right-leaning communities, as well as in the M5S one and in the mixed group. In summary, even if the frequencies of the hashtags are comparable, all alliances have the name of their own coalition as the most frequent hashtag, followed by the names of their major competitors. More comments about the frequency of hashtags and an analysis of the information channels consulted by users in the different communities can be found in the Supplementary Materials, Section II.

[Figure 2. Left panel: 4 × 4 heatmap with communities 1 to 4 on both axes and a colour scale from 0.0 to 1.0; cell values by row: (0.82, 0.075, 0.044, 0.03), (0.014, 0.89, 0.071, 0.026), (0.012, 0.039, 0.93, 0.016), (0.02, 0.028, 0.0089, 0.94). Right panel: histogram with “polarization values” (0.0 to 1.0) on the x-axis and “frequencies” (0 to 7) on the y-axis.]

FIG. 2. Left: heatmap reproducing the average fraction of interactions towards each community, for subsets of non-verified users. Each square in the heatmap indicates the average value of the quantity reported on the x-axis, computed over the set of non-verified users belonging to the community on the y-axis. Right: histogram of obtained polarization values.

C. Polarization analysis

Having identified four groups of political figures and information channels representing the Italian political scenario, we want to analyse how the remaining accounts interact with them. In other words, the goal is to study whether the audience is polarized towards the source of information that best matches their ideology or uniformly shares contents from accounts of different political orientations.

Let $C_c$, for $c = 1, 2, 3, 4$, be the sets of verified users identified at the end of the phase above, denoting respectively the M5S community, the information channels community, the left-leaning accounts and the right-leaning accounts. Also let

$$N_\alpha = \{\, i : i \in L \text{ and } m^*_{i\alpha} = 1 \,\}$$

be the set of neighbours of the non-verified user $\alpha \in \Gamma$ in the bipartite network of verified/unverified users, i.e. the set of verified users that node $\alpha$ has interacted with (here $L$ and $\Gamma$ are the layers of verified and non-verified users, respectively). The polarisation index for $\alpha$ is

$$\rho_\alpha = \max\{\, I_{\alpha,c} : c = 1, 2, 3, 4 \,\} \quad \text{for } \alpha \in \Gamma \tag{1}$$

with

$$I_{\alpha,c} = \frac{|C_c \cap N_\alpha|}{|N_\alpha|} \quad \text{for } c = 1, 2, 3, 4. \tag{2}$$

The term $I_{\alpha,c}$ denotes the fraction of $\alpha$’s interactions directed towards community $c$, i.e. the fraction of $\alpha$’s neighbours belonging to community $c$. This index has the following characteristics: it is bounded in $[0, 1]$; $\rho_\alpha = 0$ means that no interaction has been observed with the four groups; values of $\rho_\alpha$ close to 1/4 indicate that user $\alpha$ interacts equally with the four clusters; for all other values, the greater $\rho_\alpha$, the higher the inequality in the number of interactions with the four communities (i.e. $\rho_\alpha$ close to 1 means that user $\alpha$ has almost always interacted with the same group).
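Equations (1) and (2) translate directly into a few lines of Python. The sketch below is illustrative only: the function names and the toy account identifiers are hypothetical, and returning 0 for a user without neighbours is a convention assumed here (the text does not cover that case, since every user in the bipartite network has at least one neighbour).

```python
def interaction_fractions(neighbours, communities):
    """I_{alpha,c} of equation (2): share of alpha's neighbours in each community."""
    if not neighbours:
        # Convention assumed for this sketch; not specified in the text.
        return [0.0] * len(communities)
    return [len(neighbours & c) / len(neighbours) for c in communities]


def polarization(neighbours, communities):
    """rho_alpha of equation (1): the largest of the four fractions."""
    return max(interaction_fractions(neighbours, communities))


# Toy communities C_1..C_4 (hypothetical account identifiers).
communities = [{"m1", "m2"}, {"n1"}, {"l1", "l2"}, {"r1"}]

# A user retweeting two M5S accounts, one left-leaning account and one account
# outside the four communities: fractions 0.5, 0, 0.25, 0.
mostly_m5s = polarization({"m1", "m2", "l1", "x"}, communities)

# A user spreading attention evenly over the four communities.
balanced = polarization({"m1", "n1", "l1", "r1"}, communities)
```

In this toy case `mostly_m5s` evaluates to 0.5 and `balanced` to 0.25, matching the interpretation of the index given above.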

The choice of this index has been mostly driven by the observation of users’ interactions with the communities (see the left panel of Figure 2). Each square in the heatmap shows the average value of the quantity on the x-axis, computed over the set of non-verified users belonging to the community on the y-axis. Most of the non-verified users have an extremely unbalanced distribution of interactions with the members of the political alliances. A pictorial representation of the entire distribution of the terms in (2) within each community is provided in Figure 4 of the Supplementary Material. On the right-hand side of Figure 2 we instead represent the histogram of the polarization values obtained for the non-verified users. As anticipated by the preliminary evaluation of users’ behaviour, many non-verified users show high values of the polarization index, meaning that their attention is mostly focused on a limited group of political figures.

FIG. 3. Biadjacency matrix of the network of retweets between verified and non-verified users. The coloured blocks identify the four communities of verified users obtained with the community detection method: yellow for M5S, red and blue respectively for left- and right-leaning alliances and purple for the information channels community. The rows of the matrix have been ordered according to the division in communities, while the columns are sorted according to the political affiliation of non-verified users, i.e. the number of interactions with each group.

FIG. 4. Left: user i is a central user and publishes two posts, p and q, during the elections. Users j and k respectively retweet the same posts p and q at least once. Right: illustration of the case in which i retweets one of her/his own posts (the red arrow). This case is excluded from the analysis.

Figure 3 shows the biadjacency matrix of the bipartite network of verified and non-verified users. The coloured blocks identify the four communities obtained with the community detection method: the red and blue blocks respectively identify the groups of left-leaning and right-leaning politicians; the yellow community collects the available political figures within the M5S party, while the violet group represents the information channels community. The rows of the matrix have been ordered according to the division into communities of the verified users, while the non-verified users have been sorted according to their political affiliation, i.e. the number of interactions towards each group, which is equivalent to computing the numerator of the term in equation (2) for all the available communities. The matrix exhibits a block structure along the diagonal, indicating a greater number of interactions towards the “preferred” community than towards the others, and therefore a higher density of links within the blocks than outside them. Contrary to the majority of studies on the same subject, the polarisation of non-certified users has not been studied by simply labelling the units in the layer of verified accounts. Instead, users in the same community share a significantly high number of non-verified users that have retweeted their content. Indeed, a validated edge between users i and j indicates that a high number of non-verified accounts who shared contents posted by i have also retweeted posts published by j, so that i and j are considered similar by a majority of their audience and followers. Therefore, the behaviour of the non-certified accounts is the driver for understanding the division into clusters.


D. Influence analysis

At this point, we proceed with the identification of the significant sources of viral Twitter content. Inspired by the work of [14], we propose a bipartite and directed network of information flow. An example of this kind of graph is provided in Figure 4. The users on the upper layer tweet and retweet the posts represented on the lower layer. An outgoing edge (i, p), as in the figure, indicates that user i has published tweet p during the time period T, while a link in the opposite direction (p, j) denotes that user j has retweeted the same post p. In this situation, j behaves as a spreader: the larger the number of spreaders of a user, the larger the audience of the contents shared by her/him. Moreover, the larger the number of tweets posted by i that have been retweeted by j, the tighter the social bond between the two.

As in [14], we simply project the bipartite and directed network of tweets and retweets onto the users layer: a directed edge from i to j in the projected graph indicates that j has retweeted i’s posts at least once. A list of the nodes with the highest in- and out-degrees per community can be found in the Supplementary Material file, Tables IV and V. However, these findings are not sufficient to identify the most effective users in the directed network of retweets, since we lack a benchmark for stating whether j is a significant contributor to the popularity of i’s posts. The identification of such significant ties is performed in two steps. First, we define a suitable benchmark model to evaluate the significance of pairwise connections; this can be done using the Bipartite Directed Configuration Model proposed in [48], thus discounting the information related to the in- and out-degrees of the nodes of both layers (in the present case, the in- and out-degrees of posts and users). Then we extend the validation presented in [50] to this kind of graph, in order to identify the significant ties between source and spreader and validate the directed network of information flows. The overall procedure is explained in the Methods section (Section IV), and more details can be found in the Supplementary Materials. In the final, validated and directed network of users, two users are connected only if one of them retweets the other’s contents more than expected under the considered null model.
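The simple (non-validated) projection onto the users layer can be sketched in Python as below. The input format, a mapping from each post to its author plus a list of (retweeter, post) records, and all function and variable names are assumptions made for illustration; the statistical validation against the null model is not part of this sketch.

```python
from collections import defaultdict


def project_retweets(author_of, retweets):
    """Directed projection: edge (i, j) weighted by the number of i's posts retweeted by j."""
    shared = defaultdict(set)
    for user, post in retweets:
        author = author_of[post]
        if user != author:  # self-retweets are excluded, as in Figure 4 (right)
            shared[(author, user)].add(post)
    return {edge: len(posts) for edge, posts in shared.items()}


def out_degree(edges):
    """Number of distinct spreaders of each source user."""
    degree = defaultdict(int)
    for source, _spreader in edges:
        degree[source] += 1
    return dict(degree)


# Toy data (hypothetical): user i publishes posts p and q.
author_of = {"p": "i", "q": "i"}
# j retweets p (twice), k retweets q, and i retweets her/his own post p.
retweets = [("j", "p"), ("k", "q"), ("i", "p"), ("j", "p")]

edges = project_retweets(author_of, retweets)
degrees = out_degree(edges)
```

Storing the set of shared posts per pair means that repeated retweets of the same post by the same user count once, which is one possible reading of "at least once"; a different weighting would only require changing what is accumulated in `shared`.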

A pictorial representation of the network of validated retweets is provided in Figure 5. The colour of a node identifies the user’s community, while its size indicates its out-degree in the validated graph. The structure of this network is better represented in Figures 6 and 7. Each plot focuses on the structure of the subgraph of the directed network generated by each community. Node size is directly proportional to the out-degree in the subgraph: the larger the node, the higher the number of times that user has been retweeted by the other accounts. Node colour instead indicates whether the account has been verified or not: blue for verified users, orange for non-verified ones.

The first plot is related to the M5S community and shows a strongly connected block of (mostly non-verified) nodes that exchange retweets among themselves and with the verified accounts of the community, the most central of which are the Twitter accounts of the newspaper Il Fatto Quotidiano and its journalists Marco Travaglio, Peter Gomez and Antonio Padellaro. In the other communities journalists and newspapers do not form such a strong core. It is interesting to notice that the M5S political leader Luigi Di Maio does not belong to this large component, but is located in a small community outside it.

The second plot represents the purple community. The most central nodes are the verified accounts of newspapers (see for example La Repubblica or Il Corriere della Sera) and information channels (such as Sky TG24, Tg La7, Agenzia Ansa or Rainews). Some politicians, such as Pietro Grasso and Giuseppe Civati, are present in this group, together with the political parties Rifondazione and Potere al Popolo. These politicians represent a more extreme left-leaning orientation and have not found a commonality of interests and supporters with the accounts in the community of the Partito Democratico; indeed, they belong to different communities.

In the plot associated with the left-leaning community (the third one) we identify a central block of mostly non-verified users. The most retweeted figures are Matteo Renzi and the account of the Partito Democratico, as shown by their high out-degrees. The remaining verified nodes are mostly well-known figures of the political scene (such as Maria Elena Boschi and Carlo Calenda), as well as newspapers (see for example Il Foglio or IlSole24Ore). Among the non-verified users we find the accounts of the Partito Democratico branches of the Milan and Rome areas.

Finally, the plot associated with the right-leaning community is characterised by two quite separate clusters. One of them is centered on the accounts of people belonging to Lega Nord, such as Matteo Salvini and Claudio Borghi, and on the party account Lega - Salvini Premier. On the other side there are the accounts of Forza Italia and Gruppo FI Camera and some of its exponents, like Silvio Berlusconi or Renato Brunetta. The two verified nodes of Giorgia Meloni and Fratelli d’Italia (the political party she leads) receive retweets from both sides, while being closer to the Lega pole. Another popular node is CasaPound, which has its own circle of retweeters and shares some interactions with the Lega Nord subgroup.

FIG. 5. Entire directed network of retweets after the bipartite validation procedure. An edge from i to j indicates that j has retweeted i a significantly high number of times. Node colour identifies the community to which the user belongs (red for left-leaning, yellow for M5S, purple for information channels and blue for right-leaning communities), while node size indicates the out-degree in the validated graph (i.e. how often their posts have been retweeted a significantly high number of times). Note that the blue nodes are divided into 2 subcommunities, one closer to the M5S community, the other closer to the left-leaning one.

In order to understand how viral news propagates through the validated network, we analyse the percentage of retweets coming from inside the same community and the percentage that instead comes from the other ones. The results are shown in Figure 1 of the Supplementary Materials and are extensively commented therein.
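A minimal Python sketch of this kind of breakdown is given below. It assumes that every validated edge (source, spreader) counts once and that each user carries a single community label; the function name, the labels and the toy edges are hypothetical, and the sketch does not reproduce the figures of the Supplementary Materials.

```python
from collections import Counter, defaultdict


def retweet_origin_shares(edges, community):
    """For each community, the share of received retweets coming from each community."""
    counts = defaultdict(Counter)
    for source, spreader in edges:
        # The source's community receives a retweet from the spreader's community.
        counts[community[source]][community[spreader]] += 1
    return {
        target: {origin: n / sum(origins.values()) for origin, n in origins.items()}
        for target, origins in counts.items()
    }


# Toy validated edges (hypothetical): an edge (i, j) means that j retweeted i.
edges = [("a", "b"), ("a", "c"), ("a", "x"), ("x", "y")]
community = {"a": "C1", "b": "C1", "c": "C1", "x": "C2", "y": "C2"}

shares = retweet_origin_shares(edges, community)
```

Here two of the three retweets received by the toy community C1 come from C1 itself, so its internal share is 2/3.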

III. DISCUSSION

In this work, we focused on the identification of the sources of viral Twitter content during the Italian electoral campaign of 2018. We gathered the data from the Twitter API during the month before the elections. As a first step, we exploited the way people consume news on social media to identify four groups of strongly connected verified users: two certified users are connected in the validated network of retweets if a significantly high number of non-certified accounts retweet content published by both of them. By construction, the behaviour of non-certified users is exploited to understand the division of the political sphere into clusters. An analysis of the obtained communities shows that the most central accounts are mostly newspapers, journalists and information channels in general, each of them belonging to one of the previously listed clusters (see Table II of the Supplementary Material file). Looking at the hashtags included in the posts published by the groups, we see that the most used keywords refer to the party itself and its members, followed by keywords related to political competitors. Given the obtained division into political alliances, we studied the behaviour of the remaining non-verified users towards these groups. More specifically, we observed the fraction of retweets directed towards each alliance and found a strongly polarized behaviour, since the majority of the uncertified accounts in the bipartite network of retweets mostly interact with one community only. In order to strengthen this result, we also performed a different analysis, comparing the distributions of polarization values observed for users with the same number of interactions (see Figure 5 of the Supplementary Material). In this case too, the distribution is skewed towards the higher values, indicating a focus on the same group of users.

As a second step, we focused our attention on the identification of significant news spreaders. Following the methodology presented in [14], we constructed the directed network of retweets among users and selected the names with the highest out-degree and in-degree. In order to statistically validate our findings, we constructed the bipartite and directed network of tweets and retweets introduced in Section IV B 2 and performed the validation procedure described therein. The outcome of this analysis is a monopartite, directed network of users, in which an edge from i to j indicates that the latter retweeted contents posted by the former a significantly high number of times. The visualization of this validated network helps to understand the actual composition of each coalition, as well as the possible interconnections between them. For instance, we observed that approximately 8% of the connections between one community and another occur between verified and unverified users, where, in most cases, the latter retweeted some posts from the former. However, we also see connections involving newspapers and information channels belonging to different coalitions, confirming again their essential role and centrality in spreading news on social networks. Finally, we analysed the origin of the retweets received by each community: even though the distribution seems less polarized in this case, a higher percentage of the retweets received by a community comes from users belonging to the community itself, especially in the cases of C3 and C4, showing that most of the interactions still come from the same sphere of influence (see Figures 1 and 6 of the Supplementary Material).
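The share of retweets that a community receives from its own members can be computed along the following lines. This is a minimal sketch under stated assumptions: the function name, the edge format (author, retweeter, weight) and the toy labels are hypothetical and are not taken from the analysis itself.

```python
from collections import defaultdict


def internal_retweet_share(edges, community_of):
    """Share of the retweets received by each community that comes from inside it.

    edges: iterable of (author, retweeter, weight); an edge i -> j means that
           j retweeted content posted by i.
    community_of: dict mapping users to community labels.
    """
    received = defaultdict(float)
    internal = defaultdict(float)
    for author, retweeter, weight in edges:
        c_author = community_of.get(author)
        c_retweeter = community_of.get(retweeter)
        if c_author is None or c_retweeter is None:
            continue  # ignore users without a community label
        received[c_author] += weight
        if c_author == c_retweeter:
            internal[c_author] += weight
    return {c: internal[c] / received[c] for c in received}


# Toy data with hypothetical users.
community_of = {"a": "C3", "b": "C3", "c": "C4", "d": "C4"}
edges = [("a", "b", 3), ("a", "c", 1), ("d", "a", 2)]
shares = internal_retweet_share(edges, community_of)
```

Here three of the four retweets received by C3 come from C3 itself, whereas all retweets received by C4 come from outside.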

In our view, the methodological contributions of this paper are manifold. First, unlike the majority of papers dealing with the same topic, our dataset was not manually labelled: we identified groups of strongly connected users starting from the behaviour of non-certified ones and their interactions with certified people. Despite this data-driven approach, we managed to identify four clusters of users that are closely aligned with the known Italian political division. Therefore, how people consume the news and interact with the main political figures helps to shed light on the actual division of the verified users according to their political orientation. Our second contribution resides in the representation of the network of activities on Twitter as a bipartite, directed network. We employed the null model for directed bipartite networks proposed in [48] to identify the significant information flows between pairs of users and different communities. We observe that the right-leaning alliance is divided into two subcommunities: one centred on Berlusconi and Forza Italia, closer to the Partito Democratico community; the other, led by Salvini and the Lega party, closer to the M5S community. The result is striking, since M5S and Lega actually allied to form a new government after the elections, when no predefined alliance obtained the absolute majority. Our analysis suggests that the seeds of these commonalities of interests were already present in the way different electors acted on Twitter.

Interesting lines of research, left for future investigations, include a validation of our approach on a corpus of different elections, in order to identify potential regularities or differences between countries, and the use of tools from natural language processing to infer how positively- and negatively-connoted tweets are distributed across the communities.

IV. METHODS

A. Definition of the polarisation network

1. Bipartite network of verified/unverified users

As a first step, we split the sample of available Italian-speaking users into two categories, the groups of verified and non-verified users. Each account can request to be verified by the system: by doing so, Twitter guarantees that the account is authentic. This kind of procedure is, in general, applied to all those people who are considered of public interest. Therefore, we expect the accounts of famous people, politicians, newspapers, TV channels, radio channels etc. to be included in the set of verified users, and all the remaining users to belong to the other set. Given this division, we construct a first bipartite network from the data, the network of retweets between verified and non-verified users during the whole period, $G^*_{\mathrm{Bi}} = (L, \Gamma, E)$ [55]. By definition of bipartite networks, with this representation we exclude all the cases in which two certified (or two non-certified) users retweet each other. We do so since we are interested in exploiting the way normal people consume the news to detect connected groups of certified users. Refer to Section V A 1 of the Supplementary Materials for a more detailed description of this network.
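As an illustration only, the construction of a biadjacency matrix for the verified/non-verified network can be sketched as follows. The function name, the input format and the toy accounts are hypothetical. The sketch also assumes a binary link that is present whenever a non-verified user has retweeted a verified one at least once; the exact definition is the one given in the Supplementary Materials.

```python
import numpy as np


def build_biadjacency(retweets, verified):
    """Binary biadjacency matrix of the verified/non-verified retweet network.

    retweets: iterable of (retweeter, author) pairs, one per retweet.
    verified: set of verified accounts.
    Returns (M, rows, cols): M[i, a] = 1 if non-verified user cols[a] has
    retweeted verified user rows[i] at least once (assumed link definition).
    """
    pairs = set()
    for retweeter, author in retweets:
        # Pairs of two verified or two non-verified users are excluded.
        if author in verified and retweeter not in verified:
            pairs.add((author, retweeter))
    rows = sorted({v for v, _ in pairs})
    cols = sorted({u for _, u in pairs})
    row_idx = {v: k for k, v in enumerate(rows)}
    col_idx = {u: k for k, u in enumerate(cols)}
    M = np.zeros((len(rows), len(cols)), dtype=int)
    for v, u in pairs:
        M[row_idx[v], col_idx[u]] = 1
    return M, rows, cols


# Toy data with hypothetical accounts; repeated retweets collapse to one link.
verified = {"V1", "V2"}
retweets = [("u1", "V1"), ("u1", "V2"), ("u2", "V1"),
            ("V1", "V2"), ("u1", "u2"), ("u1", "V1")]
M, rows, cols = build_biadjacency(retweets, verified)
```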

2. Bipartite Configuration Model and undirected validated projection

In order to evaluate the relevance of the bipartite network of retweets between verified and non-verified users, we need to define a suitable model as a benchmark for our measurements. Indeed, at the present step we want to discount the information due to the activity of the users, both verified and unverified, in order to detect non-trivial superpositions of connections. We thus implement a Bipartite Configuration Model (BiCM) [56] for this case, which maximises the entropy of the system while constraining the degree sequences of the two layers. A fully detailed description of the null model can be found in the Supplementary Materials. Let us just recall here that at the end of this procedure we obtain a probability per graph that factorises into independent probabilities per link.

In order to infer information about similar behaviours of verified users, we first project the information contained in the bipartite network $G^*_{\mathrm{Bi}}$ onto the verified-users layer. The result of the classical projection methods is a weighted monopartite network: two users $i, j \in L$ are connected if they share at least one common neighbour on the opposite layer $\Gamma$, and the edge between them is weighted by the number of non-verified users who have retweeted the posts of both. This quantity is expressed by the number of V-motifs between users $i$ and $j$ [56, 57]

$$V^*_{ij} = \sum_{\alpha \in \Gamma} m^*_{i\alpha}\, m^*_{j\alpha}.$$
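The V-motif count defined above is an entry of the product of the biadjacency matrix with its transpose. A minimal sketch with a toy matrix (the values are invented for illustration):

```python
import numpy as np

# Toy biadjacency matrix: rows are verified users (layer L),
# columns are non-verified users (layer Gamma).
M = np.array([[1, 1, 0, 1],
              [1, 0, 0, 1],
              [0, 0, 1, 0]])

# V[i, j] = sum over alpha of M[i, alpha] * M[j, alpha]
V = M @ M.T
np.fill_diagonal(V, 0)  # the diagonal holds degrees, not V-motifs between distinct users
```

In this example the first two verified users share two non-verified neighbours, so their projected edge has weight 2, while the third user shares none with either of them.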

However, this method often generates a highly connected projection; in order to extract the statistically significant information of this projection, we implement the method of [50]. Roughly speaking, the method consists in comparing the realised V-motifs with the expectation of the Bipartite Configuration Model. In this sense, the statistically significant V-motifs are extracted: otherwise stated, all motifs that cannot be explained by the degree sequence only are validated. More details can be found in the Supplementary Materials, Sections V A 2 - V A 3.
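A hedged sketch of such a validation is given below. It assumes that the null-model link probabilities are already available as a matrix P (fitting the BiCM is not shown; the uniform P used here is a placeholder, not a fitted model). Since links are independent under the null model, the number of V-motifs between two users is a sum of independent Bernoulli variables, so its distribution can be computed exactly. The multiple-testing correction is implemented here as a Benjamini-Hochberg false discovery rate procedure with significance level alpha = 0.05; both choices, like all function names, are assumptions made for this illustration.

```python
import numpy as np


def poisson_binomial_pmf(probs):
    """Exact pmf of a sum of independent Bernoulli variables (dynamic programming)."""
    pmf = np.array([1.0])
    for p in probs:
        pmf = np.append(pmf, 0.0) * (1 - p) + np.append(0.0, pmf) * p
    return pmf


def v_motif_pvalues(M, P):
    """P-value of each observed V-motif count under independent link probabilities P."""
    n = M.shape[0]
    V = M @ M.T
    pairs, pvals = [], []
    for i in range(n):
        for j in range(i + 1, n):
            pmf = poisson_binomial_pmf(P[i] * P[j])
            pairs.append((i, j))
            pvals.append(pmf[V[i, j]:].sum())  # P(V_ij >= observed value)
    return pairs, np.array(pvals)


def fdr_validate(pvals, alpha=0.05):
    """Benjamini-Hochberg procedure: boolean mask of rejected null hypotheses."""
    m = len(pvals)
    order = np.argsort(pvals)
    passed = pvals[order] <= alpha * np.arange(1, m + 1) / m
    mask = np.zeros(m, dtype=bool)
    if passed.any():
        k = np.nonzero(passed)[0].max()
        mask[order[:k + 1]] = True
    return mask


# Toy example: users 0 and 1 share six neighbours, user 2 shares none.
M = np.zeros((3, 40), dtype=int)
M[0, :6] = 1
M[1, :6] = 1
M[2, 10:13] = 1
P = np.full(M.shape, 0.05)  # placeholder probabilities, NOT a fitted BiCM
pairs, pvals = v_motif_pvalues(M, P)
validated = [pair for pair, keep in zip(pairs, fdr_validate(pvals)) if keep]
```

Only the pair (0, 1) survives in this toy case: six shared neighbours are very unlikely when each co-occurrence has probability 0.0025, while a count of zero is never significant.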

B. Directed network of information flow

1. Bipartite directed network of tweeting/retweeting activity

In this section, we describe the construction of the bipartite and directed network of information flow. A directed edge (i, p) indicates that user i is the original creator of post p, while an edge in the opposite direction (p, j) shows that j has retweeted tweet p at least once during the election days. Such a situation is shown in the left panel of Figure 4. In the picture, i is a user who publishes two posts p and q during the election days; user j retweets post p and user k retweets post q. Notice that, by construction, we cannot keep track of the chains of sequential retweets (consult Section I of the Supplementary Materials for more details about the nature of the data set). Therefore, the edge (q, k) does not necessarily mean that k has retweeted directly from i's post: it is possible (though less likely) that k has retweeted post q from one of the retweets of user i's post. Another particular case is the self-retweet: Twitter allows users to retweet their own posts, either directly from the original tweet or from somebody else's retweet. With this network representation, these cases can be illustrated as in the right panel of Figure 4, where the retweet by i itself is indicated with the red arrow. However, this type of edge has been excluded from the analysis, since such edges represent a very small percentage of the overall number of retweets observed in the data (approximately 0.97%) and we are rather interested in how people from the same political coalition interact with each other to boost the visibility of their opinion.

This bipartite and directed network $G^*_{\mathrm{BiD}}$ can be represented by means of two biadjacency matrices, one for the tweets, $T^* = \{t^*_{ip} : i \in U,\ p \in P\}$, and the other one for the retweets, $R^* = \{r^*_{pj} : p \in P,\ j \in U\}$. Essentially, $t^*_{ip} = 1$ if user $i$ has tweeted post $p$ and 0 otherwise, and $r^*_{pj} = 1$ if at least one retweet of post $p$ by user $j$ is observed during the period $T$ and 0 otherwise. See Section V B for a more detailed description of this directed network.
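The two biadjacency matrices can be assembled as in the following minimal sketch, which reproduces the example described above (user i publishes posts p and q, j retweets p, k retweets q). The function name and the input format are hypothetical; self-retweets are dropped, as in the analysis.

```python
import numpy as np


def build_tweet_retweet_matrices(authorship, retweets):
    """Binary tweet matrix T (users x posts) and retweet matrix R (posts x users).

    authorship: dict mapping each post to its author.
    retweets: list of (post, user) pairs, one per observed retweet.
    """
    users = sorted(set(authorship.values()) | {u for _, u in retweets})
    posts = sorted(authorship)
    u_idx = {u: k for k, u in enumerate(users)}
    p_idx = {p: k for k, p in enumerate(posts)}
    T = np.zeros((len(users), len(posts)), dtype=int)
    R = np.zeros((len(posts), len(users)), dtype=int)
    for post, author in authorship.items():
        T[u_idx[author], p_idx[post]] = 1
    for post, user in retweets:
        if post not in authorship or authorship[post] == user:
            continue  # skip posts with unknown author and self-retweets
        R[p_idx[post], u_idx[user]] = 1  # repeated retweets collapse to 1
    return T, R, users, posts


authorship = {"p": "i", "q": "i"}
retweets = [("p", "j"), ("q", "k"), ("p", "i"), ("p", "j")]
T, R, users, posts = build_tweet_retweet_matrices(authorship, retweets)
```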

2. Bipartite Directed Configuration Model and directed validated projection

For the actual analysis we are interested in determining which are the most pervasive tweet flows, discounting both the activity of the users and the virality of the tweets. Otherwise stated, we want to highlight the non-trivial patterns observed in the network, i.e. those that are not explained by the degree sequences of the two layers and that describe the structure of the actual system. The randomization of a bipartite directed network with constraints on the in- and out-degree sequences was first presented in [48]; we review the procedure that leads to the definition of the probabilities in the Supplementary Materials. Let us remark that, as in the previous (undirected) case of the BiCM, the probability per graph factorises in terms of probabilities per (directed) link. Moreover, with respect to the previous case there is a crucial simplification: each post cannot have more than one author, thus the in-degree of tweet posts is $\langle \kappa^{\mathrm{in}}_p \rangle = 1$ over the ensemble defined by the Bipartite Directed Configuration Model (BiDCM).

Extending the definition of V-motifs to directed networks, the quantity $V_{ij}$ identifies the number of times $j$ has acted as a spreader for $i$, retweeting content posted by her/him [58]. Using the null model's expectation of this quantity, we are able to infer the statistical significance of the tie between $i$ and $j$, following a procedure similar to the one of [50]; consult the Supplementary Materials for further details. This procedure returns a square $N_U \times N_U$ matrix collecting the significance of each link, and each edge is treated as a separate null hypothesis to test [59]. Only the links associated with rejected hypotheses are included in the validated network: whenever $V^*_{ij}$ is statistically validated, $j$ is considered a significant spreader of $i$'s tweets and the directed edge $(i, j)$ is included in the validated network of information flows $F = (f_{ij})_{i,j \in U}$. Refer to Sections V C - V D for a more detailed description of these methods.
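The directed V-motif count can be written as a matrix product of the tweet and retweet matrices, and, because link probabilities factorise under the null model, its expectation is the corresponding product of probability matrices. The sketch below uses invented toy matrices; the probability matrices P_T and P_R are placeholders rather than the output of a fitted BiDCM, and the function name is hypothetical.

```python
import numpy as np

# Toy tweet matrix T (users x posts) and retweet matrix R (posts x users).
T = np.array([[1, 1, 0],   # user 0 wrote posts 0 and 1
              [0, 0, 1],   # user 1 wrote post 2
              [0, 0, 0]])  # user 2 wrote nothing
R = np.array([[0, 1, 1],   # post 0 was retweeted by users 1 and 2
              [0, 1, 0],   # post 1 was retweeted by user 1
              [1, 0, 0]])  # post 2 was retweeted by user 0

# V[i, j] = sum over posts p of T[i, p] * R[p, j]:
# number of posts by i that j has retweeted.
V = T @ R


def expected_v_motifs(P_T, P_R):
    """Null-model expectation of V, valid when links are independent."""
    return P_T @ P_R


# Placeholder probabilities: each post has expected in-degree 1 (columns of
# P_T sum to one) and every retweet link has probability 0.5.
P_T = np.full((3, 3), 1 / 3)
P_R = np.full((3, 3), 0.5)
expected = expected_v_motifs(P_T, P_R)
```

The matrix V is not symmetric: in the toy data user 1 retweeted two posts by user 0, whereas user 0 retweeted only one post by user 1.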

ACKNOWLEDGEMENTS

The work of F.S. and G.C. was supported by the EU projects CoeGSS (Grant No. 676547), Openmaker (Grant No. 687941) and SoBigData (Grant No. 654024). G.C. is also supported by EC TENDER SMART (Grant No. 2017/0090 LC-00704693).

AUTHOR CONTRIBUTIONS

F.S. collected the data. C.B. and F.S. designed and performed the analyses and prepared tables and figures. All authors wrote, reviewed and approved the manuscript.

ADDITIONAL INFORMATION

The authors declare no competing interests.

DATA AVAILABILITY

The datasets generated and analysed during the current study are not publicly available due to the data cleaning procedure the authors implement for research purposes, but are available from the corresponding author on reasonable request.

[1] A-L. Barabasi. The network takeover. Nature Physics, 8:14–16, 2011. doi:10.1038/nphys2188.

[2] TNS opinion & social and Directorate-General Communications. Media use in the European Union, 2018. URL https://publications.europa.eu/en/publication-detail/-/publication/a575c1c9-58b6-11e8-ab41-01aa75ed71a1/language-en/format-PDF/source-77908075.

[3] M.E.J. Newman. Networks. An introduction. 2010. ISBN 978-0-19-920665-0 (Hbk).

[4] Guido Caldarelli. Scale-Free Networks: Complex Webs in Nature and Technology. Oxford University Press, Oxford (UK), 2010. ISBN 9780191705984.

[5] Sandra Gonzalez-Bailon, Javier Borge-Holthoefer, Alejandro Rivero, and Yamir Moreno. The Dynamics of Protest Recruitment through an Online Network. Sci. Rep., 2011. ISSN 2045-2322. doi:10.1038/srep00197.

[6] M Conover, J Ratkiewicz, and M Francisco. Political polarization on twitter. Icwsm, 2011. ISSN 15205126. doi:10.1021/ja202932e.

[7] Sandra Gonzalez-Bailon, Javier Borge-Holthoefer, and Yamir Moreno. Broadcasters and Hidden Influentials in Online Protest Diffusion. Am. Behav. Sci., 2013. ISSN 00027642. doi:10.1177/0002764213479371.

[8] Lada A. Adamic and Natalie Glance. The political blogosphere and the 2004 U.S. election. In Proc. 3rd Int. Work. Link Discov. - LinkKDD ’05, pages 36–43, New York, New York, USA, 2005. ACM Press. ISBN 1595932151. doi:10.1145/1134271.1134277. URL http://portal.acm.org/citation.cfm?doid=1134271.1134277.


[9] Nicholas A. Diakopoulos and David A. Shamma. Characterizing debate performance via aggregated twitter sentiment. In Proc. 28th Int. Conf. Hum. factors Comput. Syst. - CHI ’10, page 1195, New York, New York, USA, 2010. ACM Press. ISBN 9781605589299. doi:10.1145/1753326.1753504. URL http://portal.acm.org/citation.cfm?doid=1753326.1753504.

[10] Joseph DiGrazia, Karissa McKelvey, Johan Bollen, and Fabio Rojas. More tweets, more votes: Social media as a quantitative indicator of political behavior. PLoS One, 8(11):e79449, nov 2013. ISSN 19326203. doi:10.1371/journal.pone.0079449. URL https://dx.doi.org/10.1371/journal.pone.0079449.

[11] Marija Anna Bekafigo and Allan McBride. Who Tweets About Politics?: Political Participation of Twitter Users During the 2011 Gubernatorial Elections. Soc. Sci. Comput. Rev., 31(5):625–643, oct 2013. ISSN 08944393. doi:10.1177/0894439313490405. URL http://journals.sagepub.com/doi/10.1177/0894439313490405.

[12] Adam Badawy, Emilio Ferrara, and Kristina Lerman. Analyzing the Digital Traces of Political Manipulation: The 2016 Russian Interference Twitter Campaign. feb 2018. doi:10.1145/nnnnnnn.nnnnnnn. URL http://arxiv.org/abs/1802.04291.

[13] Alexandre Bovet, Flaviano Morone, and Hernan A. Makse. Validation of Twitter opinion trends with national polling aggregates: Hillary Clinton vs Donald Trump. Sci. Rep., 2018. ISSN 20452322. doi:10.1038/s41598-018-26951-y.

[14] Alexandre Bovet and Hernan A. Makse. Influence of fake news in Twitter during the 2016 US presidential election. mar 2018. URL http://arxiv.org/abs/1803.08491.

[15] Rachel K. Gibson and Ian McAllister. Does CyberCampaigning Win Votes? Online Communication in the 2004 Australian Election. J. Elections, Public Opin. Parties, 16(3):243–263, oct 2006. ISSN 1745-7289. doi:10.1080/13689880600950527. URL http://www.tandfonline.com/doi/full/10.1080/13689880600950527.

[16] Axel Bruns and Stefan Stieglitz. Quantitative Approaches to Comparing Communication Patterns on Twitter. J. Technol. Hum. Serv., 30(3-4):160–185, jul 2012. ISSN 15228991. doi:10.1080/15228835.2012.744249. URL http://www.tandfonline.com/doi/abs/10.1080/15228835.2012.744249.

[17] Gunn Sara Enli and Eli Skogerbø. PERSONALIZED CAMPAIGNS IN PARTY-CENTRED POLITICS: Twitter and Facebook as arenas for political communication. Inf. Commun. Soc., 16(5):757–774, jun 2013. ISSN 1369118X. doi:10.1080/1369118X.2013.782330. URL http://www.tandfonline.com/doi/abs/10.1080/1369118X.2013.782330.

[18] J. Borondo, A. J. Morales, J. C. Losada, and R. M. Benito. Characterizing and modeling an electoral campaign in the context of Twitter: 2011 Spanish Presidential election as a case study. Chaos, 22(2):023138, jun 2012. ISSN 10541500. doi:10.1063/1.4729139. URL http://aip.scitation.org/doi/10.1063/1.4729139.

[19] Guido Caldarelli, Alessandro Chessa, Fabio Pammolli, Gabriele Pompa, Michelangelo Puliga, Massimo Riccaboni, and Gianni Riotta. A multi-level geographical study of italian political elections from twitter data. PLoS One, 9(5):e95809, may 2014. ISSN 19326203. doi:10.1371/journal.pone.0095809. URL https://dx.doi.org/10.1371/journal.pone.0095809.

[20] Michela Del Vicario, Sabrina Gaito, Walter Quattrociocchi, Matteo Zignani, and Fabiana Zollo. Public discourse and news consumption on online social media: A quantitative, cross-platform analysis of the Italian Referendum. feb 2017. ISSN 16130073. doi:10.475/123. URL http://arxiv.org/abs/1702.06016.

[21] Jacopo Bindi, Davide Colombi, Flavio Iannelli, Nicola Politi, Michele Sugarelli, Raffaele Tavarone, and Enrico Ubaldi. Political Discussion and Leanings on Twitter: the 2016 Italian Constitutional Referendum. may 2018. URL http://arxiv.org/abs/1805.07388.

[22] Emilio Ferrara. Disinformation and social bot operations in the run up to the 2017 French presidential election. First Monday, 2017. ISSN 13960466. doi:10.2139/ssrn.2995809.

[23] Laura Cram, Clare Llewellyn, Robin Hill, and Walid Magdy. UK General Election 2017: a Twitter Analysis. Neuropolitics Res. Lab, 2017:1–11, jun 2017. URL http://arxiv.org/abs/1706.02271.

[24] Michela Del Vicario, Fabiana Zollo, Guido Caldarelli, Antonio Scala, and Walter Quattrociocchi. Mapping social dynamics on Facebook: The Brexit debate. Soc. Networks, 2017. ISSN 03788733. doi:10.1016/j.socnet.2017.02.002.

[25] Walter Quattrociocchi, Guido Caldarelli, and Antonio Scala. Opinion dynamics on interacting networks: media competition and social influence. Sci. Rep., 2015. ISSN 2045-2322. doi:10.1038/srep04938.

[26] Michela Del Vicario, Gianna Vivaldo, Alessandro Bessi, Fabiana Zollo, Antonio Scala, Guido Caldarelli, and Walter Quattrociocchi. Echo Chambers: Emotional Contagion and Group Polarization on Facebook. Sci. Rep., 2016. ISSN 20452322. doi:10.1038/srep37825.

[27] Michela Del Vicario, Alessandro Bessi, Fabiana Zollo, Fabio Petroni, Antonio Scala, Guido Caldarelli, H. Eugene Stanley, and Walter Quattrociocchi. The spreading of misinformation online. Proc. Natl. Acad. Sci., 2016. ISSN 0027-8424. doi:10.1073/pnas.1517441113.

[28] Ana Lucıa Schmidt, Fabiana Zollo, Antonio Scala, Cornelia Betsch, and Walter Quattrociocchi. Polarization of the vaccination debate on Facebook. Vaccine, 2018. ISSN 18732518. doi:10.1016/j.vaccine.2018.05.040.

[29] Ana Lucıa Schmidt, Fabiana Zollo, Antonio Scala, and Walter Quattrociocchi. Polarization Rank: A Study on European News Consumption on Facebook. may 2018. URL http://arxiv.org/abs/1805.08030.

[30] Eytan Bakshy, Solomon Messing, and Lada A. Adamic. Exposure to ideologically diverse news and opinion on Facebook. Science (80-. )., 2015. ISSN 10959203. doi:10.1126/science.aaa1160.

[31] Dimitar Nikolov, Diego F. M. Oliveira, Alessandro Flammini, and Filippo Menczer. Measuring Online Social Bubbles. PeerJ Comput. Sci., 1:e38, dec 2015. ISSN 2376-5992. doi:10.7717/peerj-cs.38. URL https://peerj.com/articles/cs-38 http://arxiv.org/abs/1502.07162.

[32] Emilio Ferrara and Zeyao Yang. Quantifying the effect of sentiment on information diffusion in social media. PeerJ Comput. Sci., 2015. ISSN 2376-5992. doi:10.7717/peerj-cs.26.

[33] Ryota Kobayashi and Renaud Lambiotte. TiDeH: Time-Dependent Hawkes Process for Predicting Retweet Dynamics. mar 2016. doi:10.1523/jneurosci.3400-11.2011. URL http://arxiv.org/abs/1603.09449.

[34] Lionel Tabourier, Anne Sophie Libert, and Renaud Lambiotte. Predicting links in ego-networks using temporal information. EPJ Data Sci., 5(1):1, dec 2016. ISSN 21931127. doi:10.1140/epjds/s13688-015-0062-0. URL http://www.epjdatascience.com/content/5/1/1.

[35] Onur Varol, Emilio Ferrara, Filippo Menczer, and Alessandro Flammini. Early detection of promoted campaigns on social media. EPJ Data Sci., 2017. ISSN 21931127. doi:10.1140/epjds/s13688-017-0111-y.

[36] E.T. Jaynes. Information Theory and Statistical Mechanics, 1957. ISSN 0031-899X.

[37] Tiziano Squartini and Diego Garlaschelli. Maximum-entropy networks. Pattern detection, network reconstruction and graph combinatorics. Springer, page 116, 2017.

[38] Giulio Cimini, Tiziano Squartini, Fabio Saracco, Diego Garlaschelli, Andrea Gabrielli, and Guido Caldarelli. The Statistical Physics of Real-World Networks. oct 2018. doi:arXiv:1810.05095v1. URL http://arxiv.org/abs/1810.05095.

[39] Note1. Nodes play the role of the volume in the canonical definition in statistical physics.

[40] Juyong Park and Mark E J Newman. Statistical mechanics of networks. Phys. Rev. E, 70(6):66117, dec 2004. ISSN 1539-3755. doi:10.1103/PhysRevE.70.066117. URL http://www.scopus.com/inward/record.url?eid=2-s2.0-41349102726&partnerID=tZOtx3y1.

[41] Diego Garlaschelli and Maria I Loffredo. Maximum likelihood: Extracting unbiased information from complex networks. Phys. Rev. E - Stat. Nonlinear, Soft Matter Phys., 78(1):1–5, 2008. ISSN 15393755. doi:10.1103/PhysRevE.78.015101.

[42] Tiziano Squartini and Diego Garlaschelli. Analytical maximum-likelihood method to detect patterns in real networks. New J. Phys., 13, 2011. ISSN 13672630. doi:10.1088/1367-2630/13/8/083001.

[43] Tiziano Squartini, Iman van Lelyveld, and Diego Garlaschelli. Early-warning signals of topological collapse in interbank networks. Sci. Rep., 3:3357, 2013. ISSN 2045-2322. doi:10.1038/srep03357. URL http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=3842548&tool=pmcentrez&rendertype=abstract.

[44] Piotr Fronczak, Agata Fronczak, and Maksymilian Bujok. Exponential Random Graph Models for Networks with Community Structure. Phys. Rev. E, 88(May):032810, 2013. ISSN 1539-3755. doi:10.1103/PhysRevE.88.032810. URL http://link.aps.org/doi/10.1103/PhysRevE.88.032810.

[45] Rossana Mastrandrea, Tiziano Squartini, Giorgio Fagiolo, and Diego Garlaschelli. Enhanced reconstruction of weighted networks from strengths and degrees. New J. Phys., 16, 2014. ISSN 13672630. doi:10.1088/1367-2630/16/4/043022.

[46] Domenico Di Gangi, Fabrizio Lillo, and Davide Pirino. Assessing systemic risk due to fire sales spillover through maximum entropy network reconstruction. J. Econ. Dyn. Control, 94:117–141, sep 2018. ISSN 01651889. doi:10.1016/j.jedc.2018.07.001. URL https://www.sciencedirect.com/science/article/pii/S0165188918301787.

[47] Carolina Becatti, Guido Caldarelli, and Fabio Saracco. Entropy-based randomisation of rating networks. may 2018. ISSN 1095-9203. doi:10.1126/science.307.5710.674. URL http://arxiv.org/abs/1805.00717.

[48] Jeroen van Lidth de Jeude, Riccardo Di Clemente, Guido Caldarelli, Fabio Saracco, and Tiziano Squartini. Reconstructing mesoscale network structures. may 2018. ISSN 0022-0949. doi:10.1242/jeb.01477. URL http://arxiv.org/abs/1805.06005.

[49] Stanislao Gualdi, Giulio Cimini, Kevin Primicerio, Riccardo Di Clemente, and Damien Challet. Statistically validated network of portfolio overlaps and systemic risk. Sci. Rep., 6, mar 2016. ISSN 20452322. doi:10.1038/srep39467. URL http://arxiv.org/abs/1603.05914 http://dx.doi.org/10.1038/srep39467.

[50] Fabio Saracco, Mika J. Straka, Riccardo Di Clemente, Andrea Gabrielli, Guido Caldarelli, and Tiziano Squartini. Inferring monopartite projections of bipartite networks: An entropy-based approach. New J. Phys., 19(5):16, jul 2017. ISSN 13672630. doi:10.1088/1367-2630/aa6b38. URL http://arxiv.org/abs/1607.02481.

[51] Pranav Dandekar, Ashish Goel, and David Lee. Biased Assimilation, Homophily and the Dynamics of Polarization. Proc. Natl. Acad. Sci. U. S. A., 110(15):5791–6, apr 2012. ISSN 0027-8424. doi:10.1073/pnas.1217220110. URL http://www.ncbi.nlm.nih.gov/pubmed/23536293 http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC3625335 http://arxiv.org/abs/1209.5998 http://dx.doi.org/10.1073/pnas.1217220110.

[52] Seth Flaxman, Sharad Goel, and Justin M. Rao. Ideological Segregation and the Effects of Social Media on News Consumption. SSRN Electron. J., pages 1–39, mar 2013. ISSN 1556-5068. doi:10.2139/ssrn.2363701. URL http://www.ssrn.com/abstract=2363701 http://dx.doi.org/10.2139/ssrn.2363701.

[53] Vincent D. Blondel, Jean Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. Fast unfolding of communities in large networks. J. Stat. Mech. Theory Exp., 2008(10), 2008. ISSN 17425468. doi:10.1088/1742-5468/2008/10/P10008.

[54] Santo Fortunato. Community detection in graphs. Phys. Rep., 486(3-5):75–174, feb 2010. ISSN 03701573. doi:10.1016/j.physrep.2009.11.002. URL http://www.sciencedirect.com/science/article/pii/S0370157309002841.


[55] Note2. In the following we will refer to quantities regarding the real network with an asterisk ∗.

[56] Fabio Saracco, Riccardo Di Clemente, Andrea Gabrielli, and Tiziano Squartini. Randomizing bipartite networks: the case of the World Trade Web. Sci. Rep., 5:10595, 2015. URL http://arxiv.org/abs/1503.05098 http://www.nature.com/articles/srep10595.

[57] Reinhard Diestel. Graph Theory, 4th Edition. 2012. ISBN 978-3-642-14278-9. doi:10.1109/IEMBS.2010.5626521.

[58] Note3. Note that $V_{ij} \neq V_{ji}$.

[59] Note4. Actually, there is a tricky part. Since all users that were not authors of retweeted posts have probability exactly equal to zero to spread a tweet, their probability to be the source of a directed V-motif is again exactly zero, for each possible tweet. Those cases were not considered in the FDR.

FIG. 6. Most retweeted users in groups C1 and C2. Nodes' dimension is proportional to the out-degree in the subgraph generated by the community (i.e. number of retweets received from users from the same community). The blue color indicates that the node is verified, while orange indicates that it is not verified.

FIG. 7. Most retweeted users in groups C3 and C4. Nodes' dimension is proportional to the out-degree in the subgraph generated by the community (i.e. number of retweets received from users from the same community). The blue color indicates that the node is verified, while orange indicates that it is not verified.