Today lecture: 1. Overview of social network analysis ...

Social network analysis

Today lecture:

1. Overview of social network analysis

Properties of social networks

Analysis tasks

2. Martino: Community detection in temporal networks

3. Bruno: Community detection in signed networks

Aggarwal Ch. 19

MDM course Aalto 2020 – p.1/20

Many types of social networks

online networks (Twitter, LinkedIn, Facebook)

indirect communication networks (telecommunications,email, chat messages)

media sharing sites (Youtube, Instagram, Tiktok)

interaction networks in professional communities (e.g.,citation networks between researchers)


Presentation as a graph G = (V,E)

V set of nodes corresponding to actors

may have labels or content (attributes, documents)

E set of edges corresponding to links

usually undirected (friendship)

sometimes directed (“following” in twitter)

may have weights wi j

a node is a neighbour of another node, if they areconnected by an edge


Example of a social network

edge weight = number of exchanged messagesnode attributes: city and education

Image source: Zardi et al.: A Multi-agent homophily-based approachfor community detection in social networks, ICTAI 2014


Two basic properties

1. Homophily: Nodes connected to one another tend toshare similar (content-based) properties.

e.g., same background, hobbies or interests

2. Triadic closure: If two actors share common neighbours,it is more likely they are already connected or willeventually become connected


Typical dynamic properties

1. Preferential attachment: likelihood of a node to receivenew edges increases with node degree→ smallnumber of very high degree nodes (hubs)

2. Small world property: average path lengths betweenany two nodes are quite small

3. Densification: Groups become denser over time (morenew links than nodes)⇒

4. Shrinking diameters: distances between nodes shrinkover time

5. Giant connected component: often a giant connectedcomponent emerges in the network


Analysis tasks

I Social influence analysis (influential nodes andinfluence spread)

II Community detection (graph clustering)

III Collective classification (predict missing node labels)

IV Link prediction (predict future links between nodes)


I Social influence analysis

Which nodes have most influence? How influence(information, ideas, opinions) spreads?

A valuable advertising channel!

1. Measures for evaluating which nodes are influential:

centrality of a node in an undirected graph

prestige of a node in a directed graph

2. Influence propagation or diffusion models

given influence weights on edges and a model toevaluate total influence of a set of nodes

determine a set of seed nodes such that spread ofinfluence is maximal


Measures for the centrality of node v

Degree centrality: CD(v) =Degree(v)

n−1

Closeness centrality: CC(v) = 1avgu∈V {Dist(v,u)}

Betweenness centrality: CB(v) =

∑u,w∈V,u,w

#{shortest-paths(u,w) through v}

#{shortest-paths(u,w)}

( n2 )

Image source: Aggarwal Ch 19MDM course Aalto 2020 – p.9/20

II Community detection: cluster the graph

Given G = (V,E). Each edge (vi, v j) has weight wi j

if cost ci j, transform, e.g. by wi j =1

ci j(ci j , 0)

Common objective: Cluster V into groups V1, . . . ,VK

such that the edge-cut cost

cost(V1, . . . ,Vk) =∑

(vi,v j)∈E,vi∈Vp,v j∈Vq,p,q

wi j

is minimal.

many variants and extra constraints!

in general NP-hard problem, but polynomially solvable, if∀i, j : wi j = 1, K = 2 and no balancing requirements


Example

Clustering based on both structural and content-basedfeatures

Image source: Zardi et al.: A Multi-agent homophily-based approachfor community detection in social networks, ICTAI 2014 MDM course Aalto 2020 – p.11/20

Some community detection methods

Kerningham-Lin: balanced 2-way partitioning

at each iteration, test a set of possible swap sequencesand choose the one with greatest improvement

Girwan-Newman: Remove

“bridge edges” until K con-

nected components remain

edges with highbetweenness: largeproportion of shortestpaths go through them

e.g., edges incident onhubs

Image source: Aggarwal Ch 19 MDM course Aalto 2020 – p.12/20

METIS algorithm

1. Coarsen the graph by combining nodes and edges

2. Partition the coarsened representation (easier)

3. Refine partitioning when expanding graphs back

Image sourceAggarwal Ch 19


METIS Graph coarsening: Combine tightly

interconnected nodes

simple strategies: choose two neighbours along a randomedge, heavy edge or “high-density” edge

new node weight = number of contracted nodes it presents

combine parallel edges by summing their weights

image source

Aggarwal Ch 19.3


III Collective classification: predict missing labels

could look at the most common label in theneighbourhood, but node labels very sparse→ look atalso connections through unlabeled nodes

semisupervised classification, where unlabeled data isutilized in learning

+ suits for any data type given a similarity graph!

Image source: Aggarwal Ch 19MDM course Aalto 2020 – p.15/20

Collective classification: Common methods

1. Iterative classification algorithm (ICA)

of test nodes and movePredict labels

most certain to thetraining set

e.g., class distribution of neighbour

nodes, weighted by edge weights

nodescurrently labeled

for all nodes utilizingExtract link features Use link features

and content features totrain a probabilistic

classifier

repeat ITER times

e.g., naive Bayesif it has high probability

class prediction of a node "certain"

2. Label propagation with random walks

What is the most probable class of a node where arandom walk (through unlabeled nodes) leads?

3. Supervised spectral methods

spectral embedding + classify multidimensionaldata or solve directly spectral optimization problem


IV Link prediction: predict future links

Utilize especially structural features

Image source: Aggarwal Ch 19


Link prediction approaches

1. Evaluate potential connections with node similaritymeasures

+ easy and fast to compute

2. Learn a classifier for predicting links or their absence

+ more accurate– computationally more expensive

3. Use missing value estimation methods (like matrixfactorization)


Measures for similarity between nodes vi and v j

a) Neighbourhood-based measures: (normalized) numberof common neighbours

– not good, if number of common neighbours small

b) Walk-based measures

Katz measure produces often excellent results

Katz(vi, v j) =

∞∑

t=0

βt · n(t)

i j,

where n(t)

i j= number of walks of length t between vi and

v j and β < 1 discount factor (punishes long walks)

SimRank and PageRank-based measuresMDM course Aalto 2020 – p.19/20

Classification approach

create multidimensional data, where each node pair isone instance

in training use all linked pairs (class 1) and a sample ofunlinked pairs (class 0)

features: similarity measures between nodes, otherstructural features (e.g., node degrees), possiblycontent-based features

learn a classifier and make predictions for remainingpairs

e.g. logistic regression


Today lecture: 1. Overview of social network analysis ...

Documents

Transcript of Today lecture: 1. Overview of social network analysis ...