Community Detection in Social Networks: A Brief Overview

Page 1: Community Detection in Social Networks: A Brief Overview

Community Detection in Social Networks: A Brief Overview

Satyaki Sikdar

Heritage Institute of Technology, Kolkata

8 January 2016

Satyaki Sikdar Community Detection 8 January 2016 1 / 37

Page 2: Community Detection in Social Networks: A Brief Overview

Introduction

Table of Contents

1 Introduction
    About Me
    Social Networks
    Mathematical background

2 Motivation

3 The Hunt for Communities

4 The Need for Speed (and quality)

Page 3: Community Detection in Social Networks: A Brief Overview

Introduction About Me

about me

Extremely lazy - I’ve been told

Working with social networks for the past 8 months under the supervision of Prof. Partha Basuchowdhuri

Conversant in Python, C++ and C - an average programmer at best

Vice Chair of Heritage Institute of Technology ACM Student Chapter

Page 4: Community Detection in Social Networks: A Brief Overview

Introduction Social Networks

Networks

Networks are everywhere. They crop up wherever there are interactions between actors.

friendship networks

follower networks

neural networks

telecom networks

trade of goods and services

protein-protein interactions - medicine design

citations and collaborations

power grid networks

predator-prey networks

Page 13: Community Detection in Social Networks: A Brief Overview

Introduction Social Networks

Citation and Email networks

Figure: Citation network

Figure: Enron email network. n = 33,696, m = 180,811

Page 14: Community Detection in Social Networks: A Brief Overview

Introduction Social Networks

Telecommunication and Protein networks

Page 15: Community Detection in Social Networks: A Brief Overview

Introduction Social Networks

Friendship and Les Miserables

Page 16: Community Detection in Social Networks: A Brief Overview

Introduction Social Networks

High school relationship network

Nearly bipartite

One giant component and a lot of little ones

No cycles, almost tree-like - information/disease spreads fast

Page 17: Community Detection in Social Networks: A Brief Overview

Introduction Mathematical background

Network representation

Networks portray the interactions between different actors.

Actors or individuals are nodes/vertices in the graph

If there’s interaction between two nodes, there’s an edge/link between them

The links can have weights or intensities signifying the strength of connections

The links can be directed, like in the web graph. There’s a directed link between two nodes (pages) A and B if there’s a hyperlink to B from A
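
The representation above can be sketched as an adjacency map. This is only an illustrative layout, assuming a dict-of-dicts storage; the helper name `build_graph` and the sample edges are hypothetical and not part of the talk.

```python
def build_graph(edges, directed=False):
    """Build an adjacency map {node: {neighbor: weight}} from (u, v, weight) triples."""
    adj = {}
    for u, v, weight in edges:
        adj.setdefault(u, {})[v] = weight
        adj.setdefault(v, {})  # make sure the target node exists even with no out-links
        if not directed:
            adj[v][u] = weight  # undirected: store the link in both directions
    return adj


# Web graph: page A hyperlinks to page B, so the link is directed.
web = build_graph([("A", "B", 1.0)], directed=True)

# Friendship graph: weights model the strength of the connection.
friends = build_graph([("x", "y", 2.5), ("y", "z", 0.5)])
```

In the directed case only `web["A"]` contains `"B"`; in the undirected case each link is visible from both endpoints.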

Page 18: Community Detection in Social Networks: A Brief Overview

Introduction Mathematical background

Degree and degree distribution

The degree of a node is the number of edges leaving that node; in an undirected graph this is simply the number of edges incident to it

The degree distribution of a network gives, for each degree, the fraction of nodes that have that degree

Node  Degree
1     3
2     2
3     4
4     2
5     3
6     3
7     3
8     2
9     2
10    2
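
As a small worked example, the degree distribution of the ten-node table above can be computed as follows. The function name `degree_distribution` is a hypothetical helper chosen for this sketch.

```python
from collections import Counter

# Node -> degree, copied from the table on this slide.
degrees = {1: 3, 2: 2, 3: 4, 4: 2, 5: 3, 6: 3, 7: 3, 8: 2, 9: 2, 10: 2}


def degree_distribution(degrees):
    """Return {degree: fraction of nodes having that degree}."""
    counts = Counter(degrees.values())
    n = len(degrees)
    return {k: count / n for k, count in sorted(counts.items())}


distribution = degree_distribution(degrees)
for k, fraction in distribution.items():
    print(f"degree {k}: fraction {fraction}")
```

Half of the nodes have degree 2, four of the ten have degree 3, and a single node has degree 4.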

Page 19: Community Detection in Social Networks: A Brief Overview

Motivation

Table of Contents

1 Introduction

2 Motivation
    What are they and why do we even care?
    Communities!
    Justification for the presence of communities

3 The Hunt for Communities

4 The Need for Speed (and quality)

Page 20: Community Detection in Social Networks: A Brief Overview

Motivation What are they and why do we even care?

Community Structure: An Informal Definition

The degree distribution follows a power law and is long-tailed

The distribution of edges is inhomogeneous

High concentrations of edges within special groups of vertices, and low concentrations between them. This feature of real networks is called community structure

Page 21: Community Detection in Social Networks: A Brief Overview

Motivation What are they and why do we even care?

Degree distributions of real life networks

Page 22: Community Detection in Social Networks: A Brief Overview

Motivation Communities!

Why bother about communities?

Communities are groups of vertices which probably share common properties and/or play similar roles within the graph.

Society offers a wide variety of possible group organizations: families, working and friendship circles, villages, towns, nations.

Communities also occur in many networked systems from biology, computer science, engineering, economics, politics, etc.

In protein-protein interaction networks, communities are likely to group proteins having the same specific function within the cell

In the graph of the World Wide Web they may correspond to groups of pages dealing with the same or related topics

Page 23: Community Detection in Social Networks: A Brief Overview

Motivation Communities!

Applications of Community Detection

Clustering Web clients who have similar interests and are geographically near each other improves the performance of services

Identifying clusters of customers with similar interests in the purchase networks of online retailers makes it possible to set up efficient recommendation systems

Clusters of large graphs can be used to create data structures that store the graph data efficiently and handle navigational queries, like path searches

Allocation of tasks to processors in parallel computing. This can be accomplished by splitting the computer cluster into groups with roughly the same number of processors, such that the number of physical connections between processors of different groups is minimal.

Page 24: Community Detection in Social Networks: A Brief Overview

Motivation Communities!

A few real world examples

Figure: Zachary’s Karate Club

Figure: Collaboration network between scientists working at the Santa Fe Institute

Page 25: Community Detection in Social Networks: A Brief Overview

Motivation Justification for the presence of communities

An Empirical Justification

Figure: Add Health friendship data, coded by race: Blue = Black, Yellow = White, Red = Hispanic, Green = Asian, White = Other

Page 26: Community Detection in Social Networks: A Brief Overview

Motivation Justification for the presence of communities

Homophily: Birds of a feather stick together

There’s a visible bias in friendships

52% white students, white-white friendships 86%

38% black students, black-black friendships 85%

5% Hispanics, Hispanic-Hispanic friendships 2%

Asymmetric behavior highlights homophily

Results in non-uniform edge distributions

Promotes the formation and maintenance of community structure

Page 30: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities

Table of Contents

1 Introduction

2 Motivation

3 The Hunt for Communities
    Where to start?
    Definitions
    A naïve approach - NP hardness
    Girvan-Newman Algorithm
    Girvan-Newman in Action
    Modularity
    Louvain Method
    Our method - methodical graph sparsification

4 The Need for Speed (and quality)

Page 31: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities Where to start?

Formalizing the problem

For a given graph G(V, E), find a cover $\mathcal{C} = \{C_1, C_2, \ldots, C_k\}$ such that $\bigcup_i C_i = V$

For disjoint communities, $C_i \cap C_j = \emptyset$ for all $i \neq j$

For overlapping communities, $C_i \cap C_j \neq \emptyset$ for some $i \neq j$

Figure: Zachary’s Karate Club Network

$\mathcal{C} = \{C_1, C_2, C_3\}$ with $C_1$ = yellow nodes, $C_2$ = green nodes, $C_3$ = blue nodes is a disjoint cover

However, $\mathcal{C} = \{C_1, C_2\}$ with $C_1$ = yellow & green nodes and $C_2$ = blue & green nodes is an overlapping cover
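
The two conditions can be checked mechanically. The sketch below uses a made-up six-node vertex set in place of the karate club colours; the names `is_cover` and `is_disjoint` are hypothetical helpers.

```python
from itertools import combinations


def is_cover(communities, vertices):
    """True if the union of all communities equals the vertex set V."""
    return set().union(*communities) == set(vertices)


def is_disjoint(communities):
    """True if no two distinct communities share a vertex."""
    return all(a.isdisjoint(b) for a, b in combinations(communities, 2))


vertices = {1, 2, 3, 4, 5, 6}
yellow, green, blue = {1, 2}, {3, 4}, {5, 6}

disjoint_cover = [yellow, green, blue]
overlapping_cover = [yellow | green, blue | green]  # green nodes belong to both

for name, cover in [("disjoint", disjoint_cover), ("overlapping", overlapping_cover)]:
    print(name, is_cover(cover, vertices), is_disjoint(cover))
```

Both collections cover every vertex; only the first one passes the disjointness check.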

Page 32: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities Definitions

A few more definitions

Figure: A simple graph with three communities. Intra-community edges are blue and inter-community ones are green

Let C be a community of a graph G(V, E) with $|C| = n_c$, $|V| = n$ and $|E| = m$. We define,

Average link density $\delta(G) = \dfrac{m}{n(n-1)/2}$

Intra-cluster density $\delta_{int}(C) = \dfrac{\#\,\text{internal edges of } C}{n_c(n_c-1)/2}$

Inter-cluster density $\delta_{ext}(C) = \dfrac{\#\,\text{inter-cluster edges of } C}{n_c(n-n_c)}$

For a good community, we expect $\delta_{int}(C) \gg \delta(G)$ and $\delta_{ext}(C) \ll \delta(G)$

We look to maximize $\sum_{C} \left(\delta_{int}(C) - \delta_{ext}(C)\right)$
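
A minimal sketch of these three densities, assuming an undirected graph stored as a dict of neighbor sets and a community with $1 < n_c < n$ (otherwise a denominator is zero). The example graph, two triangles joined by one edge, is made up for illustration.

```python
def link_density(adj):
    """Average link density: m divided by the number of possible edges n(n-1)/2."""
    n = len(adj)
    m = sum(len(neighbors) for neighbors in adj.values()) // 2
    return m / (n * (n - 1) / 2)


def community_densities(adj, community):
    """Return (delta_int, delta_ext) for one community of an undirected graph."""
    n, n_c = len(adj), len(community)
    # Every internal edge is seen from both of its endpoints, hence the halving.
    internal = sum(1 for u in community for v in adj[u] if v in community) // 2
    external = sum(1 for u in community for v in adj[u] if v not in community)
    return internal / (n_c * (n_c - 1) / 2), external / (n_c * (n - n_c))


# Two triangles {1, 2, 3} and {4, 5, 6} joined by the single edge (3, 4).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}

delta_g = link_density(adj)
delta_int, delta_ext = community_densities(adj, {1, 2, 3})
print(delta_g, delta_int, delta_ext)
```

Here the triangle {1, 2, 3} has intra-cluster density 1 and inter-cluster density 1/9, against an average link density of 7/15, so it satisfies both inequalities.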

Page 33: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities A naïve approach - NP hardness

A Naïve Approach

We have an objective function $f(\mathcal{C}) = \sum_{C \in \mathcal{C}} \left(\delta_{int}(C) - \delta_{ext}(C)\right)$, where $\mathcal{C}$ is the cover

How do we find a good $\mathcal{C}$?

Exhaustive enumeration, or in simple words, brute force!

Try out all possible communities C of all possible sizes, and pick the set of communities $\mathcal{C}$ that maximizes $f(\mathcal{C})$

What’s the problem? Too many choices of $\mathcal{C}$ to pick from - needle in a haystack!

Even for small graphs, brute forcing becomes infeasible

Can we do better?
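
To get a feel for the size of the search space: if we restrict ourselves to disjoint covers, the number of candidates for n nodes is the number of set partitions of n elements, i.e. the Bell number $B_n$. Overlapping covers only add to this. The sketch below computes Bell numbers with the Bell triangle; the function name is a hypothetical helper.

```python
def bell_number(n):
    """Number of ways to partition a set of n elements (Bell triangle method)."""
    row = [1]
    for _ in range(n):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to its left neighbour.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[0]


# 34 nodes is the size of the karate club network used in this talk.
for n in (5, 10, 20, 34):
    print(n, bell_number(n))
```

The count grows faster than exponentially, which is why evaluating the objective on every candidate is hopeless even for a few dozen nodes.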

Page 34: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities Girvan-Newman Algorithm

A Little Background: Edge Betweenness Centrality

Betweenness centrality of an edge e is the sum, over all pairs of nodes, of the fraction of shortest paths that pass through e:

$c_B(e) = \sum_{s,t \in V} \dfrac{\sigma(s,t|e)}{\sigma(s,t)}$

where $\sigma(s,t)$ is the number of shortest paths from s to t and $\sigma(s,t|e)$ is the number of shortest paths from s to t passing through the edge e

Top 6 edges

Edge      cB(e)    type
(10, 13)  0.3      inter
(3, 5)    0.23333  inter
(7, 15)   0.2079   inter
(1, 8)    0.1873   inter
(13, 15)  0.1746   intra
(5, 7)    0.1476   intra

Bottom 6 edges

Edge      cB(e)    type
(8, 11)   0.022    intra
(1, 2)    0.0269   intra
(9, 11)   0.031    intra
(8, 9)    0.0412   intra
(12, 15)  0.052    intra
(3, 4)    0.060    intra
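
The definition can be evaluated directly on a small graph. The sketch below is a straightforward, unoptimized reading of the formula for an unweighted, undirected graph; it sums over unordered pairs and does not normalize, so its values are on a different scale from the table above, which appears to be normalized. The graph and the function names are made up for illustration.

```python
from collections import deque
from itertools import combinations


def bfs_counts(adj, source):
    """Return (dist, sigma): hop distances and shortest-path counts from source."""
    dist, sigma = {source: 0}, {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:  # first time v is reached
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:  # u precedes v on some shortest path
                sigma[v] += sigma[u]
    return dist, sigma


def edge_betweenness(adj):
    """c_B(e) = sum over unordered pairs {s, t} of sigma(s, t | e) / sigma(s, t)."""
    info = {s: bfs_counts(adj, s) for s in adj}
    edges = {tuple(sorted((u, v))) for u in adj for v in adj[u]}
    score = {e: 0.0 for e in edges}
    for s, t in combinations(sorted(adj), 2):
        dist_s, sigma_s = info[s]
        dist_t, sigma_t = info[t]
        if t not in dist_s:
            continue  # s and t lie in different components
        for u, v in edges:
            for a, b in ((u, v), (v, u)):  # the edge may be crossed in either direction
                if a in dist_s and b in dist_t and dist_s[a] + 1 + dist_t[b] == dist_s[t]:
                    # paths s -> a, then the edge (a, b), then b -> t
                    score[(u, v)] += sigma_s[a] * sigma_t[b] / sigma_s[t]
    return score


# Two triangles joined by the bridge (3, 4).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
for edge, value in sorted(edge_betweenness(adj).items(), key=lambda item: -item[1]):
    print(edge, value)
```

The bridge carries every shortest path between the two triangles, so it ends up with the largest score, matching the observation in the table that inter-community edges rank highest.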

Page 35: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities Girvan-Newman Algorithm

The Girvan-Newman Algorithm

Proposed by Girvan and Newman in 2002 and improved in 2004.

Based on reachability of nodes - shortest paths

Edges are selected on the basis of the edge betweenness centrality

The algorithm

1 Compute the betweenness centrality of all edges

2 Remove the edge with the largest centrality; ties can be broken randomly

3 Recalculate the centralities on the remaining graph

4 Repeat from step 2; stop when you get clusters of the desired quality
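
The four steps can be sketched as below. This is an illustrative sketch, not a reference implementation: edge betweenness is computed with a Brandes-style accumulation, ties are broken arbitrarily rather than randomly, and the stopping rule (a target number of connected components) is an assumption standing in for "clusters of the desired quality". All function names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness over unordered pairs (unweighted, undirected, no self-loops)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:  # BFS that counts shortest paths and records predecessors
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):  # push dependencies back towards the source
            for u in preds[w]:
                share = sigma[u] / sigma[w] * (1.0 + delta[w])
                score[frozenset((u, w))] += share
                delta[u] += share
    # Every unordered pair was counted once from each endpoint.
    return {edge: value / 2.0 for edge, value in score.items()}


def components(adj):
    """Connected components as a list of node sets."""
    seen, result = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in comp:
                comp.add(u)
                stack.extend(adj[u] - comp)
        seen |= comp
        result.append(comp)
    return result


def girvan_newman(adj, target_communities):
    """Remove highest-betweenness edges until the graph has enough components."""
    adj = {u: set(neighbors) for u, neighbors in adj.items()}  # work on a copy
    while len(components(adj)) < target_communities:
        scores = edge_betweenness(adj)  # steps 1 and 3: (re)compute centralities
        if not scores:
            break  # no edges left to remove
        u, v = max(scores, key=scores.get)  # step 2: edge with the largest centrality
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)


# Two triangles joined by the bridge (3, 4).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(girvan_newman(adj, target_communities=2))
```

Recomputing all centralities after every removal is what makes the method slow on large graphs, which motivates the faster approaches later in the talk.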

Page 36: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities Girvan-Newman in Action

(a) Best edge: (10, 13)

(b) Best edge: (3, 5)

(c) Best edge: (7, 15)

(d) Best edge: (1, 8)

(e) Best edge: (2, 11)

(f) Final graph

Page 37: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities Modularity

Modularity

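For a given graph G(V, E) and a disjoint cover $\mathcal{C} = \{C_1, C_2, \ldots, C_k\}$, let $A_{ij}$ be the adjacency matrix entry for nodes i and j, $k_i$ the degree of node i, $c_i$ the community of node i, and $\delta(c_i, c_j)$ the Kronecker delta (1 if $c_i = c_j$, 0 otherwise; not the density $\delta$ used earlier). We have,

the number of intra-community edges as $\dfrac{1}{2} \sum_{ij} A_{ij}\, \delta(c_i, c_j)$

the expected number of edges between all pairs of nodes in a community as $\dfrac{1}{2} \sum_{ij} \dfrac{k_i k_j}{2m}\, \delta(c_i, c_j)$

the difference of the actual and the expected values is $\dfrac{1}{2} \sum_{ij} \left(A_{ij} - \dfrac{k_i k_j}{2m}\right) \delta(c_i, c_j)$

We define modularity $Q = \dfrac{1}{2m} \sum_{ij} \left(A_{ij} - \dfrac{k_i k_j}{2m}\right) \delta(c_i, c_j)$. $Q \in [-1, 1]$

The higher the modularity, the better the community structure.

The lower it is, the more random the edge distribution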
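
The definition of Q translates almost literally into code. The sketch below assumes an unweighted, undirected graph stored as a dict of neighbor sets and a mapping from each node to its community label; the example graph and the function name are made up for illustration.

```python
def modularity(adj, community_of):
    """Q = 1/(2m) * sum_ij (A_ij - k_i * k_j / (2m)) * delta(c_i, c_j)."""
    two_m = sum(len(neighbors) for neighbors in adj.values())  # sum of degrees = 2m
    total = 0.0
    for i in adj:
        for j in adj:
            if community_of[i] != community_of[j]:
                continue  # the Kronecker delta is zero for this pair
            a_ij = 1.0 if j in adj[i] else 0.0
            total += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return total / two_m


# Two triangles joined by the bridge (3, 4), so m = 7.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}

by_triangle = {1: "a", 2: "a", 3: "a", 4: "b", 5: "b", 6: "b"}
single_block = {node: "all" for node in adj}

print(modularity(adj, by_triangle))   # 5/14, roughly 0.357
print(modularity(adj, single_block))  # 0: no better than the random expectation
```

Working it by hand for the triangle split: each community contributes $6 - 49/14 = 2.5$ to the double sum, so $Q = 5/14$. Putting every node in one community gives $Q = 0$.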

Page 38: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities Louvain Method

Louvain Method: A Greedy Approach

Proposed by Blondel et al. in 2008.

Takes a greedy approach to modularity maximization

Very fast in practice; it is the current state of the art in disjoint community detection.

Performs hierarchical partitioning, stopping when there cannot be any further improvement in modularity

Contracts the graph in each iteration, thereby speeding up the process

Page 39: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities Louvain Method

The Algorithm

1 Initially each node is in its own community

2 A sequential sweep over the nodes is performed. Given a node i, the gain in weighted modularity (∆Q) coming from putting i in the community of its neighbor j is computed. i is put in the community for which ∆Q is maximum, provided ∆Q > 0.

3 Communities are replaced by supernodes, and two supernodes are connected by an edge iff there’s at least one edge between vertices of the two communities.

4 The above two steps are repeated as long as ∆Q > 0
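
Steps 1 and 2 (the local-moving phase) can be sketched as follows. This is a deliberately simple illustration, not the implementation by Blondel et al.: ∆Q is obtained by recomputing Q from scratch for every candidate move, which is far slower than the incremental update used in practice, and the aggregation into supernodes (step 3) is left out. The graph is unweighted and all names are hypothetical.

```python
def modularity(adj, community_of):
    """Q = 1/(2m) * sum_ij (A_ij - k_i * k_j / (2m)) * delta(c_i, c_j)."""
    two_m = sum(len(neighbors) for neighbors in adj.values())
    total = 0.0
    for i in adj:
        for j in adj:
            if community_of[i] == community_of[j]:
                a_ij = 1.0 if j in adj[i] else 0.0
                total += a_ij - len(adj[i]) * len(adj[j]) / two_m
    return total / two_m


def local_moving(adj):
    """One level of greedy node moves; returns {node: community label}."""
    community_of = {node: node for node in adj}  # step 1: singleton communities
    improved = True
    while improved:  # keep sweeping until no node moves
        improved = False
        for i in adj:  # step 2: sequential sweep over the nodes
            current = community_of[i]
            best_label, best_q = current, modularity(adj, community_of)
            for j in sorted(adj[i]):
                candidate = community_of[j]
                if candidate == current:
                    continue
                community_of[i] = candidate  # tentatively move i
                q = modularity(adj, community_of)
                if q > best_q + 1e-12:  # accept only a strictly positive gain
                    best_label, best_q = candidate, q
            community_of[i] = best_label
            if best_label != current:
                improved = True
    return community_of


# Two triangles joined by the bridge (3, 4).
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
labels = local_moving(adj)
groups = {}
for node, label in labels.items():
    groups.setdefault(label, set()).add(node)
print(list(groups.values()))
```

Because every accepted move strictly increases Q and Q is bounded, the sweep loop terminates. A full Louvain pass would then contract each group into a supernode and repeat on the smaller graph.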

Page 40: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities Louvain Method

Louvain Method in Action

Page 41: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities Louvain Method

Figure: Belgian mobile phone network. The red nodes are French speakers and the green ones are Dutch speakers

Page 42: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities Our method - methodical graph sparsification

Community Detection by Graph Sparsification

Proposed by Basuchowdhuri, Sikdar, Shreshtha, Majumder in 2015. Accepted at ACM CoDS 2016 as a full paper.

The input graph is methodically sparsified, preserving the community structure. A t-spanner is used for this purpose.

Louvain Method is applied on the reduced graph to obtain the clusters

Very fast in practice. Performance is comparable to Louvain Method both in terms of quality and modularity.

Page 43: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities Our method - methodical graph sparsification

The Algorithm

1 Construct a t-spanner for the given network. Take the complement of the spanner in the original network

2 Form a cover using any fast community detection algorithm on the sparsified graph

3 Run Louvain method to refine the clusters
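
For readers unfamiliar with spanners: a t-spanner is a subgraph in which the distance between any two nodes is at most t times their distance in the original graph. The slides do not say how the spanner is built, so the sketch below uses the classic greedy construction for unweighted graphs purely as an assumption; it is not claimed to be the procedure from the paper. The function names and the triangle example are hypothetical.

```python
from collections import deque


def bounded_distance(adj, source, target, limit):
    """BFS hop distance from source to target, or None if it exceeds limit."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        if u == target:
            return dist[u]
        if dist[u] == limit:
            continue  # do not explore beyond the allowed stretch
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return None


def greedy_spanner(nodes, edges, t):
    """Keep an edge only if its endpoints are more than t hops apart so far."""
    spanner_adj = {node: set() for node in nodes}
    kept = []
    for u, v in edges:
        if bounded_distance(spanner_adj, u, v, t) is None:
            spanner_adj[u].add(v)
            spanner_adj[v].add(u)
            kept.append((u, v))
    return kept


nodes = [1, 2, 3]
edges = [(1, 2), (2, 3), (1, 3)]  # a triangle

spanner = greedy_spanner(nodes, edges, t=2)
complement = [edge for edge in edges if edge not in spanner]
print(spanner, complement)
```

With t = 2 the third edge of the triangle is dropped, because its endpoints are already two hops apart in the spanner; it is the only edge in the complement.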

Page 44: Community Detection in Social Networks: A Brief Overview

The Hunt for Communities Our method - methodical graph sparsification

Figure: Original network. n = 115, m = 613

Figure: Sparsified network. n = 115, m = 137

Figure: Final network. n = 115, m = 137

Page 45: Community Detection in Social Networks: A Brief Overview

The Need for Speed (and quality)

Table of Contents

1 Introduction

2 Motivation

3 The Hunt for Communities

4 The Need for Speed (and quality)
    Performance comparison

Page 46: Community Detection in Social Networks: A Brief Overview

The Need for Speed (and quality) Performance comparison

Performance Comparison

                                 Louvain Method      Our Algorithm
Dataset   n        m             Modularity  Time    t  Modularity  Time
Karate    34       78            0.415       0       7  0.589422    0.5
Dolphins  62       159           0.518       0       5  0.6744      0.53
Football  115      613           0.604       0       9  0.8627      0.69
Enron     33,696   180,811       0.596       0.38    3  0.855       13.13
DBLP      317,080  1,049,866     0.819       11      9  0.9589864   78.56

Page 47: Community Detection in Social Networks: A Brief Overview

The Need for Speed (and quality) Performance comparison

Wrapping Up

Social network analysis is a vibrant, dynamic field that spans sociology, economics, physics and biology, not just CS

Community detection is an active field of research.

Not much work has been done on dynamic networks.

Page 48: Community Detection in Social Networks: A Brief Overview

The Need for Speed (and quality) Performance comparison

Thank you for listening!
