SOCIAL NETWORKS and COMMUNITY DETECTION. “Networks” is a pervasive term? Networked Economy...

Page 1

SOCIAL NETWORKS and COMMUNITY DETECTION

Page 2

“Networks” is a pervasive term?

Networked Economy

Immigrant Networks

National Innovation Networks

Networking

Entrepreneurial Networks

Ego Networks

Regional Networks

Infrastructure Networks

Social Networks

Page 3

Social Networks

• What is Social Network Analysis?

Network Analysis is the keyword for the 21st century.

Researchers, politicians, and ordinary people all talk about social networks.

Page 4

What is a Network?

(Web definition) A set of nodes, points, or locations connected by means of data, voice, and video communications for the purpose of exchange.

Page 5

Real World Web Networks

• Internet
• World Wide Web
• Citation Networks
• Transportation Networks
• Food Webs
• Social Networks
• Biochemical Networks

Page 6

Social Networks

A social network is a description of the social structure between actors, mostly individuals or organizations. It indicates the ways in which they are connected through various social familiarities, ranging from casual acquaintance to close familial bonds.

Page 7

Marriage ties among Renaissance Florentine families

A paleo-social network

Page 8

Social Network Analysis

• Social network analysis [SNA] is the mapping and measuring of relationships and flows between people, groups, organizations, computers or other information/knowledge processing entities. It also includes community detection.

• The nodes in the network are the people and groups, while the links (ties) show relationships or flows between the nodes.

• Community detection is discovering groups in the network where individuals’ group memberships are not explicitly given.

Page 9

The unit of interest in a network is the combined set of actors and their relations.

We represent actors with points and relations with lines.

Actors are referred to variously as: nodes, vertices, or points.

Relations are referred to variously as: edges, arcs, lines, or ties.

Example:

[Figure: five actors, a, b, c, d, and e, drawn as points connected by lines]

Social Network Data

Page 10

SN = graph

• A network can then be represented as a graph data structure

• We can apply a variety of measures and analyses to the graph representing a given SN

• Ties in a SN can be directed or undirected (e.g., friendship and co-authorship ties are usually undirected, while emails are directed)

Page 11

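From graphs to matrices

[Figure: the five-actor example graph (a, b, c, d, e) drawn as an undirected, binary (0,1) graph and as a directed, binary graph, each with its adjacency matrix]

Undirected, binary adjacency matrix:

   a b c d e
a  0 1 0 0 0
b  1 0 1 0 0
c  0 1 0 1 1
d  0 0 1 0 1
e  0 0 1 1 0

Basic Data Structures: Social Network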

Page 12

From matrices to lists

   a b c d e
a  0 1 0 0 0
b  1 0 1 0 0
c  0 1 0 1 1
d  0 0 1 0 1
e  0 0 1 1 0

Adjacency List:
a: b
b: a c
c: b d e
d: c e
e: c d

Arc List:
a b
b a
b c
c b
c d
c e
d c
d e
e c
e d

Basic Data Structures: Social Network
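The conversions on this slide are mechanical: an adjacency list reads each row of the matrix, and an arc list records every nonzero cell. A minimal Python sketch, assuming the undirected five-actor graph with ties a-b, b-c, c-d, c-e, and d-e (our reading of the adjacency list above); the function names are illustrative, not taken from any SNA package.

```python
# Nodes of the five-actor example graph, in matrix order (assumed reading of the slide).
nodes = ["a", "b", "c", "d", "e"]

# Undirected, binary adjacency matrix: X[i][j] = 1 if actors i and j are tied.
X = [
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
]

def to_adjacency_list(nodes, X):
    """Map each node to the nodes it has a tie to, scanning its matrix row."""
    return {u: [v for v, x in zip(nodes, row) if x] for u, row in zip(nodes, X)}

def to_arc_list(nodes, X):
    """List every nonzero cell (i, j) as an arc (sender, receiver)."""
    return [(u, v) for u, row in zip(nodes, X) for v, x in zip(nodes, row) if x]

adj = to_adjacency_list(nodes, X)
arcs = to_arc_list(nodes, X)
print(adj["c"])   # ['b', 'd', 'e']
print(len(arcs))  # 10: each undirected tie appears once in each direction
```

For a directed graph the same two functions work unchanged; the matrix simply stops being symmetric, so each arc appears only once.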

Page 13

In general, a relation can be:
• Binary or Valued
• Directed or Undirected

[Figure: the five-actor example graph (a, b, c, d, e) drawn four ways: undirected, binary; directed, binary; undirected, valued; directed, valued. In the valued versions each tie carries a number.]

Social Network as a graph

Page 14

Social Network

• Measuring the flow of information
  – Topology
    • Connectivity
    • Centrality
  – Time
• Structure & Social Space

Page 15

In addition to the simple probability that one actor passes information on to another (pij), two factors affect flow through a network:

Topology - the shape, or form, of the network. Example: one actor cannot pass information to another unless they are either directly or indirectly connected.

Time - the timing of contact matters. Example: an actor cannot pass information he has not yet received.

Measuring Networks: Flow

Page 16

Two features of the network’s topology are known to be important: connectivity and centrality

Connectivity refers to how actors in one part of the network are connected to actors in another part of the network.

• Reachability: Is it possible for actor i to reach actor j? This can only be true if there is a chain of contact from one actor to another.

• Distance: Given they can be reached, how many steps are they from each other?

• Number of paths: How many different paths connect each pair?
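Reachability and distance can both be answered with a breadth-first search. A minimal Python sketch, assuming an unweighted graph stored as a dict of neighbor lists; the example graph (the five actors a-e plus a hypothetical isolated actor f) is for illustration only.

```python
from collections import deque

def bfs_distances(adj, source):
    """Breadth-first search from `source`; returns {node: steps} for every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:            # first visit in BFS = shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# Hypothetical undirected graph: the five-actor example plus an isolated actor f.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d", "e"],
       "d": ["c", "e"], "e": ["c", "d"], "f": []}

dist = bfs_distances(adj, "a")
print(dist)          # {'a': 0, 'b': 1, 'c': 2, 'd': 3, 'e': 3}
print("f" in dist)   # False: f is not reachable from a
```

Actor j is reachable from actor i exactly when j appears in the result, and the stored value is their distance.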

Measuring Networks: Topology

Page 17

Without full network data, you can’t distinguish actors with limited information potential from those more deeply embedded in a setting.

[Figure: actors a, b, and c]

Measuring Networks: Connectivity

Page 18

Indirect connections are what make networks systems. One actor can reach another if there is a path in the graph connecting them.

[Figure: example networks over actors a to f illustrating reachability]

Reachability

Measuring Networks: Connectivity

Paths can be directed, leading to a distinction between “strong” and “weak” components

Page 19

Reachability

If you can trace a sequence of relations from one actor to another, then the two are reachable. If there is at least one path connecting every pair of actors in the graph, the graph is connected and is called a component.

Intuitively, a component is the set of people who are all connected by a chain of relations.
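Finding components amounts to repeated graph traversal: start from any unvisited actor and collect everyone reachable from it. A minimal Python sketch for undirected graphs, assuming a dict that lists every actor (isolates map to an empty list); the example graph is hypothetical.

```python
def components(adj):
    """Return the connected components of an undirected graph as a list of sets.

    adj maps every node to an iterable of its neighbors (isolates map to []).
    """
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:                      # depth-first traversal from `start`
            u = stack.pop()
            for v in adj[u]:
                if v not in comp:
                    comp.add(v)
                    stack.append(v)
        seen |= comp
        comps.append(comp)
    return comps

# Hypothetical graph with three components, one of them an isolate.
adj = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4], 6: []}
print(components(adj))   # [{1, 2, 3}, {4, 5}, {6}]
```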

Measuring Networks: Connectivity

Page 20

This example contains many components.

Reachability

Measuring Networks: Connectivity

Page 21

In general, components can be directed or undirected.

For a graph with any directed edges, there are two types of components:

Strong components consist of the set(s) of all nodes that are mutually reachable

Weak components consist of the set(s) of all nodes that are connected when the direction of the ties is ignored.

Reachability

Measuring Networks: Connectivity
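The two kinds of components can be computed directly from their definitions: a strong component groups actors that reach each other, and a weak component is an ordinary component once arc direction is dropped. A minimal Python sketch that favors clarity over speed (Tarjan's or Kosaraju's algorithm is the usual linear-time choice); the directed example graph is hypothetical.

```python
def reachable(adj, source):
    """All nodes reachable from `source` by following arcs in their direction."""
    seen, stack = {source}, [source]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def strong_components(adj):
    """Group nodes that are mutually reachable (u reaches v and v reaches u).

    adj must list every node as a key, including nodes with no outgoing arcs.
    """
    reach = {u: reachable(adj, u) for u in adj}
    comps, assigned = [], set()
    for u in adj:
        if u not in assigned:
            comp = {v for v in reach[u] if u in reach[v]}
            comps.append(comp)
            assigned |= comp
    return comps

def weak_components(adj):
    """Components of the graph obtained by ignoring arc direction."""
    undirected = {u: set(vs) for u, vs in adj.items()}
    for u, vs in adj.items():
        for v in vs:
            undirected[v].add(u)          # add the reverse arc
    # With every arc mirrored, reachability is mutual, so strong = weak here.
    return strong_components(undirected)

# Hypothetical directed graph: a <-> b, b -> c, c <-> d, e -> a
adj = {"a": ["b"], "b": ["a", "c"], "c": ["d"], "d": ["c"], "e": ["a"]}
print([sorted(c) for c in strong_components(adj)])  # [['a', 'b'], ['c', 'd'], ['e']]
print([sorted(c) for c in weak_components(adj)])    # [['a', 'b', 'c', 'd', 'e']]
```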

Page 22

There are only 2 strong components with more than 1 person in this network.

Reachability

Measuring Networks: Connectivity

Page 23

[Figure: example network containing actor a]

Distance is measured by the (weighted) number of relations separating a pair:

Actor “a” is 1 step from 4 actors, 2 steps from 5, 3 steps from 4, 4 steps from 3, and 5 steps from 1.

Distance & number of paths

Measuring Networks: Connectivity

Page 24

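Paths are the different routes one can take. Node-independent paths, which share no nodes other than their endpoints, are particularly important.

[Figure: actors a and b connected by several paths]

There are 2 independent paths connecting a and b. There are many non-independent paths.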

Distance & number of paths

Measuring Networks: Flow

Page 25

Probability of transfer by distance and number of paths, assuming a constant pij of 0.6

[Plot: probability of transfer (y-axis) against path distance from 2 to 6 (x-axis), with one curve each for 1, 2, 5, and 10 paths]

Distance & number of paths

Measuring Networks: Connectivity
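The slide does not state the formula behind the curves. One simple model that produces curves of this shape, offered here as an assumption, treats every tie as transmitting independently with probability p, so a single path of length d succeeds with probability p^d, and k node-independent paths all fail with probability (1 - p^d)^k:

```python
def transfer_probability(p, distance, paths):
    """Chance that information crosses at least one of `paths` independent paths.

    Assumption: every tie transmits independently with probability p, so one
    path of length `distance` succeeds with probability p ** distance.
    """
    one_path = p ** distance
    return 1 - (1 - one_path) ** paths

p = 0.6  # constant pij used on the slide
for k in (1, 2, 5, 10):
    row = [round(transfer_probability(p, d, k), 3) for d in range(2, 7)]
    print(f"{k:>2} paths:", row)
```

Under this model, adding paths raises the chance of transfer at every distance, while each extra step multiplies the single-path probability by p, which matches the qualitative pattern in the plot.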

Page 26

Centrality refers to (one dimension of) location, identifying where an actor resides in a network.

• For example, we can compare actors at the edge of the network to actors at the center.

• In general, this is a way to formalize intuitive notions about the distinction between insiders and outsiders.

Centrality

Measuring Networks: Centrality

Page 27

Conceptually, centrality is fairly straightforward: we want to identify which nodes are in the ‘center’ of the network. In practice, identifying exactly what we mean by ‘center’ is somewhat complicated.

Three standard centrality measures capture a wide range of “importance” in a network:

• Degree
• Closeness
• Betweenness

Measuring Networks: Centrality

Page 28

The most intuitive notion of centrality focuses on degree. Degree is the number of ties, and the actor with the most ties is the most important:

$$C_D(n_i) = d(n_i) = X_{i+} = \sum_j X_{ij}$$

Centrality: Degree

Measuring Networks: Centrality
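With a binary adjacency matrix, degree centrality is just a row sum. A minimal Python sketch, assuming the five-actor example graph (ties a-b, b-c, c-d, c-e, d-e); dividing by g - 1, the largest degree possible among g actors, is a common normalization added here for comparison across network sizes.

```python
def degree_centrality(X):
    """C_D(n_i) = sum_j X_ij for each row i of a binary adjacency matrix."""
    return [sum(row) for row in X]

def normalized_degree(X):
    """Degree divided by g - 1, the largest degree possible among g actors."""
    g = len(X)
    return [d / (g - 1) for d in degree_centrality(X)]

X = [  # five-actor example a-e (undirected), rows and columns in order a..e
    [0, 1, 0, 0, 0],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 1],
    [0, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
]
print(degree_centrality(X))   # [1, 2, 3, 2, 2]
print(normalized_degree(X))   # [0.25, 0.5, 0.75, 0.5, 0.5]
```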

Page 29

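If we want to measure the degree to which the graph as a whole is centralized, we look at the dispersion of centrality:

Simple: variance of the individual centrality scores.

$$S_D^2 = \frac{1}{g}\sum_{i=1}^{g}\left(C_D(n_i) - \bar{C}_D\right)^2$$

Or, using Freeman’s general formula for centralization:

$$C_D = \frac{\sum_{i=1}^{g}\left[C_D(n^*) - C_D(n_i)\right]}{(g-1)(g-2)}$$

Here g is the number of actors and C_D(n*) is the largest degree centrality observed in the network, so the numerator measures the dispersion around the most central actor. The denominator (g-1)(g-2) is the largest value that numerator can reach in any network of g actors (a star), which scales the score to the range 0 to 1.

Measuring Networks: Centrality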
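Both dispersion measures can be computed from the list of degrees alone. A minimal Python sketch for undirected graphs, using the variance and Freeman formulas of this slide; the star and ring inputs are hypothetical examples. For an eight-actor star it returns a Freeman score of 1.0 and a variance of 3.9375, which appears consistent with the "Freeman: 1.0, Variance: 3.9" example among the degree centralization scores.

```python
def degree_centralization(degrees):
    """Return (variance, Freeman) degree centralization for an undirected graph."""
    g = len(degrees)
    mean = sum(degrees) / g
    variance = sum((d - mean) ** 2 for d in degrees) / g
    c_max = max(degrees)                                  # C_D(n*)
    freeman = sum(c_max - d for d in degrees) / ((g - 1) * (g - 2))
    return variance, freeman

star = [7] + [1] * 7   # one hub tied to 7 others (g = 8)
ring = [2] * 8         # 8 actors in a circle
print(degree_centralization(star))   # (3.9375, 1.0)
print(degree_centralization(ring))   # (0.0, 0.0)
```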

Page 30

Degree Centralization Scores

Freeman: .07, Variance: .20

Freeman: 1.0, Variance: 3.9

Freeman: .02, Variance: .17

Freeman: 0.0, Variance: 0.0

Measuring Networks: Centrality

Page 31

Degree Centralization Scores

Freeman: 0.1, Variance: 4.84

Measuring Networks: Centrality

Page 32

A second measure of centrality is closeness centrality. An actor is considered important if he/she is relatively close to all other actors.

Closeness is based on the inverse of the distance of each actor to every other actor in the network.

Closeness Centrality:

$$C_C(n_i) = \left[\sum_{j=1}^{g} d(n_i, n_j)\right]^{-1}$$

Normalized Closeness Centrality:

$$C'_C(n_i) = C_C(n_i)\,(g-1)$$

Measuring Networks: Centrality
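Closeness needs every actor's distances to all others, so a breadth-first search from each actor is enough for unweighted graphs. A minimal Python sketch, assuming a connected graph (the sum of distances is undefined otherwise). For an eight-actor star the hub scores .143 (normalized 1.00) and each leaf .077 (normalized .538), which appears to match the first table of the examples.

```python
from collections import deque

def closeness(adj):
    """Return {node: (C_C, C'_C)} using BFS distances; assumes a connected graph."""
    g = len(adj)
    scores = {}
    for source in adj:
        dist, queue = {source: 0}, deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total = sum(dist.values())          # sum_j d(n_i, n_j)
        c = 1 / total
        scores[source] = (c, c * (g - 1))   # raw and normalized closeness
    return scores

# Star with 8 actors: node 0 is the hub.
star = {0: list(range(1, 8)), **{i: [0] for i in range(1, 8)}}
s = closeness(star)
print(round(s[0][0], 3), round(s[0][1], 3))   # 0.143 1.0
print(round(s[1][0], 3), round(s[1][1], 3))   # 0.077 0.538
```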

Page 33

Closeness Centrality in the examples

Distance            Closeness  Normalized
0 1 1 1 1 1 1 1     .143       1.00
1 0 2 2 2 2 2 2     .077       .538
1 2 0 2 2 2 2 2     .077       .538
1 2 2 0 2 2 2 2     .077       .538
1 2 2 2 0 2 2 2     .077       .538
1 2 2 2 2 0 2 2     .077       .538
1 2 2 2 2 2 0 2     .077       .538
1 2 2 2 2 2 2 0     .077       .538

Distance              Closeness  Normalized
0 1 2 3 4 4 3 2 1     .050       .400
1 0 1 2 3 4 4 3 2     .050       .400
2 1 0 1 2 3 4 4 3     .050       .400
3 2 1 0 1 2 3 4 4     .050       .400
4 3 2 1 0 1 2 3 4     .050       .400
4 4 3 2 1 0 1 2 3     .050       .400
3 4 4 3 2 1 0 1 2     .050       .400
2 3 4 4 3 2 1 0 1     .050       .400
1 2 3 4 4 3 2 1 0     .050       .400

Measuring Networks: Centrality

Page 34

Distance          Closeness  Normalized
0 1 2 3 4 5 6     .048       .286
1 0 1 2 3 4 5     .063       .375
2 1 0 1 2 3 4     .077       .462
3 2 1 0 1 2 3     .083       .500
4 3 2 1 0 1 2     .077       .462
5 4 3 2 1 0 1     .063       .375
6 5 4 3 2 1 0     .048       .286

Closeness Centrality in the examples

Measuring Networks: Centrality

Page 35

Distance                    Closeness  Normalized
0 1 1 2 3 4 4 5 5 6 5 5 6   .021       .255
1 0 1 1 2 3 3 4 4 5 4 4 5   .027       .324
1 1 0 1 2 3 3 4 4 5 4 4 5   .027       .324
2 1 1 0 1 2 2 3 3 4 3 3 4   .034       .414
3 2 2 1 0 1 1 2 2 3 2 2 3   .042       .500
4 3 3 2 1 0 2 3 3 4 1 1 2   .034       .414
4 3 3 2 1 2 0 1 1 2 3 3 4   .034       .414
5 4 4 3 2 3 1 0 1 1 4 4 5   .027       .324
5 4 4 3 2 3 1 1 0 1 4 4 5   .027       .324
6 5 5 4 3 4 2 1 1 0 5 5 6   .021       .255
5 4 4 3 2 1 3 4 4 5 0 1 1   .027       .324
5 4 4 3 2 1 3 4 4 5 1 0 1   .027       .324
6 5 5 4 3 2 4 5 5 6 1 1 0   .021       .255

Closeness Centrality in the examples

Measuring Networks: Centrality

Page 36

Closeness Centrality in the examples

Measuring Networks: Centrality

Page 37

Measuring Networks: Centrality

Identify the set of all vertices A where the greatest distance d(A,B) to other vertices B is minimal.

Value = longest distance to any other node.

The graph theoretic center is ‘3’

Graph-theoretic center
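The graph-theoretic center follows directly from the definition above: compute each node's eccentricity (its longest distance to any other node) and keep the nodes where it is smallest. A minimal Python sketch, assuming a connected, unweighted graph; the five-node chain is a hypothetical example, not the graph in the figure.

```python
from collections import deque

def eccentricity(adj, source):
    """Longest BFS distance from `source` to any other node (connected graph assumed)."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values())

def graph_center(adj):
    """Return (nodes with minimal eccentricity, eccentricity of every node)."""
    ecc = {u: eccentricity(adj, u) for u in adj}
    best = min(ecc.values())
    return [u for u, e in ecc.items() if e == best], ecc

# Hypothetical chain 1 - 2 - 3 - 4 - 5
chain = {1: [2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
center, ecc = graph_center(chain)
print(center)  # [3]
print(ecc)     # {1: 4, 2: 3, 3: 2, 4: 3, 5: 4}
```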

Page 38

Graph Theoretic Center (Barry or Jordan Center).

Measuring Networks: Centrality

Page 39

Betweenness Centrality: a model based on communication flow. A person who lies on communication paths can control communication flow, and is thus important. Betweenness centrality counts the number of geodesic paths between i and k that actor j resides on. Geodesics are defined as the shortest paths between points.

[Figure: example network with nodes a to h]

Measuring Networks: Centrality

Page 40

Betweenness Centrality:

[Figure: example network with nodes a to m]

Measuring Networks: Centrality

Page 41

Betweenness Centrality:

$$C_B(n_i) = \sum_{j<k} \frac{g_{jk}(n_i)}{g_{jk}}$$

Where g_jk = the number of geodesics connecting j and k, and g_jk(n_i) = the number of those geodesics that actor i is on.

Usually normalized by:

$$C'_B(n_i) = \frac{C_B(n_i)}{(g-1)(g-2)/2}$$

Measuring Networks: Centrality
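The formula above can be evaluated directly: a breadth-first search from each actor gives distances and geodesic counts, and actor i lies on g_jk(n_i) = (geodesics j to i) x (geodesics i to k) of the j-k geodesics whenever d(j, i) + d(i, k) = d(j, k). A minimal, deliberately brute-force Python sketch for undirected, unweighted graphs (Brandes' algorithm is the usual faster method); the star example is hypothetical.

```python
from collections import deque
from itertools import combinations

def bfs_counts(adj, s):
    """Distances from s and the number of geodesics (shortest paths) from s to each node."""
    dist, sigma, queue = {s: 0}, {s: 1}, deque([s])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                sigma[v] = 0
                queue.append(v)
            if dist[v] == dist[u] + 1:    # u precedes v on a geodesic
                sigma[v] += sigma[u]
    return dist, sigma

def betweenness(adj):
    """Return {node: (C_B, C'_B)} with C_B = sum over pairs j<k of g_jk(n_i) / g_jk."""
    info = {s: bfs_counts(adj, s) for s in adj}
    cb = {v: 0.0 for v in adj}
    for j, k in combinations(adj, 2):
        dist_j, sig_j = info[j]
        if k not in dist_j:
            continue                      # unreachable pair: no geodesics
        g_jk = sig_j[k]
        for i in adj:
            if i in (j, k) or i not in dist_j:
                continue
            dist_i, sig_i = info[i]
            if k in dist_i and dist_j[i] + dist_i[k] == dist_j[k]:
                # geodesics through i = (paths j..i) * (paths i..k)
                cb[i] += sig_j[i] * sig_i[k] / g_jk
    g = len(adj)
    scale = (g - 1) * (g - 2) / 2         # normalization from the slide
    return {v: (c, c / scale) for v, c in cb.items()}

# Star with 8 actors: every pair of leaves communicates through hub 0.
star = {0: list(range(1, 8)), **{i: [0] for i in range(1, 8)}}
b = betweenness(star)
print(b[0])   # (21.0, 1.0)
print(b[1])   # (0.0, 0.0)
```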

Page 42

Centralization: 1.0

Centralization: .31

Centralization: .59

Centralization: 0

Betweenness Centrality

Measuring Networks: Centrality

Page 43

Centralization: .183

Measuring Networks: Centrality

Page 44

Information Centrality: It is quite likely that information can flow through paths other than the geodesic. The Information Centrality score uses all paths in the network, and weights them based on their length.

Measuring Networks: Centrality

Page 45

Information Centrality:

Measuring Networks: Centrality

Page 46

(Node size proportional to betweenness centrality)

Actors that appear very different when seen individually are comparable in the global network.

Measuring Networks: Centrality

Page 47

Two factors that affect network flows:

Topology - the shape, or form, of the network. Simple example: one actor cannot pass information to another unless they are either directly or indirectly connected.

Time - the timing of contacts matters. Simple example: an actor cannot pass information he has not yet received.

Time

Measuring Networks: Time

Page 48

Timing in networks

A focus on contact structure has often slighted the importance of network dynamics, though a number of recent works are addressing this.

Time affects networks in two important ways:

1) The structure itself evolves, in ways that will affect the topology and thus flow.

2) The timing of contact constrains information flow.

Measuring Networks: Time

Page 49

Data on drug users in Colorado Springs, over 5 years

Drug Relations, Colorado Springs, Year 1

Measuring Networks: Time

Page 50

Drug Relations, Colorado Springs, Year 2

Current year in red, past relations in gray

Measuring Networks: Time

Page 51

Drug Relations, Colorado Springs, Year 3

Current year in red, past relations in gray

Measuring Networks: Time

Page 52

Drug Relations, Colorado Springs, Year 4

Current year in red, past relations in gray

Measuring Networks: Time

Page 53

Drug Relations, Colorado Springs, Year 5

Current year in red, past relations in gray

Measuring Networks: Time

Page 54

[Figure: hypothetical contact network among actors A to F; the numbers above the lines (2-5, 3-7, 0-1, 8-9, 3-5) indicate contact periods]

What impact does timing have on flow through the network?

Measuring Networks: Time
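One way to explore the question on this slide is with time-respecting paths: information can cross a tie only during that tie's contact period, and only after the sender has received it. A minimal Python sketch, assuming undirected ties, integer periods, and that information may cross several ties within the same period. The assignment of the figure's periods (2-5, 3-7, 0-1, 8-9, 3-5) to particular ties is hypothetical, since the transcript does not preserve which line carries which label.

```python
def earliest_arrival(contacts, source, start=0):
    """Earliest period at which each node can hold information that starts at `source`.

    contacts: list of (u, v, t_start, t_end); an undirected tie active during
    periods t_start..t_end inclusive. Information can cross a tie at any period
    t in that window with t >= the period the sender received it (assumption).
    """
    arrival = {source: start}
    changed = True
    while changed:                               # relax until no arrival time improves
        changed = False
        for u, v, t0, t1 in contacts:
            for a, b in ((u, v), (v, u)):
                if a in arrival:
                    t = max(arrival[a], t0)      # wait for the tie to become active
                    if t <= t1 and t < arrival.get(b, float("inf")):
                        arrival[b] = t
                        changed = True
    return arrival

# Hypothetical assignment of the figure's contact periods to ties.
contacts = [("A", "B", 2, 5), ("B", "C", 3, 7), ("B", "D", 0, 1),
            ("C", "E", 8, 9), ("C", "F", 3, 5)]
print(earliest_arrival(contacts, "A"))  # {'A': 0, 'B': 2, 'C': 3, 'E': 8, 'F': 3}
```

In this made-up assignment D is tied to B, yet never hears from A: the B-D contact ends at period 1, before B receives the information at period 2. A static picture of the same ties would show D as reachable.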

Page 55

[Figure: the path graph for the hypothetical contact network among actors A to F]

While clearly important, this is not often handled well by current SNA software.

Measuring Networks: Time

Page 56

Measuring Networks: Structure & Social Space

The second broad division for measuring networks steps back to generalized features of the global network.

These factors almost always are of interest because of what they imply about how information moves through the network, but have resulted in a distinct line of methods and substantive research.

The study of these generalized features is also known as community detection (small worlds analysis, etc.)

Page 57

Community

• Community: It is formed by individuals such that those within a group interact with each other more frequently than with those outside the group
  – a.k.a. group, cluster, cohesive subgroup, or module in different contexts

• Community detection: discovering groups in a network where individuals’ group memberships are not explicitly given

• Why communities in social media?
  – Human beings are social
  – Easy-to-use social media allows people to extend their social life in unprecedented ways
  – It is difficult to meet friends in the physical world, but much easier to find friends online with similar interests
  – Interactions between nodes can help determine communities

Page 58

Communities in Social Media

• Two types of groups in social media
  – Explicit groups: formed by user subscriptions
  – Implicit groups: implicitly formed by social interactions

• Some social media sites allow people to join groups; is it necessary to extract groups based on network topology?
  – Not all sites provide a community platform
  – Not all people want to make the effort to join groups
  – Groups can change dynamically

• Network interaction provides rich information about the relationship between users
  – Can complement other kinds of information, e.g. user profile
  – Helps network visualization and navigation
  – Provides basic information for other tasks, e.g. recommendation

Page 59

Subjectivity of Community Definition

[Figure: two views of the same network. Left: each component is a community. Right: a densely-knit community.]

The definition of a community can be subjective.

Page 60

Taxonomy of Community Criteria

• Criteria vary depending on the tasks

• Roughly, community detection methods can be divided into 4 categories (not exclusive):

• Node-Centric Community
  – Each node in a group satisfies certain properties

• Group-Centric Community
  – Consider the connections within a group as a whole. The group has to satisfy certain properties without zooming into node-level

• Network-Centric Community
  – Partition the whole network into several disjoint sets

• Hierarchy-Centric Community
  – Construct a hierarchical structure of communities

Page 61

Node-Centric Community Detection

• Nodes satisfy different properties
  – Complete mutuality
    • cliques
  – Reachability of members
    • k-clique, k-clan, k-club
  – Nodal degrees
    • k-plex, k-core
  – Relative frequency of within-outside ties
    • LS sets, Lambda sets

• Commonly used in traditional social network analysis

• Here, we discuss some representative ones

Page 62

Complete Mutuality: Cliques

• Clique: a maximal complete subgraph in which all nodes are adjacent to each other

• Finding the maximum clique in a network is NP-hard

• A straightforward implementation to find cliques is very expensive in time complexity

Nodes 5, 6, 7 and 8 form a clique

Page 63

Finding the Maximum Clique

• In a clique of size k, each node has degree >= k-1
  – Nodes with degree < k-1 will not be included in the maximum clique

• Recursively apply the following pruning procedure
  – Sample a sub-network from the given network, and find a clique in the sub-network, say, by a greedy approach
  – Suppose the clique above has size k. To find a larger clique, all nodes with degree <= k-1 should be removed.

• Repeat until the network is small enough

• Many nodes will be pruned, because node degrees in social media networks follow a power law distribution
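The pruning step can be sketched as repeated peeling: once a clique of size k is known, any clique of size k + 1 needs every member to have degree at least k, so nodes of degree <= k-1 are removed, which lowers their neighbors' degrees, and the scan repeats. A minimal Python sketch; the nine-node graph is hypothetical (not the one in the figure), chosen so that pruning proceeds in the same order as the example that follows (2 and 9, then 1 and 3, then 4).

```python
def prune_for_larger_clique(adj, k):
    """Repeatedly drop nodes of degree <= k - 1 and return the pruned copy.

    adj: {node: set(neighbors)} for an undirected graph; the input is not modified.
    """
    adj = {u: set(vs) for u, vs in adj.items()}
    removed = True
    while removed:
        removed = False
        low = [u for u, vs in adj.items() if len(vs) <= k - 1]
        for u in low:
            for v in adj.pop(u):          # detach u from its remaining neighbors
                adj[v].discard(u)
            removed = True
    return adj

# Hypothetical graph: triangle {1, 2, 3}, node 4 bridging to the 4-clique {5, 6, 7, 8}, pendant 9.
edges = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6), (5, 6),
         (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (8, 9)]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

print(sorted(prune_for_larger_clique(graph, 3)))  # [5, 6, 7, 8]
```

Pruning only narrows the search space; the surviving nodes still have to be searched for the clique itself.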

Page 64

Maximum Clique Example

• Suppose we sample a sub-network with nodes {1-9} and find a clique {1, 2, 3} of size 3

• In order to find a clique > 3, remove all nodes with degree <= 3-1 = 2
  – Remove nodes 2 and 9
  – Remove nodes 1 and 3
  – Remove node 4

Page 65

Clique Percolation Method (CPM)

• Clique is a very strict definition, and cliques are unstable

• Normally, cliques are used as a core or a seed to find larger communities

• CPM is such a method to find overlapping communities
  – Input
    • A parameter k, and a network
  – Procedure
    • Find all cliques of size k in the given network
    • Construct a clique graph. Two cliques are adjacent if they share k-1 nodes
    • Each connected component in the clique graph forms a community

Page 66

CPM Example

Cliques of size 3: {1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7}, {5, 6, 8}, {5, 7, 8}, {6, 7, 8}

Communities: {1, 2, 3, 4}, {4, 5, 6, 7, 8}
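The three CPM steps translate almost line by line into code. A minimal brute-force Python sketch (it tests every k-subset, so it only suits small graphs), using union-find to collect connected components of the clique graph. The edge list is our reconstruction: every tie that appears in one of the listed triangles; the figure may contain further ties that lie in no triangle, which would not change the result.

```python
from itertools import combinations

def clique_percolation(adj, k):
    """CPM: unions of k-cliques linked through chains of cliques sharing k - 1 nodes.

    adj: {node: set(neighbors)} with sortable node labels.
    """
    cliques = [frozenset(c) for c in combinations(sorted(adj), k)
               if all(v in adj[u] for u, v in combinations(c, 2))]
    parent = list(range(len(cliques)))    # union-find over cliques

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:   # adjacent in the clique graph
            parent[find(i)] = find(j)
    communities = {}
    for i, c in enumerate(cliques):
        communities.setdefault(find(i), set()).update(c)
    return list(communities.values())

# Ties reconstructed from the size-3 cliques listed in the example.
edges = [(1, 2), (1, 3), (2, 3), (1, 4), (3, 4), (4, 5), (4, 6), (5, 6),
         (5, 7), (5, 8), (6, 7), (6, 8), (7, 8)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(sorted(sorted(c) for c in clique_percolation(adj, 3)))  # [[1, 2, 3, 4], [4, 5, 6, 7, 8]]
```

Node 4 lands in both communities, which is the overlap CPM is designed to allow.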

Page 67

Reachability: k-clique, k-club

• Any node in a group should be reachable in k hops

• k-clique: a maximal subgraph in which the largest geodesic distance between any two nodes <= k (distances are measured in the whole network)

• k-club: a substructure of diameter <= k

• A k-clique might have diameter larger than k in the subgraph
  – E.g. {1, 2, 3, 4, 5}

• Commonly used in traditional SNA

• Often involves combinatorial optimization

Cliques: {1, 2, 3}
2-cliques: {1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}
2-clubs: {1, 2, 3, 4}, {1, 2, 3, 5}, {2, 3, 4, 5, 6}
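The difference between the two definitions is only where distances are measured: in the whole network (k-clique) or inside the group itself (k-club). A minimal Python sketch that checks the distance condition only (not maximality). The edge list 1-2, 1-3, 2-3, 2-4, 3-5, 4-6, 5-6 is our reconstruction of the figure, chosen to be consistent with every clique, 2-clique, and 2-club listed above.

```python
from collections import deque

def distances(adj, source, allowed=None):
    """BFS distances from source, optionally walking only through nodes in `allowed`."""
    dist, queue = {source: 0}, deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist and (allowed is None or v in allowed):
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def within_k(adj, nodes, k, induced):
    """True if every pair in `nodes` is at most k hops apart.

    induced=False measures distance in the whole graph (k-clique condition);
    induced=True measures it inside the subgraph itself (k-club condition).
    """
    allowed = set(nodes) if induced else None
    for u in nodes:
        d = distances(adj, u, allowed)
        if any(d.get(v, float("inf")) > k for v in nodes):
            return False
    return True

# Reconstructed example graph (assumption).
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 5), (4, 6), (5, 6)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)

print(within_k(adj, {1, 2, 3, 4, 5}, 2, induced=False))  # True: 4 and 5 meet through 6
print(within_k(adj, {1, 2, 3, 4, 5}, 2, induced=True))   # False: inside, 4-2-3-5 has length 3
print(within_k(adj, {2, 3, 4, 5, 6}, 2, induced=True))   # True: a 2-club
```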

Page 68

Group-Centric Community Detection: Density-Based Groups

• The group-centric criterion requires the whole group to satisfy a certain condition
  – E.g., the group density >= a given threshold

• A subgraph G_s = (V_s, E_s) is a γ-dense quasi-clique if

$$\frac{2|E_s|}{|V_s|\,(|V_s| - 1)} \ge \gamma$$

where the denominator is the maximum possible sum of degrees in a subgraph with |V_s| nodes.

• A similar strategy to that of cliques can be used
  – Sample a subgraph, and find a maximal γ-dense quasi-clique (say, of size |V_s|)
  – Remove nodes with degree less than the average degree

$$\deg(v) < \gamma\,|V_s| \le \frac{2|E_s|}{|V_s| - 1}$$
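The quasi-clique test is a density check on the induced subgraph. A minimal Python sketch of the γ-dense condition; the four-node graph (a 4-clique with one tie missing) is a hypothetical example.

```python
from itertools import combinations

def density(adj, nodes):
    """2|E_s| / (|V_s| (|V_s| - 1)) for the subgraph induced by `nodes`."""
    nodes = list(nodes)
    n = len(nodes)
    edges = sum(1 for u, v in combinations(nodes, 2) if v in adj[u])
    return 2 * edges / (n * (n - 1))

def is_quasi_clique(adj, nodes, gamma):
    """True if the induced subgraph is a gamma-dense quasi-clique."""
    return density(adj, nodes) >= gamma

# Hypothetical: 4 nodes with 5 of the 6 possible ties (3-4 missing).
adj = {1: {2, 3, 4}, 2: {1, 3, 4}, 3: {1, 2}, 4: {1, 2}}
print(density(adj, [1, 2, 3, 4]))               # 0.8333333333333334
print(is_quasi_clique(adj, [1, 2, 3, 4], 0.8))  # True
print(is_quasi_clique(adj, [1, 2, 3, 4], 0.9))  # False
```

With γ = 1 the test reduces to the clique condition, which is why quasi-cliques are a relaxation of cliques.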

Page 69

Network-Centric Community Detection

• Network-centric criterion needs to consider the connections within a network globally

• Goal: partition nodes of a network into disjoint sets

• Approaches:
  – (1) Clustering based on vertex similarity
  – (2) Latent space models (multi-dimensional scaling)
  – (3) Block model approximation
  – (4) Spectral clustering
  – (5) Modularity maximization

Page 70

Clustering based on Vertex Similarity

• Apply k-means or similarity-based clustering to nodes

• Vertex similarity is defined in terms of the similarity of their neighborhoods

• Structural equivalence: two nodes are structurally equivalent iff they connect to the same set of actors

• Structural equivalence is too restrictive for practical use.

Nodes 1 and 3 are structurally equivalent; so are nodes 5 and 6.

(1) Clustering based on vertex similarity

Page 71

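Vertex Similarity

• Jaccard Similarity

$$\mathrm{Jaccard}(v_i, v_j) = \frac{|N_i \cap N_j|}{|N_i \cup N_j|}$$

• Cosine similarity

$$\mathrm{Cosine}(v_i, v_j) = \frac{|N_i \cap N_j|}{\sqrt{|N_i|\,|N_j|}}$$

where N_i is the set of neighbors of node v_i.

(1) Clustering based on vertex similarity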
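Both similarity measures compare neighbor sets, so they are one line each once the graph is stored as sets. A minimal Python sketch; the five-node graph is hypothetical, built so that nodes 1 and 3 are structurally equivalent (both tied to exactly {2, 4}).

```python
from math import sqrt

def jaccard(adj, i, j):
    """|N_i ∩ N_j| / |N_i ∪ N_j| over neighbor sets."""
    union = adj[i] | adj[j]
    return len(adj[i] & adj[j]) / len(union) if union else 0.0

def cosine(adj, i, j):
    """|N_i ∩ N_j| / sqrt(|N_i| |N_j|) over neighbor sets."""
    denom = sqrt(len(adj[i]) * len(adj[j]))
    return len(adj[i] & adj[j]) / denom if denom else 0.0

# Hypothetical graph with ties 1-2, 1-4, 2-3, 3-4, 4-5.
adj = {1: {2, 4}, 2: {1, 3}, 3: {2, 4}, 4: {1, 3, 5}, 5: {4}}
print(jaccard(adj, 1, 3), cosine(adj, 1, 3))   # 1.0 1.0
print(round(jaccard(adj, 2, 4), 3))            # 0.667
print(round(cosine(adj, 2, 4), 3))             # 0.816
```

A Jaccard score of 1.0 means the two nodes have identical, nonempty neighbor sets, i.e. they are structurally equivalent; the scores relax that all-or-nothing test into a graded similarity that clustering can use.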

Page 72

Vertex Similarity (2)

Page 73

Clustering based on vertex similarity (K-means)

• Each cluster is associated with a centroid

• Each node is assigned to the cluster with the closest centroid
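The two k-means steps alternate: assign each node to its nearest centroid, then move each centroid to the mean of its members. A minimal Python sketch, assuming each node is represented by its adjacency-matrix row (the slide does not say which vectors are used) and that the starting centroids are supplied explicitly so the result is deterministic; the two-triangle graph is hypothetical.

```python
def kmeans(points, init, iterations=10):
    """Plain k-means: assign points to the nearest centroid, then recompute centroids.

    points: equal-length numeric vectors (here, rows of an adjacency matrix).
    init: starting centroids (copied, not modified).
    """
    centroids = [list(c) for c in init]
    labels = []
    for _ in range(iterations):
        # Assignment step: index of the closest centroid (squared Euclidean distance).
        labels = [min(range(len(centroids)),
                      key=lambda c: sum((x - y) ** 2 for x, y in zip(p, centroids[c])))
                  for p in points]
        # Update step: each centroid moves to the mean of its assigned points.
        for c in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Hypothetical graph: two separate triangles {0, 1, 2} and {3, 4, 5}.
X = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
print(kmeans(X, init=[X[0], X[3]]))  # [0, 0, 0, 1, 1, 1]
```

In practice the starting centroids are usually chosen at random and the run repeated, since k-means only finds a local optimum.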

Page 74

Page 75

Cut

• Most interactions occur within groups, whereas interactions between groups are few

• Community detection → minimum cut problem

• Cut: a partition of the vertices of a graph into two disjoint sets

• Minimum cut problem: find a graph partition such that the number of edges between the two sets is minimized
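A brute-force sketch of the cut size and the minimum cut, feasible only for very small graphs (practical methods such as the Stoer-Wagner algorithm avoid this enumeration). The function names and the graph are hypothetical: two 4-cliques joined by two edges, plus a pendant node 8. The result also illustrates the imbalance problem discussed under Ratio Cut & Normalized Cut, since the minimum cut isolates the singleton {8}.

```python
from itertools import combinations


def cut_size(edges, part):
    """Number of edges with exactly one endpoint inside `part`."""
    return sum((u in part) != (v in part) for u, v in edges)


def brute_force_min_cut(nodes, edges):
    """Enumerate every split into two non-empty sets and keep the smallest cut.

    Exponential in the number of nodes: only usable on toy graphs.
    Returns (cut size, one side of the best split).
    """
    nodes = list(nodes)
    best = None
    # Every split has a side with at most n // 2 nodes, so those sizes suffice.
    for size in range(1, len(nodes) // 2 + 1):
        for subset in combinations(nodes, size):
            part = set(subset)
            c = cut_size(edges, part)
            if best is None or c < best[0]:
                best = (c, part)
    return best


# Hypothetical graph: cliques {0,1,2,3} and {4,5,6,7}, bridges (3,4) and (2,5),
# and a pendant node 8 hanging off node 7.
edges = (list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
         + [(3, 4), (2, 5), (7, 8)])
print(brute_force_min_cut(range(9), edges))  # → (1, {8}): a singleton split
print(cut_size(edges, {0, 1, 2, 3}))         # → 2 for the balanced split
```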


Ratio Cut & Normalized Cut

• Minimum cut often returns an imbalanced partition, with one set being a singleton (e.g., node 9)

• Change the objective function to take community size into account

C_i: a community; |C_i|: number of nodes in C_i; vol(C_i): sum of the degrees of the nodes in C_i
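The formulas themselves did not survive extraction. A sketch assuming the common definitions for a partition π = (C_1, ..., C_k), where cut(C_i, C̄_i) counts the edges leaving C_i: Ratio Cut(π) = (1/k) Σ_i cut(C_i, C̄_i) / |C_i| and Normalized Cut(π) = (1/k) Σ_i cut(C_i, C̄_i) / vol(C_i). Some texts drop the 1/k factor, which does not change the ranking of partitions with the same k. The function name and graph are hypothetical: two 4-cliques {0, 1, 2, 3} and {4, 5, 6, 7} joined by edges (3, 4) and (2, 5), plus a pendant node 8 attached to node 7.

```python
from collections import Counter
from itertools import combinations


def ratio_and_normalized_cut(edges, communities):
    """Return (ratio cut, normalized cut) of a partition of an undirected graph."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    k = len(communities)
    ratio = normalized = 0.0
    for community in communities:
        c = set(community)
        cut = sum((u in c) != (v in c) for u, v in edges)  # cut(C_i, complement of C_i)
        vol = sum(degree[x] for x in c)                      # vol(C_i): sum of degrees
        ratio += cut / len(c)
        normalized += cut / vol
    return ratio / k, normalized / k


# Hypothetical graph: cliques {0,1,2,3} and {4,5,6,7}, bridges (3,4) and (2,5),
# and a pendant node 8 attached to node 7.
edges = (list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2))
         + [(3, 4), (2, 5), (7, 8)])
singleton = [{8}, set(range(8))]
balanced = [{0, 1, 2, 3}, {4, 5, 6, 7, 8}]
print(ratio_and_normalized_cut(edges, singleton))  # ratio 9/16, ncut 15/29 ≈ 0.517
print(ratio_and_normalized_cut(edges, balanced))   # ratio 0.45, ncut 15/112 ≈ 0.134
```

Both scores are lower for the balanced split even though its cut (2 edges) is larger than the singleton's (1 edge).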


Ratio Cut & Normalized Cut Example

For partition in red:

For partition in green:

Both ratio cut and normalized cut prefer a balanced partition


Hierarchy-Centric Community Detection

• Goal: build a hierarchical structure of communities based on network topology

• Allows a network to be analyzed at different resolutions

• Representative approaches:

– Divisive hierarchical clustering (top-down)

– Agglomerative hierarchical clustering (bottom-up)


Divisive Hierarchical Clustering

• Divisive clustering

– Partition nodes into several sets

– Each set is further divided into smaller ones

– A network-centric partitioning method can be applied at each split

• One particular example: recursively remove the “weakest” tie

– Find the edge with the least strength

– Remove the edge and update the strength of each remaining edge

• Repeat the two steps above until the network is decomposed into the desired number of connected components.

• Each component forms a community


Edge Betweenness

• The strength of a tie can be measured by edge betweenness

• Edge betweenness: the number of shortest paths that pass along the edge (if two nodes are joined by several shortest paths, their count is split equally among those paths)

• An edge with high betweenness tends to be a bridge between two communities.

The edge betweenness of e(1, 2) is 4 (= 6/2 + 1): all the shortest paths from 2 to {4, 5, 6, 7, 8, 9} pass through either e(1, 2) or e(2, 3), so e(1, 2) receives half of these 6 pairs, and e(1, 2) is also the shortest path between 1 and 2.
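A minimal sketch of computing edge betweenness for an unweighted, undirected graph with a Brandes-style breadth-first search. The algorithm choice, the dict-of-sets representation, the function name and the example graph (two triangles joined by one bridge, not the slide's figure) are assumptions made here. Each pair's count is split equally among its shortest paths, matching the 6/2 in the example above.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph.

    adj maps node -> set of neighbors. Each unordered pair of nodes is counted
    once; when a pair has several shortest paths, each path gets an equal share.
    """
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s: distances, number of shortest paths (sigma), predecessors
        dist = {s: 0}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Back-propagate dependencies from the farthest nodes toward s
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    # Every unordered pair was counted once from each endpoint, so halve
    return {e: b / 2 for e, b in eb.items()}


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
eb = edge_betweenness(adj)
print(eb[frozenset((2, 3))])  # → 9.0: all 3 x 3 cross pairs use the bridge
print(eb[frozenset((0, 2))])  # → 4.0: the pair (0, 2) plus 0 to each of 3, 4, 5
```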


Divisive clustering based on edge betweenness

Idea: progressively remove the edge with the highest betweenness

Initial betweenness values

After removing e(4, 5), the betweenness of e(4, 6) becomes 20, the highest, so e(4, 6) is removed next.

After removing e(4, 6), the edge e(7, 9) has the highest betweenness value, 4, and should be removed.
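The procedure can be sketched as follows; this is the approach commonly known as the Girvan-Newman algorithm. Assumptions: an unweighted, undirected graph stored as a dict of neighbor sets, betweenness recomputed after every removal, and a stopping rule of k connected components. The function names and the toy graph (two 4-cliques joined by two edges, not the slide's figure) are hypothetical.

```python
from collections import deque
from itertools import combinations


def edge_betweenness(adj):
    """Unweighted edge betweenness via Brandes-style BFS (pairs counted once)."""
    eb = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):  # farthest nodes first
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                eb[frozenset((v, w))] += share
                delta[v] += share
    return {e: b / 2 for e, b in eb.items()}


def connected_components(adj):
    """Connected components as a list of node sets (iterative DFS)."""
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        comp, stack = set(), [start]
        seen.add(start)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def divisive_by_betweenness(adj, k):
    """Remove the highest-betweenness edge until there are at least k components.

    Assumes 1 <= k <= number of nodes.
    """
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    removed = []
    while len(connected_components(adj)) < k:
        eb = edge_betweenness(adj)  # recomputed after every removal
        u, v = tuple(max(eb, key=eb.get))
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append((u, v))
    return connected_components(adj), removed


# Hypothetical graph: cliques {0,1,2,3} and {4,5,6,7} joined by (3,4) and (2,5).
edges = list(combinations(range(4), 2)) + list(combinations(range(4, 8), 2)) + [(3, 4), (2, 5)]
adj = {v: set() for v in range(8)}
for u, v in edges:
    adj[u].add(v)
    adj[v].add(u)
communities, removed = divisive_by_betweenness(adj, k=2)
print(communities)  # → [{0, 1, 2, 3}, {4, 5, 6, 7}]
print(removed)      # the two bridging edges
```

The two bridging edges tie at first, so which one is removed first depends on dict iteration order; the final communities do not.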


Agglomerative Hierarchical Clustering

• Initialize each node as a community

• Merge communities successively into larger communities following a certain criterion

– E.g., based on vertex similarity


Dendrogram according to Agglomerative Clustering based on Modularity
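A minimal sketch of bottom-up merging with vertex similarity as the criterion. It assumes average linkage over the Jaccard similarity of closed neighborhoods (N[v] = N(v) ∪ {v}), which is one choice among many; the dendrogram on the slide is built from modularity instead, which this sketch does not implement. The function names and the graph (two triangles joined by one edge) are hypothetical. The merge history records the dendrogram; running until k = 1 gives the full tree.

```python
from itertools import combinations


def closed_jaccard(adj, u, v):
    """Jaccard similarity of the closed neighborhoods N[u] and N[v]."""
    nu, nv = adj[u] | {u}, adj[v] | {v}
    return len(nu & nv) / len(nu | nv)


def agglomerative(adj, k):
    """Average-linkage agglomerative clustering of nodes by vertex similarity.

    Starts with every node in its own community and repeatedly merges the
    pair of communities with the highest average pairwise similarity until
    k communities remain. Returns (communities, merge history).
    """
    communities = [frozenset([v]) for v in adj]
    history = []
    while len(communities) > k:
        best = None
        for i, j in combinations(range(len(communities)), 2):
            a, b = communities[i], communities[j]
            sim = sum(closed_jaccard(adj, u, v) for u in a for v in b) / (len(a) * len(b))
            if best is None or sim > best[0]:  # ties keep the earlier pair
                best = (sim, i, j)
        sim, i, j = best
        history.append((sorted(communities[i]), sorted(communities[j]), sim))
        merged = communities[i] | communities[j]
        communities = [c for idx, c in enumerate(communities) if idx not in (i, j)]
        communities.append(merged)
    return communities, history


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
communities, history = agglomerative(adj, k=2)
for left, right, sim in history:
    print(left, "+", right, "at similarity", round(sim, 3))
print([sorted(c) for c in communities])  # → [[0, 1, 2], [3, 4, 5]]
```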


Summary of Community Detection

• Node-Centric Community Detection

– cliques, k-cliques, k-clubs

• Group-Centric Community Detection

– quasi-cliques

• Network-Centric Community Detection

– Clustering based on vertex similarity

• Hierarchy-Centric Community Detection

– Divisive clustering

– Agglomerative clustering


COMMUNITY EVALUATION


Evaluating Community Detection (1)

• For groups with clear definitions

– E.g., cliques, k-cliques, k-clubs, quasi-cliques

– Verify whether the extracted communities satisfy the definition

• For networks with ground truth information

– Normalized mutual information

– Accuracy of pairwise community memberships


Measuring a Clustering Result

• The number of communities after grouping can be different from the ground truth

• No clear community correspondence between clustering result and the ground truth

• Normalized Mutual Information can be used

Ground truth: {1, 2, 3}, {4, 5, 6}

Clustering result: {1, 3}, {2}, {4, 5, 6}

How to measure the clustering quality?


Accuracy of Pairwise Community Memberships

• Consider all possible pairs of nodes and check whether they reside in the same community

• An error occurs if

– two nodes belonging to the same community are assigned to different communities after clustering, or

– two nodes belonging to different communities are assigned to the same community

• Construct a contingency table or confusion matrix


Accuracy Example

Clustering result \ Ground truth | C(vi) = C(vj) | C(vi) ≠ C(vj)
C(vi) = C(vj) | 4 | 0
C(vi) ≠ C(vj) | 2 | 9

Ground truth: {1, 2, 3}, {4, 5, 6}

Clustering result: {1, 3}, {2}, {4, 5, 6}

Accuracy = (4+9)/ (4+2+9+0) = 13/15
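The pair counting above can be reproduced with a short sketch. The function name and the label-list input format are assumptions; the lists encode the same ground truth and clustering result, with nodes 1 to 6 in order.

```python
from itertools import combinations


def pairwise_accuracy(truth, pred):
    """Fraction of node pairs on which two labelings agree about 'same community'.

    truth, pred: a community label per node (same node order in both lists).
    Returns (accuracy, counts), where counts is the 2x2 contingency table
    keyed by (same_in_pred, same_in_truth).
    """
    counts = {(True, True): 0, (True, False): 0, (False, True): 0, (False, False): 0}
    for i, j in combinations(range(len(truth)), 2):
        same_truth = truth[i] == truth[j]
        same_pred = pred[i] == pred[j]
        counts[(same_pred, same_truth)] += 1
    agree = counts[(True, True)] + counts[(False, False)]
    return agree / sum(counts.values()), counts


# Ground truth {1,2,3},{4,5,6}; clustering result {1,3},{2},{4,5,6}
truth = [1, 1, 1, 2, 2, 2]
pred = [1, 2, 1, 3, 3, 3]
acc, counts = pairwise_accuracy(truth, pred)
print(counts)  # {(True, True): 4, (True, False): 0, (False, True): 2, (False, False): 9}
print(acc)     # 13/15 ≈ 0.867
```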


Normalized Mutual Information

• Entropy: the information contained in a distribution

• Mutual information: the information shared between two distributions

• Normalized mutual information (NMI) takes values between 0 and 1

• Treating a partition as a distribution (the probability of a node falling into each community), we can measure how well the clustering result matches the ground truth


k^(a), k^(b): the numbers of clusters generated by the partitions π^a and π^b; h and l are the indices of the clusters in π^a and π^b, respectively
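The NMI formulas did not survive extraction. Assuming the square-root normalization, which reproduces the value 0.8278 in the example that follows, with n nodes, n_h^a nodes in cluster h of π^a, n_l^b nodes in cluster l of π^b, and n_{h,l} nodes in both (cells with n_{h,l} = 0 contribute 0):

```latex
\mathrm{NMI}(\pi^a, \pi^b)
  = \frac{\sum_{h=1}^{k^{(a)}} \sum_{l=1}^{k^{(b)}} n_{h,l}\,
          \log\!\left(\frac{n \cdot n_{h,l}}{n_h^{a}\, n_l^{b}}\right)}
         {\sqrt{\left(\sum_{h=1}^{k^{(a)}} n_h^{a} \log\frac{n_h^{a}}{n}\right)
                \left(\sum_{l=1}^{k^{(b)}} n_l^{b} \log\frac{n_l^{b}}{n}\right)}}
  \qquad\text{or, equivalently,}\qquad
  \mathrm{NMI}(X; Y) = \frac{I(X; Y)}{\sqrt{H(X)\,H(Y)}}
```

Here X and Y are the node-to-community distributions induced by π^a and π^b. Both sums in the denominator are non-positive, so their product is non-negative.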


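NMI Example

• Partition a: [1, 1, 1, 2, 2, 2]

• Partition b: [1, 2, 1, 3, 3, 3]

(Partition a is the ground truth {1, 2, 3}, {4, 5, 6}; partition b is the clustering result {1, 3}, {2}, {4, 5, 6}.)

Cluster sizes n_h^a: h=1: 3; h=2: 3

Cluster sizes n_l^b: l=1: 2; l=2: 1; l=3: 3

Contingency table or confusion matrix n_{h,l}:

h \ l | l=1 | l=2 | l=3
h=1 | 2 | 1 | 0
h=2 | 0 | 0 | 3

NMI = 0.8278

Reference: http://www.cse.ust.hk/~weikep/notes/NormalizedMI.m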
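A sketch of the count form of NMI with square-root normalization, which reproduces the example value. The function name and the handling of single-cluster partitions (where an entropy is zero) are assumptions.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information, normalized by sqrt(H(a) * H(b))."""
    n = len(labels_a)
    size_a = Counter(labels_a)                # n_h^a
    size_b = Counter(labels_b)                # n_l^b
    joint = Counter(zip(labels_a, labels_b))  # n_{h,l} (zero cells are simply absent)
    mutual = sum(n_hl * math.log(n * n_hl / (size_a[h] * size_b[l]))
                 for (h, l), n_hl in joint.items())
    h_a = -sum(c * math.log(c / n) for c in size_a.values())
    h_b = -sum(c * math.log(c / n) for c in size_b.values())
    if h_a == 0 or h_b == 0:
        # Assumed convention: two single-cluster partitions match perfectly,
        # otherwise a single-cluster partition carries no shared information.
        return 1.0 if h_a == h_b else 0.0
    return mutual / math.sqrt(h_a * h_b)


print(round(nmi([1, 1, 1, 2, 2, 2], [1, 2, 1, 3, 3, 3]), 4))  # → 0.8278
```

Using raw counts instead of probabilities scales the numerator and denominator by the same factor n, so the ratio is unchanged.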


Evaluation using Semantics

• For networks with semantics

– Networks come with semantic or attribute information about nodes or connections

– Human subjects can verify whether the extracted communities are coherent

• Evaluation is qualitative

• It is also intuitive and helps understand a community

An animal community

A health community


Evaluation without Ground Truth

• For networks without ground truth or semantic information

• This is the most common situation

• An option is to resort to cross-validation

– Extract communities from a (training) network

– Evaluate the quality of the community structure on a network constructed from a different date or based on a related type of interaction

• Quantitative evaluation functions

– Modularity (M. Newman, Modularity and community structure in networks, PNAS 2006)

– Link prediction (the predicted network is compared with the true network)
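Modularity compares the number of edges inside communities with the number expected in a random graph with the same degrees. A minimal sketch, assuming an unweighted, undirected edge list and Newman's definition Q = (1/2m) Σ_ij [A_ij − k_i k_j / (2m)] δ(c_i, c_j), computed in the equivalent per-community form Q = Σ_c [L_c / m − (d_c / 2m)²]. The function name and the graph (two triangles joined by one edge) are hypothetical.

```python
from collections import Counter


def modularity(edges, communities):
    """Newman's modularity Q of a partition of an undirected, unweighted graph.

    Q = sum over communities c of [L_c / m - (d_c / (2m))^2], where L_c is the
    number of edges inside c and d_c is the total degree of the nodes in c.
    """
    m = len(edges)
    label = {v: idx for idx, c in enumerate(communities) for v in c}
    inside = Counter()      # L_c
    degree_sum = Counter()  # d_c
    for u, v in edges:
        degree_sum[label[u]] += 1
        degree_sum[label[v]] += 1
        if label[u] == label[v]:
            inside[label[u]] += 1
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2
               for c in range(len(communities)))


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
print(round(modularity(edges, [{0, 1, 2}, {3, 4, 5}]), 4))  # → 0.3571 (= 5/14)
print(modularity(edges, [set(range(6))]))                    # → 0.0
```

Putting every node into one community always gives Q = 0, so positive values indicate more internal edges than the degree-preserving random baseline predicts.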
