Community structure in complex networks

Community structure in complex networks. V.A. Traag. KITLV, Leiden, the Netherlands; e-Humanities, KNAW (Royal Netherlands Academy of Arts and Sciences), Amsterdam, the Netherlands. February 21, 2014.

description

Overview of my work in community detection. Showing how to address resolution limit and how to assess the significance of a partition.

Transcript of Community structure in complex networks

Page 1: Community structure in complex networks

Community structure in complex networks

V.A. Traag

KITLV, Leiden, the Netherlands

e-Humanities, KNAW, Amsterdam, the Netherlands

February 21, 2014

e-Humanities, Royal Netherlands Academy of Arts and Sciences

Page 2: Community structure in complex networks

Overview

1 What are communities in networks? How do we find them?

2 Where are those small communities?

3 When are communities significant?

4 What should I remember? And what’s next?

Page 3: Community structure in complex networks

Part I: What are communities? How do we find them?

Page 4: Community structure in complex networks

What is a community?

• Everybody has an intuitive idea.

• Yet no single agreed upon definition.

• Common core:

Groups of nodes that are

  • relatively densely connected within, and

  • relatively sparsely connected between.

Page 5: Community structure in complex networks

General community detection

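• Reward links inside community, weight $a_{ij}$.

• Punish missing links inside community, weight $b_{ij}$.

• General quality function

$$H = \sum_{ij} \left( A_{ij} a_{ij} - (1 - A_{ij}) b_{ij} \right) \delta(\sigma_i, \sigma_j).$$

[Figure: example network with 12 nodes, labelled 0 to 11.]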
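A minimal sketch of the general quality function in Python. It assumes an undirected simple graph given as an edge list, a membership list for σ, and weight functions `a(i, j)` and `b(i, j)`; the names `quality`, `a` and `b` are illustrative and do not come from the slides. The sum runs over ordered pairs with i ≠ j; skipping the diagonal is an assumption that only drops a partition-independent constant. The constant weights in the example (reward 1, penalty 0.5) are chosen purely for illustration.

```python
from itertools import permutations

def quality(n, edges, membership, a, b):
    """H = sum_ij (A_ij a_ij - (1 - A_ij) b_ij) delta(sigma_i, sigma_j), i != j."""
    adjacent = set()
    for i, j in edges:
        adjacent.add((i, j))
        adjacent.add((j, i))
    total = 0.0
    for i, j in permutations(range(n), 2):    # ordered pairs, i != j
        if membership[i] != membership[j]:
            continue                          # delta(sigma_i, sigma_j) = 0
        if (i, j) in adjacent:
            total += a(i, j)                  # reward a link inside a community
        else:
            total -= b(i, j)                  # punish a missing link inside a community
    return total

# Hypothetical example: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
reward = lambda i, j: 1.0
penalty = lambda i, j: 0.5

split = quality(6, edges, [0, 0, 0, 1, 1, 1], reward, penalty)
merged = quality(6, edges, [0, 0, 0, 0, 0, 0], reward, penalty)
print(split, merged)
```

With these weights the two-triangle partition scores higher than the single merged community, because the merged community pays for all the missing links between the triangles.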

Page 8: Community structure in complex networks

Different weights

No a priori constraints on weights $a_{ij}$, $b_{ij}$.

Model                        $a_{ij}$        $b_{ij}$
Reichardt & Bornholdt        $1 - b_{ij}$    $\gamma p_{ij}$
Arenas, Fernandez & Gomez    $1 - b_{ij}$    $p_{ij}(\gamma) - \gamma \delta_{ij}$
Ronhovde & Nussinov          $1$             $\gamma$
Constant Potts Model         $1 - \gamma$    $\gamma$
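As a worked check of the weights listed for the models, the following only substitutes two of the rows into the general quality function $H = \sum_{ij} (A_{ij} a_{ij} - (1 - A_{ij}) b_{ij}) \delta(\sigma_i, \sigma_j)$. For Reichardt & Bornholdt (RB) and the Constant Potts Model (CPM) the per-pair term simplifies as:

```latex
\begin{align*}
\text{RB:}\quad  & A_{ij}(1 - \gamma p_{ij}) - (1 - A_{ij})\,\gamma p_{ij} = A_{ij} - \gamma p_{ij}
  &&\Rightarrow\; H = \sum_{ij} (A_{ij} - \gamma p_{ij})\,\delta(\sigma_i, \sigma_j), \\
\text{CPM:}\quad & A_{ij}(1 - \gamma) - (1 - A_{ij})\,\gamma = A_{ij} - \gamma
  &&\Rightarrow\; H = \sum_{ij} (A_{ij} - \gamma)\,\delta(\sigma_i, \sigma_j).
\end{align*}
```

In both cases the cross term $A_{ij}\gamma(\cdot)$ cancels, so each pair inside a community contributes the observed link minus a penalty.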

Page 9: Community structure in complex networks

Modularity

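• Null model $p_{ij}$, with constraint $\sum_{ij} p_{ij} = 2m$.

• Popular null model: the configuration model, $p_{ij} = \frac{k_i k_j}{2m}$.

• With $\gamma = 1$, this leads to modularity:

$$Q = \sum_{ij} \left( A_{ij} - \frac{k_i k_j}{2m} \right) \delta(\sigma_i, \sigma_j).$$

• As a sum over communities:

$$Q = \sum_c \left( e_c - \langle e_c \rangle \right).$$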
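A minimal Python sketch of this (unnormalised) modularity, assuming an unweighted, undirected edge list and a membership list indexed by node; the function name `modularity` is illustrative. It evaluates the double sum per community: with the double sum counting every edge twice, each community contributes 2 e_c − K_c²/2m, where e_c is the number of undirected edges inside community c and K_c its total degree. Under that counting convention the result is twice the community sum with ⟨e_c⟩ = K_c²/4m.

```python
from collections import defaultdict

def modularity(edges, membership, gamma=1.0):
    """Q = sum_ij (A_ij - gamma k_i k_j / 2m) delta(sigma_i, sigma_j), unnormalised."""
    m = len(edges)
    degree = defaultdict(int)
    internal = defaultdict(int)          # e_c: undirected edges inside community c
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
        if membership[i] == membership[j]:
            internal[membership[i]] += 1
    total_degree = defaultdict(int)      # K_c: sum of degrees in community c
    for node, k in degree.items():
        total_degree[membership[node]] += k
    # Per community: sum_ij A_ij = 2 e_c and sum_ij k_i k_j / 2m = K_c^2 / 2m.
    return sum(2 * internal[c] - gamma * K * K / (2 * m)
               for c, K in total_degree.items())

# Hypothetical example: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
q_split = modularity(edges, [0, 0, 0, 1, 1, 1])
q_merged = modularity(edges, [0, 0, 0, 0, 0, 0])
print(q_split, q_merged)
```

Placing all nodes in one community always gives Q = 0 under the configuration model, since the constraint on the null model makes the expected and observed totals equal.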

Page 14: Community structure in complex networks

Optimising modularity

Initial communities

Move 0

Move 5

Move 11

[Figure: example network with 12 nodes, labelled 0 to 11.]

No more improvement

Page 16: Community structure in complex networks

Optimising modularity

Initial communities

Move 0

Move 5

Move 11

[Figure: aggregated graph with numeric node and edge labels.]

Aggregate graph, and repeat same procedure.

Louvain algorithm

1 Move node i to best (greedy) community.

2 Repeat (1) until no more improvement.

3 Contract graph (communities → nodes).

4 Repeat (1)-(3) until no more improvement.
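The four steps above can be sketched in Python as follows. This is a simplified illustration for modularity with the configuration null model and γ = 1, not a reference implementation. The names `from_edges`, `local_moving`, `aggregate` and `louvain` are hypothetical, and the graph representation is an assumption: a symmetric dict of dicts of weights, where the self-loop of an aggregated node stores twice its internal weight.

```python
from collections import defaultdict

def from_edges(edges):
    """Symmetric adjacency dict for an unweighted, undirected edge list."""
    adj = defaultdict(dict)
    for u, v in edges:
        adj[u][v] = 1.0
        adj[v][u] = 1.0
    return dict(adj)

def local_moving(adj):
    """Steps 1-2: greedily move nodes until no move improves modularity."""
    two_m = sum(sum(nbrs.values()) for nbrs in adj.values())
    degree = {u: sum(nbrs.values()) for u, nbrs in adj.items()}
    community = {u: u for u in adj}       # start from singletons
    comm_degree = dict(degree)            # total degree per community
    improved = True
    while improved:
        improved = False
        for u in adj:
            old = community[u]
            comm_degree[old] -= degree[u]          # take u out of its community
            links = defaultdict(float)             # weight from u to each community
            for v, w in adj[u].items():
                if v != u:
                    links[community[v]] += w
            # Gain of joining c, up to a constant: links[c] - k_u K_c / 2m.
            best = old
            best_gain = links[old] - degree[u] * comm_degree[old] / two_m
            for c, w in links.items():
                gain = w - degree[u] * comm_degree[c] / two_m
                if gain > best_gain + 1e-12:
                    best, best_gain = c, gain
            community[u] = best
            comm_degree[best] += degree[u]
            if best != old:
                improved = True
    return community

def aggregate(adj, community):
    """Step 3: contract each community into a single node."""
    new_adj = defaultdict(lambda: defaultdict(float))
    for u, nbrs in adj.items():
        row = new_adj[community[u]]       # keeps communities without any links
        for v, w in nbrs.items():
            row[community[v]] += w
    return {c: dict(nbrs) for c, nbrs in new_adj.items()}

def louvain(adj):
    """Step 4: repeat moving and contraction until nothing changes."""
    membership = {u: u for u in adj}      # original node -> current-level node
    while True:
        community = local_moving(adj)
        if len(set(community.values())) == len(adj):
            return membership             # every node stayed alone: done
        membership = {u: community[c] for u, c in membership.items()}
        adj = aggregate(adj, community)

# Hypothetical example: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
membership = louvain(from_edges(edges))
print(membership)
```

The result of a greedy procedure like this depends on the order in which nodes are visited; here nodes are visited in insertion order, which is a design choice of the sketch.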

Page 17: Community structure in complex networks

Part II: Where are those small communities?

Page 18: Community structure in complex networks

Resolution limit

• Modularity might miss ‘small’ communities.

• Merge two cliques in a ring of cliques when

$$\gamma_{RB} < \frac{q}{n_c(n_c - 1) + 2}.$$

• Number of communities scales as $\sqrt{\gamma_{RB} m}$.

• For a general null model, the problem remains since $\sum_{ij} p_{ij} = 2m$.
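A small numerical check of the merge condition, as a sketch in Python. It assumes that q is the number of cliques in the ring and n_c the number of nodes per clique (the slide does not spell this out), that neighbouring cliques are joined by a single edge, and that the null model is the configuration model. All function names are illustrative.

```python
def ring_of_cliques(q, n_c):
    """Edge list and one-community-per-clique membership (requires q >= 3)."""
    edges, membership = [], []
    for c in range(q):
        nodes = range(c * n_c, (c + 1) * n_c)
        membership.extend([c] * n_c)
        edges.extend((u, v) for u in nodes for v in nodes if u < v)
        # Join the last node of this clique to the first node of the next one.
        edges.append(((c + 1) * n_c - 1, ((c + 1) % q) * n_c))
    return edges, membership

def rb_quality(edges, membership, gamma):
    """sum_ij (A_ij - gamma k_i k_j / 2m) delta(sigma_i, sigma_j)."""
    m = len(edges)
    internal, total_degree = {}, {}
    for u, v in edges:
        for node in (u, v):
            c = membership[node]
            total_degree[c] = total_degree.get(c, 0) + 1
        if membership[u] == membership[v]:
            c = membership[u]
            internal[c] = internal.get(c, 0) + 1
    return sum(2 * internal.get(c, 0) - gamma * K * K / (2 * m)
               for c, K in total_degree.items())

def merging_helps(q, n_c, gamma):
    """True if merging two neighbouring cliques increases the quality."""
    edges, split = ring_of_cliques(q, n_c)
    merged = [0 if c == 1 else c for c in split]   # merge cliques 0 and 1
    return rb_quality(edges, merged, gamma) > rb_quality(edges, split, gamma)

def threshold(q, n_c):
    """Merge threshold q / (n_c (n_c - 1) + 2) from the slide."""
    return q / (n_c * (n_c - 1) + 2)

# Hypothetical example: 30 cliques of 5 nodes each.
print(threshold(30, 5))
print(merging_helps(30, 5, gamma=1.0), merging_helps(30, 5, gamma=1.5))
```

Because the threshold grows with q, adding more cliques to the ring eventually makes merging favourable at any fixed γ, which is the resolution limit in this example.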

Page 20: Community structure in complex networks

Resolution-limit-free

• Ronhovde & Nussinov model ($a_{ij} = 1$, $b_{ij} = \gamma$).

• Claim: resolution-limit-free, as merge depends only on ‘local’ variables

$$\gamma_{RN} < \frac{1}{n_c^2 - 1}.$$

• But, if we take $p_{ij} = k_i k_j$, we obtain

$$\gamma_{RB} < \frac{1}{2(n_c(n_c - 1) + 2)^2},$$

also only ‘local’ variables. Hence, also resolution-limit-free?

• Problems of scale remain.
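A short worked derivation of the Ronhovde & Nussinov bound, as a sketch under stated assumptions: two neighbouring cliques of $n_c$ nodes each, joined by a single link, and the general quality function with $a_{ij} = 1$, $b_{ij} = \gamma_{RN}$. Merging the two cliques brings $n_c^2$ new node pairs into one community, of which one is linked and $n_c^2 - 1$ are not. Counting each pair in both orders, the change in $H$ is:

```latex
\begin{align*}
\Delta H = 2\left[\, 1 - \gamma_{RN}\,(n_c^2 - 1) \,\right] > 0
\quad\Longleftrightarrow\quad
\gamma_{RN} < \frac{1}{n_c^2 - 1}.
\end{align*}
```

The bound involves only the clique size $n_c$, not the number of cliques or the total number of links, which is the sense in which it depends on ‘local’ variables.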

Page 23: Community structure in complex networks

Resolution limit

[Figure: two panels, labelled "Resolution limit" and "Resolution-limit-free".]

Page 25: Community structure in complex networks

Defining resolution-limit-free

Definition (Resolution-limit-free)

Objective function H is called resolution-limit-free if, whenever a partition C is optimal for G, each subpartition D ⊂ C is also optimal for the subgraph H(D) ⊂ G induced by D.

Theorem (Swap optimal subpartitions)

If C is optimal, with subpartition D, we can replace D by another optimal subpartition D′.

What methods are resolution-limit-free?

Page 30: Community structure in complex networks

Resolution-limit-free methods

• RN and CPM can be easily proven resolution-limit-free.

• What about other weights $a_{ij}$ and $b_{ij}$?

Definition (Local weights)

Weights $a_{ij}$, $b_{ij}$ are called local whenever, for a subgraph H ⊂ G, the weights remain similar, i.e. $a_{ij}(G) = \lambda(H) a_{ij}(H)$ and $b_{ij}(G) = \lambda(H) b_{ij}(H)$.

Theorem (Local weights ⇒ resolution-limit-free)

Objective function H is resolution-limit-free if weights are local.

The converse is not true: some small perturbation (i.e. a non-local weight) will not change the optimal partition. But there are very few exceptions.

Local methods are resolution-limit-free.

Page 31: Community structure in complex networks

Part III: When are communities significant?

Page 33: Community structure in complex networks

Modularity in non-modular graphs

Modularity as sign of community structure

• Modularity −1 ≤ Q ≤ 1.

• High modularity ⇒ community structure?

• Modularity higher than 0.3 seen as significant.

Many graphs have high modularity, but no community structure.

Page 34: Community structure in complex networks

Modularity without community structure

Q = 0.31

Modularity $Q \not\approx 0$ for random graphs.

Page 35: Community structure in complex networks

Significance

How significant is a partition?

Page 37: Community structure in complex networks

Significance

[Figure: partitions labelled "E = 14", "E = 9", "Fixed partition", "E = 11" and "Better partition".]

• Not: Probability to find E edges in partition.

• But: Probability to find partition with E edges.

Page 38: Community structure in complex networks

Subgraph probability

Decompose partition

• Probability to find partition with E edges.

• Probability to find communities with $e_c$ edges.

• Asymptotic estimate

• Probability for subgraph of $n_c$ nodes with density $p_c$

$$\Pr(S(n_c, p_c) \subseteq G(n, p)) \approx \exp\left[ -n_c^2 D(p_c \parallel p) \right]$$

Significance

• Probability for all communities $\Pr(\sigma) \approx \prod_c \exp\left[ -n_c^2 D(p_c \parallel p) \right]$.

• Significance $S(\sigma) = -\log \Pr(\sigma) = \sum_c n_c^2 D(p_c \parallel p)$.
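A minimal Python sketch of the significance of a partition, following the formula as it appears on the slide. Several details are assumptions not stated there: D is taken to be the binary Kullback-Leibler divergence in nats, the density of the whole graph is p = m / C(n, 2), the density of a community is p_c = e_c / C(n_c, 2), and single-node communities are skipped because their density is undefined. The names `kl_binary` and `significance` are illustrative.

```python
import math
from collections import Counter

def kl_binary(q, p):
    """Binary Kullback-Leibler divergence D(q || p) in nats, for 0 < p < 1."""
    def term(x, y):
        return 0.0 if x == 0 else x * math.log(x / y)
    return term(q, p) + term(1 - q, 1 - p)

def significance(n, edges, membership):
    """S = sum_c n_c^2 D(p_c || p) for nodes 0..n-1."""
    p = len(edges) / math.comb(n, 2)                 # density of the whole graph
    sizes = Counter(membership)                      # n_c per community
    internal = Counter(membership[u] for u, v in edges
                       if membership[u] == membership[v])   # e_c per community
    total = 0.0
    for c, n_c in sizes.items():
        if n_c < 2:
            continue                                 # assumption: skip single nodes
        p_c = internal[c] / math.comb(n_c, 2)        # density of community c
        total += n_c ** 2 * kl_binary(p_c, p)
    return total

# Hypothetical example: two triangles joined by a single edge.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
s_split = significance(6, edges, [0, 0, 0, 1, 1, 1])
s_merged = significance(6, edges, [0, 0, 0, 0, 0, 0])
print(s_split, s_merged)
```

A single community covering the whole graph has p_c = p, so its divergence and hence its significance is zero; denser-than-average communities contribute positively.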

Page 45: Community structure in complex networks

Significance

[Figure: log-log plot against γ (horizontal axis from 10^-3 to 10^0, vertical axis from 10^3 to 10^6), with curves labelled N, E and S.]

Page 46: Community structure in complex networks

Final Chapter: What should I remember? And what’s next?

Page 47: Community structure in complex networks

Conclusions

To remember

• Modularity can hide small communities.

• Local methods avoid this problem (RN, CPM).

• High modularity $\not\Rightarrow$ significant: use significance.

What’s next?

• Various measures of significance: what’s the difference?

• Choose “correct” resolution ⇒ resolution limit?

Page 48: Community structure in complex networks

Thank you! Questions?

Traag, Van Dooren & Nesterov. Narrow scope for resolution-limit-free community detection. Phys Rev E 84, 016114 (2011).

Traag, Krings & Van Dooren. Significant scales in community structure. Sci Rep 3, 2930 (2013).

Reichardt & Bornholdt. Statistical mechanics of community detection. Phys Rev E 74, 016110 (2006).

www.traag.net, [email protected], @vtraag