Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in...

27
Methods and Applications for Detecting Structure in Complex Networks Elizabeth A. Leicht

Transcript of Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in...

Page 1: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Methods and Applications for Detecting Structure in Complex Networks

Elizabeth A. Leicht

Page 2: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Introduction

http://www-personal.umich.edu/~mejn/networks/attweb.gif

Yeast proteins: Maslov & Sneppen, Science 296, 910-913 (2002).

Web site: M. E. J. Newman & M. Girvan, Physical Review E 69, 026113 (2004).

7th Grade Students: adapted from Moreno, Who Shall Survive, 1934.

Page 3: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Visualizing a Network

Page 4: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Large Networks

Page 5: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Talk Outline

• Introduction to networks and their structure.

• Network structure detection method 1: linear-algebra and optimization techniques.

• Network structure detection method 2: probabilistic mixture models and the expectation-maximization algorithm.

• Conclusions.

Page 6: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Detecting Communities in Directed Networks

- In many networks, the vertices divide naturally into communities, also known as groups or modules.

- Previous methods for detecting communities in networks have mostly focused just on undirected networks.

- One established method for partitioning a network relies on identifying statistically surprising configurations of edges by measuring modularity (Q).

Q =

(fraction of edges within communities)- (expected fraction of such edges)

Page 7: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Calculating Modularity

Rewrite modularity using the adjacency matrix.

Q =1

m

n!

ij=1

[Aij ! Pij ] !ci,cj

How do we determine the “expected” number of edges between two vertices?

Aij =

!

1, if there is an edge from j to i

0, otherwise .

ci = the community to which i belongs.Pij = the expected number of edges from j to i.

Pij =

kini koutj

m

kini

koutj

Page 8: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Division of a Network into Two Communitiessj = !1si = +1

Q =1

2m

n!

ij=1

!

Aij !kini koutj

m

"

(sisj + 1)

!ij =1

2(sisj + 1)

Let Bij = Aij !

kini koutj

mbe an element of the modularity matrix.

Let v(1) be the eigenvectorcorresponding to the mostpositive eigenvalue of thesymmetric matrix B + BT

.

si =

!

+1, if v(1)i

> 0

!1, if v(1)i

< 0

Q =1

2msTBs =

1

2msTB

Ts =

1

4msT

!

B + BT"

s

Page 9: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Two Communities and More

- A network can be divided into more than two communities by repeated bisection.

- We can calculate the additional contribution to the modularity, ∆ Q, from the each additional division and halt the divisions

when no there is no additional increase in the modularity.

Page 10: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Communities with Edge Direction Bias

Page 11: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Communities with Edge Direction Bias

Page 12: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Hubs and Authorities Example

Page 13: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Hubs and Authorities Example

Page 14: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Edge Direction Bias in Real Networks

Minnesota

Indiana

Michigan

Iowa Wisconsin

Michigan State

Northwestern Ohio State

Pennsylvania State

Purdue

Illinois

Page 15: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Edge Direction Bias in Real Networks

Purdue IndianaIowa

Michigan

Wisconsin

Michigan State

Minnesota

Northwestern Ohio State

Pennsylvania State

Illinois

Page 16: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Exploratory Analysis of Structure in Networks

- Previously we identified communities in networks because we specifically sought a method to detect modules in networks.

- Reliance on specific measures of network structure where we are required to know the type of structure, for which we are looking, in advance can be limiting.

- We turn to probabilistic techniques and the Expectation Maximization (EM) Algorithm to identify general patterns of connection between vertices.

Page 17: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

The Method

There are three types of quantities in this method of approach.

1. Observed data: the actual edges falling between pairs of vertices in a network ( ).

2. Missing data: we assume that the vertices divide into c groups. We denote the group to which vertex belongs as and set of all missing data as

3. Model parameters: they describe the patterns of vertices in different groups (θ, π).

A

i

gi g.

!ri =

the probability that there exists an edgefrom a vertex in group r to a vertex i

!r = the probability of a vertex belonging to group r

c!

r=1

!r = 1

n!

i=1

!ri = 1

Page 18: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

The likelihood of the data given the model is

where

Frequently, one works not with the likelihood itself, but with the log-likelihood.

A Likelihood Problem

Pr(A|g,!, ") =!

ij

"Aij

gj,iand Pr(g|!, ") =

!

j

!gj.

Pr(A,g|!, ") = Pr(A|g,!, ")Pr(g|!, "),

L = ln Pr(A,g|!, ") =!

j

!

ln!gj+

"

i

"Aij

gj,i

"

.

Page 19: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

We cannot directly observe g.

It is, however, possible to calculate an expected value for the log-likelihood over all possible values of g.

where

Dealing with the “Missing” Data

L =

c!

r=1

n!

j=1

qrj

!

ln!r +

n!

i=1

Aij ln "ri

"

qjr = Pr(gj = r|A,!, ") =Pr(A,gj = r|!, ")

Pr(A|!, ")=

!r

∏i "

Aij

ri∑

s !s

∏i "

Aij

si

.

L =

c!

g1=1

· · ·

c!

gn=1

Pr(g|A, !,")

n!

j=1

!

ln"gj+

n!

i=1

Aij ln!gj,i

"

Page 20: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

The EM Algorithm

- Initialize model parameters (θ, π) with random values.

- Find the probability a given vertex is a member of group (E-step).

- Maximize the model parameters (M-step)

- Iterate until convergence.

qjr =

!r

!i "

Aij

ri"

s !s

!i "

Aij

si

.

!r =

1

n

!

i

qir !ri =

!j Aijqjr

!j kjqjr

.

rj

Page 21: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

The AddHealth Network

Page 22: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Example: AddHealth

Page 23: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Example: A “Keystone” Network

Page 24: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Example: A “Keystone” Network

Page 25: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

“Big Ten” Results with the EM Algorithm

0 0.5 1

qj1

Vertex shading based on probability of being assigned to group 1.

Vertex size based on probability of being beaten by teams assigned to group 1.

Page 26: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Example: The “Big Ten” Conference

Illinois

Purdue

Indiana

WisconsinIowa

Pennsylvania StateMinnesota

Michigan StateNorthwestern

Ohio StateMichigan

Page 27: Methods and Applications for Detecting Structure in ...€¦ · structure, community structure, in directed networks is presented. - Additionally, a method for detecting more general

Conclusions

- There are many existing methods for detecting structure in complex networks, but this work takes two new and interesting approaches.

- In this dissertation a method for detecting one specific type of structure, community structure, in directed networks is presented.

- Additionally, a method for detecting more general types of structure by turning to probabilistic mixture models and the Expectation-Maximization Algorithm is given.

- These two topics are not the only ones covered in my dissertation.

- The dissertation also introduces method for tackling the specific issue of identifying similar vertices.

- A method for probing the structure of time evolving citation networks was also developed.