Methods and Applications for Detecting Structure in Complex Networks
Elizabeth A. Leicht
Introduction
http://www-personal.umich.edu/~mejn/networks/attweb.gif
Yeast proteins: Maslov & Sneppen, Science 296, 910-913 (2002).
Web site: M. E. J. Newman & M. Girvan, Physical Review E 69, 026113 (2004).
7th Grade Students: adapted from Moreno, Who Shall Survive, 1934.
Visualizing a Network
Large Networks
Talk Outline
• Introduction to networks and their structure.
• Network structure detection method 1: linear-algebra and optimization techniques.
• Network structure detection method 2: probabilistic mixture models and the expectation-maximization algorithm.
• Conclusions.
Detecting Communities in Directed Networks
- In many networks, the vertices divide naturally into communities, also known as groups or modules.
- Previous methods for detecting communities in networks have focused mostly on undirected networks.
- One established method for partitioning a network relies on identifying statistically surprising configurations of edges by measuring modularity (Q).
Q = (fraction of edges within communities) - (expected fraction of such edges)
Calculating Modularity
Rewrite modularity using the adjacency matrix.
$$Q = \frac{1}{m} \sum_{i,j=1}^{n} \left[ A_{ij} - P_{ij} \right] \delta_{c_i, c_j}$$
How do we determine the “expected” number of edges between two vertices?
$$A_{ij} = \begin{cases} 1, & \text{if there is an edge from } j \text{ to } i \\ 0, & \text{otherwise.} \end{cases}$$
$c_i$ = the community to which vertex $i$ belongs.
$P_{ij}$ = the expected number of edges from $j$ to $i$.
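$$P_{ij} = \frac{k_i^{\text{in}} k_j^{\text{out}}}{m}$$
where $k_i^{\text{in}}$ is the in-degree of vertex $i$, $k_j^{\text{out}}$ is the out-degree of vertex $j$, and $m$ is the total number of edges.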
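The modularity definition above can be sketched in a few lines of Python. This is a minimal illustration, not the author's code: the function names `adjacency` and `directed_modularity` and the six-vertex example network (two directed 3-cycles joined by one edge) are assumptions made for the example.

```python
def adjacency(n, edges):
    """Build A with A[i][j] = 1 for an edge from j to i (the convention above)."""
    A = [[0] * n for _ in range(n)]
    for source, target in edges:
        A[target][source] = 1
    return A


def directed_modularity(A, communities):
    """Q = (1/m) * sum_ij [A_ij - k_i^in * k_j^out / m] * delta(c_i, c_j)."""
    n = len(A)
    m = sum(sum(row) for row in A)                               # total number of edges
    k_in = [sum(A[i]) for i in range(n)]                         # row sums: edges arriving at i
    k_out = [sum(A[i][j] for i in range(n)) for j in range(n)]   # column sums: edges leaving j
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:                 # delta(c_i, c_j)
                q += A[i][j] - k_in[i] * k_out[j] / m
    return q / m


# Hypothetical example: two directed 3-cycles joined by the single edge 2 -> 3.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
A = adjacency(6, edges)
Q_split = directed_modularity(A, [0, 0, 0, 1, 1, 1])   # one community per cycle
Q_single = directed_modularity(A, [0] * 6)             # everything in one community
```

For this example the two-cycle split gives Q = 18/49 (about 0.37), while placing all vertices in a single community gives Q = 0, as expected for a division that is no better than the null model.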
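Division of a Network into Two Communities
[Figure: a network divided into two communities, with vertices labelled $s_i = +1$ or $s_j = -1$.]
$$\delta_{c_i, c_j} = \frac{1}{2}(s_i s_j + 1)$$
$$Q = \frac{1}{2m} \sum_{i,j=1}^{n} \left[ A_{ij} - \frac{k_i^{\text{in}} k_j^{\text{out}}}{m} \right] (s_i s_j + 1)$$
Let $B_{ij} = A_{ij} - \frac{k_i^{\text{in}} k_j^{\text{out}}}{m}$ be an element of the modularity matrix.
$$Q = \frac{1}{2m} s^T B s = \frac{1}{2m} s^T B^T s = \frac{1}{4m} s^T \left( B + B^T \right) s$$
Let $v^{(1)}$ be the eigenvector corresponding to the most positive eigenvalue of the symmetric matrix $B + B^T$.
$$s_i = \begin{cases} +1, & \text{if } v^{(1)}_i > 0 \\ -1, & \text{if } v^{(1)}_i < 0 \end{cases}$$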
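The bisection rule above can be sketched as follows. This is an illustrative sketch under stated assumptions, not the method's reference implementation: the eigenvector is found with a shifted power iteration in pure Python (a library eigensolver would normally be used instead), the shift, iteration count, seed, and all function names are choices made for this example, and an exactly zero eigenvector element is assigned to the +1 side by convention because the slide leaves that case undefined.

```python
import random


def modularity_matrix(A):
    """B_ij = A_ij - k_i^in * k_j^out / m, with A[i][j] = 1 for an edge from j to i."""
    n = len(A)
    m = sum(sum(row) for row in A)
    k_in = [sum(A[i]) for i in range(n)]
    k_out = [sum(A[i][j] for i in range(n)) for j in range(n)]
    return [[A[i][j] - k_in[i] * k_out[j] / m for j in range(n)] for i in range(n)]


def leading_eigenvector(M, iterations=1000, seed=0):
    """Eigenvector of the most positive eigenvalue of a symmetric matrix M.

    Power iteration on M + shift*I, where shift is a Gershgorin bound on the
    eigenvalue magnitudes, so the shifted matrix has no negative eigenvalues.
    """
    n = len(M)
    shift = max(sum(abs(x) for x in row) for row in M)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        w = [sum(M[i][j] * v[j] for j in range(n)) + shift * v[i] for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    return v


def spectral_bisection(A):
    """Return s with s_i = +1 or -1 from the signs of the leading eigenvector of B + B^T."""
    B = modularity_matrix(A)
    n = len(B)
    M = [[B[i][j] + B[j][i] for j in range(n)] for i in range(n)]
    v = leading_eigenvector(M)
    return [1 if x >= 0 else -1 for x in v]   # zero elements go to +1 (assumption)


# Hypothetical example: two directed 3-cycles joined by the single edge 2 -> 3.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3), (2, 3)]
A = [[0] * 6 for _ in range(6)]
for source, target in edges:
    A[target][source] = 1
s = spectral_bisection(A)
```

On this small network the signs separate vertices 0, 1, 2 from vertices 3, 4, 5. Which side receives +1 depends on the arbitrary overall sign of the eigenvector.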
Two Communities and More
- A network can be divided into more than two communities by repeated bisection.
- We can calculate the additional contribution to the modularity, ∆Q, from each additional division and halt when no division yields a further increase in the modularity.
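One way to obtain ∆Q follows directly from the definitions above. The derivation below is a sketch: it assumes an existing community $g$ is split in two by labels $s_i = \pm 1$ for $i \in g$, and the symbol $B^{(g)}$ is notation introduced here for illustration.

```latex
\begin{align*}
\Delta Q
  &= \frac{1}{m}\left[ \frac{1}{2}\sum_{i,j \in g} B_{ij}\,(s_i s_j + 1)
     - \sum_{i,j \in g} B_{ij} \right] \\
  &= \frac{1}{2m}\left[ \sum_{i,j \in g} B_{ij}\, s_i s_j
     - \sum_{i,j \in g} B_{ij} \right] \\
  &= \frac{1}{2m}\sum_{i,j \in g}
     \left[ B_{ij} - \delta_{ij} \sum_{k \in g} B_{ik} \right] s_i s_j
     \qquad (\text{using } s_i^2 = 1) \\
  &= \frac{1}{2m}\, s^T B^{(g)} s
   = \frac{1}{4m}\, s^T \left( B^{(g)} + B^{(g)T} \right) s,
  \qquad B^{(g)}_{ij} = B_{ij} - \delta_{ij} \sum_{k \in g} B_{ik}.
\end{align*}
```

When $g$ is the whole network, the sum $\sum_{ij} B_{ij}$ vanishes and the expression reduces to the two-community formula for $Q$. A division of $g$ is kept only if it gives ∆Q > 0.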
Communities with Edge Direction Bias
Hubs and Authorities Example
Edge Direction Bias in Real Networks
[Figure, two panels: network whose vertices are Illinois, Indiana, Iowa, Michigan, Michigan State, Minnesota, Northwestern, Ohio State, Pennsylvania State, Purdue, and Wisconsin.]
Exploratory Analysis of Structure in Networks
- The previous method found communities because it was designed specifically to detect modules in networks.
- Relying on a measure tailored to one type of structure can be limiting, because we must know in advance what type of structure we are looking for.
- We turn to probabilistic techniques and the Expectation Maximization (EM) Algorithm to identify general patterns of connection between vertices.
The Method
The method involves three types of quantities.
1. Observed data: the actual edges falling between pairs of vertices in a network ($A$).
2. Missing data: we assume that the vertices divide into $c$ groups. We denote the group to which vertex $i$ belongs as $g_i$ and the set of all missing data as $g$.
3. Model parameters: they describe the patterns of connection of vertices in different groups (θ, π).
$\theta_{ri}$ = the probability that there exists an edge from a vertex in group $r$ to vertex $i$.
$\pi_r$ = the probability of a vertex belonging to group $r$.
$$\sum_{r=1}^{c} \pi_r = 1, \qquad \sum_{i=1}^{n} \theta_{ri} = 1$$
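A Likelihood Problem
The likelihood of the data given the model is
$$\Pr(A, g \mid \theta, \pi) = \Pr(A \mid g, \theta, \pi) \, \Pr(g \mid \theta, \pi),$$
where
$$\Pr(A \mid g, \theta, \pi) = \prod_{ij} \theta_{g_j, i}^{A_{ij}} \quad \text{and} \quad \Pr(g \mid \theta, \pi) = \prod_{j} \pi_{g_j}.$$
Frequently, one works not with the likelihood itself, but with the log-likelihood.
$$\mathcal{L} = \ln \Pr(A, g \mid \theta, \pi) = \sum_{j} \left[ \ln \pi_{g_j} + \sum_{i} A_{ij} \ln \theta_{g_j, i} \right].$$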
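As a concrete reading of the log-likelihood, the sketch below evaluates it for a known group assignment. The function name and the three-vertex example values are hypothetical, and the code assumes every $\pi_r$ is positive and that $\theta_{ri} > 0$ wherever an edge requires it, since `math.log(0)` raises an error.

```python
import math


def complete_log_likelihood(A, g, pi, theta):
    """L = sum_j [ ln pi[g[j]] + sum_i A[i][j] * ln theta[g[j]][i] ].

    A[i][j] = 1 for an edge from j to i; g[j] is the group of vertex j.
    """
    n = len(A)
    total = 0.0
    for j in range(n):
        r = g[j]
        total += math.log(pi[r])
        for i in range(n):
            if A[i][j]:                       # only existing edges contribute
                total += A[i][j] * math.log(theta[r][i])
    return total


# Hypothetical example: edges 0 -> 2 and 1 -> 2, with vertices 0 and 1 in group 0.
A = [[0, 0, 0],
     [0, 0, 0],
     [1, 1, 0]]
g = [0, 0, 1]
pi = [0.5, 0.5]
theta = [[0.25, 0.25, 0.5],       # group 0: each row sums to 1
         [1 / 3, 1 / 3, 1 / 3]]   # group 1
L = complete_log_likelihood(A, g, pi, theta)
```

In this example each vertex contributes ln 0.5 through $\pi$, and each of the two edges contributes ln 0.5 through $\theta_{0,2}$, so the total is 5 ln 0.5.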
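Dealing with the “Missing” Data
We cannot directly observe $g$.
It is, however, possible to calculate an expected value for the log-likelihood over all possible values of $g$.
$$\overline{\mathcal{L}} = \sum_{g_1=1}^{c} \cdots \sum_{g_n=1}^{c} \Pr(g \mid A, \theta, \pi) \sum_{j=1}^{n} \left[ \ln \pi_{g_j} + \sum_{i=1}^{n} A_{ij} \ln \theta_{g_j, i} \right] = \sum_{r=1}^{c} \sum_{j=1}^{n} q_{jr} \left[ \ln \pi_r + \sum_{i=1}^{n} A_{ij} \ln \theta_{ri} \right],$$
where
$$q_{jr} = \Pr(g_j = r \mid A, \theta, \pi) = \frac{\Pr(A, g_j = r \mid \theta, \pi)}{\Pr(A \mid \theta, \pi)} = \frac{\pi_r \prod_i \theta_{ri}^{A_{ij}}}{\sum_s \pi_s \prod_i \theta_{si}^{A_{ij}}}.$$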
The EM Algorithm
- Initialize model parameters (θ, π) with random values.
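- Find the probability $q_{jr}$ that a given vertex $j$ is a member of group $r$ (E-step).
- Update the model parameters to maximize the expected log-likelihood (M-step).
- Iterate until convergence.
E-step:
$$q_{jr} = \frac{\pi_r \prod_i \theta_{ri}^{A_{ij}}}{\sum_s \pi_s \prod_i \theta_{si}^{A_{ij}}}$$
M-step:
$$\pi_r = \frac{1}{n} \sum_i q_{ir}, \qquad \theta_{ri} = \frac{\sum_j A_{ij} q_{jr}}{\sum_j k_j q_{jr}},$$
where $k_j = \sum_i A_{ij}$ is the out-degree of vertex $j$.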
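The four steps above can be sketched in pure Python as follows. This is a minimal sketch, not the author's implementation: the function name `em_mixture`, the fixed iteration count used in place of a convergence test, the random seed, the log-space E-step, and the small floor that guards against ln 0 are all assumptions made for the example.

```python
import math
import random


def em_mixture(A, c, iterations=200, seed=0):
    """EM sketch for the mixture model; A[i][j] = 1 for an edge from j to i."""
    n = len(A)
    rng = random.Random(seed)
    k_out = [sum(A[i][j] for i in range(n)) for j in range(n)]   # k_j

    def normalized(values):
        total = sum(values)
        return [v / total for v in values]

    # Initialize the model parameters with random values that satisfy
    # sum_r pi_r = 1 and sum_i theta_ri = 1.
    pi = normalized([rng.random() + 0.5 for _ in range(c)])
    theta = [normalized([rng.random() + 0.5 for _ in range(n)]) for _ in range(c)]
    floor = 1e-300   # assumption: avoids ln(0) once a parameter reaches zero
    q = []
    for _ in range(iterations):
        # E-step: q[j][r] is proportional to pi_r * prod_i theta_ri^A_ij,
        # computed in log space and normalized over r.
        q = []
        for j in range(n):
            log_w = [
                math.log(max(pi[r], floor))
                + sum(A[i][j] * math.log(max(theta[r][i], floor)) for i in range(n))
                for r in range(c)
            ]
            top = max(log_w)
            q.append(normalized([math.exp(x - top) for x in log_w]))
        # M-step: re-estimate pi and theta from the current q.
        pi = [sum(q[j][r] for j in range(n)) / n for r in range(c)]
        for r in range(c):
            denom = sum(k_out[j] * q[j][r] for j in range(n))
            if denom > 1e-12:   # keep the old row if group r is effectively empty
                theta[r] = [
                    sum(A[i][j] * q[j][r] for j in range(n)) / denom
                    for i in range(n)
                ]
    return q, pi, theta


# Hypothetical example: vertices 0-2 point to 3 and 4; vertices 3-5 point to 0 and 1.
edges = [(0, 3), (0, 4), (1, 3), (1, 4), (2, 3), (2, 4),
         (3, 0), (3, 1), (4, 0), (4, 1), (5, 0), (5, 1)]
A = [[0] * 6 for _ in range(6)]
for source, target in edges:
    A[target][source] = 1
q, pi, theta = em_mixture(A, c=2)
```

Here `q[j][r]` is the estimated probability that vertex `j` belongs to group `r`. Because EM converges to a local maximum that depends on the starting values, running it from several seeds and keeping the result with the highest expected log-likelihood is a common precaution.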
The AddHealth Network
Example: AddHealth
Example: A “Keystone” Network
“Big Ten” Results with the EM Algorithm
[Figure legend: shading scale for $q_{j1}$, from 0 to 1.]
Vertex shading based on probability of being assigned to group 1.
Vertex size based on probability of being beaten by teams assigned to group 1.
Example: The “Big Ten” Conference
[Figure: network of the teams Illinois, Indiana, Iowa, Michigan, Michigan State, Minnesota, Northwestern, Ohio State, Pennsylvania State, Purdue, and Wisconsin.]
Conclusions
- There are many existing methods for detecting structure in complex networks, but this work takes two new and interesting approaches.
- In this dissertation a method for detecting one specific type of structure, community structure, in directed networks is presented.
- Additionally, a method for detecting more general types of structure by turning to probabilistic mixture models and the Expectation-Maximization Algorithm is given.
- These two topics are not the only ones covered in my dissertation.
- The dissertation also introduces a method for tackling the specific issue of identifying similar vertices.
- A method for probing the structure of time evolving citation networks was also developed.