7/30/2019 Fien Berg
Some Aspects of Graphical
Models for Network Data
Stephen E. Fienberg
Department of Statistics, Living Analytics Research Centre,
Heinz College, Machine Learning Department, and Cylab
Carnegie Mellon University
(Based on joint work with Sonja Petrovic, Alessandro Rinaldo,
and Xiaolin Yang)
Algebraic Statistics in the Alleghenies at The Pennsylvania State University
June 11, 2012
Graphs as Metaphors
Representing statistical structures in terms of graphs, $G = \{V, E\}$, is a useful metaphor that allows us to exploit the mathematical language of graph theory and some relatively simple results.
Graphs often provide powerful representations for the interpretation of models.
Vertices and edges have different meanings in different statistical settings.
Graphical Representations of Statistical Models
            Variables   Individuals
Directed        a            b
Undirected      c            d
(a) HMMs, state-space models, Bayes nets, causal models (DAGs), recursive partitioning models
(b) social networks, trees, citation and email networks
(c) covariance selection models, log-linear models, multivariate time-series models
(d) relational networks, co-authorship networks
Note that a and c refer to probability models, while b and d are
used to describe observed data.
(a) HMMs, State-Space Models
(a) Causal Models, DAGs
CHILD network (blue babies) (Cowell et al., 1999)
(b) Social Networks
AIDS blog network (Kolaczyk, 2009)
(c) Log-linear Models
Prognostic factors for coronary heart disease for Czech autoworkers: a $2^6$ table (Edwards and Havránek, 1985)
(d) Relational Networks
Zachary's karate club network (Zachary, 1977; Kolaczyk, 2009)
(d) Yeast Protein-Protein Interaction
Airoldi et al. (2008)
Graphical Models for Variables
The following Markov conditions are equivalent (for distributions with a positive density):
Pairwise Markov Property: For all nonadjacent pairs of vertices, $i$ and $j$, $i \perp\!\!\!\perp j \mid K \setminus \{i, j\}$.
Global Markov Property: For all triples of disjoint subsets of $K$, whenever $a$ and $b$ are separated by $c$ in the graph, $a \perp\!\!\!\perp b \mid c$.
Local Markov Property: For every vertex $i$, if $c$ is the boundary set of $i$, i.e., $c = \mathrm{bd}(i)$, and $b = K \setminus (\{i\} \cup c)$, then $i \perp\!\!\!\perp b \mid c$.
All discrete graphical models are log-linear.
The Gaussian graphical model selection problem involves estimating the zero pattern of the inverse covariance, or concentration, matrix.
For DAGs, we continue to use Markov properties but also exploit the partial ordering of the variables.
We always assume that individuals or units are independent r.v.s.
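The following NumPy sketch makes the Gaussian graphical model point concrete. The three-variable chain and its coefficients (0.8 and 0.5) are hypothetical values chosen for illustration and are not data from the talk.

The sketch proceeds in three steps:
- It builds the covariance from a linear structural equation model.
- It inverts that covariance to obtain the concentration matrix.
- It reads off the zero pattern.

The single zero off-diagonal entry marks the missing edge, that is, the conditional independence of X1 and X3 given X2. In practice the concentration matrix has to be estimated from data, and this sketch does not attempt that.

```python
import numpy as np

# Hypothetical Gaussian chain X1 -> X2 -> X3 written as a linear SEM:
# X1 = e1, X2 = 0.8 * X1 + e2, X3 = 0.5 * X2 + e3, with independent N(0, 1) errors.
B = np.array([[0.0, 0.0, 0.0],
              [0.8, 0.0, 0.0],
              [0.0, 0.5, 0.0]])
A = np.linalg.inv(np.eye(3) - B)   # X = A e
Sigma = A @ A.T                    # covariance matrix of X
K = np.linalg.inv(Sigma)           # concentration (inverse covariance) matrix


def zero_pattern(K, tol=1e-10):
    """Off-diagonal pairs (i, j), i < j, with K[i, j] ~ 0: i and j are independent given the rest."""
    p = K.shape[0]
    return [(i, j) for i in range(p) for j in range(i + 1, p) if abs(K[i, j]) < tol]


print(np.round(K, 3))
print(zero_pattern(K))  # [(0, 2)]: X1 and X3 are conditionally independent given X2
```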
Models for Individuals/Units in Networks
The graph describes the observed adjacency matrix.
We usually use 1 for the presence of an edge and 0 for its absence; we can also have weights in place of 1s.
Except for the Erdos-Renyi-Gilbert model, where the occurrences of edges are iid Bernoulli r.v.s, units are dependent.
The simplest generalization of the Erdos-Renyi-Gilbert model assumes that dyads are independent, e.g., the $p_1$ model of Holland and Leinhardt, which has additional parameters for reciprocation in directed networks.
Exponential Random Graph Models (ERGMs) that include star and triangle motifs no longer have dyadic independence.
We can have multiple relationships measured on the same individuals/units.
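Under dyadic independence, each of the $\binom{n}{2}$ dyads of a directed network is in one of four states: $(0,0)$, $(1,0)$, $(0,1)$, or $(1,1)$. The dyad census sketched below in Python offers a quick way to see how much reciprocation a directed 0-1 adjacency matrix contains. The 4-node matrix is a hypothetical example, and `dyad_census` is our own label, not a function from the talk.

```python
import numpy as np


def dyad_census(X):
    """Count mutual, asymmetric, and null dyads in a directed 0-1 adjacency matrix X."""
    X = np.asarray(X)
    n = X.shape[0]
    mutual = asym = null = 0
    for i in range(n):
        for j in range(i + 1, n):      # each unordered pair {i, j} is one dyad
            s = X[i, j] + X[j, i]
            if s == 2:
                mutual += 1            # state (1, 1)
            elif s == 1:
                asym += 1              # state (1, 0) or (0, 1)
            else:
                null += 1              # state (0, 0)
    return {"mutual": mutual, "asymmetric": asym, "null": null}


# Hypothetical 4-node directed network: X[i, j] = 1 if there is an edge i -> j.
X = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 0, 0]])
print(dyad_census(X))  # {'mutual': 1, 'asymmetric': 3, 'null': 2}
```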
Two Kinds of Network Models
Models directly amenable to algebraic statistics approaches, because both the model and the likelihood function involve polynomials, e.g., Erdos-Renyi-Gilbert and ERGMs.
These models have minimal sufficient statistics (MSSs) that provide a lower-dimensional representation of the data for formal inference, e.g., marginal totals or sums of counts.
Bayesian hierarchical models for which the data are the MSSs, e.g., mixed-membership stochastic block models, random Rasch models, etc.
These models gain their statistical attractiveness by integrating over all parameters in lower levels of the hierarchy and thus focusing on a reduced number of parameters, but they involve the data in complex ways.
Goldenberg et al. (2010) review both types of models.
This talk focuses on the former kind of model, but the discussion does have implications for some Bayesian hierarchical model settings.
Erdos-Renyi-Gilbert Model
In the $G(n, M)$ model, we choose a graph uniformly at random from the collection of all graphs with $n$ nodes and $M$ edges; the degree of a node then has a hypergeometric distribution.
In the $G(n, p)$ model, we connect each pair of nodes independently, with constant probability $p$. Now $M$ is random and has a binomial distribution:
$$P(M = m) = \binom{\binom{n}{2}}{m} p^{m} (1-p)^{\binom{n}{2} - m}.$$
The model has an interesting probability structure, especially as $n$ and $M$ get large, but there is not much of interest statistically for a fixed $n$ or $M$, since we are basically in a simple distributional setting.
This changes when we let p vary depending on the nodes
it connects, and especially when we allow edges to be
directed and dependent.
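The two Erdos-Renyi-Gilbert variants can be checked numerically with a short Python sketch that uses only the standard library. The function names and the choices $n = 7$, $p = 0.3$, and $M = 10$ are illustrative assumptions.

- `edge_count_pmf` evaluates the binomial distribution of $M$ under $G(n, p)$.
- `degree_pmf_gnm` evaluates the hypergeometric degree distribution of a fixed node under $G(n, M)$.
- `sample_gnp` draws one $G(n, p)$ graph.

```python
import math
import random


def edge_count_pmf(n, p):
    """G(n, p): M ~ Binomial(C(n, 2), p). Returns [P(M = m) for m = 0..C(n, 2)]."""
    N = math.comb(n, 2)
    return [math.comb(N, m) * p**m * (1 - p) ** (N - m) for m in range(N + 1)]


def degree_pmf_gnm(n, M):
    """G(n, M): M edges are drawn without replacement from N = C(n, 2) slots,
    n - 1 of which touch a fixed node, so that node's degree is hypergeometric."""
    N = math.comb(n, 2)
    K = n - 1
    return [math.comb(K, d) * math.comb(N - K, M - d) / math.comb(N, M)
            for d in range(min(K, M) + 1)]


def sample_gnp(n, p, rng):
    """Edge list of one G(n, p) draw: each pair is joined independently with probability p."""
    return [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]


rng = random.Random(0)
edges = sample_gnp(7, 0.3, rng)
pmf = edge_count_pmf(7, 0.3)
print(len(edges), sum(m * q for m, q in enumerate(pmf)))  # observed M vs. E[M] = 21 * 0.3
```

Under $G(n, p)$, the expected number of edges is $\binom{n}{2}p$, which is 6.3 here. Under $G(n, M)$, the expected degree is $2M/n$, which is 20/7 here.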
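Holland and Leinhardt $p_1$ Model
$n$ nodes, random occurrence of directed edges.
Describe the probabilities of the four possible states of the dyad between nodes $i$ and $j$:
$$\begin{aligned}
\log P_{ij}(0,0) &= \lambda_{ij}\\
\log P_{ij}(1,0) &= \lambda_{ij} + \alpha_i + \beta_j + \theta\\
\log P_{ij}(0,1) &= \lambda_{ij} + \alpha_j + \beta_i + \theta\\
\log P_{ij}(1,1) &= \lambda_{ij} + \alpha_i + \beta_j + \alpha_j + \beta_i + 2\theta + \rho_{ij}
\end{aligned}$$
3 common forms:
$\rho_{ij} = 0$ (no reciprocal effect)
$\rho_{ij} = \rho$ (constant reciprocation factor)
$\rho_{ij} = \rho + \rho_i + \rho_j$ (edge-dependent reciprocation)
When edges are undirected, $p_1$ reduces to the beta model.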
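A small Python sketch of the $p_1$ dyad model shows the role of each parameter. The parameter values are hypothetical. $\lambda_{ij}$ is not passed in, because requiring the four probabilities to sum to one determines it.

The last line checks a consequence of the four dyad equations. In the log odds ratio $\log[P_{ij}(1,1)P_{ij}(0,0)/(P_{ij}(1,0)P_{ij}(0,1))]$, all the $\lambda$, $\alpha$, $\beta$, and $\theta$ terms cancel, leaving exactly $\rho_{ij}$.

```python
import math


def p1_dyad_probs(alpha_i, beta_i, alpha_j, beta_j, theta, rho_ij):
    """Probabilities of the dyad states (x_ij, x_ji) under p1.
    lambda_ij is fixed by normalization, so it is not an argument."""
    logits = {
        (0, 0): 0.0,
        (1, 0): alpha_i + beta_j + theta,
        (0, 1): alpha_j + beta_i + theta,
        (1, 1): alpha_i + beta_j + alpha_j + beta_i + 2 * theta + rho_ij,
    }
    log_norm = math.log(sum(math.exp(v) for v in logits.values()))  # equals -lambda_ij
    return {state: math.exp(v - log_norm) for state, v in logits.items()}


# Hypothetical parameter values, chosen only for illustration.
P = p1_dyad_probs(alpha_i=0.5, beta_i=-0.2, alpha_j=0.1, beta_j=0.3, theta=-1.0, rho_ij=1.5)
log_odds_ratio = math.log(P[(1, 1)] * P[(0, 0)] / (P[(1, 0)] * P[(0, 1)]))
print(P, log_odds_ratio)  # the dyad log odds ratio recovers rho_ij = 1.5
```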
Estimation for $p_1$
The likelihood function for the $p_1$ model is clearly of exponential family form.
For the constant reciprocation version, we have
$$\log p_1(x) = x_{++}\theta + \sum_i x_{i+}\alpha_i + \sum_j x_{+j}\beta_j + \rho\sum_{i<j} x_{ij}x_{ji} + \sum_{i<j}\lambda_{ij} \qquad (1)$$
We get MLEs using iterative proportional fitting (IPF); the method scales.
Holland and Leinhardt explored the goodness of fit of the model empirically by comparing $\rho_{ij} = 0$ vs. $\rho_{ij} = \rho$.
Standard asymptotics, e.g., $\chi^2$ tests, are not applicable, because the number of parameters increases with the number of nodes.
Fienberg and Wasserman (1981) use the edge-dependent reciprocation model to test $\rho_{ij} = \rho$.
Algebraic Statistics: Petrovic et al. (2010) provide Markov bases; Rinaldo et al. (2011) characterize MLE existence.
Direct link to the Bradley-Terry model when $\rho = 0$.
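The sufficient statistics of the constant-reciprocation $p_1$ model are easy to compute from an adjacency matrix. These are the total edge count, the out-degrees, the in-degrees, and the number of mutual dyads. The Python sketch below is illustrative only: the 4-node network and the parameter values are hypothetical. `p1_log_kernel` returns only the terms that involve the data, so it equals $\log p_1(x)$ up to an additive constant that depends on the parameters.

```python
import numpy as np


def p1_sufficient_stats(X):
    """Sufficient statistics of the constant-reciprocation p1 model."""
    X = np.asarray(X)
    total = int(X.sum())                        # x_{++}, multiplies theta
    out_deg = X.sum(axis=1)                     # x_{i+}, multiply alpha_i
    in_deg = X.sum(axis=0)                      # x_{+j}, multiply beta_j
    mutual = int(np.triu(X * X.T, k=1).sum())   # sum_{i<j} x_ij x_ji, multiplies rho
    return total, out_deg, in_deg, mutual


def p1_log_kernel(X, theta, alpha, beta, rho):
    """Data-dependent part of log p1(x); the lambda_ij terms are omitted."""
    total, out_deg, in_deg, mutual = p1_sufficient_stats(X)
    return theta * total + out_deg @ alpha + in_deg @ beta + rho * mutual


# Hypothetical 4-node directed network.
X = np.array([[0, 1, 1, 0],
              [1, 0, 0, 0],
              [0, 1, 0, 1],
              [0, 0, 0, 0]])
print(p1_sufficient_stats(X))
print(p1_log_kernel(X, theta=-1.0, alpha=np.zeros(4), beta=np.zeros(4), rho=0.5))  # -4.5
```

Any two networks with the same out-degrees, in-degrees, and number of mutual dyads receive the same probability under this model. Markov bases, such as those of Petrovic et al. (2010), are the algebraic tool for moving between such networks.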
Exponential Random Graph Models (ERGMs)
Let $X$ be an $n \times n$ adjacency matrix, or a 0-1 vector of length $\binom{n}{2}$, or a point in $\{0,1\}^{\binom{n}{2}}$.
Identify a set of network statistics
$$t = (t_1(X), \ldots, t_k(X)) \in \mathbb{R}^k$$
and construct a distribution such that $t$ is a vector of sufficient statistics.
This leads to an exponential family model of the form
$$P_\theta(X = x) = h(x)\exp\{\theta^\top t(x) - \psi(\theta)\},$$
where
$\theta \in \Theta \subseteq \mathbb{R}^k$, and $\Theta$ is the natural parameter space;
$\psi(\theta)$ is a normalizing constant (often intractable);
$h(\cdot)$ depends on $x$ only.
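A brute-force Python sketch for the edges-and-triangles model shows why $\psi(\theta)$ is usually intractable. It assumes $h(x) = 1$, which is a common choice but is an assumption here, and the function names are our own.

The sketch enumerates all $2^{\binom{n}{2}}$ graphs. That is feasible for $n = 4$ (64 graphs), but the count grows very quickly: $n = 7$ already gives 2,097,152 graphs, and $n = 30$ gives $2^{435}$.

```python
import itertools
import math


def all_graphs(n):
    """Yield every undirected graph on n nodes as a set of edges (2^C(n,2) graphs)."""
    pairs = list(itertools.combinations(range(n), 2))
    for bits in itertools.product([0, 1], repeat=len(pairs)):
        yield {p for p, b in zip(pairs, bits) if b}


def stats(edges, n):
    """t(x) = (number of edges, number of triangles)."""
    tri = sum(1 for a, b, c in itertools.combinations(range(n), 3)
              if {(a, b), (a, c), (b, c)} <= edges)
    return len(edges), tri


def ergm_log_normalizer(theta, n):
    """psi(theta) = log sum_x exp(theta . t(x)), assuming h(x) = 1, by full enumeration."""
    return math.log(sum(math.exp(theta[0] * e + theta[1] * t)
                        for e, t in (stats(g, n) for g in all_graphs(n))))


def ergm_prob(edges, theta, n):
    """P_theta(X = x) for one graph, given as a set of (i, j) pairs with i < j."""
    e, t = stats(edges, n)
    return math.exp(theta[0] * e + theta[1] * t - ergm_log_normalizer(theta, n))


print(ergm_prob({(0, 1), (0, 2), (1, 2)}, (-0.5, 0.3), 4))  # probability of one triangle
```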
Likelihood Methods for ERGMs
Likelihood methods are more complex than the exponential family structure might suggest.
Pseudo-estimation uses independent logistic regressions, one per node.
We can get MLEs via MCMC.
Problem of degeneracy or near degeneracy:
MLEs don't exist, i.e., the likelihood is maximized on the boundary.
The likelihood function is not well behaved, and most observable configurations are near the boundary.
ERGMs: 7-node Example I
Set of all graphs on 7 nodes when the sufficient statistics are the number of edges and the number of triangles.
There are $2^{21} = 2{,}097{,}152$ possible graphs!
There are only 110 different configurations for the 2-dimensional sufficient statistics.
Note: many points lie on the boundary (including the empty and complete graphs).
[Figure: Support of the edge and triangle statistics. x-axis: number of edges; y-axis: number of triangles.]
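In principle, the support plot can be reproduced by enumerating every graph. The NumPy sketch below encodes each graph as a bitmask over the $\binom{n}{2}$ possible edges. It then counts edges and triangles with vectorized bit operations. The function name is our own.

```python
import itertools
import numpy as np


def edge_triangle_support(n):
    """Distinct (edges, triangles) values over all 2^C(n,2) graphs on n nodes."""
    pairs = list(itertools.combinations(range(n), 2))
    index = {p: k for k, p in enumerate(pairs)}
    g = np.arange(2 ** len(pairs), dtype=np.int64)  # bit k of g set <=> edge pairs[k] present
    edges = np.zeros_like(g)
    for k in range(len(pairs)):
        edges += (g >> k) & 1
    triangles = np.zeros_like(g)
    for a, b, c in itertools.combinations(range(n), 3):
        ab, ac, bc = index[(a, b)], index[(a, c)], index[(b, c)]
        triangles += (g >> ab) & (g >> ac) & (g >> bc) & 1  # 1 iff all three edges present
    points = np.unique(np.stack([edges, triangles], axis=1), axis=0)
    return {(int(e), int(t)) for e, t in points}


print(sorted(edge_triangle_support(4)))
```

For $n = 4$, the function returns 9 points, from $(0, 0)$ for the empty graph to $(6, 4)$ for the complete graph. Calling `edge_triangle_support(7)` enumerates all 2,097,152 graphs, with each int64 array taking about 16 MB. The size of its result can be compared with the 110 configurations reported above.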
Inference for Models for Individuals/Units in Networks
The relevant asymptotics are in the number of nodes, $n$.
When there are node-specific parameters, asymptotics are far more complex, e.g., see Chatterjee and Diaconis (2011).
Maximum likelihood approaches are available for ERGMs.
For blockmodels with constant structure within blocks, there is asymptotic theory.
There is a related literature on community formation and modularity, e.g., Bickel and Chen (2009).
Pseudo-likelihood for ERGMs
Frank and Strauss (1986) and Strauss and Ikeda (1990), following ideas of Besag and work on Markov random field models, considered the conditional probability $P(X_{ij} = 1 \mid X^c_{ij})$, where $X^c_{ij}$ is the graph after removing edge $(i,j)$:
$$P(X_{ij} = 1 \mid X^c_{ij}) = \frac{\exp[\theta^\top (T(X^+_{ij}) - T(X^-_{ij}))]}{1 + \exp[\theta^\top (T(X^+_{ij}) - T(X^-_{ij}))]} = \frac{\exp[\theta^\top \delta(X^c_{ij})]}{1 + \exp[\theta^\top \delta(X^c_{ij})]},$$
where $X^+_{ij}$ and $X^-_{ij}$ represent the graphs with $X_{ij}$ set to 1 or 0, $X^c_{ij}$ denotes the remainder of the network, and $\delta(x^c_{ij})$ is the change in the sufficient statistics when $x_{ij}$ changes from 0 to 1.
This has the form of a logistic regression model.
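For the edges-and-triangles model, the change statistic has a simple closed form. Adding edge $(i,j)$ adds one edge, plus one triangle for every common neighbor of $i$ and $j$. So $\delta(x^c_{ij}) = (1, \#\{k : x_{ik} = x_{jk} = 1\})$.

The Python sketch below computes $\delta$ by literally toggling the edge and recomputing $T$, which mirrors the definition. The choice of model, the 5-node graph, and the function names are assumptions for illustration.

```python
import itertools
import math


def edge_triangle_stats(adj):
    """T(x) = (edges, triangles) for a symmetric 0-1 adjacency matrix (list of lists)."""
    n = len(adj)
    e = sum(adj[i][j] for i, j in itertools.combinations(range(n), 2))
    t = sum(adj[a][b] * adj[a][c] * adj[b][c]
            for a, b, c in itertools.combinations(range(n), 3))
    return e, t


def delta(adj, i, j):
    """Change statistic T(x_ij^+) - T(x_ij^-), computed by toggling edge (i, j)."""
    plus = [row[:] for row in adj]
    plus[i][j] = plus[j][i] = 1
    minus = [row[:] for row in adj]
    minus[i][j] = minus[j][i] = 0
    tp, tm = edge_triangle_stats(plus), edge_triangle_stats(minus)
    return tuple(a - b for a, b in zip(tp, tm))


def cond_prob_edge(adj, i, j, theta):
    """P(X_ij = 1 | rest) = logistic(theta . delta)."""
    eta = sum(th * d for th, d in zip(theta, delta(adj, i, j)))
    return 1.0 / (1.0 + math.exp(-eta))


# Hypothetical 5-node graph with edges 0-1, 0-2, 1-2, 2-3 (node 4 isolated).
adj = [[0, 1, 1, 0, 0],
       [1, 0, 1, 0, 0],
       [1, 1, 0, 1, 0],
       [0, 0, 1, 0, 0],
       [0, 0, 0, 0, 0]]
print(delta(adj, 0, 3))  # (1, 1): node 2 is the only common neighbour of 0 and 3
```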
Maximum Pseudo-likelihood Estimation for ERGMs
Pseudo-likelihood treats the logistic regression components as if they were independent and sums over all edges:
$$l_P(\theta; x) = \sum_{ij} \theta^\top \delta(x^c_{ij})\, x_{ij} - \sum_{ij} \log\left(1 + \exp(\theta^\top \delta(x^c_{ij}))\right) \qquad (2)$$
Simple Markov basis structure for pseudo-likelihood: Fienberg, Petrovic, and Rinaldo (unpub.).
Theorem (Yang, Fienberg, and Rinaldo (unpub.))
The existence of the MPLE implies the existence of the MLE.
The converse is false.
Implications:
If we use MPLEs, we may act as if the likelihood is well behaved when it is not.
Even if MLEs exist, the actual values of MPLEs and MLEs can differ substantially.
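Because the pseudo-likelihood (2) is a logistic regression log-likelihood, any logistic regression routine can compute the MPLE. The NumPy sketch below assumes the edges-and-triangles model, for which $\delta(x^c_{ij}) = (1, \text{number of common neighbors of } i \text{ and } j)$. It maximizes (2) by Newton-Raphson.

This is a sketch, not a production fitter, with three limitations:
- It does not check whether the MPLE exists.
- When the change statistics separate the dyads (completely or quasi-completely), the Newton steps diverge or the Hessian becomes singular.
- It says nothing about the MLE, which generally requires MCMC.

```python
import itertools
import numpy as np


def mple_design(A):
    """One row per dyad i < j: delta = (1, # common neighbours of i and j), response x_ij."""
    A = np.asarray(A, dtype=float)
    common = A @ A  # (A @ A)[i, j] counts common neighbours of i and j
    dyads = list(itertools.combinations(range(A.shape[0]), 2))
    X = np.array([[1.0, common[i, j]] for i, j in dyads])
    y = np.array([A[i, j] for i, j in dyads])
    return X, y


def pseudo_loglik(theta, X, y):
    """Equation (2): sum over dyads of theta.delta * x_ij - log(1 + exp(theta.delta))."""
    eta = X @ theta
    return float(np.sum(eta * y - np.logaddexp(0.0, eta)))


def mple(A, n_iter=100, tol=1e-10):
    """Maximize (2) by Newton-Raphson. No existence check: under separation it diverges."""
    X, y = mple_design(A)
    theta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = X.T @ (y - p)                          # score of (2)
        info = X.T @ (X * (p * (1.0 - p))[:, None])   # negative Hessian of (2)
        step = np.linalg.solve(info, grad)
        theta = theta + step
        if np.max(np.abs(step)) < tol:
            break
    return theta


# Hypothetical 5-node graph with edges 0-1, 0-2, 1-2, 2-3 (node 4 isolated).
A = np.array([[0, 1, 1, 0, 0],
              [1, 0, 1, 0, 0],
              [1, 1, 0, 1, 0],
              [0, 0, 1, 0, 0],
              [0, 0, 0, 0, 0]])
theta_hat = mple(A)
print(theta_hat)  # approx. [-1.386, 1.792] = [log(1/4), log(6)]
```

For this graph, the change statistic takes only two values, so the logistic fit is saturated. Dyads with no common neighbor have edge frequency 1/5, and dyads with one common neighbor have edge frequency 3/5. This gives $\hat\theta = (\log(1/4), \log 6)$.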
MLE vs. MPLE: 7-node Example
(d) Relational Networks
Zachary's karate club network (Zachary, 1977; Kolaczyk, 2009)
MLE vs. MPLE: Zachary's Karate Club Data
Simple ERGMs don't fit the data well, because they ignore the two-block structure.
Nonetheless, we can see how the MLE and MPLE estimation methods diverge for these data.
Connecting the Two Graphical Approaches
There is a link between graphical models for variables and graphical models for networks, not just a common metaphor.
Frank and Strauss (1986) introduce a pairwise Markov property for individual-level undirected network models.
The key is the construction of the dual graph.
The Network Dual Graph
Dual Graph: Set up a conditional independence graph, $G^* = \{V^*, E^*\}$, whose nodes are the (possible) edges of the original graph, $G = \{V, E\}$.
$X_{ij}$ and $X_{i'j'}$ are conditionally independent given the other r.v.s $X_{kl}$ iff they are not adjacent in $G^*$.
$G$ is Markov if $G^*$ contains no edge between disjoint pairs $(s, t)$ and $(u, v)$ in $V$.
Cliques in a Markov random graph are stars (edges are 1-stars) of various orders and triangles.
If there are no edges in the dual graph $G^*$, then we essentially have the beta model.
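For an undirected Markov random graph on $n$ nodes, the dual graph is the line graph of the complete graph $K_n$. Its nodes are the $\binom{n}{2}$ possible edges, and two of them are joined iff they share a vertex. A set of edges is pairwise vertex-sharing exactly when it forms a star or a triangle, which is why those are the cliques. The plain-Python sketch below builds this dual graph and checks a few candidate cliques. The function names are our own.

```python
import itertools


def markov_dual_graph(n):
    """Dual (dependence) graph of a Markov random graph on n nodes:
    one node per possible edge {i, j}; two are adjacent iff the edges share a vertex."""
    nodes = list(itertools.combinations(range(n), 2))
    adj = {u: set() for u in nodes}
    for u, v in itertools.combinations(nodes, 2):
        if set(u) & set(v):  # disjoint pairs (s, t), (u, v) stay non-adjacent
            adj[u].add(v)
            adj[v].add(u)
    return adj


def is_clique(adj, vertices):
    """True iff every pair of the given dual-graph nodes is adjacent."""
    return all(v in adj[u] for u, v in itertools.combinations(vertices, 2))


D = markov_dual_graph(5)
print(is_clique(D, [(0, 1), (0, 2), (0, 3)]))  # True: a 3-star centred at node 0
print(is_clique(D, [(0, 1), (0, 2), (1, 2)]))  # True: a triangle
print(is_clique(D, [(0, 1), (1, 2), (2, 3)]))  # False: a path, (0, 1) and (2, 3) are disjoint
```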
Undirected Markov Random Graph Models
Theorem
For homogeneous (exchangeable) graphs, the distribution of $X$ satisfies the pairwise Markov property iff
$$\Pr\{X = x\} \propto \exp\left\{\sum_{k=1}^{N_V - 1} \theta_k S_k(x) + \tau T(x)\right\},$$
where $S_k(x)$ is the number of $k$-stars, $T(x)$ is the number of triangles, and $N_V$ is the number of vertices.
Many other ERGMs don't have this property, e.g., those with alternating $k$-stars and alternating triangles.
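The star and triangle statistics in the theorem can be computed from node degrees and a triangle count.
- This Python sketch takes $S_1$ to be the number of edges.
- For $k \ge 2$, it takes $S_k = \sum_v \binom{d_v}{k}$, where $d_v$ is the degree of node $v$.
- This is an assumed convention: summing $\binom{d_v}{1}$ would count each edge twice, because a 1-star has no distinguished center.
- The arguments `theta` and `tau` are placeholders for the star and triangle parameters.

```python
import itertools
from math import comb


def markov_graph_stats(adj):
    """k-star counts S_1..S_{n-1} and triangle count T for an undirected 0-1 adjacency matrix.
    Assumed convention: S_1 = number of edges; S_k = sum_v C(d_v, k) for k >= 2."""
    n = len(adj)
    deg = [sum(row) for row in adj]
    stars = [sum(deg) // 2] + [sum(comb(d, k) for d in deg) for k in range(2, n)]
    tri = sum(adj[a][b] * adj[a][c] * adj[b][c]
              for a, b, c in itertools.combinations(range(n), 3))
    return stars, tri


def markov_log_kernel(adj, theta, tau):
    """Exponent of the Markov random graph model: sum_k theta[k-1] * S_k + tau * T."""
    stars, tri = markov_graph_stats(adj)
    return sum(t * s for t, s in zip(theta, stars)) + tau * tri


# Complete graph on 4 nodes: every degree is 3.
K4 = [[0, 1, 1, 1],
      [1, 0, 1, 1],
      [1, 1, 0, 1],
      [1, 1, 1, 0]]
print(markov_graph_stats(K4))  # ([6, 12, 4], 4)
```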
Directed Markov Random Graph Models
An analogous approach constructs the dual graph for the situation with directed edges.
Now the vertices of $G^*$ are paired, corresponding to dyads from $G$.
Cliques are the original dyads and various stars and triangular structures.
If the model contains no stars or triangles, it reduces to $p_1$.
Open Problems
Some network models have nice, non-degenerate behavior:
Dyadic independence models such as $p_1$.
Simple blockmodels.
Blockmodels that build on dyadic independence structures.
Question: Are there other ERGMs, and in particular Markov Random Graph Models, that are nice?
Question: Where does decomposability in the dual graph fit in?
Question: Can we derive Markov bases for ERGMs other than the simple ones mentioned above?
Question: Is there a role for AS in other aspects of these considerations?
Link to Sturmfels on Ranking
Recall the relationship between the $p_1$ model with $\rho = 0$ and the Bradley-Terry model from the paired comparisons literature.
The Plackett-Luce model is the natural generalization of BT for fully ranked data.
Network relationships often involve ranking and censoring, e.g., "List your 5 closest friends."
This is the methodology in the Add Health Network.
For those with 5 or fewer friends, we have a complete ranking.
For those with more than 5 friends, we have a censored list of rankings.
Question: What are the extensions of $p_1$ and other ERGMs to deal with the ranking and censoring problem?
Question: Is there a role for AS in these considerations?
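The Plackett-Luce model, with Bradley-Terry as its two-item case, is easy to write down. It also handles censored "top 5" lists directly: the probability of an observed top-$k$ list is the product of the first $k$ sequential choice probabilities. The Python sketch below uses hypothetical item weights. It illustrates only these standard ranking models, not an extension of $p_1$ or ERGMs, which the slide leaves as an open question.

```python
import itertools


def plackett_luce_prob(ranking, weights):
    """P(ranking) under Plackett-Luce with positive item weights.
    `ranking` lists items best-first; if it is shorter than the item set, the result
    is the probability of observing exactly that top-k list (a censored ranking)."""
    remaining = sum(weights.values())
    prob = 1.0
    for item in ranking:
        prob *= weights[item] / remaining  # item chosen among those not yet ranked
        remaining -= weights[item]
    return prob


def bradley_terry_prob(i, j, weights):
    """P(i preferred to j) = w_i / (w_i + w_j)."""
    return weights[i] / (weights[i] + weights[j])


# Hypothetical weights for four friends a, b, c, d.
w = {"a": 3.0, "b": 1.0, "c": 2.0, "d": 0.5}
full = sum(plackett_luce_prob(r, w) for r in itertools.permutations(w))
top2 = sum(plackett_luce_prob(r, w) for r in itertools.permutations(w, 2))
print(full, top2)  # both equal 1 up to rounding
```

With only two items, the Plackett-Luce probability of ranking `a` above `b` reduces to the Bradley-Terry probability $w_a/(w_a + w_b)$.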
Roles for Latent Variables
For graphical models for variables:
Latent variables are natural for many models, e.g., HMMs.
They arise naturally in hierarchical Bayesian structures.
Hyperparameters are latent quantities.
For models for individuals/units in networks:
Random effects versions of node-specific models such as $p_1$.
Latent variables arise naturally in hierarchical Bayesian approaches, such as Mixed Membership Stochastic Blockmodels and latent space models.
We can also use latent structure to infer network links from data on variables for individuals, e.g., as in relational topic models.
Is there a role for AS methods here?
Ex: National Longitudinal Study of Adolescent Health
Add Health follows a nationally representative sample of U.S. adolescents in grades 7-12 from the 1994-95 school year into young adulthood, with 4 in-home interviews, the latest in 2008.
It combines data on respondents' social, economic, psychological, and physical well-being with contextual data on family, neighborhood, community, school, friendships, peer groups, and romantic relationships.
The key notion is that models not only need to be longitudinal (dynamic) but also need to link to variables associated with nodes and/or edges.
Models for simple dyadic relationships are not sufficient.
Add Health Friendship Network Data, Hunter et al. (2008)
[Figure: Wave I]
Role of Time/Dynamics
For graphical models for variables:
Time gives an ordering to variables and assists in causal models.
Note the distinction between the position of an underlying latent quantity over time and the actual manifest measurement associated with it, which is often measured retrospectively.
Dynamic models for individual-based networks:
Continuous-time stochastic process models for event data, perhaps aggregated into chunks.
Discrete-time transition models, perhaps embedded in a continuous-time process, e.g., see Hanneke et al. (2010).
Drawing Inferences From Subnetworks and Subgraphs
Inferences from Subgraphs
Conditional independence structure allows for local message passing and inference from cliques and regular subgraphs when there are separator sets that isolate components.
Interpretation in terms of GLM regression coefficients always depends on the other variables in the model.
Inferences from Subnetworks
Most properties observed in subnetworks don't generalize to the full network, and vice versa, e.g., power laws for degree distributions.
The problem is dependencies among nodes and boundary effects for subnetworks.
Missing edges are generally not missing at random, except in some sampling settings, e.g., see Handcock and Gile (2010).
The foregoing suggests that we can use cross-validation only for the simplest network models.
Summary
Two types of settings:
            Variables   Individuals
Directed        a            b
Undirected      c            d
For a and c, we use conditional independence ideas to model probabilistic relations among variables.
For b and d, we use the graph to represent observed data.
Independence comes into play in network settings only via
dyadic independence.
ERGMs have heuristic appeal but often display degenerate behavior.
Markov Random Graph models invoke the Markov property we inherit from the more traditional graphical model setting, but whether something nice flows from Markov Random Graph structure remains an open issue.
References
Airoldi, E. M., Blei, D. M., Fienberg, S. E., and Xing, E. P. (2008) Mixed Membership Stochastic Blockmodels. Journal of Machine Learning Research, 9, 1981–2014.
Bickel, P. and Chen, A. (2009) A Nonparametric View of Network Models and Newman-Girvan and Other Modularities. PNAS, 106 (50), 21068–21073.
Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975) Discrete Multivariate Analysis: Theory and Practice. MIT Press. Reprinted by Springer (2007).
Chatterjee, S. and Diaconis, P. (2011) Estimating and Understanding Exponential Random Graph Models. arXiv:1102.2650v3.
Cowell, R. G., Dawid, A. P., Lauritzen, S. L., and Spiegelhalter, D. J. (1999) Probabilistic Networks and Expert Systems. Springer.
Frank, O. and Strauss, D. (1986) Markov Graphs. Journal of the American Statistical Association, 81, 832–842.
References II
Goldenberg, A., Zheng, A. X., Fienberg, S. E., and Airoldi, E. M. (2010) A Survey of Statistical Network Models. Foundations and Trends in Machine Learning, 2 (2), 129–233.
Handcock, M. S. and Gile, K. J. (2010) Modeling Social Networks from Sampled Data. Annals of Applied Statistics, 4, 5–25.
Hanneke, S., Fu, W., and Xing, E. P. (2010) Discrete Temporal Models of Social Networks. Electronic J. Statistics, 4, 585–605.
Hunter, D. R., Goodreau, S. M., and Handcock, M. S. (2008) Goodness of Fit of Social Network Models. J. American Statistical Association, 103, 248–258.
Kolaczyk, E. D. (2009) Statistical Analysis of Network Data: Methods and Models. Springer.
Lauritzen, S. (1996) Graphical Models. Oxford Univ. Press.
References III
Petrovic, S., Rinaldo, A., and Fienberg, S. E. (2010) Algebraic Statistics for a Directed Random Graph Model with Reciprocation. Proceedings of the Conference on Algebraic Methods in Statistics and Probability, Contemporary Mathematics Series, AMS, 261–283.
Rinaldo, A., Fienberg, S. E., and Zhou, Y. (2009) On the Geometry of Discrete Exponential Families with Application to Exponential Random Graph Models. Electronic J. Statistics, 3, 446–484.
Rinaldo, A., Petrovic, S., and Fienberg, S. E. (2011) On the Existence of the MLE for a Directed Random Graph Network Model with Reciprocation. arXiv:1010.0745v1.
Zachary, W. W. (1977) An Information Flow Model for Conflict and Fission in Small Groups. J. Anthropological Research, 33, 452–473.