Fien Berg

7/30/2019 Fien Berg

1/41

Some Aspects of Graphical

Models for Network Data

Stephen E. Fienberg

Department of Statistics, Living Analytics Research Centre,

Heinz College, Machine Learning Department, and Cylab

Carnegie Mellon University

(Based on joint work with Sonja Petrovic, Alessandro Rinaldo,

and Xiaolin Yang)

Algebraic Statistics in the Alleghenies at The Pennsylvania StateUniversity

June 11, 2012
http://find/

7/30/2019 Fien Berg

2/41

Graphs as Metaphors

Representation of statistical structures in terms of graphs

G = {V,E}, is a useful metaphor that allows us to exploitthe mathematical language of graph theory and some

relatively simple results.Graphs often provide powerful representations for the

interpretation of models.

Vertices and edges have different meaning in different

statistical settings.

2 / 4 1
http://find/http://goback/

7/30/2019 Fien Berg

3/41

Graphical Representations of Statistical Models

Variables IndividualsDirected a b

Undirected c d

aHMMs, state-space models, Bayes nets, causal models(DAGs), recursive partitioning models

bsocial networks, trees, citation and email networks

ccovariance selection models, log-linear models,

multivariate time-series models

drelational networks, co-authorship networks

Note that a and c refer to probability models, while b and d are

used to describe observed data.

3 / 4 1
http://find/

7/30/2019 Fien Berg

4/41

aHMMs, State-Space Models

4 / 4 1
http://find/

7/30/2019 Fien Berg

5/41

aCausal Models, DAGs

CHILD network (blue babies) (Cowell et al.,1999)

5 / 4 1
http://find/

7/30/2019 Fien Berg

6/41

bSocial Networks

AIDS blog network (Kolaczyk, 2009)

6 / 4 1
http://find/

7/30/2019 Fien Berg

7/41

7/30/2019 Fien Berg

8/41

cLog-linear Models

Prognostic factors for coronary heart disease for Czech

autoworkers26 table (Edwards and Hrvanek, 1985)

8 / 4 1
http://find/

7/30/2019 Fien Berg

9/41

dRelational Networks

Zacharys karate club network (Zachary, 1977; Kolaczyk,

2009)

9 / 4 1

7/30/2019 Fien Berg

10/41

dYeast Protein-Protein Interaction

Airoldi et al. (2008)

10/41
http://find/

7/30/2019 Fien Berg

11/41

Graphical Models for Variables

The following Markov conditions are equivalent:

Pairwise Markov Property: For all nonadjacent pairs ofvertices, i and j, i j | K \ {i,j}.Global Markov Property: For all triples of disjoint subsets ofK, whenever aand b are separated by c in the graph,a b | c.Local Markov Property: For every vertex i, if c is the

boundary set of i, i.e., c = bd(i), and b = K \ {i c} , theni b | c.

All discrete graphical models are log-linear.

Gaussian graphical model selection problem involves

estimating the zero-pattern of the inverse covariance orconcentration matrix.

For DAGs, we continue to use Markov properties but also

exploit partial ordering of variables.

Always assume individuals or units are independent r.v.s.

11/41
http://find/

7/30/2019 Fien Berg

12/41

Models for Individuals/Units in Networks

Graph describes observedadjacency matrix.

Usually use 1 for presence of an edge, and 0 for absence.Can also have weights in place of 1s.

Except for Erdos-Renyi-Gilbert model, where occurrence

of edges corresponds to iid Bernoulli r.v.s, units are

dependent.

Simplest generalization of Erdos-Renyi-Gilbert model

assumes that dyads are independente.g., the p1 model

of Holland and Leinhardt, which has additional parameters

for reciprocation in directed networks.

Exponential Random Graph Models (ERGMs) that includestar and triangle motifs no longer have dyadic

independence.

Can have multiple relationships measure on same

individuals/units.

12/41

T Ki d f N k M d l
http://find/

7/30/2019 Fien Berg

13/41

Two Kinds of Network Models

Models directly amenable to algebraic statisticsapproaches because they both the model and the

likelihood functions involve polynomials, e.g.,Erdos-Renyi-Gilbert and ERGs.

These models have minimal sufficient statistics (MSSs) thatprovide a lower dimensional representation of the data forformal inference, e.g., marginal totals or sums of counts.

Bayesian hierarchical models for which the data are theMSSs, e.g., mixed-membership stochastic block models,random Rasch models, etc.

These models gain their attractiveness statistically byintegrating over all parameters in lower levels of thehierarchy and thus focusing on a reduced number ofparameters, but which involve the data in complex ways.

Goldenberg et al. (2010) review both types models.

This talk focuses on the former kinds of models but the

discussion does have implications for some Bayesian

hierarchical model settings.13/41

E d R i Gilb t M d l
http://find/

7/30/2019 Fien Berg

14/41

Erdos-Renyi-Gilbert Model

In G(n,M) model, we chose graph uniformly at random

from the collection of all graphs which have n nodes and Medgeshypergeometric distribution associated with the

degree of a node.

In G(n,p) model, we connect nodes in graphindependently, with constant probability p, Now M is

random and has a binomial distribution with probability(n2)M

pM(1 p)(

n2)M.

interesting probability structure, especially as as n and M

get large, but not much of interest statistically for a fixed n

or M since basically we are in a simple distributionalsetting.

This changes when we let p vary depending on the nodes

it connects, and especially when we allow edges to be

directed and dependent.

14/41

H ll d d L i h dt d l
http://find/

7/30/2019 Fien Berg

15/41

Holland and Leinhardt p1 model

nnodes, random occurrence of directed edges.

Describe the probability of an edge occurring betweennodes i and j:

log Pij(0,0) = ij

log Pij(1,0) = ij + i + j +

log Pij(0,1) = ij + j + i +

log Pij(1,1) = ij + i + j + j + i + 2 + ij

3 common forms:ij = 0 (no reciprocal effect)ij = (constant reciprocation factor)ij = + i + j (edge-dependent reciprocation)

When edges are undirected, p1 reduces to the beta model.15/41

E ti ti f
http://find/

7/30/2019 Fien Berg

16/41

Estimation for p1

The likelihood function for the p1 model is clearly of

exponential family form.

For the constant reciprocation version, we have

log p1(x) x++ +

i

xi+i +

j

x+jj +

ij

xijxji (1)

Get MLEs using IPFmethod scales.Holland-Leinhardt explored goodness of fit of modelempirically by comparing ij = 0 vs. ij = .

Standard asymptotics, e.g., 2 tests, not applicable.No. parameters increases with no. of nodes.

Fienberg and Wasserman (1981) use edge-dependentreciprocation model to test ij = .

Algebraic Statistics: Petrovic et al. (2010) provide Markov

bases; Rinaldo et al. (2011) characterize MLE existence.

Direct link to Bradley-Terry model when = 0.16/41

Exponential Random Graph Models (ERGMs)
http://find/

7/30/2019 Fien Berg

17/41

Exponential Random Graph Models (ERGMs)

Let X be a n nadjacency matrix or a 0-1 vector of lengthn

2

or a point in {0,1}n

).Identify a set of network statistics

t = (t1(X), . . . , tk(X)) Rk

and construct a distribution such that t is a vector ofsufficient statistics.

This leads to an exponential family model of the form:

P(X = x) = h(x) exp{ t ()},

where

Rk is the natural parameter space;() is a normalizing constant (often intractable);h() depends on x only.

17/41

Likelihood Methods for ERGMs
http://find/

7/30/2019 Fien Berg

18/41

Likelihood Methods for ERGMs

Likelihood methods are more complex than exponential

family structure might suggest.

Pseudo-estimation using independent logistic regressions,

one per node.Can get MLEs via MCMC.

Problem of degeneracy or near degeneracy:

MLEs dont existlikelihood maximized on the boundary.Likelihood function is not well-behaved and most

observable configurations are near the boundary.

18/41

ERGMs: 7 node Example I

7/30/2019 Fien Berg

19/41

ERGMs: 7-node ExampleI

Set of all graphs on 7 nodes when the sufficient statistics are

the number of edges and number of triangles.

There are 221 = 2, 097,152 possible graphs!There are only 110 different configurations for the

2-dimensional sufficient statistics.

Note: many points on the boundary (including the empty

and complete graph)

0 2 4 6 8 10 12 14 16 18 200

5

10

15

20

25

30

35Support of the edge and triangle statistics

Number of edges

Numberoftriang

les

19/41
http://find/

7/30/2019 Fien Berg

20/41

Inference for Models for Individuals/Units in Networks

7/30/2019 Fien Berg

21/41

Inference for Models for Individuals/Units in Networks

Relevant asymptotics has number of nodes, n.

When there are node-specific parameters, asymptotics are

far more complex, e.g., see Chatterjee and Diaconis

(2011).Maximum likelihood approaches available for ERGMs.

For blockmodels, with constant structure within blocks,there is asymptotic theory.

Related literature on community formation and

modularity. Bickel and Chen (2009)

21/41

Pseudo-likelihood for ERGMs
http://find/

7/30/2019 Fien Berg

22/41

Pseudo-likelihood for ERGMs

Frank and Strauss (1986) and Strauss and Ikeda (1990),

following ideas of Besag and work on Markov random fieldmodels, considered conditional probability P(Xij = 1|X

cij )

where Xcij is the graph after removing edge (i,j).

P(Xij = 1|Xcij ) =

exp [ (T(X+ij ) T(X

ij ))]

1 + exp [ (T(X+ij ) T(X

ij ))]

=exp [ (Xcij )]

1 + exp [ (Xcij )]

where X+ij and X

ij represent graphs setting Xij = 1 or 0,

Xcij denotes remainder of network , and (xcij) is change of

SSs when xij changes from 0 to 1.

This has form of logistic regression model.

22/41

Maximum Pseudo-likelihood Estimation for ERGMs
http://find/

7/30/2019 Fien Berg

23/41

Maximum Pseudo likelihood Estimation for ERGMs

Pseudo-likelihood treats logistic regression components as

if they were independent and sums over all edges:

lP(; x) =

ij

(xcij)xij

ij

log(1 + exp(T(xcij))) (2)

Simple Markov basis structure for pseudo-likelihood.

Fienberg, Petrovic, and Rinaldo (unpub.)

Theorem (Yang, Fienberg, and Rinaldo (unpub.))

The existence of the MPLE implies the existence of the MLE.

The converse is false.

Implications:If we use MPLEs we may act as if the likelihood iswell-behaved when it is not.Even if MLEs exist, actual values of MPLEs and MLEs candiffer substantially.

23/41

MLE vs MPLE7 node Example
http://find/

7/30/2019 Fien Berg

24/41

MLE vs. MPLE 7 node Example

24/41

dRelational Networks
http://find/

7/30/2019 Fien Berg

25/41

d Relational Networks

Zacharys karate club network (Zachary, 1977; Kolaczyk,

2009)

25/41

MLE vs. MPLEZacharys Karate Club Data
http://find/

7/30/2019 Fien Berg

26/41

MLE vs. MPLE Zachary s Karate Club Data

Simple ERGMs dont fit data well because they ignore

two-block structure.

Nonetheless, we can see howMLE and MPLE estimation

methods diverge for these data.

26/41

Connecting the Two Graphical Approaches

7/30/2019 Fien Berg

27/41

Connecting the Two Graphical Approaches

There is link between graphical models for for variables

and graphical models for networks, not just a common

metaphor.Frank and Strauss (1986) introduce a pairwise Markov

property for individual-level undirected network models.

Key is the construction of the dual graph.

27/41

The Network Dual Graph
http://find/

7/30/2019 Fien Berg

28/41

e et o ua G ap

Dual Graph: Set up conditional independence graph,G = {V,E}, whose nodes are edges from originalgraph, G = {V,E}.

Xij and Xij are conditionally independent given the other

r.v.s Xkl iff they do not share a vertex in G

= {V

,E

}.G is Markov if G contains no edge between disjoint sets

(s, t) and (u, v) in V.

Cliques in a Markov random graph are stars (edges are

1-stars) of various orders and triangles.

If there are no edges in the dual graph G, then we

essentially have the beta model.

28/41

Undirected Markov Random Graph Models
http://find/

7/30/2019 Fien Berg

29/41

p

Theorem

For homogeneous (exchangeable) graphs, distribution of X

now satisfies the pairwise Markov property iff

Pr{X = x} exp{

NV1

k=1

kSk(x) + T(x)}

where Sk(x) is no. of k-stars and T(x) is no. triangles.

Many other ERGMs dont have this property, e.g., those

with alternating k-stars and alternating triangles.

29/41

Directed Markov Random Graph Models

7/30/2019 Fien Berg

30/41

p

Analogous approach to construction of dual graph for

situation with directed edges.

Now the vertices of G are paired corresponding to dyads

from G.

Cliques are the original dyads, and various stars and

triangular structures.

If model contains no stars or triangles, it reduces to p1.

30/41

Open Problems
http://find/

7/30/2019 Fien Berg

31/41

p

Some network models have nice, non-degeneratebehavior.

Dyadic independence models such as p1.Simple blockmodels.Blockmodels that build on dyadic independence structures.

Question: Are there other ERGMs, and in particularMarkov Random Graph Models, that are nice?

Question: Where does decomposability in dual graph fit in?

Question: Can we derive Markov bases for ERGMs other

than the simple ones mentioned above?Question: Is there a role for AS in other aspects of these

considerations?

31/41

Link to Sturmfels on Ranking
http://find/

7/30/2019 Fien Berg

32/41

g

Recall relationship between p1 model with 0 andBradley-Terry model from paired comparisons literature.

Plackett-Luce model is the natural generalization of BT forfully-ranked data.

Network relationships often involve ranking and censoring,

e.g., List your 5 closest friends.This is the methodology in the Add Health Network.For those with 5 or less friends we have complete ranking.For those with > 5 friends we have a censored list ofrankings.

Question: What are the extensions of the p1 and otherERGMs to deal with the ranking and censoring problem?

Question: Is there a role for AS in these considerations?

32/41

Roles for Latent Variables
http://goforward/http://find/http://goback/

7/30/2019 Fien Berg

33/41

For graphical models for variables:

Natural for many models, e.g., HMMs.Arise naturally in Hierarchical Bayesian structures.Hyperparameters are latent quantities.

For models for individuals/units in networks:

Random effects versions of node-specific models such asp1.Arise naturally in hierarchical Bayesian approaches, suchas Mixed Membership Stochastic Blockmodels and latentspace models.

Can also use latent structure to infer network links from

data on variables for individuals, e.g., as in relational topic

models.

Is there a role role AS methods here?

33/41

Ex: National Longitudinal Study of Adolescent Health
http://goforward/http://find/http://goback/

7/30/2019 Fien Berg

34/41

Add HealthFollows nationally representative sample of U.S.adolescents in grades 7-12 from 1994-95 school year intoyoung adulthood with 4 in-home interviews, the latest in2008.Combines data on respondents social, economic,psychological and physical well-being with contextual dataon family, neighborhood, community, school, friendships,peer groups, and romantic relationships.

Key notion is that models not only need to be longitudinal

(dynamic) but also link to variables associated with nodesand or edges.

Models for simple dyadic relationships are not sufficient.

34/41

eat r en s p etwor ata, unter et a .(2008)

7/30/2019 Fien Berg

35/41

(2008)Wave I:

35/41

Role of Time/Dynamics

7/30/2019 Fien Berg

36/41

For graphical models for variables:

Time gives ordering to variables and assists in causalmodels.Note distinction between position of underlying latentquantity over time and the actual manifest measurementassociated with it, which is often measured retrospectively.

Dynamic models for individual-based networks:

Continuous-time stochastic process models for event data,perhaps aggregated into chunks.

Discrete-time transition models, perhaps embedded intocontinuous time process, e.g., see Hanneke et al. (2010).

36/41

Drawing Inferences From Subnetworks and Subgraphs
http://find/

7/30/2019 Fien Berg

37/41

Inferences from SubgraphsConditional independence structure allows for local

message passing and inference from cliques and regularsubgraphs when there are separator sets that isolatecomponents.Interpretation in terms of GLM regression coefficientsalways depends on the other variables in the model.

Inferences from SubnetworksMost properties observed in subnetworks dont generalizeto full network, and vice versa, e.g., power laws for degreedistributions.Problem is dependencies among nodes and boundaryeffects for subnetworks.

Missing edges are generally not missing at random, exceptfor some sampling settings, e.g., see Handcock and Gile(2010).

The forgoing suggests that we cant use cross-validation

for all but simplest network models.

37/41

Summary
http://find/

7/30/2019 Fien Berg

38/41

Two types of settings:

Variables Individuals

Directed a b

Undirected c d

For a and c we use conditional independence ideas tomodel probabilistic relations among variables.

for b and d we use graph to represent observed data.

Independence comes into play in network settings only via

dyadic independence.

ERGMs have heuristic appeal but often display degenerate

behavior.Markov Random Graph models invoke the Markov

property we inherit from more traditional graphical model

setting, but whether something nice flows from Markov

Random Graph structure remains an open issue.

38/41

References
http://find/

7/30/2019 Fien Berg

39/41

Airoldi, E M., Blei, D. M., Fienberg, S. E., and Xing, E. P. (2008)Mixed Membership Stochastic Blockmodels. Journal of Machine

Learning Research, 9, 19812014.

Bickel, P. and Chen, A. (2009) A Nonparametric View of NetworkModels and Newman-Girvan and Other Modularities. PNAS, 106

(50), 2106821073.

Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975) DiscreteMultivariate Analysis: Theory and Practice. MIT Press. Reprinted bySpringer (2007).

Chaterjee, S. and Diaconis, P. (2011) Estimating and UnderstandingExponential Random Graph Models. arXiv:1102.2650v3.

Cowell, R. G., Dawid, A. P., Lauritzen, S. L., and Spiegelhalter, D. J.(1999) Probabilistic Networks and Expert Systems. Springer.

Frank, O. and Strauss, D. (1986) Markov Graphs. Journal of theAmerican Statistical Association, 81, 832842.

39/41

ReferencesII
http://find/

7/30/2019 Fien Berg

40/41

Goldenberg, A., Zheng, A. X., Fienberg, S. E., and Airoldi E. M.

(2010) A Survey of Statistical Network Models. Foundations andTrends in Machine Learning, 2 (2), 129233.

Handcock, M. S. and Gile, K. J. (2010) Modeling Social Networksfrom Sampled Data. Annals of Applied Statistics, 4, 525.

Hanneke, S., Fu, W. and Xing, E. P. (2010) Discrete Temporal Modelsof Social Networks. Electronic J. Statistics, 4, 585605.

Hunter, D. R. , Goodreau, S. M., and Handcock, M. S. (2008)Goodness of Fit of Social Network Models. J. American StatisticalAssociation, 103, 248258.

Lauritzen, S. (1996) Graphical Models. Oxford Univ. Press.Kolaczyk, E. D. (2009) Statistical Analysis of Network Data: Methodsand Models. Springer.

40/41

ReferencesIII
http://find/

7/30/2019 Fien Berg

41/41

Petrovic, S., Rinaldo, A., and Fienberg, S. E. (2010) AlgebraicStatistics for a Directed Random Graph Model with Reciprocation.Proceedings of the Conference on Algebraic Methods in Statisticsand Probability, Contemporary Mathematics Series, AMS, 261283.

Rinaldo, A., Fienberg, S. E. and Zhou, Y. (2009) On the Geometry of

Discrete Exponential Families with Application to ExponentialRandom Graph Models. Electronic J. Statistics, 3, 446484.

Rinaldo, A., Petrovic, S., and Fienberg, S. E. (2011) On the Existenceof the MLE for a Directed Random Graph Network Model withReciprocation. arXiv:1010.0745v1

Zachary, W. W. (1977) An Information Flow Model for Conflict andFission in Small Groups. J. Anthropological Research, 33, 452473.

41/41
http://find/

Fien Berg

Documents

Transcript of Fien Berg