Fien Berg


Transcript of Fien Berg

  • 7/30/2019 Fien Berg

    1/41

    Some Aspects of Graphical

    Models for Network Data

    Stephen E. Fienberg

    Department of Statistics, Living Analytics Research Centre,

    Heinz College, Machine Learning Department, and Cylab

    Carnegie Mellon University

    (Based on joint work with Sonja Petrovic, Alessandro Rinaldo,

    and Xiaolin Yang)

    Algebraic Statistics in the Alleghenies at The Pennsylvania State University

    June 11, 2012

  • 7/30/2019 Fien Berg

    2/41

    Graphs as Metaphors

    Representation of statistical structures in terms of graphs, G = {V,E}, is a useful metaphor that allows us to exploit the mathematical language of graph theory and some relatively simple results.

    Graphs often provide powerful representations for the interpretation of models.

    Vertices and edges have different meaning in different

    statistical settings.

  • 7/30/2019 Fien Berg

    3/41

    Graphical Representations of Statistical Models

                  Variables   Individuals
    Directed          a            b
    Undirected        c            d

    a: HMMs, state-space models, Bayes nets, causal models (DAGs), recursive partitioning models

    b: social networks, trees, citation and email networks

    c: covariance selection models, log-linear models, multivariate time-series models

    d: relational networks, co-authorship networks

    Note that a and c refer to probability models, while b and d are

    used to describe observed data.

  • 7/30/2019 Fien Berg

    4/41

    a: HMMs, State-Space Models

  • 7/30/2019 Fien Berg

    5/41

    a: Causal Models, DAGs

    CHILD network (blue babies) (Cowell et al., 1999)

  • 7/30/2019 Fien Berg

    6/41

    b: Social Networks

    AIDS blog network (Kolaczyk, 2009)

  • 7/30/2019 Fien Berg

    7/41

  • 7/30/2019 Fien Berg

    8/41

    c: Log-linear Models

    Prognostic factors for coronary heart disease for Czech autoworkers: $2^6$ table (Edwards and Havránek, 1985)

  • 7/30/2019 Fien Berg

    9/41

    d: Relational Networks

    Zachary's karate club network (Zachary, 1977; Kolaczyk, 2009)

  • 7/30/2019 Fien Berg

    10/41

    d: Yeast Protein-Protein Interaction

    Airoldi et al. (2008)

  • 7/30/2019 Fien Berg

    11/41

    Graphical Models for Variables

    The following Markov conditions are equivalent (for strictly positive distributions):

    Pairwise Markov Property: For all nonadjacent pairs of vertices, i and j, $i \perp j \mid K \setminus \{i,j\}$.

    Global Markov Property: For all triples of disjoint subsets of K, whenever a and b are separated by c in the graph, $a \perp b \mid c$.

    Local Markov Property: For every vertex i, if c is the boundary set of i, i.e., c = bd(i), and $b = K \setminus (\{i\} \cup c)$, then $i \perp b \mid c$.

    All discrete graphical models are log-linear.

    Gaussian graphical model selection problem involves estimating the zero-pattern of the inverse covariance or concentration matrix.

    For DAGs, we continue to use Markov properties but also

    exploit partial ordering of variables.

    Always assume individuals or units are independent r.v.s.
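To make the zero-pattern idea concrete, here is a minimal sketch for a hypothetical three-variable Gaussian chain X1 - X2 - X3. The correlation values are assumptions chosen for illustration, not data from the talk. With three variables both the partial correlation of X1 and X3 given X2 and the (1,3) entry of the concentration matrix have closed forms, and they vanish together.

```python
import math

def partial_corr_13_given_2(r12, r13, r23):
    """Partial correlation of X1 and X3 given X2 (three-variable case)."""
    return (r13 - r12 * r23) / math.sqrt((1 - r12**2) * (1 - r23**2))

def concentration_13(r12, r13, r23):
    """(1,3) entry of the inverse of the 3x3 correlation matrix."""
    det = 1 + 2 * r12 * r13 * r23 - r12**2 - r13**2 - r23**2
    return (r12 * r23 - r13) / det

# Hypothetical chain X1 - X2 - X3: the marginal correlation of X1 and X3
# is the product of the two adjacent correlations.
r12, r23 = 0.6, 0.5
r13 = r12 * r23

print(partial_corr_13_given_2(r12, r13, r23))
print(concentration_13(r12, r13, r23))
```

The missing edge between X1 and X3 shows up as a zero in the concentration matrix even though the two variables are marginally correlated.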

  • 7/30/2019 Fien Berg

    12/41

    Models for Individuals/Units in Networks

    Graph describes observed adjacency matrix.

    Usually use 1 for presence of an edge, and 0 for absence. Can also have weights in place of 1s.

    Except for Erdos-Renyi-Gilbert model, where occurrence of edges corresponds to iid Bernoulli r.v.s, units are dependent.

    Simplest generalization of Erdos-Renyi-Gilbert model assumes that dyads are independent, e.g., the p1 model of Holland and Leinhardt, which has additional parameters for reciprocation in directed networks.

    Exponential Random Graph Models (ERGMs) that include star and triangle motifs no longer have dyadic independence.

    Can have multiple relationships measured on same individuals/units.

  • 7/30/2019 Fien Berg

    13/41

    Two Kinds of Network Models

    Models directly amenable to algebraic statistics approaches because both the model and the likelihood functions involve polynomials, e.g., Erdos-Renyi-Gilbert and ERGMs.

    These models have minimal sufficient statistics (MSSs) that provide a lower dimensional representation of the data for formal inference, e.g., marginal totals or sums of counts.

    Bayesian hierarchical models for which the data are the MSSs, e.g., mixed-membership stochastic block models, random Rasch models, etc.

    These models gain their attractiveness statistically by integrating over all parameters in lower levels of the hierarchy and thus focusing on a reduced number of parameters, but which involve the data in complex ways.

    Goldenberg et al. (2010) review both types of models.

    This talk focuses on the former kinds of models but the

    discussion does have implications for some Bayesian

    hierarchical model settings.

  • 7/30/2019 Fien Berg

    14/41

    Erdos-Renyi-Gilbert Model

    In G(n,M) model, we choose a graph uniformly at random from the collection of all graphs which have n nodes and M edges; a hypergeometric distribution is associated with the degree of a node.

    In G(n,p) model, we connect nodes in the graph independently, with constant probability p. Now M is random and has a binomial distribution with probability $\binom{\binom{n}{2}}{M} p^M (1-p)^{\binom{n}{2}-M}$.

    Interesting probability structure, especially as n and M get large, but not much of interest statistically for a fixed n or M since basically we are in a simple distributional setting.

    This changes when we let p vary depending on the nodes

    it connects, and especially when we allow edges to be

    directed and dependent.
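A minimal sketch of the two Erdos-Renyi-Gilbert variants, using only the Python standard library. The function names and the fixed seed are illustrative assumptions. The last function evaluates the binomial probability for the number of edges M in G(n,p).

```python
import itertools
import math
import random

def sample_gnp(n, p, rng):
    """G(n,p): include each of the C(n,2) possible edges independently."""
    return [e for e in itertools.combinations(range(n), 2) if rng.random() < p]

def sample_gnm(n, m, rng):
    """G(n,M): choose M of the C(n,2) possible edges uniformly at random."""
    return rng.sample(list(itertools.combinations(range(n), 2)), m)

def edge_count_pmf(n, p, m):
    """Binomial probability that a G(n,p) graph has exactly m edges."""
    total = math.comb(n, 2)
    return math.comb(total, m) * p**m * (1 - p) ** (total - m)

rng = random.Random(0)  # fixed seed, for reproducibility only
print(len(sample_gnm(7, 10, rng)), len(sample_gnp(7, 0.3, rng)))
print(sum(edge_count_pmf(7, 0.3, m) for m in range(22)))
```

In G(n,M) the edge count is fixed by construction, while in G(n,p) it varies from draw to draw and its probabilities over M = 0, ..., C(n,2) sum to one.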

  • 7/30/2019 Fien Berg

    15/41

    Holland and Leinhardt p1 model

    n nodes, random occurrence of directed edges.

    Describe the probability of an edge occurring between nodes i and j:

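    $\log P_{ij}(0,0) = \lambda_{ij}$

    $\log P_{ij}(1,0) = \lambda_{ij} + \alpha_i + \beta_j + \theta$

    $\log P_{ij}(0,1) = \lambda_{ij} + \alpha_j + \beta_i + \theta$

    $\log P_{ij}(1,1) = \lambda_{ij} + \alpha_i + \beta_j + \alpha_j + \beta_i + 2\theta + \rho_{ij}$

    3 common forms:

    $\rho_{ij} = 0$ (no reciprocal effect)

    $\rho_{ij} = \rho$ (constant reciprocation factor)

    $\rho_{ij} = \rho + \rho_i + \rho_j$ (edge-dependent reciprocation)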

    When edges are undirected, p1 reduces to the beta model.
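The four dyad log-probabilities can be turned into actual probabilities once the normalizing term is fixed. The sketch below assumes the usual p1 parameter names (alpha for sender effects, beta for receiver effects, theta for density, rho for reciprocation, lambda for the dyad normalizer); these names are an assumption where the transcript lost the symbols, and the numeric values are made up.

```python
import math

def p1_dyad_probs(alpha_i, beta_i, alpha_j, beta_j, theta, rho_ij):
    """Return P_ij(0,0), P_ij(1,0), P_ij(0,1), P_ij(1,1) for one dyad.

    lambda_ij is not a free parameter: it is fixed by requiring the four
    probabilities to sum to one.
    """
    log_w = {
        (0, 0): 0.0,
        (1, 0): alpha_i + beta_j + theta,
        (0, 1): alpha_j + beta_i + theta,
        (1, 1): alpha_i + beta_j + alpha_j + beta_i + 2 * theta + rho_ij,
    }
    lam = -math.log(sum(math.exp(v) for v in log_w.values()))
    return {state: math.exp(lam + v) for state, v in log_w.items()}

probs = p1_dyad_probs(0.2, -0.1, -0.3, 0.4, theta=-1.0, rho_ij=0.8)
print(probs)
```

With rho_ij = 0 the two directed edges in a dyad are independent, which can be checked through the cross-product P(1,1) P(0,0) = P(1,0) P(0,1).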

  • 7/30/2019 Fien Berg

    16/41

    Estimation for p1

    The likelihood function for the p1 model is clearly of

    exponential family form.

    For the constant reciprocation version, we have

    $\log p_1(x) \propto x_{++}\theta + \sum_i x_{i+}\alpha_i + \sum_j x_{+j}\beta_j + \sum_{i<j} x_{ij}x_{ji}\rho$    (1)

    Get MLEs using IPF; the method scales.

    Holland-Leinhardt explored goodness of fit of model empirically by comparing $\rho_{ij} = 0$ vs. $\rho_{ij} = \rho$.

    Standard asymptotics, e.g., $\chi^2$ tests, not applicable. No. of parameters increases with no. of nodes.

    Fienberg and Wasserman (1981) use edge-dependent reciprocation model to test $\rho_{ij} = \rho$.

    Algebraic Statistics: Petrovic et al. (2010) provide Markov

    bases; Rinaldo et al. (2011) characterize MLE existence.

    Direct link to Bradley-Terry model when = 0.
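For readers unfamiliar with IPF (iterative proportional fitting), here is a generic sketch on a two-way table: it rescales rows and columns in turn until the table matches the target margins. This is only the basic idea, shown on a toy table with made-up margins; the p1 case would need its own table layout and margins, which are not shown here.

```python
def ipf_two_way(seed, row_targets, col_targets, iters=100):
    """Rescale a positive seed table until its margins match the targets."""
    table = [row[:] for row in seed]
    for _ in range(iters):
        # Match row totals.
        for i, row in enumerate(table):
            s = sum(row)
            table[i] = [v * row_targets[i] / s for v in row]
        # Match column totals.
        for j in range(len(col_targets)):
            s = sum(row[j] for row in table)
            for row in table:
                row[j] *= col_targets[j] / s
    return table

fitted = ipf_two_way([[1.0, 1.0], [1.0, 1.0]], [30.0, 70.0], [40.0, 60.0])
print(fitted)
```

Starting from a table of ones, the fitted values are those of the independence model, i.e. (row total) x (column total) / (grand total).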

  • 7/30/2019 Fien Berg

    17/41

    Exponential Random Graph Models (ERGMs)

    Let X be an $n \times n$ adjacency matrix or a 0-1 vector of length $\binom{n}{2}$ (or a point in $\{0,1\}^{\binom{n}{2}}$).

    Identify a set of network statistics

    $t = (t_1(X), \ldots, t_k(X)) \in \mathbb{R}^k$

    and construct a distribution such that t is a vector of sufficient statistics.

    This leads to an exponential family model of the form:

    $P(X = x) = h(x) \exp\{\theta \cdot t - \psi(\theta)\}$,

    where

    $\theta \in \Theta \subseteq \mathbb{R}^k$, the natural parameter space;

    $\psi(\theta)$ is a normalizing constant (often intractable);

    $h(\cdot)$ depends on x only.
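On very small graphs the normalizing constant can be computed exactly by enumerating every graph, which makes the exponential family form tangible. The sketch below assumes h(x) = 1 and takes the number of edges and the number of triangles as sufficient statistics; the parameter values are arbitrary. Enumeration is only feasible for a handful of nodes, which is one way to see why the constant is usually intractable.

```python
import itertools
import math

def edge_triangle_stats(n, edges):
    """t(x) = (number of edges, number of triangles) of a graph on n nodes."""
    es = set(edges)
    tri = sum(
        1
        for a, b, c in itertools.combinations(range(n), 3)
        if (a, b) in es and (a, c) in es and (b, c) in es
    )
    return (len(es), tri)

def all_graphs(n):
    """Yield every undirected graph on n labelled nodes as a tuple of edges."""
    dyads = list(itertools.combinations(range(n), 2))
    for bits in itertools.product((0, 1), repeat=len(dyads)):
        yield tuple(d for d, b in zip(dyads, bits) if b)

def ergm_probs(n, theta):
    """Exact ERGM probabilities with h(x) = 1, by brute-force enumeration."""
    weights = {}
    for g in all_graphs(n):
        t = edge_triangle_stats(n, g)
        weights[g] = math.exp(sum(th * ti for th, ti in zip(theta, t)))
    z = sum(weights.values())  # z = exp(psi(theta))
    return {g: w / z for g, w in weights.items()}, math.log(z)

probs, psi = ergm_probs(4, (-0.5, 0.3))
print(len(probs), psi)
```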

  • 7/30/2019 Fien Berg

    18/41

    Likelihood Methods for ERGMs

    Likelihood methods are more complex than exponential

    family structure might suggest.

    Pseudo-estimation using independent logistic regressions, one per node.

    Can get MLEs via MCMC.

    Problem of degeneracy or near degeneracy:

    MLEs don't exist: likelihood maximized on the boundary.

    Likelihood function is not well-behaved and most observable configurations are near the boundary.

  • 7/30/2019 Fien Berg

    19/41

    ERGMs: 7-node Example I

    Set of all graphs on 7 nodes when the sufficient statistics are

    the number of edges and number of triangles.

    There are $2^{21} = 2{,}097{,}152$ possible graphs!

    There are only 110 different configurations for the 2-dimensional sufficient statistics.

    Note: many points on the boundary (including the empty

    and complete graph)

    [Figure: Support of the edge and triangle statistics. x-axis: Number of edges; y-axis: Number of triangles.]
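The support of the edge and triangle statistics can be enumerated directly for small n. The sketch below is one possible way to do it, encoding each graph as a bitmask over the possible edges; the function name is an assumption. It is run here on 5 nodes to keep the runtime short.

```python
import itertools

def edge_triangle_support(n):
    """Set of attainable (edges, triangles) pairs over all graphs on n nodes."""
    dyads = list(itertools.combinations(range(n), 2))
    index = {d: k for k, d in enumerate(dyads)}
    # Each triangle is a bitmask over the three dyads it uses.
    tri_masks = [
        (1 << index[(a, b)]) | (1 << index[(a, c)]) | (1 << index[(b, c)])
        for a, b, c in itertools.combinations(range(n), 3)
    ]
    support = set()
    for g in range(1 << len(dyads)):  # g encodes one graph as a bitmask
        edges = bin(g).count("1")
        triangles = sum(1 for m in tri_masks if (g & m) == m)
        support.add((edges, triangles))
    return support

support = edge_triangle_support(5)  # 2**10 = 1024 graphs
print(len(support))
```

Calling the function with n = 7 walks through all 2^21 graphs, which is slow in pure Python; the slide reports 110 distinct configurations for that case.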
  • 7/30/2019 Fien Berg

    20/41

    Inference for Models for Individuals/Units in Networks

  • 7/30/2019 Fien Berg

    21/41

    Inference for Models for Individuals/Units in Networks

    Relevant asymptotics has number of nodes, $n \to \infty$.

    When there are node-specific parameters, asymptotics are far more complex, e.g., see Chatterjee and Diaconis (2011).

    Maximum likelihood approaches available for ERGMs.

    For blockmodels, with constant structure within blocks, there is asymptotic theory.

    Related literature on community formation and modularity: Bickel and Chen (2009).

  • 7/30/2019 Fien Berg

    22/41

    Pseudo-likelihood for ERGMs

    Frank and Strauss (1986) and Strauss and Ikeda (1990), following ideas of Besag and work on Markov random field models, considered conditional probability $P(X_{ij} = 1 \mid X_{ij}^c)$, where $X_{ij}^c$ is the graph after removing edge (i,j).

    $$P(X_{ij} = 1 \mid X_{ij}^c) = \frac{\exp[\theta^\top (T(X_{ij}^+) - T(X_{ij}^-))]}{1 + \exp[\theta^\top (T(X_{ij}^+) - T(X_{ij}^-))]} = \frac{\exp[\theta^\top \delta(X_{ij}^c)]}{1 + \exp[\theta^\top \delta(X_{ij}^c)]}$$

    where $X_{ij}^+$ and $X_{ij}^-$ represent graphs setting $X_{ij} = 1$ or 0, $X_{ij}^c$ denotes remainder of network, and $\delta(x_{ij}^c)$ is change of SSs when $x_{ij}$ changes from 0 to 1.

    This has form of logistic regression model.
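A small sketch of the conditional probability for one dyad, assuming the sufficient statistics are the number of edges and the number of triangles (an illustrative choice; other statistics would need their own change computation). Adding edge (i,j) always adds one edge and closes one triangle per common neighbour of i and j.

```python
import math

def change_stats(n, edges, i, j):
    """Change in (edges, triangles) when dyad (i, j) goes from 0 to 1."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        if {a, b} != {i, j}:  # ignore the current state of dyad (i, j)
            adj[a].add(b)
            adj[b].add(a)
    shared = len(adj[i] & adj[j])  # each common neighbour closes one triangle
    return (1, shared)

def conditional_edge_prob(theta, n, edges, i, j):
    """P(X_ij = 1 | rest of the graph) for the edge-triangle model."""
    eta = sum(th * d for th, d in zip(theta, change_stats(n, edges, i, j)))
    return 1.0 / (1.0 + math.exp(-eta))

path = [(0, 1), (1, 2)]  # hypothetical 3-node path; dyad (0, 2) is open
print(change_stats(3, path, 0, 2))
print(conditional_edge_prob((-1.0, 0.5), 3, path, 0, 2))
```

The change statistics play the role of covariates and the parameter vector the role of regression coefficients, which is the logistic regression form.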

  • 7/30/2019 Fien Berg

    23/41

    Maximum Pseudo-likelihood Estimation for ERGMs

    Pseudo-likelihood treats logistic regression components as if they were independent and sums over all edges:

    $$\ell_P(\theta; x) = \sum_{ij} \theta^\top \delta(x_{ij}^c)\, x_{ij} - \sum_{ij} \log(1 + \exp(\theta^\top \delta(x_{ij}^c)))$$    (2)

    Simple Markov basis structure for pseudo-likelihood.

    Fienberg, Petrovic, and Rinaldo (unpub.)

    Theorem (Yang, Fienberg, and Rinaldo (unpub.))

    The existence of the MPLE implies the existence of the MLE.

    The converse is false.

    Implications:

    If we use MPLEs we may act as if the likelihood is well-behaved when it is not.

    Even if MLEs exist, actual values of MPLEs and MLEs can differ substantially.
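The log pseudo-likelihood in equation (2) is easy to evaluate for a given parameter value. The sketch below again assumes the edge-triangle model on an undirected graph, with the sum running over unordered dyads; both are assumptions for illustration. It only evaluates the function and does not maximize it.

```python
import itertools
import math

def log_pseudo_likelihood(theta, n, edges):
    """Equation (2) for the edge-triangle model on an undirected graph."""
    present = {frozenset(e) for e in edges}
    total = 0.0
    for i, j in itertools.combinations(range(n), 2):
        # Common neighbours of i and j, not counting the dyad (i, j) itself.
        shared = sum(
            1
            for k in range(n)
            if k not in (i, j)
            and frozenset((i, k)) in present
            and frozenset((j, k)) in present
        )
        eta = theta[0] * 1 + theta[1] * shared  # theta . delta(x_ij^c)
        x_ij = 1 if frozenset((i, j)) in present else 0
        total += eta * x_ij - math.log1p(math.exp(eta))
    return total

print(log_pseudo_likelihood((-0.5, 0.3), 4, [(0, 1), (1, 2), (0, 2)]))
```

An MPLE would be obtained by maximizing this function over theta, for example by fitting a logistic regression of the dyad indicators on the change statistics.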

  • 7/30/2019 Fien Berg

    24/41

    MLE vs. MPLE: 7-node Example

  • 7/30/2019 Fien Berg

    25/41

    d: Relational Networks

    Zachary's karate club network (Zachary, 1977; Kolaczyk, 2009)

  • 7/30/2019 Fien Berg

    26/41

    MLE vs. MPLE: Zachary's Karate Club Data

    Simple ERGMs don't fit data well because they ignore two-block structure.

    Nonetheless, we can see how MLE and MPLE estimation methods diverge for these data.

  • 7/30/2019 Fien Berg

    27/41

    Connecting the Two Graphical Approaches

    There is a link between graphical models for variables and graphical models for networks, not just a common metaphor.

    Frank and Strauss (1986) introduce a pairwise Markov property for individual-level undirected network models.

    Key is the construction of the dual graph.

  • 7/30/2019 Fien Berg

    28/41

    The Network Dual Graph

    Dual Graph: Set up conditional independence graph, $G^* = \{V^*, E^*\}$, whose nodes are edges from original graph, $G = \{V, E\}$.

    $X_{ij}$ and $X_{i'j'}$ are conditionally independent given the other r.v.s $X_{kl}$ iff they are not joined by an edge in $G^* = \{V^*, E^*\}$.

    G is Markov if $G^*$ contains no edge between disjoint sets (s, t) and (u, v) in V.

    Cliques in a Markov random graph are stars (edges are 1-stars) of various orders and triangles.

    If there are no edges in the dual graph $G^*$, then we essentially have the beta model.
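The dual graph construction can be sketched in a few lines. The version below builds the largest dual graph allowed by the Markov condition, joining every pair of dyads that share a node of the original graph; taking all such pairs is an assumption made for illustration, since the condition only rules out links between disjoint dyads.

```python
import itertools

def markov_dual_graph(n):
    """Dependence graph for a Markov random graph on n nodes.

    Vertices are the C(n,2) possible edges (dyads) of the original graph;
    two dyads are joined when they share a node of the original graph.
    """
    dyads = list(itertools.combinations(range(n), 2))
    links = [
        (d1, d2)
        for d1, d2 in itertools.combinations(dyads, 2)
        if set(d1) & set(d2)
    ]
    return dyads, links

dyads, links = markov_dual_graph(4)
print(len(dyads), len(links))
```

Dyads such as (0, 1) and (2, 3) have no node in common, so they are never linked in this dual graph.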

  • 7/30/2019 Fien Berg

    29/41

    Undirected Markov Random Graph Models

    Theorem

    For homogeneous (exchangeable) graphs, distribution of X now satisfies the pairwise Markov property iff

    $$\Pr\{X = x\} \propto \exp\Big\{ \sum_{k=1}^{N_V - 1} \theta_k S_k(x) + \theta_\tau T(x) \Big\}$$

    where $S_k(x)$ is no. of k-stars and $T(x)$ is no. of triangles.

    Many other ERGMs don't have this property, e.g., those

    with alternating k-stars and alternating triangles.
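The star and triangle counts that enter a Markov random graph model can be computed from node degrees and triples of nodes. This is a minimal sketch assuming an undirected simple graph on at least two nodes; the function name is hypothetical. It treats S_1 as the number of edges, so the raw count of 1-stars is halved.

```python
import itertools
import math

def star_and_triangle_counts(n, edges):
    """Return ([S_1, ..., S_{n-1}], T) for an undirected simple graph."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    # Every k-subset of a node's neighbours forms one k-star at that node;
    # an edge is seen from both ends, so S_1 is halved to give the edge count.
    stars = [sum(math.comb(len(adj[v]), k) for v in adj) for k in range(1, n)]
    stars[0] //= 2
    triangles = sum(
        1
        for a, b, c in itertools.combinations(range(n), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    return stars, triangles

print(star_and_triangle_counts(4, [(0, 1), (0, 2), (0, 3)]))
```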

  • 7/30/2019 Fien Berg

    30/41

    Directed Markov Random Graph Models

    Analogous approach to construction of dual graph for

    situation with directed edges.

    Now the vertices of $G^*$ are paired corresponding to dyads from G.

    Cliques are the original dyads, and various stars and

    triangular structures.

    If model contains no stars or triangles, it reduces to p1.

  • 7/30/2019 Fien Berg

    31/41

    Open Problems

    Some network models have nice, non-degenerate behavior.

    Dyadic independence models such as p1.

    Simple blockmodels.

    Blockmodels that build on dyadic independence structures.

    Question: Are there other ERGMs, and in particular Markov Random Graph Models, that are nice?

    Question: Where does decomposability in dual graph fit in?

    Question: Can we derive Markov bases for ERGMs other than the simple ones mentioned above?

    Question: Is there a role for AS in other aspects of these considerations?

  • 7/30/2019 Fien Berg

    32/41

    Link to Sturmfels on Ranking

    Recall relationship between p1 model with 0 and Bradley-Terry model from paired comparisons literature.

    Plackett-Luce model is the natural generalization of BT for fully-ranked data.

    Network relationships often involve ranking and censoring, e.g., "List your 5 closest friends."

    This is the methodology in the Add Health Network.

    For those with 5 or fewer friends we have complete ranking.

    For those with > 5 friends we have a censored list of rankings.

    Question: What are the extensions of the p1 and other ERGMs to deal with the ranking and censoring problem?

    Question: Is there a role for AS in these considerations?
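As background for the ranking question, here is a minimal sketch of the Plackett-Luce model, where items are chosen one at a time with probability proportional to a positive worth parameter. The worth values are made up. The second function treats a censored "top k" list as the first k choice stages, which is one common way to handle censoring and is an assumption here, not a method from the talk.

```python
import itertools

def plackett_luce_prob(ranking, worth):
    """Probability of a full ranking (best first) under Plackett-Luce."""
    remaining = sum(worth[item] for item in ranking)
    prob = 1.0
    for item in ranking:
        prob *= worth[item] / remaining  # choose `item` among those left
        remaining -= worth[item]
    return prob

def top_k_prob(top, worth):
    """Probability that `top` is the ordered list of the k best items."""
    remaining = sum(worth.values())
    prob = 1.0
    for item in top:
        prob *= worth[item] / remaining
        remaining -= worth[item]
    return prob

worth = {"a": 2.0, "b": 1.0, "c": 1.0}  # hypothetical worth parameters
total = sum(plackett_luce_prob(r, worth) for r in itertools.permutations(worth))
print(total, top_k_prob(("a",), worth))
```

With only two items the ranking probability reduces to the Bradley-Terry form worth_i / (worth_i + worth_j).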

  • 7/30/2019 Fien Berg

    33/41

    Roles for Latent Variables

    For graphical models for variables:

    Natural for many models, e.g., HMMs.

    Arise naturally in hierarchical Bayesian structures.

    Hyperparameters are latent quantities.

    For models for individuals/units in networks:

    Random effects versions of node-specific models such as p1.

    Arise naturally in hierarchical Bayesian approaches, such as Mixed Membership Stochastic Blockmodels and latent space models.

    Can also use latent structure to infer network links from data on variables for individuals, e.g., as in relational topic models.

    Is there a role for AS methods here?

  • 7/30/2019 Fien Berg

    34/41

    Ex: National Longitudinal Study of Adolescent Health

    Add Health

    Follows nationally representative sample of U.S. adolescents in grades 7-12 from 1994-95 school year into young adulthood with 4 in-home interviews, the latest in 2008.

    Combines data on respondents' social, economic, psychological and physical well-being with contextual data on family, neighborhood, community, school, friendships, peer groups, and romantic relationships.

    Key notion is that models not only need to be longitudinal (dynamic) but also link to variables associated with nodes and/or edges.

    Models for simple dyadic relationships are not sufficient.

  • 7/30/2019 Fien Berg

    35/41

    Health Friendship Network Data, Hunter et al. (2008)

    Wave I:

  • 7/30/2019 Fien Berg

    36/41

    Role of Time/Dynamics

    For graphical models for variables:

    Time gives ordering to variables and assists in causal models.

    Note distinction between position of underlying latent quantity over time and the actual manifest measurement associated with it, which is often measured retrospectively.

    Dynamic models for individual-based networks:

    Continuous-time stochastic process models for event data, perhaps aggregated into chunks.

    Discrete-time transition models, perhaps embedded into continuous time process, e.g., see Hanneke et al. (2010).

  • 7/30/2019 Fien Berg

    37/41

    Drawing Inferences From Subnetworks and Subgraphs

    Inferences from Subgraphs

    Conditional independence structure allows for local message passing and inference from cliques and regular subgraphs when there are separator sets that isolate components.

    Interpretation in terms of GLM regression coefficients always depends on the other variables in the model.

    Inferences from Subnetworks

    Most properties observed in subnetworks don't generalize to full network, and vice versa, e.g., power laws for degree distributions.

    Problem is dependencies among nodes and boundary effects for subnetworks.

    Missing edges are generally not missing at random, except for some sampling settings, e.g., see Handcock and Gile (2010).

    The foregoing suggests that we can't use cross-validation for all but the simplest network models.

  • 7/30/2019 Fien Berg

    38/41

    Summary

    Two types of settings:

    Variables Individuals

    Directed a b

    Undirected c d

    For a and c we use conditional independence ideas to model probabilistic relations among variables.

    For b and d we use the graph to represent observed data.

    Independence comes into play in network settings only via

    dyadic independence.

    ERGMs have heuristic appeal but often display degenerate behavior.

    Markov Random Graph models invoke the Markov property we inherit from more traditional graphical model setting, but whether something nice flows from Markov Random Graph structure remains an open issue.

  • 7/30/2019 Fien Berg

    39/41

    References

    Airoldi, E. M., Blei, D. M., Fienberg, S. E., and Xing, E. P. (2008) Mixed Membership Stochastic Blockmodels. Journal of Machine Learning Research, 9, 1981-2014.

    Bickel, P. and Chen, A. (2009) A Nonparametric View of Network Models and Newman-Girvan and Other Modularities. PNAS, 106 (50), 21068-21073.

    Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975) Discrete Multivariate Analysis: Theory and Practice. MIT Press. Reprinted by Springer (2007).

    Chatterjee, S. and Diaconis, P. (2011) Estimating and Understanding Exponential Random Graph Models. arXiv:1102.2650v3.

    Cowell, R. G., Dawid, A. P., Lauritzen, S. L., and Spiegelhalter, D. J. (1999) Probabilistic Networks and Expert Systems. Springer.

    Frank, O. and Strauss, D. (1986) Markov Graphs. Journal of the American Statistical Association, 81, 832-842.

  • 7/30/2019 Fien Berg

    40/41

    References II

    Goldenberg, A., Zheng, A. X., Fienberg, S. E., and Airoldi, E. M. (2010) A Survey of Statistical Network Models. Foundations and Trends in Machine Learning, 2 (2), 129-233.

    Handcock, M. S. and Gile, K. J. (2010) Modeling Social Networks from Sampled Data. Annals of Applied Statistics, 4, 5-25.

    Hanneke, S., Fu, W. and Xing, E. P. (2010) Discrete Temporal Models of Social Networks. Electronic J. Statistics, 4, 585-605.

    Hunter, D. R., Goodreau, S. M., and Handcock, M. S. (2008) Goodness of Fit of Social Network Models. J. American Statistical Association, 103, 248-258.

    Lauritzen, S. (1996) Graphical Models. Oxford Univ. Press.

    Kolaczyk, E. D. (2009) Statistical Analysis of Network Data: Methods and Models. Springer.

  • 7/30/2019 Fien Berg

    41/41

    References III

    Petrovic, S., Rinaldo, A., and Fienberg, S. E. (2010) Algebraic Statistics for a Directed Random Graph Model with Reciprocation. Proceedings of the Conference on Algebraic Methods in Statistics and Probability, Contemporary Mathematics Series, AMS, 261-283.

    Rinaldo, A., Fienberg, S. E. and Zhou, Y. (2009) On the Geometry of Discrete Exponential Families with Application to Exponential Random Graph Models. Electronic J. Statistics, 3, 446-484.

    Rinaldo, A., Petrovic, S., and Fienberg, S. E. (2011) On the Existence of the MLE for a Directed Random Graph Network Model with Reciprocation. arXiv:1010.0745v1

    Zachary, W. W. (1977) An Information Flow Model for Conflict and Fission in Small Groups. J. Anthropological Research, 33, 452-473.
