Some Advances in Role Discovery in Graphs
Sean Gilpin, University of California, Davis
Chia-Tung Kuo, University of California, Davis
Tina Eliassi-Rad, Rutgers University
Ian Davidson, University of California, Davis
September 8, 2016
arXiv:1609.02646v1 [cs.AI] 9 Sep 2016
Abstract
Role discovery in graphs is an emerging area that allows analysis of complex graphs in an intuitive way. In contrast to other graph problems such as community discovery, which finds groups of highly connected nodes, the role discovery problem finds groups of nodes that share similar graph topological structure. However, existing work has two severe limitations that prevent its use in some domains. Firstly, it is completely unsupervised, which is undesirable for a number of reasons. Secondly, most work is limited to a single relational graph. We address both limitations in an intuitive and easy-to-implement alternating least squares framework. Our framework allows convex constraints to be placed on the role discovery problem, which can provide useful supervision. In particular, we explore supervision to enforce i) sparsity, ii) diversity, and iii) alternativeness. We then show how to lift this work to multi-relational graphs. A natural representation of a multi-relational graph is an order-3 tensor (rather than a matrix), and a Tucker decomposition of this tensor allows us to find complex interactions between collections of entities (E-groups) and the roles they play for a combination of relations (R-groups). Existing Tucker decomposition methods in tensor toolboxes are not suited for our purpose, so we create our own algorithm, which we demonstrate is pragmatically useful.
1 Introduction
Role discovery is a developing area that allows the simplification of graphs in a user-interpretable way. Consider a graph of n nodes specified in an adjacency matrix A. Earlier efforts convert this matrix into a new n × f matrix V so that each node in the graph has a list of f features [22]. Role discovery then decomposes V so that each node/user is mapped to a combination of roles (denoted by the n × r matrix G) and each role is defined with respect to the f features (denoted by the r × f matrix F). This is accomplished by performing a non-negative matrix factorization, as shown below:
$$\operatorname*{argmin}_{G,F}\ \|V - GF\|^2 \quad \text{subject to: } G \ge 0,\ F \ge 0 \qquad (1)$$
The n × r matrix G, when read row-wise, indicates which of the r roles each node plays and to what degree. The r × f matrix F, when read row-wise, defines each of the r roles with respect to the f features. The entries in G and F are non-negative real numbers, signifying that each node can play each role to varying degrees and that different features define a role to varying degrees. This simplification of graphs into roles is not only intuitive for a domain expert, but has also been shown to be useful in a number of settings, including prediction, transfer learning, and sense making [21].
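As an illustration of Equation 1, the sketch below factors a small non-negative node-feature matrix V into G and F. It uses the classic Lee-Seung multiplicative updates, a standard NMF solver swapped in here for brevity; it is not the constrained alternating least squares method developed later in this paper. The helper name `nmf_roles`, the random initialization, the iteration count, and the toy data are all illustrative assumptions.

```python
import numpy as np

def nmf_roles(V, r, n_iter=500, eps=1e-10, seed=0):
    """Hypothetical helper, not the paper's solver: approximate a non-negative
    n x f matrix V by G @ F with G (n x r) >= 0 and F (r x f) >= 0, using
    Lee-Seung multiplicative updates for ||V - GF||^2."""
    rng = np.random.default_rng(seed)
    n, f = V.shape
    G = rng.random((n, r)) + eps          # strictly positive start
    F = rng.random((r, f)) + eps
    errors = []
    for _ in range(n_iter):
        # Multiplicative updates never flip the sign of an entry,
        # so G and F stay non-negative throughout.
        F *= (G.T @ V) / (G.T @ G @ F + eps)
        G *= (V @ F.T) / (G @ F @ F.T + eps)
        errors.append(np.linalg.norm(V - G @ F) ** 2)
    return G, F, errors

# Toy node-feature matrix: 6 nodes x 4 features generated from 2 planted roles.
G_true = np.array([[1, 0], [1, 0], [0.8, 0.2], [0, 1], [0, 1], [0.3, 0.7]])
F_true = np.array([[3.0, 1.0, 0.0, 0.0], [0.0, 0.5, 2.0, 4.0]])
V = G_true @ F_true

G, F, errors = nmf_roles(V, r=2)
primary_role = G.argmax(axis=1)   # read G row-wise: each node's dominant role
print(primary_role, round(errors[-1], 6))
```

Reading `G` row-wise as in the paragraph above gives each node's mix of roles; `argmax` is just one simple way to summarize that mix as a single dominant role.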
Limitations in Existing Work. However, all work developed so far has two limitations. Firstly, role discovery has typically been completely unsupervised, in that the domain expert cannot easily inject their expertise and expectations into the simplification. Secondly, role discovery is typically performed on a single relational graph. We now discuss each limitation in turn.
Consider a domain expert who is looking for the simplest explanation of a graph during the exploratory phase of analysis. Existing work offers no way to emphasize this simplicity apart from requiring a small number of roles. Other forms of parsimonious guidance, such as requiring that a node be assigned to only a few roles or that each role be defined by only a small set of features, are desirable but currently not possible. Similarly, if a decomposition yields a set of roles that are not actionable, not interesting, or already known, the domain expert cannot enforce an alternative set of roles. These two recent trends in data mining, namely adding positive and negative guidance, have been shown to have wide-scale application in the data mining literature [5][36], but to our knowledge they have not been applied to role discovery. Hence, this work is the first paper exploring guided role discovery.
To our knowledge, previous work in role discovery focuses only on simple graphs with a single relation type. However, many datasets are either directly multi-relational or can be modeled as a multi-relational graph. Consider an email graph that models just one relation, sent-mail-to. This graph greatly masks the complexity of the underlying behavior occurring in the network. Many more insights could be found if, say, the topic of the email were also considered, producing a multi-relational graph sent-mail-to-y-about-x. Similarly, consider a node-attributed social network graph, that is, one in which each node has multiple labels. Such a graph can augment the basic friend relation with multiple relations such as female-friend, school-friend or nearby-friend, by placing an edge between nodes that are friends and also share label values.
Challenges. The challenge with adding guidance to role discovery is how to do so whilst still yielding an efficiently solvable algorithm. Pre-processing the graph or post-processing the results is undesirable; instead, it is preferable to inject the guidance into the underlying algorithm that finds the roles. Alternating least squares (ALS) is a popular and well-understood algorithm for non-negative matrix factorization (NMF) for role discovery, and the challenge is to add guidance to this algorithm.
The challenge of role discovery in multi-relational graphs is twofold: the first is representational and the second is algorithmic.
Representational Challenges. How should a multi-relational graph be represented for effective and efficient role discovery? In Figure 6, we show how an order-3 tensor can compactly represent such a graph. Here, the first mode represents the entities (i.e., nodes) in the graph, the second mode is the features of each entity (obtained from our ReFeX package [22]), and the last mode represents the relations. The existing work on single-relation graphs uses non-negative matrix factorization. Its analog for our multi-relational graph tensor, the PARAFAC (parallel factor) tensor decomposition, has several serious limitations. In particular, it requires each group of nodes to play exactly one role for exactly one set of relations. This is not due to the rank-one decomposition assumption, but rather to the simplified form of the decomposition. This cardinality limitation greatly restricts what can be found. Consider our aforementioned example of an email network: we could perhaps find that a group of people play the role of a broker for a particular email topic, office-party. Though useful, if those people also play a different role for the same email topic, a PARAFAC decomposition could not find it. Similarly, and most importantly, if another group were to play the role of a peripheral figure for the exact same topic (office-party), PARAFAC would not be able to discover this relation. It is precisely these types of complex multi-way interactions between people, roles and relations that we wish to discover. Hence, we do not consider PARAFAC decompositions, even though they would be the natural extension of our earlier work on role discovery to multi-relational graphs. Instead, we use a Tucker decomposition, shown in Figure 7, whose addition of a core tensor allows multiple groups of entities (E-groups) to play multiple roles for multiple groups of relations (R-groups). This allows very complex insights into the behavior in the graph to be found, but how to interpret and use this core is a critical challenge for our work.
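To make the representational contrast concrete, the following sketch stacks one node-feature matrix per relation into an entities × features × relations tensor and reconstructs a tensor from Tucker factors with `numpy.einsum`. A PARAFAC model is the special case of a Tucker model whose core is superdiagonal, so E-group t can only combine with role t and R-group t; a full core lets any E-group play any role for any R-group. The sizes, the factor names (A for E-groups, B for roles, C for R-groups), and the random values are illustrative assumptions; this is not the MRD algorithm itself.

```python
import numpy as np

def tucker_reconstruct(core, A, B, C):
    """X_hat[i, j, k] = sum_{p,q,s} core[p, q, s] * A[i, p] * B[j, q] * C[k, s]."""
    return np.einsum("pqs,ip,jq,ks->ijk", core, A, B, C)

rng = np.random.default_rng(0)
n, f, m = 5, 4, 3                      # entities, features, relations
# One n x f feature matrix per relation (e.g. per email topic), stacked on mode 3.
per_relation = [rng.random((n, f)) for _ in range(m)]
X = np.stack(per_relation, axis=2)     # shape (n, f, m)

p, q, s = 2, 2, 2                      # numbers of E-groups, roles, R-groups
A = rng.random((n, p))                 # entity   -> E-group memberships
B = rng.random((f, q))                 # feature  -> role loadings
C = rng.random((m, s))                 # relation -> R-group memberships

# Full Tucker core: E-group a may play role b for R-group c with any weight.
core_tucker = rng.random((p, q, s))

# PARAFAC viewed as Tucker with a superdiagonal core (needs p == q == s).
core_parafac = np.zeros((p, q, s))
for t in range(p):
    core_parafac[t, t, t] = 1.0

X_tucker = tucker_reconstruct(core_tucker, A, B, C)
X_parafac = tucker_reconstruct(core_parafac, A, B, C)
print(X.shape, X_tucker.shape, X_parafac.shape)
```

Every off-diagonal entry of `core_tucker` is an interaction that the superdiagonal PARAFAC core forces to zero, which is exactly the "same group, second role" or "second group, same topic" case described above.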
Algorithmic Challenges. The second challenge is that the existing Tucker decompositions found in the popular Kolda Tensor Toolbox and Bro N-way Toolbox are not suited for our purpose. Existing toolboxes implement an orthogonality constraint on the factor matrices, which in our context (where the tensor contains only non-negative values) means each group of entities must be distinct (i.e., non-overlapping) from every other group, and the same holds for roles and groups of relations. Similarly, existing toolboxes typically do not enforce a non-negativity constraint on the core of the Tucker decomposition, meaning that if we used them we would have entities playing negative roles, which does not make intuitive sense. Hence, to better fit our need for interpretable insights on overlapping groups of entities (a.k.a. E-groups), roles, and groups of relations (a.k.a. R-groups), we develop our own algorithm, Multi-relational Role Discovery (MRD), shown in Algorithm 1.
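Why does orthogonality plus non-negativity force non-overlapping groups? The inner product of two non-negative vectors is a sum of non-negative terms, so it is zero only if no coordinate is positive in both vectors. The small check below illustrates this; the helper names and example membership vectors are hypothetical.

```python
import numpy as np

def supports_disjoint(u, v):
    """True if no coordinate is non-zero in both u and v."""
    return not np.any((u != 0) & (v != 0))

def is_orthogonal(u, v, tol=1e-12):
    """True if the inner product of u and v is (numerically) zero."""
    return abs(float(u @ v)) <= tol

# Hypothetical non-negative group-membership vectors over 5 entities.
g1 = np.array([0.9, 0.4, 0.0, 0.0, 0.0])
g2 = np.array([0.0, 0.0, 0.7, 0.2, 0.0])
g3 = np.array([0.0, 0.3, 0.5, 0.0, 0.1])   # overlaps g1 (entity 1) and g2 (entity 2)

for u, v in [(g1, g2), (g1, g3), (g2, g3)]:
    # For non-negative vectors the two tests always agree.
    assert is_orthogonal(u, v) == supports_disjoint(u, v)
    print(is_orthogonal(u, v))
```

So an orthogonal, non-negative factor matrix can only describe groups that share no members, which is why orthogonality-based toolboxes cannot express overlapping E-groups.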
Our work makes several contributions to the field of role discovery in graphs. With respect to guided role discovery we show:
• We provide a framework to encode guidance as a series of convex optimization problems, each of which can be efficiently solved by our alternating least squares (ALS) algorithm. All data sets and code will be made available once the paper is accepted.
• Within our framework we explore guidance in the form of sparsity, diversity and orthogonality/alternativeness, but other types of guidance are possible.
• We show that sparsity and diversity yield improved predictive accuracy for the identity resolution task across multiple graphs.
• We show that alternative roles exist in social networks (such as a YouTube graph) and, in particular, that these roles are very different from the known communities in the data.
With respect to multi-relational role discovery we show:
• We propose and study role discovery in multi-relational graphs using tensors and our novel MRD Tucker decomposition algorithm (see Sections 6 and 7).
• We show how to analyze the core tensor of the Tucker decomposition in a multitude of visual and analytic ways to explain the complex interactions occurring (see Section 8).
• We create and measure macro-level properties of the interaction graph, such as the simplicity, sharing and stability of the graph with respect to roles (see Table 5).
• We use a constrained formulation of our algorithm that allows transferring knowledge (i.e., roles) from one graph to explain another graph (see Section 9.2). This allows understanding temporal shifts in roles (see Figure 18).
In the next section we describe related work; Section 3 then presents an algorithm for incorporating convex constraints in non-negative matrix decomposition, which allows us to encode guidance in a flexible way. Section 4 presents how convex constraints can naturally encode guidance in the form of sparsity and diversity on both the role assignment matrix (G) and the role explanation matrix (F). We also present how these constraints can encode the notion of alternativeness, to find a set of roles different from another set that is, for instance, non-actionable or trivial. Our experiments on guidance, in Section 5, demonstrate the usefulness of these forms of guidance in a number of applications and real-world graphs. We show how sparsity and diversity guidance can improve prediction performance for the application of identity resolution via roles. We also show how alternativeness can be used to find a set of roles that is an alternative to the underlying community structure. Next, in Section 6 we show how multi-relational role discovery can be formulated as a tensor decomposition problem. In particular, it can be modeled using a non-negative Tucker decomposition, and in that section we also propose our Tucker decomposition algorithm. The Tucker decomposition can capture many of the complex interactions between nodes, the roles they play, and the relations they play them for; these interactions are captured in the core tensor of the decomposition. In Section 8 we discuss how the core of the Tucker decomposition can be interpreted in a number of ways, including as a heterogeneous hypergraph on the space of groups of nodes, groups of features (roles) and groups of relations. Our work opens up many possible novel uses, and we experimentally focus on two, discussed in Section 9: i) macroscopic properties of the graphs in terms of roles, and ii) transfer settings between multiple graphs.
2 Related Work
The basis for role discovery in graphs using non-negative matrix factorization (NMF) was first proposed in a series of papers at KDD [22][21]. The ReFeX method [22] described a recursive method to take an n × n adjacency matrix (A) and compute a set of f salient features for each of the n nodes, represented as a matrix V. The RolX method [21] used NMF to simplify the features into a set of roles and explored their use for graph matching, sense making and transfer learning. Many previous works had applied NMF to other data mining problems (e.g., [40][28]), but theirs was the first to apply it to role discovery. Other methods for role discovery, which do not scale to huge graphs, include Bayesian frameworks using MCMC sampling methods [37] and semi-supervised role labeling [17].
The addition of guidance to matrix decomposition is a relatively new area, with most work involving spatial data and properties such as unimodality, as we have done for tensors [12]. Of course, much work exists on very basic constraints such as non-negativity and minimal-rank decompositions. Work on constraints for matrix decomposition often uses the term in senses different from our own. For example, in [30] the authors propose the use of labeled information to guide the decomposition. Perhaps the closest to our own work is the use of sparseness constraints in NMF [23].
To the best of our knowledge, the encoding of guidance for role discovery and the encoding of diversity and alternativeness constraints for NMF, as described in this paper, have not been addressed before. However, the notions of guidance and “alternativeness” are popular in the clustering field, with work by ourselves and others [5][36].
3 A Constrained NMF Framework for Encoding Guidance into Role Discovery
In this section, we discuss our algorithm for solving the guided role discovery problem. We present a general algorithm that is well suited to large-scale problems and can be extended to different forms of guidance. The different forms of supervision (described in Section 4) are all solvable using this algorithm.
Our algorithm for solving the guided role discovery problem is a constrained NMF approach used to find the decomposition shown in Equation 2. Like many unconstrained NMF solvers, it uses the alternating least squares approach [35, 7]. Non-negative least squares is a well-studied problem and can be used to find an NMF solution by solving for one matrix at a time (G or F) while holding the other constant, which is generally known as alternating least squares (ALS). NMF is known to be intractable, and the ALS approach is not guaranteed to find global solutions but will converge to a local minimum. In this work, we add further constraints to the problem and therefore need more sophisticated methods.
The method we chose was motivated by gradient projection methods, which are known to be well suited to quickly finding good but not highly accurate solutions for large problems, sacrificing some of the theoretical convergence guarantees of methods such as interior point methods [6]. Projected gradient descent methods iteratively find better points by following the gradient of the objective function and then finding the closest point that meets the constraints. Since the objective we are solving is least squares, we have a closed-form solution to the unconstrained minimum, from which we subsequently find the closest constrained solution. It is known that for a class of constrained least squares problems, this approach leads to an exact global solution in one iteration (see Lemma 1).
Therefore, our algorithm has the advantage that each subproblem (but not the entire problem) can be solved exactly by reducing it to an unconstrained least squares problem [39][3] and a Euclidean projection problem [14][32], both of which have efficient solutions. Additionally, this approach to optimization (projected gradient descent) has been shown in the past to work well on large-scale problems, at the expense of accuracy, and is used by state-of-the-art solvers [31].
The remainder of this section is organized as follows. First, we formally describe the convex constrained NMF problem and discuss how ALS can be used to solve it. Then, we explain how ALS can also be used to solve for individual role assignment vectors as well as role definition vectors. Finally, we describe how each ALS subproblem over a definition/assignment vector can be solved using a projection method, by first solving an unconstrained least squares problem and then finding the closest point in the constrained space.
The Constrained NMF Problem. In Equation 2, there are two variables, G and F, that are being simultaneously optimized. If either is treated as a constant, the problem becomes convex and can be solved exactly using any method for solving convex optimization problems. One can alternate between solving for G and F this way until convergence. Although each step finds a global optimum of its subproblem, the overall procedure (alternating optimization) is not guaranteed to find a global minimum of the original problem in Equation 2. In the following, we describe our method for transforming the formulation into a series of convex programming problems, which are generally easy to solve.
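$$
\begin{aligned}
\min_{G,F}\quad & \|V - GF\|^2 \\
\text{subject to}\quad & g_i(G) \le d_i^G, \quad i = 1,\dots,t_G \\
& f_i(F) \le d_i^F, \quad i = 1,\dots,t_F
\end{aligned}
\qquad (2)
$$
where the $g_i$ and $f_i$ are convex functions.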
An ALS Formulation. Rather than alternating between solving for the entire matrices G and F, we can instead solve for one column of G or one row of F at a time. This is possible if the convex constraints can be specified in terms of these columns and rows, which is the case in this work. Without loss of generality, Equation 3 shows an individual sub-optimization problem in terms of one of the columns of G, denoted x.
$$G_{\bullet k} = \operatorname*{argmin}_{x}\ \|R - x F_{k\bullet}\|^2 \quad \text{subject to: } g_i(x) \le d_i^G,\ i = 1,\dots,t_G \qquad (3)$$
In Equation 3, R is the residual left by all other factors not being solved for, i.e., V minus the sum of outer products of the corresponding columns of G and rows of F: $R = V - \sum_{j \neq k} G_{\bullet j} F_{j\bullet}$. $F_{k\bullet}$ is the kth row of the role/feature explanation matrix, which corresponds to the kth column of the role assignment matrix. With this formulation, we alternate between learning a single role assignment and learning the corresponding role definition. Next, we explain how we solve the convex constrained problem shown in Equation 3.
Solving the Constrained Least Squares Problem. Our projection method is as follows. First, solve Equation 3 with all constraints removed using standard least squares solvers. Second, find the point closest to the unconstrained solution that satisfies the given constraints. This projection method takes advantage of standard and very fast least squares solvers, and the subsequent nearest-feasible-point problem is relatively simple to solve. In addition, Lemma 1 shows that performing these two steps exactly solves the original problem in Equation 3. Applications of this result and its proof can be found in [10][20].
Lemma 1 (Projection Equivalence Result). The following constrained optimization problem:
$$\min_{x}\ \|B - x a\|^2 \quad \text{subject to: } c_i(x) \le d_i,\ i = 1,\dots,n \qquad (4)$$
where the $c_i$ are convex functions of x, is equivalent to:
$$\min_{x}\ \|x^* - x\|^2 \quad \text{subject to: } c_i(x) \le d_i,\ i = 1,\dots,n \qquad (5)$$
where $x^*$ is the optimal solution of the problem in Equation 4 without constraints.
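A short calculation shows why Lemma 1 holds for the rank-one least-squares term used in Equation 3. The shapes are our assumption, chosen to match Equation 3 with $B = R$ and $a = F_{k\bullet}$: x is an n × 1 column, a is a non-zero 1 × f row, and B is n × f. Expanding the squared Frobenius norm and completing the square gives

$$
\begin{aligned}
\|B - x a\|^2 &= \|B\|^2 - 2\, x^{\top} B a^{\top} + (a a^{\top})\, \|x\|^2 \\
&= (a a^{\top})\, \Big\| x - \frac{B a^{\top}}{a a^{\top}} \Big\|^2 + \|B\|^2 - \frac{\|B a^{\top}\|^2}{a a^{\top}},
\end{aligned}
$$

so $x^* = B a^{\top} / (a a^{\top})$, and on any feasible set the objective of Equation 4 equals the positive constant $a a^{\top}$ times the objective of Equation 5 plus a term that does not depend on x. The two problems therefore have the same minimizers. For example, if the only constraint is $x \ge 0$, the constrained solution is simply $\max(x^*, 0)$ taken elementwise.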
This leads to the algorithm for convex constrained NMF presented in Figure 1. Like ALS for unconstrained NMF, this heuristic is not guaranteed to reach a global optimum, even though all subproblems are solved exactly. However, because each subproblem is solved exactly, no step increases the global objective (Equation 2). Thus, in practice the algorithm finds local minima that meet all specified constraints.
The advantage of solving for one role at a time, rather than for the entirety of G or F as is generally done with ALS, is that it allows the problem to be broken down into smaller parts that fit into fast solvers. In general, projection methods have been found to be better suited to larger problems, and we found this to be the case as well. Using this method allows us to solve much larger problems than we had previously been able to with standard constrained optimization solvers [12]. The final constrained optimization problem (i.e., the closest constrained point problem) is simple enough that, even for medium-sized problems, we could use high-level solvers such as CVX [11][19], which makes experimenting with new types of constraints very simple.
4 Framework for Flexible Supervision
In the previous section, we discussed a novel and general algorithm that can easily handle convex constraints. Convex constraints can encode a variety of useful forms of guidance. In this section, we show how they can be used to enforce sparsity, diversity and alternativeness. In the experimental section, we show applications that exploit these forms of guidance.
4.1 Sparsity
The area of sparsity has recently attracted much attention. In a general context, sparsity has been shown to have two main benefits: (1) parsimony and (2) improved predictive performance, with the latter being motivated by Occam’s
Table 1: Summary of the effects of constraints on both the role assignment G and the role definitions F (see Section 4 for the formulation of each constraint type).
Sparsity
• Role assignment: Encourages role assignments to be more definitive. Increasing the strength of the constraint reduces the number of nodes that have minority membership in a role.
• Role definition: Increases the interpretability of role definitions by ensuring that the definitions use only the features most strongly correlated with each role. Increasing the strength of the constraint decreases the likelihood that features with small explanatory benefit are included.
Diversity
• Role assignment: Roles cannot have memberships that are too similar; no two roles can have exactly the same membership assignment. Increasing the strength of the constraint limits the amount of allowable overlap in assignments.
• Role definition: Roles cannot have definitions that are too similar; no two roles can have redundant explanations, and increasing the strength of the constraint ensures that roles must be explained with completely different sets of features.
Alternative
• Role assignment: Finds a set of roles that lends itself to a different role assignment than a given role assignment. Increasing the strength of the constraint decreases the allowable similarity between the two.
• Role definition: Learns a role definition matrix that is significantly different from a given role definition. Increasing the strength of the constraint ensures that the definitions must be very dissimilar.
Inputs:
• V: node feature matrix describing n nodes by f topological structure features.
• g_i(x), f_i(x): convex constraints on the columns of G and the rows of F, respectively.
• r: number of roles (methods for learning r are described in previous work [21]).
Outputs:
• G: role assignment matrix satisfying all constraints.
• F: role definition matrix satisfying all constraints.
Algorithm:
while reconstruction error decreases do {
  for k = 1 . . . r // Recalculate each role.
  {
    1. Calculate R = V − G•(≠k) F(≠k)•, i.e., V − Σ_{j≠k} G•j Fj•
    2. Calculate G•k by solving for x as follows:
       (a) x∗ = argmin_x ||R − x Fk•||²
       (b) G•k = argmin_x ||x∗ − x||² s.t. g_i(x) ≤ ε_i, ∀i
    3. Calculate Fk• by solving for x as follows:
       (a) x∗ = argmin_x ||R − G•k x||²
       (b) Fk• = argmin_x ||x∗ − x||² s.t. f_i(x) ≤ ε_i, ∀i
  }
}
Figure 1: Our algorithm, which will be used to encode all forms of guidance described in Section 4. The algorithm uses a least squares approach and allows additional convex constraints to be added to the NMF formulation.
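A compact NumPy sketch of the loop in Figure 1 follows. For each role k it forms the residual R, solves the unconstrained rank-one least-squares problem in closed form, and then applies a caller-supplied Euclidean projection (`project_g`, `project_f`). These hook names, the default non-negativity projection, the relative stopping tolerance, and the random initialization are our assumptions, not details given in the paper; any projection onto a convex set, including one computed by a generic solver, could be plugged in. This one-rank-one-block-at-a-time structure is closely related to what the NMF literature calls hierarchical ALS (HALS).

```python
import numpy as np

def nonneg(x):
    """Euclidean projection onto the non-negative orthant."""
    return np.maximum(x, 0.0)

def constrained_als(V, r, project_g=nonneg, project_f=nonneg,
                    max_iter=200, tol=1e-9, seed=0):
    """Sketch of Figure 1. project_g / project_f are assumed to return the
    Euclidean projection of a vector onto its (convex) constraint set."""
    rng = np.random.default_rng(seed)
    n, f = V.shape
    G = rng.random((n, r))
    F = rng.random((r, f))
    history = []
    for _ in range(max_iter):
        for k in range(r):
            # Step 1: residual of all roles except k.
            R = V - G @ F + np.outer(G[:, k], F[k, :])
            # Step 2: closed-form x* = R F_k.^T / (F_k. F_k.^T), then project.
            fk = F[k, :]
            if fk @ fk > 0:
                G[:, k] = project_g(R @ fk / (fk @ fk))
            # Step 3: closed-form x* = G_.k^T R / (G_.k^T G_.k), then project.
            gk = G[:, k]
            if gk @ gk > 0:
                F[k, :] = project_f(gk @ R / (gk @ gk))
        err = np.linalg.norm(V - G @ F) ** 2
        # Stop once the reconstruction error no longer decreases meaningfully.
        converged = bool(history) and history[-1] - err <= tol * history[-1]
        history.append(err)
        if converged:
            break
    return G, F, history

# Usage on planted non-negative rank-3 data (illustrative values only).
rng = np.random.default_rng(1)
V = rng.random((30, 3)) @ rng.random((3, 8))
G, F, history = constrained_als(V, r=3)
print(len(history), history[-1])
```

Because each projected step solves its subproblem exactly, the recorded error sequence should be non-increasing up to floating-point noise, which is a cheap sanity check when experimenting with new projections.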
razor. Sparse learning formulations exist for many learning settings, such as linear regression (LASSO), kernel methods (SVM) and covariance estimation.
In our work, we can place sparsity constraints on both the G and F matrices, leading to the following optimization problem:
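$$
\begin{aligned}
\operatorname*{argmin}_{G,F}\quad & \|V - GF\|^2 \\
\text{subject to:}\quad & G \ge 0,\; F \ge 0 \\
& \forall i:\ \|G_{\bullet i}\|_1 \le \varepsilon_G, \qquad \forall i:\ \|F_{i\bullet}\|_1 \le \varepsilon_F
\end{aligned}
\qquad (6)
$$
where $\varepsilon_G$ and $\varepsilon_F$ are upper bounds for the sparsity constraints (the amount of allowable density).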
Previous works have shown the effectiveness of using the L1 norm as a penalty in model learning. In our formulation the L1 penalty is encoded as a constraint rather than as a penalty in the objective, but it is known that these formulations are theoretically equivalent [8] for a suitable correspondence between the constraint bound and the penalty weight. Another twist in our formulation is that we do not constrain the entire matrix, but instead constrain each column of G and each row of F. This was done because our solver requires constraints to be formulated over only one role vector at a time. The effect of this technical difference is that the sparsity must be spread more uniformly across each role definition or role assignment, which is a benefit of this method.
Sparsity constraints on G and F have easy-to-understand, intuitive interpretations. If G is sparse, nodes are assigned to as few roles as possible, and some nodes may be assigned to no roles at all. If F is sparse, the roles are defined with respect to as few features as possible. Both of these extensions allow for a simple explanation of the data and can lead to improved prediction performance.
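Under the one-vector-at-a-time scheme of Figure 1, the sparsity constraint of Equation 6 for a single column of G (or row of F), together with non-negativity, is the set {x ≥ 0, ||x||_1 ≤ ε}. Its Euclidean projection, which is what step (b) of Figure 1 needs, has a well-known closed form: clip negative entries; if the clipped vector still exceeds the L1 budget, project instead onto the scaled simplex using the sort-based method of Duchi et al. (2008). The sketch below is our illustration of that standard projection, not code from this paper; `project_nonneg_l1` and the example vector are hypothetical.

```python
import numpy as np

def project_simplex(v, radius):
    """Euclidean projection of v onto {x >= 0, sum(x) == radius}, radius > 0
    (sort-based method of Duchi et al., 2008)."""
    u = np.sort(v)[::-1]                       # entries in decreasing order
    css = np.cumsum(u)
    idx = np.arange(1, len(v) + 1)
    # Last (0-based) index rho with u[rho] - (css[rho] - radius) / (rho + 1) > 0.
    rho = np.nonzero(u - (css - radius) / idx > 0)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)  # common shrinkage threshold
    return np.maximum(v - theta, 0.0)

def project_nonneg_l1(v, eps):
    """Euclidean projection of v onto {x >= 0, ||x||_1 <= eps}."""
    x = np.maximum(v, 0.0)
    if x.sum() <= eps:
        return x                               # L1 budget is not active
    return project_simplex(v, eps)             # budget active: land on the simplex

# Unconstrained least-squares solution for one role (hypothetical values).
x_star = np.array([0.9, -0.2, 0.5, 0.05, 0.3])
x = project_nonneg_l1(x_star, eps=1.0)
print(np.round(x, 4), x.sum())
```

The projection subtracts one common threshold from every entry and clips at zero, so small loadings (here the 0.05 feature) are driven exactly to zero. This is how a tighter ε removes features with small explanatory benefit, as summarized in Table 1.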
4.2 Diversity
In the NMF forms of role discovery, nothing prevents the roles to which nodes are assigned (i.e., the G matrix) and the role definitions (i.e., the F matrix) from being highly overlapping. This can be undesirable, particularly for the F matrix, since it means all roles are highly similar. This can be overcome by enforcing a diversity requirement so that each role uses a different set of features (for the F matrix) and nodes are assigned to different combinations of roles (for the G matrix).
Our formulation for role allocation diversity (G matrix) and role definition diversity (F matrix) makes use of orthogonality as follows:
Figure 2: Visualization of diversity constraints on the role explanation matrix F (roles × features) for the DBLP dataset. The top matrix shows the unconstrained result, the bottom matrix is constrained to be completely diverse (ε = 0), and the middle matrix shows a middle ground. From the top matrix to the bottom matrix, the number of black cells (i.e., zero values) increases, since role definitions must be explained with completely different sets of features.
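$$
\begin{aligned}
\operatorname*{argmin}_{G,F}\quad & \|V - GF\|^2 \\
\text{subject to:}\quad & G \ge 0,\; F \ge 0 \\
& \forall i \ne j:\ G_{\bullet i}^{\top} G_{\bullet j} \le \varepsilon_G \\
& \forall i \ne j:\ F_{i\bullet} F_{j\bullet}^{\top} \le \varepsilon_F
\end{aligned}
\qquad (7)
$$
where $\varepsilon_G$ and $\varepsilon_F$ are upper bounds on how angularly similar role assignments and role definitions can be to each other.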
When ε = 0, our constraint exactly matches the definition of orthogonality, and when ε > 0 the constraint can be viewed as limiting the angular similarity between two vectors. Combined with the non-negativity constraints, setting ε = 0 means that no two role definitions share any features and no two role assignments have overlapping populations. This is because, for non-negative vectors, $G_{\bullet i}^{\top} G_{\bullet j} = 0$ if and only if the two vectors do not share any non-zero entries.
Figure 2 shows such an example, where none of the three roles have any overlapping features. In the context of our solver, which solves for one vector at a time, this constraint is linear (a weighted sum).
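When Figure 1 updates role k with all other roles held fixed, the diversity constraints on G•k become linear inequalities, G•j^T x ≤ ε_G for every j ≠ k, together with x ≥ 0. Projecting onto this polyhedron has no simple closed form for general ε, so the sketch below uses Dykstra's alternating projection algorithm, a standard method for projecting onto an intersection of convex sets, in place of the general-purpose solver the paper relies on. The function names, iteration count, and example vectors are illustrative assumptions. The same routine would cover the alternative-role constraints of Section 4.3 by passing the columns of G∗ as the fixed vectors.

```python
import numpy as np

def project_halfspace(x, w, eps):
    """Euclidean projection onto the half-space {x : w @ x <= eps}."""
    excess = w @ x - eps
    if excess <= 0:
        return x
    return x - (excess / (w @ w)) * w

def project_diverse(x_star, others, eps, n_iter=1000):
    """Dykstra's algorithm: projection of x_star onto {x >= 0} intersected with
    {x : w @ x <= eps} for every fixed vector w in `others`."""
    sets = [lambda z: np.maximum(z, 0.0)]
    # Zero vectors impose 0 <= eps, which always holds for eps >= 0, so skip them.
    sets += [lambda z, w=w: project_halfspace(z, w, eps) for w in others if w @ w > 0]
    x = np.asarray(x_star, dtype=float).copy()
    increments = [np.zeros_like(x) for _ in sets]
    for _ in range(n_iter):
        for i, proj in enumerate(sets):
            y = proj(x + increments[i])
            increments[i] = x + increments[i] - y   # Dykstra correction term
            x = y
    return x

# Other roles' fixed, non-negative assignment columns and an unconstrained update.
G_others = [np.array([1.0, 1.0, 0.0, 0.0]), np.array([0.0, 0.0, 0.0, 1.0])]
x_star = np.array([0.5, 0.4, 0.8, 0.3])

x_strict = project_diverse(x_star, G_others, eps=0.0)   # fully diverse
x_loose = project_diverse(x_star, G_others, eps=0.5)    # limited overlap allowed
print(np.round(x_strict, 6), np.round(x_loose, 6))
```

With ε = 0 the update keeps only the entry that no other role uses, matching the disjoint-support behavior described above; with ε = 0.5 it shrinks the overlapping entries just enough to meet the budget.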
4.3 Alternative Role Discovery
Recent work on another unsupervised problem, clustering, has explored the area of alternativeness [36, 13]. In that literature, the terms alternativeness and orthogonality are used interchangeably, but for clarity we use only the term alternativeness.
The motivation for alternativeness in unsupervised learning is strong. Most interesting problems involve large data sets that contain complex phenomena, and there may exist multiple explanations of the data. However, most unsupervised learning algorithms assume that there exists only one good explanation of the data and return only that one.
In many situations, the returned explanation may be undesirable because it is either unactionable or not novel. Consider the IMDB (Internet Movie Database) dataset. If the resultant roles map actors to the studios for which they work, then this is not particularly novel. Here, alternative role discovery allows a previously discovered set of role allocations (G∗) or role definitions (F∗) to be specified as a counter-example of what not to find. The challenge, though, is to find another good explanation of the data that is different from those already found.
The optimization problem to find alternative roles is then:
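$$
\begin{aligned}
\operatorname*{argmin}_{G,F}\quad & \|V - GF\|^2 \\
\text{subject to:}\quad & G \ge 0,\; F \ge 0 \\
& \forall i, j:\ (G^{*}_{\bullet i})^{\top} G_{\bullet j} \le \varepsilon_G \\
& \forall i, j:\ F^{*}_{i\bullet} F_{j\bullet}^{\top} \le \varepsilon_F
\end{aligned}
\qquad (8)
$$
where $\varepsilon_G$ and $\varepsilon_F$ are upper bounds on how similar the results can be to $G^*$ and $F^*$.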
5 Experiments for Guided Role Discovery
Our experiments demonstrate how constraints on graph role discovery can be useful. Role discovery requires the user to specify the number of roles to use and a set of features for a graph. For the former, we used the Minimum Description Length (MDL) criterion described in [21] to automatically select the number of roles; for the latter, we used the approach described in [22]. We show that role discovery can be used to improve the results of the identity resolution problem between two graphs, and that these results can be further improved by using sparsity or diversity constraints. By using sparsity or diversity constraints, we improve the role definitions, which leads to more meaningful role assignments and more accurate identity resolution. See Section 5.1 for these experiments. We also experimentally verify the solutions to the alternative role discovery formulation presented in Section 4.3 and observe that they indeed produce significantly different results. The purpose of our experimental section is to address the following questions:
Network   |V|     |E|     k      |LCC|   #CC
VLDB      1,306   3,224   4.94   769     112
SIGMOD    1,545   4,191   5.43   1,092   116
CIKM      2,367   4,388   3.71   890     361
SIGKDD    1,529   3,158   4.13   743     189
ICDM      1,651   2,883   3.49   458     281
SDM       915     1,501   3.28   243     165
Table 2: Information about the DBLP co-author networks for each conference. Data was collected for five years (2005-2009). |V| = number of vertices, |E| = number of edges, k = average degree, |LCC| = size of the largest connected component, #CC = number of connected components.
1. Does adding constraints to the NMF-based role discovery formulation improve the quality of the resulting role explanations and assignments? Figures 3 and 4 show that constraints improve the results of identity resolution.
2. What effects do diversity constraints have on role discovery results? Figures 3 and 4 show how diversity constraints can improve role discovery results even more than sparsity constraints.
3. Can our alternative role discovery formulation produce significantly different results? Tables 3 and 4 show that our formulation can produce results that are significantly different from a given set of roles or community assignments, respectively.
5.1 Sparse and Diverse Identity Resolution in Co-authorship Graphs
In this experiment, we show that by adding sparsity and diversity constraints to the NMF formulation of role discovery, the resulting role definitions are of higher quality. We measure this improvement in quality indirectly by showing how role definition matrices can be used for resolving identities of nodes across graphs, and that constrained role definitions perform better than unconstrained role definitions for that problem.
From the DBLP data set [27], we extracted a co-author graph for each of the following related conferences from 2005 to 2009: KDD, ICDM, SDM, CIKM, SIGMOD and VLDB (see Table 2 for detailed information about each co-author graph). We extract a set of relevant structural features for the KDD graph using ReFeX [22], and compute these same features for all of the co-author graphs. We subsequently learn a set of role definitions from the KDD graph using standard RolX [21] as well as the sparse and diverse versions of GLRD. For each of these competing role definitions, we assign each vertex from each graph to the roles whose function it most exhibits. As a baseline, we also explore author identification without roles by using the raw graph features as described in ReFeX.
We use the role assignments to resolve the identities of vertices from each graph (namely, ICDM, SDM, CIKM, SIGMOD, and VLDB) to the vertices in the KDD graph. Without loss of generality, assume we are resolving the identity of authors from the KDD graph to the authors in the ICDM graph. For each author in both conferences, we select the corresponding row vector from the node-by-role matrix G_kdd and find the k closest neighbors (row vectors) in G_icdm. If the original author from the KDD graph is present in the set of k closest neighbors, we count the result as a match. We repeat this experiment using sparsity and diversity constraints on F_kdd. We also repeat the experiment using the ReFeX features, comparing author feature vectors from V_kdd and V_icdm. Figures 3 and 4 show how the different decomposition methods compare in this setting for all graphs paired with KDD.
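The match criterion above can be sketched as a top-k hit rate. This is a minimal NumPy sketch, not the authors' code. It assumes Euclidean distance between row vectors (the text does not name the distance) and a hypothetical `true_idx` array giving, for each KDD author, the row of the same author in the target matrix (e.g., G_icdm).

```python
import numpy as np


def top_k_hit_rate(G_src: np.ndarray, G_tgt: np.ndarray,
                   true_idx, k: int) -> float:
    """Fraction of source authors whose true match is among their k nearest targets.

    G_src:    (n_src, n_roles) rows for authors present in both graphs (e.g. G_kdd).
    G_tgt:    (n_tgt, n_roles) rows of the other graph (e.g. G_icdm).
    true_idx: hypothetical array; true_idx[i] is the row of author i in G_tgt.
    Distance is Euclidean (an assumption; the text does not specify it).
    """
    # Squared Euclidean distances between every source row and every target row.
    diff = G_src[:, None, :] - G_tgt[None, :, :]
    dist = np.einsum("ijk,ijk->ij", diff, diff)
    # Row indices of the k closest target vectors for each source author.
    nearest = np.argsort(dist, axis=1, kind="stable")[:, :k]
    # Count a match if the author's own target row is among those k rows.
    hits = (nearest == np.asarray(true_idx)[:, None]).any(axis=1)
    return float(hits.mean())
```

The same function applies unchanged to the ReFeX baseline by passing the raw feature matrices (V_kdd, V_icdm) instead of the role matrices.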
Our method of utilizing role discovery results for the author identification task is described formally in the following set of steps:
1. Extract features from the co-authorship graphs using ReFeX to get graph feature matrices (e.g., V_kdd, V_icdm).
2. From the graph feature matrix V_kdd, perform role discovery to obtain G_kdd and F_kdd.
3. Transfer the role definition matrix F_kdd (role-by-feature matrix) to other graphs (e.g., V_icdm) by solving Equation 9.
$$G_{\text{icdm}} = \arg\min_{G} \lVert V_{\text{icdm}} - G F_{\text{kdd}} \rVert^2 \quad \text{s.t. } G \ge 0 \qquad (9)$$
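Because the squared error in Equation 9 is a sum over the rows of G, the transfer step can be sketched as one non-negative least-squares problem per node. This sketch uses SciPy's `scipy.optimize.nnls`, which is our choice and not necessarily the solver used in the paper; the function name `transfer_roles` is hypothetical.

```python
import numpy as np
from scipy.optimize import nnls


def transfer_roles(V: np.ndarray, F: np.ndarray) -> np.ndarray:
    """G = argmin_{G >= 0} ||V - G F||^2 with the role definitions F held fixed (Eq. 9).

    V: (n_nodes, n_features) feature matrix of the target graph (e.g. V_icdm).
    F: (n_roles, n_features) role definitions from the source graph (e.g. F_kdd).
    """
    G = np.zeros((V.shape[0], F.shape[0]))
    for i, v in enumerate(V):
        # Each node's role memberships solve an independent problem:
        # min_{g >= 0} ||F^T g - v||_2.
        G[i], _ = nnls(F.T, v)
    return G
```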
Our experiments with graph identity resolution show that diversity and sparsity constraints almost universally improve the quality of the learned role-definition matrix. This is not unexpected, since there is a long tradition in machine learning of using sparsity to prevent overfitting. As mentioned previously, we can view diversity as enforcing sparsity, since a diverse set of roles (as per our definition) does not share many overlapping features, and hence each role definition is concise.
Figure 3 shows that role definitions learned using sparsity and diversity outperform standard unconstrained role discovery (RolX) in almost every setting and problem parameterization. Figure 4 shows the general trend more clearly by considering the results for a particular problem parameterization. In that figure, we observe that diversity constraints lead to the most improvement over RolX, while the improvements from sparsity are smaller. We also observe that transferring the KDD role definitions to some graphs (like VLDB and SIGMOD) does not compare well to the baseline method that does not use any roles (ReFeX). We believe this is because the same participants in conferences such as VLDB and SIGMOD do not play a role similar to the one they play in KDD; hence, using the raw features (without roles) produces better results.
We believe that sparsity improves the quality of role definitions by reducing the ability of unconstrained NMF-based role discovery to overfit the problem. Features that only slightly add to the definition of a role are more likely to be explaining noise; by forcing those values to zero, we end up with more robust definitions. Furthermore, the diversity constraints help by removing redundancy in role definitions, which leads to definitions that are more easily comparable. For example, if a feature is used to define every role, then it is not essential in defining any of them.
5.2 Alternative Roles
In this section, we show that our alternative role discovery formulation (presented in Section 4.3) can discover significantly different role definitions, and that the formulation can be used to improve the role definitions when there are ground-truth communities. In Table 3, we show the difference between an alternative role discovery result and an original role definition found using unconstrained role discovery (via RolX). In Table 4, we show that we can use our formulation to get more consistent assignments of roles when ground-truth communities are known.
In our first experiment, we explore the difference between the original and alternative role discovery results. Using the KDD co-authorship graph, we find a set of roles and constrain a new solution to have a significantly different role definition (F matrix). We then compare the results by assigning each vertex to its most dominant role in both results, creating two separate partitions of the vertices. We then measure the difference between the two partitions using Jaccard distance. Table 3 shows that all of the Jaccard distances are far from 0, meaning that the alternative role assignments are very different from the original ones. Figure 5 illustrates the alternative roles found in the largest connected component of the KDD co-authorship graph. Note that the reader can zoom in on this figure to read the names of each author. The following is a description of the original roles and the roles that GLRD(Alternative) found. These descriptions are based on sense-making analysis [21]. As the descriptions show, these roles capture alternative concepts.
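The partition comparison can be sketched as follows, assuming (as described above) that each vertex is assigned to the role with the largest entry in its row of G, and computing one Jaccard distance per pair of original and alternative roles, as in Table 3. Treating a pair of empty roles as distance 1 is our own convention, and the function name is hypothetical.

```python
import numpy as np


def jaccard_distance_matrix(G_orig: np.ndarray, G_alt: np.ndarray) -> np.ndarray:
    """Jaccard distances between dominant-role partitions (rows: original, cols: alternative).

    Each vertex goes to the role with the largest entry in its row of G.
    Entry (i, j) = 1 - |A_i & B_j| / |A_i | B_j|, where A_i (B_j) is the set of
    vertices whose dominant original (alternative) role is i (j).
    """
    a = G_orig.argmax(axis=1)
    b = G_alt.argmax(axis=1)
    D = np.ones((G_orig.shape[1], G_alt.shape[1]))  # empty/empty pairs stay at 1
    for i in range(G_orig.shape[1]):
        for j in range(G_alt.shape[1]):
            in_a, in_b = a == i, b == j
            union = np.logical_or(in_a, in_b).sum()
            if union > 0:
                D[i, j] = 1.0 - np.logical_and(in_a, in_b).sum() / union
    return D
```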
Original \ Alternative | R1(alt) | R2(alt) | R3(alt) | R4(alt)
R1 | 0.946 | 0.510 | 0.762 | 0.913
R2 | 1.000 | 0.971 | 0.810 | 0.739
R3 | 1.000 | 0.7942 | 1.000 | 1.000
R4 | 0.345 | 0.991 | 1.000 | 0.982
Table 3: Jaccard distance matrix comparing original role assignments (rows) to alternative role assignments (columns). A Jaccard distance of 0 represents an exact match between clusterings and 1 represents no overlap. The relative errors of the two decompositions were similar: 0.12% and 0.5% (where relative error is $\lVert V - GF \rVert / \lVert V \rVert$).
Original Roles:
Role 1: Nodes here have high eccentricity. These are periphery nodes.
Role 2: Nodes here have high eccentricity and high clustering coefficient. These are periphery nodes that are cliquey.
Role 3: Nodes here have high degree and high clustering coefficient. These are highly connected cliquey nodes.
Role 4: Nodes here have high PageRank, high degree, and high biconnected component numbers. These are globally central stars and brokers.
Alternative Roles:
Role 1: Nodes here have high PageRank and high biconnected component numbers. These are globally central nodes and brokers.
Role 2: Nodes here have high clustering coefficient but not high eccentricity. These are non-periphery nodes that are cliquey.
Role 3: Nodes here have high eccentricity and high clustering coefficient. These are periphery nodes that are cliquey.
Role 4: Nodes here have high eccentricity and high degree. These are periphery nodes that are locally stars.
We next experiment with a YouTube data set, which is a network of users with known ground-truth communities [33]. This graph was created by crawling the YouTube site in 2007 and creating directed edges between a pair of users a and b if a’s profile page linked to b’s profile page. Ground-truth communities were assigned by collecting all users belonging to the same group, where groups were pages that allowed communication between users on given topics. The graph has 1,134,890 vertices, 2,987,624 edges, and 8,385 communities. We selected all communities with over 100 users, of which there were 105. The largest community has 2,217 users.
There is an inherent complementarity between role discovery and community detection. The former is about structural similarity, while the latter is based on proximity in the graph. Role discovery finds the functions/roles of users but does not find the communities themselves. However, there may be multiple interesting sets of communities within the same network, and those communities may be characterized by very different roles. In this experiment, we encode the set of ground-truth communities for which our role discovery technique should find roles.
The way we encode the YouTube ground-truth communities into our analysis is by providing the communities as G∗ to our alternative role discovery formulation. This forces our discovered roles to have a role assignment that is different from the ground-truth communities, which matches the semantic relationship between the two problems.
To evaluate the effectiveness of this result, we measured the proportion of members in each community belonging to each role. We then calculated the standard deviation over all such communities per role and report the results in Table 4. The assumption for this evaluation is that each role should be equally represented in each community. Our results show that the alternative role discovery formulation can indeed be used to normalize the roles with respect to a set of ground-truth communities. After applying sense-making [21], the six roles that our GLRD(Alternative) finds are as follows:
Alternative Role 1: Nodes here are global hubs. They have high PageRank values, high out-degrees, and high biconnected component numbers.
Alternative Role 2: Nodes here are on the periphery of the graph. They have higher than default eccentricity.
Alternative Role 3: Nodes here are authorities. They have high PageRank values and high in-degrees.
Alternative Role 4: Nodes here are very cliquey. They have high clustering coefficients.
Alternative Role 5: Nodes here are local hubs. They have high out-degrees and high biconnected component numbers.
Alternative Role 6: Nodes here are the majority of the population; they are the “regular” folks. They have a local neighborhood that is more cliquey than expected, but otherwise nothing special stands out.
Role | 1 | 2 | 3 | 4 | 5 | 6
Original | 7.85 | 7.93 | 8.70 | 2.35 | 9.81 | 7.57
Alternative | 5.06 | 6.34 | 5.34 | 3.81 | 8.62 | 5.88
Table 4: For each role, we report the standard deviation of role proportions over all communities. The result shows that our alternative role discovery formulation can be used to find roles whose members are better distributed across a set of interesting communities. The values are multiplied by 10^2.
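The Table 4 measure can be sketched as below. This assumes that members are assigned to their dominant role (the text does not say how role membership is counted) and uses the population standard deviation; the function name and the input format for communities are hypothetical.

```python
import numpy as np


def role_proportion_std(G: np.ndarray, communities) -> np.ndarray:
    """Per-role standard deviation of role proportions across communities.

    G: (n_nodes, n_roles) role-assignment matrix; each node counts toward its
       dominant role (argmax), an assumption made for this sketch.
    communities: list of node-index lists, one per ground-truth community.
    Lower values mean the role is spread more evenly over the communities.
    """
    dominant = G.argmax(axis=1)
    n_roles = G.shape[1]
    # props[c, r] = fraction of community c's members whose dominant role is r
    props = np.array([
        np.bincount(dominant[np.asarray(c)], minlength=n_roles) / len(c)
        for c in communities
    ])
    return props.std(axis=0)
```

Multiplying the result by 100 gives numbers on the same scale as Table 4.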
6 Lifting our Formulation for Multi-Relational Role Discovery
Here we outline our method to lift our previous work to perform role discovery in multi-relational graphs. We do not recreate the same experiments, since they are trivial, but instead focus on the more challenging problem of role discovery in multi-relational graphs.
Role Discovery in Multi-relational Graphs. Our approach to extending role discovery to multi-relational graphs is to model the graphs as a tensor. This is done by extracting features from each relation and appending the resulting feature matrices into a single tensor V of dimension n × f × r. Just as NMF is used to decompose a feature matrix V, tensor decompositions can be used to decompose a feature tensor V. One natural choice of tensor decomposition for a feature tensor would be non-negative PARAFAC [16]. PARAFAC, like NMF, decomposes the data into a sum of rank-one components (see Figure 6). However, PARAFAC is not an ideal model for finding complex patterns in graphs, as is desired for role discovery, because its assumptions are too simplistic. In particular, it only allows each group of entities to play one role for one group of relations. See the introductory section for a more in-depth explanation of the limitations of PARAFAC.
$$\arg\min_{G,F,R} \Big\lVert \mathcal{V} - \sum_{k} g_k \circ f_k \circ r_k \Big\rVert_{\mathrm{Fro}} \quad \text{subject to: } G \ge 0,\ F \ge 0,\ R \ge 0 \qquad (10)$$
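The construction of the n × f × r feature tensor described above can be sketched as follows. ReFeX features are not reproduced here; as a hypothetical stand-in, each relation contributes two simple per-node features (number of neighbors and total incident edge weight).

```python
import numpy as np


def feature_tensor(adjacencies) -> np.ndarray:
    """Stack per-relation node-feature matrices into an n x f x r tensor.

    adjacencies: list of r weighted (n x n) adjacency matrices, one per relation.
    Hypothetical stand-in for ReFeX: f = 2 features per node and relation,
    the number of neighbours and the total incident edge weight.
    """
    per_relation = []
    for A in adjacencies:
        degree = (A > 0).sum(axis=1)      # neighbours in this relation
        strength = A.sum(axis=1)          # total edge weight in this relation
        per_relation.append(np.column_stack([degree, strength]))  # n x f
    return np.stack(per_relation, axis=2)  # n x f x r
```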
Instead, we use the Tucker decomposition (shown in Equation 11), which allows us to find the complex interactions between E-groups, the roles they play, and the R-groups they play those roles in. The diagrammatic explanation of the Tucker decomposition in Figure 7 shows how it models these interactions. Like PARAFAC and NMF, it is built from rank-one components, which allows for an intuitive interpretation. A column in G corresponds to a group of people and is a length-n indicator vector showing E-group membership. Similarly, a column in F corresponds to a role definition, which is a group of features, and a column in R corresponds to a group of relations, which we refer to as an R-group. Unlike in PARAFAC and NMF, each component can combine any column of G with any column of F and any column of R. The core of the Tucker decomposition allows this complex interaction and requires more explanation (PARAFAC can be viewed as a specific Tucker decomposition with a diagonal core). It too is an order-3 tensor, except that its modes are now directly interpretable as E-groups, roles, and R-groups. An entry in the core at (i, j, k) means that E-group i plays role j for R-group k. Understanding and simplifying this core is critical to the success of multi-relational role discovery using a Tucker decomposition.
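$$\arg\min_{G,F,R,H} \Big\lVert \mathcal{V} - \sum_{i}\sum_{j}\sum_{k} h_{ijk}\, g_i \circ f_j \circ r_k \Big\rVert_{\mathrm{Fro}} \quad \text{subject to: } G \ge 0,\ F \ge 0,\ R \ge 0,\ H \ge 0 \qquad (11)$$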
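Equations 10 and 11 differ only in which factor columns a term may combine. The sketch below writes both reconstructions with `numpy.einsum` and checks the remark above that PARAFAC is a Tucker decomposition whose core is non-zero only on its (super)diagonal. Function names are illustrative.

```python
import numpy as np


def parafac_reconstruct(G, F, R):
    """Equation 10: sum_k g_k ∘ f_k ∘ r_k, with G: n x K, F: f x K, R: r x K."""
    return np.einsum("ik,jk,lk->ijl", G, F, R)


def tucker_reconstruct(H, G, F, R):
    """Equation 11: sum_{p,q,s} h_pqs g_p ∘ f_q ∘ r_s.

    H: P x Q x S core; G: n x P (E-groups), F: f x Q (roles), R: r x S (R-groups).
    """
    return np.einsum("pqs,ip,jq,ls->ijl", H, G, F, R)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = 3
    G, F, R = rng.random((6, K)), rng.random((5, K)), rng.random((4, K))
    # A core that is non-zero only at (k, k, k) couples column k of each
    # factor with nothing else, which is exactly the PARAFAC model.
    H = np.zeros((K, K, K))
    idx = np.arange(K)
    H[idx, idx, idx] = 1.0
    print(np.allclose(tucker_reconstruct(H, G, F, R), parafac_reconstruct(G, F, R)))
```

Run as a script, the check should print `True`; any off-diagonal non-zero in H adds cross-terms that PARAFAC cannot express.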
7 Our MRD Algorithm for Multi-Relational Graphs
The Tucker model has most often been described as a higher-order analog of principal component analysis or singular value decomposition, and it is traditionally defined with orthogonal factor matrices. Among the most popular tensor toolboxes, the Tucker model is often implemented with orthogonality constraints on the factor matrices (Tensor Toolbox [4, 2]) or with no constraint enforced on the core (Nway Toolbox [1]). Other recently proposed algorithms for the non-negative Tucker model [24, 34] extend the classical multiplicative update procedures proposed for NMF [26], which are known to converge slowly near stationary points [29]. Since the alternating least squares (ALS) method is known as the “workhorse” algorithm for PARAFAC [25] and has been empirically demonstrated to be competitive among many existing methods [38], we implement our own version of non-negative Tucker decomposition using an alternating non-negative least squares (ANLS) scheme.
Let V be the tensor to be decomposed. Denote the factor matrices by G, F and R and the core tensor by H. In each iteration we optimize over each of G, F, R and H in turn while fixing all others as constants. When G is being optimized, the objective can be written as:
$$\arg\min_{G \ge 0} \lVert V_G - G H_G (R \otimes F)^T \rVert_{\mathrm{Fro}} \qquad (12)$$
where V_G and H_G are the matricizations of V and H in the first mode and ⊗ is the Kronecker product. The subproblems for F and R have exactly the same form, but with a different variable being optimized. In addition, it is generally desirable for the entries in the core to indicate the weights of each coupling of factors. Thus we normalize the columns of G, F and R once they are solved. When we solve for the core H, rewriting the tensors in vectorized form turns the objective into:
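$$\arg\min_{H \ge 0} \lVert \operatorname{vec}(\mathcal{V}) - (R \otimes F \otimes G)\operatorname{vec}(H) \rVert_{\mathrm{Fro}} \qquad (13)$$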
where vec(·) is the vectorization of a tensor. Our overall solver is summarized in Algorithm 1. We build our solver on top of the existing constructs in the MATLAB Tensor Toolbox [2] and employ the fast non-negative least squares (NNLS) solver particularly designed for tensor decomposition [9] when we solve subproblems (12) and (13). For the terminating condition, we adopt the common practice for ALS of stopping when the relative change in the objective between successive iterations is smaller than a preset threshold. It is worth noting that although we only enforce non-negativity constraints in this case, it requires little effort to incorporate into our formulation any constraint applicable to standard least-squares problems.
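Algorithm 1 Multi-relational Role Discovery (MRD) using Alternating Least Squares Non-negative Tucker decomposition.
1: Initialize G, F, R and H to any non-negative values
2: while stop condition not met do
3:   $G \leftarrow \arg\min_{G \ge 0} \lVert V_G - G H_G (R \otimes F)^T \rVert_{\mathrm{Fro}}$
4:   Normalize the columns of G
5:   $F \leftarrow \arg\min_{F \ge 0} \lVert V_F - F H_F (R \otimes G)^T \rVert_{\mathrm{Fro}}$
6:   Normalize the columns of F
7:   $R \leftarrow \arg\min_{R \ge 0} \lVert V_R - R H_R (F \otimes G)^T \rVert_{\mathrm{Fro}}$
8:   Normalize the columns of R
9:   $H \leftarrow \arg\min_{H \ge 0} \lVert \operatorname{vec}(\mathcal{V}) - (R \otimes F \otimes G)\operatorname{vec}(H) \rVert_{\mathrm{Fro}}$
10: end while
11: return G, F, R, H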
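A compact Python sketch of Algorithm 1 follows, assuming NumPy and SciPy. It is not the authors' MATLAB implementation (which builds on the Tensor Toolbox and a dedicated NNLS solver [9]); here every subproblem is solved with `scipy.optimize.nnls`, row by row for Equation 12 and its analogues, and on the full Kronecker system for Equation 13, which is only practical for small tensors. The `unfold` helper uses the column ordering under which the Kronecker products in Algorithm 1 line up. The stopping rule (change in relative reconstruction error below `tol`) and the optional `F_fixed` argument, which holds the role definitions constant as in the role transfer setting of Section 8.5, are assumptions of this sketch.

```python
import numpy as np
from scipy.optimize import nnls


def unfold(X, mode):
    """Mode-n matricization whose column order matches the np.kron products below."""
    return np.reshape(np.moveaxis(X, mode, 0), (X.shape[mode], -1), order="F")


def nnls_rows(V_mat, M):
    """argmin_{A >= 0} ||V_mat - A M||_Fro, solved one row of A at a time."""
    return np.vstack([nnls(M.T, v)[0] for v in V_mat])


def normalize_columns(A):
    norms = np.linalg.norm(A, axis=0)
    norms[norms == 0] = 1.0  # leave all-zero columns as they are
    return A / norms


def mrd_tucker(V, ranks, max_iter=200, tol=1e-6, F_fixed=None, seed=0):
    """Alternating NNLS for a non-negative Tucker model (sketch of Algorithm 1).

    V: n x f x r non-negative tensor; ranks: (P, Q, S) numbers of E-groups,
    roles and R-groups. F_fixed: optional f x Q role matrix kept constant.
    Stops when the relative error changes by less than tol.
    """
    rng = np.random.default_rng(seed)
    P, Q, S = ranks
    n, f, r = V.shape
    G, R = rng.random((n, P)), rng.random((r, S))
    F = rng.random((f, Q)) if F_fixed is None else np.array(F_fixed, dtype=float)
    H = rng.random((P, Q, S))
    vec_V = V.reshape(-1, order="F")
    prev_err = np.inf
    for _ in range(max_iter):
        # Steps 3-8: each factor solves Eq. 12 (or its analogue), then is normalized.
        G = normalize_columns(nnls_rows(unfold(V, 0), unfold(H, 0) @ np.kron(R, F).T))
        if F_fixed is None:
            F = normalize_columns(nnls_rows(unfold(V, 1), unfold(H, 1) @ np.kron(R, G).T))
        R = normalize_columns(nnls_rows(unfold(V, 2), unfold(H, 2) @ np.kron(F, G).T))
        # Step 9: Eq. 13, core update on the vectorized problem.
        K = np.kron(R, np.kron(F, G))
        h, _ = nnls(K, vec_V)
        H = h.reshape((P, Q, S), order="F")
        err = np.linalg.norm(vec_V - K @ h) / np.linalg.norm(vec_V)
        if abs(prev_err - err) < tol:
            break
        prev_err = err
    return G, F, R, H, err
```

The returned `err` is the relative error of the final model, so it equals the relative error of the Equation 11 reconstruction built from the returned G, F, R and H.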
Algorithm Complexity. Our algorithm is an instance of alternating least squares, with each step efficiently solvable using least-squares solvers. The non-negativity requirement on the core can be efficiently enforced by these solvers. Since tensor decomposition is well known to be intractable, we provide an estimate of our algorithm’s run time to converge to a good local minimum. Like most tensor decomposition algorithms, our algorithm has linear complexity with respect to the number of factors, the number of modes and the size of the core. In practice, the decompositions of our graphs shown in the experimental section took under a minute to run on a 12-core machine.
8 Interpreting Tensor Decomposition for Role Discovery
After applying Algorithm 1, we have decomposed the multi-relational graph into a series of E-groups (defined by G), a series of roles (defined by F) and a series of R-groups (defined by R). The core of the Tucker decomposition measures the interaction between these E-groups, roles and R-groups. Here we show how to interpret and analyze the results of the Tucker decomposition in a number of ways.
8.1 Visually Interpreting Core Slices
We begin with the simple but useful approach of visually inspecting the core tensor slices to compare E-groups, roles, or R-groups. A slice of the core (depending on its orientation: left-to-right, top-to-bottom or back-to-front) can represent an E-group, a role, or an R-group. Different slices of the same orientation can then be used to compare the similarity of E-groups, roles and R-groups. For example, in Figure 8 we display the slices corresponding to different E-groups from a multi-relational role discovery result.
Comparing the slices directly leads to a very detailed comparison of E-groups, because we compare, for example, whether they have role/R-group combinations in common. However, if we consider aggregations of these slices, we can get a coarser comparison, such as whether or not the E-groups play the same roles, or whether they participate in the same R-groups. For example, the third and fifth E-groups look very similar in terms of the R-groups they take part in, but by looking at the slices we know that they differ, because they play very different roles in those very same relations.
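The slice-level and aggregated comparisons can be sketched as below. This assumes the core H is stored as a NumPy array of shape (E-groups, roles, R-groups); using cosine similarity to compare profiles is our illustrative choice, and the helper names are hypothetical.

```python
import numpy as np


def egroup_views(H: np.ndarray, i: int):
    """Slice of the core for E-group i plus two coarser aggregates.

    H: core of shape (E-groups, roles, R-groups).
    """
    s = H[i]                         # roles x R-groups: detailed combinations
    role_profile = s.sum(axis=1)     # total weight on each role
    rgroup_profile = s.sum(axis=0)   # total weight on each R-group
    return s, role_profile, rgroup_profile


def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity of two arrays (flattened); 0.0 if either is all zeros."""
    a, b = np.ravel(a), np.ravel(b)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0
```

For a toy core in which two E-groups use the same R-groups but play disjoint roles, the R-group profiles have cosine similarity 1 while the role profiles and full slices have similarity 0, the same pattern described for the third and fifth E-groups above.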
8.2 Visualizing Core as an Interaction Graph
A further visual understanding of the phenomena in the multi-relational graph can be obtained by visualizing the core as a graph. This is achieved by creating a node for every E-group, role, and R-group. This will of course be a heterogeneous graph. An entry in the core could then be represented in this graph as a clique on the triplet (E-group, role, R-group) it corresponds to. Since each edge corresponds to a Tucker core entry, each edge can be weighted by that core entry and interpreted as a similarity. However, if we are focused, say, predominantly on understanding groups of entities, we can create a tripartite graph, as shown in Figure 9, which removes the edge between the role and the R-group. We call this graph the interaction graph to distinguish it from the original multi-relational graph we study.
This interaction graph can then be visualized, and interesting signature patterns can be interpreted. See Figure 9 for some example signatures.
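A sketch of the tripartite interaction graph as a weighted adjacency matrix follows. It assumes that each core entry h_pqs adds its value to the edge between E-group p and role q and to the edge between E-group p and R-group s, with role/R-group edges omitted as described above. Summing core entries into edge weights and the optional sparsifying `threshold` are assumptions, and the function name is hypothetical.

```python
import numpy as np


def interaction_graph(H: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """Weighted adjacency of the tripartite interaction graph built from a core H.

    Node order: P E-groups, then Q roles, then S R-groups. Core entries at or
    below `threshold` are dropped before aggregation.
    """
    P, Q, S = H.shape
    Hm = np.where(H > threshold, H, 0.0)
    A = np.zeros((P + Q + S, P + Q + S))
    A[:P, P:P + Q] = Hm.sum(axis=2)   # E-group/role weights (sum over R-groups)
    A[:P, P + Q:] = Hm.sum(axis=1)    # E-group/R-group weights (sum over roles)
    return A + A.T                    # undirected graph, so symmetrize
```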
8.3 Analysis of the Interaction Graph
Given the interaction graph described in the previous subsection, which shows the relationships between E-groups, roles, and R-groups, we can analyze this graph in any number of ways. For example, a popular approach to graph simplification is to embed the graph into a two-dimensional space. Figure 16 shows such an embedding, using PCA of the graph written in “hyper-edge” form; that is, an n × m matrix where each column represents a hyper-edge and entry (i, j) has value 1 if node i is involved in hyper-edge j. This heterogeneous object embedding can be interpreted such that each cluster is a collection of E-groups, roles, and R-groups that often interact.
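The hyper-edge form and its PCA embedding can be sketched as below. Each core entry above a threshold becomes one hyper-edge joining its E-group, role and R-group; PCA is computed via an SVD of the column-centred incidence matrix. The threshold and function names are assumptions for illustration.

```python
import numpy as np


def hyperedge_incidence(H: np.ndarray, threshold: float = 0.0) -> np.ndarray:
    """n x m incidence matrix: rows are E-groups, roles, R-groups; columns are core entries."""
    P, Q, S = H.shape
    entries = np.argwhere(H > threshold)          # each row is (p, q, s)
    M = np.zeros((P + Q + S, len(entries)))
    for j, (p, q, s) in enumerate(entries):
        M[[p, P + q, P + Q + s], j] = 1.0         # node i is in hyper-edge j
    return M


def pca_2d(M: np.ndarray) -> np.ndarray:
    """Coordinates of each node (row) on the top two principal components."""
    X = M - M.mean(axis=0)                        # centre each column (feature)
    U, sing, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :2] * sing[:2]
```

Nodes that take part in exactly the same hyper-edges receive identical coordinates, which is why clusters in the embedding group elements that often interact.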
Property | Question | Computation
Simplicity | To what extent are nodes connected to multiple nodes of other types versus being connected to only one node (e.g., E-groups playing multiple roles)? | Average node degree
Sharing | How much can E-groups be separated into independent parts? For example, can we find two sets of roles that are played by completely non-overlapping sets of E-groups? | Mincut cost
Variability | How does the simplicity of nodes (E-groups, roles, or R-groups) vary across the interaction graph? | Variance of node degree; entropy of PageRank distribution
Stability | How stable are the interactions between roles, E-groups and R-groups? | Spectral gap
Table 5: The macroscopic properties measured on the interaction graph H. See Figure 17 for measurements over several congressional multi-relational graphs spanning a time frame of 30 years.
8.4 Macroscopic Properties Derived from the Interaction Graph
Given the interpretation of the core as an interaction graph, we can then understand the macroscopic properties of the role dynamics by analyzing the properties of the interaction graph. The metrics we study are motivated in Table 5, along with how they are computed. These metrics are meant to give the user a broad understanding of the underlying dynamics of the graph. The simplicity property tells how strongly aligned E-groups, roles and R-groups are, while the sharing property measures how many roles and R-groups are shared among different groups of entities. The variability property captures the amount of imbalance in the complexities of different nodes in the interaction graph, by calculating both the variance of the node degrees and the entropy of the stationary distribution of a random walk on the interaction graph. Another important property we measure is the stability of the results we discovered. Here we wish to answer the question: how robust are the patterns found within the interaction graph, and how easily could those patterns change due to small perturbations?
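Several of the Table 5 metrics can be sketched on the interaction graph's weighted adjacency matrix. The specific definitions are our assumptions, since the text does not fix them: degree counts non-zero edges, PageRank uses damping 0.85 with power iteration, and the spectral gap is taken as λ1 − λ2 of the normalized adjacency D^{-1/2} A D^{-1/2}. The mincut-based sharing metric is omitted, and the sketch assumes a symmetric matrix with no isolated nodes.

```python
import numpy as np


def macro_properties(A: np.ndarray, damping: float = 0.85, iters: int = 200) -> dict:
    """Some Table 5 metrics for a symmetric weighted adjacency A (no isolated nodes)."""
    n = A.shape[0]
    deg = (A > 0).sum(axis=1).astype(float)       # unweighted node degree
    strength = A.sum(axis=1)                      # weighted degree
    P = A / strength[:, None]                     # random-walk transition matrix
    pr = np.full(n, 1.0 / n)
    for _ in range(iters):                        # PageRank by power iteration
        pr = (1 - damping) / n + damping * (pr @ P)
    nz = pr[pr > 0]
    d = 1.0 / np.sqrt(strength)
    # Eigenvalues of D^-1/2 A D^-1/2, largest first (the largest is 1).
    eig = np.sort(np.linalg.eigvalsh(d[:, None] * A * d[None, :]))[::-1]
    return {
        "simplicity_avg_degree": float(deg.mean()),
        "variability_degree_var": float(deg.var()),
        "variability_pagerank_entropy": float(-(nz * np.log(nz)).sum()),
        "stability_spectral_gap": float(eig[0] - eig[1]),
    }
```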
8.5 Complex Analysis Via Role Transfer
Our work so far learned the E-groups, role definitions and R-groups all from one multi-relational graph. However, we can transfer in these definitions from another source by holding them fixed as constants in the Tucker decomposition. For example, if we wish to transfer in a set of existing roles, we can adjust Algorithm 1 so that it does not solve for the F matrix that defines the roles. This allows us to test many interesting questions, such as how well roles from other graphs explain another multi-relational graph. We experiment with this particular type of transfer in Figure 18; however, other types of transfer are possible. We now discuss all types, but due to space limitations we show experiments only for role transfer.
Role transfer can be used to detect to what extent roles are similar or dissimilar across different multi-relational graphs. If there is a particularly interesting set of roles that has been studied in another graph, it can be transferred to a new graph to see how the nodes in that graph play those roles.
E-group transfer can only be used if the multi-relational graphs are on the same entities. However, if there are some well-understood groupings of entities (say, Democrat, Tea Party and Republican), these can be translated into E-groups and transferred to help gain understanding of the behaviors of those specific groups.
R-group transfer, similar to role transfer, can be used to test how wellrelation groupings transfer across multiple graphs.
9 Empirical Results
As in our previous work, all code and data sets will be made publicly available on our website.
Since we wished to focus on analyzing both multi-relational graphs and collections of similar multi-relational graphs in a transfer setting, we focused our empirical analysis on the Cosponsorship Network Data [15, 16]. This data set consists of congressional cosponsorship data for over 30 years of congresses. Congressional representatives have the ability to add their name to a bill in order to lend support to it (called cosponsoring), and it has been argued that this act is a good measure of interaction within Congress, because legislators spend considerable effort convincing other representatives to cosponsor their bills. Using this publicly available information about cosponsorships, each congress can be broken up into a multi-relational graph with approximately 450 nodes (congressional representatives) who jointly cosponsor approximately 10,000 bills per congress (many are just amendments). Table 6 shows statistics for the graph created from the 110th congress, but in all we study the 96th-110th congresses, each of which has its own cosponsorship graph. Rather than create a cosponsorship graph based on all of the proposed bills from a particular congress, we build a multi-relational graph by viewing each committee as a separate relation (see Figure 10). Each bill is assigned to a committee based on the topic of the legislation. We analyzed bills from 15 different committees (the committees for which there was legislation in each congress from the 96th to the 110th) so that all of the relations are consistent over all the multi-relational graphs. Across the different congresses, the one factor that does change is the set of representatives elected to each. Putting this all together, the multi-relational graph we study is a person × person × committee tensor such that the entry at (i, j, k) indicates how often representatives i and j cosponsored a bill that was sent to committee k in a particular congress. This graph has many underlying complexities in terms of groups of congressional representatives who work together (i.e., party-based and tenure-length-based), the roles that congressional representatives play (e.g., focused and generalist), and the relationships of the various bill areas (e.g., science-focused, business-focused).
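The person × person × committee tensor described above can be sketched as a simple counting loop. The input format, a list of (committee index, cosponsor indices) pairs with one pair per bill, is a hypothetical assumption about how the raw cosponsorship records are organized.

```python
from itertools import combinations

import numpy as np


def cosponsor_tensor(bills, n_reps: int, n_committees: int) -> np.ndarray:
    """Count tensor T with T[i, j, k] = number of bills sent to committee k
    that representatives i and j both (co)sponsored.

    bills: iterable of (committee_index, representative_indices) pairs,
    a hypothetical record format.
    """
    T = np.zeros((n_reps, n_reps, n_committees))
    for k, reps in bills:
        # Each unordered pair of distinct representatives on the bill.
        for i, j in combinations(sorted(set(reps)), 2):
            T[i, j, k] += 1
            T[j, i, k] += 1  # cosponsorship is symmetric
    return T
```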
Graph attribute | Value
Number of representatives | 453
Number of bills | 10613
Sponsors per bill | 16.9
Mean cosponsor degree (aggregated) | 8.37
Standard deviation (aggregated) | 6.31
Number of zeros (aggregated) | 1729
Mean cosponsor degree (median) | 0.48
Standard deviation (median) | 1.02
Number of zeros (median) | 53235
Table 6: Details on the congressional cosponsorship data set for the 110th congress. The aggregated statistics were calculated on the cosponsorship graph without treating it as a multi-relational graph. The median statistics measure the median attribute value over all relations (committees). The number of zeros refers to the number of pairs of representatives that have no edge (or an edge of weight zero).
9.1 Studies on a Single Multi-Relational Graph
Here we present results on the analysis of the 110th Congress, which sat from 2007 to 2009. This was a Democrat-controlled congress that sat during the last two years of President George W. Bush’s administration. It was also unique in that it was the first Democrat-controlled congress since 1995.
We analyzed this multi-relational cosponsorship graph using our formulation for multi-relational role discovery. This produced E-groups, roles, and R-groups, along with an interaction graph that explains the interactions between the three concepts. The E-groups are shown in Figure 12, the interpretation of the role definitions is shown in Figure 11, and the composition of the R-groups is shown in Figure 13. How these E-groups, roles, and R-groups interact in the interaction graph is shown directly as a sliced core in Figure 14, as a sparsified graph in Figure 15, and as a graph embedding in Figure 16.
Underlying E-groups, Roles and R-groups. Figure 12 shows that, as expected, people from the same party cosponsor the same bills, though this further divides into two different E-groups per party. For the two Democrat groups, we note that one E-group consists mostly of junior congressmen (group 4), while the other contains many of the senior congresswomen (group 1). Of particular note is the 5th E-group, which contains a mix of Republican and Democrat representatives and largely represents a group of centrist members. For example, McGotter was a well-known member of the moderate “Republican Main Street Partnership”.
Figure 11 shows the types of roles that are found in the graph via sense making [21]. This plot shows, for each role, the attributes shared by representatives who play that role. Roles can be contrasted and compared in terms of these reference features. For example, roles 2 and 4 both have comparable degree but largely differing weight, meaning representatives from both roles participated in cosponsorship with roughly the same number of other representatives, but representatives in role 4 cosponsored with the same people more often.
Figure 13 shows the compositions of the R-groups. Each R-group is composed of some combination of the 15 studied relations, each of which in turn corresponds to a congressional committee and is roughly interpretable as the topic of the bill. While there is some overlap in the relational contribution of each R-group, each of them has a unique dominating relation (R-group 1 ‘Ways and Means’, R-group 2 ‘Rules’, R-group 3 ‘Oversight and Government Reform’, R-group 4 ‘Education and Labor’, R-group 5 ‘Agriculture’). Because, unlike common practice, we did not enforce orthogonality in our Tucker decomposition (see Algorithm 1), we can see which relations are less distinguishing in terms of role analysis by looking at those relations that show up in multiple R-groups (e.g., ‘Transport and Infrastructure’ is assigned to every R-group).
Interactions Between E-groups, Roles and R-groups. We now explain the interaction graph, which is shown in Figure 15. As previously mentioned, E-groups are largely divided by party even though party was not part of the data set. It can be argued, then, that this role discovery formulation discovered communities rather than roles. However, the reason these groups divided along party lines is that the parties play different roles in different R-groups. Depending on factors such as which party is the majority, we expect the parties to play different roles, so our analysis matches our expectations.
While there is much overlap in the R-groups that both parties participate in, the parties play different roles in those R-groups. For example, the Republican groups participate largely in R-groups 3, 4 and 5, while the Democrat groups participate largely in R-groups 1, 2, 3 and 5. However, E-group 4 (Republican) and E-group 5 (Democrat) play different roles in R-group 5 (Agriculture). This is an example of a Role Tie from Figure 9.
There are also some roles and E-groups that are unique to a party. For example, role 2 is exclusive to Republicans (many collaborators, but not many collaborations), and R-group 1 (Ways and Means) is more strongly associated with the Democrat E-groups. This makes sense, because the Ways and Means committee is one of the most prestigious to participate in and relates to tax legislation. It therefore makes sense that the majority party would be most active in this committee.
Though the direct view of the interaction graph is useful, as discussed earlier there are other methods to understand the interactions. We can slice the core tensor by E-group, role, or R-group and compare the slices directly. Figure 14 shows such a comparison across E-groups. We can see that E-groups 1 and 3 both play role 5 but in different R-groups; also, E-group 1 plays mainly one role, whereas E-group 3 plays multiple roles in the graph. Finally, we can embed this graph into a metric space, as shown in Figure 16.
9.2 Studies Across Multiple Multi-Relational Graphs
We also performed multi-relational analysis across a total of 15 consecutive congresses and report the results here. We performed two experiments to analyze these multi-relational graphs and gain insight into them. First, we analyzed how the macroscopic properties of the learned interaction graphs, as discussed in Section 8.4, varied across the congresses (see Figure 17). Second, we determined how well role definitions learned from one congress transfer to others, as discussed in Section 8.5; the results are presented in Figure 18.
Figure 17 shows the results of our analysis of the macroscopic properties of the learned interaction graphs from the 96th-110th congresses. These results contain an immense number of interesting insights, and we focus on just a few due to space restrictions. The first unusual property we note is a great spike of instability in the 101st congress. This is due to the election of a new President, Bush, following a very popular bipartisan President, Reagan. In addition, many controversial bills that crossed party lines were passed, such as the Americans With Disabilities Act. In contrast, the 99th congress was very stable, given that it was Reagan’s second term and most bills were supported across partisan lines. Also of particular note are the sharp peaks during congresses 97, 101 and, to a lesser extent, 103. They correspond precisely to changes in presidencies: Carter (Democrat) to Reagan (Republican) (97), Reagan (Republican) to Bush (Republican) (101), and Bush (Republican) to Clinton (Democrat) (103).
In Figure 18 we show a heat map of role transfer between different congresses. We first ran our algorithm to discover the roles for all congresses. We then transferred each of the 15 sets of role definitions to every other congress and measured the fit, to determine how well each set of roles explains the behavior of every other congress. The heat map shows how well (dark red) or how poorly (dark blue) the roles for the congress in the row explain the interactions for the congress in the column. Of course, the diagonal is dark red, since those roles were built from data for that congress. As expected, the red block structure indicates that later congresses' roles better explain later congresses' behavior, and earlier congresses' roles better explain earlier congresses' behavior. The solid blue block in the top left-hand corner indicates that later congresses' roles are very poor at explaining the earlier congresses' behavior. The apparent outliers within the top right-hand block and the lower left-hand block (i.e., the bluish entries amongst the red/yellow) are indicative of a shift in the presidency or the House majority, either Democrat to Republican or vice versa.
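The transfer experiment holds one congress's role definitions fixed and asks how well they reconstruct another congress, scored as fit = 1 − reconstruction error/||V|| (Figure 18). The sketch below shows this idea in a simplified matrix (NMF-style) setting rather than the paper's Tucker setting. It assumes V ≈ G F, with role definitions F fixed and only the nonnegative memberships G refit, here by projected gradient descent. `transfer_fit` is a hypothetical name, and the solver choice is ours, not the paper's.

```python
import numpy as np

def transfer_fit(V_target, F, iters=500):
    """Fit of fixed role definitions F (roles x features) to V (nodes x features).

    Memberships G >= 0 are refit by projected gradient descent while F stays
    fixed. Returns 1 - ||V - G F||_F / ||V||_F (higher is better).
    """
    V = np.asarray(V_target, dtype=float)
    F = np.asarray(F, dtype=float)
    FFt = F @ F.T
    VFt = V @ F.T
    step = 1.0 / np.linalg.eigvalsh(FFt)[-1]   # 1 / Lipschitz constant of the gradient
    G = np.zeros((V.shape[0], F.shape[0]))
    for _ in range(iters):
        grad = G @ FFt - VFt                   # gradient of 0.5 * ||V - G F||_F^2 in G
        G = np.maximum(G - step * grad, 0.0)   # project back onto G >= 0
    err = np.linalg.norm(V - G @ F)
    return 1.0 - err / np.linalg.norm(V)
```

Filling a matrix with `transfer_fit(V[col], F[row])` for every pair of congresses would produce a heat map analogous to Figure 18, with a perfect-fit diagonal only when F is learned on that same data.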
10 Conclusion
Role discovery is an emerging and important area of graph mining. It looks at discovering nodes that perform similar functions in networks but do not necessarily belong to the same community. Existing work so far has had two limitations: it is completely unsupervised, and it focuses on single-relational graphs.
We propose a framework that incorporates convex constraints into NMF to allow a rich set of guided role discovery formulations. In particular, we explore three types of guidance: sparsity, diversity, and alternativeness. Sparsity and diversity can be used to create simpler and more interpretable role definitions and role allocations. They can also reduce overfitting and produce better predictive results for matching authors between the KDD conference and a variety of other conferences, provided the authors perform similar roles in both conferences. The notion of alternativeness has been explored in the clustering literature and is useful if the given explanation is not valid and an alternative is required. Here we show not only that alternative roles exist in co-author networks, but also that we can find an alternative to the community structure in a very large YouTube graph.
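One common way to impose a convex sparsity constraint inside an alternating least squares loop is to take an unconstrained or gradient step and then project each membership vector back onto a nonnegative L1 ball. The sketch below implements that projection with the sort-based simplex projection of Duchi et al. [14]. The assumption that sparsity is expressed as a per-row constraint sum(w) ≤ z is ours, and this is not necessarily the paper's exact formulation. `project_nonneg_l1_ball` is a hypothetical name.

```python
import numpy as np

def project_nonneg_l1_ball(v, z):
    """Euclidean projection of v onto {w : w >= 0, sum(w) <= z}, with z > 0.

    Uses the sort-based simplex projection of Duchi et al. (2008).
    """
    w = np.maximum(np.asarray(v, dtype=float), 0.0)
    if w.sum() <= z:
        return w                              # already feasible after clipping negatives
    u = np.sort(w)[::-1]                      # entries sorted in descending order
    css = np.cumsum(u)
    j = np.arange(1, len(u) + 1)
    rho = np.nonzero(u - (css - z) / j > 0)[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)      # shrinkage threshold
    return np.maximum(w - theta, 0.0)
```

A smaller radius z drives more entries of each projected row to exactly zero, which is what makes the role allocations sparser.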
We then showed how to lift that framework to multi-relational graphs by first representing the multi-relational graph as a tensor. We use a Tucker decomposition because the more popular PARAFAC decomposition cannot find the complex interactions that are likely to occur between the E-groups, roles, and R-groups. However, existing Tucker decomposition algorithms in popular toolboxes enforce properties that would lead to non-intuitive results for role discovery, so we formulate our own algorithm. A critical aspect of our work is how to interpret and use the core of the Tucker decomposition, which captures the complex interactions between the E-groups, roles, and R-groups. We show how it can be visualized and represented as an interaction graph whose properties we can use as macroscopic indicators of the original multi-relational graph. Our experimental results focus on 15 multi-relational Congressional cosponsorship record graphs. Here an E-group is a collection of congressional representatives, an R-group is a collection of bill types (determined by the committee each bill went through), and the roles describe cosponsoring behavior. We show that our methods can find intuitive and expected insights, such as Republicans and Democrats naturally separating into different E-groups. We also find that groups of representatives can play multiple roles for multiple R-groups, showing that the Tucker decomposition does indeed find the complex interactions we wish to discover. The macroscopic properties of the interaction graph show that the congresses vary greatly over time, with abrupt changes associated with changes in the Presidency and in control of the Congress. Finally, our transfer setting offers useful insight into how roles have differed across congresses, by using the roles from one congress to explain the behavior of others.
11 Acknowledgments
The authors gratefully acknowledge support of this research via ONR grants N00014-09-1-0712 and N00014-11-1-0108 and NSF Grant IIS-0801528. This work was also supported in part by IARPA via AFRL Contract No. FA8650-10-C-7061 and in part by DARPA under SMISC Program Agreement No. W911NF-12-C-0028.
References
[1] C. A. Andersson and R. Bro. The N-way toolbox for MATLAB. Chemometrics and Intelligent Laboratory Systems, 2000.
[2] B. W. Bader and T. G. Kolda. Algorithm 862: MATLAB tensor classes for fast algorithm prototyping. ACM Transactions on Mathematical Software, 32(4):635–653, December 2006.
[3] B. W. Bader, T. G. Kolda, et al. MATLAB Tensor Toolbox version 2.5, January 2012.
[4] B. W. Bader, T. G. Kolda, et al. MATLAB Tensor Toolbox version 2.5. Available online, January 2012.
[5] S. Basu, I. Davidson, and K. Wagstaff. Constrained Clustering: Algorithms, Applications and Theory. Prentice Hall, 2008.
[6] A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.
[7] M. W. Berry, M. Browne, A. N. Langville, V. P. Pauca, and R. J. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics and Data Analysis, 52(1):155–173, 2007.
[8] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, NY, USA, 2004.
[9] R. Bro and S. De Jong. A fast non-negativity-constrained least squares algorithm. Journal of Chemometrics, 11(5):393–401, 1997.
[10] R. Bro and N. D. Sidiropoulos. Least squares algorithms under unimodality and non-negativity constraints. Journal of Chemometrics, 12(4):223–247, 1998.
[11] CVX Research, Inc. CVX: MATLAB software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx, Sept. 2012.
[12] I. Davidson, S. Gilpin, and P. B. Walker. Behavioral event data and their analysis. DMKD, 25(3):635–653, 2012.
[13] I. Davidson and Z. Qi. Finding alternative clusterings using constraints. In ICDM, pages 773–778, 2008.
[14] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In ICML, pages 272–279, 2008.
[15] J. Fowler. Connecting the Congress: A study of cosponsorship networks. Political Analysis, 2006.
[16] J. Fowler. Legislative cosponsorship networks in the U.S. House and Senate. Social Networks, 2006.
[17] H. Fürstenau and M. Lapata. Semi-supervised semantic role labeling. In EACL, pages 220–228, 2009.
[18] S. Gilpin, T. Eliassi-Rad, and I. Davidson. Guided learning for role discovery (GLRD): Framework, algorithms, and applications. In KDD, 2013.
[19] M. Grant and S. Boyd. Graph implementations for nonsmooth convex programs. In V. Blondel, S. Boyd, and H. Kimura, editors, Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pages 95–110. Springer-Verlag Limited, 2008. http://stanford.edu/~boyd/graph_dcp.html.
[20] W. Heiser and P. Kroonenberg. Dimensionwise fitting in PARAFAC-CANDECOMP with missing data and constrained parameters. Technical Report PRM 97-01, University of Leiden, The Netherlands, 1997.
[21] K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, C. Faloutsos, and L. Li. RolX: Structural role extraction & mining in large graphs. In KDD, pages 1231–1239, 2012.
[22] K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos. It's who you know: Graph mining using recursive structural features. In KDD, pages 663–671, 2011.
[23] P. O. Hoyer. Non-negative matrix factorization with sparseness constraints. JMLR, 5:1457–1469, 2004.
[24] Y.-D. Kim and S. Choi. Nonnegative Tucker decomposition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR '07), pages 1–8, June 2007.
[25] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, September 2009.
[26] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, pages 556–562. MIT Press, 2000.
[27] M. Ley. DBLP, computer science bibliography. http://www.informatik.uni-trier.de/~ley/db/.
[28] T. Li and C. Ding. The relationships among various nonnegative matrix factorization methods for clustering. In ICDM, pages 362–371, 2006.
[29] C.-J. Lin. Projected gradient methods for nonnegative matrix factorization. Neural Computation, 19(10):2756–2779, Oct. 2007.
[30] H. Liu, Z. Wu, X. Li, D. Cai, and T. Huang. Constrained nonnegative matrix factorization for image representation. PAMI, 34(7):1299–1311, 2012.
[31] J. Liu, S. Ji, and J. Ye. SLEP: Sparse Learning with Efficient Projections. Arizona State University, 2009.
[32] J. Liu and J. Ye. Efficient Euclidean projections in linear time. In ICML, pages 657–664, 2009.
[33] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In IMC, pages 29–42, 2007.
[34] M. Mørup, L. K. Hansen, and S. M. Arnfred. Algorithms for sparse nonnegative Tucker decompositions. Neural Computation, 20:2112–2141, 2008.
[35] P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2):111–126, 1994.
[36] Z. Qi and I. Davidson. A principled and flexible framework for finding alternative clusterings. In KDD, pages 717–726, 2009.
[37] M. Somaiya, C. Jermaine, and S. Ranka. Mixture models for learning low-dimensional roles in high-dimensional data. In KDD, pages 909–918, 2010.
[38] G. Tomasi and R. Bro. A comparison of algorithms for fitting the PARAFAC model. Computational Statistics & Data Analysis, 50(7):1700–1734, April 2006.
[39] L. N. Trefethen and D. Bau. Numerical Linear Algebra. SIAM, 1997.
[40] F. Wang, T. Li, X. Wang, S. Zhu, and C. Ding. Community discovery using nonnegative matrix factorization. DMKD, 22(3):493–521, 2011.
[Figure 3 plots, panels (a) CIKM, (b) SDM, (c) ICDM, (d) SIGMOD, (e) VLDB: recall (y-axis) versus set size (x-axis, log scale) for ReFeX, RolX, GLRD (Sparse), and GLRD (Diverse).]
Figure 3: Comparison of role discovery techniques for identity resolution across graphs. Role definitions are learned from the KDD co-authorship graph; then, authors from the other (conference) co-authorship graphs are assigned to these roles using various techniques. In particular, we show results for ReFeX (features only), RolX (unconstrained role discovery), GLRD-Sparse (role discovery with sparsity constraints), and GLRD-Diverse (role discovery with diversity constraints). Authors from each conference are paired with an increasing number of nearest neighbors from the KDD conference (x-axis), and the resulting recall is reported (y-axis). Across most settings, role definitions using sparsity and diversity constraints lead to better identity resolution results than standard unconstrained RolX. For graphs that are most similar in nature to KDD (e.g., ICDM, SDM, CIKM), the transfer of role definitions leads to better results than simply using the structural features of nodes directly. Note that the recall values are relatively low because the set sizes (on the x-axis) are small compared to the population size in each graph.
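The identity-resolution recall in Figures 3 and 4 can be read as "recall at set size k": for each author who appears in both graphs, check whether the true KDD counterpart is among the k nearest KDD authors. The sketch below assumes that each author is represented by a role-membership vector and that neighbors are found by Euclidean distance. Both the distance choice and the name `recall_at_k` are our assumptions, not details given in the captions.

```python
import numpy as np

def recall_at_k(X_other, X_kdd, truth, k):
    """Fraction of matched authors whose true KDD node is among the k
    nearest KDD nodes (Euclidean distance in role-membership space).

    truth maps a row index of X_other to the row index of the same author
    in X_kdd; only authors present in both graphs are scored.
    """
    X_other = np.asarray(X_other, dtype=float)
    X_kdd = np.asarray(X_kdd, dtype=float)
    hits = 0
    for i, j_true in truth.items():
        dist = np.linalg.norm(X_kdd - X_other[i], axis=1)
        nearest = np.argsort(dist, kind="stable")[:k]   # indices of the k closest KDD authors
        hits += int(j_true in nearest)
    return hits / len(truth)
```

Because k is small relative to the number of KDD authors, even good role representations yield modest recall, which matches the caption's remark about low absolute values.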
Figure 4: Comparison of role discovery techniques for identity resolution experiments. Authors from each conference are paired with their 32 nearest neighbors from the KDD conference, and the resulting recall is reported. The percentage (on the x-axis) is the fraction of authors that overlap between the two conferences. Nearly all experiments show better results with sparsity and diversity constraints, except when the authors do not play similar roles in the two conferences (SIGMOD and VLDB).
[Figure 5 graph: the KDD co-authorship graph's largest connected component, with each node labeled by author name.]
Figure 5: A visualization of our alternative role discovery results for the KDD co-authorship graph's largest connected component. All the colored nodes belong to the same primary role under the original factorization. However, they belong to different primary roles under the alternative factorization, as indicated by the various colors. We observe that the alternative roles are able to separate the 3 blue “local-star” nodes (namely, Jun Zhu, Lei Zhang, and Evimaria Terzi) from the red “global-broker” nodes (namely, Christos Faloutsos, Heikki Mannila, Vipin Kumar, etc.). The alternative roles also separate out the 4 yellow “periphery-cliquey” nodes. Note that the reader can zoom in on this figure to read the name of each author.
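[Figure 6 diagram: a tensor with modes labeled nodes and features, decomposed as $\mathcal{V} \approx \mathbf{g}_1 \circ \mathbf{f}_1 \circ \mathbf{r}_1 + \cdots + \mathbf{g}_k \circ \mathbf{f}_k \circ \mathbf{r}_k$, with factor matrices $G$, $F$, and $R$.]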
Figure 6: A multi-relational graph represented using an order 3 tensor. The PARAFAC tensor decomposition approximates the graph as a sum of rank-1 components and is the natural analog of the NMF formulation of role discovery used earlier [21, 18]. However, it has significant limitations for role discovery in multi-relational graphs.
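[Figure 7 diagram: a tensor with modes labeled nodes and features, and Tucker factors labeled E-group definitions (nodes × E-groups) and role definitions (features × roles).]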
Figure 7: The Tucker decomposition for role discovery. The factor matrices can be interpreted as groups of features (role definitions), groups of entities (E-groups), and groups of relations (R-groups). The Tucker core shows how the roles, E-groups, and R-groups interact in the multi-relational graph, and it can itself be viewed as a hypergraph, which we call an interaction graph.
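To make the relation between Figures 6 and 7 concrete, the sketch below rebuilds a tensor from Tucker factors and shows that PARAFAC is the special case in which the core is superdiagonal. It assumes the tensor's modes are ordered nodes × features × relations, with G (E-groups), F (roles), and R (R-groups) as factor matrices. This mode ordering and the function names `tucker_reconstruct` and `parafac_as_tucker` are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def tucker_reconstruct(core, G, F, R):
    """Rebuild a nodes x features x relations tensor from Tucker factors.

    core: (e, r, q) interactions among E-groups, roles, and R-groups.
    G: (nodes, e) E-group memberships; F: (features, r) role definitions;
    R: (relations, q) R-group memberships.
    """
    # V[n, f, l] = sum_{a,b,c} core[a, b, c] * G[n, a] * F[f, b] * R[l, c]
    return np.einsum("abc,na,fb,lc->nfl", core, G, F, R)

def parafac_as_tucker(G, F, R):
    """PARAFAC: a Tucker model whose (k, k, k) core is superdiagonal."""
    k = G.shape[1]
    core = np.zeros((k, k, k))
    core[np.arange(k), np.arange(k), np.arange(k)] = 1.0   # only a == b == c interact
    return tucker_reconstruct(core, G, F, R)
```

With a superdiagonal core, E-group a can only combine with role a and R-group a; a full Tucker core allows any E-group to mix several roles across several R-groups, which is the flexibility the interaction graph exploits.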
[Figure 8 plot: E-group slices of the core, with axes labeled roles and topics.]
Figure 8: Analysis of E-group slices from the tensor core. Each slice shows the roles and R-groups in which an E-group of people plays, and the slices are directly comparable.
[Figure 9 diagrams: node types Role, E-Group, and R-Group; panels No Tie, R-Group Tie, Role Tie, and Bow Tie.]
Figure 9: Some patterns that can exist in an interaction graph. No Tie: an E-group plays only one role in one R-group; R-Group Tie: an E-group plays the same role in multiple R-groups; Role Tie: an E-group plays multiple roles in the same R-group; Bow Tie: an E-group plays multiple roles, but in different R-groups.
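The four patterns can be read off mechanically once an E-group's strong (role, R-group) links are known. The sketch below assumes that those links have already been obtained by sparsifying the core. It also treats any E-group with several roles and several R-groups as a Bow Tie, which simplifies mixed cases the caption does not address. `tie_pattern` is a hypothetical name.

```python
def tie_pattern(pairs):
    """Classify an E-group's (role, r_group) links into a Figure 9 pattern.

    pairs: set of (role, r_group) tuples kept after sparsifying the core.
    Mixed cases (several roles and several R-groups) are all labeled Bow Tie.
    """
    if not pairs:
        raise ValueError("E-group has no links after sparsification")
    roles = {role for role, _ in pairs}
    r_groups = {rg for _, rg in pairs}
    if len(roles) == 1 and len(r_groups) == 1:
        return "No Tie"           # one role in one R-group
    if len(roles) == 1:
        return "R-Group Tie"      # same role in several R-groups
    if len(r_groups) == 1:
        return "Role Tie"         # several roles in the same R-group
    return "Bow Tie"              # several roles across different R-groups
```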
[Figure 10 diagram: hypothetical representatives Pelosi, Reid, and Blunt, each shown with a list of cosponsored bills (Education Bills 1–2, Agriculture Bills 1–2), joined by edges labeled committee:weight (E:2, E:1, E:1, A:1, A:1).]
Figure 10: Description of how multi-relational graphs are created from the congressional cosponsorship data. Nodes in this graph represent congressional representatives, and the adjacent lists of hypothetical bills are those that each representative cosponsored. When two representatives cosponsor the same bill, a labeled edge is created between them, where the label corresponds to the committee assigned to the bill (e.g., Agriculture, Education). The weight associated with a labeled edge is the number of bills from the same committee that both representatives cosponsored.
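The construction in Figure 10 amounts to counting, for every pair of representatives, the shared bills per committee. The sketch below does this with plain dictionaries. The input layout (representative mapped to a set of bill ids, and bill id mapped to a committee) and the name `cosponsor_graph` are our assumptions about how the raw data might be organized.

```python
from collections import Counter
from itertools import combinations

def cosponsor_graph(cosponsored, committee_of):
    """Weighted, labeled edges of the cosponsorship multi-relational graph.

    cosponsored: dict mapping representative -> set of bill ids cosponsored.
    committee_of: dict mapping bill id -> committee label (e.g. "Agriculture").
    Returns a Counter keyed by (rep_a, rep_b, committee) whose value is the
    number of bills from that committee that both representatives cosponsored.
    """
    edges = Counter()
    for a, b in combinations(sorted(cosponsored), 2):
        for bill in cosponsored[a] & cosponsored[b]:   # bills both cosponsored
            edges[(a, b, committee_of[bill])] += 1
    return edges
```

For example, with three hypothetical representatives cosponsoring {E1, E2, A2}, {E1, E2, A1}, and {E2, A1, A2}, the result has one Education edge of weight 2 and four edges of weight 1, the same label pattern as the figure. Stacking one adjacency matrix per committee turns these edges into the order 3 tensor used for the Tucker analysis.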
[Figure 11 bar chart: property contribution (y-axis, 0 to 1) for each of roles 1–5 (x-axis); features: Degree, Weight, Clustering Coefficient, PageRank, Eccentricities, Biconnected Components.]
Figure 11: Sense making of roles discovered in the 110th Congress Cosponsorship Multi-Relational Graph. Roles are redefined in terms of a set of reference features, each of which is normalized for comparison purposes. Role 3 corresponds to the power brokers.
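The caption does not spell out how roles are re-expressed in terms of reference features. As a simple stand-in, the sketch below takes a membership-weighted average of each reference feature within each role and then normalizes each feature across roles, so that every feature's contributions sum to 1. The weighting scheme, the normalization, and the name `role_feature_contribution` are all assumptions; the paper's exact sense-making procedure may differ.

```python
import numpy as np

def role_feature_contribution(G, M):
    """Describe roles by reference features (e.g. degree, PageRank).

    G: (nodes, roles) nonnegative role memberships.
    M: (nodes, features) reference measurements for each node.
    Returns a (roles, features) matrix whose columns each sum to 1, so a
    feature's contribution can be compared across roles.
    """
    G = np.asarray(G, dtype=float)
    M = np.asarray(M, dtype=float)
    # Membership-weighted mean of each feature within each role.
    profile = (G.T @ M) / G.sum(axis=0)[:, None]
    # Normalize each feature column so its contributions across roles sum to 1.
    return profile / profile.sum(axis=0, keepdims=True)
```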
E-group 1: Name | Party | Exp
Millender-McDonald | D | 11
Obey, David | D | 38
Tsongas, Niki | D | 0
Speier, Jackie | D | 0
Faleomavaega, Eni | D | 18
Meehan, Martin | D | 14
Edwards, Donna | D | 0
Visclosky, Peter | D | 22
Hoyer, Steny | D | 26
Foster, Bill | D | 0
(a) Democrat seniority. Hoyer was the majority leader. Characterized by a large number of collaborations with many representatives, largely in the 3rd R-group (Ways and Means).

E-group 2: Name | Party | Exp
Hensarling, Jeb | R | 4
Boehner, John | R | 16
Thornberry, Mac | R | 12
Broun, Paul | R | 0
Shadegg, John | R | 12
Hastert, Dennis | R | 8
Scalise, Steve | R | 11
Latta, Robert | R | 6
Flake, Jeff | R | 6
McCrery, Jim | R | 14
(b) Republican seniority. Boehner was the minority leader at the time.

E-group 3: Name | Party | Exp
Cooper, Jim | D | 16
Johnson, Henry | D | 0
Ryan, Tim | D | 4
DeGette, Diana | D | 10
Engel, Eliot L. | D | 14
Doggett, Lloyd | D | 12
Pastor, Ed | D | 16
Meek, Kendrick | D | 4
Murphy, C. | D | 0
Crowley, Joseph | D | 8
(c) Active largely in one R-group (the 5th), but with multiple roles. The 5th R-group is dominated by the Agriculture committee.

E-group 4: Name | Party | Exp
Hall, Ralph | R | 16
Rodgers, Cathy | R | 2
Myrick, Sue | R | 12
Issa, Darrell | R | 6
Drake, Thelma | R | 2
Kuhl, Randy | R | 2
Poe, Ted | R | 2
Boozman, John | R | 6
Conaway, Michael | R | 2
Wamp, Zach | R | 12
(d) Working with many representatives (high degree) but not often (low weight) in R-group 5.

E-group 5: Name | Party | Exp
Jackson-Lee, Sheila | D | 12
Cohen, Steve | D | 0
Hare, Phil | D | 0
Grijalva, Raul | D | 4
English, Phil | R | 12
Honda, Michael | D | 6
McCotter, Thaddeus | R | 4
Filner, Bob | D | 14
Hinchey, Maurice | D | 14
Gonzalez, Charles | D | 8
(e) Mixed party membership.
Figure 12: Samples of congressional representatives from each E-group (found in the 110th Congress Cosponsorship Graph), along with their party affiliation and years of service (Exp) in the U.S. House of Representatives at the beginning of the congress (2007).
[Figure 13 bar plots, panels Relational Topic 1–5: contribution (0 to 0.8) of each of 15 committees: Energy and Commerce, Rules, Small Business, Transportation and Infrastructure, Appropriations, Veterans' Affairs, Education and Labor, Agriculture, Ways and Means, Financial Services, Oversight and Government Reform, Judiciary, Natural Resources, Budget, Science and Technology.]
Figure 13: R-groups for the 100th congress. Each bar plot corresponds to a single R-group, and the bars show how much each relation contributes to that R-group.
[Figure 14 heat maps, panels Group 1–5: relational topics 1–5 (rows) versus roles 1–5 (columns).]
Figure 14: Tucker core found in the 110th Congress Cosponsorship Graph, sliced by E-group. Each slice represents an E-group, while the rows correspond to R-groups and the columns correspond to roles. Light colors correspond to high values, and black corresponds to zero.
[Figure 15 graph: nodes Role 1–5, E-group 1–5, and R-group 1–5.]
Figure 15: Sparsified tripartite representation of the core tensor found in the 110th Congress Cosponsorship Graph. Each entry i, j, k of the core corresponds to a hyperedge between E-group i, role j, and R-group k. We sparsify this into a single-relation graph that is role focused. Looking back at our example patterns (see Figure 9), we observe that this congress has two bow-tie patterns (E-groups 2 and 4, 100% Republicans), one no-tie pattern (E-group 5, 80% Democrats), one role-tie pattern (E-group 3, 100% Democrats), and one R-group tie (E-group 1, 100% Democrats). Figure 12 lists the members of each E-group.
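The sparsification step is described only in outline, so the sketch below is one plausible reading. It keeps core entries at or above a fraction tau of the largest entry (tau is a hypothetical threshold, not a value from the paper) and turns each kept hyperedge (E-group i, role j, R-group k) into two role-centered edges, E-group–role and role–R-group, giving a role-focused tripartite graph. `sparsify_core` and the node-label format are our own choices.

```python
import numpy as np

def sparsify_core(core, tau=0.5):
    """Keep the strongest hyperedges of a Tucker core.

    core[i, j, k] links E-group i, role j, and R-group k. Entries below
    tau * core.max() are dropped (tau is a hypothetical threshold).
    Returns the kept hyperedges and a role-focused tripartite edge set.
    """
    core = np.asarray(core, dtype=float)
    kept = [tuple(int(x) for x in idx)
            for idx in np.argwhere(core >= tau * core.max())]
    tripartite = set()
    for i, j, k in kept:
        tripartite.add((f"E{i + 1}", f"Role{j + 1}"))   # E-group to role
        tripartite.add((f"Role{j + 1}", f"R{k + 1}"))   # role to R-group
    return kept, tripartite
```

Grouping the kept hyperedges by E-group yields each E-group's (role, R-group) links, from which patterns such as the bow ties and role ties named in the caption can be identified.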
Figure 16: Projection and heterogeneous clustering of the tripartite graph representation of the core tensor found for the 110th Congress Cosponsorship Graph. Colors represent the clustering, while marker shapes represent the type of object (E-groups, roles, and R-groups).
[Figure 17 line plot: contribution (y-axis) versus congress 96–110 (x-axis) for Degree (Mean), Degree (Variance), Eigen Gap, Cut Cost, Stationary Dist. (Entropy), and Meta-Average.]
Figure 17: Properties of the interaction graphs formed from the Tucker cores for the last 15 Congresses (96th–110th). All attributes are normalized for comparison purposes.
[Figure 18 heat map: congress transferred from (y-axis, 96–110) versus congress transferred to (x-axis, 96–110); color scale 0 to 1.]
Figure 18: Fit quality when transferring roles between the 96th and 110th congresses, where fit = 1 − reconstruction error/||V||. Roles learned from the congresses on the y-axis (rows) are transferred to each congress on the x-axis (columns). Transferring to temporally more distant congresses generally leads to poorer fits.