Some Advances in Role Discovery in Graphs

Sean Gilpin, University of California, Davis

[email protected]

Chia-Tung Kuo, University of California, Davis

[email protected]

Tina Eliassi-Rad, Rutgers University

[email protected]

Ian Davidson, University of California

[email protected]

September 8, 2016

arXiv:1609.02646v1 [cs.AI] 9 Sep 2016

Abstract

Role discovery in graphs is an emerging area that allows analysis of complex graphs in an intuitive way. In contrast to other graph problems such as community discovery, which finds groups of highly connected nodes, the role discovery problem finds groups of nodes that share similar graph topological structure. However, existing work has two severe limitations that prevent its use in some domains. First, it is completely unsupervised, which is undesirable for a number of reasons. Second, most work is limited to a single relational graph. We address both limitations in an intuitive and easy-to-implement alternating least squares framework. Our framework allows convex constraints to be placed on the role discovery problem, which can provide useful supervision. In particular, we explore supervision to enforce i) sparsity, ii) diversity and iii) alternativeness. We then show how to lift this work to multi-relational graphs. A natural representation of a multi-relational graph is an order 3 tensor (rather than a matrix), and a Tucker decomposition allows us to find complex interactions between collections of entities (E-groups) and the roles they play for a combination of relations (R-groups). Existing Tucker decomposition methods in tensor toolboxes are not suited for our purpose, so we create our own algorithm that we demonstrate is pragmatically useful.

1 Introduction

Role discovery is a developing area that allows the simplification of graphs in a user-interpretable way. Consider a graph of n nodes specified in an adjacency matrix A. Earlier efforts convert this matrix into a new n × f matrix V so that each node in the graph has a list of f features [22]. Role discovery is then the computation of converting V so that each node/user is mapped to a combination of roles (denoted by the n × r matrix G) and each role is defined with respect to the f features (denoted by the r × f matrix F). This is accomplished by performing a non-negative matrix factorization as shown below:

$$
\operatorname*{argmin}_{G,F} \; \|V - GF\|_2 \quad \text{subject to: } G \ge 0,\; F \ge 0 \tag{1}
$$

The n × r matrix G, when read row-wise, indicates which of the r roles each node plays and to what degree. The r × f matrix F, when read row-wise, defines each of the r roles with respect to the f features. The entries in G and F are non-negative real numbers, signifying that each node can play each role to varying degrees and that different features define a role to varying degrees. This simplification of graphs into roles is not only intuitive for a domain expert, but has also been shown to be useful in a number of interesting settings including prediction, transfer learning, and sense making [21].
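To make the row-wise reading of G and F concrete, the following is a minimal sketch of the factorization in Equation 1 on a toy feature matrix. It uses Lee-Seung multiplicative updates, a standard NMF heuristic chosen here only for brevity; it is not the constrained solver developed later in the paper. The function name `nmf_multiplicative` and the toy matrices are illustrative assumptions.

```python
import numpy as np

def nmf_multiplicative(V, r, iters=200, seed=0, eps=1e-9):
    """Approximate V (n x f, non-negative) as G @ F with G (n x r) >= 0 and F (r x f) >= 0."""
    rng = np.random.default_rng(seed)
    n, f = V.shape
    G = rng.random((n, r)) + eps
    F = rng.random((r, f)) + eps
    errors = []
    for _ in range(iters):
        # Multiplicative updates keep entries non-negative because they only
        # multiply by ratios of non-negative quantities.
        F *= (G.T @ V) / (G.T @ G @ F + eps)
        G *= (V @ F.T) / (G @ F @ F.T + eps)
        errors.append(np.linalg.norm(V - G @ F))
    return G, F, errors

# Toy node-feature matrix: 6 nodes, 4 features, built from 2 planted roles.
G_true = np.array([[1, 0], [1, 0], [0.5, 0.5], [0, 1], [0, 1], [0.2, 0.8]])
F_true = np.array([[3.0, 1.0, 0.0, 0.0], [0.0, 0.0, 2.0, 4.0]])
V = G_true @ F_true

G, F, errors = nmf_multiplicative(V, r=2)
dominant_role = G.argmax(axis=1)  # row-wise reading of G: strongest role per node
```

Each row of `G` gives one node's degree of membership in each role, and each row of `F` gives one role's weights over the features.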

Limitations in Existing Work. However, all work developed so far has two limitations. First, role discovery has typically been completely unsupervised, in that the domain expert cannot easily inject their expertise and expectations into the simplification. Second, role discovery is typically performed on a single relational graph. We now discuss each limitation in turn.

Consider a domain expert who is looking for the simplest explanation of a graph during the exploratory phase of analysis. With existing work, the expert cannot specify how to emphasize this simplicity apart from requiring a small number of roles to be used. Other forms of parsimonious guidance, such as requiring that a node be assigned to only a few roles or that each role be defined by only a small set of features, are desirable but currently not possible. Similarly, if a decomposition yields a set of roles that are not actionable, not interesting or already known, the domain expert cannot enforce an alternative set of roles. These two recent trends in data mining (exploring the addition of positive and negative guidance) have been shown to have wide-scale application in the data mining literature [5][36], but to our knowledge have not been applied to role discovery. Hence this work marks the first paper exploring guided role discovery.

To our knowledge, previous work in role discovery focuses only on simple graphs with a single relational type. Yet many datasets are either directly multi-relational or can be modeled as a multi-relational graph. Consider an email graph modeling just one relation, sent-mail-to. This graph greatly masks the complexity of the underlying behavior occurring in the network. Many more insights could be found if, say, the topic of the email were also considered, producing a multi-relational graph sent-mail-to-y-about-x. Similarly, consider a node-attributed social network graph, that is, one in which each node has multiple labels. Such a graph can augment the basic friend relation by creating multiple relations such as female-friend, school-friend or nearby-friend, placing an edge between nodes that are friends and also share label values.
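As an illustration of the node-attributed construction described above, the sketch below splits a single friend graph into one adjacency matrix per shared label value. The function `relation_slices`, the toy graph and the label values are hypothetical. Stacking the slices gives an adjacency array with one slice per relation; the tensor used later in the paper has features rather than adjacency in its second mode, which would require a feature-extraction step on each slice that is not shown here.

```python
import numpy as np

def relation_slices(friend_adj, labels):
    """Split a friend graph into one adjacency matrix per shared label value.

    friend_adj: (n x n) symmetric 0/1 array; labels: list of n label values.
    """
    labels = np.asarray(labels)
    slices = {}
    for value in sorted(set(labels.tolist())):
        has_value = (labels == value)
        # Keep an edge only when both endpoints are friends AND share this label value.
        both = np.outer(has_value, has_value)
        slices[value] = friend_adj * both
    return slices

friend_adj = np.array([
    [0, 1, 1, 0],
    [1, 0, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 1, 0],
])
school = ["davis", "davis", "rutgers", "rutgers"]  # hypothetical node labels
by_school = relation_slices(friend_adj, school)

# One frontal slice per relation: shape (nodes, nodes, relations).
adjacency_stack = np.stack([by_school[v] for v in sorted(by_school)], axis=2)
```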

Challenges. The challenge with adding guidance to role discovery is how to do so while still yielding an efficiently solvable algorithm. Pre-processing the graph or post-processing the results is undesirable; instead, it is preferable to inject the guidance into the underlying algorithm that finds the roles. Alternating least squares (ALS) is a popular and well-understood algorithm for non-negative matrix factorization (NMF) for role discovery, and the challenge is to add guidance to this algorithm.

The challenge of role discovery in multi-relational graphs is twofold: the first part is representational and the second is algorithmic.

Representational Challenges. How should a multi-relational graph be represented for effective and efficient role discovery? In Figure 6, we show how an order 3 tensor can compactly represent such a graph. Here, the first mode represents the entities (i.e., nodes) in the graph; the second mode represents the features of each entity (obtained from our ReFeX package [22]); and the last mode represents the relations. The existing work on single-relation graphs uses non-negative matrix factorization. The analogous PARAFAC (parallel factor) tensor decomposition for our multi-relational graph tensor has several serious limitations. In particular, it requires each group of nodes to play exactly one role for exactly one set of relations. This is not due to the rank-one decomposition assumption, but rather due to the simplified form of the decomposition. This cardinality limitation greatly restricts what can be found. Consider our aforementioned example of an email network: we could perhaps find that a group of people play the role of a broker for a particular email topic, office-party. Though this is useful, if those people also play a different role for the same email topic, a PARAFAC decomposition could not find it. Similarly, and most importantly, if another group were to play the role of a peripheral figure for the exact same topic (office-party), PARAFAC would not be able to discover this relation. It is precisely these types of complex multi-way interactions between people, roles and relations that we wish to discover. Hence we do not consider PARAFAC decompositions, though they would be the natural extension of our earlier work on role discovery to multi-relational graphs. Instead we use a Tucker decomposition, shown in Figure 7, whose addition of a core tensor to the decomposition allows multiple groups of entities (E-groups) to play multiple roles for multiple groups of relations (R-groups). This allows very complex insights into the behavior in the graph to be found, but the challenge of how to interpret and use this core is critical to our work.
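The difference between the two decompositions can be sketched numerically. Below, a Tucker model is rebuilt from a core tensor and three factor matrices, and a PARAFAC-style model is obtained as the special case where the core is superdiagonal. The names `tucker_reconstruct`, `E`, `R` and `Q`, and all sizes, are assumptions made for illustration and are not the paper's notation.

```python
import numpy as np

def tucker_reconstruct(core, E, R, Q):
    """Rebuild an (entities x features x relations) tensor from a Tucker model.

    core: (p x q x s); E: (n x p) entity groups; R: (f x q) roles; Q: (m x s) relation groups.
    """
    return np.einsum("abc,ia,jb,kc->ijk", core, E, R, Q)

rng = np.random.default_rng(0)
n, f, m = 5, 4, 3          # entities, features, relations
p = q = s = 2              # numbers of E-groups, roles, R-groups
E, R, Q = rng.random((n, p)), rng.random((f, q)), rng.random((m, s))

# PARAFAC-style model: the core is superdiagonal, so E-group t is tied to
# role t and R-group t only.
diag_core = np.zeros((p, q, s))
for t in range(p):
    diag_core[t, t, t] = 1.0
X_parafac = tucker_reconstruct(diag_core, E, R, Q)

# The same tensor written as a sum of rank-one terms, one per component.
X_sum = sum(np.einsum("i,j,k->ijk", E[:, t], R[:, t], Q[:, t]) for t in range(p))

# A fuller non-negative core lets E-group 0 also play role 1 for R-group 0.
full_core = diag_core.copy()
full_core[0, 1, 0] = 0.5
X_tucker = tucker_reconstruct(full_core, E, R, Q)
```

Each non-zero entry of the core links one E-group, one role and one R-group, which is why an unrestricted core can express the many-to-many interactions that a superdiagonal core cannot.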

Algorithmic Challenges. The second challenge is that existing Tucker decompositions found in the popular Kolda Tensor Toolbox and Bro N-way Toolbox are not suited for our purpose. Existing toolboxes implement an orthogonality constraint on the factor matrices, which in our context (where the tensor only contains non-negative values) means each group of entities must be distinct (i.e., non-overlapping) from every other group, and the same holds for roles and groups of relations. Similarly, existing toolboxes typically do not enforce a non-negativity constraint on the core of the Tucker decomposition, meaning that if we used them we would have entities playing negative roles, which does not make intuitive sense. Hence, to better fit our need for interpretable insights on overlapping groups of entities (a.k.a. E-groups), roles, and groups of relations (a.k.a. R-groups), we develop our own algorithm, Multi-relational Role Discovery (MRD), shown in Algorithm 1.

Our work makes several contributions to the field of role discovery in graphs. With respect to guided role discovery we show:

• We provide a framework to encode guidance as a series of convex optimization problems, each of which can be efficiently solved by our alternating least squares (ALS) algorithm. All data sets and code will be made available once the paper is accepted.

• Within our framework we explore guidance in the form of sparsity, diversity and orthogonality/alternativeness, but other types of guidance are possible.

• We show that sparsity and diversity yield improved performance in terms of predictive accuracy for the identity resolution task across multiple graphs.

• We show that alternative roles exist in social networks (such as in a YouTube graph) and in particular that these roles are very different from the known communities in the data.

With respect to multi-relational role discovery we show:

• We propose and study role discovery in multi-relational graphs using tensors and our novel MRD Tucker decomposition algorithm (see Sections 6 and 7).

• We show how to analyze the core tensor of the Tucker decomposition in a multitude of visual and analytic ways to explain the complex interactions occurring (see Section 8).

• We create and measure macro-level properties of the interaction graph, such as the simplicity, sharing and stability of the graph with respect to roles (see Table 5).

• We use a constrained formulation of our algorithm that allows transferring knowledge (i.e., roles) from one graph to explain another graph (see Section 9.2). This allows understanding temporal shifts in roles (see Figure 18).

In the next section, we describe related work and then an algorithm for incorporating convex constraints in non-negative matrix decomposition, which allows us to encode guidance in a flexible way. Section 4 presents how convex constraints can naturally encode guidance in the form of sparsity and diversity on both the role assignment matrix (G) and the role explanation matrix (F). We also present how these constraints can encode the notion of alternativeness to find a set of roles that differs from another set that is, for instance, non-actionable or trivial. Our experiments on guidance, in Section 5, demonstrate the usefulness of these forms of guidance in a number of applications and real-world graphs. We show how sparsity and diversity guidance can improve prediction performance for the application of identity resolution via roles. We also show how alternativeness can be used to find an alternative set of roles to the underlying community structure. Next, in Section 6, we show how multi-relational role discovery can be formulated as a tensor decomposition problem. In particular, it can be modeled using a non-negative Tucker decomposition, and in that section we also propose our Tucker decomposition algorithm. The Tucker decomposition captures many of the complex interactions between nodes, the roles they play, and the relationships they play them for; these interactions are recorded in the core tensor of the decomposition. In Section 8 we discuss how the core of the Tucker decomposition can be interpreted in a number of ways, including as a heterogeneous hyper-graph on the space of groups of nodes, groups of features (roles) and groups of relations. Our work opens up many possible novel uses, and we experimentally focus on two: i) macroscopic properties of the graphs in terms of roles and ii) transfer settings between multiple graphs, both of which are discussed in Section 9.

2 Related Work

The basis for role discovery in graphs using non-negative matrix factorization (NMF) was first proposed in a series of papers at KDD [22][21]. The method ReFeX [22] described a recursive method to take an n × n adjacency matrix (A) and compute a set of f salient features for each of the n nodes, represented as a matrix V. The RolX method [21] made use of NMF to simplify the features into a set of roles and explored their use for graph matching, sense making and transfer learning. Many previous works had applied NMF to other data mining problems (e.g. [40][28]), but theirs was the first to apply it to role discovery. Other methods for role discovery are not scalable to huge graphs and include Bayesian frameworks using MCMC sampling methods [37] and semi-supervised role labeling [17].

The addition of guidance to matrix decomposition is a relatively new area, with most work involving spatial data and properties such as unimodality, as we have done for tensors [12]. Of course, much work exists on very basic constraints such as non-negativity and minimal rank decompositions. Elsewhere in the literature, constraints for matrix decomposition take on several meanings that differ from our own. For example, in [30] the authors propose the use of labeled information to guide the decomposition. Perhaps the closest to our own work is the use of sparseness constraints in NMF [23].

To the best of our knowledge, the encoding of guidance for role discovery and the encoding of diversity and alternative constraints for NMF as described in this paper have not been addressed before. However, the notion of guidance and “alternativeness” is popular in the clustering field, with work by ourselves and others [5][36].

3 A Constrained NMF Framework for Encoding Guidance into Role Discovery

In this section, we discuss our algorithm for solving the guided role-discovery problem. We present a general algorithm that is well-suited for large-scale problems and is capable of being extended to different forms of guidance. The different forms of supervision (described in Section 4) are solvable using this algorithm.

Our algorithm for solving the guided role discovery problem is a constrained NMF approach used to find the decomposition shown in Equation 2. Like many unconstrained NMF solvers, it uses the alternating least squares approach [35, 7]. Non-negative least squares is a well-studied problem, and can be utilized to find an NMF solution by solving for one matrix at a time (G or F) while holding the other constant, which is generally known as alternating least squares (ALS). Solving NMF exactly is known to be intractable; the ALS approach is not guaranteed to find global solutions but will converge to a local minimum. In this work, we add additional constraints to the problem and therefore need more sophisticated methods.

The method we chose was motivated by gradient projection methods, which are known for being well-suited to quickly finding good but not highly accurate solutions for large problems, by sacrificing some of the theoretical convergence guarantees of methods such as interior point [6]. Projected gradient descent methods can be summarized as those that iteratively find better points by following the gradient of the objective function, and subsequently find the closest point that meets the constraints. Since the objective we are solving is least squares, we have a closed-form solution to the unconstrained minimum, from which we subsequently find the closest constrained solution. It is known that for a class of constrained least squares problems, this approach will lead to an exact global solution in one iteration (see Lemma 1).

Therefore, our algorithm has the advantage that each subproblem (but not the entire problem) can be solved exactly by reducing it to an unconstrained least squares problem [39][3] and a Euclidean projection problem [14][32], both of which have efficient solutions. Additionally, this approach to optimization (projected gradient descent) has been shown in the past to work well on large-scale problems, at the expense of accuracy, and is used by state-of-the-art solvers [31].

The outline of the remainder of this section is as follows. First, we formally describe the convex constrained NMF problem and discuss how ALS can be used to solve it. Then, we explain how ALS can also be used to solve for individual role assignment vectors, as well as role definition vectors. Finally, we describe how ALS over definition/assignment vectors can be solved using a projection method by first solving an unconstrained least squares problem and then finding the closest point in the constrained space.

The Constrained NMF Problem. In Equation 2, there are two variables, G and F, that are being simultaneously optimized. If either is treated as a constant, the problem becomes convex and can be solved exactly using any method for solving convex optimization problems. One can alternate between solving for G and F this way until convergence. Although each iteration finds a global optimum to this modified problem, the overall procedure (alternating optimization) is not guaranteed to find a global minimum to the original problem in Equation 2. In the following, we describe our method for transforming the formulation into a series of convex programming problems, which are generally easy to solve.

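$$
\begin{aligned}
\underset{G,F}{\text{minimize}} \quad & \|V - GF\|_2 \\
\text{subject to} \quad & g_i(G) \le d_{Gi}, \quad i = 1, \dots, t_G \\
& f_i(F) \le d_{Fi}, \quad i = 1, \dots, t_F
\end{aligned} \tag{2}
$$

where $g_i$ and $f_i$ are convex functions.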

An ALS Formulation. Rather than alternating between solving for the entire matrices G and F, we can instead solve for one column of G or one row of F at a time. This is possible if convex constraints can be specified in terms of these columns, which is the case in this work. Without loss of generality, Equation 3 shows an individual sub-optimization problem in terms of one of the columns of G, denoted x.

$$
G_k = \operatorname*{argmin}_{x} \; \|R - xF_k\|_2 \quad \text{subject to: } g_i(x) \le d_{Gi}, \quad i = 1, \dots, t_G \tag{3}
$$

In Equation 3, R represents the residuals of all other factors not being solved for (the sum of outer products of corresponding columns of G and rows of F, subtracted from V). $F_k$ is the kth row of the role/feature explanation matrix, which corresponds to the kth column of the role assignment matrix. So with this formulation, we alternate between learning a single role assignment and learning the corresponding role definition. Next we explain how we solve the convex constrained problem shown in Equation 3.
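The residual and the unconstrained part of this subproblem can be sketched as follows, assuming the objective is an ordinary (Frobenius) least squares fit. The helper names `residual_without_role` and `unconstrained_column` are illustrative, and the closed form is cross-checked against a generic least squares solver.

```python
import numpy as np

def residual_without_role(V, G, F, k):
    """R = V minus the contribution of every role except role k."""
    others = [j for j in range(G.shape[1]) if j != k]
    return V - G[:, others] @ F[others, :]

def unconstrained_column(R, F_k):
    """Closed-form minimizer of ||R - outer(x, F_k)||^2 over x, with no constraints."""
    return R @ F_k / (F_k @ F_k)

rng = np.random.default_rng(1)
V = rng.random((6, 4))   # 6 nodes, 4 features
G = rng.random((6, 3))   # 3 roles
F = rng.random((3, 4))

k = 1
R = residual_without_role(V, G, F, k)
x_star = unconstrained_column(R, F[k])

# Cross-check with a generic solver: R.T ~= F_k[:, None] @ x[None, :].
x_lstsq = np.linalg.lstsq(F[k][:, None], R.T, rcond=None)[0].ravel()
```

Because the model for one role is a single outer product, each entry of `x_star` is just the regression coefficient of one row of `R` on `F[k]`.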

Solving the Constrained Least Squares Problem. Our projection method is as follows. First, solve Equation 3 with all constraints removed using standard least squares solvers. Second, find the closest point to the unconstrained solution that satisfies the given constraints. This projection method takes advantage of standard and very fast least squares solvers, and the subsequent nearest feasible point problem is relatively simple to solve. In addition, Lemma 1 shows that performing these two steps will exactly solve the original problem in Equation 3. Applications of this lemma and its proof can be found in [10][20].

Lemma 1 (Projection Equivalence Result). The following constrained optimization problem:

$$
\underset{x}{\text{minimize}} \; \|B - xa\|_2 \quad \text{subject to: } c_i(x) \le d_i, \quad i = 1, \dots, n \tag{4}
$$

where the $c_i$ are convex functions of x, is equivalent to:

$$
\underset{x}{\text{minimize}} \; \|x^* - x\|_2 \quad \text{subject to: } c_i(x) \le d_i, \quad i = 1, \dots, n \tag{5}
$$

where $x^*$ is the optimal solution to the optimization problem in Equation 4 without constraints.
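A short calculation suggests why the equivalence holds. It is a sketch rather than the proof cited in the text, and it assumes that B is an n × f matrix, x is a column vector of length n, a is a non-zero row vector of length f, and the norm in Equation 4 is the Frobenius norm, taken squared (which does not change the minimizer):

$$
\begin{aligned}
\|B - xa\|_F^2 &= \|B\|_F^2 - 2\,x^\top B a^\top + \|a\|_2^2\,\|x\|_2^2 \\
&= \|a\|_2^2\,\|x - x^*\|_2^2 + \|B\|_F^2 - \|a\|_2^2\,\|x^*\|_2^2,
\qquad x^* = \frac{B a^\top}{a a^\top}.
\end{aligned}
$$

The last two terms do not depend on x and $\|a\|_2^2 > 0$, so over any feasible set the objective of Equation 4 and the distance to $x^*$ in Equation 5 are minimized by the same point.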

This leads to the algorithm for convex constrained NMF presented in Figure 1. Like ALS for unconstrained NMF, this heuristic is not guaranteed to reach a global optimum, even though all subproblems are solved exactly. However, no step increases the global objective (Equation 2). Thus, in practice the algorithm will find local minima that meet all specified constraints.

The advantage of solving for one role at a time, rather than the entirety of G or F as is generally done with ALS, is that it allows the problem to be broken down into smaller parts that then fit into fast solvers. In general, projection methods have been found to be better suited to larger problems, and we found this to be the case as well. Using this method allows us to solve much larger problems than we had previously been able to using standard constrained optimization solvers [12]. The final constrained optimization problem (i.e., the closest constrained point problem) is simple enough that, even for medium-sized problems, we could utilize high-level solvers such as CVX [11][19], which makes experimenting with new types of constraints very simple.
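The role-at-a-time loop of Figure 1 can be sketched in a few lines, assuming a Frobenius least squares objective and constraints that act on a single vector in isolation. The projection arguments `project_g` and `project_f` are hypothetical hooks; only non-negativity is plugged in here, and constraints that involve other roles (such as diversity) would need a projection that also receives the other columns or rows.

```python
import numpy as np

def project_nonnegative(x):
    """Closest point to x with all entries >= 0."""
    return np.maximum(x, 0.0)

def constrained_als(V, r, project_g=project_nonnegative, project_f=project_nonnegative,
                    max_sweeps=100, tol=1e-8, seed=0):
    """Role-at-a-time ALS: closed-form least squares followed by a projection."""
    rng = np.random.default_rng(seed)
    n, f = V.shape
    G, F = rng.random((n, r)), rng.random((r, f))
    errors = [np.linalg.norm(V - G @ F)]
    for _ in range(max_sweeps):
        for k in range(r):
            # Step 1: residual after removing every role except k.
            R = V - G @ F + np.outer(G[:, k], F[k])
            # Step 2: unconstrained column of G, then nearest feasible point.
            G[:, k] = project_g(R @ F[k] / max(F[k] @ F[k], 1e-12))
            # Step 3: unconstrained row of F, then nearest feasible point.
            F[k] = project_f(G[:, k] @ R / max(G[:, k] @ G[:, k], 1e-12))
        errors.append(np.linalg.norm(V - G @ F))
        if errors[-2] - errors[-1] < tol:   # stop once the error stops decreasing
            break
    return G, F, errors

rng = np.random.default_rng(42)
V = rng.random((20, 3)) @ rng.random((3, 8))   # synthetic data with 3 planted roles
G, F, errors = constrained_als(V, r=3)
```

Swapping in a different projection changes the kind of guidance without touching the loop, which is the flexibility the framework relies on.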

4 Framework for Flexible Supervision

In the previous section, we discussed a novel and general algorithm that can easily handle convex constraints. Convex constraints can encode a variety of useful forms of guidance. In this section, we show how they can be used to enforce sparsity, diversity and alternativeness. In the experimental section, we show applications which exploit these forms of guidance.

4.1 Sparsity

The area of sparsity has recently attracted much attention. In a general context, sparsity has been shown to have two main benefits: (1) parsimony and (2) improved predictive performance, with the latter being motivated by Occam’s razor. Sparse learning formulations exist for many learning settings such as linear regression (LASSO), kernel methods (SVM) and covariance estimation.

Table 1: Summary of effects of constraints on both role assignment G and role definitions F (see Section 4 for formulation of each constraint type).

| Constraint | Role Assignment | Role Definition |
|---|---|---|
| Sparsity | Encourages role assignments to be more definitive. Increasing the strength of the constraint reduces the number of nodes that have minority membership in a role. | Increases the ability to interpret role definitions by ensuring that the definitions only use features most strongly correlated with each role. Increasing the strength of the constraint decreases the likelihood that features with small explanatory benefit are included. |
| Diversity | Roles cannot have memberships that are too similar. No two roles can have exactly the same membership assignment. Increasing the strength of the constraint limits the amount of allowable overlap in assignments. | Roles cannot have definitions that are too similar. No two roles can have redundant explanations, and increasing the strength of the constraint ensures that roles must be explained with completely different sets of features. |
| Alternative | Find a set of roles that lends itself to a different role assignment than a given role assignment. Increasing the strength of the constraint decreases the allowable similarity between the two. | Learn a role definition matrix that is significantly different from a given role definition. Increasing the strength of the constraint ensures that the definitions must be very dissimilar. |

Inputs:

• V: Node feature matrix containing n nodes described by f topological structure features.

• $g_i(x)$, $f_i(x)$: Convex constraints on columns of G and rows of F respectively.

• r: Number of roles (methods for learning r are described in previous work [21]).

Outputs:

• G: Role assignment matrix that satisfies all constraints.

• F: Role definition matrix that satisfies all constraints.

Algorithm:

while reconstruction error decreases do {
    for k = 1 . . . r {   // Recalculate each role.
        1. Calculate $R = V - G_{\bullet(\neq k)} F_{(\neq k)\bullet}$
        2. Calculate $G_{\bullet k}$ by solving for x as follows:
            (a) $x^* = \operatorname*{argmin}_x \|R - x F_{k\bullet}\|_2$
            (b) $G_{\bullet k} = \operatorname*{argmin}_x \|x^* - x\|_2$ s.t. $g_i(x) \le \epsilon_i \;\; \forall i$
        3. Calculate $F_{k\bullet}$ by solving for x as follows:
            (a) $x^* = \operatorname*{argmin}_x \|R - G_{\bullet k}\, x\|_2$
            (b) $F_{k\bullet} = \operatorname*{argmin}_x \|x^* - x\|_2$ s.t. $f_i(x) \le \epsilon_i \;\; \forall i$
    }
}

Figure 1: Our algorithm that will be used to encode all forms of guidance described in Section 4. The algorithm uses a least squares approach and allows additional convex constraints to be added to the NMF formulation.

In our work, we can place sparsity constraints on either the G or the F matrix (or both), leading to an objective function of:

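$$
\begin{aligned}
\operatorname*{argmin}_{G,F} \quad & \|V - GF\|_2 \\
\text{subject to:} \quad & G \ge 0,\; F \ge 0 \\
& \forall i \;\; \|G_{\bullet i}\|_1 \le \epsilon_G \\
& \forall i \;\; \|F_{i\bullet}\|_1 \le \epsilon_F
\end{aligned} \tag{6}
$$

where $\epsilon_G$ and $\epsilon_F$ define upper bounds for the sparsity constraints (amount of allowable density).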

Previous works have shown the effectiveness of using the L1 norm as a penalty in model learning. In our formulation the L1 penalty is encoded as a constraint rather than a penalty in the objective, but it is known that these formulations are theoretically equivalent [8]. However, another twist to our formulation is that we do not constrain the entire matrix but instead constrain each column of G and each row of F. This was done because our solver requires constraints to be formulated over only one role vector at a time. The effect of this technical difference is that the sparsity must be more uniformly spread across each role definition or role assignment, which is a benefit of this method.

Sparsity constraints on G and F have intuitive interpretations. If G is sparse, it means that nodes are assigned to as few roles as possible, and it is possible for some nodes to be assigned to no roles. If F is sparse, it means that the roles are defined with respect to as few features as possible. Both of these extensions allow for a simple explanation of the data, and can lead to improved prediction performance.
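For a single role vector, the sparsity constraint together with non-negativity describes the set {x : x ≥ 0, sum(x) ≤ ε}, and the nearest feasible point has a well-known sort-based solution. The sketch below is one standard way to compute that projection; it is not taken from the paper, which mentions high-level solvers such as CVX, and the function name `project_nonneg_l1` is an assumption.

```python
import numpy as np

def project_nonneg_l1(v, eps):
    """Euclidean projection of v onto {x : x >= 0, sum(x) <= eps}, with eps > 0."""
    w = np.maximum(v, 0.0)
    if w.sum() <= eps:
        return w                      # clipping alone is already feasible
    # Otherwise the budget is tight: shift all entries down by theta, then clip at zero.
    u = np.sort(v)[::-1]
    css = np.cumsum(u)
    j = np.arange(1, len(v) + 1)
    rho = np.nonzero(u - (css - eps) / j > 0)[0][-1]
    theta = (css[rho] - eps) / (rho + 1)
    return np.maximum(v - theta, 0.0)

x_star = np.array([3.0, 1.0, -2.0])     # an unconstrained least squares solution
x = project_nonneg_l1(x_star, eps=2.0)  # -> [2., 0., 0.]
```

Small entries are driven exactly to zero, so tightening ε removes minority memberships (for G) or weakly explanatory features (for F), as summarized in Table 1.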

4.2 Diversity

In the NMF forms of role discovery, nothing prevents the roles to which nodes are assigned (i.e., the G matrix) and the role definitions (i.e., the F matrix) from being highly overlapping. This can be undesirable, particularly for the F matrix, since it means all roles are highly similar. This can be overcome by enforcing a diversity requirement so that each role uses a different set of features (for the F matrix) and nodes are assigned to different combinations of roles (for the G matrix).

Our formulation for role allocation diversity (G matrix) and role definition diversity (F matrix) makes use of orthogonality as follows:

$$
\begin{aligned}
\operatorname*{argmin}_{G,F} \quad & \|V - GF\|_2 \\
\text{subject to:} \quad & \forall i, j \;\; G_{\bullet i}^\top G_{\bullet j} \le \epsilon_G, \quad i \ne j \\
& \forall i, j \;\; F_{i\bullet} F_{j\bullet}^\top \le \epsilon_F, \quad i \ne j
\end{aligned} \tag{7}
$$

where $\epsilon_G$ and $\epsilon_F$ define upper bounds on how angularly similar role assignments and role definitions can be to each other.

When ε = 0, our constraint exactly matches the definition of orthogonality, and when ε > 0 the constraint can be viewed as limiting the angular similarity between two vectors. The effect of combining this constraint with non-negativity constraints is that, for ε = 0, no role definitions will have any common features and no role assignments will have overlapping populations. This is so since, for non-negative vectors, $G_{\bullet i}^\top G_{\bullet j} = 0$ if and only if these two vectors do not share any non-zero entries. Figure 2 shows such an example, where none of the three roles have any overlapping features. In the context of our solver, which solves for one vector at a time, this constraint will be linear (a weighted sum).

Figure 2: Visualization of diversity constraints on role explanation matrix F (roles × features) for the DBLP dataset. The top matrix shows the unconstrained result; the bottom matrix is constrained to be completely diverse (ε = 0); and the middle matrix shows a middle ground. From the top matrix to the bottom matrix, the number of black cells (i.e., zero values) increases, since role definitions must be explained with completely different sets of features.
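Both observations can be checked on small vectors. The sketch below first confirms that two non-negative vectors with zero inner product have disjoint supports, and then shows that, with one role held fixed, the diversity constraint on another role is a single linear inequality whose nearest feasible point has a closed form. The helper names are illustrative. The halfspace projection shown here ignores non-negativity and handles only one fixed role; combining several such constraints generally calls for a small quadratic program or an iterated projection scheme.

```python
import numpy as np

def supports_overlap(a, b):
    """True when two vectors share at least one non-zero position."""
    return bool(np.any((a != 0) & (b != 0)))

def project_halfspace(x, c, eps):
    """Closest point to x satisfying the linear constraint c @ x <= eps."""
    violation = c @ x - eps
    if violation <= 0:
        return x.copy()
    return x - (violation / (c @ c)) * c

# Two non-negative role-assignment columns with disjoint memberships.
g_i = np.array([0.9, 0.4, 0.0, 0.0])
g_j = np.array([0.0, 0.0, 0.7, 0.2])
inner = g_i @ g_j                      # 0.0, and the supports do not overlap

# With g_i held fixed, "g_i @ x <= eps" is a single linear constraint on x.
x_star = np.array([0.5, 0.3, 0.6, 0.1])
x = project_halfspace(x_star, g_i, eps=0.085)
```

In this example the projection only shrinks the two entries that overlap with `g_i`, leaving the rest of the candidate assignment untouched.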

4.3 Alternative Role Discovery

Recent work on another unsupervised problem, clustering, has explored the area of alternativeness [36, 13]. In that literature, the terms alternativeness and orthogonality are used interchangeably, but we use only the term alternativeness for clarity.

The motivation for alternativeness in unsupervised learning is strong. Most interesting problems involve large data sets that contain complex phenomena, and there may exist multiple explanations of the data. However, most unsupervised learning algorithms assume that there exists only one good explanation of the data and return one explanation.

In many situations, the returned explanation may be undesirable because it is either unactionable or not novel. Consider the IMDB (Internet Movie Database) dataset. If the resultant roles map actors to the studios for which they work, then this is not particularly novel. Here, the work on alternative role discovery allows a previously discovered set of role allocations (G∗) or role definitions (F∗) to be specified as a counter-example of what not to find. The challenge, though, is to find another good explanation of the data that is different from those already found.

The optimization problem to find alternative roles is then:

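$$
\begin{aligned}
\operatorname*{argmin}_{G,F} \quad & \|V - GF\|_2 \\
\text{subject to:} \quad & \forall i, j \;\; G^{*\top}_{\bullet i} G_{\bullet j} \le \epsilon_G \\
& \forall i, j \;\; F^{*}_{i\bullet} F_{j\bullet}^\top \le \epsilon_F
\end{aligned} \tag{8}
$$

where $\epsilon_G$ and $\epsilon_F$ define upper bounds on how similar the results can be to $G^*$ and $F^*$.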
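The constraint on G in Equation 8 compares every given role with every new role, including pairs with the same index. A minimal feasibility check, with the hypothetical helper `alternativeness_violations` and toy assignment matrices, might look like this:

```python
import numpy as np

def alternativeness_violations(G_star, G, eps):
    """Pairs (i, j) where column i of G_star and column j of G have inner product > eps."""
    S = G_star.T @ G                   # S[i, j] = G_star[:, i] @ G[:, j]
    return [(int(i), int(j)) for i, j in zip(*np.nonzero(S > eps))]

# A given assignment: nodes 0-1 play role 0, nodes 2-3 play role 1.
G_star = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
G_same = G_star.copy()
# A candidate that cuts across the given grouping.
G_alt = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]])

same = alternativeness_violations(G_star, G_same, eps=1.5)   # [(0, 0), (1, 1)]
alt = alternativeness_violations(G_star, G_alt, eps=1.5)     # []
```

Returning the violating pairs, rather than a single yes/no answer, makes it easy to see which given role a candidate is still too close to.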

5 Experiments for Guided Role Discovery

Our experiments demonstrate how constraints on graph role discovery can be useful. Role discovery requires the user to specify the number of roles to use and a set of features for a graph. For the former, we used the Minimum Description Length (MDL) criterion described in [21] to automatically select the number of roles; for the latter, we used the approach described in [22]. We show that role discovery can be used to improve the results of the identity resolution problem between two graphs, and that the results can be further improved by using sparsity or diversity constraints. By using sparsity or diversity constraints, we improve the role definitions, which leads to more meaningful role assignments and more accurate identity resolutions. See Section 5.1 for these experiments. We also experimentally verify the solutions to the alternative role discovery formulation presented in Section 4.3 and observe that they indeed produce significantly different results. The purpose of our experimental section is to address the following questions:

1. Does adding constraints to the NMF-based role discovery formulation improve the quality of the resulting role explanations and assignments? Figures 3 and 4 show that constraints improve the results of identity resolution.

2. What effects do diversity constraints have on role discovery results? Figures 3 and 4 show how diversity constraints can improve role discovery results even more than sparsity constraints.

3. Can our alternative role discovery formulation produce significantly different results? Tables 3 and 4 show that our formulation can produce results that are significantly different from a given set of roles or community assignments, respectively.

| Network | $\lvert V \rvert$ | $\lvert E \rvert$ | k | $\lvert LCC \rvert$ | #CC |
|---|---|---|---|---|---|
| VLDB | 1,306 | 3,224 | 4.94 | 769 | 112 |
| SIGMOD | 1,545 | 4,191 | 5.43 | 1,092 | 116 |
| CIKM | 2,367 | 4,388 | 3.71 | 890 | 361 |
| SIGKDD | 1,529 | 3,158 | 4.13 | 743 | 189 |
| ICDM | 1,651 | 2,883 | 3.49 | 458 | 281 |
| SDM | 915 | 1,501 | 3.28 | 243 | 165 |

Table 2: Information about DBLP co-author networks for each conference. Data was collected for five years (2005-2009). $\lvert V \rvert$ = number of vertices, $\lvert E \rvert$ = number of edges, k = average degree, $\lvert LCC \rvert$ = size of largest connected component, #CC = number of connected components.

5.1 Sparse and Diverse Identity Resolution in Co-authorship Graphs

In this experiment, we show that by adding sparsity and diversity constraints to the NMF formulation of role discovery, the resulting role definitions are of higher quality. We measure this improvement in quality indirectly by showing how role definition matrices can be used for resolving identities of nodes across graphs, and that constrained role definitions perform better than unconstrained role definitions for that problem.

From the DBLP data set [27], we extracted a co-author graph from each of the following related conferences from 2005 to 2009: KDD, ICDM, SDM, CIKM, SIGMOD, VLDB (see Table 2 for detailed information about each co-author graph). We extract a set of relevant structural features for the KDD graph using ReFeX [22], and compute these same features for all of the co-author graphs. We subsequently learn a set of role definitions from the KDD graph using standard RolX [21] as well as the sparse and diverse versions of GLRD. For each of these competing role definitions, we assign each vertex from each graph to the roles whose function it most exhibits. As a baseline, we also explore author identification without roles by using the raw graph features as described in ReFeX.

We use the role assignments to resolve the identities of vertices from each graph (namely, ICDM, SDM, CIKM, SIGMOD, and VLDB) to the vertices in the KDD graph. Without loss of generality, assume we are resolving the identity of authors from the KDD graph to the authors in the ICDM graph. For each author present in both conferences, we select the corresponding row vector from the node-by-role matrix G_kdd and find the k closest neighbors (row vectors) from G_icdm. If the original author from the KDD graph is present in the set of k closest neighbors, we count the result as a match. We repeat this experiment using sparsity and diversity constraints on F_kdd. We also repeat the experiment using the ReFeX features, comparing author feature vectors from V_kdd and V_icdm. Figures 3 and 4 show how the different decomposition methods compare in this setting for all graphs paired with KDD.
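The matching step described above can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function name `topk_match_rate`, the `truth` index array, the toy data, and the use of Euclidean distance are all assumptions (the text does not state which distance defines the k closest neighbors).

```python
import numpy as np

def topk_match_rate(G_src, G_tgt, truth, k):
    """Fraction of source authors whose true counterpart is among the k
    nearest rows of G_tgt.

    G_src : (m, r) role memberships of authors who appear in both graphs
    G_tgt : (n, r) role memberships of all authors in the target graph
    truth : (m,) index into G_tgt of each source author's true identity
    Euclidean distance is an assumption; the paper does not name the metric.
    """
    G_src, G_tgt = np.asarray(G_src, float), np.asarray(G_tgt, float)
    # Pairwise squared distances between source rows and target rows: (m, n).
    d2 = ((G_src[:, None, :] - G_tgt[None, :, :]) ** 2).sum(axis=2)
    # Column indices of the k closest target rows for each source row.
    nearest = np.argsort(d2, axis=1, kind="stable")[:, :k]
    hits = (nearest == np.asarray(truth)[:, None]).any(axis=1)
    return float(hits.mean())

# Toy usage with random stand-in data (not the DBLP graphs).
rng = np.random.default_rng(0)
G_icdm = rng.random((20, 4))
truth = np.array([3, 7, 11])
G_kdd = G_icdm[truth] + 0.01 * rng.standard_normal((3, 4))
rate = topk_match_rate(G_kdd, G_icdm, truth, k=3)
```

Sweeping `k` and plotting the resulting rate per method would give a curve comparable in spirit to the comparisons reported in Figures 3 and 4.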

Our method of utilizing role discovery results for the author identification task is described formally in the following set of steps:

1. Extract features from co-authorship graphs to get graph features (e.g., V_kdd, V_icdm) using ReFeX.

2. From the graph features matrix V_kdd, perform role discovery to obtain G_kdd and F_kdd.

3. Transfer the role definition matrix F_kdd (role-by-feature matrix) to other graphs (e.g., V_icdm) by solving Equation 9.

$$G_{icdm} = \arg\min_{G} \|V_{icdm} - G F_{kdd}\|_2 \quad \text{s.t. } G \ge 0 \tag{9}$$
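As an illustration of step 3, the sketch below solves Equation 9 under a least-squares (Frobenius) reading of the norm, which is an assumption. In that case the problem decouples into one non-negative least squares problem per node, each of which can be handed to `scipy.optimize.nnls`. The function name `transfer_roles` and the random toy matrices are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def transfer_roles(V_new, F_src):
    """Return G >= 0 minimizing ||V_new - G @ F_src||_F, solved row by row.

    V_new : (n, f) node-by-feature matrix of the new graph (e.g. V_icdm)
    F_src : (r, f) role-by-feature matrix learned elsewhere (e.g. F_kdd)
    """
    V_new, F_src = np.asarray(V_new, float), np.asarray(F_src, float)
    # Each row g of G solves the NNLS problem  min_{g >= 0} ||F_src.T g - v||.
    return np.vstack([nnls(F_src.T, v)[0] for v in V_new])

# Toy check: features generated exactly from known non-negative memberships.
rng = np.random.default_rng(0)
F_kdd = rng.random((3, 6))     # 3 roles, 6 features (stand-in values)
G_true = rng.random((5, 3))    # 5 nodes
V_icdm = G_true @ F_kdd
G_icdm = transfer_roles(V_icdm, F_kdd)
```

Because F_kdd is held fixed, the transferred memberships in different graphs live in the same role space and their rows can be compared directly.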

Our experiments with graph identity resolution show that diversity and sparseness constraints almost universally improve the quality of the learned role-definition matrix. This is not unexpected, since there is a long tradition in machine learning of using sparsity to prevent overfitting. As mentioned previously, we can view diversity as enforcing sparsity, since a diverse set of roles, as per our definition, does not share many overlapping features, and hence each role definition is concise.

Figure 3 shows that role definitions learned using sparsity and diversity outperform standard unconstrained role discovery (RolX) in almost every setting and problem parameterization. Figure 4 shows the general trend more clearly by considering the results for a particular problem parameterization. In that figure, we observe that diversity constraints lead to the most improvement over RolX, while the improvements from sparsity are smaller. We also observe that transferring the KDD role definitions to some graphs (like VLDB and SIGMOD) does not compare well to the baseline method that does not use any roles (i.e., raw ReFeX features). We believe this is because participants in conferences such as VLDB and SIGMOD do not play roles there that are similar to the ones they play in KDD; hence, using the raw features (without roles) produces better results.


We believe that sparsity improves the quality of role definitions by reducing the ability of unconstrained NMF-based role discovery to overfit the problem. Features that only slightly add to the definition of a role are more likely to be explaining noise; by forcing those values to zero, we end up with more robust definitions. Furthermore, the diversity constraints help by removing redundancy in role definitions, which leads to definitions that are more easily comparable. For example, if a feature is used to define every role, then it is not essential in defining any of them.

5.2 Alternative Roles

In this section, we show that our alternative role discovery formulation (presented in Section 4.3) can discover significantly different role definitions, and that the formulation can be used to improve the role definitions when there are ground-truth communities. In Table 3, we show the difference between an alternative role discovery result and an original role definition found using unconstrained role discovery (via RolX). In Table 4, we show that we can use our formulation to get more consistent assignments of roles when ground-truth communities are known.

In our first experiment, we explore the difference between the roles of the original and alternative role discovery. Using the KDD co-authorship graph, we find a set of roles and constrain a new solution to have a significantly different role definition (F matrix). We then compare the results by assigning each vertex to its most dominant role in both results to create two separate partitions of the vertices, and measure the difference between the two partitions using Jaccard distance. Table 3 shows that all of the Jaccard distances are far from 0, meaning that the alternative role assignments are very different from the original ones. Figure 5 illustrates the alternative roles found in the largest connected component of the KDD co-authorship graph. Note that the reader can zoom in on this figure to read the names of each author. The following is a description of the original roles and the roles that GLRD(Alternative) found. These descriptions are based on sense-making analysis [21]. As the descriptions show, these roles capture alternative concepts.

Original  R1(alt)  R2(alt)  R3(alt)  R4(alt)
R1        0.946    0.510    0.762    0.913
R2        1.000    0.971    0.810    0.739
R3        1.000    0.7942   1.000    1.000
R4        0.345    0.991    1.000    0.982

Table 3: Jaccard distance matrix comparing original role assignments (rows) to alternative role assignments (columns). A Jaccard distance of 0 represents an exact match between clusterings and 1 represents no overlap. The relative error for the two decompositions was similar: 0.12% and 0.5% (where relative error is error = ||V − GF|| / ||V||).
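The comparison behind Table 3 can be illustrated with a short sketch: each vertex is assigned to its dominant role, and every original role is compared with every alternative role as a set of vertices. The function names and the tiny membership matrices are hypothetical; the convention that two empty roles are at distance 1 is an assumption made here to avoid a division by zero.

```python
def dominant_roles(G):
    """Index of the largest entry in each row of a node-by-role matrix."""
    return [max(range(len(row)), key=lambda j: row[j]) for row in G]

def jaccard_distance_matrix(roles_a, roles_b, n_roles):
    """D[i][j] = 1 - |A_i & B_j| / |A_i | B_j| for hard role assignments."""
    D = [[1.0] * n_roles for _ in range(n_roles)]
    for i in range(n_roles):
        A = {v for v, r in enumerate(roles_a) if r == i}
        for j in range(n_roles):
            B = {v for v, r in enumerate(roles_b) if r == j}
            union = A | B
            if union:  # two empty roles are left at distance 1 (assumption)
                D[i][j] = 1.0 - len(A & B) / len(union)
    return D

# Four vertices, two roles; the membership values are made up for illustration.
original = dominant_roles([[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9]])
alternative = dominant_roles([[0.6, 0.4], [0.2, 0.8], [0.1, 0.9], [0.4, 0.6]])
D = jaccard_distance_matrix(original, alternative, n_roles=2)
```

Rows of `D` index the original roles and columns index the alternative roles, matching the layout of Table 3.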


Original Roles:

Role 1: Nodes here have high eccentricity. These are periphery nodes.

Role 2: Nodes here have high eccentricity and high clustering coefficient. These are periphery nodes that are cliquey.

Role 3: Nodes here have high degree and high clustering coefficient. These are highly connected cliquey nodes.

Role 4: Nodes here have high PageRank, high degree, and high biconnected component numbers. These are globally central stars and brokers.

Alternative Roles:

Role 1: Nodes here have high PageRank and high biconnected component numbers. These are globally central and brokers.

Role 2: Nodes here have high clustering coefficient but not high eccentricity. These are non-periphery nodes that are cliquey.

Role 3: Nodes here have high eccentricity and high clustering coefficient. These are periphery nodes that are cliquey.

Role 4: Nodes here have high eccentricity and high degree. These are periphery nodes that are locally stars.

We next experiment with a YouTube dataset, which is a network of users with known ground-truth communities [33]. This graph was created by crawling the YouTube site in 2007 and creating directed edges between a pair of users a and b if a's profile page linked to b's profile page. Ground-truth communities were assigned by collecting all users belonging to the same group; groups were pages that allowed communication between users on given topics. The graph has 1,134,890 vertices, 2,987,624 edges, and 8,385 communities. We selected all communities with over 100 users, of which there were 105. The largest community has 2,217 users.

There is an inherent complementarity between role discovery and community detection. The former is about structural similarity, while the latter is based on proximity in the graph. Role discovery finds functions/roles of users but does not find the communities themselves. However, there may be multiple interesting sets of communities within the same network, and those communities may be characterized by very different roles. In this experiment, we encode the set of ground-truth communities for which our role discovery technique should find roles.

We encode the YouTube ground-truth communities into our analysis by providing the communities as G∗ to our alternative role discovery formulation. This forces our discovered roles to have a role assignment that is different from the ground-truth communities, which matches the semantic relationship between the two problems.


To evaluate the effectiveness of this result, we measured the proportion of members in each community belonging to each role. We then calculated the standard deviation over all such communities per role and report the results in Table 4. The assumption for this evaluation is that each role should be equally represented in each community. Our results show that the alternative role discovery formulation can indeed be used to normalize the roles with respect to a set of ground-truth communities. After applying sense-making [21], the six roles that our GLRD(Alternative) finds are as follows:

Alternative Role 1: Nodes here are global hubs. They have high PageRank values, high out-degrees, and high biconnected component numbers.

Alternative Role 2: Nodes here are on the periphery of the graph. They have higher than default eccentricity.

Alternative Role 3: Nodes here are authorities. They have high PageRank values and high in-degrees.

Alternative Role 4: Nodes here are very cliquey. They have high clustering coefficients.

Alternative Role 5: Nodes here are local hubs. They have high out-degrees and high biconnected component numbers.

Alternative Role 6: Nodes here are the majority of the population; they are the “regular” folks. They have a local neighborhood that is more cliquey than expected but otherwise nothing special stands out.

Roles      1     2     3     4     5     6
Original   7.85  7.93  8.70  2.35  9.81  7.57
Alternate  5.06  6.34  5.34  3.81  8.62  5.88

Table 4: For each role, we report the standard deviations of role proportions over all communities. The result shows that our alternative role discovery formulation can be used to find roles whose members are better distributed across a set of interesting communities. The values are scaled by 10^2.
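The statistic reported in Table 4 can be sketched as follows. This is an illustration under stated assumptions: vertices carry a single dominant role, communities are given as lists of vertex indices, and the population standard deviation (NumPy's default, `ddof=0`) is used since the text does not say which estimator was applied. The function name `role_spread` is hypothetical.

```python
import numpy as np

def role_spread(role_of, communities, n_roles):
    """Per-role standard deviation of role proportions across communities.

    role_of     : (n,) dominant role index of every vertex
    communities : list of vertex-index lists (communities may overlap)
    """
    role_of = np.asarray(role_of)
    # props[c, r] = fraction of community c whose dominant role is r.
    props = np.array([
        np.bincount(role_of[np.asarray(c)], minlength=n_roles) / len(c)
        for c in communities
    ])
    return props.std(axis=0)

# Roles spread evenly over both communities: zero deviation per role.
even = role_spread([0, 0, 1, 1, 0, 1, 0, 1], [[0, 1, 2, 3], [4, 5, 6, 7]], 2)
# Each community contains a single role: maximal imbalance for two roles.
skewed = role_spread([0, 0, 1, 1], [[0, 1], [2, 3]], 2)
```

Lower values indicate roles that are represented more uniformly across communities, which is the behavior the alternative formulation is meant to encourage.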

6 Lifting our Formulation for Multi-Relational Role Discovery

Here we outline our method to lift our previous work to perform role discovery in multi-relational graphs. We do not recreate the same experiments since they are trivial, but instead focus on the more challenging problem of role discovery in multi-relational graphs.

Role Discovery in Multi-relational Graphs. Our approach to extending role discovery to multi-relational graphs is to model the graphs as a tensor. This is done by extracting features from each relation and stacking the resulting feature matrices into a single tensor V of dimension n × f × r. Just as NMF is used to decompose a feature matrix V, tensor decompositions can be used to decompose a feature tensor V. One natural choice of tensor decomposition for a feature tensor would be non-negative PARAFAC [16]. PARAFAC, like NMF, is a rank-one decomposition (see Figure 6). However, PARAFAC is not an ideal model for finding complex patterns in graphs, as is desired for role discovery, because it is too simplistic in its assumptions. In particular, it allows each group of entities to play only one role for only one group of relations. See the introductory section for a more in-depth explanation of the limitations of PARAFAC.

$$\underset{G,F,R}{\arg\min} \; \Big\| V - \sum_{k} g_k \circ f_k \circ r_k \Big\|_{Fro} \quad \text{subject to: } G \ge 0,\ F \ge 0,\ R \ge 0 \tag{10}$$
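To make the tensor construction and the PARAFAC model of Equation 10 concrete, the sketch below stacks one feature matrix per relation into an n × f × r array and evaluates the sum of outer products. The sizes, the random stand-in data, and the helper name `parafac_reconstruct` are assumptions for illustration; no fitting is performed here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, f, r, k = 6, 4, 3, 2  # nodes, features, relations, components (toy sizes)

# One n x f feature matrix per relation; random stand-ins for ReFeX output.
per_relation = [rng.random((n, f)) for _ in range(r)]
V = np.stack(per_relation, axis=2)  # feature tensor of shape (n, f, r)

def parafac_reconstruct(G, F, R):
    """Sum over k of the outer products g_k o f_k o r_k (columns of G, F, R)."""
    return np.einsum("ic,jc,kc->ijk", G, F, R)

G, F, R = rng.random((n, k)), rng.random((f, k)), rng.random((r, k))
V_hat = parafac_reconstruct(G, F, R)
error = np.linalg.norm(V - V_hat)  # Frobenius norm of the residual tensor
```

Each component couples exactly one column of G with one column of F and one column of R, which is the restriction the text identifies as too simplistic for role discovery.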

Instead we use the Tucker decomposition (shown in Equation 11), which allows us to find the complex interaction between E-groups, the roles they play, and the R-groups they play those roles in. The diagrammatic explanation of the Tucker decomposition in Figure 7 shows how it models these interactions. Like PARAFAC and NMF, it is a rank-one decomposition, which allows for an intuitive interpretation. A column in G corresponds to a group of people and is a length-n indicator vector showing E-group membership. Similarly, a column in F corresponds to a role definition, which is a group of features, and a column in R corresponds to a group of relations, which we refer to as an R-group. Unlike PARAFAC and NMF, any factor can be any combination of the columns in G, F, and R. The core of the Tucker decomposition allows this complex interaction and requires more explanation (PARAFAC can be viewed as a specific Tucker decomposition with a diagonal core). It too is an order-3 tensor, except that the modes are now directly interpretable as E-groups, roles, and R-groups. An entry in the core at (i, j, k) means that E-group i plays role j for R-group k. Understanding and simplifying this core is critical to the success of multi-relational role discovery using a Tucker decomposition.

$$\underset{G,F,R,H}{\arg\min} \; \Big\| V - \sum_{i}\sum_{j}\sum_{k} h_{ijk} \, g_i \circ f_j \circ r_k \Big\|_{Fro} \quad \text{subject to: } G \ge 0,\ F \ge 0,\ R \ge 0,\ H \ge 0 \tag{11}$$
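The role of the core can be illustrated numerically. The sketch below evaluates the Tucker model, with core entry (i, j, k) weighting the outer product of column i of G, column j of F and column k of R, following the interpretation given in the text. It also checks the remark that PARAFAC is a Tucker model with a diagonal core. All sizes, values, and the helper name `tucker_reconstruct` are illustrative assumptions.

```python
import numpy as np

def tucker_reconstruct(H, G, F, R):
    """V[i,j,k] = sum_{a,b,c} H[a,b,c] * G[i,a] * F[j,b] * R[k,c]."""
    return np.einsum("abc,ia,jb,kc->ijk", H, G, F, R)

rng = np.random.default_rng(1)
n, f, r = 5, 4, 3          # nodes, features, relations (toy sizes)
p, q, s = 2, 3, 2          # E-groups, roles, R-groups
G, F, R = rng.random((n, p)), rng.random((f, q)), rng.random((r, s))

# A core with a single non-zero entry: E-group 0 plays role 2 in R-group 1.
H = np.zeros((p, q, s))
H[0, 2, 1] = 1.5
V_single = tucker_reconstruct(H, G, F, R)

# PARAFAC as a special case: equal group counts and a superdiagonal core.
c = 2
Gc, Fc, Rc = rng.random((n, c)), rng.random((f, c)), rng.random((r, c))
H_diag = np.zeros((c, c, c))
H_diag[np.arange(c), np.arange(c), np.arange(c)] = 1.0
V_parafac = tucker_reconstruct(H_diag, Gc, Fc, Rc)
```

Off-diagonal core entries are what let one E-group play several roles, or the same role in several R-groups.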

7 Our MRD Algorithm for Multi-Relational Graphs

The Tucker model has most often been described as a higher-order analog of principal component analysis or singular value decomposition and is traditionally defined with the factor matrices being orthogonal. Among the most popular tensor toolboxes, the Tucker model is often implemented with an orthogonality constraint on the factor matrices (Tensor Toolbox [4, 2]) or with no constraint enforced on the core (Nway Toolbox [1]). Other recently proposed algorithms for the non-negative Tucker model [24, 34] extend the classical multiplicative update procedures proposed for NMF [26], which are known to converge slowly near stationary points [29]. Since the alternating least squares (ALS) method is known as the “workhorse” algorithm for PARAFAC [25] and has been empirically demonstrated to be competitive among many existing methods [38], we implement our own version of non-negative Tucker decomposition using an alternating non-negative least squares (ANLS) scheme.

Let V be the tensor to be decomposed. Denote the factor matrices by G, F and R and the core tensor by H. In each iteration we optimize over each of G, F, R and H in turn while fixing all others as constants. When G is being optimized, the objective can be written as:

$$\underset{G \ge 0}{\arg\min} \; \| V_G - G H_G (R \otimes F)^T \|_{Fro} \tag{12}$$

where V_G is the matricization of V in the first mode and ⊗ is the Kronecker product. The subproblems for F and R have exactly the same form but with a different variable being optimized. In addition, it is generally desirable for the entries in the core to indicate the weights of each coupling of factors. Thus we normalize the columns of G, F and R once they are solved. When we solve for the core H, rewriting the tensors in vectorized form turns the objective into:

$$\underset{H \ge 0}{\arg\min} \; \| \mathrm{vec}(V) - (R \otimes F \otimes G)\, \mathrm{vec}(H) \|_{Fro} \tag{13}$$

where vec(·) is the vectorization of a tensor. Our overall solver is summarized in Algorithm 1. We build our solver on top of the existing constructs in the MATLAB Tensor Toolbox [2] and employ the fast non-negative least squares (NNLS) solver designed specifically for tensor decomposition [9] when we solve subproblems (12) and (13). For the terminating condition we adopt the common practice for ALS, which stops when the relative change in the objective between successive iterations is smaller than some pre-set threshold. It is worth noting that although we only enforce non-negativity constraints in this case, it requires little effort to adopt into our formulation any constraint applicable to standard least squares problems.
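The matricized and vectorized forms in (12) and (13) depend on an index-ordering convention that the text does not spell out. The sketch below assumes the common column-major convention (earlier modes vary fastest) and checks numerically that, under it, an exact Tucker model satisfies both identities. The helper name `unfold` and the toy sizes are assumptions.

```python
import numpy as np

def unfold(T, mode):
    """Mode-`mode` matricization with earlier modes varying fastest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1, order="F")

rng = np.random.default_rng(2)
G, F, R = rng.random((5, 2)), rng.random((4, 3)), rng.random((3, 2))
H = rng.random((2, 3, 2))
V = np.einsum("abc,ia,jb,kc->ijk", H, G, F, R)  # exact Tucker model

# Equation 12: V_G = G H_G (R kron F)^T
lhs_12 = unfold(V, 0)
rhs_12 = G @ unfold(H, 0) @ np.kron(R, F).T

# Equation 13: vec(V) = (R kron F kron G) vec(H)
lhs_13 = V.reshape(-1, order="F")
rhs_13 = np.kron(R, np.kron(F, G)) @ H.reshape(-1, order="F")
```

With a different unfolding convention the order of the Kronecker factors changes, so the two choices need to be made together.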


Algorithm 1 Multi-relational Role Discovery (MRD) using Alternating Least Squares Non-negative Tucker decomposition.

Initialize G, F, R and H to any non-negative values
while stop condition not met do
    G ← argmin_{G ≥ 0} ||V_G − G H_G (R ⊗ F)^T||_Fro
    Normalize the columns of G
    F ← argmin_{F ≥ 0} ||V_F − F H_F (R ⊗ G)^T||_Fro
    Normalize the columns of F
    R ← argmin_{R ≥ 0} ||V_R − R H_R (F ⊗ G)^T||_Fro
    Normalize the columns of R
    H ← argmin_{H ≥ 0} ||vec(V) − (R ⊗ F ⊗ G) vec(H)||_Fro
end while
return G, F, R, H
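A compact Python sketch of Algorithm 1 is given below for illustration. It is not the authors' MATLAB implementation: it substitutes `scipy.optimize.nnls` (a generic active-set solver) for the fast NNLS solver of [9], forms the Kronecker products densely (so it only suits small tensors), and, as an assumption not stated in the text, pushes the column norms into the core when a factor is normalized so that the fit is unchanged by normalization. The names `mrd_anls`, `nnls_rows`, `unfold` and `reconstruct` are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

def unfold(T, mode):
    """Mode-`mode` matricization with earlier modes varying fastest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1, order="F")

def nnls_rows(M, Y):
    """Solve min_{X >= 0} ||Y - X M||_F one row of X at a time."""
    return np.vstack([nnls(M.T, y)[0] for y in Y])

def reconstruct(H, G, F, R):
    return np.einsum("abc,ia,jb,kc->ijk", H, G, F, R)

def mrd_anls(V, ranks, n_iter=50, tol=1e-6, seed=0):
    """Alternating NNLS for a non-negative Tucker model of a 3-way tensor."""
    ranks = tuple(ranks)
    rng = np.random.default_rng(seed)
    factors = [rng.random((dim, rk)) for dim, rk in zip(V.shape, ranks)]
    H = rng.random(ranks)
    errors = [np.linalg.norm(V - reconstruct(H, *factors))]
    for _ in range(n_iter):
        for mode in range(3):
            # Kronecker product of the two other factors, later mode first.
            others = [factors[m] for m in range(3) if m != mode]
            M = unfold(H, mode) @ np.kron(others[1], others[0]).T
            X = nnls_rows(M, unfold(V, mode))
            norms = np.linalg.norm(X, axis=0)
            norms[norms == 0] = 1.0          # leave all-zero columns alone
            factors[mode] = X / norms
            # Assumption: absorb the column norms into the core.
            shape = [1, 1, 1]
            shape[mode] = -1
            H = H * norms.reshape(shape)
        # Core update in vectorized form, as in Equation 13.
        K = np.kron(factors[2], np.kron(factors[1], factors[0]))
        h, _ = nnls(K, V.reshape(-1, order="F"))
        H = h.reshape(ranks, order="F")
        errors.append(np.linalg.norm(V - reconstruct(H, *factors)))
        # Stop when the relative change in the objective is small.
        if abs(errors[-2] - errors[-1]) <= tol * max(errors[-2], 1e-12):
            break
    G, F, R = factors
    return G, F, R, H, errors

# Toy usage on a synthetic tensor that has an exact non-negative Tucker model.
rng = np.random.default_rng(3)
G0, F0, R0 = rng.random((8, 2)), rng.random((6, 3)), rng.random((4, 2))
V = reconstruct(rng.random((2, 3, 2)), G0, F0, R0)
G, F, R, H, errors = mrd_anls(V, ranks=(2, 3, 2), n_iter=20)
```

Holding one factor fixed (for example, skipping the update for the mode that carries F) turns the same loop into the kind of transfer setting discussed in Section 8.5.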

Algorithm Complexity. Our algorithm is an example of alternating least squares, with each step being efficiently solvable using least squares solvers. The non-negativity requirement on the core can be efficiently enforced by the solvers. Since tensor decomposition is well known to be intractable, we provide an estimate of our algorithm's run time to converge to a good local minimum. The algorithm, like most tensor decomposition algorithms, has linear complexity with respect to the number of factors, the number of modes and the size of the core. In practice, the decomposition of the graphs shown in the experimental section took under a minute to run on a 12-core machine.

8 Interpreting Tensor Decomposition for Role Discovery

After applying Algorithm 1, we have decomposed the multi-relational graph into a series of E-groups (defined by G), a series of roles (defined by F) and a series of R-groups (defined by R). The core of the Tucker decomposition measures the interaction between these E-groups, roles and R-groups. Here we show how to interpret and analyze the results of the Tucker decomposition in a number of ways.

8.1 Visually Interpreting Core Slices

We begin with the simple but useful approach of visually inspecting the core tensor slices to compare E-groups, roles, or R-groups. A slice of the core (depending on its orientation: left-to-right, top-to-bottom or back-to-front) can represent an E-group, role, or R-group. Different slices of the same orientation can then be used to compare the similarity of E-groups, roles and R-groups. For example, in Figure 8 we display the slices corresponding to different E-groups from a multi-relational role discovery result.


Comparing the slices directly leads to a very detailed comparison of E-groups because we compare, for example, whether they have role/R-group combinations in common. However, if we consider aggregations of these slices we can get a coarser comparison, such as whether or not the E-groups play the same roles, or whether they participate in the same R-groups. For example, the third and fifth E-groups look very similar in terms of the R-groups they take part in, but by looking at the slices we know that they differ because they play very different roles in those very same relations.

8.2 Visualizing Core as an Interaction Graph

A further visual understanding of the phenomena in the multi-relational graph can be obtained by visualizing the core as a graph. This is achieved by creating a node for every E-group, role, and R-group. This will of course be a heterogeneous graph. An entry in the core can then be represented in this graph as a clique on the triplet (E-group, role, R-group) it corresponds to. Since each such clique corresponds to a Tucker core entry, its edges can be weighted by the value of that core entry and interpreted as a similarity. However, if we are focused, say, on predominantly understanding groups of entities, we can create a tripartite graph as shown in Figure 9, which removes the edge between the role and the R-group. We shall call this graph the interaction graph to distinguish it from the original multi-relational graph we study.

This interaction graph can then be visualized and interesting signature patterns can be interpreted. See Figure 9 for some example signatures.
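One way to build the tripartite interaction graph from a core tensor is sketched below. Two choices here are assumptions rather than details given in the text: core entries at or below a `threshold` are dropped to sparsify the graph, and when several core entries contribute to the same edge their values are summed. The node labels and the function name `interaction_edges` are hypothetical.

```python
import numpy as np

def interaction_edges(H, threshold=0.0):
    """Weighted edges E-group--role and E-group--R-group from a Tucker core.

    Returns a dict mapping ((type, index), (type, index)) to an edge weight.
    The role--R-group edge of each triplet is deliberately omitted.
    """
    edges = {}
    for (i, j, k), h in np.ndenumerate(np.asarray(H, float)):
        if h <= threshold:
            continue
        for edge in ((("E", i), ("role", j)), (("E", i), ("R", k))):
            # Assumption: weights of repeated edges accumulate.
            edges[edge] = edges.get(edge, 0.0) + float(h)
    return edges

# Toy core: E-group 0 plays role 1 in R-group 0 and in R-group 1.
H = np.zeros((2, 2, 2))
H[0, 1, 0] = 2.0
H[0, 1, 1] = 3.0
edges = interaction_edges(H)
```

The resulting dictionary can be passed to any graph library for drawing; keeping it as plain Python data avoids tying the sketch to a particular one.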

8.3 Analysis of the Interaction Graph

Given the interaction graph described in the previous subsection, which shows the relationship between E-groups, roles, and R-groups, we can analyze this graph in any number of ways. For example, a popular approach to graph simplification is to embed the graph into a two-dimensional space. Figure 16 shows such an embedding using PCA of the graph written in “hyper-edge” form, that is, an n × m matrix where each column represents a hyper-edge and entry (i, j) has value 1 if node i is involved in hyper-edge j. This heterogeneous object embedding can be interpreted such that each cluster is a collection of E-groups, roles, and R-groups that often interact.
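The hyper-edge form and its two-dimensional embedding can be sketched as follows. The construction assumes that every core entry above a `threshold` defines one hyper-edge joining its E-group, role and R-group, and that PCA treats nodes as samples and hyper-edges as features; both are interpretations of the description above, and the function names are hypothetical.

```python
import numpy as np

def hyperedge_incidence(H, threshold=0.0):
    """Node-by-hyper-edge 0/1 matrix; one hyper-edge per retained core entry.

    Rows are ordered as E-groups, then roles, then R-groups.
    """
    p, q, s = H.shape
    triples = np.argwhere(H > threshold)
    B = np.zeros((p + q + s, len(triples)))
    for e, (i, j, k) in enumerate(triples):
        B[[i, p + j, p + q + k], e] = 1.0
    return B

def pca_2d(B):
    """Project the rows of B onto their first two principal components."""
    centered = B - B.mean(axis=0)
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :2] * S[:2]

# Toy core with three interactions among 2 E-groups, 2 roles and 2 R-groups.
H = np.zeros((2, 2, 2))
H[0, 0, 0], H[0, 1, 1], H[1, 1, 1] = 1.0, 0.5, 2.0
B = hyperedge_incidence(H)
coords = pca_2d(B)   # one 2-D point per E-group, role and R-group
```

Nodes that take part in the same hyper-edges have similar rows in `B` and therefore land close together in the embedding.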


Property: Simplicity. To what extent are nodes connected to multiple nodes of other types versus being connected to only one node (e.g., E-groups playing multiple roles)?
Computation: Average node degree

Property: Sharing. How much can E-groups be separated into independent parts? For example, can we find two sets of roles that are played by completely non-overlapping sets of E-groups?
Computation: Mincut cost

Property: Variability. How does the simplicity of nodes (E-groups, roles, or R-groups) vary across the interaction graph?
Computation: Variance of node degree; entropy of PageRank distribution

Property: Stability. How stable are the interactions between roles, E-groups and R-groups?
Computation: Spectral gap

Table 5: The macroscopic properties measured on the interaction graph H. See Figure 17 for measurements over several congressional multi-relational graphs spanning a time frame of 30 years.

8.4 Macroscopic Properties Derived from the Interaction Graph

Given the interpretation of the core as an interaction graph, we can then understand the macroscopic properties of the role dynamics by analyzing the properties of the interaction graph. The metrics we study are motivated in Table 5, along with how they are computed. These metrics are meant to give the user a broad understanding of the underlying dynamics of the graph. The simplicity property tells how strongly aligned E-groups, roles and R-groups are, while the sharing property measures how many roles and R-groups are shared among different groups of entities. The variability property captures the amount of imbalance in the complexities of different nodes in the interaction graph by calculating both the variance of the node degrees and the entropy of the stationary distribution of a random walk along the interaction graph. Another important property we measure is the stability of the results we discovered. Here we wish to answer the question: how robust are the patterns found within the interaction graph, and how easily could those patterns change due to small perturbations?
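Three of the four properties can be sketched directly from a symmetric weighted adjacency matrix of the interaction graph; the min-cut cost is omitted here. The exact definitions used for Figure 17 are not given in the text, so the following are assumptions: degree counts neighbors without weights, the stationary distribution is that of a plain random walk without teleportation (proportional to weighted degree), and the spectral gap is the difference between the two largest eigenvalues of the symmetrically normalized adjacency matrix. The function name `macro_properties` is hypothetical.

```python
import numpy as np

def macro_properties(A):
    """Simplicity, variability and stability measures of an undirected graph."""
    A = np.asarray(A, float)
    degree = (A > 0).sum(axis=1)          # unweighted neighbor counts
    strength = A.sum(axis=1)              # weighted degrees
    # Stationary distribution of a random walk on an undirected graph.
    pi = strength / strength.sum()
    nz = pi[pi > 0]
    entropy = float(-(nz * np.log(nz)).sum())
    # Symmetrically normalized adjacency D^{-1/2} A D^{-1/2}.
    d = np.where(strength > 0, strength, 1.0)
    S = A / np.sqrt(np.outer(d, d))
    eig = np.sort(np.linalg.eigvalsh(S))
    return {
        "simplicity_avg_degree": float(degree.mean()),
        "variability_degree_var": float(degree.var()),
        "variability_entropy": entropy,
        "stability_spectral_gap": float(eig[-1] - eig[-2]),
    }

# Toy graph: a 4-cycle, where every node looks the same.
cycle = np.array([[0, 1, 0, 1],
                  [1, 0, 1, 0],
                  [0, 1, 0, 1],
                  [1, 0, 1, 0]])
props = macro_properties(cycle)
```

On the 4-cycle all nodes are equivalent, so the degree variance is zero and the entropy takes its maximum value for four nodes.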

8.5 Complex Analysis Via Role Transfer

Our work so far learned the E-groups, role definitions and R-groups from a single multi-relational graph. However, we can transfer in these definitions from another source by holding them fixed as constants in the Tucker decomposition. For example, if we wish to transfer in a set of existing roles, we can adjust Algorithm 1 so that it does not solve for the F matrix that defines the roles. This allows us to test many interesting questions, such as how well the roles from other graphs explain another multi-relational graph. We experiment with this particular type of transfer in Figure 18; however, other types of transfer are possible. We now discuss all types but, due to space limitations, show experiments only for role transfer.

Role transfer can be used to detect to what extent roles are similar or dissimilar across different multi-relational graphs. If there is a particularly interesting set of roles that has been studied in another graph, it can be transferred to a new graph to see how the nodes in that graph play those roles.

E-group transfer can only be used if the multi-relational graphs are on the same entities. However, if there is some well understood grouping of entities (say Democrat, Tea Party and Republican), these groups can be translated into E-groups and transferred to help gain an understanding of the behaviors of those specific groups.

R-group transfer, similar to role transfer, can be used to test how well relation groupings transfer across multiple graphs.

9 Empirical Results

As in our previous work, all code and data sets will be made publicly available on our website.

Since we wished to focus on analyzing both multi-relational graphs and collections of similar multi-relational graphs for the transfer setting, we focused our empirical analysis on the Cosponsorship Network Data [15, 16]. This data set consists of congressional cosponsor data for over 30 years of congresses. Congressional representatives have the ability to add their name to a bill in order to lend support to it (called cosponsoring), and it has been argued that this act is a good measure of interaction within congress because legislators spend considerable effort convincing other representatives to cosponsor their bills. Using this publicly available information about cosponsorships, each congress can be broken up into a multi-relational graph with approximately 450 different nodes (congressional representatives) who jointly cosponsor approximately 10,000 bills per congress (many are just amendments). Table 6 shows statistics for the graph created from the 110th congress, but in all we study the 96th-110th congresses, each of which has its own cosponsorship graph. Rather than create a cosponsorship graph based on all of the proposed bills from a particular congress, we build a multi-relational graph by viewing each committee as a separate relation (see Figure 10). Each bill is assigned to a committee based on the topic of the legislation. We analyzed bills from 15 different committees (the committees for which there was legislation in each of the 96th-110th congresses) so that all of the relations are consistent over all the multi-relational graphs. Across the different congresses, the one factor that does change is the set of representatives elected to each. Putting this all together, the multi-relational graph we study is a person × person × committee tensor such that the entry at (i, j, k) indicates how often congressmen i and j cosponsored a bill that was sent to committee k for a particular congress. This graph has many underlying complexities in terms of groups of congressional representatives who work together (i.e., party-based and tenure-length based), the roles that congressional representatives play (e.g., focused and generalist), and the relationships of the various bill areas (e.g., science-focused, business-focused). We study the last 15 congresses (96th to 110th) and have a multi-relational graph for each.
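The construction of the person × person × committee tensor can be sketched from a list of bill records. The record format (a committee index plus the list of representatives who signed the bill) is a hypothetical stand-in for the actual data files, and treating the sponsor and all cosponsors symmetrically is an assumption.

```python
import itertools
import numpy as np

def cosponsorship_tensor(bills, n_people, n_committees):
    """V[i, j, k] counts bills sent to committee k that i and j both signed.

    bills : iterable of (committee_index, list_of_person_indices)
    """
    V = np.zeros((n_people, n_people, n_committees))
    for committee, signers in bills:
        # Every unordered pair of distinct signers gains one joint bill.
        for i, j in itertools.combinations(sorted(set(signers)), 2):
            V[i, j, committee] += 1
            V[j, i, committee] += 1
    return V

# Three made-up bills over three representatives and two committees.
bills = [(0, [0, 1, 2]), (0, [0, 1]), (1, [1, 2])]
V = cosponsorship_tensor(bills, n_people=3, n_committees=2)
```

Each frontal slice `V[:, :, k]` is then the weighted adjacency matrix of the cosponsorship graph for one committee.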

Graph Attribute                        Value
Number Representatives                 453
Number Bills                           10613
Sponsors Per Bill                      16.9
Mean cosponsor degree (aggregated)     8.37
Standard deviation (aggregated)        6.31
Number of zeros (aggregated)           1729
Mean cosponsor degree (median)         0.48
Standard deviation (median)            1.02
Number of zeros (median)               53235

Table 6: Details on the congressional cosponsor data set for the 110th congress. The aggregated statistics were calculated on the cosponsorship graph without treating it as a multi-relational graph. The median statistics measure the median attribute value over each relation or committee. The number of zeros refers to the number of pairs of representatives that have no edge (or an edge of weight zero).

9.1 Studies on a Single Multi-Relational Graph

Here we present results on the analysis of the 110th Congress, which sat from 2007 to 2009. This was a Democrat-controlled congress that sat during the last two years of President George W. Bush's administration. It was also unique in that it was the first Democrat-controlled congress since 1995.

We analyzed this multi-relational cosponsor graph using our formulation for multi-relational role discovery. This produced E-groups, roles, and R-groups along with an interaction graph that explained the interactions between the three concepts. The E-groups are shown in Figure 12, the interpretation of role definitions is shown in Figure 11, and the composition of the R-groups is shown in Figure 13. How these E-groups, roles, and R-groups interact in the interaction graph is shown directly as a sliced core in Figure 14, as a sparsified graph in Figure 15, and as a graph embedding in Figure 16.

Underlying E-groups, Roles and R-groups. Figure 12 shows that, as expected, people from the same party cosponsor the same bills, though this further divides into two different E-groups per party. For the two Democrat groups, we note that there is an E-group of mostly junior congressmen (group 4) whilst the other contains many of the senior congresswomen (group 1). Of particular note is the 5th E-group, which contains a mix of Republican and Democrat representatives and largely represents a group of centrist members. For example, McCotter was a well known member of the moderate “Republican Main Street Partnership”.

Figure 11 shows the types of roles that are found in the graph via sense making [21]. This plot shows, for each role, the attributes shared by representatives who play that role. Roles can be contrasted and compared in terms of these reference features. For example, roles 2 and 4 both have comparable degree but largely differing weight, meaning that representatives from both roles participated in cosponsorship with roughly the same number of other representatives, but representatives in role 4 cosponsored with the same people more often.

Figure 13 shows the compositions of the R-groups. Each R-group is composed of some combination of the 15 studied relations, each of which in turn corresponds to a congressional committee, which is roughly interpretable as the topic of the bill. While there is some overlap in the relational contribution of each R-group, each of them has a unique dominating relation (R-group 1 ‘Ways and Means’, R-group 2 ‘Rules’, R-group 3 ‘Oversight and Government Reform’, R-group 4 ‘Education and Labor’, R-group 5 ‘Agriculture’). Because we did not enforce orthogonality for our Tucker decomposition, as is commonly done (see Algorithm 1), we can see which relations are less distinguishing in terms of role analysis by looking at those relations that show up in multiple R-groups (e.g., ‘Transport and Infrastructure’ is assigned to every R-group).

Interactions Between E-groups, Roles and R-groups. We now explain the interaction graph, which is shown in Figure 15. As previously mentioned, E-groups are largely divided by party even though party was not part of the data set. It can be argued then that this role discovery formulation discovered communities rather than roles. However, the reason these groups divided along party lines is that the parties are playing different roles in different R-groups. Depending on factors such as which party is the majority, we expect the parties to play different roles, so our analysis matches our expectations.

While there is much overlap in the R-groups that both parties participate in, the parties play different roles in those R-groups. For example, the Republican groups participate largely in R-groups 3, 4, 5 while the Democrat groups participate largely in R-groups 1, 2, 3, 5. However, E-group 4 (Republican) and E-group 5 (Democrat) play different roles in R-group 5 (Agriculture). This is an example of a Role Tie from Figure 9.

There are also some roles and E-groups that are unique to a party. For example, role 2 is exclusive to Republicans (many collaborators, but not many collaborations), and R-group 1 (Ways and Means) is more strongly associated with the Democrat E-groups. This makes sense, because the Ways and Means committee is one of the most prestigious to participate in and relates to tax legislation. It is therefore plausible that the majority party would be most active in this committee.

Though the direct view of the interaction graph is useful, as discussed earlier there are other methods to understand the interactions. We can slice the core tensor by E-group, role, or R-group and compare the slices directly. Figure 14 shows such a comparison across E-groups. We can see that E-groups 1 and 3 both play role 5 but in different R-groups; also, E-group 1 plays mainly one role, whereas E-group 3 plays multiple roles in the graph. Finally, we can embed this graph into a metric space as shown in Figure 16.

9.2 Studies Across Multiple Multi-Relational Graphs

We also performed multi-relational analysis across a total of 15 consecutive congresses and report the results here. We performed two experiments to analyze these multi-relational graphs and gain insight into them. First, we analyzed how the macro-properties of the learned interaction graphs, as discussed in Section 8.4, varied across the congresses (see Figure 17). Second, we determined how well role definitions learned from one congress transfer to others, as discussed in Section 8.5; the results are presented in Figure 18.

Figure 17 shows the results of our analysis of macro-properties of the learned interaction graphs from the 96th-110th congresses. These results contain many interesting insights, and we focus on just a few due to space restrictions. The first unusual property we note is a great spike of instability in the 101st congress. This is due to the election of a new President, Bush, following a very popular bipartisan President Reagan. In addition, many controversial bills were passed that crossed party lines, such as the Americans With Disabilities Act. In contrast, the 99th congress was very stable, given that it sat during Reagan's second term and most bills were supported across partisan lines. Of particular note are also the sharp peaks during congresses 97, 101 and, to a lesser extent, 103. They correspond precisely to changes in presidencies: Carter (Democrat) to Reagan (Republican) (97), Reagan (Republican) to Bush (Republican) (101) and Bush (Republican) to Clinton (Democrat) (103).

In Figure 18 we show a heat map of the role transfer between different congresses. We first ran our algorithm to discover the roles for all congresses. Then we transferred the role definitions learned from each of the 15 congresses to every other congress, and measured the fit to determine how well each set of roles could be used to explain the behavior of every other congress. The heat map shows how well (dark red) or how poorly (dark blue) the roles for the congress in the row explained the interactions for the congress in the column. Of course the diagonal is dark red, since those roles were built from data for that congress. As expected, the block red structure indicates that later congresses’ roles can better explain later congresses’ behavior and earlier congresses’ roles can better explain earlier congresses’ behavior. The solid blue block in the top left-hand corner indicates that later congresses’ roles are very poor at explaining the earlier congresses’ behavior. The apparent outliers within the top right-hand block and lower left-hand block (i.e., the bluish entries amongst the red/yellow) are indicative of a shift in presidency or house majority, either Democrat to Republican or vice versa.
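The transfer experiment can be illustrated in a simplified matrix setting. The sketch below holds a set of role definitions F fixed, refits nonnegative role memberships G for a target node-by-feature matrix V, and reports fit = 1 − ‖V − GF‖/‖V‖, the measure given in the caption of Figure 18. Several things here are assumptions made for illustration: the single-matrix (rather than tensor) setting, the Frobenius norm, the Lee-Seung style multiplicative update used in place of the alternating least squares solver, and all function names and toy values.

```python
import math

def matmul(a, b):
    """Product of two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def frobenius(a):
    return math.sqrt(sum(v * v for row in a for v in row))

def fit_memberships(v, f, iters=500, eps=1e-12):
    """Fit G >= 0 so that V ~= G F, with the role definitions F held fixed.

    Multiplicative updates are a simple stand-in solver (an assumption),
    not the alternating least squares procedure used in the paper.
    """
    n, r = len(v), len(f)
    g = [[1.0] * r for _ in range(n)]
    ft = transpose(f)
    vft = matmul(v, ft)   # n x r, constant across iterations
    fft = matmul(f, ft)   # r x r, constant across iterations
    for _ in range(iters):
        gfft = matmul(g, fft)
        g = [[g[i][k] * vft[i][k] / (gfft[i][k] + eps) for k in range(r)]
             for i in range(n)]
    return g

def transfer_fit(v, f):
    """fit = 1 - ||V - G F|| / ||V||, with G refit for the target graph."""
    g = fit_memberships(v, f)
    gf = matmul(g, f)
    residual = [[x - y for x, y in zip(rv, rg)] for rv, rg in zip(v, gf)]
    return 1.0 - frobenius(residual) / frobenius(v)

# Toy data: two roles over three features (illustrative values only).
roles = [[1.0, 0.0, 1.0],
         [0.0, 1.0, 0.0]]
same_structure = [[2.0, 0.0, 2.0], [0.0, 3.0, 0.0]]   # expressible by roles
other_structure = [[1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]  # not expressible
```

Calling `transfer_fit(same_structure, roles)` gives a fit close to 1, while `transfer_fit(other_structure, roles)` is much lower; one cell of the heat map corresponds to one such call for a (source, target) pair.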


10 Conclusion

Role discovery is an emerging and important area of graph mining. It aims to discover nodes that perform similar functions in networks but do not necessarily belong to the same community. Existing work has had two limitations: it is completely unsupervised, and it focuses on single-relational graphs.

We propose a framework that incorporates convex constraints into NMF to allow a rich set of guided role discovery formulations. In particular, we explore three types of guidance: sparsity, diversity and alternativeness. Sparsity and diversity can be used to create simpler and more interpretable role definitions and role allocations. They can also reduce overfitting and produce better predictive results for matching authors between the KDD conference and a variety of other conferences, provided the authors perform similar roles in both conferences. The notion of alternativeness has been explored in the clustering literature and is useful if the given explanation is not valid and an alternative is required. Here we show not only that alternative roles exist in co-author networks, but also that we can find an alternative to the community structure in a very large YouTube graph.

We then showed how to lift that framework to multi-relational graphs by first representing the multi-relational graph as a tensor. We use a Tucker decomposition because the more popular PARAFAC decomposition cannot find the complex interactions that are likely to occur between the E-groups, roles, and R-groups. However, existing Tucker decomposition algorithms in popular toolboxes enforce properties that would lead to non-intuitive results for role discovery; hence we formulate our own algorithm. A critical aspect of our work is how to interpret and use the core of the Tucker decomposition, which shows the complex interactions between the E-groups, roles, and R-groups. We show how it can be visualized and represented as an interaction graph whose properties we can use as macroscopic indicators of the original multi-relational graph. Our experimental results focus on 15 multi-relational Congressional cosponsor record graphs. Here an E-group is a collection of congressional representatives, an R-group is a collection of bill types (determined by the committee they went through), and the roles are based on cosponsoring behavior. We show that our methods can find intuitive and expected insights, such as Republicans and Democrats naturally separating into different E-groups. We also find that groups of representatives can play multiple roles for multiple R-groups, showing that the Tucker decomposition does indeed find the complex interactions we wish to discover. The macroscopic properties of the interaction graph show that the congresses vary greatly over time, with abrupt changes being associated with changes in the Presidency and control of the Congress. Finally, our transfer setting offers useful insight into how roles have differed across congresses by using the roles from one congress to explain the behavior of others.


11 Acknowledgments

The authors gratefully acknowledge support of this research via ONR grants N00014-09-1-0712 and N00014-11-1-0108, and NSF Grant NSF IIS-0801528. This work was also supported in part by IARPA via AFRL Contract No. FA8650-10-C-7061 and in part by DARPA under SMISC Program Agreement No. W911NF-12-C-0028.

References

[1] C. A. Andersson and R. Bro. The n-way toolbox for MATLAB. Chemometrics and Intelligent Laboratory Systems, 2000.

[2] B. W. Bader and T. G. Kolda. Algorithm 862: MATLAB tensor classes for fast algorithm prototyping. ACM Transactions on Mathematical Software, 32(4):635–653, December 2006.

[3] B. W. Bader, T. G. Kolda, et al. Matlab tensor toolbox version 2.5, January 2012.

[4] B. W. Bader, T. G. Kolda, et al. Matlab tensor toolbox version 2.5. Available online, January 2012.

[5] S. Basu, I. Davidson, and K. Wagstaff. Constrained Clustering: Algorithms, Applications and Theory. Prentice Hall, 2008.

[6] A. Beck and M. Teboulle. Mirror descent and nonlinear projected subgradient methods for convex optimization. Operations Research Letters, 31(3):167–175, 2003.

[7] M. W. Berry, M. Browne, A. N. Langville, V. P. Pauca, and R. J. Plemmons. Algorithms and applications for approximate nonnegative matrix factorization. Computational Statistics and Data Analysis, 52(1):155–173, 2007.

[8] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge University Press, NY, USA, 2004.

[9] R. Bro and S. De Jong. A fast non-negativity-constrained least squares algorithm. Journal of Chemometrics, 11(5):393–401, 1997.

[10] R. Bro and N. D. Sidiropoulos. Least squares algorithms under unimodality and non-negativity constraints. J. of Chemometrics, 12(4):223–247, 1998.

[11] I. CVX Research. CVX: Matlab software for disciplined convex programming, version 2.0 beta. http://cvxr.com/cvx, Sept. 2012.

[12] I. Davidson, S. Gilpin, and P. B. Walker. Behavioral event data and their analysis. DMKD, 25(3):635–653, 2012.


[13] I. Davidson and Z. Qi. Finding alternative clusterings using constraints. In ICDM, pages 773–778, 2008.

[14] J. Duchi, S. Shalev-Shwartz, Y. Singer, and T. Chandra. Efficient projections onto the l1-ball for learning in high dimensions. In ICML, pages 272–279, 2008.

[15] J. Fowler. Connecting the Congress: A Study of Cosponsorship Networks. Political Analysis, 2006.

[16] J. Fowler. Legislative Cosponsorship Networks in the U.S. House and Senate. Social Networks, 2006.

[17] H. Furstenau and M. Lapata. Semi-supervised semantic role labeling. In EACL, pages 220–228, 2009.

[18] S. Gilpin, T. Eliassi-Rad, and I. Davidson. Guided learning for role discovery (GLRD): Framework, algorithms, and applications. In KDD, 2013.

[19] M. Grant and S. Boyd. Graph implementations for nonsmooth convex programs. In V. Blondel, S. Boyd, and H. Kimura, editors, Recent Advances in Learning and Control, Lecture Notes in Control and Information Sciences, pages 95–110. Springer-Verlag Limited, 2008. http://stanford.edu/~boyd/graph_dcp.html.

[20] W. Heiser and P. Kroonenberg. Dimensionwise fitting in PARAFAC-CANDECOMP with missing data and constrained parameters. Technical Report PRM 97-01, University of Leiden, The Netherlands, 1997.

[21] K. Henderson, B. Gallagher, T. Eliassi-Rad, H. Tong, S. Basu, L. Akoglu, D. Koutra, C. Faloutsos, and L. Li. RolX: Structural role extraction & mining in large graphs. In KDD, pages 1231–1239, 2012.

[22] K. Henderson, B. Gallagher, L. Li, L. Akoglu, T. Eliassi-Rad, H. Tong, and C. Faloutsos. It’s who you know: Graph mining using recursive structural features. In KDD, pages 663–671, 2011.

[23] P. O. Hoyer. Non-negative matrix factorization with sparseness constraints. JMLR, 5:1457–1469, 2004.

[24] Y.-D. Kim and S. Choi. Nonnegative Tucker decomposition. In Computer Vision and Pattern Recognition, 2007. CVPR ’07. IEEE Conference on, pages 1–8, June 2007.

[25] T. G. Kolda and B. W. Bader. Tensor decompositions and applications. SIAM Review, 51(3):455–500, September 2009.

[26] D. D. Lee and H. S. Seung. Algorithms for non-negative matrix factorization. In NIPS, pages 556–562. MIT Press, 2000.


[27] M. Ley. DBLP, computer science bibliography. http://www.informatik.uni-trier.de/~ley/db/.

[28] T. Li and C. Ding. The relationships among various nonnegative matrix factorization methods for clustering. In ICDM, pages 362–371, 2006.

[29] C.-J. Lin. Projected gradient methods for nonnegative matrix factorization. Neural Comput., 19(10):2756–2779, Oct. 2007.

[30] H. Liu, Z. Wu, X. Li, D. Cai, and T. Huang. Constrained nonnegative matrix factorization for image representation. PAMI, 34(7):1299–1311, 2012.

[31] J. Liu, S. Ji, and J. Ye. SLEP: Sparse Learning with Efficient Projections. Arizona State University, 2009.

[32] J. Liu and J. Ye. Efficient Euclidean projections in linear time. In ICML, pages 657–664, 2009.

[33] A. Mislove, M. Marcon, K. P. Gummadi, P. Druschel, and B. Bhattacharjee. Measurement and analysis of online social networks. In IMC, pages 29–42, 2007.

[34] M. Morup, L. K. Hansen, and S. M. Arnfred. Algorithms for sparse non-negative Tucker decompositions. Neural Computation, 20:2112–2141, 2008.

[35] P. Paatero and U. Tapper. Positive matrix factorization: A non-negative factor model with optimal utilization of error estimates of data values. Environmetrics, 5(2):111–126, 1994.

[36] Z. Qi and I. Davidson. A principled and flexible framework for finding alternative clusterings. In KDD, pages 717–726, 2009.

[37] M. Somaiya, C. Jermaine, and S. Ranka. Mixture models for learning low-dimensional roles in high-dimensional data. In KDD, pages 909–918, 2010.

[38] G. Tomasi and R. Bro. A comparison of algorithms for fitting the PARAFAC model. Computational Statistics & Data Analysis, 50(7):1700–1734, April 2006.

[39] L. N. Trefethen and D. Bau. Numerical Linear Algebra. SIAM, 1997.

[40] F. Wang, T. Li, X. Wang, S. Zhu, and C. Ding. Community discovery using nonnegative matrix factorization. DMKD, 22(3):493–521, 2011.


[Figure 3 plots: five panels, (a) CIKM, (b) SDM, (c) ICDM, (d) SIGMOD, (e) VLDB; x-axis: Set Size (Log Scale); y-axis: Recall; legend: ReFeX, RolX, GLRD(Sparse), GLRD(Diverse).]

Figure 3: Comparison of role discovery techniques for identity resolution across graphs. Role definitions are learned from the KDD co-authorship graph; then, authors from the other (conference) co-authorship graphs are assigned to these roles using various techniques. In particular, we show results for ReFeX (features only), RolX (unconstrained role discovery), GLRD-Sparse (role discovery with sparsity constraints), and GLRD-Diverse (role discovery with diversity constraints). Authors from each conference are paired with an increasing number of nearest neighbors from the KDD conference (x-axis) and the resulting recall is reported (y-axis). Across most settings, role definitions using sparsity and diversity constraints lead to better identity resolution results than standard unconstrained RolX. For graphs that are most similar in nature to KDD (e.g., ICDM, SDM, CIKM), the transfer of role definitions leads to better results than simply using structural features of nodes directly. Note that the recall values are relatively low because the set sizes (on the x-axis) are small compared to the population size in each graph.
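The recall measure described in the caption of Figure 3 can be sketched as follows. This is a minimal illustration, assuming each author is represented by a role-membership vector and that nearest neighbors are ranked by Euclidean distance; the distance choice, the function name, and the toy vectors are assumptions, not details confirmed by the text.

```python
import math

def recall_at_k(target_vecs, source_vecs, k):
    """Fraction of shared authors found among their k nearest source neighbors.

    target_vecs and source_vecs map an author name to a role-membership
    vector. Only authors present in both graphs are scored.
    """
    shared = [a for a in target_vecs if a in source_vecs]
    if not shared:
        return 0.0
    hits = 0
    for author in shared:
        # Rank every source author by distance to this target author.
        ranked = sorted(
            source_vecs,
            key=lambda s: math.dist(target_vecs[author], source_vecs[s]),
        )
        if author in ranked[:k]:
            hits += 1
    return hits / len(shared)

# Hypothetical role memberships in a source graph and a target graph.
source = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.5, 0.5]}
target = {"a": [0.9, 0.1], "b": [0.6, 0.4], "d": [0.0, 1.0]}
```

Here author "a" keeps a similar role in both graphs and is recovered at k = 1, while author "b" changes role and is recovered only once the neighbor set grows, which is the effect the x-axis of Figure 3 varies.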


Figure 4: Comparison of role discovery techniques for identity resolution experiments. Authors from each conference are paired with the nearest 32 neighbors from the KDD conference; the resulting recall is reported. The percentage (on the x-axis) is the fraction of authors that overlap between the two conferences. Nearly all experiments show better results with sparsity and diversity constraints, except when the authors do not share similar roles in the two conferences (SIGMOD and VLDB).


[Figure 5 image: node-link drawing of the largest connected component of the KDD co-authorship graph, with each node labeled by author name.]

Figure 5: A visualization of our alternative role discovery results for the KDD co-authorship graph’s largest connected component. All the colored nodes belong to the same primary role under the original factorization. However, they belong to different primary roles under the alternative factorization, as indicated by the various colors. We observe that the alternative roles are able to separate the 3 blue “local-star” nodes (namely, Jun Zhu, Lei Zhang, and Evimaria Terzi) from the red “global-broker” nodes (namely, Christos Faloutsos, Heikki Mannila, Vipin Kumar, etc.). The alternative roles also separate out the 4 yellow “periphery-cliquey” nodes. Note that the reader can zoom in on this figure to read the names of each author.

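[Figure 6 diagram: the tensor 𝒱 (axes labeled nodes and features) approximated as a sum of k rank-1 components (g_1, f_1, r_1) + ⋯ + (g_k, f_k, r_k), collected in factor matrices G, F, and R.]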

Figure 6: A multi-relational graph represented using an order 3 tensor. The PARAFAC tensor decomposition is a rank 1 simplification of the graph and is the natural analog of the NMF formulation of role discovery used earlier [21, 18]. However, it has significant limitations for role discovery in multi-relational graphs.
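In symbols, the rank-1 structure sketched in Figure 6 can be written as below, where $\circ$ denotes the vector outer product and $g_i$, $f_i$, $r_i$ are the $i$-th columns of $G$, $F$, and $R$. Reading $G$ as the node factor, $F$ as the feature factor, and $R$ as the relation factor is an assumption based on the figure labels.

```latex
\mathcal{V} \;\approx\; \sum_{i=1}^{k} g_i \circ f_i \circ r_i ,
\qquad\text{i.e.}\qquad
\mathcal{V}_{n f r} \;\approx\; \sum_{i=1}^{k} G_{n i}\, F_{f i}\, R_{r i} .
```

Each term couples exactly one column from each factor matrix, so component $i$ of the node factor can only interact with component $i$ of the feature and relation factors. This one-to-one coupling is one way to see the limitation mentioned in the caption.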


[Figure 7 diagram: labels: nodes, features, E-groups, roles, E-group definitions, role definitions.]

Figure 7: The Tucker decomposition for role discovery. The factor matrices can be interpreted as groups of features (role definitions), groups of entities (E-groups), and groups of relations (R-groups). The Tucker core shows how the roles, E-groups, and R-groups interact in the multi-relational graph and can itself be viewed as a hypergraph, which is an example of what we call an interaction graph.
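To make the role of the core concrete, the following sketch rebuilds a tensor from a Tucker core and three factor matrices. It assumes the core is indexed as core[i][j][k] for E-group i, role j, and R-group k, and that the factor matrices are stored as lists of rows; the function name, argument names, and toy values are hypothetical.

```python
def tucker_reconstruct(core, ent, feat, rel):
    """Rebuild V[n][f][r] from a Tucker core and three factor matrices.

    V[n][f][r] = sum over (i, j, k) of
                 core[i][j][k] * ent[n][i] * feat[f][j] * rel[r][k]

    ent:  entities  x E-groups
    feat: features  x roles
    rel:  relations x R-groups
    """
    n_i, n_j, n_k = len(core), len(core[0]), len(core[0][0])
    return [[[sum(core[i][j][k] * en[i] * fe[j] * rl[k]
                  for i in range(n_i)
                  for j in range(n_j)
                  for k in range(n_k))
              for rl in rel]
             for fe in feat]
            for en in ent]

# One E-group that plays two roles on a single R-group (toy values).
core = [[[1.0], [1.0]]]
ent = [[1.0]]                      # one entity, fully in E-group 0
feat = [[1.0, 0.0], [0.0, 2.0]]    # feature 0 defines role 0, feature 1 role 1
rel = [[1.0]]                      # one relation, fully in R-group 0
v = tucker_reconstruct(core, ent, feat, rel)
```

Because the core may have several nonzero entries for the same E-group, one group of entities can contribute through more than one role, which a sum of rank-1 terms with one-to-one coupling cannot express directly.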

[Figure 8 image: axis labels: roles, topics.]

Figure 8: Analysis of E-group slices from the tensor core. Each slice shows the roles and R-groups that an E-group of people plays; the slices are directly comparable.


[Figure 9 diagram: four example patterns (No Tie, R-Group Tie, Role Tie, Bow Tie), each drawn over Role, E-Group, and R-Group nodes.]

Figure 9: Some patterns that can exist in an interaction graph. No Tie: the E-group plays only one role in one R-group; R-Group Tie: the E-group plays the same role in multiple R-groups; Role Tie: the E-group plays multiple roles in the same R-group; Bow Tie: the E-group plays multiple roles but in different R-groups.
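The four patterns can be detected mechanically from one E-group's slice of the core. The sketch below is one possible reading of the definitions in the caption: the threshold, the slice layout e_slice[j][k] (role j, R-group k), the extra "Inactive" label, and the treatment of every multi-role, multi-R-group slice as a Bow Tie are all assumptions made for illustration.

```python
def classify_pattern(e_slice, threshold=0.1):
    """Label one E-group's core slice with a Figure 9 pattern name.

    e_slice[j][k] is the core weight for role j and R-group k.
    Entries at or below `threshold` are treated as absent.
    """
    active = [(j, k)
              for j, row in enumerate(e_slice)
              for k, w in enumerate(row) if w > threshold]
    if not active:
        return "Inactive"          # hypothetical label, not in Figure 9
    roles = {j for j, _ in active}
    r_groups = {k for _, k in active}
    if len(roles) == 1 and len(r_groups) == 1:
        return "No Tie"
    if len(roles) == 1:
        return "R-Group Tie"       # same role, several R-groups
    if len(r_groups) == 1:
        return "Role Tie"          # several roles, same R-group
    return "Bow Tie"               # several roles across R-groups
```

For example, `classify_pattern([[0.9, 0.0], [0.0, 0.6]])` returns "Bow Tie", since role 0 is active only on R-group 0 and role 1 only on R-group 1.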


[Figure 10 diagram: three representatives (Pelosi, Reid, Blunt), each shown next to a list of three hypothetical bills drawn from Education Bill 1, Education Bill 2, Agriculture Bill 1, and Agriculture Bill 2; edges between representatives are labeled A:1, A:1, E:2, E:1, E:1.]

Figure 10: Description of how multi-relational graphs are created from the congressional cosponsors data. Nodes in this graph represent congressional representatives, and the adjacent lists of hypothetical bills are those that the representative cosponsored. When two representatives cosponsor the same bill, a labeled edge is created between them, where the label corresponds to the assigned committee for the bill (e.g., Agriculture, Education). The weight associated with a labeled edge corresponds to the number of bills from the same committee that a pair of representatives both cosponsored.
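The construction described in the caption of Figure 10 can be sketched in a few lines. The input format, the function name, and the assignment of bills to representatives below are hypothetical; only the rule (one labeled edge per committee, weighted by the number of jointly cosponsored bills) comes from the caption.

```python
from collections import Counter
from itertools import combinations

def build_cosponsor_graph(bills):
    """Return {(rep_a, rep_b, committee): weight} from cosponsorship records.

    `bills` maps a bill id to (committee, list of cosponsors). The weight of
    a labeled edge counts the bills from that committee that both
    representatives cosponsored. Each pair is stored in sorted name order.
    """
    edges = Counter()
    for committee, cosponsors in bills.values():
        for a, b in combinations(sorted(set(cosponsors)), 2):
            edges[(a, b, committee)] += 1
    return dict(edges)

# Hypothetical bills in the spirit of Figure 10; sponsor lists are made up.
bills = {
    "Education Bill 1": ("Education", ["Pelosi", "Reid"]),
    "Education Bill 2": ("Education", ["Pelosi", "Reid", "Blunt"]),
    "Agriculture Bill 1": ("Agriculture", ["Reid", "Blunt"]),
    "Agriculture Bill 2": ("Agriculture", ["Pelosi", "Blunt"]),
}
graph = build_cosponsor_graph(bills)
```

Stacking one weighted adjacency matrix per committee label then gives the order 3 tensor used as input to the decomposition.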


[Figure 11 plot: x-axis: Role (1 to 5); y-axis: Property Contribution (0 to 1); legend: Degree, Weight, Clustering Coefficient, Page Rank, Eccentricities, Biconnected Components.]

Figure 11: Sense making of roles discovered in the 110th Congress Cosponsor Multi-Relational Graph. Roles are redefined in terms of a set of reference features, each of which is normalized for comparison purposes. Role 3 corresponds to the power brokers.


E-group 1
Name                   Party  Exp
Millender-McDonald     D      11
Obey, David            D      38
Tsongas, Niki          D      0
Speier, Jackie         D      0
Faleomavaega, Eni      D      18
Meehan, Martin         D      14
Edwards, Donna         D      0
Visclosky, Peter       D      22
Hoyer, Steny           D      26
Foster, Bill           D      0

(a) Democrat seniority. Hoyer was the majority leader. Characterized by a large number of collaborations with many representatives, largely in the 3rd R-group (Ways and Means).

E-group 2
Name                   Party  Exp
Hensarling, Jeb        R      4
Boehner, John          R      16
Thornberry, Mac        R      12
Broun, Paul            R      0
Shadegg, John          R      12
Hastert, Dennis        R      8
Scalise, Steve         R      11
Latta, Robert          R      6
Flake, Jeff            R      6
McCrery, Jim           R      14

(b) Republican seniority. Boehner was minority leader at the time.

E-group 3
Name                   Party  Exp
Cooper, Jim            D      16
Johnson, Henry         D      0
Ryan, Tim              D      4
DeGette, Diana         D      10
Engel, Eliot L.        D      14
Doggett, Lloyd         D      12
Pastor, Ed             D      16
Meek, Kendrick         D      4
Murphy, C.             D      0
Crowley, Joseph        D      8

(c) Active largely in the 5th R-group but with multiple roles. The 5th R-group is dominated by the agriculture committee.

E-group 4
Name                   Party  Exp
Hall, Ralph            R      16
Rodgers, Cathy         R      2
Myrick, Sue            R      12
Issa, Darrell          R      6
Drake, Thelma          R      2
Kuhl, Randy            R      2
Poe, Ted               R      2
Boozman, John          R      6
Conaway, Michael       R      2
Wamp, Zach             R      12

(d) Working with many representatives (high degree) but not often (low weight) on R-group 5.

E-group 5
Name                   Party  Exp
Jackson-Lee, Sheila    D      12
Cohen, Steve           D      0
Hare, Phil             D      0
Grijalva, Raul         D      4
English, Phil          R      12
Honda, Michael         D      6
McCotter, Thaddeus     R      4
Filner, Bob            D      14
Hinchey, Maurice       D      14
Gonzalez, Charles      D      8

(e) Mixed party membership.

Figure 12: Samples of congressional representatives from each E-group (found in the 110th Congress Cosponsorship Graph) along with their party affiliation and years of service in the U.S. House of Representatives at the beginning of the congress (2007).

[Figure 13 plots: five horizontal bar charts, Relational Topic 1 through Relational Topic 5; x-axis: 0 to 0.8; bars indexed 1 to 15, one per committee. Committee labels (in extraction order): Energy and Commerce, Rules, Small Business, Transportation and Infrastructure, Appropriations, Veterans’ Affairs, Education and Labor, Agriculture, Ways and Means, Financial Services, Oversight and Government Reform, Judiciary, Natural Resources, Budget, Science and Technology.]

Figure 13: R-groups for 100th congress. Each bar plot corresponds to a single R-group, and the bars show how much each relation contributes to the respective R-group.


[Figure 14 plots: five heat-map panels, Group 1 through Group 5; x-axis: Roles (1 to 5); y-axis: Relational Topics (1 to 5).]

Figure 14: Tucker core found in the 110th Congress Cosponsorship Graph, sliced by E-group. Each slice represents an E-group, while the rows correspond to R-groups and the columns correspond to roles. Light colors correspond to high values and black corresponds to a zero value.


[Figure 15 diagram: tripartite graph over the nodes Role 1 to Role 5, E-group 1 to E-group 5, and R-group 1 to R-group 5.]

Figure 15: Sparsified tripartite representation of the core tensor found in the 110th Congress Cosponsorship Graph. Each entry i, j, k of the core corresponds to a hyperedge between E-group i, role j, and R-group k. We sparsify this into a single-relation graph that is role focused. Looking back to our example patterns (see Figure 9), we observe that this congress has two bow tie patterns (E-groups 2 and 4, 100% Republicans), one no tie pattern (E-group 5, 80% Democrats), one role tie pattern (E-group 3, 100% Democrats), and one R-group tie (E-group 1, 100% Democrats). Figure 12 lists the members of each E-group.
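One plausible way to turn the core into a role-focused tripartite graph is sketched below. The sparsification rule is not spelled out in the caption, so the thresholding step, the choice to route every kept hyperedge through its role node, the summing of weights on repeated edges, and the node naming are all assumptions made for illustration.

```python
def core_to_tripartite(core, threshold=0.1):
    """Convert core[i][j][k] (E-group i, role j, R-group k) to weighted edges.

    Each hyperedge above `threshold` contributes its weight to two edges:
    (E-group i, role j) and (role j, R-group k). Roles form the middle layer.
    """
    edges = {}
    for i, by_role in enumerate(core):
        for j, by_r_group in enumerate(by_role):
            for k, w in enumerate(by_r_group):
                if w <= threshold:
                    continue  # drop weak hyperedges (sparsification)
                for edge in ((f"E{i}", f"Role{j}"), (f"Role{j}", f"R{k}")):
                    edges[edge] = edges.get(edge, 0.0) + w
    return edges

# Toy core: E-group 0 plays role 0 on both R-groups; E-group 1 plays two roles.
core = [
    [[0.9, 0.7], [0.0, 0.0]],
    [[0.0, 0.5], [0.4, 0.0]],
]
edges = core_to_tripartite(core)
```

Note that this pairwise view discards which E-group was paired with which R-group inside a hyperedge, so it should be read alongside the per-E-group slices rather than as a replacement for them.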


Figure 16: Projection and heterogeneous clustering of the tripartite graph representation of the core tensor found for the 110th Congress Cosponsor Graph. Colors represent the clustering, while marker shapes represent the type of object (E-groups, roles, and R-groups).


[Figure 17 plot: x-axis: Congress (096 to 110); y-axis: Contribution (0.1 to 0.5); legend: Degree (Mean), Degree (Variance), Eigen Gap, Cut Cost, Stationary Dist. (Entropy), Meta-Average.]

Figure 17: Properties of the interaction graph formed from the Tucker cores for the last 15 Congresses. Attributes are all normalized for comparison purposes.
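Three of the macro-properties named in the legend of Figure 17 can be illustrated on a small weighted graph. The exact definitions used in the experiments are those of Section 8.3 and may differ from this sketch; here we assume an undirected graph given as a symmetric weight matrix, population variance, natural logarithms, and a random walk whose stationary distribution is proportional to weighted degree (which holds for connected undirected graphs).

```python
import math

def macro_properties(weights):
    """Degree mean, degree variance, and stationary-distribution entropy.

    `weights` is a symmetric adjacency matrix (list of rows) of an
    undirected weighted graph with at least one edge.
    """
    degrees = [sum(row) for row in weights]
    n = len(degrees)
    mean = sum(degrees) / n
    variance = sum((d - mean) ** 2 for d in degrees) / n
    total = sum(degrees)
    # Random walk on a connected undirected graph: pi_i = d_i / sum(d).
    pi = [d / total for d in degrees]
    entropy = -sum(p * math.log(p) for p in pi if p > 0)
    return {"degree_mean": mean,
            "degree_variance": variance,
            "stationary_entropy": entropy}

triangle = macro_properties([[0, 1, 1], [1, 0, 1], [1, 1, 0]])
star = macro_properties([[0, 1, 1], [1, 0, 0], [1, 0, 0]])
```

The evenly connected triangle has zero degree variance and maximal entropy, while the star concentrates weight on its hub and scores lower entropy; tracking such quantities per congress yields curves of the kind plotted in Figure 17.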


[Figure 18 heat map: axis labels: Congress Transfer To and Congress Transfer From (096 to 110); color scale: 0 to 1.]

Figure 18: Fit quality when transferring roles between the 96th to 110th congresses, where fit = 1 − reconstruction error/‖V‖. Roles learned from congresses on the x-axis are transferred to each congress as denoted on the y-axis. Transferring to temporally further congresses generally leads to poorer fits.
