Community Detection with Partially Observable Links and Node Attributes
Xiaokai Wei, Bokai Cao, Weixiang Shao, Chun-Ta Lu and Philip S. Yu
Partially Observable Network
§ Networks are partially observable w.r.t. links and attributes in various scenarios:
– Online social networks: new users typically have no links, and some users do not fill in their profile information. Moreover, prudent users may choose strict privacy settings that limit the visibility of their profiles and connections.
– Bioinformatics: obtaining attributes or interaction links for certain genes can be costly.
– Terrorist networks: some terrorists' attribute and linkage information can be difficult to obtain.
Introduction
§ Links and node attributes provide complementary information about nodes
§ Challenge: nodes with only observable links and nodes with only observable attributes are not directly comparable
[Figure: an example network shown in two views, the link structure view and the node attribute view (nodes v1–v6); each view covers only a subset of the nodes]
Introduction
§ POLNA: community detection with Partially Observable Links and Node Attributes
§ Idea:
– Learn a latent representation for each node via kernel matrix alignment, based on either link structure or node content
– Link-based and content-based representations of fully observable nodes interact with each other through co-regularization
Formulation
Definition 1 (Information Network). An information network $G = (V, E, X)$ consists of $V$, the set of nodes; $E \subseteq V \times V$, the set of links; and the feature matrix $X = [x_1, x_2, \ldots, x_n]$ ($n = |V|$), where $x_i \in \mathbb{R}^D$ ($i = 1, \ldots, n$) is the feature vector of node $v_i$.

Definition 2 (Partially Observable Information Network). In a partially observable information network $G = (V, E, X)$, each node belongs to one or two (overlapping) of the following sets: the set of nodes with observable links (denoted $O_g$) and the set of nodes with observable attributes (denoted $O_a$). We further denote the set of nodes with both observable links and attributes as $O_s = O_g \cap O_a$. Note that $V = O_g \cup O_a$, since we assume each node has at least one source of information.

Let the numbers of nodes with observable links and with observable attributes be $n_g = |O_g|$ and $n_a = |O_a|$, respectively. For more concise notation in the following sections, we assign local indices $1, \ldots, n_g$ to the nodes in $O_g$ and $1, \ldots, n_a$ to the nodes in $O_a$. A mapping function $\varphi_g(i)$ (or $\varphi_a(i)$) maps the local index $i \in \{1, \ldots, n_g\}$ (or $i \in \{1, \ldots, n_a\}$) in $O_g$ (or $O_a$) to its global index.
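The bookkeeping in Definition 2 can be sketched in a few lines. This is our own illustrative example (node ids and variable names are not from the slides), showing the observable sets and the local-to-global mappings $\varphi_g$, $\varphi_a$:

```python
# Illustrative sketch of Definition 2's notation for a 6-node network
# with global ids 0..5 (our own example, not from the slides).
O_g = {0, 1, 2, 3}          # nodes with observable links
O_a = {2, 3, 4, 5}          # nodes with observable attributes
O_s = O_g & O_a             # fully observable nodes: O_g intersect O_a
V = O_g | O_a               # every node has at least one information source

# local index -> global index (phi_g, phi_a in the slides' notation)
phi_g = dict(enumerate(sorted(O_g)))   # {0: 0, 1: 1, 2: 2, 3: 3}
phi_a = dict(enumerate(sorted(O_a)))   # {0: 2, 1: 3, 2: 4, 3: 5}
# inverse maps: global index -> local index
phi_g_inv = {g: l for l, g in phi_g.items()}
phi_a_inv = {g: l for l, g in phi_a.items()}
```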
Matrix Alignment
§ Measuring the similarity between two (vectorized) matrices

Matrix Alignment. For two matrices $K_1 \in \mathbb{R}^{n \times n}$ and $K_2 \in \mathbb{R}^{n \times n}$ (assume $\|K_1\|_F > 0$ and $\|K_2\|_F > 0$), the alignment between $K_1$ and $K_2$ is defined as
$$\rho(K_1, K_2) = \frac{\mathrm{Tr}(K_1 K_2)}{\|K_1\|_F \cdot \|K_2\|_F} \quad (1)$$
where $\mathrm{Tr}(\cdot)$ is the trace of a matrix.

Unnormalized Matrix Alignment. For two matrices $K_1 \in \mathbb{R}^{n \times n}$ and $K_2 \in \mathbb{R}^{n \times n}$ (assume $\|K_1\|_F > 0$ and $\|K_2\|_F > 0$), the alignment between $K_1$ and $K_2$ is defined as
$$\rho(K_1, K_2) = \mathrm{Tr}(K_1 K_2) \quad (2)$$

Centered Matrix Alignment. For two matrices $K_1 \in \mathbb{R}^{n \times n}$ and $K_2 \in \mathbb{R}^{n \times n}$ (assume $\|K_1\|_F > 0$ and $\|K_2\|_F > 0$), the centered alignment between $K_1$ and $K_2$ is defined as
$$\rho(K_1, K_2) = \mathrm{Tr}(H K_1 H \, H K_2 H) = \mathrm{Tr}(H K_1 H K_2)$$
where $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$ is the centering matrix, and the second equality follows from $HH = H$ and $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$ for arbitrary matrices $A, B \in \mathbb{R}^{n \times n}$.
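A minimal sketch of the centered alignment (function name is ours), which also makes the $HH = H$ simplification above easy to check numerically:

```python
import numpy as np

def centered_alignment(K1, K2):
    """Centered matrix alignment rho(K1, K2) = Tr(H K1 H H K2 H) = Tr(H K1 H K2),
    where H = I - (1/n) 1 1^T is the centering matrix."""
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # H is symmetric and idempotent: H H = H
    # use the simplified form; it equals the fully centered one
    return np.trace(H @ K1 @ H @ K2)
```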
Representation Learning with Kernel Matrix Alignment
§ Kernel constructed from latent representations
§ Learn latent representations that best align with the network-structure and attribute kernels

Suppose $K^g \in \mathbb{R}^{n_g \times n_g}$ and $K^a \in \mathbb{R}^{n_a \times n_a}$ are kernel matrices computed from the latent representations $U^g$ and $U^a$ with the Gaussian kernel, respectively:
$$K^g_{ij} = e^{-\|U^g_i - U^g_j\|^2}, \qquad K^a_{ij} = e^{-\|U^a_i - U^a_j\|^2}$$

Network-based representation:
$$\min_{U^g} f_g = -\mathrm{Tr}(H L^g H K^g) + \lambda \|U^g\|_F^2$$

Attribute-based representation:
$$\min_{U^a} f_a = -\mathrm{Tr}(H L^a H K^a) + \lambda \|U^a\|_F^2$$
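The kernel construction and one view's objective can be sketched as follows (a minimal sketch in our own notation; `L` stands in for the view's graph matrix $L^g$ or $L^a$, and `lam` for the regularization weight $\lambda$):

```python
import numpy as np

def gaussian_kernel(U):
    """K_ij = exp(-||U_i - U_j||^2) from latent representations U (n x d)."""
    sq_dists = np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists)

def view_objective(U, L, lam):
    """One view's objective f = -Tr(H L H K) + lam * ||U||_F^2:
    minimizing it maximizes the (unnormalized, centered) alignment
    between the latent-representation kernel K and the graph matrix L."""
    n = U.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K = gaussian_kernel(U)
    return -np.trace(H @ L @ H @ K) + lam * np.linalg.norm(U, 'fro') ** 2
```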
Partial Co-regularization
§ Learn a consensus from the link-based representation and attribute-based representation
§ POLNA with hard constraint
$$\min_{U^a, U^g, U^*} f = f_g^* + f_a^* + f_\lambda^* = -\mathrm{Tr}(H L^g H K^g) - \mathrm{Tr}(H L^a H K^a) + \lambda \|U^*\|_F^2$$
$$\text{s.t.} \quad U^g_{\varphi_g^{-1}(i)} = U^a_{\varphi_a^{-1}(i)} = U^*_i$$

Substituting the hard constraint reduces this to an unconstrained problem over the consensus representation:
$$\min_{U^*} f = -\mathrm{Tr}(H L^g H K^g) - \mathrm{Tr}(H L^a H K^a) + \lambda \|U^*\|_F^2$$
Partial Co-regularization
§ POLNA with soft constraint
– Enforcing the link-based and attribute-based representations to be exactly the same may be too strict, since links and attributes each carry information reflecting their own unique characteristics
$$\min_{U^a, U^g, U^*} f = f_g + f_a + f_s = -\mathrm{Tr}(H L^g H K^g) + \lambda \|U^g\|_F^2 - \mathrm{Tr}(H L^a H K^a) + \lambda \|U^a\|_F^2 + \alpha \sum_{i \in O_s} \left( \|U^g_{\varphi_g^{-1}(i)} - U^*_i\|_F^2 + \|U^a_{\varphi_a^{-1}(i)} - U^*_i\|_F^2 \right)$$
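The co-regularization term $f_s$ (the $\alpha$-weighted sum above) can be sketched as follows. This is our own illustration: the argument names and the dict-based index maps are assumptions, and `Ustar` maps each fully observable node's global id to its consensus vector:

```python
import numpy as np

def coreg_penalty(Ug, Ua, Ustar, phi_g_inv, phi_a_inv, O_s, alpha):
    """f_s: alpha-weighted sum, over fully observable nodes, of the squared
    distances from each view's representation to the consensus U*.
    Ug/Ua are locally indexed arrays; phi_*_inv map global -> local index."""
    total = 0.0
    for i in O_s:                          # i is a global node id
        dg = Ug[phi_g_inv[i]] - Ustar[i]   # link-view deviation
        da = Ua[phi_a_inv[i]] - Ustar[i]   # attribute-view deviation
        total += np.dot(dg, dg) + np.dot(da, da)
    return alpha * total
```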
Optimization
§ Gradient
$$\frac{\partial f_g}{\partial U^g_i} = -\sum_{j=1}^{n_g} \left( (H L^g H)_{ij} \cdot \frac{\partial K^g_{ij}}{\partial U^g_i} + (H L^g H)_{ji} \cdot \frac{\partial K^g_{ji}}{\partial U^g_i} \right) + 2\lambda U^g_i = 4 \sum_{j=1}^{n_g} \left( (H L^g H) \circ K^g \right)_{ij} \left( U^g_i - U^g_j \right) + 2\lambda U^g_i$$
where $\circ$ denotes the element-wise (Hadamard) product.
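A sketch of the closed-form gradient (our own implementation; `L` stands for the view's graph matrix $L^g$, assumed symmetric so that $HLH$ is too, and `lam` for $\lambda$):

```python
import numpy as np

def f_obj(U, L, lam):
    """f = -Tr(H L H K) + lam * ||U||_F^2 with K_ij = exp(-||U_i - U_j||^2)."""
    n = U.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K = np.exp(-np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1))
    return -np.trace(H @ L @ H @ K) + lam * np.sum(U ** 2)

def grad_f(U, L, lam):
    """Closed-form gradient, per the derivation above:
    df/dU_i = 4 * sum_j ((H L H) o K)_ij (U_i - U_j) + 2 * lam * U_i."""
    n = U.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K = np.exp(-np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1))
    W = (H @ L @ H) * K                    # Hadamard product (HLH) o K
    diff = U[:, None, :] - U[None, :, :]   # diff[i, j] = U_i - U_j
    return 4.0 * np.einsum('ij,ijd->id', W, diff) + 2.0 * lam * U
```

The closed form agrees with a central finite-difference check of `f_obj`, which is a quick way to validate such derivations before handing the gradient to L-BFGS.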
Optimization
§ POLNA with hard constraint
Algorithm 1: Optimization for POLNA with Hard Consensus Constraint (POLNA-HC)
1: Input: (partially observable) feature matrix X, (partially observable) network G, λ
2: Output: consensus representation U*
3: Initialize: U^g_i = rand(0, 1), U^a_i = rand(0, 1), U* = 0
4: Update U* with Eq. (15) by L-BFGS
Optimization
§ POLNA with soft constraint – Alternating optimization
Algorithm 2: Alternating Optimization for POLNA with Soft Consensus Constraint (POLNA-SC)
1: Input: (partially observable) feature matrix X, (partially observable) network G, λ and α
2: Output: consensus representation U*
3: Initialize: U^g_i = rand(0, 1), U^a_i = rand(0, 1), U* = 0
4: Set t = 1
5: while not converged do
6:   Find the optimal U^g by L-BFGS with Eq. (16)
7:   if t = 1 then
8:     U*_i = U^g_{φ_g^{-1}(i)}, ∀i ∈ O_s
9:   end if
10:  Find the optimal U^a by L-BFGS with Eq. (17)
11:  U*_i = (U^g_{φ_g^{-1}(i)} + U^a_{φ_a^{-1}(i)}) / 2, ∀i ∈ O_s
12:  t = t + 1
13: end while
14: U*_i = U^g_{φ_g^{-1}(i)}, ∀i ∈ O_g \ O_s
15: U*_i = U^a_{φ_a^{-1}(i)}, ∀i ∈ O_a \ O_s
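The consensus-assembly steps of the algorithm (lines 11, 14, and 15) can be sketched as follows. This is our own sketch, which assumes global node ids 0..n-1; function and argument names are ours:

```python
import numpy as np

def assemble_consensus(Ug, Ua, phi_g_inv, phi_a_inv, O_g, O_a, d):
    """Build the consensus U*: average the two views for fully observable
    nodes (algorithm line 11), copy the single available view otherwise
    (lines 14-15). Ug/Ua are locally indexed (n_g x d, n_a x d) arrays."""
    O_s = O_g & O_a
    n = len(O_g | O_a)
    Ustar = np.zeros((n, d))
    for i in O_s:                     # both views observed: average them
        Ustar[i] = (Ug[phi_g_inv[i]] + Ua[phi_a_inv[i]]) / 2
    for i in O_g - O_s:               # only links observed
        Ustar[i] = Ug[phi_g_inv[i]]
    for i in O_a - O_s:               # only attributes observed
        Ustar[i] = Ua[phi_a_inv[i]]
    return Ustar
```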
Evaluation
§ Datasets
– Citeseer: citation network
– Cora: citation network
§ Evaluation metrics
– Accuracy
– NMI

Table 1: Statistics of the two datasets
  Statistics        Citeseer   Cora
  # of instances    3312       2708
  # of links        4598       5429
  # of attributes   3703       1433
  # of classes      6          7
Evaluation
§ Methods:
– NMF: Non-negative Matrix Factorization on the combined network
– Spectral: spectral clustering on the combined network
– POLNA-HC: POLNA with hard constraint
– POLNA-SC: POLNA with soft constraint
§ Combined network for baselines:
– Create a kNN network from the observable node attributes and combine it with the observable link structure
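The baselines' attribute-derived network can be sketched as a symmetric kNN graph over the attribute vectors (our own minimal sketch with Euclidean distance; the slides do not specify the distance or symmetrization details):

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric kNN adjacency from attribute vectors X (n x D): connect each
    node to its k nearest neighbors by Euclidean distance, then symmetrize."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)          # never pick a node as its own neighbor
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d2[i])[:k]:   # k nearest neighbors of node i
            A[i, j] = A[j, i] = 1.0       # symmetrize the edge
    return A
```

The combined network for the baselines would then be this adjacency merged (e.g. summed or unioned) with the observable link structure.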
Evaluation
§ Accuracy with different observable rates
[Figure: accuracy vs. percentage of fully observable nodes on Citeseer (left) and Cora (right), comparing POLNA-SC, POLNA-HC, SNMF, and Spectral]
Evaluation
§ NMI with different observable rates
[Figure: NMI vs. percentage of fully observable nodes on Citeseer (left) and Cora (right), comparing POLNA-SC, POLNA-HC, SNMF, and Spectral]
Convergence Analysis
§ Accuracy and objective function w.r.t. the number of iterations
[Figure: accuracy (ACC) and objective-function value vs. number of iterations on Citeseer (left) and Cora (right)]
Sensitivity Analysis
§ POLNA-HC w.r.t. λ
[Figure: accuracy vs. percentage of fully observable nodes on Citeseer (left) and Cora (right) for λ ∈ {0.01, 0.05, 0.1, 0.5, 1, 5, 10}]
Sensitivity Analysis
§ POLNA-SC w.r.t. λ
[Figure: accuracy vs. percentage of fully observable nodes on Citeseer (left) and Cora (right) for λ ∈ {0.01, 0.05, 0.1, 0.5, 1, 5}]
Sensitivity Analysis
§ POLNA-SC w.r.t. α
[Figure: accuracy vs. percentage of fully observable nodes on Citeseer (left) and Cora (right) for α ∈ {0.01, 0.05, 0.1, 0.5, 1, 5, 10}]
Questions?