Community Detection with Partially Observable Links and Node Attributes
Xiaokai Wei, Bokai Cao, Weixiang Shao, Chun-Ta Lu and Philip S. Yu
Partially Observable Network
§ Networks are partially observable w.r.t. links and attributes in various scenarios:
– Online social networks: new users typically have no links, and some users do not fill in their profile information. Moreover, prudent users may choose strict privacy settings that limit the visibility of their profiles and connections.
– Bioinformatics: obtaining attributes or interaction links for certain genes can be costly.
– Terrorist networks: some terrorists' attribute and linkage information can be difficult to obtain.
Introduction
§ Links and node attributes provide complementary information about nodes
§ Challenge: nodes with only observable links and nodes with only observable attributes are not directly comparable
[Figure: an example network shown in two views, the link structure view and the node attribute view (nodes v1–v6); each view covers only a subset of the nodes]
Introduction
§ POLNA: community detection with Partially Observable Links and Node Attributes
§ Idea:
– Learn a latent representation for each node via kernel matrix alignment, based on either link structure or node content
– Link-based and content-based representations of fully observable nodes interact with each other through co-regularization
Formulation
Definition 1 (Information Network). An information network $G = (V, E, X)$ consists of $V$, the set of nodes; $E \subseteq V \times V$, the set of links; and the feature matrix $X = [x_1, x_2, \ldots, x_n]$ ($n = |V|$), where $x_i \in \mathbb{R}^D$ ($i = 1, \ldots, n$) is the feature vector of node $v_i$.

Definition 2 (Partially Observable Information Network). In a partially observable information network $G = (V, E, X)$, each node belongs to one or two (overlapping) of the following sets: the set of nodes with observable links (denoted $O_g$) and the set of nodes with observable attributes (denoted $O_a$). We further denote the set of nodes with both observable links and attributes as $O_s = O_g \cap O_a$. Note that $V = O_g \cup O_a$, since we assume each node has at least one source of information.

Let the numbers of nodes with observable links and with observable attributes be $n_g = |O_g|$ and $n_a = |O_a|$, respectively. For more concise notation in the following sections, we assign local indices $1, \ldots, n_g$ to the nodes in $O_g$ and $1, \ldots, n_a$ to the nodes in $O_a$. A mapping function $\varphi_g(i)$ (or $\varphi_a(i)$) maps the local index $i \in \{1, \ldots, n_g\}$ (or $i \in \{1, \ldots, n_a\}$) in $O_g$ (or $O_a$) to its global index.
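The bookkeeping in Definition 2 can be sketched in a few lines. This is our own illustrative example (node ids and variable names are not from the slides), showing the observable sets and the local-to-global mappings $\varphi_g$, $\varphi_a$:

```python
# Illustrative sketch of Definition 2's notation for a 6-node network
# with global ids 0..5 (our own example, not from the slides).
O_g = {0, 1, 2, 3}          # nodes with observable links
O_a = {2, 3, 4, 5}          # nodes with observable attributes
O_s = O_g & O_a             # fully observable nodes: O_g intersect O_a
V = O_g | O_a               # every node has at least one information source

# local index -> global index (phi_g, phi_a in the slides' notation)
phi_g = dict(enumerate(sorted(O_g)))   # {0: 0, 1: 1, 2: 2, 3: 3}
phi_a = dict(enumerate(sorted(O_a)))   # {0: 2, 1: 3, 2: 4, 3: 5}
# inverse maps: global index -> local index
phi_g_inv = {g: l for l, g in phi_g.items()}
phi_a_inv = {g: l for l, g in phi_a.items()}
```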
Matrix Alignment
§ Measuring the similarity between two (vectorized) matrices

Matrix Alignment. For two matrices $K_1 \in \mathbb{R}^{n \times n}$ and $K_2 \in \mathbb{R}^{n \times n}$ (assume $\|K_1\|_F > 0$ and $\|K_2\|_F > 0$), the alignment between $K_1$ and $K_2$ is defined as
$$\rho(K_1, K_2) = \frac{\mathrm{Tr}(K_1 K_2)}{\|K_1\|_F \cdot \|K_2\|_F} \quad (1)$$
where $\mathrm{Tr}(\cdot)$ is the trace of a matrix.

Unnormalized Matrix Alignment. For two matrices $K_1 \in \mathbb{R}^{n \times n}$ and $K_2 \in \mathbb{R}^{n \times n}$ (assume $\|K_1\|_F > 0$ and $\|K_2\|_F > 0$), the alignment between $K_1$ and $K_2$ is defined as
$$\rho(K_1, K_2) = \mathrm{Tr}(K_1 K_2) \quad (2)$$

Centered Matrix Alignment. For two matrices $K_1 \in \mathbb{R}^{n \times n}$ and $K_2 \in \mathbb{R}^{n \times n}$ (assume $\|K_1\|_F > 0$ and $\|K_2\|_F > 0$), the centered alignment between $K_1$ and $K_2$ is defined as
$$\rho(K_1, K_2) = \mathrm{Tr}(H K_1 H \, H K_2 H) = \mathrm{Tr}(H K_1 H K_2)$$
where $H = I - \frac{1}{n}\mathbf{1}\mathbf{1}^T$ is the centering matrix, and the second equality follows from $HH = H$ and $\mathrm{Tr}(AB) = \mathrm{Tr}(BA)$ for arbitrary matrices $A, B \in \mathbb{R}^{n \times n}$.
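A minimal sketch of the centered alignment (function name is ours), which also makes the $HH = H$ simplification above easy to check numerically:

```python
import numpy as np

def centered_alignment(K1, K2):
    """Centered matrix alignment rho(K1, K2) = Tr(H K1 H H K2 H) = Tr(H K1 H K2),
    where H = I - (1/n) 1 1^T is the centering matrix."""
    n = K1.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # H is symmetric and idempotent: H H = H
    # use the simplified form; it equals the fully centered one
    return np.trace(H @ K1 @ H @ K2)
```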
Representation Learning with Kernel Matrix Alignment
§ Kernel constructed from latent representations
§ Learn latent representations that best align with the network-structure and attribute kernels

Suppose $K^g \in \mathbb{R}^{n_g \times n_g}$ and $K^a \in \mathbb{R}^{n_a \times n_a}$ are kernel matrices computed from the latent representations $U^g$ and $U^a$ with the Gaussian kernel, respectively:
$$K^g_{ij} = e^{-\|U^g_i - U^g_j\|^2}, \qquad K^a_{ij} = e^{-\|U^a_i - U^a_j\|^2}$$

Network-based representation:
$$\min_{U^g} f_g = -\mathrm{Tr}(H L^g H K^g) + \lambda \|U^g\|_F^2$$

Attribute-based representation:
$$\min_{U^a} f_a = -\mathrm{Tr}(H L^a H K^a) + \lambda \|U^a\|_F^2$$
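The kernel construction and one view's objective can be sketched as follows (a minimal sketch in our own notation; `L` stands in for the view's graph matrix $L^g$ or $L^a$, and `lam` for the regularization weight $\lambda$):

```python
import numpy as np

def gaussian_kernel(U):
    """K_ij = exp(-||U_i - U_j||^2) from latent representations U (n x d)."""
    sq_dists = np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists)

def view_objective(U, L, lam):
    """One view's objective f = -Tr(H L H K) + lam * ||U||_F^2:
    minimizing it maximizes the (unnormalized, centered) alignment
    between the latent-representation kernel K and the graph matrix L."""
    n = U.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    K = gaussian_kernel(U)
    return -np.trace(H @ L @ H @ K) + lam * np.linalg.norm(U, 'fro') ** 2
```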
Partial Co-regularization
§ Learn a consensus from the link-based representation and attribute-based representation
§ POLNA with hard constraint
$$\min_{U^a, U^g, U^*} f = f_g^* + f_a^* + f_\lambda^* = -\mathrm{Tr}(H L^g H K^g) - \mathrm{Tr}(H L^a H K^a) + \lambda \|U^*\|_F^2$$
$$\text{s.t.} \quad U^g_{\varphi_g^{-1}(i)} = U^a_{\varphi_a^{-1}(i)} = U^*_i$$

Substituting the hard constraint reduces this to an unconstrained problem over the consensus representation:
$$\min_{U^*} f = -\mathrm{Tr}(H L^g H K^g) - \mathrm{Tr}(H L^a H K^a) + \lambda \|U^*\|_F^2$$
Partial Co-regularization
§ POLNA with soft constraint
– Enforcing the link-based and attribute-based representations to be exactly the same may be too strict, since links and attributes each carry information reflecting their own unique characteristics
$$\min_{U^a, U^g, U^*} f = f_g + f_a + f_s = -\mathrm{Tr}(H L^g H K^g) + \lambda \|U^g\|_F^2 - \mathrm{Tr}(H L^a H K^a) + \lambda \|U^a\|_F^2 + \alpha \sum_{i \in O_s} \left( \|U^g_{\varphi_g^{-1}(i)} - U^*_i\|_F^2 + \|U^a_{\varphi_a^{-1}(i)} - U^*_i\|_F^2 \right)$$
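The co-regularization term $f_s$ (the $\alpha$-weighted sum above) can be sketched as follows. This is our own illustration: the argument names and the dict-based index maps are assumptions, and `Ustar` maps each fully observable node's global id to its consensus vector:

```python
import numpy as np

def coreg_penalty(Ug, Ua, Ustar, phi_g_inv, phi_a_inv, O_s, alpha):
    """f_s: alpha-weighted sum, over fully observable nodes, of the squared
    distances from each view's representation to the consensus U*.
    Ug/Ua are locally indexed arrays; phi_*_inv map global -> local index."""
    total = 0.0
    for i in O_s:                          # i is a global node id
        dg = Ug[phi_g_inv[i]] - Ustar[i]   # link-view deviation
        da = Ua[phi_a_inv[i]] - Ustar[i]   # attribute-view deviation
        total += np.dot(dg, dg) + np.dot(da, da)
    return alpha * total
```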
Optimization
§ Gradient
$$\frac{\partial f_g}{\partial U^g_i} = -\sum_{j=1}^{n_g} \left( (H L^g H)_{ij} \cdot \frac{\partial K^g_{ij}}{\partial U^g_i} + (H L^g H)_{ji} \cdot \frac{\partial K^g_{ji}}{\partial U^g_i} \right) + 2\lambda U^g_i = 4 \sum_{j=1}^{n_g} \left( (H L^g H) \circ K^g \right)_{ij} \left( U^g_i - U^g_j \right) + 2\lambda U^g_i$$
where $\circ$ denotes the element-wise (Hadamard) product.
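A sketch of the closed-form gradient (our own implementation; `L` stands for the view's graph matrix $L^g$, assumed symmetric so that $HLH$ is too, and `lam` for $\lambda$):

```python
import numpy as np

def f_obj(U, L, lam):
    """f = -Tr(H L H K) + lam * ||U||_F^2 with K_ij = exp(-||U_i - U_j||^2)."""
    n = U.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K = np.exp(-np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1))
    return -np.trace(H @ L @ H @ K) + lam * np.sum(U ** 2)

def grad_f(U, L, lam):
    """Closed-form gradient, per the derivation above:
    df/dU_i = 4 * sum_j ((H L H) o K)_ij (U_i - U_j) + 2 * lam * U_i."""
    n = U.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    K = np.exp(-np.sum((U[:, None, :] - U[None, :, :]) ** 2, axis=-1))
    W = (H @ L @ H) * K                    # Hadamard product (HLH) o K
    diff = U[:, None, :] - U[None, :, :]   # diff[i, j] = U_i - U_j
    return 4.0 * np.einsum('ij,ijd->id', W, diff) + 2.0 * lam * U
```

The closed form agrees with a central finite-difference check of `f_obj`, which is a quick way to validate such derivations before handing the gradient to L-BFGS.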
Optimization
§ POLNA with hard constraint
Algorithm 1: Optimization for POLNA with Hard Consensus Constraint (POLNA-HC)
1: Input: (partially observable) feature matrix X, (partially observable) network G, λ
2: Output: consensus representation U*
3: Initialize: U^g_i = rand(0, 1), U^a_i = rand(0, 1), U* = 0
4: Update U* with Eq. (15) by L-BFGS
Optimization
§ POLNA with soft constraint – Alternating optimization
Algorithm 2: Alternating Optimization for POLNA with Soft Consensus Constraint (POLNA-SC)
1: Input: (partially observable) feature matrix X, (partially observable) network G, λ and α
2: Output: consensus representation U*
3: Initialize: U^g_i = rand(0, 1), U^a_i = rand(0, 1), U* = 0
4: Set t = 1
5: while not converged do
6:   Find the optimal U^g by L-BFGS with Eq. (16)
7:   if t = 1 then
8:     U*_i = U^g_{φ_g^{-1}(i)}, ∀i ∈ O_s
9:   end if
10:  Find the optimal U^a by L-BFGS with Eq. (17)
11:  U*_i = (U^g_{φ_g^{-1}(i)} + U^a_{φ_a^{-1}(i)}) / 2, ∀i ∈ O_s
12:  t = t + 1
13: end while
14: U*_i = U^g_{φ_g^{-1}(i)}, ∀i ∈ O_g \ O_s
15: U*_i = U^a_{φ_a^{-1}(i)}, ∀i ∈ O_a \ O_s
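The consensus-assembly steps of the algorithm (lines 11, 14, and 15) can be sketched as follows. This is our own sketch, which assumes global node ids 0..n-1; function and argument names are ours:

```python
import numpy as np

def assemble_consensus(Ug, Ua, phi_g_inv, phi_a_inv, O_g, O_a, d):
    """Build the consensus U*: average the two views for fully observable
    nodes (algorithm line 11), copy the single available view otherwise
    (lines 14-15). Ug/Ua are locally indexed (n_g x d, n_a x d) arrays."""
    O_s = O_g & O_a
    n = len(O_g | O_a)
    Ustar = np.zeros((n, d))
    for i in O_s:                     # both views observed: average them
        Ustar[i] = (Ug[phi_g_inv[i]] + Ua[phi_a_inv[i]]) / 2
    for i in O_g - O_s:               # only links observed
        Ustar[i] = Ug[phi_g_inv[i]]
    for i in O_a - O_s:               # only attributes observed
        Ustar[i] = Ua[phi_a_inv[i]]
    return Ustar
```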
Evaluation
§ Datasets
– Citeseer: citation network
– Cora: citation network
§ Evaluation metrics
– Accuracy
– NMI

Table 1: Statistics of the two datasets
  Statistics        Citeseer   Cora
  # of instances    3312       2708
  # of links        4598       5429
  # of attributes   3703       1433
  # of classes      6          7
Evaluation
§ Methods:
– NMF: Non-negative Matrix Factorization on the combined network
– Spectral: spectral clustering on the combined network
– POLNA-HC: POLNA with hard constraint
– POLNA-SC: POLNA with soft constraint
§ Combined network for baselines:
– Create a kNN network from the observable node attributes and combine it with the observable link structure
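The baselines' attribute-derived network can be sketched as a symmetric kNN graph over the attribute vectors (our own minimal sketch with Euclidean distance; the slides do not specify the distance or symmetrization details):

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric kNN adjacency from attribute vectors X (n x D): connect each
    node to its k nearest neighbors by Euclidean distance, then symmetrize."""
    n = X.shape[0]
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)          # never pick a node as its own neighbor
    A = np.zeros((n, n))
    for i in range(n):
        for j in np.argsort(d2[i])[:k]:   # k nearest neighbors of node i
            A[i, j] = A[j, i] = 1.0       # symmetrize the edge
    return A
```

The combined network for the baselines would then be this adjacency merged (e.g. summed or unioned) with the observable link structure.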
Evaluation
§ Accuracy with different observable rates
[Figure: accuracy vs. percentage of fully observable nodes on Citeseer (left) and Cora (right), comparing POLNA-SC, POLNA-HC, SNMF, and Spectral]
Evaluation
§ NMI with different observable rates
[Figure: NMI vs. percentage of fully observable nodes on Citeseer (left) and Cora (right), comparing POLNA-SC, POLNA-HC, SNMF, and Spectral]
Convergence Analysis
§ Accuracy and objective function w.r.t. the number of iterations
[Figure: accuracy (ACC) and objective-function value vs. number of iterations on Citeseer (left) and Cora (right)]
Sensitivity Analysis
§ POLNA-HC w.r.t. λ
[Figure: accuracy vs. percentage of fully observable nodes on Citeseer (left) and Cora (right) for λ ∈ {0.01, 0.05, 0.1, 0.5, 1, 5, 10}]
Sensitivity Analysis
§ POLNA-SC w.r.t. λ
[Figure: accuracy vs. percentage of fully observable nodes on Citeseer (left) and Cora (right) for λ ∈ {0.01, 0.05, 0.1, 0.5, 1, 5}]
Sensitivity Analysis
§ POLNA-SC w.r.t. α
[Figure: accuracy vs. percentage of fully observable nodes on Citeseer (left) and Cora (right) for α ∈ {0.01, 0.05, 0.1, 0.5, 1, 5, 10}]
Questions?