Clustered Embedding of Massive Social Networks

Han Hee Song (Narus Inc.), Savas‡, Tae Won Choˆ, Vacha Daveᵻ, Zhengdong Lu⁰,
Inderjit S. Dhillonᵻ, Yin Zhangᵻ, Lili Qiuᵻ

ᵻ U. of Texas at Austin    ‡ Linköping U.    ˆ AT&T Labs    ⁰ MSR Asia
Social Networks

Rise of social networks

Applications
 In business
  • Fraud detection
  • Targeted advertisement
 In information access
  • Improving search (e.g., Google Co-op, Yahoo! My Web)
  • Content recommendation (via collaborative filtering)
 In systems & security research
  • Defending against Sybil (a.k.a. multi-identity) attacks
  • Fighting spam

2
Proximity Measure

A central concept in social network analysis
 Quantifies the "closeness" or "similarity" between users in a social network
 Lies at the heart of many important applications
  • E.g., closer users are more trustworthy and provide more valuable opinions for content recommendation

Simple measures are often insufficient [LK03]
 E.g., shortest path distance
  • 6 degrees of separation → limited resolution
  • Depends only on the shortest path → not robust
 E.g., number of common neighbors
  • Only works among friends-of-friends → limited applicability

3
Path Ensemble Based Proximity Measures

Katz measure [Katz53]
 Total path count, exponentially damped by length. Katz between nodes x and y:

   K(x, y) = Σ_{k=1}^{∞} βᵏ · |paths⟨k⟩(x, y)|

 where β is the damping factor and paths⟨k⟩(x, y) is the set of length-k paths from x to y
 More paths + shorter lengths → stronger relationship

Rooted PageRank [LK03]
 Personalized PageRank rooted at the node taking the measurement
 Defines a random walk to capture the probability for two nodes to run into each other

Escape probability [TFK07]
 Gives the probability for a random walk starting from x to visit y before returning to x

By aggregating an infinite number of paths, the above measures can capture more social structure and are often more effective [LK03]

4
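A minimal sketch of the truncated Katz sum, on an illustrative toy graph (the damping factor, truncation length, and graph are assumptions; real networks need sparse or low-rank methods):

```python
import numpy as np

# Truncated Katz: K = sum_{k=1..max_len} beta^k * A^k, where (A^k)[x, y]
# counts length-k paths between x and y.
def katz(A, beta=0.1, max_len=6):
    K = np.zeros_like(A, dtype=float)
    Ak = np.eye(A.shape[0])
    for k in range(1, max_len + 1):
        Ak = Ak @ A              # A^k: path counts of length k
        K += beta**k * Ak
    return K

# Path graph 0-1-2-3: more paths and shorter lengths give a larger score
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
K = katz(A)
```

On this graph, K decreases with distance from node 0, matching the intuition that shorter paths mean a stronger relationship.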
Open Problem: Scalable Proximity Estimation

Online social networks = opportunity + challenge
 Provide rich information for social network research
 Impose significant scalability challenges
  • Massive: millions of users
  • Dynamic: users constantly join, leave, and make friends

Path ensemble based measures are expensive to compute
 Existing methods can only handle thousands of nodes [LK03]
 Frequently dismissed due to high computational cost (e.g., [SMP08])
 … despite their good performance in smaller networks [LK03]

5
Spectral Graph Embedding

Social network analysis: enabling proximity estimation in networks with millions of nodes and links

Decompose the graph for an efficient, low-rank approximation of proximity measures:

   A ≈ U D Uᵀ

 A : large, sparse adjacency matrix (m × m, e.g., m = 2 million)
 U : user feature matrix (m × r, e.g., m = 2 million, r = 100)
 D : dense core matrix (r × r, e.g., r = 100)

Why spectral graph embedding? Many functions of A can be computed efficiently
 E.g., approximations of common neighbors, Katz, rooted PageRank, etc.
 Fast computation of Aᵏ is important:

   Aᵏ ≈ (U D Uᵀ)ᵏ = U Dᵏ Uᵀ

6
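The Aᵏ identity above can be sketched on a toy random graph (sizes here are illustrative, far below the paper's millions of nodes; a full eigendecomposition stands in for the sparse solvers one would use at scale):

```python
import numpy as np

# Rank-r spectral embedding A ≈ U D U^T of a toy symmetric graph, then
# A^k ≈ U D^k U^T without ever forming the dense power A^k.
rng = np.random.default_rng(0)
m, r, k = 200, 10, 3
B = (rng.random((m, m)) < 0.05).astype(float)
A = np.triu(B, 1)
A = A + A.T                                  # symmetric adjacency matrix

w, V = np.linalg.eigh(A)                     # full eigendecomposition (toy size only)
top = np.argsort(-np.abs(w))[:r]             # keep the r dominant eigenpairs
U, D = V[:, top], np.diag(w[top])            # A ≈ U D U^T

Ak_approx = U @ np.linalg.matrix_power(D, k) @ U.T   # A^k ≈ U D^k U^T
Ak_exact = np.linalg.matrix_power(A, k)
rel_err = np.linalg.norm(Ak_approx - Ak_exact) / np.linalg.norm(Ak_exact)
```

Because the retained eigenpairs are raised to the k-th power, the dominant structure of Aᵏ is captured from the r-dimensional core alone.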
Limitations of Spectral Graph Embedding

 Not all matrices have a good low-rank approximation
  • E.g., rooted PageRank uses a normalized A
 Small dimensions are not enough to capture high-rank measures
 Processing overhead becomes too expensive

[Figure: A (m × m) ≈ U D Uᵀ]

7
Scalable Graph Embedding

My goal
 Scalable graph embedding for high-rank matrices
  Many existing proximity measures / adjacency graphs are complex
  • E.g., rooted PageRank

Requirements
 Scalable graph embedding for large and complex matrices
 Low processing overhead in CPU time and memory usage
 Accurate approximation of proximity measures

My idea
 Combine spectral graph embedding with clustering
 Preserve both inter- and intra-cluster information

8
Background: Graph Clustering

Clustering
 Partition the set of vertices into disjoint clusters
 Good clustering => more intra-cluster links, fewer inter-cluster links

Benefits
 Parallel: all the computation can be performed independently on each block of nodes
 The diagonal blocks are smaller than the original graph

Example: reordering nodes by cluster (C1 = {n1, n3}, C2 = {n2, n4}) makes the adjacency matrix block-diagonal:

      n1 n2 n3 n4              n1 n3 n2 n4
 n1    1  0  1  0         n1    1  1  0  0
 n2    0  1  0  1    =>   n3    1  1  0  0
 n3    1  0  1  0         n2    0  0  1  1
 n4    0  1  0  1         n4    0  0  1  1

9
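The reordering in the toy example above can be checked directly (cluster assignment taken from the slide's example):

```python
import numpy as np

# Permuting rows and columns by cluster (C1 = {n1, n3}, C2 = {n2, n4})
# turns the adjacency matrix block-diagonal.
A = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]])
perm = [0, 2, 1, 3]                      # new order: n1, n3, n2, n4
A_perm = A[np.ix_(perm, perm)]           # permute rows and columns together
```

`A_perm` has two 2×2 blocks on the diagonal and zeros elsewhere, so each block can be processed independently.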
Clustered Spectral Graph Embedding

 #1 Combine spectral graph embedding with clustering
 #2 Decompose each cluster Aᵢ into Vᵢ and Dᵢ:  Aᵢ ≈ Vᵢ Dᵢ Vᵢᵀ
 #3 Build the block-diagonal V (m × cr) from the clustered Vᵢ
 #4 Approximate A using the block-diagonal V and core D:  A ≈ V D Vᵀ
 #5 Compute D (cr × cr) using the block-diagonal V and the clustered A

 The memory usage of V is m × r, since only the diagonal blocks are stored

[Figure: A (m × m, with blocks A1, A12, A21, A2) ≈ V D Vᵀ, where V = diag(V1, V2) is block-diagonal of size m × cr and D is a dense core of size cr × cr]

10
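A minimal sketch of steps #1-#5 on a toy graph, assuming the cluster assignment is given and taking D = Vᵀ A V as the dense core (toy sizes; at scale each cluster is decomposed independently and in parallel):

```python
import numpy as np

# Per-cluster rank-r eigendecomposition, block-diagonal V, dense core D.
rng = np.random.default_rng(1)
m, r = 100, 5
B = (rng.random((m, m)) < 0.08).astype(float)
A = np.triu(B, 1)
A = A + A.T                              # symmetric toy adjacency matrix
clusters = [np.arange(0, 50), np.arange(50, 100)]   # assumed 2-way partition

V = np.zeros((m, r * len(clusters)))     # block-diagonal V, m x cr
for c, idx in enumerate(clusters):
    Ai = A[np.ix_(idx, idx)]             # intra-cluster block A_i
    w, Q = np.linalg.eigh(Ai)
    top = np.argsort(-np.abs(w))[:r]
    V[idx, c * r:(c + 1) * r] = Q[:, top]   # V_i: cluster feature matrix

D = V.T @ A @ V                          # cr x cr core; off-diagonal blocks
                                         # capture the inter-cluster links
A_approx = V @ D @ V.T                   # CSGE approximation of A
```

Because V's columns are orthonormal, the intra-cluster structure lands on D's diagonal blocks while the inter-cluster links A12, A21 fill its off-diagonal blocks, which is how CSGE preserves both.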
Memory Requirement Comparison

 CSGE: A (m × m) ≈ V D Vᵀ, with block-diagonal V (m × cr) and core D (cr × cr)
 SGE:  A (m × m) ≈ U D Uᵀ, with dense U (m × r) and core D (r × r)

 V : uses the same amount of memory (m × r) as SGE, since only the diagonal blocks are stored
 D : SGE (r × r) vs. CSGE (cr × cr)

11
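The trade-off can be made concrete with back-of-the-envelope arithmetic using the slide's example sizes (8-byte doubles are an assumption):

```python
# V costs the same in both methods; CSGE pays extra only for the larger core D.
m, r, c = 2_000_000, 100, 17         # nodes, rank per cluster, clusters

v_bytes = m * r * 8                  # block-diagonal V stores m x r entries
d_sge_bytes = r * r * 8              # SGE core: r x r
d_csge_bytes = (c * r) ** 2 * 8      # CSGE core: cr x cr
```

V dominates at roughly 1.6 GB in both methods, while the CSGE core grows from SGE's ~80 KB to ~23 MB, still negligible next to V.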
Datasets

 Dataset       No. nodes     No. links
 Flickr        2.0 Million   41.3 Million
 LiveJournal   1.8 Million   83.7 Million
 MySpace       2.1 Million   90.3 Million

 Dataset       No. clusters  % intra links  % inter links
 Flickr        18            71.8%          28.2%
 LiveJournal   17            72.5%          27.5%
 MySpace       17            51.9%          48.1%

12
Accuracy of Proximity Metric Estimation (LiveJournal)

[Figure: CDFs of normalized absolute error for Katz (regular A) and rooted PageRank (normalized A)]

 Same memory usage for SGE and CSGE
  • SGE: 100 dimensions
  • CSGE: 100 dimensions/cluster, 17 clusters

13
Preparation Time of Proximity Estimation (Flickr)

                                A          Normalized A
 Spectral Graph Embedding       19.9 min   2821 min
 Clustered Spectral Embedding   15.7 min   332 min
   CSGE - clustering            1.4 min    1.4 min
   CSGE - eigen decompose       12.0 min   327 min
   CSGE - D core                2.3 min    3.1 min

 CSGE performs nearly an order of magnitude faster than SGE on the normalized A
 Preparation time of CSGE is fast, and can be made even faster by parallelization

14
Query Time of Proximity Estimation

 Query time depends on the size of the core matrix D:
  CSGE (cr × cr) vs. SGE/PE (r × r)
 CSGE computation time can be reduced by using a smaller D or a low-rank approximation of D

                               Flickr      LiveJournal  MySpace
 Directly computed Katz        8,040 ms    14,790 ms    16,655 ms
 Katz - Spectral Graph Embed.  0.045 ms    0.036 ms     0.036 ms
 Katz - CSGE                   2.76 ms     2.04 ms      2.65 ms

15
Applications of CSGE

Proximity Estimation
 Enables approximation of non-low-rank proximity measures

Link Prediction (friendship recommendation)
 Predicting future friendships among OSN users
 Accuracy improvement of up to 20% over SGE

Missing Link Inference (compressive sensing)
 Inferring unobserved links
 Several-fold reduction in the false positive rate compared to SGE

16
Supervised Learning in Link Prediction

The best measure varies significantly across networks [Song09]

Supervised learning learns the optimal parameters of a proximity metric for future link creation (using CSGE)

Understanding link prediction
 Helps expand the social neighborhood by discovering new users with similar interests
 Basis for comparing network evolution models
 High-quality recommendations of potential friends
 Personalized search
 Targeted advertisement

17
Supervised Learning of Model Parameters

[Figure: timeline of graph snapshots A_{t-1}, A_t, A_{t+1} (past, present, future) with new links Δ_t, Δ_{t+1}]

Goal
 Predict new links at time t+1 using past snapshots of the network (snapshots at t and t-1)

Link prediction
 Training: train our model using the past and present snapshots
 Prediction: make predictions for the future snapshot using the trained model

18
Supervised Learning

 #1 Set up matrices to train the model
    A_t: adjacency matrix at time t; Δ_t: new edges created at time t
 #2 Use clustered spectral graph embedding to capture snapshots
    A_{t-1} ≈ U_{t-1} D_{t-1} U_{t-1}ᵀ
    U_{t-1}: user feature matrix from the embedding; D_{t-1}: dense core matrix
 #3 Decompose the dense core matrix
    D_{t-1} = Q_{t-1} Λ_{t-1} Q_{t-1}ᵀ
    Λ_{t-1}: diagonal matrix of eigenvalues; Q_{t-1}: factor matrix of the dense core
 #4 Approximate the new-link matrix with a concise model
    Δ_t ≈ U_{t-1} W_{t-1} U_{t-1}ᵀ
    W_{t-1}: model parameter matrix
 #5 Train the model parameters
    Δ_t ≈ U_{t-1} Q_{t-1} S Q_{t-1}ᵀ U_{t-1}ᵀ
    S: diagonal matrix of parameters, optimized for Δ_t
 #6 Make predictions using the most recent graph embedding
    Δ_{t+1} ≈ U_t Q_t S Q_tᵀ U_tᵀ
    U_t: user matrix from the adjacency matrix at time t; Q_t: factor matrix of its dense core; S: trained diagonal matrix

 Training uses (A_{t-1}, Δ_t); testing applies the trained S to A_t to predict Δ_{t+1}

19
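The training step (#5) can be sketched as a least-squares fit of the diagonal S. With P = U_{t-1} Q_{t-1} having orthonormal columns, each diagonal entry has the closed form s_i = p_iᵀ Δ_t p_i (the data and sizes below are synthetic and illustrative):

```python
import numpy as np

# Learn a diagonal S minimizing ||Delta - P S P^T||_F, where P stands in
# for U_{t-1} Q_{t-1} with orthonormal columns.
rng = np.random.default_rng(2)
m, d = 60, 4
P = np.linalg.qr(rng.standard_normal((m, d)))[0]   # orthonormal m x d factor
s_true = np.array([3.0, -1.5, 0.5, 2.0])
Delta = P @ np.diag(s_true) @ P.T                  # synthetic new-link matrix

# Setting the gradient w.r.t. each s_i to zero gives s_i = p_i^T Delta p_i
s_hat = np.array([P[:, i] @ Delta @ P[:, i] for i in range(d)])
```

In this noiseless synthetic setting the fit recovers the generating parameters exactly; on real snapshots Δ_t is sparse and noisy, so S captures the best diagonal reweighting of the spectral factors.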
Accuracy of Supervised Link Predictors

 Flickr dataset with a 2-hop scenario
  Consider only node pairs connected within 2 hops for testing

 Learning parameters yields consistently better performance
 Supervised learning is up to 20% more accurate

20
Conclusion

Clustered Spectral Graph Embedding (CSGE) -- a new method for dimensionality reduction
 Embeds a massive original graph into a smaller, dense graph
 Does not require the underlying graph to have a good low-rank approximation
 Improves both computational efficiency and accuracy

Applications to social network analysis
 Proximity estimation
  • Computational efficiency: up to an order of magnitude improvement in processing time and memory usage
  • Accuracy: several orders of magnitude improvement for graphs with high intrinsic dimensionality
 Link prediction
  • Developed supervised learning for automatically learning optimal parameter configurations
  • Consistently achieves the best performance across different graphs

21
References

[TGJ+02] H. Tangmunarunkit, et al., "Network topology generators: Degree-based vs. structural", Proc. ACM SIGCOMM, 2002.
[Katz53] L. Katz, "A new status index derived from sociometric analysis", Psychometrika, 1953.
[LK03] D. Liben-Nowell and J. Kleinberg, "The link prediction problem for social networks", Proc. CIKM, 2003.
[MGD06] A. Mislove, K. Gummadi, and P. Druschel, "Exploiting social networks for Internet search", Proc. HotNets-V, 2006.
[MKGDB08] A. Mislove, H. S. Koppula, K. Gummadi, P. Druschel, and B. Bhattacharjee, "Growth of the Flickr social network", Proc. SIGCOMM Workshop on Social Networks, 2008.
[TFK07] H. Tong, C. Faloutsos, and Y. Koren, "Fast direction-aware proximity for graph mining", Proc. KDD, 2007.
[SMP08] P. Sarkar, A. Moore, and A. Prakash, "Fast incremental proximity search in large graphs", Proc. ICML, 2008.
[CKC04] D. B. Chua, E. D. Kolaczyk, and M. Crovella, "Efficient monitoring of end-to-end network properties", Proc. INFOCOM, 2004.
[GKS76] G. H. Golub, V. Klema, and G. W. Stewart, "Rank degeneracy and least squares problems", TR CS-TR-76-559, Stanford University, 1976.

24