Clustered Embedding of Massive Social Networks

Han Hee Song (Narus Inc.), [email protected]
Berkant Savas‡, Tae Won Choˆ, Vacha Daveᵻ, Zhengdong Lu⁰, Inderjit S. Dhillonᵻ, Yin Zhangᵻ, Lili Qiuᵻ

ᵻU. of Texas at Austin  ‡Linköping U.  ˆAT&T Labs  ⁰MSR Asia

Social Networks

Rise of social networks

Applications

In business
• Fraud detection
• Targeted advertisement

In information access
• Improving search (e.g., Google Co-op, Yahoo! My Web)
• Content recommendation (via collaborative filtering)

In systems & security research
• Defending against Sybil (a.k.a. multi-identity) attacks
• Fighting spam

2

Proximity Measure

A central concept in social network analysis

Quantifies the "closeness" or "similarity" between users in a social network

Lies at the heart of many important applications
• E.g., closer users are more trustworthy and provide more valuable opinions for content recommendation

Simple measures are often insufficient [LK03]
E.g., shortest path distance
• 6 degrees of separation → limited resolution
• Depends only on the shortest path → not robust
E.g., number of common neighbors
• Only works among friends-of-friends → limited applicability

3

Path Ensemble Based Proximity Measures

Katz measure [Katz53]
Total path count, exponentially damped by length. The Katz measure between nodes x and y is

  Katz(x, y) = Σ_{k=1}^{∞} β^k · |paths⟨k⟩_{x,y}|

where β is the damping factor and |paths⟨k⟩_{x,y}| is the number of length-k paths between x and y.
More paths + shorter length → stronger relationship

Rooted PageRank [LK03]
Personalized PageRank rooted at the node taking the measurement
Defines a random walk to capture the probability for two nodes to run into each other

Escape probability [TFK07]
Gives the probability for a random walk starting from x to visit y before returning to x

By aggregating an infinite number of paths, the above measures can capture more social structure and are often more effective [LK03]

4
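As a concrete illustration, the damped path-count sum can be computed directly from powers of the adjacency matrix. A minimal sketch (the name `katz_scores` and the truncation at `max_len` are ours, not from the talk); the true measure sums to infinity and needs β < 1/λ_max(A) to converge:

```python
import numpy as np

def katz_scores(A, beta=0.1, max_len=10):
    """Truncated Katz matrix: sum_{k=1..max_len} beta^k * A^k.

    (A^k)[x, y] counts the length-k paths from x to y, so each term adds
    path counts damped exponentially by length.
    """
    n = A.shape[0]
    K = np.zeros((n, n))
    bAk = np.eye(n)              # running value of beta^k * A^k
    for _ in range(max_len):
        bAk = beta * (bAk @ A)
        K += bAk
    return K

# Tiny path graph 0-1-2: the direct neighbor pair (0,1) scores higher than
# the 2-hop pair (0,2), matching "more paths + shorter length => stronger".
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
K = katz_scores(A, beta=0.1, max_len=10)
```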

Open Problem: Scalable Proximity Estimation

Online social networks = opportunity + challenge
Provide rich information for social network research
Impose significant scalability challenges
• Massive: millions of users
• Dynamic: users constantly join, leave, and make friends

Path ensemble based measures are expensive to compute
Existing methods can only handle thousands of nodes [LK03]
Frequently dismissed due to high computational cost (e.g., [SMP08]) … despite their good performance in smaller networks [LK03]

5

Spectral Graph Embedding

Social network analysis: enabling proximity estimation in networks with millions of nodes and links

Decompose the graph for efficient, low-rank approximation of proximity measures
A: large and sparse adjacency matrix (m × m, e.g., m = 2 million)
U: user feature matrix (m × r, e.g., m = 2 million, r = 100)
D: dense core matrix (r × r, e.g., r = 100)

  A_{m×m} ≈ U D Uᵀ,  so  Aᵏ ≈ (U D Uᵀ)ᵏ = U Dᵏ Uᵀ

Why spectral graph embedding? Many functions of A can be computed efficiently
E.g., approximation of common neighbors, Katz, rooted PageRank, etc.
Fast computation of Aᵏ is important

6
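The decomposition above can be sketched in a few lines of numpy (illustrative, not the paper's implementation; we assume a symmetric A and rank eigenpairs by magnitude):

```python
import numpy as np

def spectral_embed(A, r):
    """Rank-r spectral embedding of a symmetric adjacency matrix: A ≈ U D U^T."""
    w, V = np.linalg.eigh(A)             # A symmetric: A = V diag(w) V^T
    top = np.argsort(-np.abs(w))[:r]     # keep the r dominant eigenvalues
    return V[:, top], np.diag(w[top])

# Random symmetric 0/1 adjacency matrix as a stand-in for a social graph.
rng = np.random.default_rng(0)
B = (rng.random((50, 50)) < 0.1).astype(float)
A = np.triu(B, 1) + np.triu(B, 1).T

U, D = spectral_embed(A, r=10)

# Fast A^k: one power of the tiny r x r core instead of repeated m x m products.
A3_approx = U @ np.linalg.matrix_power(D, 3) @ U.T
```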

Limitations of Spectral Graph Embedding

Not all matrices have a low-rank approximation
• E.g., rooted PageRank uses a normalized A
Small dimensions are not enough to capture high-rank measures
Processing overhead becomes too expensive

7


Scalable Graph Embedding

My goal
Scalable graph embedding for high-rank matrices
Many existing proximity measures / adjacency graphs are complex
• E.g., rooted PageRank

Requirements
Scalable graph embedding for large and complex matrices
Low processing overhead in CPU time and memory usage
Accurate approximation of proximity measures

My idea
Combine spectral graph embedding with clustering
Preserve both inter- and intra-cluster information

8

Background: Graph Clustering

Clustering
Partition the set of vertices into disjoint clusters
Good clustering ⇒ more intra-cluster links, fewer inter-cluster links

Benefits
Parallel: all the computation can be performed independently on each block of nodes
The diagonal blocks are smaller than the original graph

9

Before clustering (node order n1, n2, n3, n4):

      n1  n2  n3  n4
  n1   1   0   1   0
  n2   0   1   0   1
  n3   1   0   1   0
  n4   0   1   0   1

After clustering (node order n1, n3, n2, n4), the matrix becomes block diagonal with clusters C1 = {n1, n3} and C2 = {n2, n4}:

      n1  n3  n2  n4
  n1   1   1   0   0
  n3   1   1   0   0
  n2   0   0   1   1
  n4   0   0   1   1
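The reordering above is just a symmetric permutation of rows and columns; a tiny numpy sketch:

```python
import numpy as np

# Checkerboard adjacency matrix over (n1, n2, n3, n4); permuting to the
# cluster order (n1, n3, n2, n4) exposes the two dense diagonal blocks.
A = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1]])
perm = [0, 2, 1, 3]                    # node order n1, n3, n2, n4
A_perm = A[np.ix_(perm, perm)]         # block diag: C1 = {n1,n3}, C2 = {n2,n4}
```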

Clustered Spectral Graph Embedding

#1 Combine spectral graph embedding with clustering
#2 Decompose each cluster Aᵢ into Vᵢ and Dᵢ
#3 Build a block diagonal V from the clustered Vᵢ
#4 Approximate A using the block diagonal V and core D
#5 Compute D using the block diagonal V and the clustered A

The memory usage of V is m × r

10

[Diagram: A_{m×m} = [A₁ A₁₂; A₂₁ A₂]. Each diagonal block is embedded separately, A₁ ≈ V₁ D₁ V₁ᵀ; the Vᵢ are stacked into a block diagonal V_{m×cr}; the dense core D_{cr×cr} then captures the inter-cluster blocks as well, giving A ≈ V D Vᵀ.]
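The five steps above can be sketched as follows (an illustrative toy version, not the authors' code: dense numpy, full-rank per-cluster decompositions, and a densely stored V):

```python
import numpy as np

def csge(A, clusters, r):
    """Toy clustered spectral graph embedding.

    Eigendecompose each diagonal block A_i separately (step #2), stack the
    per-cluster bases into a block diagonal V (step #3), then compute the
    dense core D = V^T A V (step #5), which also captures the inter-cluster
    blocks A_ij. The result approximates A ≈ V D V^T (step #4).
    """
    bases = []
    for idx in clusters:
        w, Q = np.linalg.eigh(A[np.ix_(idx, idx)])
        top = np.argsort(-np.abs(w))[:r]
        bases.append((idx, Q[:, top]))
    V = np.zeros((A.shape[0], r * len(clusters)))
    for c, (idx, Vi) in enumerate(bases):
        V[np.ix_(idx, np.arange(c * r, (c + 1) * r))] = Vi
    D = V.T @ A @ V
    return V, D

# Small symmetric matrix with two 3-node clusters; with r equal to the
# cluster size the embedding is exact, A = V D V^T.
rng = np.random.default_rng(1)
A = rng.random((6, 6))
A = (A + A.T) / 2
V, D = csge(A, [np.arange(3), np.arange(3, 6)], r=3)
```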

Memory Requirement Comparison

CSGE vs. Spectral Graph Embedding

[Diagram: SGE: A_{m×m} ≈ V_{m×r} D_{r×r} Vᵀ; CSGE: A_{m×m} ≈ V_{m×cr} D_{cr×cr} Vᵀ with block diagonal V]

V: both use the same amount of memory (m × r nonzeros)
D: SGE (r × r) vs. CSGE (cr × cr)

11

Datasets

  Dataset      No. nodes    No. links
  Flickr       2.0 million  41.3 million
  LiveJournal  1.8 million  83.7 million
  MySpace      2.1 million  90.3 million

Clustering statistics

  Dataset      No. clusters  % intra links  % inter links
  Flickr       18            71.8%          28.2%
  LiveJournal  17            72.5%          27.5%
  MySpace      17            51.9%          48.1%

12

Accuracy of Proximity Metric Estimation (LiveJournal)

[Figure: CDFs of normalized absolute error for Katz (regular A) and rooted PageRank (normalized A)]

Same memory usage for SGE and CSGE
• SGE: 100 dimensions
• CSGE: 100 dimensions/cluster, 17 clusters

13

Preparation Time of Proximity Estimation (Flickr)

Preparation time of CSGE is low, and can be reduced further by parallelization

                                  A         Normalized A
  Spectral Graph Embedding        19.9 min  2821 min
  Clustered Spectral Embedding    15.7 min  332 min
    clustering                    1.4 min   1.4 min
    eigendecomposition            12.0 min  327 min
    D core                        2.3 min   3.1 min

CSGE performs nearly an order of magnitude faster than SGE

14

Query Time of Proximity Estimation

Query time depends on the size of the core matrix D: CSGE (cr × cr) vs. SGE/PE (r × r)
CSGE computation time can be reduced by using a smaller D or a low-rank approximation of D

                            Flickr     LiveJournal  MySpace
  Directly computed Katz    8,040 ms   14,790 ms    16,655 ms
  Katz via SGE              0.045 ms   0.036 ms     0.036 ms
  Katz via CSGE             2.76 ms    2.04 ms      2.65 ms

15
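Why embedded Katz queries are this fast (an illustrative sketch, not the paper's code): with A ≈ U D Uᵀ and orthonormal U, the damped sum collapses to Σ_{k≥1} βᵏAᵏ ≈ U((I − βD)⁻¹ − I)Uᵀ, so one small inverse is precomputed and each pair query touches only two length-r rows of U.

```python
import numpy as np

def katz_query_matrix(D, beta):
    # M = sum_{k>=1} (beta D)^k = (I - beta D)^{-1} - I
    # (converges when beta * spectral_radius(D) < 1)
    I = np.eye(D.shape[0])
    return np.linalg.inv(I - beta * D) - I

def katz_query(U, M, x, y):
    # Katz(x, y) ≈ u_x^T M u_y: cost depends on r, not on the graph size m.
    return U[x] @ M @ U[y]

# A full-rank embedding of a tiny graph makes the approximation exact, so it
# can be sanity-checked against the closed form inv(I - beta A) - I.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
w, U = np.linalg.eigh(A)            # A = U diag(w) U^T exactly here
M = katz_query_matrix(np.diag(w), beta=0.1)
```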

Applications of CSGE

Proximity estimation
Enables approximation of proximity measures that are not low-rank

Link prediction (friendship recommendation)
Predicting future friendships among OSN users
Accuracy improvement of up to 20% over SGE

Missing link inference (compressive sensing)
Inferring unobserved links
Several-fold reduction in the false positive rate compared to SGE

16

Supervised Learning in Link Prediction

The best measure varies significantly across networks [Song09]

Supervised learning finds the optimal parameters of a proximity metric for predicting future link creation (using CSGE)

Understanding link prediction
Helps expand one's social neighborhood by discovering new users with similar interests
Basis for comparing network evolution models
High-quality recommendations of potential friends
Personalized search
Targeted advertisement

17

Supervised Learning of Model Parameters

Goal
Predict new links at time t+1 using past snapshots of the network (snapshots at t-1 and t)

Link prediction
Training: train our model using past and present snapshots
Prediction: make predictions for the future snapshot using the trained model

[Timeline: graph snapshots A_{t-1} and A_t (past and present); new links ∆_t and ∆_{t+1} (future)]

18

Supervised Learning

#1 Set up the matrices to train the model
A_t: adjacency matrix at time t; ∆_t: new edges created at time t
#2 Use clustered spectral graph embedding to capture snapshots
A_{t-1} ≈ U_{t-1} D_{t-1} U_{t-1}ᵀ, with user feature matrix U_{t-1} and dense core D_{t-1}
#3 Decompose the dense core matrix
D_{t-1} = Q_{t-1} Λ_{t-1} Q_{t-1}ᵀ, with Λ_{t-1} the diagonal matrix of eigenvalues and Q_{t-1} the factor matrix of the dense core
#4 Approximate the new-link matrix with a concise model
∆_t ≈ U_{t-1} W_{t-1} U_{t-1}ᵀ, with model parameter matrix W_{t-1}
#5 Train the model parameters
W_{t-1} = Q_{t-1} S Q_{t-1}ᵀ, where S is a diagonal matrix optimized for ∆_t
#6 Make predictions using the most recent graph embedding
∆_{t+1} ≈ U_t Q_t S Q_tᵀ U_tᵀ: the prediction of new links at time t+1, using U_t and Q_t from the adjacency matrix at time t and the trained S

19
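Training the diagonal S can be sketched as a simple least-squares fit (our simplification, not the authors' training code): when the combined features Z = U_{t-1} Q_{t-1} have orthonormal columns, minimizing ‖∆_t − Z S Zᵀ‖_F over diagonal S decouples into s_i = z_iᵀ ∆_t z_i.

```python
import numpy as np

def fit_diagonal_S(Z, Delta):
    # With orthonormal columns z_i, the Frobenius-norm objective separates
    # per dimension, giving the closed form s_i = z_i^T Delta z_i.
    return np.diag(np.einsum('mi,mn,ni->i', Z, Delta, Z))

def predict_links(Z, S):
    # Scores for candidate future links (prediction uses the newer Z = U_t Q_t).
    return Z @ S @ Z.T

# Synthetic check: recover a known diagonal S from a noiseless Delta.
rng = np.random.default_rng(2)
Z, _ = np.linalg.qr(rng.random((8, 3)))   # stand-in for U_{t-1} Q_{t-1}
S_true = np.diag([2.0, -1.0, 0.5])
Delta = Z @ S_true @ Z.T                  # synthetic "new links" matrix
S_fit = fit_diagonal_S(Z, Delta)
```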

Accuracy of Supervised Link Predictors

Flickr dataset with 2-hop scenario: consider only node pairs connected within 2 hops for testing

Learning the parameters yields consistently better performance; supervised learning is up to 20% more accurate

20

Conclusion

Clustered Spectral Graph Embedding (CSGE): a new method for dimensionality reduction
Embeds a massive original graph into a smaller, dense graph
Does not require the underlying graph to have a good low-rank approximation
Improves both computational efficiency and accuracy

Applications to social network analysis
Proximity estimation
• Computational efficiency: up to an order of magnitude improvement in processing time and memory usage
• Accuracy: several orders of magnitude improvement for graphs with high intrinsic dimensionality
Link prediction
• Developed supervised learning for automatic learning of optimal parameter configurations
• Consistently achieves the best performance across different graphs

21

Thank you!

Backup Slides

References

[TGJ+02] H. Tangmunarunkit et al., "Network topology generators: Degree-based vs. structural", Proc. ACM SIGCOMM, 2002.
[Katz53] L. Katz, "A new status index derived from sociometric analysis", Psychometrika, 1953.
[LK03] D. Liben-Nowell and J. Kleinberg, "The link prediction problem for social networks", Proc. CIKM, 2003.
[MGD06] A. Mislove, K. Gummadi, and P. Druschel, "Exploiting social networks for Internet search", Proc. HotNets-V, 2006.
[MKGDB08] A. Mislove, H. S. Koppula, K. Gummadi, P. Druschel, and B. Bhattacharjee, "Growth of the Flickr social network", Proc. SIGCOMM Workshop on Social Networks, 2008.
[TFK07] H. Tong, C. Faloutsos, and Y. Koren, "Fast direction-aware proximity for graph mining", Proc. KDD, 2007.
[SMP08] P. Sarkar, A. Moore, and A. Prakash, "Fast incremental proximity search in large graphs", Proc. ICML, 2008.
[CKC04] D. B. Chua, E. D. Kolaczyk, and M. Crovella, "Efficient monitoring of end-to-end network properties", Proc. INFOCOM, 2004.
[GKS76] G. H. Golub, V. Klema, and G. W. Stewart, "Rank degeneracy and least squares problems", Technical Report CS-TR-76-559, Stanford University, 1976.

24

Future Work

Analysis of social networks
Community detection
Leverage friendship information in content recommendation

25