Proximity on Graphs - Carnegie Mellon School of Computer...
Transcript of Proximity on Graphs - Carnegie Mellon School of Computer...
H. Tong 15-826
Guest-lecture 1
SCS CMU
Proximity on Graphs-Definitions, Fast Solutions, and Applications
Hanghang [email protected]
SCS CMU
-----
Graphs are everywhere!
Internet Map [Koren 2009] Food Web [2007] Terrorist Network
2
p [ ] [ ]
Protein Network [Salthe 2004]
Social Network [Newman 2005]
Web Graph
[Krebs 2002]
SCS CMU
Graph Mining: Big Picture
John
Smith
TomJones
Alan
Peter
Adam
+ Graph Level-Patterns-Laws-Generators
+ Subgraph Level- Community
Adam
Dan
Jack
Alice
Amy Anna
Beck
Tom
Cell PhoneMessageE-Mail
Community+ Node Level
-Association-Correlation-Causality-Proximity
Anna
We are here! 826 guest lecture 3
H. Tong 15-826
Guest-lecture 2
SCS CMU
Proximity on Graph: What?
a.k.a Relevance, Closeness, ‘Similarity’…826 guest lecture 4
SCS CMU
Proximity on Graphs: Why?
• Link prediction [Liben-Nowell+], [Tong+]• Ranking [Haveliwala], [Chakrabarti+]• Email Management [Minkov+]• Image caption [Pan+]g p [ ]• Neighborhooh Formulation [Sun+]• Conn. subgraph [Faloutsos+], [Tong+], [Koren+]• Pattern match [Tong+]• Collaborative Filtering [Fouss+]• Many more…
Will return to this later826 guest lecture 5
SCS CMU
0.2
0.25
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
Link Prediction
Prox. Hist. for a set of deleted links
density
density
Prox (i j)+Prox (j i)
Prox. is effective to ‘deleted’ and absent edges!
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090
0.05
0.1
0.15
Prox (i j)+Prox (j i)
Q: How to predict the existence of the link?A: Proximity! [Liben-Nowell + 2003]
Prox. Hist. for a set of absent links
826 guest lecture 6
H. Tong 15-826
Guest-lecture 3
SCS CMU Neighborhood Search on graphs
… …
…
…
Conference Author
A: Proximity! [Sun+ ICDM2005]
Q: what is most related conference to ICDM?
826 guest lecture 7
SCS CMU
Image
Region Automatic Image Caption
Test Image
Sea Sun Sky Wave Cat Forest Tiger Grass
Keyword
Q: How to assign keywords to the test image?A: Proximity! [Pan+ 2004]
826 guest lecture 8
SCS CMU
A A
Center-Piece Subgraph Discovery
Input: original graph Output: CePS
CePS Node
3-9
B C B C
Q: How to find hub for the black nodes?A: Proximity! [Tong+ KDD 2006]
H. Tong 15-826
Guest-lecture 4
SCS CMU
OutputInput
Data Graph
Query Graph
Best-Effort Pattern Match
Matching Subgraph
Q: How to find matching subgraph?A: Proximity![Tong+ KDD 2007 b]826 guest lecture
SCS CMU
Outline
• Motivation• Definitions• Fast Solutions
• Basic: RWR
• Variants
• Properties
• Applications• Conclusion
• Generalizations
826 guest lecture 11
SCS CMU
Why not shortest path?Some ``bad’’ proximities
‘pizza delivery guy’ problem
A BD1 1
‘multi-facet’ relationship
826 guest lecture 12
H. Tong 15-826
Guest-lecture 5
SCS CMU
Why not max. netflow?Some ``bad’’ proximities
No punishment on long paths
826 guest lecture 13
SCS CMU
Random Walk with Restart Node 4
Node 1Node 2Node 3Node 4Node 5Node 6
0.130.100.130.220.130.05
13
2
910
811
120.13
0.10
0.13
0.08
0.04
0.02
0.04
0.03
Node 7Node 8Node 9Node 10Node 11Node 12
0.050.080.040.030.040.02
4
5 6
7
0.13
0.05
0.05
Ranking vector More red, more relevantNearby nodes, higher scores
4r14
SCS CMU
RWR: Think of it as Wine Spill
15
1. Spill a drop of wine on cloth 2. Spread/diffuse to the neighborhood
H. Tong 15-826
Guest-lecture 6
SCS CMU
13
2
910
811
12
RWR: Wine Spill on a Graph
16
4
3
56
7
11
wine spill on cloth RWR on a graph
Query
Same Diffusion Eq.
SCS CMU
13
2
910
811
12
Random Walk with Restart
4
3
56
7
17Same Diffusion Eq.
SCS CMU
Why RWR is a good score?
W : adjacency matrix. c: damping factor
1( )Q I cW −= − =
,( , ) i jQ i j r∝
i
j
2c 3cQ c= ...W 2W 3W
all paths from ito j with length 1
all paths from ito j with length 2
all paths from ito j with length 3
826 guest lecture 18
H. Tong 15-826
Guest-lecture 7
SCS CMU
Why RWR is A Good Score?
Prox (A B) =A B
…
High proximity many, short, heavy-weighted paths
Prox (A, B) = Score (Red Path) + Score (Green Path) + Score (Blue Path) +Score (Purple Path) +
…
19
SCS CMU
Outline
• Motivation• Definitions• Fast Solutions
• Basic: RWR
• Variants
• Properties
• Applications• Conclusion
• Generalizations
826 guest lecture 20
SCS CMU
Variant: escape probability• Define Random Walk (RW) on the graph• Esc_Prob(CMU Napa)
– Prob (starting at CMU, reaches Napa before returning to CMU)
CMU Napathe remaining graph
Esc_Prob = Pr (smile before cry) 21
H. Tong 15-826
Guest-lecture 8
SCS CMU
Other Variants
• Other measure by RWs– Community Time/Hitting Time [Fouss+]– SimRank [Jeh+]
• Equivalence of Random WalksAll are “related to” or “similar to”Equivalence of Random Walks– Electric Networks:
• EC [Doyle+]; SAEC[Faloutsos+]; CFEC[Koren+] – String Systems
• Katz [Katz], [Huang+], [Scholkopf+]• Matrix-Forest-based Alg [Chobotarev+]
All are “related to” or “similar to” random walk with restart!
826 guest lecture 22
SCS CMU
Chaptering different measurements
RWR
Esc_Prob+ Sink
Hitting Time/Commute Time
RegularizedUn-constrainedQuad Opt.
relax
4 ssp decides 1 esc_prob
KatzNormalize
+ Sink Commute Time
Effective Conductance
String System
Harmonic Func.ConstrainedQuad Opt.
Mathematic Tools
X out-degree
“voltage = position”
Physical Models
SCS CMU
Outline
• Motivation• Definitions• Fast Solutions
• Basic: RWR
• Variants
• Properties
• Applications• Conclusion
• Generalizations
826 guest lecture 24
H. Tong 15-826
Guest-lecture 9
SCS CMU
Property: Monotonicity
We want: Prox ( ) Prox ( )candi origianla b a b≤→ →A: degree preserving! [Koren+ KDD06][Tong+ KDD07a][Tong+ SDM08]
826 guest lecture25
SCS CMU
Property: Asymmetry [Tong+ KDD07 a]
What is Prox from A to B?What is Prox from B to A?
What is Prox between A and B?
826 guest lecture 26
SCS CMU
Asymmetry in un-directed graphs• Hanghang’s # 1 Conf. is CIKM• The #1 author of CIKM is ...
So is love…
HanghangCIKM
826 guest lecture27
H. Tong 15-826
Guest-lecture 10
SCS CMU
Outline
• Motivation• Definitions• Fast Solutions
• Basic: RWR
• Variants
• Properties
• Applications• Conclusion
• Generalizations
826 guest lecture 28
SCS CMU
Group Proximity [Tong+ KDD07 a]
• Q: How close are Accountants to SECs?
826 guest lecture
• A: Prob (starting at any RED, reaches anyGREEN before touching any RED again)
29
SCS CMU
Proximity on Attributed Graphs [Tong+ KDD07 b]
What is the proximity from node 7 to 10?If we know that…
826 guest lecture30
H. Tong 15-826
Guest-lecture 11
SCS CMU
A: Augmented graphs [Tong+ KDD07 b]
826 guest lecture 31
SCS CMU
More on Generalizations
• Attributed on edges [Chakrabarti+ KDD 06]
• Proximity w/ Time– [Minkov+], [Tong+ SDM 2008], [Tong+ CIKM 2008]
• Proximity w/ Side Information [Tong+ 2008]
• …
826 guest lecture 32
SCS CMU
Summary of Proximity Definitions• Goal: Summarize multiple … relationship
• Solutions– Basic: Random Walk with Restart
• [Pan+ 2004][Sun+ 2006][Tong+ 2006][ ][ ][ g ]
– Properties: Asymmetry, monotonicity• [Koren+ 2006][Tong+ 2007] [Tong+ 2008]
– Variants: Esc_Prob and many others.• [Faloutsos+ 2004] [Koren+ 2006][Tong+ 2007]
– Generalizations: Group Prox, w/ Attr., w/ Time, w/ Side Information
• [Charkrabarti+ 2006][Tong+ 2007] [Tong+ 2008]826 guest lecture
33
H. Tong 15-826
Guest-lecture 12
SCS CMU
Outline
• Motivation• Definitions• Fast Solutions• Applications• Conclusion
826 guest lecture 34
SCS CMU
Outline
• Motivation• Definitions• Fast Solutions
• B_Lin: RWR
• BB_Lin: Skewed BGs
• Applications• Conclusion
826 guest lecture
• FastUpdate: Time-Evolving
35
SCS CMU
Preliminary: Sherman–Morrison Lemma
Tv
uAA =If:
1 11 1 1
1( )1
TT
T
A u v AA A u v Av A u
− −− − −
−
⋅ ⋅= + ⋅ = −
+ ⋅ ⋅
Then:
826 guest lecture 36
H. Tong 15-826
Guest-lecture 13
SCS CMU
SM: The block-form A
D
B
C1 1 1 1 1 1 1 1 1
1 1 1 1 1
( ) ( )( ) ( )
A B A A B D CA B CA A B D CA BC D D CA B CA D CA B
− − − − − − − − −
− − − − −
⎡ ⎤+ − − −⎡ ⎤= ⎢ ⎥⎢ ⎥ − − −⎢ ⎥⎣ ⎦ ⎣ ⎦
Or1 1 1 1 1 1
1 1 1 1 1
( ) ( )( ) ( )
A B A BD C A B D CA BC D D C A BD C D CA B
− − − − − −
− − − − −
⎡ ⎤− − −⎡ ⎤= ⎢ ⎥⎢ ⎥ − − −⎢ ⎥⎣ ⎦ ⎣ ⎦
And many other variants…Also known as Woodburg Identity
Or…
826 guest lecture 37
SCS CMU
SM Lemma: Applications
• RLS (Recursive least square)– and almost any algorithm in time series!
• Leave-one-out cross validation for LSR• Kalman filtering• Incremental matrix decomposition• …• … and all the fast sol.s we will introduce!
826 guest lecture 38
SCS CMU
Computing RWR
9 1012
0.13 0 1/3 1/3 1/3 0 0 0 0 0 0 0 00.10 1/3 0 1/3 0 0 0 0 1/4 0 0 0 0.13
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟
01/3 1/3 0 1/3 0 0 0 0 0 0 0 0
⎛ ⎞⎜⎜⎜⎜
0.13 00.10 00.13 0
⎛ ⎞ ⎛ ⎞⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟
Ranking vector Starting vectorAdjacency matrix
(1 )i i ir cWr c e= + −Restart p
1
43
2
5 6
7
811
120.220.130.05
0.90.050.080.040.030.040.02
⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ = ×⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
1/3 0 1/3 0 1/4 0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0
0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2
0 0 0 0 0 0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 0 0 0 0 1/3 1/3 0
⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎜⎝ ⎠
0.220.13 00.05 0
0.10.05 00.08 00.04 00.03 00.04 0
2 0
1
0.0
⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟+ ×⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
n x n n x 1n x 1
1
826 guest lecture 39
H. Tong 15-826
Guest-lecture 14
SCS CMU
0 1/3 1/3 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 0 0 0 1/4 0 0 0 01/3 1/3 0 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 1/4 0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0
000
00
1
⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟
Q: Given query i, how to solve it?
??Query
0.9= ×0 0 0 0 1/4 0 1/2 0 0 0 0 0
0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2
0 0 0 0 0
00.1
0000
0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 0 0 0 0 0 1/3 1/3 0 0
⎜ ⎟ ⎜ ⎟+ ×⎜ ⎟ ⎜ ⎟
⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
??
Adjacency matrix Starting vectorRanking vectorRanking vector
826 guest lecture 40
SCS CMU
14
32
6
9 10
8 11120.13
0.10
0.13
0 05
0.08
0.04
0.02
0.04
0.03
OntheFly: 0 1/3 1/3 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 0 0 0 1/4 0 0 0 01/3 1/3 0 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 1/4
0.9= ×
0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0
0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 0
000
00
0.1000
1
⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟
+ ×⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟
000100000
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟
0.130.100.130.220.130.050.050.080.04
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟
1
43
2
6
9 10
8 11
12
0.300.30.10.30000
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟
0.120.180.120.350.030.070.070.070
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟
0.190.090.190.180.180.040.040.060.02
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟
0.140.130.140.260.100.060.060.080.01
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟
0.160.100.160.210.150.050.050.070.02
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟
0.130.100.130.220.130.050.050.080.04
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟ 5 6
70.13
0.05
0.05
0 0 0 0 0 0 0 0 1/2 0 1/3 1/2 0 0 0 0 0
0 0 0 1/4 0 1/3 0 1/2 0
0 0 0 0 0 0 0 0 0 1/3 1/3 0 0
⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
000
⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
0.030.040.02
⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
5 6
7
000
⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
000
⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
00.020
⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
0.010.010
⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
0.010.020.01
⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
0.030.040.02
⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
No pre-computation/ light storage
Slow on-line response O(mE)
ir ir
826 guest lecture 41
SCS CMU
0.20 0.13 0.14 0.13 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.340.28 0.20 0.13 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.450.14 0.13 0.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.330.13 0.10 0.13 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.320.09 0.09 0.09 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.560.03 0.04 0.04 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟
4
PreCompute1 2 3 4 5 6 7 8 9 10 11 12r r r r r r r r r r r r
14
32
6
9 10
8 11120.13
0.10
0.13
0 05
0.08
0.04
0.02
0.04
0.03
1 32
6
9 10
8 11
12
R:0.03 0.04 0.04 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.220.08 0.11 0.04 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.130.03 0.04 0.03 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.790.04 0.04 0.04 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.800.04 0.05 0.04 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.720.02 0.03 0.02 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05
⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
5 6
70.13
0.05
0.05
5 6
7
[Haveliwala+ 2002]c x Q
Q 826 guest lecture
42
H. Tong 15-826
Guest-lecture 15
SCS CMU
2.20 1.28 1.43 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.341.28 2.02 1.28 0.96 0.64 0.53 0.53 0.85 0.60 0.48 0.53 0.451.43 1.28 2.20 1.29 0.68 0.56 0.56 0.63 0.44 0.35 0.39 0.331.29 0.96 1.29 2.06 0.95 0.78 0.78 0.61 0.43 0.34 0.38 0.320.91 0.86 0.91 1.27 2.41 1.97 1.97 1.05 0.73 0.58 0.66 0.560.37 0.35 0.37 0.52 0.98 2.06 1.37 0.43 0.30 0.24 0.27 0.22
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟
PreCompute:
14
32
6
9 10
8 11120.13
0.10
0.13
0.08
0.04
0.02
0.04
0.03
1
43
29 10
8 11
12
0.130.100.130.220.130.050 05
⎛ ⎞⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟0.37 0.35 0.37 0.52 0.98 1.37 2.06 0.43 0.30 0.24 0.27 0.22
0.84 1.14 0.84 0.82 1.05 0.86 0.86 2.13 1.49 1.19 1.33 1.130.29 0.40 0.29 0.28 0.36 0.30 0.30 0.74 1.78 1.00 0.76 0.790.35 0.48 0.35 0.34 0.44 0.36 0.36 0.89 1.50 2.45 1.54 1.800.39 0.53 0.39 0.38 0.49 0.40 0.40 1.00 1.14 1.54 2.28 1.720.22 0.30 0.22 0.21 0.28 0.22 0.22 0.56 0.79 1.20 1.14 2.05
⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
5 6
70.13
0.05
0.05
5 6
7
Fast on-line response
Heavy pre-computation/storage costO(n ) O(n )
0.050.080.040.030.040.02
⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎜ ⎟⎝ ⎠
3 2 43
SCS CMU
Q: How to Balance?
On-line Off-line
826 guest lecture 44
SCS CMU
B_Lin: Pre-Compute[Tong+ ICDM 2006]
10
Find Communities
10 1010
Compute & Store Within-Communities Scores
Q33Q11
1
4
3
2
5 6
7
9 10
811
12
5 6
7
9 10
811
12
1
4
3
2
45
1
4
3
2
5 6
7
9 10
811
12
5 6
7
9 10
811
12
1
4
3
2
Q22
Q11
H. Tong 15-826
Guest-lecture 16
SCS CMU
B_Lin: On-Line[Tong+ ICDM 2006]
13
29 10
811
120.130.10
0.13
0.08
0.04
0.02
0 04
0.03
13
2
9 10
811
12
Find Communities
9 10
811
12
1
43
2
13
2
1
43
2
4
5 6
70.13
0.05
0.05
0.044
5 6
7
Estimate the impact of the bridges
Combine1
43
2
5 6
7
9 10
811
12
5 6
7
1
43
2
5 6
7
9 10
811
12
4
46
SCS CMU
B_Lin: details
+W =
W1,1
W2,2
W2,2
+~~
W1: within community Cross community 47
W1,1
W2,2
W2,2
U
S V
SCS CMU
B_Lin: details
-1 -1
W1,1
W2,2
W2,2
U
S V
WI – c ~~ I – c – cUSVW1
Easy to be inverted LRA difference
48
H. Tong 15-826
Guest-lecture 17
SCS CMU
B_Lin: details
-1 -1
W1,1
W2,2
W2,2
U
S V
WI – c ~~ I – c – cUSVW1
Easy to be inverted LRA difference
Sherman–Morrison Lemma: If B=A-USVThen B-1 = A-1 + A-1U(S-1-VA-1U)-1VA-1
A USV
SCS CMU
B_Lin: Pre-Compute Stage• Q: Efficiently compute and store Q• A: A few small, instead of ONE BIG, inverted
matrices Proximities within communities Effect of bridges
50Footnote: Q1=(I-cW1)-1
U
VQ1,1
Q2,2
Qk,k
>Q1,1
Q2,2
Qk,k
. . .Q1=
SCS CMU
B_Lin: On-Line Stage• Q: Efficiently recover one column of Q• A: A few, instead of MANY, matrix-vector
multiplicationsriei
51
U
Vii
>
Pre-computed Query Result51
H. Tong 15-826
Guest-lecture 18
SCS CMU
Query Time vs. Pre-Compute Time
Log Query Time
•Quality: 90%+ •On-line:
Log Pre-compute Time
•Up to 150x speedup•Pre-computation:
•Two orders saving
826 guest lecture 52
SCS CMU
Outline
• Motivation• Definitions• Fast Solutions
• B_Lin: RWR
• BB_Lin: Skewed BGs
• Applications• Conclusion
826 guest lecture
• FastUpdate: Time-Evolving
53
SCS CMU
RWR on Bipartite Graph
n
authorsAuthor-Conf. Matrix
Observation: n >> m!
Examples: n
mConferences
p1. DBLP: 400k aus, 3.5k confs2. NetFlix: 2.7M usrs,18k mvs
826 guest lecture 54
H. Tong 15-826
Guest-lecture 19
SCS CMU
• Q: Given query i, how to solve it?
0 1/3 1/3 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 0 0 0 1/4 0 0 0 01/3 1/3 0 1/3 0 0 0 0 0 0 0 01/3 0 1/3 0 1/4 0 0 0 0 0 0 0
0001
⎛ ⎞ ⎛ ⎞⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟
RWR on Skewed bipartite graphs
… .. .
. .
m confs
1/3 0 1/3 0 1/4
0.9= ×
0 0 0 0 0 0 00 0 0 1/3 0 1/2 1/2 1/4 0 0 0 00 0 0 0 1/4 0 1/2 0 0 0 0 0
0 0 0 0 1/4 1/2 0 0 0 0 0 0 0 1/3 0 0 1/4 0 0 0 1/2 0 1/3 00 0 0 0 0 0 0 1/4 0 1/3 0 00 0 0 0 0 0 0 0 1/2 0 1/3 1/2
0 0 0 0 0
00
0.10000
0 0 1/4 0 1/3 0 1/2 0 0 0 0 0 0 0 0 0 0 1/3 1/3
1
0 0
⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟
+ ×⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎜ ⎟ ⎜ ⎟⎝ ⎠ ⎝ ⎠
??…. .. . . …. ...
0
0
n m
Ar
… .
. .. . …. .. . . …. ... Ac n aus
826 guest lecture 55
SCS CMU
• Step 1:
• Step 2:
BB_Lin: Pre-Computation dd[Tong+ ICDM 06]
M = AcArX
1( 0 9 )I M −Λ = ×
2-step RWR for Conferences
All Conf-Conf Prox. Scores
m conferences
• Step 2:
• Cost:
• Examples – NetFlix: 1.5hr for pre-computation; – DBLP: 1 few minutes
( 0.9 )I MΛ = − ×3( )O m m E+ ⋅
n authors826 guest lecture 56
SCS CMU
• Step 1:
• Step 2:
M = AcArX
1( 0 9 )I M −Λ = ×
2-step RWR for Conferences
All Conf-Conf Prox. Scores
m conferences
BB_Lin: Pre-Computation dd[Tong+ ICDM 06] details
• Step 2: ( 0.9 )I MΛ = − ×
n authors826 guest lecture 57
H. Tong 15-826
Guest-lecture 20
SCS CMU
• Step 1:
• Step 2:
M = Ac ArX
1( 0 9 )I M −Λ = ×
2-step RWR for Conferences
All Conf-Conf Prox. Scores
BB_Lin: Pre-Computation dd[Tong+ ICDM 06] details
• Step 2:
• Cost:
• Examples – NetFlix: 1.5hr for pre-computation; – DBLP: 1 few minutes
( 0.9 )I MΛ = − ×
3( )O m m E+ ⋅
Ac/ArE edges
m x m
826 guest lecture58
SCS CMU
BB_Lin: On-Line Stage
(Base) Case 1: - Conf - Conf
authors
Conferences
Read out !
Ac/ArE edges826 guest lecture 59
SCS CMU
BB_Lin: On-Line Stage
Case 2: - Au - Conf
authors
Conferences
Ac/ArE edges
1 matrix-vec!
826 guest lecture 60
H. Tong 15-826
Guest-lecture 21
SCS CMU
BB_Lin: On-Line Stage
Case 3: - Au - Au
authors
Conferences
Ac/ArE edges
2 matrix-vec!
826 guest lecture 61
SCS CMU
BB_Lin: Examples
Dataset Off-Line Cost On-Line CostDBLP a few minutes frac. of sec.
NetFlix 1.5 hours <0.01 sec.
400k authors x 3.5k conf.s 2.7m user
x 18k movies
826 guest lecture 62
SCS CMU
Outline
• Motivation• Definitions• Fast Solutions
• B_Lin: RWR
• BB_Lin: Skewed BGs
• Applications• Conclusion
826 guest lecture
• FastUpdate: Time-Evolving
63
H. Tong 15-826
Guest-lecture 22
SCS CMU
Challenges
• BB_Lin is good for skewed bipartite graphs– for NetFlix (2.7M nodes and 100M edges)– On-line cost for query: fraction of seconds
• w/ 1.5 hr pre-computation for m x m core matrix
• But…what if the graph is evolving over time– New edges/nodes arrive; edge weights increase…– On-line cost: 1.5hr itself becomes a part of this!
826 guest lecture 64
SCS CMU
1( 0.9 )I M −Λ = − × 3( )O m m E+ ⋅t=0
Q: How to update the core matrix?
t=11( 0.9 )I M −Λ = − ×
~ ~ 3( )O m m E+ ⋅
?826 guest lecture 65
SCS CMU
Update the core matrix• Step 1:
M = Ac ArX~ M= X+
details
• Step 2:1( 0.9 )I M −Λ = − ×
~ ~ Rank 2 update
= + X
2( )O m 3( )O m m E+ ⋅826 guest lecture 66
H. Tong 15-826
Guest-lecture 23
SCS CMU
Update : General Case
M = AcArX~
n authors
m Conferences
details
• E’ edges changed• Involves n’ authors, m’ confs.
• Observationmin( ', ') 'n m E
826 guest lecture 67
SCS CMU
• Observation: – the rank of update is small!– Real Example (DBLP Post)
• 1258 time steps
Update : General Case
min( ', ') 'n m E n authors
m Conferences
details
• 1258 time steps• E’ up to ~20,000!• min(n’,m’) <=132
• Our Algorithm2(min( ', ') ')O n m m E+
3( )O m m E+ ⋅826 guest lecture 68
SCS CMU
176xlog(Time) (Seconds)
Speed Comparison
6969
40x
Data Sets
Our
DBLP Netflix
Our
H. Tong 15-826
Guest-lecture 24
SCS CMU
More on “Fast Solutions”
• FastAllDAP– Simultaneously solve multiple linear systems– [Tong+ KDD 2007 a]
• MT3– Multiple-Resolution Analysis on Time [Tong+ CIKM 2008]
• Fast-ProSIN– On-Line response for users’ feedback [Tong+ ICDM 2008]
826 guest lecture 70
SCS CMU
Summary of Fast Solutions
• Goal: Efficiently Solve Linear System(s)
• Sols.– B_Lin: one large linear system [Tong+ ICDM06]
– BB_Lin: the intrinsic complexity is small [Tong+ ICDM06]
– FastUpdate: dynamic linear system [Tong+ SDM08]– FastAllDAP: multiple linear systems [Tong+ KDD07 a]– MT3: [Tong+ CIKM 2008]
– Fast-ProSIN: [Tong+ 2008]826 guest lecture 71
SCS CMU
Outline
• Motivation• Definitions• Fast Solutions• Applications• Conclusion
826 guest lecture 72
H. Tong 15-826
Guest-lecture 25
SCS CMU
Outline
• Motivation• Definitions• Fast Solutions
• Link Prediction & +
• Ranking Related Tasks• Applications• Conclusion
826 guest lecture
• Ranking Related Tasks
• User Specific Patterns
• Time Related Tasks
73
SCS CMU
i
i i
i
i i
Link Prediction: existence
826 guest lecture
i
i
i
i
Prox (deleted) >> Prox (absent) !Footnote: - Red pair: ``deleted’’; - Blue pair: ``absent’’
74
SCS CMU
0.2
0.25
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18 Link Prediction:existence
Prox. Hist. for a set of deleted links
density
density
Prox (i j)+Prox (j i)
Prox. is effective to ‘deleted’ and absent edges!
0 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.090
0.05
0.1
0.15
Prox (i j)+Prox (j i)
Q: How to predict the existence of the link?A: Proximity! [Liben-Nowell + 2003]
Prox. Hist. for a set of absent links
826 guest lecture 75
H. Tong 15-826
Guest-lecture 26
SCS CMU
Link Prediction: direction [Tong+ KDD 07 a]
• Q: Given the existence of the link, what is the direction of the link?
• A: Compare prox(i j) and prox(j i)>70%
i
j
i
i
i
>70%
Prox (i j) - Prox (j i)
density
826 guest lecture 76
SCS CMU
Beyond Link Prediction
• Collaborative Filtering [Fouss+]
• Name Disambiguation H. Tong
C. Faloutsos
J. Pan
P. S. Yu
…
– [Minkov+ SIGIR 06]
• Anomaly Nodes/Edges – ‘a’ is abnormal if the neighborhood of ‘a’ is so different– [Sun+ ICDM 2005]
826 guest lecture
Hanghang TongM.-J. Li
…
77
SCS CMU
Outline
• Motivation• Definitions• Fast Solutions
• Link Prediction & +
• Ranking Related Tasks• Applications• Conclusion
826 guest lecture
• Ranking Related Tasks
• User Specific Patterns
• Time Related Tasks
78
H. Tong 15-826
Guest-lecture 27
SCS CMU Neighborhood Search on graphs
… …
…
…
Conference Author
A: Proximity! [Sun+ ICDM2005]
Q: what is most related conference to ICDM?
826 guest lecture 79
SCS CMU
NF: example
826 guest lecture 80
SCS CMU
gCaP: Automatic Image Caption• Q
…
Sea Sun Sky Wave{ } { }Cat Forest Grass Tiger
{?, ?, ?,}
?A: Proximity!
[Pan+ KDD2004]
826 guest lecture 81
H. Tong 15-826
Guest-lecture 28
SCS CMU
Image
Region
Test Image
Sea Sun Sky Wave Cat Forest Tiger Grass
Image
Keyword826 guest lecture 82
SCS CMU
Image
Region
Test Image
Sea Sun Sky Wave Cat Forest Tiger Grass
Image
Keyword
{Grass, Forest, Cat, Tiger}
83
SCS CMU
C-DEM: Multi-Modal Query System for DrosophilaEmbryo Databases [Guo+ VLDB 2008]
826 guest lecture 84
H. Tong 15-826
Guest-lecture 29
SCS CMU
Outline
• Motivation• Definitions• Fast Solutions
• Link Prediction & +
• Ranking Related Tasks• Applications• Conclusion
826 guest lecture
• Ranking Related Tasks
• User Specific Patterns
• Time Related Tasks
85
SCS CMU
Center-Piece Subgraph Discovery [Tong+ KDD 06]
A
• Given: a graph W, and a query set • Find: the most central node (wrt the query
set)
Q: Who is the most central node wrt the black nodes?
(e.g., master-mind criminal, common advisor/collaborator, etc)
B C
86
SCS CMU
A A
Center-Piece Subgraph Discovery [Tong+ KDD 06]
Input: original graph Output: CePS
CePS Node
3-87
B C B C
Q: How to find hub for nodes A, B, C?Our Solution: Max (Prox(A, Red) x Prox(B, Red) x Prox(C, Red))
H. Tong 15-826
Guest-lecture 30
SCS CMU
?
CePS: Example (AND Query)
88
DBLP co-authorship network: -400,000 authors, 2,000,000 edges
Code at: http://www.cs.cmu.edu/~htong/soft.htm
SCS CMU
CePS: Example (AND Query)
10
10
1 2
1
89
DBLP co-authorship network: -400,000 authors, 2,000,000 edges
Code at: http://www.cs.cmu.edu/~htong/soft.htm
1
1 1
4 6
SCS CMU
K_SoftAnd: Relaxation of AND
Asking AND query? No Answer!
Disconnected Communities
Noise
826 guest lecture 90
H. Tong 15-826
Guest-lecture 31
SCS CMU
CePS: 2 SoftANDDB
Stat.
SCS CMU
OutputInput
Data Graph
Query Graph
Best-Effort Pattern Match
Matching Subgraph
Q: How to find matching subgraph?A: Proximity![Tong+ KDD 2007 b]
826 guest lecture
SCS CMU
G-Ray: How to?
matching nodematching node
matching node
details
matching node
Goodness = Prox (12, 4) x Prox (4, 12) xProx (7, 4) x Prox (4, 7) xProx (11, 7) x Prox (7, 11) xProx (12, 11) x Prox (11, 12)
826 guest lecture 93
H. Tong 15-826
Guest-lecture 32
SCS CMU
Effectiveness: star-query Databases
Query Result826 guest lecture
Bio-medicalIntelligent Agent
94
SCS CMU
Effectiveness: line-queryDatabases Learning Bio-medicalTheory
Query
Result826 guest lecture 95
SCS CMU
Outline
• Motivation• Definitions• Fast Solutions
• Link Prediction & +
• Ranking Related Tasks• Applications• Conclusion
826 guest lecture
• Ranking Related Tasks
• User Specific Patterns
• Time Related Tasks
96
H. Tong 15-826
Guest-lecture 33
SCS CMU
Challenge• Graphs are evolving over time!
–New nodes/edges show up; –Existing nodes/edges die out;
Ed i ht h–Edge weights change…
Q: How to generalize everything? A: Track Proximity! [Tong+ SDM 2008]
826 guest lecture 97
SCS CMU
pTrack/cTrack: Trend analysis on graph level
G.Hinton
T. SejnowskiRank of Influential-ness
M. Jordan
C. Koch
Year
826 guest lecture 98
SCS CMU
pTrack: Problem Definition
• [Given]– (1) a large, skewed time-evolving bipartite
graphs, – (2) the query nodes of interest
• [Track]– (1) top-k most related nodes for each query
node at each time step t; – (2) the proximity score (or rank of proximity)
between any two query nodes at each time step t
826 guest lecture 99
H. Tong 15-826
Guest-lecture 34
SCS CMU
pTrack: Philip S. Yu’s Top-5 conferences up to each year
ICDEICDCS
SIGMETRICSPDISVLDB
CIKMICDCSICDE
SIGMETRICSICMCS
KDDSIGMOD
ICDMCIKM
ICDCS
ICDMKDDICDESDMVLDB
1992 1997 2002 2007
DatabasesPerformanceDistributed Sys.
DatabasesData Mining
DBLP: (Au. x Conf.)- 400k aus, - 3.5k confs - 20 yrs
826 guest lecture 100
SCS CMU
KDD’s Rank wrt. VLDB over years
Prox.Rank
Year
Data Mining and Databases are getting closer & closer
826 guest lecture 101
SCS CMU
cTrack:10 most influential authors in NIPS community up to each year
T. Sejnowski
Author-paper bipartite graph from NIPS 1987-1999. 3k. 1740 papers, 2037 authors, spreading over 13 years
M. Jordan
102
H. Tong 15-826
Guest-lecture 35
SCS CMU
T3/MT3: How to understand time?
826 guest lecture 103
SCS CMU
A Motivating Example: InputsTime Event(e.g., Session) EntityOct. 26 Link Analysis Tom, Bob
Clustering Bob, AlanOct. 27 Classification Bob, Alan
Anomaly Detection Alan, BeckOct. 28 Party Beck, DanOct. 29 Web Search Dan, Jack
Advertising Jack, PeterOct. 30 Enterprise Search Jack, PeterOct. 31 Q & A Peter, Smith
SCS CMU
Time Cluster, rep. entities: b7,b6, b8A Motivating Example: Outputs
JackOct. 29
Oct. 30Oct. 30
Oct 28
Time ClusterRep. Entities:
``Jack’’, ``Peter’’, ``Smith’’
Abnormal TimeOct. 28
Oct. 26
Oct. 27
Rep. Entities:``Beck’’ ``Dan’’
Time ClusterRep. Entities:
``Tom’’, ``Bob’’,``Alan’’
105
H. Tong 15-826
Guest-lecture 36
SCS CMU
Problem Definitions: (How to Understand Time in such complex context)
• Given datasets collected at different time stamps;
• Find– Q1: Time Cluster – Q2: Abnormal Time stamp– Q3: Interpretations– Q4: Right time granularity
826 guest lecture 106
SCS CMU
T3: Basic IdeaOct. 26, 2008
Oct. 27, 2008
Link Analysis
Clustering
Classification
Anomaly Dect.
Tom
Bob
AlanTT
Oct. 28, 2008
Oct. 29, 2008
Oct. 30, 2008
Oct. 31, 2008
Party
Web Search
Advertising
En. Search
Q & A
Beck
Dan
Jack
Peter
Smith
TP
826 guest lecture107
SCS CMU
Data Sets• CIKM: from cikm proceedings
• Time: Publication years (1993-2007, 15)• Event: Paper (952)• Entities: Authors (1895) & Sessions
(279)• Attribute: Keyword (158)
826 guest lecture 108
H. Tong 15-826
Guest-lecture 37
SCS CMU
T3 on `CIKM’ Data Set Rep. Authors Rep. Keywords
James. P. CallanW. Bruce CroftJames AllanPhilip S. Yu
George KarypisCharles Clarke
WebCluster
ClassificationXML
LanguageStream
Rep. Authors Rep. KeywordsElke Rundensteiner
Daniel MirankerAndreas Henrich
Il-Yeol SongScott B Huffman
Robert J. Hall
KnowledgeSystem
UnstructuredRule
Object-orientedDeductive 109
SCS CMU
More Applications• Clustering
– Proximity as input [Ding+ KDD 2007]• Email management [Minkov+ CEAS 06].
• Business Process Management [Qu+ 2008]
• ProSINProSIN– Listen to clients’ comments [Tong+ 2008]
• Ghost Edge– Within Network Classification [Gallagher & Tong+ KDD08 b]
• …
826 guest lecture 112
SCS CMU
gCap:
pTrack/cTrack:
LP: [Liben-Nowell+][Tong+ 2007]
NF:
CePS:
G-Ray:
T3:GhostEdge:
[Gallagher & 2008
[Tong+ 2007]
[Tong+ 2006]
[Pan+ 2004]
[Pan+ 2005]
[Tong+ 2008]
FastAllDAP: [Tong+ 2007]
BB Lin: [Tong+ 2006]
FastUpdate: [Tong+ 2008]
ComputationsApplications
EfficientlySolve Linear
System(s)
MT3: [Tong+ 2008]Use Proximity as
Building block(gCap, CePS…)
Fast-ProSIN: [Tong+ 2008]
T3: & Tong+ 2008][Tong+ 2008]
RWR: [Pan+ 2004][Sun+ 2006][Tong+ 2006]
Variants: [Faloutsos+ 2004] [Koren+ 2006][Tong+ 2007]
Generalizations: [Charkrabarti+ 2006][Tong+ 2007, 2008]
Properties.: [Koren+ 2006]]Tong+ 2007, 2008]
Definitions
B_Lin: [Tong+ 2006]
BB_Lin: [Tong+ 2006]
ProximityOn
Graphs
Weighted Multiple
Relationship
System(s)
H. Tong 15-826
Guest-lecture 38
SCS CMU
Take-home messages• Proximity Definitions
– RWR– and a lot of variants
Computations• Computations– Sherman–Morrison Lemma– Fast Incremental Computation
• Applications– Proximity as a building block
826 guest lecture 114
SCS CMU
References• L. Page, S. Brin, R. Motwani, & T. Winograd. (1998), The
PageRank Citation Ranking: Bringing Order to the Web, Technical report, Stanford Library.
• T.H. Haveliwala. (2002) Topic-Sensitive PageRank. In WWW, 517-526, 2002
• J Y Pan H J Yang C Faloutsos & P Duygulu (2004) Automatic• J.Y. Pan, H.J. Yang, C. Faloutsos & P. Duygulu. (2004) Automatic multimedia cross-modal correlation discovery. In KDD, 653-658, 2004.
• C. Faloutsos, K. S. McCurley & A. Tomkins. (2002) Fast discovery of connection subgraphs. In KDD, 118-127, 2004.
• J. Sun, H. Qu, D. Chakrabarti & C. Faloutsos. (2005) Neighborhood Formation and Anomaly Detection in Bipartite Graphs. In ICDM, 418-425, 2005.
• W. Cohen. (2007) Graph Walks and Graphical Models. Draft.826 guest lecture 115
SCS CMU
References• P. Doyle & J. Snell. (1984) Random walks and electric
networks, volume 22. Mathematical Association America, New York.
• Y. Koren, S. C. North, and C. Volinsky. (2006) Measuring and extracting proximity in networks. In KDD, 245–255, 2006.
• A Agarwal S Chakrabarti & S Aggarwal (2006) Learning to• A. Agarwal, S. Chakrabarti & S. Aggarwal. (2006) Learning to rank networked entities. In KDD, 14-23, 2006.
• S. Chakrabarti. (2007) Dynamic personalized pagerank in entity-relation graphs. In WWW, 571-580, 2007.
• F. Fouss, A. Pirotte, J.-M. Renders, & M. Saerens. (2007) Random-Walk Computation of Similarities between Nodes of a Graph with Application to Collaborative Recommendation. IEEE Trans. Knowl. Data Eng. 19(3), 355-369 2007.
826 guest lecture 116
H. Tong 15-826
Guest-lecture 39
SCS CMU
References• H. Tong & C. Faloutsos. (2006) Center-piece subgraphs:
problem definition and fast solutions. In KDD, 404-413, 2006.• H. Tong, C. Faloutsos, & J.Y. Pan. (2006) Fast Random Walk
with Restart and Its Applications. In ICDM, 613-622, 2006.• H. Tong, Y. Koren, & C. Faloutsos. (2007) Fast direction-
aware proximity for graph mining In KDD 747 756 2007aware proximity for graph mining. In KDD, 747-756, 2007.• H. Tong, B. Gallagher, C. Faloutsos, & T. Eliassi-Rad. (2007)
Fast best-effort pattern matching in large attributed graphs. In KDD, 737-746, 2007.
• H. Tong, S. Papadimitriou, P.S. Yu & C. Faloutsos. (2008) Proximity Tracking on Time-Evolving Bipartite Graphs. to appear in SDM 2008.
826 guest lecture 117
SCS CMU
References• B. Gallagher, H. Tong, T. Eliassi-Rad, C. Faloutsos. Using
Ghost Edges for Classification in Sparsely Labeled Networks. KDD 2008
• H. Tong, Y. Sakurai, T. Eliassi-Rad, and C. Faloutsos. Fast Mining of Complex Time-Stamped Events CIKM 08
• H Tong H Qu and H Jamjoom Measuring Proximity on• H. Tong, H. Qu, and H. Jamjoom. Measuring Proximity on Graphs with Side Information. ICDM 2008
826 guest lecture 118