NEIGHBORHOOD FORMATION AND ANOMALY DETECTION IN BIPARTITE GRAPHS
Jimeng Sun, Huiming Qu,
Deepayan Chakrabarti & Christos Faloutsos
Presented by Bhavana Dalvi
OUTLINE
• Motivation
• Problem definition
• Neighborhood formation
• Anomaly detection
• Experiments
• Related work
• Conclusion and future work
BIPARTITE GRAPHS AND INTERESTING QUESTIONS
Author-paper graph: authors form one node set, papers the other; ‘a’ is an author.
Which authors are most related to ‘a’?
[Figure: the other authors annotated with their relevance scores to ‘a’ (0.8, 0.6, 0.2, 0.4); author b carries the highest score, 0.8]
Which of the papers written by ‘a’ is the uncommon one?
P2P network: users on one side, files on the other
Which users have similar preferences to a particular user?
Which files are downloaded by users with very different preferences?
Jimeng Sun’s presentation at ICDM 2005
PROBLEM DEFINITION
Given a bipartite graph with node sets V1 and V2, edge set E, and a query node q in V1:
• Neighborhood formation (NF): Input: query node q in V1. Output: relevance scores of all the nodes in V1 to q.
• Anomaly detection (AD): Input: query node q in V1. Output: normality scores for the nodes in V2 that link to q.
NEIGHBORHOOD FORMATION
Relevance(b, q) grows with the number of short paths from q to b.
A connection that links only b and q contributes more relevance than a connection that links b, q, and other nodes.
EXACT NF ALGORITHM : RANDOM WALK WITH RESTART
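Input: a graph G and a query node q
Output: relevance scores of all nodes to q
1. Construct the transition matrix, where:
   • every node in the graph becomes a state;
   • every state has a restart probability c of jumping back to the query node q;
   • otherwise (probability 1 − c) the walk moves along an edge according to the transition probabilities.
2. Find the steady-state probability vector u; its entries are the relevance scores of all the nodes to q.
[Figure: every node has a restart edge with probability c back to q]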
Jimeng Sun’s presentation at ICDM 2005
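The RWR steps above can be sketched with NumPy power iteration. This is a minimal sketch, assuming a dense, symmetric weighted adjacency matrix; the function name `random_walk_with_restart`, the default restart probability c = 0.15, and the toy graph are illustrative assumptions, not values from the slides. The usage part also checks the earlier relevance claim: a co-author who shares an exclusive paper with q scores higher than one who shares a crowded paper.

```python
import numpy as np


def random_walk_with_restart(adj, q, c=0.15, tol=1e-10, max_iter=1000):
    """Relevance of every node to query node q, by random walk with restart.

    adj: (N, N) nonnegative, symmetric weighted adjacency matrix.
    q:   index of the query node.
    c:   restart probability (0.15 is an assumed default, not from the slides).
    """
    adj = np.asarray(adj, dtype=float)
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0      # guard against division by zero for isolated nodes
    P = adj / col_sums                 # column-normalized transition matrix
    restart = np.zeros(adj.shape[0])
    restart[q] = 1.0                   # all restart mass goes back to q
    u = restart.copy()
    for _ in range(max_iter):
        # with probability 1 - c follow an edge, with probability c jump back to q
        u_next = (1 - c) * (P @ u) + c * restart
        if np.abs(u_next - u).sum() < tol:
            return u_next              # (approximate) steady state
        u = u_next
    return u


# Hypothetical author-paper graph: q and b wrote paper p1 alone,
# while q and x1..x4 wrote the crowded paper p2.
papers = {"p1": ["q", "b"], "p2": ["q", "x1", "x2", "x3", "x4"]}
names = ["q", "b", "x1", "x2", "x3", "x4", "p1", "p2"]
idx = {name: i for i, name in enumerate(names)}
adj = np.zeros((len(names), len(names)))
for paper, authors in papers.items():
    for author in authors:
        adj[idx[author], idx[paper]] = adj[idx[paper], idx[author]] = 1.0

u = random_walk_with_restart(adj, idx["q"])
print(u[idx["b"]] > u[idx["x1"]])   # True: the exclusive co-author is more relevant
```

The degree normalization in `P` is what makes the exclusive paper count for more: the walker leaving p1 has only two choices, while leaving p2 it has five.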
FINDING STEADY STATE PROBABILITIES
|V1| = k, |V2| = n
• M: k × n matrix representing the weighted graph G
• P_A = col_norm(M_A): the column-normalized adjacency matrix M_A of the whole graph
• q_A: the query node ‘a’ encoded as a (k + n) × 1 vector whose a-th entry is 1 and whose other entries are 0
• u_A: steady-state probability vector, with restart probability c
Exploiting the bipartite structure: when k ≪ n, the savings are significant.
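Written out from the definitions above (a reconstruction, assuming M has no all-zero rows or columns), the steady-state equation and its split into a V1 part u_1 and a V2 part u_2 are:

```latex
\begin{aligned}
M_A &= \begin{bmatrix} 0 & M \\ M^{\top} & 0 \end{bmatrix},
\qquad P_A = \mathrm{col\_norm}(M_A) = \begin{bmatrix} 0 & \mathrm{col\_norm}(M) \\ \mathrm{col\_norm}(M^{\top}) & 0 \end{bmatrix} \\
u_A &= (1-c)\,P_A\,u_A + c\,q_A, \qquad u_A = \begin{bmatrix} u_1 \\ u_2 \end{bmatrix}, \quad q_A = \begin{bmatrix} q_1 \\ 0 \end{bmatrix} \\
u_2 &= (1-c)\,\mathrm{col\_norm}(M^{\top})\,u_1 \\
u_1 &= (1-c)^2\,\mathrm{col\_norm}(M)\,\mathrm{col\_norm}(M^{\top})\,u_1 + c\,q_1
\end{aligned}
```

Because the query node lies in V1, the V2 part of q_A is zero, so u_1 satisfies a k × k system and u_2 follows from a single matrix-vector product. When k ≪ n this is far cheaper than iterating with the (k + n) × (k + n) matrix P_A.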
EXTENSIONS TO NF ALGORITHM
• Parallel NF: if there are multiple queries, the computations can be done in parallel.
• Approximate NF:
  1. Cluster the nodes into k partitions (preprocessing).
  2. Given query node q, find the partition G_i it belongs to.
  3. Run the exact NF algorithm only on G_i.
  4. Set relevance = 0 for nodes not in G_i.
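Both extensions can be sketched on top of the k × k bipartite reduction. In this minimal NumPy sketch, the names `col_norm`, `bipartite_nf`, and `approximate_nf` and the default c = 0.15 are illustrative assumptions. The steady state is found with a direct linear solve instead of iteration; several queries are answered together by stacking their restart vectors as columns of one right-hand side (one simple form of parallelism); and the approximate version restricts the computation to q's partition. The partition labels are assumed to come from a preprocessing partitioner; here they are hypothetical inputs.

```python
import numpy as np


def col_norm(X):
    """Divide each column by its sum; all-zero columns stay zero."""
    X = np.asarray(X, dtype=float)
    sums = X.sum(axis=0)
    sums[sums == 0] = 1.0
    return X / sums


def bipartite_nf(M, queries, c=0.15):
    """Exact NF relevance of every V1 node for each query node in V1.

    M: (k, n) weighted matrix, rows = V1 nodes, columns = V2 nodes.
    queries: V1 indices; column j of the result belongs to queries[j].
    c: restart probability (assumed default).
    """
    M = np.asarray(M, dtype=float)
    k = M.shape[0]
    # Two-step walk V1 -> V2 -> V1: the k x k matrix of the reduced equation
    S = (1 - c) ** 2 * (col_norm(M) @ col_norm(M.T))
    Q = np.zeros((k, len(queries)))
    Q[list(queries), np.arange(len(queries))] = 1.0   # one restart vector per query
    # Solve (I - S) U = c Q for all query columns at once
    return np.linalg.solve(np.eye(k) - S, c * Q)


def approximate_nf(M, q, v1_part, v2_part, c=0.15):
    """Approximate NF: exact NF on q's partition only; relevance 0 elsewhere.

    v1_part / v2_part: partition label of every V1 / V2 node (hypothetical
    output of a preprocessing partitioner).
    """
    M = np.asarray(M, dtype=float)
    v1_part, v2_part = np.asarray(v1_part), np.asarray(v2_part)
    rows = np.flatnonzero(v1_part == v1_part[q])
    cols = np.flatnonzero(v2_part == v1_part[q])
    local_q = int(np.flatnonzero(rows == q)[0])      # position of q inside G_i
    scores = np.zeros(M.shape[0])
    scores[rows] = bipartite_nf(M[np.ix_(rows, cols)], [local_q], c)[:, 0]
    return scores


# Hypothetical toy graph: 3 nodes in V1, 4 in V2; two queries answered together
M = np.array([[1, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 0, 1, 1]], dtype=float)
U = bipartite_nf(M, [0, 2])
print(U.round(3))
print(approximate_nf(M, 0, [0, 0, 1], [0, 0, 0, 1]).round(3))
```

The direct solve is only sensible for small k; for large graphs the same k × k system would normally be iterated instead.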
ANOMALY DETECTION
A node x in V2 is normal if the nodes in V1 that link to x are in the same neighbourhood.
[Figure: two example bipartite graphs (V1, V2) around a node x: low normality (left) and high normality (right)]
ANOMALY DETECTION ALGORITHM
Input: node t in V2, bipartite transition matrix P
Output: normality score ns(t)
1. Set S_t = neighbours of t in V1.
2. Compute RS_t, the pairwise relevance scores of the nodes in S_t.
3. Normality score ns(t) = function(RS_t), e.g. the mean over the non-diagonal elements of RS_t.
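A minimal NumPy sketch of these three steps, assuming dense matrices in which every node has at least one edge. The names `relevance_matrix` and `normality_score`, the default c = 0.15, and the NaN returned for nodes with fewer than two neighbours are illustrative assumptions; the sketch takes the weight matrix M rather than P and uses the slide's example function (mean over non-diagonal elements). It computes all pairwise relevance scores at once with the bipartite reduction, which is fine for a toy graph but costs O(k³) in general.

```python
import numpy as np


def relevance_matrix(M, c=0.15):
    """R[i, j] = relevance of V1 node i when V1 node j is the query.

    M: (k, n) weighted bipartite matrix, rows = V1, columns = V2; every node is
    assumed to have at least one edge. c: restart probability (assumed default).
    """
    M = np.asarray(M, dtype=float)
    A = M / M.sum(axis=0)              # col_norm(M):   V2 -> V1 transitions
    B = M.T / M.T.sum(axis=0)          # col_norm(M^T): V1 -> V2 transitions
    k = M.shape[0]
    S = (1 - c) ** 2 * (A @ B)         # two-step walk V1 -> V2 -> V1
    # Column j solves the reduced RWR equation with the restart at node j
    return np.linalg.solve(np.eye(k) - S, c * np.eye(k))


def normality_score(M, t, R=None):
    """Normality of V2 node t: mean off-diagonal relevance among its V1 neighbours."""
    if R is None:
        R = relevance_matrix(M)
    S_t = np.flatnonzero(np.asarray(M)[:, t])     # step 1: neighbours of t in V1
    if len(S_t) < 2:
        return float("nan")                       # no pairs to compare (assumed convention)
    RS_t = R[np.ix_(S_t, S_t)]                    # step 2: pairwise relevance scores
    off_diagonal = ~np.eye(len(S_t), dtype=bool)
    return float(RS_t[off_diagonal].mean())       # step 3: mean over non-diagonal elements


# Hypothetical toy graph: authors {0, 1, 2} and {3, 4, 5} form two groups;
# papers 0-5 stay inside one group, paper 6 links author 2 with author 3.
papers = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
M = np.zeros((6, len(papers)))
for p, (a1, a2) in enumerate(papers):
    M[a1, p] = M[a2, p] = 1.0
R = relevance_matrix(M)
scores = [normality_score(M, t, R) for t in range(len(papers))]
print(np.round(scores, 3))   # paper 6, which crosses the two groups, scores lowest
```

Passing a precomputed `R` lets all V2 nodes that link to the same query be scored without repeating the linear solve.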
DATASETS
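| Dataset | Nodes in V1 | Nodes in V2 | Edges | Avg. degree (V1) | Avg. degree (V2) |
|---|---|---|---|---|---|
| Conference-Author (CA) | 2687 | 288K | 662K | 510 | 5 |
| Author-Paper (AP) | 316K | 472K | 1M | 3 | 2 |
| IMDB | 553K | 204K | 2.2M | 4 | 11 |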
DO THE NEIGHBORHOODS MAKE SENSE?
[Figure: relevance score (y-axis) of the most relevant neighbors (x-axis) for example query nodes]
The nodes (x-axis) with the highest relevance scores (y-axis) are indeed very relevant to the query node.
HOW ACCURATE IS THE APPROXIMATE NF?
[Figure: precision vs. number of partitions (neighborhood size = 20); precision vs. neighborhood size (number of partitions = 10)]
Precision = fraction of the top-k neighbors found by exact NF that also appear among the top-k neighbors found by approximate NF (ApprNF)
• Precision drops slowly as the number of partitions increases.
• Precision remains high across a wide range of neighborhood sizes.
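The precision measure can be computed as below. This is a minimal sketch in which `top_k` and `precision_at_k` are hypothetical helpers, the scores are made-up numbers, and excluding the query node from its own neighborhood is an assumption the slide does not state.

```python
import numpy as np


def top_k(scores, k, exclude=None):
    """Indices of the k highest scores, optionally skipping the query node itself."""
    order = np.argsort(-np.asarray(scores, dtype=float), kind="stable")
    if exclude is not None:
        order = order[order != exclude]
    return order[:k]


def precision_at_k(exact_scores, approx_scores, k, query=None):
    """Fraction of exact-NF top-k neighbours that approximate NF also ranks in its top k."""
    exact = set(top_k(exact_scores, k, query).tolist())
    approx = set(top_k(approx_scores, k, query).tolist())
    return len(exact & approx) / k


# Made-up relevance scores for 6 nodes; node 0 is the query
exact = np.array([0.30, 0.20, 0.15, 0.10, 0.05, 0.00])
approx = np.array([0.32, 0.25, 0.00, 0.12, 0.06, 0.00])   # node 2 fell outside q's partition
print(precision_at_k(exact, approx, k=3, query=0))        # → 0.6666666666666666
```

Here exact NF ranks nodes 1, 2, 3 highest while ApprNF ranks 1, 3, 4, so two of the three neighbors overlap.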
DO THE ANOMALIES MAKE SENSE?
[Figure: average normality score (y-axis)]
Injection:
• Inject 100 nodes into V2, each connecting to k nodes in V1, where k = the average degree of the nodes in V2.
• The nodes in V1 are picked randomly such that their degree = 10 × the average degree of nodes in V1.
• Assumption: the injected nodes will induce connections across neighbourhoods, so they should receive low normality scores.
WHAT ABOUT THE COMPUTATIONAL COST?
Computational cost drops significantly even with a small increase in the number of partitions.
RELATED WORK
• Random walks on graphs: PageRank [ISDN 1998], Topic-Sensitive PageRank [WWW 2002]
• Outlier detection: outlier detection in high-dimensional data, Aggarwal and Yu [SIGMOD 2001]; Outlier Detection Using Random Walks [ICTAI 2006]; finding outlier clusters
• Graph partitioning: the METIS package, spectral clustering methods; neighbourhoods can become personalized clusters
CONCLUSIONS AND FUTURE WORK
• Solutions to two problems on bipartite graphs: Neighborhood Formation (NF) and Anomaly Detection (AD).
• Random walk with restart, combined with graph partitioning, can be used to solve NF efficiently.
• AD can be done using the relevance scores generated by NF.
• Experiments on real datasets show good results.
• Proximity Tracking on Time-Evolving Graphs (SIAM 2008 paper): defines proximity scores in a dynamic setting, with efficient incremental updates.
THANK YOU