Scalable Anomaly Ranking of Attributed Neighborhoods
SDM, May 5th, 2016
Bryan Perozzi, Leman Akoglu
Stony Brook University
SDM 2016: Best Paper Runner-up Award!
Bryan Perozzi Scalable Anomaly Ranking of Attributed Neighborhoods
What’s an Anomaly, Anyhow?
Given an attributed subgraph, how to quantify its quality?
Structure Only
Internal Measures: Average Degree
Boundary: Cut Edges
Internal + Boundary: Conductance
Structure + Attributes
SODA [Gupta+, 14]
Attributed Weighted Normalized Cut [Günnemann+, 13]
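The structure-only measures named on this slide can be made concrete with a small sketch. The function name `structural_scores` and the edge-list input format are assumptions for illustration, and conventions for these measures vary slightly between papers; here average degree counts internal edges only, and conductance divides the cut size by the smaller of the inside and outside volumes.

```python
def structural_scores(edges, members):
    """Return (average_degree, cut_edges, conductance) for a node set.

    edges   -- list of (u, v) pairs of a simple undirected graph
    members -- iterable of nodes forming the subgraph (assumed non-empty)
    """
    members = set(members)
    internal = 0        # edges with both endpoints inside the subgraph
    cut = 0             # edges with exactly one endpoint inside
    outside_volume = 0  # sum of degrees of nodes outside the subgraph
    for u, v in edges:
        u_in, v_in = u in members, v in members
        if u_in and v_in:
            internal += 1
        elif u_in or v_in:
            cut += 1
            outside_volume += 1
        else:
            outside_volume += 2
    inside_volume = 2 * internal + cut  # sum of degrees of member nodes
    average_degree = 2 * internal / len(members)
    smaller_side = min(inside_volume, outside_volume)
    conductance = cut / smaller_side if smaller_side else 0.0
    return average_degree, cut, conductance


# A triangle {0, 1, 2} attached to a short tail 2-3-4.
example_edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
avg_deg, cut_edges, phi = structural_scores(example_edges, {0, 1, 2})
```

None of these three numbers looks at node attributes, which is the gap the Structure + Attributes measures address.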
Outline
Problem of Anomaly Ranking
Metric: Normality
Optimizing Normality
Experimental Results
Understanding Graphs with Normality
Normality (intuition)
Given an attributed subgraph, how to quantify quality?
Internal: structural density AND attribute coherence (neighborhood “focus”)
Boundary: structural sparsity, OR external separation (“exoneration”)
[Figure: example neighborhoods labeled high and low]
Normality (intuition)
Motivation:
no good cuts in real-world graphs [Leskovec+ ‘08]
social circles overlap [McAuley+ ‘14]
“Exoneration”: by (a) null model, (b) attributes
(a) hub effect: edges expected, not surprising
(b) neighborhood overlap: separable by different “focus”
The measure of Normality
[Equation (1): normality, annotated in two parts]
internal consistency: null model; dot-product, or Kronecker’s δ; “focus” vector (e.g. chess, biking)
external separability
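To make the two parts of the measure tangible, here is a minimal sketch of a normality-style score. It is one plausible instantiation and not necessarily the exact formula on the slide: it assumes a modularity-style null model (expected edges k_i k_j / 2m), a dot-product similarity weighted by the “focus” vector w, internal terms summed over unordered pairs of members, and a penalty on each boundary edge scaled by how surprising that edge is. The names `similarity` and `normality_sketch` are hypothetical.

```python
def similarity(x_i, x_j, w):
    """Focus-weighted dot product: sum over attributes f of w[f] * x_i[f] * x_j[f]."""
    return sum(wf * a * b for wf, a, b in zip(w, x_i, x_j))


def normality_sketch(edges, attrs, members, w):
    """Internal consistency minus boundary penalty (assumed form, see lead-in).

    edges   -- list of (u, v) pairs of a simple undirected graph, no duplicates
    attrs   -- dict mapping node -> list of attribute values
    members -- nodes of the neighborhood being scored
    w       -- nonnegative "focus" weights, one per attribute
    """
    members = set(members)
    degree = {}
    edge_set = set()
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
        edge_set.add(frozenset((u, v)))
    two_m = 2 * len(edges)

    # Internal: reward edges that exceed the null-model expectation
    # between members that are similar on the focus attributes.
    internal = 0.0
    nodes = sorted(members)
    for pos, i in enumerate(nodes):
        for j in nodes[pos + 1:]:
            a_ij = 1.0 if frozenset((i, j)) in edge_set else 0.0
            expected = degree.get(i, 0) * degree.get(j, 0) / two_m
            internal += (a_ij - expected) * similarity(attrs[i], attrs[j], w)

    # External: a boundary edge is "exonerated" if the null model expects it
    # (hub effect) or if its endpoints differ on the focus attributes.
    external = 0.0
    for u, v in edges:
        if (u in members) != (v in members):
            surprise = 1.0 - min(1.0, degree[u] * degree[v] / two_m)
            external -= surprise * similarity(attrs[u], attrs[v], w)
    return internal + external


# Attributes are [chess, biking]; the triangle {0, 1, 2} plays chess.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4)]
attrs = {0: [1, 0], 1: [1, 0], 2: [1, 0], 3: [0, 1], 4: [0, 1]}
score_chess = normality_sketch(edges, attrs, {0, 1, 2}, w=[1, 0])
score_biking = normality_sketch(edges, attrs, {0, 1, 2}, w=[0, 1])
```

In this toy graph the chess focus scores higher than the biking focus, and the boundary edge 2-3 costs nothing because nodes 2 and 3 share no focus attribute.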
Anomaly Mining of Entity Neighborhoods (AMEN)
Given a community, can we find the weights which maximize its normality?
[Equations (2) and (1); annotation: “latent”]
Optimizing Normality
[Equations (2), (1), (3)]
L1 norm: one attribute f with largest x
L2 norm: all f with positive x
Normality becomes [equation]
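The two cases above have simple closed forms if one assumes the objective is linear in the weights, that is, normality = w · x for a per-attribute score vector x, with w nonnegative and of unit norm. That linear form and the reading of the two cases as unit L1 and unit L2 constraints are assumptions of this sketch, and the function names are hypothetical.

```python
import math


def dot(w, x):
    """Plain dot product of two equal-length sequences."""
    return sum(a * b for a, b in zip(w, x))


def best_weights_l1(x):
    """Nonnegative, unit-L1 w maximizing w . x: all weight on the largest entry."""
    best = max(range(len(x)), key=lambda f: x[f])
    return [1.0 if f == best else 0.0 for f in range(len(x))]


def best_weights_l2(x):
    """Nonnegative, unit-L2 w maximizing w . x: proportional to the positive part of x."""
    positive = [max(v, 0.0) for v in x]
    norm = math.sqrt(sum(v * v for v in positive))
    if norm == 0.0:
        # No positive entry: the optimum sits on the single largest entry.
        return best_weights_l1(x)
    return [v / norm for v in positive]


x = [3.0, -1.0, 4.0]       # illustrative per-attribute scores
w1 = best_weights_l1(x)    # selects only the third attribute
w2 = best_weights_l2(x)    # spreads weight over the two positive attributes
value_l1 = dot(w1, x)
value_l2 = dot(w2, x)
```

Under the L2 choice the optimal value equals the Euclidean norm of the positive part of x, so the score can be computed without materializing w.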
Size Invariant Scoring
So far: Normality of a community grows with size
Bad for comparisons
Need to “Normalize” normality for ranking
Illustrative examples
split-radix FFT
telescopic op-amps
telescopic cascode
multidecade
… …
reciprocal split
reserve
… …
Example neighborhoods
Synthetic Anomaly Detection
Normality vs Conductance, DBLP
Normality vs Conductance, Google+
Normality Distribution
Feature distribution, DBLP
Feature distribution, LastFM
Thanks! Any questions?
Bryan Perozzi
Papers, Code, Contact Info:
www.perozzi.net