Posted on 17-May-2020
SpotLight: Detecting Anomalies in Streaming Graphs
Dhivya Eswaran, Christos Faloutsos, Sudipto Guha, Nina Mishra
Carnegie Mellon University, Amazon
This work was performed at Amazon.
ESWARAN, FALOUTSOS, GUHA & MISHRA
SPOTLIGHT: DETECTING ANOMALIES IN STREAMING GRAPHS
KDD 2018
Graphs are being created everywhere
INTRODUCTION
[Figure: a message thread between You and Alice, 6 Jun 2018, 1:34am]
Many other settings…
INTRODUCTION
‣ IM/e-mail networks
‣ Computer networks
‣ Transportation networks
‣ Edit networks
As a sequence of graph snapshots
INTRODUCTION
[Figure: graph snapshots over time (Monday AM, Monday PM, Tuesday AM, Tuesday PM, Wednesday AM), grouped into MORNINGS and NIGHTS]
But sometimes unusual events happen
INTRODUCTION
[Figure: snapshots labeled Normal, Tax scam, and Network failure]
Unusual events in other settings
INTRODUCTION
‣ Computer networks (e.g., port scans, denial-of-service)
‣ Transportation networks (events/weather) [Figure: a stadium]
How do we detect such anomalies in streaming graphs?
INTRODUCTION
How do we even characterize these anomalies?
Anomalies tend to involve…
INSIGHT
sudden (dis)appearance of large dense directed subgraph
sudden (dis)appearance of large dense directed subgraph
INSIGHT
[Figure: edges directed from a group of sources to a group of destinations]
INSIGHT
sudden (dis)appearance of large dense directed subgraph
[Figure: sources and destinations; "large" means many nodes, "dense" means many, many edges]
INSIGHT
sudden (dis)appearance of large dense directed subgraph
[Figure: an initial and a final graph; is the change sudden, or a steady evolution?]
INSIGHT
[Figure: examples of appearance and of disappearance]
sudden (dis)appearance of large dense directed subgraph
PROBLEM
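[Figure: a stream of graphs over time, each marked "Ok!" except one marked "anomaly!"]
GIVEN
A stream of graphs under the following STREAMING MODEL:
• (Un)directed weighted edges
• Time-evolving node set
• Known node correspondence
FIND
Anomalous graphs in the stream, under the following ALGORITHMIC CONSTRAINTS:
• Real-time and fast detection
• Bounded working memory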
Overview of SpotLight
ALGORITHM
[Figure: graphs G1, G2, G3, G4 pass through Graph Sketching to give sketch vectors v(G1), v(G2), v(G3), v(G4); Anomaly Detection on these vectors flags v(G3) as an anomaly]
Many off-the-shelf methods for anomaly detection:
‣ Robust Random Cut Forests [Guha, Mishra, Roy & Schrijvers; ICML 2016]
‣ Light-weight Online Detector of Anomalies [Pevny; ML 2016]
‣ Randomized Space Forests [Wu, Zhang, Fan, Edwards & Yu; ICDM 2014]
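Any of the detectors above consumes the sketch vectors one at a time and returns an anomaly score. As a placeholder only, the sketch below uses a much simpler scorer than RRCF, Loda or RS-Forest: the Euclidean distance from each new vector to the mean of the last few vectors. The class `SlidingWindowScorer` and its `window` parameter are hypothetical names, not part of any of the cited methods.

```python
import math
from collections import deque


class SlidingWindowScorer:
    """Hypothetical stand-in detector (not RRCF, Loda or RS-Forest).

    Scores each incoming sketch vector by its Euclidean distance to the mean of
    the previous `window` vectors; memory stays bounded by the deque's maxlen.
    """

    def __init__(self, window: int = 24):
        self.history = deque(maxlen=window)

    def score(self, v) -> float:
        if self.history:
            n = len(self.history)
            mean = [sum(col) / n for col in zip(*self.history)]
            s = math.dist(v, mean)
        else:
            s = 0.0  # no history yet, so nothing to compare against
        self.history.append(tuple(v))
        return s


scorer = SlidingWindowScorer(window=3)
stream = [(0, 2, 3), (1, 0, 2), (0, 2, 2), (40, 90, 75)]  # made-up sketch vectors
scores = [scorer.score(v) for v in stream]
```

The last vector, a burst in every sampled subgraph, scores far higher than the others; a real deployment would plug in one of the detectors cited above instead.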
SpotLight randomized graph sketching
ALGORITHM
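[Figure: three random source/destination subsets of a graph; the sketch records the edge weight falling in each: 0, 100 and 20]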
THREE PARAMETERS:
‣ Probability of sampling a source, ‘p’
‣ Probability of sampling a destination, ‘q’
‣ Number of sketching dimensions, ‘K’
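The three parameters can be illustrated with a minimal sketch, assuming that each of the K dimensions keeps every source independently with probability p and every destination with probability q, and records the total weight of edges running from kept sources to kept destinations. The helper names `in_sample` and `spotlight_sketch` and the BLAKE2 hashing trick are our own choices: hashing, rather than fresh random draws, makes every snapshot use the same subsets, so sketches of different graphs are comparable.

```python
import hashlib


def in_sample(node, dim: int, side: str, prob: float, seed: int = 0) -> bool:
    """Hypothetical sampler: keep `node` in dimension `dim` with probability `prob`.

    A 64-bit BLAKE2b hash of (seed, side, dim, node) acts as a uniform draw, so the
    same node gets the same decision in every graph snapshot.
    """
    key = f"{seed}|{side}|{dim}|{node}".encode()
    h = int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")
    return h < prob * 2**64


def spotlight_sketch(edges, K: int, p: float, q: float, seed: int = 0) -> list:
    """Return a K-dimensional sketch of one graph given as (src, dst, weight) edges.

    Entry k is the total weight of edges whose source is kept on the source side
    of dimension k and whose destination is kept on its destination side.
    """
    v = [0.0] * K
    for src, dst, w in edges:
        for k in range(K):
            if in_sample(src, k, "S", p, seed) and in_sample(dst, k, "D", q, seed):
                v[k] += w
    return v


g = [("a", "b", 1.0), ("b", "c", 2.0), ("a", "c", 5.0)]
sketch = spotlight_sketch(g, K=3, p=0.5, q=0.5)
# With K = p = q = 1 every node is kept, so the single entry is the total edge weight.
total = spotlight_sketch(g, K=1, p=1.0, q=1.0)
```

The K = p = q = 1 case is the "Edge Weight" baseline named in the DARPA experiments; smaller p and q focus each dimension on a smaller random subgraph.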
SpotLight at work on a stream
ALGORITHM
STREAMING ANOMALY DETECTOR
Hashes: hS1, hS2, hS3: src → {1, ..., 1/p} and hD1, hD2, hD3: dst → {1, ..., 1/q}
[Figure: SpotLight with K = 3 processing an edge stream; the streaming anomaly detector plots anomaly score against time]
‣ 5pm: the sketch starts at (0, 0, 0).
‣ Edge a → b with weight 1 arrives: source a is hashed by hS1, hS2, hS3 and destination b by hD1, hD2, hD3. Only the third dimension samples both endpoints, so the sketch becomes (0, 0, 1).
‣ Edge b → c with weight 2 arrives: the second and third dimensions sample both endpoints, so the sketch becomes (0, 2, 3).
‣ 6pm: the 5-6pm sketch (0, 2, 3) goes to the streaming anomaly detector, which plots a score for 5-6pm, and the sketch is reset to (0, 0, 0).
‣ More edges arrive before 7pm, giving the sketch (1, 0, 2). At 7pm the 6-7pm sketch goes to the detector, and the sketch is reset to (0, 0, 0).
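This walkthrough can be mimicked in a few lines. It is a sketch under stated assumptions: hS_k and hD_k are replaced by a hypothetical BLAKE2-based `bucket` hash onto {1, ..., 1/p} (or {1, ..., 1/q}); a node counts as sampled in dimension k when its bucket equals 1 (one natural reading of the slide, not confirmed by it); and timestamps are minutes since midnight with one-hour windows. The class `StreamingSpotLight` and its methods are illustrative names, and the edge stream is made up.

```python
import hashlib


def bucket(node, dim: int, side: str, n_buckets: int, seed: int = 0) -> int:
    """Hypothetical hash h(node) -> {1, ..., n_buckets} for dimension `dim`."""
    key = f"{seed}|{side}|{dim}|{node}".encode()
    h = int.from_bytes(hashlib.blake2b(key, digest_size=8).digest(), "big")
    return h % n_buckets + 1


class StreamingSpotLight:
    """Keeps one K-dim sketch for the current window: O(K) working memory."""

    def __init__(self, K: int, p: float, q: float, window: int, start: int, seed: int = 0):
        self.K, self.seed, self.window = K, seed, window
        self.n_src = round(1 / p)  # size of the source hash range {1, ..., 1/p}
        self.n_dst = round(1 / q)  # size of the destination hash range {1, ..., 1/q}
        self.window_end = start + window
        self.v = [0.0] * K

    def process(self, t: int, src, dst, w: float) -> list:
        """Add one edge; return the sketches of any windows that closed before time t."""
        closed = []
        while t >= self.window_end:  # e.g. at 6pm, emit the 5-6pm sketch and reset
            closed.append(self.v)
            self.v = [0.0] * self.K
            self.window_end += self.window
        for k in range(self.K):
            if (bucket(src, k, "S", self.n_src, self.seed) == 1
                    and bucket(dst, k, "D", self.n_dst, self.seed) == 1):
                self.v[k] += w
        return closed

    def flush(self) -> list:
        """Return the sketch of the current (unfinished) window and reset it."""
        out, self.v = self.v, [0.0] * self.K
        return out


det = StreamingSpotLight(K=3, p=1 / 3, q=1 / 3, window=60, start=17 * 60)
stream = [(17 * 60 + 5, "a", "b", 1.0), (17 * 60 + 40, "b", "c", 2.0),
          (18 * 60 + 20, "a", "d", 2.0)]
sketches = []
for t, s, d, w in stream:
    sketches.extend(det.process(t, s, d, w))  # each closed sketch would go to the detector
sketches.append(det.flush())
```

Windows with no edges still emit an all-zero sketch, since an empty hour is itself a graph snapshot the detector should see.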
Intuition behind our theorems
GUARANTEES
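[Figure: G_R and G_B are each obtained from G by adding edges; d_R and d_B are the distances from v(G) to v(G_R) and to v(G_B) in the K-dimensional SpotLight space]
Deterministic experiment: add ‘m’ unit-weight edges.
$d_R - d_B > O(K m^2)$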
Thm 1: Focus-awareness in expectation
GUARANTEES
Randomized experiment: add ‘m’ unit-weight edges uniformly at random.
[Figure: from G to G_R and G_B; probability distributions of the distances d_R and d_B in the K-dimensional SpotLight space]
$\mathbb{E}[d_B] < \mathbb{E}[d_R]$
Focus-awareness property was introduced by Koutra, Vogelstein & Faloutsos [SDM 2013].
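Theorem 1 can be explored with a toy Monte Carlo. Everything here is our own assumption rather than the paper's exact setup: G_R is taken to be the graph whose m = 100 new unit-weight edges form a complete 10 × 10 block (concentrated), G_B the graph whose 100 new edges share no endpoints (spread out), and d is the squared Euclidean distance between sketches divided by K. Because each sketch entry is a sum of edge weights and both graphs are sketched with the same sampled subsets, v(G') − v(G) is just the sketch of the added edges, so the base graph G can be left out. The function name `sketch_sq_distance` is hypothetical.

```python
import random


def sketch_sq_distance(added_edges, K: int, p: float, q: float, rng) -> float:
    """Mean over K dimensions of the squared sketch entry of the added edges.

    By linearity this equals ||v(G') - v(G)||^2 / K when G' = G + added_edges.
    Only endpoints of added edges affect the difference, so only they are sampled.
    """
    srcs = {s for s, _ in added_edges}
    dsts = {d for _, d in added_edges}
    total = 0.0
    for _ in range(K):
        kept_src = {s for s in sorted(srcs) if rng.random() < p}
        kept_dst = {d for d in sorted(dsts) if rng.random() < q}
        x = sum(1 for s, d in added_edges if s in kept_src and d in kept_dst)
        total += x * x
    return total / K


rng = random.Random(42)
concentrated = [(f"s{i}", f"d{j}") for i in range(10) for j in range(10)]  # 10 x 10 block
spread = [(f"s{i}", f"d{i}") for i in range(100)]  # 100 edges, no shared endpoints
d_R = sketch_sq_distance(concentrated, K=2000, p=0.2, q=0.2, rng=rng)
d_B = sketch_sq_distance(spread, K=2000, p=0.2, q=0.2, rng=rng)
```

Under these toy assumptions the per-dimension expectations work out to 31.36 for the block and 19.84 for the spread-out edges, so the estimates should satisfy d_B < d_R, in line with the theorem's statement for its own setting.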
Thm 2: Criterion for anomaly detection
GUARANTEES
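[Figure: probability vs. distance for d_R and d_B. With only an “EXPECTED” GAP the two distributions overlap, so any decision threshold between normal and anomaly produces false negatives (FN) and false positives (FP). With a “HIGH PROBABILITY” GAP of ε, a threshold achieves FPR ≤ δ.]
Sketch size $K \ge K^{*} \;\Rightarrow\; \Pr[d_R - d_B > \epsilon] \ge 1 - \delta$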
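The second theorem says the gap holds with high probability once the sketch is large enough. Continuing the same kind of toy setup (our assumptions: a complete 6 × 6 block for G_R versus 36 edges with distinct endpoints for G_B, d taken as the mean squared sketch entry), the sketch below estimates Pr[d_R − d_B > ε] for several K. To stay fast it samples each squared entry directly: for the block, the entry is (#kept block sources) × (#kept block destinations); for the disjoint edges, it counts edges whose endpoints were both kept. The function names and the choice ε = 1 are illustrative, not the paper's K* computation.

```python
import random


def sq_entry(rng, concentrated: bool, s=6, t=6, p=0.25, q=0.25) -> int:
    """One squared sketch entry for s*t added unit-weight edges (toy model)."""
    if concentrated:  # complete s x t block
        x = sum(rng.random() < p for _ in range(s)) * sum(rng.random() < q for _ in range(t))
    else:  # s*t edges with pairwise distinct endpoints
        x = sum((rng.random() < p) and (rng.random() < q) for _ in range(s * t))
    return x * x


def gap_probability(K: int, eps: float, trials: int = 200, seed: int = 0) -> float:
    """Monte Carlo estimate of Pr[d_R - d_B > eps], with d = mean squared entry."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        d_R = sum(sq_entry(rng, True) for _ in range(K)) / K
        d_B = sum(sq_entry(rng, False) for _ in range(K)) / K
        hits += d_R - d_B > eps
    return hits / trials


probs = {K: gap_probability(K, eps=1.0) for K in (1, 16, 256)}
```

With K = 1 the two distances overlap heavily, while with K = 256 the averaged distances concentrate and the estimated probability approaches 1, mirroring the move from an "expected" gap to a "high probability" gap.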
The labeled DARPA dataset
EXPERIMENTS
‣ 4.5M edges in 87.7K time ticks
‣ 9.5K sources, 24K destinations
‣ Edges labeled as attack or not
Stream of 1.5K hourly graphs (24% anomalous)
DARPA: Precision and recall
EXPERIMENTS
$\text{Precision} = \dfrac{\#\,\text{graphs correctly flagged}}{\#\,\text{graphs flagged}}$
$\text{Recall} = \dfrac{\#\,\text{graphs correctly flagged}}{\#\,\text{anomalous graphs}}$
RHSS: Ranshous, Harenburg, Sharma & Samatova (SDM 2016)
STA: Streaming Tensor Analysis (Sun, Tao & Faloutsos, KDD 2006)
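The two ratios translate directly into code. In this sketch, graphs are identified by their position in the stream, a graph is flagged when its (made-up) anomaly score exceeds a threshold, and the function name `precision_recall` is ours.

```python
def precision_recall(flagged: set, anomalous: set) -> tuple:
    """Precision and recall over graph indices, following the definitions above.

    Returns 0.0 for a ratio whose denominator is empty (a convention we chose).
    """
    correct = len(flagged & anomalous)  # graphs correctly flagged
    precision = correct / len(flagged) if flagged else 0.0
    recall = correct / len(anomalous) if anomalous else 0.0
    return precision, recall


scores = [0.1, 0.9, 0.2, 0.8, 0.7]  # made-up anomaly scores, one per hourly graph
anomalous = {1, 3}                  # made-up ground-truth labels
flagged = {i for i, s in enumerate(scores) if s > 0.5}
precision, recall = precision_recall(flagged, anomalous)
```

Here graphs 1, 3 and 4 are flagged, two of them correctly, so precision is 2/3 and recall is 1. Sweeping the threshold trades one against the other.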
DARPA: Challenges and successes
EXPERIMENTS
‣ SpotLight (misses small attacks)
‣ Edge Weight = SpotLight with K = p = q = 1 (also misses medium-size attacks)
‣ RHSS = edge likelihood function (also misses repeated attacks)
Summary
CONCLUSION
PROBLEM
[Figure: a stream of graphs over time, each marked "Ok!" except one marked "anomaly!"]
SOLUTION
SpotLight sketching
‣ Real-time
‣ Memory efficient
‣ Theoretical guarantees
Future directions
CONCLUSION
MORE CHALLENGING ANOMALIES
‣ Slow and/or small attacks
‣ Sequence of suspicious events rather than a single event
STREAMING ANOMALY ATTRIBUTION
‣ Blame a small set of sources and destinations for the anomaly