Post on 14-Aug-2020
U Kang
Introduction to Data Mining
Anomaly Detection
U Kang, Seoul National University
In This Lecture
Motivation of anomaly detection
Graph structure based method
Random walk based method
Outline
Overview
Graph Structure Based Method
Random Walk Based Method
Data Mining
Data mining: find patterns and anomalies
To spot anomalies, we have to discover patterns
Large datasets reveal patterns/anomalies that may be invisible otherwise…
Anomaly Detection
Anomaly detection
Find suspicious data points which deviate significantly from normal data
Anomaly detection in graphs
Find “strange” nodes in a graph
Anomaly Detection
Applications
Network intrusion detection: find suspicious attackers (e.g. DDoS attackers, spammers)
Call network: find heavy telemarketers
Social network: spot people adding friends indiscriminately in a “popularity contest”
Credit card fraud
(the list continues...)
Anomaly Detection
More Applications
Campaign donation irregularity
Extremely cross-disciplinary authors in an author-paper graph
Electronic auction fraud
Plan
We will look at two methods for anomaly detection in graphs
Graph Structure Based Method
Random Walk Based Method
Outline
Overview
Graph Structure Based Method
Random Walk Based Method
L. Akoglu, M. McGlohon, C. Faloutsos. OddBall: Spotting Anomalies in Weighted
Graphs. PAKDD, 2010
Problem Definition
Given: a weighted and unlabeled graph,
Q1: how can we spot strange, abnormal, extreme nodes?
Q2: how can we explain why the spotted nodes are anomalous?
OddBall: approach
For each node
Extract “ego-net” (=1 step neighborhood)
Extract features (#edges, total weight, etc.)
Features that could yield “laws”
Features fast to compute and interpret
Detect patterns
Regularities
Detect anomalies
Deviate significantly from patterns
What is Odd?
Main Idea
For each egonet, extract features
Find “rules” in features
Anomalies deviate significantly from the rules
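The three steps above can be sketched for one pair of egonet features. This is a minimal sketch, assuming the "rule" is a power law y = C * x^theta fitted by least squares in log-log space. The function names and the particular deviation score are illustrative assumptions; the slides do not give a formula.

```python
import math

def fit_power_law(xs, ys):
    """Fit y = C * x**theta by least squares on (log x, log y)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    theta = sxy / sxx
    c = math.exp(my - theta * mx)
    return c, theta

def deviation_score(x, y, c, theta):
    """Assumed score: ratio to the fitted value, scaled by the log of the gap.

    The score is 0 for a point exactly on the fitted curve and grows as the
    point moves away from it, in either direction.
    """
    expected = c * x ** theta
    ratio = max(y, expected) / min(y, expected)
    return ratio * math.log(abs(y - expected) + 1)
```

Nodes would then be ranked by this score, with the highest-scoring ones inspected first. All feature values are assumed to be strictly positive so that the logarithms are defined.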
Which Features?
Ni : # of neighbors (degree) of ego i
Ei : # of edges in egonet i
Wi : total weight of egonet i
λw,i : principal eigenvalue of the weighted adjacency matrix of egonet i
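As an illustration, the four features can be computed for a single ego with plain Python. This is a sketch under stated assumptions: the graph is undirected and stored as a symmetric dict of dicts (`adj[u][v]` is the edge weight), the egonet contains the ego, its neighbors, and every edge among them, and the principal eigenvalue is approximated by power iteration. The function name `egonet_features` is hypothetical.

```python
def egonet_features(adj, ego, iters=200):
    """Return (N, E, W, lam) for the 1-step egonet of `ego`.

    adj: dict mapping node -> {neighbor: weight}, symmetric (undirected).
    """
    nodes = [ego] + sorted(adj[ego])
    members = set(nodes)
    # Induced subgraph: keep only edges with both endpoints in the egonet.
    sub = {u: {v: w for v, w in adj[u].items() if v in members} for u in nodes}

    n_neighbors = len(nodes) - 1
    # Each undirected edge appears twice in a symmetric adjacency dict.
    n_edges = sum(len(nbrs) for nbrs in sub.values()) // 2
    total_weight = sum(sum(nbrs.values()) for nbrs in sub.values()) / 2

    # Power iteration: with a unit vector x, the norm of A x approaches
    # the largest eigenvalue of the non-negative symmetric matrix A.
    x = {u: 1.0 for u in nodes}
    lam = 0.0
    for _ in range(iters):
        y = {u: sum(w * x[v] for v, w in sub[u].items()) for u in nodes}
        norm = sum(val * val for val in y.values()) ** 0.5
        if norm == 0.0:
            return n_neighbors, n_edges, total_weight, 0.0
        x = {u: val / norm for u, val in y.items()}
        lam = norm
    return n_neighbors, n_edges, total_weight, lam
```

For large graphs a sparse eigensolver would likely be a better choice than this loop; the dict-based version is only meant to make the definitions concrete.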
Why Principal Eigenvalue?
OddBall: pattern #1
OddBall: pattern #2
OddBall: pattern #3
OddBall: anomaly detection
(e.g. LOF)
OddBall: datasets
OddBall at work (Posts)
OddBall at work (FEC)
OddBall at work (DBLP)
Outline
Overview
Graph Structure Based Method
Random Walk Based Method
J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly
detection in bipartite graphs. ICDM, 2005
Anomalies in Bipartite Graphs
Examples of Bipartite Graphs
Publication network
Author-paper
P2P network
User-file
Recommendation
User-product
Stock market
Stock-trader
1) Neighborhood Formulation
Main idea
Compute the Random Walk with Restart score from query node q
Steady state probability = relevance
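A minimal sketch of Random Walk with Restart by power iteration, to make the "steady state probability = relevance" idea concrete. Assumptions not in the slides: the graph is a dict of dicts with every node present as a key, the restart probability defaults to 0.15, and a fixed number of iterations is used instead of a convergence test. The name `rwr` is hypothetical.

```python
def rwr(adj, query, restart=0.15, iters=100):
    """Return {node: steady-state probability} for a walk restarting at `query`.

    adj: dict mapping node -> {neighbor: weight}; every neighbor must be a key.
    """
    nodes = list(adj)
    out_weight = {u: sum(adj[u].values()) for u in nodes}
    p = {u: 0.0 for u in nodes}
    p[query] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in nodes}
        for u in nodes:
            if out_weight[u] == 0:
                # A node without edges sends its mass back to the query node.
                nxt[query] += (1 - restart) * p[u]
                continue
            for v, w in adj[u].items():
                # Follow an edge with probability proportional to its weight.
                nxt[v] += (1 - restart) * p[u] * w / out_weight[u]
        # With probability `restart` the walker jumps back to the query node.
        nxt[query] += restart
        p = nxt
    return p
```

The returned probabilities sum to 1, and nodes that are close to the query node through many short paths receive the highest scores.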
1) Neighborhood Formulation
Exact Neighborhood Formulation (NF)
Exact RWR score
Approximate NF
Partition the original graph into pieces by METIS
Compute similarities only on the partition containing the query node
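The restriction step of the approximate variant can be sketched as follows. This assumes a partition assignment `part` (node to partition id) has already been produced by some partitioner; METIS itself is not called here, and the function name is hypothetical. The relevance computation would then run on the returned subgraph only.

```python
def restrict_to_partition(adj, part, query):
    """Keep only the nodes in the query's partition and the edges among them.

    adj:  dict mapping node -> {neighbor: weight}
    part: dict mapping node -> partition id (assumed precomputed)
    """
    keep = {u for u in adj if part[u] == part[query]}
    return {u: {v: w for v, w in adj[u].items() if v in keep} for u in keep}
```

Edges that cross partition boundaries are dropped, which is where the approximation error comes from.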
2) Anomaly Detection
Main idea: compute the anomaly score of a node t
Compute pairwise “relevance” scores for the neighbors of t
Compute the mean of the relevance scores (a low mean suggests that t links unrelated nodes)
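The two steps above can be sketched as a single score per node. This is a sketch under assumptions: relevance is taken to be a Random Walk with Restart score computed by a small power-iteration helper, every node has at least one edge, and the score is averaged over ordered pairs of distinct neighbors. The names `rwr` and `normality_score` are hypothetical.

```python
def rwr(adj, query, restart=0.15, iters=100):
    """Random Walk with Restart by power iteration (helper for this sketch)."""
    p = {u: 0.0 for u in adj}
    p[query] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in adj}
        nxt[query] = restart
        for u, nbrs in adj.items():
            total = sum(nbrs.values())
            for v, w in nbrs.items():
                nxt[v] += (1 - restart) * p[u] * w / total
        p = nxt
    return p

def normality_score(adj, t, restart=0.15):
    """Mean pairwise relevance among the neighbors of t (low = suspicious)."""
    neighbors = list(adj[t])
    if len(neighbors) < 2:
        raise ValueError("t needs at least two neighbors")
    # One relevance vector per neighbor, each computed on the full graph.
    scores = {a: rwr(adj, a, restart) for a in neighbors}
    pairs = [(a, b) for a in neighbors for b in neighbors if a != b]
    return sum(scores[a][b] for a, b in pairs) / len(pairs)
```

In a bipartite author-conference graph, for example, a conference whose authors otherwise publish in unrelated venues would get a low score, while a conference with a tightly knit author set would get a high one.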
Experiment
Dataset:
DBLP Conf-Auth
DBLP Author-Paper
IMDB movie-actor
Questions:
Q1) What are the discoveries?
Q2) Anomaly detection quality?
1) NF discovery
2) Anomaly Detection Quality
Setting: inject 100 random nodes connected to high-degree nodes
What You Need to Know
Anomaly detection
Find suspicious data points which deviate significantly from normal data
Anomaly detection in graphs
Graph Structure Based Method
Random Walk Based Method
Neighborhood Formulation (NF)
Anomaly detection using NF
Questions?