Post on 14-Aug-2020

U Kang

Introduction to Data Mining

Anomaly Detection

U Kang, Seoul National University

In This Lecture

Motivation of anomaly detection

Graph structure based method

Random walk based method

Outline

Overview

Graph Structure Based Method

Random Walk Based Method

Data Mining

Data mining: find patterns and anomalies

To spot anomalies, we have to discover patterns

Large datasets reveal patterns/anomalies that may be invisible otherwise…

Anomaly Detection

Anomaly detection

Find suspicious data points which deviate significantly from normal data

Anomaly detection in graphs

Find “strange” nodes in a graph

Anomaly Detection

Applications

Network intrusion detection: find suspicious attackers (e.g. DDoS attackers, spammers)

Call network: find heavy telemarketers

Social network: spot people adding friends indiscriminately in a “popularity contest”

Credit card fraud

(the list continues...)

Anomaly Detection

More Applications

Campaign donation irregularity

Extremely cross-disciplinary authors in an author-paper graph

Electronic auction fraud

Plan

We will look at two methods for anomaly detection in graphs

Graph Structure Based Method

Random Walk Based Method

Outline

Overview

Graph Structure Based Method

Random Walk Based Method

L. Akoglu, M. McGlohon, C. Faloutsos. OddBall: Spotting Anomalies in Weighted Graphs. PAKDD, 2010

Problem Definition

Given: a weighted and unlabeled graph,

Q1: how can we spot strange, abnormal, extreme nodes?

Q2: how can we explain why the spotted nodes are anomalous?

OddBall: approach

For each node

Extract “ego-net” (=1 step neighborhood)

Extract features (#edges, total weight, etc.)

Features that could yield “laws”

Features fast to compute and interpret

Detect patterns

Regularities

Detect anomalies

Deviate significantly from patterns

What is Odd?

Main Idea

For each egonet, extract features

Find “rules” in features

Anomalies deviate significantly from the rules
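
A minimal sketch of the last two steps, assuming the “rule” is a power law y = C * x^theta fitted by least squares in log-log space. The slides do not state the form of the rule or of the score; the score below follows the distance-to-fitted-line idea of the OddBall paper, and the function names and sample (N, E) pairs are hypothetical.

```python
import math

def fit_power_law(xs, ys):
    """Least-squares fit of log y = log C + theta * log x; returns (C, theta)."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    theta = sxy / sxx
    C = math.exp(my - theta * mx)
    return C, theta

def deviation_score(x, y, C, theta):
    """How far (x, y) lies from the fitted rule y = C * x**theta (0 = on the line)."""
    expected = C * x ** theta
    ratio = max(y, expected) / min(y, expected)
    return ratio * math.log(abs(y - expected) + 1)

# Hypothetical (neighbors, edges) pairs per egonet: seven near-stars (E = N)
# and one near-clique (N = 6, E = 21) that should stand out.
neighbors = [2, 3, 4, 5, 6, 8, 10, 6]
edges = [2, 3, 4, 5, 6, 8, 10, 21]

C, theta = fit_power_law(neighbors, edges)
scores = [deviation_score(n, e, C, theta) for n, e in zip(neighbors, edges)]
most_anomalous = max(range(len(scores)), key=scores.__getitem__)
print(most_anomalous, round(scores[most_anomalous], 2))
```

Fitting in log-log space keeps the fit a plain linear regression; a robust fit would be less influenced by the outliers it is meant to expose, which this sketch does not attempt.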

Which Features?

N_i: # of neighbors (degree) of ego i

E_i: # of edges in egonet i

W_i: total weight of egonet i

λ_{w,i}: principal eigenvalue of the weighted adjacency matrix of egonet i
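
The four features can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the graph is an undirected dict-of-dicts without self-loops, `egonet_features` is a hypothetical name, and the principal eigenvalue is approximated by shifted power iteration rather than an exact solver.

```python
def egonet_features(graph, ego, iters=500):
    """Return (N, E, W, lam) for the 1-step neighborhood (egonet) of `ego`.

    graph: dict node -> {neighbor: weight}, symmetric, no self-loops.
    """
    nodes = [ego] + sorted(graph[ego])
    members = set(nodes)
    # Induced subgraph: keep only edges with both endpoints inside the egonet.
    sub = {u: {v: w for v, w in graph[u].items() if v in members} for u in nodes}

    n_neighbors = len(nodes) - 1
    n_edges = sum(len(adj) for adj in sub.values()) // 2        # each edge is stored twice
    total_weight = sum(sum(adj.values()) for adj in sub.values()) / 2

    # Power iteration on A + shift*I; the shift avoids oscillation on
    # bipartite egonets such as stars, whose spectrum is symmetric around 0.
    shift = max(sum(adj.values()) for adj in sub.values())
    if shift == 0:                                               # isolated ego
        return n_neighbors, n_edges, total_weight, 0.0
    x = {u: 1.0 for u in nodes}
    for _ in range(iters):
        y = {u: shift * x[u] + sum(w * x[v] for v, w in sub[u].items()) for u in nodes}
        norm = sum(val * val for val in y.values()) ** 0.5
        x = {u: val / norm for u, val in y.items()}
    # Rayleigh quotient x^T A x with unit-norm x estimates the principal eigenvalue.
    lam = sum(x[u] * w * x[v] for u in nodes for v, w in sub[u].items())
    return n_neighbors, n_edges, total_weight, lam

# Hypothetical graph: "a" is the center of a star; "f" lies outside a's egonet.
graph = {
    "a": {"b": 1.0, "c": 1.0, "d": 1.0, "e": 1.0},
    "b": {"a": 1.0}, "c": {"a": 1.0}, "d": {"a": 1.0},
    "e": {"a": 1.0, "f": 1.0},
    "f": {"e": 1.0},
}
features = egonet_features(graph, "a")
print(features)  # N=4, E=4, W=4.0, lam close to 2.0 (sqrt of 4 for a unit-weight star)
```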

Why Principal Eigenvalue?

OddBall: pattern #1

OddBall: pattern #2

OddBall: pattern #3

OddBall: anomaly detection

(e.g. LOF)

OddBall: datasets

OddBall at work (Posts)

OddBall at work (FEC)

OddBall at work (DBLP)

Outline

Overview

Graph Structure Based Method

Random Walk Based Method

J. Sun, H. Qu, D. Chakrabarti, and C. Faloutsos. Neighborhood formation and anomaly detection in bipartite graphs. ICDM, 2005

Anomalies in Bipartite Graphs

Examples of Bipartite Graphs

Publication network

Author-paper

P2P network

User-file

Recommendation

User-product

Stock market

Stock-trader

1) Neighborhood Formulation

Main idea

Compute the Random Walk with Restart score from query node q

Steady state probability = relevance
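
A minimal sketch of Random Walk with Restart by power iteration, assuming an undirected dict-of-dicts graph in which every node has at least one neighbor. The restart probability of 0.15, the iteration count, and the small author-paper graph are illustrative assumptions, not values from the lecture.

```python
def rwr(graph, query, restart=0.15, iters=100):
    """Approximate steady-state RWR scores of all nodes with respect to `query`.

    graph: dict node -> {neighbor: weight}; no node may be isolated.
    """
    scores = {u: 0.0 for u in graph}
    scores[query] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in graph}
        for u, adj in graph.items():
            total = sum(adj.values())
            for v, w in adj.items():
                # With probability (1 - restart) follow an edge, chosen
                # proportionally to its weight.
                nxt[v] += (1 - restart) * scores[u] * w / total
        # With probability `restart` jump back to the query node.
        nxt[query] += restart
        scores = nxt
    return scores

# Hypothetical author-paper bipartite graph: a1 - p1 - a2 - p2 - a3.
edges = [("a1", "p1"), ("a2", "p1"), ("a2", "p2"), ("a3", "p2")]
graph = {}
for a, p in edges:
    graph.setdefault(a, {})[p] = 1.0
    graph.setdefault(p, {})[a] = 1.0

relevance = rwr(graph, "a1")
# Nodes closer to the query should score higher: a2 above a3, p1 above p2.
print(sorted(relevance.items(), key=lambda kv: -kv[1]))
```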

1) Neighborhood Formulation

Exact Neighborhood Formulation (NF)

Exact RWR score

Approximate NF

Partition the original graph into pieces by METIS

Compute similarities only on the partition containing the query node
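
The restriction step of the approximate variant might look like the sketch below. It does not call METIS: `partition_of` is a hand-written stand-in for the node-to-partition labels a partitioner would produce, and `restrict_to_partition` is a hypothetical helper name. Relevance scores would then be computed on the returned subgraph only.

```python
def restrict_to_partition(graph, partition_of, query):
    """Induced subgraph on the nodes that share `query`'s partition label."""
    label = partition_of[query]
    keep = {u for u in graph if partition_of[u] == label}
    # Drop every edge that crosses the partition boundary.
    return {u: {v: w for v, w in graph[u].items() if v in keep} for u in keep}

# Hypothetical graph: a triangle {a, b, c} joined to a pair {x, y} by edge c-x.
graph = {
    "a": {"b": 1.0, "c": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"a": 1.0, "b": 1.0, "x": 1.0},
    "x": {"c": 1.0, "y": 1.0},
    "y": {"x": 1.0},
}
partition_of = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1}  # stand-in for partitioner output

local = restrict_to_partition(graph, partition_of, "a")
print(sorted(local))  # nodes of the partition containing "a"
```

Cutting boundary edges can leave a node with no neighbors inside its partition; a random walk on the subgraph would need to handle such nodes explicitly.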

2) Anomaly Detection

Main idea: compute the anomaly score of a node t

Compute pairwise “relevance” scores for the neighbors of t

Compute the mean of the relevance scores

A low mean suggests that t links otherwise unrelated nodes, so t is likely anomalous
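
The two steps can be sketched as follows, assuming relevance is the Random Walk with Restart score and that t has at least two neighbors. The function name `normality`, the restart probability, and the toy author-paper graph are assumptions made for illustration.

```python
from itertools import permutations

def rwr(graph, query, restart=0.15, iters=100):
    """RWR scores with respect to `query` by power iteration (no isolated nodes)."""
    scores = {u: 0.0 for u in graph}
    scores[query] = 1.0
    for _ in range(iters):
        nxt = {u: 0.0 for u in graph}
        for u, adj in graph.items():
            total = sum(adj.values())
            for v, w in adj.items():
                nxt[v] += (1 - restart) * scores[u] * w / total
        nxt[query] += restart
        scores = nxt
    return scores

def normality(graph, t, restart=0.15):
    """Mean pairwise relevance among the neighbors of t; low values flag t."""
    nbrs = list(graph[t])                      # assumes at least two neighbors
    rel = {a: rwr(graph, a, restart) for a in nbrs}
    pairs = list(permutations(nbrs, 2))        # ordered pairs (a, b) with a != b
    return sum(rel[a][b] for a, b in pairs) / len(pairs)

# Hypothetical paper-author graph: two author communities (a*, b*) and one
# paper, p_odd, that bridges them.
edges = [
    ("pA1", "a1"), ("pA1", "a2"), ("pA1", "a3"), ("pA2", "a1"), ("pA2", "a2"),
    ("pA3", "a2"), ("pA3", "a3"),
    ("pB1", "b1"), ("pB1", "b2"), ("pB1", "b3"), ("pB2", "b1"), ("pB2", "b2"),
    ("pB3", "b2"), ("pB3", "b3"),
    ("p_odd", "a1"), ("p_odd", "b1"),
]
graph = {}
for p, a in edges:
    graph.setdefault(p, {})[a] = 1.0
    graph.setdefault(a, {})[p] = 1.0

typical = normality(graph, "pA2")    # co-authors from the same community
bridging = normality(graph, "p_odd") # co-authors from different communities
print(typical > bridging)
```

One RWR run per neighbor makes this quadratic-ish in cost for high-degree nodes, which is where restricting the walk to a single partition helps.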

Experiment

Dataset:

DBLP Conf-Auth

DBLP Author-Paper

IMDB movie-actor

Questions:

Q1) What are the discoveries?

Q2) Anomaly detection quality?

1) NF discovery

2) Anomaly Detection Quality

Setting: injected 100 random nodes connecting high degree nodes

What You Need to Know

Anomaly detection

Find suspicious data points which deviate significantly from normal data

Anomaly detection in graphs

Graph Structure Based Method

Random Walk Based Method

Neighborhood Formulation (NF)

Anomaly detection using NF

Questions?