Discovering Outlier Filtering Rules from Unlabeled Data

Author: Kenji Yamanishi & Jun-ichi Takeuchi. Advisor: Dr. Hsu. Graduate: Chia-Hsien Wu.


Transcript of Discovering Outlier Filtering Rules from Unlabeled Data

Page 1: Discovering Outlier Filtering  Rules from Unlabeled Data

Discovering Outlier Filtering Rules from Unlabeled Data

Author: Kenji Yamanishi & Jun-ichi Takeuchi

Advisor: Dr. Hsu

Graduate: Chia-Hsien Wu

Page 2: Discovering Outlier Filtering  Rules from Unlabeled Data

Outline

Motivation

Objective

Introduction

Main Framework

Outlier Detector - SmartSifter

Rule Generator – DL-ESC/DL-SC

Experimentation - Network intrusion detection

Experimental Results

Conclusion

Opinion

Page 3: Discovering Outlier Filtering  Rules from Unlabeled Data

Motivation

SmartSifter's detection accuracy leaves room for improvement

SmartSifter cannot find a general pattern shared by the outliers it identifies

Page 4: Discovering Outlier Filtering  Rules from Unlabeled Data

Objective

Improving the accuracy of SmartSifter

Discovering new patterns that outliers in a specific group may have in common

Page 5: Discovering Outlier Filtering  Rules from Unlabeled Data

Introduction

Developing SmartSifter: an on-line outlier detection algorithm

Improving the power of SmartSifter by combining it with a supervised learning method

Page 6: Discovering Outlier Filtering  Rules from Unlabeled Data

Main Framework

[Figure: framework diagram. Surviving labels: Classifier L; A New Rule]

Page 7: Discovering Outlier Filtering  Rules from Unlabeled Data

Outlier Detector - SmartSifter (SS)

Using a probabilistic (Gaussian mixture) model: p(x, y) = p(x) p(y|x)

Employing on-line discounting learning algorithms (SDLE and SDEM) to update the model

Giving a score to each datum
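As a rough illustration of on-line discounting and scoring for a categorical attribute, the sketch below keeps discounted counts with Laplace-style smoothing and scores each datum by its logarithmic loss before updating. The class name, the parameters `r` (discounting rate) and `beta` (smoothing constant), and the choice of logarithmic loss as the score are assumptions made for this example, not details taken from the slides.

```python
import math


class DiscountedLaplaceEstimator:
    """Hypothetical sketch: discounted counts with Laplace-style smoothing."""

    def __init__(self, categories, r=0.05, beta=0.5):
        self.r = r              # discounting rate (assumed value)
        self.beta = beta        # smoothing constant (assumed value)
        self.counts = {c: 0.0 for c in categories}
        self.t = 0              # number of data seen so far

    def prob(self, c):
        k = len(self.counts)
        # Sum of all discounted counts after t updates (geometric series).
        total = (1.0 - (1.0 - self.r) ** self.t) / self.r
        return (self.counts[c] + self.beta) / (total + k * self.beta)

    def update(self, c):
        # Older observations fade by a factor (1 - r) at every step.
        for key in self.counts:
            self.counts[key] *= 1.0 - self.r
        self.counts[c] += 1.0
        self.t += 1

    def score_and_update(self, c):
        # Score with the model learned so far, then learn from the datum.
        score = -math.log(self.prob(c))
        self.update(c)
        return score


est = DiscountedLaplaceEstimator(["http", "smtp", "ftp"])
for _ in range(50):
    est.score_and_update("http")
rare = est.score_and_update("ftp")      # rarely seen value
common = est.score_and_update("http")   # frequently seen value
```

In this sketch a rarely seen value receives a higher score than a frequently seen one, which is the behaviour the scoring step relies on.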

Page 8: Discovering Outlier Filtering  Rules from Unlabeled Data

Outlier Detector - SmartSifter (SS) (cont.)

SDLE algorithm: An on-line discounting variant of the Laplace-law-based estimation algorithm

SDEM algorithm: An on-line discounting variant of the incremental EM (Expectation Maximization) algorithm
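To make the idea of a discounting incremental EM update concrete, here is a one-dimensional sketch for a Gaussian mixture. It is an illustration under assumptions, not the authors' exact algorithm: the class name, the parameters `r`, `alpha`, and `min_var`, the initial variances, and the variance floor are all hypothetical choices for this example.

```python
import math


def gaussian_pdf(y, mean, var):
    return math.exp(-0.5 * (y - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


class DiscountedGaussianMixture1D:
    """Hypothetical 1-D sketch of an EM-style update with discounting."""

    def __init__(self, means, r=0.02, alpha=1.0, min_var=1e-3):
        k = len(means)
        self.r, self.alpha, self.min_var = r, alpha, min_var
        self.weights = [1.0 / k] * k
        self.means = list(means)
        self.vars = [1.0] * k
        # Discounted sufficient statistics: weighted sums of y and y^2.
        self.s1 = [w * m for w, m in zip(self.weights, self.means)]
        self.s2 = [w * (v + m * m)
                   for w, m, v in zip(self.weights, self.means, self.vars)]

    def density(self, y):
        return sum(w * gaussian_pdf(y, m, v)
                   for w, m, v in zip(self.weights, self.means, self.vars))

    def update(self, y):
        k, r = len(self.weights), self.r
        joint = [w * gaussian_pdf(y, m, v)
                 for w, m, v in zip(self.weights, self.means, self.vars)]
        total = sum(joint)
        for i in range(k):
            post = joint[i] / total if total > 0.0 else 1.0 / k
            # Smoothed responsibility keeps every component weight positive.
            gamma = (1.0 - self.alpha * r) * post + self.alpha * r / k
            # Old statistics decay by (1 - r); the new datum enters with weight r.
            self.weights[i] = (1.0 - r) * self.weights[i] + r * gamma
            self.s1[i] = (1.0 - r) * self.s1[i] + r * gamma * y
            self.s2[i] = (1.0 - r) * self.s2[i] + r * gamma * y * y
            self.means[i] = self.s1[i] / self.weights[i]
            var = self.s2[i] / self.weights[i] - self.means[i] ** 2
            self.vars[i] = max(var, self.min_var)


gmm = DiscountedGaussianMixture1D(means=[0.0, 10.0])
for y in [0.1, 9.9, -0.2, 10.2] * 100:
    gmm.update(y)
```

Because recent data carry more weight than old data, a model of this kind can follow a drifting source, which is the motivation for discounting in an on-line setting.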

Page 9: Discovering Outlier Filtering  Rules from Unlabeled Data

Outlier Detector - SmartSifter (SS) (cont.)

Outputting a dataset sorted by score

A highly scored datum has a high possibility of being an outlier
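The sorting step can be sketched as follows. The helper names and the `fraction` parameter are hypothetical, and turning the top-scored fraction into positive labels is an assumption about how the sorted output could feed a supervised learner; the slides do not specify the cut-off.

```python
def rank_by_score(data, scores):
    """Return (datum, score) pairs sorted from highest to lowest score."""
    return sorted(zip(data, scores), key=lambda pair: pair[1], reverse=True)


def label_top_fraction(ranked, fraction=0.05):
    """Label the top-scored fraction 1 (suspected outlier) and the rest 0."""
    n_top = max(1, int(len(ranked) * fraction))
    return [(datum, 1 if i < n_top else 0)
            for i, (datum, _) in enumerate(ranked)]


ranked = rank_by_score(["a", "b", "c", "d"], [0.2, 3.1, 1.0, 0.5])
labeled = label_top_fraction(ranked, fraction=0.25)
```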

Page 10: Discovering Outlier Filtering  Rules from Unlabeled Data

Rule Generator – DL-ESC/DL-SC

Using a stochastic decision list

Employing the principle of minimizing extended stochastic complexity (ESC) or stochastic complexity (SC)

Page 11: Discovering Outlier Filtering  Rules from Unlabeled Data

Rule Generator – DL-ESC/DL-SC (cont.)

If ξ makes t1 true, then μ = v1 with probability p1

else if ξ makes t2 true, then μ = v2 with probability p2

………………………

else μ = vs with probability ps
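The if / else-if structure above can be sketched as an ordered list of (test, value, probability) terms, where the first test that the input satisfies decides the output. The function name, the example tests, and all numeric values below are hypothetical and only illustrate the evaluation order.

```python
def evaluate_decision_list(terms, default, xi):
    """Return (value, probability) from the first term whose test xi satisfies."""
    for test, value, prob in terms:
        if test(xi):
            return value, prob
    return default  # the final "else" branch: (v_s, p_s)


# Hypothetical terms over KDD-style attributes; thresholds are made up.
terms = [
    (lambda xi: xi["service"] == "ftp" and xi["src_bytes"] > 1000, 1, 0.9),
    (lambda xi: xi["service"] == "http", 0, 0.95),
]
default = (0, 0.8)

result = evaluate_decision_list(
    terms, default, {"service": "ftp", "src_bytes": 5000}
)
```

Order matters here: a datum that satisfies several tests is handled by the earliest one only.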

Page 12: Discovering Outlier Filtering  Rules from Unlabeled Data

Experimentation - Network intrusion detection

The purpose of the experiment is to detect intrusions without making use of the labels concerning intrusions

Page 13: Discovering Outlier Filtering  Rules from Unlabeled Data

Experimentation – Dataset (cont.)

Using the KDD Cup 1999 dataset, prepared for network intrusion detection

Using 13 attributes for DL-ESC

Using four attributes for SmartSifter (service, duration, src_bytes, dst_bytes)

Only “service” is categorical

y = log(x + 0.1), where the base of the logarithm is e

Generating five datasets: S0, S1, S2, S3, S4
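The transformation y = log(x + 0.1) can be written as a small helper. Applying it to the three non-categorical attributes is an assumption based on the attribute list above, and the function names and sample record are hypothetical.

```python
import math


def log_transform(x, offset=0.1):
    """Natural-log transform; the offset keeps x = 0 finite."""
    return math.log(x + offset)


def transform_record(record):
    """Transform the continuous attributes; leave "service" unchanged."""
    out = dict(record)
    for name in ("duration", "src_bytes", "dst_bytes"):
        out[name] = log_transform(record[name])
    return out


sample = {"service": "http", "duration": 0, "src_bytes": 215, "dst_bytes": 45076}
transformed = transform_record(sample)
```

The offset matters because many connection records have zero duration or zero bytes, and log(0) is undefined.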

Page 14: Discovering Outlier Filtering  Rules from Unlabeled Data

Experimentation – Dataset (cont.)

Page 15: Discovering Outlier Filtering  Rules from Unlabeled Data

Experimentation – Illustration by an Example (cont.)

Update Rule – S1

First Rule – S1

Update Rule – S2

Page 16: Discovering Outlier Filtering  Rules from Unlabeled Data

Experimental Results

SS: SmartSifter

R&S: Rule and SmartSifter (this framework)

S0 is used as the training set to construct a filtering rule; each of S1, S2, S3, and S4 is used for testing

Page 17: Discovering Outlier Filtering  Rules from Unlabeled Data

Experimental Results (cont.)

Page 18: Discovering Outlier Filtering  Rules from Unlabeled Data

Experimental Results (cont.)

Page 19: Discovering Outlier Filtering  Rules from Unlabeled Data

Conclusion

This new framework has two features

Improving the power of SmartSifter

Helping the user discover a general pattern

Page 20: Discovering Outlier Filtering  Rules from Unlabeled Data

Opinion

Making the detection process more effective and more understandable

This framework can be applied to other fields