Discovering Outlier Filtering Rules from Unlabeled Data

Posted on 05-Jan-2016


Author: Kenji Yamanishi & Jun-ichi Takeuchi

Advisor: Dr. Hsu

Graduate: Chia-Hsien Wu

Outline

Motivation
Objective
Introduction
Main Framework
Outlier Detector - SmartSifter
Rule Generator – DL-ESC/DL-SC
Experimentation – Network intrusion detection
Experimental Results
Conclusion
Opinion

Motivation

The limited detection accuracy of SmartSifter

SmartSifter cannot find the general patterns shared by the outliers it identifies

Objective

Improving the accuracy of SmartSifter

Discovering new patterns that outliers in a specific group may have in common

Introduction

SmartSifter: an on-line outlier detection algorithm previously developed by the authors

Improving the power of SmartSifter by combining it with a supervised learning method

Main Framework

[Figure: main framework diagram; surviving labels: "Classifier L", "A New Rule"]
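One reading of this framework is sketched below, assuming that SmartSifter's highest-scored data are labeled as outliers and the rest as normal before classifier L learns a new rule from those labels. The function names, the `top_ratio` cut-off, and the toy learner are hypothetical; in the paper the learner would be DL-ESC or DL-SC.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def label_by_score(scored: Sequence[Tuple[float, T]], top_ratio: float) -> List[Tuple[T, bool]]:
    """Label the highest-scoring fraction of data as outliers (True), the rest as normal (False)."""
    ranked = sorted(scored, key=lambda item: item[0], reverse=True)
    cutoff = max(1, int(len(ranked) * top_ratio))  # always label at least one datum
    return [(datum, rank < cutoff) for rank, (_, datum) in enumerate(ranked)]


def build_filtering_rule(
    scored: Sequence[Tuple[float, T]],
    top_ratio: float,
    learn_rule: Callable[[List[Tuple[T, bool]]], Callable[[T], bool]],
) -> Callable[[T], bool]:
    """Turn unsupervised scores into labels, then hand them to a supervised learner (classifier L)."""
    labeled = label_by_score(scored, top_ratio)
    return learn_rule(labeled)


# Hypothetical toy learner: flag anything at least as large as the smallest labeled outlier.
def toy_learner(labeled: List[Tuple[int, bool]]) -> Callable[[int], bool]:
    threshold = min(x for x, is_outlier in labeled if is_outlier)
    return lambda x: x >= threshold


scores = [(9.0, 120), (1.0, 3), (5.0, 80), (0.5, 2)]  # (outlier score, datum)
rule = build_filtering_rule(scores, top_ratio=0.5, learn_rule=toy_learner)
```

Keeping the learner as a parameter mirrors the diagram: the outlier detector only supplies labels, so any rule learner can be plugged in.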

Outlier Detector - SmartSifter (SS)

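Using a probabilistic model $p(x, y) = p(x)\,p(y \mid x)$, where $x$ is the categorical part of a datum and $y$ its continuous part; $p(x)$ is a histogram density and $p(y \mid x)$ is a Gaussian mixture

Employing on-line discounting learning algorithms (SDLE for $p(x)$, SDEM for $p(y \mid x)$) to update the model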

Giving a score to each datum
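A minimal sketch of the discounting idea for the categorical part $p(x)$, assuming a histogram whose counts are all multiplied by $(1 - r)$ at every step before the new observation is added, and a Laplace-smoothed estimate with constant $\beta$. The class name and the parameter names `r` and `beta` are hypothetical, and this is a simplified illustration rather than the authors' SDLE code.

```python
class DiscountedLaplaceHistogram:
    """Illustrative SDLE-style estimator: discounted cell counts plus Laplace smoothing."""

    def __init__(self, num_cells: int, r: float = 0.01, beta: float = 0.5):
        self.num_cells = num_cells
        self.r = r          # discounting rate: larger r forgets old data faster
        self.beta = beta    # Laplace smoothing constant
        self.counts = [0.0] * num_cells  # discounted count of each cell
        self.total = 0.0                 # discounted number of observations

    def update(self, cell: int) -> None:
        # Shrink every past count, then add the new observation with weight 1.
        self.counts = [(1.0 - self.r) * c for c in self.counts]
        self.counts[cell] += 1.0
        self.total = (1.0 - self.r) * self.total + 1.0

    def prob(self, cell: int) -> float:
        # Laplace-smoothed estimate; the probabilities of all cells sum to 1.
        return (self.counts[cell] + self.beta) / (self.total + self.num_cells * self.beta)


hist = DiscountedLaplaceHistogram(num_cells=3, r=0.5, beta=0.5)
hist.update(0)
hist.update(1)
```

With a large `r` such as 0.5, the most recent cell quickly dominates, which is what lets an on-line detector adapt to a drifting data source.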

Outlier Detector - SmartSifter (SS) (cont.)

SDLE algorithm: An on-line discounting variant of the Laplace-law-based estimation algorithm

SDEM algorithm: An on-line discounting variant of the incremental EM (Expectation Maximization) algorithm
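The discounting EM idea can be sketched for a one-dimensional Gaussian mixture by keeping discounted sufficient statistics (weights, weighted sums of $y$ and $y^2$) and recomputing means and variances after each datum. This is a simplified illustration under stated assumptions, not the multivariate SDEM of the paper: the names `SDEM1D`, `r` (discounting rate), `alpha` (mixing responsibilities with a uniform term so no component starves), and `min_var` (variance floor) are hypothetical.

```python
import math
from typing import Sequence


def gaussian_pdf(y: float, mean: float, var: float) -> float:
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


class SDEM1D:
    """Simplified 1-D discounting EM for a Gaussian mixture (illustrative only)."""

    def __init__(self, means: Sequence[float], r: float = 0.01, alpha: float = 1.0,
                 init_var: float = 1.0, min_var: float = 1e-3):
        self.k = len(means)
        self.r, self.alpha, self.min_var = r, alpha, min_var
        self.weights = [1.0 / self.k] * self.k
        self.means = list(means)
        self.vars = [init_var] * self.k
        # Discounted sufficient statistics: weighted sums of y and y^2 per component.
        self.s1 = [w * m for w, m in zip(self.weights, self.means)]
        self.s2 = [w * (v + m * m) for w, m, v in zip(self.weights, self.means, self.vars)]

    def density(self, y: float) -> float:
        return sum(w * gaussian_pdf(y, m, v) for w, m, v in zip(self.weights, self.means, self.vars))

    def update(self, y: float) -> None:
        r, k = self.r, self.k
        joint = [w * gaussian_pdf(y, m, v) for w, m, v in zip(self.weights, self.means, self.vars)]
        total = sum(joint)
        if total == 0.0:  # every component underflowed: fall back to equal responsibilities
            joint, total = [1.0] * k, float(k)
        # E-step with a small uniform term; responsibilities still sum to 1.
        gamma = [(1.0 - self.alpha * r) * j / total + self.alpha * r / k for j in joint]
        # Discounted M-step: old statistics decay by (1 - r), the new datum enters with weight r.
        for i in range(k):
            self.weights[i] = (1.0 - r) * self.weights[i] + r * gamma[i]
            self.s1[i] = (1.0 - r) * self.s1[i] + r * gamma[i] * y
            self.s2[i] = (1.0 - r) * self.s2[i] + r * gamma[i] * y * y
            self.means[i] = self.s1[i] / self.weights[i]
            self.vars[i] = max(self.s2[i] / self.weights[i] - self.means[i] ** 2, self.min_var)


model = SDEM1D(means=[0.0, 10.0], r=0.01)
for _ in range(1000):
    model.update(0.0)
    model.update(10.0)
```

Because the mixture weights and the responsibilities both sum to 1, the discounted update keeps the weights normalized without an explicit renormalization step.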

Outlier Detector - SmartSifter (SS) (cont.)

Outputting the dataset sorted by score; a highly scored datum is likely to be an outlier
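The scoring and sorting step can be sketched as follows, assuming the log-loss (Shannon information) score $-\ln p(x, y)$ under the model $p(x, y) = p(x)\,p(y \mid x)$. To keep the example self-contained, the model parameters are fixed and made up; an on-line detector would score each datum with the model learned before that datum arrived and then update the model.

```python
import math
from typing import Dict, List, Sequence, Tuple

Mixture = List[Tuple[float, float, float]]  # (weight, mean, variance) per component


def gaussian_pdf(y: float, mean: float, var: float) -> float:
    return math.exp(-((y - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def joint_density(x: str, y: float, cell_probs: Dict[str, float], mixtures: Dict[str, Mixture]) -> float:
    # p(x, y) = p(x) * p(y | x), with a separate Gaussian mixture for each categorical value x
    p_y_given_x = sum(w * gaussian_pdf(y, m, v) for w, m, v in mixtures[x])
    return cell_probs[x] * p_y_given_x


def log_loss_score(x: str, y: float, cell_probs: Dict[str, float], mixtures: Dict[str, Mixture]) -> float:
    # Higher score = less probable under the model = more outlying; clamp avoids log(0)
    return -math.log(max(joint_density(x, y, cell_probs, mixtures), 1e-300))


def rank_by_score(data: Sequence[Tuple[str, float]], cell_probs, mixtures):
    scored = [(log_loss_score(x, y, cell_probs, mixtures), (x, y)) for x, y in data]
    return sorted(scored, key=lambda item: item[0], reverse=True)


# Hypothetical model: "http" is common and usually near y = 5, "ftp" is rare and near y = 2.
cell_probs = {"http": 0.9, "ftp": 0.1}
mixtures = {"http": [(1.0, 5.0, 1.0)], "ftp": [(1.0, 2.0, 1.0)]}
ranked = rank_by_score([("http", 5.0), ("ftp", 2.0), ("http", 12.0)], cell_probs, mixtures)
```

Here ("http", 12.0) ranks first: its category is common, but its continuous value is far from the mixture, so the joint density is tiny.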

Rule Generator – DL-ESC/DL-SC

Using a stochastic decision list

Employing the principle of minimizing extended stochastic complexity (DL-ESC) or stochastic complexity (DL-SC)
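To illustrate how a complexity criterion chooses between candidate decision lists, the sketch below scores each list by a description length: the code length of the binary labels captured by each rule, taken as $-\log_2 \frac{m!\,(n-m)!}{(n+1)!}$ (a Bernoulli model mixed over a uniform prior, for $n$ data with $m$ outliers), plus a crude structure cost per literal. This is a simplified stand-in under those assumptions, not the exact DL-SC or DL-ESC criterion; the 4-bits-per-literal cost and the function names are hypothetical.

```python
import math
from typing import Callable, Sequence, Tuple

Example = Tuple[float, bool]  # (attribute value, is_outlier label)


def bernoulli_sc_bits(n: int, m: int) -> float:
    """Code length in bits of n binary labels with m positives: -log2(m! (n-m)! / (n+1)!)."""
    log_prob = math.lgamma(m + 1) + math.lgamma(n - m + 1) - math.lgamma(n + 2)
    return -log_prob / math.log(2)


def decision_list_bits(
    rules: Sequence[Tuple[Callable[[float], bool], int]],
    data: Sequence[Example],
    bits_per_literal: float = 4.0,
) -> float:
    """Total description length: label code lengths per rule plus a structure cost."""
    remaining = list(data)
    total = 0.0
    for condition, num_literals in rules:
        covered = [(x, label) for x, label in remaining if condition(x)]
        remaining = [(x, label) for x, label in remaining if not condition(x)]
        positives = sum(1 for _, label in covered if label)
        total += bernoulli_sc_bits(len(covered), positives) + bits_per_literal * num_literals
    # The final "else" clause encodes whatever no earlier rule covered.
    positives = sum(1 for _, label in remaining if label)
    return total + bernoulli_sc_bits(len(remaining), positives)


data = [(float(v), v > 5) for v in range(10)]  # toy data: values above 5 are outliers
empty_list = decision_list_bits([], data)      # default clause only
one_rule = decision_list_bits([(lambda x: x > 5, 1)], data)
```

The one-rule list costs about 9.1 bits against about 11.2 bits for the default clause alone, so the criterion accepts the extra rule because it compresses the labels by more than its own structure cost.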

Rule Generator – DL-ESC/DL-SC (cont.)

If $\xi$ makes $t_1$ true, then $\mu = v_1$ with probability $p_1$

else if $\xi$ makes $t_2$ true, then $\mu = v_2$ with probability $p_2$

...

else $\mu = v_s$ with probability $p_s$
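Evaluating such a list on a datum $\xi$ amounts to testing the terms $t_1, t_2, \ldots$ in order and returning the pair $(v_i, p_i)$ of the first term that holds, or the final default pair if none does. The sketch below assumes terms are arbitrary predicates over a dictionary of attributes; the attribute names come from the KDD Cup 1999 dataset, but the rules, thresholds, and probabilities are made up.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Sequence, Tuple

Datum = Dict[str, Any]


@dataclass
class Rule:
    term: Callable[[Datum], bool]  # t_i: a test on the datum's attributes
    label: str                     # v_i: the class predicted when t_i holds
    prob: float                    # p_i: probability attached to that prediction


def evaluate(rules: Sequence[Rule], default_label: str, default_prob: float,
             datum: Datum) -> Tuple[str, float]:
    # Rules are tried in order; the first term the datum satisfies decides the output.
    for rule in rules:
        if rule.term(datum):
            return rule.label, rule.prob
    # Final "else" clause: no term matched.
    return default_label, default_prob


# Hypothetical filtering rules, for illustration only.
rules = [
    Rule(lambda d: d["service"] == "http" and d["src_bytes"] > 10000, "outlier", 0.9),
    Rule(lambda d: d["duration"] > 300, "outlier", 0.7),
]
first = evaluate(rules, "normal", 0.95, {"service": "http", "src_bytes": 54540, "duration": 0})
second = evaluate(rules, "normal", 0.95, {"service": "http", "src_bytes": 200, "duration": 0})
```

Because order matters, a datum matching several terms is handled only by the earliest one, which is what makes each rule read as a standalone, human-checkable pattern.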

Experimentation - Network intrusion detection

The purpose of the experiment is to detect intrusions without using the intrusion labels

Experimentation – Dataset (cont.)

Using the KDD Cup 1999 dataset, prepared for network intrusion detection

Using 13 attributes for DL-ESC

Using four attributes for SmartSifter (service, duration, src_bytes, dst_bytes)

Only “service” is categorical

Transforming each continuous attribute $x$ into $y = \ln(x + 0.1)$

Generating five datasets: S0, S1, S2, S3, and S4
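The preprocessing for SmartSifter's four attributes can be sketched as below: "service" is passed through as a categorical value and each continuous attribute is mapped to $\ln(x + 0.1)$. The function name and the sample record values are hypothetical; the offset of 0.1 presumably keeps zero-valued attributes finite, since $\ln 0$ is undefined.

```python
import math
from typing import Any, Dict

CONTINUOUS = ("duration", "src_bytes", "dst_bytes")


def preprocess(record: Dict[str, Any]) -> Dict[str, Any]:
    """Keep 'service' as a categorical value and log-transform the continuous attributes."""
    out = {"service": record["service"]}
    for name in CONTINUOUS:
        # Natural log with a +0.1 offset, so a value of 0 maps to ln(0.1) instead of -infinity.
        out[name] = math.log(record[name] + 0.1)
    return out


row = preprocess({"service": "http", "duration": 0, "src_bytes": 181, "dst_bytes": 5450})
```

The log scale also compresses byte counts that span several orders of magnitude, which suits a Gaussian mixture better than the raw values.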

Experimentation – Dataset (cont.)

Experimentation – Illustration by an Example (cont.)

Update Rule – S1

First Rule – S1

Update Rule – S2

Experimental Results

SS: SmartSifter

R&S: Rule and SmartSifter (this framework)

Using S0 as the training set to construct a filtering rule, and each of S1, S2, S3, and S4 as a test set

Experimental Results (cont.)

Experimental Results (cont.)

Conclusion

This new framework has two features

Improving the power of SmartSifter

Helping the user discover general patterns among the outliers

Opinion

Making the detection process more effective and more understandable

This framework could also be applied to other fields