On Reducing Classifier Granularity in Mining Concept-Drifting Data Streams Peng Wang, H. Wang, X....

On Reducing Classifier Granularity in Mining Concept-Drifting Data Streams

Peng Wang, H. Wang, X. Wu, W. Wang, and B. Shi
Proc. of the Fifth IEEE International Conference on Data Mining (ICDM’05)

Speaker: Yu Jiun Liu
Date: 2006/9/26

Introduction

State of the art:
    Incrementally updated classifiers.
    Ensemble classifiers.

Model granularity:
    Traditional: monolithic.
    This paper: semantic decomposition.

Motivation

The model is decomposable into smaller components.

The decomposition is semantic-aware.

Monolithic Models

Stream: $r_1, \dots, r_k, \dots$
Attributes: $A_1, \dots, A_d$
Class label: $C_i$
Window: $W_i$ over records $r_i, \dots, r_{i+w-1}$
Model (classifier): $C_i$

Rule-based Models

A rule has the form $p_1 \wedge p_2 \wedge \dots \wedge p_k \rightarrow C_j$.

Example with minsup = 0.3 and minconf = 0.8: the slide lists the valid rules of $W_1$ and the valid rules of $W_3$.
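To make the role of minsup and minconf concrete, here is a minimal sketch of how a rule could be checked against one window. It assumes the usual associative-classification definitions (support is the fraction of window records that match the rule's predicates and carry its class label; confidence is that count divided by the number of records matching the predicates). The names `support_confidence` and `is_valid` and the record layout are hypothetical, not taken from the slides.

```python
from typing import Dict, List, Tuple

# Assumed record layout: (attribute -> value mapping, class label).
Record = Tuple[Dict[str, str], str]


def support_confidence(window: List[Record],
                       pattern: Dict[str, str],
                       label: str) -> Tuple[float, float]:
    """Support and confidence of the rule `pattern -> label` in one window."""
    # Class labels of the records that satisfy every predicate of the rule.
    matched = [cls for attrs, cls in window
               if all(attrs.get(a) == v for a, v in pattern.items())]
    hits = sum(1 for cls in matched if cls == label)
    support = hits / len(window) if window else 0.0
    confidence = hits / len(matched) if matched else 0.0
    return support, confidence


def is_valid(window: List[Record], pattern: Dict[str, str], label: str,
             minsup: float = 0.3, minconf: float = 0.8) -> bool:
    """A rule is valid when it meets both thresholds (defaults from the slide)."""
    sup, conf = support_confidence(window, pattern, label)
    return sup >= minsup and conf >= minconf
```

With a toy window of five records, a rule that matches four records but predicts the right class for only three of them has support 0.6 and confidence 0.75, so it fails the minconf = 0.8 threshold.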

Algorithm

Phase 1: Initialization
    Use the first w records to train all valid rules for window $W_1$.
    Construct the RS-tree and REC-tree.

Phase 2: Update
    When record $r_{i+w}$ arrives, insert it into the REC-tree and update the support and confidence of the rules matched by it.
    Delete the oldest record and update the support and confidence of the rules matched by it.
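The update phase can be sketched as follows, under a strong simplification: the rule set is fixed and its counts live in plain lists, whereas the slides keep rules in an RS-tree and records in a REC-tree. The class name `SlidingRuleStats` and its methods are hypothetical. The sketch only shows that each arriving or expiring record touches just the rules it matches.

```python
from collections import deque


class SlidingRuleStats:
    """Keeps match/hit counts of a fixed rule set over the last w records."""

    def __init__(self, rules, w):
        # rules: list of (pattern, label); pattern is a tuple of (attr, value).
        self.rules = rules
        self.w = w
        self.window = deque()
        self.match = [0] * len(rules)  # records satisfying the pattern
        self.hit = [0] * len(rules)    # ... that also carry the rule's label

    def _apply(self, record, delta):
        attrs, cls = record
        for k, (pattern, label) in enumerate(self.rules):
            if all(attrs.get(a) == v for a, v in pattern):
                self.match[k] += delta
                if cls == label:
                    self.hit[k] += delta

    def push(self, record):
        """Insert the newest record, then expire the oldest if over capacity."""
        self.window.append(record)
        self._apply(record, +1)
        if len(self.window) > self.w:
            self._apply(self.window.popleft(), -1)

    def stats(self, k):
        """(support, confidence) of rule k over the current window."""
        sup = self.hit[k] / len(self.window) if self.window else 0.0
        conf = self.hit[k] / self.match[k] if self.match[k] else 0.0
        return sup, conf
```

Discovering rules that become valid only after a drift is not covered here; the sketch maintains statistics for rules that are already known.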

Data Structure

RS-Tree

A prefix tree with attribute order.
Each node N represents a unique rule R: $P \rightarrow C_i$.
N' ($P' \rightarrow C_j$) is a child node of N iff:
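As a rough illustration of a prefix tree over rules, the sketch below stores each rule's predicates sorted by a fixed attribute order, so rules with a common prefix share nodes. It is a plain trie, not the RS-tree itself: the exact child condition is not preserved in this transcript, and the names `RuleNode`, `RuleTrie`, `insert` and `matching_rules` are hypothetical.

```python
class RuleNode:
    def __init__(self):
        self.children = {}   # (attribute, value) -> RuleNode
        self.labels = set()  # class labels of the rules that end at this node


class RuleTrie:
    """Plain prefix tree over rules; an assumption-laden stand-in for an RS-tree."""

    def __init__(self, attribute_order):
        # Rank of each attribute in the fixed order used to sort predicates.
        self.rank = {a: i for i, a in enumerate(attribute_order)}
        self.root = RuleNode()

    def insert(self, pattern, label):
        """Store the rule `pattern -> label`; pattern maps attribute -> value."""
        node = self.root
        for pred in sorted(pattern.items(), key=lambda kv: self.rank[kv[0]]):
            if pred not in node.children:
                node.children[pred] = RuleNode()
            node = node.children[pred]
        node.labels.add(label)

    def matching_rules(self, record):
        """All stored rules whose predicates are satisfied by the record."""
        found = []

        def walk(node, prefix):
            for label in node.labels:
                found.append((dict(prefix), label))
            for (attr, value), child in node.children.items():
                # Descend only along predicates the record satisfies.
                if record.get(attr) == value:
                    walk(child, prefix + [(attr, value)])

        walk(self.root, [])
        return found
```

Because a lookup descends only along satisfied predicates, a record reaches just the rules it matches, which is what makes per-record updates of rule statistics cheap.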

REC-Tree

Each record r is represented as a sequence.

Node N points to a rule in the RS-tree if:

Detecting Concept Drifts

Percentage vs. distribution of the misclassified records.

The percentage approach cannot tell us which part of the classifier gives rise to the inaccuracy.
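The contrast between the two views can be illustrated with a small sketch. It assumes each classified record is logged as a triple (rule that fired, predicted label, actual label); this logging format and the function names are hypothetical, and the grouping by rule is only an illustration, not the definition used in the paper.

```python
from collections import Counter


def error_percentage(outcomes):
    """Fraction of records whose predicted label differs from the actual one."""
    wrong = sum(1 for _, predicted, actual in outcomes if predicted != actual)
    return wrong / len(outcomes) if outcomes else 0.0


def misclassified_by_rule(outcomes):
    """Number of misclassified records attributed to each rule that fired."""
    return Counter(rule for rule, predicted, actual in outcomes
                   if predicted != actual)
```

Two windows can have the same error percentage while their errors concentrate on different rules. Only the per-rule distribution points to the components of the model that need revising.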

Definition

Finding Rule Algorithm

Update Algorithm

Experiments

CPU: 1.7 GHz
Memory: 256 MB
Datasets: synthetic and real-life.
    Synthetic:
    Real-life: 10,344 records and 8 dimensions.

Effect of model updating

Synthetic data, 10 dimensions
Window size: 5000
4 dimensions changing

The relation of concept drifts and $N_{ij}$

Effect of rule composition

Accuracy and Time

Window size: 10,000
EC: 10 classifiers, each trained on 1000 records.
Synthetic data.

Real life data

Conclusion

Overcome the effects of concept drifts.

By reducing granularity, change detection and model update can be more efficient without compromising classification accuracy.