On Reducing Classifier Granularity in Mining Concept-Drifting Data Streams
Peng Wang, H. Wang, X. Wu, W. Wang, and B. Shi
Proc. of the Fifth IEEE International Conference on Data Mining (ICDM’05)
Speaker: Yu Jiun Liu
Date: 2006/9/26
Introduction
State of the art: incrementally updated classifiers and ensemble classifiers.
Model granularity:
Traditional: monolithic
This paper: semantic decomposition
Motivation
The model is decomposable into smaller components.
The decomposition is semantic-aware.
Monolithic Models
Stream: $r_1, \ldots, r_k, \ldots$
Attributes: $A_1, \ldots, A_d$
Class label: $C_i$
Window: $W_i$ over records $r_i, \ldots, r_{i+w-1}$
Model (classifier): $C_i$
Rule-based Models
A rule form: $p_1 \wedge p_2 \wedge \cdots \wedge p_k \rightarrow C_j$
minsup = 0.3 and minconf = 0.8
Valid rules of W1 are:
Valid rules of W3 are:
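The validity test on this slide can be illustrated with a minimal sketch. It assumes the standard association-rule definitions (support is the fraction of records in the window that match the rule's predicates and carry its class label; confidence is that count divided by the number of records matching the predicates), which the transcript does not spell out. The record layout, the function names `rule_stats` and `is_valid`, and the toy window are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical record layout: (attribute -> value, class label).
Record = Tuple[Dict[str, str], str]


def rule_stats(window: List[Record], pattern: Dict[str, str], label: str) -> Tuple[float, float]:
    """Return (support, confidence) of the rule `pattern -> label` over `window`.

    Assumes the standard association-rule definitions; the slides do not
    give the exact formulas.
    """
    n_pattern = 0  # records matching every predicate of the rule
    n_both = 0     # records matching the predicates AND carrying the label
    for attrs, cls in window:
        if all(attrs.get(a) == v for a, v in pattern.items()):
            n_pattern += 1
            if cls == label:
                n_both += 1
    support = n_both / len(window) if window else 0.0
    confidence = n_both / n_pattern if n_pattern else 0.0
    return support, confidence


def is_valid(window: List[Record], pattern: Dict[str, str], label: str,
             minsup: float = 0.3, minconf: float = 0.8) -> bool:
    """A rule is valid when it meets both thresholds (values from the slide)."""
    support, confidence = rule_stats(window, pattern, label)
    return support >= minsup and confidence >= minconf


# Toy window, invented for illustration only.
window = [
    ({"A1": "a", "A2": "x"}, "C1"),
    ({"A1": "a", "A2": "y"}, "C1"),
    ({"A1": "a", "A2": "x"}, "C2"),
    ({"A1": "b", "A2": "x"}, "C2"),
    ({"A1": "a", "A2": "y"}, "C1"),
]

# A1=a -> C1 matches 4 records, 3 of them labelled C1: support 0.6, confidence 0.75.
print(rule_stats(window, {"A1": "a"}, "C1"))
# A2=y -> C1 matches 2 records, both labelled C1: support 0.4, confidence 1.0.
print(is_valid(window, {"A2": "y"}, "C1"))
```

In this toy window the first rule fails the minconf threshold while the second passes both thresholds.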
Algorithm
Phase 1: Initialization
Use the first w records to train all valid rules for window W1.
Construct the RS-tree and REC-tree.
Phase 2: Update
When record $r_{i+w}$ arrives, insert it into the REC-tree and update the support and confidence of the rules matched by it.
Delete the oldest record and update the support and confidence of the rules matched by it.
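The update phase can be sketched as incremental counting over a sliding window. This is a simplified illustration, not the paper's method: it scans a fixed list of candidate rules linearly, whereas the slides use the RS-tree and REC-tree to locate the matched rules, and it does not discover new rules. The class name `SlidingRuleCounter` and its methods are hypothetical.

```python
from collections import deque


class SlidingRuleCounter:
    """Keep per-rule counts up to date as a window of size w slides.

    Assumption: rules are (pattern dict, class label) pairs and are matched
    by a linear scan; the slides use tree structures for this step.
    """

    def __init__(self, rules, w):
        self.rules = rules
        self.w = w
        self.window = deque()
        self.n_pattern = [0] * len(rules)  # records matching the predicates
        self.n_both = [0] * len(rules)     # ... and carrying the rule's label

    def _apply(self, record, delta):
        """Add (+1) or remove (-1) one record's contribution to every matched rule."""
        attrs, cls = record
        for i, (pattern, label) in enumerate(self.rules):
            if all(attrs.get(a) == v for a, v in pattern.items()):
                self.n_pattern[i] += delta
                if cls == label:
                    self.n_both[i] += delta

    def push(self, record):
        """Insert the arriving record, then expire the oldest one if the window is full."""
        self.window.append(record)
        self._apply(record, +1)
        if len(self.window) > self.w:
            oldest = self.window.popleft()
            self._apply(oldest, -1)

    def stats(self, i):
        """Return (support, confidence) of rule i over the current window."""
        n = len(self.window)
        support = self.n_both[i] / n if n else 0.0
        confidence = self.n_both[i] / self.n_pattern[i] if self.n_pattern[i] else 0.0
        return support, confidence


# Toy stream, invented for illustration only.
counter = SlidingRuleCounter(rules=[({"A1": "a"}, "C1")], w=3)
stream = [
    ({"A1": "a"}, "C1"),
    ({"A1": "a"}, "C1"),
    ({"A1": "b"}, "C2"),
    ({"A1": "a"}, "C2"),  # pushes the first record out of the window
]
for record in stream:
    counter.push(record)
print(counter.stats(0))
```

Each arrival touches only the counters of the rules that the new and the expired record match, so no rule is recomputed from scratch.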
Data Structure
RS-Tree
A prefix tree with attribute order
Each node N represents a unique rule R: $P \rightarrow C_i$
N’ ($P' \rightarrow C_j$) is a child node of N, iff:
REC-Tree
Each record r as a sequence
Node N points to rule in the RS-tree if:
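The idea of storing each record as a sequence in a tree can be illustrated with a plain counting prefix tree. This is only a sketch under stated assumptions: the sequence is taken to be the attribute values in a fixed attribute order followed by the class label, and the links from record nodes to RS-tree rules are omitted because the transcript does not preserve their condition. The names `RecNode` and `RecordTrie` are hypothetical.

```python
class RecNode:
    """One node of the prefix tree; count = records in the window passing through it."""

    def __init__(self):
        self.count = 0
        self.children = {}


class RecordTrie:
    """Counting prefix tree over records (an assumption-laden stand-in for the REC-tree)."""

    def __init__(self, attr_order):
        self.attr_order = attr_order  # assumed fixed attribute order
        self.root = RecNode()

    def _key(self, record):
        """Turn a record into its sequence: ordered (attribute, value) pairs, then the label."""
        attrs, cls = record
        return [(a, attrs[a]) for a in self.attr_order] + [("class", cls)]

    def insert(self, record):
        node = self.root
        node.count += 1
        for item in self._key(record):
            node = node.children.setdefault(item, RecNode())
            node.count += 1

    def delete(self, record):
        """Remove one previously inserted record, pruning branches that become empty."""
        node = self.root
        node.count -= 1
        for item in self._key(record):
            child = node.children[item]
            child.count -= 1
            if child.count == 0:
                del node.children[item]  # the rest of the path is empty too
                return
            node = child

    def count_prefix(self, prefix):
        """Number of stored records whose sequence starts with `prefix`."""
        node = self.root
        for item in prefix:
            node = node.children.get(item)
            if node is None:
                return 0
        return node.count


# Toy records, invented for illustration only.
trie = RecordTrie(["A1", "A2"])
r1 = ({"A1": "a", "A2": "x"}, "C1")
r2 = ({"A1": "a", "A2": "y"}, "C2")
trie.insert(r1)
trie.insert(r2)
print(trie.count_prefix([("A1", "a")]))
trie.delete(r1)
print(trie.count_prefix([("A1", "a")]))
```

Records sharing a prefix share nodes, so inserting or expiring a record touches a single root-to-leaf path.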
Detecting Concept Drifts
The percentage of misclassified records vs. the distribution of the misclassified records.
The percentage approach cannot tell us which part of the classifier gives rise to the inaccuracy.
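The contrast between the two signals can be shown with a small sketch. It assumes each classified record is attributed to the rule that classified it, which is an illustrative choice rather than the paper's definition (the Definition slide survives only as an image). The function names, rule identifiers and both toy windows are hypothetical.

```python
from collections import Counter


def error_rate(results):
    """Fraction of misclassified records; results are (rule_id, correct?) pairs."""
    return sum(1 for _, ok in results if not ok) / len(results)


def error_distribution(results):
    """Number of misclassified records attributed to each rule."""
    return Counter(rule for rule, ok in results if not ok)


# Two toy windows with the same error percentage, invented for illustration.
window_a = [("R1", False), ("R2", True), ("R3", False),
            ("R1", True), ("R2", False), ("R3", True)]
window_b = [("R1", False), ("R1", False), ("R1", False),
            ("R2", True), ("R3", True), ("R2", True)]

print(error_rate(window_a), error_rate(window_b))
print(error_distribution(window_a))
print(error_distribution(window_b))
```

Both windows have half of their records misclassified, but in the second all errors fall on a single rule, which is the kind of localized signal that a percentage alone hides.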
Definition
Finding Rule Algorithm
Update Algorithm
Experiments
CPU: 1.7 GHz
Memory: 256 MB
Datasets: synthetic and real-life.
Synthetic:
Real-life dataset: 10,344 records and 8 dimensions.
Effect of model updating
Synthetic data, 10 dimensions
Window size: 5000
4 dimensions changing
The relation of concept drifts and $N_{ij}$
Effect of rule composition
Accuracy and Time
Window size: 10,000
EC: 10 classifiers, each trained on 1000 records.
Synthetic data.
Real life data
Conclusion
Overcome the effects of concept drifts.
By reducing granularity, change detection and model update can be more efficient without compromising classification accuracy.