Debellor Data Mining Platform with Stream Architecture Marcin Wojnarski Warsaw University, Poland.
Debellor: Data Mining Platform with Stream Architecture
Marcin Wojnarski
Warsaw University, Poland
Outline
Debellor – data mining platform
Motivation
Main features
Architecture: cell, data streaming, multi-threading
Available in ver. 0.6
Future releases
Summary
Debellor
Language: Java
Licence: open source (GPL)
Download: www.debellor.org
Debello – to conquer (Latin). Debellor – conqueror of data
Debellor – data mining platform
[Diagram: the Debellor platform with algorithm libraries attached: Rseslib, Weka, TA-Lib, LibSVM, own…]
Motivation
Demand for more complex algorithms.
Necessity to combine elementary algorithms.
Motivation
1. Data Processing Network (DPN)
[Diagram: network of processing steps labelled Load, Preprocess, Predict, Preprocess, Save, Load, Visualize]
Motivation
2. Committee of algorithms
[Diagram: Classifier A, Classifier B and Classifier C feed a Voting step]
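A committee of this kind can be sketched as a simple majority vote. The `Committee` and `CommitteeDemo` classes below are hypothetical illustrations, not part of Debellor; the members are modelled as plain functions from a sample to a label, and ties go to the label that first reaches the top count.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical committee (not Debellor's API): each member maps a sample to a label.
class Committee {
    // Returns the label with the most votes; on a tie, the label that reached that count first.
    static <X> String vote(List<Function<X, String>> members, X sample) {
        Map<String, Integer> tally = new HashMap<>();
        String winner = null;
        int best = 0;
        for (Function<X, String> member : members) {
            String label = member.apply(sample);
            int votes = tally.merge(label, 1, Integer::sum);
            if (votes > best) {
                best = votes;
                winner = label;
            }
        }
        return winner;
    }
}

class CommitteeDemo {
    // Three toy threshold classifiers standing in for Classifier A, B and C.
    static String classify(double x) {
        Function<Double, String> a = v -> v > 0 ? "pos" : "neg";
        Function<Double, String> b = v -> v > 5 ? "pos" : "neg";
        Function<Double, String> c = v -> v > -5 ? "pos" : "neg";
        List<Function<Double, String>> members = Arrays.asList(a, b, c);
        return Committee.vote(members, x);
    }

    public static void main(String[] args) {
        System.out.println(classify(3.0)); // A and C vote "pos", B votes "neg"
    }
}
```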
Motivation
3. Nested algorithms
[Diagram: K-means nested inside an RBF neural network]
Requirements
Versatile
Efficient
Simple
Features of Debellor
All types of data processing algorithms
Extendible data types
Stream architecture → large data sets
Multi-threading
Immutability of data objects → safety
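Why immutable data objects give safety can be shown with a small sketch. `ImmutableSample` is a hypothetical class, not Debellor's own sample type: it copies its input on construction and exposes no setters, so a sample passed between cells or threads cannot change underneath its readers.

```java
// Hypothetical immutable data object (not Debellor's API).
final class ImmutableSample {
    private final double[] features;
    private final String label;

    ImmutableSample(double[] features, String label) {
        this.features = features.clone(); // defensive copy: later changes to the caller's array are not seen
        this.label = label;
    }

    double feature(int i) { return features[i]; }

    int size() { return features.length; }

    String label() { return label; }

    // A "modification" produces a new object and leaves the original untouched.
    ImmutableSample withLabel(String newLabel) {
        return new ImmutableSample(features, newLabel);
    }
}

class ImmutableDemo {
    public static void main(String[] args) {
        double[] raw = {1.0, 2.0};
        ImmutableSample s = new ImmutableSample(raw, "a");
        raw[0] = 99.0;                        // does not affect s
        ImmutableSample t = s.withLabel("b"); // s keeps label "a"
        System.out.println(s.feature(0) + " " + s.label() + " " + t.label());
    }
}
```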
Debellor
Algorithm Cell
Cell cell = new RseslibClassifier("C45");
cell.set("pruning", "true");
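One plausible way to support string-valued parameters such as `set("pruning", "true")` is a name-to-value map inside the cell. The `ParamCell` class and its `getBoolean` helper are assumptions made for illustration; the slides do not show how Debellor stores or reads parameters.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical base class; Debellor's real Cell may handle parameters differently.
class ParamCell {
    private final Map<String, String> params = new HashMap<>();

    // Stores a named parameter as a string, mirroring cell.set("pruning", "true").
    void set(String name, String value) {
        params.put(name, value);
    }

    // Reads a boolean parameter, falling back to a default when it was never set.
    boolean getBoolean(String name, boolean defaultValue) {
        String value = params.get(name);
        return value == null ? defaultValue : Boolean.parseBoolean(value);
    }
}

class ParamDemo {
    public static void main(String[] args) {
        ParamCell cell = new ParamCell();
        cell.set("pruning", "true");
        System.out.println(cell.getBoolean("pruning", false));
    }
}
```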
Cell – data source
cell.open();
Sample s1 = cell.next(),
s2 = cell.next(),
...
cell.close();
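The open/next/close protocol above can be illustrated with a self-contained sketch. `SampleSource`, `RangeSource` and `SourceDemo` are hypothetical names, not Debellor classes, and the convention that `next()` returns `null` at the end of the stream is an assumption the slides do not confirm.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a cell acting as a data source (not Debellor's API).
interface SampleSource {
    void open();      // prepare the stream
    double[] next();  // next sample, or null when exhausted (assumed convention)
    void close();     // release resources
}

// Produces the samples {0}, {1}, ..., {n-1} one at a time, never holding them all in memory.
class RangeSource implements SampleSource {
    private final int n;
    private int position;
    private boolean opened;

    RangeSource(int n) { this.n = n; }

    @Override public void open() { position = 0; opened = true; }

    @Override public double[] next() {
        if (!opened) throw new IllegalStateException("source is not open");
        return position < n ? new double[] { position++ } : null;
    }

    @Override public void close() { opened = false; }
}

class SourceDemo {
    // Drains a source with open/next/close and collects the first feature of each sample.
    static List<Double> drain(SampleSource source) {
        List<Double> seen = new ArrayList<>();
        source.open();
        try {
            for (double[] s = source.next(); s != null; s = source.next()) {
                seen.add(s[0]);
            }
        } finally {
            source.close(); // always close, even if processing fails
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println(drain(new RangeSource(3)));
    }
}
```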
Cell – data receiver
cell.setSource(anotherCell);
[Diagram: anotherCell feeds data into cell]
Trainable Cell
cell.setSource(…);
cell.learn();
[Diagram: the cell changes state from EMPTY to TRAINED]
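The EMPTY to TRAINED transition can be sketched with a toy trainable cell that learns the mean of the values it pulls from its source. `MeanCell` is hypothetical and only mirrors the `setSource` and `learn` calls shown on the slide; using an `Iterable<Double>` as the source is a simplification, not Debellor's actual source type.

```java
import java.util.Arrays;

// Hypothetical trainable cell (not Debellor's API): learns the mean of its input values.
class MeanCell {
    enum State { EMPTY, TRAINED }

    private Iterable<Double> source; // stands in for the source cell
    private State state = State.EMPTY;
    private double mean;

    void setSource(Iterable<Double> source) {
        this.source = source;
    }

    // Pulls every value from the source once, then moves the cell from EMPTY to TRAINED.
    void learn() {
        if (source == null) throw new IllegalStateException("no source set");
        long count = 0;
        double sum = 0.0;
        for (double value : source) {
            sum += value;
            count++;
        }
        if (count == 0) throw new IllegalStateException("source produced no data");
        mean = sum / count;
        state = State.TRAINED;
    }

    State state() { return state; }

    double mean() {
        if (state != State.TRAINED) throw new IllegalStateException("cell is not trained");
        return mean;
    }
}

class TrainDemo {
    public static void main(String[] args) {
        MeanCell cell = new MeanCell();
        System.out.println(cell.state());
        cell.setSource(Arrays.asList(1.0, 2.0, 3.0));
        cell.learn();
        System.out.println(cell.state() + " " + cell.mean());
    }
}
```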
Data Streaming
[Diagram: cells A and B connected in two modes, BATCH and STREAM]
The cell itself is responsible for requesting data.
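The pull-based idea, where the receiving cell asks for data, can be sketched with standard Java iterators. `CountingSource` (playing cell A) and `MappingStage` (playing cell B) are hypothetical names; the sketch only shows that A produces exactly as many values as B has requested, rather than handing over a complete batch.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.DoubleUnaryOperator;

// Hypothetical cell "A": produces 0.0, 1.0, 2.0, ... on demand and counts what it produced.
class CountingSource implements Iterator<Double> {
    private final int n;
    int produced = 0;

    CountingSource(int n) { this.n = n; }

    @Override public boolean hasNext() { return produced < n; }

    @Override public Double next() {
        if (!hasNext()) throw new NoSuchElementException();
        return (double) produced++;
    }
}

// Hypothetical cell "B": pulls one value at a time from upstream and transforms it.
class MappingStage implements Iterator<Double> {
    private final Iterator<Double> upstream;
    private final DoubleUnaryOperator f;

    MappingStage(Iterator<Double> upstream, DoubleUnaryOperator f) {
        this.upstream = upstream;
        this.f = f;
    }

    @Override public boolean hasNext() { return upstream.hasNext(); }

    @Override public Double next() { return f.applyAsDouble(upstream.next()); }
}

class StreamDemo {
    public static void main(String[] args) {
        CountingSource a = new CountingSource(1_000_000);
        MappingStage b = new MappingStage(a, x -> x * 2);
        b.next();
        b.next();
        // Only the two requested values were ever produced by A.
        System.out.println(a.produced);
    }
}
```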
Benefits of streaming
[Diagram: training of k-means on data X, with one case marked "crash!"]
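To illustrate how k-means can be trained without loading the whole data set, here is a minimal one-pass (sequential, MacQueen-style) variant over one-dimensional data. This is a swapped-in textbook technique chosen for brevity, not necessarily the stream-based k-means that Debellor ships; the class name `OnlineKMeans1D` and the initial centers are hypothetical.

```java
import java.util.Arrays;
import java.util.Iterator;

// Sequential k-means over a stream: memory use depends on k, not on the number of samples.
class OnlineKMeans1D {
    private final double[] centers;
    private final long[] counts;

    OnlineKMeans1D(double[] initialCenters) {
        centers = initialCenters.clone();
        counts = new long[centers.length];
    }

    // Moves the nearest center towards x so that it stays the running mean of its assigned samples.
    void update(double x) {
        int best = 0;
        for (int j = 1; j < centers.length; j++) {
            if (Math.abs(x - centers[j]) < Math.abs(x - centers[best])) best = j;
        }
        counts[best]++;
        centers[best] += (x - centers[best]) / counts[best];
    }

    // Consumes the stream one sample at a time; nothing is buffered.
    void learn(Iterator<Double> stream) {
        while (stream.hasNext()) update(stream.next());
    }

    double[] centers() { return centers.clone(); }
}

class KMeansDemo {
    public static void main(String[] args) {
        OnlineKMeans1D km = new OnlineKMeans1D(new double[] {0.0, 10.0});
        km.learn(Arrays.asList(1.0, 9.0, 3.0, 11.0).iterator());
        System.out.println(Arrays.toString(km.centers())); // [2.0, 10.0]
    }
}
```

Because each center's first assigned sample replaces the initial guess, the result is the per-cluster mean of the samples seen so far. A single pass is generally less accurate than iterating to convergence, which is the price of the constant memory footprint.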
Multi-threading
[Diagram: cells A and B, both in Thread_1]
Multi-threading
A.newThread();
[Diagram: cells A and B, now split across Thread_1 and Thread_2]
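One way to picture the effect of `A.newThread()` is a producer thread that drains cell A into a bounded buffer while cell B keeps pulling from that buffer in its own thread. The sketch below uses only `java.util.concurrent`; `ThreadedStage` is a hypothetical wrapper, and how Debellor actually implements `newThread()` is not shown in the slides.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.stream.IntStream;

// Hypothetical wrapper: the upstream iterator is drained by a separate thread into a bounded buffer.
// Upstream values must be non-null, because BlockingQueue rejects null elements.
class ThreadedStage implements Iterator<Integer> {
    private static final Object END = new Object(); // end-of-stream marker
    private final BlockingQueue<Object> buffer = new ArrayBlockingQueue<>(16);
    private Object peeked;

    ThreadedStage(Iterator<Integer> upstream) {
        Thread producer = new Thread(() -> {
            try {
                while (upstream.hasNext()) buffer.put(upstream.next()); // blocks when the buffer is full
                buffer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "Thread_2");
        producer.setDaemon(true);
        producer.start();
    }

    @Override public boolean hasNext() {
        if (peeked == null) {
            try {
                peeked = buffer.take(); // blocks until the producer delivers something
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return peeked != END;
    }

    @Override public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        Integer value = (Integer) peeked;
        peeked = null;
        return value;
    }
}

class ThreadDemo {
    static int sum(Iterator<Integer> it) {
        int total = 0;
        while (it.hasNext()) total += it.next();
        return total;
    }

    public static void main(String[] args) {
        Iterator<Integer> a = IntStream.rangeClosed(1, 100).iterator();
        System.out.println(sum(new ThreadedStage(a))); // 5050
    }
}
```

The bounded buffer keeps memory use fixed: if the consumer is slow, the producer simply waits in `put`.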
Available in version 0.6
Rseslib algorithms:
  classifiers (~20 algorithms)
Weka algorithms:
  ARFF reader
  classifiers (~60)
  filters (47)
Debellor algorithms:
  Train&Test evaluation
  k-means for large data (stream-based)
Data types:
  numeric and symbolic features
  vectors of features, vectors of vectors of …
Future releases
Multi-input & multi-output cells
Composite cells (e.g. meta-learning)
Serialization and copying
…
Summary
Platform
Stream architecture
Extendible
Multi-threaded
Weka & Rseslib partially integrated
www.debellor.org