Debellor: Data Mining Platform with Stream Architecture

Marcin Wojnarski

Warsaw University, Poland


Outline

Debellor – data mining platform

Motivation

Main features

Architecture: cell, data streaming, multi-threading

Available in version 0.6

Future releases

Summary

Debellor

Language: Java

Licence: open source (GPL)

Download: www.debellor.org

Debello: to conquer (Latin). Debellor: conqueror of data.

Debellor – data mining platform

[Diagram: Debellor linked with Rseslib, Weka, TA-Lib, LibSVM and own algorithms…]

Motivation

Demand for more complex algorithms.

Need to combine elementary algorithms.

Motivation

1. Data Processing Network (DPN)

[Diagram: a network of connected operations: Load, Preprocess, Predict, Preprocess, Save, Load, Visualize]
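A linear Load → Preprocess → Predict → Save chain is the simplest case of such a network. The Java sketch below models only that linear case (the Visualize branch is left out); `Dpn`, `add` and `run` are hypothetical names chosen for illustration, not Debellor's API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical sketch of a linear data processing network (not Debellor's API).
// Each stage maps one sample (a feature vector) to a new sample.
final class Dpn {
    private final List<Function<double[], double[]>> stages = new ArrayList<>();

    // Appends a stage, e.g. a preprocessing step or a predictor.
    Dpn add(Function<double[], double[]> stage) {
        stages.add(stage);
        return this;
    }

    // Passes one sample through all stages in the order they were added.
    double[] apply(double[] sample) {
        double[] x = sample;
        for (Function<double[], double[]> stage : stages) {
            x = stage.apply(x);
        }
        return x;
    }

    // Runs the network over a loaded data set; the caller decides where to save the result.
    List<double[]> run(List<double[]> loaded) {
        List<double[]> out = new ArrayList<>();
        for (double[] sample : loaded) {
            out.add(apply(sample));
        }
        return out;
    }
}
```

For example, `new Dpn().add(preprocess).add(predict).run(data)` chains a preprocessing function and a predictor, where `preprocess`, `predict` and `data` stand for user-supplied values.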

Motivation

2. Committee of algorithms

[Diagram: Classifier A, Classifier B and Classifier C feed their predictions into Voting]
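A minimal sketch of a committee combined by majority voting, assuming each member is simply a function from a feature vector to a class label. `Committee` is a hypothetical class, not part of Debellor, and the tie-breaking rule is an arbitrary choice of this sketch.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical committee of classifiers combined by majority voting.
final class Committee {
    private final List<Function<double[], String>> members;

    Committee(List<Function<double[], String>> members) {
        this.members = members;
    }

    // Returns the label predicted by most members. On a tie, the label that
    // reached the winning count first (in member order) is kept.
    String predict(double[] x) {
        Map<String, Integer> votes = new HashMap<>();
        String best = null;
        int bestCount = 0;
        for (Function<double[], String> member : members) {
            String label = member.apply(x);
            int count = votes.merge(label, 1, Integer::sum);
            if (count > bestCount) {
                best = label;
                bestCount = count;
            }
        }
        return best;
    }
}
```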

Motivation

3. Nested algorithms

[Diagram: K-means nested inside an RBF neural network]
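In an RBF network the centres of the hidden units can be found by k-means, so training the outer algorithm means running the inner one. The sketch below shows only that nesting: `KMeans` and `RbfLayer` are hypothetical classes, k-means is initialised with the first k points purely for determinism, and the linear output layer of the network is omitted.

```java
// Hypothetical nested algorithm: an RBF hidden layer trained by an inner k-means.
final class KMeans {
    // Plain Lloyd's k-means. Requires data.length >= k; the first k points
    // serve as initial centres (a simplification, not a recommended init).
    static double[][] fit(double[][] data, int k, int iterations) {
        double[][] c = new double[k][];
        for (int j = 0; j < k; j++) c[j] = data[j].clone();
        for (int it = 0; it < iterations; it++) {
            double[][] sum = new double[k][data[0].length];
            int[] count = new int[k];
            for (double[] x : data) {
                int j = nearest(c, x);
                count[j]++;
                for (int d = 0; d < x.length; d++) sum[j][d] += x[d];
            }
            for (int j = 0; j < k; j++)          // empty clusters keep their old centre
                if (count[j] > 0)
                    for (int d = 0; d < sum[j].length; d++) c[j][d] = sum[j][d] / count[j];
        }
        return c;
    }

    static int nearest(double[][] centres, double[] x) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int j = 0; j < centres.length; j++) {
            double dist = sqDist(centres[j], x);
            if (dist < bestDist) { bestDist = dist; best = j; }
        }
        return best;
    }

    static double sqDist(double[] a, double[] b) {
        double s = 0;
        for (int d = 0; d < a.length; d++) s += (a[d] - b[d]) * (a[d] - b[d]);
        return s;
    }
}

final class RbfLayer {
    private final int k;
    private final double width;
    private double[][] centres;

    RbfLayer(int k, double width) { this.k = k; this.width = width; }

    // Training the outer algorithm delegates to the nested one.
    void learn(double[][] data) { centres = KMeans.fit(data, k, 10); }

    // Gaussian activation of each hidden unit for input x.
    double[] activations(double[] x) {
        double[] a = new double[k];
        for (int j = 0; j < k; j++)
            a[j] = Math.exp(-KMeans.sqDist(centres[j], x) / (2 * width * width));
        return a;
    }

    double[][] centres() { return centres; }
}
```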

Requirements

Versatile

Efficient

Simple

Features of Debellor

All types of data processing algorithms

Extendible data types

Stream architecture, for large data sets

Multi-threading

Immutability of data objects, for safety
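Why immutability gives safety: an object that cannot change after construction can be handed to several cells, possibly running in different threads, without copying or locking. A hypothetical `ImmutableSample` class sketching the idea (Debellor has its own `Sample` class, whose internals are not shown in these slides):

```java
import java.util.Arrays;

// Hypothetical immutable data object: no method changes it after construction.
final class ImmutableSample {
    private final double[] features;
    private final String label;   // may be null for unlabelled samples

    ImmutableSample(double[] features, String label) {
        this.features = features.clone();   // defensive copy: the caller's array is not shared
        this.label = label;
    }

    double feature(int i) { return features[i]; }

    int size() { return features.length; }

    String label() { return label; }

    // "Modification" creates a new object; this one stays unchanged.
    ImmutableSample withLabel(String newLabel) { return new ImmutableSample(features, newLabel); }

    // Returns a copy, so callers cannot reach the internal array.
    double[] toArray() { return features.clone(); }

    @Override
    public String toString() { return Arrays.toString(features) + " -> " + label; }
}
```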

Debellor

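Algorithm as a Cell

Cell cell = new RseslibClassifier("C45");  // Rseslib's C4.5 decision tree wrapped as a cell
cell.set("pruning", "true");               // parameters are set by name, with string values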

Cell – data source

cell.open();
Sample s1 = cell.next(),
       s2 = cell.next(),
       ...
cell.close();
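The open/next/close protocol above can be modelled in a few lines. This is a self-contained sketch, not Debellor's code: `SourceCell` and `ListSourceCell` are hypothetical names, and returning `null` from `next()` at the end of data is an assumption of the sketch.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical model of a cell acting as a data source (not Debellor's classes).
interface SourceCell<T> {
    void open();   // start a pass over the data
    T next();      // next sample, or null at the end of data (assumption of this sketch)
    void close();  // end the pass and release resources
}

// A source backed by an in-memory list, standing in for e.g. a file reader.
final class ListSourceCell<T> implements SourceCell<T> {
    private final List<T> data;
    private Iterator<T> it;

    ListSourceCell(List<T> data) {
        this.data = data;
    }

    public void open() {
        it = data.iterator();
    }

    public T next() {
        if (it == null) {
            throw new IllegalStateException("call open() first");
        }
        return it.hasNext() ? it.next() : null;
    }

    public void close() {
        it = null;
    }
}
```

A consumer then reads the whole stream with `for (T s = cell.next(); s != null; s = cell.next())` between `open()` and `close()`.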

Cell – data receiver

cell.setSource(anotherCell);

[Diagram: anotherCell passes data to cell]
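Connecting cells with `setSource` gives a pull-based chain: the receiver asks its source for a sample only when it is itself asked for one. A self-contained sketch under that assumption; `PullCell`, `ListCell` and `MapCell` are hypothetical, and only the name `setSource` mirrors the call shown on the slide.

```java
import java.util.Iterator;
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical pull-based cells (not Debellor's API); next() returns null at end of data.
interface PullCell {
    void open();
    Double next();
    void close();
}

// Source cell: serves samples from an in-memory list.
final class ListCell implements PullCell {
    private final List<Double> data;
    private Iterator<Double> it;

    ListCell(List<Double> data) { this.data = data; }

    public void open() { it = data.iterator(); }

    public Double next() { return it.hasNext() ? it.next() : null; }

    public void close() { it = null; }
}

// Receiver cell: pulls one sample from its source each time it is asked
// for one, and returns the transformed value.
final class MapCell implements PullCell {
    private final UnaryOperator<Double> f;
    private PullCell source;

    MapCell(UnaryOperator<Double> f) { this.f = f; }

    void setSource(PullCell source) { this.source = source; }

    public void open() { source.open(); }   // opening the receiver opens its source

    public Double next() {
        Double x = source.next();            // ask the source for exactly one sample
        return x == null ? null : f.apply(x);
    }

    public void close() { source.close(); }
}
```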

Trainable Cell

cell.setSource(…);
cell.learn();

[Diagram: learn() moves the cell from state EMPTY to state TRAINED]
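The EMPTY → TRAINED life cycle can be sketched with a trivial trainable cell that learns the mean of its training stream and then outputs centred samples. `CenteringCell` is hypothetical, not a Debellor class; for brevity its source is a plain `List<Double>` rather than another cell.

```java
import java.util.Iterator;
import java.util.List;

// Hypothetical trainable cell: starts EMPTY, becomes TRAINED after learn().
final class CenteringCell {
    enum State { EMPTY, TRAINED }

    private State state = State.EMPTY;
    private List<Double> source;   // simplified source: any list of samples
    private Iterator<Double> it;
    private double mean;

    void setSource(List<Double> source) { this.source = source; }

    State state() { return state; }

    // Training: one pass over the source, keeping only a running sum and count.
    void learn() {
        double sum = 0;
        int n = 0;
        for (double x : source) { sum += x; n++; }
        if (n == 0) throw new IllegalStateException("no training data");
        mean = sum / n;
        state = State.TRAINED;
    }

    // Only a trained cell can produce output.
    void open() {
        if (state != State.TRAINED) throw new IllegalStateException("cell is EMPTY; call learn() first");
        it = source.iterator();
    }

    // Each source sample shifted by the learned mean; null at the end of data.
    Double next() { return it.hasNext() ? it.next() - mean : null; }

    void close() { it = null; }
}
```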

Data Streaming

[Diagram: data passed from cell A to cell B in two modes, BATCH and STREAM]

It is the cell itself that is responsible for asking for data.
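The difference between the two modes, sketched with hypothetical classes `Producer` (standing in for cell A) and `BatchVsStream` (two versions of cell B); neither is Debellor code. In batch mode the whole output of A is collected before B starts; in stream mode B asks A for one sample at a time, so memory use stays constant and B can stop early.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical cell A: generates the samples 0, 1, 2, ... on demand.
final class Producer implements Iterator<Integer> {
    private final int total;
    int produced = 0;   // how many samples A has generated so far

    Producer(int total) { this.total = total; }

    public boolean hasNext() { return produced < total; }

    public Integer next() {
        if (!hasNext()) throw new NoSuchElementException();
        return produced++;
    }
}

// Two hypothetical versions of cell B, both summing A's output.
final class BatchVsStream {
    // BATCH: A's whole output is stored before B starts (memory grows with n).
    static long batchSum(Producer a) {
        List<Integer> all = new ArrayList<>();
        while (a.hasNext()) all.add(a.next());
        long sum = 0;
        for (int x : all) sum += x;
        return sum;
    }

    // STREAM: B asks A for one sample at a time (constant memory) and stops
    // once the sum reaches limit, so A generates no surplus samples.
    static long streamSumUntil(Producer a, long limit) {
        long sum = 0;
        while (a.hasNext() && sum < limit) sum += a.next();
        return sum;
    }
}
```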

Benefits of streaming

[Diagram: training of k-means on large data; the batch approach ends in a crash ("crash!"), the stream approach does not]
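One way to train k-means without keeping the data in memory, assuming the source can be reopened for every iteration: each Lloyd iteration makes a single pass over the stream and keeps only k running sums and counts. `StreamKMeans` is a hypothetical sketch; the slides do not describe how Debellor's stream-based k-means works internally.

```java
import java.util.Iterator;
import java.util.function.Supplier;

// Hypothetical k-means over a re-openable stream: memory is O(k * dim),
// independent of the number of samples.
final class StreamKMeans {
    static double[][] fit(Supplier<Iterator<double[]>> open, double[][] init, int iterations) {
        int k = init.length, dim = init[0].length;
        double[][] c = new double[k][];
        for (int j = 0; j < k; j++) c[j] = init[j].clone();
        for (int it = 0; it < iterations; it++) {
            double[][] sum = new double[k][dim];
            long[] count = new long[k];
            Iterator<double[]> stream = open.get();     // "open" the source for this pass
            while (stream.hasNext()) {                  // pull one sample at a time
                double[] x = stream.next();
                int j = nearest(c, x);
                count[j]++;
                for (int d = 0; d < dim; d++) sum[j][d] += x[d];
            }
            for (int j = 0; j < k; j++)                 // empty clusters keep their old centre
                if (count[j] > 0)
                    for (int d = 0; d < dim; d++) c[j][d] = sum[j][d] / count[j];
        }
        return c;
    }

    static int nearest(double[][] c, double[] x) {
        int best = 0;
        double bestDist = Double.MAX_VALUE;
        for (int j = 0; j < c.length; j++) {
            double dist = 0;
            for (int d = 0; d < x.length; d++) dist += (c[j][d] - x[d]) * (c[j][d] - x[d]);
            if (dist < bestDist) { bestDist = dist; best = j; }
        }
        return best;
    }
}
```

In use, the `Supplier` can generate or read samples lazily (for example from a file), so only the current sample and the k centres are ever held in memory.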

Multi-threading

[Diagram: cells A and B connected, both running in Thread_1]

Multi-threading

A.newThread();

[Diagram: after A.newThread(), cell A runs in Thread_2 while cell B stays in Thread_1]
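What `A.newThread()` could mean in practice, sketched with standard `java.util.concurrent` classes: A produces samples in a separate thread (`Thread_2`) and puts them into a bounded buffer, while B keeps reading in the calling thread. `ThreadedSource` is hypothetical and not Debellor's implementation; the bounded queue blocks A whenever B falls behind, so memory stays limited.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical wrapper running a producer (cell A) in its own thread.
final class ThreadedSource {
    private static final Object END = new Object();   // end-of-stream marker
    private final BlockingQueue<Object> buffer;
    private boolean finished = false;                  // used only by the consumer thread

    ThreadedSource(List<Integer> data, int capacity) {
        BlockingQueue<Object> q = new ArrayBlockingQueue<>(capacity);
        this.buffer = q;
        Thread producer = new Thread(() -> {
            try {
                for (Integer x : data) q.put(x);       // blocks while the buffer is full
                q.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();    // stop producing if interrupted
            }
        }, "Thread_2");
        producer.setDaemon(true);   // do not keep the JVM alive if B stops reading early
        producer.start();
    }

    // Called from B's thread; returns null at the end of data.
    Integer next() {
        if (finished) return null;
        try {
            Object x = buffer.take();                  // blocks while the buffer is empty
            if (x == END) {
                finished = true;
                return null;
            }
            return (Integer) x;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for data", e);
        }
    }
}
```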

Available in version 0.6

Rseslib algorithms: classifiers (~20 algorithms)

Weka algorithms: ARFF reader, classifiers (~60), filters (47)

Debellor algorithms: Train&Test evaluation, k-means for large data (stream-based)

Data types: numeric and symbolic features; vectors of features, vectors of vectors of …

Future releases

Multi-input & multi-output cells

Composite cells (e.g. meta-learning)

Serialization and copying

Summary

Platform

Stream architecture

Extendible

Multi-threaded

Weka & Rseslib partially integrated

www.debellor.org

