Post on 10-Aug-2020

Achieving Continuous Machine and Deep Learning with Apache Ignite and TensorFlow

Yuriy Babak

2019 © GridGain Systems

Speaker

Yuriy Babak
● Head of ML/DL framework development at GridGain
● Apache Ignite committer

Agenda

• ML introduction
• Apache Ignite ML overview
• Integration with TensorFlow
• Demo

Apache Ignite ML overview

ML Terms

● Raw data - business data (logs, transactions, …)
● Training/test set - data preprocessed from raw data into numeric features (e.g. age, user type id, gender, ...)
● Training algorithm - produces a model from a training set (GDB, Gibbs Sampling, Decision Tree building)
● Model - a formula for producing answers on new business data (decision tree, SVM, linear regression)
● Meta algorithm - a combinator of several models (boosting, bagging, stacking)
● Evaluation/Validation - estimation of model quality on a given test/validation set (Precision, MSE, ROC AUC)
● Inference - using an ML model for prediction
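To make the evaluation terms concrete, here is a minimal plain-Java sketch of two of the metrics named above, MSE and Precision. It does not use the Apache Ignite ML API; the class name `EvaluationSketch` and the sample arrays are assumptions made only for illustration.

```java
final class EvaluationSketch {
    /** Mean squared error: average of squared differences between labels and predictions. */
    static double mse(double[] yTrue, double[] yPred) {
        if (yTrue.length != yPred.length || yTrue.length == 0)
            throw new IllegalArgumentException("arrays must be non-empty and of equal length");
        double sum = 0.0;
        for (int i = 0; i < yTrue.length; i++) {
            double diff = yTrue[i] - yPred[i];
            sum += diff * diff;
        }
        return sum / yTrue.length;
    }

    /** Precision for binary labels (0/1): TP / (TP + FP); returns 0 if nothing was predicted positive. */
    static double precision(int[] yTrue, int[] yPred) {
        if (yTrue.length != yPred.length)
            throw new IllegalArgumentException("arrays must be of equal length");
        int tp = 0;
        int fp = 0;
        for (int i = 0; i < yTrue.length; i++) {
            if (yPred[i] == 1) {
                if (yTrue[i] == 1) tp++;
                else fp++;
            }
        }
        return (tp + fp) == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    public static void main(String[] args) {
        // Illustrative test-set labels and model predictions (made-up numbers).
        double[] yTrue = {3.0, -0.5, 2.0, 7.0};
        double[] yPred = {2.5, 0.0, 2.0, 8.0};
        System.out.println("MSE = " + mse(yTrue, yPred));

        int[] labels = {1, 0, 1, 1, 0};
        int[] predicted = {1, 1, 1, 0, 0};
        System.out.println("Precision = " + precision(labels, predicted));
    }
}
```

Both metrics are computed on a held-out test/validation set, never on the data the model was trained on.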

Typical ML tasks

1. Regression
2. Time-series regression
3. Classification (binary, multiclass)
4. Clustering
5. Ranking
6. Entity extraction, structure recognition
7. Generative models

Distributed model training

Training loop (diagram):

1. We have a model
2. Map: calculate local improvements
3. Reduce: aggregate local improvements
4. Update model (back to step 1)
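The map/reduce loop on this slide can be sketched in plain Java. The sketch assumes the "local improvement" is a gradient of squared error for a linear model, which is only one possible choice; the class `MapReduceTrainingSketch`, the `Partition` type and the sample data are hypothetical and are not part of Apache Ignite.

```java
import java.util.Arrays;
import java.util.List;

final class MapReduceTrainingSketch {
    /** One partition of the training set: feature rows x and labels y. */
    static final class Partition {
        final double[][] x;
        final double[] y;
        Partition(double[][] x, double[] y) { this.x = x; this.y = y; }
    }

    /** Map: gradient sum of squared error over one partition; the last slot holds the row count. */
    static double[] localImprovement(double[] w, Partition p) {
        double[] g = new double[w.length + 1];
        for (int i = 0; i < p.x.length; i++) {
            double err = -p.y[i];
            for (int j = 0; j < w.length; j++) err += w[j] * p.x[i][j];
            for (int j = 0; j < w.length; j++) g[j] += err * p.x[i][j];
        }
        g[w.length] = p.x.length;
        return g;
    }

    /** Reduce: element-wise sum of two local results. */
    static double[] aggregate(double[] a, double[] b) {
        double[] r = new double[a.length];
        for (int i = 0; i < a.length; i++) r[i] = a[i] + b[i];
        return r;
    }

    /** Repeats map, reduce and model update for a fixed number of iterations. */
    static double[] train(List<Partition> parts, int dim, double learningRate, int iterations) {
        double[] w = new double[dim];
        for (int it = 0; it < iterations; it++) {
            final double[] current = w; // the model every partition sees in this round
            double[] total = parts.stream()
                .map(p -> localImprovement(current, p))
                .reduce(new double[dim + 1], MapReduceTrainingSketch::aggregate);
            double rows = total[dim];
            double[] next = new double[dim];
            for (int j = 0; j < dim; j++) next[j] = w[j] - learningRate * total[j] / rows;
            w = next; // update the model
        }
        return w;
    }

    public static void main(String[] args) {
        // Made-up data following y = 2 * x + 1; the second feature is a constant 1 acting as the bias.
        List<Partition> parts = Arrays.asList(
            new Partition(new double[][] {{0, 1}, {1, 1}}, new double[] {1, 3}),
            new Partition(new double[][] {{2, 1}, {3, 1}}, new double[] {5, 7}));
        System.out.println(Arrays.toString(train(parts, 2, 0.1, 2000)));
    }
}
```

On this toy data the weights approach [2, 1]. In a real cluster the map step would run on the nodes that hold the partitions, and only the small aggregated arrays would travel over the network.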

Ignite partition-based dataset

Partition-based dataset structures (diagram):

● Partition Data - Upstream Cache, durable
● Learning Env - on-heap, stateless
● Dataset Context - Context Cache, durable
● Dataset Data - on-heap, recoverable

Dataset dataset = … // Partition based dataset, internal API

dataset.compute((env, ctx, data) -> map(...), (r1, r2) -> reduce(...))

Source data in each partition:

double[][] x = …
double[] y = ...
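The `compute(map, reduce)` call pattern can be illustrated with a plain-Java stand-in. This is not the Ignite internal API: `PartitionComputeSketch`, `PartitionData` and `labelMean` are hypothetical names, and the learning environment and context arguments (`env`, `ctx`) from the slide are left out to keep the sketch short.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.BinaryOperator;
import java.util.function.Function;

final class PartitionComputeSketch {
    /** Hypothetical per-partition data, mirroring the x / y arrays on the slide. */
    static final class PartitionData {
        final double[][] x;
        final double[] y;
        PartitionData(double[][] x, double[] y) { this.x = x; this.y = y; }
    }

    /** Stand-in for compute(map, reduce): applies map to each partition, then folds the results. */
    static <R> R compute(List<PartitionData> partitions,
                         Function<PartitionData, R> map,
                         BinaryOperator<R> reduce) {
        R result = null;
        for (PartitionData p : partitions) {
            R local = map.apply(p);
            result = (result == null) ? local : reduce.apply(result, local);
        }
        return result;
    }

    /** Example use: mean label over all partitions without gathering the rows in one place. */
    static double labelMean(List<PartitionData> partitions) {
        if (partitions.isEmpty()) throw new IllegalArgumentException("no partitions");
        double[] sumAndCount = compute(
            partitions,
            p -> {
                double sum = 0.0;
                for (double v : p.y) sum += v;
                return new double[] {sum, p.y.length};
            },
            (r1, r2) -> new double[] {r1[0] + r2[0], r1[1] + r2[1]});
        return sumAndCount[0] / sumAndCount[1];
    }

    public static void main(String[] args) {
        // Two made-up partitions; only the labels matter for this example.
        List<PartitionData> parts = Arrays.asList(
            new PartitionData(new double[][] {{0}, {1}, {2}}, new double[] {1, 2, 3}),
            new PartitionData(new double[][] {{3}, {4}}, new double[] {4, 5}));
        System.out.println("label mean = " + labelMean(parts));
    }
}
```

The map function returns a small summary (sum and count) rather than the rows themselves, which is what keeps the reduce step cheap.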

Apache Ignite ML API

List of pre-built models

Regression: Linear Regression (LSQR and SGD), Decision Tree, Random Forest, GBT, Nearest Neighbours (KNN), MLP.

Classification: SVM, Nearest Neighbours (ANN, KNN), Decision Tree, MLP, GBT, Logistic Regression.

Clustering: K-Means, Gaussian Mixture Model (GMM).

Preprocessing: Normalization, Binarization, Imputer, One-Hot Encoder, String Encoder, MinMax Scaler, MaxAbsScaler.

Ensembles: Boosting, Stacking, Bagging of any model.
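As an illustration of what one of the listed preprocessors does, here is a plain-Java sketch of min-max scaling, which maps each feature column to the range [0, 1]. It is not the Ignite MinMax Scaler implementation; `MinMaxScalerSketch` and its methods are hypothetical names, and mapping a constant column to 0 is an assumption of this sketch.

```java
import java.util.Arrays;

final class MinMaxScalerSketch {
    /** Fit: per-column minimum (result[0]) and maximum (result[1]) over all rows. */
    static double[][] fit(double[][] rows) {
        if (rows.length == 0) throw new IllegalArgumentException("no rows");
        int cols = rows[0].length;
        double[] min = new double[cols];
        double[] max = new double[cols];
        Arrays.fill(min, Double.POSITIVE_INFINITY);
        Arrays.fill(max, Double.NEGATIVE_INFINITY);
        for (double[] row : rows) {
            for (int j = 0; j < cols; j++) {
                min[j] = Math.min(min[j], row[j]);
                max[j] = Math.max(max[j], row[j]);
            }
        }
        return new double[][] {min, max};
    }

    /** Transform: (value - min) / (max - min) per column; constant columns map to 0. */
    static double[][] transform(double[][] rows, double[][] minMax) {
        double[][] scaled = new double[rows.length][];
        for (int i = 0; i < rows.length; i++) {
            scaled[i] = new double[rows[i].length];
            for (int j = 0; j < rows[i].length; j++) {
                double range = minMax[1][j] - minMax[0][j];
                scaled[i][j] = range == 0.0 ? 0.0 : (rows[i][j] - minMax[0][j]) / range;
            }
        }
        return scaled;
    }

    public static void main(String[] args) {
        // Made-up feature rows with two columns on very different scales.
        double[][] rows = {{1, 10}, {2, 20}, {3, 40}};
        double[][] scaled = transform(rows, fit(rows));
        System.out.println(Arrays.deepToString(scaled));
    }
}
```

The fitted minimums and maximums should come from the training set and then be reused unchanged when transforming test data or inference inputs.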

TensorFlow integration

Apache Ignite as data source

Distributed training

TensorFlow Cluster on top of Apache Ignite Cluster

TensorFlow/Ignite Client

ignite-tf.sh

A tool that helps submit TensorFlow code to the cluster and manage it.

Distributed training

Demo

Demo

Training and inference for pre-built models

Demo

TensorFlow model inference from Apache Ignite

Links

● Apache Ignite docs - https://apacheignite.readme.io/docs
● Apache Ignite sources - https://github.com/apache/ignite
● TensorFlow IO module - https://github.com/tensorflow/io/tree/master/tensorflow_io/ignite

Q&A

Attend the In-Memory Computing Summit
