An Introduction to Sparkling Water by Michal Malohlava


Transcript of An Introduction to Sparkling Water by Michal Malohlava

An introduction to Sparkling Water

Michal Malohlava h2o.ai

Who Am I?

Background

• PhD in CS from Charles University in Prague, Czech Republic

• Postdoc at Purdue University experimenting with algos for large-scale computation

• Now at H2O.ai

Experience with domain-specific languages, distributed systems, software engineering, and big data.

H2O.ai

H2O team

Co-Founders: Sri Ambati, Cliff Click

Scientific Advisory Council: Stephen Boyd, Rob Tibshirani, Trevor Hastie

H2O

Open-Source In-Memory Data Science Platform

• Highly optimized Java code (in-house)

• Distributed in-memory K-V store and map/reduce computation framework

• Data parser (HDFS, S3, NFS, HTTP, local drives, etc.)

• Read/write access to distributed data frames (R/Pandas-style)

• ML algos: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles

• REST API with clients: interactive UI, R, Python
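The K-V store and map/reduce bullet can be pictured with a toy, single-process sketch. It is not H2O code: the function names (`make_store`, `map_chunk`, `reduce_pair`, `column_mean`) and the chunk keys are hypothetical, and a plain dictionary stands in for memory spread across cluster nodes. It only shows the general shape of the idea: a column is stored as keyed chunks, each chunk is summarised locally, and the partial results are merged.

```python
from functools import reduce

# Toy stand-in for a distributed key-value store: each key maps to one
# chunk (a contiguous slice) of a single numeric column. In a real
# cluster the chunks would live in the memory of different nodes.
def make_store(column, chunk_size):
    return {
        f"chunk_{i // chunk_size}": column[i:i + chunk_size]
        for i in range(0, len(column), chunk_size)
    }

def map_chunk(chunk):
    # Map step: summarise one chunk locally as (sum, count).
    return (sum(chunk), len(chunk))

def reduce_pair(a, b):
    # Reduce step: merge two partial summaries into one.
    return (a[0] + b[0], a[1] + b[1])

def column_mean(store):
    # Run the map step on every chunk, then fold the partial results.
    partials = [map_chunk(chunk) for chunk in store.values()]
    total, count = reduce(reduce_pair, partials)
    return total / count

store = make_store([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0], chunk_size=3)
mean = column_mean(store)
```

Because the reduce step only ever sees small `(sum, count)` pairs, the raw chunks never have to leave the place where they are stored, which is the reason for pairing the store with a map/reduce framework.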

Sparkling Water

Sparkling Water

Provides

• Transparent integration of H2O into the Spark ecosystem

• Use of H2O Frames and algorithms with the Spark API

Excels in existing Spark workflows requiring advanced machine learning algorithms

Where to use Sparkling Water?

Model building

Data Source → Data munging → Modelling → Prediction processing

Modelling: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles

Where to use Sparkling Water?

Data parsing/munging

Data Source → Data load/munging/exploration → Modelling

• Load and parse data directly into H2OFrame

• Ad hoc data transformation

Where to use Sparkling Water?

Stream processing

Off-line model training: Data Source → Modelling

Deploy the model: export the model in a binary format or as code

Stream processing: Data Stream → Data munging → Model prediction
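The off-line training and stream scoring split can be sketched in a few lines. Everything here is hypothetical: `score` stands in for a model exported "as code" (a plain function with made-up coefficients baked in), and `munge` and `predict_stream` are illustrative names, not part of H2O or Spark. The point is only that an exported model needs no training cluster at prediction time.

```python
import math

# Hypothetical stand-in for a model exported "as code": a plain function
# with the trained coefficients baked in. The numbers are made up.
def score(row):
    z = -1.0 + 0.5 * row["x1"] + 2.0 * row["x2"]
    return 1.0 / (1.0 + math.exp(-z))

def munge(record):
    # Stream-side munging: parse a raw "x1,x2" string into a typed row.
    x1, x2 = record.split(",")
    return {"x1": float(x1), "x2": float(x2)}

def predict_stream(records):
    # Generator: munges and scores each record as it arrives.
    for record in records:
        yield score(munge(record))

predictions = list(predict_stream(["2.0,0.0", "0.0,1.0"]))
```

In a real deployment the list of strings would be replaced by a streaming source, and the scoring function by the artifact the training step exported.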

WHAT IS INSIDE?

Driver node: Scala/Py main program with SparkContext and H2OContext

Cluster manager

Worker nodes (three shown): each runs a Spark executor hosting H2O services

Data Source

Spark Cluster: Spark executors running H2O services

DataFrame (Spark) and H2OFrame (H2O services) live in the same cluster

• h2oContext.asDataFrame: H2OFrame to DataFrame

• h2oContext.asH2OFrame: DataFrame to H2OFrame
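One way to picture the two conversions is as a change of layout between a collection of rows and a set of named columns. The sketch below is a toy picture under that assumption, not the library's implementation: `as_columns` and `as_rows` are hypothetical helpers operating on ordinary Python lists and dictionaries, with no Spark or H2O involved.

```python
def as_columns(rows):
    # Rows -> columns: build one list of values per column name.
    names = list(rows[0].keys()) if rows else []
    return {name: [row[name] for row in rows] for name in names}

def as_rows(columns):
    # Columns -> rows: zip the column lists back into row dictionaries.
    names = list(columns.keys())
    return [dict(zip(names, values)) for values in zip(*columns.values())]

rows = [{"id": 1, "label": "a"}, {"id": 2, "label": "b"}]
columns = as_columns(rows)
round_trip = as_rows(columns)
```

The round trip returns the original rows, which mirrors how a workflow can move data into one representation for modelling and back into the other for further processing.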

TIME FOR DEMO

Key Points to Remember

Sparkling Water integrates H2O into Spark

• Enables using advanced machine learning algorithms inside Spark workflows

• Offers an eager computation model and a mutable data structure, H2OFrame
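The difference between an eager, mutable structure and a lazy, immutable one can be shown with a small toy. Both classes below are hypothetical illustrations written for this note; neither is the Spark nor the H2O API. `LazyPipeline` records transformations and runs them only on `collect()`, while `EagerFrame` runs each transformation immediately and overwrites its own values.

```python
class LazyPipeline:
    """Immutable and lazy: each map() returns a new pipeline, and
    nothing is computed until collect() is called."""

    def __init__(self, data, steps=()):
        self._data = data
        self._steps = steps

    def map(self, fn):
        # Record the step; do not touch the data yet.
        return LazyPipeline(self._data, self._steps + (fn,))

    def collect(self):
        # Run all recorded steps, in order, on a copy of the data.
        out = list(self._data)
        for fn in self._steps:
            out = [fn(x) for x in out]
        return out


class EagerFrame:
    """Mutable and eager: apply() runs immediately and overwrites the
    stored values in place."""

    def __init__(self, values):
        self.values = list(values)

    def apply(self, fn):
        for i, x in enumerate(self.values):
            self.values[i] = fn(x)
        return self


source = [1, 2, 3]
lazy = LazyPipeline(source).map(lambda x: x * 10)   # nothing computed yet
eager = EagerFrame(source).apply(lambda x: x * 10)  # computed right away
```

After these two lines `eager.values` already holds the scaled numbers, whereas `lazy` holds only a recorded step until `lazy.collect()` is called.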

THANK YOU.

@h2oai @mmalohlava

h2o.ai/download

github.com/h2oai/sparkling-water

Visit our booth for live demos and more!