Introduction to Sparkling Water - Spark Summit East 2016

Post on 12-Apr-2017

An introduction to Sparkling Water

Michal Malohlava h2o.ai

Who Am I?

Background

• PhD in CS from Charles University in Prague, Czech Republic

• Postdoc at Purdue University experimenting with algos for large-scale computation

• Now software engineer at H2O.ai

Experience with domain-specific languages, distributed systems, software engineering, and big data.

H2O.ai

H2O team

Co-Founders: Sri Ambati, Cliff Click

Scientific Advisory Council: Stephen Boyd, Rob Tibshirani, Trevor Hastie

H2O

Open-Source In-Memory Data Science Platform

• Highly optimized Java code (in-house)

• Distributed in-memory K-V store and map/reduce computation framework

• Data parser (HDFS, S3, NFS, HTTP, local drives, etc.)

• Read/write access to distributed data frames (R/Pandas-style)

• ML algos - Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles

• REST API clients: interactive UI, R, Python

Sparkling Water

Sparkling Water

Provides

• Transparent integration of H2O into Spark ecosystem

• Use H2O Frames and algorithms with Spark API

Excels in existing Spark workflows requiring advanced Machine Learning algorithms

TYPICAL USE CASES

Where to use Sparkling Water?

Model building

Data Source

Data munging

Modelling: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles

Prediction processing

Where to use Sparkling Water?

Data parsing, munging

Data Source

Data load/munging/exploration

Load and parse data directly into H2OFrame

Ad hoc data transformation

Modelling

Where to use Sparkling Water?

Off-line model training

Data Source

Modelling

Export model in a binary format or as code

Deploy the model

Stream processing

Data Stream

Data munging

Model prediction
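
The off-line training and stream processing split on this slide can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: it does not use H2O or Spark, `pickle` stands in for "a binary format", and the names `train_offline`, `export_model` and `score_stream` are hypothetical helpers invented for this sketch, as is the threshold "model".

```python
import pickle
from statistics import mean
from typing import Dict, Iterable, Iterator, List, Tuple


def train_offline(rows: List[Tuple[float, int]]) -> Dict[str, float]:
    """Off-line training: a toy 'model' that is just a threshold
    halfway between the means of the two classes."""
    pos = [x for x, label in rows if label == 1]
    neg = [x for x, label in rows if label == 0]
    return {"threshold": (mean(pos) + mean(neg)) / 2}


def export_model(model: Dict[str, float]) -> bytes:
    """Export: serialize the trained model to bytes (assumption: pickle
    plays the role of the binary export mentioned on the slide)."""
    return pickle.dumps(model)


def score_stream(model_bytes: bytes, stream: Iterable[str]) -> Iterator[int]:
    """Stream processing: load the deployed model once, then munge and
    score each incoming record as it arrives."""
    model = pickle.loads(model_bytes)
    for raw in stream:
        value = float(raw.strip())             # data munging
        yield int(value > model["threshold"])  # model prediction


# Train off-line, export, then score a small simulated data stream.
training = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
blob = export_model(train_offline(training))
predictions = list(score_stream(blob, [" 1.5\n", "7.25", "9 "]))
```

The point of the split is that the streaming side only needs the exported artifact, not the training data or the training cluster.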

WHAT IS INSIDE?

[Diagram labels: Driver node (Scala/Py main program, SparkContext, H2OContext); Cluster manager; Worker node (×3), each with a Spark executor]

[Diagram labels: Spark Cluster; Spark Executor (×3); H2O Services (×3); Data Source (×2); DataFrame; H2OFrame]

h2oContext.asDataFrame

h2oContext.asH2OFrame

TIME FOR DEMO!

Key Points to Remember

Sparkling Water integrates H2O into Spark

• Enables using advanced machine learning algorithms inside Spark workflows

• Offers an eager computation model and a mutable data structure, H2OFrame
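
Spark's RDDs and DataFrames are lazily evaluated and immutable, so an eager, mutable structure behaves differently in practice. The sketch below is a toy contrast in plain Python, not H2O or Spark code: `LazyDataset` and `EagerFrame` are hypothetical classes invented here to show when work happens and whether the original data changes.

```python
from typing import Callable, List, Tuple


class LazyDataset:
    """Toy lazy, immutable dataset: map() only records the function;
    nothing runs until collect() is called."""

    def __init__(self, source: List[float], ops: Tuple[Callable, ...] = ()):
        self._source = source
        self._ops = ops

    def map(self, fn: Callable[[float], float]) -> "LazyDataset":
        # Return a new dataset; the receiver is left untouched.
        return LazyDataset(self._source, self._ops + (fn,))

    def collect(self) -> List[float]:
        out = list(self._source)
        for fn in self._ops:
            out = [fn(x) for x in out]
        return out


class EagerFrame:
    """Toy eager, mutable frame: apply() runs immediately and
    overwrites the column in place."""

    def __init__(self, column: List[float]):
        self.column = list(column)

    def apply(self, fn: Callable[[float], float]) -> "EagerFrame":
        self.column[:] = [fn(x) for x in self.column]
        return self


calls: List[float] = []


def double(x: float) -> float:
    calls.append(x)  # record each invocation to show when work happens
    return 2 * x


lazy = LazyDataset([1.0, 2.0, 3.0])
doubled = lazy.map(double)
calls_before_collect = len(calls)  # still zero: nothing has run yet
lazy_result = doubled.collect()    # the recorded function runs here

frame = EagerFrame([1.0, 2.0, 3.0])
frame.apply(double)                # runs at once and mutates the frame
```

In the lazy case the source dataset is still available unchanged after the transformation; in the eager case the frame itself now holds the doubled values.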

THANK YOU.

@h2oai @mmalohlava

h2o.ai/download

github.com/h2oai/sparkling-water

Visit our booth K27 for live demos and more!