Introduction to Sparkling Water - Spark Summit East 2016

An introduction to Sparkling Water Michal Malohlava h2o.ai

Page 1: Introduction to Sparkling Water - Spark Summit East 2016

An introduction to Sparkling Water

Michal Malohlava h2o.ai

Page 2: Introduction to Sparkling Water - Spark Summit East 2016

Who Am I?

Background

• PhD in CS from Charles University in Prague, Czech Republic

• Postdoc at Purdue University experimenting with algos for large-scale computation

• Now a software engineer at H2O.ai

• Experience with domain-specific languages, distributed systems, software engineering, and big data

Page 3: Introduction to Sparkling Water - Spark Summit East 2016

H2O.ai

H2O team

Co-Founders: Sri Ambati, Cliff Click

Scientific Advisory Council: Stephen Boyd, Rob Tibshirani, Trevor Hastie

Page 4: Introduction to Sparkling Water - Spark Summit East 2016

H2O: Open-Source In-Memory Data Science Platform

• Highly optimized Java code (in-house)

• Distributed in-memory K-V store and map/reduce computation framework

• Data parser (HDFS, S3, NFS, HTTP, local drives, etc.)

• Read/write access to distributed data frames (R/Pandas-style)

• ML algorithms: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles

• REST API; clients: interactive UI, R, Python

Sparkling Water
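The map/reduce computation framework listed above works, roughly, by running a function over each piece of a distributed dataset and combining the partial results. The toy, single-JVM Scala sketch below illustrates only that pattern, assuming data is already split into chunks; `ToyMapReduce`, `Chunk`, and `mapReduce` are hypothetical names, not H2O's API, and a real H2O job runs the map step on the nodes that hold the data.

```scala
// Toy, single-JVM illustration of chunked map/reduce.
// Assumption: hypothetical names; this is NOT H2O's task API.
object ToyMapReduce {
  // One piece of a numeric column, standing in for data held by one node.
  type Chunk = Array[Double]

  // Apply `map` to every chunk independently, then combine the
  // partial results pairwise with `reduce`.
  def mapReduce[A](chunks: Seq[Chunk])(map: Chunk => A)(reduce: (A, A) => A): A =
    chunks.map(map).reduce(reduce)

  // Column mean computed from per-chunk (sum, count) partial results.
  def mean(chunks: Seq[Chunk]): Double = {
    val (sum, n) = mapReduce(chunks)(c => (c.sum, c.length.toLong)) {
      case ((s1, n1), (s2, n2)) => (s1 + s2, n1 + n2)
    }
    sum / n
  }
}
```

For example, `ToyMapReduce.mean(Seq(Array(1.0, 2.0), Array(3.0, 4.0, 5.0)))` returns `3.0`: the chunks contribute partial sums 3.0 and 12.0 with counts 2 and 3. Carrying (sum, count) rather than per-chunk means is what keeps the reduce step associative.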

Page 5: Introduction to Sparkling Water - Spark Summit East 2016

Sparkling Water provides:

• Transparent integration of H2O into Spark ecosystem

• Use H2O Frames and algorithms with Spark API

Excels in existing Spark workflows that require advanced machine learning algorithms

Page 6: Introduction to Sparkling Water - Spark Summit East 2016

TYPICAL USE CASES

Page 7: Introduction to Sparkling Water - Spark Summit East 2016

Where to use Sparkling Water?

Model building

Data Source → Data munging → Modelling → Prediction processing

Modelling: Deep Learning, GBM, DRF, GLM, GLRM, K-Means, PCA, CoxPH, Ensembles

Page 8: Introduction to Sparkling Water - Spark Summit East 2016

Where to use Sparkling Water?

Data Source → Data parsing and munging → Modelling

Data load/munging/exploration:

• Load and parse data directly into H2OFrame

• Ad hoc data transformation

Page 9: Introduction to Sparkling Water - Spark Summit East 2016

Where to use Sparkling Water?

Off-line model training: Data Source → Modelling → export the model in a binary format or as code

Deploy the model

Stream processing: Data Stream → Data munging → Model prediction
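Exporting a model "as code" means the trained model becomes ordinary source code that can score incoming rows, for example inside a stream-processing job, without the training cluster. The Scala sketch below shows the idea for a binomial GLM. The object name, intercept, and weights are hypothetical placeholders, not output of H2O's exporter.

```scala
// Hypothetical sketch of a model exported as code: a dependency-free
// scoring function. The coefficients are made-up placeholders.
object ExportedGlmSketch {
  // Hypothetical intercept and per-feature weights of a binomial GLM.
  private val intercept = -1.0
  private val weights = Array(0.5, 2.0)

  // Score one row: linear predictor followed by the logistic link.
  def predict(row: Array[Double]): Double = {
    require(row.length == weights.length, "unexpected number of features")
    val eta = intercept + row.indices.map(i => weights(i) * row(i)).sum
    1.0 / (1.0 + math.exp(-eta))
  }
}
```

With these placeholder weights, `ExportedGlmSketch.predict(Array(2.0, 0.0))` has a linear predictor of 0 and therefore returns a probability of `0.5`.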

Page 10: Introduction to Sparkling Water - Spark Summit East 2016

WHAT IS INSIDE?

Page 11: Introduction to Sparkling Water - Spark Summit East 2016

Cluster manager

Driver node: Scala/Py main program, SparkContext, H2OContext

Worker nodes (three shown): each runs a Spark executor

Page 12: Introduction to Sparkling Water - Spark Summit East 2016

Spark Cluster: each Spark Executor also runs H2O Services

Data Source → DataFrame (Spark)

Data Source → H2OFrame (H2O)

h2oContext.asH2OFrame: DataFrame → H2OFrame

h2oContext.asDataFrame: H2OFrame → DataFrame
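Conceptually, `h2oContext.asH2OFrame` and `h2oContext.asDataFrame` move data between Spark's `DataFrame`, whose API exposes rows, and H2O's column-oriented `H2OFrame`. The toy Scala sketch below models only that layout change on local collections. `ToyFrameConversion`, `Row`, `toColumns`, and `toRows` are hypothetical names; the real conversions are distributed, typed, and performed by Sparkling Water.

```scala
// Toy model of the layout change behind DataFrame <-> H2OFrame conversion.
// Assumption: hypothetical names; local collections only, not Sparkling Water.
object ToyFrameConversion {
  // One record, standing in for a Spark Row of numeric fields.
  final case class Row(values: Vector[Double])

  // Row-oriented -> column-oriented (the direction of asH2OFrame).
  def toColumns(rows: Seq[Row], numCols: Int): Vector[Vector[Double]] =
    Vector.tabulate(numCols)(c => rows.iterator.map(_.values(c)).toVector)

  // Column-oriented -> row-oriented (the direction of asDataFrame).
  // Assumes all columns have the same length.
  def toRows(columns: Vector[Vector[Double]]): Seq[Row] =
    columns.transpose.map(Row(_))
}
```

In this toy, converting rows to columns and back returns the original rows; only the layout changes.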

Page 13: Introduction to Sparkling Water - Spark Summit East 2016

TIME FOR DEMO!

Page 14: Introduction to Sparkling Water - Spark Summit East 2016

Key Points to Remember

Sparkling Water integrates H2O into Spark

• Enables using advanced machine learning algorithms inside Spark workflows

• Offers an eager computation model and a mutable data structure, H2OFrame
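The contrast behind the last point, H2OFrame's eager, mutable model versus Spark's lazy transformations on immutable DataFrames, can be illustrated with plain Scala collections. This is an analogy only, not Spark or H2O code: a collection `view` stands in for a lazy transformation, and `Array` versus `Vector` stands in for mutable versus immutable data. `EagerVsLazy` and its methods are hypothetical names.

```scala
// Analogy only: plain Scala collections, not Spark or H2O code.
object EagerVsLazy {
  // Returns how many times `square` has run at three points in time.
  def evaluationCounts(): (Int, Int, Int) = {
    var calls = 0
    val square = (x: Int) => { calls += 1; x * x }
    val data = Vector(1, 2, 3)

    val lazyResult = data.view.map(square) // only records the transformation
    val afterLazy = calls                   // nothing has run yet
    val eagerResult = data.map(square)      // runs immediately, once per element
    val afterEager = calls
    val forced = lazyResult.toVector        // forcing the view runs it now
    require(forced == eagerResult)
    (afterLazy, afterEager, calls)
  }

  // In-place update of a mutable array versus a copy of an immutable vector.
  def updateDemo(): (Array[Double], Vector[Double], Vector[Double]) = {
    val mutableCol = Array(1.0, 2.0, 3.0)
    mutableCol(0) = 10.0                           // modifies the existing array
    val immutableCol = Vector(1.0, 2.0, 3.0)
    val updatedCol = immutableCol.updated(0, 10.0) // new vector; original unchanged
    (mutableCol, immutableCol, updatedCol)
  }
}
```

`EagerVsLazy.evaluationCounts()` returns `(0, 3, 6)`: the view runs nothing until it is forced, while the strict `map` runs as soon as it is called.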

Page 15: Introduction to Sparkling Water - Spark Summit East 2016

THANK YOU. @h2oai @mmalohlava

h2o.ai/download

github.com/h2oai/sparkling-water

Visit our booth K27 for live demos and more!