Sparkling Water (5/28/14)

Post on 11-Aug-2014


Meetup 5/28/2014

Sparkling Water

Michal Malohlava

@mmalohlava

@hexadata

Who am I?

Background

• PhD in CS from Charles University in Prague, 2012

• 1 year PostDoc at Purdue University experimenting with algorithms for large-scale computation

• 1 year at 0xdata helping to develop the H2O engine for big data computation

Experience with domain-specific languages, distributed systems, software engineering, and big data.

Overview

1. Towards H2O and Spark integration

2. Details and demo

3. Next steps…

Vision

Towards Spark and H2O integration

Spark brings:

• User-friendly API

• Large and active community

• Platform components - SQL

• Multitenancy

H2O brings:

• Memory-efficient computation

• Performance of computation

• Machine learning algorithms

• Parser, R-interface

Combining the benefits of both tools makes "H2O a killer application for Spark"

Steps towards interoperability

1. Data sharing between Spark and H2O

2. Optimize & improve

3. Low-level integration


Data sharing scenario

[Diagram, built up incrementally: data is loaded into a Spark RDD, refined by a SQL query, materialized as a FrameRDD, and handed to an H2O algorithm]

Data sharing strategies

Possible solutions for moving data from Spark to H2O:

• Direct

• Distributed

• Socket-based

• File-based

• Tachyon-based

Data sharing via Tachyon

[Diagram: an H2O node with an embedded Spark driver, sharing data through Tachyon]

1. Invoke Spark driver

2. Load data

3. Query

4. Persist data to Tachyon

5. Load data into H2O frame

6. Invoke GBM on data

Spark 1.0-rc11

• SQL component

• Implemented proper parser/serializer to satisfy the H2O parser

Latest H2O version - 2.5-SNAPSHOT

• With Tachyon support included

• Embedded Spark driver

Key requirements

• Transparent approach

• Work with many columns

• Preserve NAs

• Preserve headers

Solved challenges

Large number of columns in SQL schema

• >22 columns (case class restriction)

• Solved via the Product interface:

class Airlines( year          :Option[Int],    // 0
                month         :Option[Int],    // 1
                dayOfMonth    :Option[Int],    // 2
                dayOfWeek     :Option[Int],    // 3
                crsDepTime    :Option[Int],    // 5
                crsArrTime    :Option[Int],    // 7
                uniqueCarrier :Option[String], // 8
                flightNum     :Option[Int],    // 9
                tailNum       :Option[Int],    // 10
                crsElapsedTime:Option[Int],    // 12
                origin        :Option[String], // 16
                dest          :Option[String], // 17
                distance      :Option[Int],    // 18
                isArrDelayed  :Option[Boolean],// 29
                isDepDelayed  :Option[Boolean] // 30
              ) extends Product { … }
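To make the Product trick concrete, here is a minimal self-contained sketch (with a hypothetical, shortened field list — the real Airlines class continues past 22 fields): a plain class can bypass the 22-field case-class limit of Scala 2.10 by implementing Product's members itself, which is what Spark SQL's schema inference relies on.

```scala
// Sketch: a plain class implementing Product manually, so it is not
// limited to 22 fields the way a 2.10 case class is.
class Flight(
    val year:   Option[Int],
    val month:  Option[Int],
    val origin: Option[String]
    // ... the real Airlines class continues with many more fields
) extends Product with Serializable {
  def productArity: Int = 3
  def productElement(n: Int): Any = n match {
    case 0 => year
    case 1 => month
    case 2 => origin
    case _ => throw new IndexOutOfBoundsException(n.toString)
  }
  def canEqual(that: Any): Boolean = that.isInstanceOf[Flight]
}
```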

Solved challenges

Handling NAs during load

• Store them in the SQL RDD

• Solved by https://github.com/apache/spark/pull/658

• Use Option[T] or a non-primitive Java type

Handling NAs during save

• A simple sql.Row serializer handling NA values
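The save-side idea can be sketched in a few lines. This is a hypothetical stand-in for the project's actual sql.Row serializer, assuming rows are already available as sequences of Option values: a None becomes an empty CSV cell, which H2O's parser reads back as an NA.

```scala
// Hypothetical sketch of the NA-aware serializer idea (not the actual
// sql.Row serializer): emit an empty cell for None so the H2O CSV
// parser interprets it as an NA.
def toCsvLine(fields: Seq[Option[Any]]): String =
  fields.map {
    case Some(v) => v.toString
    case None    => ""  // empty cell -> NA on the H2O side
  }.mkString(",")
```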

Time for Demo

Step-by-step

Start Spark cloud - 1 worker

Start Tachyon storage

Start H2O slave node

Start H2O master node with the Scala master program:

override def run(conf: DemoConf): Unit = {
  // Dataset
  val dataset = "data/allyears2k_headers.csv"
  // Row parser
  val rowParser = AirlinesParser
  // Table name for SQL
  val tableName = "airlines_table"
  // Select all flights with destination == SFO
  val query = """SELECT * FROM airlines_table WHERE dest="SFO" """

  // Connect to the Spark cluster, run the query over the airlines table,
  // and transfer the result into H2O
  val frame: Frame = executeSpark[Airlines](dataset, rowParser,
      conf.extractor, tableName, query, local = conf.local)

  // Now make a blocking call of GBM directly via the Java API
  gbm(frame, frame.vec("isDepDelayed"), 100, true)
}

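The demo's SQL predicate (keep only flights with dest = "SFO") can be mirrored on a plain local collection — a hypothetical stand-in, useful for sanity-checking row parsing without a cluster; `LocalRow` is a simplified model here, not Spark SQL's Row.

```scala
// Hypothetical local model of a parsed airlines row (not Spark SQL's Row).
case class LocalRow(flightNum: Int, dest: Option[String])

// Local equivalent of: SELECT * FROM airlines_table WHERE dest="SFO"
def selectSfo(rows: Seq[LocalRow]): Seq[LocalRow] =
  rows.filter(_.dest.contains("SFO"))
```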

Demo code

Next steps…

Optimize data transfers

• Have a notion of an H2O RDD inside Spark

H2O backend for MLlib

• Based on the H2O RDD

• Use H2O algos

Open challenges

See http://jira.0xdata.com and the Sparkling component

• PUB-730 Transfer results from H2O frame into RDD

• PUB-732 Parquet support for H2O

• PUB-733 MLlib backend

• PUB-734 H2O-based RDD

Time for questions

Thank you!

Learn more about H2O at 0xdata.com

or

follow us at @hexadata

neo> for r in h2o h2o-sparkling; do
       git clone "git@github.com:0xdata/$r.git"
     done