Productionizing H2O Models with Apache Spark · 2018-10-18
Productionizing H2O Models with Apache Spark
Jakub Háva, jakub@h2o.ai
https://github.com/jakubhava
https://www.linkedin.com/in/havaj/
AI Ukraine, Kyiv, October 13-14, 2018
#ML4SAIS
Who are we?
• Kuba
  • Senior Software Engineer at H2O.ai - Core Sparkling Water
  • Master's at Charles University (CZ)
  • Implemented a high-performance cluster monitoring tool for JVM-based languages (JNI, JVMTI, instrumentation)
• Michal
  • VP of Engineering at H2O.ai
  • Creator of Sparkling Water
  • Ph.D. at Charles University (CZ), PostDoc at Purdue
Machine Learning (ML) Lifecycle
Basic ML Lifecycle
[Diagram: Data Engineering, Feature Engineering, Model Training Algorithm, Training Predictions, Model Pipeline Building]
Basic ML Lifecycle
[Diagram: Data Engineering, Feature Engineering, Model Training Algorithm, Featurization Pipeline Model, Training Predictions, Deployment Predictions, Model Pipeline Building, Model Pipeline Deployment]
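The lifecycle above can be illustrated with a minimal, pure-Python sketch. It is not Spark or H2O code: `PipelineModel`, `build_pipeline`, the toy threshold classifier, and the JSON artifact format are all hypothetical stand-ins, chosen only to show that featurization parameters and model parameters are learned once during model pipeline building and then shipped together for deployment predictions.

```python
import json
from statistics import mean, pstdev


class PipelineModel:
    """Hypothetical stand-in: fitted featurization plus model, kept as one artifact."""

    def __init__(self, mu, sigma, threshold):
        self.mu = mu                # featurization parameters learned in training
        self.sigma = sigma
        self.threshold = threshold  # model parameter learned in training

    def featurize(self, xs):
        # Same scaling is applied at training time and at deployment time.
        return [(x - self.mu) / self.sigma for x in xs]

    def predict(self, xs):
        return [1 if f > self.threshold else 0 for f in self.featurize(xs)]

    def to_json(self):
        # Export everything needed for scoring; no training data is included.
        return json.dumps(
            {"mu": self.mu, "sigma": self.sigma, "threshold": self.threshold}
        )

    @classmethod
    def from_json(cls, payload):
        return cls(**json.loads(payload))


def build_pipeline(xs, labels):
    """Model pipeline building: fit featurization, then a toy training algorithm."""
    mu = mean(xs)
    sigma = pstdev(xs) or 1.0  # guard against constant input
    feats = [(x - mu) / sigma for x in xs]
    pos = mean([f for f, y in zip(feats, labels) if y == 1])
    neg = mean([f for f, y in zip(feats, labels) if y == 0])
    # Toy classifier: split the classes at the midpoint of their means.
    return PipelineModel(mu, sigma, (pos + neg) / 2)


# Model pipeline building
train_x = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
train_y = [0, 0, 0, 1, 1, 1]
fitted = build_pipeline(train_x, train_y)
training_predictions = fitted.predict(train_x)

# Model pipeline deployment: export the artifact, load it in the serving process
artifact = fitted.to_json()
deployed = PipelineModel.from_json(artifact)
deployment_predictions = deployed.predict([2.5, 9.0])
```

Keeping featurization and model in a single artifact means the serving side cannot accidentally apply different preprocessing from the one used in training.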
Example Implementations
[Diagram: stages Data Engineering, Feature Engineering, Training Algorithm, Deployment, Pipeline Model; phases Model Building, Model Deployment]
[Row 1: Spark, H2O, Spark, H2O MOJO]
[Row 2: Spark, H2O Driverless AI, Spark, H2O Driverless AI MOJO]
H2O + Spark = Sparkling Water
H2O + Spark
• H2O
  • Machine Learning Library
  • Distributed Algorithms
  • For ML experts
• Sparkling Water
  • Integrates H2O & Spark Ecosystems
  • Transparent for Spark users
  • Based on Spark pipelines & H2O
Basic ML Lifecycle: Sparkling Water
[Diagram: Feature Engineering, Spark Transformers, Model Training Algorithm, AutoML, H2O MOJO Model, Pipeline, Training Predictions, Deployment Predictions]
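As a rough illustration of the diagram above, the following pure-Python sketch chains a transformer-like stage with a model chosen by a toy AutoML-style search. None of this is the Sparkling Water API: `add_ratio`, `make_threshold_model`, `auto_select`, and `run_pipeline` are hypothetical names, the candidate models are trivial threshold rules, and selection is done on the training rows for brevity where a real search would use validation data.

```python
def add_ratio(row):
    """Stand-in for a Spark transformer: derives a new column from existing ones."""
    out = dict(row)
    out["ratio"] = row["a"] / (row["b"] + 1.0)
    return out


def make_threshold_model(column, cut):
    """Stand-in for a trained, scoring-only model: maps one row to a class."""
    return lambda row: 1 if row[column] > cut else 0


def accuracy(model, rows, labels):
    hits = sum(model(r) == y for r, y in zip(rows, labels))
    return hits / len(labels)


def auto_select(candidates, rows, labels):
    """Toy AutoML: return the (name, model) pair with the highest accuracy."""
    return max(candidates, key=lambda c: accuracy(c[1], rows, labels))


def run_pipeline(stages, model, rows):
    """Apply each featurization stage in order, then score with the model."""
    for stage in stages:
        rows = [stage(r) for r in rows]
    return [model(r) for r in rows]


train_rows = [
    {"a": 1.0, "b": 9.0},
    {"a": 6.0, "b": 9.0},
    {"a": 8.0, "b": 1.0},
    {"a": 4.0, "b": 1.0},
]
train_labels = [0, 0, 1, 1]

candidates = [
    ("a>5", make_threshold_model("a", 5.0)),
    ("ratio>1", make_threshold_model("ratio", 1.0)),
    ("b>5", make_threshold_model("b", 5.0)),
]

# Training: featurize, then let the toy search pick the best candidate.
featurized = [add_ratio(r) for r in train_rows]
best_name, best_model = auto_select(candidates, featurized, train_labels)

# Deployment: the same stages and the selected model score new rows.
new_rows = [{"a": 3.0, "b": 0.0}, {"a": 2.0, "b": 8.0}]
predictions = run_pipeline([add_ratio], best_model, new_rows)
```

The point of the sketch is the shape of the pipeline: the selected model is just the last stage, so training and deployment predictions go through identical featurization.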
Demo: Spark Pipeline
H2O Driverless AI
• What if I'm not an expert?
  • H2O Driverless AI
• H2O Driverless AI
  • No expert knowledge required
  • Automatic Feature Engineering & ML
Basic ML Lifecycle: Driverless AI
[Diagram: Feature Engineering, Driverless AI Feature Transformations, Model Training Algorithm, Driverless AI Model, Training Predictions, Deployment Predictions, Driverless AI MOJO as Pipeline]
Demo: Driverless AI as Spark Pipeline
Driverless AI Pipeline
Governed ML Lifecycle
[Diagram: Data Engineering, Feature Engineering, Model Training Algorithm, Featurization Pipeline Model, Training Predictions, Deployment Predictions, Model Pipeline Building, Model Pipeline Deployment, Model Management, Model Monitoring, Auto Documentation]
Materials
https://bit.ly/2sxowxD
Sparkling Water enables deployment of H2O ML models with Spark Pipelines
Thank you!