Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

55
Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark Bay Area Spark Meetup Nov 8, 2017 Sue Ann Hong, Databricks

Transcript of Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Page 1: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Bay Area Spark MeetupNov 8, 2017

Sue Ann Hong, Databricks

Page 2: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

This talk

• Deep Learning at scale: current state• Deep Learning Pipelines: the philosophy• End-to-end workflow with DL Pipelines• Future

Page 3: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning at Scale: current state

3putyour#assignedhashtagherebysettingthefooterinview-header/footer

Page 4: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

What is Deep Learning?

• A set of machine learning techniques that use layers that transform numerical inputs• Classification• Regression• Arbitrary mapping

• Popular in the 80’s as Neural Networks• Recently came back thanks to advances in data collection,

computation techniques, and hardware.

Page 5: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Success of Deep Learning

Tremendous success for applications with complex data• AlphaGo• Image interpretation• Automatic translation• Speech recognition

Page 6: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning is often challengingLabeled

DataCompute Resources

& Time Engineer hours

• Tedious or difficult to distribute computations• No exact science around deep learning à lots of tweaking• Low level APIs with steep learning curve• Complex models à need a lot of data

Page 7: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning in industry

• Currently limited adoption• Huge potential beyond the industrial giants• How do we accelerate the road to massive availability?

Page 8: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

8putyour#assignedhashtagherebysettingthefooterinview-header/footer

Page 9: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines:Deep Learning with Simplicity

• Open-source Databricks library• Focuses on ease of use and integration• without sacrificing performance

Page 10: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning is often challengingCompute Resources

& TimeEngineer hours Labeled

Data

• Tedious or difficult to distribute computations• No exact science around deep learning à lots of tweaking• Low level APIs with steep learning curve• Complex models à need a lot of data

Page 11: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning is often challenging

• Tedious or difficult to distribute computations

• No exact science around deep learning à lots of tweaking

• Low level APIs with steep learning curve

• Complex models à need a lot of data

Page 12: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

• Tedious or difficult to distribute computations

• No exact science around deep learning à lots of tweaking

• Low level APIs with steep learning curve

• Complex models à need a lot of data

Instead

• Be easy to scale

• Require little tweaking

• Be easy to write

• Require little or no data

Commonworkflowsshould

Page 13: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

How

• Be easy to scale

• Require little tweaking

• Be easy to write

• Require little or no data

Commonworkflowsshould

• Use Apache Spark for scaling out common tasks

• Leverage well-known model architectures

• Integrate with MLlib Pipelines API to capture ML workflow concisely

• Leverage pre-trained models for common tasks

Apache Spark for scaling out

MLlib Pipelines API

pre-trained models

Page 14: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Demo: Build a visual recommendation AI

10minutes

7lines of code

Elastic Scale-out using Apache Spark

MLlib Pipelines API Leveragepre-trained models

0labels

Page 15: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Page 16: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

To the notebook!

Page 17: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Page 18: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

End-to-End Workflowwith Deep Learning Pipelines

18putyour#assignedhashtagherebysettingthefooterinview-header/footer

Page 19: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

A typical Deep Learning workflow

• Load data (images, text, time series, …)• Interactive work• Train• Select an architecture for a neural network• Optimize the weights of the NN

• Evaluate results, potentially re-train• Apply:• Pass the data through the NN to produce new features or output

Load data Interactive workTrain

EvaluateApply

Page 20: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

A typical Deep Learning workflow

Load data

Interactive work

Train

Evaluate

Apply

• ImageloadinginSpark

• Distributedbatchprediction• DeployingmodelsinSQL

• Transferlearning• Distributedtuning

• Pre-trainedmodels

Page 21: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Page 22: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Adds support for images in Spark

• ImageSchema, reader, conversion functions to/from numpy arrays• Most of the tools we’ll describe work on ImageSchema columns

from sparkdl import readImagesimage_df = readImages(sample_img_dir)

Page 23: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Upcoming: built-in support in Spark

• Spark-21866• Contributing image format & reading to Spark• Targeted for Spark 2.3• Joint work with Microsoft

23putyour#assignedhashtagherebysettingthefooterinview-header/footer

Page 24: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Page 25: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Applying popular models

• Popular pre-trained models accessible through MLlib Transformers

predictor = DeepImagePredictor(inputCol="image", outputCol="predicted_labels", modelName="InceptionV3")

predictions_df = predictor.transform(image_df)

Page 26: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Applying popular modelspredictor = DeepImagePredictor(inputCol="image",

outputCol="predicted_labels", modelName="InceptionV3")

predictions_df = predictor.transform(image_df)

Page 27: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Hyperparameter tuning

Transferlearning

Page 28: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Hyperparameter tuning

Transferlearning

Page 29: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer learning

• Pre-trained models may not be directly applicable• New domain, e.g. shoes

• Training from scratch requires • Enormous amounts of data• A lot of compute resources & time

• Idea: intermediate representations learned for one task may be useful for other related tasks

Page 30: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer Learning

SoftMaxGIANT PANDA 0.9RACCOON 0.05RED PANDA 0.01…

Page 31: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer Learning

Page 32: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer Learning

Classifier

Page 33: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer Learning

Classifier

Rose: 0.7

Daisy: 0.3

Page 34: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer Learning as a Pipeline

DeepImageFeaturizerImage

Loading PreprocessingLogistic

Regression

MLlib Pipeline

Page 35: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer Learning as a Pipeline

35putyour#assignedhashtagherebysettingthefooterinview-header/footer

featurizer = DeepImageFeaturizer(inputCol="image",outputCol="features",modelName="InceptionV3")

lr = LogisticRegression(labelCol="label")

p = Pipeline(stages=[featurizer, lr])

p_model = p.fit(train_df)

Page 36: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer Learning

• Usually for classification tasks• Similar task, new domain

• But other forms of learning leveraging learned representations can be loosely considered transfer learning

Page 37: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Page 38: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Featurization for similarity-based ML

DeepImageFeaturizerImage

Loading PreprocessingLogistic

Regression

Page 39: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Featurization for similarity-based ML

DeepImageFeaturizerImage

Loading PreprocessingClusteringKMeans

GaussianMixture

Nearest NeighborKNN LSH

Distance computation

Page 40: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Featurization for similarity-based ML

DeepImageFeaturizerImage

Loading PreprocessingClusteringKMeans

GaussianMixture

Nearest NeighborKNN LSH

Distance computation

Duplicate Detection

RecommendationAnomaly Detection

Search result diversification

Page 41: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Hyperparameter tuning

Transferlearning

Page 42: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Page 43: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• ApplySparkSQL

Batchprediction

s

Page 44: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• ApplySparkSQL

Batchprediction

Page 45: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Batch prediction as an MLlib Transformer

• A model is a Transformer in MLlib• DataFrame goes in, DataFrame comes out with output columns

predictor = XXTransformer(inputCol="image",outputCol="prediction",modelSpecification={…})

predictions = predictor.transform(test_df)

Page 46: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Hierarchy of DL transformers for images

46

TFImageTransformer

KerasImageTransformer

DeepImageFeaturizerDeepImagePredictor

NamedImageTransformer (model_name)

(keras.Model)

(tf.Graph)

Input(Image)

Output(vector)

model_namemodel_name

Keras.Model

tf.Graph

Page 47: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Hierarchy of DL transformers for images

47

TFImageTransformer

KerasImageTransformer

DeepImageFeaturizerDeepImagePredictor

NamedImageTransformer (model_name)

(keras.Model)

(tf.Graph)

Input(Image)

Output(vector)

model_namemodel_name

Keras.Model

tf.Graph

Page 48: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Hierarchy of DL transformers

48

TFImageTransformer

KerasImageTransformer

DeepImageFeaturizer

DeepImagePredictor

NamedImageTransformer

model_name)

Input(Image

)

Output(vector

)

model_namemodel_name

Keras.Model

tf.Graph

TFTextTransformer

KerasTextTransformer

DeepTextFeaturizer

DeepTextPredictor

NamedTextTransformerInput(Text)

Output(vector)

model_namemodel_name

Keras.Model

tf.Graph

TFTransformer

Page 49: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Hierarchy of DL transformers

49

TFImageTransformer

KerasImageTransformer

DeepImageFeaturizerDeepImagePredictor

NamedImageTransformer

TFTextTransformer

KerasTextTransformer

DeepTextFeaturizerDeepTextPredictor

NamedTextTransformer

TFTransformer

Page 50: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Hierarchy of DL transformers

50

TFImageTransformer

KerasImageTransformer

DeepImageFeaturizerDeepImagePredictor

NamedImageTransformer

TFTextTransformer

KerasTextTransformer

DeepTextFeaturizerDeepTextPredictor

NamedTextTransformer

TFTransformer

Page 51: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Common Tasks EasyAdvanced Ones Possible

51

TFImageTransformer

KerasImageTransformer

DeepImageFeaturizerDeepImagePredictor

NamedImageTransformer

TFTextTransformer

KerasTextTransformer

DeepTextFeaturizerDeepTextPredictor

NamedTextTransformer

TFTransformer

Page 52: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Common Tasks EasyAdvanced Ones Possible

52

TFImageTransformer

KerasImageTransformer

DeepImageFeaturizer

DeepImagePredictor

NamedImageTransformer

TFTextTransformer

KerasTextTransformer

DeepTextFeaturizer

DeepTextPredictor

NamedTextTransformer

TFTransformer

Pre-built Solutions

80%-built Solutions

Self-built Solutions

Page 53: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines : FutureIn progress• Text featurization (embeddings)• TFTransformer for arbitrary vectors

Future directions• Non-image data domains: video, text, speech, …• Distributed training• Support for more backends, e.g. MXNet, PyTorch, BigDL

Page 54: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Questions?Thank you!

54putyour#assignedhashtagherebysettingthefooterinview-header/footer

Page 55: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

ResourcesDL Pipelines GitHub Repo, Spark Summit Europe 2017 Deep Dive

Blog posts & webinars (http://databricks.com/blog)• Deep Learning Pipelines• GPU acceleration in Databricks• BigDL on Databricks• Deep Learning and Apache Spark

Docs for Deep Learning on Databricks (http://docs.databricks.com)• Getting started• Deep Learning Pipelines Example• Spark integration