Transcript
Page 1: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Bay Area Spark Meetup, Nov 8, 2017

Sue Ann Hong, Databricks

Page 2: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

This talk

• Deep Learning at scale: current state
• Deep Learning Pipelines: the philosophy
• End-to-end workflow with DL Pipelines
• Future

Page 3: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning at Scale: current state


Page 4: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

What is Deep Learning?

• A set of machine learning techniques that use layers that transform numerical inputs
  • Classification
  • Regression
  • Arbitrary mapping
• Popular in the 80’s as Neural Networks
• Recently came back thanks to advances in data collection, computation techniques, and hardware.

Page 5: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Success of Deep Learning

Tremendous success for applications with complex data
• AlphaGo
• Image interpretation
• Automatic translation
• Speech recognition

Page 6: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning is often challenging

Labeled Data, Compute Resources & Time, Engineer hours

• Tedious or difficult to distribute computations
• No exact science around deep learning → lots of tweaking
• Low level APIs with steep learning curve
• Complex models → need a lot of data

Page 7: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning in industry

• Currently limited adoption
• Huge potential beyond the industrial giants
• How do we accelerate the road to massive availability?

Page 8: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines


Page 9: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines: Deep Learning with Simplicity

• Open-source Databricks library
• Focuses on ease of use and integration
  • without sacrificing performance

Page 10: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning is often challenging

Compute Resources & Time, Engineer hours, Labeled Data

• Tedious or difficult to distribute computations
• No exact science around deep learning → lots of tweaking
• Low level APIs with steep learning curve
• Complex models → need a lot of data

Page 11: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning is often challenging

• Tedious or difficult to distribute computations

• No exact science around deep learning → lots of tweaking

• Low level APIs with steep learning curve

• Complex models → need a lot of data

Page 12: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

• Tedious or difficult to distribute computations

• No exact science around deep learning → lots of tweaking

• Low level APIs with steep learning curve

• Complex models → need a lot of data

Instead, common workflows should

• Be easy to scale

• Require little tweaking

• Be easy to write

• Require little or no data

Page 13: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Common workflows should

• Be easy to scale

• Require little tweaking

• Be easy to write

• Require little or no data

How

• Use Apache Spark for scaling out common tasks

• Leverage well-known model architectures

• Integrate with MLlib Pipelines API to capture ML workflow concisely

• Leverage pre-trained models for common tasks

Page 14: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Demo: Build a visual recommendation AI

10 minutes

7 lines of code

0 labels

Elastic scale-out using Apache Spark

MLlib Pipelines API

Leverage pre-trained models

Page 15: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Page 16: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

To the notebook!

Page 17: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Page 18: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

End-to-End Workflow with Deep Learning Pipelines


Page 19: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

A typical Deep Learning workflow

• Load data (images, text, time series, …)
• Interactive work
• Train
  • Select an architecture for a neural network
  • Optimize the weights of the NN
• Evaluate results, potentially re-train
• Apply:
  • Pass the data through the NN to produce new features or output

Load data, Interactive work, Train, Evaluate, Apply

Page 20: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

A typical Deep Learning workflow

Load data

Interactive work

Train

Evaluate

Apply

• Image loading in Spark

• Distributed batch prediction
• Deploying models in SQL

• Transfer learning
• Distributed tuning

• Pre-trained models

Page 21: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Page 22: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Adds support for images in Spark

• ImageSchema, reader, conversion functions to/from numpy arrays
• Most of the tools we’ll describe work on ImageSchema columns

from sparkdl import readImages
image_df = readImages(sample_img_dir)
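To make the idea of an image column with array conversions concrete, here is a minimal sketch of what an image record and its round-trip conversion might look like. The field names (`height`, `width`, `n_channels`, `data`) and the helpers `array_to_image` and `image_to_array` are assumptions for illustration, not the library's actual schema or API, and nested lists stand in for numpy arrays so the sketch needs only the standard library.

```python
from typing import List, NamedTuple


class ImageRecord(NamedTuple):
    """Hypothetical stand-in for one row of an image column."""
    height: int
    width: int
    n_channels: int
    data: bytes  # row-major pixel bytes, one byte per channel value


def array_to_image(pixels: List[List[List[int]]]) -> ImageRecord:
    """Pack a height x width x channels nested list into an ImageRecord."""
    height = len(pixels)
    width = len(pixels[0])
    n_channels = len(pixels[0][0])
    flat = bytes(v for row in pixels for px in row for v in px)
    return ImageRecord(height, width, n_channels, flat)


def image_to_array(img: ImageRecord) -> List[List[List[int]]]:
    """Unpack an ImageRecord back into a nested list."""
    stride = img.width * img.n_channels  # bytes per image row
    return [
        [
            list(img.data[r * stride + c * img.n_channels:
                          r * stride + (c + 1) * img.n_channels])
            for c in range(img.width)
        ]
        for r in range(img.height)
    ]


# A 1 x 2 RGB image: one red pixel, one blue pixel (made-up data).
tiny = [[[255, 0, 0], [0, 0, 255]]]
record = array_to_image(tiny)
roundtrip = image_to_array(record)
```

Keeping the shape next to the raw bytes is what lets downstream tools rebuild the array without re-decoding the file.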

Page 23: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Upcoming: built-in support in Spark

• SPARK-21866
• Contributing image format & reading to Spark
• Targeted for Spark 2.3
• Joint work with Microsoft


Page 24: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Page 25: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Applying popular models

• Popular pre-trained models accessible through MLlib Transformers

from sparkdl import DeepImagePredictor

predictor = DeepImagePredictor(inputCol="image", outputCol="predicted_labels", modelName="InceptionV3")

predictions_df = predictor.transform(image_df)

Page 26: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Applying popular models

predictor = DeepImagePredictor(inputCol="image",
                               outputCol="predicted_labels",
                               modelName="InceptionV3")

predictions_df = predictor.transform(image_df)

Page 27: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Hyperparameter tuning

Transfer learning

Page 28: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Hyperparameter tuning

Transfer learning

Page 29: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer learning

• Pre-trained models may not be directly applicable
  • New domain, e.g. shoes
• Training from scratch requires
  • Enormous amounts of data
  • A lot of compute resources & time
• Idea: intermediate representations learned for one task may be useful for other related tasks
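The idea in the last bullet can be sketched in a few lines: keep a feature extractor fixed and train only a small classifier on top of its outputs. In this sketch `frozen_featurizer` is a hypothetical stand-in for a pre-trained network (it merely averages colour channels), the data is made up, and the logistic regression is a plain gradient-descent version written for illustration rather than MLlib's implementation.

```python
import math
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]


def frozen_featurizer(image: Sequence[Pixel]) -> List[float]:
    """Stand-in for a pre-trained network; nothing in it is ever updated.

    It averages each colour channel and scales the result to [0, 1].
    """
    n = len(image)
    return [sum(px[c] for px in image) / (255.0 * n) for c in range(3)]


def train_logistic_regression(features, labels, lr=0.5, epochs=500):
    """Fit weights and a bias with plain batch gradient descent."""
    w = [0.0] * len(features[0])
    b = 0.0
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            err = 1.0 / (1.0 + math.exp(-z)) - y
            for i, xi in enumerate(x):
                grad_w[i] += err * xi
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w, b, x) -> int:
    """Return 1 when the linear score is positive, else 0."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


# Made-up "new domain": reddish images are class 1, bluish images class 0.
images = [
    [(250, 10, 10), (240, 20, 30)],
    [(200, 30, 20), (220, 10, 40)],
    [(10, 20, 250), (30, 10, 240)],
    [(20, 40, 210), (5, 5, 200)],
]
labels = [1, 1, 0, 0]
feats = [frozen_featurizer(img) for img in images]  # features are computed once
w, b = train_logistic_regression(feats, labels)     # only this part is trained
```

Only the three weights and the bias are learned here, which is why so few labeled examples are enough in this toy setting.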

Page 30: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer Learning

SoftMax output: GIANT PANDA 0.9, RACCOON 0.05, RED PANDA 0.01, …
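For readers unfamiliar with the final layer shown on this slide, the following sketch shows how raw scores could be turned into class probabilities with a softmax and then ranked. The labels and scores are invented for illustration and do not reproduce the numbers on the slide.

```python
import math
from typing import List, Tuple


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(labels: List[str], logits: List[float], k: int) -> List[Tuple[str, float]]:
    """Return the k most probable (label, probability) pairs."""
    ranked = sorted(zip(labels, softmax(logits)), key=lambda p: p[1], reverse=True)
    return ranked[:k]


labels = ["giant panda", "raccoon", "red panda"]
logits = [4.0, 1.0, -0.5]  # made-up scores from a hypothetical last layer
best = top_k(labels, logits, k=2)
```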

Page 31: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer Learning

Page 32: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer Learning

Classifier

Page 33: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer Learning

Classifier

Rose: 0.7

Daisy: 0.3

Page 34: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer Learning as a Pipeline

[Diagram: an MLlib Pipeline with boxes labeled DeepImageFeaturizer, Image Loading, Preprocessing, and Logistic Regression]

Page 35: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer Learning as a Pipeline


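from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from sparkdl import DeepImageFeaturizer

featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features", modelName="InceptionV3")

lr = LogisticRegression(labelCol="label")

p = Pipeline(stages=[featurizer, lr])

p_model = p.fit(train_df)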
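What `p.fit` does with its stages can be illustrated without Spark. The sketch below is a simplified, standard-library imitation of the pipeline idea: transformer stages are applied as-is, estimator stages are fit on the data produced so far, and the result is a fitted pipeline that can score new rows. All class names here are hypothetical, rows are plain dictionaries instead of DataFrame rows, and the toy featurizer and estimator are not the ones in the snippet above.

```python
from typing import Dict, List

Row = Dict[str, object]


class BrightnessFeaturizer:
    """Toy transformer standing in for a featurizer: needs no fitting."""

    def transform(self, rows: List[Row]) -> List[Row]:
        return [dict(r, features=sum(r["image"]) / len(r["image"])) for r in rows]


class ThresholdModel:
    """Fitted model: itself a transformer that adds a 'prediction' column."""

    def __init__(self, cut: float) -> None:
        self.cut = cut

    def transform(self, rows: List[Row]) -> List[Row]:
        return [dict(r, prediction=int(r["features"] > self.cut)) for r in rows]


class ThresholdEstimator:
    """Toy estimator: learns a cut-off halfway between the two class means."""

    def fit(self, rows: List[Row]) -> ThresholdModel:
        pos = [r["features"] for r in rows if r["label"] == 1]
        neg = [r["features"] for r in rows if r["label"] == 0]
        return ThresholdModel((sum(pos) / len(pos) + sum(neg) / len(neg)) / 2.0)


class SimplePipelineModel:
    """Fitted pipeline: applies every fitted stage in order."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def transform(self, rows: List[Row]) -> List[Row]:
        for stage in self.stages:
            rows = stage.transform(rows)
        return rows


class SimplePipeline:
    """Unfitted pipeline: a list of transformers and estimators."""

    def __init__(self, stages: list) -> None:
        self.stages = stages

    def fit(self, rows: List[Row]) -> SimplePipelineModel:
        fitted = []
        for stage in self.stages:
            # Estimators are fit on the data seen so far; transformers pass through.
            model = stage.fit(rows) if hasattr(stage, "fit") else stage
            rows = model.transform(rows)
            fitted.append(model)
        return SimplePipelineModel(fitted)


train_rows = [
    {"image": [200, 220], "label": 1},
    {"image": [240, 250], "label": 1},
    {"image": [10, 30], "label": 0},
    {"image": [40, 20], "label": 0},
]
p_model = SimplePipeline([BrightnessFeaturizer(), ThresholdEstimator()]).fit(train_rows)
scored = p_model.transform([{"image": [180, 190]}, {"image": [50, 60]}])
```

Because the fitted pipeline carries the featurizer along with the classifier, new rows only need the raw `image` column to be scored.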

Page 36: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Transfer Learning

• Usually for classification tasks
  • Similar task, new domain
• But other forms of learning leveraging learned representations can be loosely considered transfer learning

Page 37: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark
Page 38: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Featurization for similarity-based ML

[Diagram: boxes labeled DeepImageFeaturizer, Image Loading, Preprocessing, and Logistic Regression]

Page 39: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Featurization for similarity-based ML

[Diagram: boxes labeled DeepImageFeaturizer, Image Loading, and Preprocessing, followed by one of:]

• Clustering: KMeans, GaussianMixture
• Nearest Neighbor: KNN, LSH
• Distance computation

Page 40: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Featurization for similarity-based ML

[Diagram: boxes labeled DeepImageFeaturizer, Image Loading, and Preprocessing, followed by one of:]

• Clustering: KMeans, GaussianMixture
• Nearest Neighbor: KNN, LSH
• Distance computation

Applications:

• Duplicate Detection
• Recommendation
• Anomaly Detection
• Search result diversification
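As a rough illustration of the distance-computation route, the sketch below ranks items by cosine similarity between feature vectors, which is one simple way a recommendation or duplicate-detection step could use featurizer output. The item names and three-dimensional vectors are invented; a real featurizer would emit much longer vectors, and this brute-force scan is not the LSH or KNN machinery named on the slide.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(query_id: str, features: Dict[str, List[float]], k: int = 2) -> List[Tuple[str, float]]:
    """Return the k items closest to the query, excluding the query itself."""
    query = features[query_id]
    scored = [
        (item_id, cosine_similarity(query, vec))
        for item_id, vec in features.items()
        if item_id != query_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Made-up feature vectors standing in for featurizer output.
features = {
    "sneaker_a": [0.9, 0.1, 0.0],
    "sneaker_b": [0.8, 0.2, 0.1],
    "sandal": [0.1, 0.9, 0.2],
    "boot": [0.3, 0.3, 0.9],
}
recs = most_similar("sneaker_a", features, k=2)
```

No labels are involved at any point, which matches the "0 labels" claim of the demo: similarity comes entirely from the learned representation.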

Page 41: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Hyperparameter tuning

Transfer learning

Page 42: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Page 43: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Spark SQL

Batch prediction

Page 44: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Spark SQL

Batch prediction
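The general idea of exposing a model to SQL users is to register a prediction function under a name that queries can call. The sketch below shows that pattern with Python's built-in `sqlite3` module as a stand-in; it does not use Spark SQL or the library's own registration mechanism, and the table, the function name `predict_label`, and the toy "model" are all invented for illustration.

```python
import sqlite3


def predict_label(brightness: float) -> str:
    """Toy 'model': a fixed threshold on a single numeric feature."""
    return "day" if brightness > 0.5 else "night"


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL queries can call it by name.
conn.create_function("predict_label", 1, predict_label)

conn.execute("CREATE TABLE images (name TEXT, brightness REAL)")
conn.executemany(
    "INSERT INTO images VALUES (?, ?)",
    [("a.jpg", 0.9), ("b.jpg", 0.2)],
)

rows = conn.execute(
    "SELECT name, predict_label(brightness) FROM images ORDER BY name"
).fetchall()
conn.close()
```

Once the function is registered, anyone who can write a SELECT statement can run predictions without touching the model code.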

Page 45: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Batch prediction as an MLlib Transformer

• A model is a Transformer in MLlib
  • DataFrame goes in, DataFrame comes out with output columns

predictor = XXTransformer(inputCol="image", outputCol="prediction", modelSpecification={…})

predictions = predictor.transform(test_df)
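The Transformer contract on this slide (rows go in, the same rows come out with an extra output column) can be sketched independently of Spark. `FunctionTransformer` and `toy_model` below are hypothetical names, rows are plain dictionaries rather than DataFrame rows, and the "model" is a trivial brightness rule chosen only to keep the example self-contained.

```python
from typing import Callable, Dict, List

Row = Dict[str, object]


class FunctionTransformer:
    """Hypothetical transformer: applies model_fn to inputCol, writes outputCol."""

    def __init__(self, inputCol: str, outputCol: str, model_fn: Callable) -> None:
        self.inputCol = inputCol
        self.outputCol = outputCol
        self.model_fn = model_fn

    def transform(self, rows: List[Row]) -> List[Row]:
        out = []
        for row in rows:
            new_row = dict(row)  # copy so the input rows stay untouched
            new_row[self.outputCol] = self.model_fn(row[self.inputCol])
            out.append(new_row)
        return out


def toy_model(pixels: List[int]) -> str:
    """Stand-in for a real network: labels an image by mean brightness."""
    return "bright" if sum(pixels) / len(pixels) > 127 else "dark"


test_rows = [{"image": [250, 240]}, {"image": [5, 10]}]
predictor = FunctionTransformer("image", "prediction", toy_model)
predictions = predictor.transform(test_rows)
```

Because the input is never mutated, the same rows can be passed through several transformers or re-scored with a different model.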

Page 46: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Hierarchy of DL transformers for images

[Diagram: image transformers, each taking an image as input and producing a vector as output]

TFImageTransformer (tf.Graph)
KerasImageTransformer (keras.Model)
NamedImageTransformer (model_name)
DeepImageFeaturizer, DeepImagePredictor (model_name)

Page 47: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Hierarchy of DL transformers for images

[Diagram: image transformers, each taking an image as input and producing a vector as output]

TFImageTransformer (tf.Graph)
KerasImageTransformer (keras.Model)
NamedImageTransformer (model_name)
DeepImageFeaturizer, DeepImagePredictor (model_name)

Page 48: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Hierarchy of DL transformers

[Diagram: image and text transformers side by side, with TFTransformer listed after both]

Image transformers: Input (Image), Output (vector)
TFImageTransformer (tf.Graph)
KerasImageTransformer (Keras.Model)
NamedImageTransformer (model_name)
DeepImageFeaturizer, DeepImagePredictor (model_name)

Text transformers: Input (Text), Output (vector)
TFTextTransformer (tf.Graph)
KerasTextTransformer (Keras.Model)
NamedTextTransformer (model_name)
DeepTextFeaturizer, DeepTextPredictor (model_name)

TFTransformer

Page 49: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Hierarchy of DL transformers

TFImageTransformer
KerasImageTransformer
NamedImageTransformer
DeepImageFeaturizer, DeepImagePredictor

TFTextTransformer
KerasTextTransformer
NamedTextTransformer
DeepTextFeaturizer, DeepTextPredictor

TFTransformer

Page 50: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Hierarchy of DL transformers

TFImageTransformer
KerasImageTransformer
NamedImageTransformer
DeepImageFeaturizer, DeepImagePredictor

TFTextTransformer
KerasTextTransformer
NamedTextTransformer
DeepTextFeaturizer, DeepTextPredictor

TFTransformer

Page 51: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Common Tasks Easy, Advanced Ones Possible

TFImageTransformer
KerasImageTransformer
NamedImageTransformer
DeepImageFeaturizer, DeepImagePredictor

TFTextTransformer
KerasTextTransformer
NamedTextTransformer
DeepTextFeaturizer, DeepTextPredictor

TFTransformer

Page 52: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Common Tasks Easy, Advanced Ones Possible

TFImageTransformer
KerasImageTransformer
NamedImageTransformer
DeepImageFeaturizer, DeepImagePredictor

TFTextTransformer
KerasTextTransformer
NamedTextTransformer
DeepTextFeaturizer, DeepTextPredictor

TFTransformer

Pre-built Solutions

80%-built Solutions

Self-built Solutions

Page 53: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Deep Learning Pipelines: Future

In progress
• Text featurization (embeddings)
• TFTransformer for arbitrary vectors

Future directions
• Non-image data domains: video, text, speech, …
• Distributed training
• Support for more backends, e.g. MXNet, PyTorch, BigDL

Page 54: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Questions? Thank you!


Page 55: Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Resources

DL Pipelines GitHub Repo, Spark Summit Europe 2017 Deep Dive

Blog posts & webinars (http://databricks.com/blog)
• Deep Learning Pipelines
• GPU acceleration in Databricks
• BigDL on Databricks
• Deep Learning and Apache Spark

Docs for Deep Learning on Databricks (http://docs.databricks.com)
• Getting started
• Deep Learning Pipelines Example
• Spark integration