Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Post on 21-Jan-2018

422 views 6 download

Transcript of Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Build, Scale, and Deploy Deep Learning Pipelines Using Apache Spark

Bay Area Spark MeetupNov 8, 2017

Sue Ann Hong, Databricks

This talk

• Deep Learning at scale: current state• Deep Learning Pipelines: the philosophy• End-to-end workflow with DL Pipelines• Future

Deep Learning at Scale: current state

3putyour#assignedhashtagherebysettingthefooterinview-header/footer

What is Deep Learning?

• A set of machine learning techniques that use layers that transform numerical inputs• Classification• Regression• Arbitrary mapping

• Popular in the 80’s as Neural Networks• Recently came back thanks to advances in data collection,

computation techniques, and hardware.

Success of Deep Learning

Tremendous success for applications with complex data• AlphaGo• Image interpretation• Automatic translation• Speech recognition

Deep Learning is often challengingLabeled

DataCompute Resources

& Time Engineer hours

• Tedious or difficult to distribute computations• No exact science around deep learning à lots of tweaking• Low level APIs with steep learning curve• Complex models à need a lot of data

Deep Learning in industry

• Currently limited adoption• Huge potential beyond the industrial giants• How do we accelerate the road to massive availability?

Deep Learning Pipelines

8putyour#assignedhashtagherebysettingthefooterinview-header/footer

Deep Learning Pipelines:Deep Learning with Simplicity

• Open-source Databricks library• Focuses on ease of use and integration• without sacrificing performance

Deep Learning is often challengingCompute Resources

& TimeEngineer hours Labeled

Data

• Tedious or difficult to distribute computations• No exact science around deep learning à lots of tweaking• Low level APIs with steep learning curve• Complex models à need a lot of data

Deep Learning is often challenging

• Tedious or difficult to distribute computations

• No exact science around deep learning à lots of tweaking

• Low level APIs with steep learning curve

• Complex models à need a lot of data

• Tedious or difficult to distribute computations

• No exact science around deep learning à lots of tweaking

• Low level APIs with steep learning curve

• Complex models à need a lot of data

Instead

• Be easy to scale

• Require little tweaking

• Be easy to write

• Require little or no data

Commonworkflowsshould

How

• Be easy to scale

• Require little tweaking

• Be easy to write

• Require little or no data

Commonworkflowsshould

• Use Apache Spark for scaling out common tasks

• Leverage well-known model architectures

• Integrate with MLlib Pipelines API to capture ML workflow concisely

• Leverage pre-trained models for common tasks

Apache Spark for scaling out

MLlib Pipelines API

pre-trained models

Demo: Build a visual recommendation AI

10minutes

7lines of code

Elastic Scale-out using Apache Spark

MLlib Pipelines API Leveragepre-trained models

0labels

To the notebook!

End-to-End Workflowwith Deep Learning Pipelines

18putyour#assignedhashtagherebysettingthefooterinview-header/footer

A typical Deep Learning workflow

• Load data (images, text, time series, …)• Interactive work• Train• Select an architecture for a neural network• Optimize the weights of the NN

• Evaluate results, potentially re-train• Apply:• Pass the data through the NN to produce new features or output

Load data Interactive workTrain

EvaluateApply

A typical Deep Learning workflow

Load data

Interactive work

Train

Evaluate

Apply

• ImageloadinginSpark

• Distributedbatchprediction• DeployingmodelsinSQL

• Transferlearning• Distributedtuning

• Pre-trainedmodels

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Adds support for images in Spark

• ImageSchema, reader, conversion functions to/from numpy arrays• Most of the tools we’ll describe work on ImageSchema columns

from sparkdl import readImagesimage_df = readImages(sample_img_dir)

Upcoming: built-in support in Spark

• Spark-21866• Contributing image format & reading to Spark• Targeted for Spark 2.3• Joint work with Microsoft

23putyour#assignedhashtagherebysettingthefooterinview-header/footer

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Applying popular models

• Popular pre-trained models accessible through MLlib Transformers

predictor = DeepImagePredictor(inputCol="image", outputCol="predicted_labels", modelName="InceptionV3")

predictions_df = predictor.transform(image_df)

Applying popular modelspredictor = DeepImagePredictor(inputCol="image",

outputCol="predicted_labels", modelName="InceptionV3")

predictions_df = predictor.transform(image_df)

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Hyperparameter tuning

Transferlearning

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Hyperparameter tuning

Transferlearning

Transfer learning

• Pre-trained models may not be directly applicable• New domain, e.g. shoes

• Training from scratch requires • Enormous amounts of data• A lot of compute resources & time

• Idea: intermediate representations learned for one task may be useful for other related tasks

Transfer Learning

SoftMaxGIANT PANDA 0.9RACCOON 0.05RED PANDA 0.01…

Transfer Learning

Transfer Learning

Classifier

Transfer Learning

Classifier

Rose: 0.7

Daisy: 0.3

Transfer Learning as a Pipeline

DeepImageFeaturizerImage

Loading PreprocessingLogistic

Regression

MLlib Pipeline

Transfer Learning as a Pipeline

35putyour#assignedhashtagherebysettingthefooterinview-header/footer

featurizer = DeepImageFeaturizer(inputCol="image",outputCol="features",modelName="InceptionV3")

lr = LogisticRegression(labelCol="label")

p = Pipeline(stages=[featurizer, lr])

p_model = p.fit(train_df)

Transfer Learning

• Usually for classification tasks• Similar task, new domain

• But other forms of learning leveraging learned representations can be loosely considered transfer learning

Featurization for similarity-based ML

DeepImageFeaturizerImage

Loading PreprocessingLogistic

Regression

Featurization for similarity-based ML

DeepImageFeaturizerImage

Loading PreprocessingClusteringKMeans

GaussianMixture

Nearest NeighborKNN LSH

Distance computation

Featurization for similarity-based ML

DeepImageFeaturizerImage

Loading PreprocessingClusteringKMeans

GaussianMixture

Nearest NeighborKNN LSH

Distance computation

Duplicate Detection

RecommendationAnomaly Detection

Search result diversification

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Hyperparameter tuning

Transferlearning

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• Apply

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• ApplySparkSQL

Batchprediction

s

Deep Learning Pipelines

• Load data

• Interactive work

• Train

• Evaluate model

• ApplySparkSQL

Batchprediction

Batch prediction as an MLlib Transformer

• A model is a Transformer in MLlib• DataFrame goes in, DataFrame comes out with output columns

predictor = XXTransformer(inputCol="image",outputCol="prediction",modelSpecification={…})

predictions = predictor.transform(test_df)

Hierarchy of DL transformers for images

46

TFImageTransformer

KerasImageTransformer

DeepImageFeaturizerDeepImagePredictor

NamedImageTransformer (model_name)

(keras.Model)

(tf.Graph)

Input(Image)

Output(vector)

model_namemodel_name

Keras.Model

tf.Graph

Hierarchy of DL transformers for images

47

TFImageTransformer

KerasImageTransformer

DeepImageFeaturizerDeepImagePredictor

NamedImageTransformer (model_name)

(keras.Model)

(tf.Graph)

Input(Image)

Output(vector)

model_namemodel_name

Keras.Model

tf.Graph

Hierarchy of DL transformers

48

TFImageTransformer

KerasImageTransformer

DeepImageFeaturizer

DeepImagePredictor

NamedImageTransformer

model_name)

Input(Image

)

Output(vector

)

model_namemodel_name

Keras.Model

tf.Graph

TFTextTransformer

KerasTextTransformer

DeepTextFeaturizer

DeepTextPredictor

NamedTextTransformerInput(Text)

Output(vector)

model_namemodel_name

Keras.Model

tf.Graph

TFTransformer

Hierarchy of DL transformers

49

TFImageTransformer

KerasImageTransformer

DeepImageFeaturizerDeepImagePredictor

NamedImageTransformer

TFTextTransformer

KerasTextTransformer

DeepTextFeaturizerDeepTextPredictor

NamedTextTransformer

TFTransformer

Hierarchy of DL transformers

50

TFImageTransformer

KerasImageTransformer

DeepImageFeaturizerDeepImagePredictor

NamedImageTransformer

TFTextTransformer

KerasTextTransformer

DeepTextFeaturizerDeepTextPredictor

NamedTextTransformer

TFTransformer

Common Tasks EasyAdvanced Ones Possible

51

TFImageTransformer

KerasImageTransformer

DeepImageFeaturizerDeepImagePredictor

NamedImageTransformer

TFTextTransformer

KerasTextTransformer

DeepTextFeaturizerDeepTextPredictor

NamedTextTransformer

TFTransformer

Common Tasks EasyAdvanced Ones Possible

52

TFImageTransformer

KerasImageTransformer

DeepImageFeaturizer

DeepImagePredictor

NamedImageTransformer

TFTextTransformer

KerasTextTransformer

DeepTextFeaturizer

DeepTextPredictor

NamedTextTransformer

TFTransformer

Pre-built Solutions

80%-built Solutions

Self-built Solutions

Deep Learning Pipelines : FutureIn progress• Text featurization (embeddings)• TFTransformer for arbitrary vectors

Future directions• Non-image data domains: video, text, speech, …• Distributed training• Support for more backends, e.g. MXNet, PyTorch, BigDL

Questions?Thank you!

54putyour#assignedhashtagherebysettingthefooterinview-header/footer

ResourcesDL Pipelines GitHub Repo, Spark Summit Europe 2017 Deep Dive

Blog posts & webinars (http://databricks.com/blog)• Deep Learning Pipelines• GPU acceleration in Databricks• BigDL on Databricks• Deep Learning and Apache Spark

Docs for Deep Learning on Databricks (http://docs.databricks.com)• Getting started• Deep Learning Pipelines Example• Spark integration