BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep...

67
BAYESIAN GLOBAL OPTIMIZATION Using Optimal Learning to Tune Deep Learning Pipelines Scott Clark [email protected]

Transcript of BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep...

Page 1: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

BAYESIAN GLOBAL OPTIMIZATIONUsing Optimal Learning to Tune Deep Learning Pipelines

Scott Clark

[email protected]

Page 2: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

OUTLINE

1. Why is Tuning AI Models Hard?

2. Comparison of Tuning Methods

3. Bayesian Global Optimization

4. Deep Learning Examples

5. Evaluating Optimization Strategies

Page 3: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

Deep Learning / AI is extremely powerful

Tuning these systems is extremely non-intuitive

Page 4: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

https://www.quora.com/What-is-the-most-important-unresolved-problem-in-machine-learning-3

What is the most important unresolved problem in machine learning?

“...we still don't really know why some configurations of deep neural networks work in some case and not others, let alone having a more or less automatic approach to determining the architectures and the hyperparameters.”

Xavier Amatriain, VP Engineering at Quora(former Director of Research at Netflix)

Page 5: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

Photo: Joe Ross

Page 6: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

TUNABLE PARAMETERS IN DEEP LEARNING

Page 7: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

TUNABLE PARAMETERS IN DEEP LEARNING

Page 8: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

Photo: Tammy Strobel

Page 9: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

STANDARD METHODSFOR HYPERPARAMETER SEARCH

Page 10: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

STANDARD TUNING METHODS

ParameterConfiguration

?Grid Search Random Search

Manual Search

- Weights- Thresholds- Window sizes- Transformations

ML / AIModel

TestingData

CrossValidation

TrainingData

Page 11: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

OPTIMIZATION FEEDBACK LOOP

Objective Metric

Better Results

REST API

New configurationsML / AIModel

TestingData

CrossValidation

TrainingData

Page 12: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

BAYESIAN GLOBAL OPTIMIZATION

Page 13: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

… the challenge of how to collect information as efficiently as possible, primarily for settings where collecting information is time consuming and expensive.

Prof. Warren Powell - Princeton

What is the most efficient way to collect information?Prof. Peter Frazier - Cornell

How do we make the most money, as fast as possible?Scott Clark - CEO, SigOpt

OPTIMAL LEARNING

Page 14: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

● Optimize objective function○ Loss, Accuracy, Likelihood

● Given parameters○ Hyperparameters, feature/architecture params

● Find the best hyperparameters○ Sample function as few times as possible○ Training on big data is expensive

BAYESIAN GLOBAL OPTIMIZATION

Page 15: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

SMBO

Sequential Model-Based Optimization

HOW DOES IT WORK?

Page 16: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

1. Build Gaussian Process (GP) with points sampled so far

2. Optimize the fit of the GP (covariance hyperparameters)

3. Find the point(s) of highest Expected Improvement within parameter domain

4. Return optimal next best point(s) to sample

GP/EI SMBO

Page 17: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

GAUSSIAN PROCESSES

Page 18: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

GAUSSIAN PROCESSES

Page 19: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

GAUSSIAN PROCESSES

Page 20: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

GAUSSIAN PROCESSES

Page 21: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

GAUSSIAN PROCESSES

Page 22: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

GAUSSIAN PROCESSES

Page 23: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

GAUSSIAN PROCESSES

Page 24: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

GAUSSIAN PROCESSES

Page 25: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

overfit good fit underfit

GAUSSIAN PROCESSES

Page 26: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

EXPECTED IMPROVEMENT

Page 27: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

EXPECTED IMPROVEMENT

Page 28: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

EXPECTED IMPROVEMENT

Page 29: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

EXPECTED IMPROVEMENT

Page 30: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

EXPECTED IMPROVEMENT

Page 31: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

EXPECTED IMPROVEMENT

Page 32: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

DEEP LEARNING EXAMPLES

Page 33: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

● Classify movie reviews using a CNN in MXNet

SIGOPT + MXNET

Page 34: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

TEXT CLASSIFICATION PIPELINE

ML / AIModel

(MXNet)

TestingText

ValidationAccuracy

Better Results

REST API

HyperparameterConfigurations

andFeature

TransformationsTrainingText

Page 35: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

TUNABLE PARAMETERS IN DEEP LEARNING

Page 36: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

● Comparison of several RMSProp SGD parametrizations

STOCHASTIC GRADIENT DESCENT

Page 37: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

ARCHITECTURE PARAMETERS

Page 38: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

Grid Search Random Search

This slide’s GIF loops automatically

?

TUNING METHODS

Page 39: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

MULTIPLICATIVE TUNING SPEED UP

Page 40: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

SPEED UP #1: CPU -> GPU

Page 41: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

SPEED UP #2: RANDOM/GRID -> SIGOPT

Page 42: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

CONSISTENTLY BETTER AND FASTER

Page 43: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

● Classify house numbers in an image dataset (SVHN)

SIGOPT + TENSORFLOW

Page 44: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

COMPUTER VISION PIPELINE

ML / AIModel

(Tensorflow)

TestingImages

CrossValidation

Accuracy

Better Results

REST API

HyperparameterConfigurations

andFeature

TransformationsTrainingImages

Page 45: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

METRIC OPTIMIZATION

Page 46: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

● All convolutional neural network● Multiple convolutional and dropout layers● Hyperparameter optimization mixture of

domain expertise and grid search (brute force)

SIGOPT + NEON

http://arxiv.org/pdf/1412.6806.pdf

Page 47: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

COMPARATIVE PERFORMANCE

● Expert baseline: 0.8995○ (using neon)

● SigOpt best: 0.9011○ 1.6% reduction in

error rate○ No expert time

wasted in tuning

Page 48: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

SIGOPT + NEON

http://arxiv.org/pdf/1512.03385v1.pdf

● Explicitly reformulate the layers as learning residualfunctions with reference to the layer inputs, instead of learning unreferenced functions

● Variable depth● Hyperparameter optimization mixture of domain expertise and grid search (brute force)

Page 49: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

COMPARATIVE PERFORMANCE

Standard Method

● Expert baseline: 0.9339○ (from paper)

● SigOpt best: 0.9436○ 15% relative error

rate reduction○ No expert time

wasted in tuning

Page 50: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

EVALUATING THE OPTIMIZER

Page 51: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

OUTLINE

● Metric Definitions

● Benchmark Suite

● Eval Infrastructure

● Visualization Tool

● Baseline Comparisons

Page 52: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

What is the best value found after optimization completes?

METRIC: BEST FOUND

BLUE RED

BEST_FOUND 0.7225 0.8949

Page 53: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

How quickly is optimum found? (area under curve)

METRIC: AUC

BLUE RED

BEST_FOUND 0.9439 0.9435

AUC 0.8299 0.9358

Page 54: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

STOCHASTIC OPTIMIZATION

Page 55: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

● Optimization functions from literature● ML datasets: LIBSVM, Deep Learning, etc

BENCHMARK SUITE

TEST FUNCTION TYPE COUNT

Continuous Params 184

Noisy Observations 188

Parallel Observations 45

Integer Params 34

Categorical Params / ML 47

Failure Observations 30

TOTAL 489

Page 56: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

● On-demand cluster in AWS for parallel eval function optimization

● Full eval consists of

~20000 optimizations, taking ~30 min

INFRASTRUCTURE

Page 57: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

RANKING OPTIMIZERS

1. Mann-Whitney U tests using BEST_FOUND

2. Tied results then partially ranked using AUC

3. Any remaining ties, stay as ties for final ranking

Page 58: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

RANKING AGGREGATION

● Aggregate partial rankings across all eval functions using Borda count (sum of methods ranked lower)

Page 59: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

SHORT RESULTS SUMMARY

Page 60: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

BASELINE COMPARISONS

Page 61: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

SIGOPT SERVICE

Page 62: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

OPTIMIZATION FEEDBACK LOOP

Objective Metric

Better Results

REST API

New configurationsML / AIModel

TestingData

CrossValidation

TrainingData

Page 63: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

SIMPLIFIED OPTIMIZATIONClient Libraries● Python● Java● R● Matlab● And more...

Framework Integrations● TensorFlow● scikit-learn● xgboost● Keras● Neon● And more...

Live Demo

Page 64: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

DISTRIBUTED TRAINING

● SigOpt serves as a distributed scheduler for training models across workers

● Workers access the SigOpt API for the latest parameters to try for each model

● Enables easy distributed training of non-distributed algorithms across any number of models

Page 65: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

COMPARATIVE PERFORMANCE ● Better Results, Faster and Cheaper

Quickly get the most out of your models with our proven, peer-reviewed ensemble of Bayesian and Global Optimization Methods

○ A Stratified Analysis of Bayesian Optimization Methods (ICML 2016)○ Evaluation System for a Bayesian Optimization Service (ICML 2016)○ Interactive Preference Learning of Utility Functions for Multi-Objective Optimization (NIPS 2016)○ And more...

● Fully FeaturedTune any model in any pipeline

○ Scales to 100 continuous, integer, and categorical parameters and many thousands of evaluations○ Parallel tuning support across any number of models○ Simple integrations with many languages and libraries○ Powerful dashboards for introspecting your models and optimization○ Advanced features like multi-objective optimization, failure region support, and more

● Secure Black Box OptimizationYour data and models never leave your system

Page 66: BAYESIAN GLOBAL OPTIMIZATION - NVIDIAon-demand.gputechconf.com/gtc/2017/presentation/s... · Deep Learning / AI is extremely powerful Tuning these systems is extremely non-intuitive.

https://sigopt.com/getstarted

Try it yourself!