SigOpt for Machine Learning and AI


Transcript of SigOpt for Machine Learning and AI

AMPLIFY YOUR ML / AI MODELS

Hello, my name is Scott Clark, co-founder and CEO of SigOpt.

In this video I’m going to show you how SigOpt can help you amplify your machine learning and AI models by optimally tuning them using our black-box optimization platform.

For more information please visit https://sigopt.com.

SigOpt optimizes...

● Machine Learning
● AI / Deep Learning
● Risk / Fraud Models
● Backtests / Simulations

Resulting in...

● Better Results
● Faster Development
● Cheaper, Faster Tuning

OPTIMIZATION AS A SERVICE

The SigOpt platform provides an ensemble of state-of-the-art Bayesian and Global optimization algorithms via a simple Software-as-a-Service API.

SigOpt optimizes machine learning models like random forests, support vector machines, and gradient boosted methods, as well as more sophisticated systems like deep learning pipelines, proprietary risk and fraud models, or even complex backtesting and simulation pipelines. This enables data scientists and machine learning engineers to build better models with less trial and error by efficiently optimizing the tunable parameters of these models.

This captures performance that conventional techniques may otherwise leave on the table, while also reducing the time and cost of developing and optimizing new models.

Photo: Joe Ross

Every complex system has tunable parameters.

A car has parameters, like the gear ratio or fuel injection ratio, that affect outputs like its top speed.

TUNABLE PARAMETERS IN DEEP LEARNING

A machine learning or AI model has tunable hyperparameters that affect performance. These can be as simple as the number of trees in a random forest or the kernel of a support vector machine, or as complex as the learning rate in a gradient boosted or deep learning method.

In this simple TensorFlow example, we have constructed a 4-layer network to perform 2D binary classification. We are attempting to learn a surface that differentiates the blue and orange dots seen in the figure to the right. Even this simple task and small network involve 22 tunable hyperparameters, including traditional hyperparameters like the learning rate and activation function, as well as regularization, architecture, and feature transformation parameters. By tuning the parameters of this pipeline in unison we can achieve much better results than by tuning them independently.
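As a concrete illustration, here is a minimal sketch of such a network using tf.keras; the layer sizes and the five parameters exposed here are illustrative stand-ins for the 22-parameter pipeline in the video, which also tuned feature transformations:

```python
import tensorflow as tf

def build_model(learning_rate, activation, hidden_units, dropout_rate, l2_penalty):
    """Small binary classifier over 2-D inputs; every argument is tunable."""
    reg = tf.keras.regularizers.l2(l2_penalty)
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(2,)),
        tf.keras.layers.Dense(hidden_units, activation=activation,
                              kernel_regularizer=reg),
        tf.keras.layers.Dropout(dropout_rate),
        tf.keras.layers.Dense(hidden_units, activation=activation,
                              kernel_regularizer=reg),
        tf.keras.layers.Dense(1, activation="sigmoid"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="binary_crossentropy",
        metrics=["accuracy"],
    )
    return model

# One candidate configuration; a tuner searches over all arguments jointly.
model = build_model(learning_rate=1e-3, activation="relu",
                    hidden_units=16, dropout_rate=0.1, l2_penalty=1e-4)
```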

This extends to other AI and machine learning pipelines as well, which may incorporate many unsupervised and supervised learning techniques with tunable parameters.

STANDARD TUNING METHODS

[Diagram: a parameter configuration (weights, thresholds, window sizes, transformations), chosen by grid search, random search, or manual search, feeds the ML / AI model, which is trained on training data and evaluated via cross-validation against testing data.]

Domain expertise is incredibly important when developing new machine learning pipelines, which often undergo rigorous validation before being deployed into production.

Often the modeler needs to tune the hyperparameters and feature transformation parameters within their pipeline to optimize a performance metric and maximize the business value of the model. This involves finding the best parameter and hyperparameter configurations for all the various knobs and levers within the system, and can have a significant impact on the end results.

Traditionally this is a very time-consuming and expensive trial-and-error process that relies on methods like grid search, random search, local search, or expert-intensive manual search, as sketched below.
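For reference, a short sketch of the grid and random search baselines using scikit-learn on synthetic data (the estimator and search ranges are illustrative):

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=500, random_state=0)

# Grid search: exhaustively evaluates every combination in the grid.
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [1e-3, 1e-2, 1e-1]}, cv=5)
grid.fit(X, y)

# Random search: samples a fixed budget of configurations from distributions.
rand = RandomizedSearchCV(
    SVC(),
    {"C": loguniform(1e-2, 1e2), "gamma": loguniform(1e-4, 1e0)},
    n_iter=9,  # the same 9-evaluation budget as the grid above
    cv=5,
    random_state=0,
)
rand.fit(X, y)
```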

OPTIMIZATION FEEDBACK LOOP

[Diagram: the optimization feedback loop. SigOpt’s REST API sends new configurations to the ML / AI model, which is trained on training data and evaluated via cross-validation against testing data; the resulting objective metric is reported back through the API, yielding better results with each iteration.]

SigOpt uses a proven, peer-reviewed ensemble of Bayesian and Global Optimization algorithms to efficiently tune these models.

First, SigOpt suggests parameter configurations, which are then evaluated using a method like cross-validation, where an objective metric like AUC or F1 score is computed. This process is repeated, either in parallel or serially.

SigOpt’s ensemble of optimization methods leverages the historical performance of previous configurations to optimally suggest new parameter configurations to evaluate. By efficiently trading off exploration (learning more information about the underlying parameters and response surface) and exploitation (leveraging that information to optimize the output metric), SigOpt is able to find better configurations exponentially faster than standard methods like exhaustive grid search.

All of this is accomplished by bolting our easy-to-integrate REST API onto your existing models and infrastructure.
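As a rough sketch of that integration, here is the suggest/observe loop as it looked in SigOpt’s Python client of this era (exact fields may vary by version); the API token, experiment definition, and evaluate_model routine, which stands for your own training and cross-validation code, are placeholders:

```python
from sigopt import Connection

conn = Connection(client_token="YOUR_API_TOKEN")  # placeholder token

# Only the tunable parameters are described; data and models stay on your side.
experiment = conn.experiments().create(
    name="Example model tuning",  # illustrative name
    parameters=[
        dict(name="learning_rate", type="double", bounds=dict(min=1e-4, max=1e-1)),
        dict(name="max_depth", type="int", bounds=dict(min=2, max=12)),
    ],
)

for _ in range(50):  # evaluation budget
    # SigOpt suggests the next configuration to try...
    suggestion = conn.experiments(experiment.id).suggestions().create()
    # ...you evaluate it locally (e.g. cross-validated AUC)...
    value = evaluate_model(suggestion.assignments)
    # ...and report the observed metric back through the REST API.
    conn.experiments(experiment.id).observations().create(
        suggestion=suggestion.id,
        value=value,
    )
```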

SigOpt’s black-box optimization algorithms require only high-level information about the parameters being tuned and how they performed, meaning sensitive information about your data and model stays private and secure. Additionally, the benefits captured by better tuning are additive with the work you’ve already done on the model and data itself.

USE CASE: DEFAULT CLASSIFICATION

[Diagram: the feedback loop applied to an xgboost model. SigOpt’s REST API sends hyperparameter configurations to the model, which is trained on training loan data and evaluated via cross-validation against testing loan data; accuracy (AUC ROC) is reported back, yielding better results.]

In this specific example, we’ll compare the relative tradeoffs of different tuning strategies in a loan default classification pipeline using xgboost, a popular gradient boosting library, and the open Lending Club dataset.

We’ll tune the various hyperparameters of xgboost and optimize the accuracy metric of AUC ROC.
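As a sketch of the objective being optimized here, using xgboost’s scikit-learn interface with cross-validated AUC ROC (the hyperparameter subset is illustrative, and X, y stand in for the prepared Lending Club features and default labels):

```python
from sklearn.model_selection import cross_val_score
from xgboost import XGBClassifier

def evaluate_model(assignments, X, y):
    """Cross-validated AUC ROC for one suggested hyperparameter configuration."""
    model = XGBClassifier(
        learning_rate=assignments["learning_rate"],
        max_depth=assignments["max_depth"],
        n_estimators=assignments["n_estimators"],
        subsample=assignments["subsample"],
    )
    # The mean AUC across folds is the single number reported back to SigOpt.
    return cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
```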

COMPARATIVE PERFORMANCE

[Chart: accuracy (AUC, .675 to .698) versus tuning cost ($1,000 / 100 hrs to $100,000 / 10,000 hrs) for grid search, random search, and SigOpt.]

● Better: 22% fewer bad loans vs baseline

● Faster/Cheaper: 100x less time and AWS cost than standard tuning methods

SigOpt was able to efficiently tune the pipeline, beating the standard methods of exhaustive grid search and a randomized search in both AUC and the cost to achieve that AUC.

SigOpt found a model that had a 22% relative improvement in the metric when compared to the default xgboost hyperparameters, while also requiring 100x fewer evaluations than the standard grid search approach.

Extended xgboost example:
- Blog: http://blog.sigopt.com/post/140871698423/sigopt-for-ml-unsupervised-learning-with-even
- Code: https://github.com/sigopt/sigopt-examples/tree/master/unsupervised-model

USE CASE: COMPUTER VISION

[Diagram: the feedback loop applied to a TensorFlow model. SigOpt’s REST API sends hyperparameter configurations and feature transformations to the model, which is trained on training images and evaluated via cross-validation against testing images; accuracy is reported back, yielding better results.]

Because SigOpt is a black-box optimization platform, it is agnostic to the underlying model being tuned and can readily be used to tune any machine learning or AI pipeline.

All SigOpt requires are the parameters to tune (continuous, integer, or categorical), whether they are hyperparameters of a machine learning model or feature transformation parameters of an NLP or computer vision pipeline, and a performance metric to optimize.

SigOpt makes no assumptions about the underlying parameters or metric. The metric can even be a composite of many underlying metrics, and it does not need to be convex, continuous, differentiable, or even defined for all configurations.
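To illustrate, here is a hypothetical parameter space mixing all three types, along with a composite metric of the kind described (the names, bounds, and latency weighting are purely illustrative):

```python
# Continuous, integer, and categorical parameters in one search space.
parameters = [
    dict(name="learning_rate", type="double", bounds=dict(min=1e-5, max=1e-1)),
    dict(name="num_layers", type="int", bounds=dict(min=1, max=8)),
    dict(name="activation", type="categorical",
         categorical_values=[dict(name="relu"), dict(name="tanh")]),
]

def composite_metric(accuracy, latency_ms):
    # The optimized value can blend several underlying measurements;
    # SigOpt only needs a single number per evaluated configuration.
    return accuracy - 0.001 * latency_ms
```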

In this specific example, we’ll compare the relative tradeoffs of different tuning strategies in a computer vision classification pipeline using Google’s TensorFlow on the SVHN image dataset. We’ll tune the various hyperparameters of the TensorFlow model, as well as feature transformation parameters related to the images themselves, to optimize the accuracy of the classifier.

COMPARATIVE PERFORMANCE

● Better: 315% better accuracy than baseline

● Faster/Cheaper: 88% cheaper than standard tuning methods

[Chart: cross-validation accuracy (0 to 1.0) versus cost ($0 to $10,000; 1 production model on a 50 GPU cluster) for no tuning, random search, and SigOpt, on the TensorFlow CNN and Neon DNN examples.]

SigOpt beat the default hyperparameter configuration as well as the standard randomized search method in both accuracy and the cost to achieve that accuracy, by requiring fewer evaluations to reach the optimal configuration on a GPU cluster.

SigOpt found a model that had a 315% relative improvement in the metric when compared to the default TensorFlow parameters, while also requiring 88% fewer evaluations than the standard random search approach.

Extended TensorFlow example:
- Blog: http://blog.sigopt.com/post/141501625253/sigopt-for-ml-tensorflow-convnets-on-a-budget
- Code: https://github.com/sigopt/sigopt-examples/tree/master/tensorflow-cnn

Extended Neon DNN examples:
- Blog: http://blog.sigopt.com/post/146208659358/much-deeper-much-faster-deep-neural-network
- Code: https://github.com/sigopt/sigopt-examples/tree/master/dnn-tuning-nervana

COMPARATIVE PERFORMANCE

● Better Results, Faster and Cheaper
Quickly get the most out of your models with our proven, peer-reviewed ensemble of Bayesian and Global Optimization Methods
○ A Stratified Analysis of Bayesian Optimization Methods (ICML 2016)
○ Evaluation System for a Bayesian Optimization Service (ICML 2016)
○ Interactive Preference Learning of Utility Functions for Multi-Objective Optimization (NIPS 2016)
○ And more...

● Fully Featured
Tune any model in any pipeline
○ Scales to 100 continuous, integer, and categorical parameters and many thousands of evaluations
○ Parallel tuning support across any number of models
○ Simple integrations with many languages and libraries
○ Powerful dashboards for introspecting your models and optimization
○ Advanced features like multi-objective optimization, failure region support, and more

● Secure Black Box Optimization
Your data and models never leave your system

SigOpt provides best-in-class performance. We’ve successfully deployed our solution at firms worldwide and rigorously compare our methods to standard and open source alternatives at the top machine learning conferences.

Our platform scales to any problem and provides features like native parallelism, multi-objective optimization, and more.

Additionally, our black box optimization approach means that your proprietary data and models never leave your system, allowing you to leverage these powerful techniques on top of the infrastructure and tools you’ve already built.

Links:
○ A Stratified Analysis of Bayesian Optimization Methods (ICML 2016)
■ https://arxiv.org/pdf/1603.09441v1.pdf
○ Evaluation System for a Bayesian Optimization Service (ICML 2016)
■ https://arxiv.org/abs/1605.06170
○ Interactive Preference Learning of Utility Functions for Multi-Objective Optimization (NIPS 2016)
■ https://arxiv.org/abs/1612.04453
○ And more…
■ https://sigopt.com/research

SIMPLIFIED OPTIMIZATION

Client Libraries
● Python
● Java
● R
● Matlab
● And more...

Framework Integrations
● TensorFlow
● Scikit-learn
● xgboost
● Keras
● Neon
● And more...

Live Demo

The SigOpt optimization platform integrates with any technology stack and the intuitive dashboards shine a light on the otherwise opaque world of parameter tuning.

Just plug our API in, tune your models, and your whole team benefits from the history, transparency, and analysis in the platform.

Documentation: https://sigopt.com/docs
Integrations: https://github.com/sigopt
Live Demo: https://sigopt.com/getstarted

DISTRIBUTED TRAINING

● SigOpt serves as a distributed scheduler for training models across workers

● Workers access the SigOpt API for the latest parameters to try for each model

● Enables easy distributed training of non-distributed algorithms across any number of models

SigOpt also allows you to tune any algorithm in parallel by acting as a distributed scheduler for parameter tuning.

This allows you to tune traditionally serial models in parallel, and achieve better results faster than otherwise possible, while also scaling across any number of independent models.
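A rough sketch of that pattern under the same assumptions as the earlier client example: each worker process independently requests an open suggestion, evaluates it with your own evaluate_model routine, and reports the result back (real deployments would typically spread workers across machines):

```python
from multiprocessing import Process

from sigopt import Connection

def worker(experiment_id, budget):
    conn = Connection(client_token="YOUR_API_TOKEN")  # placeholder token
    for _ in range(budget):
        # Each worker pulls its own suggestion, so no two workers
        # evaluate the same configuration.
        suggestion = conn.experiments(experiment_id).suggestions().create()
        value = evaluate_model(suggestion.assignments)  # your training routine
        conn.experiments(experiment_id).observations().create(
            suggestion=suggestion.id,
            value=value,
        )

if __name__ == "__main__":
    # Five local workers sharing one evaluation budget of 50.
    procs = [Process(target=worker, args=("EXPERIMENT_ID", 10)) for _ in range(5)]
    for p in procs:
        p.start()
    for p in procs:
        p.join()
```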

More info: https://sigopt.com/docs/overview/parallel

SIGOPT CUSTOMERS

SigOpt has successfully engaged with globally recognized leaders in the insurance, credit card, algorithmic trading, and consumer packaged goods industries. Use cases include:

● Trading Strategies
● Complex Models
● Simulations / Backtests
● Machine Learning and AI

Select Customers

SigOpt has been deployed successfully at some of the largest and most sophisticated firms and universities in the world.

We’ve helped tune everything from algorithmic trading strategies to machine learning and AI pipelines and beyond.

Contact us to set up an evaluation today

[email protected]

Contact us to set up an evaluation and unleash the power of Bayesian and Global Optimization on your models today.

© 2017 SigOpt, Inc https://sigopt.com