Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

Inês Almeida . PAPIs Connect . March 2016

Page 1: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

MLaaS Benchmark
on building your ML solution

Page 2: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

given a user’s recent mobile app activity, will he return within two weeks?

Page 3: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

what is the best ML solution?

Page 4: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

Amazon ML
• documentation
• cheapest
• model exporting

Google Predict
• incremental training
• unknown algorithms
• model exporting

MS Azure ML
• algorithm variety
• interface
• most expensive
• model exporting

Page 5: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

aspects considered

I. data preprocessing operations
II. algorithms
III. performance

Page 6: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

I. data preprocessing

Page 7: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

• turning raw data into structured data
  _ data cleaning
  _ missing value imputation
  _ feature engineering aka dark magic

• can make or break your solution

• probably easier to do on your side

Page 8: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

                            Amazon ML        Google Predict    MS Azure ML
missing value imputation    not explicitly   yes, automatic    yes, custom
data scaling                yes              no                yes
text tokenization           yes              yes               yes
categorical data encoding   yes              yes               yes
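
The four operations in the table can also be done on your own side before any data reaches a provider. The following is a minimal sketch using only the Python standard library. The function names, the choice of mean imputation and z-score scaling, and the sample values are all assumptions made for illustration; they are not what any of the three services does internally.

```python
import re
from statistics import mean, pstdev


def impute_mean(values):
    """Missing value imputation: replace None with the mean of observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def standardize(values):
    """Data scaling: shift to zero mean and unit variance (constant columns map to 0)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def tokenize(text):
    """Text tokenization: lowercase, then keep runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def one_hot(values):
    """Categorical encoding: one 0/1 column per distinct category, in sorted order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]


# Hypothetical toy columns, invented for this example.
sessions = [3, None, 5, 4]
platform = ["ios", "android", "ios", "web"]

filled = impute_mean(sessions)
scaled = standardize(filled)
encoded = one_hot(platform)
tokens = tokenize("Opened app, viewed 3 screens")
```

Keeping these steps in your own code means the same transformations apply whichever provider, or local library, ends up training the model.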

Page 9: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

II. algorithms

Page 10: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

supervised learning

• linear models
  _ easier to train and tune
  _ limited expressiveness

• nonlinear models
  _ more expressive capabilities
  _ prone to overfitting
  _ random forests: the no-brainer
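
The trade-off between the two model families can be illustrated on a toy problem. The sketch below is an assumption-laden example, not part of the benchmark: it uses the classic XOR data set, a hand-written perceptron as the linear model, and a 1-nearest-neighbour rule as the nonlinear model (chosen instead of a random forest only to keep the code short and dependency-free).

```python
# Toy XOR data: the label is 1 when exactly one input is 1.
# No straight line separates the two classes.
X = [(0, 0), (0, 1), (1, 0), (1, 1)]
y = [0, 1, 1, 0]


def predict_linear(model, xi):
    """Linear model: predict 1 when w.x + b is positive."""
    w, b = model
    return 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else 0


def train_perceptron(X, y, epochs=50, lr=0.1):
    """Classic perceptron updates on the weights w and the bias b."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for xi, target in zip(X, y):
            err = target - predict_linear((w, b), xi)
            w = [wj + lr * err * xj for wj, xj in zip(w, xi)]
            b += lr * err
    return w, b


def predict_1nn(X_train, y_train, xi):
    """Nonlinear model: copy the label of the closest training point."""
    dists = [sum((a - c) ** 2 for a, c in zip(xt, xi)) for xt in X_train]
    return y_train[dists.index(min(dists))]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


linear_model = train_perceptron(X, y)
linear_acc = accuracy(y, [predict_linear(linear_model, xi) for xi in X])
knn_acc = accuracy(y, [predict_1nn(X, y, xi) for xi in X])
```

No linear classifier can label more than three of the four XOR points correctly, so `linear_acc` cannot exceed 0.75. The nearest-neighbour rule reaches 1.0 on the training points, but only because it memorises them, which is the same property that makes expressive models prone to overfitting.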

Page 11: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

                        Amazon ML           Google Predict             MS Azure ML
supervised learning     linear algorithms   unknown, possibly linear   linear and nonlinear algorithms
unsupervised learning   none                none                       k-means

Page 12: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

III. performance

Page 13: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

                    Amazon ML   Google Predict   MS Azure ML   scikit-learn
test set accuracy   81%         82%              81%           81%
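
Test set accuracy is the fraction of held-out examples that a model labels correctly, so predictions from different services can be compared as long as they are scored against the same test labels. A minimal sketch follows; the function name, the service names, and every label and prediction in it are hypothetical toy values, not the benchmark's data.

```python
def holdout_accuracy(y_true, y_pred):
    """Fraction of test examples whose predicted label matches the true label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("the test set is empty")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Hypothetical held-out labels: 1 = user returned within two weeks.
y_test = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]

# Hypothetical predictions returned by two services for the same test set.
predictions = {
    "service_a": [1, 0, 1, 0, 0, 1, 0, 1, 1, 1],
    "service_b": [1, 0, 1, 1, 0, 1, 1, 0, 1, 1],
}

scores = {name: holdout_accuracy(y_test, pred) for name, pred in predictions.items()}
```

With these toy values `service_a` scores 0.8 and `service_b` scores 0.9.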

Page 14: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

what is the best ML solution for us?

Page 15: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

• distributed, large-scale solution
  _ Hadoop (HDFS) for data storage
  _ Spark for ML computing
  _ requires much effort

Page 16: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

• single-machine solution
  _ MongoDB for data storage
  _ Python packages for ML computing
  _ exploits our current architecture
  _ works fine for our scale

Page 17: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

[architecture diagram]

Liquid
data (MongoDB)
models (MongoDB)
data processing
model training
predicting
API (Flask)
ML Web Service (pandas, sklearn, theano, …)
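
One way to read the diagram is as three stages (data processing, model training, predicting) sitting between a data store, a model store, and an API. The sketch below is a standard-library illustration of that flow under loud assumptions: plain Python containers stand in for the MongoDB collections, a plain function stands in for a Flask route, the "model" is a single threshold, and every name and record is hypothetical.

```python
import json

# In-memory stand-ins (assumption) for the data and model stores on the slide.
data_store = [
    {"user": "a", "sessions": 1, "returned": 0},
    {"user": "b", "sessions": 2, "returned": 0},
    {"user": "c", "sessions": 5, "returned": 1},
    {"user": "d", "sessions": 7, "returned": 1},
]
model_store = {}


def process(records):
    """Data processing: turn raw records into (feature, label) pairs."""
    return [(r["sessions"], r["returned"]) for r in records]


def train(pairs):
    """Model training: pick the session threshold with the best training accuracy."""
    best = None
    for threshold in sorted({x for x, _ in pairs}):
        hits = sum((x >= threshold) == bool(label) for x, label in pairs)
        acc = hits / len(pairs)
        if best is None or acc > best["train_accuracy"]:
            best = {"threshold": threshold, "train_accuracy": acc}
    return best


def predict(model, sessions):
    """Predicting: 1 if the user is expected to return, else 0."""
    return int(sessions >= model["threshold"])


def handle_request(body):
    """What an API route might do with a JSON request body."""
    payload = json.loads(body)
    model = model_store["returning_user"]
    return json.dumps({"will_return": predict(model, payload["sessions"])})


model_store["returning_user"] = train(process(data_store))
response = handle_request('{"sessions": 6}')
```

Storing the trained model separately from the raw data lets the prediction path load a model without retraining on every request.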

Page 18: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

what is the best ML solution for you?

Page 19: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

• if using an external provider
  _ ML services need some data science knowledge
  _ keep data preprocessing on your side

• if building your own solution
  _ exploit your product’s strengths
  _ start simple, then build on it

Page 20: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

alternatives

• bigml
  _ generic ML service that uses random forests

• prediction.io
  _ open source ML server with customizable templates

• algorithmia
  _ algorithm marketplace (not just ML)

Page 21: Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect

resources

Machine Learning as a Service on Liquid Blog
https://blog.onliquid.com/machine-learning-service-benchmark/

Machine learning APIs: which performs best? by Louis Dorard
http://www.louisdorard.com/blog/machine-learning-apis-comparison

Principles of Machine Learning Benchmarking by Joey Richard
http://www.wise.io/blog/principles-of-machine-learning-benchmarking

Does off-the-shelf machine learning need a benchmark? by Jay Kreps
http://blog.empathybox.com/post/18810157226/does-off-the-shelf-machine-learning-need-a