Machine Learning Services Benchmark - Inês Almeida @ PAPIs Connect
Inês Almeida · PAPIs Connect · March 2016
MLaaS Benchmark: on building your ML solution
given a user’s recent mobile app activity, will he return within two weeks?
what is the best ML solution?
Amazon ML
  pros: documentation, cheapest
  cons: model exporting
Google Predict
  pros: incremental training
  cons: unknown algorithms, model exporting
MS Azure ML
  pros: algorithm variety, interface
  cons: most expensive, model exporting
aspects considered
I. data preprocessing operations
II. algorithms
III. performance
I. data preprocessing
• turning raw data into structured data
  - data cleaning
  - missing value imputation
  - feature engineering, aka dark magic
• can make or break your solution
• probably easier to do on your side
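For the question posed at the start (will a user return within two weeks?), doing the preprocessing on your side can look roughly like the pandas sketch below. It assumes a hypothetical raw event log with one row per app event and columns `user_id`, `timestamp` and `session_sec`. The chosen features (event count, active days, days since last event, mean session length) and the median imputation are illustrative choices, not the ones used in the talk.

```python
import pandas as pd


def build_features(events: pd.DataFrame, cutoff: pd.Timestamp, horizon_days: int = 14) -> pd.DataFrame:
    """Turn a raw event log (one row per app event) into one row per user.

    Assumed (hypothetical) columns: user_id, timestamp, session_sec.
    """
    # Data cleaning: drop events we cannot attribute or place in time, and exact duplicates.
    events = events.dropna(subset=["user_id", "timestamp"]).drop_duplicates()
    events = events.assign(day=events["timestamp"].dt.normalize())

    past = events[events["timestamp"] < cutoff]
    horizon_end = cutoff + pd.Timedelta(days=horizon_days)
    future = events[(events["timestamp"] >= cutoff) & (events["timestamp"] < horizon_end)]

    # Feature engineering: summarise each user's activity before the cutoff.
    grouped = past.groupby("user_id")
    features = pd.DataFrame({
        "n_events": grouped.size(),
        "n_active_days": grouped["day"].nunique(),
        "days_since_last": (cutoff - grouped["timestamp"].max()).dt.days,
        "mean_session_sec": grouped["session_sec"].mean(),
    })

    # Missing value imputation: users with no recorded session length get the median.
    median_session = features["mean_session_sec"].median()
    features["mean_session_sec"] = features["mean_session_sec"].fillna(median_session)

    # Label: did the user come back within the horizon after the cutoff?
    features["returned"] = features.index.isin(future["user_id"]).astype(int)
    return features


if __name__ == "__main__":
    events = pd.DataFrame({
        "user_id": ["a", "a", "a", "b", "b", "c", None],
        "timestamp": pd.to_datetime([
            "2016-03-01 10:00", "2016-03-01 18:00", "2016-03-05 09:00",
            "2016-03-02 12:00", "2016-03-20 12:00", "2016-03-03 08:00",
            "2016-03-04 08:00",
        ]),
        "session_sec": [30.0, 45.0, None, 120.0, 60.0, None, 10.0],
    })
    print(build_features(events, pd.Timestamp("2016-03-10")))
```

The output has one row per user seen before the cutoff. That structured table is what you would upload to an ML service, or feed to your own models.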
                             Amazon ML        Google Predict    MS Azure ML
missing value imputation     not explicitly   yes, automatic    yes, custom
data scaling                 yes              no                yes
text tokenization            yes              yes               yes
categorical data encoding    yes              yes               yes
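All four operations in the table are also available locally in scikit-learn. The sketch below does not mirror how any of the three services implements them. It wires median imputation plus standard scaling for numeric columns, one-hot encoding for a categorical column, and TF-IDF tokenization for a text column into one `ColumnTransformer`. The column names (`n_events`, `days_since_last`, `platform`, `last_search_query`) are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical column names for an app-activity dataset.
numeric_cols = ["n_events", "days_since_last"]
categorical_cols = ["platform"]
text_col = "last_search_query"  # a single string: TfidfVectorizer expects 1-D input

preprocess = ColumnTransformer(
    [
        ("num", Pipeline([
            ("impute", SimpleImputer(strategy="median")),  # missing value imputation
            ("scale", StandardScaler()),                   # data scaling
        ]), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),  # categorical encoding
        ("text", TfidfVectorizer(), text_col),             # text tokenization (+ TF-IDF weights)
    ],
    sparse_threshold=0,  # always return a dense array, for easy inspection
)

df = pd.DataFrame({
    "n_events": [3, 1, np.nan, 8],
    "days_since_last": [4, 7, 6, 1],
    "platform": ["ios", "android", "ios", "web"],
    "last_search_query": ["red shoes", "blue shoes", "red hat", "shoes sale"],
})

X = preprocess.fit_transform(df)
print(X.shape)  # 2 numeric + 3 platform categories + 5 vocabulary terms per row
```

Fitting the transformer on training data only and reusing it at prediction time keeps preprocessing identical across training and serving. That consistency is harder to guarantee when a provider applies its own transformations.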
II. algorithms
supervised learning
• linear models
  - easier to train and tune
  - limited expressiveness
• nonlinear models
  - more expressive
  - prone to overfitting
  - random forests: the no-brainer
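A quick way to see this trade-off is to fit a linear model and a random forest on data with a curved class boundary. The sketch below uses scikit-learn's synthetic `make_moons` dataset (an illustrative choice, not data from the talk). The random forest fits the training set almost perfectly, and the gap between its train and test accuracy is the overfitting tendency mentioned above.

```python
from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Two interleaving half-circles: a boundary no straight line can separate cleanly.
X, y = make_moons(n_samples=1000, noise=0.3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

models = {
    "logistic regression (linear)": LogisticRegression(),
    "random forest (nonlinear)": RandomForestClassifier(n_estimators=200, random_state=0),
}

scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # (train accuracy, test accuracy); a large gap signals overfitting
    scores[name] = (model.score(X_train, y_train), model.score(X_test, y_test))
    print(f"{name}: train={scores[name][0]:.2f} test={scores[name][1]:.2f}")
```

Random forests remain a reasonable default because averaging many trees limits how much that training-set fit hurts test accuracy.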
                         Amazon ML           Google Predict              MS Azure ML
supervised learning      linear algorithms   unknown, possibly linear    linear and nonlinear algorithms
unsupervised learning    none                none                        k-means
III. performance
                     Amazon ML   Google Predict   MS Azure ML   scikit-learn
test set accuracy    81%         82%              81%           81%
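Numbers this close are only comparable if every solution is scored on the same held-out test set. It also helps to report a trivial baseline, since a high accuracy can simply reflect class imbalance. The slides do not state the split or a baseline. The standard-library sketch below is one hypothetical way to score prediction lists exported from each service against shared test labels, with a majority-class baseline alongside.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("prediction and label lists must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def benchmark(y_true, predictions_by_service):
    """Score each service's predictions on the same test labels.

    Also reports a majority-class baseline: always predicting the most
    common label in y_true.
    """
    majority = Counter(y_true).most_common(1)[0][0]
    results = {"majority-class baseline": accuracy(y_true, [majority] * len(y_true))}
    for service, y_pred in predictions_by_service.items():
        results[service] = accuracy(y_true, y_pred)
    return results


if __name__ == "__main__":
    # Toy labels and predictions (hypothetical, not the talk's data).
    y_test = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
    preds = {
        "service A": [1, 0, 1, 1, 0, 1, 1, 0, 1, 0],
        "service B": [1] * 10,
    }
    for name, acc in benchmark(y_test, preds).items():
        print(f"{name}: {acc:.0%}")
```

In the toy data, "service B" matches the baseline exactly because it always predicts the majority class. Comparing against the baseline exposes that kind of result.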
what is the best ML solution for us?
• distributed, large-scale solution
  - Hadoop (HDFS) for data storage
  - Spark for ML computing
  - requires a lot of effort
• single-machine solution
  - MongoDB for data storage
  - Python packages for ML computing
  - exploits our current architecture
  - works fine for our scale
Liquid's ML architecture (diagram):
• ML Web Service (pandas, sklearn, theano, …): data processing, model training, predicting
• data (MongoDB) and models (MongoDB)
• API (Flask)
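The diagram's flow (train a model, store it, serve predictions through a Flask API) can be sketched as a minimal service. This is not Liquid's code. The `/predict/<name>` route, the JSON payload shape, the use of logistic regression, and `MODEL_STORE` are all hypothetical. `MODEL_STORE` is an in-memory dict standing in for the MongoDB models collection, holding pickled model bytes that a real service might keep in MongoDB instead.

```python
import pickle

from flask import Flask, jsonify, request
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for the "models (MongoDB)" store in the diagram:
# model name -> pickled model bytes.
MODEL_STORE = {}


def train_and_store(name, X, y):
    """Model training step: fit a model and persist it to the store."""
    model = LogisticRegression().fit(X, y)
    MODEL_STORE[name] = pickle.dumps(model)
    return model


def create_app():
    """API step: expose stored models behind a small Flask app."""
    app = Flask(__name__)

    @app.route("/predict/<name>", methods=["POST"])
    def predict(name):
        if name not in MODEL_STORE:
            return jsonify(error=f"unknown model: {name}"), 404
        payload = request.get_json(force=True)
        rows = payload["features"]  # list of already-preprocessed feature rows
        # Predicting step: unpickled on every request to keep the sketch short;
        # a real service would cache loaded models.
        model = pickle.loads(MODEL_STORE[name])
        proba = model.predict_proba(rows)[:, 1]  # probability of the positive class
        return jsonify(predictions=[float(p) for p in proba])

    return app


if __name__ == "__main__":
    train_and_store("returns_within_14_days", [[0.0], [1.0], [2.0], [3.0]], [0, 0, 1, 1])
    client = create_app().test_client()
    resp = client.post("/predict/returns_within_14_days", json={"features": [[0.5], [2.5]]})
    print(resp.status_code, resp.get_json())
```

Keeping training and serving in one Python process is what makes the single-machine option cheap to run. The trade-off is that scaling past one machine would mean revisiting the storage and serving layers.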
what is the best ML solution for you?
• if using an external provider
  - ML services still require some data science knowledge
  - keep data preprocessing on your side
• if building your own solution
  - exploit your product's strengths
  - start simple, then build on it
alternatives
• BigML
  - generic ML service that uses random forests
• PredictionIO
  - open-source ML server with customizable templates
• Algorithmia
  - algorithm marketplace (not just ML)
resources
"Machine Learning as a Service", Liquid Blog: https://blog.onliquid.com/machine-learning-service-benchmark/
"Machine learning APIs: which performs best?" by Louis Dorard: http://www.louisdorard.com/blog/machine-learning-apis-comparison
"Principles of Machine Learning Benchmarking" by Joey Richard: http://www.wise.io/blog/principles-of-machine-learning-benchmarking
"Does off-the-shelf machine learning need a benchmark?" by Jay Kreps: http://blog.empathybox.com/post/18810157226/does-off-the-shelf-machine-learning-need-a