Modern Machine Learning Infrastructure and Practices
Uploaded by will-gardella
Transcript of Modern Machine Learning Infrastructure and Practices
About Me
• Robotics
• Pricing and Optimization
• Big Data, Hadoop and Spark
• Data Science, ML in Display Advertising
• ML, Relevance in Sponsored Search
• Content Ranking for FB Posts
Machine Learning: Why Now?
• Advantages from mining/learning patterns in data
• Cost of Storage and Compute
• Distributed Systems
ML Today
• Specific Tasks
• Quality Data
• Feature Engineering
• Iterations of Experiments
[Diagram labels: Domain Knowledge, Statistics, Engineering]
ML Workflow
New Hypothesis
• Data Analysis
• Problem Formulation
• Short/Long Term Objectives
Data Preparation
• Acquire Data
• Synthesize
• Clean/Reformat
Feature Engineering
• Domain Knowledge
• Creativity
• Extraction Pipeline
Online Evaluation
• Bucket Test
• Launch Criteria
• Metrics – CTR, Time Spent
• Performance Impact
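One common way to read a bucket test on CTR is a two-proportion z-test between a control bucket and a treatment bucket. The sketch below assumes that approach; the slides do not say which test was used, and the function name `ctr_z_test` and its arguments are hypothetical.

```python
import math

def ctr_z_test(clicks_a, views_a, clicks_b, views_b):
    """Two-sided two-proportion z-test on CTR.

    Bucket A is the control, bucket B the treatment (assumed naming).
    Returns (ctr_a, ctr_b, z, p_value).
    """
    ctr_a = clicks_a / views_a
    ctr_b = clicks_b / views_b
    # Pooled CTR under the null hypothesis that both buckets share one rate.
    pooled = (clicks_a + clicks_b) / (views_a + views_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / views_a + 1 / views_b))
    if se == 0:
        # No variation at all (all clicks or no clicks): nothing to test.
        return ctr_a, ctr_b, 0.0, 1.0
    z = (ctr_b - ctr_a) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return ctr_a, ctr_b, z, p_value

# Illustrative counts only, not data from the talk.
ctr_a, ctr_b, z, p = ctr_z_test(100, 1000, 150, 1000)
print(f"control CTR={ctr_a:.3f} treatment CTR={ctr_b:.3f} z={z:.2f} p={p:.4f}")
```

A launch criterion could then combine the p-value with a minimum CTR lift and a check on performance impact, but those thresholds are product decisions rather than something the test supplies.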
Offline Evaluation
• Evaluate on Test Set
• Metrics – PR/AUC/NDCG
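As a rough illustration of two of the offline metrics named above, here is a minimal pure-Python sketch of ROC AUC and NDCG@k. It assumes binary labels for AUC and the exponential gain `2^rel - 1` for NDCG (a linear gain is also common). The function names are hypothetical, and precision/recall is left out.

```python
import math

def roc_auc(labels, scores):
    """Probability that a random positive is scored above a random negative.

    Ties count as half a win. O(n^2) pairwise version, for illustration only.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items, in ranked order."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances[:k]))

def ndcg_at_k(relevances, k):
    """DCG normalized by the DCG of the ideal (sorted) ordering."""
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0

# Toy inputs only, not data from the talk.
print(roc_auc([1, 0, 1, 0], [0.9, 0.8, 0.4, 0.1]))
print(ndcg_at_k([3, 2, 1, 0], k=4))
```

On a real test set a library implementation would normally be used instead; the point here is only what the numbers measure.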
Model Training
• Training Algorithm
• Hyperparameter Tuning
• Overfitting
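Hyperparameter tuning and overfitting can be illustrated with a small holdout experiment. The sketch below is an assumption-laden toy, not the method from the talk: it uses a 1-D k-nearest-neighbour regressor on synthetic data and picks `k` by validation error. All names (`knn_predict`, `tune_k`, `make_data`) are hypothetical.

```python
import random

def knn_predict(train, x, k):
    """Average the targets of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / len(nearest)

def mse(train, data, k):
    """Mean squared error of the k-NN model fitted on `train`, scored on `data`."""
    return sum((knn_predict(train, x, k) - y) ** 2 for x, y in data) / len(data)

def tune_k(train, valid, candidates):
    """Return (best_k, results), where results maps k -> (train_mse, valid_mse)."""
    results = {k: (mse(train, train, k), mse(train, valid, k)) for k in candidates}
    best_k = min(results, key=lambda k: results[k][1])  # select on validation error
    return best_k, results

rng = random.Random(0)  # fixed seed so the run is repeatable

def make_data(n):
    """Synthetic data: y = 2x plus Gaussian noise (an assumption for this sketch)."""
    xs = [rng.uniform(0, 10) for _ in range(n)]
    return [(x, 2.0 * x + rng.gauss(0, 3.0)) for x in xs]

train, valid = make_data(200), make_data(100)
best_k, results = tune_k(train, valid, [1, 3, 10, 30])
for k, (train_mse, valid_mse) in results.items():
    print(f"k={k:>2} train_mse={train_mse:.2f} valid_mse={valid_mse:.2f}")
print("selected k:", best_k)
```

With `k=1` each training point is its own nearest neighbour, so the training error is zero while the validation error is not. That gap between training and held-out error is the signature of overfitting, and it is why the hyperparameter is chosen on data the model did not see.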
Example of an ML System
[Diagram components: Datastore, ETL, Ad-hoc Analysis, ML Framework, Distributed KV-Store, Snapshot, Realtime Features, Algorithm Service, Logging Service]
Challenges and Lessons Learned
• Ad-hoc Analysis
• Adding and Validating New Features
• Gap between Online/Offline Metrics
• System/Other Issues
4 V’s of Big Data
Challenges
• Expensive computation in training (clusters, GPUs)
• Interpretability of model
• Power consumption