Real-world ML use cases - GitHub Pages
Transcript of Real-world ML use cases - GitHub Pages
![Page 1: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/1.jpg)
Real-world ML use cases
Agenda
• NLP models for Customer Support
• Model retraining strategies
• Gran Neural Networks for dish/restaurant recommendation
• Learning from recommender system deployment
• Lessons learned from real-world data collection
Piero Molino
![Page 2: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/2.jpg)
Improving Uber Customer Support with Natural Language Processing and Deep Learning
Piero Molino | AI LabsHuaixiu Zheng | Applied Machine Learning Yi-Chia Wang | Applied Machine Learning
Customer Obsession Ticket Assistant
![Page 3: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/3.jpg)
COTA v1: classical NLP + ML models
○ Faster and more accurate customer care experience○ Million $ of saving while retaining customer satisfaction
COTA v2: deep learning models
○ Experiments with various deep learning architectures○ 20-30% performance boost compared to classical models
Main Takeaways
![Page 4: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/4.jpg)
COTA Blog Post and followup, KDD paper
![Page 5: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/5.jpg)
Motivation and Solution
Complexity of Customer support @Uber
COTA v1: Traditional ML / NLP Models
Multi-class Classification vs Ranking
COTA v2: Deep Learning Models
Deep learning architectures
COTA v1 vs COTA v2
Agenda
![Page 6: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/6.jpg)
Motivation and Solution
Complexity of Customer support @Uber
COTA v1: Traditional ML / NLP Models
Multi-class Classification vs Ranking
COTA v2: Deep Learning Models
Deep learning architectures
COTA v1 vs COTA v2
Agenda
![Page 7: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/7.jpg)
What is the challenge?As Uber grows, so does our volume of support tickets
Millions of tickets from riders / drivers / eaters per week
Thousands of different types of issues users may encounter
![Page 8: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/8.jpg)
User
CSRContact Ticket
Response
Select Flow Node
Write Message SelectContact Type
Lookup info &Policies
Select ActionWrite response using a Reply Template
Uber Support Platform
![Page 9: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/9.jpg)
What is the challenge?And it is not easy to solve a ticket
1000+ typesin a hierarchydepth: 3~6
10+ actions (adjust fare, add appeasement, …)
1000+ reply templates
![Page 10: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/10.jpg)
Motivation and Solution
Complexity of Customer support @Uber
COTA v1: Traditional ML / NLP Models
Multi-class Classification vs Ranking
COTA v2: Deep Learning Models
Deep learning architectures
COTA v1 vs COTA v2
Agenda
![Page 11: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/11.jpg)
SUGGESTED CONTACT TYPES
Driver > Account > Unable to sign in or go online > Account inactive
Driver > Account > Profile > Unsubscribe > SMS or Text
Driver > Account > Vehicles > Edit vehicle class
Reorder actions in relevance
Surface top-3 most-relevant reply templates
COTA v1: Suggested ResolutionMachine learning models recommending the 3 most relevant solutions
![Page 12: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/12.jpg)
Multiclass classification
Pointwise ranking
E.g., User type
E.g., ETA, city
E.g., Ticket creation time, product type
COTA v1 Model Pipeline
Information about the contact type or reply, obtained from all tickets belonging to it
![Page 13: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/13.jpg)
cosine similarity
Topi
c #i
Topic #j
TicketCT1
CT2
Multi-class Classification Pointwise Ranking
Tickets Features
Label (CT1, CT2)
t1 features CT1
t2 features CT2
Tickets Features Type Features Sim (t, CT) Label (0, 1)
t1 features CT1 features 0.8 1
t1 features CT2 features 0.1 0
t2 features CT1 features 0.2 0
t2 features CT2 features 0.7 1
Ranking allows us to include features of candidate types and similarity features between a ticket and a candidate type
Model: Random Forest with hyperparameters optimized through grid search
From Classification to Ranking
![Page 14: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/14.jpg)
6% absolute (10% relative) improvement
Hits@3: any of the top 3 suggestions is selected by CSRs
Performance Comparison
![Page 15: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/15.jpg)
Motivation and Solution
Complexity of Customer support @Uber
COTA v1: Traditional ML / NLP Models
Multi-class Classification vs Ranking
COTA v2: Deep Learning Models
Deep learning architectures
COTA v1 vs COTA v2
Agenda
![Page 16: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/16.jpg)
Input Encoders Combiner Output Decoders
Generic architecture, reusable in many different applications.We are considering open-sourcing it!
Text features
Categorical features
Numerical features
Encoder Decoder
Categorical features
Text features
Decoder
Binary features
Encoder
Encoder
EncoderCombiner
Numerical featuresDecoder
Set features Encoder
Sequential features Encoder
Binary featuresDecoder
Set featuresDecoder
Sequential featuresDecoder
COTA v2: Deep Learning Architecture
![Page 17: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/17.jpg)
Key contributors Travis Addair Yaroslav Dudin Sai Sumanth Miryala Jim Thompson Avanika Narayan Ivaylo Stefanov John Wahba Doug Blank Patrick von Platen Carlo Grisetti Chris Van Pelt Boris Dayma
Documentation http://ludwig.ai
Repository http://github.com/uber/ludwig
Blogpost http://eng.uber.com/introducing-ludwig http://eng.uber.com/ludwig-v0-2/ http://eng.uber.com/ludwig-v0-3/
White paper https://arxiv.org/abs/1909.07930
![Page 18: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/18.jpg)
COTA v2: Text Encoding Models
Char / Word CNN RNNChar CNN Char / Word RNNWord CNN
Char Seq
6 x1D Conv
2 x FC
Vector
Word Seq
2 x FC
Vector
1D Conv width 5
1D Conv width 4
1D Conv width 3
1D Conv width 2
Char / Word Seq
2 xRNN
2 x FC
Vector
Char / Word Seq
2 x RNN
2 x FC
Vector
3 x Conv
![Page 19: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/19.jpg)
Which text encoder?Hyperparameter search for contact type classification
Model Validation accuracyTraining time per epoch in minutes
CharCNNRNN opt 0.4805 35
WordCNN opt 0.4733 4
WordRNN opt 0.4713 17
WordCNNRNN opt 0.4615 12
CharCNN opt 0.4598 5
WordCNN is the best compromise between performance and speed
20%+ over Random Forest used in COTA v1 and ~10x faster than CharCNNRNN
![Page 20: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/20.jpg)
Sequence Model for Type SelectionPredict the sequence of nodes instead of leaf node
Use a Recurrent Decoder to predict sequences of nodes in the contact type tree
Pick the last class before <eos> as prediction
Model makes more reasonable mistakes
CT0
CT1
CT2
CT3
CT4
CT5
CT6
CT7
CT8
CT9
CombinerOutput RNN RNN RNN RNN
CT0 CT3 CT4 CT8
EOSCT3 CT4 CT8
Example: Driver > Trips > Pickup and drop-off issues > Cancellation Fee > Driver Cancelled
![Page 21: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/21.jpg)
Text featurese.g. message
Categorical featurese.g. flow node
Numerical featurese.g. trip fare
Final ArchitectureMulti-task sequential learning
TYPE REPLY
Trainground-truth
TYPE
REPLY
TYPE REPLY
Testpredicted
Convolution layers
Embeddinglayer
Batch-normlayer
Binary featurese.g. is completed
FC + Dropout
layers
Recurrent Decoder
Softmax layer
![Page 22: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/22.jpg)
Effect of Adding Dependencies Between Tasks
Adding the dependency from Type to Reply improves accuracy
It also improves a lot the coherence between the two models, increasing combined accuracy consistently
Combined accuracy computed requiring both Type and Reply model to be correct at the same time
![Page 23: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/23.jpg)
Motivation and Solution
Complexity of Customer support @Uber
COTA v1: Traditional ML / NLP Models
Multi-class Classification vs Ranking
COTA v2: Deep Learning Models
Deep learning architectures
COTA v1 vs COTA v2
Outline
![Page 24: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/24.jpg)
COTA v2 is consistently more effective than COTA v1 on all metrics for both models
The combined accuracy in particular shows an absolute ~+9% (relative +~20%)
COTA v1 vs. COTA v2 offline comparison
![Page 25: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/25.jpg)
COTA v1 vs. COTA v2 A/B Test
![Page 26: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/26.jpg)
COTA v2 is 20-30% more accurate than COTA v1 in online A/B tests
COTA v1 reduces handling time of ~8%, while COTA v2 provides an additional ~7% reduction, more than ~15% overall reduction
Statistically significant customer satisfaction improvement
COTA v1 vs. COTA v2 A/B Test
![Page 27: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/27.jpg)
Threshold on Type Model Confidence
![Page 28: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/28.jpg)
Threshold on Both Models’ Confidence
![Page 29: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/29.jpg)
95% accuracy → 10% coverage90% accuracy → 20% coverage
Coverage vs. Maximum Accuracy
![Page 30: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/30.jpg)
Conclusions
Moving from traditional to deep learning models, we observe a substantial performance boost (up to 30%)
Using intelligent suggestions we were able to reduce ticket handling time without impacting customer satisfaction
Using NLP & ML COTA makes customer care experience faster and more accurate while saving Uber millions of $
![Page 31: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/31.jpg)
Model degradation
Distribution shift in the real world
• Bugs get solved, probability of a issue type can decrease
• New products can be added (UberPool) so new issue types appear
Older data becomes noise
• We often talk about distribution shift in the test set, but the test set of a month ago is the training set now
![Page 32: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/32.jpg)
Retraining Strategy
Dealing with distribution shift is an open research topic
In practice in most cases the safest route is just retraining the model
But...
• How often to retrain?
• What triggers retraining?
• With how much data?
![Page 33: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/33.jpg)
Offline simulation: time-based split
TimeJan Feb Mar Apr
Dataset
Training
Test
![Page 34: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/34.jpg)
Offline simulation: split in weeks
TimeJan Feb Mar Apr
Dataset
Train
Test Test Test Test
TrainTrainTrainTrainTrainTrainTrain
![Page 35: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/35.jpg)
Offline simulation
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
![Page 36: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/36.jpg)
Offline simulationTrain Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
Train Test Test Test TestTrainTrainTrainTrainTrainTrainTrain
...
![Page 37: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/37.jpg)
Offline simulation
TimeJan Feb Mar Apr
Dataset
TrainTrainTrainTrainTrainTrainTrainTrain Test Test Test Test
TrainTrainTrainTrainTrainTrainTrainTrain Test Test Test Test
TrainTrainTrainTrainTrainTrainTrainTrain Test Test Test Test
TrainTrainTrainTrainTrainTrainTrainTrain Test Test Test Test
TrainTrainTrainTrainTrainTrainTrainTrain Test Test Test Test
![Page 38: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/38.jpg)
Retraining Strategy
How often to retrain? With how much data?
0
0,225
0,45
0,675
0,9
week+1 week+2 week+3 week+40
0,225
0,45
0,675
0,9
week-8 week-6 week-4 week-2
![Page 39: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/39.jpg)
Online Retraining
What triggers retraining?
Used learnings from offline simulation
Retrained when performance dropped below performance on the test set at training original training time - 8% (relative)
Retrained with 1.5 months of training data, as we learned from the offline simulation that more was detrimental to performance
![Page 40: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/40.jpg)
COTA TeamCross-functional collaboration
AI LabsApplied Machine LearningCustomer ObsessionMichelangeloSensing and Perception
![Page 41: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/41.jpg)
Enhancing Recommendations on Uber Eats with Graph Convolutional NetworksAnkit Jain/Piero Molino
ankit.jain/[email protected]
![Page 42: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/42.jpg)
Agenda1. Graph Representation Learning
2. Dish Recommendation on Uber Eats
3. Graph Learning on Uber Eats
![Page 43: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/43.jpg)
Graph Representation Learning
![Page 44: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/44.jpg)
Linked Open Data
Graph data
Social networks Biomedical networks
Information networks Internet Networks of neurons
![Page 45: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/45.jpg)
Tasks on graphs
Node classificationPredict a type of a given node
Link predictionPredict whether two nodes are linked
Community detectionIdentify densely linked clusters of nodes
Network similarityHow similar are two (sub)networks
![Page 46: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/46.jpg)
Define an encoder mapping from nodes to embeddings
Define a node similarity function based on the network structure
Optimize the parameters of the encoder so that:
Learning framework
embedding spaceoriginal graph
![Page 47: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/47.jpg)
Simplest encoding approach: encoder is just an embedding-lookup
Algorithms like Matrix Factorization, Node2Vec, Deepwalk fall in this category
Embedding size
One column per node
embedding matrix
embedding vector for a specific node
Shallow encoding
![Page 48: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/48.jpg)
Shallow encoding limitations
O(|V|) parameters are needed, every node has its own embedding vector
Either not possible or very time consuming to generate embeddings for nodes not seen during training
Does not incorporate node features
![Page 49: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/49.jpg)
Graph Neural NetworkKey Idea: To obtain node representations, use a neural network to aggregate information from neighbors recursively by limited Breadth-FIrst Search (BFS)
INPUT GRAPH
A
A
C NN
NN
A
F
E
B
A
NN
B
D
C NN
DEPTH 2 BFS
A
B
D
E
F
C
![Page 50: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/50.jpg)
train with snapshot new node arrives generate embedding for new node
Inductive capability
In many real applications new nodes are often added to the graph
Need to generate embeddings for new nodes without retraining
Hard to do with shallow methods
![Page 51: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/51.jpg)
Dish Recommendation on Uber Eats
![Page 52: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/52.jpg)
Suggested Dishes
Recommended DishesCarousel Picked for You
![Page 53: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/53.jpg)
![Page 54: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/54.jpg)
![Page 55: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/55.jpg)
![Page 56: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/56.jpg)
![Page 57: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/57.jpg)
Graph Learning in Uber Eats
![Page 58: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/58.jpg)
Users connected to dishes they have ordered in the last M days
Weights are frequency of orders
Graph properties
Graph is dynamic: new users and dishes are added every day
Each node has features, e.g. word2vec of dish names
3
1
4
Bipartite graph for dish recommendation
U1
U2
D1
D2
![Page 59: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/59.jpg)
positive pair
negativesample
margin
For dish recommendation we care about ranking, not actual similarity score
Max Margin Loss:
Max Margin Loss
![Page 60: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/60.jpg)
New loss with Low Rank Positives
U1
U2
D2
D4
D1
D3
5
2
2
1
3
U1 D1
Positive Negative Low Rank Positive
U1 D4 U1 D3
![Page 61: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/61.jpg)
Weighted pool aggregation
Aggregate neighborhood embeddings based on edge weight
hD hB
hA
hC
5
2
1
Q
Q
Q
denotes a fully connected layer
![Page 62: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/62.jpg)
Model Test AUC
Previous production model 0.784
With graph embeddings 0.877
Offline evaluation
Trained the downstream Personalized Ranking Model using graph node embeddings
~12% improvement in test AUC over previous production model
![Page 63: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/63.jpg)
Feature Importance
Graph learning cosine similarity is the top feature in the model
![Page 64: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/64.jpg)
Online evaluation
Ran a A/B test of the Recommended Dishes Carousel in San Francisco
Significant uplift in Click-Through Rate with respect to the previous production model
Conclusion: Dish Recommendations with graph learning features are live in San Francisco, soon everywhere else
![Page 65: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/65.jpg)
ServingTraining Pipeline Step 2Training Pipeline Step 1
Data Pipeline Step 1
Table 1Table 1Source Tables
Versioned nodes and edges
Daily Ingestion
Data Pipeline Step 2
date
...date
- k
Collapsed graph with latest nodes
and edges
Date
Data Pipeline Step 3
Partitioned city graph using
Cypher
Data Pipeline Step 4
NetworkX graph for model training
& embedding generation
Back-dated graph for offline analysis
Past Date
GNN Model Training
Personalized Ranker Model
Training
Node Embeddings
Ranker Model
Personalized Ranker Online
Recommendation
![Page 66: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/66.jpg)
More Resources
Uber Eng Blog Post
Learn better representation in data scarcity regimes like small/new cities through meta-learning [NeurIPS Graph Representation Learning Workshop 2019]
![Page 67: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/67.jpg)
Learnings
In complex data pipelines, the model isn't always the bottleneck
• Graph processing was more expensive than model inference because of sheer size
Even when the model (or the data proc + model) is the bottleneck you can often precompute and cache
• Precomputed a big LRU cache of user-to-dish/restaurant similarities. It was recomputed entirely only when the model was updated and refreshed after user ordered
![Page 68: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/68.jpg)
Learnings: online evaluation issues
Q: Despite big offline gains, only got small improvement in Click through rate and orders (still statistically significant and worth millions of dollars), why?
A: Our recommendations where a small part of the UI, "favourite restaurants" and "Daily Deals" came always first in the UI and gathered most of clicks and orders. Bewre how you choose the denominator of your metrics!
![Page 69: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/69.jpg)
Learnings: online evaluation issues
Q: Despite big offline gains, only got small improvement in Click through rate and orders (still statistically significant and worth millions of dollars), why?
A: Our recommendations where a small part of the UI, "favourite restaurants" and "Daily Deals" came always first in the UI and gathered most of clicks and orders. Bewre how you choose the denominator of your metrics!
![Page 70: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/70.jpg)
Learnings: online evaluation issues
Q: Why is it hard to show big online gains in recommender systems in general?
A: If there's a model in production your are comparing against, you are likely using biased data for both training and prediction!
![Page 71: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/71.jpg)
Learnings: online evaluation issues
Q: Why is it hard to show big online gains in recommender systems in general?
A: If there's a model in production your are comparing against, you are likely using biased data for both training and prediction!
![Page 72: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/72.jpg)
Learnings: data bias
The world changes (new restaurants and dishes) -> ML lifecycle is a loop
The user behavior changes (now that my favorite pizza place is on the app, I start always ordering from there)
Model deployment changes user behavior (the items the model suggest influence your behavior)
Biased training data and biased evaluation data
![Page 73: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/73.jpg)
Learnings: data bias
Q: How to collect unbiased data?
A: Complicated, one option is to show random recommendations to x% of users
![Page 74: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/74.jpg)
Learnings: data bias
Q: How to collect unbiased data?
A: Complicated, one option is to show random recommendations to x% of users
![Page 75: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/75.jpg)
Learnings: data bias
Q: What is the cost of collecting unbiased data?
A: The likelyhod of those users actually selecting those items is very low -> small positive data is collected, those users may not buy anything -> the company looses money!
![Page 76: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/76.jpg)
Learnings: data bias
Q: What is the cost of collecting unbiased data?
A: The likelyhod of those users actually selecting those items is very low -> small positive data is collected, those users may not buy anything -> the company looses money!
![Page 77: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/77.jpg)
Learnings: data bias
Q: What could be compromise solutions?
A: Show to users random predictions from within the top 100 predicted by the model. Data is still biased, but more likely to collect unexpected positive datapoints.
![Page 78: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/78.jpg)
Learnings: data bias
Q: What could be compromise solutions?
A: Show to users random predictions from within the top 100 predicted by the model. Data is still biased, but more likely to collect unexpected positive datapoints.
![Page 79: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/79.jpg)
Team
Ankit Jain Isaac Liu Ankur Sarda
Piero Molino Long Tao Jimin Jia
Jan Pedersen Nathan Berrebbi Santosh Golecha
Ramit Hora Alex Danilychev
![Page 80: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/80.jpg)
Restaurant preparation timeThe data generation process
Time
User orders
Restaurant prepares
Driver dispatch
Dirver arrives
Order delivered
User
Restaurant
Driver
User
Driver arrives
Order pickup
![Page 81: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/81.jpg)
Restaurant preparation timeThe data generation process
Predict restaurant preparation time is useful, I can decide when to dispatch the driver to reduce wait! (If I can also predict when the driver will arrive)
• How do you know when a restaurant is done preparing?
• The driver can arrive early, in which case the preparation time is from initial order to order pickup
• If the driver arrives late, and the dish is already prepared, the order pickup time is a upper bound
Time
User orders
Restaurant prepares
Driver dispatch
Dirver arrives
Order delivered
User
Restaurant
Driver
User
Driver arrives
Order pickup
![Page 82: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/82.jpg)
Restaurant preparation time modeling
We tried trainign a model anyway using order pickup
Huge variance in the training data -> Huge variance in predictions!
Our model was 5min more acurate than previous one, but with stddev +- 10min!
Mean absolute error0 2 4 6 8 10 12 14 16 18 20+
XGboost NewModel
![Page 83: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/83.jpg)
Restaurant preparation time variance
Drilled into the data to understand the source of variance
Same restaurant, same day, same order, few minutes after -> 20min prep time vs 2min prep time
Why?
Restaurant Order Day Time Prep Time
POD Thai Pho Soup Tuesday 2nd 19:10 20m
POD Thai Pho Soup Tuesday 2nd 19:15 2m
![Page 84: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/84.jpg)
Restaurant preparation time new feature
Restaurants batch orders!
Theory: They prepare a big amount of soup when first ordered, the next soup order will take much less because they are already prepared
Added a feature in the model:
were items in the order ordered in the last x minutes?
Improved predictions by 2min, reduced stddev by 1/3 (still a lot)
Mean absolute error0 2 4 6 8 10 12 14 16 18 20+
XGboostNewModelNewModel + feat
![Page 85: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/85.jpg)
Restaurant preparation time moral
Went back to data collection, asked restaurants to notify us when the order was ready
Still noisy data (restaurants have no incentive to be precise, or they forget entirely), but better estimate
Moral: ML lifecycle is a loop and you can go back to the data collection process even after deployment, and iterate the process multiple times
![Page 86: Real-world ML use cases - GitHub Pages](https://reader033.fdocuments.us/reader033/viewer/2022050214/626e48ed165c460a5c0b8f3b/html5/thumbnails/86.jpg)
What am I working on now
@Stanford with Chris Ré
Ludwig: declarative multimodal deep learning pipeline toolbox (no code needed, extensible, AutoML capabilities)
For a talk about Ludwig you can check my website http://w4nderlu.st or the last Stanford MLSys Seminar Series episode http://mlsys.stanford.edu
Founded a company to make ML accessible to less technical people: AutoML + end-to-end platform built on Ludwig + secret spicy sauce!