Taking Machine Learning from Batch to Real-Time (big data eXposed 2015)
Learn Like a Human – Taking Machine Learning from Batch to Real-Time
Elad Rosenheim
Who am I
Architect at Dynamic Yield, “Predictors” Team Lead
Previously: AlphaCSP, SAP
Performance & Scale, DevOps
Measure All the Things!
East-Asia & Japan
Who’s Dynamic Yield?
We’ve been optimizing & personalizing websites since 2011
Start-up in Tel Aviv, headed by Liad Agmon
I joined as the 5th employee; we’re 50 now and growing fast
On the Agenda
Our clients’ problem
Old School Solutions
Meet the ML Bandits
Our clients’ problem
Publishers, retailers, SaaS all share a common problem
They know their domain but not how to optimize for each user
Screen real-estate is limited, yet everyone sees the same thing
What top videos to show on NBC News’ site?
What user segments should see this element at this location?
What’s the best layout for this element?
Both the layout of this page and each element in it deserve testing
What’s the best layout?
What types of products to show whom?
What articles to show on ynet’s homepage? What titles and images?
In what order?
What is the best default sort order for products on Adika?
Does it differ significantly between user segments?
The Beginning
First, there was the educated guess
Then, there was the A/B test
“Data Beats Opinion”
Freedom to experiment (with nice tools)
Hopefully: less fear of change, less politics
How does it work?
Split traffic between baseline and alternative variations
In theory: sit & wait for significant results
In practice: peek at the numbers till the nice “95% confidence”
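The “95% confidence” readout is commonly derived from a test on two conversion rates. A minimal sketch, assuming a pooled two-sided z-test; the function name and the traffic numbers are illustrative and not taken from any specific A/B tool:

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Two-sided pooled z-test for a difference in conversion rates.

    Returns (z, p_value). Hypothetical helper, for illustration only.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        # No conversions (or only conversions) in both groups: nothing to compare
        return 0.0, 1.0
    z = (p_b - p_a) / std_err
    # Standard normal CDF expressed through the error function
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Made-up numbers: baseline converts 200 of 10,000, alternative 260 of 10,000
z, p_value = two_proportion_z_test(200, 10_000, 260, 10_000)
significant = p_value < 0.05
```

Note that running this check repeatedly and stopping the first time it passes (the “peeking” above) raises the false-positive rate beyond the nominal 5%.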
A/B Tests: Already Old School?
While you wait, you're bleeding clicks
clicks == money
What about the really dynamic stuff?
Campaigns, Current Headlines, Products on Sale
Enter the Multi-Arm Bandits
A Single-Arm Bandit
Suppose I have multiple arms in front of me, each with its unknown mean reward…
How do I optimize income from multiple machines?
Caution or Haste?
Explore vs. Exploit
In our context:
How do I optimize multiple variations?
Bandits - A Classic Problem
(Very) Simple Solutions
ε-greedy, ε-decreasing
First 100% random explore, then ~90% exploit?
Magic numbers, built-in revenue loss
Bayesian-based approaches
Smoother curve from explore to exploit
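To make the ε-greedy idea concrete, here is a minimal sketch. The class name, the `simulate` helper and the conversion rates are assumptions made for illustration; `epsilon` is the “magic number” that fixes how much traffic keeps going to random exploration:

```python
import random

class EpsilonGreedy:
    """Epsilon-greedy bandit over n_arms variations (illustrative sketch)."""

    def __init__(self, n_arms, epsilon=0.1, rng=None):
        self.epsilon = epsilon            # share of traffic spent exploring
        self.rng = rng or random.Random()
        self.counts = [0] * n_arms        # impressions served per arm
        self.means = [0.0] * n_arms       # running mean reward per arm

    def select_arm(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.counts))                   # explore
        return max(range(len(self.means)), key=self.means.__getitem__)    # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        # Incremental mean: no need to store past rewards
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]


def simulate(true_rates, rounds, epsilon, seed=0):
    """Run the bandit against made-up Bernoulli arms."""
    rng = random.Random(seed)
    bandit = EpsilonGreedy(len(true_rates), epsilon, rng)
    for _ in range(rounds):
        arm = bandit.select_arm()
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        bandit.update(arm, reward)
    return bandit

bandit = simulate([0.1, 0.5, 0.9], rounds=2000, epsilon=0.1)
```

Even after the best arm is obvious, a fixed `epsilon` keeps sending that share of traffic to random arms, which is the built-in revenue loss mentioned above.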
“Winner” is now a less relevant term
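The slides do not name a specific Bayesian method. One common choice is Thompson sampling with a Beta posterior per variation, sketched below under that assumption (class and method names are hypothetical). Traffic shifts gradually toward better arms instead of a winner being declared:

```python
import random

class BetaBernoulliBandit:
    """Thompson sampling for conversion/no-conversion rewards (illustrative sketch)."""

    def __init__(self, n_arms, rng=None):
        self.rng = rng or random.Random()
        self.successes = [0] * n_arms
        self.failures = [0] * n_arms

    def select_arm(self):
        # Draw one plausible conversion rate per arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then serve the arm with the highest draw.
        samples = [
            self.rng.betavariate(1 + s, 1 + f)
            for s, f in zip(self.successes, self.failures)
        ]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, converted):
        if converted:
            self.successes[arm] += 1
        else:
            self.failures[arm] += 1

    def traffic_share(self, draws=1000):
        """Estimate the share of traffic each arm would receive right now."""
        picks = [0] * len(self.successes)
        for _ in range(draws):
            picks[self.select_arm()] += 1
        return [p / draws for p in picks]

bandit = BetaBernoulliBandit(n_arms=3, rng=random.Random(0))
shares_before_data = bandit.traffic_share()
```

With no data every arm gets a similar share; as evidence accumulates, the shares drift toward the better arms without any hard switch from explore to exploit.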
Bandits work well when…
We want to find the variation “best on average”
…but we’re not improving the conversion rate of any single variation
2.4% 1.7% 0.4%
Enter Personalization
Each of us is a beautiful and unique feature vector!
By showing the right variation to the right people, we can improve conversions per variation
and beat the best variation
ML Challenge Accepted
The Usual Suspects
Collaborative Filtering?
Very big, very sparse matrix
Cold Start
Batch
Not suitable in this case
Classifiers?
Logistic Regression, Random Forest et al.
Periodically learn over all converters so far
More data == more time, bigger model
Not the classic question
What We Need
Like a bandit, we need to learn as we go (not in batch),
but this time with “context” - the user’s data
Incremental Learning over the stream of impressions & rewards (“Partial Fit”)
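As a rough illustration of incremental learning, the sketch below updates a logistic regression one mini-batch at a time with plain stochastic gradient descent. It is a standard-library stand-in written for this example, not the speakers' code; the method name `partial_fit` mirrors the scikit-learn convention for estimators that support incremental updates.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class OnlineLogisticRegression:
    """Logistic regression trained by SGD, one mini-batch at a time (sketch)."""

    def __init__(self, n_features, learning_rate=0.1):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict_proba(self, x):
        z = self.bias + sum(w * xi for w, xi in zip(self.weights, x))
        return sigmoid(z)

    def partial_fit(self, batch_x, batch_y):
        """Update the model in place from one mini-batch; past data is never revisited."""
        for x, y in zip(batch_x, batch_y):
            error = self.predict_proba(x) - y   # gradient of the log-loss w.r.t. z
            self.bias -= self.learning_rate * error
            for i, xi in enumerate(x):
                self.weights[i] -= self.learning_rate * error * xi

model = OnlineLogisticRegression(n_features=1)
# Made-up stream: small batches of (features, converted) pairs
for _ in range(200):
    model.partial_fit([[1.0], [0.0]], [1, 0])
```

The model is usable for prediction after every batch, and its size depends on the number of features rather than on how much data has streamed through.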
We’re looking to…
Start learning from the first impression
Handle the explore-exploit curve
Run fast (enough)
In the worst case: converge on the best variation, like a bandit
Meet the Contextual Bandits
They “eat” the data stream
They demand fast access to user data
Historical or immediate
Their model is always ready for action
In the Papers
Linear Bayes, LinUCB
What we do: Per-Variation Logistic Regression
A variant supporting updates in “mini-batches”
Exploration-on-top
Worst case: “Garbage In, Multi-Arm Bandit Out”
Light on memory, compact output
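The per-variation idea can be sketched as one small online logistic model per variation, with an exploration rule layered on top. The slides do not say which exploration scheme is used, so the ε-style rule here is an assumption, as are all class, method and feature names:

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

class PerVariationBandit:
    """One online logistic model per variation, epsilon exploration on top (sketch)."""

    def __init__(self, variations, n_features, epsilon=0.1, learning_rate=0.1, rng=None):
        self.variations = list(variations)
        self.epsilon = epsilon
        self.learning_rate = learning_rate
        self.rng = rng or random.Random()
        # Per variation: one weight per feature, followed by a bias term
        self.weights = {v: [0.0] * (n_features + 1) for v in self.variations}

    def predict(self, variation, context):
        w = self.weights[variation]
        # zip stops at the end of context, so the trailing bias is added separately
        z = w[-1] + sum(wi * xi for wi, xi in zip(w, context))
        return sigmoid(z)

    def choose(self, context):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.variations)                           # explore
        return max(self.variations, key=lambda v: self.predict(v, context))   # exploit

    def learn(self, variation, context, converted):
        """One SGD step on the model of the variation that was actually shown."""
        error = self.predict(variation, context) - (1.0 if converted else 0.0)
        w = self.weights[variation]
        for i, xi in enumerate(context):
            w[i] -= self.learning_rate * error * xi
        w[-1] -= self.learning_rate * error

# Made-up ground truth: segment [1, 0] converts on "A", segment [0, 1] on "B"
bandit = PerVariationBandit(["A", "B"], n_features=2, epsilon=0.0)
for _ in range(200):
    bandit.learn("A", [1.0, 0.0], True)
    bandit.learn("A", [0.0, 1.0], False)
    bandit.learn("B", [1.0, 0.0], False)
    bandit.learn("B", [0.0, 1.0], True)
```

If the context carries no signal (for example, all features are zero), only the bias terms learn, each tracking its variation's average conversion rate. The model then behaves like a plain bandit, which is one way to read the worst case above.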
How We Do It: Online & Offline
Online should be fast & scale
Offline: a testbed for iteratively testing new ideas
New algorithms
Tweaked parameters
Feature transformations
The Online Flow
[Diagram: DY Web Servers (a. get our script, b. log impressions, conversions); Queue Per Test; Learn Workers; User DB; Persist Model; Load to Predict Server; variations A, B, C per test; Predictions]
The Offline Evaluator
Test, Improve, Iterate
Using real-world data
Using generated data
From easy to hard
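The slides give no detail on how the generated data is built. As one possible reading of “from easy to hard”, the sketch below produces a synthetic impression stream from a made-up ground-truth rule, with a `noise` knob that controls difficulty. Every name and the rule itself are assumptions for illustration:

```python
import random

def generate_impressions(n, noise, seed=0):
    """Yield (context, variation, converted) tuples from a made-up ground truth.

    Hypothetical rule: segment 0 converts on "A", segment 1 converts on "B".
    noise=0.0 is the easy case (the rule always holds);
    noise=1.0 is the hardest case (every reward is a coin flip).
    """
    rng = random.Random(seed)
    for _ in range(n):
        segment = rng.randrange(2)
        context = [1.0, 0.0] if segment == 0 else [0.0, 1.0]
        variation = rng.choice(["A", "B"])      # simulated traffic is served at random
        ideal = (variation == "A") == (segment == 0)
        if rng.random() < noise:
            converted = rng.random() < 0.5      # reward ignores the rule
        else:
            converted = ideal
        yield context, variation, converted

easy = list(generate_impressions(1000, noise=0.0))
hard = list(generate_impressions(1000, noise=0.8))
```

Because the ground truth is known, a learner fed with `easy` should recover the rule quickly, while `hard` shows how it degrades as the signal weakens.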
Going Global
Learn in the center site, fast predict in each geo. How?
Push models via local Redis slaves
Compressed SSH tunnel
User data - daily aggregation
Storage into LMDB (simple, fast memory-mapped K/V DB)
Sync via S3 (LZ4 compressed), read from SSD
Learn & Predict services
Python as ML lingua franca: NumPy, SciPy, scikit-learn
Elad & Idan Say Goodbye
Better data beats better algorithms
Reduce aggressively
Keep It Simple, Smart!
Elad Rosenheim
Idan Michaeli
Read our blog
Hiring? But of course!
What’s with the Groundhog?