Improving experimentation velocity via Multi-Armed Bandits

Dr Ilias Flaounas Senior Data Scientist

Growth Hacking Meetup, Sydney, 20 June 2016

(Image source: http://www.nancydixonblog.com/2012/05/-why-knowledge-management-didnt-save-general-motors-addressing-complex-issues-by-convening-conversat.html)

[Chart: probability density (PDF) over the conversion rate of each variation]

• In a classic A/B test, we assign each incoming user to a variation at random.
• In a MAB, we actively choose which cohort the next user joins.
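The difference between the two assignment policies can be sketched in a few lines. This is an illustrative sketch, not code from the talk: the variation names match the talk's a–e example, but the function names, the stats layout, and the choice of epsilon-greedy as the bandit policy are all my own assumptions.

```python
import random

variations = ["a", "b", "c", "d", "e"]

def assign_ab(user_id):
    # Classic A/B test: every user is assigned uniformly at random,
    # regardless of how the variations have performed so far.
    return random.choice(variations)

def assign_epsilon_greedy(stats, epsilon=0.1):
    # Simple bandit policy (epsilon-greedy, a hypothetical choice):
    # with probability epsilon explore a random variation, otherwise
    # exploit the variation with the best observed conversion rate.
    if random.random() < epsilon:
        return random.choice(variations)
    return max(variations,
               key=lambda v: stats[v]["conversions"] / max(stats[v]["users"], 1))
```

With `epsilon=0` the policy always exploits; raising it trades more traffic for more certainty about the weaker arms.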

Pick black to exploit

Pick green (or red) to explore
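Policies like UCB1 fold this exploit-versus-explore choice into a single score: observed mean plus an exploration bonus that is large for rarely pulled arms. A minimal sketch, with made-up numbers (the arm colours follow the slide; the means and pull counts are illustrative only):

```python
import math

def ucb1_score(mean, pulls, total_pulls):
    # UCB1: observed mean plus a bonus that grows when an arm has been
    # pulled rarely relative to the total number of trials.
    return mean + math.sqrt(2 * math.log(total_pulls) / pulls)

# "black": strong observed mean, many pulls -> small bonus
# "green", "red": weaker means but few pulls -> large bonuses
arms = {"black": (0.30, 400), "green": (0.20, 10), "red": (0.18, 10)}
total = sum(pulls for _, pulls in arms.values())
scores = {name: ucb1_score(m, p, total) for name, (m, p) in arms.items()}
```

With these numbers the under-sampled green arm ends up with the highest score, so the policy explores it next; once its uncertainty shrinks, the bonus fades and black is exploited again.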

Win for variation “d”.

Win for variation “d”, with estimated p-values.

Let’s run it for a bit longer… Again, win for variation “d”.
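A run like this can be reproduced with a Beta-Bernoulli Thompson sampling simulation. The sketch below is mine, not the talk's: the true conversion rates are hypothetical (with “d” made the clear winner), and the point is that the bandit concentrates most of the traffic on the winning arm rather than splitting it evenly.

```python
import numpy as np

rng = np.random.default_rng(0)
true_rates = {"a": 0.02, "b": 0.02, "c": 0.02, "d": 0.10, "e": 0.02}  # hypothetical
arms = list(true_rates)
wins = {a: 1 for a in arms}    # Beta(1, 1) uniform priors
losses = {a: 1 for a in arms}
pulls = {a: 0 for a in arms}

for _ in range(2000):
    # Thompson sampling: draw a conversion rate from each arm's Beta
    # posterior and play the arm whose draw is highest.
    draws = {a: rng.beta(wins[a], losses[a]) for a in arms}
    arm = max(draws, key=draws.get)
    pulls[arm] += 1
    if rng.random() < true_rates[arm]:
        wins[arm] += 1
    else:
        losses[arm] += 1
```

Because losing arms are pulled rarely once the posterior separates, the bandit needs far fewer total samples to settle on the winner, which is the effect the slides quantify.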

Classic A/B/C/D/E: ~2.5K samples
Multi-armed bandit: ~1K samples

60% fewer samples

No winner after 1K iterations

Classic A/B/C: ~5K samples
Multi-armed bandit: ~1K samples

80% fewer samples

No winner after 1K iterations

Classic A/B/C: ~2.8K samples
Multi-armed bandit: ~1K samples

64% fewer samples

Win for variation “a”.

Classic A/B/C: ~1.8K samples
Multi-armed bandit: ~1K samples

45% fewer samples

Disadvantages

• Reaching significance for non-winning arms takes longer

• Unclear stopping criteria: application-specific heuristics are needed

• Hard to rank the non-winning arms and to assess their impact reliably
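One common stopping heuristic, sketched below under my own assumptions (Beta posteriors over Bernoulli conversion rates, Monte Carlo estimation, and an arbitrary 0.95 threshold; the talk does not prescribe any specific rule): stop when the posterior probability that some arm is the best exceeds a threshold.

```python
import numpy as np

def prob_best(wins, losses, n_draws=10000, rng=None):
    # Monte Carlo estimate of P(arm is best) for each arm, assuming a
    # Beta(wins+1, losses+1) posterior over every arm's conversion rate.
    if rng is None:
        rng = np.random.default_rng(0)
    samples = np.stack([rng.beta(w + 1, l + 1, n_draws)
                        for w, l in zip(wins, losses)])
    winners = samples.argmax(axis=0)
    return np.bincount(winners, minlength=len(wins)) / n_draws

def should_stop(wins, losses, threshold=0.95):
    # Heuristic: stop once one arm is very likely to be the best.
    return prob_best(wins, losses).max() >= threshold
```

For example, `should_stop([5, 60], [95, 140])` stops (the second arm dominates), while `should_stop([5, 6], [95, 94])` keeps running because the posteriors still overlap heavily.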

Advantages

• Reaching significance for the winning arm is faster

• Best arm can change over time

• There are no false positives in the long term

• How can we locate the city of Bristol from tweets?

• 10K candidate locations organised in a 100x100 grid

• At every step we get tweets from one location and count the number of mentions of the word “Bristol”

• Challenge: find the target in sub-linear time complexity!

• Contextual bandits can tackle this problem

• We proposed KernelUCB, a non-linear and contextual flavour of MAB.

• The last few steps of the algorithm before it locates Bristol.

Technical description: M. Valko, N. Korda, R. Munos, I. Flaounas, N. Cristianini, “Finite-Time Analysis of Kernelised Contextual Bandits”, UAI, 2013.

Target is the red dot.

KernelUCB Matlab code: http://www.complacs.org/pmwiki.php/CompLACS/KernelUCB

KernelUCB with RBF kernel converges after ~300 iterations (instead of >>10K).
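The kernelised UCB idea can be sketched as follows. This is a toy version, not the paper's implementation (that is the Matlab code linked above): it uses a 10×10 grid instead of 100×100, a synthetic peaked "mentions" signal in place of live tweets, and illustrative kernel and regularisation parameters. Each cell's coordinates act as its context; a kernel ridge estimate of the payoff plus an uncertainty bonus decides which cell to query next.

```python
import numpy as np

def rbf(A, B, gamma=0.05):
    # RBF kernel between two sets of 2-D grid cells.
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

# Toy stand-in for the tweet signal: "Bristol" mentions peak at the target.
grid = np.array([(i, j) for i in range(10) for j in range(10)], dtype=float)
target = np.array([7.0, 3.0])
def mentions(x):
    return float(np.exp(-0.05 * ((x - target) ** 2).sum()))

lam, eta = 0.1, 1.0                        # ridge and exploration weights
X, y = [grid[0]], [mentions(grid[0])]      # start from an arbitrary cell
for _ in range(50):
    Xa, ya = np.array(X), np.array(y)
    Kinv = np.linalg.inv(rbf(Xa, Xa) + lam * np.eye(len(X)))
    k = rbf(grid, Xa)                      # kernel between every cell and history
    mean = k @ Kinv @ ya                   # kernel ridge estimate of the payoff
    var = 1.0 - np.einsum("ij,jk,ik->i", k, Kinv, k)
    ucb = mean + eta * np.sqrt(np.maximum(var, 0.0))
    x = grid[ucb.argmax()]                 # query the most promising cell
    X.append(x); y.append(mentions(x))

found = np.array(X)[int(np.argmax(y))]     # best cell observed so far
```

Early on the uncertainty term dominates and the queries spread across the grid; once a query lands near the peak, the mean term takes over and the search homes in on the target, which is the sub-linear behaviour the slides describe.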

Thank you! Yes, we are hiring.

Dr Ilias Flaounas Senior Data Scientist