On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ......

23
On the Complexity of A/B Testing Emilie Kaufmann , Olivier Capp ´ e (T ´ el´ ecom ParisTech) and Aur´ elien Garivier (Institut de Math ´ ematiques de Toulouse) Conference On Learning Theory, Barcelona, June 14 th , 2014 Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 1 / 23

Transcript of On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ......

Page 1: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

On the Complexity of A/B Testing

Emilie Kaufmann, Olivier Cappe (Telecom ParisTech) and AurelienGarivier (Institut de Mathematiques de Toulouse)

Conference On Learning Theory, Barcelona,June 14th, 2014

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 1 / 23

Page 2: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

A/B Testing An example

Motivation

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 2 / 23

Page 3: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

A/B Testing An example

Our goal

Improve performance:

Ü fixed number of test users � A smaller probability of errorÜ fixed probability of error � A fewer test users

Tools: sequential allocation and stopping

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 3 / 23

Page 4: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

Best arm identification in two-armed bandits

1 Best arm identification in two-armed bandits

2 Lower bounds on the complexities

3 The complexity of A/B Testing with Gaussian feedback

4 The complexity of A/B Testing with binary feedback

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 4 / 23

Page 5: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

Best arm identification in two-armed bandits Model and objectives

The model

A two-armed bandit model isa set ν � �ν1, ν2� of two probability distributions (’arms’) withrespective means µ1 and µ2a� � argmaxa µa is the (unknown) best am

To find the best arm, an agent interacts with the bandit model witha sampling rule �At�t>N where At > �1,2� is the arm chosen at timet (based on past observations) � A a sample Zt � νAt is observeda stopping rule τ indicating when he stops sampling the armsa recommendation rule aτ > �1,2� indicating which arm he thinksis best (at the end of the interaction)

In classical A/B Testing, the sampling rule At is uniform on �1,2� andthe stopping rule τ � t is fixed in advance.

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 5 / 23

Page 6: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

Best arm identification in two-armed bandits Model and objectives

Two possible goals

The agent’s goal is to design a strategy A � ��At�, τ, aτ� satisfying

Fixed-budget setting Fixed-confidence setting

τ � t Pν�aτ x a�� B δpt�ν� �� Pν�at x a�� as small Eν�τ� as small

as possible as possible

An algorithm using uniform sampling is

Fixed-budget setting Fixed-confidence setting

a classical test of a sequential test of�µ1 A µ2� against �µ1 @ µ2� �µ1 A µ2� against �µ1 @ µ2�based on t samples with probability of error

uniformly bounded by δ

[Siegmund 85]: sequential tests can save samples !Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 6 / 23

Page 7: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

Best arm identification in two-armed bandits Model and objectives

The complexities of best-arm identification

LetM be a class of bandit models. An algorithm A � ��At�, τ, aτ� is...

Fixed-budget setting Fixed-confidence settingconsistent onM if δ-PAC onM if

¦ν >M, pt�ν� � Pν�at x a�� Ð�t�ª

0 ¦ν >M, Pν�aτ x a�� B δFrom the literature

pt�ν� � exp �� tCH�ν�� Eν�τ� � C �H ��ν� log 1

δ

[Audibert et al. 10],[Bubeck et al. 11] [Mannor Tsitsilis 04],[Even-Dar et al. 06][Bubeck et al. 13],... [Kalanakrishnan et al.12],...

Two complexities

κB�ν� � infA consistent

�lim supt�ª

�1t log pt��

�1

κC�ν� � infA δ�PAC

lim supδ�0

Eν�τ�log�1~δ�

for a probability of error B δ, for a probability of error B δbudget t � κB�ν� log 1

δ Eν�τ� � κC�ν� log 1δ

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 7 / 23

Page 8: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

Lower bounds on the complexities

Outline

1 Best arm identification in two-armed bandits

2 Lower bounds on the complexities

3 The complexity of A/B Testing with Gaussian feedback

4 The complexity of A/B Testing with binary feedback

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 8 / 23

Page 9: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

Lower bounds on the complexities Theoretical elements

Changes of distribution

New formulation for a change of distribution

Let ν and ν� be two bandit models. Let N1 (resp. N2) denote the totalnumber of draws of arm 1 (resp. arm 2) by algorithm A). For anyA > Fτ such that 0 @ Pν�A� @ 1

Eν�N1�KL�ν1, ν�1� �Eν�N2�KL�ν2, ν�2� C d�Pν�A�,Pν��A��,where d�x, y� �� x log�x~y� � �1 � x� log ��1 � x�~�1 � y��.

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 9 / 23

Page 10: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

Lower bounds on the complexities Statement of the lower bounds

General lower bounds

Theorem 1

LetM be a class of two armed bandit models that are continuouslyparametrized by their means. Let ν � �ν1, ν2� >M.

Fixed-budget setting Fixed-confidence setting

any consistent algorithm satisfies any δ-PAC algorithm satisfies

lim supt�ª �1t log pt�ν� B K��ν1, ν2� Eν�τ� C 1

K��ν1,ν2�log � 1

2δ�

with KL��ν1, ν2� with K��ν1, ν2�� KL�ν�, ν1� � KL�ν�, ν2� � KL�ν1, ν�� � K�ν2, ν��Thus, κB�ν� C 1

K��ν1,ν2�Thus, κC�ν� C 1

K��ν1,ν2�

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 10 / 23

Page 11: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

The complexity of A/B Testing with Gaussian feedback

1 Best arm identification in two-armed bandits

2 Lower bounds on the complexities

3 The complexity of A/B Testing with Gaussian feedback

4 The complexity of A/B Testing with binary feedback

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 11 / 23

Page 12: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

The complexity of A/B Testing with Gaussian feedback Fixed-budget setting

Fixed-budget setting

For fixed (known) values σ1, σ2, we consider Gaussian bandit models

M � �ν � �N �µ1, σ21� ,N �µ2, σ22�� � �µ1, µ2� > R2, µ1 x µ2�

Theorem 1:

κB�ν� C 2�σ1 � σ2�2�µ1 � µ2�2A strategy allocating t1 � � σ1

σ1�σ2t� samples to arm 1 and t2 � t � t1

samples to arm 1, and recommending the empirical best satisfies

lim inft�ª

�1

tlog pt�ν� C �µ1 � µ2�2

2�σ1 � σ2�2κB�ν� � 2�σ1 � σ2�2�µ1 � µ2�2

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 12 / 23

Page 13: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

The complexity of A/B Testing with Gaussian feedback Fixed-confidence setting

Fixed-confidence setting: Algorithm

The α-Elimination algorithm with exploration rate β�t, δ�Ü chooses At in order to keep a proportion N1�t�~t � α

i.e. At � 2 if and only if �αt� � �α�t � 1��Ü if µa�t� is the empirical mean of rewards obtained from a up to time

t, σ2t �α� � σ2

1~�αt� � σ22~�t � �αt��,

τ � inf �t > N � Sµ1�t� � µ2�t�S A¼2σ2

t �α�β�t, δ� 

0 200 400 600 800 1000

−1.

0−

0.5

0.0

0.5

1.0

Ü recommends the empirical best arm aτ � argmaxaµa�τ�Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 13 / 23

Page 14: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

The complexity of A/B Testing with Gaussian feedback Fixed-confidence setting

Fixed-confidence setting: Results

From Theorem 1:

Eν�τ� C 2�σ1 � σ2�2�µ1 � µ2�2 log � 1

2δ�

σ1σ1�σ2

-Elimination with β�t, δ� � log tδ � 2 log log�6t� is δ-PAC and

¦ε A 0, Eν�τ� B �1 � ε�2�σ1 � σ2�2�µ1 � µ2�2 log � 1

2δ� � oε

δ�0�log 1

δ�

κC�ν� � 2�σ1 � σ2�2�µ1 � µ2�2

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 14 / 23

Page 15: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

The complexity of A/B Testing with Gaussian feedback Conclusions

Gaussian distributions: Conclusions

For any two fixed values of σ1 and σ2,

κB�ν� � κC�ν� � 2�σ1 � σ2�2�µ1 � µ2�2If the variances are equal, σ1 � σ2 � σ,

κB�ν� � κC�ν� � 8σ2

�µ1 � µ2�2uniform sampling is optimal only when σ1 � σ21~2-Elimination is δ-PAC for a smaller exploration rateβ�t, δ� � log�log�t�~δ�

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 15 / 23

Page 16: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

The complexity of A/B Testing with binary feedback

1 Best arm identification in two-armed bandits

2 Lower bounds on the complexities

3 The complexity of A/B Testing with Gaussian feedback

4 The complexity of A/B Testing with binary feedback

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 16 / 23

Page 17: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

The complexity of A/B Testing with binary feedback Lower bounds

Lower bounds for Bernoulli bandit models

M � �ν � �B�µ1�,B�µ2�� � �µ1, µ2� >�0; 1�2, µ1 x µ2�,shorthand: K�µ,µ�� � KL �B�µ�,B�µ���.

Fixed-budget setting Fixed-confidence setting

any consistent algorithm satisfies any δ-PAC algorithm satisfies

lim supt�ª �1t log pt�ν� B K��µ1, µ2� Eν�τ� C 1

K��µ1,µ2�log � 1

2δ�

(Chernoff information)

K��µ1, µ2� A K��µ1, µ2�

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 17 / 23

Page 18: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

The complexity of A/B Testing with binary feedback Lower bounds

Algorithms using uniform sampling

For any consistent... For any δ-PAC...

... algorithm pt�ν� à e�K��µ1,µ2�t Eν�τ�log�1~δ� à

1K��µ1,µ2�

... algorithm using pt�ν� à e� K�µ,µ1��K�µ,µ2�2

t Eν�τ�log�1~δ� à

2K�µ1,µ��K�µ2,µ�

uniform sampling with µ � f�µ1, µ2� with µ �µ1�µ2

2

Remark: Quantities in the same column appear to be close from oneanother

� Binary rewards: uniform sampling close to optimal

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 18 / 23

Page 19: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

The complexity of A/B Testing with binary feedback Lower bounds

Algorithms using uniform sampling

For any consistent... For any δ-PAC...

... algorithm pt�ν� � e�K��µ1,µ2�t Eν�τ�log�1~δ� à

1K��µ1,µ2�

... algorithm using pt�ν� � e� K�µ,µ1��K�µ,µ2�2

t Eν�τ�log�1~δ� à

2K�µ1,µ��K�µ2,µ�

uniform sampling with µ � f�µ1, µ2� with µ �µ1�µ2

2

Remark: Quantities in the same column appear to be close from oneanother

� Binary rewards: uniform sampling close to optimal

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 19 / 23

Page 20: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

The complexity of A/B Testing with binary feedback Fixed-budget setting

Fixed-budget setting

We show that

κB�ν� � 1

K��µ1, µ2�(matching algorithm not implementable in practice)

The algorithm using uniform sampling and recommending theempirical best arm is preferable (and very close to optimal)

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 20 / 23

Page 21: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

The complexity of A/B Testing with binary feedback Fixed-confidence setting

Fixed-confidence setting

δ-PAC algorithms using uniform sampling satisfy

Eν�τ�log�1~δ� C

1

I��ν� with I��ν� � K �µ1, µ1�µ22� �K �µ2, µ1�µ22

�2

.

The algorithm using uniform sampling and

τ � inf �t > 2N�� Sµ1�t� � µ2�t�S A log

log�t� � 1

δ¡

is δ-PAC but not optimal: E�τ�log�1~δ� �

2�µ1�µ2�2

A1

I��ν�.

A better stopping rule NOT based on the difference of empirical means

τ � inf �t > 2N�� tI��µ1�t�, µ2�t�� A log

log�t� � 1

δ¡

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 21 / 23

Page 22: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

The complexity of A/B Testing with binary feedback Conclusions

Bernoulli distributions: Conclusion

Regarding the complexities:κB�ν� � 1

K��µ1,µ2�

κC�ν� C 1K��µ1,µ2�

A1

K��µ1,µ2�

ThusκC�ν� A κB�ν�

Regarding the algorithmsThere is not much to gain by departing from uniform samplingIn the fixed-confidence setting, a sequential test based on thedifference of the empirical means is no longer optimal

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 22 / 23

Page 23: On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ... On the Complexity of A/B Testing COLT 2014 5 / 23. Best arm identification in two-armed

The complexity of A/B Testing with binary feedback Conclusions

Conclusion

the complexities κB�ν� and κC�ν� are not always equal (andfeature some different informational quantities)for Bernoulli distributions and Gaussian with similar variances,strategies using uniform sampling are (almost) optimalstrategies using random stopping do not necessarily lead to asaving in terms of the number of sample used

Coming soon:Generalization to m best arms identification among K arms

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 23 / 23