On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ......

On the Complexity of A/B Testing

Emilie Kaufmann, Olivier Cappe (Telecom ParisTech) and AurelienGarivier (Institut de Mathematiques de Toulouse)

Conference On Learning Theory, Barcelona,June 14th, 2014

Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 1 / 23

A/B Testing An example

Motivation


A/B Testing An example

Our goal

Improve performance:

Ü fixed number of test users � A smaller probability of errorÜ fixed probability of error � A fewer test users

Tools: sequential allocation and stopping


Best arm identification in two-armed bandits

1 Best arm identification in two-armed bandits

2 Lower bounds on the complexities

3 The complexity of A/B Testing with Gaussian feedback

4 The complexity of A/B Testing with binary feedback


Best arm identification in two-armed bandits Model and objectives

The model

A two-armed bandit model isa set ν � �ν1, ν2� of two probability distributions (’arms’) withrespective means µ1 and µ2a� � argmaxa µa is the (unknown) best am

To find the best arm, an agent interacts with the bandit model witha sampling rule �At�t>N where At > �1,2� is the arm chosen at timet (based on past observations) � A a sample Zt � νAt is observeda stopping rule τ indicating when he stops sampling the armsa recommendation rule aτ > �1,2� indicating which arm he thinksis best (at the end of the interaction)

In classical A/B Testing, the sampling rule At is uniform on �1,2� andthe stopping rule τ � t is fixed in advance.



Two possible goals

The agent’s goal is to design a strategy A � ��At�, τ, aτ� satisfying

Fixed-budget setting Fixed-confidence setting

τ � t Pν�aτ x a�� B δpt�ν� �� Pν�at x a�� as small Eν�τ� as small

as possible as possible

An algorithm using uniform sampling is


a classical test of a sequential test of�µ1 A µ2� against �µ1 @ µ2� �µ1 A µ2� against �µ1 @ µ2�based on t samples with probability of error

uniformly bounded by δ

[Siegmund 85]: sequential tests can save samples !Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 6 / 23


The complexities of best-arm identification

LetM be a class of bandit models. An algorithm A � ��At�, τ, aτ� is...

Fixed-budget setting Fixed-confidence settingconsistent onM if δ-PAC onM if

¦ν >M, pt�ν� � Pν�at x a�� Ð�t�ª

0 ¦ν >M, Pν�aτ x a�� B δFrom the literature

pt�ν� � exp �� tCH�ν�� Eν�τ� � C �H ��ν� log 1

δ

[Audibert et al. 10],[Bubeck et al. 11] [Mannor Tsitsilis 04],[Even-Dar et al. 06][Bubeck et al. 13],... [Kalanakrishnan et al.12],...

Two complexities

κB�ν� � infA consistent

�lim supt�ª

�1t log pt�ν��

�1

κC�ν� � infA δ�PAC

lim supδ�0

Eν�τ�log�1~δ�

for a probability of error B δ, for a probability of error B δbudget t � κB�ν� log 1

δ Eν�τ� � κC�ν� log 1δ


Lower bounds on the complexities

Outline






Lower bounds on the complexities Theoretical elements

Changes of distribution

New formulation for a change of distribution

Let ν and ν� be two bandit models. Let N1 (resp. N2) denote the totalnumber of draws of arm 1 (resp. arm 2) by algorithm A). For anyA > Fτ such that 0 @ Pν�A� @ 1

Eν�N1�KL�ν1, ν�1� �Eν�N2�KL�ν2, ν�2� C d�Pν�A�,Pν��A��,where d�x, y� �� x log�x~y� � �1 � x� log ��1 � x�~�1 � y��.


Lower bounds on the complexities Statement of the lower bounds

General lower bounds

Theorem 1

LetM be a class of two armed bandit models that are continuouslyparametrized by their means. Let ν � �ν1, ν2� >M.


any consistent algorithm satisfies any δ-PAC algorithm satisfies

lim supt�ª �1t log pt�ν� B K��ν1, ν2� Eν�τ� C 1

K��ν1,ν2�log � 1

2δ�

with KL��ν1, ν2� with K��ν1, ν2�� KL�ν�, ν1� � KL�ν�, ν2� � KL�ν1, ν�� K�ν2, ν��Thus, κB�ν� C 1

K��ν1,ν2�Thus, κC�ν� C 1

K��ν1,ν2�


The complexity of A/B Testing with Gaussian feedback






The complexity of A/B Testing with Gaussian feedback Fixed-budget setting

Fixed-budget setting

For fixed (known) values σ1, σ2, we consider Gaussian bandit models

M � �ν � �N �µ1, σ21� ,N �µ2, σ22�� µ1, µ2� > R2, µ1 x µ2�

Theorem 1:

κB�ν� C 2�σ1 � σ2�2�µ1 � µ2�2A strategy allocating t1 � � σ1

σ1�σ2t� samples to arm 1 and t2 � t � t1

samples to arm 1, and recommending the empirical best satisfies

lim inft�ª

�1

tlog pt�ν� C �µ1 � µ2�2

2�σ1 � σ2�2κB�ν� � 2�σ1 � σ2�2�µ1 � µ2�2


The complexity of A/B Testing with Gaussian feedback Fixed-confidence setting

Fixed-confidence setting: Algorithm

The α-Elimination algorithm with exploration rate β�t, δ�Ü chooses At in order to keep a proportion N1�t�~t � α

i.e. At � 2 if and only if �αt� � �α�t � 1��Ü if µa�t� is the empirical mean of rewards obtained from a up to time

t, σ2t �α� � σ2

1~�αt� � σ22~�t � �αt��,

τ � inf �t > N � Sµ1�t� � µ2�t�S A¼2σ2

t �α�β�t, δ�

0 200 400 600 800 1000

−1.

0−

0.5

0.0

0.5

1.0

Ü recommends the empirical best arm aτ � argmaxaµa�τ�Emilie Kaufmann (Telecom ParisTech) On the Complexity of A/B Testing COLT 2014 13 / 23

The complexity of A/B Testing with Gaussian feedback Fixed-confidence setting

Fixed-confidence setting: Results

From Theorem 1:

Eν�τ� C 2�σ1 � σ2�2�µ1 � µ2�2 log � 1

2δ�

σ1σ1�σ2

-Elimination with β�t, δ� � log tδ � 2 log log�6t� is δ-PAC and

¦ε A 0, Eν�τ� B �1 � ε�2�σ1 � σ2�2�µ1 � µ2�2 log � 1

2δ� � oε

δ�0�log 1

δ�

κC�ν� � 2�σ1 � σ2�2�µ1 � µ2�2


The complexity of A/B Testing with Gaussian feedback Conclusions

Gaussian distributions: Conclusions

For any two fixed values of σ1 and σ2,

κB�ν� � κC�ν� � 2�σ1 � σ2�2�µ1 � µ2�2If the variances are equal, σ1 � σ2 � σ,

κB�ν� � κC�ν� � 8σ2

�µ1 � µ2�2uniform sampling is optimal only when σ1 � σ21~2-Elimination is δ-PAC for a smaller exploration rateβ�t, δ� � log�log�t�~δ�


The complexity of A/B Testing with binary feedback






The complexity of A/B Testing with binary feedback Lower bounds

Lower bounds for Bernoulli bandit models

M � �ν � �B�µ1�,B�µ2�� µ1, µ2� >�0; 1�2, µ1 x µ2�,shorthand: K�µ,µ�� KL �B�µ�,B�µ��.


any consistent algorithm satisfies any δ-PAC algorithm satisfies

lim supt�ª �1t log pt�ν� B K��µ1, µ2� Eν�τ� C 1

K��µ1,µ2�log � 1

2δ�

(Chernoff information)

K��µ1, µ2� A K��µ1, µ2�



Algorithms using uniform sampling

For any consistent... For any δ-PAC...

... algorithm pt�ν� à e�K��µ1,µ2�t Eν�τ�log�1~δ� à

1K��µ1,µ2�

... algorithm using pt�ν� à e� K�µ,µ1��K�µ,µ2�2

t Eν�τ�log�1~δ� à

2K�µ1,µ��K�µ2,µ�

uniform sampling with µ � f�µ1, µ2� with µ �µ1�µ2

2

Remark: Quantities in the same column appear to be close from oneanother

� Binary rewards: uniform sampling close to optimal



Algorithms using uniform sampling

For any consistent... For any δ-PAC...

... algorithm pt�ν� � e�K��µ1,µ2�t Eν�τ�log�1~δ� à

1K��µ1,µ2�

... algorithm using pt�ν� � e� K�µ,µ1��K�µ,µ2�2

t Eν�τ�log�1~δ� à

2K�µ1,µ��K�µ2,µ�

uniform sampling with µ � f�µ1, µ2� with µ �µ1�µ2

2

Remark: Quantities in the same column appear to be close from oneanother

� Binary rewards: uniform sampling close to optimal


The complexity of A/B Testing with binary feedback Fixed-budget setting

Fixed-budget setting

We show that

κB�ν� � 1

K��µ1, µ2�(matching algorithm not implementable in practice)

The algorithm using uniform sampling and recommending theempirical best arm is preferable (and very close to optimal)


The complexity of A/B Testing with binary feedback Fixed-confidence setting

Fixed-confidence setting

δ-PAC algorithms using uniform sampling satisfy

Eν�τ�log�1~δ� C

1

I��ν� with I��ν� � K �µ1, µ1�µ22� �K �µ2, µ1�µ22

�2

.

The algorithm using uniform sampling and

τ � inf �t > 2N�� Sµ1�t� � µ2�t�S A log

log�t� � 1

δ¡

is δ-PAC but not optimal: E�τ�log�1~δ� �

2�µ1�µ2�2

A1

I��ν�.

A better stopping rule NOT based on the difference of empirical means

τ � inf �t > 2N�� tI��µ1�t�, µ2�t�� A log

log�t� � 1

δ¡


The complexity of A/B Testing with binary feedback Conclusions

Bernoulli distributions: Conclusion

Regarding the complexities:κB�ν� � 1

K��µ1,µ2�

κC�ν� C 1K��µ1,µ2�

A1

K��µ1,µ2�

ThusκC�ν� A κB�ν�

Regarding the algorithmsThere is not much to gain by departing from uniform samplingIn the fixed-confidence setting, a sequential test based on thedifference of the empirical means is no longer optimal


The complexity of A/B Testing with binary feedback Conclusions

Conclusion

the complexities κB�ν� and κC�ν� are not always equal (andfeature some different informational quantities)for Bernoulli distributions and Gaussian with similar variances,strategies using uniform sampling are (almost) optimalstrategies using random stopping do not necessarily lead to asaving in terms of the number of sample used

Coming soon:Generalization to m best arms identification among K arms


On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ......

Documents

Transcript of On the Complexity of A/B Testing€¦ · A/B Testing An example Our goal Improve performance: ......