Introduction Gaussian Two-Armed Bandit Sketch of the proof of theorem Conclusion
An Asymptotic Minimax Theorem for Gaussian Two-Armed Bandit
A. V. Kolnogorov
Yaroslav-the-Wise Novgorod State University, Alexander.Kolnogorov@novsu.ru
ACMPT
Moscow
2017, October 26
A.V.Kolnogorov An Asymptotic Minimax Theorem for Gaussian Two-Armed Bandit
Bernoulli Two-Armed Bandit
Bernoulli Two-Armed Bandit
It is a slot machine with two arms. If the ℓ-th arm is chosen, the gambler gets unit income (+1) with probability pℓ and nothing (0) with probability qℓ (pℓ + qℓ = 1).
The gambler can choose arms N times in total. His goal is to maximize (in some sense) the total expected income. The probabilities p1, p2 are fixed during the control process but unknown to the gambler.
A Dilemma “Information vs Control”
For the gambler it would be best always to choose the arm corresponding to the larger of the probabilities p1, p2. However, to determine this arm he must try both, and this diminishes his total expected income.
Formal Setup
Formal Setup
Formally, let's consider a Bernoulli random controlled process ξn, n = 1, . . . , N, such that
$$\Pr(\xi_n = 1 \mid y_n = \ell) = p_\ell, \quad \Pr(\xi_n = 0 \mid y_n = \ell) = q_\ell, \quad \ell = 1, 2.$$
A strategy σ can use all currently available information about the process: n1, n2 (the total numbers of choices of each arm) and X1, X2 (the corresponding total incomes). The loss function is as follows:
$$L_N(\sigma, \theta) = N(p_1 \vee p_2) - \mathbf{E}_{\sigma,\theta}\left(\sum_{n=1}^{N} \xi_n\right),$$
where θ = (p1, p2) is a parameter of the process.
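To make the loss function concrete, here is a small sketch (my own illustration, not from the talk) that evaluates $L_N(\sigma,\theta)$ exactly for one simple strategy: pull each arm once, then commit forever to the arm with the higher observed outcome. The strategy and the parameter values below are illustrative assumptions.

```python
from itertools import product

def expected_loss(p1, p2, N):
    # Exact L_N(sigma, theta) for the strategy "pull each arm once, then
    # play the arm with the higher observed income for the remaining
    # N-2 steps" (ties resolved in favour of arm 1), obtained by
    # enumerating the 2x2 outcomes of the two exploratory pulls.
    best = N * max(p1, p2)                 # income of an oracle who knows theta
    income = p1 + p2                       # the two forced exploratory pulls
    for x1, x2 in product((0, 1), repeat=2):
        prob = (p1 if x1 else 1 - p1) * (p2 if x2 else 1 - p2)
        chosen = p1 if x1 >= x2 else p2    # arm kept for the remaining pulls
        income += prob * (N - 2) * chosen
    return best - income
```

For example, `expected_loss(0.7, 0.3, 10)` returns 0.688: even this crude strategy sometimes commits to the worse arm, which is exactly the "information vs control" trade-off of the previous slide.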
Bayesian Approach
Bayesian Approach
Let λ(θ) be a prior distribution density on Θ = {(p1, p2) : 0 ≤ pℓ ≤ 1, ℓ = 1, 2}. The Bayesian risk is equal to
$$R_N^B(\lambda) = \inf_{\{\sigma\}} \int_{\Theta} L_N(\sigma, \theta)\,\lambda(\theta)\,d\theta;$$
the corresponding optimal strategy σB is called the Bayesian strategy.
A Simple Recursive Algorithm of Determination
As Berry and Fristedt write: “. . . it is not that researchers in bandit problems tend to be ‘Bayesians’; rather, Bayes's theorem provides a convenient mathematical formalism that allows for adaptive learning and so is an ideal tool in sequential decision problems.”
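The recursive determination mentioned above can be sketched in code. The version below is my own minimal sketch, not the authors' implementation: it assumes independent uniform priors on p1, p2 (so the posterior mean after s successes in n pulls is (s+1)/(n+2)) and computes the Bayes-optimal expected income by backward induction over the sufficient statistics.

```python
from fractions import Fraction
from functools import lru_cache

def bayes_income(N):
    # Expected total income of the Bayes-optimal strategy for the
    # Bernoulli two-armed bandit under independent uniform priors.
    # State (n_l, s_l): pulls and successes of arm l; under a uniform
    # prior the posterior is Beta(s+1, n-s+1) with mean (s+1)/(n+2).
    @lru_cache(maxsize=None)
    def V(n1, s1, n2, s2):
        if n1 + n2 == N:
            return Fraction(0)
        p1 = Fraction(s1 + 1, n1 + 2)      # posterior mean, arm 1
        p2 = Fraction(s2 + 1, n2 + 2)      # posterior mean, arm 2
        v1 = p1 * (1 + V(n1 + 1, s1 + 1, n2, s2)) + (1 - p1) * V(n1 + 1, s1, n2, s2)
        v2 = p2 * (1 + V(n1, s1, n2 + 1, s2 + 1)) + (1 - p2) * V(n1, s1, n2 + 1, s2)
        return max(v1, v2)
    return V(0, 0, 0, 0)

def bayes_risk(N):
    # R^B_N = E[N * max(p1, p2)] - optimal expected income; for
    # independent uniform priors E[max(p1, p2)] = 2/3.
    return Fraction(2, 3) * N - bayes_income(N)
```

Using exact rational arithmetic here also illustrates the Fabius and van Zwet remark quoted later: the closed-form expressions grow rapidly with N, while the numerical recursion itself stays simple.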
Minimax Approach
Minimax Approach
The minimax risk is equal to
$$R_N^M(\Theta) = \inf_{\{\sigma\}} \sup_{\Theta} L_N(\sigma, \theta);$$
the corresponding optimal strategy σM is called the minimax strategy.
Robustness of Minimax Approach
If σM is applied then the following inequality holds
$$L_N(\sigma^M, \theta) \le R_N^M(\Theta), \quad \forall\,\theta \in \Theta.$$
Impossibility of Direct Determination
As Fabius and van Zwet write about the Bernoulli two-armed bandit: “the algebra involved becomes progressively more complicated with increasing N and seems to remain prohibitive already for N as small as 5.”
AMT
An Asymptotic Minimax Theorem
Theorem. The following inequality holds for the Bernoulli two-armed bandit as N → ∞:
$$0.612 \le (DN)^{-1/2} R_N^M(\Theta) \le 0.752.$$
The proof of the theorem uses some indirect estimates and techniques.
Vogel, W. An asymptotic minimax theorem for the two-armed bandit problem. Ann. Math. Stat., 1961, V. 31, P. 444–451.
Specification of the AMT
Specification of the Asymptotic Minimax Theorem
We propose the following specification of the AMT for the Gaussian two-armed bandit.
Theorem. The following equality holds:
$$\lim_{N \to \infty} (DN)^{-1/2} R_N^M(\Theta) \approx 0.637.$$
The following issues are to be discussed:
1 What is the Gaussian two-armed bandit?
2 Why the Gaussian two-armed bandit?
3 How are the Gaussian and Bernoulli two-armed bandits related?
4 How is the theorem proved?
What is Gaussian Two-Armed Bandit
What is Gaussian Two-Armed Bandit
It is a slot machine with two arms. If the ℓ-th arm is chosen, the gambler gets a random income. This income is normally distributed with unit variance and mathematical expectation mℓ.
The gambler can choose arms N times in total. His goal is to maximize (in some sense) the total expected income. The expectations m1, m2 are fixed during the control process but unknown to the gambler.
A Dilemma “Information vs Control”
For the gambler it would be best always to choose the arm corresponding to the larger of the expectations m1, m2. However, to determine this arm he must try both, and this diminishes his total expected income.
Why Gaussian Two-Armed Bandit?
Why Gaussian Two-Armed Bandit? Parallel processing
Assume that a large number T = NK of items of data are given, which can be processed by two alternative methods. Processing may be successful (ξt = 1) or unsuccessful (ξt = 0).
The probabilities of successful and unsuccessful processing depend on the chosen method (arm) and are equal to pℓ and qℓ, respectively (ℓ = 1, 2). Assume that p1, p2 are close to p. Let's define the process
$$\xi'_n = (DK)^{-1/2} \sum_{t=(n-1)K+1}^{nK} \xi_t, \quad n = 1, \ldots, N, \quad D = p(1-p).$$
The distributions of ξ′n are close to normal and their variances are close to 1.
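This normal approximation for the block sums is easy to check numerically. The sketch below (my own, with illustrative parameter values p = 0.3, K = 1000) simulates the normalized sums ξ′n and verifies that their sample variance is close to 1, as claimed.

```python
import random
random.seed(0)

p, K, N = 0.3, 1000, 500           # success probability, block size, blocks
D = p * (1 - p)

# xi'_n = (DK)^{-1/2} * (sum of the K Bernoulli outcomes in block n);
# each block sum is Binomial(K, p), so Var(xi'_n) = K*p*(1-p)/(D*K) = 1.
blocks = [sum(random.random() < p for _ in range(K)) / (D * K) ** 0.5
          for _ in range(N)]

mean = sum(blocks) / N
var = sum((x - mean) ** 2 for x in blocks) / (N - 1)
```

By the central limit theorem, each ξ′n is approximately normal for large K, with variance 1 by construction (the mean is p·(K/D)^{1/2}, which is why the talk works with expectations rather than centered variables).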
Some properties
Relation between Gaussian and Bernoulli Two-armedBandits
1 In application to data processing, the Gaussian two-armed bandit is a particular case of the Bernoulli two-armed bandit that allows data to be processed in parallel. The data should be partitioned into a number of groups; data in the same group are then processed in parallel by the same method.
2 For example, given 30 000 items of data, the data can be partitioned into 30 groups, each containing 1000 items. They can then be processed in 30 stages, 1000 items at each stage.
3 If the number of stages is large enough (e.g., 30 stages or more), then the maximal losses for parallel data processing are almost the same as if the data were processed optimally one by one!
4 So, the Bernoulli and Gaussian two-armed bandits are equivalent as N → ∞.
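The batching scheme above can be sketched as follows. The stage-wise greedy rule and the parameter values are my own illustrative assumptions (the talk's optimal strategy is derived later); the point here is only the structure: one arm per stage, applied to a whole group of K items.

```python
import random
random.seed(1)

def parallel_process(p, K, N):
    # Process T = N*K items in N stages; at each stage send all K items
    # of the current group to one method (arm), chosen greedily from the
    # empirical success rates accumulated so far (each arm tried once first).
    pulls = [0, 0]        # stages assigned to each method
    wins = [0, 0]         # successfully processed items per method
    for _ in range(N):
        if 0 in pulls:
            arm = pulls.index(0)            # try each method at least once
        else:
            arm = max((0, 1), key=lambda a: wins[a] / pulls[a])
        successes = sum(random.random() < p[arm] for _ in range(K))
        pulls[arm] += 1
        wins[arm] += successes
    return pulls, wins

pulls, wins = parallel_process([0.6, 0.4], K=1000, N=30)
```

With groups of 1000 items, a single stage already separates the two success rates sharply, which is why a moderate number of stages (item 3) loses almost nothing against item-by-item processing.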
Formal Setup
Formal Setup of the Problem
Formally, incomes are considered as a controlled random process ξ1, ξ2, . . . , ξN whose values depend only on the currently chosen arms (in the sequel called actions) y1, y2, . . . , yN and are normally distributed with unit variance and mathematical expectation mℓ if the ℓ-th action is chosen:
$$f(x \mid m_\ell) = (2\pi)^{-1/2} \exp\{-(x - m_\ell)^2/2\}.$$
This process is completely described by the vector parameter θ = (m1, m2). A control strategy σ prescribes the choice of the actions yn, n = 1, . . . , N, and may depend on the complete history of the process. In fact, it is sufficient to know four current values: n1, n2, the numbers of choices of both actions, and X1, X2, the total incomes for both actions. The loss function is defined as follows:
$$L_N(\sigma, \theta) = N(m_1 \vee m_2) - \mathbf{E}_{\sigma,\theta}\left(\sum_{n=1}^{N} \xi_n\right).$$
Minimax and Bayesian Settings
Minimax and Bayesian Settings
Assume that the set of parameters is Θ = {(m1, m2) : |m1 − m2| ≤ 2c1, |m1 + m2| ≤ 2c2}, with 0 < c1 < ∞, 0 < c2 < ∞ and c2 large enough. The minimax risk is defined as
$$R_N^M(\Theta) = \inf_{\{\sigma\}} \sup_{\Theta} L_N(\sigma, \theta);$$
the corresponding strategy σM is called the minimax strategy.
Consider a prior distribution λ(m1, m2) on Θ. The Bayesian risk is defined as
$$R_N^B(\lambda) = \inf_{\{\sigma\}} \int_{\Theta} L_N(\sigma, \theta)\,\lambda(\theta)\,d\theta;$$
the corresponding strategy σB is called the Bayesian strategy.
Some Properties of Approaches
Some Properties of Approaches
Robustness of Minimax Approach
If σM is applied then the following inequality holds
$$L_N(\sigma^M, \theta) \le R_N^M(\Theta), \quad \forall\,\theta \in \Theta.$$
A Simple Recursive Algorithm for Bayesian Approach
There is a well-known recursive algorithm for the numerical determination of the Bayesian risk and the Bayesian strategy for any prior distribution.
Main Theorem of the Theory of Games
Under mild conditions, the minimax risk is equal to the Bayesian risk corresponding to the worst-case prior distribution, i.e.,
$$R_N^M(\Theta) = \sup_{\{\lambda\}} R_N^B(\lambda) = R_N^B(\lambda_0),$$
and the minimax strategy is equal to the corresponding Bayesian strategy as well. In the sequel, the minimax risk is sought as the Bayesian risk calculated with respect to the worst-case prior.
The Worst-Case Prior Distribution
Asymptotically the Worst-Case Prior Distribution
Asymptotically, the worst-case prior is uniform along the main diagonal and symmetric across it.
Calculations suggest that it concentrates on two parallel lines; the distance between them is the only unknown parameter.
Invariant Equation – 1
Change of Variables
Recall that n1, n2 denote the total numbers of choices of the two actions, X1, X2 are the corresponding total incomes, and m1, m2 are the mathematical expectations. We assume that actions are applied to groups of size M; this allows parallel processing. Let's denote
$$u = \frac{X_1 n_2 - X_2 n_1}{n N^{1/2}}, \quad t_1 = \frac{n_1}{N}, \quad t_2 = \frac{n_2}{N}, \quad \varepsilon = \frac{M}{N}, \quad w = (m_1 - m_2) N^{1/2},$$
where n = n1 + n2, and let ϱ(w) characterize a symmetric uniform prior distribution.
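The change of variables is a direct translation to code. In the sketch below (my own), I assume n = n1 + n2 is the total number of pulls, which the slide leaves implicit; the sample values are illustrative.

```python
def normalized_state(X1, X2, n1, n2, M, N):
    # Change of variables from raw sufficient statistics (X1, X2, n1, n2)
    # to the invariant coordinates (u, t1, t2, eps) of the recursion.
    n = n1 + n2                               # assumed: total number of pulls
    u = (X1 * n2 - X2 * n1) / (n * N ** 0.5)  # normalized income difference
    return u, n1 / N, n2 / N, M / N

# Symmetric data: equal incomes and equal pull counts give u = 0.
u, t1, t2, eps = normalized_state(X1=5.0, X2=5.0, n1=10, n2=10, M=1, N=100)
```

Note that u vanishes exactly when the per-pull empirical means X1/n1 and X2/n2 coincide, so u measures the (suitably scaled) evidence in favour of one arm.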
Invariant Equation – 2
Invariant Recursive Equation with Unit Time Horizon
To determine the Bayesian risk, one should solve a Bellman-type recursive equation:
$$r_\varepsilon(u, t_1, t_2) = \min_{\ell=1,2} r_\varepsilon^{(\ell)}(u, t_1, t_2),$$
where $r_\varepsilon^{(1)}(u, t_1, t_2) = r_\varepsilon^{(2)}(u, t_1, t_2) = 0$ for $t_1 + t_2 = 1$, and
$$r_\varepsilon^{(1)}(u, t_1, t_2) = \varepsilon g^{(1)}(u, t_1, t_2) + r_\varepsilon(u, t_1 + \varepsilon, t_2) * f_{\varepsilon t_2^2 t^{-1}(t+\varepsilon)^{-1}}(u),$$
$$r_\varepsilon^{(2)}(u, t_1, t_2) = \varepsilon g^{(2)}(u, t_1, t_2) + r_\varepsilon(u, t_1, t_2 + \varepsilon) * f_{\varepsilon t_1^2 t^{-1}(t+\varepsilon)^{-1}}(u),$$
for $t_1 + t_2 < 1$, $t_1 \ge \varepsilon$ and $t_2 \ge \varepsilon$, with $t = t_1 + t_2$ and $*$ denoting convolution in $u$. Here
$$g^{(\ell)}(u, t_1, t_2) = \int_0^{\infty} 2w\, g(u, (-1)^{\ell+1} w, t_1, t_2)\, \varrho(w)\, dw, \quad \ell = 1, 2,$$
$$g(u, w, t_1, t_2) = \exp(-2uw - 2w^2 t_1 t_2 t^{-1}),$$
$$f_\varepsilon(x) = (2\pi\varepsilon)^{-1/2} \exp(-x^2/(2\varepsilon)).$$
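One numerical building block of this recursion is the convolution of a tabulated function of u with the Gaussian kernel f with small variance. The sketch below is my own grid implementation of that single step (it assumes the subscript of f is the kernel variance, consistent with the definition of f above); the grid step and variance are illustrative.

```python
import numpy as np

def gauss_smooth(values, du, v):
    # Convolve a function tabulated on a uniform u-grid (step du) with
    # the Gaussian density f_v(x) = (2*pi*v)^{-1/2} * exp(-x^2 / (2*v)).
    half = int(np.ceil(6 * np.sqrt(v) / du))   # truncate kernel at +-6 sigma
    x = np.arange(-half, half + 1) * du
    kernel = np.exp(-x ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    kernel *= du                               # quadrature weights
    kernel /= kernel.sum()                     # renormalize on the grid
    return np.convolve(values, kernel, mode="same")
```

Smoothing a constant function leaves it unchanged away from the grid boundary, which is a convenient sanity check before plugging such a step into the backward recursion in t.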
Passage to the Limit
Passage to the Limit
Let ε → 0. Then for all u and all t1, t2 for which the solution of the equation is well defined, there exists the limit
$$r(u, t_1, t_2) = \lim_{\varepsilon \to +0} r_\varepsilon(u, t_1, t_2).$$
Under some additional conditions, r(u, t1, t2) satisfies the second-order partial differential equation
$$\min_{\ell=1,2}\left(\frac{\partial r}{\partial t_\ell} + \frac{t_{\bar\ell}^2}{2t^2}\,\frac{\partial^2 r}{\partial u^2} + g^{(\ell)}(u, t_1, t_2)\right) = 0,$$
$\bar\ell = 3 - \ell$, $\ell = 1, 2$, with initial condition
$$\lim_{t_1 + t_2 \to 1} r(u, t_1, t_2) = 0 \quad \text{for } t_1 > 0,\ t_2 > 0,$$
and boundary conditions
$$\lim_{u \to +\infty} r(u, t_1, t_2) = \lim_{u \to -\infty} r(u, t_1, t_2) = 0.$$
Risk and Strategy
Risk and Strategy
Minimax Risk
The limiting minimax risk is equal to the limiting Bayesian risk corresponding to the worst-case prior distribution and is calculated as
$$\lim_{N \to \infty} N^{-1/2} R_N^M(\Theta) = \sup_{\varrho}\, r(\varrho;\, u, t_1, t_2)\Big|_{u=0,\, t_1=0,\, t_2=0}.$$
Optimal Strategy
The ℓ-th action is currently optimal if
$$\frac{\partial r}{\partial t_\ell} + \frac{t_{\bar\ell}^2}{2t^2}\,\frac{\partial^2 r}{\partial u^2} + g^{(\ell)}(u, t_1, t_2)$$
has the smaller value (ℓ = 1, 2).
Numerical Experiments
Numerical Experiments
Calculations were done under the assumption that the worst-case prior distribution ϱ(w) is concentrated at the two points w = ±d with 0.5 ≤ d ≤ 2.5. The maximal value of r(ϱ; 0, 0, 0) was determined as
$$\max_d\, r(\varrho;\, 0, 0, 0) \approx 0.637,$$
attained at d ≈ 1.57.
Some References - 1
Russian References
1 Tsetlin, M.L., Issledovaniya po teorii avtomatov i modelirovaniyu biologicheskikh sistem (Studies in Automata Theory and Modeling of Biological Systems), Moscow: Nauka, 1969.
2 Varshavskii, V.I., Kollektivnoe povedenie avtomatov (CollectiveBehavior of Automata), Moscow: Nauka, 1973.
3 Sragovich, V.G., Adaptivnoe upravlenie (Adaptive Control), Moscow:Nauka, 1981.
4 Nazin, A.V. and Poznyak, A.S., Adaptivnyi vybor variantov(Adaptive Choice between Alternatives), Moscow: Nauka, 1986.
5 Presman, E.L. and Sonin, I.M., Posledovatel'noe upravlenie po nepolnym dannym (Sequential Control with Incomplete Data), Moscow: Nauka, 1982.
Some References - 2
English References
1 Berry, D.A. and Fristedt, B. Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London, New York, 1985.
2 Lai, T.L. and Robbins, H. Asymptotically Efficient Adaptive Allocation Rules. Advances in Applied Mathematics, 1985, V. 6, P. 4–22.
3 Robbins, H. Some aspects of the sequential design of experiments. Bulletin of Amer. Math. Soc., 1952, V. 58, P. 527–535.
4 Vogel, W. An asymptotic minimax theorem for the two-armed bandit problem. Ann. Math. Stat., 1961, V. 31, P. 444–451.
5 Fabius, J. and van Zwet, W.R. Some remarks on the two-armed bandit. Ann. Math. Stat., 1970, V. 41, P. 1906–1916.
Some References - 3
Previous Publications
1 Kolnogorov, A.V. Finding Minimax Strategy and Minimax Risk in a Random Environment (the Two-Armed Bandit Problem) // Automation and Remote Control, 2011, Vol. 72, No. 5, pp. 1017–1027.
2 Kolnogorov, A.V. On a Limiting Description of Robust Parallel Control in a Random Environment // Automation and Remote Control, 2015, Vol. 76, No. 7, pp. 1229–1241.
Thank you
Thank you for your attention!