Introduction Gaussian Two-Armed Bandit Sketch of the proof of theorem Conclusion
An Asymptotic Minimax Theorem for Gaussian Two-Armed Bandit
A. V. Kolnogorov
Yaroslav-the-Wise Novgorod State University, Alexander.Kolnogorov@novsu.ru
ACMPT
Moscow
2017, October 26
A.V.Kolnogorov An Asymptotic Minimax Theorem for Gaussian Two-Armed Bandit
Bernoulli Two-Armed Bandit
Bernoulli Two-Armed Bandit
It is a slot machine with two arms. If the ℓ-th arm is chosen, the gambler gets unit income (+1) with probability pℓ and nothing (0) with probability qℓ (pℓ + qℓ = 1).
The gambler can choose arms N times in total. His goal is to maximize (in some sense) the total expected income. The probabilities p1, p2 are fixed during the control process but unknown to the gambler.
A Dilemma “Information vs Control”
For the gambler it would be best always to choose the arm corresponding to the larger of the probabilities p1, p2. However, to determine this arm he must try both, and this diminishes his total expected income.
Formal Setup
Formal Setup
Formally, let's consider a Bernoulli random controlled process ξn, n = 1, . . . , N, such that
$$\Pr(\xi_n = 1 \mid y_n = \ell) = p_\ell, \quad \Pr(\xi_n = 0 \mid y_n = \ell) = q_\ell, \quad \ell = 1, 2.$$
A strategy σ can use all currently available information about the process: n1, n2 (the total numbers of choices of each arm) and X1, X2 (the corresponding total incomes). The loss function is as follows:
$$L_N(\sigma, \theta) = N(p_1 \vee p_2) - \mathbf{E}_{\sigma,\theta}\left(\sum_{n=1}^{N} \xi_n\right),$$
where θ = (p1, p2) is a parameter of the process.
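To make the loss function concrete, here is a small sketch (my own illustration, not from the talk) that evaluates $L_N(\sigma,\theta)$ exactly for one simple strategy: pull each arm once, then commit forever to the arm with the higher observed outcome. The strategy and the parameter values below are illustrative assumptions.

```python
from itertools import product

def expected_loss(p1, p2, N):
    # Exact L_N(sigma, theta) for the strategy "pull each arm once, then
    # play the arm with the higher observed income for the remaining
    # N-2 steps" (ties resolved in favour of arm 1), obtained by
    # enumerating the 2x2 outcomes of the two exploratory pulls.
    best = N * max(p1, p2)                 # income of an oracle who knows theta
    income = p1 + p2                       # the two forced exploratory pulls
    for x1, x2 in product((0, 1), repeat=2):
        prob = (p1 if x1 else 1 - p1) * (p2 if x2 else 1 - p2)
        chosen = p1 if x1 >= x2 else p2    # arm kept for the remaining pulls
        income += prob * (N - 2) * chosen
    return best - income
```

For example, `expected_loss(0.7, 0.3, 10)` returns 0.688: even this crude strategy sometimes commits to the worse arm, which is exactly the "information vs control" trade-off of the previous slide.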
Bayesian Approach
Bayesian Approach
Let λ(θ) be a prior distribution density on Θ = {(p1, p2) : 0 ≤ pℓ ≤ 1, ℓ = 1, 2}. The Bayesian risk is equal to
$$R_N^B(\lambda) = \inf_{\{\sigma\}} \int_{\Theta} L_N(\sigma, \theta)\,\lambda(\theta)\,d\theta;$$
the corresponding optimal strategy σB is called the Bayesian strategy.
A Simple Recursive Algorithm of Determination
As Berry and Fristedt write: “. . . it is not that researchers in bandit problems tend to be ‘Bayesians’; rather, Bayes's theorem provides a convenient mathematical formalism that allows for adaptive learning and so is an ideal tool in sequential decision problems.”
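The recursive determination mentioned above can be sketched in code. The version below is my own minimal sketch, not the authors' implementation: it assumes independent uniform priors on p1, p2 (so the posterior mean after s successes in n pulls is (s+1)/(n+2)) and computes the Bayes-optimal expected income by backward induction over the sufficient statistics.

```python
from fractions import Fraction
from functools import lru_cache

def bayes_income(N):
    # Expected total income of the Bayes-optimal strategy for the
    # Bernoulli two-armed bandit under independent uniform priors.
    # State (n_l, s_l): pulls and successes of arm l; under a uniform
    # prior the posterior is Beta(s+1, n-s+1) with mean (s+1)/(n+2).
    @lru_cache(maxsize=None)
    def V(n1, s1, n2, s2):
        if n1 + n2 == N:
            return Fraction(0)
        p1 = Fraction(s1 + 1, n1 + 2)      # posterior mean, arm 1
        p2 = Fraction(s2 + 1, n2 + 2)      # posterior mean, arm 2
        v1 = p1 * (1 + V(n1 + 1, s1 + 1, n2, s2)) + (1 - p1) * V(n1 + 1, s1, n2, s2)
        v2 = p2 * (1 + V(n1, s1, n2 + 1, s2 + 1)) + (1 - p2) * V(n1, s1, n2 + 1, s2)
        return max(v1, v2)
    return V(0, 0, 0, 0)

def bayes_risk(N):
    # R^B_N = E[N * max(p1, p2)] - optimal expected income; for
    # independent uniform priors E[max(p1, p2)] = 2/3.
    return Fraction(2, 3) * N - bayes_income(N)
```

Using exact rational arithmetic here also illustrates the Fabius and van Zwet remark quoted later: the closed-form expressions grow rapidly with N, while the numerical recursion itself stays simple.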
Minimax Approach
Minimax Approach
The minimax risk is equal to
$$R_N^M(\Theta) = \inf_{\{\sigma\}} \sup_{\Theta} L_N(\sigma, \theta);$$
the corresponding optimal strategy σM is called the minimax strategy.
Robustness of Minimax Approach
If σM is applied then the following inequality holds
$$L_N(\sigma^M, \theta) \le R_N^M(\Theta), \quad \forall\,\theta \in \Theta.$$
Impossibility of Direct Determination
As Fabius and van Zwet write about the Bernoulli two-armed bandit: “the algebra involved becomes progressively more complicated with increasing N and seems to remain prohibitive already for N as small as 5.”
AMT
An Asymptotic Minimax Theorem
Theorem. The following inequality holds for the Bernoulli two-armed bandit as N → ∞:
$$0.612 \le (DN)^{-1/2} R_N^M(\Theta) \le 0.752.$$
The proof of the theorem uses some indirect estimates and techniques.
Vogel, W. An asymptotic minimax theorem for the two-armed bandit problem. Ann. Math. Stat., 1961, V. 31, P. 444–451.
Specification of the AMT
Specification of the Asymptotic Minimax Theorem
We propose the following specification of the AMT for the Gaussian two-armed bandit.
Theorem. The following equality holds:
$$\lim_{N \to \infty} (DN)^{-1/2} R_N^M(\Theta) \approx 0.637.$$
The following issues are to be discussed:
1 What is the Gaussian two-armed bandit?
2 Why the Gaussian two-armed bandit?
3 How are the Gaussian and Bernoulli two-armed bandits related?
4 How is the theorem proved?
What is Gaussian Two-Armed Bandit
What is Gaussian Two-Armed Bandit
It is a slot machine with two arms. If the ℓ-th arm is chosen, the gambler gets a random income. This income is normally distributed with unit variance and mathematical expectation mℓ.
The gambler can choose arms N times in total. His goal is to maximize (in some sense) the total expected income. The expectations m1, m2 are fixed during the control process but unknown to the gambler.
A Dilemma “Information vs Control”
For the gambler it would be best always to choose the arm corresponding to the larger of the expectations m1, m2. However, to determine this arm he must try both, and this diminishes his total expected income.
Why Gaussian Two-Armed Bandit?
Why Gaussian Two-Armed Bandit? Parallel processing
Assume that a large number T = NK of items of data are given, which can be processed by two alternative methods. Processing may be successful (ξt = 1) or unsuccessful (ξt = 0).
The probabilities of successful and unsuccessful processing depend on the chosen method (arm) and are equal to pℓ and qℓ, respectively (ℓ = 1, 2). Assume that p1, p2 are close to p. Let's define the process
$$\xi'_n = (DK)^{-1/2} \sum_{t=(n-1)K+1}^{nK} \xi_t, \quad n = 1, \ldots, N, \quad D = p(1-p).$$
The distributions of ξ′n are close to normal and their variances are close to 1.
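This normal approximation for the block sums is easy to check numerically. The sketch below (my own, with illustrative parameter values p = 0.3, K = 1000) simulates the normalized sums ξ′n and verifies that their sample variance is close to 1, as claimed.

```python
import random
random.seed(0)

p, K, N = 0.3, 1000, 500           # success probability, block size, blocks
D = p * (1 - p)

# xi'_n = (DK)^{-1/2} * (sum of the K Bernoulli outcomes in block n);
# each block sum is Binomial(K, p), so Var(xi'_n) = K*p*(1-p)/(D*K) = 1.
blocks = [sum(random.random() < p for _ in range(K)) / (D * K) ** 0.5
          for _ in range(N)]

mean = sum(blocks) / N
var = sum((x - mean) ** 2 for x in blocks) / (N - 1)
```

By the central limit theorem, each ξ′n is approximately normal for large K, with variance 1 by construction (the mean is p·(K/D)^{1/2}, which is why the talk works with expectations rather than centered variables).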
Some properties
Relation between Gaussian and Bernoulli Two-armedBandits
1 In application to data processing, the Gaussian two-armed bandit is a particular case of the Bernoulli two-armed bandit that allows data to be processed in parallel. The data should be partitioned into a number of groups; data in the same group are then processed in parallel by the same method.
2 For example, given 30 000 items of data, the data can be partitioned into 30 groups, each containing 1000 items. They can then be processed in 30 stages, 1000 items at each stage.
3 If the number of stages is large enough (e.g., 30 stages or more), then the maximal losses for parallel data processing are almost the same as if the data were processed optimally one by one!
4 So, the Bernoulli and Gaussian two-armed bandits are equivalent as N → ∞.
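The batching scheme above can be sketched as follows. The stage-wise greedy rule and the parameter values are my own illustrative assumptions (the talk's optimal strategy is derived later); the point here is only the structure: one arm per stage, applied to a whole group of K items.

```python
import random
random.seed(1)

def parallel_process(p, K, N):
    # Process T = N*K items in N stages; at each stage send all K items
    # of the current group to one method (arm), chosen greedily from the
    # empirical success rates accumulated so far (each arm tried once first).
    pulls = [0, 0]        # stages assigned to each method
    wins = [0, 0]         # successfully processed items per method
    for _ in range(N):
        if 0 in pulls:
            arm = pulls.index(0)            # try each method at least once
        else:
            arm = max((0, 1), key=lambda a: wins[a] / pulls[a])
        successes = sum(random.random() < p[arm] for _ in range(K))
        pulls[arm] += 1
        wins[arm] += successes
    return pulls, wins

pulls, wins = parallel_process([0.6, 0.4], K=1000, N=30)
```

With groups of 1000 items, a single stage already separates the two success rates sharply, which is why a moderate number of stages (item 3) loses almost nothing against item-by-item processing.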
Formal Setup
Formal Setup of the Problem
Formally, incomes are considered as a controlled random process ξ1, ξ2, . . . , ξN whose values depend only on the currently chosen arms (in the sequel called actions) y1, y2, . . . , yN and are normally distributed with unit variance and mathematical expectation mℓ if the ℓ-th action is chosen:
$$f(x \mid m_\ell) = (2\pi)^{-1/2} \exp\{-(x - m_\ell)^2/2\}.$$
This process is completely described by the vector parameter θ = (m1, m2). A control strategy σ prescribes the choice of the actions yn, n = 1, . . . , N, and may depend on the complete history of the process. In fact, it is sufficient to know four current values: n1, n2, the numbers of choices of both actions, and X1, X2, the total incomes for both actions. The loss function is defined as follows:
$$L_N(\sigma, \theta) = N(m_1 \vee m_2) - \mathbf{E}_{\sigma,\theta}\left(\sum_{n=1}^{N} \xi_n\right).$$
Minimax and Bayesian Settings
Minimax and Bayesian Settings
Assume that the set of parameters is Θ = {(m1, m2) : |m1 − m2| ≤ 2c1, |m1 + m2| ≤ 2c2}, with 0 < c1 < ∞, 0 < c2 < ∞ and c2 large enough. The minimax risk is defined as
$$R_N^M(\Theta) = \inf_{\{\sigma\}} \sup_{\Theta} L_N(\sigma, \theta);$$
the corresponding strategy σM is called the minimax strategy.
Consider a prior distribution λ(m1, m2) on Θ. The Bayesian risk is defined as
$$R_N^B(\lambda) = \inf_{\{\sigma\}} \int_{\Theta} L_N(\sigma, \theta)\,\lambda(\theta)\,d\theta;$$
the corresponding strategy σB is called the Bayesian strategy.
Some Properties of Approaches
Some Properties of Approaches
Robustness of Minimax Approach
If σM is applied then the following inequality holds
$$L_N(\sigma^M, \theta) \le R_N^M(\Theta), \quad \forall\,\theta \in \Theta.$$
A Simple Recursive Algorithm for Bayesian Approach
There is a well-known recursive algorithm for the numerical determination of the Bayesian risk and the Bayesian strategy for any prior distribution.
Main Theorem of the Theory of Games
Under mild conditions, the minimax risk is equal to the Bayesian risk corresponding to the worst-case prior distribution, i.e.,
$$R_N^M(\Theta) = \sup_{\{\lambda\}} R_N^B(\lambda) = R_N^B(\lambda_0),$$
and the minimax strategy is equal to the corresponding Bayesian strategy as well. In the sequel, the minimax risk is sought as the Bayesian risk calculated with respect to the worst-case prior.
The Worst-Case Prior Distribution
Asymptotically the Worst-Case Prior Distribution
Asymptotically, the worst-case prior is uniform along the main diagonal and symmetric across it.
Calculations suggest that it concentrates on two parallel lines; the distance between them is the only unknown parameter.
Invariant Equation – 1
Change of Variables
Recall that n1, n2 denote the total numbers of choices of the two actions, X1, X2 are the corresponding total incomes, and m1, m2 are the mathematical expectations. We assume that actions are applied to groups of size M; this allows parallel processing. Let's denote
$$u = \frac{X_1 n_2 - X_2 n_1}{n N^{1/2}}, \quad t_1 = \frac{n_1}{N}, \quad t_2 = \frac{n_2}{N}, \quad \varepsilon = \frac{M}{N}, \quad w = (m_1 - m_2) N^{1/2},$$
where n = n1 + n2, and let ϱ(w) characterize a symmetric uniform prior distribution.
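The change of variables is a direct translation to code. In the sketch below (my own), I assume n = n1 + n2 is the total number of pulls, which the slide leaves implicit; the sample values are illustrative.

```python
def normalized_state(X1, X2, n1, n2, M, N):
    # Change of variables from raw sufficient statistics (X1, X2, n1, n2)
    # to the invariant coordinates (u, t1, t2, eps) of the recursion.
    n = n1 + n2                               # assumed: total number of pulls
    u = (X1 * n2 - X2 * n1) / (n * N ** 0.5)  # normalized income difference
    return u, n1 / N, n2 / N, M / N

# Symmetric data: equal incomes and equal pull counts give u = 0.
u, t1, t2, eps = normalized_state(X1=5.0, X2=5.0, n1=10, n2=10, M=1, N=100)
```

Note that u vanishes exactly when the per-pull empirical means X1/n1 and X2/n2 coincide, so u measures the (suitably scaled) evidence in favour of one arm.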
Invariant Equation – 2
Invariant Recursive Equation with Unit Time Horizon
To determine the Bayesian risk, one should solve a Bellman-type recursive equation:
$$r_\varepsilon(u, t_1, t_2) = \min_{\ell=1,2} r_\varepsilon^{(\ell)}(u, t_1, t_2),$$
where $r_\varepsilon^{(1)}(u, t_1, t_2) = r_\varepsilon^{(2)}(u, t_1, t_2) = 0$ for $t_1 + t_2 = 1$, and
$$r_\varepsilon^{(1)}(u, t_1, t_2) = \varepsilon g^{(1)}(u, t_1, t_2) + r_\varepsilon(u, t_1 + \varepsilon, t_2) * f_{\varepsilon t_2^2 t^{-1}(t+\varepsilon)^{-1}}(u),$$
$$r_\varepsilon^{(2)}(u, t_1, t_2) = \varepsilon g^{(2)}(u, t_1, t_2) + r_\varepsilon(u, t_1, t_2 + \varepsilon) * f_{\varepsilon t_1^2 t^{-1}(t+\varepsilon)^{-1}}(u),$$
for $t_1 + t_2 < 1$, $t_1 \ge \varepsilon$ and $t_2 \ge \varepsilon$, with $t = t_1 + t_2$ and $*$ denoting convolution in $u$. Here
$$g^{(\ell)}(u, t_1, t_2) = \int_0^{\infty} 2w\, g(u, (-1)^{\ell+1} w, t_1, t_2)\, \varrho(w)\, dw, \quad \ell = 1, 2,$$
$$g(u, w, t_1, t_2) = \exp(-2uw - 2w^2 t_1 t_2 t^{-1}),$$
$$f_\varepsilon(x) = (2\pi\varepsilon)^{-1/2} \exp(-x^2/(2\varepsilon)).$$
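One numerical building block of this recursion is the convolution of a tabulated function of u with the Gaussian kernel f with small variance. The sketch below is my own grid implementation of that single step (it assumes the subscript of f is the kernel variance, consistent with the definition of f above); the grid step and variance are illustrative.

```python
import numpy as np

def gauss_smooth(values, du, v):
    # Convolve a function tabulated on a uniform u-grid (step du) with
    # the Gaussian density f_v(x) = (2*pi*v)^{-1/2} * exp(-x^2 / (2*v)).
    half = int(np.ceil(6 * np.sqrt(v) / du))   # truncate kernel at +-6 sigma
    x = np.arange(-half, half + 1) * du
    kernel = np.exp(-x ** 2 / (2 * v)) / np.sqrt(2 * np.pi * v)
    kernel *= du                               # quadrature weights
    kernel /= kernel.sum()                     # renormalize on the grid
    return np.convolve(values, kernel, mode="same")
```

Smoothing a constant function leaves it unchanged away from the grid boundary, which is a convenient sanity check before plugging such a step into the backward recursion in t.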
Passage to the Limit
Passage to the Limit
Let ε → 0. Then for all u and all t1, t2 for which the solution of the equation is well defined, there exists the limit
$$r(u, t_1, t_2) = \lim_{\varepsilon \to +0} r_\varepsilon(u, t_1, t_2).$$
Under some additional conditions, r(u, t1, t2) satisfies the second-order partial differential equation
$$\min_{\ell=1,2}\left(\frac{\partial r}{\partial t_\ell} + \frac{t_{\bar\ell}^2}{2t^2}\,\frac{\partial^2 r}{\partial u^2} + g^{(\ell)}(u, t_1, t_2)\right) = 0,$$
$\bar\ell = 3 - \ell$, $\ell = 1, 2$, with initial condition
$$\lim_{t_1 + t_2 \to 1} r(u, t_1, t_2) = 0 \quad \text{for } t_1 > 0,\ t_2 > 0,$$
and boundary conditions
$$\lim_{u \to +\infty} r(u, t_1, t_2) = \lim_{u \to -\infty} r(u, t_1, t_2) = 0.$$
Risk and Strategy
Risk and Strategy
Minimax Risk
The limiting minimax risk is equal to the limiting Bayesian risk corresponding to the worst-case prior distribution and is calculated as
$$\lim_{N \to \infty} N^{-1/2} R_N^M(\Theta) = \sup_{\varrho}\, r(\varrho;\, u, t_1, t_2)\Big|_{u=0,\, t_1=0,\, t_2=0}.$$
Optimal Strategy
The ℓ-th action is currently optimal if
$$\frac{\partial r}{\partial t_\ell} + \frac{t_{\bar\ell}^2}{2t^2}\,\frac{\partial^2 r}{\partial u^2} + g^{(\ell)}(u, t_1, t_2)$$
has the smaller value (ℓ = 1, 2).
Numerical Experiments
Numerical Experiments
Calculations were done under the assumption that the worst-case prior distribution ϱ(w) is concentrated at the two points w = ±d with 0.5 ≤ d ≤ 2.5. The maximal value of r(ϱ; 0, 0, 0) was determined as
$$\max_d\, r(\varrho;\, 0, 0, 0) \approx 0.637,$$
attained at d ≈ 1.57.
Some References - 1
Russian References
1 Tsetlin, M.L., Issledovaniya po teorii avtomatov i modelirovaniyu biologicheskikh sistem (Studies in Automata Theory and Modeling of Biological Systems), Moscow: Nauka, 1969.
2 Varshavskii, V.I., Kollektivnoe povedenie avtomatov (CollectiveBehavior of Automata), Moscow: Nauka, 1973.
3 Sragovich, V.G., Adaptivnoe upravlenie (Adaptive Control), Moscow:Nauka, 1981.
4 Nazin, A.V. and Poznyak, A.S., Adaptivnyi vybor variantov(Adaptive Choice between Alternatives), Moscow: Nauka, 1986.
5 Presman, E.L. and Sonin, I.M., Posledovatel'noe upravlenie po nepolnym dannym (Sequential Control with Incomplete Data), Moscow: Nauka, 1982.
Some References - 2
English References
1 Berry, D.A. and Fristedt, B. Bandit Problems: Sequential Allocation of Experiments. Chapman and Hall, London, New York, 1985.
2 Lai, T.L. and Robbins, H. Asymptotically Efficient Adaptive Allocation Rules. Advances in Applied Mathematics, 1985, V. 6, P. 4–22.
3 Robbins, H. Some aspects of the sequential design of experiments. Bulletin of Amer. Math. Soc., 1952, V. 58, P. 527–535.
4 Vogel, W. An asymptotic minimax theorem for the two-armed bandit problem. Ann. Math. Stat., 1961, V. 31, P. 444–451.
5 Fabius, J. and van Zwet, W.R. Some remarks on the two-armed bandit. Ann. Math. Stat., 1970, V. 41, P. 1906–1916.
Some References - 3
Previous Publications
1 Kolnogorov, A.V. Finding Minimax Strategy and Minimax Risk in a Random Environment (the Two-Armed Bandit Problem) // Automation and Remote Control, 2011, Vol. 72, No. 5, pp. 1017–1027.
2 Kolnogorov, A.V. On a Limiting Description of Robust Parallel Control in a Random Environment // Automation and Remote Control, 2015, Vol. 76, No. 7, pp. 1229–1241.
Thank you
Thank you for your attention!