Adaptive Sampling Controlled Stochastic Recursions

Transcript of Adaptive Sampling Controlled Stochastic Recursions

Page 1: Adaptive Sampling Controlled Stochastic Recursions

Adaptive Sampling Controlled Stochastic Recursions

Raghu Pasupathy [email protected], Purdue Statistics, West Lafayette, IN

Co-authors: Soumyadip Ghosh (IBM Watson Research); Fatemeh Hashemi (Virginia Tech); Peter Glynn (Stanford University).

January 7, 2016

Page 4: Adaptive Sampling Controlled Stochastic Recursions

THE TALK THAT DID NOT MAKE IT ... !

1. An Overview of Stochastic Approximation and Sample-Average Approximation Methods

2. Some References:
2.1 A Guide to SAA [Kim et al., 2014]
2.2 Lectures on Stochastic Programming: Modeling and Theory [Shapiro et al., 2009]
2.3 Simulation Optimization: A Concise Overview and Implementation Guide [Pasupathy and Ghosh, 2013]
2.4 Introduction to Stochastic Search and Optimization [Spall, 2003]

Page 5: Adaptive Sampling Controlled Stochastic Recursions

THE TALK THAT MADE IT ... ADAPTIVE SAMPLING CONTROLLED STOCHASTIC RECURSIONS

1. Problem Statement
2. Canonical Rates in Simulation Optimization
3. Stochastic Approximation
4. Adaptive Sampling Controlled Stochastic Recursion (ASCSR)
5. The Optimality of ASCSR
6. Sample Numerical Experience
7. Final Remarks

Page 7: Adaptive Sampling Controlled Stochastic Recursions

PROBLEM CONTEXT: SIMULATION OPTIMIZATION

“Solve an optimization problem when only ‘noisy’ observations of the objective functions/constraints are available.”

minimize f(x)
subject to g(x) ≤ 0, x ∈ D;

– f : D → IR (and its derivative) can only be estimated, e.g., by F_m(x) = m^{-1} ∑_{i=1}^m F_i(x), where F_i(x) are iid random variables with mean f(x);
– g : D → IR^c can only be estimated using G_m(x) = m^{-1} ∑_{i=1}^m G_i(x), where G_i(x) are iid random vectors with mean g(x);
– unbiased observations of the derivative of f may or may not be available.

Page 8: Adaptive Sampling Controlled Stochastic Recursions

PROBLEM CONTEXT: STOCHASTIC ROOT FINDING

“Find a zero of a function when only ‘noisy’ observations of the function are available.”

find x such that f(x) = 0, x ∈ D;

where
– f : D → IR^c can only be estimated using F_m(x) = m^{-1} ∑_{i=1}^m F_i(x), where F_i(x) are iid random vectors with mean f(x).

Page 11: Adaptive Sampling Controlled Stochastic Recursions

“STOCHASTIC COMPLEXITY,” CANONICAL RATES

Examples:

(i) ξ = E[X] and ξ(m) = m^{-1} ∑_{i=1}^m X_i, where X_i, i = 1, 2, ... are iid copies of X. Then, when E[X²] < ∞,
rmse(ξ(m), ξ) = O(m^{-1/2}).

(ii) ξ = g′(x) and ξ(m) = (Y_m(x + s) − Y_m(x − s)) / (2s), where g(·) : IR → IR and Y_i(x), i = 1, 2, ... are iid copies of Y(x) satisfying E[Y(x)] = g(x). Then, when s = Θ(m^{-1/6}),
rmse(ξ(m), ξ) = O(m^{-1/3}).

For forward differences, with s = Θ(m^{-1/4}),
rmse(ξ(m), ξ) = O(m^{-1/4}).
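
To make these rates concrete, here is a minimal Python sketch (not from the talk) that estimates g′(x) for the toy choice g(x) = x⁴ from noisy observations, using the central-difference step s = Θ(m^{-1/6}) and the forward-difference step s = Θ(m^{-1/4}) quoted above; the printed RMSEs should shrink roughly like m^{-1/3} and m^{-1/4}, respectively.

```python
import numpy as np

rng = np.random.default_rng(0)

def noisy_g(x, m):
    """m iid noisy observations Y_i(x) of the toy function g(x) = x**4."""
    return x**4 + rng.normal(0.0, 1.0, size=m)

def central_diff(x, m):
    """Central-difference gradient estimator with s = m**(-1/6)."""
    s = m ** (-1.0 / 6.0)
    return (noisy_g(x + s, m).mean() - noisy_g(x - s, m).mean()) / (2.0 * s)

def forward_diff(x, m):
    """Forward-difference gradient estimator with s = m**(-1/4)."""
    s = m ** (-1.0 / 4.0)
    return (noisy_g(x + s, m).mean() - noisy_g(x, m).mean()) / s

def rmse(estimator, x, m, reps=500):
    true = 4.0 * x**3                      # g'(x) for g(x) = x**4
    errs = np.array([estimator(x, m) - true for _ in range(reps)])
    return np.sqrt(np.mean(errs**2))

for m in (10**2, 10**3, 10**4):
    # second printed value (central) ~ m**(-1/3); third (forward) ~ m**(-1/4)
    print(m, rmse(central_diff, 1.0, m), rmse(forward_diff, 1.0, m))
```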

Page 12: Adaptive Sampling Controlled Stochastic Recursions

“STOCHASTIC COMPLEXITY,” CANONICAL RATES

Examples: ... contd.

(iii) Owing to (i), SO and SRFP algorithms “declare victory” if the error ‖X_k − x∗‖ in their solution estimator X_k decays as O_p(1/√W_k), where W_k is the total simulation effort expended towards obtaining X_k.

Page 13: Adaptive Sampling Controlled Stochastic Recursions

“STOCHASTIC COMPLEXITY,” CANONICAL RATES

But I hasten to add...

– There is now a well-understood relationship between smoothness and complexity in convex problems, primarily due to the work of Alexander Shapiro, Arkadi Nemirovskii, and Yuri Nesterov; see Bubeck (2014) for a beautiful monograph.

– Is there an analogous theory to be developed based on the assumed structural property of the sample paths?

Page 16: Adaptive Sampling Controlled Stochastic Recursions

STOCHASTIC APPROXIMATION (SA)

Robbins and Monro (1951):
X_{k+1} = X_k − a_k H(X_k),
where H(x) estimates h(x) := ∇f(x).

Kiefer-Wolfowitz (1952) analogue for optimization:
X_{k+1} = X_k − a_k (F(X_k + s_k) − F(X_k)) / s_k,
where F(x) estimates f(x).

Modern incarnations:
X_{k+1} = Π_D[X_k − a_k B_k^{-1} H(X_k)], (RM);
X_{k+1} = Π_D[X_k − a_k B_k^{-1} ∇F(X_k)], (KW);
where D is the feasible space and Π_D[x] denotes projection onto D.
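
A minimal sketch of the two projected recursions above, under assumptions that are not in the talk: a toy objective f(x) = 0.5·‖x‖², a box feasible region D, B_k = I, step sizes a_k = a/k, and forward differences with s_k = s₀/k^{1/4} for the Kiefer-Wolfowitz variant.

```python
import numpy as np

rng = np.random.default_rng(1)

def project(x, lo=-5.0, hi=5.0):
    """Projection Pi_D onto the box D = [lo, hi]^d (a simple stand-in)."""
    return np.clip(x, lo, hi)

def noisy_grad(x):
    """Unbiased gradient observation H(x) for f(x) = 0.5*||x||^2."""
    return x + rng.normal(0.0, 1.0, size=x.shape)

def noisy_f(x):
    """Noisy function observation F(x) for f(x) = 0.5*||x||^2."""
    return 0.5 * float(x @ x) + rng.normal(0.0, 1.0)

def robbins_monro(x0, n_iters=5000, a=1.0):
    """Projected Robbins-Monro: X_{k+1} = Pi_D[X_k - a_k H(X_k)], a_k = a/k, B_k = I."""
    x = np.array(x0, dtype=float)
    for k in range(1, n_iters + 1):
        x = project(x - (a / k) * noisy_grad(x))
    return x

def kiefer_wolfowitz(x0, n_iters=5000, a=1.0, s0=1.0):
    """Projected Kiefer-Wolfowitz with forward differences and s_k = s0 / k**0.25."""
    x = np.array(x0, dtype=float)
    for k in range(1, n_iters + 1):
        s = s0 / k**0.25
        grad_est = np.array([(noisy_f(x + s * e) - noisy_f(x)) / s for e in np.eye(x.size)])
        x = project(x - (a / k) * grad_est)
    return x

print(robbins_monro([3.0, -2.0]))     # both should end up near the minimizer [0, 0]
print(kiefer_wolfowitz([3.0, -2.0]))
```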

Page 17: Adaptive Sampling Controlled Stochastic Recursions

STOCHASTIC APPROXIMATION: SA IS UBIQUITOUS

1. SA is probably amongst the most used algorithms. (Typing “Stochastic Approximation” into Google Scholar brings up about 1.77 million hits!)

2. SA is backed by more than six decades of research.

3. An enormous number of variations of SA have been created and studied.

4. SA is used in virtually every field where there is a need for stochastic optimization (Pasupathy (2014)).

Page 18: Adaptive Sampling Controlled Stochastic Recursions

SA: ASYMPTOTICS

1. Convergence (L2, wp1) guaranteed assuming
C.1 structural conditions on f, g;
C.2 ∑_{k=1}^∞ a_k = ∞;
C.3 ∑_{k=1}^∞ a_k² < ∞ for Robbins-Monro, and ∑_{k=1}^∞ a_k²/s_k² < ∞ with s_k → 0 for Kiefer-Wolfowitz.
(C.3 can be weakened to a_k → 0 [Broadie et al., 2011].)

2. The canonical rate of O_p(1/√k) is achievable for Robbins-Monro [Fabian, 1968, Polyak and Juditsky, 1992, Ruppert, 1985, Ruppert, 1991].

3. Deterioration in the Kiefer-Wolfowitz context [Mokkadem and Pelletier, 2011, Djeddour et al., 2008]. (Loosely, when ρ/v(s_k) is the deterministic bias of the recursion, the best achievable rate is Θ(1/√(k s_k²)), achieved when s_k is chosen so that v(s_k)^{-1} k c_k² has a nonzero limit.)

Page 19: Adaptive Sampling Controlled Stochastic Recursions

WHY AN ALTERNATIVE PARADIGM?

1. SA’s parameters are still difficult to choose.
– Conditions C.2 and C.3 leave enormous classes of feasible parameter sequences from which to choose. (See Broadie et al. [Broadie et al., 2011] and Vaidya and Bhatnagar [Vaidya and Bhatnagar, 2006] for further detail.)
– Nemirovski, Juditsky, Lan and Shapiro [Nemirovski et al., 2009] demonstrate that there can be a severe degradation in the convergence rate of SA-type methods if the parameters inherent to the function are guessed incorrectly.

2. Shouldn’t advances in nonlinear programming be exploited more fully?

3. SA does not lend itself to trivial parallelization.

Page 21: Adaptive Sampling Controlled Stochastic Recursions

SAMPLING-CONTROLLED STOCHASTIC RECURSION (SCSR): AN ALTERNATIVE TO SA?

Instead of SA, why not just employ your favorite deterministic recursion (e.g., quasi-Newton, trust region), and replace unknown quantities in the recursion by Monte Carlo estimators?

– Use a recursion (such as line search) as the underlying search mechanism;
– Sample judiciously.

Pages 22-27: Adaptive Sampling Controlled Stochastic Recursions

ADAPTIVE SCSR: LINE SEARCH

[Figure sequence illustrating adaptive SCSR with a line-search recursion; the figures are not reproduced in the transcript.]

Pages 28-31: Adaptive Sampling Controlled Stochastic Recursions

ADAPTIVE SCSR: GRADIENT SEARCH

[Figure sequence illustrating adaptive SCSR with a gradient-search recursion; the figures are not reproduced in the transcript.]

Page 33: Adaptive Sampling Controlled Stochastic Recursions

SAMPLING-CONTROLLED STOCHASTIC RECURSION (SCSR): AN ALTERNATIVE TO SA?

X_{k+1} = X_k + H_k(X_k, M_k, k), k = 1, 2, ... (SCSR)
x_{k+1} = x_k + h_k(x_k, k), k = 1, 2, ... (DA)

1. How should the sample size M_k be chosen (adaptively) to ensure convergence wp1 of the iterates X_k?

2. Can the canonical rate be achieved in such “practical” algorithms?

Page 35: Adaptive Sampling Controlled Stochastic Recursions

SAMPLING-CONTROLLED STOCHASTIC RECURSION (SCSR): AN ALTERNATIVE TO SA?

X_{k+1} = X_k + H_k(X_k, M_k, k), k = 1, 2, ... (SCSR)
x_{k+1} = x_k + h_k(x_k, k), k = 1, 2, ... (DA)

1. Some theory on non-adaptive “optimal sampling rates” has been developed recently [Pasupathy et al., 2014].

2. Virtually all recursions in [Ortega and Rheinboldt, 1970] and in [Duflo and Wilson, 1997] are subsumed.

3. Trust-region [Conn et al., 2000] and DFO-type recursions [Conn et al., 2009] are subsumed with effort!

4. Two prominent “realizations” of SCSR-type algorithms are [Byrd et al., 2012] and [Chang et al., 2013].
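
A minimal non-adaptive SCSR sketch under assumed choices that are not in the talk: the deterministic recursion is a plain gradient step for the toy objective f(x) = 0.5·‖x‖², the unknown gradient h(x) = x is replaced by a sample mean of noisy observations, and the sample-size schedule M_k grows geometrically (one of the non-adaptive schedules studied in [Pasupathy et al., 2014]).

```python
import numpy as np

rng = np.random.default_rng(2)

def H(x, m):
    """Monte Carlo estimator H_k(x, m): sample mean of m iid noisy observations
    of h(x) = grad f(x) = x for the toy objective f(x) = 0.5*||x||^2."""
    return (x + rng.normal(0.0, 1.0, size=(m, x.size))).mean(axis=0)

def scsr(x0, n_iters=25, m0=2, growth=1.5, step=0.5):
    """Non-adaptive SCSR: X_{k+1} = X_k - step * H(X_k, M_k), with the
    geometric sample-size schedule M_k = ceil(m0 * growth**k)."""
    x = np.array(x0, dtype=float)
    total_work = 0
    for k in range(n_iters):
        m = int(np.ceil(m0 * growth**k))
        total_work += m
        x = x - step * H(x, m)
    return x, total_work

x_final, work = scsr([3.0, -2.0])
print(x_final, work)    # x_final should be close to the root/minimizer [0, 0]
```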

Page 38: Adaptive Sampling Controlled Stochastic Recursions

ADAPTIVE SCSR: THE GUIDING PRINCIPLE FOR OPTIMAL SAMPLING

Write
X_{k+1} = X_k + H_k(X_k, M_k, k), k = 1, 2, ... (SCSR)
as
X_{k+1} − x∗ = (X_k + h_k(X_k, k) − x∗) + (H_k(X_k, M_k, k) − h_k(X_k, k)),
where the first term is the structural error and the second is the sampling error.

(i) Sample so that ‖H_k(X_k, M_k) − h_k(X_k)‖ ≈ ‖X_k + h_k(X_k, k) − x∗‖ in some sense, for optimal evolution;
(ii) a fast structural recursion combined with (i) ensures efficiency, a fact that is not immediately evident.

Page 41: Adaptive Sampling Controlled Stochastic Recursions

ADAPTIVE SCSR: SAMPLE SIZE DETERMINATION

How much to sample? Sample until structural error estimate ≈ sampling error estimate?

M_k | F_k = inf{ m ≥ ν(k) : m^ε se(H_k(X_k, m)) < c ‖H_k(X_k, m)‖ } | F_k,

which is usually

M_k | F_k = inf{ m ≥ ν(k) : m^ε σ(X_k, m)/√m < c ‖H_k(X_k, m)‖ } | F_k.

1. ν(k) → ∞ is the “escorting sequence,” and ε is the “coercion” constant.

2. The constants c, β > 0.
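
A minimal Python sketch of the sample-size rule above, under assumptions not stated on the slide: the observations of h(X_k) are iid Gaussian here, the standard error of the vector estimate is aggregated through the norm of the componentwise standard deviations, and the sample is doubled between checks rather than grown one observation at a time.

```python
import numpy as np

def adaptive_sample_size(sampler, nu_k, c=1.0, eps=0.05, m_max=10**6):
    """A-SCSR sample-size rule (sketch): keep sampling until
    m**eps * (estimated standard error) < c * ||sample mean||, starting from m = nu_k.
    `sampler(m)` is assumed to return an (m, d) array of iid observations of h(X_k)."""
    obs = sampler(nu_k)
    while obs.shape[0] < m_max:
        m = obs.shape[0]
        mean = obs.mean(axis=0)
        se = np.linalg.norm(obs.std(axis=0, ddof=1)) / np.sqrt(m)   # one way to aggregate
        if m**eps * se < c * np.linalg.norm(mean):
            break
        obs = np.vstack([obs, sampler(m)])    # practical shortcut: double the sample
    return obs.mean(axis=0), obs.shape[0]

# usage on a toy problem: h(X_k) = X_k, observations X_k + N(0, I)
rng = np.random.default_rng(3)
x_k = np.array([0.05, -0.03])
H_k, M_k = adaptive_sample_size(lambda m: x_k + rng.normal(0.0, 1.0, size=(m, 2)), nu_k=10)
print(H_k, M_k)   # M_k grows as X_k approaches the root, roughly like ||h(X_k)||**(-eta)
```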

Page 42: Adaptive Sampling Controlled Stochastic Recursions

ADAPTIVE SCSR: HEURISTIC INTERPRETATION I (BYRD, CHIN, NOCEDAL AND WU, 2012)

1. At X_k, d = H_k(X_k, M_k) is a descent direction at X_k if ‖H_k(X_k, m) − h_k(X_k)‖_2 ≤ c ‖H_k(X_k, m)‖_2 for some c ∈ [0, 1).

2. Notice:
E[‖H_k(X_k, m) − h_k(X_k)‖_2² | F_k] = V(H_k(X_k, m) | F_k).

The above two points inspire the heuristic

M_k | F_k = inf{ m : √V(H_k(X_k, m) | F_k) ≤ c ‖H_k(X_k, m)‖_2 } | F_k. (1)

(Sample until the estimated error in the gradient is less than c times the gradient estimate, i.e., until you are confident you have a descent direction.)
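
A short sketch in the spirit of this test: given a batch of per-sample gradient estimates, accept the averaged gradient once its estimated variance is at most (c · ‖averaged gradient‖)², and otherwise grow the batch. The toy gradient, noise model, and geometric batch growth are illustrative assumptions, not details from [Byrd et al., 2012].

```python
import numpy as np

def norm_test(grad_samples, c=0.5):
    """Check whether the estimated variance of the averaged gradient is at most
    (c * ||averaged gradient||)**2, using the batch itself to estimate the variance."""
    m = grad_samples.shape[0]
    g_bar = grad_samples.mean(axis=0)
    var_of_mean = grad_samples.var(axis=0, ddof=1).sum() / m
    return var_of_mean <= (c * np.linalg.norm(g_bar))**2

# usage: grow the batch geometrically until the test passes
rng = np.random.default_rng(4)
true_grad = np.array([0.2, -0.1])
m = 8
while True:
    batch = true_grad + rng.normal(0.0, 1.0, size=(m, 2))
    if norm_test(batch):
        break
    m *= 2
print("batch size passing the test:", m)
```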

Page 43: Adaptive Sampling Controlled Stochastic Recursions

ADAPTIVE SCSR: HEURISTIC INTERPRETATION II (PASUPATHY AND SCHMEISER, 2010)

1. The coefficient of variation of H_k(X_k, m) | F_k can be estimated as
cv(H_k(X_k, m) | F_k) = √V(H_k(X_k, m) | F_k) / H_k(X_k, m).

2. A “reasonable” heuristic is then to continue sampling until the absolute value of the estimated coefficient of variation drops below the fixed threshold c.

Page 44: Adaptive Sampling Controlled Stochastic Recursions

ADAPTIVE SCSR THEORETICAL RESULTS: STANDING ASSUMPTIONS AND NOTATION

A.1 There exists a unique root x∗ such that h(x∗) = 0.

A.2 There exist ℓ_0, ℓ_1 such that for all x ∈ D, ℓ_0 ‖x − x∗‖_2² ≤ h^T(x) h(x) ≤ ℓ_1 ‖x − x∗‖_2².

A.3 H_k(X_k, m) := h(X_k) + m^{-1} ∑_{j=1}^m ξ_{kj}, where ξ_k is a martingale-difference process defined on the probability space (Ω, F, F_k, P), and ξ_{kj} are iid copies of ξ_k.

Page 47: Adaptive Sampling Controlled Stochastic Recursions

ADAPTIVE SCSR THEORETICAL RESULTS: SOME INTUITION ON ITERATION EVOLUTION

Letting Z_k = X_k − x∗, we see that

Z_{k+1} = Z_k + (1/β) h(X_k) + (1/β)(H(X_k, M_k) − h(X_k)), and

E_Ω[Z_{k+1}² | F_k] ≤ (1 − 2ℓ_0/β + ℓ_1²/β²) Z_k² + (1/β²) E_Ω[‖H(X_k, M_k) − h(X_k)‖² | F_k],

where the first term is the structural error and the second is the sampling error.

Recall the guiding principles:
(i) E_Ω[‖H(X_k, M_k) − h(X_k)‖² | F_k] ≈ h²(X_k) for optimal evolution;
(ii) a fast structural recursion with (i) for efficiency.

Page 49: Adaptive Sampling Controlled Stochastic Recursions

ADAPTIVE SCSR THEORETICAL RESULTS: CONSISTENCY

Theorem. Let the sequence ν_k satisfy ∑_k ν_k^{-1} < ∞. Then the A-SCSR iterates X_k satisfy X_k → x∗ almost surely as k → ∞.

Proof sketch.
E_Ω[‖H(X_k, M_k) − h(X_k)‖² | F_k]
≤ (1/ν_k) E_Ω[M_k ‖H(X_k, M_k) − h(X_k)‖² | F_k]
≤ (1/ν_k) E_Ω[sup_m ‖√m (H(X_k, m) − h(X_k))‖² | F_k]
= O(1/ν_k).

Page 50: Adaptive Sampling Controlled Stochastic Recursions

ADAPTIVE SCSR THEORETICAL RESULTS: QUALITY OF ESTIMATOR

Theorem. Let σ² = V(Y_1(x∗)) < ∞. Recalling that M_k | F_k = inf{ m ≥ ν(k) : m^ε σ(X_k, m)/√m < c ‖H(X_k, m)‖ } | F_k, we have, as k → ∞,

E[‖H(X_{k+1}, M_{k+1})‖² | F_k] / E[M_{k+1}^{−1+2ε} | F_k] → σ²/c² almost surely.

1. The proof relies on the fact that the conditional second moment of the excess is uniformly bounded away from infinity.

2. The theorem essentially connects the sampling error with the sequential sample size.

Page 51: Adaptive Sampling Controlled Stochastic Recursions

ADAPTIVE SCSR THEORETICAL RESULTS: BEHAVIOR OF SAMPLE SIZE

Theorem. Denote η = 2/(1 − 2ε). The following hold as k → ∞ and for some δ > 0.

(i) If x ≤ 4^{−η/2} (σ²/c²)^{η/2}, then P{ h^η(X_k) M_k ≤ x | F_k } ≤ exp{ −h^{−δ}(X_k) }.

(ii) If x ≥ 4^{η/2} (σ²/c²)^{η/2}, then P{ h^η(X_k) M_k ≥ x | F_k } ≤ exp{ −h^{−δ}(X_k) }.

(In English, M_k concentrates around h^{−η}(X_k).)

Outline Preliminaries Stochastic Approximation SCSR and ASCSR Final Remarks

ADAPTIVE SCSRTHEORETICAL RESULTS: BEHAVIOR OF SAMPLE SIZE

TheoremDenote η = 2/(1 − 2ǫ). Then following hold almost surely.

(i) lim infk hη(Xk)E[Mk|Fk] ≥ 4−η/2(σ2

c2 )η/2.

(ii) lim supk hη(Xk)E[Mk|Fk] ≤ 4η/2(σ2

c2 )η/2.

Page 54: Adaptive Sampling Controlled Stochastic Recursions

ADAPTIVE SCSR THEORETICAL RESULTS: BEHAVIOR OF SAMPLE SIZE

Theorem. Denote η = 2/(1 − 2ε). Then the following hold almost surely.

(i) lim inf_k h^η(X_k) E[M_k | F_k] ≥ 4^{−η/2} (σ²/c²)^{η/2}.

(ii) lim sup_k h^η(X_k) E[M_k | F_k] ≤ 4^{η/2} (σ²/c²)^{η/2}.

Theorem. Denote η = 2/(1 − 2ε). Then the following hold almost surely.

(i) lim inf_k h^{−2}(X_k) E[M_k^{−1+2ε} | F_k] ≥ 1/4.

(ii) lim sup_k h^{−2}(X_k) E[M_k^{−1+2ε} | F_k] ≤ 4.

(Loosely, E[M_k^{−1+2ε} | F_k] ≈ h²(X_k).)

Page 55: Adaptive Sampling Controlled Stochastic Recursions

ADAPTIVE SCSR THEORETICAL RESULTS: EFFICIENCY

Theorem. Let W_k = ∑_j M_j denote the total simulation effort after k iterations. Then,

(i) E[‖X_k − x∗‖² W_k^{1−2ε}] = O(1) as k → ∞;

(ii) if M_k = o_p(W_k), then W_k^{1−2ε} ‖X_k − x∗‖² →_p ∞.

1. The result says that the mean squared error E[‖X_k − x∗‖²] ≈ (E[W_k])^{−1}, coinciding with the estimation rate.

2. Sampling should be at least “geometric,” irrespective of error!
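
A crude, self-contained check of part (i) on a 1-D toy root-finding problem (h(x) = x, so x∗ = 0, with observations x + N(0, 1)); the constants ε, c, β, the escorting sequence 10 + k, the sample-doubling shortcut, and the cap m_cap are all illustrative assumptions, not values from the talk.

```python
import numpy as np

rng = np.random.default_rng(5)
EPS, C, BETA = 0.05, 1.0, 1.0    # coercion constant, threshold, and step scale (assumed)

def run_ascsr(x0=2.0, n_iters=12, m_cap=100_000):
    """1-D A-SCSR sketch: at each iteration, sample adaptively until
    m**EPS * s/sqrt(m) < C*|mean|, then step X <- X - (1/BETA)*mean."""
    x, work = x0, 0
    for k in range(1, n_iters + 1):
        obs = x + rng.normal(0.0, 1.0, size=10 + k)    # 10 + k: crude escorting sequence
        while obs.size < m_cap:
            m = obs.size
            if m**EPS * obs.std(ddof=1) / np.sqrt(m) < C * abs(obs.mean()):
                break                                  # sampling error in lock step with estimate
            obs = np.append(obs, x + rng.normal(0.0, 1.0, size=m))   # double the sample
        work += obs.size
        x = x - (1.0 / BETA) * obs.mean()
    return x, work

# W_k**(1 - 2*EPS) * (X_k - x*)**2 should stay bounded; here it is averaged over 100 runs
vals = []
for _ in range(100):
    x_final, w = run_ascsr()
    vals.append(w**(1 - 2 * EPS) * x_final**2)
print("average of W^(1-2*eps) * squared error over 100 runs:", np.mean(vals))
```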

Page 56: Adaptive Sampling Controlled Stochastic Recursions

ADAPTIVE SCSR: THE ESCORT SEQUENCE AND THE COERCION CONSTANT

Theorem. Let W_k = ∑_j M_j denote the total simulation effort after k iterations. Then P{ M_k = ν_k i.o. } = 0.

[Figure illustrating the roles of the solution x∗, the initial guess, the escort parameter ν_k, and the coercion constant; not reproduced in the transcript.]

Page 57: Adaptive Sampling Controlled Stochastic Recursions

NUMERICAL ILLUSTRATION

Aluffi-Pentini Function:
g(x) = E_ξ[0.25 (x_1 ξ)⁴ − 0.5 (x_1 ξ)² + 0.1 (x_1 ξ) + 0.5 x_2²], ξ ∼ N(1, 0.1).

Rosenbrock Function:
g(x) = E_ξ[100 (x_2 − (x_1 ξ)²)² + (x_1 ξ − 1)²], ξ ∼ N(1, 0.1).
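
Hedged Python versions of the two noisy test functions above; it is assumed here that N(1, 0.1) specifies the variance of ξ (set XI_STD = 0.1 instead if 0.1 is the standard deviation). The sample mean of the m returned observations estimates g(x).

```python
import numpy as np

rng = np.random.default_rng(6)
XI_STD = np.sqrt(0.1)    # assuming N(1, 0.1) gives the variance of xi

def aluffi_pentini_obs(x, m):
    """m noisy observations whose mean is the Aluffi-Pentini objective g(x)."""
    x1, x2 = x
    xi = rng.normal(1.0, XI_STD, size=m)
    return 0.25*(x1*xi)**4 - 0.5*(x1*xi)**2 + 0.1*(x1*xi) + 0.5*x2**2

def rosenbrock_obs(x, m):
    """m noisy observations whose mean is the noisy Rosenbrock objective g(x)."""
    x1, x2 = x
    xi = rng.normal(1.0, XI_STD, size=m)
    return 100.0*(x2 - (x1*xi)**2)**2 + (x1*xi - 1.0)**2

print(aluffi_pentini_obs(np.array([1.0, 0.5]), 5))
print(rosenbrock_obs(np.array([1.0, 1.0]), 1000).mean())   # estimates g([1, 1])
```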

Pages 58-60: Adaptive Sampling Controlled Stochastic Recursions

NUMERICAL ILLUSTRATION: SAMPLE SIZE BEHAVIOR

[Figures showing sample-size behavior and numerical results for the test problems; not reproduced in the transcript.]

Page 64: Adaptive Sampling Controlled Stochastic Recursions

SUMMARY AND FINAL REMARKS

1. Main insight for canonical rates: “Sample until the standard error estimate (of the object being estimated within the recursion) is in lock step with the estimate itself.” Some details, however, seem important.
– The escorting sequence ν_k is needed to bring iterates to the vicinity of the root.
– The coercion constant ε is needed, unfortunately, to make sure that the sampling error drops at the requisite rate.

2. Generalization to faster recursions will involve a correspondingly higher power of the object estimate.

3. Incorporation of biased estimators and of non-stationary recursions that include more than just the current point seems within reach.

Pages 65-69: Adaptive Sampling Controlled Stochastic Recursions

REFERENCES

Broadie, M., Cicek, D. M., and Zeevi, A. (2011). General bounds and finite-time improvement for the Kiefer-Wolfowitz stochastic approximation algorithm. Operations Research, 59(5):1211–1224.

Byrd, R. H., Chin, G. M., Nocedal, J., and Wu, Y. (2012). Sample size selection for optimization methods for machine learning. Mathematical Programming, Series B, 134:127–155.

Chang, K., Hong, J., and Wan, H. (2013). Stochastic trust-region response-surface method (STRONG): a new response-surface framework for simulation optimization. INFORMS Journal on Computing. To appear.

Conn, A. R., Gould, N. I. M., and Toint, P. L. (2000). Trust-Region Methods. SIAM, Philadelphia, PA.

Conn, A. R., Scheinberg, K., and Vicente, L. N. (2009). Introduction to Derivative-Free Optimization. SIAM, Philadelphia, PA.

Djeddour, K., Mokkadem, A., and Pelletier, M. (2008). On the recursive estimation of the location and of the size of the mode of a probability density. Serdica Mathematical Journal, 34:651–688.

Duflo, M. and Wilson, S. S. (1997). Random Iterative Models. Springer, New York, NY.

Fabian, V. (1968). On asymptotic normality in stochastic approximation. Annals of Mathematical Statistics, 39:1327–1332.

Kim, S., Pasupathy, R., and Henderson, S. G. (2014). A guide to SAA. Frederick Hillier OR Series. Elsevier.

Mokkadem, A. and Pelletier, M. (2011). A generalization of the averaging procedure: The use of two-time-scale algorithms. SIAM Journal on Control and Optimization, 49:1523.

Nemirovski, A., Juditsky, A., Lan, G., and Shapiro, A. (2009). Robust stochastic approximation approach to stochastic programming. SIAM Journal on Optimization, 19(4):1574–1609.

Ortega, J. M. and Rheinboldt, W. C. (1970). Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York, NY.

Pasupathy, R. and Ghosh, S. (2013). Simulation optimization: A concise overview and implementation guide. INFORMS TutORials. INFORMS.

Pasupathy, R., Glynn, P. W., Ghosh, S. G., and Hashemi, F. S. (2014). How much to sample in simulation-based stochastic recursions? Under review.

Polyak, B. T. and Juditsky, A. B. (1992). Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization, 30(4):838–855.

Ruppert, D. (1985). A Newton-Raphson version of the multivariate Robbins-Monro procedure. Annals of Statistics, 13:236–245.

Ruppert, D. (1991). Stochastic approximation. Handbook of Sequential Analysis, pages 503–529. Dekker, New York, NY.

Shapiro, A., Dentcheva, D., and Ruszczynski, A. (2009). Lectures on Stochastic Programming: Modeling and Theory. SIAM, Philadelphia, PA.

Spall, J. C. (2003). Introduction to Stochastic Search and Optimization. John Wiley & Sons, Inc., Hoboken, NJ.

Vaidya, R. and Bhatnagar, S. (2006). Robust optimization of random early detection. Telecommunication Systems, 33(4):291–316.

Page 70: Adaptive Sampling Controlled Stochastic Recursions

SCSR: HOW MUCH TO SAMPLE? THEORETICAL GUIDANCE

[Table, garbled in the transcript: achievable SCSR convergence rates as a function of the sampling schedule (Polynomial (λ_p, p), Geometric (c), Exponential (λ_t, t)) against the speed of the underlying deterministic recursion (Sublinear (λ_s, s), Linear (ℓ), Superlinear (λ_q, q)); the entries involve terms such as k^{−pα}, ℓ^k, and c^{−αk}. The rate-matching conditions visible on the slide are pα = 1 + s, ℓ = c^{−α}, and p = t.]