CHAPTER 6
STOCHASTIC APPROXIMATION AND THE FINITE-DIFFERENCE METHOD
• Organization of chapter in ISSO
  – Contrast of gradient-based and gradient-free algorithms
  – Motivating examples
  – Finite-difference algorithm
  – Convergence theory
  – Asymptotic normality
  – Selection of gain sequences
  – Numerical examples
  – Extensions and segue to SPSA in Chapter 7
Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall
6-2
Motivation for Algorithms Not Requiring Gradient of Loss Function
• Primary interest here is in optimization problems for which we cannot obtain direct measurements of the gradient $\partial L / \partial \theta$
  – Cannot use techniques such as Robbins-Monro SA, steepest descent, etc.
  – Can (in principle) use techniques such as Kiefer-Wolfowitz SA (Chapter 6), genetic algorithms (Chapters 9–10), …
• Many such “gradient-free” problems arise in practice
– Generic difficult parameter estimation
– Model-free feedback control
– Simulation-based optimization
– Experimental design: sensor configuration
6-3
Model-Free Control Setup (Example 6.2 in ISSO)
6-4
Finite Difference SA (FDSA) Method
• FDSA has standard "first-order" form of root-finding (Robbins-Monro) SA
  – Finite-difference approximation replaces direct gradient measurement (Chap. 5)
  – Resulting algorithm sometimes called Kiefer-Wolfowitz SA
• Let $\hat{g}_k(\hat{\theta}_k)$ denote FD estimate of $g(\theta)$ at kth iteration (next slide)
• Let $\hat{\theta}_k$ denote estimate for $\theta$ at kth iteration
• FDSA algorithm has form (see the sketch below)
$$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k \hat{g}_k(\hat{\theta}_k)$$
  where $a_k$ is a nonnegative gain value
• Under conditions, $\hat{\theta}_k \to \theta^*$ in stochastic sense (a.s.)
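A minimal Python sketch of the recursion above. The function names (`fdsa`, `grad_estimate`, `gain`) are illustrative assumptions, not from ISSO; the gradient estimator itself is defined on the next slide.

```python
import numpy as np

def fdsa(grad_estimate, theta0, gain, n_iterations):
    """FDSA recursion: theta_{k+1} = theta_k - a_k * g_hat_k(theta_k).

    grad_estimate(theta, k) returns the FD gradient estimate at iteration k;
    gain(k) returns the nonnegative gain value a_k.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(n_iterations):
        theta = theta - gain(k) * grad_estimate(theta, k)
    return theta
```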
6-5
Finite Difference Gradient Approximation
• Classical method for approximating gradients in Kiefer-Wolfowitz SA is by finite differences
• FD gradient approximation used in SA recursion as gradient measurement (previous slide)
• Standard two-sided gradient approximation at iteration k is (see the sketch below)
$$\hat{g}_k(\hat{\theta}_k) = \begin{bmatrix} \dfrac{y(\hat{\theta}_k + c_k \xi_1) - y(\hat{\theta}_k - c_k \xi_1)}{2c_k} \\ \vdots \\ \dfrac{y(\hat{\theta}_k + c_k \xi_p) - y(\hat{\theta}_k - c_k \xi_p)}{2c_k} \end{bmatrix}$$
  where $\xi_j$ is p-dimensional with 1 in jth entry, 0 elsewhere
• Each computation of FD approximation takes 2p measurements y(•)
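A direct Python sketch of the two-sided approximation above; `y` is the noisy loss measurement function, and the name `fd_gradient` is an assumption for illustration.

```python
import numpy as np

def fd_gradient(y, theta, c_k):
    """Two-sided finite-difference gradient estimate.

    Perturbs each coordinate j by +/- c_k (xi_j has 1 in the jth entry,
    0 elsewhere) and differences the noisy measurements y(.).
    Costs 2p measurements of y per call.
    """
    p = len(theta)
    g_hat = np.empty(p)
    for j in range(p):
        xi_j = np.zeros(p)
        xi_j[j] = 1.0
        g_hat[j] = (y(theta + c_k * xi_j) - y(theta - c_k * xi_j)) / (2.0 * c_k)
    return g_hat
```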
6-6
Shaded Triangle Shows Valid Coefficient Values $\alpha$ and $\gamma$ in Gain Sequences $a_k = a/(k+1+A)^\alpha$ and $c_k = c/(k+1)^\gamma$ (Sect. 6.5 of ISSO)
Solid line indicates non-strict border ($\ge$ or $\le$) and dashed line indicates strict border (>)
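The gain sequences above are easy to encode; `make_gains` is an illustrative helper, and the constants a, A, c, α, γ must be chosen inside the valid (α, γ) region of Sect. 6.5 (e.g., the asymptotically optimal decay rates α = 1, γ = 1/6 used in Example 6.5 below).

```python
def make_gains(a, A, alpha, c, gamma):
    """Standard FDSA gain sequences a_k = a/(k+1+A)^alpha, c_k = c/(k+1)^gamma."""
    def a_k(k):
        return a / (k + 1 + A) ** alpha
    def c_k(k):
        return c / (k + 1) ** gamma
    return a_k, c_k
```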
6-7
Example: Wastewater Treatment Problem (Example 6.5 in ISSO)
• Small-scale problem with p = 2
  – Aim is to optimize water cleanliness and methane gas byproduct
  – Evaluated algorithms with 50 realizations of N = 2000 measurements
• Used FDSA with gains $a_k = a/(1+k)$ and $c_k = 1/(1+k)^{1/6}$ (see the sketch after this list)
  – Asymptotically optimal decay rates found "best"
• Gain tuning chooses a; naïve gain sets a = 1
• Also compared with random search algorithm B from Chapter 2
• Algorithms use noisy loss measurements (same level as in Example 2.7 in ISSO)
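The wastewater loss function itself is not reproduced in these slides, so the sketch below wires the pieces together (reusing the `fd_gradient` and `make_gains` sketches above) on a stand-in noisy quadratic with p = 2; the decay rates match the slide, but the loss and the "naïve" choice a = 1 are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def y(theta):
    # Stand-in noisy loss measurement; NOT the wastewater model of Example 6.5
    return float(theta @ theta) + 0.1 * rng.standard_normal()

a_k, c_k = make_gains(a=1.0, A=0.0, alpha=1.0, c=1.0, gamma=1.0 / 6.0)

theta = np.array([1.0, 1.0])   # arbitrary initial condition, p = 2
for k in range(500):           # 500 iterations -> N = 2000 measurements (2p per iteration)
    theta = theta - a_k(k) * fd_gradient(y, theta, c_k(k))
print(theta)                   # should approach the minimizer (the origin here)
```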
6-8
Mean Values of $L(\hat{\theta}_k)$ with 95% Confidence Intervals

                          FDSA with "naïve" gains    FDSA with tuned gains
  N = 100 (25 iters.)     0.11 [0.087, 0.140]        0.083 [0.057, 0.108]
  N = 2000 (500 iters.)   0.023 [0.017, 0.028]       0.021 [0.016, 0.026]

Above numbers much lower than random search algorithm B: best value at N = 2000 is 0.38
Shows value of approximating gradient in FDSA
6-9
Example: Skewed-Quartic Loss Function (Examples 6.6 and 6.7 in ISSO)
• Larger-scale problem with p = 10:
$$L(\theta) = \theta^T B^T B \theta + 0.1 \sum_{i=1}^{p} (B\theta)_i^3 + 0.01 \sum_{i=1}^{p} (B\theta)_i^4$$
  where $(B\theta)_i$ is the ith component of $B\theta$, and $pB$ is an upper triangular matrix of ones (a Python transcription appears below)
• Used N = 1000 measurements; 50 replications
• Used FDSA with gains $a_k = a/(1+k+A)^\alpha$ and $c_k = c/(1+k)^\gamma$
• "Semi-automatic" and manual gain tuning
• Also compared with random search algorithm B
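A direct Python transcription of the loss above, following the definitions on the slide ($pB$ is an upper triangular matrix of ones, so $B$ is that matrix divided by p); the names `B` and `skewed_quartic` are mine.

```python
import numpy as np

p = 10
B = np.triu(np.ones((p, p))) / p   # pB is an upper triangular matrix of ones

def skewed_quartic(theta):
    """L(theta) = theta'B'B theta + 0.1*sum((B theta)_i^3) + 0.01*sum((B theta)_i^4)."""
    b = B @ theta
    return float(b @ b + 0.1 * np.sum(b ** 3) + 0.01 * np.sum(b ** 4))
```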
6-10
Algorithm Comparison with Skewed-Quartic Loss Function (p = 10) (Example 6.6 in ISSO)
6-11
Example with Skewed-Quartic Loss: Mean Terminal Values and 95% Confidence Intervals for $\|\hat{\theta}_k - \theta^*\| \,/\, \|\hat{\theta}_0 - \theta^*\|$

  FDSA: semi-automatic gains    0.427 [0.411, 0.443]
  FDSA: manually tuned gains    0.531 [0.502, 0.561]
  Random search B               1.285 [1.190, 1.378]

FDSA semi-automatic is best with respect to error
Random search algorithm B produces solution further from $\theta^*$ than initial condition!
But loss value is better than initial condition
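The table's error measure is the normalized terminal distance, which is a one-liner to compute; for the skewed-quartic loss the minimizer is $\theta^* = 0$ (B is invertible and each term $x^2 + 0.1x^3 + 0.01x^4$ has its only critical point at 0). The helper name is illustrative.

```python
import numpy as np

def normalized_error(theta_k, theta_0, theta_star):
    """||theta_k - theta*|| / ||theta_0 - theta*||: values above 1 mean the
    terminal estimate is further from theta* than the initial condition."""
    return np.linalg.norm(theta_k - theta_star) / np.linalg.norm(theta_0 - theta_star)
```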