CHAPTER 6
STOCHASTIC APPROXIMATION AND THE FINITE-DIFFERENCE METHOD
• Organization of chapter in ISSO
  – Contrast of gradient-based and gradient-free algorithms
  – Motivating examples
  – Finite-difference algorithm
  – Convergence theory
  – Asymptotic normality
  – Selection of gain sequences
  – Numerical examples
  – Extensions and segue to SPSA in Chapter 7
Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall
6-2
Motivation for Algorithms Not Requiring Gradient of Loss Function
• Primary interest here is in optimization problems for which we cannot obtain direct measurements of the gradient $\partial L / \partial \theta$
  – Cannot use techniques such as Robbins-Monro SA, steepest descent, etc.
  – Can (in principle) use techniques such as Kiefer-Wolfowitz SA (Chapter 6), genetic algorithms (Chapters 9–10), …
• Many such “gradient-free” problems arise in practice
– Generic difficult parameter estimation
– Model-free feedback control
– Simulation-based optimization
– Experimental design: sensor configuration
6-3
Model-Free Control Setup (Example 6.2 in ISSO)
6-4
Finite Difference SA (FDSA) Method
• FDSA has standard "first-order" form of root-finding (Robbins-Monro) SA
  – Finite-difference approximation replaces direct gradient measurement (Chap. 5)
  – Resulting algorithm sometimes called Kiefer-Wolfowitz SA
• Let $\hat{g}_k(\hat{\theta}_k)$ denote FD estimate of $g(\theta)$ at kth iteration (next slide)
• Let $\hat{\theta}_k$ denote estimate for $\theta$ at kth iteration
• FDSA algorithm has form (see the sketch below)
$$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k \hat{g}_k(\hat{\theta}_k)$$
  where $a_k$ is a nonnegative gain value
• Under conditions, $\hat{\theta}_k \to \theta^*$ in stochastic sense (a.s.)
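A minimal Python sketch of the recursion above. The function names (`fdsa`, `grad_estimate`, `gain`) are illustrative assumptions, not from ISSO; the gradient estimator itself is defined on the next slide.

```python
import numpy as np

def fdsa(grad_estimate, theta0, gain, n_iterations):
    """FDSA recursion: theta_{k+1} = theta_k - a_k * g_hat_k(theta_k).

    grad_estimate(theta, k) returns the FD gradient estimate at iteration k;
    gain(k) returns the nonnegative gain value a_k.
    """
    theta = np.asarray(theta0, dtype=float).copy()
    for k in range(n_iterations):
        theta = theta - gain(k) * grad_estimate(theta, k)
    return theta
```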
6-5
Finite Difference Gradient Approximation
• Classical method for approximating gradients in Kiefer-Wolfowitz SA is by finite differences
• FD gradient approximation used in SA recursion as gradient measurement (previous slide)
• Standard two-sided gradient approximation at iteration k is (see the sketch below)
$$\hat{g}_k(\hat{\theta}_k) = \begin{bmatrix} \dfrac{y(\hat{\theta}_k + c_k \xi_1) - y(\hat{\theta}_k - c_k \xi_1)}{2c_k} \\ \vdots \\ \dfrac{y(\hat{\theta}_k + c_k \xi_p) - y(\hat{\theta}_k - c_k \xi_p)}{2c_k} \end{bmatrix}$$
  where $\xi_j$ is p-dimensional with 1 in jth entry, 0 elsewhere
• Each computation of FD approximation takes 2p measurements y(•)
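A direct Python sketch of the two-sided approximation above; `y` is the noisy loss measurement function, and the name `fd_gradient` is an assumption for illustration.

```python
import numpy as np

def fd_gradient(y, theta, c_k):
    """Two-sided finite-difference gradient estimate.

    Perturbs each coordinate j by +/- c_k (xi_j has 1 in the jth entry,
    0 elsewhere) and differences the noisy measurements y(.).
    Costs 2p measurements of y per call.
    """
    p = len(theta)
    g_hat = np.empty(p)
    for j in range(p):
        xi_j = np.zeros(p)
        xi_j[j] = 1.0
        g_hat[j] = (y(theta + c_k * xi_j) - y(theta - c_k * xi_j)) / (2.0 * c_k)
    return g_hat
```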
6-6
Shaded Triangle Shows Valid Coefficient Values $\alpha$ and $\gamma$ in Gain Sequences $a_k = a/(k+1+A)^\alpha$ and $c_k = c/(k+1)^\gamma$ (Sect. 6.5 of ISSO)
Solid line indicates non-strict border ($\ge$ or $\le$) and dashed line indicates strict border (>)
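The gain sequences above are easy to encode; `make_gains` is an illustrative helper, and the constants a, A, c, α, γ must be chosen inside the valid (α, γ) region of Sect. 6.5 (e.g., the asymptotically optimal decay rates α = 1, γ = 1/6 used in Example 6.5 below).

```python
def make_gains(a, A, alpha, c, gamma):
    """Standard FDSA gain sequences a_k = a/(k+1+A)^alpha, c_k = c/(k+1)^gamma."""
    def a_k(k):
        return a / (k + 1 + A) ** alpha
    def c_k(k):
        return c / (k + 1) ** gamma
    return a_k, c_k
```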
6-7
Example: Wastewater Treatment Problem (Example 6.5 in ISSO)
• Small-scale problem with p = 2
  – Aim is to optimize water cleanliness and methane gas byproduct
  – Evaluated algorithms with 50 realizations of N = 2000 measurements
• Used FDSA with gains $a_k = a/(1+k)$ and $c_k = 1/(1+k)^{1/6}$ (see the sketch after this list)
  – Asymptotically optimal decay rates found "best"
• Gain tuning chooses a; naïve gain sets a = 1
• Also compared with random search algorithm B from Chapter 2
• Algorithms use noisy loss measurements (same level as in Example 2.7 in ISSO)
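The wastewater loss function itself is not reproduced in these slides, so the sketch below wires the pieces together (reusing the `fd_gradient` and `make_gains` sketches above) on a stand-in noisy quadratic with p = 2; the decay rates match the slide, but the loss and the "naïve" choice a = 1 are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def y(theta):
    # Stand-in noisy loss measurement; NOT the wastewater model of Example 6.5
    return float(theta @ theta) + 0.1 * rng.standard_normal()

a_k, c_k = make_gains(a=1.0, A=0.0, alpha=1.0, c=1.0, gamma=1.0 / 6.0)

theta = np.array([1.0, 1.0])   # arbitrary initial condition, p = 2
for k in range(500):           # 500 iterations -> N = 2000 measurements (2p per iteration)
    theta = theta - a_k(k) * fd_gradient(y, theta, c_k(k))
print(theta)                   # should approach the minimizer (the origin here)
```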
6-8
Mean Values of $L(\hat{\theta}_k)$ with 95% Confidence Intervals

                          FDSA with "naïve" gains    FDSA with tuned gains
  N = 100 (25 iters.)     0.11 [0.087, 0.140]        0.083 [0.057, 0.108]
  N = 2000 (500 iters.)   0.023 [0.017, 0.028]       0.021 [0.016, 0.026]

Above numbers much lower than random search algorithm B: best value at N = 2000 is 0.38
Shows value of approximating gradient in FDSA
6-9
Example: Skewed-Quartic Loss Function (Examples 6.6 and 6.7 in ISSO)
• Larger-scale problem with p = 10:
$$L(\theta) = \theta^T B^T B \theta + 0.1 \sum_{i=1}^{p} (B\theta)_i^3 + 0.01 \sum_{i=1}^{p} (B\theta)_i^4$$
  where $(B\theta)_i$ is the ith component of $B\theta$, and $pB$ is an upper triangular matrix of ones (a Python transcription appears below)
• Used N = 1000 measurements; 50 replications
• Used FDSA with gains $a_k = a/(1+k+A)^\alpha$ and $c_k = c/(1+k)^\gamma$
• "Semi-automatic" and manual gain tuning
• Also compared with random search algorithm B
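A direct Python transcription of the loss above, following the definitions on the slide ($pB$ is an upper triangular matrix of ones, so $B$ is that matrix divided by p); the names `B` and `skewed_quartic` are mine.

```python
import numpy as np

p = 10
B = np.triu(np.ones((p, p))) / p   # pB is an upper triangular matrix of ones

def skewed_quartic(theta):
    """L(theta) = theta'B'B theta + 0.1*sum((B theta)_i^3) + 0.01*sum((B theta)_i^4)."""
    b = B @ theta
    return float(b @ b + 0.1 * np.sum(b ** 3) + 0.01 * np.sum(b ** 4))
```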
6-10
Algorithm Comparison with Skewed-Quartic Loss Function (p = 10) (Example 6.6 in ISSO)
6-11
Example with Skewed-Quartic Loss: Mean Terminal Values and 95% Confidence Intervals for $\|\hat{\theta}_k - \theta^*\| \,/\, \|\hat{\theta}_0 - \theta^*\|$

  FDSA: semi-automatic gains    0.427 [0.411, 0.443]
  FDSA: manually tuned gains    0.531 [0.502, 0.561]
  Random search B               1.285 [1.190, 1.378]

FDSA semi-automatic is best with respect to error
Random search algorithm B produces solution further from $\theta^*$ than initial condition!
But loss value is better than initial condition
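The table's error measure is the normalized terminal distance, which is a one-liner to compute; for the skewed-quartic loss the minimizer is $\theta^* = 0$ (B is invertible and each term $x^2 + 0.1x^3 + 0.01x^4$ has its only critical point at 0). The helper name is illustrative.

```python
import numpy as np

def normalized_error(theta_k, theta_0, theta_star):
    """||theta_k - theta*|| / ||theta_0 - theta*||: values above 1 mean the
    terminal estimate is further from theta* than the initial condition."""
    return np.linalg.norm(theta_k - theta_star) / np.linalg.norm(theta_0 - theta_star)
```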