Minimizing noisy functionals in hilbert space: An extension of the Kiefer-Wolfowitz procedure


Journal of Theoretical Probability, Vol. 1, No. 2, 1988

Minimizing Noisy Functionals in Hilbert Space: An Extension of the Kiefer-Wolfowitz Procedure

Larry Goldstein¹

Received April 21, 1987

Let H be an infinite-dimensional real separable Hilbert space and R the real line. Given an unknown functional $g: H \to R$ that can only be observed with random error, we consider a recursive method to locate an extremum of g. Applications to optimal stochastic control are presented.

KEY WORDS: Infinite-dimensional Hilbert space; Kiefer-Wolfowitz procedure; recursive method; noisy functionals

1. INTRODUCTION

Let H be a real separable Hilbert space with basis $\{\phi_k\}$ and g a real-valued functional on H that can only be observed with random error. When H is the finite-dimensional Euclidean space $E_m$, an extremum of g may be located by the Kiefer-Wolfowitz stochastic approximation method as follows(4-6):

Let $x_1 \in E_m$ be an initial estimate of the extremum, and generate successive estimates by the recursion

$$x_{n+1} = x_n - a_n \left[ \nabla_{c_n} g(x_n) + \frac{\psi_n}{2c_n} \right], \qquad n \ge 1$$

where $\nabla_{c_n} g(x_n)$ is the finite-difference approximation to the gradient of g at $x_n$ with kth component given by

$$\frac{g(x_n + c_n \phi_k) - g(x_n - c_n \phi_k)}{2c_n}$$

¹ Department of Mathematics, University of Southern California, Los Angeles, California 90089.


0894-9840/88/0400-0189$06.00/0 © 1988 Plenum Publishing Corporation


$\{\psi_n\}_{n=1}^\infty$ is an $E_m$-valued error process, and $\{a_n\}_{n=1}^\infty$ a sequence of step sizes. Note that the estimate $\nabla_{c_n} g(x_n)$ requires $2m$ observations. Conditions under which this procedure converges to an extremum are given in Nevel'son and Has'minskii(6) and Kushner and Clark.(5)
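The finite-difference gradient estimate above can be sketched in a few lines of code; this is an illustrative sketch only (the quadratic `noisy_g` and its noise level are assumptions for the demo, not from the paper), but it makes the cost of $2m$ observations per iteration concrete.

```python
import random

def kw_gradient(noisy_g, x, c):
    # Finite-difference estimate of the gradient of g at x with span c.
    # Each of the m components costs two observations of g: 2m in all.
    m = len(x)
    grad = []
    for k in range(m):
        x_plus = [xi + (c if i == k else 0.0) for i, xi in enumerate(x)]
        x_minus = [xi - (c if i == k else 0.0) for i, xi in enumerate(x)]
        grad.append((noisy_g(x_plus) - noisy_g(x_minus)) / (2.0 * c))
    return grad

# Assumed noisy objective for the demo: g(x) = sum(x_i^2), observed with
# small additive Gaussian error.
def noisy_g(x):
    return sum(t * t for t in x) + random.gauss(0.0, 0.01)

random.seed(0)
est = kw_gradient(noisy_g, [1.0, -2.0], c=0.1)  # true gradient is [2.0, -4.0]
```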

Alternatively, one could locate the extremum by the method of random directions.(5) In this method, one generates random directions $\{d_n\}_{n=1}^\infty$ uniformly distributed over the surface of the unit sphere in $E_m$, and then estimates the directional derivative of g in this direction. The estimate of the extremum then moves in this direction, or opposite to it, by an amount dictated by the estimate of the directional derivative. Here successive estimates of the extremum are generated by the recursion

$$x_{n+1} = x_n - a_n d_n \left[ \frac{g(x_n + c_n d_n) - g(x_n - c_n d_n) + \psi_n}{2c_n} \right], \qquad n \ge 1$$

Note that each iteration requires only two observations of g, regardless of the dimension m, and that here $\{\psi_n\}_{n=1}^\infty$ is a scalar error process.
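A single random-directions update can be sketched as follows; this is a hypothetical sketch (the uniform direction is produced by normalizing a Gaussian vector, a standard device, and the objective, step sizes, and noise are illustrative assumptions):

```python
import math
import random

def random_direction(m):
    # Uniform direction on the unit sphere of E_m: normalize an m-vector
    # of independent standard normals.
    d = [random.gauss(0.0, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(t * t for t in d))
    return [t / norm for t in d]

def rd_step(noisy_g, x, a, c):
    # One iteration: two observations of g regardless of the dimension m.
    d = random_direction(len(x))
    g_plus = noisy_g([xi + c * di for xi, di in zip(x, d)])
    g_minus = noisy_g([xi - c * di for xi, di in zip(x, d)])
    slope = (g_plus - g_minus) / (2.0 * c)   # noisy directional derivative
    return [xi - a * slope * di for xi, di in zip(x, d)]

# Demo on an assumed noisy quadratic in E_3 with minimum at the origin.
def noisy_g(x):
    return sum(t * t for t in x) + random.gauss(0.0, 0.01)

random.seed(1)
x = [2.0, -1.0, 3.0]
for n in range(1, 3001):
    x = rd_step(noisy_g, x, a=1.0 / n, c=n ** -0.25)
```

After a few thousand iterations the iterate is close to the minimizer, even though each iteration saw only two noisy values of g.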

In the case where H is no longer finite dimensional, neither of the above methods is immediately applicable: the first would require an infinite number of observations at each iteration, and the second depends on the existence of a uniform distribution on the unit sphere. We can, however, restrict attention at iteration n to a subspace of finite dimension $k_n$, and let $k_n \uparrow \infty$, while employing the random directions method, which requires only two observations per iteration regardless of $k_n$, to keep the algorithm computationally feasible.

This approach necessitates restrictions on the functional g, perhaps the most serious being that in each finite-dimensional subspace, g should have a "pseudo-extremum" toward which the process can tend while waiting for the dimension to increase.

Stochastic approximation schemes for locating roots of functions have been studied in spaces more general than $E_m$ by many authors; see, for example, Dvoretzky,(3) Revesz,(8,9) Solov,(12) Venter,(13) and Walk.(14,15) The idea of using a nested sequence of finite-dimensional subspaces of a Hilbert space appears also in Bertram(2) and Nixdorf.(7) In Ref. 7 the stochastic approximation scheme for root finding is considered and an invariance principle derived. The problem considered in Ref. 2 is similar to that considered here; in Ref. 2, however, it is assumed that the derivative of g is observable with error, whereas here we derive a procedure that succeeds when only g is observable with error.

Section 2 presents the main theorem; in Section 3 an application to optimal stochastic control is given; proofs are found in Section 4.


2. STATEMENT OF THE MAIN THEOREM

Let H be an infinite-dimensional real separable Hilbert space with inner product $\langle \cdot, \cdot \rangle$ and norm $\|x\| = \langle x, x \rangle^{1/2}$. Let R be the real line. Let $\{\phi_k\}_{k=1}^\infty$ be a complete orthonormal set in H and $R^j = \mathrm{span}\{\phi_1, \ldots, \phi_j\}$. We write $x = \alpha_1 \phi_1 + \alpha_2 \phi_2 + \cdots + \alpha_j \phi_j$ as $x = (\alpha_1, \alpha_2, \ldots, \alpha_j)$, where $\alpha_1, \alpha_2, \ldots, \alpha_j$ are real scalars. Let $L(H)$ be the set of all bounded linear transformations from H to H with norm $\|A\| = \sup_{\|x\| \le 1} \|Ax\|$, and $P_j: H \to R^j$ the orthogonal projection of H onto $R^j$. $K, K_1, K_2, \ldots$ will denote positive constants, not necessarily the same at each occurrence.

Theorem 2.1. Let $g: H \to R$ and assume that:

g is twice Fréchet differentiable with bounded second Fréchet derivative; denote the first Fréchet derivative of g by f. (2.1)

g achieves a unique minimum at $\bar{x} \in H$. (2.2)

Further, assume the existence of a sequence of "pseudoextrema" $\{\bar{x}_j\}_{j=1}^\infty$ and a strictly positive function $\tau(\varepsilon)$ defined on $(0,1)$ such that

$$\bar{x}_j \in R^j \qquad (2.3)$$

$$\lim_{j \to \infty} \|\bar{x}_j - \bar{x}\| = 0 \qquad (2.4)$$

$$\sum_{j=1}^{\infty} \|\bar{x}_{j+1} - \bar{x}_j\| < \infty \qquad (2.5)$$

$$\inf_{\substack{x \in R^j \\ \varepsilon \le \|x - \bar{x}_j\| \le \varepsilon^{-1}}} \langle x - \bar{x}_j, f(x)\rangle \ge \tau(\varepsilon) \qquad \forall \varepsilon \in (0,1) \qquad (2.6)$$

$$\forall x \in R^j, \qquad \|P_j f(x)\|^2 \le K(1 + \|x - \bar{x}_j\|^2) \qquad (2.7)$$

Now choose a sequence of random search directions $\{d_n\}_{n=1}^\infty$ as follows. Let $\{k_n\}_{n=1}^\infty$ be an integer-valued sequence such that

$$k_1 \ge 1, \qquad k_n \le k_{n+1} \le k_n + 1, \qquad \lim_{n \to \infty} k_n = \infty \qquad (2.8)$$

Now let

$$d_n = (\eta_1^n, \eta_2^n, \ldots, \eta_{k_{n+1}}^n) \qquad (2.9)$$

that is,

$$d_n = \sum_{i=1}^{k_{n+1}} \eta_i^n \phi_i \qquad (2.10)$$

where $\eta_i^n$, $n = 1, 2, \ldots$; $i = 1, 2, \ldots, k_{n+1}$, are mutually independent and identically distributed normally with mean zero and variance one.


Let

$$x_1 = 0 \qquad (2.11)$$

and

$$x_{n+1} = x_n - a_n d_n \left[ \frac{g(x_n + c_n d_n) - g(x_n - c_n d_n) + \psi_n}{2c_n} \right], \qquad n \ge 1$$

where $\{\psi_n\}_{n=1}^\infty$ is a sequence of real-valued random variables such that

$$d_n \text{ is independent of } F_n = \sigma\{d_i, \psi_j : 1 \le i \le n-1,\ 1 \le j \le n\} \qquad (2.12)$$

and, with $B_n = \sigma\{d_i, \psi_i : 1 \le i \le n-1\}$,

$$E[\psi_n \mid B_n] = 0 \quad \text{and} \quad E[\psi_n^2 \mid B_n] \le \sigma^2 \quad \text{a.s.} \qquad (2.13)$$

We require the sequences $\{a_n\}_{n=1}^\infty$, $\{c_n\}_{n=1}^\infty$, $\{k_n\}_{n=1}^\infty$ to satisfy

$$a_n > 0, \qquad \lim_{n \to \infty} a_n = 0 \qquad (2.14a)$$

$$c_n > 0, \qquad \lim_{n \to \infty} c_n = 0 \qquad (2.14b)$$

$$\sum_{n=1}^{\infty} a_n = \infty \qquad (2.15)$$

$$\sum_{n=1}^{\infty} \frac{a_n^2 k_{n+1}}{c_n^2} < \infty \qquad (2.16)$$

and

$$\sum_{n=1}^{\infty} a_n c_n k_{n+1}^{3/2} < \infty \qquad (2.17)$$

Then, under conditions (2.1)-(2.17), the sequence $x_n$ defined by the recursion (2.11) converges to $\bar{x}$, the unique minimum of the functional g, with probability one.

Note: To satisfy conditions (2.8) and (2.14)-(2.17) we may take $a_n = 1/n$, $c_n = n^{-\gamma}$, and $k_{n+1} = [n^\beta]$, where $\beta$ and $\gamma$ lie in the interior of the triangle with vertices $(0,0)$, $(0,1/2)$, $(1/4, 3/8)$ in the $\beta$-$\gamma$ plane.
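Under the Note's schedule, the whole procedure can be sketched on a toy problem. Everything below is an assumed illustration, not the paper's code: the $\ell^2$-quadratic `noisy_g`, its minimizer `xbar`, the noise level, and the particular $(\beta, \gamma)$ inside the triangle.

```python
import math
import random

random.seed(2)

xbar = [2.0 ** -(i + 1) for i in range(64)]   # assumed minimizer, a point of l^2

def noisy_g(x):
    # Assumed noisy quadratic g(x) = ||x - xbar||^2 + noise; coordinates of x
    # beyond len(x) are treated as 0, matching the embedding of R^k in H.
    head = sum((xi - bi) ** 2 for xi, bi in zip(x, xbar))
    tail = sum(bi ** 2 for bi in xbar[len(x):])
    return head + tail + random.gauss(0.0, 0.01)

beta, gamma = 0.2, 0.35        # (beta, gamma) inside the triangle of the Note
x = [0.0]                      # x_1 = 0, k_1 = 1
for n in range(1, 20001):
    a_n, c_n = 1.0 / n, n ** -gamma
    k_next = max(1, math.ceil(n ** beta))     # k_{n+1}, here ceil(n^beta)
    x += [0.0] * (k_next - len(x))            # embed x_n into R^{k_{n+1}}
    d = [random.gauss(0.0, 1.0) for _ in range(k_next)]   # direction, cf. (2.9)
    g_plus = noisy_g([xi + c_n * di for xi, di in zip(x, d)])
    g_minus = noisy_g([xi - c_n * di for xi, di in zip(x, d)])
    slope = (g_plus - g_minus) / (2.0 * c_n)
    x = [xi - a_n * slope * di for xi, di in zip(x, d)]

err2 = sum((xi - bi) ** 2 for xi, bi in zip(x, xbar)) \
     + sum(bi ** 2 for bi in xbar[len(x):])
```

The dimension grows only like $n^{0.2}$ (to 8 after 20000 iterations here), yet the squared error to the infinite-dimensional minimizer becomes small, while each iteration still uses only two observations of g.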

3. APPLICATIONS TO OPTIMAL STOCHASTIC CONTROL

We show below that a version of the stochastic linear-quadratic regulator problem of optimal control may be solved recursively using Theorem 2.1.


We change notation here to agree with the standard notation for this problem in the literature. The variable x will now represent a "state" in $E_n$, and u a "control" in $E_m$; the functional to be minimized is $g(u) = E[J(u)]$, where $J(\cdot)$ is defined below.

Let $U = L_m^2[0,1]$, the set of all $E_m$-valued measurable functions $u(\cdot)$ on $[0,1]$ such that $\int_0^1 u^T(\tau) u(\tau)\,d\tau < \infty$. The operation $\langle u, v \rangle_U = \int_0^1 u^T(\tau) v(\tau)\,d\tau$ makes U into a real separable Hilbert space with norm $\|u\|_U = \langle u, u \rangle_U^{1/2}$. We define the space $X = L_n^2[0,1]$ similarly.

Now let there be given an initial vector $x_0 \in E_n$, matrices $A: E_n \to E_n$, $B: E_m \to E_n$, $C: E_k \to E_n$, $G: E_n \to E_n$, and $H: E_m \to E_m$ with G and H symmetric nonnegative definite and symmetric positive definite, respectively, and $W(\cdot)$ a mean zero Brownian motion in $E_k$ on $[0,1]$.

For a given $u \in U$, let the state of a system obey the stochastic integral equation

$$x(t) = x_0 + \int_0^t A x(\tau)\,d\tau + \int_0^t B u(\tau)\,d\tau + \int_0^t C\,dW_\tau, \qquad t \in [0,1] \qquad (3.1)$$

and let the cost of the policy u be given by

$$J(u) = \int_0^1 \left[ x^T(\tau) G x(\tau) + u^T(\tau) H u(\tau) \right] d\tau \qquad (3.2)$$

Now consider the following problem: given $J(u)$ for any element $u \in U$, locate the element $\bar{u} \in U$ that minimizes $g(u) = E[J(u)]$. We show below that Theorem 2.1 may be used to locate $\bar{u}$ recursively, where we assume that the $W(\cdot)$ process at each iteration is independent of the $W(\cdot)$ process in previous iterations. We remark that the parameters mentioned above, and even the form of the functional g, are not assumed to be known.
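A noisy observation $J(u)$ of the kind the procedure consumes can be produced by simulating (3.1)-(3.2); the sketch below uses Euler-Maruyama for a scalar system ($n = m = k = 1$) with assumed illustrative parameter values, which the algorithm itself would never need to know.

```python
import random

# Hypothetical scalar parameters (n = m = k = 1); illustrative values only,
# not from the paper -- the algorithm never needs to know them.
A, B, C, G, H = -1.0, 1.0, 0.5, 1.0, 0.1
x0, N = 1.0, 200              # initial state, Euler-Maruyama steps on [0, 1]
dt = 1.0 / N

def noisy_J(u):
    # One observation of the cost (3.2) along a simulated path of (3.1).
    # A fresh Brownian path is drawn on every call, so successive
    # observations are independent, as the application requires.
    x, cost = x0, 0.0
    for i in range(N):
        cost += (G * x * x + H * u[i] * u[i]) * dt
        dW = random.gauss(0.0, dt ** 0.5)
        x += (A * x + B * u[i]) * dt + C * dW
    return cost

random.seed(3)
u_zero = [0.0] * N
samples = [noisy_J(u_zero) for _ in range(200)]
g_estimate = sum(samples) / len(samples)   # Monte Carlo check of g(u) = E[J(u)]
```

Each call to `noisy_J` is one observation $J(u) = g(u) + \psi$; the procedure of Theorem 2.1 would use two such calls per iteration.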

To show that the hypotheses of Theorem 2.1 are satisfied under the above assumptions, we first put Eq. (3.1) in the following form:

$$x(t) = \Phi(t) \left[ x_0 + \int_0^t \Phi^{-1}(\tau) B u(\tau)\,d\tau + \int_0^t \Phi^{-1}(\tau) C\,dW_\tau \right]$$

where

$$\dot{\Phi}(t) = A \Phi(t)$$

and

$$\Phi(0) = I$$


Let $L: U \to X$ be given by

$$(Lu)(t) = \Phi(t) \int_0^t \Phi^{-1}(\tau) B u(\tau)\,d\tau$$

note that L is a bounded linear operator. Let $b \in X$ be defined by $b(t) = \Phi(t) x_0$, and $v \in X$ by

$$v(t) = \Phi(t) \int_0^t \Phi^{-1}(\tau) C\,dW_\tau$$

We now restate the problem:

Minimize $E[J(u)] = E[\langle x, Gx \rangle_X + \langle u, Hu \rangle_U]$

subject to $x = Lu + b + v$

when only $J(u)$ is available for any $u \in U$. Substituting,

$$J(u) = \langle Lu, GLu \rangle_X + \langle b, Gb \rangle_X + \langle v, Gv \rangle_X + 2\langle Lu, Gv \rangle_X + 2\langle Lu, Gb \rangle_X + 2\langle v, Gb \rangle_X + \langle u, Hu \rangle_U \qquad (3.3)$$

and

$$g(u) = E[J(u)] = \langle L'GLu, u \rangle_U + 2\langle L'Gb, u \rangle_U + \langle Hu, u \rangle_U + \langle Gb, b \rangle_X + E\langle v, Gv \rangle_X \qquad (3.4)$$

where prime denotes adjoint. Now $g(u)$ is twice Fréchet differentiable with first and second Fréchet derivatives given by

$$f(u) = 2[(L'GL + H)u + L'Gb] \quad \text{and} \quad 2(L'GL + H), \quad \text{respectively} \qquad (3.5)$$

In particular, we see that g has bounded second Fréchet derivative. The functional g achieves a unique minimum $\bar{u}$ that may be found by solving

$$(L'GL + H)\bar{u} + L'Gb = 0$$

so that

$$\bar{u} = -(L'GL + H)^{-1} L'Gb$$

and hence conditions (2.1) and (2.2) of Theorem 2.1 are satisfied. We now demonstrate the existence of the "pseudoextrema." Let E be any finite-dimensional subspace of U. We show that the "pseudoextremum" $\bar{u}_E$ is given by the solution of a minimization problem over E.


Let $P: U \to E$ be the orthogonal projection of U onto E. Consider $g_E(u)$, the restriction of g to E. For $u \in E$, $Pu = u$, and we write, by (3.4),

$$g_E(u) = \langle L'GLPu, Pu \rangle_E + 2\langle L'Gb, Pu \rangle_E + \langle HPu, Pu \rangle_E + \langle Gb, b \rangle_X + E\langle v, Gv \rangle_X$$

$$= \langle PL'GLPu, u \rangle_E + 2\langle PL'Gb, u \rangle_E + \langle PHPu, u \rangle_E + \langle Gb, b \rangle_X + E\langle v, Gv \rangle_X$$

and so the Fréchet derivative $f_E(u)$ of $g_E(u)$ is given by

$$\tfrac{1}{2} f_E(u) = (PL'GLP + PHP)u + PL'Gb$$

Since $PL'GLP + PHP$ is invertible when restricted as a map from E to E, we may write

$$\bar{u}_E = -(PL'GLP + PHP)^{-1} PL'Gb \qquad (3.6)$$

where the inverse is understood in this sense. Hence, for a sequence of finite-dimensional subspaces $\{R^j\}_{j=1}^\infty$ with $R^j \subset R^{j+1}$ and

$$U = \overline{\bigcup_{j=1}^{\infty} R^j}$$

we can define

$$\bar{u}_j = -(P_j L'GL P_j + P_j H P_j)^{-1} P_j L'Gb$$

where $P_j: U \to R^j$ is the orthogonal projection of U onto $R^j$. Under assumption (2.5), which will in general depend on the choice of basis $\{\phi_k\}_{k=1}^\infty$, we can now derive $\lim_{j \to \infty} \|\bar{u}_j - \bar{u}\| = 0$, and so satisfy (2.4).
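Numerically, the pseudoextrema and their convergence to $\bar{u}$ can be illustrated with assumed stand-in matrices for L, G, H, and b on a truncated space (a sketch only; in the actual problem these operators are not even known to the algorithm):

```python
def solve(Amat, bvec):
    # Solve A y = b by Gaussian elimination with partial pivoting.
    n = len(bvec)
    M = [row[:] + [bvec[i]] for i, row in enumerate(Amat)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (M[r][n] - sum(M[r][c] * y[c] for c in range(r + 1, n))) / M[r][r]
    return y

J = 6
# Assumed stand-ins: a Hilbert-matrix-like L, G = I, H = I, and a vector b.
Lm = [[1.0 / (i + j + 1) for j in range(J)] for i in range(J)]
b = [1.0 / (i + 1) for i in range(J)]
T = [[sum(Lm[k][i] * Lm[k][j] for k in range(J)) + (1.0 if i == j else 0.0)
      for j in range(J)] for i in range(J)]                     # L'GL + H
c = [sum(Lm[k][i] * b[k] for k in range(J)) for i in range(J)]  # L'Gb

u_bar = [-t for t in solve(T, c)]                          # full minimizer
errs = []
for j in range(1, J + 1):
    Tj = [row[:j] for row in T[:j]]                        # P_j(L'GL + H)P_j on R^j
    uj = [-t for t in solve(Tj, c[:j])] + [0.0] * (J - j)  # pseudoextremum (3.6)
    errs.append(sum((a - e) ** 2 for a, e in zip(uj, u_bar)) ** 0.5)
```

Each `uj` solves the projected normal equation of (3.6) on the leading j coordinates, and the error to `u_bar` shrinks to zero as j reaches the full dimension.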

To verify (2.6), let $u \in E$, $\varepsilon \in (0,1)$, and $\|\bar{u}_E - u\| \ge \varepsilon$. Then

$$\tfrac{1}{2}\langle u - \bar{u}_E, f(u) \rangle = \langle u - \bar{u}_E, (L'GL + H)u + L'Gb \rangle$$

$$= \langle u - \bar{u}_E, (PL'GL + PH)u + PL'Gb \rangle \quad \text{by (3.5), as } u - \bar{u}_E \in E$$

$$= \langle u - \bar{u}_E, (PL'GLP + PHP)u + PL'Gb \rangle \quad \text{as } u \in E$$

$$= \langle u - \bar{u}_E, (PL'GLP + PHP)[u + (PL'GLP + PHP)^{-1} PL'Gb] \rangle$$

$$= \langle u - \bar{u}_E, (PL'GLP + PHP)(u - \bar{u}_E) \rangle \quad \text{by (3.6)}$$

$$= \langle u - \bar{u}_E, (L'GLP + H)(u - \bar{u}_E) \rangle \ge \lambda_{\min}(H) \|u - \bar{u}_E\|^2 \ge \lambda \varepsilon^2$$

hence we may take $\tau(\varepsilon)$ in condition (2.6) to be $2\lambda\varepsilon^2$, where $\lambda = \lambda_{\min}(H)$.


Next, we show that (2.7) is satisfied. Let $u \in E$. Then

$$\tfrac{1}{2} P f(u) = (PL'GL + PH)u + PL'Gb = (PL'GLP + PHP)u + PL'Gb = (PL'GLP + PHP)(u - \bar{u}_E), \quad \text{as before}$$

$$= P(L'GL + H)(u - \bar{u}_E)$$

so

$$\|P f(u)\|^2 \le 4\|L'GL + H\|^2 \|u - \bar{u}_E\|^2 = K \|u - \bar{u}_E\|^2 \le K(1 + \|u - \bar{u}_E\|^2)$$

Lastly, we verify (2.12) and (2.13). Let $\psi_n = J(u_n) - g(u_n)$. As the $W(\cdot)$ process is independent of the $W(\cdot)$ process in previous iterations, and as we may construct the random direction $d_n$ independently of the $W(\cdot)$ process in the current iteration and of the past, we need only verify that $E\psi_n^2$ exists. We drop the subscript n; using (3.3) and (3.4),

$$\psi = \gamma_1 + \gamma_2 \quad \text{where} \quad \gamma_1 = 2\langle GLu + Gb, v \rangle, \qquad \gamma_2 = \langle v, Gv \rangle - E\langle v, Gv \rangle$$

We apply the inequalities

$$\psi^2 \le 2(\gamma_1^2 + \gamma_2^2) \quad \text{and} \quad E\gamma_2^2 \le E\langle v, Gv \rangle^2 \le \|G\|^2 E\|v\|^4$$

and using the bound (5.1.6) in Ref. 1, Fubini's theorem, and the Schwarz inequality, we see $E\|v\|^4 < \infty$; similarly, using the fact that $GLu + Gb \in L_n^2[0,1]$, we may derive $E\gamma_1^2 < \infty$.

4. PROOF OF THEOREM

Lemma 4.1. Let $\{s_n\}_{n=1}^\infty$, $\{p_n\}_{n=1}^\infty$ be nonnegative $B_n$-measurable real-valued random variables defined on a probability space $(\Omega, F, P)$, with $B_n$ an increasing family of sub-$\sigma$-algebras of F. Suppose

$$E[s_1^2] < \infty \quad \text{and} \quad E[s_{n+1}^2 \mid B_n] \le (1 + u_n^2) s_n^2 + u_n^2 - a_n p_n \quad \text{a.s.} \qquad (4.1)$$

where $\{u_n\}_{n=1}^\infty$ and $\{a_n\}_{n=1}^\infty$ are real sequences such that

$$a_n \ge 0, \qquad \sum_{n=1}^{\infty} a_n = \infty \qquad (4.2)$$


and

$$\sum_{n=1}^{\infty} u_n^2 < \infty \qquad (4.3)$$

Then there exists a random variable s such that $s_n \to s$ a.s. as $n \to \infty$, and a sequence of integers $\{n_k\}_{k=1}^\infty$ such that $p_{n_k} \to 0$ a.s. as $k \to \infty$.

Proof. By Theorem 1 in Ref. 11 we have that $\lim_{n \to \infty} s_n^2$ exists and is finite and $\sum_{n=1}^\infty a_n p_n < \infty$ a.s. Since $\sum_{n=1}^\infty a_n = \infty$, there exists a subsequence $\{n_k\}_{k=1}^\infty$ such that $p_{n_k} \to 0$ a.s.

The proof of Theorem 2.1 proceeds by showing that $s_n^2 = \|x_n - \bar{x}_{k_n}\|^2$ satisfies inequality (4.1) with $p_n = 2\langle x_n - \bar{x}_{k_{n+1}}, f(x_n) \rangle$, so that we may apply Lemma 4.1 to show that $s_n \to 0$ a.s., and then use assumption (2.4) to conclude that this implies $x_n \to \bar{x}$ a.s.

Proof of Theorem 2.1. By (2.11),

$$x_{n+1} = x_n - a_n d_n \left[ \frac{g(x_n + c_n d_n) - g(x_n - c_n d_n) + \psi_n}{2c_n} \right] = x_n - a_n d_n \left[ \langle d_n, f(x_n) \rangle + c_n \|d_n\|^2 h_n + \frac{\psi_n}{2c_n} \right]$$

where $|h_n| \le K$ a.s. by (2.1). Define $M_n$ to be the $L(H)$-valued random variable

$$M_n x = d_n \langle d_n, x \rangle, \qquad x \in H$$

Now write

$$x_{n+1} - \bar{x}_{k_{n+1}} = (x_n - \bar{x}_{k_{n+1}}) - a_n M_n f(x_n) - a_n c_n d_n \|d_n\|^2 h_n - \frac{a_n \psi_n}{2c_n} d_n$$

The computation of $s_{n+1}^2 = \|x_{n+1} - \bar{x}_{k_{n+1}}\|^2$ results in the following ten terms:

$$\|x_n - \bar{x}_{k_{n+1}}\|^2 \qquad (4.4)$$

$$+\ a_n^2 \|M_n f(x_n)\|^2 \qquad (4.5)$$


$$+\ a_n^2 c_n^2 h_n^2 \|d_n\|^6 \qquad (4.6)$$

$$+\ \frac{a_n^2 \psi_n^2}{4c_n^2} \|d_n\|^2 \qquad (4.7)$$

$$-\ 2a_n \langle x_n - \bar{x}_{k_{n+1}}, M_n f(x_n) \rangle \qquad (4.8)$$

$$-\ 2a_n c_n \|d_n\|^2 h_n \langle x_n - \bar{x}_{k_{n+1}}, d_n \rangle \qquad (4.9)$$

$$-\ \frac{a_n \psi_n}{c_n} \langle x_n - \bar{x}_{k_{n+1}}, d_n \rangle \qquad (4.10)$$

$$+\ 2a_n^2 c_n \|d_n\|^2 h_n \langle M_n f(x_n), d_n \rangle \qquad (4.11)$$

$$+\ \frac{a_n^2 \psi_n}{c_n} \langle M_n f(x_n), d_n \rangle \qquad (4.12)$$

$$+\ a_n^2 \|d_n\|^4 h_n \psi_n \qquad (4.13)$$

In order to apply Lemma 4.1, we need to show that the conditional expected value of $s_{n+1}^2 = \|x_{n+1} - \bar{x}_{k_{n+1}}\|^2$ with respect to $B_n$ can be bounded above by $(1 + u_n^2) s_n^2 + u_n^2 - a_n p_n$, where $\sum u_n^2 < \infty$; this will require that all terms (4.5)-(4.13), with the exception of (4.8), have conditional expected value with respect to $B_n$ bounded by $v_n^2 (1 + \|x_n - \bar{x}_{k_{n+1}}\|^2)$ or $v_n^2$, where $\sum_{n=1}^\infty v_n^2 < \infty$. Toward this end, we collect some facts.

First, by (2.9), (2.10), and (2.12) we have that the operator $M_n$ is self-adjoint, $M_n = M_n P_{k_{n+1}}$, $E[M_n \mid B_n] = P_{k_{n+1}}$, and $\|M_n\| = \|d_n\|^2$. Furthermore, a simple computation yields

$$E[\|d_n\|^m] = \frac{2^{m/2}\,\Gamma\!\left(\frac{k_{n+1}+m}{2}\right)}{\Gamma\!\left(\frac{k_{n+1}}{2}\right)}$$

and so in particular

$$E[\|d_n\|^m] \le K_m k_{n+1}^{m/2}$$

We have also

$$\sum_{n=1}^{\infty} a_n^2 c_n^2 k_{n+1}^3 < \infty \qquad (4.14)$$

by (2.17), and

$$\sum_{n=1}^{\infty} a_n^2 k_{n+1}^2 < \infty \qquad (4.15)$$
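Since $\|d_n\|^2$ has a chi-square distribution with $k_{n+1}$ degrees of freedom, the moment identity above can be checked numerically; the sketch below is illustrative only and uses the standard library's gamma function.

```python
import math

def norm_moment(k, m):
    # E[||d||^m] when d has k i.i.d. N(0,1) coordinates:
    # ||d||^2 is chi-square with k degrees of freedom, so
    # E[||d||^m] = 2^(m/2) * Gamma((k+m)/2) / Gamma(k/2).
    return 2.0 ** (m / 2.0) * math.gamma((k + m) / 2.0) / math.gamma(k / 2.0)

# m = 2 recovers E||d||^2 = k; for even m the formula telescopes, e.g.
# E||d||^6 = k (k+2) (k+4), so the growth is of order k^(m/2), matching
# the bound E[||d||^m] <= K_m * k^(m/2) used below.
ratio = norm_moment(100, 6) / (100.0 * 102.0 * 104.0)
```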


as

$$\sum_{n=1}^{\infty} a_n^2 k_{n+1}^2 = \sum_{n=1}^{\infty} \frac{a_n k_{n+1}^{1/2}}{c_n} \cdot a_n c_n k_{n+1}^{3/2} \le \left[ \sum_{n=1}^{\infty} \frac{a_n^2 k_{n+1}}{c_n^2} \right]^{1/2} \left[ \sum_{n=1}^{\infty} a_n^2 c_n^2 k_{n+1}^3 \right]^{1/2} < \infty$$

by (2.16) and (4.14); and lastly

$$\sum_{n=1}^{\infty} a_n^2 c_n^2 k_{n+1} < \infty \qquad (4.16)$$

by (4.15) and (2.14), and

$$\sum_{n=1}^{\infty} a_n^2 c_n k_{n+1}^{3/2} < \infty \qquad (4.17)$$

by (4.16) and (2.8). We proceed with the program outlined above, beginning with term (4.5):

$$E[a_n^2 \|M_n f(x_n)\|^2 \mid B_n] = a_n^2 E[\|M_n P_{k_{n+1}} f(x_n)\|^2 \mid B_n] \quad \text{as } M_n = M_n P_{k_{n+1}}$$

$$\le a_n^2 E[\|M_n\|^2 \|P_{k_{n+1}} f(x_n)\|^2 \mid B_n] = a_n^2 E[\|M_n\|^2] \|P_{k_{n+1}} f(x_n)\|^2 \le K a_n^2 k_{n+1}^2 (1 + \|x_n - \bar{x}_{k_{n+1}}\|^2)$$

by the above and (2.7).

(4.6):

$$E[a_n^2 c_n^2 h_n^2 \|d_n\|^6 \mid B_n] \le K^2 a_n^2 c_n^2 E[\|d_n\|^6] \le K_1 a_n^2 c_n^2 k_{n+1}^3$$

(4.7): Using (2.12),

$$E[\|d_n\|^2 \psi_n^2 \mid B_n] = E[E[\|d_n\|^2 \psi_n^2 \mid F_n] \mid B_n] = E[\psi_n^2 E[\|d_n\|^2 \mid F_n] \mid B_n] = E[\|d_n\|^2]\, E[\psi_n^2 \mid B_n] \le \sigma^2 k_{n+1}$$

by (2.13), so

$$E\left[ \frac{a_n^2 \psi_n^2}{4c_n^2} \|d_n\|^2 \,\Big|\, B_n \right] \le \frac{\sigma^2 k_{n+1} a_n^2}{4c_n^2}$$


(4.8):

$$E[-2a_n \langle x_n - \bar{x}_{k_{n+1}}, M_n f(x_n)\rangle \mid B_n] = -2a_n \langle x_n - \bar{x}_{k_{n+1}}, E[M_n \mid B_n] f(x_n)\rangle = -2a_n \langle x_n - \bar{x}_{k_{n+1}}, P_{k_{n+1}} f(x_n)\rangle$$

Define $p_n = 2\langle x_n - \bar{x}_{k_{n+1}}, P_{k_{n+1}} f(x_n)\rangle$ and note

$$p_n = 2\langle P_{k_{n+1}}(x_n - \bar{x}_{k_{n+1}}), f(x_n)\rangle = 2\langle x_n - \bar{x}_{k_{n+1}}, f(x_n)\rangle \quad \text{as } x_n \in R^{k_n} \subset R^{k_{n+1}}$$

and so, by (2.6), $p_n \ge 0$ a.s.

(4.9):

$$E[-2a_n c_n \|d_n\|^2 h_n \langle x_n - \bar{x}_{k_{n+1}}, d_n\rangle \mid B_n] \le 2a_n c_n E[\|d_n\|^2 |h_n|\, |\langle x_n - \bar{x}_{k_{n+1}}, d_n\rangle| \mid B_n]$$

$$\le 2K a_n c_n E[\|d_n\|^3 \|x_n - \bar{x}_{k_{n+1}}\| \mid B_n] = 2K a_n c_n \|x_n - \bar{x}_{k_{n+1}}\|\, E[\|d_n\|^3] \le K_1 a_n c_n k_{n+1}^{3/2} (1 + \|x_n - \bar{x}_{k_{n+1}}\|^2)$$

(4.10): Since $d_n$ is independent of $F_n$ and $\psi_n$ is $F_n$-measurable,

$$E[\psi_n d_n \mid B_n] = E[E[\psi_n d_n \mid F_n] \mid B_n] = E[\psi_n E[d_n \mid F_n] \mid B_n] = E[d_n]\, E[\psi_n \mid B_n] = 0$$

and so, using the $B_n$-measurability of $x_n - \bar{x}_{k_{n+1}}$,

$$E\left[-\frac{a_n \psi_n}{c_n}\langle x_n - \bar{x}_{k_{n+1}}, d_n\rangle \,\Big|\, B_n\right] = -\frac{a_n}{c_n}\langle x_n - \bar{x}_{k_{n+1}}, E[\psi_n d_n \mid B_n]\rangle = 0$$

(4.11): We write

$$E[2a_n^2 c_n \|d_n\|^2 h_n \langle M_n f(x_n), d_n\rangle \mid B_n] = E[2a_n^2 c_n \|d_n\|^2 h_n \langle (M_n - P_{k_{n+1}}) f(x_n), d_n\rangle \mid B_n] + E[2a_n^2 c_n \|d_n\|^2 h_n \langle P_{k_{n+1}} f(x_n), d_n\rangle \mid B_n]$$

The second term can be bounded as follows:

$$E[2a_n^2 c_n \|d_n\|^2 h_n \langle P_{k_{n+1}} f(x_n), d_n\rangle \mid B_n] \le 2K a_n^2 c_n E[\|d_n\|^3 \|P_{k_{n+1}} f(x_n)\| \mid B_n]$$

$$= 2K a_n^2 c_n E[\|d_n\|^3]\, \|P_{k_{n+1}} f(x_n)\| \le K_1 a_n^2 c_n k_{n+1}^{3/2} (1 + \|x_n - \bar{x}_{k_{n+1}}\|^2)$$


As for the first term, we begin by considering

$$E[\|(M_n - P_{k_{n+1}}) f(x_n)\|^2 \mid B_n] = E[\|(M_n - P_{k_{n+1}}) P_{k_{n+1}} f(x_n)\|^2 \mid B_n]$$

Let

$$y = P_{k_{n+1}} f(x_n) \in R^{k_{n+1}} \quad \text{and} \quad y_j = \langle y, \phi_j\rangle$$

Omitting superscripts for notational convenience, $d_n = (\eta_1, \eta_2, \ldots, \eta_{k_{n+1}})$, and if $1 \le i, j \le k_{n+1}$,

$$\langle M_n \phi_j, \phi_i\rangle = \langle d_n \langle d_n, \phi_j\rangle, \phi_i\rangle = \eta_i \eta_j$$

so that

$$\langle (M_n - P_{k_{n+1}}) \phi_j, \phi_i\rangle = \eta_i \eta_j - \delta_{i,j}$$

and hence, using

$$y = \sum_{j=1}^{k_{n+1}} y_j \phi_j$$

we have

$$\langle (M_n - P_{k_{n+1}}) y, \phi_i\rangle = \sum_{j=1}^{k_{n+1}} y_j (\eta_i \eta_j - \delta_{i,j})$$

Now

$$\|(M_n - P_{k_{n+1}}) y\|^2 = \sum_{i=1}^{k_{n+1}} \langle (M_n - P_{k_{n+1}}) y, \phi_i\rangle^2 = \sum_{i=1}^{k_{n+1}} \sum_{j_1=1}^{k_{n+1}} \sum_{j_2=1}^{k_{n+1}} (\eta_i \eta_{j_1} - \delta_{i,j_1})(\eta_i \eta_{j_2} - \delta_{i,j_2})\, y_{j_1} y_{j_2}$$

Using that the $\eta_j$ are independent of $B_n$ and that the $y_j$ are $B_n$-measurable, it is easy to show that when taking the conditional expected value of the above expression with respect to $B_n$, terms where $j_1 \ne j_2$ vanish. Therefore

$$E[\|(M_n - P_{k_{n+1}}) y\|^2 \mid B_n] = E\left[\sum_{i=1}^{k_{n+1}} (\eta_i^2 - 1)^2 y_i^2 \,\Big|\, B_n\right] + E\left[\sum_{i \ne j} (\eta_i \eta_j)^2 y_j^2 \,\Big|\, B_n\right]$$

$$= \sum_{i=1}^{k_{n+1}} 2 y_i^2 + \sum_{i \ne j} y_j^2 = 2\|y\|^2 + (k_{n+1} - 1)\|y\|^2 = (k_{n+1} + 1)\|y\|^2$$


so

$$E[\|(M_n - P_{k_{n+1}}) f(x_n)\|^2 \mid B_n] \le (k_{n+1} + 1) \|P_{k_{n+1}} f(x_n)\|^2$$

Now we may bound the first term above as follows:

$$E[2a_n^2 c_n \|d_n\|^2 h_n \langle (M_n - P_{k_{n+1}}) f(x_n), d_n\rangle \mid B_n] \le 2K a_n^2 c_n E[\|d_n\|^3 \|(M_n - P_{k_{n+1}}) f(x_n)\| \mid B_n]$$

$$\le 2K a_n^2 c_n \left( E[\|d_n\|^6 \mid B_n] \cdot E[\|(M_n - P_{k_{n+1}}) f(x_n)\|^2 \mid B_n] \right)^{1/2}$$

$$\le K_1 a_n^2 c_n k_{n+1}^{3/2} \left[ (k_{n+1} + 1) \|P_{k_{n+1}} f(x_n)\|^2 \right]^{1/2} \le K_3 a_n^2 c_n k_{n+1}^2 (1 + \|x_n - \bar{x}_{k_{n+1}}\|^2)$$

(4.12):

$$E\left[\frac{a_n^2 \psi_n}{c_n}\langle M_n f(x_n), d_n\rangle \,\Big|\, B_n\right] = \frac{a_n^2}{c_n} E[\langle f(x_n), \psi_n M_n d_n\rangle \mid B_n] = \frac{a_n^2}{c_n}\langle f(x_n), E[\psi_n M_n d_n \mid B_n]\rangle$$

But, as in (4.10),

$$E[\psi_n M_n d_n \mid B_n] = E[E[\psi_n M_n d_n \mid F_n] \mid B_n] = E[\psi_n E[M_n d_n \mid F_n] \mid B_n] = E[M_n d_n]\, E[\psi_n \mid B_n] = 0$$

(4.13): Let

$$C_n = \sigma\{d_i, \psi_j : 1 \le i \le n,\ 1 \le j \le n-1\}$$

and recall

$$B_n = \sigma\{d_i, \psi_i : 1 \le i \le n-1\} \quad \text{and} \quad F_n = \sigma\{d_i, \psi_j : 1 \le i \le n-1,\ 1 \le j \le n\}$$

Since $d_n$ is independent of $F_n$ by (2.12),

$$E[\psi_n \mid C_n] = E[\psi_n \mid B_n] = 0 \quad \text{by (2.13)}$$

As $B_n \subset C_n$, $x_n$ is $C_n$-measurable; from this we see that $h_n$ is $C_n$-measurable. Hence

$$E[a_n^2 \|d_n\|^4 h_n \psi_n \mid B_n] = a_n^2 E[E[\|d_n\|^4 h_n \psi_n \mid C_n] \mid B_n] = a_n^2 E[\|d_n\|^4 h_n E[\psi_n \mid C_n] \mid B_n] = 0$$

Collecting terms and using (2.16), (2.17), and (4.14)-(4.17), we may write

$$E[\|x_{n+1} - \bar{x}_{k_{n+1}}\|^2 \mid B_n] \le (1 + v_n^2) \|x_n - \bar{x}_{k_{n+1}}\|^2 + v_n^2 - a_n p_n \qquad (4.18)$$

where

$$\sum_{n=1}^{\infty} v_n^2 < \infty, \qquad p_n = 2\langle x_n - \bar{x}_{k_{n+1}}, f(x_n)\rangle$$

and $a_n$ is as given in Theorem 2.1. Let $t_n = \|x_n - \bar{x}_{k_{n+1}}\|$, $s_n = \|x_n - \bar{x}_{k_n}\|$, and $r_n = \|\bar{x}_{k_n} - \bar{x}_{k_{n+1}}\|$. Then $\sum_{n=1}^\infty r_n < \infty$ by (2.5) and (2.8), and also

$$t_n \le s_n + r_n \qquad (4.19)$$

Therefore $t_n^2 \le (1 + 2r_n) s_n^2 + (2r_n + r_n^2)$, and using this inequality in (4.18) yields

$$E[s_{n+1}^2 \mid B_n] \le (1 + v_n^2)(1 + 2r_n) s_n^2 + (1 + v_n^2)(2r_n + r_n^2) + v_n^2 - a_n p_n \le (1 + u_n^2) s_n^2 + u_n^2 - a_n p_n$$

for some sequence $\{u_n\}_{n=1}^\infty$ with $\sum_{n=1}^\infty u_n^2 < \infty$.

As $E[s_1^2] < \infty$ trivially, we may now use Lemma 4.1 to conclude $s_n \to s$ a.s., and that there exists a subsequence $n_j$ such that $p_{n_j} \to 0$ a.s. But as $s_n \le t_n + r_n$, we have, using (4.19), that $|t_n - s_n| \le r_n$, and so $t_n \to s$ a.s. as well.

If $P(s > 0) \ne 0$, then for some m and N the set

$$A = \bigcap_{n \ge N} \{\omega : 1/m \le t_n \le m\}$$

has positive probability, and on A, for $n \ge N$,

$$\tfrac{1}{2} p_n = \langle x_n - \bar{x}_{k_{n+1}}, f(x_n)\rangle \ge \inf\left\{ \langle x - \bar{x}_{k_{n+1}}, f(x)\rangle : x \in R^{k_{n+1}},\ \frac{1}{m} \le \|x - \bar{x}_{k_{n+1}}\| \le m \right\} \ge \tau(1/m) > 0$$


by (2.6), which contradicts that $p_{n_j} \to 0$ with probability one for some subsequence $n_j$. Hence $s = 0$ a.s., and now by (2.4) and (2.8) we conclude $x_n \to \bar{x}$ with probability one as $n \to \infty$.

ACKNOWLEDGMENT

This research was supported in part by the Office of Naval Research under contract No. N00014-81-K-0003.

REFERENCES

1. Arnold, L. (1974). Stochastic Differential Equations, Wiley, New York.
2. Bertram, J. (1973). Optimisation stochastique dans un espace de Hilbert, C. R. Acad. Sci. Paris Ser. A 276, 613-616.
3. Dvoretzky, A. (1956). On stochastic approximation, Proc. Third Berkeley Symp. Math. Stat. Prob. 1, pp. 39-55.
4. Kiefer, J., and Wolfowitz, J. (1952). Stochastic estimation of the maximum of a regression function, Ann. Math. Stat. 23, 462-466.
5. Kushner, H., and Clark, D. (1978). Stochastic Approximation Methods for Constrained and Unconstrained Systems, Applied Mathematical Sciences 25, Springer, Berlin.
6. Nevel'son, M., and Has'minskii, R. (1976). Stochastic Approximation and Recursive Estimation, American Mathematical Society, Providence, Rhode Island.
7. Nixdorf, R. (1984). An invariance principle for a finite dimensional stochastic approximation method in a Hilbert space, J. Multivariate Anal. 15, 252-260.
8. Revesz, P. (1973). Robbins-Monro procedure in a Hilbert space and its application in the theory of learning processes I, Stud. Sci. Math. Hung. 8, 391-398.
9. Revesz, P. (1973). Robbins-Monro procedure in a Hilbert space II, Stud. Sci. Math. Hung. 8, 469-472.
10. Robbins, H., and Monro, S. (1951). A stochastic approximation method, Ann. Math. Stat. 22, 400-407.
11. Robbins, H., and Siegmund, D. (1971). A convergence theorem for non-negative almost supermartingales and some applications, in Rustagi, J. (ed.), Optimizing Methods in Statistics, Academic Press, New York, pp. 233-257.
12. Solov, G. (1979). On a stochastic approximation theorem in a Hilbert space and its applications, Theory Prob. Appl. 24, 413-419.
13. Venter, J. (1966). On Dvoretzky stochastic approximation theorems, Ann. Math. Stat. 37, 1534-1544.
14. Walk, H. (1977). An invariance principle for the Robbins-Monro process in a Hilbert space, Z. Wahrsch. verw. Gebiete 39, 135-150.
15. Walk, H. (1978). Martingales and the Robbins-Monro procedure in D[0, 1], J. Multivariate Anal. 8, 430-452.