Efficient Monte Carlo for High Excursions by Adler, Blanchet and Liu


    Submitted to the Annals of Applied Probability

EFFICIENT MONTE CARLO FOR HIGH EXCURSIONS OF GAUSSIAN RANDOM FIELDS

By Robert J. Adler, Jose H. Blanchet and Jingchen Liu

Technion, Columbia and Columbia

Our focus is on the design and analysis of efficient Monte Carlo methods for computing tail probabilities for the suprema of Gaussian random fields, along with conditional expectations of functionals of the fields given the existence of excursions above high levels, b. Naive Monte Carlo takes an exponential, in b, computational cost to estimate these probabilities and conditional expectations for a prescribed relative accuracy. In contrast, our Monte Carlo procedures achieve, at worst, polynomial complexity in b, assuming only that the mean and covariance functions are Hölder continuous. We also explain how to fine tune the construction of our procedures in the presence of additional regularity, such as homogeneity and smoothness, in order to further improve the efficiency.

1. Introduction. This paper centers on the design and analysis of efficient Monte Carlo techniques for computing probabilities and conditional expectations related to high excursions of Gaussian random fields. More specifically, suppose that f : T → R is a continuous Gaussian random field over a d-dimensional compact set T ⊂ R^d. (Additional regularity conditions on T will be imposed below, as needed.)

Our focus is on tail probabilities of the form

(1.1)    w(b) = P( max_{t ∈ T} f(t) > b )

and on conditional expectations

(1.2)    E[ Γ(f) | max_{t ∈ T} f(t) > b ],

as b → ∞, where Γ is a functional of the field which, for concreteness, we take to be positive and bounded.

While much of the paper will concentrate on estimating the exceedance probability (1.1), it is important to note that our methods, based on importance sampling, are broadly applicable to the efficient evaluation of conditional expectations of the form (1.2). Indeed, as we shall explain at the end of Section 4, our approach to efficient importance sampling is based on a procedure which mimics the conditional

Research supported in part by US-Israel Binational Science Foundation 2008262, ISF 853/10, and NSF DMS-0852227.

Research supported in part by DMS-0902075, CMMI-0846816, and CMMI-1069064.

Research supported in part by the Institute of Education Sciences, U.S. Department of Education, through Grant R305D100017, and NSF CMMI-1069064.

AMS 2000 subject classifications: Primary 60G15, 65C05; secondary 60G60, 62G32.

Keywords and phrases: Gaussian random fields, high level excursions, Monte Carlo, tail distributions, efficiency.


distribution of f, given that max_{t ∈ T} f(t) > b. Moreover, once an efficient (in a precise mathematical sense described in Section 2) importance sampling procedure is in place, it follows under mild regularity conditions on Γ that an efficient estimator for (1.2) is immediately obtained by exploiting an efficient estimator for (1.1).

The need for an efficient estimator of w(b) should be reasonably clear. Suppose one could simulate

f* = sup_{t ∈ T} f(t)

exactly (i.e. without bias) via naive Monte Carlo. Such an approach would typically require a number of replications of f* which would be exponential in b² to obtain an accurate estimate (in relative terms). Indeed, since in great generality (see [25]) w(b) = exp(−cb² + o(b²)) as b → ∞ for some c ∈ (0, ∞), it follows that the average of n i.i.d. Bernoulli trials, each with success parameter w(b), estimates w(b) with a relative mean squared error equal to n^{−1/2}(1 − w(b))^{1/2}/w(b)^{1/2}. To control the size of the error therefore requires¹ n = Ω(w(b)^{−1}), which is typically prohibitively large. In addition, there is also the problem that typically f* cannot be simulated exactly, so that some discretization of f is required.
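To make the b² scaling concrete, the following minimal Python sketch (ours, not from the paper; a one-dimensional standard Gaussian stands in for f*, and the function name is our own) evaluates the relative mean squared error formula above for a fixed budget n:

```python
import math
import random
from statistics import NormalDist

def naive_mc(b, n, seed=0):
    """Crude Monte Carlo for w(b) = P(Z > b), Z ~ N(0,1): average n
    i.i.d. Bernoulli(w(b)) trials and return the estimate together with
    the theoretical relative RMS error n^{-1/2}(1 - w)^{1/2}/w^{1/2}."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.gauss(0.0, 1.0) > b)
    w = 1.0 - NormalDist().cdf(b)          # exact tail probability
    rel_rmse = math.sqrt((1.0 - w) / (n * w))
    return hits / n, rel_rmse

# With a fixed budget n = 10^4 the relative error blows up as b grows,
# since w(b) = exp(-b^2/2 + o(b^2)) decays super-exponentially in b.
for b in (2.0, 3.0, 4.0):
    est, err = naive_mc(b, n=10_000)
    print(f"b = {b}: estimate = {est:.2e}, relative RMSE = {err:.2f}")
```

Keeping the error bounded thus forces n = Ω(w(b)^{−1}), which is the exponential cost that the estimators developed below avoid.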

Our goal is to introduce and analyze simulation estimators that can be applied to a general class of Gaussian fields and that can be shown to require at most a polynomial number of function evaluations in b to obtain estimates with a prescribed relative error. The model of computation that we use to count function evaluations, and the precise definition of an algorithm with polynomial complexity, is given in Section 2. Our proposed estimators are, in particular, asymptotically optimal. (This property, which is a popular notion in the context of rare-event simulation (cf. [7, 13]), essentially requires that the second moments of the estimators decay at the same exponential rate as the squares of the first moments.) The polynomial complexity of our estimators requires no more than the assumption that the underlying Gaussian field is Hölder continuous (see Theorem 3.1 in Section 3). Therefore, our methods

provide means for efficiently computing probabilities and expectations associated with high excursions of Gaussian random fields in wide generality.

In the presence of enough smoothness, we shall also show how to design estimators that can be shown to be strongly efficient, in the sense that their associated coefficient of variation remains uniformly bounded as b → ∞. Moreover, the associated path generation of the conditional field (given a high excursion) can, basically, be carried out with the same computational complexity as the unconditional sampling procedure (uniformly in b). This is Theorem 3.3 in Section 3.

High excursions of Gaussian random fields appear in a wide number of applications, including, but not limited to,

Physical oceanography: Here the random field can be water pressure or surface temperature. See [4] for many examples.

Cosmology: This includes the analysis of COBE and WMAP microwave data on a sphere, or galactic density data, e.g. [9, 23, 24].

¹Given h and g positive, we shall use the familiar asymptotic notation: h(x) = O(g(x)) if there is c < ∞ such that h(x) ≤ c g(x) for all x large enough; h(x) = Ω(g(x)) if h(x) ≥ c g(x) for all x sufficiently large; h(x) = o(g(x)) as x → ∞ if h(x)/g(x) → 0 as x → ∞; and h(x) = Θ(g(x)) if h(x) = O(g(x)) and h(x) = Ω(g(x)).


Quantum chaos: Here random planar waves replace deterministic (but unobtainable) solutions of Schrödinger equations. See, e.g., the recent review [14].

Brain mapping: This application is the most developed and very widely used. For example, the paper by Friston et al. [16] that introduced random field methodology to the brain imaging community has, at the time of writing, over 4,500 (Google) citations.

Many of these applications deal with twice differentiable, constant variance random fields, or random fields that have been normalized to have constant variance, the reason being that they require estimates of the tail probabilities (1.1), and these are only really well known in the smooth, constant (unit) variance case. In particular, it is known that, with enough smoothness assumptions,

(1.3)    lim inf_{b→∞} −b^{−2} log | P( sup_{t ∈ T} f(t) ≥ b ) − E( φ({t ∈ T : f(t) ≥ b}) ) | ≥ 1/2 + 1/(2σ_c²),

where φ(A) is the Euler characteristic of the set A and the term σ_c² is related to a geometric quantity known as the critical radius of T and depends on the covariance structure of f. See [5, 26] for details. Since both the probability and the expectation in (1.3) are typically O(b^α exp(−b²/2)) for some α ≥ 0 and large b, a more user friendly (albeit not quite as correct) way to write (1.3) is

(1.4)    P( sup_{t ∈ T} f(t) ≥ b ) = E( φ({t ∈ T : f(t) ≥ b}) ) ( 1 + O(e^{−cb²}) ),

for some c > 0.

The expectation in (1.3) and (1.4) has an explicit form that is readily computed for Gaussian and Gaussian-related random fields of constant variance (see [5, 6] for details), although if T is geometrically complicated, or the covariance of f highly non-stationary, there can be terms in the expectation that can only be evaluated numerically or estimated statistically (e.g. [2, 27]). Nevertheless, when available, (1.3) provides excellent approximations, and simulation studies have shown that the approximations are numerically useful for quite moderate b, of the order of 2 standard deviations.

However, as we have already noted, (1.3) holds only for constant variance fields, which also need to be twice differentiable. In the case of less smooth f, other classes of results occur, in which the expansions are less reliable and, in addition, typically involve the unknown Pickands constants (cf. [8, 22]).

These are some of the reasons why, despite a well developed theory, Monte Carlo techniques still have a significant role to play in understanding the behavior of Gaussian random fields at high levels. The estimators proposed in this paper basically reduce the rare event calculations associated with high excursions of Gaussian random fields to calculations that are roughly comparable to the evaluation of expectations or integrals in which no rare event is involved. In other words, the computational complexity required to implement the estimators discussed here is similar, in some sense, to the complexity required to evaluate a given integral in finite dimension, or an expectation where no tail parameter is involved. To the best of our knowledge, these types of reductions have not been studied in the development of numerical


methods for high excursions of random fields. This feature distinguishes the present work from the application of other numerical techniques that are generic (such as quasi-Monte Carlo and other numerical integration routines) and that, in particular, might also be applicable to the setting of Gaussian fields.

Contrary to our methods, which are designed to have provably good performance uniformly over the level of excursion, a generic numerical approximation procedure, such as quasi-Monte Carlo, will typically require an exponential increase in the number of function evaluations in order to maintain a prescribed level of relative accuracy. This phenomenon is unrelated to the setting of Gaussian random fields. In particular, it can easily be seen to happen even when evaluating a one dimensional integral with a small integrand. On the other hand, we believe that our estimators can, in practice, be easily combined with quasi-Monte Carlo or other numerical integration methods. The rigorous analysis of such hybrid approaches, although of great interest, requires an extensive treatment and will be pursued in the future. As an aside, we note that quasi-Monte Carlo techniques have been used in the excursion analysis of Gaussian random fields in [8].

The remainder of the paper is organized as follows. In Section 2 we introduce the basic notions of polynomial algorithmic complexity, which are borrowed from the general theory of computation. Section 3 discusses the main results in light of the complexity considerations of Section 2. Section 4 provides a brief introduction to importance sampling, a simulation technique that we shall use heavily in the design of our algorithms. The analysis of finite fields, which is given in Section 5, is helpful to develop the basic intuition behind our procedures for the general case. Section 6 provides the construction and analysis of a polynomial time algorithm for high excursion probabilities of Hölder continuous fields. Finally, in Section 7, we add additional smoothness assumptions along with stationarity and explain how to fine tune the construction of our procedures in order to further improve efficiency in these cases.

2. Basic Notions of Computational Complexity. In this section we shall discuss some general notions of efficiency and computational complexity related to the approximation of the probability of the rare events {B_b : b ≥ b_0}, for which P(B_b) → 0 as b → ∞. In essence, efficiency means that computational complexity is, in some sense, controllable, uniformly in b. A notion that is popular in the efficiency analysis of Monte Carlo methods for rare events is weak efficiency (also known as asymptotic optimality), which requires that the coefficient of variation of a given estimator L_b of P(B_b) be dominated by 1/P(B_b)^ε for any ε > 0. More formally, we have

Definition 2.1. A family of estimators {L_b : b ≥ b_0} is said to be polynomially efficient for estimating P(B_b) if E(L_b) = P(B_b) and there exists a q ∈ (0, ∞) for which

(2.1)    sup_{b ≥ b_0}  Var(L_b) / ( [P(B_b)]² |log P(B_b)|^q )  < ∞.

Moreover, if (2.1) holds with q = 0, then the family is said to be strongly efficient.


Below we often refer to L_b as a strongly (polynomially) efficient estimator, by which we mean that the family {L_b : b > 0} is strongly (polynomially) efficient. In order to understand the nature of this definition, let {L_b^{(j)}, 1 ≤ j ≤ n} be a collection of i.i.d. copies of L_b. The averaged estimator

L̄_n(b) = (1/n) Σ_{j=1}^n L_b^{(j)}

has a relative mean squared error equal to [Var(L_b)]^{1/2} / [n^{1/2} P(B_b)]. A simple consequence of Chebyshev's inequality is that

P( | L̄_n(b)/P(B_b) − 1 | ≥ ε ) ≤ Var(L_b) / ( ε² n [P(B_b)]² ).

Thus, if L_b is polynomially efficient and we wish to compute P(B_b) with at most ε relative error and at least 1 − δ confidence, it suffices to simulate

n = Θ( ε^{−2} δ^{−1} |log P(B_b)|^q )

i.i.d. replications of L_b. In fact, in the presence of polynomial efficiency, the bound n = Θ( ε^{−2} δ^{−1} |log P(B_b)|^q ) can be boosted to n = Θ( ε^{−2} log(δ^{−1}) |log P(B_b)|^q ) using the so-called median trick (see [21]).

Naturally, the cost per replication must also be considered in the analysis, and we shall do so, but the idea is that evaluating P(B_b) via crude Monte Carlo would require, given ε and δ, n = Θ(1/P(B_b)) replications. Thus a polynomially efficient estimator makes the evaluation of P(B_b) exponentially faster relative to crude Monte Carlo, at least in terms of the number of replications.
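The median trick mentioned above can be sketched in a few lines (our own illustration, not part of the paper's algorithms): split the replications into batches, average each batch, and report the median of the batch means. Chebyshev makes each batch mean accurate with probability above 1/2, and the median then fails only if about half the batches fail, an event exponentially unlikely in the number of batches.

```python
import random
from statistics import median

def median_of_means(samples, num_batches):
    """Median-of-means estimator: partition the i.i.d. samples into
    num_batches groups, average each group, and return the median of
    the batch means.  Taking num_batches = O(log(1/delta)) upgrades a
    per-batch Chebyshev guarantee into overall confidence 1 - delta."""
    m = len(samples) // num_batches
    means = [sum(samples[i * m:(i + 1) * m]) / m for i in range(num_batches)]
    return median(means)

# Toy usage: estimate the mean (= 1.0) of an exponential sample.
rng = random.Random(1)
data = [rng.expovariate(1.0) for _ in range(9_000)]
print(median_of_means(data, num_batches=9))
```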

Note that a direct application of deterministic algorithms (such as quasi-Monte Carlo or quadrature integration rules) might improve (under appropriate smoothness assumptions) the computational complexity relative to Monte Carlo, although only by a polynomial rate (i.e. the absolute error decreases to zero at rate n^{−p} for p > 1/2, where n is the number of function evaluations and p depends on the dimension of the function that one is integrating; see for instance [7]). We believe that the procedures that we develop in this paper can guide the construction of efficient deterministic algorithms with small relative error and with complexity that scales at a polynomial rate in |log P(B_b)|. This is an interesting research topic that we plan to explore in the future.

An issue that we shall face in designing our Monte Carlo procedure is that, due to the fact that f will have to be discretized, the corresponding estimator L_b will not be unbiased. In turn, in order to control the relative bias with an effort that is comparable to the bound on the number of replications discussed in the preceding paragraph, one must verify that the relative bias can be reduced to an amount less than ε with probability at least 1 − δ at a computational cost of the form O( ε^{−q_0} |log P(B_b)|^{q_1} ). If L_b(ε) can be generated with O( ε^{−q_0} |log P(B_b)|^{q_1} ) cost, and satisfies

P(B_b) ≤ E L_b(ε) ≤ (1 + ε) P(B_b),

and if

sup_{b>0}  Var(L_b(ε)) / ( P(B_b)² |log P(B_b)|^q )  < ∞


for some q ∈ (0, ∞), then L̄_n(b, ε) = Σ_{j=1}^n L_b^{(j)}(ε)/n, where the L_b^{(j)}(ε)'s are i.i.d. copies of L_b(ε), satisfies

P( | L̄_n(b, ε)/P(B_b) − 1 | ≥ 2ε ) ≤ Var(L_b(ε)) / ( ε² n P(B_b)² ).

Consequently, taking n = Θ( ε^{−2} δ^{−1} |log P(B_b)|^q ) suffices to give an estimator with at most ε relative error and 1 − δ confidence, and the total computational cost is

Θ( ε^{−2−q_0} δ^{−1} |log P(B_b)|^{q+q_1} ).
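As a small arithmetic illustration of these bounds (ours; the constants hidden by Θ(·) are set to 1), note how the sufficient replication count grows only through |log P(B_b)|:

```python
import math

def replications(eps, delta, p, q):
    """Sufficient number of i.i.d. replications, up to constants:
    n = eps**-2 * delta**-1 * |log p|**q, for relative error eps with
    confidence 1 - delta when estimating a probability p."""
    return math.ceil(eps ** -2 * delta ** -1 * abs(math.log(p)) ** q)

# Squaring the rarity (p -> p**2) only doubles |log p|, so the budget
# roughly doubles rather than growing like 1/p:
print(replications(0.1, 0.05, 1e-9, q=1))
print(replications(0.1, 0.05, 1e-18, q=1))
```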

We shall measure computational cost in terms of function evaluations such as a single addition, a multiplication, a comparison, the generation of a single uniform random variable on T, the generation of a single standard Gaussian random variable, and the evaluation of Ψ(x) for fixed x ≥ 0, where

Ψ(x) = 1 − Φ(x) = (1/√(2π)) ∫_x^∞ e^{−s²/2} ds.

All of these function evaluations are assumed to cost at most a fixed amount c of computer time. Moreover, we shall also assume that first and second order moment characteristics of the field, such as μ(t) = E f(t) and C(s, t) = Cov(f(t), f(s)), can be computed in at most c units of computer time for each s, t ∈ T. We note that similar models of computation are often used in the complexity theory of continuous problems, see [28].

The previous discussion motivates the next definition, which has its roots in the general theory of computation in both continuous and discrete settings [20, 28]. In particular, completely analogous notions in the setting of complexity theory of continuous problems lead to the notion of tractability of a computational problem [31].

Definition 2.2. A Monte Carlo procedure is said to be a fully polynomial randomized approximation scheme (FPRAS) for estimating P(B_b) if, for some q, q_1, q_2 ∈ [0, ∞), it outputs an averaged estimator that is guaranteed to have at most ε > 0 relative error with confidence at least 1 − δ ∈ (0, 1) in O( ε^{−q_1} δ^{−q_2} |log P(B_b)|^q ) function evaluations.

The terminology adopted, namely FPRAS, is borrowed from the complexity theory of randomized algorithms for counting [20]. Many counting problems can be expressed as rare event estimation problems. In such cases it typically occurs that the previous definition (expressed in terms of a rare event probability) coincides precisely with the standard counting definition of a FPRAS (in which there is no reference to any rare event to estimate). This connection is noted, for instance, in [11]. Our terminology is motivated precisely by this connection.

By letting B_b = {f* > b}, the goal in this paper is to design a class of fully polynomial randomized approximation schemes that are applicable to a general class of Gaussian random fields. In turn, since our Monte Carlo estimators will be based on importance sampling, it turns out that we shall also be able to straightforwardly


construct FPRASs to estimate quantities such as E[ Γ(f) | sup_{t ∈ T} f(t) > b ] for a suitable class of functionals Γ for which Γ(f) can be computed with an error of at most ε with a cost that is polynomial as a function of ε^{−1}. We shall discuss this observation in Section 4, which deals with properties of importance sampling.

3. Main Results. In order to state and discuss our main results we need some notation. For each s, t ∈ T define

μ(t) = E(f(t)),   C(s, t) = Cov(f(s), f(t)),   σ²(t) = C(t, t) > 0,   r(s, t) = C(s, t) / (σ(s) σ(t)).

Moreover, given x ∈ R^d and β > 0, we write |x|^β = Σ_{j=1}^d |x_j|^β, where x_j is the j-th component of x. We shall assume that, for each fixed s, t ∈ T, both μ(t) and C(s, t) can be evaluated in at most c units of computing time.

Our first result shows that under modest continuity conditions on μ, σ and r it is possible to construct an explicit FPRAS for w(b), under the following regularity conditions:

A1: The field f is almost surely continuous on T.

A2: For some H, β > 0, and all s, t with |s − t| small enough, the mean and variance functions satisfy

|μ(t) − μ(s)| + |σ(t) − σ(s)| ≤ H |s − t|^β.

A3: For some H, β > 0, and all s, s′, t, t′ with |s − s′| and |t − t′| small enough, the correlation function of f satisfies

|r(t, s) − r(t′, s′)| ≤ H [ |t − t′|^β + |s − s′|^β ].

A4: 0 ∈ T. There exists δ_0 > 0 such that, for any t ∈ T and ε small enough,

m( B(t, ε) ∩ T ) ≥ δ_0 κ_d ε^d,

where m is the Lebesgue measure, B(t, ε) = { s : |t − s| ≤ ε } and κ_d = m(B(0, 1)).

The assumption that 0 ∈ T is of no real consequence and is adopted only for notational convenience.

Theorem 3.1. Suppose that f : T → R is a Gaussian random field satisfying Conditions A1–A4 above. Then Algorithm 6.1 provides a FPRAS for w(b).

The polynomial rate of the intrinsic complexity bound inherent in this result is discussed in Section 6, along with similar rates in results to follow. The conditions of the previous theorem are weak, and hold for virtually all applied settings involving continuous Gaussian fields on compact sets.

Not surprisingly, the complexity bounds of our algorithms can be improved upon under additional assumptions on f. For example, in the case of finite fields (i.e. when T is finite) with a non-singular covariance matrix, we can show that the complexity of the algorithm is actually bounded as b → ∞. We summarize this in the next result, whose proof, which is given in Section 5, is useful for understanding the main ideas behind the general procedure.


Theorem 3.2. Suppose that T is a finite set and f has a non-singular covariance matrix over T × T. Then Algorithm 5.3 provides a FPRAS with q = 0.

As we indicated above, the strategy behind the discrete case provides the basis for the general case. In the general situation, the underlying idea is to discretize the field with an appropriate sampling (discretization) rule that depends on the level b and the continuity characteristics of the field. The number of sampling points grows as b increases, and the complexity of the algorithm is controlled by finding a good sampling rule. There is a trade-off between the number of points sampled, which has a direct impact on the complexity of the algorithm, and the fidelity of the discrete approximation to the continuous field. Naturally, in the presence of enough smoothness and regularity, more information can be obtained with the same sample size. This point is illustrated in the next result, Theorem 3.3, which considers smooth, homogeneous fields. Note that in addition to controlling the error induced by discretizing the field, the variance is strongly controlled and the discretization rule is optimal, in a sense explained in Section 7. For Theorem 3.3 we require the following additional regularity conditions.

B1: f is homogeneous and almost surely twice continuously differentiable.

B2: 0 ∈ T ⊂ R^d is a d-dimensional convex set with nonempty interior. Denoting its boundary by ∂T, assume that ∂T is a (d − 1)-dimensional manifold without boundary. For any t ∈ T, assume that there exists δ_0 > 0 such that

m( B(t, ε) ∩ T ) ≥ δ_0 ε^d,

for any ε small enough.


where we assume that the probability measure Q is such that Q(· ∩ B) is absolutely continuous with respect to the measure P(· ∩ B). If we use E_Q to denote expectation under Q, then (4.1) trivially yields that the random variable

L(ω) = 1(ω ∈ B) (dP/dQ)(ω)

is an unbiased estimator for P(B) > 0 under the measure Q, or, symbolically, E_Q L = P(B).

An averaged importance sampling estimator based on the measure Q, which is often referred to as an importance sampling distribution or a change-of-measure, is obtained by simulating n i.i.d. copies L^{(1)}, ..., L^{(n)} of L under Q and computing the empirical average L̄_n = (L^{(1)} + ... + L^{(n)})/n. A central issue is that of selecting Q in order to minimize the variance of L̄_n. It is easy to verify that if Q*(·) = P(· | B) = P(· ∩ B)/P(B) then the corresponding estimator has zero variance. However, Q* is clearly a change of measure that is of no practical value, since P(B) (the quantity that we are attempting to evaluate in the first place) is unknown. Nevertheless, when constructing a good importance sampling distribution for a family of sets {B_b : b ≥ b_0} for which 0 < P(B_b) → 0 as b → ∞, it is often useful to analyze the asymptotic behavior of Q* as P(B_b) → 0 in order to guide the construction of a good Q.
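As a minimal one-dimensional illustration of these ideas (ours, not from the paper): to estimate P(B) for B = {Z > b} with Z standard normal, a practical surrogate for the zero-variance measure Q* is the mean-shifted measure Q under which Z ~ N(b, 1), giving the likelihood ratio dP/dQ(z) = exp(b²/2 − bz):

```python
import math
import random
from statistics import NormalDist

def is_gaussian_tail(b, n, seed=0):
    """Importance sampling estimate of P(Z > b) for Z ~ N(0,1).
    Sample under Q: Z ~ N(b, 1); the unbiased estimator is
    L = 1(Z > b) * dP/dQ(Z), with dP/dQ(z) = exp(b*b/2 - b*z)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        z = rng.gauss(b, 1.0)          # draw under Q
        if z > b:                       # indicator of the rare event B
            total += math.exp(0.5 * b * b - b * z)
    return total / n

b = 4.0
print(is_gaussian_tail(b, 50_000), 1.0 - NormalDist().cdf(b))
```

Unlike the crude estimator, on the event B the likelihood ratio here never exceeds exp(−b²/2), so the relative variance stays controlled as b grows.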

We now describe briefly how an efficient importance sampling estimator for P(B_b) can also be used to estimate a large class of conditional expectations given B_b. Suppose that a single replication of the corresponding importance sampling estimator,

L_b = 1(ω ∈ B_b) dP/dQ,

can be generated in O( |log P(B_b)|^{q_0} ) function evaluations, for some q_0 > 0, and that

Var(L_b) = O( [P(B_b)]² |log P(B_b)|^{q_0} ).

These assumptions imply that by taking the average of i.i.d. replications of L_b we obtain a FPRAS.

Fix λ ∈ (0, ∞) and let X(λ, q) be the class of random variables X satisfying 0 ≤ X ≤ λ with

(4.2)    E[ X | B_b ] = Ω( 1/|log P(B_b)|^q ).

Then, by noting that

(4.3)    E_Q(X L_b) / E_Q(L_b) = E[ X | B_b ] = E[X; B_b] / P(B_b),

it follows easily that a FPRAS can be obtained by constructing the natural estimator for E[X | B_b]; i.e. the ratio of the corresponding averaged importance sampling estimators suggested by the ratio on the left of (4.3). Of course, when X is difficult


to simulate exactly, one must assume that the bias in E[X; B_b] can be reduced in polynomial time. The estimator is naturally biased, but the discussion of FPRAS for biased estimators given in Section 2 can be directly applied.

In the context of Gaussian random fields, we have that B_b = {f* > b}, and one is very often interested in random variables X of the form X = Γ(f), where Γ : C(T) → R and C(T) denotes the space of continuous functions on T. Endowing C(T) with the uniform topology, consider functions Γ that are non-negative and bounded by a positive constant λ. An archetypical example is given by the volume of (conditioned) high level excursion sets, with λ = m(T), which is known to satisfy (4.2). However, there are many other examples of X ∈ X(λ, q) with λ = m(T) which satisfy (4.2) for a suitable q, depending on the regularity properties of the field. In fact, if the mean and covariance properties of f are Hölder continuous, then, using techniques similar to those given in the arguments of Section 6, it is not difficult to see that q can be estimated.

In the case that Γ(f) is not bounded, the analysis is usually case-by-case. In particular, we need to investigate

E_Q( Γ²(f) L_b² ) = E( Γ²(f) L_b | B_b ) P(B_b).

We provide a brief calculation for the case of the conditional overshoot, that is, Γ(f) = f* − b and B_b = {f* > b}. We adopt the change of measure defined later in (6.5). Then, given {f* > b}, Γ²(f) and L_b are negatively correlated (the higher the overshoot is, the larger the excursion set is), and we obtain

E_Q( Γ²(f) L_b² ) ≤ E( Γ²(f) | B_b ) E( L_b ).

Conditional on the occurrence of {f* > b}, b(f* − b) asymptotically follows an exponential distribution. Therefore, E( Γ²(f) | B_b ) = (1 + o(1)) E²( Γ(f) | B_b ). Together with the FPRAS property of L_b in computing P(B_b), Γ(f)L_b is an FPRAS for computing the conditional overshoot. The corresponding numerical examples are given in Section 8. Two key steps involve the analyses of the conditional correlation of Γ²(f) and L_b, and the conditional distribution of Γ(f) given B_b.

Thus, we have that a FPRAS-based importance sampling algorithm for w(b) would typically also yield a polynomial time algorithm for functional characteristics of the conditional field given high level excursions. Since this is a very important, and novel, application, we devote the remainder of this paper to the development of efficient importance sampling algorithms for w(b).
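Continuing the one-dimensional toy example (our sketch, not the paper's algorithm): the ratio on the left of (4.3) turns the same importance sample into an estimator of a conditional expectation, here the overshoot E[Z − b | Z > b]:

```python
import math
import random
from statistics import NormalDist

def conditional_overshoot(b, n, seed=0):
    """Ratio estimator E_Q[X L] / E_Q[L] from (4.3), with X = Z - b and
    B_b = {Z > b} for Z ~ N(0,1), using the mean-shifted measure Q
    under which Z ~ N(b, 1) and dP/dQ(z) = exp(b*b/2 - b*z)."""
    rng = random.Random(seed)
    num = den = 0.0
    for _ in range(n):
        z = rng.gauss(b, 1.0)
        if z > b:
            L = math.exp(0.5 * b * b - b * z)
            num += (z - b) * L      # contributes to E_Q[X L]
            den += L                # contributes to E_Q[L]
    return num / den

b = 3.0
nd = NormalDist()
exact = nd.pdf(b) / (1.0 - nd.cdf(b)) - b   # Mills-ratio identity
print(conditional_overshoot(b, 100_000), exact)
```

Consistently with the exponential limit mentioned above, the answer approaches 1/b as b grows.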

5. The Basic Strategy: Finite Fields. In this section we develop our main ideas in the setting in which T is a finite set of the form T = {t_1, ..., t_M}. To emphasize the discrete nature of our algorithms in this section, we write X_i = f(t_i) for i = 1, ..., M and set X = (X_1, ..., X_M). This section is mainly of an expository nature, since much of it has already appeared in [3]. Nevertheless, it is included here as a useful guide to the intuition behind the algorithms for the continuous case.

We have already noted that in order to design an efficient importance sampling estimator for w(b) = P( max_{1≤i≤M} X_i > b ) it is useful to study the asymptotic


conditional distribution of X, given that max_{1≤i≤M} X_i > b. Thus, we begin with some basic large deviation results.

Proposition 5.1. For any set of random variables X_1, ..., X_M,

max_{1≤i≤M} P(X_i > b) ≤ P( max_{1≤i≤M} X_i > b ) ≤ Σ_{j=1}^M P(X_j > b).

Moreover, if the X_j are mean zero, Gaussian, and the correlation between X_i and X_j is strictly less than 1, then

P(X_i > b, X_j > b) = o( max[ P(X_i > b), P(X_j > b) ] ).

Thus, if the associated covariance matrix of X is non-singular,

w(b) = (1 + o(1)) Σ_{j=1}^M P(X_j > b).

Proof. The lower bound in the first display is trivial, and the upper bound follows by the union bound. The second display follows easily by working with the joint density of a bivariate Gaussian distribution (e.g. [10, 19]), and the third claim is a direct consequence of the inclusion-exclusion principle.

As noted above, Q* corresponds to the conditional distribution of X given that X* = max_{1≤i≤M} X_i > b. It follows from Proposition 5.1 that, conditional on X* > b, the probability that two or more X_j exceed b is negligible. Moreover, it also follows that

P( X_i = X* | X* > b ) = (1 + o(1)) P(X_i > b) / Σ_{j=1}^M P(X_j > b).

    The following corollary now follows as an easy consequence of these observations.

Corollary 5.2. d_TV(Q*, Q) → 0 as b → ∞, where d_TV denotes the total variation norm and Q is defined, for Borel B ⊂ R^M, as

Q(X ∈ B) = Σ_{i=1}^M p(i, b) P[ X ∈ B | X_i > b ],

where

p(i, b) = P(X_i > b) / Σ_{j=1}^M P(X_j > b).


Proof. Pick an arbitrary Borel B. Then we have that

Q*(X ∈ B) = P[ X ∈ B, max_{1≤i≤M} X_i > b ] / w(b)
          ≤ Σ_{i=1}^M P[ X ∈ B, X_i > b ] / w(b)
          = Σ_{i=1}^M P[ X ∈ B | X_i > b ] p(i, b) (1 + o(1)).

The above, which follows from the union bound and the last part of Proposition 5.1 combined with the definition of Q, yields that for each ε > 0 there exists b_0 (independent of B) such that, for all b ≥ b_0,

Q*(X ∈ B) ≤ Q(X ∈ B) / (1 − ε).

The lower bound follows similarly, using the inclusion-exclusion principle and the second part of Proposition 5.1.

Corollary 5.2 provides support for choosing Q as an importance sampling distribution. Of course, we still have to verify that the corresponding algorithm is a FPRAS. The importance sampling estimator induced by Q takes the form

(5.1)    L_b = dP/dQ = Σ_{j=1}^M P(X_j > b) / Σ_{j=1}^M 1(X_j > b).

Note that under Q we have that X* > b almost surely, so the denominator in the expression for L_b is at least 1. Therefore, we have that

E_Q L_b² ≤ ( Σ_{j=1}^M P(X_j > b) )²,

and by virtue of Proposition 5.1 we conclude (using Var_Q to denote the variance under Q) that

Var_Q(L_b) / P(X* > b)² → 0

as b → ∞. In particular, it follows that L_b is strongly efficient. Our proposed algorithm can now be summarized as follows.

Algorithm 5.3. There are two steps in the algorithm:

Step 1: Simulate n i.i.d. copies X^{(1)}, ..., X^{(n)} of X from the distribution Q.

Step 2: Compute and output

L̄_n = (1/n) Σ_{i=1}^n L_b^{(i)},

where L_b^{(i)} = Σ_{j=1}^M P(X_j > b) / Σ_{j=1}^M 1(X_j^{(i)} > b).


Since the generation of each L_b^{(i)} under Q takes O(M³) function evaluations, we conclude, based on the analysis given in Section 2, that Algorithm 5.3 is a FPRAS with q = 0. This implies Theorem 3.2, as promised.
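For concreteness, here is a runnable sketch of Algorithm 5.3 (ours; specialized, to keep the conditional sampling step elementary, to i.i.d. standard Gaussian coordinates rather than a general non-singular covariance matrix):

```python
import random
from statistics import NormalDist

def algorithm_5_3(b, M, n, seed=0):
    """Importance sampling estimator (5.1) under the mixture Q:
    pick an index i with probability p(i, b) (uniform here, since the
    coordinates are i.i.d.), draw X_i from N(0,1) conditioned on
    {X_i > b} by inverse-CDF, draw the other coordinates freely, and
    average L_b = sum_j P(X_j > b) / sum_j 1{X_j > b} over n copies."""
    rng = random.Random(seed)
    nd = NormalDist()
    p = 1.0 - nd.cdf(b)       # P(X_j > b), identical for every j here
    S = M * p                  # numerator of L_b
    total = 0.0
    for _ in range(n):
        i = rng.randrange(M)
        x = [rng.gauss(0.0, 1.0) for _ in range(M)]
        x[i] = nd.inv_cdf(nd.cdf(b) + rng.random() * p)  # truncated draw
        exceed = sum(1 for xj in x if xj > b)            # >= 1 by construction
        total += S / exceed
    return total / n

b, M = 3.0, 5
est = algorithm_5_3(b, M, n=20_000)
exact = 1.0 - NormalDist().cdf(b) ** M   # w(b) for independent coordinates
print(est, exact)
```

For a general covariance matrix one would instead sample X_i from its truncated marginal and then the remaining coordinates from the Gaussian conditional law given X_i, which is where the O(M³) cost quoted above enters.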

6. A FPRAS for Hölder Continuous Gaussian Fields. In this section we describe the algorithm and the analysis behind Theorem 3.1. Throughout the section, unless stated otherwise, we assume Conditions A1-A4 of Section 3.

There are two issues related to the complexity analysis. Firstly, since $f$ is assumed continuous, the entire field cannot be generated in a (discrete) computer, and so the algorithm used in the discrete case needs adaptation. Once this is done, we need to carry out an appropriate variance analysis.

Developing an estimator directly applicable to the continuous field is carried out in Section 6.1. This construction will not only be useful when studying the performance of a suitable discretization, but will also help to explain some of the features of our discrete construction. Then, in Section 6.2, we introduce a discretization approach and study the bias caused by the discretization. In addition, we provide bounds on the variance of this discrete importance sampling estimator.

6.1. A Continuous Estimator. We start with a change of measure motivated by the discrete case in Section 5. A natural approach is to consider an importance sampling strategy analogous to that of Algorithm 5.3. The continuous adaptation involves first sampling a random point $\tau_b$ according to the probability measure
\[
(6.1)\qquad P(\tau_b \in \cdot) = \frac{E[m(A_b \cap \cdot)]}{E[m(A_b)]},
\]
where $A_b = \{t \in T : f(t) > b\}$ and $m$ denotes Lebesgue measure. The idea of introducing $\tau_b$ in the continuous setting is not necessarily to locate the point at which the maximum is achieved, as was the situation in the discrete case. Rather, $\tau_b$ will be used to find a random point which has a reasonable probability of being in the excursion set $A_b$. (This probability will tend to be higher if $f$ is non-homogeneous.) This relaxation will prove useful in the analysis of the algorithm. Note that $\tau_b$, with the distribution indicated in equation (6.1), has a density function (with respect to Lebesgue measure) given by
\[
h_b(t) = \frac{P(f(t) > b)}{E[m(A_b)]},
\]
and that we can also write
\[
E[m(A_b)] = E\int_T I(f(t) > b)\, dt = \int_T P(f(t) > b)\, dt = m(T)\, P(f(U) > b),
\]

where $U$ is uniformly distributed over $T$.

Once $\tau_b$ is generated, the natural continuous adaptation corresponding to the strategy described by Algorithm 5.3 proceeds by sampling $f$ conditional on $f(\tau_b) > b$. Note that if we use $Q$ to denote the change of measure induced by such a continuous sampling strategy, then the corresponding importance sampling estimator takes the form
\[
L_b = \frac{dP}{dQ} = \frac{E[m(A_b)]}{m(A_b)}.
\]
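The identity $E[m(A_b)] = m(T)\,P(f(U) > b)$ makes the normalizing constant of this estimator computable by one-dimensional integration of the marginal tail over $T$, itself estimable from uniform samples. A minimal sketch on $T = [0,1]$, with hypothetical mean and standard-deviation functions (not from the paper):

```python
import numpy as np
from math import erfc, sqrt

def gauss_tail(x):
    """P(N(0,1) > x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def expected_excursion_volume(mean_fn, sd_fn, b, n, rng):
    """Estimate E[m(A_b)] = m(T) P(f(U) > b) on T = [0, 1] (so m(T) = 1) by
    averaging the marginal tail probability P(f(t) > b) over uniform points t."""
    u = rng.uniform(size=n)
    tails = np.array([gauss_tail((b - mean_fn(t)) / sd_fn(t)) for t in u])
    return tails.mean()
```

For a field with constant mean 0 and unit variance this reduces to the marginal tail $\bar\Phi(b)$, which gives an easy sanity check.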


The second moment of the estimator then satisfies
\[
(6.2)\qquad E_Q[(L_b)^2] = E[L_b;\ A_b \ne \varnothing]
= E[m(A_b)]\; P\Big(\max_{t\in T} f(t) > b\Big)\; E\big[m(A_b)^{-1} \,\big|\, A_b \ne \varnothing\big].
\]
Unfortunately, it is easy to construct examples for which $E_Q[L_b^2]$ is infinite. For instance, consider a homogeneous and twice differentiable random field with zero mean and unit variance living on $T = [0,1]^d$. Using the Slepian model, discussed in Section 7, it follows that the asymptotic distribution of the overshoot given $\{\max_{t\in T} f(t) > b\}$ satisfies
\[
b\Big(\max_{t\in T} f(t) - b\Big) \Rightarrow S
\]
weakly as $b \to \infty$, where $S$ is an exponential random variable. Consequently, the distribution of $m(A_b)$ given $m(A_b) > 0$ satisfies
\[
m(A_b) \approx \kappa\, b^{-d} S^{d/2}
\]
for some constant $\kappa$. Therefore, the second moment in (6.2) is infinite as long as $d \ge 2$. This example suggests that the construction of the change of measure needs to be modified slightly.

Extreme value theory considerations similar to those explained in the previous paragraph give that the overshoot of $f$ over a given level $b$ will be of order $\Theta(1/b)$. Thus, in order to keep $\tau_b$ reasonably close to the excursion set, we shall also consider the possibility of an undershoot of size $a/b$ right below $b$. As we shall see, this relaxation will allow us to prevent the variance in (6.2) from becoming infinite. Thus, instead of (6.1), we shall consider $\tau_{b-a/b}$ with density
\[
(6.3)\qquad h_{b-a/b}(t) = \frac{P(f(t) > b - a/b)}{E[m(A_{b-a/b})]},
\]
for some $a > 0$. To ease later notation, write
\[
\gamma_{a,b} = b - a/b, \qquad \tau_{a,b} = \tau_{b-a/b}.
\]
Let $Q$ be the change of measure induced by sampling $f$ as follows. Given $\tau_{a,b}$, sample $f(\tau_{a,b})$ conditional on $f(\tau_{a,b}) > \gamma_{a,b}$. In turn, the rest of $f$ follows its conditional distribution (under the nominal, or original, measure) given the observed value $f(\tau_{a,b})$. We then have that the corresponding Radon-Nikodym derivative is
\[
(6.4)\qquad \frac{dP}{dQ} = \frac{E[m(A_{\gamma_{a,b}})]}{m(A_{\gamma_{a,b}})},
\]
and the importance sampling estimator $L_b$ is
\[
(6.5)\qquad L_b = \frac{dP}{dQ}\, I(A_b \ne \varnothing) = \frac{E[m(A_{\gamma_{a,b}})]}{m(A_{\gamma_{a,b}})}\, I(m(A_b) > 0).
\]
Note that we have used the continuity of the field in order to write $\{m(A_b) > 0\} = \{A_b \ne \varnothing\}$ almost surely. The motivation behind this choice lies in the fact that


since $m(A_{\gamma_{a,b}}) \ge m(A_b) > 0$, the denominator may now be big enough to control the second moment of the estimator. In particular, consider again the homogeneous and twice differentiable field mentioned previously. Given $m(A_b) > 0$, $m(A_{\gamma_{a,b}})$ is asymptotically lower bounded by $\kappa\, a^{d/2} b^{-d}$. As we shall see, introducing the undershoot of size $a/b$ will be very useful in the technical development, both in the remainder of this section and in Section 7. In addition, its introduction also provides insight into the appropriate form of the estimator needed when discretizing the field.

6.2. Algorithm and Analysis. We still need to face the problem of generating $f$ in a computer. Thus we now concentrate on a suitable discretization scheme, still keeping in mind the change of measure leading to (6.4). Since our interest is ultimately to design algorithms that are efficient for estimating expectations such as $E[\Gamma(f) \mid \max_{t\in T} f(t) > b]$, where $\Gamma$ may be a functional of the whole field, we shall use a global discretization scheme.

Consider $U = (U_1, \ldots, U_M)$, where the $U_i$ are i.i.d. uniform random variables taking values in $T$ and independent of the field $f$. Set $T_M = \{U_1, \ldots, U_M\}$ and $X_i = f(U_i)$ for $1 \le i \le M$. Then $X = (X_1, \ldots, X_M)$ (conditional on $U$) is a multivariate Gaussian random vector with conditional means $\mu(U_i) = E(X_i \mid U_i)$ and covariances $C(U_i, U_j) = \mathrm{Cov}(X_i, X_j \mid U_i, U_j)$. Our strategy is to approximate $w(b)$ by
\[
w_M(\gamma_{a,b}) = P\Big(\max_{t\in T_M} f(t) > \gamma_{a,b}\Big) = E\Big[P\Big(\max_{1\le i\le M} X_i > \gamma_{a,b} \,\Big|\, U\Big)\Big].
\]
Given the development in Section 5, it might not be surprising that if we can ensure that $M = M(\varepsilon, b)$ is polynomial in $1/\varepsilon$ and $b$, then we shall be in a good position to develop a FPRAS. The idea is to apply an importance sampling strategy similar to that considered in the construction of $L_b$ of (6.5), but this time conditional on $U$. In view of our earlier discussions, we propose sampling from $Q$ defined via
\[
Q(X \in B \mid U) = \sum_{i=1}^{M} p_U(i)\, P[X \in B \mid X_i > \gamma_{a,b},\ U],
\]
where
\[
p_U(i) = \frac{P(X_i > \gamma_{a,b} \mid U)}{\sum_{j=1}^{M} P(X_j > \gamma_{a,b} \mid U)}.
\]
We then obtain the (conditional) importance sampling estimator
\[
(6.6)\qquad L_b(U) = \frac{\sum_{i=1}^{M} P(X_i > \gamma_{a,b} \mid U)}{\sum_{i=1}^{M} I(X_i > \gamma_{a,b})}\; I\Big(\max_{1\le i\le M} X_i > \gamma_{a,b}\Big).
\]
Note that the event $\{\max_{1\le i\le M} X_i > \gamma_{a,b}\}$ occurs with probability 1 under $Q$. Therefore, the indicator $I(\max_{1\le i\le M} X_i > \gamma_{a,b})$ will be omitted when this causes no confusion. It is clear that
\[
w_M(\gamma_{a,b}) = E_Q[L_b(U)].
\]


Suppose for the moment that $M$, $a$ and the number of replications $n$ have been chosen. Our later analysis will in particular guide the selection of these parameters. The procedure is then summarized by the next algorithm.

Algorithm 6.1. The algorithm has three steps:

Step 1: Simulate $U^{(1)}, \ldots, U^{(n)}$, which are $n$ i.i.d. copies of the vector $U = (U_1, \ldots, U_M)$ described above.
Step 2: Conditional on each $U^{(i)}$, for $i = 1, \ldots, n$, generate $L_b^{(i)}(U^{(i)})$ as described by (6.6), by considering the distribution of $X^{(i)}(U^{(i)}) = (X_1^{(i)}(U_1^{(i)}), \ldots, X_M^{(i)}(U_M^{(i)}))$. Generate the $X^{(i)}(U^{(i)})$ independently, so that at the end the $L_b^{(i)}(U^{(i)})$ are $n$ i.i.d. copies of $L_b(U)$.
Step 3: Output
\[
\bar L_n(U^{(1)}, \ldots, U^{(n)}) = \frac{1}{n}\sum_{i=1}^{n} L_b^{(i)}(U^{(i)}).
\]
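In code, Algorithm 6.1 differs from the discrete Algorithm 5.3 only in that the grid itself is random and the mixture weights $p_U(i)$ are computed from the conditional marginals at the sampled points. Below is a sketch on the one-dimensional domain $T = [0,1]$, with `mean_fn` and `cov_fn` as hypothetical stand-ins for the field's mean and covariance functions (they are not from the paper):

```python
import numpy as np
from math import erfc, sqrt

def gauss_tail(x):
    """P(N(0,1) > x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def algorithm_6_1(mean_fn, cov_fn, b, a, M, n, rng):
    """Sketch of Algorithm 6.1 on T = [0, 1]; gamma = b - a/b is the slightly
    lowered threshold.  Returns the average of n copies of L_b(U)."""
    gamma = b - a / b
    L = np.empty(n)
    for k in range(n):
        U = rng.uniform(size=M)                       # Step 1: random grid
        mu = np.array([mean_fn(t) for t in U])
        cov = np.array([[cov_fn(s, t) for t in U] for s in U])
        sd = np.sqrt(np.diag(cov))
        tail = np.array([gauss_tail((gamma - mu[j]) / sd[j]) for j in range(M)])
        p = tail / tail.sum()                         # mixture weights p_U(i)
        i = rng.choice(M, p=p)
        while True:                                   # X_i | X_i > gamma (rejection demo)
            xi = rng.normal(mu[i], sd[i])
            if xi > gamma:
                break
        rest = [j for j in range(M) if j != i]
        c_ri = cov[rest, i]
        cond_mu = mu[rest] + c_ri / cov[i, i] * (xi - mu[i])
        cond_cov = cov[np.ix_(rest, rest)] - np.outer(c_ri, c_ri) / cov[i, i]
        x = np.empty(M)
        x[i] = xi
        x[rest] = rng.multivariate_normal(cond_mu, cond_cov)
        L[k] = tail.sum() / np.count_nonzero(x > gamma)   # Step 2: estimator (6.6)
    return L.mean()                                   # Step 3
```

For a perfectly correlated field ($C \equiv 1$) the estimator is exact, which makes the sketch easy to sanity check; for nearly singular covariance matrices one may need to add a small diagonal jitter before the conditional Gaussian draw.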

6.3. Running Time of Algorithm 6.1: Bias and Variance Control. The remainder of this section is devoted to the analysis of the running time of Algorithm 6.1. The first step lies in estimating the bias and second moment of $L_b(U)$ under the change of measure induced by the sampling strategy of the algorithm, which we denote by $Q$. We start with a simple bound for the second moment.

Proposition 6.2. There exists a finite $\rho_0$, depending on $\mu_T = \max_{t\in T}|\mu(t)|$ and $\sigma_T^2 = \max_{t\in T}\sigma^2(t)$, for which
\[
E_Q[L_b(U)^2] \le \rho_0 M^2 P\Big(\max_{t\in T} f(t) > b\Big)^2.
\]

Proof. Observe that
\[
E_Q[L_b(U)^2] \le E\Big[\Big(\sum_{i=1}^{M} P(X_i > \gamma_{a,b} \mid U_i)\Big)^2\Big]
\le E\Big[\Big(\sum_{i=1}^{M}\sup_{t\in T} P(f(U_i) > \gamma_{a,b} \mid U_i = t)\Big)^2\Big]
= M^2\max_{t\in T} P(f(t) > \gamma_{a,b})^2
\]
\[
\le \rho_0 M^2\max_{t\in T} P(f(t) > b)^2 \le \rho_0 M^2 P\Big(\max_{t\in T} f(t) > b\Big)^2,
\]
which completes the proof.


Next we obtain a preliminary estimate of the bias.

Proposition 6.3. For each $M \ge 1$ we have
\[
|w(b) - w_M(\gamma_{a,b})| \le E\big[\exp\big(-M\, m(A_{\gamma_{a,b}})/m(T)\big);\ A_b \ne \varnothing\big]
+ P\Big(\max_{t\in T} f(t) > \gamma_{a,b},\ \max_{t\in T} f(t) \le b\Big).
\]

Proof. Note that
\[
|w(b) - w_M(\gamma_{a,b})| \le P\Big(\max_{t\in T_M} f(t) \le \gamma_{a,b},\ \max_{t\in T} f(t) > b\Big)
+ P\Big(\max_{t\in T_M} f(t) > \gamma_{a,b},\ \max_{t\in T} f(t) \le b\Big).
\]
The second term is easily bounded by
\[
P\Big(\max_{t\in T_M} f(t) > \gamma_{a,b},\ \max_{t\in T} f(t) \le b\Big)
\le P\Big(\max_{t\in T} f(t) > \gamma_{a,b},\ \max_{t\in T} f(t) \le b\Big).
\]
The first term can be bounded as follows:
\[
P\Big(\max_{t\in T_M} f(t) \le \gamma_{a,b},\ \max_{t\in T} f(t) > b\Big)
\le E\big[(P[f(U_1) \le \gamma_{a,b} \mid f])^M;\ A_b \ne \varnothing\big]
\le E\big[(1 - m(A_{\gamma_{a,b}})/m(T))^M;\ A_b \ne \varnothing\big]
\le E\big[\exp\big(-M\, m(A_{\gamma_{a,b}})/m(T)\big);\ A_b \ne \varnothing\big].
\]
This completes the proof.

The previous proposition shows that controlling the relative bias of $L_b(U)$ requires finding bounds for
\[
(6.7)\qquad E\big[\exp\big(-M\, m(A_{\gamma_{a,b}})/m(T)\big);\ A_b \ne \varnothing\big]
\]
and
\[
(6.8)\qquad P\Big(\max_{t\in T} f(t) > \gamma_{a,b},\ \max_{t\in T} f(t) \le b\Big),
\]
and so we develop these. To bound (6.7) we take advantage of the importance sampling strategy based on $Q$ introduced earlier in (6.4). Write
\[
(6.9)\qquad E\big[\exp\big(-M\, m(A_{\gamma_{a,b}})/m(T)\big);\ m(A_b) > 0\big]
= E_Q\Big[\frac{\exp(-M\, m(A_{\gamma_{a,b}})/m(T))}{m(A_{\gamma_{a,b}})};\ m(A_b) > 0\Big]\; E\big[m(A_{\gamma_{a,b}})\big].
\]


Furthermore, note that for each $\delta > 0$ we have
\[
(6.10)\qquad E_Q\Big[\frac{\exp(-M\, m(A_{\gamma_{a,b}})/m(T))}{m(A_{\gamma_{a,b}})};\ m(A_b) > 0\Big]
\le \delta^{-1}\exp(-M\delta/m(T))
+ E_Q\Big[\frac{\exp(-M\, m(A_{\gamma_{a,b}})/m(T))}{m(A_{\gamma_{a,b}})};\ m(A_{\gamma_{a,b}}) \le \delta;\ m(A_b) > 0\Big].
\]
The next result, whose proof is given in Section 6.4, gives a bound for the above expectation.

Proposition 6.4. Let $\beta$ be as in Conditions A2 and A3. For any $v > 0$, there exist constants $\varepsilon_*, \delta_2 \in (0, \infty)$ (independent of $a \in (0,1)$ and $b$, but dependent on $v$) such that if we select
\[
\delta \le \varepsilon_*^{d/\beta}\, (b/a)^{-(2+v)2d/\beta},
\]
and define $W$ such that $P(W > x) = \exp(-x^{\beta/d})$ for $x \ge 0$, then
\[
(6.11)\qquad E_Q\Big[\frac{\exp(-M\, m(A_{\gamma_{a,b}})/m(T))}{m(A_{\gamma_{a,b}})};\ m(A_{\gamma_{a,b}}) \le \delta;\ m(A_b) > 0\Big]
\le \frac{E[W^2]\, m(T)}{\delta_2^{2d/\beta}\, M}\, \Big(\frac{b}{a}\Big)^{4d/\beta}.
\]

The following result gives a useful upper bound on (6.8). The proof is given in Section 6.5.

Proposition 6.5. Assume that Conditions A2 and A3 are in force. For any $v > 0$, let $\alpha = 2d/\beta + dv + 1$, where $d$ is the dimension of $T$. There exist constants $b_0, \lambda \in (0, \infty)$ (independent of $a$, but depending on $\mu_T = \max_{t\in T}|\mu(t)|$, $\sigma_T^2 = \max_{t\in T}\sigma^2(t)$, $v$, and the Hölder parameters $\beta$ and $\kappa_H$) such that for all $b \ge b_0 \ge 1$ we have
\[
(6.12)\qquad P\Big(\max_{t\in T} f(t) \le b + a/b \,\Big|\, \max_{t\in T} f(t) > b\Big) \le \lambda\, a\, b^{\alpha}.
\]
Consequently,
\[
P\Big(\max_{t\in T} f(t) > \gamma_{a,b},\ \max_{t\in T} f(t) \le b\Big)
= P\Big(\max_{t\in T} f(t) \le b \,\Big|\, \max_{t\in T} f(t) > \gamma_{a,b}\Big)\, P\Big(\max_{t\in T} f(t) > \gamma_{a,b}\Big)
\le \lambda\, a\, b^{\alpha}\, P\Big(\max_{t\in T} f(t) > \gamma_{a,b}\Big).
\]


Moreover,
\[
P\Big(\max_{t\in T} f(t) > \gamma_{a,b}\Big) \le (1 - \lambda a b^{\alpha})^{-1}\, P\Big(\max_{t\in T} f(t) > b\Big).
\]
Propositions 6.4 and 6.5 allow us to prove Theorem 3.1, which is rephrased in the form of the following theorem. It contains the detailed rate of complexity, and so is the main result of this section.

Theorem 6.6. Suppose $f$ is a Gaussian random field satisfying Conditions A1-A4 in Section 3. Given any $v > 0$, put $a = \varepsilon/(4\lambda b^{\alpha})$ (with $\lambda$ and $\alpha$ as in Proposition 6.5) and $\delta = \varepsilon^{d/\beta}(b/a)^{-(2+v)2d/\beta}$. Then there exist $c, \varepsilon_0 > 0$ such that, for all $\varepsilon \le \varepsilon_0$,
\[
(6.13)\qquad |w(b) - w_M(\gamma_{a,b})| \le \varepsilon\, w(b),
\]
if $M = c\,\delta^{-1}(b/a)^{(4+4v)d/\beta}$. Consequently, by our discussion in Section 2 and the bound on the second moment given in Proposition 6.2, it follows that Algorithm 6.1 provides a FPRAS with running time $O(M^3 \cdot M^2\varepsilon^{-2})$.

Proof. Combining (6.3), (6.9) and (6.10) with Propositions 6.3-6.5, we have that
\[
|w(b) - w_M(\gamma_{a,b})| \le \delta^{-1}\exp(-M\delta/m(T))\, E\big[m(A_{\gamma_{a,b}})\big]
+ \frac{E[W^2]\, m(T)}{\delta_2^{2d/\beta} M}\Big(\frac{b}{a}\Big)^{4d/\beta} E\big[m(A_{\gamma_{a,b}})\big]
+ \frac{\lambda a b^{\alpha}}{1 - \lambda a b^{\alpha}}\, w(b).
\]
Furthermore, there exists a constant $K \in (0, \infty)$ such that
\[
E\big[m(A_{\gamma_{a,b}})\big] \le K\max_{t\in T} P(f(t) > b)\, m(T) \le K\, w(b)\, m(T).
\]
Therefore, we have that
\[
\frac{|w(b) - w_M(\gamma_{a,b})|}{w(b)} \le \delta^{-1} K\, m(T)\exp(-M\delta/m(T))
+ \frac{E[W^2]\, K\, m(T)^2}{\delta_2^{2d/\beta} M}\Big(\frac{b}{a}\Big)^{4d/\beta}
+ \frac{\lambda a b^{\alpha}}{1 - \lambda a b^{\alpha}}.
\]
Moreover, since $a = \varepsilon/(4\lambda b^{\alpha})$, we obtain that, for $\varepsilon \le 1/2$,
\[
\frac{|w(b) - w_M(\gamma_{a,b})|}{w(b)} \le \delta^{-1} K\, m(T)\exp(-M\delta/m(T))
+ \frac{E[W^2]\, K\, m(T)^2}{\delta_2^{2d/\beta} M}\Big(\frac{b}{a}\Big)^{4d/\beta}
+ \varepsilon/2.
\]


From the selection of $\delta$, $M$ and $a$ it follows easily that the first two terms on the right hand side of the previous display can be made less than $\varepsilon/2$ for all $\varepsilon \le \varepsilon_0$, by taking $\varepsilon_0$ sufficiently small.

The complexity count given in the theorem now corresponds to the following estimates. The factor $O(M^3)$ represents the cost of the Cholesky factorization required to generate a single replication of a finite field of dimension $M$. In addition, Proposition 6.2 gives us that $O(M^2\varepsilon^{-2})$ replications are required to control the relative variance of the estimator.

We now proceed to prove Propositions 6.4 and 6.5.

6.4. Proof of Proposition 6.4. We concentrate on the analysis of the left hand side of (6.11). An important observation is that, conditional on the random variable $\tau_{a,b}$ with distribution
\[
Q(\tau_{a,b} \in \cdot) = \frac{E[m(A_{\gamma_{a,b}} \cap \cdot)]}{E[m(A_{\gamma_{a,b}})]},
\]
and given $f(\tau_{a,b})$, the rest of the field, namely $(f(t) : t \in T\setminus\{\tau_{a,b}\})$, is another Gaussian field with a computable mean and covariance structure. The second term in (6.10) indicates that we must estimate the probability that $m(A_{\gamma_{a,b}})$ takes small values under $Q$. For this purpose, we shall develop an upper bound for
\[
(6.14)\qquad P\big(m(A_{\gamma_{a,b}}) < y^{-1},\ m(A_b) > 0 \,\big|\, f(t) = \gamma_{a,b} + z/\gamma_{a,b}\big),
\]
for $y$ large enough. Our argument proceeds in two steps. For the first, in order to study (6.14), we estimate the conditional mean and covariance of $\{f(s) : s \in T\}$, given that $f(t) = \gamma_{a,b} + z/\gamma_{a,b}$. Then we use the fact that the conditional field is also Gaussian, and take advantage of general results from the theory of Gaussian random fields to obtain a bound for (6.14). For this purpose we recall some useful results from the theory of Gaussian random fields. The first is due to Dudley [15].

Theorem 6.7. Let $U$ be a compact subset of $\mathbb{R}^n$, and let $\{f_0(t) : t \in U\}$ be a mean zero, continuous Gaussian random field. Define the canonical metric $d$ on $U$ as
\[
d(s,t) = \sqrt{E[f_0(t) - f_0(s)]^2},
\]
and put $\mathrm{diam}(U) = \sup_{s,t\in U} d(s,t)$, which is assumed to be finite. Then there exists a finite universal constant $\kappa > 0$ such that
\[
E\Big[\max_{t\in U} f_0(t)\Big] \le \kappa\int_0^{\mathrm{diam}(U)/2} \big[\log(N(\epsilon))\big]^{1/2}\, d\epsilon,
\]
where the entropy $N(\epsilon)$ is the smallest number of $d$-balls of radius $\epsilon$ whose union covers $U$.


The second general result that we shall need is the so-called B-TIS (Borell-Tsirelson-Ibragimov-Sudakov) inequality [5, 12, 29].

Theorem 6.8. Under the setting described in Theorem 6.7,
\[
P\Big(\max_{t\in U} f_0(t) - E\Big[\max_{t\in U} f_0(t)\Big] \ge b\Big) \le \exp\big(-b^2/(2\sigma_U^2)\big),
\qquad\text{where } \sigma_U^2 = \max_{t\in U} E[f_0^2(t)].
\]

We can now proceed with the main proof. We shall assume from now on that $\tau_{a,b} = 0$ since, as will be obvious from what follows, all estimates hold uniformly over $\tau_{a,b} \in T$. This is a consequence of the uniform Hölder assumptions A2 and A3. Define a new process
\[
\tilde f = (\tilde f(t) : t \in T) \overset{\mathcal L}{=} \big(f(t) : t \in T \,\big|\, f(0) = \gamma_{a,b} + z/\gamma_{a,b}\big).
\]
Note that we can always write $\tilde f(t) = \tilde\mu(t) + g(t)$, where $g$ is a mean zero Gaussian random field on $T$. We have that
\[
\tilde\mu(t) = E\tilde f(t) = \mu(t) + \frac{C(0,t)}{\sigma^2(0)}\big(\gamma_{a,b} + z/\gamma_{a,b} - \mu(0)\big),
\]
and that the covariance function of $\tilde f$ is given by
\[
C_g(s,t) = \mathrm{Cov}(g(s), g(t)) = C(s,t) - \frac{C(0,s)\, C(0,t)}{\sigma^2(0)}.
\]
The following lemma describes the behavior of $\tilde\mu(t)$ and $C_g(s,t)$.

Lemma 6.9. Assume that $|s|$ and $|t|$ are small enough. Then the following three conclusions hold.

(i) There exist constants $\kappa_0$ and $\kappa_1 > 0$ such that
\[
\big|\tilde\mu(t) - (\gamma_{a,b} + z/\gamma_{a,b})\big| \le \kappa_0 |t|^{\beta} + \kappa_1 |t|^{\beta}\,(\gamma_{a,b} + z/\gamma_{a,b}),
\]
and, for all $z \in (0,1)$ and $\gamma_{a,b}$ large enough,
\[
|\tilde\mu(s) - \tilde\mu(t)| \le \kappa_H\, \gamma_{a,b}\, |s-t|^{\beta}.
\]
(ii)
\[
C_g(s,t) \le 2\kappa_H\, \sigma(t)\sigma(s)\,\{|t|^{\beta} + |s|^{\beta} + |t-s|^{\beta}\}.
\]
(iii)
\[
D_g(s,t) = E\big([g(t) - g(s)]^2\big)^{1/2} \le \kappa_1 |t-s|^{\beta/2}.
\]

Proof. All three conclusions follow from simple algebraic manipulations. The details are omitted.


Proposition 6.10. For any $v > 0$, there exist $\varepsilon_*$ and $\delta_2 > 0$ such that, for all $t \in T$, $y^{-\beta/d} \le \varepsilon_* (a/b)^{2+v}$, $a$ sufficiently small, and $z > 0$,
\[
P\big(m(A_{\gamma_{a,b}})^{-1} > y,\ m(A_b) > 0 \,\big|\, f(t) = \gamma_{a,b} + z/\gamma_{a,b}\big) \le \exp\big(-\delta_2\, a^2 y^{\beta/d}/b^2\big).
\]

Proof. For notational simplicity, and without loss of generality, we assume that $t = 0$. First consider the case $z \ge 1$. On the event $\{m(A_{\gamma_{a,b}} \cap T)^{-1} > y\}$ the excursion set has volume smaller than $1/y$, so a ball centered at the origin of radius proportional to $y^{-1/d}$ cannot be contained in it, and, since $\tilde f(0) = \gamma_{a,b} + z/\gamma_{a,b}$, the conditional field must drop below $\gamma_{a,b}$ within that ball. Hence there exist $c_1, c_2$ such that, for all $c_2 y^{-\beta/d} < b^{-2-v}$ and $z \ge 1$,
\[
P\big(m(A_{\gamma_{a,b}} \cap T)^{-1} > y,\ m(A_b) > 0 \,\big|\, f(0) = \gamma_{a,b} + z/\gamma_{a,b}\big)
\le P\Big(\inf_{|t| \le c_1 y^{-1/d}} \tilde f(t) \le \gamma_{a,b}\Big)
\le P\Big(\sup_{|s-t| \le c_1 y^{-1/d}} |g(s) - g(t)| \ge a/b\Big).
\]
Consider the new field $\xi(s,t) = g(s) - g(t)$ with parameter space $T \times T$. Note that
\[
\mathrm{Var}(\xi(s,t)) = D_g(s,t)^2 \le \kappa_1^2\, |s-t|^{\beta}.
\]
Via basic algebra, it is not hard to show that the entropy of $\xi(s,t)$ is bounded by $N(\epsilon) \le c_3\, m(T\times T)\, \epsilon^{-4d/\beta}$. In addition, from (i) of Lemma 6.9, we have $|\tilde\mu(s) - \tilde\mu(t)| \le \kappa_H\, \gamma_{a,b}\, |s-t|^{\beta}$.


An application of Theorem 6.7 and the Borell-TIS inequality (Theorem 6.8) to $\xi$, restricted to the set $\{(s,t) : |s-t| \le c_1 y^{-1/d}\}$, then gives the stated bound for $z \ge 1$. Similarly, for some $\varepsilon_* > 0$ and all $y^{-\beta/d} \le \varepsilon_* (a/b)^{2+v}$, $a$ sufficiently small and $z \in (0,1)$,
\[
P\big(m(A_{\gamma_{a,b}})^{-1} > y,\ m(A_b) > 0 \,\big|\, f(0) = \gamma_{a,b} + z/\gamma_{a,b}\big)
\le P\Big(\sup_{|s-t| \le c_1 y^{-1/d}} |g(s) - g(t)| \ge \frac{a}{2b}\Big)
\le \exp\Big(-\frac{a^2 y^{\beta/d}}{c_5 b^2}\Big).
\]
Combining the two cases $z \ge 1$ and $z \in (0,1)$, and choosing $c_5$ large enough, we have
\[
P\big(m(A_{\gamma_{a,b}} \cap T)^{-1} > y,\ m(A_b) > 0 \,\big|\, f(0) = \gamma_{a,b} + z/\gamma_{a,b}\big)
\le \exp\Big(-\frac{a^2 y^{\beta/d}}{c_5 b^2}\Big),
\]
for $a$ small enough and $y^{-\beta/d} \le \varepsilon_* (a/b)^{2+v}$. Renaming the constants completes the proof.

The final ingredient needed for the proof of Proposition 6.4 is the following lemma, involving stochastic domination. The proof follows from an elementary argument and is therefore omitted.

Lemma 6.11. Let $v_1$ and $v_2$ be finite measures on $\mathbb{R}$, and define $\bar F_j(x) = \int_x^{\infty} v_j(ds)$. Suppose that $\bar F_1(x) \le \bar F_2(x)$ for each $x \ge x_0$. Let $(h(x) : x \ge x_0)$ be a non-decreasing, positive and bounded function. Then
\[
\int_{x_0}^{\infty} h(s)\, v_1(ds) \le \int_{x_0}^{\infty} h(s)\, v_2(ds).
\]

Proof of Proposition 6.4. Note that
\[
E_Q\Big[\frac{\exp(-M\, m(A_{\gamma_{a,b}})/m(T))}{m(A_{\gamma_{a,b}})};\ m(A_{\gamma_{a,b}}) \in (0,\delta];\ m(A_b) > 0\Big]
\]
\[
= \int_T \int_0^{\infty} E\Big[\frac{\exp(-M\, m(A_{\gamma_{a,b}})/m(T))}{m(A_{\gamma_{a,b}})};\ m(A_{\gamma_{a,b}}) \in (0,\delta];\ m(A_b) > 0 \,\Big|\, f(t) = \gamma_{a,b} + \frac{z}{\gamma_{a,b}}\Big]\,
P\big(\gamma_{a,b}[f(t) - \gamma_{a,b}] \in dz \,\big|\, f(t) > \gamma_{a,b}\big)\, P(\tau_{a,b} \in dt)
\]
\[
\le \sup_{z>0,\ t\in T} E\Big[\frac{\exp(-M\, m(A_{\gamma_{a,b}})/m(T))}{m(A_{\gamma_{a,b}})};\ m(A_{\gamma_{a,b}}) \in (0,\delta];\ m(A_b) > 0 \,\Big|\, f(t) = \gamma_{a,b} + \frac{z}{\gamma_{a,b}}\Big].
\]


Now define $Y(b/a) = \delta_2^{-d/\beta}(b/a)^{2d/\beta}\, W$, with $\delta_2$ as chosen in Proposition 6.10 and $W$ with distribution given by $P(W > x) = \exp(-x^{\beta/d})$. By Lemma 6.11,
\[
\sup_{t\in T} E\Big[\frac{\exp(-M\, m(A_{\gamma_{a,b}})/m(T))}{m(A_{\gamma_{a,b}})};\ m(A_{\gamma_{a,b}}) \in (0,\delta];\ m(A_b) > 0 \,\Big|\, f(t) = \gamma_{a,b} + \frac{z}{\gamma_{a,b}}\Big]
\le E\big[Y(b/a)\exp\big(-M Y(b/a)^{-1}/m(T)\big);\ Y(b/a) \ge 1\big].
\]
Now let $Z$ be exponentially distributed with mean 1 and independent of $Y(b/a)$. Then we have (using the definition of the tail distribution of $Z$ and Chebyshev's inequality)
\[
\exp\big(-M Y(b/a)^{-1}/m(T)\big) = P\big(Z > M Y(b/a)^{-1}/m(T) \,\big|\, Y(b/a)\big) \le \frac{m(T)\, Y(b/a)}{M}.
\]
Therefore,
\[
E\big[\exp\big(-M Y(b/a)^{-1}/m(T)\big)\, Y(b/a);\ Y(b/a) \ge 1\big]
\le \frac{m(T)}{M}\, E\big[Y(b/a)^2;\ Y(b/a) \ge 1\big]
\le \frac{m(T)}{M}\, \delta_2^{-2d/\beta}\Big(\frac{b}{a}\Big)^{4d/\beta} E(W^2),
\]
which completes the proof.

6.5. Proof of Proposition 6.5. We start with the following result of Tsirelson [30].

Theorem 6.12. Let $f$ be a continuous separable Gaussian process on a compact (in the canonical metric) domain $T$. Suppose that $\mathrm{Var}(f(t))$ is continuous and strictly positive for $t \in T$. Moreover, assume that $\mu = Ef$ is also continuous and $\mu(t) \le 0$ for all $t \in T$. Define
\[
\sigma_T^2 = \max_{t\in T} \mathrm{Var}(f(t)),
\]
and set $F(x) = P\{\max_{t\in T} f(t) \le x\}$. Then $F$ is continuously differentiable on $\mathbb{R}$. Furthermore, let $y$ be such that $F(y) > 1/2$ and define $\bar y$ by
\[
F(y) = \Phi(\bar y).
\]
Then, for all $x > y$,
\[
F'(x) \le \Big[1 - \Phi\Big(\bar y + \frac{x-y}{\sigma_T}\Big)\Big]\Big[\frac{(x-y)\,\bar y}{\sigma_T^2}\,(1 + 2\theta) + \frac{\bar y}{\sigma_T}\,(1 + \theta)\Big],
\]


where
\[
\theta = \theta(x) = \frac{x(x-y)}{\bar y^2}.
\]

    We can now prove the following lemma.

Lemma 6.13. There exists a constant $A \in (0, \infty)$, independent of $a$ and $b \ge 0$, such that
\[
(6.15)\qquad P\Big(\sup_{t\in T} f(t) \le b + a/b \,\Big|\, \sup_{t\in T} f(t) > b\Big)
\le aA\; \frac{P(\sup_{t\in T} f(t) > b - 1/b)}{P(\sup_{t\in T} f(t) > b)}.
\]

Proof. By subtracting $\inf_{t\in T}\mu(t) > -\infty$ and redefining the level $b$ to be $b - \inf_{t\in T}\mu(t)$, we may simply assume that $Ef(t) \le 0$, so that we can apply Theorem 6.12. Adopting the notation of Theorem 6.12, first pick $b_0$ large enough that $F(b_0) > 1/2$, and assume that $b \ge b_0 + 1$. Now let $y = b - 1/b$ and $F(y) = \Phi(\bar y)$. Note that there exists $\theta_0 \in (0, \infty)$ such that $\theta_0 b \le \bar y \le \theta_0^{-1} b$ for all $b \ge b_0$. This follows easily from the fact that
\[
\log P\Big(\sup_{t\in T} f(t) > x\Big) \ge \log\sup_{t\in T} P\{f(t) > x\} \sim -\frac{x^2}{2\sigma_T^2}.
\]
On the other hand, by Theorem 6.12, $F$ is continuously differentiable, and so
\[
(6.16)\qquad P\Big(\sup_{t\in T} f(t) < b + a/b \,\Big|\, \sup_{t\in T} f(t) > b\Big) = \frac{\int_b^{b+a/b} F'(x)\, dx}{P\{\sup_{t\in T} f(t) > b\}}.
\]
Moreover,
\[
F'(x) \le \Big[1 - \Phi\Big(\bar y + \frac{x-y}{\sigma_T}\Big)\Big]\Big[\frac{(x-y)\bar y}{\sigma_T^2}(1 + 2\theta(x)) + \frac{\bar y}{\sigma_T}(1 + \theta(x))\Big]
\le (1 - \Phi(\bar y))\Big[\frac{(x-y)\bar y}{\sigma_T^2}(1 + 2\theta(x)) + \frac{\bar y}{\sigma_T}(1 + \theta(x))\Big]
\]
\[
= P\Big(\max_{t\in T} f(t) > b - 1/b\Big)\Big[\frac{(x-y)\bar y}{\sigma_T^2}(1 + 2\theta(x)) + \frac{\bar y}{\sigma_T}(1 + \theta(x))\Big].
\]
Therefore,
\[
\int_b^{b+a/b} F'(x)\, dx \le P\Big(\sup_{t\in T} f(t) > b - 1/b\Big)\int_b^{b+a/b}\Big[\frac{(x-y)\bar y}{\sigma_T^2}(1 + 2\theta(x)) + \frac{\bar y}{\sigma_T}(1 + \theta(x))\Big] dx.
\]
Recalling that $\theta(x) = x(x-y)/\bar y^2$, we can use the fact that $\bar y \ge \theta_0 b$ to conclude that if $x \in [b, b + a/b]$ then $\theta(x) \le 2\theta_0^{-2}$, and therefore
\[
\int_b^{b+a/b}\Big[\frac{(x-y)\bar y}{\sigma_T^2}(1 + 2\theta(x)) + \frac{\bar y}{\sigma_T}(1 + \theta(x))\Big] dx \le A\, a,
\]
for a constant $A = A(\theta_0, \sigma_T) \in (0, \infty)$.


We thus obtain that
\[
(6.17)\qquad \frac{\int_b^{b+a/b} F'(x)\, dx}{P\{\sup_{t\in T} f(t) > b\}}
\le A\, a\; \frac{P\{\sup_{t\in T} f(t) > b - 1/b\}}{P\{\sup_{t\in T} f(t) > b\}}
\]
for any $b \ge b_0$. This inequality, together with the fact that $F$ is continuously differentiable on $(-\infty, \infty)$, yields the proof of the lemma for all $b \ge 0$.

The previous result translates a question that involves the conditional distribution of $\max_{t\in T} f(t)$ near $b$ into a question involving the tail distribution of $\max_{t\in T} f(t)$. The next result then provides a bound on this tail distribution.

Lemma 6.14. For each $v > 0$ there exists a constant $C(v) \in (0, \infty)$ (possibly depending on $v$ but otherwise independent of $b$) such that
\[
P\Big(\max_{t\in T} f(t) > b\Big) \le C(v)\, b^{2d/\beta + dv + 1}\max_{t\in T} P(f(t) > b)
\]
for all $b \ge 1$.

Proof. The proof of this result follows along the same lines as that of Theorem 2.6.2 in [6]. Consider an open cover $T = \bigcup_{i=1}^{N(\eta)} T_i(\eta)$, where $T_i(\eta) = \{s : |s - t_i| < \eta\}$. We choose the $t_i$ carefully, so that $N(\eta) = O(\eta^{-d})$ for $\eta$ arbitrarily small. Write $f(t) = g(t) + \mu(t)$, where $g(t)$ is a centered Gaussian random field, and note, using A2 and A3, that
\[
P\Big(\max_{t\in T_i(\eta)} f(t) > b\Big) \le P\Big(\max_{t\in T_i(\eta)} g(t) > b - \mu(t_i) - \kappa_H \eta^{\beta}\Big).
\]
Now we wish to apply the Borell-TIS inequality (Theorem 6.8) with $U = T_i(\eta)$, $f_0 = g$ and $d(s,t) = E^{1/2}([g(t) - g(s)]^2)$, which, as a consequence of A2 and A3, is bounded above by $C_0 |t-s|^{\beta/2}$ for some $C_0 \in (0, \infty)$. Thus, applying Theorem 6.7, we have that $E\max_{t\in T_i(\eta)} g(t) \le C_1\, \eta^{\beta/2}\sqrt{\log(1/\eta)}$ for some $C_1 \in (0, \infty)$. Consequently, the Borell-TIS inequality yields that there exists $C_2(v) \in (0, \infty)$ such that for all $b$ sufficiently large and $\eta$ sufficiently small we have
\[
P\Big(\max_{t\in T_i(\eta)} g(t) > b - \mu(t_i) - \kappa_H\eta^{\beta}\Big)
\le C_2(v)\exp\Big(-\frac{\big(b - \mu(t_i) - C\eta^{\beta/(2+v)}\big)^2}{2\sigma_{T_i}^2}\Big),
\]
where $\sigma_{T_i} = \max_{t\in T_i(\eta)} \sigma(t)$. Now select $v > 0$ small enough, and set $\eta^{\beta/(2+v)} = b^{-1}$. Straightforward calculations yield that
\[
P\Big(\max_{t\in T_i(\eta)} f(t) > b\Big) \le P\Big(\max_{t\in T_i(\eta)} g(t) > b - \mu(t_i) - \kappa_H\eta^{\beta}\Big)
\le C_3(v)\max_{t\in T_i(\eta)}\exp\Big(-\frac{(b - \mu(t))^2}{2\sigma(t)^2}\Big)
\]


for some $C_3(v) \in (0, \infty)$. Now recall the well known inequality (valid for $x > 0$)
\[
\varphi(x)\Big(\frac{1}{x} - \frac{1}{x^3}\Big) \le 1 - \Phi(x) \le \frac{\varphi(x)}{x},
\]
where $\varphi = \Phi'$ is the standard Gaussian density. Using this inequality, it follows that $C_4(v) \in (0, \infty)$ can be chosen so that
\[
\max_{t\in T_i(\eta)}\exp\Big(-\frac{(b - \mu(t))^2}{2\sigma(t)^2}\Big) \le C_4(v)\, b\max_{t\in T_i(\eta)} P(f(t) > b)
\]
for all $b \ge 1$. We then conclude that there exists $C(v) \in (0, \infty)$ such that
\[
P\Big(\max_{t\in T} f(t) > b\Big) \le N(\eta)\, C_4(v)\, b\max_{t\in T} P(f(t) > b)
\le C\eta^{-d}\, b\max_{t\in T} P(f(t) > b) = C\, b^{2d/\beta + dv + 1}\max_{t\in T} P(f(t) > b),
\]
giving the result.
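The classical Gaussian tail bounds invoked in the last step are easy to verify numerically with only the standard library, writing $1 - \Phi(x) = \mathrm{erfc}(x/\sqrt{2})/2$:

```python
from math import erfc, exp, pi, sqrt

def phi(x):
    """Standard Gaussian density."""
    return exp(-x * x / 2.0) / sqrt(2.0 * pi)

def gauss_tail(x):
    """1 - Phi(x), computed via the complementary error function."""
    return 0.5 * erfc(x / sqrt(2.0))

def mills_bounds(x):
    """Return (lower, tail, upper) for the classical Mill's-ratio bounds, x > 0:
    phi(x)(1/x - 1/x**3) <= 1 - Phi(x) <= phi(x)/x."""
    return phi(x) * (1.0 / x - 1.0 / x ** 3), gauss_tail(x), phi(x) / x
```

For $x < 1$ the lower bound is negative and hence trivially valid; both bounds become sharp as $x \to \infty$, which is the regime used in the proof above.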

We can now complete the proof of Proposition 6.5.

Proof of Proposition 6.5. The result is a straightforward corollary of the previous two lemmas. By (6.15) in Lemma 6.13 and Lemma 6.14, there exists $\lambda \in (0, \infty)$ for which
\[
P\Big(\max_{t\in T} f(t) \le b + a/b \,\Big|\, \max_{t\in T} f(t) > b\Big)
\le aA\,\frac{P(\max_{t\in T} f(t) > b - 1/b)}{P(\max_{t\in T} f(t) > b)}
\le aCA\, b^{2d/\beta + dv + 1}\,\frac{\max_{t\in T} P(f(t) > b - 1/b)}{P(\max_{t\in T} f(t) > b)}
\]
\[
\le aCA\, b^{2d/\beta + dv + 1}\,\frac{\max_{t\in T} P(f(t) > b - 1/b)}{\max_{t\in T} P(f(t) > b)}
\le \lambda\, a\, b^{2d/\beta + dv + 1}.
\]
The last two inequalities follow from the obvious bound
\[
P\Big(\max_{t\in T} f(t) > b\Big) \ge \max_{t\in T} P(f(t) > b)
\]
and standard properties of the Gaussian distribution. This yields (6.12), from which the remainder of the proposition follows.


7. Fine Tuning: Twice Differentiable Homogeneous Fields. In the preceding section we constructed a polynomial time algorithm based on a randomized discretization scheme. Our goal in this section is to illustrate how to take advantage of additional information to further improve the running time and the efficiency of the algorithm. In order to illustrate our techniques we shall perform a more refined analysis in the setting of smooth and homogeneous fields, and shall establish optimality of the algorithm in a precise sense, to be described below. Our assumptions throughout this section are B1-B2 of Section 3.

Let $C(s-t) = \mathrm{Cov}(f(s), f(t))$ be the covariance function of $f$, which we assume also has mean zero. Note that it is an immediate consequence of homogeneity and differentiability that $\partial_i C(0) = \partial^3_{ijk} C(0) = 0$.

We shall need the following definition.

Definition 7.1. We call $T_\delta = \{t_1, \ldots, t_M\} \subset T$ a $\delta$-regular discretization of $T$ if, and only if,
\[
\min_{i\ne j} |t_i - t_j| \ge \delta, \qquad \sup_{t\in T}\min_{i} |t_i - t| \le 2\delta.
\]
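On a one-dimensional domain the two conditions of this definition can be verified directly; the sketch below is illustrative only, and represents the continuum $T$ by a dense sample of points:

```python
import numpy as np

def is_delta_regular(points, dense_T, delta):
    """Check Definition 7.1 on a one-dimensional domain: pairwise separation of
    the grid is at least delta, and every point of T (represented by the dense
    sample dense_T) lies within 2*delta of some grid point."""
    pts = np.asarray(points, dtype=float)
    dense = np.asarray(dense_T, dtype=float)
    gaps = np.abs(pts[:, None] - pts[None, :])
    np.fill_diagonal(gaps, np.inf)
    separated = gaps.min() >= delta
    covered = np.abs(dense[:, None] - pts[None, :]).min(axis=1).max() <= 2 * delta
    return bool(separated and covered)
```

A uniform grid of spacing $2\delta$ on $[0,1]$ passes both checks, while clustered point sets fail the separation condition and overly sparse ones fail the covering condition.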

Regularity ensures that points in the grid $T_\delta$ are well separated. Intuitively, since $f$ is smooth, having tight clusters of points translates into a waste of computing resources, as a result of sampling highly correlated values of $f$. Also, note that every region containing a ball of radius $2\delta$ has at least one representative in $T_\delta$. Therefore, $T_\delta$ covers the domain $T$ in an economical way. One technical convenience of $\delta$-regularity is that, for subsets $A \subset T$ of positive Lebesgue measure (in particular ellipsoids),
\[
\lim_{M\to\infty} \frac{\#(A \cap T_\delta)}{M} = \frac{m(A)}{m(T)},
\]
where here and throughout the remainder of the section $\#(A)$ denotes the cardinality of the set $A$.

Let $T_\delta = \{t_1, \ldots, t_M\}$ be a $\delta$-regular discretization of $T$, and consider
\[
X = (X_1, \ldots, X_M)^{\mathsf T} = (f(t_1), \ldots, f(t_M))^{\mathsf T}.
\]

We shall concentrate on estimating $w_M(b) = P(\max_{1\le i\le M} X_i > b)$. The next result (which we prove in Section 7.1) shows that if $\delta = \eta/b$ then the relative bias is $O(\sqrt{\eta})$.

Proposition 7.2. Suppose $f$ is a Gaussian random field satisfying Conditions B1 and B2. There exist $c_0, c_1, b_0$ and $\eta_0$ such that, for any finite $\eta/b$-regular discretization $T_\delta$ of $T$,
\[
(7.1)\qquad P\Big(\sup_{t\in T_\delta} f(t) < b \,\Big|\, \sup_{t\in T} f(t) > b\Big) \le c_0\sqrt{\eta},
\qquad \#(T_\delta) \le c_1\, m(T)\, \eta^{-d} b^{d},
\]
for all $\eta \in (0, \eta_0]$ and $b > b_0$.

Note that the bound on the bias obtained for twice differentiable fields is much sharper than that of the general Hölder continuous fields given by (6.13) in Theorem 6.6. This is partly because the conditional distribution of the random field around local maxima is harder to describe in the Hölder continuous case than in the case of twice differentiable fields. In addition to the sharper description of the bias, we shall also soon show, in Theorem 7.4, that our choice of discretization is optimal in a certain sense. Finally, we point out that the bound of $\sqrt{\eta}$ in the first term of (7.1) is not optimal. In fact, there seems to be some room for improvement, and we believe that a more careful analysis might yield a bound of the form $c_0\eta^2$.

We shall estimate $w_M(b)$ by using a slight variation of Algorithm 5.3. In particular, since the $X_i$ are now identically distributed, we redefine $Q$ to be
\[
(7.2)\qquad Q(X \in B) = \sum_{i=1}^{M} \frac{1}{M}\, P[X \in B \mid X_i > b - 1/b].
\]

Our estimator then takes the form
\[
(7.3)\qquad L_b = \frac{M\, P(X_1 > b - 1/b)}{\sum_{j=1}^{M} I(X_j > b - 1/b)}\; I\Big(\max_{1\le i\le M} X_i > b\Big).
\]
Clearly, we have that $E_Q(L_b) = w_M(b)$. (The reason for subtracting the factor of $1/b$ was explained in Section 6.1.)

Algorithm 7.3. Given a number of replications $n$ and an $\eta/b$-regular discretization $T_\delta$, the algorithm is as follows:

Step 1: Sample $X^{(1)}, \ldots, X^{(n)}$, i.i.d. copies of $X$, with distribution $Q$ given by (7.2).
Step 2: Compute and output
\[
\bar L_n = \frac{1}{n}\sum_{i=1}^{n} L_b^{(i)},
\qquad\text{where } L_b^{(i)} = \frac{M\, P(X_1 > b - 1/b)}{\sum_{j=1}^{M} I(X_j^{(i)} > b - 1/b)}\; I\Big(\max_{1\le j\le M} X_j^{(i)} > b\Big).
\]
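Because the $X_i$ are exchangeable here, the mixture index is uniform and only one marginal tail probability is needed. A sketch for a centered, unit-variance field on a fixed grid follows; `cov` is an arbitrary stand-in correlation matrix (not a covariance derived from the paper's assumptions), and the rejection step is again only a demo device:

```python
import numpy as np
from math import erfc, sqrt

def gauss_tail(x):
    """P(N(0,1) > x)."""
    return 0.5 * erfc(x / sqrt(2.0))

def algorithm_7_3(cov, b, n, rng):
    """Sketch of Algorithm 7.3 for X ~ N(0, cov) with unit variances; the
    threshold is shifted down by 1/b as in (7.2)-(7.3)."""
    M = cov.shape[0]
    g = b - 1.0 / b
    tail = gauss_tail(g)                         # same for every coordinate
    L = np.empty(n)
    for k in range(n):
        i = rng.integers(M)                      # uniform mixture index
        while True:                              # X_i | X_i > g (rejection demo)
            xi = rng.normal()
            if xi > g:
                break
        rest = [j for j in range(M) if j != i]
        c_ri = cov[rest, i]
        cond_mu = c_ri * xi
        cond_cov = cov[np.ix_(rest, rest)] - np.outer(c_ri, c_ri)
        x = np.empty(M)
        x[i] = xi
        x[rest] = rng.multivariate_normal(cond_mu, cond_cov)
        num_exceed = np.count_nonzero(x > g)
        L[k] = (M * tail / num_exceed) * (x.max() > b)   # estimator (7.3)
    return L.mean()
```

On an independent grid (identity correlation) the output can be checked against the closed form $w_M(b) = 1 - \Phi(b)^M$.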

Theorem 7.5 below guides the selection of $n$ in order to achieve a prescribed relative error. In particular, our analysis, together with considerations from Section 2, implies that choosing $n = O(\varepsilon^{-2}\rho^{-1})$ suffices to achieve $\varepsilon$ relative error with probability at least $1 - \rho$.

Algorithm 7.3 improves on Algorithm 6.1 for Hölder continuous fields in two important ways. The first is that it is possible to obtain information on the size of the relative bias of the estimator. In Proposition 7.2, we saw that in order to overcome the bias due to discretization, it suffices to take a discretization of size $M = \#(T_\delta) = O(b^d)$. That this selection is also asymptotically optimal, in the sense described in the next result, will be proven in Section 7.1.


Theorem 7.4. Suppose $f$ is a Gaussian random field satisfying Conditions B1 and B2. If $\kappa \in (0,1)$, then, as $b \to \infty$,
\[
\sup_{\mathcal T \subset T:\ \#(\mathcal T) \le b^{d\kappa}} P\Big(\max_{t\in \mathcal T} f(t) > b \,\Big|\, \sup_{t\in T} f(t) > b\Big) \to 0,
\]
where the supremum is taken over finite subsets $\mathcal T$ of $T$.

This result implies that the relative bias goes to 100% as $b \to \infty$ if one chooses a discretization scheme of size $O(b^{d\kappa})$ with $\kappa \in (0,1)$. Consequently, $d$ is the smallest power of $b$ that achieves any given bounded relative bias, and so the suggestion above of choosing $M = O(b^d)$ points for the discretization is, in this sense, optimal.

The second aspect of improvement involves the variance. In the case of Hölder continuous fields, the ratio of the second moment of the estimator to $w(b)^2$ was shown to be bounded by a quantity of order $O(M^2)$. In contrast, in the context of the smooth and homogeneous fields considered here, the next result shows that this ratio is bounded uniformly for $b > b_0$ and $M = \#(T_\delta) \le c\, b^d$. That is, the variance remains strongly controlled.

Theorem 7.5. Suppose f is a Gaussian random field satisfying conditions B1 and B2. Then there exist constants c ∈ (0, ∞), b0, and γ0 such that, for any γ/b-regular discretization T̄ of T,

sup_{b>b0, γ∈(0,γ0]} E^Q L_b² / P²( sup_{t∈T̄} f(t) > b ) ≤ c.
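A short Chebyshev computation, sketched below, is how a bound of this type translates into the rule n = O(ε^{−2}δ^{−1}) mentioned after Algorithm 7.3; this is only an outline of the considerations from Section 2, and it ignores the discretization bias handled by Proposition 7.2.

```latex
% Assume, as in Theorem 7.5, that E^Q L_b^2 \le c\,(E^Q L_b)^2 uniformly in b.
% For the average \bar L_n of n i.i.d. replications, Chebyshev's inequality gives
P\left( \left| \bar L_n - E^Q L_b \right| > \epsilon\, E^Q L_b \right)
   \;\le\; \frac{\operatorname{Var}_Q(L_b)}{n\,\epsilon^2\,(E^Q L_b)^2}
   \;\le\; \frac{c}{n\,\epsilon^2},
% which is at most \delta as soon as n \ge c\,\epsilon^{-2}\delta^{-1}.
```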

The proof of this result is given in Section 7.2. The fact that the number of replications remains bounded in b is a consequence of the strong control on the variance.

Finally, we note that the proof of Theorem 3.3 follows as a direct corollary of Theorem 7.5, together with Proposition 7.2 and our discussion in Section 2. Assuming that placing each point in T̄ takes no more than c units of computer time, the total complexity is, according to the discussion in Section 2, O(nM³ + M) = O(ε^{−2}δ^{−1}M³ + M). The contribution of the term M³ = O(ε^{−6d}b^{3d}) comes from the complexity of applying Cholesky factorization, and the term M = O(ε^{−2d}b^{d}) corresponds to the complexity of placing T̄.

Remark 7.6. Condition B2 imposes a convexity assumption on the boundary

of T. This assumption, although convenient in the development of the proofs of Theorems 7.4 and 7.5, is not necessary. The results can be generalized, at the expense of increasing the length and burden of the technical development, to the case in which T is a d-dimensional manifold satisfying the so-called Whitney conditions ([5]).

The remainder of this section is devoted to the proofs of Proposition 7.2, Theorem 7.4 and Theorem 7.5.


7.1. Bias Control: Proofs of Proposition 7.2 and Theorem 7.4. We start with some useful lemmas, for all of which we assume that Conditions B1 and B2 are satisfied. We shall also assume that the global maximum of f over T is achieved, with probability one, at a single point in T. Additional conditions under which this will happen can be found in [5], and require little more than the non-degeneracy of the joint distribution of f and its first and second order derivatives. Of these lemmas, Lemma 7.7, the proof of which we defer to Section 7.3, is central to much of what follows. However, before we state it, we take a moment to describe Palm measures, which may not be familiar to all readers.

7.1.1. Palm Distributions and Conditioning. It is well known that one needs to be careful treating the distributions of stochastic processes at random times. For a simple example, in the current setting, consider the behavior of a smooth stationary Gaussian process f on R along with its derivative f′. If t ∈ R is a fixed point, u > 0, and we are given that f(0) = 0 and f(t) = u, then the conditional distribution of f′(t) is still Gaussian, with parameters determined by the trivariate distribution of (f(0), f(t), f′(t)). However, if we are given that f(0) = 0, and that t > 0 is the first positive time that f(t) = u, then t is an upcrossing of the level u by f, and so f′(t) must be positive. Thus it cannot be Gaussian. The difference between the two cases lies in the fact that in the first case t is deterministic, while in the second it is random.
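This distinction is easy to see numerically. The sketch below (an illustration, not code from the paper; the covariance, grid, and level are arbitrary choices) simulates a smooth stationary Gaussian process with C(h) = exp(−h²) on a fine grid and compares finite-difference slopes at a fixed time with slopes collected at upcrossings of a level u: the former are symmetric about zero, while the latter are all positive and, by Rice's formula, size-biased.

```python
import numpy as np

def simulate_paths(n_paths, grid, rng):
    """Stationary Gaussian paths with covariance C(h) = exp(-h^2), sampled
    on a fixed grid via a (jittered) Cholesky factor of the covariance."""
    C = np.exp(-np.subtract.outer(grid, grid) ** 2)
    L = np.linalg.cholesky(C + 1e-8 * np.eye(len(grid)))
    return rng.standard_normal((n_paths, len(grid))) @ L.T

def upcrossing_slopes(paths, grid, u):
    """Finite-difference slopes taken exactly at upcrossings of the level u."""
    dt = grid[1] - grid[0]
    cross = (paths[:, :-1] < u) & (paths[:, 1:] >= u)
    return (paths[:, 1:] - paths[:, :-1])[cross] / dt

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 10.0, 1001)
paths = simulate_paths(400, grid, rng)
slopes_at_upcrossings = upcrossing_slopes(paths, grid, u=1.0)
slopes_at_fixed_t = (paths[:, 501] - paths[:, 500]) / (grid[1] - grid[0])
```

Every entry of `slopes_at_upcrossings` is positive by construction, and the empirical mean exceeds the mean absolute slope at a fixed time, reflecting the size-biased (Rayleigh, rather than Gaussian) law of the slope at an upcrossing.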

We shall require something similar, conditioning on the behavior of our (Gaussian) random fields in the neighborhood of local maxima. Since local maxima are random points, given their positions the distribution of the field is no longer stationary nor, once again, even Gaussian. We shall often assume that a local maximum is at the origin. This, however, amounts to saying that the point process induced by the set of local maxima is Palm stationary (as opposed to space stationary), and therefore we must then use the associated Palm distribution; the precise conditional distribution of the field given the value of the local maximum at the origin is given in Lemma 7.11.

The theory behind this goes by the name of horizontal-vertical window conditioning, and the resulting conditional distributions are known as Palm distributions. Standard treatments are given, for example, in [1, 6, 17, 18, 19]. To differentiate between regular and Palm conditioning, we shall denote the latter by P̃.

We can now set up two important lemmas which tell us about the behavior of f in the neighborhood of local and global maxima. Proofs are deferred until Section 7.3. First, we provide some notation.

Let L be the (random) set of local maxima of f. That is, for each s in the interior of T, s ∈ L if and only if

(7.4)   ∇f(s) = 0 and ∇²f(s) ∈ N,

where N is the set of negative definite matrices and ∇²f(s) is the Hessian matrix of f at s. For s ∈ ∂T, similar constraints apply and are described in the proof of Lemma 7.7. Then we have


Lemma 7.7. Let L be the set of local maxima of f. For any a0 > 0, there exist c, ε, b0, and γ0 (which depend on the choice of a0), such that for any s ∈ L, a ∈ (0, a0), γ ∈ (0, γ0), b > b0, and z > b + a/b,

P̃( min_{|t−s| ≤ γa/b} f(t) < z − a/b | f(s) = z ) ≤ c exp( −ε/γ² ).

Lemma 7.8. Let t* be the location of the unique global maximum of f over T. Then, for any γ ∈ (0, γ0) and b > b0,

(7.5)   P( min_{|t−t*| ≤ γa/b} f(t) < b − a/b | sup_{t∈T} f(t) > b + a/b ) ≤ 2c exp( −ε/γ² ).

7.1.2. Back to the Proofs. The following lemma gives a bound on the density of sup_{t∈T} f(t), which will be used to control the size of the overshoot beyond the level b.

Lemma 7.9. Let p_f(x) be the density function of sup_{t∈T} f(t). Then there exist constants c_f and b0 such that

p_f(x) ≤ c_f x^{d+1} P( f(0) > x ),   for all x > b0.

Proof. Recalling (1.3), let the continuous function p_E(x), x ∈ R, be defined by the relationship

E( χ( {t ∈ T : f(t) ≥ b} ) ) = ∫_b^∞ p_E(x) dx,

where the left hand side is the expected value of the Euler-Poincaré characteristic of A_b. Then, according to Theorem 8.10 in [8], there exist c and ε such that

|p_E(x) − p_f(x)| < c P( f(0) > (1 + ε)x ),   for all x > 0.

In addition, thanks to the result of [5], which provides ∫_b^∞ p_E(x) dx in closed form, there exists c0 such that, for all x > 1,

p_E(x) < c0 x^{d+1} P( f(0) > x ).

Hence, there exists c_f such that

p_f(x) ≤ c0 x^{d+1} P( f(0) > x ) + c P( f(0) > (1 + ε)x ) ≤ c_f x^{d+1} P( f(0) > x ),

for all x > 1.

The last ingredients required for the proofs of Proposition 7.2 and Theorem 7.4 are stated in the following result, adapted from Lemma 6.1 and Theorem 7.2 in [22] to the twice differentiable case.


Theorem 7.10. There exists a constant H (depending on the covariance function C) such that

(7.6)   P( sup_{t∈T} f(t) > b ) = (1 + o(1)) H m(T) b^d P( f(0) > b ),

as b → ∞. Similarly, choose η small enough so that [0, η]^d ⊂ T, and let Λ0 = [0, ηb^{−1}]^d. Then there exists a constant H1 such that

(7.7)   P( sup_{t∈Λ0} f(t) > b ) = (1 + o(1)) H1 P( f(0) > b ),

as b → ∞.
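For intuition, the d = 1 analogue of (7.6) can be checked numerically. For a smooth, unit-variance stationary process on an interval of length S, Rice's formula gives P(sup f > b) ≈ P(f(0) > b) + (S√λ2/2π) e^{−b²/2} with λ2 = −C′′(0), and the right-hand side is of order b·P(f(0) > b), matching the b^d factor in (7.6). The sketch below (illustrative only; it is not the algorithm of this paper) compares this approximation with a grid-based Monte Carlo estimate for C(h) = exp(−h²), for which λ2 = 2.

```python
import numpy as np
from math import erf, exp, pi, sqrt

def excursion_prob_mc(b, length, n_grid, n_paths, rng):
    """Grid-based Monte Carlo estimate of P(sup_[0,length] f > b) for a
    stationary Gaussian process with covariance C(h) = exp(-h^2)."""
    grid = np.linspace(0.0, length, n_grid)
    C = np.exp(-np.subtract.outer(grid, grid) ** 2)
    L = np.linalg.cholesky(C + 1e-8 * np.eye(n_grid))  # jitter for stability
    sups = (rng.standard_normal((n_paths, n_grid)) @ L.T).max(axis=1)
    return (sups > b).mean()

def rice_approx(b, length, lam2=2.0):
    """One-term approximation P(f(0) > b) + E[# upcrossings of b] from
    Rice's formula; lam2 = -C''(0) = 2 for C(h) = exp(-h^2)."""
    tail = 0.5 * (1.0 - erf(b / sqrt(2.0)))
    return tail + length * sqrt(lam2) / (2.0 * pi) * exp(-b * b / 2.0)

rng = np.random.default_rng(2)
b, length = 2.5, 4.0
mc = excursion_prob_mc(b, length, n_grid=401, n_paths=20000, rng=rng)
approx = rice_approx(b, length)
```

The two numbers should agree closely at this moderate level; the Monte Carlo estimate is biased slightly downward because the grid misses excursions between its points, which is precisely the effect that Proposition 7.2 and Theorem 7.4 quantify in general.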

We are now ready to provide the proofs of Proposition 7.2 and Theorem 7.4.

Proof of Proposition 7.2. The fact that there exists c1 such that

#(T̄) ≤ c1 m(T) γ^{−d} b^d

is immediate from assumption B2. Therefore, we proceed to provide a bound for the relative bias. Note firstly that elementary conditional probability manipulations yield that, for any γ > 0,

P( sup_{t∈T̄} f(t) < b | sup_{t∈T} f(t) > b )
≤ P( sup_{t∈T} f(t) < b + 2√γ/b | sup_{t∈T} f(t) > b ) + P( sup_{t∈T̄} f(t) < b | sup_{t∈T} f(t) > b + 2√γ/b ).

By (7.6) and Lemma 7.9, there exists c2 such that, for large enough b, the first term above can be bounded by

P( sup_{t∈T} f(t) < b + 2√γ/b | sup_{t∈T} f(t) > b ) ≤ c2 √γ.

Now take γ small enough that √γ < γ0, where γ0 is as in Lemmas 7.7 and 7.8. Then, applying (7.5) with a = √γ, the second term can be bounded by

P( sup_{t∈T̄} f(t) < b | sup_{t∈T} f(t) > b + 2√γ/b )
≤ P( min_{|t−t*| ≤ γ/b} f(t) < b | sup_{t∈T} f(t) > b + 2√γ/b )
≤ 2c exp( −ε/γ ).

Hence, there exists a c0 such that

P( sup_{t∈T̄} f(t) < b | sup_{t∈T} f(t) > b ) ≤ c2 √γ + 2c e^{−ε/γ} ≤ c0 √γ,

for all γ ∈ (0, γ0).


Proof of Theorem 7.4. Write α = (1 − β)/3, so that β = 1 − 3α and α ∈ (0, 1/3). First note that, by (7.6),

P( sup_{t∈T} f(t) > b + b^{2α−1} | sup_{t∈T} f(t) > b ) → 0,

as b → ∞. Let t* be the position of the global maximum of f in T. According to the exact Slepian model in Section 7.3, and by an argument similar to the proofs of Lemmas 7.7 and 7.8,

(7.8)   P( sup_{|t−t*| > b^{2α−1}} f(t) > b | b < f(t*) ≤ b + b^{2α−1} ) → 0,

as b → ∞. Consequently,

P( sup_{|t−t*| > b^{2α−1}} f(t) > b | sup_{t∈T} f(t) > b ) → 0.

Let

B( T̄, b^{2α−1} ) = ∪_{t∈T̄} B( t, b^{2α−1} ).

We have

P( sup_{t∈T̄} f(t) > b | sup_{t∈T} f(t) > b )
≤ P( sup_{|t−t*|>b^{2α−1}} f(t) > b | sup_{t∈T} f(t) > b ) + P( sup_{t∈T̄} f(t) > b, sup_{|t−t*|>b^{2α−1}} f(t) ≤ b | sup_{t∈T} f(t) > b )
≤ o(1) + P( t* ∈ B( T̄, b^{2α−1} ) | sup_{t∈T} f(t) > b )
≤ o(1) + P( sup_{t∈B(T̄, b^{2α−1})} f(t) > b | sup_{t∈T} f(t) > b ).

Since #(T̄) ≤ b^{(1−3α)d}, we can find a finite set T′ = {t1, . . . , tl} ⊂ T, and cubes Λk = tk + [0, ηb^{−1}]^d, such that l = O(b^{(1−α)d}) and B( T̄, b^{2α−1} ) ⊂ ∪_{k=1}^{l} Λk. The choice of l depends only on #(T̄), not on the particular positions of the points of T̄. Therefore, applying (7.7),

sup_{#(T̄) ≤ b^{βd}} P( sup_{t∈B(T̄, b^{2α−1})} f(t) > b ) ≤ O( b^{(1−α)d} ) P( f(0) > b ).

This, together with (7.6), yields

sup_{#(T̄) ≤ b^{βd}} P( sup_{t∈B(T̄, b^{2α−1})} f(t) > b | sup_{t∈T} f(t) > b ) ≤ O( b^{−αd} ) = o(1),

for b ≥ b0, which clearly implies the statement of the result.


7.2. Variance Control: Proof of Theorem 7.5. We proceed directly to the proof of Theorem 7.5.

Proof of Theorem 7.5. Note that

E^Q L_b² / P²( sup_{t∈T} f(t) > b )
= E( L_b ) / P²( sup_{t∈T} f(t) > b )
= E[ M P(X_1 > b − 1/b) / ∑_{j=1}^{M} 1(X_j > b − 1/b) ; max_j X_j > b ] / P²( sup_{t∈T} f(t) > b )
= E[ M P(X_1 > b − 1/b) / ∑_{j=1}^{M} 1(X_j > b − 1/b) ; max_j X_j > b, sup_{t∈T} f(t) > b ] / P²( sup_{t∈T} f(t) > b )
= E[ M P(X_1 > b − 1/b) 1(max_j X_j > b) / ∑_{j=1}^{M} 1(X_j > b − 1/b) | sup_{t∈T} f(t) > b ] / P( sup_{t∈T} f(t) > b )
= E[ M 1(max_j X_j > b) / ∑_{j=1}^{M} 1(X_j > b − 1/b) | sup_{t∈T} f(t) > b ] · P(X_1 > b − 1/b) / P( sup_{t∈T} f(t) > b ).

The remainder of the proof involves showing that the last conditional expectation here is of order O(b^d). This, together with (7.6), will yield the result. Note that for any A(b, γ) such that A(b, γ) = Θ(M), uniformly over b and γ, we can write

(7.9)   E[ M 1(max_j X_j > b) / ∑_{j=1}^{M} 1(X_j > b − 1/b) | sup_{t∈T} f(t) > b ]
≤ E[ ( M / ∑_{j=1}^{M} 1(X_j > b − 1/b) ) 1( ∑_{j=1}^{M} 1(X_j > b − 1/b) ≥ M/A(b,γ) ) | sup_{t∈T} f(t) > b ]
+ E[ ( M / ∑_{j=1}^{M} 1(X_j > b − 1/b) ) 1( ∑_{j=1}^{M} 1(X_j > b − 1/b) < M/A(b,γ) ) | sup_{t∈T} f(t) > b ].

We shall select A(b, γ) appropriately in order to bound the expectations above. By


Lemma 7.8, for any 4γ ≤ γ0, there exist constants c and c′ ∈ (0, ∞), such that

c′ exp( −ε/γ² ) ≥ P( min_{|t−t*| ≤ 4γ/b} f(t) < b − 1/b | sup_{t∈T} f(t) > b )
≥ P( ∑_{j=1}^{M} 1(X_j > b − 1/b) ≤ c_d 4^d | sup_{t∈T} f(t) > b )
(7.10)   ≥ P( M / ∑_{j=1}^{M} 1(X_j > b − 1/b) ≥ b^d c / (c_d γ^d) | sup_{t∈T} f(t) > b ).

The first inequality is an application of Lemma 7.8. The second inequality is due to the fact that for any ball B of radius 4γ/b or larger, #(T̄ ∩ B) ≥ c_d 4^d for some c_d > 0. Inequality (7.10) implies that for all x such that c/γ0^d < x < c/[4^d γ^d], there exists ε > 0 such that

P( M / ∑_{j=1}^{M} 1(X_j > b − 1/b) ≥ b^d x | sup_{t∈T} f(t) > b ) ≤ c exp( −ε x^{2/d} ).

Now let A(b, γ) = b^d c/(4^d γ^d) and observe that, by the second result in (7.1), we have A(b, γ) = Θ(M) and, moreover, that there exists c3 such that the first term on the right hand side of (7.9) is bounded by

(7.11)   E[ ( M / ∑_{j=1}^{M} 1(X_j > b − 1/b) ) 1( ∑_{j=1}^{M} 1(X_j > b − 1/b) ≥ M/A(b,γ) ) | sup_{t∈T} f(t) > b ] ≤ c3 b^d.

Now we turn to the second term on the right hand side of (7.9). We use the fact that M/A(b, γ) ≤ c̄ ∈ (0, ∞) (uniformly as b → ∞ and γ → 0). There exist c4 and c5 such that, for γ ≤ γ0/c4,

(7.12)   E[ ( M / ∑_{j=1}^{M} 1(X_j > b − 1/b) ) 1( ∑_{j=1}^{M} 1(X_j > b − 1/b) < M/A(b,γ) ) | sup_{t∈T} f(t) > b ]
≤ M P( ∑_{j=1}^{M} 1(X_j > b − 1/b) < c̄ | sup_{t∈T} f(t) > b )
≤ M P( min_{|t−t*| ≤ c4 γ/b} f(t) < b − 1/b | sup_{t∈T} f(t) > b )
≤ c1 m(T) b^d γ^{−d} exp( −ε/(c4² γ²) ) ≤ c5 b^d.

The second inequality holds from the fact that if ∑_{j=1}^{M} 1(X_j > b − 1/b) is less than c̄, then the minimum of f in a ball around the global maximum t* and of radius c4γ/b must be less than b − 1/b; otherwise, there would be more than c̄ elements of T̄ inside such a ball. The last inequality is due to Lemma 7.8 and Proposition 7.2.


Putting (7.11) and (7.12) together we obtain, for all γ/b-regular discretizations with γ < γ0′ = min(1/4, 1/c4) γ0,

E^Q L_b² / P²( sup_{t∈T} f(t) > b )
= E[ M 1(max_j X_j > b) / ∑_{j=1}^{M} 1(X_j > b − 1/b) | sup_{t∈T} f(t) > b ] · P(X_1 > b − 1/b) / P( sup_{t∈T} f(t) > b )
≤ (c3 + c5) b^d P(X_1 > b − 1/b) / P( sup_{t∈T} f(t) > b ).

Applying now (7.6) and Proposition 7.2, which gives

P( sup_{t∈T̄} f(t) > b ) ≥ P( sup_{t∈T} f(t) > b ) (1 − c0 √γ),

we have

sup_{b>b0, γ∈(0,γ0′]} E^Q L_b² / P²( sup_{t∈T̄} f(t) > b ) < ∞,

as required.

7.3. Remaining proofs. We start with the proof of Lemma 7.7. Without loss of generality, we assume that the random field of that result has mean zero and unit variance. However, before getting into the details of the proof of Lemma 7.7, we need a few additional lemmas, for which we adopt the following notation. Let C_i and C_ij be the first and second order derivatives of C, and define the vectors

μ1(t) = ( C1(t), . . . , Cd(t) ),   μ2(t) = vech( (C_ij(t), i = 1, . . . , d, j = i, . . . , d) ).

Let ∇f(0) and ∇²f(0) be the gradient and the vector of second order derivatives of f at 0, where ∇²f(0) is arranged in the same order as μ2(0). Furthermore, let μ02 = μ20ᵀ be a vector of second order spectral moments and μ22 a matrix of fourth order spectral moments. The vectors μ02 and μ22 are arranged so that

( 1     0    μ02 )
( 0     Λ    0   )
( μ20   0    μ22 )

is the covariance matrix of (f(0), ∇f(0), ∇²f(0)), where Λ = ( −C_ij(0) ). Finally, let

Γ20 = μ22 − μ20 μ02

be the conditional variance of ∇²f(0) given f(0). The following lemma, given in [6], provides a stochastic representation of f given that it has a local maximum at level u at the origin. We emphasize that, as described above, the conditioning here is in the sense of Palm distributions. The resultant conditional, or model, process (7.13) is generally called a Slepian process.


Lemma 7.11. Given that f has a local maximum with height u at zero (an interior point of T), the conditional field is equal in distribution to

(7.13)   f_u(t) = uC(t) − W_uᵀ ς(t) + g(t).

Here g(t) is a centered Gaussian random field with covariance function

Σ(s, t) = C(s − t) − ( C(s), μ2ᵀ(s) ) [ 1 μ02 ; μ20 μ22 ]⁻¹ ( C(t), μ2ᵀ(t) )ᵀ − μ1ᵀ(s) Λ⁻¹ μ1(t),

and W_u is a d(d+1)/2-dimensional random vector, independent of g(t), with density function

(7.14)   φ_u(w) ∝ |det( r(w) − uΛ )| exp( −½ wᵀ Γ20⁻¹ w ) 1( r(w) − uΛ ∈ N ),

where r(w) is a d × d symmetric matrix whose upper triangular elements consist of the components of w. The set of negative definite matrices is denoted by N. Finally, ς(t) is defined by

( α(t), ς(t) ) = ( C(t), μ2ᵀ(t) ) [ 1 μ02 ; μ20 μ22 ]⁻¹.

The following two technical lemmas, which we shall prove after completing the proof of Lemma 7.7, provide bounds for the last two terms of (7.13).

Lemma 7.12. Using the notation in (7.13), there exist γ0, ε1, c1, and b0 such that, for any u > b > b0, a ∈ (0, a0), and γ ∈ (0, γ0),

P( sup_{|t| ≤ γa/b} |W_uᵀ ς(t)| > a/(4b) ) ≤ c1 exp( −ε1 b²/γ⁴ ).

Lemma 7.13. There exist c, ε, and γ0 such that, for any γ ∈ (0, γ0),

P( max_{|t| ≤ γa/b} |g(t)| > a/(4b) ) ≤ c exp( −ε/γ² ).

Proof of Lemma 7.7. Using the notation of Lemma 7.11, given any s ∈ L for which f(s) = b, we have that the corresponding conditional distribution of f(s + ·) is that of f_b(·). Consequently, it suffices to show that

P( min_{|t| ≤ γa/b} f_b(t) < b − a/b ) ≤ c exp( −ε/γ² ).

We consider firstly the case for which the local maximum is in the interior of T. Then, by the Slepian model (7.13),

f_b(t) = bC(t) − W_bᵀ ς(t) + g(t).


We study the three terms of the Slepian model individually. Since

C(t) = 1 − ½ tᵀΛt + o(|t|²),

there exists a γ0 such that

bC(t) ≥ b − a/(4b),

for all |t| < γ0 a/b. According to Lemmas 7.12 and 7.13, for γ ∈ (0, γ0),

P( min_{|t| ≤ γa/b} f_b(t) < b − a/b ) ≤ c exp( −ε/γ² ) + c1 exp( −ε1 b²/γ⁴ ) ≤ c′ exp( −ε′/γ² ),

for some c′ and ε′.

    for somec and.Now consider the case for which the local maximum is in the (d1)-dimensional

    boundary ofT. Due to convexity ofTwe can assume, without loss of generality,that the tangent space of T is generated by/t2, . . . , / td, the local maximumis located at the origin, and T is a subset of the positive half-plane t1 0. Thatthese arguments to not involve a loss of generality follows from the arguments onpages 291192 of[5], which rely on the assumed stationarity off(for translations)and the fact that rotations, while changing the distributions, will not affect theprobabilities that we are currently computing.)

    For the origin, positioned as just described, to be a local maximum it is necessaryand sufficient that the gradient offrestricted to Tis the zero vector, the Hessianmatrix restricted to T is negative definite, and 1f(0) 0. Applying a version ofLemma7.11for this case, conditional on f(0) =u and 0 being a local maximum,the field is equal in distribution to

    (7.15) uC(t) Wu(t) + 1(t)1 (Z, 0, ..., 0)T + g(t),whereZ0 corresponds to 1f(0) and it follows a truncated (conditional on thenegative axis) Gaussian random variable with mean zero and a variance parameterwhich is computed as the conditional variance of1f(0) given (2f(0) ,...,df(0)).

    The vector

    Wu is a (d(d + 1)/2)-dimensional random vector with density function

    u(w) |det(r(w) u)| exp(12

    w120w) (w u N),

    where is the second spectral moment of f restricted to T and r(w) is the(d 1) (d 1) symmetric matrix whose upper triangular elements consist of the


components of w. In the representation (7.15), the vectors W̃_u and Z are independent. As in the proof of Lemma 7.12, one can show that a similar bound holds for the second term, albeit with different constants. Thus, since μ1(t) = O(|t|), there exist c and ε such that the third term in (7.15) can be bounded by

P( max_{|t| ≤ γa/b} |μ1ᵀ(t) Λ⁻¹ (Z, 0, . . . , 0)ᵀ| ≥ a/(4b) ) ≤ c exp( −ε/γ² ).

Consequently, we can also find c and ε such that the conclusion holds, and we are done.

Proof of Lemma 7.8. Recall that t* is the unique global maximum of f in T. Writing Palm probabilities as a ratio of expectations, as explained in Section 7.1.1, and using the fact that t* ∈ L, we immediately have

(7.16)   P( min_{|t−t*| ≤ γa/b} f(t) < b − a/b | sup_{t∈T} f(t) > b + a/b )
≤ E( #{ s ∈ L : min_{|t−s| ≤ γa/b} f(t) < b − a/b, f(s) > b + a/b } ) / E( #{ s ∈ L : f(s) > b + a/b, s = t* } ).

Writing N_b = #{ s ∈ L : f(s) > b + a/b }, it is standard fare that, for the random fields of the kind we are treating,

E( N_b ) = (1 + o(1)) P( N_b = 1 )

for large b (e.g., Chapter 6 of [1] or Chapter 5 of [6]). Therefore, for b large enough,

E( #{ s ∈ L : f(s) > b + a/b } ) / E( #{ s ∈ L : f(s) > b + a/b, s = t* } )


and W_u and g are independent, with ς(0) = 0. Furthermore, since C(t) = 1 + O(|t|²) and μ2(t) = μ2(0) + O(|t|²), there exists a c0 such that |ς(t)| ≤ c0|t|². In addition, W_u has density function

φ_u(w) ∝ |det( r(w) − uΛ )| exp( −½ wᵀ Γ20⁻¹ w ) 1( r(w) − uΛ ∈ N ).

Note that det( r(w) − uΛ ) is expressible as a polynomial in w and u, and there exist some ε0 and c such that

|det( r(w) − uΛ )| / |det( uΛ )| ≤ c   if |w| ≤ ε0 u.

Hence, there exist ε2, c2 > 0, such that

φ_u(w) ≤ φ*(w) := c2 exp( −(1/(2ε2²)) wᵀ Γ20⁻¹ w ),

for all u ≥ 1. The right hand side here is proportional to a multivariate Gaussian density. Thus,

P( |W_u| > x ) = ∫_{|w|>x} φ_u(w) dw ≤ ∫_{|w|>x} φ*(w) dw = c3 P( |W*| > x ),

where W* is a multivariate Gaussian random variable with density function proportional to φ*. Therefore, by choosing ε1 and c1 appropriately, we have

P( sup_{|t| ≤ γ/b} |W_uᵀ ς(t)| > 1/(4b) ) ≤ P( |W_u| > b/(4c0 γ²) ) ≤ c1 exp( −ε1 b²/γ⁴ ),

for all u ≥ b.

Proof of Lemma 7.13. Once again, it suffices to prove the lemma for the case a = 1. Since

f_b(0) = b = b − W_bᵀ ς(0) + g(0),

the covariance function ( Σ(s, t) : s, t ∈ T ) of the centered field g satisfies Σ(0, 0) = 0. It is also easy to check that

∂_s Σ(s, t) = O( |s| + |t| ),   ∂_t Σ(s, t) = O( |s| + |t| ).

Consequently, there exists a constant c ∈ (0, ∞) for which

Σ(s, t) ≤ c( |s|² + |t|² ),   Σ(s, s) ≤ c|s|².

We need to control the tail probability of sup_{|t| ≤ γ/b} |g(t)|. For this it is useful to introduce the following scaling. Define

g̃(t) = (b/γ) g( γt/b ).


Then sup_{|t| ≤ γ/b} |g(t)| ≥ 1/(4b) if and only if sup_{|t| ≤ 1} |g̃(t)| ≥ 1/(4γ). Let

Σ̃(s, t) = E( g̃(s) g̃(t) ).

Then

sup_s Σ̃(s, s) ≤ c.

Because Σ(s, t) is at least twice differentiable, applying a Taylor expansion we easily see that the canonical metric d_{g̃} corresponding to g̃(s) (cf. Theorem 6.7) can be bounded as follows:

d²_{g̃}(s, t) = E( g̃(s) − g̃(t) )² = (b²/γ²) [ Σ( γs/b, γs/b ) + Σ( γt/b, γt/b ) − 2Σ( γs/b, γt/b ) ] ≤ c |s − t|²,

for some constant c ∈ (0, ∞). Therefore, the entropy of g̃, evaluated at η, is bounded by Kη^{−d} for any η > 0 and an appropriate choice of K > 0. Therefore, for all γ < γ0,

P( sup_{|t| ≤ γ/b} |g(t)| ≥ 1/(4b) ) = P( sup_{|t| ≤ 1} |g̃(t)| ≥ 1/(4γ) ) ≤ c_d γ^{−d} exp( −1/(16cγ²) ),

for some constants c_d and γ0 > 0. The last inequality is a direct application of Theorem 4.1.1 of [5]. The conclusion of the lemma follows immediately by choosing c and ε appropriately.

8. Numerical Examples. In this section, we provide four examples which indicate how well the techniques we have suggested actually work in practice.

The first treats a random field for which the tail probability is available in closed form. This is simply to conf