
Weak convergence of the tail empirical process for

dependent sequences

Holger Rootzén

Chalmers University of Technology†

Abstract

This paper proves weak convergence in $D$ of the tail empirical process, the renormalized extreme tail of the empirical process, for a large class of stationary sequences. The conditions needed for convergence are (i) moment restrictions on the amount of clustering of extremes, (ii) restrictions on long range dependence (absolute regularity or strong mixing), and (iii) convergence of the covariance function. We further show how the limit process is changed if exceedances of a nonrandom level are replaced by exceedances of a high quantile of the observations. Weak convergence of the tail empirical process is one key to asymptotics for extreme value statistics and its wide range of applications, from geoscience to finance. Earlier (unpublished) versions of our results have already found significant application in estimation theory. They also give a theoretical basis for popular diagnostic plots in dependent cases.

1 Introduction

This paper gives a number of convergence results for the tail empirical function for dependent stationary sequences. In addition to theoretical interest, the motivation comes from semiparametric methods for extremes, such as the Peaks over Thresholds (PoT) method. In these the statistical analysis only uses the part of the observations which exceed some suitably chosen high level. The methods, in particular with a generalized Pareto assumption for the tail distribution, are finding significant application, and are getting a firmer theoretical foundation. There are several recent books on the subject (Coles (2001), Embrechts et al. (1997), Kotz and Nadarajah (2000), Kowaka (1994), Beirlant et al. (2005), Reiss and Thomas (2005)) and a large journal literature. The literature is complemented by a considerable body of software: for a review see Stephenson and Gilleland (2005).

Keywords and phrases: extremes, clustering of extremes, tail distribution function, absolute regularity, strong mixing.
AMS 2000 Classification: Primary 60G70; Secondary 60F17, 62G32.
†SE-419 62 Göteborg, SWEDEN, [email protected]
Research supported in part by the Swedish Foundation for Strategic Research.


Dependent observations are of basic interest in many classical application areas for extreme value statistics, e.g. geoscience and environmental science. Also, in the recent surge of interest in extreme values of financial data, dependence is the rule rather than the exception.

Our limit theorems provide a theoretical foundation for extreme value estimation theory, and for graphical diagnostics by e.g. probability and quantile plots. Results from an earlier unpublished version (Rootzen (1995)) have already found significant application in deriving asymptotic normality for PoT estimators in dependent cases, see Drees (2000, 2002, 2003). The latter of these papers also shows that neglecting dependence can lead to severe underestimation of variability. Besides Resnick and Starica (1998), who study estimation for some specific heavy-tailed models, the only other extreme value estimation results for dependent sequences are aimed at Hill or Hill-like estimators (Hsing (1991), Rootzen et al. (1992), Resnick and Starica (1995, 1997), Datta and McCormick (1998), Novak (2002), Hill (2006)).

To describe the results of this paper, let $\{\xi_i\}_{i=-\infty}^{\infty}$ be a stationary sequence with continuous marginal distribution function (d.f.) $F$. We throughout use the notation $\bar F(x) = 1 - F(x)$ for the tail d.f. Let $\{u_n\}_{n=1}^{\infty}$ be (high) levels and $\{\sigma_n > 0\}_{n=1}^{\infty}$ be norming constants. The tail function (or "conditional tail distribution function") is defined to be

$$T_n(x) = \frac{\bar F(u_n + x\sigma_n)}{\bar F(u_n)}, \qquad x \ge 0,\ n = 1, 2, \ldots.$$

In much of this paper we assume that the tail function converges to a generalized Pareto form, i.e.,

$$T_n(x) \to T(x) = \Bigl(1 + \gamma\,\frac{x}{\sigma}\Bigr)_+^{-1/\gamma}, \qquad x \ge 0,\ n \to \infty, \tag{1.1}$$

where $\sigma > 0$ and $\gamma \in (-\infty, \infty)$ are parameters of the limit and the subscript $+$ denotes "positive part". For $\gamma = 0$ we interpret $T(x)$ to be the limit, $e^{-x/\sigma}$, as $\gamma \to 0$. The special case of uniform distributions is considered separately in Section 5, for use as a technical tool here. It can also be useful for situations where the Generalized Pareto assumption isn't satisfied.
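For concreteness, a small numerical sketch of the limit tail function $T$, with the $\gamma = 0$ case handled by its limit $e^{-x/\sigma}$; the function name and interface here are illustrative only, not part of the paper:

```python
import numpy as np

def gpd_tail(x, gamma, sigma):
    """Generalized Pareto tail function T(x) = (1 + gamma*x/sigma)_+^(-1/gamma)
    for x >= 0, interpreted as exp(-x/sigma) when gamma == 0."""
    x = np.asarray(x, dtype=float)
    if gamma == 0.0:
        return np.exp(-x / sigma)
    return np.maximum(1.0 + gamma * x / sigma, 0.0) ** (-1.0 / gamma)
```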

Let $x_T$ be the right endpoint of the support of $T$, i.e., $x_T = \sup\{x;\ T(x) > 0\}$, so that $x_T = \infty$ for $\gamma \ge 0$, and $x_T = \sigma/|\gamma|$ for $\gamma < 0$. The tail empirical distribution function and the tail empirical process are defined as

$$\tilde T_n(x) = \frac{1}{n\bar F(u_n)} \sum_{i=1}^{n} 1\{\xi_i > u_n + x\sigma_n\}$$

and

$$e(\tilde T_n)(x) = \sqrt{n\bar F(u_n)}\,\bigl(\tilde T_n(x) - T_n(x)\bigr),$$

respectively, with $1\{\cdot\}$ denoting the indicator function which is one if the event in curly brackets occurs, and zero otherwise.
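A minimal computational sketch of these two definitions on a grid of $x$-values; it assumes the tail probability $\bar F(u_n)$ and the true tail function $T_n$ are available, which is an illustration-only simplification (in practice they are unknown), and the function names are not from the paper:

```python
import numpy as np

def tail_empirical_df(xi, u_n, sigma_n, F_bar_un, x_grid):
    """Tail empirical d.f.: (n * F_bar(u_n))^(-1) * #{i : xi_i > u_n + x*sigma_n}."""
    xi = np.asarray(xi, dtype=float)
    n = len(xi)
    counts = np.array([(xi > u_n + x * sigma_n).sum() for x in x_grid])
    return counts / (n * F_bar_un)

def tail_empirical_process(xi, u_n, sigma_n, F_bar_un, T_n_true, x_grid):
    """e(T_tilde)(x) = sqrt(n * F_bar(u_n)) * (T_tilde(x) - T_n(x)),
    where T_n_true is the true tail function evaluated on the same grid."""
    n = len(xi)
    T_tilde = tail_empirical_df(xi, u_n, sigma_n, F_bar_un, x_grid)
    return np.sqrt(n * F_bar_un) * (T_tilde - np.asarray(T_n_true, dtype=float))
```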

Let $D(I)$ be the space of real functions on the (finite or infinite) interval $I$ which are right continuous and have left limits at each point,


equipped with the Lindvall-Stone extension of the Skorokhod $J_1$-topology; see Pollard (1984).

The first results of this paper, Theorems 2.1 and 2.2, are that the tail empirical process converges in $D([0, x_T))$ to a continuous Gaussian process. The result requires three kinds of assumptions beyond (1.1). We now briefly discuss these. A more detailed discussion is given in Section 4.

Often extremes in dependent sequences come in small clusters (in contrast to independent sequences, where extremes tend to be isolated from one another). The first kind of assumption restricts the size of clusters of large values by assuming that they have a suitably bounded $p$-th moment. The second assumption makes clusters which are far apart asymptotically independent. For this long range dependence restriction we use either absolute regularity (sometimes also called $\beta$-mixing) or the somewhat weaker strong mixing condition. Absolute regularity may be the most natural assumption. It is widely applicable, e.g. in situations where coupling or regeneration holds; in particular, for Markov chains $\beta$-mixing with an exponential rate of decay is equivalent to geometric ergodicity (Bradley 2005, Thm. 3.7), and it gives the cleanest results. However, at the expense of strengthening other conditions it is also possible to prove convergence using strong mixing assumptions.

Finally, we assume that the covariance function of $\{\tilde T_n(x);\ x \ge 0\}$ converges as $n \to \infty$. For standard empirical processes, this convergence may be obtained from more basic assumptions. However, this is not possible in the present tail context.

In applications, the "level" $u_n$ has to be chosen using information from the sample. We study one way of doing this, by replacing $u_n$ by $\xi_{[c_n]}$, the $[c_n]$-th largest of $\xi_1, \ldots, \xi_n$. Our results easily give convergence also in this situation. However, the limit is changed in an interesting way.

Related results in the literature concern tail empirical processes for independent sequences and ordinary empirical processes in dependent cases. Convergence and statistical application of the tail empirical process have been considered e.g. by Mason (1988), Deheuvels and Mason (1990), Einmahl (1990b), Csorgo et al. (1986), Drees (1998); see also the review by Einmahl (1990a). There is a large literature on convergence of standard empirical processes for dependent sequences. We refer to Arcones and Yu (1994), Doukhan et al. (1995), and Shao and Yu (1996) for sharp results and further information. A recent survey is Dehling et al. (2002). $\beta$-mixing stems from Kolmogorov, and is used in Volkonski and Rozanov (1959, 1961). Berbee (1979) provides a sharp result. The form of it which we use is from Eberlein (1984).

We introduce some further notation, state the results for the case when (1.1) holds, and give a number of alternative conditions in Section 2. The more realistic case with exceedances over a random level $\xi_{[c_n]}$ is studied in Section 3. Section 4 discusses the conditions of the theorems and a further condition, $D'(u_n)$, which precludes clustering of extremes. It also lists a number of concrete models where the conditions are known to hold and mentions implications for probability, quantile, and mean excess plots. Section 5 contains the results for uniform marginal distributions, and Section 6 the proofs.


2 Convergence of the tail empirical process

In this section we state the result for the case when exceedances have a limiting Generalized Pareto distribution. We also introduce a number of variants of the conditions, which may be easier to check in specific cases. Section 4 includes some further discussion of the conditions.

Now, some notation which is needed to state the theorems. As in the introduction, let $\{\xi_i\}_{i=-\infty}^{\infty}$ be a stationary sequence with marginal d.f. $F$, and let $\mathcal{B}_i^j = \sigma\{\xi_i, \ldots, \xi_j\}$ be the $\sigma$-algebra generated by $\xi_i, \ldots, \xi_j$. The sequence $\{\xi_i\}$ is said to be $\beta$-mixing, or absolutely regular, if

$$\beta(k) = \sup_{\ell \ge 1}\, E \sup_{A \in \mathcal{B}_{\ell+k+1}^{\infty}} |P(A \mid \mathcal{B}_1^{\ell}) - P(A)| \to 0,$$

as $k \to \infty$. The constants $\beta(k)$ are called the $\beta$-mixing constants for $\{\xi_n\}$. Similarly, the sequence is strongly mixing if

$$\alpha(k) = \sup\{|P(AB) - P(A)P(B)|;\ A \in \mathcal{B}_1^{\ell},\ B \in \mathcal{B}_{\ell+k+1}^{\infty},\ \ell \ge 1\} \to 0,$$

as $k \to \infty$. The constants $\alpha(k)$ are the strong-mixing (or $\alpha$-mixing) constants for $\{\xi_n\}$.

Let $\ell_n \le r_n \le n$ be sequences of integers; $r_n$ is the length of the big blocks and $\ell_n$ the length of the small blocks in a "big blocks - small blocks" argument, and hence the number of blocks is of the order $n/r_n$. Further, as in the introduction, let $\{u_n\}$ be a sequence of real numbers, the levels, and write

$$N_n(x, y) = \sum_{i=1}^{r_n} 1\{u_n + x\sigma_n < \xi_i \le u_n + y\sigma_n\}$$

for the number of normalized exceedance values in a block which fall between $x$ and $y$. We will assume that the levels $u_n$ tend to the right endpoint of the distribution $F$ as $n \to \infty$, but slowly enough to make the expected number of exceedances of $u_n$ by $\xi_1, \ldots, \xi_n$ tend to infinity. Further, $\ell_n$, the length of the small blocks which separate the big blocks, has to be big enough to make the big blocks asymptotically independent, but still small enough to make the small blocks asymptotically unimportant. Thus it will throughout, without further comment, be assumed that

$$\bar F(u_n) \to 0, \quad n\bar F(u_n) \to \infty, \quad \ell_n = o(r_n), \quad r_n = o(n). \tag{2.1}$$

We will use the following conditions. For each $\theta < x_T$ and for $0 \le x, y < \theta$ there is a constant $c$ which only depends on $\theta$, such that

C1 $\quad E\{N_n(x, y)^p \mid N_n(x, y) \ne 0\} \le c$, some $p \ge 2$,

C2 $\quad \beta(\ell_n)\, n/r_n \to 0$,

and

C3 $\quad \dfrac{1}{r_n \bar F(u_n)}\, C\Bigl(\sum_{i=1}^{r_n} 1\{\xi_i > u_n + x\sigma_n\},\ \sum_{i=1}^{r_n} 1\{\xi_i > u_n + y\sigma_n\}\Bigr) \to r(x, y), \quad n \to \infty,$


for some function r(x, y). If p = 2 in C1 we in addition assume that

C4 $\quad r_n\bigl(n\bar F(u_n)\bigr)^{-\nu} \to 0$,

for some $\nu < 1/2$.

We can now state the main results of this paper.

Theorem 2.1. Suppose (1.1) and C1-C3 hold, and that if $p = 2$ also C4 is satisfied. Then
$$e(\tilde T_n) \to e, \quad n \to \infty, \ \text{ in } D([0, x_T)),$$
where $e$ is a centered continuous Gaussian process with covariance function $r(x, y)$.

For strong mixing we replace C2 by the following condition,

D2 $\quad \alpha(\ell_n)\, n/r_n \to 0$, $\quad n/r_n = o\bigl((n\bar F(u_n))^{\nu}\bigr)$, $\quad \alpha(n) = o(n^{-\theta})$,

with $\nu \in (0, 1)$ and $\theta > (1 - \nu)p/(2(p - 2))$, and where $\alpha(n)$ are the strong mixing coefficients.

Theorem 2.2. Suppose C1 is satisfied for some $p > 2$ and that (1.1), D2 and C3 hold. Then
$$e(\tilde T_n) \to e, \quad n \to \infty, \ \text{ in } D([0, x_T)),$$
where $e$ is a centered continuous Gaussian process with covariance function $r(x, y)$.

Remark 2.3. For both theorems, (i) if $\gamma \ge 0$, then $x_T = \infty$ and there is hence convergence in $D([0, \infty))$. For the case $\gamma < 0$, which entails $x_T < \infty$, it may be seen from the proof that if the conditions hold also for $\theta = x_T$, then there is also convergence in $D([0, \infty))$. (But of course the limit is zero for $x > x_T$.)
(ii) For later use, we note that there is convergence also in $D([-\varepsilon, x_T))$ for an $\varepsilon > 0$, provided the conditions are satisfied for $u_n$ replaced by $u_n - \varepsilon\sigma_n$. This can be seen by changing $u_n$ to $u_n - \varepsilon\sigma_n$ in the theorem.

It is possible to replace C1 by conditions which may be easier to check in concrete examples. The conditions are assumed to hold for any $\theta \in (0, x_T)$ and $0 \le x, y \le \theta$ and a constant $c > 0$ which only depends on $\theta$. The first one,

C1.1 $\quad \dfrac{1}{r_n\bar F(u_n)}\, E\bigl(N_n(x, y)^p\bigr) \le c\,|x - y|$, some $p \ge 2$,

was proposed and extensively used by Drees for p = 2 and 4. Let

$$\tilde N_n(x, y) = \sum_{i=-r_n}^{r_n} 1\{u_n + x\sigma_n < \xi_i \le u_n + y\sigma_n\}.$$

The next three are that

C1.2 $\quad E\{\tilde N_n(x, y)^p \mid u_n + x\sigma_n < \xi_0 \le u_n + y\sigma_n\} \le c$, some $p \ge 2$,


C1.3 $\quad E\{\tilde N_n(x, y)^p \mid \xi_0 = u_n + z\sigma_n\} \le c$, $0 \le z \le \theta$, some $p \ge 2$,

or that

C1.4 $\quad E\{N_n(x, y)^p \mid \xi_0 = u_n + z\sigma_n\} \le c$, $0 \le z \le \theta$, $\{\xi_i\}$ is a Markov chain,

for some $p \ge 2$. It may sometimes simplify computations to note that $N_n(x, y)$ is bounded by the number of exceedances in the cluster, $N_n(x, y) \le N_n(0, \infty) = \sum_{i=1}^{r_n} 1\{\xi_i > u_n\}$, and similarly for $\tilde N_n$.

The condition C3 may, under an additional assumption, be simplified to

C3.1 $\quad \bar F(u_n)^{-1} \sum_{|i|<r_n} (1 - |i|/r_n)\, P(\xi_0 > u_n + x\sigma_n,\ \xi_i > u_n + y\sigma_n) \to r(x, y).$

Corollary 2.4. (i) The condition C1 of Theorems 2.1 and 2.2 may be replaced by either one of C1.1-C1.4.
(ii) If $r_n\bar F(u_n) \to 0$ then the condition C3 of Theorem 2.1 may be replaced by C3.1.

3 Random levels

Let
$$c_n = n\bar F(u_n), \tag{3.1}$$

and let $\xi_{[c_n]}$ be the $[c_n]$-th largest of $\xi_1, \ldots, \xi_n$. In statistical applications $c_n$ and $u_n$ are not simultaneously known, since this entails knowledge of $F$. Hence, the level $u_n$ is usually chosen as a function of the observations $\xi_1, \ldots, \xi_n$. Specifically, $u_n$ is often replaced by $\xi_{[c_n]}$. In this section we shall see that this changes the form of the limiting distribution in an interesting way, but that the previous results may be used in the derivation of the limit.

For this, let

$$\hat T_n(x) = \frac{1}{c_n} \sum_{i=1}^{n} 1\{\xi_i > \xi_{[c_n]} + x\sigma_n\},$$

so that $\hat T_n$ is obtained by replacing $u_n$ by $\xi_{[c_n]}$ in the definition of $\tilde T_n$. Further, let

$$e(\hat T_n)(x) = \sqrt{c_n}\,\Bigl(\hat T_n(x) - \frac{\bar F(\xi_{[c_n]} + x\sigma_n)}{\bar F(\xi_{[c_n]})}\Bigr)$$

be the normalized estimation error of $\hat T_n(x)$, considered as an estimator of the tail function at the level $\xi_{[c_n]}$.
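A corresponding computational sketch with the random level: relative to the earlier sketch, the only change is that $u_n$ is replaced by $\xi_{[c_n]}$, the $[c_n]$-th largest observation (illustrative code only, with $\sigma_n$ still treated as known):

```python
import numpy as np

def random_level_tail_df(xi, c_n, sigma_n, x_grid):
    """Random-level tail empirical d.f.: c_n^(-1) * #{i : xi_i > xi_[c_n] + x*sigma_n}."""
    xi = np.asarray(xi, dtype=float)
    xi_cn = np.sort(xi)[::-1][int(c_n) - 1]   # xi_[c_n], the [c_n]-th largest of xi_1,...,xi_n
    counts = np.array([(xi > xi_cn + x * sigma_n).sum() for x in x_grid])
    return counts / c_n
```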

We will assume, in correspondence to (2.1), that

$$c_n \to \infty, \quad c_n/n \to 0, \tag{3.2}$$

that $u_n$ is defined by (3.1), and that there is an $\varepsilon > 0$ such that

$$T_n'(x) = \frac{d}{dx}\,\frac{\bar F(u_n + x\sigma_n)}{\bar F(u_n)} \to T'(x), \quad n \to \infty, \tag{3.3}$$

uniformly for x ∈ [−ε, ε].


Theorem 3.1. Suppose that $e(\tilde T_n) \xrightarrow{d} e$ in $D([-\varepsilon, \infty))$, for some $\varepsilon > 0$ and continuous process $e$, and that (3.1)-(3.3) hold. Then

$$e(\hat T_n)(x) \xrightarrow{d} e(x) - T(x)\,e(0), \quad n \to \infty,$$

in $D([0, x_T))$. In particular, this convergence holds if the conditions of Theorem 2.1 or 2.2, or of Corollary 2.4, hold, as modified in Remark 2.3 (ii), and (3.3) is satisfied.

4 Comments on the conditions, examples, and applications

(i) Condition C1 is the main difference between the limit theory for tail empirical processes and standard empirical process theory. The appendix exhibits an example where C2, C3, and C4 hold, but where the result of Theorem 2.1 isn't true. In fact, in the example the $\beta$-mixing coefficients even decay exponentially fast. Hence, in general, some sort of condition like C1 is needed. Further, Drees (2002) argues that the variant C1.1 is rather close to being necessary for exponentially $\beta$-mixing variables, and it of course also is closely related to C3.

The intuitive content of C1 is that the lengths (or sizes) of clusters are suitably bounded, also conditionally. In particular this is brought out by the following somewhat more restrictive version of C1,

$$E\bigl(N_n(0, \infty)^p \mid N_n(x, y) > 0\bigr) \le c.$$

Here $N_n(0, \infty)$ is the number of exceedances in the first block, i.e., provided it is non-zero, $N_n(0, \infty)$ is the size of a cluster of exceedances.

Finally, it might be worth noting that C1 entails that $r_n\bar F(u_n)$ is bounded, so that the expected number of exceedances in a block of size $r_n$ is bounded. (This claim is an easy consequence of C1.1 and Liapunov's inequality, and C1.1 follows from C1.)

(ii) Condition C3 can also be seen to be needed; in fact, it is easy to construct examples of e.g. 2-dependent sequences where all the conditions of Theorem 2.1 are satisfied but where C3 doesn't hold and the tail empirical function doesn't converge. We leave this to the reader.

(iii) The norming of the difference between the empirical distribution function and its mean is in this paper done by multiplying by $\sqrt{n\bar F(u_n)}$. However, it would be straightforward to allow for more general norming sequences.

(iv) Leadbetter's condition $D'(v_n)$ holds for the sequence $\{v_n\}$ if
$$n \sum_{i=1}^{q_n} P(\xi_0 > v_n,\ \xi_i > v_n) \to 0, \quad n \to \infty,$$


for any sequence $q_n = o(n)$. If the condition holds for suitable sequences $\{v_n\}$ then it precludes clustering of extremes, see Leadbetter et al. (1983, Section 3.4).

We next claim that if $D'(v_n)$ holds for any sequence $\{v_n\}$ such that $n\bar F(v_n) \to 1$, and if (2.1) and $r_n\bar F(u_n) \to 0$ also hold, then C3 follows, with

$$r(x, y) = T(x \vee y).$$

Thus, in this case, if the remaining conditions of Theorem 2.1 also hold, then the limit has the same distribution as $W(T(x))$, for $W$ a standard Brownian motion. In particular, this applies if the $\xi_i$'s are i.i.d., as is well known, cf. Einmahl (1990a).
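Indeed, since $T$ is nonincreasing and a standard Brownian motion has covariance $\operatorname{Cov}(W(s), W(t)) = s \wedge t$, the identification of the limit is immediate:
$$\operatorname{Cov}\bigl(W(T(x)),\, W(T(y))\bigr) = T(x) \wedge T(y) = T(x \vee y) = r(x, y), \qquad x, y \ge 0.$$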

To prove this claim put $m = m(n) = [1/\bar F(u_n)]$, $v_m = \inf\{u_n;\ [1/\bar F(u_n)] = m\}$ and $q_m = \sup\{r_n;\ [1/\bar F(u_n)] = m\}$. Then $m\bar F(v_m) \to 1$ and $q_m/m \to 0$ as $n \to \infty$, and hence
$$\begin{aligned}
\bar F(u_n)^{-1} \sum_{0<|i|<r_n} P(\xi_0 > u_n + x\sigma_n,\ \xi_i > u_n + y\sigma_n)
&\le 2\,\bar F(u_n)^{-1} \sum_{i=1}^{r_n} P(\xi_0 > u_n,\ \xi_i > u_n) \\
&\le 2(m + 1) \sum_{i=1}^{q_m} P(\xi_0 > v_m,\ \xi_i > v_m) \to 0, \quad n \to \infty.
\end{aligned}$$

Further, by the definition of $T_n$,
$$\bar F(u_n)^{-1}\, P(\xi_0 > u_n + x\sigma_n,\ \xi_0 > u_n + y\sigma_n) = T_n(x \vee y) \to T(x \vee y), \quad n \to \infty,$$
by (1.1).

(v) Examples: Here we list a number of examples where the results have been shown to apply, for most of the examples under suitable further conditions. Precise formulations are given in Rootzen (1995) and Drees (2000, 2002, 2003). The latter three papers used a version of C2 which is slightly too weak. However, it is straightforward to see that the results nevertheless are correct. Here is the list:

• k-dependent sequences: Rootzen (1995)

• Finite moving averages: Rootzen (1995), Drees (2002)

• AR(1)-processes with α-stable innovations: Rootzen (1995), Drees (2002)

• Solutions to random difference equations: Drees (2000, 2002, 2003)

• ARCH(1)-processes: Drees (2002, 2003)

• Linear time series models, under extra conditions: Drees (2003)


(vi) Application to diagnostic plotting: In a pp-plot one plots $\hat T_n(x)$ on a suitable scale. The results from the previous section show that the deviation between this estimate and the true tail function $T_n(x)$ is approximately distributed as $1/\sqrt{c_n}$ times a continuous Gaussian process, with covariance simply computed from the asymptotic covariance $r(x, y)$.

In a qq-plot one instead plots the inverse of $\hat T_n(x)$. It follows from Vervaat's Lemma (Vervaat, 1972), see Drees (2002), that the deviation from the inverse of the true tail function has the same asymptotic distribution as the deviation in a pp-plot.

A mean excess plot shows the empirical mean of the observations over a high level, as a function of the level. Under suitable extra moment conditions our results also give approximations for the deviations from the true mean excess function, cf. Shao and Yu (1996). However, to actually prove this would require further work.

The covariance function $r(x, y)$ is unknown and has to be estimated. One possibility is the blocks method, where one first divides the "interval" $1, \ldots, n$ into "blocks" of length $r_n$, and also typically changes scale by replacing $x$ by $y = x\sigma_n$ to remove the unknown parameter $\sigma_n$. For each block one then computes a bivariate vector, with first entry equal to the number of exceedances of $\xi_{[c_n]} + x$ in the block and second entry the number of exceedances of $\xi_{[c_n]} + y$ in the block. From these approximately $n/r_n$ bivariate vectors one then computes an estimate of $r(x, y)$ in a standard way. The drawback of this procedure is the problem of how to choose the block length $r_n$. However, the estimate may often be rather insensitive to this choice. Drees (2003) proposes very promising alternative ways to estimate quantities related to $r(x, y)$. It would be interesting to explore if modifications of Drees' ideas could be used for the present problem.
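A rough sketch of such a blocks estimate; the handling of incomplete blocks, the choice of $r_n$, and the use of $c_n/n$ as an estimate of $\bar F(u_n)$ in the normalisation are choices made here for illustration only, not prescriptions from the paper:

```python
import numpy as np

def blocks_cov_estimate(xi, r_n, c_n, x, y, sigma_n=1.0):
    """Blocks estimate of r(x, y): split the sample into blocks of length r_n,
    count exceedances of xi_[c_n] + x*sigma_n and of xi_[c_n] + y*sigma_n in each
    block, and take the empirical covariance of the two counts, normalised as in C3."""
    xi = np.asarray(xi, dtype=float)
    n = len(xi)
    level = np.sort(xi)[::-1][int(c_n) - 1]          # xi_[c_n]
    n_blocks = n // r_n
    counts = np.empty((n_blocks, 2))
    for j in range(n_blocks):
        block = xi[j * r_n:(j + 1) * r_n]
        counts[j, 0] = (block > level + x * sigma_n).sum()
        counts[j, 1] = (block > level + y * sigma_n).sum()
    cov = np.cov(counts[:, 0], counts[:, 1])[0, 1]   # empirical covariance of block counts
    return cov / (r_n * c_n / n)                     # c_n / n estimates F_bar(u_n)
```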

5 Uniform marginal distributions

We now state versions of the results from Section 2 which instead of (1.1) assume uniform marginal distributions. Thus, let $\{\eta_i\}_{i=-\infty}^{\infty}$ be a stationary sequence with uniform marginal d.f.'s, $P(\eta_i \le x) = x$ for $0 \le x \le 1$. Let $\{v_n \in (0, 1)\}$ be a sequence of constants, and $\{r_n\}$, $\{\ell_n\}$ sequences of integers, which throughout are assumed to satisfy

$$v_n \to 0, \quad n v_n \to \infty, \quad \ell_n = o(r_n), \quad r_n = o(n). \tag{5.1}$$

Further let

$$\bar T_n(x) = \frac{1}{n v_n} \sum_{i=1}^{n} 1\{\eta_i > 1 - v_n + v_n x\},$$
$$e(\bar T_n)(x) = \sqrt{n v_n}\,\bigl(\bar T_n(x) - (1 - x)\bigr),$$

be the uniform tail empirical function and tail empirical process, respectively, and write

$$\bar N_n(x, y) = \sum_{i=1}^{r_n} 1\{1 - v_n + v_n x < \eta_i \le 1 - v_n + v_n y\},$$


for $0 \le x \le y \le 1$. As above, in this section a bar indicates quantities defined from the $\eta_i$'s. Corresponding to C1-C4, we will use the following assumptions. For each $\theta < 1$ and $0 \le x, y \le \theta$, there is a constant $c$ which only depends on $\theta$ such that

$\overline{\mathrm{C1}}$ $\quad E\{\bar N_n(x, y)^p \mid \bar N_n(x, y) \ne 0\} \le c$, some $p \ge 2$,

$\overline{\mathrm{C2}}$ $\quad \beta(\ell_n)\, n/r_n \to 0$,

where the $\beta$'s are the $\beta$-mixing coefficients for $\{\eta_i\}$, and

$\overline{\mathrm{C3}}$ $\quad \dfrac{1}{r_n v_n}\, C\Bigl(\sum_{i=1}^{r_n} 1\{\eta_i > 1 - v_n + v_n x\},\ \sum_{i=1}^{r_n} 1\{\eta_i > 1 - v_n + v_n y\}\Bigr) \to \bar r(x, y), \quad n \to \infty,$

for some function $\bar r(x, y)$. If $p = 2$ in $\overline{\mathrm{C1}}$ we in addition assume that

$\overline{\mathrm{C4}}$ $\quad r_n(n v_n)^{-\nu} \to 0$,

for some $\nu < 1/2$.

The first result gives convergence of the tail empirical function for uniformly distributed variables under absolute regularity.

Theorem 5.1. Suppose that $\overline{\mathrm{C1}}$ and $\overline{\mathrm{C2}}$ are satisfied. Further, if $p = 2$ assume $\overline{\mathrm{C4}}$ holds. Then
(i) $\{e(\bar T_n)(x);\ 0 \le x \le 1\}$ is tight in $D[0, 1)$, and any distributional limit of a subsequence is continuous.
(ii) If in addition $\overline{\mathrm{C3}}$ is satisfied, then

$$e(\bar T_n) \xrightarrow{d} \bar e \ \text{ in } D[0, 1), \ \text{ as } n \to \infty, \tag{5.2}$$

where $\bar e$ is a continuous centered Gaussian process with covariance function $\bar r(x, y)$.

For strong mixing we replace $\overline{\mathrm{C2}}$ by the following condition,

$\overline{\mathrm{D2}}$ $\quad \alpha(\ell_n)\, n/r_n \to 0$, $\quad n/r_n = o\bigl((n v_n)^{\nu}\bigr)$, $\quad \alpha(n) = o(n^{-\theta})$,

with $\nu \in (0, 1)$ and $\theta > (1 - \nu)p/(2(p - 2))$, and where $\alpha(n)$ are the strong mixing coefficients for the uniform variables. Corresponding to Theorem 5.1 we have the following result for the strong-mixing case.

Theorem 5.2. Suppose that $\overline{\mathrm{C1}}$ is satisfied for some $p > 2$ and that $\overline{\mathrm{D2}}$ and $\overline{\mathrm{C3}}$ hold. Then

$$e(\bar T_n) \xrightarrow{d} \bar e \ \text{ in } D[0, 1), \ \text{ as } n \to \infty, \tag{5.3}$$

where $\bar e$ is a continuous centered Gaussian process with covariance function $\bar r(x, y)$.

It of course is straightforward to translate C1.1-C1.4 and C3.1 to the present case. We leave this to the reader. Furthermore, Drees (2000) has shown that $\overline{\mathrm{C4}}$ may be slightly weakened, to $r_n(n v_n)^{-1/2}\log^4(n v_n) \to 0$. Drees (2002, 2003) also gives some further conditions, which are more stringent but sometimes easier to check, and which also can replace $\overline{\mathrm{C1}}$ and $\overline{\mathrm{C3}}$.


6 Proofs

We first prove Theorem 5.1. For this, let $k_n = [n/(2r_n)]$ be the integer part of $n/(2r_n)$. In the proof of tightness, we split the "interval" $\{1, \ldots, n\}$ up into $2k_n$ blocks $I_1, \ldots, I_{2k_n}$ of $r_n$ integers and a remaining block $I_{2k_n+1}$ of length less than $2r_n$, and consider sums over even and odd blocks separately.

For $e_{n,i}(x) = (n v_n)^{-1/2}\bigl(1\{\eta_i > 1 - v_n + v_n x\} - v_n(1 - x)\bigr)$, let $\{f_{n,i}(x);\ 0 \le x \le 1,\ i \in I_{2j}\}$ have the same distribution as $\{e_{n,i}(x);\ 0 \le x \le 1,\ i \in I_{2j}\}$ for $j = 1, \ldots, k_n$, but let them be independent for different values of $j$. Thus the $f_{n,i}$'s have the same joint distribution as the $e_{n,i}$'s, except that $f_{n,i}$'s in separate even blocks are independent. Further, let

$$X_{n,j}(x) = \sum_{i \in I_{2j}} f_{n,i}(x)$$

be the sum over the $j$-th even block, so that the $X_{n,j}$'s are mutually independent for $j = 1, \ldots, k_n$, and let

$$S_n(x) = \sum_{j=1}^{k_n} X_{n,j}(x)$$

be the sum of the even block sums.

The next lemma shows that by $\beta$-mixing the $e_{n,i}$'s may be replaced by the $f_{n,i}$'s in the proof of tightness of the sum over even blocks. Tightness of the sum over odd blocks may be proved in the same way as for the sum over even blocks. Finally, the sum over the last block, $I_{2k_n+1}$, is easily seen to tend to zero, and together these facts can be used to deduce tightness of $e(\bar T_n)$.

Lemma 6.1. Suppose $\overline{\mathrm{C2}}$ holds and that $\{S_n(x);\ 0 \le x \le 1\}$ is tight in $D[0, 1)$, with any subsequential distributional limit continuous. Then also the sums of the original variables over the even blocks,

$$\Bigl\{\sum_{j=1}^{k_n} \sum_{i \in I_{2j}} e_{n,i}(x);\ 0 \le x \le 1\Bigr\}_{n=1}^{\infty},$$

is tight in D[0, 1), and any limit in distribution of a subsequence is continuous.

Proof. Using e.g. Eberlein (1984), the variation distance between $\{f_{n,i}(\cdot);\ i \in I_{2j},\ j = 1, \ldots, k_n\}$ and $\{e_{n,i}(\cdot);\ i \in I_{2j},\ j = 1, \ldots, k_n\}$ is bounded by $k_n\beta(r_n) < \beta(\ell_n)\, n/r_n \to 0$, by $\overline{\mathrm{C2}}$, and the lemma follows. □

Next, $\overline{\mathrm{C1}}$ gives a moment bound on the increments of the $X_{n,j}$'s. In the lemma, and later, $K$ denotes a generic constant whose value may change from appearance to appearance, and which only depends on $\theta$, where $\theta$ is an arbitrary number in $(0, 1)$.

Lemma 6.2. Suppose $\overline{\mathrm{C1}}$ holds. Then, for $0 \le x, y \le \theta$ and $\zeta \ge p \ge 2$,

$$E\{|X_{n,1}(y) - X_{n,1}(x)|^{\zeta}\} \le K k_n^{-1} \Bigl(\frac{r_n^2}{n v_n}\Bigr)^{(\zeta-p)/2} (n v_n)^{1-p/2}\, |y - x|, \tag{6.1}$$
for large $n$.


Proof. Assume $x \le y$ and recall the notation $\bar N_n(x, y) = \sum_{i=1}^{r_n} 1\{1 - v_n + v_n x < \eta_i \le 1 - v_n + v_n y\}$. By the definition of $X_{n,1}$, since central moments are bounded by a constant times the moment of the variable itself,
$$E\{|X_{n,1}(y) - X_{n,1}(x)|^{\zeta}\} \le K (n v_n)^{-\zeta/2}\, E\bigl(\bar N_n(x, y)^{\zeta}\bigr).$$

Now, using $\overline{\mathrm{C1}}$ in the third step,
$$\begin{aligned}
E\bigl(\bar N_n(x, y)^{\zeta}\bigr) &\le r_n^{\zeta-p}\, E\bigl(\bar N_n(x, y)^{p}\bigr) \\
&\le r_n^{\zeta-p}\, E\bigl(\bar N_n(x, y)^{p} \mid \bar N_n(x, y) \ne 0\bigr)\, P\bigl(\bar N_n(x, y) \ne 0\bigr) \\
&\le r_n^{\zeta-p}\, c\, r_n v_n |y - x|.
\end{aligned} \tag{6.2}$$

Since $r_n \le n/k_n$, this proves the lemma. □

We can now control the variation of $S_n$ on a mesh of size
$$\Delta = \Delta_n = \begin{cases} (n v_n)^{-1} & \text{for } p > 2, \\ \bigl(r_n^2/(n v_n)\bigr)^{1/(1/2-\nu)} & \text{for } p = 2. \end{cases} \tag{6.3}$$

Let $J_k^{\delta}$ denote the set of integers $i$ for which $0 \vee (k\delta - \Delta) \le i\Delta \le ((k+1)\delta + \Delta) \wedge \theta$.

Lemma 6.3. Suppose $\overline{\mathrm{C1}}$ holds. Then to any $\varepsilon, \eta > 0$ there is a $\delta > 0$ with

$$P\Bigl(\max_{i,j \in J_k^{\delta}} |S_n(j\Delta) - S_n(i\Delta)| > \varepsilon\Bigr) \le \eta\delta, \tag{6.4}$$
for $k \ge 0$ and $n$ large.

Proof. First note that if $\overline{\mathrm{C1}}$ holds for some $p > 2$ then it also holds for $p = 2$, by Liapunov's inequality. Now, let $\zeta \ge p \ge 2$ and for brevity write $X$ for $X_{n,i}$ (for each $n$, the $\{X_{n,i}:\ i \ge 0\}$ form a stationary sequence). Then, by the Rosenthal inequality (see e.g. Petrov (1995, p. 59)) and Lemma 6.2,
$$\begin{aligned}
E|S_n(x) - S_n(y)|^{\zeta} &\le K\bigl\{k_n E|X(y) - X(x)|^{\zeta} + \bigl(k_n E|X(y) - X(x)|^{2}\bigr)^{\zeta/2}\bigr\} \\
&\le K\Bigl\{\Bigl(\frac{r_n^2}{n v_n}\Bigr)^{(\zeta-p)/2} (n v_n)^{1-p/2}\, |y - x| + |x - y|^{\zeta/2}\Bigr\}.
\end{aligned} \tag{6.5}$$

Now, assume $p > 2$ and take $\zeta = p$. It then follows from (6.5) that, for $|y - x| > (n v_n)^{-1} = \Delta$,
$$E|S_n(x) - S_n(y)|^{p} \le K |y - x|^{p/2}\Bigl\{\Bigl(\frac{(n v_n)^{-1}}{|y - x|}\Bigr)^{-1+p/2} + 1\Bigr\} \le K |y - x|^{p/2}.$$

Hence, by Billingsley (1968), Theorem 12.2, the lefthand side of (6.4) is bounded by
$$\frac{K(\delta + 2\Delta)^{p/2}}{\varepsilon^4} = \Bigl(K\,\frac{(\delta + 2\Delta)^{p/2}}{\delta\varepsilon^4}\Bigr)\delta.$$


By taking $\delta$ sufficiently small, and for $n$ large, $K(\delta + 2\Delta)^{p/2}\delta^{-1}\varepsilon^{-4} < \eta$, and (6.4) holds. This concludes the proof for the case $p > 2$.

Next, assume $p = 2$ and take $\zeta = 4$ in (6.5) to get that, for $|x - y| \ge \bigl(r_n^2/(n v_n)\bigr)^{1/(1/2-\nu)} = \Delta$,
$$\begin{aligned}
E|S_n(x) - S_n(y)|^{4} &\le K |y - x|^{1+(1/2-\nu)}\Bigl\{\Bigl(\frac{\bigl(r_n^2/(n v_n)\bigr)^{\frac{1}{1/2-\nu}}}{|y - x|}\Bigr)^{1/2-\nu} + |x - y|^{1/2+\nu}\Bigr\} \\
&\le K |y - x|^{1+(1/2-\nu)}.
\end{aligned}$$

The proof now can be completed in the same way as above. □

The next lemma, which uses monotonicity of $1\{\eta_i > 1 - v_n + v_n x\}$ and of $v_n(1 - x)$ as functions of $x$, is classical, and its proof is hence left to the reader.

Lemma 6.4. For 0 ≤ x, y ≤ θ,

$$\sup_{|x-y|\le\delta} |S_n(x) - S_n(y)| \le 6 \max_{k \ge 0}\, \max_{i,j \in J_k^{\delta}} |S_n(i\Delta) - S_n(j\Delta)| + 8\sqrt{n v_n}\,\Delta.$$

Together, Lemma 6.4 and Lemma 6.5 below bound the variation of $\{S_n(t)\}$, by standard arguments.

Lemma 6.5. Suppose that $\overline{\mathrm{C1}}$-$\overline{\mathrm{C3}}$ hold, and that if $p = 2$ also $\overline{\mathrm{C4}}$ is satisfied. Then
(i) to any $\varepsilon, \eta > 0$ there exists a $\delta > 0$ with

$$\limsup_{n\to\infty} P\Bigl(\sup_{|x-y|\le\delta,\ 0\le x,y\le\theta} |S_n(x) - S_n(y)| > \varepsilon\Bigr) \le \eta,$$

and,
(ii) for $e_{n,i}(x)$ and $I_{2k_n+1}$ as defined at the beginning of this section,

$$\sup_{0\le x\le 1} \Bigl|\sum_{i \in I_{2k_n+1}} e_{n,i}(x)\Bigr| \xrightarrow{P} 0.$$

Proof. (i) With δ as above,

$$\begin{aligned}
\limsup_{n\to\infty} P\Bigl(\max_{k\ge 0}\, \max_{i,j\in J_k^{\delta}} |S_n(i\Delta) - S_n(j\Delta)| > \varepsilon\Bigr)
&\le \limsup_{n\to\infty} \sum_{0\le k < 1/\delta} P\Bigl(\max_{i,j\in J_k^{\delta}} |S_n(i\Delta) - S_n(j\Delta)| > \varepsilon\Bigr) \\
&\le \delta^{-1}\eta\delta = \eta.
\end{aligned}$$

Further, by (6.3), $\sqrt{n v_n}\,\Delta \to 0$ as $n \to \infty$, and part (i) now follows from Lemma 6.4.

(ii) By definition, for x ∈ [0, 1],

$$\Bigl|\sum_{i \in I_{2k_n+1}} e_{n,i}(x)\Bigr| \le \frac{1}{\sqrt{n v_n}} \sum_{i \in I_{2k_n+1}} \bigl(1\{\eta_i > 1 - v_n\} + v_n\bigr).$$

Here the righthand side does not depend on $x$. The length of $I_{2k_n+1}$ is bounded by $2r_n$, and hence the expectation of the righthand side is bounded by $4(n v_n)^{-1/2} r_n v_n$. Further, $n v_n \to \infty$ by assumption and $r_n v_n$ is bounded (see Remark (i), Section 4), so that $4(n v_n)^{-1/2} r_n v_n \to 0$. This proves the second part of the lemma. □


Proof of Theorem 5.1. (i) By Lemma 6.5 (i) and Theorem 15.5 of Billingsley (1968), for any $\theta \in (0, 1)$ we have that $\{S_n(x);\ x \in [0, \theta]\}$ is tight in $D[0, \theta]$, and that any limit in distribution along a subsequence is continuous. By Lemma 6.1 the same holds for

$$\sum_{j=1}^{k_n} \sum_{i \in I_{2j}} e_{n,i}(x).$$

Using exactly the same reasoning, this conclusion also applies to
$$\sum_{j=1}^{k_n} \sum_{i \in I_{2j-1}} e_{n,i}(x).$$

Now, since

$$e(\bar T_n)(x) = \sum_{j=1}^{k_n} \sum_{i \in I_{2j-1}} e_{n,i}(x) + \sum_{j=1}^{k_n} \sum_{i \in I_{2j}} e_{n,i}(x) + \sum_{i \in I_{2k_n+1}} e_{n,i}(x),$$

this together with Lemma 6.5 (ii) proves the first part of Theorem 5.1.

(ii) Since tightness already is established, only convergence of finite-dimensional distributions remains to be proved. Further, using the Cramér-Wold device in a standard way, this is only notationally more complicated than proving one-dimensional convergence. (Note that the Lindeberg condition for Cramér-Wold is an easy consequence of the one-dimensional Lindeberg conditions.) Hence we only give the proof that $e(\bar T_n)(x)$ converges in distribution to $\bar e(x)$, for $x$ fixed.

To do this we show that the conditions of Corollary 4.2 of Rootzen et al. (1998) (abbreviated to Cor4.2 in the following) are satisfied. Since strong-mixing coefficients are bounded by the $\beta$-mixing coefficients in $\overline{\mathrm{C2}}$, the mixing condition (2.1) of Cor4.2 is satisfied by assumption.

Next, we show that

$$\frac{1}{r_n v_n}\, V\Bigl(\sum_{i=1}^{\ell_n} 1\{\eta_i > 1 - v_n x\}\Bigr) \to 0. \tag{6.6}$$

Expanding the sum and using stationarity it can be seen that $E\bigl(\bigl(\sum_{1}^{r_n} 1\{\eta_i > 1 - v_n x\}\bigr)^2\bigr) \ge (r_n/\ell_n - 1)\, E\bigl(\bigl(\sum_{1}^{\ell_n} 1\{\eta_i > 1 - v_n x\}\bigr)^2\bigr)$. Combining this with (6.2), with $\zeta = p = 2$ and $y = 0$, we get that
$$\frac{1}{r_n v_n}\, E\Bigl(\Bigl(\sum_{i=1}^{\ell_n} 1\{\eta_i > 1 - v_n x\}\Bigr)^2\Bigr) \le c\,x/(r_n/\ell_n - 1) \to 0.$$

This implies (6.6), and thus the first part of condition (4.7) of Cor4.2 is satisfied. This argument is due to Holger Drees (personal communication). The second part may be proved similarly to Lemma 6.5 (ii).

Further, by Lemma 6.2 with $\zeta = p > 2$, if $\overline{\mathrm{C1}}$ holds then $k_n E|X_{n,i}(x)|^p \to 0$ and thus the Lindeberg condition is satisfied. If instead $\overline{\mathrm{C4}}$ is assumed to hold, then $|X_{n,i}(x)|$ is bounded by $r_n(n v_n)^{-1/2} \to 0$, and again the Lindeberg condition holds. Thus, all the assumptions of Cor4.2 are shown to hold, and hence $e(\bar T_n)(x) \xrightarrow{d} \bar e(x)$. □


Proof of Theorem 5.2. The proof of convergence of finite-dimensional distributions in Theorem 5.1 (ii) only requires strong mixing, so it is enough to prove that tightness follows from the present assumptions. Choose $\kappa > 1$ with $\theta \ge \kappa(1-\nu)p/(2(p-2))$, for $\nu$ and $\theta$ from $\overline{\mathrm{D2}}$, let $\zeta = p\kappa(1-\nu)/\{p - 2 + \kappa(1-\nu)\} < p$, and choose the mesh-size
$$\Delta = \Delta_n = \Bigl(\frac{n/r_n}{n v_n}\Bigr)^{\frac{\zeta(1/2 - 1/p)}{\kappa(1 - \zeta/p)}}.$$

It is straightforward to check that $(n v_n)^{1/2}\Delta_n \to 0$. Inspection of the proof of Theorem 5.1 (i) then shows that Theorem 5.2 follows if we prove that (6.4) holds for $S_n$ redefined to be the sum over the original dependent blocks (in contrast to the independent blocks used under $\beta$-mixing). Thus, with notation from the start of Section 6, we here suppose that the $X_{n,j}$ are sums of $e_{n,i}$'s and not of $f_{n,i}$'s.

Now, by the last part of Theorem 4.1 of Shao and Yu (1996), with $p = \zeta$, $r = p$, and with notation as in Lemma 6.2, and using Lemma 6.2 in the second step,
$$E|S_n(x) - S_n(y)|^{\zeta} \le K\, k_n^{\zeta/2}\, \bigl(E|X(y) - X(x)|^{p}\bigr)^{\zeta/p} \le K\, k_n^{\zeta/2}\, \bigl(k_n^{-1}(n v_n)^{1-p/2}\, |x - y|\bigr)^{\zeta/p}.$$

Thus, for $|x - y| \ge \Delta$ and using $k_n = [n/(2r_n)]$ we have that
$$\begin{aligned}
E|S_n(x) - S_n(y)|^{\zeta} &\le K\, |x - y|^{1+(\kappa-1)(1-\zeta/p)} \Bigl\{\Bigl(\frac{n/r_n}{n v_n}\Bigr)^{-\frac{\zeta(1/2-1/p)}{\kappa(1-\zeta/p)}}\, |x - y|\Bigr\}^{-\kappa(1-\zeta/p)} \\
&\le K\, |x - y|^{1+(\kappa-1)(1-\zeta/p)}.
\end{aligned}$$

The rest of the proof follows as in Lemmas 6.3 and 6.4. □

Theorems 2.1 and 2.2 follow from Theorems 5.1 and 5.2 by a simple "change of variables" argument. The proofs are the same for both, so we only prove Theorem 2.1.

Proof of Theorem 2.1. First recall the assumption (1.1) that
$$T_n(x) = \frac{\bar F(u_n + x\sigma_n)}{\bar F(u_n)} \to \Bigl(1 + \gamma\,\frac{x}{\sigma}\Bigr)^{-1/\gamma} = T(x), \quad x \ge 0, \tag{6.7}$$

as n →∞. Further, let

$$T_n^{\leftarrow}(y) = \sup\{x:\ T_n(x) \ge y\}, \quad 0 < y \le 1,$$

be the left continuous inverse of $T_n$. Since $T$ is bounded, continuous and strictly decreasing, it follows that the convergence in (6.7) holds uniformly for $x \ge 0$ and that

$$T_n^{\leftarrow}(y) \to T^{-1}(y) = \frac{\sigma}{\gamma}\bigl(y^{-\gamma} - 1\bigr), \tag{6.8}$$

uniformly for $y$ in compact subsets of $(0, 1]$. Since $F$ is assumed to be continuous, $\eta_i = F(\xi_i)$ is uniformly distributed on $[0, 1]$, and

$$\{\xi_i > u_n + x\sigma_n\} \overset{\text{a.s.}}{=} \{\eta_i > F(u_n + x\sigma_n)\} = \{\eta_i > 1 - \bar F(u_n) + \bar F(u_n)(1 - T_n(x))\}. \tag{6.9}$$


Hence, if we put $v_n = \bar F(u_n) \to 0$ and write $\bar T_n(x) = (n v_n)^{-1}\sum_{i=1}^{n} 1\{\eta_i > 1 - v_n + v_n x\}$, as before, then
$$\tilde T_n(x) \overset{\text{a.s.}}{=} \bar T_n\bigl(1 - T_n(x)\bigr). \tag{6.10}$$

We first show that $\{\eta_i\}$ satisfies the assumptions of Theorem 5.1. Clearly, (2.1) implies that (5.1) holds. Similarly as for (6.9),
$$\bar N_n(x, y) \overset{\text{a.s.}}{=} N_n\bigl(T_n^{\leftarrow}(1 - x),\ T_n^{\leftarrow}(1 - y)\bigr).$$
By (6.8), to any $\bar\theta \in (0, 1)$ there is a $\theta > 0$ such that $T_n^{\leftarrow}(1 - \bar\theta) \le \theta < x_T$ for all $n$, and hence it follows from C1 that $\overline{\mathrm{C1}}$ holds for $0 \le x, y \le \bar\theta$. Next, $\overline{\mathrm{C2}}$ follows automatically from C2, and a straightforward argument shows that (6.8), C2, and C3 imply that

$$\begin{aligned}
\frac{1}{r_n v_n}\, C\Bigl(\sum_{i=1}^{r_n} 1\{\eta_i > 1 - v_n + v_n x\},\ \sum_{i=1}^{r_n} 1\{\eta_i > 1 - v_n + v_n y\}\Bigr)
&= \frac{1}{r_n \bar F(u_n)}\, C\Bigl(\sum_{i=1}^{r_n} 1\{\xi_i > u_n + T_n^{\leftarrow}(1 - x)\sigma_n\},\ \sum_{i=1}^{r_n} 1\{\xi_i > u_n + T_n^{\leftarrow}(1 - y)\sigma_n\}\Bigr) \\
&\to r\bigl(T^{-1}(1 - x),\ T^{-1}(1 - y)\bigr), \quad n \to \infty.
\end{aligned}$$

Hence also $\overline{\mathrm{C3}}$ holds. Similarly $\overline{\mathrm{C4}}$ follows from C4. By Theorem 5.1 we hence have that
$$e(\bar T_n) \to \bar e \ \text{ in } D[0, \bar\theta], \quad n \to \infty, \tag{6.11}$$
with $\bar e$ a centered continuous normal process with covariance $\bar r(x, y) = r\bigl(T^{-1}(1 - x),\ T^{-1}(1 - y)\bigr)$.

It now follows from (6.11), $T_n \to T$ uniformly, and Theorem 3.1 of Whitt (1980) that
$$e(\tilde T_n)(x) = e(\bar T_n)\bigl(1 - T_n(x)\bigr) \to \bar e\bigl(1 - T(x)\bigr) \ \text{ in } D[0, \theta], \quad n \to \infty,$$
for any $\theta \in (0, x_T)$, since to such a $\theta$ there is a $\bar\theta \in (0, 1)$ with $1 - T_n(\theta) \le \bar\theta$ for all sufficiently large $n$. Further, $\bar e(1 - T(x))$ is a centered continuous normal process with covariance function
$$\bar r\bigl(1 - T(x),\ 1 - T(y)\bigr) = r\bigl(T^{-1}(1 - (1 - T(x))),\ T^{-1}(1 - (1 - T(y)))\bigr) = r(x, y),$$

which completes the proof of Theorem 2.1. □

Proof of Corollary 2.4. (i) Inspection of the proof shows that C1 is only used in Lemma 6.2, and then only to establish that C1.1 holds, and hence we may as well just directly assume that C1.1 holds.

It is at once seen that C1.3 implies C1.2. We next show that the result of Theorem 2.1 holds also if C1 is replaced by C1.2. However, in the proof C1 is only used to establish $\overline{\mathrm{C1}}$, which in turn is only used in the proof of Lemma 6.2. Accordingly, we will prove that the result of Lemma 6.2 holds also under C1.2.


With notation as in the proof of Theorem 2.1, in particular with $\eta_i = F(\xi_i)$, $v_n = \bar F(u_n)$, we have, similarly to (6.10), that C1.2 implies that for $\bar\theta \in (0, 1)$ there is a $c$ with
$$E\Bigl\{\Bigl(\sum_{i=-r_n}^{r_n} 1\{1 - v_n + v_n x < \eta_i \le 1 - v_n + v_n y\}\Bigr)^{p} \,\Big|\, 1 - v_n + v_n x < \eta_0 \le 1 - v_n + v_n y\Bigr\} \le c,$$

for $0 \le x, y \le \bar\theta$. Using a straightforward stationarity argument, and with notation as in Lemma 6.2, we have that

$$\begin{aligned}
E\{\bar N_n(x, y)^{\zeta}\} &\le \sum_{i=1}^{r_n} E\bigl\{\bar N_n(x, y)^{\zeta}\, 1\{1 - v_n + v_n x < \eta_i \le 1 - v_n + v_n y\}\bigr\} \\
&\le r_n\, E\Bigl\{\Bigl(\sum_{i=-r_n}^{r_n} 1\{1 - v_n + v_n x < \eta_i \le 1 - v_n + v_n y\}\Bigr)^{\zeta} 1\{1 - v_n + v_n x < \eta_0 \le 1 - v_n + v_n y\}\Bigr\} \\
&\le r_n^{1+\zeta-p}\, v_n\, |y - x|\, E\Bigl\{\Bigl(\sum_{i=-r_n}^{r_n} 1\{1 - v_n + v_n x < \eta_i \le 1 - v_n + v_n y\}\Bigr)^{p} \,\Big|\, 1 - v_n + v_n x < \eta_0 \le 1 - v_n + v_n y\Bigr\} \\
&\le r_n^{1+\zeta-p}\, v_n\, |y - x|\, c.
\end{aligned}$$

Since this is the same bound as in (6.2), the result of Lemma 6.2 holds also if C1 is replaced by C1.2, as required.

Now instead assume that C1.4 is satisfied. By similar considerations as above we have to bound $E\{\bar N_n(x, y)^{\zeta}\}$ using that, for $0 \le z \le \bar\theta \in (0, 1)$,
$$E\bigl\{\bar N_n(x, y)^{p} \mid \eta_0 = 1 - v_n + v_n z\bigr\} \le c. \tag{6.12}$$

Let $\nu = \min\{i \ge 1;\ \eta_i \in (1 - v_n + v_n x,\ 1 - v_n + v_n y]\}$ and write $S(k, \ell) = \sum_{i=k}^{\ell} 1\{1 - v_n + v_n x < \eta_i \le 1 - v_n + v_n y\}$. Then, using the Markov property for the second step, non-negativity for the third step, and stationarity and (6.12) for the last step,

$$\begin{aligned}
E\{\bar N_n(x, y)^{\zeta}\} &= E\bigl\{S(\nu, r_n)^{\zeta}\, 1\{\nu \le r_n\}\bigr\} = E\bigl\{E\{S(\nu, r_n)^{\zeta} \mid \eta_{\nu}\}\, 1\{\nu \le r_n\}\bigr\} \\
&\le E\bigl\{E\{S(\nu, \nu + r_n - 1)^{\zeta} \mid \eta_{\nu}\}\, 1\{\nu \le r_n\}\bigr\} \\
&\le r_n^{\zeta-p}\, E\bigl\{E\{S(\nu, \nu + r_n - 1)^{p} \mid \eta_{\nu}\}\, 1\{\nu \le r_n\}\bigr\} \\
&\le r_n^{\zeta-p}\, c\, E\{1\{\nu \le r_n\}\} \le r_n^{1+\zeta-p}\, v_n\, (y - x)\, c,
\end{aligned}$$

which again establishes (6.2).

(ii) If $r_n\bar F(u_n) \to 0$ then

$$\bigl(r_n\bar F(u_n)\bigr)^{-1}\, E\Bigl(\sum_{i=1}^{r_n} 1\{\xi_i > u_n + x\sigma_n\}\Bigr)\, E\Bigl(\sum_{i=1}^{r_n} 1\{\xi_i > u_n + y\sigma_n\}\Bigr) \le \bigl(r_n\bar F(u_n)\bigr)^{-1}\bigl(r_n\bar F(u_n)\bigr)^{2} = r_n\bar F(u_n) \to 0.$$

Hence, in C3 the lefthand side may then be replaced by

$$\bigl(r_n\bar F(u_n)\bigr)^{-1}\, E\Bigl\{\sum_{i=1}^{r_n} 1\{\xi_i > u_n + x\sigma_n\}\ \sum_{i=1}^{r_n} 1\{\xi_i > u_n + y\sigma_n\}\Bigr\}.$$


By a standard use of linearity and stationarity, this is seen to be equal to the lefthand side of C3.1. □

Proof of Theorem 3.1. By Skorokhod's representation theorem we may replace convergence in distribution with almost sure convergence, and we may in fact also disregard the "almost", and assume that

$$e(\tilde T_n)(x) \to e(x), \quad n \to \infty, \tag{6.13}$$

uniformly on compact subintervals of $[-\varepsilon, x_T)$, for $\varepsilon > 0$ sufficiently small, and as before with $x_T$ the righthand endpoint of the support of $T$. We first show that this implies that

$$\sqrt{c_n}\, T_n'(0)\, (\xi_{[c_n]} - u_n)/\sigma_n \to -e(0). \tag{6.14}$$

Let $\tilde T_n^{\leftarrow}(y) = \sup\{x;\ \tilde T_n(x) > y\}$ be the right-continuous inverse of $\tilde T_n$, and note that for all sufficiently large $n$, $\tilde T_n^{\leftarrow}(y)$ is well defined for $y$ in a neighbourhood $[1 - \varepsilon', 1 + \varepsilon']$ of $1$, for some suitable $\varepsilon' > 0$, by (3.3). Theorem 3.1 of Whitt (1980) and (6.13) imply that
$$\sqrt{c_n}\,\bigl(\tilde T_n(T_n^{\leftarrow}(x)) - x\bigr) \to e(T^{\leftarrow}(x)), \quad n \to \infty,$$
uniformly in $[1 - \varepsilon', 1 + \varepsilon']$. By a straightforward translation and extension argument, Lemma 1 of Vervaat (1972) can then be seen to imply that

$$\sqrt{c_n}\,\bigl(T_n(\tilde T_n^{\leftarrow}(x)) - x\bigr) = \sqrt{c_n}\,\bigl((\tilde T_n \circ T_n^{\leftarrow})^{\leftarrow}(x) - x\bigr) \to -e(T^{\leftarrow}(x)), \tag{6.15}$$

uniformly on closed subintervals of $(1 - \varepsilon', 1 + \varepsilon')$.

It follows from (6.13) that $\tilde T_n^{\leftarrow}(1) - T_n^{\leftarrow}(1) \to 0$, and hence, by (3.3),

$$T_n\bigl(\tilde T_n^{\leftarrow}(1)\bigr) - 1 = T_n'\bigl(T_n^{\leftarrow}(1)\bigr)\,\bigl(\tilde T_n^{\leftarrow}(1) - T_n^{\leftarrow}(1)\bigr)(1 + o(1)), \quad n \to \infty.$$

Since $T_n^{\leftarrow}(1) = 0$, and since $\tilde T_n^{\leftarrow}(1) = (\xi_{[c_n]} - u_n)/\sigma_n$ by the definitions, this and (6.15) prove (6.14).

Next, again by the definitions,

$$e(\hat T_n)(x) = e(\tilde T_n)\bigl(x + (\xi_{[c_n]} - u_n)/\sigma_n\bigr) + \sqrt{c_n}\,\frac{\bar F(\xi_{[c_n]} + x\sigma_n)}{\bar F(\xi_{[c_n]})}\,\frac{\bar F(\xi_{[c_n]}) - \bar F(u_n)}{\bar F(u_n)}. \tag{6.16}$$
Together, (6.13) and (6.14) ensure that

$$e(\tilde T_n)\bigl(x + (\xi_{[c_n]} - u_n)/\sigma_n\bigr) \to e(x), \quad n \to \infty,$$

uniformly on compact subintervals of $[0, x_T)$. Further, (3.3), (6.14) and $c_n \to \infty$, $c_n/n \to 0$ imply that

$$\frac{\bar F(\xi_{[c_n]} + x\sigma_n)}{\bar F(\xi_{[c_n]})} \to T(x), \quad n \to \infty, \tag{6.17}$$


for x ≥ 0, and that

$$\frac{\bar F(\xi_{[c_n]}) - \bar F(u_n)}{\bar F(u_n)} = T_n'(0)\,(\xi_{[c_n]} - u_n)(1 + o(1))/\sigma_n. \tag{6.18}$$

It follows from (6.16)–(6.18) and (6.14) that

$$e(\hat T_n)(x) \to e(x) - T(x)\,e(0),$$

uniformly on compact subintervals of $[0, \infty)$. □

Acknowledgement I want to thank Holger Drees for very helpful comments,and for help with one ingredient in the proof of Theorem 5.1.

References

[1] Arcones, M. A. and Yu, B. (1994). Central limit theorems for empirical andU -processes of stationary mixing sequences. J. Theoret. Probab. 7, 47-71.

[2] Beirlant, J., Segers, J., and Teugels, J. (2005). Statistics of extremes, theoryand applications. Wiley, Chichester.

[3] Berbee, H. (1979). Random walks with stationary increments and renewaltheory. Mathematical Centre tracts 112, Mathematisch Centrum, Amster-dam.

[4] Berbee, H. (1987). Convergence rates in the strong law for bounded mixingsequences. Probab. Theory Rel. Fields 74, 255-270.

[5] Billingsley, P. (1968). Convergence of probability measures. New York: Wi-ley.

[6] Bradley, R. C. (2005). Basic properties of strong mixing conditions. Asurvey and some open questions. Probab. Surveys 2, 107-144.

[7] Coles, S. G. (2001) An Introduction to Statistical Modeling of ExtremeValues. Springer, London.

[8] Csorgo, M., Csorgo, S., Horvath, L., and Mason, D.M. (1986). Weightedempirical and quantile processes. Ann. Probab. 14, 31-85.

[9] Datta, S. and McCormick W. P. (1998). Inference for the tail parametersof a linear process with heavy tail innovations. Ann. Inst. Statist. Math.50, 237-359.

[10] Deheuvels, P. and Mason, D. M. (1990). Nonstandard functional laws of theiterated logarithm for tail empirical and quantile processes. Ann. Probab.18, 1693-1722.

[11] Dehling, H., Mikosch, T., and Sorensen, M., eds (2002). Empirical process techniques for dependent data. Birkhäuser, Boston.


[12] Drees, H. (1998). On smooth tail functionals. Scand J. Statist. 25, 187-210.

[13] Drees, H. (2000). Weighted Approximations of Tail Processes for β-MixingRandom Variables. Ann. Appl. Probab. 10, 1274-1301.

[14] Drees, H. (2002). Tail empirical processes under mixing conditions. In: H.G. Dehling, T. Mikosch and M. Sorensen (eds.), Empirical Process Techniques for Dependent Data, 325-342. Birkhäuser, Boston.

[15] Drees, H. (2003). Extreme Quantile Estimation for Dependent Data withApplications to Finance. Bernoulli 9, 617-657.

[16] Doukhan, P., Massart, P., and Rio E. (1995). Invariance principles forabsolutely regular empirical processes. Ann. Inst. Henri Poincare 31, 393-427.

[17] Eberlein, E. (1984). Weak convergence of partial sums of absolutely regularsequences. Statist. Probab. Letters 2, 291-293.

[18] Einmahl, J. H. J. (1990a). Limit theorems for tail processes with applica-tion to intermediate quantile estimation. J. Statist. Planning and Inference32, 137-145.

[19] Einmahl, J. H. J. (1990b). The empirical distribution function as a tailestimator. Statist. Neerlandica 44, 79-82.

[20] Embrechts, P., Kluppelberg C., and Mikosch, T. (1997). Modelling extremalevents. Springer: New York.

[21] Hill, J. B. (2006). On tail index estimation using heterogeneous, dependentdata. Working paper, Dept. of Economics, Florida International university.

[22] Hsing, T. (1991). On tail estimation using dependent data. Ann. Statist.19, 1547-1569.

[23] Kotz, S. and Nadarajah, S. (2000). Extreme value distributions : theoryand applications. Imperial College Press: London.

[24] Kowaka, M. (1994). An Introduction to Life Prediction of Plant Materials.Application of Extreme Value Statistical Methods for Corrosion Analysis.Allerton Press: New York.

[25] Leadbetter, M. R., Lindgren, G., and Rootzen, H. (1983). Extremes andrelated properties of random sequences and processes. New York: Springer.

[26] Mason, D. M. (1988). A strong invariance principle for the tail empiricalprocess. Ann. Inst. Henri Poincare 24, 491-506.

[27] Novak, S.Y. (2002). Inference on heavy tails from dependent data. Siberianadv. Math. 12, 73-96.

[28] Petrov, V.V. (1995). Limit theorems of probability theory, sequences of independent variables. Oxford University Press: Oxford.


[29] Pollard, D. (1984). Convergence of stochastic processes. New York:Springer.

[30] Reiss, R. and Thomas, M. (2005). Statistical Analysis of Extreme Values (for Insurance, Finance, Hydrology and Other Fields). 3rd revised edition, Birkhäuser: Basel.

[31] Resnick, S. and Starica, C. (1995). Consistency of Hill’s estimator for de-pendent data. J. Appl. Probab. 32, 139-167.

[32] Resnick, S. and Starica, C. (1997). Asymptotic behavior of the Hill esti-mator for autoregressive data. Commun. Statist. - Stochastic Models 13,703-731.

[33] Resnick, S. and Starica, C. (1998). Tail estimation for dependent data.Ann. Appl. Probability 8, 1156-1183.

[34] Rootzen, H. (1995). The tail empirical process for stationary sequences.Technical report, Department of Mathematics, Chalmers University, Swe-den.

[35] Rootzen, H., Leadbetter M. R., and de Haan L. (1992). Tail and quantileestimation for strongly mixing stationary processes. Report, Departmentof Statistics, University of North Carolina.

[36] Rootzen, H., Leadbetter M. R. and de Haan, L. (1998). On the distributionof tail array sums for strongly mixing stationary sequences. Ann. Appl.Probab. 8, 868-885.

[37] Shao, Q.M. and Yu, H. (1996). Weak convergence for weighted empiricalprocesses of dependent sequences. Ann. Probab. 24, 2098-2127.

[38] Stephenson and Gilleland (2005). Software for the analysis of extremeevents: the current status and future directions. Extremes 8, no. 3.

[39] Vervaat, W. (1972). Functional central limit theorems for processes with positive drift and their inverses. Z. Wahrscheinlichkeitstheorie verw. Geb. 23, 245-253.

[40] Whitt, W. (1980). Some useful functions for functional limit theorems.Math. Oper. Res. 5, 67-85.

[41] Volkonski, V. A. and Rozanov, Yu, A. (1959, 1961). Some limit theoremsfor random functions, I and II. Theory Probab. Appl. 4, 178-197 and 6,186-198.


A Appendix: an example which satisfies C2, C3, and C4, but where the conclusion of Theorem 2.1 doesn't hold

The example uses the simplest possible discrete renewal process to group theintegers into “cycles”. Then, for each cycle, with probability 1/2 all the ξi’s inthe cycle are equal and equal to the cycle length, and with probability 1/2 theyare i.i.d.

Formally, let $\{U_i\}_{-\infty}^{\infty}$ be i.i.d. r.v.'s with $P(U_i = 0) = P(U_i = 1) = 1/2$, let $\ldots, T_{-1} < T_0 \le 0 < T_1 < T_2 < \ldots$ be the values of $i$ where $U_i = 1$, let $T_i + 1, \ldots, T_{i+1}$ be the $i$-th cycle, and let $C_i = T_{i+1} - T_i$ be its length. Then, by straightforward computation,

$$P(C_i \ge x) = P(T_1 \ge x) = \frac{1}{2^{x-1}}, \quad x \ge 1,\ i \ne 0, \qquad \text{and} \qquad P(C_0 = x) = \frac{x}{2^{x+1}}. \tag{A.1}$$

Next, let $\xi_i' = C_k$ for $T_k < i \le T_{k+1}$, $k = 0, \pm 1, \ldots$, let $\{\xi_i''\}$ be continuous i.i.d. r.v.'s with $P(\xi_i'' > x) = 2x^2 2^{-x}$ for all $x$ greater than some suitable $x_0$, let $\{I_k\}_{-\infty}^{\infty}$ be i.i.d. with $P(I_k = 1) = P(I_k = 0) = 1/2$, and define
$$\xi_i = I_k \xi_i' + (1 - I_k)\xi_i'' \quad \text{for } T_k \le i < T_{k+1},\ k = 0, \pm 1, \ldots.$$

Thus, with probability 1/2 all values in a cycle are equal, and equal to the cycle length, and with probability 1/2 the values in the cycle are i.i.d.; and cycles are mutually independent. Then $\{\xi_i\}_{-\infty}^{\infty}$ is stationary, and (for $x > x_0$)

$$\begin{aligned}
\bar F(x) = P(\xi_0 > x) &= \tfrac{1}{2} P(\xi_0' > x) + \tfrac{1}{2} P(\xi_0'' > x) \\
&= \tfrac{1}{2} P(C_0 > x) + \tfrac{1}{2}\, 2x^2 \frac{1}{2^x} = \tfrac{1}{2} \sum_{i=x+1}^{\infty} \frac{i}{2^{i+1}} + x^2 \frac{1}{2^x} \\
&\sim x^2 \frac{1}{2^x}, \quad x \to \infty.
\end{aligned} \tag{A.2}$$
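A simulation sketch of this construction may help fix ideas. It is illustrative only: it starts at a renewal, so the size-biased cycle containing 0 is ignored, and the sampler for $\xi''$ is left as a user-supplied placeholder rather than implementing the tail $2x^2 2^{-x}$ exactly:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_example(n, sample_xi_pp):
    """Generate xi_1, ..., xi_n following the appendix construction: cycle lengths
    are geometric with P(C = k) = 2**(-k); within a cycle, with probability 1/2 all
    values equal the cycle length, otherwise they are i.i.d. draws from xi''."""
    xs = np.empty(n)
    i = 0
    while i < n:
        c = rng.geometric(0.5)            # cycle length C, P(C = k) = 2**(-k)
        m = min(c, n - i)
        if rng.random() < 0.5:            # I_k = 1: all values in the cycle equal the cycle length
            xs[i:i + m] = c
        else:                             # I_k = 0: i.i.d. values within the cycle
            xs[i:i + m] = sample_xi_pp(m)
        i += m
    return xs
```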

Choose $0 < \ell < r < c\log 2 < 1$ and $\varepsilon > 0$ such that $r < (1/2 - \varepsilon)(1 - c\log 2)$, and put
$$u_n = [c\log n], \quad r_n = [n^{r}], \quad \ell_n = [n^{\ell}],$$
and set $\sigma_n = 1$ for $n \ge 1$.

We claim that the process $\{\xi_i\}$ satisfies all the conditions of Theorem 2.1, for $p = 2$, except C1, but the limit of $e(\tilde T_n)$ is not continuous. An easy modification would give an example where $\{e(\tilde T_n)\}_{n=1}^{\infty}$ in addition is not tight in the Skorokhod $J_1$ topology. This is left to the reader.

The following statements (to be proved below) substantiate the claim:

$\{\xi_i\}$ is $\beta$-mixing at an exponential rate, i.e.,
$$\beta(n) = O(e^{-\beta n}) \quad \text{for some } \beta > 0. \tag{A.3}$$

Further, setting $\gamma = 0$, $\sigma = 1/\log 2$, the conditions (1.1), (2.1), C3.1 and $r_n\bar F(u_n) \to 0$ (and hence also C3), and C4 hold, with

$$r(x, y) = \frac{1}{2^{x\vee y}} + \frac{1}{2^{\lceil x\vee y\rceil + 1}}, \quad x, y \ge 0,$$


where $\lceil a\rceil$ denotes the smallest integer which is not strictly less than $a$. Finally,

$$e(\tilde T_n)(1-) - e(\tilde T_n)(1) \overset{P}{\nrightarrow} 0. \tag{A.4}$$

Sketch of proof. The claim (A.3) follows by the same argument as in the proofof Theorem 6.1 of Berbee (1987) since the Ci’s have finite exponential moments.

Next, (1.1) easily follows from (A.2). Since $0 < r < c\log 2 < 1$, and defining $\theta_n \in [0, 1]$ by $u_n = c\log n - \theta_n$, we also get from (A.2) that
$$\bar F(u_n) \sim u_n^2\,\frac{1}{2^{u_n}} \sim c^2(\log n)^2\, n^{-c\log 2}\, 2^{\theta_n}, \tag{A.5}$$

and it then is straightforward to check (2.1). Moreover,

$$r_n\bigl(n\bar F(u_n)\bigr)^{-1/2+\varepsilon} = O\Bigl(n^{r}\bigl((\log n)^2 n^{1-c\log 2}\bigr)^{-1/2+\varepsilon}\Bigr) \to 0, \quad n \to \infty,$$

and C2 and $r_n\bar F(u_n) \to 0$ are similarly seen to hold.

It is a bit more involved to establish C3.1. For $x, y \ge 0$, $i \ge 1$,

$$\begin{aligned}
P(\xi_0 > u_n + x,\ \xi_i > u_n + y) &= \tfrac{1}{2} P(\xi_0' > u_n + x,\ \xi_i' > u_n + y,\ T_1 > i) \\
&\quad + \tfrac{1}{2} P(\xi_0'' > u_n + x,\ \xi_i'' > u_n + y,\ T_1 > i) \\
&\quad + P(\xi_0 > u_n + x,\ \xi_i > u_n + y,\ T_1 \le i).
\end{aligned} \tag{A.6}$$

Here, since un is an integer,

$$P(\xi_0' > u_n + x,\ \xi_i' > u_n + y,\ T_1 > i) = \sum_{k=i+1}^{\infty} P\bigl(\xi_0' > u_n + \lceil x\vee y\rceil \mid T_1 = k\bigr)\, P(T_1 = k). \tag{A.7}$$

Write $(a)_+$ for the positive part of $a$ and set $u' = u_n + \lceil x\vee y\rceil + 1$. By (A.1) and the definition of the $\xi_i'$, the last expression in (A.7) equals

$$\sum_{k=i+1}^{\infty} P\bigl(T_0 \le -(u' - k)\bigr)\, P(T_1 = k) = \sum_{k=i+1}^{\infty} \Bigl\{\frac{1}{2^{u'-k}}\, 1\{k \le u'\} + 1\{k > u'\}\Bigr\} \frac{1}{2^{k}} = \frac{1}{2^{u'}}(u' - i)_+ + \frac{1}{2^{u'\vee i}}.$$

Thus, since u′/rn → 0, and u′ →∞, as n →∞,

$$\sum_{i=1}^{r_n-1} \Bigl(1 - \frac{i}{r_n}\Bigr) P(\xi_0' > u_n + x,\ \xi_i' > u_n + y,\ T_1 > i) = \sum_{i=1}^{r_n-1} \Bigl(1 - \frac{i}{r_n}\Bigr) \Bigl\{\frac{1}{2^{u'}}(u' - i)_+ + \frac{1}{2^{u'\vee i}}\Bigr\} \sim \frac{1}{2}(u')^2\,\frac{1}{2^{u'}}, \quad n \to \infty. \tag{A.8}$$


By definition $P(\xi_0'' > u_n) \le 2\bar F(u_n)$ and hence, since $r_n\bar F(u_n) \to 0$,

$$\bar F(u_n)^{-1} \sum_{i=1}^{r_n-1} \Bigl(1 - \frac{i}{r_n}\Bigr) P(\xi_0'' > u_n + x,\ \xi_i'' > u_n + y,\ T_1 > i) \le \bar F(u_n)^{-1}\, r_n\, P(\xi_0'' > u_n)^2 \to 0, \quad n \to \infty. \tag{A.9}$$

Since $P(\xi_0 > u_n + x,\ \xi_i > u_n + y,\ T_1 \le i) \le \bar F(u_n)^2$, also
$$\bar F(u_n)^{-1} \sum_{i=1}^{r_n-1} \Bigl(1 - \frac{i}{r_n}\Bigr) P(\xi_0 > u_n + x,\ \xi_i > u_n + y,\ T_1 \le i) \to 0. \tag{A.10}$$

The same arguments show that (A.8)-(A.10) hold also if the sum from $1$ to $r_n - 1$ is replaced by a sum from $-(r_n - 1)$ to $-1$. Since in addition

$$P(\xi_0 > u_n + x,\ \xi_0 > u_n + y) = \bar F(u_n + x \vee y),$$

it follows from (A.6), (A.8) - (A.10), and (A.2) that

$$\begin{aligned}
\bar F(u_n)^{-1} \sum_{|i|<r_n} \Bigl(1 - \frac{|i|}{r_n}\Bigr) P(\xi_0 > u_n + x,\ \xi_i > u_n + y)
&\sim \bar F(u_n)^{-1}\Bigl\{\bar F(u_n + x\vee y) + (u')^2\,\frac{1}{2^{u'}}\Bigr\} \\
&\to \frac{1}{2^{x\vee y}} + \frac{1}{2^{\lceil x\vee y\rceil + 1}}, \quad n \to \infty,
\end{aligned} \tag{A.11}$$

and hence C3.1 holds.

To prove (A.4), let $\eta_i = 1$ if $i$ starts a sequence of $u_n + 1$ of the $\xi$'s which all equal $u_n + 1$, i.e. $\eta_i = 1$ if $\xi_{i-1}' \ne \xi_i' = \ldots = \xi_{i+u_n}' = u_n + 1$ and the corresponding $I_k = 1$, and let $\eta_i = 0$ otherwise. Then, writing $\Delta_n = n\bar F(u_n)\{\tilde T_n(1-) - \tilde T_n(1)\} = \sum_{i=1}^{n} 1\{\xi_i = u_n + 1\}$ and $S_n = \sum_{i=1}^{n} \eta_i$,

$$|\Delta_n - (u_n + 1) S_n| \overset{\text{a.s.}}{\le} u_n. \tag{A.12}$$

By straightforward computations, $E(\eta_i) = \frac{1}{2^{u_n+3}}$ and $V\bigl(\sum_{i=1}^{r_n} \eta_i\bigr) \sim r_n E\eta_1$.

Using (A.3) and reasoning as for C3.1, $S_n$ may be seen to satisfy the last set of conditions of Corollary 4.2 of Rootzen et al. (1998), and hence

$$\frac{S_n - nE\eta_1}{\sqrt{nE\eta_1}} \xrightarrow{d} N(0, 1), \quad n \to \infty. \tag{A.13}$$

Now, by (A.13) and $u_n/\sqrt{n\bar F(u_n)} \to 0$, and using that $F$ is continuous, it follows that
$$\begin{aligned}
e(\tilde T_n)(1-) - e(\tilde T_n)(1) &= \frac{\Delta_n}{\sqrt{n\bar F(u_n)}} = \frac{(u_n + 1)(S_n - nE\eta_1)}{\sqrt{n\bar F(u_n)}} + E(\eta_1)\sqrt{\frac{n}{\bar F(u_n)}} + o(1) \\
&= \frac{S_n - nE\eta_1}{\sqrt{nE\eta_1}}\,\sqrt{\frac{(u_n + 1)^2 E\eta_1}{\bar F(u_n)}} + m_n + o(1).
\end{aligned}$$

Since $(u_n + 1)^2 E\eta_1/\bar F(u_n) \to 2^{-2}$, (A.4) follows.
