recursive arch EXTENDED - Dept. of Statistics, Texas …suhasini/papers/recursive_arch_EXTENDED.pdfA...

42
A recursive online algorithm for the estimation of time-varying ARCH parameters extended version Rainer Dahlhaus and Suhasini Subba Rao Institut f¨ ur Angewandte Mathematik Universit¨atHeidelberg Im Neuenheimer Feld 294 69120 Heidelberg Germany November 19, 2006 Abstract In this paper we propose an online recursive algorithm for estimating the parameters of a time-varying ARCH process. The estimation is done by updating the estimator at time point (t 1) with observations about the time point t to yield an estimator of the parameter at time point t. The sampling properties of this estimator are studied in a nonstationary context, in particular asymptotic normality and an expression for the bias due to nonstationarity are established. The minimax risk for the tvARCH process is evaluated and compared with the mean squared error of the online recursive estimator. It is shown, if the time-varying parameters belong to a H¨ older class of order less than or equal to one, then, with a suitable choice of step-size, the recursive online estimator attains the local minimax bound. On the other hand, if the order of the H¨ older class is greater than one, the minimax bound for the recursive algorithm cannot be attained. However, by running two recursive online algorithms in parallel with different step-sizes and taking a linear combination of the estimators the minimax bound can be attained for H¨ older classes of order between one and two. Supported by the Deutsche Forschungsgemeinschaft (DA 187/12-2). This technical report is the extended version of the paper “A recursive online algorithm for the estimation of time-varying ARCH parameters”, to appear in ... In addition to the published paper this version contains in particular a minimax result in Theorem 2.4, Remark 2.2, Theorem A.1 (being part of the proof of Theorem 3.1) and the proof of Lemma A.6. 1

Transcript of recursive arch EXTENDED - Dept. of Statistics, Texas …suhasini/papers/recursive_arch_EXTENDED.pdfA...

A recursive online algorithm for the estimation of

time-varying ARCH parameters∗

extended version†

Rainer Dahlhaus and Suhasini Subba Rao

Institut fur Angewandte Mathematik

Universitat Heidelberg

Im Neuenheimer Feld 294

69120 Heidelberg

Germany

November 19, 2006

Abstract

In this paper we propose an online recursive algorithm for estimating the parameters

of a time-varying ARCH process. The estimation is done by updating the estimator at

time point (t− 1) with observations about the time point t to yield an estimator of the

parameter at time point t. The sampling properties of this estimator are studied in a

nonstationary context, in particular asymptotic normality and an expression for the bias

due to nonstationarity are established. The minimax risk for the tvARCH process is

evaluated and compared with the mean squared error of the online recursive estimator.

It is shown, if the time-varying parameters belong to a Holder class of order less than

or equal to one, then, with a suitable choice of step-size, the recursive online estimator

attains the local minimax bound. On the other hand, if the order of the Holder class is

greater than one, the minimax bound for the recursive algorithm cannot be attained.

However, by running two recursive online algorithms in parallel with different step-sizes

and taking a linear combination of the estimators the minimax bound can be attained

for Holder classes of order between one and two.

∗Supported by the Deutsche Forschungsgemeinschaft (DA 187/12-2).†This technical report is the extended version of the paper “A recursive online algorithm for the estimation

of time-varying ARCH parameters”, to appear in ... In addition to the published paper this version contains

in particular a minimax result in Theorem 2.4, Remark 2.2, Theorem A.1 (being part of the proof of Theorem

3.1) and the proof of Lemma A.6.

1

1 Introduction

The class of ARCH processes can be generalised to include nonstationary processes, by

including models with parameters which are time dependent. More precisely Xt,N is

called a time-varying ARCH (tvARCH) process of order p if it satisfies the representation

Xt,N = Ztσt,N , σ2t,N = a0(

t

N) +

p∑

j=1

aj(t

N)X2

t−j,N , (1)

where Zt are independent, identically (iid) distributed random variables with E(Z0) = 0

and E(Z20 ) = 1. This general class of time-varying ARCH (tvARCH) processes was investi-

gated in detail in Dahlhaus and Subba Rao (2006). It was shown that the tvARCH process

can locally be approximated by a stationary process, the details we summarise below. Fur-

thermore, a local quasi-likelihood method was proposed to estimate the parameters of the

tvARCH(p) model.

A potential application of the tvARCH process is to model long financial time series. The

modelling of financial data using nonstationary time series models has recently attracted

considerable attention. A justification for using such models can be found, for example,

in Mikosch and Starica (2000, 2004). However given that financial time series are often

sampled at high frequency, evaluating the likelihood as each observation comes online can

be computationally expensive. Thus an “online” method, which uses the previous estimate

of the parameters at time point (t− 1) and the observation at time point t to estimate the

parameter at time point t, would be ideal and cost effective. There exists a huge literature

on recursive algorithms - mainly in the context of linear systems (cf. Ljung and Soderstrom

(1983), Solo (1981, 1989) or in the context of neural networks (cf. White (1996), Chen and

White (1998)). For a general overview see also Kushner and Yin (2003). Motivated by

the least mean squares algorithm considered in Moulines et al. (2005) for time-varying AR

processes, we consider in this paper the following online recursive algorithm for tvARCH

models:

at,N = at−1,N + λX2t,N − aT

t−1,NXt−1,N Xt−1,N

|Xt−1,N |21, t = (p + 1), . . . , N, (2)

where X Tt−1,N = (1, X2

t−1,N , . . . , X2t−p,N ) and where |Xt−1,N |1 = 1 +

∑pj=1 X2

t−j,N and we

start with the initial conditions ap,N = (0, . . . , 0). This algorithm is linear in the es-

timators, despite the nonlinearity of the tvARCH process. We call the stochastic algo-

rithm defined in (2) the ARCH normalised recursive estimation (ANRE) algorithm. Let

a(u)T = (a0(u), . . . , ap(u)), then at,N is regarded as an estimator of a( tN ) or of a(u) if

|t/N − u| < 1/N .

In this paper we will prove consistency and asymptotic normality of this recursive estima-

tor. Furthermore, we will discuss the improvements of the estimator obtained by combining

two estimates from (2) with different λ. Unlike most other work in the area of recursive

1

estimation the properties of the estimator are proved under the assumptions that the true

process is a process with time-varying coefficients, i.e. a nonstationary process. The rescal-

ing of the coefficients in (1) to the unit interval corresponds to the ’infill asymptotics’ in

nonparametric regression: As N → ∞ the system does not describe the asymptotic behavior

of the system in a physical sense but is meant as a meaningful asymptotics to approximate

e.g. the distribution of estimates based on a finite sample size. A similar approach has been

used in Moulines et al. (2005) for tvAR-models. A more detailed discussion of the relevance

of this approach and the relation to non-rescaled processes can be found in Section 3.

In fact the ANRE algorithm resembles the NLMS-algorithm investigated in Moulines

et al. (2005). Rewriting (2) we have

at,N = (I − λXt−1,NX T

t−1,N

|Xt−1,N |21) at−1,N + λ

X2t,NXt−1,N

|Xt−1,N |21. (3)

We can see from (3) that the convergence of the ANRE algorithm relies on showing some

type of exponential decay of the past. In this paper we will show that for any p > 0

E

k∏

i=1

(I − λ1

|Xt−i−1,N |21Xt−i−1,NX T

t−i−1,N )

p

≤ K(1 − λδ)k for some δ > 0, (4)

where ‖·‖ denotes the spectral norm of a matrix and∏n

i=0 Ai = A0 · · ·An. Roughly speaking

this means we have, on average, exponential decay of the past. Similar properties are often

established in the control literature and it is often referred to as persistence of excitation

of the stochastic matrix (see, for example, Guo (1994) and Aguech et al. (2000)), which

in our case is the matrix (1 − λ|Xt−1,N |21

Xt−1,NX Tt−1,N ). Persistence of excitation guarentees

convergence of the algorithm, which we will use to prove the asymptotic properties of at,N .

In Section 2 we state all results on the asymptotic behaviour of at,N including con-

sistency, asymptotic normality and rate efficiency. Furthermore, we suggest a modified

algorithm based on two parallel algorithms. In Section 3 we discuss practical implications.

Sections 4 and 5 contain the proofs which in large parts are based on the pertubation tech-

nique. We also derive a lower bound for the optimal rate of such estimators and prove that

it is achieved. Some technical methods are put in the appendix. We note that some of re-

sults in the appendix are of independent interest, as they deal with many of the probablistic

properties of both ARCH and tvARCH processes and their vector representations.

2 The ANRE algorithm

We first review some properties of the tvARCH process. Dahlhaus and Subba Rao (2006)

and Subba Rao (2004) have shown that the time varying ARCH process can locally be

approximated by a stationary ARCH process. Let u be fixed and

Xt(u) = Ztσt(u) σt(u)2 = a0(u) +

p∑

j=1

aj(u)Xt−j(u)2, (5)

2

where Zt are independent, identically distributed random variables with E(Z0) = 0 and

E(Z20 ) = 1. We also set Xt(u)T = (1, Xt(u)2, . . . , Xt−p+1(u)2). In Lemma 4.1 we show that

Xt(u)2 can be regarded as the stationary approximation of X2t,N around the time points

t/N ≈ u.

Let λmin(A) and λmax(A) denote the smallest and largest eigenvalues of the matrix A.

We use the following assumptions.

Assumption 2.1 Let Xt,N and Xt(u) be sequences of stochastic processes which satisfy

(1) and (5) respectively. Furthermore

(i) For some r ∈ [1,∞), there exists η > 0 with

E(Z2r0 )1/r sup

u

p∑

j=1

aj(u) < 1 − η.

(ii) There exists 0 < ρ1 ≤ ρ2 < ∞ such that for all u ∈ (0, 1] ρ1 ≤ a0(u) ≤ ρ2.

(iii) There exists β ∈ (0, 1] and a constant K such that for u, v ∈ (0, 1]

|aj(u) − aj(v)| ≤ K|u − v|β for j = 0, . . . , p.

(iv) Let Y0(u) = a0(u)Z20 and

and Yt(u) =

a0(u) +t∑

j=1

aj(u)Yt−j(u)

Z2t for t = 1, . . . , p.

Define Y p(u) = (1, Y1(u), . . . , Yp(u))T and Σ(u) = E(

Y p(u)Y p(u)T)

. Then there

exists a constant C such that

infu

λmin

Σ(u)

≥ C.

Remark 2.1 It is clear that Σ(u) is a positive semi-definite matrix, hence its smallest

eigenvalue is greater than or equal to zero. It can be shown that if p/E(Z4t )1/2 < 1 and

supu a0(u) > 0, then λmin(Σ(u)) > (1 − p/E(Z4t )1/2)/a0(u)(2p+1)/(p+1). However this con-

dition is only sufficient and lower bounds can be obtained under much weaker conditions.

We now investigate some of the asymptotic properties of the ANRE algorithm. Through-

out this paper we assume that λ → 0 and λN → ∞ as N → ∞. We mention explicitly that

λ does not depend on t, i.e. we are considering the fixed-stepsize case. The assumption

λ → 0 is possible in the triangle array framework of our model and the resulting asser-

tions (for example Theorem 2.2) are meant as an approximation of the corresponding finite

sample size distributions and not as the limit in any physical sense.

3

The following results are based on a representation proved at the end of Section 4.3.

The difference at0,N − a(u0) is dominated by two terms, that is

at0,N − a(u0) = Lt0(u0) + Rt0,N (u0) + Op(δN ), (6)

where

δN = (1

(Nλ)2β+

√λ

(Nλ)β+ λ +

1

Nβ), (7)

Lt0(u0) =

t0−p−1∑

k=0

λI − λF (u0)kMt0−k(u0) (8)

Rt0,N (u0) =

t0−p−1∑

k=0

λI − λF (u0)k

(

Mt0−k(t0 − k

N) −Mt0−k(u0) +

F (u0)a(t0 − k

N) − a(u0)

)

,

with

Mt(u) = (Z2t − 1)σt(u)2

Xt−1(u)

|Xt−1(u)|21(9)

and

F (u) = E

(X0(u)X0(u)T

|X0(u)|21

)

. (10)

We note that Lt0(u0) and Rt0,N (u0) play two different roles. Lt0(u0) is the weighted sum of

the stationary random variables Xt(u0)t, which locally approximate the tvARCH process

Xt,Nt, whereas Rt0,N (u0) is the (stochastic) bias due to nonstationarity; if the tvARCH

process were stationary this term would be zero. It is clear from the above that the magni-

tude of Rt0,N (u0) depends on the regularity of the time-varying parameters a(u) e.g., the

Holder class that a(u) belongs to. By using (6) we are able to obtain a bound for the mean

squared error of at0,N . Let | · | denote the Euclidean norm of a vector.

Theorem 2.1 Suppose Assumption 2.1 holds with r > 4 and u0 > 0. Then if |u0−t0/N | <

1/N we have

E

|at0,N − a(u0)|2

= O(λ +1

(Nλ)2β), (11)

where λ → 0, as N → ∞ and Nλ >> (log N)1+ε with some ε > 0.

The proof can be found at the end of Section 4.2. We note that an immediate consequence

of Theorem 2.1 is that the estimator at0,NP→ a(u0).

4

We note that the condition r > 4, is quite strong. We use the condition r = 4, to

prove persistence of excitation (see Section A.2), which is an integral part in the proof of

the theorem. However, we believe that it is possible to prove the same result under lower

moment conditions. In order to obtain the mean squared error we use the condition r > 4.

The stochastic term Lt0(u0) is the sum of martingale differences, which allows us to

obtain the following central limit theorem, whose proof is at the end of Section 4.3.

Theorem 2.2 Suppose Assumption 2.1 holds with r > 4 and u0 > 0. Let Rt0,N (u0) be

defined as in (8). If |t0/N − u0| < 1/N ,

(i) λ >> N−4β/(4β+1) and λ >> N−2β then

λ−1/2at0,N − a(u0) − λ−1/2Rt0,N (u0)D→ N (0, Σ(u0)), (12)

(ii) and λ >> N− 2β

2β+1 then

λ−1/2at0,N − a(u0) D→ N (0, Σ(u0)), (13)

where λ → 0 as N → ∞ and Nλ >> (log N)1+ε, for some ε > 0, with

Σ(u) =µ4

2F (u)−1

E(σ1(u)4X0(u)X0(u)T

|X0(u)|41)

(14)

and µ4 = E(Z40 ) − 1.

Remark 2.2 (Stationary ARCH-models) We note that an analogous algorithm to the

ANRE algorithm exists for stationary ARCH processes. Suppose the stationary ARCH(p)

process Xtt satisfies

Xt = Ztσt σ2t = a0 +

p∑

j=1

ajX2t−j ,

where Ztt are iid random variables with E(Zt) = 0, E(Z2t ) = 1, a0 > 0, aj ≥ 0 if 1 ≤ j ≤ p,

and∑p

j=1 aj < 1. Let aT = (a0, . . . , ap) and

at = at−1 +K

tX2

t − aTt−1Xt−1

Xt−1

|Xt−1|21, t ≥ (p + 1),

where X Tt−1 = (1, X2

t−1, . . . , X2t−p) and K is a finite positive constant. Then given the

observations Xk (k = 1, . . . , t), we use at as an estimator of a. Since the stepsize λ = Kt

does not fulfill our assumptions we cannot apply the above theorems directly. However, we

believe that by using the same methodology it can be shown that

atP→ a,

5

and

t1/2(at − a)D→ N (0, Σ) with Σ =

µ4

2E(X0X T

0

|X0|21)−1

E(σ4

1X0X T0

|X0|41)

.

This algorithm has the advantage that it is simple to evaluate and gives “stable” estimators,

in the sense that the difference |at − at+1|1 is small.

Until now we have assumed that a(u) ∈ Lip(β), where β ≤ 1. Let f(u) denote the

derivative of the vector or matrix f(·) with respect to u, Suppose 0 < β′ ≤ 1 and a(u) ∈Lip(β′), then we say a(u) ∈ Lip(1+β′). We now show that an exact expression for the bias

can be obtained if a(u) ∈ Lip(1 + β′) and β′ > 0.

We make the following assumptions.

Assumption 2.2 Let Xt,N be a sequence of stochastic processes which satisfy (1). Sup-

pose

(i) Assumption 2.1 holds.

(ii) For some β′ > 0

|ai(u) − ai(v)| ≤ K|u − v|β′

, for i = 0, . . . , p.

Under the above assumptions we show in Lemma 5.3 that

Eat0,N − a(u0) = − 1

NλF (u0)

−1a(u0) + O(1

(Nλ)1+β′ ). (15)

We note that at0,N is not a typical parameter estimator of an ARCH process, where usually

it is not possible to obtain an exact expression for the bias.

By using the bias above we obtain the following theorem, whose proof is at the end of

Section 5. Let tr(·) denote the trace of the matrix.

Theorem 2.3 Suppose Assumption 2.2 holds with r > 4 and u0 > 0. Then if |t0/N −u0| <

1/N , we have

E|at0,N − a(u0)|2 = λ trΣ(u0) +1

(Nλ)2∣

∣F (u0)−1a(u0)|2 + (16)

O

1

(Nλ)2+β′ +λ1/2

(Nλ)1+β′ +1

(Nλ)2

,

and if λ is such that λ−1/2/(Nλ)1+β′ → 0, then

λ−1/2(at0,N − a(u0)) + λ−1/2 1

NλF (u0)

−1a(u0)D→ N (0, Σ(u0)), (17)

where λ → 0 as N → ∞ and λN >> (log N)1+ε, for some ε > 0.

6

We now consider the minimax risk for estimators of tvARCH processes. Let ⌊β⌋ denote

the largest integer strictly less than β and define the norm ‖ · ‖β on f : [0, 1] → Rp+1 as

‖f‖β = supu,v

|f ⌊β⌋(u) − f ⌊β⌋(v)|1|u − v|β−⌊β⌋

.

Let Cβ(L) define the class

Cβ(L) =

f(·) =(

f0(·), . . . , fp+1(·))

∣ f : [0, 1] → (R+)p+1, ‖f‖β ≤ L, (18)

supu

p+1∑

i=1

fi(u) < 1 − δ, 0 < ρ1 < infu

f0(u) < ρ2 < ∞

.

It is clear if a(·) ∈ Cβ(L) are the parameters of the tvARCH process Xt,N, then Assump-

tion 2.1(i,ii,iii) is satisfied. In the following theorem we evaluate a lower bound for the

minimax risk.

Theorem 2.4 Let fZ denote the density function of Zk. Suppose E(Z4t ) < ∞ and

Z = E

1 + Zkd log fZ(x)

dx⌋x=Zk

2

< ∞.

Let Cβ(L) be defined as in (18) and If |t0/N − u0| < 1/N , then for any ‖x‖ ≤ 1, we have

infat0,N∈σ(Xt,N ;t=1,...,N)

supa(u)∈Cβ(L)

E[xT (at0,N − a(u0))(at0,N − a(u0))T x]

≥ C(Z, E(Z40 ), ρ1)) N−2β/(2β+1), (19)

where Xt,N is a tvARCH process defined by the parameters a(·) and C(Z, E(Z40 , ρ1)) is a

constant which only depends on Z, E(Z40 ) and ρ1.

Comparing this bound with (11) it is straightforward to show that the ANRE algorithm

attains the optimal rate if a(·) ∈ Lip(ν) with ν ≤ 1 (with λ = N− 2ν1+2ν ). The story is different

when 1 < ν < 2. If a(·) ∈ Lip(1+β′), β′ > 0, the mean squared error of the ANRE estimator

in (16) gets minimal for λ ≈ N−2/3 with minimum rate E|at0,N − a(u0)|2 = O(N−2/3).

However, in (19) the minimax rate in this case is N−

2(1+β′)

1+2(1+β′) which is smaller than N−2/3.

We now present a recursive method which attains the optimal rate even in this case.

Remark 2.3 (Bias correction, rate optimality) The idea here is to achieve a bias cor-

rection and the optimal rate by running two ANRE algorithms with different stepsizes λ1

and λ2 in parallel. Let at,N (λ1) and at,N (λ2) be the ANRE algorithms with stepsize λ1 and

λ2 respectively, and assume that λ1 > λ2. By using (15) for i = 1, 2 we have

Eat0,N (λi) = a(u0) −1

NλiF (u0)

−1a(u0) + O( 1

(Nλi)1+β′

)

. (20)

7

Since a(u0) − 1Nλi

F (u0)−1a(u0) ≈ a

(

u0 − 1Nλi

F (u0)−1)

we heuristically estimate a(u0 −1

NλiF (u0)

−1) instead of a(u0) by the algorithm. By using two different λi we can find a

linear combination of the corresponding estimates such that we “extrapolate” the two values

a(u0) − 1Nλi

F (u0)−1a(u0) (i = 1, 2) to a(u0). Formally let 0 < w < 1, λ2 = wλ1 and

at0,N (w) =1

1 − wat0,N (λ1) −

w

1 − wat0,N (λ2).

If |t0/N − u0| < 1/N , then by using (20) we have

Eat0,N (w) = a(u0) + O( 1

(Nλ)1+β′

)

.

By using Propostion 4.3 we have

E|at0,N − a(u0))|2 = O(λ +1

(Nλ)2(1+β′)),

and choosing λ = const × N−(2+2β′)/(3+2β′) gives the optimal rate. It remains the problem

of choosing λ (and w). It is obvious that λ should be chosen adaptively to the degree of

nonstationarity. That is λ should be large if the characteristics of the process are changing

more rapidly. However, a more specific suggestion would require more investigations - both

theoretically and by simulations.

The above method can not be extended to higher order derivatives since then the other

remainders of order (Nλ)−2 (see (16) and the proof of Theorem 2.3).

Finally we mention that choosıng λ2 < wλ1 will lead to an estimator of a(u0 + ∆) with

some ∆ > 0 (with rate as above). This could be the basis for the prediction of volatility of

time varying ARCH-processes.

3 Practical implications

Suppose that we observe data from a (non-rescaled) time-varying ARCH-process in discrete

time

Xt = Ztσt, σ2t = a0(t) +

p∑

j=1

aj(t)X2t−j , t ∈ Z. (21)

In order to estimate a(t) we use the estimator at as given in (2) (with all subscripts

N dropped). An approximation for the distribution of the estimator is given by Theo-

rem 2.2. Theorem 2.2(ii) can be used directly since it is completely formulated without

N . The matrices F (u0) and Σ(u0) depend on the unknown stationary approximation

Xt(u0) of the process at u0 = t0/N , i.e. at time t0 in non-rescaled time. Since this ap-

proximation is unknown we may use instead the process itself in a small neighborhood

of t0, i.e. we may estimate for example F (u0) by 1m

∑m−1j=0

Xt0−jXTt0−j

|Xt0−j |21with m small and

8

X Tt−1 = (1, X2

t−1, . . . , X2t−p). An estimator which fits better to the recursive algorithm is

[1 − (1 − λ)t0−p+1]−1∑t0−p

j=0 λ(1 − λ)j Xt0−jXTt0−j

|Xt0−j |21. In the same way we can estimate Σ(u0)

which altogether leads e.g. to an approximate confidence interval for at. In a similar way

Theorem 2.2(i) can be used.

The situation is more difficult with Theorem 2.1 and Theorem 2.3, since here the results

depend (at first sight) on N . Suppose that we have parameter functions aj(·) and some

N > t0 with aj(t0N ) = aj(t0) (i.e. the original function has been rescaled to the unit interval).

Consider Theorem 2.3 with the functions aj(·). The bias in (15) and (16) contains the term

1

N˙aj(u0) ≈

1

N

aj(t0N ) − aj(

t0−1N )

1N

= aj(t0) − aj(t0 − 1)

which again is independent of N . To avoid confusion we mention that 1N

˙aj(u0) of course

depends on N once the function aj(·) has been fixed (as in the asymptotic approach of

this paper) but it does not depend on N when it is used to approximate the function aj(t)

since then the function aj(·) is a different one for each N . In the spirit of the remarks

above we would e.g. use as an estimator of 1N

˙aj(u0) in (15) and (16) the expression

[1 − (1 − λ)t0−p+1]−1∑t0−p

j=0 λ(1 − λ)j[

aj(t0) − aj(t0 − j)]

.

These considerations also demonstrate the need for the asymptotic approach of this

paper: While it is not possible to set down a meaningful asymptotic theory for the model

(21) and to derive e.g. a central limit theorem for the estimator at, the approach of the

present paper for the rescaled model (1) leads to such results. This is achieved by the ’infill

asymptotics’ where more and more data become available of each local structure (e.g. about

time u0) as N → ∞. The results can then be used also for approximations in the model

(21) - e.g. for confidence intervals.

4 Proofs

4.1 Some preliminary results

In the next lemma we give a bound for the approximation error between X2t,N and Xt(u)2.

The proofs of these results and further details can be found in Dahlhaus and Subba Rao

(2006) (see also Subba Rao (2004)).

Lemma 4.1 Suppose Assumption 2.1 holds with some r ≥ 1. Let Xt,N and Xt(u) be

defined as in (1) and (5). Then we have

(i) Xt(u)2t is a stationary ergodic sequence. Furthermore there exists a stochastic pro-

cess Vt,Nt and a stationary ergodic process Wtt with supt,N E(|Vt,N |r) < ∞ and

E(|Wt|r) < ∞, such that

|X2t,N − Xt(u)2| ≤ 1

NβVt,N + | t

N− u|βWt (22)

9

|Xt(u)2 − Xt(v)2| ≤ |u − v|βWt. (23)

(ii) supt,N E(X2rt,N ) < ∞ and supu E(Xt(u)2r) < ∞.

We now define the derivative process by X2tt(u)t.

Lemma 4.2 Suppose Assumption 2.2 holds with some r ≥ 1. Then the derivative X2t(u)t

is a well defined process and satisfies the representation

X2t(u) =

a0(u) +

p∑

j=1

aj(u)X2t−j(u) + aj(u)Xt−j(u)2

Z2t .

Xt(u)2t is a stationary, ergodic process with

|X2t(u) − X2

t(v)| ≤ |u − v|β′

Wt, (24)

where Wt is the same Wt used in Lemma 4.1. Almost surely all paths of Xt(u)2u belong

to Lip(1) and we have the Taylor series expansion

X2t,N = Xt(u)2 + (

t

N− u)X2

t(u) + (| t

N− u|1+β′

+1

N)Rt,N ,

where |Rt,N |1 ≤ (Vt,N + Wt). Furthermore, supN X2t,N ≤ Wt, supu Xt(u)2 ≤ Wt and

supu |X2t(u)| < Wt. And the norms are bounded: supu E(|X2

t(u)|r) < ∞ and supt,N E(|Rt,N |r) <

∞.

Define Ft = σ(Z2t , Z2

t−1, . . .), (is clear that Ft = σ(X2t,N , X2

t−1,N , . . .) = σ(Xt(u)2, Xt−1(u)2, . . .),

since a Volterra expansion gives Xt,N in terms of Z2t t and the ARCH equations give Zt

in terms of Xtt). We now consider the mixing properties of functions of the processes

Xt,Nt and Xt(u)t. The proof of the proposition below can be found in Section A.1.

Proposition 4.1 Suppose Assumption 2.1 holds with r = 1. Let Xt,N and Xt(u) be

defined as in (1) and (5) respectively. Then there exists a (1− η) ≤ ρ < 1 such that for any

φ ∈ Lip(1)∣

∣E[

φ(Xt,N )|Ft−k

]

− E[

φ(Xt,N )]∣

1≤ Kρk(1 + |Xt−k,N |1), (25)

∣E[

φ(Xt(u))|Ft−k

]

− E[

φ(Xt(u))]∣

1≤ Kρk(1 + |Xt−k(u)|1), (26)

∣E[

φ(Xt,N )|Ft−k

]∣

1≤ K(1 + ρk|Xt−k,N |1), (27)

if j, k > 0 then∣

∣E[

φ(Xt,N )|Ft−k

]

− E[

φ(Xt,N )|Ft−k−j

]∣

1≤ Kρk (|Xt−k,N |1 + |Xt−k−j,N )|1) , (28)

and if Assumption 2.1 holds with some r > 1 then for 1 ≤ q ≤ r we have

E(

|Xt,N |q1∣

∣Ft−k) ≤ K|Xt−k,N |q1 (29)

where the constant K is independent of t, k, j and N .

10

The corollary below follows by using (25) and (26).

Corollary 4.1 Suppose Assumption 2.1 holds. Let Xt,N and Xt(u) be defined as in

(1) and (5) respectively and φ ∈ Lip(1). Then φ(Xt,N ) and φ(Xt(u)) are Lq-mixingales

of size −∞.

4.2 The pertubation technique

In this section we use the pertubation technique, introduced in Aguech et al. (2000), to

study the ANRE estimator, which we use to show consistency. To analyse the algorithm we

compare it with a similar algorithm, which is driven with the true parameters and where

Xt,N has been replaced by the stationary process Xt(u). Let δt,N (u) = at,N − a(u),

Mt,N = (Z2t − 1)σ2

t,N

Xt−1,N

|Xt−1,N |21, Mt(u) = (Z2

t − 1)σt(u)2Xt−1(u)

|Xt−1(u)|21, (30)

Ft,N =Xt,NX T

t,N

|Xt,N |21, Ft(u) =

Xt(u)Xt(u)T

|Xt(u)|21, (31)

and F (u) be defined as in (10). The ‘true algorithm’ is

a(u) = a(u) + λXt(u)2 − a(u)TXt−1(u) Xt−1(u)

|Xt−1(u)|21− λMt(u). (32)

An advantage about the specific form of the random matrices Ft,N is that |Ft,N |1 ≤ (p+1)2.

This upper bound will make some of the analysis easier to handle.

By subtracting (32) from (2) we get

δt,N (u) =(

I − λFt−1,N

)

δt−1,N (u) + λBt,N (u) + λMt(u), (33)

where

Bt,N (u) = Mt,N −Mt(u) + Ft−1,Na(t

N) − a(u). (34)

We note that (33) can also be considered as a recursive algorithm, thus we call (33) the

error algorithm. There are two terms driving the error algorithm: the bias Bt,N (u) and the

stochastic term, Mt(u). Because the error algorithm is linear with respect to the estimators

we can separate δt,N (u) in terms of the bias and the stochastic terms:

δt,N (u) = δBt,N (u) + δM

t,N (u) + δRt,N (u),

where

δBt,N (u) = (I − λFt−1,N )δB

t−1,N (u) + λBt,N (u), δBp,N (u) = 0 (35)

δMt,N (u) = (I − λFt−1,N )δM

t−1,N (u) + λMt(u), δMp,N (u) = 0 (36)

and δRt,N (u) = (I − λFt−1,N )δR

t−1,N (u), δRp,N (u) = −a(u). (37)

11

We have for y ∈ B, M, R

δyt,N (u) =

t−p−1∑

k=0

k−1∏

i=0

(

I − λFt−i−1,N

)

Dyt−k,N (u), (38)

where DBt,N (u) = λBt,N (u), DM

t,N (u) = λMt(u), DRt,N (u) = 0 if t > p and DR

p,N (u) = −a(u).

It is clear for δyt,N (u) to converge, that the random product

∏t−1i=0(I − λFt−i−1,N ) must

decay. Technically this is one of the main results. It is stated in the following theorem.

Theorem 4.1 Suppose the Assumption 2.1 holds with r = 4 and N is sufficiently large.

Then for all q ≥ 1 and (p + 1) ≤ t ≤ N there exists M > 0, δ > 0 such that

E

k−1∏

i=0

(

I − λXt−i−1,NX T

t−i−1,N

‖Xt−i−1,N‖21

)

q

≤ M exp−δλk (39)

Under Assumption 2.1 and by using the proposition above, there exists a δ > 0 such

that

(

E|δRt,N |q

)1/q ≤ M exp−δλt, (40)

for 1 ≤ q ≤ r and t = p + 1, . . . , N . Therefore this term decays exponentially and, as we

shall see below, is of lower order than δBt,N (u) and δM

t,N (u).

We now study the stochastic bias at t0 with |t0/N −u0| < 1/N . It is straightforward to

see that δBt,N (u) = δ1,B

t,N (u) + δ2,Bt,N (u) + δ3,B

t,N (u), where

δ1,Bt,N (u) = (I − λFt−1,N )δ1,B

t−1,N (u) + λMt,N −Mt(u),

δ2,Bt,N (u) = (I − λFt−1,N )δ2,B

t−1,N (u) + λF (u)a(t

N) − a(u),

δ3,Bt,N (u) = (I − λFt−1,N )δ3,B

t−1,N (u) + λ(Ft−1,N − F (u))a(t

N) − a(u), (41)

δ1,Bp,N = 0, δ2,B

p,N = 0 and δ3,Bp,N = 0.

In order to obtain asymptotic expressions for the expectation of each component δ1,Bt0,N ,

δ2,Bt0,N , δ2,B

t0,N and δMt0,N we use the pertubation technique proposed by Aguech et al. (2000).

Roughly speaking this means substituting the product of random matrices by a deterministic

product and showing that the difference is of a lower order. We can decompose for x =

M, (1, B), (2, B) the stochastic and bias terms as follows

δxt0,N (u0) = Jx,1

t0,N (u0) + Jx,2t0,N (u0) + Hx

t0,N (u0), (42)

where

Jx,1t,N (u0) = (I − λF (u0))J

x,1t−1,N (u0) + Gx

t,N , (43)

Jx,2t,N (u0) = (I − λF (u0))J

x,2t−1,N (u0) − λFt−1,N − F (u0)Jx,1

t−1,N (u0),

and Hxt,N (u0) = (I − λFt−1,N )Hx

t−1,N (u0) − λFt−1,N − F (u0)Jx,2t−1,N (u0),

12

with

GMt,N = λMt(u0), G1,B

t,N = λ[Mt,N −Mt(u0)],

G2,Bt,N = F (u0)[a( t

N ) − a(u)], G3,Bt,N = (Ft−1,N − F (u0))[a(

t

N) − a(u)] for t < t0.(44)

Furthermore, Jx,1p,N (u0) = Jx,2

p,N (u0) = Hxp,N (u0) = 0. (42) can easily be derived by taking the

sum on both sides of the three equations in (43). In the proposition below we will show that

Jx,1t0,N (u0) is the principle term in the expansion of δx

t0,N (u0), i.e. with δxt0,N (u0) ≈ Jx,1

t0,N (u0).

Substituting (44) into (43) we have

J(1,B),1t0,N (u0) =

t0−p−1∑

k=0

λI − λF (u0)kMt0−k,N −Mt0−k(u0)

J(2,B),1t0,N (u0) =

t0−p−1∑

k=1

I − λF (u0)kF (u0)a(t0 − k

N) − a(u0)

J(3,B),1t0,N (u0) =

t0−p−1∑

k=1

I − λF (u0)k(Ft−k−1,N − F (u0))a(t0 − k

N) − a(u0) (45)

In the proof below we require the following definition

Dt,N =

p−1∑

i=0

[Vt−i,N + Wt−i], for t ≥ p (46)

where Vt,N and Wt are defined as in Lemma 4.1.

Proposition 4.2 Suppose Assumption 2.1 holds with r > 4 and let δBt0,N (u0) and J

(1,B),1t0,N (u0),

J(2,B),1t0,N (u0) and J

(3,B),1t0,N (u0), be defined as in (35) and (45). Then for |t0/N − u0| < 1/N

and λN ≥ (log N)1+ε, for some ε > 0, we have

(i)

(

E|J (1,B),1t0,N (u0) + J

(2,B),1t0,N (u0)|r

)1/r= O(

1

(Nλ)β) (47)

(ii) (E|J (3,B),1t0,N (u0)|r)1/r = O( 1

(Nλ)2β + λ1/2(Nλ)−β)

(iii) and

(

E|δBt0,N (u0) − J

(1,B),1t0,N (u0) − J

(2,B),1t0,N (u0)|2

)1/2= O(

1

(Nλ)2β+

√λ

(Nλ)β+ λ). (48)

PROOF. We first prove (i). Let us consider J(1,B),1t0,N (u0). By using (117) we have

|Mt,N −Mt(u0)|1 ≤ |Z2t + 1|

1

Nβ+ (

p

N)β + | t

N− u0|β

Dt,N .

13

Therefore by substituting the above bound into J(1,B),1t0,N (u0) and by using (110) we have

(

E|J (1,B),1t0,N (u0)|r

)1/r≤

t0−p−1∑

k=0

λ‖(I − λF (u0))k‖ (E|Mt0−k,N −Mt0−k(u0)|r)1/r

≤ K

t0−p−1∑

k=0

λ(1 − δλ)k1 + pβ + kβ(E(Z20 + 1)r)1/r(E|Dt0−k−1,N |r)1/r.

Now by using Lemma 4.1(i) we have that supt,N ‖Dt,N‖Er < ∞. Furthermore, from Moulines

et al. (2005), Lemma C.3 we have

N∑

k=1

(1 − λ)kkβ ≤ λ−1−β . (49)

By using the above we obtain

(E|J (1,B),1t0,N (u0)|r)1/r ≤ K sup

t,N(E|Dt,N |r)1/r(E|Z2

0 + 1|r)1/r 1

(Nλ)β= O(

1

(Nλ)β). (50)

We now bound (E|J (2,B),1t0,N (u0)|r)1/r. Since a(u) ∈ Lip(β), by using (110) and (49) we have

(E|J (2,B),1t0,N (u0)|r)1/r ≤ C

t0−p−1∑

k=1

λ(1 − δλ)k( k

N

)β= O(

1

(λN)β). (51)

Thus (50) and (51) give the bound (47), which completes the proof of (i).

To prove (ii), we now bound (E|J (3,B),1t0,N (u0)|r)1/2. We first observe that J

(3,B),1t0,N (u0) can

be written as

J(3,B),1t0,N (u0) = It0,N + IIt0,N

where It0,N =

t0−p−1∑

k=1

I − λF (u0)k(Ft−k−1,N − Ft−k−1(u0))a(t0 − k

N) − a(u0)

IIt0,N =

t0−p−1∑

k=1

I − λF (u0)k(Ft−k−1(u0) − F (u0))a(t0 − k

N) − a(u0).

Using the same proof to prove J(1,B),1t0,N (u0), it is straightforward to show that (E|It0,N |r)1/r =

O((Nλ)−2β). In order to bound IIt0,N , let Fk(u0) = Fk(u0) − F (u0) and φ(x) = xxT /|x|21where x = (1, x1, . . . , xp), then φ(x) ∈ Lip(1) and Ft,N = φ(Xt,N ). Since φ ∈ Lip(1), by

using Corollary 4.1 we have that Ft(u0) − F (u0) can be written as the sum of martingale

differences:

Ft(u0) − F (u0) =∞∑

ℓ=0

mt(ℓ), (52)

14

where mt(ℓ) is a (p+1)×(p+1) matrix, defined by mt(ℓ) = E(Ft,N |Ft−ℓ)−E(Ft,N |Ft−ℓ−1).By using (28) we have (E|mt(ℓ)|r/2)2/r ≤ Kρℓ. Substituting the above into B2,B

t0,N we have

IIt0,N =∞∑

ℓ=0

t0−p−1∑

k=0

λI − λF (u0)kmt0−k−1(ℓ)a(t0 − k

N) − a(u0)

IIt0,Ni =

p+1∑

j=1

∞∑

ℓ=0

t0−p−1∑

k=0

λI − λF (u0)kmt0−k−1(ℓ)ijaj(t0 − k

N) − aj(u0).

We note that ‖I − λF (u0)k‖ ≤ K(1 − λδ)k (see (110)). Furthermore, if t1 < t2 then

Emt1(ℓ)mt2(ℓ) = Emt1(ℓ)E(mt2(ℓ)|Ft1−ℓ) = 0, therefore mt(ℓ)t is a sequence of mar-

tingale differences. Therefore, since J(2,B),1t0−k−1,N is deterministic we can use Burkholder’s

inequality (cf. (Hall & Heyde, 1980), Theorem 12.2) to obtain

(E|IIt0,N |r)1/r ≤ Kλ

∞∑

ℓ=0

p+1∑

i,j=1

(

E∣

t0−p−1∑

k=0

I − λF (u0)kmt0−k−1(ℓ)ija(t0 − k

N)j − a(u0)j

r

)1/r

≤ Kλ∞∑

ℓ=0

p+1∑

i,j=1

2r

t0−p−1∑

k=0

(1 − λδ)2k(

E|mk(ℓ)ij |r)2/r

(k

N)β

≤ K

(Nλ)β

∞∑

ℓ=0

ρℓλ1/2 = O(λ1/2

(Nλ)β). (53)

Thus we have proved (ii).

We now prove (iii). By using (42) we have

(E|δBt0,N (u0) − J

(1,B),1t0,N (u0) − J

(2,B),1t0,N (u0)|r/2)2/r

= (E|J (3,B),1t0,N (u0)|r/2)2/r + (E|

3∑

i=1

J(i,B),2t0,N (u0)|r/2)2/r + (E|

3∑

i=1

H(i,B)t0,N (u0)|r/2)2/r + O(

1

Nβ+ exp−λδt0).

We partition∑3

i=1 J(i,B),2t0,N into four terms:

3∑

i=1

J(i,B),2t0,N = AB

t0,N + B1,Bt0,N + B2,B

t0,N + B3,Bt0,N (54)

where,

ABt0,N = −

t0−p−1∑

k=0

λI − λF (u0)k[Ft0−k−1,N − Ft0−k−1(u0)] ×(

3∑

i=1

J(i,B),1t0−k−1,N

)

and, for y ∈ 1, 2, 3,

By,Bt0,N = −

t0−p−1∑

k=0

λI − λF (u0)k[Ft0−k−1(u0) − F (u0)]J(y,B),1t0−k−1,N (u0).

15

We first bound ABt0,N . By using (50), (51), (53) and (119) we have

(E|ABt0,N |r/2)2/r ≤ K

(Nλ)2β.

By using (134) and (135) we have

(E|B1,Bt0,N |r/2)r/2 = E

∣−t0−p−1∑

k=0

t0−p−k−1∑

i=0

λ2I − λF (u0)k+iFt0−k−1(u0) ×

Mt0−k−i−1,N −Mt0−k−i−1(u0)∣

r/2)2/r ≤ Kλ.

Using a similar proof to the above to bound (E|J (3,B),1t0−k−1,N (u0)|r)1/r we can show that

‖B2,Bt0,N‖E

r/2 = O(λ1/2(Nλ)−β). By using (53) it is straightforward to show that ‖B3,Bt0,N‖E

r/2 =

O((Nλ)−2β+λ1/2(Nλ)−β). Substituting the above bounds for (E|ABt0,N |r/2)2/r, (E|B1,B

t0,N |r/2)2/r,

(E|B2,Bt0,N |r/2)2/r and (E|B3,B

t0,N |r/2)2/r into (54) we obtain

(E|J (1,B),2t0,N (u0) + J

(2,B),2t0,N (u0) + J

(3,B),2t0,N (u0)|r/2)2/r = O(δN ) (55)

We now prove (iii) by bounding for y = 1, 2, 3, ‖H(y,B)t0,N (u0)‖E

2 . By using Holder’s

inequality, (55), Theorem 4.1 and that Ft,N are bounded random matrices we have

(E|H(1,B)t0,N (u0) + H

(2,B)t0,N (u0) + H

(3,B)t0,N (u0)|2)1/2 ≤

t0−p−1∑

k=0

λ

(

E‖(

k−1∏

i=0

(I − λFt0−i,N ))

‖2r/(r−4)

)(r−4)/2r

(E|[Ft0−k−1,N − F (u0)]J (1,B),2t0−k−1,N (u0) + J

(2,B),2t0−k−1,N (u0) + J

(3,B),2t0−k−1,N|r/2)2/r

≤ K( 1

(Nλ)2β+

λ1/2

(Nλ)β+ λ

)

. (56)

Since (E|J (i,B),2t0,N (u0)|2)1/2 ≤ E|J (i,B),2

t0,N (u0)|r/2)2/r, by substituting (55) and (56) into (54)

we have

(E|δBt0,N (u0) − J

(1,B),1t0,N (u0) − J

(2,B),1t0,N (u0|2)1/2

= O(1

(Nλ)2β+

√λ

(Nλ)β+ λ).

and we obtain (48).

In the following lemma we show that δMt0,N (u0) is dominated by JM,1

t0,N (u0), where

JM,1t0,N (u0) =

t0−p−1∑

k=0

λ(I − λF (u0))kMt0−k(u0). (57)

We observe that JM,1t0,N (u0) and Lt0(u0) (defined (8)) are the same.

16

Proposition 4.3 Suppose Assumption 2.1 holds with r > 4. Let δMt0,N (u0) and JM,1

t0,N (u0) be

defined as in (36) and (57). Then for |t0/N − u0| < 1/N and λN ≥ (log N)1+ε, for some

ε > 0, we have

(E|JM,1t0,N (u0)|r)1/r = O(

√λ) (58)

(E|δMt0,N (u0) − JM,1

t0,N (u0)|2)1/2 = O(λ +

√λ

(Nλ)β). (59)

PROOF. Since each component of the vector sequence Mt(u0) is a martingale difference

we can use the Burkholder inequality componentwise and (110) to obtain

(E|JM,1t0,N (u0)|r)1/r ≤ λ

2r

t0−p−1∑

k=0

(1 − δλ)2k(E|Z2t0−k − 1|r)2/r(E|σt0−k(u0)

2Xt0−k−1(u0)

|Xt0−k−1(u0)|21|r)2/r

1/2

≤ K√

λ. (60)

Since δMt0,N (u0) − JM,1

t0,N (u0) = JM,2t0,N (u0) + HM

t0,N (u0), to prove (59) we bound JM,2t0,N (u0) and

HMt0,N (u0). By using the same arguments given in Proposition 4.2 and (60) we have

(E|JM,2t0,N (u0)|r/2)2/r ≤ K

(

λ +

√λ

(Nλ)β

)

. (61)

Finally, we obtain a bound for HMt0,N (u0). By using Holder’s inequality, Theorem 4.1 and

(E‖Ft0−k−1,N − F (u0)‖r/2)2/r ≤ 2(p + 1)2 we have

(E|HMt0,N (u0)|2)1/2 ≤ 2

t0−p−1∑

k=0

λ‖(

k−1∏

i=0

(I − λFt−i−1,N ))

‖2r/(r−4))(r−4)/2r ×

(E|Ft0−k−1,N − F (u0)JM,2t0−k−1,N (u0)|r/2)2/r

≤ K(

λ +

√λ

(Nλ)β

)

. (62)

Thus we have shown (59).

By using Propositions 4.2 and 4.3 we have established that J(1,B),1t0,N (u0), J

(2,B),1t0,N (u0) and

JM,1t0,N (u0) are the principle terms in at0,N − a(u0). More precisely we have shown

at0,N − a(u0) = J(1,B),1t0,N (u0) + J

(2,B),1t0,N (u0) + JM,1

t0,N (u0) + R(1)N (63)

where ‖R(1)N ‖E

2 = O(δN ) (δN is defined in (7)). Furthermore, if λ → 0, λN → ∞ as N → ∞,

then this leads to consistency of at0,N . By using (63), we show below that an upper bound

for the mean squared error can be immediately obtained.

PROOF of Theorem 2.1 By substituting the bounds for (E|J (1,B),1t0,N (u0)+J

(2,B),1t0,N (u0)|2)1/2

and (E|JM,1t0,N (u0)|2)1/2 (in Propositions 4.2 and 4.3) into (63) we have E(|at0,N − a(u0)|2) =

O( 1(λN)β + λ1/2 + δN )2. Thus we have (11).

17

4.3 Proof of Theorem 2.2

We can see from Propositions 4.2 and 4.3 that the J(1,B),1t0,N (u0) + J

(1,B),2t0,N (u0) and JM,1

t0,N (u0)

are leading terms in the expansion of δt0,N . For this reason to prove Theorem 2.2 we need

only to consider these terms. We observe that

J(1,B),1t0,N (u0) + J

(2,B),1t0,N (u0) = Rt0,N (u0) + R

(2)N , (64)

where have replaced Mt,N by Mt(tN ) leading to the remainder

R(2)N =

t0−p−1∑

k=0

λI − λF (u0)k

Mt0−k,N −Mt0−k(t0 − k

N)

.

By using (22) we have (E|R(2)N |r)1/r ≤ K

Nβ , for all t, N . Therefore under Assumption 2.1

with r > 2, if λN ≥ (log N)1+ε for some ε > 0, then by substituting (64) into (63) we have

at0,N − a(u0) = Lt0(u0) + Rt0,N (u0) + R(3)N , (65)

where (E|R(3)N |2)1/2 = O(δN ). In the proposition below we show asymptotic normality of

Lt0(u0), which we use together with (65) to obtain asymptotic normality of at0,N

Proposition 4.4 Suppose Assumption 2.1 holds with r = 1 and for some δ > 0, E(Z4+δ0 ) <

∞. Let Lt0(u0) and Σ(u) be defined as in (8) and (14) respectively. If |t0/N − u0| < 1/N ,

then we have

λ−1/2Lt0(u0)D→ N (0, Σ(u0)), (66)

where λ → 0 as N → ∞ and λN ≥ (log N)1+ε, for some ε > 0.

PROOF. Since Lt0(u0) is the weighted sum of martingale differences the result follows from

the martingale central limit theorem and the Cramer-Wold device. It is straightforward to

check the conditional Lindeberg condition, hence we omit this verification. We simply note

that by using (111) we obtain the limit of the conditional variance of Lt0(u0):

µ4

t0−p−1∑

k=0

λ2I − λF (u0)2kσt0−k(u0)4Xt0−k−1(u0)Xt0−k−1(u0)

T

|Xt0−k−1(u0)|41P→ Σ(u0). (67)

We use the proposition above to prove asymptotic normality of at0,N (after a bias correc-

tion).

PROOF of Theorem 2.2. It follows from (65) that

λ−1/2at0,N − a(u0) = λ−1/2Rt0,N (u0) + λ−1/2Lt0(u0) + Op(λ1/2δN ).

18

Therefore, if λ >> N−4β/(4β+1) and λ >> N−2β then λ−1/2

(Nλ)2β → 0 and λ−1/2

Nβ → 0 respec-

tively, and we have

λ−1/2at0,N − a(u0) = λ−1/2Rt0,N (u0) + λ−1/2Lt0(u0) + op(1).

By using the above and Proposition 4.4 we obtain (12). Finally, since |Rt0,N (u0)|1 =

Op(1

(Nλ)β ), if λ >> N− 2β

2β+1 then λ−1/2Rt0,N (u0)P→ 0 and we have (13).

5 An asymptotic expression for the bias under additional

regularity

In this section, under additional assumptions on the smoothness of the parameters a(u), we

obtain an exact expression for the bias, which we use to prove Theorem 2.3.

In the section above we have shown that δBt0,N ≈ Rt0,N (u0). Since Mt(u) is a function

on Xt(u)2 whose derivative exists, the derivative of Mt(u) also exists and is given by

Mt(u) = (Z2t − 1)

Ft−1(u)a(u) + Ft−1(u)a(u)

, (68)

where

Ft−1(u) = Y t−1(u)Y t−1(u)T + Y t−1(u)Y t−1(u)T , (69)

with Y t−1(u) = 1|Xt−1(u)|1

Xt−1(u) and

Y t−1(u) =−∑p

j=1 X2t−j(u)

[1 +∑p

j=1 Xt−j(u)2]2

1

Xt−1(u)2

. . .

Xt−p(u)2

+1

1 +∑p

j=1 Xt−j(u)2

0

X2t−1(u)

. . .

X2t−p(u)

It is interesting to note that, like Mt(u)t, Mt(u)t is a sequence of vector martingale

differences. We will use Mt(u) to refine the approximation Rt0,N (u0) and show Rt0,N (u0) ≈Rt0,N (u0), where

Rt0,N (u0) =

t0−1∑

k=0

λ (1 − λF (u0)) (t0 − k

N− u0)

Mt0−k(u0) + F (u0)a(u0)

. (70)

We use this to obtain Theorem 2.3.

Lemma 5.1 Suppose Assumption 2.2 holds with some r ≥ 2. Then we have

∣Mt(u) − Mt(v)∣

1≤ K|u − v|β′ |Z2

t + 1|Lt, (71)

where Lt =

1 +∑p

k=1 Wt−k

2, with (E(L

r/2t )2/r < ∞. and almost surely

Mt(v) = Mt(u) + (v − u)Mt(u) + (u − v)1+β′

Rt(u, v) (72)

where |Rt(u, v)| ≤ Lt.

19

PROOF. By using (68), a(u) ∈ Lip(1 + β′) and that |Ft−1(u)|1 ≤ (p + 1)2 we have

|Mt(u) − Mt(v)|1≤ K|Z2

t + 1|

|Ft−1(u) − Ft−1(v)|1 + |Ft−1(u) − Ft−1(v)|1 + supu

|Ft−1(u)|1|u − v|

.

(73)

In order to bound (73) we consider Ft(u) and its derivatives. We see that |Ft−1(u) −Ft−1(v)|1 ≤ K|u − v|Lt, thus bounding the first bound on the right hand side of (73). To

obtain the other bounds we use (69). Now by using (23) and Lemma 4.2 we have that

supu |Ft−1(u)|1 ≤ KLt and |Ft−1(u) − Ft−1(v)|1 ≤ KL2t |u − v|β. Altogether this verifies

(71). By using the Cauchy-Schwartz inequality we have that (E(Lr/2t )2/r < ∞.

Finally we prove (73). Since Lt is a well defined random variable almost surely all the

paths of Mt(u) ∈ Lip(1 + β′). Therefore, there exists a set N , such that P (N ) = 0 and for

every ω ∈ N c we can make a Taylor expansion of Mt(u, ω) about w and obtain (72).

We now use (72) to show that Rt0,N (u0) ≈ Rt0,N (u0).

Lemma 5.2 Suppose Assumption 2.2 holds with r ≥ 2. Then if |t0/N −u0| < 1/N we have

Rt0,N (u0) = Rt0,N (u0) + R(4)N , (74)

where E(|R(4)N |r)1/r = O( 1

(Nλ)1+β′ ).

PROOF. To obtain the result we make a Taylor expansion of a( t−kN ) and Mt−k(

t−kN ) about

u0, and substitute this into Rt0,N . By using Lemma 5.1 we obtain the desired result.

In the lemma below we consider the mean and variance of the bias and stochastic terms

Rt0,N (u0) and Lt0(u0), which we use to obtain an asymptotic expression for the bias. We

will use the following results. Since infu λmin(F (u)) > C > 0 (see (108)), we have

t−1∑

k=0

λI − λF (u)kF (u) → I, (75)

t−1∑

k=0

λ2I − λF (u)2k(k

N)2∥

∥ = O(1

N2λ), (76)

if λ → 0, tλ → ∞ as t → ∞.

Lemma 5.3 Suppose Assumption 2.2 holds with r > 4. Let Rt0,N (u0), Lt0(u0) and Σ(u)

be defined as in (8) and (14) respectively. Then if |t0/N − u0| < 1/N we have

E (Rt0,N (u0)) = − 1

NλF (u0)

−1a(u) + O(1

(Nλ)1+β′ ), (77)

20

var (Rt0,N (u0)) = O(1

N2λ+

1

N), (78)

var (Lt0(u0)) = λΣ(u0) + o(λ) (79)

and E (Lt0(u0)) = 0.

PROOF. Since ∂Mk(u)∂u are martingale differences, by using appling (74) to (75) we have

(77). We now show (78). By using (76), supu |∂a(u)∂u | < ∞ and supu |Ft(u)| < (p + 1)2 we

have

var

Rt0,N (u0) = µ4

t0−p−1∑

k=0

λ2I − λF (u0)2k(k

N)2var

Ft0−k−1(u0)a(u0) + Ft0−k−1(u0)a(u0)

+ O(1

N2)

= O(1

N2λ+

1

N2).

It is straightforward to show (79) using (111). Finally, since Lt0(u0) is the sum of martingale

differences, E(Lt0(u0)) = 0.

From the lemma above it immediately follows that

Rt0,N (u0) = − 1

NλF (u0)

−1a(u0) + Op

1

(Nλ)1+β′ +1

Nλ1/2

(80)

and (Nλ)Rt0,N (u0)ℓ2→ −F (u0)

−1a(u0). We now use the above prove Theorem 2.3.

PROOF of Theorem 2.3 By substituting (80) into (65) (with β = 1) we have

at0,N − a(u0) =−1

NλF (u0)

−1a(u0) + Lt0(u0) + δ′NR(5)N , (81)

where (E|R(5)N |2)1/2 < ∞ and δ′N = 1

(Nλ)1+β′ + 1(Nλ)2

+ 1λ1/2N

+λ+ 1N1/2 . By using the above

and var(Lt0(u0)) = λΣ(u0) we have

E|at0,N − a(u0)|2 = λ trΣ(u0) +1

(Nλ)2∣

∣F (u0)−1a(u0)

2+ O

([

(Nλ)−1 +√

λ + δ′N

]

δ′N

)

,

which gives (16). In order to prove (17) we use (81) and Proposition 4.4. We first note that

if λ−1/2/(Nλ)1+β′ → 0, then by using (81) we have

λ−1/2at0,N − a(u0) + λ−1/2 1

λNF (u0)

−1a(u0) = λ−1/2Lt0(u0) + op(1).

Therefore by using Proposition 4.4 we have (17).

21

6 A lower bound for the minimax rate

PROOF of Theorem 2.4 We derive a lower bound for the minimax rate using the van

Trees inequality (cf. Gill and Levit (1995) and Moulines et al. (2005)). In order to do this

we define a prior distribution on a subset of Cβ(L), which we then go on to bound using

the van Trees inequality.

Let us suppose the function φ : [0, 1] → R+ is such that ‖φ‖β ≤ L, supu |φ(u)| = 1 and

∫ 10 φ(u)2du = 1. Let IT = (1, . . . , 1) ∈ R

p+1, JT = (ρ1, 0, . . . , 0) ∈ Rp+1 and define the set

CwN =

J + ηφ(wn(· − u0))I; η ∈ [0, min(w−βN , p−1(1 − δ))]

.

It is straightforward to show for a large enough wN that CwN ⊆ Cβ(L). Hence if a(·) ∈ CwN ,

and are the parameters of the tvARCH process Xt,N, then Assumption 2.1(i,ii,iii) is

satisfied and E(X2t,N ) < (ρ1 + 1)/(1 − δ).

Let λ : [0, 1] → R+ be a density which is continuous with respect to the Lebesgue

measure and define λN (·) = wβNλ(wβ

N ·). It is clear that we can use λN (·) as a density on

the set CwN . For each a(·) = J + ηφ(wn(· − u0))I ∈ CwN we define the tvARCH process

Xt,N = σt,NZt σ2t,N = ρ1 + ηφ(wn(

t

N− u0))Xt−1,N ,

where X Tt−1,N = (1, X2

t−1,N , . . . , X2t−p,N ). Let Eη be the expectation under the measure

generated by the parameters a(·) = (J + ηφ(wn(· − u0))I) and EλNthe expectation under

the measure λN (·). Since CwN ⊆ Cβ(L) it can be shown that

infat0,N∈σ(Xt,N ;t=1,...,N)

supa(u)∈Cβ(L)

E[xT (at0,N − a(u0))(at0,N − a(u0))T x]

≥ EλNinf

at0,N∈σ(Xt,N ;t=1,...,N)Eη[x

T (at0,N − a(u0))(at0,N − a(u0))T x]. (82)

We will assume that φ(·) and ρ1 are known. Now by using the van Trees inequality we have

for all |x| ≤ 1 and at0,N ∈ σ(Xt,N ; t = 1, . . . , N), that

EλNEη[x

T (at0,N − a(u0))(at0,N − a(u0))T x] ≥ 1

I(λN ) + EλN(I(η))

, (83)

where

I(λN ) =

λN (x)

(

d log λN (x)

dx

)2

dx

and

IN (η) =

∫ (

d log fη(x1,N , . . . , xN,N )

dη⌋η=η

)2

fη(x1,N , . . . , xN,N )dy1,N . . . , dy1,N

with fη(x1,N , . . . , xN,N ) the density of Xt,NNt=1. We now derive asymptotic expressions

for I(λN ) and I(η).

22

Using that P (Xt,N ≤ y|Xt−1,N ) = PZ(y/σ2t,N ), where PZ is the distribution function of

Zt. The log density of Xt,NNt=1 is

log fηxN,N , . . . , x1,N =N∑

t=p+1

log1

σt,NfZ

xt,N

σt,N

+ log fXp,N(x1,N , . . . , xp,N ),

where fZ is the density function of Zt, σt,N = [1 + ηφ(wN ( tN − u0))|xt−1,N |1]1/2, with

xt−1,N = (1, x2t−1,N , . . . , x2

t−p,N ) and fXp,Nis the density of Xp,N . Define

ℓt,N (ζ; xt,N , xt−1,N ) = −1

2log[ρ1 + ζφ(wN (

t

N− u0))|xt−1,N |1] −

xt,N

2[ρ1 + ζφ(wN ( tN − u0))|xt−1,N |1]1/2

.

Differentiating ℓt,N (ζ; xt,N , xt−1,N ) wrt. ζ and evaluating it at η, it is straightforward to see

that

2ℓt,N (η ; xt,N , xt−1,N ) = −φ(wN ( t

N − u0))|xt−1,N |1σ2

t,N

(

ρ1 + ztd log fZ(zt)

dzt

)

,

where zt = xt,N/σt,N . Now let us suppose

Ap,N = Eη

(∂ log fXp,NX1,N , . . . , Xp,N

∂η

)2< ∞.

It is well known that the derivative of the log conditional densities are a sequence of

martingale differences; that is ℓt,N (η; Xt,N ,Xt−1,N ) : t = p + 1, . . . , N, are martingale

differences. From this it follows that for t > p, we have

(

∂ log fXp,N(X1,N , . . . , Xp,N )

∂ηℓt,N (η; Xt,N ,Xt−1,N )

)

= 0.

Therefore, the corresponding Cramer-Rao bound is

IN (η) = Eη

∂ log fXp,N(X1,N , . . . , Xp,N )

∂η+

N∑

t=p+1

ℓt,N (η; Xt,N ,Xt−1,N )

2

=1

4

N∑

t=p+1

φwN (t

N− u0)2

E

[ |Xt−1,N |1σ2

t,N

(

ρ1 + Ztd log fZ(Zt)

dZt

)]2

+ Ap,N .

Since η → 0, as N → ∞, for large enough wN we have that w−βN pE(Z4

t )1/2 < 1, which implies

supt Eη(X4t,N ) < [(ρ1+w−β

N )E(Z4t )1/2/(1−pw−β

N E(Z4t )1/2)]2. Therefore, since |xt−1,N |/σ2

t,N <

|xt−1,N |/ρ1 we obtain

IN (η) ≤ 1

4sup

tEη(|Xt−1,N |21)

N∑

t=p+1

φwN (u0 −t

N)2Z + Ap,N

≤ Z(p + 1)(ρ1 + w−β

N )2E(Z4t )

4ρ1[(1 − pw−βN E(Z4

t )1/2)]2

N∑

t=p+1

φwN (u0 −t

N)2 + Ap,N

≤ K(Z, E(Z40 ), ρ1)

N

wN

∫ 1

0φ(x)2dx + o(

N

wN), (84)

23

where Z = E(ρ1 + Ztd log fZ(x)

dx ⌋x=Zt)2 and for a large enough wN , K(Z, E(Z4

0 ), ρ1) = Z(p +

1)(2ρ1)2E(Z4

t )

4ρ1[(1−E(Z4t )−1/2)]2

. We now bound I(λN ). It is clear that

I(λN ) = w2βN I(λ) where I(λ) =

∫ 1

0

∂ log λ(u)

∂u

2λ(u)du. (85)

Substituting (85) and (84) into (82) and (83) we obtain

infat0,N∈σ(Xt,N ;t=1,...,N)

supa(u)∈Cβ(L)

E[xT (at0,N − a(u0))(at0,N − a(u0))T x]

≥ 1

K(Z, E(Z40 ), ρ1)

NwN

∫ 10 φ(x)2dx + w2β

N I(λ)

By letting wN = O(N1/(1+2β)), the above is minimised and we obtain

infat0,N∈σ(Xt,N ;t=1,...,N)

supa(u)∈Cβ(L)

E[xT (at0,N − a(u0))(at0,N − a(u0))T x] ≥ K(Z, E(Z4

0 ), ρ1)N−2β/(2β+1),

where C(Z, E(Z40 ), ρ1) is a constant which depends only on Z, E(Z4

0 ), ρ1, thus giving the

required result.

A Appendix

A.1 Mixingale properties of φ(Xt,N)

Our object in this section is to prove Proposition 4.1. We do this by using the random

vector representation of the tvARCH process. Let

At(u) =

a1(u)Z2t a2(u)Z2

t . . . ap−1(u)Z2t ap(u)Z2

t

1 0 . . . 0 0

0 1 . . . 0 0...

.... . .

......

0 0 . . . 1 0

At(u) =

(

0 0Tp

0p At(u)

)

, (86)

bt(u)T = (1, a0(u)Z2t , 0T

p−1). By using the definition of the tvARCH process given in (1) we

have that the tvARCH vectors Xt,Nt satisfy the representation

Xt,N = At(t

N)X t−1,N + bt(

t

N). (87)

We note that (87) looks like a vector AR process, the difference is that At(tN ) is a ran-

dom matrix. However, similar to the vector AR case, it can be shown that the product∏t

k=0 Ak(kN ) decays exponentially. It is this property which we use to prove Proposition

4.1.

24

Let AN (t, j) =∏j−1

i=0 At−i(t−iN )

, A(u, t, j) =∏j−1

i=0 At−i(u)

, AN (t, 0) = Ip+1 and

A(u, t, 0) = Ip+1. By expanding (87) we have

X t,N = AN (t, k)X t−k,N +k−1∑

j=0

AN (t, j) b t−j(t − j

N). (88)

Suppose A is a (p + 1) × (p + 1) dimensional random matrix, with (i, j)th element Aij .

Let [A]q be a (p + 1) × (p + 1) dimensional matrix where

[A]q =(

E|Aqij |)1/q

; i, j = 1, . . . , p + 1

.

We will make frequent use of the following inequalities, which can easily be derived by

repeatedly applying Holder’s and Minkowski’s inequalities. Suppose A and X are indepen-

dent random matrices and vectors, respectively, then(

E∣

∣AX∣

q)1/q ≤∣

∣[A]q[X]q∣

1, and if X

has only positive elements then E(

|AX|1∣

∣X)

≤ E(

|[A]1X |1∣

∣X)

. Now by using Subba Rao

(2004), Example 2.3 and Corollary A.2, we have

‖[AN (t, k)]q‖ ≤ K ρk and ‖[A(u, t, k)]q‖ ≤ Kρk, (89)

where K is a finite constant.

PROOF of Proposition 4.1 We first prove (25). Let CN (t, k) :=∑k−1

j=0 AN (t, j) b t−j(t−jN ),

that is

X t,N = AN (t, k)X t−k,N + CN (t, k) (90)

with AN (t, k), CN (t, k) ∈ σ(Zt, . . . , Zt−k+1) and X t−k,N ∈ Ft−k. In particular we have

Eφ(

CN (t, k))

|Ft−k) = Eφ(

CN (t, k))

. (91)

Furthermore, by using Minkowski’s inequality it can be shown that

E(|AN (t, k)Xt−k,N |q|Ft−k)1/q ≤ K|[AN (t, k)]qXt−k,N |1.

The Lipschitz continuity of φ and (88) now imply |φ(

X t,N

)

−φ(

CN (t, k))

| ≤ K|AN (t, k)X t−k,N |.Therefore, by using the above we obtain

|Eφ(Xt,N )|Ft−k) − Eφ(Xt,N )|1 (92)

= |Et−kφ(Xt,N ) − φ(

CN (t, k))

|Ft−k) − Eφ(Xt,N ) − φ(

CN (t, k))

|1≤ K

(

|[AN (t, k)]1X t−k,N |1 + E|[AN (t, k)]1X t−k,N |1)

≤ K ‖[AN (t, k)]1‖|X t−k,N |1 + E(

‖[AN (t, k)]1‖|X t−k,N |1)

≤ K ‖[AN (t, k)]1‖(

|X t−k,N |1 + E|X t−k,N |1)

≤ Kρk(

1 + |X t−k,N |1)

25

since E|X t−k,N |1 is uniformly bounded thus giving (25). The proof of (26) is the same as

the proof above, so we omit details. (27) follows from (25) with the triangular inequality.

To prove (28) we use (90). Since

Eφ(

CN (t, k))

|Ft−k = Eφ(

CN (t, k))

= Eφ(

CN (t, k))

|Ft−k−j

we obtain as in (92)

|Eφ(Xt,N )|Ft−k − Eφ(Xt,N )|Ft−k−j|1≤ K E|[AN (t, k)]1X t−k,N |1|Ft−k + E|[AN (t, k)]1X t−k,N |1

∣Ft−k−j≤ Kρk

(

|X t−k,N |1 + |X t−k,N |1)

which gives (28). To prove (29) we use (92). We first note that

[

E|CN (t, k)|q∣

∣Ft−k

]1/q ≤ K

k−1∑

j=0

[AN (t, j)]q[bt−j(t − j

N)]q

1

≤ K

1 − ρ

and by using Minkowski’s inequality and the equivalence of norms, there exists a constant

K independent of Xt,N such that

E(

|Xt,N |q∣

∣Ft−k

)1/q ≤

E(

|AN (t, k)Xt−k,N |q∣

∣Ft−k

)1/q+

E(

|CN (t, k)|q∣

∣Ft−k

)1/q.

Now by using

E(

|AN (t, k)Xt−k,N |q∣

∣Ft−k

)1/q ≤ Kρk|Xt−k,N |1 we have

E

|Xt,N |q∣

∣Ft−k

Kρk|Xt−k,N |1 +K

1 − ρ

q ≤ K|Xt−k,N |q1,

hence we obtain (29)

We use the corollary below to prove Lemma A.7.

Corollary A.1 Suppose Assumption 2.1 holds with r = q0 and E(Z4q00 ) < ∞. Let f, g :

Rp+1 → R

(p+1)×(p+1) be Lipschitz continuous functions, such that for all positive x ∈ Rp+1,

|f(x)|abs ≤ Ip+1 and |g(x)|abs ≤ Ip+1, where Ip+1 is a p + 1× p + 1-dimensional matrix with

one in all the elements, and |A|abs = (|Ai,j | : i, j = 1, . . . , p + 1). Then for q ≤ q0 we have

[

E(

E

(Z2t−k+1 − 1)f(Xt−k,N )g(Xt(u))

∣Ft−k−j

− E

(Z2t−k+1 − 1)f(Xt−k,N )g(Xt(u))

)q]1/q

≤ KE(Z40 ) + 1ρj

(

(E|Xt−k−j,N |q)1/q + (E|Xt−k−j(u)|q)1/q)

, (93)

and[

E(

E

(Z2t−k+1 − 1)f(Xt−k(u))g(Xt(u))

∣Ft−k−j

− E

(Z2t−k+1 − 1)f(Xt−k(u))g(Xt(u))

)q]1/q

≤ KE(Z40 ) + 1ρj

(

(E|Xt−k−j(u)|q)1/q + (E|Xt−k−j(u)|q)1/q)

, (94)

for j, k ≥ 0, where ρ is such that (1 − η) < ρ < 1.

26

PROOF. We give the proof of (93) only, the proof of (94) is the same. We will use the same

notation introduced in the proof of Proposition 4.1 and let C(u, t, k) =∑k−1

j=0 A(u, t, j) b t−j(u),

that is Xt(u) = A(u, t, k + j)Xt−k−j(u) + C(u, t, k + j).

We will use (91). Since |f(x)|abs ≤ Ip+1 and |g(x)|abs ≤ Ip+1 we have

|(Z2t−k+1 − 1)

(

f(Xt−k,N )g(Xt(u)) − f(CN (t − k, j))g(C(u, t, k + j)))

|≤ |Z2

t−k+1 + 1|(

|AN (t − k, j)Xt−k−j,N |1 + |A(u, t, k + j)Xt−k−j(u)|1)

. (95)

Since A(u, t, k − 1), (Z2t−k + 1)At−k−1(u) and A(u, t − k, j) are independent matrices and

by using (89) we have

E∥

∥[(Z2t−k+1 + 1)A(u, t, k + j)]1

∥ ≤ E∥

∥[A(u, t, k − 1)]1∥

∥××∥

∥[(Z2t−k+1 + 1)At−k+1(u)]1

1

∥[A(u, t − k, j)]1∥

∥ ≤ KE(Z4t−k+1 + 1)ρk+j−1.

(96)

Considering the conditional expectation of (95), by using (96) and

|Et−k−j(Z2t−k+1 − 1)AN (t − k, j)Xt−k−j,N| ≤ Kρj |Xt−k−j,N |1 we have

|E[

(Z2t−k+1 − 1)

f(Xt−k,N )g(Xt(u)) − f(CN (t − k, j))g(C(u, t, k + j)

|Ft−k−1

]

|≤ Kρj |Xt−k−j,N |1E(Z2

t−k+1 + 1) + KE(Z4t−k+1 + 1)ρk+j−1|Xt−k−j(u)|1

≤ KE(Z40 + 1)ρj

(

|Xt−k−j,N |1 + |Xt−k−j(u)|1)

.

Using similar arguments we obtain the bound:

(

E∣

∣E[

(Z2t−k+1 − 1)

f(Xt−k,N )g(Xt(u)) − f(CN (t − k, j))g(C(u, t, k + j)]∣

r)1/r

≤ KE(Z40 + 1)ρj

(

E(|Xt−k−j,N |r)1/r + (E|Xt−k−j(u)|r)1/r)

leading to the result.

A.2 Persistence of excitation

As we mentioned in Section 1, a crucial component in the analysis of many stochastic,

recursive algorithms is to establish persistence of excitation of the transition random matrix

in the algorithm: in our case this implies showing Theorem 4.1. Intuitively, it is clear

that the verification of this result depends on showing that the smallest eigenvalue of the

conditional expectation of of the semi-positive definite matrix Xt,NX Tt,N/‖Xt,N‖2

1 is bounded

away from zero. In particular this is one of the major conditions given in Moulines et al.

(2005), Theorem 16, where conditions were given under which persistence of excitation can

be established. Their result is for a quite general class of algorithms which satisfy

Y t = I − λAt(λ)Y t−1 + Bt.

In our case the random matrix is not a function of λ, so to reduce notation we state the

theorem below for the case where At(λ) = At. Let M+d be the space of all positive semi-

definite symmetric matrices.

27

Theorem A.1 (Moulines, Priouret and Roueff) Let (Ω,F , P, F : n ∈ N) be a filtered

space. Let φt, t ≥ 0 and Vt, t ≥ 0 be non-negative adaptive processes such that Vt ≥ 1

for all t ≥ 1. Let A := At : t ≥ 0 be an adapted M+d valued process. Let s be a positive

integer, and let R1 and µ1 be positive constants. Assume that

(a) there exist µ < 1 and B ≥ 1 such that for all t ∈ N,

E(Vt+s|Ft) ≤ µVtI(φt > R1) + BVtI(φt ≤ R1); (97)

(b) there exists an α1 > 0 such that for all t ∈ N;

E

λmin

(

t+s∑

l=t+1

Al

)

|Ft

≥ α1I(φt ≤ R1); (98)

(c) there exists a λ1 such that for all λ ∈ [0, λ1] and t ∈ N; ‖λAt‖lub,2 ≤ 1;

(d) there exist a q > 1 and C1 > 0 such that for all t ∈ N and λ ∈ [0, λ1]

I(φt ≤ R1)t+s∑

l=t+1

E(

‖Al‖q|Ft

)

≤ C1.

Then for any p ≥ 1 there exist C0, δ0 and µ0 > 0 such that, for all λ ∈ [0, λ1] and n ≥ 1,

we have

E(

‖n∏

i=1

(I − λAi)‖p|F0

)

≤ C0e−δ0λnV0.

We use this theorem to prove Theorem 4.1, where $A_t := F_{t,N}$, $\phi_t := |\underline{X}_{t,N}|_1$ and $V_t := |\underline{X}_{t,N}|_1$. Condition (97) is often referred to as the drift condition, and concerns the actual process $\{\underline{X}_{t,N}\}$. Roughly speaking, under this condition, if $|\underline{X}_{t,N}|_1$ is large then after some time it should `drift' back to the interval $[0,R_1]$. On the other hand, (98) places conditions on the algorithm, in particular on the random matrices $\{F_{t,N}\}_t$. This is when excitation occurs; loosely this means that if $|\underline{X}_{t,N}|_1 \in [0,R_1]$ then $\|\prod_{i=0}^{s-1}(I - \lambda F_{t+i})\| \le (1 - \delta\lambda)$, as the sketch below illustrates numerically. The analysis can then be conducted for $|\underline{X}_{t,N}|_1$ in the interval $[0,R_1]$; the drift condition is then used to extend the results over the whole real line.
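To make the excitation condition concrete, the following sketch (a stationary ARCH(1) with hypothetical coefficients $a_0 = 0.1$, $a_1 = 0.3$, purely for illustration) simulates the process and inspects the smallest eigenvalue of windowed sums of the normalised outer products, in the spirit of condition (98):

    import numpy as np

    rng = np.random.default_rng(1)
    a0, a1, n, s = 0.1, 0.3, 2000, 50      # hypothetical ARCH(1) coefficients
    X2 = np.ones(n)                        # squared observations X_t^2
    for t in range(1, n):
        sigma2 = a0 + a1 * X2[t - 1]
        X2[t] = sigma2 * rng.standard_normal() ** 2

    # state vectors (1, X_t^2) and normalised outer products X X^T / |X|_1^2
    states = np.stack([np.ones(n - 1), X2[1:]], axis=1)
    F = np.array([np.outer(x, x) / np.sum(np.abs(x)) ** 2 for x in states])

    # smallest eigenvalue of each length-s window sum; cf. condition (98)
    mins = [np.linalg.eigvalsh(F[t:t + s].sum(axis=0))[0]
            for t in range(0, n - s - 1, s)]
    print(min(mins) / s)    # stays bounded away from zero across windows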

For a random variable $X$ we define $E_t(X) = E(X|\mathcal{F}_t)$.

Lemma A.1 Suppose that Assumption 2.1 holds with $r = 2$. Then we have
$$\lambda_{\min}\big(E\big(\underline{X}_t(u)\underline{X}_t(u)^T\big|\mathcal{F}_{t-k}\big)\big) > C, \tag{99}$$
and for $N$ large enough
$$\lambda_{\min}\big(E\big(\underline{X}_{t,N}\underline{X}_{t,N}^T\big|\mathcal{F}_{t-k}\big)\big) > \frac{C}{2}, \tag{100}$$
for $k \ge (p+1)$, where $C > 0$ is a constant independent of $t$, $N$ and $u$.


PROOF. We will prove (100); the proof of (99) is similar.

Our aim is to partition $\underline{X}_{t,N}\underline{X}_{t,N}^T$ as
$$\underline{X}_{t,N}\underline{X}_{t,N}^T = \Delta_1 + \Delta_2, \tag{101}$$
where $\Delta_1$ and $\Delta_2$ are positive semi-definite matrices, $\Delta_1 \in \sigma(Z_t,\ldots,Z_{t-p})$ and $\Delta_2 \in \mathcal{F}_{t,N}$. This implies $\lambda_{\min} E_{t-k}(\underline{X}_{t,N}\underline{X}_{t,N}^T) \ge \lambda_{\min} E(\Delta_1)$ if $k \ge p+1$, which allows us to obtain a uniform lower bound for $\lambda_{\min} E_{t-k}(\underline{X}_{t,N}\underline{X}_{t,N}^T)$. To facilitate this we represent $\underline{X}_{t,N}$ in terms of martingale vectors.

By using (1) we have
$$X_{t,N}^2 = a_0\Big(\frac{t}{N}\Big) + \sum_{i=1}^p a_i\Big(\frac{t}{N}\Big)X_{t-i,N}^2 + (Z_t^2-1)\sigma_{t,N}^2$$
and
$$\underline{X}_{t,N} = \Theta\Big(\frac{t}{N}\Big)\underline{X}_{t-1,N} + (Z_t^2-1)\sigma_{t,N}^2\underline{D}, \tag{102}$$
where $\underline{D}$ is a $(p+1)$-dimensional vector with $D_i = 0$ for $i \ne 2$, $D_2 = 1$, and
$$\Theta(u) = \begin{pmatrix}
1 & 0 & 0 & \cdots & 0 & 0 \\
a_0(u) & a_1(u) & a_2(u) & \cdots & a_{p-1}(u) & a_p(u) \\
0 & 1 & 0 & \cdots & 0 & 0 \\
0 & 0 & 1 & \cdots & 0 & 0 \\
\vdots & & & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & 1 & 0
\end{pmatrix}.$$
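(As an aside, not part of the formal argument: recursion (102) is easy to simulate. The following minimal Python sketch, with hypothetical coefficient functions $a_j(\cdot)$ chosen purely for illustration, builds $\Theta(u)$ and iterates the state vector.)

    import numpy as np

    # Sketch of recursion (102); the coefficient functions a_j below
    # are hypothetical choices, used for illustration only.
    p = 2
    a = [lambda u: 0.1 + 0.05 * u,       # a_0(u)
         lambda u: 0.3 * (1.0 - u / 2),  # a_1(u)
         lambda u: 0.2 * u]              # a_2(u)

    def Theta(u):
        # Row 1 keeps the leading 1 of the state vector, row 2 applies
        # the tvARCH coefficients, the remaining rows shift the squares.
        M = np.zeros((p + 1, p + 1))
        M[0, 0] = 1.0
        M[1, :] = [a[j](u) for j in range(p + 1)]
        for i in range(2, p + 1):
            M[i, i - 1] = 1.0
        return M

    rng = np.random.default_rng(0)
    N = 500
    D = np.zeros(p + 1); D[1] = 1.0
    X = np.ones(p + 1)              # state (1, X_t^2, ..., X_{t-p+1}^2)
    for t in range(p + 1, N):
        Th = Theta(t / N)
        sigma2 = Th[1, :] @ X       # sigma_t^2 = a_0 + sum_i a_i X_{t-i}^2
        X = Th @ X + (rng.standard_normal() ** 2 - 1.0) * sigma2 * D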

Iterating (102) $(p+1)$ times we have
$$\underline{X}_{t,N} = \sum_{i=0}^p (Z_{t-i}^2-1)\sigma_{t-i,N}^2\prod_{j=0}^{i-1}\Theta\Big(\frac{t-j}{N}\Big)\underline{D} + \prod_{j=0}^p \Theta\Big(\frac{t-j}{N}\Big)\underline{X}_{t-p-1,N}.$$

Let $\mu_4 = E(Z_t^2-1)^2$, $\Theta_{t,N}(i) = \prod_{j=0}^{i-1}\Theta\big(\frac{t-j}{N}\big)$ and $B_{t,N}(i) = \Theta_{t,N}(i)\underline{D}\,\underline{D}^T\Theta_{t,N}(i)^T$. Since $\{(Z_t^2-1)\sigma_{t,N}^2\}_t$ are martingale differences we have
$$E(\underline{X}_{t,N}\underline{X}_{t,N}^T|\mathcal{F}_{t-k,N}) = \mu_4\sum_{i=0}^p E_{t-k}(\sigma_{t-i,N}^4)\Theta_{t,N}(i)\underline{D}\,\underline{D}^T\Theta_{t,N}(i)^T + \Theta_{t,N}(p+1)E_{t-k}(\underline{X}_{t-p-1,N}\underline{X}_{t-p-1,N}^T)\Theta_{t,N}(p+1)^T, \quad \text{for } k \ge p+1. \tag{103}$$

Since the matrices above are non-negative definite, we have that
$$\lambda_{\min}\big(E(\underline{X}_{t,N}\underline{X}_{t,N}^T|\mathcal{F}_{t-k})\big) \ge \lambda_{\min}\Big(\mu_4\sum_{i=0}^p E_{t-k}(\sigma_{t-i,N}^4)B_{t,N}(i)\Big). \tag{104}$$


We now refine $\mu_4\sum_{i=0}^p E_{t-k}(\sigma_{t-i,N}^4)B_{t,N}(i)$ to obtain $\Delta_1$. By using (88) we have
$$\underline{X}_{t,N} = \sum_{j=0}^p A_N(t,j)\underline{b}_{t-j}\Big(\frac{t-j}{N}\Big) + A_N(t,p+1)\underline{X}_{t-p-1,N}.$$

Therefore, by using (1) and $X_{t-i,N}^2 = (\underline{X}_{t,N})_{i+1}$, we have
$$\sigma_{t-i,N}^2 = \frac{1}{Z_{t-i}^2}\Big(\sum_{j=0}^p A_N(t,j)\underline{b}_{t-j}\Big(\frac{t-j}{N}\Big)\Big)_{i+1} + \frac{1}{Z_{t-i}^2}\big(A_N(t,p+1)\underline{X}_{t-p-1,N}\big)_{i+1} = H_{t-i,N}(t) + G_{t-i,N}(t), \tag{105}$$
where $H_{t-i,N}(t) \in \sigma(Z_{t-i},\ldots,Z_{t-p})$ and $G_{t-i,N}(t) \in \mathcal{F}_{t-i}$. Since $H_{t-i,N}(t)$ and $G_{t-i,N}(t)$ are positive, together with (104) this implies
$$\lambda_{\min} E(\underline{X}_{t,N}\underline{X}_{t,N}^T|\mathcal{F}_{t-k}) \ge \lambda_{\min}(P_{t,N}),$$
with
$$P_{t,N} = \mu_4\sum_{i=0}^p E(H_{t-i,N}(t)^2)B_{t,N}(i).$$

To bound this we define the corresponding terms for the stationary approximation $\underline{X}_t(u)$. We set
$$P(u) = \mu_4\sum_{i=0}^p E(H_{t-i}(u,t)^2)B(u,i),$$
where
$$H_{t-i}(u,t) = \frac{1}{Z_{t-i}^2}\Big(\sum_{j=0}^p A(u,t,j)\underline{b}_{t-j}(u)\Big)_{i+1}$$
and $B(u,i) = \Theta(u)^i\underline{D}\,\underline{D}^T(\Theta(u)^i)^T$. A close inspection of the above calculation steps reveals that
$$P(u) = E\big(\underline{Y}_p(u)\underline{Y}_p(u)^T\big)$$
with $\underline{Y}_p(u)$ from Assumption 2.1(iv). Therefore,
$$\lambda_{\min} P\Big(\frac{t}{N}\Big) \ge \inf_u \lambda_{\min} P(u) \ge C.$$

Since the $\{a_j(\cdot)\}_j$ are $\beta$-Lipschitz we have
$$\big|E\big(H_{t-i}\big(\tfrac{t}{N},t\big)^2\big) - E\big(H_{t-i,N}(t)^2\big)\big| \le \frac{K}{N^\beta}, \qquad \big|B_{t,N}(i) - B\big(\tfrac{t}{N},i\big)\big|_1 \le \frac{K}{N^\beta}, \quad \text{for } i = 0,\ldots,p-1.$$

Therefore $\|P_{t,N} - P(\frac{t}{N})\| \le |P_{t,N} - P(\frac{t}{N})|_1 \le \frac{K}{N^\beta}$, which leads to
$$\lambda_{\min} P_{t,N} = \lambda_{\min}\Big(P\Big(\frac{t}{N}\Big) + \Big[P_{t,N} - P\Big(\frac{t}{N}\Big)\Big]\Big) \ge \lambda_{\min} P\Big(\frac{t}{N}\Big) - \Big\|P_{t,N} - P\Big(\frac{t}{N}\Big)\Big\| \ge C - \frac{K}{N^\beta}$$
and therefore to
$$\lambda_{\min}\big(E\big(\underline{X}_{t,N}\underline{X}_{t,N}^T\big|\mathcal{F}_{t-k,N}\big)\big) > C - \frac{K}{N^\beta}. \tag{106}$$

Thus for $N$ large enough we have (100). (99) follows in the same way.

Lemma A.2 Suppose that Assumption 2.1 holds with $r = 4$. Then, for $k \ge p+1$ and $N$ large enough, we have
$$\lambda_{\min}\Big(E\Big(\frac{\underline{X}_{t,N}\underline{X}_{t,N}^T}{|\underline{X}_{t,N}|_1^2}\Big|\mathcal{F}_{t-k}\Big)\Big) > \frac{C}{|\underline{X}_{t-k,N}|_1^4} \tag{107}$$
and
$$\lambda_{\min}\Big(E\Big(\frac{\underline{X}_t(u)\underline{X}_t(u)^T}{|\underline{X}_t(u)|_1^2}\Big)\Big) > C, \tag{108}$$
where $C$ is a constant independent of $t$, $N$ and $u$.

PROOF. We first prove (107). By definition
$$\lambda_{\min} E_{t-k}\Big(\frac{\underline{X}_{t,N}\underline{X}_{t,N}^T}{|\underline{X}_{t,N}|_1^2}\Big) = \inf_{|x|=1} E_{t-k}\Big(\frac{x^T\underline{X}_{t,N}}{|\underline{X}_{t,N}|_1}\Big)^2.$$

Since
$$(x^T\underline{X}_{t,N})^2 = \frac{x^T\underline{X}_{t,N}}{|\underline{X}_{t,N}|_1}\;x^T\underline{X}_{t,N}|\underline{X}_{t,N}|_1$$
and $\sup_{|x|=1}|x^T\underline{X}_{t,N}| \le |\underline{X}_{t,N}|_1$, we obtain by using the Cauchy--Schwarz inequality and (29)
$$E_{t-k}(x^T\underline{X}_{t,N})^2 \le \Big\{E_{t-k}\Big(\frac{x^T\underline{X}_{t,N}}{|\underline{X}_{t,N}|_1}\Big)^2\Big\}^{1/2}\big\{E_{t-k}\big(x^T\underline{X}_{t,N}|\underline{X}_{t,N}|_1\big)^2\big\}^{1/2} \le \Big\{E_{t-k}\Big(\frac{x^T\underline{X}_{t,N}}{|\underline{X}_{t,N}|_1}\Big)^2\Big\}^{1/2}\big\{E_{t-k}(|\underline{X}_{t,N}|_1^4)\big\}^{1/2} \le \Big\{E_{t-k}\Big(\frac{x^T\underline{X}_{t,N}}{|\underline{X}_{t,N}|_1}\Big)^2\Big\}^{1/2}\big\{K|\underline{X}_{t-k,N}|_1^4\big\}^{1/2}.$$

Therefore, by using the above and Lemma A.1, for large $N$ we obtain
$$\inf_{|x|=1} E_{t-k}\Big(\frac{x^T\underline{X}_{t,N}}{|\underline{X}_{t,N}|_1}\Big)^2 \ge \inf_{|x|=1}\frac{\big[E_{t-k}(x^T\underline{X}_{t,N})^2\big]^2}{E_{t-k}(|\underline{X}_{t,N}|_1^4)} \ge \frac{C}{|\underline{X}_{t-k,N}|_1^4},$$
where $C$ is a positive constant, thus giving (107). To prove (108) we use (99) to get
$$\lambda_{\min}\big(E(\underline{X}_t(u)\underline{X}_t(u)^T)\big) = \lambda_{\min}\big(E_{-\infty}(\underline{X}_t(u)\underline{X}_t(u)^T)\big) > C,$$
and using the arguments above we have
$$\lambda_{\min}\Big(E\Big(\frac{\underline{X}_t(u)\underline{X}_t(u)^T}{|\underline{X}_t(u)|_1^2}\Big)\Big) > \frac{C}{E(|\underline{X}_{t-k}(u)|_1^4)}.$$
By Lemma 4.1, $\sup_u E(|\underline{X}_t(u)|_1^4) < \infty$, which leads to (108).

Corollary A.2 Suppose Assumption 2.1 holds with $r = 4$. Let $F(u)$ be defined as in (10). Then there exist $C$ and $\lambda_1$ such that for all $\lambda \in [0,\lambda_1]$ and $u \in (0,1]$ we have
$$\lambda_{\max}\big(I - \lambda F(u)\big) \le (1 - \lambda C). \tag{109}$$
There exist $0 < \delta \le C$ and $K$ such that for all $k$
$$\big\|(I - \lambda F(u))^k\big\| \le K(1 - \lambda\delta)^k. \tag{110}$$
Furthermore,
$$\lambda\sum_{k=0}^{t-1}(I - \lambda F(u))^{2k} \to \frac{1}{2}F(u)^{-1} \tag{111}$$
as $\lambda \to 0$ and $\lambda t \to \infty$.

PROOF. (109) follows directly from (108). Furthermore, since $(I - \lambda F(u))$ is a symmetric matrix, we have $\|(I - \lambda F(u))^k\| \le \|I - \lambda F(u)\|^k \le (1 - \lambda\delta)^k$, that is (110).

We now prove (111). We have
$$\lambda\sum_{k=0}^{t-1}(I - \lambda F(u))^{2k} = \lambda\big[I - (I - \lambda F(u))^2\big]^{-1}\big[I - (I - \lambda F(u))^{2t}\big].$$
Since $\lambda_{\min} F(u) > C$ for some $C > 0$, we have $|(I - \lambda F(u))^{2t}|_1 \le \|(I - \lambda F(u))^{2t}\|\,|I_{p+1}|_1 \to 0$ as $\lambda \to 0$ and $\lambda t \to \infty$. Furthermore, since $I - (I - \lambda F(u))^2 = 2\lambda F(u) - \lambda^2 F(u)^2$, we have $\lambda\big[I - (I - \lambda F(u))^2\big]^{-1} = \big[2F(u) - \lambda F(u)^2\big]^{-1} \to \frac{1}{2}F(u)^{-1}$. Together they give (111).
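The limit (111) is also easy to check numerically. The following sketch (Python; the positive definite matrix $F$ below is an arbitrary illustrative choice) compares the truncated sum with $\frac{1}{2}F^{-1}$:

    import numpy as np

    # Numerical check of (111): lambda * sum_k (I - lam F)^(2k) -> F^{-1}/2;
    # F is an arbitrary positive definite matrix, for illustration only.
    F = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
    I = np.eye(2)
    target = 0.5 * np.linalg.inv(F)
    for lam, t in [(0.1, 10**3), (0.01, 10**4), (0.001, 10**5)]:
        M2 = (I - lam * F) @ (I - lam * F)
        S, P = np.zeros((2, 2)), I.copy()
        for _ in range(t):       # S = lam * sum_{k=0}^{t-1} (I - lam F)^{2k}
            S += lam * P
            P = P @ M2
        print(lam, np.abs(S - target).max())   # error shrinks as lam -> 0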

Lemma A.3 Suppose that Assumption 2.1 holds with $r = 4$. For sufficiently large $N$ and for every $R > 0$ there exist an $s_0 > p+1$ and a $C_1$ such that, for all $s \ge s_0$ and $t = 1,\ldots,N-s$, we have
$$E\Big(\lambda_{\min}\Big(\sum_{k=t}^{t+s-1}\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big)\Big|\mathcal{F}_t\Big) \ge C_1 I(|\underline{X}_{t,N}|_1 \le R),$$
where $I$ denotes the indicator function.

PROOF. The result can be proved using the methods given in Moulines et al. (2005), Lemma 19; we outline the proof. By using $\lambda_{\min}(A) \ge \lambda_{\min}(B) - \|A - B\|$ (see Moulines et al. (2005), Lemma 19) we have
$$E_t\Big(\lambda_{\min}\Big(\sum_{k=t}^{t+s-1}\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big)\Big) \ge \sum_{k=t}^{t+s-1}\lambda_{\min}\Big(E_t\Big(\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big)\Big) - E_t\Big\|\sum_{k=t}^{t+s-1}\Big\{\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2} - E_t\Big(\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big)\Big\}\Big\|.$$

We now evaluate an upper bound for the second term above. Let
$$\Delta_{k,N} = \frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2} - E_t\Big(\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big);$$
then
$$E_t\Big\|\sum_{k=t}^{t+s-1}\Delta_{k,N}\Big\|^2 \le E_t\Big|\sum_{k=t}^{t+s-1}\Delta_{k,N}\Big|_1^2 \le \sum_{i,j=1}^{p+1}\sum_{k_1,k_2=t}^{t+s-1} E_t\big((\Delta_{k_1,N})_{i,j}(\Delta_{k_2,N})_{i,j}\big). \tag{112}$$

We now bound $E_t((\Delta_{k_1,N})_{i,j}(\Delta_{k_2,N})_{i,j})$. For $k_1 = k_2 = k$ we obtain with (29)
$$E_t\big((\Delta_{k,N})_{i,j}^2\big) \le 2 E_t|\underline{X}_{k,N}|_1^4 \le K|\underline{X}_{t,N}|_1^4. \tag{113}$$

Now let $k_1 \ne k_2$. Let $\phi(\underline{x}) = \underline{x}\,\underline{x}^T/|\underline{x}|_1^2$, where $\underline{x} = (1,x_1,\ldots,x_p)$. Since $\phi \in \mathrm{Lip}(1)$, by using (28) we have for $k_1 < k_2$:
$$\big|E_{k_1}\big((\Delta_{k_2,N})_{i,j}\big)\big| = \Big|E_{k_1}\Big(\frac{(\underline{X}_{k_2,N}\underline{X}_{k_2,N}^T)_{i,j}}{|\underline{X}_{k_2,N}|_1^2}\Big) - E_t\Big(\frac{(\underline{X}_{k_2,N}\underline{X}_{k_2,N}^T)_{i,j}}{|\underline{X}_{k_2,N}|_1^2}\Big)\Big| \le K\rho^{k_2-k_1}\big(E_t(|\underline{X}_{k_1,N}|_1) + |\underline{X}_{k_1,N}|_1\big). \tag{114}$$

By using (29), (113) and (114) we have
$$\big|E_t\big((\Delta_{k_1,N})_{i,j}(\Delta_{k_2,N})_{i,j}\big)\big| = \big|E_t\big(E_{k_1}\big\{(\Delta_{k_1,N})_{i,j}(\Delta_{k_2,N})_{i,j}\big\}\big)\big| = \big|E_t\big((\Delta_{k_1,N})_{i,j}E_{k_1}\big\{(\Delta_{k_2,N})_{i,j}\big\}\big)\big| \le \big\{E_t(\Delta_{k_1,N})_{i,j}^2\big\}^{1/2}\big\{E_t\big(E_{k_1}\{(\Delta_{k_2,N})_{i,j}\}\big)^2\big\}^{1/2} \le K\rho^{k_2-k_1}|\underline{X}_{t,N}|_1^3 \le K\rho^{k_2-k_1}|\underline{X}_{t,N}|_1^4. \tag{115}$$

Substituting (115) into (112) we obtain
$$E_t\Big\|\sum_{k=t}^{t+s-1}\Delta_{k,N}\Big\|^2 \le Ks|\underline{X}_{t,N}|_1^4, \tag{116}$$
where $K$ is a constant independent of $s$. Therefore, by using (107) and (116), we have for $N$ sufficiently large
$$E_t\Big(\lambda_{\min}\Big(\sum_{k=t}^{t+s-1}\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big)\Big) \ge \frac{Cs}{2}|\underline{X}_{t,N}|_1^{-4} - K\big\{s|\underline{X}_{t,N}|_1^4\big\}^{1/2}.$$

Therefore for any $R > 0$,
$$E_t\Big(\lambda_{\min}\Big(\sum_{k=t}^{t+s-1}\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big)\Big) \ge \Big(\frac{Cs}{2R^4} - Ks^{1/2}R^2\Big)I(|\underline{X}_{t,N}|_1 \le R) = \frac{s^{1/2}}{R^4}\Big(\frac{C}{2}s^{1/2} - KR^6\Big)I(|\underline{X}_{t,N}|_1 \le R).$$

Now choose $s_0$ and a corresponding $C_1$ such that
$$C_1 = \frac{s_0^{1/2}}{R^4}\Big(\frac{C}{2}s_0^{1/2} - KR^6\Big) > 0.$$
Then it is clear that for $s \ge s_0$ we have
$$E_t\Big(\lambda_{\min}\Big(\sum_{k=t}^{t+s-1}\frac{\underline{X}_{k,N}\underline{X}_{k,N}^T}{|\underline{X}_{k,N}|_1^2}\Big)\Big) \ge \frac{s^{1/2}}{R^4}\Big(\frac{C}{2}s^{1/2} - KR^6\Big)I(|\underline{X}_{t,N}|_1 \le R) \ge C_1 I(|\underline{X}_{t,N}|_1 \le R),$$
thus we have the result.

PROOF of Theorem 4.1. We prove the result by applying Theorem A.1 with $A_l = F_{l,N}$ and $\mathcal{F}_l = \sigma(Z_l, Z_{l-1},\ldots)$ for $p+1 \le l \le N$, and verifying the conditions in Theorem A.1; Theorem 4.1 then follows immediately. Let $\phi_l := |\underline{X}_{l,N}|_1$, $V_l := |\underline{X}_{l,N}|_1$ and $A_l := F_{l,N} = \underline{X}_{l,N}\underline{X}_{l,N}^T/|\underline{X}_{l,N}|_1^2$. Let $k \ge p+1$ and $1 \le s \le N-k$. Then by using (27) we have $E(V_{t+s}|\mathcal{F}_t) \le K(1 + \rho^s|\underline{X}_{t,N}|_1)$. Therefore for any $R_1$ and $s > 0$ we have
$$E(V_{t+s}|\mathcal{F}_t) \le K\rho^s|\underline{X}_{t,N}|_1 + \frac{K}{R_1}I(|\underline{X}_{t,N}|_1 > R_1)|\underline{X}_{t,N}|_1 + KI(|\underline{X}_{t,N}|_1 \le R_1).$$
Thus, since $V_t = |\underline{X}_{t,N}|_1 \ge 1$, we have
$$E(V_{t+s}|\mathcal{F}_t) \le \Big(K\rho^s + \frac{K}{R_1}\Big)I(|\underline{X}_{t,N}|_1 > R_1)|\underline{X}_{t,N}|_1 + K(\rho^s+1)I(|\underline{X}_{t,N}|_1 \le R_1)|\underline{X}_{t,N}|_1.$$
By choosing a sufficiently large $R_1$ we can find an $s_1$ such that for all $s \ge s_1$ we have $K\rho^s + K/R_1 < 1$, and thus condition (a) is satisfied. Condition (b) follows directly from Lemma A.3. Since
$$\|F_{l,N}\| = \frac{\underline{X}_{l,N}^T\underline{X}_{l,N}}{|\underline{X}_{l,N}|_1^2} \le 1,$$
for $\lambda < 1$ we have $\lambda\|F_{l,N}\| < 1$; hence condition (c) is satisfied. Finally, since the entries of $F_{l,N}$ are bounded by one, for any $q \ge 1$ and $p+1 \le t \le N$ we have
$$\sum_{l=t}^{t+s_1} E\big(\|F_{l,N}\|^q\big|\mathcal{F}_t\big) \le (s_1+1)(p+1)^{2q}.$$
Therefore condition (d) is satisfied and Theorem 4.1 follows from Theorem A.1.

A.3 The lower order terms in the perturbation expansion

In this section we prove the auxiliary results required in Section 4.2, where we showed that the second order terms in the perturbation expansion are of lower order than the principal terms. The analysis of $J^{x,2}_{t_0,N}$ is based on partitioning it into two terms,
$$J^{x,2}_{t_0,N} = A^x_{t_0,N} + B^x_{t_0,N},$$
similar to the partition in (54). $A^x_{t_0,N}$ is the weighted sum of the differences $F_{k,N} - F_k(u)$, whereas $B^x_{t_0,N}$ is the weighted sum of the differences between the stationary approximation $F_k(u_0)$ and $E(F_0(u_0))$, that is of $\tilde F_k(u_0) = F_k(u_0) - F(u_0)$. In this section we evaluate bounds for these two terms. We use the following lemma.

Lemma A.4 Suppose Assumption 2.1 holds with some $r \ge 1$. Let $M_{t,N}$ and $M_t(u)$ be defined as in (30) and let $D_{t,N}$ be defined as in (46). Then we have
$$\frac{\underline{X}_{t,N}\underline{X}_{t,N}^T}{|\underline{X}_{t,N}|_1^2} = \frac{\underline{X}_t(u)\underline{X}_t(u)^T}{|\underline{X}_t(u)|_1^2} + R_{t,N}(u) \tag{117}$$
and $M_{t,N} = M_t(u) + (1 + Z_t^2)R_{t,N}(u)$, where
$$|R_{t,N}(u)| \le \Big(\Big|\frac{t}{N} - u\Big|^\beta + \Big(\frac{p}{N}\Big)^\beta + \frac{1}{N}\Big)D_{t,N}.$$
Further, for $q \le q_0$ there exists a constant $K$ independent of $t$, $N$ and $u$ such that
$$\big(E|R_{t,N}(u)|^q\big)^{1/q} \le K\Big(\Big|\frac{t}{N} - u\Big|^\beta + \Big(\frac{p+1}{N}\Big)^\beta\Big). \tag{118}$$

PROOF. The proof uses (22) and the method given in Dahlhaus and Subba Rao (2006), Lemma A.4.

We now give a bound for a general $A^x_{t,N}$.

Lemma A.5 Suppose Assumption 2.1 holds with $r = q_0$ and let $\{G_{t,N}\}$ be a random process which satisfies $\sup_{t,N}\|G_{t,N}\|_{E_{q_0}} < \infty$. Let $F_{t,N}$, $F_t(u)$ and $F(u)$ be defined as in (31) and (10), respectively. Then if $|\frac{t_0}{N} - u_0| < 1/N$ and $q \le q_0$ we have
$$\Big(E\Big|\sum_{k=0}^{t_0-p-1}(I - \lambda F(u_0))^k\big(F_{t_0-k-1,N} - F_{t_0-k-1}(u_0)\big)G_{t_0-k,N}\Big|^{q/2}\Big)^{2/q} \le \frac{K}{(\delta N\lambda)^\beta}\sup_{t,N}\big(E|G_{t,N}|^q\big)^{1/q}, \tag{119}$$
where $K$ is a finite constant.

PROOF. By using (110) and (117) we get the result.

We now consider the second term $B^x_{t_0,N}$. A crude way to bound $E|B^x_{t_0,N}|^q$ would be to use Minkowski's inequality and take the norm directly into the summand. However, with this method we do not obtain the desired lower order. To overcome this problem we use Burkholder-type inequalities.

We now state a generalisation of Proposition B.3 in Moulines et al. (2005); it can be proved by adapting the proof given there.


Lemma A.6 Suppose $\{M_t\}$ and $\{F_t\}$ are random matrices and $F$ is a positive definite, deterministic matrix with $\lambda_{\min}(F) > \delta$ for some $\delta > 0$. Let $\mathcal{F}_t = \sigma(F_t, M_t, F_{t-1}, M_{t-1},\ldots)$. We assume for some $q \ge 2$:

(i) $\{F_t\}$ are identically distributed with mean zero.

(ii) $E(M_t|\mathcal{F}_{t-1}) = 0$.

(iii) $\big(E|E(F_t|\mathcal{F}_k)|^{2q}\big)^{1/(2q)} \le K\rho^{t-k}$.

(iv) $\big(E|E(M_tF_s|\mathcal{F}_k) - E(M_tF_s)|^q\big)^{1/q} \le K\rho^{s-k}$ if $k \le s \le t$.

(v) $\sup_t\big(E|M_t|^{2q}\big)^{1/(2q)} < \infty$ and $\sup_t\big(E|F_t|^{2q}\big)^{1/(2q)} < \infty$.

Then we have
$$\Big(E\Big|\sum_{k=0}^{t-p-1}\sum_{i=0}^{t-p-k-2}(I - \lambda F)^{k+i}F_{t-k-1}M_{t-k-i-1}\Big|^q\Big)^{1/q} \le \frac{K}{\delta\lambda}, \tag{120}$$
where $K$ is a finite constant.

PROOF. We note that since $\lambda_{\min}F \ge \delta > 0$,
$$\big\|(I - \lambda F)^k\big\| \le K(1 - \lambda\delta)^k. \tag{121}$$

A change of variables in (120) gives
$$\sum_{k=0}^{t-p-1}\sum_{i=0}^{t-p-k-2}(I - \lambda F)^{k+i}F_{t-k-1}M_{t-k-i-1} = \sum_{k=p}^{t-1}\sum_{i=p+1}^k (I - \lambda F)^{t-i-1}F_kM_i = \sum_{k=p}^{t-1}(I - \lambda F)^{t-k-1}F_k\sum_{i=p+1}^k (I - \lambda F)^{k-i}M_i$$
(using the convention $\sum_{i=p+1}^p = 0$). Let $\phi_k = (I - \lambda F)^{t-k-1}F_k\sum_{i=p+1}^k (I - \lambda F)^{k-i}M_i$. Then we have
$$\Big(E\Big|\sum_{k=p}^{t-1}\sum_{i=p+1}^k (I - \lambda F)^{t-i-1}F_kM_i\Big|^q\Big)^{1/q} \le K\Big(\Big|\sum_{k=p}^{t-1}E(\phi_k)\Big| + \Big(E\Big|\sum_{k=p}^{t-1}\{\phi_k - E(\phi_k)\}\Big|^q\Big)^{1/q}\Big). \tag{122}$$

We first consider $|\sum_{k=p}^{t-1}E(\phi_k)|$. Since $E(F_kM_i) = E(M_iE(F_k|\mathcal{F}_i))$ under the stated assumptions, and using the norm inequality $|Ax| \le \|A\|\,|x|$, we obtain
$$\big|E\big((I - \lambda F)^{t-i-1}F_kM_i\big)\big| \le K(1 - \lambda\delta)^{t-i-1}E\big(|M_i|\,|E(F_k|\mathcal{F}_i)|\big) \le K\rho^{k-i}E|M_i| \le K\rho^{k-i}.$$
Substituting the above into $E(\phi_k)$ gives the bound
$$|E(\phi_k)| \le K(1 - \lambda\delta)^{t-k-1}\sum_{i=p+1}^k \rho^{k-i} \le KS(\rho)(1 - \lambda\delta)^{t-k-1},$$
where $S(\rho) = 1/(1-\rho)$. Thus
$$\Big|\sum_{k=p}^{t-1}E(\phi_k)\Big| \le K\sum_{k=p}^{t-1}(1 - \lambda\delta)^{t-k-1}S(\rho) = O(\lambda^{-1}). \tag{123}$$

We now consider the second term on the right hand side of (122), $(E|\sum_{k=p}^{t-1}\{\phi_k - E(\phi_k)\}|^q)^{1/q}$. Let $\tilde\phi_k = \phi_k - E(\phi_k)$. By using the generalised Burkholder inequality (see Dedecker and Doukhan (2003)) we have
$$\Big(E\Big|\sum_{k=p}^{t-1}\tilde\phi_k\Big|^q\Big)^{1/q} \le \Big\{2q\sum_{k=p}^{t-1}(E|\tilde\phi_k|^q)^{1/q}\sum_{j=k}^{t-1}\big(E|E(\tilde\phi_j|\mathcal{F}_k)|^q\big)^{1/q}\Big\}^{1/2}. \tag{124}$$

We treat the two summands on the right hand side of (124) separately. By using Hölder's inequality, the norm inequality and (121) we have
$$(E|\tilde\phi_k|^q)^{1/q} = \Big(E\Big|(I - \lambda F)^{t-k-1}\sum_{i=p+1}^k (I - \lambda F)^{k-i}\big\{M_iF_k - E(M_iF_k)\big\}\Big|^q\Big)^{1/q} \le 2K(1 - \lambda\delta)^{t-k-1}\Big(E\Big|\sum_{i=p+1}^k (1 - \lambda\delta)^{k-i}M_iF_k\Big|^q\Big)^{1/q} \le 2K(1 - \lambda\delta)^{t-k-1}\big(E|F_k|^{2q}\big)^{1/(2q)}\Big(E\Big|\sum_{i=p+1}^k (1 - \lambda\delta)^{k-i}M_i\Big|^{2q}\Big)^{1/(2q)}. \tag{125}$$

By applying Burkholder's inequality to $(E|\sum_{i=p+1}^k (1 - \lambda\delta)^{k-i}M_i|^{2q})^{1/(2q)}$ we have
$$\Big(E\Big|\sum_{i=p+1}^k (1 - \lambda\delta)^{k-i}M_i\Big|^{2q}\Big)^{1/(2q)} \le \Big\{2q\sum_{i=p+1}^k (1 - \lambda\delta)^{2(k-i)}\big(E|M_i|^{2q}\big)^{1/q}\Big\}^{1/2}.$$

Substituting the above into (125) gives
$$(E|\tilde\phi_k|^q)^{1/q} \le 2K(1 - \lambda\delta)^{t-k-1}\sup_k\big(E|F_k|^{2q}\big)^{1/(2q)}\sup_i\big(E|M_i|^{2q}\big)^{1/(2q)}\Big\{2q\sum_{i=0}^{k-p-1}(1 - \lambda\delta)^{2i}\Big\}^{1/2} \le \frac{K}{\lambda^{1/2}}(1 - \lambda\delta)^{t-k-1}. \tag{126}$$

We now consider $\sum_{j=k}^{t-1}(E|E(\tilde\phi_j|\mathcal{F}_k)|^q)^{1/q}$. We treat the cases $M_i \in \mathcal{F}_k$ and $M_i \notin \mathcal{F}_k$ separately, and obtain the following partition:
$$\sum_{j=k}^{t-1}\big\|E(\tilde\phi_j|\mathcal{F}_k)\big\|_{E_q} \le \sum_{j=k}^{t-1}\Big(E\Big|(I - \lambda F)^{t-j-1}\sum_{i=p+1}^k (I - \lambda F)^{j-i}E\big[M_iF_j - E(M_iF_j)\big|\mathcal{F}_k\big]\Big|^q\Big)^{1/q} + \sum_{j=k}^{t-1}\Big(E\Big|(I - \lambda F)^{t-j-1}\sum_{i=k+1}^j (I - \lambda F)^{j-i}E\big[M_iF_j - E(M_iF_j)\big|\mathcal{F}_k\big]\Big|^q\Big)^{1/q} = A_k + B_k. \tag{127}$$

Since for every random variable $A$ we have $|E(A)| \le (E|E(A|\mathcal{F}_k)|^q)^{1/q}$ (if $q \ge 1$), and $E(M_iF_j|\mathcal{F}_k) = M_iE(F_j|\mathcal{F}_k)$ (for $i \le k$), we have
$$A_k \le 2\sum_{j=k}^{t-1}\Big\|(I - \lambda F)^{t-j-1}E\Big(\sum_{i=p+1}^k (I - \lambda F)^{j-i}M_iF_j\Big|\mathcal{F}_k\Big)\Big\|_{E_q} \le 2K\sum_{j=k}^{t-1}(1 - \lambda\delta)^{t-j-1}\Big(E\Big|E(F_j|\mathcal{F}_k)\sum_{i=p+1}^k (I - \lambda F)^{j-i}M_i\Big|^q\Big)^{1/q}.$$

Now by using the Hölder inequality and $(E|E(F_j|\mathcal{F}_k)|^{2q})^{1/(2q)} \le K\rho^{j-k}$ we have
$$A_k \le 2\sum_{j=k}^{t-1}(1 - \lambda\delta)^{t-j-1}\big(E|E(F_j|\mathcal{F}_k)|^{2q}\big)^{1/(2q)}\Big(E\Big|\sum_{i=p+1}^k (I - \lambda F)^{j-i}M_i\Big|^{2q}\Big)^{1/(2q)} \le 2\sum_{j=k}^{t-1}K\rho^{j-k}\Big\{2q\sum_{i=p+1}^k (1 - \lambda\delta)^{2(j-i)}\big(E|M_i|^{2q}\big)^{1/q}\Big\}^{1/2} \le \frac{K}{\lambda^{1/2}}. \tag{128}$$

We now bound $B_k$:
$$B_k \le K\sum_{j=k}^{t-1}(1 - \lambda\delta)^{t-j-1}\sum_{i=k+1}^j (1 - \lambda\delta)^{j-i}\big(E\big|E\big[M_iF_j - E(M_iF_j)\big|\mathcal{F}_k\big]\big|^q\big)^{1/q}. \tag{129}$$

We evaluate two different bounds for $\|E(M_iF_j - E(M_iF_j)|\mathcal{F}_k)\|_{E_q}$, and use the minimum of these bounds to evaluate $B_k$. Since $i > k$ we have $(E|E(A|\mathcal{F}_k)|^q)^{1/q} \le (E|E(A|\mathcal{F}_i)|^q)^{1/q}$; hence, under the assumption $(E|E(F_j|\mathcal{F}_i)|^{2q})^{1/(2q)} \le K\rho^{j-i}$, we have
$$\big(E\big|E\big[M_iF_j - E(M_iF_j)\big|\mathcal{F}_k\big]\big|^q\big)^{1/q} \le \big(E\big|E\big[M_iF_j - E(M_iF_j)\big|\mathcal{F}_i\big]\big|^q\big)^{1/q} \le 2\big(E\big|E(M_iF_j|\mathcal{F}_i)\big|^q\big)^{1/q} = 2\big(E\big|M_iE(F_j|\mathcal{F}_i)\big|^q\big)^{1/q} \le 2\big(E|M_i|^{2q}\big)^{1/(2q)}\big(E|E(F_j|\mathcal{F}_i)|^{2q}\big)^{1/(2q)} \le K\sup_i\big(E|M_i|^{2q}\big)^{1/(2q)}\rho^{j-i}.$$

Using again the stated assumption (iv), we get the alternative bound
$$\big(E\big|E\big[M_iF_j - E(M_iF_j)\big|\mathcal{F}_k\big]\big|^q\big)^{1/q} \le K\rho^{i-k}.$$

Therefore we have
$$\big(E\big|E\big[M_iF_j - E(M_iF_j)\big|\mathcal{F}_k\big]\big|^q\big)^{1/q} \le K\min(\rho^{j-i},\rho^{i-k}),$$
leading to
$$B_k \le K\sum_{j=k}^{t-1}(1 - \lambda\delta)^{t-j-1}\sum_{i=k+1}^j (1 - \lambda\delta)^{j-i}\min(\rho^{j-i},\rho^{i-k}) \tag{130}$$
$$\le K\sum_{j=k}^{t-1}\sum_{i=k+1}^j \min(\rho^{j-i},\rho^{i-k}) \le K. \tag{131}$$
(The double sum is uniformly bounded in $t$, since each pair $(j-i, i-k)$ occurs at most once, $\min(\rho^{j-i},\rho^{i-k}) = \rho^{\max(j-i,i-k)}$, and the number of pairs with a given maximum $m$ grows only linearly in $m$.)
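The boundedness used in (131) is also easy to verify numerically; a quick sketch in Python, with the arbitrary illustrative choice $\rho = 0.9$:

    # Check that sum_{j>=k} sum_{i=k+1}^{j} min(rho^(j-i), rho^(i-k))
    # stays bounded as t grows, as used in (131); rho = 0.9 is arbitrary.
    rho, k = 0.9, 0
    for t in (10, 100, 1000):
        total = sum(min(rho ** (j - i), rho ** (i - k))
                    for j in range(k, t) for i in range(k + 1, j + 1))
        print(t, round(total, 4))   # converges to a finite limit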

Substituting (128) and (131) into (127) gives
$$\sum_{j=k}^{t-1}\big(E|E(\tilde\phi_j|\mathcal{F}_k)|^q\big)^{1/q} \le A_k + B_k \le K + \frac{K}{\lambda^{1/2}}, \tag{132}$$
which, by substituting (126) and (132) into (124), leads to
$$\Big(E\Big|\sum_{k=p}^{t-1}\tilde\phi_k\Big|^q\Big)^{1/q} \le \Big\{2q\sum_{k=p}^{t-1}(1 - \lambda\delta)^{t-k-1}\frac{K}{\lambda^{1/2}}\Big(K + \frac{K}{\lambda^{1/2}}\Big)\Big\}^{1/2} \le \frac{K}{\lambda}. \tag{133}$$
Finally, by using (122), (123) and (133), we get the assertion.

We now apply the lemma above to the tvARCH online recursive algorithm.

Lemma A.7 Suppose Assumption 2.1 holds for $r > 4$. Let $F_t(u)$, $M_{t,N}$ and $M_t(u)$ be defined as in (30) and let $\tilde F_t(u) = F_t(u) - E(F_t(u))$. Then we have
$$\Big(E\Big|\sum_{k=0}^{t_0-p-1}\sum_{i=0}^{t_0-p-k-2}(I - \lambda F(u))^{k+i}\tilde F_{t_0-k-1}(u)M_{t_0-k-i-1,N}\Big|^{r/2}\Big)^{2/r} \le \frac{K}{\lambda} \tag{134}$$
and
$$\Big(E\Big|\sum_{k=0}^{t_0-p-1}\sum_{i=0}^{t_0-p-k-2}(I - \lambda F(u))^{k+i}\tilde F_{t_0-k-1}(u)M_{t_0-k-i-1}(u)\Big|^{r/2}\Big)^{2/r} \le \frac{K}{\lambda}. \tag{135}$$

PROOF. We prove (134); the proof of (135) is the same. We prove the result by verifying the conditions in Lemma A.6; (134) then immediately follows. By using (108) we have that $\lambda_{\min}F(u) \ge \delta$ for some $\delta > 0$. Let $M_t := M_{t,N}$, $F_t := \tilde F_t(u)$, $F := F(u)$ and $\mathcal{F}_t = \sigma(Z_t, Z_{t-1},\ldots)$. It is clear from the definition that the series $\{\tilde F_t(u)\}_t$ has zero mean and is identically distributed; also $E(M_t(u)|\mathcal{F}_{t-1}) = 0$. By using (25) we have
$$\big(E|E(\tilde F_t(u)|\mathcal{F}_k)|^r\big)^{1/r} = \big(E|E(F_t(u)|\mathcal{F}_k) - E(F_t(u))|^r\big)^{1/r} \le K\rho^{t-k},$$
thus condition (iii) is satisfied. Since $M_{t,N} = (Z_t^2-1)\sigma_{t,N}^2\underline{X}_{t-1,N}/|\underline{X}_{t-1,N}|_1^2$, with $|F_t|_{\mathrm{abs}} \le I_{p+1}$ and $|\sigma_{t,N}^2\underline{X}_{t-1,N}/|\underline{X}_{t-1,N}|_1^2|_{\mathrm{abs}} \le I_{p+1}$, by using Corollary A.1 and $\sup_{t,N}(E|\underline{X}_{t,N}|^{r/2})^{2/r} < \infty$ we can show that condition (iv) is satisfied.

Moreover, $F_{t,N}$ is a bounded random matrix, hence all its moments exist. Finally, since $|M_{t,N}|^r \le K(Z_t^2+1)^r$ and $E(Z_0^{2r}) < \infty$, we have $\sup_{t,N}(E|M_{t,N}|^r) < \infty$, leading to condition (v). Thus all the conditions of Lemma A.6 are satisfied and we obtain (134).

References

Aguech, R., Moulines, E., & Priouret, P. (2000). On a perturbation approach for the analysis of stochastic tracking algorithms. SIAM Journal on Control and Optimization, 39, 872-899.

Chen, X., & White, H. (1998). Nonparametric learning with feedback. Journal of Economic Theory, 82, 190-222.

Dahlhaus, R., & Subba Rao, S. (2006). Statistical inference for time-varying ARCH processes. Ann. Statist., 34, 1075-1114.

Dedecker, J., & Doukhan, P. (2003). A new covariance inequality and applications. Stochastic Processes and their Applications, 106, 63-80.

Gill, R., & Levit, B. (1995). Applications of the van Trees inequality: a Bayesian Cramér-Rao bound. Bernoulli, 1, 59-79.

Guo, L. (1994). Stability of recursive stochastic tracking algorithms. SIAM Journal on Control and Optimization, 32, 1195-1225.

Hall, P., & Heyde, C. (1980). Martingale Limit Theory and its Application. New York: Academic Press.

Kushner, H., & Yin, G. (2003). Stochastic Approximation and Recursive Algorithms and Applications. Heidelberg: Springer.

Ljung, L., & Söderström, T. (1983). Theory and Practice of Recursive Identification. Cambridge, MA: MIT Press.

Mikosch, T., & Stărică, C. (2000). Is it really long memory we see in financial returns? In P. Embrechts (Ed.), Extremes and Integrated Risk Management (pp. 439-459). London: Risk Books.

Mikosch, T., & Stărică, C. (2004). Non-stationarities in financial time series, the long-range dependence, and the IGARCH effects. The Review of Economics and Statistics, 86, 378-390.

Moulines, E., Priouret, P., & Roueff, F. (2005). On recursive estimation for locally stationary time varying autoregressive processes. Ann. Statist., 33, 2610-2654.

Solo, V. (1981). The second order properties of a time series recursion. Ann. Statist., 9, 307-317.

Solo, V. (1989). The limiting behavior of LMS. IEEE Transactions on Acoustics, Speech, and Signal Processing, 37, 1909-1922.

Subba Rao, S. (2004). On some nonstationary, nonlinear random processes and their stationary approximations. Preprint.

White, H. (1996). Parametric statistical estimation using artificial neural networks. In P. Smolensky, M. Mozer, & D. Rumelhart (Eds.), Mathematical Perspectives on Neural Networks (pp. 719-775). Hillsdale, NJ: L. Erlbaum Associates.
