Transcript of Lecture 3
One-Step MLE. Many estimators are consistent and asymptotically normal but not asymptotically efficient. Some of them can be improved up to asymptotic efficiency. We observe the diffusion process
\[
dX_t = S(\vartheta, X_t)\,dt + \sigma(X_t)\,dW_t, \qquad X_0,\ 0 \le t \le T.
\]
In the regular case the MLE $\hat\vartheta_T$ is asymptotically normal,
\[
\mathcal{L}_\vartheta\bigl\{\sqrt{T}\,(\hat\vartheta_T - \vartheta)\bigr\} \Longrightarrow \mathcal{N}\bigl(0, \mathrm{I}(\vartheta)^{-1}\bigr), \qquad
\mathrm{I}(\vartheta) = \mathbf{E}_\vartheta\left(\frac{\dot S(\vartheta, \xi)}{\sigma(\xi)}\right)^{2},
\]
and asymptotically efficient:
\[
\lim_{\delta\to 0} \lim_{T\to\infty} \sup_{|\vartheta-\vartheta_0|<\delta} T\,\mathbf{E}_\vartheta\bigl(\hat\vartheta_T - \vartheta\bigr)^{2} = \mathrm{I}(\vartheta_0)^{-1}.
\]
The family of measures is LAN:
\[
L\Bigl(\vartheta + \frac{u}{\sqrt{T}}, \vartheta; X^T\Bigr)
= \exp\Bigl\{ u\,\Delta_T\bigl(\vartheta, X^T\bigr) - \frac{u^{2}}{2}\,\mathrm{I}(\vartheta) + r_T\bigl(\vartheta, u, X^T\bigr) \Bigr\}.
\]
Here $r_T \to 0$ and
\[
\Delta_T\bigl(\vartheta, X^T\bigr)
= \frac{1}{\sqrt{T}} \int_0^T \frac{\dot S(\vartheta, X_t)}{\sigma(X_t)^{2}}\,\bigl[dX_t - S(\vartheta, X_t)\,dt\bigr]
\Longrightarrow \mathcal{N}\bigl(0, \mathrm{I}(\vartheta)\bigr).
\]
Then, given a consistent and asymptotically normal estimator $\bar\vartheta_T$, we construct the estimator
\[
\vartheta^\circ_T = \bar\vartheta_T + \frac{\Delta_T\bigl(\bar\vartheta_T, X^T\bigr)}{\sqrt{T}\,\mathrm{I}(\bar\vartheta_T)}
\]
and show that this estimator is asymptotically efficient:
\[
\begin{aligned}
\sqrt{T}\,\bigl(\vartheta^\circ_T - \vartheta\bigr)
&= \sqrt{T}\,\bigl(\bar\vartheta_T - \vartheta\bigr)
 + \frac{\Delta_T(\vartheta)}{\mathrm{I}(\vartheta)}\,(1 + o(1)) \\
&\qquad + \frac{1}{\mathrm{I}(\vartheta)\,\sqrt{T}} \int_0^T \frac{\dot S(\bar\vartheta_T, X_t)}{\sigma(X_t)^{2}}\,
   \bigl[S(\vartheta, X_t) - S(\bar\vartheta_T, X_t)\bigr]\, dt\;(1 + o(1)) \\
&= \eta_T + \frac{\Delta_T(\vartheta)}{\mathrm{I}(\vartheta)}\,(1 + o(1))
 - \frac{\eta_T}{\mathrm{I}(\vartheta)}\,\frac{1}{T} \int_0^T \left(\frac{\dot S(\vartheta, X_t)}{\sigma(X_t)}\right)^{2} dt\;(1 + o(1)) \\
&= \frac{\Delta_T(\vartheta)}{\mathrm{I}(\vartheta)}\,(1 + o(1)) + o(1)
 \Longrightarrow \mathcal{N}\bigl(0, \mathrm{I}(\vartheta)^{-1}\bigr),
\end{aligned}
\]
where $\eta_T = \sqrt{T}\,(\bar\vartheta_T - \vartheta)$.
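The cancellation of the $\eta_T$ terms above rests on two standard facts, sketched here (the expansion is implicit in the transcript): a Taylor expansion of the drift in $\vartheta$ and the law of large numbers for the ergodic diffusion,

```latex
S(\vartheta, X_t) - S(\bar\vartheta_T, X_t)
  = -\,(\bar\vartheta_T - \vartheta)\,\dot S(\vartheta, X_t)\,(1 + o(1)),
\qquad
\frac{1}{T}\int_0^T \left(\frac{\dot S(\vartheta, X_t)}{\sigma(X_t)}\right)^{2} dt
  \;\longrightarrow\; \mathrm{I}(\vartheta),
```

so the integral term contributes $-\sqrt{T}\,(\bar\vartheta_T - \vartheta)\,(1 + o(1)) = -\eta_T\,(1 + o(1))$, which cancels the $\eta_T$ coming from the preliminary estimator.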
It is easy to verify by the Itô formula that $\delta_T\bigl(\theta, X^T\bigr) = \Delta_T\bigl(\theta, X^T\bigr)$, where (the prime denoting the derivative with respect to the space variable $x$)
\[
\begin{aligned}
\delta_T\bigl(\theta, X^T\bigr)
&= \frac{1}{\sqrt{T}} \int_{X_0}^{X_T} \frac{\dot S(\theta, y)}{\sigma(y)^{2}}\, dy
 - \frac{1}{2\sqrt{T}} \int_0^T \dot S'(\theta, X_t)\, dt \\
&\qquad + \frac{1}{\sqrt{T}} \int_0^T \dot S(\theta, X_t) \left( \frac{\sigma'(X_t)}{\sigma(X_t)} - \frac{S(\theta, X_t)}{\sigma(X_t)^{2}} \right) dt,
\end{aligned}
\]
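The verification is one application of the Itô formula to $F(\theta, x) = \int_{x_0}^{x} \dot S(\theta, y)\,\sigma(y)^{-2}\, dy$ (a sketch; the lower limit $x_0$ is arbitrary):

```latex
F(\theta, X_T) - F(\theta, X_0)
  = \int_0^T \frac{\dot S(\theta, X_t)}{\sigma(X_t)^{2}}\, dX_t
  + \frac{1}{2}\int_0^T \left[ \dot S'(\theta, X_t)
      - \frac{2\,\dot S(\theta, X_t)\,\sigma'(X_t)}{\sigma(X_t)} \right] dt .
```

Solving this identity for the stochastic integral and substituting into the definition of $\Delta_T$ gives exactly $\delta_T$, which involves no stochastic integral and is therefore convenient for computation.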
and define the one-step maximum likelihood estimator by the same formula,
\[
\vartheta^\circ_T = \bar\vartheta_T + \frac{\delta_T\bigl(\bar\vartheta_T, X^T\bigr)}{\sqrt{T}\,\mathrm{I}(\bar\vartheta_T)}.
\]
We prove
\[
\mathcal{L}_\vartheta\bigl\{\sqrt{T}\,(\vartheta^\circ_T - \vartheta)\bigr\} \Longrightarrow \mathcal{N}\bigl(0, \mathrm{I}(\vartheta)^{-1}\bigr).
\]
Example. Let
\[
dX_t = -(X_t - \vartheta)^{3}\, dt + \sigma\, dW_t, \qquad X_0,\ 0 \le t \le T.
\]
The MLE cannot be written in explicit form, but the estimator of the method of moments (EMM)
\[
\bar\vartheta_T = \frac{1}{T} \int_0^T X_t\, dt
\]
is uniformly consistent and asymptotically normal. The one-step MLE is
\[
\vartheta^\circ_T = \bar\vartheta_T - \frac{1}{\sigma^{2}\,\mathrm{I}\,T} \int_0^T \bigl(\bar\vartheta_T - X_t\bigr)^{5}\, dt.
\]
This estimator is consistent and asymptotically normal:
\[
\mathcal{L}_\vartheta\bigl\{\sqrt{T}\,(\vartheta^\circ_T - \vartheta)\bigr\} \Longrightarrow \mathcal{N}\bigl(0, \mathrm{I}^{-1}\bigr).
\]
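The example is easy to check numerically. Below is a minimal sketch (not from the lecture): an Euler–Maruyama simulation of the cubic-drift diffusion, the empirical-mean preliminary estimator, and the generic one-step correction $\bar\vartheta_T + \Delta_T(\bar\vartheta_T)/(\sqrt{T}\,\hat{\mathrm{I}})$ built from the discretized score. The step size, horizon, initial value, and the empirical Fisher information $\hat{\mathrm{I}}$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 1.0, 0.5      # true parameter and diffusion coefficient
T, dt = 200.0, 0.01          # observation horizon and Euler step (illustrative)
n = int(T / dt)

# Euler-Maruyama path of dX_t = -(X_t - theta)^3 dt + sigma dW_t
X = np.empty(n + 1)
X[0] = 0.0
dW = rng.normal(0.0, np.sqrt(dt), size=n)
for k in range(n):
    X[k + 1] = X[k] - (X[k] - theta) ** 3 * dt + sigma * dW[k]

# Preliminary estimator (EMM): the time average of the path
theta_bar = dt * np.sum(X[:-1]) / T

# One-step correction from the discretized score Delta_T(theta_bar):
# S(v, x) = -(x - v)^3, so dS/dv = 3 (x - v)^2
S_dot = 3.0 * (X[:-1] - theta_bar) ** 2
S_val = -(X[:-1] - theta_bar) ** 3
I_hat = dt * np.sum((S_dot / sigma) ** 2) / T   # empirical Fisher information
Delta = np.sum(S_dot / sigma ** 2 * (np.diff(X) - S_val * dt)) / np.sqrt(T)
theta_one = theta_bar + Delta / (np.sqrt(T) * I_hat)

print(theta_bar, theta_one)
```

Both estimates should land near the true value $\vartheta = 1$; the one-step correction implements the general formula of the lecture rather than the example-specific expression above.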
Lower Bounds.
The first one is the Cramér–Rao bound. Suppose that the observed diffusion process is
\[
dX_t = S(\vartheta, X_t)\, dt + \sigma(X_t)\, dW_t, \qquad X_0,\ 0 \le t \le T,
\]
and we have to estimate some continuously differentiable function $\psi(\vartheta)$, $\vartheta \in \Theta \subset \mathcal{R}$.
Then for any estimator $\bar\psi_T$ of $\psi(\vartheta)$ we have
\[
\frac{\partial}{\partial\vartheta}\,\mathbf{E}_\vartheta \bar\psi_T
= \mathbf{E}_\vartheta\bigl(\Delta_T\bigl(\vartheta, X^T\bigr)\,\bar\psi_T\bigr),
\]
where
\[
\Delta_T\bigl(\vartheta, X^T\bigr)
= \frac{\dot f(\vartheta, X_0)}{f(\vartheta, X_0)}
+ \int_0^T \frac{\dot S(\vartheta, X_t)}{\sigma(X_t)^{2}}\,\bigl[dX_t - S(\vartheta, X_t)\, dt\bigr]
\]
and $f(\vartheta, \cdot)$ denotes the (invariant) density of the initial value $X_0$. Note that here $\Delta_T$ is not normalized by $\sqrt{T}$.
Then we can write
\[
\mathbf{E}_\vartheta\bigl(\Delta_T\bigl(\vartheta, X^T\bigr)\,\bar\psi_T\bigr)
= \mathbf{E}_\vartheta\bigl(\Delta_T\bigl(\vartheta, X^T\bigr)\,\bigl[\bar\psi_T - \mathbf{E}_\vartheta\bar\psi_T\bigr]\bigr)
\le \Bigl(\mathbf{E}_\vartheta\bigl[\bar\psi_T - \mathbf{E}_\vartheta\bar\psi_T\bigr]^{2}\Bigr)^{1/2}
   \Bigl(\mathbf{E}_\vartheta \Delta_T\bigl(\vartheta, X^T\bigr)^{2}\Bigr)^{1/2}
\]
and
\[
\mathbf{E}_\vartheta \Delta_T\bigl(\vartheta, X^T\bigr)^{2}
= \mathbf{E}_\vartheta\left(\frac{\dot f(\vartheta, X_0)}{f(\vartheta, X_0)}\right)^{2}
+ T\,\mathbf{E}_\vartheta\left(\frac{\dot S(\vartheta, \xi)}{\sigma(\xi)}\right)^{2}
= \mathrm{I}_T(\vartheta).
\]
Hence
\[
\mathbf{E}_\vartheta\bigl[\bar\psi_T - \mathbf{E}_\vartheta\bar\psi_T\bigr]^{2}
\ge \frac{\bigl[\dot\psi(\vartheta) + \dot b(\vartheta)\bigr]^{2}}{\mathrm{I}_T(\vartheta)},
\]
where $b(\vartheta) = \mathbf{E}_\vartheta\bar\psi_T - \psi(\vartheta)$ is the bias.
Using the equality
\[
\mathbf{E}_\vartheta\bigl[\bar\psi_T - \psi(\vartheta) - b(\vartheta)\bigr]^{2}
= \mathbf{E}_\vartheta\bigl[\bar\psi_T - \psi(\vartheta)\bigr]^{2} - b(\vartheta)^{2},
\]
we finally obtain
\[
\mathbf{E}_\vartheta\bigl[\bar\psi_T - \psi(\vartheta)\bigr]^{2}
\ge \frac{\bigl[\dot\psi(\vartheta) + \dot b(\vartheta)\bigr]^{2}}{\mathrm{I}_T(\vartheta)} + b(\vartheta)^{2},
\]
which is called the Cramér–Rao inequality. If $\psi(\vartheta) = \vartheta$ it becomes
\[
\mathbf{E}_\vartheta\bigl[\bar\vartheta_T - \vartheta\bigr]^{2}
\ge \frac{\bigl[1 + \dot b(\vartheta)\bigr]^{2}}{\mathrm{I}_T(\vartheta)} + b(\vartheta)^{2},
\]
and this last inequality is sometimes used to define an asymptotically efficient estimator $\bar\vartheta_T$ as an estimator satisfying, for any $\vartheta \in \Theta$, the relation
\[
\lim_{T\to\infty} T\,\mathbf{E}_\vartheta\bigl[\bar\vartheta_T - \vartheta\bigr]^{2} = \frac{1}{\mathrm{I}(\vartheta)}
\qquad \text{(wrong!)}.
\]
Due to the well-known Hodges example this definition is not satisfactory. Therefore we use another bound (inequality), called the Hajek–Le Cam bound. For the quadratic loss function this lower bound is: for any estimator $\bar\vartheta_T$ and any $\vartheta_0 \in \Theta$,
\[
\lim_{\delta\to 0} \varliminf_{T\to\infty} \sup_{|\vartheta-\vartheta_0|<\delta} T\,\mathbf{E}_\vartheta\bigl[\bar\vartheta_T - \vartheta\bigr]^{2} \ge \frac{1}{\mathrm{I}(\vartheta_0)}.
\]
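For completeness, a standard version of the Hodges construction (not written out in the transcript): given the MLE $\hat\vartheta_T$, set

```latex
\vartheta^{*}_{T} =
\begin{cases}
\hat\vartheta_{T}, & |\hat\vartheta_{T}| > T^{-1/4},\\
0, & |\hat\vartheta_{T}| \le T^{-1/4}.
\end{cases}
```

For $\vartheta \ne 0$ this estimator coincides with $\hat\vartheta_T$ with probability tending to one, while at $\vartheta = 0$ it satisfies $T\,\mathbf{E}_0(\vartheta^{*}_T)^{2} \to 0 < \mathrm{I}(0)^{-1}$. Such superefficiency at a single point is exactly what the pointwise definition fails to rule out and what the minimax bound penalizes.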
It can be considered as an asymptotic minimax version of the Cramér–Rao inequality.
To prove it we need the van Trees lower bound. Suppose that the unknown parameter $\vartheta \in \Theta = (\alpha, \beta)$ is a random variable with density $p(\vartheta)$, $p(\alpha) = 0 = p(\beta)$, and finite Fisher information
\[
\mathrm{I}_p = \int_\alpha^\beta \frac{\dot p(\theta)^{2}}{p(\theta)}\, d\theta < \infty.
\]
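As a concrete illustration (the lecture does not fix a particular prior), the cosine-squared density on $[-1, 1]$ vanishes at the endpoints as required and has $\mathrm{I}_p = \pi^{2}$; it is known to minimize the Fisher information among densities on $[-1, 1]$. A short midpoint-rule computation confirms the value:

```python
import math

# Fisher information I_p = \int pdot(u)^2 / p(u) du for the prior
# p(u) = cos^2(pi u / 2) on [-1, 1]  (an illustrative choice).
N = 200_000
du = 2.0 / N
Ip = 0.0
for k in range(N):
    u = -1.0 + (k + 0.5) * du                         # midpoint rule avoids the endpoints
    p = math.cos(math.pi * u / 2.0) ** 2
    pdot = -(math.pi / 2.0) * math.sin(math.pi * u)   # d/du cos^2(pi u / 2)
    Ip += pdot ** 2 / p * du

print(Ip)   # close to pi^2
```

The integrand simplifies analytically to $\pi^{2}\sin^{2}(\pi u/2)$, so it stays bounded even where $p$ vanishes, and the quadrature converges to $\pi^{2} \approx 9.8696$.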
Further we suppose that
\[
\frac{\partial}{\partial\vartheta} L\bigl(\vartheta, \vartheta_1; X^T\bigr)
= \Delta\bigl(\vartheta, X^T\bigr)\, L\bigl(\vartheta, \vartheta_1; X^T\bigr).
\]
Then we can write
\[
\begin{aligned}
\int_\alpha^\beta \psi(\vartheta)\,\frac{\partial}{\partial\vartheta}\bigl[L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\bigr]\, d\vartheta
&= \psi(\vartheta)\, L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\Bigr|_\alpha^\beta
 - \int_\alpha^\beta \dot\psi(\vartheta)\, L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\, d\vartheta \\
&= -\int_\alpha^\beta \dot\psi(\vartheta)\, L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\, d\vartheta.
\end{aligned}
\]
In a similar way,
\[
\begin{aligned}
\mathbf{E}_{\vartheta_1} \int_\alpha^\beta \bigl(\bar\psi_T - \psi(\vartheta)\bigr)\,\frac{\partial}{\partial\vartheta}\bigl[L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\bigr]\, d\vartheta
&= \mathbf{E}_{\vartheta_1} \int_\alpha^\beta \dot\psi(\vartheta)\, L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\, d\vartheta \\
&= \int_\alpha^\beta \dot\psi(\vartheta)\, p(\vartheta)\, d\vartheta = \mathbf{E}_P\,\dot\psi(\vartheta).
\end{aligned}
\]
The Cauchy–Schwarz inequality gives us
\[
\begin{aligned}
\bigl(\mathbf{E}_P\,\dot\psi(\vartheta)\bigr)^{2}
&\le \mathbf{E}_{\vartheta_1} \int_\alpha^\beta \bigl(\bar\psi_T - \psi(\vartheta)\bigr)^{2} L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\, d\vartheta \\
&\qquad \times \mathbf{E}_{\vartheta_1} \int_\alpha^\beta \left( \frac{\partial}{\partial\vartheta} \ln\bigl[L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\bigr] \right)^{2} L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\, d\vartheta.
\end{aligned}
\]
For the first integral we have
\[
\mathbf{E}_{\vartheta_1} \int_\alpha^\beta \bigl(\bar\psi_T - \psi(\vartheta)\bigr)^{2} L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\, d\vartheta
= \int_\alpha^\beta \mathbf{E}_\vartheta\bigl(\bar\psi_T - \psi(\vartheta)\bigr)^{2} p(\vartheta)\, d\vartheta
= \mathbf{E}\bigl(\bar\psi_T - \psi(\vartheta)\bigr)^{2},
\]
and for the second integral we obtain $\mathbf{E}_P\,\mathrm{I}_T(\vartheta) + \mathrm{I}_p$.
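The value of the second integral follows by expanding the squared logarithmic derivative; since $\mathbf{E}_\vartheta \Delta\bigl(\vartheta, X^T\bigr) = 0$, the cross term vanishes:

```latex
\mathbf{E}_{\vartheta_1}\!\int_\alpha^\beta
\Bigl(\Delta\bigl(\vartheta, X^T\bigr) + \frac{\dot p(\vartheta)}{p(\vartheta)}\Bigr)^{2}
L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\, d\vartheta
= \int_\alpha^\beta \mathbf{E}_\vartheta \Delta\bigl(\vartheta, X^T\bigr)^{2}\, p(\vartheta)\, d\vartheta
+ \int_\alpha^\beta \frac{\dot p(\vartheta)^{2}}{p(\vartheta)}\, d\vartheta
= \mathbf{E}_P\, \mathrm{I}_T(\vartheta) + \mathrm{I}_p .
```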
Therefore
\[
\mathbf{E}\bigl(\bar\psi_T - \psi(\vartheta)\bigr)^{2}
\ge \frac{\bigl(\mathbf{E}_P\,\dot\psi(\vartheta)\bigr)^{2}}{\mathbf{E}_P\,\mathrm{I}_T(\vartheta) + \mathrm{I}_p}.
\]
This lower bound is due to van Trees (1968) and is called the van Trees inequality, the global Cramér–Rao bound, the integral-type Cramér–Rao inequality, or the Bayesian Cramér–Rao bound. If we need to estimate $\vartheta$ only, then it becomes
\[
\mathbf{E}\bigl(\bar\vartheta_T - \vartheta\bigr)^{2}
\ge \frac{1}{\mathbf{E}_P\,\mathrm{I}_T(\vartheta) + \mathrm{I}_p}.
\]
The main advantage of this inequality is that the right-hand side does not depend on the properties of the estimators (say, the bias) and so is the same for all estimators. It is widely used in asymptotic nonparametric statistics. In particular, it yields the Hajek–Le Cam inequality in the following elementary way.
Let us introduce a random variable $\eta$ with density function $p(v)$, $v \in [-1, 1]$, such that $p(-1) = p(1) = 0$ and the Fisher information $\mathrm{I}_p < \infty$. Fix some $\delta > 0$, put $\vartheta = \theta_0 + \delta\eta$, and write $\mathbf{E}$ for the expectation with respect to the joint distribution of $X^T$ and $\eta$. Then we have
\[
\begin{aligned}
\varliminf_{T\to\infty} \sup_{|\theta-\theta_0|<\delta} T\,\mathbf{E}_\theta\bigl(\bar\vartheta_T - \theta\bigr)^{2}
&\ge \varliminf_{T\to\infty} T\,\mathbf{E}\bigl(\bar\vartheta_T - \vartheta\bigr)^{2} \\
&\ge \lim_{T\to\infty} \frac{T}{\mathbf{E}_P\,\mathrm{I}_T(\vartheta) + \delta^{-2}\mathrm{I}_p}
= \frac{1}{\int_{-1}^{1} \mathrm{I}(\theta_0 + \delta u)\, p(u)\, du}.
\end{aligned}
\]
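The factor $\delta^{-2}\mathrm{I}_p$ appears because the prior here is the density of $\vartheta = \theta_0 + \delta\eta$, namely $p_\delta(x) = \delta^{-1} p\bigl((x - \theta_0)/\delta\bigr)$, and its Fisher information scales as follows (by the substitution $x = \theta_0 + \delta u$):

```latex
\mathrm{I}_{p_\delta}
= \int_{\theta_0-\delta}^{\theta_0+\delta} \frac{\dot p_\delta(x)^{2}}{p_\delta(x)}\, dx
= \int_{-1}^{1} \frac{\bigl(\delta^{-2}\,\dot p(u)\bigr)^{2}}{\delta^{-1}\, p(u)}\, \delta\, du
= \delta^{-2} \int_{-1}^{1} \frac{\dot p(u)^{2}}{p(u)}\, du
= \delta^{-2}\, \mathrm{I}_p .
```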
Hence, from the continuity of the function $\mathrm{I}(\cdot)$, as $\delta \to 0$ we obtain
\[
\lim_{\delta\to 0} \varliminf_{T\to\infty} \sup_{|\theta-\theta_0|<\delta} T\,\mathbf{E}_\theta\bigl(\bar\vartheta_T - \theta\bigr)^{2} \ge \frac{1}{\mathrm{I}(\theta_0)}.
\]