Transcript of Lecture 3
One-Step MLE. Many estimators are consistent and asymptotically normal but not asymptotically efficient. Some of them can be improved up to asymptotic efficiency. We observe the diffusion process
\[
dX_t = S(\vartheta, X_t)\,dt + \sigma(X_t)\,dW_t, \qquad X_0,\ 0 \le t \le T.
\]
In the regular case the MLE $\hat\vartheta_T$ is asymptotically normal,
\[
\mathcal{L}_\vartheta\bigl\{\sqrt{T}\,(\hat\vartheta_T - \vartheta)\bigr\} \Longrightarrow \mathcal{N}\bigl(0, \mathrm{I}(\vartheta)^{-1}\bigr), \qquad
\mathrm{I}(\vartheta) = \mathbf{E}_\vartheta\left(\frac{\dot S(\vartheta, \xi)}{\sigma(\xi)}\right)^{2},
\]
and asymptotically efficient:
\[
\lim_{\delta\to 0} \lim_{T\to\infty} \sup_{|\vartheta-\vartheta_0|<\delta} T\,\mathbf{E}_\vartheta\bigl(\hat\vartheta_T - \vartheta\bigr)^{2} = \mathrm{I}(\vartheta_0)^{-1}.
\]
The family of measures is LAN:
\[
L\Bigl(\vartheta + \frac{u}{\sqrt{T}}, \vartheta; X^T\Bigr)
= \exp\Bigl\{ u\,\Delta_T\bigl(\vartheta, X^T\bigr) - \frac{u^{2}}{2}\,\mathrm{I}(\vartheta) + r_T\bigl(\vartheta, u, X^T\bigr) \Bigr\}.
\]
Here $r_T \to 0$ and
\[
\Delta_T\bigl(\vartheta, X^T\bigr)
= \frac{1}{\sqrt{T}} \int_0^T \frac{\dot S(\vartheta, X_t)}{\sigma(X_t)^{2}}\,\bigl[dX_t - S(\vartheta, X_t)\,dt\bigr]
\Longrightarrow \mathcal{N}\bigl(0, \mathrm{I}(\vartheta)\bigr).
\]
Then, given a consistent and asymptotically normal estimator $\bar\vartheta_T$, we construct the estimator
\[
\vartheta^\circ_T = \bar\vartheta_T + \frac{\Delta_T\bigl(\bar\vartheta_T, X^T\bigr)}{\sqrt{T}\,\mathrm{I}(\bar\vartheta_T)}
\]
and show that this estimator is asymptotically efficient:
\[
\begin{aligned}
\sqrt{T}\,\bigl(\vartheta^\circ_T - \vartheta\bigr)
&= \sqrt{T}\,\bigl(\bar\vartheta_T - \vartheta\bigr)
 + \frac{\Delta_T(\vartheta)}{\mathrm{I}(\vartheta)}\,(1 + o(1)) \\
&\qquad + \frac{1}{\mathrm{I}(\vartheta)\,\sqrt{T}} \int_0^T \frac{\dot S(\bar\vartheta_T, X_t)}{\sigma(X_t)^{2}}\,
   \bigl[S(\vartheta, X_t) - S(\bar\vartheta_T, X_t)\bigr]\, dt\;(1 + o(1)) \\
&= \eta_T + \frac{\Delta_T(\vartheta)}{\mathrm{I}(\vartheta)}\,(1 + o(1))
 - \frac{\eta_T}{\mathrm{I}(\vartheta)}\,\frac{1}{T} \int_0^T \left(\frac{\dot S(\vartheta, X_t)}{\sigma(X_t)}\right)^{2} dt\;(1 + o(1)) \\
&= \frac{\Delta_T(\vartheta)}{\mathrm{I}(\vartheta)}\,(1 + o(1)) + o(1)
 \Longrightarrow \mathcal{N}\bigl(0, \mathrm{I}(\vartheta)^{-1}\bigr),
\end{aligned}
\]
where $\eta_T = \sqrt{T}\,(\bar\vartheta_T - \vartheta)$.
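The cancellation of the $\eta_T$ terms above rests on two standard facts, sketched here (the expansion is implicit in the transcript): a Taylor expansion of the drift in $\vartheta$ and the law of large numbers for the ergodic diffusion,

```latex
S(\vartheta, X_t) - S(\bar\vartheta_T, X_t)
  = -\,(\bar\vartheta_T - \vartheta)\,\dot S(\vartheta, X_t)\,(1 + o(1)),
\qquad
\frac{1}{T}\int_0^T \left(\frac{\dot S(\vartheta, X_t)}{\sigma(X_t)}\right)^{2} dt
  \;\longrightarrow\; \mathrm{I}(\vartheta),
```

so the integral term contributes $-\sqrt{T}\,(\bar\vartheta_T - \vartheta)\,(1 + o(1)) = -\eta_T\,(1 + o(1))$, which cancels the $\eta_T$ coming from the preliminary estimator.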
It is easy to verify by the Itô formula that $\delta_T\bigl(\theta, X^T\bigr) = \Delta_T\bigl(\theta, X^T\bigr)$, where (the prime denoting the derivative with respect to the space variable $x$)
\[
\begin{aligned}
\delta_T\bigl(\theta, X^T\bigr)
&= \frac{1}{\sqrt{T}} \int_{X_0}^{X_T} \frac{\dot S(\theta, y)}{\sigma(y)^{2}}\, dy
 - \frac{1}{2\sqrt{T}} \int_0^T \dot S'(\theta, X_t)\, dt \\
&\qquad + \frac{1}{\sqrt{T}} \int_0^T \dot S(\theta, X_t) \left( \frac{\sigma'(X_t)}{\sigma(X_t)} - \frac{S(\theta, X_t)}{\sigma(X_t)^{2}} \right) dt,
\end{aligned}
\]
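The verification is one application of the Itô formula to $F(\theta, x) = \int_{x_0}^{x} \dot S(\theta, y)\,\sigma(y)^{-2}\, dy$ (a sketch; the lower limit $x_0$ is arbitrary):

```latex
F(\theta, X_T) - F(\theta, X_0)
  = \int_0^T \frac{\dot S(\theta, X_t)}{\sigma(X_t)^{2}}\, dX_t
  + \frac{1}{2}\int_0^T \left[ \dot S'(\theta, X_t)
      - \frac{2\,\dot S(\theta, X_t)\,\sigma'(X_t)}{\sigma(X_t)} \right] dt .
```

Solving this identity for the stochastic integral and substituting into the definition of $\Delta_T$ gives exactly $\delta_T$, which involves no stochastic integral and is therefore convenient for computation.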
and define the one-step maximum likelihood estimator by the same formula,
\[
\vartheta^\circ_T = \bar\vartheta_T + \frac{\delta_T\bigl(\bar\vartheta_T, X^T\bigr)}{\sqrt{T}\,\mathrm{I}(\bar\vartheta_T)}.
\]
We prove
\[
\mathcal{L}_\vartheta\bigl\{\sqrt{T}\,(\vartheta^\circ_T - \vartheta)\bigr\} \Longrightarrow \mathcal{N}\bigl(0, \mathrm{I}(\vartheta)^{-1}\bigr).
\]
Example. Let
\[
dX_t = -(X_t - \vartheta)^{3}\, dt + \sigma\, dW_t, \qquad X_0,\ 0 \le t \le T.
\]
The MLE cannot be written in explicit form, but the estimator of the method of moments (EMM)
\[
\bar\vartheta_T = \frac{1}{T} \int_0^T X_t\, dt
\]
is uniformly consistent and asymptotically normal. The one-step MLE is
\[
\vartheta^\circ_T = \bar\vartheta_T - \frac{1}{\sigma^{2}\,\mathrm{I}\,T} \int_0^T \bigl(\bar\vartheta_T - X_t\bigr)^{5}\, dt.
\]
This estimator is consistent and asymptotically normal:
\[
\mathcal{L}_\vartheta\bigl\{\sqrt{T}\,(\vartheta^\circ_T - \vartheta)\bigr\} \Longrightarrow \mathcal{N}\bigl(0, \mathrm{I}^{-1}\bigr).
\]
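The example is easy to check numerically. Below is a minimal sketch (not from the lecture): an Euler–Maruyama simulation of the cubic-drift diffusion, the empirical-mean preliminary estimator, and the generic one-step correction $\bar\vartheta_T + \Delta_T(\bar\vartheta_T)/(\sqrt{T}\,\hat{\mathrm{I}})$ built from the discretized score. The step size, horizon, initial value, and the empirical Fisher information $\hat{\mathrm{I}}$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma = 1.0, 0.5      # true parameter and diffusion coefficient
T, dt = 200.0, 0.01          # observation horizon and Euler step (illustrative)
n = int(T / dt)

# Euler-Maruyama path of dX_t = -(X_t - theta)^3 dt + sigma dW_t
X = np.empty(n + 1)
X[0] = 0.0
dW = rng.normal(0.0, np.sqrt(dt), size=n)
for k in range(n):
    X[k + 1] = X[k] - (X[k] - theta) ** 3 * dt + sigma * dW[k]

# Preliminary estimator (EMM): the time average of the path
theta_bar = dt * np.sum(X[:-1]) / T

# One-step correction from the discretized score Delta_T(theta_bar):
# S(v, x) = -(x - v)^3, so dS/dv = 3 (x - v)^2
S_dot = 3.0 * (X[:-1] - theta_bar) ** 2
S_val = -(X[:-1] - theta_bar) ** 3
I_hat = dt * np.sum((S_dot / sigma) ** 2) / T   # empirical Fisher information
Delta = np.sum(S_dot / sigma ** 2 * (np.diff(X) - S_val * dt)) / np.sqrt(T)
theta_one = theta_bar + Delta / (np.sqrt(T) * I_hat)

print(theta_bar, theta_one)
```

Both estimates should land near the true value $\vartheta = 1$; the one-step correction implements the general formula of the lecture rather than the example-specific expression above.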
Lower Bounds.
The first one is the Cramér–Rao bound. Suppose that the observed diffusion process is
\[
dX_t = S(\vartheta, X_t)\, dt + \sigma(X_t)\, dW_t, \qquad X_0,\ 0 \le t \le T,
\]
and we have to estimate some continuously differentiable function $\psi(\vartheta)$, $\vartheta \in \Theta \subset \mathcal{R}$.
Then for any estimator $\bar\psi_T$ of $\psi(\vartheta)$ we have
\[
\frac{\partial}{\partial\vartheta}\,\mathbf{E}_\vartheta \bar\psi_T
= \mathbf{E}_\vartheta\bigl(\Delta_T\bigl(\vartheta, X^T\bigr)\,\bar\psi_T\bigr),
\]
where
\[
\Delta_T\bigl(\vartheta, X^T\bigr)
= \frac{\dot f(\vartheta, X_0)}{f(\vartheta, X_0)}
+ \int_0^T \frac{\dot S(\vartheta, X_t)}{\sigma(X_t)^{2}}\,\bigl[dX_t - S(\vartheta, X_t)\, dt\bigr]
\]
and $f(\vartheta, \cdot)$ denotes the (invariant) density of the initial value $X_0$. Note that here $\Delta_T$ is not normalized by $\sqrt{T}$.
Then we can write
\[
\mathbf{E}_\vartheta\bigl(\Delta_T\bigl(\vartheta, X^T\bigr)\,\bar\psi_T\bigr)
= \mathbf{E}_\vartheta\bigl(\Delta_T\bigl(\vartheta, X^T\bigr)\,\bigl[\bar\psi_T - \mathbf{E}_\vartheta\bar\psi_T\bigr]\bigr)
\le \Bigl(\mathbf{E}_\vartheta\bigl[\bar\psi_T - \mathbf{E}_\vartheta\bar\psi_T\bigr]^{2}\Bigr)^{1/2}
   \Bigl(\mathbf{E}_\vartheta \Delta_T\bigl(\vartheta, X^T\bigr)^{2}\Bigr)^{1/2}
\]
and
\[
\mathbf{E}_\vartheta \Delta_T\bigl(\vartheta, X^T\bigr)^{2}
= \mathbf{E}_\vartheta\left(\frac{\dot f(\vartheta, X_0)}{f(\vartheta, X_0)}\right)^{2}
+ T\,\mathbf{E}_\vartheta\left(\frac{\dot S(\vartheta, \xi)}{\sigma(\xi)}\right)^{2}
= \mathrm{I}_T(\vartheta).
\]
Hence
\[
\mathbf{E}_\vartheta\bigl[\bar\psi_T - \mathbf{E}_\vartheta\bar\psi_T\bigr]^{2}
\ge \frac{\bigl[\dot\psi(\vartheta) + \dot b(\vartheta)\bigr]^{2}}{\mathrm{I}_T(\vartheta)},
\]
where $b(\vartheta) = \mathbf{E}_\vartheta\bar\psi_T - \psi(\vartheta)$ is the bias.
Using the equality
\[
\mathbf{E}_\vartheta\bigl[\bar\psi_T - \psi(\vartheta) - b(\vartheta)\bigr]^{2}
= \mathbf{E}_\vartheta\bigl[\bar\psi_T - \psi(\vartheta)\bigr]^{2} - b(\vartheta)^{2},
\]
we finally obtain
\[
\mathbf{E}_\vartheta\bigl[\bar\psi_T - \psi(\vartheta)\bigr]^{2}
\ge \frac{\bigl[\dot\psi(\vartheta) + \dot b(\vartheta)\bigr]^{2}}{\mathrm{I}_T(\vartheta)} + b(\vartheta)^{2},
\]
which is called the Cramér–Rao inequality. If $\psi(\vartheta) = \vartheta$ it becomes
\[
\mathbf{E}_\vartheta\bigl[\bar\vartheta_T - \vartheta\bigr]^{2}
\ge \frac{\bigl[1 + \dot b(\vartheta)\bigr]^{2}}{\mathrm{I}_T(\vartheta)} + b(\vartheta)^{2},
\]
and this last inequality is sometimes used to define an asymptotically efficient estimator $\bar\vartheta_T$ as an estimator satisfying, for any $\vartheta \in \Theta$, the relation
\[
\lim_{T\to\infty} T\,\mathbf{E}_\vartheta\bigl[\bar\vartheta_T - \vartheta\bigr]^{2} = \frac{1}{\mathrm{I}(\vartheta)}
\qquad \text{(wrong!)}.
\]
Due to the well-known Hodges example this definition is not satisfactory. Therefore we use another bound (inequality), called the Hajek–Le Cam bound. For the quadratic loss function this lower bound is: for any estimator $\bar\vartheta_T$ and any $\vartheta_0 \in \Theta$,
\[
\lim_{\delta\to 0} \varliminf_{T\to\infty} \sup_{|\vartheta-\vartheta_0|<\delta} T\,\mathbf{E}_\vartheta\bigl[\bar\vartheta_T - \vartheta\bigr]^{2} \ge \frac{1}{\mathrm{I}(\vartheta_0)}.
\]
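For completeness, a standard version of the Hodges construction (not written out in the transcript): given the MLE $\hat\vartheta_T$, set

```latex
\vartheta^{*}_{T} =
\begin{cases}
\hat\vartheta_{T}, & |\hat\vartheta_{T}| > T^{-1/4},\\
0, & |\hat\vartheta_{T}| \le T^{-1/4}.
\end{cases}
```

For $\vartheta \ne 0$ this estimator coincides with $\hat\vartheta_T$ with probability tending to one, while at $\vartheta = 0$ it satisfies $T\,\mathbf{E}_0(\vartheta^{*}_T)^{2} \to 0 < \mathrm{I}(0)^{-1}$. Such superefficiency at a single point is exactly what the pointwise definition fails to rule out and what the minimax bound penalizes.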
It can be considered as an asymptotic minimax version of the Cramér–Rao inequality.
To prove it we need the van Trees lower bound. Suppose that the unknown parameter $\vartheta \in \Theta = (\alpha, \beta)$ is a random variable with density $p(\vartheta)$, $p(\alpha) = 0 = p(\beta)$, and finite Fisher information
\[
\mathrm{I}_p = \int_\alpha^\beta \frac{\dot p(\theta)^{2}}{p(\theta)}\, d\theta < \infty.
\]
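As a concrete illustration (the lecture does not fix a particular prior), the cosine-squared density on $[-1, 1]$ vanishes at the endpoints as required and has $\mathrm{I}_p = \pi^{2}$; it is known to minimize the Fisher information among densities on $[-1, 1]$. A short midpoint-rule computation confirms the value:

```python
import math

# Fisher information I_p = \int pdot(u)^2 / p(u) du for the prior
# p(u) = cos^2(pi u / 2) on [-1, 1]  (an illustrative choice).
N = 200_000
du = 2.0 / N
Ip = 0.0
for k in range(N):
    u = -1.0 + (k + 0.5) * du                         # midpoint rule avoids the endpoints
    p = math.cos(math.pi * u / 2.0) ** 2
    pdot = -(math.pi / 2.0) * math.sin(math.pi * u)   # d/du cos^2(pi u / 2)
    Ip += pdot ** 2 / p * du

print(Ip)   # close to pi^2
```

The integrand simplifies analytically to $\pi^{2}\sin^{2}(\pi u/2)$, so it stays bounded even where $p$ vanishes, and the quadrature converges to $\pi^{2} \approx 9.8696$.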
Further we suppose that
\[
\frac{\partial}{\partial\vartheta} L\bigl(\vartheta, \vartheta_1; X^T\bigr)
= \Delta\bigl(\vartheta, X^T\bigr)\, L\bigl(\vartheta, \vartheta_1; X^T\bigr).
\]
Then we can write
\[
\begin{aligned}
\int_\alpha^\beta \psi(\vartheta)\,\frac{\partial}{\partial\vartheta}\bigl[L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\bigr]\, d\vartheta
&= \psi(\vartheta)\, L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\Bigr|_\alpha^\beta
 - \int_\alpha^\beta \dot\psi(\vartheta)\, L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\, d\vartheta \\
&= -\int_\alpha^\beta \dot\psi(\vartheta)\, L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\, d\vartheta.
\end{aligned}
\]
In a similar way,
\[
\begin{aligned}
\mathbf{E}_{\vartheta_1} \int_\alpha^\beta \bigl(\bar\psi_T - \psi(\vartheta)\bigr)\,\frac{\partial}{\partial\vartheta}\bigl[L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\bigr]\, d\vartheta
&= \mathbf{E}_{\vartheta_1} \int_\alpha^\beta \dot\psi(\vartheta)\, L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\, d\vartheta \\
&= \int_\alpha^\beta \dot\psi(\vartheta)\, p(\vartheta)\, d\vartheta = \mathbf{E}_P\,\dot\psi(\vartheta).
\end{aligned}
\]
The Cauchy–Schwarz inequality gives us
\[
\begin{aligned}
\bigl(\mathbf{E}_P\,\dot\psi(\vartheta)\bigr)^{2}
&\le \mathbf{E}_{\vartheta_1} \int_\alpha^\beta \bigl(\bar\psi_T - \psi(\vartheta)\bigr)^{2} L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\, d\vartheta \\
&\qquad \times \mathbf{E}_{\vartheta_1} \int_\alpha^\beta \left( \frac{\partial}{\partial\vartheta} \ln\bigl[L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\bigr] \right)^{2} L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\, d\vartheta.
\end{aligned}
\]
For the first integral we have
\[
\mathbf{E}_{\vartheta_1} \int_\alpha^\beta \bigl(\bar\psi_T - \psi(\vartheta)\bigr)^{2} L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\, d\vartheta
= \int_\alpha^\beta \mathbf{E}_\vartheta\bigl(\bar\psi_T - \psi(\vartheta)\bigr)^{2} p(\vartheta)\, d\vartheta
= \mathbf{E}\bigl(\bar\psi_T - \psi(\vartheta)\bigr)^{2},
\]
and for the second integral we obtain $\mathbf{E}_P\,\mathrm{I}_T(\vartheta) + \mathrm{I}_p$.
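The value of the second integral follows by expanding the squared logarithmic derivative; since $\mathbf{E}_\vartheta \Delta\bigl(\vartheta, X^T\bigr) = 0$, the cross term vanishes:

```latex
\mathbf{E}_{\vartheta_1}\!\int_\alpha^\beta
\Bigl(\Delta\bigl(\vartheta, X^T\bigr) + \frac{\dot p(\vartheta)}{p(\vartheta)}\Bigr)^{2}
L\bigl(\vartheta, \vartheta_1; X^T\bigr)\, p(\vartheta)\, d\vartheta
= \int_\alpha^\beta \mathbf{E}_\vartheta \Delta\bigl(\vartheta, X^T\bigr)^{2}\, p(\vartheta)\, d\vartheta
+ \int_\alpha^\beta \frac{\dot p(\vartheta)^{2}}{p(\vartheta)}\, d\vartheta
= \mathbf{E}_P\, \mathrm{I}_T(\vartheta) + \mathrm{I}_p .
```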
Therefore
\[
\mathbf{E}\bigl(\bar\psi_T - \psi(\vartheta)\bigr)^{2}
\ge \frac{\bigl(\mathbf{E}_P\,\dot\psi(\vartheta)\bigr)^{2}}{\mathbf{E}_P\,\mathrm{I}_T(\vartheta) + \mathrm{I}_p}.
\]
This lower bound is due to van Trees (1968) and is called the van Trees inequality, the global Cramér–Rao bound, the integral-type Cramér–Rao inequality, or the Bayesian Cramér–Rao bound. If we need to estimate $\vartheta$ only, then it becomes
\[
\mathbf{E}\bigl(\bar\vartheta_T - \vartheta\bigr)^{2}
\ge \frac{1}{\mathbf{E}_P\,\mathrm{I}_T(\vartheta) + \mathrm{I}_p}.
\]
The main advantage of this inequality is that the right-hand side does not depend on the properties of the estimators (say, the bias) and so is the same for all estimators. It is widely used in asymptotic nonparametric statistics. In particular, it yields the Hajek–Le Cam inequality in the following elementary way.
Let us introduce a random variable $\eta$ with density function $p(v)$, $v \in [-1, 1]$, such that $p(-1) = p(1) = 0$ and the Fisher information $\mathrm{I}_p < \infty$. Fix some $\delta > 0$, put $\vartheta = \theta_0 + \delta\eta$, and write $\mathbf{E}$ for the expectation with respect to the joint distribution of $X^T$ and $\eta$. Then we have
\[
\begin{aligned}
\varliminf_{T\to\infty} \sup_{|\theta-\theta_0|<\delta} T\,\mathbf{E}_\theta\bigl(\bar\vartheta_T - \theta\bigr)^{2}
&\ge \varliminf_{T\to\infty} T\,\mathbf{E}\bigl(\bar\vartheta_T - \vartheta\bigr)^{2} \\
&\ge \lim_{T\to\infty} \frac{T}{\mathbf{E}_P\,\mathrm{I}_T(\vartheta) + \delta^{-2}\mathrm{I}_p}
= \frac{1}{\int_{-1}^{1} \mathrm{I}(\theta_0 + \delta u)\, p(u)\, du}.
\end{aligned}
\]
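The factor $\delta^{-2}\mathrm{I}_p$ appears because the prior here is the density of $\vartheta = \theta_0 + \delta\eta$, namely $p_\delta(x) = \delta^{-1} p\bigl((x - \theta_0)/\delta\bigr)$, and its Fisher information scales as follows (by the substitution $x = \theta_0 + \delta u$):

```latex
\mathrm{I}_{p_\delta}
= \int_{\theta_0-\delta}^{\theta_0+\delta} \frac{\dot p_\delta(x)^{2}}{p_\delta(x)}\, dx
= \int_{-1}^{1} \frac{\bigl(\delta^{-2}\,\dot p(u)\bigr)^{2}}{\delta^{-1}\, p(u)}\, \delta\, du
= \delta^{-2} \int_{-1}^{1} \frac{\dot p(u)^{2}}{p(u)}\, du
= \delta^{-2}\, \mathrm{I}_p .
```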
Hence, from the continuity of the function $\mathrm{I}(\cdot)$, as $\delta \to 0$ we obtain
\[
\lim_{\delta\to 0} \varliminf_{T\to\infty} \sup_{|\theta-\theta_0|<\delta} T\,\mathbf{E}_\theta\bigl(\bar\vartheta_T - \theta\bigr)^{2} \ge \frac{1}{\mathrm{I}(\theta_0)}.
\]