Bayesian adaptive optimal estimation using a sieve prior
Bayesian optimal adaptive estimation using a sieve prior
YES IV Workshop
Julyan Arbel, [email protected]
ENSAE-CREST-Université Paris Dauphine
November 9, 2010
Outline
1 Motivations
2 Assumptions
3 Results
4 White noise model
5 Conclusion
Introduction
• Posterior concentration rates and risk convergence rates in a Bayesian nonparametric setting.
• Results in the same spirit as those of Ghosal, Ghosh and van der Vaart (2000) and Ghosal and van der Vaart (2007), in the specific case of models suited to the use of sieve priors.
• Use of a family of sieve priors (introduced by Zhao (2000) in the white noise model).
• Infinite-dimensional parameter from a Sobolev smoothness class.
Notations
• Let $(\mathcal{X}^{(n)}, \mathcal{A}^{(n)}, P_\theta^{(n)} : \theta \in \Theta)$ be a model with observations $X^{(n)} = (X_i^n)_{1 \le i \le n}$, and
$$\Theta = \bigcup_{k=1}^{\infty} \mathbb{R}^k.$$
• Denote by $\theta_0$ the parameter associated with the true model. Densities are denoted $p_\theta^{(n)}$ ($p_0^{(n)}$ for $\theta_0$). The first $k$ coordinates of $\theta_0$ are denoted $\theta_{0k}$.
• A sieve prior $\pi$ on $\Theta$ is defined as follows:
$$\pi(\theta) = \sum_k \pi_k\, \pi_k(\theta), \qquad \sum_k \pi_k = 1,$$
and, under $\pi_k$,
$$\frac{\theta_i}{\tau_i} \sim g, \quad \text{where } \tau_i > 0.$$
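The two-stage structure of the sieve prior (draw a dimension $k$, then draw $\theta \in \mathbb{R}^k$ coordinate-wise from rescaled $g$) can be sketched numerically. This is an illustrative sample, not from the slides, assuming weights $\pi_k \propto e^{-k \log k}$, unit scales $\tau_i = 1$, and $g$ the standard Laplace density, which satisfies the two-sided tail bound of Assumption 1 with $d = 1$:

```python
import numpy as np

def sample_sieve_prior(kmax=50, seed=0):
    """Draw (k, theta) from a sieve prior truncated at kmax dimensions."""
    rng = np.random.default_rng(seed)
    k_grid = np.arange(1, kmax + 1)
    # Weights pi_k proportional to exp(-k log k): strong penalty on large k.
    log_w = -k_grid * np.log(k_grid)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    k = rng.choice(k_grid, p=w)
    # Given k, theta_i / tau_i ~ g i.i.d.; here tau_i = 1 and g is Laplace.
    theta = rng.laplace(size=k)
    return k, theta

k, theta = sample_sieve_prior()
print(k, theta)
```

The truncation at `kmax` is purely practical: the $e^{-k \log k}$ weights decay so fast that the omitted mass is negligible.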
We define four divergences:
$$K(f, g) = \int f \log(f/g)\, d\mu,$$
$$V_{p,0}(f, g) = \int f \left| \log(f/g) - K(f, g) \right|^p d\mu,$$
$$\widetilde{K}(f, g) = \int p_0^{(n)} \left| \log(f/g) \right| d\mu,$$
$$\widetilde{V}_{p,0}(f, g) = \int p_0^{(n)} \left| \log(f/g) - K(f, g) \right|^p d\mu.$$
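These definitions can be checked on a concrete pair of densities (an illustrative computation, not part of the slides). For two unit-variance Gaussians $f = N(0,1)$ and $g = N(\mu,1)$, $\log(f/g) = \mu^2/2 - \mu x$ is linear, so the closed forms $K = \mu^2/2$ and $V_{2,0} = \mu^2$ are available to compare against grid integration:

```python
import numpy as np

mu = 0.7
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)          # N(0, 1) density
g = np.exp(-(x - mu)**2 / 2) / np.sqrt(2 * np.pi)   # N(mu, 1) density
log_ratio = np.log(f / g)

K = np.sum(f * log_ratio) * dx                       # int f log(f/g) dmu
V2 = np.sum(f * (log_ratio - K)**2) * dx             # int f |log(f/g) - K|^2 dmu

print(K, mu**2 / 2)    # both approximately 0.245
print(V2, mu**2)       # both approximately 0.49
```

The grid $[-10, 10]$ truncates only a negligible tail mass, so the quadrature matches the closed forms to many digits.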
Define a Kullback–Leibler neighborhood
$$B_n = \left\{ \theta : K\left(p_0^{(n)}, p_\theta^{(n)}\right) \le n \varepsilon_n^2,\; V_{p,0}\left(p_0^{(n)}, p_\theta^{(n)}\right) \le \left(n \varepsilon_n^2\right)^{p/2} \right\}.$$
We use a semimetric $d_n$ on $\Theta$, and define $\Theta_n = \left\{\theta \in \mathbb{R}^{k_n}, \|\theta\| \le \rho_n\right\}$ with $k_n = k_0 n \varepsilon_n^2 / \log n$ and $\rho_n$ some power of $n$.
The posterior distribution is defined by
$$\pi\left(B \mid X^{(n)}\right) = \frac{\int_B p_\theta^{(n)}\left(X^{(n)}\right) d\pi(\theta)}{\int_\Theta p_\theta^{(n)}\left(X^{(n)}\right) d\pi(\theta)}.$$
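In a conjugate special case the posterior over the sieve dimension is available up to normalization. The following is a hypothetical sketch, not the authors' computation: assume the normal mean model $X_i = \theta_{0i} + \xi_i/\sqrt{n}$, Gaussian $g$ (the $d = 2$ case of Assumption 1), unit scales $\tau_i = 1$, and weights $\pi_k \propto e^{-k \log k}$. Under the component $\pi_k$, the marginal of $X_i$ is $N(0, \tau^2 + 1/n)$ for $i \le k$ and $N(0, 1/n)$ for $i > k$, so $\pi(k \mid X)$ follows from per-coordinate marginal likelihoods:

```python
import numpy as np

rng = np.random.default_rng(1)
n, kmax = 200, 30
sigma2, tau2 = 1.0 / n, 1.0

# Synthetic truth with polynomial decay, observed in Gaussian noise.
beta = 1.5
theta0 = np.arange(1, kmax + 1, dtype=float) ** -(beta + 1.0)
X = theta0 + rng.normal(scale=np.sqrt(sigma2), size=kmax)

def log_norm_pdf(x, var):
    return -0.5 * (np.log(2 * np.pi * var) + x ** 2 / var)

ks = np.arange(1, kmax + 1)
log_prior = -ks * np.log(ks)                       # pi_k ~ e^{-k log k}
s_sig = np.cumsum(log_norm_pdf(X, tau2 + sigma2))  # coords i <= k: signal + noise
s_noise = log_norm_pdf(X, sigma2)
suffix = s_noise.sum() - np.cumsum(s_noise)        # coords i > k: pure noise
log_post = log_prior + s_sig + suffix
post = np.exp(log_post - log_post.max())
post /= post.sum()                                 # pi(k | X) on k = 1..kmax
k_map = int(ks[np.argmax(post)])
print(k_map, post.round(3))
```

The posterior weight concentrates on small $k$ because the prior penalty $e^{-k \log k}$ competes with the marginal-likelihood gain of explaining large coefficients as signal.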
Assumptions
Assumption 1 (on the prior). Assume there exist $a, b, c, d > 0$ such that $\pi_k$ and $g$ satisfy
$$e^{-a k \log k} \le \pi_k \le e^{-b k \log k},$$
$$A e^{-A_1 |t|^d} \le g(t) \le B e^{-B_1 |t|^d},$$
$$\exists\, T, \tau_0 > 0 \text{ s.t. } \min_{i \le k_n} \tau_i \ge n^{-T} \text{ and } \max_{i > 0} \tau_i \le \tau_0 < \infty,$$
$$\sum_{i=1}^{k_n} |\theta_{0i}|^d / \tau_i^d \le C k_n \log n.$$
Assumption 2 (on the rate of convergence). The rate of convergence $\varepsilon_n$ is bounded below by the two inequalities
$$K\left(p_0^{(n)}, p_{0k_n}^{(n)}\right) \le n \varepsilon_n^2, \quad \text{and} \quad V_{p,0}\left(p_0^{(n)}, p_{0k_n}^{(n)}\right) \le \left(n \varepsilon_n^2\right)^{p/2}.$$
Assumption 3 (on divergences). $\widetilde{K}$ and $\widetilde{V}_{p,0}$ satisfy
$$\widetilde{K}\left(p_{0k_n}^{(n)}, p_\theta^{(n)}\right) \le C n^2 \left\| \theta_{0k_n} - \theta \right\|^2, \quad \widetilde{V}_{p,0}\left(p_{0k_n}^{(n)}, p_\theta^{(n)}\right) \le C n^{p/2} \left\| \theta_{0k_n} - \theta \right\|^p.$$
Assumption 4 (on the semimetric $d_n$). There exist $G_0, G > 0$ such that, for any two $\theta, \theta'$,
$$d_n(\theta, \theta') \le C k_n^{G_0} \left\| \theta - \theta' \right\|^G.$$
Assumption 5 (test condition). There exist constants $c_1, \zeta > 0$ such that for every $\varepsilon > 0$ and for each $\theta_1$ such that $d_n(\theta_1, \theta_0) > \varepsilon$, one can construct a test statistic $\phi_n \in [0, 1]$ which satisfies
$$E_0^{(n)} \phi_n \le e^{-c_1 n \varepsilon^2}, \qquad \sup_{d_n(\theta, \theta_1) < \zeta \varepsilon} E_\theta^{(n)} (1 - \phi_n) \le e^{-c_1 n \varepsilon^2}.$$
Results
Theorem (posterior concentration rate). The rate of convergence of the posterior distribution relative to $d_n$ is $\varepsilon_n$:
$$E_0^{(n)} \pi\left(d_n^2(\theta, \theta_0) \ge M \varepsilon_n^2 \mid X^{(n)}\right) \to 0.$$
Corollary (risk convergence rate). If the assumptions are satisfied with $p > 2$, and if $d_n$ is bounded, then the integrated posterior risk given $\theta_0$ and $\pi$ converges at least at the same rate $\varepsilon_n$:
$$R_n^{d_n}(\theta_0, \pi) = E_0^{(n)} E^\pi\left[ d_n^2(\theta, \theta_0) \mid X^{(n)} \right] = O\left(\varepsilon_n^2\right).$$
Suppose the true parameter $\theta_0$ has Sobolev regularity $\beta > 1/2$:
$$\Theta_\beta(Q_0) = \left\{ \theta : \sum_{i=1}^{\infty} \theta_i^2 i^{2\beta} \le Q_0 < \infty \right\}.$$
Then the assumption of the following Corollary holds in the Gaussian white noise model and in regression. For these models, the rate given in the following Corollary coincides with the minimax rate (up to a $\log n$ term): it is in this sense adaptive optimal.
Corollary. If $\theta_0 \in \Theta_\beta(Q_0)$ and
$$K\left(p_0^{(n)}, p_{0k_n}^{(n)}\right) \le C n \left\| \theta_0 - \theta_{0k_n} \right\|^2, \quad V_{p,0}\left(p_0^{(n)}, p_{0k_n}^{(n)}\right) \le C n^{p/2} \left\| \theta_0 - \theta_{0k_n} \right\|^p,$$
then the rate $\varepsilon_n$ is
$$\varepsilon_n = \varepsilon_0 \left( \frac{\log n}{n} \right)^{\frac{\beta}{2\beta + 1}}.$$
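A standard heuristic for where this rate comes from (a bias–variance sketch in the usual style, not taken from the slides): truncating $\theta_0 \in \Theta_\beta(Q_0)$ at level $k_n$ costs a squared bias of order $k_n^{-2\beta}$, while fitting $k_n$ coordinates costs of order $k_n \log n / n$; balancing the two terms recovers the stated rate.

```latex
\sum_{i > k_n} \theta_{0i}^2
  \le k_n^{-2\beta} \sum_{i > k_n} \theta_{0i}^2\, i^{2\beta}
  \le Q_0\, k_n^{-2\beta},
\qquad
k_n^{-2\beta} \asymp \frac{k_n \log n}{n}
\;\Longrightarrow\;
k_n \asymp \left(\frac{n}{\log n}\right)^{\frac{1}{2\beta+1}},
\quad
\varepsilon_n^2 \asymp \left(\frac{\log n}{n}\right)^{\frac{2\beta}{2\beta+1}}.
```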
White noise model
$$dX^n(t) = f_0(t)\, dt + \frac{1}{\sqrt{n}}\, dW(t), \quad 0 \le t \le 1.$$
By a Fourier transform on a basis $(\phi_i)$, this is equivalent to the normal mean model
$$X_i^n = \theta_{0i} + \frac{1}{\sqrt{n}}\, \xi_i, \quad i = 1, 2, \ldots$$
Global $L^2$ loss:
$$R_n^{L^2} = E_0^{(n)} \left\| \hat{f}_n - f_0 \right\|^2 = E_0^{(n)} \sum_{i=1}^{\infty} \left( \hat{\theta}_{ni} - \theta_{0i} \right)^2.$$
Pointwise $\ell^2$ loss at a point $t$ (with $a_i = \phi_i(t)$):
$$R_n^{\ell^2} = E_0^{(n)} \left( \hat{f}_n(t) - f_0(t) \right)^2 = E_0^{(n)} \left( \sum_{i=1}^{\infty} a_i \left( \hat{\theta}_{ni} - \theta_{0i} \right) \right)^2.$$
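The normal mean model can be simulated directly (an illustrative sketch, not from the slides; the projection estimator below is a hypothetical stand-in for the posterior quantities of the talk). We draw $X_i^n = \theta_{0i} + \xi_i/\sqrt{n}$ for a truth in a Sobolev ball, keep the first $k_n \asymp (n/\log n)^{1/(2\beta+1)}$ empirical coefficients, and compare the realized global $L^2$ loss, computed via Parseval, with the rate $(\log n / n)^{2\beta/(2\beta+1)}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, beta = 1000, 500, 1.5

i = np.arange(1, N + 1)
theta0 = i ** -(beta + 1.0)            # sum theta0_i^2 i^{2 beta} < inf
X = theta0 + rng.normal(size=N) / np.sqrt(n)

# Projection estimator: keep the first k_n coefficients, zero the rest.
k_n = max(1, int((n / np.log(n)) ** (1 / (2 * beta + 1))))
theta_hat = np.where(i <= k_n, X, 0.0)

global_l2 = np.sum((theta_hat - theta0) ** 2)   # ||f_hat - f0||^2 by Parseval
rate = (np.log(n) / n) ** (2 * beta / (2 * beta + 1))
print(k_n, global_l2, rate)
```

A single realization of the loss is random, but it sits at the order of magnitude of the rate: the variance part is $k_n/n$ and the bias part $\sum_{i > k_n} \theta_{0i}^2$, both of the rate's order under this choice of $k_n$.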
Results in the white noise model
We show that the model satisfies Assumptions 1 to 5.
Proposition. Under global loss, the concentration and risk rates are adaptive optimal:
$$E_0^{(n)} \pi\left( \left\| \theta - \theta_0 \right\|^2 \ge M \varepsilon_n^2 \mid X^{(n)} \right) \to 0,$$
$$R_n^{L^2}(\theta_0, \pi) = E_0^{(n)} E^\pi\left[ \left\| \theta - \theta_0 \right\|^2 \mid X^{(n)} \right] = O\left( \varepsilon_n^2 \right).$$
Pointwise loss
The pointwise $\ell^2$ loss does not satisfy Assumption 4. We can show the following lower bound on the rate of the associated risk.
Proposition. Under pointwise loss, a lower bound on the frequentist risk rate is given by
$$\sup_{\theta_0 \in \Theta_\beta(Q_0)} R_n^{\ell^2}(\theta_0, \pi) \gtrsim n^{-\frac{2\beta - 1}{2\beta + 1}} \log^2 n.$$
A globally optimal estimator cannot be pointwise optimal (a result stated by Cai, Low and Zhao, 2007). There is therefore a penalty here from global to pointwise loss of (up to a $\log n$ term)
$$n^{\frac{1}{2\beta(2\beta + 1)}}.$$
Conclusion
• We have first derived posterior concentration and risk convergence rates for a variety of models that accommodate a sieve prior.
• In a second result, we have obtained a lower bound for the frequentist risk under pointwise loss; that is, the sieve prior does not achieve the optimal rate under pointwise loss.
• Further work should focus on the posterior concentration rate under pointwise loss.