Bayesian adaptive optimal estimation using a sieve prior
Bayesian optimal adaptive estimation using a sieve prior
YES IV Workshop
Julyan Arbel, [email protected]
ENSAE-CREST-Université Paris Dauphine
November 9, 2010
Outline
1 Motivations
2 Assumptions
3 Results
4 White noise model
5 Conclusion
Introduction
• Posterior concentration rates and risk convergence rates in a Bayesian nonparametric setting.
• Results in the same spirit as those of Ghosal, Ghosh and van der Vaart (2000) and Ghosal and van der Vaart (2007), in the specific case of models suited to the use of sieve priors.
• Use of a family of sieve priors (introduced by Zhao (2000) in the white noise model).
• Infinite-dimensional parameter from a Sobolev smoothness class.
Notations
• Let $(\mathcal{X}^{(n)}, \mathcal{A}^{(n)}, P_\theta^{(n)} : \theta \in \Theta)$ be a model with observations $X^{(n)} = (X_i^n)_{1 \le i \le n}$, and
$$\Theta = \bigcup_{k=1}^{\infty} \mathbb{R}^k.$$
• Denote by $\theta_0$ the parameter associated with the true model. Densities are denoted $p_\theta^{(n)}$ ($p_0^{(n)}$ for $\theta_0$). The first $k$ coordinates of $\theta_0$ are denoted $\theta_{0k}$.
• A sieve prior $\pi$ on $\Theta$ is defined as follows:
$$\pi(\theta) = \sum_k \pi_k\, \pi_k(\theta), \qquad \sum_k \pi_k = 1,$$
and, under $\pi_k$,
$$\frac{\theta_i}{\tau_i} \sim g, \quad \text{where } \tau_i > 0.$$
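The two-stage structure of the sieve prior (draw a dimension $k$, then draw $\theta \in \mathbb{R}^k$ coordinate-wise from rescaled $g$) can be sketched numerically. This is an illustrative sample, not from the slides, assuming weights $\pi_k \propto e^{-k \log k}$, unit scales $\tau_i = 1$, and $g$ the standard Laplace density, which satisfies the two-sided tail bound of Assumption 1 with $d = 1$:

```python
import numpy as np

def sample_sieve_prior(kmax=50, seed=0):
    """Draw (k, theta) from a sieve prior truncated at kmax dimensions."""
    rng = np.random.default_rng(seed)
    k_grid = np.arange(1, kmax + 1)
    # Weights pi_k proportional to exp(-k log k): strong penalty on large k.
    log_w = -k_grid * np.log(k_grid)
    w = np.exp(log_w - log_w.max())
    w /= w.sum()
    k = rng.choice(k_grid, p=w)
    # Given k, theta_i / tau_i ~ g i.i.d.; here tau_i = 1 and g is Laplace.
    theta = rng.laplace(size=k)
    return k, theta

k, theta = sample_sieve_prior()
print(k, theta)
```

The truncation at `kmax` is purely practical: the $e^{-k \log k}$ weights decay so fast that the omitted mass is negligible.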
We define four divergences:
$$K(f, g) = \int f \log(f/g)\, d\mu,$$
$$V_{p,0}(f, g) = \int f \left| \log(f/g) - K(f, g) \right|^p d\mu,$$
$$\widetilde{K}(f, g) = \int p_0^{(n)} \left| \log(f/g) \right| d\mu,$$
$$\widetilde{V}_{p,0}(f, g) = \int p_0^{(n)} \left| \log(f/g) - K(f, g) \right|^p d\mu.$$
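These definitions can be checked on a concrete pair of densities (an illustrative computation, not part of the slides). For two unit-variance Gaussians $f = N(0,1)$ and $g = N(\mu,1)$, $\log(f/g) = \mu^2/2 - \mu x$ is linear, so the closed forms $K = \mu^2/2$ and $V_{2,0} = \mu^2$ are available to compare against grid integration:

```python
import numpy as np

mu = 0.7
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]
f = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)          # N(0, 1) density
g = np.exp(-(x - mu)**2 / 2) / np.sqrt(2 * np.pi)   # N(mu, 1) density
log_ratio = np.log(f / g)

K = np.sum(f * log_ratio) * dx                       # int f log(f/g) dmu
V2 = np.sum(f * (log_ratio - K)**2) * dx             # int f |log(f/g) - K|^2 dmu

print(K, mu**2 / 2)    # both approximately 0.245
print(V2, mu**2)       # both approximately 0.49
```

The grid $[-10, 10]$ truncates only a negligible tail mass, so the quadrature matches the closed forms to many digits.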
Define a Kullback–Leibler neighborhood
$$B_n = \left\{ \theta : K\left(p_0^{(n)}, p_\theta^{(n)}\right) \le n \varepsilon_n^2,\; V_{p,0}\left(p_0^{(n)}, p_\theta^{(n)}\right) \le \left(n \varepsilon_n^2\right)^{p/2} \right\}.$$
We use a semimetric $d_n$ on $\Theta$, and define $\Theta_n = \left\{\theta \in \mathbb{R}^{k_n}, \|\theta\| \le \rho_n\right\}$ with $k_n = k_0 n \varepsilon_n^2 / \log n$ and $\rho_n$ some power of $n$.
The posterior distribution is defined by
$$\pi\left(B \mid X^{(n)}\right) = \frac{\int_B p_\theta^{(n)}\left(X^{(n)}\right) d\pi(\theta)}{\int_\Theta p_\theta^{(n)}\left(X^{(n)}\right) d\pi(\theta)}.$$
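In a conjugate special case the posterior over the sieve dimension is available up to normalization. The following is a hypothetical sketch, not the authors' computation: assume the normal mean model $X_i = \theta_{0i} + \xi_i/\sqrt{n}$, Gaussian $g$ (the $d = 2$ case of Assumption 1), unit scales $\tau_i = 1$, and weights $\pi_k \propto e^{-k \log k}$. Under the component $\pi_k$, the marginal of $X_i$ is $N(0, \tau^2 + 1/n)$ for $i \le k$ and $N(0, 1/n)$ for $i > k$, so $\pi(k \mid X)$ follows from per-coordinate marginal likelihoods:

```python
import numpy as np

rng = np.random.default_rng(1)
n, kmax = 200, 30
sigma2, tau2 = 1.0 / n, 1.0

# Synthetic truth with polynomial decay, observed in Gaussian noise.
beta = 1.5
theta0 = np.arange(1, kmax + 1, dtype=float) ** -(beta + 1.0)
X = theta0 + rng.normal(scale=np.sqrt(sigma2), size=kmax)

def log_norm_pdf(x, var):
    return -0.5 * (np.log(2 * np.pi * var) + x ** 2 / var)

ks = np.arange(1, kmax + 1)
log_prior = -ks * np.log(ks)                       # pi_k ~ e^{-k log k}
s_sig = np.cumsum(log_norm_pdf(X, tau2 + sigma2))  # coords i <= k: signal + noise
s_noise = log_norm_pdf(X, sigma2)
suffix = s_noise.sum() - np.cumsum(s_noise)        # coords i > k: pure noise
log_post = log_prior + s_sig + suffix
post = np.exp(log_post - log_post.max())
post /= post.sum()                                 # pi(k | X) on k = 1..kmax
k_map = int(ks[np.argmax(post)])
print(k_map, post.round(3))
```

The posterior weight concentrates on small $k$ because the prior penalty $e^{-k \log k}$ competes with the marginal-likelihood gain of explaining large coefficients as signal.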
Assumptions
Assumption 1 (on the prior). Assume there exist $a, b, c, d > 0$ such that $\pi_k$ and $g$ satisfy
$$e^{-a k \log k} \le \pi_k \le e^{-b k \log k},$$
$$A e^{-A_1 |t|^d} \le g(t) \le B e^{-B_1 |t|^d},$$
$$\exists\, T, \tau_0 > 0 \text{ s.t. } \min_{i \le k_n} \tau_i \ge n^{-T} \text{ and } \max_{i > 0} \tau_i \le \tau_0 < \infty,$$
$$\sum_{i=1}^{k_n} |\theta_{0i}|^d / \tau_i^d \le C k_n \log n.$$
Assumption 2 (on the rate of convergence). The rate of convergence $\varepsilon_n$ is bounded below by the two inequalities
$$K\left(p_0^{(n)}, p_{0k_n}^{(n)}\right) \le n \varepsilon_n^2, \quad \text{and} \quad V_{p,0}\left(p_0^{(n)}, p_{0k_n}^{(n)}\right) \le \left(n \varepsilon_n^2\right)^{p/2}.$$
Assumption 3 (on divergences). $\widetilde{K}$ and $\widetilde{V}_{p,0}$ satisfy
$$\widetilde{K}\left(p_{0k_n}^{(n)}, p_\theta^{(n)}\right) \le C n^2 \left\| \theta_{0k_n} - \theta \right\|^2, \quad \widetilde{V}_{p,0}\left(p_{0k_n}^{(n)}, p_\theta^{(n)}\right) \le C n^{p/2} \left\| \theta_{0k_n} - \theta \right\|^p.$$
Assumption 4 (on the semimetric $d_n$). There exist $G_0, G > 0$ such that, for any two $\theta, \theta'$,
$$d_n(\theta, \theta') \le C k_n^{G_0} \left\| \theta - \theta' \right\|^G.$$
Assumption 5 (test condition). There exist constants $c_1, \zeta > 0$ such that for every $\varepsilon > 0$ and for each $\theta_1$ such that $d_n(\theta_1, \theta_0) > \varepsilon$, one can construct a test statistic $\phi_n \in [0, 1]$ which satisfies
$$E_0^{(n)} \phi_n \le e^{-c_1 n \varepsilon^2}, \qquad \sup_{d_n(\theta, \theta_1) < \zeta \varepsilon} E_\theta^{(n)} (1 - \phi_n) \le e^{-c_1 n \varepsilon^2}.$$
Results
Theorem (posterior concentration rate). The rate of convergence of the posterior distribution relative to $d_n$ is $\varepsilon_n$:
$$E_0^{(n)} \pi\left(d_n^2(\theta, \theta_0) \ge M \varepsilon_n^2 \mid X^{(n)}\right) \to 0.$$
Corollary (risk convergence rate). If the assumptions are satisfied with $p > 2$, and if $d_n$ is bounded, then the integrated posterior risk given $\theta_0$ and $\pi$ converges at least at the same rate $\varepsilon_n$:
$$R_n^{d_n}(\theta_0, \pi) = E_0^{(n)} E^\pi\left[ d_n^2(\theta, \theta_0) \mid X^{(n)} \right] = O\left(\varepsilon_n^2\right).$$
Suppose the true parameter $\theta_0$ has Sobolev regularity $\beta > 1/2$:
$$\Theta_\beta(Q_0) = \left\{ \theta : \sum_{i=1}^{\infty} \theta_i^2 i^{2\beta} \le Q_0 < \infty \right\}.$$
Then the assumption of the following Corollary holds in the Gaussian white noise model and in regression. For these models, the rate given in the following Corollary coincides with the minimax rate (up to a $\log n$ term): it is in this sense adaptive optimal.
Corollary. If $\theta_0 \in \Theta_\beta(Q_0)$ and
$$K\left(p_0^{(n)}, p_{0k_n}^{(n)}\right) \le C n \left\| \theta_0 - \theta_{0k_n} \right\|^2, \quad V_{p,0}\left(p_0^{(n)}, p_{0k_n}^{(n)}\right) \le C n^{p/2} \left\| \theta_0 - \theta_{0k_n} \right\|^p,$$
then the rate $\varepsilon_n$ is
$$\varepsilon_n = \varepsilon_0 \left( \frac{\log n}{n} \right)^{\frac{\beta}{2\beta + 1}}.$$
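A standard heuristic for where this rate comes from (a bias–variance sketch in the usual style, not taken from the slides): truncating $\theta_0 \in \Theta_\beta(Q_0)$ at level $k_n$ costs a squared bias of order $k_n^{-2\beta}$, while fitting $k_n$ coordinates costs of order $k_n \log n / n$; balancing the two terms recovers the stated rate.

```latex
\sum_{i > k_n} \theta_{0i}^2
  \le k_n^{-2\beta} \sum_{i > k_n} \theta_{0i}^2\, i^{2\beta}
  \le Q_0\, k_n^{-2\beta},
\qquad
k_n^{-2\beta} \asymp \frac{k_n \log n}{n}
\;\Longrightarrow\;
k_n \asymp \left(\frac{n}{\log n}\right)^{\frac{1}{2\beta+1}},
\quad
\varepsilon_n^2 \asymp \left(\frac{\log n}{n}\right)^{\frac{2\beta}{2\beta+1}}.
```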
White noise model
$$dX^n(t) = f_0(t)\, dt + \frac{1}{\sqrt{n}}\, dW(t), \quad 0 \le t \le 1.$$
By a Fourier transform on a basis $(\phi_i)$, this is equivalent to the normal mean model
$$X_i^n = \theta_{0i} + \frac{1}{\sqrt{n}}\, \xi_i, \quad i = 1, 2, \ldots$$
Global $L^2$ loss:
$$R_n^{L^2} = E_0^{(n)} \left\| \hat{f}_n - f_0 \right\|^2 = E_0^{(n)} \sum_{i=1}^{\infty} \left( \hat{\theta}_{ni} - \theta_{0i} \right)^2.$$
Pointwise $\ell^2$ loss at a point $t$ (with $a_i = \phi_i(t)$):
$$R_n^{\ell^2} = E_0^{(n)} \left( \hat{f}_n(t) - f_0(t) \right)^2 = E_0^{(n)} \left( \sum_{i=1}^{\infty} a_i \left( \hat{\theta}_{ni} - \theta_{0i} \right) \right)^2.$$
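The normal mean model can be simulated directly (an illustrative sketch, not from the slides; the projection estimator below is a hypothetical stand-in for the posterior quantities of the talk). We draw $X_i^n = \theta_{0i} + \xi_i/\sqrt{n}$ for a truth in a Sobolev ball, keep the first $k_n \asymp (n/\log n)^{1/(2\beta+1)}$ empirical coefficients, and compare the realized global $L^2$ loss, computed via Parseval, with the rate $(\log n / n)^{2\beta/(2\beta+1)}$:

```python
import numpy as np

rng = np.random.default_rng(2)
n, N, beta = 1000, 500, 1.5

i = np.arange(1, N + 1)
theta0 = i ** -(beta + 1.0)            # sum theta0_i^2 i^{2 beta} < inf
X = theta0 + rng.normal(size=N) / np.sqrt(n)

# Projection estimator: keep the first k_n coefficients, zero the rest.
k_n = max(1, int((n / np.log(n)) ** (1 / (2 * beta + 1))))
theta_hat = np.where(i <= k_n, X, 0.0)

global_l2 = np.sum((theta_hat - theta0) ** 2)   # ||f_hat - f0||^2 by Parseval
rate = (np.log(n) / n) ** (2 * beta / (2 * beta + 1))
print(k_n, global_l2, rate)
```

A single realization of the loss is random, but it sits at the order of magnitude of the rate: the variance part is $k_n/n$ and the bias part $\sum_{i > k_n} \theta_{0i}^2$, both of the rate's order under this choice of $k_n$.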
Results in the white noise model
We show that the model satisfies Assumptions 1 to 5.
Proposition. Under global loss, the concentration and risk rates are adaptive optimal:
$$E_0^{(n)} \pi\left( \left\| \theta - \theta_0 \right\|^2 \ge M \varepsilon_n^2 \mid X^{(n)} \right) \to 0,$$
$$R_n^{L^2}(\theta_0, \pi) = E_0^{(n)} E^\pi\left[ \left\| \theta - \theta_0 \right\|^2 \mid X^{(n)} \right] = O\left( \varepsilon_n^2 \right).$$
Pointwise loss
The pointwise $\ell^2$ loss does not satisfy Assumption 4. We can show the following lower bound on the rate of the associated risk.
Proposition. Under pointwise loss, a lower bound on the frequentist risk rate is given by
$$\sup_{\theta_0 \in \Theta_\beta(Q_0)} R_n^{\ell^2}(\theta_0, \pi) \gtrsim n^{-\frac{2\beta - 1}{2\beta + 1}} \log^2 n.$$
A globally optimal estimator cannot be pointwise optimal (a result stated by Cai, Low and Zhao, 2007). There is therefore a penalty here from global to pointwise loss of (up to a $\log n$ term)
$$n^{\frac{1}{2\beta(2\beta + 1)}}.$$
Conclusion
• We have first derived posterior concentration and risk convergence rates for a variety of models that accommodate a sieve prior.
• In a second result, we have obtained a lower bound for the frequentist risk under pointwise loss; that is, the sieve prior does not achieve the optimal rate under pointwise loss.
• Further work should focus on the posterior concentration rate under pointwise loss.