ASSP8_

8/9/2019 ASSP8_

1/26

Page 3

Bayesian Estimation

Classical approach in statistical estimation:

$\theta$ is assumed to be a deterministic but unknown constant.

Bayesian approach:

We assume that $\theta$ is a random variable whose particular realization we must estimate.

This is the Bayesian approach, so named because its implementation is based directly on Bayes' theorem.

Prior knowledge about $\theta$ can be incorporated into our estimator by assuming that $\theta$ is a random variable with a given prior PDF.


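MSE in classical estimation:

$$\mathrm{MSE}(\hat\theta) = E_{\mathbf{x}}\big[(\theta - \hat\theta)^2\big] = \int (\theta - \hat\theta)^2\, p(\mathbf{x};\theta)\, d\mathbf{x}$$

MSE in Bayesian estimation:

$$\mathrm{BMSE}(\hat\theta) = E_{\mathbf{x},\theta}\big[(\theta - \hat\theta)^2\big] = \iint (\theta - \hat\theta)^2\, p(\mathbf{x},\theta)\, d\mathbf{x}\, d\theta$$

The difference: In the Bayesian approach the averaging PDF is the joint PDF of $\mathbf{x}$ and $\theta$, while the averaging PDF in the classical approach is $p(\mathbf{x};\theta)$, the PDF of the data for a fixed $\theta$.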


Underlying experiment (for our usual simple DC level in noise, where we assume $\mathcal{U}[-A_0, A_0]$ as prior PDF).

Classical approach: MSE for each value of $A$.

Bayesian approach: One single MSE (which is an average over the PDF of $A$).
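
The contrast between the two viewpoints can be made concrete with a small sketch. It assumes a hypothetical shrinkage estimator $\hat{A} = c\,\bar{x}$ (chosen for illustration, not taken from the slides), white Gaussian noise of variance `sigma2`, and the uniform prior $\mathcal{U}[-A_0, A_0]$; all function names are illustrative. For this estimator the classical MSE is $c^2\sigma^2/N + (c-1)^2 A^2$, which depends on $A$, whereas the Bayesian MSE averages $A^2$ over the prior ($E[A^2] = A_0^2/3$) and yields a single number.

```python
import random


def classical_mse(A, shrink, sigma2, N):
    """MSE of the estimator shrink * mean(x) for one fixed, deterministic A."""
    variance = shrink ** 2 * sigma2 / N
    bias = (shrink - 1.0) * A
    return variance + bias ** 2


def bayesian_mse(A0, shrink, sigma2, N):
    """Classical MSE averaged over the prior A ~ U[-A0, A0], using E[A^2] = A0^2 / 3."""
    return shrink ** 2 * sigma2 / N + (shrink - 1.0) ** 2 * A0 ** 2 / 3.0


def monte_carlo_bmse(A0, shrink, sigma2, N, trials=20000, seed=0):
    """Estimate the Bayesian MSE by drawing A from the prior in every trial."""
    rng = random.Random(seed)
    sigma = sigma2 ** 0.5
    total = 0.0
    for _ in range(trials):
        A = rng.uniform(-A0, A0)                      # new realization of the parameter
        xbar = sum(A + rng.gauss(0.0, sigma) for _ in range(N)) / N
        total += (A - shrink * xbar) ** 2
    return total / trials


if __name__ == "__main__":
    for A in (0.0, 1.0, 2.0):
        print("classical MSE at A =", A, ":", classical_mse(A, 0.8, 1.0, 10))
    print("Bayesian MSE (closed form):", bayesian_mse(3.0, 0.8, 1.0, 10))
    print("Bayesian MSE (Monte Carlo):", monte_carlo_bmse(3.0, 0.8, 1.0, 10))
```

The classical figure has to be reported as a curve over $A$, while the Bayesian figure is one scalar for the whole experiment.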

Taken from Kay: Fundamentals of Statistical Signal Processing, Vol 1: Estimation Theory, Prentice Hall, Upper Saddle River 2009


Derivation of the Bayesian MMSE estimator

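Cost function: The Bayesian MSE

$$J = \iint (\theta - \hat\theta)^2\, p(\mathbf{x},\theta)\, d\mathbf{x}\, d\theta$$

Note: The averaging PDF is the joint PDF of $\mathbf{x}$ and $\theta$!

We apply Bayes' theorem

$$p(\mathbf{x},\theta) = p(\theta|\mathbf{x})\, p(\mathbf{x})$$

to obtain

$$J = \int \left[ \int (\theta - \hat\theta)^2\, p(\theta|\mathbf{x})\, d\theta \right] p(\mathbf{x})\, d\mathbf{x}.$$

Since $p(\mathbf{x}) \ge 0$ for all $\mathbf{x}$, if the integral in brackets can be minimized for each $\mathbf{x}$, then the Bayesian MSE will be minimized.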


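Hence, fixing $\mathbf{x}$ so that $\hat\theta$ is a scalar variable (as opposed to a general function of $\mathbf{x}$), we have

$$J' = \int (\theta - \hat\theta)^2\, p(\theta|\mathbf{x})\, d\theta$$

$$
\begin{aligned}
\frac{\partial J'}{\partial \hat\theta}
&= \frac{\partial}{\partial \hat\theta} \int (\theta - \hat\theta)^2\, p(\theta|\mathbf{x})\, d\theta
 = \int \frac{\partial}{\partial \hat\theta} (\theta - \hat\theta)^2\, p(\theta|\mathbf{x})\, d\theta \\
&= \int -2(\theta - \hat\theta)\, p(\theta|\mathbf{x})\, d\theta \\
&= -2 \int \theta\, p(\theta|\mathbf{x})\, d\theta + 2\hat\theta \int p(\theta|\mathbf{x})\, d\theta \\
&= -2 E(\theta|\mathbf{x}) + 2\hat\theta
\end{aligned}
$$

which when set to zero results in

$$\hat\theta = E(\theta|\mathbf{x}).$$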


Comments:

It is seen that the optimal estimator in terms of minimizing the Bayesian MSE is the mean of the posterior PDF $p(\theta|\mathbf{x})$.

The posterior PDF refers to the PDF of $\theta$ after the data have been observed. It summarizes our new state of knowledge about the parameter.

In contrast, $p(\theta)$ may be thought of as the prior PDF of $\theta$, indicating the PDF before the data are observed.

We will term the estimator that minimizes the Bayesian MSE the minimum mean square error (MMSE) estimator.

Intuitively, the effect of observing data will be to concentrate the PDF of $\theta$.

This is because knowledge of the data should reduce our uncertainty about $\theta$.
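
That the posterior mean minimizes the inner cost $J'$ can be checked numerically. The following is a minimal sketch that assumes a made-up discrete posterior over four parameter values (the numbers and function names are illustrative, not from the slides) and compares the posterior mean with a brute-force search over candidate estimates.

```python
def posterior_mean(values, probs):
    """Mean of a discrete posterior p(theta | x) given as values and probabilities."""
    return sum(v * p for v, p in zip(values, probs))


def conditional_mse(estimate, values, probs):
    """Inner cost J'(estimate) = sum over theta of (theta - estimate)^2 p(theta | x)."""
    return sum((v - estimate) ** 2 * p for v, p in zip(values, probs))


# Hypothetical posterior over four parameter values (illustrative numbers only).
values = [0.0, 1.0, 2.0, 3.0]
probs = [0.1, 0.2, 0.3, 0.4]

mmse_estimate = posterior_mean(values, probs)

# Brute-force search over a grid of candidate estimates in [0, 3].
candidates = [i / 100 for i in range(301)]
best_candidate = min(candidates, key=lambda c: conditional_mse(c, values, probs))

print("posterior mean:", mmse_estimate)
print("grid minimizer of J':", best_candidate)
```

The grid minimizer coincides with the posterior mean up to the grid resolution, as the derivation predicts.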


The MMSE estimator will in general depend on the prior knowledge as well as the data.

If the prior knowledge is weak relative to that of the data, then the estimator will ignore the prior knowledge.

Otherwise, the estimator will be "biased" towards the prior mean. As expected, the use of prior information always improves the estimation accuracy.

The choice of a prior PDF is critical in Bayesian estimation. The wrong choice will result in a poor estimator, similar to the problems of a classical estimator designed with an incorrect data model.

Remember: The classical MSE will depend on $\theta$, hence estimators that attempt to minimize the MSE will usually depend on $\theta$. The Bayesian MSE will not!

In effect, the parameter dependency has been integrated away.


We derived the MMSE estimator for the case of continuous random variables.

We notice that the same estimator also holds for discrete random variables.

Example 1:


It is often not possible to find closed-form solutions for the MMSE estimator. An exception is the case where $\mathbf{x}$ and $\theta$ are jointly Gaussian distributed.


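Example: DC level in WGN with Gaussian prior

$$x[n] = A + w[n], \quad n = 0, 1, \ldots, N-1$$

Gaussian prior with mean $\mu_A$ and variance $\sigma_A^2$:

$$p(A) = \frac{1}{\sqrt{2\pi\sigma_A^2}} \exp\left[ -\frac{1}{2\sigma_A^2} (A - \mu_A)^2 \right]$$

If

$$p(\mathbf{x}|A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[ -\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} (x[n] - A)^2 \right]$$

then $p(A|\mathbf{x})$ can be written as:

$$p(A|\mathbf{x}) = \frac{\exp\left[ -\frac{1}{2} Q(A) \right]}{\displaystyle\int \exp\left[ -\frac{1}{2} Q(A) \right] dA}$$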


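with

$$Q(A) = \frac{N A^2}{\sigma^2} - \frac{2 N A \bar{x}}{\sigma^2} + \frac{A^2}{\sigma_A^2} - \frac{2 \mu_A A}{\sigma_A^2} + \frac{\mu_A^2}{\sigma_A^2},$$

where $\bar{x}$ is the sample mean of the data.

Note that the denominator of $p(A|\mathbf{x})$ does not depend on $A$, being a normalizing factor (it normalizes the area under $p(A|\mathbf{x})$), and that the argument of the exponential is quadratic in $A$.

Hence $p(A|\mathbf{x})$ must be Gaussian. It can be shown that its mean and variance are:

$$\mu_{A|x} = \frac{\dfrac{N}{\sigma^2}\bar{x} + \dfrac{\mu_A}{\sigma_A^2}}{\dfrac{N}{\sigma^2} + \dfrac{1}{\sigma_A^2}}$$

$$\sigma_{A|x}^2 = \frac{1}{\dfrac{N}{\sigma^2} + \dfrac{1}{\sigma_A^2}}$$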


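In this form, the Bayesian MMSE estimator is readily found as

$$\hat{A} = E(A|\mathbf{x}) = \mu_{A|x} = \frac{\dfrac{N}{\sigma^2}\bar{x} + \dfrac{\mu_A}{\sigma_A^2}}{\dfrac{N}{\sigma^2} + \dfrac{1}{\sigma_A^2}}.$$

For better interpretation this can be written as

$$\hat{A} = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2/N}\,\bar{x} + \frac{\sigma^2/N}{\sigma_A^2 + \sigma^2/N}\,\mu_A = \alpha \bar{x} + (1 - \alpha)\mu_A$$

with

$$\alpha = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2/N}.$$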


Note that $\alpha$ is a weighting factor since $0 < \alpha < 1$.

When there is little data available so that $\sigma_A^2 \ll \sigma^2/N$, then $\alpha$ is small and $\hat{A} \approx \mu_A$, but as more data are observed so that $\sigma_A^2 \gg \sigma^2/N$, $\alpha \to 1$ and $\hat{A} \to \bar{x}$.

The weighting factor directly depends on the confidence in the prior knowledge and the confidence in the sample data.

If one examines the posterior PDF, its variance

$$\sigma_{A|x}^2 = \frac{1}{\dfrac{N}{\sigma^2} + \dfrac{1}{\sigma_A^2}}$$

decreases as $N$ increases.
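
The two equivalent forms of the estimator can be compared numerically. The sketch below assumes the model of this example ($x[n] = A + w[n]$, noise variance `var_w`, prior mean `mu_A`, prior variance `var_A`); the function names and the sample values are illustrative only.

```python
def gaussian_posterior(x, mu_A, var_A, var_w):
    """Posterior mean and variance of A given samples x[n] = A + w[n]."""
    N = len(x)
    xbar = sum(x) / N
    post_var = 1.0 / (N / var_w + 1.0 / var_A)
    post_mean = (N * xbar / var_w + mu_A / var_A) * post_var
    return post_mean, post_var


def weighted_form(x, mu_A, var_A, var_w):
    """Same estimator written as alpha * xbar + (1 - alpha) * mu_A; also returns alpha."""
    N = len(x)
    xbar = sum(x) / N
    alpha = var_A / (var_A + var_w / N)
    return alpha * xbar + (1.0 - alpha) * mu_A, alpha


if __name__ == "__main__":
    samples = [1.0, 3.0]                    # illustrative data, sample mean 2.0
    print(gaussian_posterior(samples, mu_A=1.0, var_A=2.0, var_w=4.0))
    print(weighted_form(samples, mu_A=1.0, var_A=2.0, var_w=4.0))
    # Repeating the data shows alpha growing towards 1 and the posterior variance shrinking.
    for reps in (1, 10, 100):
        data = samples * reps
        _, alpha = weighted_form(data, 1.0, 2.0, 4.0)
        _, post_var = gaussian_posterior(data, 1.0, 2.0, 4.0)
        print(len(data), alpha, post_var)
```

Both functions return the same estimate, and the loop illustrates how the weight shifts from the prior mean to the sample mean as $N$ grows.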


As we have seen, the posterior mean changes with increasing $N$: for small $N$ it will approximately be $\mu_A$, but it will approach $\bar{x}$ for increasing $N$.

Taken from Kay: Fundamentals of Statistical Signal Processing, Vol 1: Estimation Theory, Prentice Hall, Upper Saddle River 2009


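Bayesian Estimation: Vector Case

Theorem: If $\mathbf{x}$ and $\boldsymbol\theta$ are jointly Gaussian, where $\mathbf{x}$ is of dimension $k \times 1$ and $\boldsymbol\theta$ of dimension $l \times 1$, with mean vector $[E(\mathbf{x})^T\ E(\boldsymbol\theta)^T]^T$ and partitioned covariance matrix

$$\mathbf{C} = \begin{bmatrix} \mathbf{C}_{xx} & \mathbf{C}_{x\theta} \\ \mathbf{C}_{\theta x} & \mathbf{C}_{\theta\theta} \end{bmatrix}$$

so that

$$p(\mathbf{x},\boldsymbol\theta) = \frac{1}{(2\pi)^{\frac{k+l}{2}} \det^{\frac{1}{2}}(\mathbf{C})} \exp\left[ -\frac{1}{2} \begin{bmatrix} \mathbf{x} - E(\mathbf{x}) \\ \boldsymbol\theta - E(\boldsymbol\theta) \end{bmatrix}^T \mathbf{C}^{-1} \begin{bmatrix} \mathbf{x} - E(\mathbf{x}) \\ \boldsymbol\theta - E(\boldsymbol\theta) \end{bmatrix} \right],$$

then the conditional PDF $p(\boldsymbol\theta|\mathbf{x})$ is also Gaussian and

$$E(\boldsymbol\theta|\mathbf{x}) = E(\boldsymbol\theta) + \mathbf{C}_{\theta x} \mathbf{C}_{xx}^{-1} (\mathbf{x} - E(\mathbf{x}))$$

$$\mathbf{C}_{\theta|x} = \mathbf{C}_{\theta\theta} - \mathbf{C}_{\theta x} \mathbf{C}_{xx}^{-1} \mathbf{C}_{x\theta}.$$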


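Theorem: If the observed data $\mathbf{x}$ can be modeled as

$$\mathbf{x} = \mathbf{H}\boldsymbol\theta + \mathbf{w}$$

where $\mathbf{x}$ is an $N \times 1$ data vector, $\mathbf{H}$ is a known $N \times p$ matrix, $\boldsymbol\theta$ is a $p \times 1$ random vector with prior PDF $\mathcal{N}(\boldsymbol\mu_\theta, \mathbf{C}_\theta)$, and $\mathbf{w}$ is an $N \times 1$ noise vector with PDF $\mathcal{N}(\mathbf{0}, \mathbf{C}_w)$ and independent of $\boldsymbol\theta$, then the posterior PDF $p(\boldsymbol\theta|\mathbf{x})$ is Gaussian with mean

$$E(\boldsymbol\theta|\mathbf{x}) = \boldsymbol\mu_\theta + \mathbf{C}_\theta \mathbf{H}^T (\mathbf{H}\mathbf{C}_\theta\mathbf{H}^T + \mathbf{C}_w)^{-1} (\mathbf{x} - \mathbf{H}\boldsymbol\mu_\theta)$$

and covariance

$$\mathbf{C}_{\theta|x} = \mathbf{C}_\theta - \mathbf{C}_\theta \mathbf{H}^T (\mathbf{H}\mathbf{C}_\theta\mathbf{H}^T + \mathbf{C}_w)^{-1} \mathbf{H}\mathbf{C}_\theta.$$

In contrast to the classical general linear model, $\mathbf{H}$ need not be full rank to ensure the invertibility of $\mathbf{H}\mathbf{C}_\theta\mathbf{H}^T + \mathbf{C}_w$.

Note that $\mathbf{C}_{\theta|x}$ is also the covariance matrix of the estimation error $\boldsymbol\epsilon = \boldsymbol\theta - \hat{\boldsymbol\theta}$ ($\boldsymbol\epsilon$ has zero mean).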
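
As a consistency check, the theorem can be specialized to the earlier DC-level example. The sketch below assumes $\mathbf{H} = \mathbf{1}$ (an $N \times 1$ vector of ones), $\theta = A$ with prior $\mathcal{N}(\mu_A, \sigma_A^2)$, and $\mathbf{C}_w = \sigma^2\mathbf{I}$; the matrix inversion lemma is used for the inverse. Under these assumptions the general expressions reduce to the scalar posterior mean and variance found before.

```latex
% Assumed special case: H = 1 (N x 1 vector of ones), theta = A,
% C_theta = sigma_A^2, C_w = sigma^2 I.
\begin{align*}
\mathbf{H} C_\theta \mathbf{H}^T + \mathbf{C}_w
  &= \sigma_A^2 \mathbf{1}\mathbf{1}^T + \sigma^2 \mathbf{I} \\
\left( \sigma_A^2 \mathbf{1}\mathbf{1}^T + \sigma^2 \mathbf{I} \right)^{-1}
  &= \frac{1}{\sigma^2} \left( \mathbf{I}
     - \frac{\mathbf{1}\mathbf{1}^T}{N + \sigma^2/\sigma_A^2} \right) \\
C_\theta \mathbf{H}^T
  \left( \mathbf{H} C_\theta \mathbf{H}^T + \mathbf{C}_w \right)^{-1}
  &= \frac{\sigma_A^2}{\sigma^2}
     \left( 1 - \frac{N}{N + \sigma^2/\sigma_A^2} \right) \mathbf{1}^T
   = \frac{\mathbf{1}^T}{N + \sigma^2/\sigma_A^2} \\
E(A|\mathbf{x})
  &= \mu_A + \frac{\mathbf{1}^T (\mathbf{x} - \mu_A \mathbf{1})}{N + \sigma^2/\sigma_A^2}
   = \mu_A + \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2/N} (\bar{x} - \mu_A) \\
\sigma_{A|x}^2
  &= \sigma_A^2 - \frac{N \sigma_A^2}{N + \sigma^2/\sigma_A^2}
   = \frac{1}{\dfrac{N}{\sigma^2} + \dfrac{1}{\sigma_A^2}}
\end{align*}
```

The mean equals $\alpha\bar{x} + (1-\alpha)\mu_A$ with $\alpha = \sigma_A^2 / (\sigma_A^2 + \sigma^2/N)$, in agreement with the scalar example.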


Linear Bayesian Estimation

Linear MMSE Estimation

Except when $\mathbf{x}$ and $\theta$ are jointly Gaussian, the MMSE estimator may be difficult to find.

The situation is different when we constrain the estimator to be linear in $\mathbf{x}$.

As will be seen shortly, we do not have to assume any specific form for the joint PDF $p(\mathbf{x},\theta)$; knowledge of the first two moments will be sufficient to derive the LMMSE estimator (compare to the BLUE).

That $\theta$ may be estimated from $\mathbf{x}$ is due to the assumed statistical dependence of $\theta$ on $\mathbf{x}$ as summarized by the joint PDF $p(\mathbf{x},\theta)$.

In particular, for a linear estimator we rely on the correlation between $\theta$ and $\mathbf{x}$.



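Introductory Example: Assume $x$ and $\theta$ are jointly distributed. Find the linear (actually affine) estimator

$$\hat\theta = a x + b$$

that minimizes the Bayesian MSE:

$$J = E_{x,\theta}\big[(\theta - \hat\theta)^2\big]$$

Solution:

$$\hat\theta = E(\theta) + \frac{\operatorname{cov}(\theta, x)}{\operatorname{var}(x)}\,\big(x - E(x)\big)$$

Example: Derive the LMMSE estimator and the MSE for Example 1.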
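
The solution only needs first and second moments, which a short sketch can make explicit. The functions below are illustrative (their names are not from the slides), and the example values assume a hypothetical model $x = \theta + w$ with $E(\theta) = 1$, $\operatorname{var}(\theta) = 1$, and independent zero-mean noise of variance 1, so that $E(x) = 1$, $\operatorname{var}(x) = 2$ and $\operatorname{cov}(\theta, x) = 1$.

```python
def affine_lmmse(mean_x, mean_theta, var_x, cov_theta_x):
    """Coefficients (a, b) of the affine estimator theta_hat = a * x + b."""
    a = cov_theta_x / var_x
    b = mean_theta - a * mean_x
    return a, b


def affine_lmmse_bmse(var_theta, var_x, cov_theta_x):
    """Bayesian MSE achieved by the affine LMMSE estimator (scalar data)."""
    return var_theta - cov_theta_x ** 2 / var_x


if __name__ == "__main__":
    # Assumed moments for x = theta + w (see the lead-in above).
    a, b = affine_lmmse(mean_x=1.0, mean_theta=1.0, var_x=2.0, cov_theta_x=1.0)
    print("a =", a, "b =", b)
    print("estimate for x = 3:", a * 3.0 + b)
    print("BMSE =", affine_lmmse_bmse(var_theta=1.0, var_x=2.0, cov_theta_x=1.0))
```

No assumption about the shape of the joint PDF enters the computation, only the means, the variance of $x$, and the covariance.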



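Linear Bayesian Estimation: Scalar Parameter

Aim: Find the (affine linear) estimator of the form

$$\hat\theta = \sum_{n=0}^{N-1} a_n x[n] + a_N$$

that minimizes the Bayesian MSE

$$J = E_{\mathbf{x},\theta}\big[(\theta - \hat\theta)^2\big].$$

$a_N$: compensates for nonzero means of $\mathbf{x}$ and $\theta$; omitted when both means are zero.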



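Deriving the optimal weighting coefficients:

Starting with $a_N$:

$$
\begin{aligned}
\frac{\partial J}{\partial a_N}
&= \frac{\partial}{\partial a_N} E\left[ \left( \theta - \sum_{n=0}^{N-1} a_n x[n] - a_N \right)^2 \right]
 = -2 E\left[ \theta - \sum_{n=0}^{N-1} a_n x[n] - a_N \right] \\
&= -2 \left( E(\theta) - \sum_{n=0}^{N-1} a_n E(x[n]) - a_N \right)
\end{aligned}
$$

Setting to zero results in:

$$a_N = E(\theta) - \sum_{n=0}^{N-1} a_n E(x[n]).$$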




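Continuing for the remaining coefficients $a_n$:

$$
\begin{aligned}
J &= E\left[ \left( \theta - \sum_{n=0}^{N-1} a_n x[n] - a_N \right)^2 \right]
   = E\left[ \left( \theta - \sum_{n=0}^{N-1} a_n x[n] - E(\theta) + \sum_{n=0}^{N-1} a_n E(x[n]) \right)^2 \right] \\
  &= E\left[ \left( \sum_{n=0}^{N-1} a_n \big( x[n] - E(x[n]) \big) - \big( \theta - E(\theta) \big) \right)^2 \right]
\end{aligned}
$$

Writing the sums as inner vector products with $\mathbf{a} = [a_0, a_1, \ldots, a_{N-1}]^T$ leads to

$$
\begin{aligned}
J &= E\left[ \left( \mathbf{a}^T (\mathbf{x} - E(\mathbf{x})) - (\theta - E(\theta)) \right)^2 \right] \\
  &= E\left[ \mathbf{a}^T (\mathbf{x} - E(\mathbf{x})) (\mathbf{x} - E(\mathbf{x}))^T \mathbf{a} \right]
   - E\left[ \mathbf{a}^T (\mathbf{x} - E(\mathbf{x})) (\theta - E(\theta)) \right] \\
  &\quad - E\left[ (\theta - E(\theta)) (\mathbf{x} - E(\mathbf{x}))^T \mathbf{a} \right]
   + E\left[ (\theta - E(\theta))^2 \right]
\end{aligned}
$$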




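$$J = E\left[ \left( \mathbf{a}^T (\mathbf{x} - E(\mathbf{x})) - (\theta - E(\theta)) \right)^2 \right] = \mathbf{a}^T \mathbf{C}_{xx} \mathbf{a} - \mathbf{a}^T \mathbf{c}_{x\theta} - \mathbf{c}_{\theta x} \mathbf{a} + c_{\theta\theta}$$

where $\mathbf{C}_{xx}$ is the $N \times N$ covariance matrix of $\mathbf{x}$, $\mathbf{c}_{\theta x}$ is the $1 \times N$ cross-covariance vector having the property $\mathbf{c}_{\theta x}^T = \mathbf{c}_{x\theta}$, and $c_{\theta\theta}$ is the variance of $\theta$.

Taking the gradient yields

$$\frac{\partial J}{\partial \mathbf{a}} = 2 \mathbf{C}_{xx} \mathbf{a} - 2 \mathbf{c}_{x\theta};$$

setting to zero results in

$$\mathbf{a} = \mathbf{C}_{xx}^{-1} \mathbf{c}_{x\theta}.$$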




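Combining with the result for $a_N$ leads to

$$\hat\theta = E(\theta) + \mathbf{c}_{\theta x} \mathbf{C}_{xx}^{-1} (\mathbf{x} - E(\mathbf{x}))$$

as the LMMSE estimator and the corresponding BMSE of

$$\mathrm{BMSE}(\hat\theta) = c_{\theta\theta} - \mathbf{c}_{\theta x} \mathbf{C}_{xx}^{-1} \mathbf{c}_{x\theta}.$$

Note that this is identical in form to the MMSE estimator for jointly Gaussian $\mathbf{x}$ and $\theta$. This is because in the Gaussian case the MMSE estimator happens to be linear, and hence our constraint is automatically satisfied.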
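
The LMMSE estimator and its BMSE can be evaluated directly from the second-order moments. The sketch below is standard-library Python; `solve` is a small illustrative helper (Gaussian elimination), and the example moments assume a hypothetical model $x[n] = \theta + w[n]$ with $N = 2$, zero means, $\operatorname{var}(\theta) = 1$ and unit-variance white noise, so that $\mathbf{C}_{xx} = \mathbf{1}\mathbf{1}^T + \mathbf{I}$ and $\mathbf{c}_{x\theta} = \mathbf{1}$.

```python
def solve(matrix, rhs):
    """Solve matrix * a = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [list(row) + [r] for row, r in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    a = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(aug[row][k] * a[k] for k in range(row + 1, n))
        a[row] = (aug[row][n] - tail) / aug[row][row]
    return a


def lmmse(C_xx, c_xtheta, c_thetatheta, mean_x, mean_theta):
    """Return weights a = C_xx^{-1} c_xtheta, the BMSE, and an estimator function."""
    a = solve(C_xx, c_xtheta)
    # c_theta_x C_xx^{-1} c_x_theta equals a^T c_x_theta because C_xx is symmetric.
    bmse = c_thetatheta - sum(ai * ci for ai, ci in zip(a, c_xtheta))

    def estimate(x):
        return mean_theta + sum(ai * (xi - mi) for ai, xi, mi in zip(a, x, mean_x))

    return a, bmse, estimate


if __name__ == "__main__":
    # Assumed moments for x[n] = theta + w[n], N = 2 (see the lead-in above).
    C_xx = [[2.0, 1.0], [1.0, 2.0]]
    c_xtheta = [1.0, 1.0]
    a, bmse, estimate = lmmse(C_xx, c_xtheta, 1.0, [0.0, 0.0], 0.0)
    print("weights:", a)
    print("BMSE:", bmse)
    print("estimate for x = [3, 3]:", estimate([3.0, 3.0]))
```

For this assumed model the weights are equal for both samples, which matches the weighted-mean structure of the Gaussian DC-level example.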



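Linear Bayesian Estimation: Vector Parameter

The vector LMMSE estimator is a straightforward extension of the scalar one.

We wish to find the linear estimator that minimizes the Bayesian MSE for each element:

$$\hat\theta_i = \sum_{n=0}^{N-1} a_{in} x[n] + a_{iN}$$

for $i = 1, 2, \ldots, p$, and choose the weighting coefficients to minimize

$$J_i = E\big[(\theta_i - \hat\theta_i)^2\big].$$

Combining the scalar LMMSE estimators leads to

$$\hat\theta_i = E(\theta_i) + \mathbf{c}_{\theta_i x} \mathbf{C}_{xx}^{-1} (\mathbf{x} - E(\mathbf{x}))$$

and

$$\mathrm{BMSE}(\hat\theta_i) = c_{\theta_i\theta_i} - \mathbf{c}_{\theta_i x} \mathbf{C}_{xx}^{-1} \mathbf{c}_{x\theta_i}.$$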




Problem: inverse system identification

Let the following communication scenario be given:

Data samples $y[k]$ (+1 or -1, uncorrelated, zero mean, $\sigma_y^2 = 1$) are transmitted through a discrete-time linear system, given by its impulse response $\mathbf{h} = [h_0, \ldots, h_{l-1}]$. After that, additive white Gaussian noise $n[k]$ (zero mean, variance $\sigma_n^2$) is added. Your task is to:

Find the best linear system $\mathbf{w} = [w_0, \ldots, w_{p-1}]$ in an LMMSE sense to estimate the data.

Write down the estimator using the hints on the next slide.

Write a Matlab script simulating the system with $l = 4$ and $p = 4$.

Vary $\sigma_n^2$ from 0.001 to 1 and observe the results.




Hints:

As we will see in the following lectures: For uncorrelated data samples and uncorrelated noise with zero means and variances $\sigma_y^2$ and $\sigma_n^2$, respectively, we have

$$\mathbf{R}_{xx} = \sigma_y^2 \mathbf{H}^H \mathbf{H} + \sigma_n^2 \mathbf{I}$$

as the autocorrelation matrix of the samples and

$$\mathbf{r}_{xy} = \sigma_y^2 \mathbf{H}^H \mathbf{e}_i$$

as the cross-correlation vector of $\mathbf{x}$ and $y$. $\mathbf{e}_i$ is the vector that has a one at position $i$ and zeros at all other elements. Choose $i = l+1$ and the length of $\mathbf{e}_i$ as 7.

$\mathbf{H}$ is the convolution matrix of $\mathbf{h}$. Use convmtx(h,l) to obtain $\mathbf{H}$ in Matlab.

Please be aware that the output sequence after the filter $\mathbf{w}$ is shifted by $i = l+1$ samples.
