ASSP8_
8/9/2019 ASSP8_
Bayesian Estimation
Classical approach in statistical estimation:
$\theta$ is assumed to be a deterministic but unknown constant.
Bayesian approach:
We assume that $\theta$ is a random variable whose particular realization we must estimate.
This is the Bayesian approach, so named because its implementation is based directly on Bayes' theorem.
Prior knowledge about $\theta$ can be incorporated into our estimator by assuming that $\theta$ is a random variable with a given prior PDF $p(\theta)$.
Bayesian Estimation
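MSE in classical estimation:
$$\mathrm{MSE}(\hat{\theta}) = E\left[(\theta - \hat{\theta})^2\right] = \int \left(\theta - \hat{\theta}(\mathbf{x})\right)^2 p(\mathbf{x};\theta)\, d\mathbf{x}$$
MSE in Bayesian estimation:
$$\mathrm{BMSE}(\hat{\theta}) = E\left[(\theta - \hat{\theta})^2\right] = \iint \left(\theta - \hat{\theta}(\mathbf{x})\right)^2 p(\mathbf{x},\theta)\, d\mathbf{x}\, d\theta$$
The difference: in the Bayesian approach the averaging PDF is the joint PDF of $\mathbf{x}$ and $\theta$, while the averaging PDF in the classical approach is $p(\mathbf{x};\theta)$, the PDF of the data for a fixed value of $\theta$.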
Bayesian Estimation
Underlying experiment (for our usual simple DC level in noise, where we assume $\mathcal{U}[-A_0, A_0]$ as prior PDF of $A$):
Classical approach: one MSE for each value of $A$.
Bayesian approach: one single MSE (which is an average over the PDF of $A$).
Taken from Kay: Fundamentals of Statistical Signal Processing, Vol 1: Estimation Theory, Prentice Hall, Upper Saddle River 2009
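To make the distinction between the two MSE definitions concrete, here is a small Monte Carlo sketch for the DC level in WGN with a $\mathcal{U}[-A_0, A_0]$ prior. The estimator used (the sample mean clipped to $[-A_0, A_0]$), the values $A_0 = 1$, $\sigma^2 = 1$, $N = 4$, and all function names are illustrative assumptions, not taken from the slide. The classical MSE is computed separately for each fixed $A$ (averaging over the noise only); the Bayesian MSE draws a new $A$ from the prior in every trial and therefore yields a single number.

```python
import random

def sample_mean(x):
    return sum(x) / len(x)

def clipped_mean(x, A0):
    """Sample mean clipped to the prior support [-A0, A0] (illustrative estimator)."""
    return max(-A0, min(A0, sample_mean(x)))

def squared_errors(A, A0, var_w, N, rng):
    """One trial: squared errors of the clipped and the plain sample mean for a given A."""
    x = [A + rng.gauss(0.0, var_w ** 0.5) for _ in range(N)]
    return (clipped_mean(x, A0) - A) ** 2, (sample_mean(x) - A) ** 2

def classical_mse(A, A0, var_w, N, trials, rng):
    """A held fixed: average over the noise only -> one MSE per value of A."""
    tot_clip = tot_mean = 0.0
    for _ in range(trials):
        e_clip, e_mean = squared_errors(A, A0, var_w, N, rng)
        tot_clip += e_clip
        tot_mean += e_mean
    return tot_clip / trials, tot_mean / trials

def bayesian_mse(A0, var_w, N, trials, rng):
    """A redrawn from U[-A0, A0] in every trial: average over p(x, A) -> one number."""
    tot_clip = tot_mean = 0.0
    for _ in range(trials):
        A = rng.uniform(-A0, A0)
        e_clip, e_mean = squared_errors(A, A0, var_w, N, rng)
        tot_clip += e_clip
        tot_mean += e_mean
    return tot_clip / trials, tot_mean / trials

rng = random.Random(0)
A0, var_w, N, trials = 1.0, 1.0, 4, 5000
classical = {A: classical_mse(A, A0, var_w, N, trials, rng) for A in (-1.0, -0.5, 0.0, 0.5, 1.0)}
bmse = bayesian_mse(A0, var_w, N, trials, rng)
for A, (m_clip, m_mean) in classical.items():
    print(f"A = {A:+.1f}: classical MSE clipped = {m_clip:.4f}, sample mean = {m_mean:.4f}")
print(f"Bayesian MSE clipped = {bmse[0]:.4f}, sample mean = {bmse[1]:.4f} (sigma^2/N = {var_w / N})")
```

Because clipping to an interval that contains the true $A$ can never increase the error, the clipped estimator is at least as good trial by trial; the classical figures still have to be reported as a function of $A$, whereas the Bayesian figure is one number.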
Bayesian Estimation
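Derivation of the Bayesian MMSE estimator
Cost function: the Bayesian MSE
$$J = \iint (\theta - \hat{\theta})^2\, p(\mathbf{x},\theta)\, d\mathbf{x}\, d\theta$$
Note: the averaging PDF is the joint PDF of $\mathbf{x}$ and $\theta$!
We apply Bayes' theorem
$$p(\mathbf{x},\theta) = p(\theta|\mathbf{x})\, p(\mathbf{x})$$
to obtain
$$J = \int \left[\int (\theta - \hat{\theta})^2\, p(\theta|\mathbf{x})\, d\theta\right] p(\mathbf{x})\, d\mathbf{x}.$$
Since $p(\mathbf{x}) \ge 0$ for all $\mathbf{x}$, if the integral in brackets can be minimized for each $\mathbf{x}$, then the Bayesian MSE will be minimized.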
Bayesian Estimation
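Hence, fixing $\mathbf{x}$ so that $\hat{\theta}$ is a scalar variable (as opposed to a general function of $\mathbf{x}$), we have
$$J' = \int (\theta - \hat{\theta})^2\, p(\theta|\mathbf{x})\, d\theta$$
$$\frac{\partial J'}{\partial \hat{\theta}} = \frac{\partial}{\partial \hat{\theta}} \int (\theta - \hat{\theta})^2\, p(\theta|\mathbf{x})\, d\theta = \int -2(\theta - \hat{\theta})\, p(\theta|\mathbf{x})\, d\theta = -2\int \theta\, p(\theta|\mathbf{x})\, d\theta + 2\hat{\theta}\int p(\theta|\mathbf{x})\, d\theta = -2E(\theta|\mathbf{x}) + 2\hat{\theta},$$
which when set to zero results in
$$\hat{\theta} = E(\theta|\mathbf{x}).$$
Since $\partial^2 J' / \partial \hat{\theta}^2 = 2 > 0$, this stationary point is indeed the minimum.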
Bayesian Estimation
Comments:
It is seen that the optimal estimator in terms of minimizing the Bayesian MSE is the mean of the posterior PDF $p(\theta|\mathbf{x})$.
The posterior PDF refers to the PDF of $\theta$ after the data have been observed. It summarizes our new state of knowledge about the parameter.
In contrast, $p(\theta)$ may be thought of as the prior PDF of $\theta$, indicating the PDF before the data are observed.
We will term the estimator that minimizes the Bayesian MSE the minimum mean square error (MMSE) estimator.
Intuitively, the effect of observing data will be to concentrate the PDF of $\theta$.
This is because knowledge of the data should reduce our uncertainty about $\theta$.
Bayesian Estimation
Comments:
The MMSE estimator will in general depend on the prior knowledge as well as the data.
If the prior knowledge is weak relative to that of the data, then the estimator will essentially ignore the prior knowledge.
Otherwise, the estimator will be "biased" towards the prior mean. As expected, the use of prior information (provided the prior is correct) always improves the estimation accuracy.
The choice of a prior PDF is critical in Bayesian estimation. The wrong choice will result in a poor estimator, similar to the problems of a classical estimator designed with an incorrect data model.
Remember: the classical MSE depends on $\theta$, hence estimators that attempt to minimize it will usually depend on $\theta$; the Bayesian MSE does not!
In effect, the parameter dependency has been integrated away.
Bayesian Estimation
We derived the MMSE estimator for the case of continuous random variables.
The same estimator also holds for discrete random variables (with the integrals replaced by sums).
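For discrete $\theta$ the claim can be checked by brute force. The sketch below assumes a hypothetical three-valued parameter $\theta \in \{0, 1, 2\}$ with prior probabilities $1/2, 1/3, 1/6$, observed as $x = \theta + w$ with $w$ uniform on $\{-1, 0, +1\}$. It computes the posterior with Bayes' theorem (sums instead of integrals) and then searches a grid of candidate estimates for the one minimizing the posterior expected squared error. Exact fractions are used so the comparison is not blurred by rounding.

```python
from fractions import Fraction

# Hypothetical discrete example: theta in {0, 1, 2} with a non-uniform prior,
# observed as x = theta + w with w uniform on {-1, 0, +1}.
prior = {0: Fraction(1, 2), 1: Fraction(1, 3), 2: Fraction(1, 6)}

def likelihood(x, theta):
    """p(x | theta) for x = theta + w, w uniform on {-1, 0, +1}."""
    return Fraction(1, 3) if abs(x - theta) <= 1 else Fraction(0)

def posterior(x):
    """Bayes' theorem with sums in place of integrals."""
    joint = {t: likelihood(x, t) * p for t, p in prior.items()}   # p(x, theta)
    evidence = sum(joint.values())                                   # p(x)
    return {t: j / evidence for t, j in joint.items()}               # p(theta | x)

def posterior_cost(post, estimate):
    """Inner cost J' = sum over theta of (theta - estimate)^2 p(theta | x)."""
    return sum((t - estimate) ** 2 * p for t, p in post.items())

x = 2
post = posterior(x)
mmse = sum(t * p for t, p in post.items())        # E(theta | x)

# Brute-force search over candidate estimates 0.00, 0.01, ..., 2.00
candidates = [Fraction(k, 100) for k in range(201)]
best = min(candidates, key=lambda c: posterior_cost(post, c))
print(f"E(theta|x) = {mmse}, grid minimiser = {best}")   # E(theta|x) = 4/3, grid minimiser = 133/100
```

The grid minimizer is the grid point closest to the posterior mean, as expected from $J' = \mathrm{var}(\theta|x) + \left(E(\theta|x) - \hat{\theta}\right)^2$.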
Example 1:
Bayesian Estimation
It is often not possible to find closed-form solutions for the MMSE estimator. An exception is the case where $\mathbf{x}$ and $\theta$ are jointly Gaussian distributed.
Bayesian Estimation
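Example: DC level in WGN with Gaussian prior
$$x[n] = A + w[n], \qquad n = 0, 1, \ldots, N-1,$$
where $w[n]$ is WGN with variance $\sigma^2$, independent of $A$.
Gaussian prior with mean $\mu_A$ and variance $\sigma_A^2$:
$$p(A) = \frac{1}{\sqrt{2\pi\sigma_A^2}} \exp\left[-\frac{1}{2\sigma_A^2}(A - \mu_A)^2\right]$$
If
$$p(\mathbf{x}|A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left[-\frac{1}{2\sigma^2}\sum_{n=0}^{N-1}(x[n] - A)^2\right],$$
then $p(A|\mathbf{x})$ can be written as:
$$p(A|\mathbf{x}) = \frac{\exp\left[-\frac{1}{2}Q(A)\right]}{\int_{-\infty}^{\infty} \exp\left[-\frac{1}{2}Q(A)\right] dA}$$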
Bayesian Estimation
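with
$$Q(A) = \left(\frac{N}{\sigma^2} + \frac{1}{\sigma_A^2}\right)A^2 - 2\left(\frac{N}{\sigma^2}\bar{x} + \frac{\mu_A}{\sigma_A^2}\right)A + \frac{\mu_A^2}{\sigma_A^2},$$
where $\bar{x} = \frac{1}{N}\sum_{n=0}^{N-1} x[n]$; the term $\frac{1}{\sigma^2}\sum_{n=0}^{N-1} x^2[n]$, which does not depend on $A$, cancels between numerator and denominator and has been omitted.
Note that the denominator of $p(A|\mathbf{x})$ does not depend on $A$; it is a normalizing factor (normalizing the area below $p(A|\mathbf{x})$ to one), and the argument of the exponential is quadratic in $A$.
Hence $p(A|\mathbf{x})$ must be Gaussian. Completing the square in $Q(A)$ shows that its mean and variance are:
$$\mu_{A|x} = \frac{\frac{N}{\sigma^2}\bar{x} + \frac{\mu_A}{\sigma_A^2}}{\frac{N}{\sigma^2} + \frac{1}{\sigma_A^2}}, \qquad \sigma_{A|x}^2 = \frac{1}{\frac{N}{\sigma^2} + \frac{1}{\sigma_A^2}}$$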
Bayesian Estimation
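In this form, the Bayes MMSE estimator is readily found as
$$\hat{A} = E(A|\mathbf{x}) = \mu_{A|x} = \frac{\frac{N}{\sigma^2}\bar{x} + \frac{\mu_A}{\sigma_A^2}}{\frac{N}{\sigma^2} + \frac{1}{\sigma_A^2}}$$
For better interpretation this can be written as
$$\hat{A} = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2/N}\,\bar{x} + \frac{\sigma^2/N}{\sigma_A^2 + \sigma^2/N}\,\mu_A = \alpha\bar{x} + (1 - \alpha)\mu_A$$
with
$$\alpha = \frac{\sigma_A^2}{\sigma_A^2 + \sigma^2/N}.$$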
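A minimal sketch of the closed-form posterior for this example, assuming illustrative values $\mu_A = 1$, $\sigma_A^2 = 0.5$, $\sigma^2 = 2$, $N = 10$ (not from the slides). It evaluates $\mu_{A|x}$ and $\sigma_{A|x}^2$ and confirms that the weighted form $\alpha\bar{x} + (1 - \alpha)\mu_A$ gives the same estimate; the function names are hypothetical.

```python
import random

def dc_level_posterior(x, mu_A, var_A, var_w):
    """Posterior mean and variance of A for x[n] = A + w[n],
    A ~ N(mu_A, var_A), w[n] white Gaussian with variance var_w."""
    N = len(x)
    x_bar = sum(x) / N
    precision = N / var_w + 1.0 / var_A            # 1 / sigma^2_{A|x}
    var_post = 1.0 / precision
    mean_post = (N / var_w * x_bar + mu_A / var_A) * var_post
    return mean_post, var_post, x_bar

def weighted_form(x_bar, mu_A, var_A, var_w, N):
    """Same estimate written as alpha * x_bar + (1 - alpha) * mu_A."""
    alpha = var_A / (var_A + var_w / N)
    return alpha * x_bar + (1 - alpha) * mu_A, alpha

rng = random.Random(0)
mu_A, var_A, var_w, N = 1.0, 0.5, 2.0, 10        # assumed illustrative values
A = rng.gauss(mu_A, var_A ** 0.5)                # one realization of the random parameter
x = [A + rng.gauss(0.0, var_w ** 0.5) for _ in range(N)]

mean_post, var_post, x_bar = dc_level_posterior(x, mu_A, var_A, var_w)
A_hat, alpha = weighted_form(x_bar, mu_A, var_A, var_w, N)
print(f"true A = {A:.3f}, x_bar = {x_bar:.3f}, A_hat = {mean_post:.3f}, alpha = {alpha:.3f}")
```

The estimate always lies between the prior mean and the sample mean, since it is a convex combination of the two.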
Bayesian Estimation
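Note that $\alpha$ is a weighting factor since $0 < \alpha < 1$.
When there is little data available so that $\sigma_A^2 \ll \sigma^2/N$, then $\alpha$ is small and $\hat{A} \approx \mu_A$,
but as more data are observed so that $\sigma_A^2 \gg \sigma^2/N$, $\alpha \to 1$ and $\hat{A} \approx \bar{x}$.
The weighting factor directly depends on the confidence in the prior knowledge ($\sigma_A^2$) and the confidence in the sample data ($\sigma^2/N$).
If one examines the posterior PDF, its variance
$$\sigma_{A|x}^2 = \frac{1}{\frac{N}{\sigma^2} + \frac{1}{\sigma_A^2}}$$
decreases as $N$ increases.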
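The trend can be tabulated directly from the closed forms. The values $\sigma_A^2 = 0.5$ and $\sigma^2 = 2$ are arbitrary assumptions. The code also checks the identity $\sigma_{A|x}^2 = \alpha\,\sigma^2/N$, which follows by multiplying numerator and denominator of $1/(N/\sigma^2 + 1/\sigma_A^2)$ by $\sigma^2\sigma_A^2/N$: the posterior variance is the variance of the sample mean, shrunk by the same factor $\alpha$.

```python
def weight_and_posterior_variance(N, var_A, var_w):
    """alpha = var_A / (var_A + var_w / N) and sigma^2_{A|x} = 1 / (N / var_w + 1 / var_A)."""
    alpha = var_A / (var_A + var_w / N)
    var_post = 1.0 / (N / var_w + 1.0 / var_A)
    return alpha, var_post

var_A, var_w = 0.5, 2.0          # assumed prior and noise variances
rows = [(N, *weight_and_posterior_variance(N, var_A, var_w)) for N in (1, 10, 100, 1000)]
for N, alpha, var_post in rows:
    # var_post equals alpha * var_w / N: the sample-mean variance shrunk by alpha
    print(f"N = {N:4d}: alpha = {alpha:.4f}, var(A|x) = {var_post:.5f}, alpha*var_w/N = {alpha * var_w / N:.5f}")
```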
Bayesian Estimation
As we have seen, the posterior mean changes with increasing $N$:
for small $N$ it will approximately be $\mu_A$,
but it will approach $\bar{x}$ for increasing $N$.
Taken from Kay: Fundamentals of Statistical Signal Processing, Vol 1: Estimation Theory, Prentice Hall, Upper Saddle River 2009
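Bayesian Estimation: Vector Case
Theorem: If $\mathbf{x}$ and $\boldsymbol{\theta}$ are jointly Gaussian, where $\mathbf{x}$ is of dimension $k \times 1$ and $\boldsymbol{\theta}$ of dimension $l \times 1$, with mean vector $[E(\mathbf{x})^T\; E(\boldsymbol{\theta})^T]^T$ and partitioned covariance matrix
$$\mathbf{C} = \begin{bmatrix} \mathbf{C}_{xx} & \mathbf{C}_{x\theta} \\ \mathbf{C}_{\theta x} & \mathbf{C}_{\theta\theta} \end{bmatrix}$$
so that
$$p(\mathbf{x},\boldsymbol{\theta}) = \frac{1}{(2\pi)^{(k+l)/2} \det^{1/2}(\mathbf{C})} \exp\left[-\frac{1}{2}\begin{bmatrix} \mathbf{x} - E(\mathbf{x}) \\ \boldsymbol{\theta} - E(\boldsymbol{\theta}) \end{bmatrix}^T \mathbf{C}^{-1} \begin{bmatrix} \mathbf{x} - E(\mathbf{x}) \\ \boldsymbol{\theta} - E(\boldsymbol{\theta}) \end{bmatrix}\right],$$
then the conditional PDF $p(\boldsymbol{\theta}|\mathbf{x})$ is also Gaussian and
$$E(\boldsymbol{\theta}|\mathbf{x}) = E(\boldsymbol{\theta}) + \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\left(\mathbf{x} - E(\mathbf{x})\right)$$
$$\mathbf{C}_{\theta|x} = \mathbf{C}_{\theta\theta} - \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{C}_{x\theta}$$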
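A quick numerical sanity check of the theorem in the scalar case $k = l = 1$. We assume $\theta \sim \mathcal{N}(0.5, 1)$ and $x = \theta + w$ with independent $w \sim \mathcal{N}(0, 1)$, so that $C_{xx} = 2$ and $C_{\theta x} = 1$; the formula then predicts $E(\theta|x = 1.5) = 1$ and $C_{\theta|x} = 0.5$. The Monte Carlo part approximates $E(\theta|x)$ by averaging $\theta$ over samples whose $x$ lands in a narrow bin around $x_0$; bin width and sample count are arbitrary choices.

```python
import random

# Jointly Gaussian scalar pair (assumed example): theta ~ N(mu_t, c_tt),
# x = theta + w with w ~ N(0, var_w) independent, so C_xx = c_tt + var_w and C_theta_x = c_tt.
mu_t, c_tt, var_w = 0.5, 1.0, 1.0
mu_x, c_xx, c_tx = mu_t, c_tt + var_w, c_tt

def conditional_mean(x0):
    """Theorem: E(theta|x) = E(theta) + C_theta_x C_xx^{-1} (x - E(x))."""
    return mu_t + c_tx / c_xx * (x0 - mu_x)

conditional_var = c_tt - c_tx * c_tx / c_xx      # C_{theta|x}

# Monte Carlo: average theta over samples whose x falls in a narrow bin around x0
rng = random.Random(1)
x0, half_width = 1.5, 0.05
hits = []
for _ in range(200_000):
    theta = rng.gauss(mu_t, c_tt ** 0.5)
    x = theta + rng.gauss(0.0, var_w ** 0.5)
    if abs(x - x0) < half_width:
        hits.append(theta)
empirical = sum(hits) / len(hits)
print(f"formula: {conditional_mean(x0):.3f}, Monte Carlo: {empirical:.3f} ({len(hits)} samples)")
```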
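Bayesian Estimation: Vector Case
Theorem: If the observed data $\mathbf{x}$ can be modeled as
$$\mathbf{x} = \mathbf{H}\boldsymbol{\theta} + \mathbf{w},$$
where $\mathbf{x}$ is an $N \times 1$ data vector, $\mathbf{H}$ is a known $N \times p$ matrix, $\boldsymbol{\theta}$ is a $p \times 1$ random vector with prior PDF $\mathcal{N}(\boldsymbol{\mu}_\theta, \mathbf{C}_\theta)$, and $\mathbf{w}$ is an $N \times 1$ noise vector with PDF $\mathcal{N}(\mathbf{0}, \mathbf{C}_w)$ and independent of $\boldsymbol{\theta}$, then the posterior PDF $p(\boldsymbol{\theta}|\mathbf{x})$ is Gaussian with mean
$$E(\boldsymbol{\theta}|\mathbf{x}) = \boldsymbol{\mu}_\theta + \mathbf{C}_\theta\mathbf{H}^T\left(\mathbf{H}\mathbf{C}_\theta\mathbf{H}^T + \mathbf{C}_w\right)^{-1}\left(\mathbf{x} - \mathbf{H}\boldsymbol{\mu}_\theta\right)$$
and covariance
$$\mathbf{C}_{\theta|x} = \mathbf{C}_\theta - \mathbf{C}_\theta\mathbf{H}^T\left(\mathbf{H}\mathbf{C}_\theta\mathbf{H}^T + \mathbf{C}_w\right)^{-1}\mathbf{H}\mathbf{C}_\theta.$$
In contrast to the classical general linear model, $\mathbf{H}$ need not be full rank to ensure the invertibility of $\mathbf{H}\mathbf{C}_\theta\mathbf{H}^T + \mathbf{C}_w$ (it suffices that $\mathbf{C}_w$ is positive definite).
Note that $\mathbf{C}_{\theta|x}$ is also the covariance matrix of the estimation error $\boldsymbol{\epsilon} = \boldsymbol{\theta} - E(\boldsymbol{\theta}|\mathbf{x})$ (which has zero mean).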
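The theorem can be coded directly. The sketch below is plain Python with a small hand-written Gaussian-elimination helper `solve` (no linear-algebra library assumed); `bayesian_linear_model` is an illustrative name. As a consistency check it sets $\mathbf{H} = \mathbf{1}$ ($N \times 1$), $\mathbf{C}_\theta = \sigma_A^2$ and $\mathbf{C}_w = \sigma^2\mathbf{I}$, i.e. the DC level in WGN, where the result should coincide with the scalar posterior mean and variance of that example. The numerical values are assumptions.

```python
import random

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(M)
    A = [list(row) + [b[i]] for i, row in enumerate(M)]   # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (A[r][n] - sum(A[r][c] * z[c] for c in range(r + 1, n))) / A[r][r]
    return z

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def bayesian_linear_model(x, H, mu, C_theta, C_w):
    """Posterior mean and covariance of theta for x = H theta + w,
    theta ~ N(mu, C_theta), w ~ N(0, C_w) independent of theta."""
    N, p = len(H), len(mu)
    HC = matmul(H, C_theta)                                   # H C_theta        (N x p)
    S = matmul(HC, transpose(H))                              # H C_theta H^T    (N x N)
    S = [[S[i][j] + C_w[i][j] for j in range(N)] for i in range(N)]
    resid = [x[i] - sum(H[i][k] * mu[k] for k in range(p)) for i in range(N)]   # x - H mu
    z = solve(S, resid)                                       # S^{-1} (x - H mu)
    gain = transpose(HC)                                      # C_theta H^T (C_theta symmetric)
    mean = [mu[i] + sum(gain[i][k] * z[k] for k in range(N)) for i in range(p)]
    # Columns of S^{-1} H C_theta, then C_theta - C_theta H^T S^{-1} H C_theta
    cols = [solve(S, [HC[k][j] for k in range(N)]) for j in range(p)]
    cov = [[C_theta[i][j] - sum(gain[i][k] * cols[j][k] for k in range(N)) for j in range(p)]
           for i in range(p)]
    return mean, cov

# Check against the scalar DC-level result: H is a column of ones (assumed values)
rng = random.Random(4)
N, mu_A, var_A, var_w = 6, 1.0, 0.5, 2.0
A = rng.gauss(mu_A, var_A ** 0.5)
x = [A + rng.gauss(0.0, var_w ** 0.5) for _ in range(N)]
H = [[1.0] for _ in range(N)]
C_w = [[var_w if i == j else 0.0 for j in range(N)] for i in range(N)]
mean, cov = bayesian_linear_model(x, H, [mu_A], [[var_A]], C_w)

x_bar = sum(x) / N
var_scalar = 1.0 / (N / var_w + 1.0 / var_A)
mean_scalar = (N / var_w * x_bar + mu_A / var_A) * var_scalar
print(f"vector formula: {mean[0]:.6f}, {cov[0][0]:.6f}; scalar formula: {mean_scalar:.6f}, {var_scalar:.6f}")
```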
Linear Bayesian Estimation
Linear MMSE Estimation
Except when $\mathbf{x}$ and $\theta$ are jointly Gaussian, the MMSE estimator may be difficult to find.
The situation is different when we constrain the estimator to be linear in $\mathbf{x}$.
As will be seen shortly, we do not have to assume any specific form for the joint PDF $p(\mathbf{x},\theta)$; knowledge of the first two moments is sufficient to derive the LMMSE estimator (compare to the BLUE).
That $\theta$ may be estimated from $\mathbf{x}$ at all is due to the assumed statistical dependence of $\theta$ on $\mathbf{x}$, as summarized by the joint PDF $p(\mathbf{x},\theta)$.
In particular, for a linear estimator we rely on the correlation between $\theta$ and $\mathbf{x}$.
Linear Bayesian Estimation
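Introductory example: Assume $x$ and $\theta$ are jointly distributed. Find the linear (actually affine) estimator
$$\hat{\theta} = ax + b$$
that minimizes the Bayesian MSE:
$$J = E_{x,\theta}\left[(\theta - \hat{\theta})^2\right]$$
Solution:
$$\hat{\theta} = E(\theta) + \frac{\mathrm{cov}(x,\theta)}{\mathrm{var}(x)}\left(x - E(x)\right)$$
Example: Derive the LMMSE estimator and the MSE for Example 1.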
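The affine solution can be checked empirically. In this sketch the joint distribution is deliberately non-Gaussian (assumed: $\theta \sim \mathcal{U}[0, 2]$ and $x = \theta^2 + u$ with $u \sim \mathcal{U}[-0.5, 0.5]$), and the moments are replaced by sample moments. With that substitution, $a = \mathrm{cov}(x,\theta)/\mathrm{var}(x)$ and $b = E(\theta) - aE(x)$ are the exact minimizers of the sample average of $(\theta - ax - b)^2$, so nudging either one can only increase it.

```python
import random

rng = random.Random(2)
# Assumed non-Gaussian pair: theta uniform on [0, 2], x = theta^2 + uniform noise
pairs = []
for _ in range(10_000):
    theta = rng.uniform(0.0, 2.0)
    x = theta ** 2 + rng.uniform(-0.5, 0.5)
    pairs.append((x, theta))

# Sample moments (population normalization 1/M)
M = len(pairs)
mx = sum(x for x, _ in pairs) / M
mt = sum(t for _, t in pairs) / M
var_x = sum((x - mx) ** 2 for x, _ in pairs) / M
cov_xt = sum((x - mx) * (t - mt) for x, t in pairs) / M

a = cov_xt / var_x          # slope  cov(x, theta) / var(x)
b = mt - a * mx             # offset E(theta) - a E(x)

def empirical_bmse(a, b):
    """Sample average of (theta - (a x + b))^2 over all pairs."""
    return sum((t - (a * x + b)) ** 2 for x, t in pairs) / M

best = empirical_bmse(a, b)
print(f"a = {a:.4f}, b = {b:.4f}, Bayesian MSE ~ {best:.4f}")
```

Only the first two moments entered the computation; the quadratic relation between $x$ and $\theta$ is captured only through their correlation, which is exactly the limitation of a linear estimator.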
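Linear Bayesian Estimation: Scalar Parameter
Aim: Find the (affine linear) estimator of the form
$$\hat{\theta} = \sum_{n=0}^{N-1} a_n x[n] + a_N$$
that minimizes the Bayesian MSE
$$J = E_{\mathbf{x},\theta}\left[(\theta - \hat{\theta})^2\right].$$
The offset $a_N$ compensates for nonzero means of $\mathbf{x}$ and $\theta$; it is omitted when both means are zero.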
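Linear Bayesian Estimation: Scalar Parameter
Deriving the optimal weighting coefficients:
Starting with $a_N$:
$$\frac{\partial J}{\partial a_N} = \frac{\partial}{\partial a_N} E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n] - a_N\right)^2\right] = -2E\left[\theta - \sum_{n=0}^{N-1} a_n x[n] - a_N\right]$$
Setting to zero results in:
$$a_N = E(\theta) - \sum_{n=0}^{N-1} a_n E(x[n])$$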
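Linear Bayesian Estimation: Scalar Parameter
Continuing for the remaining coefficients $a_n$ (after substituting $a_N$ into $J$):
$$J = E\left[\left(\theta - \sum_{n=0}^{N-1} a_n x[n] - E(\theta) + \sum_{n=0}^{N-1} a_n E(x[n])\right)^2\right] = E\left[\left(\sum_{n=0}^{N-1} a_n\left(x[n] - E(x[n])\right) - \left(\theta - E(\theta)\right)\right)^2\right]$$
Writing the sums as inner vector products with $\mathbf{a} = [a_0, a_1, \ldots, a_{N-1}]^T$ leads to
$$J = E\left\{\left[\mathbf{a}^T\left(\mathbf{x} - E(\mathbf{x})\right) - \left(\theta - E(\theta)\right)\right]^2\right\}$$
$$= E\left[\mathbf{a}^T(\mathbf{x} - E(\mathbf{x}))(\mathbf{x} - E(\mathbf{x}))^T\mathbf{a}\right] - E\left[\mathbf{a}^T(\mathbf{x} - E(\mathbf{x}))(\theta - E(\theta))\right] - E\left[(\theta - E(\theta))(\mathbf{x} - E(\mathbf{x}))^T\mathbf{a}\right] + E\left[(\theta - E(\theta))^2\right]$$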
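Linear Bayesian Estimation: Scalar Parameter
$$J = \mathbf{a}^T\mathbf{C}_{xx}\mathbf{a} - \mathbf{a}^T\mathbf{c}_{x\theta} - \mathbf{c}_{\theta x}\mathbf{a} + C_{\theta\theta} = \mathbf{a}^T\mathbf{C}_{xx}\mathbf{a} - 2\mathbf{a}^T\mathbf{c}_{x\theta} + C_{\theta\theta},$$
where $\mathbf{C}_{xx}$ is the $N \times N$ covariance matrix of $\mathbf{x}$, $\mathbf{c}_{\theta x}$ is the $1 \times N$ cross-covariance vector having the property $\mathbf{c}_{\theta x}^T = \mathbf{c}_{x\theta}$, and $C_{\theta\theta}$ is the variance of $\theta$.
Taking the gradient yields
$$\frac{\partial J}{\partial \mathbf{a}} = 2\mathbf{C}_{xx}\mathbf{a} - 2\mathbf{c}_{x\theta};$$
setting it to zero results in
$$\mathbf{a} = \mathbf{C}_{xx}^{-1}\mathbf{c}_{x\theta}.$$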
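The gradient step can be verified numerically on a small example. The $2 \times 2$ moments below ($\mathbf{C}_{xx}$, $\mathbf{c}_{x\theta}$, $C_{\theta\theta}$) are made-up values chosen so that $\mathbf{C}_{xx}$ is positive definite. The sketch compares the analytic gradient $2\mathbf{C}_{xx}\mathbf{a} - 2\mathbf{c}_{x\theta}$ with a central finite difference and confirms that it vanishes at $\mathbf{a} = \mathbf{C}_{xx}^{-1}\mathbf{c}_{x\theta}$ (computed with the explicit $2 \times 2$ inverse).

```python
def J(a, C, c, c_tt):
    """Quadratic cost J(a) = a^T C a - 2 a^T c + C_theta_theta."""
    n = len(a)
    quad = sum(a[i] * C[i][j] * a[j] for i in range(n) for j in range(n))
    return quad - 2 * sum(ai * ci for ai, ci in zip(a, c)) + c_tt

def grad(a, C, c):
    """Analytic gradient 2 C a - 2 c (valid for symmetric C)."""
    n = len(a)
    return [2 * sum(C[i][j] * a[j] for j in range(n)) - 2 * c[i] for i in range(n)]

def numeric_grad(a, C, c, c_tt, h=1e-6):
    """Central finite differences of J, one coordinate at a time."""
    g = []
    for i in range(len(a)):
        ap, am = a[:], a[:]
        ap[i] += h
        am[i] -= h
        g.append((J(ap, C, c, c_tt) - J(am, C, c, c_tt)) / (2 * h))
    return g

# Assumed example moments
C = [[2.0, 0.5], [0.5, 1.0]]     # C_xx (symmetric, positive definite)
c = [1.0, 0.3]                   # c_x_theta
c_tt = 1.5                       # C_theta_theta
a_test = [0.3, -0.2]

# Optimum a* = C^{-1} c via the explicit 2 x 2 inverse
det = C[0][0] * C[1][1] - C[0][1] * C[1][0]
a_star = [(C[1][1] * c[0] - C[0][1] * c[1]) / det, (C[0][0] * c[1] - C[1][0] * c[0]) / det]
print("analytic:", grad(a_test, C, c), "numeric:", numeric_grad(a_test, C, c, c_tt))
print("a* =", a_star, "J(a*) =", J(a_star, C, c, c_tt))
```

At the optimum, $J(\mathbf{a}^\ast) = C_{\theta\theta} - \mathbf{c}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{c}_{x\theta}$, since $\mathbf{a}^{\ast T}\mathbf{C}_{xx}\mathbf{a}^\ast = \mathbf{a}^{\ast T}\mathbf{c}_{x\theta}$.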
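Linear Bayesian Estimation: Scalar Parameter
Combining with the result for $a_N$ leads to
$$\hat{\theta} = E(\theta) + \mathbf{c}_{\theta x}\mathbf{C}_{xx}^{-1}\left(\mathbf{x} - E(\mathbf{x})\right)$$
as the LMMSE estimator and the corresponding BMSE of
$$\mathrm{BMSE}(\hat{\theta}) = C_{\theta\theta} - \mathbf{c}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{c}_{x\theta}.$$
Note that this is identical in form to the MMSE estimator for jointly Gaussian $\mathbf{x}$ and $\theta$. This is because in the Gaussian case the MMSE estimator happens to be linear, and hence our constraint is automatically satisfied.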
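An empirical sketch of the scalar-parameter LMMSE estimator with $N = 3$ observations. Assumptions (not from the slides): $\theta \sim \mathcal{U}[-1, 1]$ and $x[n] = \theta + u_n$ with independent uniform noise of half-widths 0.5, 1 and 2, so nothing is Gaussian. All moments are replaced by sample moments; with that substitution, the sample-average squared error of the estimator equals $C_{\theta\theta} - \mathbf{c}_{\theta x}\mathbf{C}_{xx}^{-1}\mathbf{c}_{x\theta}$ up to rounding, which is what the final lines compare. `solve` is a small hand-written Gaussian-elimination helper.

```python
import random

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(M)
    A = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (A[r][n] - sum(A[r][c] * z[c] for c in range(r + 1, n))) / A[r][r]
    return z

rng = random.Random(5)
half_widths = [0.5, 1.0, 2.0]            # assumed noise levels for x[0], x[1], x[2]
N, M = len(half_widths), 5000
samples = []
for _ in range(M):
    theta = rng.uniform(-1.0, 1.0)                            # non-Gaussian prior (assumed)
    x = [theta + rng.uniform(-h, h) for h in half_widths]     # non-Gaussian noise (assumed)
    samples.append((x, theta))

# First and second moments estimated from the samples (normalization 1/M)
mx = [sum(x[n] for x, _ in samples) / M for n in range(N)]
mt = sum(t for _, t in samples) / M
C_xx = [[sum((x[i] - mx[i]) * (x[j] - mx[j]) for x, _ in samples) / M for j in range(N)]
        for i in range(N)]
c_xt = [sum((x[i] - mx[i]) * (t - mt) for x, t in samples) / M for i in range(N)]
C_tt = sum((t - mt) ** 2 for _, t in samples) / M

a = solve(C_xx, c_xt)                                          # a = C_xx^{-1} c_{x theta}
a_N = mt - sum(ai * mi for ai, mi in zip(a, mx))               # a_N = E(theta) - a^T E(x)
bmse_formula = C_tt - sum(ci * ai for ci, ai in zip(c_xt, a))  # C_tt - c_{theta x} C_xx^{-1} c_{x theta}

def estimate(x):
    """LMMSE estimate a^T x + a_N."""
    return sum(ai * xi for ai, xi in zip(a, x)) + a_N

bmse_empirical = sum((t - estimate(x)) ** 2 for x, t in samples) / M
print(f"a = {[round(v, 3) for v in a]}, a_N = {a_N:.3f}")
print(f"BMSE formula = {bmse_formula:.5f}, empirical = {bmse_empirical:.5f}, var(theta) = {C_tt:.5f}")
```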
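Linear Bayesian Estimation: Vector Parameter
The vector LMMSE estimator is a straightforward extension of the scalar one.
We wish to find the linear estimator
$$\hat{\theta}_i = \sum_{n=0}^{N-1} a_{in} x[n] + a_{iN}$$
that minimizes the Bayesian MSE for each element, for $i = 1, 2, \ldots, p$, and choose the weighting coefficients to minimize
$$J_i = E\left[(\theta_i - \hat{\theta}_i)^2\right].$$
Combining the scalar LMMSE estimators leads to
$$\hat{\boldsymbol{\theta}} = E(\boldsymbol{\theta}) + \mathbf{C}_{\theta x}\mathbf{C}_{xx}^{-1}\left(\mathbf{x} - E(\mathbf{x})\right),$$
where the $i$-th row of the $p \times N$ matrix $\mathbf{C}_{\theta x}$ is $\mathbf{c}_{\theta_i x}$, and
$$\mathrm{BMSE}(\hat{\theta}_i) = C_{\theta_i\theta_i} - \mathbf{c}_{\theta_i x}\mathbf{C}_{xx}^{-1}\mathbf{c}_{x\theta_i}.$$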
Linear Bayesian Estimation: Vector Parameter
Problem: inverse system identification
Let the following communication scenario be given:
Data samples $y[k]$ ($+1$ or $-1$, uncorrelated, zero mean, $\sigma_y^2 = 1$) are transmitted through a discrete-time linear system, given by its impulse response $\mathbf{h} = [h_0, \ldots, h_{l-1}]$. After that, additive white Gaussian noise $n[k]$ (zero mean, variance $\sigma_n^2$) is added. Your task is to:
Find the best linear system $\mathbf{w} = [w_0, \ldots, w_{p-1}]$ in an LMMSE sense to estimate the data.
Write down the estimator using the hints on the next slide.
Write a Matlab script simulating the system with $l = 4$ and $p = 4$.
Vary $\sigma_n^2$ from 0.001 to 1 and observe the results.
Linear Bayesian Estimation: Vector Parameter
Hints:
As we will see in the following lectures, for uncorrelated data samples and uncorrelated noise with zero means and variances $\sigma_y^2$ and $\sigma_n^2$, respectively, we have
$$\mathbf{R}_{xx} = \sigma_y^2\mathbf{H}\mathbf{H}^H + \sigma_n^2\mathbf{I}$$
as the autocorrelation matrix of the samples and
$$\mathbf{r}_{xy} = \sigma_y^2\mathbf{H}\mathbf{e}_i$$
as the cross-correlation vector of $\mathbf{x}$ and $y$. $\mathbf{e}_i$ is the vector that has a one at position $i$ and zeros at all other elements. Choose $i = l + 1$ and the length of $\mathbf{e}_i$ as 7.
$\mathbf{H}$ is the convolution matrix of $\mathbf{h}$. Use convmtx(h,l) to obtain $\mathbf{H}$ in Matlab.
Please be aware that the output sequence after the filter $\mathbf{w}$ is shifted by $i = l + 1$ samples.
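A possible sketch of the exercise, written in Python to keep all examples in one language (the slide asks for Matlab). Assumptions, none of them from the slides: a real-valued channel, so $\mathbf{H}^H = \mathbf{H}^T$; the hypothetical channel $\mathbf{h} = [0.3, 1.0, 0.5, 0.2]$; the received window ordered as $[x[k], x[k-1], \ldots, x[k-p+1]]^T$ with $\mathbf{H}$ of size $p \times (p + l - 1) = 4 \times 7$ (our reading of the convmtx(h,l) hint with $l = p = 4$); and the one in $\mathbf{e}_i$ at 0-based index $d = l$, i.e. 1-based position $l + 1$ as in the hint. The names `conv_matrix`, `lmmse_equalizer` and `simulate` are illustrative. Since all means are zero, the offset term drops out and $\mathbf{w} = \mathbf{R}_{xx}^{-1}\mathbf{r}_{xy}$ is the scalar LMMSE weight vector with $\theta = y[k - d]$.

```python
import random

def solve(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting (small dense systems)."""
    n = len(M)
    A = [list(row) + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (A[r][n] - sum(A[r][c] * z[c] for c in range(r + 1, n))) / A[r][r]
    return z

def conv_matrix(h, p):
    """p x (p + l - 1) convolution matrix: row r holds h shifted right by r columns
    (intended to mirror the layout of Matlab's convmtx(h, p) for a row vector h)."""
    l = len(h)
    return [[h[c - r] if 0 <= c - r < l else 0.0 for c in range(p + l - 1)] for r in range(p)]

def lmmse_equalizer(h, p, d, var_y, var_n):
    """w = R_xx^{-1} r_xy with R_xx = var_y H H^T + var_n I and r_xy = var_y H e_d
    (real h, so H^H = H^T); d is the 0-based position of the one in e_d."""
    H = conv_matrix(h, p)
    R = [[var_y * sum(u * v for u, v in zip(H[i], H[j])) + (var_n if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    r = [var_y * H[i][d] for i in range(p)]
    w = solve(R, r)
    mse = var_y - sum(ri * wi for ri, wi in zip(r, w))     # BMSE = var_y - r^T R^{-1} r
    return w, mse, R, r

def simulate(h, w, d, var_n, K, rng):
    """Send K random +/-1 symbols through h, add AWGN, filter with w, compare with y[k - d]."""
    l, p = len(h), len(w)
    y = [rng.choice((-1.0, 1.0)) for _ in range(K)]
    x = [sum(h[j] * y[k - j] for j in range(l) if k >= j) + rng.gauss(0.0, var_n ** 0.5)
         for k in range(K)]
    sq_err, errors, count = 0.0, 0, 0
    for k in range(p + l, K):
        y_hat = sum(w[i] * x[k - i] for i in range(p))     # window [x[k], ..., x[k-p+1]]
        sq_err += (y_hat - y[k - d]) ** 2
        errors += (y_hat >= 0.0) != (y[k - d] > 0.0)
        count += 1
    return sq_err / count, errors / count

h = [0.3, 1.0, 0.5, 0.2]     # hypothetical channel, l = 4
p, d = 4, len(h)             # d = l (0-based) corresponds to the hint's 1-based position l + 1
rng = random.Random(6)
results = {}
for var_n in (0.001, 0.01, 0.1, 1.0):
    w, mse, R, r = lmmse_equalizer(h, p, d, 1.0, var_n)
    sim_mse, ser = simulate(h, w, d, var_n, 20_000, rng)
    results[var_n] = (w, mse, sim_mse, ser, R, r)
    print(f"var_n = {var_n:<6} theory MSE = {mse:.4f}  simulated MSE = {sim_mse:.4f}  symbol error rate = {ser:.4f}")
```

The theoretical MSE $\sigma_y^2 - \mathbf{r}_{xy}^T\mathbf{R}_{xx}^{-1}\mathbf{r}_{xy}$ should grow with $\sigma_n^2$; trying other delays $d$ is a simple way to see how strongly the result depends on the choice of $i$.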