
Transcript of ASSP8_


    Page 3

    Bayesian Estimation

Classical approach in statistical estimation: θ is assumed to be a deterministic but unknown constant.

Bayesian approach: we assume that θ is a random variable whose particular realization we must estimate.

This is the Bayesian approach, so named because its implementation is based directly on Bayes' theorem.

Prior knowledge about θ can be incorporated into our estimator by assuming that θ is a random variable with a given prior PDF.


    Page 4

    Bayesian Estimation

MSE in Classical Estimation:

$\mathrm{MSE}(\hat\theta) = E\left[(\hat\theta - \theta)^2\right] = \int (\hat\theta - \theta)^2\, p(\mathbf{x})\, d\mathbf{x}$

MSE in Bayesian Estimation:

$\mathrm{BMSE}(\hat\theta) = E\left[(\hat\theta - \theta)^2\right] = \iint (\hat\theta - \theta)^2\, p(\mathbf{x}, \theta)\, d\mathbf{x}\, d\theta$

The difference: in the Bayesian approach the averaging PDF is the joint PDF of x and θ, while the averaging PDF in the classical approach is p(x).
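To make the difference concrete, here is a minimal Matlab sketch (all values are illustrative assumptions: a DC level A in white Gaussian noise, estimated by the sample mean, anticipating the example two slides ahead):

    % Classical MSE (A fixed) vs. Bayesian MSE (A drawn from a prior).
    % Assumed setup: x[n] = A + w[n], w ~ N(0, sigma2), estimator = sample mean.
    N = 10; sigma2 = 1; A0 = 2; M = 1e5;

    % Classical: one MSE per fixed value of A (here A = 1.5)
    A = 1.5;
    x = A + sqrt(sigma2) * randn(M, N);
    mse_classical = mean((mean(x, 2) - A).^2)   % ~ sigma2/N

    % Bayesian: a single MSE, averaging over the prior A ~ U[-A0, A0] as well
    A = -A0 + 2*A0*rand(M, 1);
    x = A + sqrt(sigma2) * randn(M, N);
    bmse = mean((mean(x, 2) - A).^2)            % also ~ sigma2/N for this estimator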


    Page 5

    Bayesian Estimation

Underlying experiment (for our usual simple DC level in noise, where we assume U[-A0, A0] as the prior PDF).

Classical approach: one MSE for each value of A.

Bayesian approach: one single MSE (which is an average over the PDF of A).

    Taken from Kay: Fundamentals of Statistical Signal Processing, Vol 1: Estimation Theory, Prentice Hall, Upper Saddle River 2009

Page 6

    Bayesian Estimation

Derivation of the Bayesian MMSE estimator

Cost function: the Bayesian MSE

$J = \iint (\hat\theta - \theta)^2\, p(\mathbf{x}, \theta)\, d\mathbf{x}\, d\theta$

Note: the averaging PDF is the joint PDF of x and θ!

We apply Bayes' theorem

$p(\mathbf{x}, \theta) = p(\theta \mid \mathbf{x})\, p(\mathbf{x})$

to obtain

$J = \int \left[ \int (\hat\theta - \theta)^2\, p(\theta \mid \mathbf{x})\, d\theta \right] p(\mathbf{x})\, d\mathbf{x}$

Since p(x) ≥ 0 for all x, if the integral in brackets can be minimized for each x, then the Bayesian MSE will be minimized.

Page 7

    Bayesian Estimation

Hence, fixing x so that θ̂ is a scalar variable (as opposed to a general function of x), we have

$J' = \int (\hat\theta - \theta)^2\, p(\theta \mid \mathbf{x})\, d\theta$

$\frac{\partial J'}{\partial \hat\theta} = \int 2(\hat\theta - \theta)\, p(\theta \mid \mathbf{x})\, d\theta = 2\hat\theta \int p(\theta \mid \mathbf{x})\, d\theta - 2 \int \theta\, p(\theta \mid \mathbf{x})\, d\theta = 2\hat\theta - 2E(\theta \mid \mathbf{x})$

which when set to zero results in

$\hat\theta = E(\theta \mid \mathbf{x}).$
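Since the optimum is the posterior mean, a quick numeric check can be run on a grid. A minimal Matlab sketch with a hypothetical posterior (the mean 0.7 and variance 0.2 are arbitrary example values):

    % For any candidate estimate c, the conditional cost E[(c - theta)^2 | x]
    % is minimized at c = E[theta | x]. Check this on a grid.
    theta = linspace(-3, 3, 601);
    post  = exp(-0.5*(theta - 0.7).^2 / 0.2);   % unnormalized example posterior
    post  = post / trapz(theta, post);          % normalize on the grid

    c    = linspace(-2, 2, 401);                % candidate estimates
    cost = arrayfun(@(ci) trapz(theta, (ci - theta).^2 .* post), c);
    [~, imin] = min(cost);
    [c(imin), trapz(theta, theta .* post)]      % both are ~0.7, the posterior mean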

Page 8

    Bayesian Estimation

    Comments:

It is seen that the optimal estimator in terms of minimizing the Bayesian MSE is the mean of the posterior PDF p(θ|x).

The posterior PDF refers to the PDF of θ after the data have been observed. It summarizes our new state of knowledge about the parameter.

In contrast, p(θ) may be thought of as the prior PDF of θ, indicating the PDF before the data are observed.

We will term the estimator that minimizes the Bayesian MSE the minimum mean square error (MMSE) estimator.

Intuitively, the effect of observing data will be to concentrate the PDF of θ.

This is because knowledge of the data should reduce our uncertainty about θ.

Page 9

    Bayesian Estimation

    Comments:

The MMSE estimator will in general depend on the prior knowledge as well as the data.

If the prior knowledge is weak relative to that of the data, then the estimator will ignore the prior knowledge.

Otherwise, the estimator will be "biased" towards the prior mean. As expected, the use of prior information always improves the estimation accuracy.

The choice of a prior PDF is critical in Bayesian estimation. The wrong choice will result in a poor estimator, similar to the problems of a classical estimator designed with an incorrect data model.

Remember: the classical MSE will depend on θ, hence estimators that attempt to minimize the MSE will usually depend on θ. The Bayesian MSE will not! In effect, the parameter dependency has been integrated away.

Page 10

    Bayesian Estimation

We derived the MMSE estimator for the case of continuous random variables.

We notice that the same estimator also holds for discrete random variables.

    Example 1:

Page 11

    Bayesian Estimation

It is often not possible to find closed-form solutions for the MMSE estimator. An exception is the case where x and θ are jointly Gaussian distributed.


    Page 12

    Bayesian Estimation

    Example: DC level in WGN with Gaussian prior

    x[n] = A + w[n]

Gaussian prior (shown in the slide's figure with μ_A = 0):

$p(A) = \frac{1}{\sqrt{2\pi\sigma_A^2}} \exp\left( -\frac{1}{2\sigma_A^2} (A - \mu_A)^2 \right)$

If

$p(\mathbf{x} \mid A) = \frac{1}{(2\pi\sigma^2)^{N/2}} \exp\left( -\frac{1}{2\sigma^2} \sum_{n=0}^{N-1} \left( x[n] - A \right)^2 \right)$

then p(A|x) can be written as:

$p(A \mid \mathbf{x}) = \frac{\exp\left( -\frac{1}{2} Q(A) \right)}{\int \exp\left( -\frac{1}{2} Q(A) \right) dA}$


    Page 13

    Bayesian Estimation

with

$Q(A) = \left( \frac{N}{\sigma^2} + \frac{1}{\sigma_A^2} \right) A^2 - 2 \left( \frac{N\bar{x}}{\sigma^2} + \frac{\mu_A}{\sigma_A^2} \right) A + \frac{\mu_A^2}{\sigma_A^2}$

Note that the denominator of p(A|x) does not depend on A any more, being a normalizing factor (normalizing the area below p(A|x)), and the argument of the exponential is quadratic in A.

Hence p(A|x) must be Gaussian. It can be shown that its mean and variance are:

$\mu_{A|x} = \frac{\dfrac{N}{\sigma^2}\,\bar{x} + \dfrac{\mu_A}{\sigma_A^2}}{\dfrac{N}{\sigma^2} + \dfrac{1}{\sigma_A^2}} \qquad\qquad \sigma_{A|x}^2 = \frac{1}{\dfrac{N}{\sigma^2} + \dfrac{1}{\sigma_A^2}}$
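These closed-form moments can be checked by evaluating the posterior on a grid. A minimal Matlab sketch (muA, sigA2, sigma2 and the true A are assumed example values; implicit expansion requires R2016b or newer):

    % Grid posterior vs. closed-form posterior mean/variance.
    N = 20; sigma2 = 1; muA = 1; sigA2 = 0.5;
    x = 0.5 + sqrt(sigma2) * randn(N, 1);       % data with (assumed) true A = 0.5
    xbar = mean(x);

    A = linspace(-3, 3, 2001);
    logpost = -sum((x - A).^2, 1)/(2*sigma2) - (A - muA).^2/(2*sigA2);
    post = exp(logpost - max(logpost));
    post = post / trapz(A, post);

    muGrid  = trapz(A, A .* post);
    varGrid = trapz(A, (A - muGrid).^2 .* post);
    muForm  = (N*xbar/sigma2 + muA/sigA2) / (N/sigma2 + 1/sigA2);
    varForm = 1 / (N/sigma2 + 1/sigA2);
    [muGrid muForm; varGrid varForm]            % the two columns should match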


    Page 14

    Bayesian Estimation

In this form, the Bayes MMSE estimator is readily found as

$\hat{A} = E(A \mid \mathbf{x}) = \mu_{A|x} = \frac{\dfrac{N}{\sigma^2}\,\bar{x} + \dfrac{\mu_A}{\sigma_A^2}}{\dfrac{N}{\sigma^2} + \dfrac{1}{\sigma_A^2}}$

For better interpretation this can be written as

$\hat{A} = \frac{\sigma_A^2}{\sigma_A^2 + \dfrac{\sigma^2}{N}}\,\bar{x} + \frac{\dfrac{\sigma^2}{N}}{\sigma_A^2 + \dfrac{\sigma^2}{N}}\,\mu_A = \alpha\,\bar{x} + (1 - \alpha)\,\mu_A$

with

$\alpha = \frac{\sigma_A^2}{\sigma_A^2 + \dfrac{\sigma^2}{N}}$
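A minimal Matlab sketch of this estimator (all values are illustrative assumptions), showing how Â moves from the prior mean towards the sample mean as N grows:

    % Shrinkage behaviour of the Gaussian-prior MMSE estimator.
    sigma2 = 1; muA = 1; sigA2 = 0.1; Atrue = 0;   % assumed example values
    for N = [1 5 20 100 1000]
        x     = Atrue + sqrt(sigma2) * randn(N, 1);
        alpha = sigA2 / (sigA2 + sigma2/N);        % weighting factor
        Ahat  = alpha * mean(x) + (1 - alpha) * muA;
        fprintf('N = %4d   alpha = %.3f   Ahat = %+.3f\n', N, alpha, Ahat);
    end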


    Page 15

    Bayesian Estimation

Note that α is a weighting factor, since 0 < α < 1.

When there is little data available, so that σ_A² ≪ σ²/N, then α is small and Â ≈ μ_A; but as more data are observed, so that σ_A² ≫ σ²/N, α → 1 and Â → x̄.

The weighting factor α directly depends on the confidence in the prior knowledge (σ_A²) and the confidence in the sample data (σ²/N).

If one examines the posterior PDF, its variance

$\sigma_{A|x}^2 = \frac{1}{\dfrac{N}{\sigma^2} + \dfrac{1}{\sigma_A^2}}$

decreases as N increases.


    Page 16

    Bayesian Estimation

As we have seen, the posterior mean changes with increasing N: for small N it will approximately be μ_A, but it will approach x̄ for increasing N.

    Taken from Kay: Fundamentals of Statistical Signal Processing, Vol 1: Estimation Theory, Prentice Hall, Upper Saddle River 2009


    Page 17

Bayesian Estimation: Vector Case

Theorem: If x and θ are jointly Gaussian, where x is of dimension k×1 and θ of dimension l×1, with mean vector [E(x)ᵀ E(θ)ᵀ]ᵀ and partitioned covariance matrix

$\mathbf{C} = \begin{bmatrix} \mathbf{C}_{xx} & \mathbf{C}_{x\theta} \\ \mathbf{C}_{\theta x} & \mathbf{C}_{\theta\theta} \end{bmatrix}$

so that

$p(\mathbf{x}, \boldsymbol\theta) = \frac{1}{(2\pi)^{\frac{k+l}{2}} \det^{\frac{1}{2}}(\mathbf{C})} \exp\left( -\frac{1}{2} \begin{bmatrix} \mathbf{x} - E(\mathbf{x}) \\ \boldsymbol\theta - E(\boldsymbol\theta) \end{bmatrix}^T \mathbf{C}^{-1} \begin{bmatrix} \mathbf{x} - E(\mathbf{x}) \\ \boldsymbol\theta - E(\boldsymbol\theta) \end{bmatrix} \right)$

then the conditional PDF p(θ|x) is also Gaussian, with

$E(\boldsymbol\theta \mid \mathbf{x}) = E(\boldsymbol\theta) + \mathbf{C}_{\theta x} \mathbf{C}_{xx}^{-1} \left( \mathbf{x} - E(\mathbf{x}) \right)$

$\mathbf{C}_{\theta|x} = \mathbf{C}_{\theta\theta} - \mathbf{C}_{\theta x} \mathbf{C}_{xx}^{-1} \mathbf{C}_{x\theta}$
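A minimal Matlab sketch of the theorem for assumed example values (k = 2, l = 1; all covariance entries are arbitrary):

    % Conditional mean/covariance from a partitioned Gaussian covariance.
    Cxx = [2 0.5; 0.5 1];  Cxt = [0.8; 0.3];    % C_x_theta is k x l
    Ctx = Cxt';            Ctt = 0.5;
    Ex  = [0; 0];          Et  = 1;

    x = [0.4; -1.2];                            % assumed observed data
    Et_given_x = Et + Ctx * (Cxx \ (x - Ex))    % posterior mean
    Ct_given_x = Ctt - Ctx * (Cxx \ Cxt)        % posterior covariance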


    Page 18

Bayesian Estimation: Vector Case

Theorem: If the observed data x can be modeled as

$\mathbf{x} = \mathbf{H}\boldsymbol\theta + \mathbf{w}$

where x is an N×1 data vector, H is a known N×p matrix, θ is a p×1 random vector with prior PDF N(μ_θ, C_θ), and w is an N×1 noise vector with PDF N(0, C_w) and independent of θ, then the posterior PDF p(θ|x) is Gaussian with mean

$E(\boldsymbol\theta \mid \mathbf{x}) = \boldsymbol\mu_\theta + \mathbf{C}_\theta \mathbf{H}^T \left( \mathbf{H}\mathbf{C}_\theta\mathbf{H}^T + \mathbf{C}_w \right)^{-1} \left( \mathbf{x} - \mathbf{H}\boldsymbol\mu_\theta \right)$

and covariance

$\mathbf{C}_{\theta|x} = \mathbf{C}_\theta - \mathbf{C}_\theta \mathbf{H}^T \left( \mathbf{H}\mathbf{C}_\theta\mathbf{H}^T + \mathbf{C}_w \right)^{-1} \mathbf{H}\mathbf{C}_\theta$

In contrast to the classical general linear model, H need not be full rank to ensure the invertibility of $\mathbf{H}\mathbf{C}_\theta\mathbf{H}^T + \mathbf{C}_w$.

Note that $\mathbf{C}_{\theta|x}$ is also the covariance matrix of the estimation error $\boldsymbol\epsilon = \boldsymbol\theta - \hat{\boldsymbol\theta}$ (ε has zero mean).
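A minimal Matlab sketch of these two formulas under assumed sizes and values (H, mu, Ctheta, Cw are illustrative, not from the slides):

    % Posterior mean/covariance for the Bayesian linear model x = H*theta + w.
    N = 8; p = 2;
    H      = randn(N, p);
    mu     = [1; -1];                           % prior mean
    Ctheta = 0.5 * eye(p);                      % prior covariance
    Cw     = 0.1 * eye(N);                      % noise covariance

    theta = mu + chol(Ctheta, 'lower') * randn(p, 1);
    x     = H * theta + chol(Cw, 'lower') * randn(N, 1);

    S         = H * Ctheta * H' + Cw;           % invertible even for rank-deficient H
    theta_hat = mu + Ctheta * H' * (S \ (x - H * mu))
    Cpost     = Ctheta - Ctheta * H' * (S \ H) * Ctheta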


    Page 19

    Linear Bayesian Estimation

    Linear MMSE Estimation

Except when x and θ are jointly Gaussian, the MMSE estimator may be difficult to find.

The situation is different when we constrain the estimator to be linear in x.

As will be seen shortly, we do not have to assume any specific form for the joint PDF p(x, θ); a knowledge of the first two moments will be sufficient to derive the LMMSE estimator (compare to the BLUE).

That θ may be estimated from x is due to the assumed statistical dependence of θ on x, as summarized by the joint PDF p(x, θ).

In particular, for a linear estimator we rely on the correlation between θ and x.


    Page 20

    Linear Bayesian Estimation

Introductory Example: Assume x and θ are jointly distributed. Find the linear (actually affine) estimator

$\hat\theta = a x + b$

that minimizes the Bayesian MSE:

$J = E_{x,\theta}\left[ \left( \theta - \hat\theta \right)^2 \right]$

Solution:

$\hat\theta = E(\theta) + \frac{\mathrm{cov}(x, \theta)}{\mathrm{var}(x)} \left( x - E(x) \right)$

Example: Derive the LMMSE estimator and the MSE for Example 1.
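Since only the first two moments enter, the affine estimator can be formed directly from sample moments. A minimal Matlab sketch for a hypothetical joint law (the factor 2 and the noise level 0.5 are arbitrary):

    % Affine LMMSE estimator built from (sample) first and second moments.
    M     = 1e5;
    theta = randn(M, 1);
    x     = 2*theta + 0.5*randn(M, 1);          % some assumed joint law p(x, theta)

    C = cov(x, theta);                          % 2x2 sample covariance matrix
    theta_hat = mean(theta) + C(1,2)/C(1,1) * (x - mean(x));
    mean((theta - theta_hat).^2)                % estimate of the Bayesian MSE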


    Page 21

Linear Bayesian Estimation: Scalar Parameter

Aim: Find the (affine linear) estimator of the form

$\hat\theta = \sum_{n=0}^{N-1} a_n x[n] + a_N$

that minimizes the Bayesian MSE

$J = E_{x,\theta}\left[ \left( \theta - \hat\theta \right)^2 \right]$

The coefficient a_N compensates nonzero means of x and θ; it can be omitted when both means are zero.


    Page 22

Linear Bayesian Estimation: Scalar Parameter

Deriving the optimal weighting coefficients, starting with a_N:

$\frac{\partial J}{\partial a_N} = \frac{\partial}{\partial a_N} E\left[ \left( \theta - \sum_{n=0}^{N-1} a_n x[n] - a_N \right)^2 \right] = -2\, E\left[ \theta - \sum_{n=0}^{N-1} a_n x[n] - a_N \right]$

Setting this to zero results in:

$a_N = E(\theta) - \sum_{n=0}^{N-1} a_n E(x[n])$


    Page 23

Linear Bayesian Estimation: Scalar Parameter

Continuing for the remaining coefficients a_n: substituting a_N into J gives

$J = E\left[ \left( \theta - \sum_{n=0}^{N-1} a_n x[n] - E(\theta) + \sum_{n=0}^{N-1} a_n E(x[n]) \right)^2 \right] = E\left[ \left( \sum_{n=0}^{N-1} a_n \left( x[n] - E(x[n]) \right) - \left( \theta - E(\theta) \right) \right)^2 \right]$

Writing the sums as inner vector products with a = [a₀, a₁, …, a_{N−1}]ᵀ leads to

$J = E\left[ \mathbf{a}^T \left( \mathbf{x} - E(\mathbf{x}) \right) \left( \mathbf{x} - E(\mathbf{x}) \right)^T \mathbf{a} \right] - 2\, E\left[ \mathbf{a}^T \left( \mathbf{x} - E(\mathbf{x}) \right) \left( \theta - E(\theta) \right) \right] + E\left[ \left( \theta - E(\theta) \right)^2 \right]$


    Page 24

Linear Bayesian Estimation: Scalar Parameter

In matrix/vector notation this is

$J = \mathbf{a}^T \mathbf{C}_{xx} \mathbf{a} - 2\, \mathbf{a}^T \mathbf{c}_{x\theta} + \sigma_\theta^2$

where C_xx is the N×N covariance matrix of x, c_θx is the 1×N cross-covariance vector having the property c_xθ = c_θxᵀ, and σ_θ² is the variance of θ.

Taking the gradient yields

$\frac{\partial J}{\partial \mathbf{a}} = 2\, \mathbf{C}_{xx} \mathbf{a} - 2\, \mathbf{c}_{x\theta}$

Setting this to zero results in

$\mathbf{a} = \mathbf{C}_{xx}^{-1} \mathbf{c}_{x\theta}$


    Page 25

Linear Bayesian Estimation: Scalar Parameter

Combining this with the result for a_N leads to

$\hat\theta = E(\theta) + \mathbf{c}_{\theta x} \mathbf{C}_{xx}^{-1} \left( \mathbf{x} - E(\mathbf{x}) \right)$

as the LMMSE estimator, with the corresponding Bayesian MSE

$\mathrm{BMSE}(\hat\theta) = \sigma_\theta^2 - \mathbf{c}_{\theta x} \mathbf{C}_{xx}^{-1} \mathbf{c}_{x\theta}$

Note that this is identical in form to the MMSE estimator for jointly Gaussian x and θ. This is because in the Gaussian case the MMSE estimator happens to be linear, and hence our constraint is automatically satisfied.
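A minimal Matlab sketch of the scalar LMMSE estimator for assumed example moments (Cxx, c_xt and sig2t are illustrative values, not from the slides):

    % Scalar-parameter LMMSE from known second-order moments.
    N     = 4;
    Cxx   = eye(N) + 0.3*ones(N);               % covariance matrix of x
    c_xt  = 0.5*ones(N, 1);                     % cross-covariance E[(x-E(x))(theta-E(theta))]
    sig2t = 1;                                  % variance of theta
    Ex = zeros(N, 1); Et = 0;                   % assumed means

    a = Cxx \ c_xt;                             % optimal weights a = Cxx^{-1} c_xt
    x = randn(N, 1);                            % some observed data
    theta_hat = Et + a' * (x - Ex)
    bmse      = sig2t - c_xt' * (Cxx \ c_xt)    % minimum Bayesian MSE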


    Page 26

Linear Bayesian Estimation: Vector Parameter

The vector LMMSE estimator is a straightforward extension of the scalar one.

We wish to find the linear estimator that minimizes the Bayesian MSE for each element:

$\hat\theta_i = \sum_{n=0}^{N-1} a_{in} x[n] + a_{iN}$

for i = 1, 2, …, p, and choose the weighting coefficients to minimize

$J_i = E\left[ \left( \theta_i - \hat\theta_i \right)^2 \right]$

Combining the scalar LMMSE estimators leads to

$\hat{\boldsymbol\theta} = E(\boldsymbol\theta) + \mathbf{C}_{\theta x} \mathbf{C}_{xx}^{-1} \left( \mathbf{x} - E(\mathbf{x}) \right)$

and

$\mathrm{BMSE}(\hat\theta_i) = \sigma_{\theta_i}^2 - \mathbf{c}_{\theta_i x} \mathbf{C}_{xx}^{-1} \mathbf{c}_{x\theta_i}$


    Page 27

Linear Bayesian Estimation: Vector Parameter

Problem: inverse system identification

Let the following communication scenario be given:

Data samples y[k] (+1 or −1, uncorrelated, zero mean, σ_y² = 1) are transmitted through a discrete-time linear system, given by its impulse response h = [h₀, …, h_{l−1}]. After that, additive white Gaussian noise n[k] (zero mean, variance σ_n²) is added. Your task is to:

Find the best linear system w = [w₀, …, w_{p−1}] in an LMMSE sense to estimate the data.

Write down the estimator using the hints on the next slide.

Write a Matlab script simulating the system with l = 4 and p = 4.

Vary σ_n² from 0.001 to 1 and observe the results.


    Page 28

Linear Bayesian Estimation: Vector Parameter

Hints (as we will see in the following lectures):

For uncorrelated data samples and uncorrelated noise with zero means and variances σ_y² and σ_n², respectively, we have

$\mathbf{R}_{xx} = \sigma_y^2\, \mathbf{H}\mathbf{H}^H + \sigma_n^2\, \mathbf{I}$

as the autocorrelation matrix of the samples and

$\mathbf{r}_{xy} = \sigma_y^2\, \mathbf{H}\, \mathbf{e}_i$

as the cross-correlation vector of x and y. e_i is the vector that has a one at position i and zeros at all other elements. Choose i = l+1 and the length of e_i as 7.

H is the convolution matrix of h. Use convmtx(h,l) to obtain H in Matlab.

Please be aware that the output sequence after the filter w is shifted by i = l+1 samples.
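One possible solution sketch for the exercise (requires the Signal Processing Toolbox for convmtx; the channel h is an assumed test impulse response, and with MATLAB's 1-based indexing the delay compensation comes out as i0 − 1 output samples):

    % LMMSE equalizer for x[k] = (h * y)[k] + n[k], with l = p = 4.
    l = 4; p = 4; i0 = l + 1;                   % i = l+1 as in the hints
    h = [1 0.6 0.3 0.1];                        % assumed test channel of length l
    sig2y = 1;
    K = 1e4;
    y = sign(randn(1, K));                      % +1/-1, uncorrelated, zero mean

    H  = convmtx(h, p);                         % p x (l+p-1) for a row vector h
    ei = zeros(l + p - 1, 1); ei(i0) = 1;       % length 7, one at position i0

    for sig2n = [0.001 0.01 0.1 1]
        x   = filter(h, 1, y) + sqrt(sig2n) * randn(1, K);
        Rxx = sig2y * (H * H') + sig2n * eye(p);   % autocorrelation matrix
        rxy = sig2y * (H * ei);                    % cross-correlation vector
        w   = (Rxx \ rxy)';                        % LMMSE equalizer taps

        yhat = filter(w, 1, x);                    % equalizer output
        d    = i0 - 1;                             % net shift with this indexing
        ser  = mean(sign(yhat(d+1:end)) ~= y(1:end-d));
        fprintf('sig2n = %6.3f   symbol error rate = %.4f\n', sig2n, ser);
    end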