Another example of ﬁnding the MVUE using su cient statistichehu/SSP/lecture5.pdfAnother example of...

Another example of finding the MVUE usingsufficient statistic

• Consider tossing a biased coin for N times.

• Suppose the coin is biased such that the probability of ahead is θ and that of a tail is 1 − θ.

• The coin is tossed N times resulting in a sequencex[n] ∈ {0, 1}, for n = 0, 1, . . . ,N− 1. Here, one denotes ahead and zero a tail.


• The probability of the sequence is given by the Bernoullidistribution:

p(x; θ) = θ∑x[n](1 − θ)N−

∑x[n],

Note that∑x[n] is simply the number of heads in the

sequence.


• The task is to find the MVUE for θ. The Neyman-Fisherfactorization is again simple:

p(x; θ) = θ∑x[n](1 − θ)N−

∑x[n]︸︷︷︸

g(T(x),θ)h(x)

· 1︸︷︷︸h(x)

,

where T(x) =∑x[n] is the sufficient statistic.

• Now it remains to find a function f of T(x) that makes itunbiased.


• Let’s see the bias of T(x) first:

E[T(x)] = E[N−1∑n=0

x[n]] =

N−1∑n=0

E[x[n]] =

N−1∑n=0

θ = Nθ.

• Thus, division by Nmakes an unbiased estimator:

θ =

∑N−1n=0 x[n]

N.

Unbiasedness check:

E[θ] =

∑N−1n=0 E[x[n]]

N=Nθ

N= θ.

Best linear unbiased estimator (BLUE)

Best linear unbiased estimator (BLUE)

• This time we’ll look into a simple subclass of all estimators:linear estimators.

• If we don’t know the pdf of the data or we are not willingto model it we cannot apply the CRLB or RBLS theorem.

• Even if the pdf is known the CRLB and RBLS theorem maynot produce the MVU estimator.

• Then, we might resort to some sub-optimal estimator.• The BLUE restricts the estimator to be linear in data.• The BLUE can be determined only with the knowledge of

the first two moments (instead of the complete PDF).

BLUE - definition

• The BLUE restricts the estimator to be linear in data

θ =

N−1∑n=0

anx[n].

• To determine the BLUE we must determine an so that theestimator is unbiased and has minimum variance.

• The BLUE will be optimal (the MVU estimator) if the MVUestimator is linear in data, otherwise it is sub-optimal.

• If the MVU estimator is nonlinear in data, then the BLUE issub-optimal and the difference in performance can besubstantial.

BLUE - definition

• For some estimation problems the use of the BLUE can betotally inappropriate.• For example in the estimation of the power of WGN, all

linear estimators will be biased, hence BLUE does not exist.• However, if we can use transformed data: y[n] = x2[n], the

BL estimator becomes viable

σ2 =

N−1∑n=0

anx2[n]

• Sometimes their performance is low compared to theMVUE; for example in the estimation of the mean of anuniform distribution, whose BLUE is the sample mean.

Finding the BLUE

• We constrain θ to be linear and unbiased and then find thean’s to minimize the variance.

• The unbiasedness constraint says that:

E(θ) =

N−1∑n=0

anE(x[n]) = θ (1)

• The problem is to minimize the variance var(θ):

var(θ) = E[(N−1∑n=0

anx[n] − E(N−1∑n=0

anx[n]))2]

.

Finding the BLUE

• Using vector notation, this becomes easier to read. Let’sdenote a = [a0,a1, . . . ,aN−1]

T :

var(θ) = E[(aTx − aTE(x))2

]= E

[(aT (x − E(x))

)2]

= E[aT (x − E(x))(x − E(x))Ta

]= aTCa. (2)

• To satisfy the unbiasedness constraint, E(x[n]) must belinear in θ: E(x[n]) = s[n]θ, where s[n] is a known signal.

Finding the BLUE

• What does this representation mean: what kind ofproblems is the BLUE good for?

• Note, that if we write x[n] as

x[n] = E(x[n]) + (x[n] − E(x[n])) ,

we can view the random term (x[n] − E(x[n])) as noisew[n]:

x[n] = s[n]θ+w[n]

• Thus, the BLUE is applicable to problems, which can bemodeled as above.

• But isn’t this exactly the linear model?

Finding the BLUE

• No, the linear model assumes Gaussian noise. Here weonly assume the knowledge of the first two moments of thedata: s[n] and C.

1 If your noise is Gaussian, use θ = (HTH)−1HTx.

2 If not (and you know the first 2 moments instad), use theBLUE estimator we’re about to derive.

3 If you are unable to assume even the mean and covariance,use Least Squares instead.

Finding the BLUE

• The linearity of E(x[n]) gives the unbiasedness constraintthe following form

N−1∑n=0

anE(x[n]) = θ

⇔N−1∑n=0

ans[n]θ = θ

⇔N−1∑n=0

ans[n] = 1

Finding the BLUE

which we can write more briefly as

aT s = 1.

• Now we arrive at a constrained linear programmingproblem:

Minimize var(θ) = aTCa subject to the constraint aT s = 1.

• This problem can be solved using Lagrange multipliers asfollows.

Finding the BLUE

• The method of Lagrangian multipliers adds the constraintas a penalty term into the objective function and aftersolving sets the penalty to zero. In our case the Lagrangianfunction to be minimized becomes

J(a) = aTCa + λ(aT s − 1)

Note that when the constraint is satisfied, this is equal tothe objective function.

Finding the BLUE

• Differentiating with respect to a gives (see thedifferentiation rules given on slide 4 of lecture 3):

∂J

∂a= 2Ca + λs

Setting this equal to zero gives the optimum aopt:1

2Caopt + λs = 0, or

aopt = −λ

2C−1s

Finding the BLUE

• Next, we get rid of λ by choosing the particular value of λthat actually forces the constraint to hold. Our constraintrequires that aT s = 1, which has to hold also for the aboveoptimum aopt. Thus,

aTopts = sTaopt = −λ

2sTC−1s = 1.

• Solving for λ gives

λ = −2

sTC−1s

Finding the BLUE

• Substituting this into the equation of aopt gives the optimalcoefficient vector:

aopt = −λ

2C−1s =

C−1ssTC−1s

• Since

θ =

N−1∑n=0

anx[n] = aTx,

we get the BLUE as

θBLUE = aToptx =sTC−1xsTC−1s

Finding the BLUE

• Furthermore, the variance of θ is

var(θ) =1

sTC−1s

Example: DC Level in White Noise

• DC level in White Noise (note: not necessarily Gaussian):

x[n] = A+w[n], n = 0, 1, . . . ,N− 1

• Now s = 1 and C = σ2I. Thus, the BLUE is given by

A =sTC−1xsTC−1s

=1T 1σ2 Ix

1T 1σ2 I1

=1N

N−1∑n=0

x[n]

• Moreover,

var(A) =1

sTC−1s=

11T 1σ2 1

=σ2

N

Example: DC Level in Uncorrelated Noise

• Here, uncorrelated noise means, that the samples areuncorrelated, but may have different distributions.Otherwise the model is the same:

x[n] = A+w[n], n = 0, 1, . . . ,N− 1

We only need to know the variances (var(w[n]) = σ2n) and

the mean (s = 1).


• The covariance matrix is now diagonal:

C =

σ2

0 0 · · · 00 σ2

1 · · · 0...

.... . .

...0 0 · · · σ2

N−1


• The inverse becomes

C−1 =

1σ2

00 · · · 0

0 1σ2

1· · · 0

......

. . ....

0 0 · · · 1σ2N−1

• Again s = 1, so the BLUE is given by

A =sTC−1xsTC−1s

=1TC−1x1TC−11


• The numerator is now

1TC−1x =(

1σ2

0

1σ2

1· · · 1

σ2N−1

)x[0]x[1]

...x[N− 1]

=

N−1∑n=0

x[n]

σ2n

• The denominator acts simply as a scaling factor tonormalize the gain:

1TC−11 =

N−1∑n=0

1σ2n

.


• Thus, the BLUE for uncorrelated noise is

A =

∑N−1n=0

x[n]σ2n∑N−1

n=01σ2n

• In other words, more reliable samples with smallervariance are given more weight.

• Finally, the estimator variance is

var(A) =1

1TC−11=

1∑N−1n=0

1σ2n

BLUE for Vector Parameter Case

• The BLUE can be extended also to the vector parametercase. Let us first extend the definitions of linearity andunbiasedness there.

• If the parameter to be estimated is a p× 1 vectorθ = [θ0, θ1, . . . , θN−1], then the linear estimator has theform

θi =

N−1∑n=0

ainx[n], i = 1, 2, . . . ,p.

• In matrix form this becomes

θ = Ax


• The unbiasedness requirement is now:

E(θi) =

N−1∑n=0

ainE(x[n]) = θi, i = 1, 2, . . . ,p

or in matrix form

E(θ) = AE(x) = θ

• Just like the scalar case, also here this must imply that themodel is linear in data:

E(x) = Hθ


• Substituting this form into the unbiasedness constraintabove gives

AE(x) = θ

⇔ AHθ = θ

⇔ AH = I

• Thus, a linear estimator is unbiased if its coefficient matrixA satisfies AH = I.


• Similar derivation as in the scalar case gives the vectorBLUE (see Appendix 6B):

θ = (HTC−1H)−1HTC−1x

and the covariance matrix of θ is Cθ = (HTC−1H)−1.• Formally, this can be stated as the following theorem.• Note, that though many details are the same as in Chapter

4 (Linear models), the difference is in that we do notassume Gaussianity (or any other distribution either).

• On the other hand, the following theorem does not alwaysresult in MVUE, but in a suboptimal solution.


TheoremGauss-Markov theorem: If the data are of the general linear modelform x = Hθ+ w where H is a known N× p matrix, θ is a p× 1vector of parameters to be estimated, and w is a p× 1 noise vectorwith zero mean and covariance C then the BLUE of θ is

θ = (HTC−1H)−1HTC−1x

the variance of θi is

var(θi) = [(HTC−1H)−1]ii

and the covariance matrix of θ is Cθ = (HTC−1H)−1.

Example: Source Localization

• Consider the problem of localizing an aircraft based on asignal it is sending.

• The signal is received by N antennas at known locations(x0,y0), (x1,y1),. . . , (xN−1,yN−1).


Distance d0Distance d1 Distance dN-1

Antenna 0 at

(x0,y0)Antenna 1 at

(x1,y1)

Antenna N-1

at (xN-1,yN-1)

Aircraft at position

(xs,ys)


• A signal emitted at time T0 is received at antenna k at time

tk = T0 +dkc

+ εk, k = 0, 1, 2 . . . ,N− 1,

where c is the propagation speed.• Due to the Pythagorean theorem, dk is related to the

locations by

dk =√

(xs − xk)2 + (ys − yk)2,


resulting in the model

t0 = T0 +

√(xs − x0)2 + (ys − y0)2

c+ ε0

t1 = T0 +

√(xs − x1)2 + (ys − y1)2

c+ ε1

...

tN−1 = T0 +

√(xs − xN−1)2 + (ys − yN−1)2

c+ εN−1

• Unfortunately the model is nonlinear, and a BLUE can notbe applied.


• Instead, the model for the location change can belinearized, and can be used for a tracking application.

• Define the location change vector as

θ =

(∆x

∆y

)=

(xs − xoldys − yold

),

where (xold,yold) is the previous location estimate.


Antenna 0 at

(x0,y0)Antenna 1 at

(x1,y1)

Antenna N-1

at (xN-1,yN-1)

Aircraft at position

(xs,ys)

Old location (xold,ynew)

Old dN-1Old d2Old d1

q

a0


• The change in distance can be approximated by

dk ≈ dold +xold − xkdold

∆x+yold − ykdold

∆y,

which is a first order Taylor expansion around (xold,yold).There’s no error if the plane is moving linearly towards oraway from the antenna.


• Substituting this into the model

tk = T0 +dkc

+ εk, k = 0, 1, 2 . . . ,N− 1,

gives

tk ≈ T0 +dold

c+xold − xkdoldc

∆x+yold − ykdoldc

∆y+ εk,

which is a linear model of the arrival times with unknownparameters T0, ∆x and ∆y.


• The model can be further simplified using the followingobservations:

xold − xkdold

= cosαk

yold − ykdold

= sinαk

and by introducing a new variable

τk = tk +dold

c.


• These result in the model

τk ≈ T0 +cosαkc

∆x+sinαkc

∆y+ εk,

where the unknowns are T0, ∆x and ∆y.2• The estimation of T0 is often impractical, as it would require

synchronization between the plane and the antennas.• Therefore, it is customary to model the time difference of

arrival (TDOA), where the observable is the time differencebetween antennas:

ξk = τk − τk−1, for k = 1, 2, . . . ,N− 1.

2Note that this model assumes that the angle αk between the antenna and the aircraft stays constant. This canalso be taken into account, but let’s keep it less complicated for the moment.


• This results in the final model

ξk =1c(cosαk−cosαk−1)∆x+

1c(sinαk−sinαk−1)∆y+εk−εk−1.

• In matrix form this becomes

ξ = Hθ+ w

or


ξ1

ξ2...

ξN−1

=1c

cosα1 − cosα0 sinα1 − sinα0

cosα2 − cosα1 sinα2 − sinα1...

...cosαN−1 − cosαN−2 sinαN−1 − sinαN−2

·(∆x

∆y

)+

ε1 − ε0

ε2 − ε1...

εN−1 − εN−2

.

• Now, the BLUE solution requires the knowledge of thecovariance matrix. Since our simplified model assumesthat H and θ are constant, the covariance matrix is that ofw.


• The vector w can be simplified:

w =

−1 1 0 0 · · · 00 −1 1 0 · · · 00 0 −1 1 · · · 0...

......

.... . .

...0 0 · · · 0 −1 1

︸︷︷︸

A

ε0ε1...

εN−1

︸︷︷︸

ε

.


• Therefore, the covariance matrix becomes

C = E[AεεTAT ] = σ2AAT .

Using the Gauss-Markov theorem we obtain the BLUE

θ = (HTC−1H)−1HTC−1ξ

= (HT (AAT )−1H)−1HT (AAT )−1ξ.

Another example of ﬁnding the MVUE using su cient statistichehu/SSP/lecture5.pdfAnother example of...

Documents

Transcript of Another example of ﬁnding the MVUE using su cient statistichehu/SSP/lecture5.pdfAnother example of...