EE385 Class Notes 11/13/2012 John Stensby
Updates at http://www.ece.uah.edu/courses/ee385/ 5-1
Chapter 5 Moments and Conditional Statistics
Let X denote a random variable, and z = h(x) a function of x. Consider the
transformation Z = h(X). We saw that we could express
E[Z] = E[h(X)] = \int_{-\infty}^{\infty} h(x)\, f_X(x)\,dx ,   (5-1)
a method of calculating E[Z] that does not require knowledge of fZ(z). It is possible to extend
this method to transformations of two random variables.
Given random variables X, Y and function z = g(x,y), form the new random variable
Z = g(X,Y). (5-2)
fZ(z) denotes the density of Z. The expected value of Z is E[Z] = \int_{-\infty}^{\infty} z\, f_Z(z)\,dz; however, this
formula requires knowledge of fZ, a density which may not be available. Instead, we can use
E[Z] = E[g(X,Y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\, f_{XY}(x,y)\,dx\,dy   (5-3)
to calculate E[Z] without having to obtain fZ. This is a very useful result.
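As a quick numerical sketch of (5-3) (the particular g and joint density below are hypothetical choices for illustration): with X and Y independent Uniform(0,1) and g(x,y) = xy, a sample average of g over draws of (X,Y) approximates E[Z] = 1/4 with no need to ever derive f_Z.

```python
import numpy as np

# Hypothetical illustration of (5-3): X, Y independent Uniform(0,1),
# Z = g(X,Y) = X*Y. The double integral of g against f_XY is
# approximated by a Monte Carlo sample average, so f_Z is never needed.
rng = np.random.default_rng(0)
x = rng.uniform(size=200_000)
y = rng.uniform(size=200_000)
ez_mc = np.mean(x * y)      # sample analogue of the double integral
ez_exact = 0.25             # E[XY] = E[X]E[Y] = 1/4 by independence
print(abs(ez_mc - ez_exact) < 0.01)
```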
Covariance
The covariance CXY of random variables X and Y is defined as
C_{XY} = E[(X - \eta_x)(Y - \eta_y)] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \eta_x)(y - \eta_y)\, f_{XY}(x,y)\,dx\,dy ,   (5-4)
where ηx = E[X] and ηy = E[Y]. Note that CXY can be expressed as
C_{XY} = E[(X - \eta_x)(Y - \eta_y)] = E[XY - \eta_y X - \eta_x Y + \eta_x\eta_y] = E[XY] - \eta_x\eta_y .   (5-5)
Correlation Coefficient
The correlation coefficient for random variables X and Y is defined as
r_{xy} = \frac{C_{XY}}{\sigma_x \sigma_y} .   (5-6)
rxy is a measure of the “statistical similarity” between X and Y.
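A small numeric sketch (the data below are hypothetical): computing r_xy directly from its definition (5-6) for two strongly related variables.

```python
import numpy as np

# Hypothetical data: y is a noisy linear function of x, so r_xy should
# land close to +1; Theorem 5-1 guarantees it never leaves [-1, +1].
rng = np.random.default_rng(1)
x = rng.normal(size=100_000)
y = 2.0 * x + rng.normal(scale=0.5, size=100_000)
c_xy = np.mean((x - x.mean()) * (y - y.mean()))   # covariance, Eq. (5-4)
r_xy = c_xy / (x.std() * y.std())                 # Eq. (5-6)
assert -1.0 <= r_xy <= 1.0                        # Theorem 5-1
print(r_xy > 0.9)
```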
Theorem 5-1: The correlation coefficient must lie in the range −1 ≤ rxy ≤ +1.
Proof: Let α denote any real number. Consider the parabolic equation
g(\alpha) \equiv E[\{\alpha(X - \eta_x) + (Y - \eta_y)\}^2] = \alpha^2\sigma_x^2 + 2\alpha C_{xy} + \sigma_y^2 \ge 0 .   (5-7)
Note that g(α) ≥ 0 for all α; g is a parabola that opens
upward.
As a first case, suppose that there exists a
value α0 for which g(α0) = 0 (see Fig. 5-1). Then α0
is a repeated root of g(α) = 0. In the quadratic
formula used to determine the roots of (5-7), the
discriminant must be zero. That is, (2C_{xy})^2 - 4\sigma_x^2\sigma_y^2 = 0, so that

|r_{xy}| = |C_{xy}| / \sigma_x\sigma_y = 1 .
Now, consider the case g(α) > 0 for all α; g
has no real roots (see Fig. 5-2). This means that the
discriminant must be negative (so the roots are
complex valued). Hence, (2C_{xy})^2 - 4\sigma_x^2\sigma_y^2 < 0, so that
[Figure 5-1: Plot of the parabola g(\alpha) = \alpha^2\sigma_x^2 + 2\alpha C_{xy} + \sigma_y^2 touching the \alpha-axis at \alpha_0. Case for which the discriminant is zero.]

[Figure 5-2: Plot of g(\alpha) lying strictly above the \alpha-axis. Case for which the discriminant is negative.]
|r_{xy}| = |C_{xy}| / \sigma_x\sigma_y < 1 .   (5-8)
Hence, in either case, −1 ≤ rxy ≤ +1 as claimed.
Suppose an experiment yields values for X and Y. Consider that we perform the
experiment many times, and plot the outcomes X and Y on a two dimensional plane. Some
hypothetical results follow.
[Figure 5-3: Samples of X and Y with varying degrees of correlation. Three scatter plots in the (x, y) plane: correlation coefficient r_{xy} near -1, r_{xy} very small, and r_{xy} near +1.]
Notes:
1. If |r_{xy}| = 1, then there exist constants a and b such that Y = aX + b in the mean-square sense
(i.e., E[{Y - (aX + b)}2] = 0).
2. The addition of a constant to a random variable does not change the variance of the random
variable. That is, σ2 = VAR[X] = VAR[X + α] for any α.
3. Multiplication by a constant scales the variance of a random variable. If VAR[X] = σ2,
then VAR[αX] = α2σ2.
4. Adding constants to random variables X and Y does not change the covariance or correlation
of these random variables. That is, X + α and Y + β have the same covariance and correlation
coefficient as X and Y.
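Notes 2 through 4 can be checked numerically; a sketch with hypothetical sample data:

```python
import numpy as np

# Numeric check of Notes 2-4 on hypothetical sample data: shifts leave
# variance and covariance alone; scaling multiplies variance by alpha^2.
rng = np.random.default_rng(2)
x = rng.normal(size=50_000)
y = 0.3 * x + rng.normal(size=50_000)
cov = lambda a, b: np.mean((a - a.mean()) * (b - b.mean()))
assert np.isclose(x.var(), (x + 7.0).var())           # Note 2: VAR[X+a] = VAR[X]
assert np.isclose((3.0 * x).var(), 9.0 * x.var())     # Note 3: VAR[aX] = a^2 VAR[X]
assert np.isclose(cov(x, y), cov(x + 5.0, y - 2.0))   # Note 4: shifts preserve C_XY
print("ok")
```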
Correlation Coefficient for Gaussian Random Variables
Let zero-mean X and Y be jointly Gaussian with joint density

f_{XY}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-r^2}} \exp\left\{ \frac{-1}{2(1-r^2)} \left[ \frac{x^2}{\sigma_x^2} - \frac{2rxy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2} \right] \right\} .   (5-9)
We are interested in the correlation coefficient rXY; we claim that rXY = r, where r is just a
parameter in the joint density (from statements given above, r is the correlation coefficient for
the nonzero mean case as well). First, note that CXY = E[XY], since the means are zero. Now,
show rXY = r by establishing E[XY] = rσXσY, so that rXY = CXY/σXσY = E[XY]/σXσY = r. In the
square brackets of fXY is an expression that is quadratic in x/σX. Complete the square for this
quadratic form to obtain
\frac{x^2}{\sigma_x^2} - \frac{2rxy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2} = \frac{1}{\sigma_x^2}\left\{ x - r\frac{\sigma_x}{\sigma_y}\, y \right\}^2 + (1-r^2)\left\{ \frac{y^2}{\sigma_y^2} \right\} .   (5-10)
Use this new quadratic form to obtain
E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_{XY}(x,y)\,dx\,dy

= \int_{-\infty}^{\infty} \frac{y}{\sigma_y\sqrt{2\pi}}\, e^{-y^2/2\sigma_y^2} \left[ \int_{-\infty}^{\infty} \frac{x}{\sigma_x\sqrt{2\pi(1-r^2)}} \exp\left( \frac{-(x - r\frac{\sigma_x}{\sigma_y}y)^2}{2\sigma_x^2(1-r^2)} \right) dx \right] dy ,   (5-11)

where the bracketed inner integrand contains a normal density with mean r\frac{\sigma_x}{\sigma_y}y.
Note that the inner integral is an expected value calculation; the inner integral evaluates to r\frac{\sigma_x}{\sigma_y}y. Hence,
E[XY] = \int_{-\infty}^{\infty} \frac{1}{\sigma_y\sqrt{2\pi}}\, y\, e^{-y^2/2\sigma_y^2} \left[ r\frac{\sigma_x}{\sigma_y}\, y \right] dy

= r\frac{\sigma_x}{\sigma_y} \left[ \int_{-\infty}^{\infty} \frac{1}{\sigma_y\sqrt{2\pi}}\, y^2\, e^{-y^2/2\sigma_y^2}\, dy \right] = r\frac{\sigma_x}{\sigma_y}\,\sigma_y^2

= r\,\sigma_x\sigma_y ,   (5-12)
as desired. From this, we conclude that rXY = r.
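The identity E[XY] = rσxσy can be sanity-checked by simulation; a sketch with hypothetical parameter values:

```python
import numpy as np

# Hypothetical parameters: jointly Gaussian zero-mean X, Y with r = 0.6,
# sigma_x = 2, sigma_y = 3; Eq. (5-12) predicts E[XY] = 0.6*2*3 = 3.6.
rng = np.random.default_rng(3)
r, sx, sy = 0.6, 2.0, 3.0
cov = [[sx**2, r*sx*sy], [r*sx*sy, sy**2]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=400_000).T
print(abs(np.mean(x * y) - r * sx * sy) < 0.1)
```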
Uncorrelatedness and Orthogonality
Two random variables are uncorrelated if their covariance is zero. That is, they are
uncorrelated if
CXY = rXY = 0. (5-13)
Since CXY = E[XY] – E[X]E[Y], Equation (5-13) is equivalent to the requirement that E[XY] =
E[X]E[Y]. Two random variables are called orthogonal if
E[XY] = 0. (5-14)
Theorem 5-2: If random variables X and Y are independent, then they are uncorrelated
(independence ⇒ uncorrelated).
Proof: Let X and Y be independent. Then
E[XY] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_{XY}(x,y)\,dx\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\, f_X(x) f_Y(y)\,dx\,dy = E[X]\,E[Y] .   (5-15)
Therefore, X and Y are uncorrelated. Note: The converse is not true in general. If X and Y are
uncorrelated, then they are not necessarily independent. This general rule has an exception for
Gaussian random variables, an important special case.
Theorem 5-3: For Gaussian random variables, uncorrelatedness is equivalent to independence
(independence ⇔ uncorrelatedness for Gaussian random variables).
Proof: We have only to show that uncorrelatedness ⇒ independence. But this is easy. Let the
correlation coefficient r = 0 (so that the two random variables are uncorrelated) in the joint
Gaussian density (5-9). Note that the joint density factors into a product of marginal densities.
Joint Moments
Joint moments of X and Y can be computed. These are defined as
m_{kr} = E[X^k Y^r] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x^k y^r f_{XY}(x,y)\,dx\,dy .   (5-16)
Joint central moments are defined as
\mu_{kr} = E[(X - \eta_x)^k (Y - \eta_y)^r] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (x - \eta_x)^k (y - \eta_y)^r f_{XY}(x,y)\,dx\,dy .   (5-17)
Conditional Distributions/Densities
Let M denote an event with P(M) ≠ 0, and let X and Y be random variables. Recall that
F(y \mid M) = P[Y \le y \mid M] = \frac{P[Y \le y,\, M]}{P[M]} .   (5-18)
Now, event M can be defined in terms of the random variable X.
Example (5-1): Define M = [X ≤ x] and write
F(y \mid X \le x) = \frac{P[X \le x,\, Y \le y]}{P[X \le x]} = \frac{F_{XY}(x,y)}{F_X(x)}   (5-19)
f(y \mid X \le x) = \frac{\partial F_{XY}(x,y)/\partial y}{F_X(x)} .   (5-20)
Example (5-2): Define M = [x1 < X ≤ x2] and write
F(y \mid x_1 < X \le x_2) = \frac{P[x_1 < X \le x_2,\, Y \le y]}{P[x_1 < X \le x_2]} = \frac{F_{XY}(x_2,y) - F_{XY}(x_1,y)}{F_X(x_2) - F_X(x_1)} .   (5-21)
Example (5-3): Define M = [X = x], where fX(x) ≠ 0. The quantity P[Y ≤ y, M]/P[M] can be
indeterminate (i.e., 0/0) in this case (certainly, this is true for continuous X) so that we must use

F(y \mid X = x) = \lim_{\Delta x \to 0^+} F(y \mid x - \Delta x < X \le x) .   (5-22)
From the previous example, this result can be written as
F(y \mid X = x) = \lim_{\Delta x \to 0^+} \frac{F_{XY}(x,y) - F_{XY}(x - \Delta x,\, y)}{F_X(x) - F_X(x - \Delta x)} = \lim_{\Delta x \to 0^+} \frac{[F_{XY}(x,y) - F_{XY}(x - \Delta x,\, y)]/\Delta x}{[F_X(x) - F_X(x - \Delta x)]/\Delta x}

= \frac{\partial F_{XY}(x,y)/\partial x}{\partial F_X(x)/\partial x} .   (5-23)
From this last result, we conclude that the conditional density can be expressed as
f(y \mid X = x) = \frac{\partial}{\partial y} F(y \mid X = x) = \frac{\partial^2 F_{XY}(x,y)/\partial x\,\partial y}{\partial F_X(x)/\partial x} ,   (5-24)
which yields
f(y \mid X = x) = \frac{f_{XY}(x,y)}{f_X(x)} .   (5-25)
Using the abbreviated notation f(y|x) ≡ f(y|X = x), apply Equation (5-25) and symmetry to write
fXY(x,y) = f (y⎮x) fX(x) = f (x⎮y) fY(y). (5-26)
Use this form of the joint density with the formula before last to write
f(x \mid y) = \frac{f(y \mid x)\, f_X(x)}{f_Y(y)} ,   (5-27)
a result that is called Bayes Theorem for densities.
Conditional Expectations
Let M denote an event, g(x) a function of x, and X a random variable. Then, the conditional
expectation of g(X) given M is defined as
E[g(X) \mid M] = \int_{-\infty}^{\infty} g(x)\, f(x \mid M)\,dx .   (5-28)
For example, let X and Y denote random variables, and write the conditional mean of X given Y
= y as
\eta_{x|y} \equiv E[X \mid Y = y] \equiv E[X \mid y] = \int_{-\infty}^{\infty} x\, f(x \mid y)\,dx .   (5-29)
Higher-order conditional moments can be defined in a similar manner. For example, the
conditional variance is written as
\sigma_{x|y}^2 \equiv E[(X - \eta_{x|y})^2 \mid Y = y] \equiv E[(X - \eta_{x|y})^2 \mid y] = \int_{-\infty}^{\infty} (x - \eta_{x|y})^2 f(x \mid y)\,dx .   (5-30)
Remember that \eta_{x|y} and \sigma_{x|y}^2 are functions of algebraic variable y, in general.
Example (5-4): Let X and Y be zero-mean, jointly Gaussian random variables with
f_{XY}(x,y) = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-r^2}} \exp\left\{ \frac{-1}{2(1-r^2)} \left[ \frac{x^2}{\sigma_x^2} - \frac{2rxy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2} \right] \right\} .   (5-31)

Find f(x|y), \eta_{x|y} and \sigma_{x|y}^2. We will accomplish this by factoring f_{XY} into the product
f(x|y) f_Y(y). By completing the square on the quadratic, we can write
\frac{x^2}{\sigma_x^2} - \frac{2rxy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2} = \frac{1}{\sigma_x^2}\left\{ x - r\frac{\sigma_x}{\sigma_y}\, y \right\}^2 + (1-r^2)\left\{ \frac{y^2}{\sigma_y^2} \right\} ,   (5-32)
so that
f_{XY}(x,y) = \underbrace{ \frac{1}{\sigma_x\sqrt{2\pi(1-r^2)}} \exp\left[ \frac{-(x - r\frac{\sigma_x}{\sigma_y}y)^2}{2\sigma_x^2(1-r^2)} \right] }_{f(x|y)}\; \underbrace{ \frac{1}{\sigma_y\sqrt{2\pi}} \exp\left[ \frac{-y^2}{2\sigma_y^2} \right] }_{f_Y(y)} .   (5-33)
From this factorization, we observe that
f(x \mid y) = \frac{1}{\sigma_x\sqrt{2\pi(1-r^2)}} \exp\left[ \frac{-(x - r\frac{\sigma_x}{\sigma_y}y)^2}{2\sigma_x^2(1-r^2)} \right] .   (5-34)
Note that this conditional density is Gaussian! This unexpected conclusion leads to
\eta_{x|y} = r\frac{\sigma_x}{\sigma_y}\, y

\sigma_{x|y}^2 = \sigma_x^2 (1 - r^2)   (5-35)
as the conditional mean and variance, respectively.
The variance \sigma_x^2 of a random variable X is a measure of uncertainty in the value of X. If
\sigma_x^2 is small, it is highly likely to find X near its mean. The conditional variance \sigma_{x|y}^2 is a
measure of uncertainty in the value of X given that Y = y. From (5-35), note that \sigma_{x|y}^2 \to 0 as
|r| → 1. As perfect correlation is approached, it becomes more likely to find X near its
conditional mean \eta_{x|y}.
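Equation (5-35) can be illustrated by simulation (parameter values below are hypothetical): among samples whose Y lands near a chosen y0, X should average to r(σx/σy)y0 and scatter with the reduced variance σx²(1 − r²).

```python
import numpy as np

# Simulation sketch of Eq. (5-35) with hypothetical parameters. Samples
# with Y near y0 approximate the conditional density f(x|y0).
rng = np.random.default_rng(4)
r, sx, sy, y0 = 0.8, 1.0, 2.0, 1.5
cov = [[sx**2, r*sx*sy], [r*sx*sy, sy**2]]
x, y = rng.multivariate_normal([0.0, 0.0], cov, size=2_000_000).T
near = np.abs(y - y0) < 0.05                  # crude conditioning on Y ~ y0
cond_mean = x[near].mean()
cond_var = x[near].var()
assert abs(cond_mean - r * (sx / sy) * y0) < 0.05   # eta_{x|y0} = 0.6
assert abs(cond_var - sx**2 * (1 - r**2)) < 0.05    # sigma^2_{x|y} = 0.36
print("ok")
```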
Example (5-5): Generalize the previous example to the non-zero mean case. Consider X and Y
same as above except for E[X] = ηX and E[Y] = ηY. Now, define zero mean Gaussian variables
Xd and Yd so that X = Xd + ηX , Y = Yd + ηY and
f_{XY}(x,y) = f_{X_d Y_d}(x - \eta_x,\, y - \eta_y) \left| \frac{\partial(x_d, y_d)}{\partial(x, y)} \right| = f_{X_d Y_d}(x - \eta_x,\, y - \eta_y)

= \frac{1}{\sigma_x\sqrt{2\pi(1-r^2)}} \exp\left[ \frac{-(x - \eta_x - r\frac{\sigma_x}{\sigma_y}(y - \eta_y))^2}{2\sigma_x^2(1-r^2)} \right] \frac{1}{\sigma_y\sqrt{2\pi}} \exp\left[ \frac{-(y - \eta_y)^2}{2\sigma_y^2} \right] .   (5-36)
By Bayes rule for density functions, it is easily seen that
f(x \mid y) = \frac{1}{\sigma_x\sqrt{2\pi(1-r^2)}} \exp\left[ \frac{-(x - \eta_x - r\frac{\sigma_x}{\sigma_y}(y - \eta_y))^2}{2\sigma_x^2(1-r^2)} \right] .   (5-37)
Hence, the conditional mean and variance are
\eta_{x|y} = \eta_x + r\frac{\sigma_x}{\sigma_y}(y - \eta_y)

\sigma_{x|y}^2 = \sigma_x^2 (1 - r^2) ,   (5-38)
respectively, for the case where X and Y are themselves nonzero mean. Note that (5-38) follows
directly from (5-35) since
\eta_{x|y} \equiv E[X \mid Y = y] = E[X_d + \eta_x \mid Y_d + \eta_y = y]

= E[X_d \mid Y_d = y - \eta_y] + \eta_x

= r\frac{\sigma_x}{\sigma_y}(y - \eta_y) + \eta_x .   (5-39)
Conditional Expected Value as a Transformation for a Random Variable
Let X and Y denote random variables. The conditional mean of random variable Y given
that X = x is an "ordinary" function ϕ(x) of x. That is,
\varphi(x) = E[Y \mid X = x] = E[Y \mid x] = \int_{-\infty}^{\infty} y\, f(y \mid x)\,dy .   (5-40)
In general, function ϕ(x) can be plotted, integrated, differentiated, etc.; it is an "ordinary"
function of x. For example, as we have just seen, if X and Y are jointly Gaussian, we know that
\varphi(x) = E[Y \mid X = x] = \eta_y + r\frac{\sigma_y}{\sigma_x}(x - \eta_x) ,   (5-41)
a simple linear function of x.
Use ϕ(x) to transform random variable X. Now, ϕ(X) = E[Y⎮X] is a random variable.
Be very careful with the notation: random variable E[Y⎮X] is different from function
E[Y⎮X = x] ≡ E[Y⎮x] (note that E[Y⎮X = x] and E[Y⎮x] are used interchangeably). Find the
expected value E[ϕ(X)] = E[E[Y⎮X]] of random variable ϕ(X). In the usual way, we start this
task by writing
E[E[Y \mid X]] = \int_{-\infty}^{\infty} E[Y \mid x]\, f_X(x)\,dx = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} y\, f(y \mid x)\,dy \right] f_X(x)\,dx .   (5-42)
Now, since fXY(x,y) = f (y⎮x) fX(x) we have
E[E[Y \mid X]] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\, f(y \mid x)\, f_X(x)\,dx\,dy = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} y\, f_{XY}(x,y)\,dx\,dy = \int_{-\infty}^{\infty} y\, f_Y(y)\,dy .   (5-43)
From this, we conclude that
E[Y] = E[E[Y \mid X]] .   (5-44)
The inner conditional expectation is conditioned on X; the outer expectation is over X. To
emphasize this fact, the notation E_X[E[Y|X]] ≡ E[E[Y|X]] is sometimes used in the literature.
Example (5-6): Two fair dice are tossed until the combination “1 and 1” (“snake eyes”) appears.
Determine the average (i.e., expected) number of tosses required to hit “snake eyes”. To solve
this problem, define random variables
1) N = number of tosses to hit “snake eyes” for the first time
2) H = 1 if “snake eyes” is hit on the first roll
   H = 0 if “snake eyes” is not hit on the first roll
Note that H takes on only two values with P[H = 1] = 1/36 and P[H = 0] = 35/36. Now, we can
compute the average E[N] = E[E[N|H]], where the inner expectation is conditioned on H, and
the outer expectation is an average over H. We write

E[N] = E\big[E[N \mid H]\big] = E[N \mid H = 1]\,P[H = 1] + E[N \mid H = 0]\,P[H = 0] .
Now, if H = 0, then “snake eyes” was not hit on the first toss, and the game starts over (at the
second toss) with an average of E[N] additional tosses still required to hit “snake eyes”. Hence,
E[N⎮H = 0] = 1 + E[N]. On the other hand, if H = 1, “snake eyes” was hit on the first roll, so
E[N⎮H = 1] = 1. These two observations produce
E[N] = E[N \mid H = 1]\,P[H = 1] + E[N \mid H = 0]\,P[H = 0]

= (1)\left(\frac{1}{36}\right) + \big(1 + E[N]\big)\left(\frac{35}{36}\right)

= \left(\frac{35}{36}\right) E[N] + 1 ,
and the conclusion E[N] = 36.
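The conclusion E[N] = 36 is easy to confirm with a Monte Carlo sketch (the trial count and seed below are arbitrary choices):

```python
import random

# Monte Carlo check of Example (5-6): toss two fair dice until "snake
# eyes" and average the number of tosses required; theory says E[N] = 36.
random.seed(0)

def tosses_until_snake_eyes():
    n = 0
    while True:
        n += 1
        if random.randint(1, 6) == 1 and random.randint(1, 6) == 1:
            return n

trials = 100_000
avg = sum(tosses_until_snake_eyes() for _ in range(trials)) / trials
print(abs(avg - 36.0) < 1.5)
```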
Generalizations
This basic concept can be generalized. Again, X and Y denote random variables. And,
g(x,y) denotes a function of algebraic variables x and y. The conditional mean
\varphi(x) = E[g(X,Y) \mid X = x] = E[g(x,Y) \mid X = x] = \int_{-\infty}^{\infty} g(x,y)\, f(y \mid x)\,dy   (5-45)
is an "ordinary" function of real value x.
Now, ϕ(X) = E[g(X,Y)⎮X] is a transformation of random variable X (again, be careful:
E[g(X,Y)⎮X] is a random variable and E[g(X,Y)⎮X = x] = E[g(x,Y)⎮x] = ϕ(x) is a function of
x). We are interested in the expected value E[ϕ(X)] = E[E[g(X,Y)⎮X]] so we write
E[\varphi(X)] = E\big[ E[g(X,Y) \mid X] \big] = \int_{-\infty}^{\infty} f_X(x) \left[ \int_{-\infty}^{\infty} g(x,y)\, f(y \mid x)\,dy \right] dx

= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\, f(y \mid x)\, f_X(x)\,dy\,dx = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} g(x,y)\, f_{XY}(x,y)\,dy\,dx = E[g(X,Y)] ,   (5-46)
where we have used fXY(x,y) = f(y⎮x)fX(x), Bayes law of densities. Hence, we conclude that
E[g(X,Y)] = E[E[g(X,Y)⎮X]] = EX[E[g(X,Y)⎮X]]. (5-47)
In this last equality, the inner conditional expectation is used to transform X; the outer
expectation is over X.
Example (5-7): Let X and Y be jointly Gaussian with E[X] = E[Y] = 0, Var[X] = σX2, Var[Y] =
σY2 and correlation coefficient r. Find the conditional second moment E[X2⎮Y = y] = E[X2⎮y].
First, note that
Var[X \mid y] = E[X^2 \mid y] - \big( E[X \mid y] \big)^2 .   (5-48)
Using the conditional mean and variance given by (5-35), we write
E[X^2 \mid y] = Var[X \mid y] + \big( E[X \mid y] \big)^2 = \sigma_x^2 (1 - r^2) + \left( r\frac{\sigma_x}{\sigma_y}\, y \right)^2 .   (5-49)
Example (5-8): Let X and Y be jointly Gaussian with E[X] = E[Y] = 0, Var[X] = σX2, Var[Y] =
σY2 and correlation coefficient r. Find
YE[XY] E [ (Y)]= ϕ , (5-50)
where
\varphi(y) = E[XY \mid Y = y] = y\, E[X \mid Y = y] = y \left( r\frac{\sigma_x}{\sigma_y}\, y \right) .   (5-51)
To accomplish this, substitute (5-51) into (5-50) to obtain
E[XY] = E_Y[\varphi(Y)] = r\frac{\sigma_x}{\sigma_y} E_Y[Y^2] = r\frac{\sigma_x}{\sigma_y}\,\sigma_y^2 = r\,\sigma_x\sigma_y .   (5-52)
Application of Conditional Expectation: Bayesian Estimation
Let θ denote an unknown DC voltage (for example, the output of a thermocouple, strain
gauge, etc.). We are trying to measure θ. Unfortunately, the measurement is obscured by
additive noise n(t). At time t = T, we take a single sample of θ and noise; this sample is called z
= θ + n(T). We model the noise sample n(T) as a random variable with known density fn(n) (we
have “abused” the symbol n by using it simultaneously to denote a random quantity and an
algebraic variable. Such “abuses” are common in the literature). We model unknown θ as a
random variable with density fθ(θ). Density fθ(θ) is called the a-priori density of θ, and it is
known. In most cases, random variables θ and n(T) are independent, but this is not an absolute
requirement (the independence assumption simplifies the analysis). Figure 5-4 depicts a block
diagram that illustrates the generation of voltage-sample z.
From context in the discussion given below (and in the literature), the reader should be
able to discern the current usage of the symbol z. He/she should be able to tell whether z denotes
a random variable or a realization of a random variable (a particular sample outcome). Here, (as
is often the case in the literature) there is no need to use Z to denote the random variable and z to
denote a particular value (sample outcome or realization) of the random variable.
We desire to use the measurement z to estimate voltage θ. We need to develop an
estimator that will take our measurement sample value z and give us an estimate θ̂(z) of the
actual value of θ. Of course, there is some difference between the estimate θ̂ and the true value
of θ; that is, there is an error voltage θ̃(z) ≡ θ̂(z) − θ. Finally, making errors costs us. C(θ̃(z))
denotes the cost incurred by using measurement z to estimate voltage θ; C is a known cost
function.
The values of z and C(θ̃(z)) change from one sample to the next; they can be interpreted
as random variables as described above. Hence, it makes no sense to develop estimator θ̂ that
minimizes C(θ̃(z)). But, it does make sense to choose/design/develop θ̂ with the goal of
[Figure 5-4: Noisy measurement of a DC voltage. Block diagram: θ and noise n(t) are summed, and the sum is sampled at time t = T to produce z = θ + n(T).]
minimizing E[C(θ̃(z))] = E[C(θ̂(z) − θ)], the expected or average cost associated with the
estimation process. It is important to note that we are performing an ensemble average over all
possible z and θ (random variables that we average over when computing E[C(θ̂(z) − θ)]).
The estimator, denoted here as θ̂_b, that minimizes this average cost is called the
Bayesian estimator. That is, Bayesian estimator θ̂_b satisfies

E[C(\hat\theta_b(z) - \theta)] \le E[C(\hat\theta(z) - \theta)] \quad \text{for any } \hat\theta \ne \hat\theta_b .   (5-53)

(θ̂_b is the "best" estimator. On the average, you "pay more" if you use any other estimator).
Important Special Case: Mean Square Cost Function C(θ̃) = θ̃²
Let's use the squared error cost function C(θ̃) = θ̃². Then, when estimator θ̂ is used,
the average cost per decision is

E[\tilde\theta^2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \big( \hat\theta(z) - \theta \big)^2 f(\theta, z)\,d\theta\,dz = \int_{-\infty}^{\infty} \left[ \int_{-\infty}^{\infty} \big( \hat\theta(z) - \theta \big)^2 f(\theta \mid z)\,d\theta \right] f_z(z)\,dz .   (5-54)
For the outer integral of the last double integral, the integrand is a non-negative function of z.
Hence, average cost E[θ̃²] will be minimized if, for every value of z, we pick θ̂(z) to minimize
the non-negative inner integral

\int_{-\infty}^{\infty} \big( \hat\theta(z) - \theta \big)^2 f(\theta \mid z)\,d\theta .   (5-55)
With respect to θ̂, differentiate this last integral, set the result to zero, and get

\int_{-\infty}^{\infty} 2\big( \hat\theta(z) - \theta \big) f(\theta \mid z)\,d\theta = 0 .   (5-56)
Finally, solve this last result for the Bayesian estimator
\hat\theta_b(z) = \int_{-\infty}^{\infty} \theta\, f(\theta \mid z)\,d\theta = E[\theta \mid z] .   (5-57)
That is, for the mean square cost function, the Bayesian estimator is the mean of θ conditioned
on the data z. Sometimes, we call (5-57) the conditional mean estimator.
As outlined above, we make a measurement and get a specific numerical value for z (i.e.,
we may interpret numerical z as a specific realization of a random variable). This measured
value can be used in (5-57) to obtain a numerical estimate of θ. On the other hand, suppose that
we are interested in the average performance of our estimator (averaged over all possible
measurements and all possible values of θ). Then, as discussed below, we treat z as a random
variable and average θ̃²(z) = {θ̂_b(z) − θ}² over all possible measurements (values of z) and all
possible values of θ; that is, we compute the variance of the estimation error. In doing this, we
treat z as a random variable. However, we use the same symbol z regardless of the interpretation
and use of (5-57). From context, we must determine if z is being used to denote a random
variable or a specific measurement (that is, a realization of a random variable).
Alternative Expression for θ̂_b
The conditional mean estimator can be expressed in a more convenient fashion. First,
use Bayes rule for densities (here, we interpret z as a random variable)
f(\theta \mid z) = \frac{f(z \mid \theta)\, f_\theta(\theta)}{f_z(z)}   (5-58)
in the estimator formula (5-57) to obtain
\hat\theta_b(z) = \int_{-\infty}^{\infty} \theta\, \frac{f(z \mid \theta)\, f_\theta(\theta)}{f_z(z)}\,d\theta = \frac{\int_{-\infty}^{\infty} \theta\, f(z \mid \theta)\, f_\theta(\theta)\,d\theta}{f_z(z)} = \frac{\int_{-\infty}^{\infty} \theta\, f(z \mid \theta)\, f_\theta(\theta)\,d\theta}{\int_{-\infty}^{\infty} f(z \mid \theta)\, f_\theta(\theta)\,d\theta} ,   (5-59)
a formulation that is used in application.
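Formula (5-59) can be evaluated by brute-force quadrature; a sketch under hypothetical Gaussian assumptions (θ ~ N with mean 1 and unit variance as the prior, z = θ + n with n ~ N(0,1), one measurement z = 2.4). For this setup the Gaussian theory of Eq. (5-65) gives the closed-form answer 1 + 0.5(2.4 − 1) = 1.7, which the quadrature should reproduce.

```python
import numpy as np

# Discrete quadrature sketch of the conditional-mean estimator
# (5-57)/(5-59) for one hypothetical Gaussian setup.
theta = np.linspace(-6.0, 8.0, 20_001)
prior = np.exp(-0.5 * (theta - 1.0) ** 2) / np.sqrt(2 * np.pi)  # f_theta
z = 2.4                                                         # measurement
like = np.exp(-0.5 * (z - theta) ** 2) / np.sqrt(2 * np.pi)     # f(z|theta)
w = like * prior
w /= w.sum()                     # discrete posterior weights, Eq. (5-58)
theta_hat = (theta * w).sum()    # posterior mean, Eq. (5-59)
assert abs(theta_hat - 1.7) < 1e-4
print(round(theta_hat, 3))
```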
Mean and Variance of the Estimation Error
For the conditional mean estimator, the estimation error is

\tilde\theta = \theta - \hat\theta_b = \theta - E[\theta \mid z] .   (5-60)

The mean value of θ̃ (averaged over all θ and all possible measurements z) is

E[\tilde\theta] = E[\theta - \hat\theta_b] = E[\theta] - E\big[ E[\theta \mid z] \big] = E[\theta] - E[\theta] = 0 .   (5-61)

Equivalently, E[θ̂_b] = E[θ]; because of this, we say that θ̂_b is an unbiased estimator.
Since E[θ̃] = 0, the variance of the estimation error is

VAR[\tilde\theta] = E[\tilde\theta^2] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \big( \theta - E[\theta \mid z] \big)^2 f(\theta, z)\,d\theta\,dz ,   (5-62)
where f(θ,z) is the joint density that describes θ and z. We want VAR[θ̃] < VAR[θ]; otherwise,
our estimator is of little value since we could simply use E[θ] to estimate θ. In general, VAR[θ̃] is a
measure of estimator performance.
Example (5-9): Bayesian Estimator for Single-Sample Gaussian Case
Suppose that θ is N(θ0, σ0) and n(T) is N(0,σ). Also, assume that θ and n are
independent. Find the conditional mean (Bayesian) estimator θ̂_b. First, when interpreted as a
random variable, z = θ + n(T) is Gaussian with mean θ0 and variance σ02 + σ2. Hence, from the
conditional mean formula (5-38) for the Gaussian case, we have
\hat\theta_b(z) = E[\theta \mid z] = \theta_0 + r_{\theta z} \frac{\sigma_0}{\sqrt{\sigma_0^2 + \sigma^2}} (z - \theta_0) ,   (5-63)
where rθZ is the correlation coefficient between θ and z. Now, we must find rθZ. Observe that
r_{\theta z} = \frac{E[(\theta - \theta_0)(z - \theta_0)]}{\sigma_0 \sqrt{\sigma_0^2 + \sigma^2}} = \frac{E[(\theta - \theta_0)([\theta - \theta_0] + n(T))]}{\sigma_0 \sqrt{\sigma_0^2 + \sigma^2}} = \frac{E[(\theta - \theta_0)^2] + E[(\theta - \theta_0)\, n(T)]}{\sigma_0 \sqrt{\sigma_0^2 + \sigma^2}}

= \frac{\sigma_0^2}{\sigma_0 \sqrt{\sigma_0^2 + \sigma^2}} = \frac{\sigma_0}{\sqrt{\sigma_0^2 + \sigma^2}} ,   (5-64)
since θ and n(T) are independent. Hence, the Bayesian estimator is
\hat\theta_b(z) = \theta_0 + \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2} (z - \theta_0) .   (5-65)
The error is θ̃ = θ − θ̂_b, and E[θ̃] = 0 as shown by (5-61). That is, θ̂_b is an unbiased
estimator since its expected value is the mean of the quantity being estimated. The variance of
θ̃ is
VAR[\tilde\theta] = E[(\theta - \hat\theta_b)^2] = E\left[ \left( (\theta - \theta_0) - \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2} (z - \theta_0) \right)^2 \right]

= E[(\theta - \theta_0)^2] - 2\frac{\sigma_0^2}{\sigma_0^2 + \sigma^2} E[(\theta - \theta_0)(z - \theta_0)] + \left( \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2} \right)^2 E[(z - \theta_0)^2] .   (5-66)
Due to independence, we have
E[(\theta - \theta_0)(z - \theta_0)] = E[(\theta - \theta_0)(\theta - \theta_0 + n(T))] = E[(\theta - \theta_0)^2] = \sigma_0^2   (5-67)
E[(z - \theta_0)^2] = E[(\theta - \theta_0 + n(T))^2] = \sigma_0^2 + \sigma^2 .   (5-68)
Now, use (5-67) and (5-68) in (5-66) to obtain
VAR[\tilde\theta] = \sigma_0^2 - 2\frac{\sigma_0^4}{\sigma_0^2 + \sigma^2} + \left( \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2} \right)^2 [\sigma_0^2 + \sigma^2]

= \sigma_0^2 \left[ \frac{\sigma^2}{\sigma_0^2 + \sigma^2} \right] .   (5-69)
As expected, the variance of error θ̃ approaches zero as the noise average power (i.e., the
variance) σ2 → 0. On the other hand, as σ2 → ∞, we have VAR[θ̃] → σ02 (this is the noise
dominated case). As can be seen from (5-69), for all values of σ2, we have VAR[θ̃] < VAR[θ]
= σ02, which means that θ̂_b will always outperform the simple approach of selecting mean E[θ]
= θ0 as the estimate of θ.
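Example (5-9) can be checked end to end by simulation (the numbers below are hypothetical): with θ0 = 2, σ0 = 1 and σ = 2, Eq. (5-69) predicts an error variance of 1·4/(1+4) = 0.8, smaller than the prior variance σ0² = 1.

```python
import numpy as np

# Simulation sketch of Example (5-9) with hypothetical parameters:
# theta ~ N(theta0, sigma0^2), z = theta + N(0, sigma^2) noise.
rng = np.random.default_rng(5)
theta0, s0, s = 2.0, 1.0, 2.0
theta = rng.normal(theta0, s0, size=500_000)
z = theta + rng.normal(0.0, s, size=500_000)
theta_b = theta0 + (s0**2 / (s0**2 + s**2)) * (z - theta0)   # Eq. (5-65)
err_var = np.var(theta_b - theta)
assert abs(err_var - 0.8) < 0.01    # Eq. (5-69): sigma0^2*sigma^2/(sigma0^2+sigma^2)
assert err_var < s0**2              # beats guessing E[theta] = theta0
print(round(err_var, 2))
```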
Example (5-10): Bayesian Estimator for Multiple Sample Gaussian Case
As given by (5-69), the error variance (i.e., the uncertainty) of θ̂_b may be too large for some
applications. We can use a sample mean (involving multiple samples) in the Bayesian estimator
to lower this variance.
Take multiple samples of z(tk) = θ + n(tk), 1 ≤ k ≤ N (tk, 1 ≤ k ≤ N, denote the times at
which samples are taken). Assume that the tk are far enough apart in time that n(tk) and n(tj) are
independent for tk ≠ tj (for example, this would be the case if the time intervals between samples
are large compared to the reciprocal of the bandwidth of noise n(t)). Define the sample mean of
the collected data as
\bar z \equiv \frac{1}{N} \sum_{k=1}^{N} z(t_k) = \theta + \bar n ,   (5-70)
where
\bar n \equiv \frac{1}{N} \sum_{k=1}^{N} n(t_k)   (5-71)
is the sample mean of the noise. The quantity n̄ is Gaussian with mean E[n̄] = 0; due to
independence, the variance is

VAR[\bar n] = \frac{1}{N^2} \sum_{k=1}^{N} VAR[n(t_k)] = \frac{\sigma^2}{N} .   (5-72)
Note that z̄ ≡ θ + n̄ has the same form regardless of the number of samples N. Hence,
based on the data z̄, the Bayesian estimator for θ has the same form regardless of the number of
samples. We can adapt (5-65) and write
\hat\theta_b(\bar z) = \theta_0 + \frac{\sigma_0^2}{\sigma_0^2 + \sigma^2 / N} (\bar z - \theta_0) .   (5-73)
That is, in the Bayesian estimator formula, use sample mean z̄ instead of the single sample z.
Adapt (5-69) to the multiple sample case and write the variance of error θ̃ = θ − θ̂_b as
VAR[\tilde\theta] = \sigma_0^2 \left[ \frac{\sigma^2 / N}{\sigma_0^2 + \sigma^2 / N} \right] .   (5-74)
By making the number N of averaged samples large enough, we can “average out the noise” and
make (5-74) arbitrarily small.
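A simulation sketch of Example (5-10) with hypothetical numbers (σ0 = 1, σ = 3, N = 25): the averaged-data estimator (5-73) should achieve the reduced error variance (5-74), well below the single-sample value from (5-69).

```python
import numpy as np

# Simulation of the multiple-sample Bayesian estimator, hypothetical
# parameters: N noise samples are averaged before estimating theta.
rng = np.random.default_rng(6)
theta0, s0, s, N, M = 0.0, 1.0, 3.0, 25, 200_000
theta = rng.normal(theta0, s0, size=M)
zbar = theta + rng.normal(0.0, s, size=(N, M)).mean(axis=0)   # Eq. (5-70)
gain = s0**2 / (s0**2 + s**2 / N)
theta_b = theta0 + gain * (zbar - theta0)                     # Eq. (5-73)
err_var = np.var(theta_b - theta)
pred = s0**2 * (s**2 / N) / (s0**2 + s**2 / N)                # Eq. (5-74)
assert abs(err_var - pred) < 0.01
print(pred < s0**2 * s**2 / (s0**2 + s**2))   # beats the N = 1 case
```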
Conditional Multidimensional Gaussian Density
Let X be an n × 1 Gaussian random vector with E[X] = 0 and a positive definite n × n
covariance matrix Λ_X. Likewise, define Y as a zero-mean, m × 1 Gaussian random vector with
m × m positive definite covariance matrix Λ_Y. Also, define the n × m matrix Λ_XY = E[XY^T]; note
that Λ_XY^T = Λ_YX = E[YX^T], an m × n matrix. Find the conditional density f(X|Y).
First, define the (n+m) × 1 “super vector”
Z = \begin{bmatrix} X \\ Y \end{bmatrix} ,   (5-75)
which is obtained by “stacking” X on top of Y . The (n+m) × (n+m) covariance matrix for Z
is
\Lambda_Z = E[Z Z^T] = E\left[ \begin{bmatrix} X \\ Y \end{bmatrix} \begin{bmatrix} X^T & Y^T \end{bmatrix} \right] = \begin{bmatrix} \Lambda_X & \Lambda_{XY} \\ \Lambda_{YX} & \Lambda_Y \end{bmatrix} .   (5-76)
The inverse of this matrix can be expressed as (observe that \Lambda_Z \Lambda_Z^{-1} = I)
\Lambda_Z^{-1} = \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} ,   (5-77)
where A is n×n, B is n×m and C is m×m. These intermediate block matrices are given by
A = (\Lambda_X - \Lambda_{XY}\Lambda_Y^{-1}\Lambda_{YX})^{-1} = \Lambda_X^{-1}[\,I + \Lambda_{XY}\, C\, \Lambda_{YX}\Lambda_X^{-1}\,]

B = -A\,\Lambda_{XY}\Lambda_Y^{-1} = -\Lambda_X^{-1}\Lambda_{XY}\, C

C = (\Lambda_Y - \Lambda_{YX}\Lambda_X^{-1}\Lambda_{XY})^{-1} = \Lambda_Y^{-1}[\,I + \Lambda_{YX}\, A\, \Lambda_{XY}\Lambda_Y^{-1}\,] .   (5-78)
Now, the joint density is
f_{XY}(X, Y) = \frac{1}{(2\pi)^{(n+m)/2}\, |\Lambda_Z|^{1/2}} \exp\left\{ -\frac{1}{2} \begin{bmatrix} X^T & Y^T \end{bmatrix} \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} \right\} .   (5-79)
The marginal density is
f_Y(Y) = \frac{1}{(2\pi)^{m/2}\, |\Lambda_Y|^{1/2}} \exp\left\{ -\frac{1}{2}\, Y^T \Lambda_Y^{-1} Y \right\} .   (5-80)
From Bayes Theorem for densities
f(X \mid Y) = \frac{f_{XY}(X, Y)}{f_Y(Y)} = \frac{|\Lambda_Y|^{1/2}}{(2\pi)^{n/2}\, |\Lambda_Z|^{1/2}} \exp\left\{ -\frac{1}{2} \left( \begin{bmatrix} X^T & Y^T \end{bmatrix} \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} - Y^T \Lambda_Y^{-1} Y \right) \right\} .   (5-81)
However, straightforward but tedious matrix algebra yields
\begin{bmatrix} X^T & Y^T \end{bmatrix} \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} - Y^T \Lambda_Y^{-1} Y = \begin{bmatrix} X^T & Y^T \end{bmatrix} \begin{bmatrix} AX + BY \\ B^T X + CY \end{bmatrix} - Y^T \Lambda_Y^{-1} Y

= X^T A X + X^T B Y + Y^T B^T X + Y^T [\,C - \Lambda_Y^{-1}\,] Y

= X^T A X + 2 X^T B Y + Y^T [\,C - \Lambda_Y^{-1}\,] Y .   (5-82)
(Note that the scalar identity X^T B Y = Y^T B^T X was used in obtaining this result). From the
previous page, use the results B = -A\Lambda_{XY}\Lambda_Y^{-1} and C - \Lambda_Y^{-1} = \Lambda_Y^{-1}\Lambda_{YX}\, A\, \Lambda_{XY}\Lambda_Y^{-1} to write

\begin{bmatrix} X^T & Y^T \end{bmatrix} \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} - Y^T \Lambda_Y^{-1} Y = X^T A X - 2 X^T A \Lambda_{XY}\Lambda_Y^{-1} Y + Y^T \Lambda_Y^{-1}\Lambda_{YX}\, A\, \Lambda_{XY}\Lambda_Y^{-1} Y

= (X - \Lambda_{XY}\Lambda_Y^{-1} Y)^T\, A\, (X - \Lambda_{XY}\Lambda_Y^{-1} Y) .   (5-83)
To simplify the notation, define
M \equiv \Lambda_{XY}\Lambda_Y^{-1} Y \quad \text{(an } n \times 1 \text{ vector)}

Q \equiv A^{-1} = \Lambda_X - \Lambda_{XY}\Lambda_Y^{-1}\Lambda_{YX} \quad \text{(an } n \times n \text{ matrix)}   (5-84)
so that the quadratic form becomes
\begin{bmatrix} X^T & Y^T \end{bmatrix} \begin{bmatrix} A & B \\ B^T & C \end{bmatrix} \begin{bmatrix} X \\ Y \end{bmatrix} - Y^T \Lambda_Y^{-1} Y = (X - M)^T Q^{-1} (X - M) .   (5-85)
Now, we must find the quotient |\Lambda_Z| / |\Lambda_Y|. Write

\Lambda_Z = \begin{bmatrix} \Lambda_X & \Lambda_{XY} \\ \Lambda_{YX} & \Lambda_Y \end{bmatrix} = \begin{bmatrix} \Lambda_X - \Lambda_{XY}\Lambda_Y^{-1}\Lambda_{YX} & \Lambda_{XY}\Lambda_Y^{-1} \\ 0 & I_m \end{bmatrix} \begin{bmatrix} I_n & 0 \\ \Lambda_{YX} & \Lambda_Y \end{bmatrix} .   (5-86)
Im is the m × m identity matrix and In is the n × n identity matrix. Hence,
|\Lambda_Z| = |\Lambda_X - \Lambda_{XY}\Lambda_Y^{-1}\Lambda_{YX}|\; |\Lambda_Y|   (5-87)
\frac{|\Lambda_Z|}{|\Lambda_Y|} = |\Lambda_X - \Lambda_{XY}\Lambda_Y^{-1}\Lambda_{YX}| = |Q| .   (5-88)
Use Equations (5-85) and (5-88) in f(X|Y) to obtain

f(X \mid Y) = \frac{1}{(2\pi)^{n/2}\, |Q|^{1/2}} \exp\left[ -\frac{1}{2} (X - M)^T Q^{-1} (X - M) \right] ,   (5-89)
where
M \equiv \Lambda_{XY}\Lambda_Y^{-1} Y \quad \text{(an } n \times 1 \text{ vector)}

Q \equiv A^{-1} = \Lambda_X - \Lambda_{XY}\Lambda_Y^{-1}\Lambda_{YX} \quad \text{(an } n \times n \text{ matrix)} .   (5-90)
Vector M = E[X|Y] is the conditional expectation vector. Matrix Q = E[(X − M)(X − M)^T | Y]
is the conditional covariance matrix.
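The formulas (5-89)/(5-90) are easy to exercise numerically; a sketch for one small hypothetical case with n = 2, m = 1 and covariance blocks chosen by hand (all numbers below are illustrative assumptions, picked so that the full covariance matrix is positive definite):

```python
import numpy as np

# Hypothetical n = 2, m = 1 case: compute the conditional mean M and
# conditional covariance Q from the block formulas of Eq. (5-90).
Lx  = np.array([[2.0, 0.5], [0.5, 1.0]])   # Lambda_X  (2 x 2)
Lxy = np.array([[0.8], [0.3]])             # Lambda_XY (2 x 1)
Ly  = np.array([[1.5]])                    # Lambda_Y  (1 x 1)
Y   = np.array([[2.0]])                    # observed value of Y
M = Lxy @ np.linalg.inv(Ly) @ Y                # conditional mean, (5-90)
Q = Lx - Lxy @ np.linalg.inv(Ly) @ Lxy.T       # conditional covariance
assert np.allclose(Q, Q.T)                     # Q is symmetric ...
assert np.all(np.linalg.eigvalsh(Q) > 0)       # ... and positive definite
print(M.ravel())
```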
Generalizations to Nonzero Mean Case
Suppose E[X] = M_X and E[Y] = M_Y; then
f(X \mid Y) = \frac{1}{(2\pi)^{n/2}\, |Q|^{1/2}} \exp\left[ -\frac{1}{2} (X - M)^T Q^{-1} (X - M) \right] ,   (5-91)
where
M \equiv E[X \mid Y] = M_X + \Lambda_{XY}\Lambda_Y^{-1} (Y - M_Y) \quad \text{(an } n \times 1 \text{ vector)}

Q \equiv E[(X - M)(X - M)^T \mid Y] = \Lambda_X - \Lambda_{XY}\Lambda_Y^{-1}\Lambda_{YX} \quad \text{(an } n \times n \text{ matrix)} .   (5-92)