Transcript of Lecture Notes 3: Random Vectors (isl.stanford.edu/~abbas/ee278/lect03.pdf)

Lecture Notes 3

Random Vectors

• Specifying a Random Vector

• Mean and Covariance Matrix

• Coloring and Whitening

• Gaussian Random Vectors


Specifying a Random Vector

• Let X1, X2, . . . , Xn be random variables defined on the same probability space. We define a random vector (RV) as

X = [X1 X2 · · · Xn]^T

• X is completely specified by its joint cdf for x = (x1, x2, . . . , xn):

FX(x) = P{X1 ≤ x1, X2 ≤ x2, . . . , Xn ≤ xn}, x ∈ R^n

• If X is continuous, i.e., FX(x) is a continuous function of x, then X can be specified by its joint pdf:

fX(x) = fX1,X2,...,Xn(x1, x2, . . . , xn), x ∈ R^n

• If X is discrete then it can be specified by its joint pmf:

pX(x) = pX1,X2,...,Xn(x1, x2, . . . , xn), x ∈ X^n


• A marginal cdf (pdf, pmf) is the joint cdf (pdf, pmf) for a subset of {X1, . . . , Xn}; e.g., for

X = [X1 X2 X3]^T

the marginals are

fX1(x1) , fX2(x2) , fX3(x3)

fX1,X2(x1, x2) , fX1,X3(x1, x3) , fX2,X3(x2, x3)

• The marginals can be obtained from the joint in the usual way. For the previous example,

FX1(x1) = lim_{x2,x3→∞} FX(x1, x2, x3)

fX1,X2(x1, x2) = ∫_{−∞}^{∞} fX1,X2,X3(x1, x2, x3) dx3


• Conditional cdf (pdf, pmf) can also be defined in the usual way. E.g., the conditional pdf of X_{k+1}^n = (Xk+1, . . . , Xn) given X^k = (X1, . . . , Xk) is

fX_{k+1}^n|X^k(x_{k+1}^n | x^k) = fX(x1, x2, . . . , xn) / fX^k(x1, x2, . . . , xk) = fX(x) / fX^k(x^k)

• Chain Rule: We can write

fX(x) = fX1(x1) fX2|X1(x2|x1) fX3|X1,X2(x3|x1, x2) · · · fXn|X^{n−1}(xn|x^{n−1})

Proof: By induction. The chain rule holds for n = 2 by definition of conditional pdf. Now suppose it is true for n − 1. Then

fX(x) = fX^{n−1}(x^{n−1}) fXn|X^{n−1}(xn|x^{n−1})

= fX1(x1) fX2|X1(x2|x1) · · · fX_{n−1}|X^{n−2}(x_{n−1}|x^{n−2}) fXn|X^{n−1}(xn|x^{n−1}),

which completes the proof


Independence and Conditional Independence

• Independence is defined in the usual way; e.g., X1, X2, . . . , Xn are independent if

fX(x) = ∏_{i=1}^{n} fXi(xi) for all (x1, . . . , xn)

• Important special case, i.i.d. r.v.s: X1, X2, . . . , Xn are said to be independent, identically distributed (i.i.d.) if they are independent and have the same marginals

Example: if we flip a coin n times independently, we generate i.i.d. Bern(p) r.v.s X1, X2, . . . , Xn

• R.v.s X1 and X3 are said to be conditionally independent given X2 if

fX1,X3|X2(x1, x3|x2) = fX1|X2(x1|x2) fX3|X2(x3|x2) for all (x1, x2, x3)

• Conditional independence neither implies nor is implied by independence; X1 and X3 independent given X2 does not mean that X1 and X3 are independent (or vice versa)


• Example: Coin with random bias. Given a coin with random bias P ∼ fP(p), flip it n times independently to generate the r.v.s X1, X2, . . . , Xn, where Xi = 1 if the i-th flip is heads, 0 otherwise

◦ X1, X2, . . . , Xn are not independent

◦ However, X1, X2, . . . , Xn are conditionally independent given P; in fact, they are i.i.d. Bern(p) for every P = p

• Example: Additive noise channel. Consider an additive noise channel with signal X, noise Z, and observation Y = X + Z, where X and Z are independent

◦ Although X and Z are independent, they are not in general conditionally independent given Y


Mean and Covariance Matrix

• The mean of the random vector X is defined as

E(X) = [E(X1) E(X2) · · · E(Xn)]^T

• Denote the covariance between Xi and Xj, Cov(Xi, Xj), by σij (so the variance of Xi is denoted by σii, Var(Xi), or σXi^2)

• The covariance matrix of X is defined as

ΣX = [σ11 σ12 · · · σ1n; σ21 σ22 · · · σ2n; · · · ; σn1 σn2 · · · σnn]

• For n = 2, we can use the definition of correlation coefficient to obtain

ΣX = [σ11 σ12; σ21 σ22] = [σX1^2  ρX1,X2 σX1 σX2;  ρX1,X2 σX1 σX2  σX2^2]


Properties of Covariance Matrix ΣX

• ΣX is real and symmetric (since σij = σji)

• ΣX is positive semidefinite, i.e., the quadratic form

a^T ΣX a ≥ 0 for every real vector a

Equivalently, all the eigenvalues of ΣX are nonnegative, and also all principal minors are nonnegative

• To show that ΣX is positive semidefinite we write

ΣX = E[(X − E(X))(X − E(X))^T],

i.e., as the expectation of an outer product. Thus

a^T ΣX a = a^T E[(X − E(X))(X − E(X))^T] a

= E[a^T (X − E(X))(X − E(X))^T a]

= E[(a^T (X − E(X)))^2] ≥ 0


Which of the Following Can Be a Covariance Matrix?

1. [1 0 0; 0 1 0; 0 0 1]

2. [1 2 1; 2 1 1; 1 1 1]

3. [1 0 1; 1 2 1; 0 1 3]

4. [−1 1 1; 1 1 1; 1 1 1]

5. [1 1 1; 1 2 1; 1 1 3]

6. [1 2 3; 2 4 6; 3 6 9]


Coloring and Whitening

• Square root of a covariance matrix: Let Σ be a covariance matrix. Then there exists an n × n matrix Σ^{1/2} such that Σ = Σ^{1/2}(Σ^{1/2})^T. The matrix Σ^{1/2} is called the square root of Σ

• Coloring: Let X be a white RV, i.e., one with zero mean and ΣX = aI, a > 0. Assume without loss of generality that a = 1

Let Σ be a covariance matrix; then the RV Y = Σ^{1/2}X has covariance matrix Σ (why?)

Hence we can generate a RV with any prescribed covariance from a white RV

• Whitening: Given a zero mean RV Y with nonsingular covariance matrix Σ, the RV X = Σ^{−1/2}Y is white

Hence, we can generate a white RV from any RV with nonsingular covariance matrix

• Coloring and whitening have applications in simulations, detection, and estimation
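As an illustrative sketch (my addition, not from the notes), the following colors white samples with one particular square root and then whitens them back. It assumes scipy.linalg.sqrtm for the symmetric square root; the target Σ and the sample size are arbitrary choices:

    import numpy as np
    from scipy.linalg import sqrtm

    rng = np.random.default_rng(0)
    Sigma = np.array([[2.0, 1.0], [1.0, 3.0]])    # target covariance (arbitrary choice)

    S = np.real(sqrtm(Sigma))                     # symmetric square root: S @ S.T == Sigma
    X = rng.standard_normal((2, 10000))           # white samples: zero mean, covariance ~ I

    Y = S @ X                                     # coloring: covariance of Y approaches Sigma
    W = np.linalg.inv(S) @ Y                      # whitening: back to (approximately) white

    print(np.cov(Y))                              # ~ [[2, 1], [1, 3]]
    print(np.cov(W))                              # ~ identity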


Finding Square Root of Σ

• For convenience, we assume throughout that Σ is nonsingular

• Since Σ is symmetric, it has n real eigenvalues λ1, λ2, . . . , λn and n corresponding orthogonal eigenvectors u1, u2, . . . , un

Further, since Σ is positive definite, the eigenvalues are all positive

• Thus, we have

Σui = λi ui, λi > 0, i = 1, 2, . . . , n

ui^T uj = 0 for every i ≠ j

Without loss of generality assume that the ui vectors are unit vectors

• The first set of equations can be rewritten in the matrix form

ΣU = UΛ,

where

U = [u1 u2 . . . un]

and Λ is a diagonal matrix with diagonal elements λi


• Note that U is a unitary matrix (U^T U = U U^T = I), hence

Σ = U Λ U^T

and the square root of Σ is

Σ^{1/2} = U Λ^{1/2},

where Λ^{1/2} is a diagonal matrix with diagonal elements λi^{1/2}

• The inverse of the square root is straightforward to find as

Σ^{−1/2} = Λ^{−1/2} U^T

• Example: Let

Σ = [2 1; 1 3]

To find the eigenvalues of Σ, we find the roots of the polynomial equation

det(Σ − λI) = λ^2 − 5λ + 5 = 0,

which gives λ1 = 3.62, λ2 = 1.38

To find the eigenvectors, consider

[2 1; 1 3] [u11; u12] = 3.62 [u11; u12],

and u11^2 + u12^2 = 1, which yields

u1 = [0.53; 0.85]

Similarly, we can find the second eigenvector

u2 = [−0.85; 0.53]

Hence,

Σ^{1/2} = [0.53 −0.85; 0.85 0.53] [√3.62 0; 0 √1.38] = [1 −1; 1.62 0.62]

The inverse of the square root is

Σ^{−1/2} = [1/√3.62 0; 0 1/√1.38] [0.53 0.85; −0.85 0.53] = [0.28 0.45; −0.72 0.45]
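A short NumPy check of this example (a sketch, not part of the notes): build Σ^{1/2} = U Λ^{1/2} from the eigendecomposition and verify that it reproduces Σ. Note that np.linalg.eigh returns the eigenvalues in ascending order and may flip eigenvector signs, so the entries can differ from those above while still giving a valid square root:

    import numpy as np

    Sigma = np.array([[2.0, 1.0], [1.0, 3.0]])

    # eigh returns ascending eigenvalues and orthonormal eigenvectors as columns
    lam, U = np.linalg.eigh(Sigma)
    print(lam)                                      # ~ [1.38, 3.62]

    S_half = U @ np.diag(np.sqrt(lam))              # Sigma^{1/2} = U Lambda^{1/2}
    S_inv_half = np.diag(1 / np.sqrt(lam)) @ U.T    # Sigma^{-1/2} = Lambda^{-1/2} U^T

    print(S_half @ S_half.T)                        # recovers Sigma
    print(S_inv_half @ Sigma @ S_inv_half.T)        # ~ identity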


Geometric Interpretation

• To generate a RV Y with covariance matrix Σ from a white RV X, we use the transformation Y = U Λ^{1/2} X

• Equivalently, we first scale each component of X to obtain the RV Z = Λ^{1/2} X

And then rotate Z using U to obtain Y = UZ

• We can visualize this by plotting x^T I x = c, z^T Λ z = c, and y^T Σ y = c

[Figure: contours in the (x1, x2), (z1, z2), and (y1, y2) planes, related by the forward maps Z = Λ^{1/2} X and Y = UZ and the inverse maps X = Λ^{−1/2} Z and Z = U^T Y]
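To make the two-step picture concrete, here is a small sampling sketch (my addition): scale white samples by Λ^{1/2}, rotate by U, and check the empirical covariance at each stage. The Σ and the sample size are arbitrary choices:

    import numpy as np

    Sigma = np.array([[2.0, 1.0], [1.0, 3.0]])
    lam, U = np.linalg.eigh(Sigma)

    rng = np.random.default_rng(1)
    X = rng.standard_normal((2, 50000))   # white: covariance ~ I (circular contours)

    Z = np.diag(np.sqrt(lam)) @ X         # scaling: covariance ~ Lambda (axis-aligned ellipse)
    Y = U @ Z                             # rotation: covariance ~ Sigma (tilted ellipse)

    print(np.cov(Z))                      # ~ diag(lambda_1, lambda_2)
    print(np.cov(Y))                      # ~ Sigma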


Cholesky Decomposition

• Σ has many square roots:

If Σ^{1/2} is a square root, then for any unitary matrix V, Σ^{1/2}V is also a square root since Σ^{1/2}V V^T (Σ^{1/2})^T = Σ

• The Cholesky decomposition is an efficient algorithm for computing a lower triangular square root that can be used to perform coloring causally (sequentially)

• For n = 3, we want to find a lower triangular matrix (square root) A such that

Σ = [σ11 σ12 σ13; σ21 σ22 σ23; σ31 σ32 σ33] = [a11 0 0; a21 a22 0; a31 a32 a33] [a11 a21 a31; 0 a22 a32; 0 0 a33]

The elements of A are computed in a raster scan manner:

a11: σ11 = a11^2 ⇒ a11 = √σ11

a21: σ21 = a21 a11 ⇒ a21 = σ21/a11

a22: σ22 = a21^2 + a22^2 ⇒ a22 = √(σ22 − a21^2)

a31: σ31 = a11 a31 ⇒ a31 = σ31/a11

a32: σ32 = a21 a31 + a22 a32 ⇒ a32 = (σ32 − a21 a31)/a22

a33: σ33 = a31^2 + a32^2 + a33^2 ⇒ a33 = √(σ33 − a31^2 − a32^2)
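The raster-scan recursion above can be written directly in NumPy and compared against the library routine (a sketch, my addition; the 3×3 test matrix is an arbitrary positive definite choice):

    import numpy as np

    Sigma = np.array([[4.0, 2.0, 2.0],
                      [2.0, 5.0, 3.0],
                      [2.0, 3.0, 6.0]])   # arbitrary positive definite test matrix

    A = np.zeros((3, 3))
    A[0, 0] = np.sqrt(Sigma[0, 0])                              # a11
    A[1, 0] = Sigma[1, 0] / A[0, 0]                             # a21
    A[1, 1] = np.sqrt(Sigma[1, 1] - A[1, 0]**2)                 # a22
    A[2, 0] = Sigma[2, 0] / A[0, 0]                             # a31
    A[2, 1] = (Sigma[2, 1] - A[1, 0] * A[2, 0]) / A[1, 1]       # a32
    A[2, 2] = np.sqrt(Sigma[2, 2] - A[2, 0]**2 - A[2, 1]**2)    # a33

    print(np.allclose(A @ A.T, Sigma))                # True
    print(np.allclose(A, np.linalg.cholesky(Sigma)))  # matches the library lower-triangular factor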

• The inverse of a lower triangular square root is also lower triangular

• Coloring and whitening summary:

◦ Coloring: X with ΣX = I → [ Σ^{1/2} ] → Y with ΣY = Σ

◦ Whitening: Y with ΣY = Σ → [ Σ^{−1/2} ] → X with ΣX = I

◦ The lower triangular square root and its inverse can be efficiently computed using the Cholesky decomposition


Gaussian Random Vectors

• A random vector X = (X1, . . . , Xn) is a Gaussian random vector (GRV) (or X1, X2, . . . , Xn are jointly Gaussian r.v.s) if the joint pdf is of the form

fX(x) = 1/((2π)^{n/2} |Σ|^{1/2}) e^{−(1/2)(x − µ)^T Σ^{−1} (x − µ)},

where µ is the mean and Σ is the covariance matrix of X, and |Σ| > 0, i.e., Σ is positive definite

• Verify that this joint pdf is the same as the case n = 2 from Lecture Notes 2

• Notation: X ∼ N (µ,Σ) denotes a GRV with given mean and covariance matrix

• Since Σ is positive definite, Σ^{−1} is positive definite. Thus if x − µ ≠ 0,

(x − µ)^T Σ^{−1} (x − µ) > 0,

which means that the contours of equal pdf are ellipsoids

• The GRV X ∼ N(0, aI), where I is the identity matrix and a > 0, is called white; its contours of equal joint pdf are spheres centered at the origin
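As a numerical sanity check (my addition), the joint pdf formula can be evaluated directly and compared with scipy.stats.multivariate_normal; the µ, Σ, and test point below are arbitrary choices:

    import numpy as np
    from scipy.stats import multivariate_normal

    mu = np.array([1.0, 2.0])
    Sigma = np.array([[2.0, 1.0], [1.0, 3.0]])
    x = np.array([0.5, 1.0])                       # arbitrary test point

    n = len(mu)
    d = x - mu
    quad = d @ np.linalg.inv(Sigma) @ d            # (x - mu)^T Sigma^{-1} (x - mu)
    pdf = np.exp(-0.5 * quad) / ((2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma)))

    print(pdf)
    print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))   # should agree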


Properties of GRVs

• Property 1: For a GRV, uncorrelation implies independence

This can be verified by substituting σij = 0 for all i ≠ j in the joint pdf.

Then Σ becomes diagonal and so does Σ^{−1}, and the joint pdf reduces to the product of the marginals Xi ∼ N(µi, σii)

For the white GRV X ∼ N (0, aI), the r.v.s are i.i.d. N (0, a)

• Property 2: Linear transformation of a GRV yields a GRV, i.e., given any m × n matrix A, where m ≤ n and A has full rank m, then

Y = AX ∼ N(Aµ, AΣA^T)

• Example: Let

X ∼ N(0, [2 1; 1 3])

Find the joint pdf of

Y = [1 1; 1 0] X


Solution: From Property 2, we conclude that

Y ∼ N(0, [1 1; 1 0] [2 1; 1 3] [1 1; 1 0]) = N(0, [7 3; 3 2])
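A one-line check of this computation (my addition, assuming NumPy):

    import numpy as np

    A = np.array([[1, 1], [1, 0]])
    Sigma = np.array([[2, 1], [1, 3]])

    print(A @ Sigma @ A.T)   # [[7, 3], [3, 2]], the covariance of Y = A X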

Before we prove Property 2, let us show that

E(Y) = Aµ and ΣY = AΣA^T

These results follow from linearity of expectation. First, expectation:

E(Y) = E(AX) = AE(X) = Aµ

Next consider the covariance matrix:

ΣY = E[(Y − E(Y))(Y − E(Y))^T]

= E[(AX − Aµ)(AX − Aµ)^T]

= A E[(X − µ)(X − µ)^T] A^T = AΣA^T

Of course this is not sufficient to show that Y is a GRV; we must also show that the joint pdf has the right form

We do so using the characteristic function for a random vector


• Definition: If X ∼ fX(x), the characteristic function of X is

ΦX(ω) = E(e^{iω^T X}),

where ω is an n-dimensional real valued vector and i = √−1

Thus

ΦX(ω) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} fX(x) e^{iω^T x} dx

This is the inverse of the multi-dimensional Fourier transform of fX(x), which implies that there is a one-to-one correspondence between ΦX(ω) and fX(x). The joint pdf can be found by taking the Fourier transform of ΦX(ω), i.e.,

fX(x) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} (1/(2π)^n) ΦX(ω) e^{−iω^T x} dω

• Example: The characteristic function for X ∼ N(µ, σ^2) is

ΦX(ω) = e^{−(1/2)ω^2 σ^2 + iµω},

and for a GRV X ∼ N(µ, Σ),

ΦX(ω) = e^{−(1/2)ω^T Σ ω + iω^T µ}
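As an illustration (my addition), a Monte Carlo estimate of E(e^{iω^T X}) for a GRV can be compared with the closed form above; the ω, µ, Σ, and sample size are arbitrary choices:

    import numpy as np

    rng = np.random.default_rng(0)
    mu = np.array([1.0, -1.0])
    Sigma = np.array([[2.0, 1.0], [1.0, 3.0]])
    omega = np.array([0.3, -0.2])                 # arbitrary frequency vector

    X = rng.multivariate_normal(mu, Sigma, size=200000)
    mc = np.mean(np.exp(1j * (X @ omega)))        # Monte Carlo estimate of E[exp(i w^T X)]

    closed = np.exp(-0.5 * (omega @ Sigma @ omega) + 1j * (omega @ mu))
    print(mc, closed)                             # should be close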


• Now let’s go back to proving Property 2

Since A is an m × n matrix, Y = AX and ω are m-dimensional. Therefore the characteristic function of Y is

ΦY(ω) = E(e^{iω^T Y}) = E(e^{iω^T AX}) = ΦX(A^T ω)

= e^{−(1/2)(A^T ω)^T Σ (A^T ω) + iω^T Aµ}

= e^{−(1/2)ω^T (AΣA^T) ω + iω^T Aµ}

Thus Y = AX ∼ N(Aµ, AΣA^T)

• An equivalent definition of a GRV: X is a GRV iff for every real vector a ≠ 0, the r.v. Y = a^T X is Gaussian (see HW for proof)

• Whitening transforms a GRV to a white GRV; conversely, coloring transforms a white GRV to a GRV with a prescribed covariance matrix


• Property 3: Marginals of a GRV are Gaussian, i.e., if X is a GRV then for any subset {i1, i2, . . . , ik} ⊂ {1, 2, . . . , n} of indexes, the RV

Y = [Xi1 Xi2 · · · Xik]^T

is a GRV

• To show this we use Property 2. For example, let n = 3 and Y = [X1; X3]

We can express Y as a linear transformation of X:

Y = [1 0 0; 0 0 1] [X1; X2; X3] = [X1; X3]

Therefore

Y ∼ N([µ1; µ3], [σ11 σ13; σ31 σ33])

• As we have seen in Lecture Notes 2, the converse of Property 3 does not hold in general, i.e., Gaussian marginals do not necessarily mean that the r.v.s are jointly Gaussian


• Property 4: Conditionals of a GRV are Gaussian, more specifically, if

X = [X1; X2] ∼ N([µ1; µ2], [Σ11 Σ12; Σ21 Σ22]),

where X1 is a k-dim RV and X2 is an (n − k)-dim RV, then

X2 | {X1 = x} ∼ N( Σ21 Σ11^{−1} (x − µ1) + µ2 , Σ22 − Σ21 Σ11^{−1} Σ12 )

Compare this to the case of n = 2 and k = 1:

X2 | {X1 = x} ∼ N( (σ21/σ11)(x − µ1) + µ2 , σ22 − σ12^2/σ11 )

• Example:

[X1; X2; X3] ∼ N( [1; 2; 2] , [1 2 1; 2 5 2; 1 2 9] ),

partitioned with X1 as the first block and (X2, X3) as the second


From Property 4, it follows that

E(X2 | X1 = x) = [2; 1](x − 1) + [2; 2] = [2x; x + 1]

ΣX2|X1=x = [5 2; 2 9] − [2; 1][2 1] = [1 0; 0 8]
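The same numbers can be reproduced with a short NumPy sketch of the Property 4 formulas (my addition); the notes leave x symbolic, so an arbitrary value x = 3 is plugged in here:

    import numpy as np

    mu1, mu2 = np.array([1.0]), np.array([2.0, 2.0])
    S11 = np.array([[1.0]])
    S12 = np.array([[2.0, 1.0]])
    S21 = S12.T
    S22 = np.array([[5.0, 2.0], [2.0, 9.0]])

    x = np.array([3.0])                                   # arbitrary observed value of X1

    cond_mean = S21 @ np.linalg.inv(S11) @ (x - mu1) + mu2
    cond_cov = S22 - S21 @ np.linalg.inv(S11) @ S12

    print(cond_mean)   # [2x, x + 1] evaluated at x = 3 -> [6, 4]
    print(cond_cov)    # [[1, 0], [0, 8]]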

• The proof of Property 4 follows from properties 1 and 2 and the orthogonality principle (HW exercise)
