
2102-502 Random Signals and Systems

Text: "Random Signals and Systems", Richard E. Mortensen, Wiley. Chapters 1–6, 8 and 11.

Chapter 1 Discussion of Probability and Stochastic Processes

Probability is a quantitative measure of the uncertainty that a random event will occur. Other similar measures, such as fuzzy membership, plausibility, belief, etc., do the same job but with different properties.

Event: a set of elementary events. A random event occurs if one (and only one) of the elementary events in the set occurs. Two events can occur at the same time if they share the elementary event that occurred.

Sample Space: a universal set that includes all possible elementary events. Any outcome must be a member of this set, but not every elementary event in the set need occur (some have zero probability).


Questions
• Does probability really exist in the real physical world in the same way as Newton's laws of motion?
• What is the mechanism that tries to balance the numbers of Heads and Tails when flipping a balanced coin 10^6 times? God?
• Can someone prove or find the correct value of the probability of any event?

Answers
• Probability is just a mathematical concept, or model, that tries to fit, explain and predict the complex, uncertain real world.
• We can use other models to fit and explain the real world as well, e.g., fuzzy logic, if appropriate. There is no single genuine or best model.
• We can confirm the correctness of a probability value up to some accuracy by comparing predictions with experiments. But we still do not know the true probabilities anyway.
• Probability values are assigned to all events, and if this gives good enough predictions compared to the experiments or observations, then we can accept that assignment.


• The frequency of occurrence over a large number of experiments can be thought of as an estimate of the probability, as well as the physical interpretation of probability itself. But in many cases we cannot perform many experiments, and this type of interpretation may not be useful in real life. For example, to estimate the probability of one particular student getting an A in this course, he must take the course 100 times, and if he gets an A 20 times then his probability is 0.2 !?!?!?!

Examples
• Flip 10 coins: {(TTT…T), (HTT…T), (THT…T), (HHT…T), …, (HHH…H)}, with 2^10 = 1,024 elementary events
• No. of Heads in flipping 10 coins: {0, 1, …, 10}
• Sum of 2 dice: {2, 3, …, 12}
• Number of coin flips until the 1st Head: {1, 2, …, ∞}
• Fractional part of body weight in kg: [0, 1)
• Noise voltage: (−∞, ∞), the whole real line

Finite Case: We can assign arbitrary probabilities to all elementary events such that 0 ≤ prob ≤ 1 and the total sum of all probabilities is 1. Then we can find the probability of any set as the sum of elementary probabilities without any conflict.


Infinite Case: The probability of each elementary event is 0, and we can only assign probabilities to some subsets of elementary events. We will not be able to find the probabilities of all subsets; only some subsets, called admissible subsets, can have probabilities.

Probability Trio (Ω, A, P): the entities that define a probability space.

Ω: sample space, with a finite, countably infinite or uncountably infinite number of elementary events.

A: a family of admissible subsets (events) of Ω to which probabilities can be assigned, also called a Borel field or σ-field.
• Finite case: all 2^|Ω| subsets of Ω (the power set)
• Infinite case: only some subsets of Ω

P: the probability measure (value) assigned to every admissible subset, satisfying the following axioms:

$$ P(\varnothing) = 0, \qquad P(\Omega) = 1, \qquad P(A_k) \ge 0 \ \text{for all } A_k \in \mathcal{A}, $$
$$ P\Big(\bigcup_{k=1}^{\infty} A_k\Big) = \sum_{k=1}^{\infty} P(A_k) \quad \text{for } A_i \cap A_j = \varnothing,\ i \ne j $$
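As a concrete illustration of the finite case (an added sketch, not from the text): Python code that builds a finite probability trio for two fair dice, with P defined by summing elementary probabilities, so the axioms can be checked directly.

```python
from itertools import product
from fractions import Fraction

# Sample space: the 36 elementary events of two fair dice, each with prob 1/36.
omega = set(product(range(1, 7), repeat=2))
p_elem = {w: Fraction(1, 36) for w in omega}

def P(event):
    """Probability of an event = sum of its elementary probabilities."""
    return sum(p_elem[w] for w in event)

sum_is_7 = {w for w in omega if w[0] + w[1] == 7}
sum_is_2 = {w for w in omega if w[0] + w[1] == 2}

print(P(sum_is_7))            # 1/6
print(P(omega))               # 1, so P(Omega) = 1
# Additivity for disjoint events:
assert P(sum_is_7 | sum_is_2) == P(sum_is_7) + P(sum_is_2)
```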


Examples
Flipping a coin infinitely many times gives a sample space with an uncountably infinite number of outcomes, and each outcome can be represented by an infinite bit sequence. The probability of each individual sequence is zero, and we can only assign P to some subsets of sequences. One possible family of subsets is the power set of

B = {H…, TH…, TTH…, TTTH…, TTTTH…, …}

which is a countably infinite set of subsets of Ω. The probabilities assigned to the members (subsets) of B are 1/2, 1/4, 1/8, 1/16, 1/32, … respectively if the coin is balanced. Note that the subsets are disjoint, so the probability of any combination (union) of these subsets is the sum of their probabilities. But, still, there exist infinitely many subsets of Ω whose probabilities we cannot find from this probability trio. For example, the subset of all outcomes which are irrational numbers in binary representation, e.g., HTHH… represented as 0.1011… .
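A quick numeric check (added, not from the text): the probabilities 1/2, 1/4, 1/8, … assigned to the members of B sum to 1, as the axioms require.

```python
from fractions import Fraction

# P(first Head occurs on flip k) = (1/2)**k for a balanced coin.
probs = [Fraction(1, 2**k) for k in range(1, 51)]
print(float(sum(probs)))   # 0.999999...; the full infinite series sums to exactly 1
```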

[Figure: binary-tree representation of the coin-flip sequences, mapping H…, TH…, TTH…, TTTH… to the binary fractions 0.1, 0.01, 0.001, 0.0001, … on [0, 1).]

Random Variables are real numbers X assigned (or mapped) to the elementary events ω of Ω, i.e., a real function X(ω). A set of elementary events can then be represented by a set of real numbers with the same probability. For the infinite case, we cannot assign a random variable X(ω) arbitrarily: the assignment must agree with the admissible subsets A as follows. The assignment must allow us to find the probabilities of all intervals I = (−∞, a], for all a, on the real line. Let X⁻¹(I) = {ω ∈ Ω | X(ω) ∈ I} be the set of random events represented by the real interval I. Then X is an admissible random variable if X⁻¹(I) ∈ A for all a. This guarantees the existence of the probabilities in the cdf (Cumulative Distribution Function), defined as F_X(a) = P{−∞ < X ≤ a} = P(I) = P{X⁻¹(I)}.

[Figure: the mapping X(ω) taking an admissible subset of Ω onto an interval of the real line.]

If Ω is uncountably infinite, F_X may be differentiable, and we define the pdf (Probability Density Function) as

$$ f_X(x) = \frac{d}{dx} F_X(x), \qquad P\{a < X \le b\} = \int_a^b f_X(x)\,dx $$

Properties of cdf
1. Non-decreasing: a < b ⇒ F_X(a) ≤ F_X(b)
2. lim_{x→+∞} F_X(x) = 1 and lim_{x→−∞} F_X(x) = 0
3. Continuous from the right: any discontinuity takes its upper value

Properties of pdf
1. Non-negative, but can be greater than 1
2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$
3. A discontinuity in the cdf gives a delta function in the pdf

[Figures: staircase cdf F_X and impulsive pdf f_X of a discrete random variable; smooth cdf and pdf of a continuous random variable.]


Expected Value or Mean of a random variable is the best guess, in the sense that it minimizes the average squared error.

Discrete:

$$ \mu = \sum_{\omega\in\Omega} X(\omega)\,P\{\omega\} = \sum_{k=1}^{n} a_k\, P\{\omega\in\Omega \mid X(\omega)=a_k\} = E[X] \qquad \text{(a Lebesgue integral)} $$

Continuous:

$$ \mu = \int_{-\infty}^{\infty} x\, f_X(x)\,dx = E[X] $$

If X is a random variable then Y = g(X) is also a random variable. The expected value of the function g(X) is defined as

$$ E[g(X)] = \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx = \int_{-\infty}^{\infty} y\, f_Y(y)\,dy = E[Y] = \mu_Y $$
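A minimal numerical illustration (added; assumes X ~ N(0,1) and g(x) = x², which are my example choices): the sample average and the density integral both give E[g(X)] = 1.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(1_000_000)

# Ensemble (sample) estimate of E[g(X)] for g(x) = x**2 ...
print(np.mean(x**2))                          # ~ 1.0

# ... versus the density integral of g(x) f_X(x) on a fine grid.
u = np.linspace(-8, 8, 20001)
f = np.exp(-u**2 / 2) / np.sqrt(2 * np.pi)    # standard normal pdf
print(np.sum(u**2 * f) * (u[1] - u[0]))       # ~ 1.0
```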


It is possible to have many random variables defined on the same probability trio (Ω, A, P). Define the Joint Cumulative Distribution Function as

$$ F_{X_1,X_2,\dots,X_n}(a_1,a_2,\dots,a_n) = P\{-\infty < X_1 \le a_1,\ -\infty < X_2 \le a_2,\ \dots,\ -\infty < X_n \le a_n\} $$

[Figure: the 6×6 lattice of elementary events (i, j), i, j = 1,…,6, of two dice plotted on the (X1, X2) plane.]

And if it is jointly differentiable with respect to all arguments, define the Joint Density Function as

$$ f_{X_1,X_2,\dots,X_n}(x_1,x_2,\dots,x_n) = \frac{\partial^{\,n}}{\partial x_1\,\partial x_2 \cdots \partial x_n}\, F_{X_1,X_2,\dots,X_n}(x_1,x_2,\dots,x_n) $$

We can use vector-valued random variable notation for jointly distributed random variables, X = [X1 X2 … Xn]^T, with f_X(x) as the joint density function.


Independence and Conditional Probability
Given a probability trio (Ω, A, P), let A, B ∈ A. If

P(A∩B) = P(A)P(B)

we call A and B independent. In general, if P(B) ≠ 0, define the conditional probability of A given B as

P(A|B) = P(A∩B)/P(B)

If A and B are independent then P(A|B) = P(A), which means that event B has no influence on event A.

Bivariate: Given X, Y defined on the same (Ω, A, P) with joint density f_XY(x,y), define the marginal densities as

$$ f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dy \quad \text{(to eliminate } Y\text{)}, \qquad f_Y(y) = \int_{-\infty}^{\infty} f_{XY}(x,y)\,dx \quad \text{(to eliminate } X\text{)} $$

If X and Y are independent then f_XY(x,y) = f_X(x)f_Y(y). Define the conditional density of X given Y = y as

f_{X|Y}(x|y) = f_XY(x,y)/f_Y(y)

and if X and Y are independent then f_{X|Y}(x|y) = f_X(x).


Multivariate: X1, X2, …, Xn defined on the same (Ω, A, P) with joint density f_{X1,…,Xn}(x1,…,xn). Let 1 < m < n, and define the conditional density of X_{m+1}, X_{m+2}, …, X_n given X_1, X_2, …, X_m as

$$ f_{X_{m+1},\dots,X_n \mid X_1,\dots,X_m}(x_{m+1},\dots,x_n \mid x_1,\dots,x_m) = \frac{f_{X_1,\dots,X_n}(x_1,\dots,x_n)}{f_{X_1,\dots,X_m}(x_1,\dots,x_m)} $$

where the marginal density is

$$ f_{X_1,\dots,X_m}(x_1,\dots,x_m) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f_{X_1,\dots,X_n}(x_1,\dots,x_n)\; dx_{m+1}\,dx_{m+2}\cdots dx_n $$

If X_1, …, X_m are independent from X_{m+1}, …, X_n then

$$ f_{X_{m+1},\dots,X_n \mid X_1,\dots,X_m} = f_{X_{m+1},\dots,X_n} $$

Example (two dice; the figure indicates Y1 = X1 − X2, the difference, and Y2 = X1 + X2, the sum):

P(Y2=4 | Y1=2) = P(Y2=4, Y1=2)/P(Y1=2) = (1/36)/(4/36) = 1/4 ≠ P(Y2=4) = 3/36

[Figure: the 36 outcomes of two dice replotted on the (Y1, Y2) plane, Y1 ∈ {−5, …, 5}, Y2 ∈ {2, …, 12}.]
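A brute-force check of this example (added), enumerating all 36 outcomes under the reading Y1 = X1 − X2 and Y2 = X1 + X2:

```python
from itertools import product
from fractions import Fraction

outcomes = list(product(range(1, 7), repeat=2))  # 36 equally likely pairs

def P(pred):
    return Fraction(sum(1 for w in outcomes if pred(w)), 36)

p_joint = P(lambda w: w[0] + w[1] == 4 and w[0] - w[1] == 2)  # 1/36
p_y1    = P(lambda w: w[0] - w[1] == 2)                        # 4/36
print(p_joint / p_y1)                 # 1/4  = P(Y2=4 | Y1=2)
print(P(lambda w: w[0] + w[1] == 4))  # 1/12 = P(Y2=4), so Y1 and Y2 are dependent
```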


Hilbert Space of Second-Order Random Variables

Linear Vector Space: closed, associative, commutative, distributive, with an identity and additive inverses.

Banach Space: complete ≡ every Cauchy sequence converges in the normed vector space. The norm of a vector is a measure of the distance or length of the vector:

$$ \|X\| \ge 0; \quad \|X\| = 0 \ \text{iff}\ X = 0; \quad \|\alpha X\| = |\alpha|\cdot\|X\|; \quad \|X+Y\| \le \|X\| + \|Y\| $$

Hilbert Space: the inner product of two vectors measures the orientation between them. The normed vector space induced by the inner product is complete.

$$ \langle X,Y\rangle = \langle Y,X\rangle^{*}; \quad \langle X,X\rangle > 0 \Rightarrow X \ne 0; \quad \langle \alpha X,Y\rangle = \alpha\langle X,Y\rangle; \quad \langle X+Y,Z\rangle = \langle X,Z\rangle + \langle Y,Z\rangle $$

Induced norm: $\|X\| = \langle X,X\rangle^{1/2}$

[Diagram: nested sets — Vector Space (linear, closed) ⊃ Banach Space (adds norm, completeness) ⊃ Hilbert Space (adds inner product).]

Second-Order Random Variable: a random variable with finite second moment (variance, power, size):

$$ \mu_2 = E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx < \infty $$

The vector space of second-order random variables with the inner product below is a Hilbert space.

Inner product: ⟨X, Y⟩ = E[XY]
Induced norm: ‖X‖ = ⟨X, X⟩^{1/2} = E[X²]^{1/2} < ∞
Schwarz inequality: |⟨X, Y⟩| ≤ ‖X‖·‖Y‖

Linearly Independent: if and only if

c1X1 + c2X2 + … + cnXn = 0 implies c1 = c2 = … = cn = 0,

which means no X_k can be expressed as a linear combination of the other X's.

Statistically Independent: if and only if

$$ F_{X_1,X_2,\dots,X_n}(x_1,x_2,\dots,x_n) = \prod_{k=1}^{n} F_{X_k}(x_k) $$

If all X_k have µ = 0 and µ2 < ∞, then statistical independence implies linear independence too, but not conversely. For example, X and X² are linearly independent but not statistically independent.


Gram–Schmidt Orthogonalization

Given X1, X2, …, Xn, find a set of orthogonal vectors (inner product = 0):

$$ V_1 = X_1, \qquad V_2 = X_2 - \frac{\langle X_2, V_1\rangle}{\langle V_1, V_1\rangle}\, V_1, \qquad V_k = X_k - \sum_{j=1}^{k-1} \frac{\langle X_k, V_j\rangle}{\langle V_j, V_j\rangle}\, V_j $$

[Figure: X2 projected onto V1 = X1, leaving the orthogonal component V2.]

If X1, X2, …, Xn are all linearly independent, we get V1, V2, …, Vn as an orthogonal set. Otherwise V_k may equal 0 for some k; skip that X_k.
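A sketch of this procedure (added) in the Hilbert space of second-order random variables, representing each random variable by samples and using the sample mean of XY as the inner product ⟨X, Y⟩ = E[XY]:

```python
import numpy as np

def gram_schmidt(xs, tol=1e-9):
    """Orthogonalize random variables (given as sample vectors) under <X,Y> = E[XY]."""
    inner = lambda a, b: np.mean(a * b)   # sample estimate of E[XY]
    vs = []
    for x in xs:
        v = x.copy()
        for u in vs:
            v -= (inner(x, u) / inner(u, u)) * u
        if inner(v, v) > tol:             # V_k ~ 0 means X_k was (nearly) dependent
            vs.append(v)
    return vs

rng = np.random.default_rng(1)
w = rng.standard_normal((3, 100_000))
xs = [w[0], w[0] + 0.5 * w[1], w[0] + w[1] + w[2]]   # correlated second-order RVs
vs = gram_schmidt(xs)
print([round(np.mean(vs[i] * vs[j]), 2)
       for i in range(3) for j in range(3) if i < j])  # off-diagonals ~ 0
```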

Random Processes

Definition: A Random Process or Stochastic Process is a family of random variables {X_t | t ∈ T}, all based on the same (Ω, A, P). The set T is the parameter set of the random process.

We can interpret the set T as time: a random process is a system that outputs X_t at each time t ∈ T. The X_t can be statistically dependent on or independent of one another.

X_t is a random variable, while x_t is the value of X_t (just one instance of X_t that happened). When we say all X_t are identical, it means that all the random variables have the same cdf or pdf, not the same value x_t.

Examples
• Throwing 2 dice at a time: let X_t, for t = 1, 2, 3, …, be the sum of both dice thrown at time t. Here t is discrete, and all X_t are identical and statistically independent.
• Random walk: let X_t, for t = 1, 2, 3, …, be the distance you have moved from the original position. At each time t, toss a coin, then move forward one step on H and backward one step on T. X_t depends on (and only on) X_{t−1}, because X_t must be either X_{t−1}+1 or X_{t−1}−1. (See the simulation sketch after this list.)
• Wiener process: a random walk process in the limit ∆t → 0, ∆t = t_n − t_{n−1}. The step size must be reduced proportionally, and the process becomes continuous in time. The Wiener process in 3-dimensional space is called Brownian motion.

Classification: discrete parameter (or time) if t is discrete; continuous parameter (or time) if t is continuous; discrete state (or value) if X is discrete; continuous state (or value) if X is continuous.
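A minimal simulation of the random-walk example (added): each row is one realization x_t; averaging across many rows at fixed t approximates the ensemble statistics of X_t.

```python
import numpy as np

rng = np.random.default_rng(2)
n_paths, n_steps = 5000, 100
steps = rng.choice([-1, 1], size=(n_paths, n_steps))  # H -> +1, T -> -1
x = np.cumsum(steps, axis=1)                          # x[i, t] = realization i at time t+1

print(x[:, -1].mean())   # ensemble mean of X_100 ~ 0
print(x[:, -1].var())    # ensemble variance of X_100 ~ 100 (= number of steps)
```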


Markov Processes are random processes in which the next value X_{t_{m+1}} and all future values are statistically dependent on (and only on) the present X_{t_m} (not on X_{t_{m−1}} or earlier):

$$ f_{X_{t_{m+1}},X_{t_{m+2}},\dots,X_{t_n} \mid X_{t_1},X_{t_2},\dots,X_{t_m}} = f_{X_{t_{m+1}},X_{t_{m+2}},\dots,X_{t_n} \mid X_{t_m}} $$

Gaussian Processes are processes whose joint pdf over any set of X_{t_k} is a Gaussian distribution. Given any set t1, t2, …, tn in ascending order from T, for any integer n, there exist an n×n autocovariance matrix C and an n-dimensional mean vector µ such that

$$ f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}|\mathbf{C}|^{1/2}} \exp\!\big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\mathbf{C}^{-1}(\mathbf{x}-\boldsymbol{\mu})\big), \qquad \mathbf{X} = [X_{t_1}\ X_{t_2}\ \cdots\ X_{t_n}]^{T}, \qquad \mathbf{C} = E\big[(\mathbf{X}-\boldsymbol{\mu})(\mathbf{X}-\boldsymbol{\mu})^{T}\big] $$


Chapter 2 1-D and 2-D Gaussian Distribution

The 1-D Gaussian Distribution has the pdf

$$ f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}; \qquad \int_{-\infty}^{\infty} f(x)\,dx = 1 $$

Normalize x using $u = \frac{x-\mu}{\sigma}$, and the normalized pdf of u becomes

$$ \phi(u) = \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}; \qquad \int_{-\infty}^{\infty} \phi(u)\,du = 1 $$

Note that any pdf must have total area under the curve = 1. And because of symmetry, half of the area is

$$ \int_{\mu}^{\infty} f(x)\,dx = \int_{0}^{\infty} \phi(u)\,du = \tfrac{1}{2} $$

To calculate the probability of x in the interval [a, b):

$$ P\{a \le X < b\} = \int_a^b \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}\,dx = \int_{(a-\mu)/\sigma}^{(b-\mu)/\sigma} \frac{1}{\sqrt{2\pi}}\, e^{-u^2/2}\,du $$

[Figures: f_X(x) centered at µ with width σ; the standard normal φ(u) centered at 0.]


We can create a table of the integral

$$ \Phi(y) = \int_0^y \phi(u)\,du $$

and calculate the probability of an interval [a, b) as

$$ P\{a \le X < b\} = \pm\,\Phi\!\Big(\tfrac{|b-\mu|}{\sigma}\Big) \pm \Phi\!\Big(\tfrac{|a-\mu|}{\sigma}\Big) $$

with the signs depending on a and b: the two terms add when a and b lie on opposite sides of µ, and subtract when both lie on the same side.
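In modern practice the tabulated Φ is replaced by a library call; a small added sketch computing P{a ≤ X < b} both ways (the values µ = 1, σ = 2, a = 0, b = 3 are my example choices):

```python
from scipy.stats import norm

mu, sigma, a, b = 1.0, 2.0, 0.0, 3.0

# Direct cdf difference: P{a <= X < b} = F_X(b) - F_X(a)
print(norm.cdf(b, mu, sigma) - norm.cdf(a, mu, sigma))      # ~ 0.5328

# Same value via the zero-based table Phi(y) = integral_0^y phi(u) du
Phi = lambda y: norm.cdf(y) - 0.5
print(Phi((b - mu) / sigma) + Phi(abs(a - mu) / sigma))     # a and b straddle mu: signs add
```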

The 2-D Gaussian Distribution for two random variables has the joint Gaussian pdf

$$ f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi|\mathbf{C}|^{1/2}} \exp\!\left(-\tfrac{1}{2} \begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 \end{bmatrix} \mathbf{C}^{-1} \begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix}\right) $$

$$ \mathbf{C} = E\!\left( \begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix} \begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 \end{bmatrix} \right) = \begin{bmatrix} E(x_1-\mu_1)^2 & E(x_1-\mu_1)(x_2-\mu_2) \\ E(x_2-\mu_2)(x_1-\mu_1) & E(x_2-\mu_2)^2 \end{bmatrix} = \begin{bmatrix} c_{11} & c_{12} \\ c_{21} & c_{22} \end{bmatrix} \quad \text{(covariance matrix)} $$

[Figure: φ(u) with the shaded area Φ(y) from 0 to y.]


The covariance matrix C is symmetric (c12 = c21), and so is its inverse C⁻¹; |C| is its determinant:

$$ \mathbf{C}^{-1} = \frac{1}{|\mathbf{C}|}\begin{bmatrix} c_{22} & -c_{12} \\ -c_{12} & c_{11} \end{bmatrix}; \qquad |\mathbf{C}| = c_{11}c_{22} - c_{12}^2 $$

X1 and X2 are uncorrelated (or orthogonal) if c12 = E(x1−µ1)(x2−µ2) = 0. In the Gaussian case, uncorrelated implies statistically independent:

$$ f_{X_1,X_2}(x_1,x_2) = \frac{1}{2\pi\sqrt{c_{11}c_{22}}} \exp\!\left(-\frac{(x_1-\mu_1)^2}{2c_{11}} - \frac{(x_2-\mu_2)^2}{2c_{22}}\right) = \frac{1}{\sqrt{2\pi c_{11}}}\exp\!\left(-\frac{(x_1-\mu_1)^2}{2c_{11}}\right)\cdot \frac{1}{\sqrt{2\pi c_{22}}}\exp\!\left(-\frac{(x_2-\mu_2)^2}{2c_{22}}\right) = f_{X_1}(x_1)\,f_{X_2}(x_2) $$

[Figures: surface and contour plots of the 2-D Gaussian with c12 = c21 = 0, c11 = σ1², c22 = σ2², centered at (µ1, µ2).]


To calculate the probability of a rectangular area, let

$$ \mathbf{X} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}, \qquad \boldsymbol{\mu} = \begin{bmatrix} \mu_1 \\ \mu_2 \end{bmatrix} $$

$$ P\{a_1 \le x_1 < b_1,\ a_2 \le x_2 < b_2\} = \int_{a_1}^{b_1}\!\!\int_{a_2}^{b_2} f(x_1,x_2)\,dx_2\,dx_1 = \int_{a_1}^{b_1}\!\!\int_{a_2}^{b_2} \frac{1}{2\pi|\mathbf{C}|^{1/2}} \exp\!\big(-\tfrac{1}{2}(\mathbf{X}-\boldsymbol{\mu})^{T}\mathbf{C}^{-1}(\mathbf{X}-\boldsymbol{\mu})\big)\,dx_2\,dx_1 $$

And again, the total volume $\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x_1,x_2)\,dx_1\,dx_2 = 1$.

In 2-D, the symmetry may not be straightforward. If µ1 = µ2 = 0 and X1, X2 are uncorrelated (c12 = 0), the volume under the pdf surface over one quadrant is

$$ \int_0^{\infty}\!\!\int_0^{\infty} f(x_1,x_2)\,dx_1\,dx_2 = \int_0^{\infty} f(x_1)\,dx_1 \int_0^{\infty} f(x_2)\,dx_2 = \tfrac{1}{4} $$

This is not true if X1 and X2 are correlated (c12 ≠ 0), as in the following example.


Integrating over a quadrant: let µ1 = µ2 = 0 and

$$ \mathbf{C} = \begin{bmatrix} 5 & 3 \\ 3 & 2 \end{bmatrix}, \qquad |\mathbf{C}| = 1, \qquad \mathbf{C}^{-1} = \begin{bmatrix} 2 & -3 \\ -3 & 5 \end{bmatrix} $$

$$ f(x_1,x_2) = \frac{1}{2\pi}\exp\!\left(-\tfrac{1}{2}\begin{bmatrix} x_1 & x_2 \end{bmatrix}\begin{bmatrix} 2 & -3 \\ -3 & 5 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}\right) = \frac{1}{2\pi}\exp\!\left(-\tfrac{1}{2}\big(2x_1^2 - 6x_1x_2 + 5x_2^2\big)\right) = \frac{1}{2\pi}\exp\!\left(-\big(x_1-\tfrac{3}{2}x_2\big)^2 - \tfrac{1}{4}x_2^2\right) $$

Let $y = x_1 - \tfrac{3}{2}x_2$ and $z = \tfrac{1}{2}x_2$. Then the Jacobian equals

$$ J = \begin{vmatrix} \partial y/\partial x_1 & \partial y/\partial x_2 \\ \partial z/\partial x_1 & \partial z/\partial x_2 \end{vmatrix} = \begin{vmatrix} 1 & -\tfrac{3}{2} \\ 0 & \tfrac{1}{2} \end{vmatrix} = \tfrac{1}{2} $$

so dx1 dx2 = 2 dy dz, and the quadrant {x1 > 0, x2 > 0} maps to {z > 0, y > −3z}. The integration becomes

$$ I = \int_0^{\infty}\!\!\int_0^{\infty} f(x_1,x_2)\,dx_1\,dx_2 = \int_0^{\infty}\!\!\int_{-3z}^{\infty} f(y,z)\,2\,dy\,dz = \int_0^{\infty}\!\!\int_{-3z}^{\infty} \frac{1}{\pi}\exp\!\big(-y^2 - z^2\big)\,dy\,dz = \frac{1}{2} - \frac{1}{2\pi}\tan^{-1}\!\tfrac{1}{3} = 0.4488 $$

[Figures: elliptical contours 2x1² − 6x1x2 + 5x2² = const. in the (X1, X2) plane; the transformed integration region y ∈ [−3z, ∞), z > 0, bounded by the line y + 3z = 0.]
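A Monte Carlo cross-check of this quadrant probability (added; the orthant formula 1/4 + arcsin(ρ)/(2π) with ρ = 3/√10 is quoted as an extra reference, not from the text):

```python
import numpy as np

rng = np.random.default_rng(3)
C = np.array([[5.0, 3.0], [3.0, 2.0]])
x = rng.multivariate_normal([0.0, 0.0], C, size=2_000_000)

print(np.mean((x[:, 0] > 0) & (x[:, 1] > 0)))          # ~ 0.4488
print(0.5 - np.arctan(1 / 3) / (2 * np.pi))            # 0.44879... (closed form above)
print(0.25 + np.arcsin(3 / np.sqrt(10)) / (2 * np.pi)) # same value, orthant formula
```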


Chapter 3 Multi-dimensional Gaussian Distribution

Let X = [X1 X2 … Xn]^T be an n-dimensional vector of random variables, µ = [µ1 µ2 … µn]^T a vector of constants, and C = [c_ij], i, j = 1, 2, …, n, a positive definite symmetric matrix (x^T C x > 0 for all x ≠ 0). The n-dimensional Gaussian pdf is

$$ f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}|\mathbf{C}|^{1/2}} \exp\!\big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\mathbf{C}^{-1}(\mathbf{x}-\boldsymbol{\mu})\big) $$

where the quadratic form in the exponent is always ≥ 0 when C is positive definite, and |C| is the determinant of C. Let |dx| = dx1 dx2 … dxn; then

$$ P\{a_1 \le x_1 < b_1,\ \dots,\ a_n \le x_n < b_n\} = \int_{a_1}^{b_1}\!\!\cdots\!\int_{a_n}^{b_n} f_{\mathbf{X}}(\mathbf{x})\,|d\mathbf{x}| $$

Theorem: If X is an n-dimensional random vector with the n-dimensional joint Gaussian distribution described above, then its mean is E[X] = µ and its covariance matrix is E[(X−µ)(X−µ)^T] = C.

C must always be positive (semi)definite, because each quadratic form aᵀCa = E[(aᵀ(X−µ))²] is the expectation of the square of a real number, and symmetric, because the multiplication of scalars is commutative.


Every positive definite symmetric matrix can be factored as C = LDL^T, where

$$ \mathbf{D} = \begin{bmatrix} d_1 & & & 0 \\ & d_2 & & \\ & & \ddots & \\ 0 & & & d_n \end{bmatrix}, \qquad \mathbf{L} = \begin{bmatrix} 1 & & & 0 \\ l_{21} & 1 & & \\ l_{31} & l_{32} & \ddots & \\ l_{n1} & l_{n2} & l_{n3} & 1 \end{bmatrix} $$

are a diagonal and a lower-triangular matrix respectively. L has all its diagonal elements equal to 1, so that |L| = 1 and |C| = |D|. Then C⁻¹ = (L^T)⁻¹D⁻¹L⁻¹. If we substitute the random vector X by Y with y = L⁻¹(x − µ), which has unit Jacobian ⇒ |dx| = |dy|, then

$$ P\{\mathbf{a} \le \mathbf{x} < \mathbf{b}\} = \frac{1}{(2\pi)^{n/2}|\mathbf{C}|^{1/2}} \int_{a_1}^{b_1}\!\!\cdots\!\int_{a_n}^{b_n} \exp\!\big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}(\mathbf{L}^{T})^{-1}\mathbf{D}^{-1}\mathbf{L}^{-1}(\mathbf{x}-\boldsymbol{\mu})\big)\,|d\mathbf{x}| = \frac{1}{(2\pi)^{n/2}|\mathbf{D}|^{1/2}} \int_{\alpha_1}^{\beta_1}\!\!\cdots\!\int_{\alpha_n}^{\beta_n} \exp\!\big(-\tfrac{1}{2}\mathbf{y}^{T}\mathbf{D}^{-1}\mathbf{y}\big)\,|d\mathbf{y}| $$

The αᵢ and βᵢ are not constants but functions of y_k, k < i.
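A numerical sketch of the LDL^T factorization (added), using scipy.linalg.ldl on the running example and confirming |L| = 1 and |C| = |D|:

```python
import numpy as np
from scipy.linalg import ldl

C = np.array([[5.0, 3.0], [3.0, 2.0]])
L, D, perm = ldl(C)            # C = L @ D @ L.T for a symmetric positive definite C

print(L)                       # [[1, 0], [0.6, 1]]  -> l21 = 3/5
print(np.diag(D))              # [5, 0.2]            -> d1 = 5, d2 = 1/5
print(np.linalg.det(L))        # 1.0
print(np.linalg.det(C), np.linalg.det(D))   # both 1.0, so |C| = |D|
```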

For the same example from Chapter 2, we get

$$ \mathbf{C} = \begin{bmatrix} 5 & 3 \\ 3 & 2 \end{bmatrix} = \mathbf{L}\mathbf{D}\mathbf{L}^{T}, \qquad \mathbf{L} = \begin{bmatrix} 1 & 0 \\ \tfrac{3}{5} & 1 \end{bmatrix}, \qquad \mathbf{D} = \begin{bmatrix} 5 & 0 \\ 0 & \tfrac{1}{5} \end{bmatrix}, \qquad \mathbf{L}^{-1} = \begin{bmatrix} 1 & 0 \\ -\tfrac{3}{5} & 1 \end{bmatrix}, \qquad \mathbf{D}^{-1} = \begin{bmatrix} \tfrac{1}{5} & 0 \\ 0 & 5 \end{bmatrix} $$

Instead of using

$$ \begin{bmatrix} y \\ z \end{bmatrix} = \begin{bmatrix} 1 & -\tfrac{3}{2} \\ 0 & \tfrac{1}{2} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} $$

to decorrelate y and z, we use

$$ \begin{bmatrix} y' \\ z' \end{bmatrix} = \mathbf{L}^{-1}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ -\tfrac{3}{5} & 1 \end{bmatrix}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix} \;\Rightarrow\; y' = x_1,\quad z' = -\tfrac{3}{5}x_1 + x_2 $$

and get the new uncorrelated joint Gaussian pdf

$$ f(y',z') = \frac{1}{2\pi}\exp\!\left(-\tfrac{1}{10}y'^2 - \tfrac{5}{2}z'^2\right) $$

Then

$$ I = \int_0^{\infty}\!\!\int_0^{\infty} f(x_1,x_2)\,dx_1\,dx_2 = \int_0^{\infty}\!\!\int_{-\frac{3}{5}y'}^{\infty} f(y',z')\,dz'\,dy' = \int_0^{\infty}\!\!\int_{-\frac{3}{5}y'}^{\infty} \frac{1}{2\pi}\exp\!\left(-\tfrac{1}{10}y'^2 - \tfrac{5}{2}z'^2\right)dz'\,dy' = 0.4488 $$

[Figures: the earlier region y ∈ [−3z, ∞) bounded by y + 3z = 0, and the new region z' ∈ [−(3/5)y', ∞) bounded by 3y' + 5z' = 0.]


Conditional Density Function
Let n+m random variables have a joint Gaussian pdf, with x = [x1 x2 … xn]^T and y = [y1 y2 … ym]^T as the first n and last m random variables respectively. The covariance matrix C can be partitioned as

$$ \mathbf{C} = \begin{bmatrix} \mathbf{C}_{XX} & \mathbf{C}_{XY} \\ \mathbf{C}_{YX} & \mathbf{C}_{YY} \end{bmatrix} $$

where C is (n+m)×(n+m), C_XX is n×n, C_XY = C_YX^T is n×m, and C_YY is m×m. The (n+m)-dimensional mean vector is $\begin{bmatrix} \boldsymbol{\mu}_X \\ \boldsymbol{\mu}_Y \end{bmatrix}$, where µ_X and µ_Y are the n- and m-dimensional mean vectors of x and y respectively.

$$ f(\mathbf{x},\mathbf{y}) = \frac{1}{(2\pi)^{(n+m)/2}|\mathbf{C}|^{1/2}} \exp\!\left(-\tfrac{1}{2}\begin{bmatrix} (\mathbf{x}-\boldsymbol{\mu}_X)^{T} & (\mathbf{y}-\boldsymbol{\mu}_Y)^{T}\end{bmatrix} \mathbf{C}^{-1} \begin{bmatrix} \mathbf{x}-\boldsymbol{\mu}_X \\ \mathbf{y}-\boldsymbol{\mu}_Y \end{bmatrix}\right) $$

Marginal integration:

$$ f_1(\mathbf{x}) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f(\mathbf{x},\mathbf{y})\,|d\mathbf{y}| \quad (m\text{-fold}), \qquad f_2(\mathbf{y}) = \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f(\mathbf{x},\mathbf{y})\,|d\mathbf{x}| \quad (n\text{-fold}) $$


Conditional Density Function of x Given y

$$ f_C(\mathbf{x}|\mathbf{y}) = \frac{f(\mathbf{x},\mathbf{y})}{f_2(\mathbf{y})} $$

Both f(x,y) and f2(y) are Gaussian, so f_C(x|y) will also be Gaussian. f_C(x|y) is an n-dimensional pdf and must satisfy

$$ \int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f_C(\mathbf{x}|\mathbf{y})\,|d\mathbf{x}| = 1 $$

so f_C(x|y) must be of the form

$$ f_C(\mathbf{x}|\mathbf{y}) = \frac{1}{(2\pi)^{n/2}|\mathbf{P}|^{1/2}} \exp\!\big(-\tfrac{1}{2}(\mathbf{x}-\mathbf{m})^{T}\mathbf{P}^{-1}(\mathbf{x}-\mathbf{m})\big) $$

Both the n×n covariance matrix P and the n-dimensional mean vector m can be functions of the given y (in fact only m is).

How do we get P and m from the known parameters C_XX, C_XY, C_YY, µ_X, µ_Y and the given value y?

Matrix Inversion Lemma (proof in text, pp. 35–38)

From $\mathbf{C} = \begin{bmatrix} \mathbf{C}_{XX} & \mathbf{C}_{XY} \\ \mathbf{C}_{YX} & \mathbf{C}_{YY} \end{bmatrix}$ we have $\mathbf{C}^{-1} = \begin{bmatrix} \mathbf{A}_{XX} & \mathbf{A}_{XY} \\ \mathbf{A}_{YX} & \mathbf{A}_{YY} \end{bmatrix}$


where

$$ \begin{aligned} \mathbf{A}_{XX} &= \big(\mathbf{C}_{XX} - \mathbf{C}_{XY}\mathbf{C}_{YY}^{-1}\mathbf{C}_{YX}\big)^{-1} = \mathbf{C}_{XX}^{-1} + \mathbf{C}_{XX}^{-1}\mathbf{C}_{XY}\,\mathbf{A}_{YY}\,\mathbf{C}_{YX}\mathbf{C}_{XX}^{-1} \\ \mathbf{A}_{XY} &= -\mathbf{A}_{XX}\mathbf{C}_{XY}\mathbf{C}_{YY}^{-1} = -\mathbf{C}_{XX}^{-1}\mathbf{C}_{XY}\,\mathbf{A}_{YY} \\ \mathbf{A}_{YX} &= \mathbf{A}_{XY}^{T} \\ \mathbf{A}_{YY} &= \big(\mathbf{C}_{YY} - \mathbf{C}_{YX}\mathbf{C}_{XX}^{-1}\mathbf{C}_{XY}\big)^{-1} = \mathbf{C}_{YY}^{-1} + \mathbf{C}_{YY}^{-1}\mathbf{C}_{YX}\,\mathbf{A}_{XX}\,\mathbf{C}_{XY}\mathbf{C}_{YY}^{-1} \end{aligned} $$

mean:

$$ E\begin{bmatrix} \mathbf{X} \\ \mathbf{Y} \end{bmatrix} = \int_{\mathbb{R}^{n+m}} \begin{bmatrix} \mathbf{x} \\ \mathbf{y} \end{bmatrix} f(\mathbf{x},\mathbf{y})\,|d\mathbf{x}|\,|d\mathbf{y}| = \begin{bmatrix} \boldsymbol{\mu}_X \\ \boldsymbol{\mu}_Y \end{bmatrix} $$

covariance:

$$ E\!\left[\begin{bmatrix} \mathbf{X}-\boldsymbol{\mu}_X \\ \mathbf{Y}-\boldsymbol{\mu}_Y \end{bmatrix} \begin{bmatrix} (\mathbf{X}-\boldsymbol{\mu}_X)^{T} & (\mathbf{Y}-\boldsymbol{\mu}_Y)^{T} \end{bmatrix}\right] = \int_{\mathbb{R}^{n+m}} \begin{bmatrix} (\mathbf{x}-\boldsymbol{\mu}_X)(\mathbf{x}-\boldsymbol{\mu}_X)^{T} & (\mathbf{x}-\boldsymbol{\mu}_X)(\mathbf{y}-\boldsymbol{\mu}_Y)^{T} \\ (\mathbf{y}-\boldsymbol{\mu}_Y)(\mathbf{x}-\boldsymbol{\mu}_X)^{T} & (\mathbf{y}-\boldsymbol{\mu}_Y)(\mathbf{y}-\boldsymbol{\mu}_Y)^{T} \end{bmatrix} f(\mathbf{x},\mathbf{y})\,|d\mathbf{x}|\,|d\mathbf{y}| = \begin{bmatrix} \mathbf{C}_{XX} & \mathbf{C}_{XY} \\ \mathbf{C}_{YX} & \mathbf{C}_{YY} \end{bmatrix} $$


Because f2(y) is Gaussian, its pdf is

$$ f_2(\mathbf{y}) = \frac{1}{(2\pi)^{m/2}|\mathbf{C}_{YY}|^{1/2}} \exp\!\big(-\tfrac{1}{2}(\mathbf{y}-\boldsymbol{\mu}_Y)^{T}\mathbf{C}_{YY}^{-1}(\mathbf{y}-\boldsymbol{\mu}_Y)\big) $$

Then the conditional pdf of x given y is

$$ f_C(\mathbf{x}|\mathbf{y}) = \frac{f(\mathbf{x},\mathbf{y})}{f_2(\mathbf{y})} = \frac{(2\pi)^{m/2}|\mathbf{C}_{YY}|^{1/2}}{(2\pi)^{(n+m)/2}|\mathbf{C}|^{1/2}} \exp\!\left(-\tfrac{1}{2}\begin{bmatrix} (\mathbf{x}-\boldsymbol{\mu}_X)^{T} & (\mathbf{y}-\boldsymbol{\mu}_Y)^{T}\end{bmatrix}\mathbf{C}^{-1}\begin{bmatrix} \mathbf{x}-\boldsymbol{\mu}_X \\ \mathbf{y}-\boldsymbol{\mu}_Y \end{bmatrix} + \tfrac{1}{2}(\mathbf{y}-\boldsymbol{\mu}_Y)^{T}\mathbf{C}_{YY}^{-1}(\mathbf{y}-\boldsymbol{\mu}_Y)\right) $$

Expanding the block quadratic form with C⁻¹ = [A_XX A_XY; A_YX A_YY], the exponent is

$$ -\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_X)^{T}\mathbf{A}_{XX}(\mathbf{x}-\boldsymbol{\mu}_X) - (\mathbf{x}-\boldsymbol{\mu}_X)^{T}\mathbf{A}_{XY}(\mathbf{y}-\boldsymbol{\mu}_Y) - \tfrac{1}{2}(\mathbf{y}-\boldsymbol{\mu}_Y)^{T}\mathbf{A}_{YY}(\mathbf{y}-\boldsymbol{\mu}_Y) + \tfrac{1}{2}(\mathbf{y}-\boldsymbol{\mu}_Y)^{T}\mathbf{C}_{YY}^{-1}(\mathbf{y}-\boldsymbol{\mu}_Y) $$

so

$$ f_C(\mathbf{x}|\mathbf{y}) = k\cdot\exp\!\Big( \underbrace{-\tfrac{1}{2}\,\mathbf{x}^{T}\mathbf{A}_{XX}\,\mathbf{x}}_{\text{quadratic in } \mathbf{x}} \;+\; \underbrace{\mathbf{x}^{T}\big(\mathbf{A}_{XX}\boldsymbol{\mu}_X - \mathbf{A}_{XY}(\mathbf{y}-\boldsymbol{\mu}_Y)\big)}_{\text{linear in } \mathbf{x}} \;+\; \text{other terms without } \mathbf{x} \Big) $$


Conditional Mean and Covariance

By matching the coefficients of the quadratic and linear terms in x to find P and m, we get

conditional covariance matrix: P = A_XX⁻¹ = C_XX − C_XY C_YY⁻¹ C_YX
conditional mean: m = µ_X + C_XY C_YY⁻¹ (y − µ_Y)

The conditional mean m is a linear function of y, while the conditional covariance matrix is constant.

[Figures: two scatter plots of (X, Y) with µ_X = µ_Y = 0. Uncorrelated case: the slice f_C(x|Y=y) has m = µ_X. Correlated case: the slice has m = (σ²_XY/σ²_YY)·y, with the same spread P at every y.]
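These two formulas are easy to exercise numerically; an added sketch for a 2+1-dimensional case (the matrices and the conditioning value y = 2 are my example choices), checking m and P against samples conditioned near Y = y:

```python
import numpy as np

rng = np.random.default_rng(4)
# Partitioned covariance: x = first 2 components, y = last 1.
C = np.array([[4.0, 1.0, 2.0],
              [1.0, 3.0, 1.5],
              [2.0, 1.5, 2.0]])
mu = np.array([1.0, -1.0, 0.5])
Cxx, Cxy, Cyy = C[:2, :2], C[:2, 2:], C[2:, 2:]
mux, muy = mu[:2], mu[2:]

y = np.array([2.0])
m = mux + Cxy @ np.linalg.solve(Cyy, y - muy)        # conditional mean
P = Cxx - Cxy @ np.linalg.solve(Cyy, Cxy.T)          # conditional covariance

# Empirical check: keep samples whose y-component falls near the given y.
s = rng.multivariate_normal(mu, C, size=2_000_000)
near = s[np.abs(s[:, 2] - y[0]) < 0.02]
print(m, near[:, :2].mean(axis=0))                   # means agree (approximately)
print(P, np.cov(near[:, :2].T))                      # covariances agree (approximately)
```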


Conditional Mean ≡ Bayesian Estimation

For any joint pdf (not necessarily Gaussian) f(x,y), if X and Y are correlated, knowing Y = y gives some information about X. We can then estimate X from Y.

Estimate of X: X̂ = g(Y). Estimation error: e = X − X̂. Loss function: L(X − X̂), a real, non-negative, convex function (i.e., it has a minimum). We want to find the optimal function g(·) that minimizes E L(X − X̂). If L(e) is ‖e‖², the square of the Euclidean norm, this is called the Minimum Mean Square Error (MMSE) criterion.

Minimize:

$$ E\,L(\mathbf{X}-\hat{\mathbf{X}}) = E\|\mathbf{X}-\hat{\mathbf{X}}\|^2 = E\|\mathbf{X}-\mathbf{g}(\mathbf{Y})\|^2 = \int_{\mathbb{R}^m}\!\int_{\mathbb{R}^n} \|\mathbf{x}-\mathbf{g}(\mathbf{y})\|^2\, f(\mathbf{x},\mathbf{y})\,|d\mathbf{x}|\,|d\mathbf{y}| $$
$$ = \int_{\mathbb{R}^m}\Big[\underbrace{\int_{\mathbb{R}^n} \mathbf{x}^{T}\mathbf{x}\, f_C(\mathbf{x}|\mathbf{y})\,|d\mathbf{x}|}_{A} \underbrace{-\,2\,\mathbf{g}(\mathbf{y})^{T}\!\!\int_{\mathbb{R}^n} \mathbf{x}\, f_C(\mathbf{x}|\mathbf{y})\,|d\mathbf{x}|}_{B} + \underbrace{\|\mathbf{g}(\mathbf{y})\|^2}_{C}\Big]\, f_2(\mathbf{y})\,|d\mathbf{y}| $$

Define the conditional mean: $\mathbf{m}(\mathbf{y}) = \int_{\mathbb{R}^n} \mathbf{x}\, f_C(\mathbf{x}|\mathbf{y})\,|d\mathbf{x}|$, and


define the total variance: $V(\mathbf{y}) = \int_{\mathbb{R}^n} \|\mathbf{x}-\mathbf{m}(\mathbf{y})\|^2\, f_C(\mathbf{x}|\mathbf{y})\,|d\mathbf{x}|$

Find A = ∫ xᵀx f_C(x|y) |dx|, from the identity

$$ \mathbf{x}^{T}\mathbf{x} = \big(\mathbf{x}-\mathbf{m}(\mathbf{y})\big)^{T}\big(\mathbf{x}-\mathbf{m}(\mathbf{y})\big) + 2\,\mathbf{m}(\mathbf{y})^{T}\big(\mathbf{x}-\mathbf{m}(\mathbf{y})\big) + \mathbf{m}(\mathbf{y})^{T}\mathbf{m}(\mathbf{y}) $$
$$ \Rightarrow\; \int_{\mathbb{R}^n} \mathbf{x}^{T}\mathbf{x}\, f_C(\mathbf{x}|\mathbf{y})\,|d\mathbf{x}| = V(\mathbf{y}) + \|\mathbf{m}(\mathbf{y})\|^2 \qquad (\text{the middle term integrates to } 0) $$

Find B:

$$ -2\,\mathbf{g}(\mathbf{y})^{T}\!\int_{\mathbb{R}^n} \mathbf{x}\, f_C(\mathbf{x}|\mathbf{y})\,|d\mathbf{x}| = -2\,\mathbf{g}(\mathbf{y})^{T}\mathbf{m}(\mathbf{y}) $$

Find C:

$$ \int_{\mathbb{R}^n} \|\mathbf{g}(\mathbf{y})\|^2\, f_C(\mathbf{x}|\mathbf{y})\,|d\mathbf{x}| = \|\mathbf{g}(\mathbf{y})\|^2 $$

Substituting parts A, B and C gives

$$ E\,L(\mathbf{X}-\hat{\mathbf{X}}) = \int_{\mathbb{R}^m} \big[V(\mathbf{y}) + \|\mathbf{m}(\mathbf{y})\|^2 - 2\,\mathbf{g}(\mathbf{y})^{T}\mathbf{m}(\mathbf{y}) + \|\mathbf{g}(\mathbf{y})\|^2\big]\, f_2(\mathbf{y})\,|d\mathbf{y}| = \underbrace{\int_{\mathbb{R}^m} V(\mathbf{y})\, f_2(\mathbf{y})\,|d\mathbf{y}|}_{\text{does not depend on } \mathbf{g}} + \underbrace{\int_{\mathbb{R}^m} \|\mathbf{m}(\mathbf{y}) - \mathbf{g}(\mathbf{y})\|^2\, f_2(\mathbf{y})\,|d\mathbf{y}|}_{\text{non-negative, minimum if } \mathbf{g}(\mathbf{y}) = \mathbf{m}(\mathbf{y})} $$


We can see that the optimum Bayesian MMSE estimator of X given Y is the conditional mean,

X̂_opt = m(Y)

Gaussian case: m(y) = µ_X + C_XY C_YY⁻¹ (y − µ_Y), and V(y) is constant (does not depend on y):

V(y) = trace of P = tr P = p11 + p22 + … + p_nn = sum of the eigenvalues of P = total variance (a scalar) of X − X̂, because P is also the covariance matrix of X − X̂.

Notations

E[X | Y=y] = m(y): not a random variable, but E[X | Y] = m(Y): a random variable.

In the Gaussian case,

m = µ_X + C_XY C_YY⁻¹ (y − µ_Y): the expected value of X given y, but
X̂ = µ_X + C_XY C_YY⁻¹ (Y − µ_Y): a new random variable.


Linear MMSE Estimator

Consider the simplest case: let X and Y be two correlated random variables with arbitrary joint pdf. If we require the estimator X̂_linear of X from the given Y = y to be a linear function of y, we call the optimum estimator X̂_linear,opt the Linear MMSE Estimator. The solution will be of the linear form X̂_linear,opt = h_o Y, assuming all zero means. To solve for the optimum h_o we can use the Hilbert space of second-order random variables. Because X̂_linear is a linear function of Y, it must lie in the subspace spanned by Y. The vector in this subspace that gives the shortest error vector is the orthogonal projection of the vector X onto this subspace. The error vector X − X̂_linear must therefore be orthogonal to Y:

$$ 0 = \langle X - \hat{X}_{\text{linear,opt}},\, Y\rangle = \langle X - h_o Y,\, Y\rangle = \langle X,Y\rangle - h_o\langle Y,Y\rangle $$
$$ \Rightarrow\; h_o = \frac{\langle X,Y\rangle}{\langle Y,Y\rangle} = \frac{C_{XY}}{C_{YY}} \;\Rightarrow\; \hat{X}_{\text{linear,opt}} = \frac{C_{XY}}{C_{YY}}\cdot Y $$

[Figure: X projected orthogonally onto the line spanned by Y; the projection is h_oY and the error X − h_oY is perpendicular to Y.]

For Gaussian variables, the Linear MMSE estimator is also the Bayesian MMSE estimator (the conditional mean, the best of all).
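A short numerical sketch (added; the particular non-Gaussian pair below is my example): estimating h_o from samples and checking the orthogonality of the error to Y.

```python
import numpy as np

rng = np.random.default_rng(5)
y = rng.uniform(-1, 1, 1_000_000)            # zero-mean, non-Gaussian
x = 2.0 * y + 0.3 * y**3 + 0.1 * rng.standard_normal(y.size)

h_o = np.mean(x * y) / np.mean(y * y)        # C_XY / C_YY (zero means assumed)
e = x - h_o * y
print(h_o)                  # best linear gain
print(np.mean(e * y))       # ~ 0: error orthogonal to Y
```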


Chapter 4 Finite Random Sequences

Random Process: a family {X_t | t ∈ T} of random variables, all defined on the same (Ω, A, P). T is the parameter set of the random process. For EE applications, we can treat the set T as a set of times.

Random sequence ≡ Discrete-time process

Successive Viewpoint

Let X1, X2, …, Xn be a sequence of n Gaussian random variables. Each X_k has pdf

$$ f_k(x_k) = \frac{1}{\sqrt{2\pi\sigma_k^2}}\, e^{-\frac{(x_k-\mu_k)^2}{2\sigma_k^2}} $$

where µ_k is the mean and σ_k² the variance of X_k. If all the X_k are mutually independent random variables, i.e., all pairs, triples, quadruples, … of X_k's are independent, then the µ_k and σ_k², k = 1, 2, …, n, are enough to define this random sequence (or discrete-time random process). Mutual independence means that we cannot learn anything about X_k from knowing all the other X's. For the Gaussian distribution, pairwise independence implies mutual independence. This is not true in general, as the following example shows.


[Figure: the four outcomes HH, HT, TH, TT of two coin flips, with each of the events A, B, C covering two outcomes.]

P(A) = P(B) = P(C) = 1/2
P(A∩B) = P(A)P(B) = 1/4
P(B∩C) = P(B)P(C) = 1/4
P(C∩A) = P(C)P(A) = 1/4
P(A∩B∩C) = 1/4 ≠ P(A)P(B)P(C) = 1/8

A, B, C are pairwise independent but not mutually independent. Knowing that one of them has happened tells nothing about the other two, but knowing that two of them have happened fixes the last one.

That is, for the Gaussian case: E(X_i−µ_i)(X_j−µ_j) = 0 for i ≠ j, i.e., pairwise uncorrelated ⇒ pairwise independent ⇒ mutually independent. Such processes are simple but not useful, because we cannot make any statistical inference. The normalized sequence of uncorrelated Gaussian random variables W1, W2, …, Wn with all µ_k = 0 and σ_k² = 1 is called Unit White Gaussian Noise (u.w.g.n.). All the W_k are independent identically distributed (i.i.d.) random variables.

In general, the X_k of a sequence are not independent and we need more parameters to describe the process. The covariances c_ij = E(X_i−µ_i)(X_j−µ_j), with c_ij = c_ji, give the correlations of each pair of X's, and c_ii is the variance of X_i. The covariance matrix is C = [c_ij].


We can create a new random sequence as linear combinations of the random variables of another random sequence, obtaining different characteristics (mean, covariance). The simplest case uses u.w.g.n. to create a sequence with given µ_i and c_ij. That is, given a sequence W_i with

E[W_i] = 0, E[W_i²] = 1 and E[W_iW_j] = 0 for i ≠ j,

we want to create a new sequence X_i with

E[X_i] = µ_i and E[(X_i−µ_i)(X_j−µ_j)] = c_ij.

Factor C as

$$ \mathbf{C} = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1n} \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & & & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{bmatrix} = \mathbf{L}\mathbf{D}\mathbf{L}^{T} = \big(\mathbf{L}\mathbf{D}^{1/2}\big)\big(\mathbf{D}^{1/2}\mathbf{L}^{T}\big) = \mathbf{T}\mathbf{T}^{T}, \qquad \mathbf{T} = \mathbf{L}\mathbf{D}^{1/2} = \begin{bmatrix} T_{11} & & & 0 \\ T_{21} & T_{22} & & \\ \vdots & & \ddots & \\ T_{n1} & T_{n2} & \cdots & T_{nn} \end{bmatrix} $$

where T is lower triangular (a matrix square root of C).


we can create the sequence X_i as

$$ X_1 = T_{11}W_1 + \mu_1, \qquad X_2 = T_{21}W_1 + T_{22}W_2 + \mu_2, \qquad \dots, \qquad X_k = \sum_{j=1}^{k} T_{kj}W_j + \mu_k $$

[Figure: tapped delay line — W_i passes through delays z⁻¹; the taps W_i, W_{i−1}, …, W_1 are weighted by T_{i,i}, T_{i,i−1}, …, T_{i,1} and summed with µ_i to give X_i: a causal linear time-varying system (filter) with a random input signal at time t = i.]

Each time we feed the system with a random sequence W_i we get a random sequence X_i. Each output sequence is called a realization of X_i from the sample space of all possible sequences, called the ensemble of realizations.
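An added sketch of this construction using the Cholesky square root of C (the particular µ and C below are my example choices), checking the sample mean and covariance over an ensemble of realizations:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4
mu = np.array([1.0, 0.0, -1.0, 2.0])
A = rng.standard_normal((n, n))
C = A @ A.T + n * np.eye(n)               # a valid (positive definite) covariance

T = np.linalg.cholesky(C)                 # lower triangular, C = T T^T
W = rng.standard_normal((100_000, n))     # ensemble of u.w.g.n. sequences
X = W @ T.T + mu                          # X = T W + mu for each realization

print(np.allclose(X.mean(axis=0), mu, atol=0.05))   # True: ensemble mean ~ mu
print(np.allclose(np.cov(X.T), C, atol=0.15))       # True: ensemble covariance ~ C
```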


For a single realization of the sequence X_i, we cannot find any statistics from it alone. We can compute the time average over one sequence, (1/n)ΣX_i, but this is not the expectation E[X_i]. To find the expectation (or statistical average) we must average over the ensemble, not over time. To get the mean of X_1, we must take many, many realizations X_i and average only the first value of each.

Example: suppose all µ_k = 0 for k = 1, 2, …, n but the X_k are not independent (e.g., every X_k equals kX_1). Then the average of X_k over one sequence will generally not be 0, but the average of X_k over the ensemble (many sequences) gives all µ_k = 0.

The expectation operator E[·] uses only the ensemble average, never the time average.


Simultaneous Viewpoint

We treat the whole random sequence as one n-dimensional random vector instead of n random variables. For the u.w.g.n. vector W = [W1 W2 … Wn]^T we have

$$ f_W(\mathbf{w}) = \frac{1}{(2\pi)^{n/2}} \exp\!\big(-\tfrac{1}{2}\mathbf{w}^{T}\mathbf{w}\big), \qquad \boldsymbol{\mu}_W = \mathbf{0} \ (\text{zero vector}), \qquad \mathbf{C}_W = \mathbf{I} \ (\text{identity matrix}) $$

To generate X = [X1 X2 … Xn]^T from W with

E[X] = µ = [µ1 µ2 … µn]^T, E[(X−µ)(X−µ)^T] = C = TT^T,

we take X = TW + µ.

Proof: E[X] = T E[W] + µ = µ, and E[(X−µ)(X−µ)^T] = E[(TW)(TW)^T] = T E[WW^T] T^T = TT^T = C.

The simultaneous viewpoint has more powerful tools for analysis, but if n → ∞ the successive viewpoint may be more practical.


Chapter 5 Stationary Random Sequences

We now extend the random sequence to infinity: …, X−2, X−1, X0, X1, X2, … ⇒ X_k for −∞ < k < ∞, with

µ_k = E[X_k], c_ij = E[(X_i−µ_i)(X_j−µ_j)]

From the simultaneous viewpoint, X_k is just a vector in an infinite-dimensional space and C is an operator on that space. An important class of doubly infinite sequences is the stationary random sequence:

All statistical parameters are time invariant (they do not change under time translation).

For a Gaussian sequence, only 2 statistical parameters need to be defined: mean and covariance. If both are time invariant, the sequence is stationary.
mean stationary: µ = E[X_i] = E[X_{i+k}] = constant
covariance stationary: c_ij = c_{i+k, j+k} for all i, j, k; that is, E[(X_i−µ_i)(X_j−µ_j)] = E[(X_{i+k}−µ_{i+k})(X_{j+k}−µ_{j+k})], and if the mean is also stationary, E[X_iX_j] = E[X_{i+k}X_{j+k}].

$$ \boldsymbol{\mu} = \begin{bmatrix} \mu \\ \mu \\ \mu \\ \mu \end{bmatrix}, \qquad \mathbf{C} = \begin{bmatrix} c_{00} & c_{01} & c_{02} & c_{03} \\ c_{01} & c_{00} & c_{01} & c_{02} \\ c_{02} & c_{01} & c_{00} & c_{01} \\ c_{03} & c_{02} & c_{01} & c_{00} \end{bmatrix} \quad \text{(equal values along each diagonal)} $$


For EE applications we use t as time, instead of i, j, and write X(t) for a discrete-time random process. For a stationary random process X(t),

define: mean µ_X = E[X(t)] and autocovariance c_XX(τ) = E{[X(t+τ) − µ_X][X(t) − µ_X]}

Note that neither is a function of t (absolute time), so the values do not change with time t; τ is just the time-interval parameter.

White Noise Input to a Discrete-Time System

Let V(t) be a u.w.g.n. discrete-time random process:

$$ \mu_V = 0, \qquad c_{VV}(\tau) = \begin{cases} 1, & \tau = 0 \\ 0, & \tau \ne 0 \end{cases} $$

We can see that u.w.g.n. is a stationary process. We can generate a new random process X(t) from V(t) using discrete-time convolution; this is equivalent to filtering the u.w.g.n. V(t) with a filter. Let the filter have impulse response h(t), which must be bounded and square summable, $\sum_{t=0}^{\infty} h^2(t) < \infty$. For a causal system in real life, h(t) is zero for all negative time and has to be defined for non-negative time only.


The filter's output is

convolution: $X(t) = \sum_{n=0}^{\infty} h(n)\,V(t-n)$

mean: $\mu = E[X(t)] = \sum_{n=0}^{\infty} h(n)\,E[V(t-n)] = 0$

autocovariance function:

$$ \begin{aligned} c_{XX}(\tau) &= E[X(t+\tau)X(t)] \\ &= E\Big[\sum_{m=0}^{\infty} h(m)V(t+\tau-m)\;\sum_{n=0}^{\infty} h(n)V(t-n)\Big] \\ &= E\sum_{m=0}^{\infty}\sum_{n=0}^{\infty} h(m)h(n)\,V(t+\tau-m)V(t-n) \\ &= \sum_{m=0}^{\infty}\sum_{n=0}^{\infty} h(m)h(n)\,E[V(t+\tau-m)V(t-n)] \\ &= \sum_{m=0}^{\infty}\sum_{n=0}^{\infty} h(m)h(n)\,c_{VV}(n+\tau-m) \qquad \text{(time difference)} \end{aligned} $$

c_VV = 1 only if its argument = 0. Take the summation over m first; the c_VV terms are all zero except when m = n + τ. Finally, we get

$$ c_{XX}(\tau) = \sum_{n=0}^{\infty} h(n+\tau)\,h(n) $$
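A numerical sanity check of this formula (added), comparing the theoretical c_XX(τ) = Σ h(n+τ)h(n) of a short FIR filter against the sample autocovariance of filtered white noise (the filter taps anticipate the spectral-factorization example later in these notes):

```python
import numpy as np

rng = np.random.default_rng(7)
h = np.array([1.0, -0.6, 0.25])             # causal FIR impulse response
v = rng.standard_normal(2_000_000)          # u.w.g.n.
x = np.convolve(v, h)[: v.size]             # X(t) = sum_n h(n) V(t-n)

for tau in range(4):
    theory = np.sum(h[tau:] * h[: h.size - tau]) if tau < h.size else 0.0
    sample = np.mean(x[tau:] * x[: x.size - tau])
    print(tau, round(theory, 4), round(sample, 4))
# tau: 0 -> 1.4225, 1 -> -0.75, 2 -> 0.25, 3 -> 0.0
```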


Because h(t) is a causal impulse response, i.e., h(n) = 0 for n < 0, we can show that c_XX(−τ) = c_XX(τ). For τ = −5:

$$ \begin{aligned} c_{XX}(-5) &= \sum_{n=0}^{\infty} h(n-5)\,h(n) \\ &= \sum_{n=5}^{\infty} h(n-5)\,h(n) \qquad (h(\text{negative}) = 0) \\ &= \sum_{m=0}^{\infty} h(m)\,h(m+5) \qquad (\text{let } m = n-5) \\ &= c_{XX}(5) \end{aligned} $$

Cross-covariance Function

Let X(t) and Y(t) be two random processes with means µ_X(t) and µ_Y(t) respectively. Define the cross-covariance

c_XY(t1, t2) = E{[X(t1) − µ_X(t1)][Y(t2) − µ_Y(t2)]}

Even if both X and Y are stationary, they may not be jointly stationary. For a Gaussian process to be jointly stationary, c_XY must depend only on τ = t2 − t1:

c_XY(τ) = E{[X(t+τ) − µ_X(t+τ)][Y(t) − µ_Y(t)]}

From the previous example of generating the sequence X(t) from the sequence V(t), we have


$$ \begin{aligned} c_{XV}(\tau) &= E[X(t+\tau)V(t)] \\ &= E\Big[\sum_{n=0}^{\infty} h(n)\,V(t+\tau-n)\,V(t)\Big] \\ &= \sum_{n=0}^{\infty} h(n)\,E[V(t+\tau-n)V(t)] \\ &= \sum_{n=0}^{\infty} h(n)\,c_{VV}(\tau-n) \end{aligned} $$

From this we can interpret c_XV(t) as the output signal of the filter with impulse response h(t) when we feed in c_VV(t) as input. For u.w.g.n. V(t), c_VV(t) is a delta function: c_VV(τ−n) = 1 only if τ−n = 0 ⇒ c_XV(τ) = h(τ).

$$ \begin{aligned} c_{XX}(\tau) &= E\Big[\sum_{n=0}^{\infty} h(n)\,V(t+\tau-n)\,X(t)\Big] \\ &= \sum_{n=0}^{\infty} h(n)\,E[V(t+\tau-n)X(t)] \\ &= \sum_{n=0}^{\infty} h(n)\,c_{VX}(\tau-n) \end{aligned} $$

If stationary, c_VX(τ) = E[V(t+τ)X(t)] = E[X(t)V(t+τ)] = E[X(t−τ)V(t)] = c_XV(−τ).


So for stationary X(t) and V(t), c_VX(τ−n) = c_XV(n−τ). Then

$$ c_{XX}(\tau) = \sum_{n=0}^{\infty} h(n)\,c_{XV}(n-\tau) = \sum_{n=0}^{\infty} h(n) \sum_{\lambda=0}^{\infty} h(\lambda)\,c_{VV}(n-\tau-\lambda) $$

Again, for u.w.g.n. V(t), c_VV = 1 only if its argument is 0, so the summation over n is all zeros except when n = τ + λ. In this case, we get the same result as before:

$$ c_{XX}(\tau) = \sum_{\lambda=0}^{\infty} h(\lambda)\,h(\tau+\lambda) $$

[Figure: summary block diagram. For u.w.g.n. input V(t) into the filter h(t): E[X(t)] = 0, c_XX(τ) = Σ_n h(n+τ)h(n), and c_XV(τ) = h(τ). For any input V(t) that is not u.w.g.n.: c_XV(t) = Σ_n h(n)c_VV(t−n) and c_XX(t) = Σ_n h(n)c_VX(t−n); i.e., c_XV is the filter's response to c_VV, and c_XX its response to c_VX.]


Power Spectral Density
Let X(t) be a discrete-time waveform (deterministic or random). If we pass X(t) through a narrow-band filter centered at frequency ω, the output power per ∆ω of the filter is called the power spectral density (psd) of X(t). For a deterministic waveform this is |X(ω)|², the squared magnitude of the discrete Fourier transform of X(t). But for a random waveform X(t) we cannot use |X(ω)|² of a single realization, because each different realization gives a different result. The power spectral density of a random signal must be defined from the statistics of the random signal, not the signal itself.

Definition: The power spectral density of a stationary discrete-time random process is the discrete Fourier transform of its autocorrelation function,

$$ \text{psd} = \mathcal{F}[r_{XX}(\tau)] = \sum_{\tau=-\infty}^{\infty} r_{XX}(\tau)\,e^{-j\omega\tau} $$

where the autocorrelation is r_XX(τ) = E[X(t+τ)X(t)].

The autocorrelation r_XX(τ) and the autocovariance c_XX(τ) are the same if µ_X = 0. If µ_X ≠ 0, X(t) has a D.C. component equal to µ_X and

r_XX(τ) = E{([X(t+τ)−µ_X]+µ_X)([X(t)−µ_X]+µ_X)} = c_XX(τ) + µ_X²


Let

$$ \phi_{XX}(\omega) = \mathcal{F}[c_{XX}(\tau)] = \sum_{\tau=-\infty}^{\infty} c_{XX}(\tau)\,e^{-j\omega\tau} $$

be the A.C. component of the power spectral density; the D.C. component is

$$ \mathcal{F}[\mu_X^2] = \sum_{\tau=-\infty}^{\infty} \mu_X^2\, e^{-j\omega\tau} = 2\pi\mu_X^2\,\delta(\omega) $$

Then

$$ \text{psd} = \mathcal{F}[r_{XX}(\tau)] = \mathcal{F}[c_{XX}(\tau) + \mu_X^2] = \underbrace{\phi_{XX}(\omega)}_{\text{A.C. power}} + \underbrace{2\pi\mu_X^2\,\delta(\omega)}_{\text{D.C. power}} $$

Note that the D.C. power spectral density is a delta function δ(ω), an infinite impulse at ω = 0 with area under the curve equal to 1. This means that the signal power is concentrated at just one frequency. If X(t) is a periodic (deterministic) signal, its psd will also be a delta function δ(ω−ω_o), but at frequency ω_o.

We can see that φ_XX(ω) must be non-negative, φ_XX(ω) ≥ 0 for all ω, because the power density at any frequency must be non-negative. This is also the result of Bochner's theorem: φ_XX(ω) ≥ 0 is the necessary and sufficient condition for the matrix C of X(t) to be non-negative definite.


Transform pairs (discrete time ↔ frequency):

$$ X(t) \;\longleftrightarrow\; X(\omega) = \sum_{t=-\infty}^{\infty} X(t)\,e^{-j\omega t}, \qquad X(t) = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(\omega)\,e^{j\omega t}\,d\omega $$

and, taking E[X(t+τ)X(t)] on the time side corresponds to |⋅|² on the frequency side:

$$ c_{XX}(\tau) \;\longleftrightarrow\; \phi_{XX}(\omega) = \sum_{\tau=-\infty}^{\infty} c_{XX}(\tau)\,e^{-j\omega\tau}, \qquad c_{XX}(\tau) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \phi_{XX}(\omega)\,e^{j\omega\tau}\,d\omega $$

Review of Constant-Parameter Discrete-Time Deterministic Linear System Theory

Let y(t) be a deterministic discrete-time sequence, for t = 0, 1, 2, …, ∞. Define its Z-transform as

$$ Y(z) = \sum_{t=0}^{\infty} y(t)\,z^{-t} $$

compare to the DTFT: $Y(\omega) = \sum_{t=-\infty}^{\infty} y(t)\,e^{-j\omega t}$, which gives the signal analysis of Y(ω) as linear combinations of all frequencies e^{-jωt} at amplitudes y(t).

The Z-transform is the tool for describing the system operations that turn inputs into outputs. In the Z-transform, each z^{−t} has the physical meaning of delaying a signal by t time steps.


Consider the sequence y(t+1), for t = 0, 1, 2, …, ∞. This is a left-shifted version of y(t) with y(0) deleted. The Z-transform of this sequence is

$$ \sum_{t=0}^{\infty} y(t+1)\,z^{-t} = \sum_{s=1}^{\infty} y(s)\,z^{-(s-1)} = z\sum_{s=1}^{\infty} y(s)\,z^{-s} = z\,[Y(z) - y(0)] $$

The operation of z in front of the bracket means a left shift, opposite to z⁻¹, which is a right shift or delay.

Example
Linear difference equation: y(t+1) + a·y(t) = 0, with initial condition y(0) = c.
Z-transform: z[Y(z) − y(0)] + a·Y(z) = 0

$$ Y(z) = \frac{cz}{z+a} = \frac{c}{1+\tfrac{a}{z}} = c\,\Big[1 - \frac{a}{z} + \frac{a^2}{z^2} - \frac{a^3}{z^3} + \cdots\Big] = c\sum_{t=0}^{\infty} (-a)^t z^{-t} \;\Rightarrow\; y(t) = c(-a)^t $$

[Figure: feedback realization y(t+1) = −a·y(t) with one delay z⁻¹; the response y(t) alternates c, −ac, a²c, −a³c, a⁴c, … .]


Let a signal u(t) pass through a discrete-time linear system (filter) with impulse response h(t); the output y(t) is the convolution

$$ y(t) = \sum_{n=0}^{\infty} h(n)\,u(t-n) $$

Take the Z-transform:

$$ \begin{aligned} Y(z) &= \sum_{t=0}^{\infty} z^{-t} \sum_{n=0}^{\infty} h(n)\,u(t-n) \\ &= \sum_{n=0}^{\infty} h(n) \sum_{t=0}^{\infty} u(t-n)\,z^{-t} \\ &= \sum_{n=0}^{\infty} h(n) \sum_{r=-n}^{\infty} u(r)\,z^{-(r+n)} \qquad (\text{let } r = t-n) \\ &= \sum_{n=0}^{\infty} h(n) \sum_{r=0}^{\infty} u(r)\,z^{-(r+n)} \qquad (u(t) = 0 \text{ for } t < 0) \\ &= \sum_{n=0}^{\infty} h(n)\,z^{-n} \sum_{r=0}^{\infty} u(r)\,z^{-r} = H(z)\,U(z) \end{aligned} $$

Compare:

Discrete-Time Fourier Transform: $F(\omega) = \sum_{t=-\infty}^{\infty} f(t)\,e^{-j\omega t}$
Z-Transform: $F(z) = \sum_{t=0}^{\infty} f(t)\,z^{-t}$


Input–Output Relations for Spectral Densities

By analogy with φ_XX(ω) for c_XX(τ), define the cross-spectral density for the cross-covariance c_XY(τ) as

$$ \phi_{XY}(\omega) = \mathcal{F}[c_{XY}(\tau)] = \sum_{\tau=-\infty}^{\infty} c_{XY}(\tau)\,e^{-j\omega\tau} $$

Then

$$ \begin{aligned} \phi_{XX}(\omega) &= \sum_{\tau=-\infty}^{\infty} c_{XX}(\tau)\,e^{-j\omega\tau} \\ &= \sum_{\tau=-\infty}^{\infty} \Big[\sum_{n=0}^{\infty} h(n)\,c_{VX}(\tau-n)\Big] e^{-j\omega\tau} \\ &= \sum_{n=0}^{\infty} h(n) \sum_{\tau=-\infty}^{\infty} c_{VX}(\tau-n)\,e^{-j\omega\tau} \\ &= \sum_{n=0}^{\infty} h(n) \sum_{\lambda=-\infty}^{\infty} c_{VX}(\lambda)\,e^{-j\omega(\lambda+n)} \qquad (\text{let } \lambda = \tau-n) \\ &= \sum_{n=0}^{\infty} h(n)\,e^{-j\omega n} \sum_{\lambda=-\infty}^{\infty} c_{VX}(\lambda)\,e^{-j\omega\lambda} \\ &= H(e^{j\omega})\,\phi_{VX}(\omega) \end{aligned} $$

Similarly, φ_XV(ω) = H(e^{jω})φ_VV(ω). And because c_VX(τ) = c_XV(−τ) ⇒ φ_VX(ω) = φ_XV(−ω).

Then φ_XX(ω) = H(e^{jω})H(e^{−jω})φ_VV(ω) = |H(e^{jω})|²φ_VV(ω).

For the u.w.g.n. input V(t), c_VV(τ) is a delta function and φ_VV(ω) = 1 (constant over all frequencies). Then

φ_XX(ω) = H(e^{jω})H(e^{−jω}) = |H(e^{jω})|²

Note that we write H(e^{jω}) instead of H(ω) to emphasize that H(e^{jω}) is in fact H(z) with z = e^{jω} (z evaluated on the unit circle). H(z) is called the sampled-data transfer function.

φ_XX(ω) must be a non-negative real value for all ω and symmetric in ω, but φ_XV(ω) can be complex-valued and non-symmetric.

Given a required c_XX(τ) or φ_XX(ω), finding the filter that outputs X(t) from a u.w.g.n. input is, conceptually, taking the square root of the given c_XX(τ) or φ_XX(ω) to get h(t) or H(z). But there are many solutions; most of them are not realizable, and more than one solution is realizable. A realizable filter is a causal filter, so the impulse response must be zero for all t < 0. For a discrete-time linear filter, this also means that all poles must lie inside the unit circle of the z-plane. Spectral factorization is finding the causal filter with transfer function H(z) such that |H(e^{jω})|² = φ_XX(ω).


Factorization of Rational Spectral Densities

Definition: φ_XX(ω) is called rational if and only if it can be written as a ratio of two polynomials in e^{jω}.

Theorem: Given a rational φ_XX(ω), there exists a rational function H(z) with all poles and zeros inside the unit circle in the z-plane such that φ_XX(ω) = |H(e^{jω})|².

This is equivalent to factoring the covariance matrix C into TT^T, or factoring the autocovariance c_XX(τ) into $\sum_{n=0}^{\infty} h(n+\tau)h(n)$.

Example
Start from a simple FIR filter (all zeros, no poles) with impulse response h(0) = 1, h(1) = −0.60, h(2) = 0.25, and h(t) = 0 for all t > 2 or t < 0. Then

H(z) = 1 − 0.60z⁻¹ + 0.25z⁻²
H(e^{jω}) = 1 − 0.60e^{−jω} + 0.25e^{−2jω}
H*(e^{jω}) = H(e^{−jω}) = 1 − 0.60e^{jω} + 0.25e^{2jω}

So, φ_XX(ω) = |H(e^{jω})|² = H(e^{jω})H(e^{−jω})
= 1.4225 − 0.75(e^{jω} + e^{−jω}) + 0.25(e^{j2ω} + e^{−j2ω})
= 1.4225 − 1.5cos(ω) + 0.5cos(2ω), for −π ≤ ω ≤ π

But if φ_XX(ω) is given and we want to reverse the calculation to find h(t), we must factor φ_XX(ω) as


[Figure: the FIR impulse response h(t) = {1, −0.6, 0.25}, and the z-plane showing the conjugate zero pair z1 = 0.3 + j0.4, z1* = 0.3 − j0.4 inside the unit circle and the reciprocal pair z2 = 1/z1* = 1.2 + j1.6, z2* = 1/z1 = 1.2 − j1.6 outside.]

$$ \begin{aligned} \phi_{XX}(\omega) &= e^{-j2\omega}\big[0.25e^{j4\omega} - 0.75e^{j3\omega} + 1.4225e^{j2\omega} - 0.75e^{j\omega} + 0.25\big] \\ &= 0.25\,e^{-j2\omega}\big(e^{j2\omega} - 0.6e^{j\omega} + 0.25\big)\big(e^{j2\omega} - 2.4e^{j\omega} + 4\big) \\ &= 0.25\,e^{-j2\omega}\big(e^{j\omega} - z_1\big)\big(e^{j\omega} - z_1^{*}\big)\big(e^{j\omega} - z_2\big)\big(e^{j\omega} - z_2^{*}\big) \end{aligned} $$

where z1 = 0.3 + j0.4 and z2 = 1.2 + j1.6 = 1/z1*.

We see that, apart from occurring in conjugate pairs, the zeros also occur in reciprocal pairs. If one zero is inside the unit circle, its reciprocal is outside the unit circle but at the conjugate angle. This is because when we take the conjugate of H(z) on the unit circle z = e^{jω}, z* becomes e^{−jω} = 1/e^{jω}. φ_XX(ω) = |H(e^{jω})|² = H(e^{jω})H(e^{−jω}) always has the extra reciprocal zeros in addition to the original zeros of H(z).

How do we know which ones are the original zeros? We don't. Either set can be the original zeros of H(z). So both H(z) = z⁻²(z² − 0.6z + 0.25) = 1 − 0.6z⁻¹ + 0.25z⁻² and H(z) = 0.25z⁻²(z² − 2.4z + 4) = 0.25 − 0.6z⁻¹ + z⁻² have the same φ_XX(ω). But we choose the one with zeros inside the unit circle, because it can be inverted.
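A numerical sketch of this factorization (added): sample the coefficients of φ_XX as a polynomial in e^{jω}, find the four zeros with np.roots, and keep the pair inside the unit circle.

```python
import numpy as np

# Coefficients of e^{j2w} * phi_XX(w) as a polynomial in e^{jw}:
# 0.25 e^{j4w} - 0.75 e^{j3w} + 1.4225 e^{j2w} - 0.75 e^{jw} + 0.25
phi = np.array([0.25, -0.75, 1.4225, -0.75, 0.25])

zeros = np.roots(phi)
inside = zeros[np.abs(zeros) < 1]            # keep the invertible (minimum-phase) pair
print(np.sort_complex(inside))               # 0.3 - 0.4j, 0.3 + 0.4j

h = np.real(np.poly(inside))                 # monic polynomial from the inside zeros
print(h)                                     # [1.0, -0.6, 0.25]

# Check: the autocorrelation of h reproduces the phi coefficients.
print(np.convolve(h, h[::-1]))               # [0.25, -0.75, 1.4225, -0.75, 0.25]
```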


[Figure: direct-form realization generating x(t) from white-noise input v(t) with two delays z⁻¹ and tap weights 1, −0.6, 0.25.]

Now, for a filter with both poles and zeros, the factorization is the same for both the numerator and the denominator of H(z). But now we must choose the poles inside the unit circle only; a filter with poles outside the unit circle would be unstable or non-causal and cannot be realized.

In conclusion, we must select the inside half of the reciprocal pairs of both poles and zeros for H(z); the other half belongs to H*(z).

For filters with poles, the impulse response h(t) is infinitely long, so the filter is called an infinite impulse response (IIR) filter. For example, for |a| < 1,

$$ \frac{1}{1+az^{-1}} = 1 - az^{-1} + a^2z^{-2} - a^3z^{-3} + \cdots = \sum_{t=0}^{\infty} (-a)^t z^{-t} $$

A pole (a degree-1 polynomial in the denominator) is equivalent to a polynomial of degree ∞ in zeros. In implementations, the zeros come from a delayed-input sum, while the poles come from a delayed-feedback output sum.


Chapter 6 Continuous-Time Stationary Gaussian Processes

Notations
X(t) — one realization (waveform, function) of the stochastic (random) process
X(⋅) — ensemble of waveforms of the random process
X(t) — ensemble of random variables evaluated at time t (a scalar random variable)
stochastic process: the ensemble of all possible random waveforms together with a probability distribution over them

Define µ(t) = E[X(t)] and c(t,s) = E{[X(t) − µ(t)][X(s) − µ(s)]}. Now t is a real variable. Let t1, t2, …, tn be a set of times. The values of the random process X(⋅) at these times form the n-dimensional vector [X(t1) X(t2) … X(tn)]^T. This random vector has the n-dimensional Gaussian pdf

$$ f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{n/2}|\mathbf{C}|^{1/2}} \exp\!\big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu})^{T}\mathbf{C}^{-1}(\mathbf{x}-\boldsymbol{\mu})\big) $$

$$ \boldsymbol{\mu} = \begin{bmatrix} \mu(t_1) \\ \mu(t_2) \\ \vdots \\ \mu(t_n) \end{bmatrix}, \qquad \mathbf{C} = \begin{bmatrix} c(t_1,t_1) & c(t_1,t_2) & \cdots & c(t_1,t_n) \\ c(t_2,t_1) & c(t_2,t_2) & \cdots & c(t_2,t_n) \\ \vdots & & & \vdots \\ c(t_n,t_1) & c(t_n,t_2) & \cdots & c(t_n,t_n) \end{bmatrix} $$


C must be a symmetric, non-negative definite matrix. If this holds for any set of times, the process is a Gaussian process. In general, the techniques of chapters 5 and 6 (e.g., spectral factorization) can be used for any second-order process, because they use only second-order statistics (mean and covariance). Gaussian is just the simplest case among all second-order processes with the same second-order parameters.

Covariance and Spectral Density Functions
• strict-sense stationary: all moments of the random process are time invariant (stationary)
• wide-sense stationary: only the first two moments are required to be stationary; also called second-order stationary

Definition: A second-order random process is called stationary if and only if

µ(t) = µ and c(t,s) = c(t−s) = c(τ)


Let X(⋅) and Y(⋅) be zero-mean second-order stationary processes.
• autocovariance: c_XX(τ) = E[X(t+τ)X(t)]
• cross-covariance: c_XY(τ) = E[X(t+τ)Y(t)]
• power spectral density: $\phi_{XX}(\omega) = \int_{-\infty}^{\infty} c_{XX}(\tau)\,e^{-j\omega\tau}\,d\tau$
• cross-spectral density: $\phi_{XY}(\omega) = \int_{-\infty}^{\infty} c_{XY}(\tau)\,e^{-j\omega\tau}\,d\tau$
• inverse Fourier transform: $c_{XX}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \phi_{XX}(\omega)\,e^{j\omega\tau}\,d\omega$, $\quad c_{XY}(\tau) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \phi_{XY}(\omega)\,e^{j\omega\tau}\,d\omega$
• symmetry: c_XX(−τ) = c_XX(τ) and φ_XX(−ω) = φ_XX(ω); c_XY(−τ) = c_YX(τ) and φ_XY(−ω) = φ_YX(ω)
• non-negativity: φ_XX(ω) ≥ 0, −∞ < ω < ∞
• finite power: $c_{XX}(0) = E[X^2(t)] = \frac{1}{2\pi}\int_{-\infty}^{\infty} \phi_{XX}(\omega)\,d\omega < \infty$

The non-negativity comes from Bochner's theorem: it is the necessary and sufficient condition for a non-negative covariance matrix C.


Note that a symmetric and non-negative definite C does not imply a Gaussian distribution; any second-order distribution must have these same properties.

Laplace Transform and Linear System Theory

Let h(t) be the impulse response of a continuous-time, time-invariant linear dynamic system (filter).

Laplace transform: $H(s) = \int_{0}^{\infty} h(t)\,e^{-st}\,dt$

H(s) is called the transfer function of the system (filter). The relationship of the Laplace and Z transforms to the Fourier transform is that the first two are for system description (or processing), while the last is for signal analysis. A causal system in real life cannot take future input values to compute its output; that is why the integration or summation for the system (Laplace or Z transform) runs only from 0 to ∞. But for signal analysis (or decomposition), the Fourier integral can cover the whole range −∞ to ∞. Apart from this, all these transforms are related by

z = e^{jω} = e^s (on the frequency axis s = jω, with unit sampling period)

Because of the nonlinear relationship between s and z, we cannot convert a polynomial in one variable into a polynomial in the other, only approximate it.


Let u(t) and y(t) be two deterministic signals defined for t ≥ 0. Their Laplace transforms are

st st

0 0

U(s) u(t)e dt Y(s) y(t)e dtand∞ ∞

− −= =∫ ∫

If u(t) is input of filter with impulse response h(t). And y(t) is output of the filter. Let u(t) = 0 for t < 0.

0

y(t) h(r)u(t r)dr Y(s) H(s)U(s)and∞

= − =∫

Input-Output Relations for Stochastic Processes

Let X(⋅) be the input and Y(⋅) the output of a filter with impulse response h(t). Then for each realization of X(t) we have

Y(t) = ∫_{0}^{∞} h(r) X(t−r) dr

To relate the output statistics to the input, we first find the cross-covariance and autocovariance functions; the power spectral density and cross-spectral density then follow from their Fourier transforms.

cXY(τ) = E[X(t+τ)Y(t)] = E[X(t+τ) ∫_{0}^{∞} h(r) X(t−r) dr]
 = ∫_{0}^{∞} h(r) E[X(t+τ)X(t−r)] dr = ∫_{0}^{∞} h(r) cXX(τ+r) dr

cYX(τ) = cXY(−τ) = ∫_{0}^{∞} h(r) cXX(r−τ) dr
 = ∫_{0}^{∞} h(r) cXX(τ−r) dr = h(τ) ∗ cXX(τ)      (by symmetry of cXX)

φYX(ω) = H(jω) φXX(ω)
φXY(ω) = φYX(−ω) = H(−jω) φXX(−ω) = H*(jω) φXX(ω)      (by symmetry)

cYY(τ) = E[Y(t+τ)Y(t)] = E[Y(t+τ) ∫_{0}^{∞} h(r) X(t−r) dr]
 = ∫_{0}^{∞} h(r) E[Y(t+τ)X(t−r)] dr = ∫_{0}^{∞} h(r) cYX(τ+r) dr
 = ∫_{0}^{∞} h(r) cXY(−τ−r) dr = ∫_{0}^{∞} h(r) cXY(τ−r) dr      (by symmetry of cYY)
 = h(τ) ∗ cXY(τ)

φYY(ω) = H(jω) φXY(ω) = H(jω) H*(jω) φXX(ω) = |H(jω)|² φXX(ω)
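A minimal simulation sketch of the last relation (my own illustration, in discrete time rather than the continuous time of the notes): pass approximately white noise through a known filter and check that the estimated output spectrum tracks |H|² times the input spectrum. The first-order filter and all constants are assumptions for the example.

import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
x = rng.standard_normal(2**18)             # approximately white input
b, a = [1.0], [1.0, -0.9]                  # assumed filter H(z) = 1/(1 - 0.9 z^-1)
y = signal.lfilter(b, a, x)

f, pyy = signal.welch(y, nperseg=4096)     # estimated output spectrum
f, pxx = signal.welch(x, nperseg=4096)     # estimated input spectrum
w, H = signal.freqz(b, a, worN=f * 2 * np.pi)   # H at the same frequencies

ratio = pyy / (np.abs(H)**2 * pxx)         # should hover near 1
print("mean ratio phi_YY / (|H|^2 phi_XX):", ratio.mean())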

Spectral Factorization and Paley-Wiener Criterion

Suppose we want to create a stochastic process Y(⋅) with a given power spectral density φYY(ω): how do we find the transfer function H(jω) of a filter whose output is Y(⋅) for a known input process X(⋅)? The simplest X(⋅) is white noise V(⋅) with φVV(ω) = 1, −∞ < ω < ∞. The inverse Fourier transform of 1 is the Dirac delta function

δ(t) = { ∞ , t = 0 ; 0 , elsewhere }  and  ∫_{−∞}^{∞} δ(t) dt = 1

So cVV(τ) = δ(τ), which has infinite power (variance):

cVV(0) = E[V²(t)] = (1/2π) ∫_{−∞}^{∞} φVV(ω) dω = δ(0) = ∞

This means that there is no true white noise in real life, even though we always use it in theoretical analysis. In practice, signals always have limited power, and the white noise we use is actually band-limited, i.e., φVV(ω) = 1 up to some frequency ωmax that is much higher than the operating frequency; beyond that, φVV(ω) goes down to zero. If we apply V(t) as input, the output power spectral density will be

φYY(ω) = |H(jω)|²

Again, obviously φYY(ω) ≥ 0 and φYY(-ω) = φYY(ω).

But not every non-negative symmetric φYY(ω) can be factored into |H(jω)|². If it can be factored, Y(t) is realizable, and H(s) must be the transfer function of a (continuous-time) stable causal linear system; that is, H(s) must have no pole in the right half of the s-plane. But every non-negative symmetric rational φYY(ω) can always be factored. All the poles and zeros of φYY(ω) must come in opposite pairs, e.g., a+jb opposite −a−jb, because when we take the conjugate of H(s) and evaluate at s = jω, s* becomes −jω = −s, which creates poles and zeros opposite to the original ones.

[Figure: pole-zero plots in the s-plane for φ(s) and for H(s)]

Then H(s) will be the rational function with all the left-half-plane poles and zeros of φYY(ω), and all the right-half-plane ones belong to H*(s). In fact, we can also choose some or all right-half-plane zeros instead and still get a stable causal filter with the same power spectral density, but it will not be invertible. The system H(s) with all left-half-plane zeros (and poles) is called a minimum phase system.
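A hedged sketch of this factorization for one rational example (the PSD below is my illustrative choice, not from the notes): substitute ω² → −s², find the roots, and keep the left-half-plane poles and zeros for the minimum-phase H(s). For this particular example the overall gain works out to 1; in general a constant gain factor must also be matched.

import numpy as np

# assumed example: phi(w) = (w^2 + 4) / (w^4 + 10 w^2 + 9); with w^2 = -s^2:
num_s = np.array([-1.0, 0.0, 4.0])             # -s^2 + 4
den_s = np.array([1.0, 0.0, -10.0, 0.0, 9.0])  # s^4 - 10 s^2 + 9

zeros = np.roots(num_s)
poles = np.roots(den_s)
lhp_zeros = zeros[zeros.real < 0]              # keep left-half-plane roots
lhp_poles = poles[poles.real < 0]
print("H(s) zeros:", lhp_zeros)                # expect [-2]
print("H(s) poles:", lhp_poles)                # expect [-1, -3]

w = np.linspace(0.0, 5.0, 6)                   # check |H(jw)|^2 against phi(w)
H = np.polyval(np.poly(lhp_zeros), 1j*w) / np.polyval(np.poly(lhp_poles), 1j*w)
phi = (w**2 + 4) / (w**4 + 10*w**2 + 9)
print("max | |H|^2 - phi |:", np.abs(np.abs(H)**2 - phi).max())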

The inverse system H^{−1}(s) accepts input Y(⋅) and outputs u.w.g.n. (unit white Gaussian noise), so this system is called the whitening filter, while H(s) is called the innovation filter.

[Diagram: Y(t) → whitening filter H^{−1}(s) → V(t) (u.w.g.n.) → innovation filter H(s) → Y(t)]

For non-rational φYY(ω), H(s) will not be rational and the system (filter) will be of infinite order. The necessary and sufficient condition for φYY(ω) to be factorable as |H(jω)|², with H(s) a stable causal system, is the Paley-Wiener criterion:

∫_{−∞}^{∞} |log φ(ω)| / (1 + ω²) dω < ∞

This is a more specific frequency-domain condition for a causal system than mere non-negativity at all frequencies. Obviously φ(ω) must be non-negative, but beyond that it cannot be 0 or ∞ over any continuous interval of frequencies. So the ideal low-pass, band-pass, or high-pass signals, whose spectral density is exactly zero in the stop band, do not satisfy the Paley-Wiener criterion and cannot be realized. These signals also have non-causal impulse responses in the time domain, h(t) ≠ 0 for t < 0.

[Figure: ideal low-pass spectral density φLP(ω), nonzero only for |ω| ≤ Ω, and its non-causal impulse response h(t)]

Ergodic Process

A stochastic process is a set of random variables Xt or X(t), where t is a parameter (a discrete or continuous, finite or infinite set). Imagine that a random process is a set of dice, with each die's outcome being one X(t); the whole vector of outcomes of all the dice is one random object (a realization, sequence, or waveform). To find the statistical parameters (mean, variance) of each Xt (each die), we need many, many trials (realizations) and must compute the statistics over this ensemble average.

If the process is stationary, these parameters do not change with the die we are observing (translation invariance). But we still need many trials to find the statistics; we cannot tell the mean of a die from just one outcome, even though all of them have the same mean. The statistical average over one realization, called the time average, may not give the correct answer. For example, if all the dice are locked to turn up the same face as each other, then the time average is the same as throwing a single die just once to find its mean.

But if the time-average statistics are the same as the ensemble-average statistics, then we call the random process ergodic. Obviously, every ergodic process must also be stationary, but not conversely.

Define

X̄(T1,T2) = (1/(T2−T1)) ∫_{T1}^{T2} X(t) dt

as the time average of the stationary process X(⋅). X̄(T1,T2) is also a random variable, because it is a linear function of the random variables X(t). For a Gaussian process, X̄(T1,T2) will also be Gaussian, with mean

E[X̄(T1,T2)] = (1/(T2−T1)) ∫_{T1}^{T2} E[X(t)] dt = (1/(T2−T1)) ∫_{T1}^{T2} µX dt = µX

We can see that X̄(T1,T2) is a good estimator for µX because its mean equals µX; such an estimator (or statistic) is called unbiased. But X̄(T1,T2) is random, and its variance determines how close to µX it is on average.

Var[X̄(T1,T2)] = E[(X̄(T1,T2) − µX)²]
 = E[X̄²(T1,T2)] − 2 E[X̄(T1,T2)] µX + µX²
 = E[X̄²(T1,T2)] − µX²

where

E[X̄²(T1,T2)] = E[((1/(T2−T1)) ∫_{T1}^{T2} X(t) dt)²]
 = (1/(T2−T1)²) ∫_{T1}^{T2}∫_{T1}^{T2} E[X(t)X(λ)] dt dλ
 = (1/(T2−T1)²) ∫_{T1}^{T2}∫_{T1}^{T2} rXX(t−λ) dt dλ
 = (1/(T2−T1)²) ∫_{T1}^{T2}∫_{T1}^{T2} cXX(t−λ) dt dλ + µX²

then

Var[X̄(T1,T2)] = (1/(T2−T1)²) ∫_{T1}^{T2}∫_{T1}^{T2} cXX(t−λ) dt dλ

If we know cXX(τ), we can choose T1, T2 to make the variance as small as we need, and X̄(T1,T2) can be almost the same as µX. But in practice we do not know cXX(τ) or any other ensemble-average statistic; that is why we try to use time averages in the first place. But if we know that the double integral of cXX grows more slowly than (T2−T1)² as T1 → −∞ and T2 → ∞, then Var(X̄) → 0 as T → ∞ and

X̄ = lim_{T→∞} (1/2T) ∫_{−T}^{T} X(t) dt → µX

We can see that cXX(τ) must decrease fast enough for an ergodic process, which means that X(t) and X(t+τ) must be almost independent of each other when τ is large enough. Then we can split one realization into several nearly independent short segments, as if they were different realizations, and the ensemble average over them is in fact a time average. Similarly, we can define the time-average estimator for cXX or rXX as

RXX(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} X(t+τ) X(t) dt

In summary, the process is ergodic if and only if X̄ = µX and RXX(τ) = rXX(τ).
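A small simulation sketch (my own; the AR(1) model and all its parameters are illustrative assumptions) comparing time averages computed from one realization with the corresponding ensemble values:

import numpy as np

rng = np.random.default_rng(1)
a, n = 0.7, 100_000
x = np.zeros(n)
for t in range(1, n):                      # one realization of X(t) = a X(t-1) + V(t)
    x[t] = a * x[t-1] + rng.standard_normal()

def R(tau):                                # time-average autocorrelation R_XX(tau)
    return np.mean(x[:n-tau] * x[tau:])

c0 = 1.0 / (1.0 - a*a)                     # ensemble value: r_XX(tau) = a^|tau|/(1-a^2)
for tau in (0, 1, 2, 5):
    print(tau, round(R(tau), 3), "vs ensemble", round(c0 * a**tau, 3))
print("time-average mean:", x.mean(), "vs mu_X = 0")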

Power Spectra for Deterministic Signals

In many applications the signals are not truly random, but we may still need the spectrum of an observed, unknown signal. We can use the same technique as for random signals: find the time-average autocorrelation function first, then calculate its Fourier transform to get the spectrum, even if the signals are not random at all. A deterministic signal can be a finite-energy or a finite-power signal. Random signals and periodic signals must always be power signals; finite-duration signals must always be energy signals.


To deal with (deterministic) energy signals, we must redefine all the power quantities as energy quantities by removing the time averaging:

Power-signal quantities:

P = lim_{T→∞} (1/2T) ∫_{−T}^{T} x²(t) dt ;      signal power
P = (1/T) ∫_{−T/2}^{T/2} x²(t) dt ;      periodic, period T
Rxx(τ) = lim_{T→∞} (1/2T) ∫_{−T}^{T} x(t) x(t+τ) dt ;      autocorrelation
Sxx(ω) = ∫_{−∞}^{∞} Rxx(τ) e^{−jωτ} dτ ;      power spectrum

Energy-signal quantities:

E = ∫_{−∞}^{∞} x²(t) dt ;      signal energy
ρx(τ) = ∫_{−∞}^{∞} x(t) x(t+τ) dt ;      energy autocorrelation
φx(ω) = ∫_{−∞}^{∞} ρx(τ) e^{−jωτ} dτ ;      energy spectrum
X(jω) = ∫_{−∞}^{∞} x(t) e^{−jωt} dt ;      signal Fourier transform


The relationship between X(jω) and φx(ω) is

φx(ω) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x(t) x(t+τ) e^{−jωτ} dt dτ
 = ∫_{−∞}^{∞} x(t) [∫_{−∞}^{∞} x(σ) e^{−jω(σ−t)} dσ] dt ;      let σ = t + τ
 = ∫_{−∞}^{∞} x(t) e^{jωt} dt · ∫_{−∞}^{∞} x(σ) e^{−jωσ} dσ
 = X(−jω) X(jω) = |X(jω)|²

The energy spectrum is simply the magnitude squared of the Fourier transform of the signal itself.

For (deterministic) periodic signals, which are power signals, we must use the time-average functions in the same way as for random signals. But because of the periodicity, the autocorrelation will be periodic, and both the Fourier transform and the power spectral density will be trains of delta functions at the harmonic frequencies.

A Matched Filter is a filter whose transfer function is the conjugate of the Fourier transform of the signal to be matched. If the signal is x(t) and its Fourier transform is X(jω), then the transfer function is X(−jω).

Then the filter impulse response is the mirrored signal x(−t). The output spectrum Y(jω) of the filter with the matched input x(t) will be

Y(jω) = X(jω) X(−jω) = |X(jω)|² = φx(ω)

which is the signal energy spectrum. Therefore y(t) is the signal autocorrelation function ρx(t).

[Figure: pulse x(t) of duration T, matched filter h(t) = x(−t), and output y(t) = ρx(t) peaking at t = 0]

This filter gives the highest peak output for the matched input x(t); every other signal with the same energy gives a lower output. But this filter is not causal, so it must be modified. If x(t) has finite duration T, we can shift h(t) to the right by T to make the filter causal, h(t) = x(T−t), and the transfer function becomes X(−jω)e^{−jωT}, i.e., just an added linear phase shift. In this case the output y(t) = ρx(t−T) gives its peak at time T instead.

[Figure: pulse x(t), causal matched filter h(t) = x(T−t), and output y(t) = ρx(t−T) peaking at t = T]
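A minimal discrete-time sketch of the causal matched filter (the pulse shape below is an illustrative assumption, not from the notes):

import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 2.0, 1.0])   # assumed pulse, duration T = 6 samples
h = x[::-1]                                    # causal matched filter h(t) = x(T - t)

y = np.convolve(x, h)                          # output for the matched input
print("peak index:", y.argmax())               # peaks at sample T - 1 = 5 (0-indexed)
print("peak value:", y.max(), "= pulse energy", np.sum(x**2))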

2102-502 Random Signals and Systems, Chulalongkorn University Page 72

Chapter 8 Karhunen-Loeve and Fourier Series Expansion of Continuous-Time Processes

We already know how to expand deterministic signals into infinite series, e.g., Fourier series or Taylor series. Now we want to expand random signals into infinite series; in this case the coefficients will be random variables, i.e., different realizations give different sets of coefficients. Then we want to know the statistics of these random coefficients. If we deal only with zero-mean, second-order random variables, we can use the Hilbert space of second-order random variables for our analysis. Orthogonality in this space is statistical uncorrelatedness, and also statistical independence in the Gaussian case. We will also use the Hilbert space of time functions on an interval for the deterministic part of the signal analysis; the basis functions for the expansions will be orthogonal in this space.

L2[0,T]: the Hilbert space of all square-integrable functions on the time interval 0 ≤ t ≤ T is the space of all waveforms f(t) with

∫_{0}^{T} f²(t) dt < ∞

This space has infinitely many dimensions, and any waveform f(t) can be thought of as a vector. We can choose the basis waveforms so that f(t) has a simpler representation. For example, f(t) = sin(t) for 0 ≤ t ≤ 2π takes a simple form if we use the Fourier series basis sin(nt) for n = 0, 1, …, ∞: f(t) then aligns with one axis, with only one coefficient. But if we use the Taylor series basis t^n for n = 0, 1, …, ∞, then f(t) still spans the whole infinite-dimensional space. The inner product and induced norm of L2[0,T] are

inner product: ⟨f, g⟩ = ∫_{0}^{T} f(t) g(t) dt

induced norm: ‖f‖ = ⟨f, f⟩^{1/2} = (∫_{0}^{T} f²(t) dt)^{1/2}

In n-dim space, if we multiply an n-dim vector x by an n×n matrix A, we get another n-dim vector y in the same space. The matrix A is called an operator (a vector function of a vector that maps a vector to another vector in the same space): y = Ax. In ∞-dim space, A is an ∞×∞ matrix, and multiplying it by a vector gives a new ∞-dim vector whose elements are the inner products of each row of the matrix with the vector.

integral operator: g(s) = ∫_{0}^{T} K(s,t) f(t) dt

abstract form: g = Lf

K(s,t) is called the kernel function of the integral operator. K(s,t) is a real function for 0 ≤ s ≤ T, 0 ≤ t ≤ T, in analogy to a square matrix of ∞×∞ dimensions, where s is the row index and t the column index of K(s,t).

In n-dim space, we have eigenvalues/eigenvectors of an operator (or matrix) A such that Ax = λx. In ∞-dim space, the spectrum of a linear operator is the equivalent of the set of eigenvalues, and it can be a continuous or discrete set. An eigenvector is a waveform (signal) that, when input to the operator, results in the same waveform with a different size; the amplitude gain is, in fact, the eigenvalue for that eigenvector (eigen-waveform).

For a time-invariant linear system on an unbounded time interval, the only waveforms that can be input to a filter and come out as the same waveform are the sinusoids, at all frequencies. These are the eigenvectors of the system, and the gains at all frequencies, i.e., its frequency response, are the eigenvalues. This is why we call these eigenvalues the spectrum of an operator.

The class of operators that we are interested in is the class of compact, self-adjoint operators, whose kernels are symmetric, positive definite functions. Thus the autocovariance function c(t,s) can be used as the kernel function of a compact, self-adjoint operator.

Definition A linear operator L in a Hilbert space is a self-adjoint operator if and only if, for all pairs of vectors f and g, ⟨f, Lg⟩ = ⟨Lf, g⟩. For our case of real functions in L2[0,T], self-adjoint means that the kernel is symmetric, K(t,s) = K(s,t).

Definition A linear operator L in a Hilbert space is a compact operator if, for any orthonormal (O.N.) set of basis vectors en with ⟨en, em⟩ = 1 for n = m and 0 for n ≠ m, we have ‖L en‖ → 0 as n → ∞.

For example, the kernel K(s,t) = δ(s−t), which gives the identity operator (equivalent to the identity matrix In in n-dim space), is not a compact operator, because L en = en does not converge to zero as n grows. In contrast, the kernel with eigenvalues λn = 1/n, n = 1, 2, …, ∞,

is a compact operator: imagine the kernel as an ∞×∞ diagonal matrix with λn = 1/n on its diagonal. Then ‖L en‖ = (1/n)‖en‖ → 0 as n → ∞.

But compact operators can also be asymmetric (not self-adjoint), for example L en = (1/n) e_{n+1}. This operator has spectral radius zero and no eigenvalue. As an ∞×∞ matrix it has the weights 1, 1/2, 1/3, … on its first subdiagonal and zeros elsewhere, and again ‖L en‖ = (1/n)‖e_{n+1}‖ → 0 as n → ∞.

Theorem Every compact, self-adjoint operator which maps a Hilbert space into itself has at least one eigenvalue and at most countably infinitely many eigenvalues.

Theorem The eigenvectors of a compact, self-adjoint operator are mutually orthogonal.

These orthogonal eigenvectors (waveforms) can be used as the orthonormal basis functions for expansion.

The Karhunen-Loeve Expansion

Given a second-order process X(⋅) with autocovariance function c(t,s), a symmetric positive definite function, if we use c(t,s) as the kernel of an integral operator, the set of nonzero eigenvalues λn will all be positive. The φn are the normalized eigenfunctions (eigenvectors) for each λn; they satisfy Lφn = λnφn and are mutually orthogonal because the operator is compact and self-adjoint.

∫_{0}^{T} c(s,t) φn(t) dt = λn φn(s) ,  0 ≤ s, t ≤ T

and ⟨φn, φm⟩ = ∫_{0}^{T} φn(t) φm(t) dt = { 1 , n = m ; 0 , n ≠ m }

If we use the φn as an orthonormal set of basis functions (vectors) for expanding a sample function X(⋅) of the process, we have for each projection

αn = ⟨X, φn⟩ = ∫_{0}^{T} φn(t) X(t) dt ,  n = 1, 2, 3, …

The αn are functions of random variables and hence are random variables themselves. Combining all the projections back, we get

X(t) = Σ_{n=1}^{∞} αn φn(t)

In fact, the expansion above holds for any orthonormal basis φn, such as the Fourier series basis. The special property of using the eigenfunctions of c(t,s) as the basis vectors is that it gives statistically independent (for zero-mean Gaussian random variables) coefficients αn, as follows.

E[αn αm] = E[∫_{0}^{T} φn(t) X(t) dt ∫_{0}^{T} φm(s) X(s) ds]
 = E[∫_{0}^{T}∫_{0}^{T} φn(t) φm(s) X(t) X(s) dt ds]
 = ∫_{0}^{T}∫_{0}^{T} φn(t) φm(s) E[X(t)X(s)] dt ds
 = ∫_{0}^{T}∫_{0}^{T} φn(t) φm(s) c(t,s) dt ds
 = ∫_{0}^{T} [∫_{0}^{T} φn(t) c(t,s) dt] φm(s) ds
 = ∫_{0}^{T} λn φn(s) φm(s) ds
 = λn ⟨φn, φm⟩ = { λn , n = m ; 0 , n ≠ m }

This expansion is called the Karhunen-Loeve expansion. Its basis functions (vectors) are orthonormal in the Hilbert space of square-integrable functions, L2[0,T], and its expansion coefficients are also orthogonal (statistically independent in the Gaussian case) in the Hilbert space of second-order random variables. No other orthogonal expansion has both properties together. The mean is Eαn = 0 and the variance is E[αn²] = λn.

If c(t,s) is jointly continuous in both t and s, the expansion converges statistically in the mean square sense, pointwise in time. This means that the mean square error → 0 as N → ∞ for each t.

MSE of the first N terms = E[(X(t) − Σ_{n=1}^{N} αn φn(t))²]
 = E[X²(t)] − 2 Σ_{n=1}^{N} E[X(t) αn] φn(t) + Σ_{n=1}^{N} Σ_{m=1}^{N} E[αn αm] φn(t) φm(t)
 = c(t,t) − 2 Σ_{n=1}^{N} E[∫_{0}^{T} φn(s) X(s) ds · X(t)] φn(t) + Σ_{n=1}^{N} λn φn²(t)
 = c(t,t) − 2 Σ_{n=1}^{N} ∫_{0}^{T} φn(s) E[X(s)X(t)] ds · φn(t) + Σ_{n=1}^{N} λn φn²(t)
 = c(t,t) − 2 Σ_{n=1}^{N} ∫_{0}^{T} φn(s) c(t,s) ds · φn(t) + Σ_{n=1}^{N} λn φn²(t)
 = c(t,t) − 2 Σ_{n=1}^{N} λn φn²(t) + Σ_{n=1}^{N} λn φn²(t)
 = c(t,t) − Σ_{n=1}^{N} λn φn²(t) → 0 as N → ∞

This comes from Mercer's theorem, which states that

c(t,s) = Σ_{n=1}^{∞} λn φn(t) φn(s)  and so  c(t,t) = Σ_{n=1}^{∞} λn φn²(t)

This is equivalent to the eigenvalue decomposition of a symmetric matrix into its own eigenvectors φn and eigenvalues λn. The modal matrix U = [φ1 φ2 … φn] is an orthonormal matrix of normalized eigenvectors:

C = U Λ U^T = [φ1 φ2 … φn] diag(λ1, λ2, …, λn) [φ1 φ2 … φn]^T
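A minimal finite-dimensional check of this picture (my own sketch; the min(ti,tj) covariance is an illustrative choice anticipating the Brownian-motion example below): the eigendecomposition reconstructs C, and the transformed coefficients are uncorrelated with variances λn.

import numpy as np

n = 8
t = (np.arange(n) + 1) / n
C = np.minimum.outer(t, t)            # assumed covariance, C[i,j] = min(ti, tj)

lam, U = np.linalg.eigh(C)            # C = U diag(lam) U^T with orthonormal U
print("max |U L U^T - C|:", np.abs(U @ np.diag(lam) @ U.T - C).max())

rng = np.random.default_rng(2)
X = rng.multivariate_normal(np.zeros(n), C, size=100_000)
alpha = X @ U                         # coefficients alpha = U^T x, row-wise
dev = np.abs(np.cov(alpha.T) - np.diag(lam)).max()
print("max |cov(alpha) - diag(lam)|:", dev)   # small: uncorrelated, Var = lam_n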

Mercer's theorem is applicable only for a jointly continuous autocovariance function c(t,s). If c(t,s) is not jointly continuous, the convergence of the Karhunen-Loeve expansion will not be pointwise in t, but in the sense of L2[0,T]. This means that at some points of time it may not have zero error, but the integral of the squared error from 0 to T converges to zero instead.

That is, E[∫_{0}^{T} (X(t) − Σ_{n=1}^{N} αn φn(t))² dt] → 0 as N → ∞.

This requires finite signal energy, E[∫_{0}^{T} X²(t) dt] < ∞, or equivalently, ∫_{0}^{T} E[X²(t)] dt = ∫_{0}^{T} c(t,t) dt < ∞.

This is the trace of the autocovariance function (the sum of the covariance matrix diagonal elements), which equals the sum of all its eigenvalues. Then,

∫_{0}^{T} E[X²(t)] dt = ∫_{0}^{T} c(t,t) dt = Σ_{n=1}^{∞} E[αn²] = Σ_{n=1}^{∞} λn < ∞

If we sum only up to αN, the reconstructed waveform will have total energy Σ_{n=1}^{N} λn out of Σ_{n=1}^{∞} λn, and the mean square error is just the energy we disregarded.

Example: Expansion of Brownian Motion

Brownian motion is the limiting case of a random walk as ∆t → 0. In terms of a white noise process V(⋅), the Brownian process B(⋅) is simply the time integral of white noise,

B(t) = ∫_{0}^{t} V(τ) dτ

This is the accumulation of the random steps of the random walk process. If V(⋅) has zero mean, then so does B(⋅).

cBB(t,s) = E[B(t)B(s)] = E[∫_{0}^{t} V(τ) dτ ∫_{0}^{s} V(σ) dσ]
 = E[∫_{0}^{t}∫_{0}^{s} V(τ) V(σ) dτ dσ]
 = ∫_{0}^{t}∫_{0}^{s} E[V(τ)V(σ)] dτ dσ
 = ∫_{0}^{t}∫_{0}^{s} δ(τ−σ) dτ dσ
 = min(t,s) = { s , 0 ≤ s ≤ t ; t , 0 ≤ t ≤ s }

[Figure: integration region 0 ≤ τ ≤ t, 0 ≤ σ ≤ s, with the impulse line δ(τ−σ) along the diagonal τ = σ]

Consider the K-L expansion of B(t) for 0 ≤ t ≤ T = 1. The eigenvalues of cBB(t,s) must satisfy the eigenvalue equation

∫_{0}^{1} cBB(t,s) φn(t) dt = λn φn(s)

∫_{0}^{s} t φn(t) dt + s ∫_{s}^{1} φn(t) dt = λn φn(s)      (1)

Differentiating both sides with respect to s:

∫_{s}^{1} φn(t) dt = λn φ′n(s)      (2)

Differentiating again:

−φn(s) = λn φ″n(s)

From (1) we get φn(0) = 0, and from (2), φ′n(1) = 0. This is the wave equation with a boundary value problem,

φ″n(s) + (1/λn) φn(s) = 0  with  φn(0) = φ′n(1) = 0

And the solutions for n = 1, 2, …, ∞ are

φn(s) = √2 sin((2n−1)πs/2)  and  λn = (2/((2n−1)π))²

Then the Brownian process has the K-L expansion

B(t) = √2 Σ_{n=1}^{∞} αn sin((2n−1)πt/2)

and the mutually orthogonal random variables αn have Eαn = 0 and E[αn²] = λn = (2/((2n−1)π))².

Note that the basis waveforms of the K-L expansion are not the same as those of the Fourier series, which lie at the harmonic frequencies. The K-L basis waveforms match the Brownian process better by choosing the odd sub-harmonic frequencies: they all start from zero and end with peak amplitude. A Fourier series always gives the same amplitude at the start and end points, while B(t) always starts from zero but is very unlikely to end there. Fewer K-L coefficients can approximate B(t) better than any other expansion.

[Figure: a realization B(t) on [0,1]; K-L basis functions φ1(t) = sin(πt/2) and φ2(t) = sin(3πt/2), which start at zero and peak at t = 1; Fourier 1st-harmonic functions sin(2πt) and cos(2πt) for comparison]
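A hedged simulation sketch of this expansion (my own illustration; the number of terms N and the grids are arbitrary choices): synthesize realizations of B(t) from independent coefficients with Var(αn) = λn and check that Var B(t) = min(t,t) = t.

import numpy as np

rng = np.random.default_rng(3)
N, npts, nreal = 200, 101, 5000
t = np.linspace(0.0, 1.0, npts)
n = np.arange(1, N + 1)
lam = (2.0 / ((2*n - 1) * np.pi))**2                    # eigenvalues lambda_n
phi = np.sqrt(2.0) * np.sin(np.outer((2*n - 1) * np.pi / 2, t))  # N x npts basis

alpha = rng.standard_normal((nreal, N)) * np.sqrt(lam)  # independent N(0, lam_n)
B = alpha @ phi                                          # nreal realizations of B(t)

print("Var B(0.5):", B[:, npts//2].var(), "(theory 0.5)")
print("Var B(1.0):", B[:, -1].var(), "(theory 1.0)")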

Equivalence in Finite Dimensional Space

All these expansions are just changes of basis vectors for representing the unknown vector; this can be thought of as a change-of-variables technique. We could try to use a new representation (coefficients) that is uncorrelated via the C = LDL^T technique, but the basis vectors in that case are not orthogonal, i.e., L^T L ≠ In. The K-L expansion uses the C = UΛU^T eigenvalue decomposition, which has orthogonal basis vectors, U^T U = In. But the matrix U is not lower triangular, so it cannot be computed in a successive (recursive) fashion. The following is the finite-dimensional matrix equivalent of the K-L expansion.

Gaussian pdf: f(b) = (2π)^{−n/2} |C|^{−1/2} exp(−½ b^T C^{−1} b)

covariance matrix: C = U Λ U^T , |U| = 1 , Λ = diag(λi)

change of variables: α = U^{−1} b ⇒ b^T C^{−1} b = α^T Λ^{−1} α

uncorrelated pdf: f(α) = (2π)^{−n/2} |Λ|^{−1/2} exp(−½ α^T Λ^{−1} α)

orthonormal basis: U^T U = In

Application for 1-D signal compression

The Karhunen-Loeve expansion also goes by other names: principal component transform, Hotelling transform, eigenvector transform, Karhunen-Loeve transform (KLT). In DSP applications we use finite-dimensional processes. Given a 1-D signal, e.g., a speech waveform, we always assume stationarity and also ergodicity. Then we can break the waveform into several pieces of n points each and treat each n-point piece as one realization of an n-dim random sequence. We then have several realizations and can find the ensemble-average mean µ and covariance matrix C from them, even though in fact they all come from just one big realization (a time average). If C can be decomposed into UΛU^T, the transform matrix is U^{−1}. If x is an unknown n-dim random vector (waveform), the transform of x is α = U^{−1}x, and the random variables αn of the vector α are mutually independent with variances E[αn²] = λn.

[Figure: one long realization X(t) broken into segments Realization #1, #2, #3, #4]

For n = 8:

x = [x1 x2 … x8]^T ,  C = U Λ U^T ,  Λ = diag(λ1, …, λ8)

U = [u1 u2 … u8] , the 8 eigenvectors , and U^{−1} = U^T

α = U^{−1} x = [α1 α2 … α8]^T  and  x = U α = Σ_{n=1}^{8} αn un

The variances σ² of the xi are stationary and all equal to the diagonal value of the matrix C, because the xi are identically distributed. The variances of the αi, the eigenvalues λi, are all different, but Σ_{i=1}^{8} λi = Σ_{i=1}^{8} σ² = 8σ² = total signal energy. When some λi is bigger than σ², some other λi must be smaller and can be discarded as zero. Then the representation of each segment x by α can reduce the data bits. This property is called energy packing, and the KLT is the best energy-packing transform of all.

Fourier Series Expansion of Random Signal

The Fourier series basis functions are e^{j2πnt/T} for the nth harmonic, with fundamental frequency f0 = 1/T. The basis functions are orthogonal in L2[0,T], but the Fourier series coefficients are in general not mutually orthogonal (uncorrelated). Let X(⋅) be a zero-mean, second-order process.

F.S. coefficients: ζm = (1/T) ∫_{0}^{T} X(t) e^{−j2πmt/T} dt

F.S. expansion: X(t) = Σ_{m=−∞}^{∞} ζm e^{j2πmt/T}

cross-covariance: E[ζm ζn*] = (1/T²) ∫_{0}^{T}∫_{0}^{T} E[X(t)X(s)] e^{−j2πmt/T} e^{+j2πns/T} dt ds
 = (1/T²) ∫_{0}^{T}∫_{0}^{T} cXX(t,s) e^{−j2πmt/T} e^{+j2πns/T} dt ds = γmn

γmn is a 2-D Fourier series coefficient of cXX(t,s). For arbitrary cXX(t,s), the cross-covariance γmn will not be a delta function (ζm and ζn will be correlated). But if we force γmn = βm δmn and ask what X(t) must then look like, cXX(t,s) must be stationary. From the 2-D inverse Fourier series of γmn,

cXX(t,s) = Σ_{m=−∞}^{∞} Σ_{n=−∞}^{∞} γmn e^{+j2πmt/T} e^{−j2πns/T}
 = Σ_{m=−∞}^{∞} βm e^{+j2πm(t−s)/T}

cXX(τ) = Σ_{m=−∞}^{∞} βm e^{+j2πmτ/T}

This is a 1-D Fourier series expansion of cXX(τ), but over a period of only T, and its F.S. coefficients are

βm = (1/T) ∫_{−T/2}^{T/2} cXX(τ) e^{−j2πmτ/T} dτ

Note that we shift the integration interval to [−T/2, T/2] instead of [0, T] to maintain the symmetry of cXX(τ). This shows that to get an uncorrelated F.S. expansion of X(t), i.e., γmn a delta function, X(t) must have cXX(τ) periodic with period T, i.e., cXX(τ) defined only for τ ∈ [−T/2, T/2] and then repeating itself. But for any signal X(t) of length T, its cXX(τ) has length 2T, for τ ∈ [−T, T]. So cXX(τ) for |τ| > T/2 must be a replica of |τ| < T/2. This is not required for K-L.

[Figure: cXX(τ) on [−T, T]; the F.S. assumption replicates |τ| < T/2 beyond |τ| = T/2, while K-L uses cXX(t,s) on the full square 0 ≤ t, s ≤ T]

Chapter 11 Discrete-Time Kalman Filtering

For a non-stationary process, the concept of power spectrum has no meaning and cannot be used any more; we must solve the problem in the time domain. The tool we use is the state-space method, which describes the internal states of the system by a differential or difference equation, for continuous and discrete time respectively:

continuous time: Ẋ(t) = A(t) X(t) + B(t) V(t)

discrete time: X(t+1) = A(t) X(t) + B(t) V(t)

where X(t) is n×1, A(t) is n×n, B(t) is n×r, and V(t) is r×1.

V(t) is an r×1 vector of zero-mean white Gaussian noise with nonstationary r×r covariance matrix E[V(t)V^T(t)] = Q(t). X(t) is the n×1 vector of internal states of the system at each time t. The initial state X(0) is Gaussian with zero mean and n×n covariance matrix P0, independent of V(t). After the initial random state, the state evolves randomly, governed by the state equations above. But we cannot observe the state X(t) directly. Instead, we can only measure the output

according to each state as Y(t) = h^T(t) X(t) + N(t), where h(t) is n×1 and Y(t) is scalar.

N(t) is white Gaussian measurement noise, independent of X(t) and V(t), with zero mean and nonstationary variance R(t). In general Y(t) can also be a vector, but for simplicity we consider only the scalar output case above. Given the time-varying deterministic parameters A(t), B(t), h(t), the stochastic parameters P0, Q(t), R(t), and the observation sequence of outputs Y(t): what is the MMSE estimate of the internal state X(t), based on all information available up to time t? The solution is the Bayesian estimate, which is the conditional mean:

X̂(t) = E[X(t) | Y(t), Y(t−1), Y(t−2), …, Y(1)]

Let Z(t) = [Y(1) Y(2) … Y(t)]^T, a random vector of variable dimension collecting the measured outputs up to time t. Z(t) is also a Gaussian random vector, with zero mean and t×t covariance matrix Ct = E[Z(t)Z^T(t)] (known by computation).

If we de-correlate using the Ct = Lt Dt Lt^T factorization and transform Z(t) into a new uncorrelated random vector

ζ(t) = [ν(1) ν(2) … ν(t)]^T = Lt^{−1} Z(t)

with E[ζ(t)ζ^T(t)] = Dt, a t×t diagonal matrix, then ν(t) will be a sequence of uncorrelated random variables causally computable from Y(t). This is due to the lower-triangular Lt and the fact that Z(t+1) adds just one more element to Z(t); then Ct+1 and Lt+1 have one more row and column added to Ct and Lt, and so ζ(t+1) just has one more element added to ζ(t), without changing the rest.

Let CXZ(t) = E[X(t) Z^T(t)] (n×t) and CXζ(t) = E[X(t) ζ^T(t)] (n×t). The conditional mean for a Gaussian process is

E[X | Y] = µX + CXY CYY^{−1} (Y − µY)

Then X̂(t) = E[X(t) | Z(t)] = CXZ(t) Ct^{−1} Z(t)

and also X̂(t) = E[X(t) | ζ(t)] = CXζ(t) Dt^{−1} ζ(t),

because Z(t) and ζ(t) give the same information.

We will use ζ(t) because of the simpler diagonal Dt. Let

CXζ(t) = [αt1 αt2 … αtt] ,  αtk = E[X(t) ν(k)]  (n×1)

and then X̂(t) = E[X(t) | ζ(t)] = Σ_{k=1}^{t} αtk ν(k)/dk

where dk is the k-th diagonal element of Dt.

If we use conditional information only up to time t−1 to compute the conditional mean of X(t), then

E[X(t) | ζ(t−1)] = Σ_{k=1}^{t−1} αtk ν(k)/dk

Update Formula

X̂(t) = E[X(t) | ζ(t)] = E[X(t) | ζ(t−1)] + αtt ν(t)/dt

This means that if we already have E[X(t) | ζ(t−1)] and receive one more data point Y(t), which also gives the new ν(t), we can update to the new X̂(t) by adding the latest term. We do not need to keep all past values of Y(t) to compute X̂(t). ν(t) is called the innovation of Y(t), and {ν(t)} the innovation sequence.

We can view ν(t) as the output of Gram-Schmidt orthogonalization of Y(t) in the Hilbert space of second-order random variables. This processing is causal, and ν(t) is the component of Y(t) orthogonal to the subspace spanned by all previous Y(1) to Y(t−1), or equivalently ν(1) to ν(t−1).

[Figure: Gram-Schmidt picture: ν1 = Y1; ν2 is the component of Y2 orthogonal to ν1; ν3 is the component of Y3 orthogonal to span{ν1, ν2}]

Propagate Formula

If we already know E[X(t) | ζ(t−1)], called the predictor, and then update with the new information from Y(t) or ν(t) to get E[X(t) | ζ(t)] = X̂(t), called the estimator, how do we get the one-step prediction E[X(t+1) | ζ(t)]? This is where the state-space model comes in. From the state equation, X(t+1) = A(t)X(t) + B(t)V(t); take the conditional mean using information up to time t:

E[X(t+1) | ζ(t)] = E[A(t)X(t) + B(t)V(t) | ζ(t)]
 = A(t) E[X(t) | ζ(t)] + B(t) E[V(t) | ζ(t)]

Because V(t) is independent of X(t) and N(t), while Y(t) depends only on X(t) and N(t), V(t) and Y(t) are independent. Then Z(t) and ζ(t) are also independent of V(t), and E[V(t) | ζ(t)] = E[V(t)] = 0. Finally we get the propagate formula:

E[X(t+1) | ζ(t)] = A(t) E[X(t) | ζ(t)] = A(t) X̂(t)
or E[X(t) | ζ(t−1)] = A(t−1) E[X(t−1) | ζ(t−1)] = A(t−1) X̂(t−1)

Kalman Filter Equations

Combining the update and propagate equations:

X̂(t) = E[X(t) | ζ(t)] = E[X(t) | ζ(t−1)] + αtt ν(t)/dt
 = A(t−1) E[X(t−1) | ζ(t−1)] + αtt ν(t)/dt

X̂(t) = A(t−1) X̂(t−1) + (1/dt) αtt ν(t)

This recursive formula calculates the new estimate X̂(t) from the old estimate X̂(t−1) after receiving the new information ν(t) or Y(t).

The vector (1/dt) αtt is called the Kalman gain, which takes care of the random part in the state equation. It tells us how much to adjust X̂(t) for the randomness of X(t), based on the information available from ν(t) or Y(t). This randomness always leaves some error in the estimate of X(t). We can calculate the Kalman gain and the mean square error of the prediction or estimation for each time t in advance, from the system parameters A(t), B(t), h(t), P0, Q(t), R(t). Define

prediction error: ∆(t) = X(t) − E[X(t) | ζ(t−1)] with M(t) = E[∆(t) ∆^T(t)]
estimation error: θ(t) = X(t) − E[X(t) | ζ(t)] = X(t) − X̂(t) with P(t) = E[θ(t) θ^T(t)]

M(t) and P(t) are the error covariance matrices of the predictor and estimator at time t; the smaller, the better. How to compute them, including the Kalman gain (1/dt) αtt, in terms of M(t) and P(t) is shown in the following.

The key concept of the innovation ν(t) is that it is the component of Y(t) that is new (unexpected, uncorrelated, orthogonal) relative to our knowledge from all past measurements Z(t−1) or ζ(t−1). So the prediction Ŷ(t) based on past measurements is just the conditional mean of Y(t) given ζ(t−1). This also has the meaning of Y(t) projected onto the (t−1)-dim subspace.

Ŷ(t) = E[Y(t) | ζ(t−1)] = E[h^T(t) X(t) + N(t) | ζ(t−1)]
 = h^T(t) E[X(t) | ζ(t−1)] + E[N(t) | ζ(t−1)]
 = h^T(t) A(t−1) X̂(t−1)      (1)

Then ν(t) = Y(t) − Ŷ(t) = Y(t) − h^T(t) A(t−1) X̂(t−1), and substituting into the Kalman equation,

X̂(t) = E[X(t) | ζ(t)]
 = A(t−1) X̂(t−1) + (1/dt) αtt [Y(t) − h^T(t) A(t−1) X̂(t−1)]

From (1), the relationship between ν(t) and ∆(t) is

ν(t) = Y(t) − Ŷ(t)
 = h^T(t) X(t) + N(t) − h^T(t) E[X(t) | ζ(t−1)]
 = h^T(t) (X(t) − E[X(t) | ζ(t−1)]) + N(t)

2102-502 Random Signals and Systems, Chulalongkorn University Page 98

and get T T(t) (t) (t) N(t) (t) (t) N(t)ν = + = +h h∆ ∆

Then we can find ttα and dt in terms of M(t) as T

tt E (t) (t) E (t) (t) (t) E (t)N(t)= ν = +X X h Xα ∆ T

tt E (t) (t) (t) (t) (t)= =h M hα ∆ ∆

T T2 2td E (t) (t)E (t) (t) (t) E N (t)= ν = +h h∆ ∆

Ttd (t) (t) (t) R(t)= +h M h

[Figure: at t = 2, X(t) with its projections E[X(t)|ζ(t)] and E[X(t)|ζ(t−1)] onto span{ν1, ν2}, showing the errors θ(t) and ∆(t)]

To calculate M(t), we first find the relationship of ∆(t) and θ(t):

∆(t) = X(t) − E[X(t) | ζ(t−1)]
 = A(t−1) X(t−1) + B(t−1) V(t−1) − A(t−1) X̂(t−1)
 = A(t−1) θ(t−1) + B(t−1) V(t−1)

From M(t) = E[∆(t)∆^T(t)] and P(t) = E[θ(t)θ^T(t)], we can relate M(t) to P(t−1) as

M(t) = A(t−1) P(t−1) A^T(t−1) + B(t−1) Q(t−1) B^T(t−1)

To complete the recursion loop, we must be able to calculate P(t) from M(t). From the Kalman equation,

θ(t) = X(t) − X̂(t) = X(t) − E[X(t) | ζ(t)]
 = X(t) − E[X(t) | ζ(t−1)] − (1/dt) αtt ν(t)      (prediction + update)
 = ∆(t) − (1/dt) αtt ν(t)

E[θ(t)θ^T(t)] = E[∆(t)∆^T(t)] − (1/dt) E[∆(t)ν(t)] αtt^T − (1/dt) αtt E[ν(t)∆^T(t)] + (1/dt²) αtt E[ν²(t)] αtt^T

Since E[∆(t)ν(t)] = E[X(t)ν(t)] = αtt (the term E[E[X(t)|ζ(t−1)] ν(t)] is always 0),

P(t) = M(t) − (1/dt) αtt αtt^T

Substituting αtt and dt in terms of M(t), we finally get

P(t) = M(t) − M(t) h(t) h^T(t) M(t) / (h^T(t) M(t) h(t) + R(t))

We can start from P(0) = P0 to find M(1), then P(1), M(2), P(2), and so on. Then we can calculate all the other parameters, such as the Kalman gain, directly.
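A minimal Python sketch of this recursion for the scalar, time-invariant case (my own illustration, mirroring the M(t), gain, P(t), and update formulas above; it is not the general vector implementation):

import numpy as np

def kalman_scalar(y, A, B, h, Q, R, P0, x0=0.0):
    """Scalar Kalman filter for X(t+1) = A X(t) + B V(t), Y(t) = h X(t) + N(t)."""
    xhat, P = x0, P0
    estimates = []
    for yt in y:
        M = A * P * A + B * Q * B          # M(t) = A P(t-1) A^T + B Q B^T
        d = h * M * h + R                  # d_t = h M h^T + R
        k = M * h / d                      # Kalman gain (1/d_t) alpha_tt
        xpred = A * xhat                   # propagate: E[X(t) | zeta(t-1)]
        xhat = xpred + k * (yt - h * xpred)   # update with innovation nu(t)
        P = M - M * h * h * M / d          # P(t) = M - M h h^T M / d
        estimates.append(xhat)
    return np.array(estimates), P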

Example

X(t+1) = 0.5 X(t) + V(t)
Y(t) = X(t) + N(t)

where X(0) = 0 and V(t), N(t) are independent zero-mean u.w.g.n. This is a scalar, stationary case with A(t) = 0.5, B(t) = 1, Q(t) = 1, R(t) = 1, h(t) = 1, P0 = 0 (because X(0) is deterministic). We get

M(t) = 0.5 P(t−1) 0.5 + (1)(1)(1) = 0.25 P(t−1) + 1
P(t) = M(t) − M²(t)/(M(t) + 1) = M(t)/(M(t) + 1)

Substituting M(t):  P(t) = (P(t−1) + 4)/(P(t−1) + 8) ;  P(0) = P0 = 0

Kalman gain:  k(t) = M(t)/(M(t) + 1) = P(t)

X̂(t) = 0.5 X̂(t−1) + P(t)[Y(t) − 0.5 X̂(t−1)]
 = (1 − P(t)) 0.5 X̂(t−1) + P(t) Y(t)

Because this is stationary, as t → ∞, P(t) converges to

P = (P + 4)/(P + 8) ⇒ P² + 7P − 4 = 0 ⇒ P = (√65 − 7)/2 ≈ 0.531

The Kalman filter then reduces to the time-invariant filter

X̂(t) = 0.234 X̂(t−1) + 0.531 Y(t)

Even with infinite observations of Y(t), the estimate X̂(t) still has error θ(t) with variance P = 0.531.
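A quick numeric check of this example (my own sketch): iterating the Riccati-type recursion reproduces the closed-form steady state.

import numpy as np

P = 0.0                                    # P(0) = 0
for t in range(1, 50):
    P = (P + 4.0) / (P + 8.0)              # P(t) = (P(t-1)+4)/(P(t-1)+8)
print("P(inf) ~", P, " closed form:", (np.sqrt(65) - 7) / 2)
print("steady-state filter: xhat(t) =", 0.5*(1 - P), "* xhat(t-1) +", P, "* Y(t)")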


Solutions to selected problems

#9/p.79

φ(ω) = (1 + cos ω) / (17/8 − cos 2ω)
 = 4(e^{jω} + 2 + e^{−jω}) / (−4e^{j2ω} + 17 − 4e^{−j2ω})

To find the filter transfer function:

H(z) H(z^{−1}) = 4(z + 2 + z^{−1}) / (−4z² + 17 − 4z^{−2})
 = [2(z + 1)/(4z² − 1)] · [2(z^{−1} + 1)/(4z^{−2} − 1)]

H(z) = 2(z + 1)/(4z² − 1) = (1/2) z^{−1} (1 + z^{−1}) / (1 − (1/4) z^{−2})

To find the filter impulse response:

H(z) = (2z)^{−1} (1 + z^{−1}) / (1 − (2z)^{−2})
 = (2z)^{−1} [1 + z^{−1}] [1 + (2z)^{−2} + (2z)^{−4} + (2z)^{−6} + …]
 = 2^{−1}z^{−1} + 2^{−1}z^{−2} + 2^{−3}z^{−3} + 2^{−3}z^{−4} + 2^{−5}z^{−5} + 2^{−5}z^{−6} + …

h(t) = {0, 2^{−1}, 2^{−1}, 2^{−3}, 2^{−3}, 2^{−5}, 2^{−5}, …} for t = 0, 1, 2, …

We can find the autocovariance function from h(t) by

c(τ) = Σ_{λ=0}^{∞} h(λ) h(λ + τ)

But this is suitable only for finite h(t), or for finding c(τ) at a few values of τ. We can also find the autocovariance function from φ(ω) by

c(τ) = (1/2π) ∫_{−π}^{π} φ(ω) e^{jωτ} dω
 = (1/2π) ∫_{−π}^{π} [4(e^{jω} + 2 + e^{−jω}) / (−4e^{j2ω} + 17 − 4e^{−j2ω})] e^{jωτ} dω

Let z = e^{jω} → dz = j e^{jω} dω = jz dω and e^{jωτ} = z^τ; the integral becomes a contour integral on the unit circle, traversed counterclockwise for τ ≥ 0.

c(τ) = (1/2π) ∮_{|z|=1} [4(z + 2 + z^{−1}) / (−4z² + 17 − 4z^{−2})] z^τ dz/(jz)
 = (1/2πj) ∮_{|z|=1} [−(z + 1)² z^τ / ((z − 1/2)(z + 1/2)(z − 2)(z + 2))] dz
 = (1/2πj) ∮_{|z|=1} [A/(z − 1/2) + B/(z + 1/2) + C/(z − 2) + D/(z + 2)] dz
 = A + B

A and B are the residues at the poles zP = ±1/2, which are enclosed by the contour. To find the residues (the coefficients of the partial fractions above), we multiply the integrand by (z − zP) and evaluate at z = zP, as follows.

A = [−(z + 1)² z^τ / ((z + 1/2)(z − 2)(z + 2))] evaluated at z = 1/2
 = −(3/2)² (1/2)^τ / [(1)(−3/2)(5/2)] = (3/5) 2^{−τ}

B = [−(z + 1)² z^τ / ((z − 1/2)(z − 2)(z + 2))] evaluated at z = −1/2
 = −(1/2)² (−1/2)^τ / [(−1)(−5/2)(3/2)] = −(1/15) (−1)^τ 2^{−τ}

For negative τ, the integration direction is clockwise and encloses the poles outside the unit circle instead. But because c(τ) is symmetric, we can just use |τ|:

c(τ) = (3/5) 2^{−|τ|} − (1/15)(−1)^{|τ|} 2^{−|τ|} = [9 − (−1)^{|τ|}] 2^{−|τ|} / 15

c(0) = 8/15 , c(±2) = (8/15)·(1/4) , c(±4) = (8/15)·(1/16) , c(±6) = (8/15)·(1/64)
c(±1) = (2/3)·(1/2) , c(±3) = (2/3)·(1/8) , c(±5) = (2/3)·(1/32) , c(±7) = (2/3)·(1/128)

Kalman Filter Example

A box of resistors has resistances with mean µX and variance σX². Pick one resistor and measure its resistance many times using an unbiased but noisy ohmmeter with measurement-error variance σn². Assume independent measurement errors and Gaussian pdfs for everything. Let X be the true resistance and Y(t) the measured value at time t = 1, 2, 3, …. Formulate the Kalman filter to estimate X̂(t) at each time t in terms of µX, σX², σn², Y(t) and t. If σX² or σn² is very large, X̂(t) reduces to a simple form; explain the meaning of and reason for each case.

Static system, scalar, no state noise:

X(t+1) = X(t)
Y(t) = X(t) + N(t)

with A(t) = 1, B(t) = 0, h(t) = 1, P(0) = σX², Q(t) = N.A., R(t) = σn².

M(t) = A(t−1) P(t−1) A^T(t−1) + B(t−1) Q(t−1) B^T(t−1) = P(t−1)

and P(t) = M(t) − αtt αtt^T / dt = M(t) − M²(t)/(M(t) + σn²) = M(t) σn² / (M(t) + σn²)

then P(t) = P(t−1) σn² / (P(t−1) + σn²)  ⇒  1/P(t) = 1/P(t−1) + 1/σn²

Starting from P(0) = σX²:

1/P(1) = 1/σX² + 1/σn²  ⇒  1/P(2) = 1/σX² + 2/σn²  ⇒ …  ⇒  1/P(t) = 1/σX² + t/σn²

and we get M(t) = P(t−1) = 1 / (1/σX² + (t−1)/σn²)

To find the Kalman gain:

αtt = M(t) h(t) = M(t)
dt = h^T(t) M(t) h(t) + R(t) = M(t) + σn²

Kalman gain k(t) = αtt/dt = M(t)/(M(t) + σn²)
 = [1/(1/σX² + (t−1)/σn²)] / [1/(1/σX² + (t−1)/σn²) + σn²]
 = 1/(S + t) ,  for S = σn²/σX²

The MMSE estimator from the Kalman equation:

X̂(t) = (1 − k(t)) X̂(t−1) + k(t) Y(t)
 = [(S + t − 1)/(S + t)] X̂(t−1) + [1/(S + t)] Y(t)

(S + t) X̂(t) = (S + t − 1) X̂(t−1) + Y(t)

From the initial estimate X̂(0) = E[X] = µX (the unconditional mean):

(S + 1) X̂(1) = S X̂(0) + Y(1)
(S + 2) X̂(2) = (S + 1) X̂(1) + Y(2) = S X̂(0) + Y(1) + Y(2)
…
(S + t) X̂(t) = S X̂(0) + Σ_{m=1}^{t} Y(m) = S µX + Σ_{m=1}^{t} Y(m)

then X̂(t) = [S/(S + t)] µX + [1/(S + t)] Σ_{m=1}^{t} Y(m)
 = [σn²/(σn² + t σX²)] µX + [σX²/(σn² + t σX²)] Σ_{m=1}^{t} Y(m)
 = [(σn²/t)/(σn²/t + σX²)] µX + [σX²/(σn²/t + σX²)] Ȳ

for Ȳ = (1/t) Σ_{m=1}^{t} Y(m).

Interpretation of the result:

• If σn² is very large, then X̂(t) = µX: the measurements are too noisy to be useful, and the filter relies only on the a priori knowledge of X.

• If σX² is very large, then X̂(t) = Ȳ: X can be very far from µX, so the filter has to rely on the measured values alone, which is just averaging.

• In general, the filter balances the two sources of information about X according to their variances, so that X̂(t) is the best estimator.
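A hedged end-to-end sketch of this example (all numbers are my illustrative assumptions): the recursion with gain k(t) = 1/(S + t) matches the closed-form combination of µX and the sample average at every step.

import numpy as np

rng = np.random.default_rng(5)
mu_X, sx2, sn2, T = 100.0, 25.0, 4.0, 50             # assumed parameters
X = mu_X + np.sqrt(sx2) * rng.standard_normal()      # one resistor from the box
Y = X + np.sqrt(sn2) * rng.standard_normal(T)        # noisy measurements

S = sn2 / sx2
xhat = mu_X                                          # initial estimate X_hat(0)
for t in range(1, T + 1):
    k = 1.0 / (S + t)                                # Kalman gain 1/(S+t)
    xhat = (1 - k) * xhat + k * Y[t-1]
    closed = S/(S + t) * mu_X + Y[:t].sum() / (S + t)
    assert abs(xhat - closed) < 1e-9                 # recursion == closed form
print("true X:", X, " final estimate:", xhat)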