
BROWNIAN MOTION

1. Expectations and Covariances of Random Vectors

A random vector, or more precisely, a random n-vector is a column vector $\vec{X} \in \mathbb{R}^n$ whose coordinates are jointly defined random variables:

$$\vec{X} = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}.$$

Detailed probability work with random vectors requires knowledge of the joint distribution of the coordinates $X_1, \dots, X_n$, often based on a joint probability density function. But here we will not worry about the general theory, and will only concern ourselves with the joint distribution in one special case, namely Gaussian random vectors, to be introduced later. For now, we will focus only on expected values and covariances for random vectors.

1.1. Mean vectors. We take the expected value of a random vector $\vec{X}$ in the following natural way:

$$E(\vec{X}) = \begin{pmatrix} E(X_1) \\ E(X_2) \\ \vdots \\ E(X_n) \end{pmatrix} = \begin{pmatrix} \mu_1 \\ \mu_2 \\ \vdots \\ \mu_n \end{pmatrix} = \vec{\mu}\,,$$

where $\mu_i = E(X_i)$ for each $i$. We call the vector $\vec{\mu}$ the mean vector of $\vec{X}$. Note that the mean vector is simply a vector of numbers. These numbers are the means (expected values) of the random variables $X_i$.

Exercise 1. Let $\vec{X}$ be a random 2-vector. Calculate its mean vector if $\vec{X}$ is a point randomly selected from within

(i) The unit square (that is, the square with vertices (0, 0), (1, 0), (0, 1), (1, 1)).
(ii) The unit circle centered at the origin.
(iii) The triangle with vertices (0, 0), (1, 0), (0, 1).
(iv) The region above the parabola $y = x^2$ and inside the unit square.
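If you want to check your answers numerically, here is a minimal Monte Carlo sketch in Python (using NumPy; the sample sizes, seed, and region names are our own illustrative choices). It estimates each mean vector by averaging points sampled uniformly from the region, using rejection sampling where needed.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 200_000

# (i) Unit square: both coordinates uniform on [0, 1].
square = rng.uniform(0.0, 1.0, size=(N, 2))

# (ii) Unit disk: sample the bounding square, keep points inside (rejection).
pts = rng.uniform(-1.0, 1.0, size=(N, 2))
disk = pts[pts[:, 0]**2 + pts[:, 1]**2 <= 1.0]

# (iii) Triangle with vertices (0,0), (1,0), (0,1).
pts = rng.uniform(0.0, 1.0, size=(N, 2))
triangle = pts[pts[:, 0] + pts[:, 1] <= 1.0]

# (iv) Region above y = x^2 inside the unit square.
pts = rng.uniform(0.0, 1.0, size=(N, 2))
parabola = pts[pts[:, 1] >= pts[:, 0]**2]

for name, sample in [("square", square), ("disk", disk),
                     ("triangle", triangle), ("parabola", parabola)]:
    print(name, sample.mean(axis=0))   # estimates of the mean vectors
```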


Not surprisingly, linear algebra plays an important role with random vectors. For example, it will be common to multiply a random n-vector $\vec{X}$ by a non-random $m \times n$ matrix $A$, giving us the random m-vector $A\vec{X}$. We can also multiply random vectors by scalars, and add random vectors to other vectors (random or non-random). Suppose $\vec{X}$ and $\vec{Y}$ are random vectors, and let $A$ and $B$ be non-random matrices. Then it is not hard to prove that

(1) $E(A\vec{X} + B\vec{Y}) = AE(\vec{X}) + BE(\vec{Y})\,,$

assuming that the sizes of the vectors $\vec{X}$, $\vec{Y}$ and the matrices $A$, $B$ are chosen so that the matrix products $A\vec{X}$ and $B\vec{Y}$ both make sense, and also so that the vector addition $A\vec{X} + B\vec{Y}$ makes sense.

An important special case of (1) is the following: suppose $\vec{X}$ is a random n-vector, $A$ is a non-random $m \times n$ matrix, and $\vec{b}$ is a non-random m-vector (column). Then

(2) $E(A\vec{X} + \vec{b}) = A\vec{\mu} + \vec{b}\,,$

where $\vec{\mu} = E(\vec{X})$.
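Equation (2) is easy to test by simulation. The following sketch (Python with NumPy; the particular $X$, $A$, and $b$ are arbitrary choices of ours) compares the empirical mean of $A\vec{X} + \vec{b}$ with $A\vec{\mu} + \vec{b}$.

```python
import numpy as np

rng = np.random.default_rng(1)

# A toy random 2-vector X with known mean vector (all choices arbitrary).
mu = np.array([0.5, -1.0])
X = rng.normal(loc=mu, scale=1.0, size=(500_000, 2))   # rows are samples

A = np.array([[2.0, 1.0],
              [0.0, 3.0],
              [1.0, -1.0]])          # non-random 3x2 matrix
b = np.array([1.0, 0.0, -2.0])       # non-random 3-vector

print((X @ A.T + b).mean(axis=0))    # empirical E(AX + b)
print(A @ mu + b)                    # the right side of (2)
```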

Exercise 2. Explain why Equation (2) is a special case of Equation (1).

Exercise 3. Prove Equation (2) in the case that $\vec{X}$ is a random 2-vector, $A$ is a non-random $2 \times 2$ matrix, and $\vec{b}$ is a non-random 2-vector.

Similarly, we can take transposes of our random vectors to get row vectors, and multiply by matrices on the right:

(3) $E(\vec{X}^T A + \vec{Y}^T B) = E(\vec{X})^T A + E(\vec{Y})^T B\,,$

where we again assume that the sizes of the vectors and matrices are appropriate. This means, for example, that if $\vec{X}$ is a random n-vector, then the matrix $A$ must be $n \times m$ for some $m$, rather than $m \times n$. Here is the special case corresponding to our previous Equation (2):

(4) $E(\vec{X}^T A + \vec{b}^T) = \vec{\mu}^T A + \vec{b}^T\,,$

where $\vec{X}$ is a random n-vector with mean vector $\vec{\mu}$, $A$ is a non-random $n \times m$ matrix, and $\vec{b}$ is a non-random m-vector (column).


1.2. Covariance matrices. We are also interested in defining something that would correspond to the "variance" of the random vector $\vec{X}$. This will be a matrix, consisting of all the possible covariances of pairs of the random variables $X_1, X_2, \dots, X_n$. Because it contains covariances, we call it the covariance matrix of $\vec{X}$:

$$\mathrm{Cov}(\vec{X}) = \begin{pmatrix} \mathrm{Cov}(X_1, X_1) & \mathrm{Cov}(X_1, X_2) & \dots & \mathrm{Cov}(X_1, X_n) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Cov}(X_2, X_2) & \dots & \mathrm{Cov}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \dots & \mathrm{Cov}(X_n, X_n) \end{pmatrix}$$

Remember that for any random variable $Y$, $\mathrm{Cov}(Y, Y) = \mathrm{Var}(Y)$, so the diagonal elements in the covariance matrix are actually variances, and we could write the covariance matrix this way:

$$\mathrm{Cov}(\vec{X}) = \begin{pmatrix} \mathrm{Var}(X_1) & \mathrm{Cov}(X_1, X_2) & \dots & \mathrm{Cov}(X_1, X_n) \\ \mathrm{Cov}(X_2, X_1) & \mathrm{Var}(X_2) & \dots & \mathrm{Cov}(X_2, X_n) \\ \vdots & \vdots & \ddots & \vdots \\ \mathrm{Cov}(X_n, X_1) & \mathrm{Cov}(X_n, X_2) & \dots & \mathrm{Var}(X_n) \end{pmatrix}$$

Of course, in all of these definitions and formulas involving mean vectors and covariance matrices, we need to assume that the means and covariances of the underlying random variables $X_1, X_2, \dots, X_n$ exist. Since we will be primarily dealing with Gaussian random variables, this assumption will typically be satisfied.

Exercise 4. Calculate the covariance matrix of the random vector $\vec{X}$ for each of the parts of Exercise 1.
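As a quick numerical sanity check for the easiest part: a point uniform on the unit square has independent coordinates, each with variance 1/12, so its covariance matrix should be $(1/12)I$. A minimal sketch (Python with NumPy; the sample size and seed are ours):

```python
import numpy as np

rng = np.random.default_rng(2)

# Point uniform on the unit square: the coordinates are independent
# Uniform(0, 1) random variables, each with variance 1/12.
X = rng.uniform(0.0, 1.0, size=(500_000, 2))

print(np.cov(X, rowvar=False))   # empirical covariance matrix
print(np.eye(2) / 12.0)          # exact answer for comparison
```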

In connection with the covariance matrix, it will be useful for us to take the expected value of a random matrix. As you might expect, this gives us the matrix whose elements are the expected values of the random variables that are the corresponding elements of the random matrix. Thus, if $M$ is a random $m \times n$ matrix whose elements are random variables $X_{ij}$, then

$$E(M) = \begin{pmatrix} E(X_{11}) & E(X_{12}) & \dots & E(X_{1n}) \\ E(X_{21}) & E(X_{22}) & \dots & E(X_{2n}) \\ \vdots & \vdots & \ddots & \vdots \\ E(X_{m1}) & E(X_{m2}) & \dots & E(X_{mn}) \end{pmatrix}.$$

It should be easy for you to see that if $M$ and $N$ are random $m \times n$ matrices and if $a$ and $b$ are real numbers, then

(5) $E(aM + bN) = aE(M) + bE(N)\,.$


Now we have a fancy and useful way to write the covariance matrix:

(6) $\mathrm{Cov}(\vec{X}) = E\big((\vec{X} - \vec{\mu})(\vec{X} - \vec{\mu})^T\big) = E(\vec{X}\vec{X}^T) - \vec{\mu}\vec{\mu}^T.$

Can you see how this expression makes sense? In order to do so, you must recall that if $\vec{v}$ is a column vector in $\mathbb{R}^n$, then the product $\vec{v}\vec{v}^T$ is an $n \times n$ matrix. Thus, $(\vec{X} - \vec{\mu})(\vec{X} - \vec{\mu})^T$ is a random $n \times n$ matrix:

$$(\vec{X} - \vec{\mu})(\vec{X} - \vec{\mu})^T = \begin{pmatrix} (X_1 - \mu_1)^2 & (X_1 - \mu_1)(X_2 - \mu_2) & \dots & (X_1 - \mu_1)(X_n - \mu_n) \\ (X_2 - \mu_2)(X_1 - \mu_1) & (X_2 - \mu_2)^2 & \dots & (X_2 - \mu_2)(X_n - \mu_n) \\ \vdots & \vdots & \ddots & \vdots \\ (X_n - \mu_n)(X_1 - \mu_1) & (X_n - \mu_n)(X_2 - \mu_2) & \dots & (X_n - \mu_n)^2 \end{pmatrix}$$

Now you should be able to see that if you take the expected value of this random matrix, you get the covariance matrix of $\vec{X}$.

Exercise 5. Prove the second equality in (6). (Use (5).)

Now we are equipped to get a formula for $\mathrm{Cov}(A\vec{X} + \vec{b})$, where $\vec{X}$ is a random n-vector, $A$ is a non-random $m \times n$ matrix, and $\vec{b}$ is a non-random m-vector. Note that since the vector $\vec{b}$ won't affect any of the covariances, we have

$$\mathrm{Cov}(A\vec{X} + \vec{b}) = \mathrm{Cov}(A\vec{X})\,,$$

so we can ignore $\vec{b}$. But we obviously can't ignore $A$! Let's use our fancy formula (6) for the covariance matrix, keeping in mind that $E(A\vec{X}) = A\vec{\mu}$, where $\vec{\mu}$ is the mean vector of $\vec{X}$:

$$\mathrm{Cov}(A\vec{X}) = E\big((A\vec{X} - A\vec{\mu})(A\vec{X} - A\vec{\mu})^T\big)\,.$$

Now we use rules of matrix multiplication from linear algebra:

$$(A\vec{X} - A\vec{\mu})(A\vec{X} - A\vec{\mu})^T = A(\vec{X} - \vec{\mu})(\vec{X} - \vec{\mu})^T A^T$$

Take the expected value, and then use (2) and (4) to pull $A$ and $A^T$ outside of the expected value, giving us

(7) $\mathrm{Cov}(A\vec{X} + \vec{b}) = A\,\mathrm{Cov}(\vec{X})\,A^T.$

This is an important formula that will serve us well when we work with Gaussian random vectors.
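Formula (7) can also be checked by simulation. In this sketch (Python with NumPy; $\Gamma$, $A$, $b$, and the seed are arbitrary choices of ours), the empirical covariance of $A\vec{X} + \vec{b}$ is compared with $A\Gamma A^T$:

```python
import numpy as np

rng = np.random.default_rng(3)

Gamma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])   # an arbitrary positive definite choice
X = rng.multivariate_normal([0.0, 0.0], Gamma, size=500_000)

A = np.array([[1.0, 2.0],
              [0.0, 1.0],
              [3.0, -1.0]])      # non-random 3x2 matrix
b = np.array([5.0, -5.0, 0.0])   # non-random 3-vector

Y = X @ A.T + b                  # samples of A X + b
print(np.cov(Y, rowvar=False))   # empirical Cov(AX + b)
print(A @ Gamma @ A.T)           # formula (7); b drops out, as claimed
```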


1.3. Change of coordinates. Note that $\mathrm{Cov}(\vec{X})$ is a symmetric matrix. In fact, this matrix is nonnegative definite. How do we know this? From the Spectral Theorem, we know there exists an orthogonal matrix $O$ such that $O^T \mathrm{Cov}(\vec{X})\, O$ is a diagonal matrix $D$, with the eigenvalues of $\mathrm{Cov}(\vec{X})$ being the diagonal elements, and the columns of $O$ being the corresponding eigenvectors, normalized to have length 1. But by (7), this diagonal matrix $D$ is the covariance matrix of the random vector $O^T\vec{X}$, and the diagonal entries of any covariance matrix are variances, hence nonnegative. This tells us that $\mathrm{Cov}(\vec{X})$ has nonnegative eigenvalues, making it nonnegative definite. Let $\sqrt{D}$ stand for the diagonal matrix whose diagonal elements are the nonnegative square roots of the diagonal elements of $D$. Then it is easy to see that $\mathrm{Cov}(\vec{X}) = O\sqrt{D}\sqrt{D}^T O^T = \Sigma\Sigma^T$, where $\Sigma = O\sqrt{D}$. Even though the matrix $\Sigma$ is not uniquely determined, it seems OK to call it the standard deviation matrix of $\vec{X}$. So we have

$$D = O^T \mathrm{Cov}(\vec{X})\, O \quad \text{and} \quad \Sigma = O\sqrt{D}\,,$$

where

$$D = \begin{pmatrix} \lambda_1 & 0 & \dots & 0 \\ 0 & \lambda_2 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \dots & \lambda_n \end{pmatrix} \quad \text{and} \quad O = \begin{pmatrix} \uparrow & \uparrow & \dots & \uparrow \\ \vec{v}_1 & \vec{v}_2 & \dots & \vec{v}_n \\ \downarrow & \downarrow & \dots & \downarrow \end{pmatrix}$$

and $\lambda_1, \lambda_2, \dots, \lambda_n$ are the eigenvalues of $\mathrm{Cov}(\vec{X})$, with corresponding eigenvectors $\vec{v}_1, \vec{v}_2, \dots, \vec{v}_n$. We will always assume that we have put the eigenvalues into decreasing order, so that $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n \ge 0$.

Now suppose that $\vec{X}$ is a random vector with covariance matrix $\Sigma\Sigma^T$, where $\Sigma = O\sqrt{D}$ and $O$ and $D$ are as in the previous paragraph. Then clearly $\vec{Y} = O^T\vec{X}$ is a random vector with covariance matrix $D$. This means that by simply performing a rotation in $\mathbb{R}^n$, we have transformed $\vec{X}$ into a random vector whose coordinates are uncorrelated. If the coordinates of $\vec{X}$ are uncorrelated, then we say that $\vec{X}$ itself is uncorrelated. Informally, we have the following statement:

Every random vector is uncorrelated if you look at it from the right direction!

Mathematically, this statement can be written as

$$\mathrm{Cov}(O^T\vec{X}) = O^T\Sigma\Sigma^T O = D\,.$$
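Numerically, the diagonalization above is exactly what a symmetric eigendecomposition gives. A small sketch (Python with NumPy; the matrix is an arbitrary choice of ours, and note that numpy.linalg.eigh returns eigenvalues in increasing order, so we flip them to match our decreasing convention):

```python
import numpy as np

Gamma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])   # an arbitrary covariance matrix

# eigh handles symmetric matrices; it returns eigenvalues in increasing
# order, so flip both outputs to match the convention lambda_1 >= lambda_2.
lam, O = np.linalg.eigh(Gamma)
lam, O = lam[::-1], O[:, ::-1]

D = np.diag(lam)
print(O.T @ Gamma @ O)   # equals D up to rounding
print(D)
```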


Exercise 6. For each part of Exercise 1, determine the change of coordinates (rotation) that changes $\vec{X}$ into an uncorrelated random vector.

1.4. Standardization. We can go a little bit further. As stated earlier, we select the matrix $O$ so that the diagonal elements of $D$, which are the eigenvalues $\lambda_1, \lambda_2, \dots, \lambda_n$, are in decreasing order. This means that the nonzero eigenvalues come before the eigenvalues that equal 0. Let $m \le n$ be the number of nonzero eigenvalues. This is the same as saying that $m$ is the rank of the covariance matrix. Then for $i > m$, the coordinates $Y_i$ of the vector $\vec{Y} = O^T(\vec{X} - \vec{\mu})$ (note that we centered $\vec{X}$ by subtracting its mean vector) have expected value 0 and variance 0, implying that $Y_i = 0$ for $i > m$. Let

$$\vec{Z} = \begin{pmatrix} Y_1/\sqrt{\lambda_1} \\ Y_2/\sqrt{\lambda_2} \\ \vdots \\ Y_m/\sqrt{\lambda_m} \end{pmatrix}.$$

This is the same as letting $\vec{Z} = B\vec{Y}$, where $B$ is the $m \times n$ matrix

$$B = \begin{pmatrix} 1/\sqrt{\lambda_1} & 0 & \dots & 0 & 0 & \dots & 0 \\ 0 & 1/\sqrt{\lambda_2} & \dots & 0 & 0 & \dots & 0 \\ \vdots & \vdots & \ddots & \vdots & \vdots & & \vdots \\ 0 & 0 & \dots & 1/\sqrt{\lambda_m} & 0 & \dots & 0 \end{pmatrix}$$

Then $\vec{Z}$ is a random m-vector with mean vector $\vec{0}$ and covariance matrix equal to the $m \times m$ identity matrix $I_m$. We say that $\vec{X}$ has been centered and scaled, or in other words, standardized. Informally speaking, we center $\vec{X}$ by subtracting its mean vector, and we scale it by "dividing" by its standard deviation matrix (after eliminating the extra dimensions that correspond to 0 eigenvalues). To summarize, the standardized version of $\vec{X}$ is $\vec{Z}$, where

(8) $\vec{Z} = BO^T(\vec{X} - \vec{\mu})\,.$

In many cases, all of the diagonal elements of $D$ are nonzero, and $B = \sqrt{D}^{-1}$. This last condition is equivalent to saying that $\Sigma$ is invertible, and then we have $\Sigma^{-1} = \sqrt{D}^{-1} O^T$.


Theorem 1. Let $\vec{X}$ be a random n-vector with mean vector $\vec{\mu}$ and standard deviation matrix $\Sigma$. If $\Sigma$ has rank $n$ (or equivalently, is invertible), then the standardized random vector

(9) $\vec{Z} = \Sigma^{-1}(\vec{X} - \vec{\mu})$

is a random n-vector with mean vector $\vec{0}$ and covariance matrix $I_n$, where $I_n$ is the $n \times n$ identity matrix. If $\Sigma$ has rank $m < n$, then there is an m-dimensional subspace $V \subseteq \mathbb{R}^n$ such that

$$P\big((\vec{X} - \vec{\mu}) \in V\big) = 1\,,$$

and the random m-vector $\vec{Z}$ defined in (8) has mean vector $\vec{0}$ and covariance matrix $I_m$.

Note that if the covariance matrix of a random vector is the identity matrix, then its standard deviation matrix is also the identity matrix.
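Here is a minimal sketch of standardization in the invertible case (Python with NumPy; $\mu$, $\Gamma$, the seed, and the sample size are arbitrary choices of ours). It builds $\Sigma = O\sqrt{D}$ from the eigendecomposition and applies (9) to simulated data:

```python
import numpy as np

rng = np.random.default_rng(4)

mu = np.array([1.0, -2.0])
Gamma = np.array([[2.0, 1.0],
                  [1.0, 3.0]])
X = rng.multivariate_normal(mu, Gamma, size=500_000)

lam, O = np.linalg.eigh(Gamma)
lam, O = lam[::-1], O[:, ::-1]               # decreasing eigenvalue order
Sigma = O @ np.diag(np.sqrt(lam))            # standard deviation matrix

Z = (X - mu) @ np.linalg.inv(Sigma).T        # formula (9), row by row
print(Z.mean(axis=0))                        # close to the zero vector
print(np.cov(Z, rowvar=False))               # close to the identity matrix
```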

Exercise 7. Standardize $\vec{X}$ for each part of Exercise 1.

Exercise 8. Let $V_1$ and $V_2$ be the outcomes (0 or 1) of independent tosses of a fair coin. Let

$$\vec{X} = \begin{pmatrix} V_1 + V_2 \\ V_1 \\ V_1 - V_2 \end{pmatrix}.$$

Calculate the mean vector and covariance matrix of $\vec{X}$, and then standardize $\vec{X}$.

2. Gaussian Random Vectors

Let $\vec{X}$ be a random n-vector. We say that $\vec{X}$ is a Gaussian random n-vector if for all non-random n-vectors $\vec{a}$, the random variable $\vec{a} \cdot \vec{X}$ is normally distributed. Note that

$$\vec{a} \cdot \vec{X} = a_1 X_1 + a_2 X_2 + \dots + a_n X_n\,,$$

so this condition simply says that $\vec{X}$ is a Gaussian random vector if any linear combination of its coordinates is normally distributed. In particular, the individual coordinates $X_1, X_2, \dots, X_n$ must all be normally distributed, but that alone is not enough for $\vec{X}$ to be Gaussian.


Here is how you get Gaussian random vectors. Start with the random n-vector $\vec{Z}$ whose coordinates are independent random variables, all of which have the standard normal distribution. Then it is easy to see that $\vec{Z}$ has mean vector $\vec{0}$ and covariance matrix $I_n$. We know from the text that any sum of independent normal random variables is itself normal, so $\vec{Z}$ satisfies the definition of a Gaussian random vector. This is the standard normal (or Gaussian) random n-vector.

Let $\Sigma$ be a non-random $m \times n$ matrix and let $\vec{\mu}$ be a non-random m-vector. Then the random m-vector

$$\vec{X} = \Sigma\vec{Z} + \vec{\mu}$$

has mean vector $\vec{\mu}$ and covariance matrix $\Sigma\Sigma^T$. Let $\vec{a}$ be a non-random m-vector. Then

$$\vec{a} \cdot \vec{X} = (\vec{a}\Sigma) \cdot \vec{Z} + \vec{a} \cdot \vec{\mu}\,.$$

Since $\vec{a}\Sigma$ is a non-random n-vector, we know that $(\vec{a}\Sigma) \cdot \vec{Z}$ must be normally distributed. And since $\vec{a} \cdot \vec{\mu}$ is just a constant, it follows that $\vec{a} \cdot \vec{X}$ is normally distributed. Therefore, by definition, $\vec{X}$ is a Gaussian random m-vector. This procedure shows us how we can get a Gaussian random vector with any desired mean vector and covariance matrix.
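Here is a short sketch of this construction (Python with NumPy; the particular $\mu$ and $\Gamma$ are arbitrary choices of ours). Any $\Sigma$ with $\Sigma\Sigma^T = \Gamma$ works; the sketch uses the Cholesky factor for convenience, which is lower triangular rather than the $O\sqrt{D}$ factor of Section 1.3:

```python
import numpy as np

rng = np.random.default_rng(5)

mu = np.array([1.0, 0.0, -2.0])
Gamma = np.array([[2.0, 1.0, 0.0],
                  [1.0, 2.0, 1.0],
                  [0.0, 1.0, 2.0]])

# Any Sigma with Sigma Sigma^T = Gamma will do; Cholesky is one convenient
# choice (lower triangular, not the O sqrt(D) factor from Section 1.3).
Sigma = np.linalg.cholesky(Gamma)

Z = rng.standard_normal(size=(500_000, 3))   # standard Gaussian 3-vectors
X = Z @ Sigma.T + mu                         # X = Sigma Z + mu, row by row

print(X.mean(axis=0))                        # close to mu
print(np.cov(X, rowvar=False))               # close to Gamma
```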

The preceding argument also works to prove the following fact: if $\vec{X}$ is a Gaussian random n-vector with mean vector $\vec{\mu}$ and covariance matrix $\Gamma$, and if $A$ is a non-random $m \times n$ matrix and $\vec{b}$ is a non-random m-vector, then the random m-vector

$$\vec{Y} = A\vec{X} + \vec{b}$$

is a Gaussian random vector with mean vector $A\vec{\mu} + \vec{b}$ and covariance matrix $A\Gamma A^T$.

To summarize:

(i) If $\vec{Z}$ is a random n-vector with independent coordinates that have the standard normal distribution, then $\vec{Z}$ is a Gaussian random n-vector, called the standard Gaussian random n-vector.
(ii) If $\vec{Z}$ is a standard Gaussian random n-vector, $\Sigma$ a non-random $m \times n$ matrix and $\vec{\mu}$ a non-random m-vector, then $\vec{X} = \Sigma\vec{Z} + \vec{\mu}$ is a Gaussian random m-vector with mean vector $\vec{\mu}$ and covariance matrix $\Sigma\Sigma^T$.
(iii) If $\vec{X}$ is any Gaussian random n-vector, then for any non-random $m \times n$ matrix $A$ and non-random m-vector $\vec{b}$, the random m-vector $A\vec{X} + \vec{b}$ is a Gaussian random m-vector.
(iv) Theorem 1 shows how to use matrix multiplication and vector addition to turn any Gaussian random vector into a standard Gaussian random vector.

It turns out that this is all you need to know about Gaussian random vectors, because the distribution of a Gaussian random vector is uniquely determined by its mean vector and covariance matrix. This fact requires advanced techniques for its proof, so we won't give the proof here. But we state it as a theorem:

Theorem 2. The distribution of a Gaussian random n-vector $\vec{X}$ is uniquely determined by its mean vector $\vec{\mu}$ and covariance matrix $\Gamma$. If $\Gamma$ is invertible, the joint probability density function of $\vec{X}$ is

$$f(x_1, x_2, \dots, x_n) = f(\vec{x}) = \frac{1}{\sqrt{2\pi}^{\,n}\sqrt{\det(\Gamma)}} \exp\Big(-\frac{1}{2}(\vec{x} - \vec{\mu})^T \Gamma^{-1} (\vec{x} - \vec{\mu})\Big)$$

If $\Gamma$ is a diagonal matrix, then the coordinates of $\vec{X}$ are independent, and for each $i$, the $i$th coordinate $X_i$ has the distribution $\mathcal{N}(\mu_i, \Gamma_{ii})$.

If we combine Theorems 1 and 2, we see that

Every Gaussian random vector has independent coordinates if you look at it from the right direction!

Exercise 9. If $\Gamma$ is a diagonal matrix with strictly positive diagonal elements, show that the joint density function given in Theorem 2 is indeed the joint density function of $n$ independent random variables $X_1, X_2, \dots, X_n$ such that for each $i$, $X_i$ has the distribution $\mathcal{N}(\mu_i, \Gamma_{ii})$.
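If you want to evaluate the density of Theorem 2 numerically, a direct transcription looks like this (Python with NumPy; the function name and the one-dimensional sanity check are ours):

```python
import numpy as np

def gaussian_density(x, mu, Gamma):
    """Joint density from Theorem 2; assumes Gamma is invertible."""
    n = len(mu)
    diff = x - mu
    norm = (2.0 * np.pi) ** (n / 2.0) * np.sqrt(np.linalg.det(Gamma))
    quad = diff @ np.linalg.solve(Gamma, diff)   # (x-mu)^T Gamma^{-1} (x-mu)
    return np.exp(-0.5 * quad) / norm

# Sanity check in one dimension: N(0, 4) at its mean should give
# 1 / sqrt(2 pi * 4).
print(gaussian_density(np.array([0.0]), np.array([0.0]), np.array([[4.0]])))
print(1.0 / np.sqrt(2.0 * np.pi * 4.0))
```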

Exercise 10. (i) Which of the following two matrices is a covariance matrix?

$$\begin{pmatrix} 1 & 0 & -1 \\ 0 & 2 & \sqrt{2} \\ -1 & \sqrt{2} & 2 \end{pmatrix} \qquad \begin{pmatrix} 1 & \sqrt{2} & 0 \\ \sqrt{2} & 2 & 1 \\ 0 & 1 & 2 \end{pmatrix}$$

(ii) Using three independent standard normal random variables $Z_1, Z_2, Z_3$, create a random vector $\vec{X}$ whose mean vector is $(-1, 1, 0)^T$ and whose covariance matrix is from (i).


Example 1. Let

$$\Gamma = \begin{pmatrix} \frac{11}{3} & -\frac{8}{3} & \frac{8}{3} \\ -\frac{8}{3} & \frac{11}{3} & -\frac{8}{3} \\ \frac{8}{3} & -\frac{8}{3} & \frac{11}{3} \end{pmatrix}$$

It turns out that the eigenvalues of $\Gamma$ are $\lambda_1 = 9$, $\lambda_2 = \lambda_3 = 1$, with corresponding orthogonal eigenvectors

$$\vec{v}_1 = \begin{pmatrix} \frac{\sqrt{3}}{3} \\ -\frac{\sqrt{3}}{3} \\ \frac{\sqrt{3}}{3} \end{pmatrix} \qquad \vec{v}_2 = \begin{pmatrix} 0 \\ \frac{\sqrt{2}}{2} \\ \frac{\sqrt{2}}{2} \end{pmatrix} \qquad \vec{v}_3 = \begin{pmatrix} -\frac{\sqrt{6}}{3} \\ -\frac{\sqrt{6}}{6} \\ \frac{\sqrt{6}}{6} \end{pmatrix}$$

Let $O$ be the orthogonal matrix whose columns are these eigenvectors, and let $D$ be the diagonal matrix whose diagonal elements are the eigenvalues 9, 1, 1. Then $\sqrt{D}$ is the diagonal matrix whose diagonal elements are 3, 1, 1. You should check that all of these statements are correct, or at least check some of them and know how to check the rest.

Since $\Gamma$ is positive definite, we can think of $\Gamma$ as a covariance matrix, with corresponding standard deviation matrix

$$\Sigma = O\sqrt{D} = \begin{pmatrix} \sqrt{3} & 0 & -\frac{\sqrt{6}}{3} \\ -\sqrt{3} & \frac{\sqrt{2}}{2} & -\frac{\sqrt{6}}{6} \\ \sqrt{3} & \frac{\sqrt{2}}{2} & \frac{\sqrt{6}}{6} \end{pmatrix}$$

You should check that $\Gamma = \Sigma\Sigma^T$.

Now, if $\vec{X}$ is a random 3-vector with covariance matrix $\Gamma$ and mean vector, say,

$$\vec{\mu} = \begin{pmatrix} 1 \\ 0 \\ -2 \end{pmatrix},$$


then $\vec{Y} = O^T(\vec{X} - \vec{\mu})$ is uncorrelated, with covariance matrix $D$ and mean vector $\vec{0}$:

$$\vec{Y} = O^T(\vec{X} - \vec{\mu}) = \begin{pmatrix} \frac{\sqrt{3}}{3} & -\frac{\sqrt{3}}{3} & \frac{\sqrt{3}}{3} \\ 0 & \frac{\sqrt{2}}{2} & \frac{\sqrt{2}}{2} \\ -\frac{\sqrt{6}}{3} & -\frac{\sqrt{6}}{6} & \frac{\sqrt{6}}{6} \end{pmatrix} \begin{pmatrix} X_1 - 1 \\ X_2 \\ X_3 + 2 \end{pmatrix}$$

Doing the matrix multiplication gives

$$\vec{Y} = \begin{pmatrix} \frac{\sqrt{3}}{3}(X_1 - X_2 + X_3 + 1) \\ \frac{\sqrt{2}}{2}(X_2 + X_3 + 2) \\ \frac{\sqrt{6}}{6}(-2X_1 - X_2 + X_3 + 4) \end{pmatrix}$$

You should be able to check directly that this is correct, and also that this random vector is uncorrelated and has mean vector $\vec{0}$. In order to fully standardize the random vector $\vec{X}$, we need to multiply $\vec{Y}$ by $\sqrt{D}^{-1}$ to get

$$\vec{Z} = \sqrt{D}^{-1}\vec{Y} = \begin{pmatrix} \frac{\sqrt{3}}{9}(X_1 - X_2 + X_3 + 1) \\ \frac{\sqrt{2}}{2}(X_2 + X_3 + 2) \\ \frac{\sqrt{6}}{6}(-2X_1 - X_2 + X_3 + 4) \end{pmatrix}$$

This random vector $\vec{Z}$ has mean vector $\vec{0}$ (that's easy to check) and covariance matrix $I_3$. Let's do a little checking of the covariance matrix. According to the rules we learned about covariances of ordinary random variables,

$$\mathrm{Var}(Z_1) = \frac{1}{27}\big[\mathrm{Var}(X_1) + \mathrm{Var}(X_2) + \mathrm{Var}(X_3) - 2\,\mathrm{Cov}(X_1, X_2) + 2\,\mathrm{Cov}(X_1, X_3) - 2\,\mathrm{Cov}(X_2, X_3)\big]\,.$$

We can obtain these variances and covariances from the covariance matrix $\Gamma$, so

$$\mathrm{Var}(Z_1) = \frac{1}{27}\left[\frac{11}{3} + \frac{11}{3} + \frac{11}{3} + \frac{16}{3} + \frac{16}{3} + \frac{16}{3}\right] = 1\,.$$


That’s what we expected. Let’s try one of the covariances:

Cov(Z2, Z3) =

√12

12[−2 Cov(X2, X1)− Var(X2) + Cov(X2, X3)

− 2 Cov(X3, X1) + Cov(X3, X2) + Var(X3)]

Filling in the values of the variances and covariances from Γ, you caneasily check that this equals 0, as expected.

Let’s do one more thing. Suppose we started with a standard Gauss-

ian random 3-vector→Z. How can we use it to create a random Gaussian

3-vector→X with covariance matrix Γ and mean vector

→µ? We simply

multiply→Z by the matrix Σ and then add

→µ:

→X = Σ

→Z +

→µ =

3 0 −√

63

−√

3√

22−√

66

√3

√2

2

√6

6

Z1

Z2

Z3

+

1

0

−2

.

You can carry out the indicated operations to find that, for example, $X_1 = \sqrt{3}Z_1 - \sqrt{6}Z_3/3 + 1$.
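The whole example is easy to verify numerically. A minimal sketch (Python with NumPy): since the eigenvalue 1 is repeated, the computed eigenvectors, and hence $\Sigma$, may differ from the ones above, but $\Sigma\Sigma^T = \Gamma$ must still hold.

```python
import numpy as np

# Gamma from Example 1: diagonal entries 11/3, off-diagonal entries +-8/3.
Gamma = np.eye(3) + (8.0 / 3.0) * np.array([[ 1.0, -1.0,  1.0],
                                            [-1.0,  1.0, -1.0],
                                            [ 1.0, -1.0,  1.0]])

lam, O = np.linalg.eigh(Gamma)
lam, O = lam[::-1], O[:, ::-1]            # decreasing order, as in the notes
print(lam)                                # 9, 1, 1 as claimed

Sigma = O @ np.diag(np.sqrt(lam))
print(np.allclose(Sigma @ Sigma.T, Gamma))   # True: Gamma = Sigma Sigma^T
```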

3. Random Walk

Let $X_1, X_2, \dots$ be independent random variables, all having the same distribution, each having mean $\mu$ and variance $\sigma^2$. For $n = 1, 2, 3, \dots$, define

$$W^{(1)}_n = X_1 + X_2 + \dots + X_n\,.$$

Let $\vec{W}^{(1)}$ be the following random vector (with infinitely many coordinates):

$$\vec{W}^{(1)} = \big(W^{(1)}_1, W^{(1)}_2, W^{(1)}_3, \dots\big)\,.$$

This is the random walk with steps $X_1, X_2, X_3, \dots$. (The reason for the superscript (1) will become clear later.) We often think of the subscript as time, so that $W^{(1)}_n$ is the position of the random walk $\vec{W}^{(1)}$ at time $n$. That is, we think of $\vec{W}^{(1)}$ as a (random) dynamical system in discrete time, and at each time $n$, the system takes the step $X_n$ to move from its previous position $W^{(1)}_{n-1}$ to its current position $W^{(1)}_n$. In those cases where it is desirable to talk about time 0, we define $W^{(1)}_0 = 0$. But we typically don't include this value as one of the coordinates of $\vec{W}^{(1)}$.


It is easy to calculate the mean vector $\vec{\mu}$ and covariance matrix $\Gamma$ of $\vec{W}^{(1)}$:

$$\vec{\mu} = \begin{pmatrix} \mu \\ 2\mu \\ 3\mu \\ \vdots \end{pmatrix} \quad \text{and} \quad \Gamma = \begin{pmatrix} \sigma^2 & \sigma^2 & \sigma^2 & \dots \\ \sigma^2 & 2\sigma^2 & 2\sigma^2 & \dots \\ \sigma^2 & 2\sigma^2 & 3\sigma^2 & \dots \\ \vdots & \vdots & \vdots & \ddots \end{pmatrix}$$

Thus, $\vec{\mu}_n = n\mu$ and $\Gamma_{ij} = \min(i, j)\,\sigma^2$.

Exercise 11. Verify these formulas for the mean vector and covariance matrix of $\vec{W}^{(1)}$.

Exercise 12. The simple symmetric random walk has steps $X_i$ such that $P(X_i = 1) = P(X_i = -1) = 1/2$. Find the mean vector and covariance matrix of this random walk.
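The covariance formula $\Gamma_{ij} = \min(i, j)\,\sigma^2$ is easy to see in a simulation of the simple symmetric random walk (Python with NumPy; the number of replications and the seed are our choices):

```python
import numpy as np

rng = np.random.default_rng(6)

# Simple symmetric random walk: steps are +1 or -1 with probability 1/2,
# so mu = 0, sigma^2 = 1, and Gamma_ij should equal min(i, j).
n, reps = 5, 200_000
steps = rng.choice([-1.0, 1.0], size=(reps, n))
W = steps.cumsum(axis=1)                 # columns are W_1, ..., W_5

print(np.cov(W, rowvar=False))           # empirical covariance matrix
print(np.minimum.outer(np.arange(1, 6), np.arange(1, 6)))   # min(i, j)
```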

In the next section we will want to make the transition from random walks in discrete time to Brownian motion in continuous time. The key to this transition is to chop discrete time up into smaller and smaller pieces. Fix a value $\Delta > 0$. For $n = 1, 2, 3, \dots$ and $t = n\Delta$, define

$$W^{(\Delta)}_t = Y_1 + Y_2 + \dots + Y_n\,,$$

where for $k = 1, 2, 3, \dots$,

$$Y_k = \sqrt{\Delta}(X_k - \mu) + \Delta\mu\,.$$

Note that the random walk $\vec{W}^{(1)}$ defined earlier agrees with the case $\Delta = 1$.

We think of the random variables $Y_k$ as the steps taken by a random walk at times $t$ that are multiples of $\Delta$. Since the random variables $X_k$ are assumed to be independent, and have the same distribution, with mean $\mu$ and variance $\sigma^2$, the random variables $Y_k$ are also independent, and they have the same distribution as each other, but their common mean is $\Delta\mu$ and their common variance is $\Delta\sigma^2$. Thus, if $t = n\Delta$, the random variable $W^{(\Delta)}_t$ has mean $t\mu$ and variance $t\sigma^2$. And if $s = m\Delta$, then $\mathrm{Cov}(W^{(\Delta)}_s, W^{(\Delta)}_t) = \min(s, t)\,\sigma^2$.

Thus, for any $\Delta > 0$, we have created a random walk $\vec{W}^{(\Delta)}$ that takes steps at times that are multiples of $\Delta$. The mean vector $\vec{\mu}$ of $\vec{W}^{(\Delta)}$ has coordinates $\vec{\mu}_t = t\mu$, and the covariance matrix $\Gamma$ has elements $\Gamma_{s,t} = \min(s, t)\,\sigma^2$, for all times $s, t$ that are multiples of $\Delta$. Note that these do not depend on $\Delta$, so we might expect something good to happen as we try to make the transition to continuous time by letting $\Delta \to 0$. In the next section, we will see exactly what it is that does happen.

Exercise 13. The following numbers are the results of 32 rolls of a standard 6-sided die:

1, 5, 3, 4, 6, 2, 1, 1, 6, 1, 2, 4, 3, 5, 4, 5, 5, 1, 5, 2, 5, 2, 3, 1, 1, 1, 5, 6, 1, 2, 2, 1

Let $X_1, X_2, \dots, X_{32}$ be these numbers. For $\Delta = 1, \frac{1}{2}, \frac{1}{4}, \frac{1}{8}$, draw graphs of $\vec{W}^{(\Delta)}$ for appropriate values of $t$ between 0 and 4.
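If you would rather compute than draw by hand, here is a sketch that builds $W^{(\Delta)}$ from the die rolls above (Python with NumPy; printing a few positions rather than plotting is our simplification, and you could feed the cumulative sums to any plotting tool):

```python
import numpy as np

# The 32 die rolls from Exercise 13; a fair die has mean mu = 3.5.
X = np.array([1, 5, 3, 4, 6, 2, 1, 1, 6, 1, 2, 4, 3, 5, 4, 5,
              5, 1, 5, 2, 5, 2, 3, 1, 1, 1, 5, 6, 1, 2, 2, 1], dtype=float)
mu = 3.5

for delta in [1.0, 0.5, 0.25, 0.125]:
    n = int(4 / delta)                        # steps needed to reach t = 4
    Y = np.sqrt(delta) * (X[:n] - mu) + delta * mu
    W = Y.cumsum()                            # W^(delta) at t = delta, 2 delta, ...
    per_unit = int(1 / delta)
    print(f"delta = {delta}: W at t = 1, 2, 3, 4 ->",
          W[np.arange(1, 5) * per_unit - 1])
```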

4. The Central Limit Theorem for Random Walks

We start by considering a special collection of random walks, the Gaussian random walks. These are defined as in the previous section, with the additional assumption that the steps $X_i$ are normally distributed, so that for each $\Delta$, $\vec{W}^{(\Delta)}$ is a Gaussian random vector (make sure you understand why this is true). If each step has mean $\mu$ and variance $\sigma^2$, then $\vec{W}^{(\Delta)}$ is the Gaussian random walk with drift $\mu$, diffusion coefficient $\sigma$ and time step size $\Delta$.

If the drift is 0 and the diffusion coefficient is 1, then the steps of $\vec{W}^{(\Delta)}$ all have mean 0 and variance $\Delta$, and then $\vec{W}^{(\Delta)}$ is called the standard Gaussian random walk with time step size $\Delta$. Note that for a fixed time step size $\Delta$, if $\vec{W}^{(\Delta)}$ is a Gaussian random walk with drift $\mu$ and diffusion coefficient $\sigma$, then the random walk whose position at time $t$ is

$$\frac{1}{\sigma}\big(W^{(\Delta)}_t - \mu t\big)$$

is the standard Gaussian random walk with time step size $\Delta$ (for times $t$ that are multiples of $\Delta$).

On the other hand, if $\vec{W}^{(\Delta)}$ is a standard Gaussian random walk with time step size $\Delta$, then we get a Gaussian random walk with drift $\mu$ and diffusion coefficient $\sigma$ by going the other way. That is, the random walk whose position at time $t$ is

$$\sigma W^{(\Delta)}_t + \mu t$$

is a Gaussian random walk with drift $\mu$ and diffusion coefficient $\sigma$. So just as we can easily convert back and forth between a standard normally distributed random variable and a normally distributed random variable with mean $\mu$ and variance $\sigma^2$, we can also easily convert back and forth between a standard Gaussian random walk and a Gaussian random walk with drift $\mu$ and diffusion coefficient $\sigma$.


What do we know about Gaussian random walks? In the following, the times (such as $s$ and $t$) are all multiples of $\Delta$. If $\vec{W}^{(\Delta)}$ is a Gaussian random walk with drift $\mu$, diffusion coefficient $\sigma$ and time step size $\Delta$, then:

(i) For each $t > s \ge 0$, the increment $W^{(\Delta)}_t - W^{(\Delta)}_s$ has the distribution $\mathcal{N}(\mu(t-s), \sigma^2(t-s))$.
(ii) If $(s_1, t_1), (s_2, t_2), \dots, (s_n, t_n)$ are disjoint open intervals in $(0, \infty)$, then the increments

$$W^{(\Delta)}_{t_1} - W^{(\Delta)}_{s_1},\; W^{(\Delta)}_{t_2} - W^{(\Delta)}_{s_2},\; \dots,\; W^{(\Delta)}_{t_n} - W^{(\Delta)}_{s_n}$$

are independent.

Since the distribution of an increment of a Gaussian random walk depends only on the time difference $t - s$, we say that the increments are stationary. We summarize the two properties together by saying that a Gaussian random walk has stationary, independent increments.

Note that the distributions of the increments do not depend on $\Delta$, except for the assumption that the times involved must be multiples of $\Delta$. It seems reasonable to guess that if we let $\Delta \to 0$, we will get a dynamical system $W = \{W_t,\ t \ge 0\}$ that is defined for all nonnegative times $t$, and this stochastic process will have stationary, independent, normally distributed increments. Furthermore, it seems reasonable to guess that $W_t$, the position at time $t$, depends continuously on $t$. That is, there is a stochastic process $W$ with the following properties:

(i) The position $W_t$ depends continuously on $t$ for $t \ge 0$.
(ii) For each $t > s \ge 0$, the increment $W_t - W_s$ has the distribution $\mathcal{N}(\mu(t-s), \sigma^2(t-s))$.
(iii) If $(s_1, t_1), (s_2, t_2), \dots, (s_n, t_n)$ are disjoint open intervals in $(0, \infty)$, then the increments

$$W_{t_1} - W_{s_1},\; W_{t_2} - W_{s_2},\; \dots,\; W_{t_n} - W_{s_n}$$

are independent.

Einstein first talked about this in a meaningful mathematical way, and Norbert Wiener was the first to establish the existence of $W$ with some rigor. The stochastic process $W$ is called Brownian motion with drift $\mu$ and diffusion coefficient $\sigma$. When $\mu = 0$ and $\sigma = 1$, it is called standard Brownian motion, or the Wiener process.

Here is a summary of what we have so far: if we fix the drift $\mu$ and the diffusion coefficient $\sigma$, then as the time step size $\Delta$ goes to 0, the corresponding sequence of Gaussian random walks converges (in a rigorous mathematical sense that will not be made explicit here) to Brownian motion with drift $\mu$ and diffusion coefficient $\sigma$.


Here is a wonderful fact: the statement in the previous paragraph remains true even if we drop the Gaussian assumption. It doesn't matter what the original distribution of the steps $X_1, X_2, X_3, \dots$ is: when $\Delta$ goes to 0, we get Brownian motion. This is the "Central Limit Theorem" for random walks.

Why is this result true? The first point is that for any random walk, no matter what the time step size $\Delta$ equals, the increments of the random walk $\vec{W}^{(\Delta)}$ for non-overlapping time intervals are independent, and we know that the means, variances, and covariances of increments do not depend on $\Delta$. The second point is that for any fixed $t > s \ge 0$, the ordinary Central Limit Theorem tells us that the increment $W^{(\Delta)}_t - W^{(\Delta)}_s$ is approximately normal when $\Delta$ is small, because this increment is the sum of a lot of independent random variables that all have the same distribution. This means that for small enough $\Delta > 0$, it is hard to distinguish a non-Gaussian random walk from a Gaussian random walk. In the limit as $\Delta \to 0$, non-Gaussian and Gaussian random walks become indistinguishable.

One final comment. For any $\Delta > 0$, the relationship between $\vec{W}^{(1)}$ and $\vec{W}^{(\Delta)}$ is merely a change of coordinates. If $\mu = 0$, then the change of coordinates is particularly simple: speed up time by the factor $1/\Delta$ and scale space by $\sqrt{\Delta}$:

$$W^{(\Delta)}_t = \sqrt{\Delta}\,W^{(1)}_{t/\Delta}\,.$$

This means that if you start with a graph of the original random walk $\vec{W}^{(1)}$ and do the proper rescaling, you can get a graph of $\vec{W}^{(\Delta)}$ for really small $\Delta$, and this graph will look a lot like a graph of Brownian motion! This works whether your random walk is the simple symmetric random walk, with $P(X_1 = 1) = P(X_1 = -1) = 1/2$, or a standard Gaussian random walk.
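Here is a quick numerical illustration of this rescaling (Python with NumPy; the parameters and seed are ours). The position of the simple symmetric walk after $n$ steps is $2 \cdot \mathrm{Binomial}(n, 1/2) - n$, which lets us sample $W^{(\Delta)}$ at time 1 cheaply and confirm that it is approximately $\mathcal{N}(0, 1)$:

```python
import numpy as np

rng = np.random.default_rng(7)

# W^(delta)_t = sqrt(delta) * W^(1)_{t/delta}: look at time t = 1 with a
# small time step delta.
delta = 1e-3
n = int(1.0 / delta)                     # t/delta steps of the original walk

# Position of the simple symmetric walk after n steps: 2*Binomial(n,1/2) - n.
positions = 2.0 * rng.binomial(n, 0.5, size=200_000) - n
W1 = np.sqrt(delta) * positions          # samples of W^(delta) at time 1

print(W1.mean(), W1.var())               # close to 0 and 1, matching N(0, 1)
```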

5. Geometric Brownian Motion

Let’s recall a few facts about the binomial tree model. We assumethere is a risk-free interest rate r, and we will let ∆ > 0 be the lengthof a time step. When the stock price goes up, it is multiplied by afactor u, and when it goes down, it is multiplied by a factor d. Thereis a “probability” p of going up at each time step that comes from theNo Arbitrage Theorem, given by the formula:

p =er∆ − du− d

.


We will assume that

$$u = e^{\sigma\sqrt{\Delta}} \quad \text{and} \quad d = 1/u = e^{-\sigma\sqrt{\Delta}}\,.$$

The idea is that at each time step, the stock price is multiplied by a random quantity $e^{R\sqrt{\Delta}}$, which equals $u$ with probability $p$ and $d$ with probability $1 - p$. Therefore, $R = \sigma$ with probability $p$ and $-\sigma$ with probability $1 - p$. This makes $R$ like the step of a random walk. In this setting, $\sigma$ is called the volatility.

Let’s calculate the approximate mean and variance of R. In order todo this, we will repeatedly use the approximation ex ≈ 1 +x when x issmall. Then we get er∆ ≈ 1 + r∆, u ≈ 1 + σ

√∆ and d ≈ 1− σ

√∆, so

p ≈ r∆ + σ√

2σ√

∆=

1

2+r√

2σ.

We know that E(R) = σ(2p − 1), so we get E(R) ≈ r√

∆. We alsoknow that

Var(R) = 4σ2p(1− p) ≈ σ2 .

Now we look at the geometric random walk, which we will call $\vec{S}^{(\Delta)}$:

$$S^{(\Delta)}_t = S_0 \exp\big(\sqrt{\Delta}(R_1 + R_2 + \dots + R_{t/\Delta})\big)\,,$$

where $R_1, R_2, \dots$ are the steps of the underlying random walk. When the time step size $\Delta$ is small, the expression in the exponent is approximately a Brownian motion $W$. Note that

$$E(W_t) = \sqrt{\Delta}\,(t/\Delta)\,E(R) \approx rt\,,$$

and

$$\mathrm{Var}(W_t) = \Delta\,(t/\Delta)\,\mathrm{Var}(R) \approx \sigma^2 t\,.$$

This is exactly what we expect: using the probabilities given by the No Arbitrage Theorem, the approximate average rate of growth should equal the risk-free rate, and we have arranged things so that $\sigma^2$ is the approximate variance of the growth rate.

It turns out that we have been a little sloppy with our approximations. Brownian motion is wild enough that the approximation $e^x \approx 1 + x$ isn't good enough. Instead, we should use $e^x \approx 1 + x + x^2/2$ in some places. When this is done, the diffusion coefficient of the underlying Brownian motion $W_t$ remains equal to $\sigma$, but the drift is changed. The key change occurs in the approximation for $p$:

$$p \approx \frac{r\Delta + \sigma\sqrt{\Delta} - \frac{1}{2}\sigma^2\Delta}{2\sigma\sqrt{\Delta}} = \frac{1}{2} + \frac{r\sqrt{\Delta}}{2\sigma} - \frac{\sigma\sqrt{\Delta}}{4}\,.$$

Once this change is made, we get $E(R) \approx (r - \sigma^2/2)\sqrt{\Delta}$.
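A short numerical check of these approximations (Python with NumPy; the values of $r$ and $\sigma$ are arbitrary choices of ours) compares the exact $p$ from the No Arbitrage Theorem with the second-order approximation, and the exact $E(R) = \sigma(2p - 1)$ with $(r - \sigma^2/2)\sqrt{\Delta}$:

```python
import numpy as np

r, sigma = 0.05, 0.2   # arbitrary rate and volatility

for delta in [0.1, 0.01, 0.001]:
    u = np.exp(sigma * np.sqrt(delta))
    d = 1.0 / u
    p_exact = (np.exp(r * delta) - d) / (u - d)
    p_approx = 0.5 + r * np.sqrt(delta) / (2 * sigma) - sigma * np.sqrt(delta) / 4

    ER_exact = sigma * (2 * p_exact - 1)
    ER_approx = (r - sigma**2 / 2) * np.sqrt(delta)

    print(delta, p_exact, p_approx, ER_exact, ER_approx)
```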


6. Calculations with Brownian motion

Throughout this section, $W_t$ will be the position at time $t$ of a standard Brownian motion, $X_t$ will be the position at time $t$ of a Brownian motion with drift $\mu$ and diffusion coefficient $\sigma$, and $S_t$ will be the position at time $t$ of a geometric Brownian motion with risk-free interest rate $r$ and volatility $\sigma$. We will assume that $W_0$ and $X_0$ are always equal to 0, and that the initial stock price $S_0$ is a positive quantity.

In order to do calculations with a general Brownian motion $X_t$, it is often best to turn it into a standard Brownian motion:

$$W_t = \frac{X_t - \mu t}{\sigma}\,.$$

Similarly, it can be useful to turn a geometric Brownian motion $S_t$ into a standard Brownian motion. We could do this in two steps. First take the logarithm of $S_t$ and subtract the logarithm of the initial stock price to get a Brownian motion $X_t$ with drift $\mu = r - \sigma^2/2$ and diffusion coefficient $\sigma$:

$$X_t = \log(S_t/S_0) = \log(S_t) - \log(S_0)\,.$$

Then we get the standard Brownian motion by subtracting the drift and dividing by the diffusion coefficient:

$$W_t = \frac{X_t - rt + \frac{\sigma^2 t}{2}}{\sigma}\,.$$

The upshot is that for many purposes, if you can do probability calculations with standard Brownian motion, then you can do probability calculations for all Brownian motions, including geometric Brownian motion. The main exception to this is calculations that involve random times in some way. But we will avoid such calculations in this course. You will learn how to do those next year.

For the rest of this section, we will focus on doing some probability calculations with standard Brownian motion $W_t$. The simplest calculation has to do with the position of the Brownian motion at a specific time. For example, calculate $P(W_4 \ge 3)$. To do this, we use the fact that $W_4$ has the distribution $\mathcal{N}(0, 4)$. That means that $Z = W_4/2$ has the standard normal distribution, so

$$P(W_4 \ge 3) = P(Z \ge 3/2) = 1 - \Phi(1.5)\,,$$

where $\Phi$ is the standard normal cumulative distribution function.
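Numerically, $\Phi$ can be computed from the error function in the Python standard library (the helper name is ours):

```python
from math import erf, sqrt

def Phi(x):
    """Standard normal cumulative distribution function, via erf."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

# W_4 has distribution N(0, 4), so P(W_4 >= 3) = P(Z >= 3/2) = 1 - Phi(1.5).
print(1.0 - Phi(1.5))   # approximately 0.0668
```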

The next simplest calculation has to do with non-overlapping increments. For such calculations, we use the fact that Brownian motion has stationary independent increments. For example,

$$P(|W_1| \le 1 \text{ and } W_3 > W_2 \text{ and } W_5 < W_3 + 1)$$
$$= P(-1 \le W_1 \le 1)\,P(W_3 - W_2 > 0)\,P(W_5 - W_3 < 1)$$
$$= P(-1 \le Z \le 1)\,P(Z > 0)\,P\big(Z < \sqrt{2}/2\big)$$
$$= \big(2\Phi(1) - 1\big)\left(\frac{1}{2}\right)\Phi\left(\frac{\sqrt{2}}{2}\right).$$

The hardest calculations are for probabilities involving the positions of Brownian motion at several different times. For these calculations, we use the joint distribution function. To keep things from getting too messy, we will restrict our attention to events involving only two different times. For example, suppose we want to calculate

$$P(W_2 < 1 \text{ and } W_5 > -2)\,.$$

To do this, we use the fact that the random 2-vector $(W_2, W_5)^T$ is a Gaussian random 2-vector with mean vector $\vec{0}$ and covariance matrix $\begin{pmatrix} 2 & 2 \\ 2 & 5 \end{pmatrix}$. By Theorem 2, the joint density function of this random 2-vector is

$$f(x, y) = \frac{1}{2\pi\sqrt{6}} \exp\left(-\frac{1}{2}\begin{pmatrix} x & y \end{pmatrix}\begin{pmatrix} 2 & 2 \\ 2 & 5 \end{pmatrix}^{-1}\begin{pmatrix} x \\ y \end{pmatrix}\right)$$

Since

$$\begin{pmatrix} 2 & 2 \\ 2 & 5 \end{pmatrix}^{-1} = \begin{pmatrix} \frac{5}{6} & -\frac{1}{3} \\ -\frac{1}{3} & \frac{1}{3} \end{pmatrix},$$

the quadratic expression in the exponential function is

$$-\frac{5}{12}x^2 + \frac{1}{3}xy - \frac{1}{6}y^2\,.$$

Therefore

$$P(W_2 < 1 \text{ and } W_5 > -2) = \frac{1}{2\pi\sqrt{6}} \int_{-2}^{\infty} \int_{-\infty}^{1} \exp\left(-\frac{5}{12}x^2 + \frac{1}{3}xy - \frac{1}{6}y^2\right) dx\, dy\,.$$

It is best to use computer software to calculate the final numerical answer. Another approach would be to make a rotation that would convert the Gaussian 2-vector into a vector with independent coordinates. This simplifies the joint density function but complicates the region of integration.
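Monte Carlo is one easy way to get the number (Python with NumPy; the sample size and seed are ours): simulate the Gaussian 2-vector $(W_2, W_5)$ directly from its mean vector and covariance matrix and count how often the event occurs.

```python
import numpy as np

rng = np.random.default_rng(8)

# (W_2, W_5) is Gaussian with mean vector 0 and covariance [[2, 2], [2, 5]].
Gamma = np.array([[2.0, 2.0],
                  [2.0, 5.0]])
W = rng.multivariate_normal([0.0, 0.0], Gamma, size=2_000_000)

estimate = np.mean((W[:, 0] < 1.0) & (W[:, 1] > -2.0))
print(estimate)   # Monte Carlo estimate of P(W_2 < 1 and W_5 > -2)
```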


7. The fundamental stochastic differential equation

Recall that the basic differential equation for the growth of a bank account with interest rate $r$ (compounded continuously) is

$$\frac{dS}{dt} = rS\,.$$

Let's write this equation in a different way:

$$dS_t = rS_t\, dt\,.$$

This emphasizes the way in which $S$ depends on $t$, and also makes us think about the approximation to this differential equation in terms of increments:

$$\Delta S_t = S_{t+\Delta t} - S_t \approx rS_t\,\Delta t\,.$$

There is no randomness in this equation. It describes a "risk-free" situation. Its solution is $S_t = S_0 \exp(rt)$.

Here is a way to introduce some randomness:

(10) $\Delta S_t \approx rS_t\,\Delta t + \sigma S_t(W_{t+\Delta t} - W_t) = rS_t\,\Delta t + \sigma S_t\,\Delta W_t\,,$

where $W_t$ represents the position of a standard Brownian motion at time $t$. The idea here is that the price of the stock $S_t$ is affected in two ways: first, there is the growth that comes from the risk-free interest rate $r$, and then there is a random fluctuation which, like the growth due to interest, is expressed as a factor $\sigma\,\Delta W_t$ multiplied by the current price $S_t$. This factor can be positive or negative. Roughly speaking, one thinks of it as an accumulation of many tiny random factors, so that it should be (approximately) normally distributed. For reasons that go back to the No Arbitrage Theorem, these fluctuations should have mean 0. But the size of the fluctuations varies with the stock, and that is the reason for the "volatility" constant $\sigma$. You'll learn more about why this is a reasonable model next year.

As we let $\Delta t$ go to 0, we get the stochastic differential equation

(11) $dS_t = rS_t\, dt + \sigma S_t\, dW_t\,.$

This equation does not make sense in the ordinary way, because of the term $dW_t$. The paths of Brownian motion are so wild that a naive approach to understanding $dW_t$ doesn't work. It took several geniuses (including Norbert Wiener and Kiyosi Ito) to find a way to make sense of this.

We won’t go into detail here, but we will see a little bit of what goeson by checking that geometric Brownian motion with growth rate r andvolatility σ is a solution to this stochastic differential equation. Thus,let Xt be a Brownian motion with drift µ = r − σ2/2 and diffusion

Page 21: random vector random -vectorgrayx004/pdf/FM5002... · BROWNIAN MOTION 1. Expectations and Covariances of Random Vectors A random vector, or more precisely, a random n-vector is a

BROWNIAN MOTION 21

coefficient σ, and let St = S0 exp(Xt), where S0 is the initial price ofthe stock. To justify our claim that this choice for St gives a solutionto (11), we check to see if it seems to work in the approximation (10).

(12) ∆St = St+∆t − St = S0 exp(Xt + ∆Xt)− exp(Xt)

= St(exp(∆Xt)− 1) ≈ St

(∆Xt +

1

2(∆Xt)

2

).

In this expression, we used the 2nd degree Taylor expansion of the exponential function to get our approximation of $\Delta S_t$ on the right side. Remember that $X_t = \mu t + \sigma W_t$, where $W_t$ is a standard Brownian motion, so

$$\Delta X_t = \Big(r - \frac{\sigma^2}{2}\Big)\Delta t + \sigma\,\Delta W_t = r\,\Delta t + \sigma\,\Delta W_t - \frac{\sigma^2}{2}\Delta t\,.$$

We also have

$$(\Delta X_t)^2 = \sigma^2(\Delta W_t)^2 + \big(\text{terms involving } (\Delta t)^2 \text{ and } \Delta t\,\Delta W_t\big)\,.$$

We saw last semester that the terms involving $(\Delta t)^2$ can be ignored in such approximations. It turns out that the terms involving $\Delta t\,\Delta W_t$ can also be ignored: they are obviously $o(\Delta t)$. But Brownian motion is wild enough that $(\Delta W_t)^2$ cannot be ignored. Remember:

$$E\big((\Delta W_t)^2\big) = E\big((W_{t+\Delta t} - W_t)^2\big) = \mathrm{Var}(W_{\Delta t}) = \Delta t\,,$$

so $(\Delta W_t)^2$ is comparable to $\Delta t$. Putting everything into (12) gives

$$\Delta S_t \approx rS_t\,\Delta t + \sigma S_t\,\Delta W_t + \frac{\sigma^2}{2}S_t\big((\Delta W_t)^2 - \Delta t\big)\,.$$

It turns out that the difference $(\Delta W_t)^2 - \Delta t$ can also be ignored: it is a random quantity that is typically small compared to $\Delta t$. In other words, the random quantity $(\Delta W_t)^2 - \Delta t$ is typically $o(\Delta t)$. After removing that term, we see that our choice of $S_t$ satisfies (10), as desired.
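The approximation (10) is exactly the Euler scheme for simulating (11). As a sanity check, the following sketch (Python with NumPy; all parameter values and the seed are ours) simulates many Euler paths and compares the average of $S_T$ with $S_0 e^{rT}$, the mean of the exact geometric Brownian motion:

```python
import numpy as np

rng = np.random.default_rng(9)

r, sigma, S0, T = 0.05, 0.2, 100.0, 1.0
n_steps, paths = 250, 50_000
dt = T / n_steps

# Euler scheme for (11): Delta S = r S Delta t + sigma S Delta W,
# run for many independent paths at once.
S = np.full(paths, S0)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), size=paths)
    S = S + r * S * dt + sigma * S * dW

# Exact geometric Brownian motion: S_T = S0 exp((r - sigma^2/2) T + sigma W_T),
# whose mean is S0 * exp(rT).
print(S.mean(), S0 * np.exp(r * T))
```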

You can see that there is a lot of fancy mathematics hiding behind all of this. Welcome to the world of stochastic differential equations!