
Brownian Motion

Peter Rudzis

June 7th, 2017

Contents

1 Introduction

2 Preliminaries
  2.1 Probability Theory: Basic Definitions
  2.2 The Probabilistic Way of Thought
  2.3 Independence
  2.4 Characteristic functions
  2.5 Stochastic Processes

3 Gaussian Distribution

4 Wiener Process

5 Construction of the Wiener process
  5.1 Haar functions
  5.2 The Main Construction

6 Markov Property
  6.1 Conditional Expectation
  6.2 Stopping Times
  6.3 The Wiener Process as a random variable taking values in C([0,∞), Rn)
  6.4 The Weak Markov Property
  6.5 The Strong Markov Property

7 Appendices

1 Introduction

Attempts by persons such as Robert Brown to understand the motion of pollen grains suspended in a fluid (1828) and Bachelier to understand the stock market (1900) provided impetus for the development of a mathematical theory of Brownian motion during the twentieth century [3, p. 301]. Today, the theory of Brownian motion has assumed a central role in modern probability theory, and it has deep connections in other fields of mathematics as well. Brownian motion is not only an applied tool, but also a concept of enormous theoretical importance.

The purpose of this paper is to give a mathematical exposition on Brownian motion, emphasizing the concept's theoretical underpinnings and basic properties. We spend §2 reviewing the concepts from probability theory needed for the definition and construction of Brownian motion. Since Brownian motion is a Gaussian process, it is useful to develop some general facts about Gaussian distribution in §3. In §4 we define the Wiener process, which is the mathematical model for Brownian motion, and prove some of its elementary properties. (Note that while the term "Wiener process" refers to a particular model for the concept of Brownian motion, following tradition, we will sometimes be sloppy and refer to both by the same name. When we refer to Brownian motion, we always have the Wiener process in mind.) A construction for the Wiener process is provided in §5. Finally, §6 introduces conditional expectation and stopping times, and proves two remarkable homogeneity conditions satisfied by Brownian motion, known as the weak and strong Markov properties.

This paper does not cover the many applications of Brownian motion in mathematics and various scientific fields. However, in writing this paper, we had applications to problems in differential equations in mind. In particular, the Markov property of Brownian motion can be utilized to obtain solutions to the Dirichlet problem on a very general class of domains (see Bass [1, Chapter 2]). As such, this paper would provide most of the background needed for this direction of study. Certainly other applications, found in for example [1] or [5], would become accessible as well.

The most essential prerequisite for understanding this paper is a knowledge of measure theory. Familiarity with (measure-theoretic) probability and the basics of Fourier analysis is also important. While the results which we need from probability theory are presented in §2, this section is intended as a review rather than a thorough introduction. We also rely on results from Fourier analysis for several of our proofs, and these results are presented in the Appendices.

This paper draws upon a number of different sources. Most of the basic material on probability theory can be found in Durrett [3], and material on measure theory can be found in Folland [4] and Rudin [7]. The construction we give for the Wiener process is presented by Bass [1] and Rogers and Williams [6] but is originally due to Ciesielski [2]. Our proofs of the Markov properties more or less follow those of Bass [1], with some modifications.

Acknowledgements

I would like to express my sincere thanks to Jim Morrow, my thesis supervisor, from whom I first learned about the connection between Brownian motion and the Dirichlet problem. This paper is in large part a product of his excellent advice, insight, and encouragement. I also would like to thank my parents and siblings, my professors at the University of Washington, and everyone who has contributed to my education thus far.


2 Preliminaries

2.1 Probability Theory: Basic Definitions

Let Ω be a set, F a σ-algebra of subsets of Ω, and P a measure on F. If P(Ω) = 1, we call the measure space (Ω, F, P) a probability space. Measurable functions on Ω are called random variables, and are most commonly denoted by the letters X, Y, or Z. The codomain of a random variable X will usually be R or Rn; however, it could be any measurable space. Later in this paper, for example, we will study random variables taking values in C([0,∞), Rn), the space of continuous functions from [0,∞) into Rn (equipped with a certain σ-algebra to be defined later). If (M, M) is a measurable space, every random variable X : Ω → M has an associated σ-algebra σ(X) ⊂ F, defined by σ(X) = {X^{−1}A : A ∈ M}. It is called the σ-algebra generated by X.

Probabilistic terminology differs from the standard terminology in analysis in a number of cases. For example, if A ∈ F, the function 1_A, defined by 1_A(x) = 1 if x ∈ A and 1_A(x) = 0 if x ∉ A, is called the indicator function for A. In other fields of analysis, this function would be called the characteristic function for A and would be denoted by χ_A. A certain property T(ω), depending on the elements ω ∈ Ω, is said to hold almost surely (abbreviated a.s.) if T holds true except on a P-null set, in contrast to the usual term almost everywhere used in other fields of analysis. For this paper, we will use the conventional probabilistic terms and notation when working with a probability space, and use the usual analytical terms and notation when working with another measure space (for example Rn with Lebesgue measure). This should not cause any serious confusion.

If a random variable X takes values in Rn, the integral of X with respect to P is called the expected value of X and is denoted by E[X]. Integration over a set A ∈ F is denoted E_A[X]. However, if we happen to be working with multiple probability measures, to avoid ambiguity we may instead use the usual integration notation ∫ X dP. Of course, the definition of expected value only applies if X ≥ 0 or E|X| < ∞. If E|X| < ∞, we write X ∈ L¹(P). More generally, if E|X|^p < ∞ for some 0 < p < ∞, we write X ∈ L^p(P).

Among the primary objects of interest in probability theory are the distributions of random variables. (This term should not be confused with a related but distinct concept of the same name, occurring in other fields of analysis and physics.) Given a random variable X, taking values in a measurable space (M, M) (e.g. M = Rn and M = Bn, the Borel σ-algebra), the law, or distribution, of X is the probability measure P_X on M defined by

PX(B) = P (X ∈ B) for all B ∈M. (1)

If (M, M) = (Rn, Bn), sometimes the distribution of X is given by a density function, that is, a nonnegative function ρ on Rn such that P_X(B) = ∫_B ρ dm, where m denotes Lebesgue measure. If X and Y are random variables, the notation X =d Y means that P_X = P_Y.

If X_1, ..., X_k are random variables taking values in measurable spaces (M_1, M_1), ..., (M_k, M_k) respectively, we can regard the vector [X_1, ..., X_k]^T as a random variable in the product space ∏_{j=1}^k M_j, equipped with the product σ-algebra ⊗_{j=1}^k M_j (see Folland [4, §1.2]). The joint distribution of X_1, ..., X_k is then defined to be the distribution of [X_1, ..., X_k]^T in ∏_{j=1}^k M_j. For example, if all of the X_j's take values in R with the Borel σ-algebra B, then the joint distribution of the X_j's would be the distribution of [X_1, ..., X_k]^T in Rk with the Borel σ-algebra Bk.

Remark 2.1. Because the measure P is finite, a probability space is often easier to work with than more general measure spaces. For example, we have the especially simple situation that L^q(P) ⊂ L^p(P) whenever q ≥ p. To prove this, suppose X ∈ L^q(P), let A = {ω ∈ Ω : |X| ≤ 1}, and let B = {ω ∈ Ω : |X| > 1}. Then E_A|X|^p ≤ P(A) < ∞, while |X|^p ≤ |X|^q on B, so E_B|X|^p ≤ E_B|X|^q < ∞. Therefore, E|X|^p = E_A|X|^p + E_B|X|^p < ∞.

Remark 2.2. In addition, verifying the hypotheses of the dominated convergence theorem is often easier in the probabilistic setting. Specifically, suppose that {X_k} is a sequence of random variables such that X_k → X a.s., and suppose we want to show that lim_{k→∞} E[X_k] = E[X]. The dominated convergence theorem always applies when the sequence is bounded, say |X_k| ≤ b for all k, because we may take the constant random variable b as our dominating function. While in other settings in analysis using the dominated convergence theorem may require a careful argument, proofs in this paper will often just use the phrase "by dominated convergence" with no further explanation when it is obvious that we are working with a bounded sequence of random variables.

Before moving on, let us prove the following useful fact.

Proposition 2.3 (Borel-Cantelli Lemma). Let (Ω, F, P) be a probability space, and suppose {A_n}_{n=1}^∞ is a sequence of sets in F such that ∑ P(A_n) < ∞. Then P(∩_{n=1}^∞ ∪_{j=n}^∞ A_j) = 0.

Proof. Observe that P(∩_{j=1}^∞ ∪_{n=j}^∞ A_n) = lim_{j→∞} P(∪_{n=j}^∞ A_n). If ∑ P(A_n) < ∞, then

    lim_{j→∞} P(∪_{n=j}^∞ A_n) ≤ lim_{j→∞} ∑_{n=j}^∞ P(A_n) = 0,    (2)

since the tail of a convergent series tends to 0. Hence, the result follows.
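As a purely numerical illustration of the lemma (nothing in the sequel depends on it), one can simulate independent events A_n with P(A_n) = 1/n², so that ∑ P(A_n) < ∞, and count how many of them occur per sample point. The short Python sketch below assumes numpy is available; the event probabilities are chosen only for illustration.

    # Monte Carlo sketch of the Borel-Cantelli lemma: events A_n with
    # P(A_n) = 1/n^2, so that sum_n P(A_n) < infinity.  The lemma predicts
    # that, with probability 1, only finitely many A_n occur; empirically,
    # the number of occurrences per sample stays small.
    import numpy as np

    rng = np.random.default_rng(0)
    num_samples = 10_000          # sample points omega
    num_events = 1_000            # events A_1, ..., A_N with P(A_n) = 1/n^2

    n = np.arange(1, num_events + 1)
    # occurred[k, m] is True when the k-th sample point lies in A_{m+1}
    occurred = rng.random((num_samples, num_events)) < 1.0 / n**2
    counts = occurred.sum(axis=1)  # how many of the A_n each sample falls into

    print("average number of events that occur:", counts.mean())
    print("largest number observed over all samples:", counts.max())
    print("fraction of samples in at least 5 events:", np.mean(counts >= 5))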

2.2 The Probabilistic Way of Thought

A notable characteristic of probability theory is that, in contrast to other fields of analysis, the spaces on which functions (i.e. random variables) are defined play a somewhat peripheral role. If we want to understand some "space" K (e.g. a subset of Rn, a graph, a set of continuous functions, a collection of real world data, etc.), the probabilistic approach to the problem is to study random variables X : Ω → K, taking advantage of their formal properties. We draw conclusions about K itself by examining the joint and individual distributions of the random variables, which weight objects in K in various ways.

Within this (somewhat over-simplified) paradigm, essentially any probability space will do, provided that the random variables have the same formal properties, the same distributions, and the same joint distributions. This idea is made precise by the notion of an extension of a probability space. Given a probability space (Ω, F, P), a probability space (Ω′, F′, P′) is said to extend the original probability space if there exists a measurable mapping ϕ : Ω′ → Ω such that the mapping ϕ^{−1} : F → F′ preserves probabilities, i.e. P(A) = P′(ϕ^{−1}(A)) for all A ∈ F. If (M, M) is a measurable space, let us temporarily introduce the notation V_Ω(M, M) to denote the set of all random variables from Ω into (M, M). The next proposition says that probability space extensions are characterized by the fact that they preserve distributions. Hence, from an abstract point of view, probability theory can be regarded as the study of properties of probability spaces which are preserved under extensions.

Proposition 2.4. A probability space (Ω′, F′, P′) extends a probability space (Ω, F, P) if and only if, for each measurable space (M, M), there exists a map Φ : V_Ω(M, M) → V_{Ω′}(M, M) such that, for all X ∈ V_Ω(M, M), X =d Φ(X).

Remark 2.5. Of course, the map Φ also preserves joint distributions, since the joint distribution of a collection of random variables is the distribution of a particular random variable.

Proof. If (Ω′, F′, P′) extends (Ω, F, P) via the mapping ϕ : Ω′ → Ω, and (M, M) is a measurable space, define Φ : V_Ω(M, M) → V_{Ω′}(M, M) by Φ(X) = X ∘ ϕ. Then for any A ∈ M, P(X ∈ A) = P(X^{−1}A) = P′(ϕ^{−1}X^{−1}A) = P′(X ∘ ϕ ∈ A). Hence X =d Φ(X).

Conversely, suppose such a map Φ exists for every measurable space (M, M). Take (M, M) = (Ω, F), let I be the identity random variable on (M, M), and let ϕ = Φ(I). Then ϕ is a measurable map from Ω′ into Ω. Moreover, if A ∈ F, then P(A) = P(I ∈ A) = P′(Φ(I) ∈ A) = P′(ϕ^{−1}(A)). Hence ϕ^{−1} preserves probabilities, and so (Ω′, F′, P′) extends (Ω, F, P) via the mapping ϕ.

2.3 Independence

One concept more or less unique to probability theory is that of independence. This concept comes in a number of different guises, which we enumerate below.

• If (Ω, F, P) is a probability space, two sets F_1, F_2 ∈ F are said to be independent if P(F_1 ∩ F_2) = P(F_1)P(F_2).

• Two sub-σ-algebras F_1, F_2 ⊂ F are said to be independent if, for every F_1 ∈ F_1 and every F_2 ∈ F_2, F_1 and F_2 are independent.

• Two random variables X_1 and X_2 are said to be independent if the σ-algebras which they generate are independent.

• A finite collection of sets F_1, F_2, ..., F_n in F is said to be independent if, whenever I ⊂ {1, 2, ..., n}, P(∩_{i∈I} F_i) = ∏_{i∈I} P(F_i).

• Independence of a finite collection of σ-algebras F_1, F_2, ..., F_n and independence of a finite collection of random variables X_1, X_2, ..., X_n are defined in obvious analogy to the previous cases.

• Finally, an arbitrary collection {F_α}_{α∈A} of sets in F is said to be independent if every finite subcollection is independent, and similarly for arbitrary collections of σ-algebras and of random variables.


Independent random variables have excellent formal properties. For example, expectation distributes over products of independent random variables.

Proposition 2.6. Suppose X_1, ..., X_n is a collection of independent random variables in L¹(P). Then

    E[X_1 X_2 ··· X_n] = E[X_1]E[X_2] ··· E[X_n].    (3)

Proof. First, suppose that X_j = 1_{A_j}, 1 ≤ j ≤ n, where each A_j ∈ F. Note that each A_j ∈ σ(X_j) and X_1 X_2 ··· X_n = 1_{A_1 ∩ ··· ∩ A_n}. By independence,

    E[X_1 ··· X_n] = P(A_1 ∩ ··· ∩ A_n) = P(A_1) ··· P(A_n) = E[X_1] ··· E[X_n].    (4)

Since the expected value operator is linear, it then follows that (3) holds whenever X_1, ..., X_n are simple functions. Now take X_1, ..., X_n to be arbitrary elements of L¹(P). Since the simple functions are dense in L¹(P), for each 1 ≤ j ≤ n there exists a sequence of simple functions {X_{j,k}}_{k=1}^∞, each measurable with respect to σ(X_j) (so that independence is preserved), such that X_{j,k} → X_j a.s. In fact, we may assume that |X_{j,1}| ≤ |X_{j,2}| ≤ ··· ≤ |X_j| (see Folland [4, Theorem 2.10]). The equality (3) then follows by dominated convergence, where |X_j| is our dominating function.
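For illustration only, the identity (3) is easy to check by Monte Carlo. The Python sketch below (assuming numpy, with distributions chosen arbitrarily) compares the two sides of (3) for three independent random variables.

    # Monte Carlo check of Proposition 2.6 for three independent variables.
    import numpy as np

    rng = np.random.default_rng(1)
    N = 2_000_000
    X1 = rng.uniform(0.0, 2.0, N)       # E[X1] = 1
    X2 = rng.exponential(3.0, N)        # E[X2] = 3
    X3 = rng.binomial(1, 0.25, N)       # E[X3] = 0.25

    lhs = np.mean(X1 * X2 * X3)
    rhs = np.mean(X1) * np.mean(X2) * np.mean(X3)
    print(f"E[X1 X2 X3] ~ {lhs:.4f},  E[X1]E[X2]E[X3] ~ {rhs:.4f}")  # both near 0.75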

One reason why the concept of extensions for probability spaces is powerful is that it allows us to easily obtain large collections of independent random variables. To show how this may be accomplished, first we need to describe how the idea of product measure is extended to arbitrary products of measure spaces. We will not present the full construction here. For a detailed argument, see Williams [9]. To sketch the key ideas of the construction, let {(Ω_α, F_α, P_α) : α ∈ A} be an arbitrary collection of probability spaces, and let Ω = ∏_{α∈A} Ω_α. For each α ∈ A, let π_α : Ω → Ω_α be the projection map, and let F be the σ-algebra generated by the projection maps π_α (i.e. F is the smallest σ-algebra such that each π_α is measurable). Let A_0 be the collection of all finite intersections ∩_{j=1}^m π_{α_j}^{−1}(B_j), where α_1, ..., α_m ∈ A and each B_j ∈ F_{α_j}. Let A be the set of all finite disjoint unions of sets in A_0. Observe that the σ-algebra generated by the sets in A is precisely F. Moreover, one can prove that A is an algebra, i.e. is closed under finite intersections and the taking of complements. Now each element R ∈ A takes the form

    R = ∪_{i=1}^{n} ∩_{j=1}^{m_i} π_{α_{i,j}}^{−1}(S_{i,j}),    (5)

where for each i, α_{i,1}, ..., α_{i,m_i} are distinct elements of A, and each S_{i,j} ∈ F_{α_{i,j}}. Here we are thinking of R as a union of n disjoint rectangles, with sides S_{i,j}. We define the set function P_0 on A by

    P_0(R) = ∑_{i=1}^{n} ∏_{j=1}^{m_i} P_{α_{i,j}}(S_{i,j}),    (6)

which is just the sum of the "areas" of the rectangles. Note that in particular P_0(Ω) = 1. Of course, the representation (5) of elements of A is in general not unique, so one has to check that P_0 is well-defined. Having established this, one then proves

Lemma 2.7. P0 is a premeasure on the algebra A.


We remark that even though, as a product space, Ω may have infinitely many factors, the definition of P_0 involves only finite products, and consequently the proof of the lemma above follows essentially verbatim the typical argument given in the construction of product measures on finite product spaces (see e.g. Folland [4, §2.5]). Using Caratheodory's extension theorem (see [4, Theorems 1.11 and 1.14]), one can then prove

Proposition 2.8. There exists a unique probability measure P on F such that P |A = P0.

In particular, the measure P is the outer measure on F induced by P_0. In this construction, the σ-algebra F generated by the projection maps π_α is called the product σ-algebra on Ω, and the measure P is called the product measure on Ω. The concept of product measure allows us to prove the following result, which will play an essential role in the construction of Brownian motion.

Proposition 2.9. Suppose {X_α}_{α∈A} is any (finite or infinite) collection of random variables on the probability space (Ω, F, P). Then there exists an extension (Ω′, F′, P′) and an independent collection of random variables {Y_α}_{α∈A} on Ω′ such that Y_α =d X_α for all α ∈ A.

Proof. Let Ω′ = ∏_{α∈A} Ω and let F′ be the corresponding product σ-algebra. For each α ∈ A, let π_α : Ω′ → Ω be the projection onto the α'th factor of Ω′. As above, we define P′ to be the corresponding product measure on F′. Hence, for each α ∈ A and F ∈ F, P′(π_α^{−1}F) = P(F), and if F_1, ..., F_n ∈ F and α_1, ..., α_n are distinct elements of A, then

    P′(∩_{j=1}^n π_{α_j}^{−1}F_j) = ∏_{j=1}^n P(F_j).    (7)

Moreover, it is clear that the probability space (Ω′, F′, P′) is an extension of (Ω, F, P), because any one of the maps π_α is a measurable map such that π_α^{−1} preserves probabilities.

Let (M, M) be the measurable space which is the common codomain of the X_α's. For each α ∈ A, define Y_α = X_α ∘ π_α. Since π_α^{−1} preserves probabilities, Y_α =d X_α. Moreover, given distinct elements α_1, ..., α_n in A and sets B_1, ..., B_n ∈ M, we have

    P′(Y_{α_j} ∈ B_j for 1 ≤ j ≤ n) = P′(∩_{j=1}^n π_{α_j}^{−1}X_{α_j}^{−1}B_j) = ∏_{j=1}^n P′(π_{α_j}^{−1}X_{α_j}^{−1}B_j) = ∏_{j=1}^n P′(Y_{α_j} ∈ B_j),    (8)

here using equation (7) for the middle equality. Hence the Y_α's are independent. This completes the proof.

2.4 Characteristic functions

Characteristic functions provide us with an efficient method to check when two random variables X and Y taking values in Rn have the same distribution. To describe these functions, we begin by defining the Fourier transform of a finite measure. If µ is a finite measure (think probability measure), the Fourier transform of µ is simply the function

    µ̂(ξ) = ∫ e^{−2πix·ξ} dµ(x)   (ξ ∈ Rn).    (9)


(This is of course in direct analogy to the definition of the usual Fourier transform of a function f on Rn, denoted by f̂. See Appendix B.)

If X is a random variable taking values in Rn, let P_X denote the distribution of X. The characteristic function of X is the function ϕ = ϕ_X defined to be the Fourier transform of the measure P_X, i.e.

    ϕ_X = P̂_X.    (10)

A simple but useful observation is that, for any random variable X,

    ϕ_X(ξ) = ∫ e^{−2πiξ·x} dP_X(x) = E[e^{−2πiξ·X}].    (11)

Further, in the special case where the distribution of X is given by a density function ρ, we have

    ϕ_X(ξ) = ∫ e^{−2πix·ξ} dP_X(x) = ∫ e^{−2πix·ξ} ρ(x) dx = ρ̂(ξ).    (12)

Characteristic functions are useful because of the following result, which we will prove using the Fourier inversion theorem (see Theorem 7.6).

Theorem 2.10. Any finite measure µ on Bn is uniquely determined by its Fourier transform µ̂.

To prove this fact, first we need a lemma.

Lemma 2.11. Suppose µ is a finite measure on Bn, and suppose f ∈ L¹(Rn). Then

    ∫ f̂(x) dµ(x) = ∫ f(ξ) µ̂(ξ) dξ.    (13)

Proof. This is simply a matter of applying Fubini's Theorem. We have

    ∫ f̂(x) dµ(x) = ∫∫ e^{−2πix·ξ} f(ξ) dξ dµ(x) = ∫ f(ξ) ∫ e^{−2πix·ξ} dµ(x) dξ = ∫ f(ξ) µ̂(ξ) dξ,    (14)

as required.

Proof of Theorem 2.10. Suppose µ_1 and µ_2 are two finite Borel measures such that µ̂_1 = µ̂_2. Since the closed rectangles in Rn generate the Borel sets, to prove that µ_1 = µ_2 it is enough to prove that µ_1(R) = µ_2(R) for any closed rectangle R ⊂ Rn. Let f = χ_R, and note that f ∈ L¹. Further, |f^∨(x)| ≤ ∫ |f(x)| dx = m(R) < ∞ (where m is Lebesgue measure on Bn), so f^∨ is bounded. Then, by Corollary 7.9, there exists a sequence of smooth L¹ functions f_j such that f_j → f a.e., and each f_j^∨ ∈ L¹. By the inversion theorem (Theorem 7.6), (f_j^∨)^ = f_j. Thus by Lemma 2.11 and the equality of µ̂_1 and µ̂_2,

    ∫ f_j(x) dµ_1(x) = ∫ f_j^∨(ξ) µ̂_1(ξ) dξ = ∫ f_j^∨(ξ) µ̂_2(ξ) dξ = ∫ f_j(x) dµ_2(x).    (15)

By dominated convergence, this equality implies

    µ_1(R) = lim_{j→∞} ∫ f_j(x) dµ_1(x) = lim_{j→∞} ∫ f_j(x) dµ_2(x) = µ_2(R).    (16)

This completes the proof.


The uniqueness of the Fourier transform of finite measures clearly implies

Corollary 2.12. Suppose X and Y are two random variables in Rn. Then ϕ_X = ϕ_Y iff X =d Y.

The situation where we need to check that two random variables have equal distributions will come up quite often in this paper. Corollary 2.12 will therefore be an indispensable result.
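As an informal illustration of how Corollary 2.12 is used, one can compare empirical characteristic functions computed from samples, following the convention ϕ_X(ξ) = E[e^{−2πiξX}] of (11). The Python sketch below (assuming numpy; the distributions are chosen only for illustration) shows that two samples with the same distribution give nearly identical characteristic functions, while a distribution with the same mean and variance but a different law does not.

    # Empirical characteristic functions, using phi_X(xi) = E[exp(-2 pi i xi X)].
    import numpy as np

    def empirical_cf(sample, xis):
        return np.array([np.mean(np.exp(-2j * np.pi * xi * sample)) for xi in xis])

    rng = np.random.default_rng(2)
    N = 200_000
    X = rng.uniform(-np.sqrt(3), np.sqrt(3), N)   # uniform, mean 0, variance 1
    Y = rng.uniform(-np.sqrt(3), np.sqrt(3), N)   # same distribution as X
    W = rng.standard_normal(N)                    # different law, same mean and variance

    xis = np.linspace(0.0, 1.0, 6)
    cfX, cfY, cfW = (empirical_cf(s, xis) for s in (X, Y, W))
    print("max |phi_X - phi_Y| :", np.max(np.abs(cfX - cfY)))   # small (sampling noise)
    print("max |phi_X - phi_W| :", np.max(np.abs(cfX - cfW)))   # clearly larger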

2.5 Stochastic Processes

Stochastic process is a broad term which may refer to anything we think of as a random quantity evolving over time. As the main stochastic process of interest in this paper is Brownian motion (see §4), we will use this subsection to present some basic results about stochastic processes which can be easily proved in general, and we will provide further refinements later in the paper.

In its most basic definition, a stochastic process is a parametrized family of random variables {X_t}_{t∈T} taking values in a measurable space (M, M), where T ⊂ [0,∞). For this paper, we will almost always take T = [0,∞) and (M, M) = (Rn, Bn). Note that a stochastic process {X_t} is associated with two collections of functions. First, for each fixed t ∈ T, there is the random variable

    X_t : Ω → M,   ω ↦ X_t(ω),    (17)

and second, for each fixed ω ∈ Ω, there is the function

    X(ω) : T → M,   t ↦ X_t(ω).    (18)

The latter function is called the sample path (or sometimes just path) of X corresponding to ω.

According to the basic definition of a stochastic process given above, {X_t} is simply an indexed collection of random variables which may not be related to each other in any meaningful way. In order to put greater emphasis on the sample paths, it is useful to specialize the definition of a stochastic process as follows. Let M^T denote the set of functions f : T → M. For each t ∈ T, let π_t : M^T → M be the projection map onto the t'th factor, i.e. π_t(f) = f(t) for all f ∈ M^T. We define M^T to be the product σ-algebra generated by the projection maps π_t. In other words, M^T is the smallest σ-algebra such that all the maps π_t are measurable. We now make the following official definition.

Definition 2.13 (Stochastic process). A stochastic process is a random variable X from Ω into the space (M^T, M^T).

Having established this definition, we will fix T and M for the discussion that follows. The usual notation π_t for projection maps is not quite adequate for our needs, so we introduce the following notation.

Notation 2.14.

• If t ∈ T, as before, π_t : M^T → M denotes the projection map onto the t'th factor, that is, π_t(f) = f(t) for f ∈ M^T.

• If U ⊂ T, we define π_U : M^T → M^U by π_U(f) = f|_U for f ∈ M^T.

• If t ∈ U, we define π_t^U : M^U → M by π_t^U(f) = f(t) for f ∈ M^U.

• Finally, if V ⊂ U, we define π_V^U : M^U → M^V by π_V^U(f) = f|_V for f ∈ M^U.

For the discussion that follows, for any U ⊂ T, we will assume that the product space M^U is equipped with the product σ-algebra M^U, generated by the projection maps π_t^U. Hence each projection map π_t^U is measurable. Moreover, we have

Proposition 2.15. Suppose V ⊂ U ⊂ T. Then π_V^U is measurable.

Proof. Observe that, for any t ∈ V, π_t^U = π_t^V ∘ π_V^U. Now M^V is generated by sets of the form A = (π_t^V)^{−1}B, where t ∈ V and B ∈ M. Observe that (π_V^U)^{−1}A = (π_t^V ∘ π_V^U)^{−1}B = (π_t^U)^{−1}B ∈ M^U, because M^U is generated by sets of this form. Therefore π_V^U is measurable.

Define a cylinder of M^T to be a set of the form C = π_S^{−1}E, where S ⊂ T is finite and E ∈ M^S. Note that a cylinder C belongs to M^T because π_S is measurable by Proposition 2.15. Let C_T ⊂ M^T denote the set of all cylinders of M^T. When working with the σ-algebra M^T, the following fact is sometimes useful.

Proposition 2.16. The collection C_T is an algebra. Moreover, C_T is a generating set for M^T.

Proof. The latter statement is clear, because C_T is a subset of M^T and contains the collection of sets {π_t^{−1}B : t ∈ T and B ∈ M}, which generates M^T. As for the rest of the claim, let S ⊂ T be finite, and let E ∈ M^S. Then

    (π_S^{−1}E)^c = π_S^{−1}(E^c) ∈ C_T.    (19)

Hence C_T is closed under the taking of complements. Moreover, if S′ ⊂ T is another finite set and F ∈ M^{S′}, observe that

    π_S^{−1}E ∩ π_{S′}^{−1}F = π_{S∪S′}^{−1}G,    (20)

where G = {f ∈ M^{S∪S′} : f|_S ∈ E and f|_{S′} ∈ F} = (π_S^{S∪S′})^{−1}E ∩ (π_{S′}^{S∪S′})^{−1}F. Since the maps π_S^{S∪S′} and π_{S′}^{S∪S′} are measurable, it follows that G ∈ M^{S∪S′}. Therefore, π_S^{−1}E ∩ π_{S′}^{−1}F ∈ C_T. We conclude that C_T is closed under finite intersections. This completes the proof.

Note that with Definition 2.13, it makes sense to speak of the law, or distribution, of a stochastic process. If X is a stochastic process, this is simply the measure P_X on M^T defined by

    P_X(E) = P(X ∈ E)   (E ∈ M^T).    (21)

In practice, rather than working directly with the law, or distribution, of a stochastic process, it is often easier to work with the finite-dimensional distributions, which are defined as follows. Since the maps π_S are measurable, the function

    π_S ∘ X    (22)


is a random variable taking values in the measurable space (M^S, M^S). Let us denote the law of such a random variable by µ_S. We then define the collection

    {µ_S : S ⊂ T is finite}    (23)

to be the finite-dimensional distributions of X. Explicitly, if say S = {s_1, ..., s_m} ⊂ T, and A = A_1 × ··· × A_m ⊂ M^S, then

    µ_S(A) = P(X_{s_1} ∈ A_1, ..., X_{s_m} ∈ A_m).    (24)

This is simply the joint distribution of the random variables X_{s_1}, ..., X_{s_m}.

To understand the distribution of a stochastic process, it turns out that the finite-dimensional distributions tell us everything.

Proposition 2.17. The finite-dimensional distributions of a stochastic process uniquely determine its law.

Proof. Let X be a stochastic process. By the Caratheodory extension theorem and the fact that the cylinders C_T form an algebra which generates M^T (see Folland [4, Theorems 1.11 and 1.14]), the measure P_X is uniquely determined by its values on sets of the form π_S^{−1}E, where S ⊂ T is finite and E ∈ M^S. But

    P_X(π_S^{−1}E) = P(X ∈ π_S^{−1}E) = P(π_S ∘ X ∈ E) = µ_S(E).    (25)

Thus P_X is uniquely determined by the finite-dimensional distributions µ_S.

Finally, we restrict our attention to the case of a stochastic process X = {X_t}_{t∈T}, where T = [0,∞) and M = Rn with the Borel σ-algebra. A rather important case of this situation is the following.

Definition 2.18 (Continuous stochastic process). Suppose that X = {X_t}_{t≥0} is a stochastic process in Rn.

i. We say that X is continuous if the paths of X are continuous, i.e. t ↦ X_t(ω) is continuous for all ω ∈ Ω.

ii. We say that X is almost surely continuous if there exists a null set N ∈ F such that t ↦ X_t(ω) is continuous for all ω ∈ N^c.

Note that since stochastic processes are just random variables, it makes sense to speak of two stochastic processes being independent, and we may apply Proposition 2.9 to obtain (possibly large) collections of independent stochastic processes with given distributions. However, even if we start with continuous stochastic processes, there is no reason that the resulting independent processes should still be continuous. Later in this paper, it will be essential to be able to obtain independent continuous stochastic processes, and with this end in mind we prove the following proposition. The proof is very similar to the proof of Proposition 2.9.

Proposition 2.19. Suppose X = {X_t}_{t≥0} and Y = {Y_t}_{t≥0} are stochastic processes taking values in Rn. Then (after possibly extending the probability space) there exist stochastic processes X′ = {X′_t}_{t≥0} and Y′ = {Y′_t}_{t≥0}, such that

a. For all t ≥ 0, X_t =d X′_t and Y_t =d Y′_t.

b. For all s, t ≥ 0, X′_s and Y′_t are independent.

c. Moreover, if X and Y are almost surely continuous, X′ and Y′ may be chosen to be almost surely continuous.

Proof. If X and Y are initially defined on the probability space (Ω, F, P), define a new probability space (Ω′, F′, P′) by letting Ω′ = Ω × Ω and letting F′ and P′ be the induced product σ-algebra and product measure on Ω′. For t ≥ 0 and (ω_1, ω_2) ∈ Ω′ = Ω × Ω, we define

    X′_t(ω_1, ω_2) = X_t(ω_1),   and   Y′_t(ω_1, ω_2) = Y_t(ω_2).    (26)

Given a Borel set B ⊂ Rn, let E = X_t^{−1}B. Clearly X′_t^{−1}B = E × Ω. Therefore, P′(X′_t ∈ B) = P′(E × Ω) = P(E)P(Ω) = P(E) = P(X_t ∈ B). Thus X_t =d X′_t. Similarly Y_t =d Y′_t.

Given s, t ≥ 0, to see that X′_s and Y′_t are independent, let B, A ⊂ Rn be Borel sets, and let E = X_s^{−1}B and F = Y_t^{−1}A. Then X′_s^{−1}B = E × Ω and Y′_t^{−1}A = Ω × F. Hence P′(X′_s ∈ B, Y′_t ∈ A) = P′(E × F) = P(E)P(F) = P′(X′_s ∈ B)P′(Y′_t ∈ A).

Finally, suppose that X has almost surely continuous paths. Then there exists N ⊂ Ω such that P(N) = 0 and, for all ω_1 ∈ N^c, X_t(ω_1) is continuous in t. Then it is clear from the definition (26) that X′_t(ω_1, ω_2) is continuous in t for all (ω_1, ω_2) ∈ N^c × Ω. Since P′(N^c × Ω) = 1, this shows that X′ is almost surely continuous. Similarly, if Y has almost surely continuous paths, then so does Y′. This completes the proof.

3 Gaussian Distribution

The first ingredient needed for the mathematical description of Brownian motion is Gaussian distribution.

Basic definitions and results

A random variable X has Gaussian or normal distribution (the names are entirely interchangeable) if its distribution is given by dP_X(x) = b^{−1} e^{−π(x−a)²/b²} dx, where a and b are real numbers with b > 0. However, it is convenient to include in our definition the degenerate distribution, where the variance of X is zero. We therefore take the following as our official definition of normal distribution.

Definition 3.1 (Normal distribution). A random variable X, taking values in R, is said to be normally distributed if either the variance of X is zero, or the distribution of X is given by the density function

    ρ(x) = b^{−1} e^{−π(x−a)²/b²},    (27)

for some a, b ∈ R with b > 0.


Remark 3.2. If the distribution of X is given by (27), then a and b² are respectively the mean and variance of X. This may be checked by writing

    E[X] = ∫ b^{−1} x e^{−π(x−a)²/b²} dx   and   Var(X) = ∫ b^{−1} (x − a)² e^{−π(x−a)²/b²} dx,    (28)

and applying standard integration techniques.

The next result is trivial, but it is important since it will allow us to streamline many arguments to come.

Proposition 3.3. The distribution of a Gaussian random variable is uniquely determined by its mean and variance.

Proof. Let X be a Gaussian random variable, let a = E[X] and b² = Var(X). If b > 0, the result is immediate from the definition of Gaussian distribution and the previous remark. If b = 0, then X = a a.s. In this case, the distribution of X is given by P_X(B) = 1 if a ∈ B and P_X(B) = 0 otherwise, for all Borel sets B.

This justifies the following notation.

Notation 3.4. To indicate the distribution of a Gaussian random variable, we write X ∈ N(a, b²), where a = E[X] and b² = Var(X).

Using Proposition 7.8 in Appendix B, we have

Proposition 3.5. If X ∈ N(a, b²), then the characteristic function of X is

    ϕ_X(ξ) = e^{−πb²ξ²} e^{−2πiaξ}.    (29)

Proof. By (12), if b > 0 and ρ is the density function of X, then ϕ_X = ρ̂. Using Proposition 7.8, we compute

    ρ̂(ξ) = b^{−1} ∫ e^{−2πixξ} e^{−π(x−a)²/b²} dx = b^{−1} e^{−2πiaξ} ∫ e^{−2πiyξ} e^{−πy²/b²} dy
          = e^{−2πiaξ} ∫ e^{−2πibzξ} e^{−πz²} dz = e^{−πb²ξ²} e^{−2πiaξ},    (30)

which is the desired result. If, on the other hand, b = 0, then X = a a.s., and so ϕ_X(ξ) = E[e^{−2πiξX}] = e^{−2πiaξ}.
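The formula (29) can also be checked numerically. In the Python sketch below (assuming numpy; the parameter values are arbitrary), a sample from the density (27) is produced by scaling a usual variance-one normal by 1/√(2π), since that scaling yields the density e^{−πw²}; the sample average of e^{−2πiξX} is then compared with (29).

    # Monte Carlo check of Proposition 3.5 in the thesis's normalization.
    import numpy as np

    rng = np.random.default_rng(3)
    a, b, N = 1.5, 2.0, 1_000_000
    W = rng.standard_normal(N) / np.sqrt(2.0 * np.pi)   # density e^{-pi w^2}
    X = a + b * W                                        # density (27)

    for xi in (0.1, 0.3, 0.5):
        empirical = np.mean(np.exp(-2j * np.pi * xi * X))
        predicted = np.exp(-np.pi * b**2 * xi**2) * np.exp(-2j * np.pi * a * xi)
        print(f"xi={xi}: empirical {empirical:.4f}, formula (29) {predicted:.4f}")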

The next proposition shows that Gaussian distribution behaves well under taking linear combinations of independent Gaussian random variables.

Proposition 3.6. Suppose {X_j}_{j=1}^m is a collection of independent Gaussian random variables, such that each X_j ∈ N(a_j, b_j²), for some real numbers a_j and b_j. Let Y = ∑_{j=1}^m c_j X_j be a linear combination of the X_j's, where each c_j ∈ R. Then Y ∈ N(∑ c_j a_j, ∑ c_j² b_j²).


Proof. Note that E[c_jX_j] = c_jE[X_j] = c_ja_j and Var(c_jX_j) = c_j² Var(X_j) = c_j² b_j². Hence each c_jX_j ∈ N(c_ja_j, c_j²b_j²). Therefore, using the previous proposition and Proposition 2.6, we compute

    ϕ_Y(ξ) = E[e^{−2πiξY}] = E[∏_{j=1}^m e^{−2πiξc_jX_j}] = ∏_{j=1}^m E[e^{−2πiξc_jX_j}]
           = ∏_{j=1}^m ϕ_{c_jX_j}(ξ) = ∏_{j=1}^m e^{−πc_j²b_j²ξ²} e^{−2πic_ja_jξ} = e^{−π(∑ c_j²b_j²)ξ²} e^{−2πi(∑ c_ja_j)ξ}.    (31)

But by the previous proposition this is the characteristic function of a random variable in N(∑ c_ja_j, ∑ c_j²b_j²). The result follows by uniqueness in Corollary 2.12.

The following simple estimate will be needed later.

Proposition 3.7. Suppose Z ∈ N(0, 1) and r ≥ 1/(2π). Then

    P(Z ≥ r) ≤ e^{−πr²}.    (32)

Proof. Indeed, since x/r ≥ 1 for x ≥ r,

    P(Z ≥ r) = ∫_r^∞ e^{−πx²} dx ≤ ∫_r^∞ (x/r) e^{−πx²} dx = (−(1/(2πr)) e^{−πx²})|_r^∞ = (1/(2πr)) e^{−πr²} ≤ e^{−πr²},    (33)

where the last inequality uses 2πr ≥ 1, as claimed.
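For a quick numerical confirmation of (32), note that the integral in (33) equals erfc(√π r)/2. The short Python sketch below (standard library only) compares this exact value with the bound e^{−πr²} for a few values of r ≥ 1/(2π).

    # Check the tail bound of Proposition 3.7 against the exact tail integral.
    from math import erfc, exp, pi, sqrt

    for r in (0.2, 0.5, 1.0, 2.0):
        tail = 0.5 * erfc(sqrt(pi) * r)      # exact value of the integral in (33)
        bound = exp(-pi * r * r)
        print(f"r={r}: P(Z >= r) = {tail:.6f} <= {bound:.6f} = e^(-pi r^2): {tail <= bound}")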

Multidimensional Gaussian distribution

Proposition 3.6 motivates the following definition for Gaussian distribution in Rn.

Definition 3.8 (Multinormal distribution). A random variable X in Rn is said to be normally distributed if its image φ(X) under any linear functional φ : Rn → R is a Gaussian random variable in R.

Other names for such a distribution are (multi)normal distribution or Gaussian distribution. If the components of X are given explicitly, say X = [X_1, X_2, ..., X_n]^T, we say that the X_j's are jointly Gaussian to mean that X is Gaussian.

Remark 3.9. Gaussian distribution is preserved by linear maps. Indeed, if T : Rn → Rm is a linear map, and X is a Gaussian random variable in Rn, then TX is a Gaussian random variable in Rm, because if φ is a linear functional on Rm, then φ ∘ T is a linear functional on Rn.

If X is a Gaussian random variable in Rn, the analogue of the one-dimensional concept of variance is the covariance of X, defined to be the matrix Cov(X) = E[(X − E[X])(X − E[X])^T]; when X has mean zero this is simply E[XX^T]. Proposition 3.3 turns out to still be essentially true in higher dimensions.


Proposition 3.10. The distribution of a Gaussian random variable in Rn is uniquely determined by its mean and covariance.

Proof. Suppose X and Y are two Gaussian random variables in Rn such that E[X] = E[Y] and Cov(X) = Cov(Y). By replacing X and Y with X − a and Y − a, where a = E[X] = E[Y], we may assume that E[X] = E[Y] = 0. By definition of Gaussian distribution in Rn, for any ξ ∈ Rn, ξ·X and ξ·Y are Gaussian random variables in R. Also E[ξ·X] = E[ξ·Y] = 0 since X and Y are mean-zero. Moreover, Var(ξ·X) = E[(ξ·X)²] = E[ξ^T X X^T ξ] = ξ^T Cov(X) ξ. Similarly Var(ξ·Y) = ξ^T Cov(Y) ξ. Hence Var(ξ·X) = Var(ξ·Y). Thus by Proposition 3.3, ξ·X =d ξ·Y, and so by Corollary 2.12, ϕ_{ξ·X} = ϕ_{ξ·Y}. Observe that

    ϕ_{ξ·X}(1) = E[e^{−2πiξ·X}] = ϕ_X(ξ).    (34)

Similarly, ϕ_{ξ·Y}(1) = ϕ_Y(ξ). Therefore, ϕ_X(ξ) = ϕ_Y(ξ) for all ξ ∈ Rn. It then follows by Corollary 2.12 again that X =d Y.

This proposition justifies introducing the following notation for the distribution of a Gaussian random variable.

Notation 3.11. To indicate the distribution of a Gaussian random variable in Rn, we write X ∈ N_n(a, S), where a = E[X] and S = Cov(X).

The most fundamental example of an n-dimensional Gaussian random variable is Z = [Z_1, ..., Z_n]^T, where the Z_j's are independent N(0, 1) random variables. The fact that Z is Gaussian follows from Proposition 3.6, which shows that a linear combination of independent Gaussian random variables is Gaussian. We call Z (or any random variable with the same distribution) a standard normal random variable in Rn. Observe that, using the notation introduced above, Z ∈ N_n(0, I), where I is the n × n identity matrix. The next proposition shows that any n-dimensional Gaussian random variable is the affine linear image of a standard normal random variable.

Proposition 3.12. If X is a Gaussian random variable in Rn, then X =d BZ + a, for some B ∈ M_n(R), a ∈ Rn, and standard normal random variable Z in Rn. Furthermore, E[X] = a and Cov(X) = BB^T.

First we need a lemma.

Lemma 3.13. Suppose S is a symmetric, positive semidefinite matrix in M_n(R). Then S = BB^T for some B ∈ M_n(R).

Proof. Since S is symmetric, by the spectral theorem for real matrices, we may write S = GDG^T for some orthogonal matrix G and diagonal matrix D. Since S is positive semidefinite, the diagonal entries of D are nonnegative. Therefore, we may write D = H², where H is the diagonal matrix whose diagonal entries are the nonnegative square roots of the diagonal entries of D. Thus, if we set B = GH, then S = BB^T, as required.
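The proof of Lemma 3.13 is entirely constructive and translates directly into a short computation. The Python sketch below (assuming numpy; the matrix is randomly generated only for illustration) builds B = GH from a spectral decomposition of a symmetric positive semidefinite S and verifies BB^T = S up to round-off.

    # Factor a symmetric positive semidefinite S as B B^T, as in Lemma 3.13.
    import numpy as np

    rng = np.random.default_rng(4)
    A = rng.standard_normal((4, 4))
    S = A @ A.T                          # symmetric positive semidefinite by construction

    eigvals, G = np.linalg.eigh(S)       # S = G diag(eigvals) G^T, with G orthogonal
    H = np.diag(np.sqrt(np.clip(eigvals, 0.0, None)))  # clip guards tiny negative round-off
    B = G @ H

    print("max |S - B B^T| =", np.max(np.abs(S - B @ B.T)))   # ~ 1e-15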

Proof of Proposition 3.12. We may assume E[X] = 0 and prove that X =d BZ, because otherwise we may set a = E[X] and replace X with X − a. Let S = Cov(X) = E[XX^T] (recall that X now has mean zero), and observe that S is a symmetric, positive semidefinite matrix. Indeed, given a nonzero z ∈ Rn, z^T X is a Gaussian random variable in R, by definition of Gaussian distribution in Rn. Expected value commutes with linear maps, so z^T S z = E[z^T X X^T z] = E[(z^T X)²] ≥ 0. Thus by the lemma we may write S = BB^T for some B ∈ M_n(R). Let Z be a standard normal random variable in Rn, and observe that E[BZ] = BE[Z] = 0 and Cov(BZ) = E[BZZ^T B^T] = B Cov(Z) B^T = BB^T = S. Thus the mean and covariance of BZ agree with those of X. Thus by Proposition 3.10, X =d BZ, as required.

Corollary 3.14. Suppose X is a Gaussian random variable in Rn. Let S = Cov(X) and a = E[X]. Then

    ϕ_X(ξ) = e^{−πξ^T S ξ} e^{−2πiξ·a}.    (35)

Proof. By the previous proposition, X =d BZ + a, where a = E[X] and BB^T = S. If u ∈ Rn, then u^T Z = u_1Z_1 + u_2Z_2 + ··· + u_nZ_n. Hence by Proposition 3.6, u^T Z ∈ N(0, u^T u). Then by Proposition 3.5,

    ϕ_{u^T Z}(1) = E[e^{−2πiu^T Z}] = e^{−πu^T u}.    (36)

Substituting u = B^T ξ, where ξ ∈ Rn, yields

    E[e^{−2πiξ^T BZ}] = e^{−πξ^T BB^T ξ}.    (37)

Therefore, the characteristic function of X is

    E[e^{−2πiξ·X}] = E[e^{−2πiξ^T BZ} e^{−2πiξ·a}] = e^{−πξ^T BB^T ξ} e^{−2πiξ·a},    (38)

as required.

We close this section with a result showing that Gaussian random variables behave nicely with respect to limits. This result will be useful to have on hand when we construct the Wiener process in §5.

Proposition 3.15. Suppose {X_k}_{k=1}^∞ is a sequence of Gaussian random variables in Rn such that X_k → X a.s., and the components of X are in L²(P). Then X is also a Gaussian random variable.

Proof. Set a_k = E[X_k] and a = E[X]. Set S_k = Cov(X_k) and let S = Cov(X). We claim that a_k → a and S_k → S as k → ∞. These assertions follow by dominated convergence applied to the components of a_k and S_k. Indeed, by assumption the components of X are in L²(P) (and hence in L¹(P) by Remark 2.1). The components of each X_k are in L²(P) since these are Gaussian. Thus, for 1 ≤ j ≤ n, (a_k)_j = E[(X_k)_j] → E[X_j] = a_j, by taking 2|X_j| as our dominating function. For 1 ≤ i, j ≤ n, E[(X_k)_i(X_k)_j] → E[X_iX_j], by taking 2|X_iX_j| as our dominating function; together with a_k → a this gives (S_k)_{i,j} = E[(X_k)_i(X_k)_j] − (a_k)_i(a_k)_j → E[X_iX_j] − a_ia_j = S_{i,j}.

We also observe that S is symmetric positive semidefinite. Indeed, S is symmetric because this is true of any covariance matrix. Let v be an eigenvector of S, and let λ be the corresponding eigenvalue. Then v^T S_k v ≥ 0 for all k, because each S_k is the covariance matrix of a Gaussian random variable and hence is positive semidefinite. Hence λ|v|² = v^T S v = lim_{k→∞} v^T S_k v ≥ 0. This shows λ ≥ 0, so S is positive semidefinite.

Next, we compute the characteristic functions. By Corollary 3.14, and the fact that each of the X_k's is normal, we have

    ϕ_{X_k}(ξ) = E[e^{−2πiξ·X_k}] = e^{−πξ^T S_k ξ} e^{−2πiξ·a_k} → e^{−πξ^T S ξ} e^{−2πiξ·a},    (39)

as k → ∞. On the other hand, by dominated convergence,

    ϕ_{X_k}(ξ) = E[e^{−2πiξ·X_k}] → E[e^{−2πiξ·X}] = ϕ_X(ξ),    (40)

as k → ∞. Therefore, ϕ_X(ξ) = e^{−πξ^T S ξ} e^{−2πiξ·a}. Since S is symmetric positive semidefinite, by Lemma 3.13, S = BB^T for some B ∈ M_n(R). Hence ϕ_X is the characteristic function of the Gaussian random variable BZ + a, where Z ∈ N_n(0, I). Thus X is Gaussian, and in particular X ∈ N_n(a, S).

4 Wiener Process

The Wiener process is the mathematical model for Brownian motion. This section defines the Wiener process, first in R and then in Rn, and establishes several key invariance properties. Proof of the existence of the Wiener process is deferred to the next section.

Definition 4.1 (Wiener process in R). Suppose X = {X_t}_{t≥0} is a real-valued stochastic process. We say that X is a Wiener process if the following four conditions are satisfied.

i. X_0 = 0 a.s.

ii. If 0 ≤ t_0 < ··· < t_m, then the increments X_{t_j} − X_{t_{j−1}}, 1 ≤ j ≤ m, are independent.

iii. If 0 ≤ s < t, then X_t − X_s ∈ N(0, t − s).

iv. The paths of X are almost surely continuous (i.e. there exists a measurable set A ⊂ Ω such that P(A) = 1 and, for all ω ∈ A, the map t ↦ X_t(ω) is continuous).

More generally, if we replace condition (i) with the requirement that X_0 = x a.s. for some x ∈ R, we say that X is a Wiener process starting from x.
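For intuition, a path of such a process can be approximated on a grid by summing independent mean-zero normal increments, each with variance equal to the length of the corresponding time step (reading N(0, t − s) as having variance t − s, per Remark 3.2 and Notation 3.4). The Python sketch below (assuming numpy) is only a discretized illustration of conditions (i)-(iii), not a construction; the genuine construction is given in §5.

    # Discretized illustration of Definition 4.1 on [0, T].
    import numpy as np

    rng = np.random.default_rng(5)
    T, num_steps = 1.0, 1_000
    dt = T / num_steps

    increments = rng.normal(0.0, np.sqrt(dt), size=num_steps)
    path = np.concatenate(([0.0], np.cumsum(increments)))     # X_0 = 0, as in (i)
    print("X_0 =", path[0], " X_T =", path[-1])

    # Over many independent paths, the empirical variance of X_T is close to T.
    XT = rng.normal(0.0, np.sqrt(dt), size=(5_000, num_steps)).sum(axis=1)
    print("empirical Var(X_T) over 5000 paths:", round(XT.var(), 3))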

A stochastic process Y is said to be a Gaussian process if the finite-dimensional distributions of Y are Gaussian. By conditions (ii) and (iii), the increments of X over disjoint intervals are independent Gaussian random variables; since every linear functional of a vector [X_{t_1}, ..., X_{t_m}]^T is a linear combination of such increments, and hence Gaussian by Proposition 3.6, the finite-dimensional distributions of X are Gaussian. It follows that the Wiener process is an example of a Gaussian process.

Remark 4.2. While strictly speaking the term Wiener process refers to the mathematical model for the physical phenomenon of Brownian motion, in practice the term Brownian motion is more common and in a mathematical context is understood to refer to the Wiener process. In this paper, we typically use the term Wiener process, but sometimes use the term Brownian motion in more informal discussions.

The next two propositions give us equivalent characterizations of a Wiener process which are often easier to verify than Definition 4.1.

Proposition 4.3. Suppose that Y = {Y_t}_{t≥0} is a stochastic process with almost surely continuous paths (as in condition (iv) of Definition 4.1). Then Y is a Wiener process starting from x if and only if its finite-dimensional distributions agree with those of a Wiener process starting from x.


Proof. The direction (⇒) is immediate. Conversely, suppose the finite-dimensional distributions of Y agree with those of a Wiener process X starting from x. Then P(Y_0 = x) = P(X_0 = x) = 1, so Y satisfies (i). If 0 ≤ t_1 < t_2 < ··· < t_m, with m ≥ 2, then the distribution of [Y_{t_1}, ..., Y_{t_m}]^T in Rm agrees with that of [X_{t_1}, ..., X_{t_m}]^T. Thus, if T : Rm → Rm is the linear map defined by

    T([x_1, x_2, ..., x_m]^T) = [x_1, x_2 − x_1, ..., x_m − x_{m−1}]^T,    (41)

then the distribution of T([Y_{t_1}, ..., Y_{t_m}]^T) agrees with that of T([X_{t_1}, ..., X_{t_m}]^T). Since X is a Wiener process, it follows that the Y_{t_j} − Y_{t_{j−1}}'s are independent and have N(0, t_j − t_{j−1}) distribution, so (ii) and (iii) are satisfied. Since the paths of Y are almost surely continuous, it follows that Y is a Wiener process starting from x.

Proposition 4.4. Suppose Y = {Y_t}_{t≥0} is a Gaussian process with almost surely continuous paths (as in condition (iv) of Definition 4.1). Then Y is a Wiener process if and only if, for all s, t ∈ [0,∞), E[Y_s] = 0 and E[Y_s Y_t] = min{s, t}.

Proof. Suppose Y is a Wiener process, and s, t ∈ [0,∞). Assume without loss of generality that s < t. Since Y_s ∈ N(0, s), E[Y_s] = 0, and independence of increments in condition (ii) gives us

    E[Y_s Y_t] = E[Y_s(Y_t − Y_s)] + E[Y_s²] = E[Y_s]E[Y_t − Y_s] + E[Y_s²] = s.    (42)

Conversely, suppose that for all s, t ∈ [0,∞), E[Y_s] = 0 and E[Y_s Y_t] = min{s, t}. Let X be a Wiener process starting from 0. Then X is a Gaussian process, and as we have already shown, E[X_s X_t] = min{s, t}. Suppose t_0, ..., t_q are distinct times in [0,∞). Then [Y_{t_0}, ..., Y_{t_q}]^T and [X_{t_0}, ..., X_{t_q}]^T are Gaussian random variables in R^{q+1}, and, for 0 ≤ i, j ≤ q, E[X_{t_i} X_{t_j}] = E[Y_{t_i} Y_{t_j}] = min{t_i, t_j}. Hence [Y_{t_0}, ..., Y_{t_q}]^T and [X_{t_0}, ..., X_{t_q}]^T have equal means and equal covariance matrices. So, by Proposition 3.10, X and Y have the same finite-dimensional distributions. The result then follows by Proposition 4.3.
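The covariance formula E[Y_s Y_t] = min{s, t} is easy to see in simulation. The Python sketch below (assuming numpy) estimates E[Y_s Y_t] from discretized paths generated as in the sketch following Definition 4.1; the chosen times are arbitrary.

    # Monte Carlo estimate of E[Y_s Y_t] for discretized Wiener paths.
    import numpy as np

    rng = np.random.default_rng(6)
    num_paths, num_steps, T = 10_000, 500, 1.0
    dt = T / num_steps

    incs = rng.normal(0.0, np.sqrt(dt), size=(num_paths, num_steps))
    paths = np.concatenate([np.zeros((num_paths, 1)), np.cumsum(incs, axis=1)], axis=1)

    for s, t in [(0.25, 0.75), (0.5, 0.5), (0.9, 0.3)]:
        i, j = int(round(s / dt)), int(round(t / dt))
        est = np.mean(paths[:, i] * paths[:, j])
        print(f"E[Y_{s} Y_{t}] ~ {est:.3f}   (min(s,t) = {min(s, t)})")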

The Brownian motion process possesses a number of interesting symmetries. Some basic results on the symmetries of Brownian motion are enumerated in the following proposition.

Proposition 4.5. Let X be a Wiener process starting from 0. Then

a. (Space translation) For all h ∈ R, {X_t + h}_{t≥0} is a Wiener process starting from h.

b. (Time translation) For all a > 0, {X_{t+a} − X_a}_{t≥0} is a Wiener process.

c. (Scaling property) For all c ≠ 0, {cX_{t/c²}}_{t≥0} is a Wiener process.

Proof. (a) follows easily from Definition 4.1: We have X_0 + h = h a.s. because X starts from 0. (ii) and (iii) hold because translation does not change the increments X_t − X_s, s, t ∈ [0,∞). And (iv) holds because translation is a continuous operation.

With regard to (b) and (c), we make the following observation. If Z = {Z_t}_{t≥0} is a Gaussian process, then so is {Z_{f(t)}}_{t≥0}, whenever f is an injective function from [0,∞) into [0,∞). For, if t_1, ..., t_q are distinct points in [0,∞), then so are f(t_1), ..., f(t_q), and hence the distribution of (Z_{f(t_1)}, ..., Z_{f(t_q)}) is Gaussian, since {Z_t} is a Gaussian process. Thus, the processes in (b) and (c) are Gaussian (for (b), note in addition that the vector of values [X_{t_1+a} − X_a, ..., X_{t_q+a} − X_a]^T is a linear image of the Gaussian vector [X_a, X_{t_1+a}, ..., X_{t_q+a}]^T, so Remark 3.9 applies).

The processes in (b) and (c) are almost surely continuous since X is. Moreover, if 0 ≤ s < t, then E[X_{s+a} − X_a] = 0 − 0 = 0 and E[cX_{t/c²}] = c · 0 = 0. Using Proposition 4.4, we also compute

    E[(X_{t+a} − X_a)(X_{s+a} − X_a)] = (s + a) − 2a + a = s,    (43)

and

    E[(cX_{t/c²})(cX_{s/c²})] = c² (s/c²) = s.    (44)

Thus by Proposition 4.4, the processes in (b) and (c) are Wiener processes.

The definition of the multidimensional Wiener process is simply the following.

Definition 4.6 (n-dimensional Wiener process). Suppose X = {X_t}_{t≥0} is a stochastic process taking values in Rn. We say that X is an (n-dimensional) Wiener process if the components of X are independent, and each component is a Wiener process in R.

Note that all of the symmetries enumerated in Proposition 4.5 extend in obvious ways to multidimensional Wiener processes. For example, if {X_t} is an n-dimensional Wiener process, then so is {cX_{t/c²}}, for any c ≠ 0, where scalar multiplication is defined componentwise. In fact, the n-dimensional Wiener process is more symmetric than first meets the eye. Fundamental to applications to the Dirichlet problem is the following result.

Proposition 4.7 (Rotational symmetry). Suppose that X is an n-dimensional Wiener process starting from x ∈ Rn, and A ∈ SO(n). Then AX is a Wiener process starting from Ax.

Proof. By replacing X with X − x, we may assume that x = 0. Then AX_0 = 0 a.s. since X_0 = 0 a.s. Also, the paths of AX are almost surely continuous, since this is true of X. Hence AX satisfies conditions (i) and (iv) of Definition 4.1. In addition, if 0 ≤ t_0 < t_1 < ··· < t_m, then the increments X_{t_j} − X_{t_{j−1}} are independent. Hence the increments AX_{t_j} − AX_{t_{j−1}} are independent, so AX satisfies condition (ii). Finally, suppose that 0 ≤ s < t. Then, for 1 ≤ i ≤ n, the component (X_t)_i − (X_s)_i ∈ N(0, t − s). Since the components are independent, X_t − X_s is a mean-zero n-dimensional Gaussian random variable, and its covariance matrix is

    Cov(X_t − X_s) = E[(X_t − X_s)(X_t − X_s)^T] = (t − s)I.

In addition, AX_t − AX_s is an n-dimensional Gaussian random variable. Since the expected value operator commutes with linear maps and A is orthogonal, the covariance matrix of AX_t − AX_s is

    Cov(AX_t − AX_s) = E[(AX_t − AX_s)(AX_t − AX_s)^T] = A E[(X_t − X_s)(X_t − X_s)^T] A^T = (t − s)AA^T = (t − s)I.

Since the covariance matrices are equal, Proposition 3.10 implies that X_t − X_s and AX_t − AX_s are equally distributed. This proves that AX satisfies condition (iii) of Definition 4.1, so we are done.


5 Construction of the Wiener process

We have deduced some basic properties of the Wiener process, but so far it is not obvious that such a process should even exist. To emphasize this fact, suppose we modify Definition 4.1 of the Wiener process by replacing condition (iii) with

(iii)' If 0 ≤ s < t, then X_t − X_s ∈ N(0, (t − s)^p),

for some real number p ≠ 1. If 0 ≤ s < t, then, on the one hand, by (iii)', E[(X_t − X_s)²] = (t − s)^p. On the other hand, using independence of increments in condition (ii) of Definition 4.1, we have

    E[(X_t − X_s)²] = E[X_t²] + E[X_s²] − 2E[X_t X_s]
                    = t^p + s^p − 2(E[X_s(X_t − X_s)] + E[X_s²])
                    = t^p + s^p − 2(E[X_s]E[X_t − X_s] + E[X_s²])
                    = t^p + s^p − 2s^p = t^p − s^p.    (45)

This gives us a contradiction, because in general (t − s)^p ≠ t^p − s^p.

The construction of Brownian motion we present is due to Ciesielski [2]. Note that we only need to construct the one-dimensional Wiener process, since the n-dimensional Wiener process is defined in terms of the one-dimensional process. While our construction is one of the easiest ways to obtain the Wiener process, it is still somewhat technical, so before getting our hands dirty, let us sketch our plan.

(1) Our first task is to define a special collection of functions {ϕ_ij : i ≥ 0 and 0 ≤ j ≤ 2^{i−1} − 1}, known as Haar functions, in the real Hilbert space L²[0, 1]. We will prove that this collection of functions is in fact an orthonormal set in L²[0, 1].

(2) For each ϕ_ij, define ψ_ij(t) = ∫_0^t ϕ_ij, where t ∈ [0, 1], and let {Y_ij} be a collection of independent N(0, 1) random variables. Define V_0(t) = Y_00 ψ_00(t), and for each i ≥ 1 define

    V_i(t) = ∑_{j=0}^{2^{i−1}−1} Y_ij ψ_ij(t).    (46)

Our prototype for Brownian motion will be the process

    X_t = ∑_{i=0}^∞ V_i(t).    (47)

The difficult part of our argument will be verifying that X_t converges almost surely to a continuous function in t. Once we have proven this result, we will show that X satisfies conditions (i)-(iv) of Definition 4.1 of the Wiener process. This task will be greatly simplified by the tools of Hilbert space theory applied to the orthonormal set {ϕ_ij}.

(3) The process X_t, obtained in step (2), satisfies all the requirements of Definition 4.1, except that it is only defined for t ∈ [0, 1]. However, using time inversion (see Proposition 4.5(d)), we will be able to extend X_t to the interval [0,∞) by defining X_t = tY_{1/t} for t ≥ 1, where Y_s is an independent copy of X_s, for s ∈ [0, 1].

5.1 Haar functions

While Haar functions are conceptually simple, their definition is awkward to write out, and hence we devote a whole subsection to them. We begin with an indexed collection of subintervals of the unit interval, I = {I_ij : 0 ≤ i < ∞ and 0 ≤ j ≤ 2^i − 1}, with elements defined as follows. Let I_00 = [0, 1). Let I_10 = [0, 1/2) and I_11 = [1/2, 1). Let I_20 = [0, 1/4), I_21 = [1/4, 1/2), I_22 = [1/2, 3/4), and I_23 = [3/4, 1). Continuing inductively, we let each I_i0 = (1/2)I_{i−1,0} = {x/2 : x ∈ I_{i−1,0}}, and we let each I_ij = I_i0 + j2^{−i} = {x + j2^{−i} : x ∈ I_i0}.

A useful decomposition of each interval I_ij is obtained as follows. For each i and j, observe that I_ij = I_{i+1,2j} ∪ I_{i+1,2j+1}. Reasoning inductively, we obtain, for each integer k ≥ 0,

    I_ij = I_{i+k,2^k j} ∪ I_{i+k,2^k j+1} ∪ ··· ∪ I_{i+k,2^k j+2^k−1}.    (48)

Note that the intervals making up the union in the above expression are disjoint. Further, the intersection of I_ij with any other interval in I of the form I_{i+k,m} not appearing in the above expression is empty. We will refer back to the decomposition given by (48) a number of times.

For 0 ≤ i <∞ and 0 ≤ j ≤ 2i−1 − 1, we define functions ϕij : [0, 1]→ R as follows. Letϕ00 ≡ 1, and for i ≥ 1 define

ϕij(x) =

2(i−1)/2 x ∈ Ii,2j−2(i−1)/2 x ∈ Ii,2j+1

0 otherwise.

(49)

Hence the support of each function ϕij is Ii,2j∪Ii,2j+1 = Ii−1,j. Functions taking this form arecalled Haar functions. The great utility of Haar functions comes from the following result.

Proposition 5.1. The collection of Haar functions {ϕ_ij : 0 ≤ i < ∞ and 0 ≤ j ≤ 2^{i−1} − 1} is a complete orthonormal system for the Hilbert space L²[0, 1].

Proof. First, we prove that the collection {ϕ_ij} is orthonormal in L²[0, 1]. By construction, each interval I_{i−1,j} = supp(ϕ_ij) has width 2^{−(i−1)}, and we have ||ϕ_ij||₂² = ⟨ϕ_ij, ϕ_ij⟩ = ∫ 2^{i−1} χ_{I_{i−1,j}} = 1. Further, if (i, j) ≠ (p, q), we claim that ⟨ϕ_ij, ϕ_pq⟩ = 0. To see this, note that this is clearly true if i = p, because then supp(ϕ_ij) = I_{i−1,j} and supp(ϕ_pq) = I_{i−1,q} are disjoint. If i ≠ p, assume without loss of generality that i < p, and write p = i + k for some positive integer k. Either supp(ϕ_pq) and supp(ϕ_ij) are disjoint, in which case clearly ⟨ϕ_ij, ϕ_pq⟩ = 0; or supp(ϕ_pq) and supp(ϕ_ij) are not disjoint, that is, I_{p−1,q} = I_{i−1+k,q} and I_{i−1,j} are not disjoint. Then by the decomposition given by equation (48), I_{p−1,q} ⊂ I_{i−1,j}, and further we observe that either I_{p−1,q} is a subset of I_{i,2j} or I_{p−1,q} is a subset of I_{i,2j+1}. Hence, either ϕ_ij ϕ_pq = 2^{(i−1)/2} ϕ_pq or ϕ_ij ϕ_pq = −2^{(i−1)/2} ϕ_pq. In both cases, ⟨ϕ_ij, ϕ_pq⟩ = ∫_0^1 ϕ_ij ϕ_pq = 0. Thus, we have shown that the collection {ϕ_ij} is orthonormal.

21

Page 22: Brownian Motion - University of Washingtonmorrow/papers/peter-thesis.pdf · the expected value of Xand is denoted by E[X]. Integration over a set A2F is denoted E A[X]. However, if

To prove completeness of the collection ϕij, we observe that any half-open intervalJ = [a, b), with endpoints in the dyadic rationals in [0, 1], may be written as

J =n⋃

j=m

Iij, (50)

for some positive integers i,m, and n, with 0 ≤ m ≤ n ≤ 2i − 1. In addition, note that bydefinition ϕ00 = χI00 , and for i ≥ 1 and 0 ≤ j ≤ 2i−1 − 1 we have

χIi,2j =1

2(χIi−1,j

+ 2−(i−1)ϕi−1,j), (51)

and

χIi,2j+1=

1

2(χIi−1,j

− 2−(i−1)ϕi−1,j). (52)

These two equations just come from the definition of ϕi−1,j and the fact that Ii−1,j = Ii,2j ∪Ii,2j+1. Induction on i then gives us the following result: Every characteristic functionχIij may be written as a linear combination of Haar functions. Since χJ =

∑mj=n χIij ,

it follows that χJ may be written as a linear combination of Haar functions. Since thecollection of linear combinations of functions of form χJ , where as above J ⊂ [0, 1] a half-open interval with end-points in the dyadic rationals, is dense in L2[0, 1], we conclude thatlinear combinations of Haar functions are also dense in L2[0, 1]. This shows that the collectionϕij is complete.
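To make the Haar system concrete, here is a minimal numerical sketch (hypothetical Python, assuming only numpy; the function name haar and the choice of grid are ours, not part of the text) that evaluates the first few ϕ_{ij} on a dyadic grid and checks orthonormality by Riemann sums.

    import numpy as np

    def haar(i, j, x):
        # phi_00 = 1; for i >= 1, phi_ij = +2^{(i-1)/2} on I_{i,2j},
        # -2^{(i-1)/2} on I_{i,2j+1}, and 0 elsewhere, as in (49).
        if i == 0:
            return np.ones_like(x)
        h = 2.0 ** (-i)
        plus = (x >= 2 * j * h) & (x < (2 * j + 1) * h)
        minus = (x >= (2 * j + 1) * h) & (x < (2 * j + 2) * h)
        return 2.0 ** ((i - 1) / 2) * (plus.astype(float) - minus.astype(float))

    x = np.linspace(0, 1, 2 ** 12, endpoint=False)   # dyadic grid on [0, 1)
    dx = x[1] - x[0]
    idx = [(0, 0)] + [(i, j) for i in range(1, 5) for j in range(2 ** (i - 1))]
    G = np.array([[np.sum(haar(i, j, x) * haar(p, q, x)) * dx for (p, q) in idx]
                  for (i, j) in idx])
    print(np.max(np.abs(G - np.eye(len(idx)))))      # ~0: the phi_ij are orthonormal

Because the grid is dyadic and the functions are piecewise constant on dyadic intervals, the Riemann sums here are exact up to floating-point error.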

5.2 The Main Construction

Start with the collection of Haar functions {ϕ_{ij} : 0 ≤ i < ∞, 0 ≤ j ≤ 2^{i−1} − 1}, and, for each ϕ_{ij}, define the function ψ_{ij} on [0, 1] by

\psi_{ij}(t) = \int_0^t \varphi_{ij}(s)\,ds. (53)

We record a useful set of bounds for the functions ψ_{ij}.

Lemma 5.2. For all i ≥ 1 and all j, 0 ≤ ψ_{ij} ≤ 2^{−(i+1)/2}, and supp(ψ_{ij}) is the closure of I_{i−1,j}. (For i = 0, ψ_{00}(t) = t.)

Proof. This is simply a matter of writing down an explicit expression for ψ_{ij}. By definition, for i ≥ 1, ϕ_{ij}(x) = 2^{(i−1)/2} on I_{i,2j} = [(2j)2^{−i}, (2j+1)2^{−i}), ϕ_{ij}(x) = −2^{(i−1)/2} on I_{i,2j+1} = [(2j+1)2^{−i}, (2j+2)2^{−i}), and ϕ_{ij}(x) = 0 otherwise. Hence,

\psi_{ij}(t) = \begin{cases} 2^{(i-1)/2}\,(t - (2j)2^{-i}) & t \in [(2j)2^{-i}, (2j+1)2^{-i}] \\ 2^{-(i+1)/2} - 2^{(i-1)/2}\,(t - (2j+1)2^{-i}) & t \in [(2j+1)2^{-i}, (2j+2)2^{-i}] \\ 0 & \text{otherwise.} \end{cases} (54)

Therefore ψ_{ij} ≥ 0, and ψ_{ij} attains its maximum value at t = (2j+1)2^{−i}; this maximum value is ψ_{ij}((2j+1)2^{−i}) = 2^{(i−1)/2}\,2^{−i} = 2^{−(i+1)/2}. Note that (54) also shows supp(ψ_{ij}) = [(2j)2^{−i}, (2j+2)2^{−i}], the closure of I_{i−1,j}.


Next, let {Y_{ij}} be a collection of independent N(0, 1) random variables. (By Proposition 2.9 such a collection exists.) As described in our outline, we define V_0(t) = Y_{00}ψ_{00}(t), and

V_i(t) = \sum_{j=0}^{2^{i-1}-1} Y_{ij}\,\psi_{ij}(t), (55)

for each i ≥ 1. Our prototype for the Wiener process is

X_t = \sum_{i=0}^{\infty} V_i(t). (56)

Our next objective is to show that this series indeed converges almost surely to a continuous function in t.
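To see the construction in action, the following sketch (hypothetical Python, assuming only numpy; the names schauder and sample_path are our own) evaluates the partial sums \sum_{i=0}^{N} V_i(t) of (56) on a grid, producing an approximate Brownian path on [0, 1].

    import numpy as np

    def schauder(i, j, t):
        # psi_ij(t) = integral_0^t phi_ij(s) ds: a tent of height 2^{-(i+1)/2}
        # supported on I_{i-1,j}; psi_00(t) = t.
        if i == 0:
            return t
        h = 2.0 ** (-i)
        left, mid, right = 2 * j * h, (2 * j + 1) * h, (2 * j + 2) * h
        up = 2.0 ** ((i - 1) / 2) * (t - left)
        down = 2.0 ** (-(i + 1) / 2) - 2.0 ** ((i - 1) / 2) * (t - mid)
        return (np.where((t >= left) & (t <= mid), up, 0.0)
                + np.where((t > mid) & (t <= right), down, 0.0))

    def sample_path(levels, t, rng):
        # Partial sum sum_{i=0}^{levels} V_i(t) of the series (56).
        X = rng.standard_normal() * schauder(0, 0, t)          # V_0(t)
        for i in range(1, levels + 1):
            for j in range(2 ** (i - 1)):
                X = X + rng.standard_normal() * schauder(i, j, t)
        return X

    rng = np.random.default_rng(0)
    t = np.linspace(0.0, 1.0, 513)
    path = sample_path(12, t, rng)     # one approximate Brownian path on [0, 1]
    print(path[0], path[-1])

Each additional level refines the path on dyadic intervals of half the previous width, which is exactly the convergence that the next lemma makes rigorous.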

Lemma 5.3. With probability 1, \sum_{i=0}^{\infty} V_i(t) converges uniformly to a continuous function on [0, 1].

Proof. The main ingredient is the Borel-Cantelli Lemma (Proposition 2.3). Consider the sequence of sets {A_i}_{i=1}^{∞} defined by A_i = {|V_i(t)| > i^{−2} for some t ∈ [0, 1]}. It is not immediately clear that the sets A_i are measurable. To see that in fact they are, observe that |V_i(t)| > i^{−2} for some t ∈ [0, 1] if and only if |V_i(t)| > i^{−2} for some t ∈ [0, 1] ∩ Q, by continuity of V_i(t) for each fixed ω. Therefore, for each i ≥ 1,

A_i = \bigcup_{t \in [0,1]\cap\mathbb{Q}} \{\omega : |V_i(t)| > i^{-2}\},

which is a countable union of measurable sets and hence measurable.

For i ≥ 1, notice that if |V_i(t)| > i^{−2}, then |Y_{ij}|ψ_{ij}(t) > i^{−2} for some 0 ≤ j ≤ 2^{i−1} − 1, because |V_i(t)| ≤ \sum_j |Y_{ij}|ψ_{ij}(t) and the ψ_{ij}'s have disjoint supports by Lemma 5.2. Then |Y_{ij}|\,2^{−(i+1)/2} > i^{−2} for some j, by the bound obtained in Lemma 5.2. It follows that, for each i ≥ 1,

P(A_i) \le P\bigl(|Y_{ij}|\,2^{-(i+1)/2} > i^{-2} \text{ for some } 0 \le j \le 2^{i-1}-1\bigr) \le 2^{i-1}\,P\bigl(|Z| \ge 2^{(i+1)/2} i^{-2}\bigr) \le 2^{i-1}\exp(-2^i i^{-4}). (57)

Here Z =_d Y_{ij} is an N(0, 1) random variable, and the last line of (57) is obtained from Proposition 3.7. It is easy to see that \sum_i 2^{i-1}\exp(-2^i i^{-4}) converges by, for example, the root test: the i-th root of the i-th term is 2^{(i-1)/i}\exp(-2^i i^{-5}) → 0 as i → ∞. Thus \sum_{i=1}^{\infty} P(A_i) < ∞, and by the Borel-Cantelli Lemma, P(\cap_{k=0}^{\infty}\cup_{i=k}^{\infty} A_i) = 0. In other words, with probability 1, there are only finitely many i ≥ 1 such that |V_i(t)| ≥ i^{−2} for some t ∈ [0, 1]. Since \sum_i i^{-2} < ∞, the Weierstrass M-test then shows that, with probability 1, \sum_{i=0}^{\infty} V_i(t) converges uniformly on [0, 1]. Since each of the V_i(t)'s is continuous in t, uniform convergence implies that \sum_{i=0}^{\infty} V_i(t) is continuous in t. This completes the proof.
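As a quick sanity check on the bound in (57) (a purely illustrative Python snippet, assuming numpy), the terms 2^{i−1} exp(−2^i i^{−4}) grow for moderate i but then collapse to zero extremely fast, so the series of bounds is summable.

    import numpy as np

    i = np.arange(1, 41, dtype=float)
    terms = 2.0 ** (i - 1) * np.exp(-(2.0 ** i) / i ** 4)   # the final bound in (57)
    print(terms[:5])         # modest at first
    print(terms[-5:])        # numerically zero for large i
    print(terms.sum())       # the series of bounds is finite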

Let X_t = \sum_{i=0}^{\infty} V_i(t). In the next two lemmas, we first show that X_t has finite variance (i.e. X_t ∈ L^2(P) for each fixed t), and then we obtain a more precise result.

Lemma 5.4. For all t ∈ [0, 1], Var(X_t) < ∞.

Proof. To begin, observe that for any integer N > 0 and t ∈ [0, 1],

\Bigl( \sum_{i=0}^{N} V_i(t) \Bigr)^2 = \sum_{i,k=0}^{N} V_i(t) V_k(t) = \sum_{i,k=0}^{N} \sum_{\substack{0 \le j \le 2^{i-1}-1 \\ 0 \le l \le 2^{k-1}-1}} Y_{ij} Y_{kl}\,\psi_{ij}(t)\,\psi_{kl}(t). (58)

Note that E[Y_{ij}Y_{kl}] = 1 if (i, j) = (k, l), since the Y_{ij} are standard normal, and otherwise E[Y_{ij}Y_{kl}] = 0, by independence. Thus, taking expected values yields

E\Bigl[ \Bigl( \sum_{i=0}^{N} V_i(t) \Bigr)^2 \Bigr] = \sum_{i=0}^{N} \sum_{j=0}^{2^{i-1}-1} (\psi_{ij}(t))^2 \le t^2 + \sum_{i=1}^{N} 2^{-(i+1)} \le 2, (59)

here using Lemma 5.2 together with the fact that, for each fixed level i ≥ 1, the ψ_{ij}'s have disjoint supports. Therefore, by Fatou's Lemma, E[X_t^2] \le \liminf_{N\to\infty} E[(\sum_{i=0}^{N} V_i(t))^2] \le 2 < ∞, as required.

Lemma 5.5. For all s, t ∈ [0, 1], E[XsXt] = min(s, t).

This is the one step of our proof that requires tools from Hilbert space theory. The space L^2[0, 1] may be equipped with the real Hilbert space inner product

\langle f, g \rangle = \int_0^1 f(t)\,g(t)\,dt, \qquad f, g \in L^2[0, 1]. (60)

Note that, if {e_j}_{j=1}^{\infty} is a complete orthonormal system in L^2[0, 1], then for any function f ∈ L^2[0, 1],

\sum_{j=1}^{N} \langle f, e_j \rangle\, e_j \to f \quad \text{in } L^2 \text{ as } N \to \infty. (61)

We will apply this result to the collection of Haar functions {ϕ_{ij}}, which by Proposition 5.1 is a complete orthonormal system for L^2[0, 1].

Proof of Lemma 5.5. Let s, t ∈ [0, 1]. First, we claim

E[X_s X_t] = \sum_{i,j} \langle 1_{[0,s]}, \varphi_{ij} \rangle \langle 1_{[0,t]}, \varphi_{ij} \rangle. (62)

To justify this equality, observe that for any integer N > 0,

E\Bigl[ \Bigl( \sum_{i=0}^{N} V_i(s) \Bigr) \Bigl( \sum_{k=0}^{N} V_k(t) \Bigr) \Bigr] = \sum_{i,k=0}^{N} E[V_i(s) V_k(t)] = \sum_{i,k=0}^{N} \sum_{\substack{0 \le j \le 2^{i-1}-1 \\ 0 \le l \le 2^{k-1}-1}} E[Y_{ij} Y_{kl}\,\psi_{ij}(s)\,\psi_{kl}(t)] = \sum_{i=0}^{N} \sum_{j=0}^{2^{i-1}-1} \psi_{ij}(s)\,\psi_{ij}(t), (63)

where the last equality follows from the fact that the Y_{ij}'s are independent standard normal random variables, so E[Y_{ij}Y_{kl}] = 1 if (i, j) = (k, l) and equals 0 otherwise. For i ≥ 0 and 0 ≤ j ≤ 2^{i−1} − 1, observe that by definition ψ_{ij}(t) = ⟨1_{[0,t]}, ϕ_{ij}⟩. Thus equality (62) will be proved if we can show that lim_{N→∞} E[(\sum_{i=0}^{N} V_i(s))(\sum_{k=0}^{N} V_k(t))] = E[X_s X_t]. Note that lim_{N→∞} (\sum_{i=0}^{N} V_i(s))(\sum_{k=0}^{N} V_k(t)) = X_s X_t almost surely. Thus, if we take 2|X_s X_t| as our dominating function (and we can, because X_s, X_t ∈ L^2(P) by Lemma 5.4), the desired limit is implied by dominated convergence.

Next, using bilinearity of the inner product and applying (61) to equation (62), we compute

E[X_s X_t] = \Bigl\langle 1_{[0,t]}, \sum_{i,j} \langle 1_{[0,s]}, \varphi_{ij} \rangle\, \varphi_{ij} \Bigr\rangle = \langle 1_{[0,t]}, 1_{[0,s]} \rangle = \min(s, t). (64)

This completes the proof.
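Lemma 5.5 can also be checked numerically by truncating the sum in (63) (a hypothetical Python sketch assuming numpy; psi and truncated_cov are our names): the partial sums \sum_{i\le N}\sum_j \psi_{ij}(s)\psi_{ij}(t) are already very close to min(s, t) for moderate N.

    import numpy as np

    def psi(i, j, t):
        # The Schauder function psi_ij of (53)-(54), evaluated at a single t.
        if i == 0:
            return t
        h = 2.0 ** (-i)
        left, mid, right = 2 * j * h, (2 * j + 1) * h, (2 * j + 2) * h
        if t <= left or t >= right:
            return 0.0
        if t <= mid:
            return 2.0 ** ((i - 1) / 2) * (t - left)
        return 2.0 ** (-(i + 1) / 2) - 2.0 ** ((i - 1) / 2) * (t - mid)

    def truncated_cov(s, t, levels):
        # Partial sum of (63): sum over i <= levels and all j of psi_ij(s) psi_ij(t).
        total = psi(0, 0, s) * psi(0, 0, t)
        for i in range(1, levels + 1):
            for j in range(2 ** (i - 1)):
                total += psi(i, j, s) * psi(i, j, t)
        return total

    for s, t in [(0.3, 0.7), (0.5, 0.5), (0.25, 0.9)]:
        print(s, t, truncated_cov(s, t, 16), min(s, t))   # the two values nearly agree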

For t ≥ 1, define Y_t = tX_{1/t}. Let X' = {X'_t}_{t∈[0,1]} be an independent copy, with a.s. continuous paths, of the process X, as in Proposition 2.19. (In other words, X' is a process with a.s. continuous paths such that, for all s, t ∈ [0, 1], X'_t =_d X_t, and X'_s and X_t are independent.) For t ∈ [0, ∞), define

Z_t = \begin{cases} X'_t & t \in [0, 1] \\ X'_1 + Y_t - Y_1 & t \in (1, \infty). \end{cases} (65)

Our final step is to show the following.

Theorem 5.6. Z = {Z_t}_{t≥0} is a Wiener process.

Proof. By Proposition 4.4, it is enough to prove that (i) the paths of Z are almost surely continuous, (ii) Z is a Gaussian process, and (iii) for all s, t ∈ [0, ∞), E[Z_s] = 0 and E[Z_s Z_t] = min(s, t).

(i) By Lemma 5.3 and the definitions of X'_t and Y_t, the paths of Z are, with probability 1, continuous at all t ≠ 1. To see that the paths of Z are almost surely continuous at 1, observe that lim_{t→1−} Z_t = lim_{t→1−} X'_t = X'_1 = Z_1 a.s., by a.s. continuity of the paths of X'. Also, lim_{t→1+} Z_t = lim_{t→1+} (X'_1 + Y_t − Y_1) = X'_1 = Z_1 a.s., by a.s. continuity of the paths of Y. Hence Z is almost surely continuous at t = 1. This shows that the paths of Z are almost surely continuous.

(ii) For each fixed t, Z_t is a linear combination of Gaussian random variables and hence is Gaussian by Proposition 3.6. Given distinct times t_1, ..., t_m ∈ [0, ∞), each component of the random vector [Z_{t_1}, ..., Z_{t_m}]^T is a linear combination of members of the jointly Gaussian family formed by X and its independent copy X', and hence [Z_{t_1}, ..., Z_{t_m}]^T is multivariate Gaussian. This shows that the finite-dimensional distributions of Z are Gaussian, so Z is a Gaussian process.

(iii) Suppose s, t ∈ [0, ∞). Since each Z_u is a linear combination of mean-zero Gaussian random variables, E[Z_u] = 0. We may assume without loss of generality that s < t. If s, t ≤ 1, the result follows from Lemma 5.5. There are two cases left to check.

Case 1: s ≤ 1 and t > 1. Then Z_s Z_t = X'_s(X'_1 + Y_t − Y_1) = X'_s X'_1 + t X'_s X_{1/t} − X'_s X_1. Thus, using independence and Lemma 5.5, we have

E[Z_s Z_t] = E[X'_s X'_1] + t\,E[X'_s]E[X_{1/t}] − E[X'_s]E[X_1] = E[X'_s X'_1] = s. (66)

Case 2: 1 < s < t. Then

Z_s Z_t = (X'_1 + sX_{1/s} − X_1)(X'_1 + tX_{1/t} − X_1) = (X'_1)^2 + tX'_1 X_{1/t} − X'_1 X_1 + sX_{1/s}X'_1 + st\,X_{1/s}X_{1/t} − sX_{1/s}X_1 − X_1 X'_1 − tX_1 X_{1/t} + X_1^2.

By independence, E[X_1 X'_1] = E[X'_1 X_{1/t}] = E[X_{1/s} X'_1] = 0. Hence, by Lemma 5.5,

E[Z_s Z_t] = E[(X'_1)^2] + st\,E[X_{1/s}X_{1/t}] − s\,E[X_{1/s}X_1] − t\,E[X_1 X_{1/t}] + E[X_1^2] = 1 + st\cdot\tfrac{1}{t} − s\cdot\tfrac{1}{s} − t\cdot\tfrac{1}{t} + 1 = s. (67)

Hence, in all cases E[Z_s Z_t] = min(s, t) = s, and this completes the proof.
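The time-inversion step can also be checked by simulation (a hypothetical Python sketch assuming numpy and the standard variance-t normalization of the Wiener process; the particular times s, t and the sample size are our choices): sampling (X_{1/t}, X_{1/s}, X_1) from independent Gaussian increments together with an independent copy's X'_1, and forming Z as in (65), gives E[Z_s Z_t] ≈ min(s, t).

    import numpy as np

    rng = np.random.default_rng(1)
    n = 200_000
    s, t = 2.0, 5.0                                  # two times in (1, infinity), s < t

    # X is a Brownian motion on [0, 1]; we only need it at times 1/t < 1/s < 1.
    X_1t = np.sqrt(1 / t) * rng.standard_normal(n)                    # X_{1/t}
    X_1s = X_1t + np.sqrt(1 / s - 1 / t) * rng.standard_normal(n)     # X_{1/s}
    X_1 = X_1s + np.sqrt(1 - 1 / s) * rng.standard_normal(n)          # X_1
    Xp_1 = rng.standard_normal(n)                                     # X'_1, independent copy

    # Z_u = X'_1 + Y_u - Y_1 with Y_u = u X_{1/u} and Y_1 = X_1, as in (65).
    Z_s = Xp_1 + s * X_1s - X_1
    Z_t = Xp_1 + t * X_1t - X_1

    print(np.mean(Z_s * Z_t), min(s, t))   # Monte Carlo estimate vs. s = 2
    print(np.var(Z_t), t)                  # Var(Z_t) vs. t = 5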

6 Markov Property

6.1 Conditional Expectation

The Markov property is essentially a statement about the conditional expectation of Brownian motion with respect to certain σ-algebras. We therefore first need to define conditional expectation and describe its basic properties.

Definition 6.1 (Conditional expectation). Let (Ω, F, P) be a probability space, X an F-measurable random variable, and G ⊂ F a sub-σ-algebra. If X ∈ L^1(P), the conditional expectation of X given G, denoted by E[X|G], is a G-measurable random variable Y ∈ L^1(P) which satisfies

\int_G X\,dP = \int_G Y\,dP \quad \text{for all } G \in \mathcal{G}. (68)

This definition is of little value unless we prove the following.

Proposition 6.2. If X ∈ L^1(P) and G ⊂ F is a sub-σ-field, then (i) the conditional expectation of X given G exists, and (ii) it is unique up to almost sure equivalence.

Remark 6.3. Proposition 6.2 expresses the fact that conditional expectation is defined up to almost sure equivalence. Hence, technically, any equality involving E[X|G] will only hold up to almost sure equivalence. However, the standard convention is to write Y = E[X|G] with almost sure equivalence understood, rather than to write Y = E[X|G] a.s., even though the latter is perhaps more correct.

We can establish uniqueness (ii) immediately.

Proof of Proposition 6.2(ii). Suppose Y, Y' are two G-measurable functions in L^1(P) satisfying (68). Let ε > 0, and let A = {ω : Y − Y' ≥ ε}. Then A ∈ G, and

0 = \int_A (X - X)\,dP = \int_A (Y - Y')\,dP \ge \varepsilon P(A). (69)

Hence P(A) = 0. Since ε > 0 is arbitrary, it follows that P(Y > Y') = 0, i.e. Y ≤ Y' a.s. Reversing the roles of Y and Y', we get Y' ≤ Y a.s. Hence Y = Y' a.s.


The existence of conditional expectation is a somewhat deeper matter, which relies on the Radon-Nikodym theorem from analysis. If µ and ν are two σ-finite measures on a σ-algebra F, we say that µ is absolutely continuous with respect to ν (write µ ≪ ν) if, for any N ∈ F, ν(N) = 0 implies µ(N) = 0. The special case of the Radon-Nikodym theorem which we need is the following result.

Theorem 6.4 (Radon-Nikodym theorem). Suppose µ and ν are σ-finite measures on F such that µ ≪ ν. Then there exists a real-valued F-measurable function f such that

\int_F f\,d\nu = \mu(F) \quad \text{for all } F \in \mathcal{F}. (70)

For a proof of this result, we refer the reader to Durrett [3, Theorem A.4.6], or Rudin [7, Theorem 6.10], or Folland [4, Theorem 3.8]. Using this result, the proof of the first part of Proposition 6.2 is fairly simple.

Proof of Proposition 6.2(i). First, let us assume that X ≥ 0. We define the set function µ on G by

\mu(G) = \int_G X\,dP \qquad (G \in \mathcal{G}). (71)

One easily verifies that µ is a measure on G: µ(∅) = 0 by definition of the integral, and countable additivity follows from dominated convergence. Also, by basic properties of integrals, if P(G) = 0 for some G ∈ G, then µ(G) = 0. Hence µ is absolutely continuous with respect to the measure P|_G. Therefore, by the Radon-Nikodym theorem, there exists a G-measurable random variable Y such that

\int_G X\,dP = \mu(G) = \int_G Y\,dP \quad \text{for all } G \in \mathcal{G}. (72)

Hence Y = E[X|G].

To obtain the result in general, write X = X_+ − X_−, where X_+ = X if X ≥ 0 and X_+ = 0 otherwise, and X_− = |X| − X_+. Let Y_+ = E[X_+|G] and Y_− = E[X_−|G], and let Y = Y_+ − Y_−. Then, for any G ∈ G,

\int_G X\,dP = \int_G X_+\,dP - \int_G X_-\,dP = \int_G Y_+\,dP - \int_G Y_-\,dP = \int_G Y\,dP. (73)

Hence Y = E[X|G], which is what we needed to show.
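For intuition, the following sketch (hypothetical Python with numpy; the finite sample space and partition are our own toy setup) computes E[X|G] when G is generated by a finite partition of Ω: on each cell, the conditional expectation is the P-weighted average of X over that cell, and the defining property (68) can be checked directly.

    import numpy as np

    rng = np.random.default_rng(2)
    n = 12
    p = np.full(n, 1.0 / n)                # P on the finite sample space {0, ..., 11}
    X = rng.standard_normal(n)             # an F-measurable random variable
    cells = [np.arange(0, 4), np.arange(4, 6), np.arange(6, 12)]   # partition generating G

    # E[X|G] is constant on each cell, equal to the P-weighted average of X there.
    Y = np.empty(n)
    for c in cells:
        Y[c] = np.sum(X[c] * p[c]) / np.sum(p[c])

    # Check the defining property (68) on each generator G of the sigma-algebra.
    for c in cells:
        print(np.sum(X[c] * p[c]), np.sum(Y[c] * p[c]))   # the two integrals agree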

Next we prove some basic properties of conditional expectation. In the statement of the next proposition, we will use the notation X ∈ L^1(G), where G is any sub-σ-field of F, to indicate that X is an integrable, G-measurable random variable.

Proposition 6.5. Suppose G is a sub-σ-algebra of F. Then

a. Conditional expectation is linear, i.e. E[aX + bY |G] = aE[X|G] + bE[Y |G] for all scalars a, b and all X, Y ∈ L^1(F).

b. Suppose X ∈ L^1(G) and Y ∈ L^1(F). Then E[XY |G] = XE[Y |G].


c. If X ∈ L1(G), then E[X|G] = X.

d. If X ∈ L1(F) is independent of G, then E[X|G] = E[X].

e. Suppose H is a sub-σ-algebra of G, and X ∈ L^1(F). Then E[E[X|G]|H] = E[E[X|H]|G] = E[X|H].

The fact (e) is sometimes stated informally as “the smallest σ-algebra always wins.”

Proof. (a) Let Z_1 = E[X|G] and Z_2 = E[Y|G]. Then, for any G ∈ G,

\int_G (aX + bY)\,dP = a\int_G X\,dP + b\int_G Y\,dP = a\int_G Z_1\,dP + b\int_G Z_2\,dP = \int_G (aZ_1 + bZ_2)\,dP, (74)

as required.

(b) First, suppose X = 1_A for some A ∈ G, and let Z = E[XY|G] and Z' = E[Y|G]. Then, for any G ∈ G,

\int_G Z\,dP = \int_{G\cap A} Y\,dP = \int_{G\cap A} Z'\,dP = \int_G X Z'\,dP. (75)

Since XZ' is G-measurable, this shows that Z = XZ'. Hence, by linearity of conditional expectation, E[XY|G] = XE[Y|G] whenever X is a G-measurable simple function. Now take X to be an arbitrary random variable in L^1(G). By a basic fact from integration theory, there exists a sequence of G-measurable simple functions X_j, with |X_j| ≤ |X|, such that X_j → X as j → ∞. Then, by dominated convergence, if we let Z and Z' be taken as above,

\int_G Z\,dP = \int_G XY\,dP = \lim_{j\to\infty} \int_G X_j Y\,dP = \lim_{j\to\infty} \int_G X_j Z'\,dP = \int_G X Z'\,dP, (76)

as required.

(c) This is a special case of (b), with Y ≡ 1.

(d) For any G ∈ G, X and 1_G are independent random variables, and hence by Proposition 2.6,

\int_G X\,dP = E[X 1_G] = E[X]\,E[1_G] = \int_G E[X]\,dP, (77)

as required.

(e) The fact that E[E[X|H]|G] = E[X|H] follows from part (c), since E[X|H] is H-measurable and hence G-measurable. As for the first equality, let H ∈ H, and observe that since H ∈ H ⊂ G,

\int_H E[E[X|G]|H]\,dP = \int_H E[X|G]\,dP = \int_H X\,dP. (78)

Since E[E[X|G]|H] is H-measurable, this shows that E[E[X|G]|H] = E[X|H].


6.2 Stopping Times

We will need to know about stopping times in order to describe the strong Markov property. To define stopping times, we must first introduce filtrations, which are also needed to define the Markov properties.

Definition 6.6 (Filtration). Let (Ω, F, P) be a probability space. A filtration is a collection of σ-algebras {F_t}_{t≥0} such that

F_s ⊂ F_t ⊂ F whenever s ≤ t. (79)

In addition, we say that a filtration {F_t}_{t≥0} is right-continuous if, for all t ≥ 0, F_t = ∩_{s>t} F_s.

Example 6.7 (Filtration of a stochastic process). A common way in which filtrations arise is as follows. If Y = {Y_t}_{t≥0} is a stochastic process, one defines F_t to be the σ-algebra generated by the collection of random variables {Y_s : 0 ≤ s ≤ t} (i.e. F_t is the smallest σ-algebra for which each Y_s, for s ≤ t, is measurable). The collection {F_t}_{t≥0} clearly satisfies the above definition. Intuitively, F_t represents the package of "potential information" we may extract from Y by the time t.

We are now ready to define stopping times. Let {F_t}_{t≥0} be a fixed filtration of σ-algebras.

Definition 6.8 (Stopping time). A random variable T : Ω → [0, ∞) is said to be a stopping time if, for all t ≥ 0, {ω : T(ω) < t} ∈ F_t.

Next we prove some elementary properties of stopping times.

Proposition 6.9.

a. T is a stopping time if and only if {ω : T(ω) ≤ t} ∈ F_t for all t ≥ 0.

b. Any fixed time a ≥ 0 is a stopping time.

c. If T and T' are stopping times, then T_M = max(T, T') and T_m = min(T, T') are stopping times.

d. If {T_n} is a sequence of stopping times, then sup_n T_n and inf_n T_n are stopping times.

e. If S and T are stopping times, then so is S + T .

Proof. (a) Observe that T ≤ t if and only if T < q for all rational numbers q > t. Hence, if T is a stopping time, then {ω : T(ω) ≤ t} = ∩_{q ∈ (t,∞)∩Q} {ω : T(ω) < q} ∈ F_t. Conversely, suppose {ω : T(ω) ≤ t} ∈ F_t for all t ≥ 0. Observe that T < t if and only if T ≤ q for some rational q < t. Hence {ω : T(ω) < t} = ∪_{q ∈ [0,t)∩Q} {ω : T(ω) ≤ q} ∈ F_t. Thus T is a stopping time.

(b) For any t ≥ 0, {ω : a < t} is equal to ∅ or Ω, both of which are contained in F_t.

(c) Observe that {ω : T_M(ω) < t} = {ω : T(ω) < t} ∩ {ω : T'(ω) < t} ∈ F_t, and {ω : T_m(ω) < t} = {ω : T(ω) < t} ∪ {ω : T'(ω) < t} ∈ F_t.

(d) Write S = sup_n T_n and I = inf_n T_n. Observe that S ≤ t if and only if T_n ≤ t for all n. Hence {ω : S(ω) ≤ t} = ∩_n {ω : T_n(ω) ≤ t} ∈ F_t, so S is a stopping time by (a). Also, I < t if and only if T_n < t for some n. Hence {ω : I(ω) < t} = ∪_n {ω : T_n(ω) < t} ∈ F_t.


(e) Observe that

\{\omega : S(\omega) + T(\omega) < t\} = \bigcup_{q \in \mathbb{Q}\cap[0,\infty)} \bigl( \{\omega : q + T(\omega) < t\} \cap \{\omega : S(\omega) < q\} \bigr). (80)

Note that {ω : q + T(ω) < t} = {ω : T(ω) < t − q} ∈ F_t (if q > t, then this set is the empty set). Hence {ω : S(ω) + T(ω) < t} ∈ F_t.
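For a concrete example (a hypothetical Python sketch with numpy; the discretization is ours), the first time a simulated random-walk approximation of Brownian motion reaches a level a is a stopping time: whether it has happened by time t can be read off from the path up to time t. The minimum and maximum of two such times are again stopping times, as in Proposition 6.9(c).

    import numpy as np

    rng = np.random.default_rng(3)
    n, horizon = 10_000, 10.0
    dt = horizon / n
    t_grid = np.arange(1, n + 1) * dt
    W = np.cumsum(np.sqrt(dt) * rng.standard_normal(n))   # discretized Brownian path

    def hitting_time(path, level):
        # First grid time at which the path reaches `level` (infinity if it never does).
        hits = np.nonzero(path >= level)[0]
        return t_grid[hits[0]] if hits.size else np.inf

    Ta = hitting_time(W, 1.0)
    Tb = hitting_time(W, 2.0)
    print(Ta, Tb, min(Ta, Tb), max(Ta, Tb))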

6.3 The Wiener Process as a random variable taking values in C([0,∞), R^n)

Let us now introduce another point of view from which to consider the Wiener process. Suppose X = {X_t}_{t≥0} is a Wiener process in R^n starting from a point x. While before we had avoided assigning any definite meaning to the measure space (Ω, F, P) on which X is defined, we will see that a rather nice situation arises when we identify Ω with C = C([0,∞), R^n), the set of continuous functions from [0,∞) into R^n. This identification is accomplished as follows:

For each t ≥ 0, let π_t : C → R^n be the usual projection map (i.e. π_t(f) = f(t) for all f ∈ C). Let C be the σ-algebra of subsets of C generated by the projection maps π_t. If X is a Wiener process in R^n starting from x, define a set function P^x on C by

P^x(A) = P(X \in A) \qquad (A \in \mathcal{C}). (81)

To show that the set {ω : X(ω) ∈ A} = X^{−1}A belongs to F, it is enough to prove X^{−1}A ∈ F when A is in a generating set for C. The σ-algebra C is generated by sets of the form π_t^{−1}B, where t ≥ 0 and B ∈ B^n. Hence X^{−1}(π_t^{−1}B) = (π_t ∘ X)^{−1}B = X_t^{−1}B ∈ F, because X_t is a measurable random variable from Ω into R^n. Thus the definition above of P^x makes sense.

The fact that P^x is a measure is easy to see: first, P^x(∅) = P(∅) = 0; second, if {A_j}_{j=1}^{∞} is a countable disjoint collection of sets in C, then P^x(∪_j A_j) = P(∪_j {ω : X ∈ A_j}) = \sum_{j=1}^{\infty} P(X ∈ A_j) = \sum_{j=1}^{\infty} P^x(A_j), here using countable additivity of P and the fact that the sets {ω : X ∈ A_j} are disjoint.

For each t ∈ [0, ∞), let Y_t = π_t be the projection map on C defined above.

Proposition 6.10. The process Y = {Y_t}_{t≥0}, defined on the probability space (C, C, P^x), is a Wiener process in R^n starting from x.

Remark 6.11. We can regard Y as a random variable from C into the product space M^T, where M = R^n and T = [0,∞), and M^T is equipped with the product σ-algebra \mathcal{M}^T, where \mathcal{M} = B^n is the collection of Borel sets (see §2.5). Indeed, the σ-algebra \mathcal{M}^T is generated by sets of the form π_t^{−1}B, where B ∈ \mathcal{M} and t ≥ 0. Hence {f ∈ C : Y(f) ∈ π_t^{−1}B} = {f ∈ C : f(t) ∈ B} = π_t^{−1}B ∈ C, and it follows that Y is measurable. In particular, it is clear that Y satisfies the official Definition 2.13 of a stochastic process.

Proof. The paths of Y are just the elements of C and hence continuous. To see that the finite-dimensional distributions of Y agree with those of X, suppose t_1, ..., t_m are distinct elements of [0, ∞), and suppose B_1, ..., B_m are Borel subsets of R^n. Then

P^x(Y_{t_1} \in B_1, ..., Y_{t_m} \in B_m) = P^x\Bigl( \bigcap_{j=1}^{m} Y_{t_j}^{-1} B_j \Bigr) = P\Bigl( X \in \bigcap_{j=1}^{m} Y_{t_j}^{-1} B_j \Bigr) = P\bigl( Y_{t_j}(X(\omega)) \in B_j \text{ for } j = 1, ..., m \bigr) = P(X_{t_1} \in B_1, ..., X_{t_m} \in B_m). (82)

Thus the finite-dimensional distributions of X and Y agree, and the result follows by Proposition 4.3.

The content of Proposition 6.10 is that, to understand the Wiener process, the probability space (Ω, F, P), on which the Wiener process X starting from x is defined, may be replaced by a probability space (C, C, P^x), on which the Wiener process takes an especially simple form as a collection of evaluation maps. The features of the original process X could be quite complicated depending on the measure P, with respect to which it was constructed. Within the new framework, arbitrary features arising from P are hidden away.

This new point of view is so convenient that we will work with it exclusively for the rest of this paper. Rather than breaking with our previous notation for the Wiener process, and using Y, C, and C from now on, we establish the following

Convention. From now on, we will use Ω to denote the space C = C([0,∞), R^n) defined above, and we will use F to denote the σ-algebra C. The elements ω ∈ Ω are thus continuous functions from [0,∞) into R^n. A Wiener process starting from x ∈ R^n will be understood to be the pair (X, P^x), where P^x is the measure defined above and X = {X_t}_{t≥0}, where each X_t is the evaluation map ω ↦ ω(t) (ω ∈ Ω).

6.4 The Weak Markov Property

Let X be a Wiener process in R^n. We associate with X two filtrations. First, for t ≥ 0, we define

F_t^0 = \sigma(X_s : s \le t), (83)

that is, the σ-algebra generated by the collection of random variables X_s such that s ≤ t. Second, for t ≥ 0, we define the σ-algebras

F_t = \bigcap_{\varepsilon > 0} F_{t+\varepsilon}^0. (84)

Clearly the collections of σ-algebras {F_t^0}_{t≥0} and {F_t}_{t≥0} are both filtrations, and the latter is a right-continuous filtration. To paraphrase Durrett, the σ-algebra F_t^0 represents the information about X we will know by time t, and F_t represents the information about X we will know by time t if we are allowed "an infinitesimal peek into the future" [3, p. 307].

The weak Markov property is the following result.

Theorem 6.12 (Weak Markov property). Suppose f is a bounded, Borel measurable function, and define a process Y = f(X). Then

E^x[Y \circ \tau_s \mid \mathcal{F}_s] = E^{X_s} Y \qquad P^x\text{-a.s.} (85)


To explain the notation, E^{X_s}Y is the random variable ϕ(X_s), where ϕ(y) = E^y Y. To prove the theorem, we first prove the result for the case when f is a complex exponential function. We then extend this result to a larger class of functions using the Fourier inversion theorem. Finally, we obtain the general result using a simple dominated convergence argument.

Lemma 6.13. If f(x) = e^{−2πiξ·x} for some ξ ∈ R^n, then the conclusion of Theorem 6.12 holds true.

Proof. First, we show the result holds if we replace the σ-algebra F_s with F_s^0. Indeed,

E^x[Y \circ \tau_s \mid \mathcal{F}_s^0] = E^x[e^{-2\pi i \xi\cdot X_{t+s}} \mid \mathcal{F}_s^0] = E^x[e^{-2\pi i \xi\cdot (X_{t+s}-X_s)}\,e^{-2\pi i \xi\cdot X_s} \mid \mathcal{F}_s^0] = E[e^{-2\pi i \xi\cdot (X_{t+s}-X_s)}]\,e^{-2\pi i \xi\cdot X_s} = e^{-\pi|\xi|^2 t}\,e^{-2\pi i \xi\cdot X_s}. (86)

The second-to-last equality above follows from independence of increments together with Proposition 6.5(b) and (d), and the last equality follows from Corollary 3.14. On the other hand, by the same corollary, for any y ∈ R^n,

E^y Y = E^y e^{-2\pi i \xi\cdot X_t} = e^{-\pi|\xi|^2 t}\,e^{-2\pi i \xi\cdot y}. (87)

Thus, setting y = X_s and comparing with (86), we see that E^x[Y ∘ τ_s | F_s^0] = E^{X_s} Y.

To obtain the same result for the σ-algebra F_s, we must show that, for any F ∈ F_s,

\int_F Y \circ \tau_s \, dP^x = \int_F E^{X_s} Y \, dP^x. (88)

If F ∈ F_s, then for all ε > 0, F ∈ F_{s+ε}^0. Hence, by the result we just established, for all ε > 0,

\int_F Y \circ \tau_{s+\varepsilon} \, dP^x = \int_F E^{X_{s+\varepsilon}}[Y] \, dP^x. (89)

Note that Y ∘ τ_{s+ε} = e^{−2πiξ·X_{s+t+ε}} → e^{−2πiξ·X_{s+t}} pointwise as ε → 0, by continuity. Hence, by dominated convergence, the left side of equation (89) approaches \int_F Y ∘ τ_s \, dP^x as ε → 0. On the other hand, by equation (87), E^{X_{s+ε}}[Y] = e^{−π|ξ|^2 t}\,e^{−2πiξ·X_{s+ε}} → e^{−π|ξ|^2 t}\,e^{−2πiξ·X_s} = E^{X_s}[Y] pointwise as ε → 0. Hence, by dominated convergence, the right-hand side of equation (89) approaches \int_F E^{X_s}[Y] \, dP^x as ε → 0. Thus the equality (88) is proved.

Lemma 6.14. If f is bounded and has compact support, then the conclusion of Theorem 6.12 holds true.

Proof. Let K = supp(f). By assumption |f| ≤ c for some c > 0, so \int |f| ≤ c\,m(K) < ∞. Thus f ∈ L^1. Moreover, taking the inverse Fourier transform of f, we have |f^∨| ≤ \int |f| ≤ c\,m(K) < ∞, so f^∨ is also bounded. Therefore, by Corollary 7.9 there exists a sequence {f_j}_{j=1}^{∞} of smooth L^1 functions such that f_j → f a.e., each f_j^∨ ∈ L^1, and f_j^∨ → f^∨ pointwise. Moreover, by the inversion formula (Theorem 7.6), for all j, (f_j^∨)^\wedge = f_j a.e.

Define Y_j = f_j(X). Then Y_j → Y P^x-a.s. By this we mean that, except for ω in a P^x-null set, Y_j(ω) → Y(ω) pointwise as a sequence of continuous functions on [0,∞). To justify this statement, let N ∈ B^n be the null set where f_j ↛ f. Observe that Y_j(ω) ↛ Y(ω) if and only if, for some t ∈ [0,∞), X_t(ω) ∈ N. If t > 0, since the distribution of X_t is given by a normal density function, it follows that P^x(X_t ∈ N) = 0. By possibly redefining the f_j's on a null set, we may assume that x ∉ N, in which case we also have P^x(X_t ∈ N) = 0 when t = 0.

Given F ∈ F_s and s ≥ 0, we use the fact that f_j = (f_j^∨)^\wedge and apply Fubini's theorem several times to compute

\int_F Y_j \circ \tau_s \, dP^x = \int_F f_j(\tau_s X)\,dP^x = \int_F \int_{\mathbb{R}^n} e^{-2\pi i \xi\cdot \tau_s X}\,f_j^\vee(\xi)\,d\xi\,dP^x
= \int_{\mathbb{R}^n} \Bigl( \int_F e^{-2\pi i \xi\cdot \tau_s X}\,dP^x \Bigr) f_j^\vee(\xi)\,d\xi
= \int_{\mathbb{R}^n} \Bigl( \int_F E^{X_s}[e^{-2\pi i \xi\cdot X}]\,dP^x \Bigr) f_j^\vee(\xi)\,d\xi \quad \text{(by Lemma 6.13)}
= \int_F \int_{\mathbb{R}^n} E^{X_s}[e^{-2\pi i \xi\cdot X}]\,f_j^\vee(\xi)\,d\xi\,dP^x
= \int_F E^{X_s}\Bigl[ \int_{\mathbb{R}^n} e^{-2\pi i \xi\cdot X}\,f_j^\vee(\xi)\,d\xi \Bigr] dP^x
= \int_F E^{X_s} f_j(X)\,dP^x = \int_F E^{X_s} Y_j\,dP^x. (90)

Note that by dominated convergence lim_j E^y Y_j = E^y Y for any y ∈ R^n. Hence lim_j E^{X_s} Y_j = E^{X_s} Y. Thus, again by dominated convergence, the first and last quantities appearing in (90) converge to \int_F Y \circ \tau_s \, dP^x and \int_F E^{X_s} Y \, dP^x respectively, as j → ∞, and this proves the lemma.

With these two results in hand, the theorem follows relatively easily.

Proof of Theorem 6.12. Let f be any bounded measurable function, and for each integer n ≥ 0, let B_n be the ball of radius n centered at zero. Let f_n = fχ_{B_n}, and let Y_n = f_n(X). Then f_n → f pointwise, and hence Y_n → Y pointwise. Given F ∈ F_s and s ≥ 0, since each f_n is bounded and has compact support, by the previous lemma we have

\int_F Y_n \circ \tau_s \, dP^x = \int_F E^{X_s} Y_n \, dP^x. (91)

Note that by dominated convergence lim_n E^y Y_n = E^y Y for any y ∈ R^n, and in particular lim_n E^{X_s} Y_n = E^{X_s} Y. Thus, again by dominated convergence, the quantities on the left and right in equation (91) converge to \int_F Y \circ \tau_s \, dP^x and \int_F E^{X_s} Y \, dP^x respectively, and the theorem is proved.
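Numerically, (85) says that the conditional distribution of the path after time s depends on the past only through X_s. A rough Monte Carlo check (hypothetical Python with numpy; we take n = 1, f = cos, fixed times s and t, and the standard variance-t normalization, none of which are prescribed by the text) compares the average of f(X_{s+t}) over paths with X_s near a point y against a fresh estimate of E^y[f(X_t)].

    import numpy as np

    rng = np.random.default_rng(4)
    x, s, t, y = 0.0, 1.0, 0.5, 0.8
    f = np.cos                               # a bounded Borel measurable function
    n = 2_000_000

    # Simulate X_s and X_{s+t} started at x, using independent Gaussian increments.
    Xs = x + np.sqrt(s) * rng.standard_normal(n)
    Xst = Xs + np.sqrt(t) * rng.standard_normal(n)

    near = np.abs(Xs - y) < 0.01             # paths whose value at time s is near y
    lhs = np.mean(f(Xst[near]))              # estimate of E^x[f(X_{s+t}) | X_s ~ y]
    rhs = np.mean(f(y + np.sqrt(t) * rng.standard_normal(n)))   # estimate of E^y[f(X_t)]
    print(lhs, rhs)                          # the two agree up to Monte Carlo error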

6.5 The Strong Markov Property

Let {F_t} be the filtration defined in the previous section. Let T be a fixed stopping time with respect to this filtration. We define

F_T = \{ A \in \mathcal{F}_\infty : A \cap (T \le t) \in \mathcal{F}_t \text{ for all } t > 0 \}. (92)


Proposition 6.15. (i) F_T is a σ-algebra. (ii) T is F_T-measurable. (iii) X_T is F_T-measurable.

Proof. (i) Suppose {A_j}_{j=1}^{∞} ⊂ F_T. Then, for any t > 0, (∪_{j=1}^{∞} A_j) ∩ (T ≤ t) = ∪_{j=1}^{∞} (A_j ∩ (T ≤ t)) ∈ F_t, since each A_j ∩ (T ≤ t) ∈ F_t. If B ∈ F_T, then B ∩ (T ≤ t) ∈ F_t and (T ≤ t) ∈ F_t. Hence (T ≤ t) \ (B ∩ (T ≤ t)) = B^c ∩ (T ≤ t) ∈ F_t. Thus F_T is closed under countable unions and complementation, so F_T is a σ-algebra.

(ii) Since the intervals [0, s], s > 0, generate the Borel sets in [0,∞), it is enough to show that, for any s > 0, (T ≤ s) ∈ F_T. But this is clearly the case, because, for all t > 0, (T ≤ s) ∩ (T ≤ t) = (T ≤ min(s, t)) ∈ F_t.

(iii) X_T is the composition of functions

X_T : (\Omega, \mathcal{F}_T) \xrightarrow{\;g\;} ([0,\infty) \times \Omega,\ \mathcal{B}_{[0,\infty)} \times \mathcal{F}_\infty) \xrightarrow{\;f\;} (\mathbb{R}^n, \mathcal{B}^n), (93)

where f(t, ω) = X_t(ω), g(ω) = (T(ω), ω), and B_{[0,∞)} × F_∞ is the product σ-algebra. The function f is measurable because it is an F_∞-measurable function in ω and a continuous function in t. That g is measurable follows from the fact that T is F_T-measurable.

Some additional notation: we define the maps θ_T and X_T by θ_T(ω)(t) = ω(T(ω) + t) and X_T(ω) = X_{T(ω)}(ω). We are now ready to state the strong Markov property.

Theorem 6.16 (Strong Markov Property). Suppose Y = f(X), where f is a bounded, Borel measurable function, and (X, P^x) is a Wiener process. Then

E^x[Y \circ \theta_T \mid \mathcal{F}_T] = E^{X_T}[Y]. (94)

Proof. We must prove that, for any A ∈ F_T,

\int_A f(\theta_T X)\,dP^x = \int_A E^{X_T}[f(X)]\,dP^x. (95)

Since the continuous functions are dense in the space of Borel-measurable functions on R^n, we may assume that f is continuous. The general result then follows by taking a sequence of continuous functions and applying the dominated convergence theorem.

Define a sequence of stopping times T_n by

T_n(\omega) = k/2^n \quad \text{if } T(\omega) \in [(k-1)/2^n,\, k/2^n), \qquad k = 1, 2, 3, \ldots (96)

Then T_n decreases monotonically to T. If A ∈ F_T, then A ∈ F_{T_n}, and hence, if we let A_k = A ∩ {ω : T_n(ω) = k/2^n}, then A_k ∈ F_{k/2^n}. By the weak Markov property,

\int_{A_k} f(\theta_{T_n} X)\,dP^x = \int_{A_k} f(\theta_{k/2^n} X)\,dP^x = \int_{A_k} E^{X_{k/2^n}}[f(X)]\,dP^x = \int_{A_k} E^{X_{T_n}}[f(X)]\,dP^x. (97)

Since ∪_{k=1}^{∞} A_k = A, it follows that

\int_A f(\theta_{T_n} X)\,dP^x = \sum_{k=1}^{\infty} \int_{A_k} E^{X_{T_n}}[f(X)]\,dP^x = \int_A E^{X_{T_n}}[f(X)]\,dP^x. (98)


By continuity, for any t ≥ 0 and ω ∈ Ω, lim_{n→∞} f((θ_{T_n}X)_t(ω)) = lim_{n→∞} f(X_{T_n(ω)+t}(ω)) = f((θ_T X)_t(ω)). Hence, by dominated convergence, the quantity on the left in (98) approaches \int_A f(\theta_T X)\,dP^x as n → ∞. As for the quantity on the right, note that the function ϕ(y) = E^y[f(X)] is continuous. This is because E^y[f(X)] = E^0[f(X + y)], and since f is continuous, continuity of ϕ follows by dominated convergence. Thus, applying dominated convergence to the quantity on the right in (98) yields lim_{n→∞} \int_A E^{X_{T_n}}[f(X)]\,dP^x = \int_A E^{X_T}[f(X)]\,dP^x. This completes the proof.
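The dyadic approximation (96) is easy to compute explicitly; a small sketch (hypothetical Python with numpy; the value of T is arbitrary) shows T_n = (⌊2^n T⌋ + 1)/2^n decreasing to T.

    import numpy as np

    T = 0.7368                                      # any fixed value of the stopping time
    for n in range(1, 11):
        Tn = (np.floor(2 ** n * T) + 1) / 2 ** n    # T_n as in (96)
        print(n, Tn, Tn - T)                        # T_n >= T, and T_n - T decreases to 0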

7 Appendices

Appendix A: Convolutions and Approximate Identities

If f and g are two measurable functions on R^n, the convolution of f and g is defined to be the function

(f * g)(x) = \int f(x - y)\,g(y)\,dy \qquad (x \in \mathbb{R}^n), (99)

whenever the above integral is defined.

Convolutions have a number of nice analytical and algebraic properties. These include the following.

Proposition 7.1. Suppose f, g, h are measurable functions on R^n. Then, whenever the respective convolutions are defined,

a. f * g = g * f.

b. f * (g * h) = (f * g) * h.

c. If f, g ∈ L^1, then f * g ∈ L^1, and ||f * g||_1 ≤ ||f||_1 ||g||_1.

d. Let f, g ∈ L^1(R^n). Suppose g ∈ C^k(U) for some U ⊂ R^n and each of the derivatives ∂^α g is bounded for |α| ≤ k. Then f * g ∈ C^k(U) and ∂^α(f * g) = f * ∂^α g.

Proof. (a) This is just a change of variables:

f * g(x) = \int f(x - y)\,g(y)\,dy = \int f(z)\,g(x - z)\,dz = g * f(x), (100)

where we have made the substitution z = x − y.

(b) By Fubini's Theorem and (a),

f * (g * h)(x) = \int f(y) \int g(x - y - z)\,h(z)\,dz\,dy = \int h(z) \int g((x - z) - y)\,f(y)\,dy\,dz = \int h(z)\,(f * g)(x - z)\,dz = (f * g) * h(x). (101)

(c) Using Tonelli's Theorem, we have

\int\int |f(y)\,g(x - y)|\,dy\,dx = \int |f(y)| \int |g(x - y)|\,dx\,dy = \|f\|_1 \|g\|_1, (102)

which is finite by assumption. It then follows that f * g is defined almost everywhere, and

\|f * g\|_1 = \int \Bigl| \int f(y)\,g(x - y)\,dy \Bigr|\,dx \le \int\int |f(y)\,g(x - y)|\,dy\,dx = \|f\|_1 \|g\|_1, (103)

as required.

(d) By boundedness of the partial derivatives, it is clear that f(y)\,∂^α g(x − y) is integrable for |α| ≤ k and x ∈ U. It follows that for |α| ≤ k and x ∈ U we can differentiate under the integral to obtain

\partial^\alpha (f * g)(x) = \int f(y)\,\partial^\alpha g(x - y)\,dy = (f * \partial^\alpha g)(x). (104)

(For justification of differentiating under the integral, see e.g. Folland [4, Theorem 2.27].)

Convolutions allow us to approximate a given function by well-behaved functions. Suppose that φ ∈ L^1(R^n) and \int φ = 1. For t > 0, define φ_t(x) = t^{−n} φ(t^{−1}x). Notice that by changing variables we still have \int φ_t = 1. The collection of functions {φ_t}_{t>0} is known as an approximate identity. The reason for this name comes from the following result.

Theorem 7.2. Suppose f ∈ L^1. Then f * φ_t → f in L^1 as t → 0.

Remark 7.3. One can obtain other forms of convergence, such as uniform convergence or uniform convergence on compact sets, by imposing stronger conditions on f. See for example Theorem 8.14 in Folland [4].

If g is a function on R^n and y ∈ R^n, we define τ_y g(x) = g(x − y). The operator τ_y is sometimes called the translation operator. To prove Theorem 7.2, we will need the following

Fact. If g ∈ L^1(R^n), then lim_{y→0} \|τ_y g − g\|_1 = 0.

To avoid a somewhat lengthy digression, we refer the reader to Folland [4, Proposition 8.5] for a justification of this fact. The proof of Theorem 7.2 then proceeds as follows.

Proof. We start with the following calculation:

f * \varphi_t(x) - f(x) = \int [f(x - y) - f(x)]\,\varphi_t(y)\,dy = \int [f(x - tz) - f(x)]\,\varphi(z)\,dz = \int [\tau_{tz} f(x) - f(x)]\,\varphi(z)\,dz. (105)

Using Fubini's Theorem, this allows us to obtain the following estimate:

\|f * \varphi_t - f\|_1 \le \int\int |\tau_{tz} f(x) - f(x)|\,|\varphi(z)|\,dz\,dx = \int \Bigl( \int |\tau_{tz} f(x) - f(x)|\,dx \Bigr) |\varphi(z)|\,dz = \int \|\tau_{tz} f - f\|_1\,|\varphi(z)|\,dz. (106)

Note that \|\tau_{tz} f - f\|_1 \le 2\|f\|_1. Hence applying dominated convergence to the estimate above shows that \|f * \varphi_t - f\|_1 → 0 as t → 0, as desired.


Example 7.4. For x ∈ R^n, define φ(x) = e^{−π|x|^2}. We can write

e^{-\pi|x|^2} = \prod_{j=1}^{n} e^{-\pi x_j^2}. (107)

By a well-known fact from calculus, each \int e^{-\pi x_j^2}\,dx_j = 1. Hence by Fubini's theorem \int \varphi = 1. Thus by Theorem 7.2,

f * \varphi_t \to f \quad \text{in } L^1. (108)

But we can say more: each φ_t is a smooth L^1 function, and hence by Proposition 7.1(c) and (d), f * φ_t is a smooth L^1 function. In fact, by choosing an appropriate sequence {t_j}, where each t_j > 0 and t_j ↓ 0, and defining f_j = f * φ_{t_j}, we obtain a sequence of smooth L^1 functions {f_j} such that f_j → f a.e. (This follows from the fact that any sequence of functions g_j → g in L^1 has a subsequence which converges to g almost everywhere; see Corollary 2.32 in Folland [4].)
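A quick numerical illustration (hypothetical Python with numpy; the step function, grid, and values of t are our choices) of Theorem 7.2 with the Gaussian approximate identity of Example 7.4: the L^1 distance between f * φ_t and f shrinks as t → 0.

    import numpy as np

    x = np.linspace(-4, 4, 8001)
    dx = x[1] - x[0]
    f = ((x > -1) & (x < 1)).astype(float)          # a step function in L^1

    def phi_t(x, t):
        # phi_t(x) = t^{-1} exp(-pi (x/t)^2), the Gaussian approximate identity (n = 1).
        return np.exp(-np.pi * (x / t) ** 2) / t

    for t in [1.0, 0.3, 0.1, 0.03]:
        conv = np.convolve(f, phi_t(x, t), mode="same") * dx   # Riemann-sum convolution
        print(t, np.sum(np.abs(conv - f)) * dx)                # the L^1 error decreases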

Appendix B: The Fourier Transform

Given a function f ∈ L^1(R^n), the Fourier transform of f is the function

\hat{f}(\xi) = \int e^{-2\pi i x\cdot\xi} f(x)\,dx \qquad (\xi \in \mathbb{R}^n). (109)

Typically, the Fourier transform of a function has better smoothness properties than the original function. In particular, by dominated convergence (take |f| as the dominating function), the Fourier transform of the function f in (109) is continuous. Additional mild hypotheses on the function f allow one to differentiate under the integral in (109) multiple times to show that \hat{f} ∈ C^k. Another useful fact is that, under the Fourier transform, convolution becomes multiplication. More precisely, we have

Proposition 7.5. Let f and g be L^1 functions (whose convolution is well-defined, by Proposition 7.1(c)). Then \widehat{f * g} = \hat{f}\,\hat{g}.

Proof. The proof is just a matter of applying Fubini's Theorem. For x ∈ R^n,

\widehat{f * g}(x) = \int\int e^{-2\pi i x\cdot\xi} f(\xi - y)\,g(y)\,dy\,d\xi = \int\int e^{-2\pi i x\cdot(\xi - y)} f(\xi - y)\,e^{-2\pi i x\cdot y} g(y)\,dy\,d\xi = \int \Bigl( \int e^{-2\pi i x\cdot(\xi - y)} f(\xi - y)\,d\xi \Bigr) e^{-2\pi i x\cdot y} g(y)\,dy = \int \hat{f}(x)\,e^{-2\pi i x\cdot y} g(y)\,dy = \hat{f}(x)\,\hat{g}(x), (110)

as required.


To recover the original function from its Fourier transform, we introduce the inverse Fourier transform of a function g ∈ L^1. This is the function

g^\vee(x) = \hat{g}(-x) = \int e^{2\pi i x\cdot\xi} g(\xi)\,d\xi \qquad (x \in \mathbb{R}^n). (111)

Observe that, due to the simple relationship above between g^∨ and \hat{g}, Proposition 7.5 also holds for the inverse Fourier transform. The inversion formula is the following result.

Theorem 7.6 (Inversion of the Fourier transform). If f ∈ L^1(R^n) and \hat{f} ∈ L^1(R^n), let g = (\hat{f})^\vee and h = (f^\vee)^\wedge. Then g, h ∈ C_0(R^n), and f = g = h a.e.

Remark 7.7. The same result still holds if we replace \hat{f} by f^\vee and let g = (f^\vee)^\wedge.

For a proof of Theorem 7.6, see for example Theorem 8.26 in Folland [4] or Theorem 9.5 in Rudin [7].

The next result says that the function defined in Example 7.4 is a kind of eigenfunction of the Fourier transform. This result will give us a useful approximate identity. It also provides us with the characteristic function of the Gaussian distribution (see Proposition 3.5).

Proposition 7.8. For x ∈ R^n, let φ(x) = e^{−π|x|^2}. Then \hat{\varphi} = \varphi.

Proof. The proof has two steps.

Step 1. First we show that the result holds in dimension n = 1. In this case, by definition,

\hat{\varphi}(\xi) = \int e^{-2\pi i x\xi} e^{-\pi x^2}\,dx. (112)

The derivative of e^{-2πixξ}e^{-πx^2} with respect to ξ is −2πix\,e^{-2πixξ}e^{-πx^2}, which is bounded in absolute value by g(x) = 2π|x|e^{-πx^2}, which is integrable. Hence we may differentiate under the integral (see Folland [4, Proposition 2.27(b)]) to obtain

\hat{\varphi}'(\xi) = \int -2\pi i x\,e^{-2\pi i x\xi} e^{-\pi x^2}\,dx = e^{-2\pi i x\xi}\,i e^{-\pi x^2}\Big|_{-\infty}^{\infty} - \int (i e^{-\pi x^2})(-2\pi i \xi\,e^{-2\pi i x\xi})\,dx = 0 - 2\pi\xi \int e^{-2\pi i x\xi} e^{-\pi x^2}\,dx = -2\pi\xi\,\hat{\varphi}(\xi). (113)

(The second equality above is obtained by integration by parts with u = e^{-2πixξ} and dv = −2πix\,e^{-πx^2}\,dx.) Thus we have obtained the ODE \hat{\varphi}'(\xi) = -2\pi\xi\,\hat{\varphi}(\xi). To solve it, observe that by (113),

\frac{d}{d\xi}\Bigl[ \hat{\varphi}(\xi)\,e^{\pi\xi^2} \Bigr] = \hat{\varphi}'(\xi)\,e^{\pi\xi^2} + 2\pi\xi\,\hat{\varphi}(\xi)\,e^{\pi\xi^2} = 0. (114)

Therefore, \hat{\varphi}(\xi)\,e^{\pi\xi^2} = \hat{\varphi}(0) = \int e^{-\pi x^2}\,dx = 1. So indeed \hat{\varphi}(\xi) = e^{-\pi\xi^2}.


Step 2. Now we consider general dimension n ≥ 1. For x ∈ R^n, |x|^2 = \sum_{j=1}^{n} x_j^2. Hence, by Fubini's Theorem,

\hat{\varphi}(\xi) = \int e^{-2\pi i x\cdot\xi} e^{-\pi|x|^2}\,dx = \int \Bigl( \prod_{j=1}^{n} e^{-2\pi i x_j \xi_j} e^{-\pi x_j^2} \Bigr)\,dx_1\,dx_2\cdots dx_n = \prod_{j=1}^{n} \int e^{-2\pi i x_j \xi_j} e^{-\pi x_j^2}\,dx_j = \prod_{j=1}^{n} e^{-\pi\xi_j^2} = e^{-\pi|\xi|^2}, (115)

here using the result from Step 1. This completes the proof.
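Proposition 7.8 is also easy to check numerically in one dimension (hypothetical Python with numpy; the grid and test frequencies are our choices): a Riemann sum for the integral (112) reproduces e^{−πξ^2}.

    import numpy as np

    x = np.linspace(-8, 8, 16001)
    dx = x[1] - x[0]
    phi = np.exp(-np.pi * x ** 2)

    for xi in [0.0, 0.5, 1.0, 2.0]:
        fhat = np.sum(np.exp(-2j * np.pi * x * xi) * phi) * dx   # Riemann sum for (112)
        print(xi, fhat.real, np.exp(-np.pi * xi ** 2))           # the two values agree closely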

Corollary 7.9. Suppose f ∈ L^1. Then there exists a sequence {f_j}_{j=1}^{∞} of smooth L^1 functions such that f_j → f a.e. and \hat{f}_j → \hat{f} pointwise. In addition, if f is bounded, then the sequence may be chosen so that each \hat{f}_j ∈ L^1.

Proof. Let φ and f_j be chosen as in Example 7.4. Then, as shown in the example, {f_j} is a sequence of smooth L^1 functions such that f_j → f a.e. Moreover, by Proposition 7.5, \hat{f}_j = \hat{f}\,\hat{\varphi}_{t_j}. Given t > 0, using the previous proposition, we compute

\hat{\varphi}_t(\xi) = t^{-n} \int e^{-2\pi i x\cdot\xi} e^{-\pi t^{-2}|x|^2}\,dx = \int e^{-2\pi i (ty)\cdot\xi} e^{-\pi|y|^2}\,dy = \hat{\varphi}(t\xi) = e^{-\pi t^2|\xi|^2}. (116)

Thus \hat{\varphi}_t → 1 pointwise as t → 0, and hence \hat{f}_j → \hat{f} pointwise as j → ∞. Moreover, \hat{\varphi}_t ∈ L^1. So if f is bounded, then each \hat{f}_j ∈ L^1.

Remark 7.10. The result also holds if we replace \hat{f}_j with f_j^\vee and \hat{f} with f^\vee.


References

1. R. F. Bass, Probabilistic Techniques in Analysis, Probability and its Applications, Springer-Verlag New York, Inc., New York, 1995.

2. Z. Ciesielski, Hölder conditions for realisations of Gaussian processes, Trans. Amer. Math. Soc., 99 (1961), 403-413.

3. R. Durrett, Probability: Theory and Examples, 4th ed., Cambridge University Press, Cambridge, 2010.

4. G. B. Folland, Real Analysis: Modern Techniques and Their Applications, 2nd ed., John Wiley & Sons, Inc., Hoboken, NJ, 1999.

5. B. Øksendal, Stochastic Differential Equations: An Introduction with Applications, 3rd ed., Springer-Verlag, Berlin, Heidelberg, 1992.

6. L. C. G. Rogers, D. Williams, Diffusions, Markov Processes and Martingales, Volume 1: Foundations, 2nd ed., Cambridge University Press, Cambridge, 2000.

7. W. Rudin, Real and Complex Analysis, 3rd ed., McGraw-Hill Book Co., Singapore, 1987.

8. T. Tao, 254A, Notes 3b: Brownian Motion and Dyson Brownian Motion. 18 Jan. 2010. Web. 8 June 2017. https://terrytao.wordpress.com/2010/01/18/254a-notes-3b-brownian-motion-and-dyson-brownian-motion/

9. D. Williams, Probability with Martingales, Cambridge University Press, Cambridge, 1991.
