
Part II — Probability and Measure

Theorems with proof

Based on lectures by J. Miller
Notes taken by Dexter Chua

Michaelmas 2016

These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures. They are nowhere near accurate representations of what was actually lectured, and in particular, all errors are almost surely mine.

Analysis II is essential

Measure spaces, σ-algebras, π-systems and uniqueness of extension, statement *and proof* of Carathéodory's extension theorem. Construction of Lebesgue measure on R. The Borel σ-algebra of R. Existence of non-measurable subsets of R. Lebesgue–Stieltjes measures and probability distribution functions. Independence of events, independence of σ-algebras. The Borel–Cantelli lemmas. Kolmogorov's zero-one law. [6]

Measurable functions, random variables, independence of random variables. Construction of the integral, expectation. Convergence in measure and convergence almost everywhere. Fatou's lemma, monotone and dominated convergence, differentiation under the integral sign. Discussion of product measure and statement of Fubini's theorem. [6]

Chebyshev's inequality, tail estimates. Jensen's inequality. Completeness of Lp for 1 ≤ p ≤ ∞. The Hölder and Minkowski inequalities, uniform integrability. [4]

L2 as a Hilbert space. Orthogonal projection, relation with elementary conditional probability. Variance and covariance. Gaussian random variables, the multivariate normal distribution. [2]

The strong law of large numbers, proof for independent random variables with bounded fourth moments. Measure preserving transformations, Bernoulli shifts. Statements *and proofs* of maximal ergodic theorem and Birkhoff's almost everywhere ergodic theorem, proof of the strong law. [4]

The Fourier transform of a finite measure, characteristic functions, uniqueness and inversion. Weak convergence, statement of Lévy's convergence theorem for characteristic functions. The central limit theorem. [2]


Contents

0 Introduction

1 Measures
  1.1 Measures
  1.2 Probability measures

2 Measurable functions and random variables
  2.1 Measurable functions
  2.2 Constructing new measures
  2.3 Random variables
  2.4 Convergence of measurable functions
  2.5 Tail events

3 Integration
  3.1 Definition and basic properties
  3.2 Integrals and limits
  3.3 New measures from old
  3.4 Integration and differentiation
  3.5 Product measures and Fubini's theorem

4 Inequalities and Lp spaces
  4.1 Four inequalities
  4.2 Lp spaces
  4.3 Orthogonal projection in L2
  4.4 Convergence in L1(P) and uniform integrability

5 Fourier transform
  5.1 The Fourier transform
  5.2 Convolutions
  5.3 Fourier inversion formula
  5.4 Fourier transform in L2
  5.5 Properties of characteristic functions
  5.6 Gaussian random variables

6 Ergodic theory
  6.1 Ergodic theorems

7 Big theorems
  7.1 The strong law of large numbers
  7.2 Central limit theorem


0 Introduction


1 Measures

1.1 Measures

Proposition. A collection A is a σ-algebra if and only if it is both a π-system and a d-system.

Lemma (Dynkin's π-system lemma). Let A be a π-system. Then any d-system which contains A contains σ(A).

Proof. Let D be the intersection of all d-systems containing A, i.e. the smallest d-system containing A. We show that D contains σ(A). To do so, we will show that D is a π-system, hence a σ-algebra.

There are two steps to the proof, both of which are straightforward verifications:

(i) We first show that if B ∈ D and A ∈ A, then B ∩ A ∈ D.

(ii) We then show that if A, B ∈ D, then A ∩ B ∈ D.

Then the result immediately follows from the second part. We let

D′ = {B ∈ D : B ∩ A ∈ D for all A ∈ A}.

We note that D′ ⊇ A because A is a π-system, and is hence closed under intersections. We check that D′ is a d-system. It is clear that E ∈ D′. If we have B1, B2 ∈ D′, where B1 ⊆ B2, then for any A ∈ A, we have

(B2 \ B1) ∩ A = (B2 ∩ A) \ (B1 ∩ A).

By definition of D′, we know B2 ∩ A and B1 ∩ A are elements of D. Since D is a d-system, we know this difference of nested sets is in D. So B2 \ B1 ∈ D′.

Finally, suppose that (Bn) is an increasing sequence in D′, with B = ⋃ Bn. Then for every A ∈ A, we have that

(⋃ Bn) ∩ A = ⋃ (Bn ∩ A) = B ∩ A ∈ D.

Therefore B ∈ D′.

Therefore D′ is a d-system contained in D, which also contains A. By our choice of D, we know D′ = D. We now let

D′′ = {B ∈ D : B ∩ A ∈ D for all A ∈ D}.

Since D′ = D, we again have A ⊆ D′′, and the same argument as above implies that D′′ is a d-system which is between A and D. But the only way that can happen is if D′′ = D, and this implies that D is a π-system.

Theorem (Carathéodory extension theorem). Let A be a ring on E, and µ a countably additive set function on A. Then µ extends to a measure on the σ-algebra generated by A.


Proof. (non-examinable) We start by defining what we want our measure to be. For B ⊆ E, we set

µ∗(B) = inf { ∑n µ(An) : (An) is a sequence in A and B ⊆ ⋃ An }.

If it happens that there is no such sequence, we set this to be ∞. This set function is known as the outer measure. It is clear that µ∗(∅) = 0, and that µ∗ is increasing.

We say a set A ⊆ E is µ∗-measurable if

µ∗(B) = µ∗(B ∩A) + µ∗(B ∩AC)

for all B ⊆ E. We let

M = {µ∗-measurable sets}.

We will show the following:

(i) M is a σ-algebra containing A.

(ii) µ∗ is a measure on M with µ∗|A = µ.

Note that it is not true in general that M = σ(A). However, we will always have M ⊇ σ(A).

We are going to break this up into five nice bite-size chunks.

Claim. µ∗ is countably subadditive.

Suppose B ⊆ ⋃n Bn. We need to show that µ∗(B) ≤ ∑n µ∗(Bn). We can wlog assume that µ∗(Bn) is finite for all n, or else the inequality is trivial. Let ε > 0. Then by definition of the outer measure, for each n, we can find a sequence (Bn,m)_{m≥1} in A with the property that

Bn ⊆ ⋃m Bn,m and µ∗(Bn) + ε/2^n ≥ ∑m µ(Bn,m).

Then we have

B ⊆ ⋃n Bn ⊆ ⋃n,m Bn,m.

Thus, by definition, we have

µ∗(B) ≤ ∑n,m µ(Bn,m) ≤ ∑n (µ∗(Bn) + ε/2^n) = ε + ∑n µ∗(Bn).

Since ε was arbitrary, we are done.

Claim. µ∗ agrees with µ on A.


In the first example sheet, we will show that if A is a ring and µ is a countably additive set function on A, then µ is in fact countably subadditive and increasing.

Assuming this, suppose that A, (An) are in A and A ⊆ ⋃n An. Then by subadditivity, we have

µ(A) ≤ ∑n µ(A ∩ An) ≤ ∑n µ(An),

using that µ is countably subadditive and increasing. Note that we have to do this in two steps, rather than just applying countable subadditivity, since we did not assume that ⋃n An ∈ A. Taking the infimum over all sequences, we have

µ(A) ≤ µ∗(A).

Also, we see by definition that µ(A) ≥ µ∗(A), since {A} covers A. So we get that µ(A) = µ∗(A) for all A ∈ A.

Claim. M contains A.

Suppose that A ∈ A and B ⊆ E. We need to show that

µ∗(B) = µ∗(B ∩ A) + µ∗(B ∩ A^C).

Since µ∗ is countably subadditive, we immediately have µ∗(B) ≤ µ∗(B ∩ A) + µ∗(B ∩ A^C). For the other inequality, we first observe that it is trivial if µ∗(B) is infinite. If it is finite, then by definition, given ε > 0, we can find some (Bn) in A such that B ⊆ ⋃n Bn and

µ∗(B) + ε ≥ ∑n µ(Bn).

Then we have

B ∩ A ⊆ ⋃n (Bn ∩ A), B ∩ A^C ⊆ ⋃n (Bn ∩ A^C).

We notice that Bn ∩ A^C = Bn \ A ∈ A. Thus, by definition of µ∗, we have

µ∗(B ∩ A) + µ∗(B ∩ A^C) ≤ ∑n µ(Bn ∩ A) + ∑n µ(Bn ∩ A^C)
= ∑n (µ(Bn ∩ A) + µ(Bn ∩ A^C))
= ∑n µ(Bn)
≤ µ∗(B) + ε.

Since ε was arbitrary, the result follows.

Claim. M is an algebra.


We first show that E ∈ M. This is true since we obviously have

µ∗(B) = µ∗(B ∩ E) + µ∗(B ∩ E^C)

for all B ⊆ E.

Next, note that if A ∈ M, then by definition we have, for all B,

µ∗(B) = µ∗(B ∩ A) + µ∗(B ∩ A^C).

Now note that this condition is symmetric in A and A^C. So we also have A^C ∈ M.

Finally, we have to show that M is closed under intersection (which is equivalent to being closed under union when we have complements). Suppose A1, A2 ∈ M and B ⊆ E. Then we have

µ∗(B) = µ∗(B ∩ A1) + µ∗(B ∩ A1^C)
= µ∗(B ∩ A1 ∩ A2) + µ∗(B ∩ A1 ∩ A2^C) + µ∗(B ∩ A1^C)
= µ∗(B ∩ (A1 ∩ A2)) + µ∗(B ∩ (A1 ∩ A2)^C ∩ A1) + µ∗(B ∩ (A1 ∩ A2)^C ∩ A1^C)
= µ∗(B ∩ (A1 ∩ A2)) + µ∗(B ∩ (A1 ∩ A2)^C).

So we have A1 ∩ A2 ∈ M. So M is an algebra.

Claim. M is a σ-algebra, and µ∗ is a measure on M.

To show that M is a σ-algebra, we need to show that it is closed under countable unions. We let (An) be a disjoint collection of sets in M. Then we want to show that A = ⋃n An ∈ M and µ∗(A) = ∑n µ∗(An).

Suppose that B ⊆ E. Then we have

µ∗(B) = µ∗(B ∩ A1) + µ∗(B ∩ A1^C).

Using the fact that A2 ∈ M and A1 ∩ A2 = ∅, we have

µ∗(B) = µ∗(B ∩ A1) + µ∗(B ∩ A2) + µ∗(B ∩ A1^C ∩ A2^C)
= · · ·
= ∑_{i=1}^n µ∗(B ∩ Ai) + µ∗(B ∩ A1^C ∩ · · · ∩ An^C)
≥ ∑_{i=1}^n µ∗(B ∩ Ai) + µ∗(B ∩ A^C).

Taking the limit as n → ∞, we have

µ∗(B) ≥ ∑_{i=1}^∞ µ∗(B ∩ Ai) + µ∗(B ∩ A^C).

By the countable subadditivity of µ∗, we have

µ∗(B ∩ A) ≤ ∑_{i=1}^∞ µ∗(B ∩ Ai).

Thus we obtain

µ∗(B) ≥ µ∗(B ∩ A) + µ∗(B ∩ A^C).

By countable subadditivity, we also have the inequality in the other direction. So equality holds. So A ∈ M. So M is a σ-algebra.

To see that µ∗ is a measure on M, note that the above implies that

µ∗(B) = ∑_{i=1}^∞ µ∗(B ∩ Ai) + µ∗(B ∩ A^C).

Taking B = A, this gives

µ∗(A) = ∑_{i=1}^∞ µ∗(A ∩ Ai) + µ∗(A ∩ A^C) = ∑_{i=1}^∞ µ∗(Ai).

Theorem. Suppose that µ1, µ2 are measures on (E, E) with µ1(E) = µ2(E) < ∞. If A is a π-system with σ(A) = E, and µ1 agrees with µ2 on A, then µ1 = µ2.

Proof. Let

D = {A ∈ E : µ1(A) = µ2(A)}.

We know that D ⊇ A. By Dynkin's lemma, it suffices to show that D is a d-system. The things to check are:

(i) E ∈ D — this follows by assumption.

(ii) If A, B ∈ D with A ⊆ B, then B \ A ∈ D. Indeed, we have the equations

µ1(B) = µ1(A) + µ1(B \ A) < ∞
µ2(B) = µ2(A) + µ2(B \ A) < ∞.

Since µ1(B) = µ2(B) and µ1(A) = µ2(A), we must have µ1(B \ A) = µ2(B \ A).

(iii) Let (An) be an increasing sequence in D with ⋃ An = A. Then

µ1(A) = lim_{n→∞} µ1(An) = lim_{n→∞} µ2(An) = µ2(A).

So A ∈ D.

Theorem. There exists a unique Borel measure µ on R with µ([a, b]) = b − a.

Proof. We first show uniqueness. Suppose µ̃ is another measure on B satisfying the above property. We want to apply the previous uniqueness theorem, but our measures are not finite. So we need to carefully get around that problem.

For each n ∈ Z, we set

µn(A) = µ(A ∩ (n, n + 1]), µ̃n(A) = µ̃(A ∩ (n, n + 1]).

Then µn and µ̃n are finite measures on B which agree on the π-system of intervals of the form (a, b] with a, b ∈ R, a < b. Therefore we have µn = µ̃n for all n ∈ Z. Now we have

µ(A) = ∑_{n∈Z} µ(A ∩ (n, n + 1]) = ∑_{n∈Z} µn(A) = ∑_{n∈Z} µ̃n(A) = µ̃(A)

for all Borel sets A.

To show existence, we want to use the Carathéodory extension theorem. We let A be the collection of finite, disjoint unions of the form

A = (a1, b1] ∪ (a2, b2] ∪ · · · ∪ (an, bn].

Then A is a ring of subsets of R, and σ(A) = B (details are to be checked on the first example sheet).

We set

µ(A) = ∑_{i=1}^n (bi − ai).

We note that µ is well-defined, since if

A = (a1, b1] ∪ · · · ∪ (an, bn] = (ã1, b̃1] ∪ · · · ∪ (ãm, b̃m],

then

∑_{i=1}^n (bi − ai) = ∑_{i=1}^m (b̃i − ãi).

Also, if A, B ∈ A with A ∩ B = ∅ and A ∪ B ∈ A, we obviously have µ(A ∪ B) = µ(A) + µ(B). So µ is additive.

Finally, we have to show that µ is in fact countably additive. Let (An) be a disjoint sequence in A, and let A = ⋃_{n=1}^∞ An ∈ A. Then we need to show that µ(A) = ∑_{n=1}^∞ µ(An).

Since µ is additive, we have

µ(A) = µ(A1) + µ(A \ A1)
= µ(A1) + µ(A2) + µ(A \ (A1 ∪ A2))
= ∑_{i=1}^n µ(Ai) + µ(A \ ⋃_{i=1}^n Ai).

To finish the proof, we show that

µ(A \ ⋃_{i=1}^n Ai) → 0 as n → ∞.

We are going to reduce this to the finite intersection property of compact sets in R: if (Kn) is a sequence of compact sets in R with the property that ⋂_{m=1}^n Km ≠ ∅ for all n, then ⋂_{m=1}^∞ Km ≠ ∅.

We first introduce some new notation. We let

Bn = A \ ⋃_{m=1}^n Am.

We now suppose, for contradiction, that µ(Bn) does not tend to 0 as n → ∞. Since the Bn are decreasing, there must exist ε > 0 such that µ(Bn) ≥ 2ε for every n.

For each n, we take Cn ∈ A whose closure satisfies C̄n ⊆ Bn and µ(Bn \ Cn) ≤ ε/2^n. This is possible since each Bn is just a finite union of intervals. Thus we have

µ(Bn) − µ(⋂_{m=1}^n Cm) = µ(Bn \ ⋂_{m=1}^n Cm)
≤ µ(⋃_{m=1}^n (Bm \ Cm))
≤ ∑_{m=1}^n µ(Bm \ Cm)
≤ ∑_{m=1}^n ε/2^m
≤ ε.

On the other hand, we also know that µ(Bn) ≥ 2ε. So

µ(⋂_{m=1}^n Cm) ≥ ε

for all n. We now let Kn = ⋂_{m=1}^n C̄m. Then each Kn is compact, Kn ⊇ ⋂_{m=1}^n Cm, so µ(Kn) ≥ ε, and in particular Kn ≠ ∅ for all n.

Thus, the finite intersection property says

∅ ≠ ⋂_{n=1}^∞ Kn ⊆ ⋂_{n=1}^∞ Bn = ∅.

This is a contradiction. So we have µ(Bn) → 0 as n → ∞. So done.

Proposition. The Lebesgue measure is translation invariant, i.e.

µ(A + x) = µ(A)

for all A ∈ B and x ∈ R, where

A + x = {y + x : y ∈ A}.

Proof. We use the uniqueness of the Lebesgue measure. We let

µx(A) = µ(A + x)

for A ∈ B. Then this is a measure on B satisfying µx([a, b]) = b − a. So the uniqueness of the Lebesgue measure shows that µx = µ.

Proposition. Let µ be a Borel measure on R that is translation invariant and satisfies µ([0, 1]) = 1. Then µ is the Lebesgue measure.

Proof. We show that any such measure must satisfy

µ([a, b]) = b − a.

By additivity and translation invariance, we can show that µ([p, q]) = q − p for all rational p < q. By considering µ([p, p + 1/n]) for all n and using the increasing property, we know µ({p}) = 0. So µ([p, q)) = µ((p, q]) = µ((p, q)) = q − p for all rational p, q.

Finally, by countable additivity, we can extend this to all real intervals. Then the result follows from the uniqueness of the Lebesgue measure.


1.2 Probability measures

Proposition. Events (An) are independent iff the σ-algebras σ(An) are independent.

Theorem. Suppose A1 and A2 are π-systems in F. If

P[A1 ∩ A2] = P[A1] P[A2]

for all A1 ∈ A1 and A2 ∈ A2, then σ(A1) and σ(A2) are independent.

Proof. This will follow from two applications of the fact that a finite measure is determined by its values on a π-system which generates the entire σ-algebra.

We first fix A1 ∈ A1. We define the measures

µ(A) = P[A ∩ A1] and ν(A) = P[A] P[A1]

for all A ∈ F. By assumption, we know µ and ν agree on A2, and we have that µ(Ω) = P[A1] = ν(Ω) ≤ 1 < ∞. So µ and ν agree on σ(A2). So we have

P[A1 ∩ A2] = µ(A2) = ν(A2) = P[A1] P[A2]

for all A2 ∈ σ(A2).

So we have now shown that the hypothesis of the theorem also holds for the pair of π-systems A1 and σ(A2). By symmetry, the same argument applied once more shows that σ(A1) and σ(A2) are independent.

Lemma (Borel–Cantelli lemma). If

∑n P[An] < ∞,

then

P[An i.o.] = 0.

Proof. For each k, we have

P[An i.o.] = P[⋂n ⋃_{m≥n} Am] ≤ P[⋃_{m≥k} Am] ≤ ∑_{m=k}^∞ P[Am] → 0

as k → ∞. So we have P[An i.o.] = 0.
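As a quick numerical aside (our illustration, not part of the notes), the conclusion is easy to watch in a simulation: with independent events An of probability 1/n², whose probabilities have a finite sum, only a handful of the An ever occur along a sample path.

```python
# Monte Carlo sanity check of the first Borel-Cantelli lemma (illustration
# only): take independent events A_n with P[A_n] = 1/n^2, so sum_n P[A_n]
# is finite. The lemma predicts only finitely many A_n occur a.s., so the
# total count of events observed should stay small no matter how large N is.
import numpy as np

rng = np.random.default_rng(0)
N = 100_000                        # truncation level for the sequence (A_n)
p = 1.0 / np.arange(1, N + 1) ** 2

for _ in range(10):                # ten independent sample points omega
    occurred = rng.random(N) < p   # indicator of each A_n at this omega
    print(occurred.sum(), end=" ") # typically 0 to 3 events in total
```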


Lemma (Borel–Cantelli lemma II). Let (An) be independent events. If

∑n P[An] = ∞,

then

P[An i.o.] = 1.

Proof. By the example sheet, if (An) is independent, then so is (An^C). Then we have

P[⋂_{m=n}^N Am^C] = ∏_{m=n}^N P[Am^C]
= ∏_{m=n}^N (1 − P[Am])
≤ ∏_{m=n}^N exp(−P[Am])
= exp(−∑_{m=n}^N P[Am]) → 0

as N → ∞, as we assumed that ∑n P[An] = ∞. So we have

P[⋂_{m=n}^∞ Am^C] = 0.

By countable subadditivity, we have

P[⋃n ⋂_{m=n}^∞ Am^C] = 0.

This in turn implies that

P[⋂n ⋃_{m=n}^∞ Am] = 1 − P[⋃n ⋂_{m=n}^∞ Am^C] = 1.

So we are done.


2 Measurable functions and random variables

2.1 Measurable functions

Lemma. Let (E, E) and (G, G) be measurable spaces, and G = σ(Q) for some Q. If f⁻¹(A) ∈ E for all A ∈ Q, then f is measurable.

Proof. We claim that

{A ⊆ G : f⁻¹(A) ∈ E}

is a σ-algebra on G. Then the result follows immediately by definition of σ(Q).

Indeed, this follows from the fact that f⁻¹ preserves everything. More precisely, we have

f⁻¹(⋃n An) = ⋃n f⁻¹(An), f⁻¹(A^C) = (f⁻¹(A))^C, f⁻¹(∅) = ∅.

So if, say, all the An lie in this collection, then so does ⋃n An.

Proposition. Let fi : E → Fi be functions. Then the fi are all measurable iff (fi) : E → ∏ Fi is measurable, where the function (fi) is defined by setting the ith component of (fi)(x) to be fi(x).

Proof. If the map (fi) is measurable, then by composition with the projections πi, we know that each fi is measurable.

Conversely, if all fi are measurable, then since the σ-algebra of ∏ Fi is generated by sets of the form {πj⁻¹(A) : A ∈ Fj}, and the pullback of such sets along (fi) is exactly fj⁻¹(A), we know the function (fi) is measurable.

Proposition. Let (E, E) be a measurable space. Let (fn : n ∈ N) be a sequence of non-negative measurable functions on E. Then the following are measurable:

f1 + f2, f1 f2, max{f1, f2}, min{f1, f2}, infn fn, supn fn, lim infn fn, lim supn fn.

The same is true with "non-negative" replaced with "real-valued", provided the new functions are real (i.e. not infinite).

Proof. This is an (easy) exercise on the example sheet. For example, the sum f1 + f2 can be written as the composition

E → [0, ∞]² → [0, ∞],

where the first map is (f1, f2) and the second is +. We know the second map is continuous, hence measurable. The first map is also measurable since the fi are. So the composition is also measurable.

The product follows similarly, but for the infimum and supremum, we need to check explicitly that the corresponding maps [0, ∞]^N → [0, ∞] are measurable.

Theorem (Monotone class theorem). Let (E, E) be a measurable space, and A ⊆ E be a π-system with σ(A) = E. Let V be a vector space of functions such that

(i) The constant function 1 = 1_E is in V.

(ii) The indicator functions 1_A ∈ V for all A ∈ A.

(iii) V is closed under bounded, monotone limits. More explicitly, if (fn) is a bounded non-negative sequence in V, fn ↗ f (pointwise) and f is also bounded, then f ∈ V.

Then V contains all bounded measurable functions.

Proof. We first deduce that 1_A ∈ V for all A ∈ E. Let

D = {A ∈ E : 1_A ∈ V}.

We want to show that D = E. To do this, we have to show that D is a d-system, since it contains the π-system A.

(i) Since 1_E ∈ V, we know E ∈ D.

(ii) If A, B ∈ D with A ⊆ B, then 1_B − 1_A = 1_{B\A} ∈ V, as V is a vector space. So B \ A ∈ D.

(iii) If (An) is an increasing sequence in D, then 1_{An} ↗ 1_{⋃An} monotonically, and the limit is bounded. So 1_{⋃An} ∈ V, i.e. ⋃ An ∈ D.

So, by Dynkin's lemma, we know D = E. So V contains the indicators of all measurable sets. We will now try to obtain any measurable function by approximating.

Suppose that f is bounded and non-negative measurable. We want to show that f ∈ V. To do this, we approximate it by letting

fn = 2⁻ⁿ ⌊2ⁿ f⌋ = ∑_{k=0}^∞ k 2⁻ⁿ 1_{k 2⁻ⁿ ≤ f < (k+1) 2⁻ⁿ}.

Note that since f is bounded, this is a finite sum. So it is a finite linear combination of indicators of elements in E. So fn ∈ V, and 0 ≤ fn ↗ f monotonically. So f ∈ V.

More generally, if f is bounded and measurable, then we can write

f = (f ∨ 0) + (f ∧ 0) ≡ f⁺ − f⁻.

Then f⁺ and f⁻ are bounded and non-negative measurable. So f ∈ V.

2.2 Constructing new measures

Lemma. Let g : R → R be non-constant, non-decreasing and right continuous. We set

g(±∞) = lim_{x→±∞} g(x).

We set I = (g(−∞), g(∞)). Since g is non-constant, this is non-empty. Then there is a non-decreasing, left continuous function f : I → R such that for all x ∈ I and y ∈ R, we have

x ≤ g(y) ⇔ f(x) ≤ y.

Thus, taking the negation of this, we have

x > g(y) ⇔ f(x) > y.

Explicitly, for x ∈ I, we define

f(x) = inf{y ∈ R : x ≤ g(y)}.


Proof. We just have to verify that it works. For x ∈ I, consider

Jx = {y ∈ R : x ≤ g(y)}.

Since g is non-decreasing, if y ∈ Jx and y′ ≥ y, then y′ ∈ Jx. Since g is right-continuous, if yn ∈ Jx is such that yn ↘ y, then y ∈ Jx. So we have

Jx = [f(x), ∞).

Thus, for y ∈ R, we have

x ≤ g(y) ⇔ f(x) ≤ y.

So we just have to prove the remaining properties of f. Now for x ≤ x′, we have Jx′ ⊆ Jx. So f(x) ≤ f(x′). So f is non-decreasing.

Similarly, if xn ↗ x, then we have Jx = ⋂n Jxn. So f(xn) → f(x). So f is left continuous.

Theorem. Let g : R → R be non-constant, non-decreasing and right continuous. Then there exists a unique Radon measure dg on B such that

dg((a, b]) = g(b) − g(a).

Moreover, we obtain all non-zero Radon measures on R in this way.

Proof. Take I and f as in the previous lemma, and let µ be the restriction of the Lebesgue measure to the Borel subsets of I. Now f is measurable since it is left continuous. We define dg = µ ∘ f⁻¹. Then we have

dg((a, b]) = µ({x ∈ I : a < f(x) ≤ b}) = µ({x ∈ I : g(a) < x ≤ g(b)}) = µ((g(a), g(b)]) = g(b) − g(a).

So dg is a Radon measure with the required property. There are no other such measures by the argument used for the uniqueness of the Lebesgue measure.

To show we get all non-zero Radon measures this way, suppose we have a Radon measure ν on R. We want to produce a g such that ν = dg. We set

g(y) = −ν((y, 0]) if y ≤ 0, and g(y) = ν((0, y]) if y > 0.

Then ν((a, b]) = g(b) − g(a). We see that ν is non-zero, so g is non-constant. It is also easy to see that g is non-decreasing and right continuous. So ν = dg, by the same uniqueness argument.

2.3 Random variables

Proposition. We have

FX(x) → 0 as x → −∞, and FX(x) → 1 as x → +∞.

Also, FX is non-decreasing and right-continuous.


Proposition. Let F be any distribution function. Then there exists a probability space (Ω, F, P) and a random variable X such that FX = F.

Proof. Take (Ω, F, P) = ((0, 1), B(0, 1), Lebesgue). We take X : Ω → R to be

X(ω) = inf{x : ω ≤ F(x)}.

Then we have

X(ω) ≤ x ⇔ ω ≤ F(x).

So we have

FX(x) = P[X ≤ x] = P[(0, F(x)]] = F(x).

Therefore FX = F.
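This construction is exactly inverse-CDF sampling. As an illustration (ours, not from the notes), here is a hedged Python sketch for F(x) = 1 − e^{−x}, where the generalized inverse has the closed form X(ω) = −log(1 − ω):

```python
# The proof's construction X(omega) = inf{x : omega <= F(x)} pushes the
# Lebesgue measure on (0,1) forward to a random variable with distribution
# function F. Example: F(x) = 1 - exp(-x), the Exponential(1) distribution.
import numpy as np

rng = np.random.default_rng(1)
omega = rng.random(100_000)      # "Lebesgue" samples from (0, 1)
X = -np.log(1.0 - omega)         # the generalized inverse of F at omega

# The empirical distribution function of X should match F at test points.
for x in [0.5, 1.0, 2.0]:
    print(x, (X <= x).mean(), 1 - np.exp(-x))
```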

Proposition. Two real-valued random variables X, Y are independent iff

P[X ≤ x, Y ≤ y] = P[X ≤ x] P[Y ≤ y]

for all x, y. More generally, if (Xn) is a sequence of real-valued random variables, then they are independent iff

P[X1 ≤ x1, · · · , Xn ≤ xn] = ∏_{j=1}^n P[Xj ≤ xj]

for all n and all x1, . . . , xn.

Proof. The ⇒ direction is obvious. For the other direction, we simply note that {(−∞, x] : x ∈ R} is a generating π-system for the Borel σ-algebra of R.

Proposition. Let

(Ω, F, P) = ((0, 1), B(0, 1), Lebesgue)

be our probability space. Then there exists a sequence (Rn) of independent Bernoulli(1/2) random variables.

Proof. Suppose we have ω ∈ Ω = (0, 1). Then we write ω as a binary expansion

ω = ∑_{n=1}^∞ ωn 2⁻ⁿ,

where ωn ∈ {0, 1}. We make the binary expansion unique by disallowing infinite sequences of zeroes.

We define Rn(ω) = ωn. We will show that each Rn is measurable. Indeed, we can write

R1(ω) = ω1 = 1_{(1/2,1]}(ω),

where 1_{(1/2,1]} is the indicator function. Since indicator functions of measurable sets are measurable, we know R1 is measurable. Similarly, we have

R2(ω) = 1_{(1/4,1/2]}(ω) + 1_{(3/4,1]}(ω).

So this is also a measurable function. More generally, we can do this for any Rn(ω): we have

Rn(ω) = ∑_{j=1}^{2^{n−1}} 1_{(2⁻ⁿ(2j−1), 2⁻ⁿ(2j)]}(ω).

So each Rn is a random variable, as each can be expressed as a sum of indicators of measurable sets.

Now let's calculate

P[Rn = 1] = ∑_{j=1}^{2^{n−1}} 2⁻ⁿ((2j) − (2j − 1)) = ∑_{j=1}^{2^{n−1}} 2⁻ⁿ = 1/2.

Then we have

P[Rn = 0] = 1 − P[Rn = 1] = 1/2

as well. So Rn ∼ Bernoulli(1/2).

We can straightforwardly check that (Rn) is an independent sequence, since for n ≠ m, we have

P[Rn = 0 and Rm = 0] = 1/4 = P[Rn = 0] P[Rm = 0].
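As an aside (our illustration, not from the notes), the digits Rn(ω) = ⌊2ⁿω⌋ mod 2 are easy to compute, and a quick simulation shows they behave like independent fair coins:

```python
# Sketch of the construction above: the n-th binary digit of omega in (0,1)
# is R_n(omega) = floor(2^n * omega) mod 2, and under Lebesgue measure the
# digits behave like independent Bernoulli(1/2) random variables.
import numpy as np

rng = np.random.default_rng(2)
omega = rng.random(100_000)                    # samples of omega in (0, 1)
bits = np.floor(omega[:, None] * 2.0 ** np.arange(1, 11)).astype(int) % 2

print(bits.mean(axis=0))                 # each digit ~ 0.5: R_n ~ Bernoulli(1/2)
print((bits[:, 0] & bits[:, 1]).mean())  # ~ 0.25: R_1 and R_2 look independent
```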

Proposition. Let

(Ω, F, P) = ((0, 1), B(0, 1), Lebesgue)

be our probability space. Given any sequence (Fn) of distribution functions, there is a sequence (Xn) of independent random variables with FXn = Fn for all n.

Proof. Let m : N² → N be any bijection, and relabel

Yk,n = R_{m(k,n)},

where the Rj are as in the previous proposition. We let

Yn = ∑_{k=1}^∞ 2⁻ᵏ Yk,n.

Then we know that (Yn) is an independent sequence of random variables, and each is uniform on (0, 1). As before, we define

Gn(y) = inf{x : y ≤ Fn(x)}.

We set Xn = Gn(Yn). Then (Xn) is a sequence of independent random variables with FXn = Fn.
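A hedged numerical sketch of this construction (the names pair, digits, Y1, Y2 are ours): relabel i.i.d. Bernoulli digits through a pairing bijection, sum them into independent uniforms, and push each uniform through a generalized inverse Gn. Here F1 is taken to be the Exponential(1) distribution function and F2 the uniform one:

```python
# Sketch: interleave i.i.d. Bernoulli(1/2) digits R_j (as in the previous
# proposition) via a pairing bijection m, sum Y_n = sum_k 2^{-k} R_{m(k,n)}
# to get independent uniforms, then set X_n = G_n(Y_n).
import numpy as np

def pair(k, n):
    """Cantor pairing bijection, playing the role of m(k, n)."""
    return (k + n) * (k + n + 1) // 2 + n

rng = np.random.default_rng(3)
samples, K = 20_000, 30                  # K truncates the series over k
M = pair(K, 2) + 1                       # enough digits for n = 1, 2
digits = rng.integers(0, 2, size=(samples, M), dtype=np.int8)

Y1 = sum(2.0 ** -k * digits[:, pair(k, 1) - 1] for k in range(1, K + 1))
Y2 = sum(2.0 ** -k * digits[:, pair(k, 2) - 1] for k in range(1, K + 1))

X1 = -np.log(1.0 - Y1)    # G_1(y) for F_1(x) = 1 - e^{-x}
X2 = Y2                   # G_2 is the identity for the uniform df

print(X1.mean(), X2.mean(), np.corrcoef(Y1, Y2)[0, 1])  # ~1.0, ~0.5, ~0.0
```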

2.4 Convergence of measurable functions

Theorem.

(i) If µ(E) <∞, then fn → f a.e. implies fn → f in measure.

(ii) For any E, if fn → f in measure, then there exists a subsequence (fnk) such that fnk → f a.e.

Proof.

(i) First suppose µ(E) < ∞, and fix ε > 0. Consider

µ({x ∈ E : |fn(x) − f(x)| ≤ ε}).

We use the result from the first example sheet that for any sequence of events (An), we have

lim inf µ(An) ≥ µ(lim inf An).

Applying this to the above sequence gives

lim inf µ({x : |fn(x) − f(x)| ≤ ε}) ≥ µ({x : |fm(x) − f(x)| ≤ ε eventually})
≥ µ({x ∈ E : |fm(x) − f(x)| → 0})
= µ(E).

As µ(E) < ∞, we have µ({x ∈ E : |fn(x) − f(x)| > ε}) → 0 as n → ∞.

(ii) Suppose that fn → f in measure. We pick a subsequence (nk) such that

µ({x ∈ E : |fnk(x) − f(x)| > 1/k}) ≤ 2⁻ᵏ.

Then we have

∑_{k=1}^∞ µ({x ∈ E : |fnk(x) − f(x)| > 1/k}) ≤ ∑_{k=1}^∞ 2⁻ᵏ = 1 < ∞.

By the first Borel–Cantelli lemma, we know

µ({x ∈ E : |fnk(x) − f(x)| > 1/k i.o.}) = 0.

So fnk → f a.e.

Theorem (Skorokhod representation theorem of weak convergence).

(i) If (Xn), X are defined on the same probability space, and Xn → X in probability, then Xn → X in distribution.

(ii) If Xn → X in distribution, then there exist random variables (X̃n) and X̃ defined on a common probability space with FX̃n = FXn and FX̃ = FX, such that X̃n → X̃ a.s.

Proof. Let S = {x ∈ R : FX is continuous at x}.

(i) Assume that Xn → X in probability. Fix x ∈ S. We need to show that FXn(x) → FX(x) as n → ∞.

We fix ε > 0. Since x ∈ S, there is some δ > 0 such that

FX(x − δ) ≥ FX(x) − ε/2
FX(x + δ) ≤ FX(x) + ε/2.

We fix N large such that n ≥ N implies P[|Xn − X| ≥ δ] ≤ ε/2. Then

FXn(x) = P[Xn ≤ x] = P[(Xn − X) + X ≤ x].

We now notice that {(Xn − X) + X ≤ x} ⊆ {X ≤ x + δ} ∪ {|Xn − X| > δ}. So we have

FXn(x) ≤ P[X ≤ x + δ] + P[|Xn − X| > δ] ≤ FX(x + δ) + ε/2 ≤ FX(x) + ε.

We similarly have

FXn(x) = P[Xn ≤ x] ≥ P[X ≤ x − δ] − P[|Xn − X| > δ] ≥ FX(x − δ) − ε/2 ≥ FX(x) − ε.

Combining, we have that n ≥ N implies |FXn(x) − FX(x)| ≤ ε. Since ε was arbitrary, we are done.

(ii) Suppose Xn → X in distribution. We again let

(Ω, F, P) = ((0, 1), B((0, 1)), Lebesgue).

We let

X̃n(ω) = inf{x : ω ≤ FXn(x)}, X̃(ω) = inf{x : ω ≤ FX(x)}.

Recall from before that X̃n has the same distribution function as Xn for all n, and X̃ has the same distribution as X. Moreover, we have

X̃n(ω) ≤ x ⇔ ω ≤ FXn(x), x < X̃n(ω) ⇔ FXn(x) < ω,

and similarly if we replace X̃n with X̃.

We are now going to show that with this particular choice, we have X̃n → X̃ a.s.

Note that X̃ is a non-decreasing function (0, 1) → R. Then by general analysis, X̃ has at most countably many discontinuities. We write

Ω0 = {ω ∈ (0, 1) : X̃ is continuous at ω}.

Then (0, 1) \ Ω0 is countable, and hence has Lebesgue measure 0. So

P[Ω0] = 1.

We are now going to show that X̃n(ω) → X̃(ω) for all ω ∈ Ω0.

Note that FX is a non-decreasing function, and hence its set of discontinuities R \ S is also countable. So S is dense in R. Fix ω ∈ Ω0 and ε > 0. We want to show that |X̃n(ω) − X̃(ω)| ≤ ε for all n large enough.

Since S is dense in R, we can find x−, x+ in S such that

x− < X̃(ω) < x+

and x+ − x− < ε. What we want to do is to use the characteristic property of X̃ and FX to say that this implies

FX(x−) < ω < FX(x+).

Then since FXn → FX at the points x−, x+, for sufficiently large n, we would have

FXn(x−) < ω < FXn(x+),

hence

x− < X̃n(ω) ≤ x+,

and it would follow that |X̃n(ω) − X̃(ω)| < ε.

However, this doesn't quite work, since X̃(ω) < x+ only implies ω ≤ FX(x+), and our argument will break down. So we do a funny thing where we introduce a new variable ω+.

Since X̃ is continuous at ω, we can find ω+ ∈ (ω, 1) such that X̃(ω+) ≤ x+. Then we have

x− < X̃(ω) ≤ X̃(ω+) ≤ x+.

So we have

FX(x−) < ω < ω+ ≤ FX(x+).

So for sufficiently large n, we have

FXn(x−) < ω < FXn(x+).

So we have

x− < X̃n(ω) ≤ x+,

and we are done.


2.5 Tail events

Theorem (Kolmogorov 0-1 law). Let (Xn) be a sequence of independent (real-valued) random variables. If A ∈ T , then P[A] = 0 or 1.

Moreover, if X is a T -measurable random variable, then there exists a constant c such that

P[X = c] = 1.

Proof. The proof is very funny the first time we see it. We are going to prove the theorem by checking something that seems very strange. We are going to show that if A ∈ T , then A is independent of A. It then follows that

P[A] = P[A ∩ A] = P[A] P[A],

so P[A] = 0 or 1. In fact, we are going to prove that T is independent of T . Let

Fn = σ(X1, · · · , Xn).

This σ-algebra is generated by the π-system of events of the form

A = {X1 ≤ x1, · · · , Xn ≤ xn}.

Similarly, Tn = σ(Xn+1, Xn+2, · · · ) is generated by the π-system of events of the form

B = {Xn+1 ≤ xn+1, · · · , Xn+k ≤ xn+k},

where k is any natural number.

Since the Xn are independent, we know for any such A and B, we have

P[A ∩ B] = P[A] P[B].

Since this is true for all A and B, it follows that Fn is independent of Tn. Since T = ⋂k Tk ⊆ Tn for each n, we know Fn is independent of T .

Now ⋃k Fk is a π-system which generates the σ-algebra F∞ = σ(X1, X2, · · · ). We know that if A ∈ ⋃n Fn, then there has to exist an index n such that A ∈ Fn. So A is independent of T . So F∞ is independent of T .

Finally, note that T ⊆ F∞. So T is independent of T .

To find the constant, suppose that X is T -measurable. Then

P[X ≤ x] ∈ {0, 1}

for all x ∈ R, since {X ≤ x} ∈ T . Now take

c = inf{x ∈ R : P[X ≤ x] = 1}.

Then with this particular choice of c, it is easy to see that P[X = c] = 1. This completes the proof of the theorem.


3 Integration

3.1 Definition and basic properties

Proposition. A function is simple iff it is measurable, non-negative, and takes on only finitely many values.

Proposition. Let f : [0, 1] → R be Riemann integrable. Then it is also Lebesgue integrable, and the two integrals agree.

Theorem (Monotone convergence theorem). Suppose that (fn), f are non-negative measurable with fn ↗ f. Then µ(fn) ↗ µ(f).

Proof. We will split the proof into four steps. We will prove each of the following in turn:

(i) If fn and f are indicator functions, then the theorem holds.

(ii) If f is an indicator function, then the theorem holds.

(iii) If f is simple, then the theorem holds.

(iv) If f is non-negative measurable, then the theorem holds.

Each part follows rather straightforwardly from the previous one, and the reader is encouraged to try to prove it themselves.

We first consider the case where fn = 1_{An} and f = 1_A. Then fn ↗ f is true iff An ↗ A. On the other hand, µ(fn) ↗ µ(f) iff µ(An) ↗ µ(A).

For convenience, we let A0 = ∅. We can write

µ(A) = µ(⋃n (An \ An−1))
= ∑_{n=1}^∞ µ(An \ An−1)
= lim_{N→∞} ∑_{n=1}^N µ(An \ An−1)
= lim_{N→∞} µ(AN).

So done.

We next consider the case where f = 1_A for some A. Fix ε > 0, and set

An = {fn > 1 − ε} ∈ E.

Then we know that An ↗ A, as fn ↗ f. Moreover, by definition, we have

(1 − ε) 1_{An} ≤ fn ≤ f = 1_A.

As An ↗ A, we have that

(1 − ε) µ(f) = (1 − ε) lim_{n→∞} µ(An) ≤ lim_{n→∞} µ(fn) ≤ µ(f),

since fn ≤ f. Since ε is arbitrary, we know that

lim_{n→∞} µ(fn) = µ(f).

Next, we consider the case where f is simple. We write

f = ∑_{k=1}^m ak 1_{Ak},

where ak > 0 and the Ak are pairwise disjoint. Since fn ↗ f, we know

ak⁻¹ fn 1_{Ak} ↗ 1_{Ak}.

So we have

µ(fn) = ∑_{k=1}^m µ(fn 1_{Ak}) = ∑_{k=1}^m ak µ(ak⁻¹ fn 1_{Ak}) → ∑_{k=1}^m ak µ(Ak) = µ(f).

Finally, suppose f is non-negative measurable, and suppose g ≤ f is a simple function. As fn ↗ f, we know fn ∧ g ↗ f ∧ g = g. So by the previous case, we know that

µ(fn ∧ g) → µ(g).

We also know that

µ(fn) ≥ µ(fn ∧ g).

So we have

lim_{n→∞} µ(fn) ≥ µ(g)

for all simple g ≤ f. This is possible only if

lim_{n→∞} µ(fn) ≥ µ(f),

by definition of the integral. However, we also know that µ(fn) ≤ µ(f) for all n, again by definition of the integral. So we must have equality. So we have

µ(f) = lim_{n→∞} µ(fn).
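As a numerical aside (ours, not the notes'; scipy assumed available), the theorem is easy to watch in action for the truncations fn = f ∧ n of an unbounded integrable function:

```python
# Numeric illustration of monotone convergence: f(x) = x**(-1/2) on (0, 1]
# has integral 2, and the truncations f_n = min(f, n) increase to f, so
# their integrals should increase towards 2.
import numpy as np
from scipy.integrate import quad

f = lambda x: x ** -0.5
for n in [1, 2, 4, 8, 16]:
    mu_fn, _ = quad(lambda x: min(f(x), n), 0.0, 1.0)
    print(n, mu_fn)   # 1.0, 1.5, 1.75, ... increasing towards 2.0
```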

Theorem. Let f, g be non-negative measurable, and α, β ≥ 0. We have that

(i) µ(αf + βg) = αµ(f) + βµ(g).

(ii) f ≤ g implies µ(f) ≤ µ(g).

(iii) f = 0 a.e. iff µ(f) = 0.

Proof.

(i) Let

fn = 2⁻ⁿ ⌊2ⁿ f⌋ ∧ n, gn = 2⁻ⁿ ⌊2ⁿ g⌋ ∧ n.

Then fn, gn are simple with fn ↗ f and gn ↗ g. Hence µ(fn) ↗ µ(f), µ(gn) ↗ µ(g) and µ(αfn + βgn) ↗ µ(αf + βg), by the monotone convergence theorem. As fn, gn are simple, we have that

µ(αfn + βgn) = α µ(fn) + β µ(gn).

Taking the limit as n → ∞, we get

µ(αf + βg) = α µ(f) + β µ(g).

(ii) We shall be careful not to use the monotone convergence theorem. We have

µ(g) = sup{µ(h) : h ≤ g simple} ≥ sup{µ(h) : h ≤ f simple} = µ(f).

(iii) Suppose f is not zero a.e. Let

An = {x : f(x) > 1/n}.

Then

{x : f(x) ≠ 0} = ⋃n An.

Since the left hand set has positive measure, it follows that there is some An with positive measure. For that n, we define

h = (1/n) 1_{An}.

Then µ(f) ≥ µ(h) > 0. So µ(f) ≠ 0.

Conversely, suppose f = 0 a.e. We let

fn = 2⁻ⁿ ⌊2ⁿ f⌋ ∧ n

be simple functions. Then fn ↗ f and fn = 0 a.e. So

µ(f) = lim_{n→∞} µ(fn) = 0.

Theorem. Let f, g be integrable, and α, β ∈ R. We have that

(i) µ(αf + βg) = αµ(f) + βµ(g).

(ii) f ≤ g implies µ(f) ≤ µ(g).

(iii) f = 0 a.e. implies µ(f) = 0.

Proof.

(i) We are going to prove these by applying the previous theorem.

By definition of the integral, we have µ(−f) = −µ(f). Also, if α ≥ 0, then

µ(αf) = µ(αf⁺) − µ(αf⁻) = α µ(f⁺) − α µ(f⁻) = α µ(f).

Combining these two properties, it then follows that if α is a real number, then

µ(αf) = α µ(f).

To finish the proof of (i), we have to show that µ(f + g) = µ(f) + µ(g). We know that this is true for non-negative functions, so we need to employ a little trick to make this a statement about non-negative functions. If we let h = f + g, then we can write this as

h⁺ − h⁻ = (f⁺ − f⁻) + (g⁺ − g⁻).

We now rearrange this as

h⁺ + f⁻ + g⁻ = f⁺ + g⁺ + h⁻.

Now everything is non-negative measurable. So applying µ gives

µ(h⁺) + µ(f⁻) + µ(g⁻) = µ(f⁺) + µ(g⁺) + µ(h⁻).

Rearranging, we obtain

µ(h⁺) − µ(h⁻) = µ(f⁺) − µ(f⁻) + µ(g⁺) − µ(g⁻).

This is exactly the same thing as saying

µ(f + g) = µ(h) = µ(f) + µ(g).

(ii) If f ≤ g, then g − f ≥ 0. So µ(g − f) ≥ 0. By (i), we know µ(g) − µ(f) ≥ 0. So µ(g) ≥ µ(f).

(iii) If f = 0 a.e., then f⁺, f⁻ = 0 a.e. So µ(f⁺) = µ(f⁻) = 0. So µ(f) = µ(f⁺) − µ(f⁻) = 0.

Proposition. If A is a π-system with E ∈ A and σ(A) = E, and f is an integrable function such that

µ(f 1_A) = 0

for all A ∈ A, then f = 0 a.e.

Proof. Let

D = {A ∈ E : µ(f 1_A) = 0}.

It follows immediately from the properties of the integral that D is a d-system. So D = E by Dynkin's lemma. Let

A⁺ = {x ∈ E : f(x) > 0}, A⁻ = {x ∈ E : f(x) < 0}.

Then A± ∈ E, and

µ(f 1_{A⁺}) = µ(f 1_{A⁻}) = 0.

So f 1_{A⁺} and f 1_{A⁻} vanish a.e. So f vanishes a.e.


Proposition. Suppose that (gn) is a sequence of non-negative measurable functions. Then we have

µ(∑_{n=1}^∞ gn) = ∑_{n=1}^∞ µ(gn).

Proof. We know that

∑_{n=1}^N gn ↗ ∑_{n=1}^∞ gn

as N → ∞. So by the monotone convergence theorem, we have

∑_{n=1}^N µ(gn) = µ(∑_{n=1}^N gn) ↗ µ(∑_{n=1}^∞ gn).

But we also know that

∑_{n=1}^N µ(gn) ↗ ∑_{n=1}^∞ µ(gn)

by definition. So we are done.

3.2 Integrals and limits

Theorem (Fatou's lemma). Let (fn) be a sequence of non-negative measurable functions. Then

µ(lim inf fn) ≤ lim inf µ(fn).

Proof. We start with the trivial observation that if k ≥ n, then we always have that

inf_{m≥n} fm ≤ fk.

By the monotonicity of the integral, we know that

µ(inf_{m≥n} fm) ≤ µ(fk)

for all k ≥ n. So we have

µ(inf_{m≥n} fm) ≤ inf_{k≥n} µ(fk) ≤ lim inf_m µ(fm).

It remains to show that the left hand side converges to µ(lim inf fm). Indeed, we know that

inf_{m≥n} fm ↗ lim inf_m fm.

Then by monotone convergence, we have

µ(inf_{m≥n} fm) ↗ µ(lim inf_m fm).

So we have

µ(lim inf_m fm) ≤ lim inf_m µ(fm).


Theorem (Dominated convergence theorem). Let (fn), f be measurable with fn(x) → f(x) for all x ∈ E. Suppose that there is an integrable function g such that

|fn| ≤ g

for all n. Then we have

µ(fn) → µ(f)

as n → ∞.

Proof. Note that

|f| = limn |fn| ≤ g.

So we know that

µ(|f|) ≤ µ(g) < ∞.

So we know that f and the fn are integrable. Now note also that

0 ≤ g + fn, 0 ≤ g − fn

for all n. We are now going to apply Fatou's lemma twice, with these two sequences. We have that

µ(g) + µ(f) = µ(g + f)
= µ(lim infn (g + fn))
≤ lim infn µ(g + fn)
= lim infn (µ(g) + µ(fn))
= µ(g) + lim infn µ(fn).

Since µ(g) is finite, we know that

µ(f) ≤ lim infn µ(fn).

We now do the same thing with g − fn. We have

µ(g) − µ(f) = µ(g − f)
= µ(lim infn (g − fn))
≤ lim infn µ(g − fn)
= lim infn (µ(g) − µ(fn))
= µ(g) − lim supn µ(fn).

Again, since µ(g) is finite, we know that

µ(f) ≥ lim supn µ(fn).

These combine to tell us that

µ(f) ≤ lim infn µ(fn) ≤ lim supn µ(fn) ≤ µ(f).

So they must all be equal, and thus µ(fn) → µ(f).


3.3 New measures from old

Lemma. For (E, E, µ) a measure space and A ∈ E, the restriction to A is a measure space.

Proposition. Let (E, E, µ) and (F, F, µ′) be measure spaces and A ∈ E. Let f : E → F be a measurable function. Then f|A is EA-measurable.

Proof. Let B ∈ F. Then

(f|A)⁻¹(B) = f⁻¹(B) ∩ A ∈ EA.

Proposition. If f is integrable, then f|A is µA-integrable and µA(f|A) = µ(f 1_A).

Proposition. If g is a non-negative measurable function on G, then

ν(g) = µ(g ∘ f).

Proof. Exercise using the monotone class theorem (see example sheet).

Proposition. The ν defined above is indeed a measure.

Proof.

(i) ν(∅) = µ(f 1_∅) = µ(0) = 0.

(ii) If (An) is a disjoint sequence in E, then

ν(⋃ An) = µ(f 1_{⋃An}) = µ(f ∑ 1_{An}) = ∑ µ(f 1_{An}) = ∑ ν(An).

3.4 Integration and differentiation

Proposition (Change of variables formula). Let φ : [a, b] → R be continuously differentiable and increasing. Then for any bounded Borel function g, we have

∫_{φ(a)}^{φ(b)} g(y) dy = ∫_a^b g(φ(x)) φ′(x) dx. (∗)

Proof. We let

V = {Borel functions g such that (∗) holds}.

We will want to use the monotone class theorem to show that this includes all bounded Borel functions. We already know that:

(i) V contains 1_A for all A in the π-system of intervals of the form [u, v] ⊆ [a, b]. This is just the fundamental theorem of calculus.

(ii) By linearity of the integral, V is indeed a vector space.

(iii) Finally, let (gn) be a sequence in V with gn ≥ 0 and gn ↗ g, where g is bounded. Then we know that

∫_{φ(a)}^{φ(b)} gn(y) dy = ∫_a^b gn(φ(x)) φ′(x) dx.

By the monotone convergence theorem, both sides converge, giving

∫_{φ(a)}^{φ(b)} g(y) dy = ∫_a^b g(φ(x)) φ′(x) dx.

Then by the monotone class theorem, V contains all bounded Borel functions.
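A quick numeric spot check (our example, not from the notes; scipy assumed available), with φ(x) = x² on [1, 2] and g = sin; both sides of (∗) can be computed by quadrature:

```python
# Check the change of variables formula for phi(x) = x**2 (increasing and
# C^1 on [1, 2]) and the bounded Borel function g = sin.
import numpy as np
from scipy.integrate import quad

a, b = 1.0, 2.0
phi, dphi = lambda x: x**2, lambda x: 2.0 * x
g = np.sin

lhs, _ = quad(g, phi(a), phi(b))                    # integral of g(y) dy
rhs, _ = quad(lambda x: g(phi(x)) * dphi(x), a, b)  # substituted integral
print(lhs, rhs)                                     # the two values agree
```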

Theorem (Differentiation under the integral sign). Let (E, E, µ) be a measure space, U ⊆ R an open set, and f : U × E → R. We assume that

(i) For any t ∈ U fixed, the map x ↦ f(t, x) is integrable;

(ii) For any x ∈ E fixed, the map t ↦ f(t, x) is differentiable;

(iii) There exists an integrable function g such that

|∂f/∂t (t, x)| ≤ g(x)

for all x ∈ E and t ∈ U.

Then the map

x ↦ ∂f/∂t (t, x)

is integrable for all t. Also, the function

F(t) = ∫_E f(t, x) dµ

is differentiable, and

F′(t) = ∫_E ∂f/∂t (t, x) dµ.

Proof. Measurability of the derivative follows from the fact that it is a limit of measurable functions, and then integrability follows since it is bounded by g.

Suppose (hn) is a positive sequence with hn → 0. Then let

gn(x) = (f(t + hn, x) − f(t, x))/hn − ∂f/∂t (t, x).

Since f is differentiable in t, we know that gn(x) → 0 as n → ∞. Moreover, by the mean value theorem, we know that

|gn(x)| ≤ 2 g(x).

On the other hand, by definition of F(t), we have

(F(t + hn) − F(t))/hn − ∫_E ∂f/∂t (t, x) dµ = ∫_E gn(x) dµ.

By dominated convergence, the right hand side tends to 0. So we know that

lim_{n→∞} (F(t + hn) − F(t))/hn = ∫_E ∂f/∂t (t, x) dµ.

Since (hn) was arbitrary, it follows that F′(t) exists and is equal to the integral.
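For a concrete check (our example, with scipy assumed available), take F(t) = ∫_0^∞ e^{−tx} dx = 1/t for t > 0; the hypotheses hold for t bounded away from 0, and F′(t) = −1/t² should match the integral of the t-derivative:

```python
# F(t) = integral_0^inf exp(-t x) dx = 1/t, so F'(t) = -1/t**2, which should
# equal integral_0^inf -x exp(-t x) dx (the derivative moved inside).
import numpy as np
from scipy.integrate import quad

t = 1.5
F = lambda s: quad(lambda x: np.exp(-s * x), 0.0, np.inf)[0]
inside = quad(lambda x: -x * np.exp(-t * x), 0.0, np.inf)[0]

h = 1e-4
print((F(t + h) - F(t - h)) / (2.0 * h))  # finite-difference F'(t)
print(inside, -1.0 / t**2)                # both ~ -0.4444
```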


3.5 Product measures and Fubini’s theorem

Lemma. Let E = E1 × E2, equipped with the product σ-algebra E = E1 ⊗ E2, and suppose f : E → R is an E-measurable function. Then

(i) For each x2 ∈ E2, the function x1 ↦ f(x1, x2) is E1-measurable.

(ii) If f is bounded or non-negative measurable, then

f2(x2) = ∫_{E1} f(x1, x2) µ1(dx1)

is E2-measurable.

Proof. The first part follows immediately from the fact that, for a fixed x2, the map ι1 : E1 → E given by ι1(x1) = (x1, x2) is measurable, and that the composition of measurable functions is measurable.

For the second part, we use the monotone class theorem. We let V be the set of all measurable functions f such that x2 ↦ ∫_{E1} f(x1, x2) µ1(dx1) is E2-measurable.

(i) It is clear that 1_E, 1_A ∈ V for all A ∈ A (where A is as in the definition of the product σ-algebra).

(ii) V is a vector space by linearity of the integral.

(iii) Suppose (fn) is a non-negative sequence in V and fn ↗ f. Then

(x2 ↦ ∫_{E1} fn(x1, x2) µ1(dx1)) ↗ (x2 ↦ ∫_{E1} f(x1, x2) µ1(dx1))

by the monotone convergence theorem. So f ∈ V.

So the monotone class theorem tells us V contains all bounded measurable functions.

Now if f is a general non-negative measurable function, then f ∧ n is bounded and measurable, hence f ∧ n ∈ V. Therefore f ∈ V by the monotone convergence theorem.

Theorem. There exists a unique measure µ = µ1 ⊗ µ2 on E such that

µ(A1 × A2) = µ1(A1) µ2(A2)

for all A1 × A2 ∈ A.

Proof. One might be tempted to just apply the Carathéodory extension theorem, but we have a more direct way of doing it here, by using integrals. We define

µ(A) = ∫_{E1} (∫_{E2} 1_A(x1, x2) µ2(dx2)) µ1(dx1).

Here the previous lemma is very important. It tells us that these integrals actually make sense!

We first check that this is a measure:

(i) µ(∅) = 0 is immediate since 1_∅ = 0.

(ii) Suppose (An) is a disjoint sequence and A = ⋃ An. Then we have

µ(A) = ∫_{E1} (∫_{E2} 1_A(x1, x2) µ2(dx2)) µ1(dx1)
= ∫_{E1} (∫_{E2} ∑n 1_{An}(x1, x2) µ2(dx2)) µ1(dx1).

We now use the fact that integration commutes with sums of non-negative measurable functions to get

µ(A) = ∫_{E1} (∑n ∫_{E2} 1_{An}(x1, x2) µ2(dx2)) µ1(dx1)
= ∑n ∫_{E1} (∫_{E2} 1_{An}(x1, x2) µ2(dx2)) µ1(dx1)
= ∑n µ(An).

So we have a measure, and it clearly satisfies

µ(A1 × A2) = µ1(A1) µ2(A2).

Uniqueness follows because µ is finite, and is thus characterized by its values on the π-system A that generates E.

Theorem (Fubini's theorem).

(i) If f is non-negative measurable, then

µ(f) = ∫_{E1} (∫_{E2} f(x1, x2) µ2(dx2)) µ1(dx1). (∗)

In particular, we have

∫_{E1} (∫_{E2} f(x1, x2) µ2(dx2)) µ1(dx1) = ∫_{E2} (∫_{E1} f(x1, x2) µ1(dx1)) µ2(dx2).

This is sometimes known as Tonelli's theorem.

(ii) If f is integrable, and

A = {x1 ∈ E1 : ∫_{E2} |f(x1, x2)| µ2(dx2) < ∞},

then

µ1(E1 \ A) = 0.

If we set

f1(x1) = ∫_{E2} f(x1, x2) µ2(dx2) for x1 ∈ A, and f1(x1) = 0 for x1 ∉ A,

then f1 is a µ1-integrable function and

µ1(f1) = µ(f).


Proof.

(i) Let V be the set of all measurable functions for which (∗) holds. Then V is a vector space since integration is linear.

(a) By definition of µ, we know 1_E and 1_A are in V for all A ∈ A.

(b) The monotone convergence theorem on both sides tells us that V is closed under monotone limits of the form fn ↗ f, fn ≥ 0.

By the monotone class theorem, we know V contains all bounded measurable functions. If f is non-negative measurable, then (f ∧ n) ∈ V, and monotone convergence for f ∧ n ↗ f gives that f ∈ V.

(ii) Assume that f is µ-integrable. Then

x1 ↦ ∫_{E2} |f(x1, x2)| µ2(dx2)

is E1-measurable, and, by (i), µ1-integrable. So A, being the complement of the inverse image of {∞} under that map, lies in E1. Moreover, µ1(E1 \ A) = 0, because an integrable function can only be infinite on a set of measure 0.

We set

f1⁺(x1) = ∫_{E2} f⁺(x1, x2) µ2(dx2), f1⁻(x1) = ∫_{E2} f⁻(x1, x2) µ2(dx2).

Then we have

f1 = (f1⁺ − f1⁻) 1_A.

So the result follows, since

µ(f) = µ(f⁺) − µ(f⁻) = µ1(f1⁺) − µ1(f1⁻) = µ1(f1)

by (i).

Proposition. Let X1, · · · , Xn be random variables on (Ω, F, P) with values in (E1, E1), · · · , (En, En) respectively. We define

E = E1 × · · · × En, E = E1 ⊗ · · · ⊗ En.

Then X = (X1, · · · , Xn) is E-measurable and the following are equivalent:

(i) X1, · · · , Xn are independent.

(ii) µX = µX1 ⊗ · · · ⊗ µXn.

(iii) For any f1, · · · , fn bounded and measurable, we have

E[∏_{k=1}^n fk(Xk)] = ∏_{k=1}^n E[fk(Xk)].

Proof.

– (i) ⇒ (ii): Let ν = µX1 ⊗ · · · ⊗ µXn. We want to show that ν = µX. To do so, we just have to check that they agree on a π-system generating the entire σ-algebra. We let

A = {A1 × · · · × An : A1 ∈ E1, · · · , An ∈ En}.

Then A is a generating π-system of E. Moreover, if A = A1 × · · · × An ∈ A, then we have

µX(A) = P[X ∈ A] = P[X1 ∈ A1, · · · , Xn ∈ An].

By independence, this is

∏_{k=1}^n P[Xk ∈ Ak] = ν(A).

So we know that µX = ν = µX1 ⊗ · · · ⊗ µXn on E.

– (ii) ⇒ (iii): By assumption, we can evaluate the expectation

E[∏_{k=1}^n fk(Xk)] = ∫_E ∏_{k=1}^n fk(xk) µX(dx) = ∏_{k=1}^n ∫_{Ek} fk(xk) µXk(dxk) = ∏_{k=1}^n E[fk(Xk)],

where we have used Fubini's theorem in the middle step.

– (iii) ⇒ (i): Take fk = 1_{Ak} for Ak ∈ Ek. Then we have

P[X1 ∈ A1, · · · , Xn ∈ An] = E[∏_{k=1}^n 1_{Ak}(Xk)] = ∏_{k=1}^n E[1_{Ak}(Xk)] = ∏_{k=1}^n P[Xk ∈ Ak].

So X1, · · · , Xn are independent.
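A quick empirical version of (iii) (our illustration): for independent samples, E[f(X) g(Y)] should factor into E[f(X)] E[g(Y)].

```python
# Empirical check that expectations factor for independent random variables.
import numpy as np

rng = np.random.default_rng(7)
X, Y = rng.normal(size=(2, 200_000))     # independent N(0, 1) samples
f, g = np.abs, np.cos

print((f(X) * g(Y)).mean(), f(X).mean() * g(Y).mean())  # approximately equal
```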


4 Inequalities and Lp spaces

4.1 Four inequalities

Proposition (Chebyshev's/Markov's inequality). Let f be non-negative measurable and λ > 0. Then

µ({f ≥ λ}) ≤ (1/λ) µ(f).

Proof. We write

f ≥ f 1_{f≥λ} ≥ λ 1_{f≥λ}.

Taking µ gives the desired answer.
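A numeric sanity check (ours): for exponential samples with mean 1, the tail mass µ(f ≥ λ) always sits below µ(f)/λ.

```python
# Markov's inequality on Exponential(1) samples: P[f >= lam] <= E[f] / lam.
import numpy as np

rng = np.random.default_rng(4)
f = rng.exponential(1.0, size=100_000)

for lam in [0.5, 1.0, 2.0, 4.0]:
    print(lam, (f >= lam).mean(), f.mean() / lam)  # LHS <= RHS in each row
```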

Proposition (Jensen's inequality). Let X be an integrable random variable with values in I. If c : I → R is convex, then we have

E[c(X)] ≥ c(E[X]).

Lemma. If c : I → R is a convex function and m is in the interior of I, then there exist real numbers a, b such that

c(x) ≥ ax + b

for all x ∈ I, with equality at x = m.

(Diagram: the graph of c, with the supporting line ax + b touching it at x = m.)

Proof. If c is smooth, then we know c′′ ≥ 0, and thus c′ is non-decreasing. Weare going to show an analogous statement that does not mention the word“derivative”. Consider x < m < y with x, y,m ∈ I. We want to show that

c(m)− c(x)

m− x≤ c(y)− c(m)

y −m.

To show this, we turn off our brains and do the only thing we can do. We canwrite

m = tx+ (1− t)y

for some t. Then convexity tells us

c(m) ≤ tc(x) + (1− t)c(y).

Writing c(m) = tc(m) + (1− t)c(m), this tells us

t(c(m)− c(x)) ≤ (1− t)(c(y)− c(m)).


To conclude, we simply have to compute the actual value of t and plug it in. We have

t = (y − m)/(y − x), 1 − t = (m − x)/(y − x).

So we obtain

((y − m)/(y − x))(c(m) − c(x)) ≤ ((m − x)/(y − x))(c(y) − c(m)).

Cancelling the y − x and dividing by the factors gives the desired result. Now since x and y are arbitrary, we know there is some a ∈ R such that

(c(m) − c(x))/(m − x) ≤ a ≤ (c(y) − c(m))/(y − m)

for all x < m < y. If we rearrange, then we obtain

c(t) ≥ a(t − m) + c(m)

for all t ∈ I.

Proof of Jensen's inequality. To apply the previous result, we need to pick the right m. We take

m = E[X].

To apply this, we need to know that m is in the interior of I. So we assume that X is not a.s. constant (that case is boring, since then both sides are equal). By the lemma, we can find some a, b ∈ R such that

c(X) ≥ aX + b.

We want to take the expectation of the LHS, but we have to make sure that E[c(X)] is a sensible thing to talk about. To make sure it makes sense, we show that E[c(X)^−] = E[(−c(X)) ∨ 0] is finite. We simply bound

c(X)^− = (−c(X)) ∨ 0 ≤ |a||X| + |b|.

So we have

E[c(X)^−] ≤ |a| E|X| + |b| < ∞,

since X is integrable. So E[c(X)] makes sense. We then just take

E[c(X)] ≥ E[aX + b] = a E[X] + b = am + b = c(m) = c(E[X]).

So done.
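The gap in Jensen's inequality is visible numerically (a Python sketch assuming numpy; c(x) = e^x and a standard normal X are arbitrary choices). Here E[c(X)] = e^{1/2} ≈ 1.649 while c(E[X]) = e^0 = 1.

import numpy as np

# E[exp(X)] vs exp(E[X]) for X ~ N(0, 1); exact values are e^(1/2) and 1.
rng = np.random.default_rng(2)
X = rng.normal(size=10**6)
print(np.mean(np.exp(X)), np.exp(np.mean(X)))  # ~1.649 >= ~1.0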

Proposition (Hölder's inequality). Let p, q ∈ (1, ∞) be conjugate. Then for f, g measurable, we have

µ(|fg|) = ‖fg‖_1 ≤ ‖f‖_p ‖g‖_q.

When p = q = 2, this is the Cauchy–Schwarz inequality.


Proof. We assume that ‖f‖_p > 0 and ‖f‖_p < ∞. Otherwise, there is nothing to prove. By scaling, we may assume that ‖f‖_p = 1. We make up a probability measure by

P[A] = ∫ |f|^p 1_A dµ.

Since we know

‖f‖_p = (∫ |f|^p dµ)^{1/p} = 1,

we know P[ · ] is a probability measure. Then we have

µ(|fg|) = µ(|fg| 1_{|f|>0}) = µ((|g|/|f|^{p−1}) 1_{|f|>0} |f|^p) = E[(|g|/|f|^{p−1}) 1_{|f|>0}].

Now use the fact that (E|X|)^q ≤ E[|X|^q], since x ↦ x^q is convex for q > 1. Then we obtain

E[(|g|/|f|^{p−1}) 1_{|f|>0}] ≤ (E[(|g|^q/|f|^{(p−1)q}) 1_{|f|>0}])^{1/q}.

The key realization now is that 1/q + 1/p = 1 means that q(p − 1) = p. So this becomes

E[(|g|^q/|f|^p) 1_{|f|>0}]^{1/q} = µ(|g|^q 1_{|f|>0})^{1/q} ≤ µ(|g|^q)^{1/q} = ‖g‖_q.

Using the fact that ‖f‖_p = 1, we obtain the desired result.

Alternative proof. We wlog 0 < ‖f‖_p, ‖g‖_q < ∞, or else there is nothing to prove. By scaling, we wlog ‖f‖_p = ‖g‖_q = 1. Then we have to show that

∫ |f||g| dµ ≤ 1.

To do so, we notice that if 1/p + 1/q = 1, then the concavity of log tells us for any a, b > 0, we have

(1/p) log a + (1/q) log b ≤ log(a/p + b/q).

Replacing a with a^p, b with b^q, and then taking exponentials tells us

ab ≤ a^p/p + b^q/q.

While we assumed a, b > 0 when deriving this, we observe that it is also valid when some of them are zero. So we have

∫ |f||g| dµ ≤ ∫ (|f|^p/p + |g|^q/q) dµ = 1/p + 1/q = 1.
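For a concrete check of Hölder's inequality (a Python sketch assuming numpy, with counting measure on a finite set and the conjugate pair p = 3, q = 3/2 chosen arbitrarily):

import numpy as np

# Check mu(|fg|) <= ||f||_p ||g||_q for counting measure on {0,...,999}.
rng = np.random.default_rng(3)
f, g = rng.normal(size=1000), rng.normal(size=1000)
p, q = 3.0, 1.5  # conjugate: 1/3 + 2/3 = 1
lhs = np.sum(np.abs(f * g))
rhs = np.sum(np.abs(f)**p)**(1 / p) * np.sum(np.abs(g)**q)**(1 / q)
print(lhs <= rhs, lhs, rhs)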

Lemma. Let a, b ≥ 0 and p ≥ 1. Then

(a + b)^p ≤ 2^p(a^p + b^p).


Proof. We wlog a ≤ b. Then

(a + b)^p ≤ (2b)^p = 2^p b^p ≤ 2^p(a^p + b^p).

Theorem (Minkowski inequality). Let p ∈ [1,∞] and f, g measurable. Then

‖f + g‖p ≤ ‖f‖p + ‖g‖p.

Proof. We do the boring cases first. If p = 1, then

‖f + g‖_1 = ∫ |f + g| ≤ ∫ (|f| + |g|) = ∫ |f| + ∫ |g| = ‖f‖_1 + ‖g‖_1.

The proof of the case p = ∞ is similar. Now note that if ‖f + g‖_p = 0, then the result is trivial. On the other hand, if ‖f + g‖_p = ∞, then since we have

|f + g|^p ≤ (|f| + |g|)^p ≤ 2^p(|f|^p + |g|^p),

we know the right hand side is infinite as well. So this case is also done.

Let's now do the interesting case, where p ∈ (1, ∞) and 0 < ‖f + g‖_p < ∞. Let q be the conjugate of p. We compute

µ(|f + g|^p) = µ(|f + g| |f + g|^{p−1})
≤ µ(|f| |f + g|^{p−1}) + µ(|g| |f + g|^{p−1})
≤ ‖f‖_p ‖|f + g|^{p−1}‖_q + ‖g‖_p ‖|f + g|^{p−1}‖_q   (by Hölder's inequality)
= (‖f‖_p + ‖g‖_p) ‖|f + g|^{p−1}‖_q
= (‖f‖_p + ‖g‖_p) µ(|f + g|^{(p−1)q})^{1−1/p}
= (‖f‖_p + ‖g‖_p) µ(|f + g|^p)^{1−1/p},

using q(p − 1) = p. So we know

µ(|f + g|^p) ≤ (‖f‖_p + ‖g‖_p) µ(|f + g|^p)^{1−1/p}.

Then dividing both sides by µ(|f + g|^p)^{1−1/p}, which is finite and non-zero in this case, tells us

µ(|f + g|^p)^{1/p} = ‖f + g‖_p ≤ ‖f‖_p + ‖g‖_p.
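And a matching numerical check of the Minkowski inequality (same Python/numpy counting-measure setting, with several arbitrary exponents p ≥ 1):

import numpy as np

# Triangle inequality ||f + g||_p <= ||f||_p + ||g||_p for counting measure.
rng = np.random.default_rng(4)
f, g = rng.normal(size=1000), rng.normal(size=1000)
for p in [1.0, 1.5, 2.0, 4.0]:
    norm = lambda h: np.sum(np.abs(h)**p)**(1 / p)
    print(p, norm(f + g) <= norm(f) + norm(g))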

4.2 Lp spaces

Theorem. Let 1 ≤ p ≤ ∞. Then Lp is a Banach space. In other words, if (fn) is a sequence in Lp with the property that ‖fn − fm‖_p → 0 as n, m → ∞, then there is some f ∈ Lp such that ‖fn − f‖_p → 0 as n → ∞.

Proof. We will only give the proof for p < ∞. The p = ∞ case is left as an exercise for the reader.

Suppose that (fn) is a sequence in Lp with ‖fn − fm‖_p → 0 as n, m → ∞. Take a subsequence (f_{n_k}) of (fn) with

‖f_{n_{k+1}} − f_{n_k}‖_p ≤ 2^{−k}

for all k ∈ N. We then find that

‖∑_{k=1}^M |f_{n_{k+1}} − f_{n_k}|‖_p ≤ ∑_{k=1}^M ‖f_{n_{k+1}} − f_{n_k}‖_p ≤ 1.


We know that

∑_{k=1}^M |f_{n_{k+1}} − f_{n_k}| ↗ ∑_{k=1}^∞ |f_{n_{k+1}} − f_{n_k}| as M → ∞.

So applying the monotone convergence theorem, we know that

‖∑_{k=1}^∞ |f_{n_{k+1}} − f_{n_k}|‖_p ≤ ∑_{k=1}^∞ ‖f_{n_{k+1}} − f_{n_k}‖_p ≤ 1.

In particular,

∑_{k=1}^∞ |f_{n_{k+1}} − f_{n_k}| < ∞ a.e.

So (f_{n_k}(x)) converges a.e., since the real line is complete. So we set

f(x) = lim_{k→∞} f_{n_k}(x) if the limit exists, and f(x) = 0 otherwise.

By an exercise on the first example sheet, this function is indeed measurable. Then, by Fatou's lemma, we have

‖fn − f‖_p^p = µ(|fn − f|^p) = µ(lim inf_{k→∞} |fn − f_{n_k}|^p) ≤ lim inf_{k→∞} µ(|fn − f_{n_k}|^p),

which tends to 0 as n → ∞ since the sequence is Cauchy. So f is indeed the limit.

Finally, we have to check that f ∈ Lp. We have

µ(|f|^p) = µ(|f − fn + fn|^p) ≤ µ((|f − fn| + |fn|)^p) ≤ 2^p(µ(|f − fn|^p) + µ(|fn|^p)).

The first term tends to 0, and in particular is finite for n large enough, and the second term is also finite. So done.

4.3 Orthogonal projection in L2

Theorem. Let V be a closed subspace of L2. Then each f ∈ L2 has an orthogonal decomposition

f = u+ v,

where v ∈ V and u ∈ V ⊥. Moreover,

‖f − v‖2 ≤ ‖f − g‖2

for all g ∈ V with equality iff g ∼ v.


Lemma (Pythagoras identity).

‖f + g‖_2^2 = ‖f‖_2^2 + ‖g‖_2^2 + 2⟨f, g⟩.

Lemma (Parallelogram law).

‖f + g‖_2^2 + ‖f − g‖_2^2 = 2(‖f‖_2^2 + ‖g‖_2^2).

Proof of orthogonal decomposition. Given f ∈ L2, we take a sequence (gn) in V such that

‖f − gn‖_2 → d(f, V) = inf_{g∈V} ‖f − g‖_2.

We now want to show that the infimum is attained. To do so, we show that (gn) is a Cauchy sequence, and by the completeness of L2, it will have a limit. If we apply the parallelogram law with u = f − gn and v = f − gm, then we know

‖u + v‖_2^2 + ‖u − v‖_2^2 = 2(‖u‖_2^2 + ‖v‖_2^2).

Using our particular choice of u and v, we obtain

‖2(f − (gn + gm)/2)‖_2^2 + ‖gn − gm‖_2^2 = 2(‖f − gn‖_2^2 + ‖f − gm‖_2^2).

So we have

‖gn − gm‖_2^2 = 2(‖f − gn‖_2^2 + ‖f − gm‖_2^2) − 4‖f − (gn + gm)/2‖_2^2.

The first two terms on the right hand side tend to 4d(f, V)^2, while the last term is bounded below by 4d(f, V)^2, since (gn + gm)/2 ∈ V. So as n, m → ∞, we must have ‖gn − gm‖_2 → 0. By completeness of L2, there exists a g ∈ L2 such that gn → g.

Now since V is assumed to be closed, we can find a v ∈ V such that g = v a.e. Then we know

‖f − v‖_2 = lim_{n→∞} ‖f − gn‖_2 = d(f, V).

So v attains the infimum. To show that this gives us an orthogonal decomposition, we want to show that

u = f − v ∈ V^⊥.

Suppose h ∈ V is non-zero (for h = 0 there is nothing to prove). We need to show that ⟨u, h⟩ = 0. We need to do another funny trick. Suppose t ∈ R. Then we have

d(f, V)^2 ≤ ‖f − (v + th)‖_2^2 = ‖f − v‖_2^2 + t^2 ‖h‖_2^2 − 2t⟨f − v, h⟩.

We think of this as a quadratic in t, which is minimized when

t = ⟨f − v, h⟩/‖h‖_2^2.

But the quadratic is at least d(f, V)^2 everywhere and equals d(f, V)^2 at t = 0, so it is minimized at t = 0. So ⟨f − v, h⟩ = 0.


Proposition. The conditional expectation of X given G is the projection of X onto the subspace L2(G, P) of G-measurable L2 random variables in the ambient space L2(P).

Proof. Let Y be the conditional expectation. It suffices to show that E[(X − W)^2] is minimized for W = Y among G-measurable random variables. Suppose that W is a G-measurable random variable. Since

G = σ(Gn : n ∈ N),

it follows that

W = ∑_{n=1}^∞ an 1_{Gn},

where an ∈ R. Then, since the Gn are disjoint,

E[(X − W)^2] = E[(∑_n (X − an) 1_{Gn})^2]
= E[∑_n (X^2 + an^2 − 2an X) 1_{Gn}]
= E[∑_n (X^2 + an^2 − 2an E[X | Gn]) 1_{Gn}].

We now optimize the quadratic

X^2 + an^2 − 2an E[X | Gn]

over an. We see that this is minimized for

an = E[X | Gn].

Note that this choice does not depend on the X^2 term, since it sits in the constant term. Therefore E[(X − W)^2] is minimized when an = E[X | Gn] for every n, i.e. for W = Y.
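The finite-partition case is easy to see numerically. Below is a Python sketch (assuming numpy; the partition into four cells, the coupling of X to the partition, and the perturbation size are all arbitrary choices): the within-cell means an = E[X | Gn] give a smaller mean squared error than any perturbed G-measurable choice.

import numpy as np

# G is generated by the partition {N = j}; the projection of X onto
# G-measurable variables is the within-cell mean.
rng = np.random.default_rng(5)
N = rng.integers(0, 4, size=10**5)         # partition labels G_0,...,G_3
X = N + rng.normal(size=N.size)            # X correlated with the partition
a = np.array([X[N == j].mean() for j in range(4)])  # a_j = E[X | G_j]
mse = np.mean((X - a[N])**2)
a_bad = a + rng.normal(scale=0.3, size=4)  # some other G-measurable choice
print(mse, np.mean((X - a_bad[N])**2))     # mse is the smaller one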

4.4 Convergence in L1(P) and uniform integrability

Theorem (Bounded convergence theorem). Suppose X, (Xn) are random variables. Assume that there exists a (non-random) constant C > 0 such that |Xn| ≤ C. If Xn → X in probability, then Xn → X in L1.

Proof. We first show that |X| ≤ C a.e. Let ε > 0. We then have

P[|X| > C + ε] ≤ P[|X −Xn|+ |Xn| > C + ε]

≤ P[|X −Xn| > ε] + P[|Xn| > C]

We know the second term vanishes, while the first term → 0 as n → ∞. So we know

P[|X| > C + ε] = 0

for every ε > 0, and hence |X| ≤ C a.s.


Now fix an ε > 0. Then

E[|Xn − X|] = E[|Xn − X|(1_{|Xn−X|≤ε} + 1_{|Xn−X|>ε})] ≤ ε + 2C P[|Xn − X| > ε].

Since Xn → X in probability, for n sufficiently large, the second term is ≤ ε. So E[|Xn − X|] ≤ 2ε, and we have convergence in L1.

Proposition. Finite unions of uniformly integrable sets are uniformly integrable.

Proposition. Let X be an Lp-bounded family for some p > 1. Then X is uniformly integrable.

Proof. We let

C = sup{‖X‖_p : X ∈ X} < ∞.

Suppose that X ∈ X and A ∈ F. We then have

E[|X| 1_A] ≤ E[|X|^p]^{1/p} P[A]^{1/q} ≤ C P[A]^{1/q}

by Hölder's inequality, where p, q are conjugate. This is a uniform bound depending only on P[A]. So done.

Lemma. Let X be a family of random variables. Then X is uniformly integrable if and only if

sup{E[|X| 1_{|X|>k}] : X ∈ X} → 0

as k → ∞.

Proof.

(⇒) Suppose that X is uniformly integrable. For any k and X ∈ X, by Chebyshev's inequality, we have

P[|X| ≥ k] ≤ E[|X|]/k.

Given ε > 0, we pick δ such that E[|X| 1_A] < ε for all X ∈ X and all A with P[A] < δ. Then pick k sufficiently large that kδ > sup{E[|X|] : X ∈ X}, which is finite since uniformly integrable families are L1-bounded. Then P[|X| ≥ k] < δ, and hence E[|X| 1_{|X|>k}] < ε for all X ∈ X.

(⇐) Suppose that the condition in the lemma holds. We first show that X is L1-bounded. We have

E[|X|] = E[|X|(1_{|X|≤k} + 1_{|X|>k})] ≤ k + E[|X| 1_{|X|>k}] < ∞

by picking a large enough k. Next note that for any measurable A and X ∈ X, we have

E[|X| 1_A] = E[|X| 1_A(1_{|X|>k} + 1_{|X|≤k})] ≤ E[|X| 1_{|X|>k}] + k P[A].

Thus, for any ε > 0, we can pick k sufficiently large such that the first term is < ε/2 for all X ∈ X by assumption. Then when P[A] < ε/(2k), we have E[|X| 1_A] ≤ ε.


Corollary. Let X = {X}, where X ∈ L1(P). Then X is uniformly integrable. Hence, a finite collection of L1 functions is uniformly integrable.

Proof. Note that

E[|X|] = ∑_{k=0}^∞ E[|X| 1_{|X|∈[k,k+1)}].

Since the sum is finite, we must have

E[|X| 1_{|X|≥K}] = ∑_{k=K}^∞ E[|X| 1_{|X|∈[k,k+1)}] → 0

as K → ∞.
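A numerical illustration of the corollary (a Python sketch assuming numpy; the Pareto law is an arbitrary choice of an X that is in L1 but not in L2, so uniform integrability here does not follow from Lp-boundedness for p > 1): the tail expectations E[|X| 1_{|X|>K}] decay, here exactly like 2/K.

import numpy as np

# Tail expectations E[X 1_{X > K}] for a classical Pareto(2) variable,
# x_m = 1, alpha = 2: E[X] = 2 is finite, E[X^2] is infinite.
rng = np.random.default_rng(6)
X = rng.pareto(2.0, size=10**7) + 1.0
for K in [1, 4, 16, 64]:
    print(K, np.mean(X * (X > K)))  # Monte Carlo estimate of 2/K (noisy for large K)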

Theorem. Let X, (Xn) be random variables. Then the following are equivalent:

(i) Xn, X ∈ L1 for all n and Xn → X in L1.

(ii) Xn is uniformly integrable and Xn → X in probability.

Proof. We first assume that Xn, X are L1 and Xn → X in L1. We want to show that (Xn) is uniformly integrable and Xn → X in probability.

We first show that Xn → X in probability. This is just going to come from the Chebyshev inequality. For ε > 0, we have

P[|X − Xn| > ε] ≤ E[|X − Xn|]/ε → 0

as n → ∞.

Next we show that (Xn) is uniformly integrable. Fix ε > 0. Take N such that n ≥ N implies E[|X − Xn|] ≤ ε/2. Since finite families of L1 random variables are uniformly integrable, we can pick δ > 0 such that A ∈ F and P[A] < δ implies

E[|X| 1_A], E[|Xn| 1_A] ≤ ε/2

for n = 1, · · · , N. Now when n > N and A ∈ F with P[A] ≤ δ, we have

E[|Xn| 1_A] ≤ E[|X − Xn| 1_A] + E[|X| 1_A] ≤ E[|X − Xn|] + ε/2 ≤ ε/2 + ε/2 = ε.

So (Xn) is uniformly integrable.

Assume that (Xn) is uniformly integrable and Xn → X in probability. The first step is to show that X ∈ L1. We want to use Fatou's lemma, but to do so, we want almost sure convergence, not just convergence in probability. Recall that we have previously shown that there is a subsequence (X_{n_k}) of (Xn) such that X_{n_k} → X a.s. Then we have

E[|X|] = E[lim inf_{k→∞} |X_{n_k}|] ≤ lim inf_{k→∞} E[|X_{n_k}|] < ∞


since uniformly integrable families are L1-bounded. So E[|X|] < ∞, hence X ∈ L1.

Next we want to show that Xn → X in L1. Take ε > 0. Then there exists K ∈ (0, ∞) such that

E[|X| 1_{|X|>K}], E[|Xn| 1_{|Xn|>K}] ≤ ε/3.

To set things up so that we can use the bounded convergence theorem, we have to invent new random variables

X_n^K = (Xn ∨ −K) ∧ K, X^K = (X ∨ −K) ∧ K.

Since Xn → X in probability, it follows that X_n^K → X^K in probability. Now bounded convergence tells us that there is some N such that n ≥ N implies

E[|X_n^K − X^K|] ≤ ε/3.

Combining, we have for n ≥ N that

E[|Xn − X|] ≤ E[|X_n^K − X^K|] + E[|X| 1_{|X|≥K}] + E[|Xn| 1_{|Xn|≥K}] ≤ ε.

So we know that Xn → X in L1.


5 Fourier transform

5.1 The Fourier transform

Proposition. ‖f̂‖_∞ ≤ ‖f‖_1 and ‖µ̂‖_∞ ≤ µ(R^d).

Proposition. The functions f̂, µ̂ are continuous.

Proof. If un → u, then

f(x) e^{i(un,x)} → f(x) e^{i(u,x)}.

Also, we know that

|f(x) e^{i(un,x)}| = |f(x)|.

So we can apply the dominated convergence theorem with |f| as the bound.

5.2 Convolutions

Proposition. For any f ∈ Lp and ν a probability measure, we have

‖f ∗ ν‖p ≤ ‖f‖p.

Proposition.

(f ∗ ν)ˆ(u) = f̂(u) ν̂(u).

Proof. We have

(f ∗ ν)ˆ(u) = ∫ (∫ f(x − y) ν(dy)) e^{i(u,x)} dx
= ∫∫ f(x − y) e^{i(u,x)} dx ν(dy)
= ∫ (∫ f(x − y) e^{i(u,x−y)} d(x − y)) e^{i(u,y)} ν(dy)
= ∫ (∫ f(x) e^{i(u,x)} dx) e^{i(u,y)} ν(dy)
= f̂(u) ∫ e^{i(u,y)} ν(dy)
= f̂(u) ν̂(u).

Proposition. Let µ, ν be probability measures, and X, Y be independent random variables with laws µ, ν respectively. Then

(µ ∗ ν)ˆ(u) = µ̂(u) ν̂(u).

Proof. We have

(µ ∗ ν)ˆ(u) = E[e^{i(u,X+Y)}] = E[e^{i(u,X)}] E[e^{i(u,Y)}] = µ̂(u) ν̂(u).
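A Monte Carlo sketch of this factorization (Python, assuming numpy; the uniform and exponential laws are arbitrary choices):

import numpy as np

# The characteristic function of X + Y factorizes for independent X, Y.
rng = np.random.default_rng(7)
X = rng.uniform(size=10**6)
Y = rng.exponential(size=10**6)
for u in [0.5, 1.0, 2.0]:
    cf = lambda W: np.mean(np.exp(1j * u * W))
    print(u, cf(X + Y), cf(X) * cf(Y))  # complex values agree to ~1e-3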


5.3 Fourier inversion formula

Theorem (Fourier inversion formula). Let f, f̂ ∈ L1. Then

f(x) = (2π)^{−d} ∫ f̂(u) e^{−i(u,x)} du a.e.

Proposition. Let Z ∼ N(0, 1). Then

φZ(u) = e^{−u²/2}.

Proof. We have

φZ(u) = E[e^{iuZ}] = (1/√(2π)) ∫ e^{iux} e^{−x²/2} dx.

We now notice that the u-derivative of the integrand is dominated by the integrable function |x| e^{−x²/2}, so we can differentiate under the integral sign and obtain

φ′Z(u) = E[iZ e^{iuZ}] = (1/√(2π)) ∫ ix e^{iux} e^{−x²/2} dx = −u φZ(u),

where the last equality is obtained by integrating by parts. So we know that φZ(u) solves

φ′Z(u) = −u φZ(u).

This is easy to solve, since we can just integrate it. We find that

log φZ(u) = −u²/2 + C.

So we have

φZ(u) = A e^{−u²/2}.

We know that A = 1, since φZ(0) = 1. So we have

φZ(u) = e^{−u²/2}.
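The closed form is easy to verify by quadrature (a Python sketch assuming numpy; the grid and truncation are arbitrary choices):

import numpy as np

# phi_Z(u) = (2*pi)^(-1/2) * integral of exp(iux) exp(-x^2/2) dx,
# approximated by a Riemann sum, against the closed form exp(-u^2/2).
x = np.linspace(-10, 10, 2001)
dx = x[1] - x[0]
dens = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
for u in [0.0, 0.5, 1.0, 2.0]:
    phi = np.sum(np.exp(1j * u * x) * dens) * dx
    print(u, phi.real, np.exp(-u**2 / 2))  # the two columns agree closely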

Proposition. Let Z = (Z1, · · · , Zd) with Zj ∼ N(0, 1) independent. Then √t Z has density

gt(x) = (2πt)^{−d/2} e^{−|x|²/(2t)},

with

ĝt(u) = e^{−|u|²t/2}.


Proof. We have

ĝt(u) = E[e^{i(u,√t Z)}] = ∏_{j=1}^d E[e^{i uj √t Zj}] = ∏_{j=1}^d φZ(√t uj) = ∏_{j=1}^d e^{−t uj²/2} = e^{−|u|²t/2}.

Lemma. The Fourier inversion formula holds for the Gaussian density function.

Proposition.

‖f ∗ gt‖_1 ≤ ‖f‖_1.

Proposition.

‖f ∗ gt‖_∞ ≤ (2πt)^{−d/2} ‖f‖_1.

Proposition.

‖(f ∗ gt)ˆ‖_1 = ‖f̂ ĝt‖_1 ≤ (2π)^{d/2} t^{−d/2} ‖f‖_1,

and

‖f ∗ gt‖_∞ ≤ ‖f‖_∞.

Lemma. The Fourier inversion formula holds for Gaussian convolutions.

Proof. We have

f ∗ gt(x) = ∫ f(x − y) gt(y) dy
= ∫ f(x − y) ((2π)^{−d} ∫ ĝt(u) e^{−i(u,y)} du) dy
= (2π)^{−d} ∫∫ f(x − y) ĝt(u) e^{−i(u,y)} du dy
= (2π)^{−d} ∫ (∫ f(x − y) e^{i(u,x−y)} dy) ĝt(u) e^{−i(u,x)} du
= (2π)^{−d} ∫ f̂(u) ĝt(u) e^{−i(u,x)} du
= (2π)^{−d} ∫ (f ∗ gt)ˆ(u) e^{−i(u,x)} du.

So done.

Theorem (Fourier inversion formula). Let f ∈ L1 and

ft(x) = (2π)^{−d} ∫ f̂(u) e^{−|u|²t/2} e^{−i(u,x)} du = (2π)^{−d} ∫ (f ∗ gt)ˆ(u) e^{−i(u,x)} du.

Then ‖ft − f‖_1 → 0 as t → 0, and the Fourier inversion formula holds whenever f, f̂ ∈ L1.


Lemma. Suppose that f ∈ Lp with p ∈ [1, ∞). Then ‖f ∗ gt − f‖_p → 0 as t → 0.

Proof. We fix ε > 0. By a question on the example sheet, we can find h which is continuous and with compact support such that ‖f − h‖_p ≤ ε/3. So we have

‖f ∗ gt − h ∗ gt‖_p = ‖(f − h) ∗ gt‖_p ≤ ‖f − h‖_p ≤ ε/3.

So it suffices for us to work with a continuous function h with compact support. We let

e(y) = ∫ |h(x − y) − h(x)|^p dx.

We first show that e is a bounded function:

e(y) ≤ ∫ 2^p(|h(x − y)|^p + |h(x)|^p) dx = 2^{p+1} ‖h‖_p^p.

Also, since h is continuous and bounded, the dominated convergence theorem tells us that e(y) → 0 as y → 0. Moreover, using the fact that ∫ gt(y) dy = 1, we have

‖h ∗ gt − h‖_p^p = ∫ |∫ (h(x − y) − h(x)) gt(y) dy|^p dx.

Since gt(y) dy is a probability measure, by Jensen's inequality, we can bound this by

∫∫ |h(x − y) − h(x)|^p gt(y) dy dx = ∫ (∫ |h(x − y) − h(x)|^p dx) gt(y) dy = ∫ e(y) gt(y) dy = ∫ e(√t y) g1(y) dy,

where we used the definition of gt and the substitution y ↦ √t y. Since e is bounded and e(√t y) → 0 as t → 0, this tends to 0 as t → 0 by the bounded convergence theorem. Finally, we have

‖f ∗ gt − f‖_p ≤ ‖f ∗ gt − h ∗ gt‖_p + ‖h ∗ gt − h‖_p + ‖h − f‖_p ≤ 2ε/3 + ‖h ∗ gt − h‖_p.

Since ‖h ∗ gt − h‖_p → 0 as t → 0, for all sufficiently small t the right hand side is bounded above by ε. So we are done.


Proof of Fourier inversion theorem. The first part is just a special case of the previous lemma. Indeed, recall that

(f ∗ gt)ˆ(u) = f̂(u) e^{−|u|²t/2}.

Since Gaussian convolutions satisfy the Fourier inversion formula, we know that

ft = f ∗ gt.

So the previous lemma says exactly that ‖ft − f‖_1 → 0.

Suppose now that f̂ ∈ L1 as well. Then looking at the integrand of

ft(x) = (2π)^{−d} ∫ f̂(u) e^{−|u|²t/2} e^{−i(u,x)} du,

we know that

|f̂(u) e^{−|u|²t/2} e^{−i(u,x)}| ≤ |f̂(u)|.

Then by the dominated convergence theorem with dominating function |f̂|, we know that

ft(x) → (2π)^{−d} ∫ f̂(u) e^{−i(u,x)} du as t → 0.

By the first part, we know that ‖ft − f‖_1 → 0 as t → 0. So we can find a sequence (tn) with tn > 0, tn → 0 so that f_{tn} → f a.e. Combining these, we know that

f(x) = (2π)^{−d} ∫ f̂(u) e^{−i(u,x)} du a.e.

So done.
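Here is a small numerical sketch of the inversion formula in d = 1 (Python, assuming numpy; the standard Gaussian density and the grid are arbitrary choices, with f̂(u) = ∫ f(x) e^{iux} dx as above):

import numpy as np

# Compute fhat on a grid by a Riemann sum, then invert and compare with f.
grid = np.linspace(-20, 20, 4001)
dx = grid[1] - grid[0]
f = lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)
fhat = np.array([np.sum(f(grid) * np.exp(1j * u * grid)) * dx for u in grid])
for x in [0.0, 0.5, 1.0, 2.0]:
    f_rec = np.sum(fhat * np.exp(-1j * grid * x)) * dx / (2 * np.pi)
    print(x, f_rec.real, f(x))  # recovered and true values agree closely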

5.4 Fourier transform in L2

Theorem (Plancherel identity). For any function f ∈ L1 ∩ L2, the Plancherel identity holds:

‖f̂‖_2 = (2π)^{d/2} ‖f‖_2.

Proof. We first work with the special case where f, f̂ ∈ L1, since then the Fourier inversion formula holds for f. Writing z* for the complex conjugate of z, we have

‖f‖_2^2 = ∫ f(x) f(x)* dx
= (2π)^{−d} ∫ (∫ f̂(u) e^{−i(u,x)} du) f(x)* dx
= (2π)^{−d} ∫ f̂(u) (∫ f(x)* e^{−i(u,x)} dx) du
= (2π)^{−d} ∫ f̂(u) (∫ f(x) e^{i(u,x)} dx)* du
= (2π)^{−d} ∫ f̂(u) f̂(u)* du
= (2π)^{−d} ‖f̂‖_2^2,

using Fubini's theorem to swap the order of integration.


So the Plancherel identity holds for f.

To prove it for the general case, we use this result and an approximation argument. Suppose that f ∈ L1 ∩ L2, and let ft = f ∗ gt. Then by our earlier lemma, we know that

‖ft‖_2 → ‖f‖_2 as t → 0.

Now note that

f̂t(u) = f̂(u) ĝt(u) = f̂(u) e^{−|u|²t/2}.

The important thing is that e^{−|u|²t/2} ↗ 1 as t → 0. Therefore, we know

‖f̂t‖_2^2 = ∫ |f̂(u)|² e^{−|u|²t} du → ∫ |f̂(u)|² du = ‖f̂‖_2^2

as t → 0, by monotone convergence. Since ft, f̂t ∈ L1, we know that the Plancherel identity holds for them, i.e.

‖f̂t‖_2 = (2π)^{d/2} ‖ft‖_2.

Taking the limit as t → 0, the result follows.

Theorem. There exists a unique Hilbert space automorphism F : L2 → L2 such that

F([f]) = [(2π)^{−d/2} f̂]

whenever f ∈ L1 ∩ L2. Here [f] denotes the equivalence class of f in L2, and we say F : L2 → L2 is a Hilbert space automorphism if it is a linear bijection that preserves the inner product.

Proof. We define F0 : L1 ∩ L2 → L2 by

F0([f]) = [(2π)^{−d/2} f̂].

By the Plancherel identity, we know F0 preserves the L2 norm, i.e.

‖F0([f])‖_2 = ‖[f]‖_2.

Also, we know that L1 ∩ L2 is dense in L2, since even the continuous functions with compact support are dense. So we know F0 extends uniquely to an isometry F : L2 → L2.

Since it preserves distance, it is in particular injective. So it remains to show that the map is surjective. By Fourier inversion, the subspace

V = {[f] ∈ L2 : f, f̂ ∈ L1}

is sent to itself by the map F. Also if f ∈ V, then F^4[f] = [f] (note that applying it twice does not suffice, because we actually have F^2[f](x) = [f](−x)). So V is contained in the image of F, and also V is dense in L2, again because it contains all Gaussian convolutions (we have f̂t = f̂ ĝt, and f̂ is bounded while ĝt decays exponentially). So we know that F is surjective.


5.5 Properties of characteristic functions

Theorem. The characteristic function φX of the distribution µX of a random variable X determines µX. In other words, if X and X̃ are random variables and φX = φX̃, then µX = µX̃.

Proof sketch. Use Fourier inversion to show that φX determines µX(g) = E[g(X)] for any bounded, continuous g.

Theorem. If φX is integrable, then µX has a bounded, continuous density function

fX(x) = (2π)^{−d} ∫ φX(u) e^{−i(u,x)} du.

Proof sketch. Let Z ∼ N(0, 1) be independent of X. Then X + √t Z has a bounded continuous density function which, by Fourier inversion, is

ft(x) = (2π)^{−d} ∫ φX(u) e^{−|u|²t/2} e^{−i(u,x)} du.

Sending t → 0 and using the dominated convergence theorem with dominating function |φX| gives the result.

Theorem. Let X, (Xn) be random variables with values in R^d. If φ_{Xn}(u) → φX(u) for each u ∈ R^d, then µ_{Xn} → µX weakly.

Proof sketch. By the example sheet, it suffices to show that E[g(Xn)] → E[g(X)] for all compactly supported g ∈ C^∞. We then use Fourier inversion and convergence of characteristic functions to check that

E[g(Xn + √t Z)] → E[g(X + √t Z)]

for all t > 0, where Z ∼ N(0, 1) is independent of X, (Xn). Then we check that E[g(Xn + √t Z)] is close to E[g(Xn)] for t > 0 small, and similarly for X.

5.6 Gaussian random variables

Proposition. Let X ∼ N(µ, σ²). Then

E[X] = µ, var(X) = σ².

Also, for any a, b ∈ R, we have

aX + b ∼ N(aµ + b, a²σ²).

Lastly, we have

φX(u) = e^{iµu − u²σ²/2}.

Proof. All but the last of them follow from direct calculation, and can be found in IA Probability. For the last part, if X ∼ N(µ, σ²), then we can write

X = σZ + µ,


where Z ∼ N(0, 1). Recall that we have previously found that the characteristic function of an N(0, 1) random variable is

φZ(u) = e^{−u²/2}.

So we have

φX(u) = E[e^{iu(σZ+µ)}] = e^{iuµ} E[e^{iuσZ}] = e^{iuµ} φZ(uσ) = e^{iuµ − u²σ²/2}.

Theorem. Let X be Gaussian on R^n, and let A be an m × n matrix and b ∈ R^m. Then

(i) AX + b is Gaussian on R^m.

(ii) X ∈ L2 and its law µX is determined by µ = E[X] and V = var(X), the covariance matrix.

(iii) We have

φX(u) = e^{i(u,µ) − (u,Vu)/2}.

(iv) If V is invertible, then X has a density of

fX(x) = (2π)^{−n/2} (det V)^{−1/2} exp(−(x − µ, V^{−1}(x − µ))/2).

(v) If X = (X1, X2) where Xi ∈ R^{ni}, then cov(X1, X2) = 0 iff X1 and X2 are independent.

Proof.

(i) If u ∈ R^m, then we have

(AX + b, u) = (AX, u) + (b, u) = (X, A^T u) + (b, u).

Since (X, A^T u) is Gaussian and (b, u) is constant, it follows that (AX + b, u) is Gaussian.

(ii) We know in particular that each component of X is a Gaussian random variable, hence in L2. So X ∈ L2. We will prove the second part of (ii) together with (iii).

(iii) If µ = E[X] and V = var(X), then for u ∈ R^n we have

E[(u, X)] = (u, µ), var((u, X)) = (u, Vu).

So we know

(u, X) ∼ N((u, µ), (u, Vu)).

So it follows that

φX(u) = E[e^{i(u,X)}] = e^{i(u,µ) − (u,Vu)/2}.

So µ and V determine the characteristic function of X, which in turn determines the law of X.


(iv) We start off with a boring Gaussian vector Y = (Y1, · · · , Yn), where the Yi ∼ N(0, 1) are independent. Then the density of Y is

fY(y) = (2π)^{−n/2} e^{−|y|²/2}.

We are now going to construct X from Y. We define

X̃ = V^{1/2} Y + µ.

This makes sense because V is always non-negative definite. Then X̃ is Gaussian with E[X̃] = µ and var(X̃) = V. Therefore X̃ has the same distribution as X. Since V is assumed to be invertible, we can compute the density of X using the change of variables formula.

(v) It is clear that if X1 and X2 are independent, then cov(X1, X2) = 0. Conversely, let X = (X1, X2), where cov(X1, X2) = 0. Then V = var(X) is the block-diagonal matrix

V = ( V11 0 ; 0 V22 ),

where V11 = var(X1) and V22 = var(X2). Then for u = (u1, u2), we have

(u, Vu) = (u1, V11 u1) + (u2, V22 u2).

Then we have

φX(u) = e^{i(u,µ) − (u,Vu)/2} = e^{i(u1,µ1) − (u1,V11u1)/2} e^{i(u2,µ2) − (u2,V22u2)/2} = φ_{X1}(u1) φ_{X2}(u2).

So it follows that X1 and X2 are independent.
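(iv) and (v) are easy to illustrate with a short simulation (a Python sketch assuming numpy; any A with A Aᵀ = V works in place of V^{1/2}, since AY + µ is then Gaussian with covariance A Aᵀ, so a Cholesky factor is used here, and the block-diagonal V is an arbitrary choice):

import numpy as np

# Build X = A Y + mu with A A^T = V, then observe that cov(X1, X2) = 0
# makes the joint tail probability factorize.
rng = np.random.default_rng(8)
V = np.array([[2.0, 0.0], [0.0, 0.5]])  # block-diagonal: cov(X1, X2) = 0
mu = np.array([1.0, -1.0])
A = np.linalg.cholesky(V)
Y = rng.normal(size=(10**6, 2))
X = Y @ A.T + mu
print(np.cov(X.T))                      # ~ V
p12 = np.mean((X[:, 0] > 1) & (X[:, 1] > -1))
print(p12, np.mean(X[:, 0] > 1) * np.mean(X[:, 1] > -1))  # both ~ 0.25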


6 Ergodic theory

Proposition. If f is integrable and Θ is measure-preserving, then f ∘ Θ is integrable and

∫_E f ∘ Θ dµ = ∫_E f dµ.

Proposition. If Θ is ergodic and f is invariant, then there exists a constant c such that f = c a.e.

Theorem. The shift map Θ is an ergodic, measure preserving transformation.

Proof. It is an exercise to show that Θ is measurable and measure-preserving. To show that Θ is ergodic, recall the definition of the tail σ-algebra

Tn = σ(Xm : m ≥ n + 1), T = ⋂_n Tn.

Suppose that A = ∏_{n∈N} An ∈ A. Then

Θ^{−N}(A) = {X_{N+k} ∈ Ak for all k} ∈ T_N.

Since T_N is a σ-algebra and σ(A) = E, we know Θ^{−N}(A) ∈ T_N for all A ∈ E.

So if A ∈ E_Θ, i.e. A = Θ^{−1}(A), then A = Θ^{−N}(A) ∈ T_N for all N. So A ∈ T. From the Kolmogorov 0-1 law, we know either µ[A] = 1 or µ[A] = 0. So done.
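For a concrete feel for the shift map, here is a Python sketch (assuming numpy; a Bernoulli(1/2) sequence space is an arbitrary choice): since f ∘ Θ^k(x) = x_{k+1} for f(x) = x1, the Birkhoff averages Sn(f)/n along the orbit of a single sample point are just running means of the coordinates, and they settle down to the mean, as ergodicity predicts.

import numpy as np

# One sample point of E = {0,1}^N with iid Bernoulli(1/2) coordinates;
# S_n(f)/n is the running mean of its coordinates.
rng = np.random.default_rng(9)
x = rng.integers(0, 2, size=10**6)
avg = np.cumsum(x) / np.arange(1, x.size + 1)
print(avg[99], avg[9999], avg[-1])  # -> 1/2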

6.1 Ergodic theorems

Lemma (Maximal ergodic lemma). Let f be integrable, and

S* = sup_{n≥0} Sn(f) ≥ 0,

where S0(f) = 0 by convention. Then

∫_{S*>0} f dµ ≥ 0.

Proof. We let

S*_n = max_{0≤m≤n} Sm and An = {S*_n > 0}.

Now if 1 ≤ m ≤ n, then we know

Sm = f + S_{m−1} ∘ Θ ≤ f + S*_n ∘ Θ.

Now on An, we have

S*_n = max_{1≤m≤n} Sm,

since S0 = 0. So we have

S*_n ≤ f + S*_n ∘ Θ on An.


On An^c, we have

S*_n = 0 ≤ S*_n ∘ Θ.

So we know

∫_E S*_n dµ = ∫_{An} S*_n dµ + ∫_{An^c} S*_n dµ
≤ ∫_{An} f dµ + ∫_{An} S*_n ∘ Θ dµ + ∫_{An^c} S*_n ∘ Θ dµ
= ∫_{An} f dµ + ∫_E S*_n ∘ Θ dµ
= ∫_{An} f dµ + ∫_E S*_n dµ,

using in the last step that Θ is measure-preserving. So we know

∫_{An} f dµ ≥ 0.

Since An ↑ {S* > 0}, taking the limit as n → ∞ gives the result by dominated convergence with dominating function |f|.

Theorem (Birkhoff's ergodic theorem). Let (E, E, µ) be σ-finite and f be integrable. There exists an invariant function f̄ such that

µ(|f̄|) ≤ µ(|f|)

and

Sn(f)/n → f̄ a.e.

If Θ is ergodic, then f̄ is a constant.

Theorem (von Neumann's ergodic theorem). Let (E, E, µ) be a finite measure space. Let p ∈ [1, ∞) and assume that f ∈ Lp. Then there is some function f̄ ∈ Lp such that

Sn(f)/n → f̄ in Lp.

Proof of Birkhoff's ergodic theorem. We first note that

lim sup_n Sn/n, lim inf_n Sn/n

are invariant functions. Indeed, we know

Sn ∘ Θ = f ∘ Θ + f ∘ Θ² + · · · + f ∘ Θ^n = S_{n+1} − f.

So we have

lim sup_{n→∞} (Sn ∘ Θ)/n = lim sup_{n→∞} (S_{n+1} − f)/n = lim sup_{n→∞} Sn/n,

since f/n → 0 and (n + 1)/n → 1. Exactly the same reasoning tells us the lim inf is also invariant.


What we now need to show is that the set of points on which the lim sup and lim inf do not agree has measure zero. We fix a < b. We let

D = D(a, b) = {x ∈ E : lim inf_{n→∞} Sn(x)/n < a < b < lim sup_{n→∞} Sn(x)/n}.

Now if lim sup Sn(x)/n ≠ lim inf Sn(x)/n, then there are some a, b ∈ Q such that x ∈ D(a, b). So by countable subadditivity, it suffices to show that µ(D(a, b)) = 0 for all a, b.

We now fix a, b, and just write D. Since lim sup Sn/n and lim inf Sn/n are both invariant, we have that D is invariant. By restricting to D, we can assume that D = E.

Suppose that B ∈ E and µ(B) < ∞. We let

g = f − b 1_B.

Then g is integrable because f is integrable and µ(B) < ∞. Moreover, we have

Sn(g) = Sn(f − b 1_B) ≥ Sn(f) − nb.

Since we know that lim sup_n Sn(f)/n > b on D, for each x ∈ D we can find an n such that Sn(g)(x) > 0. So we know that

S*(g)(x) = sup_n Sn(g)(x) > 0

for all x ∈ D. By the maximal ergodic lemma, we know

0 ≤ ∫_D g dµ = ∫_D (f − b 1_B) dµ = ∫_D f dµ − b µ(B).

If we rearrange this, we know

b µ(B) ≤ ∫_D f dµ

for all measurable sets B ∈ E with finite measure. Since our space is σ-finite, we can find Bn ↑ D with µ(Bn) < ∞ for all n. So taking the limit above tells us

b µ(D) ≤ ∫_D f dµ. (†)

Now we can apply the same argument with (−a) in place of b and (−f) in place of f to get

(−a) µ(D) ≤ −∫_D f dµ. (‡)

Now note that since b > a, we know that at least one of b > 0 and a < 0 has to be true. In the first case, (†) tells us that µ(D) is finite, since f is integrable. Then combining with (‡), we see that

b µ(D) ≤ ∫_D f dµ ≤ a µ(D).

But a < b. So we must have µ(D) = 0. The second case follows similarly (or follows immediately by flipping the sign of f).


We are almost done. We can now define

f̄(x) = lim_n Sn(f)(x)/n if the limit exists, and f̄(x) = 0 otherwise.

Then by the above calculations, we have

Sn(f)/n → f̄ a.e.

Also, we know f̄ is invariant, because lim Sn(f)/n is invariant, and so is the set where the limit exists.

Finally, we need to show that µ(|f̄|) ≤ µ(|f|). This is since

µ(|f ∘ Θ^n|) = µ(|f|),

as Θ^n preserves the measure. So we have that

µ(|Sn|) ≤ n µ(|f|) < ∞.

So by Fatou's lemma, we have

µ(|f̄|) ≤ µ(lim inf_n |Sn/n|) ≤ lim inf_n µ(|Sn/n|) ≤ µ(|f|).

Proof of von Neumann ergodic theorem. It is an exercise on the example sheet to show that

‖f ∘ Θ‖_p^p = ∫ |f ∘ Θ|^p dµ = ∫ |f|^p dµ = ‖f‖_p^p.

So we have

‖Sn/n‖_p = (1/n) ‖f + f ∘ Θ + · · · + f ∘ Θ^{n−1}‖_p ≤ ‖f‖_p

by Minkowski's inequality.

So let ε > 0, and take M ∈ (0, ∞) so that if

g = (f ∨ (−M)) ∧ M,

then

‖f − g‖_p < ε/3.

By Birkhoff's theorem, we know

Sn(g)/n → ḡ


a.e. Also, we know

|Sn(g)/n| ≤ M

for all n. So by the bounded convergence theorem, we know

‖Sn(g)/n − ḡ‖_p → 0

as n → ∞. So we can find N such that n ≥ N implies

‖Sn(g)/n − ḡ‖_p < ε/3.

Then we have

‖f̄ − ḡ‖_p^p = ∫ lim inf_n |Sn(f − g)/n|^p dµ ≤ lim inf_n ∫ |Sn(f − g)/n|^p dµ ≤ ‖f − g‖_p^p,

using Fatou's lemma and the bound ‖Sn(f − g)/n‖_p ≤ ‖f − g‖_p from above. So if n ≥ N, then we know

‖Sn(f)/n − f̄‖_p ≤ ‖Sn(f − g)/n‖_p + ‖Sn(g)/n − ḡ‖_p + ‖ḡ − f̄‖_p ≤ ε/3 + ε/3 + ε/3 = ε.

So done.


7 Big theorems

7.1 The strong law of large numbers

Theorem (Strong law of large numbers assuming finite fourth moments). Let (Xn) be a sequence of independent random variables such that there exist µ ∈ R and M > 0 with

E[Xn] = µ, E[Xn^4] ≤ M

for all n. With Sn = X1 + · · · + Xn, we have that

Sn/n → µ a.s. as n → ∞.

Proof. We reduce to the case that µ = 0 by setting

Yn = Xn − µ.

We then have

E[Yn] = 0, E[Yn^4] ≤ 2^4 E[µ^4 + Xn^4] ≤ 2^4(µ^4 + M).

So it suffices to show that the theorem holds with Yn in place of Xn. So we can assume that µ = 0.

By independence, we know that for i ≠ j, we have

E[Xi Xj^3] = E[Xi] E[Xj^3] = 0.

Similarly, for all i, j, k, ℓ distinct, we have

E[Xi Xj Xk^2] = E[Xi Xj Xk Xℓ] = 0.

Hence we know that

E[Sn^4] = E[∑_{k=1}^n Xk^4 + 6 ∑_{1≤i<j≤n} Xi^2 Xj^2].

We know the first term is bounded by nM, and we also know that for i ≠ j, we have

E[Xi^2 Xj^2] = E[Xi^2] E[Xj^2] ≤ √(E[Xi^4] E[Xj^4]) ≤ M

by Jensen's inequality. So we know

E[6 ∑_{1≤i<j≤n} Xi^2 Xj^2] ≤ 3n(n − 1)M.

Putting everything together, we have

E[Sn^4] ≤ nM + 3n(n − 1)M ≤ 3n^2 M.

So we know

E[(Sn/n)^4] ≤ 3M/n^2.


So we know

E[∑_{n=1}^∞ (Sn/n)^4] ≤ ∑_{n=1}^∞ 3M/n^2 < ∞.

So we know that

∑_{n=1}^∞ (Sn/n)^4 < ∞ a.s.

So we know that (Sn/n)^4 → 0 a.s., i.e. Sn/n → 0 a.s.
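The crucial moment bound E[(Sn/n)^4] ≤ 3M/n² can be checked by simulation (a Python sketch assuming numpy; uniform Xi on [−1, 1], so that µ = 0 and M = E[Xi^4] = 1/5, are arbitrary choices):

import numpy as np

# Monte Carlo estimate of E[(S_n/n)^4] against the bound 3M/n^2.
rng = np.random.default_rng(10)
M = 1 / 5
for n in [10, 100, 1000]:
    S = rng.uniform(-1, 1, size=(10000, n)).sum(axis=1)
    print(n, np.mean((S / n)**4), 3 * M / n**2)  # estimate sits below the bound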

Theorem (Strong law of large numbers). Let (Yn) be an iid sequence of integrable random variables with mean ν. With Sn = Y1 + · · · + Yn, we have

Sn/n → ν a.s.

Proof. Let m be the law of Y1 and let Y = (Y1, Y2, Y3, · · · ). We can view Y as a function

Y : Ω → R^N = E.

We let (E, E, µ) be the canonical space associated with the distribution m, so that

µ = P ∘ Y^{−1}.

We let f : E → R be given by

f(x1, x2, · · · ) = X1(x1, x2, · · · ) = x1.

Then X1 has law given by m, and in particular is integrable. Also, the shift map Θ : E → E given by

Θ(x1, x2, · · · ) = (x2, x3, · · · )

is measure-preserving and ergodic. Thus, with

Sn(f) = f + f ∘ Θ + · · · + f ∘ Θ^{n−1} = X1 + · · · + Xn,

we have that

Sn(f)/n → f̄ a.e.

by Birkhoff's ergodic theorem. We also have convergence in L1 by the von Neumann ergodic theorem.

Here f̄ is E_Θ-measurable and Θ is ergodic, so we know that f̄ = c a.e. for some constant c. Moreover, we have

c = µ(f̄) = lim_{n→∞} µ(Sn(f)/n) = ν.

So done.


7.2 Central limit theorem

Theorem. Let (Xn) be a sequence of iid random variables with E[X1] = 0 and E[X1^2] = 1. Then if we set

Sn = X1 + · · · + Xn,

then for all x ∈ R, we have

P[Sn/√n ≤ x] → ∫_{−∞}^x (e^{−y²/2}/√(2π)) dy = P[N(0, 1) ≤ x]

as n → ∞.

Proof. Let φ(u) = E[e^{iuX1}]. Since E[X1^2] = 1 < ∞, we can differentiate under the expectation twice to obtain

φ(u) = E[e^{iuX1}], φ′(u) = E[iX1 e^{iuX1}], φ′′(u) = E[−X1^2 e^{iuX1}].

Evaluating at 0, we have

φ(0) = 1, φ′(0) = 0, φ′′(0) = −1.

So if we Taylor expand φ at 0, we have

φ(u) = 1 − u²/2 + o(u²).

We consider the characteristic function of Sn/√n:

φn(u) = E[e^{iuSn/√n}] = ∏_{j=1}^n E[e^{iuXj/√n}] = φ(u/√n)^n = (1 − u²/(2n) + o(u²/n))^n.

We now take the logarithm to obtain

log φn(u) = n log(1 − u²/(2n) + o(u²/n)) = −u²/2 + o(1) → −u²/2.

So we know that

φn(u) → e^{−u²/2},

which is the characteristic function of an N(0, 1) random variable. So we have convergence of characteristic functions, hence weak convergence, hence convergence in distribution.
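Finally, the convergence is visible at modest n (a Python sketch assuming numpy; ±1 coin flips, which have mean 0 and variance 1, are an arbitrary choice):

import numpy as np
from math import erf

# Empirical distribution of S_n / sqrt(n) against the N(0,1) CDF.
rng = np.random.default_rng(11)
n, reps = 400, 20000
S = (2 * rng.integers(0, 2, size=(reps, n)) - 1).sum(axis=1)
Phi = lambda t: 0.5 * (1 + erf(t / np.sqrt(2)))
for t in [-1.0, 0.0, 1.0, 2.0]:
    print(t, np.mean(S / np.sqrt(n) <= t), Phi(t))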
