
VIENNA GRADUATE SCHOOL OF FINANCE (VGSF)

LECTURE NOTES

Introduction to Probability Theory and Stochastic Processes (STATS)

Helmut Strasser

Department of Statistics and Mathematics
Vienna University of Economics and Business Administration

[email protected]

http://helmut.strasserweb.net/public

November 5, 2007

Copyright © 2006 by Helmut Strasser

All rights reserved. No part of this text may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without prior written permission of the author.


Contents

Preliminaries
  0.1 Introduction
  0.2 Literature

1 Foundations of mathematical analysis
  1.1 Sets
    1.1.1 Set operations
    1.1.2 Cartesian products
    1.1.3 Uncountable sets
  1.2 Functions
  1.3 Real numbers
  1.4 Real-valued functions
    1.4.1 Simple functions
    1.4.2 Regulated functions
    1.4.3 Riemannian approximation
    1.4.4 Functions of bounded variation
  1.5 Banach spaces
  1.6 Hilbert spaces

2 Measures and measurable functions
  2.1 Sigma-fields
    2.1.1 The concept of a sigma-field
    2.1.2 How to construct sigma-fields
    2.1.3 Borel sigma-fields
  2.2 Measurable functions
    2.2.1 The idea of measurability
    2.2.2 The basic abstract assertions
    2.2.3 The structure of real-valued measurable functions
  2.3 Measures
    2.3.1 The concept of measures
    2.3.2 The abstract construction of measures
  2.4 Measures on the real line
    2.4.1 Point measures
    2.4.2 The Lebesgue measure
    2.4.3 Measure defining functions
    2.4.4 Discrete measures

3 Integrals
  3.1 The integral of simple functions
  3.2 The extension process
    3.2.1 Extension to nonnegative functions
    3.2.2 Integrable functions
  3.3 Convergence of integrals
    3.3.1 The theorem of monotone convergence
    3.3.2 The infinite series theorem
    3.3.3 The dominated convergence theorem
  3.4 Stieltjes integration
    3.4.1 The notion of the Stieltjes integral
    3.4.2 Integral calculus
  3.5 Proofs of the main theorems

4 More on integration
  4.1 The image of a measure
  4.2 Measures with densities
  4.3 Product measures and Fubini's theorem
  4.4 Spaces of integrable functions
    4.4.1 Integrable functions
    4.4.2 Square integrable functions
  4.5 Fourier transforms

5 Probability
  5.1 Basic concepts of probability theory
    5.1.1 Probability spaces
    5.1.2 Random variables
    5.1.3 Distributions of random variables
    5.1.4 Expectation
  5.2 Independence
  5.3 Convergence and limit theorems
    5.3.1 Convergence in probability
    5.3.2 Convergence in distribution
  5.4 The causality theorem

6 Random walks
  6.1 The ruin problem
    6.1.1 One player
    6.1.2 Two players
  6.2 Optional stopping
  6.3 Wald's equation
    6.3.1 Improving chances
    6.3.2 First passage of a one-sided boundary
    6.3.3 First passage of a two-sided boundary
  6.4 Gambling systems

7 Conditioning
  7.1 Conditional expectation
  7.2 Martingales
  7.3 Some theorems on martingales

8 Continuous time processes
  8.1 Basic concepts
  8.2 The Poisson process
  8.3 Point processes
  8.4 Levy processes
  8.5 The Wiener Process

9 Continuous time martingales
  9.1 From independent increments to martingales
  9.2 A technical issue: Augmentation
  9.3 Stopping times
    9.3.1 Hitting times
    9.3.2 The optional stopping theorem
  9.4 Application: First passage times of the Wiener process
    9.4.1 One-sided boundaries
    9.4.2 Two-sided boundaries
    9.4.3 The reflection principle
  9.5 The Markov property

10 The stochastic integral
  10.1 Integrals along stochastic paths
  10.2 The integral of simple processes
  10.3 Semimartingales
  10.4 Extending the stochastic integral
  10.5 The Wiener integral

11 Stochastic calculus
  11.1 The associativity rule
  11.2 Quadratic variation and the integration-by-parts formula
  11.3 Ito's formula

12 Stochastic differential equations
  12.1 Introduction
  12.2 The abstract linear equation
  12.3 Wiener driven models

13 Martingales and stochastic integrals
  13.1 Locally square integrable martingales
  13.2 Square integrable martingales
  13.3 Levy's theorem
  13.4 Martingale representation

14 Change of probability measures
  14.1 Equivalent probability measures
  14.2 The exponential martingale
  14.3 Likelihood processes
  14.4 Girsanov's theorem


Preliminaries

0.1 Introduction

The goal of this course is to give an introduction into some mathematical concepts and tools which are indispensable for understanding the modern mathematical theory of finance. Let us give an overview of the historic origins of some of the mathematical tools.

The central topic will be those probabilistic concepts and results which play an important role in mathematical finance. Therefore we have to deal with mathematical probability theory. Mathematical probability theory is formulated in a language that comes from measure theory and integration. This language differs considerably from the language of classical analysis, known under the label of calculus. Therefore, our first step will be to get an impression of basic measure theory and integration.

We will not go into the advanced problems of measure theory where this theory becomes exciting. Such topics would be closely related to advanced set theory and topology, which differ fundamentally from the mere set-theoretic language and topologically driven slang that is convenient for talking about mathematics but nothing more. Similarly, our usage of measure theory and integration is a sort of convenient language which on this level is of little interest in itself. For us its worth arises from its power to give insight into exciting applications like probability and mathematical finance.

Therefore, our presentation of measure theory and integration will be an overview rather than a specialized training program. We will become more and more familiar with the language and its typical kind of reasoning as we go into those applications for which we are highly motivated. These will be probability theory and stochastic calculus.

In the field of probability theory we are interested in probability models having a dynamic structure, i.e. a time evolution governed by endogenous correlation properties. Such probability models are called stochastic processes.

Probability theory is a young theory compared with the classical cornerstones of mathematics. It is illuminating to have a look at the evolution of some fundamental ideas of defining a dynamic structure of stochastic processes.


One important line of thought is looking at stationarity. Models which are themselves stationary or are cumulatives of stationary models have determined the econometric literature for decades. For Gaussian models one need not distinguish between strict and weak (covariance) stationarity. As for weak stationarity, it turns out that typical processes follow difference or differential equations driven by some noise process. The concept of a noise process is motivated by the idea that it does not transport any information.

From the beginning of serious investigation of stochastic processes (about 1900) another idea was leading in the scientific literature, namely the Markov property. This is not the place to go into details of the overwhelming progress in Markov chains and processes achieved in the first half of the 20th century. However, for a long time this theory failed to describe the dynamic behaviour of continuous time Markov processes in terms of equations between single states at different times. Such equations have been the common tools for deterministic dynamics (ordinary difference and differential equations) and for discrete time stationary stochastic sequences. In contrast, continuous time Markov processes were defined in terms of the dynamic behaviour of their distributions rather than of their states, using partial difference and differential equations.

The situation changed dramatically about the middle of the 20th century. There were two ingenious concepts at the beginning of this disruption. The first is the concept of a martingale introduced by Doob. The martingale turned out to be the final mathematical fixation of the idea of noise. The notion of a martingale is located between a process with uncorrelated increments and a process with independent increments, both of which were the competing noise concepts up to that time. The second concept is that of a stochastic integral due to K. Ito. This notion makes it possible to apply differential reasoning to stochastic dynamics.

At the beginning of the stochastic part of this lecture we will present an introduction to the ideas of martingales and stopping times by means of stochastic sequences (discrete time processes). The main subject of the second half of the lecture will be continuous time processes with a strong focus on the Wiener process. However, the notions of martingales, semimartingales and stochastic integrals are introduced in a way which lays the foundation for the study of more general process theory. The choice of examples is governed by the needs of financial applications (covering the notion of gambling, of course).

0.2 Literature

Let us give some comments on the bibliography.

The popular monograph by Bauer, [1], has been for a long time the standard textbook


in Germany on measure theoretic probability. However, probability theory has many different faces. The book by Shiryaev, [21], is much closer to those modern concepts we are heading to. Both texts are mathematically oriented, i.e. they aim at giving complete and general proofs of fundamental facts, preferably in abstract terms. A modern introduction into probability models containing plenty of fascinating phenomena is given by Bremaud, [6] and [7]. The older monograph by Bremaud, [5], is not at the focus of this lecture but contains as an appendix an excellent primer on probability theory.

Our topic in stochastic processes will be the Wiener process and the stochastic analysis of Wiener driven systems. A standard monograph on this subject is Karatzas and Shreve, [15]. The Wiener systems part of the probability primer by Bremaud gives a very compact overview of the main facts. Today, Wiener driven systems are a very special framework for modelling financial markets. In the meantime, general stochastic analysis is in a more or less final state, called semimartingale theory. Present and future research applies this theory in order to get a much more flexible modelling of financial markets. Our introduction to semimartingale theory follows the outline by Protter, [20] (see also [19]).

Let us mention some basic literature on mathematical finance.

There is a standard source by Hull, [11]. Although this book tries hard to present itself as undemanding, the contrary is true. The reason is that the combination of financial intuition and the apparently informal utilization of advanced mathematical tools requires a lot of mathematical knowledge on the reader's side in order to catch the intrinsics. Paul Wilmott, [22] and [23], tries to cover all topics in financial mathematics together with the corresponding intuition, and to make the analytical framework a bit more explicit and detailed than Hull does. I consider these books by Hull and Wilmott as a must for any beginner in mathematical finance.

The books by Hull and Wilmott do not pretend to talk about mathematics. Let us mention some references which have a similar goal as this lecture, i.e. to present the mathematical theory of stochastic analysis aiming at applications in finance.

A very popular book which may serve as a bridge from mathematical probability to financial mathematics is by Björk, [4]. Another book, giving an introduction both to the mathematical theory and financial mathematics, is by Hunt and Kennedy, [12].

Standard monographs on mathematical finance which could be considered as cornerstones marking the state of the art at the time of their publication are Karatzas and Shreve, [16], Musiela and Rutkowski, [17], and Bielecki and Rutkowski, [3]. The present lecture should lay some foundations for reading books of that type.


Chapter 1

Foundations of mathematical analysis

1.1 Sets

1.1.1 Set operations

Let Ω be a basic set and let A, B, C, ... be subsets. Remember the basic set operations A ∩ B (intersection), A ∪ B (union), A^c (complementation) and their rules. Denote the difference of sets by A \ B := A ∩ B^c.

PROBLEM 1.1: Describe in words de Morgan's laws: (A ∪ B)^c = A^c ∩ B^c, (A ∩ B)^c = A^c ∪ B^c.
PROBLEM 1.2: Show that A \ (B ∪ C) = (A \ B) ∩ (A \ C).
PROBLEM 1.3: Expand A \ (B ∩ C).

We denote by N, Q, R the sets of natural numbers, of rational numbers and of real numbers, respectively.

Set operations can also be applied to infinite families of sets, e.g. to a sequence (A_i)_{i=1}^∞ of sets.

PROBLEM 1.4: Describe in words: ⋃_{i=1}^∞ A_i and ⋂_{i=1}^∞ A_i.
PROBLEM 1.5: Explain De Morgan's laws:
(⋃_{i=1}^∞ A_i)^c = ⋂_{i=1}^∞ A_i^c,  (⋂_{i=1}^∞ A_i)^c = ⋃_{i=1}^∞ A_i^c
PROBLEM 1.6: Describe the elements of the sets
lim inf_{i→∞} A_i := ⋃_{k=1}^∞ ⋂_{i=k}^∞ A_i,  lim sup_{i→∞} A_i := ⋂_{k=1}^∞ ⋃_{i=k}^∞ A_i
by the properties: is contained in at most finitely many A_i, is contained in infinitely many A_i, is contained in all but finitely many A_i.
PROBLEM 1.7: Establish the subset relations between the sets considered in the preceding problems.
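A small numerical illustration may help with PROBLEM 1.6. The following Python sketch (purely illustrative) approximates lim sup and lim inf for the alternating sequence A_i = {0, 1} for even i and A_i = {1, 2} for odd i; the infinite unions and intersections are truncated at an arbitrary index K, which for this periodic example already gives the exact answer.

    # lim sup / lim inf of an alternating sequence of sets, truncated at index K
    A = lambda i: {0, 1} if i % 2 == 0 else {1, 2}
    K = 50
    limsup = set.intersection(*[set.union(*[A(i) for i in range(k, K)]) for k in range(K - 1)])
    liminf = set.union(*[set.intersection(*[A(i) for i in range(k, K)]) for k in range(K - 1)])
    print(limsup, liminf)   # {0, 1, 2} and {1}: 1 lies in all A_i, 0 and 2 only in infinitely many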

A sequence (A_i)_{i=1}^∞ of sets is increasing (A_i ↑) if A_1 ⊆ A_2 ⊆ A_3 ⊆ ..., and it is decreasing (A_i ↓) if A_1 ⊇ A_2 ⊇ A_3 ⊇ ... A sequence of sets is a monotone sequence if it is either increasing or decreasing.

PROBLEM 1.8: Find the union and the intersection of monotone sequences of sets.
PROBLEM 1.9: Find lim inf and lim sup of monotone sequences of sets.

The preceding problems explain why the union of an increasing sequence is called its limit. Similarly, the intersection of a decreasing sequence is called its limit.

PROBLEM 1.10: Let a < b. Find the limits of
(a, b + 1/n], (a, b − 1/n], (a, b + 1/n), (a, b − 1/n),
[a + 1/n, b), [a − 1/n, b), (a + 1/n, b), (a − 1/n, b]
PROBLEM 1.11: Find the limits of
{x : |x| < 1/n}, {x : |x| ≤ 1/n}, {x : |x| > 1/n}, {x : |x| ≥ 1/n},
{x : |x| < 1 − 1/n}, {x : |x| < 1 + 1/n}, {x : |x| ≥ 1 − 1/n}, {x : |x| ≥ 1 + 1/n}
PROBLEM 1.12: Let (A_i)_{i=1}^∞ be any sequence of sets. Determine the limits of
B_n := ⋃_{i=1}^n A_i,  C_n := ⋂_{i=1}^n A_i,
for n → ∞.

The set of all subsets of a set A is the power set of A.

PROBLEM 1.13: Let A be a set with N elements. Explain why the power set contains 2^N elements.

The preceding problem explains the name of the power set and why the power set of a set A is denoted by 2^A.
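A quick way to convince oneself of the count 2^N is to enumerate all subsets of a small set; the following Python sketch does this (the set {1, 2, 3} is an arbitrary choice, for illustration only).

    from itertools import combinations

    def power_set(xs):
        # all subsets of xs, listed by size
        return [set(c) for r in range(len(xs) + 1) for c in combinations(xs, r)]

    subsets = power_set([1, 2, 3])
    print(len(subsets))   # 8 == 2**3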


1.1.2 Cartesian products

Let A and B be sets. Then the (Cartesian) product A × B is the set of all ordered pairs (a, b) where a ∈ A and b ∈ B. This notion is extended in an obvious way to products of any finite or infinite collection of sets. We write A^2 := A × A, A^3 := A × A × A, etc.

The elements of a product A^n are lists (vectors) a = (a_1, a_2, ..., a_n) whose elements a_i are called components. For every product of sets there are coordinate functions
X_i : A^n → A : a = (a_1, a_2, ..., a_n) ↦ a_i
In this way subsets of A^n can be described by
(X_i = b) = {a ∈ A^n : a_i = b},  (X_i = b_1, X_j = b_2) = {a ∈ A^n : a_i = b_1, a_j = b_2}

PROBLEM 1.14: Let Ω = {0, 1}^n. Find the number of elements of (max X_i = 1) and (min X_i = 1).
PROBLEM 1.15: (1) Let Ω = {0, 1}^3. Find (X_1 = 0), (X_1 + X_3 = 1).
(2) Let Ω = {0, 1}^n. Find the number of elements of (X_1 + ··· + X_n = k).
PROBLEM 1.16: (1) Let Ω = {0, 1}^3. Find 2^Ω.
(2) Let Ω = {0, 1}^n. Find the number of elements in Ω and in 2^Ω.
(3) Let Ω = {ω_1, ..., ω_N}. Find the number of elements of Ω^n.
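The counting question in PROBLEM 1.15(2) can be checked by brute force for small n. The following Python sketch (with the arbitrary choice n = 5, k = 2, purely illustrative) compares the size of the event (X_1 + ··· + X_n = k) with the binomial coefficient.

    from itertools import product
    from math import comb

    n, k = 5, 2
    Omega = list(product([0, 1], repeat=n))       # Omega = {0,1}^n
    event = [a for a in Omega if sum(a) == k]     # the set (X_1 + ... + X_n = k)
    print(len(event), comb(n, k))                 # both are 10 = C(5, 2)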

The symbol A^N denotes the set of all infinite sequences consisting of elements of A.

PROBLEM 1.17: Let Ω = {0, 1}^N. Describe by formula:
(1) The set of all sequences in Ω containing no components equal to 1.
(2) The set of all sequences in Ω containing at least one component equal to 1.
(3) The set of all sequences in Ω containing at most finitely many components equal to 1.
(4) The set of all sequences in Ω containing infinitely many components equal to 1.
(5) The set of all sequences in Ω where all but at most finitely many components are equal to 1.
(6) The set of all sequences in Ω where all components are equal to 1.

1.1.3 Uncountable sets

An infinite set is countable if its elements can be arranged as a sequence. Otherwise it is called uncountable. Two sets A and B are called equivalent (have equal cardinality) if there is a one-to-one correspondence between the elements of A and of B. It is clear that equivalent infinite sets are either both countable or both uncountable.


PROBLEM 1.18: Explain why Ω = {0, 1}^N is uncountable.
PROBLEM 1.19: Explain that R is equivalent to Ω = {0, 1}^N and thus uncountable.
PROBLEM 1.20: Explain why the power set of a countable set is equivalent to Ω = {0, 1}^N and thus uncountable.

1.2 Functions

Let X and Y be non-empty sets.

A function f : X → Y is a set of pairs (x, f(x)) ∈ X × Y such that for every x ∈ X there is exactly one f(x) ∈ Y. X is the domain of f and Y is the range of f.

A function f : X → Y is injective if f(x_1) = f(x_2) implies x_1 = x_2. It is surjective if for every y ∈ Y there is x ∈ X such that f(x) = y. If a function is injective and surjective then it is bijective.

If A ⊆ X then f(A) := {f(x) : x ∈ A} is the image of A under f. If B ⊆ Y then f^{-1}(B) := {x : f(x) ∈ B} is the inverse image of B under f.

PROBLEM 1.21: f^{-1}(B_1 ∪ B_2) = f^{-1}(B_1) ∪ f^{-1}(B_2).
PROBLEM 1.22: f^{-1}(B_1 ∩ B_2) = f^{-1}(B_1) ∩ f^{-1}(B_2).
PROBLEM 1.23: f^{-1}(B^c) = (f^{-1}(B))^c.

PROBLEM 1.24: Extend the preceding formulas to families of sets.

PROBLEM 1.25: f(A_1 ∪ A_2) = f(A_1) ∪ f(A_2).
PROBLEM 1.26: (1) f(A_1 ∩ A_2) ⊆ f(A_1) ∩ f(A_2).
(2) Give an example where the inclusion in (1) is proper.
(3) Show that for injective functions equality holds in (1).
(4) Extend PROBLEM 1.25 and (1) to families of sets.

PROBLEM 1.27: f(f^{-1}(B)) = f(X) ∩ B.
PROBLEM 1.28: f^{-1}(f(A)) ⊇ A.

Let f : X → Y and g : Y → Z. Then the composition g ∘ f is the function from X to Z such that (g ∘ f)(x) = g(f(x)).

PROBLEM 1.29: Let f : X → Y and g : Y → Z. Show that (g ∘ f)^{-1}(C) = f^{-1}(g^{-1}(C)), C ⊆ Z.


1.3 Real numbers

The set R of real numbers is well-known, at least regarding its basic algebraic operations. Let us talk about topological properties of R.

The following is not intended to be an introduction to the subject, but a checklist which should be well understood; otherwise an introductory textbook has to be consulted.

A subset M ⊆ R is bounded from above if there is an upper bound of M. It is bounded from below if there is a lower bound. It is bounded if it is bounded both from above and from below.

The simplest subsets of R are intervals. There are open intervals (a, b) where the boundary points a and b are not included, closed intervals [a, b] where the boundary points are included, half-open intervals [a, b) or (a, b], and so on. Intervals which are bounded and closed are called compact. Unbounded intervals are written as (a, ∞), (−∞, b], and so on.

If a set M is bounded from above then there is always a uniquely determined least upper bound sup M which is called the supremum of M. This is not a theorem but the completeness axiom. It requires an advanced mathematical construction to show that there exists R, i.e. a set having the familiar properties of real numbers including completeness.

Any set M ⊆ R which has a maximal element max M is bounded from above since the maximum is an upper bound. The maximum is also the least upper bound. A set M need not have a maximum. The existence of a maximum is equivalent to sup M ∈ M. If M is bounded from below then there is a greatest lower bound inf M called the infimum of M.

An (open and connected) neighborhood of x ∈ R is an open interval (a, b) which contains x. Note that neighborhoods can be very small, i.e. can have any length ε > 0.

An (infinite) sequence is a function from N → R, denoted by n ↦ x_n, for short (x_n), where n = 1, 2, ... When we say that an assertion holds for almost all x_n then we mean that it is true for all x_n beginning with some index N, i.e. for x_n with n ≥ N for some N.

A number x ∈ R is called a limit of (x_n) if every neighborhood of x contains almost all x_n. In other words: the sequence (x_n) converges to x: lim_{n→∞} x_n = x or x_n → x. A sequence can have at most one limit since two different limits could be put into disjoint neighborhoods.

A fundamental property of R is the fact that any bounded increasing sequence has a limit, which implies that every bounded monotone sequence has a limit.

An increasing sequence (x_n) which is not bounded is said to diverge to ∞ (x_n ↑ ∞), i.e. for any a we have x_n > a for almost all x_n. Thus, we can summarize: an increasing sequence either converges to some real number (iff it is bounded) or diverges to ∞ (iff it is unbounded). A similar assertion holds for decreasing sequences.

A simple fact which is an elementary consequence of the order structure says that every sequence has a monotone subsequence.

Putting terms together we arrive at a very important assertion: every bounded sequence (x_n) has a convergent subsequence. The limit of a subsequence is called an accumulation point of the original sequence (x_n). In other words: every bounded sequence has at least one accumulation point. An accumulation point x can also be explained in the following way: every neighborhood of x contains infinitely many x_n, but not necessarily almost all x_n. A sequence can have many accumulation points, and it need not be bounded in order to have accumulation points. A sequence has a limit iff it is bounded and has only one accumulation point, which then is necessarily the limit.

If a sequence is bounded from above then the set of accumulation points is also bounded from above. It is a remarkable fact that in this case there is even a maximal accumulation point lim sup_{n→∞} x_n called the limit superior. Similarly, a sequence bounded from below has a minimal accumulation point lim inf_{n→∞} x_n called the limit inferior. A sequence has a limit iff both limit inferior and limit superior exist and are equal.

There is a popular criterion for convergence of a sequence which is related to the assertion just stated. Call a sequence (x_n) a Cauchy-sequence if there exist arbitrarily small intervals containing almost all x_n. Clearly every convergent sequence is a Cauchy-sequence. But also the converse is true in view of completeness. Indeed, every Cauchy-sequence is bounded and can have at most one accumulation point. By completeness it has at least one accumulation point, and is therefore convergent.

The set R̄ = [−∞, ∞] is called the extended real line. If a sequence (x_n) ⊆ R diverges to ∞ then we say that lim_{n→∞} x_n = ∞. If it has a subsequence which diverges to ∞ then we say that lim sup_{n→∞} x_n = ∞. In both cases we have sup x_n = ∞.

There is an interesting convergence criterion which is important for martingale theory.

1.1 THEOREM. A sequence (x_n) ⊆ R is convergent in R̄ iff it crosses every interval (a, b) at most a finite number of times.

PROOF: Note that we always have lim inf x_n ≤ lim sup x_n, where equality holds iff the sequence is convergent in R̄. Thus, the sequence is not convergent in R̄ iff lim inf x_n < lim sup x_n. The last inequality means that for any a < b such that
lim inf x_n < a < b < lim sup x_n
the interval (a, b) is crossed infinitely often. □

Page 17: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

1.4. REAL-VALUED FUNCTIONS 7

1.4 Real-valued functions

In this section we give an overview of basic facts on real-valued functions as far as these are required for understanding the ideas of integration theory.

1.4.1 Simple functions

Let Ω ≠ ∅ be any set. For a subset A ⊆ Ω the indicator function of A is defined to be
1_A(x) = 1 if x ∈ A, and 1_A(x) = 0 if x ∉ A.

A function is a simple function if it has only finitely many different values. Every linear combination of indicator functions is a simple function. A linear combination of indicator functions is canonical if the sets supporting the indicators are a partition of Ω and the coefficients are pairwise different.

PROBLEM 1.30: Show that every simple function has a uniquely determined canonical representation.
PROBLEM 1.31: Let f and g be simple functions. Express the canonical representation of f + g in terms of the canonical representations of f and g.
PROBLEM 1.32: Show that the set of all simple functions is a vector space (closed under linear combinations).
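For a finite Ω the canonical representation can be computed by grouping points according to their function value. The following Python sketch does this for an arbitrarily chosen simple function on Ω = {0, ..., 5} (illustration only).

    def canonical_representation(f, Omega):
        # group the points of Omega by the value of f: the level sets form a
        # partition and the values are pairwise different
        parts = {}
        for w in Omega:
            parts.setdefault(f(w), set()).add(w)
        return parts

    Omega = range(6)
    f = lambda w: w % 2                        # a simple function with values 0 and 1
    print(canonical_representation(f, Omega))  # {0: {0, 2, 4}, 1: {1, 3, 5}}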

Many facts of integration theory rely on approximation arguments where complicated functions are approximated by simple functions. There are a lot of different kinds of approximation.

1.2 DEFINITION. A sequence of functions f_n : Ω → R is pointwise convergent to f : Ω → R if
lim_{n→∞} f_n(x) = f(x) for every x ∈ Ω.
A sequence of functions f_n : Ω → R is uniformly convergent to f : Ω → R if
lim_{n→∞} sup_{x∈Ω} |f_n(x) − f(x)| = 0.
It is convenient to define
||f||_u = sup_{x∈Ω} |f(x)|
This is called the uniform norm (or the norm of uniform convergence).


A level set of a function f : Ω → R is a set of the form {ω : a < f(ω) ≤ b}.

1.3 FUNDAMENTAL APPROXIMATION THEOREM.
(1) Every real-valued function f is the pointwise limit of simple functions based on level sets of f.
(2) Every bounded real-valued function f is the uniform limit of simple functions based on level sets of f.

PROOF: The fundamental statement is (2).

Let f ≥ 0. For every n ∈ N define
f_n := (k − 1)/2^n whenever (k − 1)/2^n ≤ f < k/2^n, k = 1, 2, ..., n·2^n, and f_n := n whenever f ≥ n.
Then f_n ↑ f. If f is bounded then (f_n) converges uniformly to f. Part (1) follows from f = f^+ − f^−. □
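The construction of the proof is easy to reproduce numerically. The following Python sketch implements the dyadic approximation f_n for a nonnegative f; the test function f(x) = x² is an arbitrary choice (illustration only).

    from math import floor

    def dyadic_approx(f, n):
        # n-th approximating function of the proof: (k-1)/2^n on the level set
        # (k-1)/2^n <= f < k/2^n, and the constant n where f >= n
        def fn(x):
            y = f(x)
            if y >= n:
                return float(n)
            return floor(y * 2**n) / 2**n
        return fn

    f = lambda x: x * x
    f3 = dyadic_approx(f, 3)
    print(f(1.3), f3(1.3))   # approximately 1.69 and 1.625 = 13/8; the error is below 1/8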

PROBLEM 1.33: Draw a diagram illustrating the construction of the proof of 2.16.
PROBLEM 1.34: Show: If f is bounded then the approximating sequence can be chosen to be uniformly convergent.

A simple function f : [a, b] → R is a step-function if its canonical partition consists of intervals (including single points).

PROBLEM 1.35: Explain why every monotone function f : [a, b] → R is the pointwise limit of step-functions.

1.4.2 Regulated functions

1.4 DEFINITION. A function f : [a, b) → R has a limit from right at x ∈ [a, b) if for every sequence x_n ↓ x the function values (f(x_n)) converge to a common limit
f(x+) := lim_{n→∞} f(x_n) ∈ R
A function f : (a, b] → R has a limit from left at x ∈ (a, b] if for every sequence x_n ↑ x the function values (f(x_n)) converge to a common limit
f(x−) := lim_{n→∞} f(x_n) ∈ R


Note that function limits need not coincide with function values.

1.5 DEFINITION. A function is continuous from right (right-continuous) on [a, b) if f(x+) = f(x) for every x ∈ [a, b).
A function is continuous from left (left-continuous) on (a, b] if f(x−) = f(x) for every x ∈ (a, b].
A function is continuous at x ∈ (a, b) if f(x+) = f(x−) = f(x).

PROBLEM 1.36: Give the canonical representation of a left-continuous step-function.
PROBLEM 1.37: Give the canonical representation of a right-continuous step-function.

A function can be discontinuous in many ways. It may be that function limits exist but are not equal, or they are equal but do not coincide with the function value. It may also happen that function limits do not exist at all. A point where function limits exist but where f is not continuous is called a jump of f.

1.6 DEFINITION. [Regulated functions] A function is regulated on [a, b] if it has limits from right on [a, b) and from left on (a, b].

Regulated functions have nice properties.

1.7 THEOREM. Let f : [a, b] → R be a regulated function. Then
(1) f is bounded on [a, b].
(2) All discontinuities of f are jumps.
(3) For every positive number ε > 0 the function f can have only finitely many jumps with size exceeding ε.

PROBLEM 1.38: Explain via diagrams which kinds of discontinuities can happen for a regulated function.
PROBLEM 1.39: Give an example of a regulated function which is neither right-continuous nor left-continuous.
PROBLEM 1.40: Construct a regulated function with infinitely many jumps.
PROBLEM 1.41: Show that a regulated function can have only countably many jumps.

Regulated functions can be adjusted to be right-continuous or left-continuous. This is done by replacing the function values f(x) at every point x ∈ [a, b] by f(x+) and f(x−), respectively. The resulting functions
f+ : x ↦ f(x+),  f− : x ↦ f(x−)
are called the right- resp. left-continuous modifications of f. They are still regulated functions.

1.8 DEFINITION. A function is cadlag (continuous from right with limits from left) if it is regulated and continuous from right on [a, b].
It is caglad (continuous from left with limits from right) if it is regulated and continuous from left on [a, b].

1.4.3 Riemannian approximation

For integration theory it is important to know how to approximate regulated functions by step-functions. Let f : [a, b] → R be any function. The basic idea is to use subdivisions a = t_0 < t_1 < ... < t_k = b and to define linear combinations of the form
g = ∑_{i=1}^k f(ξ_i) 1_{I_i}
where the intervals I_i form an interval partition of [a, b] with separating points t_i and ξ_i ∈ [t_{i−1}, t_i]. Let us call such a step-function a Riemannian approximator of f. Of special importance are left-adjusted approximators
g = ∑_{i=1}^k f(t_{i−1}) 1_{(t_{i−1}, t_i]}
and right-adjusted approximators
g = ∑_{i=1}^k f(t_i) 1_{[t_{i−1}, t_i)}
A sequence of subdivisions a = t_0 < t_1 < ... < t_{k_n} = b is called a Riemannian sequence of subdivisions if k_n → ∞ and max |t_i − t_{i−1}| → 0.

1.9 THEOREM. Let f : [a, b] → R be a regulated function and consider a Riemannian sequence of subdivisions.
(1) The sequence of left-adjusted Riemannian approximators converges pointwise to f−, i.e.
∑_{i=1}^k f(t_{i−1}) 1_{(t_{i−1}, t_i]} → f−
(2) The sequence of right-adjusted Riemannian approximators converges pointwise to f+, i.e.
∑_{i=1}^k f(t_i) 1_{[t_{i−1}, t_i)} → f+
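As a numerical illustration, the following Python sketch builds the left-adjusted Riemannian approximator of a function on [0, 1] over an equidistant subdivision; for a continuous f its values are already close to f = f−. The choice f(x) = x² and the grid of 100 intervals are arbitrary (illustration only).

    def left_adjusted(f, ts):
        # step-function taking the value f(t_{i-1}) on (t_{i-1}, t_i]
        def g(x):
            for a, b in zip(ts[:-1], ts[1:]):
                if a < x <= b:
                    return f(a)
            return f(ts[0])                 # convention at the left endpoint
        return g

    f = lambda x: x * x
    ts = [i / 100 for i in range(101)]      # subdivision of [0, 1] into 100 intervals
    g = left_adjusted(f, ts)
    print(f(0.5), g(0.5))                   # 0.25 and f(0.49) = 0.2401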


PROBLEM 1.42: Explain via diagrams the idea of Riemannian approximation.

1.4.4 Functions of bounded variation

Let f : [a, b] → R be any function.

1.10 DEFINITION. The variation of f on the interval [a, b] is
V_a^b(f) := sup ∑_{j=1}^n |f(t_j) − f(t_{j−1})|
where the supremum is taken over all subdivisions a = t_0 < t_1 < ... < t_n = b and all n ∈ N.

A function f is of bounded variation on [a, b] if V_a^b(f) < ∞. The set of all functions of bounded variation is denoted by BV([a, b]).
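The supremum in the definition can be approached numerically by evaluating the sum over finer and finer subdivisions. The following Python sketch does this for the arbitrarily chosen increasing function f(x) = x² on [0, 1], where the variation equals f(1) − f(0) = 1 (illustration only).

    def variation_over(f, ts):
        # the sum |f(t_1)-f(t_0)| + ... + |f(t_n)-f(t_{n-1})|, a lower bound for V_a^b(f)
        return sum(abs(f(b) - f(a)) for a, b in zip(ts[:-1], ts[1:]))

    f = lambda x: x * x
    ts = [i / 1000 for i in range(1001)]
    print(variation_over(f, ts))            # 1.0 (up to rounding), since f is monotone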

PROBLEM 1.43: Show that monotone functions are in BV([a, b]) and calculate their variation.
PROBLEM 1.44: Show that BV([a, b]) is a vector space (is stable under linear combinations).
PROBLEM 1.45: Explain why BV([a, b]) is a subset of the set of regulated functions.
PROBLEM 1.46: Let f be differentiable on [a, b] with continuous derivative and finitely many critical points. Show that f ∈ BV([a, b]) and
V_a^b(f) = ∫_a^b |f′(u)| du
PROBLEM 1.47: Show that any function f ∈ BV([a, b]) can be written as f = g − h where g, h are increasing and satisfy V_a^t(f) = g(t) + h(t).
Hint: Let g(t) := (V_a^t(f) + f(t))/2 and h(t) := (V_a^t(f) − f(t))/2.
PROBLEM 1.48: Construct a continuous function on a compact interval which is not of bounded variation.


1.5 Banach spaces

Let V be a vector space (a set which is closed under linear combinations).

1.11 DEFINITION. A norm on V is a function v ↦ ||v||, v ∈ V, satisfying the following conditions:
(1) ||v|| ≥ 0, and ||v|| = 0 ⇔ v = o,
(2) ||v + w|| ≤ ||v|| + ||w||, v, w ∈ V,
(3) ||λv|| = |λ| ||v||, λ ∈ R, v ∈ V.
A pair (V, ||.||) consisting of a vector space V and a norm ||.|| is a normed space.

1.12 EXAMPLE. (1) V = R is a normed space with ||v|| = |v|.
(2) V = R^d is a normed space under several norms. E.g.
||v||_1 = ∑_{i=1}^d |v_i|,  ||v||_2 = (∑_{i=1}^d v_i^2)^{1/2} (Euclidean norm),  ||v||_∞ = max_{1≤i≤d} |v_i|
(3) Let V = C([0, 1]) be the set of all continuous functions f : [0, 1] → R. This is a vector space. Popular norms on this vector space are
||f||_∞ = max_{0≤s≤1} |f(s)|
and
||f||_1 = ∫_0^1 |f(s)| ds
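For concreteness, the following Python sketch evaluates the three norms of Example 1.12(2) for the arbitrarily chosen vector v = (3, −4) (illustration only).

    v = [3.0, -4.0]
    norm_1 = sum(abs(x) for x in v)           # ||v||_1 = 7
    norm_2 = sum(x * x for x in v) ** 0.5     # ||v||_2 = 5 (Euclidean norm)
    norm_inf = max(abs(x) for x in v)         # ||v||_inf = 4
    print(norm_1, norm_2, norm_inf)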

The distance of two elements of V is defined to be

d(v, w) := ||v − w||

This function has the usual properties of a distance, in particular it satisfies the triangle inequality. A set of the form
B(v, r) := {w ∈ V : ||w − v|| < r}
is called an open ball around v with radius r. A sequence (v_n) ⊆ V is convergent with limit v if ||v_n − v|| → 0.

A sequence (v_n) is a Cauchy-sequence if there exist arbitrarily small balls containing almost all members of the sequence, i.e.
∀ ε > 0 ∃ N(ε) ∈ N such that ||v_n − v_m|| < ε whenever n, m ≥ N(ε)

1.13 DEFINITION. A normed space is a Banach space if it is complete, i.e. if every Cauchy sequence is convergent.


It is clear that R and R^d are complete under the usual norms. Actually they are complete under any norm. The situation is different with infinite dimensional normed spaces.

1.14 EXAMPLE. The space of continuous functions C([0, 1]) is complete under ||.||_∞ (under uniform convergence). However, it is not complete under ||.||_1.

The latter fact is one of the reasons for extending the notion and the range of the elementary integral.
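A standard way to see the failure of completeness under ||.||_1 (not worked out in the text) is a sequence of ramp functions squeezing towards a jump. The following Python sketch approximates the ||.||_1-distances by Riemann sums and indicates that the sequence is Cauchy, although its pointwise limit is a step function and hence lies outside C([0, 1]); the particular ramps are an arbitrary choice for illustration.

    # f_n rises linearly from 0 to 1 on [1/2, 1/2 + 1/n] and is constant elsewhere
    def f(n, x):
        return min(1.0, max(0.0, n * (x - 0.5)))

    def dist_1(n, m, grid=10000):
        # Riemann sum approximation of ||f_n - f_m||_1
        return sum(abs(f(n, i / grid) - f(m, i / grid)) for i in range(grid + 1)) / grid

    print(dist_1(10, 20), dist_1(100, 200))   # roughly 0.025 and 0.0025: the distances shrink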

1.6 Hilbert spaces

A special class of normed spaces are inner product spaces. Let V be a vector space.

1.15 DEFINITION. An inner product on V is a function (v, w) ↦ ⟨v, w⟩, v, w ∈ V, satisfying the following conditions:
(1) (v, w) ↦ ⟨v, w⟩ is linear in both variables,
(2) ⟨v, v⟩ ≥ 0, and ⟨v, v⟩ = 0 ⇔ v = o.
A pair (V, ⟨., .⟩) consisting of a vector space V and an inner product ⟨., .⟩ is an inner product space.

An inner product gives rise to a norm according to
||v|| := ⟨v, v⟩^{1/2}, v ∈ V.

PROBLEM 1.49: Show that ||v|| := ⟨v, v⟩^{1/2} is a norm.

1.16 EXAMPLE. (1) V = R is an inner product space with ⟨v, w⟩ = vw. The corresponding norm is ||v|| = |v|.
(2) V = R^d is an inner product space with
⟨v, w⟩ = ∑_{i=1}^d v_i w_i
The corresponding norm is ||v||_2.
(3) Let V = C([0, 1]) be the set of all continuous functions f : [0, 1] → R. This is an inner product space with
⟨f, g⟩ = ∫_0^1 f(s) g(s) ds
The corresponding norm is
||f||_2 = (∫_0^1 f(s)^2 ds)^{1/2}


1.17 DEFINITION. An inner product space is a Hilbert space if it is complete under the norm defined by the inner product.

Inner product spaces have a geometric structure which is very similar to that of R^d endowed with the usual inner product. In particular, the notions of orthogonality and of projections are available on inner product spaces. The existence of orthogonal projections depends on completeness, and therefore requires Hilbert spaces.

PROBLEM 1.50: Let C be a closed convex subset of a Hilbert space (V, ⟨., .⟩) and let v ∉ C. Show that there exists v_0 ∈ C such that
||v − v_0|| = min{||v − w|| : w ∈ C}
Hint: Let α := inf{||v − w|| : w ∈ C} and choose a sequence (w_n) ⊆ C such that ||v − w_n|| → α. Apply the parallelogram equality to show that (w_n) is a Cauchy sequence.

The following is an extension theorem which plays a central role in many parts of integration and stochastic integration. It is concerned with the extension of a linear function between two Hilbert spaces.

Let (H_1, ⟨.|.⟩_1) and (H_2, ⟨.|.⟩_2) be two Hilbert spaces. Let D ⊆ H_1 be a dense subspace, i.e. a subspace (closed under linear combinations) from which each element of H_1 can be reached as a limit. Assume that we have a linear function T : D → H_2 which is isometric, i.e. such that
||Tx||_2 = ||x||_1 for all x ∈ D
The problem is to extend the function T from D to the whole of H_1.

1.18 THEOREM. The linear isometric function T : D → H_2 can be extended in a uniquely determined way to a linear isometric function T : H_1 → H_2.

PROOF: Let x ∈ H_1 be an arbitrary element of H_1. A natural idea to define Tx is to take some sequence (x_n) ⊆ D such that x_n → x and to define Tx = lim Tx_n. To be sure that this procedure works one has to make sure of a couple of facts:

(1) The definition is a valid equation for x ∈ D.

(2) For every convergent sequence (x_n) ⊆ D the sequence (Tx_n) is convergent in H_2.

(3) If (x_n) and (y_n) are two sequences with the same limit in H_1 then the limits of (Tx_n) and (Ty_n) are equal, too.

(4) The function T : H_1 → H_2 is linear and isometric.

Details of the proof are left as PROBLEM 1.51. □


Chapter 2

Measures and measurable functions

2.1 Sigma-fields

2.1.1 The concept of a sigma-field

Let Ω be a (non-empty) set. We are interested in systems of subsets of Ω which are closed under set operations.

2.1 EXAMPLE. In general, a system of subsets need not be closed under set operations.
Let Ω = {1, 2, 3}. Consider the system of subsets A = {{1}, {2}, {3}}. This system is not closed under union, intersection or complementation. E.g. the complement of {1} is not in A.
It is clear that the power set is closed under any set operations. However, there are smaller systems of sets which are closed under set operations, too.
Let Ω = {1, 2, 3}. Consider the system of subsets B = {Ω, ∅, {1}, {2, 3}}. It is easy to see that this system is closed under union, intersection and complementation. Moreover, it follows that these set operations can be repeated in arbitrary order, resulting always in sets contained in B.

2.2 DEFINITION. A (non-empty) system F of subsets of Ω is called a σ-field if it is closed under union, intersection and complementation as well as under forming limits of monotone sequences. The pair (Ω, F) is called a measurable space.

There are some obvious necessary properties of a σ-field.

PROBLEM 2.1: Show that every σ-field on Ω contains ∅ and Ω.
PROBLEM 2.2: What is the smallest possible σ-field on Ω?


If we want to check whether a given system of sets is actually a σ-field then it is sufficient to verify only a minimal set of conditions. The following assertion states such a minimal set of conditions.

2.3 PROPOSITION. A (non-empty) system F of subsets of Ω is a σ-field iff it satisfies the following conditions:
(1) Ω ∈ F,
(2) A ∈ F ⇒ A^c ∈ F,
(3) if (A_i)_{i=1}^∞ ⊆ F then ⋃_{i=1}^∞ A_i ∈ F.

The proof is PROBLEM 2.3.

2.1.2 How to construct sigma-fields

When one starts to construct a σ-field one usually starts with a family C of sets which in any case should be contained in the σ-field. If this starting family C does not fulfil all conditions of a σ-field then a simple idea could be to add further sets until the family fulfils all required conditions. Actually, this procedure works if the starting family C is a finite system.

2.4 DEFINITION. Let C be any system of subsets of Ω. The σ-field generated by C is the smallest σ-field F which contains C. It is denoted by σ(C).

PROBLEM 2.4: Assume that C = {A}. Find σ(C).
PROBLEM 2.5: Assume that C = {A, B}. Find σ(C).
PROBLEM 2.6: Show by giving an example that the union of two σ-fields need not be a σ-field.

If the system C is any finite system then σ(C) consists of all sets which can be obtained by finitely many unions, intersections and complementations of sets in C. Although the resulting system σ(C) is still finite, a systematic overview of all sets could be rather complicated.

Things are much easier if the generating system is a finite partition of Ω.

2.5 PROPOSITION. Assume that C is a finite partition of Ω. Then σ(C) consists of ∅ and of all unions of sets in C.

The proof is PROBLEM 2.7.
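For a finite partition the proposition can be turned into a small program: the following Python sketch lists σ(C) as the collection of all unions of blocks. The partition of Ω = {1, ..., 6} into three blocks is an arbitrary example (illustration only).

    from itertools import combinations

    def sigma_field_from_partition(partition):
        # all unions of blocks of the partition, including the empty union
        blocks = [frozenset(b) for b in partition]
        field = set()
        for r in range(len(blocks) + 1):
            for chosen in combinations(blocks, r):
                field.add(frozenset().union(*chosen))
        return field

    C = [{1}, {2, 3}, {4, 5, 6}]
    F = sigma_field_from_partition(C)
    print(len(F))   # 8 == 2**3 sets, among them the empty set and Omega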

PROBLEM 2.8: Let Ω be a finite set. Find the σ-field which is generated by the one-point sets.
PROBLEM 2.9: Show that every finite σ-field F is generated by a partition of Ω.
Hint: Call a nonempty set A ∈ F an atom if it contains no nonempty proper subset in F. Show that the collection of atoms is a partition of Ω and that every set in F is a union of atoms.


2.1.3 Borel sigma-fields

Let us discuss σ-fields on R.

Clearly, the power set of R is a σ-field. However, the power set is too large. Let us be more modest and start with a system of simple sets and then try to extend the system to a σ-field.

The following example shows that such a procedure does not work if we start with one-point sets.
PROBLEM 2.10: Let F be the collection of all subsets of R which are countable or are the complement of a countable set.
(1) Show that F is a σ-field.
(2) Show that F is the smallest σ-field which contains all one-point sets.
(3) Does F contain intervals?

A reasonable σ-field on R should at least contain all intervals.

2.6 DEFINITION. The smallest σ-field on R which contains all intervals is called the Borel σ-field. It is denoted by B and its elements are called Borel sets.

Unfortunately, there is no way of describing all sets in B in a simple manner. All we can say is that any set which can be obtained from intervals by countably many set operations is a Borel set. E.g., every set which is the countable union of intervals is a Borel set. But there are even much more complicated sets in B. On the other hand, however, there are subsets of R which are not in B.

The concept of Borel sets is easily extended to R^n.

2.7 DEFINITION. The σ-field on R^n which is generated by all rectangles
R = I_1 × I_2 × ··· × I_n, I_k being any interval,
is called the Borel σ-field on R^n and is denoted by B^n.

All open and all closed sets in R^n are Borel sets since open sets can be represented as a countable union of rectangles and closed sets are the complements of open sets.

2.2 Measurable functions

2.2.1 The idea of measurability

2.8 DEFINITION. A function f : (Ω, F) → R defined on a measurable space is called F-measurable if the inverse images f^{-1}(B) = (f ∈ B) are in F for all Borel sets B ∈ B.


To get an idea what measurability means let us consider some simple examples.
PROBLEM 2.11: Let (Ω, F, µ) be a measure space and let f = 1_A where A ⊆ Ω. Show that f is F-measurable iff A ∈ F.
PROBLEM 2.12: Explain why f = 1_Q is B-measurable.
PROBLEM 2.13: Let (Ω, F, µ) be a measure space and let f : Ω → R be a simple function. Show that f is F-measurable iff all sets of the canonical representation are in F.

When we consider functions f : R → R then B-measurability is called Borel measurability.

2.2.2 The basic abstract assertions

The notion of measurability is not restricted to real-valued functions.

2.9 DEFINITION. A function f : (Ω, A) → (Y, B) is called (A, B)-measurable if f^{-1}(B) ∈ A for all B ∈ B.

There are two fundamental principles for dealing with measurability. The first principle says that measurability is a property which is preserved under composition of functions.

2.10 THEOREM. Let f : (Ω, A) → (Y, B) be (A, B)-measurable, and let g : (Y, B) → (Z, C) be (B, C)-measurable. Then g ∘ f is (A, C)-measurable.

The proof is PROBLEM 2.14.

The second principle is concerned with checking measurability. For checking measurability of f it is sufficient to consider the sets in a generating system of the σ-field in the range of f.

2.11 THEOREM. Let f : (Ω, A) → (Y, B) and let C be a generating system of B, i.e. B = σ(C). Then f is (A, B)-measurable iff f^{-1}(C) ∈ A for all C ∈ C.

PROOF: Let D := {D ⊆ Y : f^{-1}(D) ∈ A}. It can be shown that D is a σ-field. If f^{-1}(C) ∈ A for all C ∈ C then C ⊆ D. This implies σ(C) ⊆ D.
The details of the proof are PROBLEM 2.15. □

2.2.3 The structure of real-valued measurable functions

Let (Ω, F) be a measurable space. Let L(F) denote the set of all F-measurable real-valued functions. We start with the most common and most simple criterion for checking measurability of a real-valued function.

2.12 THEOREM. A function f : Ω → R is F-measurable iff (f ≤ α) ∈ F for every α ∈ R.

The proof is PROBLEM 2.16. (Hint: Apply 2.11.) This theorem provides us with a lot of examples of Borel-measurable functions.

PROBLEM 2.17: Show that every monotone function f : R → R is Borel-measurable.
PROBLEM 2.18: Show that every continuous function f : R^n → R is B^n-measurable.
Hint: Note that (f ≤ α) is a closed set.
PROBLEM 2.19: Let f : (Ω, F) → R be F-measurable. Show that f^+, f^−, |f|, and every polynomial a_0 + a_1 f + ··· + a_n f^n are F-measurable.

Even much more is true.

2.13 THEOREM. Let f_1, f_2, ..., f_n be measurable functions. Then for every continuous function φ : R^n → R the composition φ(f_1, f_2, ..., f_n) is measurable.

It follows that applying the usual algebraic operations to measurable functions preserves measurability.

2.14 COROLLARY. Let f_1, f_2 be measurable functions. Then f_1 + f_2, f_1 · f_2, f_1 ∩ f_2, f_1 ∪ f_2 are measurable functions.

The proof is PROBLEM 2.20.

As a result we see that L(F) is a space of functions where we may perform any algebraic operations without leaving the space. Thus it is a very convenient space for formal manipulations. The next assertion shows that we may even perform all of those operations involving a countable set (e.g. a sequence) of measurable functions!

2.15 THEOREM. Let (f_n)_{n∈N} be a sequence of measurable functions. Then sup_n f_n and inf_n f_n are measurable functions. Let A := (∃ lim_n f_n). Then A ∈ F and lim_n f_n · 1_A is measurable.

PROOF: Since
(sup_n f_n ≤ α) = ⋂_n (f_n ≤ α)
it follows from 2.12 that sup_n f_n and inf_n f_n = − sup_n(−f_n) are measurable. We have
A := (∃ lim_n f_n) = (sup_k inf_{n≥k} f_n = inf_k sup_{n≥k} f_n)
This implies A ∈ F. The last statement follows from
lim_n f_n = sup_k inf_{n≥k} f_n on A. □

Note that the preceding corollaries are only very special examples of the power of Theorem 2.10. Roughly speaking, any function which can be written as an expression involving countably many operations with countably many measurable functions is measurable. Therefore it is rather difficult to construct non-measurable functions.

Let us denote the set of all F-measurable simple functions by S(F). Clearly, all limits of simple measurable functions are measurable. The remarkable fact, fundamental for almost everything in integration theory, is the converse of this statement.

2.16 THEOREM.
(a) Every measurable function f is the limit of some sequence of simple measurable functions.
(b) If f ≥ 0 then the approximating sequence can be chosen to be increasing.

This theorem is a consequence of Theorem 1.3.
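The standard construction behind part (b) truncates f at level n and rounds down to the nearest multiple of 2^{-n}. The following Python sketch is only an illustration of this construction (the test function f and the sample points are arbitrary choices, not part of the text); it checks monotonicity and pointwise convergence numerically.

    import numpy as np

    def dyadic_approx(f, n):
        """n-th simple function of the standard approximation:
        f_n = min(n, floor(2^n f) / 2^n); it takes at most n*2^n + 1 values."""
        def fn(x):
            return np.minimum(n, np.floor((2.0 ** n) * f(x)) / (2.0 ** n))
        return fn

    f = lambda x: np.exp(x)          # a nonnegative measurable function on [0, 1]
    x = np.linspace(0.0, 1.0, 7)     # a few sample points

    for n in (1, 2, 4, 8, 16):
        fn = dyadic_approx(f, n)
        print(n, np.max(np.abs(fn(x) - f(x))))    # error shrinks as n grows

    # monotonicity: f_3 <= f_4 <= f at the sample points
    assert np.all(dyadic_approx(f, 3)(x) <= dyadic_approx(f, 4)(x))
    assert np.all(dyadic_approx(f, 4)(x) <= f(x))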

2.3 Measures

2.3.1 The concept of measures

Measures are set functions.

2.17 EXAMPLE. Let Ω be an arbitrary set and for any subset A ⊆ Ω define

µ(A) = |A| := k if A contains k elements, and µ(A) = |A| := ∞ if A contains infinitely many elements.

This set function is called a counting measure. It is defined for all subsets of Ω. Obviously, it is additive, i.e.

A ∩ B = ∅ ⇒ µ(A ∪ B) = µ(A) + µ(B).

Measures are set functions which intuitively should be related to the notion of volume. Therefore measures should be nonnegative and additive. In order to apply additivity they should be defined on systems of subsets which are closed under the usual set operations. This leads to the requirement that measures should be defined on σ-fields. Finally, if the underlying σ-field contains infinitely many sets there should be some rule for handling limits of infinite sequences of sets.


Thus, we are ready for the definition of a measure.

2.18 DEFINITION. Let Ω be a non-empty set. A measure µ on Ω is a set function which satisfies the following conditions:
(1) µ is defined on a σ-field F on Ω.
(2) µ is nonnegative, i.e. µ(A) ≥ 0, A ∈ F, and µ(∅) = 0.
(3) µ is σ-additive, i.e. for every pairwise disjoint sequence (Ai)_{i=1}^∞ ⊆ F

µ( ⋃_{i=1}^∞ Ai ) = ∑_{i=1}^∞ µ(Ai)

A measure is called finite if µ(Ω) < ∞. A measure P is called a probability measure if P(Ω) = 1. If µ|F is a measure then (Ω,F, µ) is a measure space. If P|F is a probability measure then (Ω,F, P) is called a probability space.

There are some obvious consequences of the preceding definition.

Let µ|F be a measure.
PROBLEM 2.21: Every measure is additive.
PROBLEM 2.22: A1 ⊆ A2 implies µ(A1) ≤ µ(A2).
PROBLEM 2.23: Show the inclusion-exclusion law:

µ(A1) + µ(A2) = µ(A1 ∪ A2) + µ(A1 ∩ A2)

PROBLEM 2.24: Extend the formula of the preceding problem to the union of three sets.
PROBLEM 2.25: Any nonnegative linear combination of measures is a measure.

The property of being σ-additive both guarantees additivity and implies easy rules for handling infinite sequences of sets.

Let µ|F be a measure.
PROBLEM 2.26: If Ai ↑ A then µ(Ai) ↑ µ(A).
PROBLEM 2.27: If Ai ↓ A and µ(A1) < ∞ then µ(Ai) ↓ µ(A).
PROBLEM 2.28: Every infinite sum of measures is a measure.

2.3.2 The abstract construction of measures

PROBLEM 2.29: Explain the construction of measures on a finite σ-field.
Hint: Measures have to be defined for atoms only.
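The hint can be made concrete: on a finite σ-field every measurable set is a disjoint union of atoms, so a measure is determined by one nonnegative weight per atom. A minimal Python sketch (the partition of Ω = {1,...,6} and the weights are made-up example data):

    atoms = [frozenset({1, 2}), frozenset({3}), frozenset({4, 5, 6})]
    weight = [0.5, 2.0, 0.0]                   # one nonnegative number per atom

    def mu(A):
        """Measure of a set A of the finite sigma-field generated by the atoms."""
        A = frozenset(A)
        parts = [a for a in atoms if a <= A]   # atoms contained in A
        union = frozenset().union(*parts) if parts else frozenset()
        if union != A:
            raise ValueError("A is not a union of atoms, hence not measurable")
        return sum(w for a, w in zip(atoms, weight) if a <= A)

    print(mu({1, 2, 3}))     # 2.5
    print(mu({4, 5, 6}))     # 0.0
    print(mu(set()))         # 0.0 (empty union of atoms)

Additivity holds by construction, since disjoint measurable sets use disjoint collections of atoms.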


2.4 Measures on the real line

2.4.1 Point measures

The most simple example of a measure is a point measure.

2.19 DEFINITION. The set function defined by

δa(A) = 1A(a), A ⊆ R,

is called the point measure at a ∈ R.

Take a moment to reflect on whether this definition actually satisfies the properties of a measure. Note that any point measure can be defined for all subsets of R, i.e. it is defined on the largest possible σ-field 2R.

Taking linear combinations of point measures gives a lot of further examples of measures.

PROBLEM 2.30: Let µ = δ0 + 2δ1 + 0.5δ−1. Calculate µ([0, 1)), µ([−1, 1)), µ((−1, 1]).
PROBLEM 2.31: Describe in words the values of µ = ∑_{j=1}^k δaj.
PROBLEM 2.32: Let x ∈ Rn be a list of data and let µ(I) be the percentage of data contained in I. Show that µ is a measure by writing it as a linear combination of point measures.

2.4.2 The Lebesgue measure

Let Ω = R and for every interval I ⊆ R define

λ(I) := length of I

E.g. λ((a, b]) = b − a. This set function is called the Lebesgue content of intervals. At the moment it is defined only on the family of all intervals.

The Lebesgue content is also additive in the following sense: If I1 and I2 are two intervals such that the union I1 ∪ I2 = I3 is an interval, too, then

I1 ∩ I2 = ∅ ⇒ λ(I1 ∪ I2) = λ(I1) + λ(I2).

However, the family of intervals is not a σ-field. In order to obtain a measure we have to extend the Lebesgue content to a σ-field which contains the intervals. The smallest σ-field with this property is the Borel-σ-field.

2.20 MEASURE EXTENSION THEOREM. There exists a uniquely determined measure λ|B such that λ((a, b]) = b − a, a < b. This measure is called the Lebesgue measure.


Knowing that λ|B is a measure we may calculate its values for simple Borel sets which are not intervals.

PROBLEM 2.33: Find the Lebesgue measure of Q.

2.4.3 Measure defining functions

Now, let us turn to the problem of how to get an overview of all measures µ|B. We restrict our interest to measures which give finite values to bounded intervals.

Let µ|B be a measure such that µ((a, b]) < ∞ for a < b. Define

α(x) := µ((0, x]) if x > 0,   α(x) := −µ((x, 0]) if x ≤ 0,

and note that for any a < b we have

µ((a, b]) = α(b) − α(a) =
  µ((0, b]) − µ((0, a])    if 0 ≤ a < b,
  µ((0, b]) + µ((a, 0])    if a < 0 < b,
  −µ((b, 0]) + µ((a, 0])   if a < b ≤ 0.

This means: For every such measure µ there is a function α : R → R which defines the measure at least for all intervals. This function is called the measure-defining function of µ.

Note that our definition of the measure-defining function α is such that α(0) = 0. However, any function which differs from α only by an additive constant defines the same measure.

Calculate the measure-defining function of the following measures:
PROBLEM 2.34: A point measure: δ−2, δ0, δ3.
PROBLEM 2.35: A linear combination of point measures: δ−2 + 2δ0 + 0.5δ3.
PROBLEM 2.36: The Lebesgue measure λ.

Let µ|B be finite on bounded intervals and α its measure-defining function. Explain:
PROBLEM 2.37: α is increasing.
PROBLEM 2.38: α is right-continuous.

The following is an existence theorem which establishes a one-to-one relation between functions and measures.

2.21 MEASURE EXTENSION THEOREM.
For every increasing right-continuous function α : R → R there exists a uniquely determined measure λα such that λα((a, b]) = α(b) − α(a).
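As an illustration of Theorem 2.21, the following Python sketch evaluates λα((a, b]) = α(b) − α(a) for one hypothetical measure-defining function, namely α(x) = x + 1_[0,∞)(x), i.e. Lebesgue measure plus a unit point mass at 0 (the function and the intervals are example choices only).

    def alpha(x):
        """Increasing and right-continuous: Lebesgue part x plus a jump of
        height 1 at 0 (alpha(0) already contains the jump)."""
        return x + (1.0 if x >= 0 else 0.0)

    def lam_alpha(a, b):
        """lambda_alpha((a, b]) = alpha(b) - alpha(a) for a < b."""
        return alpha(b) - alpha(a)

    print(lam_alpha(-1.0, 1.0))   # 3.0 : length 2 plus the atom at 0
    print(lam_alpha(0.0, 1.0))    # 1.0 : (0, 1] does not contain the atom
    print(lam_alpha(-1.0, 0.0))   # 2.0 : (-1, 0] contains the atom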


If the measure-defining function α is continuous and piecewise differentiable then its derivative is called the density of the measure λα (with respect to the Lebesgue measure λ). This name comes from

α′(x) = lim_{h→0} (α(x + h) − α(x − h)) / (2h) = lim_{h→0} λα((x − h, x + h]) / λ((x − h, x + h])

In such a situation we have

λα((a, b]) = ∫_a^b α′(x) dx

2.4.4 Discrete measures

A measure µ|B is discrete if it is a finite or infinite linear combination of point measures. A counting measure is a discrete measure where all point measures with positive weight have weight one.

PROBLEM 2.39: Explain the characteristic properties of the measure-defining function of a discrete measure and of a counting measure.

Let α be the measure-defining function of λα.
PROBLEM 2.40: Show that λα({a}) = ∆α(a).
PROBLEM 2.41: For which measures λα is α continuous?
PROBLEM 2.42: For which measures λα is α a step-function?


Chapter 3

Integrals

3.1 The integral of simple functions

Let (Ω,F, µ) be a measure space. We start with defining the µ-integral of a measurable simple function.

3.1 DEFINITION. Let f = ∑_{i=1}^n ai 1_{Fi} be a nonnegative simple F-measurable function with its canonical representation. Then

∫ f dµ := ∑_{i=1}^n ai µ(Fi)

is called the µ-integral of f.

We had to restrict the preceding definition to nonnegative functions since we admit the case µ(F) = ∞. If we were dealing with a finite measure µ the definition would work for all F-measurable simple functions.

3.2 EXAMPLE. Let (Ω,F, P) be a probability space and let X = ∑_{i=1}^n ai 1_{Fi} be a simple random variable. Then we have E(X) = ∫ X dP.

PROBLEM 3.1: What is the integral with respect to a linear combination of point measures? Which functions can be integrated?
PROBLEM 3.2: Give a geometric interpretation of the integral of a step function with respect to a Borel measure.

3.3 THEOREM. The µ-integral on S(F)+ has the following properties:
(1) ∫ 1F dµ = µ(F),
(2) ∫ (sf + tg) dµ = s ∫ f dµ + t ∫ g dµ if s, t ∈ R+ and f, g ∈ S(F)+,
(3) ∫ f dµ ≤ ∫ g dµ if f ≤ g and f, g ∈ S(F)+.


PROOF: The only nontrivial part is to prove that ∫ (f + g) dµ = ∫ f dµ + ∫ g dµ. This is PROBLEM 3.3 .
Hint: Try to find the canonical representation of f + g in terms of the canonical representations of f and g. □

It follows that the defining formula of the µ-integral can be applied to any (nonnegative) linear combination of indicators, not only to canonical representations!
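For a measure which is a finite linear combination of point measures the defining formula can be evaluated directly, since µ(F) is simply the sum of the weights of the points lying in F. The following Python sketch is illustrative only; the weights, points and the simple function are made up.

    # mu = 0.5*delta_{-1} + 1.0*delta_{0} + 2.0*delta_{3}
    points  = [-1.0, 0.0, 3.0]
    weights = [0.5, 1.0, 2.0]

    def mu(indicator):
        """mu(F) for a set F given by its indicator function."""
        return sum(w for x, w in zip(points, weights) if indicator(x))

    def integral_simple(values, indicators):
        """Integral of f = sum_i values[i]*1_{F_i} by the defining formula
        sum_i values[i]*mu(F_i)."""
        return sum(a * mu(ind) for a, ind in zip(values, indicators))

    # f = 2 * 1_{[0, oo)} + 5 * 1_{(2, 4]}
    values = [2.0, 5.0]
    indicators = [lambda x: x >= 0, lambda x: 2 < x <= 4]
    print(integral_simple(values, indicators))   # 2*(1+2) + 5*2 = 16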

3.2 The extension process

3.2.1 Extension to nonnegative functions

We know that every nonnegative measurable function f ∈ L(F)+ is the limit of an increasing sequence (fn) ⊆ S(F)+ of measurable simple functions: fn ↑ f. It is a natural idea to think of the integral of f as something like

∫ f dµ := lim_{n→∞} ∫ fn dµ    (1)

This is actually the way we will proceed. But there are some points to worry about.

First of all, we should ask whether the limit on the right hand side exists. This is always the case. Indeed, the integrals ∫ fn dµ form an increasing sequence in [0,∞]. This sequence either has a finite limit or it increases to ∞. Both cases are covered by our definition.

The second and far more subtle question is whether the definition is compatible with the definition of the integral on S(F). This is the only nontrivial part of the extension process of the integral and it is the point where σ-additivity of µ comes in. This is proved in Theorem 3.14.

The third question is whether the value of the limit is independent of the approximating sequence. This is also the case and is proved in Theorem 3.15.

Thus, (1) is a valid definition of the integral of f ∈ L(F)+.

3.4 DEFINITION. Let (Ω,F, µ) be a measure space. The µ-integral of a function f ∈ L+(F) is defined by equation (1) where (fn) is any increasing sequence (fn) ⊆ S(F)+ of measurable simple functions such that fn ↑ f.

It is now straightforward that the basic properties of the integral of simple functions stated in Theorem 3.3 carry over to L(F)+.

3.5 THEOREM. The µ-integral on L(F)+ has the following properties:
(1) ∫ 1F dµ = µ(F),
(2) ∫ (sf + tg) dµ = s ∫ f dµ + t ∫ g dµ if s, t ∈ R+ and f, g ∈ L(F)+,
(3) ∫ f dµ ≤ ∫ g dµ if f ≤ g and f, g ∈ L(F)+.

The following problems establish some easy properties of the integral developed so far.

PROBLEM 3.4: Let f ∈ L(F)+. Prove Markoff's inequality
µ(f > a) ≤ (1/a) ∫ f dµ, a > 0.
PROBLEM 3.5: Let f ∈ L(F)+. Show that ∫ f dµ = 0 implies µ(f ≠ 0) = 0.
Hint: Show that µ(f > 1/n) = 0 for every n ∈ N.

An assertion A about a measurable function f is said to hold µ-almost everywhere (µ-a.e.) if µ(Ac) = 0. Using this terminology the assertion of the preceding problem can be phrased as:

∫ f dµ = 0, f ≥ 0 ⇒ f = 0 µ-a.e.

If we are talking about probability measures and random variables the phrase "almost everywhere" is sometimes replaced by "almost sure".

PROBLEM 3.6: Let f ∈ L(F)+. Show that ∫ f dµ < ∞ implies µ(f > a) < ∞ for every a > 0.

3.2.2 Integrable functions

Now the integral is defined for every nonnegative measurable function. The value of the integral may be ∞. In order to define the integral for measurable functions which may take both positive and negative values we have to exclude infinite integrals.

3.6 DEFINITION. A measurable function f is µ-integrable if ∫ f+ dµ < ∞ and ∫ f− dµ < ∞. If f is µ-integrable then

∫ f dµ := ∫ f+ dµ − ∫ f− dµ

The set of all µ-integrable functions is denoted by L1(µ) = L1(Ω,F, µ).

Proving the basic properties of the integral of integrable functions is an easy matter. We collect these facts in a couple of problems.


PROBLEM 3.7: Show that f ∈ L(F) is µ-integrable iff ∫ |f| dµ < ∞.
PROBLEM 3.8: The set L1(µ) is a linear space and the µ-integral is a linear functional on L1(µ).
PROBLEM 3.9: The µ-integral is an isotonic functional on L1(µ).
PROBLEM 3.10: Let f ∈ L1(µ). Show that |∫ f dµ| ≤ ∫ |f| dµ.
PROBLEM 3.11: Let f be a measurable function and assume that there is an integrable function g such that |f| ≤ g (say: f is dominated). Then f is integrable.
PROBLEM 3.12: (a) Discuss the question whether bounded measurable functions are integrable.
(b) Characterize those measurable simple functions which are integrable.

Many assertions in measure theory concerning measurable functions are stable under linear combinations and under convergence. Assertions of such a type need only be proved for indicators. The procedure of proving (understanding) an assertion for indicators and extending it to nonnegative and to integrable functions is called measure theoretic induction.

PROBLEM 3.13: Show that integrals are linear with respect to the integrating measure.

The integral over a (measurable) subset is defined by

∫A f dµ := ∫ 1A f dµ, A ∈ F.

PROBLEM 3.14: (a) Let f be an integrable function. Then ∫A f dµ = 0 for all A ∈ F implies f = 0 µ-a.e.
(b) Let f and g be integrable functions. Then ∫A f dµ = ∫A g dµ for all A ∈ F implies f = g µ-a.e.

3.3 Convergence of integrals

One of the reasons for the great success of abstract integration theory is its convergence theorems for integrals. The problem is the following. Assume that (fn) is a sequence of functions converging to some function f. When can we conclude that

lim_{n→∞} ∫ fn dµ = ∫ f dµ ?

There are (at least) three basic assertions of this kind which could be viewed as the three basic principles of integral convergence. We will present these principles together with typical applications.


3.3.1 The theorem of monotone convergence

The first principle says that for increasing sequences of nonnegative functions the limit and the integral may be interchanged.

3.7 THEOREM OF BEPPO LEVI.
Let (fn) ⊆ L(F)+. Then fn ↑ f ⇒ lim_{n→∞} ∫ fn dµ = ∫ f dµ

The theorem is proved in section 3.5. Note that there is no assumption on integrability. If the sequence is decreasing instead of increasing the corresponding assertion is only valid if the sequence is integrable.
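On a countable space the theorem can be watched at work, because integrals reduce to plain sums. A small Python sketch (the weights µ({n}) = 1 and the function f(n) = 1/n² are example choices): with fk := f · 1_{1,...,k} ↑ f, the integrals of fk are partial sums which increase to the integral of f.

    import math

    f = lambda n: 1.0 / n**2           # nonnegative function on Omega = {1, 2, ...}
    # f_k = f * 1_{ {1,...,k} } increases to f; its integral w.r.t. the
    # counting measure is the partial sum over n <= k.
    for k in (10, 100, 10000):
        print(k, sum(f(n) for n in range(1, k + 1)))

    print(math.pi ** 2 / 6)            # the integral of f, i.e. the limit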

PROBLEM 3.15: (a) Let (fn) ⊆ L1(F)+. Then fn ↓ f ⇒ lim_{n→∞} ∫ fn dµ = ∫ f dµ.
(b) Show by example that the integrability assumption cannot be omitted without compensation.

The first application looks harmless.

PROBLEM 3.16: (a) Let f be a measurable function such that f = 0 µ-a.e. Then f is integrable and ∫ f dµ = 0.
Hint: Consider f+ and f− separately.
(b) Let f and g be measurable functions such that f = g µ-a.e. Then f is integrable iff g is integrable.

3.3.2 The infinite series theorem

The second principle says that for nonnegative measurable functions integrals and infinite sums may be interchanged. It is an easy consequence of the monotone convergence theorem (see section 3.5).

3.8 THEOREM. For every sequence (fn) of nonnegative measurable functions we have

∫ ( ∑_{n=1}^∞ fn ) dµ = ∑_{n=1}^∞ ∫ fn dµ

PROBLEM 3.17: Let (Ω,F, µ) be a measure space and f ≥ 0 a measurable function. Show that τ : A 7→ ∫A f dµ is a measure.
PROBLEM 3.18: Let (amn) be a double sequence of nonnegative numbers. Show that
∑_m ∑_n amn = ∑_n ∑_m amn.
Hint: Define fn(x) := amn if x ∈ (m − 1, m].


PROBLEM 3.19: (a) Let Ω = N and F = 2N. Show that for every sequence an ≥ 0 there is a uniquely determined measure µ|F such that µ({n}) = an.
(b) Find ∫ f dµ for f ≥ 0.

3.3.3 The dominated convergence theorem

The most popular result concerning this issue is Lebesgue's theorem on dominated convergence. Find the proof in section 3.5.

3.9 DOMINATED CONVERGENCE THEOREM.
Let (fn) be a sequence of measurable functions which is dominated by an integrable function g, i.e. |fn| ≤ g, n ∈ N. If fn → f µ-a.e. then f ∈ L1(µ) and lim_{n→∞} ∫ fn dµ = ∫ f dµ.
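The role of the dominating function can be seen from the classical sequence fn = n · 1_(0,1/n] on (R, B, λ): fn → 0 pointwise, yet ∫ fn dλ = 1 for every n, and no integrable g dominates all fn. The following Python sketch only tabulates the exact values (no quadrature involved); the sample points are arbitrary.

    # f_n = n * 1_{(0, 1/n]}: pointwise limit 0, but all integrals equal 1.
    def f(n, x):
        return n if 0 < x <= 1.0 / n else 0.0

    for n in (1, 10, 1000, 10**7):
        print(n, f(n, 0.5), f(n, 1e-6))   # for each fixed x the values are eventually 0

    integrals = [n * (1.0 / n) for n in range(1, 6)]   # exact: n * lambda((0, 1/n])
    print(integrals)                                   # ... each integral equals 1.0
    # A dominating g would have to satisfy g >= sup_n f_n = floor(1/x) on (0, 1],
    # which is not Lebesgue integrable (it behaves like 1/x near 0).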

PROBLEM 3.20: Show that under the assumptions of the dominated convergence theorem we even have
lim_{n→∞} ∫ |fn − f| dµ = 0
(This type of convergence is called mean convergence.)
PROBLEM 3.21: Discuss the question whether a uniformly bounded sequence of measurable functions is dominated in the sense of the dominated convergence theorem.

There are plenty of applications of the dominated convergence theorem. Let us present those consequences which show the superiority of general measure theory compared with previous approaches to integration.

3.4 Stieltjes integration

3.4.1 The notion of the Stieltjes integral

If we are dealing with Borel measure spaces (R,B, λα) where the measure is defined by some increasing right-continuous function α, then we write

∫ f dλα := ∫ f dα = ∫ f(x) dα(x)

A special case is the Lebesgue integral ∫ f dλ = ∫ f(x) dx.

Let α1 and α2 be increasing right-continuous functions.


PROBLEM 3.22: Explain λ_{α1+α2} = λα1 + λα2.
PROBLEM 3.23: Explain ∫_a^b f d(α1 + α2) = ∫_a^b f dα1 + ∫_a^b f dα2 for bounded measurable functions f : [a, b] → R.

Moreover, integral limits are defined by

∫_a^b f dα := ∫_(a,b] f dα.

Note that the lower integral limit is not included, but the upper limit is included!

PROBLEM 3.24: What is the difference between ∫_a^b f dα and ∫_(a,b) f dα?

It seems to be a natural goal to extend the notion of the integral from monotone integrators α to integrators which are not increasing. However, this is not possible for arbitrary right-continuous functions. The family of functions for which an extension is easily available is the family of functions of bounded variation. The reason is that these are exactly the functions which can be written as a linear combination (actually as a difference) of increasing functions.

3.10 DEFINITION. Let g : [a, b] → R be a right-continuous function of bounded variation. Then the Stieltjes integral of a bounded measurable function f : [a, b] → R is

∫_a^b f dg := ∫_a^b f dg1 − ∫_a^b f dg2

where g = g1 − g2 is such that g1 and g2 are right-continuous increasing functions.

PROBLEM 3.25: Show that the definition of the Stieltjes integral is independent of the decomposition g = g1 − g2.

The Stieltjes integral has many properties which can be used for calculation purposes. Moreover, the Stieltjes integral is a special case of the general stochastic integral which is an indispensable tool in the theory of stochastic processes and their applications.

The starting point for all calculation rules is the approximation of Stieltjes integrals by Riemannian sums. This is an easy consequence of the general limit theorem for integrals.

Recall the notion of a Riemannian sequence of subdivisions of an interval [a, b].
PROBLEM 3.26: Let f : [a, b] → R be a regulated function and let g be right-continuous and of bounded variation. Show that for every Riemannian sequence of subdivisions of [a, b]

(a) lim_{n→∞} ∑_{i=1}^{kn} f(t_{i−1})(g(ti) − g(t_{i−1})) = ∫_a^b f− dg

(b) lim_{n→∞} ∑_{i=1}^{kn} f(ti)(g(ti) − g(t_{i−1})) = ∫_a^b f+ dg

(A numerical sketch of such sums follows below.)
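The following Python sketch forms both sums of Problem 3.26 for the made-up choice f = g = x + 1_[1/2,∞)(x) on [0, 1]; this function is right-continuous, of bounded variation, and has a single jump of height 1 at 1/2. The left-endpoint sums approach ∫ f− dg and the right-endpoint sums approach ∫ f+ dg, which differ precisely by the jump of f at the jump point of g.

    import numpy as np

    def h(x):
        """f = g = x + 1_{[1/2, oo)}(x): an example integrand/integrator."""
        return x + (x >= 0.5)

    def stieltjes_sums(n, a=0.0, b=1.0):
        t = np.linspace(a, b, n + 1)
        dg = np.diff(h(t))
        left  = np.sum(h(t[:-1]) * dg)   # uses f(t_{i-1}): tends to int f_- dg
        right = np.sum(h(t[1:])  * dg)   # uses f(t_i):     tends to int f_+ dg
        return left, right

    for n in (10, 100, 10000):
        print(n, stieltjes_sums(n))
    # Left sums approach 1.5 (= int_0^1 f dx + f_-(1/2)),
    # right sums approach 2.5 (= int_0^1 f dx + f(1/2)).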

3.4.2 Integral calculus

In the following we denote by the letter g : [a, b] → R a right-continuous function of bounded variation. The letter f : [a, b] → R denotes a bounded measurable function. We define

f • g : t 7→ ∫_a^t f dg

PROBLEM 3.27: Explain why f • g is of bounded variation.
PROBLEM 3.28: Show that f • g is right-continuous with left limits.
PROBLEM 3.29: Show that ∆(f • g)(t) = f(t)∆g(t).

If g has jumps then we have ∆(f • g)(t) = f(t)∆g(t). Hence, f • g is continuous whenever g is continuous.

Let h : [a, b] → R be bounded and measurable. As a consequence of the preceding problems the integral ∫_a^b f d(h • g) is well-defined. How can we express this integral in terms of an integral with respect to g?

3.11 ASSOCIATIVITY RULE.
Let f and h be bounded measurable functions. Then

∫_a^b f d(h • g) = ∫_a^b fh dg

PROOF: The assertion is obvious for f = 1_(a,t], 0 ≤ t ≤ b. The general case follows by measure theoretic induction. □

Since for rules of this kind the function f is only a dummy function it is convenient to state the rule in a more compact way as

d(h • g) = h dg

which is called differential notation. It should be kept in mind that such formulas always have to be interpreted as assertions about integrals.

PROBLEM 3.30: Let g be differentiable with continuous derivative g′. Show that dg = g′ dt.


The next result is of fundamental importance both for classical integral calculus and for stochastic calculus.

3.12 INTEGRATION BY PARTS.
Let both f and g be right-continuous and of bounded variation. Then

f(b)g(b) = f(a)g(a) + ∫_a^b f− dg + ∫_a^b g− df + ∑_{a<s≤b} ∆f(s)∆g(s)

At this point it is not clear how the last term of the formula is defined. This question will be settled in the proof of the theorem.

PROOF: The first step is to show that

f(b)g(b) = f(a)g(a) + ∫_a^b f− dg + ∫_a^b g df

This can be done by approximation of integrals via Riemannian sums ( PROBLEM 3.31 ). Then it remains to show that

∫_a^b g df = ∫_a^b g− df + ∑_{a<s≤b} ∆f(s)∆g(s)

which is equivalent to

∫_a^b ∆g(t) df(t) = ∑_{a<s≤b} ∆g(s)∆f(s)

Recall that ∆g(t) is a function which is zero except at countably many points (ti)i∈N (the positions of the jumps of g). Therefore it remains to be shown that

∫_a^b ∆g(t) df(t) = ∑_{i=1}^∞ ∆g(ti)∆f(ti)

We have to apply the infinite series theorem for integrals ( PROBLEM 3.32 ). □

PROBLEM 3.33: Let f be increasing and right-continuous. Show that for every Riemannian sequence of subdivisions of [a, b]

lim_{n→∞} ∑_{i=1}^{kn} (f(ti) − f(t_{i−1}))² = ∑_{a<s≤b} (∆f(s))²

(Remark: The limit is called the quadratic variation of f on [a, b]. The assertion shows that it is zero if f is continuous. A numerical sketch follows below.)
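A Python sketch of the assertion for two example functions: a smooth one, whose sums of squared increments tend to 0, and a step function, whose sums equal the sum of squared jump heights. The functions and partitions are arbitrary choices; the computation itself makes sense for any function of bounded variation.

    import numpy as np

    def quad_var(f, n, a=0.0, b=1.0):
        """Sum of squared increments of f over the uniform partition with n pieces."""
        t = np.linspace(a, b, n + 1)
        return np.sum(np.diff(f(t)) ** 2)

    smooth = lambda x: np.sin(3 * x)                      # continuous example
    step   = lambda x: (x >= 0.25) + 2.0 * (x >= 0.75)    # jumps of heights 1 and 2

    for n in (10, 100, 10000):
        print(n, quad_var(smooth, n), quad_var(step, n))
    # First column tends to 0, second column stays at 1^2 + 2^2 = 5.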


PROBLEM 3.34: State the integration by parts formula for continuous and for continuously differentiable functions. How can the integration by parts formula be written in differential form?

There is a third calculation rule for Stieltjes integrals. This rule is the classical prototype of the famous Ito formula of stochastic integration.

3.13 TRANSFORMATION FORMULA. Let φ : R → R be a continuous function with a continuous derivative. Let f : [a, b] → R be a function of bounded variation.
(1) If f is continuous then

φ ∘ f(b) = φ ∘ f(a) + ∫_a^b φ′∘f df

(2) If f is right-continuous then

φ ∘ f(b) = φ ∘ f(a) + ∫_a^b φ′∘f− df + ∑_{a<s≤b} ( φ∘f − φ∘f− − φ′∘f− · ∆f )(s)

The first part of the assertion can easily be understood. For this let us consider the case of a differentiable function f. Then we know that

φ ∘ f(b) = φ ∘ f(a) + ∫_a^b φ′∘f(s) f′(s) ds

which by df(s) = f′(s) ds gives part (1) of the theorem. Therefore, part (1) is the extension of the fundamental theorem of analysis to continuous functions f which are of bounded variation.

The second part of the assertion can be understood intuitively, too. It simply subtracts the contributions of the integral at the jumps of f and replaces them by the jump heights of φ ∘ f.

PROBLEM 3.35: Assume that f has finitely many jumps on [a, b]. Show that part (2) of Theorem 3.13 follows from part (1).
Hint: Apply part (1) to each interval where f is continuous.

Theorem 3.13 can be proved as follows. The following two problems imply that the theorem holds if φ(x) is any polynomial. Since a continuously differentiable function can be approximated by polynomials in such a way that both the function and its derivative are approximated uniformly on compact intervals, the assertion follows.

PROBLEM 3.36: Use the integration by parts formula to show that Theorem 3.13 is true for φ(x) = x².


PROBLEM 3.37: Assume that Theorem 3.13 holds for the function φ(x). Use the integration by parts formula to show that Theorem 3.13 is true also for ψ(x) = φ(x)x.

3.5 Proofs of the main theorems

3.14 THEOREM. Let f ∈ S(F)+ and (fn) ⊆ S(F)+. Then

fn ↑ f ⇒ lim_n ∫ fn dµ = ∫ f dµ

PROOF: Note that "≤" is clear. For an arbitrary ε > 0 let Bn := (f ≤ fn · (1 + ε)). It is clear that

∫ 1Bn f dµ ≤ ∫ 1Bn fn · (1 + ε) dµ ≤ ∫ fn dµ · (1 + ε)

From Bn ↑ Ω it follows that A ∩ Bn ↑ A and µ(A ∩ Bn) ↑ µ(A) by σ-additivity. Writing f = ∑_{j=1}^m αj 1_{Aj} in its canonical representation we get

∫ f dµ = ∑_{j=1}^m αj µ(Aj) = lim_n ∑_{j=1}^m αj µ(Aj ∩ Bn) = lim_n ∫ 1Bn f dµ

which implies

∫ f dµ ≤ lim_n ∫ fn dµ · (1 + ε)

Since ε is arbitrarily small the assertion follows. □

3.15 THEOREM. Let (fn) and (gn) be increasing sequences of nonnegative measurable simple functions. Then

lim_n fn = lim_n gn ⇒ lim_n ∫ fn dµ = lim_n ∫ gn dµ.

PROOF: It is sufficient to prove the assertion with "≤" replacing "=". Since lim_k fn ∩ gk = fn ∩ lim_k gk = fn we obtain by 3.14

∫ fn dµ = lim_k ∫ fn ∩ gk dµ ≤ lim_k ∫ gk dµ

Letting n → ∞ proves the assertion. □

3.16 THEOREM. (Theorem of Beppo Levi)
Let f ∈ L(F)+ and (fn) ⊆ L(F)+. Then

fn ↑ f ⇒ lim_n ∫ fn dµ = ∫ f dµ

PROOF: We have to show "≥".

For every n ∈ N let (fnk)k∈N be an increasing sequence in S(F)+ such that lim_k fnk = fn. Define

gk := f1k ∪ f2k ∪ . . . ∪ fkk

Then fnk ≤ gk ≤ fk ≤ f whenever n ≤ k. It follows that gk ↑ f and

∫ f dµ = lim_k ∫ gk dµ ≤ lim_k ∫ fk dµ. □

PROBLEM 3.38: Prove Fatou's lemma: For every sequence (fn) of nonnegative measurable functions

lim inf_n ∫ fn dµ ≥ ∫ lim inf_n fn dµ

Hint: Recall that lim inf_n xn = lim_k inf_{n≥k} xn. Consider gk := inf_{n≥k} fn and apply Levi's theorem to (gk).

3.17 THEOREM. (Dominated convergence theorem)
Let (fn) be a sequence of measurable functions which is dominated by an integrable function g, i.e. |fn| ≤ g, n ∈ N. Then

fn → f µ-a.e. ⇒ f ∈ L1(µ) and lim_n ∫ fn dµ = ∫ f dµ

Now it is easy to prove several important facts concerning the integral. We state these as problems.

PROOF: Integrability of f is obvious since f is dominated by g, too. Moreover, the sequences g − fn and g + fn consist of nonnegative measurable functions. Therefore we may apply Fatou's lemma:

∫ (g − f) dµ ≤ lim inf_n ∫ (g − fn) dµ = ∫ g dµ − lim sup_n ∫ fn dµ

and

∫ (g + f) dµ ≤ lim inf_n ∫ (g + fn) dµ = ∫ g dµ + lim inf_n ∫ fn dµ

This implies

∫ f dµ ≤ lim inf_n ∫ fn dµ ≤ lim sup_n ∫ fn dµ ≤ ∫ f dµ. □

Page 48: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

38 CHAPTER 3. INTEGRALS

Page 49: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

Chapter 4

More on integration

4.1 The image of a measure

Let (Ω,A, µ) be a measure space and let (Y,B) be a measurable space. Moreover,let f : Ω → Y be a function. We are going to consider the problem of mapping themeasure µ to the set Y be means of the function f .

4.1 DEFINITION. Let f : (Ω,A, µ) → (Y,B) be (A,B)-measurable. Then

µf (B) := µ(f ∈ B) = µ(f−1(B)), B ∈ B.

is the image of µ under f .
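For a discrete measure the image measure is obtained by transporting every point mass to its image point and adding up weights that land on the same point. A Python sketch (the measure and the maps are made-up example data):

    from collections import defaultdict

    # mu = 0.2*delta_{-2} + 0.3*delta_{-1} + 0.5*delta_{3}
    mu = {-2.0: 0.2, -1.0: 0.3, 3.0: 0.5}

    def image_measure(mu, f):
        """mu_f(B) = mu(f^{-1}(B)): move every atom a to f(a) and sum the
        weights of atoms with the same image."""
        muf = defaultdict(float)
        for a, w in mu.items():
            muf[f(a)] += w
        return dict(muf)

    print(image_measure(mu, lambda x: x * x))   # {4.0: 0.2, 1.0: 0.3, 9.0: 0.5}
    print(image_measure(mu, abs))               # {2.0: 0.2, 1.0: 0.3, 3.0: 0.5}
    print(image_measure(mu, lambda x: 0.0))     # {0.0: 1.0}: all mass collides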

PROBLEM 4.1: Show that µf is indeed a measure on B.
PROBLEM 4.2: Let (Ω,F, µ) be a measure space and let f = 1A where A ⊆ Ω. Find µf.
PROBLEM 4.3: Let (Ω,F, µ) be a measure space and let f : Ω → R be a simple function. Find µf.

An important point is how integrals behave under measure mappings.

4.2 TRANSFORMATION FORMULA. Let (Ω,F, µ) be a measure space and let g ∈ L(F). Then for every f ∈ L+(B)

∫ f ∘ g dµ = ∫ f dµg

The proof is PROBLEM 4.4 (use measure theoretic induction).


PROBLEM 4.5: Let (Ω,F, µ) be a measure space and let g ∈ L(F). Show that f ∘ g is µ-integrable iff f is µg-integrable. In case of integrability the transformation formula holds.

4.2 Measures with densities

Let (Ω,F , µ) be a measure space and let f ∈ L+(F).

PROBLEM 4.6: Show that ν : A 7→ ∫A f dµ, A ∈ F, is a measure.

4.3 DEFINITION. Let µ be a measure and define a measure ν = fµ by

ν : A 7→ ∫A f dµ, A ∈ F.

Then f =: dν/dµ is called a density or a Radon-Nikodym derivative of ν with respect to µ.

We would like to say that f is the density of ν with respect to µ but for doing so we have to be sure that f is uniquely determined by ν. But this is not true, in general.

PROBLEM 4.7: Show that the density is uniquely determined if the measure ν is finite.

4.4 EXAMPLE. Let µ|B be a measure such that all countable sets B ∈ B have measure zero and all uncountable sets have measure µ(B) = ∞. A moment's reflection shows that this is actually a measure. Now for every positive constant function f ≡ c > 0 we have

∫B f dµ = µ(B), B ∈ B.

In the light of the preceding example we see that we have to exclude unreasonable measures in order to obtain uniqueness of densities. The following definition shows the direction we have to go.

4.5 DEFINITION. A measure µ|F is called σ-finite if there is a sequence of sets (Fn)n∈N ⊆ F such that µ(Fn) < ∞ for all n ∈ N and ⋃_{n=1}^∞ Fn = Ω.

Note that Borel measures λα are σ-finite. It turns out that for σ-finite measures µ densities are uniquely determined.

4.6 THEOREM. If µ is σ-finite then every density w.r.t. µ is uniquely determined µ-almost everywhere.


PROOF: Let f, g ∈ L+(F). We will show that

∫A f dµ = ∫A g dµ for all A ∈ F ⇒ µ((f ≠ g) ∩ A) = 0 for all A with µ(A) < ∞.

In other words: f = g µ-a.e. on every set of finite µ-measure. This implies the assertion since Ω can be exhausted by sets of finite µ-measure.

Let µ(M) < ∞ and define Mn := M ∩ (f ≤ n) ∩ (g ≤ n). Since f1Mn and g1Mn are µ-integrable it follows that f1Mn = g1Mn µ-a.e. For n → ∞ we have Mn ↑ M which implies f1M = g1M µ-a.e. □

A density w.r.t. the Lebesgue measure is called a Lebesgue density.

PROBLEM 4.8: Let α : R → R be an increasing function which is supposed to be differentiable on R. Find the Lebesgue density of λα.

Which measures ν|F have densities w.r.t. µ|F?

PROBLEM 4.9: Let ν = fµ. Show that µ(A) = 0 implies ν(A) = 0, A ∈ F .

4.7 DEFINITION. Let µ|F and ν|F be measures. The measure ν is said to be absolutely continuous w.r.t. the measure µ|F (ν ≪ µ) if

µ(A) = 0 ⇒ ν(A) = 0, A ∈ F.

We saw that absolute continuity is necessary for having a density. It is even sufficient.

4.8 THEOREM. (Radon-Nikodym theorem)
Assume that µ is σ-finite. Then ν ≪ µ iff ν = fµ for some f ∈ L+(F).

PROOF: See Bauer, [2]. □

Let µ and ν be measures on a finite field F.
PROBLEM 4.10: State ν ≪ µ in terms of the generating partition of F.
PROBLEM 4.11: If ν ≪ µ find dν/dµ.

An important question is how µ-integrals can be transformed into ν-integrals.

PROBLEM 4.12: Let ν = fµ. Discuss the validity of

∫ f dν = ∫ f (dν/dµ) dµ

Hint: Prove it for f ∈ S+(F) and extend it by measure theoretic induction.


4.3 Product measures and Fubini’s theorem

Let (Ω1,F) and (Ω2,G) be measurable spaces. We are going to discuss measure and integration on Ω1 × Ω2.

To begin with we have to define a σ-field on Ω1 × Ω2. This σ-field should be large enough to contain at least the rectangles (diagram) F × G where F ∈ F and G ∈ G.

4.9 DEFINITION. The σ-field on Ω1 × Ω2 which is generated by the family of measurable rectangles

R = {F × G : F ∈ F, G ∈ G}

is called the product of F and G and is denoted by F ⊗ G.

A special case of a product σ-field is the Borel σ-field B2.

Having established a σ-field we turn to measurable functions. Recall that any continuous function f : R2 → R is B2-measurable.

PROBLEM 4.13: Let f : Ω1 → R be F-measurable. Show that (x, y) 7→ f(x) is F ⊗ G-measurable.
PROBLEM 4.14: Let f : Ω1 → R be F-measurable, g : Ω2 → R be G-measurable, and let φ : R2 → R be continuous. Show that (x, y) 7→ φ(f(x), g(y)) is F ⊗ G-measurable.

The preceding problems show that functions of several variables which are set up as compositions of measurable functions of one variable are usually measurable with respect to the product σ-field (confer corollaries 2.13 and 2.14).

The next point is to talk about measures. There is a special class of measures on product spaces which are constructed from measures on the components in a simple way.

The starting idea is the geometric content of rectangles in R2. If I1 and I2 are intervals then the geometric content (area) of the rectangle I1 × I2 is the product of the contents (lengths) of the constituting intervals. The extension of this idea to general measures leads to product measures.

4.10 THEOREM. Let (Ω1,F, µ) and (Ω2,G, ν) be measure spaces. Then there exists a uniquely determined measure µ ⊗ ν|F ⊗ G satisfying

(µ⊗ ν)(F ×G) = µ(F )ν(G), F ×G ∈ R.

The measure µ⊗ ν is called the product measure of µ and ν.

PROOF: See Bauer, [2]. □


As a consequence it follows that there is a uniquely determined measure on (R2,B2) which measures rectangles by their geometric area. In terms of product measures this is λ ⊗ λ = λ2, which is called the Lebesgue measure on R2.

Let us turn to integration. Integration for general measures on product spaces can be a rather delicate matter. Things are much simpler when we are dealing with product measures. The main point is that for product measures the integration on product spaces can be reduced to iterated integration (i.e. evaluating integrals over single components).

Let us proceed step by step.

The most simple case is the integration of the indicator of a rectangle. Let F × G ∈ R. Then we have

∫ 1F×G d(µ ⊗ ν) = (µ ⊗ ν)(F × G) = µ(F)ν(G) = ∫ 1F dµ · ∫ 1G dν

In general, a set A ∈ F ⊗ G need not be a rectangle. How can we extend the formula above to general sets? The answer is the section theorem (Cavalieri's principle).

For any set A ⊆ Ω1 × Ω2 we call

Ay := {x ∈ Ω1 : (x, y) ∈ A}, y ∈ Ω2,

the y-section of A (diagram!). Similarly the x-section Ax, x ∈ Ω1, is defined. Note that for rectangles the sections are particularly simple.

PROBLEM 4.15: Find the sections of a rectangle.

The section theorem says that the volume of a set is the integral of the volumes of its sections.

4.11 SECTION THEOREM. Let A ∈ F ⊗ G. Then all sections of A are measurable, i.e. Ay ∈ F, y ∈ Ω2, and y 7→ µ(Ay) is a G-measurable function. Moreover, we have

(µ ⊗ ν)(A) = ∫ µ(Ay) ν(dy)

PROOF: The measurability parts of the section theorem are a matter of measure theoretic routine arguments. Much more interesting is the integral formula.

In order to understand the integral formula we write it as an iterated integral:

(µ ⊗ ν)(A) = ∫ ( ∫ 1A(x, y) µ(dx) ) ν(dy)


It is easy to see that the inner integral evaluates to µ(Ay) ( PROBLEM 4.16 ). Why is this formula valid? First of all, it is valid for rectangles A = F × G ∈ R. This follows immediately from the definition of the product measure. Moreover, both sides of the equation define measures on the σ-field F ⊗ G. Since these two measures are equal on rectangles they necessarily are equal on the generated σ-field. □

Let us illustrate how the section theorem works.

PROBLEM 4.17: Find the area of the unit circle by means of the section theorem.
Outline: Let A be the unit circle with center at the origin. Then we have

λ2(A) = ∫ λ(Ay) dy = 2 ∫_{−1}^{1} √(1 − y²) dy

Substitute y = sin t and apply (sin t cos t)′ = 2 cos² t − 1. (A numerical sketch follows below.)
PROBLEM 4.18: Find the area of a right-angled triangle by means of the section theorem.
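As a numerical counterpart of the outline, the section lengths λ(Ay) = 2√(1 − y²) can be summed over a grid in y; the result approaches π, the exact value obtained from the substitution. The Python sketch below is illustrative only; the grid sizes are arbitrary.

    import math

    def section_length(y):
        """lambda(A_y) for the unit circle A: an interval of length
        2*sqrt(1 - y^2) for |y| <= 1, empty otherwise."""
        return 2.0 * math.sqrt(max(0.0, 1.0 - y * y))

    def area_by_sections(n):
        """Riemann (midpoint) sum for the integral of lambda(A_y) over [-1, 1]."""
        h = 2.0 / n
        return sum(section_length(-1.0 + (i + 0.5) * h) for i in range(n)) * h

    for n in (100, 10000):
        print(n, area_by_sections(n))
    print(math.pi)     # the exact value lambda_2(A)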

Our last topic in this section is to extend the section theorem to integrals. The resulting general assertion is Fubini's theorem.

4.12 FUBINI'S THEOREM. Let f : Ω1 × Ω2 → R be a nonnegative F ⊗ G-measurable function. Then

x 7→ f(x, y) and y 7→ ∫ f(x, y) µ(dx)

are measurable functions and

∫ f d(µ ⊗ ν) = ∫ ( ∫ f(x, y) µ(dx) ) ν(dy)

PROOF: Fubini's theorem follows from the section theorem in a straightforward way by measure theoretic induction. □

PROBLEM 4.19: Find a version of Fubini's theorem for integrable functions.
PROBLEM 4.20: Explain when it is possible to interchange the order of integration for an iterated integral.
PROBLEM 4.21: Deduce from Fubini's theorem assertions for interchanging the order of summation for double series of numbers.


4.4 Spaces of integrable functions

4.4.1 Integrable functions

We know that the space L1 = L1(Ω,F, µ) is a vector space. We would like to define a norm on L1.

A natural idea is to define

||f||1 := ∫ |f| dµ, f ∈ L1.

It is easy to see that this definition has the following properties:
(1) ||f||1 ≥ 0, and f = 0 ⇒ ||f||1 = 0,
(2) ||f + g||1 ≤ ||f||1 + ||g||1, f, g ∈ L1,
(3) ||λf||1 = |λ| ||f||1, λ ∈ R, f ∈ L1.
However, in the converse direction of (1) we only have

||f||1 = 0 ⇒ f = 0 µ-a.e.

A function with zero norm need not be identically zero! Therefore, ||·||1 is not a norm on L1 but only a pseudo-norm.

In order to get a normed space one has to change the space L1 in such a way that all functions with f = g µ-a.e. are considered as equal. Then the class of functions with f = 0 µ-a.e. can be considered as the null element of the vector space. The space of integrable functions modified in this way is denoted by L1 = L1(Ω,F, µ).

4.13 DISCUSSION. For those readers who like to have hard facts instead of soft wellness we provide some details.

For any f ∈ L(F) let

f̄ := {g ∈ L(F) : f = g µ-a.e.}

denote the equivalence class of f. Then integrability is a class property and the space

L1 := {f̄ : f µ-integrable}

is a vector space. The value of the integral depends only on the class and therefore it defines a linear functional on L1 having the usual properties. In particular, ||f̄||1 := ||f||1 defines a norm on L1.

It is common practice to work with the space L1 of equivalence classes but to write f instead of f̄. This is a typical example of what mathematicians call abuse of language.

4.14 THEOREM. The space L1(Ω,F , µ) is a Banach space.


PROOF: Let (fn) be a Cauchy sequence in L1, i.e.

∀ε > 0 ∃N(ε) such that ∫ |fn − fm| dµ < ε whenever n, m ≥ N(ε).

Let ni := N(1/2^i). Then

∫ |f_{n_{i+1}} − f_{n_i}| dµ < 1/2^i

It follows that for all k ∈ N

∫ ( |f_{n_1}| + |f_{n_2} − f_{n_1}| + · · · + |f_{n_{k+1}} − f_{n_k}| ) dµ ≤ C < ∞

Hence the corresponding infinite series converges, which implies that

|f_{n_1}| + ∑_{i=1}^∞ |f_{n_{i+1}} − f_{n_i}| < ∞  µ-a.e.

Since absolute convergence of series in R implies convergence (here completeness of R goes in) the partial sums

f_{n_1} + (f_{n_2} − f_{n_1}) + · · · + (f_{n_k} − f_{n_{k−1}}) = f_{n_k}

converge to some limit f µ-a.e. Mean convergence of (fn) follows from Fatou's lemma by

∫ |fn − f| dµ = ∫ lim_{k→∞} |fn − f_{n_k}| dµ ≤ lim inf_{k→∞} ∫ |fn − f_{n_k}| dµ < ε whenever n ≥ N(ε).

In other words we have ||fn − f||1 → 0. □

Our next result establishes a dense subset of L1. First of all, it says that simple functions are dense in L1. But even more is true. We can even restrict the class of sets where the canonical decomposition of the simple functions comes from. It is sufficient to consider simple functions made up of indicators of sets in a system R which generates F and is a field. This means that it is closed under unions, intersections and complementations. E.g. if we are dealing with (R,B) we may consider step functions, i.e. simple functions based on intervals. The reason is that finite unions of intervals form a field which generates B.

4.15 THEOREM. Let R be a field which generates F. Then the set of R-measurable simple functions is dense in L1(Ω,F, µ).

PROOF: The assertion is proved in two parts. Let ε > 0. First we note that for every f ∈ L1(Ω,F, µ) there exists an F-measurable simple function g such that ||f − g||1 < ε. This can easily be shown for the positive and the negative parts separately. Second we have to show that for every F-measurable simple function g there exists an R-measurable simple function h such that ||g − h||1 < ε. This follows from the measure extension theorem. We do not go into details but refer to Bauer, [2]. □


4.4.2 Square integrable functions

Let

L2 = L2(Ω,F, µ) := {f ∈ L(F) : ∫ f² dµ < ∞}

This is another important space of integrable functions.

PROBLEM 4.22: Show that L2 is a vector space.
PROBLEM 4.23: Show that ∫ f² dµ < ∞ is a property of the µ-equivalence class of f ∈ L(F).

By L2 = L2(Ω,F, µ) we again denote the corresponding space of equivalence classes. On this space there is an inner product

<f, g> := ∫ fg dµ, f, g ∈ L2.

The corresponding norm is

||f||2 = <f, f>^{1/2} = ( ∫ f² dµ )^{1/2}

The following facts can be proved in a way similar to the L1-case.

4.16 THEOREM. The space L2(Ω,F, µ) is a Hilbert space.

4.17 THEOREM. Let R be a field which generates F. Then the set of R-measurable simple functions is dense in L2(Ω,F, µ).

4.5 Fourier transforms

In order to represent and treat measures in a mathematically convenient way measure transforms play a predominant role. The most simple measure transform is the moment generating function.

4.18 DEFINITION. Let µ|B be a finite measure. Then the function

m(t) = ∫ e^{tx} µ(dx), t ∈ R,

is called the Laplace transform or moment generating function of µ.

The moment generating function shares important useful properties with other measure transforms but it has a serious drawback. The exponential function x 7→ e^{tx} is unbounded and therefore may not be integrable for some values of t and measures µ. The application of moment generating functions is only possible in such cases where the exponential function is integrable at least for all values of t in an interval of positive length.

This kind of complication vanishes if we replace the real-valued exponential function x 7→ e^{tx} by its complex version x 7→ e^{itx}. The corresponding measure transform is called the Fourier transform.

4.19 DISCUSSION. Let us recall some basic facts on complex numbers.

The complex number field

C = {z = u + iv : u, v ∈ R}

is an extension of the real numbers R in such a way that a number i (the "imaginary" number) is introduced which satisfies i² = i · i = −1. All other rules of calculation carry over from R to C.

Complex numbers are not ordered but have an absolute value, defined by |z| = √(u² + v²) if z = u + iv. For every complex number z ∈ C there is a conjugate number z̄ := u − iv. Conjugation is multiplicative, i.e. the conjugate of z1z2 equals z̄1 z̄2. Moreover, we have z z̄ = |z|².

Several functions defined on R can be extended to C. For our purposes only the exponential function is of importance. It is defined by

e^{u+iv} := e^u (cos(v) + i sin(v)), u, v ∈ R.

This definition satisfies e^{z1+z2} = e^{z1} e^{z2}, z1, z2 ∈ C. For the notion of the Fourier transform it is important to note that |e^{iv}| = 1, v ∈ R. This is a consequence of familiar properties of trigonometric functions.

Differentiation and integration of complex-valued functions of a real variable are defined by treating the real and the imaginary parts separately. Be sure to note that we are not dealing with functions of a complex variable! That would be a much more advanced topic called complex analysis.

PROBLEM 4.24: Find the derivative of x 7→ e^{ax}, x ∈ R, where a ∈ C.
PROBLEM 4.25: Show that the basic derivation rules (summation rule, product rule and chain rule) are valid for complex-valued functions.
PROBLEM 4.26: Let f be a complex-valued measurable function (both the real and the imaginary part are measurable). Show that |f| is µ-integrable iff both the real and the imaginary part of f are µ-integrable.
PROBLEM 4.27: Show that the µ-integral of complex-valued functions on R is a linear functional.
PROBLEM 4.28: Let f be a complex-valued µ-integrable function. Show that |∫ f dµ| ≤ ∫ |f| dµ.


The next problems show that the usual integration calculus (substitution, integration by parts) carries over from real-valued functions to complex-valued functions.

PROBLEM 4.29: Show that indefinite integrals of complex-valued functions on R are primitives of their integrands.
PROBLEM 4.30: Find ∫_c^d e^{ax} dx, where c, d ∈ R, a ∈ C.

With these preparations we are in a position to proceed with Fourier transforms.

4.20 DEFINITION. Let µ|B be a finite measure. Then the function

µ̂(t) = ∫ e^{itx} µ(dx), t ∈ R,

is called the Fourier transform of µ.

Note that the Fourier transform is well-defined and finite for every t ∈ R.
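For a linear combination of point measures the Fourier transform can be written down directly: if µ = ∑_j wj δ_{aj} then µ̂(t) = ∑_j wj e^{i t aj}. The following Python sketch (weights and points are made-up example data) also checks the bound |µ̂(t)| ≤ µ(R).

    import cmath

    points  = [-1.0, 0.0, 2.0]
    weights = [0.5, 0.25, 0.25]        # a probability measure, total mass 1

    def mu_hat(t):
        """Fourier transform of mu = sum_j w_j * delta_{a_j}."""
        return sum(w * cmath.exp(1j * t * a) for a, w in zip(points, weights))

    print(mu_hat(0.0))                  # the total mass mu(R) = 1
    for t in (0.5, 1.0, 3.0):
        z = mu_hat(t)
        print(t, z, abs(z) <= sum(weights) + 1e-12)   # |mu_hat(t)| <= mu(R)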

PROBLEM 4.31: Find the Fourier transform of a point measure.
PROBLEM 4.32: Find the Fourier transform of an exponential distribution.
PROBLEM 4.33: Find the Fourier transform of a Poisson distribution.
Hint: The series expansion of the exponential function carries over to the complex-valued case.
PROBLEM 4.34: Find the Fourier transform of a Gaussian distribution.
Hint: Derive a differential equation for the Fourier transform.

The Fourier transform can be used to find the moments of a measure.

4.21 THEOREM. Let µ|B be a finite measure. If ∫ |x|^k µ(dx) < ∞ then µ̂ is k-times differentiable and

d^k/dt^k µ̂(t) |_{t=0} = i^k ∫ x^k µ(dx)

The proof is PROBLEM 4.35 .

The fundamental fact on Fourier transforms is the uniqueness theorem.

4.22 THEOREM. Let µ1|B and µ2|B be finite measures. Then

µ̂1 = µ̂2 ⇔ µ1 = µ2.

For the proof we refer to Bauer, [2].


The notion of the Fourier transform can be extended to measures on (Rn,Bn).

4.23 DEFINITION. Let µ|Bn be a finite measure on Rn. Then the function

µ̂(t) = ∫ e^{i t·x} µ(dx), t ∈ Rn,

is called the Fourier transform of µ.

The uniqueness theorem is true also for the n-dimensional case.


Chapter 5

Probability

5.1 Basic concepts of probability theory

5.1.1 Probability spaces

In probability theory a model of a random experiment consists of a triple (Ω,F, P) where Ω is a non-empty set, F is a σ-field on Ω and P|F is a probability measure, i.e. a measure satisfying P(Ω) = 1.

The set Ω serves as sample space. It is interpreted as the set of possible outcomes of the experiment. Note that it is not necessarily the case that single outcomes are actually observable.

The σ-field F is interpreted as the field of observable events. Observability of a set A ⊆ Ω means that after having performed the random experiment it can be decided whether A has been realized or not. In this sense the σ-field contains the information which is obtained after having performed the random experiment. Therefore F is also called the information set of the random experiment.

5.1.2 Random variables

Let (Ω,F) be a model of a random experiment. The idea of a random variable is that of a function X : Ω → R such that assertions about X are observable events, i.e. are contained in F. In other words: A random variable is simply an F-measurable function.

5.1 DEFINITION. A random variable is a function X : Ω → R such that (X ∈ B) ∈ F for every Borel set B ∈ B.


PROBLEM 5.1: Show that every function satisfying (X ≤ x) ∈ F for every x ∈ R is a random variable.

Let us turn to the question of the information set of a general random variable. Conceptually, the information set σ(X) is the σ-field that is generated by all events which can be observed through X.

5.2 DEFINITION. The information set σ(X) is the system of sets (X ∈ B) where B is an arbitrary Borel set.

A simple random variable X is a simple function whose basic partition is observable, i.e. (X = a) ∈ F for every value a of X. The information set σ(X) is the σ-field which is generated by the basic partition of X.

5.3 EXAMPLE. Consider the random experiment of throwing a coin n times. Denote the sides of the coin by 0 and 1. Then the sample space is Ω = {0, 1}^n. Assume that the outcomes of each throw are observable. If Xi denotes the outcome of the i-th throw then this means that (Xi = 0) and (Xi = 1) are observable.

PROBLEM 5.2: Let Ω = {0, 1}³ and define Sk := ∑_{i=1}^k Xi.
(1) Find σ(X1), σ(X2), σ(X3).
(2) Find σ(S1), σ(S2), σ(S3).
PROBLEM 5.3: Let Ω be the sample space of throwing a die twice. Denote the outcomes of the throws by X and Y, respectively. Find σ(X), σ(Y), σ(X + Y), σ(X − Y).

5.1.3 Distributions of random variables

Let X be a random variable.

5.4 DEFINITION. The set function

PX : B 7→ P (X ∈ B); B ∈ B,

is called the distribution of X (under P ).

The notion of the distribution of a random variable is a special case of the image of a measure (see 4.1). Since PX is a measure on (R,B) it can be represented by its measure-defining function α. For probability measures it is, however, simpler to use the distribution function

F(x) = P(X ≤ x) = PX((−∞, x]) = PX((−∞, 0]) + α(x)

which differs from α only by an additive constant. Thus we have

P(a < X ≤ b) = PX((a, b]) = F(b) − F(a) = α(b) − α(a).


5.5 PROPOSITION. Let X be a random variable with distribution function F. Then PX = λF.

The proof is PROBLEM 5.4 .

For examples illustrating the relation between random variables and their distribution function we refer to [].

5.1.4 Expectation

If (Ω,F, P) is a probability space and X is a nonnegative or integrable random variable then

E(X) := ∫ X dP

is called the expectation of X. Thus, expectations are integrals of random variables w.r.t. the underlying probability measures.

PROBLEM 5.5: Let (Ω,F, P) be a probability space and X a random variable with distribution function F. Explain the formula
E(f ∘ X) = ∫ f dλF
PROBLEM 5.6: Let (Ω,F, P) be a probability space and X a random variable with differentiable distribution function F. Explain the formulas
P(X ∈ B) = ∫B F′(t) dt and E(g ∘ X) = ∫ g(t) F′(t) dt
PROBLEM 5.7: Let X be a P-integrable random variable. Prove Cebysev's inequality.
PROBLEM 5.8: Let Z be a random variable with values in N0. Show that E(Z) = ∑_{k=1}^∞ P(Z ≥ k).
PROBLEM 5.9: Let X ≥ 0. Show that E(X) = ∫_0^∞ P(X ≥ t) dt.
Hint: Note that X(ω) = ∫_0^{X(ω)} dt and apply Fubini's theorem.
PROBLEM 5.10: Let X ≥ 0. Show that E(X²) = 2 ∫_0^∞ t P(X ≥ t) dt.
Hint: Note that X(ω)² = 2 ∫_0^{X(ω)} t dt and apply Fubini's theorem.

For examples and further notions related to the concept of expectation (variance, covariance, . . . ) we refer to [].


5.2 Independence

The notion of independence marks the point where probability theory goes beyond measure theory.

Recall that two events A, B ∈ F are independent if the product formula P(A ∩ B) = P(A)P(B) is true. This is easily extended to families of events.

5.6 DEFINITION. Let C and D be subfamilies of F. The families C and D are said to be independent (with respect to P) if P(A ∩ B) = P(A)P(B) for every choice A ∈ C and B ∈ D.

It is natural to call random variables X and Y independent if the corresponding information sets are independent.

5.7 DEFINITION. Two random variables X and Y are independent if σ(X) and σ(Y) are independent.

The preceding definition can be stated as follows: Two random variables X and Y are independent if

P(X ∈ B1, Y ∈ B2) = P(X ∈ B1)P(Y ∈ B2), B1, B2 ∈ B.

This is equivalent to saying that the joint distribution PX,Y of X and Y is the product of PX and PY.

How can we check independence of random variables? Is it sufficient to check the independence of generators of the information sets? This is not true in general, but with a minor modification it is.

5.8 THEOREM. Let X and Y be random variables and let C and D be generators of the corresponding information sets. If C and D are independent and closed under intersection then X and Y are independent.

PROBLEM 5.11: Let F(x, y) be the joint distribution function of (X, Y). Show that X and Y are independent iff F(x, y) = h(x)k(y) for some functions h and k.


For independent random variables there is a product formula for expectations.

5.9 THEOREM. (1) Let X ≥ 0 and Y ≥ 0 be independent random variables. Then

E(XY ) = E(X)E(Y )

(2) Let X ∈ L1 and Y ∈ L1 be independent random variables. Then XY ∈ L1 and

E(XY ) = E(X)E(Y )

PROOF: Apply measure theoretic induction to obtain (1). Part (2) follows from (1).2

PROBLEM 5.12: Let X and Y be random variables on a common probabilityspace. Show that X and Y are independent iff

E(ei(sX+tY )) = E(eisX)E(eitY ), s, t ∈ R.

Recall that square integrable random variables X and Y are called uncorrelated ifE(XY ) = E(X)E(Y ). This is a weaker notion than independence.

PROBLEM 5.13: Show that uncorrelated random variables need not be independent.
PROBLEM 5.14: Find the variance of the sample mean of independent random variables.
PROBLEM 5.15: Show that X and Y are independent iff f(X) and g(Y) are uncorrelated for all bounded measurable functions f and g.
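A quick numerical illustration of Problem 5.13 (a sketch with an illustrative choice of distributions): X standard normal and Y = X^2 are uncorrelated since E(XY) = E(X^3) = 0 = E(X)E(Y), but they are obviously not independent.

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=200_000)
Y = X**2

# uncorrelated: the empirical covariance is close to 0
print(np.mean(X * Y) - np.mean(X) * np.mean(Y))

# not independent: conditioning on |X| <= 1 changes the distribution of Y completely
print(np.mean(Y <= 1), np.mean(Y[np.abs(X) <= 1] <= 1))   # ~0.68 versus 1.0
```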

The notion of independence (as well as the notion of uncorrelated random variables) can be extended to more than two random variables. We will state the corresponding facts when we need them.


5.3 Convergence and limit theorems

5.3.1 Convergence in probability

In probability theory other kinds of convergence play a more prominent role than those we have been accustomed to so far.

5.10 DEFINITION. Let (Ω, F, P) be a probability space and let (Xn) be a sequence of random variables. The sequence (Xn) is said to converge to a random variable X P-almost surely if

lim_{n→∞} Xn(ω) = X(ω) for P-almost all ω ∈ Ω

This kind of convergence is also considered in measure theory and we know that under certain additional conditions P-almost sure convergence implies convergence of the expectations of the random variables.

However, the probabilistic meaning of almost sure convergence is limited. The reason is that the idea of approximating a random variable X by another random variable Y in a probabilistic sense does not require that the random variables are near to each other for particular ω ∈ Ω. It is sufficient that the probability of being near to each other is large.

5.11 DEFINITION. Let (Ω, F, P) be a probability space and let (Xn) be a sequence of random variables. The sequence (Xn) is said to converge to a random variable X in P-probability (Xn →P X) if

lim_{n→∞} P(|Xn − X| > ε) = 0 for every ε > 0

PROBLEM 5.16: The limit of a sequence of random variables which is convergent in P-probability is uniquely determined P-almost everywhere.
PROBLEM 5.17: Apply Cebysev's inequality to prove the weak law of large numbers (WLLN): For a sequence (Xn) of independent and identically distributed square integrable random variables the corresponding sequence of sample means converges to the expectation in probability.

Convergence in probability is actually a weaker concept than almost sure convergence.

PROBLEM 5.18: Show that almost sure convergence implies convergence in prob-ability.


PROBLEM 5.19: Every sequence which converges in probability contains a subsequence which converges almost surely to the same limit.
PROBLEM 5.20: Show by an example that convergence in probability does not imply almost sure convergence.
PROBLEM 5.21: Show that convergence in the mean and convergence in the quadratic mean imply convergence in probability.

The power of convergence in probability is also due to the fact that the dominated con-vergence theorem remains valid if almost sure convergence is replaced by convergencein probability.

Many assertions which are obviously valid for almost sure convergence are also validfor convergence in probability. Let us state two of the most important assertions.

5.12 THEOREM. (1) Convergence in probability is inherited under algebraic operations.
(2) Convergence in probability is inherited under compositions with continuous functions.

We do not go into details of the proof of the preceding assertions but refer to theliterature.

5.3.2 Convergence in distribution

There is another concept of convergence which is important for probability theory. This is the notion of convergence in distribution or weak convergence. Let us comment on this by some motivational remarks.

In many applications one is interested in the approximation of distributions of random variables rather than of the random variables themselves. E.g. if we consider so-called asymptotic normality of random variables (Xn) we think of approximating probabilities P(Xn ∈ I) by Q(I) where Q is some normal distribution.

5.13 DEFINITION. A sequence of random variables (Xn) converges in distribution to a random variable X if

lim_{n→∞} E(f(Xn)) = E(f(X))

for every bounded continuous function f : R → R.

PROBLEM 5.22: Show that convergence in probability implies convergence indistribution.


The most useful criterion for convergence in distribution is the following.

5.14 THEOREM. Assume that the random variable X has a continuous distribution function. A sequence of random variables (Xn) converges in distribution to a random variable X iff

lim_{n→∞} P(Xn ≤ x) = P(X ≤ x)

for every x ∈ R.

The most famous special case of asymptotic normality is the central limit theorem (CLT) which in its simplest form runs as follows.

5.15 CENTRAL LIMIT THEOREM. Let (Xn) be a sequence of independent identically distributed square integrable random variables. Let (Zn) be the corresponding sequence of standardized sample means. Then

lim_{n→∞} P(Zn ∈ I) = Q(I)

for every interval I ⊆ R, where Q denotes the standard normal distribution.

The proof of the CLT is carried out by Fourier transforms since pointwise convergence of Fourier transforms is equivalent to weak convergence.
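The following small simulation (an illustrative sketch, not part of the notes) shows the CLT at work: standardized sample means of i.i.d. uniform variables hit an interval with approximately the standard normal probability Φ(1) − Φ(−1) ≈ 0.683.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 400, 50_000
X = rng.uniform(size=(reps, n))                      # i.i.d. uniform(0,1): mu = 1/2, sigma^2 = 1/12
Z = (X.mean(axis=1) - 0.5) / np.sqrt((1 / 12) / n)   # standardized sample means

print(np.mean((-1 < Z) & (Z <= 1)))                  # close to 0.6827 = Phi(1) - Phi(-1)
```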

5.4 The causality theorem

When we are dealing with random variables which are not independent then we would like to express the kind of dependence in an appropriate way. In this section we consider the strongest kind of dependence.

Let X and Y be random variables such that Y = f ∘ X where f is some Borel-measurable function. Since (Y ∈ B) = (X ∈ f^{-1}(B)) it follows that σ(Y) ⊆ σ(X). In other words: If Y is a function of X (causally dependent on X) then the information set of Y is contained in the information set of X. This is intuitively very plausible: Any assertion about Y can be stated as an assertion about X.

It is a remarkable fact that even the converse is true.

5.16 CAUSALITY THEOREM. Let X and Y be random variables such that σ(Y) ⊆ σ(X). Then there exists a measurable function f such that Y = f ∘ X.

PROOF: By measure theoretic induction it is sufficient to prove the assertion forY = 1A, A ∈ F .

Recall that σ(Y) = {∅, Ω, A, A^c}. From σ(Y) ⊆ σ(X) it follows that A ∈ σ(X), i.e. A = (X ∈ B) for some B ∈ B. This means 1A = 1B ∘ X. 2


PROBLEM 5.23: State and prove a causality theorem for σ(X1, X2, . . . , Xk)-measurable random variables.
Hint: Let C be the generating system of σ(X1, X2, . . . , Xk) and let D be the family of sets A such that 1A is a function of (X1, X2, . . . , Xk). Show that D is a σ-field and that C ⊆ D. This implies that any indicator of a set in σ(X1, X2, . . . , Xk) is a function of (X1, X2, . . . , Xk). Extend this result by measure theoretic induction.



Chapter 6

Random walks

6.1 The ruin problem

6.1.1 One player

Let us start with a very simple gambling system.

A gambler bets a stake of one unit at subsequent games. The games are independentand p denotes the probability of winning. In case of winning the gambler’s return isthe double stake, otherwise the stake is lost.

A stochastic model of such a gambling system consists of a probability space (Ω, F, P) and a sequence of random variables (Xi)i≥1. The random variables are independent with values +1 and −1 representing the gambler's gain or loss at time i ≥ 1. Thus, we have P(Xi = 1) = p. The sequence of partial sums, i.e. the accumulated gains,

Sn = X1 + X2 + · · ·+ Xn

is called a random walk on Z starting at zero. If p = 1/2 then it is a symmetricrandom walk.

Assume that the gambler starts at i = 0 with capital V0 = a. Then her wealth after n games is

Vn = a + X1 + X2 + · · ·+ Xn = a + Sn

The sequence (Vn)n≥0 of partial sums is a random walk starting at a.

We assume that the gambler plans to continue gambling until her wealth attains a givenlevel c > a or 0. Let

Tx := min{n : Vn = x}

be the first time when the wealth attains the level x.


PROBLEM 6.1: Explain why Tx is a random variable.

The conditional probability q0(a) := P (T0 < Tc|V0 = a) is called the probability ofruin. Similarly, qc(a) := P (Tc < T0|V0 = a) is the probability of winning.

How to evaluate the probability of ruin? This probability can be obtained by studying the dynamic behaviour of the gambling situation. Thus, this is a basic example of a situation which is typical for stochastic analysis: Probabilities are not obtained by combinatorial methods but by a dynamic argument resulting in a difference or differential equation.

The starting point is the following assertion.

6.1 LEMMA. The ruin probabilities satisfy the difference equation

qc(a) = p qc(a + 1) + (1− p) qc(a− 1) whenever 0 < a < c

with boundary conditions qc(0) = 0 and qc(c) = 1.

It is illuminating to understand the assertion with the help of an heuristic argument: Ifthe random walk starts at V0 = a, 0 < a < c, then we have V1 = a+1 with probabilityp and V1 = a− 1 with probability 1− p. This gives

P (Tc < T0|V0 = a) = pP (Tc < T0|V1 = a + 1) + (1− p)P (Tc < T0|V1 = a− 1)

However, the random walk starting at time i = 1 has the same ruin probabilities as the random walk starting at i = 0. This proves the assertion. In this argument we utilized the intuitively obvious fact that the starting time of the random walk does not affect its ruin probabilities. The exact proof is left as PROBLEM 6.2.

In order to calculate the ruin probabilities we have to solve the difference equation.

6.2 DISCUSSION. The difference equation

xa = pxa+1 + (1− p)xa−1 whenever a = 1, . . . , c− 1

has the general solution

x_a = A + B ((1 − p)/p)^a   if p ≠ 1/2,    and    x_a = A + B a   if p = 1/2

(Hint: Try x_a = λ^a which gives two special solutions for λ. The general solution is a linear combination of the special solutions.) The constants A and B are


determined by the boundary conditions. This gives

q_c(a) = [((1 − p)/p)^a − 1] / [((1 − p)/p)^c − 1]   if p ≠ 1/2,    and    q_c(a) = a/c   if p = 1/2

In order to calculate q_0(a) we note that q_0(a) = q̄_c(c − a) where q̄ denotes the corresponding probability for the random walk with interchanged transition probabilities. This implies

q_0(a) = [(p/(1 − p))^{c−a} − 1] / [(p/(1 − p))^c − 1]   if p ≠ 1/2,    and    q_0(a) = (c − a)/c   if p = 1/2

Easy calculations show that

qc(a) + q0(a) = 1

which means that gambling ends with probability 1.

PROBLEM 6.3: Show that the random walk hits the boundaries almost surely (with probability one).
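A simulation sketch of the ruin problem (the parameters a, c, p are illustrative assumptions): the empirical ruin frequency is compared with the closed form q_0(a) derived above for p ≠ 1/2.

```python
import numpy as np

rng = np.random.default_rng(4)
a, c, p, reps = 3, 10, 0.45, 20_000

ruined = 0
for _ in range(reps):
    v = a
    while 0 < v < c:
        v += 1 if rng.random() < p else -1      # one game: +1 with probability p, else -1
    ruined += (v == 0)

r = p / (1 - p)
q0 = (r**(c - a) - 1) / (r**c - 1)              # closed-form ruin probability for p != 1/2
print(ruined / reps, q0)                        # both approximately 0.87
```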

6.1.2 Two players

Now we assume that two players with initial capitals a and b are playing against eachother. The stake of each player is 1 at each game. The game ends when one player isruined.

This is obviously equivalent to the situation of the preceding section leading to

P (player 1 wins) = qa+b(a)

P (player 2 wins) = q0(a).

We know that the game ends with probability one. Fill in the details as PROBLEM 6.4.

Let us turn to the situation where player 1 has unlimited initial capital. Then the gamecan only end by the ruin of player 2, i.e. if

sup_n Sn ≥ b


where Sn denotes the accumulated gain of player 1.

6.3 THEOREM. Let (Sn) be a random walk on Z. Then

P(sup_n Sn ≥ b) = 1   whenever p ≥ 1/2,    and    P(sup_n Sn ≥ b) = (p/(1 − p))^b   whenever p < 1/2

PROOF: We have to show that

P(sup_n Sn ≥ b) = lim_{a→∞} q_{a+b}(a)

Details are left as PROBLEM 6.5 . 2

Note that P(sup_n Sn ≥ 1) is the probability that a gambler with unlimited initial capital gains 1 at some time. If p ≥ 1/2 this happens with probability 1 if we wait sufficiently long. Later we will see that in a fair game (p = 1/2) the expected waiting time is infinite.

6.2 Optional stopping

Let us consider the question whether gambling chances can be improved by a gamblingsystem.

We start with a particularly simple gambling system, called an optional stopping system. The idea is as follows: The gambler waits up to a random time σ and then starts gambling. (The game at period σ + 1 is the first game to play.) Gambling is continued until a further random time τ ≥ σ and then stops. (The game at period τ is the last game to play.) Random times are random variables σ, τ : Ω → N0 ∪ {∞}.

Now it is an important condition that the choice of the random times σ and τ may de-pend only on the information available up to the times where the choice is made sincethe gambler does not know the future. A random time which satisfies this condition iscalled a stopping time. In the following we will turn this intuitive notion into a precisedefinition.

Let X1, X2, . . . , Xk, . . . be a sequence of random variables representing the outcomesof a game at times k = 1, 2, . . ..

6.4 DEFINITION. The σ-field Fk := σ(X1, X2, . . . , Xk), which is generated by the events (X1 ∈ B1, X2 ∈ B2, . . . , Xk ∈ Bk), Bi ∈ B, is called the past of the sequence (Xi)i≥1 at time k.

The past Fk at time k is the information set of the beginning (X1, X2, . . . , Xk) of the sequence (Xi)i≥1. The history of the game is the family of σ-fields (Fk)k≥0 where


F0 = {∅, Ω}. The history is an increasing sequence of σ-fields representing the increasing information in the course of time.

6.5 DEFINITION. Any increasing sequence of σ-fields is called a filtration.

6.6 DEFINITION. A sequence (Xk)k≥1 of random variables is adapted to a filtration (Fk)k≥0 if Xk is Fk-measurable for every k ≥ 1.

Clearly, every sequence of random variables is adapted to its own history.

Now we are in the position to give a formal definition of a stopping time.

6.7 DEFINITION. Let (Fk)k≥0 be a filtration. A random variable τ : Ω → N0 ∪ {∞} is a stopping time (relative to the filtration (Fk)) if

(τ = k) ∈ Fk for every k ∈ N.

In view of the causality theorem 5.16 the realisation of the events (τ = k) is deter-mined by the values of the random variables X1, X2, . . . , Xk, i.e.

1(τ=k) = fk(X1, X2, . . . , Xk)

where the fk are suitable measurable functions. In terms of gambling this means that the decisions on starting or stopping the game depend only on the known past of the game.

PROBLEM 6.6: Let (Fk)k≥0 be a filtration and let τ : Ω → N0 ∪ {∞} be a random variable. Show that the following assertions are equivalent:
(a) (τ = k) ∈ Fk for every k ∈ N
(b) (τ ≤ k) ∈ Fk for every k ∈ N
(c) (τ < k) ∈ Fk−1 for every k ∈ N
(d) (τ ≥ k) ∈ Fk−1 for every k ∈ N
(e) (τ > k) ∈ Fk for every k ∈ N

The most important examples of stopping times are first passage times.

PROBLEM 6.7: Let (Xn)n≥0 be adapted. Show that the hitting time or first passage time

τ = min{k ≥ 0 : Xk ∈ B}

is a stopping time for any B ∈ B. (Note that τ = ∞ if Xk ∉ B for all k ∈ N.)

6.8 Problem. Let (Xk) be a sequence adapted to (Fk) and let τ be a finite stoppingtime. Then Xτ is a random variable.


6.3 Wald’s equation

6.3.1 Improving chances

Does the stopping system improve the gambler’s chances ? For answering this questionwe require some preparations.

6.9 WALD’S EQUATION. Let (Xk)k≥0 be an independent sequence of integrable ran-dom variables with a common expectation E(Xk) = µ. If τ is an integrable stoppingtime then Sτ is integrable and

E(Sτ ) = µE(τ)

PROOF: It is sufficient to show that the equation is true both for the positive parts andthe negative parts of Xk. Let Xk ≥ 0. Then

E(Sτ) = ∑_{k=0}^∞ ∫_{(τ=k)} Sk dP = ∑_{i=1}^∞ ∫_{(τ≥i)} Xi dP = ∑_{i=1}^∞ E(Xi) P(τ ≥ i) = µ E(τ)

(Note, that the second equality holds since all terms are ≥ 0.) 2

The following assertion answers our question about improving chances by stopping strategies. It shows that unfavourable games cannot be turned into fair games and fair games cannot be turned into favourable games. The result is a consequence of Wald's equation.

Note, that µ is the average gain for a single game and Sτ − Sσ is the accumulated gainfor the gambling strategy starting at σ and ending at τ .

6.10 COROLLARY. Let (Xk) be an independent sequence of integrable random variables with a common expectation E(Xk) = µ. Let σ ≤ τ be integrable stopping times. Then:
(a) µ < 0 ⇒ E(Sτ − Sσ) < 0.
(b) µ = 0 ⇒ E(Sτ − Sσ) = 0.
(c) µ > 0 ⇒ E(Sτ − Sσ) > 0.

6.3.2 First passage of a one-sided boundary

The strategy of waiting until the accumulated gain passes a given level, say 1, (whichhappens with probability one for fair games) and then stopping, seems to contradict


the preceding assertion. Let us have a closer look at this question.

The problem is concerned with one-sided boundaries. In this case first passage timesmay have expectation∞ such that Wald’s equation does not apply. In fact, for randomwalks with p < 1/2 there is a positive probability of never passing a positive boundary.Hence, the corresponding first passage times τ have the value ∞ with positive proba-bility and therefore E(τ) = ∞. In the symmetric case, however, we know that eachhorizontal boundary is passed with probability one. Surprisingly, the first passage timehas infinite expectation, too.

PROBLEM 6.8: Let (Sk) be a symmetric random walk and let τ := min{k ≥ 0 : Sk = 1}. Show that P(τ < ∞) = 1 and E(τ) = ∞.
Hint: Assume E(τ) < ∞ and derive a contradiction.

6.3.3 First passage of a two-sided boundary

Finally, we will apply Wald’s equation to first passage times of two-sided boundaries.Let (Sn)n≥0 be a random walk (starting at S0 = 0) with discrete steps +1 and −1. Let

τ* := min{k ≥ 0 : Sk = a or Sk = −b}.

It should be noted that τ ∗ is finite but not bounded ! The duration of a gambling systembased on such a stopping time is thus finite but not bounded. The random variable Sτ∗involves infinitely many periods n ∈ N.

If τa and τ−b denote the hitting times of the one-sided boundaries a resp. −b then we have (Sτ* = a) = (τa < τ−b) and (Sτ* = −b) = (τ−b < τa). Therefore the probabilities P(Sτ* = a) and P(Sτ* = −b) can be obtained immediately from (6.2). For this, we have only to note that (using the notation of (6.2))

P(Sτ* = a) = P(τa < τ−b) = P(Ta+b < T0 | V0 = b) = q_{a+b}(b)

and

P(Sτ* = −b) = P(τ−b < τa) = P(T0 < Ta+b | V0 = b) = q_0(b)

This gives us the distribution of Sτ∗ since from (6.2) we also know that τ ∗ is finite a.s.and therefore Sτ∗ has only two values a and −b. In this way we can calculate

E(Sτ∗) = aP (Sτ∗ = a)− bP (Sτ∗ = −b) = . . .

Next, let us turn to the problem of calculating E(τ ∗). Our idea is to apply Wald’sequation E(Sτ∗) = µE(τ ∗). However, we proved Wald’s equation only for integrablestopping times. Up to now we don’t know whether τ ∗ is integrable. Thus, in order toapply our version of Wald’s equation we have to use some tricks.


PROBLEM 6.9: Show that the hitting time τ* satisfies E(Sτ*) = µE(τ*).
Hint: The basic trick is to approximate τ* by bounded stopping times. Define

τ* ∩ n = min{τ*, n} = τ* whenever τ* ≤ n, and = n whenever τ* ≥ n.

It is easy to see that τ* ∩ n is a stopping time. Since it is bounded we may apply Wald's equation and obtain

E(Sτ*∩n) = µE(τ* ∩ n)

It is clear that τ* ∩ n ↑ τ* as n → ∞ and similarly Sτ*∩n → Sτ*. In order to obtain the validity of Wald's equation for τ* we have to think about the question whether the corresponding expectations converge. The answer is that the left hand side converges by Lebesgue's dominated convergence theorem and the right hand side converges by Levi's theorem.
PROBLEM 6.10: Find E(τ*) if µ ≠ 0.

Unfortunately, Wald's equation gives E(τ*) only if µ ≠ 0 which is the case if p ≠ 1/2. It does not work for the symmetric random walk. We will come back to this problem later (problem 7.18).

6.4 Gambling systems

Now, we generalize our gambling system. We are going to admit that the gambler mayvary the stakes. The stopping system is a special case where only 0 and 1 are admittedas stakes.

The stake for game n is denoted by Hn and has to be nonnegative. It is fixed before pe-riod n and therefore must be Fn−1-measurable since it is determined by the outcomesat times k = 1, 2, . . . , n− 1. The sequence of stakes (Hn) is thus not only adapted buteven predictable.

6.11 DEFINITION. A sequence (Xk)k≥1 of random variables is predictable with respect to a filtration (Fk)k≥0 if Xk is Fk−1-measurable for every k ≥ 1.

The gain at game k is HkXk = Hk(Sk − Sk−1) = Hk∆Sk. For the wealth of thegambler after n games we obtain

Vn = V0 + ∑_{k=1}^n Hk(Sk − Sk−1) = V0 + ∑_{k=1}^n Hk ∆Sk   (1)

If the stakes are integrable then we have

E(Vn) = E(Vn−1) + E(Hn)E(Xn).


In particular, if p = 1/2 we have E(Vn) = E(V0) for all n ∈ N.

It follows that variable stakes cannot change the fairness of a game. This is even true if we combine variable stakes with stopping strategies. The following assertion goes beyond Wald's equation since the wealth sequence (Vn)n≥0 need not be a random walk. Later we will use such sequences as prototypes of martingales.

6.12 THEOREM. Let (Xk)k≥1 be an independent sequence of integrable random variables with a common expectation E(Xk) = µ. Let (Vn)n≥0 be the sequence of wealths generated by a gambling system with integrable stakes. If σ ≤ τ are bounded stopping times then
(a) µ < 0 ⇒ E(Vτ − Vσ) < 0,
(b) µ = 0 ⇒ E(Vτ − Vσ) = 0,
(c) µ > 0 ⇒ E(Vτ − Vσ) > 0.

PROOF: Let N := max τ . Since

Vτ − Vσ = ∑_{k=1}^N Hk Xk 1_{(σ<k≤τ)}

and since (σ < k ≤ τ) is independent of Xk it follows that

E(Vτ − Vσ) = µ ∑_{k=1}^N E(Hk 1_{(σ<k≤τ)}).

2

There is a difference to the situation with constant stakes where we admitted un-bounded but integrable stopping times. In the present case of variable stakes this isno longer true as is shown by the famous doubling strategy.

6.13 EXAMPLE. Let τ be the waiting time to the first success, i.e.

τ = min{k ≥ 1 : Xk = 1}

and define stakes by Hn := 2^{n−1} 1_{(τ≥n)}

Obviously, the stakes are integrable. However, we have

P (Vτ = 1) = 1

for any p ∈ (0, 1). Therefore, a fair game can be transformed into a favourablegame by such a strategy. And this is true although the stopping time τ is inte-grable, actually E(τ) = 1/p !

Fill in the details as PROBLEM 6.11 .
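A simulation sketch of the doubling strategy of Example 6.13 for a fair coin (purely illustrative): the final wealth Vτ equals 1 in every run, but the stake placed at the last game is unbounded, which is exactly why the dominated convergence argument fails.

```python
import numpy as np

rng = np.random.default_rng(5)
final_wealth, last_stakes = [], []
for _ in range(10_000):
    wealth, stake = 0, 1
    while True:
        win = rng.random() < 0.5            # fair game, p = 1/2
        wealth += stake if win else -stake
        if win:
            break
        stake *= 2                          # double the stake after every loss
    final_wealth.append(wealth)
    last_stakes.append(stake)

print(set(final_wealth))                             # {1}: V_tau = 1 with probability one
print(np.mean(last_stakes), np.max(last_stakes))     # but the stakes are not bounded
```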


PROBLEM 6.12: Let p = 1/2. Show that for the doubling strategy we have E(Vτ∩n) = 0 for every n ∈ N.
PROBLEM 6.13: Explain for the doubling strategy why Lebesgue's theorem on dominated convergence does not imply E(Vτ∩n) → E(Vτ), although Vτ∩n → Vτ as n → ∞.
Hint: Show that the sequence (Vτ∩n) is not dominated from below by an integrable random variable.


Chapter 7

Conditioning

7.1 Conditional expectation

Now we are going to explore the most important and most successful probabilisticnotion for describing dependence. The starting point is the relation between a randomvariable and a σ-field.

Let (Ω,F , P ) be a probability space and let A ⊆ F be a sub-σ-field.

If a random variable X is A-measurable, i.e. σ(X) ⊆ A, then the information avail-able in A tells us everything about X . In fact, every assertion (X ∈ B), B ∈ B,about X is contained in A, and if we know which events in A have been realized weknow everything about X . A special case is the causality theorem according to whichA = σ(Y ) implies that X = f(Y ).

Now, how can we describe the relation between X and A if X is not A-measurable ?

If the random variable X is notA-measurable we could be interested in finding an op-timalA-measurable approximation of X . This idea leads to the concept of conditionalexpectation.

7.1 DISCUSSION. Let us explain the kind of approximation we have in mind.

A successful way consists in decomposing X into a sum X = Y + R where Y is A-measurable and R is uncorrelated to A. A minimal requirement on Y is that E(X) = E(Y) which implies E(R) = 0. Then the condition on R of being uncorrelated to A means

∫_A R dP = 0 for all A ∈ A.

In other words the approximating variable Y should satisfy the condition

∫_A X dP = ∫_A Y dP for all A ∈ A


Clearly, for these integrals to be defined we need nonnegative or integrable ran-dom variables.

As a matter of fact, for each integrable random variable X there exists a uniquelydetermined optimal A-measurable approximation.

7.2 THEOREM. Let X ≥ 0 or integrable and let A ⊆ F be a σ-field. Then there exists a P-a.s. uniquely determined A-measurable random variable Y satisfying

∫_A X dP = ∫_A Y dP for all A ∈ A

If X ≥ 0 then Y ≥ 0 P -a.e. If X is integrable then Y is integrable, too.

PROOF: This is a consequence of the Radon-Nikodym theorem. If X ≥ 0 then

µ(A) := ∫_A X dP defines a measure on A such that µ ≪ P. Define Y := dµ/dP.

Then this random variable fulfils the asserted equation, is nonnegative and is uniquely determined P-a.e.

If X is integrable apply the preceding to X+ and X−. 2

7.3 DEFINITION. Let (Ω, F, P) be a probability space and let A ⊆ F be a sub-σ-field. Let X be a nonnegative or integrable random variable. The conditional expectation E(X|A) of X given A is a nonnegative resp. integrable, A-measurable random variable satisfying

∫_A X dP = ∫_A E(X|A) dP for all A ∈ A

Let us consider the evaluation of conditional expectations.

PROBLEM 7.1: Find the conditional expectation given a finite field.
Hint: Find the values of the conditional expectation on the sets of the generating partition.
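The following Monte Carlo sketch (an illustration under stated assumptions, not part of the notes) shows what the answer to Problem 7.1 looks like: on each cell A of the generating partition the conditional expectation is constant with value E(X 1_A)/P(A); here the partition is given by the level sets of a discrete variable Y.

```python
import numpy as np

rng = np.random.default_rng(6)
Y = rng.integers(0, 3, size=100_000)        # the cells are (Y=0), (Y=1), (Y=2)
X = Y + rng.normal(size=Y.size)             # some integrable random variable

cond_exp = np.zeros_like(X)
for y in (0, 1, 2):
    cell = (Y == y)
    cond_exp[cell] = X[cell].mean()         # estimate of E(X 1_cell) / P(cell)

# defining property of E(X|A): the integrals over the cells coincide
print(X[Y == 1].sum() / X.size, cond_exp[Y == 1].sum() / X.size)
```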

Let X and Y be random variables where X is integrable. Then the conditional expectation E(X|σ(Y)) is a σ(Y)-measurable random variable and thus can be written as E(X|σ(Y)) = f ∘ Y. The function f is called the regression function of X w.r.t. Y. Usually, notation is simplified by E(X|σ(Y)) =: E(X|Y) and f(y) =: E(X|Y = y).

PROBLEM 7.2: Let X and Y be random variables where X is integrable. Assume that Y is a simple random variable. Calculate E(X|Y = y).
PROBLEM 7.3: Let X and Y be random variables where X is integrable. Assume that the joint distribution of (X, Y) has the Lebesgue density p(x, y). Show


that the regression function of X w.r.t. Y is

f(y) = E(X|Y = y) = ∫ x p(x, y) dx / ∫ p(x, y) dx

Hint: Show that ∫_{(Y∈B)} X dP = ∫_{(Y∈B)} f(Y) dP, B ∈ B.

Now, we turn to the mathematical properties of the conditional expectation.

The following properties are easy consequences of the definition of conditional ex-pectations. Moreover, these properties are intuitively plausible in the sense that anyreasonable notion of a conditional expectation should have these properties.

7.4 THEOREM. Assume that X is an integrable random variable. Then:
(a) E(E(X|A)) = E(X)
(b) If X is A-measurable then E(X|A) = X.
(c) If X is independent of A then E(X|A) = E(X).

The proof is left as PROBLEM 7.4 .

7.5 LINEARITY OF CONDITIONAL EXPECTATIONS.
(1) Assume that X and Y are nonnegative random variables. Then

E(αX + βY |A) = αE(X|A) + βE(Y |A) whenever α, β ≥ 0.

(2) Assume that X and Y are integrable random variables. Then

E(αX + βY |A) = αE(X|A) + βE(Y |A) whenever α, β ∈ R.

PROOF: The proof follows a scheme which is always the same for similar assertions concerning formulas for conditional expectations: If we want to show that E(Z1|A) = Z2 then we have to show that the equations

∫_A Z1 dP = ∫_A Z2 dP, A ∈ A,

are true. The verification of these equations is in most cases completely straightforwardusing standard properties of integrals. 2

From linearity it follows immediately that conditional expectations preserve the orderstructure (isotonicity) and consequently fulfil several inequalities which depend onlinearity and isotonicity.

PROBLEM 7.5: If X ≤ Y are nonnegative or integrable random variables thenE(X|A) ≤ E(Y |A).


PROBLEM 7.6: Let X ∈ L1. Show that |E(X|A)| ≤ E(|X| | A).
PROBLEM 7.7: Let X ∈ L1. Show that E(X|A)^2 ≤ E(X^2|A).
Hint: Start with E((X − E(X|A))^2 | A) ≥ 0.
PROBLEM 7.8: Show that X ∈ L2 implies E(X|A) ∈ L2.

7.6 ITERATED CONDITIONING. Let A ⊆ B be sub-σ-fields of F. Then for nonnegative or integrable X

E(E(X|A)|B) = E(E(X|B)|A) = E(X|A)

(The smaller σ-field "succeeds".)

The proof is left as PROBLEM 7.9 .

The following is located at the core of the concept of conditional expectations. It says that conditional expectations are not only homogeneous with respect to constant factors but also with respect to A-measurable factors if A is the conditioning σ-field.

7.7 REDUNDANT CONDITIONING. Let X and Y be square integrable. If X is A-measurable then

E(XY |A) = XE(Y |A).

PROOF: For X = 1A this is immediate by definition of the conditional expectation. The general case follows by measure theoretic induction. Fill in the details as PROBLEM 7.10. 2

PROBLEM 7.11: Let X ∈ L2. Show that E(X|A) minimizes E((X − Y)^2) among all A-measurable variables Y ∈ L2.
PROBLEM 7.12: Let X and Y be square integrable. If X is A-measurable and Y is independent of A then

E(XY|A) = X E(Y).

PROBLEM 7.13: (1) Let X be a random variable with moment generating function being finite on an interval I and A a σ-field. If E(e^{tX}|A) is constant for every t ∈ I then X and A are independent.
(2) Find a similar assertion involving the Fourier transform.

If A = σ(X) and Y is independent of X then problem ?? implies

E(XY |X = x) = E(xY |X = x)

This is intuitively plausible since conditioning on X = x determines the value of X .This is even true in more general cases like

E(f(X,Y )|X = x) = E(f(x, Y ))


provided that f is sufficiently integrable. Confer the following problem for a generalversion.

PROBLEM 7.14: If A is any σ-field such that X is A-measurable and Y is independent of A then

E(f(X, Y)|A) = φ ∘ X where φ(ξ) = E(f(ξ, Y))

7.2 Martingales

In gambler’s speech gambling systems are called martingales. This might be the his-torical reason for the mathematical terminology: A (mathematical) martingale is themathematical notion of a value process of a gambling strategy in a fair game.

From Theorem 6.12 we know that the expected gambler’s wealth in a fair game cannotbe changed by stopping times. This property can equivalently be stated without usingthe concept of stopping times. The basis is the following lemma.

7.8 LEMMA. Let (Xn)n≥0 be adapted to (Fn)n≥0. If σ ≤ τ are bounded stopping times then for any A ∈ F

∫_A (Xτ − Xσ) dP = ∑_{j=1}^n ∫_{A∩(σ<j≤τ)} (Xj − Xj−1) dP

PROOF: Let τ ≤ n. It is obvious that

Xτ = X0 + ∑_{j≤τ} (Xj − Xj−1) = X0 + ∑_{j=1}^n 1_{(τ≥j)} (Xj − Xj−1)

This gives

∫_A (Xτ − X0) dP = ∑_{j=1}^n ∫_{A∩(τ≥j)} (Xj − Xj−1) dP

Further details are left as PROBLEM 7.15 . 2

7.9 THEOREM. Let (Xn)n≥0 be adapted to (Fn)n≥0. Then the following assertions are equivalent:
(1) E(Xσ) = E(Xτ) for all bounded stopping times σ ≤ τ.
(2) E(Xj|Fj−1) = Xj−1, j ≥ 1.

PROOF: (2) ⇒ (1) is clear from 7.8. (Take A = Ω and replace Xj by E(Xj|Fj−1)).


Let us show that (1) ⇒ (2). Choose F ∈ Fj−1 and define

τ := j − 1 whenever ω ∈ F, and τ := j whenever ω ∉ F.

Then τ is a stopping time. From E(Xj) = E(Xτ ) the assertion follows. 2

7.10 DEFINITION. Let (Fn)n≥0 be a filtration and let (Xn)n≥0 be an adapted sequence of integrable random variables. The sequence (Xn) is called a martingale if one of the equivalent conditions (1) or (2) of Theorem 7.9 is satisfied.

PROBLEM 7.16: Let Sn = X1 + X2 + · · · + Xn where (Xi) are independent identically distributed (i.i.d.) and integrable random variables with E(Xi) = µ.
(a) Show that Mn := Sn − nµ is a martingale.
(b) Derive Wald's equation for bounded stopping times from the martingale property.
PROBLEM 7.17: Let Sn = X1 + X2 + · · · + Xn where (Xi) are independent identically distributed (i.i.d.) and square integrable random variables with E(Xi) = 0 and V(Xi) = σ^2.
(a) Show that Mn := Sn^2 − σ^2 n is a martingale.
Hint: Note that Sn^2 − S_{n−1}^2 = Xn^2 + 2 S_{n−1} Xn.
(b) Show that E(Sτ^2) = E(τ) σ^2 for bounded stopping times τ.
PROBLEM 7.18: Let τ* be a first passage time of a symmetric random walk with a two-sided boundary.
(a) Show that E(S_{τ*}^2) = E(τ*) σ^2.
(b) Find E(τ*).
PROBLEM 7.19: Let Sn = X1 + X2 + · · · + Xn where (Xi) are independent identically distributed (i.i.d.) and integrable random variables with E(Xi) = 0. Let (Hk) be a predictable (with respect to the history of (Sn)) sequence of integrable random variables. Show that

Vn := V0 + ∑_{k=1}^n Hk(Sk − Sk−1)

is a martingale.

It turns out that the variance of a martingale is the sum of the variances of its differences. This is an extension of the familiar formula for the variance of a sum of independent random variables.

PROBLEM 7.20: Let (Mk) be a square integrable martingale. Then

E(Mn^2) = E(M0^2) + ∑_{k=1}^n E((Mk − Mk−1)^2).


7.11 DEFINITION. Let (Fn)n≥0 be a filtration and let (Xn)n≥0 be an adapted sequence of integrable random variables.
The sequence (Xn) is called a submartingale if E(Xσ) ≤ E(Xτ) for all bounded stopping times σ ≤ τ.
The sequence (Xn) is called a supermartingale if E(Xσ) ≥ E(Xτ) for all bounded stopping times σ ≤ τ.

PROBLEM 7.21: Extend Theorem 7.9 to submartingales and supermartingales.
PROBLEM 7.22: Let (Mn) be a square integrable martingale. Show that (|Mn|) and (Mn^2) are submartingales.

The importance of the martingale concept becomes even more clear by the following theorem which is the elementary version of the celebrated Doob-Meyer decomposition. It is the final mathematical formulation of the old idea that time series can be decomposed into a noise component and a trend component. The notion of a martingale turns out to be the right formalization of the idea of noise.

Recall the definition of a predictable sequence.

PROBLEM 7.23: Show that a predictable martingale is constant.

7.12 DOOB-MEYER DECOMPOSITION. Each adapted sequence (Xn)n≥0 of integrable random variables can be written as

Xn = Mn + An, n ≥ 0,

where (Mn) is a martingale and (An) is a predictable sequence. The decomposition is unique up to constants.
The sequence (Xn)n≥0 is a martingale iff (An) is constant.

PROOF: Let

Mn = X0 + ∑_{j=1}^n (Xj − E(Xj|Fj−1))

and

An = ∑_{j=1}^n (E(Xj|Fj−1) − Xj−1)

This proves existence of the decomposition. Uniqueness follows from the fact that a predictable martingale is constant. Details are PROBLEM 7.24. 2
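A small numerical sketch of the decomposition (illustrative assumptions: a random walk with normal steps of drift mu): here E(Xj|Fj−1) = Xj−1 + mu, so the predictable part is An = n·mu and Mn = Xn − n·mu is a martingale, which is checked empirically via E(Mn) = E(M0) = 0.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, n, reps = 0.3, 50, 20_000
steps = rng.normal(loc=mu, scale=1.0, size=(reps, n))
X = np.cumsum(steps, axis=1)                # the adapted sequence X_1, ..., X_n (with X_0 = 0)

A = mu * np.arange(1, n + 1)                # predictable component A_n = n * mu
M = X - A                                   # martingale component M_n = X_n - n * mu

print(np.abs(M.mean(axis=0)).max())         # E(M_n) stays close to 0 for every n
```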

PROBLEM 7.25: Describe martingales, submartingales and supermartingales in terms of the predictable components of the corresponding Doob-Meyer decomposition.


7.3 Some theorems on martingales

Equation (1) of Theorem 7.9 extends easily to stopping times after having defined thepast of a stopping time.

7.13 DEFINITION. Let τ be a stopping time. Then

Fτ := {A ∈ F : A ∩ (τ ≤ j) ∈ Fj, j ≥ 0}

is called the past of the stopping time τ .

PROBLEM 7.26: Show that the past of a stopping time is a σ-field.

7.14 OPTIONAL STOPPING THEOREM. Let (Xn)n≥0 be a martingale. Then for anypair σ ≤ τ of bounded stopping times we have

E(Xτ |Fσ) = Xσ

PROOF: Applying 7.8 to A ∈ Fσ proves the assertion. Details are PROBLEM 7.27 .2

PROBLEM 7.28: Extend ?? to submartingales and supermartingales.

7.15 MAXIMAL INEQUALITY. Let (Xk) be a nonnegative submartingale. Then

P(max_{j≤n} Xj ≥ ε) ≤ (1/ε) ∫_{(max_{j≤n} Xj ≥ ε)} Xn dP ≤ E(Xn)/ε

PROOF: Let τ = min{k : Xk ≥ ε} and put τ = n if max_{k≤n} Xk < ε. This is a stopping time. Denote M := max_{k≤n} Xk. Then

E(Xn) ≥ E(Xτ) = ∫_{(M≥ε)} Xτ dP + ∫_{(M<ε)} Xτ dP ≥ ε P(M ≥ ε) + ∫_{(M<ε)} Xn dP

2

7.16 KOLMOGOROFF'S INEQUALITY. Let (Xk) be a nonnegative submartingale. Then

E(max_{k≤n} Xk^2) ≤ 4 E(Xn^2)


PROOF: Let X = max_{j≤n} Xj and Y = Xn. Then we know from the maximal inequality that

P(X ≥ t) ≤ (1/t) ∫_{(X≥t)} Y dP

It follows that

E(X^2) = 2 ∫_0^∞ t P(X ≥ t) dt ≤ 2 ∫_0^∞ ∫_{(X≥t)} Y dP dt = 2 ∫ X Y dP ≤ 2 √(E(X^2) E(Y^2))

Dividing by √(E(X^2)) gives E(X^2) ≤ 4 E(Y^2), which is the assertion.

2


Chapter 8

Continuous time processes

8.1 Basic concepts

A stochastic process (random process) on a probability space (Ω,F , P ) is a family(Xt)t≥0 of random variables. The parameter t is usually interpreted as time. Therefore,the intuitive notion of a stochastic process is that of a random system whose state attime t is Xt.

There are some notions related to a stochastic process (Xt)t≥0 which are importantfrom the very beginning: the starting value X0, the increments Xt − Xs for s < t,and the paths t 7→ Xt(ω), ω ∈ Ω.

The most important prototypes of stochastic processes are defined in terms of the prop-erties of the increments and their path properties.

8.2 The Poisson process

Let us start with the Poisson process. This process has the advantage of admitting aconstructive approach.

Assume that incidental signals appear in the course of time. The waiting times between two subsequent signals follow an exponential distribution with a fixed parameter λ > 0. Different waiting times are independent of each other. The Poisson process (Nt)t≥0 is the counting process of the signals, i.e. Nt is the number of signals in [0, t].

PROBLEM 8.1: Let τ be a nonnegative random variable. Show that τ follows an exponential distribution iff it has the absence of memory property

P(τ > t + s | τ > t) = P(τ > s), s > 0, t > 0.


Hint: For the sufficiency part show that g(t) := P (τ > t) satisfies g(s + t) =g(s)g(t).

Let us put the idea of the Poisson process into mathematical terms.

Let τ1, τ2, . . . , τn, . . . be a sequence of independent random variables each distributedaccording to G(1, λ), i.e. an exponential distribution with parameter λ. These randomvariables stand for the waiting times between subsequent signals. Let

Tk = τ1 + τ2 + · · ·+ τk

be the waiting time for the k-th signal. We know that Tk follows the Gamma distribu-tion G(k, λ).

The Poisson process (Nt)t≥0 is now defined by

Nt = n ⇔ Tn ≤ t < Tn+1, n = 0, 1, 2, . . . (1)

This means: Nt is the number of signals during the time interval [0, t].

Basically, this is the full definition of the Poisson process. There are several other equivalent definitions. But for the moment we are interested in the basic properties of the Poisson process which follow from our construction.
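A constructive simulation sketch of the Poisson process based on formula (1) (the rate and time horizon are illustrative assumptions): exponential waiting times are accumulated into arrival times Tk, and Nt counts the arrivals up to t; the empirical mean of Nt should be close to λt.

```python
import numpy as np

rng = np.random.default_rng(8)
lam, t, reps = 2.0, 5.0, 10_000

counts = []
for _ in range(reps):
    T, N = 0.0, 0
    while True:
        T += rng.exponential(scale=1.0 / lam)   # next exponential waiting time
        if T > t:
            break
        N += 1                                  # one more signal in [0, t]
    counts.append(N)

print(np.mean(counts), lam * t)                 # both approximately 10
```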

PROBLEM 8.2: Explain why the starting value of a Poisson process is N0 = 0.
PROBLEM 8.3: Show that P(Nt < ∞) = 1, t > 0. (The Poisson process has no explosions.)
Hint: Show that lim_{n→∞} Tn = ∞, P-a.s.

Let us turn to the path properties of the Poisson process. Any single path t ↦ Nt(ω) of the Poisson process starts at N0 = 0 and jumps to Nt = 1 as soon as the first signal occurs. Any further change of the value is a jump of height +1. Between two jumps the paths are constant. By definition the intervals where the paths are constant are left closed and right open which implies that the paths of the Poisson process are continuous from the right and have limits from the left (cadlag).

8.1 DEFINITION. A stochastic process is a counting process if it starts at 0, hascadlag paths which are constant except at jumps of height +1 and has no explosions.

As a result we obtain: With respect to its path properties the Poisson process is acounting process.

It should be clear that there are as many counting processes as there are sequences of nonnegative random variables (τn)n≥1 such that ∑_{i=1}^n τi → ∞. The Poisson process is a very special counting process which is generated by an i.i.d. sequence of exponentially distributed random variables.


Let us turn to the distributional properties of the Poisson process. The key result is asfollows.

8.2 THEOREM. (1) A Poisson process has independent increments, i.e. if 0 < t1 <t2 < . . . < tn then the increments

Nt1 , Nt2 −Nt1 , . . . , Ntn −Ntn−1

are independent variables.(2) The increments Nt − Ns, 0 ≤ s < t, of a Poisson process follow a Poissondistribution with parameter λ(t− s).

The first part of the assertion says that the Poisson process has independent increments. The second part implies that the distribution of the increments depends only on the length of the interval but not on its position. This means that the Poisson process has stationary increments.

Both parts of the preceding theorem are consequences of the fact that the waiting timesbetween subsequent jumps have a common exponential distribution.

PROBLEM 8.4: Prove part (2) of Theorem 8.2.
Hint: Apply redundant conditioning to show that

P(Tn ≤ t < Tn+1) = ∫_{(Tn≤t)} P(τn+1 > t − Tn | Tn) dP = ∫_0^t P(τn+1 > t − y) P^{Tn}(dy)

A formal proof of part (1) of Theorem 8.2 involves the substitution formula for mul-tiple integration. The proof is based on a property of the Poisson process which is ofinterest in itself.

PROBLEM 8.5: Let (Nt)t≥0 be a Poisson process. Show that 8.2 implies that for 0 < s < t and k + ℓ = n, k, ℓ ∈ N,

P(Ns = k, Nt − Ns = ℓ | Nt = n) = (n! / (k! ℓ!)) (s/t)^k ((t − s)/t)^ℓ

In order to prove 8.2 one has to establish the assertion of the preceding exercise as a consequence of exponentially distributed waiting times. The assertion itself says that given the number of signals in an interval the positions of the signals are distributed like independent and uniformly distributed random variables. This fact can be used for efficient simulation of Poisson processes.

We saw that the Poisson process is a counting process with stationary and independent increments. It can be shown that these properties already determine the Poisson process. The exponential distribution of the waiting times (or equivalently the Poisson distribution of the increments) is a necessary consequence.


PROBLEM 8.6: Show that a Poisson process is continuous in probability, i.e. lim_{s→t} Ns = Nt (P).

Note that continuity in probability does not contradict the existence of jumps. In particular, it does not imply that paths are continuous! If a process with cadlag paths is continuous in probability then the only assertion we can make is that the paths don't have fixed jumps like calendar effects.

8.3 Point processes

The Poisson process is in many respects the most basic and simple prototype of astochastic process. Let us give a brief survey of classes of stochastic processes whicharise by generalizing the Poisson process.

Let us start with counting processes. We have already mentioned that any sequence of nonnegative random variables (τn)n≥1 such that ∑_{i=1}^n τi → ∞ defines a counting process (Nt)t≥0 by formula (1). If these random variables are independent and identically distributed then the counting process is a renewal process. We will not go into further details of renewal processes.

A counting process can be viewed as a system which produces the value +1 at certainrandom times Tn = τ1 + τ2 + · · · + τn. A point process is a system which producesmore general values (Yn) at random times Tn. Thus, a point process is defined by asequence (Tn, Yn)n∈N of pairs of random variables.

After n jumps and before the n + 1-st jump, i.e. if Tn ≤ t < Tn+1, the value Xt of thepoint process is Sn = Y1 + Y2 + · · ·+ Yn. This means Xt = SNt , t ≥ 0.

It should be clear that point processes have cadlag paths which are constant except at jumps. Jump heights can have any size and may be random.

A simple but important special case of a point process is a compound Poisson process. In such a case (Yn) is a sequence of independent and identically distributed random variables and (Nt) is a Poisson process which is independent of (Yn). In other words we have

Xt = ∑_{i=1}^{Nt} Yi, t ≥ 0.
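A simulation sketch of a compound Poisson process at a fixed time t (rate, jump distribution and time are illustrative assumptions): Nt is Poisson with parameter λt and the jumps Yi are i.i.d. normal with mean 1, so E(Xt) = λt E(Y1) (cf. Problem 8.8).

```python
import numpy as np

rng = np.random.default_rng(9)
lam, t, reps = 1.5, 4.0, 50_000
N = rng.poisson(lam * t, size=reps)                          # number of jumps up to time t
X = np.array([rng.normal(loc=1.0, scale=1.0, size=n).sum() for n in N])

print(X.mean(), lam * t * 1.0)                               # both approximately 6
```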

PROBLEM 8.7: Let (Yn)n≥1 be independent random variables with common distribution Q and let N be a Poisson random variable with parameter λ, independent of (Yn). The distribution of S = Y1 + · · · + YN is called the compound Poisson distribution with parameters Q and λ.
(1) Find the Fourier transform of a compound Poisson distribution.


(2) Show that for every compound Poisson distribution there exists a uniquely determined pair (Q, λ) such that Q({0}) = 0.
PROBLEM 8.8: Find expectations and variances of a compound Poisson process.
PROBLEM 8.9: Show that a compound Poisson process has independent and stationary increments.
PROBLEM 8.10: Show that a compound Poisson process is continuous in probability.

For every stochastic process (Xt) with cadlag paths the jump at time t is given by ∆Xt := Xt − Xt−. We may count the number of jumps with particular jump heights. E.g.

∑_{s≤t} 1_{(|∆Xs|≥1)}

denotes the number of jumps until time t with absolute jump height ≥ 1. Since acadlag function on a compact interval can have only finitely many jumps ≥ 1 the sumis finite. In general, for B ∈ B let

Nt(B) := ∑_{s≤t} 1_{(∆Xs ∈ B\{0})}.

For every B ∈ B which is bounded away from zero this is a counting process.

8.3 DEFINITION. Let (Xt) be a stochastic process with cadlag paths. Then µt(B) :=E(Nt(B)), B ∈ B, is called the jump measure of the process.

PROBLEM 8.11: Find the jump measure of a compound Poisson process.

8.4 Levy processes

If we concentrate on the incremental properties of the Poisson process then we arrive at another important class of processes.

8.4 DEFINITION. A stochastic process is called a Levy process if it has cadlag paths,independent and stationary increments and is continuous in probability.

Applying this terminology we note that Poisson processes and compound Poisson pro-cesses are Levy processes.

Compound Poisson processes have paths which are piecewise constant (constant onintervals which are separated by isolated jumps). General Levy processes need nothave piecewise constant paths. The path structure of a Levy process may be of a verycomplicated nature. There may be continuous parts and parts with accumulation ofinfinitely many very small jumps.


Note that for stochastic sequences the property of having independent and stationaryincrements is typical for random walks. Thus, Levy processes are the natural extensionof random walks to continuous time.

PROBLEM 8.12: Assume that (Xt) is a square integrable Levy process. Show that expectations and variances of Xt are proportional to t.
PROBLEM 8.13: Let (Xt) be a Levy process.
(1) Show that E(e^{iuXt}) = e^{itψ(u)} for some function ψ.
(2) Explain why the function ψ determines the distribution of the increments of (Xt).
PROBLEM 8.14: Find the function ψ for Poisson processes and compound Poisson processes.

At this point we are far from a systematic discussion of the theory of Levy processes.But we are in a position to get an idea of the richness of this concept.

The distributions of the random variables of a Levy process have a remarkable property. In order to describe this property we need the notion of convolution. The probabilistic version of convolution is as follows: Let X and Y be independent random variables. Then the distribution of X + Y is called the convolution of PX and PY and is denoted by PX ⋆ PY.

PROBLEM 8.15: Let X and Y be independent random variables with distributions µ1 and µ2, respectively. Let m be the distribution of X + Y. Show that

∫ f dm = ∫∫ f(x + y) µ1(dx) µ2(dy)   (2)

for every bounded measurable function f.

Equation (2) serves as definition of the convolution m = µ1 ⋆ µ2 of arbitrary finite measures. In analytical terms a convolution is most easily described by Fourier transforms.

PROBLEM 8.16: Show that m = µ1 ⋆ µ2 iff m̂ = µ̂1 · µ̂2, where ̂ denotes the Fourier transform.

Now we are in a position to state the announced remarkable property of the distribu-tions of a Levy process.

8.5 THEOREM. Let (Xt)t≥0 be a Levy process and denote µt := PXt, t ≥ 0. Then µs ⋆ µt = µs+t, i.e. the distribution of Xs+t is the convolution of the distributions of Xs and Xt, s, t ≥ 0.

The proof is PROBLEM 8.17 .


Now, there is an important converse of Theorem 8.5. It says that for every familyof distributions (µt) satisfying the convolution property of 8.5 (together with somecontinuity condition) there exists a corresponding Levy process.

The examples of Levy processes we know so far are the Poisson process and the compound Poisson process. But there are plenty of further examples of Levy processes which can be described by the corresponding family (µt).

PROBLEM 8.18: Show that the family of normal distributions µt := N(0, t), t ≥ 0, has the convolution property.
PROBLEM 8.19: Show that the family of Gamma distributions µt := G(t, 1), t ≥ 0, has the convolution property.

The second example corresponds to a Levy process with a very complicated path struc-ture. The paths are increasing and not constant on any interval, but do not contain anycontinuous parts. The process is a pure jump process driven by infinitely many jumpswhich are dense on every time interval.

The first example corresponds to the Wiener process. This is the only Levy processhaving continuous paths.

8.5 The Wiener Process

8.6 DEFINITION. A stochastic process (Wt)t≥0 is called a Wiener process if
(1) the starting value is W0 = 0,
(2) the increments Wt − Ws are N(0, t − s)-distributed and mutually independent for non-overlapping intervals,
(3) the paths are continuous for P-almost all ω ∈ Ω.

The Wiener process is thus a Levy process with continuous paths. Later (problem13.17) it will be shown that the Wiener process is the only Levy process with continu-ous paths (up to a scaling factor). The distributional properties of the increments are anecessary consequence.

PROBLEM 8.20: Let (Wt) be a Wiener process and define Xt = x0 + µt + σWt (generalized Wiener process). Discuss the properties of (Xt).

As is the case with many probability models one has to ask whether there exists a probability space (Ω, F, P) and a family of random variables (Wt)t≥0 satisfying the properties of Definition 8.6. The mathematical construction of such models is a complicated matter and is one of the great achievements of probability theory in the first half of the 20th century. Accepting the existence of the Wiener process as a valid


mathematical model we may forget the details of the construction (there are several ofthem) and start with the axioms stated in 8.6.

It is, however, easy to set up discrete time random walks which approximately sharethe properties of a Wiener process.

8.7 DISCUSSION. Let X1, X2, . . . be independent replications of a bounded ran-dom variable X such that E(X) = 0 and V (X) = σ2 <∞. (E.g. P (Xi = u) =d/(u+ d), P (Xi = −d) = u/(u+ d), where u, d > 0.) Then for every n ∈ N

Snt =1√n

[nt]∑i=1

Xi, t ≥ 0,

is a random walk with jumps at every point t = k/n, k ∈ N, and being constanton intervals of length 1/n.

This random walk approximately shares the properties of a Wiener process if nis large. Indeed, the paths are ”almost continuous” since the jump heights areuniformly small in that they are are determined by the distribution of X/

√n. The

increments are independent at least on the discrete time scale. And from the CLTit follows that the distribution of the Snt as well as of the increments Snt − Snstend to normal distributions N(0, σ2t) and N(0, σ2(t− s)), respectively.

PROBLEM 8.21: Let (Wt)t≥0 be a Wiener process. Show that Xt := −Wt,t ≥ 0, is a Wiener process, too.PROBLEM 8.22:PROBLEM 8.23: Show that Wt/t

P→ 0 as t→∞.

For beginners the most surprising properties are the path properties of a Wiener pro-cess.

The paths of a Wiener process are continuous (which is part of our definition). Inthis respect the paths seem to be not complicated since they have no jumps or othersingularities. It will turn out, however, that in spite of their continuity, the paths of aWiener process are of a very peculiar nature.

8.8 THEOREM. Let (Wt)t≥0 be a Wiener process. For every t > 0 and every Rieman-nian sequence of subdivisions 0 = tn0 < tn1 < . . . < tnn = t

n∑i=1

|W (tni )−W (tni−1)|2P→ t =: Q(t), t > 0.

PROOF: Let Qn(t) :=∑n

i=1 |W (tni )−W (tni−1)|2 for a particular Riemannian sequenceof subdivisions. Show that E(Qn(t)) = t and V (Qn(t)) → 0. Then the assertionfollows from Cebysev’s inequality.

Page 99: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

8.5. THE WIENER PROCESS 89

Details are PROBLEM 8.24 . 2

Theorem 8.8 shows that (almost) all paths of a Wiener process have nonvanishingquadratic variation. This implies that the paths don’t have bounded variation on anyinterval (see problem ??). Actually, it can even be shown that they are nowhere differ-entiable.

It is remarkable that the quadratic variation t 7→ Q(t) of the Wiener process is adeterministic function of a very simple nature in that it is a linear function.

Page 100: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

90 CHAPTER 8. CONTINUOUS TIME PROCESSES

Page 101: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

Chapter 9

Continuous time martingales

9.1 From independent increments to martingales

The topic of this section are martingales. Our main examples will be general Levyprocesses. Therefore the assertions of this section cover Wiener processes, Poissonprocesses and compound Poisson processes.

We start with the concept of the past of a process.

9.1 DEFINITION. Let (Xt)t≥0 be stochastic process. The past of the process (Xt)t≥0

at time t is the σ-field of events FXt = σ(Xs : s ≤ t) generated by variables Xs of the

process prior to t, i.e. s ≤ t. The internal history of (Xt)t≥0 is the family (FXt )t≥0 of

pasts of the process.

The intuitive idea behind the concept of past is the following: FXt consists of all

events which are observable if one observes the process up to time t. It represents theinformation about the process available at time t. It it obvious that t1 < t2 ⇒ FX

t1⊆

FXt2

. If X0 is a constant then FX0 = ∅, Ω.

9.2 DEFINITION. Any increasing family of σ-fields (Ft)t≥0 is called a filtration.A process (Yt)t≥0 is adapted to the filtration (Ft)t≥0 if Yt is Ft-measurable for everyt ≥ 0.

The internal history (FXt )t≥0 of a process (Xt)t≥0 is a filtration and the process (Xt)t≥0

is adapted to its internal history. But also Yt := φ(Xt) for any measurable function φis adapted to the internal history of (Xt)t≥0. Adaption simply means that the past ofthe process (Yt)t≥0 at time t is contained in Ft. Having the information contained in

91

Page 102: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

92 CHAPTER 9. CONTINUOUS TIME MARTINGALES

Ft we know everything about the process up to time t.

9.3 DEFINITION. A martingale relative to the filtration (Ft)t≥0 is an adapted andintegrable stochastic process (Xt)t≥0 such that

E(Xt|Fs) = Xs whenever s < t

It is a square integrable martingale if E(X2t ) < ∞, t ≥ 0.

PROBLEM 9.1: Show that the martingale property remains valid if the filtrationis replaced by another filtration consisting of smaller σ-fields, provided that theprocess is still adapted.

For establishing the martingale property the property of having independent incre-ments plays an important role. Therefore Levy processes are natural candidates formartingales.

9.4 LEMMA. Let (Xt)t≥0 be a Levy process. Then the increments Xt − Xs of theLevy process (Xt)t≥0 are independent of the past FX

s .

PROOF: Let s1 ≤ s2 ≤ . . . ≤ sn ≤ s < t. Then the random variables

Xs1 , Xs2 −Xs1 , . . . , Xsn −Xsn−1 , Xt −Xs

are independent. By calculating partial sums it follows that even the random variablesXs1 , Xs2 , . . . , Xsn are independent of Xt − Xs. Since this is valid for any choice oftime points si ≤ s the independence assertion carries over to the whole past FX

t . 2

9.5 THEOREM. Let (Xt)t≥0 be an integrable Levy process such that E(X1) = µ. ThenMt = Xt − µt is a martingale with respect to its internal history.

PROOF: Since Mt − Ms is independent of Fs it follows that E(Mt − Ms|Fs) =E(Mt −Ms) = 0. Hence E(Mt|Fs) = E(Ms|Fs) = Ms. 2

It follows that any Wiener process (Wt) is a martingale. If (Nt) is a Poisson processwith parameter λ then Nt − λt is a martingale.

PROBLEM 9.2: Apply Theorem 9.5 to a compound Poisson process.

A nonlinear function of a martingale typically is not a martingale. But the next theoremis a first special case of a very general fact: It is sometimes possible to correct a processby a bounded variation process in such a way that the result is a martingale.

9.6 THEOREM. Let (Xt)t≥0 be a square integrable Levy process such that E(X1) = µand V (X1) = σ2 and let Mt = Xt − µt. Then the process M2

t − σ2t is a martingalewith respect to the internal history of the driving Levy process (Xt)t≥0.

Page 103: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

9.2. A TECHNICAL ISSUE: AUGMENTATION 93

PROOF: Note that

M2t −M2

s = (Mt −Ms)2 + 2Ms(Mt −Ms)

This gives

E(M2t −M2

s |Fs) = E((Mt −Ms)2|Fs) + 2E(Ms(Mt −Ms)|Fs) = t− s

2

PROBLEM 9.3: Apply Theorem 9.6 to the Wiener process, to the Poisson pro-cess and to compound Poisson processes.

9.7 THEOREM. Let (Wt) be a Wiener process. The process exp(aWt − a2t/2) is amartingale with respect to the internal history of the driving Wiener process (Wt)t≥0.

PROOF: Use eaWt = ea(Wt−Ws)eaWs to obtain

E(eaWt|Fs) = E(ea(Wt−Ws)) eaWs = ea2(t−s)/2eaWs

2

The processE(W )t := exp(Wt − t/2)

is called the exponential martingale of (Wt)t≥0.

9.2 A technical issue: Augmentation

For technical reasons which will become clear later the internal history of Wienerprocess sometimes is slightly too small. It is convenient to increase the σ-fields of theinternal history in a way that does not destroy the basic properties of the underlyingprocess. This procedure is called augmentation.

Roughly speaking, the augmentation process consists in two steps. First, the filtrationis slightly enlarged such that events are added to Ft which occur ”immediately” aftert. This makes the filtration right-continuous. Secondly, all negligible sets of F∞ areadded. This is a purely technical act insuring that the equivalence classes of stochasticprocesses (containing all negligible ”modifications”) contain sufficiently regular ver-sions.

9.8 DEFINITION. Let Ft+ :=⋂s>tFs and define

F t := F ∈ F∞ : P (F4G) = 0 for some G ∈ Ft+

Then (F t)t≥0 is the augmented filtration.

Page 104: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

94 CHAPTER 9. CONTINUOUS TIME MARTINGALES

PROBLEM 9.4: Show that the augmented filtration is really a filtration.

9.9 LEMMA. Let (Ft)t≥0 be a filtration. Then the augmented filtration is right-continuous, i.e.

F t =⋂s>t

F s

PROOF: It is clear that ⊆ holds. In order to prove ⊇ let F ∈⋂s>tF s. We have to

show that F ∈ F t.

For every n ∈ N there is Gn ∈ Ft+1/n such that P (F4Gn) = 0. Define

G :=∞⋂m=1

∞⋃n=m

Gn =∞⋂

m=K

∞⋃n=m

Gn ∈ Ft+1/K for all K ∈ N.

Then G ∈ Ft+ and P (G4F ) = 0. 2

One says that a filtration satisfies the ”usual conditions” if it is right-continuous andcontains all negligible sets of F∞. The internal history of the Wiener process does notsatisfy the usual conditions. However, every augmented filtration satisfies the usualconditions.

It is important that the augmentation process does not destroy the basic structural prop-erties. We show this fact at hand of Wiener processes. A similar argment applies togeneral Levy processes. It should be noted, however, that when we are dealing withpoint processes only, augmentation is not necessary.

9.10 THEOREM. Let (Wt)t≥0 be a Wiener process. Then the increments Wt −Ws areindependent of FW

s , s ≥ 0.

PROOF: (Outline) It is easy to see that

E(ea(Wt−Ws)|FWs+) = ea

2(t−s)/2

From problem ?? it follows that Wt −Ws is independent of FWs+. It is clear that this

carries over to F s. 2

Thus, 9.10 shows that every Wiener process has independent increments with respectto a filtration that satisfies the usual conditions. When we are dealing with a Wienerprocess we may suppose that the underlying filtration satisfies the usual conditions.

PROBLEM 9.5: Show that the assertions of 9.5, 9.6 and 9.7 are valid for theaugmented internal history of the Wiener process.

Let us illustrate the convenience of filtrations satisfying the usual conditions by a fur-ther result. For some results on stochastic integrals it will be an important point that

Page 105: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

9.3. STOPPING TIMES 95

martingales are cadlag. A general martingale need not be cadlag. We will show that amartingale has a cadlag modification if the filtration satisfies the usual conditions.

9.11 THEOREM. Let (Xt)t≥0 be a martingale w.r.t. a filtration satisfying the usualconditions. Then there is a cadlag modification of (Xt)t≥0.

PROOF: (Outline. Further reading: Karatzas-Shreve, [15], Chapter 1, Theorem 3.13.)

We begin with path properties which are readily at hand: There is a set A ∈ F∞,satisfying P (A) = 1 and such that the process (Xt)t∈Q (restricted to a rational timescale) has paths with right and left limits for every ω ∈ A. This is a consequenceof the basic convergence theorem for martingales. For further details see Karatzas-Shreve, [15], Chapter 1, Proposition 3.14, (i).

It is now our goal to modify the martingale in such a way that it becomes cadlag. Theidea is to define

X+t := lim

s↓t,s∈QXs, t ≥ 0.

on A and X+t := 0 elsewhere. It is easy to see that the paths of (X+

t )t≥0 are cadlag.Since (Ft)t≥0 satisfies the usual conditions it follows that (X+

t )t≥0 is adapted. We haveto show that (X+

t )t≥0 is a modification of (Xt)t≥0, i.e. Xt = X+t P -a.e. for all t ≥ 0.

Let sn ↓ t, (sn) ⊆ Q. Then Xsn = E(Xs1|Fsn) is uniformly integrable which implies

Xsn

L1

→ X+t . From Xt = E(Xsn|Ft) we obtain Xt = E(X+

t |Ft) = X+t P -a.e. 2

As a result we can say: Under the ”usual conditions” every martingale is cadlag.

9.3 Stopping times

Let (Xt)t≥0 be a cadlag adapted process such that X0 = 0 and for some a > 0 let

τ = inft ≥ 0 : Xt ≥ a

The random variable τ is called a first passage time: It is the time when the processhits the level a for the first time. By right continuity of the paths we have ( PROBLEM

9.6 )τ ≤ t ⇔ max

s≤tXs ≥ a (1)

Thus, we have (τ ≤ t) ∈ Ft for all t ≥ 0.

9.12 DEFINITION. A random variable τ : Ω → [0,∞] is called a stopping time if(τ ≤ t) ∈ Ft for all t ≥ 0.

Let τ be a stopping time. The intuitive meaning of (τ ≤ t) ∈ Ft is as follows: Atevery time t it can be decided whether τ ≤ t or not.

Page 106: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

96 CHAPTER 9. CONTINUOUS TIME MARTINGALES

Let σ, τ and τn be stopping times.PROBLEM 9.7: Show that τ is a stopping time iff (τ < t) ∈ Ft for every t ≥ 0.PROBLEM 9.8: Show that σ ∩ τ , σ ∪ τ and σ + τ are stopping times.PROBLEM 9.9: Show that τ +α for α ≥ 0 and λτ for λ ≥ 1 are stopping times.PROBLEM 9.10: Show that supn τn, infn τn are stopping times.PROBLEM 9.11: Show that every bounded stopping time τ is limit of a decreas-

ing sequence of bounded stopping times each of which has only finitely manyvalues.Hint: Let T = max τ . Define τn = k/2n whenever (k − 1)/2n < τ ≤ k/2n,k = 0, . . . , T2n.

Let (Mt)t≥0 be a martingale. In the discrete time case we were able to describe mar-tingales in terms of expectations of the stopped process. This carries over to the con-tinuous case.

9.13 THEOREM. Let (Mt)t≥0 be an integrable process with right-continuous paths.The the following assertions are equaivalent:(1) (Mt)t≥0 is a martingale.(2) For all bounded stopping times τ Mτ is measurable and integrable and we haveE(Mτ ) = E(M0).

PROOF: (1) ⇒ (2): Assume that τ ≤ T . Let τn ↓ τ where τn are boundedstopping times with finitely many values. Then it follows from the discrete version ofthe optional stopping theorem that E(Mτn) = E(M0). Clearly we have Mτn → Mτ .Since Mτn = E(MT |Fτn) it follows that the sequence (Mτn) is even mean-convergent.

(2) ⇒ (1): For s < t and F ∈ Fs define

τ :=

s whenever ω ∈ F,t whenever ω 6∈ F.

Then τ is a stopping time. From E(Mt) = E(Mτ ) the assertion follows. 2

This characterization of martingales has important and interesting consequences formany applications. We indicate some these in the next section.

The interplay between stopping times and adapted processes is at the core of stochasticanalysis. In the rest of this section we try to provide some information for reasons oflater reference. Throughout the section we assume tacitely that the filtration satisfiesthe usual conditions.

9.3.1 Hitting times

Let (Xt)t≥0 be a process adapted to a filtration (Ft)t≥0 and let A ⊆ R. Define

τA = inft : Xt ∈ A

Page 107: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

9.3. STOPPING TIMES 97

Then τA is called the hitting time of the set A. First passage are hitting tmes ofintervals.

For which sets A are hitting times stopping times ?

9.14 REMARK. The question, for which sets A a hitting time τA is a stoppingtime, is completely solved. The solution is as follows.

We may assume that P |F∞ is complete, i.e. that all subsets of negligible setsare added to F∞. The whole theory developed so far is not affected by such acompletion. We could assume from the beginning that our probability space iscomplete. The reason why we did not mention this issue is simple: We did notneed completeness so far.

However, the most general solution of the hitting time problem needs complete-ness. The following is true: If P |F∞ is complete and if the filtration satisfies theusual conditions then the hitting time of every Borel set is a stopping time. Forfurther comments see Jacod-Shiryaev, [14], Chapter I, 1.27 ff.

For particular cases the stopping time property of hitting times is easy to prove.

9.15 THEOREM. Assume that (Xt)t≥0 has right-continuous paths and is adapted to afiltration which satisfies the usual conditions.(a) Then τA is a stopping time for every open set A.(b) If (Xt)t≥0 has continuous paths then τA is a stopping time for every closed set A.

PROOF: (a) Note that

τ < t ⇔ Xs ∈ A for some s < t

Since A is open and (Xt)t≥0 has right-continuous paths it follows that

τ < t ⇔ Xs ∈ A for some s < t, s ∈ Q

(b) Let A be closed and let (An) be open neighbourhoods of A such that An ↓ A.Define τ := limn→∞ τAn ≤ τA which exists since τAn ↑. We will show that τ = τA.

Since τAn ≤ τA we have τ ≤ τA. By continuity of paths we have XτAn→ Xτ

whenever τ < ∞. Since XτAn∈ An it follows that Xτ ∈ A whenever τ < ∞. This

implies τA ≤ τ . 2

9.3.2 The optional stopping theorem

We need a notion of the past of a stopping time.

PROBLEM 9.12: A stochastic interval is an interval whose boundaries are stop-ping times. Show that the indicators of stochastic intervals are adapted processes.Hint: Consider 1(τ,∞) and 1[τ,∞).

Page 108: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

98 CHAPTER 9. CONTINUOUS TIME MARTINGALES

PROBLEM 9.13: Let τ be a stopping time and let F ⊆ Ω. Show that the process1F 1[0,τ) is adapted iff F ∩ (τ ≤ t) ∈ Ft for all t ≥ 0.PROBLEM 9.14: Let τ be a stopping time and define Fτ := F : F ∩ (τ ≤ t) ∈Ft, t ≥ 0. Show that Fτ is a σ-field.

9.16 DEFINITION. Let τ be a stopping time. The σ-field Fτ is called the past of τ .

The intuitive meaning of the past of a stopping time is as follows: An event F is in thepast of τ if at every time t the occurrence of F can be decided provided that τ ≤ t.Many of the subsequent assertions can be understood intuitively if this interpretationis kept in mind.

Let σ and τ be stopping times.PROBLEM 9.15: If σ ≤ τ then Fσ ⊆ Fτ .PROBLEM 9.16: Fσ∩τ = Fσ ∩ Fτ .PROBLEM 9.17: The sets (σ < τ), (σ ≤ τ) and (σ = τ) are in Fσ ∩ Fτ .

Hint: Start with proving (σ < τ) ∈ Fτ and (σ ≤ τ) ∈ Fτ .PROBLEM 9.18: Show that every stopping time σ is Fσ-measurable.PROBLEM 9.19: Let τn ↓ τ . Show that Fτ =

⋂∞n=1Fτn .

There is a fundamental rule for iterated conditional expectations with respect to pastsof stopping times.

9.17 THEOREM. Let Z be an integrable or nonnegative random variable and let σ andτ be stopping times. Then

E(E(Z|Fσ)|Fτ ) = E(Z|Fσ∩τ )

PROOF: The proof is bit tedious and therefore many textbooks pose it as exerciseproblem (see Karatzas-Shreve, [15], Chapter 1, 2.17). Let us give more detailed hints.

We have to start with showing that

F ∩ (σ < τ) ∈ Fσ∩τ and F ∩ (σ ≤ τ) ∈ Fσ∩τ whenever F ∈ Fσ

Note that the nontrivial part is to show ∈ Fτ . The trick is to observe that on (σ ≤ τ)we have (τ ≤ t) = (τ ≤ t) ∩ (σ ≤ t).

The second step is based on the first step and consists in showing that

1(σ≤τ)E(Z|Fσ) = 1(σ≤τ)E(Z|Fσ∩τ ) (2)

Finally, we prove the assertion separately on (σ ≤ τ) and (σ ≥ τ). For case 1 weapply (2) to the inner conditional expectation. For case 2 we apply (2) to the outerconditional expectation (interchanging the roles of σ and τ ). 2

Page 109: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

9.3. STOPPING TIMES 99

We arrive at the most important result on stopping times and martingales. A prelimi-nary technical problem is whether an adapted process stopped at σ is Fσ-measurable.Intuitively, this should be true.

9.18 MEASURABILITY OF STOPPED PROCESSES.Let (Xt)t≥0 be an adapted process and σ a stopping time. We ask whetherXσ1(σ<∞) is Fσ-measurable.

It is easy to prove the assertion for right-continuous processes with the help of9.11. This would be sufficient for the optional stopping theorem below. However,for stochastic integration we want to be sure that the assertion is also valid forleft-continuous processes. This can be shown in the following way.

Define

Xnt := n

∫ t

0Xse

n(s−t)ds

Then (Xnt )t≥0 are continuous adapted processes such that Xn

t → Xt providedthat (Xt)t≥0 has left-continuous paths. Since the assertion is true for (Xn

t ) itcarries over to (Xt).

9.19 OPTIONAL STOPPING THEOREM. Let (Mt)t≥0 be a right continuous martingale.If σ is a bounded stopping time and τ is any stopping time then

E(Mσ|Fτ ) = Mσ∩τ

PROOF: The proof is based on the following auxiliary assertion: Let τ be a boundedstopping time and let Mt := E(Z|Ft) for some integrable random variable Z. ThenMτ = E(Z|Fτ ).

Let τ be a stopping time with finitely many values t1 < t2 < . . . < tn. Then

Mtn −Mτ =n∑k=1

(Mtk −Mtk−1)1(τ≤tk−1)

(Prove it on (τ = tj−1)). It follows that E((Mtn − Mτ )1F ) = 0 for every F ∈ Fτ .This proves the auxiliary assertion for stopping times with finitely many values. Theextension to arbitrary bounded stopping times is done by 9.11.

Let T = sup σ. The assertion of the theorem follows from

E(Mσ|Fτ ) = E(E(MT |Fσ)|Fτ ) = E(MT |Fσ∩τ ) = Mσ∩τ .

2

Page 110: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

100 CHAPTER 9. CONTINUOUS TIME MARTINGALES

We finish this section with two consequences of the optional stopping theorem whichare fundamental for stochastic integration.

9.20 COROLLARY. Let τ be any stopping time. If (Mt)t≥0 is a martingale then(Mτ∩t)t≥0 is a martingale, too.

The proof is PROBLEM 9.20 .

9.21 COROLLARY. Let (Mt)t≥0 be a martingale. Let σ ≤ τ be stopping times and letZ be Fσ-measurable and bounded. Then Z(Mτ∩t −Mσ∩t) is a martingale, too.

The proof is PROBLEM 9.21 (Hint: Apply 9.13).

9.4 Application: First passage times of the Wiener pro-cess

As an application of the optional stopping theorem we derive the distribution of firstpassage times of the Wiener process.

9.4.1 One-sided boundaries

9.22 THEOREM. Let (Wt)t≥0 be a Wiener process and for a > 0 and b ∈ R define

τa,b := inft : Wt ≥ a + bt

Then we haveE(e−λτa,b1(τa,b<∞)) = e−a(b+

√b2+2λ), λ ≥ 0

PROOF: (Outline. Details are PROBLEM 9.22 .) Applying the optional stoppingtheorem to the exponential martingale of the Wiener process we get

E(eθWτ−θ2τ/2) = 1

for every θ ∈ R and every bounded stopping time τ . Therefore this equation is true forτn := τa,b ∩ n for every n ∈ N. We note that (use 8.23)

eθWτn−θ2τn/2 P→ eθWτa,b−θ2τa,b/21(τa,b<∞)

Applying the dominated convergence theorem it follows (at least for sufficiently largeθ) that

E(eθWτa,b−θ2τa,b/21(τa,b<∞)) = 1

Page 111: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

9.4. APPLICATION: FIRST PASSAGE TIMES OF THE WIENER PROCESS 101

The rest are easy computations. Since Wτa,b= a + bτa,b we get

E(e(θb−θ2/2)τa,b1(τa,b<∞)) = e−ab

Putting λ := −θb + θ2/2 proves the assertion. 2

In the following two problems treat the cases b > 0, b = 0 and b < 0 separately.PROBLEM 9.23: Find P (τa,b <∞).PROBLEM 9.24: Find E(τa,b).PROBLEM 9.25: Does the assertion of the optional sampling theorem hold for

the martingale (Wt)t≥0 and τa,b ?PROBLEM 9.26: Does the assertion of the optional sampling theorem hold for

the martingale W 2t − t and τa,b ?

PROBLEM 9.27: Show that P (τ0,b = 0) = 1 for every b > 0. (ConsiderE(e−λτan,b) for an ↓ 0.) Give a verbal interpretation of this result.PROBLEM 9.28: Show that P (maxtWt = ∞, mintWt = −∞) = 1. Con-

clude from (a) that almost all paths of (Wt)t≥0 infinitely often cross every hori-zontal line.

From 9.22 we obtain the distribution of the first passage times.

9.23 COROLLARY. Let τa,b be defined as in 9.22. Then

P (τa,b ≤ t) = 1− Φ(a + bt√

t

)+ e−2abΦ

(−a + bt√t

), t ≥ 0

PROOF: Let G(t) := P (τa,b ≤ t) and let Fa,b(t) denote the right hand side of theasserted equation. We want to show that Fa,b(t) = G(t), t ≥ 0. For this we will applythe uniqueness of the Laplace transform. Note that 9.22 says that∫ ∞

0

e−λtdG(t) = e−a(b+√b2+2λ), t ≥ 0

Therefore, we have to show that∫ ∞

0

e−λtdFa,b(t) = e−a(b+√b2+2λ), t ≥ 0

This is done by the following simple calculations. First, it is shown that

Fa,b(t) =1√2π

∫ t

0

a

s3/2exp

(− a2

2s− b2s

2− ab

)ds

(This is done by calculating the derivatives of both sides.) Then it follows that

eab∫ t

0

e−λsdFa,b(s) = ea√b2+2λFa,

√b2+2λ(t)

Putting t = ∞ the assertion follows. (Details are PROBLEM 9.29 .) 2

PROBLEM 9.30: Find the distribution of maxs≤tWs.

Page 112: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

102 CHAPTER 9. CONTINUOUS TIME MARTINGALES

9.4.2 Two-sided boundaries

The following problems are concerned with first passage times for two horizontalboundaries. Let c, d > 0 and define

σc,d = inft : Wt 6∈ (−c, d)

PROBLEM 9.31: Show that σc,d is a stopping time.PROBLEM 9.32: Show that P (σc,d <∞) = 1.

For σc,d the application of the optional sampling theorem is straightforward since|Wt| ≤ maxc, d for t ≤ σc,d.

PROBLEM 9.33: Find the distribution of Wσc,d.

Hint: Note that E(Wσc,d) = 0 (why ?) and remember that Wσc,d

has only twodifferent values.Solution: P (Wσc,d

= −c) = dc+d , P (Wσc,d

= d) = cc+d

PROBLEM 9.34: Find E(σc,d).Hint: Note that E(W 2

σc,d) = E(σc,d).

Solution: E(σc,d) = cd.

The distribution of the stopping time σc,d is a more complicated story. It is easy to ob-tain the Laplace transforms. Obtaining probabilistic information requires much moreanalytical efforts.

9.24 DISTRIBUTION OF σc,d. For reasons of symmetry we have

A :=∫Wσc,d

=−ce−θ

2σc,d/2dP =∫Wσd,c

=ce−θ

2σd,c/2dP

and

B :=∫Wσc,d

=de−θ

2σc,d/2dP =∫Wσd,c

=−de−θ

2σd,c/2dP

From1 = E

(eθWσc,d

−θ2σc,d/2)

and 1 = E(eθWσd,c

−θ2σd,c/2)

we obtain a system of equations for A and B leading to

A =eθd − e−θd

eθ(c+d) − e−θ(c+d)and B =

eθc − e−θc

eθ(c+d) − e−θ(c+d)

This implies

E(e−λσc,d) =e−c

√2λ + e−d

√2λ

1 + e−(c+d)√

Page 113: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

9.4. APPLICATION: FIRST PASSAGE TIMES OF THE WIENER PROCESS 103

Expanding this into an infinite geometric series and applying∫ ∞

0e−λtdFa,0(t) = e−a

√2λ, t ≥ 0

we could obtain an infinite series expansion of the distribution of σc,d.(Further reading: Karatzas-Shreve [15], section 2.8.)

9.4.3 The reflection principle

Let (Wt)t≥0 be a Wiener process and let (Ft)t≥0 be its internal history.

Let s > 0 and consider the process Xt := Ws+t−Ws, t ≥ 0. Since the Wiener processhas independent increments the process (Xt)t≥0 is independent of Fs. Moreover, it iseasy to see that (Xt)t≥0 is a Wiener process. Let us give an intuitive interpretation ofthese facts.

Assume that we observe the Wiener process up to time s. Then we know the past Fsand the value Ws at time s. What about the future ? How will the process behave fort > s ? The future variation of the process after time s is give by (Xt)t≥0. From theremarks above it follows that the future variation is that of a Wiener process which isindependent of the past. The common formulation of this fact is: At every time s > 0the Wiener process starts afresh.

PROBLEM 9.35: Show that the process Xt := Ws+t −Ws, t ≥ 0 is a Wienerprocess for every s ≥ 0.

There is a simple consequence of the property of starting afresh at every time s. Notethat

Wt =

Wt whenever t ≤ sWs + (Wt −Ws) whenever t > s

Define the corresponding process reflected at time s by

Wt =

Wt whenever t ≤ sWs − (Wt −Ws) whenever t > s

Then it is clear that (Wt)t≥0 and (Wt)t≥0 have the same distribution. This assertionlooks rather harmless and self-evident. However, it becomes a powerful tool when itis extended to stopping times.

9.25 REFLECTION PRINCIPLE. Let τ be any stopping time and define

Wt =

Wt whenever t ≤ τWτ − (Wt −Wτ ) whenever t > τ

Then the distributions of (Wt)t≥0 and (Wt)t≥0 are equal.

Page 114: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

104 CHAPTER 9. CONTINUOUS TIME MARTINGALES

PROOF: Let us show that the single random variables Wt and Wt have equal distri-butions. Equality of the finite dimensional marginal distributions is shown in a similarmanner.

We have to show that for any bounded continuous function f we have E(f(Wt)) =

E(f(Wt)). For obvious reasons we need only show∫τ<t

f(Wt) dP =

∫τ<t

f(Wt) dP

which is equivalent to∫τ<t

f(Wτ + (Wt −Wτ )) dP =

∫τ<t

f(Wτ − (Wt −Wτ )) dP

The last equation is easily shown for stopping times with finitely many values. Thecommon approximation argument then proves the assertion. 2

PROBLEM 9.36: To get an idea of how the full proof of the reflection principleworks show E(f(Wt1 ,Wt2)) = E(f(Wt1 , Wt2)) for t1 < t2 and bounded con-tinuous f .Hint: Distinguish between τ < t1, t1 ≤ τ < t2 and t2 ≤ τ .

The reflection principle offers an easy way for obtaining information on first passagetimes.

9.26 THEOREM. Let Mt := maxs≤t Ws. Then

P (Mt ≥ y, Wt < y − x) = P (Wt > y + x), t > 0, y > 0, x ≥ 0

PROOF: Let τ := inft : Wt ≥ y and τ := inft : Wt ≥ y. Then

P (Mt ≥ y, Wt < y − x) = P (τ ≤ t,Wt < y − x)

= P (τ ≤ t, Wt < y − x)

= P (τ ≤ t,Wt > y + x)

= P (Wt > y + x)

2

PROBLEM 9.37: Use 9.26 to find the distribution of Mt.PROBLEM 9.38: Find P (Wt < z,Mt < y) when z < y, y > 0.

Page 115: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

9.5. THE MARKOV PROPERTY 105

9.5 The Markov property

We explain and discuss the Markov property at hand of the Wiener process. Similarassertions are valid for general Levy processes.

When we calculate conditional expectations given the past Fs of a stochastic process(Xt)t≥0 then from the general point of view conditional expectations E(Xt|Fs) areFs-measurable, i.e. they depend on any Xu, u ≤ s. But when we were dealing withspecial conditional expectations given the past of a Wiener process then we have gotformulas of the type

E(Wt|Fs) = Ws, E(W 2t |Fs) = W 2

s + (t− s), E(eaWt|Fs) = eaWs+a2(t−s)/2

These conditional expectations do not use the whole information available in Fs butonly the value Ws of the Wiener process at time s.

9.27 THEOREM. Let (Wt)t≥0 be a Wiener process and (Ft)t≥0 its internal history.Then for every P -integrable function Z which is σ(

⋃u≥sFu)-measurable we have

E(Z|Fs) = φ(Ws)

where φ is some measurable function.

PROOF: For the proof we only have to note that the system of functions

ea1Ws+h1+a2Ws+h2

+···+anWs+hn , hi ≥ 0, n ∈ N,

is total in L2(σ(⋃u≥sFu)). 2

PROBLEM 9.39: Under the assumptions of 9.27 show thatE(Z|Fs) = E(Z|Ws).

9.27 is the simplest and basic formulation of the Markov property. It is, however,illuminating to discuss more sophisticated versions of the Markov property.

Let us calculate E(f(Ws+t)|Fs) where f is bounded and measurable. We have

E(f(Ws+t)|Fs) = E(f(Ws + (Ws+t −Ws))|Fs)

Since Ws is Fs-measurable and Ws+t −Ws is independent of Fs we have

E(f(Ws+t)|Fs) = φ Ws where φ(ξ) = E(f(ξ + (Ws+t −Ws))) (3)

Roughly speaking, conditional expectations simply are expectations depending de-pending on a parameter slot where the present value of the process has to be pluggedin.

9.28 THEOREM. Let (Wt)t≥0 be a Wiener process and (Ft)t≥0 its internal history.Then the conditional distribution of (Ws+t)t≥0 given Fs is the same as the distributionof a process ξ + Wt where ξ = Ws and (Wt)t≥0 is any (other) Wiener process.

Page 116: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

106 CHAPTER 9. CONTINUOUS TIME MARTINGALES

PROOF: Extend (3) to functions of several variables. 2

9.28 contains that formulation which is known as the ordinary Markov property of theWiener process. It says that at every time point s the Wiener process starts afresh atthe state ξ = Ws as a new Wiener process forgetting everything what happened beforetime s.

It is a remarkable fact with far reaching consequences that the Markov property stillholds if time s is replaced by a stopping time. The essential preliminary step is thefollowing.

9.29 THEOREM. Let τ be any stopping time and define Q(F ) = P (F |τ < ∞),F ∈ F∞. Then the process

Xt := Wτ+t −Wτ , t ≥ 0,

is a Wiener process under Q which is independent of Fτ .

PROOF: (Outline. Details are PROBLEM 9.40 .) Let us show that∫F

f(Wτ+t −Wτ ) dP = P (F )E(f(Wt))

when F ⊆ (τ < ∞), F ∈ Fτ and f is any bounded continuous function. But this iscertainly true for stopping times with finitely many values. The common approxima-tion argument proves the equation. Noting that the equation holds for τ + s, s > 0,replacing τ , proves the assertion. 2

9.30 STRONG MARKOV PROPERTY.Let (Wt)t≥0 be a Wiener process and (Ft)t≥0 its internal history. Let σ be any stoppingtime. Then on (σ < ∞) the conditional distribution of (Wσ+t)t≥0 given Fσ is the sameas the distribution of a process ξ + Wt where ξ = Wσ and (Wt)t≥0 is some (other)Wiener process.

Further reading: Karatzas-Shreve [15], sections 2.5 and 2.6.

Page 117: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

Chapter 10

The stochastic integral

10.1 Integrals along stochastic paths

Let (Xt)t≥0 be any cadlag (right-continuous with left limits) adapted process.

10.1 DEFINITION. The process (Xt)t≥0 is a process of finite variation on [0, T ] ifP -almost all paths are of bounded variation on [0, T ].

Every process with increasing paths is a finite variation process. Therefore the Pois-son process is a finite variation process. The Wiener process is not a finite variationprocess.

PROBLEM 10.1: Show that every compound Poisson process is a finite variationprocess.

If (Xt)t≥0 is a finite variation process then we may use the paths of the process fordefining Stieltjes integrals.

10.2 DEFINITION. Let (Xt)t≥0 be a process of finite variation on [0, T ] and (Ht)t≥0 bea caglad (left continuous with right limits) adapted process. Then the random variable∫ T

0

H dX : ω 7→∫ T

0

Hs(ω) dXs(ω)

is called the stochastic integral of (Ht)t≥0 with respect to (Xt)t≥0.

For the stochastic integral we have the following basic approximation result which

107

Page 118: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

108 CHAPTER 10. THE STOCHASTIC INTEGRAL

follows from 3.4.1.

10.3 THEOREM. Let (Xt)t≥0 be a process of finite variation on [0, T ] and (Ht)t≥0 be acaglad (left continuous with right limits) adapted process. Then for every Riemanniansequence of subdivisions of [0, T ]

kn∑i=1

Hsi−1(Xsi

−Xsi−1)P→

∫ T

0

HsdXs

Definition 10.2 only works for stochastic processes whose paths are of bounded vari-ation. It is important to extend the stochastic integral to a larger class of processeswhich contains e.g. the Wiener process, too. Since most properties of the Stieltjesintegral are consequences of approximations by Riemannian sequences the extensionshould be such that Theorem 10.3 remains true.

The following sections are devoted to this extension of the stochastic integral.

Let us make some historical remarks. The first extension of the stochastic integral toprocesses which are not of finite variation was concerned with the Wiener process. Fornon-stochastic integrands this integral was constructed by Norbert Wiener already inthe first half of the 20th century. The starting point of the general concept was the inte-gral of K. Ito about the middle of the 20th century. A key concept was the restriction ofthe integrands to non-anticipating adapted functions and the application of the at thattime new martingale concept by Doob. In the following decades a general theory ofstochastic integration has been established under the leadership of the French school ofprobability theory. The theory culminated in the notion of so-called semimartingalesbeing the most general class of processes that can be used for defining a stochasticintegral.

Originally, semimartingales were defined as processes which can be decomposed intoa sum of a martingale and a finite variation process. This definition reflects the factthat for both types of processes a stochastic integral can be defined either as a Stieltjesintegral or by the martingale construction due to Ito. Nevertheless, it seemed neces-sary to unify both the semimartingale concept and the construction of the stochasticintegral. Based on ideas of P. A. Meyer such a new approach has been presented byProtter, [19] and [20].

Our presentation follows the outline by Protter.

10.2 The integral of simple processes

In this section we collect some basic notation and facts which are easy to obtain.

Page 119: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

10.2. THE INTEGRAL OF SIMPLE PROCESSES 109

Let 0 ≤ t1 < t2 ≤ T . The most simple caglad process is of the form

Hs(ω) = a(ω)1(t1,t2](s) =

0 whenever s ≤ t1a(ω) whenever t1 < s ≤ t20 whenever t2 < s

In order to be adapted the process must satisfy Hs ∈ Fs for all s ≥ 0. This onlymatters for t1 < s ≤ t2 where a should be Fs-measurable for all s > t1. Imposing the”usual conditions” this means that a must be Ft1-measurable.

For such a simple process we define∫ T

0

HsdXs =

∫ T

0

H dX := a (Xt2 −Xt1)

This definition is identical to the definition of the Stieltjes-integral for each ω, i.e. foreach path of the underlying processes separately. Note, that the stochastic integral is arandom variable, i.e. it depends on ω ∈ Ω.

If 0 ≤ t ≤ T we define ∫ t

0

H dX =

∫ T

0

1(0,t]H dX

Since it is easy to see that1(0,t]1(t1,t2] = 1(t1∩t,t2∩t]

we have ∫ t

0

H dX = a (Xt2∩t −Xt1∩t)

Now the stochastic integral can be considered as a function of both ω and t, i.e. asstochastic process. Since X is assumed to be cadlag and adapted the stochastic integralis cadlag and adapted, too.

The next step is to consider sums.

10.4 DEFINITION. Let E0 be the set of simple processes defined on a deterministicsubdivision, i.e. processes of the form

Ht(ω) =n∑j=0

aj−1(ω)1(sj−1,sj ](t)

where 0 = s0 < s1 < . . . < sn = T is a subdivision and aj is Fsj-measurable for

every j.

It is clear that every process (Ht)t≥0 ∈ E0 is caglad and adapted. Again we may definethe integral pathwise by∫ t

0

H dX :=n∑j=1

aj−1(Xtj∩t −Xtj−1∩t)

Page 120: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

110 CHAPTER 10. THE STOCHASTIC INTEGRAL

Next let us adopt a more general view. For applications to financial markets we have toadmit that subdivisions are based on random times rather than on determinstic intervallimits. This leads us to the set of simple processes.

10.5 DEFINITION. Let E of simple processes, i.e. processes of the form

Ht(ω) =n∑j=0

aj−1(ω)1(σj−1,σj ](t) (1)

where 0 = σ0 < σ1 < . . . < σn = T is a subdivision of stopping times and aj isFσj

-measurable for every j.

Again it is obvious that the paths are caglad and from 9.12(b) we know that the pro-cesses in E are adapted.

For functions in E we may define the integral again pathwise, i.e. separately for eachω as a Stieltjes integral. This leads to the following definition:

∫ t

0

H dX :=

∫ T

0

1(0,t]H dX =n∑j=1

aj−1(Xσj∩t −Xσj−1∩t)

if H is defined by (1). Since for each single path this is an ordinary Stieltjes integralwe have immediately the properties:

∫ t

0

(αH1 + βH2) dX = α

∫ t

0

H1 dX + β

∫ t

0

H2 dX (2)∫ t

0

Hd(αX1 + βX2) = α

∫ t

0

H dX1 + β

∫ t

0

H dX2 (3)

For notational convenience we denote the process defined by the stochastic integralby H • X : t 7→

∫ t

0H dX . The preceding discussion shows that H • X is cadlag

and adapted if X is cadlag and adapted. Moreover, if X has continuous paths thenH • X has continuous paths, too. A third property is stated as a theorem in viewof its fundamental importance. Note that the boundedness assumption is required forintegrability. The martingale aspect of stochastic integration is pursued in chapter 13.

10.6 THEOREM. Let (Mt)t≥0 be a martingale and let H ∈ E be bounded. Then H •Mis a martingale.

PROOF: Apply 9.21. 2

Page 121: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

10.3. SEMIMARTINGALES 111

10.3 Semimartingales

Our next step is to extend the stochastic integral from simple processes H ∈ E to moregeneral integrands.

10.7 DEFINITION. Let L0 be the set of all adapted processes with caglad paths.

We want to construct the stochastic integral in such a way that for every process H ∈L0 the integral is well-defined.

Let us talk about the difficulties arising with a naive approach to this problem.

For every process H ∈ L0 we have

limn→∞

kn∑i=1

Hti−1(ω)1(ti−1,ti](s) → Hs(ω), s ∈ [0, T ], ω ∈ Ω, (4)

for every Riemannian subdivision of [0, T ]. Therefore it tempting to try to define thestochastic integral as ∫ T

0

H dX := limn→∞

kn∑i=1

Hti−1(Xti −Xti−1

) (5)

But such a definition only works if the limits on the right hand side exist and areindependent of the underlying sequence of Riemannian subdivisions.

We know that this is the case if the process (Xt)t≥0 is an (adapted cadlag) finite vari-ation process. However, as counterexamples show, this is not the case for completelygeneral adapted cadlag processes. Therefore the question arises how to restrict theclass of processes (Xt)t≥0 ?

The answer to this question is the notion of a semimartingale.

10.8 DEFINITION. A cadlag adapted process (Xt)t≥0 is a semimartingale if forevery sequence (Hn) of simple processes the following condition holds:

sups≤T

|Hns | → 0 ⇒ sup

t≤T

∣∣∣ ∫ t

0

Hns dZs

∣∣∣ P→ 0

The set of all semimartingales is denoted by S.

Thus, semimartingales are defined by a ”continuity property”: If a sequence of simpleprocesses converges to zero uniformly on compact time intervals then the stochasticintegrals of that sequence converge to zero, too.

As amatter of fact for deterministic processes (not depending on ω ∈ Ω) this continuityproperty is equivalent to bounded variation. However, for a stochastic process thecontinuity property does not imply that (Zt) is a finite variation process.

Page 122: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

112 CHAPTER 10. THE STOCHASTIC INTEGRAL

It will turn out that a reasonable extension process of the stochastic integral can becarried out for integrator processes which are semimartingales. It is therefore impor-tant to get an overview over typical processes that are semimartingales. From (??) itfollows that concept of semimartingales covers adapted cadlag processes with paths ofbounded variation. The following result opens the door to stochastic processes like theWiener process.

10.9 THEOREM. Every square integrable cadlag martingale (Mt)t≥0 is a semimartin-gale.

PROOF: Let (Hn) be a sequence in E such that ||Hn||u → 0. Since∫ t

0HndM is a

martingale we have by the maximal inequality

P(

sups≤t

∣∣∣ ∫ s

0

HndM∣∣∣ > a

)≤ 1

a2E

(( ∫ t

0

HndM)2)

For convenience let Mj := Mσnj ∩t. We have

E(( ∫ t

0

HndM)2)

= E(( n∑

j=1

aj−1(Mj −Mj−1))2)

= E( n∑j=1

a2j−1(Mj −Mj−1)

2)

≤ ||Hn||2uE( n∑j=1

(Mj −Mj−1)2)

= ||Hn||2uE( n∑j=1

(M2j −M2

j−1))≤ ||Hn||2uE(M2

t )

2

It follows that the Wiener process is a semimartingale.

The set of semimartingales is a very convenient set to work with.

10.10 THEOREM.(a) The set of semimartingales is a vector space.(b) If X ∈ S then for every stopping time the stopped process Xτ := (Xτ∩t)t≥0 is asemimartingale.(c) Let τn ↑ ∞ be a sequence of stopping times such that Xτn ∈ S for every n ∈ N.Then X ∈ S .

The proof is PROBLEM 10.2 . (Hint for part (c): Note that (Xt 6= Xτnt ) ⊆ (τn < t).)

PROBLEM 10.3: Let (Wt) be a Wiener process. Show that (W 2t )t≥0 is a semi-

martingale.

Page 123: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

10.4. EXTENDING THE STOCHASTIC INTEGRAL 113

PROBLEM 10.4: Show that every cadlag martingale (Mt)t≥0 with continuouspaths is a semimartingale.Hint: Let τn = inft : |Mt| ≥ n and show that M τn is a square integrablemartingale for very n ∈ N.

Summing up, we have shown that every cadlag process which is a sum of a continuousmartingale and an adapted process with paths of bounded variation is a semimartingale.

Actually every cadlag martingale is a semimartingale. See Jacod-Shiryaev, [14], Chap-ter I, 4.17.

10.4 Extending the stochastic integral

The extension of the stochastic integral from E to L0 is based on the fact that everyprocess inL0 can be approximated by processes in E converging uniformly on compacttime intervals.

In short, the procedure is as follows. Let X be a semimartingale and let H ∈ L0.Consider some sequence (Hn) in E such that Hn → H uniformly on compact timeintervals and define ∫ T

0

H dX := limn→∞

∫ T

0

Hn dX (6)

However, in order to make sure that such a definition makes sense one has to considerseveral mathematical issues. For the interested reader let us collect some of the details.

10.11 DISCUSSION. The main points of definition (6) are existence and unique-ness of the limit. Let X ∈ S and H ∈ L0. We follow Protter, [20], chapter II,section 4.

(1) One can always find a sequence (Hn) ⊆ E such that

sups≤T

|Hns −Hs|

P→ 0

Note, that such an approximation is in general not available with deterministicRiemannian sequences of subdivisions. However, it is available with Riemanniansequences of subdivisions based on stopping times. Moreover, the constructionrequires right continuous filtrations. Therefore, in general, we require augmentedfiltrations.

(2) Semimartingales satisfy

(Hn) ⊆ E , sups≤T

|Hns |

P→ 0 ⇒ supt≤T

∣∣∣ ∫ t

0HndX

∣∣∣ P→ 0.

(This is slightly stronger than the defining property of semimartingales.)

Page 124: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

114 CHAPTER 10. THE STOCHASTIC INTEGRAL

(3) From (2) it follows that for every sequence (Hn) ⊆ E satisfying (1) the corre-sponding sequence of stochastic integrals

∫ T0 HndX is a Cauchy sequence with

respect to convergence in probability, uniformly on [0, T ]. Therefore there existsa process Y such that

supt≤T

∣∣∣ ∫ t

0HndX − Yt

∣∣∣ P→ 0.

(4) From (2) it follows that the limiting process Y does not depend on the se-quence (Hn).

(5) The type of convergence which is used for the extension procedure impliesthat the processes H •X are cadlag and adapted.

The preceding discussion shows that there is a well-defined stochastic integral process(H • X)t =

∫ T

0H dX whenever H ∈ L0 and X ∈ S. This process is adapted and

cadlag. The question of continuity is answered by the following exercise.

PROBLEM 10.5: Show that ∆(H •X)t = Ht∆Xt.Hint: Explain that the assertion is true for H ∈ E .PROBLEM 10.6: Let H ∈ L0 and X ∈ S. If X is continuous then H X is

continuous, too.

For deriving (understanding) the basic properties or rules of this stochastic integral weneed not go through all details of the mathematical construction. The reason is thatat the end of the mathematical construction it is proved that stochastic integrals canbe approximated by arbitrary Riemannian sequences, even with determinstic intervallimits. The underlying processes H ∈ L0 do not (necessarily) converge uniformly oncompact time intervals but only pointwise, but as long as we are dealing with semi-martingales we have convergence of the integrals.

10.12 THEOREM. Let X be a semimartingale and H ∈ L0. Assume that 0 = tn0 <tn1 < . . . < tnkn

= t is any Riemannian sequence of subdivisions of [0, t]. Then

∣∣∣ kn∑j=1

Htj−1(Xtj −Xtj−1

)−∫ t

0

H dX∣∣∣ P→ 0

PROOF: Protter, [20], Chapter II, Theorem 21. 2

Let us apply 10.12 for our first evaluation of a stochastic integral.

10.13 THEOREM. Let (Wt)t≥0 be a Wiener process. Then∫ t

0

WsdWs =1

2(W 2

t − t) (7)

Page 125: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

10.5. THE WIENER INTEGRAL 115

PROOF: Let 0 = t0 ≤ t1 ≤ . . . ≤ tn = t be an interval partition such that max |tj −tj−1| → 0 as n →∞. This implies by 10.12 that

n∑j=1

Wtj−1(Wtj −Wtj−1

)P→

∫ t

0

Ws dWs

On the other hand we have

W 2t =

n∑j=1

(W 2tj−W 2

tj−1) =

=n∑j=1

(Wtj −Wtj−1)2 + 2

n∑j=1

Wtj−1(Wtj −Wtj−1

)

We know thatn∑j=1

(Wtj −Wtj−1)2 P→ t

This proves the assertion. 2

It is clear that the linearity properties (2) remain valid for the stochastic integral withH ∈ L0. Define ∫ t

s

H dX :=

∫ t

0

1(s,∞)H dW

It is clear that the concatenation property∫ t

s

H dX =

∫ u

s

H dX +

∫ t

u

H dX

holds if s < u < t.

An important extension of ordinary linearity is homogeneity with respect to randomfactors.

PROBLEM 10.7: Let H ∈ L0 and X ∈ S. Show that∫ t

sZH dX = Z

∫ t

sH dX whenever Z ∈ Fs.

Hint: Consider H ∈ E0 and represent 1(s,t]H by a subdivision of (s, t].

10.5 The Wiener integral

Any stochastic integral with respect to the Wiener process is called an Ito-integral.The special case when the integrand is not random it is called a Wiener integral.

Page 126: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

116 CHAPTER 10. THE STOCHASTIC INTEGRAL

Let (Wt)t≥0 be a Wiener process. Let f =∑n

j=1 aj1(tj−1,tj ].PROBLEM 10.8: Show that the Wiener integral

∫ t0 f dW has a normal distribu-

tion with mean 0 and variance∫ t0 f

2(s)ds.PROBLEM 10.9: Show that f • W is a continuous process with independent

increments.

In order to extend these properties to arbitrary non-random f ∈ L0 we need the fol-lowing lemma.

10.14 LEMMA. Let f ∈ L0 be defined on [0, t] and non-random. Let (fn) be a se-quence of functions of the form

∑nj=1 f(tj−1)1(tj−1,tj ] based on a Riemannian sequence

of subdivisions. Then∫ t

0

(fn(s)− f(s))2ds → 0 and V( ∫ t

0

fndW −∫ t

0

f dW)→ 0

PROOF: From left-continuity of f it follows that fn → f pointwise. Since f isbounded on [0, t] the first convergence follows from Lebesgue’s dominated conver-gence theorem.

Thus, (fn) is a Cauchy sequence in L2([0, t]). Since

V( ∫ t

0

fndW −∫ t

0

fm dW)

=

∫ t

0

(fn(s)− fm(s))2ds

we obtain that∫ t

0fndW is a Cauchy sequence in L2(P ) and therefore has a limit Z with

respect to L2-convergence. However, we know that∫ t

0fndW has a limit in probability

which equals∫ t

0f dW . Since (by Cebysev’s inequality) the L2-limit is also the limit

in probability the second convergence follows. 2

10.15 THEOREM. Let f ∈ L0 be non-random. Then the process f W has thefollowing properties:(1)

∫ t

0f dW has a normal distribution with mean 0 and variance

∫ t

0f(s)2ds.

(2) The process has independent increments.(3) The process has continuous paths.

PROOF: Property (3) is a consequence of general assertions on stochastic integrals.Properties (1) and (2) carry over from step functions since the Fourier transforms ofthe joint distributions of increments converge by 10.14. 2

PROBLEM 10.10: Let f ∈ L0 be non-random and let Xt =∫ t0 f dW . Show

that (Xt)t≥0 is a square integrable martingale.PROBLEM 10.11: Let f ∈ L0 be non-random and let Xt =

∫ t0 f dW . Show

that(X2t −

∫ t0 f(s)2ds

)t≥0

is a square integrable martingale.

Page 127: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

10.5. THE WIENER INTEGRAL 117

PROBLEM 10.12: Let f ∈ L0 be non-random and let Xt =∫ t0 f dW . Show

that(eXt−

R t0 f(s)2ds/2

)t≥0

is a square integrable martingale.

Page 128: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

118 CHAPTER 10. THE STOCHASTIC INTEGRAL

Page 129: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

Chapter 11

Stochastic calculus

There are three fundamental rules for calculations with the stochastic integral whichcorrespond to the three rules considered for Stieltjes integration:

(1) the associativity rule,

(2) the integration-by-parts formula,

(3) the chain rule (Ito’s formula)

11.1 The associativity rule

The associativity rule can be formulated briefly as follows.

Let H, G ∈ L0 and X ∈ S . Then

H • (G •X) = (HG) •X, in short: d(G •X) = G dX (1)

To state it a bit more explicitly:∫ t

0

H d(G •X) =

∫ t

0

HG dX, H ∈ L0.

And to say it in words: A stochastic integral whose integrator is itself a stochasticintegral G •X can be written as a stochastic integral with integrator X by multiplyingthe integrand by G.

11.1 ASSOCIATIVITY RULE.(1) Let X ∈ S and G ∈ L0. Then G •X is in S.(2) Let H ∈ L0. Then

∫ t

0H d(G •X) =

∫ t

0HG dX .

119

Page 130: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

120 CHAPTER 11. STOCHASTIC CALCULUS

PROOF: It is easy to see that for Hn ∈ E0 we have∫ t

0

Hn d(G •X) =

∫ t

0

HnG dX

( PROBLEM 11.1 ). If Hn → 0 in an appropriate sense this implies the semimartingaleproperty of G •X . If Hn → H in an appropriate sense the asserted equation follows.2

There is an important consequence of rule (1) which should be isolated.

11.2 TRUNCATION RULE. Let H ∈ L0 and X ∈ S . Then for any stopping time τ∫ t

0

1(0,τ ]H dX =

∫ t∩τ

0

H dX =

∫ t

0

H dXτ

Note, that the second expression means (H •X)t∩τ .

PROOF: Let us prove the truncation rule step by step. Details are PROBLEM 11.2 .

Let X ∈ S , H ∈ L0 and let τ be a stopping time. Then we have

1(0,τ ] •X = Xτ

(Hint: Apply the definition of the stochastic integral.) It follows that∫ T

0

1(0,τ ]H dX =

∫ T∩τ

0

H dX

and ∫ T

0

1(0,τ ]H dX =

∫ T

0

H dXτ

2

The next exercise is a non-trivial application of the truncation rule.

PROBLEM 11.3: Let X and Y be continuous semimartingales and τ a stoppingtime. Show that ∫ t

0XτdY =

∫ τ∩t

0X dY +Xτ∩t(Yt − Yτ∩t)

Hint: Split 1(0,t] = 1(0,τ∩t] + 1(τ∩t,t]. Show that for s ∈ (0, τ ∩ t] we haveXτs = Xs. Show that for s ∈ (τ ∩ t, t] we have Xτ

s = Xτt .

Page 131: LECTURE NOTES Introduction to Probability Theory …math4tune.com/stats 5-11-2007.pdfVIENNA GRADUATE SCHOOL OF FINANCE (VGSF) LECTURE NOTES Introduction to Probability Theory and Stochastic

11.2. QUADRATIC VARIATION AND THE INTEGRATION-BY-PARTS FORMULA121

11.2 Quadratic variation and the integration-by-parts formula

Recall the deterministic integration-by-parts formula for cadlag BV-functions:

f(t)g(t) − f(0)g(0) = ∫_0^t f− dg + ∫_0^t g− df + ∑_{0<s≤t} ∆f(s)∆g(s)

There is a similar formula for arbitrary semimartingales. Note that a non-continuous semimartingale (Xt) can only be used as an integrand of our stochastic integral if it is replaced by its left-continuous version X− := (X_{t−})_{t≥0}.

11.3 DEFINITION. Let X and Y be semimartingales. Define

[X, Y]_t := Xt Yt − X0 Y0 − ∫_0^t X− dY − ∫_0^t Y− dX,  t ≥ 0.

This process is called the quadratic covariation of X and Y.

It is clear that [X, Y] is well-defined and is a cadlag adapted process. If we knew that [X, Y] is even a semimartingale then by the associativity rule we could write

∫_0^t H d(XY) = ∫_0^t H X− dY + ∫_0^t H Y− dX + ∫_0^t H d[X, Y],  H ∈ L0

or in short

d(XY) = X− dY + Y− dX + d[X, Y]

However, this only makes sense if [X, Y] is actually a semimartingale. So let us have a closer look at [X, Y].

PROBLEM 11.4: Show that [X, Y] is linear in both arguments.
PROBLEM 11.5: Show that ∆[X, X]_t = (∆Xt)².

11.4 THEOREM. Let X and Y be semimartingales. For every Riemannian sequence of subdivisions of [0, t]

∑_{j=1}^n (X_{tj} − X_{tj−1})(Y_{tj} − Y_{tj−1}) → [X, Y]_t in probability,  t ≥ 0.

PROOF: Note that for s < t

XtYt −XsYs = (Xt −Xs)(Yt − Ys) + Xs(Yt − Ys) + Ys(Xt −Xs)


and apply it to

Xt Yt = X0 Y0 + ∑_{j=1}^n (X_{tj} Y_{tj} − X_{tj−1} Y_{tj−1})

(Details are PROBLEM 11.6.) 2

From 11.4 it follows that [X, X] =: [X] is the quadratic variation of X. This is an increasing process, hence a FV-process and a semimartingale. Moreover, since

[X, Y] = (1/4)([X + Y] − [X − Y])

the quadratic covariation is also a FV-process and a semimartingale.
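A quick simulation (added here as a sketch, not part of the original notes) makes 11.4 concrete for the Wiener process, where [W]_t = t: the sums of squared increments over finer and finer subdivisions of [0, t] approach t.

```python
import numpy as np

# Sums of squared increments of one Wiener path over finer subdivisions of [0, 1];
# by Theorem 11.4 they converge in probability to [W]_1 = 1.
rng = np.random.default_rng(1)
t, n_fine = 1.0, 2 ** 18
W = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(t / n_fine), n_fine))))

for n in (2 ** 4, 2 ** 8, 2 ** 12, 2 ** 16):
    idx = np.linspace(0, n_fine, n + 1, dtype=int)   # subdivision with n intervals
    print(n, np.sum(np.diff(W[idx]) ** 2))            # approaches t = 1
```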

PROBLEM 11.7: If X and Y are semimartingales then XY is a semimartingale, too.
PROBLEM 11.8: Let X be a FV-process. Show that [X]_t = ∑_{0<s≤t} (∆Xs)².

PROBLEM 11.9: Let X be a continuous FV-process and Y, Z any semimartingales. Show that:
(a) [X] = 0,
(b) [X, Y] = 0,
(c) [X + Y, Z] = [Y, Z].
Hint: For proving (b) apply the Cauchy-Schwarz inequality.

Next we ask for the quadratic variation process of a stochastic integral. The basic result on this topic is prepared by a preliminary assertion. The intuitive meaning of this formula is clear: If the process X is stopped at τ then it is constant after τ and therefore after τ there is no further contribution to the quadratic variation.

PROBLEM 11.10: Let X and Y be semimartingales and τ a stopping time. Show that [X^τ, Y] = [X, Y]^τ.
Hint: This can be shown by combining the truncation lemma with the definition of covariation. Apply 11.13.

11.5 THEOREM. Let X and Y be semimartingales and H ∈ L0. Then

[H •X, Y ] = H • [X,Y ]

PROOF: For details see Protter, [20], chapter II, Theorem 29. The idea is as follows.

From 11.10 we get for stopping times σ ≤ τ

[Xτ −Xσ, Y ] = [X, Y ]τ − [X, Y ]σ


In other words the assertion is true for H = 1_(σ,τ]. Let Z be a random variable which is Fσ-measurable. Then using the explicit formula of 11.4 one can see that the assertion is even true for H = Z 1_(σ,τ] and thus for all H ∈ E.

In order to apply the common induction argument for passing to L0 we need some information on the behaviour of [X, Y] under convergence of semimartingales. This is obtained from 11.3 and 10.12. 2

The most important consequence of 11.5 is the quadratic variation of a stochastic integral:

[H • X]_t = ∫_0^t H² d[X]

This means that the quadratic variation of a stochastic integral ∫_0^t H dX can be written as a Stieltjes integral with respect to the quadratic variation of X.

PROBLEM 11.11: Let X ∈ S. Use integration by parts for finding a formula for dX³.
PROBLEM 11.12: Let X ∈ S be a continuous semimartingale. Show by induction that for k ≥ 2

dX^k = k X^{k−1} dX + (k(k − 1)/2) X^{k−2} d[X]

PROBLEM 11.13: Extend the preceding problem to arbitrary semimartingales.
PROBLEM 11.14: Let (Wt) be a Wiener process and H ∈ L0. Calculate [H • W].
PROBLEM 11.15: Let (Wt) be a Wiener process and H ∈ L0. Show that H • W is a FV-process iff H ≡ 0.

11.6 DEFINITION. Let (Wt)t≥0 be a Wiener process and let a and b be processes in L0. Then a process of the form

Xt = x0 + ∫_0^t a_s ds + ∫_0^t b_s dWs

is called an Ito-process.

Ito-processes are an important class of processes for many kinds of applications.

PROBLEM 11.16: Let

Xt = x0 + ∫_0^t a_s ds + ∫_0^t b_s dWs

be an Ito-process.
(a) Explain why X is a continuous semimartingale.
(b) Show that X determines the processes a and b uniquely.


(c) Calculate the quadratic variation of X.
(d) Evaluate ∫_0^t H dX for H ∈ L0.

11.3 Ito's formula

Now we turn to the most important and most powerful rule of stochastic analysis. It is the extension of the transformation formula for Stieltjes integrals to stochastic integration.

11.7 ITO'S FORMULA. Let X ∈ S and let φ : R → R be twice differentiable with continuous derivatives.
(1) If X is a continuous semimartingale then

φ(Xt) = φ(X0) + ∫_0^t φ′(Xs) dXs + (1/2) ∫_0^t φ′′(Xs) d[X]s

(2) If X is any semimartingale then

φ(Xt) = φ(X0) + ∫_0^t φ′(X_{s−}) dXs + (1/2) ∫_0^t φ′′(X_{s−}) d[X]^c_s + ∑_{0<s≤t} (φ(Xs) − φ(X_{s−}) − φ′(X_{s−}) ∆Xs)

Note that for an FV-process (Xt) Ito's formula equals the transformation formula for Stieltjes integrals.

Our main applications will be concerned with continuous semimartingales. In this case Ito's formula implies for H ∈ L0

∫_0^t H d(φ ∘ X) = ∫_0^t H φ′(X) dX + (1/2) ∫_0^t H φ′′(X) d[X]

Thus, in differential notation Ito's formula for continuous semimartingales can be written as

d(φ ∘ X) = φ′(X) dX + (1/2) φ′′(X) d[X]
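The simplest non-trivial case is φ(x) = x² applied to a Wiener process, where Ito's formula reads W_t² = 2 ∫_0^t Ws dWs + t. The following sketch (an illustrative addition, not part of the original notes) checks this identity on a simulated path using left-point Riemann sums.

```python
import numpy as np

# Check of Ito's formula for phi(x) = x^2 and X = W (so phi'' = 2 and [W]_t = t):
# W_t^2 should equal 2 * int_0^t W_s dW_s + t.
rng = np.random.default_rng(2)
n, T = 200_000, 1.0
dt = T / n
W = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))))

ito_integral = np.sum(W[:-1] * np.diff(W))   # left-point sums approximate int W dW
print(W[-1] ** 2, 2.0 * ito_integral + T)    # the two values are close for fine grids
```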

PROOF: Let us indicate the proof for continuous semimartingales. With integration by parts it can be shown that for k ≥ 2

dX^k = k X^{k−1} dX + (k(k − 1)/2) X^{k−2} d[X]

(use an induction argument). This formula is identical to Ito's formula for φ(x) = x^k. Thus, Ito's formula is true for powers and hence also for polynomials. Since smooth


functions can be approximated uniformly by polynomials in such a way that also the corresponding derivatives are approximated, the assertion follows. 2

PROBLEM 11.17: Prove part (2) of Ito's formula for φ(x) = x^k, k ≥ 2.
PROBLEM 11.18: Let (Wt)t≥0 be a Wiener process. Calculate dW_t^k, k ∈ N.
PROBLEM 11.19: Let (Wt)t≥0 be a Wiener process. Calculate d e^{αWt}, α ∈ R.
PROBLEM 11.20: Let X be an Ito-process. Calculate dX_t^k, k ∈ N.
PROBLEM 11.21: Let X be an Ito-process. Calculate d e^{αXt}, α ∈ R.

11.8 DEFINITION. Let X ∈ S be continuous. Then

E(X) = e^{X − [X]/2}

is called the stochastic exponential of X.

PROBLEM 11.22: Let X ∈ S be continuous and Y := E(X). Show that

Yt = Y0 + ∫_0^t Ys dXs, in short: dY = Y dX

Hint: Let Z = X − [X]/2 and expand e^Z by Ito's formula.
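For X = W the stochastic exponential is E(W)_t = e^{Wt − t/2}, and the identity of Problem 11.22 can be observed numerically. The sketch below is an illustrative addition; it compares the path of E(W) with 1 + ∫_0^t E(W)s dWs computed by left-point sums.

```python
import numpy as np

# The stochastic exponential Y = E(W) with Y_t = exp(W_t - t/2) should satisfy
# Y_t = 1 + int_0^t Y_s dW_s (Problem 11.22); compare both sides along one path.
rng = np.random.default_rng(3)
n, T = 200_000, 1.0
dt = T / n
t = np.linspace(0.0, T, n + 1)
W = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n))))

Y = np.exp(W - t / 2.0)
integral = np.concatenate(([0.0], np.cumsum(Y[:-1] * np.diff(W))))
print(np.max(np.abs(Y - (1.0 + integral))))   # small for fine grids
```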

There is a subtle point to discuss. Consider some positive continuous semimartingale X and a function like φ(x) = log(x) or φ(x) = 1/x. Then we may consider φ(X) since it is well-defined and real-valued. But Ito's formula cannot be applied in the version we have proved. The reason for this difficulty is that the range of X is not contained in a compact interval where φ can be approximated uniformly by polynomials.

PROBLEM 11.23: Let X be a positive continuous semimartingale.
(a) Show that Ito's formula holds for φ(x) = log(x) and for φ(x) = 1/x.
Hint: Let τn = min{t ≥ 0 : Xt ≤ 1/n}. Apply Ito's formula to X^{τn} and let n → ∞.
(b) Show that φ(X) is a semimartingale.
PROBLEM 11.24: Let X be a continuous positive semimartingale. Find ∫_0^t (1/X_s^k) dXs, k ∈ N.
Hint: Find dX^{1−k}.
PROBLEM 11.25: Show that every positive continuous semimartingale X can be written as a stochastic exponential E(L).
Hint: Expand log Xt by Ito's formula.


11.9 THEOREM. (Ito's formula, 2-dimensional case)
Let X, Y ∈ S be continuous and let φ : R² → R be twice differentiable with continuous derivatives. Then

φ(Xt, Yt) = φ(X0, Y0) + ∫_0^t φ′_1(Xs, Ys) dXs + ∫_0^t φ′_2(Xs, Ys) dYs
+ (1/2) ∫_0^t φ′′_11(Xs, Ys) d[X]s + ∫_0^t φ′′_12(Xs, Ys) d[X, Y]s + (1/2) ∫_0^t φ′′_22(Xs, Ys) d[Y]s

PROOF: The assertion is true for polynomials (use integration by parts). Since smooth functions can be approximated uniformly by polynomials in such a way that also the corresponding derivatives are approximated, the assertion follows. 2

PROBLEM 11.26: State and explain Ito's formula for φ(x, t).
Hint: Apply 11.9 to Yt = t.


Chapter 12

Stochastic differential equations

12.1 Introduction

A (Wiener driven) stochastic differential equation is an equation of the form

dXt = b(t,Xt)dt + σ(t,Xt) dWt

where (Wt)t≥0 is a Wiener process and b(t, x) and σ(t, x) are given functions. The problem is to find a process (Xt)t≥0 that satisfies the equation. Such a process is then called a solution of the differential equation.

Note that the differential notation is only an abbreviation for the integral equation

Xt = x0 + ∫_0^t b(s, Xs) ds + ∫_0^t σ(s, Xs) dWs

There are three issues to be discussed for differential equations:

(1) Theoretical answers for existence and uniqueness of solutions.

(2) Finding analytical expressions for solutions.

(3) Calculating solutions by numerical methods.
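Concerning point (3), the simplest numerical method is the Euler (Euler-Maruyama) scheme, which replaces the integral equation by a recursion over a time grid. The following sketch is an illustrative addition (not part of the original notes); the coefficients in the example are an arbitrary choice, namely the Black-Scholes equation dX = µX dt + σX dW.

```python
import numpy as np

def euler_maruyama(b, sigma, x0, T, n, rng):
    """Euler scheme for dX = b(t, X) dt + sigma(t, X) dW on [0, T] with n steps."""
    dt = T / n
    X = np.empty(n + 1)
    X[0] = x0
    for k in range(n):
        t = k * dt
        dW = rng.normal(0.0, np.sqrt(dt))          # Wiener increment over one step
        X[k + 1] = X[k] + b(t, X[k]) * dt + sigma(t, X[k]) * dW
    return X

# Illustrative coefficients: dX = 0.05*X dt + 0.2*X dW (the Black-Scholes equation).
rng = np.random.default_rng(4)
path = euler_maruyama(lambda t, x: 0.05 * x, lambda t, x: 0.2 * x,
                      x0=1.0, T=1.0, n=1000, rng=rng)
print(path[-1])
```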

We will focus on analytical expressions for important but easy special cases. However, let us indicate some issues which are important from the theoretical point of view.

For stochastic differential equations even the concept of a solution is a subtle question. We have to distinguish between weak and strong solutions, even between weak and strong uniqueness. It is not within the scope of this text to give precise definitions of these notions. But the idea can be described in an intuitive way.


A strong solution is a solution where the driving Wiener process (and the underlying probability space) is fixed in advance and the solution (Xt)t≥0 is a function of this given driving Wiener process. A weak solution is an answer to the question: Does there exist a probability space where a process (Xt)t≥0 and a Wiener process (Wt)t≥0 exist such that the differential equation holds?

When we derive analytical expressions for solutions we will derive strong solutions. In particular for linear differential equations (to be defined below) complete formulas for strong solutions are available.

There is a general theory giving sufficient conditions for existence and uniqueness of non-exploding strong solutions. Both the proofs and the assertions of this theory are quite similar to the classical theory of ordinary differential equations. We refer to Hunt-Kennedy [12] and Karatzas-Shreve [15].

Let us introduce some terminology.

A stochastic differential equation is time homogeneous if b(t, x) = b(x) and σ(t, x) = σ(x). A linear differential equation is of the form

dXt = (a0(t) + a1(t)Xt)dt + (σ0(t) + σ1(t)Xt)dWt

It is a homogeneous linear differential equation if a0(t) = σ0(t) = 0.

The simplest homogeneous case is

dXt = µXtdt + σXtdWt

which corresponds to the Black-Scholes model. The constant σ is called the volatility of the model. There are plenty of linear differential equations used in the theory of stochastic interest rates. If (Bt) denotes a process that is a model for a bank account with stochastic interest rate then

rt := B′t / Bt ⇔ Bt = B0 e^{∫_0^t rs ds}

is called the short rate. Popular short rate models are the Vasicek model

drt = a(b− rt)dt + σdWt

and the Hull-White model

drt = (θ(t)− a(t)rt)dt + σ(t)dWt

12.2 The abstract linear equation

Let Y and Z be any continuous semimartingales. The abstract homogeneous linear equation is

dXt = XtdYt


and its solution is known to us as

Xt = x0 e^{Yt − [Y]t/2} = x0 E(Y)t

This is the recipe to solve any homogeneous linear stochastic differential equation. There is nothing more to say about it at the moment.

PROBLEM 12.1: Solve dXt = a(t)Xtdt+ σ(t)XtdWt.
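As a sanity check of the recipe (added here as a sketch, not part of the original notes), consider Problem 12.1 with constant coefficients a(t) ≡ a and σ(t) ≡ s: the recipe gives X_t = x0 exp((a − s²/2)t + sW_t). The snippet compares this closed form with an Euler approximation driven by the same Wiener increments; the constants are illustrative.

```python
import numpy as np

# Constant-coefficient case of Problem 12.1: dX = a*X dt + s*X dW.
# The recipe X = x0 * E(a*t + s*W) gives X_t = x0 * exp((a - s^2/2)*t + s*W_t);
# compare with an Euler approximation driven by the same increments.
rng = np.random.default_rng(5)
a, s, x0, T, n = 0.1, 0.3, 1.0, 1.0, 100_000
dt = T / n
dW = rng.normal(0.0, np.sqrt(dt), n)

X = x0
for k in range(n):
    X += a * X * dt + s * X * dW[k]                     # one Euler step

closed_form = x0 * np.exp((a - s ** 2 / 2.0) * T + s * dW.sum())
print(X, closed_form)                                    # agree up to discretization error
```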

Things become more interesting when we turn to the general inhomogeneous equation

dXt = XtdYt + dZt

There is an explicit expression for the solution but it is much more illuminating to memorize the approach by which one arrives at it.

The idea is to write the equation as

dXt − Xt dYt = dZt

and to find an integrating factor that transforms the left hand side into a total differential.

Let dAt = At dYt and multiply the equation by 1/At giving

(1/At) dXt − (Xt/At) dYt = (1/At) dZt   (1)

Note that

d(1/At) = −(1/At) dYt + (1/At) d[Y]t

Then

d((1/At) Xt) = (1/At) dXt + Xt d(1/At) + d[1/A, X]t
             = (1/At) dXt − (Xt/At) dYt + (Xt/At) d[Y]t − (1/At) d[Y, X]t
             = (1/At) dXt − (Xt/At) dYt − (1/At) d[Y, Z]t

Thus, the left hand side of (1) differs from a total differential by a known BV-function. We obtain

d((1/At) Xt) = (1/At) dZt − (1/At) d[Y, Z]t

leading to

Xt = At (x0 − ∫_0^t (1/As) d[Y, Z]s + ∫_0^t (1/As) dZs)   (2)

Note that the solution is particularly simple if either Y or Z is a BV-process.

PROBLEM 12.2: Fill in and explain all details of the derivation of (2).


12.3 Wiener driven models

The Vasicek model is

dXt = (ν − µXt) dt + σ dWt

For ν = 0 the solution is called the Ornstein-Uhlenbeck process.

The Vasicek model is a special case of the inhomogeneous linear equation, which is clear by setting

dYt = −µ dt and dZt = ν dt + σ dWt

Therefore the integrating factor is At = e^{−µt} and the solution is obtained as in the case of an ordinary linear differential equation.

PROBLEM 12.3: Show that the solution of the Vasicek equation is

Xt = e^{−µt} x0 + (ν/µ)(1 − e^{−µt}) + σ ∫_0^t e^{−µ(t−s)} dWs

PROBLEM 12.4: Derive the following properties of the Vasicek model:
(a) The process (Xt)t≥0 is a Gaussian process (i.e. all joint distributions are normal distributions).
(b) Find E(Xt) and lim_{t→∞} E(Xt).
(c) Find V(Xt) and lim_{t→∞} V(Xt).
(d) Find Cov(Xt, Xt+h) and lim_{t→∞} Cov(Xt, Xt+h).
PROBLEM 12.5: Let X0 ∼ N(ν/µ, σ²/(2µ)). Explore the mean and covariance structure of a Vasicek model starting with X0.
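A Monte Carlo sketch (an added illustration, not part of the original notes) can be used to check the mean and variance that follow from the solution in Problem 12.3, namely E(X_T) = e^{−µT}x0 + (ν/µ)(1 − e^{−µT}) and V(X_T) = σ²(1 − e^{−2µT})/(2µ). The parameter values below are arbitrary illustrative choices.

```python
import numpy as np

# Monte Carlo check of the Vasicek moments implied by Problem 12.3:
# E(X_T) = exp(-mu*T)*x0 + (nu/mu)*(1 - exp(-mu*T)),
# V(X_T) = sigma**2 * (1 - exp(-2*mu*T)) / (2*mu).
rng = np.random.default_rng(6)
nu, mu, sigma, x0, T = 0.5, 2.0, 0.3, 1.0, 1.0        # illustrative parameters
n_steps, n_paths = 2000, 20_000
dt = T / n_steps

X = np.full(n_paths, x0)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)
    X += (nu - mu * X) * dt + sigma * dW               # Euler step of the Vasicek equation

mean_exact = np.exp(-mu * T) * x0 + (nu / mu) * (1.0 - np.exp(-mu * T))
var_exact = sigma ** 2 * (1.0 - np.exp(-2.0 * mu * T)) / (2.0 * mu)
print(X.mean(), mean_exact)
print(X.var(), var_exact)
```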

Let us turn to models that are not time homogeneous.

PROBLEM 12.6: The Brownian bridge:
(a) Find the solution of

dXt = −(1/(1 − t)) Xt dt + dWt,  0 ≤ t < 1.

(b) Show that (Xt)_{0≤t<1} is a Gaussian process. Find the mean and the covariance structure.
(c) Show that Xt → 0 as t → 1.
PROBLEM 12.7: Find the solution of the Hull-White model:

dXt = (θ(t) − a(t)Xt) dt + σ(t) dWt

Finally, let us consider a nonlinear model.


PROBLEM 12.8: Let Zt = E(µt + σWt).
(a) For a > 0 find the differential equation of

Xt := Zt / (1 + a ∫_0^t Zs ds)

(b) What about a < 0?


Chapter 13

Martingales and stochastic integrals

Let (Mt)t≥0 be a martingale. We would like to know for which H ∈ L0

H • M : t ↦ ∫_0^t H dM

is a martingale, too. This is a reasonable question for several reasons. First of all, when we were dealing with stochastic sequences we observed that gambling strategies do not change the martingale properties of the underlying stochastic sequence. The values of such gambling strategies are of the same structure as stochastic integrals with simple integrands. This observation carries over to the continuous case as is shown in theorem 10.6.

A general version of this assertion is a delicate matter. It turns out that, when dealing with martingale properties of stochastic integrals, the notion of an ordinary martingale is not the right concept.

13.1 Locally square integrable martingales

We start with a preliminary assertion concerning the martingale property of stochastic integrals with respect to square integrable martingales. This assertion extends theorem 10.6 and is the basis of many further results. The proof is given at the end of the section.

13.1 LEMMA. Let (Mt)t≥0 be a square integrable martingale. Then H • M is a square integrable martingale for every bounded H ∈ L0.

How can we extend this assertion to more general integrands H ∈ L0? The idea is to truncate the process H ∈ L0 by stopping times. Such a procedure works for left-continuous processes, thus in particular for continuous processes.


PROBLEM 13.1: Show that for every caglad process (Xt)t≤T there is a sequence of stopping times τn such that |X_{τn∩t}| ≤ n and τn ↑ ∞.
Hint: Let τn = inf{t : |Xt| ≥ n}.
PROBLEM 13.2: Discuss the question whether such a truncation procedure can also be applied to cadlag processes with jumps.

Let (Mt) be a square integrable martingale and let H ∈ L0. Then for any stopping time τ we have (H • M)^τ = (1_(0,τ] H) • M. If we choose τ in such a way that H^τ is bounded then the integrand 1_(0,τ] H is bounded and it follows that (H • M)^τ is a square integrable martingale.

As a result we see that even if H • M is not a martingale it can be stopped in such a way that it becomes a martingale and this stopping procedure can be performed arbitrarily late. This leads to the concept of local martingales.

13.2 DEFINITION.
A process (Xt) is a local martingale if there exists a sequence of stopping times τn ↑ ∞ such that X^{τn} is a martingale for every n ∈ N.
A process (Xt) is a locally square integrable martingale if there exists a sequence of stopping times τn ↑ ∞ such that X^{τn} is a square integrable martingale for every n ∈ N.

The term "locally" is used in a very general and extensive way. Whenever a process is such that it can be stopped by a sequence τn ↑ ∞ of stopping times in such a way that the stopped processes have a certain property then it is said that the process has this property locally.

PROBLEM 13.3: Every caglad process is locally bounded.
PROBLEM 13.4: Every continuous martingale is a locally square integrable martingale.
PROBLEM 13.5: Every continuous local martingale is a locally square integrable martingale.

Applying what we know so far leads us to the following assertion.

13.3 LEMMA. If M is a square integrable martingale and if H ∈ L0 then H • M is a locally square integrable martingale.

The proof is PROBLEM 13.6.

It follows that stochastic integrals with respect to the Wiener process or the Poisson process are locally square integrable martingales.

However, stochastic integration is associative, i.e. we may consider stochastic integrals with respect to processes which themselves are stochastic integrals. These, in general, are not necessarily square integrable martingales.

Fortunately, even more is true. The next theorem shows that being a locally square integrable martingale is a property which is stable under stochastic integration. Since


it applies to continuous local martingales and to martingales with uniformly bounded jumps we have arrived at a very useful result.

13.4 THEOREM. Let M be a locally square integrable martingale. Then each stochastic integral H • M with H ∈ L0 is again a locally square integrable martingale.

PROOF: Let σn ↑ ∞ be a sequence of stopping times such that M^{σn} are square integrable martingales and let τn ↑ ∞ be a sequence of stopping times such that H^{τn} are bounded. Then (H • M)^{σn∩τn} are square integrable martingales. 2

PROBLEM 13.7: Let M be a locally square integrable martingale. Show that M² − [M] is a locally square integrable martingale, too.
Hint: Apply integration by parts.
PROBLEM 13.8: Let M be a continuous local martingale. Then each stochastic integral H • M with H ∈ L0 is again a continuous local martingale.

It can be shown that the assertion of Theorem 13.4 remains valid if continuity or square integrability is removed. But this result is beyond our scope.

We finish this section by proving Lemma 13.1.

PROOF: We show that E((H • M)²_t) < ∞ and E(∫_0^t H dM) = 0.

Let 0 = t0 < t1 < . . . < tn = t be the n-th element of a Riemannian sequence of subdivisions and define

Hn = ∑_{j=1}^n H_{tj−1} 1_(tj−1,tj]

Then E(∫_0^t Hn dM) = 0 and ∫_0^t Hn dM → ∫_0^t H dM in probability. It remains to show that E((∫_0^t Hn dM)²) is bounded since this implies ∫_0^t Hn dM → ∫_0^t H dM in L¹.

For this, note that

E((∫_0^t Hn dM)²) = E((∑_{j=1}^n H_{tj−1}(M_{tj} − M_{tj−1}))²)
                  = ∑_{j=1}^n E(H²_{tj−1}(M_{tj} − M_{tj−1})²)
                  ≤ C ∑_{j=1}^n E((M_{tj} − M_{tj−1})²) = C (E(M²_t) − E(M²_0))

2

PROBLEM 13.9: Why is it sufficient in the proof of 13.1 to show E(∫_0^t H dM) = 0?


PROBLEM 13.10: Let Xn → X in probability and E(X²_n) ≤ C, n ∈ N. Show that ||Xn − X||_1 → 0.
PROBLEM 13.11: Explain why in the proof of 13.1

E([∑_{j=1}^n H_{tj−1}(M_{tj} − M_{tj−1})]²) = ∑_{j=1}^n E(H²_{tj−1}(M_{tj} − M_{tj−1})²)

13.2 Square integrable martingales

Although being a locally square integrable martingale is a sufficiently strong martingale property for many purposes it is also important to know when the martingale property holds without localization. In particular, we would like to know under what circumstances Ito-integrals are martingales.

First, we note that Kolmogoroff's inequality for martingales carries over from the discrete time case to the continuous time case.

13.5 LEMMA. Let M be a square integrable martingale. Then

E(sup_{t≤T} M²_t) ≤ 4 E(M²_T) < ∞

This gives a more or less obvious criterion for the martingale property.

13.6 COROLLARY. A locally square integrable martingale M is a square integrable martingale iff E(sup_{t≤T} M²_t) < ∞.

PROOF: We have to show that E(M_{τ∩t}) = E(M0). This is certainly true for the processes stopped at a suitable localization. We can get rid of the stopping times by applying the dominated convergence theorem. 2

PROBLEM 13.12: Every bounded local martingale is a martingale.

The preceding criterion is inconvenient to apply. It is much better to have a criterion in terms of analytically tractable expressions like the quadratic variation. The following theorem contains a fundamental property of square integrable martingales which is a first step in that direction.

13.7 THEOREM. For any square integrable martingale (Mt)t≥0 the process M²_t − [M]_t is a martingale. In particular, we have E(M²_t) = E([M]_t), t ≤ T.

PROOF: It is sufficient to prove the second part of the assertion. Applying the second part to stopped processes gives the martingale property.


In view of problem 13.6 there is a sequence of stopping times τn ↑ ∞ such that M²_{τn∩t} − [M]_{τn∩t} are martingales, which implies E(M²_{τn∩t}) = E([M]_{τn∩t}). By Kolmogoroff's maximal inequality we obtain E(M²_t) = E([M]_t). 2

Note that 13.7 is known to us for the Wiener process. Thus, it is a generalization of a familiar structure. Let us turn to some consequences of 13.7.

PROBLEM 13.13: Show that every continuous locally square integrable martingale of finite variation is necessarily constant.
PROBLEM 13.14: Let (Mt)t≥0 be a locally square integrable martingale. If (At) is a continuous adapted process of bounded variation such that M²_t − At is a martingale, then At = [M]_t.

Now we are in a position to give a sufficiently general criterion for the martingale property of a local martingale.

13.8 THEOREM. A locally square integrable martingale M is a square integrable martingale iff E([M]_T) < ∞.

PROOF: Necessity follows from 13.7.

To prove sufficiency let (τn) be a localizing sequence and note that Kolmogoroff's maximal inequality implies

E(sup_{t≤T} M²_{τn∩t}) ≤ 4 E(M²_{τn∩T}) = 4 E([M]_{τn∩T}) ≤ 4 E([M]_T).

Let n → ∞, apply Levi's theorem and Corollary 13.6. 2

13.9 COROLLARY. Let (Mt)t≥0 be a locally square integrable martingale and H ∈ L0. Then H • M is a square integrable martingale iff

E(∫_0^T H²_s d[M]_s) < ∞.

PROBLEM 13.15: Explain how 13.9 follows from the preceding assertions.
PROBLEM 13.16: Discuss the martingale properties of Ito-integrals.
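For M = W and the caglad integrand H_s = W_s the criterion of 13.9 is satisfied, since E(∫_0^T W_s² ds) = T²/2 < ∞; hence W • W is a square integrable martingale with E((W • W)_T) = 0 and, by 13.7, E((W • W)_T²) = T²/2. The Monte Carlo sketch below (an added illustration, not part of the original notes) checks both numbers.

```python
import numpy as np

# Monte Carlo illustration of 13.7/13.9 for H . W with H_s = W_s:
# the integral has mean 0 and E((H . W)_T^2) = E(int_0^T W_s^2 ds) = T^2 / 2.
rng = np.random.default_rng(7)
T, n_steps, n_paths = 1.0, 1000, 50_000
dt = T / n_steps

I = np.zeros(n_paths)          # running stochastic integral, one value per path
W = np.zeros(n_paths)
for _ in range(n_steps):
    dW = rng.normal(0.0, np.sqrt(dt), n_paths)
    I += W * dW                # left-point increment W_s * (W_{s+dt} - W_s)
    W += dW

print(I.mean())                          # approximately 0
print((I ** 2).mean(), T ** 2 / 2.0)     # approximately T^2 / 2
```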

13.3 Levy’s theorem

Levy’s theorem is a far reaching characterization of the Wiener process. The remark-able fact is that the Wiener process is a local martingale which is uniquely determinedby its quadratic variation.

13.10 THEOREM. (Levy)Let (Mt) be a continuous local martingale. If [M ]t = t then (Mt) is a Wiener process.


PROOF: Let

Zt = e^{iaMt + a²t/2}

Note that (Zt)t≤T is bounded. Moreover, as can be shown by a complex version of Ito's formula, (Zt) satisfies the linear differential equation

Zt = 1 + ia ∫_0^t Z dM

Hence, (Zt) is a local martingale and, since bounded, even a square integrable martingale. The martingale property implies

E(Zt | Fs) = Zs

which means

E(e^{ia(Mt − Ms)} | Fs) = e^{−a²(t−s)/2}

Thus, the increments are independent and N(0, t − s)-distributed. 2

PROBLEM 13.17: Let (Xt) be a continuous process with independent increments, E(Xt) = 0 and V(Xt) = t, t ≥ 0. Show that (Xt) is a Wiener process.
PROBLEM 13.18: Let (Xt) be a continuous process with independent increments, E(Xt) = 0, and assume that g(t) := V(Xt) is continuous and strictly increasing. Show that there exists a Wiener process (Wt) such that Xt = W_{g(t)}.
PROBLEM 13.19: Let (Xt) be a continuous local martingale such that

[X]_t = ∫_0^t σ²(s) ds

where σ²(t) > 0 is continuous. Show that there exists a Wiener process (Wt) such that

Xt = X0 + ∫_0^t σ(s) dWs

13.4 Martingale representation

Let (Wt)t≥0 be a Wiener process and let (Ft) be the augmented internal history. We know that

∫_0^t Hs dWs, 0 ≤ t ≤ T,

is a square integrable martingale iff

E(∫_0^T H²_s ds) < ∞.

Now, in this special case there is a remarkable converse: Each square integrable martingale with respect to (Ft) arises in this way!


The case of caglad processes

Actually, we have to be a bit more modest: If we confine ourselves (as we have done so far) to H ∈ L0 (caglad adapted processes) then square integrable martingales can only be approximated with arbitrary precision by stochastic integrals.

The martingale representation fact is a consequence of the following seemingly simpler assertion: Each random variable C ∈ L2(FT) (each "claim") can be (approximately) written as a constant plus a stochastic integral ("hedged" by a self-financing strategy).

Let us introduce some simplifying terminology.

13.11 DEFINITION. A set C of random variables in L2(FT) is called dense if for every C ∈ L2(FT) there is a sequence (Cn) ⊆ C such that E((Cn − C)²) → 0.
A set C of random variables in L2(FT) is called total if the linear hull of C is dense.

Thus, we want to prove

13.12 THEOREM. The set of all integrals a + ∫_0^T H dW with a ∈ R, H ∈ L0 and E(∫_0^T H²_s ds) < ∞ is dense in L2(FT).

PROOF: The starting point is that FT is generated by (Ws)s≤T and therefore also by (e^{Ws})s≤T. Therefore an obvious dense set consists of the functions

φ(e^{Ws1}, e^{Ws2}, . . . , e^{Wsn}),

where φ is some continuous function with compact support and s1, s2, . . . , sn is some finite subset of [0, T]. Every continuous function can be approximated uniformly by polynomials (Weierstrass' theorem) and polynomials are linear combinations of powers. Thus, we arrive at a total set consisting of

exp(∑_{j=1}^n kj W_{sj})

which after reshuffling can be written as

exp(∑_{j=1}^n a_{j−1}(W_{sj} − W_{sj−1})) = exp(∫_0^T f(s) dWs)   (1)

for some bounded left-continuous (step) function f : [0, T] → R. It follows that the set of functions (differing from (1) by constant factors)

GT = exp(∫_0^T f(s) dWs − (1/2) ∫_0^T f²(s) ds)

is total when f varies in the set of all bounded left-continuous (step) functions f : [0, T] → R.


Recall that (Gt)t≤T is a square integrable martingale and satisfies

Gt = 1 + ∫_0^t G d(f • W) = 1 + ∫_0^t Gs f(s) dWs

From 13.7 it follows that

E(∫_0^T G²_s f²(s) ds) < ∞.

Therefore, the set of integrals

1 + ∫_0^t Hs dWs where H ∈ L0 and E(∫_0^t H²_s ds) < ∞

is total and by linearity of the integral the assertion follows. 2

How can we apply this result to martingale representation?

Note that for the proof of the preceding theorem we did not make use of the "usual conditions". So at first, let (Wt) be a Wiener process with inner history (Ft). Assume that (Mt) is a square integrable martingale w.r.t. (Ft). If there is a representation

MT = M0 + ∫_0^T Hs dWs,

then it follows from the martingale property of the stochastic integral that for every t ≤ T

Mt = E(MT | Ft) = M0 + E(∫_0^T Hs dWs | Ft) = M0 + ∫_0^t Hs dWs   P-a.s.

However, this does not imply that the paths of the processes are equal P-almost surely. This is the reason why we have to turn to the augmented filtration. Then we may assume that (Mt) is cadlag and that the stochastic integral is continuous. In this way it follows that the martingale (Mt) has continuous paths, too, and that both processes are indistinguishable.

Predictable processes

So far we have defined stochastic integrals with integrands that are adapted and left-continuous (with right limits). This class of processes is sufficiently large for applications but not for theoretical purposes. A typical example where the restriction to caglad processes is annoying is the martingale representation theorem.

The reason why we can only approximate martingales by stochastic integrals is that the class of caglad processes that we are presently using as integrands is too small. We have to enlarge the space of integrands.


Now we will indicate how to enlarge the space of integrands. We will consider only the case of the Wiener process. The approach is the same for every square integrable martingale.

Consider a time interval [0, T] with finite horizon T. Stochastic processes (Hs) are functions

H : Ω × [0, T] → R : (ω, s) ↦ Hs(ω)

The simplest example of an adapted caglad process is

Hs(ω) := 1_F(ω) 1_(t1,t2](s) = 1_{F×(t1,t2]}(ω, s), F ∈ F_{t1}.

This elementary process can be written as the indicator function of a "predictable rectangle" F × (t1, t2] ⊆ Ω × [0, T], F ∈ F_{t1}.

13.13 DEFINITION. The σ-field PT on Ω × [0, T] which is generated by the predictable rectangles F × (t1, t2], F ∈ F_{t1} (including Ω × {0}), is the predictable σ-field. A PT-measurable function H : Ω × [0, T] → R is called a predictable process.

13.14 THEOREM. All processes in L0 (caglad and adapted) are predictable.

PROOF: Let Hs(ω) = a(ω) 1_(t1,t2](s) where a is F_{t1}-measurable. Since a is the limit of linear combinations of indicators in F_{t1} the process H is the limit of linear combinations of indicators of predictable rectangles and thus predictable. It follows that all processes in E0 are predictable and thus all processes in L0. 2

Our stochastic integral was defined for integrands H ∈ L0 which are predictable processes. It is our goal to extend the notion of the stochastic integral to predictable integrands. This will be done by an approximation process in some suitably defined L2-space.

We start by defining the measure µ of the underlying measure space. On the predictable σ-field PT we define the measure

µ(A) = E(∫_0^T 1_A(ω, s) ds), A ∈ PT.   (2)

By measure theoretic induction this measure satisfies

∫ H² dµ = E(∫_0^T H²_s ds), H ∈ L2(Ω × [0, T], PT, µ).   (3)

From 4.17 it follows that the set of all linear combinations of indicators of predictable rectangles is dense in L2(Ω × [0, T], PT, µ). Since L0 contains this set it follows that the set L0 ∩ L2(Ω × [0, T], PT, µ) is a dense subspace of L2(Ω × [0, T], PT, µ). Thus, every square integrable predictable process can be approximated by caglad processes.


Now it is clear how the extension of the stochastic integral from L0 to predictable processes will be carried out. For each H ∈ L2(Ω × [0, T], PT, µ) we may find a sequence Hn ∈ L0 ∩ L2(Ω × [0, T], PT, µ) such that

lim_{n→∞} ∫ (Hn − H)² dµ = 0

Since we have

E([∫_0^T (H^m_s − H^n_s) dWs]²) = E(∫_0^T (H^m_s − H^n_s)² ds) = ∫ (H^m − H^n)² dµ

the stochastic integrals ∫_0^T Hn dW converge in L2(FT, P). Thus, we may define

∫_0^T H dW := lim_{n→∞} ∫_0^T Hn dW

Essentially all properties of the stochastic integral on L0 carry over to the stochastic integral for predictable integrands.

In particular the martingale representation theorem can be stated in a satisfactory way.

13.15 THEOREM. Let (Wt) be a Wiener process and (Ft) its augmented inner history. For every C ∈ L2(FT) there exists a predictable process H ∈ L2(Ω × [0, T], PT, µ) such that

C = E(C) + ∫_0^T H dW

13.16 MARTINGALE REPRESENTATION THEOREM.
Let (Wt) be a Wiener process and (Ft) its augmented inner history. For every square integrable martingale (Mt) there exists a predictable process H ∈ L2(Ω × [0, T], PT, µ) such that

Mt = M0 + ∫_0^t H dW, t ≤ T.

PROBLEM 13.20: Let (Wt) be a Wiener process and (Ft) its inner history. Show that for every continuous local martingale (Mt) there exists a predictable process H such that

∫_0^T H²_s ds < ∞ P-a.e. and Mt = M0 + ∫_0^t H dW, t ≤ T.


Chapter 14

Change of probability measures

14.1 Equivalent probability measures

14.1 DEFINITION. The probability measures P|F and Q|F are said to be equivalent if they are mutually absolutely continuous (P ∼ Q), i.e.

P(F) = 0 ⇔ Q(F) = 0 whenever F ∈ F

Obviously, we have P ∼ Q iff Q ≪ P and P ≪ Q. Therefore the Radon-Nikodym derivatives dQ/dP and dP/dQ exist. The following problems contain general properties of Radon-Nikodym derivatives.

PROBLEM 14.1: Let P ∼ Q. Show that dP/dQ = 1 / (dQ/dP).
Hint: Show that for all F ∈ FT

∫_F ((dP/dQ) · (dQ/dP) − 1) dP = 0

PROBLEM 14.2: Let Q ≪ P. Show that P ∼ Q iff dQ/dP > 0 P-a.s.
Hint: For proving "⇐", show that Q(F) = 0 implies 1_F dQ/dP = 0 P-a.s.

PROBLEM 14.3: Let Q ≪ P and (An) ⊆ F. Then P(An) → 0 implies Q(An) → 0.
Hint: Let ε > 0 and choose M such that

∫_{dQ/dP > M} (dQ/dP) dP < ε.


(Why is this possible?) Let B = (dQ/dP > M) and split An = (An ∩ B) ∪ (An ∩ B^c).

14.2 The exponential martingale

Let (Lt) be a continuous positive semimartingale on [0, T]. We know that L can be written as a stochastic exponential:

dMt = (1/Lt) dLt ⇔ dLt = Lt dMt ⇔ Lt = L0 e^{Mt − [M]t/2}

The following assertion is basic.

14.2 THEOREM. Let L and M be continuous semimartingales such that

Lt = e^{Mt − [M]t/2}

(i.e. L0 = 1). Then:
(1) L is a local martingale iff M is a local martingale.
(2) If L is a local martingale then it is a supermartingale and satisfies E(LT) ≤ 1.
(3) If L is a local martingale then it is a martingale iff E(LT) = 1.

PROOF: Part (1) is true since stochastic integrals of continuous local martingales are continuous local martingales.

As for part (2) we first show that a local martingale which is bounded from below is a supermartingale, i.e. E(Lt | Fs) ≤ Ls, s < t ≤ T. Let τn be a localizing sequence. Then we have E(L^{τn}_t | Fs) = L^{τn}_s, s < t ≤ T. Now, by Fatou's lemma it follows that

E(Lt | Fs) = E(lim_n L^{τn}_t | Fs) ≤ lim_n E(L^{τn}_t | Fs) = lim_n L^{τn}_s = Ls

This is the supermartingale property.

It is now easy to see (next problem) that the martingale property is equivalent to E(LT) = L0. 2

PROBLEM 14.4: Let (Lt)t≤T be a supermartingale which is bounded from below. Show that it is a martingale iff E(LT) = L0.

14.3 Likelihood processes

Recall that we are working with a fixed probability space (Ω, F, P) and a filtration (Ft)0≤t≤T. We assume that F = FT since FT is the largest σ-field which is needed to study processes adapted to (Ft)t≤T.


We have a basic probability measure P|FT. Now, we consider a second probability measure Q|FT.

14.3 DEFINITION. Let P ∼ Q be two equivalent probability measures on FT. Then

Lt := E(dQ/dP | Ft), 0 ≤ t ≤ T,

is called the likelihood process of Q w.r.t. P.

Note that LT = dQ/dP since P and Q are considered as probability measures on FT, which implies that dQ/dP is FT-measurable. The random variables Lt of the likelihood process have the property of being the Radon-Nikodym derivatives of Q and P restricted to Ft.

Let P|FT ∼ Q|FT.

PROBLEM 14.5: Show that Lt = d(Q|Ft) / d(P|Ft), t ≤ T.

PROBLEM 14.6: Let P|FT ∼ Q|FT. Show that the likelihood process (Lt)t≤T is a positive martingale.
Hint: For proving positivity note that P|Ft ∼ Q|Ft, t ≤ T.

Let Q|FT ∼ P|FT and let (Lt)t≤T be the likelihood process. Since the likelihood process is a positive semimartingale it can be written as

Lt = E(M)t = e^{Mt − [M]t/2}, where dMt = (1/Lt) dLt.

Since the likelihood process is even a martingale it follows that E(LT) = 1 and that (Mt)t≤T is a local martingale. But also the converse is true: Positive martingales can be used to define equivalent probability measures.

PROBLEM 14.7: Let (Mt)t≤T be a continuous local martingale and let Lt = E(M)t. Assume that E(LT) = 1 and define

Q := LT P ⇔ Q(F) = ∫_F LT dP, F ∈ FT.

Show that Q is a probability measure and that (Lt)t≤T is the likelihood process of Q w.r.t. P.

For calculation purposes we need a formula for the relation between conditional expectations w.r.t. Q and w.r.t. P. This is the so-called "Bayes-formula".


PROBLEM 14.8: Let (Ω, F, P) be a probability space and let (Ft)t≥0 be a filtration. Let Q|FT ∼ P|FT be equivalent probability measures and let (Lt)t≤T be the likelihood process.
Prove the Bayes-formula:

EQ(X | Fs) = EP(X LT | Fs) / EP(LT | Fs)

whenever X is FT-measurable and X ≥ 0 or X ∈ L1(Q).

Note that if X is Ft-measurable, t ≤ T, then the Bayes-formula holds with LT replaced by Lt.

14.4 Girsanov’s theorem

In financial mathematics an important and common method of pricing claims is to calculate expectations under some probability measure Q which is different from P. Therefore we have to discuss some substantial features of such a change of measure.

The first problem is for warming up.

PROBLEM 14.9: Let (Wt)t≥0 be a Wiener process on a probability space (Ω, F, P) and (Ft)t≥0 its internal history. Define

Lt := e^{aWt − a²t/2}, t ≤ T,

and let Q := LT P.
(a) Show that Q|FT is equivalent to P|FT.
(b) Show that W̃t := Wt − at, t ≤ T, is a Wiener process under Q.
Hint: Prove that for s < t

EQ(e^{λ(W̃t − W̃s)} | Fs) = e^{λ²(t−s)/2}.
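The change of measure in Problem 14.9 can be illustrated by Monte Carlo (an added sketch, not part of the original notes): a Q-expectation is a P-expectation weighted by L_T, so the terminal value W̃_T = W_T − aT should have weighted mean 0 and weighted second moment T. The value of a below is arbitrary.

```python
import numpy as np

# Monte Carlo sketch of Problem 14.9: under Q = L_T * P with
# L_T = exp(a*W_T - a^2*T/2), the variable W_T - a*T is N(0, T)-distributed.
# Q-expectations are estimated as P-averages weighted by L_T.
rng = np.random.default_rng(8)
a, T, n_paths = 0.7, 1.0, 500_000

W_T = rng.normal(0.0, np.sqrt(T), n_paths)       # terminal values under P
L_T = np.exp(a * W_T - a ** 2 * T / 2.0)         # likelihood L_T = dQ/dP
W_tilde = W_T - a * T

print(L_T.mean())                                 # approximately 1 (Q is a probability measure)
print(np.average(W_tilde, weights=L_T))           # approximately 0
print(np.average(W_tilde ** 2, weights=L_T))      # approximately T
```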

Now we turn to theory. The first assertion deals with the inheritance of the semimartingale property. Since the semimartingale property is concerned with convergence in probability the assertion of 14.4 follows from problem 14.3.

14.4 THEOREM. Let Q|FT ∼ P|FT. If (Xt)t≤T is a semimartingale under P then it is a semimartingale under Q.

Which other properties of a stochastic process do not change under a measure change? It is clear that continuity of paths is left unchanged. A remarkable but plausible fact is that the quadratic variation is invariant under a measure change.

14.5 THEOREM. Let Q|FT ∼ P|FT and let (Xt)t≤T be a continuous semimartingale. Then the quadratic variation under P coincides with the quadratic variation under Q.


PROBLEM 14.10: Explain how 14.4 and 14.5 follow from 14.3.

Next we consider the question of how martingale properties are influenced by a change of probability measures. The basic result in this direction is Girsanov's theorem.

We begin with preliminary assertions.

14.6 LEMMA. Let Q ∼ P and let (Lt) be the likelihood process. Then a process (Xt) is a local Q-martingale iff (Lt Xt) is a local P-martingale.

PROOF: It is sufficient to prove the assertion: A process (Xt) is a Q-martingale iff (Lt Xt) is a P-martingale.

The Bayes formula implies

EQ(Xt | Fs) = EP(Xt Lt | Fs) / EP(Lt | Fs) = EP(Xt Lt | Fs) / Ls

which is equivalent to

EQ(Xt | Fs) Ls = EP(Xt Lt | Fs)

Hence

EQ(Xt | Fs) = Xs ⇔ EP(Xt Lt | Fs) = Xs Ls

2

14.7 THEOREM. (Girsanov)
Let Q|FT ∼ P|FT with continuous likelihood process Lt = E(Z)t, t ≤ T. If (Xt)t≤T is a continuous local P-martingale then Xt − [X, Z]t, t ≤ T, is a local Q-martingale.

PROOF: In order to show that Xt − [X, Z]t is a local Q-martingale we have to show that Lt(Xt − [X, Z]t) is a local P-martingale. This is done by integration by parts.

The complete proof is PROBLEM 14.11 . 2

Actually Girsanov's theorem provides an assertion on compensators. The compensator of a continuous semimartingale (Xt) is a continuous FV-process (At) such that Xt − At is a local martingale. It can be shown that a continuous compensator is uniquely determined.

If (Xt)t≤T is a continuous local P-martingale then its compensator under P is zero. On the other hand, we have

Xt = (Xt − [X, Z]t) + [X, Z]t, t ≤ T.

Girsanov's theorem tells us that Xt − [X, Z]t is a local Q-martingale. Therefore ([X, Z]t)t≤T is the compensator of (Xt)t≤T under Q.

Girsanov's theorem is of great practical importance. It provides a formula for the compensator of a semimartingale after a change of measure.


PROBLEM 14.12: Let (Wt)t≤T be a Wiener process under P. Let Q be the probability measure with likelihood process E(σW)t, σ > 0.
(a) Find the compensator (At)t≤T of (Wt)t≤T under Q.
(b) Explain why the compensated process (Wt − At)t≤T is a Wiener process.
PROBLEM 14.13: Let (Wt)t≤T be a Wiener process and define Xt = at + σWt, t ≤ T. Find a martingale measure, i.e. a probability measure Q such that (Xt)t≤T is a Q-martingale.
PROBLEM 14.14: Let (Wt)t≤T be a Wiener process under P and let dXt = a(t)dWt where a(t) ≠ 0 is continuous on [0, T]. Let Q be the probability measure with likelihood process E(σ • W)t where σ(t) > 0 is continuous on [0, T].
(a) Find the compensator (At)t≤T of (Xt)t≤T under Q.
(b) Find the distribution of (Xt − At)t≤T under Q.
PROBLEM 14.15: Let (Wt) be a Wiener process under P and let dXt = a(t)dt + σ(t)dWt where a(t) and σ(t) > 0 are continuous on [0, T]. Find a martingale measure, i.e. a probability measure Q such that (Xt) is a Q-martingale.

The following problem shows that sometimes a change of measure can reduce the drift term of a stochastic differential equation to zero.

PROBLEM 14.16: Let (Wt)t≤T be a Wiener process under P and let (Xt)t≤T be a solution of the stochastic differential equation

dXt = b(Xt, t)dt + σ(Xt, t)dWt, σ(x, t) > 0.

Define

dZt = −(b(Xt, t)/σ(Xt, t)) dWt

and let Q := LT P with Lt = E(Z)t.
Assume that EP(LT) = 1! (This depends on the properties of b(x, t) and σ(x, t).)
Show that there is a Q-Wiener process (W̃t) such that

dXt = σ(Xt, t) dW̃t

Hint: Show that (Xt)t≤T is a local Q-martingale and that ∫ (1/σ(Xt, t)) dXt is a Q-Wiener process.


Bibliography

[1] Heinz Bauer. Probability theory. Translated from the German by Robert B. Burckel. de Gruyter Studies in Mathematics 23. Berlin: Walter de Gruyter, 1996.

[2] Heinz Bauer. Measure and integration theory. Translated from the German by Robert B. Burckel. de Gruyter Studies in Mathematics 26. Berlin: de Gruyter, 2001.

[3] Tomasz R. Bielecki and Marek Rutkowski. Credit risk: Modelling, valuation and hedging. Springer Finance. Berlin: Springer, 2002.

[4] Tomas Bjoerk. Arbitrage Theory in Continuous Time. Oxford University Press, 2004.

[5] Pierre Bremaud. Point processes and queues. Martingale dynamics. Springer Series in Statistics. New York - Heidelberg - Berlin: Springer-Verlag, 1981.

[6] Pierre Brémaud. An introduction to probabilistic modeling. Undergraduate Texts in Mathematics. New York etc.: Springer-Verlag, 1988.

[7] Pierre Brémaud. Markov chains. Gibbs fields, Monte Carlo simulation, and queues. Texts in Applied Mathematics. New York, NY: Springer, 1999.

[8] Jean Dieudonné. Foundations of modern analysis. Enlarged and corrected printing. New York-London: Academic Press, 1969.

[9] Michael U. Dothan. Prices in financial markets. New York etc.: Oxford University Press, 1990.

[10] Edwin Hewitt and Karl Stromberg. Real and abstract analysis. A modern treatment of the theory of functions of a real variable. 3rd printing. Graduate Texts in Mathematics 25. New York - Heidelberg - Berlin: Springer-Verlag, 1975.


[11] John C. Hull. Options, futures, and other derivatives. 5th ed. Prentice-Hall International Editions. Upper Saddle River, NJ: Prentice Hall, 2003.

[12] P.J. Hunt and J.E. Kennedy. Financial derivatives in theory and practice. Revised ed. Wiley Series in Probability and Statistics. Chichester: John Wiley & Sons, 2004.

[13] Albrecht Irle. Financial mathematics. The evaluation of derivatives. (Finanzmathematik. Die Bewertung von Derivaten.) 2nd revised and extended edition. Teubner Studienbücher Mathematik. Stuttgart: Teubner, 2003.

[14] Jean Jacod and Albert N. Shiryaev. Limit theorems for stochastic processes. 2nd ed. Grundlehren der Mathematischen Wissenschaften 288. Berlin: Springer, 2003.

[15] Ioannis Karatzas and Steven E. Shreve. Brownian motion and stochastic calculus. 2nd ed. Graduate Texts in Mathematics 113. New York etc.: Springer-Verlag, 1991.

[16] Ioannis Karatzas and Steven E. Shreve. Methods of mathematical finance. Applications of Mathematics. Berlin: Springer, 1998.

[17] Marek Musiela and Marek Rutkowski. Martingale methods in financial modelling. 2nd ed. Stochastic Modelling and Applied Probability 36. Berlin: Springer, 2005.

[18] Salih N. Neftci. Introduction to the mathematics of financial derivatives. 2nd ed. Orlando, FL: Academic Press, 2000.

[19] Philip Protter. Stochastic integration without tears (with apology to P. A. Meyer). Stochastics, 16:295–325, 1986.

[20] Philip E. Protter. Stochastic integration and differential equations. 2nd ed. Applications of Mathematics 21. Berlin: Springer, 2004.

[21] A.N. Shiryaev. Probability. Translated from the Russian by R. P. Boas. 2nd ed. Graduate Texts in Mathematics 95. New York, NY: Springer-Verlag, 1995.

[22] Paul Wilmott. Paul Wilmott on Quantitative Finance, Volume One. John Wiley and Sons, 2000.

[23] Paul Wilmott. Paul Wilmott on Quantitative Finance, Volume Two. John Wiley and Sons, 2000.