Walks and Branching


8/2/2019 Walks and Branching

PSTAT160A: Random Walks and Branching Processes
University of California, Santa Barbara

    Gerard Brunick

    Last Updated: February 2, 2012

These notes are a minor modification of a set of notes which were generously shared with the present author by Gordan Zitkovic, who currently works in the Department of Mathematics at The University of Texas at Austin. Any mistakes in these notes were undoubtedly introduced by the present author when he modified the original presentation.

    Contents

1 Random Walks I
  1.1 Stochastic processes
  1.2 The canonical probability space
  1.3 Constructing the random walk
  1.4 The reflection principle

2 Generating Functions
  2.1 Generating functions
  2.2 Associating a generating function with a random variable
  2.3 Convolution
  2.4 Moments
  2.5 Random sums

3 Random Walks II
  3.1 Stopping times
  3.2 Wald's identity II
  3.3 The distribution of the first hitting time T1
  3.4 Strong Markov property

4 Branching Processes
  4.1 A bit of history
  4.2 A mathematical model
  4.3 Construction and simulation of branching processes
  4.4 A generating-function approach
  4.5 Extinction probability


    1 Random Walks I

    1.1 Stochastic processes

Definition 1.1. Let T be a subset of [0, ∞). A family of random variables (Xt)t∈T, indexed by T, is called a stochastic (or random) process. When T = N (or T = N0), (Xt)t∈T is said to be a discrete-time process, and when T = [0, ∞), it is called a continuous-time process.

When T is a singleton (say T = {1}), the process (Xt)t∈T = X1 is really just a single random variable. When T is finite (e.g., T = {1, 2, . . . , n}), we get a random vector. Therefore, stochastic processes are generalizations of random vectors. The interpretation is, however, somewhat different. While the components of a random vector usually (but not always) stand for different spatial coordinates, the index t ∈ T is more often than not interpreted as time. Stochastic processes usually model the evolution of a random system in time. When T = [0, ∞) (continuous-time processes), the value of the process can change at every instant. When T = N (discrete-time processes), the changes occur at discrete times.

In contrast to the case of random vectors or random variables, it is not easy to define a notion of a density (or a probability mass function) for a stochastic process. Without going into the details of why exactly this is a problem, let me just mention that the main culprit is infinity. One usually considers a family of (discrete, continuous, etc.) finite-dimensional distributions, i.e., the joint distributions of the random vectors

(Xt1, Xt2, . . . , Xtn),

for all n ∈ N and all choices t1, . . . , tn ∈ T.

The notion of a stochastic process is very important both in mathematical theory and in its applications in science, engineering, economics, etc. It is used to model a large number of phenomena where the quantity of interest varies discretely or continuously through time in a non-predictable fashion.

Every stochastic process can be viewed as a function of two variables: t and ω. For each fixed t, ω ↦ Xt(ω) is a random variable, as postulated in the definition. However, if we change our point of view and keep ω fixed, we see that the stochastic process is a function mapping ω to the real-valued function t ↦ Xt(ω). These functions are called the trajectories of the stochastic process X. The following two figures show two possible trajectories of a simple random walk1, i.e., each one corresponds to a (different) frozen ω, but t varies from 0 to 30.

[Figures: two sample trajectories of a simple random walk, t = 0 to 30.]

1 We will define the simple random walk later. For now, let us just say that it behaves as follows. It starts at x = 0 for t = 0. After that, a (possibly biased) coin is tossed and we move up (to x = 1) if heads is observed and down (to x = −1) if we see tails. The procedure is repeated at t = 1, 2, . . . and the position at t + 1 is determined in the same way, independently of all the coin tosses before (note that the position at t = k can be any of the following: x = −k, x = −k + 2, . . . , x = k − 2, x = k).


Unlike the figures above, the next two pictures show two time-slices of the same random process; in each graph, the time t is fixed (t = 15 vs. t = 25), but the various values the random variables X15 and X25 can take are presented through their probability mass functions.


    Figure 1: Probability mass function for X15


    Figure 2: Probability mass function for X25

    1.2 The canonical probability space

When one deals with infinite-index (|T| = +∞) stochastic processes, the construction of the probability space (Ω, F, P) to support a given model is usually quite a technical matter. This course does not suffer from that problem because all our models can be implemented on a special probability space. We start with the sample space Ω:

Ω = [0, 1] × [0, 1] × · · · = [0, 1]N0,

and any generic element ω of Ω will be a sequence ω = (ω0, ω1, ω2, . . . ) of real numbers in [0, 1]. For n ∈ N0 we define the mapping Un : Ω → [0, 1] which simply chooses the nth coordinate:

Un(ω) = ωn.

The proof of the following theorem can be found in most advanced probability books (e.g., [1] Thm. 20.4):

Theorem 1.2. There exists a probability measure P on Ω such that

1. each Un, n ∈ N0, is a random variable with the uniform distribution on [0, 1], and
2. the sequence (Un)n∈N0 is independent.

Remark 1.3. One should think of the sample space Ω as a source of all the randomness in the system: the elementary event ω is chosen by a process beyond our control, and the exact value of ω is assumed to be unknown. All the other parts of the system are possibly complicated, but deterministic, functions of ω (random variables). When a coin is tossed, only a single drop of randomness is needed: the outcome of a coin-toss. When several coins are tossed, more randomness is involved and the sample space must be bigger. When a system involves an infinite number of random variables (like a stochastic process with infinite T), a large sample space is needed.

Once we construct a sequence of independent random variables which are uniformly distributed on the unit interval, we can then construct any number of models. For example:


    1.3 Constructing the random walk

Let us show how to construct the simple random walk on the canonical probability space (Ω, F, P) from Theorem 1.2. First of all, we need a definition of the simple random walk:

Definition 1.4. A sequence (Xn)n∈N0 of random variables is called a simple random walk (with parameter p ∈ (0, 1)) if

a) X0 = 0,

b) Xn+1 − Xn is independent of (X0, X1, . . . , Xn) for all n ∈ N0, and

c) the random variable Xn+1 − Xn has the distribution

P(Xn+1 − Xn = 1) = p and P(Xn+1 − Xn = −1) = q,

where, as usual, q = 1 − p. If p = 1/2, the random walk is called symmetric.

The adjective "simple" comes from the fact that the size of each step is fixed (equal to 1) and it is only the direction that is random. One can study more general random walks where each step comes from an arbitrary prescribed probability distribution. For the sequence (Un)n∈N given by Theorem 1.2, define the following new sequence (ξn)n∈N of random variables:

ξn = 1 if Un ≤ p, and ξn = −1 otherwise.

We then set

X0 = 0,  Xn = ξ1 + ξ2 + · · · + ξn,  n ∈ N.

Intuitively, we use each ξn to emulate a biased coin toss and then define the value of the process X at time n as the cumulative sum of the first n coin-tosses.

Proposition 1.5. The sequence (Xn)n∈N0 defined above is a simple random walk.

Proof. Property a) is trivially true. To check property b), we first note that (ξn)n∈N is an independent sequence (as it has been constructed by applying a deterministic function to each element of the independent sequence (Un)n∈N). Therefore, the increment Xn+1 − Xn = ξn+1 is independent of all the previous coin-tosses ξ1, . . . , ξn. What we need to prove, though, is that it is independent of all the previous values of the process X. These previous values are nothing but linear combinations of the coin-tosses ξ1, . . . , ξn, so they must also be independent of ξn+1. Finally, to get c), we compute

P[Xn+1 − Xn = 1] = P[ξn+1 = 1] = P[Un+1 ≤ p] = p.

A similar computation shows that P[Xn+1 − Xn = −1] = q.

We have now defined and constructed a random walk (Xn)n∈N0. Our next task is to study some of its mathematical properties.
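This construction is easy to simulate. The sketch below is not part of the notes; the function name, the fixed seed, and the use of Python's random module are my own choices. It builds a path exactly as above: draw U1, U2, . . . uniform on [0, 1] and set ξk = 1 if Uk ≤ p and ξk = −1 otherwise.

```python
import random

def simple_random_walk(n, p, seed=0):
    """Return (X_0, X_1, ..., X_n) built from i.i.d. Uniform(0,1) draws,
    with step xi_k = +1 if U_k <= p and -1 otherwise."""
    rng = random.Random(seed)
    path = [0]                          # X_0 = 0
    for _ in range(n):
        u = rng.random()                # U_k ~ Uniform(0, 1)
        path.append(path[-1] + (1 if u <= p else -1))
    return path

print(simple_random_walk(10, 0.5))
```

Every run starts at 0 and moves by exactly ±1 at each step, matching Definition 1.4.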


Proposition 1.6. Let (Xn)n∈N0 be a simple random walk with parameter p. The distribution of the random variable Xn is discrete with support {−n, −n + 2, . . . , n − 2, n}, and probabilities

P[Xn = l] = \binom{n}{(n+l)/2} p^{(n+l)/2} q^{(n−l)/2},  l = −n, −n + 2, . . . , n − 2, n.  (1.1)

Proof. Xn is composed of n independent steps ξk = Xk − Xk−1, k = 1, . . . , n, each of which goes either up or down. In order to reach level l in those n steps, the number u of up-steps and the number d of down-steps must satisfy u − d = l (and u + d = n). Therefore, u = (n + l)/2 and d = (n − l)/2. The number of ways we can choose these u up-steps from the total of n is \binom{n}{(n+l)/2}, which, together with the fact that the probability of any trajectory with exactly u up-steps is p^u q^{n−u}, gives the probability (1.1) above. Equivalently, we could have noticed that the random variable (n + Xn)/2 has the binomial b(n, p)-distribution.
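Formula (1.1) can be checked numerically. The sketch below (the helper name is mine) implements the pmf and verifies that it sums to 1 over the support:

```python
from math import comb

def pmf_X(n, l, p):
    """P[X_n = l] from (1.1); zero off the support {-n, -n+2, ..., n}."""
    if (n + l) % 2 != 0 or abs(l) > n:
        return 0.0
    u = (n + l) // 2                    # number of up-steps
    return comb(n, u) * p**u * (1 - p)**(n - u)

n, p = 6, 0.3
print(sum(pmf_X(n, l, p) for l in range(-n, n + 1, 2)))
```

Equivalently, pmf_X(n, l, p) is the binomial b(n, p) probability of (n + l)/2 successes, the last observation in the proof.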

    1.4 The reflection principle

Now we know how to compute the probabilities related to the position of the random walk (Xn)n∈N0 at a fixed future time n. A mathematically more interesting question can be posed about the maximum of the random walk on {0, 1, . . . , n}. A nice expression for this probability is available in the case of the symmetric simple random walk.

To compute this quantity, it is more helpful to view the random walk as a random trajectory in some space of paths, and to compute the required probability by simply counting the trajectories in the subset (event) you are interested in, adding them all together, weighted by their probabilities. To prepare the ground for future results, let C be the set of all possible trajectories:

C = {(x0, x1, . . . , xn) : x0 = 0, |xk+1 − xk| = 1, k ≤ n − 1}.

You can think of the first n steps of a random walk simply as a probability distribution on the state-space C.

[Figure: the superposition of all trajectories in C for n = 4, with the path (0, 1, 0, 1, 2) marked in red.]

Proposition 1.7. Let (Xn)n∈N0 be a symmetric simple random walk, suppose n ≥ 2, and let Mn = max(X0, . . . , Xn) be the maximal value of (Xn)n∈N0 on the interval 0, 1, . . . , n. The support of Mn is {0, 1, . . . , n} and its probability mass function is given by

P[Mn = l] = \binom{n}{⌊(n+l+1)/2⌋} 2^{−n},  l = 0, . . . , n.

Proof. Let us first pick a level l ∈ {0, 1, . . . , n} and compute the auxiliary probability P[Mn ≥ l] by counting the number of trajectories whose maximal level reached is at least l. Indeed, the symmetry assumption ensures that all trajectories are equally likely. More precisely, let Al ⊆ C be given by

Al = {(x0, x1, . . . , xn) ∈ C : max_{k=0,...,n} xk ≥ l}
   = {(x0, x1, . . . , xn) ∈ C : xk ≥ l for at least one k ∈ {0, . . . , n}}.


Then P[Mn ≥ l] = 2^{−n} |Al|, where |A| denotes the number of elements in the set A. When l = 0, we clearly have P[Mn ≥ 0] = 1, since X0 = 0. To count the number of elements in Al, we use the following clever observation (known as the reflection principle):

Claim: For l ∈ N, we have

|Al| = 2 |{(x0, x1, . . . , xn) ∈ C : xn > l}| + |{(x0, x1, . . . , xn) ∈ C : xn = l}|.  (1.2)

We start by defining a bijective transformation which maps trajectories into trajectories. For a trajectory (x0, x1, . . . , xn) ∈ Al, let k(l) = k(l, (x0, x1, . . . , xn)) be the smallest value of the index k such that xk ≥ l. In the stochastic-process-theory parlance, k(l) is the first hitting time of the set {l, l + 1, . . . }. We know that k(l) is well-defined (since we are only applying it to trajectories in Al) and that it takes values in the set {1, . . . , n}. With k(l) at our disposal, let (y0, y1, . . . , yn) ∈ C be the trajectory obtained from (x0, x1, . . . , xn) by the following procedure:

1. Do nothing until you get to k(l):

y0 = x0, y1 = x1, . . . , yk(l) = xk(l).

2. Use the flipped values for the coin-tosses from k(l) onwards:

yk(l)+1 − yk(l) = −(xk(l)+1 − xk(l)),
yk(l)+2 − yk(l)+1 = −(xk(l)+2 − xk(l)+1),
. . .
yn − yn−1 = −(xn − xn−1).

[Figure: two trajectories, a blue one and its reflection in red, with n = 15, l = 4 and k(l) = 8.]

Graphically, (y0, . . . , yn) looks like (x0, . . . , xn) until it hits the level l, and then follows its reflection around the level l, so that yk − l = −(xk − l), i.e., yk = 2l − xk, for k ≥ k(l). If k(l) = n, then (x0, x1, . . . , xn) = (y0, y1, . . . , yn). It is clear that (y0, y1, . . . , yn) is in C. Let us denote this transformation by

Φ : Al → C,  Φ(x0, x1, . . . , xn) = (y0, y1, . . . , yn),

and call it the reflection map. The first important property of the reflection map is that it is its own inverse: apply Φ to any (y0, y1, . . . , yn) in Al, and you will get the original (x0, x1, . . . , xn). In other words, Φ ∘ Φ = Id, i.e., Φ is an involution. It follows immediately that Φ is a bijection from Al onto Al.

To get to the second important property of Φ, let us split the set Al into three parts according to the value of xn:

1. A>l = {(x0, x1, . . . , xn) ∈ Al : xn > l},
2. A=l = {(x0, x1, . . . , xn) ∈ Al : xn = l}, and
3. A<l = {(x0, x1, . . . , xn) ∈ Al : xn < l}.

The reflection map leaves A=l unchanged and maps A>l onto A<l, i.e., Φ(A>l) = A<l. For the sets A>l and A=l, the a priori stipulation that (x0, x1, . . . , xn) ∈ Al is unnecessary. Indeed, if xn ≥ l, you must already be in Al. Therefore, by the bijectivity of Φ, we have

|A<l| = |{(x0, x1, . . . , xn) ∈ C : xn > l}|,

and so

|Al| = 2 |{(x0, x1, . . . , xn) ∈ C : xn > l}| + |{(x0, x1, . . . , xn) ∈ C : xn = l}|.

This shows the claim.

Now that we have (1.2), we can easily rewrite it as follows:

P[Mn ≥ l] = P[Xn = l] + 2 Σ_{j>l} P[Xn = j] = Σ_{j>l} P[Xn = j] + Σ_{j≥l} P[Xn = j].

Finally, we subtract P[Mn ≥ l + 1] from P[Mn ≥ l] to get the expression for P[Mn = l]:

P[Mn = l] = P[Xn = l + 1] + P[Xn = l].

It remains to note that only one of the probabilities P[Xn = l + 1] and P[Xn = l] is non-zero: the first one if n and l have different parity, and the second one otherwise. In either case, the non-zero probability is given by \binom{n}{⌊(n+l+1)/2⌋} 2^{−n}.
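For small n, Proposition 1.7 can be verified by brute force: enumerate all 2^n equally likely trajectories and tabulate the maximum. A sketch (the function names are mine):

```python
from itertools import product
from math import comb

def max_pmf_exact(n, l):
    """P[M_n = l] for the symmetric walk, by enumerating all 2^n paths."""
    count = 0
    for steps in product((-1, 1), repeat=n):
        x = m = 0
        for s in steps:
            x += s
            m = max(m, x)
        if m == l:
            count += 1
    return count / 2**n

def max_pmf_formula(n, l):
    """The closed form from Proposition 1.7."""
    return comb(n, (n + l + 1) // 2) / 2**n

n = 8
print(all(abs(max_pmf_exact(n, l) - max_pmf_formula(n, l)) < 1e-12
          for l in range(n + 1)))
```

The integer division (n + l + 1) // 2 implements the floor in the binomial coefficient.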

    Let us use the reflection principle to solve a classical problem in combinatorics.

Example 1.8 (The Ballot Problem). Suppose that two candidates, Daisy and Oscar, are running for office, and n ∈ N voters cast their ballots. Votes are counted by the same official, one by one, until all n of them have been processed (like in the old days). After each ballot is opened, the official records the number of votes each candidate has received so far. At the end, the official announces that Daisy has won by a margin of m > 0 votes, i.e., that Daisy got (n + m)/2 votes and Oscar the remaining (n − m)/2 votes. What is the probability that Daisy never trails Oscar during the counting of the votes?

We assume that the order in which the official counts the votes is completely independent of the actual votes, and that each voter chooses Daisy with probability p ∈ (0, 1) and Oscar with probability q = 1 − p. For k ≤ n, let Xk be the number of votes received by Daisy minus the number of votes received by Oscar in the first k ballots. When the (k + 1)-st vote is counted, Xk either increases by 1 (if the vote was for Daisy), or decreases by 1 otherwise. The votes are independent of each other and X0 = 0, so Xk, 0 ≤ k ≤ n, is (the beginning of) a simple random walk. The probability of an up-step is p ∈ (0, 1), so this random walk is not necessarily symmetric. The ballot problem can now be restated as follows:

What is the probability that Xk ≥ 0 for all k ∈ {0, . . . , n}, given that Xn = m?

The first step towards understanding the solution is the realization that the exact value of p does not matter. Indeed, we are interested in the conditional probability P[F|G] = P[F ∩ G]/P[G], where

F = all trajectories that stay non-negative = {Xi ≥ 0 for all 0 ≤ i ≤ n},
G = all trajectories that reach m at time n = {Xn = m}.


Each trajectory in G has (n + m)/2 up-steps and (n − m)/2 down-steps, so its probability weight is always equal to p^{(n+m)/2} q^{(n−m)/2}. Therefore,

P[F|G] = P[F ∩ G]/P[G] = (|F ∩ G| p^{(n+m)/2} q^{(n−m)/2}) / (|G| p^{(n+m)/2} q^{(n−m)/2}) = |F ∩ G| / |G|.  (1.3)

We already know how to count the number of paths in G: it is equal to \binom{n}{(n+m)/2}. So all that remains to be done is to count the number of paths in G ∩ F. If we set

H = all paths which finish at m and visit the level l = −1 = {Xn = m and min_{0≤i≤n} Xi ≤ −1},

then G = (G ∩ F) ∪ H, a disjoint union. In other words, the collection of paths that go from 0 to m can be split into

1. G ∩ F: the paths that go from 0 to m and stay non-negative, and
2. H: the paths that go from 0 to m and become negative at some point.

So

|G ∩ F| = |G| − |H|.

Can we use the reflection principle to find |H|? Yes, we can. In fact, you can convince yourself that the reflection of any path in H around the level l = −1, after its first hitting time of that level, produces a path that starts at 0 and ends at −(m + 2). Conversely, the same procedure applied to such a path yields a path in H. If a path travels from 0 to −(m + 2), then it must have (n + m + 2)/2 down-steps and (n − m − 2)/2 up-steps. This means there are \binom{n}{1 + (n+m)/2} of these paths. Putting everything together, we get

P[F|G] = ( \binom{n}{k} − \binom{n}{k+1} ) / \binom{n}{k} = (2k + 1 − n)/(k + 1),  where k = (n + m)/2.

The last equality follows from the definition of binomial coefficients: \binom{n}{k} = n! / (k! (n − k)!).

How would you modify this argument to compute the probability that Daisy leads Oscar during the entire counting of the votes?
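The closed form for P[F|G] can be checked against a direct path count. The sketch below (names are mine) enumerates all conditioned paths for small n:

```python
from itertools import product

def ballot_prob_exact(n, m):
    """P(X_k >= 0 for all k | X_n = m), by enumerating all paths."""
    good = total = 0
    for steps in product((-1, 1), repeat=n):
        if sum(steps) != m:
            continue                 # not in G
        total += 1
        x, stays_nonneg = 0, True
        for s in steps:
            x += s
            if x < 0:
                stays_nonneg = False
                break
        good += stays_nonneg
    return good / total

n, m = 10, 4
k = (n + m) // 2
formula = (2 * k + 1 - n) / (k + 1)  # = (m + 1) / (k + 1)
print(abs(ballot_prob_exact(n, m) - formula) < 1e-12)
```

Note that p never enters the computation, exactly as (1.3) predicts.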

The Ballot problem has a long history (going back to at least 1887) and has spurred a lot of research in combinatorics and probability. In fact, people still write research papers on some of its generalizations. When posed outside the context of probability, it is often phrased as "in how many ways can the counting be performed . . . " (the difference being only in the normalizing factor \binom{n}{k} appearing in (1.3) above). The special case m = 0 seems to be even more popular: the number of 2n-step paths from 0 to 0 never going below zero is called the Catalan number and equals

Cn = \frac{1}{n+1} \binom{2n}{n}.

Can you derive this expression from (1.3)? If you want to test your understanding a bit further, here is an identity (called Segner's recurrence formula) satisfied by the Catalan numbers:

Cn = Σ_{i=1}^{n} C_{i−1} C_{n−i},  n ∈ N.

    Can you prove it using the Ballot-problem interpretation?
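Both the closed form and Segner's recurrence are easy to compute; a sketch (helper names mine) that checks them against each other:

```python
from math import comb

def catalan(n):
    """C_n via the closed form C(2n, n) / (n + 1); the division is exact."""
    return comb(2 * n, n) // (n + 1)

def catalan_segner(n):
    """C_n via Segner's recurrence C_n = sum_{i=1}^{n} C_{i-1} C_{n-i}."""
    c = [1]                          # C_0 = 1
    for k in range(1, n + 1):
        c.append(sum(c[i - 1] * c[k - i] for i in range(1, k + 1)))
    return c[n]

print([catalan(n) for n in range(6)])
print(all(catalan(n) == catalan_segner(n) for n in range(12)))
```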


    2 Generating Functions

"A generating function is a clothesline on which we hang up a sequence of numbers for display." - Herbert S. Wilf, generatingfunctionology

The path-counting method used in the previous lecture only works for computations related to the first n steps of the random walk, where n is given in advance. We will see later that most of the interesting questions do not fall into this category. For example, the distribution of the time it takes for the random walk to hit a given level l is like that. There is no way to give an a-priori bound on the number of steps it will take to get to l (in fact, the expectation of this random variable can be +∞). To deal with a wider class of properties of random walks (and other processes), we need to develop some new mathematical tools.

    2.1 Generating functions

    A generating function associates a sequence of numbers with a function (a power series). It turnsout that in many cases of interest we can use this function to learn about the sequence.

Definition 2.1. If (an)n∈N0 is a sequence of numbers, then we say that the radius of convergence of the power series Σ_{k=0}^{∞} ak s^k is the largest number R ∈ [0, ∞] such that Σ_{k=0}^{∞} |ak| |s|^k converges when |s| < R. When R > 0, we say that the function

G(s) = Σ_{k∈N0} ak s^k,  −R < s < R,

is the generating function associated with the sequence (an)n∈N0.

The generating function A associated with a sequence (an)n∈N0 is infinitely differentiable, and its derivatives can be expressed as power series.

Proposition 2.2. When R > 0, the function A(s) is infinitely differentiable on (−R, R) and

\frac{d^n}{ds^n} A(s) = Σ_{k=n}^{∞} k(k−1) · · · (k−n+1) ak s^{k−n}.  (2.1)

In particular, an = \frac{1}{n!} \frac{d^n}{ds^n} A(0) for each n ∈ N0.

The name "generating function" comes from the last part of this result: the knowledge of A implies the knowledge of the whole sequence (an)n∈N0. It also turns out that we can use generating functions to study convolution. We will see shortly that convolution arises naturally when we compute the probability mass function of the sum of two independent, N0-valued random variables.

Definition 2.3. Let (an)n∈N0 and (bn)n∈N0 be sequences and define

cn = Σ_{j=0}^{n} aj b_{n−j} = Σ_{k=0}^{n} a_{n−k} bk,  n ∈ N0.

Then we say that the sequence (cn)n∈N0 is the convolution of the sequences (an)n∈N0 and (bn)n∈N0, and we write c = a ∗ b.

It turns out that convolving two sequences is equivalent to multiplying their generating functions.

Proposition 2.4. Let (an)n∈N0 and (bn)n∈N0 be sequences, let Ga and Gb denote the generating functions associated with these sequences, and assume that both power series have radius of convergence at least as large as R > 0. If we set c = a ∗ b, then the power series

Gc(s) = Σ_{k=0}^{∞} ck s^k

also has radius of convergence at least as large as R, and Gc(s) = Ga(s) Gb(s) for |s| < R.

Proof. If we formally expand and then collect like powers of s in the expression

Ga(s) Gb(s) = (a0 + a1 s + a2 s² + . . . ) · (b0 + b1 s + b2 s² + . . . ),

then we see that the resulting coefficient of s^n is given by cn. Checking the remaining claims rigorously is again beyond the scope of this course.

    2.2 Associating a generating function with a random variable

In this section we will look at random variables which take values in the set

T = N0 ∪ {+∞} = {0, 1, 2, 3, . . . } ∪ {+∞}.

We will often be interested in random variables which record the amount of (discrete) time that we have to wait for an event to occur. In some cases, the event may never occur, so we allow these random variables to take the value +∞ to indicate that we have to wait forever.

The distribution of a T-valued random variable X is completely determined by the sequence (an)n∈N0 of numbers in [0, 1] given by

an = P[X = n],  n ∈ N0.  (2.2)

Notice that the value P(X = ∞) does not occur in the sequence (an)n∈N0, but we can still figure it out from the values in the sequence:

P(X = ∞) = 1 − P(X < ∞) = 1 − Σ_{n∈N0} an.


In the future, when we say "let (an)n∈N0 be the sequence associated with X," we mean that (an)n∈N0 is given by (2.2). We then define the generating function associated with the sequence (an)n∈N0 by

GX(s) = Σ_{k∈N0} ak s^k.

Since Σ_{k∈N0} ak ≤ 1, the radius of convergence of this power series is always at least 1, and it may be strictly larger, as for the distribution with R > 1 in (3) in Example 2.5. For the distribution with pmf given by ak = C/(k+1)², where C = (Σ_{k=0}^{∞} 1/(k+1)²)^{−1}, the radius of convergence is exactly equal to 1. Can you see why?

The following proposition gives another way to compute the generating function associated with a random variable.

Proposition 2.7. Let X be a T-valued random variable with generating function GX. Then

1. GX(s) = E[s^X] = E[s^X 1{X < ∞}] for |s| < 1, and
2. P(X < ∞) = lim_{s↑1} GX(s).

Proof. Statement (1) follows directly from the formula

E[g(X)] = Σ_{n∈T} g(n) P(X = n)

applied to g(x) = s^x, where we have used the fact/convention that s^∞ = 0 if |s| < 1.

The second claim follows from the fact that:

lim_{s↑1} GX(s) = lim_{s↑1} Σ_{n∈N0} s^n P(X = n) = Σ_{n∈N0} lim_{s↑1} s^n P(X = n) = Σ_{n∈N0} P(X = n) = P(X < ∞).

    Of course, one should really justify the fact that we can exchange the summation and limit in theprevious equation. In this case, one could employ the monotone convergence theorem from realanalysis, or simply check it by hand, but this is beyond the scope of this course.

Remark 2.8. We used the formula an = P[X = n] to associate a sequence with the random variable X. One could also use the formula bn = E[X^n]/n! to associate a sequence with X. If one then computes the generating function B corresponding to the sequence (bn), the resulting function is given by B(s) = E[e^{sX}] and is known as the moment generating function associated with X. The moment generating function is quite similar to the probability generating function. In particular, one can check that they are related by the formula GX(s) = B(log(s)) for s ∈ (0, 1). The probability generating function will turn out to be more convenient for this class.

    2.3 Convolution

    The true power of generating functions comes from the fact that they behave very well under theusual operations in probability.

Proposition 2.9. Let X, Y be independent T-valued random variables and set Z = X + Y. If (an)n∈N0 and GX are the sequence and generating function associated with X, (bn) and GY are the sequence and generating function associated with Y, and (cn) and GZ are the sequence and generating function associated with Z, then c = a ∗ b and GZ(s) = GX(s) GY(s).

Proof. For each n ∈ N0, we have

cn = P(Z = n) = Σ_{i=0}^{n} P(X = i and Y = n − i) = Σ_{i=0}^{n} P(X = i) P(Y = n − i) = Σ_{i=0}^{n} ai b_{n−i}.

    Similarly, if |s| < 1, then

GZ(s) = E[s^Z] = E[s^{X+Y}] = E[s^X s^Y] = E[s^X] E[s^Y] = GX(s) GY(s).


Example 2.10.

1. The binomial b(n, p) distribution is the distribution of a sum of n independent Bernoulli random variables with parameter p. Therefore, if we apply Prop. 2.9 n times to the generating function (q + ps) of the Bernoulli b(p) distribution, we immediately get that the generating function of the binomial is (q + ps) · · · (q + ps) = (q + ps)^n.

2. More generally, we can show that the sum of m independent random variables with the b(n, p) distribution has a binomial b(mn, p) distribution. If you try to sum binomials with different values of the parameter p, you will not get a binomial.

3. What is even more interesting, the following statement can be shown: suppose that the sum Z of two independent N0-valued random variables X and Y is binomially distributed with parameters n and p. Then both X and Y are binomial with parameters nX, p and nY, p, where nX + nY = n. In other words, the only way to get a binomial as a sum of independent random variables is the trivial one.
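Part 1 of the example can be checked numerically: evaluate the generating function of the binomial pmf directly and compare it with (q + ps)^n. A sketch (helper names mine):

```python
from math import comb

def gf(pmf, s):
    """Evaluate the generating function of a pmf (a_0, a_1, ...) at s."""
    return sum(a * s**k for k, a in enumerate(pmf))

n, p = 5, 0.3
q = 1 - p
binom_pmf = [comb(n, k) * p**k * q**(n - k) for k in range(n + 1)]
s = 0.7
print(abs(gf(binom_pmf, s) - (q + p * s)**n) < 1e-12)
```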

We will actually need something slightly more complicated than Proposition 2.9 when we get back to random walks. To understand what we want, let's consider the following example.

Example 2.11. You own a pizza shop with a single delivery driver. You send your driver out with an order, but then realize that you gave him the wrong pizza. Unfortunately, it's 1983, so you can't call your driver on a cell phone, and there is nothing you can do but wait for your driver to return.

Let T1 ∈ T denote the time that your driver gets back from the first trip, and let T2 ≥ T1 denote the time that your driver gets back from the second trip. Of course, these should probably be continuous random variables, but let's assume that the world is discrete. Moreover, if you don't like the idea that T1 might take the value zero, you can just assign probability zero to this possibility. Your driver is actually somewhat unreliable: on each trip there is some chance that he will decide he is sick of this job and never return. In other words, P(T1 = ∞) > 0 and P(T2 = ∞) > 0.

In the event that your driver does return to the shop after the first trip, the time that it takes for him to make the second round trip is independent of the time that the first trip took and has the same distribution. More formally, we suppose that:

P(T2 = m + n | T1 = m) = P(T1 = n),  m, n ∈ N0.  (2.4)

Since, conditional on the event {T1 = m}, T2 can only take values in {m, m + 1, m + 2, . . . } ∪ {∞}, it follows from (2.4) that

P(T2 = ∞ | T1 = m) = 1 − Σ_{n∈N0} P(T2 = m + n | T1 = m) = 1 − Σ_{n∈N0} P(T1 = n) = P(T1 = ∞).

We could just define ∞ − ∞ = ∞, but this still isn't going to make T1 and T2 − T1 independent. Fortunately, this minor annoyance doesn't end up mattering. First notice that

E(s^{T2} | T1 = m) = Σ_{n∈T} s^{m+n} P(T1 = n) = s^m GT1(s),  m ∈ N0, |s| < 1.

We also know that T2 = ∞ when T1 = ∞, so

E(s^{T2} | T1 = ∞) = s^∞ = 0,  |s| < 1.

As a result, we may apply the tower law for conditional expectation to conclude that

GT2(s) = E[s^{T2}] = Σ_{m∈T} E[s^{T2} | T1 = m] P(T1 = m) = Σ_{m∈N0} s^m GT1(s) P(T1 = m) = GT1(s)²,

when |s| < 1.

In fact, we know something slightly stronger. If two generating functions agree around zero, then it follows from Proposition 2.2 that they are generated by the same sequence, so they must agree on their entire common radius of convergence.

We have now shown the following proposition, which we will need in the next section.

Proposition 2.12. Let T2 ≥ T1 be random times taking values in T = N0 ∪ {+∞} with generating functions GT1 and GT2. If

P(T2 = m + n | T1 = m) = P(T1 = n),  m, n ∈ N0,

then GT2(s) = GT1(s)².
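Proposition 2.12 can be illustrated with a toy defective distribution. The numbers and helper names below are my own invented example, not from the notes: suppose a round trip takes 1 or 2 time units, or never ends.

```python
# Toy round-trip time: P(T1 = 1) = 0.3, P(T1 = 2) = 0.4, P(T1 = inf) = 0.3.
def g_t1(s):
    """Generating function of T1 (the infinite mass contributes nothing)."""
    return 0.3 * s + 0.4 * s**2

def g_t2(s):
    """Generating function of T2 = two independent round trips,
    computed by convolving the finite part of T1 with itself."""
    probs = {2: 0.3 * 0.3, 3: 2 * 0.3 * 0.4, 4: 0.4 * 0.4}
    return sum(p * s**n for n, p in probs.items())

s = 0.5
print(abs(g_t2(s) - g_t1(s)**2) < 1e-12)
```

Note also that g_t1(1) = 0.7 = P(T1 < ∞), consistent with Proposition 2.7.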

    2.4 Moments

Another useful thing about generating functions is that they can make the computation of moments easier. Recall that E[X^n] is called the nth moment of X. Also notice that if P(X = ∞) > 0, then X^n is never integrable, so we will now restrict attention to random variables that only take values in N0.

Proposition 2.13. Let X be an N0-valued random variable with generating function GX. For n ∈ N, the following two statements are equivalent:

1. E[X^n] < ∞;

2. d^n GX(s)/ds^n at s = 1 exists (in the sense that the left limit lim_{s↑1} d^n GX(s)/ds^n exists).

In either case, we have

E[X(X − 1)(X − 2) · · · (X − n + 1)] = d^n GX(s)/ds^n at s = 1.

Proof. Formally, one can check this by setting s = 1 in (2.1) and checking that the resulting summation amounts to calculating the desired expectation.


The quantities

E[X], E[X(X − 1)], E[X(X − 1)(X − 2)], . . .

are called the factorial moments of the random variable X. You can get the classical moments from the factorial moments by solving a system of linear equations. It is very simple for the first few:

E[X] = E[X],
E[X^2] = E[X(X − 1)] + E[X],
E[X^3] = E[X(X − 1)(X − 2)] + 3E[X(X − 1)] + E[X], . . .

A useful identity which follows directly from the above results is the following:

Var[X] = G_X''(1) + G_X'(1) − (G_X'(1))^2,

and it is valid whenever the first two derivatives of G_X at 1 exist.

Example 2.14. Let X be a Poisson random variable with parameter λ. Its generating function is given by

A(s) = e^{λ(s−1)}.

Therefore, d^n/ds^n A(1) = λ^n, and so the sequence (E[X], E[X(X − 1)], E[X(X − 1)(X − 2)], . . . ) of factorial moments of X is just (λ, λ^2, λ^3, . . . ). It follows that

E[X] = λ,
E[X^2] = λ^2 + λ,  Var[X] = λ,
E[X^3] = λ^3 + 3λ^2 + λ, . . .
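These factorial-moment computations are easy to check symbolically. The following sketch (assuming SymPy is available; the variable names are our own) differentiates the Poisson generating function at s = 1:

```python
import sympy as sp

s, lam = sp.symbols('s lam', positive=True)
G = sp.exp(lam * (s - 1))   # generating function of a Poisson(lam) variable

# n-th factorial moment E[X(X-1)...(X-n+1)] is the n-th derivative at s = 1
fact_moments = [sp.diff(G, s, n).subs(s, 1) for n in (1, 2, 3)]
# -> [lam, lam**2, lam**3]

# Var[X] = G''(1) + G'(1) - (G'(1))^2
var = sp.simplify(fact_moments[1] + fact_moments[0] - fact_moments[0] ** 2)
# -> lam
```

The same three lines work for any generating function with a closed form.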

Example 2.15. We have an urn which contains three numbered balls. We then play a repeated game where on each turn we draw a ball, with the following outcomes:

a) If we draw the first ball, we win a dollar, replace the ball in the urn, and then play another round.

b) If we draw the second ball, we win two dollars, replace the ball in the urn, and then play another round.

c) If we draw the third ball, the game is over.

Let X denote the amount of money that we win in this game. The number of rounds that we play has a geometric distribution, so the game ends with probability one and P(X = ∞) = 0. We would like to determine the generating function G_X associated with X. To do this, we let Y denote the amount of winnings obtained after (but not including the winnings from) the first round, and we let Z denote the first ball drawn. Then the conditional distribution of Y given Z = 1 or Z = 2 is the same as the unconditional distribution of X. That is:

P(Y = n | Z = 1) = P(Y = n | Z = 2) = P(X = n), for all n ∈ N0.

As a result:

G_Y(s) = E(s^Y | Z = 1) = E(s^Y | Z = 2) = E[s^X] = G_X(s),


and

G_X(s) = E[s^X] = E[s^X | Z = 1] P(Z = 1) + E[s^X | Z = 2] P(Z = 2) + E[s^X | Z = 3] P(Z = 3)
       = E[s^{1+Y} | Z = 1]/3 + E[s^{2+Y} | Z = 2]/3 + E[s^0 | Z = 3]/3
       = s G_X(s)/3 + s^2 G_X(s)/3 + 1/3.

Solving for G_X shows that G_X(s) = 1/(3 − s − s^2). In particular,

G_X'(s) = (1 + 2s)/(3 − s − s^2)^2,  G_X''(s) = (8 + 6s + 6s^2)/(3 − s − s^2)^3.

As a result, we can determine a number of properties of X:

P(X = 0) = G_X(0) = 1/3,  P(X = 1) = G_X'(0) = 1/9,  P(X = 2) = G_X''(0)/2 = 4/27,
E[X] = G_X'(1) = 3,  E[X^2] = G_X''(1) + G_X'(1) = 23,  Var(X) = 14.

2.5 Random sums

Our next application of generating functions in the theory of stochastic processes deals with so-called random sums. Let (ξn)n∈N be a sequence of random variables, and let N be a random time (a random time is simply a T = N0 ∪ {+∞}-valued random variable). We can define the random variable

Y = ∑_{k=0}^{N} ξ_k  by  Y(ω) = 0 if N(ω) = 0, and Y(ω) = ∑_{k=1}^{N(ω)} ξ_k(ω) if N(ω) ≥ 1, for ω ∈ Ω.

More generally, for an arbitrary stochastic process (Xn)n∈N0 and a random time N (with P[N = +∞] = 0), we define the random variable X_N by X_N(ω) = X_{N(ω)}(ω), for ω ∈ Ω. When N is a constant (N = n), then X_N is simply equal to X_n. In general, think of X_N as a value of the stochastic process X taken at a time which is itself random. If Xn = ∑_{k=1}^{n} ξ_k, then X_N = ∑_{k=1}^{N} ξ_k.

Example 2.16. Let (ξn)n∈N be the increments of a symmetric simple random walk (coin-tosses), and let N have the following distribution

    n           0     1     2
    P(N = n)    1/3   1/3   1/3

which is independent of (ξn)n∈N (it is very important to specify the dependence structure between N and (ξn)n∈N in this setting!). Let us compute the distribution of Y = ∑_{k=0}^{N} ξ_k in this case. This


is where we, typically, use the formula of total probability:

P(Y = m) = P(Y = m | N = 0) P(N = 0) + P(Y = m | N = 1) P(N = 1) + P(Y = m | N = 2) P(N = 2)
         = P(∑_{k=0}^{N} ξ_k = m | N = 0) P(N = 0) + P(∑_{k=0}^{N} ξ_k = m | N = 1) P(N = 1) + P(∑_{k=0}^{N} ξ_k = m | N = 2) P(N = 2)
         = (1/3) (1_{m=0} + P(ξ1 = m) + P(ξ1 + ξ2 = m)).

When m = 1 (for example), we get

P[Y = 1] = (0 + 1/2 + 0)/3 = 1/6.

    Perform the computation for some other values of m for yourself.
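Instead of repeating the computation value by value, we can enumerate all outcomes exactly; a small sketch (the names are our own):

```python
from fractions import Fraction
from itertools import product

# Exact distribution of Y = xi_1 + ... + xi_N for N uniform on {0, 1, 2},
# independent of the coin-toss steps xi_k in {-1, +1}.
dist = {}
for n in (0, 1, 2):
    for steps in product((-1, 1), repeat=n):
        p = Fraction(1, 3) * Fraction(1, 2) ** n   # P(N = n) * P(this pattern)
        y = sum(steps)
        dist[y] = dist.get(y, Fraction(0)) + p

# dist[1] -> 1/6, dist[0] -> 1/2, dist[2] -> 1/12
```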

What happens when N and (ξn)n∈N are dependent? This will usually be the case in practice, as the value of the time N at which we stop adding increments will typically depend on the behavior of the sum itself.

Example 2.17. Let (ξn)n∈N be as above - we can think of a situation where a gambler repeatedly plays a game in which a fair coin is tossed; the gambler wins a dollar if the outcome is heads and loses a dollar otherwise. A smart gambler enters the game and decides on the following tactic: "Let's see how the first game goes. If I lose, I'll play another 2 games and hopefully cover my losses, and if I win, I'll quit then and there." The described strategy amounts to the choice of the random time N as follows:

N(ω) = 1 if ξ1 = 1, and N(ω) = 3 if ξ1 = −1.

Then

Y(ω) = 1 if ξ1 = 1, and Y(ω) = −1 + ξ2 + ξ3 if ξ1 = −1.

Therefore,

P[Y = 1] = P[Y = 1 | ξ1 = 1] P[ξ1 = 1] + P[Y = 1 | ξ1 = −1] P[ξ1 = −1]
         = 1 · P[ξ1 = 1] + P[ξ2 + ξ3 = 2] P[ξ1 = −1]
         = (1/2)(1 + 1/4) = 5/8.

Similarly, we get P[Y = −1] = 1/4 and P[Y = −3] = 1/8. The expectation E[Y] is equal to 1 · (5/8) + (−1) · (1/4) + (−3) · (1/8) = 0. This is not an accident. One of the first powerful results of the beautiful theory of martingales states that no matter how smart a strategy you employ, you cannot beat a fair gamble.
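The eight equally likely outcomes of three coin tosses make this distribution easy to verify exactly (the enumeration below is our own sketch):

```python
from fractions import Fraction
from itertools import product

dist = {}
for xi in product((-1, 1), repeat=3):          # eight outcomes, 1/8 each
    Y = 1 if xi[0] == 1 else sum(xi)           # quit after a first-game win
    dist[Y] = dist.get(Y, Fraction(0)) + Fraction(1, 8)

EY = sum(y * p for y, p in dist.items())       # -> 0: the game stays fair
```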


We will return to the general (non-independent) case in the next lecture. Let us use generating functions to give a full description of the distribution of Y = ∑_{k=0}^{N} ξ_k when the time is independent of the summands.

Proposition 2.18. Let (ξn)n∈N be a sequence of independent N0-valued random variables, all of which share the same distribution and generating function G_ξ(s). Let N be a random time which is independent of (ξn)n∈N with P(N < ∞) = 1 and generating function G_N. Then the generating function G_Y of the random sum Y = ∑_{k=0}^{N} ξ_k is given by

G_Y(s) = G_N(G_ξ(s)).

Proof. First let X0 = 0 and Xn = ∑_{i=1}^{n} ξ_i for n ∈ N denote the sequence of partial sums. Repeated applications of Proposition 2.9 show that G_{Xn} = G_ξ^n (where G_ξ^0(s) = 1). As a result, we may apply the tower law for conditional expectation to see that

E[s^Y] = ∑_{n∈N0} E[s^Y | N = n] P(N = n) = ∑_{n∈N0} E[s^{Xn} | N = n] P(N = n) = ∑_{n∈N0} G_ξ(s)^n P(N = n) = G_N(G_ξ(s)).

Corollary 2.19 (Wald's Identity I). Let (ξn)n∈N and N be as in Proposition 2.18. Suppose, also, that E[N] < ∞ and E[ξ1] < ∞. Then

E[∑_{k=0}^{N} ξ_k] = E[N] E[ξ1].

Proof. We just apply the composition rule for derivatives to the equality G_Y = G_N ∘ G_ξ to get

G_Y'(s) = G_N'(G_ξ(s)) G_ξ'(s).

After we let s ↗ 1, we get

E[Y] = G_Y'(1) = G_N'(G_ξ(1)) G_ξ'(1) = G_N'(1) G_ξ'(1) = E[N] E[ξ1].

Example 2.20. Every time the Springfield Isotopes play in the league championship, their chance of winning is p ∈ (0, 1). The number of years between two championships they get to play in has the Poisson distribution with parameter λ > 0. What is the expected number of years Y between consecutive championship wins?

Let (ξn)n∈N be the sequence of independent Poisson(λ) random variables modeling the number of years between consecutive championship appearances by the Isotopes. Moreover, let N be a Geometric(p) random variable with success probability p. Then

Y = ∑_{k=0}^{N} ξ_k.

Indeed, every time the Isotopes lose the championship, another ξ years have to pass before they get another chance, and the whole thing stops when they finally win. To compute the expectation of Y we use Corollary 2.19:

E[Y] = E[N] E[ξ_k] = λ(1 − p)/p.
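This is easy to corroborate by simulation; the sketch below uses Knuth's classical Poisson sampler, and all function names, the seed, and the parameter choices are our own:

```python
import math
import random

def poisson(lam, rng):
    """Knuth's method: count uniform factors needed to drop below e^{-lam}."""
    threshold, k, prod = math.exp(-lam), 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1

def years_between_wins(p, lam, rng):
    """One sample of Y: each championship loss costs a Poisson(lam) wait."""
    total = 0
    while rng.random() >= p:          # lose with probability 1 - p
        total += poisson(lam, rng)
    return total                      # stop at the first win

rng = random.Random(42)
samples = [years_between_wins(0.5, 2.0, rng) for _ in range(50_000)]
mean = sum(samples) / len(samples)    # Wald: E[Y] = lam*(1-p)/p = 2.0 here
```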


3 Random Walks II

3.1 Stopping times

The last application of generating functions dealt with sums evaluated between 0 and some random time N. An especially interesting case occurs when the value of N depends directly on the evolution of the underlying stochastic process. Even more important is the case where time's arrow is taken into account. If you think of N as the time you stop adding new terms to the sum, it is usually the case that you are not allowed (able) to see the values of the terms you would get if you continued adding. Think of an investor in the stock market. Her decision to stop and sell her stocks can depend only on the information available up to the moment of the decision. Otherwise, she would sell at the absolute maximum and buy at the absolute minimum, making tons of money in the process. Of course, this is not possible unless you are clairvoyant, so mere mortals have to restrict their choices to so-called stopping times.

Definition 3.1. Let (Xn)n∈N0 be a stochastic process. A random variable T taking values in T = N0 ∪ {+∞} is said to be a stopping time with respect to (Xn)n∈N0 if for each n ∈ N0 there exists a function Gn : R^{n+1} → {0, 1} such that

1_{T=n} = Gn(X0, X1, . . . , Xn), for all n ∈ N0.

The functions Gn are called the decision functions, and should be thought of as a black box which takes the values of the process (Xn)n∈N0 observed up to the present point and outputs either 0 or 1. The value 0 means "keep going" and 1 means "stop". The whole point is that the decision has to be based only on the available observations and not on the future ones.

Example 3.2.

1. The simplest examples of stopping times are (non-random) deterministic times. Just set T = 5 (or T = 723 or T = n0 for any n0 ∈ N0 ∪ {+∞}), no matter what the state of the world is. The family of decision rules is easy to construct:

Gn(x0, x1, . . . , xn) = 1 if n = n0, and 0 if n ≠ n0.

The decision functions Gn do not depend on the values of X0, X1, . . . , Xn at all. A gambler who stops gambling after 20 games, no matter what the winnings or losses are, uses such a rule.

2. Probably the most well-known examples of stopping times are (first) hitting times. They can be defined for general stochastic processes, but we will stick to simple random walks for the purposes of this example. So, let Xn = ∑_{k=0}^{n} ξ_k be a simple random walk, and let Tl be the first time X hits the level l ∈ N. More precisely, we use the following slightly non-intuitive but mathematically correct definition:

Tl = min{n ∈ N0 : Xn = l}.

The set {n ∈ N0 : Xn = l} is the collection of all time-points at which X visits the level l. The earliest one - the minimum of that set - is the first hitting time of l. In states of the


world in which the level l just never gets reached, i.e., when {n ∈ N0 : Xn = l} is an empty set, we set Tl(ω) = +∞. In order to show that Tl is indeed a stopping time, we need to construct the decision functions Gn, n ∈ N0. Let us start with n = 0. We would have Tl = 0 only in the (impossible) case X0 = l, so we always have G0(X0) = 0. How about n ∈ N? For the value of Tl to be equal to exactly n, two things must happen:

(a) Xn = l (the level l must actually be hit at time n), and

(b) X_{n−1} ≠ l, X_{n−2} ≠ l, . . . , X1 ≠ l, X0 ≠ l (the level l has not been hit before).

Therefore,

Gn(x0, x1, . . . , xn) = 1 if x0 ≠ l, x1 ≠ l, . . . , x_{n−1} ≠ l, xn = l, and 0 otherwise.

The hitting time T2 of the level l = 2 for a particular trajectory of a symmetric simple random walk is depicted below:

[Figure: a sample trajectory of a symmetric simple random walk over 30 steps, with the first hitting time T2 of level 2 and the last-maximum time TM marked.]

3. How about something that is not a stopping time? Let n0 be an arbitrary time horizon and let TM be the last time during 0, . . . , n0 that the random walk visits its maximum during 0, . . . , n0 (see picture above). If you bought a stock at time t = 0, had to sell it some time before n0, and had the ability to predict the future, this is one of the points you would choose to sell at. Of course, it is impossible to decide whether TM = n, for some n ∈ {0, . . . , n0 − 1}, without knowledge of the values of the random walk after n. More precisely, let us sketch the proof of the fact that TM is not a stopping time. Suppose, to the contrary, that it is, and let Gn be the family of decision functions. Consider the following two trajectories: (0, 1, 2, 3, . . . , n − 1, n) and (0, 1, 2, 3, . . . , n − 1, n − 2). They differ only in the direction of the last step. They also differ in the fact that TM = n for the first one and TM = n − 1 for the second one. On the other hand, by the definition of the decision functions, we have

1_{TM = n−1} = G_{n−1}(X0, . . . , X_{n−1}).

The right-hand side is equal for both trajectories, while the left-hand side equals 0 for the first one and 1 for the second one. A contradiction.
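The contrast between the two examples is easy to express in code; a sketch with our own function names, where a path is just a list of walk values:

```python
def first_hitting_time(path, level):
    """T_l: the scan is causal -- stopping at n uses only path[0..n]."""
    for n, x in enumerate(path):
        if x == level:
            return n
    return None                       # level never hit: T_l = +infinity

def last_max_time(path):
    """T_M: needs the whole trajectory up front -- not a stopping time."""
    m = max(path)
    return max(n for n, x in enumerate(path) if x == m)

walk = [0, 1, 0, 1, 2, 1, 2]
# first_hitting_time(walk, 2) -> 4; last_max_time(walk) -> 6
```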

Remark 3.3. In the remainder of this section, we will sometimes write our decision functions as functions of the random variables ξ1, ξ2, . . . , ξn rather than the random variables X0, X1, X2, . . . , Xn. As knowing the values X0, X1, X2, . . . , Xn is clearly equivalent to knowing the values ξ1, ξ2, . . . , ξn, we are free to use whichever representation is more convenient.


3.2 Wald's identity II

Having defined the notion of a stopping time, let us try to compute something about it. The random variables (ξn)n∈N in the statement of the theorem below are only assumed to be independent of each other and identically distributed. To make things simpler, you can think of (ξn)n∈N as increments of a simple random walk. Before we state the main result, here is an extremely useful identity:

Proposition 3.4. Let N be an N0-valued random variable. Then

E[N] = ∑_{k∈N} P[N ≥ k].

Proof. Clearly, P[N ≥ k] = ∑_{j≥k} P[N = j], so (note what happens to the indices when we switch the sums)

∑_{k∈N} P[N ≥ k] = ∑_{k∈N} ∑_{j≥k} P[N = j] = ∑_{j∈N} ∑_{k=1}^{j} P[N = j] = ∑_{j∈N} j P[N = j] = E[N].

Theorem 3.5 (Wald's Identity II). Let (ξn)n∈N be independent and identically distributed random variables with E[|ξ1|] < ∞, and let T be a stopping time with respect to (ξn)n∈N such that E[T] < ∞. Then

E[∑_{k=1}^{T} ξ_k] = E[T] E[ξ1].

Proof. The idea is to rewrite the random sum as an infinite series, using indicators:

∑_{k=1}^{T} ξ_k = ∑_{k=1}^{∞} 1_{k≤T} ξ_k

(the summand ξ_k contributes when k ≤ T, and contributes nothing when k > T) after that. Taking expectation of both sides and switching E and ∑ (this can be justified, but the argument is technical and we omit it here) yields:

E[∑_{k=1}^{T} ξ_k] = ∑_{k=1}^{∞} E[1_{k≤T} ξ_k]. (3.1)

Now let's look at the random variable 1_{k≤T} more closely. We have

1_{k≤T} = 1 − 1_{k>T} = 1 − 1_{T≤k−1} = 1 − ∑_{j=0}^{k−1} 1_{T=j} = 1 − ∑_{j=0}^{k−1} Gj(ξ1, . . . , ξj),


where Gj(ξ1, . . . , ξj) is the decision function which corresponds to the event {T = j}. In particular, we see that the random variable 1_{k≤T} can be written as a function of the variables (ξ1, . . . , ξ_{k−1}). As the random variables (ξ1, . . . , ξk) are independent, the random variables 1_{k≤T} and ξk are also independent. This means that

E[∑_{k=1}^{T} ξ_k] = ∑_{k=1}^{∞} E[1_{k≤T}] E[ξ_k] = E[ξ1] ∑_{k=1}^{∞} P(k ≤ T) = E[ξ1] E[T],

where the last equality follows from Proposition 3.4.

Example 3.6 (Gambler's ruin problem). A gambler starts with x ∈ N dollars and repeatedly plays a game in which he wins a dollar with probability 1/2 and loses a dollar with probability 1/2. He decides to stop when one of the following two things happens:

1. he goes bankrupt, i.e., his wealth hits 0, or

2. he makes enough money, i.e., his wealth reaches some level a > x.

The classical Gambler's ruin problem asks the following question: what is the probability that the gambler will make a dollars before he goes bankrupt?

The gambler's wealth (Wn)n∈N0 is modeled by a simple random walk starting from x, whose increments ξk = Wk − W_{k−1} are coin-tosses. Then Wn = x + Xn, where Xn = ∑_{k=0}^{n} ξ_k, n ∈ N0. Let T be the time the gambler stops. We can represent T in two different (but equivalent) ways. On the one hand, we can think of T as the smaller of the two hitting times T_{−x} and T_{a−x} of the levels −x and a − x for the random walk (Xn)n∈N0 (remember that Wn = x + Xn, so these two correspond to the hitting times for the process (Wn)n∈N0 of the levels 0 and a). On the other hand, we can think of T as the first hitting time of the two-element set {−x, a − x} for the process (Xn)n∈N0. In either case, it is quite clear that T is a stopping time (can you write down the decision functions?). We will see later that the probability that the gambler's wealth remains strictly between 0 and a forever is zero, so P[T < ∞] = 1.

What can we say about the random variable XT - the gambler's wealth (minus x) at the random time T? Clearly, it is either equal to −x or to a − x, and the probabilities p0 and pa with which it takes these values are exactly what we are after in this problem. We know that, since there are no other values XT can take, we must have p0 + pa = 1. Wald's identity gives us the second equation for p0 and pa:

E[XT] = E[ξ1] E[T] = 0 · E[T] = 0,

so

0 = E[XT] = p0 (−x) + pa (a − x).

These two linear equations with two unknowns yield

p0 = (a − x)/a,  pa = x/a.

It is remarkable that the two probabilities are proportional to the amounts of money the gambler needs to make (lose) in the two outcomes. Again we see that the gambler cannot extract positive expected value from a fair game. The situation is different when p ≠ 1/2.
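A Monte Carlo check of the ruin probability; the function, seed, and parameters x = 3, a = 10 are our own choices (theory predicts p0 = 0.7 for them):

```python
import random

def ruin_probability(x, a, trials, rng):
    """Fraction of symmetric walks from x that hit 0 before a (0 < x < a)."""
    ruined = 0
    for _ in range(trials):
        w = x
        while 0 < w < a:
            w += rng.choice((-1, 1))
        ruined += (w == 0)
    return ruined / trials

rng = random.Random(0)
est = ruin_probability(3, 10, 20_000, rng)   # theory: p0 = (a - x)/a = 0.7
```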


3.3 The distribution of the first hitting time T1

Let (Xn)n∈N0 be a simple random walk with probability p of stepping up, and let

Tℓ = min{n ∈ N0 : Xn = ℓ}

be the first hitting time of the level ℓ by the random walk. We will now study the random variables Tℓ using generating-function methods. We will essentially follow the approach of Example 2.15: we will attempt to determine an equation that the generating function associated with the random variable Tℓ satisfies.

The first step is contained in the following proposition. Recall that the minimum value of the empty set is taken to be +∞ by convention.

Proposition 3.7. Let (X1, X2, . . . ) be a random walk with probability p of moving up. If ℓ ∈ N, then G_{Tℓ}(s) = G_{T1}(s)^ℓ, where G_{Tℓ} denotes the generating function of Tℓ.

Proof. We will only handle the case ℓ = 2, but the general case follows in much the same way. The strategy is to show that

P(T2 = m + n | T1 = m) = P(T1 = n), m, n ∈ N0.

To do this, let's consider the events

A_{m,n} = {n is the first time after m that X hits X_m + 1} = {X_i ≤ X_m for all m ≤ i < n, and X_n = X_m + 1},

for m ≤ n. These events have two properties:

1. If i < j ≤ k < ℓ, then the events A_{i,j} and A_{k,ℓ} are independent. This is because A_{i,j} only depends on the steps ξ_{i+1}, ξ_{i+2}, . . . , ξ_j of the random walk, while A_{k,ℓ} only depends on the steps ξ_{k+1}, ξ_{k+2}, . . . , ξ_ℓ, and all the steps of the random walk are independent.

2. P(A_{i,j}) = P(A_{n+i,n+j}) for all n ∈ N0. The event A_{i,j} is determined by some relatively complicated formula applied to the random variables (ξ_{i+1}, ξ_{i+2}, . . . , ξ_j), and the event A_{n+i,n+j} is determined by the exact same formula applied to the random variables (ξ_{n+i+1}, ξ_{n+i+2}, . . . , ξ_{n+j}). More precisely, we can choose a set B ⊆ {−1, 1}^{j−i} such that A_{i,j} = {(ξ_{i+1}, ξ_{i+2}, . . . , ξ_j) ∈ B} and A_{n+i,n+j} = {(ξ_{n+i+1}, ξ_{n+i+2}, . . . , ξ_{n+j}) ∈ B}. As the random vectors (ξ_{i+1}, ξ_{i+2}, . . . , ξ_j) and (ξ_{n+i+1}, ξ_{n+i+2}, . . . , ξ_{n+j}) have the same joint distribution, the events A_{i,j} and A_{n+i,n+j} must have the same probability.

So, if m, n ∈ N, then

P(T2 = m + n | T1 = m) = P(A_{m,m+n} | A_{0,m}) = P(A_{m,m+n}) = P(A_{0,n}) = P(T1 = n).

It then follows from Proposition 2.12 that G_{T2}(s) = G_{T1}(s)^2.
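The key distributional identity here (T2 − T1 has the same law as T1) can be checked by simulation; a sketch with our own names, using an upward bias p = 0.6 so that both times have finite mean 1/(p − q) = 5:

```python
import random

def t1_and_gap(p, rng):
    """Run one upward-biased walk until it first hits 2; report T1 and T2 - T1."""
    x = n = 0
    t1 = None
    while x != 2:
        x += 1 if rng.random() < p else -1
        n += 1
        if x == 1 and t1 is None:
            t1 = n                    # first visit to level 1
    return t1, n - t1

rng = random.Random(7)
pairs = [t1_and_gap(0.6, rng) for _ in range(20_000)]
mean_t1 = sum(t for t, _ in pairs) / len(pairs)    # theory: 1/(p - q) = 5
mean_gap = sum(g for _, g in pairs) / len(pairs)   # same, since T2 - T1 ~ T1
```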

We will now use the previous relationship between T1 and T2 to obtain the generating function for T1 explicitly.


Proposition 3.8. Let (X1, X2, . . . ) be a random walk with probability p of moving up, and let

T1 = min{n ∈ N0 : Xn = 1}

denote the first time that the random walk hits the level 1. Then the generating function of T1 is given by

G_{T1}(s) = (1 − √(1 − 4pqs^2)) / (2qs).

Proof. Our strategy is to condition on the first move that the random walk makes, and then derive a recursive formula for the generating function. As a result, it will be useful to consider an auxiliary process given by:

Yn = X_{n+1} − X_1, n ∈ N0.

So Y corresponds to the changes in the process X after the first step. It is not hard to check that Y is also a random walk with probability p of moving up at each step, and that Y is independent of X1.

It turns out that it will be useful to consider the random variable:

T′2 = the first time that Y hits the level 2 = min{n ∈ N0 : Yn = 2}.

As Y is a random walk, the generating function of T′2 is given by G_{T1}^2. As Y is independent of X1, and T′2 is determined by Y, T′2 is also independent of X1.

Crucial observation: If the first step that X makes is down, then T1 = 1 + T′2. This is because X now has to climb two steps up to get from −1 up to 1. This is equivalent to Y climbing two steps up from 0 to 2. The other thing to realize is that Y is running on a clock that is one time step behind X. If Y first hits 2 at time T′2 relative to its clock, then X first hits 1 at time 1 + T′2 relative to the initial clock.

Now we make the recursive argument:

G_{T1}(s) = E[s^{T1}] = E[s^{T1} | X1 = 1] P(X1 = 1) + E[s^{T1} | X1 = −1] P(X1 = −1)
          = s · p + E[s^{1+T′2} | X1 = −1] (1 − p)
          = s · p + s E[s^{T′2}] (1 − p)
          = s · p + s G_{T1}(s)^2 (1 − p).

We now know that G_{T1} solves G_{T1}(s) = sp + sq G_{T1}(s)^2 for |s| < 1. There are two possible solutions to this equation (for each s), given by

G_{T1}(s) = (1 ± √(1 − 4pqs^2)) / (2qs).

One of these solutions is always greater than 1 in absolute value, so it cannot correspond to the value of a generating function, and we must select the negative square root.


Once we have the generating function for T1, we can begin to answer some questions about this random variable. The first question is: does the random walk eventually hit the level 1? To answer this, we use the second part of Proposition 2.7 and compute

P(T1 < ∞) = lim_{s↗1} G_{T1}(s) = (1 − √(1 − 4pq))/(2q) = (1 − |p − q|)/(2q),

which equals 1 when p ≥ 1/2 and p/q when p < 1/2. So, for a walk with a downward bias, there is a positive probability that the level 1 is never reached. As for the expected hitting time, E[T1] turns out to be infinite when p ≤ 1/2; for p > 1/2, the situation is less severe:

E[T1] = lim_{s↗1} G′_{T1}(s) = 1/(p − q).

We can summarize the situation in the following table:

           P[T1 < ∞]    E[T1]
p < 1/2    p/q          +∞
p = 1/2    1            +∞
p > 1/2    1            1/(p − q)

Finally, we can try to extract the probability mass function of the random variable T1 from the generating function G_{T1}. The obvious way to do this is to compute higher and higher derivatives of G_{T1} and then set s = 0. In this case, it turns out there is an easier way.


The square root appearing in the formula for G_{T1} is an expression of the form (1 + x)^{1/2}, and the (generalized) binomial formula can be used:

(1 + x)^α = ∑_{k=0}^{∞} (α choose k) x^k, where (α choose k) = α(α − 1) . . . (α − k + 1)/k!, k ∈ N, α ∈ R.

Therefore,

G_{T1}(s) = 1/(2qs) − (1/(2qs)) ∑_{k=0}^{∞} (1/2 choose k) (−4pqs^2)^k = ∑_{k=1}^{∞} s^{2k−1} (1/(2q)) (4pq)^k (−1)^{k−1} (1/2 choose k),

and

a_{2k−1} = (1/(2q)) (4pq)^k (−1)^{k−1} (1/2 choose k), k ∈ N.

Of course, the random walk cannot move from 0 to 1 in an even number of steps, so a_n = 0 if n is even. This expression can be simplified a bit further: one can show (by induction on k, for instance) that

(1/2 choose k) = (2 (−1)^{k+1} / (4^k (2k − 1))) (2k − 1 choose k).

Thus,

P(T1 = 2k − 1) = (1/(2k − 1)) (2k − 1 choose k) p^k q^{k−1}, k ∈ N,

and P(T1 = k) = 0 when k is even.
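The pmf formula can be checked against a direct series expansion of G_{T1}; a sketch assuming SymPy is available, with p = 1/3 chosen arbitrarily:

```python
import sympy as sp
from math import comb

p = sp.Rational(1, 3)
q = 1 - p
s = sp.symbols('s')
G = (1 - sp.sqrt(1 - 4*p*q*s**2)) / (2*q*s)

series = sp.series(G, s, 0, 10).removeO()
checks = [
    series.coeff(s, 2*k - 1)
    == sp.Rational(comb(2*k - 1, k), 2*k - 1) * p**k * q**(k - 1)
    for k in range(1, 5)
]
# checks -> [True, True, True, True]
```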

3.4 Strong Markov property

In the previous section, we observed that Yn = X_{1+n} − X_1 is again a random walk. Intuitively, if we stop, reset time and space, and then continue, the resulting process that we obtain is again a random walk. It turns out this is true not only for deterministic times, but even for stopping times. This is in fact a consequence of the strong Markov property, which we will define later in the course.

Proposition 3.9. Let (Xn)n∈N0 be a random walk with parameter p, and let T be a stopping time with respect to (Xn)n∈N0 which never takes the value ∞. If we define the process

Yn = X_{T+n} − X_T, n ∈ N0,

then (Yn)n∈N0 is also a random walk with parameter p.


4 Branching Process

4.1 A bit of history

In the mid-19th century several aristocratic families in Victorian England realized that their family names could become extinct. Was it just unfounded paranoia, or did something real prompt them to come to this conclusion? They decided to ask around, and Sir Francis Galton (a polymath, anthropologist, eugenicist, tropical explorer, geographer, inventor, meteorologist, proto-geneticist, psychometrician and statistician, and half-cousin of Charles Darwin) posed the following question (1873, Educational Times):

How many male children (on average) must each generation of a family have in order for the family name to continue in perpetuity?

The first complete answer came from Reverend Henry William Watson soon after, and the two wrote a joint paper entitled "On the probability of extinction of families" in 1874. By the end of this section, you will be able to give a precise answer to Galton's question.

[Portrait: Sir Francis Galton]

4.2 A mathematical model

The model proposed by Watson was the following:

1. A population starts with one individual at time n = 0: Z0 = 1.

2. After one unit of time (at time n = 1) the sole individual produces Z1 identical clones of itself and dies. Z1 is an N0-valued random variable.

3. (a) If Z1 happens to be equal to 0, the population is dead and nothing happens at any future time n ≥ 2.

(b) If Z1 > 0, a unit of time later each of the Z1 individuals gives birth to a random number of children and dies. The first one has Z_{1,1} children, the second one Z_{1,2} children, etc. The last, Z1-th, one gives birth to Z_{1,Z1} children. We assume that the distribution of the number of children is the same for each individual in every generation and independent both of the number of individuals in the generation and of the number of children the others have. This distribution, shared by all Z_{n,i} and Z1, is called the offspring distribution. The total number of individuals in the second generation is now

Z2 = ∑_{k=1}^{Z1} Z_{1,k}.

    27

  • 8/2/2019 Walks and Branching

    28/33

(c) The third, fourth, etc. generations are produced in the same way. If it ever happens that Zn = 0 for some n, then Zm = 0 for all m ≥ n - the population is extinct. Otherwise,

Z_{n+1} = ∑_{k=1}^{Zn} Z_{n,k}.

Definition 4.1. A stochastic process with the properties described in (1), (2) and (3) above is called a (simple) branching process.

The mechanism that produces the next generation from the present one can differ from application to application. It is the offspring distribution alone that determines the evolution of a branching process. With this new formalism, we can pose Galton's question more precisely:

Under what conditions on the offspring distribution will the process (Zn)n∈N0 never go extinct, i.e., when does

P[Zn ≥ 1 for all n ∈ N0] > 0 (4.1)

hold?

4.3 Construction and simulation of branching processes

Before we answer Galton's question, let us figure out how to simulate a branching process for a given offspring distribution p(k) = P[Z1 = k], k ∈ N0. When we studied simulation, we showed that it is possible to construct a function g : [0, 1] → N0 such that the random variable g(U) has probability mass function p when U ~ Uniform(0, 1).

Some time ago we asserted that there exists a probability space which supports a sequence (Un)n∈N0 of independent U[0, 1] random variables. We think of (Un)n∈N0 as a sequence of random numbers produced by a computer. Let us first apply the function g to each member of (Un)n∈N0 to obtain an independent sequence (ηn)n∈N0 of N0-valued random variables with pmf p. In the case of a simple random walk, we would be done at this point - an accumulation of the first n elements of (ηn)n∈N0 would give you the value Xn of the random walk at time n. Branching processes are a bit more complicated; the increment Z_{n+1} − Zn depends on Zn: the more individuals in a generation, the more offspring they will produce. In other words, we need a black box with two inputs - randomness and Zn - which will produce Z_{n+1}. What do we mean by randomness? Ideally, we would need exactly Zn (unused) elements of (ηn)n∈N0 to simulate the number of children for each of the Zn members of generation n. This is exactly how one would do it in practice: given the size Zn of generation n, one would draw Zn simulations from the distribution with pmf p, and sum up the results to get Z_{n+1}. Mathematically, it is easier to be more wasteful. The sequence (ηn)n∈N0 can be rearranged into a double sequence² {Z_{n,i}}, n ∈ N0, i ∈ N. In words, instead of one sequence of independent random variables with pmf p, we have a sequence of sequences. Such an abundance allows us to feed the whole row {Z_{n,i}}_{i∈N} into the black box which produces Z_{n+1} from Zn. You can think of Z_{n,i} as the number of children the i-th individual in the n-th generation would have had, had

²Can you find a one-to-one and onto mapping from N into N × N?


she been born. The black box uses only the first Zn elements of {Z_{n,i}}_{i∈N} and discards the rest:

Z0 = 1, Z_{n+1} = ∑_{i=1}^{Zn} Z_{n,i},

where all {Z_{n,i}}, n ∈ N0, i ∈ N, are independent of each other and have the same distribution with pmf p.

Once we learn a bit more about the probabilistic structure of (Zn)n∈N0, we will describe another way to simulate it.
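The "wasteful" construction above is exactly how one codes a simulation; a sketch with our own names, using a uniform offspring distribution on {0, 1, 2} purely for concreteness:

```python
import random

def simulate_branching(offspring, generations, rng):
    """Return [Z_0, ..., Z_generations]; offspring(rng) draws one Z_{n,i}."""
    Z = [1]                                    # Z_0 = 1
    for _ in range(generations):
        Z.append(sum(offspring(rng) for _ in range(Z[-1])))
    return Z

rng = random.Random(3)
path = simulate_branching(lambda r: r.randrange(3), 10, rng)
# extinction is absorbing: once Z_n = 0, the empty sum keeps it at 0 forever
```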

4.4 A generating-function approach

Having defined and constructed a branching process (Zn)n∈N0 with offspring distribution given by the pmf p, let us analyze its probabilistic structure. The first question that needs to be answered is the following: what is the distribution of Zn, for n ∈ N0? It is clear that Zn must be N0-valued, so its distribution is completely described by its pmf, which is, in turn, completely determined by its generating function. While an explicit expression for the pmf of Zn may not be available, its generating function can always be computed:

Proposition 4.2. Let (Zn)n∈N0 be a branching process, and let the generating function of its offspring distribution be given by G(s). Then the generating function of Zn is the n-fold composition of G with itself, i.e.,

G_{Zn}(s) = G(G(. . . G(s) . . . )) (n G's), for n ≥ 1.

Proof. For n = 1, the distribution of Z1 has pmf p, so G_{Z1}(s) = G(s). Suppose that the statement of the proposition holds for some n ∈ N. Then

Z_{n+1} = ∑_{i=1}^{Zn} Z_{n,i}

can be viewed as a random sum of Zn independent random variables, where each random summand has generating function G and the number of summands Zn is independent of the terms in the sum. Proposition 2.18 asserts that the generating function G_{Z_{n+1}} of Z_{n+1} is the composition of the generating function G_{Zn} of the random time Zn with the generating function G(s) of each of the summands. Therefore,

G_{Z_{n+1}}(s) = G_{Zn}(G(s)) = G(G(. . . G(G(s)) . . . )) (n + 1 G's),

where the second equality follows by the induction hypothesis.

Let us use Proposition 4.2 in some simple examples.

Example 4.3. Let (Zn)n∈N0 be a branching process with offspring distribution p. In the first three examples no randomness occurs and the population growth can be described exactly. In the other examples, more interesting things happen.

1. p(0) = 1, p(n) = 0, n ∈ N:
In this case Z0 = 1 and Zn = 0 for all n ∈ N. This infertile population dies after the first generation.


2. p(0) = 0, p(1) = 1, p(n) = 0, n ≥ 2:
Each individual produces exactly one child before he/she dies. The population size is always 1: Z_n = 1 for all n ∈ ℕ₀.

3. p(0) = 0, p(1) = 0, ..., p(k) = 1, p(n) = 0 for n ≠ k, for some k ≥ 2:
Here, each individual has exactly k children, so the population grows exponentially: G(s) = s^k, so

    G_{Z_n}(s) = ((... (s^k)^k ...)^k)^k = s^{k^n}.

Therefore, Z_n = k^n for n ∈ ℕ.

4. p(0) = p, p(1) = q = 1 − p, p(n) = 0, n ≥ 2:
Each individual tosses a (possibly biased) coin and has one child if the outcome is heads, or dies childless if the outcome is tails. The generating function of the offspring distribution is G(s) = p + qs. Therefore,

    G_{Z_n}(s) = p + q(p + q(p + q(... (p + qs) ...)))    (n pairs of parentheses).

The expression above can be simplified considerably. One needs to realize two things:

(a) After all the products above are expanded, the resulting expression must be of the form A + Bs, for some constants A and B. If you inspect the expression for G_{Z_n} more closely, you will see that the coefficient B next to s is just q^n.

(b) G_{Z_n} is the generating function of a probability distribution, so A + B = 1.

Therefore,

    G_{Z_n}(s) = (1 − q^n) + q^n s.

Of course, the value of Z_n will be equal to 1 if and only if all of the coin tosses of its ancestors turned out to be heads. The probability of that event is q^n, so we didn't need Proposition 4.2 after all.

This example can be interpreted alternatively as follows. Each individual has exactly one child, but its gender is determined at random: male with probability q and female with probability p. Assuming that all females change their last name when they marry, and assuming that all of them marry, Z_n is just the number of individuals carrying the family name after n generations.

5. p(0) = p², p(1) = 2pq, p(2) = q², p(n) = 0, n ≥ 3:
In this case each individual has exactly two children, and each child's gender is female with probability p and male with probability q, independently of the other. The generating function G of the offspring distribution is given by G(s) = (p + qs)². Then

    G_{Z_n}(s) = (p + q(p + q(... (p + qs)² ...)²)²    (n pairs of parentheses).

Unlike the example above, it is not so easy to simplify this expression.
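The closed form in case 4 is easy to corroborate numerically, and the same one-line iteration handles case 5, where no comparable closed form is available. A small sketch (the values of p, q, s, and n below are arbitrary illustrative choices):

```python
def iterate(G, s, n):
    """n-fold composition G(G(... G(s) ...))."""
    for _ in range(n):
        s = G(s)
    return s

p, q, n, s = 0.3, 0.7, 6, 0.4

# Case 4: G(s) = p + q s, with closed form G_{Z_n}(s) = (1 - q^n) + q^n s.
lhs = iterate(lambda x: p + q * x, s, n)
rhs = (1 - q**n) + q**n * s
print(abs(lhs - rhs) < 1e-12)   # True

# Case 5: G(s) = (p + q s)^2.  No simple closed form, but the n-fold
# composition is just as cheap to evaluate pointwise.
print(iterate(lambda x: (p + q * x)**2, s, n))
```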

Proposition 4.2 can be used to compute the mean and variance of the population size Z_n, for n ∈ ℕ.


Proposition 4.4. Let p denote the pmf of the offspring distribution for a branching process (Z_n)_{n ∈ ℕ₀}. If p has a finite mean, i.e., if

    μ = Σ_{k=0}^∞ k p(k) < ∞,

then

    E[Z_n] = μ^n.    (4.2)

If the variance of p is also finite, i.e., if

    σ² = Σ_{k=0}^∞ (k − μ)² p(k) < ∞,

then

    Var[Z_n] = σ² μ^{n−1} (1 + μ + μ² + ··· + μ^{n−1}) = σ² μ^{n−1} (μ^n − 1)/(μ − 1) if μ ≠ 1, and Var[Z_n] = σ² n if μ = 1.    (4.3)

Proof. Since the distribution of Z_1 has probability mass function p, it is clear that E[Z_1] = μ and Var[Z_1] = σ². We proceed by induction and assume that the formulas (4.2) and (4.3) hold for some n ∈ ℕ. By Proposition 4.2, the generating function G_{Z_{n+1}} is given by the composition G_{Z_{n+1}}(s) = G_{Z_n}(G(s)). Therefore, if we use the identity E[Z_{n+1}] = G'_{Z_{n+1}}(1), we get

    G'_{Z_{n+1}}(1) = G'_{Z_n}(G(1)) G'(1) = G'_{Z_n}(1) G'(1) = E[Z_n] E[Z_1] = μ^n · μ = μ^{n+1}.

A similar (but more complicated and less illuminating) argument can be used to establish (4.3).
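Formula (4.2) is easy to corroborate by simulation. A rough Monte Carlo sketch, using only the Python standard library (the pmf, horizon n, and number of trials are arbitrary illustrative choices):

```python
import random

def sample_Zn(pmf, n, rng):
    """Simulate one path of the branching process and return Z_n."""
    z = 1
    for _ in range(n):
        # Each of the z current individuals draws an independent offspring count.
        z = sum(rng.choices(range(len(pmf)), weights=pmf)[0] for _ in range(z))
    return z

pmf = [0.25, 0.5, 0.25]                        # mu = 0*0.25 + 1*0.5 + 2*0.25 = 1
mu = sum(k * pk for k, pk in enumerate(pmf))
rng = random.Random(0)
n, trials = 4, 20000
est = sum(sample_Zn(pmf, n, rng) for _ in range(trials)) / trials
print(est, mu**n)   # the Monte Carlo estimate should be close to mu^n
```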

4.5 Extinction probability

We now turn to the central question (the one posed by Galton). We define extinction to be the following event:

    E = {ω ∈ Ω : Z_n(ω) = 0 for some n ∈ ℕ}.

It follows from the properties of the branching process that Z_m = 0 for all m ≥ n whenever Z_n = 0. Therefore, we can write E as an increasing union of the sets E_n, where

    E_n = {ω ∈ Ω : Z_n(ω) = 0}.

Consequently, the sequence (P[E_n])_{n ∈ ℕ} is non-decreasing, and continuity of probability implies that

    P[E] = lim_{n→∞} P[E_n].

The number P[E] is called the extinction probability. Using generating functions, and, in particular, the fact that P[E_n] = P[Z_n = 0] = G_{Z_n}(0), we get

    P[E] = lim_{n→∞} G_{Z_n}(0) = lim_{n→∞} G(G(... G(0) ...))    (n G's).

It is remarkable that this probability can be computed even when the explicit form of the generating function G_{Z_n} is not known.
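The limit above can be computed by simply iterating G at 0. A sketch for one concrete, illustrative offspring law, p(0) = 1/4, p(1) = 1/4, p(2) = 1/2, whose extinction equation x = 1/4 + x/4 + x²/2 has roots 1/2 and 1 (compare Proposition 4.5 below):

```python
def G(s):
    # Offspring pmf p(0) = 0.25, p(1) = 0.25, p(2) = 0.5, so mu = 1.25 > 1.
    return 0.25 + 0.25 * s + 0.5 * s**2

x = 0.0
for n in range(100):
    x = G(x)          # after this step, x equals P[E_{n+1}] = G_{Z_{n+1}}(0)
print(x)              # the iterates increase to the smaller root, 0.5
```

Note that the iterates converge to 1/2 rather than to the other fixed point 1; this is exactly the "smallest solution" phenomenon proved next.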


Proposition 4.5. The extinction probability p = P[E] is the smallest non-negative solution of the equation

    x = G(x), called the extinction equation,

where G is the generating function of the offspring distribution.

Proof. Let us first show that p = P[E] is a solution of the equation x = G(x). Indeed, G is a continuous function, so G(lim_{n→∞} x_n) = lim_{n→∞} G(x_n) for every convergent sequence (x_n)_{n ∈ ℕ₀} in [0, 1]. Let us take the particular sequence given by

    x_n = G(G(... G(0) ...))    (n G's).

Then

1. p = P[E] = lim_{n→∞} x_n, and

2. G(x_n) = x_{n+1}.

Therefore,

    p = lim_{n→∞} x_n = lim_{n→∞} x_{n+1} = lim_{n→∞} G(x_n) = G(lim_{n→∞} x_n) = G(p),

and so p solves the equation G(x) = x.

The fact that p = P[E] is the smallest solution of x = G(x) on [0, 1] is a bit trickier to establish. Let p′ be any other solution of x = G(x) on [0, 1]. Since 0 ≤ p′ and G is a non-decreasing function on [0, 1], we have

    G(0) ≤ G(p′) = p′.

We can apply the function G to both sides of this inequality to get

    G(G(0)) ≤ G(G(p′)) = G(p′) = p′.

Continuing in the same way, we get

    P[E_n] = G(G(... G(0) ...)) ≤ p′    (n G's),

so p = P[E] = lim_{n→∞} P[E_n] ≤ p′. Hence p is no larger than any other solution p′ of x = G(x).
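For the two-child offspring law of Example 4.3 (case 5), the extinction equation can be solved in closed form, which gives a direct check on the fixed-point iteration. A sketch (p = 0.4, q = 0.6 is an arbitrary supercritical choice):

```python
# Offspring law p(0) = p^2, p(1) = 2pq, p(2) = q^2, so G(s) = (p + q s)^2.
p, q = 0.4, 0.6        # mean number of children mu = 2q = 1.2 > 1

# Fixed-point iteration x_{k+1} = G(x_k) starting from x_0 = 0 (Proposition 4.5).
x = 0.0
for _ in range(200):
    x = (p + q * x)**2

# The roots of x = (p + q x)^2 are 1 and (p/q)^2; the smaller one is P[E].
analytic = (p / q)**2
print(x, analytic)     # both are close to (2/3)^2 = 4/9
```

That x = 1 is always a root reflects G(1) = 1; the interesting root (p/q)² falls below 1 exactly when q > p, i.e., when the population is supercritical.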

Example 4.6. Let us compute the extinction probabilities in the cases from Example 4.3.

1. p(0) = 1, p(n) = 0, n ∈ ℕ:
No need to use any theorems here: P[E] = 1.

2. p(0) = 0, p(1) = 1, p(n) = 0, n ≥ 2:
As above, the situation is clear: P[E] = 0.

3. p(0) = 0, p(1) = 0, ..., p(k) = 1, p(n) = 0 for n ≠ k, for some k ≥ 2:
No extinction here: P[E] = 0.
