The Geometric Distribution
• Probability of the 1st success on the Nth trial, given a probability, p, of success
04/21/23 Comp 790– Distributions & Coalescence 1
$$P(N = j) = (1 - p)^{j-1}\,p$$

$$E(N) = \frac{1}{p}$$

$$\mathrm{Var}(N) = \frac{1 - p}{p^{2}}$$

$$P(\text{roll the 1st 6 on the } i\text{th roll}) = \left(1 - \tfrac{1}{6}\right)^{i-1}\left(\tfrac{1}{6}\right)$$

$$P(\text{1st heads on the } i\text{th flip}) = \left(1 - \tfrac{1}{2}\right)^{i-1}\left(\tfrac{1}{2}\right)$$
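A quick simulation (a Python sketch; the die example and sample count are illustrative) confirms the mean and variance formulas above:

```python
import random

def geometric_sample(p, rng):
    """Number of trials up to and including the first success."""
    n = 1
    while rng.random() >= p:  # failure with probability 1 - p
        n += 1
    return n

rng = random.Random(0)
p = 1.0 / 6.0  # e.g. rolling a die until the first 6
samples = [geometric_sample(p, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean)  # close to E(N) = 1/p = 6
print(var)   # close to Var(N) = (1 - p)/p^2 = 30
```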
To show P(N = j) is a proper probability mass function:

$$\sum_{j=1}^{\infty}(1-p)^{j-1}\,p \;=\; \frac{p}{1-p}\sum_{j=1}^{\infty}(1-p)^{j} \;=\; \frac{p}{1-p}\left(\frac{(1-p)^{\infty} - (1-p)}{(1-p) - 1}\right) \;=\; \frac{p}{1-p}\left(\frac{0 - (1-p)}{-p}\right) \;=\; 1$$
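The same identity can be checked numerically; truncating the sum (the cutoff and the value of p are arbitrary) already gives a total indistinguishable from 1:

```python
p = 0.3
# Partial sum of the geometric pmf; the omitted tail is (1 - p)^199, vanishingly small
total = sum((1 - p) ** (j - 1) * p for j in range(1, 200))
print(total)  # ≈ 1
```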
Example
• Difference from the “Binomial” distribution:
– Binomial(k) = P(k successes in N trials)
– Geometric(k) = P(1st success after k−1 failures)
Expected Value Proof
• The expected value is the sum of each value times its probability:

$$E(N) = \sum_{j=1}^{\infty} j\,(1-p)^{j-1}\,p = \frac{p}{1-p}\sum_{j=1}^{\infty} j\,(1-p)^{j}$$

• Recall the relation:

$$\sum_{j=1}^{\infty} j\,a^{j} = \frac{a}{(1-a)^{2}} \qquad \text{for } 0 \le a < 1$$

• Substituting a = 1 − p gives:

$$E(N) = \frac{p}{1-p}\left(\frac{1-p}{p^{2}}\right) = \frac{1}{p}$$
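The relation used above, and the substitution that follows it, are easy to sanity-check numerically (a truncated sum in Python; the value of a is arbitrary):

```python
a = 0.8  # any 0 <= a < 1
lhs = sum(j * a**j for j in range(1, 2000))  # truncated sum of j * a^j
rhs = a / (1 - a) ** 2
print(lhs, rhs)  # both ≈ 20.0

# Substituting a = 1 - p reproduces E(N) = 1/p
p = 1 - a
print(p / (1 - p) * rhs)  # ≈ 1/p = 5.0
```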
Other Properties
• Markov Property
– The probability of the “next step” in a discrete or continuous process depends only on the process's present state
– The process is without memory of previous events

$$P(T > t_2 \mid T > t_1) = P(T > t_2 - t_1)$$
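The memoryless property can be demonstrated empirically for the geometric case (a Python sketch; the values of p, t1, and t2 are arbitrary choices for illustration):

```python
import random

rng = random.Random(1)
p, t1, t2 = 0.1, 5, 12
samples = []
for _ in range(200_000):
    n = 1
    while rng.random() >= p:
        n += 1
    samples.append(n)

# Compare P(T > t2 | T > t1) against P(T > t2 - t1)
cond = sum(1 for n in samples if n > t2) / sum(1 for n in samples if n > t1)
uncond = sum(1 for n in samples if n > t2 - t1) / len(samples)
print(cond, uncond)  # both ≈ (1 - p)^(t2 - t1) ≈ 0.478
```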
Continuous Generalization
• Geometric distributions characterize “discrete” events
• Sometimes we’d like to pose questions about a continuous variable, for example:
– The probability that a population will be inbred after T years, rather than after N generations, where T is a real number and N is an integer
• The “continuous” counterpart of the geometric distribution is the “exponential” distribution
Exponential Distribution
• The Exponential density function is characterized by one parameter, a, called the “rate” or “intensity”
$$\mathrm{Exp}(a,t) = a\,e^{-at}$$

$$E(\mathrm{Exp}(a,t)) = \frac{1}{a}$$

$$\mathrm{Var}(\mathrm{Exp}(a,t)) = \frac{1}{a^{2}}$$

To show Exp(a,t) is a proper pdf:

$$\int_{t=0}^{\infty} a\,e^{-at}\,dt = \lim_{T \to \infty}\left(1 - e^{-aT}\right) = 1 - 0 = 1$$
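These claims can be checked by simulation (a Python sketch; the rate a and the Riemann-sum step are illustrative, and the inverse-transform sampler is a standard technique, not from the slides):

```python
import math
import random

a = 2.0  # an arbitrary rate for illustration
rng = random.Random(2)
# Inverse-transform sampling: if U ~ Uniform(0,1), then -ln(1 - U)/a ~ Exp(a)
samples = [-math.log(1.0 - rng.random()) / a for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean)  # close to 1/a = 0.5
print(var)   # close to 1/a^2 = 0.25

# Riemann-sum check that the density integrates to 1
dt = 1e-4
total = sum(a * math.exp(-a * i * dt) * dt for i in range(200_000))
print(total)  # ≈ 1
```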
Exponential Properties
• Other useful properties of U = Exp(a,t) include:
– The Markov property, where t2 > t1:

$$P(U > t_2 \mid U > t_1) = P(U > t_2 - t_1)$$

– Assuming a second independent exponential process, V = Exp(b,t):

$$P(U < V) = \frac{a}{a+b}$$

$$\min(U, V) \sim \mathrm{Exp}(a+b)$$
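Both competing-exponentials properties can be verified by simulation (a Python sketch with arbitrary rates a and b):

```python
import math
import random

a, b = 1.0, 3.0  # arbitrary rates for illustration
rng = random.Random(3)
trials = 200_000
wins = 0
total_min = 0.0
for _ in range(trials):
    u = -math.log(1.0 - rng.random()) / a  # U ~ Exp(a)
    v = -math.log(1.0 - rng.random()) / b  # V ~ Exp(b)
    wins += u < v
    total_min += min(u, v)
print(wins / trials)       # ≈ a/(a + b) = 0.25
print(total_min / trials)  # ≈ 1/(a + b) = 0.25, consistent with min(U,V) ~ Exp(a + b)
```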
Approximations
• The geometric distribution can be approximated with the exponential distribution in various ways
• Consider the following geometric distribution
• We can model discrete time as a rational fraction of some very large number, M, that includes all intervals of interest (i.e. 1/M, 2/M, …, N/M, …, rather than 1, 2, 3, …)
• Assuming p is small and N is large, we can approximate “continuous” time as t = j/M and a = pM
$$P(N \ge j) = (1-p)^{j}$$

There are at least j failures before the first success.
Approximations (cont)
• Recalling t = j/M and a = pM, we can rewrite (1 − p)^j as:

$$P(N \ge j) = (1-p)^{j} = \left(1 - \frac{pM}{M}\right)^{\frac{j}{M}M} = \left(1 - \frac{a}{M}\right)^{tM} = P\!\left(\frac{N}{M} \ge t\right)$$

• Also note, for large M:

$$\left(1 - \frac{a}{M}\right)^{tM} \approx e^{-at}$$

• Thus P(N/M ≥ t) ≈ e^{−at}, so T = N/M is approximately exponential with intensity a.
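How quickly (1 − a/M)^{tM} approaches e^{−at} as M grows can be seen directly (arbitrary values of a and t):

```python
import math

a, t = 0.5, 3.0
exact = math.exp(-a * t)
for M in (10, 100, 10_000):
    approx = (1 - a / M) ** (t * M)
    print(M, approx, exact)  # the gap shrinks roughly like 1/M
```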
The Discrete-Time Coalescent
• We consider the N-coalescent, or the coalescent for a sample of N genes (Kingman 1982)
• N-coalescent: What is the distribution of the number of generations back to the Most Recent Common Ancestor (MRCA) for a fixed population of 2N genes?
• We use 2N because we recognize that the diploid case is more realistic, and it is related to the simpler haploid case by a factor of 2
MRCA Examples
Coalescence of two genes
• What is the distribution of the number of prior generations back to the MRCA (the waiting time)?
• The probability of a common parent (i.e. the MRCA is in the immediately previous generation) is:

$$\frac{1}{2N}$$

The first gene can choose its ancestor freely, but the second must choose the same one as the first; thus it has 1 out of 2N choices.

• The probability that the 2 genes have different parents is:

$$1 - \frac{1}{2N}$$
Going back further
• Since sampling in successive generations is independent of the past, the probability that two genes find a common ancestor j generations back is:
• Which is a geometric distribution with p = 1/2N
• Thus, the coalescence time for 2 genes is:

$$\mathrm{MRCA}(j) = \left(1 - \frac{1}{2N}\right)^{j-1} \frac{1}{2N}$$

In the first j−1 generations they chose different ancestors, and then in generation j they chose the same ancestor.

$$E(\mathrm{MRCA}(j)) = \frac{1}{p} = 2N$$
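A direct simulation of two lineages (a Python sketch with a deliberately small population, 2N = 20, chosen for speed) reproduces the expectation E = 2N:

```python
import random

def coalescence_time(two_N, rng):
    """Generations back until two genes pick the same parent among 2N genes."""
    j = 1
    while rng.randrange(two_N) != rng.randrange(two_N):
        j += 1
    return j

rng = random.Random(4)
two_N = 20
times = [coalescence_time(two_N, rng) for _ in range(50_000)]
mean = sum(times) / len(times)
print(mean)  # close to 2N = 20
```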
MRCA Examples
N = 10
N-genes, no common parent
• To find the waiting time for k ≤ 2N genes to coalesce to fewer than k lineages, first compute the probability that all k genes have distinct parents in the previous generation:

$$\frac{(2N-1)}{2N}\cdot\frac{(2N-2)}{2N}\cdots\frac{(2N-(k-1))}{2N} = \prod_{i=1}^{k-1}\left(1 - \frac{i}{2N}\right)$$

The 1st gene can choose its parent freely, but the next k−1 must each choose from the remainder (the genes without a child so far).

• Manipulating a little:

$$\prod_{i=1}^{k-1}\left(1 - \frac{i}{2N}\right) = 1 - \sum_{i=1}^{k-1}\frac{i}{2N} + O\!\left(\frac{1}{N^{2}}\right) = 1 - \binom{k}{2}\frac{1}{2N} + O\!\left(\frac{1}{N^{2}}\right)$$

• Where, for large N, the O(1/N²) term is negligible
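The quality of dropping the O(1/N²) term is easy to check numerically (illustrative values of 2N and k):

```python
from math import comb, prod

two_N, k = 2000, 5
exact = prod(1 - i / two_N for i in range(1, k))  # product over i = 1..k-1
approx = 1 - comb(k, 2) / two_N                   # drop the O(1/N^2) term
print(exact, approx)  # agree to roughly 1/N^2
```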
N-gene Coalescence
• The probability that k genes all have different parents is:

$$1 - \binom{k}{2}\frac{1}{2N}$$

• And the probability that one or more pairs have a common parent is:

$$1 - \left(1 - \binom{k}{2}\frac{1}{2N}\right) = \binom{k}{2}\frac{1}{2N}$$

• Repeated failures for j generations lead to a geometric distribution:

$$P(N = j) = \left(1 - \binom{k}{2}\frac{1}{2N}\right)^{j-1}\binom{k}{2}\frac{1}{2N}$$

with

$$p = \binom{k}{2}\frac{1}{2N}$$
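A simulation of the first coalescence among k lineages (a Python sketch; the population size, k, and run count are illustrative) matches the geometric expectation 1/p ≈ 2N/C(k,2):

```python
import random
from math import comb

def first_coalescence(two_N, k, rng):
    """Generations back until any two of k lineages share a parent among 2N genes."""
    j = 1
    while True:
        parents = [rng.randrange(two_N) for _ in range(k)]
        if len(set(parents)) < k:  # at least one pair chose the same parent
            return j
        j += 1

rng = random.Random(5)
two_N, k = 200, 4
times = [first_coalescence(two_N, k, rng) for _ in range(10_000)]
mean = sum(times) / len(times)
print(mean, two_N / comb(k, 2))  # ≈ 2N / C(k,2) ≈ 33.3
```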
Next Time
• Finish coalescence of N genes
• The effect of approximations
• The continuous-time coalescent
• The effective population size