The Geometric Distribution


Transcript of The Geometric Distribution

Page 1: The Geometric Distribution

The Geometric Distribution

• Probability of the 1st success on the Nth trial, given a probability, p, of success

04/21/23 Comp 790– Distributions & Coalescence 1

$$P(N = j) = (1 - p)^{j-1}\, p$$

$$E(N) = \frac{1}{p}$$

$$\mathrm{Var}(N) = \frac{1 - p}{p^2}$$

P(Roll 1st 6 on the ith roll) = $(1 - 1/6)^{i-1}\,(1/6)$

P(1st heads on the ith flip) = $(1 - 1/2)^{i-1}\,(1/2)$

To show P(N=j) is a proper pdf:

$$\sum_{j=1}^{\infty} (1-p)^{j-1}\, p = \frac{p}{1-p} \sum_{j=1}^{\infty} (1-p)^{j} = \frac{p}{1-p} \left( \frac{(1-p)^{\infty} - (1-p)}{(1-p) - 1} \right) = \frac{p}{1-p} \left( \frac{0 - (1-p)}{-p} \right) = 1$$
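The normalization, mean, and variance can also be checked numerically. The following is a minimal sketch, assuming the die-rolling case p = 1/6 and a truncation at 2000 terms (both choices are illustrative, and the function names are hypothetical):

```python
def geometric_pmf(j, p):
    """P(N = j): the first success happens on trial j (j = 1, 2, ...)."""
    return (1 - p) ** (j - 1) * p


def summarize(p, max_j=2000):
    """Truncated sums approximating the total mass, mean, and variance."""
    total = mean = second_moment = 0.0
    for j in range(1, max_j + 1):
        w = geometric_pmf(j, p)
        total += w
        mean += j * w
        second_moment += j * j * w
    return total, mean, second_moment - mean ** 2


# Rolling a die until the first 6 (assumed example, p = 1/6).
total, mean, var = summarize(1 / 6)
print(total, mean, var)  # compare with 1, 1/p = 6, and (1 - p)/p**2 = 30
```

The truncated sums should agree with the closed forms up to floating-point error, because the neglected tail is of order (5/6)^2000.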

Page 2: The Geometric Distribution

Example

• Difference from “Binomial” distribution

– Binomial(k) = P(k successes in N trials)

– Geometric(k) = P(1st success after k-1 failures)
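A small sketch of the contrast, assuming a fair coin (p = 1/2) and three flips; the function names are hypothetical and only the standard library is used:

```python
from math import comb


def binomial_pmf(k, n, p):
    """P(exactly k successes in n trials), in any order."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


def geometric_pmf(k, p):
    """P(first success on trial k), i.e. after k - 1 failures."""
    return (1 - p) ** (k - 1) * p


p = 0.5
one_head_in_three = binomial_pmf(1, 3, p)  # head may fall on any of the 3 flips
first_head_on_third = geometric_pmf(3, p)  # head must fall on flip 3 exactly
print(one_head_in_three, first_head_on_third)
```

The binomial value is three times the geometric one here, because the binomial counts every position at which the single head could occur.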


Page 3: The Geometric Distribution

Expected Value Proof

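• The expected value is the sum, over all outcomes, of each value times its probability:

$$E(N) = \sum_{j=1}^{\infty} j\,(1-p)^{j-1}\, p = \frac{p}{1-p} \sum_{j=1}^{\infty} j\,(1-p)^{j}$$

• Recall the relation:

$$\sum_{j=1}^{\infty} j\,a^{j} = \frac{a}{(1-a)^2} \quad \text{for } 0 \le a < 1$$

• Substituting a = 1 − p gives:

$$E(N) = \frac{p}{1-p} \left( \frac{1-p}{p^2} \right) = \frac{1}{p}$$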
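The series identity used in this proof can be sanity-checked with partial sums. This is a sketch under the assumption a = 0.75 (so p = 0.25) and 500 terms; both values are arbitrary illustrations:

```python
def partial_sum(a, terms):
    """Sum of j * a**j for j = 1 .. terms."""
    return sum(j * a ** j for j in range(1, terms + 1))


a = 0.75                                 # a = 1 - p with p = 0.25 (assumed)
closed_form = a / (1 - a) ** 2           # right-hand side of the relation
approx = partial_sum(a, 500)             # left-hand side, truncated
expected_trials = (1 - a) / a * approx   # p / (1 - p) times the sum
print(closed_form, approx, expected_trials)  # last value should be near 1/p = 4
```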

Page 4: The Geometric Distribution

Other Properties

• Markov Property

– The probability of the “next step” in a discrete or continuous process depends only on the process's present state

– The process is without memory of previous events

$$P(T > t_2 \mid T > t_1) = P(T > t_2 - t_1)$$
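For a geometric waiting time the memoryless identity can be checked directly from the survival function. A minimal sketch, assuming p = 0.2, t1 = 3, and t2 = 8 as arbitrary illustrative values:

```python
def survival(t, p):
    """P(T > t) for a geometric waiting time: the first t trials all fail."""
    return (1 - p) ** t


p, t1, t2 = 0.2, 3, 8
conditional = survival(t2, p) / survival(t1, p)  # P(T > t2 | T > t1)
unconditional = survival(t2 - t1, p)             # P(T > t2 - t1)
print(conditional, unconditional)                # the two should match
```

Having already waited t1 trials does not change the distribution of the remaining wait, which is what "without memory" means here.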

Page 5: The Geometric Distribution

Continuous Generalization

• Geometric distributions characterize “discrete” events

• Sometimes we’d like to pose questions about a continuous variable, for example

– Probability that a population will be inbred after T years, rather than after N generations, where T is a real number, and N is an integer

• The “continuous” counterpart of the geometric distribution is the “exponential” distribution


Page 6: The Geometric Distribution

Exponential Distribution

• The Exponential density function is characterized by one parameter, a, called the “rate” or “intensity”

$$\mathrm{Exp}(a, t) = a e^{-at}$$

$$E(\mathrm{Exp}(a, t)) = \frac{1}{a}$$

$$\mathrm{Var}(\mathrm{Exp}(a, t)) = \frac{1}{a^2}$$

To show Exp(a,t) is a proper pdf:

$$\int_{t=0}^{\infty} a e^{-at}\, dt = \left[ 1 - e^{-at} \right]_{0}^{\infty} = 1 - 0 = 1$$
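The same three facts (unit mass, mean 1/a, variance 1/a^2) can be checked by numerical integration. The sketch below assumes a rate a = 2 and truncates the integral at t = 40/a, where the remaining tail mass is about e^{-40}; the midpoint-rule helper is a hypothetical name, not a library call:

```python
from math import exp


def exp_pdf(t, a):
    """Exponential density with rate (intensity) a."""
    return a * exp(-a * t)


def integrate(f, lo, hi, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


a = 2.0           # assumed rate
upper = 40.0 / a  # truncation point for the infinite upper limit
mass = integrate(lambda t: exp_pdf(t, a), 0.0, upper)
mean = integrate(lambda t: t * exp_pdf(t, a), 0.0, upper)
second = integrate(lambda t: t * t * exp_pdf(t, a), 0.0, upper)
var = second - mean ** 2
print(mass, mean, var)  # compare with 1, 1/a = 0.5, and 1/a**2 = 0.25
```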

Page 7: The Geometric Distribution

Exponential Properties

• Other useful properties of U = Exp(a,t) include:

– Markov property, where t2 > t1

$$P(U > t_2 \mid U > t_1) = P(U > t_2 - t_1)$$

– Assuming a second independent exponential process, V = Exp(b,t)

$$P(U < V) = \frac{a}{a+b}$$

$$\min(U, V) \sim \mathrm{Exp}(a + b)$$
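The two-process properties lend themselves to a quick simulation. This is a rough sketch, assuming rates a = 1 and b = 3, 100,000 trials, and a fixed seed; it uses `random.expovariate` from the Python standard library:

```python
import random


def simulate(a, b, trials=100_000, seed=0):
    """Estimate P(U < V) and E[min(U, V)] for independent exponentials."""
    rng = random.Random(seed)
    wins = 0
    total_min = 0.0
    for _ in range(trials):
        u = rng.expovariate(a)  # U with rate a
        v = rng.expovariate(b)  # V with rate b
        wins += u < v
        total_min += min(u, v)
    return wins / trials, total_min / trials


frac_u_first, mean_min = simulate(1.0, 3.0)
# Compare with a/(a+b) = 0.25 and the Exp(a+b) mean 1/(a+b) = 0.25.
print(frac_u_first, mean_min)
```

The estimates carry Monte Carlo noise, so they should be close to the closed forms but not identical.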

Page 8: The Geometric Distribution

Approximations

• The geometric distribution can be approximated with the exponential distribution in various ways

• Consider the following geometric distribution

$$P(N > j) = (1-p)^{j}$$

The first j trials all fail, so there are at least “j” failures before the first success

• We can model discrete time as a rational fraction of some very large number, M, that includes all intervals of interest (i.e. 1/M, 2/M, … N/M … M/M, rather than 1, 2, 3, …)

• Assuming p is small and N is large, we can approximate “continuous” time as t = j/M and a = pM

Page 9: The Geometric Distribution

Approximations (cont)

• Recalling t = j/M and a = pM, we can rewrite $(1-p)^j$ as:

$$P(N > j) = (1-p)^{j} = \left(1 - \frac{pM}{M}\right)^{\frac{jM}{M}} = \left(1 - \frac{a}{M}\right)^{tM} = P\!\left(\frac{N}{M} > t\right)$$

• Also note, for large M:

$$\left(1 - \frac{a}{M}\right)^{tM} \approx e^{-at}$$

• Thus the density of T = N/M is approximately $a\,P(N/M > t) \approx a e^{-at}$, so T is approximately exponential with intensity a.
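How quickly the geometric tail approaches the exponential can be seen numerically. A minimal sketch, assuming a = 2 and t = 1.5 and a few illustrative values of M (the function name is hypothetical):

```python
from math import exp


def geometric_tail(a, t, M):
    """(1 - p)**j with p = a / M and j = t * M, i.e. (1 - a/M)**(t*M)."""
    p = a / M
    j = round(t * M)
    return (1 - p) ** j


a, t = 2.0, 1.5
target = exp(-a * t)
errors = {}
for M in (10, 100, 10_000):
    errors[M] = abs(geometric_tail(a, t, M) - target)
    print(M, geometric_tail(a, t, M), target)
```

The gap shrinks as M grows, which matches the requirement that p = a/M be small for the approximation to hold.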

Page 10: The Geometric Distribution

The Discrete-Time Coalescent

• We consider the N-coalescent, or the coalescent for a sample of N genes (Kingman 1982)

• N-coalescent: What is the distribution of the number of generations to find the Most Recent Common Ancestor (MRCA) for a fixed population of 2N genes?

• We use 2N because we recognize that the diploid case is more realistic, and it is related to the simpler haploid case by a factor of 2


Page 11: The Geometric Distribution

MRCA Examples


Page 12: The Geometric Distribution

Coalescence of two genes

• What is the distribution of the number of prior generations for the MRCA (waiting time)?

• Probability that two genes share a common parent (i.e. the MRCA is in the immediately previous generation) is:

$$\frac{1}{2N}$$

The first gene can choose its ancestor freely, but the second must choose the same one as the first, thus it has 1 out of 2N choices

• Probability that 2 genes have different parents is:

$$1 - \frac{1}{2N}$$

Page 13: The Geometric Distribution

Going back further

• Since sampling in successive generations is independent of the past, the probability that two genes find a common ancestor j generations back is:

$$\mathrm{MRCA}(j) = \left(1 - \frac{1}{2N}\right)^{j-1} \frac{1}{2N}$$

In the first j−1 generations they chose different ancestors, and then in generation j they chose the same ancestor

• Which is a geometric distribution with p = 1/(2N)

• Thus, the expected coalescence time for 2 genes is:

$$E(\mathrm{MRCA}(j)) = \frac{1}{p} = 2N$$
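The two-gene result can be illustrated by simulating the backward sampling directly. This sketch assumes each gene picks its parent uniformly and independently among 2N genes in every generation, with N = 10, 20,000 replicates, and a fixed seed (all illustrative choices):

```python
import random


def coalescence_time(two_n, rng):
    """Generations back until two genes pick the same parent among two_n."""
    generations = 1
    while rng.randrange(two_n) != rng.randrange(two_n):
        generations += 1
    return generations


def mean_coalescence_time(two_n, trials=20_000, seed=1):
    """Monte Carlo estimate of the expected two-gene coalescence time."""
    rng = random.Random(seed)
    return sum(coalescence_time(two_n, rng) for _ in range(trials)) / trials


estimate = mean_coalescence_time(20)  # N = 10, so 2N = 20
print(estimate)                       # compare with 1/p = 2N = 20
```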

Page 14: The Geometric Distribution

MRCA Examples


N = 10

Page 15: The Geometric Distribution

N-genes, no common parent

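• The waiting time for k ≤ 2N genes to have fewer than k lineages depends on the probability that all k genes choose different parents in one generation:

$$\frac{2N-1}{2N} \cdot \frac{2N-2}{2N} \cdots \frac{2N-(k-1)}{2N} = \prod_{i=1}^{k-1} \left(1 - \frac{i}{2N}\right)$$

The 1st gene can choose its parent freely, but the next k-1 must choose from the remainder (genes without a child)

• Manipulating a little

$$\prod_{i=1}^{k-1} \left(1 - \frac{i}{2N}\right) = 1 - \sum_{i=1}^{k-1} \frac{i}{2N} + O\!\left(\frac{1}{N^2}\right) = 1 - \binom{k}{2} \frac{1}{2N} + O\!\left(\frac{1}{N^2}\right)$$

• Where, for large N, $1/N^2$ is negligible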
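The size of the neglected O(1/N^2) term can be examined by comparing the exact product with its first-order approximation. A minimal sketch, assuming k = 5 lineages and a few illustrative population sizes (function names are hypothetical; `math.prod` requires Python 3.8 or later):

```python
from math import comb, prod


def prob_all_distinct(k, two_n):
    """Exact probability that k genes choose k different parents."""
    return prod(1 - i / two_n for i in range(1, k))


def first_order(k, two_n):
    """Approximation 1 - C(k, 2) / (2N), dropping the O(1/N**2) terms."""
    return 1 - comb(k, 2) / two_n


k = 5
gaps = {}
for two_n in (20, 200, 2000):
    exact = prob_all_distinct(k, two_n)
    approx = first_order(k, two_n)
    gaps[two_n] = abs(exact - approx)
    print(two_n, exact, approx, gaps[two_n])
```

Each tenfold increase in 2N should shrink the gap by roughly a factor of one hundred, consistent with an error of order 1/N^2.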

Page 16: The Geometric Distribution

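N-gene Coalescence

• The probability k-genes have different parents is:

$$1 - \binom{k}{2} \frac{1}{2N}$$

• And the probability that two or more have a common parent is:

$$1 - \left(1 - \binom{k}{2} \frac{1}{2N}\right) = \binom{k}{2} \frac{1}{2N}$$

• Repeated failures for j−1 generations, followed by a coalescence in generation j, leads to a geometric distribution, with

$$p = \binom{k}{2} \frac{1}{2N}$$

$$P(N = j) = \left(1 - \binom{k}{2} \frac{1}{2N}\right)^{j-1} \binom{k}{2} \frac{1}{2N}$$

(On the left-hand side, N is the number of generations, as in the geometric distribution; inside the formula, 2N is the population size.)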
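Combining this p with the geometric mean E(N) = 1/p gives the expected number of generations until k lineages first coalesce. The sketch below assumes the first-order approximation for p and a population with N = 10 (so 2N = 20); the function names are hypothetical:

```python
from math import comb


def coalescence_prob(k, two_n):
    """Approximate per-generation probability that k lineages coalesce."""
    return comb(k, 2) / two_n


def expected_wait(k, two_n):
    """Mean of the geometric waiting time, 1/p, in generations."""
    return 1 / coalescence_prob(k, two_n)


two_n = 20  # N = 10 (assumed)
for k in range(2, 6):
    print(k, coalescence_prob(k, two_n), expected_wait(k, two_n))
```

More lineages means more pairs that could coalesce, so the expected wait drops quickly as k grows. For k close to 2N the approximation for p is poor, since the dropped terms are no longer negligible.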

Page 17: The Geometric Distribution

Next Time

• Finish coalescence of N genes

• The effect of approximations

• The continuous-time coalescent

• The effective population size
