The Geometric Distribution
• Probability of the 1st success on the Nth trial, given a probability, p, of success
04/21/23 Comp 790– Distributions & Coalescence 1
$$P(N = j) = (1 - p)^{j-1}\,p$$

$$E(N) = \frac{1}{p}$$

$$\mathrm{Var}(N) = \frac{1 - p}{p^{2}}$$

$$P(\text{roll the 1st 6 on the } i\text{th roll}) = \left(1 - \tfrac{1}{6}\right)^{i-1}\left(\tfrac{1}{6}\right)$$

$$P(\text{1st heads on the } i\text{th flip}) = \left(1 - \tfrac{1}{2}\right)^{i-1}\left(\tfrac{1}{2}\right)$$
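A quick simulation (a Python sketch; the die example and sample count are illustrative) confirms the mean and variance formulas above:

```python
import random

def geometric_sample(p, rng):
    """Number of trials up to and including the first success."""
    n = 1
    while rng.random() >= p:  # failure with probability 1 - p
        n += 1
    return n

rng = random.Random(0)
p = 1.0 / 6.0  # e.g. rolling a die until the first 6
samples = [geometric_sample(p, rng) for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean)  # close to E(N) = 1/p = 6
print(var)   # close to Var(N) = (1 - p)/p^2 = 30
```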
To show P(N = j) is a proper probability mass function:

$$\sum_{j=1}^{\infty}(1-p)^{j-1}\,p \;=\; \frac{p}{1-p}\sum_{j=1}^{\infty}(1-p)^{j} \;=\; \frac{p}{1-p}\left(\frac{(1-p)^{\infty} - (1-p)}{(1-p) - 1}\right) \;=\; \frac{p}{1-p}\left(\frac{0 - (1-p)}{-p}\right) \;=\; 1$$
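The same identity can be checked numerically; truncating the sum (the cutoff and the value of p are arbitrary) already gives a total indistinguishable from 1:

```python
p = 0.3
# Partial sum of the geometric pmf; the omitted tail is (1 - p)^199, vanishingly small
total = sum((1 - p) ** (j - 1) * p for j in range(1, 200))
print(total)  # ≈ 1
```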
Example
• Difference from the “Binomial” distribution:
– Binomial(k) = P(k successes in N trials)
– Geometric(k) = P(1st success after k−1 failures)
Expected Value Proof
• The expected value is the sum of each value times its probability:

$$E(N) = \sum_{j=1}^{\infty} j\,(1-p)^{j-1}\,p = \frac{p}{1-p}\sum_{j=1}^{\infty} j\,(1-p)^{j}$$

• Recall the relation:

$$\sum_{j=1}^{\infty} j\,a^{j} = \frac{a}{(1-a)^{2}} \qquad \text{for } 0 \le a < 1$$

• Substituting a = 1 − p gives:

$$E(N) = \frac{p}{1-p}\left(\frac{1-p}{p^{2}}\right) = \frac{1}{p}$$
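The relation used above, and the substitution that follows it, are easy to sanity-check numerically (a truncated sum in Python; the value of a is arbitrary):

```python
a = 0.8  # any 0 <= a < 1
lhs = sum(j * a**j for j in range(1, 2000))  # truncated sum of j * a^j
rhs = a / (1 - a) ** 2
print(lhs, rhs)  # both ≈ 20.0

# Substituting a = 1 - p reproduces E(N) = 1/p
p = 1 - a
print(p / (1 - p) * rhs)  # ≈ 1/p = 5.0
```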
Other Properties
• Markov Property
– The probability of the “next step” in a discrete or continuous process depends only on the process's present state
– The process is without memory of previous events

$$P(T > t_2 \mid T > t_1) = P(T > t_2 - t_1)$$
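The memoryless property can be demonstrated empirically for the geometric case (a Python sketch; the values of p, t1, and t2 are arbitrary choices for illustration):

```python
import random

rng = random.Random(1)
p, t1, t2 = 0.1, 5, 12
samples = []
for _ in range(200_000):
    n = 1
    while rng.random() >= p:
        n += 1
    samples.append(n)

# Compare P(T > t2 | T > t1) against P(T > t2 - t1)
cond = sum(1 for n in samples if n > t2) / sum(1 for n in samples if n > t1)
uncond = sum(1 for n in samples if n > t2 - t1) / len(samples)
print(cond, uncond)  # both ≈ (1 - p)^(t2 - t1) ≈ 0.478
```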
Continuous Generalization
• Geometric distributions characterize “discrete” events
• Sometimes we’d like to pose questions about a continuous variable, for example:
– The probability that a population will be inbred after T years, rather than after N generations, where T is a real number and N is an integer
• The “continuous” counterpart of the geometric distribution is the “exponential” distribution
Exponential Distribution
• The Exponential density function is characterized by one parameter, a, called the “rate” or “intensity”
$$\mathrm{Exp}(a,t) = a\,e^{-at}$$

$$E(\mathrm{Exp}(a,t)) = \frac{1}{a}$$

$$\mathrm{Var}(\mathrm{Exp}(a,t)) = \frac{1}{a^{2}}$$

To show Exp(a,t) is a proper pdf:

$$\int_{t=0}^{\infty} a\,e^{-at}\,dt = \lim_{T \to \infty}\left(1 - e^{-aT}\right) = 1 - 0 = 1$$
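These claims can be checked by simulation (a Python sketch; the rate a and the Riemann-sum step are illustrative, and the inverse-transform sampler is a standard technique, not from the slides):

```python
import math
import random

a = 2.0  # an arbitrary rate for illustration
rng = random.Random(2)
# Inverse-transform sampling: if U ~ Uniform(0,1), then -ln(1 - U)/a ~ Exp(a)
samples = [-math.log(1.0 - rng.random()) / a for _ in range(100_000)]
mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)
print(mean)  # close to 1/a = 0.5
print(var)   # close to 1/a^2 = 0.25

# Riemann-sum check that the density integrates to 1
dt = 1e-4
total = sum(a * math.exp(-a * i * dt) * dt for i in range(200_000))
print(total)  # ≈ 1
```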
Exponential Properties
• Other useful properties of U = Exp(a,t) include:
– The Markov property, where t2 > t1:

$$P(U > t_2 \mid U > t_1) = P(U > t_2 - t_1)$$

– Assuming a second independent exponential process, V = Exp(b,t):

$$P(U < V) = \frac{a}{a+b}$$

$$\min(U, V) \sim \mathrm{Exp}(a+b)$$
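Both competing-exponentials properties can be verified by simulation (a Python sketch with arbitrary rates a and b):

```python
import math
import random

a, b = 1.0, 3.0  # arbitrary rates for illustration
rng = random.Random(3)
trials = 200_000
wins = 0
total_min = 0.0
for _ in range(trials):
    u = -math.log(1.0 - rng.random()) / a  # U ~ Exp(a)
    v = -math.log(1.0 - rng.random()) / b  # V ~ Exp(b)
    wins += u < v
    total_min += min(u, v)
print(wins / trials)       # ≈ a/(a + b) = 0.25
print(total_min / trials)  # ≈ 1/(a + b) = 0.25, consistent with min(U,V) ~ Exp(a + b)
```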
Approximations
• The geometric distribution can be approximated with the exponential distribution in various ways
• Consider the following geometric distribution
• We can model discrete time as a rational fraction of some very large number, M, that includes all intervals of interest (i.e. 1/M, 2/M, …, N/M, …, rather than 1, 2, 3, …)
• Assuming p is small and N is large, we can approximate “continuous” time as t = j/M and a = pM
$$P(N \ge j) = (1-p)^{j}$$

There are at least j failures before the first success.
Approximations (cont)
• Recalling t = j/M and a = pM, we can rewrite (1 − p)^j as:

$$P(N \ge j) = (1-p)^{j} = \left(1 - \frac{pM}{M}\right)^{\frac{j}{M}M} = \left(1 - \frac{a}{M}\right)^{tM} = P\!\left(\frac{N}{M} \ge t\right)$$

• Also note, for large M:

$$\left(1 - \frac{a}{M}\right)^{tM} \approx e^{-at}$$

• Thus P(N/M ≥ t) ≈ e^{−at}, so T = N/M is approximately exponential with intensity a.
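How quickly (1 − a/M)^{tM} approaches e^{−at} as M grows can be seen directly (arbitrary values of a and t):

```python
import math

a, t = 0.5, 3.0
exact = math.exp(-a * t)
for M in (10, 100, 10_000):
    approx = (1 - a / M) ** (t * M)
    print(M, approx, exact)  # the gap shrinks roughly like 1/M
```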
The Discrete-Time Coalescent
• We consider the N-coalescent, or the coalescent for a sample of N genes (Kingman 1982)
• N-coalescent: What is the distribution of the number of generations back to the Most Recent Common Ancestor (MRCA) for a fixed population of 2N genes?
• We use 2N because we recognize that the diploid case is more realistic, and it is related to the simpler haploid case by a factor of 2
MRCA Examples
Coalescence of two genes
• What is the distribution of the number of prior generations back to the MRCA (the waiting time)?
• The probability of a common parent (i.e. the MRCA is in the immediately previous generation) is:

$$\frac{1}{2N}$$

The first gene can choose its ancestor freely, but the second must choose the same one as the first; thus it has 1 out of 2N choices.

• The probability that the 2 genes have different parents is:

$$1 - \frac{1}{2N}$$
Going back further
• Since sampling in successive generations is independent of the past, the probability that two genes find a common ancestor j generations back is:
• Which is a geometric distribution with p = 1/2N
• Thus, the coalescence time for 2 genes is:

$$\mathrm{MRCA}(j) = \left(1 - \frac{1}{2N}\right)^{j-1} \frac{1}{2N}$$

In the first j−1 generations they chose different ancestors, and then in generation j they chose the same ancestor.

$$E(\mathrm{MRCA}(j)) = \frac{1}{p} = 2N$$
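A direct simulation of two lineages (a Python sketch with a deliberately small population, 2N = 20, chosen for speed) reproduces the expectation E = 2N:

```python
import random

def coalescence_time(two_N, rng):
    """Generations back until two genes pick the same parent among 2N genes."""
    j = 1
    while rng.randrange(two_N) != rng.randrange(two_N):
        j += 1
    return j

rng = random.Random(4)
two_N = 20
times = [coalescence_time(two_N, rng) for _ in range(50_000)]
mean = sum(times) / len(times)
print(mean)  # close to 2N = 20
```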
MRCA Examples
N = 10
N-genes, no common parent
• To find the waiting time for k ≤ 2N genes to coalesce to fewer than k lineages, first compute the probability that all k genes have distinct parents in the previous generation:

$$\frac{(2N-1)}{2N}\cdot\frac{(2N-2)}{2N}\cdots\frac{(2N-(k-1))}{2N} = \prod_{i=1}^{k-1}\left(1 - \frac{i}{2N}\right)$$

The 1st gene can choose its parent freely, but the next k−1 must each choose from the remainder (the genes without a child so far).

• Manipulating a little:

$$\prod_{i=1}^{k-1}\left(1 - \frac{i}{2N}\right) = 1 - \sum_{i=1}^{k-1}\frac{i}{2N} + O\!\left(\frac{1}{N^{2}}\right) = 1 - \binom{k}{2}\frac{1}{2N} + O\!\left(\frac{1}{N^{2}}\right)$$

• Where, for large N, the O(1/N²) term is negligible
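The quality of dropping the O(1/N²) term is easy to check numerically (illustrative values of 2N and k):

```python
from math import comb, prod

two_N, k = 2000, 5
exact = prod(1 - i / two_N for i in range(1, k))  # product over i = 1..k-1
approx = 1 - comb(k, 2) / two_N                   # drop the O(1/N^2) term
print(exact, approx)  # agree to roughly 1/N^2
```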
N-gene Coalescence
• The probability that k genes all have different parents is:

$$1 - \binom{k}{2}\frac{1}{2N}$$

• And the probability that one or more pairs have a common parent is:

$$1 - \left(1 - \binom{k}{2}\frac{1}{2N}\right) = \binom{k}{2}\frac{1}{2N}$$

• Repeated failures for j generations lead to a geometric distribution:

$$P(N = j) = \left(1 - \binom{k}{2}\frac{1}{2N}\right)^{j-1}\binom{k}{2}\frac{1}{2N}$$

with

$$p = \binom{k}{2}\frac{1}{2N}$$
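A simulation of the first coalescence among k lineages (a Python sketch; the population size, k, and run count are illustrative) matches the geometric expectation 1/p ≈ 2N/C(k,2):

```python
import random
from math import comb

def first_coalescence(two_N, k, rng):
    """Generations back until any two of k lineages share a parent among 2N genes."""
    j = 1
    while True:
        parents = [rng.randrange(two_N) for _ in range(k)]
        if len(set(parents)) < k:  # at least one pair chose the same parent
            return j
        j += 1

rng = random.Random(5)
two_N, k = 200, 4
times = [first_coalescence(two_N, k, rng) for _ in range(10_000)]
mean = sum(times) / len(times)
print(mean, two_N / comb(k, 2))  # ≈ 2N / C(k,2) ≈ 33.3
```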
Next Time
• Finish coalescence of N genes
• The effect of approximations
• The continuous-time coalescent
• The effective population size