guenomu software -- model and algorithm in 2013
guenomu
Software and Model
Leonardo de O. Martins
University of Vigo
May 16th, 2013
Leo Martins (U Vigo) guenomu software 2013/5/16 1 / 15
Outline
1 The Model
2 The Sampling
3 The Code
Hierarchical Bayesian model
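$$
P(S, \Theta \mid D) \propto P(\theta_0)\, P(\vec{\lambda}_0)\, P(\alpha_0)\, P(S) \times \prod_{i=1}^{N} P(D_i \mid G_i, \vec{\theta}_i)\, P(\vec{\theta}_i \mid \theta_0)\, P(G_i \mid \vec{\lambda}_i, \vec{w}_i, S)\, P(\vec{\lambda}_i \mid \vec{\lambda}_0)\, P(\vec{w}_i \mid \alpha_i)\, P(\alpha_i \mid \alpha_0)
$$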
The mixture of distance distributions
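$$
P(G \mid \vec{\lambda}, \vec{w}, S) = \frac{w_1 e^{-\left(d_{DUPS}(G,S)/\lambda_{DUPS} + d_{LOSS}(G,S)/\lambda_{LOSS}\right)} + w_2 e^{-d_{ILS}(G,S)/\lambda_{ILS}} + w_3 e^{-d_{RF}(G,S)/\lambda_{RF}}}{Z(\vec{\lambda}, \vec{w}, S)}
$$
$$
w_i \sim \mathrm{Gamma}(\alpha_{gene}, 1)
$$
$$
\lambda_x \sim \mathrm{Exp}(\Lambda_x)
$$
each gene has its own set of $w_i$ and $\lambda_i$
the distances $d_x(G, S)$ are scaled to account for different gene family sizes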
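The numerator of the mixture can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `mixture_kernel`, the dictionary keys, and all numeric values are hypothetical, and the normalising constant Z is deliberately left out because it is not available in closed form.

```python
import math

def mixture_kernel(d, lam, w):
    """Unnormalised P(G | lambda, w, S): the mixture numerator, without Z."""
    # Component 1: duplications and losses share one exponential term.
    dup_loss = math.exp(-(d["DUPS"] / lam["DUPS"] + d["LOSS"] / lam["LOSS"]))
    # Component 2: incomplete lineage sorting distance.
    ils = math.exp(-d["ILS"] / lam["ILS"])
    # Component 3: Robinson-Foulds distance.
    rf = math.exp(-d["RF"] / lam["RF"])
    return w[0] * dup_loss + w[1] * ils + w[2] * rf

# Hypothetical scaled distances between one gene tree G and the species tree S.
d = {"DUPS": 0.2, "LOSS": 0.4, "ILS": 0.1, "RF": 0.3}
lam = {"DUPS": 1.0, "LOSS": 1.0, "ILS": 0.5, "RF": 0.5}
w = (1.0, 1.0, 1.0)
value = mixture_kernel(d, lam, w)
```

When every distance is zero (gene tree and species tree agree under all three measures) the kernel reduces to the sum of the weights, which is its maximum for fixed weights.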
Doubly-intractable distributions
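$$
\pi(y \mid \theta) = \frac{q_\theta(y)}{Z(\theta)} = \frac{e^{\theta^t s(y)}}{Z(\theta)}; \qquad Z(\theta) = \sum_y e^{\theta^t s(y)} \tag{1}
$$
augmented distribution: $\pi(\theta', y', \theta \mid y) \propto \pi(y \mid \theta)\, \pi(\theta)\, h(\theta' \mid \theta)\, \pi(y' \mid \theta')$
Gibbs update of the auxiliary variables $\theta'$, $y'$:
I. draw $\theta' \sim h(\cdot \mid \theta)$
II. draw $y' \sim \pi(\cdot \mid \theta')$
exchange ratio from $\theta$ to $\theta'$:
$$
\min\left\{1, \frac{q_\theta(y')\, \pi(\theta')\, h(\theta \mid \theta')\, q_{\theta'}(y)}{q_\theta(y)\, \pi(\theta)\, h(\theta' \mid \theta)\, q_{\theta'}(y')}\right\} \tag{2}
$$
We draw $y'$ (the gene tree) through a secondary MCMC starting at its current value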
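The exchange update in equation (2) can be sketched on a toy problem. Everything here is an assumption made for illustration: the sample space is the integers 0 to 10, s(y) = y, the prior on θ is standard normal, and h is a symmetric Gaussian random walk so that it cancels in the ratio. The auxiliary y′ is drawn exactly by enumeration, which is possible only because the toy space is tiny; the slides instead draw the gene tree through a secondary MCMC.

```python
import math
import random

STATES = list(range(11))  # toy sample space: y in {0, ..., 10}

def log_q(theta, y):
    """log q_theta(y) = theta * s(y), with the toy statistic s(y) = y."""
    return theta * y

def log_prior(theta):
    """Standard normal prior on theta, up to an additive constant."""
    return -0.5 * theta * theta

def draw_y(theta, rng):
    """Exact draw y' ~ pi(. | theta) by enumerating the toy sample space."""
    weights = [math.exp(log_q(theta, s)) for s in STATES]
    return rng.choices(STATES, weights=weights)[0]

def exchange_log_ratio(theta, theta_new, y, y_aux):
    """Log of the ratio in equation (2); symmetric h cancels, and so does Z."""
    num = log_q(theta, y_aux) + log_prior(theta_new) + log_q(theta_new, y)
    den = log_q(theta, y) + log_prior(theta) + log_q(theta_new, y_aux)
    return num - den

def exchange_step(theta, y, rng, step=0.5):
    theta_new = theta + rng.gauss(0.0, step)  # I. draw theta' ~ h(. | theta)
    y_aux = draw_y(theta_new, rng)            # II. draw y' ~ pi(. | theta')
    log_r = exchange_log_ratio(theta, theta_new, y, y_aux)
    if rng.random() < math.exp(min(0.0, log_r)):
        return theta_new                      # accept the exchange
    return theta                              # reject, keep current theta

rng = random.Random(2013)
theta, y_obs = 0.0, 7
for _ in range(200):
    theta = exchange_step(theta, y_obs, rng)
```

Note that Z(θ) never appears in `exchange_log_ratio`: the unknown normalising constants of the observed and auxiliary data cancel, which is the point of the augmented distribution.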
Species tree proposal with the exchange algorithm
Generalized Multiple-Try Metropolis
MH: sample $y$, then decide whether to accept it with probability $\min(1, r)$
$$
r = \frac{\pi(y)}{\pi(x)} \frac{q(y, x)}{q(x, y)} = \frac{\pi(y)}{\pi(x)} \frac{p(x \mid y)}{p(y \mid x)}
$$
MTM: choose $y$ among several samples, according to their relative weights
$$
r = \frac{w(y_1, x) + \cdots + w(y_k, x)}{w(x^*_1, y) + \cdots + w(x^*_k, y)}
$$
where $w(x, y) = \pi(x)\, q(x, y)\, \lambda(x, y) = \pi(x)\, p(y \mid x)\, \lambda(x, y)$
GMTM: weights $w(\cdot)$ do not need to represent probability distributions.
$$
r = \frac{\pi(y)\, p_k(x \mid y)}{\pi(x)\, p_k(y \mid x)} \frac{W_x}{W_y}
$$
where $W_y = \dfrac{w_i(y_i, x)}{\sum_{j=1}^{k} w_j(y_j, x)}$ for the chosen element $i$
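A minimal sketch of the plain MTM step (not the generalized version) may help fix the notation. It assumes a one-dimensional toy target, a symmetric Gaussian proposal, and the choice λ(x, y) = 1 / p(y | x), under which the weight w(y, x) reduces to π(y). The names `target` and `mtm_step` are hypothetical and none of this is taken from the guenomu code.

```python
import math
import random

def target(x):
    """Unnormalised toy target pi(x): the standard normal kernel."""
    return math.exp(-0.5 * x * x)

def mtm_step(x, rng, k=5, step=1.0):
    # Draw k candidates from the symmetric proposal p(. | x).
    ys = [x + rng.gauss(0.0, step) for _ in range(k)]
    # With lambda(x, y) = 1 / p(y | x), the weight w(y_j, x) is just pi(y_j).
    wy = [target(y) for y in ys]
    # Choose y among the candidates according to their relative weights.
    y = rng.choices(ys, weights=wy)[0]
    # Reference set: k - 1 draws from p(. | y), plus the current state x.
    xs = [y + rng.gauss(0.0, step) for _ in range(k - 1)] + [x]
    wx = [target(s) for s in xs]
    r = sum(wy) / sum(wx)
    return y if rng.random() < min(1.0, r) else x

rng = random.Random(2013)
x = 0.0
samples = []
for _ in range(5000):
    x = mtm_step(x, rng)
    samples.append(x)
```

With `k=1` the candidate set is a single draw and the reference set is just the current state, so `r` becomes π(y)/π(x), the Metropolis ratio for a symmetric proposal.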
gene tree proposal with GMTM or MTM
RF distance, Assignment cost (Hdist)
A parallel pseudo-random number generator (PRNG)
Given a seed and an algorithm, we have a stream of PRNs.
[Figure: diagram with a seed feeding PRNG1, which yields x1, x2, x3, x4; four PRNG2 boxes; stream values x11, x12]
Using a second algorithm, the first stream will give us a sequence of seeds. We use the 150 parameter sets for the Tausworthe (LFSR) generators (L’Ecuyer, Maths Comput 1999, pp. 261).
Therefore, given the seed, we can predict all states of all streams.
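The seeding idea can be sketched with Python's standard library. This is an illustration only: `random.Random` is a Mersenne Twister, swapped in here for the Tausworthe generators with distinct parameter sets that the slide describes, and the function name `make_streams` is hypothetical.

```python
import random

def make_streams(seed, n_streams):
    """Derive n_streams generators whose seeds come from one master stream."""
    master = random.Random(seed)  # plays the role of PRNG1
    # The master stream supplies the sequence of seeds x1, x2, ...
    seeds = [master.getrandbits(32) for _ in range(n_streams)]
    # Each seed starts its own stream, playing the role of PRNG2.
    return [random.Random(s) for s in seeds]

first = make_streams(2013, 4)
second = make_streams(2013, 4)
draws_first = [g.random() for g in first]
draws_second = [g.random() for g in second]
```

Because every stream is a deterministic function of the single seed, rebuilding the streams from the same seed reproduces every draw, which is the predictability property the slide relies on.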
In our gene/species model:
we split gene families among jobs
all jobs receive the seed (broadcast) and therefore can reproduce the same x1. That’s cheaper than communicating the states.
each job uses its own x(i+1) for sampling new gene trees etc. and can work in parallel. They use the common x1 for sampling e.g. a new species tree, which needs synchronization.
the only thing that must be shared is thus the proposal values (AllReduce) when updating "global" parameters, so that all jobs can make the same acceptance/rejection decision.
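The scheme in the points above can be simulated in a single process. This is a sketch under several assumptions: the jobs are plain loop iterations rather than MPI processes, a Python `sum` stands in for AllReduce, the per-gene contributions are made-up numbers rather than real likelihood terms, and `run_jobs` is a hypothetical name.

```python
import math
import random

def run_jobs(seed, gene_values, n_jobs, n_iter=3):
    """Simulate n_jobs workers that reach one shared accept/reject decision."""
    master = random.Random(seed)
    seeds = [master.getrandbits(32) for _ in range(n_jobs + 1)]
    # Every job rebuilds the common stream (x1) from the broadcast seed.
    common = [random.Random(seeds[0]) for _ in range(n_jobs)]
    # Each job also owns a private stream x(i+1) for its local sampling.
    private = [random.Random(s) for s in seeds[1:]]
    # Split the gene families among the jobs.
    chunks = [gene_values[j::n_jobs] for j in range(n_jobs)]
    decisions = [[] for _ in range(n_jobs)]
    for _ in range(n_iter):
        # Local, parallelisable work: toy log-ratio contribution per job.
        partial = [sum(g * (private[j].random() - 0.5) for g in chunks[j])
                   for j in range(n_jobs)]
        total = sum(partial)  # stands in for AllReduce over all jobs
        for j in range(n_jobs):
            u = common[j].random()  # identical on every job
            decisions[j].append(u < math.exp(min(0.0, total)))
    return decisions

decisions = run_jobs(2013, [0.5, 1.0, 1.5, 2.0, 2.5, 3.0], n_jobs=3)
```

Each job draws the same uniform from its copy of the common stream and sees the same reduced total, so all jobs record the same sequence of decisions without exchanging generator states.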
Each job looks like an independent analysis
https://bitbucket.org/leomrtns/guenomu