Download - Computing Bayesian posterior with empirical likelihood in population genetics

Computing Bayesian posterior with empiricallikelihood in population genetics

Pierre Pudlo

INRA & U. Montpellier 2

SSB, 18 sept. 2012

Pudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 1 / 29

Table of contents

1 Likelihood free methods

2 Empirical likelihood

3 Population genetics

4 Numerical experiments


Table of contents






When the likelihood is not completely known

Likelihood intractable: `(y|φ) =

∫U`(y,u|φ)du

where u is a latent process

I Hidden Markov and other dynamic models: latent processwhich is not observed

↪→ Classical answer: Markov chain Monte Carlo,. . .

I Population genetics: the whole gene genealogy is unobservedLikelihood is an integral over

I all possible gene genealogiesI all possible mutations along the genealogies

↪→ Classical answer: Approximate Bayesian computation (ABC)


When the likelihood is not completely known

Likelihood intractable: `(y|φ) =

∫U`(y,u|φ)du

where u is a latent processI Hidden Markov and other dynamic models: latent process

which is not observed

↪→ Classical answer: Markov chain Monte Carlo,. . .

I Population genetics: the whole gene genealogy is unobservedLikelihood is an integral over

I all possible gene genealogiesI all possible mutations along the genealogies

↪→ Classical answer: Approximate Bayesian computation (ABC)


ABC in a nutshell

Posterior distribution is the conditional distribution ofπ(φ)`(x|φ) (∗)

knowing that x = xobs

MethodologyDraw a (large) set of particles (φi, xi) from (∗) and use anonparametric estimate of the conditional density

π(φ|xobs) ∝ π(φ)`(xobs|φ)

Seminal papersI Tavare, Balding, Griffith and Donnelly (1997, Genetics)I Pritchard, Seielstad, Perez-Lezuan, Feldman (1999, Molecular

Biology and Evolution)


ABC in a nutshellPosterior distribution is the conditional distribution of

π(φ)`(x|φ) (∗)knowing that x = xobs



Shortcomings.I time consuming – If simulation of the latent process is not

straightforward

I curse of dimensionality vs. lost of information –I If x lies in a high dimensional space X (often), we are unable to

estimate of the conditional densityI Hence, we project the (observed and simulated) datasets on a

space with smaller dimension (trough summary statistics)η : X → Rd (summary statistics)


ABC in a nutshellPosterior distribution is the conditional distribution of

π(φ)`(x|φ) (∗)knowing that x = xobs



Shortcomings.I time consuming – If simulation of the latent process is not

straightforwardI curse of dimensionality vs. lost of information –

I If x lies in a high dimensional space X (often), we are unable toestimate of the conditional density

I Hence, we project the (observed and simulated) datasets on aspace with smaller dimension (trough summary statistics)η : X → Rd (summary statistics)


You went to school to learn, girl (. . . )Why 2 plus 2 makes fourNow, now, now, I’m gonna teach you (. . . )

All you gotta do is repeat after me!A, B, C!It’s easy as 1, 2, 3!Or simple as Do, re, mi! (. . . )

SurveysI Beaumont (2010, Annual Review of Ecology, Evolution, and

Systematics)I Lopes, Beaumont (2010, Genetics and Evolution)I Csillery, Blum, Gaggiotti, Francois (2010, Trends in Ecology and

Evolution)I Marin, Pudlo, Robert, Ryder (to appear, Statis. and Comput.)


Table of contents






Empirical likelihood (EL)

Owen (1988, Biometrika),Owen (2001, Chapman &Hall)

EmpiricalLikelihood

Data

Parameters' definition in term of a moment constraint

x = (x1, ..., xK)

xProfiled likelihood

xi

Black boxInput:I Independent data

x = (x1, . . . , xK)I Definition of the parameters

E(h(xi,φ)) = 0Output:I A profiled “likelihood function”

Lel(φ|x)

EL do not require a fully defined andoften complex (hence debatable)parametric model



Owen (1988, Biometrika), Owen (2001, Chapman & Hall)

Assume that the dataset x is composed of n independent replicatesx = (x1, . . . , xn) of some X ∼ F

Generalized moment condition modelThe law F of X satisfy

EF[h(X,φ)

]= 0,

where h is a known function, and φ an unknown parameter

Empirical likelihood

Lel(φ|x) = maxp

n∏i=1

pi (∗)

for all p such that 0 6 pi 6 1,∑pi = 1,

∑i pih(xi,φ) = 0.




Lel(φ|x) = maxp

n∏i=1

pi (∗)

for all p such that 0 6 pi 6 1,∑pi = 1,

∑i pih(xi,φ) = 0.

Why (∗)?

(1) L(p) :=n∏i=1

pi is the likehood of

discrete distribution p =∑i piδxi

(2)∑i

pih(xi,φ) = 0 implies that p

fulfils the moment condition

How to compute?

Newton-Lagrange scheme.

The constraint is linear (in p)

See Owen (2001) chap. 12& package emplik in R


EL examples

EmpiricalLikelihood

Data


x = (x1, ..., xK)


xi

Simple case: φ = meanh(xi,φ) = xi − φ

I Lel(φ|x) is maximal atφ = 1

K(x1 + · · ·+ xK)

I if enough data (K large),& if data from a true parametricmodel,then Lel ≈ true likelihood


EL examples

EmpiricalLikelihood

Data


x = (x1, ..., xK)


xi

h(xi,φ) = xi − φred = param. likelihoodblack = Lel

Example 1: Gaussian modelK = 20

1.0 1.5 2.0 2.5 3.0

0.0

0.2

0.4

0.6

0.8

1.0

theta

el.n

orm

al

K = 200

1.0 1.5 2.0 2.5 3.0

0.0

0.2

0.4

0.6

0.8

1.0

theta

el.n

orm

al


EL examples

EmpiricalLikelihood

Data


x = (x1, ..., xK)


xi

red = true likelihoodblue = Gaussian likelihoodblack = Lel

Example 2: non Gaussian modelK = 200

0.5 1.0 1.5 2.0 2.5 3.0 3.5 4.0

0.0

0.2

0.4

0.6

0.8

1.0

theta

el.n

orm

al

I Lel is better than a false model.


Table of contents






Neutral model at a given microsatellite locus, in aclosed panmictic population at equilibrium

Sample of 8 genes

Mutations according tothe Simple stepwiseMutation Model (SMM)• date of the mutations ∼

Poisson process withintensity θ/2 over thebranches• MRCA = 100• independent mutations:±1 with pr. 1/2



Kingman’s genealogyWhen time axis isnormalized,T(k) ∼ Exp(k(k− 1)/2)







Poisson process withintensity θ/2 over thebranches

• MRCA = 100• independent mutations:±1 with pr. 1/2



Observations: leafs of the treeθ = ?





Much more interesting models. . .

I several independent locusIndependent gene genealogies and mutations

I different populationslinked by an evolutionary scenario made of divergences,admixtures, migrations between populations, etc.

I larger sample sizeusually between 50 and 100 genes

A typical evolutionary scenario:

MRCA

POP 0 POP 1 POP 2

τ1

τ2


Moment condition in population genetics?

Main difficultyDerive a constraint

EF[h(X,φ)

]= 0,

on the parameters of interest φ when X is the genotypes of the sampleof individuals at a given locus

E.g., in phylogeography, φ is composed ofI dates of divergence between populations,I ratio of population sizes,I mutation rates, etc.

None of them are moments of the distribution of the allelic states of thesample

↪→ h = pairwise composite scores whose zero is the pairwisemaximum likelihood estimator


Pairwise composite likelihood?

The intra-locus pairwise likelihood

`2(xk|φ) =∏i<j

`2(xik, xjk|φ)

with x1k, . . . , xnk : allelic states of the gene sample at the k-th locus

Sample of size 2⇒ coalescent tree is a single line joining xik to xjk⇒ closed form of `2(xik, xjk|φ)

The pairwise score function

∇φ log `2(xk|φ) =∑i<j

∇φ log `2(xik, xjk|φ)


Composite lik , approximation of lik� Composite likelihoods are often much more narrow than the

distribution of the model (See, e.g., Cooley et al., 2012)

Hence,I produces too narrow posterior distributionsI confidence intervals or credibility intervals are not reliable

But,

Eφ(∇φ log `2(x|φ)) =∫∇φ

[log `2(x|φ)

]`(x|φ)dx

=∑i<j

∫∇φ

[log `2(xi, xj|φ)

]`2(xi, xj|φ)dxidxj

=∑i<j

∫ ∇φ`2(xi, xj|φ)`2(xi, xj|φ)

`2(xi, xj|φ)dxidxj

=∑i<j

∇φ[ ∫`2(xi, xj|φ)dxidxj

]= 0

Safe with EL because we only use position of its modePudlo (INRA & U. Montpellier 2) ABCel GenPop SSB 09/2012 19 / 29

Pairwise likelihood: a simple case

Assumptions

I sample ⊂ closed, panmicticpopulation at equilibrium

I marker: microsatelliteI mutation rate: θ/2

if xik et xjk are two genes of thesample,

`2(xik, xjk|θ) depends only on

δ = xik − xjk

`2(δ|θ) =1√1+ 2θ

ρ (θ)|δ|

withρ(θ) =

θ

1+ θ+√1+ 2θ

Pairwise score function∂θ log `2(δ|θ) =

−1

1+ 2θ+

|δ|

θ√1+ 2θ


Pairwise likelihood: 2 diverging populations

MRCA

POP a POP b

τ

AssumptionsI τ: divergence date of pop.a and b

I θ/2: mutation rateLet xik and xjk be two genescoming resp. from pop. a and bSet δ = xik − x

jk.

Then `2(δ|θ, τ) =e−τθ√1+ 2θ

+∞∑k=−∞ ρ(θ)

|k|Iδ−k(τθ).

whereIn(z) nth-order modified Besselfunction of the first kind


Pairwise likelihood: 2 diverging populations

MRCA

POP a POP b

τ

AssumptionsI τ: divergence date of pop.a and b

I θ/2: mutation rateLet xik and xjk be two genescoming resp. from pop. a and bSet δ = xik − x

jk.

A 2-dim score function∂τ log `2(δ|θ, τ) =

−θ+θ

2

`2(δ− 1|θ, τ) + `2(δ+ 1|θ, τ)`2(δ|θ, τ)

∂θ log `2(δ|θ, τ) =

−τ−1

1+ 2θ+

τ

2

`2(δ− 1|θ, τ) + `2(δ+ 1|θ, τ)`2(δ|θ, τ)

+

q(δ|θ, τ)`2(δ|θ, τ)

whereq(δ|θ, τ) :=

e−τθ√1+ 2θ

ρ ′(θ)

ρ(θ)

∞∑k=−∞ |k|ρ(θ)|k|Iδ−k(τθ)


Table of contents






A first experimentEvolutionary scenario:

MRCA

POP 0 POP 1

τ

Dataset:I 50 genes per populations,I 100 microsat. loci

Assumptions:I Ne identical over all

populationsI φ = (log10 θ, log10 τ)I uniform prior over(−1., 1.5)× (−1., 1.)

Comparison of the originalABC with ABCel

ESS=7034

log(theta)

Den

sity

0.00 0.05 0.10 0.15 0.20 0.25

05

1015

log(tau1)

Den

sity

−0.3 −0.2 −0.1 0.0 0.1 0.2 0.3

01

23

45

67

histogram = ABCelcurve = original ABCvertical line = “true” parameter


ABC vs. ABCel on 100 replicates of the 1st experiment

Accuracy:log10 θ log10 τ

ABC ABCel ABC ABCel(1) 0.097 0.094 0.315 0.117(2) 0.071 0.059 0.272 0.077(3) 0.68 0.81 1.0 0.80

(1) Root Mean Square Error of the posterior mean(2) Median Absolute Deviation of the posterior median(3) Coverage of the credibility interval of probability 0.8

Computation time: on a recent 6-core computer (C++/OpenMP)I ABC ≈ 4 hoursI ABCel≈ 2 minutes


Second experimentEvolutionary scenario:

MRCA

POP 0 POP 1 POP 2

τ1

τ2



populationsI φ = log10(θ, τ1, τ2)I non-informative uniform prior

Comparison of the original ABCwith ABCelhistogram = ABCelcurve = original ABCvertical line = “true” parameter


Second experimentEvolutionary scenario:

MRCA

POP 0 POP 1 POP 2

τ1

τ2



populationsI φ = log10(θ, τ1, τ2)I non-informative uniform prior

Comparison of the original ABCwith ABCel

histogram = ABCelcurve = original ABCvertical line = “true” parameter


ABC vs. ABCel on 100 replicates of the 2ndexperimentAccuracy:

log10 θ log10 τ1 log10 τ2ABC ABCel ABC ABCel ABC ABCel

(1) 0.0059 0.0794 0.472 0.483 29.3 4.76(2) 0.048 0.053 0.32 0.28 4.13 3.36(3) 0.79 0.76 0.88 0.76 0.89 0.79

(1) Root Mean Square Error of the posterior mean(2) Median Absolute Deviation of the posterior median(3) Coverage of the credibility interval of probability 0.8

Computation time: on a recent 6-core computer (C++/OpenMP)I ABC ≈ 6 hoursI ABCel≈ 8 minutes


Why?

On large datasets, ABCel gives more accurate results than ABC

ABC simplifies the dataset through summary statisticsDue to the large dimension of x, the original ABC algorithm estimates

π(θ∣∣∣η(xobs)

),

where η(xobs) is some (non-linear) projection of the observed dataseton a space with smaller dimension↪→ Some information is lost

ABCelsimplifies the model through a generalized moment conditionmodel.↪→ Here, the moment condition model is based on pairwisecomposition likelihood


Joint work withI Christian P. Robert (U. Dauphine & IUF)I Kerrie Mengersen (QUT, Australia)I Raphael Leblois (INRA CBGP, Montpellier)

Grant from ANR throughProject “Emile”

First preprint on arXivApproximate Bayesian computation viaempirical likelihood

Coming soon: population genetic modelswhich are too slow to simulate



Wilk’s theorem to EL likelihood ratio.

Theorem (Owen, 1991)Let X,X1, . . . ,Xn be iid random vector with common distribution F0.For φ ∈ Θ, x ∈X , let h(x,φ) ∈ Rs.Let φ0 ∈ Θ be such that Var(h(X, θ0)) is finite and has rank q > 0.

If φ0 satisfies E(h(X,φ0)) = 0, then −2 log(Lel(φ0|X1:n)

n−n

) → χ2(q)

Note thatI n−n is the maximum value of Lel

I no condition to ensure a unique value for θ0I no condition to make φel = argmaxLel a good estimate of φ0I provides tests and confidence intervals