Climbing Peaks and Crossing Valleys: Metropolis Coupling and Rugged Phylogenetic Distributions

Jeremy M. Brown Robert C. Thomson

@jembrown www.phyleauxgenetics.org

Bayesian Inference Requires Integration

[Figure: probability density over tree/parameter space, with τ1 and τ2 marked.]

Markov Chain Monte Carlo (MCMC)

[Figure: probability density over tree/parameter space.]

1) Start somewhere
2) Propose a new position
3) Calculate posterior density ratio (r) of new to old states
   - If r > 1, accept
   - If r < 1, accept with probability r
4) Record state
5) Repeat many times

[Figure labels: “Yes!” and “Maybe”.]
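
The five steps above can be sketched as a minimal Metropolis sampler. This is an illustration only: it moves over a single continuous parameter rather than tree space, and the target density, the proposal width `step`, and all function names are assumptions made for the example.

```python
import math
import random

def log_density(x):
    # Hypothetical target: unnormalized log density of a standard normal.
    return -0.5 * x * x

def metropolis(log_p, start, n_steps, step=1.0, seed=0):
    rng = random.Random(seed)
    x = start                                      # 1) start somewhere
    samples = []
    for _ in range(n_steps):                       # 5) repeat many times
        proposal = x + rng.uniform(-step, step)    # 2) propose a new position
        log_r = log_p(proposal) - log_p(x)         # 3) log of the density ratio r
        # Accept if r >= 1; otherwise accept with probability r.
        if log_r >= 0 or rng.random() < math.exp(log_r):
            x = proposal
        samples.append(x)                          # 4) record state
    return samples

samples = metropolis(log_density, start=3.0, n_steps=20000)
mean = sum(samples) / len(samples)
print(f"sample mean: {mean:.3f}")
```

Working with log densities avoids numerical underflow, which matters once the densities are products of many small likelihood terms.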

MCMC Has Trouble With Rugged Distributions

[Figure: probability density over tree/parameter space, with τ1 and τ2 marked.]

Bipartition Bayes Factors

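[Figure: example tree with tips A, B, C, D, and E.]

Bayes Factor = (Marginal likelihood with AB | CDE) / (Marginal likelihood without AB | CDE)

The numerator uses a positive (+) constraint on the bipartition; the denominator uses a negative (-) constraint.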
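
As a small worked example of the ratio above: marginal likelihoods are usually reported on the log scale, so the Bayes factor is most safely computed as a difference of logs. The two log marginal likelihood values below are hypothetical numbers chosen for illustration, not results from any analysis.

```python
import math

def log_bayes_factor(log_ml_with, log_ml_without):
    """Log Bayes factor in favor of the bipartition (difference of log marginal likelihoods)."""
    return log_ml_with - log_ml_without

# Hypothetical log marginal likelihoods from a positively and a negatively
# constrained analysis of the same data.
log_ml_positive = -10234.7   # with AB | CDE
log_ml_negative = -10240.2   # without AB | CDE

log_bf = log_bayes_factor(log_ml_positive, log_ml_negative)
bayes_factor = math.exp(log_bf)
print(f"ln BF = {log_bf:.2f}, BF = {bayes_factor:.1f}")
```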

Negative Constraints = Rugged Distributions

[Figure: three alternative tree topologies for the same ten taxa: homo_sapiens, pantherophis_guttata, zebra_finch, anolis_carolinensis, gallus_gallus, alligator_mississippiensis, crocodylus_porosus, pelomedusa_subrufa, sphenodon_tuatara, chrysemys_picta.]

Alternative Insertion Swaps are Difficult

[Figure: two tree topologies for the ten taxa, with “Data” labels.]

The Po-Boy Problem

How do you change the seafood on your po-boy while someone’s holding the sandwich?

Shrimp

Oysters

Halves of French roll = Naturally monophyletic taxa

Seafood = Inserted taxon

Metropolis Coupling (MC3) Improves Mixing

[Figure: probability density over tree/parameter space, with a proposed swap between chains (“Swap?”).]

Additional heated chains can act as “scouts”.
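
One way to see why heated chains can scout: assuming a chain at temperature T samples the posterior density raised to the power 1/T, heating shrinks the contrast between peaks and valleys, so a heated chain crosses valleys more readily. The peak and valley densities below are hypothetical values picked for illustration.

```python
def heated(density, temperature):
    """Density as seen by a chain at the given temperature (assumed to be p ** (1/T))."""
    return density ** (1.0 / temperature)

# Hypothetical posterior densities at a peak and in the valley beside it.
peak, valley = 4.0, 0.01

valley_to_peak = {}
for temperature in (1.0, 2.0, 4.0, 8.0):
    ratio = heated(valley, temperature) / heated(peak, temperature)
    valley_to_peak[temperature] = ratio
    print(f"T = {temperature:g}: valley/peak density ratio = {ratio:.4f}")
```

The ratio approaches 1 as the temperature grows, meaning the heated landscape becomes progressively flatter.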

Peaks All Found, But Different Probabilities?

[Figure: the three alternative topologies for the ten taxa, with probabilities from Run 1 and Run 2 (values shown: 0.50, 0.25, 0.24, 0.38, 0.25, 0.24), and a trace plot of Ln L against Generation.]

A Closer Look at the Acceptance Ratio

r = \frac{p_i(\tau_j, \theta_j \mid D) \, p_j(\tau_i, \theta_i \mid D)}{p_i(\tau_i, \theta_i \mid D) \, p_j(\tau_j, \theta_j \mid D)}

Does chain i like where chain j is?

Does chain j like where chain i is?

With chain i sampling the posterior raised to the power 1/T_i, so that p_i = p^{1/T_i}, the ratio simplifies to:

r = \left[ \frac{p(\tau_j, \theta_j \mid D)}{p(\tau_i, \theta_i \mid D)} \right]^{\frac{1}{T_i} - \frac{1}{T_j}}

When temps equal, ALL swaps accepted regardless of posterior density.
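
The swap ratio can be checked numerically. The sketch below assumes chain k samples the posterior raised to the power 1/T_k, so that on the log scale r becomes (1/T_i - 1/T_j) times the difference in log posterior density between the two chains' states. The function names and the example log densities are hypothetical.

```python
import math

def log_swap_ratio(log_post_i, log_post_j, temp_i, temp_j):
    """log r for swapping the states of chains i and j.

    log_post_i and log_post_j are unheated log posterior densities at the
    current states of chains i and j.
    """
    return (1.0 / temp_i - 1.0 / temp_j) * (log_post_j - log_post_i)

def swap_probability(log_post_i, log_post_j, temp_i, temp_j):
    log_r = log_swap_ratio(log_post_i, log_post_j, temp_i, temp_j)
    return 1.0 if log_r >= 0 else math.exp(log_r)

# Equal temperatures: the exponent is zero, so the swap is always accepted,
# however different the two posterior densities are.
same_temp = swap_probability(-5.0, -500.0, 2.0, 2.0)

# Cold chain (T = 1) in a poor spot, hot chain (T = 4) in a good spot: accepted.
cold_gains = swap_probability(-20.0, -5.0, 1.0, 4.0)

# Cold chain in a good spot, hot chain in a poor spot: rarely accepted.
cold_loses = swap_probability(-5.0, -20.0, 1.0, 4.0)

print(same_temp, cold_gains, cold_loses)
```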

A Simple One-Parameter Example

[Figure: two-peaked probability density over a single parameter, with peaks labeled 0.8 and 0.2. Axes: Parameter Value (0.0 to 1.0), Probability Density (0 to 5).]

https://github.com/jembrown/toyMC3/
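
For readers who want to experiment, here is a minimal MC3 sketch on a one-parameter, two-peak target. It is not the code in the repository linked above. The target (a mixture of two narrow normal peaks weighted 0.8 and 0.2), the evenly spaced temperature ladder, the proposal width, and every function name are assumptions made for illustration.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def log_density(x):
    """Hypothetical two-peak target on [0, 1] with peak weights 0.8 and 0.2."""
    if not 0.0 <= x <= 1.0:
        return float("-inf")
    return math.log(0.8 * normal_pdf(x, 0.25, 0.03) + 0.2 * normal_pdf(x, 0.75, 0.03))

def accept(log_r, rng):
    return log_r >= 0 or rng.random() < math.exp(log_r)

def mc3(n_chains, max_temp, n_gens, step=0.05, seed=1):
    """Return the states visited by the cold chain (requires n_chains >= 2)."""
    rng = random.Random(seed)
    # Assumed ladder: temperatures evenly spaced from 1 (cold) to max_temp.
    temps = [1.0 + (max_temp - 1.0) * k / (n_chains - 1) for k in range(n_chains)]
    states = [rng.random() for _ in range(n_chains)]
    cold_samples = []
    for _ in range(n_gens):
        # Within-chain Metropolis update on the heated density p ** (1/T).
        for k in range(n_chains):
            proposal = states[k] + rng.uniform(-step, step)
            log_r = (log_density(proposal) - log_density(states[k])) / temps[k]
            if accept(log_r, rng):
                states[k] = proposal
        # Propose swapping the states of two randomly chosen chains.
        i, j = rng.sample(range(n_chains), 2)
        log_r = (1.0 / temps[i] - 1.0 / temps[j]) * (
            log_density(states[j]) - log_density(states[i])
        )
        if accept(log_r, rng):
            states[i], states[j] = states[j], states[i]
        cold_samples.append(states[0])
    return cold_samples

cold = mc3(n_chains=4, max_temp=8.0, n_gens=2000)
peak_one = sum(1 for x in cold if x < 0.5) / len(cold)
print(f"estimated probability of peak one: {peak_one:.3f}")
```

Varying `n_chains` and `max_temp` separately, and repeating with different seeds, is one way to explore how each setting affects the estimate for the first peak.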

Max Temp > Number of Chains

[Figure: Peak One Probability (0.0 to 1.0) against Maximum Temperature (2 to 10) for 5, 10, and 20 chains. Inset: the two-peaked density, with peaks labeled 0.8 and 0.2.]

Peaks Have Different “Capture” Probabilities

[Figure: the two-peaked density (peaks labeled 0.8 and 0.2), annotated with capture probabilities P=0.8 and P=0.2.]

Spurious Convergence by Chain Number

[Figure: the two-peaked density (peaks labeled 0.8 and 0.2), annotated with capture probabilities P=0.8 and P=0.2.]

When two runs end up with the same distribution of poorly mixing chains across peaks, they will estimate nearly identical (but incorrect!) probabilities.
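
A short calculation shows how often this can happen by chance. It assumes each chain is captured independently by peak one with probability 0.8 (the P=0.8 capture probability on the slides) and then never leaves, so the number of chains in peak one is binomial. The function names are hypothetical.

```python
from math import comb

def allocation_pmf(n_chains, p_capture):
    """P(k of n_chains end up in peak one) for k = 0 .. n_chains."""
    return [
        comb(n_chains, k) * p_capture ** k * (1.0 - p_capture) ** (n_chains - k)
        for k in range(n_chains + 1)
    ]

def prob_runs_match(n_chains, p_capture, n_runs=2):
    """Probability that n_runs independent runs share the same chain allocation."""
    return sum(p ** n_runs for p in allocation_pmf(n_chains, p_capture))

two_chain_pmf = allocation_pmf(2, 0.8)
match_two_runs = prob_runs_match(2, 0.8, n_runs=2)
match_four_runs = prob_runs_match(2, 0.8, n_runs=4)
print(two_chain_pmf, match_two_runs, match_four_runs)
```

Under these assumptions, two runs with two chains each share an allocation more than half the time, while agreement among four runs is much less likely, which is one reason to use more than two runs.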

Lots of Chains Looks Like Convergence

[Figure: Peak One Probability / Standard Deviation (0.0 to 1.0) against Maximum Temperature (2 to 10) for 5, 10, and 20 chains. Inset: the two-peaked density, with peaks labeled 0.8 and 0.2.]

[Figure: the two-peaked density, annotated with capture probabilities P=0.8 and P=0.2.]

With N (large #) chains, by the Law of Large Numbers:
Peak One: 0.8 * N
Peak Two: 0.2 * N
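
The Law of Large Numbers effect can be simulated directly. The sketch below assumes each chain is captured independently by peak one with probability 0.8 and stays there, then measures how much the fraction of chains in peak one varies across simulated runs. All names and the number of simulated runs are choices made for illustration.

```python
import math
import random

def fraction_in_peak_one(n_chains, p_capture, rng):
    captured = sum(1 for _ in range(n_chains) if rng.random() < p_capture)
    return captured / n_chains

def spread_across_runs(n_chains, p_capture=0.8, n_runs=2000, seed=0):
    """Mean and standard deviation of the peak-one fraction across simulated runs."""
    rng = random.Random(seed)
    fractions = [fraction_in_peak_one(n_chains, p_capture, rng) for _ in range(n_runs)]
    mean = sum(fractions) / n_runs
    sd = math.sqrt(sum((f - mean) ** 2 for f in fractions) / n_runs)
    return mean, sd

results = {n: spread_across_runs(n) for n in (5, 20, 80)}
for n, (mean, sd) in results.items():
    expected_sd = math.sqrt(0.8 * 0.2 / n)   # binomial standard deviation of a proportion
    print(f"{n:3d} chains: mean = {mean:.3f}, sd = {sd:.3f} (theory {expected_sd:.3f})")
```

The spread across runs shrinks roughly as one over the square root of the number of chains, so agreement among runs with many chains says little about whether any chain is crossing between peaks.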

Negative Constraint on Bird Monophyly

[Figure: ten-taxon tree beside a plot of Probability (0.0 to 1.0) against Maximum Temperature (0.0 to 3.0) for 2, 4, 8, 16, and 32 chains.]

[Figure: the same tree beside a plot of Probability / Standard Deviation against Maximum Temperature (0.0 to 3.0) for 2, 4, 8, 16, and 32 chains.]

Warnings

• Despite improving mixing, MC3 analyses still require careful thought.

• With small numbers of chains and small numbers of runs, estimated probabilities can be incorrect but identical across some runs.

• With large numbers of chains, estimated probabilities become increasingly similar across all runs.

Broad v Rugged Distributions

[Figure: probability density over tree/parameter space.]

Recommendations

• For rugged distributions, increase maximum chain temperature, not chain number

• For broad distributions, increase chain number

• Use more than 2 runs

Thank You

DEB-1355071, DEB-1354506

@jembrown

Michael Landis

Karen Cranston

TreeScaper

Guifang Zhou (SSB symposium lightning talk) - Monday, 1:45-1:50 - Ballroom A - "A network framework to explore phylogenetic structure in genome data"

Guifang Zhou (iEvoBio talk) - Tuesday, 2:05-2:12 - Meeting Room 9C - "TreeScaper: Software to visualize and extract phylogenetic signals from sets of trees"

https://github.com/whuang08/TreeScaper

Spurious Convergence by Chain Number

[Figure: the two-peaked density (peaks labeled 0.8 and 0.2), annotated with capture probabilities P=0.8 and P=0.2.]

With 2 chains:

Chains in first peak   Chains in second peak   Probability
2                      0                       0.64
1                      1                       0.32
0                      2                       0.04