Bayesian Divergence Time Estimation – Workshop Lecture
Transcript of Bayesian Divergence Time Estimation – Workshop Lecture
Bayesian Divergence Time Estimation
Tracy Heath
Integrative Biology, University of California, Berkeley
Ecology & Evolutionary Biology, University of Kansas
Ecology, Evolution, & Organismal Biology, Iowa State University (in 2015)
2014 Workshop on Molecular Evolution, Woods Hole, MA, USA
O
Overview of divergence time estimation
• Relaxed clock models – accounting for variation in substitution rates among lineages
• Tree priors and fossil calibration
break
BEAST v2.1.3 Tutorial: http://treethinkers.org/divergence-time-estimation-using-beast/
• Walk through: set up BEAST input file in BEAUti and execute BEAST MCMC analysis
• On your own: complete analysis & summarize output
dinner
A T-S E
Phylogenetic trees can provide both topological information and temporal information
(Figure: phylogram with a scale bar of 0.2 expected substitutions/site, built from sequence data, phylogenetic relationships, and fossil calibrations; labeled clades: Primates, Simiiformes, Homininae, Microcebus, Carnivora, Cetacea, Artiodactyla, Hippo)
Did Simiiformes experience
accelerated rates of molecular
evolution?
What is the age of the MRCA of
mouse lemurs (Microcebus)?
(Figure: time-calibrated tree of mammals, from Equus to the mouse lemurs, with Simiiformes and Microcebus marked; axis: Time (Millions of years), 100 to 0, spanning the Cretaceous, Paleogene, Neogene, and Quaternary)
Understanding Evolutionary Processes (Yang & Yoder Syst. Biol. 2003)
A T-S E
Phylogenetic divergence-time estimation
• What was the spatial and climatic environment of ancient angiosperms?
• Did the uplift of the Patagonian Andes drive the diversity of Peruvian lilies?
• How has mammalian body-size changed over time?
• Is diversification in Caribbean anoles correlated with ecological opportunity?
• How has the rate of molecular evolution changed across the Tree of Life?
(Antonelli & Sanmartin. Syst. Biol. 2011)
(Lartillot & Delsuc. Evolution 2012) (Mahler, Revell, Glor, & Losos. Evolution 2010)
(Nabholz, Glemin, Galtier. MBE 2008)
Historical biogeography Molecular evolution
Trait evolution
Diversification
Anolis fowleri (image by L. Mahler)
Understanding Evolutionary Processes
Divergence Time Estimation
Goal: Estimate the ages of interior nodes to understand the timing and rates of evolutionary processes
Model how rates are distributed across the tree
Describe the distribution of speciation events over time
External calibration information for estimates of absolute node times
(Figure: time-calibrated tree of mammals with a calibrated node marked; axis: Time (Millions of years), 100 to 0)
U H B
“From East Gondwana to Central America: historical biogeography of the Alstroemeriaceae”
(Chacón et al., J. Biogeography 2012)
Divergence Time Estimation
Historical biogeography requires external calibration
Model how rates are distributed across the tree
Describe the distribution of speciation events over time
External calibration information for estimates of absolute node times
(Chacón et al., J. Biogeography 2012)
Divergence Time Estimation
What about when the fossil record (or other types of calibration information) is poor or absent?
Example: Despite the rich diversity of Anolis there are few fossils
There are some amber fossils, but these fossils fall within a narrow time range
Amber Anolis fossil (http://www.anoleannals.org/2012/03/06/the-hi-tech-world-of-anole-paleontology/)
Divergence Time Estimation
What about when the fossil record is poor or absent?
Model how rates are distributed across the tree
Describe the distribution of speciation events over time
Estimation of relative divergence times
Anolis hendersoni (Image courtesy of L. Mahler)
R T D
“Ecological opportunity and the rate of morphological evolution in the diversification of Greater Antillean Anoles”
Anolis fowleri (image courtesy of L. Mahler)
(Mahler, Revell, Glor, & Losos. Evolution 2010)
T D I D
Divergence time estimation of rapidly evolving pathogens provides information about spatial and temporal dynamics of infectious diseases
Sequences sampled at different time horizons impose a temporal structure on the tree by providing ages for non-contemporaneous tips
(Pybus & Rambaut. 2009. Nature Reviews Genetics.)
A T-S E
Phylogenetic trees can provide both topological information and temporal information
(Figure: time-calibrated tree of mammals, with Simiiformes and Microcebus marked; axis: Time (Millions of years), 100 to 0, spanning the Cretaceous, Paleogene, Neogene, and Quaternary)
Understanding Evolutionary Processes (Yang & Yoder Syst. Biol. 2003; Heath et al. MBE 2012)
The Global Molecular Clock
Assume that the rate of evolutionary change is constant over time
(branch lengths equal percent sequence divergence)
(Figure: rooted tree of taxa A, B, and C; the branches to B and C are 10% each, the internal branch is 10%, and the branch to A is 20%; node ages of 200 My and 400 My)
(Based on slides by Jeff Thorne; http://statgen.ncsu.edu/thorne/compmolevo.html)
The Global Molecular Clock
We can date the tree if we know the rate of change is 1% divergence per 10 My
(Figure: the three-taxon tree dated at 200 My for the MRCA of B and C and 400 My for the root)
The Global Molecular Clock
If we found a fossil of the MRCA of B and C, we can use it to calculate the rate of change & date the root of the tree
(Figure: the three-taxon tree with the MRCA of B and C calibrated at 200 My and the root dated at 400 My)
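The arithmetic behind the three-taxon example can be written out as a short sketch. The variable names are illustrative, and the assignment of the 10% and 20% labels to particular branches is an assumption read off the figure (it is the only arrangement that is clock-like).

```python
# Branch lengths in percent divergence along each lineage (assumed layout).
tip_b = 10.0       # branch leading to B
tip_c = 10.0       # branch leading to C
internal = 10.0    # branch from the root to the MRCA of B and C
tip_a = 20.0       # branch leading to A

fossil_age_bc = 200.0   # My, age of the MRCA of B and C from the fossil

# Strict clock: rate = distance / time along a single lineage.
rate = tip_b / fossil_age_bc              # percent per My per lineage

# The root sits one internal branch above the calibrated node.
root_age = (tip_b + internal) / rate      # My

# Clock check: the path through A must give the same root age.
root_age_via_a = tip_a / rate

# B and C differ by tip_b + tip_c = 20% over 200 My,
# i.e. 1% pairwise divergence per 10 My, as stated on the slide.
pairwise_per_10my = 2 * rate * 10

print(rate, root_age, root_age_via_a, pairwise_per_10my)
```

Both paths to the root give 400 My, which is what a constant rate requires.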
R G M C
Rates of evolution vary across lineages and over time
Mutation rate: Variation in
• metabolic rate
• generation time
• DNA repair
Fixation rate: Variation in
• strength and targets of selection
• population sizes
(Figure: the three-taxon tree of A, B, and C with branch lengths of 20%, 10%, 10%, and 10% and node ages of 200 My and 400 My)
U A
Sequence data provide information about branch lengths
In units of the expected # of substitutions per site
branch length = rate × time
(Figure: phylogram with a scale bar of 0.2 expected substitutions/site, built from sequence data and phylogenetic relationships)
P L
f(D | V, θs, Ψ)
V Vector of branch lengths
θs Sequence model parameters
D Sequence data
Ψ Tree topology
R T
The expected # of substitutions/site occurring along a branch is the product of the substitution rate and time
(Figure: three trees drawn with length = rate × time, length = rate, and length = time)
Methods for dating species divergences estimate the substitution rate and time separately
S R
Substitution rate: the rate at which mutations are fixed in a population
Depends on: mutation rate, selection, population size, drift
(Figure: tree drawn with length = subst. rate)
Mutation rate measures the rate at which mutations occur over time and is affected by metabolic rate, generation time, DNA repair efficiency
R T
The sequence data provide information about branch length
For any possible rate, there’s a time that fits the branch length perfectly
(Figure: branch rate (0 to 5) plotted against branch time (0 to 5); every point on the curve gives branch length = 0.5, for example time = 0.8 and rate = 0.625)
(based on Thorne & Kishino, 2005)
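A minimal sketch of why rate and time cannot be separated from a branch length alone: for the branch length of 0.5 used on the slide, every rate has a matching time. The function name and the extra rate values are illustrative.

```python
branch_length = 0.5   # expected substitutions/site, from the slide

def time_for_rate(rate, length=branch_length):
    """Return the time that reproduces `length` exactly at this rate."""
    return length / rate

# Each pair (rate, time) below explains the branch length equally well.
for rate in (0.25, 0.625, 1.0, 2.5):
    t = time_for_rate(rate)
    print(f"rate = {rate}, time = {t}, rate x time = {rate * t}")
```

The pair rate = 0.625, time = 0.8 is the point marked on the slide; the sequence data alone cannot prefer it over the others.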
Bayesian Divergence Time Estimation
(Figure: two trees drawn with length = rate and length = time)
R = (r1, r2, r3, . . . , r2N−2)
A = (a1, a2, a3, . . . , aN−1)
N = number of tips
Bayesian Divergence Time Estimation
Posterior probability
f(R, A, θR, θA, θs | D, Ψ)
R Vector of rates on branches
A Vector of internal node ages
θR, θA, θs Model parameters
D Sequence data
Ψ Tree topology
Bayesian Divergence Time Estimation
f(R, A, θR, θA, θs | D) = [ f(D | R, A, θs) f(R | θR) f(A | θA) f(θs) ] / f(D)
f(D | R, A, θR, θA, θs) = f(D | R, A, θs) Likelihood (the data depend on θR and θA only through R and A)
f(R | θR) Prior on rates
f(A | θA) Prior on node ages
f(θs) Prior on substitution parameters
f(D) Marginal probability of the data
Bayesian Divergence Time Estimation
Estimating divergence times relies on 2 main elements:
• Branch-specific rates: f (R | θR)
• Node ages: f (A | θA,C)
M R V
Some models describing lineage-specific substitution rate variation:
• Global molecular clock (Zuckerkandl & Pauling, 1962)
• Local molecular clocks (Hasegawa, Kishino & Yano 1989; Kishino & Hasegawa 1990; Yoder & Yang 2000; Yang & Yoder 2003; Drummond and Suchard 2010)
• Punctuated rate change model (Huelsenbeck, Larget and Swofford 2000)
• Log-normally distributed autocorrelated rates (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001; Thorne & Kishino 2002)
• Uncorrelated/independent rates models (Drummond et al. 2006; Rannala & Yang 2007; Lepage et al. 2007)
• Mixture models on branch rates (Heath, Holder, Huelsenbeck 2012)
Models of Lineage-specific Rate Variation
Global Molecular Clock
The substitution rate is constant over time
All lineages share the same rate
(Figure: tree with branch length = substitution rate, shaded from low to high)
Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)
Global Molecular Clock
The sampled rate is applied to every branch in the tree
(Figure: rate prior distribution, density against rate, with a sampled rate r marked)
Models of Lineage-specific Rate Variation (Zuckerkandl & Pauling, 1962)
R G M C
Rates of evolution vary across lineages and over time
Mutation rate: Variation in
• metabolic rate
• generation time
• DNA repair
Fixation rate: Variation in
• strength and targets of selection
• population sizes
(Figure: the three-taxon tree of A, B, and C with branch lengths of 20%, 10%, 10%, and 10% and node ages of 200 My and 400 My)
Relaxed-Clock Models
To accommodate variation in substitution rates, ‘relaxed-clock’ models estimate lineage-specific substitution rates
• Local molecular clocks
• Punctuated rate change model
• Log-normally distributed autocorrelated rates
• Uncorrelated/independent rates models
• Mixture models on branch rates
Local Molecular Clocks
Rate shifts occur infrequently over the tree
Closely related lineages have equivalent rates (clustered by sub-clades)
(Figure: tree with branch length = substitution rate, shaded from low to high)
Models of Lineage-specific Rate Variation (Yang & Yoder 2003, Drummond and Suchard 2010)
Local Molecular Clocks
Most methods for estimating local clocks required specifying the number and locations of rate changes a priori
Drummond and Suchard (2010) introduced a Bayesian method that samples over a broad range of possible random local clocks
(Figure: tree with branch length = substitution rate, shaded from low to high)
Models of Lineage-specific Rate Variation (Yang & Yoder 2003, Drummond and Suchard 2010)
Autocorrelated Rates
Substitution rates evolve gradually over time – closely related lineages have similar rates
The rate at a node is drawn from a lognormal distribution with a mean equal to the parent rate
(Figure: tree with branch length = substitution rate, shaded from low to high)
Models of Lineage-specific Rate Variation (Thorne, Kishino & Painter 1998; Kishino, Thorne & Bruno 2001)
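A rough sketch of drawing autocorrelated rates down a tree. The tree, the root rate, and the parameter `nu` are hypothetical, and letting the log-scale variance grow as `nu` times the branch duration is an assumption of this sketch rather than something stated on the slide. The location parameter is shifted by half the variance so that the mean of each lognormal equals the parent rate.

```python
import math
import random

def draw_child_rate(parent_rate, branch_time, nu, rng):
    """Draw a child rate from a lognormal whose mean equals parent_rate."""
    variance = nu * branch_time                      # log-scale variance (assumed form)
    mu = math.log(parent_rate) - variance / 2.0      # keeps E[child] = parent_rate
    return rng.lognormvariate(mu, math.sqrt(variance))

# Hypothetical 4-tip tree: node -> (parent, time separating node from parent).
tree = {
    "n1": ("root", 1.0),
    "A": ("n1", 2.0),
    "B": ("n1", 2.0),
    "n2": ("root", 0.5),
    "C": ("n2", 2.5),
    "D": ("n2", 2.5),
}

rng = random.Random(1)
rates = {"root": 0.01}
for node, (parent, t) in tree.items():   # parents are listed before their children
    rates[node] = draw_child_rate(rates[parent], t, nu=0.1, rng=rng)

for node, r in rates.items():
    print(node, round(r, 5))
```

Because each rate is centered on its parent's rate, sister lineages tend to end up with similar values, which is the autocorrelation the model describes.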
Punctuated Rate Change
Rate changes occur along lineages according to a point process
At rate-change events, the new rate is a product of the parent’s rate and a Γ-distributed multiplier
(Figure: tree with branch length = substitution rate, shaded from low to high)
Models of Lineage-specific Rate Variation (Huelsenbeck, Larget and Swofford 2000)
Independent/Uncorrelated Rates
Lineage-specific rates are uncorrelated when the rate assigned to each branch is independently drawn from an underlying distribution
(Figure: tree with branch length = substitution rate, shaded from low to high)
Models of Lineage-specific Rate Variation (Drummond et al. 2006)
Independent/Uncorrelated Rates
In BEAST, the rates for the branches are drawn from a discretized lognormal distribution
(Figure: branch rates under the uncorrelated, discretized LN model; density against branch rate (r), 0 to 2.0)
Models of Lineage-specific Rate Variation (Drummond et al. 2006)
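One way to picture a discretized lognormal is sketched below. It is an illustration only, not BEAST's code: the number of categories, the use of midpoint quantiles, and the values of `mean` and `sigma` are all assumptions made for the example.

```python
import math
import random
from statistics import NormalDist

def discretized_lognormal(n_categories, mean=1.0, sigma=0.5):
    """Rates at the midpoint quantiles of a lognormal with the given mean."""
    mu = math.log(mean) - sigma ** 2 / 2.0    # log-scale location for that mean
    z = NormalDist()                           # standard normal, for quantiles
    return [math.exp(mu + sigma * z.inv_cdf((i + 0.5) / n_categories))
            for i in range(n_categories)]

n_branches = 18                                # hypothetical tree size
categories = discretized_lognormal(n_branches)

rng = random.Random(7)
# Each branch picks a category independently of its neighbours.
branch_rates = [categories[rng.randrange(n_branches)] for _ in range(n_branches)]

print([round(r, 3) for r in branch_rates])
```

Unlike the autocorrelated model, the rate on a branch here carries no information about the rate on its parent or sister branch.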
I M M
Dirichlet process prior: Branches are partitioned into distinct rate categories
(Figure: tree with branch length = substitution rate; branches assigned to substitution rate classes c1 to c5)
Models of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE)
The Dirichlet Process Prior (DPP)
A stochastic process that models data as a mixture of distributions and can identify latent classes present in the data
Branches are assumed to form distinct substitution rate clusters
Efficient Markov chain Monte Carlo (MCMC) implementations allow for inference under this model
(Figure: tree with branch length = substitution rate; branches assigned to substitution rate classes c1 to c5)
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
The Dirichlet Process Prior (DPP)
A stochastic process that models data as a mixture of distributions and can identify latent classes present in the data
Random variables under the DPP informed by the data:
• the number of rate classes
• the assignment of branches to classes
• the rate value for each class
(Figure: tree with branch length = substitution rate; branches assigned to substitution rate classes c1 to c5)
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
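The three random variables listed above can be drawn from the prior with the Chinese restaurant process form of the Dirichlet process. This is a sketch of prior simulation only, not the MCMC used for inference; the concentration parameter `alpha` and the gamma base distribution (`shape`, `scale`) are assumptions chosen for illustration.

```python
import random

def sample_dpp_rates(n_branches, alpha, rng, shape=2.0, scale=0.5):
    """Draw branch-rate classes from a Dirichlet process prior (CRP form)."""
    assignment = []     # class index for each branch
    class_rates = []    # one rate per class, drawn from the base distribution
    class_sizes = []    # number of branches currently in each class
    for _ in range(n_branches):
        # An existing class k has weight class_sizes[k]; a new class has weight alpha.
        weights = class_sizes + [alpha]
        k = rng.choices(range(len(weights)), weights=weights)[0]
        if k == len(class_sizes):                     # open a new class
            class_sizes.append(0)
            class_rates.append(rng.gammavariate(shape, scale))
        class_sizes[k] += 1
        assignment.append(k)
    return assignment, class_rates

rng = random.Random(42)
assignment, class_rates = sample_dpp_rates(18, alpha=1.0, rng=rng)
branch_rates = [class_rates[k] for k in assignment]

print("number of rate classes:", len(class_rates))
print("class of each branch:  ", assignment)
```

Small values of `alpha` favor few classes (closer to a global clock), while large values favor many classes (closer to independent rates).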
DPP C P
Global molecular clock: all 18 branches share one rate class (c1)
(Figure: tree with branch length = substitution rate, colored by rate class)
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
DPP C P
Independent rates: each of the 18 branches has its own rate class (c1, c2, c3, ..., c18)
(Figure: tree with branch length = substitution rate, colored by rate class)
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
DPP C P
Local molecular clock: 3 rate classes containing 8 (c1), 5 (c2), and 5 (c3) branches
(Figure: tree with branch length = substitution rate, colored by rate class)
(262,142 different local molecular clocks)
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
DPP C P
Global molecular
clock
4c3
6c2
rate classes
branch length = substitution rate
8c1
Each of the 682,076,806,159 configurations has a prior weight
DPP Model of Lineage-specific Rate Variation (Heath, Holder, Huelsenbeck. 2012 MBE 29:939-955)
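The count of configurations can be checked directly: the number of ways to partition a set of items into non-empty classes is a Bell number. The sketch below assumes the example tree has 18 branches, which is consistent with the class sizes shown on these slides.

```python
def bell_number(n):
    """Number of ways to partition n items into non-empty classes (n >= 1)."""
    row = [1]                          # first row of the Bell triangle
    for _ in range(n - 1):
        new_row = [row[-1]]            # a row starts with the previous row's last entry
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
    return row[-1]                     # last entry of row n is the n-th Bell number

print(bell_number(18))
```

For 18 branches this gives 682,076,806,159, the figure quoted on the slide.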
M R V
These are only a subset of the available models for branch-rate variation
• Global molecular clock
• Local molecular clocks
• Punctuated rate change model
• Log-normally distributed autocorrelated rates
• Uncorrelated/independent rates models
• Dirichlet process prior
Models of Lineage-specific Rate Variation
M R V
Are our models appropriate across all data sets?
(Figure: dated phylogeny of extinct and extant bears, with harbor seal as outgroup, showing mean node ages (Ma) with 95% CIs and support values (MP, MLu, MLp, Bayesian), plotted against the Eocene to Holocene epochs with annotations for the global expansion of C4 biomass, a major temperature drop and increasing seasonality, and faunal turnover)
Krause et al., 2008. Mitochondrial genomes reveal an explosive radiation of extinct and extant bears near the Miocene-Pliocene boundary. BMC Evol. Biol. 8.
(Figure: time tree of ray-finned fish clades, from Polypteriformes to Percomorpha, with taxon counts on a log scale from 1 to 20000; axis: MYA, 300 to 0; Ostariophysi, Acanthomorpha, and Teleostei marked)
Clade   r       ε        ΔAIC
1.      0.041   0.0017   25.3
2.      0.081   *        25.5
3.      0.067   0.37     45.1
4.      0       *        3.1
Bg.     0.011   0.0011
Santini et al., 2009. Did genome duplication drive the origin of teleosts? A comparative study of diversification in ray-finned fishes. BMC Evol. Biol. 9.
M R V
These are only a subset of the available models for branch-rate variation
• Global molecular clock
• Local molecular clocks
• Punctuated rate change model
• Log-normally distributed autocorrelated rates
• Uncorrelated/independent rates models
• Dirichlet process prior
Model selection and model uncertainty are very important for Bayesian divergence time analysis
Models of Lineage-specific Rate Variation
Bayesian Divergence Time Estimation
Estimating divergence times relies on 2 main elements:
• Branch-specific rates: f (R | θR)
• Node ages: f (A | θA,C)
http://bayesiancook.blogspot.com/2013/12/two-sides-of-same-coin.html
P N T
Relaxed clock Bayesian analyses require a prior distribution on node times
f(A | θA)
Different node-age priors make different assumptions about the timing of divergence events
Node Age Priors
G N T P
Assumed to be vague or uninformative by not making assumptions about biological processes
Uniform prior: the time at a given node has equal probability across the interval between the time of the parent node and the time of the oldest daughter node
(conditioned on root age)
Node Age Priors
G N T P
Assumed to be vague or uninformative by not making assumptions about biological processes
Dirichlet prior: ages of the interior nodes on a single path spanning the age of the root node to one of the tip nodes are sampled from a flat Dirichlet distribution
(conditioned on root age)
Node Age Priors
S B P
Node-age priors based on stochastic models of lineage diversification
Yule process: assumes a constant rate of speciation across lineages
A pure birth process: every node leaves extant descendants (no extinction)
Node Age Priors
S B P
Node-age priors based on stochastic models of lineage diversification
Constant-rate birth-death process: at any point in time a lineage can speciate at rate λ or go extinct with a rate of μ
Node Age Priors
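The Yule and constant-rate birth-death processes can be simulated forward in time with a few lines. This sketch tracks only the number of lineages, not the tree shape; the function name and the rate values are illustrative assumptions. Setting `death=0` gives the Yule (pure birth) case.

```python
import random

def simulate_lineage_counts(birth, death, total_time, rng, start=1):
    """Gillespie-style simulation; returns a list of (time, n_lineages) events."""
    t, n = 0.0, start
    history = [(t, n)]
    while n > 0:
        # Waiting time to the next event among n lineages.
        t += rng.expovariate(n * (birth + death))
        if t >= total_time:
            break
        if rng.random() < birth / (birth + death):
            n += 1          # speciation
        else:
            n -= 1          # extinction
        history.append((t, n))
    return history

rng = random.Random(3)
yule = simulate_lineage_counts(birth=0.1, death=0.0, total_time=30.0, rng=rng)
bd = simulate_lineage_counts(birth=0.1, death=0.05, total_time=30.0, rng=rng)

print("Yule, lineages at the end:       ", yule[-1][1])
print("Birth-death, lineages at the end:", bd[-1][1])
```

Rerunning with different values of the two rates shows how strongly the resulting histories depend on them.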
S B P
Different values of λ and μ lead to different trees
Bayesian inference under these models can be very sensitive to the values of these parameters
Using hyperpriors on λ and μ accounts for uncertainty in these hyperparameters
Node Age Priors
S B P
Node-age priors based on stochastic models of lineage diversification
Birth-death-sampling process: an extension of the constant-rate birth-death model that accounts for random sampling of tips
Conditions on a probability of sampling a tip, ρ
Node Age Priors
P N T
Sequence data are only informative on relative rates & times
Node-time priors cannot give precise estimates of absolute node ages
We need external information (like fossils) to calibrate or scale the tree to absolute time
f(A | θA,C)
Node Age Priors
C D T
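Fossils (or other data) are necessary to estimate absolute node ages
There is no information in the sequence data for absolute time
Uncertainty in the placement of fossils
(Figure: the three-taxon tree of A, B, and C with branch lengths of 20%, 10%, 10%, and 10%, a calibrated node at 200 My, and the root at 400 My)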
C D
Bayesian inference is well suited to accommodating uncertainty in the age of the calibration node
Divergence times are calibrated by placing parametric densities on internal nodes offset by age estimates from the fossil record
(Figure: the three-taxon tree of A, B, and C with a prior density over age placed on the node calibrated at 200 My)
Fossil Calibration
Fossil and geological data can be used to estimate the absolute ages of ancient divergences
(Figure axis: Time (My))
Calibrating Divergence Times
Fossil Calibration
The ages of extant taxa are known
(Figure axis: Time (My))
Calibrating Divergence Times
Fossil Calibration
Fossil taxa are assigned to monophyletic clades
(Figure axis: Time (My), with the minimum age marked)
Calibrating Divergence Times Notogoneus osculus (Grande & Grande J. Paleont. 2008)
Fossil Calibration
Fossil taxa are assigned to monophyletic clades and constrain the age of the MRCA
(Figure axis: Time (My), with the minimum age marked)
Calibrating Divergence Times
A F C
Misplaced fossils can affect node age estimates throughout the tree – if the fossil is older than its presumed MRCA
Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
A F C
Crown clade: all living species and their most-recent common ancestor (MRCA)
Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
A F C
Stem lineages: purely fossil forms that are closer to their descendant crown clade than any other crown clade
Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
A F C
Fossiliferous horizons: the sources in the rock record for relevant fossils
Calibrating the Tree (figure from Benton & Donoghue Mol. Biol. Evol. 2007)
Fossil Calibration
Age estimates from fossils can provide minimum time constraints for internal nodes
Reliable maximum bounds are typically unavailable
(Figure axis: Time (My), with the minimum age marked)
Calibrating Divergence Times
P D C N
Parametric distributions are typically off-set by the age of the oldest fossil assigned to a clade
These prior densities do not (necessarily) require specification of maximum bounds
Uniform (min, max)
Exponential (λ)
Gamma (α, β)
Log Normal (µ, σ²)
(Figure axis: Time (My), with the minimum age marked)
Calibrating Divergence Times
P D C N
Describe the waiting time between the divergence event and the age of the oldest fossil
(Figure axis: Time (My), with the minimum age marked)
Calibrating Divergence Times
P D C N
Overly informative priors can bias node age estimates to be too young
(Figure: Exponential (λ) density offset by the minimum age; axis: Time (My))
Calibrating Divergence Times
P D C N
Uncertainty in the age of the MRCA of the clade relative to the age of the fossil may be better captured by vague prior densities
(Figure: Exponential (λ) density offset by the minimum age; axis: Time (My))
Calibrating Divergence Times
P D C N
(Figure, left: lognormal prior density against node age − fossil age (0 to 30), with the expected node age marked above the minimum (fossil) age)
(Figure, right: exponential prior densities against node age − fossil age (0 to 100) for λ = 5⁻¹, 20⁻¹, and 60⁻¹, with the expected node age for each marked above the minimum (fossil) age)
Calibrating Divergence Times
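The effect of λ on an offset exponential calibration density can be worked out in closed form: the expected node age is the fossil age plus 1/λ. The sketch below uses the three λ values from the figure; the fossil age of 60 My and the function name are assumptions for illustration.

```python
import math

def offset_exponential_summary(fossil_age, rate):
    """Mean and 95% upper quantile of fossil_age + Exponential(rate)."""
    mean = fossil_age + 1.0 / rate
    upper95 = fossil_age - math.log(1.0 - 0.95) / rate
    return mean, upper95

fossil_age = 60.0                     # My, hypothetical minimum age
for inv_rate in (5.0, 20.0, 60.0):   # lambda = 1/5, 1/20, 1/60
    mean, upper95 = offset_exponential_summary(fossil_age, 1.0 / inv_rate)
    print(f"lambda = 1/{inv_rate:g}: expected node age = {mean:.1f} My, "
          f"95% of the prior mass lies below {upper95:.1f} My")
```

With λ = 1/5 nearly all prior mass sits within 15 My of the fossil, which is how an overly informative density can pull node ages toward values that are too young.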
P P
Hyperprior: place a higher-order prior on the parameter of a prior distribution
Sample the time from the MRCA to the fossil from a mixture of different exponential distributions
Account for uncertainty in values of λ
(Figure: hyperprior density over the hyperparameter, feeding a prior density over the parameter)
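Sampling under a hyperprior is a two-step draw: first λ, then the waiting time given that λ. The sketch below assumes a gamma hyperprior on λ and a fossil age of 60 My; both are illustrative choices, not values from the lecture.

```python
import random

def sample_waiting_time(rng, hyper_shape=2.0, hyper_scale=0.05):
    """Draw lambda from a gamma hyperprior, then a waiting time from Exp(lambda)."""
    lam = rng.gammavariate(hyper_shape, hyper_scale)   # hyperprior draw
    return rng.expovariate(lam)                        # prior draw given lambda

rng = random.Random(11)
fossil_age = 60.0    # My, hypothetical minimum age
node_ages = sorted(fossil_age + sample_waiting_time(rng) for _ in range(10000))

print("median prior node age:", round(node_ages[len(node_ages) // 2], 1))
print("95% prior quantile:   ", round(node_ages[int(0.95 * len(node_ages))], 1))
```

The resulting prior on node age is a mixture of exponentials with a heavier tail than any single exponential, so it is less committed to one value of λ.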
P D C N
Common practice in Bayesian divergence-time estimation:
Estimates of absolute node ages are driven primarily by the calibration density
Specifying appropriate densities is a challenge for most molecular biologists
Uniform (min, max)
Exponential (λ)
Gamma (α, β)
Log Normal (µ, σ²)
(Figure axis: Time (My), with the minimum age marked)
Calibration Density Approach
I F C
We would prefer to eliminate the need for ad hoc calibration prior densities
Calibration densities do not account for diversification of fossils
(Figure: tree of extant and fossil taxa. Extant: domestic dog, spotted seal, giant panda, spectacled bear, sun bear, Am. black bear, Asian black bear, brown bear, polar bear, sloth bear. Fossil: Zaragocyon daamsi, Ballusia elmensis, Ursavus brevirhinus, Ailurarctos lufengensis, Ursavus primaevus, Agriarctos spp., Kretzoiarctos beatrix, Indarctos vireti, Indarctos arctoides, Indarctos punjabiensis, giant short-faced bear, cave bear)
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
I F C
We want to use all of the available fossils
Example: Bears
12 fossils are reduced to 4 calibration ages with calibration density methods
(Figure: tree of the 10 extant taxa and 12 fossil bears)
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
I F C
Because fossils are part of the diversification process, we can combine fossil calibration with birth-death models
(Figure: tree of the 10 extant taxa and 12 fossil bears)
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
I F C
This relies on a branching model that accounts for speciation, extinction, and rates of fossilization, preservation, and recovery
(Figure: tree of the 10 extant taxa and 12 fossil bears)
Fossil and Extant Bears (Krause et al. BMC Evol. Biol. 2008; Abella et al. PLoS ONE 2012)
The Fossilized Birth-Death Process (FBD)
Improving statistical inference of absolute node ages
Eliminates the need to specify arbitrary calibration densities
Better capture our statistical uncertainty in species divergence dates
All reliable fossils associated with a clade are used
(Figure: tree with sampled fossils; axis: Time, 150 to 0)
Heath, Huelsenbeck, & Stadler. 2014. The fossilized birth-death process for coherent calibration of divergence time estimates. PNAS 111(29):E2957–E2966.
http://www.pnas.org/content/111/29/E2957.abstract
The Fossilized Birth-Death Process (FBD)
Recovered fossil specimens provide historical observations of the diversification process that generated the tree of extant species
(Figure: tree with sampled fossils; axis: Time, 150 to 0)
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
The Fossilized Birth-Death Process (FBD)
The probability of the tree and fossil observations under a birth-death model with rate parameters:
λ = speciation
μ = extinction
ψ = fossilization/recovery
(Figure: tree with sampled fossils; axis: Time, 150 to 0)
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
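The roles of the three rates can be seen in a forward simulation in which every lineage speciates at rate λ, goes extinct at rate μ, and leaves a recovered fossil at rate ψ. This sketch tracks only the lineage count and the fossil ages, not the tree, and it is a simulation of the process rather than the inference method of Heath et al.; the function name and rate values are illustrative assumptions.

```python
import random

def simulate_fbd_events(birth, death, psi, total_time, rng):
    """Forward simulation from one lineage; returns (n_extant, fossil_ages)."""
    t, n = 0.0, 1
    fossil_ages = []
    total_rate = birth + death + psi            # per-lineage event rate
    while n > 0:
        t += rng.expovariate(n * total_rate)    # waiting time to the next event
        if t >= total_time:
            break
        u = rng.random() * total_rate
        if u < birth:
            n += 1                              # speciation
        elif u < birth + death:
            n -= 1                              # extinction
        else:
            fossil_ages.append(total_time - t)  # fossil recovered; age before present
    return n, fossil_ages

rng = random.Random(5)
n_extant, fossil_ages = simulate_fbd_events(
    birth=0.1, death=0.05, psi=0.2, total_time=30.0, rng=rng)

print("extant lineages:", n_extant)
print("fossil ages (My):", [round(a, 1) for a in fossil_ages])
```

Fossils accumulate along every lineage alive at the time, including lineages that later go extinct, which is why they carry information about the whole diversification process.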
The Fossilized Birth-Death Process (FBD)
We assume that the fossil is a descendant of a specified calibrated node
The time of the fossil, A, indicates an observation of the birth-death process after the age of the node
(Figure: tree with a calibrated node and one fossil; axis: Time (My), 250 to 0)
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
The Fossilized Birth-Death Process (FBD)
The fossil must attach to the tree at some time: ⋆
If it is the descendant of an unobserved lineage, then there is a speciation event at time ⋆ on one of the 2 branches
If ⋆ = A, the fossil is an observation of a lineage ancestral to the extant species
The probability of this realization of the diversification process is conditional on: λ, μ, and ψ
Using MCMC, we can sample the age of the calibrated node while conditioning on
• λ, μ, and ψ
• other node ages
• A and ⋆
MCMC allows us to consider all possible values of ⋆ (marginalization)
The posterior samples of the calibrated node age are informed by the fossil attachment times
(Figure: the same tree with the fossil attached at different times ⋆; axis: Time (My), 250 to 0)
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
The Fossilized Birth-Death Process (FBD)
The FBD model allows multiple fossils to calibrate a single node
Given ⋆ and A, the new fossil can attach to the tree via speciation along either branch in the extant tree
Or the unobserved branch leading to the other calibrating fossil
If ⋆ = A, then the new fossil lies directly on a branch in the extant tree
Or it is an ancestor of the other calibrating fossil
The probability of this realization of the diversification process is conditional on: λ, μ, and ψ
Using MCMC, we can sample the age of the calibrated node while conditioning on
• λ, μ, and ψ
• other node ages
• A and ⋆ for each fossil
MCMC allows us to consider all possible values of ⋆ (marginalization)
(Figure: tree with a calibrated node and two fossils attached in different ways; axis: Time (My), 250 to 0)
Diversification of Fossil & Extant Lineages (Heath, Huelsenbeck, Stadler. PNAS 2014)
B I U FBD
Implemented in:
• DPPDiv: github.com/trayc7/FDPPDIV
• BEAST2: beast2.cs.auckland.ac.nz/
Available soon:
• RevBayes: sourceforge.net/projects/revbayes
(Figures: example trees with sampled fossils; axes: Time and Time (My))
FBD Implementation (Heath, Huelsenbeck, Stadler. PNAS 2014)
Lineages Through Time
The FBD model provides an estimate of the number of lineages over time
[Figure: number of lineages (y-axis) against Time (My), 200 to 0 (x-axis)]
Lineages Through Time
Lineage diversity over time with fossils
MCMC samples the times of lineages in the reconstructed tree and the times of the fossil lineages
(for 1 simulation replicate with 10% random sample of fossils)
Simulations: Results
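A lineages-through-time summary can be sketched as follows. This is a minimal illustration that assumes each MCMC sample has already been reduced to a list of (start age, end age) intervals, one per lineage; that representation and the function names are hypothetical, and real DPPDiv or BEAST output would need to be parsed into this form first.

```python
def count_lineages(intervals, t):
    """Number of lineages alive at age t.

    intervals: list of (start_age, end_age) pairs with start_age > end_age,
    both measured as time before present.
    """
    return sum(1 for start, end in intervals if end <= t < start)


def mean_ltt(samples, times):
    """Average the lineage count at each query time over all samples."""
    return [
        sum(count_lineages(intervals, t) for intervals in samples) / len(samples)
        for t in times
    ]


# Two made-up "posterior samples"; a fossil lineage is an interval that
# ends before the present (end_age > 0).
sample_a = [(100, 0), (60, 0), (60, 25), (40, 30)]
sample_b = [(90, 0), (55, 0), (55, 20)]
times = [80, 50, 35, 10]

ltt = mean_ltt([sample_a, sample_b], times)
print(ltt)  # [1.0, 3.0, 3.5, 2.0]
```

Averaging over samples is what turns the sampled lineage times into an estimate of diversity through time. Dropping fossils removes the intervals that end before the present, which is why fewer fossils give less information about past diversity.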
Lineages Through Time
Lineage diversity over time with fossils
Visualize extant and sampled fossil lineage diversification when using all available fossils (21 total)
(for 1 simulation replicate with 10% random sample of fossils)
Simulations: Results
Lineages Through Time
Lineage diversity over time with fossils
Choosing only the oldest (calibration) fossils reduced the set of sampled fossils from 21 to 12, giving less information about diversification over time
(for 1 simulation replicate with 10% random sample of fossils)
Simulations: Results
Bears: Divergence Times
Sequence data for extant species:
• 8 Ursidae
• 1 Canidae (gray wolf)
• 1 Phocidae (spotted seal)
Fossil ages:
• 12 Ursidae
• 5 Canidae
• 5 Pinnipedimorpha
DPP relaxed clock model
(Heath, Holder, Huelsenbeck MBE 2012)
Fixed tree topology
[Figure: data sources for the analysis: sequence data, fossil data, and phylogenetic relationships; fossil groups shown: fossil canids, fossil pinnipeds, stem fossil ursids, fossil Ailuropodinae, giant short-faced bear, cave bear]
Empirical Analysis (Wang 1994, 1999; Krause et al. 2008; Fulton & Strobeck 2010; Abella et al. 2012)
Bears: Divergence Times
[Figure: tree of the extant species (Gray wolf, Spotted seal, Giant panda, Spectacled bear, Sun bear, Am. black bear, Asian black bear, Brown bear, Polar bear, Sloth bear) with numbered fossil groups marked on the tree]
1. fossil canids
2. fossil pinnipeds
2. stem fossil ursids
3. fossil Ailuropodinae
4. giant short-faced bear
5. cave bear
Empirical Analysis (Heath et al. PNAS 2014; silhouette images: http://phylopic.org/)
Bears: Divergence Times
[Figure: dated tree of the ten extant species (Gray wolf, Spotted seal, and eight bears) with Ursidae labeled and 95% CI shown; x-axis: Time (My), 60 to 0, spanning the Paleocene, Eocene, Oligocene, Miocene, Pliocene, and Pleistocene]
Empirical Analysis (Heath et al. PNAS 2014; silhouette images: http://phylopic.org/)
Bears: Divergence Times
[Figure: dated tree of the ten extant species with fossil lineages added (fossil Ailuropodinae, fossil Arctodus, fossil Ursus, fossil pinnipeds, stem fossil Ursidae, fossil canids) and Ursidae labeled; x-axis: Time (My), 60 to 0, spanning the Paleocene, Eocene, Oligocene, Miocene, Pliocene, and Pleistocene]
Empirical Analysis (Heath et al. PNAS 2014; silhouette images: http://phylopic.org/)
The Fossilized Birth-Death Process (FBD)
Improved statistical inference of absolute node ages
Biologically motivated models can better capture statistical uncertainty
[Figure: dated tree of the extant bears, gray wolf, and spotted seal with fossil lineages; x-axis: Time (My), 60 to 0]
Modeling Diversification Processes (Heath, Huelsenbeck, Stadler. PNAS 2014)
The Fossilized Birth-Death Process (FBD)
Improved statistical inference of absolute node ages
Use all available fossils
Eliminates arbitrary choice of calibration priors
[Figure: dated tree of the extant bears, gray wolf, and spotted seal with fossil lineages; x-axis: Time (My), 60 to 0]
Modeling Diversification Processes (Heath, Huelsenbeck, Stadler. PNAS 2014)
The Fossilized Birth-Death Process (FBD)
Improved statistical inference of absolute node ages
Extensions of the FBD can account for stratigraphic sampling of fossils and shifts in rates of speciation and extinction
[Figure: tree on a time axis (Time, 175 to 0) with a Preservation Rate scale]
Modeling Diversification Processes (Heath, Huelsenbeck, Stadler. PNAS 2014)
F T E D
Ideally, we would like to include all of the available data
Account for uncertainty in the placement of fossil lineages
Keep all fossil data, not just the oldest descendant for a given node
[Figure: tree on a time axis]
F T E D
Combining extant and fossil species
[Figure: dated phylogeny of extant and fossil Hymenoptera with outgroups, built from fossil age data, sequence data, and morphological data (specimen images © AntWeb.org); numbers at nodes; terminals marked by % of morphological characters scored for each terminal (0% to 100%); x-axis: 350 to 0 million years before present]
(Ronquist, Klopfstein, et al. Syst. Biol. 2012.)
Outline
Overview of divergence time estimation
• Relaxed clock models – accounting for variation in substitution rates among lineages
• Tree priors and fossil calibration
break
BEAST v2.1.3 Tutorial
http://treethinkers.org/divergence-time-estimation-using-beast/
• Walk through: set up BEAST input file in BEAUti and execute BEAST MCMC analysis
• On your own: complete analysis & summarize output
dinner
Divergence Time Estimation Software
Program        Models/Method
r8s            Strict clock, local clocks, NPRS, PL
ape (R)        NPRS, PL
multidivtime   log-n autocorrelated (plus some others)
PhyBayes       OU, log-n autocorrelated (plus some others)
PhyloBayes     CIR, white noise (uncorrelated) (plus some others)
BEAST          Uncorrelated (log-n & exp), local clocks (plus others)
TreeTime       Dirichlet model, CPP, uncorrelated
MrBayes 3.2    CPP, strict clock, autocorrelated, uncorrelated
DPPDiv         DPP, strict clock, uncorrelated
RevBayes       “the limit is the sky”
BEAST
Bayesian Evolutionary Analysis Sampling Trees
• population size
• growth/decline in population
• bottlenecks/transition points
• gene trees/species trees
• virus transmission dynamics
• recombination
• migration
• founder effects
• epidemiological tracking
• phylogeography
• trait evolution
• dates of MRCAs
• lineage rates
• ancestral character state reconstruction
• times of bottlenecks/transitions
http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)
BEAST
Bayesian Evolutionary Analysis Sampling Trees
• free, open-source, cross-platform software package for estimating evolutionary parameters on rooted time trees
• includes several utility programs for creating input files and summarizing output
• relies on a verbose XML syntax for executing analysis
http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)
BEAST
BEAUti
• a GUI utility program for generating properly formatted BEAST XML input files
http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)
BEAST
BEAST 2 XML: The eXtensible Markup Language
• specifies the sequences, node calibrations, models, priors, output file names, etc.
• dataset-specific issues can arise and some understanding of the BEAST-specific XML format is essential for troubleshooting
• there are a number of interesting models and analyses available in BEAST that cannot be specified using the BEAUti utility
• XML syntax help: http://www.beast2.org/wiki/index.php/BEAST_2.0.x_XML
http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)
BEAST
BEAST (analysis program)
• reads the commands in the XML input file
• performs MCMC and generates output files
  • <file_stem>.trees (contains the trees and branch rates for every n generations)
  • <file_stem>.log (contains the parameter samples for every n generations)
• the main BEAST binary only runs a single chain (for MCMCMC, use BEASTMC3, which is only available in BEAST v1.8)
http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)
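As a rough illustration of what summarizing a .log file involves, the sketch below reads a tab-delimited parameter log, drops a burn-in fraction, and reports the mean and median of one column. It assumes the log has comment lines starting with "#" followed by a tab-separated header row; the column name `mrca.age`, the 20% burn-in, and the sample values are made up for the example. Tracer and the other BEAST utilities remain the usual tools for this step.

```python
import csv
import statistics


def read_beast_log(lines, burnin_fraction=0.1):
    """Parse tab-delimited log lines, skip '#' comments, and drop the burn-in."""
    data_lines = [ln for ln in lines if ln.strip() and not ln.startswith("#")]
    reader = csv.DictReader(data_lines, delimiter="\t")
    samples = []
    for row in reader:
        # Ignore empty columns, e.g. those produced by a trailing tab.
        samples.append({k: float(v) for k, v in row.items() if k and v})
    n_burnin = int(len(samples) * burnin_fraction)
    return samples[n_burnin:]


def summarize(samples, column):
    """Posterior mean and median of one logged parameter."""
    values = [s[column] for s in samples]
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


# Hypothetical log content: 10 samples, one every 1000 generations.
example = ["# made-up comment line", "\t".join(["Sample", "posterior", "mrca.age"])]
for i in range(10):
    example.append("\t".join([str(i * 1000), str(-100.0 - i), str(10.0 * (i + 1))]))

kept = read_beast_log(example, burnin_fraction=0.2)
print(summarize(kept, "mrca.age"))  # {'n': 8, 'mean': 65.0, 'median': 65.0}
```

With a real file, the lines could come from `open("<file_stem>.log")`, where the file stem is whatever was set in the XML.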
BEAST
LogCombiner
• combines the log or trees files from multiple, independent MCMC runs
http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)
BEAST
TreeAnnotator
• reads in the trees file and summarizes the topology, branch times, and rates
• places annotations on the tree that can easily be viewed in FigTree
http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)
BEAST Tutorial
• Walkthrough using BEAUti and executing the analysis in BEAST
• Independently complete the tutorial and summarize the BEAST output
http://beast2.org/ (Bouckaert et al. PLoS Comp Bio 2014)
BEAST Tutorial
This tutorial uses a simulated dataset of 10 taxa with 4 calibration points
ML phylogeny
[Figure: ML phylogeny and time tree of taxa T1 to T10 with fossils and four calibrated nodes (R, 1, 2, 3); x-axis: Time, 200 to 0]
Calibration priors:
• Log-normal (ln(20), 0.75)
• Exponential (30^-1)
• Normal (140, 10)
• Uniform (8, 32)
Figure annotations: 60, 40, 8, 32; mean = 140, mean = 90, median = 60, mean = 20
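The summary values in the figure can be reproduced with a few lines of arithmetic. The sketch below is a worked check, not part of the tutorial: it assumes the exponential prior is offset by a fossil age of 60 and the log-normal prior by a fossil age of 40 (a reading of the figure annotations, not stated in the text), and all function names are illustrative.

```python
import math


def exponential_mean(rate, offset=0.0):
    """Mean of an exponential prior shifted to start at the fossil age."""
    return offset + 1.0 / rate


def lognormal_median(mu, offset=0.0):
    """Median of an offset log-normal prior; mu is the mean on the log scale."""
    return offset + math.exp(mu)


def lognormal_mean(mu, sigma, offset=0.0):
    """Mean of an offset log-normal prior."""
    return offset + math.exp(mu + 0.5 * sigma ** 2)


def uniform_mean(lower, upper):
    """Mean of a uniform prior on [lower, upper]."""
    return 0.5 * (lower + upper)


# Offsets of 60 and 40 are assumptions taken from the figure annotations.
exp_mean = exponential_mean(rate=1.0 / 30.0, offset=60.0)     # about 90
ln_median = lognormal_median(mu=math.log(20.0), offset=40.0)  # about 60
uni_mean = uniform_mean(8.0, 32.0)                            # 20.0
normal_mean = 140.0  # Normal (140, 10): the mean is the first parameter

print(exp_mean, ln_median, uni_mean, normal_mean)
```

Note that the log-normal mean is larger than its median (`lognormal_mean(math.log(20.0), 0.75, 40.0)` exceeds 60), which is why the slide reports the median for that prior.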