BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 ·...
Transcript of BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 ·...
![Page 1: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/1.jpg)
rexpokit and cladoRcpp: R packages integra3ng FORTRAN and C++ for faster matrix exponen3a3on and likelihood calcula3ons in historical biogeography
Nicholas J. Matzke, Postdoctoral Fellow, NIMBioS (Na;onal Ins;tute of Mathema;cal, www.nimbios.org)
iEvoBio 2014, Raleigh, NC, Tuesday, June 24, 2014 SoMware Bazaar, Room 402, 3:15-‐5:00 pm
Coauthor: Drew Schmidt, University of Tennessee
DEC (LnL=-‐34.5) DEC+J (LnL=-‐20.9)
Hawaii Psychotria BioGeoBEARS stochas3c mapping:
![Page 2: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/2.jpg)
Figure 1, Matzke 2013, Frontiers of Biogeography
![Page 3: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/3.jpg)
A
A BA B
A A B
A B
A
A B
A B
LAGRANGE or DIVA
A
A BA B
A B
A
A B
Founder-‐event
A
Founder event
Range expands
Vicariance
BioGeoBEARS: tes3ng biogeographical models
![Page 4: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/4.jpg)
Test biogeographic processes in BioGeoBEARSI implemented this method in three R packages:
A
A
C
C
B
A B
Package #1, rexpokit, handles anagenic events requiring matrix exponentiation (FORTRAN EXPOKIT library, C++, Rcpp)
A A B
A B
![Page 5: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/5.jpg)
I implemented this method in three R packages:
A
A
C
C
B
A B
A B
Package #1, rexpokit, handles anagenic events requiring matrix exponentiation (FORTRAN EXPOKIT library, C++, Rcpp)
A A B
Package #2, cladoRcpp, handles cladogenesis events (C++, Rcpp)
Test biogeographic processes in BioGeoBEARS
![Page 6: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/6.jpg)
Package #3, BioGeoBEARS, does the ML/Bayesian searches, model testing, etc.
BioGeographic Bayesian Evolutionary Analysis with R Scripts
A
A
C
C
B
A B
A B
I implemented this method in three R packages:
TRY IT YOURSELF! h[p://phylo.wikidot.com/biogeobears
Test biogeographic processes in BioGeoBEARS
![Page 7: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/7.jpg)
Dealing with slowness #1: Matrix exponentiation
v = branch length Q = Q matrix P = Probability transition matrix
![Page 8: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/8.jpg)
Dealing with slowness #1: Matrix exponentiation
DNA: 4x4 Q matrix Amino acids: 20x20 matrix Biogeography: # ranges x # ranges matrix
![Page 9: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/9.jpg)
Number of states in biogeography
# states = 2number of areas
2 areas = 22 = 4 states 4 areas = 24 = 16 states 8 areas = 28 = 256 states 10 areas = 210 = 1024 states
Exponentiating a 1024x1024 matrix is slow
![Page 10: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/10.jpg)
Dealing with slowness #1: Matrix exponentiation
![Page 11: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/11.jpg)
A
A
C
C
B
A B
Package #1, rexpokit, handles anagenic events requiring matrix exponentiation (FORTRAN EXPOKIT library, C++, Rcpp)
A A B
A B
![Page 12: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/12.jpg)
R Package: rexpokit
![Page 13: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/13.jpg)
R Package: rexpokit
![Page 14: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/14.jpg)
R Package: rexpokitDense matrix exponentiation: EXPOKIT's DGPADM — (faster than the other R options, esp. for large matrices)
Sparse matrix exponentiation: EXPOKIT’s DMEXPV !
(DMEXPV’s imprecision affected ML analysis, needs more exploration)
![Page 15: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/15.jpg)
R Package: rexpokit
Use of multicore mapply() to farm out the matrix exponentiations on the branches
![Page 16: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/16.jpg)
Dealing with slowness #2: conditional probabilities at
cladogenesis
![Page 17: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/17.jpg)
Number of possible cladogenetic range-inheritance events
# states = 2number of areas
2 areas = 4 states = 4-1 = 3 non-null states At every node,
3 possible ancestor states, 3 possible left descendants, 3 possible right descendants
!
= 3x3x3 = 27 combinations to assign probabilities to
![Page 18: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/18.jpg)
# states = 2number of areas
9 areas = 512 states = 512-1 = 511 non-null states At every node,
511 possible ancestor states, 511 possible left descendants, 511 possible right descendants
!
= 511x511x511 = 133,432,831 combinations to assign probabilities to
Number of possible cladogenetic range-inheritance events
![Page 19: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/19.jpg)
E.g., range AB giving rise to: !
left = CDEFG, right = DEFGH !
…should get probability 0. !
But, searching intelligently involves lots of nested for-loops
Problem can be simplified, according to your cladogenesis
model
![Page 20: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/20.jpg)
R Package: cladoRcpp
![Page 21: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/21.jpg)
A
A
C
C
B
A B
A B
A A B
Package #2, cladoRcpp, handles cladogenesis events (C++, Rcpp)
![Page 22: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/22.jpg)
A
A
C
C
B
A B
A B
A A B
BioGeoBEARS speedSmall analysis (4 areas, 16 states) !
C++ Lagrange: < 1 second Python Lagrange: a few sec. BioGeoBEARS: ~10 seconds !
![Page 23: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/23.jpg)
A
A
C
C
B
A B
A B
A A B
BioGeoBEARS speedLarge analysis (10 areas, 6 max, 848 states) !
C++ Lagrange: 208 min. Python Lagrange: won’t run !
BioGeoBEARS single-core: 187 minutes !
BioGeoBEARS twelve cores: 22 minutes
![Page 24: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/24.jpg)
A
A
C
C
B
A B
A B
A A B
rexpokit improvementsCoauthor Drew Schmidt, University of Tennessee !
Gets a 40% speed improvement via various fixes in the R/FORTRAN interaction !
New version of rexpokit going on CRAN this week !
See his talk at UseR conference, June 30-July 3, UCLA
![Page 25: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/25.jpg)
R Package: at PhyloWiki
![Page 26: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/26.jpg)
0
0.05
0.1
0.15
0.20.2
d
Yule simulations (λ=0.3, µ=0, α=0, ω=0)true parameter valueDEC inferenceDEC+J inference
00.10.20.30.40.5
j
1/150.25
0.5
0.75
1
Prop
ortio
n co
rrect
0
50
100
130
∆AI
Cc
True model: +J +J +J DEC +J +J +J DEC +J +J +J
Accurate when simulated under Yule process
![Page 27: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/27.jpg)
0
0.05
0.1
0.15
0.20.2
d
SSE simulations (λ=0.3, µ=0.3, α=1, ω=−1)true parameter valueDEC inferenceDEC+J inference
00.10.20.30.40.5
j
1/150.25
0.5
0.75
1
Prop
ortio
n co
rrect
0
50
100
130
∆AI
Cc
True model: +J +J +J DEC +J +J +J DEC +J +J +J
Accurate even with simulation under SSE
![Page 28: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/28.jpg)
Papers using BioGeoBEARS
![Page 30: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/30.jpg)
Thanks! I welcome ques;ons/comments/collabora;ons at: [email protected] (also: seeking a job!)
Funding: NIMBioS NSF “Bivalves in Time and Space” UC Berkeley Wang Fellowship UC Berkeley Tien Fellowship Google Summer of Code NIMBioS
TRY IT YOURSELF AT: h[p://phylo.wikidot.com/biogeobears
Thanks especially to: !NIMBioS Brian O’Meara Jeremy Beaulieu Ka=e Massana Michael Landis !!Ph.D. commi]ee John Huelsenbeck Tony Barnosky David Jablonski Roger Byrne !Systema=c Biology editors & reviewers
![Page 31: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/31.jpg)
DEC: Dispersal-extinction cladogenesis
Cladogenesis -- range evolution at speciation events (LAGRANGE model)
Probabilites per speciation event:
ending&rangeleft: A A A A A A A B B B B B B B C C C C C C C AB AB AB AB AB AB AB BC BC BC BC BC BC BC AC AC AC AC AC AC AC ABCABCABCABCABCABCABCright: A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC countA sym 1B sym 1
starting C sym 1range AB vic sub vic sub sub sub 6
BC vic sub vic sub sub sub 6AC vic sub vic sub sub sub 6ABC vic sub vic sub vic sub vic vic vic sub sub sub 12
sympatric subset vicariance
![Page 32: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/32.jpg)
ending&rangeleft: A A A A A A A B B B B B B B C C C C C C C AB AB AB AB AB AB AB BC BC BC BC BC BC BC AC AC AC AC AC AC AC ABCABCABCABCABCABCABCright: A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC countA sym 1B sym 1
starting C sym 1range AB vic sub vic sub sub sub 6
BC vic sub vic sub sub sub 6AC vic sub vic sub sub sub 6ABC vic sub vic sub vic sub vic vic vic sub sub sub 12
DEC+J: Dispersal-extinction cladogenesis PLUS founder-event speciation
Probabilites per speciation event:
sympatric subset vicariance Founder Event
jjjj
jjj
jj j
jj j
j
jj j
j
![Page 33: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three](https://reader033.fdocuments.us/reader033/viewer/2022041612/5e383ef0b4bb956e9779f7b0/html5/thumbnails/33.jpg)
BayArea cladogenesis model
Probabilites per speciation event:
ending&rangeleft: A A A A A A A B B B B B B B C C C C C C C AB AB AB AB AB AB AB BC BC BC BC BC BC BC AC AC AC AC AC AC AC ABCABCABCABCABCABCABCright: A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC countA sym 1B sym 1
starting C sym 1range AB sym 1
BC sym 1AC sym 1ABC sym 1
Sympatric speciation (range duplication)
Cladogenesis -- “no event” during speciation (i.e., pure continuous time)