BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 ·...

33
rexpokit and cladoRcpp: R packages integra3ng FORTRAN and C++ for faster matrix exponen3a3on and likelihood calcula3ons in historical biogeography Nicholas J. Matzke, Postdoctoral Fellow, NIMBioS (Na;onal Ins;tute of Mathema;cal, www.nimbios.org) iEvoBio 2014, Raleigh, NC, Tuesday, June 24, 2014 SoMware Bazaar, Room 402, 3:155:00 pm Coauthor: Drew Schmidt, University of Tennessee DEC (LnL=34.5) DEC+J (LnL=20.9) Hawaii Psychotria BioGeoBEARS stochas3c mapping:

Transcript of BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 ·...

Page 1: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

rexpokit  and  cladoRcpp:  R  packages  integra3ng  FORTRAN  and  C++  for  faster  matrix  exponen3a3on  and  likelihood  calcula3ons  in  historical  biogeography

Nicholas  J.  Matzke,  Postdoctoral  Fellow,  NIMBioS  (Na;onal  Ins;tute  of  Mathema;cal,  www.nimbios.org)  

iEvoBio  2014,  Raleigh,  NC,  Tuesday,  June  24,  2014  SoMware  Bazaar,  Room  402,  3:15-­‐5:00  pm  

Coauthor:  Drew  Schmidt,  University  of  Tennessee

  DEC  (LnL=-­‐34.5)   DEC+J  (LnL=-­‐20.9)

Hawaii  Psychotria  BioGeoBEARS  stochas3c  mapping:

Page 2: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

Figure 1, Matzke 2013, Frontiers of Biogeography

Page 3: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

A

A BA B

A A B

A B

A

A B

A B

LAGRANGE  or  DIVA

A

A BA B

A B

A

A B

Founder-­‐event

A

Founder  event

Range  expands

Vicariance

BioGeoBEARS:  tes3ng  biogeographical  models

Page 4: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

Test  biogeographic  processes  in  BioGeoBEARSI implemented this method in three R packages:

A

A

C

C

B

A B

Package #1, rexpokit, handles anagenic events requiring matrix exponentiation (FORTRAN EXPOKIT library, C++, Rcpp)

A A B

A B

Page 5: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

I implemented this method in three R packages:

A

A

C

C

B

A B

A B

Package #1, rexpokit, handles anagenic events requiring matrix exponentiation (FORTRAN EXPOKIT library, C++, Rcpp)

A A B

Package #2, cladoRcpp, handles cladogenesis events (C++, Rcpp)

Test  biogeographic  processes  in  BioGeoBEARS

Page 6: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

Package #3, BioGeoBEARS, does the ML/Bayesian searches, model testing, etc.

BioGeographic Bayesian Evolutionary Analysis with R Scripts

A

A

C

C

B

A B

A B

I implemented this method in three R packages:

TRY  IT  YOURSELF!  h[p://phylo.wikidot.com/biogeobears

Test  biogeographic  processes  in  BioGeoBEARS

Page 7: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

Dealing with slowness #1: Matrix exponentiation

v = branch length Q = Q matrix P = Probability transition matrix

Page 8: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

Dealing with slowness #1: Matrix exponentiation

DNA: 4x4 Q matrix Amino acids: 20x20 matrix Biogeography: # ranges x # ranges matrix

Page 9: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

Number of states in biogeography

# states = 2number of areas

2 areas = 22 = 4 states 4 areas = 24 = 16 states 8 areas = 28 = 256 states 10 areas = 210 = 1024 states

Exponentiating a 1024x1024 matrix is slow

Page 10: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

Dealing with slowness #1: Matrix exponentiation

Page 11: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

A

A

C

C

B

A B

Package #1, rexpokit, handles anagenic events requiring matrix exponentiation (FORTRAN EXPOKIT library, C++, Rcpp)

A A B

A B

Page 12: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

R Package: rexpokit

Page 13: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

R Package: rexpokit

Page 14: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

R Package: rexpokitDense matrix exponentiation: EXPOKIT's DGPADM — (faster than the other R options, esp. for large matrices)

Sparse matrix exponentiation: EXPOKIT’s DMEXPV !

(DMEXPV’s imprecision affected ML analysis, needs more exploration)

Page 15: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

R Package: rexpokit

Use of multicore mapply() to farm out the matrix exponentiations on the branches

Page 16: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

Dealing with slowness #2: conditional probabilities at

cladogenesis

Page 17: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

Number of possible cladogenetic range-inheritance events

# states = 2number of areas

2 areas = 4 states = 4-1 = 3 non-null states At every node,

3 possible ancestor states, 3 possible left descendants, 3 possible right descendants

!

= 3x3x3 = 27 combinations to assign probabilities to

Page 18: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

# states = 2number of areas

9 areas = 512 states = 512-1 = 511 non-null states At every node,

511 possible ancestor states, 511 possible left descendants, 511 possible right descendants

!

= 511x511x511 = 133,432,831 combinations to assign probabilities to

Number of possible cladogenetic range-inheritance events

Page 19: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

E.g., range AB giving rise to: !

left = CDEFG, right = DEFGH !

…should get probability 0. !

But, searching intelligently involves lots of nested for-loops

Problem can be simplified, according to your cladogenesis

model

Page 20: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

R Package: cladoRcpp

Page 21: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

A

A

C

C

B

A B

A B

A A B

Package #2, cladoRcpp, handles cladogenesis events (C++, Rcpp)

Page 22: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

A

A

C

C

B

A B

A B

A A B

BioGeoBEARS speedSmall analysis (4 areas, 16 states) !

C++ Lagrange: < 1 second Python Lagrange: a few sec. BioGeoBEARS: ~10 seconds !

Page 23: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

A

A

C

C

B

A B

A B

A A B

BioGeoBEARS speedLarge analysis (10 areas, 6 max, 848 states) !

C++ Lagrange: 208 min. Python Lagrange: won’t run !

BioGeoBEARS single-core: 187 minutes !

BioGeoBEARS twelve cores: 22 minutes

Page 24: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

A

A

C

C

B

A B

A B

A A B

rexpokit improvementsCoauthor Drew Schmidt, University of Tennessee !

Gets a 40% speed improvement via various fixes in the R/FORTRAN interaction !

New version of rexpokit going on CRAN this week !

See his talk at UseR conference, June 30-July 3, UCLA

Page 25: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

R Package: at PhyloWiki

Page 26: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

0

0.05

0.1

0.15

0.20.2

d

Yule simulations (λ=0.3, µ=0, α=0, ω=0)true parameter valueDEC inferenceDEC+J inference

00.10.20.30.40.5

j

1/150.25

0.5

0.75

1

Prop

ortio

n co

rrect

0

50

100

130

∆AI

Cc

True model: +J +J +J DEC +J +J +J DEC +J +J +J

Accurate when simulated under Yule process

Page 27: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

0

0.05

0.1

0.15

0.20.2

d

SSE simulations (λ=0.3, µ=0.3, α=1, ω=−1)true parameter valueDEC inferenceDEC+J inference

00.10.20.30.40.5

j

1/150.25

0.5

0.75

1

Prop

ortio

n co

rrect

0

50

100

130

∆AI

Cc

True model: +J +J +J DEC +J +J +J DEC +J +J +J

Accurate even with simulation under SSE

Page 28: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

Papers  using  BioGeoBEARS

Page 29: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

See  also:

In  review:  email  me  at  [email protected]  for  this

Page 30: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

Thanks!  I  welcome  ques;ons/comments/collabora;ons  at:  [email protected]  (also:  seeking  a  job!)

Funding:  NIMBioS  NSF  “Bivalves  in  Time  and  Space”  UC  Berkeley  Wang  Fellowship  UC  Berkeley  Tien  Fellowship  Google  Summer  of  Code  NIMBioS

TRY  IT  YOURSELF  AT:  h[p://phylo.wikidot.com/biogeobears

Thanks  especially  to:  !NIMBioS  Brian  O’Meara  Jeremy  Beaulieu  Ka=e  Massana  Michael  Landis  !!Ph.D.  commi]ee John  Huelsenbeck Tony  Barnosky David  Jablonski Roger  Byrne  !Systema=c  Biology  editors  &  reviewers  

Page 31: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

DEC: Dispersal-extinction cladogenesis

Cladogenesis -- range evolution at speciation events (LAGRANGE model)

Probabilites per speciation event:

ending&rangeleft: A A A A A A A B B B B B B B C C C C C C C AB AB AB AB AB AB AB BC BC BC BC BC BC BC AC AC AC AC AC AC AC ABCABCABCABCABCABCABCright: A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC countA sym 1B sym 1

starting C sym 1range AB vic sub vic sub sub sub 6

BC vic sub vic sub sub sub 6AC vic sub vic sub sub sub 6ABC vic sub vic sub vic sub vic vic vic sub sub sub 12

sympatric subset vicariance

Page 32: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

ending&rangeleft: A A A A A A A B B B B B B B C C C C C C C AB AB AB AB AB AB AB BC BC BC BC BC BC BC AC AC AC AC AC AC AC ABCABCABCABCABCABCABCright: A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC countA sym 1B sym 1

starting C sym 1range AB vic sub vic sub sub sub 6

BC vic sub vic sub sub sub 6AC vic sub vic sub sub sub 6ABC vic sub vic sub vic sub vic vic vic sub sub sub 12

DEC+J: Dispersal-extinction cladogenesis PLUS founder-event speciation

Probabilites per speciation event:

sympatric subset vicariance Founder Event

jjjj

jjj

jj j

jj j

j

jj j

j

Page 33: BioGeoBEARS)stochas3c)mapping: …phylo.wdfiles.com/local--files/abstracts-for... · 2014-06-24 · Test)biogeographic)processes)in)BioGeoBEARS I implemented this method in three

BayArea cladogenesis model

Probabilites per speciation event:

ending&rangeleft: A A A A A A A B B B B B B B C C C C C C C AB AB AB AB AB AB AB BC BC BC BC BC BC BC AC AC AC AC AC AC AC ABCABCABCABCABCABCABCright: A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC A B C AB BC AC ABC countA sym 1B sym 1

starting C sym 1range AB sym 1

BC sym 1AC sym 1ABC sym 1

Sympatric speciation (range duplication)

Cladogenesis -- “no event” during speciation (i.e., pure continuous time)