How to use PHYLDOG: a tutorial

PHYLDOG in practice Bastien Boussau [email protected] @Bastounette

description

Presentation of PHYLDOG, a piece of software for reconstructing gene and species phylogenies, with a focus on the practical side of things and pointers to a tutorial.

Transcript of How to use PHYLDOG: a tutorial

Page 1: How to use PHYLDOG: a tutorial

PHYLDOG in practice

Bastien Boussau

[email protected] @Bastounette

Page 2: How to use PHYLDOG: a tutorial

Getting the program

• From the internet:
  • http://pbil.univ-lyon1.fr/software/phyldog/#try
• Using the USB keys in the room:
  • they contain VirtualBox
  • they contain the application along with the data to analyze

Page 3: How to use PHYLDOG: a tutorial

Collaborators

• LBBE collaborators (Lyon):
  – Gergely Szöllősi (Budapest),
  – Eric Tannier,
  – Vincent Daubin,
  – Manolo Gouy,
  – Sophie Abby,
  – Laurent Duret,
  – Thomas Bigot,
  – Magali Semeria

Page 4: How to use PHYLDOG: a tutorial

[Figure: a tree of four species (A, B, C, D) drawn along a time axis, shown twice; the second copy carries a discrete character (a, a, b, a) and a continuous character (0.1, 0.2, 0.2, 0.4) at its tips.]

Page 13: How to use PHYLDOG: a tutorial

[Figure: the same species tree (A, B, C, D; time axis; discrete and continuous characters) with labels for four processes: D, DL, LGT, and ILS.]

• D, DL (duplication, duplication and loss): PHYLDOG
• DL+LGT (with lateral gene transfer): Szöllősi et al., PNAS
• ILS (incomplete lineage sorting): Not yet

Page 14: How to use PHYLDOG: a tutorial

What is PHYLDOG?

• Program for the coestimation of species and gene trees at the genome scale

• Probabilistic model of sequence evolution + model of gene duplication and loss

• Statistical framework

• Branch-wise parameters of duplications and losses

• Gene families evolve independently of each other

• Based on a parallel architecture using MPI

Genome-scale coestimation of species and gene trees. Boussau et al., Genome Research, 2013, 23:323-330.
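The bullets above describe two model components per gene family (sequence evolution, and duplication and loss) and families that evolve independently. A minimal sketch of how such a score could be combined, assuming each family contributes the sum of two log-likelihood terms; the class, the function names, and the numbers are hypothetical and do not reflect PHYLDOG's internals.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FamilyScore:
    """Hypothetical per-family scores; not a PHYLDOG data structure."""
    name: str
    sequence_loglk: float        # log P(alignment | gene tree) under the sequence model
    reconciliation_loglk: float  # log P(gene tree | species tree, duplication/loss parameters)


def family_loglk(score: FamilyScore) -> float:
    """The two model components multiply, so their logarithms add."""
    return score.sequence_loglk + score.reconciliation_loglk


def total_loglk(scores: Iterable[FamilyScore]) -> float:
    """Families are assumed independent, so the total is a plain sum."""
    return sum(family_loglk(s) for s in scores)


# Made-up numbers, for illustration only.
scores = [
    FamilyScore("family_1", -1200.5, -35.25),
    FamilyScore("family_2", -980.0, -12.75),
]
print(total_loglk(scores))
```

Because the total is a sum over families, each family's term can be evaluated separately, which is what makes a parallel implementation natural.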

Page 15: How to use PHYLDOG: a tutorial

Parallel architecture

Although it may not look like it, PHYLDOG infers rooted trees
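Since gene families are treated as independent, they can be spread over processes. The sketch below only illustrates that idea in plain Python, without MPI: it assumes one coordinating process (rank 0) that handles the species tree and a round-robin split of families over the remaining ranks. This layout is an assumption for illustration, not PHYLDOG's actual scheduling.

```python
from typing import Dict, List


def assign_families(families: List[str], n_processes: int) -> Dict[int, List[str]]:
    """Spread gene families over worker ranks 1..n_processes-1 (round-robin).

    Rank 0 is reserved here for the species tree search. This layout is an
    illustrative assumption, not PHYLDOG's actual load balancing.
    """
    if n_processes < 2:
        raise ValueError("need one coordinating process and at least one worker")
    workers = n_processes - 1
    assignment: Dict[int, List[str]] = {rank: [] for rank in range(1, n_processes)}
    for index, family in enumerate(families):
        assignment[1 + index % workers].append(family)
    return assignment


# Hypothetical family names.
families = [f"family_{i}" for i in range(1, 8)]
plan = assign_families(families, n_processes=4)
for rank, names in plan.items():
    print(rank, names)
```

Round-robin ignores family size; a real scheduler would probably balance the load by alignment length or number of sequences.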

Page 16: How to use PHYLDOG: a tutorial

Structure of the input data

Page 18: How to use PHYLDOG: a tutorial

Option files

family_X.option: options specific to gene family X (alignment file, substitution model options, gene tree search options)

GeneralOptions.txt: options concerning the species tree search, and options common to all gene families (species tree search options, duplication/loss model options, list of gene families)

Easy generation of basic option files using the prepareData.py script
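To inspect the option files that prepareData.py generates, a small reader can help. The sketch below assumes the files consist of `key=value` lines with optional `#` comments; both that format and every key shown are assumptions for illustration, not PHYLDOG's real option names.

```python
from typing import Dict


def parse_options(text: str) -> Dict[str, str]:
    """Parse `key=value` lines, skipping blank lines and `#` comments.

    The format and the keys used below are assumptions for illustration;
    check the files written by prepareData.py for the real option names.
    """
    options: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError(f"not a key=value line: {raw_line!r}")
        options[key.strip()] = value.strip()
    return options


# Hypothetical contents, NOT real PHYLDOG option names.
general = parse_options("""
# options shared by all gene families
example.species.tree.option=value_a
example.family.list=listGenes.txt
""")
family = parse_options("example.alignment.file=family_X.fasta\n")

print(general)
print(family)
```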

Page 19: How to use PHYLDOG: a tutorial

PHYLDOG tutorial

http://www.prabi.fr/redmine/projects/phyldogtoolt/wiki/Tutorial

• Installing PHYLDOG
• Downloading files
• Basic input files
• Generating all the option files using the prepareData.py script
• Running PHYLDOG
• Diminishing the number of species considered
• Diminishing the number of gene families considered
• Running PHYLDOG, at last
• Interpreting PHYLDOG's output
• Going further

Page 24: How to use PHYLDOG: a tutorial

Why our current pipeline can be improved

• Gene alignments:
  • Error prone
  • Short
  • Point estimates
• Gene trees:
  • Based on alignments
  • Point estimates
• Species trees:
  • Based on gene trees

Page 30: How to use PHYLDOG: a tutorial

Simulations to test PHYLDOG

• 40 species
• Randomly pick duplication and loss rates per branch
• Complex model of sequence evolution

[Figure: a rooted organism tree, numbers of duplications (D1 to D6) and losses (L1 to L6), and rooted gene trees; PHYLDOG is then run and each of the three is marked with a "?".]
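The step "randomly pick duplication and loss rates per branch" can be sketched as follows. The slides do not say which distribution was used, so the uniform distribution, its upper bound, the fixed seed, and the branch names are all assumptions.

```python
import random
from typing import Dict, List, Tuple


def draw_branch_rates(
    branches: List[str],
    max_rate: float = 0.5,
    seed: int = 0,
) -> Dict[str, Tuple[float, float]]:
    """Draw one (duplication, loss) rate pair per branch.

    Rates are uniform on [0, max_rate]; the distribution and its bound are
    assumptions, the slides only say the rates are picked at random.
    """
    rng = random.Random(seed)
    return {
        branch: (rng.uniform(0.0, max_rate), rng.uniform(0.0, max_rate))
        for branch in branches
    }


# Assuming a binary tree: a rooted binary tree on 40 species has 2 * 40 - 2 = 78 branches.
branches = [f"branch_{i}" for i in range(78)]
rates = draw_branch_rates(branches)
print(rates["branch_0"])
```

Fixing the seed makes a simulated data set reproducible, which helps when comparing inferred and simulated rates branch by branch.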

Page 32: How to use PHYLDOG: a tutorial

Simulations: PHYLDOG builds accurate gene trees

[Figure: tree error (y-axis, 0 to 20) for PhyML with the wrong model, PhyML with the correct model, and PHYLDOG with the wrong model.]

Page 34: How to use PHYLDOG: a tutorial

Simulations: PHYLDOG accurately recovers numbers of duplications and losses

[Figure: reconstructed numbers (y-axis) against expected numbers (x-axis) of duplications and losses, both axes on a log scale from 1e-04 to 1e+00.]

Page 35: How to use PHYLDOG: a tutorial

Study of mammalian genomes

• Challenging but well-studied phylogeny

• 36 mammalian genomes available in Ensembl v. 57

• About 7000 gene families

• Correction for incomplete genomes

Page 36: How to use PHYLDOG: a tutorial

PHYLDOG finds a good species tree

[Figure: species tree of the 36 mammals found by PHYLDOG, with six clades highlighted: Laurasiatheria, Afrotheria, Xenarthra, Marsupials, Primates, and Glires. Tip labels as extracted (tree order not recoverable): Sus scrofa, Felis catus, Ornithorhynchus anatinus, Oryctolagus cuniculus, Loxodonta africana, Mus musculus, Gorilla gorilla, Dipodomys ordii, Monodelphis domestica, Vicugna pacos, Macaca mulatta, Tupaia belangeri, Procavia capensis, Spermophilus tridecemlineatus, Pongo pygmaeus, Tursiops truncatus, Microcebus murinus, Callithrix jacchus, Equus caballus, Erinaceus europaeus, Tarsius syrichta, Choloepus hoffmanni, Ochotona princeps, Cavia porcellus, Pan troglodytes, Bos taurus, Rattus norvegicus, Homo sapiens, Otolemur garnettii, Dasypus novemcinctus, Echinops telfairi, Pteropus vampyrus, Macropus eugenii, Canis familiaris, Sorex araneus, Myotis lucifugus.]

Page 38: How to use PHYLDOG: a tutorial

Duptree finds a pretty good species tree

[Figure: species tree of the same 36 mammals found by Duptree, with six clades highlighted: Marsupials, Afrotheria, Xenarthra, Laurasiatheria, Primates, and Glires.]

Page 40: How to use PHYLDOG: a tutorial

iGTP finds a different species tree

[Figure: species tree of the same 36 mammals found by iGTP, with six clades highlighted: Marsupials, Afrotheria, Xenarthra, Laurasiatheria, Primates, and Glires.]

Page 43: How to use PHYLDOG: a tutorial

Assessing the quality of gene trees

• Two approaches:
  1. Looking at ancestral genome sizes
  2. Assessing how well one can recover ancestral syntenies using reconstructed gene trees (Bérard et al., Bioinformatics)
• Comparison between:
  – PhyML (PhylomeDB and Homolens databases)
  – TreeBeST (Ensembl-Compara database)
  – PHYLDOG

Page 44: How to use PHYLDOG: a tutorial

1) Junk trees generate obesity

• Errors in gene tree reconstruction result in larger ancestral genomes
  – Better algorithms should yield smaller ancestral genomes
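One way to see why erroneous gene trees inflate ancestral genomes is to count gene copies at ancestral species in reconciled gene trees. The sketch below uses a hypothetical minimal node structure and a simplified rule (one ancestral gene per speciation node mapped to that species); it is not PHYLDOG's output format or the procedure used for the results shown here.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Node of a reconciled gene tree (hypothetical minimal structure)."""
    species: str   # species-tree node this gene is mapped to
    event: str     # "speciation", "duplication" or "leaf"
    children: List["Node"] = field(default_factory=list)


def count_ancestral_genes(node: Node, counts: Counter) -> None:
    """Add one gene to an ancestral species per speciation node mapped to it."""
    if node.event == "speciation":
        counts[node.species] += 1
    for child in node.children:
        count_ancestral_genes(child, counts)


def leaf(species: str) -> Node:
    return Node(species, "leaf")


# Species tree ((A,B)AB,C)ABC. Family 1 has no duplication.
family_1 = Node("ABC", "speciation", [
    Node("AB", "speciation", [leaf("A"), leaf("B")]),
    leaf("C"),
])
# Family 2 has a duplication before the A/B split, so AB carries two copies.
family_2 = Node("ABC", "speciation", [
    Node("AB", "duplication", [
        Node("AB", "speciation", [leaf("A"), leaf("B")]),
        Node("AB", "speciation", [leaf("A"), leaf("B")]),
    ]),
    leaf("C"),
])

counts: Counter = Counter()
for tree in (family_1, family_2):
    count_ancestral_genes(tree, counts)
print(dict(counts))
```

A wrongly placed branch in a gene tree is typically explained by an extra, older duplication followed by losses, which adds copies to ancestral species under this kind of count.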

Page 45: How to use PHYLDOG: a tutorial

1) PHYLDOG fights genome obesity

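[Figure: number of genes (y-axis, ticks at 5000, 10000, 15000) in extant genomes and in ancestral genomes reconstructed from PhyML, TreeBeST (Ensembl-Compara), and PHYLDOG gene trees; one comparison is marked **.]

**: Student t-test p-value: 6.61e-06, Wilcoxon test p-value: 2.91e-11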

Page 46: How to use PHYLDOG: a tutorial

1) PHYLDOG fights genome obesity

[Figure: species tree of the same 36 mammals (clades: Laurasiatheria, Afrotheria, Xenarthra, Marsupials, Primates, Glires) with bar plots of gene numbers (axes from 0 to 10000) for PHYLDOG, TreeBeST, and PhyML.]

Page 47: How to use PHYLDOG: a tutorial

2) Junk trees break synteny groups

• We use Deco (Bérard et al. Bioinformatics 2013) to reconstruct ancestral synteny groups using gene trees

• Errors in gene tree reconstruction break synteny groups
  – Better algorithms should yield more genes in ancestral synteny groups

Page 48: How to use PHYLDOG: a tutorial

2) Ancestral synteny says PHYLDOG gene trees are better

[Figure: proportion of ancestral genes (y-axis, 0.0 to 0.5) by number of adjacencies (x-axis, 0 to 7) for PHYLDOG, TreeBeST, and PhyML, with a second axis label reading "Proportion of ancestral genes in synteny groups".]

Bérard et al., Bioinformatics, 2013
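The quantity on this plot, the proportion of ancestral genes with a given number of adjacencies, can be computed from a list of inferred adjacencies. The sketch below assumes adjacencies come as pairs of ancestral gene identifiers; this input layout and the gene names are hypothetical and are not DeCo's output format.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def adjacency_distribution(
    genes: Iterable[str],
    adjacencies: Iterable[Tuple[str, str]],
) -> Dict[int, float]:
    """Proportion of ancestral genes having 0, 1, 2, ... adjacencies."""
    degree: Dict[str, int] = {gene: 0 for gene in genes}
    for left, right in adjacencies:
        degree[left] = degree.get(left, 0) + 1
        degree[right] = degree.get(right, 0) + 1
    histogram = Counter(degree.values())
    total = len(degree)
    return {k: histogram[k] / total for k in sorted(histogram)}


# Hypothetical ancestral genes and adjacencies: g1-g2-g3 form a block, g4 is isolated.
genes = ["g1", "g2", "g3", "g4"]
adjacencies = [("g1", "g2"), ("g2", "g3")]
distribution = adjacency_distribution(genes, adjacencies)
print(distribution)
```

On a linear chromosome a gene has at most two neighbours, so genes with more than two adjacencies point to conflicting signal, and genes with none are outside any synteny group.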

Page 49: How to use PHYLDOG: a tutorial

An example gene family

[Figure: trees of one gene family as reconstructed by TreeBeST and by PHYLDOG (scale bars: 0.1 and 0.3). Tips come from Ornithorhynchus anatinus, Mus musculus, Cavia porcellus, Oryctolagus cuniculus, Canis familiaris, Bos taurus, Homo sapiens, Pongo pygmaeus, Equus caballus, Callithrix jacchus, Monodelphis domestica, and Spermophilus tridecemlineatus, several of them with multiple gene copies.]

Page 50: How to use PHYLDOG: a tutorial

Perspectives

• Improvement of the algorithms to reconstruct gene trees (e.g. Magali Semeria)

• Improvement of the algorithms to reconstruct the species tree

• Dealing with ILS

• Joint reconstruction of gene trees and gene alignments

Page 52: How to use PHYLDOG: a tutorial

The Ancestrome project

• Reconstructing a species tree and gene trees for a large number of species

• Reconstructing ancestral gene contents

• Inferring ancient metabolisms and lifestyles

• Inferring ancient communities

New insights into the evolution of life on Earth, and into genomic evolution

Page 54: How to use PHYLDOG: a tutorial

• LBBE collaborators (Lyon):
  – Gergely Szöllősi (Budapest),
  – Eric Tannier,
  – Vincent Daubin,
  – Manolo Gouy,
  – Sophie Abby,
  – Laurent Duret,
  – Thomas Bigot,
  – Magali Semeria

Postdocs wanted!

Thank you!