How to use PHYLDOG: a tutorial
Getting the program
• From the internet:
– http://pbil.univ-lyon1.fr/software/phyldog/#try
• Using the USB keys in the room:
– they contain VirtualBox
– they contain the application along with the data to analyze
Collaborators
• LBBE collaborators (Lyon):
– Gergely Szöllősi (Budapest),
– Eric Tannier,
– Vincent Daubin,
– Manolo Gouy,
– Sophie Abby,
– Laurent Duret,
– Thomas Bigot,
– Magali Semeria
[Figure, shown in successive builds: trees for species A, B, C and D drawn along a TIME axis, with a discrete character (a, a, b, a) and a continuous character (0.1, 0.2, 0.2, 0.4) at the tips. The processes labelled on the figure are D, DL, LGT and ILS (presumably duplication, duplication and loss, lateral gene transfer and incomplete lineage sorting).]
• D, DL: PHYLDOG
• DL+LGT: Szöllősi et al., PNAS
• ILS: Not yet
What is PHYLDOG?
• Program for the coestimation of species and gene trees at the genome scale
• Probabilistic model of sequence evolution + model of gene duplication and loss
• Statistical framework
• Branch-wise parameters of duplications and losses
• Gene families evolve independently of each other
• Based on a parallel architecture using MPI
Genome-scale coestimation of species and gene trees. Boussau et al., Genome Research, 2013, 23:323-330.
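Because gene families are assumed to evolve independently of each other, the overall log-likelihood can be written as a sum of per-family terms, and the families can be spread over several processes. Here is a minimal sketch of that idea in plain Python, without MPI. The function names, the round-robin assignment and the numbers are illustrative assumptions, not PHYLDOG's actual scheduling.

```python
from typing import Dict, List


def assign_families(families: List[str], n_workers: int) -> List[List[str]]:
    """Round-robin split of gene families over workers (hypothetical scheme)."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    chunks: List[List[str]] = [[] for _ in range(n_workers)]
    for i, fam in enumerate(families):
        chunks[i % n_workers].append(fam)
    return chunks


def total_log_likelihood(per_family: Dict[str, float]) -> float:
    """Independence across families: the log-likelihoods simply add up."""
    return sum(per_family.values())


families = ["family_1", "family_2", "family_3", "family_4", "family_5"]
chunks = assign_families(families, n_workers=2)
# Made-up per-family log-likelihoods, for illustration only.
logliks = {fam: -100.0 * (i + 1) for i, fam in enumerate(families)}
print(chunks)
print(total_log_likelihood(logliks))
```

In a real MPI run each worker would compute the terms for its own families and only the sums would be exchanged, which is why the independence assumption matters for scaling.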
Parallel architecture
Although it may not look like it, PHYLDOG infers rooted trees
Structure of the input data
Option files
family_X.option: options specific to gene family X (alignment file, substitution model options, gene tree search options)
GeneralOptions.txt: options concerning the species tree search, and options common to all gene families (species tree search options, duplication/loss model options, list of gene families)
Easy generation of basic option files using the prepareData.py script
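As a quick sanity check before a run, one can verify that the option files described above are all present. The sketch below assumes that the per-family files literally follow the family_X.option pattern, with X the family name. The helper name and the example family names are hypothetical.

```python
from typing import Iterable, List


def missing_option_files(families: Iterable[str], present: Iterable[str]) -> List[str]:
    """Return the option files expected for a run that are not in `present`.

    Assumes one GeneralOptions.txt plus one family_<name>.option per family.
    """
    expected = ["GeneralOptions.txt"] + [f"family_{name}.option" for name in families]
    have = set(present)
    return [f for f in expected if f not in have]


# Hypothetical directory listing with the option file for family 2 missing.
present = ["GeneralOptions.txt", "family_1.option", "family_3.option"]
print(missing_option_files(["1", "2", "3"], present))
```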
PHYLDOG tutorial
http://www.prabi.fr/redmine/projects/phyldogtoolt/wiki/Tutorial
• Installing PHYLDOG
• Downloading files
• Basic input files
• Generating all the option files using the prepareData.py script
• Running PHYLDOG
• Diminishing the number of species considered
• Diminishing the number of gene families considered
• Running PHYLDOG, at last
• Interpreting PHYLDOG's output
• Going further
Why our current pipeline can be improved
• Gene alignments:
– Error-prone
– Short
– Point estimates
• Gene trees:
– Based on alignments
– Point estimates
• Species trees:
– Based on gene trees
Simulations to test PHYLDOG
• 40 species
• Randomly pick duplication and loss rates per branch
• Complex model of sequence evolution
[Figure, shown in successive builds: a rooted organism tree, numbers of duplications and losses (labelled D1 to D6 and L1 to L6), and rooted gene trees. In the last build each of the three is marked with a "?" next to PHYLDOG, as the quantities PHYLDOG has to recover.]
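The first simulation step, picking duplication and loss rates at random for each branch of the organism tree, could look like the sketch below. The uniform distribution, the rate bound, the seed and the branch names are assumptions made for illustration. The slides do not specify how the rates were drawn.

```python
import random
from typing import Dict, List, Tuple


def draw_branch_rates(
    branches: List[str],
    max_rate: float = 0.5,
    seed: int = 42,
) -> Dict[str, Tuple[float, float]]:
    """Draw one (duplication, loss) rate pair per branch, uniformly in [0, max_rate]."""
    rng = random.Random(seed)  # local generator keeps the draw reproducible
    return {
        b: (rng.uniform(0.0, max_rate), rng.uniform(0.0, max_rate))
        for b in branches
    }


# Hypothetical branch names; a 40-species tree would have many more branches.
branches = [f"branch_{i}" for i in range(1, 7)]
rates = draw_branch_rates(branches)
for name, (dup, loss) in rates.items():
    print(f"{name}: duplication={dup:.3f} loss={loss:.3f}")
```

Gene trees would then be simulated along the organism tree under these branch-wise rates, and sequences along the gene trees.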
Simulations: PHYLDOG builds accurate gene trees
[Figure: tree error (axis from 0 to 20) for PhyML with the wrong model, PhyML with the correct model, and PHYLDOG with the wrong model.]
Simulations: PHYLDOG accurately recovers numbers of duplications and losses
[Figure: scatter plot of reconstructed numbers against expected numbers, for duplications and for losses, with both axes on a log scale from 1e−04 to 1e+00.]
Study of mammalian genomes
• Challenging but well-studied phylogeny
• 36 mammalian genomes available in Ensembl v. 57
• About 7000 gene families
• Correction for incomplete genomes
PHYLDOG finds a good species tree
[Figure: species tree inferred by PHYLDOG. Tip labels: Sus scrofa, Felis catus, Ornithorhynchus anatinus, Oryctolagus cuniculus, Loxodonta africana, Mus musculus, Gorilla gorilla, Dipodomys ordii, Monodelphis domestica, Vicugna pacos, Macaca mulatta, Tupaia belangeri, Procavia capensis, Spermophilus tridecemlineatus, Pongo pygmaeus, Tursiops truncatus, Microcebus murinus, Callithrix jacchus, Equus caballus, Erinaceus europaeus, Tarsius syrichta, Choloepus hoffmanni, Ochotona princeps, Cavia porcellus, Pan troglodytes, Bos taurus, Rattus norvegicus, Homo sapiens, Otolemur garnettii, Dasypus novemcinctus, Echinops telfairi, Pteropus vampyrus, Macropus eugenii, Canis familiaris, Sorex araneus, Myotis lucifugus. Clade labels: Laurasiatheria, Afrotheria, Xenarthra, Marsupials, Primates, Glires.]
Duptree finds a pretty good species tree
[Figure: species tree inferred by Duptree for the same 36 species, with the clades Marsupials, Afrotheria, Xenarthra, Laurasiatheria, Primates and Glires labelled.]
iGTP finds a different species tree
[Figure: species tree inferred by iGTP for the same 36 species, with the clades Marsupials, Afrotheria, Xenarthra, Laurasiatheria, Primates and Glires labelled.]
Assessing the quality of gene trees
• Two approaches:
1. Looking at ancestral genome sizes
2. Assessing how well one can recover ancestral syntenies using reconstructed gene trees (Bérard et al., Bioinformatics)
• Comparison between:
– PhyML (PhylomeDB and Homolens databases)
– TreeBeST (Ensembl-Compara database)
– PHYLDOG
1) Junk trees generate obesity
• Errors in gene tree reconstruction result in larger ancestral genomes
– Better algorithms should yield smaller ancestral genomes
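To see why spurious duplications inflate ancestral genomes, consider a simplified count in which the size of an ancestral genome is the number of gene-tree nodes that a reconciliation maps to that ancestor as speciation events, summed over families. This sketch and its toy data are illustrative assumptions, not the procedure used for the results shown in these slides.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# (species-tree node, event), with event in {"speciation", "duplication"}
Node = Tuple[str, str]


def ancestral_genome_sizes(families: Iterable[List[Node]]) -> Counter:
    """Count, per ancestral species, the gene copies implied by reconciled gene trees."""
    sizes: Counter = Counter()
    for nodes in families:
        for species, event in nodes:
            if event == "speciation":
                sizes[species] += 1
    return sizes


# Toy family reconciled from an accurate gene tree: one copy in the ancestor "anc".
accurate = [("anc", "speciation")]
# Same family from an erroneous gene tree: a spurious duplication mapped to "anc"
# implies two ancestral copies (compensated by losses further down the tree).
erroneous = [("anc", "duplication"), ("anc", "speciation"), ("anc", "speciation")]

print(ancestral_genome_sizes([accurate])["anc"])
print(ancestral_genome_sizes([erroneous])["anc"])
```

Summed over thousands of families, such extra copies add up, which is the rationale for using ancestral genome size as a proxy for gene tree quality.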
1) PHYLDOG fights genome obesity
[Figure: number of genes (axis from 5000 to 15000) in extant genomes and for PhyML, Ensembl-Compara (TreeBeST) and PHYLDOG.]
**: Student t-test p-value: 6.61e-06, Wilcoxon test p-value: 2.91e-11
[Figure: the mammalian species tree, with the same tip and clade labels as before, annotated with bar plots (axis from 0 to 10000) for PHYLDOG, TreeBeST and PhyML.]
2) Junk trees break synteny groups
• We use Deco (Bérard et al. Bioinformatics 2013) to reconstruct ancestral synteny groups using gene trees
• Errors in gene tree reconstruction break synteny groups
– Better algorithms should yield more genes in ancestral synteny groups
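One way to summarise how well synteny is recovered is the proportion of ancestral genes that have 0, 1, 2, ... adjacencies. The sketch below computes that distribution from a list of reconstructed adjacencies. The gene identifiers and the adjacency list are made up, and the function is not part of Deco.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def adjacency_distribution(
    genes: Iterable[str], adjacencies: List[Tuple[str, str]]
) -> Dict[int, float]:
    """Proportion of ancestral genes having k adjacencies, for each observed k."""
    degree = Counter({g: 0 for g in genes})  # genes without neighbours count as 0
    for a, b in adjacencies:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())
    total = len(degree)
    return {k: counts[k] / total for k in sorted(counts)}


# Toy ancestral genome: g1-g2-g3 form a synteny block, g4 is isolated.
genes = ["g1", "g2", "g3", "g4"]
adjacencies = [("g1", "g2"), ("g2", "g3")]
print(adjacency_distribution(genes, adjacencies))
```

With accurate gene trees one would expect most ancestral genes to have two adjacencies, as in a linear arrangement, and few to have none.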
2) Ancestral synteny says PHYLDOG gene trees are better
[Figure: proportion of ancestral genes in synteny groups (y axis, from 0.0 to 0.5) against number of adjacencies (x axis, from 0 to 7), for PHYLDOG, TreeBeST and PhyML.]
Bérard et al., Bioinformatics, 2013
An example gene family
[Figure: gene trees for one example family, as reconstructed by TreeBeST and by PHYLDOG, with scale marks 0.1 and 0.3. Tips are labelled with species names: Ornithorhynchus anatinus, Mus musculus, Cavia porcellus, Oryctolagus cuniculus, Canis familiaris, Bos taurus, Homo sapiens, Pongo pygmaeus, Equus caballus, Callithrix jacchus, Monodelphis domestica, Spermophilus tridecemlineatus.]
Perspectives
• Improvement of the algorithms to reconstruct gene trees (e.g. Magali Semeria)
• Improvement of the algorithms to reconstruct the species tree
• Dealing with ILS
• Joint reconstruction of gene trees and gene alignments
The Ancestrome project
• Reconstructing a species tree and gene trees for a large number of species
• Reconstructing ancestral gene contents
• Inferring ancient metabolisms and lifestyles
• Inferring ancient communities
New insights into the evolution of life on Earth, and into genomic evolution
Postdocs wanted!
Thank you!