Alberto M. R. DávilaEmail: [email protected]
Laboratório de Biologia Computacional e Sistemas
Instituto Oswaldo Cruz – FIOCRUZ, Rio de Janeiro, Brasil.
ProtozoaDB: comparative genomics approaches for the study of
protozoan species
Pathogenic protozoan species are commonly studied because they are a major
cause of mortality and morbidity in humans and domestic animals throughout the
world.
We have developed an automatic methodology to reconstruct the phylogenetic
species tree in Protozoa integrating different phylogenetic algorithms and
programs.
We demonstrate the utility of a supermatrix approach to construct phylogenomic
trees using 31 universal representative orthologs.
These universal orthologs are involved in metabolic pathways, thus might
potentially provide some extra information for the understanding of the
phylogenetic relationships and evolutionary processes of Protozoa.
Background
Illness caused by parasitic Protozoa are a major cause of diseaseworldwide, but because they are concentrated in low socio-economic parts of the world, they receive relatively little attentionfrom the pharmaceutical industry and major scientific fundingagencies.
Of the ten diseases targeted as research priorities by the WorldHealth Organization’s Special Program for Research and Trainingin Tropical Diseases (http://www.who.int/tdr) four are caused byprotozoan parasites (malaria, leishmaniasis, Chagas disease,African tripanosomiasis).
These diseases and other less dangerous ones (e.g. amoebiasisand trichomoniasis) are having an alarming increase ofrefractoriness cases to the main treatment. Treatment failure haspotentially a multifactorial origin with one of the major concernsbeing drug resistance.
Phylogenomics is being intensively used in the “Post GenomicEra”. Many different phylogenetic trees have been published on thebasis of different models of sequence evolution and utilization ofdifferent parameter settings and algorithms.
Introduction
Trypanosoma cruzi
Trypanosoma brucei
Leishmania major
Plasmodium falciparum
Although, the rRNA and othersingle genes have beenextremely valuable forphylogenetic studies but theyhave their limitations.
Chaudhary (2005) and Keeling(2005) showed results onavailable sequencing data.
Phylogenetic analysis definedfive supergroups of eukaryotes:(i) the plant and red/green algallineage;
(ii) a clade comprised ofanimals, fungi, slime molds andamebozoans;
and three groups that areentirely protozoan:
(iii) chromalveolates,
(iv) excavates and
(v) rhizaria.
Introduction
Universal Orthologs:
Ciccareli et al (2006): UO for tree of life
Serra et al (2008): Orthosearch
Science. 2006 Mar 3;311(5765):1283-7.
Selection and preparation of marker gene families
We based our methodology on Ciccarelli et al. (2006) study for the selection andconstruction of the species tree.
Thirty-one universal orthologous (UO) genes were downloaded in fasta format fromGenBank and then aligned using MAFTT (version 5.861)
Selection and preparation of marker gene families for protozoan species trees
Hidden Markov Models (HMM) profiles were constructed for the 31 UO (localdatabase) set and queried against all Protozoa sequences available in Refseq (RefSeq-release35.catalog-01/11/2009) and Genbank (NCBI-Flat File Release 172.0-06/15/2009).
HMM profiles were created (hmmbuild) and calibrated (hmmcalibrate) usingHMMER version 2.3.2 and searches done using hmmpfam with e-value “1e-5” as cut-off.
Those best hmmpfam hits of each of the 31 UO against protozoan genomes were usedto construct individual multiple alignments (Mafft v5.861).
Methods
Methodology 1 (M1):
Those individual alignments were concatenated, then resulting in a globalsupermatrix of 21,260 positions in a total of 74 species.
This supermatrix was used to generate a tree with PHYML using 100 bootstrapreplicates and the JTT matrix.
Methodology 2 (M2):
Those individual alignments were trimmed using TrimAl v1.2 aiming toeliminate the most variable regions of each of them. Those remaining conservedblocks were then concatenated, resulting in a global supermatrix of 12,807positions in a total of 74 species.
This supermatrix was used to generate a tree with PHYML using 100 bootstrapreplicates and the JTT matrix.
Methods
Complete
Gblocks
TrimAI
Complete
TrimAI
Concatenated genes aligned
Supermatrix
Supertree
Individual genes aligned
Figure 2. Algorithm for Supermatrix versus Supertree construction.
We reconstructed the protozoan species tree using several automated bioinformaticstools: downloaded 31 universal UO from GenBank, built HMM profiles with them,searched those profiles in protozoan genomes (43 species) and related species (31species) showing different level of completeness/coverage.
The species tree was formed by 3 major clades that were related to 3 groups of data:
(i) 26 species composed of at least 80% of UO (25/31) called G1,
(ii) 12 species composed of 50-79% (15-24/31) of UO called G2 and
(iii) 36 species composed of less than 50% (1-14/31) of UO called G3.
Results
Results
Figure 1. Phylogenomic supermatrix radiation tree of protozoan using masked sequences with TrimAl.
Was used a supermatrix of 12,807 positions in the total 74 protozoan species. Maximum likelihood tree was
constructed with Phyml, JTT as evolutionary model and bootstrap 100. (C1 is red, C2 blue and C3 black).
C1: 26 sp - at least 80% of UO
C2: 12 sp - between 50-79% of UO
C3: 36 sp - less than 50% of UOE.h
isto
lytic
a
E. dis
par
T. va
gin
alis
C
. hom
inis
C. parv
um
G
. la
mblia
L. in
fantu
m
L. m
ajo
r
L. bra
zili
ensis
L. enriettii
T. cr
uzi
T. bru
cei
P. t
etra
urel
iaP. c
auda
tum
T. therm
ophila
M. b
revic
ollis
P. viva
xP. k
nwolesiP. fa
lciparumP. yoeliiP. berghei
P. chaubadi
T. parva
T. annulata
B. bovis
H. andersenii B. natans
D. fasciculatum
P. pallidum L. amazonensis
D. citrinum
D. discoideum
T. pyriformis
T. malaccensis
T. pigm
entosa
T. p
ara
vora
x
P. a
ure
liaC
. roenberg
ensis
A. h
ealy
iL. d
igita
taP
. littora
lis
F. v
esic
ulo
susD
. virid
is
L. d
onovani
D. d
ichoto
ma
A. poly
phaga
A. caste
llanii
N. gru
beri
T. gondii
E. te
nella
P. yezoensis
P.p
urp
ure
a
G. te
nuis
tipita
taG
. the
ta
R. s
alin
a
C. para
doxa
E. huxleyi
O. sinensis
T. pseudonana
P. tricornutumH. akashiwoC. merolae
C. caldariumE. longa
E. gracilisM. jakobiformis
R. americana
C. crispus
P. ramorum
P. so
jae
P. in
festa
ns
S. fe
rax
Csy
O. d
anic
a
0.2
C1 was composed by only protozoan species, C2 was composed by rodophyta,cryptophyta, glaucocystophyceae, stramenopiles species, and C3 was composed bysome species of C1 and C2.
C1 presented monophyly of Protozoa, C2 presented monophyly of species related toProtozoa, and C3 polytomy of some Protozoa and related species. The more UOincluded in the supermatrix the better was the tree robustness inside the 3 clades.
Figure 2 shows the comparison of trees using no masked (M1) and TrimAl masked(M2) sequences. Both methodologies showed very similar topologies, however thetree utilizing TrimAl masked sequences showed higher bootstrap values. Figure 1 isthe same tree using TrimAl masked sequences but presented as radiation tree.
Results
Results
Figure 2. Phylogenomic supermatrix trees of protozoan using total sequences and conserved blocks (TrimAl)
sequences. Were used supermatrixes of 12,807 and 21,260 positions respectively in 74 species. Maximum likelihood
tree was constructed with Phyml, JTT as evolutionary model and bootstrap 100. (C1 is red, C2 blue and C3 black).
Edi Ehi Entamoeba
Trichomonas Tva Cho Cpa Cryptosporidium
Giardia Gla Lin Lma Lbr
Leishmania
Len Tcr Tbr Trypanosoma
Pca Pte Paramecium
Tetrahymena TthMonosiga Mbr
Pvi Pkn Pfa
Pyo Pbe
Pch
Plasmodium
Tpa Tan Theileria
Babesia BboHemiselmis HanBigelowiella Bna
Dfa Ppa
Lam Dci
Dictyostelium Ddi Ahe
Cro Pau Tpy Tma Tpi Tpar
Pli Ldi Fve
Dvi Ldo
Ddic Apo
Aca Ngr
Tgo Ete Pye
Ppu PorphyraGracilaria GteGuillardia GthRhodomonas RsaCyanophora CparEmiliana Ehu
TpsOdontella OsiPhaeodactylum PtrHeterosigma HakCyanidioschyzon CmeCyanidium Cca
Elo Egr
Mja Ram
Cri Pin
Pso Pra
Sfe Csy Oda
100100
100
76
7489
100100
98
100
27
82
22
13
81
49
14
34
100100
100
33
20
57
64 98
52
24
10061
1007
20
19
7730
3664
97
18
10
4026
4721
26
19
24
100100
100100
100
100
100
87
23
25 66
43
22
8510043
37
46
34
29100
3544
100
0.2
Edi (30) Ehi (30) Entamoeba
Trichomonas Tva (30) Cho (29) Cpa (30) Cryptosporidium
Giardia Gla (30) Lin (31) Lma (29) Lbr (31)
Leishmania
Len (2) Tcr (30) Tbr (31) Trypanosoma
Pca (1) Pte (31) Paramecium
Tetrahymena Tth (28)Monosiga Mbr (31)
Pvi (30) Pkn (31) Pfa (31)
Pyo (30) Pbe (31)
Pch (30)
Plasmodium
Tpa (31) Tan (30) Theileria
Babesia Bbo (30)Hemiselmis Han (22)Bigelowiella Bna (25)
Dfa (7) Ppa (7)
Lam (1) Dci (6)
Dictyostelium Ddi (31) Ahe (1)
Cro (4) Pau (2) Tpy (2) Tma (2) Tpi (2) Tpar (2)
Pli (8) Ldi (7) Fve (6)
Dvi (6) Ldo (2)
Ddic (7) Apo (1) Aca (4)
Ngr (3) Tgo (6)
Ete (6) Pye (19)
Ppu (20) PorphyraGracilaria Gte (22)Guillardia Gth (25)Rhodomonas Rsa (18)Cyanophora Cpar (17)Emiliana Ehu (17)
Tps (20)Odontella Osi (20)Phaeodactylum Ptr (20)Heterosigma Hak (18)Cyanidioschyzon Cme (19)Cyanidium Cca (18)
Elo (6) Egr (10)
Mja (10) Ram (12)
Cri (1) Pin (7)
Pso (6) Pra (6)
Sfe (7) Csy (2) Oda (3)
100100
100
83
8290
99100
100
100
27
73
15
11
73
52
21
33
95100
100
29
19
54
69 100
56
22
10053
1007
17
11
6623
3675
99
21
7
4928
2418
27
12
13
100100
100100
100
100
100
87
23
39 81
62
41
6310051
45
61
53
33100
3540
100
0.2
Supermatrix tree built
with total sequences
Supermatrix tree built
with conserved blocks
Edi
EhiEntamoeba
Monosiga Mbr
Dictyostelium Ddi
Tetrahymena Tth
Paramecium Pte
Lin
Lma
Lbr
Leishmania
Tcr
TbrTrypanosoma
Trichomonas Tva
Cho
CpaCryptosporidium
Giardia Gla
Tpa
TanTheileria
Babesia Bbo
Pvi
Pkn
Pfa
Pyo
Pbe
Pch
Plasmodium
Bigelowiella Bna
Guillardia Gth
93
100100
100
84
100
100
100
100
100
72
100
87
61
100
100
49
100100
10060
100
100
0.2
Most variable regions trimmed (>80 UO): Clade 1 (zoom)
We have presented a phylogenomic-based overview for Protozoa. Relationshipsbetween protozoan groups are in agreement with previous studies, showingmonophyly for protozoan.
On the other hand, phylogenetic information inferred from C3 is questionable due toincomplete genomes, suggesting that using less than 15 universal UO forphylogenomic reconstruction is not reliable.
The inclusion of more data/genes is necessary to obtain a robust tree.
Our phylogenomic-based methodology using a supermatrix approach proved to bereliable with protozoan genome data, suggesting that (a) more universal UO used thebetter, and (b) using the entire UO sequence or just a conserved block of it producesimilar reliable results.
Finally, we need to investigate further if this methodology could be extrapolated orreproduced to other taxonomic groups.
Conclusion
ProtozoaDBhttp://protozoadb.biowebdb.org
Sistema baseado na web para a execução do pipeline de genotipagem e
visualização dos resultados.
Página de Internet construída para a execução do pipeline de genotipagem via web, onde é possível
optar em se utilizar o pipeline de desenho de iniciadores, o pipeline de anotação de sequências
MethodologyDNA extraction
PCR using 33 pair of designed primers
Agarose gel visualization
PCR product purification
Sequencing
Annotation
Phylogenetic analyses
L. braziliensis
L. infantum
L. major
T.brucei
T.cruzi
P. falciparum
P. vivax
L. braziliensis
L. infantum
L. major
T.brucei
T.cruzi
P. falciparum
P. vivax
Alinhamento dos Iniciadores COG0092F e COG0092R
no gene da Proteína Ribossomal S3Forward primer (COG0092F) in the multiple alignment
Reverse primer (COG0092F) in the multiple alignment
Genotyping system visualization (multiple alignment) using
Jalview
Phylogenetic tree inferred with
Maximum Likelihood for
KOG0400 (Ribosomal protein
S15P/S13E). 1000 replications
(bootstrap). Sequences from
genbank and generated with our
primers are shown.
Methodology: short version
380 sequences from 36 UO220 sequences validated,
corresponding to 20 UO
33 pair of primers designed
8 pair of primers with
good potential for
multiplex PCR
Sequencing of 4 genes
Genotyping:
2 genes showed good
potential/variability
Genotyping:
2 genes showed low
variability
OrthoSearch Workflow (Orthologs inference)
COG/KOG
DBMAFFT hmmbuild
fastacmdformatdb
hmmsearch
hmmcalibrate
Ptn
DB
Reciprocals
Best Hits Finder
InterPROReannotated
sequences
hmmpfam
HMMER
BLAST
ConceptualPipeline
OrthoSearch results
Protozoan Orthologs
(OrthoMCL-based)
* Some can be protozoa-specific
GRID: http://grid.biowebdb.org
http://arpa.biowebdb.org
http://stingray.biowebdb.org
Collaborators
Marta Mattoso
M. Luiza Machado Campos
Edmundo C. Grisard
“Yoko” Cavalcanti
Students
Sérgio Serra
Kary Soriano Ocaña
Fabio Coutinho
Edgar Sarmiento
Diogo Tschoeke
Rafael Cuadrat
Daniel Loureiro
Rodrigo Jardim
Luisa Pitaluga
Funding
CNPq
FINEP
CAPES
FAPERJ
European Union (EELA-2)
Recent collaborators:
Elisa Cupolillo
Claudia D. Levy
Top Related