Comparative and evolutionary analysis of genomes from Rickettsia-related endosymbionts

B. Franz Lang, Tom Doak, Michael Lynch, Hans-Dieter Görtz, Henner Brinkmann, Hervé Philippe and G. Burger


Page 1: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Comparative and evolutionary analysis of genomes from Rickettsia-related endosymbionts

B. Franz Lang

Tom Doak, Michael Lynch, Hans-Dieter Görtz, Henner Brinkmann, Hervé Philippe and G. Burger

Page 2: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

My principal interest is in mitochondria: their genes, proteins, and functions, and where they come from. This implies that I am most interested in the bacteria whose ancestors gave rise to mitochondria. Mitochondria, like bacterial endosymbionts, undergo never-ending and very rapid evolutionary change. Life isn't static; it evolves, and so do pathways. Hence my interest in how pathways change, get damaged and repaired (sometimes by adopting alien genes), and are sometimes eliminated over evolutionary time. Following the evolution of Rickettsia-related endosymbionts provides a model for what occurred when mitochondria entered the eukaryotic cell. The basis for inferences is a set of broadly sampled, perfect genome sequences and annotations, used to map the evolution of pathways onto a phylogenetic tree (which has to be correct with high confidence).

As you will notice, this is a massive undertaking, and much of what I talk about is work in progress.

Page 3: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Outline of presentation

In the following, I will discuss:

• what Rickettsia-like bacteria are (history)
• why certain of them are more interesting than others
• a conceptual view of mitochondrial and eukaryotic origins
• phylogenetic concepts: how to infer biologically meaningful (correct) phylogenies
• results from phylogenomic analyses to locate mitochondrial and rickettsial origins
• annotating genes and assigning EC numbers with AutoFact
• some results on the highly reduced Holospora
• using pathway hole inference to update annotations, and the continuing problem of genome sequence quality

Page 4: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Rickettsia, Wolbachia, Ehrlichia, Orientia …

history

Rickettsia-like bacteria (including Wolbachia, Ehrlichia, Orientia …) are well-known obligate, intracellular pathogens of animals, that undergo progressive, reductive genome evolution. Instead of producing all metabolites by themselves, they take some from their host, continuously inventing new transporters (even for stealing ATP). In turn, we suspect that they produce certain components (e.g., biotin) that are shared with the host, creating some sort of perverse dependence on the intruder.

Page 5: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Rickettsia, Wolbachia, Ehrlichia, Orientia … history

Investigating their genetics and related functional pathways at a truly biochemical level is technically most difficult; there is no bacterial model that allows effective lab work. Inferences are almost all paper biochemistry based on genome sequences (in the future foreseeably including transcriptome data). Unfortunately, once introduced, erroneous interpretations and annotations are copied and perpetuated.

Page 6: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Holospora, Caedibacter, Ichthyophthirius … history

The search for Rickettsia-like models that are more easily investigated has led to endosymbionts of unicellular eukaryotes (ciliates) such as Holospora and Caedibacter. Other, more recent additions are the fish pathogen Ichthyophthirius, causing ‘sudden, catastrophic death of aquarium fish’, and an endosymbiont of Stachyamoeba (an amoeboid, excavate protist) that we found and sequenced just two weeks ago.

Page 7: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Holospora, Caedibacter, Ichthyophthirius … history

The molecular basis of cellular infection by Holospora has been intensely studied in the H.-D. Görtz and M. Fujishima labs, thus our interest in sequencing its genome. Holospora invades ciliates via the food vacuole, escapes into the cytoplasm, and enters one of the nuclei (micro- or macronucleus), where it propagates.

This project was interesting to us because of the potential of comparative genome analyses among several known endosymbionts and their phylogenomic analysis, in conjunction with mitochondria (i.e., identification of mitochondrial origins).

A partial Holospora genome sequence is analyzed in Lang et al. (2005) Jpn. J. Protozool. 38: 171-181

Page 8: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Holospora genome – history

Phylogenetic position of Holospora at the base of the Rickettsia/Ehrlichia/Wolbachia cluster of animal endosymbionts; together at the base (outside) of the α-Proteobacteria.

Yet: a long branch attraction artifact may move them towards the distant outgroup?

Page 9: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Holospora genome – history

In this paper, we reached the conclusion that the Holospora genome is much more derived than those of its endosymbiont neighbors. At a bit more than 1 Mbp, it has lost a number of cellular functions and pathways, including oxidative phosphorylation, along with most of the key genes used for inferring the evolutionary origin of mitochondria.

It further contains a high number of insertion elements, which makes genome assembly most difficult.

Conclusion: finish genome sequence, but find Holospora relatives that are minimally derived and slowly evolving.

Page 10: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

History end – new start.

In the end, for lack of funds, the Holospora genome remained incomplete until, more recently, the M. Lynch group came to our rescue. They are now very close to completing Holospora obtusa (~1.4 Mbp, linear), and are getting close with Caedibacter caryophila as well. Likewise, we are about to complete the genome sequence of the Stachyamoeba endosymbiont (~1.8 Mbp).

Page 11: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

On the origin of mitochondria and bacterial endosymbionts

The symbiotic introduction of mitochondria is a key event in eukaryotic evolution – a sizeable contribution of genetic material (~10% or more, species depending), essential for understanding the nature of the eukaryotic cell.

It occurred a billion or more years ago, thus phylogenetic inferences aimed at resolving eukaryotic origins are exceedingly difficult. The origin of mitochondria and Rickettsia-like bacteria is somewhere close to (but not within) α-Proteobacteria.

Yet, our insights remain plagued by phylogenetic artifacts --- published analyses are poor if not misleading. Genome sequences from diverse bacterial species (among them Holospora and Caedibacter) are the most promising way to overcome the current impasse.

Page 12: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

On the origin of mitochondria and bacterial endosymbionts

Obtaining statistically significant and biologically meaningful results requires broad taxon sampling and data from preferentially slowly evolving species:

• minimally derived mtDNAs plus nuclear genomes from, for instance, relatives of jakobid flagellates (e.g., Reclinomonas americana), and

• a large variety of genomes from free-living α-Proteobacteria and endosymbionts close to mitochondria.

Page 13: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

What are jakobids? (e.g., Reclinomonas americana and Andalucia godoyi)

Page 14: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Why R. americana and A. godoyi?

A nuclear genome project is ongoing for Reclinomonas and in preparation for Andalucia.

Among jakobids, they have the slowest-evolving mt sequences, and Andalucia even has a few more mt genes than the previous record set by Reclinomonas.

Page 15: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

How to infer biologically meaningful (correct) phylogenies

To obtain statistically significant and biologically meaningful results, use the most realistic phylogenetic models (CAT …), which are ideally derived from and adapted to the data to be analyzed (CAT+GTR).

This requires a lot of sequence data: multiple gene or protein sequences.

Page 16: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

What is CAT/PhyloBayes?

We know that many a.a. sequence positions have specific profiles that do not fit global evolutionary models such as WAG.

Example site profiles: A/S, A/S/T, A/P, A/N/Q

Page 17: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

CAT (PhyloBayes) models this site-wise heterogeneity. Its use increases phylogenetic signal and reduces the impact of artifacts (e.g., LBA). Even better, CAT + GTR infers profiles from the data (yet, very slow …)
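To make the notion of a site-specific profile concrete, here is a minimal sketch that tabulates which amino acids occur at each column of an alignment. It is not the CAT model itself and not PhyloBayes code; it only shows the kind of per-site composition that such models are designed to capture. The function name `site_profiles` and the toy alignment are made up for illustration.

```python
from collections import Counter

def site_profiles(alignment, gap_chars="-X?"):
    """Return, for each alignment column, the observed amino-acid frequencies."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profiles = []
    for col in range(length):
        # Count residues in this column, ignoring gaps and unknowns
        counts = Counter(seq[col] for seq in alignment if seq[col] not in gap_chars)
        total = sum(counts.values())
        profiles.append({aa: n / total for aa, n in counts.items()} if total else {})
    return profiles

# Toy alignment (hypothetical data, four sequences, four columns)
toy_alignment = ["ASAN", "SSPQ", "ATAA", "SAPN"]
toy_profiles = site_profiles(toy_alignment)
for position, profile in enumerate(toy_profiles, start=1):
    print(position, "/".join(sorted(profile)))
```

Each column of the toy alignment only ever shows a small subset of the 20 amino acids, which is the pattern that a single global exchange matrix describes poorly.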

Page 18: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Value of mitochondrial genes in phylogenetics

Extant eukaryotic lineages all have (or sometimes had) mitochondria, a parallel genetic universe with distinct phylogenetic markers – providing a comparative view and confirmation of nuclear gene phylogenies back to the time point where the mitochondrial endosymbiont was introduced.

Moreover, it allows the identification of known bacterial relatives.

Although the bacterium-derived mitochondrial genome is small (13 to 30 protein genes), nuclear genes of clearly α-proteobacterial origin and with evidently mitochondrial function may be added (~3,300 → >10,000 a.a.).

Problem: nuclear genomes are hybrid monsters containing genes transferred from organelles, and more!

Page 19: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

But nuclear genomes are wildly mosaic!

- Organelle genomes undergo massive gene loss, plus transfer to the nucleus.
- Nuclear genomes therefore include proteobacterial (or cyanobacterial) genes.
- In addition, nuclear genes may be acquired by lateral transfer, from various sources.

The challenge: incongruent gene/genome/species phylogenies, often difficult to identify and resolve.

Page 20: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Eukaryote-eukaryote endosymbiosis further increases genomic mosaicism

[Figure from Keeling et al. 2004]

Page 21: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Are we misled by eukaryote-eukaryote endosymbioses?

Almost unavoidably!

Phylogenies including data from stramenopiles, haptophytes, cryptophytes, chlorarachniophytes, … any secondary-plastid-containing group of species are a priori suspect; definitively so when phylogenies with plastid genes (including the nucleus-encoded ones) differ from other nuclear and mitochondrial gene trees.

For the planned analysis, we will therefore use only mitochondrial and nuclear genes from jakobids, without known photosynthetic members, and with the highest number of mtDNA-encoded (i.e., unproblematic) genes.

Page 22: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

(I) Origin of mitochondria from within Proteobacteria

As a start, we have analyzed the genomes of all > 500 Proteobacteria at GenBank (i) to check whether the bacterial textbook topology (based on rRNA data) is reproduced (it is), and (ii) to confidently identify and exclude genes that tend to be laterally transferred or are plagued by paralogy.

New unpublished data: Holospora, Caedibacter, Stachyamoeba-endo

About half of the analyzed genes are totally unproblematic; transporters are virtually always questionable, as are many of the tRNA synthetases.

Trees with paralogs and transferred genes removed are almost identical to trees with all proteins included; i.e., contrary to the belief of some, phylogenetic issues are minor even when genes with occasional transfers are not removed.

Page 23: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts
Page 24: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Phylogenomic analysis, α-Proteobacteria plus mitochondria.

Dataset with 10,800 aligned a.a. positions, except for Holospora, which has about half; PhyloBayes analysis (CAT, GTR).

Endosymbionts + mitochondria branch together, but outside α-Proteobacteria.

Strong potential for an LBA artifact of these fast-evolving species (only exception Caedibacter) attracting them (i) together and (ii) to the distant outgroup.

What happens when all fast-evolving species are removed?

Page 25: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Phylogenomic analysis, α-Proteobacteria plus mitochondria.

What happens when all fast-evolving species are removed?

Caedibacter and Stachyamoeba-endo now clearly branch within Rhodospirillales (confirmed with an independent dataset w/o the genes used here)

By inference, endosymbionts plus mitochondria potentially derive from within the Rhodospirillum/Magnetospirillum clade.

But beware of more LBA artifacts, e.g. mitochondria – Rickettsias!

Page 26: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Phylogenomic analysis, α-Proteobacteria plus mitochondria – what next?

• Include more sequences from slowly-evolving relatives of Caedibacter and many more free-living Rhodospirillales.

• apply better phylogenetic models, adapted to A+T rich and fast-evolving genomes.

• eliminate fast-evolving (or heterotachous) sequence positions, which requires a much larger dataset (20-30,000 a.a.) -- to compensate for loss of sequence information.
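The position-filtering idea in the last point can be pictured with a deliberately crude sketch: score each alignment column by the number of distinct residues it contains and drop the most variable ones. This is an assumption made for illustration only; real analyses would estimate site rates under a phylogenetic model rather than count states, and the threshold `max_states`, the function name, and the toy data are hypothetical.

```python
def drop_variable_columns(alignment, max_states=2, gap_chars="-X?"):
    """Keep only columns with at most `max_states` distinct residues.

    `alignment` maps taxon name -> aligned sequence (all the same length).
    Returns the filtered alignment and the indices of the retained columns.
    """
    names = list(alignment)
    if not names:
        return {}, []
    length = len(alignment[names[0]])
    keep = []
    for col in range(length):
        # Distinct residues in this column, ignoring gaps and unknowns
        states = {alignment[name][col] for name in names} - set(gap_chars)
        if len(states) <= max_states:
            keep.append(col)
    filtered = {name: "".join(alignment[name][c] for c in keep) for name in names}
    return filtered, keep

# Toy alignment (hypothetical): column 3 (0-based) varies in every taxon
toy = {
    "taxonA": "MKVLA",
    "taxonB": "MRVIA",
    "taxonC": "MKVQA",
    "taxonD": "MKVTA",
}
filtered, kept_columns = drop_variable_columns(toy, max_states=2)
print(kept_columns)
print(filtered)
```

The sketch also shows why a larger starting dataset is needed: every column removed shortens the alignment that remains for tree inference.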

Page 27: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

(II) Analyzing genes and metabolic pathways

• initial prediction of protein-coding genes (e.g., Glimmer, or simply conceptual ORFs)

• re-annotation with AutoFact (BLAST against several reference databases such as UniRef, KEGG, COG, Pfam and SMART, optimized by scoring; an HMM profile search instead of BLAST would be better and is under development)

• To gain sensitivity and be more certain in picking orthologs, it would be even better to combine AutoFact with comparative bacterial genome annotation that uses synteny information (e.g., Mage at Genoscope, currently under exploration)

Page 28: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Analyzing genes and metabolic pathways

• extract relevant data from AutoFact as food for pathway-tools (assign E.C. numbers)

• infer pathways, initial round
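As a rough sketch of the extraction step, the snippet below pulls EC numbers out of annotation lines and groups them by gene. The input layout (gene identifier, a tab, then a free-text description with EC numbers written as `EC:x.x.x.x` or `EC x.x.x.x`) is an assumption made for this example and is not the actual AutoFact output format; likewise, the resulting dictionary is only a stand-in for whatever input file the pathway software expects.

```python
import re

# Matches complete and partial EC numbers, e.g. 2.8.1.6 or 2.6.1.-
EC_PATTERN = re.compile(r"\bEC[ :]?(\d+\.(?:\d+|-)\.(?:\d+|-)\.(?:\d+|-))")

def ec_numbers_by_gene(lines):
    """Map gene id -> sorted list of EC numbers found in its description."""
    result = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        gene_id, _, description = line.partition("\t")
        found = EC_PATTERN.findall(description)
        if found:
            result[gene_id] = sorted(set(found))
    return result

# Hypothetical annotation lines (layout assumed, gene ids made up)
toy_lines = [
    "# gene_id\tdescription",
    "orf001\tbiotin synthase [EC:2.8.1.6]",
    "orf002\thypothetical protein",
    "orf003\tdethiobiotin synthetase (EC 6.3.3.3)",
]
toy_ecs = ec_numbers_by_gene(toy_lines)
print(toy_ecs)
```

Genes without an EC number are simply left out, so they can be listed separately and revisited in the second round.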

Page 29: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Analyzing genes and metabolic pathways (Example of AutoFact result, with EC number from KEGG)

Page 30: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Analyzing genes and metabolic pathways (Example of database collection including Holospora and Ich)

Page 31: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Analyzing genes and metabolic pathways (Example of Holospora pathway overview graph; mousing over objects provides details)

Page 32: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Analyzing genes and metabolic pathways (Example of Holospora biotin synthesis I pathway)

Page 33: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Analyzing genes and metabolic pathways, second round

• check for pathway holes; if incomplete recheck presence of respective genes in genome (HMM profile searches for highest sensitivity)

• apply pathway comparisons among species to identify other potential inconsistencies; search for the missing genes in the genome sequence

Using manual curation for this step would be overwhelming. We therefore work on scripting and automation, using HMM profile searches. For this we need to build models from proteins of closely related reference bacteria (preferentially Rhodospirillales) – thus the importance of knowing phylogenetic relationships and origins.
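The hole check itself is simple set arithmetic once each pathway is reduced to the EC numbers it requires. The following is a minimal sketch under that assumption; the pathway names and EC numbers are placeholders and not results, and the rule of reporting only partially present pathways is one possible choice rather than the definition used by any particular tool.

```python
def pathway_holes(pathways, annotated_ecs):
    """Return missing EC numbers for pathways that are only partially present.

    `pathways` maps pathway name -> iterable of required EC numbers.
    Pathways with no annotated step at all are treated as absent, not as holes.
    """
    annotated = set(annotated_ecs)
    holes = {}
    for name, required in pathways.items():
        required = set(required)
        missing = sorted(required - annotated)
        present = len(required) - len(missing)
        if missing and present > 0:
            holes[name] = missing
    return holes

# Placeholder pathways and annotations (made up for illustration)
toy_pathways = {
    "toy pathway A": ["1.1.1.1", "2.2.2.2", "3.3.3.3"],
    "toy pathway B": ["4.4.4.4", "5.5.5.5"],
    "toy pathway C": ["6.6.6.6"],
}
toy_annotated = {"1.1.1.1", "3.3.3.3", "6.6.6.6"}
toy_holes = pathway_holes(toy_pathways, toy_annotated)
print(toy_holes)
```

Each reported EC number then becomes a target for a sensitive HMM profile search against the genome, which is where frameshifted or overlooked genes tend to turn up.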

Page 34: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Analyzing genes and metabolic pathways … optimizing HMM profile models

For HMM profiles, one needs to start with a multiple alignment (e.g., from Muscle). We optimize this alignment with iterated rounds of HMMalign (criterion: best E-value), and then eliminate sequences that are too closely related, based on a phylogenetic distance matrix, which in the end further improves the sensitivity of the resulting HMM model.

This approach works best when many sequences are available, thus the urgent need for more Rhodospirillales.
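The thinning step can be sketched as a greedy filter over a pairwise distance matrix: a sequence is kept only if it is at least `min_dist` away from every sequence already kept. Both the greedy strategy and the threshold value are assumptions made for this example; the distances and sequence names are invented.

```python
def thin_by_distance(names, dist, min_dist=0.05):
    """Greedily keep sequences that are at least `min_dist` from all kept ones.

    `dist` maps frozenset({name1, name2}) -> pairwise distance.
    The result depends on the order of `names` (earlier entries win).
    """
    kept = []
    for name in names:
        if all(dist[frozenset((name, other))] >= min_dist for other in kept):
            kept.append(name)
    return kept

# Hypothetical distance matrix: s1/s2 and s3/s4 are near-duplicates
toy_names = ["s1", "s2", "s3", "s4"]
toy_dist = {
    frozenset(("s1", "s2")): 0.01,
    frozenset(("s1", "s3")): 0.30,
    frozenset(("s1", "s4")): 0.40,
    frozenset(("s2", "s3")): 0.31,
    frozenset(("s2", "s4")): 0.41,
    frozenset(("s3", "s4")): 0.02,
}
toy_kept = thin_by_distance(toy_names, toy_dist, min_dist=0.05)
print(toy_kept)
```

Because the outcome depends on input order, one might sort `names` so that preferred reference sequences come first.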

Page 35: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Continuing problems with sequence quality

When going through the process of finding missing genes, we noted that some have simply been missed, and that others contain frameshifts and were not considered.

Frameshifts might indicate that a species is on its way to dropping a function or a whole pathway, or that there is a sequencing error. Such errors were common with early Sanger technology and now resurge with pyrosequencing. For instance, in our current 454 project on Stachy-endo we have many potential errors in homopolymer stretches, and these are not flagged at all. A potential solution is to add Illumina sequences for error correction.
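Since homopolymer stretches are where such errors concentrate, a first pass could simply flag them for manual review or for checking against a second data type. The sketch below does this with a regular expression; the minimum run length of 5 is an arbitrary assumption, and the sequence is a made-up example rather than data from the project.

```python
import re

def homopolymer_runs(seq, min_len=5):
    """Return (start, length, base) for each run of one base with length >= min_len.

    Start positions are 0-based. Only A, C, G and T are considered.
    """
    if min_len < 2:
        raise ValueError("min_len must be at least 2")
    pattern = re.compile(r"([ACGT])\1{%d,}" % (min_len - 1))
    return [
        (m.start(), m.end() - m.start(), m.group(1))
        for m in pattern.finditer(seq.upper())
    ]

# Made-up sequence with a run of six A's and a run of five T's
toy_seq = "ATG" + "A" * 6 + "CG" + "T" * 5 + "GC"
toy_runs = homopolymer_runs(toy_seq, min_len=5)
print(toy_runs)
```

A frameshift that falls inside one of the flagged runs is a more likely candidate for sequencing error than for genuine gene decay.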

Page 36: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Conclusions

The Caedibacter/Holospora group of bacterial endosymbionts diverges from within Rhodospirillales, a deep divergence in α-Proteobacteria.

Caedibacter/Holospora diverge prior to mitochondria and the Rickettsia/Wolbachia/Ehrlichia (RWE) group of pathogens with the new Stachyamoeba-endo as its most slowly evolving member.

Mitochondria appear to be a sister group to RWE endosymbionts. Yet, this topology may be caused by a phylogenetic LBA artifact (?).

Holospora is highly derived and fast-evolving. It specifically lost oxidative phosphorylation but, curiously, retained both complete alternative pathways for biotin synthesis (a means for host dependence?).

Our results indicate a need of genome projects for broadly sampled relatives of Caedibacter and other slowly-evolving endosymbionts, and more free-living Rhodospirillales, to better resolve evolutionary relationships and the evolution of metabolic pathways.

Page 37: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Lab members and collaborators

Michael Lynch, Tom Doak

Gertraud Burger, Lise Forget (Montreal)

Henner Brinkmann, Hervé Philippe (Montreal)

Andrew Roger, Alastair Simpson, Mike Gray (Halifax)

Iñaki Ruiz-Trillo (Barcelona)

… numerous others unnamed …

Page 38: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Thanks !

This work was possible thanks to generous and long-standing financial support by the

Canadian Institutes of Health Research (CIHR)

Canadian Institute for Advanced Research (CIfAR)

Canadian Research Chair Program

Genome Quebec/Atlantic/Canada

Eusko Jaurlaritza / Gobierno Vasco (Basque Government)

EST data: TBestDB

Page 39: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Thanks also

to the National Human Genome Research Institute (NHGRI/NIH) for endorsing a multi-taxon genome sequencing initiative to gain insights into how multicellularity evolved. This initiative, the UNICellular Opisthokont Research iNitiative ('UNICORN'), will generate genomic data from some unicellular relatives of both animals and fungi.

G. Burger, M.W. Gray, P.W. Holland, N. King, B.F. Lang, A.J. Roger, I. Ruiz-Trillo

For more information see Ruiz-Trillo et al., Trends in Genetics 23 (2007).

To use these data for analyses at the genomic level, please contact members of the UNICORN project, either for collaboration or for approval of use.

Page 40: Comparative and evolutionary analysis of genomes from  Rickettsia -related endosymbionts

Status of genome projects

In sequencing pipeline or close to finished:

Allomyces, Spizellomyces, Mortierella, Amoebidium, Sphaeroforma, Capsaspora, Amastigomonas, Proterospongia, Reclinomonas, Ministeria

DNA purification phase:

Andalucia, Malawimonas