Microbial Evolution, Zoology/Anthro/Botany 410, Nicole T. Perna, April 24, 2014.

Post on 20-Jan-2016

[Figure: phylogenetic tree. Only the tip labels survive; they are drawn from the genomes ECH3937, SCRI1043, PlTTO1, YP91001, YPCO92, YPKIM, MG1655, EcoRIM, EDL933, CFT073, Sfl301, Sfl2457T, SenLT2, SenCT18 and SenTy2.]

Microbial Evolution

Zoology/Anthro/Botany 410

Nicole T. Perna

April 24, 2014

A couple of key facts

• Prokaryotes have been around a long time (2.5-3.5 GYA). Bacteria and Archaea diverged a very long time ago and are not more closely related to each other than to eukaryotes

• Prokaryotes exhibit tremendous diversity of habitats, lifestyles, and metabolic strategies

Important applications of microbial evolution

Critical Topics Already Introduced

• Genomic revolution and genome evolution
– Core vs. variable fractions of genomes
– Pan-genome
– Genome size and organization

• Horizontal (Lateral) Gene Transfer (HGT)
– There is no “tree of life”
– How frequent is HGT?

• Bacterial species - is there such a thing? What do we mean?

Assigned reading

Microbial genome sequence availability is exponentially increasing

NCBI Genome Project List

As of 4/19/2012 (2013 counts in parentheses):

2029 complete bacterial genomes (2510)

134 complete archaea (262)

3313 draft bacteria (>10K)

44 draft archaea

4600 bacteria – no data yet

49 archaea – no data yet

http://www.ncbi.nlm.nih.gov/genome/browse/

How well sampled is prokaryotic diversity by current genome sequences?

Koonin and Wolf 2008 perspective:

• Uncultivated organisms remain problematic

• Only 10% of the genes in major metagenomic samplings have no detectable homologs

“The possibility, certainly, remains that major new and, perhaps, unusual groups of archaea and bacteria dwell in complex and unusual habitats. Nevertheless, it appears likely that the current collections of archaeal and bacterial genomes provide a reasonable approximation of the diversity of prokaryotic life forms on earth.”

Genomic Encyclopedia of Bacteria and Archaea (GEBA) Project

• Objective – sequence genomes selected solely for their phylogenetic novelty (plus in depth sampling of a single phylum)

• …based on 16S rDNA tree

• Wu et al. Nature. 2009 Dec 24; 462(7276):1056-60.

DY Wu et al. Nature 462, 1056-1060 (2009) doi:10.1038/nature08656

Maximum-likelihood phylogenetic tree of the bacterial domain based on a concatenated alignment of 31 broadly conserved protein-coding genes. Phyla are distinguished by colour of the branch and GEBA genomes are indicated in red in the outer circle of species names.

53 GEBA bacteria accounted for 2.8–4.4 times more phylogenetic diversity than randomly sampled subsets of 53 non-GEBA bacterial genomes

DY Wu et al. Nature 462, 1056-1060 (2009) doi:10.1038/nature08656

Rate of discovery of protein families as a function of phylogenetic breadth of genomes.

Even discovered a bacterial homolog of eukaryotic cytoskeleton protein, Actin

Evolution-oriented reasons to target genomes for sequencing

• Maximize sampling of diversity

• Understand structure of particular populations and/or species

• Make targeted comparisons to understand the genetic basis of phenotypic differences

Size and organization of microbial genomes (Koonin and Wolf 2008)

Size Range = 180 Kbp – 13 Mbp

Structure of a prokaryotic genome

• One circular chromosome is typical.

• Some have other replicons, such as linear or circular plasmids.

• Some have more than one chromosome, generally distinguished from a plasmid by the presence of at least one “essential” gene.

• Some have linear chromosomes.

Fitch WM. Trends Genet. 2000 May;16(5):227-31.

Analogy vs Homology

Analogy

The relationship of any two characters that have descended convergently from unrelated ancestors.

Homology

The relationship of any two characters that have descended, usually with divergence, from a common ancestral character.

Orthology

The relationship of any two homologous characters whose common ancestor lies in the cenancestor of the taxa from which the two sequences were obtained.

Paralogy

The relationship of any two homologous characters arising from a duplication of the gene for that character.

Xenology

The relationship of any two homologous characters whose history, since their common ancestor, involves an interspecies (horizontal) transfer of the genetic material for at least one of those characters.
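The three homology definitions above can be read as a rule applied to a gene tree. The following is a minimal sketch, assuming a toy tree whose internal nodes are already labelled with the event that occurred there and whose horizontally transferred lineages are flagged. The gene names, node names and tree shape are hypothetical and are not the tree from the Test Yourself slide.

```python
# Toy gene tree, written as child -> parent (all names are hypothetical).
PARENT = {
    "a1": "spec1", "n1": "spec1",
    "b1": "n1", "c_x": "n1",
    "a2": "spec2", "b2": "spec2",
    "spec1": "dup", "spec2": "dup",
}
# Event that occurred at each internal node.
EVENT = {"dup": "duplication", "spec1": "speciation",
         "spec2": "speciation", "n1": "transfer"}
# Nodes whose incoming branch is a horizontal transfer.
TRANSFERRED = {"c_x"}


def path_to_root(node, parent):
    """Return the list of nodes from `node` up to the root, inclusive."""
    path = [node]
    while node in parent:
        node = parent[node]
        path.append(node)
    return path


def classify(gene_a, gene_b, parent=PARENT, event=EVENT, transferred=TRANSFERRED):
    """Classify two homologous genes following the definitions above."""
    path_a = path_to_root(gene_a, parent)
    path_b = path_to_root(gene_b, parent)
    ancestors_a = set(path_a)
    # Most recent common ancestor: first node on b's path that is also above a.
    mrca = next(node for node in path_b if node in ancestors_a)
    # Every lineage travelled since the common ancestor, on both sides.
    since_mrca = path_a[:path_a.index(mrca)] + path_b[:path_b.index(mrca)]
    if any(node in transferred for node in since_mrca):
        return "xenologs"
    return "orthologs" if event[mrca] == "speciation" else "paralogs"


for pair in [("a1", "b1"), ("a1", "a2"), ("a1", "b2"), ("b1", "c_x")]:
    print(pair, classify(*pair))
```

In this toy tree `a1` and `b2` come out as paralogs even though they sit in different genomes, because their most recent common ancestor is the duplication node.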

Test Yourself

• A1 – B1
• A1 – B2
• A1 – C3
• B1 – C2
• C2 – C3
• B2 – C3
• C3 – AB1

Homology on a Genome-Scale

• How many and which genes are common to two or more organisms?

• Which genes differentiate one organism from another?

• How is homology related to function?

A phylogenetic perspective

• Orthologs are the set of genes/proteins with gene trees identical to the species tree.

• We can understand other types of homology relationships by comparison to the species tree.

• But often we don’t know the species tree, and phylogenetic methods are complex

Consider two genomes

• Use BLASTP to compare one set of proteins (proteome) to the other

• Which set will you use as the query and which as the database?

• What criteria will you use to define “a match”?

[Diagram: genes 1, 2 and 3 of GenomeA alongside genes 1, 2 and 3 of GenomeB]

A1, A3, B2 and B3 are homologs (assuming the aligned regions overlap)
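One way to make the question of what counts as “a match” concrete is to filter BLASTP tabular output (`-outfmt 6`, whose default columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore). The sketch below is illustrative only: the thresholds are assumptions, not values from the lecture, and the three example rows are invented.

```python
def filter_hits(lines, max_evalue=1e-5, min_identity=30.0, min_align_len=50):
    """Keep BLAST tabular rows that pass all three (assumed) thresholds.

    Returns a list of (query, subject, bitscore) tuples.
    """
    kept = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        identity = float(fields[2])      # percent identity
        align_len = int(fields[3])       # alignment length
        evalue = float(fields[10])
        bitscore = float(fields[11])
        if (evalue <= max_evalue and identity >= min_identity
                and align_len >= min_align_len):
            kept.append((query, subject, bitscore))
    return kept


# Invented rows, for illustration only.
EXAMPLE_ROWS = [
    "A1\tB2\t62.5\t310\t116\t0\t1\t310\t1\t310\t1e-120\t420.0",
    "A1\tB3\t35.0\t290\t188\t2\t5\t294\t8\t297\t2e-30\t130.0",
    "A2\tB1\t28.0\t40\t29\t0\t1\t40\t1\t40\t0.5\t22.0",
]

print(filter_hits(EXAMPLE_ROWS))
```

Tightening or loosening any threshold changes which genes count as homologs, which is why the choice has to be stated alongside any result.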

Reciprocal Best Hits

• Use BLASTP to compare sets of proteins (proteome) to each other
– First using GenomeA to query against GenomeB
– Then using GenomeB to query against GenomeA
– Save only one best match for each query
– Save only the reciprocal best matches as “orthologs”

[Diagram, repeated over three slides: genes 1, 2 and 3 of GenomeA alongside genes 1, 2 and 3 of GenomeB]

Lose A3-B2 and A1-B3 homology
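The reciprocal best hit procedure can be sketched in a few lines. This is a minimal illustration, assuming hits have already been reduced to (query, subject, score) tuples. The scores are hypothetical, chosen so that A1–B2 and A3–B3 are the best pairs, consistent with the slide's note that the A3–B2 and A1–B3 relationships are lost.

```python
def best_hits(hits):
    """Map each query to its single highest-scoring subject."""
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return sorted (a, b) pairs where each gene is the other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


# Hypothetical bit scores: A1, A3, B2 and B3 all hit one another.
A_VS_B = [("A1", "B2", 420.0), ("A1", "B3", 130.0),
          ("A3", "B3", 400.0), ("A3", "B2", 150.0)]
B_VS_A = [("B2", "A1", 420.0), ("B2", "A3", 150.0),
          ("B3", "A3", 400.0), ("B3", "A1", 130.0)]

print(reciprocal_best_hits(A_VS_B, B_VS_A))
```

The weaker A1–B3 and A3–B2 hits are real homology relationships, but keeping only one best match per query discards them.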

Software/Methods for Predicting Orthologs from Genome Sequences

• RBH
• RSD (Reciprocal Shortest Distance)
• INPARANOID
• RIO
• Orthostrapper
• Ortholuge
• TribeMCL
• OrthoMCL

Method Comparison

Chen F, Mackey AJ, Vermunt JK, Roos DS. PLoS ONE. 2007 Apr 18;2(4):e383.

Core and variable genes – single genome perspective

A small number of genes have orthologs in all microbial genomes (core)

More genes have orthologs in many genomes, but not all (shell)

Some genes are rare and have orthologs in only a few genomes (cloud)

Some are unique to one genome (ORFans)
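The four categories above can be expressed as a rule on how many genomes carry an ortholog of a gene. The sketch below is one possible reading: the slide does not say where “many” ends and “a few” begins, so the 50% boundary between shell and cloud is an assumption, and the family names are hypothetical.

```python
def classify_family(n_present, n_genomes, shell_min_fraction=0.5):
    """Assign a gene family to core, shell, cloud or ORFan.

    n_present: number of genomes containing an ortholog.
    shell_min_fraction: assumed cutoff separating shell from cloud.
    """
    if n_present == n_genomes:
        return "core"
    if n_present == 1:
        return "ORFan"
    if n_present / n_genomes >= shell_min_fraction:
        return "shell"
    return "cloud"


# Hypothetical presence counts across six genomes.
PRESENCE = {"famA": 6, "famB": 4, "famC": 2, "famD": 1}
N_GENOMES = 6

for family, count in PRESENCE.items():
    print(family, classify_family(count, N_GENOMES))
```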

Core and variable genes – species perspective (pan-genome)

For some species as a whole, the number of core (plus shell) genes can be much smaller than the variable fraction (cloud plus ORFans), and the pan-genome can be very large.

Touchon et al. PLoS Genetics. 2009

Different types of pan-genomes

Figure 3. Power law regression for species with open and closed pan-genomes. Tettelin et al. Curr Opin Microbiology 2008:11(5).

Open vs Closed Pan-genomes

• Open
– Number of new genes discovered continues to grow as additional genomes of the species are sequenced
– Organisms live in diverse environments and are genetically amenable to horizontal gene transfer

• Closed
– Number of new genes discovered is very small as additional genomes of the species are sequenced
– Organisms have little exposure to other organisms and/or are refractory to horizontal gene transfer
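The open versus closed distinction rests on counting how many previously unseen genes each additional genome contributes. A minimal sketch of that count follows, with each genome represented as a set of gene-family identifiers. The identifiers are hypothetical, and the result depends on the order in which genomes are added, so a real analysis would not rely on a single ordering.

```python
def new_genes_per_genome(genomes):
    """Return (new-gene counts per added genome, pan-genome size)."""
    seen = set()
    counts = []
    for genes in genomes:
        new = set(genes) - seen   # families not seen in earlier genomes
        counts.append(len(new))
        seen |= new
    return counts, len(seen)


# Hypothetical species where every genome keeps adding genes (open-like).
OPEN_LIKE = [{"g1", "g2", "g3"}, {"g1", "g4", "g5"}, {"g1", "g6", "g7"}]
# Hypothetical species where later genomes add almost nothing (closed-like).
CLOSED_LIKE = [{"g1", "g2", "g3"}, {"g1", "g2", "g3"}, {"g1", "g2", "g4"}]

print(new_genes_per_genome(OPEN_LIKE))
print(new_genes_per_genome(CLOSED_LIKE))
```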

Horizontal Gene Transfer

• Mechanisms include conjugation, transduction and transformation

• Can introduce entirely new genes and gene clusters into genomes (grow the pan-genome)

• Can replace existing genes with functionally equivalent (?) xenologs (scramble phylogenetic history)

Horizontal Gene Transfer

• How prevalent is it?
– We don’t know. Debates continue, largely because it is hard to separate the error associated with phylogenetic reconstruction from true differences in phylogenetic signal

• Who is doing it?
– We don’t know. Same problem as above.
– Good evidence that it is much more frequent within (some) species than between
– Some evidence for a relationship with evolutionary distance and/or commonality of environment

SSU rDNA perspective

EVOLUTION: Genome Data Shake Tree of Life. E Pennisi - Science, 1998

The ring of life provides evidence for a genome fusion origin of eukaryotes. MC Rivera, JA Lake - Nature, 2004

The net of life: reconstructing the microbial phylogenetic network. V Kunin, L Goldovsky, N Darzentas, CA … - Genome Research, 2005

The tree of one percent. T Dagan, W Martin - Genome Biology, 2006

Uprooting the tree of life. WF Doolittle - Evolution: a Scientific American reader, 2006

Comparison of phylogenies for nearly universally conserved genes

102 ML trees for 100 taxa

Objective – compare topological distance between trees

New metric called IS (inconsistency score) = fraction of splits two trees have in common
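To make “fraction of splits two trees have in common” concrete, the sketch below extracts the non-trivial splits (bipartitions of the taxa) from two small trees and compares them. It illustrates the slide's one-line description only and is not the published IS formula. The nested-tuple tree encoding and the choice of the union of splits as denominator are assumptions.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaves(child) for child in tree))


def splits(tree):
    """Return the non-trivial splits of a tree, treated as unrooted.

    Each split is stored as the side that excludes a fixed reference taxon,
    so the same bipartition always has the same representation.
    """
    all_leaves = leaves(tree)
    reference = min(all_leaves)
    found = set()

    def visit(node):
        if isinstance(node, str):
            return
        clade = leaves(node)
        side = all_leaves - clade if reference in clade else clade
        # Skip trivial splits that separate zero or one taxon from the rest.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        for child in node:
            visit(child)

    visit(tree)
    return found


def shared_split_fraction(tree_1, tree_2):
    """Shared splits divided by all distinct splits (assumed denominator)."""
    s1, s2 = splits(tree_1), splits(tree_2)
    total = len(s1 | s2)
    return len(s1 & s2) / total if total else 1.0


TREE_1 = ((("A", "B"), "C"), ("D", "E"))
TREE_2 = ((("A", "C"), "B"), ("D", "E"))

print(shared_split_fraction(TREE_1, TREE_1))
print(shared_split_fraction(TREE_1, TREE_2))
```

The two example trees agree on the split separating D and E from the rest and disagree on how A, B and C are grouped.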

The network of similarities among the nearly universal trees (NUTs). (a) Each node (green dot) denotes a NUT, and nodes are connected by edges if the similarity between the respective edges exceeds the indicated threshold. (b) The connectivity of 102 NUTs and the 14 1:1 NUTs depending on the topological similarity threshold.

Real trees are more similar to each other than randomly simulated trees

Although no single tree appears to represent the evolutionary history of these organisms, there is distinctly preserved phylogenetic signal across the dataset as a whole

The big divide?

• Look for evidence of HGT between bacteria and archaea

• 56% of NUTs separated the groups perfectly

• 44% show at least one HGT
– 13% from archaea to bacteria
– 23% from bacteria to archaea
– 8% both directions

The supernetwork of the NUTs. Puigbò et al. Journal of Biology 2009 8:59 doi:10.1186/jbiol159

Expanding to ~6800 other predicted ortholog clusters

• Network connectivity is greatly reduced

• Different functional categories of genes show different levels of connectedness

Network representation of the 6,901 trees of the forest of life. The 102 NUTs are shown as red circles in the middle. The NUTs are connected to trees with similar topologies: trees with at least 50% of similarity with at least one NUT (P-value < 0.05) are shown as purple circles and connected to the NUTs. The rest of the trees are shown as green circles.

Puigbò et al. Journal of Biology 2009 8:59 doi:10.1186/jbiol159

Proc Natl Acad Sci U S A. 2005 Oct 4;102(40):14332-7.

Highways of obligate gene transfer within and among phyla and divisions of prokaryotes, based on analysis of the 22,348 protein trees for which a minimal edit path could be resolved

Beiko R G et al. PNAS 2005;102:14332-14337

©2005 by National Academy of Sciences

Horizontal Transfer within species

Within the species E. coli, a given base pair is estimated to be 100 times more likely to have undergone a recombination event than a point mutation. So how can we justify representing the relationship between strains with a tree-like structure?

Modeling and simulation support inference of a tree summarizing dominant signal AS LONG AS patterns of recombination are more or less random between lineages

Touchon et al. PLoS Genetics. 2009

Major processes affecting prokaryotic genome evolution (Koonin and Wolf, 2008)

(1) Genome streamlining under strong selection.

(2) Neutral gene loss and genome degradation under weak selection (or neutral).

(3) Innovation and complexification via gene duplication.

(4) Innovation via operon shuffling.

(5) Innovation and complexification via HGT, in particular, of partially selfish operons, a process that often leads to nonorthologous gene displacement.

(6) Replicon fusion, propagation of mobile elements and other interactions between the relatively stable chromosomes and the mobilome.

Test Yourself

• A1 – B1 = Ortho
• A1 – B2 = Ortho
• A1 – C3 = Ortho
• B1 – C2 = Para (out)
• C2 – C3 = Para (in)
• B2 – C3 = Ortho
• C3 – AB1 = Xeno