Yuri Wolf

National Center for Biotechnology Information. Evolving ideas: paradigm shifts in evolutionary biology from Darwin's times to the age of genomics. Yuri Wolf. February 2014, Minot State University.

Transcript of Yuri Wolf

Page 1: Yuri  Wolf

National Center for Biotechnology Information

Evolving ideas: paradigm shifts in evolutionary biology from Darwin's times to the age of genomics

Yuri Wolf

February 2014, Minot State University

Page 2: Yuri  Wolf

Overview

Basic Darwinian concepts

Synthetic theory of evolution

Paradigm shifts (accomplished and emerging):

• Selection and drift

• Darwinian and Lamarckian modes of evolution

• Tree of Life and Forest of Life

• Genomes and supergenomes

• Molecular Clock and Universal Pacemaker

Page 3: Yuri  Wolf

Basic Darwinian Concepts

Charles Darwin (1859)

Page 4: Yuri  Wolf

Basic Darwinian Concepts

Variation

Heredity

Selection

Encapsulates the conceptual core of evolutionary biology.

Page 5: Yuri  Wolf

Synthetic Theory of Evolution

The following concepts were solidified by the synthetic theory of evolution:

• discrete genes are the basis of heredity; genomes are collections of genes; progeny inherits parental genomes

• random mutations alter genes creating new alleles

• different alleles contribute to organism fitness

• selection changes allele frequencies in populations

• evolution is sufficiently described in terms of changing allele frequencies in populations (including loss and fixation)

• evolution is a continuous gradual process over all extant and extinct species that descend from a single Universal Common Ancestor
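The statement that selection changes allele frequencies can be illustrated with a minimal sketch. It assumes a haploid population of effectively infinite size with two alleles, one of which has relative fitness 1 + s; the function names and the parameter values are illustrative and are not taken from the talk.

```python
def next_frequency(p, s):
    """One generation of haploid selection.

    p is the current frequency of the favored allele, whose fitness is 1 + s;
    the alternative allele has fitness 1.
    """
    mean_fitness = p * (1 + s) + (1 - p)
    return p * (1 + s) / mean_fitness


def trajectory(p0, s, generations):
    """Allele frequency in every generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


if __name__ == "__main__":
    for s in (0.0, 0.01, 0.1):
        final = trajectory(0.01, s, 500)[-1]
        print(f"s = {s}: frequency after 500 generations = {final:.3f}")
```

With s = 0 the frequency never moves, which is exactly the gap that drift fills in finite populations.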

Page 6: Yuri  Wolf

Synthetic Theory of Evolution

Mutation effects are small in magnitude and random in direction. If selection acts "symmetrically" (negative selection), the population is at equilibrium; if selection acts "asymmetrically" (positive selection), the population changes from generation to generation.

[Diagram: parents → action of mutation on progeny → action of selection on progeny → next generation]

Page 7: Yuri  Wolf

Selection and Drift

Finite populations are subject to stochastic sampling (Wright 1932, 1948). For a long time it was believed that natural populations are too large for this to matter (Fisher 1930).

Elucidation of the role and structure of DNA gradually led to the realization that alleles are generated and recombined at the level of single nucleotides (i.e. the number of (semi-)independently inherited units is enormous). Selection under realistic circumstances cannot act on so many units, but mutations keep occurring (Kimura 1968; King and Jukes 1969).

A more realistic analysis of population structure and dynamics suggests that effective population sizes are limited in nature, are often in the 10^4 to 10^6 range and probably never exceed 10^9 (Lynch 2007).
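Stochastic sampling in a finite population can be sketched with a small Wright-Fisher simulation. This is a toy illustration under stated assumptions (a fixed number of gene copies, no selection, no mutation); the function names, population size and replicate count are hypothetical choices, not values from the talk.

```python
import random


def wright_fisher_fate(copies, start_count, rng):
    """Follow one neutral allele until it is lost (False) or fixed (True)."""
    count = start_count
    while 0 < count < copies:
        p = count / copies
        # Binomial resampling: every gene copy of the next generation
        # picks its parent at random from the current generation.
        count = sum(rng.random() < p for _ in range(copies))
    return count == copies


def fixation_fraction(copies, start_count, replicates, seed=1):
    """Fraction of replicate populations in which the allele reaches fixation."""
    rng = random.Random(seed)
    fixed = sum(wright_fisher_fate(copies, start_count, rng)
                for _ in range(replicates))
    return fixed / replicates


if __name__ == "__main__":
    # A neutral allele starting at 10% should fix in roughly 10% of populations.
    print(fixation_fraction(copies=40, start_count=4, replicates=1000))
```

Under neutrality the expected fixation fraction equals the starting frequency, so chance alone fixes some alleles without any selective reason.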

Page 8: Yuri  Wolf

Selection and Drift

Neutrality is the prevalent mode of nucleotide and protein sequence evolution. It has important practical consequences for researchers:

• sequence alignments show functionally important regions

• sequences carry historical information (Darwin 1859, Ch. X)

Paradigm shift: The neutral nature of observed evolutionary changes becomes the null hypothesis.

Not "What was the reason for this?", but "Was there a reason for this?".

Page 10: Yuri  Wolf

Lamarck, Darwin and Wright

Three major conceptual models of evolution:

• Lamarck: environmental factors → mutation-directing mechanism → beneficial mutations → adapted organism

• Darwin: random mutagenesis → random mutations; environmental factors → selection → beneficial mutations fixed by selection; adapted organism

• Wright: random mutagenesis → random mutations → random fixation → beneficial mutations fixed by chance; adapted organism

Darwin and Wright: a continuum, depending on relative strength of drift and selection

Lamarck: radically different modality?
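The continuum between fixation by selection and fixation by chance can be made concrete with Kimura's diffusion approximation for the fixation probability of a single new mutation. The formula is a standard population-genetics result rather than something shown on the slide, and the sketch assumes a diploid population whose effective size equals its census size n; the function name is illustrative.

```python
import math


def fixation_probability(s, n):
    """Kimura's approximation, (1 - exp(-2s)) / (1 - exp(-4ns)), for one new
    mutant copy with selection coefficient s in a diploid population of n
    individuals (assumption: effective size equals census size)."""
    x = 4.0 * n * s
    if abs(x) < 1e-9:      # effectively neutral: pure drift
        return 1.0 / (2 * n)
    if x < -700.0:         # strongly deleterious: avoid overflow, probability ~ 0
        return 0.0
    return math.expm1(-2.0 * s) / math.expm1(-x)


if __name__ == "__main__":
    n = 10_000
    neutral = fixation_probability(0.0, n)
    for s in (-1e-3, -1e-5, 0.0, 1e-5, 1e-3, 1e-2):
        ratio = fixation_probability(s, n) / neutral
        print(f"4Ns = {4 * n * s:8.1f}   relative to neutral: {ratio:.3g}")
```

When |4Ns| is well below 1 the result is close to the neutral value 1/(2N) (the Wright regime); when 4Ns is large and positive it approaches 2s (the Darwin regime).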

Page 11: Yuri  Wolf

Lamarck, Darwin and Wright

cell with CASS:

• attack by a novel phage → phage destroys the cell, or the cell accidentally survives and acquires immunity to the phage

• attack by a known phage → phage DNA (RNA) is targeted by CASS → phage attack fails

CRISPR-Cas system: an almost purely Lamarckian path to adaptation

HGT into an organism entering a particular environment is enriched for genes that are adaptive in that environment

Page 12: Yuri  Wolf

Lamarck, Darwin and Wright

[Diagram axes: deterministic vs stochastic; Lamarckian modality vs Darwinian modality]

generation of variation: mutations, HGT, recombination, plasmid acquisition, CRISPR-Cas

fixation of variation: drift, draft, hitchhiking, selection

also shown: look-ahead mutations, epigenetics

Paradigm shift: Lamarckian and Darwinian modalities form a continuum.

Koonin & Wolf 2009, 2010, 2012

Page 13: Yuri  Wolf

Tree of Life

Thinking of the history of life in terms of phylogenetic trees is as old as scientific biology.

Charles Darwin 1859. Origin of Species [one and only illustration]: "descent with modification"

Ernst Haeckel 1879. The Evolution of Man

Page 14: Yuri  Wolf

Tree of Life

Advent of molecular phylogenetics – expectations of an objectively reconstructed, complete Tree of Life.

Woese 1990. Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. PNAS 87, 4576-4579 [Figure 1, modified]

Page 15: Yuri  Wolf

Tree of Life in Genomic Era

Genomic era – growing frustration with discrepancies between the trees reconstructed for individual genes and heroic efforts to overcome the noise. Role of horizontal gene transfer in the evolution of prokaryotic genomes is established.

Major lines of approach:

• gene repertoire and gene order

• distribution of distances between orthologs

• concatenated alignments of "non-transferable" gene cores

• consensus trees and supertrees

Ciccarelli 2006. Towards automatic reconstruction of a highly resolved tree of life. Science 311, 1283-1287 [Figure 2]

Page 16: Yuri  Wolf

Tree of Life, Rejected

Troubled times – "uprooting" of TOL for prokaryotes.

• horizontal gene transfer is rampant; no gene is exempt

• histories of individual genes are non-coherent with each other

• vertical signal is completely lost (or never existed at all)

• there are no species (or other taxa) in prokaryotes

• a consistent signal we observe is created by biases in HGT

[Figure: two panels, "Standard Model" and "Net of Life", each relating Bacteria, Archaea and Eukaryotes]

Doolittle 2000. Uprooting the tree of life. Sci. Am. 282, 90-95 [modified]

Page 17: Yuri  Wolf

Forest of Life – Methods

Source data and basic analysis methods:

• 100 hand-picked microbial genomes (41 archaea and 59 bacteria) representing a "fair" sample of prokaryote diversity (as known in 2008)

• clusters of orthologous genes (NCBI COGs and EMBL EggNOGs)

• multiple protein sequence alignments → index orthologs → ML phylogenetic trees

• 6901 trees cover 4-100 species; of them 102 cover 90-100 species (Nearly Universal Trees, NUTs)

• direct tree comparison (distances between trees)

• quartet decomposition; analysis of quartet spectra

• simulations of evolutionary models
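Quartet decomposition and quartet-based tree comparison can be sketched as follows. This is a simplified illustration, not the pipeline used in the study: trees are assumed to be nested tuples of leaf names, each set of four leaves is reduced to its split (ab|cd), and two trees are compared by the fraction of shared quartets that have the same split. All function names are hypothetical.

```python
from itertools import combinations


def clades(tree):
    """Return (leaves of tree, leaf sets of all internal nodes)."""
    if not isinstance(tree, tuple):
        return frozenset([tree]), []
    leaves, found = frozenset(), []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def quartets(tree):
    """Map every 4-leaf subset to its split ab|cd, or None if unresolved."""
    leaves, all_clades = clades(tree)
    result = {}
    for quad in combinations(sorted(leaves), 4):
        q = frozenset(quad)
        split = None
        for clade in all_clades:
            side = clade & q
            if len(side) == 2:  # an edge separates two of the leaves from the other two
                split = frozenset([side, q - side])
                break
        result[q] = split
    return result


def quartet_agreement(tree_a, tree_b):
    """Fraction of quartets resolved in both trees that have the same split."""
    qa, qb = quartets(tree_a), quartets(tree_b)
    shared = [q for q in qa if qa[q] is not None and qb.get(q) is not None]
    if not shared:
        return float("nan")
    return sum(qa[q] == qb[q] for q in shared) / len(shared)


if __name__ == "__main__":
    t1 = ((("A", "B"), "C"), ("D", "E"))
    t2 = ((("A", "C"), "B"), ("D", "E"))
    print(quartet_agreement(t1, t2))
```

Comparing each gene tree with a consensus tree in this way gives one possible measure of how "tree-like" that gene's history is.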

Page 18: Yuri  Wolf

Forest of Life – Analysis

[Figure: plots comparing the NUTs with a random expectation; axis labeled IS with ticks 0, 0.5, 1; points labeled COG0541, COG0532, COG0092, COG0100, COG0090, COG0528, COG0096, COG0525, COG0051, COG0452, COG0495, COG0172, COG0089, COG0522, COG0124, COG0185, COG0094, COG0126, COG0519, COG0540, COG0149, COG0198, COG0177, COG0057, COG0009, COG0537]

• NUTs are much closer to each other than expected by chance

• NUTs form a tightly connected network when clustered by similarity

• NUTs don’t form clusters (random scatter around center)

• NUTs are connected to the rest of the forest

Page 19: Yuri  Wolf

Forest of Life – Analysis

“Tree-like” vs “Net-like” components of the trees (how many quartets agree/disagree with the consensus tree):

NUTs: 0.63 +/- 0.35

FOL: 0.39 +/- 0.31

NUTs are dominated by tree-like descent

Overall the forest of life is dominated by network-like relationships (HGT)

Page 20: Yuri  Wolf

Forest of Life – Analysis

Simulated example of 16 trees for 10 organisms:

No two trees are the same; each contains 2 random deviations from the consensus tree. A common statistical trend is visible.

Page 21: Yuri  Wolf

Forest of Life

Paradigm shift: “Tree of Life” isn’t a useful description of the evolutionary history of prokaryotes; “Forest of Life” (a.k.a. “phylome”) is a better framework.

Highly conserved nearly universal genes, however, retain the history of tree-like descent of core genes (mostly translation-related).

When necessary, this history can be extracted and used to describe the central statistical trend in nearly universal trees or as the first approximation of the relationships between organisms.

Puigbo 2009, 2010

Page 22: Yuri  Wolf

Genomes and Pangenomes

Pangenome – sum total of different genes found in a group of organisms.

[Figure: three panels (338 Archaea and Bacteria; 41 Archaea; 44 Escherichia and Salmonella) plotting Number of COGs (log scale, 1 to 10000) against Number of Organisms; series: DATA, Core, Shell, Cloud]

Pangenomes constructed for different groups display qualitatively similar distributions of genes by the fraction of genomes in which they are found.
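The distribution plotted on this slide (number of gene families against the number of organisms that carry them) can be computed from presence/absence data as sketched below. The slide decomposes the distribution into Core, Shell and Cloud components; the fixed cutoffs used here (`core_min`, `cloud_max`) are hypothetical simplifications chosen only for illustration, as are the function names and the toy data.

```python
from collections import Counter


def family_presence(genomes):
    """Number of genomes in which each gene family occurs."""
    return Counter(fam for families in genomes.values() for fam in set(families))


def frequency_spectrum(genomes):
    """Number of gene families present in exactly k genomes, for each k."""
    return Counter(family_presence(genomes).values())


def classify_families(genomes, core_min=0.9, cloud_max=0.1):
    """Label each family core, shell or cloud by the fraction of genomes it is in.

    The two thresholds are illustrative assumptions, not values from the talk.
    """
    n = len(genomes)
    classes = {}
    for fam, count in family_presence(genomes).items():
        fraction = count / n
        if fraction >= core_min:
            classes[fam] = "core"
        elif fraction <= cloud_max:
            classes[fam] = "cloud"
        else:
            classes[fam] = "shell"
    return classes


if __name__ == "__main__":
    toy = {f"genome{i}": {"rpoB"} for i in range(10)}
    for i in range(5):
        toy[f"genome{i}"].add("abcT")
    toy["genome0"].add("orf1")
    print(frequency_spectrum(toy))
    print(classify_families(toy))
```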

Page 23: Yuri  Wolf

Genomes and Pangenomes

[Figure: the 338 Archaea and Bacteria panel (Number of COGs against Number of Organisms), annotated with the sizes of the three components]

Core: ~70 genes

Shell: ~5700 genes

Cloud: ~24000 genes

Page 24: Yuri  Wolf

Genomes and Pangenomes

Core, shell and cloud in genomes and pangenomes.

[Diagram: core, shell and cloud fractions in an individual bacterial genome (genes) and in the prokaryotic pangenome (families)]

Page 25: Yuri  Wolf

Genomes in Flux

The 109 genomes of Escherichia, Salmonella, Enterobacter and Citrobacter have a genome size of 4,700±420 genes and are separated by an evolutionary distance of 0.13 substitutions per site (on the order of 10 million years).

There are only 996 families (20% of a genome) shared by all of them. 24,110 more families (×5 genome size) are found in two or more isolates, comprising an additional 78% of a genome. 9,759 genes (×2 genome size) are found in one genome only (~90, or 2%, in each genome).

Only 20% of a genome remained intact over 10 million years; many thousands of genes were acquired, lost and exchanged.
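The rounded figures quoted on this slide can be re-derived from the raw counts. The short check below uses only the numbers given above; the variable names are illustrative.

```python
genome_size = 4700       # mean number of genes per genome
n_genomes = 109
core_families = 996      # shared by all genomes
shared_families = 24110  # found in two or more isolates
unique_genes = 9759      # found in one genome only

core_fraction = core_families / genome_size         # about 0.21, quoted as 20%
shared_multiple = shared_families / genome_size     # about 5.1, quoted as x5
unique_multiple = unique_genes / genome_size        # about 2.1, quoted as x2
unique_per_genome = unique_genes / n_genomes        # about 90 genes
unique_fraction = unique_per_genome / genome_size   # about 0.02, quoted as 2%
pangenome_size = core_families + shared_families + unique_genes

print(f"core fraction of a genome:  {core_fraction:.2f}")
print(f"shared families / genome:   {shared_multiple:.1f}")
print(f"unique genes / genome size: {unique_multiple:.1f}")
print(f"unique genes per genome:    {unique_per_genome:.0f} ({unique_fraction:.1%})")
print(f"families in the pangenome:  {pangenome_size}")
```

The pangenome of this group is therefore roughly seven times larger than any single genome in it.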

Page 26: Yuri  Wolf

Genomes and Pangenomes

Pangenomes – theoretical and practical questions:

• Complete genomes of many groups have been sequenced. At this point, sequencing each new isolate usually discovers genes never seen before. Will this trend continue? Or will we, at some point, discover all (or most) genes of a group, so that newly sequenced isolates consist of different combinations of already known genes?

• Is there some objective reality behind the concept of the pangenome, or is it purely artificial?

Page 27: Yuri  Wolf

Pangenomes and Supergenomes

Supergenome – set of genes compatible with and available to a group of organisms.

[Diagram labels: ancestral genome, genomes, pangenome, supergenome]

Page 28: Yuri  Wolf

Supergenome Size

Two common approaches to estimate supergenome size:

• sampling curves (Tettelin 2005)

• explicit evolutionary modeling (Baumdicker 2012)
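The idea behind sampling curves can be sketched as follows: genomes are added in random order and the number of distinct gene families seen so far is averaged over many orderings. This is a generic illustration in the spirit of that approach, not the fitting procedure of the cited work; the function name, the toy genomes and the number of permutations are assumptions.

```python
import random


def accumulation_curve(genomes, permutations=100, seed=0):
    """Mean number of distinct families after sampling 1, 2, ..., n genomes."""
    rng = random.Random(seed)
    names = sorted(genomes)
    totals = [0.0] * len(names)
    for _ in range(permutations):
        rng.shuffle(names)            # a new random order of genomes
        seen = set()
        for i, name in enumerate(names):
            seen |= genomes[name]     # families discovered so far
            totals[i] += len(seen)
    return [t / permutations for t in totals]


if __name__ == "__main__":
    toy = {
        "g1": {"a", "b", "c"},
        "g2": {"a", "b", "d"},
        "g3": {"a", "e", "f"},
    }
    print(accumulation_curve(toy))
```

Whether such a curve flattens or keeps rising as genomes are added is what distinguishes a "closed" from an "open" pangenome.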

Page 29: Yuri  Wolf

Supergenome Size

Our alternative approach (work in progress):

the number of multiple gains estimates supergenome size

[Diagram: gains and losses of gene 1, gene 2 and gene 3]

Page 30: Yuri  Wolf

Genomes and Supergenomes

Paradigm shift: genomes of prokaryotes are in a state of flux, gaining and losing tens of genes per millennium; cores of supergenomes (directly available as pangenomes) are relatively stable and provide a good description of groups of organisms.

Wolf & Koonin 2013; Lobkovsky 2013; work in progress

Page 31: Yuri  Wolf

Molecular Clock

Divergence between orthologous sequences is proportional to time separating the species.

Different genes evolve at specific, roughly constant rates.

Zuckerkandl & Pauling 1962

[Figure: plots of distance against divergence time and of rate against time, with sampling error indicated]

Page 32: Yuri  Wolf

Molecular Clock

Under MC all individual gene trees are ultrametric (up to a sampling error) and identical to the species tree up to a scaling factor (evolution rate).

[Figure: three trees over taxa A to H: the species tree on a time axis, and the trees of gene 1 and gene 2 on distance axes]
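Ultrametricity can be checked directly on a matrix of pairwise distances with the three-point condition: in every triple of taxa, the two largest distances must be equal. The sketch below assumes distances stored as a nested dictionary; the helper names and the toy distances are illustrative.

```python
from itertools import combinations


def symmetric(pairs):
    """Build a symmetric nested-dict distance matrix from {(a, b): distance}."""
    dist = {}
    for (a, b), d in pairs.items():
        dist.setdefault(a, {})[b] = d
        dist.setdefault(b, {})[a] = d
    return dist


def is_ultrametric(dist, tol=1e-9):
    """Three-point condition: the two largest distances in every triple are equal."""
    for a, b, c in combinations(sorted(dist), 3):
        _, middle, largest = sorted((dist[a][b], dist[a][c], dist[b][c]))
        if largest - middle > tol:
            return False
    return True


def scale(dist, rate):
    """Rescale all distances by a gene-specific evolution rate."""
    return {a: {b: d * rate for b, d in row.items()} for a, row in dist.items()}


if __name__ == "__main__":
    clock_like = symmetric({("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 6.0})
    skewed = symmetric({("A", "B"): 2.0, ("A", "C"): 6.0, ("B", "C"): 7.0})
    print(is_ultrametric(clock_like), is_ultrametric(scale(clock_like, 0.37)))
    print(is_ultrametric(skewed))
```

Rescaling by a constant rate preserves ultrametricity, which is why clock-like gene trees differ from the species tree only by a scaling factor.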

Page 33: Yuri  Wolf

Molecular Clock

Most real phylogenetic trees are far from ultrametric.

The molecular clock is substantially overdispersed.

[Figure: rate against time (scale mark 0.2), with labels: ideal, expected based on sampling error, observed]

Page 34: Yuri  Wolf

Relaxed Molecular Clock

Relaxed molecular clock models allow for rate variation.

Rates are sampled from prior distributions with limited variance, independently or in an autocorrelated manner.

Genes are either analyzed individually, or as concatenated alignments (implying evolution as a single unit).

[Figure: rate against time]
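The difference between independent and autocorrelated rate sampling can be sketched along a single root-to-tip path of branches. This toy example assumes lognormal rates with median 1; it does not reproduce any particular relaxed-clock program, and the function names and the value of sigma are hypothetical.

```python
import math
import random


def independent_rates(n_branches, sigma, rng):
    """Each branch draws its rate independently from a lognormal prior."""
    return [math.exp(rng.gauss(0.0, sigma)) for _ in range(n_branches)]


def autocorrelated_rates(n_branches, sigma, rng):
    """Each branch's log-rate is its parent's log-rate plus a random step."""
    rates, log_rate = [], 0.0
    for _ in range(n_branches):
        log_rate += rng.gauss(0.0, sigma)
        rates.append(math.exp(log_rate))
    return rates


def mean_log_step(rates):
    """Average absolute change in log-rate between consecutive branches."""
    steps = [abs(math.log(b / a)) for a, b in zip(rates, rates[1:])]
    return sum(steps) / len(steps)


if __name__ == "__main__":
    rng = random.Random(0)
    print(mean_log_step(independent_rates(500, 0.3, rng)))
    print(mean_log_step(autocorrelated_rates(500, 0.3, rng)))
```

For the same sigma, neighboring branches differ less under the autocorrelated scheme, because a descendant inherits most of its ancestor's rate.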

Page 35: Yuri  Wolf

Universal Pacemaker

The Universal Pacemaker model assumes that evolutionary time runs at a different pace in each lineage.

Under the UPM, species trees are intrinsically non-ultrametric.

[Figure: tree of taxa A to H (groups AB, CDEF, GH) drawn on two scales: time and pacemaker ticks]

Page 36: Yuri  Wolf

Pacemaker vs Clock

Both overdispersed MC and UPM models predict that individual gene trees would deviate from ultrametricity.

Under MC these deviations are expected to be uncorrelated.

Under UPM these deviations are expected to be correlated, so there exists a non-ultrametric pacemaker tree that can significantly reduce variance of observed rates.

A testable hypothesis!

2,300 trees of 100 prokaryotic species;

7,000 trees of 6 Drosophila species

1,000 trees of 9 yeast species

5,700 trees of 8 mammalian species
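The logic of the test can be illustrated with a toy simulation, which is not the method of the cited analysis. It assumes independent lineages with known divergence times, works with log branch lengths, and compares the residual variance left by a clock fit (lineage effects fixed to log time) with that left by a pacemaker fit (lineage effects estimated from all genes). All names and parameter values are hypothetical.

```python
import math
import random
import statistics


def simulate_log_lengths(n_genes, times, pace_sd, noise_sd, rng):
    """Log branch lengths: gene rate + log time + shared lineage pace + noise."""
    pace = [rng.gauss(0.0, pace_sd) for _ in times]  # same for every gene
    table = []
    for _ in range(n_genes):
        log_rate = rng.gauss(0.0, 1.0)
        table.append([log_rate + math.log(t) + p + rng.gauss(0.0, noise_sd)
                      for t, p in zip(times, pace)])
    return table


def residual_variance(table, lineage_effects=None):
    """Variance left after removing lineage effects and per-gene mean rates.

    With lineage_effects=None the effects are estimated from the data
    (pacemaker fit); otherwise the supplied values are used (clock fit).
    """
    if lineage_effects is None:
        lineage_effects = [statistics.fmean(row[j] for row in table)
                           for j in range(len(table[0]))]
    residuals = []
    for row in table:
        centered = [x - e for x, e in zip(row, lineage_effects)]
        gene_effect = statistics.fmean(centered)
        residuals.extend(c - gene_effect for c in centered)
    return statistics.pvariance(residuals)


def pacemaker_to_clock_ratio(table, times):
    clock = residual_variance(table, [math.log(t) for t in times])
    return residual_variance(table) / clock


if __name__ == "__main__":
    times = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 4.0, 4.0]
    upm = simulate_log_lengths(200, times, 0.7, 0.3, random.Random(1))
    mc = simulate_log_lengths(200, times, 0.0, 0.3, random.Random(1))
    print("UPM-like data:", round(pacemaker_to_clock_ratio(upm, times), 2))
    print("MC-like data: ", round(pacemaker_to_clock_ratio(mc, times), 2))
```

When deviations are shared across genes, the pacemaker fit removes a large part of the variance; when they are uncorrelated, the ratio stays close to 1.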

Page 37: Yuri  Wolf

Pacemaker vs Clock

2,300 trees of 100 prokaryotic species;

7,000 trees of 6 Drosophila species

1,000 trees of 9 yeast species

5,700 trees of 8 mammalian species

All show overwhelming support for the UPM model.

Snir 2012; work in progress

Page 38: Yuri  Wolf

Pacemaker Properties

Genes do not cluster in the tree shape space (i.e. no evidence of multiple pacemakers).

Variance of observed rate exceeds the sampling and rate estimation variance by a factor of 2 (i.e. half of the observed variance originates from biological, not technical sources).

[Figure: panels A and B, scatter plots of genes in tree shape space, PC1 against PC2]

Page 39: Yuri  Wolf

Universal Pacemaker

Paradigm shift: Universal Pacemaker is a more general model, better supported by evidence than Molecular Clock.

Pacemaker seems to be Universal in both senses (operates across all genes in a genome and in all organisms).

Different lineages evolve at individual rates, possibly faster or slower than related organisms. Lineage-specific evolution rates are probably determined by population dynamics.

Individual gene evolution rates deviate from the pacemaker-derived expectation for both technical (sampling fluctuations, calculation errors) and biological reasons. The latter are responsible for ~50% of observed rate variation and probably reflect lineage-specific changes of evolutionary pressure on different genes.

Page 40: Yuri  Wolf

Recommended Reading

Page 41: Yuri  Wolf

In Lieu of Conclusion

• Darwinian theory of natural selection

• Population genetics: quantitative theory of selection and drift

• Neutral theory

• Selfish gene

• HGT

• phylogenomics

• Neo-Lamarckian evolution models

• Lamarckian L'influence des circonstances

• constructive neutral evolution of complexity

• evolution of evolvability

Evolution theories keep evolving!

Page 42: Yuri  Wolf

Acknowledgments

Eugene Koonin, NCBI

Pere Puigbo, NCBI

Alex Lobkovsky, NCBI

David Kristensen, NCBI

Sagi Snir, University of Haifa, Israel