Evolutionary genomics

Post on 20-Jun-2015


A presentation about evolutionary genomics, showing a selection of examples of what information can be extracted from genomes.

Transcript of Evolutionary genomics

LBBE, CNRS, Université de Lyon

Evolutionary genomics

Bastien Boussau

boussau@gmail.com

@bastounette

Chance and necessity

Evolution

Function

…ATCGACATCAGCATCAGCACTAC…

Evolution in our genomes

Brown, Sanger, Kitai. Biochem. J. 1955.

Genomes as Documents of Evolutionary History

What information can we extract from genome sequences?

1. Species phylogeny

2. Phylogeography

3. Diversification history

4. Ancestral lifestyles

5. Selective pressures in extant species

6. Application to cell lineages

Genome evolution

Genome sequence

…ACTCGATCGCATCGACTCCTCCAGC…

• point mutations: …ACTCGTTCGCATCGACTCCTCCAGC…

• insertions/deletions: …ACTCGATCGCATCGAAAACTCCTCCAGC…

• duplications/losses: …ACTCGATCGCATCGACTCCTTCCTCCAGC…

• rearrangements: …ACTCGATCGCAAGCTCTCCTCCAGC…

Processes

population genetics, molecular machinery, species phylogeny, environment

Using genomes for statistical inference

Genome sequence

point mutations, insertions/deletions, duplications/losses, rearrangements

Processes

population genetics, molecular machinery, species phylogeny, environment

…ACTCGATCGCATCGACTCCTCCAGC…

Inferential statistics

Boussau and Daubin, Tree 2010

• Using computers

• Probabilistic models (e.g. models of sequence evolution)

• “What I cannot create, I do not understand.” (Feynman, 1988)

• “What I cannot simulate, I do not understand.”
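To make "what I cannot simulate, I do not understand" concrete, here is a minimal sketch of simulating sequence evolution along a single branch. It assumes the Jukes-Cantor (JC69) model, one of the simplest models of sequence evolution; the talk does not say which model was used, and the function name, sequence length and branch length are illustrative choices.

```python
import math
import random

BASES = "ACGT"

def simulate_jc69(seq, branch_length, rng):
    """Evolve `seq` along one branch under the Jukes-Cantor (JC69) model.

    `branch_length` is the expected number of substitutions per site.
    """
    # Under JC69, a site ends the branch in a different base with this probability.
    p_change = 0.75 * (1.0 - math.exp(-4.0 * branch_length / 3.0))
    out = []
    for base in seq:
        if rng.random() < p_change:
            # All three other bases are equally likely under JC69.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

rng = random.Random(42)  # fixed seed so the run is repeatable
ancestor = "".join(rng.choice(BASES) for _ in range(1000))
descendant = simulate_jc69(ancestor, 0.1, rng)
observed_diff = sum(a != b for a, b in zip(ancestor, descendant)) / len(ancestor)
print(round(observed_diff, 3))  # fraction of sites that differ
```

The observed fraction of differing sites is slightly lower than the branch length, because several substitutions can hit the same site; correcting for this is what a probabilistic model does during inference.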

1 Inferring the phylogeny

Models:

• Modelling events of substitution

• In some cases, modelling insertions and deletions

• In some cases, modelling allele sorting

• In some cases, modelling gene duplications, losses and transfers

• In some cases, modelling hybridization

• Dates of speciation can also be inferred with a model of rate evolution
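As an illustration of what "modelling events of substitution" buys in practice, the sketch below converts the raw proportion of differing sites between two aligned sequences into an estimated number of substitutions per site. It assumes the Jukes-Cantor model, which is only one possible substitution model, and the function names are hypothetical. The two sequences are the example sequence from the genome evolution slide and its point-mutated version.

```python
import math

def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def jc69_distance(a, b):
    """Jukes-Cantor estimate of the number of substitutions per site."""
    p = p_distance(a, b)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

original = "ACTCGATCGCATCGACTCCTCCAGC"
mutated = "ACTCGTTCGCATCGACTCCTCCAGC"  # one point mutation (A to T)
print(p_distance(original, mutated), jc69_distance(original, mutated))
```

The corrected distance is always at least as large as the raw proportion, since the model accounts for substitutions hidden by later changes at the same site.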

1 The phylogeny of life

Williams et al., Nature 2013

Improvements in:

• probabilistic models

• data available

1 The origin of viral strains

Boussau, Guéguen, Gouy, Evolutionary Bioinformatics 2009.

The N HIV strain originated through recombination between a human and a chimpanzee virus

First 1353 sites / ~1200 remaining sites
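The idea of splitting the alignment at a breakpoint and asking which relative the sequence resembles on each side can be sketched as follows. This is a simplified illustration based on raw sequence identity, not the phylogenetic method of the cited study; the strain names, sequences and breakpoint position are made up.

```python
def identity(a, b):
    """Fraction of identical sites between two aligned sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def closest_by_segment(query, references, breakpoint):
    """Name the reference most similar to `query` on each side of `breakpoint`."""
    result = []
    for start, end in [(0, breakpoint), (breakpoint, len(query))]:
        best = max(
            references,
            key=lambda name: identity(query[start:end], references[name][start:end]),
        )
        result.append(best)
    return result

# Toy, made-up alignment of 20 sites with a breakpoint after site 10.
references = {
    "strain_A": "A" * 10 + "C" * 10,
    "strain_B": "G" * 10 + "T" * 10,
}
query = "A" * 9 + "G" + "T" * 9 + "C"
print(closest_by_segment(query, references, 10))  # ['strain_A', 'strain_B']
```

A sequence whose closest relative changes across the breakpoint, as in this toy case, is a candidate recombinant.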

Contagion, Steven Soderbergh, 2011

1 Forensic analyses

Scaduto et al., PNAS 2010

The problem:

• CC01 is an HIV-positive male, accused by several partners of hiding his seropositivity and infecting them, which led to a trial

• 1 accused male, 6 partners, all seropositive

• HIV sequences available from each of them

• How can we tell whether CC01 likely infected his partners?

Use the HIV sequences to build a phylogenetic tree.
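Building a tree from sequences can be sketched with UPGMA, a simple distance-based clustering method. This is an assumption made for illustration: it is much cruder than the probabilistic methods discussed in this talk and is not the method of the cited study. The sample names and distances are made up.

```python
import itertools

def upgma(names, dist):
    """Build a tree (nested tuples) by repeatedly merging the two closest clusters.

    `dist` maps frozenset({a, b}) to the distance between samples a and b.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of leaves
    d = dict(dist)
    while len(clusters) > 1:
        # Pick the pair of clusters with the smallest distance.
        a, b = min(itertools.combinations(clusters, 2),
                   key=lambda pair: d[frozenset(pair)])
        size_a = clusters.pop(a)
        size_b = clusters.pop(b)
        merged = (a, b)
        # Distance to the new cluster is the size-weighted average.
        for other in clusters:
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))]
                + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))

# Made-up pairwise distances between four samples.
pairs = {("A", "B"): 0.02, ("C", "D"): 0.04,
         ("A", "C"): 0.10, ("A", "D"): 0.10,
         ("B", "C"): 0.10, ("B", "D"): 0.10}
dist = {frozenset(k): v for k, v in pairs.items()}
print(upgma(["A", "B", "C", "D"], dist))  # (('A', 'B'), ('C', 'D'))
```

In a transmission setting, the interest lies in which samples group together and in which sample's sequences the others are nested within.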

Evidence used to establish that CC01 had infected his partners

1 Inferring the phylogeny

Purposes:

• Inferring the species phylogeny

• Reconstructing the evolutionary history of infectious agents

• Reconstructing transmission histories (e.g. forensic analyses)

2 Phylogeography

Question:

• How did these organisms get to be where they are?

Mus musculus, GBIF database

2 Phylogeography

Faria et al., Science 2014

Models:

• Add spatial information at the leaves

• Use discrete or continuous models to reconstruct ancestral ranges
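As a rough illustration of reconstructing an ancestral range from locations at the leaves, here is a minimal sketch using Fitch parsimony. Parsimony is swapped in here because it is short to write; it is a simpler technique than the discrete probabilistic models mentioned above. The tree shape, sample names and locations are made up.

```python
def fitch(tree, leaf_states):
    """Return (candidate states at the root, minimum number of state changes).

    `tree` is a leaf name (str) or a pair (left_subtree, right_subtree).
    """
    if isinstance(tree, str):
        return {leaf_states[tree]}, 0
    left, right = tree
    left_states, left_cost = fitch(left, leaf_states)
    right_states, right_cost = fitch(right, leaf_states)
    common = left_states & right_states
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # The children disagree: keep both options and count one change.
    return left_states | right_states, left_cost + right_cost + 1

# Made-up tree and sampling locations.
tree = ((("s1", "s2"), "s3"), ("s4", "s5"))
locations = {"s1": "Africa", "s2": "Africa", "s3": "Europe",
             "s4": "Africa", "s5": "Asia"}
print(fitch(tree, locations))  # ({'Africa'}, 2)
```

Probabilistic discrete models answer the same question but also use branch lengths and return a probability for each candidate location.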

2 Phylogeography

Landis et al., Syst. Biol. 2014

2 HIV phylogeography

Faria et al., Science 2014

2 Phylogeography

Purposes:

• Inferring the species geographical range through time

• Reconstructing the evolutionary spread of infectious agents

• Investigating plate tectonics

3 Diversification history

How did species diversify? Were there bursts of speciation, or mass extinctions?

How many species/individuals through time?

Models:

• Modelling events of speciation, and events of extinction

• In some cases, can be dependent on other parameters
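A model of speciation and extinction events can be sketched as a constant-rate birth-death process, simulated forward in time. This is only the simplest version, assuming fixed rates per lineage; the rate values and function name are illustrative, and the models used in the studies cited here are richer.

```python
import random

def simulate_birth_death(birth, death, t_max, rng, n0=1):
    """Simulate lineage counts through time; returns a list of (time, n_lineages).

    `birth` and `death` are per-lineage speciation and extinction rates.
    """
    t, n = 0.0, n0
    history = [(t, n)]
    while n > 0:
        # Waiting time to the next event, over all lineages and both event types.
        t += rng.expovariate(n * (birth + death))
        if t > t_max:
            break
        if rng.random() < birth / (birth + death):
            n += 1  # speciation
        else:
            n -= 1  # extinction
        history.append((t, n))
    return history

history = simulate_birth_death(1.0, 0.5, 3.0, random.Random(1))
print(history[-1])  # time of the last event and number of lineages after it
```

Inference runs this logic in reverse: given the branching times of a dated phylogeny, estimate the speciation and extinction rates that make those times most probable.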

3 Species of birds through time

Jetz et al., Nature 2012

3 Speciations of birds across the globe

Jetz et al., Nature 2012

3 Phylodynamics of HCV in Egypt

Drummond et al., MBE 2005

The huge increase in the number of viruses coincides with the extensive use of an antischistosomiasis treatment from 1920 to 1980

3 Diversification history

Purposes:

• Inferring the number of species/individuals through time

• Finding major radiation/extinction events

• Reconstructing past epidemics

4 Ancestral lifestyles

How did ancestral species live? What temperature did they like most? How long did they live?

Models:

• Correlating molecular evolution with phenotypic traits

4 Inferring growth temperature across the tree of life

Idea: reconstruct ancestral sequences in silico, and predict ancestral growth temperatures

Usual model: all branches evolve according to the same model

Better model: different models for different branches

Boussau et al., Nature 2008

4 Inferring growth temperature across the tree of life

Late Heavy Bombardment 3.8 Bya?

4 Joint inference of rates, dates, and traits

Lartillot and Delsuc, Evolution 2012

4 Ancestral lifestyles

Purposes:

• Inferring characteristics of ancient organisms

• Inferring characteristics of ancient environments

• Finding/using correlations between phenotype evolution and genotype evolution

5 Selective pressures in extant species

Which sites in the genome of species X are important?

Models:

• Usual models of sequence evolution

• Models of insertion-deletion

• Hidden Markov Models that run along the genome

5 Conservation across species indicates function

Brown, Sanger, Kitai. Biochem. J. 1955.

The UCSC genome browser uses conservation across 100 vertebrates to detect functional regions
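The principle that conserved sites are likely functional can be sketched by scoring each column of a multiple alignment. The score below is based on Shannon entropy and is an assumption made for illustration; it is not the method used by the UCSC genome browser, and the alignment is a made-up toy example.

```python
import math
from collections import Counter

def column_conservation(alignment):
    """Score each alignment column from 0 (four bases equally frequent) to 1 (identical)."""
    scores = []
    for column in zip(*alignment):
        counts = Counter(column)
        total = len(column)
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        # 2 bits is the maximum entropy for a four-letter alphabet.
        scores.append(1.0 - entropy / 2.0)
    return scores

# Toy alignment: one sequence per species, already aligned.
alignment = ["ACGT",
             "ACGA",
             "ACTC",
             "ACAG"]
print(column_conservation(alignment))  # [1.0, 1.0, 0.25, 0.0]
```

Unlike this sketch, phylogeny-aware methods weigh species by their relatedness, so that agreement among close relatives counts for less than agreement among distant ones.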

5 How to best predict cancer-causing mutations?

Gnad et al., BMC Genomics 2013

Comparison across 12 methods

5 Selective pressures in extant species

Purposes:

• Screening a genome for new functional elements

• Evaluating the severity of candidate mutations (e.g. genetic disease, cancer)

• Finding sites to target in a pest that needs controlling

6 Application to cell lineages

As cells divide by mitosis, mutations accumulate. Phylogenetic approaches can therefore be used to address developmental questions:

• How similar is development across individuals?

• Do the first cells produced during development contribute equally to the adult organism?

• …

Models:

• Usual models of sequence evolution

• Models of microsatellite evolution
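Why accumulated mutations record the cell lineage can be sketched with a toy simulation. It assumes, for simplicity, that every division adds a fixed number of new, unique mutations to each daughter cell; real mutation counts are random and much sparser, and the function name is hypothetical.

```python
import itertools

def grow_lineage(generations, mutations_per_division):
    """Return the mutation set of every cell after `generations` rounds of division."""
    new_id = itertools.count()  # each mutation gets a unique integer label
    cells = [frozenset()]       # start from one unmutated cell
    for _ in range(generations):
        next_cells = []
        for cell in cells:
            for _daughter in range(2):
                new = {next(new_id) for _ in range(mutations_per_division)}
                # A daughter inherits all parental mutations plus its own.
                next_cells.append(cell | new)
        cells = next_cells
    return cells

cells = grow_lineage(3, 2)
print(len(cells[0] & cells[1]))  # sister cells share 4 mutations
print(len(cells[0] & cells[2]))  # cousin cells share 2 mutations
print(len(cells[0] & cells[4]))  # cells split at the first division share 0
```

The more recently two cells shared an ancestor, the more mutations they have in common, which is the signal a phylogenetic method uses to rebuild the lineage tree.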

6 Application to cell lineages

Behjati et al., Nature 2014

Contributions of early embryonic cells to adult tail cell populations

Purposes:

• Learning about development

• Learning about cancer evolution (e.g. using phylogeography to understand cancer spread)

Conclusions

• Genomes contain a lot of information about their history and about how they work

• The comparative approach is a powerful way to learn about the function of a stretch of sequence

• Thanks to probabilistic models, one can exploit the huge amount of information in genomes to ask a large number of interesting questions

Slides available on SlideShare: http://www.slideshare.net/boussau