Post on 20-Jun-2015
LBBE, CNRS, Université de Lyon
Evolutionary genomics
Bastien Boussau
boussau@gmail.com
@bastounette
Chance and necessity
Evolution
Function
…ATCGACATCAGCATCAGCACTAC…
Evolution in our genomes
Brown, Sanger, Kitai. Biochem. J. 1955.
Genomes as Documents of Evolutionary History
What information can we extract from genome sequences?
1. Species phylogeny
2. Phylogeography
3. Diversification history
4. Ancestral lifestyles
5. Selective pressures in extant species
6. Application to cell lineages
Genome evolution
Genome sequence: …ACTCGATCGCATCGACTCCTCCAGC…
Processes, each illustrated on this sequence:
• point mutations: …ACTCGTTCGCATCGACTCCTCCAGC…
• insertions/deletions: …ACTCGATCGCATCGAAAACTCCTCCAGC…
• duplications/losses: …ACTCGATCGCATCGACTCCTTCCTCCAGC…
• rearrangements: …ACTCGATCGCAAGCTCTCCTCCAGC…
These processes are shaped by: population genetics, molecular machinery, species phylogeny, environment
Using genomes for statistical inference
Inferential statistics
Boussau and Daubin, TREE 2010
• Using computers
• Probabilistic models (e.g. models of sequence evolution)
• “What I cannot create, I do not understand.” (Feynman, 1988)
• “What I cannot simulate, I do not understand.”
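In the spirit of "what I cannot simulate, I do not understand", here is a minimal sketch that simulates one branch of sequence evolution under the Jukes-Cantor (JC69) model, the simplest probabilistic model of nucleotide substitution. It assumes equal base frequencies, equal substitution rates, and branch lengths measured in expected substitutions per site; the function names are illustrative, not taken from any library.

```python
import math
import random

NUCLEOTIDES = "ACGT"


def jc69_prob_same(t: float) -> float:
    """Probability that a site carries the same base at both ends of a branch
    of length t (expected substitutions per site) under JC69."""
    return 0.25 + 0.75 * math.exp(-4.0 * t / 3.0)


def evolve_along_branch(seq: str, t: float, rng: random.Random) -> str:
    """Return a simulated descendant of `seq` after a branch of length t under JC69."""
    p_same = jc69_prob_same(t)
    descendant = []
    for base in seq:
        if rng.random() < p_same:
            descendant.append(base)
        else:
            # Under JC69 the three other bases are equally likely.
            descendant.append(rng.choice([b for b in NUCLEOTIDES if b != base]))
    return "".join(descendant)


rng = random.Random(42)  # fixed seed for reproducibility
ancestor = "".join(rng.choice(NUCLEOTIDES) for _ in range(30))
descendant = evolve_along_branch(ancestor, 0.1, rng)
print(ancestor)
print(descendant)
```

As t grows, the probability of keeping the same base falls towards 1/4, i.e. the sequences become saturated.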
1 Inferring the phylogeny
Models:
• Modelling substitution events
• In some cases, modelling insertions and deletions
• In some cases, modelling allele sorting
• In some cases, modelling gene duplications, losses and transfers
• In some cases, modelling hybridization
• Dates of speciation can also be inferred with a model of rate evolution
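Modelling substitution events matters because observed differences underestimate the number of substitutions that actually occurred (several hits can fall on the same site). A minimal sketch, assuming aligned DNA and the JC69 model, where the corrected distance is d = -3/4 ln(1 - 4p/3) for an observed proportion p of differing sites. The dating function assumes a strict clock (one constant rate), the simplest special case of a model of rate evolution; relaxed-clock models let that rate itself evolve along the tree. All names are illustrative.

```python
import math


def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of aligned, ungapped sites at which two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned (same length)")
    sites = [(a, b) for a, b in zip(seq1, seq2) if a != "-" and b != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in sites) / len(sites)


def jc69_distance(seq1: str, seq2: str) -> float:
    """JC69-corrected distance, in expected substitutions per site."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        return math.inf  # saturated: the correction is undefined
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def strict_clock_divergence_time(distance: float, rate: float) -> float:
    """Time since two lineages split under a strict clock.
    Substitutions accumulate along both lineages, hence the factor 2."""
    return distance / (2.0 * rate)


a = "ACGTACGTAC"
b = "ACGTACGTAA"
d = jc69_distance(a, b)  # p = 0.1; the corrected distance is slightly larger
print(round(d, 4))
```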
1 The phylogeny of life
Williams et al., Nature 2013
Improvements in:
• probabilistic models
• available data
1 The origin of viral strains
Boussau, Guéguen, Gouy, Evolutionary Bioinformatics 2009.
1353 first sites ~1200 remaining sites
HIV-1 group N originated through recombination between a human virus and a chimpanzee virus.
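Splitting the alignment (here, the first 1353 sites versus the remaining ~1200) and finding that the two parts point to different relatives is the signature of recombination. The sketch below illustrates only that idea with a crude nearest-reference distance, assuming the breakpoint is already known; the cited study relied on probabilistic models, and the sequences, names and split site below are hypothetical.

```python
def p_distance(seq1: str, seq2: str) -> float:
    """Proportion of differing sites between two aligned, equal-length segments."""
    if len(seq1) != len(seq2) or not seq1:
        raise ValueError("segments must be non-empty and aligned")
    return sum(x != y for x, y in zip(seq1, seq2)) / len(seq1)


def closest_reference(query: str, references: dict, start: int, end: int) -> str:
    """Name of the reference closest to `query` over alignment columns [start, end)."""
    segment = query[start:end]
    return min(references, key=lambda name: p_distance(segment, references[name][start:end]))


# Toy alignment: names, sequences and split site are hypothetical.
references = {
    "human_virus": "ACGTACGTAC" + "GGATCCAAGT",
    "chimp_virus": "ACTTACCTAG" + "GCATCGAAGA",
}
query = "ACGTACGTAC" + "GCATCGAAGA"  # first half human-like, second half chimp-like
split_site = 10
print(closest_reference(query, references, 0, split_site))           # human_virus
print(closest_reference(query, references, split_site, len(query)))  # chimp_virus
```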
Contagion, Steven Soderbergh, 2011
1 Forensic analyses
Scaduto et al., PNAS 2010
The problem:
• CC01 is an HIV-positive man, accused by several partners of hiding his seropositivity and infecting them in the process ==> trial
• 1 accused man, 6 partners, all seropositive
• HIV sequences available from each of them
• How can we tell whether CC01 is likely to have infected his partners?
Use the HIV sequences to build a phylogenetic tree.
Evidence used to establish that CC01 had infected his partners
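A distance-based sketch of the tree-building step, assuming pairwise distances (for example JC69 distances) have already been computed. It uses UPGMA, a simple clustering method chosen here only for brevity: it is not the method of Scaduto et al., and it assumes a constant rate of evolution. The sample names and distances are hypothetical, not data from the study.

```python
from itertools import combinations


def upgma(names, dist):
    """Build a rooted tree with UPGMA.

    names: taxon labels
    dist:  dict mapping frozenset({x, y}) -> distance between taxa x and y
    Returns the topology as a Newick string (branch lengths omitted).
    """
    sizes = {name: 1 for name in names}  # cluster label -> number of taxa
    d = dict(dist)
    while len(sizes) > 1:
        # Merge the two closest clusters.
        a, b = min(combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        # Distance to the new cluster = size-weighted average of the old distances.
        for c in sizes:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    (tree,) = sizes
    return tree + ";"


# Hypothetical distances: CC01, two partners (P1, P2) and an unrelated control.
pairs = {
    ("CC01", "P1"): 0.02, ("CC01", "P2"): 0.03, ("P1", "P2"): 0.04,
    ("CC01", "Ctrl"): 0.10, ("P1", "Ctrl"): 0.11, ("P2", "Ctrl"): 0.12,
}
dist = {frozenset(k): v for k, v in pairs.items()}
tree = upgma(["CC01", "P1", "P2", "Ctrl"], dist)
print(tree)  # (Ctrl,(P2,(CC01,P1)));
```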
1 Inferring the phylogeny
Purposes:
• Inferring the species phylogeny
• Reconstructing the evolutionary history of infectious agents
• Reconstructing transmission histories (e.g. forensic analyses)
2 Phylogeography
Question:
• How did these organisms come to be where they are?
Mus musculus, GBIF database
2 Phylogeography
Faria et al., Science 2014
Models:
• Add spatial information at the leaves of the tree
• Use discrete or continuous models to reconstruct ancestral ranges
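For the discrete case, ancestral locations can be reconstructed with Felsenstein's pruning algorithm. The sketch assumes a symmetric Mk model, in which lineages move between any two locations at the same rate q, and a uniform prior over locations at the root; the tree, locations and rate are hypothetical, and the published analyses use richer models.

```python
import math


def mk_transition(k: int, q: float, t: float):
    """Transition probabilities of the symmetric Mk model (k states, rate q
    between any two states) over a branch of length t."""
    same = 1.0 / k + (k - 1.0) / k * math.exp(-k * q * t)
    diff = (1.0 - same) / (k - 1.0)
    return [[same if i == j else diff for j in range(k)] for i in range(k)]


def partial_likelihoods(node, states, q):
    """Felsenstein pruning. A leaf is a location label (str); an internal
    node is a list of (child, branch_length) pairs."""
    k = len(states)
    if isinstance(node, str):
        return [1.0 if s == node else 0.0 for s in states]
    result = [1.0] * k
    for child, t in node:
        child_lik = partial_likelihoods(child, states, q)
        P = mk_transition(k, q, t)
        for i in range(k):
            result[i] *= sum(P[i][j] * child_lik[j] for j in range(k))
    return result


def root_location_probabilities(tree, states, q):
    """Posterior probability of each root location (uniform prior, which cancels)."""
    lik = partial_likelihoods(tree, states, q)
    total = sum(lik)
    return {s: l / total for s, l in zip(states, lik)}


states = ["Africa", "Asia", "Europe"]
# Hypothetical tree with sampled locations at the leaves.
tree = [([("Africa", 0.1), ("Africa", 0.2)], 0.1), ("Asia", 0.5)]
probs = root_location_probabilities(tree, states, q=1.0)
print(max(probs, key=probs.get))
```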
2 Phylogeography
Landis et al., Syst. Biol. 2014
2 HIV phylogeography
Faria et al., Science 2014
2 Phylogeography
Purposes:
• Inferring species' geographical ranges through time
• Reconstructing the evolutionary spread of infectious agents
• Investigating plate tectonics
3 Diversification history
How did species diversify? Were there bursts of speciation, or mass extinctions?
How many species/individuals existed through time?
Models:
• Modelling events of speciation and events of extinction
• In some cases, their rates can depend on other parameters
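The simplest model of this kind has constant per-lineage rates of speciation (λ) and extinction (μ), under which the expected number of lineages grows as n0·e^((λ-μ)t). The sketch below only simulates that process forward in time with the Gillespie algorithm; real diversification analyses fit such models to reconstructed phylogenies. Function name and parameter values are illustrative.

```python
import random


def simulate_birth_death(birth: float, death: float, t_max: float,
                         n0: int = 1, seed=None):
    """Gillespie simulation of a constant-rate birth-death process.

    Each lineage speciates at rate `birth` and goes extinct at rate `death`.
    Returns (time, number_of_lineages) recorded at the start and after each event.
    """
    if birth < 0 or death < 0 or birth + death == 0:
        raise ValueError("rates must be non-negative and not both zero")
    rng = random.Random(seed)
    t, n = 0.0, n0
    history = [(t, n)]
    while n > 0:
        t += rng.expovariate(n * (birth + death))  # waiting time to next event
        if t > t_max:
            break
        if rng.random() < birth / (birth + death):
            n += 1  # speciation
        else:
            n -= 1  # extinction
        history.append((t, n))
    return history


history = simulate_birth_death(birth=1.0, death=0.5, t_max=5.0, seed=3)
print(history[-1])
```

Plotting the lineage counts of `history` against time gives a lineages-through-time curve.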
3 Species of birds through time
Jetz et al., Nature 2012
3 Speciations of birds across the globe
Jetz et al., Nature 2012
3 Phylodynamics of HCV in Egypt
Drummond et al., MBE 2005
A huge increase in the number of viruses coincides with the extensive use of an antischistosomiasis treatment from 1920 to 1980.
3 Diversification history
Purposes:
• Inferring the number of species/individuals through time
• Finding major radiation/extinction events
• Reconstructing past epidemics
4 Ancestral lifestyles
How did ancestral species live? What temperatures did they prefer? How long did they live?
Models:
• Correlating molecular evolution with phenotypic traits
4 Inferring growth temperature across the tree of life
Idea: reconstruct ancestral sequences in silico, and use them to predict ancestral growth temperatures
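The prediction step can be sketched as a regression calibrated on extant species and then applied to reconstructed ancestral sequences. This assumes, purely for illustration, a linear relationship between one sequence property (e.g. the GC content of rRNA stems, which correlates with growth temperature) and temperature. The numbers are synthetic, generated from a known line, and the published approach is more involved.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Synthetic values generated from temperature = 200 * gc - 60; NOT measurements.
extant_gc = [0.50, 0.55, 0.60, 0.65, 0.70]
extant_temp = [200 * gc - 60 for gc in extant_gc]
slope, intercept = fit_line(extant_gc, extant_temp)

ancestral_gc = 0.62  # hypothetical value from an in-silico reconstructed sequence
predicted_temp = slope * ancestral_gc + intercept
print(round(predicted_temp, 1))
```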
Usual model: all branches evolve according to the same model
Better model: different models for different branches
Boussau et al., Nature 2008
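Why branch-specific models matter can be shown with a toy, deterministic calculation. It assumes a two-state (GC vs AT) substitution model in which each branch may have its own equilibrium GC content, similar in spirit to the non-homogeneous models referred to above; function names and values are hypothetical.

```python
import math


def expected_gc_after_branch(gc_start, gc_equilibrium, rate, t):
    """Two-state (GC vs AT) Markov model: expected GC fraction after time t.

    With AT->GC rate u and GC->AT rate v, gc_equilibrium = u / (u + v) and
    rate = u + v; GC content relaxes exponentially towards the equilibrium.
    """
    return gc_equilibrium + (gc_start - gc_equilibrium) * math.exp(-rate * t)


def expected_gc_along_path(gc_root, branches, rate=1.0):
    """Follow a root-to-tip path; `branches` lists (gc_equilibrium, length) pairs."""
    gc = gc_root
    for gc_eq, t in branches:
        gc = expected_gc_after_branch(gc, gc_eq, rate, t)
    return gc


# Homogeneous model: the same equilibrium on every branch.
homogeneous = expected_gc_along_path(0.5, [(0.5, 1.0), (0.5, 1.0)])
# Branch-specific model: a GC-rich lineage followed by a GC-poor one (hypothetical).
heterogeneous = expected_gc_along_path(0.5, [(0.7, 1.0), (0.4, 1.0)])
print(homogeneous, round(heterogeneous, 3))
```

A homogeneous model forces one equilibrium on every branch, so it cannot represent lineages whose base composition shifted over time.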
Late Heavy Bombardment 3.8 Bya?
Lartillot and Delsuc, Evolution 2012
4 Joint inference of rates, dates, and traits
4 Ancestral lifestyles
Purposes:
• Inferring characteristics of ancient organisms
• Inferring characteristics of ancient environments
• Finding/using correlations between phenotype evolution and genotype evolution
5 Selective pressures in extant species
Which sites in the genome of species X are important?
Models:
• Usual models of sequence evolution
• Models of insertion-deletion
• Hidden Markov Models that run along the genome
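A minimal sketch of a two-state HMM running along an alignment, in the spirit of phylo-HMMs such as phastCons but heavily simplified: each column is pre-summarized as conserved (c) or variable (v) instead of being scored with a phylogenetic model, and all probabilities are hypothetical. Viterbi decoding then labels each column as lying in a constrained or a neutral region.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable hidden-state path of an HMM, computed in log space."""
    log = math.log
    scores = [{s: log(start_p[s]) + log(emit_p[s][observations[0]]) for s in states}]
    backpointers = []
    for obs in observations[1:]:
        current, pointers = {}, {}
        for s in states:
            prev, best = max(
                ((r, scores[-1][r] + log(trans_p[r][s])) for r in states),
                key=lambda item: item[1],
            )
            current[s] = best + log(emit_p[s][obs])
            pointers[s] = prev
        scores.append(current)
        backpointers.append(pointers)
    # Trace back from the best final state.
    path = [max(scores[-1], key=scores[-1].get)]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    return path[::-1]


states = ("constrained", "neutral")
start_p = {"constrained": 0.5, "neutral": 0.5}
trans_p = {"constrained": {"constrained": 0.95, "neutral": 0.05},
           "neutral": {"constrained": 0.05, "neutral": 0.95}}
emit_p = {"constrained": {"c": 0.9, "v": 0.1},
          "neutral": {"c": 0.3, "v": 0.7}}
# One symbol per alignment column: c = conserved across species, v = variable.
columns = "vvvvvv" + "ccccccccccc" + "vvvvvv"
path = viterbi(columns, states, start_p, trans_p, emit_p)
print("".join("C" if s == "constrained" else "n" for s in path))
```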
5 Conservation across species indicates function
Brown, Sanger, Kitai. Biochem. J. 1955.
The UCSC genome browser uses conservation across 100 vertebrates to detect functional regions
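The simplest way to quantify conservation is column by column in a multiple alignment. The sketch scores each column by the fraction of sequences sharing its most common base; unlike the phylogeny-aware scores shown in genome browsers, it ignores the tree and branch lengths, so it is only an illustration. The alignment is hypothetical.

```python
from collections import Counter


def column_conservation(alignment):
    """Per-column fraction of sequences carrying the column's most common base."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all sequences must be aligned to the same length")
    scores = []
    for i in range(length):
        column = [seq[i] for seq in alignment]
        count_of_most_common = Counter(column).most_common(1)[0][1]
        scores.append(count_of_most_common / len(column))
    return scores


# Toy alignment of four species (hypothetical sequences).
alignment = ["ACGTTA", "ACGTCA", "ACGATA", "ACGTGA"]
scores = column_conservation(alignment)
print(scores)  # [1.0, 1.0, 1.0, 0.75, 0.5, 1.0]
```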
5 How best to predict cancer-causing mutations?
Gnad et al., BMC Genomics 2013
Comparison across 12 methods
5 Selective pressures in extant species
Purposes:
• Screening a genome for new functional elements
• Evaluating the severity of candidate mutations (e.g. genetic disease, cancer)
• Finding sites to target in a pest that needs controlling
6 Application to cell lineages
As cells divide by mitosis, mutations accumulate, so phylogenetic approaches can be used to address developmental questions:
• How similar is development across individuals?
• Do the first cells produced during development contribute equally to the adult organism?
• …
Models:
• Usual models of sequence evolution
• Models of microsatellite evolution
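Because each division copies the parent genome with occasional errors, mutations (here, microsatellite length changes) are inherited as markers of ancestry. A minimal sketch, assuming a single microsatellite locus, synchronous divisions, and a stepwise mutation model with a hypothetical per-division rate `mu`; the floor of one repeat unit is a simplifying assumption. With many loci, cells from closely related lineages tend to have more similar repeat counts, which is what lineage reconstruction exploits.

```python
import random


def simulate_divisions(generations: int, repeat_start: int, mu: float, seed=None):
    """Simulate one microsatellite locus through rounds of cell division.

    Every cell divides into two daughters per generation. Each daughter
    inherits its parent's repeat count and, with probability `mu`, gains or
    loses one repeat unit (stepwise mutation model). Cells are labelled by
    their division history, e.g. "010", so labels sharing a longer prefix
    share a more recent ancestral cell.
    """
    rng = random.Random(seed)
    cells = {"": repeat_start}  # lineage label -> repeat count
    for _ in range(generations):
        daughters = {}
        for label, count in cells.items():
            for side in "01":
                new_count = count
                if rng.random() < mu:
                    new_count += rng.choice((-1, 1))  # one-step slippage
                daughters[label + side] = max(new_count, 1)  # keep at least one repeat
        cells = daughters
    return cells


def squared_repeat_difference(cells, a, b):
    """Squared difference in repeat number between two cells at this locus."""
    return (cells[a] - cells[b]) ** 2


cells = simulate_divisions(generations=4, repeat_start=20, mu=0.1, seed=7)
print(len(cells), squared_repeat_difference(cells, "0000", "1111"))
```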
6 Application to cell lineages
Behjati et al., Nature 2014
Contributions of early embryonic cells to adult tail cell populations
Purposes:
• Learning about development
• Learning about cancer evolution (e.g. using phylogeography to understand cancer spread)
Conclusions
• Genomes contain a lot of information about their history and about how they work
• The comparative approach is a powerful way to learn about the function of a stretch of sequence
• Thanks to probabilistic models, one can exploit the huge amount of information in genomes to ask a large number of interesting questions
Slides available on SlideShare: http://www.slideshare.net/boussau