Reconstructing the past: deep learning for population genetics
Flora Jay and Guillaume Charpiat
LRI (Bioinfo and TAO)
CDS Pitching Day
Overview
1. Introduction: past demography and genetics
2. Extracting information from genomic data
3. State of the art: summary statistics (hand designed) + ABC
4. Deep learning
4a. Challenges: variable input size
4b. Desired properties: invariances
4c. Plan
5. Summary
Introduction
Population genetics and demography
Methods and applications
GENETIC VARIATION
MUTATION, RECOMBINATION
DEMOGRAPHY
● Admixture between populations
● Population structure
● Expansions
● Population size
NATURAL SELECTION
...
E.g. peopling of the world by modern humans
Demographic inference
Methods and applications: identify events, date them, estimate their strength, ...
Why infer past demography?
If you love at least one of the following...
● History, e.g. did we mingle with Neandertals 50k years ago while peopling the Earth?
● Medicine, e.g. is there a mutation increasing the risk of getting breast cancer?
● Evolution, e.g. are Tibetans adapted to altitude, and why? Are plant populations adapted to their environment, and what could be the impact of climate change?
...then you already have a good reason for trying to infer demography!
→ Demography gives a null model to test non-neutral hypotheses, e.g. is the observed signal at a gene due to [demography] or to [demography + selection]?
Where is the information?
Past demography leaves signatures in genetic data
● Population sizes
● Migration / admixture between populations
[Figures: Sousa & Hey 2013; © Slatkin]
Project: Inferring demography using whole genomes
Develop a method using sequence data...
...to identify complex demographic histories.
[Figure: population size through time, from present to past]
Mutation → polymorphism (SNP)
Dim ~ n × 3 Gb
Genetic process
[Figure adapted from Wikimedia Commons: chromosomes traced back through successive generations of ancestors, with recombination points and identical segments (same recent ancestor)]
One individual = 1 chromosome inherited from the father and 1 from the mother = a mosaic of different ancestors
Genetic process
● Recombination along the DNA sequence at each generation → not a single tree for the whole genome BUT multiple genealogies
● Genealogies are influenced by past demography
[Figures: genealogies along the sequence, changing at recombination points and not observed (© Sheehan; Palamara et al. 2012)]
More coalescence when the population is small
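A short worked formula for the statement that a small population produces more coalescence. It rests on assumptions that are not on the slide: a neutral Wright-Fisher population of constant size, with N diploid individuals (2N gene copies), and T_2 denoting the number of generations until two sampled lineages reach their common ancestor.

```latex
P(\text{two lineages coalesce in a given generation}) = \frac{1}{2N},
\qquad
P(T_2 > t) = \left(1 - \frac{1}{2N}\right)^{t} \approx e^{-t/(2N)},
\qquad
\mathbb{E}[T_2] = 2N .
```

Under these assumptions, halving N doubles the per-generation coalescence probability and halves the expected time to the common ancestor, which is why genealogies carry information about past population sizes.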
Inferring demographic history
The approach: combining different types of information to learn about hidden genealogies and thus demography
(1) from “expert statistics” (past and current research): site-frequency spectrum, linkage disequilibrium, diversity, ..., using Approximate Bayesian Computation
(2) by learning interesting features from raw data (PROJECT). Deep learning: build a deep neural network tuned for population genetic data
State of the art: summary statistics + ABC
Understanding the data - Summary statistics
DNA is spatially structured
1. Distant sites: distant histories (different evolutionary trees) → site frequency spectrum = histogram of allele counts
[Figure: site frequency spectrum; x-axis: allele count 1, 2, ..., n; y-axis: counts]
Goal: extract information from genomes about hidden genealogies
Research – What's the data?
Summary statistics
1. Histogram of allele counts at all segregating sites
[Figure: two genealogies (present to past), each with its site frequency spectrum; x-axis: allele count 1, 2, ..., n; y-axis: counts; the first bar is the number of mutations carried by only ONE individual]
Long external branches → excess of mutations carried by only ONE individual
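A minimal sketch of how the histogram of allele counts can be computed. The input encoding is an assumption made for illustration: each sequence is a list of 0/1 values (0 = ancestral allele, 1 = derived allele), one entry per site, and the function name is hypothetical.

```python
from collections import Counter


def site_frequency_spectrum(haplotypes):
    """Return the unfolded SFS as a list.

    Entry k-1 is the number of sites where exactly k of the n sequences
    carry the derived allele (coded 1). Only segregating sites are kept,
    so k runs from 1 to n-1.
    """
    n = len(haplotypes)
    n_sites = len(haplotypes[0])
    # Derived-allele count at each site (column sum).
    counts = Counter(sum(h[j] for h in haplotypes) for j in range(n_sites))
    return [counts.get(k, 0) for k in range(1, n)]


# Toy data (assumption): 4 sequences, 6 sites.
haplotypes = [
    [0, 1, 0, 0, 1, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 0, 1, 0],
    [1, 1, 0, 0, 0, 0],
]
sfs = site_frequency_spectrum(haplotypes)
print(sfs)  # [2, 1, 1]
```

The first entry counts the mutations carried by only one sequence (singletons), the class that is in excess when external branches are long.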
Summary statistics
DNA is spatially structured
1. Distant sites: distant histories (different evolutionary trees)
2. Less distant sites: related histories → linkage disequilibrium (correlation between SNPs)
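A minimal sketch of one common way to measure the correlation between two SNPs: the squared correlation r² between their 0/1 allele vectors. The slide does not say which LD measure is used, so the choice of r² and the function name are assumptions.

```python
def r_squared(site_a, site_b):
    """Squared correlation (r^2) between two biallelic sites.

    Each argument lists the allele (0 or 1) carried by every sequence
    at that site, in the same sequence order.
    """
    n = len(site_a)
    p_a = sum(site_a) / n                                   # frequency of allele 1 at site A
    p_b = sum(site_b) / n                                   # frequency of allele 1 at site B
    p_ab = sum(a * b for a, b in zip(site_a, site_b)) / n   # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                    # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0: fully linked sites
print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))  # 0.0: independent sites
```

Averaging r² over pairs of SNPs binned by physical distance gives an LD-versus-distance curve, which is the kind of statistic that can be fed to ABC.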
3. Adjacent sites share the same history → diversity per region
[Figure: regions with a recent common ancestor vs. regions with an old common ancestor]
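A minimal sketch of diversity per region, taken here to be the average number of pairwise differences in non-overlapping windows. The windowing scheme, the 0/1 encoding and the function names are assumptions for illustration.

```python
from itertools import combinations


def pairwise_diversity(haplotypes, start, end):
    """Average number of differences between two sequences in sites [start, end)."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(
        sum(x != y for x, y in zip(h1[start:end], h2[start:end]))
        for h1, h2 in pairs
    )
    return total / len(pairs)


def diversity_per_window(haplotypes, window):
    """Pairwise diversity in consecutive non-overlapping windows of `window` sites."""
    n_sites = len(haplotypes[0])
    return [
        pairwise_diversity(haplotypes, s, min(s + window, n_sites))
        for s in range(0, n_sites, window)
    ]


# Toy data (assumption): 3 sequences, 6 sites, windows of 3 sites.
haplotypes = [
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 0],
]
diversity = diversity_per_window(haplotypes, window=3)
print(diversity)  # [0.0, 2.0]
```

A window with little diversity suggests a recent common ancestor for that region; a window with many differences suggests an old one.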
And so on...
Approximate Bayesian Computation (ABC)
PopSizeABC pipeline example:
1. Generate randomly thousands of histories (Ne through time, from present to past)
2. Compute summary statistics for each history (allele counts 1, 2, ..., n; LD as a function of distance)
3. Keep histories that produce summary statistics similar to the real ones
4. Infer history (Ne through time, from present to past)
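The pipeline above can be sketched as a rejection-ABC loop. This is a toy illustration, not the PopSizeABC implementation: `toy_simulate` is a hypothetical stand-in for a coalescent simulator, the history is reduced to a single constant Ne, and the prior bounds, the single summary statistic and the observed value are all assumptions.

```python
import random


def toy_simulate(ne, n_loci, rng):
    """Toy stand-in for a coalescent simulator (assumption).

    Returns one summary statistic: the mean pairwise coalescence time over
    independent loci, each drawn as Exponential with mean 2 * ne generations.
    """
    return sum(rng.expovariate(1.0 / (2 * ne)) for _ in range(n_loci)) / n_loci


def abc_rejection(observed_stat, n_sims, keep_fraction, rng, n_loci=200):
    """Keep the simulated histories whose statistic is closest to the observed one."""
    sims = []
    for _ in range(n_sims):
        ne = rng.uniform(100, 10_000)          # 1. draw a history from the prior
        stat = toy_simulate(ne, n_loci, rng)   # 2. compute its summary statistic
        sims.append((abs(stat - observed_stat), ne))
    sims.sort()                                # 3. rank by distance to the real data
    n_keep = max(1, round(n_sims * keep_fraction))
    return [ne for _, ne in sims[:n_keep]]


rng = random.Random(0)
accepted = abc_rejection(observed_stat=4000.0, n_sims=5000, keep_fraction=0.01, rng=rng)
posterior_mean = sum(accepted) / len(accepted)  # 4. infer the history
print(len(accepted), round(posterior_mean))
```

With an observed statistic of 4000 generations, the accepted values of Ne concentrate around 2000, since the toy simulator has mean 2 × Ne. A real application replaces the toy simulator with simulated genomes and the single statistic with the SFS and LD vectors.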
Application to real data
● Human data: bottleneck + Paleolithic and Neolithic expansions in European data (Jay et al., in prep)
● Cattle breed data: population decline (PopSizeABC; Boitard, Rodríguez, Jay et al., PLoS Genet. 2016)
[Figures: population size (log scale) from present to past; labels: Paleolithic, Neolithic, Danubian route, Mediterranean route, Bos taurus, Bos primigenius (wild aurochs)]
Project: Deep Learning instead
Project
[Diagram, ABC: demography ∼ prior → simulations → pseudo data → summary stats → comparison with observations]
[Diagram, project: pseudo data → multi-layer neural networks → demography]
Learning:
- data features
- functions predicting demographic model/parameters/...
Project: learn a global relationship between the data and the demographic parameters with a deep neural network
Project
Learn a global relationship between the data and the demographic parameters with a deep neural network
Motivation from previous work in population genetics:
● Improvement when using a simple neural network inside ABC to learn locally the relationship between the summary statistics S(.) and the demographic parameters (Blum & François 2010; Boitard et al. 2016; Jay et al., in prep; …)
● A global relationship between S(.) and population genetic parameters can be learned using fully connected deep neural networks (Sheehan & Song 2016) → they get rid of the rejection step (method tested on coarse demographic models only)
● Natural next step: learn the features automatically from raw data → get rid of S(.), gain information?
Project - Challenges
Learn a global relationship between the data and the demographic parameters with a deep neural network
Challenges due to input data:
● Raw genetic data are large (larger than images)
● The number of sequenced individuals varies
● The length of the sequences varies
➔ Need for flexibility & generalization w.r.t. input size
➔ E.g. having learned to predict past demography for sets of 10 sequences of length 100 000 000, we do not want to start from scratch for a new set of 9 sequences of length 70 000 000.
➔ Recurrent networks can to some extent deal with (1D) variable length, but are not necessarily suited here (the data are 2D, and there is more information in considering a whole column at once...)
Project – Desired properties
Learn a global relationship between the data and the demographic parameters with a deep neural network
Incorporate coalescence knowledge in the architecture:
● Invariance by translation along the genome
● Invariance by permutation of the individuals*
● Correlation decreases with distance (but the rate depends on the demography)
● ...
* but maybe not by permutation of the haplotypes
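A minimal sketch of one way to obtain invariance by permutation of the individuals: summarise each site (column) with symmetric functions of the rows. This only illustrates the property; it is not the architecture of the project, and the function name and the choice of mean/max/min are assumptions.

```python
import random


def column_features(haplotypes):
    """Summarise every site with functions that ignore the order of the rows.

    `haplotypes` is a list of equal-length 0/1 lists, one per sequence.
    Returns one (mean, max, min) tuple per site.
    """
    n = len(haplotypes)
    features = []
    for column in zip(*haplotypes):
        features.append((sum(column) / n, max(column), min(column)))
    return features


# Toy data (assumption): 4 sequences, 5 sites.
haplotypes = [
    [0, 1, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 0, 1],
    [1, 1, 0, 0, 0],
]
shuffled = haplotypes[:]
random.Random(1).shuffle(shuffled)  # reorder the individuals

# Same features whatever the order of the individuals.
same = column_features(haplotypes) == column_features(shuffled)
print(same)  # True
```

Any layer that applies the same transformation to every individual and then aggregates with a sum, mean or max has the same property, so it can be built into the network rather than learned from data.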
Project - Plan
Learn a global relationship between the data and the demographic parameters with a deep neural network
● Step 1: describe all properties specific to genetic data AND population genetic data to be used in the DNN
● Step 2: practical test with common DNN layers such as recurrent and convolution layers
● Step 3: study flexible architectures that scale to different input sizes
→ naturally scalable node functions (e.g. max, average, variance...)
→ training a family of neural nets, by combining pre-defined families of layers (indexed by input size)
→ meta DNN generating architectures (takes the data size as input and outputs a neural network)
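A minimal sketch of the "naturally scalable node functions" idea from Step 3: max, average and variance map an input of any size to an output of fixed size. The two-stage pooling below (over individuals, then along the genome) and the function name are assumptions for illustration, not the planned architecture.

```python
from statistics import fmean, pvariance


def fixed_size_summary(haplotypes):
    """Map a 0/1 matrix of any shape to a vector of length 3."""
    n = len(haplotypes)
    # Average over individuals: defined for any number of rows.
    frequencies = [sum(column) / n for column in zip(*haplotypes)]
    # Max / average / variance along the genome: defined for any number of columns.
    return (max(frequencies), fmean(frequencies), pvariance(frequencies))


# Toy inputs of different sizes (assumption).
small = [[0, 1], [1, 1]]                                      # 2 sequences, 2 sites
large = [[0, 1, 0, 0, 1], [1, 1, 0, 0, 0], [0, 1, 0, 1, 0]]   # 3 sequences, 5 sites
summary_small = fixed_size_summary(small)
summary_large = fixed_size_summary(large)
print(len(summary_small), len(summary_large))  # 3 3
```

Because the output size does not depend on the input size, layers placed after such pooling can be shared between datasets with different numbers of sequences and different sequence lengths.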
Summary
Requested funding: grant for a Master's internship = 3000 euros
References:
- Boitard S, Rodríguez W, Jay F, et al. Inferring population size history from large samples of genome-wide molecular data: an approximate Bayesian computation approach. PLoS Genet. 2016;12(3):e1005877.
- Sheehan S, Song YS. Deep learning for population genetic inference. PLoS Comput Biol. 2016;12(3):e1004845.
- Stanley KO, D'Ambrosio DB, Gauci J. A hypercube-based encoding for evolving large-scale neural networks. Artificial Life. 2009;15(2):185-212.