![Page 1: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/1.jpg)
Added value of whole-genome sequence data to genomic predictions in dairy cattle
Rianne van Binsbergen1,2, Mario Calus1, Chris Schrooten3, Fred van Eeuwijk2, Roel Veerkamp1, Marco Bink2
1 Animal Breeding & Genetics Centre, Wageningen UR (NL)
2 Biometris, Wageningen UR (NL)
3 CRV (cattle breeding company), Arnhem (NL)
![Page 2: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/2.jpg)
Genomic Prediction in agricultural species
Goddard & Hayes (2009) Nature Reviews Genetics 10:381
Reference population:
1) Estimate effects for each SNP (w)
2) Generate a prediction equation that combines all the marker genotypes with their effects to predict the breeding value of each individual
Each SNP is represented by a variable (x), which takes the values 0 [A A], 1 [A B] or 2 [B B]
Apply the prediction equation to a group of individuals that have genotypes but not phenotypes: estimated genomic breeding values
Select the best individuals for breeding
Advantages:
• Select at early age (before phenotypes available)
• Save costs to phenotype candidates
• Increase accuracy of predicted breeding values
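The prediction equation described on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' software: the function name `predict_gebv`, the toy genotypes and the effect values are all made up for the example, and the SNP effects (w) are assumed to have been estimated already in a reference population.

```python
import numpy as np

def predict_gebv(genotypes, snp_effects, mean=0.0):
    """Genomic breeding value per individual: mean + sum_j x_ij * w_j.

    genotypes   : (n_individuals, n_snps) array coded 0 [A A], 1 [A B], 2 [B B]
    snp_effects : (n_snps,) array of estimated SNP effects (w)
    """
    genotypes = np.asarray(genotypes, dtype=float)
    snp_effects = np.asarray(snp_effects, dtype=float)
    return mean + genotypes @ snp_effects

# Toy selection candidates: genotypes are known, phenotypes are not.
candidates = np.array([
    [0, 1, 2],
    [2, 2, 0],
    [1, 0, 0],
])
effects = np.array([0.5, -0.2, 0.1])  # hypothetical estimated effects

gebv = predict_gebv(candidates, effects)
ranking = np.argsort(-gebv)  # best candidate first
print(gebv, ranking)
```

Ranking the candidates on `gebv` corresponds to the "select the best individuals for breeding" step.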
![Page 3: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/3.jpg)
One seminal paper on Genomic Prediction
Dense marker maps: SNP markers at 1 cM density
Prediction accuracy:
• Least squares method: 0.32
• Genomic BLUP method: 0.73
• Bayesian methods (A, B): 0.85
Conclusion: “selection on genetic values predicted from markers could substantially increase the rate of genetic gain in animals and plants, especially if combined with reproductive techniques to shorten the generation interval”
Simulation Study
![Page 4: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/4.jpg)
Another (seminal) paper on Genomic Prediction
“Only few SNPs were useful for predicting the trait [because they were in linkage disequilibrium (LD) with mutations causing variation in the trait] while many SNPs were not useful.”
Prediction of Total Genetic Value Using Genome-Wide Dense Marker Maps T. H. E. Meuwissen,* B. J. Hayes† and M. E. Goddard†,‡
Higher accuracy in genomic predictions since causal mutation is included (assumption)
No dependency on LD
Persistency across generations
Genomic prediction across breeds
“In the case of whole-genome sequence data, the polymorphisms that are causing the genetic differences between the individuals are among those being analyzed.”
![Page 5: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/5.jpg)
Genomic predictions from whole-genome sequence data
Tremendous increase in number of SNPs (more noise)
Large (sequence) data are required
Solution:
Sequence core set of individuals (e.g. founders)
Impute whole-genome sequence genotypes of other individuals
Accuracy of imputation to whole-genome sequence data was generally high for imputation from the 777K SNP panel (Van Binsbergen et al., Genet Sel Evol 2014, in press)
This presentation: first results of genomic prediction with imputed whole-genome sequence data for 5503 bulls with accurate phenotypes
![Page 6: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/6.jpg)
Dataset: SNP genotypes & trait phenotypes
1000 bull genomes project
28M SNP genotypes
De-regressed progeny-based proofs (DRP¹) and associated effective daughter contributions (EDC²) for:
Somatic cell score (SCS)
Interval between first and last insemination (IFL)
Protein yield (PY)
¹ VanRaden et al. 2009 (J Dairy Sci)
² VanRaden and Wiggans 1991 (J Dairy Sci)
5503 Holstein Friesian bulls
777K SNP genotypes (Illumina BovineHD BeadChip)
5503 Holstein Friesian bulls
12M SNP genotypes (MAF > 0.005, imputation accuracy > 0.05)
Imputation: Beagle v4 software
429 bulls (multiple breeds)
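The two SNP filters named on this slide (MAF > 0.005 and imputation accuracy > 0.05) could be applied along the following lines. This is a sketch under assumptions: genotypes are taken to be coded 0/1/2 (or as allele dosages on that scale), the per-SNP imputation accuracy is assumed to be available as an array, and `select_snps` is a hypothetical helper, not part of Beagle or of the authors' pipeline.

```python
import numpy as np

def select_snps(genotypes, imputation_accuracy, min_maf=0.005, min_accuracy=0.05):
    """Return a boolean mask of SNPs passing both thresholds, plus the MAF.

    genotypes           : (n_animals, n_snps) array, coded 0/1/2 (assumed)
    imputation_accuracy : (n_snps,) array of per-SNP accuracy (assumed available)
    """
    genotypes = np.asarray(genotypes, dtype=float)
    accuracy = np.asarray(imputation_accuracy, dtype=float)
    freq_b = genotypes.mean(axis=0) / 2.0       # frequency of the counted allele
    maf = np.minimum(freq_b, 1.0 - freq_b)      # minor allele frequency
    keep = (maf > min_maf) & (accuracy > min_accuracy)
    return keep, maf

# Toy data: 4 animals, 3 SNPs.
toy_genotypes = np.array([
    [0, 0, 2],
    [0, 1, 2],
    [0, 2, 2],
    [0, 1, 1],
])
toy_accuracy = np.array([0.90, 0.02, 0.80])  # hypothetical values

keep, maf = select_snps(toy_genotypes, toy_accuracy)
filtered = toy_genotypes[:, keep]
print(keep, maf)
```

In the toy data the first SNP is monomorphic (fails the MAF filter) and the second has low imputation accuracy, so only the third is retained.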
![Page 7: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/7.jpg)
Prediction reliability
= squared correlation between original phenotype (DRP) and estimated genetic values (GEBV)
Validation population:
• Youngest bulls with EDC 0
• Mainly sons of bulls in training population
• Mimics breeding practice
[Diagram: for both genotype sets (777K SNP, Illumina BovineHD BeadChip, and 12M SNP, MAF > 0.005 and imputation accuracy > 0.05), the 5503 Holstein Friesian bulls are split into a training population (4322 old bulls) and a validation population (1181 young bulls). Differences?]
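The validation scheme on this slide, training on the oldest bulls and measuring reliability on the youngest, can be sketched as follows. The helper names, the toy birth years and the toy DRP/GEBV values are illustrative assumptions; only the definition of reliability as the squared correlation between DRP and GEBV comes from the slide.

```python
import numpy as np

def split_by_age(birth_years, n_validation):
    """Indices of training (oldest) and validation (youngest) animals."""
    order = np.argsort(np.asarray(birth_years), kind="stable")
    return order[:-n_validation], order[-n_validation:]

def prediction_reliability(drp, gebv):
    """Squared Pearson correlation between DRP and GEBV."""
    r = np.corrcoef(np.asarray(drp, dtype=float),
                    np.asarray(gebv, dtype=float))[0, 1]
    return float(r ** 2)

# Toy example: 4 bulls, the 2 youngest form the validation set.
birth_years = [2001, 2005, 1999, 2004]
train_idx, valid_idx = split_by_age(birth_years, n_validation=2)

# Hypothetical DRP and GEBV for a validation set of 4 bulls.
drp_valid = np.array([1.0, 2.0, 3.0, 4.0])
gebv_valid = np.array([0.8, 2.3, 2.9, 4.4])
reliability = prediction_reliability(drp_valid, gebv_valid)
print(train_idx, valid_idx, reliability)
```

With the real data the same split would use `n_validation=1181`, leaving 4322 bulls for training.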
![Page 8: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/8.jpg)
Genomic prediction – 2 methods
GBLUP
Genome-enabled best linear unbiased prediction
Assumes the distribution of QTL effects is close to the infinitesimal model (all SNPs have an equally small effect)
Build a genomic relationship matrix to model variance-covariance structure
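One common way to build a genomic relationship matrix is VanRaden's first method, G = ZZ' / (2 Σ p(1 − p)), where Z holds genotypes centered by twice the allele frequency. The slide does not say which formulation was used, so the sketch below is an assumption chosen for illustration, with allele frequencies estimated from the genotypes themselves.

```python
import numpy as np

def genomic_relationship_matrix(genotypes):
    """G = Z Z' / (2 * sum_j p_j (1 - p_j)), with Z = M - 2p.

    genotypes : (n_animals, n_snps) array coded 0/1/2
    """
    m = np.asarray(genotypes, dtype=float)
    p = m.mean(axis=0) / 2.0                 # allele frequency per SNP
    z = m - 2.0 * p                          # center each SNP column
    scale = 2.0 * np.sum(p * (1.0 - p))
    if scale == 0.0:
        raise ValueError("all SNPs are monomorphic; G is undefined")
    return z @ z.T / scale

toy_genotypes = np.array([
    [0, 1, 2, 1, 0],
    [1, 1, 2, 0, 0],
    [2, 0, 1, 1, 1],
    [0, 2, 0, 2, 1],
])
G = genomic_relationship_matrix(toy_genotypes)
print(np.round(G, 3))
```

G is n × n regardless of the number of SNPs, which is why GBLUP memory and run time scale more gently with marker density than SNP-by-SNP MCMC methods do once G has been built.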
BSSVS
Bayes stochastic search variable selection
Large number of SNPs with tiny (close to zero) effects and a few SNPs with moderate effects (= mixture of two normal distributions)
Implementation via Markov chain Monte Carlo (MCMC) simulation algorithms (computer intensive)
Calus M (2014). Right-hand-side updating for fast computing of genomic breeding values. Genetics Selection Evolution 46(1): 24.
3 chains of 60,000 cycles (10,000 cycles burn-in)
![Page 9: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/9.jpg)
Computation
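| Method | SNP set | Platform | Run time | Memory |
|---|---|---|---|---|
| GBLUP | 777K SNP | HPC, 1 node | ~3 hours | ~32 GB RAM |
| GBLUP | 12M SNP | HPC, 12 nodes | ~6 hours | ~600 GB RAM |
| BSSVS (per MCMC chain) | 777K SNP | Windows, 1 CPU | ~5 days | ~1.6 GB RAM |
| BSSVS (per MCMC chain) | 12M SNP | HPC, 1 node | ~50 days | ~32 GB RAM |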
Windows 7 Enterprise desktop pc: 32 CPU – 8 GB RAM/CPU (clock speed 2.60 GHz)
HPC Linux cluster: Normal nodes – 64 GB/node (2.60 GHz); 2 fat nodes – 1 TB RAM/node (2.20 GHz)
3 chains of 60,000 cycles (10,000 cycles burn-in)
![Page 10: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/10.jpg)
Results: Prediction Reliability
[Bar chart: prediction reliability (y-axis, 0.0 to 0.6) for SCS, IFL and PY; series: BovineHD GBLUP, BovineHD BSSVS, Sequence GBLUP, Sequence BSSVS*]
* Based on 45,000 cycles
BSSVS: Average over 3 chains of 60,000 cycles (10,000 cycles burn-in)
![Page 12: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/12.jpg)
BSSVS: Convergence & SNP effects
Sequence: 45,000 cycles
3 chains of 60,000 cycles (10,000 cycles burn-in)
[Figure panels: trace of variance of SNP effects; Bayes factor for SNP effects; shown for 777K SNP and 12M SNP]
![Page 13: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/13.jpg)
Suitability of BSSVS model?
Large number of SNPs with tiny and a few SNPs with moderate effects
●Sequence data: really large number of SNPs with tiny effects. Captures too much signal?
Another Bayesian prediction model: Bayes-C
●Large number of SNPs with NO effect and a few SNPs with moderate effects
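The contrast between the two priors can be made concrete by drawing SNP effects from each. This is only an illustration of the prior assumptions, not a sampler for either model: the mixing proportion (1% of SNPs in the moderate-effect class) and the variance ratio of 100 between the two BSSVS components are hypothetical values chosen for the example.

```python
import numpy as np

def draw_effects(n_snps, pi_large, var_large, var_small, rng):
    """Draw SNP effects from a two-component mixture of zero-mean normals.

    var_small > 0  : BSSVS-style prior (tiny effects plus a few moderate ones)
    var_small == 0 : Bayes-C-style prior (no effect plus a few moderate ones)
    """
    in_large = rng.random(n_snps) < pi_large
    sd = np.where(in_large, np.sqrt(var_large), np.sqrt(var_small))
    return rng.normal(0.0, 1.0, n_snps) * sd, in_large

rng = np.random.default_rng(42)
n_snps = 10_000

# Hypothetical settings: 1% of SNPs in the moderate-effect component.
bssvs_effects, bssvs_large = draw_effects(n_snps, 0.01, 1.0, 0.01, rng)
bayesc_effects, bayesc_large = draw_effects(n_snps, 0.01, 1.0, 0.0, rng)

print("non-zero effects, BSSVS-style:", np.count_nonzero(bssvs_effects))
print("non-zero effects, Bayes-C-style:", np.count_nonzero(bayesc_effects))
```

Under the BSSVS-style prior every SNP carries some effect, so with millions of sequence variants the many tiny effects can add up; under the Bayes-C-style prior most SNPs contribute exactly nothing.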
![Page 14: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/14.jpg)
Concentrate on single chromosome (BTA 6)
777K SNP
12M SNP
BSSVS Bayes-C
MCMC convergence
![Page 15: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/15.jpg)
Concentrate on single chromosome (BTA 6)
777K SNP
12M SNP
Reliability estimates
| | BSSVS | Bayes-C |
|---|---|---|
| BovineHD | 0.328 | 0.328 |
| Sequence | 0.324 | 0.325 |
Signal of QTL effects
![Page 16: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/16.jpg)
Conclusions
Genomic prediction using sequence data becomes reality
●However, sequence data requires intensive computation. Need for faster algorithms
Use of sequence data did not improve prediction reliability
●Convergence issues with BSSVS. Longer chains may yield better results
BSSVS slightly better compared to GBLUP
Preliminary results on BTA6 hint that the Bayes-C method may work better (than BSSVS) for sequence data
Next Steps: Did we bet on the wrong horse - named BSSVS?
Review choice of priors in BSSVS model.
Apply Bayes-C model to whole genome sequence data
![Page 17: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/17.jpg)
Thanks!
1000 bull genomes project
(www.1000bullgenomes.com)
Acknowledgments
![Page 18: Added value of whole-genome sequence data to genomic predictions in dairy cattle Rianne van Binsbergen 1,2, Mario Calus 1, Chris Schrooten 3, Fred van.](https://reader035.fdocuments.us/reader035/viewer/2022062713/56649ccb5503460f949944f9/html5/thumbnails/18.jpg)
De-regressed proofs (DRP)
Effective daughter contribution (EDC)
$$EDC_{EBV} = \alpha \, \frac{REL_{EBV}}{1 - REL_{EBV}}, \qquad \alpha = \frac{4 - h^2}{h^2}$$
where $REL_{EBV}$ is the published reliability of the EBV.
$$EDC_{prog} = EDC_{EBV} - EDC_{PA}$$
where $EDC_{PA}$ is based on the reliability of the parents.
PA: parent average; EBV: estimated breeding value; EDC: effective daughter contribution
VanRaden and Wiggans 1991 (J Dairy Sci); VanRaden et al. 2009 (J Dairy Sci)
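As a worked example of the EDC expressions on this slide, the sketch below computes the EDC from a published reliability and subtracts the parent-average contribution. It assumes that α = (4 − h²)/h² and that the EDC of the parent average is obtained with the same formula from the parent-average reliability. The function names and the numbers (h² = 0.25, reliabilities 0.5 and 0.2) are illustrative only.

```python
def edc_from_reliability(rel, h2):
    """EDC = alpha * REL / (1 - REL), with alpha = (4 - h2) / h2."""
    if not 0.0 <= rel < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    if not 0.0 < h2 <= 1.0:
        raise ValueError("heritability must be in (0, 1]")
    alpha = (4.0 - h2) / h2
    return alpha * rel / (1.0 - rel)

def edc_progeny(rel_ebv, rel_pa, h2):
    """EDC_prog = EDC_EBV - EDC_PA (parent-average part removed)."""
    return edc_from_reliability(rel_ebv, h2) - edc_from_reliability(rel_pa, h2)

# Hypothetical bull: EBV reliability 0.5, parent-average reliability 0.2.
h2 = 0.25
edc_ebv = edc_from_reliability(0.5, h2)   # alpha = 15, so 15 * 0.5 / 0.5
edc_prog = edc_progeny(0.5, 0.2, h2)      # 15 - 15 * 0.2 / 0.8
print(edc_ebv, edc_prog)
```

With these toy numbers the bull has an EDC of 15 from all sources, of which 11.25 comes from his own progeny.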