Use of the Genome Aggregation Database (gnomAD)
Transcript of Use of the Genome Aggregation Database (gnomAD)
![Page 1: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/1.jpg)
Use of the Genome Aggregation Database (gnomAD)
Anne O’Donnell-Luria, MD, PhDBroad Institute of MIT and Harvard
Boston Children’s Hospital
H3Africa Rare Disease Workshop February 2021
Twitter: @AnneOtation
![Page 2: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/2.jpg)
Every genome contains many rare, potentially functional variants
o ~2 million high quality variants in a variant call file (vcf)o ~20,000 variants within coding (exome) regionso ~500 rare missense variants (1/3 of which are predicted
damaging by in silico predictors)o ~100 LoF variants: ~20 homozygous, ~20 rareo ~100 rare variants in known disease geneso ~50 reported disease-causing mutations (!)o 1-2 de novo coding mutationso Unknown number of sequencing errors
In Mendelian disease analysis, how can we identify the pathogenic genetic variant(s) in
the sea of benign variation?
![Page 3: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/3.jpg)
Given known mutation rates, it is almost certain that every possible single base change compatible with
life exists in a living human
The power of 7.7 billion people
![Page 4: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/4.jpg)
Variant aggregation at Broad
• 20+ M pageviews from 184 countries• Aided in the diagnosis of over 200,000 patients with rare disease
Exome Aggregation Consortium (ExAC) – v160,076 exomesGenome Build 37released October 2014
Genome Aggregation Database (gnomAD) – v2125,748 exomes + 15,708 genomesGenome Build 37released October 2016
Genome Aggregation Database (gnomAD) – v371,702 genomesGenome Build 38released October 2019
Matt Solomonson
![Page 5: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/5.jpg)
Who’s in the gnomAD database?• Case-control studies of complex adult-onset diseases (e.g. type 2 diabetes, heart
attack, migraine, bipolar)
• Depleted as much as possible of people known to have severe pediatric disease, and their first-degree relatives
• 55% Male, Mean age 54 years
Grace Tiao
![Page 6: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/6.jpg)
Thank you to the PIs who share data in gnomADMaria AbreuCarlos A Aguilar SalinasTariq AhmadChristine M. AlbertNicholette AllredDavid AltshulerDiego ArdissinoGil AtzmonJohn BarnardLaurent BeaugerieGary BeechamEmelia J. BenjaminMichael BoehnkeLori BonnycastleErwin BottingerDonald BowdenMatthew BownSteven BrantHannia CamposJohn ChambersJuliana ChanDaniel ChasmanRex ChisholmJudy ChoRajiv ChowdhuryMina Chung
Wendy ChungBruce CohenAdolfo CorreaDana DabeleaMark DalyJohn DaneshDawood DarbarJoshua DennyRavindranath DuggiralaJosée DupuisPatrick T. EllinorRoberto ElosuaJeanette ErdmannTõnu EskoMartt FärkkiläDiane FatkinJose FlorezAndre FrankeGad GetzDavid GlahnBen GlaserStephen GlattDavid GoldsteinClicerio GonzalezLeif GroopChristopher Haiman
Ira HallCraig HanisMatthew HarmsMikko HiltunenMatti HoliChristina HultmanChaim JalasMikko KallelaJaakko KaprioSekar KathiresanEimear KennyBong-Jo KimYoung Jin KimGeorge KirovJaspal KoonerSeppo KoskinenHarlan M. KrumholzSubra KugathasanSoo Heon KwakMarkku LaaksoTerho LehtimäkiRuth LoosSteven A. LubitzRonald MaDaniel MacArthurGregory M. Marcus
Jaume MarrugatKari MattilaSteve McCarrollMark McCarthyJacob McCauleyDermot McGovernRuth McPhersonJames MeigsOlle MelanderDeborah MeyersLili MilaniBraxton MitchellAliya NaheedSaman NazarianBen NealePeter NilssonMichael O'DonovanDost OngurLorena OrozcoMichael OwenColin PalmerAarno PalotieKyong Soo ParkCarlos PatoAnn PulverDan Rader
Nazneen RahmanAlex ReinerAnne RemesStephen RichJohn D. RiouxSamuli RipattiDan RodenJerome I. RotterDanish SaleheenVeikko SalomaaNilesh SamaniJeremiah ScharfHeribert SchunkertSvati ShahMoore ShoemakerTai ShyongEdwin K. SilvermanPamela SklarGustav Smith
Broad Genomics PlatformBroad Data Sciences Platform
Hilkka SoininenHarry SokolTim SpectorNathan StitzielPatrick SullivanJaana SuvisaariKent TaylorYik Ying TeoTuomi TiinamaijaMing TsuangDan TurnerTeresa Tusie LunaErkki VartiainenJames WareHugh WatkinsRinse WeersmaMaija WessmanJames WilsonRamnik Xavier
Data processed through same pipeline and called jointly
![Page 7: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/7.jpg)
http://gnomad.broadinstitute.orgMPG Primers on using gnomAD
https://www.youtube.com/watch?v=J29_Pf8Ti1shttps://www.youtube.com/watch?v=Bh8AKkI-DhY
gnomAD v2 – GRCh37exomes and genomes
gnomAD v3 – GRCh38genomes only (for now)
![Page 8: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/8.jpg)
![Page 9: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/9.jpg)
![Page 10: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/10.jpg)
Additional features in gnomAD v3 (GRCh38)
In silico predictors
Released last week (also in gnomAD v2)
![Page 11: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/11.jpg)
Additional features in gnomAD v3 (GRCh38)
Two open access datasetsHuman Genome Diversity
Project (HGDP)
1000Genomes (1KG)
![Page 12: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/12.jpg)
Additional features in gnomAD v3 (GRCh38)
Two open access datasetsHuman Genome Diversity
Project (HGDP)
1000Genomes (1KG)
![Page 13: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/13.jpg)
Constraint
• Where to we see depletion of variation across human populations?
Conservation is between speciesConstraint is within the human species
![Page 14: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/14.jpg)
Developed a mutational model to calculate expected number of variants per gene
Kaitlin Samocha
(Samocha et al. 2014; Lek et al. 2016)
Daniel MacArthurMark Daly
Konrad Karczewski
![Page 15: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/15.jpg)
• Used z-scores to assess depletion from expectation• For pLoF variants, correlation with gene length (short genes under powered)
Constraint metrics
LOEUF
![Page 16: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/16.jpg)
Constraint metrics
LOEUF
Samocha et al. 2014; Lek et al. 2016
pLI
# of
gen
es
pLI = Probability of loss function intoleranceDichotomous metric (>0.9 LoF constrained)
Karczewski et al., Nature, 2020
LOEUF = Loss-of-function observed/expected upper bound fractionContinuous metric (lower scores more LoF constrained)
0.9
![Page 17: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/17.jpg)
• Binning this spectrum into deciles
Resolving the spectrum of LoF intolerance
More depletedMore constrained
More tolerantLess constrained
![Page 18: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/18.jpg)
• Known haploinsufficient genes where loss of one copy results in disease
Resolving the spectrum of LoF intolerance
ClinGen gene dosage map
![Page 19: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/19.jpg)
• Autosomal recessive genes are centered around 60% of expected
Resolving the spectrum of LoF intolerance
Gene list from:Blekhman et al., 2008
Berg et al., 2013
![Page 20: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/20.jpg)
• Some genes, e.g. olfactory receptors, are unconstrained
Resolving the spectrum of LoF intolerance
![Page 21: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/21.jpg)
Important notes on constraint• Primarily for dominant disease genes; do not expect to see strong
constraint signal (depletion of pLoF variation) for recessive disease genes• Not necessarily a sign of a strong phenotype – can see evidence of
constraint from mild degrees of negative selection • Selection occurs through reproduction so deleterious effect must be
before/during reproductive years • Example: BRCA1 does not show evidence of loss of function constraint
BRCA1Autosomal dominant
Breast and ovarian cancer
![Page 22: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/22.jpg)
Regional missense constraint (Samocha et al.)https://www.biorxiv.org/content/early/2017/06/12/148353
![Page 23: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/23.jpg)
Regional missense constraint (Samocha et al.)https://www.biorxiv.org/content/early/2017/06/12/148353
![Page 24: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/24.jpg)
Regional missense constraint (Samocha et al.)https://www.biorxiv.org/content/early/2017/06/12/148353
![Page 25: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/25.jpg)
DECIPHER Protein View: NSD1https://decipher.sanger.ac.uk/gene/NSD1/overview/protein-info
![Page 26: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/26.jpg)
Expression data from GTEx in gnomAD
![Page 27: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/27.jpg)
GTEx
NSD1
Showing only Pathogenic pLoF variants
![Page 28: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/28.jpg)
GTEx
NSD1
![Page 29: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/29.jpg)
GTEx Consortium et al., Nature, 2017https://www.gtexportal.org/home/
Gene expressionTranscript expressioneQTLs
![Page 30: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/30.jpg)
Example gene expression across tissues: NSD1
https://www.gtexportal.org/home/gene/NSD1
![Page 31: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/31.jpg)
Example gene expression across tissues: NSD1
https://www.gtexportal.org/home/gene/NSD1
![Page 32: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/32.jpg)
Transcript aware expression analysis - pext score
Cummings et al., Nature, 2020
![Page 33: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/33.jpg)
Exploring subsets
![Page 34: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/34.jpg)
Exploring subsets
Cancer means germline sample (typically blood) for individual recruited to a study for a non-blood cancer
![Page 35: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/35.jpg)
Gene page region view
![Page 36: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/36.jpg)
• gnomAD-SV callset from 10,847 whole genomes• Multiple algorithms followed by integration step• 433,371 unique structural variants with frequency data
RyanCollins
HarrisonBrand
MikeTalkowski
![Page 37: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/37.jpg)
gnomAD Structural Variants: RBM8A
Collins et al., Nature, 2020 gnomad.broadinstitute.org
Thrombocytopenia absent radius (TAR) syndrome (recessive)
![Page 38: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/38.jpg)
Previously reported “common” deletion in Thrombocytopenia absent radius (TAR) syndrome
![Page 39: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/39.jpg)
![Page 40: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/40.jpg)
Mitochondrial variants in gnomAD v3
Siekevitz P, Scientific American (Wikipedia)
37 genes
![Page 41: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/41.jpg)
![Page 42: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/42.jpg)
Vamsi MoothaSarah CalvoKristen LaricchiaMonkol LekNicole LakeDaniel MacArthur
Mitochondrialvariant page
![Page 43: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/43.jpg)
pLoF curations: predictions not the full story
Manually curated (gnomAD v2)• All homozygous pLOF variants
• 28% not LoF• Heterozygous pLoF variants in
some recessive genes• 25% not LoF
• Working on pLoF genes in haploinsufficient genes
• 60% not LoF
https://gnomad.broadinstitute.org/blog/2020-10-loss-of-function-curations-in-gnomad/
LoF Curation column if the gene has variants curated by the gnomAD team
Moriel Singer-Berk
![Page 44: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/44.jpg)
Genetics as a lever to understand biology
Discovery of novel disease genes Phenotyping humans with LoF variantsSlide from Daniel MacArthur
![Page 45: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/45.jpg)
Making sense of one exome requires tens of thousands of exomes (or genomes) to reveal rare variants
vs
Harnessing the power of allele frequency
![Page 46: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/46.jpg)
Five-fold reduction in number of very rare variants with large
reference databases
# variants remaining in an exome after applying a 0.1% filter across all populations
Both size and ancestral diversity increase filtering power6K 60K
East AsianSouth AsianLatinoAfricanEuropean
# va
riant
s rem
aini
ng a
fter
filte
ring
Lek et al., Nature, 2016
![Page 47: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/47.jpg)
2015
Evaluating rare variant pathogenicity
![Page 48: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/48.jpg)
Richards et al., Genet Med,
2015
![Page 49: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/49.jpg)
Conclusions• Reference population databases are public resources that are
critically important to evaluate variant rarity, which is necessary but not sufficient for pathogenicity for rare disease
• These datasets provide multiple tools (constraint, pext) to help prioritize variants to understand human biology
• The power of reference population datasets will increase as they grow in size and diversity
• gnomAD v4 with >300,000 exomes on GRCh38 anticipated in 2021 or 2022
![Page 50: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/50.jpg)
• Exome and genome data is in separate files
• Exome data is 125K exomes
• Genome data is 15K genomes
Available on• Google Cloud Public Datasets• Registry of Open Data on AWS• Azure Open Datasets
![Page 51: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/51.jpg)
![Page 52: Use of the Genome Aggregation Database (gnomAD)](https://reader035.fdocuments.us/reader035/viewer/2022062806/62b87ea8463fc41d652d247c/html5/thumbnails/52.jpg)
Acknowledgements
Steering CommitteeHeidi Rehm (co-PI)Mark Daly (co-PI)Daniel MacArthurBen NealeMike TalkowskiAnne O’Donnell-LuriaKonrad KarczewskiGrace TiaoMatt SolomonsonSamantha Baxter
Analysis teamKonrad KarczewskiLaurent FrancioliGrace TiaoKristen LaricchiaBeryl CummingsEric MinikelIrina ArmeanJames WareKaitlin SamochaNicola WhiffinQingbo WangRyan Collins
Structural VariationRyan CollinsHarrison BrandKonrad KarczewskiLaurent FrancioliNick WattsMatthew SolomonsonXuefang ZhaoLaura GauthierHarold WangChelsea LowtherMark WalkerChristopher WhelanTed BrookingsTed SharpeJack FuEric BanksMichael Talkowski
Browser teamMatthew SolomonsonNick WattsBen Weisburd
Ethics teamAndrea SaltzmanMolly SchleicherNamrata GuptaStacey Donnelly
Broad Genomics PlatformStacey GabrielKristen ConnollySteven Ferriera
Ben NealeCotton SeedTim PoterbaArcturus WangChris Vittal
Production teamGrace TiaoJulia GoodrichKatherine ChaoMichael WilsonWilliam PhuEric BanksCharlotte TolonenChristopher LlanwarneDavid RoazenDiane KaplanGordon WadeJeff GentryJose SotoKathleen TibbettsKristian CibulskisLaura GauthierLouis BergelsonMiguel CovarrubiasNikelle PetrilloRuchi MunshiSam NovodThibault JeandetValentin Ruano-RubioYossi Farjoun
https://www.nature.com/collections/afbgiddedegnomAD papers at:
https://gnomad.broadinstitute.org/aboutContact for ?s: [email protected] or [email protected]