Progress in IBD Genetics
Mark DalyChief, Analytic and Translational Genetics Unit (ATGU)
Massachusetts General Hospital / HMSand
Senior Associate Member, Broad Institute
Primary: Informing therapeutic development
Identifytarget
Chemicalscreening
Clinical trials
Most therapies developed through traditional approaches fail to have efficacy
Animal models
No novel therapies for neuropsych disorders
Genetics research reveals theunderlying biological causes ofdisease
Why we have statins
• Brown and Goldstein discover genetic cause of extreme familial hypercholesterolemia
Why we have statins
• Brown and Goldstein discover mutations in LDLR cause extreme familial hypercholesterolemia
• Understanding of the cause of this disease (mutations cause a lack of receptors that remove cholesterol from the blood) uncovered a general mechanism and proposed an effective approach to therapy for high LDL cholesterol
Secondary: Predicting individual outcomes
While not the primary goal, genetics can in some cases provide individualized medical insights
- diagnostics in cases of severe genetic disorders
- prediction of individuals at risk for severe idiosyncratic adverse drug responses
- identification of groups more or less likely to benefit from specific therapeutic interventions
Linkage mapping (1913)A B C
I suddenly realized that the variations in strength of linkage, already attributed by Morgan to differences in the spatial separation of genes, offered the possibility of determining sequences in the linear dimension of a chromosome. I went home and spent most of the night (to the neglect of my undergraduate homework) in producing the first chromosome map, which included the sex linked genes y, w, v, m, and r, in the order and approximately the same relative spacing that they still appear on the standard maps
— Alfred Sturtevant “A History of Genetics”
‘Mendelian’ diseases travel predictablyand consistently in families
Aa AA Aa Aa AA AA Aa AA
Aa AA
Dominant transmission
Thousands of diseases or traits caused by mutations in a single gene (e.g., Huntington’s, CF, muscular dystrophy)
Family-based linkage analysisAC
AA
AC
AC
AC
AC
AA
AA
AA
AA
A/C Disease GeneSaw dramatic successes in the 1980s-90s for the localization of genes underlying countless Mendelian disorders:Huntington’s, CF, DMD, early onset forms of breast cancer, Alzheimers, diabetes…
Tracking ‘co-segregation’ of DNA polymorphisms with disease status permits identification of region containing responsible gene and mutations
Mendelian disease genetics
genotypegenotype disease statedisease state
Linkage analysis and positional cloning are powerful because genetic risk factors are highly “penetrant”
Glazier, Nadeau and Aitman, Science 2002
Linkage revolution in medical genetics did not generally apply to complex traits
The first complex trait mapping:Altenburg and Muller (1920)
Hermann Muller
Polygenicity proved a fundamental barrier…
…as did the incomplete genotype-phenotype
relationship
disease statedisease state
phenotype1
phenotype2
phenotype3
phenotype4 phenotype5
Exposures / environment
genotypegenotype
other genes
“We suggest that evolutionary changes in anatomy and way of life are more often based on changes in the mechanisms controlling the expression of genes than on sequence changes in proteins. We therefore propose that regulatory mutations account for the major biological di erences between ffhumans and chimpanzees.” – King & Wilson. Science. April, 1975.
Many Mendelian traits can sum to a continuous distribution
Ronald Fisher (1918)
Complex traits
• Instead of one gene determining a disease or trait, many genes each exert a small influence
• None by themselves can cause or explain the disease or trait fully – but together with environmental influences combine to define an individual outcome
MOST COMMON DISEASES WORK THIS WAY
ten years ago
• First human genome was just being completed
• More than 1000 Mendelian traits had been genetically decoded
• Only a handful of genes had been discovered for any common disease
Three major elements turned the tide
Genome ResourcesTechnology
Collaboration
Fundamental Change 1:Understanding the Genome
• Sequencing of the human genome
• Understanding and cataloging variation in the genome (HapMap, 1000 Genomes Project)
Re-screen an independent sample
≈90% chance of seeing same variation with freq. > 1%
Human genetic differences are modest and largely attributable to common variation
Compare any two chromosomes: one in a thousand bp heterozygous
Gabriel et al., Science, 2002
ATGCCGATCGTACGACACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGATCCATTTTATACTGACTGCATCGTACTGACTGCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTTTACCCCATGCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCAGCATCCATCCATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATAGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCAGCATCCATCCATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATAGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCAGCATCCATCCATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTG
ATGCCGATCGTACGACACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGATCCATTTTATACTGACTGCATCGTACTGACTGCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTTTACCCCATGCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCAGCATCCATCCATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATAGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGACTAAACACATCCCACTTTACCCATGCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCAGCATCCATCCATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATAGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTATTCTAAACACATCCCAGCATCCATCCATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTG
Genetic variation• Single nucleotide polymorphisms
– 1 every few hundred bp, mutation rate* ≈ 10-9
• Short indels (=insertion/deletion)– 1 every few kb, mutation rate v. variable
• Microsatellite (STR) repeat number– 1 every few kb, mutation rate ≤ 10-3
• Minisatellites– 1 every few kb, mutation rate ≤ 10-1
• Repeated genes– rRNA, histones
• Large deletions, duplications, inversions– Rare, e.g. Y chromosome
TGCATTGCGTAGGCTGCATTCCGTAGGC
TGCATT---TAGGCTGCATTCCGTAGGC
TGCTCATCATCATCAGCTGCTCATCA------GC
≤100bp
1-5kb
* rates per generation
Different, but not that different• Humans are among the least diverse organisms
Species Diversity (percent)
Humans 0.08 - 0.1
Chimpanzees 0.12 - 0.17
Drosophila simulans 2
E. coli 5
HIV1 30
Photos from UN photo gallery www.un.org/av/photoCourtesy of Gil McVean - Oxford
10M or so common variants: typically shared across populations
Gabriel et al, Science 2002 Rosenberg et al, Science 2002
Haplotypes identify shared regional ancestry
Haplotypes identify shared regional ancestry
many generations of recombination, drift later
mutation occurs
What is a haplotype? • A genotype of an individual are the
specific bases carried by the individual at a variant site
• A haplotype is the set of those variable bases along a chromosome C C A haplotype
ATTCGACGTATCGGCACTAACAGGATTCGATGTATTGGCACTTACAGG T T T haplotype
Identification of cystic fibrosisgene
(1989)
courtesy of A. Chakravarti
Septin2-likegenes
genesRAD50IL13IL4
IL5 IRF1
OCTN2 OCTN1 RIL
P4HA2
CSF2IL3
LACS2
SNPs
= 50 kb
CAh14b ATTh14c IL4m2 GAh18a CAh15a
IRF1p1
CAh17a D5S1984 CSF2p10
GGACAACCAATTCGTG
TTACGCCCAA
CGGAGACGAGACTGGTCGCGCAGACGA
CGCGCCCGGATTTGCCCCGGCTCTGCTATAACCCTGCCCCAACC
CCAGCCAACC GCGCTCCACC
CCGATCTGAC CTGACATACT
CCCTGCTTACGGTGCAGTGGCACGTATT*CACATCACTCCCCAGACTGTGATGTTAGTATCTTCCCATCCATCATGGTCGAATGCGTACATTACCCCGCTTACGGTGCAGTGGCACGTATATCA
CGTTTAGTAATTGGTGTT*GATGATTAG
ACAACAGTGACG GCGGTGACGGTG
GTTCTGATGTGCGGTG*GTAA
TATAGTATCACGGCG
84 kb 3 kb 14 kb 30 kb 25 kb 11 kb 92 kb 21 kb 27 kb 55 kb 19 kb
94% 96% 92% 94% 93% 97% 93% 91% 92% 90% 98%
Θ .08 .40 .33 .05 .11 .05 .07 .02 .27 .24
Haplotype Map of 5q31
Daly et al., Nature Genetics, Oct. 2001
* Common (35%) haplotype is associated with CD* 2:1 over-transmission of risk haplotype* present in >75% CD patients
>3,000,000 SNPs typed in samples from Nigeria, USA, China, Japan
Detailed catalog of human genetic variation patterns
Project ran from 2003-2007 following the Human Genome Project
Fundamental change 2: Technological advances accelerate discovery
Affecteds Controls AC
AA
AC
AC
AC
AC
AA
AA
AA
AA
Enable us to compare the genomes of…
Populations of cases and controls Family members with and without disease
New technologies to examine genomes…
Arrays of millions Sequencing the entire of DNA variants human genome (2005-present) (2010)
InflammatoryBowel Disease:
Crohn’s Disease (CD)&
Ulcerative Colitis (UC)Chronic, heritable inflammatory diseases of the gastrointestinal system.
• 2000: 0 genes known• 2005: 2 genes knownOnly a tiny fraction of the
heritability understood
Linkage and Candidate Gene Era
2000 2001 2002 2003 2004 2005 2006 2007 2008
NOD25q31
Performing a Genomewide Association Study
Key Quality Control Concepts• Technical QC
– Removal of failed SNPs, samples
• Genetic QC– Mendelian segregation and HWE– Estimating relatedness, gender– Population structure
• Analysis-based QC– Do initial runs of test statistics show inflation, biases
towards missing data, specific allele frequencies
Testing for association
• Most straightforward: compare proportion of each SNP allele in cases and controls
rs11209026 Allele A Allele G
Cases 22 976
Controls 68 932
Chi-sq = 24.5, p=7.3 x 10-7
Multiple Testing• In linkage, p = .001 (.05 / ~50 chromosomal
arms) considered potentially significant
• In GWAS, we’re performing O(105-106) tests that are largely independent– Each study has hundreds of p<.001 purely by
statistical chance (no real relationship to disease)
– “Genome-wide significance” often set at p=5x10-8 (= .05 / 1 million tests)
Genomewide Association
‘Manhattan’ plot
Q-Q plot
NOD2/CARD15
IBD5
ATG16L1
IL23R
10q
Previously Found by Linkage
Previously Found by Linkage
NOT Found by linkage
NOT Found by linkage
First Genomewide Association Results – 2006
New scales of computational tools required to manage and analyze these data
Replication is key
• Don’t believe a report of association from a single study– Even with strict quality control there are
artifacts that can affect 1 every thousand or ten thousand SNPs and escape notice
– Strict genomewide significance generally not dramatically exceeded (if it’s reached at all in a single study)
2000 2001 2002 2003 2004 2005 2006 2007 2008
NOD25q31
5p1310q213p21PTPN2IRGMIL12BNKX2-3TNFSF15
IL23RATG16L1
Individual GWA studies – Crohn’s
Andre Franke Dermot McGovern Kai Wang Richard Duerr A Hillary Steinhart Edouard Louis Kent Taylor Richard Gearry Albert Cohen Frank Seibold Leif Torkvist Rinse Weersma Amir Levine G Radford Smith Lisa Simms Robert Baldassano Andre Van Gossum Grant Montgomery Marc Lemann Salvator Cucchiara Anne Griffiths Hakon Hakonarson Mario Cottone Severine Vermeire Anne Phillips Ian Lawrance Mark Daly Stefan Schreiber Carl Anderson Jack Satsangi Mark Silverberg Stephan Brand Carsten Buning James Lee Marla Dubinsky Stephan Targan Cathryn Edwards Jean-Fred Colombel Martine De vos Steve Guthery Cecile Libioulle Jean-Pierre Hugot Mauro D’Amato Steven R Brant Charlie Lees Jeff Barrett Michel Georges Subra Kugathasan Christopher Mathew Jeremy Sanderson Miles Parkes Tariq Ahmad Cisca Wijmenga Jerry Rotter Miquel Sans Ted Denson Craig Mowat John Mansfield Murray Barclay Theodore Bayless Daan Hommes John Rioux Paul Rutgeerts Thomas Balschun Debby Laukens Jonas Halfvarson Phil Schumm Tim Florin Deborah D Proctor Judy Cho Pieter Stokkers Vito Annese Denis Franchimont Julián Panés Rebecca Roberts William Newman Derek P Jewell Jürgen Glas Renata D'Inca Yashoda Sharma
Fundamental change 3: CollaborationInternational IBD Genetics Consortium
Fundamental change 3: CollaborationInternational IBD Genetics Consortium
A Genome-Wide Association Study Identifies IL23R as an Inflammatory Bowel Disease GeneRichard H. Duerr, Kent D. Taylor, Steven R. Brant, John D. Rioux, Mark S. Silverberg, Mark J. Daly, A. Hillary Steinhart, Clara Abraham, Miguel Regueiro, Anne Griffiths, Themistocles Dassopoulos, Alain Bitton, Huiying Yang, Stephan Targan, Lisa Wu Datta, Emily O. Kistner, L. Philip Schumm, Annette T. Lee, Peter K. Gregersen, M. Michael Barmada, Jerome I. Rotter, Dan L. Nicolae, Judy H. Cho*
Genome-wide association study identifies new susceptibility loci for Crohn’s disease and implicates autophagy in disease pathogenesisJohn D Rioux, Ramnik J Xavier, Kent D Taylor, Mark S Silverberg, et al.
Pre-GWAS: NOD2, 5q31identified by consortiummembers
Gene Finding in Crohn’sNIDDK-IBDGC
International IBDGC
InflammatoryBowel Disease:
Crohn’s Disease (CD)&
Ulcerative Colitis (UC)Chronic, heritable inflammatory diseases of the gastrointestinal system.
• 2000: 0 genes known• 2005: 2 genes known• 2010: 100 confirmed lociWhat do we do with 100 genes?
Many associated genes implicate limited number of pathways
Computational tools enable the integration of ‘humangenetic screens’ with other genome-scale screening data
‘GRAIL plot’ from Franke et al 2010
Novel Insights Into Disease Biology =Novel Targets for Therapeutic Development
General immune pathway associated with CD, UC,
MS, psoriasis
General immune pathway associated with CD, UC,
MS, psoriasis
Innate immune components(NOD2, ATG16L1, IRGM)
specifically associated to CD
Innate immune components(NOD2, ATG16L1, IRGM)
specifically associated to CD
Autophagy, Innate Immunity IL23 signaling
AssociatedComponents:
IL23RIL12BTYK2JAK2STAT3NKX2-3
Ramnik Xavier
Any clinical value?
• Individually, DNA variants discovered by GWAS are not predictive of individual outcome
• In large numbers, and alongside environmental and traditional biomarkers, there may be clinical value
Genetic+Serologic prediction in IBD
CD & UC patients can be distinguished accurately
Patients with predictions strongly discordant from initial dx are 10x enriched for later diagnostic reclassification, complications Jonah Essers
Many other parallel successes
• 95 associated loci for blood lipids
• 32 associated loci for type 2 diabetes
• Numerous rare deletions/duplications associated to both autism and schizophrenia
Redefining disease boundaries and relationships
Locus Type
SCZ freq (%)
Odds ratio
Size (MB) Genes micro
RNA Other Diseases
1q21.1 del 0.25 7-15 0.9 10 yes Aut, MR, Epilepsy
3q29 del 0.08 17 1.6 21 Aut, MR
15q13.3 del 0.20 12-18 1.5 10 yes Aut, MR, Epilepsy
16p11.2 dup 0.3 8-25 0.5 >25 Aut, Bipolar Disorder
17q12 del 0.063 4.5 1.4 15 Aut, Epilepsy
22q11.2 del 1.0 30 1.5/3.0 44 yes Aut, MR, Epilepsy
NRXN1 del 0.19 2 variable 1 Aut
Every event which increases schizophrenia risk also increases autism risk
Copy-number risk factors for schizophrenia
More than half of all genes defined forone autoimmune disease are riskfactors for one or more other immune-mediated conditions
Limitations of GWA
• Often difficult to identify which gene is implicated
• For only a small fraction have conclusive, actionable, functional alleles been defined
• Rarer variation not yet assessed
Genomes
10k
1M
100k
100
1,000
Genome Sequencing is now a realityC
ost p
er H
uman
Gen
ome
Venter / capillaries
Watson / 454
African, Asian, Cancer pair169 in Genbank
Costs
$1M
$10k
$100M
$10M
$100k
2012
Time
20082007 2009 2010 2011
Individual Genome Sequencing
Enormous Potential…• Ng, Shendure: Miller syndrome, 4 cases
– exome sequenced reveals causal mutations in DHODH• Choi, Lifton: Undiagnosed congenital chloride diarrhoea (consanguinous)
– Exome seq reveals homozygous SLC26A3 chloride ion transporter mutation– Return diagnosis of CLD (gi) – corrected prior dx of Bartter syndrome (renal)
• Worthey, Dimmock: 4-year old, severe unusual IBD– exome seq reveals XIAP mutation (at a highly conserved aa) – proimmune disregulation opt for bone marrow transplant over chemo
• Jones, Marra: Secondary lung carcinoma unresponsive to erlotinib– Genome and transcriptome sequencing reveals defects– directs alternative sunitinib therapy
• Mardis, Wilson: acute myelocytic leukaemia but not classical translocation– Genome sequencing (1 week + analysis) reveals PML-RARA translocation– Directs ATRA (all trans retinoic acid) treatment decision
…Tremendous challenges
• Managing and processing vast quantities of data into variation
• Interpreting millions of variants per individual– An individual’s genome harbors
• ~50 point nonsense mutations• ~100-150 frameshift mutations• Tens of splice mutants, CNV induced gene
disruptions
For very few of these do we have any conclusive understanding of their medical impact in the population
Clinical Genetics: Current Paradigm
Issued: 01 MAR 07 Recommended next check: 28 FEB 10
PGI id: 5910322 – 61215923014
Are there validated genetic findings
that can influencethis decision?
3: 12,300 3: 12,400 ( kb )
PPARg
CONSEQUENTLY,CLINICAL DECISION MAKING RARELY INFORMED BY GENETICS
Consultation
Consent
Clinical assessment
CLINICAL DECISION TO BE MADE
USUALLY NO OFTEN IT IS NOT
Is it cost-effective/Feasible/
Reimbursableto acquire the data
Clinical Genetics: Future Paradigm
INDIVIDUAL SEQUENCEacquired in advance
Issued: 01 MAR 07 Recommended next check: 28 FEB 10
PGI id: 5910322 – 61215923014
Subset of VALIDATED, INTERPRETABLE
VARIANTS available for clinical use
GWAS, targeted studies
Clinical studies Population Variation
Medical SequencingFunctional annotation
3: 12,300 3: 12,400 ( kb )
PPARg
KNOWLEDGEBASE of DNA VARIATIONand their MEDICAL IMPACT
Variant: C3 : 12,450,610 : T/C:
PPARG : Pro 467 Leu :
Medical
consequence:
Associated with severe insulin resistance, diabetes mellitus, hypertension, familial lipodystrophy
Pharmacological
consequence:
Resistant to thiazolidinediones
CLINICAL DECISION MAKINGOFTEN INFORMED BY GENETICS
Consultation
Consent
Clinical assessment
Adapted from Bentley et al. Nature 429: pp. 440-5 (2004)
Analytic and Translational Genetics Unit
• First new unit in the department of medicine in recent years
• Core faculty will focus on developing key analytic & translational infrastructure surrounding the interpretation of genome sequence
• Affiliated faculty will represent clinical partnerships throughout the hospital
• Close partnership with CHGR and Broad Institute
Tackling challenges of sequencing era
n = 2n = 3n = 4
n = 5n = 6
n = 7
n = 8
p = 1/2
Risk/Low Protective/High
n = 9
PROCESSING
ANALYSIS
BIOLOGICAL & CLINICAL INTERPRETATION
APO-B : triglyceridesC-alpha p<.001
PPI network of genes harboring de novo mutations in autism
ATGU
INDIVIDUAL SEQUENCEacquired in advance
Issued: 01 MAR 07 Recommended next check: 28 FEB 10
PGI id: 5910322 – 61215923014
Subset of VALIDATED, INTERPRETABLE VARIANTS
(available for clinical use)
GWAS, targeted studies
Clinical studies Population VariationMedical SequencingFunctional annotation
3: 12,300 3: 12,400 ( kb )
PPARg
KNOWLEDGEBASE of VARIANTSand their MEDICAL IMPACT
Variant: C3 : 12,450,610 : T/C:
PPARG : Pro 467 Leu :
Medical
consequence:
Associated with severe insulin resistance, diabetes mellitus, hypertension, familial lipodystrophy
Pharmacological
consequence:
Resistant to thiazolidinediones
CLINICAL DECISION MAKINGOFTEN INFORMED BY GENETICS
Consultation
Consent
Clinical assessment
Goal 1: develop this knowledgebase Goal 2: through partnerships with clinical divisions, pathology – solve challenges of acquisition & delivery of actionable genetic information
Outstanding challenges in human genetics
I. Common Disease
• We can discover common SNPs associated to disease but cannot
– Systematically figure out which gene near the SNP is responsible
– Readily define functional variants
– Discover rare variation associated to disease
II. Mendelian Disease
• We can find the region of the genome but there are no shortcuts which of many genes bears responsible mutations
• Classical sequencing approaches can completely miss some types of mutations such as inversions, deletions/duplications
• Severe individual cases/families unapproachable until many others discovered
III. Spontaneous mutations• Most cancers caused by somatic
mutations (not inherited in germ line)• Many developmental disorders and other
severe conditions caused by spontaneous mutations
• While these conditions are even more “genetic” than many others we study, these types of variation are not amenable to standard gene mapping techniques
In a few brief years, next-generation sequencing has
developed and is now able to address all of these discovery
bottlenecks!!!
Pilot Experiment
• 350 CD and 350 Controls from NIDDK IBDGC GWAS sequenced for ~50 genes in Crohn’s associated regions (ca. Barrett et al 2008)– Pooled (n=50) NGS (PCR->Illumina)
• ~100 low frequency (2 copies to 5% MAF) ‘functional’ variants (missense/nonsense/splice) discovered (many singletons as well)
Replication will be keyCD versus HC +
UC (CD loci) GENE/AASub
seq seq
Replication p-value
case alleles
control alleles
chr16:49308343 NOD2,M863V 6 1 (12690,25210) 1.10E-16chr16:49308311 NOD2,N852S 5 0 (12456,24462) 5.25E-08chr16:49303430 NOD2,R703C 7 6 (10788,22461) 4.40E-08chr16:49302615 NOD2, S431L 3 5 (11076,23436) 0.00044chr2:102434852 IL18RAP,Val527Leu 4 2 (8674,17662) 6.94E-05chr12:39226476 MUC19,V56M 3 2 (7190,14577) 0.007
IBD versus HC (CD + UC loci)
chr9:138379413 CARD9,IVS10+1C>G 0 6 (25849,16440) <1e-16
chr1:67478488 IL23R,V362I 9 12 (19371,12393)
1.99E-07chr1:67421184 IL23R, Gly149Arg 2 1 (18418,16012) 3.00E-04chr10:35354137 CUL2,IVS17+ 5A>G 4 4 (19984,13003) 0.0043chr9:5062561 JAK2, Gly571Ser 1 3 (18453,11914) 0.01
Counts from pooled sequencing of 350 Crohn’s cases and 350 controls –
Identified ~100 low frequency ‘functional’ variants (2 copies up to 5%) across 50 sequenced genes – none are statistically compelling
Manny Rivas
Replication will be keyCD versus HC +
UC (CD loci) GENE/AASub
seq seq
Replication p-value
case alleles
control alleles
chr16:49308343 NOD2,M863V 6 1 (12690,25210) 1.10E-16chr16:49308311 NOD2,N852S 5 0 (12456,24462) 5.25E-08chr16:49303430 NOD2,R703C 7 6 (10788,22461) 4.40E-08chr16:49302615 NOD2, S431L 3 5 (11076,23436) 0.00044chr2:102434852 IL18RAP,Val527Leu 4 2 (8674,17662) 6.94E-05chr12:39226476 MUC19,V56M 3 2 (7190,14577) 0.007
IBD versus HC (CD + UC loci)
chr9:138379413 CARD9,IVS10+1C>G 0 6 (25849,16440) <1e-16
chr1:67478488 IL23R,V362I 9 12 (19371,12393)
1.99E-07chr1:67421184 IL23R, Gly149Arg 2 1 (18418,16012) 3.00E-04chr10:35354137 CUL2,IVS17+ 5A>G 4 4 (19984,13003) 0.0043chr9:5062561 JAK2, Gly571Ser 1 3 (18453,11914) 0.01
Replication attempted for 65 variants in >10,000 cases and>20,000 controls establishes highly significant novel variants were discovered
Unique challenge of deep sequencing data – especially exome data –Relevant variants are almost certainly found but we have next to no ability to recognize them from the sequencing data alone
Manny Rivas
Allelic series at CARD9
Common (~50%) risk alleledefined by GWAS (OR ~ 1.2) –allele correlates with higher expressionp < 10-15
Rare (~0.5%) protective allele defined by deep sequencing of GWAS hits (OR ~ 4-5) confirmed by genotyping in 10,000 casesp < 10-15
Exon 10 skipped, out of frame, transcript ends prematurely in exon 11
CARD9
Ongoing ATGU efforts
• Exome sequencing in autism, schizophrenia– Association of standing & de novo variation– Identification of rare recessive subtypes
• Complete definition of full allelic spectrum of GWAS hits (IBD, other immune mediated)– Define functionally actionable alleles– Move to experiments
• Genetics of Drug Induced Liver Injury• Clinical/Mendelian sequencing
– Genetically diagnosing severely affected children
Daly & AltshulerLabs
Chris Cotsapas *Jonah EssersJason FlannickTodd GreenAndrew KirbyElaine LimJared MaguireBen NealeSoumya Raychaudhuri *Stephan RipkeManny RivasLiz RossinAyellet SegreBen Voight
Crohn’s Disease
Ramnik Xavier & LabAlan Huett, Petric Kuballa, Agnes Gardet, Aylwin Ng, Yair Benita
John Rioux & Lab (UdeM)Philippe Goyette, Guillaume Lettre
Investigators of NIDDK IBDGCJudy ChoSteve BrantRick DuerrJerry RotterMark SilverbergDermot McGovern
Miles Parkes & Wellcome Trust IBDBelgian-French IBD Consortium
IIBDGC Analysis TeamJeff Barrett, Dan Nicolae, Michel Georges, Sarah HansoulCarl Anderson, Emily Kistner
Broad Institute
Christine StevensNoel BurttStacey Gabriel
Mark dePristoKiran GarimellaEric Banks& the GSA team!
ATGUShaun PurcellBen Neale
Top Related