Download - Progress in IBD Genetics - massgeneral.org · No novel therapies for neuropsych disorders ... countless Mendelian disorders: ... Project ran from 2003-2007 following the

Progress in IBD Genetics

Mark DalyChief, Analytic and Translational Genetics Unit (ATGU)

Massachusetts General Hospital / HMSand

Senior Associate Member, Broad Institute

Primary: Informing therapeutic development

Identifytarget

Chemicalscreening

Clinical trials

Most therapies developed through traditional approaches fail to have efficacy

Animal models

No novel therapies for neuropsych disorders

Genetics research reveals theunderlying biological causes ofdisease

Why we have statins

• Brown and Goldstein discover genetic cause of extreme familial hypercholesterolemia

Why we have statins

• Brown and Goldstein discover mutations in LDLR cause extreme familial hypercholesterolemia

• Understanding of the cause of this disease (mutations cause a lack of receptors that remove cholesterol from the blood) uncovered a general mechanism and proposed an effective approach to therapy for high LDL cholesterol

Secondary: Predicting individual outcomes

While not the primary goal, genetics can in some cases provide individualized medical insights

- diagnostics in cases of severe genetic disorders

- prediction of individuals at risk for severe idiosyncratic adverse drug responses

- identification of groups more or less likely to benefit from specific therapeutic interventions

Linkage mapping (1913)A B C

I suddenly realized that the variations in strength of linkage, already attributed by Morgan to differences in the spatial separation of genes, offered the possibility of determining sequences in the linear dimension of a chromosome. I went home and spent most of the night (to the neglect of my undergraduate homework) in producing the first chromosome map, which included the sex linked genes y, w, v, m, and r, in the order and approximately the same relative spacing that they still appear on the standard maps

— Alfred Sturtevant “A History of Genetics”

‘Mendelian’ diseases travel predictablyand consistently in families

Aa AA Aa Aa AA AA Aa AA

Aa AA

Dominant transmission

Thousands of diseases or traits caused by mutations in a single gene (e.g., Huntington’s, CF, muscular dystrophy)

Family-based linkage analysisAC

AA

AC

AC

AC

AC

AA

AA

AA

AA

A/C Disease GeneSaw dramatic successes in the 1980s-90s for the localization of genes underlying countless Mendelian disorders:Huntington’s, CF, DMD, early onset forms of breast cancer, Alzheimers, diabetes…

Tracking ‘co-segregation’ of DNA polymorphisms with disease status permits identification of region containing responsible gene and mutations

Mendelian disease genetics

genotypegenotype disease statedisease state

Linkage analysis and positional cloning are powerful because genetic risk factors are highly “penetrant”

Glazier, Nadeau and Aitman, Science 2002

Linkage revolution in medical genetics did not generally apply to complex traits

The first complex trait mapping:Altenburg and Muller (1920)

Hermann Muller

Polygenicity proved a fundamental barrier…

…as did the incomplete genotype-phenotype

relationship

disease statedisease state

phenotype1

phenotype2

phenotype3

phenotype4 phenotype5

Exposures / environment

genotypegenotype

other genes

“We suggest that evolutionary changes in anatomy and way of life are more often based on changes in the mechanisms controlling the expression of genes than on sequence changes in proteins. We therefore propose that regulatory mutations account for the major biological di erences between ffhumans and chimpanzees.” – King & Wilson. Science. April, 1975.

Many Mendelian traits can sum to a continuous distribution

Ronald Fisher (1918)

Complex traits

• Instead of one gene determining a disease or trait, many genes each exert a small influence

• None by themselves can cause or explain the disease or trait fully – but together with environmental influences combine to define an individual outcome

MOST COMMON DISEASES WORK THIS WAY

ten years ago

• First human genome was just being completed

• More than 1000 Mendelian traits had been genetically decoded

• Only a handful of genes had been discovered for any common disease

Three major elements turned the tide

Genome ResourcesTechnology

Collaboration

Fundamental Change 1:Understanding the Genome

• Sequencing of the human genome

• Understanding and cataloging variation in the genome (HapMap, 1000 Genomes Project)

Re-screen an independent sample

≈90% chance of seeing same variation with freq. > 1%

Human genetic differences are modest and largely attributable to common variation

Compare any two chromosomes: one in a thousand bp heterozygous

Gabriel et al., Science, 2002

ATGCCGATCGTACGACACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGATCCATTTTATACTGACTGCATCGTACTGACTGCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTTTACCCCATGCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCAGCATCCATCCATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATAGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCAGCATCCATCCATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATAGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCAGCATCCATCCATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTG

ATGCCGATCGTACGACACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGATCCATTTTATACTGACTGCATCGTACTGACTGCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTTTACCCCATGCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCAGCATCCATCCATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATAGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGACTAAACACATCCCACTTTACCCATGCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCAGCATCCATCCATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATAGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGCATCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTATTCTAAACACATCCCAGCATCCATCCATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCTATGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACGACTGCATCGTACTGACTGCACATATCGTCATACATAGACTTCGTACTGACTGTCTAGTCTAAACACATCCCACATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACTTTACCCATGATATCGTCATCGTACTGACTGTCTAGTCTAAACACATCCCACACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACGCCGATCGTACGACACATATCGTCATCGTACTGCCCTACGGGACTGTCTAGTCTAAACACATCCATCGTACTGACTGCATCGTACTG

Genetic variation• Single nucleotide polymorphisms

– 1 every few hundred bp, mutation rate* ≈ 10-9

• Short indels (=insertion/deletion)– 1 every few kb, mutation rate v. variable

• Microsatellite (STR) repeat number– 1 every few kb, mutation rate ≤ 10-3

• Minisatellites– 1 every few kb, mutation rate ≤ 10-1

• Repeated genes– rRNA, histones

• Large deletions, duplications, inversions– Rare, e.g. Y chromosome

TGCATTGCGTAGGCTGCATTCCGTAGGC

TGCATT---TAGGCTGCATTCCGTAGGC

TGCTCATCATCATCAGCTGCTCATCA------GC

≤100bp

1-5kb

* rates per generation

Different, but not that different• Humans are among the least diverse organisms

Species Diversity (percent)

Humans 0.08 - 0.1

Chimpanzees 0.12 - 0.17

Drosophila simulans 2

E. coli 5

HIV1 30

Photos from UN photo gallery www.un.org/av/photoCourtesy of Gil McVean - Oxford

10M or so common variants: typically shared across populations

Gabriel et al, Science 2002 Rosenberg et al, Science 2002

Haplotypes identify shared regional ancestry

Haplotypes identify shared regional ancestry

many generations of recombination, drift later

mutation occurs

What is a haplotype? • A genotype of an individual are the

specific bases carried by the individual at a variant site

• A haplotype is the set of those variable bases along a chromosome C C A haplotype

ATTCGACGTATCGGCACTAACAGGATTCGATGTATTGGCACTTACAGG T T T haplotype

Identification of cystic fibrosisgene

(1989)

courtesy of A. Chakravarti

Septin2-likegenes

genesRAD50IL13IL4

IL5 IRF1

OCTN2 OCTN1 RIL

P4HA2

CSF2IL3

LACS2

SNPs

= 50 kb

CAh14b ATTh14c IL4m2 GAh18a CAh15a

IRF1p1

CAh17a D5S1984 CSF2p10

GGACAACCAATTCGTG

TTACGCCCAA

CGGAGACGAGACTGGTCGCGCAGACGA

CGCGCCCGGATTTGCCCCGGCTCTGCTATAACCCTGCCCCAACC

CCAGCCAACC GCGCTCCACC

CCGATCTGAC CTGACATACT

CCCTGCTTACGGTGCAGTGGCACGTATT*CACATCACTCCCCAGACTGTGATGTTAGTATCTTCCCATCCATCATGGTCGAATGCGTACATTACCCCGCTTACGGTGCAGTGGCACGTATATCA

CGTTTAGTAATTGGTGTT*GATGATTAG

ACAACAGTGACG GCGGTGACGGTG

GTTCTGATGTGCGGTG*GTAA

TATAGTATCACGGCG

84 kb 3 kb 14 kb 30 kb 25 kb 11 kb 92 kb 21 kb 27 kb 55 kb 19 kb

94% 96% 92% 94% 93% 97% 93% 91% 92% 90% 98%

Θ .08 .40 .33 .05 .11 .05 .07 .02 .27 .24

Haplotype Map of 5q31

Daly et al., Nature Genetics, Oct. 2001

* Common (35%) haplotype is associated with CD* 2:1 over-transmission of risk haplotype* present in >75% CD patients

>3,000,000 SNPs typed in samples from Nigeria, USA, China, Japan

Detailed catalog of human genetic variation patterns

Project ran from 2003-2007 following the Human Genome Project

Fundamental change 2: Technological advances accelerate discovery

Affecteds Controls AC

AA

AC

AC

AC

AC

AA

AA

AA

AA

Enable us to compare the genomes of…

Populations of cases and controls Family members with and without disease

New technologies to examine genomes…

Arrays of millions Sequencing the entire of DNA variants human genome (2005-present) (2010)

InflammatoryBowel Disease:

Crohn’s Disease (CD)&

Ulcerative Colitis (UC)Chronic, heritable inflammatory diseases of the gastrointestinal system.

• 2000: 0 genes known• 2005: 2 genes knownOnly a tiny fraction of the

heritability understood

Linkage and Candidate Gene Era

2000 2001 2002 2003 2004 2005 2006 2007 2008

NOD25q31

Performing a Genomewide Association Study

Key Quality Control Concepts• Technical QC

– Removal of failed SNPs, samples

• Genetic QC– Mendelian segregation and HWE– Estimating relatedness, gender– Population structure

• Analysis-based QC– Do initial runs of test statistics show inflation, biases

towards missing data, specific allele frequencies

Testing for association

• Most straightforward: compare proportion of each SNP allele in cases and controls

rs11209026 Allele A Allele G

Cases 22 976

Controls 68 932

Chi-sq = 24.5, p=7.3 x 10-7

Multiple Testing• In linkage, p = .001 (.05 / ~50 chromosomal

arms) considered potentially significant

• In GWAS, we’re performing O(105-106) tests that are largely independent– Each study has hundreds of p<.001 purely by

statistical chance (no real relationship to disease)

– “Genome-wide significance” often set at p=5x10-8 (= .05 / 1 million tests)

Genomewide Association

‘Manhattan’ plot

Q-Q plot

NOD2/CARD15

IBD5

ATG16L1

IL23R

10q

Previously Found by Linkage

Previously Found by Linkage

NOT Found by linkage

NOT Found by linkage

First Genomewide Association Results – 2006

New scales of computational tools required to manage and analyze these data

Replication is key

• Don’t believe a report of association from a single study– Even with strict quality control there are

artifacts that can affect 1 every thousand or ten thousand SNPs and escape notice

– Strict genomewide significance generally not dramatically exceeded (if it’s reached at all in a single study)

2000 2001 2002 2003 2004 2005 2006 2007 2008

NOD25q31

5p1310q213p21PTPN2IRGMIL12BNKX2-3TNFSF15

IL23RATG16L1

Individual GWA studies – Crohn’s

Andre Franke Dermot McGovern Kai Wang Richard Duerr A Hillary Steinhart Edouard Louis Kent Taylor Richard Gearry Albert Cohen Frank Seibold Leif Torkvist Rinse Weersma Amir Levine G Radford Smith Lisa Simms Robert Baldassano Andre Van Gossum Grant Montgomery Marc Lemann Salvator Cucchiara Anne Griffiths Hakon Hakonarson Mario Cottone Severine Vermeire Anne Phillips Ian Lawrance Mark Daly Stefan Schreiber Carl Anderson Jack Satsangi Mark Silverberg Stephan Brand Carsten Buning James Lee Marla Dubinsky Stephan Targan Cathryn Edwards Jean-Fred Colombel Martine De vos Steve Guthery Cecile Libioulle Jean-Pierre Hugot Mauro D’Amato Steven R Brant Charlie Lees Jeff Barrett Michel Georges Subra Kugathasan Christopher Mathew Jeremy Sanderson Miles Parkes Tariq Ahmad Cisca Wijmenga Jerry Rotter Miquel Sans Ted Denson Craig Mowat John Mansfield Murray Barclay Theodore Bayless Daan Hommes John Rioux Paul Rutgeerts Thomas Balschun Debby Laukens Jonas Halfvarson Phil Schumm Tim Florin Deborah D Proctor Judy Cho Pieter Stokkers Vito Annese Denis Franchimont Julián Panés Rebecca Roberts William Newman Derek P Jewell Jürgen Glas Renata D'Inca Yashoda Sharma

Fundamental change 3: CollaborationInternational IBD Genetics Consortium

Fundamental change 3: CollaborationInternational IBD Genetics Consortium

A Genome-Wide Association Study Identifies IL23R as an Inflammatory Bowel Disease GeneRichard H. Duerr, Kent D. Taylor, Steven R. Brant, John D. Rioux, Mark S. Silverberg, Mark J. Daly, A. Hillary Steinhart, Clara Abraham, Miguel Regueiro, Anne Griffiths, Themistocles Dassopoulos, Alain Bitton, Huiying Yang, Stephan Targan, Lisa Wu Datta, Emily O. Kistner, L. Philip Schumm, Annette T. Lee, Peter K. Gregersen, M. Michael Barmada, Jerome I. Rotter, Dan L. Nicolae, Judy H. Cho*

Genome-wide association study identifies new susceptibility loci for Crohn’s disease and implicates autophagy in disease pathogenesisJohn D Rioux, Ramnik J Xavier, Kent D Taylor, Mark S Silverberg, et al.

Pre-GWAS: NOD2, 5q31identified by consortiummembers

Gene Finding in Crohn’sNIDDK-IBDGC

International IBDGC

InflammatoryBowel Disease:

Crohn’s Disease (CD)&

Ulcerative Colitis (UC)Chronic, heritable inflammatory diseases of the gastrointestinal system.

• 2000: 0 genes known• 2005: 2 genes known• 2010: 100 confirmed lociWhat do we do with 100 genes?

Many associated genes implicate limited number of pathways

Computational tools enable the integration of ‘humangenetic screens’ with other genome-scale screening data

‘GRAIL plot’ from Franke et al 2010

Novel Insights Into Disease Biology =Novel Targets for Therapeutic Development

General immune pathway associated with CD, UC,

MS, psoriasis

General immune pathway associated with CD, UC,

MS, psoriasis

Innate immune components(NOD2, ATG16L1, IRGM)

specifically associated to CD

Innate immune components(NOD2, ATG16L1, IRGM)

specifically associated to CD

Autophagy, Innate Immunity IL23 signaling

AssociatedComponents:

IL23RIL12BTYK2JAK2STAT3NKX2-3

Ramnik Xavier

Any clinical value?

• Individually, DNA variants discovered by GWAS are not predictive of individual outcome

• In large numbers, and alongside environmental and traditional biomarkers, there may be clinical value

Genetic+Serologic prediction in IBD

CD & UC patients can be distinguished accurately

Patients with predictions strongly discordant from initial dx are 10x enriched for later diagnostic reclassification, complications Jonah Essers

Many other parallel successes

• 95 associated loci for blood lipids

• 32 associated loci for type 2 diabetes

• Numerous rare deletions/duplications associated to both autism and schizophrenia

Redefining disease boundaries and relationships

Locus Type

SCZ freq (%)

Odds ratio

Size (MB) Genes micro

RNA Other Diseases

1q21.1 del 0.25 7-15 0.9 10 yes Aut, MR, Epilepsy

3q29 del 0.08 17 1.6 21 Aut, MR

15q13.3 del 0.20 12-18 1.5 10 yes Aut, MR, Epilepsy

16p11.2 dup 0.3 8-25 0.5 >25 Aut, Bipolar Disorder

17q12 del 0.063 4.5 1.4 15 Aut, Epilepsy

22q11.2 del 1.0 30 1.5/3.0 44 yes Aut, MR, Epilepsy

NRXN1 del 0.19 2 variable 1 Aut

Every event which increases schizophrenia risk also increases autism risk

Copy-number risk factors for schizophrenia

More than half of all genes defined forone autoimmune disease are riskfactors for one or more other immune-mediated conditions

Limitations of GWA

• Often difficult to identify which gene is implicated

• For only a small fraction have conclusive, actionable, functional alleles been defined

• Rarer variation not yet assessed

Genomes

10k

1M

100k

100

1,000

Genome Sequencing is now a realityC

ost p

er H

uman

Gen

ome

Venter / capillaries

Watson / 454

African, Asian, Cancer pair169 in Genbank

Costs

$1M

$10k

$100M

$10M

$100k

2012

Time

20082007 2009 2010 2011

Individual Genome Sequencing

Enormous Potential…• Ng, Shendure: Miller syndrome, 4 cases

– exome sequenced reveals causal mutations in DHODH• Choi, Lifton: Undiagnosed congenital chloride diarrhoea (consanguinous)

– Exome seq reveals homozygous SLC26A3 chloride ion transporter mutation– Return diagnosis of CLD (gi) – corrected prior dx of Bartter syndrome (renal)

• Worthey, Dimmock: 4-year old, severe unusual IBD– exome seq reveals XIAP mutation (at a highly conserved aa) – proimmune disregulation opt for bone marrow transplant over chemo

• Jones, Marra: Secondary lung carcinoma unresponsive to erlotinib– Genome and transcriptome sequencing reveals defects– directs alternative sunitinib therapy

• Mardis, Wilson: acute myelocytic leukaemia but not classical translocation– Genome sequencing (1 week + analysis) reveals PML-RARA translocation– Directs ATRA (all trans retinoic acid) treatment decision

…Tremendous challenges

• Managing and processing vast quantities of data into variation

• Interpreting millions of variants per individual– An individual’s genome harbors

• ~50 point nonsense mutations• ~100-150 frameshift mutations• Tens of splice mutants, CNV induced gene

disruptions

For very few of these do we have any conclusive understanding of their medical impact in the population

Clinical Genetics: Current Paradigm

Issued: 01 MAR 07 Recommended next check: 28 FEB 10

PGI id: 5910322 – 61215923014

Are there validated genetic findings

that can influencethis decision?

3: 12,300 3: 12,400 ( kb )

PPARg

CONSEQUENTLY,CLINICAL DECISION MAKING RARELY INFORMED BY GENETICS

Consultation

Consent

Clinical assessment

CLINICAL DECISION TO BE MADE

USUALLY NO OFTEN IT IS NOT

Is it cost-effective/Feasible/

Reimbursableto acquire the data

Clinical Genetics: Future Paradigm

INDIVIDUAL SEQUENCEacquired in advance


PGI id: 5910322 – 61215923014

Subset of VALIDATED, INTERPRETABLE

VARIANTS available for clinical use

GWAS, targeted studies

Clinical studies Population Variation

Medical SequencingFunctional annotation

3: 12,300 3: 12,400 ( kb )

PPARg

KNOWLEDGEBASE of DNA VARIATIONand their MEDICAL IMPACT

Variant: C3 : 12,450,610 : T/C:

PPARG : Pro 467 Leu :

Medical

consequence:

Associated with severe insulin resistance, diabetes mellitus, hypertension, familial lipodystrophy

Pharmacological

consequence:

Resistant to thiazolidinediones

CLINICAL DECISION MAKINGOFTEN INFORMED BY GENETICS

Consultation

Consent

Clinical assessment

Adapted from Bentley et al. Nature 429: pp. 440-5 (2004)

Analytic and Translational Genetics Unit

• First new unit in the department of medicine in recent years

• Core faculty will focus on developing key analytic & translational infrastructure surrounding the interpretation of genome sequence

• Affiliated faculty will represent clinical partnerships throughout the hospital

• Close partnership with CHGR and Broad Institute

Tackling challenges of sequencing era

n = 2n = 3n = 4

n = 5n = 6

n = 7

n = 8

p = 1/2

Risk/Low Protective/High

n = 9

PROCESSING

ANALYSIS

BIOLOGICAL & CLINICAL INTERPRETATION

APO-B : triglyceridesC-alpha p<.001

PPI network of genes harboring de novo mutations in autism

ATGU

INDIVIDUAL SEQUENCEacquired in advance


PGI id: 5910322 – 61215923014

Subset of VALIDATED, INTERPRETABLE VARIANTS

(available for clinical use)

GWAS, targeted studies

Clinical studies Population VariationMedical SequencingFunctional annotation

3: 12,300 3: 12,400 ( kb )

PPARg

KNOWLEDGEBASE of VARIANTSand their MEDICAL IMPACT

Variant: C3 : 12,450,610 : T/C:

PPARG : Pro 467 Leu :

Medical

consequence:

Associated with severe insulin resistance, diabetes mellitus, hypertension, familial lipodystrophy

Pharmacological

consequence:

Resistant to thiazolidinediones

CLINICAL DECISION MAKINGOFTEN INFORMED BY GENETICS

Consultation

Consent

Clinical assessment

Goal 1: develop this knowledgebase Goal 2: through partnerships with clinical divisions, pathology – solve challenges of acquisition & delivery of actionable genetic information

Outstanding challenges in human genetics

I. Common Disease

• We can discover common SNPs associated to disease but cannot

– Systematically figure out which gene near the SNP is responsible

– Readily define functional variants

– Discover rare variation associated to disease

II. Mendelian Disease

• We can find the region of the genome but there are no shortcuts which of many genes bears responsible mutations

• Classical sequencing approaches can completely miss some types of mutations such as inversions, deletions/duplications

• Severe individual cases/families unapproachable until many others discovered

III. Spontaneous mutations• Most cancers caused by somatic

mutations (not inherited in germ line)• Many developmental disorders and other

severe conditions caused by spontaneous mutations

• While these conditions are even more “genetic” than many others we study, these types of variation are not amenable to standard gene mapping techniques

In a few brief years, next-generation sequencing has

developed and is now able to address all of these discovery

bottlenecks!!!

Pilot Experiment

• 350 CD and 350 Controls from NIDDK IBDGC GWAS sequenced for ~50 genes in Crohn’s associated regions (ca. Barrett et al 2008)– Pooled (n=50) NGS (PCR->Illumina)

• ~100 low frequency (2 copies to 5% MAF) ‘functional’ variants (missense/nonsense/splice) discovered (many singletons as well)

Replication will be keyCD versus HC +

UC (CD loci) GENE/AASub

seq seq

Replication p-value

case alleles

control alleles

chr16:49308343 NOD2,M863V 6 1 (12690,25210) 1.10E-16chr16:49308311 NOD2,N852S 5 0 (12456,24462) 5.25E-08chr16:49303430 NOD2,R703C 7 6 (10788,22461) 4.40E-08chr16:49302615 NOD2, S431L 3 5 (11076,23436) 0.00044chr2:102434852 IL18RAP,Val527Leu 4 2 (8674,17662) 6.94E-05chr12:39226476 MUC19,V56M 3 2 (7190,14577) 0.007

IBD versus HC (CD + UC loci)

chr9:138379413 CARD9,IVS10+1C>G 0 6 (25849,16440) <1e-16

chr1:67478488 IL23R,V362I 9 12 (19371,12393)

1.99E-07chr1:67421184 IL23R, Gly149Arg 2 1 (18418,16012) 3.00E-04chr10:35354137 CUL2,IVS17+ 5A>G 4 4 (19984,13003) 0.0043chr9:5062561 JAK2, Gly571Ser 1 3 (18453,11914) 0.01

Counts from pooled sequencing of 350 Crohn’s cases and 350 controls –

Identified ~100 low frequency ‘functional’ variants (2 copies up to 5%) across 50 sequenced genes – none are statistically compelling

Manny Rivas

Replication will be keyCD versus HC +

UC (CD loci) GENE/AASub

seq seq

Replication p-value

case alleles

control alleles

chr16:49308343 NOD2,M863V 6 1 (12690,25210) 1.10E-16chr16:49308311 NOD2,N852S 5 0 (12456,24462) 5.25E-08chr16:49303430 NOD2,R703C 7 6 (10788,22461) 4.40E-08chr16:49302615 NOD2, S431L 3 5 (11076,23436) 0.00044chr2:102434852 IL18RAP,Val527Leu 4 2 (8674,17662) 6.94E-05chr12:39226476 MUC19,V56M 3 2 (7190,14577) 0.007

IBD versus HC (CD + UC loci)

chr9:138379413 CARD9,IVS10+1C>G 0 6 (25849,16440) <1e-16

chr1:67478488 IL23R,V362I 9 12 (19371,12393)

1.99E-07chr1:67421184 IL23R, Gly149Arg 2 1 (18418,16012) 3.00E-04chr10:35354137 CUL2,IVS17+ 5A>G 4 4 (19984,13003) 0.0043chr9:5062561 JAK2, Gly571Ser 1 3 (18453,11914) 0.01

Replication attempted for 65 variants in >10,000 cases and>20,000 controls establishes highly significant novel variants were discovered

Unique challenge of deep sequencing data – especially exome data –Relevant variants are almost certainly found but we have next to no ability to recognize them from the sequencing data alone

Manny Rivas

Allelic series at CARD9

Common (~50%) risk alleledefined by GWAS (OR ~ 1.2) –allele correlates with higher expressionp < 10-15

Rare (~0.5%) protective allele defined by deep sequencing of GWAS hits (OR ~ 4-5) confirmed by genotyping in 10,000 casesp < 10-15

Exon 10 skipped, out of frame, transcript ends prematurely in exon 11

CARD9

Ongoing ATGU efforts

• Exome sequencing in autism, schizophrenia– Association of standing & de novo variation– Identification of rare recessive subtypes

• Complete definition of full allelic spectrum of GWAS hits (IBD, other immune mediated)– Define functionally actionable alleles– Move to experiments

• Genetics of Drug Induced Liver Injury• Clinical/Mendelian sequencing

– Genetically diagnosing severely affected children

Daly & AltshulerLabs

Chris Cotsapas *Jonah EssersJason FlannickTodd GreenAndrew KirbyElaine LimJared MaguireBen NealeSoumya Raychaudhuri *Stephan RipkeManny RivasLiz RossinAyellet SegreBen Voight

Crohn’s Disease

Ramnik Xavier & LabAlan Huett, Petric Kuballa, Agnes Gardet, Aylwin Ng, Yair Benita

John Rioux & Lab (UdeM)Philippe Goyette, Guillaume Lettre

Investigators of NIDDK IBDGCJudy ChoSteve BrantRick DuerrJerry RotterMark SilverbergDermot McGovern

Miles Parkes & Wellcome Trust IBDBelgian-French IBD Consortium

IIBDGC Analysis TeamJeff Barrett, Dan Nicolae, Michel Georges, Sarah HansoulCarl Anderson, Emily Kistner

Broad Institute

Christine StevensNoel BurttStacey Gabriel

Mark dePristoKiran GarimellaEric Banks& the GSA team!

ATGUShaun PurcellBen Neale