
Betwixt and Between; Common and Rare Genetic Variants in Human Disease

Peter Szatmari MD

Offord Centre for Child Studies

McMaster University

McMaster Children’s Hospital

Financial Disclosure

The Canadian Institutes of Health Research

Autism Speaks

Sinneave Family Foundation

Ontario Research Fund

Royalties from Guilford Press

No other sources of funding (stocks, industry, Big Pharma etc)

Objectives

What have we learned about the genetic architecture of ASD;

Focus on explanatory power of common and rare variants

Copy Number Variants as examples of rare risk factors

Neither story provides much explanatory power

So we are “betwixt and between”; what does the future hold? WGS?

What is Genetic Epidemiology?

The study of inherited factors in disease

Combination of epidemiology and statistical genetics

Uses a variety of study designs to meet its objectives

Steps in Genetic Epidemiology

Is the disorder familial? - family studies

Is the familiality due to genetic factors? - twin and adoption studies

Can candidate genes be identified?

Can chromosomal susceptibility regions be identified? - GW linkage and association studies

Exome and whole genome sequencing?

A disease can be genetic without being inherited

The history of autism genetics through these steps

A heterogeneous ‘spectrum’ disorder involving deficits in 3 domains of function

4 to 1 sex ratio, more females with severe ID

Changing epidemiology; more non-autism ASD

Changing epidemiology; less frequent ID

Diagnostic substitution occurring

[Figure labels: strict autism; spectrum; social communication deficits; restrictive/repetitive behaviors; language; cognitive deficits; medical co-morbidities 25-40%; anxiety 30-50%]

0.6 % to 1% prevalence

Increasing prevalence due to better case finding

Autism spectrum disorders


Family Studies

Recurrence risk (RR) to sibs; 5% but based on old data collected retrospectively

Stoppage rules; when taken into account, sib RR increases to 10%

Baby sibs studies; RR now 19%

Intermediate phenotypes in another 20%
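To put these recurrence risks in context, they can be divided by the population prevalence, which gives the sibling relative risk often written as lambda_s. The sketch below is only an illustration: it assumes a prevalence of 1% (the upper end of the 0.6% to 1% range quoted earlier), and the function and dictionary names are made up for this example.

```python
def sibling_relative_risk(sib_recurrence: float, prevalence: float) -> float:
    """Ratio of the recurrence risk in siblings to the population prevalence."""
    if not 0.0 < prevalence <= 1.0:
        raise ValueError("prevalence must be in (0, 1]")
    return sib_recurrence / prevalence


# Assumption: 1% prevalence, the upper end of the range on the slides.
PREVALENCE = 0.01

# Sibling recurrence risks taken from the slide.
sib_recurrence = {
    "retrospective data": 0.05,
    "adjusted for stoppage": 0.10,
    "baby sibs studies": 0.19,
}

lambda_s = {
    label: sibling_relative_risk(risk, PREVALENCE)
    for label, risk in sib_recurrence.items()
}

for label, value in lambda_s.items():
    print(f"{label}: lambda_s = {value:.0f}")
```

With a lower assumed prevalence (for example 0.6%) every ratio would be correspondingly larger.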

Twin Studies

Twin studies; traits in general population and in diagnosed twins

Older studies of ASD twins; MZ vs DZ concordance = .65 vs .05

Heritability >90%

Hallmayer et al 2011; MZ vs DZ concordance

Males .58 vs .21

Females .60 vs .27

Greater role for shared environmental factors (55%) than genetic (37%)
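One way to see how twin data split into genetic and shared environmental parts is the textbook ACE approximation (Falconer's formulas). The sketch below is an illustration under that assumption, not the model fitted in the cited study. The formulas apply to twin correlations in liability, not to raw concordance rates, so the example works backwards from the 37% and 55% figures to the correlations they would imply.

```python
def implied_twin_correlations(a: float, c: float) -> tuple[float, float]:
    """Liability correlations implied by an ACE model.

    MZ twins share all additive genetic effects (A) and the shared
    environment (C); DZ twins share half of A and all of C.
    """
    return a + c, 0.5 * a + c


def falconer_ace(r_mz: float, r_dz: float) -> tuple[float, float, float]:
    """Falconer's estimates of A, C and E from twin correlations."""
    a = 2.0 * (r_mz - r_dz)   # additive genetic variance
    c = 2.0 * r_dz - r_mz     # shared environment
    e = 1.0 - r_mz            # non-shared environment plus error
    return a, c, e


# Variance components quoted on the slide: genetic 37%, shared environment 55%.
r_mz, r_dz = implied_twin_correlations(a=0.37, c=0.55)
a, c, e = falconer_ace(r_mz, r_dz)

print(f"implied r_MZ = {r_mz:.3f}, r_DZ = {r_dz:.3f}")
print(f"A = {a:.2f}, C = {c:.2f}, E = {e:.2f}")
```

A high DZ correlation relative to the MZ correlation is what pushes the estimate towards shared environment rather than genes.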

The Genetic Architecture of ASD

Some single gene disorders; TS, FraX, NF, etc (5%)

Chromosomal abnormalities spread throughout the genome (5%)

Kelleher III R.J and Bear M.F (2008) Cell 135, October 31, 2008

[Figure: 391 cytogenetically-visible breakpoints in autism, mapped across chromosomes 1-22, X and Y. Source: http://projects.tcag.ca/autism/]

Breakpoints: Translocation (n=126), Deletion (n=128), Inversion (n=37), Duplication (n=100)

What About the Other 90%?

Little family history of autism, low risk to sibs and twins

Like other genetically “complex” disorders such as CVD, epilepsy, obesity, diabetes, etc

Except that effect on fertility is greater

Two models of genetic complexity

Common disease-common variant

Common disease-rare variant

The Common disease-common variant model; finding genes

Candidate gene studies

Genome wide linkage

GWAS

Common Disease-Common Variant model

Non-syndromic, non-Mendelian ASD is a common disease, therefore it might be caused by common genetic variants

Polygenic multifactorial model; each gene has a small to moderate effect size

Many different variants with an additive effect

The London Underground; de Vries Nature Medicine 15 (8) August 2009

Candidate Gene Studies

ASD considered to be “caused” by neurotransmitters; 5HT, dopamine, NE

Focus on genes associated with regulating those proteins

Hundreds of positive results

Hundreds of non-replications

Small sample sizes, multiple testing of different alleles, marker density, population stratification etc

Linkage Studies

Common variants of moderate to large effect size

Genetic (locus) homogeneity

Focus on affected sib pairs and non-parametric models

Linkage; Parametric Methods

Based on non-independent segregation of genetic markers and disease alleles

Developed for Mendelian disorders

“Log of the odds” (LOD) of linkage vs no linkage (>3.0 is significant)

Need dense families

Accurate classification is essential

Must specify a genetic model (gene frequency, mode of transmission, penetrance)
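The "log of the odds" can be made concrete for the simplest case: fully informative, phase-known meioses scored as recombinant or non-recombinant. The sketch below assumes that idealised setting; real pedigree analysis also needs the gene frequency, mode of transmission and penetrance listed above. The function name is illustrative.

```python
import math


def lod_score(recombinants: int, meioses: int, theta: float) -> float:
    """Two-point LOD score for phase-known, fully informative meioses.

    Compares the likelihood at recombination fraction theta with the
    likelihood under no linkage (theta = 0.5), on a log10 scale.
    """
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")

    non_recombinants = meioses - recombinants
    if theta == 0.0:
        # With theta = 0 any observed recombinant is impossible.
        if recombinants > 0:
            return float("-inf")
        log_linked = 0.0
    else:
        log_linked = (recombinants * math.log10(theta)
                      + non_recombinants * math.log10(1.0 - theta))
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked


# Ten meioses with no recombinants just clear the conventional threshold of 3.
print(f"{lod_score(0, 10, 0.0):.2f}")
# One recombinant in ten meioses, evaluated at theta = 0.1, falls well short.
print(f"{lod_score(1, 10, 0.1):.2f}")
```

This is why dense families matter: each informative meiosis adds at most about 0.3 to the score.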

Non Parametric methods

Degree of allele sharing among affected relatives, most commonly sibs

Under no linkage, sibs share 0, 1 and 2 alleles identical by descent with probability 25%, 50% and 25%

Is there distortion in allele sharing?

Model free, less vulnerable to misclassification

Major challenge is power; esp when there is genetic (locus) heterogeneity!
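A simple way to ask whether allele sharing is distorted is the "mean test" for affected sib pairs: compare the average proportion of alleles shared with the 50% expected under no linkage. The sketch below assumes the sharing state of every pair is known exactly, and the counts are hypothetical, chosen only for illustration.

```python
import math


def mean_sharing_z(n_share0: int, n_share1: int, n_share2: int) -> float:
    """Z statistic for excess allele sharing in affected sib pairs.

    Under no linkage the proportion of alleles shared by a pair is
    0, 0.5 or 1 with probability 0.25, 0.5 and 0.25, so its mean is
    0.5 and its variance is 1/8.
    """
    n_pairs = n_share0 + n_share1 + n_share2
    if n_pairs == 0:
        raise ValueError("need at least one sib pair")
    mean_proportion = (0.5 * n_share1 + 1.0 * n_share2) / n_pairs
    standard_error = math.sqrt(1.0 / (8.0 * n_pairs))
    return (mean_proportion - 0.5) / standard_error


def one_sided_p(z: float) -> float:
    """Upper-tail probability of a standard normal variate."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical counts for 100 affected sib pairs sharing 0, 1 and 2 alleles.
z = mean_sharing_z(20, 50, 30)
print(f"z = {z:.2f}, one-sided p = {one_sided_p(z):.3f}")
```

If only a fraction of families carry a risk allele at the locus, the excess sharing is diluted towards 50%, which is the power problem noted above.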

Common Disorder/Common Variant Linkage Studies in ASD

Many genome wide linkage studies using affected sib pairs (using non-parametric methods)

Each with sample size 50 to 400

Many significant linkage peaks but few are replicable

Conclusion; disorder is so heterogeneous and effect of common variant so small we need very large sample sizes

Autism Genome Project

Phase I

Affymetrix 10k SNP genotype data

Linkage analysis in 1146 multiplex autism families

Initial scan for CNV

Phase II

Illumina 1M SNP genotype data

High-resolution scan for de novo and inherited CNV

Genome-wide association analysis

Molecular studies of candidate loci

Linkage Peaks Stratified by Sex

Problems with Linkage for Complex Disorders

Very sensitive to locus heterogeneity

Low power for loci of small to moderate effects

Very sensitive to misclassification of phenotype

Turn to GWAS; much greater power than linkage for alleles of small effect

Genome Wide Association Studies (GWAS)

1 Million genetic markers (SNP’s are biallelic markers)

Which markers in which genes are more common in children with ASD than expected?

Trio based or case-control

Are those markers located in genes (or in LD with genes) that are expressed in brain?
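For the case-control design, "more common than expected" comes down to a test on a 2x2 table of allele counts at each marker, judged against a threshold corrected for the number of markers. The sketch below assumes a plain allelic chi-square test with a Bonferroni correction for 1 million markers. The allele counts are hypothetical, and real analyses also have to deal with population stratification and genotyping quality.

```python
import math


def allelic_association(case_risk: int, case_other: int,
                        control_risk: int, control_other: int):
    """Odds ratio, chi-square (1 df) and p-value for a 2x2 allele-count table."""
    odds_ratio = (case_risk * control_other) / (case_other * control_risk)

    total = case_risk + case_other + control_risk + control_other
    cross_difference = case_risk * control_other - case_other * control_risk
    chi_square = (total * cross_difference ** 2) / (
        (case_risk + case_other) * (control_risk + control_other)
        * (case_risk + control_risk) * (case_other + control_other)
    )
    # Upper tail of a chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return odds_ratio, chi_square, p_value


# Bonferroni threshold for the 1 million markers mentioned on the slide.
N_MARKERS = 1_000_000
GENOME_WIDE_ALPHA = 0.05 / N_MARKERS

# Hypothetical counts: 1000 cases and 1000 controls, two alleles each.
odds_ratio, chi_square, p_value = allelic_association(700, 1300, 600, 1400)

print(f"OR = {odds_ratio:.2f}, chi-square = {chi_square:.1f}, p = {p_value:.1e}")
print("genome-wide significant:", p_value < GENOME_WIDE_ALPHA)
```

In this made-up example a result that would look convincing in a single-marker study is nowhere near the genome-wide threshold.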

GWAS

Very successful if MAF >5%

500 SNP’s (genetic markers) associated with many common diseases

Eg Type 2 diabetes; 5000 cases and 5000 controls

18 SNP’s associated with type 2 diabetes (OR=1.09 to 1.37)

Explain 6% of the heritability

Actual causal variant not discovered

GWAS

Wang et al (2009); cadherin genes at 5p14

Ma et al (2009); also at 5p14 but only in secondary analysis

Weiss et al (2009) 5p15 at SEMA5A

Anney et al (2010) MACROD2

[Figure: association results at MACROD2, additive model, all ancestries; panels: Autism Dx and ASD Dx]

Bottom Line of GWAS?

One SNP barely reaches genome-wide significance (GWS)

No subtype or ASD quantitative trait reaches GWS (especially if correct for multiple testing)

None of the other results can be replicated

But beware of the “Winner’s Curse”!

GWAS very sensitive to allele frequency and allelic heterogeneity

Power curves

[Figure: two panels of power curves; power plotted against risk allele frequency for odds ratios from 1.2 to 2.0. Largest sample evaluated in Stage 1: N = 1385 ASD subjects]
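The shape of such power curves can be reproduced roughly with a normal approximation to the allelic test. The sketch below is an assumption-laden illustration and not the calculation behind the figure. It assumes a case-control design with as many controls as cases (only the 1385 cases come from the slide), a genome-wide alpha of 5e-8, and hypothetical function names.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def case_allele_frequency(control_freq: float, odds_ratio: float) -> float:
    """Risk allele frequency in cases implied by the allelic odds ratio."""
    return odds_ratio * control_freq / (1.0 + control_freq * (odds_ratio - 1.0))


def allelic_test_power(control_freq: float, odds_ratio: float,
                       n_cases: int, n_controls: int, alpha: float) -> float:
    """Approximate power of a two-sided test comparing allele frequencies."""
    p0 = control_freq
    p1 = case_allele_frequency(p0, odds_ratio)
    # Each person contributes two alleles.
    standard_error = math.sqrt(p1 * (1.0 - p1) / (2 * n_cases)
                               + p0 * (1.0 - p0) / (2 * n_controls))
    z_effect = abs(p1 - p0) / standard_error
    z_critical = -_STD_NORMAL.inv_cdf(alpha / 2.0)
    return (_STD_NORMAL.cdf(z_effect - z_critical)
            + _STD_NORMAL.cdf(-z_effect - z_critical))


N_CASES = 1385       # largest Stage 1 sample, from the slide
N_CONTROLS = 1385    # assumption: equal number of controls
ALPHA = 5e-8         # assumption: conventional genome-wide threshold

for frequency in (0.05, 0.10, 0.30, 0.50):
    row = [allelic_test_power(frequency, ratio, N_CASES, N_CONTROLS, ALPHA)
           for ratio in (1.2, 1.4, 1.6, 1.8, 2.0)]
    print(frequency, " ".join(f"{power:.2f}" for power in row))
```

Under these assumptions, power at this sample size is negligible for odds ratios near 1.2 at any allele frequency, which fits the sparse results above.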

The Argument for the Common Variant Model

We should be studying more “familial cases”

We should be using intermediate phenotypes, quantitative traits

We should be looking at gene X gene, gene X environment interactions

We should be looking at parent of origin effects

We should ignore p-values and instead rank order SNP’s

All true, next generation of GWAS

The Argument Against the Common Variant Model

ASD is associated with reduced fertility

New variants must arise de novo that are risk factors to keep prevalence stable

If they are new they are rare

Each person carries on average 175 de novo mutations, deletions, duplications that are mostly benign

If a deleterious variant occurs in a brain expressed gene? Might cause ASD

Is ASD a Common Disease/Rare Variant?

ASD a disorder with reduced fertility

De novo mechanisms of causation (like a spontaneous mutation)

These will necessarily be rare until they diffuse thru the population

What is a Rare Event?

Frequency of risk factor <1%

Variation in DNA sequence that affects protein coding

SNP; biallelic marker (by itself or in LD with a DNA sequence)

Structural variant; chromosomal abnormality (ie a CNV, insertions, duplications, translocations etc)

But they might have a big effect size

Slide courtesy of Dr. C. Marshall

The Boston Underground; de Vries Nature Medicine 15(8) August 2009

What are Copy Number Variants (CNV’s)?

Variations in DNA segments >1kb

Deletions, insertions, duplications, others

Rare or common; inherited from parents or arise de novo?

If CNV overlaps a gene expressed in brain, AND it disrupts the function of that gene, it could lead to ASD

“CNV refers to DNA segments for which copy number differences have been observed in the comparison of two or more genomes”

Lee and Scherer, Expert Reviews in Mol. Med. 2010

[Figure: Copy Number Variation (CNV); panels: Deletion, Duplication]

Slide courtesy of Dr. C. Marshall

Slide courtesy of Julie Cohen, ScM, CGC, Kennedy Krieger Institute

Copy Number Variations (CNVs)

• We all have them!

• Most of them do not harm us

• Most of them we inherited from our parents

Rare Variants in ASD

What is the evidence that rare variants, as measured by CNV’s, play a role in ASD?

Simple comparison of “global burden” of brain expressed CNVs or previously implicated CNVs in ASD vs controls

Autism Genome Project

Collaboration of 13 research groups

Pooling of families (1500 families)

Common genotyping (1M SNP’s) and clinical measures (ADI/ADOS) for all affected sib pairs

Funded by Autism Speaks, CIHR, Genome Canada, UK MRC, HRB (Ireland)

Global burden for rare CNVs in cases vs. controls

PLINK v. 1.07, genome-wide P values, one-sided tests, 100,000 permutations

*Pcorr, controlled for global case-control differences, logistic regression

3 measures:

• CNV rate

• Estimated size

• CNV location and # of genes affected
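The idea behind a one-sided permutation test of CNV burden can be sketched in a few lines: shuffle the case and control labels many times and count how often the shuffled difference is at least as large as the observed one. This is a generic illustration, not PLINK's implementation. The per-person CNV counts are hypothetical, and the default number of permutations is kept small so the example runs quickly.

```python
import random
from statistics import mean


def permutation_p_value(case_values, control_values,
                        n_permutations: int = 2000, seed: int = 0) -> float:
    """One-sided permutation p-value for a higher mean burden in cases."""
    rng = random.Random(seed)
    observed = mean(case_values) - mean(control_values)

    pooled = list(case_values) + list(control_values)
    n_cases = len(case_values)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign case/control labels at random
        shuffled = mean(pooled[:n_cases]) - mean(pooled[n_cases:])
        if shuffled >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical numbers of rare genic CNVs per person.
cases = [3, 4, 5, 4, 3, 5, 4, 4]
controls = [1, 2, 1, 2, 1, 2, 2, 1]

print(f"p = {permutation_p_value(cases, controls):.4f}")
```

The same function works for any per-person measure, such as total CNV size or number of genes affected, which is how the three measures above could be handled in turn.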


CNV burden in known ASD and/or ID genes

Enrichment of genic-CNVs in known ASD and ID loci (1.69 fold, P = 3.4 x 10^-4)

n=46 n=127 n=103

Genes in which CNV’s have been replicated

Neuroligin 3 and 4

Neurexin

Shank2 and Shank3

Contactin associated protein 2

PTCHD1

Large region on chromosome 16p11

New ones reported each week!

Each one seen in <1% of cases

Range of effects; linked in common networks

Walsh C.A., Morrow E.M. and Rubenstein J.L (2008) Cell 135, October 31, 2008

ASD and ID risk genes may be linked in a connected pathway

Functional Enrichment Gene-set Map for ASD

Familial segregation - examples

de novo CNV: dup 8p23.3, 791 kb, disrupts DLGAP2

maternal missense mutation Xp21.3, IL1RAPL1 (A117S, 349G>T); 1/325 cases, 0/250 controls

2 adjacent 17q25.3 de novo CNVs: de novo del 17q25.3 (SLC16A3, CSNK1D); de novo dup 17q25.3, 829 kb, 37 genes

maternal Xp22.11 del in males; DDX53/PTCHD1AS (non-coding RNA for PTCHD1)

*In red if there is previous evidence suggesting gene involvement in ASD or ID

[Pedigree figures for families 5290, 5444 and 5298; labels: genotypes T/G, --/T, G/--, --/G; 829 kb dup, 64 kb del, 791 kb dup, 121 kb del]

MM0088 – MPX family. Proband has 676 kb de novo loss at 16p11.2

SK0019 – SPX family. Proband has 676 kb de novo loss at 16p11.2

SK0102 – SPX family. Proband has 432 kb de novo gain at 16p11.2

[Pedigree figures: MM0088 (676 kb loss), SK0019 (676 kb loss), SK0102 (432 kb gain)]

III. What does a de novo change mean in a complex disorder?

MPX #62346: de novo 1.2 Mb deletion at 3p25.1, 3.4 Mb deletion at 5p15, t(5;7)(p15;p13)

SPX #HSC0215: de novo 1 Mb deletion at 1p21.3; inherited t(19;21)(p13.q22.1)

[Pedigree figures; labels: PDD, AD, del 3p25.1, del 5p15 (MPX #62346); AD, del 1p21.3, t(19;21) (SPX #HSC0215)]

Prefer multiple lines of evidence supporting locus involvement

MM0160/MM1470-72 [SHANK1 deletion]

[Pedigree figure, individuals numbered 1-16; sample, tissue and result listed below]

MM0160_001: blood, 64 kb del

MM0160_002: blood, no del

MM0160_003: blood, 64 kb del

MM0160_004: blood, no del; no del (from old DNA)

MM0160_005: blood, 64 kb del

MM0160_006: blood, no del

MM0160_007: lymphocyte, 64 kb del

MM0160_008: lymphocyte, no loss

MM1470_002: blood, no del

MM1470_003: blood, 64 kb del

MM1470_004: saliva, 64 kb del, Asperger

MM1470_005: saliva, no del

MM1471_002: refused blood collection

MM1471_003: saliva, no del

MM1472_002: blood, no del

MM1472_003: blood, no del

CNV’s in ASD

More de novo CNV’s in genes implicated in ASD and ID, OR=1.69; 7% of cases vs 4% of controls

Population attributable risk is 3%

Discovered functional networks of genes

In ASD, a shift from neurotransmitters to synaptic genes

Same CNV’s seen in ID, epilepsy, ADHD, schizophrenia, BAD (?)
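The 3% figure can be checked approximately from the other numbers on this slide. The sketch below assumes Levin's formula for the population attributable fraction, treats the odds ratio as an approximation to the relative risk, and uses the 4% carrier frequency in controls as the population exposure. These are illustrative assumptions, not necessarily the calculation used in the study.

```python
def odds_ratio(proportion_cases: float, proportion_controls: float) -> float:
    """Odds ratio from the proportion of carriers in cases and in controls."""
    odds_cases = proportion_cases / (1.0 - proportion_cases)
    odds_controls = proportion_controls / (1.0 - proportion_controls)
    return odds_cases / odds_controls


def attributable_fraction(exposure_prevalence: float,
                          relative_risk: float) -> float:
    """Levin's population attributable fraction."""
    excess = exposure_prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Rounded carrier proportions from the slide give a slightly higher odds
# ratio than the reported 1.69, presumably because of rounding.
crude_or = odds_ratio(0.07, 0.04)

# Reported odds ratio, with the control carrier frequency as the exposure.
paf = attributable_fraction(exposure_prevalence=0.04, relative_risk=1.69)

print(f"crude OR from rounded percentages = {crude_or:.2f}")
print(f"population attributable fraction = {paf:.1%}")
```

A risk factor carried by only a few percent of the population can have a sizeable odds ratio and still account for a small share of all cases.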

Next Generation of Studies

Search for rare inherited variants thru linkage

CNV’s smaller than 1 kb

More complicated structural rearrangements

Whole exome and whole genome sequencing

Current efforts at WGS in ASD identifying variants in another 10-15%?

Rare mutations common in unaffected controls as well

Challenges

Annotation of functional significance of variants

Determination of “causation” when risk factor is rare and disorder is multifactorial

Are the health benefits of identifying rare genetic variants worth the cost? Diagnostics and therapeutics?

Heterogeneity is the main obstacle

Recent findings from WGS

Rare variants are common; due to population overgrowth and weak purifying selection

Most SNVs in the genome are rare

>90% of the SNVs judged to be functionally relevant were rare

But it will take huge sample sizes to detect the majority of rare variants involved in disease mechanisms.

A final twist!


Conclusions

Data a mix of many genetic subgroups

Draw a sample

May get lucky and catch lots of the "orange" type

Draw a new sample = reshuffling the mix

And now the disorder looks green

Conclusions

ASD is a complex genetic disorder with more complexity than previously imagined

Many rare, de novo, variants account for an increasing proportion of cases

Low hanging fruit in ASD genetics

Common vs rare variant models an oversimplification

Many unanswered questions remain