Rare variant analysis in large-scale association and sequencing studies

Eleftheria Zeggini


Page 1: Rare variant analysis in large-scale association and sequencing studies

Rare variant analysis in large-scale association and sequencing studies

Eleftheria Zeggini

Page 2: Rare variant analysis in large-scale association and sequencing studies

Missing heritability in complex traits

Interactions

Structural variation

Epigenetics and environment

Thousands of very small effects

Large phenotype-genotype heterogeneity

Locus heterogeneity and rare variants

Page 3: Rare variant analysis in large-scale association and sequencing studies

Low frequency and rare variants

Low frequency (0.01<MAF<0.05) and rare variation (MAF<0.01) can contribute to complex common phenotypes

Rare variants can have higher penetrance, contribute to more extreme phenotypes and may be more useful as predictive markers

Accessing low frequency and rare variants through:
– GWAS
– imputation
– re-sequencing

Page 4: Rare variant analysis in large-scale association and sequencing studies

Rare variant analysis

Single-point analysis of rare variants is under-powered

Approximate sample sizes (cases+controls, equally sized) required to attain 80% power to detect an allelic OR=2.0 at α=5×10−8 increase dramatically as MAF decreases:

MAF      Sample size
0.05     2,500
0.01     12,000
0.001    117,000

An alternative is to use multivariate methods to combine information across multiple variant sites

Several locus-specific approaches have been proposed:
– collapsing methods
– allele-matching methods
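The sample sizes quoted on this slide can be approximated with a standard two-proportion power calculation on allele counts. The sketch below is one way to do that, not the calculation behind the slide: it assumes the MAF is the control allele frequency, uses a normal approximation, and the function name `total_sample_size` is hypothetical. It should give values of the same order as the table, though not necessarily identical ones.

```python
from statistics import NormalDist


def total_sample_size(maf, odds_ratio=2.0, alpha=5e-8, power=0.80):
    """Approximate total individuals (cases + controls, equal groups)
    for an allelic test, using the two-proportion normal approximation."""
    p0 = maf                                  # assumed control allele frequency
    odds1 = odds_ratio * p0 / (1.0 - p0)      # allele odds in cases
    p1 = odds1 / (1.0 + odds1)                # allele frequency in cases
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # two-sided threshold
    z_beta = NormalDist().inv_cdf(power)
    p_bar = (p0 + p1) / 2.0
    numerator = (
        z_alpha * (2.0 * p_bar * (1.0 - p_bar)) ** 0.5
        + z_beta * (p0 * (1.0 - p0) + p1 * (1.0 - p1)) ** 0.5
    ) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    # Each individual carries two alleles, so individuals per group is half
    # of alleles_per_group; cases + controls together equal alleles_per_group.
    return alleles_per_group


for maf in (0.05, 0.01, 0.001):
    print(f"MAF={maf}: ~{total_sample_size(maf):,.0f} individuals in total")
```

The required sample size grows roughly in inverse proportion to the MAF, which is why single-point tests become impractical for rare variants.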

Page 5: Rare variant analysis in large-scale association and sequencing studies

Rare variant analysis methods: challenges

Imputation: genotype-associated probabilities

Resequencing: genotype call uncertainty, false positive rate

Probability that a variant is functional

Family-based designs

Extreme distribution ends designs

Incorporating multiple covariates

Correlation structure

Direction of effect

Meta-analysis

Page 6: Rare variant analysis in large-scale association and sequencing studies

Collapsing methods

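[Figure: example per-variant probabilities p_i (0.2, 0.1, 0.0, 0.2) and a regression equation in y, r, m, β and x; the equation is not recoverable from the transcript]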

ARIEL: Accumulation of Rare variants Integrated and Extended Locus-specific test
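To make the idea of collapsing concrete, here is a minimal sketch of a generic burden-style test. It is not ARIEL: the scoring rule (weighted proportion of rare sites at which an individual carries a minor allele), the optional per-site `weights`, the permutation test and all function names are assumptions made for illustration.

```python
import random


def burden_scores(genotypes, weights=None):
    """genotypes: one list per individual of minor-allele counts (0/1/2)
    at the rare sites of a locus. weights: hypothetical per-site quality
    weights; every site gets weight 1.0 if omitted."""
    n_sites = len(genotypes[0])
    weights = weights or [1.0] * n_sites
    total_weight = sum(weights)
    return [
        sum(w * (1 if g > 0 else 0) for g, w in zip(row, weights)) / total_weight
        for row in genotypes
    ]


def collapsing_test(case_geno, control_geno, n_perm=2000, seed=0):
    """Return (difference in mean burden score, permutation p-value)."""
    scores = burden_scores(case_geno + control_geno)
    n_cases = len(case_geno)

    def mean_diff(s):
        return sum(s[:n_cases]) / n_cases - sum(s[n_cases:]) / (len(s) - n_cases)

    observed = mean_diff(scores)
    rng = random.Random(seed)
    permuted = scores[:]
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(permuted)                 # break the case/control labels
        if abs(mean_diff(permuted)) >= abs(observed):
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: three rare sites, six cases and six controls.
cases = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0], [0, 0, 0], [1, 0, 0]]
controls = [[0, 0, 0]] * 5 + [[0, 1, 0]]
diff, p_value = collapsing_test(cases, controls)
print(f"mean burden difference = {diff:.3f}, permutation p = {p_value:.4f}")
```

Because the score only counts carriers, risk and protective variants at the same locus tend to cancel each other out, which is the weakness of collapsing that the power comparison later in the talk points to.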

Page 7: Rare variant analysis in large-scale association and sequencing studies

Allele-matching methods

Extended to account for uncertainty: AMELIA (Allele-Matching Empirical Locus Integrated Association test)

Compare similarity scores between cases and controls at each SNP, then sum over SNPs: KBAT

[Figure: example similarity scores for cases and controls (2 4 4 4 4 2; 2 4 0 4 4 4)]

Mukhopadhyay et al, Gen Epi 2009
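The similarity scores of 0, 2 and 4 shown on this slide are consistent with an allele-match score that counts identical allele pairs between two genotypes. The sketch below illustrates that scoring and the "sum over SNPs" step only. It does not reproduce the KBAT test statistic, and the genotype coding (minor-allele counts 0/1/2) and function names are assumptions.

```python
from itertools import combinations, product


def allele_match(g1, g2):
    """Number of identical allele pairs (0 to 4) between two genotypes
    coded as minor-allele counts 0, 1 or 2."""
    return g1 * g2 + (2 - g1) * (2 - g2)


def mean_similarity(group_a, group_b=None):
    """Mean allele-match score, summed over SNPs, across pairs of
    individuals: within group_a if group_b is None, otherwise between
    group_a and group_b."""
    if group_b is None:
        pairs = combinations(group_a, 2)
    else:
        pairs = product(group_a, group_b)
    totals = [sum(allele_match(x, y) for x, y in zip(a, b)) for a, b in pairs]
    return sum(totals) / len(totals)


# Toy data: three SNPs, three cases and three controls.
cases = [[0, 1, 0], [0, 1, 0], [0, 1, 1]]
controls = [[0, 0, 0], [0, 0, 0], [0, 0, 1]]
print("within cases:   ", mean_similarity(cases))
print("within controls:", mean_similarity(controls))
print("between groups: ", mean_similarity(cases, controls))
```

A score of this kind compares genotypes directly rather than counting minor alleles, so it does not depend on all causal variants acting in the same direction.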

Page 8: Rare variant analysis in large-scale association and sequencing studies

Power comparison

1000 replications, d=0.02, Q=0.05, non-consensus SNP quality scores, 1000 cases/1000 controls, causal variants are of high quality (phred score 10; probability of correct base-call 0.90)
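The link between the phred score and the base-call probability quoted here follows from the standard phred definition, Q = −10·log10(P_error). A short sketch of the conversion (function names are illustrative):

```python
import math


def phred_to_error_prob(q):
    """Probability that a base call is wrong, given its phred score."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p_error):
    """Phred score corresponding to a base-call error probability."""
    return -10 * math.log10(p_error)


p_correct = 1 - phred_to_error_prob(10)
print(f"phred 10 -> probability of correct base-call {p_correct:.2f}")
```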

• In the presence of different directions of effect, allele-matching methods are much more powerful than collapsing methods

• accounting for uncertainty increases power

Page 9: Rare variant analysis in large-scale association and sequencing studies

Power comparisons using 500 cases/500 controls and 1000 cases/1000 controls, when causal variants are of high quality (phred score 10; probability of correct base-call 0.90)

• The power advantage of the allele-matching methods over the collapsing methods increases further with increasing sample size

• accounting for uncertainty increases power

Page 10: Rare variant analysis in large-scale association and sequencing studies

Population isolates

• The study of rare variants can be empowered by focusing on isolated populations, in which rare variants may have increased in frequency and linkage disequilibrium tends to be extended

• Need deeply-phenotyped isolated population samples

• Whole-genome sequencing in a subset of samples, followed by imputation into the full set of GWAS-genotyped samples

• Association with traits of interest

Page 11: Rare variant analysis in large-scale association and sequencing studies
Page 12: Rare variant analysis in large-scale association and sequencing studies

Analysis of rare variants in 1000 genomes-imputed data

Page 13: Rare variant analysis in large-scale association and sequencing studies

Osteoarthritis

• Osteoarthritis (OA) is characterised by cartilage degeneration in synovial joints, leading to pain and loss of function, particularly in the hip and the knee

• OA is a common complex disease with environmental and genetic components; it affects 40% of people over the age of 70 years

• Current treatments: analgesics, total joint replacement (TJR)

• To date only two loci have been robustly associated with OA

• Common variants (MAF>0.20), small effect sizes (OR~1.15)

Page 14: Rare variant analysis in large-scale association and sequencing studies

Directly typed SNPs (Illumina 610k)

Imputed SNPs: HapMap

Imputed SNPs: 1000 genomes

3,177 cases
4,894 controls

Page 15: Rare variant analysis in large-scale association and sequencing studies

Directly-typed

Page 16: Rare variant analysis in large-scale association and sequencing studies

Directly-typed

HapMap-based imputation

Page 17: Rare variant analysis in large-scale association and sequencing studies

Directly-typed

HapMap-based imputation

1KGP-based imputation

Page 18: Rare variant analysis in large-scale association and sequencing studies

Study                      Cases   Controls  Effect allele  MAF     OR (95% CI)       P value
arcOGEN GWAS               3177    4894      A              0.0718  1.32 (1.16-1.50)  1.67×10−5
arcOGEN replication set 1  5165    6155      A              0.0694  1.17 (1.06-1.30)  2.60×10−3
GOAL                       1686    743       A              0.0720  1.23 (0.99-1.56)  7.20×10−2
arcOGEN replication set 2  2409    2319      A              0.0636  1.16 (0.98-1.37)  7.86×10−2
deCODE                     1552    3071      A              0.0917  1.03 (0.88-1.20)  7.31×10−1
EGCUT                      2617    2619      A              0.0769  1.16 (1.01-1.34)  4.01×10−2
RSI                        1950    3243      G              0.0608  1.01 (0.86-1.20)  8.61×10−1
RSII                       485     1460      A              0.0715  1.46 (1.07-2.00)  1.68×10−2
Meta-analysis              19041   24504     A                      1.17 (1.11-1.23)  2.07×10−8
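The pooled estimate in the table's last row can be approximated from the per-study rows with a fixed-effect, inverse-variance meta-analysis of the log odds ratios. This is a sketch under stated assumptions, since the slide does not say which meta-analysis model was used: standard errors are recovered from the rounded 95% confidence limits, and the RSI odds ratio is taken as tabulated even though that row lists G as the effect allele. With these inputs the pooled OR and interval should land close to the tabulated 1.17 (1.11-1.23); the P value is of the same order but not identical, because the inputs are rounded.

```python
import math

# (study, OR, lower 95% limit, upper 95% limit), copied from the table.
STUDIES = [
    ("arcOGEN GWAS", 1.32, 1.16, 1.50),
    ("arcOGEN replication set 1", 1.17, 1.06, 1.30),
    ("GOAL", 1.23, 0.99, 1.56),
    ("arcOGEN replication set 2", 1.16, 0.98, 1.37),
    ("deCODE", 1.03, 0.88, 1.20),
    ("EGCUT", 1.16, 1.01, 1.34),
    ("RSI", 1.01, 0.86, 1.20),
    ("RSII", 1.46, 1.07, 2.00),
]


def fixed_effect_meta(studies):
    """Inverse-variance pooling of log odds ratios.
    Returns (pooled OR, lower 95% limit, upper 95% limit, two-sided p)."""
    weighted_sum = 0.0
    weight_total = 0.0
    for _, odds_ratio, lower, upper in studies:
        # Standard error of log(OR), recovered from the 95% CI width.
        se = (math.log(upper) - math.log(lower)) / (2 * 1.96)
        weight = 1.0 / se ** 2
        weighted_sum += weight * math.log(odds_ratio)
        weight_total += weight
    beta = weighted_sum / weight_total
    se_pooled = math.sqrt(1.0 / weight_total)
    z = beta / se_pooled
    p_value = math.erfc(abs(z) / math.sqrt(2.0))   # two-sided normal p-value
    return (
        math.exp(beta),
        math.exp(beta - 1.96 * se_pooled),
        math.exp(beta + 1.96 * se_pooled),
        p_value,
    )


pooled_or, ci_low, ci_high, p = fixed_effect_meta(STUDIES)
print(f"pooled OR = {pooled_or:.2f} ({ci_low:.2f}-{ci_high:.2f}), p = {p:.2e}")
```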

Page 19: Rare variant analysis in large-scale association and sequencing studies

Intron 4 of the guanine nucleotide exchange factor-encoding gene MCF2L

Studies of Mcf2l in rat models of OA have shown expression in articular chondrocytes

In human cells MCF2L regulates neurotrophin-3 induced cell migration in Schwann cells. Neurotrophin-3 is a member of the nerve growth factor (NGF) family, and inhibition of NGF has an effect on the pain experienced by OA patients

Page 20: Rare variant analysis in large-scale association and sequencing studies

Analysis of rare variants in sequence data

Targeted resequencing

Whole-genome and whole-exome resequencing

Long-range PCR

Pulldown

PE library preparation

PE sequencing

Data processing and statistical analysis

Page 21: Rare variant analysis in large-scale association and sequencing studies

500 Exomes Project

– Collaborative exome resequencing experiment between the Sanger Institute, GSK and Lausanne University

– Study design:
  – 500 individuals from the CoLaus cohort with BMI>25
  – 250 with type 2 diabetes and 250 normoglycaemic matched controls

– Affymetrix 500k GWAS data

– Exome sequencing

– Mean depth ~65x

Page 22: Rare variant analysis in large-scale association and sequencing studies

500 Exomes Project – preliminary data

Number of cases: 195
Number of controls: 166
Number of transcripts analyzed: 14,924

[Figure panels: Single-point, ARIEL, AMELIA]

Page 23: Rare variant analysis in large-scale association and sequencing studies

UK10K project

Rare genetic variants in health and disease

4,000 whole genomes: population-based cohorts with rich phenotype data

6,000 whole exomes: obesity, neurodevelopmental disorders and further rare diseases

Aims
• Elucidate singleton variants by maximising variation detected
• Directly associate genetic variations to phenotypic traits
• Uncover rare variants contributing to disease
• Assign uncovered variations into genotyped cohort and case/control collections
• Provide a sequence variation resource for future studies

www.uk10k.org

Page 24: Rare variant analysis in large-scale association and sequencing studies

Acknowledgements

Andrew Morris

Jenn Asimit

Reedik Magi

Page 25: Rare variant analysis in large-scale association and sequencing studies

Acknowledgements

A.G. Day-Williams, L. Southam, K. Panoutsopoulou, N.W. Rayner, T. Esko, K. Estrada, H.T. Helgadottir, A. Hofman, T. Ingvarsson, H. Jonsson, A. Keis, H.J.M. Kerkhof, G. Thorleifsson, N.K. Arden, A. Carr, K. Chapman, P. Deloukas, J. Loughlin, A. McCaskie, W.E.R. Ollier, S.H. Ralston, T.D. Spector, G.A. Wallis, J.M. Wilkinson, N. Aslam, F. Birell, I. Carluke, J. Joseph, A. Rai, M. Reed, K. Walker, S.A. Doherty, I. Jonsdottir, R.A. Maciewicz, K.R. Muir, A. Metspalu, F. Rivadeneira, K. Stefansson, U. Styrkarsdottir, A.G. Uitterlinden, J.B.J. van Meurs, W. Zhang, A.M. Valdes, M. Doherty, arcOGEN Consortium

Page 26: Rare variant analysis in large-scale association and sequencing studies

500 Exomes Project

A partnership between the Wellcome Trust Sanger Institute, the CoLaus principal investigators and the Quantitative Sciences dept. of GlaxoSmithKline

GSK: Vincent Mooser, John Whittaker, Linda McCarthy, Matt Nelson, Claudio Verzilli, Judong Shen, Stephanie Chissoe, Charles Cox, Meg Ehm, Keith Nangle, Dana Fraser, Kijoung Song, Peter Woollard, Dawn Waterworth

Lausanne: Peter Vollenweider, Gerard Waeber, Jacques Beckmann, Sven Bergmann, Pedro Marques Vidal, Murielle Bochud, Zoltan Kutalik

Wellcome Trust Sanger Institute: Jennifer Asimit, Ines Barroso, Caren Brockington, Yuan Chen, Aaron Day-Williams, Richard Durbin, Martin Hunt, Sarah Hunt, Matt Hurles, Jimmy Liu, Margarida Lopes, Daniel MacArthur, Aarno Palotie, Theo Papamarkou, Fliss Payne, Manj Sandhu, Carol Scott, Lorraine Southam, Ioanna Tachmazidou, Chris Tyler-Smith, Ellie Wheeler, Bendik Winsvold, Yali Xue, Eleftheria Zeggini

Page 27: Rare variant analysis in large-scale association and sequencing studies

Principal Applicants
Leena Peltonen, Wellcome Trust Sanger Institute
Richard Durbin, Wellcome Trust Sanger Institute

Co-applicants
Jeffrey Barrett, Wellcome Trust Sanger Institute
Ines Barroso, Wellcome Trust Sanger Institute
George Davey-Smith, University of Bristol
Ismaa Sadaf Farooqi, University of Cambridge
Matthew Hurles, Wellcome Trust Sanger Institute
Stephen O'Rahilly, University of Cambridge
Aarno Palotie, Wellcome Trust Sanger Institute
Nicole Soranzo, Wellcome Trust Sanger Institute
Tim Spector, King's College London
Eleftheria Zeggini, Wellcome Trust Sanger Institute

Named collaborators
Phil Beales, University College London
Jamie Bentham, University of Oxford
Shoumo Bhattacharya, University of Oxford
Patrick Bolton, King's College London
Gerome Breen, King's College London
Krishnan Chatterjee, University of Cambridge
Laura K Curran, King's College London
Anne Farmer, King's College London
David Fitzpatrick, Edinburgh University
Daniel Geschwind, UCLA, USA
Steve Humphries, University College London
Jouko Lonnqvist, National Public Health Institute, Finland
Peter McGuffin, King's College London
Lucy Raymond, University of Cambridge
David Savage, University of Cambridge
Peter Scambler, University College London
Robert Semple, University of Cambridge
David St Clair, University of Aberdeen
Lennart von Wendt, University of Helsinki, Finland

Page 28: Rare variant analysis in large-scale association and sequencing studies

Supported by the Wellcome Trust, Arthritis Research UK, Pfizer