Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic...

41
Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director, Center for Human Genomics Wake Forest University School of Medicine GWA ─ promising but challenging

Transcript of Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic...

Page 1: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Jianfeng Xu, M.D., Dr.PH

Professor of Public Health and Cancer BiologyDirector, Program for Genetic and Molecular Epidemiology of Cancer

Associate Director, Center for Human GenomicsWake Forest University School of Medicine

GWA ─ promising but challenging

Page 2: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Outline

The need for genome-wide association studies

The reality of genome-wide association studies

Important issues in genome-wide association

studies Genome coverage

Strategies for pre-association analysis

Strategies for association analysis

Sample size and false positives (Type I and II errors)

Confirmation in independent study populations

Increase the magnitude of effects of a specific gene

Page 3: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

The need for GWA

Current understanding of disease etiology is limited

Therefore, candidate genes or pathways are insufficient

Current understanding of functional variants is limited Therefore, the focusing on nonsynonymous changes is not sufficient

Results from linkage studies are often inconsistent and broad Therefore, the utility of identified linkage regions is limited

GWA studies offer an effective and objective approach Better chance to identify disease associated variants

Improve understanding of disease etiology

Improve ability to test gene-gene interaction and predict disease risk

Page 4: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

GWA is promising

Many diseases and traits are influenced by genetic factors i.e., they are caused by sequence variants in the genome

Over 6 millions SNPs are known in the genome i.e., some SNPs will be directly or indirectly associated with causal

variants

The cost of SNP Genotyping is reduced i.e., it is affordable to genotype a large number of SNPs in the genome

Large numbers of cases and controls are available i.e., there is statistical power to detect variants with modest effect

When the above conditions are met… …associated SNPs will have different frequencies between cases and

controls

Page 5: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

GWA is challenging

Many diseases and traits are influenced by genetic factors But probably due to multiple modest risk variants

They confer a stronger risk when they interact

True associated SNPs are not necessary highly significant

Too many SNPs are evaluated False positives due to multiple tests

Single studies tend to be underpowered False negatives

Considerable heterogeneity among studies Phenotypic and genetic heterogeneity

False positives due to population stratification

Page 6: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Reality of GWA

AMD, IBD, T1D, etc.

Parkinson’s, nicotine dependence, T2D, etc.

Prostate cancer, breast cancer, and other ongoing studies

Heart diseases, lung diseases, psychiatric diseases, inflammatorydiseases, cancers, and many other studies that are in planning stages

Page 7: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Important issues in genome-wide association studies

Genome coverage

Strategies for pre-association analysis

Strategies for association analysis

Sample size and false positives (Type I and II errors)

Confirmation in independent study populations

Increase the magnitude of effects of a specific gene

Page 8: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Genome coverage

Two major platforms for GWA Illumina: HumanHap300, HumanHap550, and HumanHap1M

Affymetrix: GeneChip 100K, 500K, and 1M

Genome-wide coverage The percentage of known SNPs in the genome that are in LD with the

genotyped SNPs Calculated based on HapMap Calculated based on ENCODE

Page 9: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Genome coverage

Genome-wide coverage

Genome coverage of common SNPs (MAF ≥ 0.05)

Genome coverage of rare SNPs

Genome coverage using multi-markers

Pe’er, 2006

Page 10: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Genome coverage

Genome coverage for common SNPs (MAF ≥ 0.05)

Pe’er, 2006

Page 11: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Genome coverage

Genome coverage for common SNPs (MAF ≥ 0.05)

Genome coverage for common and rare SNPs

Pe’er, 2006

Page 12: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Genome coverage

Genome coverage of common SNPs (MAF ≥ 0.05)

Genome coverage of common and rare SNPs

Genome coverage using multi-markers

Pe’er, 2006

Page 13: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Important issues in genome-wide association studies

Genome coverage

Strategies for pre-association analysis

Strategies for association analysis

Sample size and false positives (Type I and II errors)

Confirmation in independent study populations

Increase the magnitude of effects of a specific gene

Page 14: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Strategies for pre-association analysis

Quality control Filter SNPs by genotype call rates

Filter SNPs by minor allele frequencies

Filter SNPs by testing for Hardy-Weinberg Equilibrium

Page 15: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Strategies for pre-association analysis

Quality control

Quantile-quantile plot (Q-Q plot) Evaluate whether there is an upward bias in association tests

Page 16: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Q-Q plot

Clayton, 2006

Adjust for stratification

Filter by call rateAll SNPs

Page 17: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Strategies for pre-association analysis

Quality control

Quantile-quantile plot (Q-Q plot)

Population stratification Genomic control

Correct for stratification by adjusting association statistics at each SNP by a uniform overall inflation factor

Is susceptible to over or under adjustment

Page 18: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Strategies for pre-association analysis

Quality control

Quantile-quantile plot (Q-Q plot)

Population stratification Genomic control

Structure (STRUCTURE) Used to assign the samples to discrete subpopulation clusters and then

aggregate evidence of association within each cluster Estimate individual proportion of ancestry and treat it as a covariate Computationally intensive when there are a large number of AIMs

Page 19: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Strategies for pre-association analysis

Quality control

Quantile-quantile plot (Q-Q plot)

Population stratification Genomic control

Structure (STRUCTURE)

Principal component analysis (EIGENSTRAT) Identify several eigenvectors (ancestries or geographic regions) Adjust genotypes and phenotypes along each eigenvector Compute association statistics using adjusted genotypes and phenotypes No need for AIMs

Page 20: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Important issues in genome-wide association studies

Genome coverage

Strategies for pre-association analysis

Strategies for association analysis

Sample size and false positives (Type I and II errors)

Confirmation in independent study populations

Increase the magnitude of effects of a specific gene

Page 21: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Strategies for association analysis

Single SNP analysis using pre-specified genetic models 2 x 3 table (2-df)

Additive model (1-df), and test for additivity

All possible genetic models

Page 22: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Strategies for association analysis

Single SNP analysis using pre-specified genetic models

Haplotype analysis Two-marker and three-marker slide

Multi-marker

Within haplotype block

Between two recombination hot spots

Page 23: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Strategies for association analysis

Single SNP analysis using pre-specified genetic models

Haplotype analysis

Gene-gene and gene-environment interactions Interaction with main effect

Logistic regression

Interaction without main effect: data mining Classification and recursive tree (CART) Multifactor Dimensionality Reduction (MDR)

Page 24: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Important issues in genome-wide association studies

Genome coverage

Strategies for pre-association analysis

Strategies for association analysis

Sample size and false positives (Type I and II errors)

Confirmation in independent study populations

Increase the magnitude of effects of a specific gene

Page 25: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Sample size and false positives

Estimate sample size

Sample size OR MAF Type I error Power

Quanto Effective sample

size

Page 26: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Sample size and false positives

Estimate sample size

False positives: too many dependent tests

Adjust for number of tests Bonferroni correction

Nominal significance level = study-wide significance / number of tests

Nominal significance level = 0.05/500,000 = 10-7

Effective number of tests

Take LD into account

Permutation procedure

Permute case-control status

Mimic the actual analyses

Obtain empirical distribution of maximum test statistic under null hypothesis

Page 27: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Sample size and false positives

Estimate sample size

False positives: too many dependent tests

Adjust for number of tests

False discovery rate (FDR)

Expected proportion of false discoveries among all discoveries

Offers more power than Bonferroni

Holds under weak dependence of the tests

Page 28: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Sample size and false positives

Estimate sample size

False positives: too many dependent tests

Adjust for number of tests

False discovery rate (FDR)

Bayesian approach

Taking a priori into account, False-Positive Report Probability

(FPRP)

Page 29: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Important issues in genome-wide association studies

Genome coverage

Strategies for pre-association analysis

Strategies for association analysis

Sample size and false positives (Type I and II errors)

Confirmation in independent study populations

Increase the magnitude of effects of a specific gene

Page 30: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Confirmation in independent study populations

The above approaches may limit the number of false positives

Confirmation is needed to dissect true from false positives Replication, examine the results from the 2nd stage only

Joint analysis, combining data from 1st stage with 2nd stage

Multiple stages

Page 31: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Replication vs. joint analysis

Skol, 2006

Page 32: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Multiple stages

1st stage

# of true sig. SNPs (80% power)

# of total sig. SNPs ( = 0.01)

# of Risk SNPs

# of SNPs tested

% of true sig. SNPs

20

500,000

16

5,016

0.38%

2nd stage

16

5,016

13

63

21%

3rd stage

13

63

10

10

100%

Page 33: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Important issues in genome-wide association studies

Genome coverage

Strategies for pre-association analysis

Strategies for association analysis

Sample size and false positives (Type I and II errors)

Confirmation in independent study populations

Increase the magnitude of effects of a specific gene

Page 34: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Increase the magnitude of effects of a specific gene

Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotype, e.g. aggressive or early onset

Page 35: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Study aggressive cases

0.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

0.16

0.18

0.20

Iceland Sweden Chicago CGEMS JHU

MA

F o

f rs

1447

295

Controls

Low grade

High grade

Page 36: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Increase the magnitude of effects of a specific gene

Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotypes, e.g. aggressive or early onset

Cases with family history

Page 37: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Study cases with family history

Antoniou and Easton, 2003

Page 38: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Increase the magnitude of effects of a specific gene

Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotypes, e.g. aggressive or early onset

Cases with family history

Controls that are disease free

Page 39: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Disease free controls

0.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

0.16

0.18

0.20

Iceland Sweden Chicago CGEMS JHU

MA

F o

f rs

1447

295

Controls

Low grade

High grade

Page 40: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Increase the magnitude of effects of a specific gene

Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotypes, e.g. aggressive or early onset

Cases with family history

Controls that are disease free

Increase their effects by studying a homogeneous population Lower levels of genetic heterogeneity

Page 41: Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic and Molecular Epidemiology of Cancer Associate Director,

Summary

GWA studies are promising but difficult

There are many important issues in GWA

The impact of these issues can be minimized by a well-

designed study