Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic...

Jianfeng Xu, M.D., Dr.PH

Professor of Public Health and Cancer BiologyDirector, Program for Genetic and Molecular Epidemiology of Cancer

Associate Director, Center for Human GenomicsWake Forest University School of Medicine

GWA ─ promising but challenging

Outline

The need for genome-wide association studies

The reality of genome-wide association studies

Important issues in genome-wide association

studies Genome coverage

Strategies for pre-association analysis

Strategies for association analysis

Sample size and false positives (Type I and II errors)

Confirmation in independent study populations

Increase the magnitude of effects of a specific gene

The need for GWA

Current understanding of disease etiology is limited

Therefore, candidate genes or pathways are insufficient

Current understanding of functional variants is limited Therefore, the focusing on nonsynonymous changes is not sufficient

Results from linkage studies are often inconsistent and broad Therefore, the utility of identified linkage regions is limited

GWA studies offer an effective and objective approach Better chance to identify disease associated variants

Improve understanding of disease etiology

Improve ability to test gene-gene interaction and predict disease risk

GWA is promising

Many diseases and traits are influenced by genetic factors i.e., they are caused by sequence variants in the genome

Over 6 millions SNPs are known in the genome i.e., some SNPs will be directly or indirectly associated with causal

variants

The cost of SNP Genotyping is reduced i.e., it is affordable to genotype a large number of SNPs in the genome

Large numbers of cases and controls are available i.e., there is statistical power to detect variants with modest effect

When the above conditions are met… …associated SNPs will have different frequencies between cases and

controls

GWA is challenging

Many diseases and traits are influenced by genetic factors But probably due to multiple modest risk variants

They confer a stronger risk when they interact

True associated SNPs are not necessary highly significant

Too many SNPs are evaluated False positives due to multiple tests

Single studies tend to be underpowered False negatives

Considerable heterogeneity among studies Phenotypic and genetic heterogeneity

False positives due to population stratification

Reality of GWA

AMD, IBD, T1D, etc.

Parkinson’s, nicotine dependence, T2D, etc.

Prostate cancer, breast cancer, and other ongoing studies

Heart diseases, lung diseases, psychiatric diseases, inflammatorydiseases, cancers, and many other studies that are in planning stages

Important issues in genome-wide association studies

Genome coverage






Genome coverage

Two major platforms for GWA Illumina: HumanHap300, HumanHap550, and HumanHap1M

Affymetrix: GeneChip 100K, 500K, and 1M

Genome-wide coverage The percentage of known SNPs in the genome that are in LD with the

genotyped SNPs Calculated based on HapMap Calculated based on ENCODE

Genome coverage

Genome-wide coverage

Genome coverage of common SNPs (MAF ≥ 0.05)

Genome coverage of rare SNPs

Genome coverage using multi-markers

Pe’er, 2006

Genome coverage

Genome coverage for common SNPs (MAF ≥ 0.05)

Pe’er, 2006

Genome coverage

Genome coverage for common SNPs (MAF ≥ 0.05)

Genome coverage for common and rare SNPs

Pe’er, 2006

Genome coverage

Genome coverage of common SNPs (MAF ≥ 0.05)

Genome coverage of common and rare SNPs

Genome coverage using multi-markers

Pe’er, 2006


Genome coverage







Quality control Filter SNPs by genotype call rates

Filter SNPs by minor allele frequencies

Filter SNPs by testing for Hardy-Weinberg Equilibrium


Quality control

Quantile-quantile plot (Q-Q plot) Evaluate whether there is an upward bias in association tests

Q-Q plot

Clayton, 2006

Adjust for stratification

Filter by call rateAll SNPs


Quality control

Quantile-quantile plot (Q-Q plot)

Population stratification Genomic control

Correct for stratification by adjusting association statistics at each SNP by a uniform overall inflation factor

Is susceptible to over or under adjustment


Quality control



Structure (STRUCTURE) Used to assign the samples to discrete subpopulation clusters and then

aggregate evidence of association within each cluster Estimate individual proportion of ancestry and treat it as a covariate Computationally intensive when there are a large number of AIMs


Quality control



Structure (STRUCTURE)

Principal component analysis (EIGENSTRAT) Identify several eigenvectors (ancestries or geographic regions) Adjust genotypes and phenotypes along each eigenvector Compute association statistics using adjusted genotypes and phenotypes No need for AIMs


Genome coverage







Single SNP analysis using pre-specified genetic models 2 x 3 table (2-df)

Additive model (1-df), and test for additivity

All possible genetic models


Single SNP analysis using pre-specified genetic models

Haplotype analysis Two-marker and three-marker slide

Multi-marker

Within haplotype block

Between two recombination hot spots


Single SNP analysis using pre-specified genetic models

Haplotype analysis

Gene-gene and gene-environment interactions Interaction with main effect

Logistic regression

Interaction without main effect: data mining Classification and recursive tree (CART) Multifactor Dimensionality Reduction (MDR)


Genome coverage






Sample size and false positives

Estimate sample size

Sample size OR MAF Type I error Power

Quanto Effective sample

size



False positives: too many dependent tests

Adjust for number of tests Bonferroni correction

Nominal significance level = study-wide significance / number of tests

Nominal significance level = 0.05/500,000 = 10-7

Effective number of tests

Take LD into account

Permutation procedure

Permute case-control status

Mimic the actual analyses

Obtain empirical distribution of maximum test statistic under null hypothesis




Adjust for number of tests

False discovery rate (FDR)

Expected proportion of false discoveries among all discoveries

Offers more power than Bonferroni

Holds under weak dependence of the tests




Adjust for number of tests

False discovery rate (FDR)

Bayesian approach

Taking a priori into account, False-Positive Report Probability

(FPRP)


Genome coverage







The above approaches may limit the number of false positives

Confirmation is needed to dissect true from false positives Replication, examine the results from the 2nd stage only

Joint analysis, combining data from 1st stage with 2nd stage

Multiple stages

Replication vs. joint analysis

Skol, 2006

Multiple stages

1st stage

# of true sig. SNPs (80% power)

# of total sig. SNPs ( = 0.01)

# of Risk SNPs

# of SNPs tested

% of true sig. SNPs

20

500,000

16

5,016

0.38%

2nd stage

16

5,016

13

63

21%

3rd stage

13

63

10

10

100%


Genome coverage







Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotype, e.g. aggressive or early onset

Study aggressive cases

0.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

0.16

0.18

0.20

Iceland Sweden Chicago CGEMS JHU

MA

F o

f rs

1447

295

Controls

Low grade

High grade


Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotypes, e.g. aggressive or early onset

Cases with family history

Study cases with family history

Antoniou and Easton, 2003




Controls that are disease free

Disease free controls

0.00

0.02

0.04

0.06

0.08

0.10

0.12

0.14

0.16

0.18

0.20

Iceland Sweden Chicago CGEMS JHU

MA

F o

f rs

1447

295

Controls

Low grade

High grade




Controls that are disease free

Increase their effects by studying a homogeneous population Lower levels of genetic heterogeneity

Summary

GWA studies are promising but difficult

There are many important issues in GWA

The impact of these issues can be minimized by a well-

designed study

Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic...

Documents

Transcript of Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic...