A&T Financial Services, Inc. and Jianfeng AAO & District Court Dismissals 2013-2015
Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic...
-
Upload
florence-cummings -
Category
Documents
-
view
218 -
download
0
Transcript of Jianfeng Xu, M.D., Dr.PH Professor of Public Health and Cancer Biology Director, Program for Genetic...
Jianfeng Xu, M.D., Dr.PH
Professor of Public Health and Cancer BiologyDirector, Program for Genetic and Molecular Epidemiology of Cancer
Associate Director, Center for Human GenomicsWake Forest University School of Medicine
GWA ─ promising but challenging
Outline
The need for genome-wide association studies
The reality of genome-wide association studies
Important issues in genome-wide association
studies Genome coverage
Strategies for pre-association analysis
Strategies for association analysis
Sample size and false positives (Type I and II errors)
Confirmation in independent study populations
Increase the magnitude of effects of a specific gene
The need for GWA
Current understanding of disease etiology is limited
Therefore, candidate genes or pathways are insufficient
Current understanding of functional variants is limited Therefore, the focusing on nonsynonymous changes is not sufficient
Results from linkage studies are often inconsistent and broad Therefore, the utility of identified linkage regions is limited
GWA studies offer an effective and objective approach Better chance to identify disease associated variants
Improve understanding of disease etiology
Improve ability to test gene-gene interaction and predict disease risk
GWA is promising
Many diseases and traits are influenced by genetic factors i.e., they are caused by sequence variants in the genome
Over 6 millions SNPs are known in the genome i.e., some SNPs will be directly or indirectly associated with causal
variants
The cost of SNP Genotyping is reduced i.e., it is affordable to genotype a large number of SNPs in the genome
Large numbers of cases and controls are available i.e., there is statistical power to detect variants with modest effect
When the above conditions are met… …associated SNPs will have different frequencies between cases and
controls
GWA is challenging
Many diseases and traits are influenced by genetic factors But probably due to multiple modest risk variants
They confer a stronger risk when they interact
True associated SNPs are not necessary highly significant
Too many SNPs are evaluated False positives due to multiple tests
Single studies tend to be underpowered False negatives
Considerable heterogeneity among studies Phenotypic and genetic heterogeneity
False positives due to population stratification
Reality of GWA
AMD, IBD, T1D, etc.
Parkinson’s, nicotine dependence, T2D, etc.
Prostate cancer, breast cancer, and other ongoing studies
Heart diseases, lung diseases, psychiatric diseases, inflammatorydiseases, cancers, and many other studies that are in planning stages
Important issues in genome-wide association studies
Genome coverage
Strategies for pre-association analysis
Strategies for association analysis
Sample size and false positives (Type I and II errors)
Confirmation in independent study populations
Increase the magnitude of effects of a specific gene
Genome coverage
Two major platforms for GWA Illumina: HumanHap300, HumanHap550, and HumanHap1M
Affymetrix: GeneChip 100K, 500K, and 1M
Genome-wide coverage The percentage of known SNPs in the genome that are in LD with the
genotyped SNPs Calculated based on HapMap Calculated based on ENCODE
Genome coverage
Genome-wide coverage
Genome coverage of common SNPs (MAF ≥ 0.05)
Genome coverage of rare SNPs
Genome coverage using multi-markers
Pe’er, 2006
Genome coverage
Genome coverage for common SNPs (MAF ≥ 0.05)
Pe’er, 2006
Genome coverage
Genome coverage for common SNPs (MAF ≥ 0.05)
Genome coverage for common and rare SNPs
Pe’er, 2006
Genome coverage
Genome coverage of common SNPs (MAF ≥ 0.05)
Genome coverage of common and rare SNPs
Genome coverage using multi-markers
Pe’er, 2006
Important issues in genome-wide association studies
Genome coverage
Strategies for pre-association analysis
Strategies for association analysis
Sample size and false positives (Type I and II errors)
Confirmation in independent study populations
Increase the magnitude of effects of a specific gene
Strategies for pre-association analysis
Quality control Filter SNPs by genotype call rates
Filter SNPs by minor allele frequencies
Filter SNPs by testing for Hardy-Weinberg Equilibrium
Strategies for pre-association analysis
Quality control
Quantile-quantile plot (Q-Q plot) Evaluate whether there is an upward bias in association tests
Q-Q plot
Clayton, 2006
Adjust for stratification
Filter by call rateAll SNPs
Strategies for pre-association analysis
Quality control
Quantile-quantile plot (Q-Q plot)
Population stratification Genomic control
Correct for stratification by adjusting association statistics at each SNP by a uniform overall inflation factor
Is susceptible to over or under adjustment
Strategies for pre-association analysis
Quality control
Quantile-quantile plot (Q-Q plot)
Population stratification Genomic control
Structure (STRUCTURE) Used to assign the samples to discrete subpopulation clusters and then
aggregate evidence of association within each cluster Estimate individual proportion of ancestry and treat it as a covariate Computationally intensive when there are a large number of AIMs
Strategies for pre-association analysis
Quality control
Quantile-quantile plot (Q-Q plot)
Population stratification Genomic control
Structure (STRUCTURE)
Principal component analysis (EIGENSTRAT) Identify several eigenvectors (ancestries or geographic regions) Adjust genotypes and phenotypes along each eigenvector Compute association statistics using adjusted genotypes and phenotypes No need for AIMs
Important issues in genome-wide association studies
Genome coverage
Strategies for pre-association analysis
Strategies for association analysis
Sample size and false positives (Type I and II errors)
Confirmation in independent study populations
Increase the magnitude of effects of a specific gene
Strategies for association analysis
Single SNP analysis using pre-specified genetic models 2 x 3 table (2-df)
Additive model (1-df), and test for additivity
All possible genetic models
Strategies for association analysis
Single SNP analysis using pre-specified genetic models
Haplotype analysis Two-marker and three-marker slide
Multi-marker
Within haplotype block
Between two recombination hot spots
Strategies for association analysis
Single SNP analysis using pre-specified genetic models
Haplotype analysis
Gene-gene and gene-environment interactions Interaction with main effect
Logistic regression
Interaction without main effect: data mining Classification and recursive tree (CART) Multifactor Dimensionality Reduction (MDR)
Important issues in genome-wide association studies
Genome coverage
Strategies for pre-association analysis
Strategies for association analysis
Sample size and false positives (Type I and II errors)
Confirmation in independent study populations
Increase the magnitude of effects of a specific gene
Sample size and false positives
Estimate sample size
Sample size OR MAF Type I error Power
Quanto Effective sample
size
Sample size and false positives
Estimate sample size
False positives: too many dependent tests
Adjust for number of tests Bonferroni correction
Nominal significance level = study-wide significance / number of tests
Nominal significance level = 0.05/500,000 = 10-7
Effective number of tests
Take LD into account
Permutation procedure
Permute case-control status
Mimic the actual analyses
Obtain empirical distribution of maximum test statistic under null hypothesis
Sample size and false positives
Estimate sample size
False positives: too many dependent tests
Adjust for number of tests
False discovery rate (FDR)
Expected proportion of false discoveries among all discoveries
Offers more power than Bonferroni
Holds under weak dependence of the tests
Sample size and false positives
Estimate sample size
False positives: too many dependent tests
Adjust for number of tests
False discovery rate (FDR)
Bayesian approach
Taking a priori into account, False-Positive Report Probability
(FPRP)
Important issues in genome-wide association studies
Genome coverage
Strategies for pre-association analysis
Strategies for association analysis
Sample size and false positives (Type I and II errors)
Confirmation in independent study populations
Increase the magnitude of effects of a specific gene
Confirmation in independent study populations
The above approaches may limit the number of false positives
Confirmation is needed to dissect true from false positives Replication, examine the results from the 2nd stage only
Joint analysis, combining data from 1st stage with 2nd stage
Multiple stages
Replication vs. joint analysis
Skol, 2006
Multiple stages
1st stage
# of true sig. SNPs (80% power)
# of total sig. SNPs ( = 0.01)
# of Risk SNPs
# of SNPs tested
% of true sig. SNPs
20
500,000
16
5,016
0.38%
2nd stage
16
5,016
13
63
21%
3rd stage
13
63
10
10
100%
Important issues in genome-wide association studies
Genome coverage
Strategies for pre-association analysis
Strategies for association analysis
Sample size and false positives (Type I and II errors)
Confirmation in independent study populations
Increase the magnitude of effects of a specific gene
Increase the magnitude of effects of a specific gene
Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotype, e.g. aggressive or early onset
Study aggressive cases
0.00
0.02
0.04
0.06
0.08
0.10
0.12
0.14
0.16
0.18
0.20
Iceland Sweden Chicago CGEMS JHU
MA
F o
f rs
1447
295
Controls
Low grade
High grade
Increase the magnitude of effects of a specific gene
Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotypes, e.g. aggressive or early onset
Cases with family history
Study cases with family history
Antoniou and Easton, 2003
Increase the magnitude of effects of a specific gene
Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotypes, e.g. aggressive or early onset
Cases with family history
Controls that are disease free
Disease free controls
0.00
0.02
0.04
0.06
0.08
0.10
0.12
0.14
0.16
0.18
0.20
Iceland Sweden Chicago CGEMS JHU
MA
F o
f rs
1447
295
Controls
Low grade
High grade
Increase the magnitude of effects of a specific gene
Increase their effects by focusing on a subset of study subjects Cases with a uniform phenotypes, e.g. aggressive or early onset
Cases with family history
Controls that are disease free
Increase their effects by studying a homogeneous population Lower levels of genetic heterogeneity
Summary
GWA studies are promising but difficult
There are many important issues in GWA
The impact of these issues can be minimized by a well-
designed study