Pitfalls in Genetic Association Studies M. Tevfik DORAK Paediatric and Lifecourse Epidemiology...

42
Pitfalls in Genetic Pitfalls in Genetic Association Studies Association Studies M. Tevfik DORAK Paediatric and Lifecourse Epidemiology Research Group Sir James Spence Institute Newcastle University, U.K. Clinical Studies & Objective Medicine Clinical Studies & Objective Medicine Bodrum, 15-16 April 2006 Bodrum, 15-16 April 2006

Transcript of Pitfalls in Genetic Association Studies M. Tevfik DORAK Paediatric and Lifecourse Epidemiology...

Pitfalls in Genetic Pitfalls in Genetic Association StudiesAssociation Studies

M. Tevfik DORAK

Paediatric and Lifecourse Epidemiology Research Group

Sir James Spence Institute

Newcastle University, U.K.

Clinical Studies & Objective MedicineClinical Studies & Objective Medicine

Bodrum, 15-16 April 2006Bodrum, 15-16 April 2006

Incident or prevalent cases Comparable controls or convenience samples

Population based?

Confounding by ethnicity / Population stratificationConfounding by locus / Linkage disequilibrium

No adjustment for known associations Effect modification by sex Different genetic models

Statistical powerMultiple comparisons

Replication / Consistency Publication bias

OverinterpretationLD ruled out?

Biological plausibility (a priori hypothesis)?

Internal Validity of an Association StudyInternal Validity of an Association Study

Avoid "BIAS"

Control for "CONFOUNDING"

Rule out "CHANCE"

Internal Validity of an Association StudyInternal Validity of an Association Study

Avoid "BIAS"

Be careful!

Control for "CONFOUNDING"

Matching, Stratification, MV Analysis

Rule out "CHANCE"

Large sample, Replication

Special Kinds of Confounding in Special Kinds of Confounding in Genetic EpidemiologyGenetic Epidemiology

CONFOUNDING by Locus (LD)

MV Analysis, LD Analysis

CONFOUNDING by Ethnicity (Population Stratification)

Matching, Stratification, MV Analysis

Family-Based Association Studies

Genomic Controls and Specially Designed GE Analysis

Special Kinds of Confounding in Genetic Epidemiology

CONFOUNDING by Locus (LD)

MV Analysis, LD Analysis

CONFOUNDING by Ethnicity (Population Stratification)

Matching, Stratification, MV Analysis

Family-Based Association Studies

Genomic Controls and Specially Designed GE Analysis

Martin, 2000 (www)

Mapping Disease Susceptibility Genes by Association Studies

Plot of minus log of P value for case-control test for allelic association with AD, for SNPs immediately surrounding APOE (<100 kb)

Example: Linkage Disequilibrium

HLA-B47 association with congenital adrenal hyperplasia (Dupont et al, Lancet 1977)

HLA-B14 association with late-onset adrenal hyperplasia (Pollack et al, Am J Hum Genet 1981)

Is congenital adrenal hyperplasia an immune system-mediated disease?

HLA-B47 association with congenital adrenal hyperplasia is due to deletion of CYP21A2 on

HLA-B47DR7 haplotype

HLA-B14 association with late-onset adrenal hyperplasia is due to an exon 7 missense

mutation (V281L) in CYP21A2 on HLA-B14DR1 haplotype

Example: Linkage Disequilibrium

Preliminary evidence of an association between HLA-DPB1*0201 and childhood common ALL supports an infectious aetiology

Leukemia 1995;9(3):440-3

Evidence that an HLA-DQA1-DQB1 haplotype influences susceptibility to childhood common ALL in boys provides further support for an infection-related aetiology

Br J Cancer 1998;78(5):561-5

Why not LD? Why not LD?

Publication Bias Publication Bias

Negative studies do not get published - NIH Genetic Associations Database ?

A different kind of publication bias?

Preliminary evidence of an association between HLA-DPB1*0201 and childhood common ALL supports an infectious aetiology

Leukemia 1995;9(3):440-3

Evidence that an HLA-DQA1-DQB1 haplotype influences susceptibility to childhood common ALL in boys provides further support for an infection-related aetiology

Br J Cancer 1998;78(5):561-5

Special Kinds of Confounding in Special Kinds of Confounding in Genetic EpidemiologyGenetic Epidemiology

CONFOUNDING by Locus (LD)

MV Analysis, LD Analysis

CONFOUNDING by EthnicityCONFOUNDING by Ethnicity (Population Stratification)

Matching, Stratification, MV Analysis

Family-Based Association Studies

Genomic Controls and Specially Designed GE Analysis

Marchini, 2004 (www)

Population Stratification

Cardon & Palmer, 2003 (www)

Internal Validity of an Association Study

Avoid "BIAS"

Be careful!

Control for "CONFOUNDING"

Matching, Stratification, MV Analysis

Rule out "CHANCE"

Large sample, Replication

Example: Functional Correlation

HFE-C282Y Association in Childhood ALL

SCOTTISH GROUP 135 patients - 238 newbornsP = 0.0004; OR = 3.0 (1.7 to 5.4)In cALL: P < 0.0001; OR = 4.7 (2.5 to 8.9)

WELSH GROUP 117 patients - 415 newbornsP = 0.005; OR = 2.8 (1.4 to 5.4) In cALL: P = 0.02; OR = 2.9 (1.4 to 6.4)

%%

Example: Replication

Dorak et al, Blood 1999

Diepstra, Lancet 2005 (www)

Multiple Comparisons & Spurious Associations

Genetic Models and Genetic Models and Case-Control Association Data AnalysisCase-Control Association Data Analysis

The data may also be analysed assuming a prespecified genetic model. For example, with the hypothesis that carrying allele B increased risk of disease (dominant model), the AB and BB genotypes are pooled giving a 2x3x2 table. This is particularly relevant when allele B is rare, with few BB observations in cases and controls. Alternatively, under a recessive model for allele B, cells AA and AB would be pooled. Analysing by alleles provides an alternative perspective for case control data. This breaks down genotypes to compare the total number of A and B alleles in cases and controls, regardless of the genotypes from which these alleles are constructed. This analysis is counter-intuitive, since alleles do not act independently, but it provides the most powerful method of testing under a multiplicative genetic model, where risk of developing a disease increases by a factor r for each B allele carried: risk r for genotype AB and r2 for genotype BB. If a multiplicative genetic model is appropriate, both case and control genotypes will be in Hardy–Weinberg equilibrium, and this can be tested for. A fourth possible genetic model is additive, with an increased disease risk of r for AB genotypes, and 2r for BB genotypes. This model shows a clear trend of an increased number of AB and BB genotypes, with the risk for AB genotypes approximately half that for BB genotypes. The additive genetic model can be tested for using Armitage’s test for trend.

Lewis CM. Brief Bioinform 2002 (www)

HLA-DRB4 Association in Childhood ALL

Homozygosity for HLA-DRB4 family is associated with susceptibility to childhood ALL

in boys only (P < 0.0001, OR = 6.1, 95% CI = 2.9 to 12.6 )

Controls are an unselected group of local newborns (201 boys & 214 girls)

* Case-only analysis P = 0.002 (OR = 5.6; 95% CI = 1.8 to 17.6)This association extends to a DRB4-HSP70 haplotype (OR = 8.3; 95% CI = 3.0 to 22.9)

This association has been replicated in Scotland and Turkey

%%

Boys, n=64

*

Girls, n=53

*

ADDITIVE MODELADDITIVE MODEL

Linear ModelLinear Model

Logit estimates Number of obs = 265 LR chi2(1) = 14.24 Prob > chi2 = 0.0002Log likelihood = -139.37794 Pseudo R2 = 0.0486------------------------------------------------------------------------------ caco | Common Odds Ratio Std. Err. z P>|z| [95% CI]-------------+---------------------------------------------------------------- drb4add | 2.208651 .4734163 3.70 0.000 1.45103 - 3.36186------------------------------------------------------------------------------

Heterozygosity and HomozygosityHeterozygosity and Homozygosity

Logit estimates Number of obs = 265 LR chi2(2) = 22.00 Prob > chi2 = 0.0000Log likelihood = -135.49623 Pseudo R2 = 0.0751------------------------------------------------------------------------------ caco | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+----------------------------------------------------------------Wild-type | 1.00 (ref)Heterozygosity | 1.060652 .3557426 0.18 0.861 .549642 2.04676Homozygosity | 6.258503 2.65464 4.32 0.000 2.72534 14.37211------------------------------------------------------------------------------

HLA-DRB4HLA-DRB4 ASSOCIATION ASSOCIATION

EFFECT MODIFICATIONEFFECT MODIFICATION

Logit estimates Number of obs = 532 LR chi2(3) = 23.97 Prob > chi2 = 0.0000Log likelihood = -268.27826 Pseudo R2 = 0.0428------------------------------------------------------------------------------ caco | Coef. Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- sex | -.0299037 .2229554 -0.13 0.893 -.4668883 .4070808 hsp53 | 2.530033 .5929603 4.27 0.000 1.367852 3.692214_IsexXhsp5~2 | -2.758189 .8812645 -3.13 0.002 -4.485436 -1.030943 _cons | -1.321474 .3517969 -3.76 0.000 -2.010984 -.6319651------------------------------------------------------------------------------

CONFOUNDING BY SEXCONFOUNDING BY SEX

Logit estimates Number of obs = 532 LR chi2(2) = 11.99 Prob > chi2 = 0.0025Log likelihood = -274.26995 Pseudo R2 = 0.0214------------------------------------------------------------------------------ caco | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- hsp53 | 3.32777 1.191429 3.36 0.001 1.649684 6.712832 sex | .7693041 .1636106 -1.23 0.218 .5070725 1.167148------------------------------------------------------------------------------

Adjusted for sex?

HLA-DRB4HLA-DRB4 - - HSPA1BHSPA1B HAPLOTYPE ASSOCIATION HAPLOTYPE ASSOCIATION

BOYS ONLYBOYS ONLYLogit estimates Number of obs = 265 LR chi2(1) = 22.41 Prob > chi2 = 0.0000Log likelihood = -135.29119 Pseudo R2 = 0.0765------------------------------------------------------------------------------ caco | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- hsp53 | 12.55392 7.444028 4.27 0.000 3.926876 40.13392------------------------------------------------------------------------------

GIRLS ONLYGIRLS ONLYLogit estimates Number of obs = 267 LR chi2(1) = 0.13 Prob > chi2 = 0.7205Log likelihood = -132.98706 Pseudo R2 = 0.0005------------------------------------------------------------------------------ caco | Odds Ratio Std. Err. z P>|z| [95% Conf. Interval]-------------+---------------------------------------------------------------- hsp53 | 0.796 .5189439 -0.35 0.726 .22181 2.856571------------------------------------------------------------------------------

The association is modified by sex

HLA-DRB4HLA-DRB4 - - HSPA1BHSPA1B HAPLOTYPE ASSOCIATION HAPLOTYPE ASSOCIATION

Current Criteria for Good Association StudiesCurrent Criteria for Good Association Studies

Statistical checklist for genetic association studiesStatistical checklist for genetic association studies- In a case-control study:

Cases and controls derive from the same study baseThere are more controls than cases (up to 5-to-1, for increased statistical power)There are at least 100 cases and 100 controls

- Statistical power calculations are presented - Hardy-Weinberg equilibrium (HWE) is checked and appropriate tests are used- If HWE is violated, allelic association tests are not used- Possible genotyping errors and counter-measures are discussed- All statistical tests are two-tailed- Alternative genetic models of association considered- The choice of marker/allele/genotype frequency (for comparisons) is justified- For HLA associations, a global test for association (G-test, RxC exact test) for each locus is used (if necessary, with correction for multiple testing)- Chi-squared and Fisher tests are NOT used interchangeably- P values are presented without spurious accuracy (with two decimal places)- Strength of association has been measured (usually odds ratio and its 95% CI)- In a retrospective case-control study, ORs are presented (as opposed to RRs)- Multiple comparisons issue is handled appropriately (this does not necessarily mean Bonferroni corrections)- Alternative explanations for the observed associations (chance, bias, confounding) are discussed

http://www.dorak.info/hla/stat.html

ROCHE Genetic Education (www)

Multifactorial Etiology

Hunter, 2005 (www)

Models of gene–environment interactions

Banks, 2000 (www)

Generating Protein Diversity from the 'Small' Genome

Alternative Splicing Can Generate Very Large Numbers of Related Proteins From a Single Gene

Most extreme example is the Drosophila Dscam Gene:

Generating Protein Diversity from the 'Small' Genome

12 x 48 x 33 x 2 = 38,016 alternative splice variants

Black, Cell 2000 (www)

Wojtowicz, Cell 2004 (www)

DSCAM =  Down syndrome cell adhesion molecule 

Lodish et al. Molecular Cell Biology, 5th Ed, WH Freeman (www)

Generating Protein Diversity from the 'Small' Genome

Alternative Splicing Can be Tissue or Cell-Specific

mRNA editing (base modification) is a different mechanism of alternative splicing

OMIM 107730 (www)Chen & Chan, 1996 (www)

Wedekind, 2003 (www)RNA Editing in The Cell – NCBI Online (www)

Lodish et al. Molecular Cell Biology, 5th Ed, WH Freeman (www)

Generating Protein Diversity from the 'Small' Genome

Integration of proteomics in genetic epidemiology studies would eliminate a lot of

obstacles arising from the followingOnly <2% of the genome is protein-coding and most sequence

variants are silent changes

Even genome-wide sequence variant studies cannot identify the genomic counterparts of 1.5 million proteins

Epigenetic changes, alternative transcription/splicing and posttranslational modifications cannot be predicted by study

of sequence variants

Genomic DNA studies does not take into account selective expression of genes in certain cell or tissues

No genetic association is complete without demonstration of the functional relevance

50s Rule50s Rule

Genome - 1:50 coding

Gene:Protein - 1:50 ratio

Overall efficiency of pure genomic studies

1:2500

http://www.dorak.info