Shaun Purcell Psychiatric & Neurodevelopmental Genetics Unit Center for Human Genetic Research...
Date posted: 19-Dec-2015
Shaun Purcell
Psychiatric & Neurodevelopmental Genetics Unit
Center for Human Genetic Research
Massachusetts General Hospital
http://pngu.mgh.harvard.edu/
[email protected]
Gene-environment & gene-gene interaction in association studies: a methodologic introduction
Finding disease-causing variation
The Human Genome
chromosome 4 DNA sequence
SNP (single nucleotide polymorphism)
…GGCGGTGTTCCGGGCCATCACCATTGCGGGCCGGATCAACTGCCCTGTGTACATCACCAAGGTCATGAGCAAGAGTGCAGCCGACATCATCGCTCTGGCCAGGAAGAAAGGGCCCCTAGTTTTTGGAGAGCCCATTGCCGCCAGCCTGGGGACCGATGGCACCCATTACTGGAGCAAGAACTGGGCCAAGGCTGCGGCGTTCGTGACTTCCCCTCCCCTGAGCCCGGACCCTACCACGCCCGACTA…
Rare disease, major gene effect
Genotype   Risk of disease
DD         0.001
Dd         0.001
dd         0.95
• Disease prevalence ~1 in 1000
• Individuals with dd are ~1000 times more likely to get disease
• Frequency of d in controls ~5%
• Frequency of d in cases ~96%
Common disease, polygenic effects
Genotype   Risk of disease
DD         0.01
Dd         0.012
dd         0.0144
• Disease prevalence ~1 in 100
• Each extra d allele increases risk by ~1.2 times
• Frequency of d in controls ~5%
• Frequency of d in cases ~6%
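The prevalence and case allele frequency quoted for the common disease example can be recovered from the three penetrances. The sketch below assumes Hardy-Weinberg genotype proportions and treats the ~5% control frequency as the population frequency of d; both are assumptions, and the function names are illustrative only.

```python
def genotype_freqs(q):
    """Hardy-Weinberg genotype frequencies when allele d has frequency q."""
    p = 1.0 - q
    return {"DD": p * p, "Dd": 2 * p * q, "dd": q * q}


def prevalence_and_case_freq(q, penetrance):
    """Return (disease prevalence, frequency of allele d among cases)."""
    freqs = genotype_freqs(q)
    prevalence = sum(freqs[g] * penetrance[g] for g in freqs)
    # Genotype distribution among cases, by Bayes' rule.
    in_cases = {g: freqs[g] * penetrance[g] / prevalence for g in freqs}
    # Each dd case carries two d alleles, each Dd case carries one.
    case_d = in_cases["dd"] + 0.5 * in_cases["Dd"]
    return prevalence, case_d


penetrance = {"DD": 0.01, "Dd": 0.012, "dd": 0.0144}
prevalence, case_d = prevalence_and_case_freq(0.05, penetrance)
print(f"prevalence = {prevalence:.4f}, d frequency in cases = {case_d:.3f}")
```

Under these assumptions the prevalence is close to 1 in 100 and the d allele frequency in cases is close to 6%, which shows how small the case/control difference is for a per-allele relative risk of 1.2.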
?
Gene-environment correlation
G x E interaction
[Figure axes: gene effect, environmental effect]
The environment modifies the effect of a gene
A gene modifies the effect of an environment
Gene-environment interaction
Linkage disequilibrium (LD)
Epistasis
Gene × gene interaction
[Figure axes: gene effect, gene effect]
Epistasis: one gene modifies the effect of another
Classical definition of epistasis
The aa genotype masks the effect of the bb genotype
[Figure: two-locus grid, locus A genotypes (AA, Aa, aa) by locus B genotypes (BB, Bb, bb)]
Separate analysis
• locus A shows an association with the trait
• locus B appears unrelated
[Figure: trait by genotype at Marker A (AA, Aa, aa) and at Marker B (BB, Bb, bb)]
Epistasis & haplotypes
• Two-locus genotype A/a B/b (AaBb): A and B need not even be on the same chromosome
• Haplotype AB / ab: A and B on the same chromosome; effect could appear as “interaction”
• cis versus trans effects: AB haplotype causes disease, versus A and B interact to cause disease
[Figure: double heterozygotes in the two phase configurations, AB/ab and Ab/aB. If the AB haplotype causes disease: AB/ab disease, Ab/aB no disease. If A and B interact: disease in both configurations.]
Two-locus genotypes
                 Locus A
Locus B    AA       Aa       aa       Total
BB         fAABB    fAaBB    faaBB    fBB
Bb         fAABb    fAaBb    faaBb    fBb
bb         fAAbb    fAabb    faabb    fbb
Total      fAA      fAa      faa      f
“Penetrance” = probability of developing disease given genotype
Common disease, polygenic effects
Genotype   Risk of disease
DD         0.01
Dd         0.012
dd         0.0144
• Disease prevalence ~1 in 100
• Each extra d allele increases risk by ~1.2 times
• Frequency of d in controls ~5%
• Frequency of d in cases ~6%
Small single SNP effects might represent larger epistatic effects
Risk of developing disease
            BB      Bb      bb
AA          0.01    0.01    0.01
Aa          0.01    0.01    0.01
aa          0.01    0.01    0.20
Marginal    0.01    0.01    0.012
Frequency a = b = 0.1
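The marginal row of this example can be reproduced by averaging the two-locus risks over the genotypes at the other locus. This is a minimal sketch that assumes Hardy-Weinberg proportions at locus A and independence between the two loci; the function name is illustrative.

```python
A_GENOTYPES = ("AA", "Aa", "aa")
B_GENOTYPES = ("BB", "Bb", "bb")


def marginal_risk_by_b(risk, freq_a):
    """Average two-locus risks over locus A to get the risk seen at locus B alone."""
    p = 1.0 - freq_a
    weights = {"AA": p * p, "Aa": 2 * p * freq_a, "aa": freq_a * freq_a}
    return {
        b: sum(weights[a] * risk[(a, b)] for a in A_GENOTYPES)
        for b in B_GENOTYPES
    }


# Baseline risk 0.01 everywhere, except the double homozygote aa/bb.
risk = {(a, b): 0.01 for a in A_GENOTYPES for b in B_GENOTYPES}
risk[("aa", "bb")] = 0.20

marginal = marginal_risk_by_b(risk, freq_a=0.1)
for b in B_GENOTYPES:
    print(b, round(marginal[b], 4))
```

A twenty-fold risk in one cell shrinks to a marginal relative risk of about 1.2 for bb, because only 1% of bb individuals also carry aa.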
Interaction may be a common feature of genetic variation
• Brem et al (2005) Nature
– gene expression phenotypes in yeast
– two-stage approach to find pairs of loci
• 65% of these pairs showed significant interaction
• many secondary loci would be missed by standard approaches though
Examples of interactions?
Risk                                   Environment             Outcome
phenylalanine hydroxylase deficiency   dietary phenylalanine   mental retardation
debrisoquine metabolism                smoking                 lung cancer
fair skin                              sun exposure            skin cancer
Lewis blood group                      alcohol intake          coronary atherosclerosis
APOE genotype                          head injury             Alzheimer's disease
[Figure: example genotypes (AA, AC, CC) in families and in unrelated cases and controls, illustrating two designs]
• Family-based transmission disequilibrium test (TDT)
• Population-based case/control
Odds ratio: measure of association
           A    a
Case       a    b
Control    c    d
Odds of A in cases = a/b
Odds of A in controls = c/d
Odds ratio = (a/b)/(c/d) = ad / bc
              E-            E+
              A     a       A     a
Case          80    20      60    40
Control       80    20      80    20
Odds ratio    1.00          0.375
              (80*20)/(80*20)   (60*20)/(80*40)
Z = ( ln(OR_E-) – ln(OR_E+) ) / sqrt( V_E- + V_E+ )
V( ln(OR) ) = 1/a + 1/b + 1/c + 1/d
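The Z statistic for the stratified table above can be worked through directly from the two formulas. This is a minimal sketch; the function names are illustrative, and the two-sided normal p-value is an added assumption about how Z would be referred to a distribution.

```python
import math


def log_or_and_variance(a, b, c, d):
    """ln(odds ratio) and its variance for a 2x2 table: case (a, b), control (c, d)."""
    log_or = math.log((a * d) / (b * c))
    variance = 1 / a + 1 / b + 1 / c + 1 / d
    return log_or, variance


def or_difference_z(unexposed, exposed):
    """Z test for a difference in odds ratios between E- and E+ strata."""
    log_or_0, var_0 = log_or_and_variance(*unexposed)
    log_or_1, var_1 = log_or_and_variance(*exposed)
    return (log_or_0 - log_or_1) / math.sqrt(var_0 + var_1)


# Counts from the slide: (case A, case a, control A, control a) in each stratum.
z = or_difference_z(unexposed=(80, 20, 80, 20), exposed=(60, 40, 80, 20))
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # assumes Z ~ N(0, 1) under the null
print(f"Z = {z:.3f}, two-sided p = {p_two_sided:.3f}")
```

For these counts Z is a little over 2, so the difference between an odds ratio of 1.00 and 0.375 is only modestly significant despite 200 alleles per group in each stratum.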
Regression modeling of interaction
Y = b_X X + e
Y = b_X X + b_Z Z + b_I XZ + e
Y = ( b_X + b_I Z ) X + b_Z Z + e
The b_I Z term is the interaction component: the effect of X on Y is modified by Z
Y = b0 + b1 G + b2 E + b3 G×E
• Linear regression for continuous outcomes
• Logistic regression for yes/no outcomes
G = 0, 1, 2 copies of allele “A”
E = yes/no exposure (0/1), or a continuous measure
[Figure: Y against gene dosage (0, 1, 2), with separate lines for E- and E+]
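To see what the b3 term does, it helps to compute the per-allele effect of G separately in unexposed and exposed individuals. The sketch below uses made-up coefficient values purely for illustration; for a logistic model, Y would be read as the log-odds of disease.

```python
def expected_y(g, e, b0, b1, b2, b3):
    """Expected outcome under Y = b0 + b1*G + b2*E + b3*G*E (no error term)."""
    return b0 + b1 * g + b2 * e + b3 * g * e


def per_allele_effect(e, b0, b1, b2, b3):
    """Change in expected Y for one extra copy of the allele, at exposure level e."""
    return expected_y(1, e, b0, b1, b2, b3) - expected_y(0, e, b0, b1, b2, b3)


# Hypothetical coefficients, not taken from any study.
coef = dict(b0=1.0, b1=0.5, b2=0.2, b3=0.3)

slope_unexposed = per_allele_effect(0, **coef)  # equals b1
slope_exposed = per_allele_effect(1, **coef)    # equals b1 + b3
print(slope_unexposed, slope_exposed)
```

The two slopes differ by exactly b3, which is why a test of b3 = 0 is a test of whether the E- and E+ lines in a gene-dosage plot are parallel.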
Definitions of epistasis
• Biological: individual-level phenomenon
• Statistical: population-level phenomenon
[Figure: two-locus grids (AA, Aa, aa by BB, Bb, bb) illustrating each definition]
Requires:
1) Variation between individuals
2) Effect on disease
Requires:
1) Correct statistical definition of effect
What do interactions mean?
• TEST MAIN EFFECT
– Null hypothesis straightforward
• TEST INTERACTION
– Null hypothesis is a mathematical model describing joint effects

        A-    A+
B-      1     a
B+      b     ?

Multiplicative risk ratios
        A-    A+    RR(A)
B-      1     a     a/1 = a
B+      b     ab    ab/b = a

Additive risk differences
        A-    A+       RD(A)
B-      1     a        a-1
B+      b     a+b-1    (a+b-1)-b = a-1
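Whether a given joint effect counts as an "interaction" depends on which null model fills in the "?" cell. The sketch below encodes the two null models from the tables above, with all risks expressed relative to the A-/B- baseline; the relative risks a = 2 and b = 3 are invented for illustration.

```python
def expected_joint_rr(a, b, model):
    """Expected relative risk in the A+/B+ cell under a no-interaction model."""
    if model == "multiplicative":
        return a * b          # risk ratios multiply
    if model == "additive":
        return a + b - 1      # risk differences (a-1) and (b-1) add
    raise ValueError(f"unknown model: {model}")


def departure(observed, a, b, model):
    """Observed minus expected joint relative risk; zero means no interaction."""
    return observed - expected_joint_rr(a, b, model)


a, b = 2.0, 3.0          # hypothetical single-factor relative risks
observed_joint = 6.0     # hypothetical relative risk when both factors are present

print(departure(observed_joint, a, b, "multiplicative"))  # 0.0
print(departure(observed_joint, a, b, "additive"))        # 2.0
```

The same data show no interaction on the multiplicative scale and a positive interaction on the additive scale, so a statement about interaction is only meaningful once the scale is named.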
“…we defined interaction as departure from a multiplicative model…”
• Multiplicative model (a×b)
– common, easy to implement, logistic regression
  • additive on log-odds scale
  • multiplicative on risk scale
• Other common models (on risk)
– additive (a + b)
– heterogeneity model (a + b – ab)
[Figure: density plots of the same trait on five measurement scales: original, log-transform, cubic-transform, censored, 7-point scale]
[Figure labels: G1, G2, G1G2]
Study designs
• No controls (case-only design)
• Population-based controls
• Family-based controls
More robust, fewer assumptions vs. more efficient, powerful
Case-only design
• Detect interaction only, no main effects

Risk factors   Prevalence
G- E-          p_0
G+ E-          p_G
G- E+          p_E
G+ E+          p_GE = p_0 ∙ (p_G / p_0) ∙ (p_E / p_0)   (expected value when there is no interaction)

Leads to OR_INT = OR_GE / (OR_G ∙ OR_E)
It turns out, OR_INT = OR_Case / OR_Control
where OR_Case is the association of G and E in cases
and OR_Control is the association of G and E in controls
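The identity between the two forms of the interaction odds ratio can be checked numerically on any 2×2×2 table. The counts below are invented for illustration, and the helper names are not from the slides.

```python
import math

# Hypothetical counts keyed by (G, E), with 1 = risk factor present.
cases = {(0, 0): 40, (1, 0): 20, (0, 1): 20, (1, 1): 20}
controls = {(0, 0): 50, (1, 0): 20, (0, 1): 20, (1, 1): 10}


def or_vs_baseline(cell):
    """Case/control odds ratio for one (G, E) cell against the G-/E- baseline."""
    return (cases[cell] * controls[(0, 0)]) / (cases[(0, 0)] * controls[cell])


def g_e_association(counts):
    """Odds ratio for association between G and E within one group."""
    return (counts[(1, 1)] * counts[(0, 0)]) / (counts[(1, 0)] * counts[(0, 1)])


or_g = or_vs_baseline((1, 0))
or_e = or_vs_baseline((0, 1))
or_ge = or_vs_baseline((1, 1))

or_int_from_effects = or_ge / (or_g * or_e)
or_int_from_groups = g_e_association(cases) / g_e_association(controls)
print(or_int_from_effects, or_int_from_groups)
```

Both routes give the same number. If G and E are unassociated in controls, OR_Control is 1 and the G-E association among cases estimates the interaction on its own, which is the basis of the case-only design.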
[Figure: % replicates significant at p=0.05 (0 to 100), under "No interaction" and "Interaction", for four designs: 100 cases, 100 controls; 200 cases, 200 controls; 200 cases only; 200 controls only]
Case-only designs offer efficient detection of interaction
Case-only design isn’t always valid
• Chromosomal proximity
• Multiple ethnicities in case sample
[Figure: Gene A and Gene B shown for each situation; second panel labeled "stratification"]
[Figure: genes in 5q GABA cluster, cases (Scz) and controls]
Pamela Sklar, Tracey Petryshen, C&M Pato
TDT requires independence assumption
[Figure: parent-offspring trios with genotypes at locus A (aa, Aa, AA), stratified by proband genotype at locus B. Stratify for bb probands: →100%, →0%. Stratify for BB probands: →0%, →100%.]
If variants A and B are in LD (common haplotypes AB / ab)
→ false positive interactions (due to linkage or population stratification)
An “all pairs of SNPs” approach to epistasis does not scale well
# SNPs     # pairs
5          10
10         45
50         1,225
100        4,950
500        124,750
500,000    124,999,750,000
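The pair counts in the table are the number of unordered pairs, n(n-1)/2. A short sketch that reproduces them (the function name is illustrative):

```python
def n_pairs(n_snps):
    """Number of distinct unordered pairs among n_snps SNPs: n(n-1)/2."""
    return n_snps * (n_snps - 1) // 2


for n in (5, 10, 50, 100, 500, 500_000):
    print(f"{n:>7,} SNPs -> {n_pairs(n):>15,} pairs")
```

The count grows with the square of the number of SNPs, so going from 500 to 500,000 SNPs multiplies the number of tests by roughly a million.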
Multiple testing increases false positives
[Figure: P(at least 1 false positive), from 0 to 1, against number of independent tests performed, from 0 to 50. Two curves: per test false positive rate 0.05, and per test false positive rate 0.001 = 0.05/50]
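The two curves in this figure follow from the probability of at least one false positive among n independent tests, 1 - (1 - alpha)^n. A minimal sketch, assuming fully independent tests as the axis label states:

```python
def p_any_false_positive(alpha, n_tests):
    """Probability of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


uncorrected = p_any_false_positive(0.05, 50)
corrected = p_any_false_positive(0.05 / 50, 50)  # per-test rate 0.001
print(f"alpha = 0.05:  {uncorrected:.3f}")
print(f"alpha = 0.001: {corrected:.3f}")
```

With 50 tests at a per-test rate of 0.05, a false positive is more likely than not by a wide margin; dividing the per-test rate by the number of tests brings the overall rate back to just under 0.05.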
Tests for interaction have low power
[Figure: statistical power (0 to 1) against increasing sample N, comparing the epistasis test with the standard association test]
Dysbindin-1 (DTNBP1) & schizophrenia
• DTNBP1 & 7 other genes encode proteins that make up the BLOC1 protein complex
– biogenesis of lysosome-related organelles complex 1
• DTNBP1’s effect on Scz mediated via BLOC1?
– if so, an analysis including all 8 genes might help to resolve inconsistent studies
Derek Morris, Aiden Corvin, Michael Gill
DTNBP1 association studies
[Figure: DTNBP1 exons 1 to 10 with the SNPs typed across studies (rs1047631, P1328, P1333, rs734129, P1287, rs3829893, P1655, P1635, rs2619542, P1325, rs2619550, P1765, P1757, P1320, P1763, P1578, P1792, P1795, P1583, rs2743852, rs2619538) and the associated alleles reported by each study]
Studies shown:
• Straub et al. (2002)
• Schwab et al. (2003)
• Van den Oord et al. (2003)
• Van den Bogaert et al. (2003)
• Tang et al. (2003)
• Kirov et al. (2004)
• Williams et al. (2004)
• Funke et al. (2004)
• Numakawa et al. (2004)
• Li et al. (2005)
Duplicate gene action
Example: Kernel Color in Wheat
Only 1 dominant allele required, either A or B
A_B_   Normal
A_bb   Normal
aaB_   Normal
aabb   No product
[Figure: two-locus grid, AA / Aa / aa by BB / Bb / bb]
Complementary gene action
Example: Flower color in sweet pea
One recessive genotype at either gene would increase disease risk
i.e. genes A and B required
A_B_   Normal
A_bb   No product
aaB_   No product
aabb   No product
[Figure: two-locus grid, AA / Aa / aa by BB / Bb / bb]
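The duplicate and complementary models differ only in whether a dominant allele is needed at either locus or at both. The sketch below encodes each rule and counts outcomes among the 16 equally likely offspring classes of a cross between two double heterozygotes (AaBb × AaBb); that cross is an added assumption chosen for illustration, and the function names are not from the slides.

```python
from itertools import product

# Offspring genotype weights (out of 4) at one locus from a heterozygote cross.
LOCUS_A = {"AA": 1, "Aa": 2, "aa": 1}
LOCUS_B = {"BB": 1, "Bb": 2, "bb": 1}


def has_dominant(genotype):
    """True if the genotype carries at least one dominant (upper-case) allele."""
    return any(allele.isupper() for allele in genotype)


def duplicate_action(a, b):
    """Normal product if either locus carries a dominant allele."""
    return has_dominant(a) or has_dominant(b)


def complementary_action(a, b):
    """Normal product only if both loci carry a dominant allele."""
    return has_dominant(a) and has_dominant(b)


def normal_to_no_product(model):
    """Counts (normal, no product) out of 16 for an AaBb x AaBb cross."""
    normal = sum(
        LOCUS_A[a] * LOCUS_B[b]
        for a, b in product(LOCUS_A, LOCUS_B)
        if model(a, b)
    )
    return normal, 16 - normal


print("duplicate:    ", normal_to_no_product(duplicate_action))
print("complementary:", normal_to_no_product(complementary_action))
```

Under this assumed cross the duplicate model gives the classical 15:1 ratio and the complementary model 9:7, which is one way to see that both depart from what two independently acting loci would produce.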
[Figure: two-locus grids (AA, Aa, aa by BB, Bb, bb) for four models: complementary gene action, duplicate gene action, heterogeneity model, “checkerboard” model]
Negative feedback: simple model of dysregulation
[Figure: two-locus grid with genotypes -/-, +/-, +/+ at each locus]
Negative feedback: single marker analysis leads to the “opposite allele” problem
[Figure: single marker relative risk (0.2 to 2.6) against frequency of one locus (0.1 to 0.9), other locus fixed at p=0.4]
Standard single SNP analyses
[Figure: -log10(p-value), from 0 to 2.5, for SNPs in DTNBP1, MUTED, PLDN, SNAPAP, CNO, BLOC1S1, BLOC1S2 and BLOC1S3, with the p=0.05 threshold marked]
Dysbindin-1 by itself shows no evidence of association with Scz
373 Irish schizophrenics
812 controls
[Figure: 10 SNPs (A to J) at one gene crossed with 8 SNPs (1 to 8) at another, giving the pairs A1, A2, …, J8]
A single gene-based test
80 allele-based tests
[Figure: odds ratio (0 to 2.5) for DTNBP1, shown separately for each MUTED genotype]
An independent replication? DTNBP1 MUTED epistasis (Straub et al. WCPG meeting Oct 2005.)
Known protein interactions in BLOC-1 complex
[Figure: interaction network among DTNBP1, MUTED, BLOC1S2, CNO, PLDN, SNAPAP, BLOC1S1 and BLOC1S3]
Gene-based p = 0.0009
Correcting for multiple tests, p = 0.025
Methylenetetrahydrofolate reductase (MTHFR) polymorphisms and serum folate interact to influence negative symptoms and cognitive impairment in schizophrenia
Joshua Roffman, Donald Goff, et al
• Folic acid deficiency may contribute to negative symptoms and cognitive impairment in schizophrenia
– underlying mechanism remains uncertain
• A cohort of 159 outpatients with schizophrenia, measured for:
– negative symptoms
– frontal lobe deficits
[Figure: three bar charts by MTHFR genotype (C/C & C/T versus T/T) and serum folate (low versus high): PANSS Negative Symptoms, Verbal Fluency, and WCST % Perseverative Errors]
• Interaction of low serum folic acid and homozygosity for the MTHFR 677T allele confers risk.
• Patients homozygous for the MTHFR 677T allele may therefore benefit specifically from folic acid supplementation.
Further reading
• Cordell HJ (2002) Human Molecular Genetics, 11, 2463-2468.
– a statistical review of epistasis, methods and definitions
• Clayton D & McKeigue P (2001) The Lancet, 358, 1357-1360.
– a critical appraisal of GxE research
• Marchini J, Donnelly P & Cardon LR (2005) Nature Genetics, 37, 413-417.
– epistasis in whole-genome association studies