Sebastian Zöllner University of Michigan. Matthew Zawistowski Keng-Han Lin Mark Reppell.
Robust and powerful sibpair test for rare variant association
Sebastian Zöllner, University of Michigan
Acknowledgements
Matthew Zawistowski
Keng-Han Lin
Mark Reppell
GWAS have been successful.
Only some heritability is explained by common variants.
Uncommon coding variants (MAF 0.5%-5%) explain less.
Rare variants could explain some ‘missing’ heritability.
◦ Better Risk prediction.
◦ Rare variants may identify new genes.
◦ Rare exonic variants may be easier to annotate functionally and interpret.
Rare Variants – Why Do We Care?
Testing individual variants is unfeasible.
◦ Limited power due to small number of observations.
◦ Multiple testing correction.
Alternative: Joint test.
◦ Burden test (CMAT, Collapsing, WSS)
◦ Dispersion test (SKAT, C-alpha)
Burden/Dispersion Tests
Gene-based tests have low power.
◦ Nelson et al. (2010) estimated that 10,000 cases & 10,000 controls are required for 80% power in half of the genes.
Large sample size required.
More heterogeneous sample => Danger of stratification.
Stratification may differ from common variants in magnitude and pattern.
Challenges of Rare Variant Analysis
(202 genes, n=900/900, MAF < 1%, nonsense/nonsynonymous variants)
Stratification in European Populations
Variant Abundance across Populations
[Figure: expected number of variants per kb versus sample size, by population: African-American, Southern Asia, South-Eastern Europe, Finland, South-Western Europe, Northern Europe, Central Europe, Western Europe, Eastern Europe, North-Western Europe.]
A gradient in diversity from Southern to Northern Europe
Allele Sharing
Median EU-EU: 0.71 Median EU-EU: 0.86 Median EU-EU: 0.98
• Measure of rare variant diversity.
• Probability of two carriers of the minor alleles being from different populations (normalized).
1. Select 2 populations.
2. Select mixing parameter r.
3. Sample 30 variants from the 202 genes.
4. Calculate inflation based on observed frequency differences.
General Evaluation of Stratification
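The four steps above can be sketched in Python. This is only an illustration under stated assumptions, not the procedure used in the talk: the sampling scheme (cases are a mixture with fraction r from population 1, all controls come from population 2), the 1-df burden-style statistic, and the function name `burden_inflation` are assumptions, and the frequency pairs passed in are hypothetical.

```python
import random


def burden_inflation(variant_freqs, r, n_cases, n_controls, n_variants=30, seed=0):
    """Expected 1-df chi-square statistic of a burden test with no true effect.

    variant_freqs: list of (freq_pop1, freq_pop2) pairs, one per variant.
    Assumption (not from the slides): cases are a mixture with fraction r
    from population 1; all controls come from population 2.
    """
    rng = random.Random(seed)
    chosen = rng.sample(variant_freqs, n_variants)   # step 3: sample variants
    burden1 = sum(p1 for p1, _ in chosen)            # summed frequency, pop 1
    burden2 = sum(p2 for _, p2 in chosen)            # summed frequency, pop 2
    p_cases = r * burden1 + (1.0 - r) * burden2      # mixture in cases
    p_controls = burden2
    chrom_cases, chrom_controls = 2 * n_cases, 2 * n_controls
    pooled = (chrom_cases * p_cases + chrom_controls * p_controls) / (
        chrom_cases + chrom_controls
    )
    variance = pooled * (1.0 - pooled) * (1.0 / chrom_cases + 1.0 / chrom_controls)
    ncp = (p_cases - p_controls) ** 2 / variance
    # The mean of a non-central chi-square with 1 df is 1 + ncp; a value of 1
    # means the frequency differences cause no inflation.
    return 1.0 + ncp
```

Calling it over a grid of r values gives a curve of inflation by mixture proportion for one pair of populations.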
Inflation by Mixture Proportion
Zawistowski et al. 2014
Inflation across Comparisons
If multiple affected family members are collected, it may be more powerful to sequence all family members.
Family-based tests can be robust against stratification.
TDT-Type tests are potentially inefficient.
How to leverage low frequency?
◦ Low frequency risk variants should be more common in cases.
◦ And even more common on chromosomes shared among many cases.
Family-based Test against Stratification
• Consider affected sibpairs.
• Estimate IBD sharing.
• Compare the number of rare variants on shared (solid) and non-shared chromosomes (blank).
Any aggregate test can be applied.
Family Test
[Diagram: sibpairs with IBD sharing S=0, S=2, S=1]
Twice as many non-shared as shared chromosomes.
Null hypothesis determines test:
◦ Shared alleles : Non-shared alleles = 1:2. Test for linkage or association.
◦ Shared alleles : Non-shared alleles = Shared chromosomes : Non-shared chromosomes. Test for association only.
Basic Properties
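The 1:2 ratio can be checked with a short calculation. It assumes full sibs with the standard IBD sharing probabilities (1/4, 1/2, 1/4 for S = 0, 1, 2) and counts each shared chromosome once, so a sibpair with sharing S carries S shared and 4 - 2S non-shared chromosomes at a locus.

```latex
\begin{aligned}
P(S=0) &= \tfrac14, \qquad P(S=1) = \tfrac12, \qquad P(S=2) = \tfrac14,\\
E[\text{shared chromosomes}] &= E[S] = 0\cdot\tfrac14 + 1\cdot\tfrac12 + 2\cdot\tfrac14 = 1,\\
E[\text{non-shared chromosomes}] &= E[4 - 2S] = 4 - 2\cdot 1 = 2.
\end{aligned}
```

Under the null of no linkage and no association, rare alleles fall on chromosomes in proportion to these counts, which gives the expected ratio of shared to non-shared alleles of 1:2.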
IBD sharing is known.
Phase is not needed to identify shared variants.
Except in one configuration: IBD 1 and both sibs are heterozygous.
Under the null, the probability of configuration 2 is the allele frequency. Under the alternative, we need to use multiple imputation.
Haplotypes not required
Configuration 1: +1 shared
Configuration 2: +2 non-shared
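The statement that configuration 2 has probability equal to the allele frequency can be verified as follows. This sketch assumes the null hypothesis, a rare allele with frequency p, independent chromosomes, and a sibpair with IBD 1 that carries one shared and two non-shared chromosomes. Both sibs are heterozygous either when only the shared chromosome carries the rare allele (configuration 1) or when only the two non-shared chromosomes carry it (configuration 2).

```latex
\begin{aligned}
P(\text{config 1}) &= p\,(1-p)^2, \qquad P(\text{config 2}) = (1-p)\,p^2,\\
P(\text{config 2} \mid \text{both heterozygous}, S=1)
  &= \frac{(1-p)\,p^2}{p\,(1-p)^2 + (1-p)\,p^2}
   = \frac{p}{(1-p) + p} = p.
\end{aligned}
```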
Assume chromosome sharing status is known for each sibpair.
Count rare variants; impute sharing status for double-heterozygotes.
Compare number of rare variants between shared and non-shared chromosomes with chi-squared test (Burden Style).
Evaluation of Internal Control
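A minimal Python sketch of the burden-style comparison described above. It assumes the rare-allele counts have already been aggregated over a gene and that double-heterozygotes have been resolved; the function names and the goodness-of-fit form of the statistic are illustrative choices, not taken from the talk.

```python
import math


def chi2_sf_1df(stat):
    """Upper tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


def shared_vs_nonshared_test(rare_shared, rare_nonshared, chrom_shared, chrom_nonshared):
    """Goodness-of-fit test: do rare alleles split like the chromosomes do?

    rare_shared / rare_nonshared: rare-allele counts on shared / non-shared
    chromosomes. chrom_shared / chrom_nonshared: numbers of shared /
    non-shared chromosomes in the sample.
    """
    total_rare = rare_shared + rare_nonshared
    frac_shared = chrom_shared / (chrom_shared + chrom_nonshared)
    exp_shared = total_rare * frac_shared
    exp_nonshared = total_rare - exp_shared
    stat = ((rare_shared - exp_shared) ** 2 / exp_shared
            + (rare_nonshared - exp_nonshared) ** 2 / exp_nonshared)
    return stat, chi2_sf_1df(stat)
```

Passing the observed chromosome counts corresponds to the association-only null; passing a fixed 1:2 ratio (for example `chrom_shared=1, chrom_nonshared=2`) corresponds to the null that tests for linkage or association.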
[Diagram: sibpairs with IBD sharing S=0, S=2, S=1]
Classic Case-Control
Selected Cases
Enriching Based on Familial Risk
[Diagram: sibpairs with IBD sharing S=0, S=2, S=1]
Internal Control
Consider 2 populations.
p=0.01 in pop1, p=0.05 in pop2.
1000 sibpairs for internal control design.
1000 cases, 1000 controls for selected cases.
1000 cases and 1000 controls for case-control.
Sample cases from pop1 with a varying proportion.
Test for association with α=0.05.
Stratification
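The setup above can be mimicked with a small null simulation. The sketch below is an illustration under assumptions that the slides do not state: controls for the conventional design are drawn from both populations with equal probability, IBD sharing is known without error, each test is a two-proportion chi-square test, and the selected-cases design is omitted. All function names are hypothetical.

```python
import math
import random


def chi2_pvalue_1df(stat):
    """Upper tail probability of a chi-square distribution with 1 df."""
    return math.erfc(math.sqrt(stat / 2.0))


def two_group_chi2(k1, n1, k2, n2):
    """1-df chi-square statistic comparing allele frequencies k1/n1 and k2/n2."""
    if n1 == 0 or n2 == 0:
        return 0.0
    pooled = (k1 + k2) / (n1 + n2)
    if pooled <= 0.0 or pooled >= 1.0:
        return 0.0
    variance = pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2)
    return (k1 / n1 - k2 / n2) ** 2 / variance


def count_rare(rng, n_chrom, p):
    """Number of rare alleles on n_chrom chromosomes with allele frequency p."""
    return sum(1 for _ in range(n_chrom) if rng.random() < p)


def simulate_type1(r, n=1000, reps=100, p1=0.01, p2=0.05, alpha=0.05, seed=1):
    """Type I error of (internal control, conventional) with no true effect.

    r is the proportion of cases (and sibpairs) sampled from population 1.
    """
    rng = random.Random(seed)
    reject_internal = reject_conventional = 0
    for _ in range(reps):
        k_sh = n_sh = k_non = n_non = k_case = k_ctrl = 0
        for _ in range(n):
            p = p1 if rng.random() < r else p2
            s = rng.choice((0, 1, 1, 2))          # IBD 0/1/2 with prob 1/4, 1/2, 1/4
            k_sh += count_rare(rng, s, p)         # shared chromosomes, counted once
            n_sh += s
            k_non += count_rare(rng, 4 - 2 * s, p)
            n_non += 4 - 2 * s
            k_case += count_rare(rng, 2, p)       # one unrelated case
            p_ctrl = p1 if rng.random() < 0.5 else p2   # assumed control sampling
            k_ctrl += count_rare(rng, 2, p_ctrl)
        if chi2_pvalue_1df(two_group_chi2(k_sh, n_sh, k_non, n_non)) < alpha:
            reject_internal += 1
        if chi2_pvalue_1df(two_group_chi2(k_case, 2 * n, k_ctrl, 2 * n)) < alpha:
            reject_conventional += 1
    return reject_internal / reps, reject_conventional / reps
```

Because shared and non-shared chromosomes of a sibpair come from the same population, the internal comparison is not affected by the case mixture, whereas the conventional comparison is whenever cases and controls are sampled in different proportions.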
Robust to Population Stratification
[Figure: type I error rate versus proportion (0.0 to 1.0); curves for Internal Control, Selected Cases, and Conventional.]
Realistic rare variant models are unknown
◦ Typical allele frequency
◦ Number of risk variants/gene
◦ Typical effect size
◦ Distribution of effect sizes
◦ Identifiability of risk variants
Goal: Create a model that summarizes these unknowns into
◦ Summed allele frequency
◦ Mean effect size
◦ Variance of effect size
Evaluating Study Designs
Assume many loci carrying risk variants.
Risk alleles at multiple loci each increase the risk by a factor independently.
Frequency of risk variant:
◦ Independent cases: $P(R \mid A) \propto P(A \mid R)\,P(R)$
◦ On shared chromosome: $P(R \mid AA) \propto P(AA \mid R)\,P(R)$
Basic Genetic Model
A: Affected
AA: Affected relative pair
R: Risk locus genotype
Relative risk is sampled from distribution f with mean μ, variance σ².
Simplifications:
◦ Each risk variant occurs only once in the population.
◦ Each risk variant on its own haplotype.
Then the risk in a random case is
$P(A \mid r_1, r_2) \propto \iint m_1^{r_1} m_2^{r_2} f(m_1) f(m_2)\,dm_1\,dm_2 = \mu^{r_1} \mu^{r_2}$
Effect Size Model
A: Affected
r1, r2: Carrier status of chromosome 1, 2
m1, m2: Relative risk of risk variants on 1, 2
μ: Mean effect size
σ²: Variance of effect size
To calculate the probability of having an affected sib-pair we condition on sharing S.
For S>0, the probability depends on σ². E.g. (S=2):
$P(AA \mid r_1, r_2, S=2) \propto \iint m_1^{2 r_1} m_2^{2 r_2} f(m_1) f(m_2)\,dm_1\,dm_2 = E_f(m^2)^{r_1}\, E_f(m^2)^{r_2} = (\mu^2 + \sigma^2)^{r_1 + r_2}$
Effect in Sib-pairs
AA: Affected relative pair
ri: Carrier status of chromosome i
mi: Relative risk of variant on i
f: Distribution of RR
μ: Mean RR
σ²: Variance of RR
S: Sharing status
Select μ, σ² and cumulative frequency f.
Calculate allele frequency in cases/controls P(R|A).
Calculate allele frequency in shared/non-shared chromosomes.
=> Non-centrality parameter of χ² distribution.
Analytic Power Analysis
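The calculation outlined above can be sketched in Python. This is a simplified illustration, not the analysis behind the plots: it assumes the multiplicative model with carriers weighted by μ in a random case and by μ² + σ² on a chromosome shared by an affected sibpair, takes the control frequency to be f (rare disease), uses the null expectation of one shared and two non-shared chromosomes per sibpair, and approximates each test by a two-proportion chi-square test with 1 df. All function names are hypothetical.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def carrier_freq(f, weight):
    """Risk-chromosome frequency after weighting carriers by `weight`."""
    return f * weight / (f * weight + (1.0 - f))


def ncp_two_groups(p1, n1, p2, n2):
    """Approximate non-centrality parameter for comparing two frequencies."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    return (p1 - p2) ** 2 / (pooled * (1.0 - pooled) * (1.0 / n1 + 1.0 / n2))


def power_1df(ncp, alpha=0.05):
    """Power of a 1-df chi-square test with non-centrality parameter ncp."""
    z = _STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    root = ncp ** 0.5
    return _STD_NORMAL.cdf(root - z) + _STD_NORMAL.cdf(-root - z)


def ncp_case_control(mu, f, n_cases, n_controls):
    """Conventional design: 2 chromosomes per case and per control."""
    return ncp_two_groups(carrier_freq(f, mu), 2 * n_cases, f, 2 * n_controls)


def ncp_internal_control(mu, sigma2, f, n_sibpairs):
    """Internal control: shared versus non-shared chromosomes of sibpairs."""
    p_shared = carrier_freq(f, mu * mu + sigma2)   # weight E[m^2] = mu^2 + sigma^2
    p_nonshared = carrier_freq(f, mu)              # weight E[m] = mu
    return ncp_two_groups(p_shared, n_sibpairs, p_nonshared, 2 * n_sibpairs)
```

For example, `power_1df(ncp_internal_control(2.0, 1.0, 0.01, 1000))` and `power_1df(ncp_case_control(2.0, 0.01, 1000, 1000))` give the approximate power of the two designs for one choice of μ, σ² and f.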
Minor Allele Frequency
[Figure: sMAF versus mean relative risk (1 to 5), panels f=0.2 and f=0.01; curves for Conventional Case-Control, Internal Control, and Selected Cases.]
Power Comparison by Mean Effect Size
[Figure: power versus mean relative risk (1.0 to 4.0), panels f=0.01, f=0.05, and f=0.2; curves for Internal Control, Selected Cases, and Conventional.]
Power Comparison by Variance
[Figure: power versus variance of relative risk (0 to 4), panels f=0.01, f=0.05, and f=0.2; curves for Internal Control, Selected Cases, and Conventional.]
Gene-gene interaction affects power in families.
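For a broad range of interaction models, consider a two-locus model.
A second locus G has alleles g1, g2. The joint effect is
$P(A \mid r_1, r_2, g_1, g_2) \propto \mu_L^{r_1 + r_2}\, \mu_G^{g_1 + g_2}\, \gamma^{(r_1 + r_2)(g_1 + g_2)}$
where γ denotes the interaction coefficient.
We compare the effect of γ while adjusting μ_L and μ_G to maintain marginal risk.
Gene-Gene Interaction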
Power for Antagonistic Interaction
[Figure: power versus interaction coefficient (0.2 to 1.0); curves for IC SRR=2, IC SRR=8, and Conventional.]
Power for Positive Interaction
[Figure: power versus interaction coefficient (1.0 to 2.0); curves for IC SRR=2, IC SRR=8, and Conventional.]
Stratification is a strong confounder for rare variant tests.
Family-based association methods are robust to stratification.
Comparing rare variants between shared and non-shared chromosomes is substantially more powerful than case-control designs.
All family-based methods/samples depend on the model of gene-gene interaction. Under antagonistic interaction, power can be lower than in a population sample.
Conclusions
Questions?
Thank you for your attention