Sebastian Zöllner University of Michigan. Matthew Zawistowski Keng-Han Lin Mark Reppell.


Robust and powerful sibpair test for rare variant association

Sebastian Zöllner, University of Michigan


Acknowledgements

Matthew Zawistowski

Keng-Han Lin

Mark Reppell


Rare Variants – Why Do We Care?

GWAS have been successful.

Only some heritability is explained by common variants.

Uncommon coding variants (MAF 0.5%-5%) explain less.

Rare variants could explain some ‘missing’ heritability.

◦ Better risk prediction.

◦ Rare variants may identify new genes.

◦ Rare exonic variants may be easier to annotate functionally and interpret.


Burden/Dispersion Tests

Testing individual variants is unfeasible.

◦ Limited power due to small number of observations.

◦ Multiple testing correction.

Alternative: Joint test.

◦ Burden test (CMAT, Collapsing, WSS)

◦ Dispersion test (SKAT, C-alpha)
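As a rough illustration of the burden idea, a collapsing test can be sketched as follows: each individual is reduced to a carrier indicator (any rare allele in the gene), and carrier counts are compared between cases and controls with a 1-df chi-squared test. This is a minimal sketch and not the CMAT or WSS statistic; the function name `collapsing_test` and the genotype layout (one list of rare-allele counts per individual) are assumptions made for the example.

```python
import math


def collapsing_test(case_genotypes, control_genotypes):
    """Collapsing burden test on a single gene (illustrative sketch).

    Each argument is a list of individuals; each individual is a list of
    rare-allele counts (0, 1 or 2), one entry per variant in the gene.
    Returns (chi2, p_value) for a 1-df Pearson chi-squared test.
    """
    # Collapse: an individual is a carrier if any rare variant is present.
    case_carriers = sum(1 for g in case_genotypes if any(g))
    control_carriers = sum(1 for g in control_genotypes if any(g))
    n_cases, n_controls = len(case_genotypes), len(control_genotypes)

    table = [
        [case_carriers, n_cases - case_carriers],
        [control_carriers, n_controls - control_carriers],
    ]
    total = n_cases + n_controls
    row_totals = [n_cases, n_controls]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            if expected > 0:
                chi2 += (table[i][j] - expected) ** 2 / expected

    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value
```

Collapsing to a carrier indicator is the simplest aggregate; weighted variants of the same idea change only how the per-individual score is formed.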


Challenges of Rare Variant Analysis

Gene-based tests have low power.

◦ Nelson et al. (2010) estimated that 10,000 cases and 10,000 controls are required for 80% power in half of the genes.

Large sample size required

More heterogeneous sample => Danger of stratification

Stratification of rare variants may differ from that of common variants in magnitude and pattern.


Stratification in European Populations

(202 genes, n=900/900, MAF < 1%, nonsense/nonsynonymous variants)


Variant Abundance across Populations

[Figure: expected number of variants per kb (y-axis) against sample size (x-axis). Populations shown: African-American, Southern Asia, South-Eastern Europe, Finland, South-Western Europe, Northern Europe, Central Europe, Western Europe, Eastern Europe, North-Western Europe.]

A gradient in diversity from Southern to Northern Europe


Allele Sharing

[Figure panels: Median EU-EU: 0.71; Median EU-EU: 0.86; Median EU-EU: 0.98]

• Measure of rare variant diversity.

• Probability of two carriers of the minor alleles being from different populations (normalized).


General Evaluation of Stratification

1. Select 2 populations.

2. Select mixing parameter r.

3. Sample 30 variants from the 202 genes.

4. Calculate inflation based on observed frequency differences.


Inflation by Mixture Proportion

Zawistowski et al. 2014


Inflation across Comparisons


Family-based Test against Stratification

If multiple affected family members are collected, it may be more powerful to sequence all family members.

Family-based tests can be robust against stratification.

TDT-type tests are potentially inefficient.

How to leverage low frequency?

◦ Low frequency risk variants should be more common in cases.

◦ And even more common on chromosomes shared among many cases.


Family Test

• Consider affected sibpairs.

• Estimate IBD sharing.

• Compare the number of rare variants on shared (solid) and non-shared (blank) chromosomes.

Any aggregate test can be applied.

[Figure: sibpair chromosomes for sharing states S=0, S=1, and S=2.]


Basic Properties

Without linkage, a sibpair carries on average twice as many non-shared as shared chromosomes.

Null hypothesis determines test:

◦ Shared alleles : Non-shared alleles = 1 : 2
  Test for linkage or association

◦ Shared alleles : Non-shared alleles = Shared chromosomes : Non-shared chromosomes
  Test for association only
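The two null hypotheses can be illustrated with a goodness-of-fit test on the split of rare alleles between shared and non-shared chromosomes. The sketch below is an assumption about how such a test could be set up, not the authors' implementation; `sharing_test` and its arguments are hypothetical names. Without chromosome counts it uses the 1 : 2 null; with chromosome counts it uses the observed chromosome ratio as the null.

```python
import math


def sharing_test(shared_alleles, nonshared_alleles,
                 shared_chrom=None, nonshared_chrom=None):
    """1-df chi-squared goodness-of-fit test on rare-allele counts.

    Without chromosome counts the null ratio shared : non-shared is 1 : 2
    (sensitive to linkage or association). With chromosome counts the null
    ratio is the observed shared : non-shared chromosome ratio
    (sensitive to association only).
    """
    if shared_chrom is None or nonshared_chrom is None:
        p_shared = 1.0 / 3.0
    else:
        p_shared = shared_chrom / (shared_chrom + nonshared_chrom)

    total = shared_alleles + nonshared_alleles
    exp_shared = total * p_shared
    exp_nonshared = total * (1.0 - p_shared)
    chi2 = ((shared_alleles - exp_shared) ** 2 / exp_shared
            + (nonshared_alleles - exp_nonshared) ** 2 / exp_nonshared)
    # Upper tail of a chi-squared distribution with 1 degree of freedom.
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))
```

For example, 40 shared and 40 non-shared rare alleles deviate from 1 : 2 (chi-squared of 10), but if the sibpairs also carry equal numbers of shared and non-shared chromosomes at the locus, the second null is met exactly: the excess reflects linkage rather than association.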


Haplotypes not required

IBD sharing is known.

Phase is not needed to identify shared variants.

Except in one configuration: IBD 1 and both sibs are heterozygous.

◦ Configuration 1: +1 shared

◦ Configuration 2: +2 non-shared

Under the null, the probability of configuration 2 is the allele frequency. Under the alternative, we need to use multiple imputation.
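The statement about the null can be checked directly. A sketch, assuming the three distinct chromosomes of an IBD 1 sibpair (one shared, two non-shared) carry the minor allele independently with population frequency p; the labels C_1 and C_2 for the two configurations are introduced here for the derivation:

```latex
\begin{align*}
P(C_1) &= p\,(1-p)^2 && \text{(only the shared chromosome carries the allele)}\\
P(C_2) &= (1-p)\,p^2 && \text{(both non-shared chromosomes carry it)}\\
P(C_2 \mid \text{both heterozygous}) &= \frac{(1-p)\,p^2}{p\,(1-p)^2 + (1-p)\,p^2}
  = \frac{p}{(1-p) + p} = p
\end{align*}
```

For a rare variant p is small, so nearly all double heterozygotes in IBD 1 pairs carry the allele on the shared chromosome.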


Evaluation of Internal Control

Assume chromosome sharing status is known for each sibpair.

Count rare variants; impute sharing status for double-heterozygotes.

Compare number of rare variants between shared and non-shared chromosomes with chi-squared test (Burden Style).

[Figure: sibpair chromosomes for sharing states S=0, S=1, and S=2.]
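The counting step can be sketched for a single variant in a single sibpair. This is an illustrative assumption about the bookkeeping, not the authors' code: `count_rare_alleles` is a hypothetical name, genotypes are rare-allele counts, and double heterozygotes in IBD 1 pairs are resolved by a single random draw under the null (a full analysis would repeat the draw, as in multiple imputation).

```python
import random


def count_rare_alleles(g1, g2, ibd, maf, rng=random):
    """Split the rare alleles of one sibpair at one variant into
    (shared, non_shared) counts.

    g1, g2: rare-allele counts (0, 1, 2) of the two sibs.
    ibd:    number of chromosomes shared IBD (0, 1 or 2), assumed known.
    maf:    allele frequency, used only to impute double heterozygotes.
    """
    if ibd == 0:
        # Four distinct chromosomes, none shared.
        return 0, g1 + g2
    if ibd == 2:
        # Two distinct chromosomes, both shared.
        if g1 != g2:
            raise ValueError("genotypes must match when IBD = 2")
        return g1, 0
    if ibd != 1:
        raise ValueError("ibd must be 0, 1 or 2")

    # IBD = 1: one shared and two non-shared chromosomes.
    if g1 == 1 and g2 == 1:
        # Ambiguous: under the null the allele sits on both non-shared
        # chromosomes with probability maf, otherwise on the shared one.
        if rng.random() < maf:
            return 0, 2
        return 1, 0
    if {g1, g2} == {0, 2}:
        raise ValueError("genotypes 0 and 2 are incompatible with IBD = 1")
    # A homozygous sib implies the shared chromosome carries the allele;
    # a sib with no copies implies it does not.
    shared = 1 if 2 in (g1, g2) else 0
    return shared, (g1 - shared) + (g2 - shared)
```

Summing these pairs over variants and sibpairs gives the two allele counts that enter the chi-squared comparison.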


Enriching Based on Familial Risk

[Figure panels: Classic Case-Control; Selected Cases; Internal Control (sibpair sharing states S=0, S=1, S=2)]


Stratification

Consider 2 populations.

p=0.01 in pop1, p=0.05 in pop2.

1000 sibpairs for internal control design.

1000 cases, 1000 controls for selected cases.

1000 cases and 1000 controls for case-control.

Sample cases from pop1 with a varying proportion.

Test for association with α=0.05.
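To see why the conventional design is vulnerable in this setup, the rejection rate of an allelic case-control test can be approximated analytically. The sketch below is illustrative only: the slide does not say how controls are sampled, so drawing controls equally from both populations (`control_prop_pop1=0.5`) is an assumption, as are the function name and the normal approximation to the test statistic.

```python
from statistics import NormalDist


def conventional_type1_error(prop_pop1, p1=0.01, p2=0.05,
                             n_cases=1000, n_controls=1000,
                             control_prop_pop1=0.5, alpha=0.05):
    """Approximate rejection rate of an allelic case-control test when
    the variant has no effect and only ancestry differs between groups."""
    norm = NormalDist()
    # Expected allele frequencies in each group are population mixtures.
    p_case = prop_pop1 * p1 + (1 - prop_pop1) * p2
    p_ctrl = control_prop_pop1 * p1 + (1 - control_prop_pop1) * p2

    chrom_cases, chrom_ctrls = 2 * n_cases, 2 * n_controls
    p_bar = ((chrom_cases * p_case + chrom_ctrls * p_ctrl)
             / (chrom_cases + chrom_ctrls))
    se = (p_bar * (1 - p_bar) * (1 / chrom_cases + 1 / chrom_ctrls)) ** 0.5

    # Mean shift of the z statistic caused purely by stratification.
    shift = (p_case - p_ctrl) / se
    z_crit = norm.inv_cdf(1 - alpha / 2)
    # Two-sided rejection probability for a unit-variance normal statistic.
    return norm.cdf(shift - z_crit) + norm.cdf(-shift - z_crit)
```

Under these assumptions the rate equals α only when cases and controls have the same ancestry mix (`prop_pop1 = 0.5`) and rises sharply as the case sample becomes dominated by one population.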


Robust to Population Stratification

[Figure: type I error rate (y-axis, 0.0 to 0.8) against proportion (x-axis, 0.0 to 1.0) for the Internal Control, Selected Cases, and Conventional designs.]


Evaluating Study Designs

Realistic rare variant models are unknown

◦ Typical allele frequency

◦ Number of risk variants/gene

◦ Typical effect size

◦ Distribution of effect sizes

◦ Identifiability of risk variants

Goal: Create a model that summarizes these unknowns into

◦ Summed allele frequency

◦ Mean effect size

◦ Variance of effect size


Basic Genetic Model

Assume many loci carrying risk variants.

Risk alleles at multiple loci each increase the risk by a factor independently.

Frequency of risk variant:

◦ Independent cases: $P(R \mid A) \propto P(A \mid R)\,P(R)$

◦ On shared chromosome: $P(R \mid AA) \propto P(AA \mid R)\,P(R)$

A: Affected

AA: Affected relative pair

R: Risk locus genotype


Effect Size Model

Relative risk is sampled from distribution f with mean μ, variance σ².

Simplifications:

◦ Each risk variant occurs only once in the population.

◦ Each risk variant on its own haplotype.

Then the risk in a random case is

$$P(A \mid r_1, r_2) \propto \int\!\!\int m_1^{r_1}\, m_2^{r_2}\, f(m_1)\, f(m_2)\, dm_1\, dm_2 = \mu^{r_1}\, \mu^{r_2}$$

A: Affected

r1, r2: Carrier status of chromosome 1, 2

m1, m2: Relative risk of risk variants on 1, 2

μ: Mean effect size

σ²: Variance of effect size


Effect in Sib-pairs

To calculate the probability of having an affected sib-pair we condition on sharing S.

For S>0, the probability depends on σ². E.g. (S=2):

$$P(AA \mid r_1, r_2, S=2) \propto \int\!\!\int m_1^{2 r_1}\, m_2^{2 r_2}\, f(m_1)\, f(m_2)\, dm_1\, dm_2 = E_f(m^2)^{r_1}\, E_f(m^2)^{r_2} = (\mu^2+\sigma^2)^{r_1}\, (\mu^2+\sigma^2)^{r_2}$$

AA: Affected relative pair

ri: Carrier status of chromosome i

mi: Relative risk of variant on i

f: Distribution of RR

μ: Mean RR

σ²: Variance of RR

S: Sharing status
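For comparison with the S=2 example, the same reasoning gives the other sharing states. This is a sketch derived from the stated assumptions (multiplicative risks, independent chromosomes), not a formula taken from the slides; the indices s (shared chromosome) and n_1, n_2, ... (non-shared chromosomes) are notation introduced here:

```latex
\begin{align*}
P(AA \mid r_s, r_{n_1}, r_{n_2}, S=1)
  &\propto E_f(m^2)^{r_s}\, \mu^{r_{n_1}}\, \mu^{r_{n_2}}
   = (\mu^2+\sigma^2)^{r_s}\, \mu^{r_{n_1}+r_{n_2}} \\
P(AA \mid r_{n_1}, \dots, r_{n_4}, S=0)
  &\propto \mu^{r_{n_1}+r_{n_2}+r_{n_3}+r_{n_4}}
\end{align*}
```

A variant on a shared chromosome raises the risk of both sibs and therefore enters squared, which is where the variance appears; a variant on a non-shared chromosome affects one sib only and contributes just the mean.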


Analytic Power Analysis

Select μ, σ² and cumulative frequency f.

Calculate allele frequency in cases/controls P(R|A).

Calculate allele frequency in shared/non-shared chromosomes.

=> Non-centrality parameter of χ² distribution.
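The frequency calculations in these steps can be sketched numerically. The code below is an illustration under the model's simplifications (at most one risk variant per chromosome, multiplicative risks, independent chromosomes), not the authors' implementation; `carrier_frequencies` and `cum_freq` are hypothetical names, and the final step to a non-centrality parameter is left out because it depends on the chosen test statistic.

```python
def carrier_frequencies(mu, sigma2, cum_freq):
    """Per-chromosome risk-variant frequency under the multiplicative model.

    mu, sigma2: mean and variance of the relative-risk distribution.
    cum_freq:   cumulative (summed) risk-allele frequency in the population.
    Returns the frequency in the population, on a chromosome of a random
    case, and on a chromosome shared by an affected sibpair.
    """
    def enrich(weight):
        # Bayes' rule with carrier weight `weight` and non-carrier weight 1.
        return cum_freq * weight / (cum_freq * weight + (1.0 - cum_freq))

    return {
        "population": cum_freq,
        "case": enrich(mu),                  # E_f(m) = mu
        "shared": enrich(mu ** 2 + sigma2),  # E_f(m^2) = mu^2 + sigma^2
    }
```

With `mu=2`, `sigma2=1` and `cum_freq=0.01`, this gives about 0.0198 on case chromosomes and about 0.0481 on shared chromosomes, which shows how sharing amplifies the enrichment.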


Minor Allele Frequency

[Figure: sMAF (y-axis, 0.0 to 0.6) against mean relative risk (x-axis, 1 to 5), with curves labelled f=0.2 and f=0.01, for the Conventional Case-Control, Internal Control, and Selected Cases designs.]


Power Comparison by Mean Effect Size

[Figure: power (y-axis, 0.0 to 0.8) against mean relative risk (x-axis, 1.0 to 4.0) in three panels, f=0.01, f=0.05, and f=0.2, for the Internal Control, Selected Cases, and Conventional designs.]


Power Comparison by Variance

[Figure: power (y-axis, 0.0 to 0.8) against variance of relative risk (x-axis, 0 to 4) in three panels, f=0.01, f=0.05, and f=0.2, for the Internal Control, Selected Cases, and Conventional designs.]


Gene-Gene Interaction

Gene-gene interaction affects power in families.

For a broad range of interaction models, consider a two-locus model.

G now has alleles g1, g2. The joint effect, writing γ for the interaction coefficient, is

$$P(A \mid r_1, r_2, g_1, g_2) \propto \mu_L^{\,r_1+r_2}\; \mu_G^{\,g_1+g_2}\; \gamma^{(r_1+r_2)(g_1+g_2)}$$

We compare the effect of γ while adjusting μ_L and μ_G to maintain marginal risk.


Power for Antagonistic Interaction

[Figure: power (y-axis, 0.0 to 0.8) against interaction coefficient (x-axis, 0.2 to 1.0) for IC SRR=2, IC SRR=8, and Conventional.]


Power for Positive Interaction

[Figure: power (y-axis, 0.0 to 0.8) against interaction coefficient (x-axis, 1.0 to 2.0) for IC SRR=2, IC SRR=8, and Conventional.]


Conclusions

Stratification is a strong confounder for rare variant tests.

Family-based association methods are robust to stratification.

Comparing rare variants between shared and non-shared chromosomes is substantially more powerful than case-control designs.

All family-based methods/samples depend on the model of gene-gene interaction. Under antagonistic interaction, power can be lower than in a population sample.


Questions? Thank you for your attention.