Post on 30-Mar-2015
Sequential Kernel Association Tests for the Combined Effect of Rare and Common Variants
Journal club (Nov/13), SH Lee
Introduction
• Sequence data
  – Rare and unidentified variants
• Groupwise association tests
  – Omnibus tests
  – Burden test, CMC test, SKAT test
• Up-weighting for rare, down-weighting for common
• Rare/common variants tested separately
Introduction
• This study develops a joint test of rare/common variants
  – Combining burden/SKAT tests for rare/common
• Can be applied to
  – Whole exome sequencing + GWAS
  – Deep resequencing of GWAS loci
• In short, it can analyse all variants: rare, low-frequency and common
• Simulation (type I error, power)
• Real data: Crohn's disease (CD) and autism
Materials and Methods
Definition of rare/common (by MAF)
• < 0.01 rare
• 0.01-0.05 low frequency
• > 0.05 common
Or
• < 1/sqrt(2*n) rare
• > 1/sqrt(2*n) common
  – n = 500, rare MAF < 0.031
  – n = 10000, rare MAF < 0.007
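The sample-size-dependent cut-off above can be sketched in a few lines of Python. The function names are illustrative and do not come from the paper or any SKAT software.

```python
import math

def rare_maf_cutoff(n):
    """MAF cut-off 1/sqrt(2n) for a sample of n individuals (2n chromosomes)."""
    return 1.0 / math.sqrt(2 * n)

def classify_variant(maf, n):
    """Label a variant 'rare' or 'common' relative to the 1/sqrt(2n) cut-off."""
    return "rare" if maf < rare_maf_cutoff(n) else "common"

for n in (500, 10000):
    print(n, round(rare_maf_cutoff(n), 4))
```

The cut-off shrinks as the sample grows, so a variant with MAF 0.02 counts as rare with n = 500 but as common with n = 10000.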
Materials and Methods
• Testing for the overall effect of rare and common variants
  – Burden test for rare
  – SKAT test for common
• Weighted-sum statistics
• Fisher's method of combining the p-values
Weighted-sum statistics
• Within a region (e.g. a gene) having m variants
  – g(*) is a linear or logistic link function
  – α: regression coefficients for the covariates
  – X: n x m genotype matrix
  – β: regression coefficients for the variants, treated as random effects
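The model equation itself did not survive on this slide. A hedged reconstruction in the usual SKAT form (Wu et al. 2010) is given here; the covariate vector Z_i, the intercept α_0 and the per-variant weights w_j are notation assumed for illustration, not taken from the slide.

```latex
g\bigl(E[y_i]\bigr) = \alpha_0 + \boldsymbol{\alpha}^{\top} Z_i + \boldsymbol{\beta}^{\top} X_i ,
\qquad
E[\beta_j] = 0, \quad \operatorname{Var}(\beta_j) = w_j^{2}\,\tau , \quad j = 1,\dots,m
```

Under this formulation, "no association in the region" corresponds to the single hypothesis H0: τ = 0, which is what the variance component score test examines.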
Weighted sum score test (Variance component score test)
Take the first derivative of the log-likelihood with respect to the variance component τ
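A minimal sketch of the resulting score statistic, assuming a null model with an intercept only (no covariates) so that the fitted mean is simply the phenotype mean. The function name and the toy data are hypothetical.

```python
def weighted_sum_score_statistic(genotypes, phenotypes, weights):
    """Q = sum_j (w_j * S_j)^2 with S_j = sum_i x_ij * (y_i - mean(y)).

    genotypes:  n rows of m minor-allele counts
    phenotypes: n outcomes (0/1 or quantitative)
    weights:    m per-variant weights
    Assumes an intercept-only null model (an assumption of this sketch).
    """
    n = len(phenotypes)
    mean_y = sum(phenotypes) / n
    residuals = [y - mean_y for y in phenotypes]
    q = 0.0
    for j, w in enumerate(weights):
        score_j = sum(genotypes[i][j] * residuals[i] for i in range(n))
        q += (w * score_j) ** 2
    return q

# Toy data: 4 individuals, 2 variants
G = [[0, 1], [1, 0], [2, 1], [0, 0]]
y = [0, 1, 1, 0]
print(weighted_sum_score_statistic(G, y, [1.0, 1.0]))
```

With covariates, the residuals would come from the fitted null regression instead of the plain mean.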
P-value from a scaled chi-square distribution, κχ²_ν
κ is the scale parameter, ν is the degrees of freedom
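One common way to obtain κ and ν is to match the first two moments of the null distribution. The sketch below assumes the null distribution is a mixture Σ λ_i χ²_1 with known eigenvalues λ_i (a Satterthwaite-type approximation); the paper's exact procedure may differ, and the function name is illustrative.

```python
def scaled_chisq_parameters(eigenvalues):
    """Return (kappa, nu) so that kappa * chi2_nu matches the mean and variance
    of sum_i lambda_i * chi2_1.

    Mean:     sum(lambda)       = kappa * nu
    Variance: 2 * sum(lambda^2) = 2 * kappa^2 * nu
    """
    s1 = sum(eigenvalues)
    s2 = sum(lam * lam for lam in eigenvalues)
    kappa = s2 / s1
    nu = s1 * s1 / s2
    return kappa, nu

kappa, nu = scaled_chisq_parameters([3.0, 1.0])
print(kappa, nu)
```

The p-value is then the upper tail probability of a chi-square with ν degrees of freedom evaluated at Q/κ.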
Weighted sum score test (Variance component score test)
Wu et al. (2010) AJHG 86: 929; Liu et al. (2008) BMC Bioinformatics 9: 292;
Lin (1997) Biometrika 84: 309; White (1982) Econometrica 50: 1
Weighted sum score test (Variance component score test)
• ρ: the correlation between regression coefficients
• If perfectly correlated (ρ = 1), the coefficients are all the same after weighting, and one should collapse the variants before running the regression, i.e., the burden test
• If the regression coefficients are unrelated to each other (ρ = 0), one should use SKAT
Lee et al. (2012) AJHG 91: 224
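The role of ρ can be illustrated with the statistic of Lee et al. (2012), which interpolates between SKAT and the burden test. The sketch below takes already-weighted per-variant scores as input; the function name and example numbers are hypothetical.

```python
def q_rho(weighted_scores, rho):
    """Q_rho = (1 - rho) * Q_SKAT + rho * Q_burden.

    Q_SKAT   = sum of squared weighted scores  (rho = 0)
    Q_burden = square of the summed scores     (rho = 1)
    """
    q_skat = sum(s * s for s in weighted_scores)
    q_burden = sum(weighted_scores) ** 2
    return (1.0 - rho) * q_skat + rho * q_burden

scores = [1.0, -1.0, 2.0]   # effects in opposite directions
print(q_rho(scores, 0.0))   # SKAT end of the range
print(q_rho(scores, 1.0))   # burden end of the range
```

Scores of opposite sign cancel in the burden term but not in the SKAT term, which is why SKAT is preferred when effects differ in direction.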
Burden-C, SKAT-C
• Combined test statistic for rare and common
  – Weighting beta(p,1,25) for rare
  – Weighting beta(p,0.5,0.5) for common
• Partitioning rare and common variants
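A minimal sketch of the partition-then-weight step, assuming p is the MAF, the weights are beta densities evaluated at p, and the partition uses the 1/sqrt(2n) cut-off. The function names are illustrative; the density is computed with the standard library only.

```python
import math

def beta_density(p, a, b):
    """Beta(a, b) density at p, for 0 < p < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(p) + (b - 1) * math.log(1 - p))

def variant_weight(maf, n):
    """Return (group, weight): beta(p,1,25) for rare, beta(p,0.5,0.5) for common."""
    cutoff = 1.0 / math.sqrt(2 * n)
    if maf < cutoff:
        return "rare", beta_density(maf, 1.0, 25.0)
    return "common", beta_density(maf, 0.5, 0.5)

for maf in (0.005, 0.05, 0.3):
    print(maf, variant_weight(maf, n=1000))
```

beta(p,1,25) falls off steeply as p grows, so the rarest variants get the largest weights, while beta(p,0.5,0.5) is comparatively flat across common frequencies.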
Other methods
• Burden-A, SKAT-A
  – Adaptive combining of rare/common
  – Searching φ for the minimum p-value
• Burden-F, SKAT-F
  – Fisher’s combination method
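Fisher's combination can be sketched as follows, assuming the rare-variant and common-variant p-values are independent (an assumption of this sketch; LD between the two sets may violate it). The function name is illustrative. Because the reference distribution is chi-square with an even number of degrees of freedom (2k), its upper tail has a closed form and no statistics library is needed.

```python
import math

def fisher_combine(pvalues):
    """Fisher's method: X = -2 * sum(log p_i) ~ chi-square with 2k df under H0.

    Returns (statistic, combined p-value). Assumes independent p-values.
    """
    k = len(pvalues)
    stat = -2.0 * sum(math.log(p) for p in pvalues)
    half = stat / 2.0
    # Upper tail of chi-square with 2k df: exp(-x/2) * sum_{i<k} (x/2)^i / i!
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half / i
        total += term
    return stat, math.exp(-half) * total

stat, p = fisher_combine([0.05, 0.05])  # e.g. rare-set and common-set p-values
print(stat, p)
```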
Simulation
• Sequence data on 10,000 haplotypes for a 1 Mb region
• Calibrated model for the European population
• Random sample of a region of 5 or 25 kb, and simulated data with 1000-5000 individuals
• Proportion of cases in the sample is 0.5
Disease model
Methods
Type I error
• The type I error of the proposed methods agrees with the expected level
Power (separation cut-off)
• Using the Burden-C test
• Power with different separation cut-offs
• 1/sqrt(2n) is used as the cut-off from here on
Power (proposed methods)
• Power for 8 different tests
• The proposed combination tests outperform the others
Power
• Rare/common causal variants (model 1, 2, 3, 6)
  – The combination methods perform better
Power
• Common causal variants (model 5)
  – The combination methods perform better
• Rare causal variants (model 4)
  – The combination methods perform similarly
Power (proposed methods)
• The proposed combination methods outperform CMC for all 6 disease models
• The proposed combination methods outperform the original SKAT for all 6 disease models
Power
• For models 1-4, which include only risk variants
  – SKAT is better than Burden when the proportion of risk variants is small (10%)
  – Burden is better than SKAT when the proportion of risk variants is large (30%)
Power
• Models 1-3, which include both rare and common risk variants
  – SKAT-F is better than Burden-F regardless of the proportion of risk variants
• Model 5, which includes only common risk variants
  – SKAT is better than Burden regardless of the proportion of risk variants
Power
• Adaptive tests (SKAT-A, Burden-A)
  – Perform worse than SKAT-C and Burden-C
• Results for a region of size 5 kb were similar
Real data
• CD NOD2 sequence data
  – 453 cases, 103 controls
  – 60 single nucleotide variations (9 of them have MAF > 0.05)
  – Because only pooled frequency counts were available for each variant, sequencing data were simulated
• Autism LRP2 sequencing data
  – 430 cases, 379 controls
Real data
• The combination methods are more powerful than the others
Discussion
• The proposed combination methods
  – Partitioning rare/common
  – Powerful approach
  – Better than CMC (rare/common partitioning)
  – Better than the original Burden and SKAT tests
  – Extend to family-based designs
Discussion
• T1D HLA region
  – SKAT (2.7e-43)
  – Wald test (6.7e-49)
  – Likelihood ratio test (8.9e-221)
• LD between regions
• Multiple different components within a region
• Thanks
Linear SKAT vs individual variant test statistics
• Linear SKAT (lower) and the individual variant test (upper) are equivalent
• Three disease models for power comparison