GEE & GLMM in GWAS
Association Study: Binomial Case
GEE & GLMM
Jinseob Kim
GSPH, SNU
July 2, 2014
Jinseob Kim (GSPH, SNU) Association Study: Binomial Case July 2, 2014 1 / 45
Contents
1 Correlated = Not Independent: Concept, Example
2 GEE & GLMM Basic: Basic Linear Regression, GEE, GLMM, Comparison
3 GEE & GLMM in GWAS: Concepts of GWAS, Genetic Correlation, Use GEE & GLMM
4 Conclusion
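Objective
1 Understand correlated data structures.
2 Understand the concepts of GEE and GLMM, what they have in common, and how they differ.
3 Understand how GEE and GLMM are applied in GWAS in practice.
4 Be aware that GEE and GLMM cannot be used in the binomial case.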
Correlated = Not Independent Concept
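iid?
$\varepsilon_i \overset{iid}{\sim} N(0, \sigma^2)$, or equivalently $\varepsilon \sim N(0, \sigma^2 I_n)$
Independent
Identically distributed
$\varepsilon_i \sim N(0, \sigma_i^2)$
Independent
Not identically distributed
The observations do not come from the same population!
Covered next time.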
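Variance-covariance matrix
\[
\mathrm{var}(\varepsilon) =
\begin{pmatrix}
\sigma^2 & 0 & 0 & \cdots & 0 \\
0 & \sigma^2 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & \sigma^2
\end{pmatrix}
= \sigma^2
\begin{pmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{pmatrix}
= \sigma^2 I_n
\]
That is, if even one covariance (off-diagonal entry) is nonzero, the data are correlated!
Equivalently, if even one correlation coefficient between observations is nonzero, the data are correlated!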
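The rule above can be checked mechanically. The following is a minimal sketch, with made-up 3 x 3 matrices and a hypothetical helper name `is_correlated`, that flags a covariance matrix as correlated as soon as one off-diagonal entry is nonzero.

```python
def is_correlated(cov, tol=1e-12):
    """Return True if any off-diagonal covariance is nonzero (beyond tol)."""
    n = len(cov)
    return any(abs(cov[i][j]) > tol
               for i in range(n) for j in range(n) if i != j)

sigma2 = 2.0  # illustrative common variance
# var(eps) = sigma^2 * I_n: independent, identically distributed errors
independent = [[sigma2 if i == j else 0.0 for j in range(3)] for i in range(3)]
# One nonzero covariance between observations 1 and 2 is enough
clustered = [[2.0, 0.8, 0.0],
             [0.8, 2.0, 0.0],
             [0.0, 0.0, 2.0]]

print(is_correlated(independent))
print(is_correlated(clustered))
```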
Correlated = Not Independent Example
Repeated Measure
Clustered/Multilevel study
Serial Correlation
Familial structure in Genetic Study
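Genetic correlation
\[
\begin{pmatrix}
1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1n} \\
\rho_{21} & 1 & \rho_{23} & \cdots & \rho_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho_{n1} & \rho_{n2} & \rho_{n3} & \cdots & 1
\end{pmatrix}
\qquad
\begin{pmatrix}
1 & 0.5 & 0.25 & \cdots & 0 \\
0.5 & 1 & 1 & \cdots & 0.5 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0.5 & 0 & \cdots & 1
\end{pmatrix}
\]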
GEE & GLMM Basic Basic Linear Regression
Review
β estimation in linear regression
1 Ordinary Least Squares (OLS): semi-parametric
2 Maximum Likelihood Estimator (MLE): parametric
Least Squares
Minimize the sum of squared residuals: no normality assumption on y is needed.
Figure. OLS Fitting
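As a small illustration of least squares, the sketch below fits a straight line with the closed-form formulas and confirms that nudging the fitted line increases the sum of squared residuals. The data points and the function names `ols_fit` and `rss` are invented for the example.

```python
def ols_fit(x, y):
    """Closed-form simple linear regression: returns (intercept, slope)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rss(intercept, slope, x, y):
    """Residual sum of squares for a candidate line."""
    return sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))

# Illustrative data only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = ols_fit(x, y)
print(intercept, slope)
# Any other line has a larger sum of squares than the OLS line
print(rss(intercept, slope, x, y) < rss(intercept + 0.1, slope, x, y))
```

No distribution for y is used anywhere in the computation, which is the point made on the slide.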
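Likelihood?
Likelihood vs. probability
Discrete: likelihood = probability. The probability of rolling a 1 with a die is 1/6.
Continuous: likelihood != probability. When one number is drawn from 0 to 1, the probability that it is exactly 0.7 is 0.
Figure. Likelihood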
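The two cases can be put side by side in a few lines. This is a sketch assuming a fair die and a Uniform(0, 1) draw; the helper names are hypothetical.

```python
from fractions import Fraction

# Discrete case: the likelihood of "1" equals its probability for a fair die
p_one = Fraction(1, 6)

def uniform_density(x, a=0.0, b=1.0):
    """Density (likelihood) of Uniform(a, b) at the point x."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def uniform_prob_interval(lo, hi, a=0.0, b=1.0):
    """Probability that a Uniform(a, b) draw falls in [lo, hi]."""
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) / (b - a)

print(p_one)                              # probability of rolling a 1
print(uniform_density(0.7))               # density at 0.7 is positive
print(uniform_prob_interval(0.7, 0.7))    # probability of exactly 0.7
```

The density at 0.7 is positive while the probability of hitting exactly 0.7 is zero, so for continuous data the likelihood is a density value, not a probability.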
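Maximum likelihood estimator (MLE)
Suppose $\varepsilon_1, \cdots, \varepsilon_n$ are mutually independent.
1 Write down the likelihood function of each observation.
2 Multiply all the likelihoods to get the likelihood of the whole data (valid because of independence).
3 Find the β that maximizes the likelihood.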
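The three steps can be sketched on a simpler problem than regression: estimating a single success probability p from independent binary outcomes. The data and the grid search below are illustrative assumptions, not the method used on the slides; the product of likelihoods is computed as a sum of logs.

```python
import math

# Hypothetical independent binary outcomes (7 successes out of 10)
data = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]

def log_likelihood(p, outcomes):
    """Steps 1-2: per-observation likelihoods, multiplied (summed on the log scale)."""
    return sum(math.log(p) if y == 1 else math.log(1.0 - p) for y in outcomes)

# Step 3: pick the parameter value that maximizes the likelihood
grid = [i / 100 for i in range(1, 100)]
p_hat = max(grid, key=lambda p: log_likelihood(p, data))
print(p_hat)
```

The maximizer coincides with the sample proportion, as expected for a Bernoulli model.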
MLE: maximum likelihood estimator
Maximize the likelihood of the observed data: requires a distributional assumption on y or ε.
Logistic function: MLE
Figure. Fitting Logistic Function
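LRT? Wald? Score?
Likelihood ratio test vs. Wald test vs. score test
1 Methods for judging statistical significance.
2 Compare likelihoods vs. compare the β estimate vs. compare the slope (gradient of the log-likelihood).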
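To make the contrast concrete, the sketch below computes all three statistics for the simplest binomial problem, testing H0: p = p0 from k successes in n trials. The counts are invented and `binomial_tests` is a hypothetical helper; each statistic is compared with a chi-square distribution with 1 degree of freedom.

```python
import math

def binomial_tests(k, n, p0):
    """LRT, Wald and score statistics for H0: p = p0 (requires 0 < k < n)."""
    p_hat = k / n
    # Wald: distance of the estimate from p0, variance evaluated at the estimate
    wald = (p_hat - p0) ** 2 / (p_hat * (1.0 - p_hat) / n)
    # Score: same distance, variance evaluated under H0
    score = (p_hat - p0) ** 2 / (p0 * (1.0 - p0) / n)
    # LRT: twice the log-likelihood difference between p_hat and p0
    lrt = 2.0 * (k * math.log(p_hat / p0)
                 + (n - k) * math.log((1.0 - p_hat) / (1.0 - p0)))
    return {"lrt": lrt, "wald": wald, "score": score}

stats = binomial_tests(k=60, n=100, p0=0.5)
for name, value in stats.items():
    print(name, round(value, 3))
```

The three values are close but not identical; they agree asymptotically.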
Comparison
Figure. Comparison
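AIC
Let L be the likelihood of the fitted model.
1 $\mathrm{AIC} = -2 \times \log(L) + 2 \times k$
2 k: number of explanatory variables (sex, age, salary, ...)
3 The smaller, the better the model!
We prefer a model with a large likelihood, but too many explanatory variables incur a penalty!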
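A minimal sketch of the trade-off, using made-up log-likelihood values: the larger model fits slightly better, but its extra parameters cost more than the gain.

```python
def aic(log_likelihood, k):
    """AIC = -2 * log(L) + 2 * k, where k is the number of parameters."""
    return -2.0 * log_likelihood + 2.0 * k

# Hypothetical fits: three extra variables improve log(L) by only 0.5
small_model = aic(log_likelihood=-120.0, k=3)
large_model = aic(log_likelihood=-119.5, k=6)

print(small_model, large_model)
print("prefer small model:", small_model < large_model)
```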
GEE & GLMM Basic GEE
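OLS, GLS, GEE
\[ Y = X\beta + \varepsilon \tag{1} \]
$\mathrm{var}(\varepsilon) = \sigma^2 I_n$: independent, so plain OLS applies.
$\mathrm{var}(\varepsilon) = \sigma^2 \Phi$: what if the errors are not independent?
\[ GY = GX\beta + G\varepsilon \tag{2} \]
Multiply by a suitable matrix G, chosen so that
$\mathrm{var}(G\varepsilon) = \sigma^2 G \Phi G^{\top} = \sigma^2 I_n$
Apply OLS to the transformed model (2), then transform back with the inverse of G: Generalized Least Squares (GLS).
The binomial and Poisson version of GLS is the Generalized Estimating Equation (GEE).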
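The step from (2) to the usual GLS formula can be written out as follows. This is a sketch assuming $\Phi$ is known and positive definite, so that a matrix $G$ with $G \Phi G^{\top} = I_n$ exists (for example $G = \Phi^{-1/2}$).

```latex
\begin{align*}
G \Phi G^{\top} = I_n
  &\;\Longrightarrow\; G^{\top} G = \Phi^{-1}, \\
\hat{\beta}_{GLS}
  &= \{(GX)^{\top}(GX)\}^{-1} (GX)^{\top} (GY)
     && \text{(OLS on the transformed model (2))} \\
  &= (X^{\top} G^{\top} G X)^{-1} X^{\top} G^{\top} G Y \\
  &= (X^{\top} \Phi^{-1} X)^{-1} X^{\top} \Phi^{-1} Y .
\end{align*}
```

With $\Phi = I_n$ this reduces to the OLS estimator $(X^{\top}X)^{-1}X^{\top}Y$.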
Ex: Repeated Measure
Cluster = individual, correlation option = exchangeable
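Serial or Unstructured
\[
\begin{pmatrix}
1 & \rho & \rho^2 & \cdots & \rho^{n-1} \\
\rho & 1 & \rho & \cdots & \rho^{n-2} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho^{n-1} & \rho^{n-2} & \rho^{n-3} & \cdots & 1
\end{pmatrix}
\qquad
\begin{pmatrix}
1 & \rho_{12} & \rho_{13} & \cdots & \rho_{1n} \\
\rho_{21} & 1 & \rho_{23} & \cdots & \rho_{2n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
\rho_{n1} & \rho_{n2} & \rho_{n3} & \cdots & 1
\end{pmatrix}
\]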
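The structured working correlations are easy to generate from a single parameter, which is what makes them cheaper than the unstructured form with its n(n-1)/2 free correlations. The sketch below builds the exchangeable and serial (AR(1)) matrices; the function names and the value of rho are illustrative.

```python
def exchangeable(n, rho):
    """Every pair within a cluster shares the same correlation rho."""
    return [[1.0 if i == j else rho for j in range(n)] for i in range(n)]

def ar1(n, rho):
    """Serial correlation: rho ** |i - j| decays with the time gap."""
    return [[rho ** abs(i - j) for j in range(n)] for i in range(n)]

for row in exchangeable(4, 0.25):
    print(row)
for row in ar1(4, 0.5):
    print(row)
```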
GEE & GLMM Basic GLMM
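Fixed effect vs. random effect
Fixed effect
Estimate β.
Test: β = 0?
Random effect
Give up estimating β (e.g., 50 hospitals, 3461 people).
Assume uncertainty in β: it cannot be known exactly. (The effect of each hospital cannot be known; each individual's polygenic effect cannot be known exactly.)
Test: Var(β) = 0? (How much do the hospital effects differ?)
49 variables → 1 parameter.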
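Linear Mixed Model
\[ Y = X\beta + Z\gamma + \varepsilon \tag{3} \]
Z: dummy variables for cluster.
$\mathrm{var}(\varepsilon) = \sigma_e^2 I_n$: independence assumed!
$\mathrm{var}(\beta) = 0$, $\mathrm{var}(\gamma) = \sigma_u^2 A$
\[ \sigma^2 = \sigma_u^2 + \sigma_e^2 \tag{4} \]
The binomial version of this is the GLMM.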
GEE & GLMM Basic Comparison
Comparison
In common
1 Both are used when the independence assumption is violated.
Differences
1 GEE: semi-parametric, GLMM: parametric
2 Inference: population vs. individual
Inference
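Difference in philosophy
GEE: it is enough to adjust for clustering. The cluster effects themselves are of no interest.
GLMM: give up estimating a β for each cluster. But we still want to know how much the clusters matter: this is summarized in a single number ($\sigma_u^2$), and the cluster-level effects can be recovered approximately (BLUP).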
GEE example: Continuous
running glm to get initial regression estimate
(Intercept) age sex BMI
-64.2956645 0.1811694 -42.3958662 8.5256257
gee(formula = TG ~ age + sex + BMI, id = FID, data = a, corstr = "exchangeable")
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) -67.2665582 35.8624272 -1.8756834 35.9094269 -1.8732284
age 0.1751885 0.3340099 0.5245007 0.3996143 0.4383938
sex -42.2905294 11.3716707 -3.7189372 8.3038131 -5.0929048
BMI 8.6744524 1.2930220 6.7086657 1.4041520 6.1777161
Working Correlation
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.2582559 0.2582559 0.2582559
[2,] 0.2582559 1.0000000 0.2582559 0.2582559
[3,] 0.2582559 0.2582559 1.0000000 0.2582559
[4,] 0.2582559 0.2582559 0.2582559 1.0000000
GLMM example: Continuous
lmer(formula = TG ~ age + sex + BMI + (1 | FID), data = a)
Estimate Std. Error t value
(Intercept) -65.222107 35.8720093 -1.8181894
age 0.109564 0.3318413 0.3301699
sex -41.942137 11.3684264 -3.6893529
BMI 8.648601 1.2917159 6.6954362
Groups Name Std.Dev.
FID (Intercept) 39.356
Residual 72.007
Intraclass correlation: 39.356^2/(39.356^2+72.007^2) = 0.23
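The ratio above is the share of the total variance attributable to family, $\sigma_u^2 / (\sigma_u^2 + \sigma_e^2)$. A minimal sketch reproducing it from the two standard deviations in the lmer output (the function name `icc` is chosen for this example):

```python
def icc(sd_cluster, sd_residual):
    """Intraclass correlation from random-intercept and residual SDs."""
    var_u = sd_cluster ** 2
    var_e = sd_residual ** 2
    return var_u / (var_u + var_e)

value = icc(39.356, 72.007)
print(round(value, 2))
```

The result is of the same order as the exchangeable working correlation (about 0.26) estimated by GEE for the same continuous trait.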
GEE example: Binomial
running glm to get initial regression estimate
(Intercept) age sex BMI
-5.457458529 0.009749659 -1.385819506 0.157734298
gee(formula = hyperTG ~ age + sex + BMI, id = FID, data = a,
family = binomial, corstr = "exchangeable")
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) -5.453486897 1.10811194 -4.9214224 1.14198243 -4.7754561
age 0.008754136 0.00997040 0.8780125 0.01087413 0.8050421
sex -1.337114934 0.53428456 -2.5026270 0.52621253 -2.5410169
BMI 0.158988089 0.03867076 4.1113256 0.04248749 3.7419975
Working Correlation
[,1] [,2] [,3] [,4]
[1,] 1.0000000 0.1942491 0.1942491 0.1942491
[2,] 0.1942491 1.0000000 0.1942491 0.1942491
[3,] 0.1942491 0.1942491 1.0000000 0.1942491
[4,] 0.1942491 0.1942491 0.1942491 1.0000000
GLMM example: Binomial
glmer(formula = hyperTG ~ age + sex + BMI + (1 | FID), data = a,
family = binomial)
Estimate Std. Error z value Pr(>|z|)
(Intercept) -6.65451749 1.48227814 -4.4893852 7.142904e-06
age 0.01052907 0.01206682 0.8725635 3.829010e-01
sex -1.48506920 0.60773433 -2.4436158 1.454090e-02
BMI 0.19131619 0.05022612 3.8090977 1.394749e-04
Groups Name Std.Dev.
FID (Intercept) 1.1163
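The binomial GLMM coefficients are larger in magnitude than the binomial GEE ones, which reflects the population vs. individual distinction. As a rough check, and not something stated on the slides, a well-known approximation for the random-intercept logistic model shrinks a cluster-specific coefficient by $\sqrt{1 + c^2 \sigma_u^2}$ with $c = 16\sqrt{3}/(15\pi)$ to obtain a population-averaged one. The sketch below applies it to the BMI coefficient; the function name is hypothetical, and agreement with the GEE fit is only expected to be approximate.

```python
import math

def marginal_from_conditional(beta_conditional, sd_random_intercept):
    """Approximate population-averaged logit coefficient from a
    cluster-specific one (random-intercept logistic model)."""
    c = 16.0 * math.sqrt(3.0) / (15.0 * math.pi)
    return beta_conditional / math.sqrt(1.0 + (c * sd_random_intercept) ** 2)

# Values copied from the glmer output above
bmi_glmm = 0.19131619
sd_fid = 1.1163

bmi_marginal = marginal_from_conditional(bmi_glmm, sd_fid)
print(round(bmi_marginal, 3))   # compare with the GEE estimate 0.158988089
```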
GEE & GLMM in GWAS Concepts of GWAS
Issues
Concepts
Sample < SNP (3461 vs. 500,000)
Regression is repeated more than 500,000 times!
Strict p-value threshold ($\le 5 \times 10^{-8}$)
Issues
Computational burden: speed!
Complex correlation structure
Approximation techniques
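The strict threshold is commonly motivated as a Bonferroni correction of a 0.05 family-wise error rate over roughly one million independent tests. A minimal sketch, where the one-million figure is that conventional assumption and not a number from the slides:

```python
def bonferroni_threshold(alpha, n_tests):
    """Per-test significance level that keeps the family-wise error at alpha."""
    return alpha / n_tests

genome_wide = bonferroni_threshold(0.05, 1_000_000)   # conventional GWAS level
this_study = bonferroni_threshold(0.05, 500_000)      # 500,000 SNPs, as on the slide

print(genome_wide, this_study)
```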
GEE & GLMM in GWAS Genetic Correlation
GCM
Genetic Correlation Matrix
Correlation structure: already known (family structure vs. data).
It is complex and follows no regular pattern.
Computation...
Genetic Correlation Matrix: Example
R1E1I00051 R1E1I00241 R1E1I00251 R1E1I00040 R1E1I00230 R1R1I00251
R1E1I00051 1.00 0.5 0.0 0.25 0.25 0.5
R1E1I00241 0.50 1.0 0.0 0.50 0.50 0.0
R1E1I00251 0.00 0.0 1.0 0.50 0.50 0.0
R1E1I00040 0.25 0.5 0.5 1.00 0.50 0.0
R1E1I00230 0.25 0.5 0.5 0.50 1.00 0.0
R1R1I00251 0.50 0.0 0.0 0.00 0.00 1.0
R1E1I00060 0.00 0.0 0.0 0.00 0.00 0.0
R1E1I00070 0.00 0.0 0.0 0.00 0.00 0.0
R1E1I00081 0.00 0.0 0.0 0.00 0.00 0.0
R1E1I00091 0.00 0.0 0.0 0.00 0.00 0.0
R1E1I00060 R1E1I00070 R1E1I00081 R1E1I00091
R1E1I00051 0.0 0.0 0.0 0.0
R1E1I00241 0.0 0.0 0.0 0.0
R1E1I00251 0.0 0.0 0.0 0.0
R1E1I00040 0.0 0.0 0.0 0.0
R1E1I00230 0.0 0.0 0.0 0.0
R1R1I00251 0.0 0.0 0.0 0.0
R1E1I00060 1.0 0.5 0.5 0.5
R1E1I00070 0.5 1.0 0.5 0.5
R1E1I00081 0.5 0.5 1.0 0.5
R1E1I00091 0.5 0.5 0.5 1.0
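Entries such as 1, 0.5, 0.25 and 0 are what a pedigree-based relationship matrix (twice the kinship coefficient) would give for self, first-degree, second-degree and unrelated pairs. The sketch below computes such values recursively for a small invented pedigree; the IDs, the `PEDIGREE` table and the function names are hypothetical and unrelated to the study data.

```python
from functools import lru_cache

# Hypothetical pedigree: id -> (father, mother); None marks an unknown parent.
# Individuals must be listed so that parents come before their children.
PEDIGREE = {
    "GF": (None, None),   # grandfather (founder)
    "GM": (None, None),   # grandmother (founder)
    "P1": ("GF", "GM"),   # parent
    "SP": (None, None),   # spouse of P1 (founder)
    "C1": ("P1", "SP"),   # child 1
    "C2": ("P1", "SP"),   # child 2
}
ORDER = {name: idx for idx, name in enumerate(PEDIGREE)}

@lru_cache(maxsize=None)
def kinship(i, j):
    """Kinship coefficient via the standard recursion on the younger individual."""
    if i is None or j is None:
        return 0.0
    if i == j:
        father, mother = PEDIGREE[i]
        return 0.5 * (1.0 + kinship(father, mother))
    if ORDER[i] < ORDER[j]:
        i, j = j, i          # always expand the individual listed later
    father, mother = PEDIGREE[i]
    return 0.5 * (kinship(father, j) + kinship(mother, j))

def relationship(i, j):
    """Expected genetic correlation: twice the kinship coefficient."""
    return 2.0 * kinship(i, j)

for a, b in [("C1", "C1"), ("C1", "C2"), ("C1", "P1"), ("C1", "GF"), ("GF", "SP")]:
    print(a, b, relationship(a, b))
```

Because the matrix depends only on the pedigree, it can be computed once and stored before the SNP-by-SNP regressions start.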
GEE & GLMM in GWAS Use GEE & GLMM
Cautions
There are no clusters: each individual is its own cluster.
Store the GCM in advance.
GWAS example: GEE-continuous
running glm to get initial regression estimate
(Intercept) age sex BMI genecount
-63.0665181 0.1441694 -39.0676606 7.8280011 19.8533844
gee(formula = TG ~ age + sex + BMI + genecount, id = ID, data = a,
R = kin, corstr = "fixed")
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) -63.0665181 35.4400639 -1.7795261 31.4650444 -2.0043359
age 0.1441694 0.3376881 0.4269307 0.3558302 0.4051635
sex -39.0676606 11.2797186 -3.4635315 7.2549380 -5.3849751
BMI 7.8280011 1.2914399 6.0614519 1.3054881 5.9962258
genecount 19.8533844 6.2315166 3.1859635 5.8534124 3.3917624
Working Correlation
[,1] [,2] [,3] [,4]
[1,] 1.0 0.5 0.5 0.5
[2,] 0.5 1.0 0.5 0.5
[3,] 0.5 0.5 1.0 0.0
[4,] 0.5 0.5 0.0 1.0
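To relate this output to the genome-wide threshold, the robust z for `genecount` can be converted to a two-sided p-value. This is a sketch assuming a standard normal reference distribution for the Wald z; the helper name is hypothetical and the inputs are copied from the table above.

```python
import math

def wald_p_value(estimate, std_error):
    """Two-sided p-value for a Wald z statistic under N(0, 1)."""
    z = estimate / std_error
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# genecount: estimate and robust S.E. from the GEE output
z, p = wald_p_value(19.8533844, 5.8534124)
print(round(z, 4), p)
print("genome-wide significant:", p <= 5e-8)
```

A nominally small p-value for a single SNP can still be far from the $5 \times 10^{-8}$ level required in a GWAS.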
GWAS example: GEE-binomial
running glm to get initial regression estimate
(Intercept) age sex BMI genecount
-5.482288956 0.009646267 -1.348154797 0.151819412 0.192508455
gee(formula = hyperTG ~ age + sex + BMI + genecount, id = ID,
data = a, R = kin, family = binomial, corstr = "fixed")
Estimate Naive S.E. Naive z Robust S.E. Robust z
(Intercept) -5.482288957 1.10060632 -4.9811535 1.07919392 -5.0799850
age 0.009646267 0.01004073 0.9607134 0.01027862 0.9384789
sex -1.348154801 0.53873048 -2.5024662 0.52100579 -2.5876004
BMI 0.151819412 0.03861585 3.9315312 0.04199752 3.6149615
genecount 0.192508455 0.18683677 1.0303564 0.19281252 0.9984230
Working Correlation
[,1] [,2] [,3] [,4]
[1,] 1.0 0.5 0.5 0.5
[2,] 0.5 1.0 0.5 0.5
[3,] 0.5 0.5 1.0 0.0
[4,] 0.5 0.5 0.0 1.0
GWAS example: GLMM
Cannot be implemented with the lme4 package.
Possible with the hglm package.
Implemented in GenABEL as the polygenic_hglm function.
Limitation
Both GEE & GLMM
Slow. Family structure + binomial is the worst case.
Continuous: overcome through advances in approximation methods: FASTA, GRAMMAR, GEMMA, ...
Binomial: no good approximation is available, so the speed problem cannot be overcome.
Conclusion
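Summary
1 Use them when the independence assumption is violated.
2 GEE and GLMM differ in interpretation.
3 GLMM has the larger computing burden.
4 In GWAS, the correlation structure is obtained in advance: kinship matrix
5 Binomial trait in GWAS: solving this would be Nature-level work.
Currently, only TDT-based statistics are available for binomial traits, and they come with sample size issues.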
END
Email : [email protected]
(02)880-2473
H.P: 010-9192-5385