Continuous heterogeneity

Shaun Purcell, Boulder Twin Workshop, March 2004


Page 1: Continuous heterogeneity

Continuous heterogeneity

Shaun Purcell

Boulder Twin Workshop, March 2004

Page 2: Continuous heterogeneity

Raw data vs summary statistics

Zyg   T1     T2
1     1.2    0.8
1    -1.3   -2.2
2     0.7    1.9
2     0.2   -0.8
..    ...    ...

MZ covariance matrix (lower triangle)
1.03
0.87   0.98

DZ covariance matrix (lower triangle)
0.95
0.57   1.08
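A minimal Python sketch of how raw rows are reduced to summary statistics: pairs are grouped by zygosity and each group is collapsed to a 2 × 2 (T1, T2) covariance matrix with an n − 1 denominator. Only the four example rows shown on the slide are used, so the numbers will not match the MZ/DZ matrices above, which come from the full (unshown) data set. The helper name `pair_covariance` is illustrative.

```python
def pair_covariance(pairs):
    """Return (var_t1, cov_t1_t2, var_t2) for a list of (t1, t2) tuples,
    using the n - 1 (unbiased) denominator."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two pairs")
    mean1 = sum(t1 for t1, _ in pairs) / n
    mean2 = sum(t2 for _, t2 in pairs) / n
    var1 = sum((t1 - mean1) ** 2 for t1, _ in pairs) / (n - 1)
    var2 = sum((t2 - mean2) ** 2 for _, t2 in pairs) / (n - 1)
    cov = sum((t1 - mean1) * (t2 - mean2) for t1, t2 in pairs) / (n - 1)
    return var1, cov, var2


# The four example rows from the slide: (zygosity, T1, T2); 1 = MZ, 2 = DZ
rows = [(1, 1.2, 0.8), (1, -1.3, -2.2), (2, 0.7, 1.9), (2, 0.2, -0.8)]

# Group pairs by zygosity, then reduce each group to its summary statistics
by_zyg = {}
for zyg, t1, t2 in rows:
    by_zyg.setdefault(zyg, []).append((t1, t2))

summary = {zyg: pair_covariance(pairs) for zyg, pairs in by_zyg.items()}
for zyg, (v1, c12, v2) in sorted(summary.items()):
    print(f"zyg {zyg}: var(T1) = {v1:.3f}, cov = {c12:.3f}, var(T2) = {v2:.3f}")
```

Once the data are reduced this way, any pair-level information (such as a per-pair covariate) is no longer available to the model.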

Page 3: Continuous heterogeneity

Raw data vs summary statistics

Zyg   T1     T2
1     1.2    0.8
1    -1.3   -2.2
2     0.7    1.9
2     0.2   -0.8
..    ...    ...

Page 4: Continuous heterogeneity

Raw data vs summary statistics

Zyg   T1     T2     age
1     1.2    0.8    12.3
1    -1.3   -2.2    10.3
2     0.7    1.9     8.7
2     0.2   -0.8    14.5
..    ...    ...    ...

Page 5: Continuous heterogeneity

Bivariate normal distribution

[Figure: two bivariate normal density surfaces, annotated "Variance", "Data" and "Mean"]
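With raw data, each twin pair contributes its own bivariate normal likelihood, evaluated at the model-implied mean vector μ and covariance matrix Σ, and the fit function is the sum over pairs of −2 ln L_i = 2 ln(2π) + ln|Σ| + (x_i − μ)ᵀ Σ⁻¹ (x_i − μ). The Python sketch below writes out the 2 × 2 case by hand; the function name and the example mean and covariance are illustrative assumptions, not values from the workshop data. Only differences in −2LL between nested models are used for testing.

```python
import math


def pair_minus2ll(x, mu, sigma):
    """-2 log-likelihood of one twin pair x = (t1, t2) under a bivariate normal
    with mean vector mu = (m1, m2) and covariance [[s11, s12], [s12, s22]]."""
    (s11, s12), (_, s22) = sigma
    det = s11 * s22 - s12 ** 2
    if s11 <= 0 or det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    d1, d2 = x[0] - mu[0], x[1] - mu[1]
    # Quadratic form d' Sigma^{-1} d via the closed-form 2 x 2 inverse
    quad = (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det
    return 2 * math.log(2 * math.pi) + math.log(det) + quad


# Illustrative: two pairs scored against a zero mean and an MZ-like covariance
sigma = [[1.0, 0.8], [0.8, 1.0]]
total = sum(pair_minus2ll(p, (0.0, 0.0), sigma) for p in [(1.2, 0.8), (-1.3, -2.2)])
print(round(total, 3))
```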

Page 6: Continuous heterogeneity

Introducing Definition variables

• Zygosity as a definition variable

• “Rectangular” file data.raw

1 1 0.361769 -0.356412 1 0.888986 1.463423 1 0.535161 0.636073...1 2 0.234099 0.08483182 2 -0.547252 -0.229763 2 -0.307926 -0.253692...

Page 7: Continuous heterogeneity

!Using definition variables
Group1: Defines Matrices
Calc NGroups=2
Begin Matrices;
X Lower 1 1 free
Y Lower 1 1 free
Z Lower 1 1 free
! M, necessary for the means model
M full 1 1 free
! H will be specified as a definition variable
H Full 1 1
End Matrices;
Begin Algebra;
A= X*X'; C = Y*Y'; E = Z*Z';
End Algebra;
Ma X 0
Ma Y 0
Ma Z 1
Ma M 0
! Optional: request individual fit statistics for each pair
Options MX%P=rawfit.txt
End

! A single group for both MZ & DZ twins
Group2: MZ & DZ twin pairs
! No need to specify number of pairs
Data NInput_vars=4 NObservations=0
! Points to a “REctangular” data file
RE file=data.raw
Labels id zyg t1 t2
Select t1 t2 zyg /
! Zygosity is a “Definition” variable
Definition zyg /
Matrices = Group 1
! A model for the means [twin 1 | twin 2]
Means M | M /
! Multiply A component by 1/H
Covariances A + C + E | (H~)@A + C _
(H~)@A + C | A + C + E /
! 1 x 1 matrix H represents each pair’s zygosity
Specify H -1
End
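To make the Covariances statement concrete, here is a small Python sketch (not Mx) of the expected 2 × 2 covariance matrix it implies for one pair. It assumes A, C and E are already the variance components X*X', Y*Y' and Z*Z', and that the definition variable zyg is 1 for MZ and 2 for DZ pairs, as in data.raw; `expected_cov` is a hypothetical helper name and the component values are illustrative.

```python
def expected_cov(a_var, c_var, e_var, zyg):
    """Expected covariance matrix for one pair, mirroring
    A + C + E | (H~)@A + C, where H holds the pair's zygosity code.
    Inverting H gives 1 for MZ (zyg = 1) and 0.5 for DZ (zyg = 2)."""
    if zyg not in (1, 2):
        raise ValueError("zygosity code must be 1 (MZ) or 2 (DZ)")
    h_inv = 1.0 / zyg                  # H~ : inverse of the 1 x 1 matrix H
    var = a_var + c_var + e_var        # diagonal: A + C + E
    cov = h_inv * a_var + c_var        # off-diagonal: (1/H) A + C
    return [[var, cov], [cov, var]]


mz = expected_cov(0.5, 0.2, 0.3, zyg=1)
dz = expected_cov(0.5, 0.2, 0.3, zyg=2)
print(mz)
print(dz)
```

Because H changes from pair to pair, one data group can hold MZ and DZ twins together; the zygosity-specific covariance is rebuilt for every row.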

Page 8: Continuous heterogeneity

Output from zyg.mx
RE FILE=DATA.RAW
Rectangular continuous data read initiated
NOTE: Rectangular file contained 500 records with data that contained a total of 2000 observations
LABELS ID ZYG T1 T2
SELECT T1 T2 ZYG /
DEFINITION ZYG /
NOTE: Selection yields 500 data vectors for analysis
NOTE: Vectors contain a total of 1500 observations
NOTE: Definition yields 500 data vectors for analysis
NOTE: Vectors contain a total of 1000 observations

Page 9: Continuous heterogeneity

Output from zyg.mx
Summary of VL file data for group 2
            ZYG        T1         T2
Code       -1.0000     1.0000     2.0000
Number    500.0000   500.0000   500.0000
Mean        1.5000    -0.0140     0.0240
Variance    0.2500     0.5601     0.5211
Minimum     1.0000    -2.1941    -1.9823
Maximum     2.0000     2.1218     2.7670

Page 10: Continuous heterogeneity

Output from zyg.mx
MATRIX H  This is a FULL matrix of order 1 by 1:  -1
MATRIX M  This is a FULL matrix of order 1 by 1:  4
MATRIX X  This is a LOWER TRIANGULAR matrix of order 1 by 1:  1
MATRIX Y  This is a LOWER TRIANGULAR matrix of order 1 by 1:  2
MATRIX Z  This is a LOWER TRIANGULAR matrix of order 1 by 1:  3

Specify H -1

Page 11: Continuous heterogeneity

Output from zyg.mx

Your model has 4 estimated parameters and 1000 Observed statistics

-2 times log-likelihood of data >>> 2134.998

Degrees of freedom >>>>>>>>>>>>>>>> 996

• Fixing X to zero

Your model has 3 estimated parameters and 1000 Observed statistics

-2 times log-likelihood of data >>> 2154.626

Degrees of freedom >>>>>>>>>>>>>>>> 997
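The comparison above is a likelihood-ratio test: the increase in −2LL when X is fixed to zero is referred to a chi-square distribution with degrees of freedom equal to the change in df. A standard-library Python sketch, relying on the identity P(χ²₁ > x) = erfc(√(x/2)):

```python
import math


def chi2_sf_1df(stat):
    """Upper-tail p-value P(chi-square with 1 df > stat) = erfc(sqrt(stat / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


full_m2ll, full_df = 2134.998, 996  # X, Y, Z and M all free
sub_m2ll, sub_df = 2154.626, 997    # X fixed to zero (no A component)

delta = sub_m2ll - full_m2ll        # likelihood-ratio chi-square
ddf = sub_df - full_df
print(f"chi-square = {delta:.3f} on {ddf} df, p = {chi2_sf_1df(delta):.2g}")
```

Because fixing X to zero puts the additive genetic variance on the boundary of its parameter space, this χ²₁ p-value is conservative; halving it (a 50:50 mixture of χ²₀ and χ²₁) is a common correction.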

Page 12: Continuous heterogeneity

Continuous moderators

• Traits are often best defined continuously

• Many environmental moderators are also likely to be continuous in nature

– Age

– Gestational age

– Socio-economic status

– Educational level

– Consumption of food / alcohol / drugs

• How to test for G × E interaction in this case?

Page 13: Continuous heterogeneity

Continuous moderators

• Problems?

– Stratification of the sample reduces sample size

– Modelling proportions of variance

• implicitly assumes the total variance is equal across levels of the moderator

– Logical to assume a linear G × E interaction

• linearity at the level of the effect, not the variance

– No obvious statistical test for heterogeneity

[Figure: heritability (0% to 100%) plotted against age (4 to 10 yrs)]

Page 14: Continuous heterogeneity

Biometrical G × E model

• At a hypothetical single locus
– additive genetic value $a$
– allele frequency $p$
– QTL variance $2p(1-p)a^2$

• Assuming a linear interaction
– additive genetic value $a + \beta M$
– allele frequency $p$
– QTL variance $2p(1-p)(a + \beta M)^2$
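The single-locus expressions above can be evaluated directly. A minimal Python sketch, with illustrative values for p, a, β and M that are not taken from any data set:

```python
def qtl_variance(p, a, beta=0.0, m=0.0):
    """Additive variance 2p(1-p)(a + beta*M)^2 at one biallelic locus.

    With beta = 0 this reduces to the no-interaction value 2p(1-p)a^2.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("allele frequency must lie in [0, 1]")
    return 2 * p * (1 - p) * (a + beta * m) ** 2


# Illustrative values: p = 0.5, a = 1, beta = 0.5, over a few moderator levels
for m in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(m, qtl_variance(0.5, 1.0, beta=0.5, m=m))
```

The genetic effect a + βM is linear in M, but the variance it contributes is quadratic in M; this is the sense in which linearity holds at the level of the effect rather than the variance.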

Page 15: Continuous heterogeneity

Biometrical G × E model

[Figure: genotypic values of AA, Aa and aa (a, 0 and -a) plotted against the moderator M, in panels labelled "No interaction", "Interaction" and "Equivalently…"]

Page 16: Continuous heterogeneity

Model-fitting approach to GxE

[Path diagram: Twin 1 and Twin 2, each with latent A, C and E factors; paths a, c and e for both twins]

Page 17: Continuous heterogeneity

Model-fitting approach to GxE

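[Path diagram: Twin 1 and Twin 2, each with latent A, C and E factors; paths c and e for both twins; the A path is $a + \beta_X M$ for both twins]

Continuous moderator variable M; can be coded 0 / 1 in the dichotomous case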

Page 18: Continuous heterogeneity

Individual specific moderators

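[Path diagram: Twin 1 and Twin 2, each with latent A, C and E factors; paths c and e for both twins; A paths $a + \beta_X M_1$ (Twin 1) and $a + \beta_X M_2$ (Twin 2)]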

Page 19: Continuous heterogeneity

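E × E interactions

[Path diagram: Twin 1 paths $a + \beta_X M_1$, $c + \beta_Y M_1$ and $e + \beta_Z M_1$; Twin 2 paths $a + \beta_X M_2$, $c + \beta_Y M_2$ and $e + \beta_Z M_2$]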

Page 20: Continuous heterogeneity

ACE - XYZ - M

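[Path diagram: Twin 1 paths $a + \beta_X M_1$, $c + \beta_Y M_1$ and $e + \beta_Z M_1$; Twin 2 paths $a + \beta_X M_2$, $c + \beta_Y M_2$ and $e + \beta_Z M_2$; means paths $m + \beta_M M_1$ (Twin 1) and $m + \beta_M M_2$ (Twin 2)]

Main effects and moderating effects are statistically and conceptually distinct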

Page 21: Continuous heterogeneity

Model-fitting approach to GxE

[Figure: components of variance (A, C and E) plotted against the moderator variable]

Page 22: Continuous heterogeneity

Turkheimer et al (2003)

• 320 twin pairs recruited at birth from urban hospitals

• G : additive genetic variance

• E : SES – parental education, occupation, income

• X : IQ – Wechsler: Verbal, Performance, Full scale

Page 23: Continuous heterogeneity

[Figure: A, C and E components for Full scale IQ, Verbal IQ and Non-Verbal IQ]

Page 24: Continuous heterogeneity

Standard model

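• Means vector

$$\mu = \begin{pmatrix} m & m \end{pmatrix}$$

• Covariance matrix

$$\Sigma = \begin{pmatrix} a^2 + c^2 + e^2 & Z a^2 + c^2 \\ Z a^2 + c^2 & a^2 + c^2 + e^2 \end{pmatrix}$$

where Z = 1 for MZ pairs and Z = 0.5 for DZ pairs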

Page 25: Continuous heterogeneity

Allowing for a main effect of X

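• Means vector

$$\mu_i = \begin{pmatrix} m + \beta_M X_{1i} & m + \beta_M X_{2i} \end{pmatrix}$$

• Covariance matrix

$$\Sigma = \begin{pmatrix} a^2 + c^2 + e^2 & Z a^2 + c^2 \\ Z a^2 + c^2 & a^2 + c^2 + e^2 \end{pmatrix}$$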

Page 26: Continuous heterogeneity

! Basic model + main effect of a definition variable
G1: Define Matrices
Data Calc NGroups=3
Begin Matrices;
A full 1 1 free
C full 1 1 free
E full 1 1 free
M full 1 1 free ! grand mean
B full 1 1 free ! moderator-linked means model
H full 1 1
R full 1 1 ! twin 1 moderator (definition variable)
S full 1 1 ! twin 2 moderator (definition variable)
End Matrices;
Ma M 0
Ma B 0
Ma A 1
Ma C 1
Ma E 1
Matrix H .5
Options NO_Output
End

Page 27: Continuous heterogeneity

G2: MZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 1 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance A*A' + C*C' + E*E' | A*A' + C*C' _
A*A' + C*C' | A*A' + C*C' + E*E' /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
Options NO_Output
End

Page 28: Continuous heterogeneity

G3: DZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 2 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance A*A' + C*C' + E*E' | H@A*A' + C*C' _
H@A*A' + C*C' | A*A' + C*C' + E*E' /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
End

Page 29: Continuous heterogeneity

MATRIX A  This is a FULL matrix of order 1 by 1:  1.3228
MATRIX B  This is a FULL matrix of order 1 by 1:  0.3381
MATRIX C  This is a FULL matrix of order 1 by 1:  1.1051
MATRIX E  This is a FULL matrix of order 1 by 1:  0.9728
MATRIX M  This is a FULL matrix of order 1 by 1:  0.1035
Your model has 5 estimated parameters and 800 Observed statistics
-2 times log-likelihood of data >>> 3123.925
Degrees of freedom >>>>>>>>>>>>>>>> 795

Page 30: Continuous heterogeneity

MATRIX A  This is a FULL matrix of order 1 by 1:  1.3078
MATRIX B  This is a FULL matrix of order 1 by 1:  0.0000
MATRIX C  This is a FULL matrix of order 1 by 1:  1.1733
MATRIX E  This is a FULL matrix of order 1 by 1:  0.9749
MATRIX M  This is a FULL matrix of order 1 by 1:  0.1069
Your model has 4 estimated parameters and 800 Observed statistics
-2 times log-likelihood of data >>> 3138.157
Degrees of freedom >>>>>>>>>>>>>>>> 796

Page 31: Continuous heterogeneity

Continuous heterogeneity model

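• Means vector

$$\mu_i = \begin{pmatrix} m + \beta_M X_{1i} & m + \beta_M X_{2i} \end{pmatrix}$$

• Covariance matrix

$$\Sigma_i = \begin{pmatrix} (a + \beta_X X_{1i})^2 + (c + \beta_Y X_{1i})^2 + (e + \beta_Z X_{1i})^2 & Z(a + \beta_X X_{1i})(a + \beta_X X_{2i}) + (c + \beta_Y X_{1i})(c + \beta_Y X_{2i}) \\ Z(a + \beta_X X_{1i})(a + \beta_X X_{2i}) + (c + \beta_Y X_{1i})(c + \beta_Y X_{2i}) & (a + \beta_X X_{2i})^2 + (c + \beta_Y X_{2i})^2 + (e + \beta_Z X_{2i})^2 \end{pmatrix}$$

where $X_{1i}$ and $X_{2i}$ are the moderator values of twin 1 and twin 2 in pair i, and Z = 1 for MZ and 0.5 for DZ pairs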
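A Python sketch of the continuous heterogeneity model's expected moments for a single pair, written term by term from the means vector and covariance matrix above. The argument names are illustrative: bx, by, bz and bm play the roles of β_X, β_Y, β_Z and β_M (the T, U, V and B matrices in the GxE Mx script), and zyg_r is Z. The example call plugs in the ACE-XYZ-M estimates reported for the practical, purely to show usage.

```python
def moderated_moments(x1, x2, zyg_r, a, c, e, bx, by, bz, m, bm):
    """Expected means and 2 x 2 covariance matrix for one twin pair.

    x1, x2     : moderator values for twin 1 and twin 2 (X_1i, X_2i)
    zyg_r      : Z, the genetic correlation (1.0 for MZ, 0.5 for DZ)
    a, c, e    : baseline path coefficients
    bx, by, bz : moderating effects on the A, C and E paths
    m, bm      : grand mean and main (means-model) effect of the moderator
    """
    # Moderated path coefficients for each twin
    a1, a2 = a + bx * x1, a + bx * x2
    c1, c2 = c + by * x1, c + by * x2
    e1, e2 = e + bz * x1, e + bz * x2
    means = (m + bm * x1, m + bm * x2)
    var1 = a1 ** 2 + c1 ** 2 + e1 ** 2
    var2 = a2 ** 2 + c2 ** 2 + e2 ** 2
    # E is twin-specific, so it does not enter the cross-twin covariance
    cov12 = zyg_r * a1 * a2 + c1 * c2
    return means, [[var1, cov12], [cov12, var2]]


# Example: a DZ pair with moderator values +1 and -1
means, cov = moderated_moments(
    1.0, -1.0, 0.5,
    a=1.2288, c=0.9874, e=0.9236,
    bx=-0.6007, by=0.1763, bz=0.3825,
    m=0.0737, bm=0.367,
)
print(means)
print(cov)
```

Setting all four β coefficients to zero recovers the standard model's mean (m, m) and covariance with Z a² + c² off the diagonal.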

Page 32: Continuous heterogeneity

! GxE - Basic model
G1: Define Matrices
Data Calc NGroups=3
Begin Matrices;
A full 1 1 free
C full 1 1 free
E full 1 1 free
T full 1 1 free ! moderator-linked A component
U full 1 1 free ! moderator-linked C component
V full 1 1 free ! moderator-linked E component
M full 1 1 free ! grand mean
B full 1 1 free ! moderator-linked means model
H full 1 1
R full 1 1 ! twin 1 moderator (definition variable)
S full 1 1 ! twin 2 moderator (definition variable)
End Matrices;
Ma T 0
Ma U 0
Ma V 0
Ma M 0
Ma B 0
Ma A 1
Ma C 1
Ma E 1
Matrix H .5
Options NO_Output
End

Page 33: Continuous heterogeneity

G2: MZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 1 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance (A+T*R)*(A+T*R) + (C+U*R)*(C+U*R) + (E+V*R)*(E+V*R) | (A+T*R)*(A+T*S) + (C+U*R)*(C+U*S) _
(A+T*S)*(A+T*R) + (C+U*S)*(C+U*R) | (A+T*S)*(A+T*S) + (C+U*S)*(C+U*S) + (E+V*S)*(E+V*S) /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
Options NO_Output
End

Page 34: Continuous heterogeneity

G3: DZ
Data NInput_vars=6 NObservations=0
Missing =-999
RE File=f1.dat
Labels id zyg p1 p2 m1 m2
Select if zyg = 2 /
Select p1 p2 m1 m2 /
Definition m1 m2 /
Matrices = Group 1
Means M + B*R | M + B*S /
Covariance (A+T*R)*(A+T*R) + (C+U*R)*(C+U*R) + (E+V*R)*(E+V*R) | H@(A+T*R)*(A+T*S) + (C+U*R)*(C+U*S) _
H@(A+T*S)*(A+T*R) + (C+U*S)*(C+U*R) | (A+T*S)*(A+T*S) + (C+U*S)*(C+U*S) + (E+V*S)*(E+V*S) /
!twin 1 moderator variable
Specify R -1
!twin 2 moderator variable
Specify S -2
End

Page 35: Continuous heterogeneity

Practical 1

• The script: mod.mx

• The data: f1.dat
ID zygosity trait_twin_1 trait_twin_2 mod_twin_1 mod_twin_2

1. Any evidence for G × E for this trait?
• i.e. does the A latent variable show heterogeneity with respect to the moderator variable?

2. If so, in what way?
• i.e. how would you interpret/describe the effect?

Page 36: Continuous heterogeneity

Practical 1 : f1.dat

[Figures: histogram of the moderator distribution (Fraction); scatterplots of twin 1 vs twin 2 trait values for MZ pairs and for DZ pairs; all twin 1's vs the moderator]

Page 37: Continuous heterogeneity

nomod.mx

a = 1.3078, a² ≈ 1.7

c = 1.1733, c² ≈ 1.4

e = 0.9749, e² ≈ 0.95

a² + c² + e² ≈ 4.05 (summing the rounded squares)

i.e. the proportions of variance are roughly 42%, 35% and 23%
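The same arithmetic in Python, without rounding the squared paths first (a sketch; the three path estimates are the nomod.mx values above):

```python
# Path estimates from nomod.mx
a, c, e = 1.3078, 1.1733, 0.9749

components = {"A": a ** 2, "C": c ** 2, "E": e ** 2}
total = sum(components.values())

for name, vc in components.items():
    print(f"{name}: VC = {vc:.3f}, proportion = {vc / total:.1%}")
print(f"Total variance = {total:.3f}")
```

Unrounded, the total is about 4.04 and the shares come out near 42%, 34% and 24%; the small differences from the slide arise from rounding the squares before summing.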

Page 38: Continuous heterogeneity

Parameter estimates: mod.mx

      ACE-XYZ-M   ACE-YZ-M
A      1.2288      1.4455
C      0.9874      0.6837
E      0.9236      0.9484
T     -0.6007      fixed at 0
U      0.1763     -0.6817
V      0.3825      0.4663
M      0.0737      0.0724
B      0.367       0.3625

Page 39: Continuous heterogeneity

Plotting VCs

• For the additive genetic VC, for example
– Given a, β and a range of values for the moderator variable

• For example, a = 0.5, β = -0.2 and M ranges from -2 to +2

M      (a + βM)²                 value
-2     (0.5 + (-0.2 × -2))²      0.81
-1.5   (0.5 + (-0.2 × -1.5))²    0.64
...    ...                       ...
+2     (0.5 + (-0.2 × 2))²       0.01

[Plot: VC = (a + βM)² against M, for M from -2 to 2]
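The table can be generated over the whole grid of M values with a few lines of Python; a and β are the slide's illustrative values.

```python
a, beta = 0.5, -0.2  # illustrative values used on the slide

# Moderator values M = -2, -1.5, ..., +2
grid = [-2 + 0.5 * k for k in range(9)]
vcs = [(a + beta * m) ** 2 for m in grid]

for m, vc in zip(grid, vcs):
    print(f"M = {m:+.1f}   (a + beta*M)^2 = {vc:.2f}")
```

With β < 0 the additive genetic VC falls as M increases. Because the VC is a square, it would start rising again once a + βM passes zero, which here happens at M = 2.5, just outside the plotted range.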

Page 40: Continuous heterogeneity

[Figure: variance components A, C and E plotted against the moderator]

Page 41: Continuous heterogeneity

Specific test of G×E

                         -2LL       Df
Full model  ACE-XYZ-M    3024.689   792
Sub model   ACE-YZ-M     3034.898   793
Difference               10.209     1

p-value = 0.00139

Page 42: Continuous heterogeneity

Other tests

Test    Submodel   -2LL       Δdf   p-value
Y       ACE-XZ-M   3025.782   1     0.29
Z       ACE-XY-M   3110.429   1     < 1×10⁻¹⁹
M       ACE-XYZ    3039.370   1     0.00013
C & Y   AE-XZ-M    3026.228   2     0.46

All tests are made against the full model ACE-XYZ-M (-2LL = 3024.689)
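A Python sketch that recomputes these p-values, along with the G × E test on the previous slide, from the −2LL values, assuming the usual likelihood-ratio chi-square reference distribution. Only the closed-form survival functions for 1 and 2 df are needed here.

```python
import math


def chi2_sf(stat, df):
    """Upper-tail chi-square p-value, using closed forms for df = 1 or 2."""
    if df == 1:
        return math.erfc(math.sqrt(stat / 2.0))
    if df == 2:
        return math.exp(-stat / 2.0)
    raise ValueError("this sketch only handles 1 or 2 df")


full = 3024.689  # -2LL of the full model ACE-XYZ-M
submodels = {
    "ACE-YZ-M": (3034.898, 1),   # drop X: the G x E test
    "ACE-XZ-M": (3025.782, 1),   # drop Y
    "ACE-XY-M": (3110.429, 1),   # drop Z
    "ACE-XYZ": (3039.370, 1),    # drop M
    "AE-XZ-M": (3026.228, 2),    # drop C and Y
}
for name, (m2ll, ddf) in submodels.items():
    delta = m2ll - full
    print(f"{name}: chi-square = {delta:.3f}, df = {ddf}, p = {chi2_sf(delta, ddf):.3g}")
```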

Page 43: Continuous heterogeneity

Confidence intervals

• Easy to get CIs for individual parameters

• Additionally, CIs on the moderated VCs are useful for interpretation

• e.g. a 95% CI for (a + βM)², at a specific value of M

Page 44: Continuous heterogeneity

• Define two extra 1 × 13 vectors in Group 1 (P holds the moderator values at which to evaluate the VCs; O, of type Unit, is all ones)

P full 1 13
O Unit 1 13
Matrix P -3 -2.5 -2 -1.5 -1 -0.5 0 0.5 1 1.5 2 2.5 3

• Add a 4th group to calculate the CIs (and change NGroups=3 to NGroups=4 in Group 1)

CIs
Calc Matrices = Group 1
Begin Algebra;
F= ( A@O + T@P ) . ( A@O + T@P ) ;
G= ( C@O + U@P ) . ( C@O + U@P ) ;
I= ( E@O + V@P ) . ( E@O + V@P ) ;
End Algebra;
Interval @ 95 F 1 1 to F 1 13
Interval @ 95 G 1 1 to G 1 13
Interval @ 95 I 1 1 to I 1 13
End;

Page 45: Continuous heterogeneity

Calculation of CIs

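F= ( A@O + T@P ) . ( A@O + T@P ) ;

• E.g. if P were [0 1 2], with A = a and T = x, then ( A@O + T@P ) equals

$$a\begin{pmatrix} 1 & 1 & 1 \end{pmatrix} + x\begin{pmatrix} 0 & 1 & 2 \end{pmatrix} = \begin{pmatrix} a + 0x & a + 1x & a + 2x \end{pmatrix}$$

or

$$\begin{pmatrix} a & a + x & a + 2x \end{pmatrix}$$

Finally, the element-by-element (dot) product squares all elements to give

$$\begin{pmatrix} a^2 & (a + x)^2 & (a + 2x)^2 \end{pmatrix}$$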
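A plain-Python mirror of this algebra, to show what the Kronecker (@) and element-by-element (.) products compute when A and T are 1 × 1. It reproduces only the point estimates of the moderated VC along the P grid; the confidence intervals themselves come from Mx's likelihood-based Interval procedure, which this sketch does not attempt. `moderated_vc_grid` is a hypothetical name, and the A and T values are the ACE-XYZ-M estimates from the practical.

```python
def moderated_vc_grid(a, t, p):
    """Mimic F = (A@O + T@P) . (A@O + T@P) for 1 x 1 matrices A and T.

    O is a 1 x n unit (all-ones) row, so A@O repeats a across the row;
    T@P scales each moderator value in P by t; '.' squares element-wise.
    """
    o = [1.0] * len(p)                                    # O: Unit 1 x n
    path = [a * oj + t * pj for oj, pj in zip(o, p)]      # A@O + T@P
    return [x * x for x in path]                          # ( ... ) . ( ... )


p_grid = [-3 + 0.5 * k for k in range(13)]           # -3, -2.5, ..., 3 as in Matrix P
f_row = moderated_vc_grid(1.2288, -0.6007, p_grid)   # A and T from ACE-XYZ-M
print([round(v, 3) for v in f_row])
```

Calling `moderated_vc_grid(a, x, [0, 1, 2])` returns [a², (a + x)², (a + 2x)²], matching the worked example.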

Page 46: Continuous heterogeneity

Confidence intervals on VCs

[Figure: moderated variance components with 95% confidence intervals, in separate panels for A, C and E]

Page 47: Continuous heterogeneity

Other considerations

• Simple approach to test for heterogeneity – easily adapted, e.g. for ordinal data models

• Extensions / things to watch for…

– scalar vs. qualitative heterogeneity

• very low power

– the environment may share genetic influences with the trait

– nonlinear effects in both mediation and moderation

Page 48: Continuous heterogeneity

[Diagram: genotype G, environment E and trait X, with paths labelled "Main effect", "Moderating G × E" and "rGE"]

Page 49: Continuous heterogeneity
Page 50: Continuous heterogeneity

Turkheimer et al. (2003)

[Figures: IQ plotted against SES; V(IQ) plotted against SES]

Page 51: Continuous heterogeneity

Simulated twin data

[Figure: E(Trait) plotted against the moderator, with "Standard" and "Quadratic" fits]

• A 3 df test of any moderating effect

• Standard analysis: linear means model (in HA and H0)

• Quadratic analysis: linear and quadratic means model (in HA and H0)

• 18/50 replicates significant, i.e. a type I error of 36% at a nominal 5% level

Page 52: Continuous heterogeneity

More complex G × E interaction

[Figure: trait and P(disease) plotted against E-risk]

Page 53: Continuous heterogeneity

Include E-risk in means model

[Figure: residual trait and P(disease | E-risk) plotted against E-risk]

Page 54: Continuous heterogeneity

Biometrical model

[Figure: additive genetic effects of genotypes AA, Aa and aa plotted against E-risk, following a quadratic form]

Page 55: Continuous heterogeneity

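ACE - XYZ - X²Y²Z² - M

[Path diagram: Twin 1 and Twin 2, each with latent A, C and E factors; paths c and e for both twins; A paths $a + \beta_X M_1 + \beta_{X2} M_1^2$ (Twin 1) and $a + \beta_X M_2 + \beta_{X2} M_2^2$ (Twin 2)]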
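A minimal Python sketch of the quadratic extension for the A path, assuming the coefficient on M² (labelled X2 in the model name) enters additively as drawn. The helper names are hypothetical, the values are illustrative, and the same pattern would apply to the C and E paths.

```python
def quadratic_path(a, b1, b2, m):
    """Hypothetical helper: moderated path a + b1*M + b2*M^2."""
    return a + b1 * m + b2 * m ** 2


def a_covariance(a, b1, b2, m1, m2, r):
    """A-component covariance between twins with moderators m1 and m2:
    r * path(m1) * path(m2), with r = 1 (MZ) or 0.5 (DZ).
    Setting m1 == m2 and r = 1 gives the A variance for one twin."""
    return r * quadratic_path(a, b1, b2, m1) * quadratic_path(a, b1, b2, m2)


# Illustrative values only: a curved A variance over a range of moderator values
for m in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(m, a_covariance(1.0, 0.5, 0.25, m, m, 1.0))
```

With b2 = 0 the helper reduces to the linear ACE-XYZ-M path a + β_X M.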