October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S....
-
Upload
curtis-york -
Category
Documents
-
view
218 -
download
0
Transcript of October 2005Jax Workshop © Brian S. Yandell1 Bayesian Model Selection for Multiple QTL Brian S....
October 2005 Jax Workshop © Brian S. Yandell 1
Bayesian Model Selectionfor Multiple QTL
Brian S. YandellUniversity of Wisconsin-Madison
www.stat.wisc.edu/~yandell/statgen↑
Jackson Laboratory, October 2005
October 2005 Jax Workshop © Brian S. Yandell 2
outline
1. what is the goal of QTL study?
2. Bayesian priors & posteriors
3. model search using MCMC• Gibbs sampler and Metropolis-Hastings
4. model assessment• Bayes factors & model averaging
5. data examples in detail• plants & animals
October 2005 Jax Workshop © Brian S. Yandell 3
0 5 10 15 20 25 30
01
23
rank order of QTL
addi
tive
eff
ect
Pareto diagram of QTL effects
54
3
2
1
major QTL onlinkage map
majorQTL
minorQTL
polygenes
1. what is the goal of QTL study?
October 2005 Jax Workshop © Brian S. Yandell 4
interval mapping basics• observed measurements
– y = phenotypic trait– m = markers & linkage map– i = individual index (1,…,n)
• missing data– missing marker data– q = QT genotypes
• alleles QQ, Qq, or qq at locus
• unknown quantities = QT locus (or loci) = phenotype model parameters– H = QTL model/genetic architecture
• pr(q|m,,H) genotype model– grounded by linkage map, experimental cross– recombination yields multinomial for q given m
• pr(y|q,,H) phenotype model– distribution shape (assumed normal here) – unknown parameters (could be non-parametric)
observed X Y
missing Q
unknown after
Sen Churchill (2001)
y
q
m
October 2005 Jax Workshop © Brian S. Yandell 5
2. Bayesian priors & posteriors• augment data (y,m) with missing genotypes q• study model parameters (,) given data (y,m,q)
– properties of posterior pr(,,q | y,m )– average posterior over missing genotypes q
• pr(, | y,m ) = sumq posterior
• sample from posterior in some clever way– multiple imputation (Sen Churchill 2002)– Markov chain Monte Carlo (MCMC)
posterior = likelihood * prior / constant
)|(pr
)|(pr)(pr),|(pr),|(pr),|,,(pr
my
mmqqymyq
October 2005 Jax Workshop © Brian S. Yandell 6
Bayes posterior vs. maximum likelihood
• classical approach maximizes likelihood• Bayesian posterior averages over other parameters
qmqqymy
Cdmym
cmy
),|(pr),|(pr),,|(pr
})(pr),,|(pr)|(pr{log)(LPD
)},,|(pr{maxlog)(LOD
10
10
October 2005 Jax Workshop © Brian S. Yandell 7
0 50 100 150 200
0
2
4
6
8
10
12
Map position (cM)
lod
15
50
10
00
LOD and LPD: QTL at 100
0 50 100 150 200
-0.5
0.0
0.5
1.0
1.5
2.0
Map position (cM)
sub
stitu
tion
effe
ct
substitution effect
BC with 1 QTL: IM vs. BIM
black=BIMpurple=IM
2nd QTL? 2nd QTL?
blue=ideal
October 2005 Jax Workshop © Brian S. Yandell 8
how does phenotype y improve posterior for genotype q?
90
100
110
120
D4Mit41D4Mit214
Genotype
bp
AAAA
ABAA
AAAB
ABAB
what are probabilitiesfor genotype qbetween markers?
recombinants AA:AB
all 1:1 if ignore yand if we use y?
October 2005 Jax Workshop © Brian S. Yandell 9
posterior on QTL genotypes• full conditional of q given data, parameters
– proportional to prior pr(q | m, )• weight toward q that agrees with flanking markers
– proportional to likelihood pr(y|q,)• weight toward q with similar phenotype values
• phenotype and flanking markers may conflict– posterior recombination balances these two weights
• this is the E-step of EM computations
),,|(pr
),|(pr),|(pr),,,|(pr
my
mqqymyq
October 2005 Jax Workshop © Brian S. Yandell 10
prior & posteriors: genotypic means q
6 8 10 12 14 16
y = phenotype values
n la
rge
n la
rge
n s
ma
llp
rio
r
qq Qq QQ
data meandata means prior mean
October 2005 Jax Workshop © Brian S. Yandell 11
shrink posterior from sample genotypic mean toward overall meanpartition individuals into sets Sq by genotype qhyperparameter is related to heritability
prior:
posterior given data:
shrinkage factor:
prior & posteriors: genotypic means q
11
}{count,sum
,)1(~
,~
2
2
q
qqq
Sq
qqqqqq
q
n
nb
Snn
yy
nbybybN
yN
q
October 2005 Jax Workshop © Brian S. Yandell 12
what if variance 2 is unknown?• sample variance is proportional to chi-square
– ns2 / 2 ~ 2 ( n )– likelihood of sample variance s2 given n, 2
• conjugate prior is inverse chi-square 2 / 2 ~ 2 ( )– prior of population variance 2 given , 2
• posterior is weighted average of likelihood and prior– (2+ns2) / 2 ~ 2 ( +n )– posterior of population variance 2 given n, s2, , 2
• empirical choice of hyper-parameters 2= s2/3, =6– E(2 | ,2) = s2/2, Var(2 | ,2) = s4/4
October 2005 Jax Workshop © Brian S. Yandell 13
• phenotype affected by genotype & environmentpr(y|q,) ~ N(q , 2)
y = q + environment
q = 0 + sum{j in H} j(q)
number of terms in QTL model H 2nqtl (3nqtl for F2)
• partition genotypic mean into QTL effectsq = 0 + 1(q1) + 2(q2) + 12(q1,q2)
q = mean + main effects + epistatic interactions
• partition prior and posterior (details omitted)
multiple QTL phenotype model
October 2005 Jax Workshop © Brian S. Yandell 14
prior & posterior on QTL model H
• index model H by number of QTL• what prior on number of QTL?
– uniform over some range
– Poisson with prior mean
– geometric with prior mean
• prior influences posterior– good: reflects prior belief
• push data in discovery process
– bad: skeptic revolts!• “answer” depends on “guess”
e
e
e
e
ee
e e e e e
0 2 4 6 8 100.
000.
100.
200.
30
p
p
p p
p
p
pp
p p p
u u u u u u u
u u u u
m = number of QTL
prio
r pr
obab
ility
epu
exponentialPoissonuniform
October 2005 Jax Workshop © Brian S. Yandell 15
epistatic interactions
• model space issues– 2-QTL interactions only?– Fisher-Cockerham partition vs. tree-structured?– general interactions among multiple QTL
• model search issues– epistasis between significant QTL
• check all possible pairs when QTL included?• allow higher order epistasis?
– epistasis with non-significant QTL• whole genome paired with each significant QTL?• pairs of non-significant QTL?
• Yi Xu (2000) Genetics; Yi, Xu, Allison (2003) Genetics; Yi (2004)
October 2005 Jax Workshop © Brian S. Yandell 16
3. QTL Model Search using MCMC• construct Markov chain around posterior
– want posterior as stable distribution of Markov chain– in practice, the chain tends toward stable distribution
• initial values may have low posterior probability• burn-in period to get chain mixing well
• update QTL model components from full conditionals– update locus given q,H (using Metropolis-Hastings step)– update genotypes q given ,,y,H (using Gibbs sampler)– update effects given q,y,H (using Gibbs sampler)– update QTL model H given ,,y,q (using Gibbs or M-H)
NHqHqHq
XYHqHq
),,,(),,,(),,,(
),|,,,(pr~),,,(
21
October 2005 Jax Workshop © Brian S. Yandell 17
Gibbs sampler idea• two correlated normals (genotypic means in BC)
– could draw samples from both together– but easier to sample one at a time
• Gibbs sampler:– sample each from its full conditional– pick order of sampling at random– repeat N times
2
QQQQQq
2QqQqQQ
QqQQQqQQ
1,~given
1,~given
),(but )1,0(~,
N
N
corN
October 2005 Jax Workshop © Brian S. Yandell 18
Gibbs sampler samples: = 0.6
0 10 20 30 40 50
-2-1
01
2
Markov chain index
Gib
bs:
mea
n 1
0 10 20 30 40 50
-2-1
01
23
Markov chain index
Gib
bs:
mea
n 2
-2 -1 0 1 2
-2-1
01
23
Gibbs: mean 1
Gib
bs:
mea
n 2
-2 -1 0 1 2
-2-1
01
23
Gibbs: mean 1
Gib
bs:
mea
n 2
0 50 100 150 200
-2-1
01
23
Markov chain index
Gib
bs:
mea
n 1
0 50 100 150 200
-2-1
01
2
Markov chain index
Gib
bs:
mea
n 2
-2 -1 0 1 2 3
-2-1
01
2
Gibbs: mean 1
Gib
bs:
mea
n 2
-2 -1 0 1 2 3
-2-1
01
2
Gibbs: mean 1
Gib
bs:
mea
n 2
N = 50 samples N = 200 samples
October 2005 Jax Workshop © Brian S. Yandell 19
How to sample a locus ?• cannot easily sample from locus full conditional
pr( | m,q) = pr() pr(q | m,) / constant• constant determined by averaging
– over all possible genotypes
– over entire map
• Gibbs sampler will not work in general– but can use method based on ratios of probabilities
– Metropolis-Hastings is extension of Gibbs sampler
October 2005 Jax Workshop © Brian S. Yandell 20
Metropolis-Hastings idea• want to study distribution f()• take Monte Carlo samples
– unless too complicated• pick an arbitrary distribution g(, *) • draw Metropolis-Hastings samples:
– current sample value – propose new value * from g(, *)– accept new value with prob A
• Gibbs sampler is special case– g(, *) = f(*)– always accept new proposal: A = 1
),()(
),()(,1min
*
**
gf
gfA
0 2 4 6 8 10
0.0
0.2
0.4
-4 -2 0 2 4
0.0
0.2
0.4
f()
g(–*)
October 2005 Jax Workshop © Brian S. Yandell 21
MCMC realization
0 2 4 6 8 10
020
0040
0060
0080
0010
000
mcm
c se
quen
ce
2 3 4 5 6 7 8
0.0
0.1
0.2
0.3
0.4
0.5
0.6
pr(
|Y)
added twist: occasionally propose from whole domain
October 2005 Jax Workshop © Brian S. Yandell 22
Metropolis-Hastings samples
0 2 4 6 8
050
150
mcm
c se
quen
ce
0 2 4 6 8
02
46
pr(
|Y)
0 2 4 6 8
050
150
mcm
c se
quen
ce
0 2 4 6 8
0.0
0.4
0.8
1.2
pr(
|Y)
0 2 4 6 8
040
080
0
mcm
c se
quen
ce
0 2 4 6 8
0.0
1.0
2.0
pr(
|Y)
0 2 4 6 8
040
080
0
mcm
c se
quen
ce
0 2 4 6 8
0.0
0.2
0.4
0.6
pr(
|Y)
N = 200 samples N = 1000 samplesnarrow g wide g narrow g wide g
October 2005 Jax Workshop © Brian S. Yandell 23
sampling across QTL models H
action steps: draw one of three choices• update QTL model H with probability 1-b(H)-d(H)
– update current model using full conditionals– sample QTL loci, effects, and genotypes
• add a locus with probability b(H)– propose a new locus along genome– innovate new genotypes at locus and phenotype effect– decide whether to accept the “birth” of new locus
• drop a locus with probability d(H)– propose dropping one of existing loci– decide whether to accept the “death” of locus
0 L1 m+1 m2 …
October 2005 Jax Workshop © Brian S. Yandell 24
reversible jump MCMC
• consider known genotypes q at 2 known loci – models with 1 or 2 QTL
• M-H step between 1-QTL and 2-QTL models– model changes dimension (via careful bookkeeping)
– consider mixture over QTL models H
eqqYnqtl
eqYnqtl
)()(:2
)(:1
22110
110
October 2005 Jax Workshop © Brian S. Yandell 25
Gibbs sampler with loci indicators
• partition genome into intervals– at most one QTL per interval– interval = 1 cM in length– assume QTL in middle of interval
• use loci to indicate presence/absence of QTL in each interval = 1 if QTL in interval = 0 if no QTL
• Gibbs sampler on loci indicators– see work of Nengjun Yi (and earlier work of Ina Hoeschele)
eqqY )()(1221110
October 2005 Jax Workshop © Brian S. Yandell 26
Bayesian shrinkage estimation
• soft loci indicators– strength of evidence for j depends on variance of j
– similar to > 0 on grey scale • include all possible loci in model
– pseudo-markers at 1cM intervals• Wang et al. (2005 Genetics)
– Shizhong Xu group at U CA Riverside
chisquare-inverse~),,0(~)(
...)()(
22
12110
jjjjNq
eqqY
October 2005 Jax Workshop © Brian S. Yandell 27
0.05 0.10 0.15
0.0
0.05
0.10
0.15
b1
b2
a short sequence
-0.3 -0.1 0.1
0.0
0.1
0.2
0.3
0.4
first 1000 with m<3
b1
b2
-0.2 0.0 0.2
geometry of effect samples(collinearity of QTL yields correlated estimates)
1 1
2
2
October 2005 Jax Workshop © Brian S. Yandell 28
4. Model Assessment
• balance model fit against model complexity
smaller model bigger modelmodel fit miss key features fits betterprediction may be biased no biasinterpretation easier more complicatedparameters low variance high variance
• information criteria: penalize L by model size |H|– compare IC = – 2 log L( H | y ) + penalty( H )
• Bayes factors: balance posterior by prior choice– compare pr( data y | model H )
October 2005 Jax Workshop © Brian S. Yandell 29
QTL Bayes factors• BF = posterior odds / prior odds• BF equivalent to BIC
– simple comparison: 1 vs 2 QTL• same as LOD test
– general comparison of models– want Bayes factor >> 1
• nqtl = number of QTL– indexes model complexity– genetic architecture also important
)1(pr)data1(pr
)(pr)data(pr1 nqtl/|nqtl
nqtl/nqtl|BFnqtl,nqtl
e
e
e
e
ee
e e e e e
0 2 4 6 8 10
0.00
0.10
0.20
0.30
p
p
p p
p
p
pp
p p p
u u u u u u u
u u u u
m = number of QTL
prio
r pr
obab
ility
epu
exponentialPoissonuniform
October 2005 Jax Workshop © Brian S. Yandell 30
Bayes factors to assess models•Bayes factor: which model best supports the data?
– ratio of posterior odds to prior odds– ratio of model likelihoods
•equivalent to LR statistic when– comparing two nested models– simple hypotheses (e.g. 1 vs 2 QTL)
•Bayes Information Criteria (BIC)– Schwartz introduced for model selection in general settings– penalty to balance model size (p = number of parameters)
)log()()log(2)log(2
)model|(pr
)model|(pr
)model(pr/)model(pr
)|model(pr/)|model(pr
1212
2
1
21
2112
nppLRB
Y
YYYB
October 2005 Jax Workshop © Brian S. Yandell 31
simulations and data studies•simulated F2 intercross, 8 QTL
– (Stephens, Fisch 1998)– n=200, heritability = 50%– detected 3 QTL
•increase to detect all 8– n=500, heritability to 97%
QTL chr loci effect
1 1 11 –32 1 50 –53 3 62 +24 6 107 –35 6 152 +36 8 32 –47 8 54 +18 9 195 +2
7 8 9 10 11 12 13
010
2030
40
number of QTL
fre
qu
en
cy in
%0 50 100 150 200
Chr
omos
ome
ch1ch2ch3ch4ch5
ch6ch7ch8ch9
ch10
Genetic map
posterior
October 2005 Jax Workshop © Brian S. Yandell 32
loci pattern across genome• notice which chromosomes have persistent loci• best pattern found 42% of the time
Chromosome
m 1 2 3 4 5 6 7 8 9 10 Count of 8000
8 2 0 1 0 0 2 0 2 1 0 3371
9 3 0 1 0 0 2 0 2 1 0 751
7 2 0 1 0 0 2 0 1 1 0 377
9 2 0 1 0 0 2 0 2 1 0 218
9 2 0 1 0 0 3 0 2 1 0 218
9 2 0 1 0 0 2 0 2 2 0 198
October 2005 Jax Workshop © Brian S. Yandell 33
B. napus 8-week vernalizationwhole genome study
• 108 plants from double haploid– similar genetics to backcross: follow 1 gamete– parents are Major (biennial) and Stellar (annual)
• 300 markers across genome– 19 chromosomes– average 6cM between markers
• median 3.8cM, max 34cM– 83% markers genotyped
• phenotype is days to flowering– after 8 weeks of vernalization (cooling)– Stellar parent requires vernalization to flower
• available in R/bim package• Ferreira et al. (1994); Kole et al. (2001); Schranz et al. (2002)
October 2005 Jax Workshop © Brian S. Yandell 34
Markov chain Monte Carlo sequence
burnin (sets up chain)mcmc sequence
number of QTLenvironmental varianceh2 = heritability(genetic/total variance)LOD = likelihood
-50000 -40000 -30000 -20000 -10000 0
05
1015
2025
burnin sequence
nqtl
-50000 -40000 -30000 -20000 -10000 00.
006
0.01
00.
014
burnin sequenceen
vvar
-50000 -40000 -30000 -20000 -10000 0
0.0
0.2
0.4
0.6
burnin sequence
herit
-50000 -40000 -30000 -20000 -10000 0
05
1015
20
burnin sequence
LOD
0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+06
24
68
10
mcmc sequence
nqtl
0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+06
0.00
40.
010
0.01
6
mcmc sequence
envv
ar
0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+06
0.1
0.3
0.5
mcmc sequence
herit
0 e+00 2 e+05 4 e+05 6 e+05 8 e+05 1 e+065
1015
20mcmc sequence
LOD
October 2005 Jax Workshop © Brian S. Yandell 35
MCMC sampled loci
subset of chromosomes N2, N3, N16
points jittered for viewblue lines at markers
note concentrationon chromosome N2
includes all models
020
4060
8010
012
0
N2 N3 N16chromosome
MC
MC
sam
pled
loci
ec2e5a
E33M49.117
ec3b12wg2d11b
wg1g8a
E32M49.73wg3g11
ec3a8
wg7f3a
E33M59.59
E35M62.117wg6b10
wg8g1b
wg5a5
tg6a12
Aca1E38M50.133
E35M59.117
ec2d1a
wg1a10tg2h10b
tg2f12
ec3e12b
wg5a1bwg2a3c
ec5e12awg7f5bE32M47.159
tg5d9bslg6E35M59.85
E33M49.491E38M62.229wg1g10b
ec4h7
wg4d10
E32M50.409
E38M50.159
E33M47.338ec2d8a
E33M49.165wg4f4cE35M48.148wg6c6ec4g7bwg7a11
wg5b1aec4e7aE32M61.218
wg4a4bE33M50.183E33M62.140
wg9c7
wg6b2
E33M49.211wg2e12bisoIdh
wg3f7c
October 2005 Jax Workshop © Brian S. Yandell 36
Bayesian model assessment
row 1: # QTLrow 2: pattern
col 1: posteriorcol 2: Bayes factornote error bars on bf
evidence suggests 4-5 QTL N2(2-3),N3,N16
1 3 5 7 9 110.
00.
10.
20.
3number of QTL
QT
L po
ster
ior
QTL posterior
15
5050
0
number of QTL
post
erio
r /
prio
r
Bayes factor ratios
1 3 5 7 9 11
weak
moderate
strong
1 3 5 7 9 11 13
0.0
0.1
0.2
0.3
2*2
2:2,
123:
2*2,
122:
2,13
2:2,
32:
2,16
2:2,
113:
2*2,
32:
2,15
4:2*
2,3,
163:
2*2,
132:
2,14
2
model index
mod
el p
oste
rior
pattern posterior
5
e-01
1
e+01
5
e+02
model indexpo
ster
ior
/ pr
ior
Bayes factor ratios
1 3 5 7 9 11 13
12
23
2 22
23
24
32
weak
moderate
strong
October 2005 Jax Workshop © Brian S. Yandell 37
Bayesian estimates of loci & effectsmodel averaging: at least 4 QTL
histogram of lociblue line is densityred lines at estimates
estimate additive effects (red circles)grey points sampled from posteriorblue line is cubic splinedashed line for 2 SD
0 50 100 150 200 2500.
000.
020.
040.
06lo
ci h
isto
gram
napus8 summaries with pattern 1,1,2,3 and m 4
N2 N3 N16
0 50 100 150 200 250
-0.0
8-0
.02
0.02
addi
tive
N2 N3 N16
October 2005 Jax Workshop © Brian S. Yandell 38
Bayesian model diagnostics
pattern: N2(2),N3,N16col 1: densitycol 2: boxplots by m
environmental variance 2 = .008, = .09heritability h2 = 52%LOD = 16(highly significant)
but note change with m
0.004 0.006 0.008 0.010 0.012
050
100
200
dens
ity
marginal envvar, m 44 5 6 7 8 9 11 12
0.00
60.
008
0.01
0
envvar conditional on number of QTL
envv
ar
0.2 0.3 0.4 0.5 0.6 0.7
01
23
45
dens
ity
marginal heritability, m 44 5 6 7 8 9 11 12
0.30
0.40
0.50
0.60
heritability conditional on number of QTL
herit
abili
ty5 10 15 20 25
0.00
0.04
0.08
0.12
dens
ity
marginal LOD, m 44 5 6 7 8 9 11 12
1012
1416
1820
LOD conditional on number of QTLLO
D
October 2005 Jax Workshop © Brian S. Yandell 39
studying diabetes in an F2
• segregating cross of inbred lines– B6.ob x BTBR.ob F1 F2– selected mice with ob/ob alleles at leptin gene (chr 6)– measured and mapped body weight, insulin, glucose at various ages
(Stoehr et al. 2000 Diabetes)– sacrificed at 14 weeks, tissues preserved
• gene expression data– Affymetrix microarrays on parental strains, F1
• key tissues: adipose, liver, muscle, -cells• novel discoveries of differential expression (Nadler et al. 2000 PNAS; Lan et
al. 2002 in review; Ntambi et al. 2002 PNAS)– RT-PCR on 108 F2 mice liver tissues
• 15 genes, selected as important in diabetes pathways• SCD1, PEPCK, ACO, FAS, GPAT, PPARgamma, PPARalpha, G6Pase, PDI,
…
October 2005 Jax Workshop © Brian S. Yandell 40
0 50 100 150 200 250 300
02
46
8LO
D
chr2 chr5 chr9
0 50 100 150 200 250 300
-0.5
0.0
0.5
1.0
effe
ct (
add=
blue
, do
m=
red)
chr2 chr5 chr9
Multiple Interval Mapping (QTLCart)SCD1: multiple QTL plus epistasis!
October 2005 Jax Workshop © Brian S. Yandell 41
Bayesian model assessment:number of QTL for SCD1
1 2 3 4 5 6 7 8 9 11 13
0.00
0.05
0.10
0.15
0.20
0.25
number of QTL
QT
L po
ster
ior
QTL posterior
15
1050
500
number of QTL
post
erio
r /
prio
r
Bayes factor ratios
1 2 3 4 5 6 7 8 9 11 13
weak
moderate
strong
October 2005 Jax Workshop © Brian S. Yandell 42
Bayesian LOD and h2 for SCD1
0 5 10 15 20
0.00
0.10
dens
ity
marginal LOD, m 11 2 3 4 5 6 7 8 9 10 12 14
510
1520
LOD conditional on number of QTL
LOD
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7
01
23
4
dens
ity
marginal heritability, m 11 2 3 4 5 6 7 8 9 10 12 14
0.1
0.3
0.5
heritability conditional on number of QTL
herit
abili
ty
October 2005 Jax Workshop © Brian S. Yandell 43
Bayesian model assessment:chromosome QTL pattern for SCD1
1 3 5 7 9 11 13 15
0.00
0.05
0.10
0.15
4:1,
2,2*
34:
1,2*
2,3
5:3*
1,2,
35:
2*1,
2,2*
35:
2*1,
2*2,
36:
3*1,
2,2*
36:
3*1,
2*2,
35:
1,2*
2,2*
36:
4*1,
2,3
6:2*
1,2*
2,2*
32:
1,3
3:2*
1,2
2:1,
2
3:1,
2,3
4:2*
1,2,
3
model index
mod
el p
oste
rior
pattern posterior
0.2
0.4
0.6
0.8
model index
post
erio
r /
prio
r
Bayes factor ratios
1 3 5 7 9 11 13 15
3 44 4
55 5
6 65
66
23
2
weak
moderate
October 2005 Jax Workshop © Brian S. Yandell 44
trans-acting QTL for SCD1(no epistasis yet: see Yi, Xu, Allison 2003)
dominance?
October 2005 Jax Workshop © Brian S. Yandell 45
2-D scan: assumes only 2 QTL!
epistasisLODpeaks
jointLODpeaks
October 2005 Jax Workshop © Brian S. Yandell 46
sub-peaks can be easily overlooked!
October 2005 Jax Workshop © Brian S. Yandell 47
epistatic model fit
October 2005 Jax Workshop © Brian S. Yandell 48
Cockerham epistatic effects
October 2005 Jax Workshop © Brian S. Yandell 49
marginal heritability by locus
black = totalblue = additivered = dominantpurple = add x add
October 2005 Jax Workshop © Brian S. Yandell 50
marginal LOD by locus
black = totalblue = additivered = dominantpurple = add x addgreen = scanone
October 2005 Jax Workshop © Brian S. Yandell 51
Bayesian & classical (HK)
October 2005 Jax Workshop © Brian S. Yandell 52
sex-adjusted anova for SCD1(only 3-way sex interactions significant)
Summary for fit QTL Method is: imp Number of observations: 107
Full model result---------------------------------- Model formula is: y ~ sex * (Q1 + Q2 + Q3 + Q1:Q3 + Q2:Q3)
df SS MS LOD %var Pvalue(Chi2) Pvalue(F)Model 29 109.02204 3.7593806 25.79574 67.05143 7.869261e-13 1.592058e-09Error 77 53.57261 0.6957482 Total 106 162.59465
Drop one QTL at a time ANOVA table: ---------------------------------- df Type III SS LOD %var F value Pvaluesex 15 36.369 12.039 22.368 3.485 0.000154Chr2@105 12 41.248 13.266 25.369 4.940 5.38e-06 Chr5@67 12 39.771 12.901 24.460 4.764 8.93e-06Chr9@15 20 59.542 17.365 36.620 4.279 1.87e-06 Chr2@105:Chr9@15 8 33.637 11.322 20.687 6.043 5.11e-06 Chr5@67:Chr9@15 8 30.042 10.344 18.476 5.397 2.12e-05sex:Chr2@105 6 18.499 6.892 11.377 4.431 0.000669sex:Chr5@67 6 18.130 6.773 11.151 4.343 0.000793sex:Chr9@15 10 25.945 9.176 15.957 3.729 0.000413 sex:Chr2@105:Chr9@15 4 17.444 6.549 10.729 6.268 0.000202sex:Chr5@67:Chr9@15 4 15.728 5.981 9.673 5.652 0.000483
October 2005 Jax Workshop © Brian S. Yandell 53
anova for SCD1(sex terms n.s.)
Model:y ~ Chr2@105 + Chr5@67 + Chr9@67 + Chr2@80 + Chr9@67^2 + Chr2@105:Chr9@67 + Chr5@67:Chr9@67^2
Df Sum Sq Mean Sq F value Pr(>F) Model 7 69.060 69.060 73.095 < 2.2e-16 ***Error 99 93.535 0.945 Total 106 162.595 70.005
Single term deletions Df Sum of Sq RSS AIC F value Pr(F) <none> 93.535 22.992 Chr2@80 1 9.073 102.608 28.225 9.6036 0.002528 **
Chr2@105 1 0.073 93.607 18.402 0.0769 0.782140 Chr5@67 1 0.218 93.753 18.568 0.2307 0.632084 Chr9@67 1 7.156 100.691 26.207 7.5746 0.007042 ** Chr9@67^2 1 0.106 93.641 18.440 0.1123 0.738200 Chr2@105:Chr9@67 1 15.612 109.147 34.836 16.5246 9.64e-05 ***Chr5@67:Chr9@67^2 1 7.211 100.745 26.265 7.6318 0.006838 **
October 2005 Jax Workshop © Brian S. Yandell 54
epistatic interaction plots(chr 5x9 p=0.007; chr 2x9 p<0.0001)
mostly axd mostly axa
October 2005 Jax Workshop © Brian S. Yandell 55
SCD1 on chr2x9 by sex
October 2005 Jax Workshop © Brian S. Yandell 56
obesity in CAST/Ei BC onto M16i
• 421 mice (Daniel Pomp)– (213 male, 208 female)
• 92 microsatellites on 19 chromosomes– 1214 cM map
• subcutaneous fat pads– pre-adjusted for sex and dam effects
• Yi, Yandell, Churchill, Allison, Eisen, Pomp (2005) Genetics (in press)
October 2005 Jax Workshop © Brian S. Yandell 57
non-epistatic analysis
05
1015
20
LOD
sco
re
050
100
150
200
250
300
Baye
s fa
ctor
single QTL LOD profile multiple QTLBayes factor profile
October 2005 Jax Workshop © Brian S. Yandell 58
posterior profile of main effectsin epistatic analysis
-0.8
-0.4
0.0
0.2
0.4
Mai
n ef
fect
0.00
0.05
0.10
0.15
0.20
Her
itabi
lity
010
2030
4050
Bay
es fa
cto
r
main effects & heritability profile Bayes factor profile
October 2005 Jax Workshop © Brian S. Yandell 59
posterior profile of main effectsin epistatic analysis
October 2005 Jax Workshop © Brian S. Yandell 60
model selectionvia
Bayes factorsfor
epistatic model
number of QTL
QTL pattern
October 2005 Jax Workshop © Brian S. Yandell 61
posterior probability of effects
Posterior probability
0.0 0.2 0.4 0.6 0.8 1.0
Chr2(72,85)
Chr13(20,42)
Chr15(1,31)
Chr18(43,71)
Chr1(26,54)
Chr19(15,45)
Chr7(50,75)
Chr14(12,41)
Chr1(26,54)*Chr18(43,71)
Chr2(72,85)*Chr13(20,42)
Chr15(1,31)*Chr19(15,45)
Chr2(72,85)*Chr14(12,41)
Chr7(50,75)*Chr19(15,45)
Chr13(20,42)*Chr15(1,31)
October 2005 Jax Workshop © Brian S. Yandell 62
scatterplot estimates of epistatic loci
October 2005 Jax Workshop © Brian S. Yandell 63
stronger epistatic effects
October 2005 Jax Workshop © Brian S. Yandell 64
Bayesian software for QTLs• R/bim: Bayesian IM (Satagopan Yandell 1996; Gaffney 2001)*
– www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl– www.r-project.org contributed package to CRAN– version available within WinQTLCart (statgen.ncsu.edu/qtlcart)
• R/bmqtl: Bayesian Multiple QTL (N Yi, T Mehta & BS Yandell)*– epistasis, new MCMC algorithms, extensive graphics– extension of R/bim; due out Fall 2005
• R/qtl (Broman et al. 2003 Bioinformatics)*– biosun01.biostat.jhsph.edu/~kbroman/software– www.r-project.org contributed package
• Pseudomarker (Sen, Churchill 2002 Genetics)– www.jax.org/staff/churchill/labsite/software
• Bayesian QTL / Multimapper– Sillanpää Arjas (1998)– www.rni.helsinki.fi/~mjs
• Stephens & Fisch (1998 Biometrics)• R/bqtl (C Berry, hacuna.ucsd.edu/bqtl)
* Hao Wu, Jackson Labs, provide crucial computing support
October 2005 Jax Workshop © Brian S. Yandell 65
R/bim: our software• www.stat.wisc.edu/~yandell/qtl/software/Bmapqtl
– R contributed library (www.r-project.org)• library(bim) is cross-compatible with library(qtl)
– Bayesian module within WinQTLCart• WinQTLCart output can be processed using R library
• Software history– initially designed by JM Satagopan (1996)– major revision and extension by PJ Gaffney (2001)
• whole genome• multivariate update of effects; long range position updates• substantial improvements in speed, efficiency• pre-burnin: initial prior number of QTL very large
– upgrade (H Wu, PJ Gaffney, CF Jin, BS Yandell 2003)– epistasis in progress (H Wu, BS Yandell, N Yi 2004)
October 2005 Jax Workshop © Brian S. Yandell 66
many thanks
Jackson Labs
Gary Churchill
Hao Wu
U AL Birmingham
David Allison
Nengjun Yi
Tapan Mehta
Michael Newton
Daniel Sorensen
Daniel Gianola
Liang Li
my studentsJaya Satagopan
Fei Zou
Patrick Gaffney
Chunfang Jin
Elias Chaibub Neto
USDA Hatch, NIH/NIDDK (Attie), NIH/R01 (Yi)
Tom Osborn
David Butruille
Marcio Ferrera
Josh Udahl
Pablo Quijada
Alan Attie
Jonathan Stoehr
Hong Lan
Susie Clee
Jessica Byers